├── README.md
├── codes
│   ├── bloom filters
│   │   ├── BloomFilter.h
│   │   ├── CodedBloomFilter.h
│   │   ├── CombinatorialBloomFilter.h
│   │   ├── CountingBloomFilter.h
│   │   ├── CuckooFilter.h
│   │   └── SummaryCache.h
│   └── common
│       └── BOBHash32.h
├── papers
│   ├── bloom filters
│   │   ├── BloomFilter.pdf
│   │   ├── Invertible Bloom Lookup Table.pdf
│   │   ├── SummaryCache.pdf
│   │   ├── bloom tree.pdf
│   │   ├── bloomier filter.pdf
│   │   ├── coded bloom filter.pdf
│   │   ├── combinatorial bloom filter.pdf
│   │   ├── cuckoo filter.pdf
│   │   ├── kBF.pdf
│   │   └── shifting bloom filter.pdf
│   ├── other references
│   │   ├── FlowRadar.pdf
│   │   ├── MRAC.pdf
│   │   ├── SketchVisor.pdf
│   │   ├── Space-Saving.pdf
│   │   └── univmon.pdf
│   ├── sampling methods
│   │   ├── NetFLow.pdf
│   │   └── sFlowOverview.pdf
│   └── sketches
│       ├── CM sketch.pdf
│       ├── CSM sketch.pdf
│       ├── CU sketch.pdf
│       ├── Count sketch.pdf
│       ├── CounterBraids.pdf
│       └── Pyramid sketch.pdf
└── 常见sketch算法.pptx
/README.md:
--------------------------------------------------------------------------------
# Common Sketch & Bloom Filter Algorithms

Per-flow measurement means measuring certain information about a flow at a network switch or router. The most typical task is traffic measurement, i.e. how many packets or bytes of a flow pass through the switch.

Notice: the formulas in this document do not render under GitHub's Markdown. For easier reading, download README.md and view it locally.

One small goal a day O_O!

[Sampling-based Method](#sampling-based-method)

[Bloom filters](#bloom-filters)

[Sketches](#sketches)

[Others](#others)


## Sampling-based Method

### 1. NetFlow (used in Cisco systems)

- Flow ID: 5-tuple, TOS (Type of Service) byte, and the router interface on which the packet was received
- Storage: a table kept in the DRAM of the router interface, one entry per flow. Each entry holds the flow ID, timestamps (start and end), packet count, byte count, TCP flags, source network, source AS (Autonomous System), destination network, destination AS, output interface, and next-hop router
- Two challenges
  - Network traffic arrives too fast to process every packet
  - The collected data may be too large, exceeding the capacity of the collection server or of the link to the collection server
- Aggregation: reduce the exported data by aggregating less important records
  - Observation: end-to-end traffic is usually of less interest than traffic between networks/ASes
  - Method: aggregate flow records that match the same rule, e.g. the same source AS and destination AS, or the same source network
- Sampling: update the DRAM only once every x packets
  - Accuracy analysis: assume all packets have the same size and the sampling probability is $\frac{1}{x}$; let $c$ be the NetFlow counter value and $s$ the true size of the flow.
    - Probability that a flow is missed entirely: $(1-\frac{1}{x})^s$
    - $E(c) = \frac{s}{x}$, so the estimated flow size is $cx$
    - $c$ follows a binomial distribution, so its standard deviation is $SD[c]=\sqrt{\frac{s}{x}(1-\frac{1}{x})}$, and the standard deviation of the estimate is $\sqrt{sx(1-\frac{1}{x})}$
- Reference: https://www.cisco.com/c/en/us/td/docs/ios/12_2/switch/configuration/guide/fswtch_c/xcfnfov.html

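The sampling accuracy analysis can be checked numerically. The following is a minimal sketch, assuming equal-size packets and independent 1-in-x sampling as in the analysis; `SamplingEstimate` and `netflow_estimate` are illustrative names, not part of NetFlow.

```cpp
#include <cmath>

// Hypothetical helper: evaluates the sampling formulas for a flow of true
// size s, a counter value c, and a sampling rate of one packet in x.
struct SamplingEstimate {
    double size;      // estimated flow size, c * x
    double stddev;    // standard deviation of the estimate, sqrt(s * x * (1 - 1/x))
    double missProb;  // probability the flow is never sampled, (1 - 1/x)^s
};

inline SamplingEstimate netflow_estimate(double c, double s, double x) {
    SamplingEstimate e;
    e.size = c * x;
    e.stddev = std::sqrt(s * x * (1.0 - 1.0 / x));
    e.missProb = std::pow(1.0 - 1.0 / x, s);
    return e;
}
```

For example, with x = 100 and a flow of s = 1000 packets, the standard deviation of the estimate is $\sqrt{99000}$, roughly 315 packets, so small flows are estimated poorly under sampling.
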
### 2. sFlow (published in RFC 3176)

- Sampling: same as NetFlow
- Reference: https://sflow.org/sFlowOverview.pdf

## Bloom filters

### 1. Bloom filter
- Purpose: membership queries on a single set

- False-positive analysis: assume the set contains n distinct elements, the Bloom filter has m bits, and k hash functions are used

  - Probability that a given bit is 0: $P_{(b=0)}=(1-\frac{1}{m})^{nk}$

  - Probability that an absent element is reported as present, i.e. the false-positive rate: $Fpr = (1-P_{(b=0)})^k$

  - When $m \gg 0, nk \gg 0$: $P_{(b=0)}=(1-\frac{1}{m})^{nk} \approx e^{-\frac{nk}{m}}$, so $Fpr \approx (1- e^{-\frac{nk}{m}})^k$

  - Differentiating $\ln(Fpr) = k \ln{(1-e^{-\frac{nk}{m}})}$ with respect to $k$: $\frac{\partial}{\partial k}\ln(Fpr) = \ln (1-e^{-\frac{nk}{m}}) + \frac{\frac{nk}{m}\cdot e^{-\frac{nk}{m}}}{1-e^{-\frac{nk}{m}}}$. Setting this to 0 gives $e^{-\frac{nk}{m}}=\frac{1}{2}$, so the optimal k is $k=\frac{m}{n}\ln2$

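The false-positive formula and the optimum it implies can be evaluated directly. This is a minimal sketch using the approximation $Fpr \approx (1-e^{-nk/m})^k$; the function names are illustrative. Solving $e^{-nk/m}=\frac{1}{2}$ for $k$ gives $k=\frac{m}{n}\ln 2$, which is what `bloom_optimal_k` returns.

```cpp
#include <cmath>

// Approximate false-positive rate for m bits, n elements, k hash functions.
inline double bloom_fpr(double m, double n, double k) {
    return std::pow(1.0 - std::exp(-n * k / m), k);
}

// Real-valued optimum k = (m / n) * ln 2; in practice round to a nearby integer >= 1.
inline double bloom_optimal_k(double m, double n) {
    return m / n * std::log(2.0);
}
```

At the optimum half of the bits are expected to be set, so the false-positive rate is $2^{-k}$.
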
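### 2. Counting Bloom filter

- Purpose: queries on a multiset

- Method: elements of the set may repeat, so each bit is simply replaced by a counter

### 3. Summary Cache
- Purpose: multi-set membership queries (elements within each set are distinct, the sets are disjoint, and the query asks which set an element belongs to)

- Method: one Bloom filter per set

- Drawback: a query needs too many memory accesses

### 4. Coded Bloom Filter

- Purpose: multi-set membership queries
- Method: each set has an ID (index), and one Bloom filter is used for each bit of the ID
- Drawback: the number of 1 bits differs from ID to ID, and insertion becomes slower
- Optimization: double the length of the ID by appending its bitwise complement
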
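The complement trick in the Coded Bloom Filter optimization can be sketched as follows. It assumes IDs of at most 16 bits, and the helper names are illustrative. Appending the complement gives every code exactly `bits` ones, so every insertion touches the same number of Bloom filters.

```cpp
#include <cstdint>

// Append the bitwise complement of a `bits`-bit set ID, producing a code of
// 2 * bits bits that always contains exactly `bits` ones. Assumes 1 <= bits <= 16.
inline uint32_t extend_with_complement(uint32_t id, int bits) {
    uint32_t mask = (1u << bits) - 1u;
    uint32_t low = id & mask;
    uint32_t comp = ~low & mask;
    return (low << bits) | comp;
}

// Count set bits by clearing the lowest set bit each round.
inline int popcount32(uint32_t v) {
    int c = 0;
    for (; v != 0; v &= v - 1) ++c;
    return c;
}
```
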
### 5. Combinatorial Bloom Filter

- Purpose: multi-set membership queries
- Method: use a single large Bloom filter, with a different group of hash functions representing each set
- Drawback: slow insertion

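### 6. kBF

- Purpose: static insertion and lookup of key-value pairs
- Data structure:
  - An array of cells
  - Each cell consists of a counter part and a coding part
- Method:
  - value -> encoding:
    - Requirements on the encoding:
      - Different values get different encodings
      - The XOR of any two encodings is unique
    - Given n distinct values, assign encoding 1 to the first value
    - For each later value, keep incrementing the encoding by 1
    - If the XOR of the k-th value's candidate encoding with one of the previous k-1 encodings has already appeared (a Bloom filter can be used for this check), increment the candidate by 1 and check the requirements again
  - Encoding insertion (k hashes):
    - Increment the counter part by 1
    - XOR the encoding into the coding part
  - Query: after hashing to k cells, try to recover the encoding
    - If some cell has counter = 1, done
    - If there are several cells with counter = 2, recover the two encodings in each with an O(n) method and pick the encoding they have in common
    - If only cells with counter > 2 exist:
      - Recover one encoding in O(n) time, then the other in o(n) time
      - The result is not guaranteed to be correct
  - encoding -> value:
    - The assignment procedure hands out encodings in ascending order (so an encoding -> value list can be recorded along the way), so the value can be found by binary search
    - The paper describes another, faster method, which these notes do not cover
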
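The value -> encoding assignment in kBF can be sketched as a greedy loop. This is a minimal illustration under two assumptions: an exact `std::unordered_set` stands in for the Bloom filter mentioned in the notes, and only the two stated requirements (distinct encodings, distinct pairwise XORs) are enforced. `assign_encodings` is a hypothetical name.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Greedily assign n encodings in ascending order so that the XOR of every
// pair of encodings is distinct from the XOR of every other pair.
inline std::vector<uint32_t> assign_encodings(int n) {
    std::vector<uint32_t> enc;
    std::unordered_set<uint32_t> usedXor;  // XORs of all accepted pairs
    uint32_t cand = 1;                     // the first value gets encoding 1
    while (static_cast<int>(enc.size()) < n) {
        bool ok = true;
        for (uint32_t e : enc) {
            if (usedXor.count(e ^ cand)) { ok = false; break; }
        }
        if (ok) {
            // The new XORs are pairwise distinct because the accepted encodings are.
            for (uint32_t e : enc) usedXor.insert(e ^ cand);
            enc.push_back(cand);
        }
        ++cand;  // try the next candidate for the next value (or retry)
    }
    return enc;
}
```

Because encodings come out in ascending order, the resulting vector doubles as the sorted encoding -> value list used for binary search.
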
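### 7. Bloomtree

- Purpose: multi-set queries (key, value = set ID)
- Properties:
  - Static: updates are not supported, but can be added by switching to a counting Bloom tree
- Structure:
  - Every node is a Bloom filter
  - An internal node has several groups of hash functions, one group per child
  - A leaf node has a single group of hash functions, used to record whether a key is present
- Insertion:
  - Following the value, select one group of hash functions at each level, hash the key with it, and record the key in that node (Bloom filter)
- Query:
  - At each level, hash the key with all hash functions of the node and check whether the key has appeared in that node
  - Continue down to a leaf node; if the key also appears in the leaf, answer with the value that leaf corresponds to
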
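### 8. Bloomier Filter

- Problem to solve:
  - A given domain: $D=\{0, 1, \ldots, N-1\}$
  - A subset of the domain: $S=\{t_1, t_2, \ldots, t_n\}$
  - A range: $R=\{null, 1, 2, \ldots, |R|-1\}$
  - Build a mapping $D\rightarrow R$ such that:
    - $for ~~1\leq i \leq n,~~~~ f(t_i) = v_i$
    - $for ~~x\in D \setminus S, ~~~~f(x)=null$
  - The mapping $t_i \rightarrow v_i$ is determined by the task, e.g. for multi-set queries t is the element and v is the set ID
  - Updates are supported, insertions are not
- Definitions:
  - For an element t of the domain, hashing it k times gives $(h_1, h_2,\ldots, h_k)$; these hash values are called the neighborhood $N(t)$ of t
  - Let $\pi$ be a total order on S:
    - Under $\pi$, $t_i >_{\pi} t_j$ if and only if $i>j$
  - $\tau$ is said to respect $(S, \pi, N)$ if the following hold:
    - If $t\in S$, then $\tau (t) \in N(t)$
    - If $t_i >_{\pi} t_j$, then $\tau(t_i) \notin N(t_j)$
  - singleton: a position that exactly one element of S maps to
  - TWEAK(t, S, HASH): within the neighborhood $N(t)=(h_1, h_2, \ldots, h_k)$ of t, the smallest index among all singletons (i.e. which hash function is the first to hit a singleton)
- Given S and k hash functions, how to obtain the desired $\pi$ and $\tau$:
  - First find the set E of elements that have a TWEAK; these can satisfy the conditions. Place the elements of E last in the ordering (i.e. they are the largest under $\pi$)
  - Call the remaining elements H and recursively find $\pi$ and $\tau$ for them
  - This may fail
- Building the Bloomier filter:
  - Keep trying until suitable $\pi$ and $\tau$ are found
  - For each element $t$ of S, locate position $\tau(t)$ and record the encoding of the corresponding hash function by XOR (in table 1)
  - At position $\tau(t)$ of table 2, record the value of t
- Query/update:
  - XOR together the values at all hashed positions and the mask, then decode to get l
  - Check whether l is the l that belongs to the key
    - If so, return/modify the value in table 2
    - If not, report that the key does not exist
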
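### 9. Cuckoo Filter

- Purpose: membership queries on a single set
- Method: the filter consists of a bucket array; each bucket holds several entries, and each entry stores a partial key
  - First compute the partial key $f$ from the key
  - Compute two candidate positions: $pos_1 = hash(key)$ and $pos_2 = hash(key)~XOR ~hash(f)$
  - Insertion: if there is an empty slot, insert there; otherwise evict an existing entry, take its place, and move the evicted entry to its other candidate position
  - Query: check whether positions $pos_1$ and $pos_2$ hold an entry equal to the partial key
  - Deletion: remove an entry equal to $f$ from the candidate positions
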
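
The XOR rule for candidate positions is what lets an evicted entry be relocated without the original key. A minimal sketch follows; `mix32` is a stand-in integer mixer assumed for illustration (the repository uses BOBHash32), and the number of buckets is assumed to be a power of two.

```cpp
#include <cstdint>

// Stand-in 32-bit mixer (an assumption for this sketch, not the repository's hash).
inline uint32_t mix32(uint32_t x) {
    x ^= x >> 16; x *= 0x7feb352dU;
    x ^= x >> 15; x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

// Other candidate bucket of fingerprint fp that currently sits in `bucket`.
// numBuckets must be a power of two so the XOR stays in range and is its own inverse.
inline uint32_t alt_bucket(uint32_t bucket, uint32_t fp, uint32_t numBuckets) {
    return (bucket ^ mix32(fp)) & (numBuckets - 1);
}
```

Applying `alt_bucket` twice returns the starting bucket, so either candidate position can be derived from the other using only the stored partial key.
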
### 10. Shifting Bloom Filter

- Purpose: multi-set membership queries
- Insertion:
  - Hash k times to locate k bits
  - For an element of the j-th set, set to 1 the bit at offset j after each of the k located bits
- Query:
  - Same as for an ordinary Bloom filter

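### 11. Invertible Bloom Lookup Table (IBLT)

- Purpose: key-value storage that supports insertion, deletion, lookup, and listing of the stored pairs
- Data structure:
  - k hash functions
  - A table of m cells, each containing:
    - count: the number of elements mapped to this cell
    - keySum: the sum of the keys mapped to this cell
    - valueSum: the sum of the values mapped to this cell
- Insertion/deletion: map to k cells with the hash functions and update the fields as defined above
- Lookup: first locate the k hashed cells
  - If any cell has count 0, return null
  - If some cell has count 1:
    - If keySum matches, return valueSum
    - Otherwise return null
  - If all counts are greater than 1, return "not found"
- Advantage: all key-value pairs stored in the IBLT can be listed out
  - Find a cell with count 1, output the key and value in that cell, then subtract that key-value pair from all cells related to it
  - Repeat until no cell with count 1 remains
  - It may not be possible to output every key-value pair
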
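## Sketches

### 1. CM sketch

- Purpose: frequency queries
- Insertion: map to k counters and increment each of them by 1
- Query: map to k counters and return the smallest counter value

### 2. CU sketch
- Purpose: frequency queries
- Insertion: map to k counters and increment only the smallest one (or the several tied for smallest) by 1
- Query: map to k counters and return the smallest counter value
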
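
CM and CU differ only in the insertion rule, so both fit in one small class. A minimal sketch, assuming one row of counters per hash function and a stand-in hash `mix`; the class and method names are illustrative.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

class CountMin {
    int k;                           // number of rows / hash functions
    size_t width;                    // counters per row
    std::vector<uint32_t> counters;  // k * width counters, row-major

    static uint64_t mix(uint64_t x) {  // stand-in hash, assumed for this sketch
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }
    size_t pos(uint64_t key, int row) const {
        return row * width + mix(key + 0x9e3779b97f4a7c15ULL * (row + 1)) % width;
    }

public:
    CountMin(int k, size_t width) : k(k), width(width), counters(k * width, 0) {}

    // CM rule: increment all k counters.
    void insertCM(uint64_t key) {
        for (int r = 0; r < k; ++r) ++counters[pos(key, r)];
    }
    // CU rule: increment only the counter(s) holding the current minimum.
    void insertCU(uint64_t key) {
        uint32_t mn = query(key);
        for (int r = 0; r < k; ++r) {
            uint32_t& c = counters[pos(key, r)];
            if (c == mn) ++c;
        }
    }
    // Both variants answer with the smallest of the k counters.
    uint32_t query(uint64_t key) const {
        uint32_t mn = UINT32_MAX;
        for (int r = 0; r < k; ++r) mn = std::min(mn, counters[pos(key, r)]);
        return mn;
    }
};
```

For an insert-only stream neither rule underestimates, and the CU estimate is never larger than the CM estimate on the same stream.
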
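### 3. Count sketch

- Purpose: frequency queries
- Insertion: map to k counters; each counter is incremented or decremented by 1, with the sign given by a hash of the key that is +1 or -1 with equal probability
- Query: map to k counters and return the median of the counter values, each multiplied by the key's sign for that counter
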
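A minimal sketch of the usual Count sketch formulation, in which the +1/-1 choice comes from a second hash of the key and the query takes the median of the sign-corrected counters. One row per hash function and the stand-in hash `mix` are assumptions of this example.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

class CountSketch {
    int k;                          // number of rows / hash functions
    size_t width;                   // counters per row
    std::vector<int64_t> counters;  // signed counters, row-major

    static uint64_t mix(uint64_t x) {  // stand-in hash, assumed for this sketch
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }
    size_t pos(uint64_t key, int row) const {
        return row * width + mix(key + 0x9e3779b97f4a7c15ULL * (2 * row + 1)) % width;
    }
    int sign(uint64_t key, int row) const {  // fixed per (key, row): +1 or -1
        return (mix(key + 0x9e3779b97f4a7c15ULL * (2 * row + 2)) & 1) ? +1 : -1;
    }

public:
    CountSketch(int k, size_t width) : k(k), width(width), counters(k * width, 0) {}

    void insert(uint64_t key) {
        for (int r = 0; r < k; ++r) counters[pos(key, r)] += sign(key, r);
    }
    // Median over rows of sign * counter; colliding keys cancel in expectation.
    int64_t query(uint64_t key) const {
        std::vector<int64_t> est(k);
        for (int r = 0; r < k; ++r) est[r] = sign(key, r) * counters[pos(key, r)];
        std::nth_element(est.begin(), est.begin() + k / 2, est.end());
        return est[k / 2];
    }
};
```

Unlike CM, the estimate can be below the true frequency, because colliding keys may contribute with the opposite sign.
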
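### 4. CSM sketch

- Data structure:
  - An array of m counters
  - k hash functions
- Insertion: similar to CM, but only one of the k counters, chosen at random, is incremented by 1
- Query: $\hat{s} = \sum_{i=0}^{k-1}C[h_i(f)] - k\frac{n}{m}$
  - The first term is the sum of all hashed counters, i.e. the true value plus noise
  - The second term is the expected (approximate) noise, taking $n$ as the total number of inserted packets
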
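The CSM insert and query rules can be sketched as follows. This is a minimal illustration that assumes $n$ is the total number of inserted packets, uses a stand-in hash `mix`, and seeds the random generator with a fixed value so runs are repeatable; the names are illustrative.

```cpp
#include <cstdint>
#include <random>
#include <vector>

class CSM {
    size_t m;                        // number of counters
    int k;                           // number of hash functions
    uint64_t total = 0;              // n: packets inserted so far
    std::vector<uint32_t> counters;
    std::mt19937 rng;

    static uint64_t mix(uint64_t x) {  // stand-in hash, assumed for this sketch
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }
    size_t pos(uint64_t flow, int i) const {
        return mix(flow + 0x9e3779b97f4a7c15ULL * (i + 1)) % m;
    }

public:
    CSM(size_t m, int k) : m(m), k(k), counters(m, 0), rng(12345) {}

    // Increment one of the flow's k counters, chosen uniformly at random.
    void insert(uint64_t flow) {
        std::uniform_int_distribution<int> pick(0, k - 1);
        ++counters[pos(flow, pick(rng))];
        ++total;
    }
    // Sum of the k counters minus the expected noise k * n / m.
    double query(uint64_t flow) const {
        double sum = 0.0;
        for (int i = 0; i < k; ++i) sum += counters[pos(flow, i)];
        return sum - double(k) * double(total) / double(m);
    }
};
```

Because the noise term is only an expectation, the estimate can come out slightly negative for very small flows.
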
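### 5. Pyramid sketch

- Data structure: several layers of counter arrays
  - Each upper layer is half the size of the layer below it, forming a tree
  - In the first layer, all bits of a counter record frequency
  - In the higher layers, two bits of each counter record whether the left and right child have overflowed, and the remaining bits record frequency
- Insertion:
  - Can follow the Count, CM, CU, or CSM scheme
  - On overflow, carry into the layer above
- Deletion:
  - Can follow the Count, CM, CU, or CSM scheme
  - When a borrow is needed, borrow from the layer above (the reverse of the carry)
- Query:
  - Follow the carry chain upward to reconstruct the value
- Benefit: real traffic contains many low-frequency flows, so the lowest layer can use small counters and save space

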
## Others
### 1. Space-Saving

- Purpose: finding heavy items (items with large frequency)
- Insertion in brief:
  - If the data structure is not full, insert directly
  - If it is full, replace the element with the smallest value and increment that value by 1
- Data structure:
  - A hash table: key -> pointer to the key node
  - A list of value nodes, doubly linked, ordered by value from small to large
  - Each value node links a chain of key nodes
  - Key nodes are connected in a doubly linked list
  - Each key node also stores a pointer to its value node
- Detailed insertion of a key:
  - If the key is found in the hash table, perform the following increment operation:
    - Remove the key node from its value node
    - If the key list is empty after the removal, remove the value node from the value node list
    - Look at the value node that followed the old value node
    - If that node's value = old value + 1, insert the key into its key node list
    - Otherwise create a new value node, set its value to old value + 1, and insert the key into it
  - If the key is not found:
    - Replace the key of one key node in v1 (the value node with the smallest value) with the inserted key, and remove that key node from v1
    - Perform the increment operation above

### 2. UnivMon

- Input: n distinct elements and m packets in total; $f_i$ denotes the frequency of the i-th element
- Goal: answer the G-sum $\sum g(f_i)$, where g is monotonic and upper-bounded by $f_i^2$
- Data structure: $H=\log(n)$ Count sketches, stacked in a way similar to Pyramid sketch
- (Online) insertion:
  - Hash H times (each hash value is in {0, 1})
  - If hash values 1 through j are all 1 (for the largest such j), insert into Count sketches 1 through j
- (Offline) query:
  - Read the heavy-hitter set out of the Count sketch at each level (named $Q_j, 1\leq j \leq \log(n)$; $w_j(i)$ denotes the i-th heavy hitter in $Q_j$)
  - Apply g() to these heavy hitters
  - Compute the G-sum of the top level: $Y_{\log(n)}=\sum_{i} g(w_{\log(n)}(i))$
  - For each subsequent level:
    - $Y_j = 2Y_{j+1} + \sum_{i \in Q_j}(1 - 2 h_{j+1}(i))\,g(w_j(i))$
      - First term: twice the G-sum of level j+1
      - Second term:
        - If $Q_j(i)$ does not appear in level j+1, then $h_{j+1}(i)=0$, so the term adds the counter value of $Q_j(i)$
        - If $Q_j(i)$ appears in level j+1, then $h_{j+1}(i)=1$, so the term subtracts the counter value of $Q_j(i)$, which keeps its contribution to the G-sum at 1x
        - Implicit here: anything that appears in level j+1 must also appear in level j, so the whole G-sum of level j+1 is subtracted once
    - The benefit of this scheme is that it is easy to pipeline
- Tasks:
  - Heavy hitters: use the Count sketch directly
  - Entropy

### 3. FlowRadar

- Purpose: key-value flow measurement tasks
- The data structure is simple: a Bloom filter plus a counting table (similar to an IBLT)
  - Each cell of the counting table has 3 fields:
    - FlowXOR: the XOR of the flow IDs mapped to this cell
    - FlowCount: the number of flows mapped to this cell
    - PacketCount: the number of packets mapped to this cell
- Inserting a key:
  - Check the Bloom filter: has the key been seen before?
  - If so, map to k positions in the counting table and do PacketCount++
  - Otherwise insert the key into the Bloom filter first, then map to k positions in the counting table and
    - XOR the key into FlowXOR
    - FlowCount++
    - PacketCount++
- Decoding (the IBLT list-out operation)
  - Find a cell with FlowCount = 1, read out its key and value, then remove that flow's information from the cells related to the key
  - Repeat the previous step
- Network-wide decoding:
  - FlowDecode across switches:
    - First decode inside each switch
    - Then compare the decoded sets of two neighboring switches
      - S1 is the decoded result of one switch, S2 that of the other
      - Check whether the elements of S1-S2 appear in the Bloom filter of switch 2
      - If one does, add it to the result S2 and update the corresponding FlowXOR, FlowCount, and PacketCount
      - Do the same for S2-S1
  - CounterDecode at a single switch:
    - After FlowDecode across switches, it is roughly known which keys are in which counters, along with the total packet count of the flows of those keys
    - With m counters and n flows, this gives a system of m equations in n unknowns
    - Solve the system with a tool such as MATLAB to obtain the packet count of each key

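### 4. SketchVisor

- A system that integrates multiple sketch algorithms for various measurement tasks
- Overview:
  - The data plane is split into a normal path and a fast path:
    - When a packet enters the data plane, first check whether a FIFO buffer is full
    - If it is not full, take the normal path, where the various algorithms process the packet
    - If it is full, take the **fast path**
  - Control plane:
    - Collects data from all measurement points
    - Repairs the collected data to some degree (a recovery algorithm based on compressive sensing)
- Fast path
  - Key idea:
    - Assume the flows entering the fast path also follow a long-tailed distribution => focus on collecting information about large flows
    - Small flows matter too => do not record each small flow individually, but record aggregate statistics over them
    - In short, design a top-k algorithm (see the paper for the algorithm itself; it is explained clearly there)
- Recovery algorithm:
  - Known information:
    - Adding all sketches together gives N (counters at corresponding positions are summed)
      - N is clearly inaccurate, because the information of small flows on the fast path was discarded
    - All top-k flows form a hash table H
      - The error of H depends only on the chosen algorithm itself
    - The recorded total byte count V of the flows
      - This can be exact
  - Completing N => a matrix completion problem
    - Use compressive sensing to complete N
    - See the paper for details
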
--------------------------------------------------------------------------------
/codes/bloom filters/BloomFilter.h:
--------------------------------------------------------------------------------
#ifndef BLOOMFILTER_H
#define BLOOMFILTER_H

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstring>
#include "../common/BOBHash32.h"

class BloomFilter
{
    int n; // number of elements in the set
    int m; // number of bits in the bit array
    int w; // number of bytes in the bit array
    int k; // number of hash functions
    uint8_t* array;
    BOBHash32** hash;

public:
    // Pass k_ == 0 to derive k from m and n as k = (m / n) * ln 2.
    BloomFilter(int m_, int n_, int k_): n(n_), m(m_), k(k_)
    {
        // Assign the member (the original assigned the shadowing parameter)
        // and guard against n == 0 and a rounded-down k of 0.
        if(k == 0)
            k = n > 0 ? std::max(1, int(m * 1.0 / n * log(2))) : 1;

        w = m / 8 + (m % 8 == 0 ? 0 : 1);
        array = new uint8_t[w];
        memset(array, 0, w);

        hash = new BOBHash32*[k];
        for(int i = 0; i < k; ++i)
            hash[i] = new BOBHash32(100 + i);
    }
    ~BloomFilter(){
        delete[] array;   // arrays allocated with new[] need delete[]
        for(int i = 0; i < k; ++i)
            delete hash[i];
        delete[] hash;
    }

    // Set the k bits selected by the hash functions.
    void insert(char* key, uint32_t keylen)
    {
        for(int i = 0; i < k; ++i){
            int pos = hash[i]->run(key, keylen) % m;
            set_bit(pos);
        }
    }

    // Return false if any of the k bits is 0 (the key is definitely absent).
    bool query(char* key, uint32_t keylen)
    {
        for(int i = 0; i < k; ++i){
            int pos = hash[i]->run(key, keylen) % m;
            if(query_bit(pos) == 0)
                return false;
        }
        return true;
    }

private:
    int query_bit(int pos){
        int base = pos / 8;
        int offset = pos % 8;
        uint8_t mask = (uint8_t)(1 << offset);
        uint8_t res = array[base] & mask;
        return res ? 1 : 0;
    }
    void set_bit(int pos){
        int base = pos / 8;
        int offset = pos % 8;
        uint8_t mask = (uint8_t)(1 << offset);
        array[base] |= mask;
    }
};

#endif
--------------------------------------------------------------------------------
/codes/bloom filters/CodedBloomFilter.h:
--------------------------------------------------------------------------------
#include "BloomFilter.h"


class CodedBloomFilter
{
    int s; // number of sets
    int n; // number of bloom filters (bits in a set ID)
    int m; // number of bits used in each bloom filter
    int k; // number of hash functions used in each bloom filter
    BloomFilter** bfs;

public:
    CodedBloomFilter(int s, int m, int k):s(s), m(m), k(k)
    {
        // n = number of bits needed to represent IDs up to s
        n = 0;
        int tmpS = s;
        while(tmpS != 0){
            n++;
            tmpS >>= 1;
        }

        bfs = new BloomFilter*[n];
        for(int i = 0; i < n; ++i)
            bfs[i] = new BloomFilter(m, 0, k);
    }
    ~CodedBloomFilter()
    {
        for(int i = 0; i < n; ++i)
            delete bfs[i];
        delete[] bfs;
    }

    // setIdx should be in [1, s]: ID 0 sets no bit and is indistinguishable from "absent".
    void insert(char* key, uint32_t keylen, int setIdx)
    {
        for(int i = 0; i < n; ++i){
            if((setIdx & 1) == 1)
                bfs[i]->insert(key, keylen);
            setIdx >>= 1;
        }
    }

    /*
     * return -1: not found in any set
     * return -2: found in multiple sets
     * return k (1<=k<=s): found in set k
     */
    int query(char* key, uint32_t keylen)
    {
        int setId = 0;
        for(int i = 0; i < n; ++i)
            if(bfs[i]->query(key, keylen))
                setId += (1 << i);
        if(setId == 0)
            return -1;
        if(setId > s)
            return -2;
        return setId;
    }
};
--------------------------------------------------------------------------------
/codes/bloom filters/CombinatorialBloomFilter.h:
--------------------------------------------------------------------------------
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include "../common/BOBHash32.h"

class CombinatorialBloomFilter
{
    int m; // number of bits in the bit array
    int s; // number of sets
    int w; // number of bytes in the bit array
    int k; // number of hash functions
    uint8_t* array;
    BOBHash32** hash;

public:
    CombinatorialBloomFilter(int m, int k, int s):m(m), s(s), k(k)
    {
        w = m / 8 + (m % 8 == 0 ? 0 : 1);
        array = new uint8_t[w];
        memset(array, 0, w);

        // one group of k hash functions per set
        hash = new BOBHash32*[s*k];
        for(int i = 0; i < s*k; ++i)
            hash[i] = new BOBHash32(100 + i);
    }
    ~CombinatorialBloomFilter(){
        delete[] array;
        for(int i = 0; i < s*k; ++i)
            delete hash[i];
        delete[] hash;
    }

    // setIdx is 0-based and selects the hash group used for this key.
    void insert(char* key, uint32_t keylen, int setIdx)
    {
        for(int i = 0; i < k; ++i){
            int pos = hash[setIdx*k + i]->run(key, keylen) % m;  // was "& m"
            set_bit(pos);
        }
    }

    /*
     * return -1: not found in any set
     * return -2: found in multiple sets
     * return k (0<=k<s): found in set k
     */
    int query(char* key, uint32_t keylen)
    {
        int setIdx = -1;
        for(int i = 0; i < s; ++i){
            bool flag = true;
            for(int j = 0; j < k; ++j){
                int pos = hash[i*k + j]->run(key, keylen) % m;
                if(query_bit(pos) == 0){
                    flag = false;
                    break;
                }
            }
            if(flag){
                if(setIdx != -1)
                    return -2;
                setIdx = i;
            }
        }
        return setIdx;
    }

private:
    int query_bit(int pos){
        int base = pos / 8;
        int offset = pos % 8;
        uint8_t mask = (uint8_t)(1 << offset);
        uint8_t res = array[base] & mask;
        return res ? 1 : 0;
    }
    void set_bit(int pos){
        int base = pos / 8;
        int offset = pos % 8;
        uint8_t mask = (uint8_t)(1 << offset);
        array[base] |= mask;
    }
};
--------------------------------------------------------------------------------
/codes/bloom filters/CountingBloomFilter.h:
--------------------------------------------------------------------------------
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include "../common/BOBHash32.h"

// Renamed from BloomFilter so it no longer clashes with the class in BloomFilter.h.
class CountingBloomFilter
{
    int n; // number of elements in the set
    int m; // number of counters in the array
    int k; // number of hash functions
    uint32_t* array;
    BOBHash32** hash;

    // k = (m / n) * ln 2, but never less than 1
    static int optimal_k(int m, int n)
    {
        int k = n > 0 ? int(m * 1.0 / n * log(2)) : 1;
        return k > 0 ? k : 1;
    }

public:
    CountingBloomFilter(int m, int n, int k): n(n), m(m), k(k)
    {
        array = new uint32_t[m];
        memset(array, 0, m * sizeof(uint32_t));

        hash = new BOBHash32*[k];
        for(int i = 0; i < k; ++i)
            hash[i] = new BOBHash32(100 + i);
    }
    // Delegate to the main constructor; the original built and discarded a temporary.
    CountingBloomFilter(int m, int n): CountingBloomFilter(m, n, optimal_k(m, n))
    {
    }
    ~CountingBloomFilter(){
        delete[] array;
        for(int i = 0; i < k; ++i)
            delete hash[i];
        delete[] hash;
    }

    void insert(char* key, uint32_t keylen)
    {
        for(int i = 0; i < k; ++i){
            int pos = hash[i]->run(key, keylen) % m;
            array[pos] += 1;
        }
    }

    // Return the smallest of the k counters as the multiplicity estimate.
    int query(char* key, uint32_t keylen)
    {
        uint32_t res = 0x3FFFFFFF;
        for(int i = 0; i < k; ++i){
            int pos = hash[i]->run(key, keylen) % m;
            res = res > array[pos] ? array[pos] : res;
        }
        return int(res);
    }
};
--------------------------------------------------------------------------------
/codes/bloom filters/CuckooFilter.h:
--------------------------------------------------------------------------------
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <utility>
#include "../common/BOBHash32.h"

class CuckooFilter
{
    const static int MAX_RELOCATE_TIMES = 100;
    int m; // number of buckets in the array
           // m must be 2^k
    int w; // number of entries in the bucket
    uint32_t** array;   // a stored value of 0 marks an empty entry
    BOBHash32* hash[2];

    int kick_index, kick_bucket;

public:
    CuckooFilter(int m, int w): m(m), w(w)
    {
        array = new uint32_t*[m];
        for(int i = 0; i < m; ++i){
            array[i] = new uint32_t[w];
            memset(array[i], 0, sizeof(uint32_t) * w);
        }

        hash[0] = new BOBHash32(101);
        hash[1] = new BOBHash32(102);   // was new BOBHash32[102], an array of 102 objects

        kick_bucket = 0;
        kick_index = 0;
    }
    ~CuckooFilter(){
        for(int i = 0; i < m; ++i)
            delete[] array[i];
        delete[] array;
        delete hash[0];
        delete hash[1];
    }

    void insert(char* key, uint32_t keylen)
    {
        uint32_t fp = get_fp(key, keylen);
        uint32_t pos1 = hash[0]->run(key, keylen) % m;
        uint32_t pos2 = alt_pos(pos1, fp);
        if(pos1 > pos2)
            std::swap(pos1, pos2);

        // already present in one of the two candidate buckets
        for(int j = 0; j < w; ++j)
            if(array[pos1][j] == fp)
                return;
        for(int j = 0; j < w; ++j)
            if(array[pos2][j] == fp)
                return;

        // use an empty entry if there is one
        for(int j = 0; j < w; ++j)
            if(array[pos1][j] == 0){
                array[pos1][j] = fp;
                return;
            }
        for(int j = 0; j < w; ++j)
            if(array[pos2][j] == 0){
                array[pos2][j] = fp;
                return;
            }

        // both buckets are full: evict one entry and relocate it
        uint32_t bucket_no = kick_bucket == 0 ? pos1 : pos2;
        uint32_t kick_fp = array[bucket_no][kick_index];
        array[bucket_no][kick_index] = fp;
        kick_bucket = (kick_bucket + 1) % 2;
        kick_index = (kick_index + 1) % w;
        relocate(kick_fp, bucket_no);
    }

    bool query(char* key, uint32_t keylen)
    {
        uint32_t fp = get_fp(key, keylen);
        uint32_t pos1 = hash[0]->run(key, keylen) % m;
        uint32_t pos2 = alt_pos(pos1, fp);

        for(int j = 0; j < w; ++j)
            if(array[pos1][j] == fp)
                return true;
        for(int j = 0; j < w; ++j)
            if(array[pos2][j] == fp)
                return true;
        return false;
    }

    // Renamed from delete, which is a reserved keyword in C++.
    void remove(char* key, uint32_t keylen)
    {
        uint32_t fp = get_fp(key, keylen);
        uint32_t pos1 = hash[0]->run(key, keylen) % m;
        uint32_t pos2 = alt_pos(pos1, fp);

        for(int j = 0; j < w; ++j)
            if(array[pos1][j] == fp){
                array[pos1][j] = 0;
                return;
            }
        for(int j = 0; j < w; ++j)
            if(array[pos2][j] == fp){
                array[pos2][j] = 0;   // was array[pos1][j]
                return;
            }
    }

private:
    /* calculate fingerprint; 0 is reserved for "empty", so map it to 1 */
    uint32_t get_fp(char* key, uint32_t keylen)
    {
        uint32_t fp = hash[1]->run(key, keylen);
        return fp == 0 ? 1 : fp;
    }

    /* the other candidate bucket of a fingerprint stored in bucket pos */
    uint32_t alt_pos(uint32_t pos, uint32_t fp)
    {
        return (pos ^ hash[0]->run((char*)&fp, sizeof(uint32_t))) % m;
    }

    /* relocate finger print */
    void relocate(uint32_t fp, uint32_t bucket_no, int times = 0)
    {
        if(times >= MAX_RELOCATE_TIMES){
            printf("relocate failed!\n");
            return;
        }

        uint32_t pos = alt_pos(bucket_no, fp);
        for(int j = 0; j < w; ++j)
            if(array[pos][j] == 0){
                array[pos][j] = fp;
                return;
            }

        uint32_t kick_fp = array[pos][kick_index];
        array[pos][kick_index] = fp;
        kick_index = (kick_index + 1) % w;
        relocate(kick_fp, pos, times + 1);   // was times++, which never advanced the count
    }
};
--------------------------------------------------------------------------------
/codes/bloom filters/SummaryCache.h:
--------------------------------------------------------------------------------
#include "BloomFilter.h"

class SummaryCache
{
    int s; // number of sets
    int m; // number of bits used in each bloom filter
    int k; // number of hash functions used in each bloom filter
    BloomFilter** bfs;

public:
    SummaryCache(int s, int m, int k):s(s), m(m), k(k)
    {
        // one Bloom filter per set
        bfs = new BloomFilter*[s];
        for(int i = 0; i < s; ++i)
            bfs[i] = new BloomFilter(m, 0, k);
    }
    ~SummaryCache()
    {
        for(int i = 0; i < s; ++i)
            delete bfs[i];
        delete[] bfs;
    }

    void insert(char* key, uint32_t keylen, int setIdx)
    {
        bfs[setIdx]->insert(key, keylen);
    }

    /*
     * return -1: not found in any set
     * return -2: found in multiple sets
     * return k (0<=k<s): found in set k
     */
    int query(char* key, uint32_t keylen)
    {
        int res = -1;
        for(int i = 0; i < s; ++i)
            if(bfs[i]->query(key, keylen)){
                if(res != -1)
                    return -2;
                res = i;
            }
        return res;
    }
};
--------------------------------------------------------------------------------
/codes/common/BOBHash32.h:
--------------------------------------------------------------------------------
1 | #ifndef _BOBHASH32_H
2 | #define _BOBHASH32_H
3 | #include <cstdint>
4 | #include <random>
5 | #include <unordered_set>
6 | #include <vector>
7 | using namespace std;
8 |
9 | #define MAX_PRIME32 1229
10 | #define MAX_BIG_PRIME32 50
11 |
12 | class BOBHash32
13 | {
14 | public:
15 | BOBHash32();
16 | ~BOBHash32();
17 | BOBHash32(uint32_t prime32Num);
18 | void initialize(uint32_t prime32Num);
19 | uint32_t run(const char * str, uint32_t len); // produce a hash number
20 | static uint32_t get_random_prime_index()
21 | {
22 | random_device rd;
23 | return rd() % MAX_PRIME32;
24 | }
25 |
26 | static vector<uint32_t> get_random_prime_index_list(int n)
27 | {
28 | random_device rd;
29 | unordered_set<uint32_t> st;
30 | while (st.size() < (size_t)n) {
31 | st.insert(rd() % MAX_PRIME32);
32 | }
33 | return vector<uint32_t>(st.begin(), st.end());
34 | }
35 | private:
36 | uint32_t prime32Num;
37 | };
38 |
39 | uint32_t big_prime3232[MAX_BIG_PRIME32] = {
40 | 20177, 20183, 20201, 20219, 20231, 20233, 20249, 20261, 20269, 20287,
41 | 20297, 20323, 20327, 20333, 20341, 20347, 20353, 20357, 20359, 20369,
42 | 20389, 20393, 20399, 20407, 20411, 20431, 20441, 20443, 20477, 20479,
43 | 20483, 20507, 20509, 20521, 20533, 20543, 20549, 20551, 20563, 20593,
44 | 20599, 20611, 20627, 20639, 20641, 20663, 20681, 20693, 20707, 20717
45 | };
46 | uint32_t prime32[MAX_PRIME32] = {
47 | 2, 3, 5, 7, 11, 13, 17, 19, 23, 29,
48 | 31, 37, 41, 43, 47, 53, 59, 61, 67, 71,
49 | 73, 79, 83, 89, 97, 101, 103, 107, 109, 113,
50 | 127, 131, 137, 139, 149, 151, 157, 163, 167, 173,
51 | 179, 181, 191, 193, 197, 199, 211, 223, 227, 229,
52 | 233, 239, 241, 251, 257, 263, 269, 271, 277, 281,
53 | 283, 293, 307, 311, 313, 317, 331, 337, 347, 349,
54 | 353, 359, 367, 373, 379, 383, 389, 397, 401, 409,
55 | 419, 421, 431, 433, 439, 443, 449, 457, 461, 463,
56 | 467, 479, 487, 491, 499, 503, 509, 521, 523, 541,
57 | 547, 557, 563, 569, 571, 577, 587, 593, 599, 601,
58 | 607, 613, 617, 619, 631, 641, 643, 647, 653, 659,
59 | 661, 673, 677, 683, 691, 701, 709, 719, 727, 733,
60 | 739, 743, 751, 757, 761, 769, 773, 787, 797, 809,
61 | 811, 821, 823, 827, 829, 839, 853, 857, 859, 863,
62 | 877, 881, 883, 887, 907, 911, 919, 929, 937, 941,
63 | 947, 953, 967, 971, 977, 983, 991, 997,
64 | 1009, 1013, 1019, 1021, 1031, 1033, 1039, 1049, 1051, 1061,
65 | 1063, 1069, 1087, 1091, 1093, 1097, 1103, 1109, 1117, 1123,
66 | 1129, 1151, 1153, 1163, 1171, 1181, 1187, 1193, 1201, 1213,
67 | 1217, 1223, 1229, 1231, 1237, 1249, 1259, 1277, 1279, 1283,
68 | 1289, 1291, 1297, 1301, 1303, 1307, 1319, 1321, 1327, 1361,
69 | 1367, 1373, 1381, 1399, 1409, 1423, 1427, 1429, 1433, 1439,
70 | 1447, 1451, 1453, 1459, 1471, 1481, 1483, 1487, 1489, 1493,
71 | 1499, 1511, 1523, 1531, 1543, 1549, 1553, 1559, 1567, 1571,
72 | 1579, 1583, 1597, 1601, 1607, 1609, 1613, 1619, 1621, 1627,
73 | 1637, 1657, 1663, 1667, 1669, 1693, 1697, 1699, 1709, 1721,
74 | 1723, 1733, 1741, 1747, 1753, 1759, 1777, 1783, 1787, 1789,
75 | 1801, 1811, 1823, 1831, 1847, 1861, 1867, 1871, 1873, 1877,
76 | 1879, 1889, 1901, 1907, 1913, 1931, 1933, 1949, 1951, 1973,
77 | 1979, 1987, 1993, 1997, 1999, 2003, 2011, 2017, 2027, 2029,
78 | 2039, 2053, 2063, 2069, 2081, 2083, 2087, 2089, 2099, 2111,
79 | 2113, 2129, 2131, 2137, 2141, 2143, 2153, 2161, 2179, 2203,
80 | 2207, 2213, 2221, 2237, 2239, 2243, 2251, 2267, 2269, 2273,
81 | 2281, 2287, 2293, 2297, 2309, 2311, 2333, 2339, 2341, 2347,
82 | 2351, 2357, 2371, 2377, 2381, 2383, 2389, 2393, 2399, 2411,
83 | 2417, 2423, 2437, 2441, 2447, 2459, 2467, 2473, 2477, 2503,
84 | 2521, 2531, 2539, 2543, 2549, 2551, 2557, 2579, 2591, 2593,
85 | 2609, 2617, 2621, 2633, 2647, 2657, 2659, 2663, 2671, 2677,
86 | 2683, 2687, 2689, 2693, 2699, 2707, 2711, 2713, 2719, 2729,
87 | 2731, 2741, 2749, 2753, 2767, 2777, 2789, 2791, 2797, 2801,
88 | 2803, 2819, 2833, 2837, 2843, 2851, 2857, 2861, 2879, 2887,
89 | 2897, 2903, 2909, 2917, 2927, 2939, 2953, 2957, 2963, 2969,
90 | 2971, 2999, 3001, 3011, 3019, 3023, 3037, 3041, 3049, 3061,
91 | 3067, 3079, 3083, 3089, 3109, 3119, 3121, 3137, 3163, 3167,
92 | 3169, 3181, 3187, 3191, 3203, 3209, 3217, 3221, 3229, 3251,
93 | 3253, 3257, 3259, 3271, 3299, 3301, 3307, 3313, 3319, 3323,
94 | 3329, 3331, 3343, 3347, 3359, 3361, 3371, 3373, 3389, 3391,
95 | 3407, 3413, 3433, 3449, 3457, 3461, 3463, 3467, 3469, 3491,
96 | 3499, 3511, 3517, 3527, 3529, 3533, 3539, 3541, 3547, 3557,
97 | 3559, 3571, 3581, 3583, 3593, 3607, 3613, 3617, 3623, 3631,
98 | 3637, 3643, 3659, 3671, 3673, 3677, 3691, 3697, 3701, 3709,
99 | 3719, 3727, 3733, 3739, 3761, 3767, 3769, 3779, 3793, 3797,
100 | 3803, 3821, 3823, 3833, 3847, 3851, 3853, 3863, 3877, 3881,
101 | 3889, 3907, 3911, 3917, 3919, 3923, 3929, 3931, 3943, 3947,
102 | 3967, 3989, 4001, 4003, 4007, 4013, 4019, 4021, 4027, 4049,
103 | 4051, 4057, 4073, 4079, 4091, 4093, 4099, 4111, 4127, 4129,
104 | 4133, 4139, 4153, 4157, 4159, 4177, 4201, 4211, 4217, 4219,
105 | 4229, 4231, 4241, 4243, 4253, 4259, 4261, 4271, 4273, 4283,
106 | 4289, 4297, 4327, 4337, 4339, 4349, 4357, 4363, 4373, 4391,
107 | 4397, 4409, 4421, 4423, 4441, 4447, 4451, 4457, 4463, 4481,
108 | 4483, 4493, 4507, 4513, 4517, 4519, 4523, 4547, 4549, 4561,
109 | 4567, 4583, 4591, 4597, 4603, 4621, 4637, 4639, 4643, 4649,
110 | 4651, 4657, 4663, 4673, 4679, 4691, 4703, 4721, 4723, 4729,
111 | 4733, 4751, 4759, 4783, 4787, 4789, 4793, 4799, 4801, 4813,
112 | 4817, 4831, 4861, 4871, 4877, 4889, 4903, 4909, 4919, 4931,
113 | 4933, 4937, 4943, 4951, 4957, 4967, 4969, 4973, 4987, 4993,
114 | 4999, 5003, 5009, 5011, 5021, 5023, 5039, 5051, 5059, 5077,
115 | 5081, 5087, 5099, 5101, 5107, 5113, 5119, 5147, 5153, 5167,
116 | 5171, 5179, 5189, 5197, 5209, 5227, 5231, 5233, 5237, 5261,
117 | 5273, 5279, 5281, 5297, 5303, 5309, 5323, 5333, 5347, 5351,
118 | 5381, 5387, 5393, 5399, 5407, 5413, 5417, 5419, 5431, 5437,
119 | 5441, 5443, 5449, 5471, 5477, 5479, 5483, 5501, 5503, 5507,
120 | 5519, 5521, 5527, 5531, 5557, 5563, 5569, 5573, 5581, 5591,
121 | 5623, 5639, 5641, 5647, 5651, 5653, 5657, 5659, 5669, 5683,
122 | 5689, 5693, 5701, 5711, 5717, 5737, 5741, 5743, 5749, 5779,
123 | 5783, 5791, 5801, 5807, 5813, 5821, 5827, 5839, 5843, 5849,
124 | 5851, 5857, 5861, 5867, 5869, 5879, 5881, 5897, 5903, 5923,
125 | 5927, 5939, 5953, 5981, 5987, 6007, 6011, 6029, 6037, 6043,
126 | 6047, 6053, 6067, 6073, 6079, 6089, 6091, 6101, 6113, 6121,
127 | 6131, 6133, 6143, 6151, 6163, 6173, 6197, 6199, 6203, 6211,
128 | 6217, 6221, 6229, 6247, 6257, 6263, 6269, 6271, 6277, 6287,
129 | 6299, 6301, 6311, 6317, 6323, 6329, 6337, 6343, 6353, 6359,
130 | 6361, 6367, 6373, 6379, 6389, 6397, 6421, 6427, 6449, 6451,
131 | 6469, 6473, 6481, 6491, 6521, 6529, 6547, 6551, 6553, 6563,
132 | 6569, 6571, 6577, 6581, 6599, 6607, 6619, 6637, 6653, 6659,
133 | 6661, 6673, 6679, 6689, 6691, 6701, 6703, 6709, 6719, 6733,
134 | 6737, 6761, 6763, 6779, 6781, 6791, 6793, 6803, 6823, 6827,
135 | 6829, 6833, 6841, 6857, 6863, 6869, 6871, 6883, 6899, 6907,
136 | 6911, 6917, 6947, 6949, 6959, 6961, 6967, 6971, 6977, 6983,
137 | 6991, 6997, 7001, 7013, 7019, 7027, 7039, 7043, 7057, 7069,
138 | 7079, 7103, 7109, 7121, 7127, 7129, 7151, 7159, 7177, 7187,
139 | 7193, 7207, 7211, 7213, 7219, 7229, 7237, 7243, 7247, 7253,
140 | 7283, 7297, 7307, 7309, 7321, 7331, 7333, 7349, 7351, 7369,
141 | 7393, 7411, 7417, 7433, 7451, 7457, 7459, 7477, 7481, 7487,
142 | 7489, 7499, 7507, 7517, 7523, 7529, 7537, 7541, 7547, 7549,
143 | 7559, 7561, 7573, 7577, 7583, 7589, 7591, 7603, 7607, 7621,
144 | 7639, 7643, 7649, 7669, 7673, 7681, 7687, 7691, 7699, 7703,
145 | 7717, 7723, 7727, 7741, 7753, 7757, 7759, 7789, 7793, 7817,
146 | 7823, 7829, 7841, 7853, 7867, 7873, 7877, 7879, 7883, 7901,
147 | 7907, 7919, 7927, 7933, 7937, 7949, 7951, 7963, 7993, 8009,
148 | 8011, 8017, 8039, 8053, 8059, 8069, 8081, 8087, 8089, 8093,
149 | 8101, 8111, 8117, 8123, 8147, 8161, 8167, 8171, 8179, 8191,
150 | 8209, 8219, 8221, 8231, 8233, 8237, 8243, 8263, 8269, 8273,
151 | 8287, 8291, 8293, 8297, 8311, 8317, 8329, 8353, 8363, 8369,
152 | 8377, 8387, 8389, 8419, 8423, 8429, 8431, 8443, 8447, 8461,
153 | 8467, 8501, 8513, 8521, 8527, 8537, 8539, 8543, 8563, 8573,
154 | 8581, 8597, 8599, 8609, 8623, 8627, 8629, 8641, 8647, 8663,
155 | 8669, 8677, 8681, 8689, 8693, 8699, 8707, 8713, 8719, 8731,
156 | 8737, 8741, 8747, 8753, 8761, 8779, 8783, 8803, 8807, 8819,
157 | 8821, 8831, 8837, 8839, 8849, 8861, 8863, 8867, 8887, 8893,
158 | 8923, 8929, 8933, 8941, 8951, 8963, 8969, 8971, 8999, 9001,
159 | 9007, 9011, 9013, 9029, 9041, 9043, 9049, 9059, 9067, 9091,
160 | 9103, 9109, 9127, 9133, 9137, 9151, 9157, 9161, 9173, 9181,
161 | 9187, 9199, 9203, 9209, 9221, 9227, 9239, 9241, 9257, 9277,
162 | 9281, 9283, 9293, 9311, 9319, 9323, 9337, 9341, 9343, 9349,
163 | 9371, 9377, 9391, 9397, 9403, 9413, 9419, 9421, 9431, 9433,
164 | 9437, 9439, 9461, 9463, 9467, 9473, 9479, 9491, 9497, 9511,
165 | 9521, 9533, 9539, 9547, 9551, 9587, 9601, 9613, 9619, 9623,
166 | 9629, 9631, 9643, 9649, 9661, 9677, 9679, 9689, 9697, 9719,
167 | 9721, 9733, 9739, 9743, 9749, 9767, 9769, 9781, 9787, 9791,
168 | 9803, 9811, 9817, 9829, 9833, 9839, 9851, 9857, 9859, 9871,
169 | 9883, 9887, 9901, 9907, 9923, 9929, 9931, 9941, 9949, 9967,
170 | 9973
171 | };
172 |
173 | #define mix(a,b,c) \
174 | { \
175 | a -= b; a -= c; a ^= (c>>13); \
176 | b -= c; b -= a; b ^= (a<<8); \
177 | c -= a; c -= b; c ^= (b>>13); \
178 | a -= b; a -= c; a ^= (c>>12); \
179 | b -= c; b -= a; b ^= (a<<16); \
180 | c -= a; c -= b; c ^= (b>>5); \
181 | a -= b; a -= c; a ^= (c>>3); \
182 | b -= c; b -= a; b ^= (a<<10); \
183 | c -= a; c -= b; c ^= (b>>15); \
184 | }
185 |
186 | BOBHash32::BOBHash32()
187 | {
188 | this->prime32Num = 0;
189 | }
190 |
191 | BOBHash32::BOBHash32(uint32_t prime32Num)
192 | {
193 | this->prime32Num = prime32Num;
194 | }
195 |
196 | void BOBHash32::initialize(uint32_t prime32Num)
197 | {
198 | this->prime32Num = prime32Num;
199 | }
200 |
201 | uint32_t BOBHash32::run(const char * str, uint32_t len)
202 | {
203 | //register ub4 a,b,c,len;
204 | uint32_t a,b,c;
205 | // uint32_t initval = 0;
206 | /* Set up the internal state */
207 | //len = length;
208 | a = b = 0x9e3779b9; /* the golden ratio; an arbitrary value */
209 | c = prime32[this->prime32Num]; /* seed: the prime selected by prime32Num */
210 |
211 | /*------------- handle most of the key; bytes are read as unsigned */
212 | while (len >= 12)
213 | {
214 | a += ((uint32_t)(uint8_t)str[0] +((uint32_t)(uint8_t)str[1]<<8) +((uint32_t)(uint8_t)str[2]<<16) +((uint32_t)(uint8_t)str[3]<<24));
215 | b += ((uint32_t)(uint8_t)str[4] +((uint32_t)(uint8_t)str[5]<<8) +((uint32_t)(uint8_t)str[6]<<16) +((uint32_t)(uint8_t)str[7]<<24));
216 | c += ((uint32_t)(uint8_t)str[8] +((uint32_t)(uint8_t)str[9]<<8) +((uint32_t)(uint8_t)str[10]<<16)+((uint32_t)(uint8_t)str[11]<<24));
217 | mix(a,b,c);
218 | str += 12; len -= 12;
219 | }
220 |
221 | /*------------------------- handle the remaining 0 to 11 bytes */
222 | c += len;
223 | switch(len) /* all the case statements fall through */
224 | {
225 | case 11: c+=((uint32_t)(uint8_t)str[10]<<24);
226 | // fall through
227 | case 10: c+=((uint32_t)(uint8_t)str[9]<<16);
228 | // fall through
229 | case 9 : c+=((uint32_t)(uint8_t)str[8]<<8);
230 | /* the first byte of c is reserved for the length */
231 | // fall through
232 | case 8 : b+=((uint32_t)(uint8_t)str[7]<<24);
233 | // fall through
234 | case 7 : b+=((uint32_t)(uint8_t)str[6]<<16);
235 | // fall through
236 | case 6 : b+=((uint32_t)(uint8_t)str[5]<<8);
237 | // fall through
238 | case 5 : b+=(uint8_t)str[4];
239 | // fall through
240 | case 4 : a+=((uint32_t)(uint8_t)str[3]<<24);
241 | // fall through
242 | case 3 : a+=((uint32_t)(uint8_t)str[2]<<16);
243 | // fall through
244 | case 2 : a+=((uint32_t)(uint8_t)str[1]<<8);
245 | // fall through
246 | case 1 : a+=(uint8_t)str[0];
247 | /* case 0: nothing left to add */
248 | }
249 | mix(a,b,c);
250 | /*-------------------------------------------- report the result */
251 | return c;
252 | }
253 |
254 | BOBHash32::~BOBHash32()
255 | {
256 |
257 | }
258 | #endif //_BOBHASH32_H
259 |
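The following is a minimal standalone sketch of the same routine written as a free function, not part of the repository. It assumes the seed is passed in directly instead of being looked up in `prime32`, and it reads the key through `const uint8_t *` so that the result does not depend on whether plain `char` is signed on the target platform. The names `mix32`, `load_le32` and `bob_hash32` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Same nine-step schedule as the mix(a,b,c) macro, as an inline function.
static inline void mix32(uint32_t &a, uint32_t &b, uint32_t &c)
{
    a -= b; a -= c; a ^= (c >> 13);
    b -= c; b -= a; b ^= (a << 8);
    c -= a; c -= b; c ^= (b >> 13);
    a -= b; a -= c; a ^= (c >> 12);
    b -= c; b -= a; b ^= (a << 16);
    c -= a; c -= b; c ^= (b >> 5);
    a -= b; a -= c; a ^= (c >> 3);
    b -= c; b -= a; b ^= (a << 10);
    c -= a; c -= b; c ^= (b >> 15);
}

// Hypothetical helper: assemble 4 bytes, least significant byte first.
static inline uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Hypothetical free function; `seed` plays the role of prime32[prime32Num].
uint32_t bob_hash32(const void *key, uint32_t len, uint32_t seed)
{
    assert(key != nullptr || len == 0);
    const uint8_t *k = static_cast<const uint8_t *>(key);
    uint32_t a = 0x9e3779b9u, b = 0x9e3779b9u, c = seed;

    // Consume the key in 12-byte blocks.
    while (len >= 12) {
        a += load_le32(k);
        b += load_le32(k + 4);
        c += load_le32(k + 8);
        mix32(a, b, c);
        k += 12;
        len -= 12;
    }

    // As in the member function, the length added here is the
    // remaining byte count (0 to 11), not the total key length.
    c += len;
    switch (len) {
    case 11: c += (uint32_t)k[10] << 24; // fall through
    case 10: c += (uint32_t)k[9] << 16;  // fall through
    case 9:  c += (uint32_t)k[8] << 8;   // fall through
    case 8:  b += (uint32_t)k[7] << 24;  // fall through
    case 7:  b += (uint32_t)k[6] << 16;  // fall through
    case 6:  b += (uint32_t)k[5] << 8;   // fall through
    case 5:  b += k[4];                  // fall through
    case 4:  a += (uint32_t)k[3] << 24;  // fall through
    case 3:  a += (uint32_t)k[2] << 16;  // fall through
    case 2:  a += (uint32_t)k[1] << 8;   // fall through
    case 1:  a += k[0];                  // fall through
    default: break;                      // nothing left to add
    }
    mix32(a, b, c);
    return c;
}
```

A caller would pass any 32-bit seed, for example one of the primes from the table, and reduce the result modulo the number of counters or bits: `bob_hash32(flow_id, flow_id_len, 5623u) % width`, where `flow_id`, `flow_id_len` and `width` are placeholders.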
--------------------------------------------------------------------------------
/papers/bloom filters/BloomFilter.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/bloom filters/BloomFilter.pdf
--------------------------------------------------------------------------------
/papers/bloom filters/Invertible Bloom Lookup Table.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/bloom filters/Invertible Bloom Lookup Table.pdf
--------------------------------------------------------------------------------
/papers/bloom filters/SummaryCache.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/bloom filters/SummaryCache.pdf
--------------------------------------------------------------------------------
/papers/bloom filters/bloom tree.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/bloom filters/bloom tree.pdf
--------------------------------------------------------------------------------
/papers/bloom filters/bloomier filter.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/bloom filters/bloomier filter.pdf
--------------------------------------------------------------------------------
/papers/bloom filters/coded bloom filter.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/bloom filters/coded bloom filter.pdf
--------------------------------------------------------------------------------
/papers/bloom filters/combinatorial bloom filter.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/bloom filters/combinatorial bloom filter.pdf
--------------------------------------------------------------------------------
/papers/bloom filters/cuckoo filter.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/bloom filters/cuckoo filter.pdf
--------------------------------------------------------------------------------
/papers/bloom filters/kBF.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/bloom filters/kBF.pdf
--------------------------------------------------------------------------------
/papers/bloom filters/shifting bloom filter.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/bloom filters/shifting bloom filter.pdf
--------------------------------------------------------------------------------
/papers/other references/FlowRadar.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/other references/FlowRadar.pdf
--------------------------------------------------------------------------------
/papers/other references/MRAC.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/other references/MRAC.pdf
--------------------------------------------------------------------------------
/papers/other references/SketchVisor.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/other references/SketchVisor.pdf
--------------------------------------------------------------------------------
/papers/other references/Space-Saving.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/other references/Space-Saving.pdf
--------------------------------------------------------------------------------
/papers/other references/univmon.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/other references/univmon.pdf
--------------------------------------------------------------------------------
/papers/sampling methods/NetFLow.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/sampling methods/NetFLow.pdf
--------------------------------------------------------------------------------
/papers/sampling methods/sFlowOverview.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/sampling methods/sFlowOverview.pdf
--------------------------------------------------------------------------------
/papers/sketches/CM sketch.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/sketches/CM sketch.pdf
--------------------------------------------------------------------------------
/papers/sketches/CSM sketch.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/sketches/CSM sketch.pdf
--------------------------------------------------------------------------------
/papers/sketches/CU sketch.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/sketches/CU sketch.pdf
--------------------------------------------------------------------------------
/papers/sketches/Count sketch.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/sketches/Count sketch.pdf
--------------------------------------------------------------------------------
/papers/sketches/CounterBraids.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/sketches/CounterBraids.pdf
--------------------------------------------------------------------------------
/papers/sketches/Pyramid sketch.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/sketches/Pyramid sketch.pdf
--------------------------------------------------------------------------------
/常见sketch算法.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/常见sketch算法.pptx
--------------------------------------------------------------------------------