├── README.md
├── codes
    ├── bloom filters
    │   ├── BloomFilter.h
    │   ├── CodedBloomFilter.h
    │   ├── CombinatorialBloomFilter.h
    │   ├── CountingBloomFilter.h
    │   ├── CuckooFilter.h
    │   └── SummaryCache.h
    └── common
        └── BOBHash32.h
├── papers
    ├── bloom filters
    │   ├── BloomFilter.pdf
    │   ├── Invertible Bloom Lookup Table.pdf
    │   ├── SummaryCache.pdf
    │   ├── bloom tree.pdf
    │   ├── bloomier filter.pdf
    │   ├── coded bloom filter.pdf
    │   ├── combinatorial bloom filter.pdf
    │   ├── cuckoo filter.pdf
    │   ├── kBF.pdf
    │   └── shifting bloom filter.pdf
    ├── other references
    │   ├── FlowRadar.pdf
    │   ├── MRAC.pdf
    │   ├── SketchVisor.pdf
    │   ├── Space-Saving.pdf
    │   └── univmon.pdf
    ├── sampling methods
    │   ├── NetFLow.pdf
    │   └── sFlowOverview.pdf
    └── sketches
        ├── CM sketch.pdf
        ├── CSM sketch.pdf
        ├── CU sketch.pdf
        ├── Count sketch.pdf
        ├── CounterBraids.pdf
        └── Pyramid sketch.pdf
└── 常见sketch算法.pptx


/README.md:
--------------------------------------------------------------------------------
# Common Sketch & Bloom Filter Algorithms

Per-flow measurement means measuring certain properties of a flow at a network switch or router. The most typical task is flow-size measurement: how many packets or bytes of the flow pass through the switch.

Notice: the formulas in this document do not render under GitHub's Markdown. For easier reading, download README.md and view it locally.

One small goal a day O_O!

[Sampling-based Method](#sampling-based-method)

[Bloom filters](#bloom-filters)

[Sketches](#sketches)

[Others](#others)




## Sampling-based Method

### 1. NetFlow (used in Cisco systems)

- flow ID: 5-tuple, TOS (Type of Service) byte, and the router interface that received the packet
- Storage: a table is kept in the DRAM of the router interface. Each entry corresponds to one flow and holds the flow ID, timestamps (start & end), packet count, byte count, TCP flags, source network, source AS (Autonomous System), destination network, destination AS, output interface, and next-hop router
- Two challenges
  - network traffic arrives too fast to process every packet
  - the collected data may be too large, exceeding the capacity of the collection server or of the link to the collection server
- Aggregation: reduce the exported data by aggregating less important data
  - Observation: operators usually care less about end-to-end traffic and more about traffic between networks/ASes
  - Approach: aggregate the flow records that match the same rule, for example the same source AS & destination AS, or the same source network
- Sampling: update the DRAM only once every x packets
  - accuracy analysis: assume all packets have the same size and the sampling probability is $\frac{1}{x}$; let $c$ be the NetFlow counter value and $s$ the true size of the flow.
    - Probability that a flow is missed entirely: $(1-\frac{1}{x})^s$
    - $E(c) = \frac{s}{x}$, so the estimated flow size is $cx$
    - $c$ follows a binomial distribution, so its standard deviation is $SD[c]=\sqrt{\frac{s}{x}(1-\frac{1}{x})}$, and the standard deviation of the estimate is $\sqrt{sx(1-\frac{1}{x})}$
- Reference: https://www.cisco.com/c/en/us/td/docs/ios/12_2/switch/configuration/guide/fswtch_c/xcfnfov.html

### 2. sFlow (published in RFC 3176)

- Sampling: same as NetFlow
- Reference: https://sflow.org/sFlowOverview.pdf

## Bloom filters

### 1. Bloom filter

- Purpose: membership query on a single set

- False-positive analysis: suppose the set contains n distinct elements, the Bloom filter has m bits, and k hash functions are used

  - The probability that a given bit is 0: $P_{(b=0)}=(1-\frac{1}{m})^{nk}$

  - The probability that an absent element is reported as present (the false-positive rate): $Fpr = (1-P_{(b=0)})^k$

  - When $m \gg 0, nk \gg 0$: $P_{(b=0)}=(1-\frac{1}{m})^{nk} \approx e^{-\frac{nk}{m}}$ and $Fpr \approx (1- e^{-\frac{nk}{m}})^k$

  - Differentiate $\ln(Fpr) = k \ln{(1-e^{-\frac{nk}{m}})}$: $\frac{\partial}{\partial k}\ln(Fpr) = \ln (1-e^{-\frac{nk}{m}}) + \frac{\frac{nk}{m}\cdot e^{-\frac{nk}{m}}}{1-e^{-\frac{nk}{m}}}$. Setting this to 0 gives $e^{-\frac{nk}{m}}=\frac{1}{2}$, so the optimal k is $k=\frac{m}{n}\ln2$

### 2. Counting Bloom filter

- Purpose: multiset membership query

- Approach: elements of the set s may repeat, so replace every bit with a counter

### 3. Summary Cache

- Purpose: multi-set membership query (elements within a set are distinct, the sets are disjoint, and a query asks which set an element belongs to)

- Approach: one Bloom filter per set

- Drawback: a query needs too many memory accesses

### 4. Coded Bloom Filter

- Purpose: multi-set membership query
- Approach: every set has an ID (index), so one Bloom filter is used for each bit of the ID
- Drawback: different IDs contain different numbers of 1 bits, and insertion becomes slower
- Optimization: double the length of the ID by appending its bitwise complement

### 5. Combinatorial Bloom Filter

- Purpose: multi-set membership query
- Approach: use a single large Bloom filter, but a different group of hash functions for each set
- Drawback: slow insertion

### 6. kBF

- Purpose: static insertion and query of key-value pairs
- Data structure:
  - an array of cells
  - each cell has a counter part and a coding part
- Approach:
  - value -> encoding:
    - Requirements on the encoding:
      - different values have different encodings
      - the XOR of any two encodings is unique
    - Suppose there are n distinct values; the first value is assigned encoding 1
    - For each later value, the encoding keeps increasing by 1
    - If the k-th value finds that the XOR of its encoding with one of the previous k-1 encodings has already appeared (a Bloom filter can be used for this check), its encoding is increased by 1 and the check is repeated
  - encoding insertion (k hashes):
    - increment the counter part
    - XOR the encoding into the coding part
  - query: after hashing to k cells, try to recover the encoding
    - If a cell with counter=1 exists, done
    - If there are several cells with counter=2, recover the two encodings with an O(n) method and pick the encoding they have in common
    - If there are only cells with counter>2:
      - first recover one encoding in O(n) time, then the other in o(n) time
      - the result is not guaranteed to be correct
  - encoding -> value:
    - The value -> encoding assignment hands out encodings in ascending order (so an encoding -> value list can be recorded along the way), which means binary search can find the value for an encoding
    - The paper mentions another, faster method, which is not covered here

### 7. Bloomtree

- Purpose: multi-set query (key, value = setID)
- Properties:
  - static: updates are not supported, although switching to a counting bloom tree would add them
- Structure:
  - every node is a Bloom filter
  - an internal node has several groups of hash functions, one group per child
  - a leaf node has a single group of hash functions and records whether a key exists
- Insert:
  - following the value, choose one group of hash functions at each level, hash the key, and record it in the node (a Bloom filter)
- Query:
  - at each level, hash the key with all hash functions of the node and check whether the key has appeared in this node
  - continue down to a leaf node; if the key has also appeared in the leaf, answer with the value of that leaf

### 8. Bloomier Filter

- Problem to solve:
  - a domain: $D=\{0, 1, \ldots, N-2\}$
  - a subset of the domain: $S=\{t_1, t_2, \ldots, t_n\}$
  - a range: $R=\{null, 1, 2, \ldots, |R|-1\}$
  - build a mapping $D\rightarrow R$ such that:
    - $f(t_i) = v_i$ for $1\leq i \leq n$
    - $f(x)=null$ for $x\in D \setminus S$
  - the mapping $t_i \rightarrow v_i$ depends on the task: for multi-set query, t is the element and v is the set id
  - values can be updated, but new keys cannot be inserted
- Definitions:
  - For an element t of the domain, hash it k times to get $(h_1, h_2,\ldots, h_k)$; these hash values are the neighborhood $N(t)$ of t
  - Let $\pi$ be a total order on S:
    - under $\pi$, $S_i >_{\pi} S_j$ if and only if $i>j$
  - $\tau$ respects $(S, \pi, N)$ if the following hold:
    - if $t\in S$, then $\tau (t) \in N(t)$
    - if $t_i >_{\pi} t_j$, then $\tau(t_i) \notin N(t_j)$
  - singleton: a position that is hit by exactly one element of S
  - TWEAK(t, S, HASH): among the singletons in the neighborhood $N(t)=(h_1, h_2, \ldots, h_k)$ of t, the smallest index (that is, the first hash function that maps to a singleton)
- Given S and k hash functions, how to find the desired $\pi$ and $\tau$:
  - First find the set E of elements that have a TWEAK, so they can satisfy the conditions. Put the elements of E at the end of the ordering (the elements of E are the largest under $\pi$)
  - Call the remaining elements H, and recursively find $\pi$ and $\tau$ for them
  - This may fail
- Building a Bloomier filter:
  - keep trying until the desired $\pi$ and $\tau$ are found
  - for every element $t$ of S, find position $\tau(t)$ and record the code of the corresponding hash function by XOR (in table 1)
  - at position $\tau(t)$ of table 2, record the value of t
- Query/update:
  - XOR together the values at all hashed positions and the mask, then decode the result to obtain l
  - check whether l is the l that belongs to the key
    - if so, return/modify the value in table 2
    - if not, report that the key does not exist

### 9. Cuckoo Filter

- Purpose: membership query on a single set
- Approach: the filter is a bucket array; each bucket holds several entries, and each entry stores a partial key
  - first compute the partial key $f$ from the key
  - compute two candidate positions: $pos_1 = hash(key)$ and $pos_2 = hash(key)~XOR ~hash(f)$
  - Insert: if a candidate bucket has a free entry, insert there; otherwise evict one entry, insert in its place, and move the evicted entry to its other candidate position
  - Query: check whether $pos_1$ or $pos_2$ holds an entry whose partial key matches
  - Delete: remove the entry equal to $f$ from the candidate positions

### 10. Shifting Bloom Filter

- Purpose: multi-set membership query
- insertion:
  - hash k times to locate k bits
  - for an element of the j-th set, set the bit at offset j after each of those k positions to 1
- query:
  - same as a regular Bloom filter

### 11. Invertible Bloom Filter

- Purpose:
- Data structure:
  - k hash functions
  - a table of m cells; each cell has:
    - count part: the number of elements mapped to this cell
    - keySum: the sum of the keys mapped here
    - valueSum: the sum of the values mapped here
- Insert, delete: map to k cells with the hash functions, then update the parts as defined above
- Lookup: first find the k hashed cells
  - if some cell has count 0, return null
  - if some cell has count 1
    - if keySum matches, return valueSum
    - otherwise return null
  - if every count is greater than 1, return "not found"
- Advantage: all key-value pairs stored in the IBLT can be listed out
  - find a cell with count 1, output the key and value in it, then subtract this key-value pair from every cell related to it
  - repeat until no cell has count 1
  - it may be impossible to output all key-value pairs

## Sketches

### 1. CM sketch

- Purpose: frequency query
- Insert: map to k counters and increment each of them
- Query: map to k counters and return the smallest counter value

### 2. CU sketch

- Purpose: frequency query
- Insert: map to k counters and increment only the smallest one (or the several smallest, if tied)
- Query: map to k counters and return the smallest counter value

### 3. Count sketch

- Purpose: frequency query
- Insert: map to k counters; each counter is increased or decreased by 1 with equal probability (the sign is fixed per key by a sign hash)
- Query: map to k counters and return the median of the sign-corrected counter values

### 4. CSM sketch

- Data structure:
  - an array of m counters
  - k hash functions
- Insert: similar to CM, but only one randomly chosen counter among the k is incremented
- Query: $\hat{s} = \sum_{i=0}^{k-1}C[h_i(f)] - k\frac{n}{m}$
  - the first term is the sum of all hashed counters, i.e. the true value plus noise
  - the second term is the expected noise (an approximation)

### 5. Pyramid sketch

- Data structure: several layers of counter arrays
  - each layer is half the size of the layer below it, forming a tree
  - the counters in the first layer are used entirely for the frequency
  - in the higher layers, two bits of each counter record whether the left and right children have overflowed, and the remaining bits record the frequency
- Insert:
  - the count, CM, CU, or CSM strategy can be used
  - on overflow, carry into the upper layer
- Delete:
  - the count, CM, CU, or CSM strategy can be used
  - if a borrow is needed, borrow from the lower layer (handled like a carry)
- Query:
  - follow the carry chain
- Benefit: most flows in real traffic have a small frequency, so the low layers can use small counters to save space



## Others

### 1. Space-Saving

- Purpose: finding heavy items (items with large frequency)
- Insertion in brief:
  - if the data structure is not full, insert directly
  - if it is full, replace the element with the smallest value and increment that value
- Data structure:
  - a hash table: key -> pointer to key node
  - a value node list, a doubly linked list whose node values are in ascending order
  - each value node links to a chain of key nodes
  - the key nodes are connected as a doubly linked list
  - each key node also stores a pointer to its value node
- Insertion (key) in detail:
  - if the key is found in the hash table, perform the increment operation:
    - remove the key node from its value node
    - if the key node list is empty after the removal, remove this value node from the value node list
    - look at the value node that followed the original value node
    - if the value of that node = old value + 1, insert the key into its key node list
    - otherwise create a new value node, set its value to old value + 1, and insert the key into this value node
  - if the key is not found:
    - replace the key of one key node in v1 (the value node with the minimum value) with the inserted key, and remove this key node from v1
    - perform the increment operation above

### 2. UnivMon

- Input: n distinct elements, m packets in total; $f_i$ is the frequency of the i-th element
- The goal is to answer the G-sum $\sum g(f_i)$, where g is monotonic and upper-bounded by $f_i^2$
- Data structure: $H=\log(n)$ count sketches, stacked in a way similar to the pyramid sketch
- (online) Insert:
  - hash H times (each hash value is in {0, 1})
  - find the largest j such that hash values 1~j are all 1, then insert into count sketches 1~j
- (offline) Query:
  - read the heavy hitter set from the count sketch of every layer (call it $Q_j, 1\leq j \leq \log(n)$; $w_j(i)$ is the i-th heavy hitter in $Q_j$)
  - apply g() to these heavy hitters
  - compute the G-sum of the top layer: $Y_{\log(n)}=\sum_{i} g(w_{\log(n)}(i))$
  - for the following layers:
    - $Y_j = 2Y_{j+1} + \sum_{i \in Q_j}(1 - 2 h_{j+1}(i))\,g(w_j(i))$
    - first term: twice the G-sum
    - second term:
      - if $Q_j(i)$ does not appear in layer j+1, then $h_{j+1}(i)=0$, so the term adds the counter value of $Q_j(i)$
      - if $Q_j(i)$ appears in layer j+1, then $h_{j+1}(i)=1$, so the term subtracts the counter value of $Q_j(i)$, which keeps the G-sum at exactly one copy
      - an implicit fact here: whatever appears in layer j+1 must appear in layer j, so the whole G-sum of layer j+1 is subtracted once
  - the advantage of this scheme is that it is easy to pipeline
- task:
  - heavy hitter: use the count sketch directly
  - entropy

### 3. FlowRadar

- Purpose: key-value flow measurement tasks
- The data structure is simple: a Bloom filter + a counting table (something similar to an IBLT)
- Each cell of the counting table has 3 fields:
  - FlowXOR: the XOR of the flow IDs mapped to this cell
  - FlowCount: the number of flows mapped to this cell
  - PacketCount: the number of packets mapped to this cell
- Inserting a key:
  - check the Bloom filter: has the key been seen before?
  - if it has, map to k positions in the counting table and execute PacketCount++
  - otherwise, first insert the key into the Bloom filter, then map to k positions in the counting table and
    - XOR the key into FlowXOR
    - FlowCount++
    - PacketCount++
- Decoding (the list-out operation of IBLT)
  - find a cell with FlowCount=1, read the key and value in it, then remove this information from the cells related to this key
  - repeat the previous step
- Network-Wide decoding:
  - FlowDecode across switches:
    - first decode inside each switch
    - then compare the decoded flow sets of two adjacent switches
    - S1 is the decoding result of one switch, S2 that of the other
    - check whether the elements of S1-S2 appear in the Bloom filter of switch 2
      - if an element appears, add it to the result of S2 and update the corresponding FlowXOR, FlowCount, and PacketCount
    - do the same for S2-S1
  - CounterDecode at a single switch:
    - the FlowDecode across switches above roughly tells which keys are in which counters, and the total packet count of the flows for these keys
    - with m counters and n flows, this gives a system of m equations in n variables
    - solve it with a tool such as MATLAB to obtain the packet count of each key

### 4. SketchVisor

- A system that integrates multiple sketch algorithms for various measurement tasks
- overview:
  - the data plane is split into a normal path and a fast path:
    - when data enters the data plane, first check whether a FIFO buffer is full
    - if it is not full, take the normal path, where the built-in algorithms process the data
    - if it is full, take the **fast path**
  - control plane:
    - collect data from all measurement points
    - repair the collected data (a recovery algorithm based on Compressive Sensing)
- Fast Path
  - key idea:
    - assume the flows entering the fast path are also long-tail distributed => focus on collecting information about large flows
    - small flows matter as well => do not record each small flow individually, record their aggregate statistics instead
  - in short, design a top-k algorithm (see the paper for the algorithm itself, which explains it clearly)
- recovery algorithm:
  - known information:
    - add all sketches together to get N (the counters at the same position are summed)
      - N is clearly inaccurate, because the small-flow information that went through the fast path has been dropped
    - all top-k flows form a hash table H
      - the error of H depends only on the chosen algorithm itself
    - the total number of bytes of the flows, V, is recorded
      - this can be fully accurate
  - completing N => a matrix completion problem
    - use compressive sensing to complete N
    - see the paper for details


--------------------------------------------------------------------------------
/codes/bloom filters/BloomFilter.h:
--------------------------------------------------------------------------------
#ifndef BLOOMFILTER_H
#define BLOOMFILTER_H
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include "../common/BOBHash32.h"

class BloomFilter
{
    int n;          // number of elements in the set
    int m;          // number of bits in the bit array
    int w;          // number of bytes in the bit array
    int k;          // number of hash functions
    uint8_t* array;
    BOBHash32** hash;

public:
    // k == 0 requests the optimal k = (m / n) * ln 2 (only possible when n > 0)
    BloomFilter(int m, int n, int k): n(n), m(m)
    {
        if(k == 0 && n > 0)
            k = int(m * 1.0 / n * log(2));
        this->k = k > 0 ? k : 1;    // store into the member, and use at least one hash

        w = m / 8 + (m % 8 == 0 ? 0 : 1);
        array = new uint8_t[w];
        memset(array, 0, w);

        hash = new BOBHash32*[this->k];
        for(int i = 0; i < this->k; ++i)
            hash[i] = new BOBHash32(100 + i);
    }
    ~BloomFilter(){
        delete[] array;
        for(int i = 0; i < k; ++i)
            delete hash[i];
        delete[] hash;
    }

    void insert(char* key, uint32_t keylen)
    {
        for(int i = 0; i < k; ++i){
            int pos = hash[i]->run(key, keylen) % m;
            set_bit(pos);
        }
    }

    bool query(char* key, uint32_t keylen)
    {
        for(int i = 0; i < k; ++i){
            int pos = hash[i]->run(key, keylen) % m;
            if(query_bit(pos) == 0)
                return false;
        }
        return true;
    }

private:
    int query_bit(int pos){
        int base = pos / 8;
        int offset = pos % 8;
        uint8_t mask = (uint8_t)(1 << offset);
        uint8_t res = array[base] & mask;
        return res ? 1 : 0;
    }
    void set_bit(int pos){
        int base = pos / 8;
        int offset = pos % 8;
        uint8_t mask = (uint8_t)(1 << offset);
        array[base] |= mask;
    }
};
#endif // BLOOMFILTER_H


--------------------------------------------------------------------------------
/codes/bloom filters/CodedBloomFilter.h:
--------------------------------------------------------------------------------
#include "BloomFilter.h"


class CodedBloomFilter
{
    int s;      // number of sets
    int n;      // number of bloom filters
    int m;      // number of bits used in each bloom filter
    int k;      // number of hash functions used in each bloom filter
    BloomFilter** bfs;

public:
    CodedBloomFilter(int s, int m, int k):s(s), m(m), k(k)
    {
        // n = number of bits needed to write the largest set ID
        n = 0;
        int tmpS = s;
        while(tmpS != 0){
            n++;
            tmpS >>= 1;
        }

        bfs = new BloomFilter*[n];
        for(int i = 0; i < n; ++i)
            bfs[i] = new BloomFilter(m, 0, k);
    }
    ~CodedBloomFilter()
    {
        for(int i = 0; i < n; ++i)
            delete bfs[i];
        delete[] bfs;
    }

    // setIdx must be in [1, s]: an ID of 0 sets no bit and cannot be stored
    void insert(char* key, uint32_t keylen, int setIdx)
    {
        for(int i = 0; i < n; ++i){
            if((setIdx & 1) == 1)
                bfs[i]->insert(key, keylen);
            setIdx >>= 1;
        }
    }

    /*
     * return -1: not found in any set
     * return -2: found in multiple sets
     * return id (1<=id<=s): found in set id
     */
    int query(char* key, uint32_t keylen)
    {
        int setId = 0;
        for(int i = 0; i < n; ++i)
            if(bfs[i]->query(key, keylen))
                setId += (1 << i);
        if(setId == 0)
            return -1;
        if(setId > s)
            return -2;
        return setId;
    }
};


--------------------------------------------------------------------------------
/codes/bloom filters/CombinatorialBloomFilter.h:
--------------------------------------------------------------------------------
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include "../common/BOBHash32.h"

class CombinatorialBloomFilter
{
    int m;          // number of bits in the bit array
    int s;          // number of sets
    int w;          // number of bytes in the bit array
    int k;          // number of hash functions per set
    uint8_t* array;
    BOBHash32** hash;   // s groups of k hash functions; group i serves set i

public:
    CombinatorialBloomFilter(int m, int k, int s):m(m), s(s), k(k)
    {
        w = m / 8 + (m % 8 == 0 ? 0 : 1);
        array = new uint8_t[w];
        memset(array, 0, w);

        hash = new BOBHash32*[s*k];
        for(int i = 0; i < s*k; ++i)
            hash[i] = new BOBHash32(100 + i);
    }
    ~CombinatorialBloomFilter(){
        delete[] array;
        for(int i = 0; i < s*k; ++i)
            delete hash[i];
        delete[] hash;
    }

    void insert(char* key, uint32_t keylen, int setIdx)
    {
        for(int i = 0; i < k; ++i){
            int pos = hash[setIdx*k + i]->run(key, keylen) % m;
            set_bit(pos);
        }
    }

    /*
     * return -1: not found in any set
     * return -2: found in multiple sets
     * return k (0<=k<s): found in set k
     */
    int query(char* key, uint32_t keylen)
    {
        int setIdx = -1;
        for(int i = 0; i < s; ++i){
            bool flag = true;
            for(int j = 0; j < k; ++j){
                int pos = hash[i*k + j]->run(key, keylen) % m;
                if(query_bit(pos) == 0){
                    flag = false;
                    break;
                }
            }
            if(flag){
                if(setIdx != -1)
                    return -2;
                setIdx = i;
            }
        }
        return setIdx;
    }

private:
    int query_bit(int pos){
        int base = pos / 8;
        int offset = pos % 8;
        uint8_t mask = (uint8_t)(1 << offset);
        uint8_t res = array[base] & mask;
        return res ? 1 : 0;
    }
    void set_bit(int pos){
        int base = pos / 8;
        int offset = pos % 8;
        uint8_t mask = (uint8_t)(1 << offset);
        array[base] |= mask;
    }
};


--------------------------------------------------------------------------------
/codes/bloom filters/CountingBloomFilter.h:
--------------------------------------------------------------------------------
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include "../common/BOBHash32.h"

// Named CountingBloomFilter so it does not clash with the class in BloomFilter.h
class CountingBloomFilter
{
    int n;          // number of elements in the set
    int m;          // number of counters in the array
    int k;          // number of hash functions
    uint32_t* array;
    BOBHash32** hash;

public:
    CountingBloomFilter(int m, int n, int k): n(n), m(m), k(k > 0 ? k : 1)
    {
        array = new uint32_t[m];
        memset(array, 0, m * sizeof(uint32_t));

        hash = new BOBHash32*[this->k];
        for(int i = 0; i < this->k; ++i)
            hash[i] = new BOBHash32(100 + i);
    }
    // delegate to the 3-argument constructor with the optimal k = (m / n) * ln 2
    CountingBloomFilter(int m, int n)
        : CountingBloomFilter(m, n, n > 0 ? int(m * 1.0 / n * log(2)) : 1)
    {
    }
    ~CountingBloomFilter(){
        delete[] array;
        for(int i = 0; i < k; ++i)
            delete hash[i];
        delete[] hash;
    }

    void insert(char* key, uint32_t keylen)
    {
        for(int i = 0; i < k; ++i){
            int pos = hash[i]->run(key, keylen) % m;
            array[pos] += 1;
        }
    }

    // returns the smallest of the k counters, an upper bound on the key's multiplicity
    uint32_t query(char* key, uint32_t keylen)
    {
        uint32_t res = UINT32_MAX;
        for(int i = 0; i < k; ++i){
            int pos = hash[i]->run(key, keylen) % m;
            res = res > array[pos] ? array[pos] : res;
        }
        return res;
    }
};


--------------------------------------------------------------------------------
/codes/bloom filters/CuckooFilter.h:
--------------------------------------------------------------------------------
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include "../common/BOBHash32.h"

class CuckooFilter
{
    const static int MAX_RELOCATE_TIMES = 100;
    int m;              // number of buckets in the array
                        // m must be 2^k, otherwise the XOR trick below is not reversible
    int w;              // number of entries in the bucket
    uint32_t** array;   // fingerprints; 0 marks an empty entry
    BOBHash32* hash[2];

    int kick_index, kick_bucket;

public:
    CuckooFilter(int m, int w): m(m), w(w)
    {
        array = new uint32_t*[m];
        for(int i = 0; i < m; ++i){
            array[i] = new uint32_t[w];
            memset(array[i], 0, sizeof(uint32_t) * w);
        }

        hash[0] = new BOBHash32(101);
        hash[1] = new BOBHash32(102);

        kick_bucket = 0;
        kick_index = 0;
    }
    ~CuckooFilter(){
        for(int i = 0; i < m; ++i)
            delete[] array[i];
        delete[] array;
        delete hash[0];
        delete hash[1];
    }

    void insert(char* key, uint32_t keylen)
    {
        uint32_t fp = get_fp(key, keylen);
        uint32_t pos1 = hash[0]->run(key, keylen) % m;
        uint32_t pos2 = alt_pos(pos1, fp);
        if(pos1 > pos2){
            uint32_t tmp_pos = pos2;
            pos2 = pos1;
            pos1 = tmp_pos;
        }

        // already present: nothing to do
        for(int j = 0; j < w; ++j)
            if(array[pos1][j] == fp)
                return;
        for(int j = 0; j < w; ++j)
            if(array[pos2][j] == fp)
                return;

        // use a free entry if either candidate bucket has one
        for(int j = 0; j < w; ++j)
            if(array[pos1][j] == 0){
                array[pos1][j] = fp;
                return;
            }
        for(int j = 0; j < w; ++j)
            if(array[pos2][j] == 0){
                array[pos2][j] = fp;
                return;
            }

        // both buckets are full: evict one entry and relocate it
        uint32_t bucket_no = kick_bucket == 0 ? pos1 : pos2;
        uint32_t kick_fp = array[bucket_no][kick_index];
        array[bucket_no][kick_index] = fp;
        kick_bucket = (kick_bucket + 1) % 2;
        kick_index = (kick_index + 1) % w;
        relocate(kick_fp, bucket_no);
    }

    bool query(char* key, uint32_t keylen)
    {
        uint32_t fp = get_fp(key, keylen);
        uint32_t pos1 = hash[0]->run(key, keylen) % m;
        uint32_t pos2 = alt_pos(pos1, fp);

        for(int j = 0; j < w; ++j)
            if(array[pos1][j] == fp)
                return true;
        for(int j = 0; j < w; ++j)
            if(array[pos2][j] == fp)
                return true;
        return false;
    }

    // "delete" is a C++ keyword, so the operation is called remove
    void remove(char* key, uint32_t keylen)
    {
        uint32_t fp = get_fp(key, keylen);
        uint32_t pos1 = hash[0]->run(key, keylen) % m;
        uint32_t pos2 = alt_pos(pos1, fp);

        for(int j = 0; j < w; ++j)
            if(array[pos1][j] == fp){
                array[pos1][j] = 0;
                return;
            }
        for(int j = 0; j < w; ++j)
            if(array[pos2][j] == fp){
                array[pos2][j] = 0;
                return;
            }
    }

private:
    /* calculate fingerprint; 0 is reserved for "empty", so map it to 1 */
    uint32_t get_fp(char* key, uint32_t keylen)
    {
        uint32_t fp = hash[1]->run(key, keylen);
        return fp == 0 ? 1 : fp;
    }

    /* the other candidate bucket of a fingerprint currently in bucket pos */
    uint32_t alt_pos(uint32_t pos, uint32_t fp)
    {
        return (pos ^ hash[0]->run((char*)&fp, sizeof(uint32_t))) % m;
    }

    /* relocate finger print */
    void relocate(uint32_t fp, uint32_t bucket_no, int times = 0)
    {
        if(times >= MAX_RELOCATE_TIMES){
            // the fingerprint being relocated is dropped at this point
            printf("relocate failed!\n");
            return;
        }

        uint32_t pos = alt_pos(bucket_no, fp);
        for(int j = 0; j < w; ++j)
            if(array[pos][j] == 0){
                array[pos][j] = fp;
                return;
            }

        uint32_t kick_fp = array[pos][kick_index];
        array[pos][kick_index] = fp;
        kick_index = (kick_index + 1) % w;
        relocate(kick_fp, pos, times + 1);
    }
};

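The partial-key cuckoo hashing used by CuckooFilter.h can also be sketched in a standalone form that does not depend on BOBHash32.h. This is a minimal illustration under stated assumptions, not the repository's implementation: `TinyCuckooFilter`, `mix64`, the 16-bit fingerprint, the 4-entry buckets, and the 64-bit integer keys are all hypothetical choices made for the example. The point it shows is why the bucket count has to be a power of two: only then is the XOR step its own inverse, so an evicted fingerprint can find its other bucket without the original key.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical 64-bit mixer (splitmix64 finalizer), standing in for BOBHash32.
static uint64_t mix64(uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

class TinyCuckooFilter {
    static constexpr int kSlots = 4;        // entries per bucket
    static constexpr int kMaxKicks = 100;   // relocation budget
    uint32_t mask_;                         // bucket count - 1 (bucket count is 2^log2_buckets)
    std::vector<uint16_t> slots_;           // flattened buckets; 0 marks an empty entry
    uint32_t kick_ = 0;                     // round-robin victim selector

    static uint16_t fingerprint(uint64_t key) {
        uint16_t fp = static_cast<uint16_t>(mix64(key ^ 0xabcdefULL) >> 48);
        return fp == 0 ? 1 : fp;            // reserve 0 for "empty"
    }
    uint32_t bucket(uint64_t key) const {
        return static_cast<uint32_t>(mix64(key)) & mask_;
    }
    bool place(uint32_t b, uint16_t fp) {
        for (int j = 0; j < kSlots; ++j)
            if (slots_[b * kSlots + j] == 0) { slots_[b * kSlots + j] = fp; return true; }
        return false;
    }
    int find(uint32_t b, uint16_t fp) const {
        for (int j = 0; j < kSlots; ++j)
            if (slots_[b * kSlots + j] == fp) return j;
        return -1;
    }

public:
    explicit TinyCuckooFilter(int log2_buckets)
        : mask_((1u << log2_buckets) - 1),
          slots_((std::size_t(1) << log2_buckets) * kSlots) {}

    // XOR with a hash of the fingerprint; applying it twice gives back b.
    uint32_t alt_bucket(uint32_t b, uint16_t fp) const {
        return (b ^ static_cast<uint32_t>(mix64(fp))) & mask_;
    }

    bool insert(uint64_t key) {
        uint16_t fp = fingerprint(key);
        uint32_t b = bucket(key);
        if (place(b, fp) || place(alt_bucket(b, fp), fp)) return true;
        for (int n = 0; n < kMaxKicks; ++n) {
            std::size_t victim = b * kSlots + (kick_++ % kSlots);
            std::swap(fp, slots_[victim]);  // store fp, pick up the evicted fingerprint
            b = alt_bucket(b, fp);          // the evicted fingerprint's other bucket
            if (place(b, fp)) return true;
        }
        return false;                       // table too full; the last evicted fingerprint is lost
    }

    bool contains(uint64_t key) const {
        uint16_t fp = fingerprint(key);
        uint32_t b = bucket(key);
        return find(b, fp) >= 0 || find(alt_bucket(b, fp), fp) >= 0;
    }

    bool remove(uint64_t key) {
        uint16_t fp = fingerprint(key);
        uint32_t b = bucket(key);
        uint32_t cands[2] = {b, alt_bucket(b, fp)};
        for (uint32_t c : cands) {
            int j = find(c, fp);
            if (j >= 0) { slots_[c * kSlots + j] = 0; return true; }
        }
        return false;
    }
};
```

Unlike the header above, this sketch does not skip fingerprints that are already stored, so inserting the same key twice occupies two entries; that keeps `remove` symmetric with `insert`, at the cost of space.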
--------------------------------------------------------------------------------
/codes/bloom filters/SummaryCache.h:
--------------------------------------------------------------------------------
#include "BloomFilter.h"

class SummaryCache
{
    int s;      // number of sets
    int m;      // number of bits used in each bloom filter
    int k;      // number of hash functions used in each bloom filter
    BloomFilter** bfs;

public:
    SummaryCache(int s, int m, int k):s(s), m(m), k(k)
    {
        bfs = new BloomFilter*[s];
        for(int i = 0; i < s; ++i)
            bfs[i] = new BloomFilter(m, 0, k);
    }
    ~SummaryCache()
    {
        for(int i = 0; i < s; ++i)
            delete bfs[i];
        delete[] bfs;
    }

    void insert(char* key, uint32_t keylen, int setIdx)
    {
        bfs[setIdx]->insert(key, keylen);
    }

    /*
     * return -1: not found in any set
     * return -2: found in multiple sets
     * return k (0<=k<s): found in set k
     */
    int query(char* key, uint32_t keylen)
    {
        int res = -1;
        for(int i = 0; i < s; ++i)
            if(bfs[i]->query(key, keylen)){
                if(res != -1)
                    return -2;
                res = i;
            }
        return res;
    }
};


--------------------------------------------------------------------------------
/codes/common/BOBHash32.h:
--------------------------------------------------------------------------------
#ifndef _BOBHASH32_H
#define _BOBHASH32_H
#include <cstdint>
#include <random>
#include <unordered_set>
#include <vector>
using namespace std;

#define MAX_PRIME32 1229
#define MAX_BIG_PRIME32 50

class BOBHash32
{
public:
    BOBHash32();
    ~BOBHash32();
    BOBHash32(uint32_t prime32Num);
    void initialize(uint32_t prime32Num);
    uint32_t run(const char * str, uint32_t len);   // produce a hash number
    static uint32_t get_random_prime_index()
    {
        random_device rd;
        return rd() % MAX_PRIME32;
    }

    // n distinct random indices into prime32 (n must not exceed MAX_PRIME32)
    static vector<uint32_t> get_random_prime_index_list(int n)
    {
        random_device rd;
        unordered_set<uint32_t> st;
        while (st.size() < (size_t)n) {
            st.insert(rd() % MAX_PRIME32);
        }
        return
vector(st.begin(), st.end()); 34 | } 35 | private: 36 | uint32_t prime32Num; 37 | }; 38 | 39 | uint32_t big_prime3232[MAX_BIG_PRIME32] = { 40 | 20177, 20183, 20201, 20219, 20231, 20233, 20249, 20261, 20269, 20287, 41 | 20297, 20323, 20327, 20333, 20341, 20347, 20353, 20357, 20359, 20369, 42 | 20389, 20393, 20399, 20407, 20411, 20431, 20441, 20443, 20477, 20479, 43 | 20483, 20507, 20509, 20521, 20533, 20543, 20549, 20551, 20563, 20593, 44 | 20599, 20611, 20627, 20639, 20641, 20663, 20681, 20693, 20707, 20717 45 | }; 46 | uint32_t prime32[MAX_PRIME32] = { 47 | 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 48 | 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 49 | 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 50 | 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 51 | 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 52 | 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 53 | 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 54 | 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 55 | 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 56 | 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 57 | 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 58 | 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 59 | 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 60 | 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 61 | 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 62 | 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 63 | 947, 953, 967, 971, 977, 983, 991, 997, 64 | 1009, 1013, 1019, 1021, 1031, 1033, 1039, 1049, 1051, 1061, 65 | 1063, 1069, 1087, 1091, 1093, 1097, 1103, 1109, 1117, 1123, 66 | 1129, 1151, 1153, 1163, 1171, 1181, 1187, 1193, 1201, 1213, 67 | 1217, 1223, 1229, 1231, 1237, 1249, 1259, 1277, 1279, 1283, 68 | 1289, 1291, 1297, 1301, 1303, 1307, 1319, 1321, 1327, 1361, 69 | 1367, 1373, 1381, 1399, 1409, 1423, 1427, 1429, 1433, 1439, 70 | 1447, 1451, 1453, 1459, 1471, 1481, 1483, 1487, 1489, 1493, 71 | 1499, 1511, 1523, 1531, 1543, 1549, 1553, 1559, 1567, 1571, 72 | 1579, 1583, 1597, 
1601, 1607, 1609, 1613, 1619, 1621, 1627, 73 | 1637, 1657, 1663, 1667, 1669, 1693, 1697, 1699, 1709, 1721, 74 | 1723, 1733, 1741, 1747, 1753, 1759, 1777, 1783, 1787, 1789, 75 | 1801, 1811, 1823, 1831, 1847, 1861, 1867, 1871, 1873, 1877, 76 | 1879, 1889, 1901, 1907, 1913, 1931, 1933, 1949, 1951, 1973, 77 | 1979, 1987, 1993, 1997, 1999, 2003, 2011, 2017, 2027, 2029, 78 | 2039, 2053, 2063, 2069, 2081, 2083, 2087, 2089, 2099, 2111, 79 | 2113, 2129, 2131, 2137, 2141, 2143, 2153, 2161, 2179, 2203, 80 | 2207, 2213, 2221, 2237, 2239, 2243, 2251, 2267, 2269, 2273, 81 | 2281, 2287, 2293, 2297, 2309, 2311, 2333, 2339, 2341, 2347, 82 | 2351, 2357, 2371, 2377, 2381, 2383, 2389, 2393, 2399, 2411, 83 | 2417, 2423, 2437, 2441, 2447, 2459, 2467, 2473, 2477, 2503, 84 | 2521, 2531, 2539, 2543, 2549, 2551, 2557, 2579, 2591, 2593, 85 | 2609, 2617, 2621, 2633, 2647, 2657, 2659, 2663, 2671, 2677, 86 | 2683, 2687, 2689, 2693, 2699, 2707, 2711, 2713, 2719, 2729, 87 | 2731, 2741, 2749, 2753, 2767, 2777, 2789, 2791, 2797, 2801, 88 | 2803, 2819, 2833, 2837, 2843, 2851, 2857, 2861, 2879, 2887, 89 | 2897, 2903, 2909, 2917, 2927, 2939, 2953, 2957, 2963, 2969, 90 | 2971, 2999, 3001, 3011, 3019, 3023, 3037, 3041, 3049, 3061, 91 | 3067, 3079, 3083, 3089, 3109, 3119, 3121, 3137, 3163, 3167, 92 | 3169, 3181, 3187, 3191, 3203, 3209, 3217, 3221, 3229, 3251, 93 | 3253, 3257, 3259, 3271, 3299, 3301, 3307, 3313, 3319, 3323, 94 | 3329, 3331, 3343, 3347, 3359, 3361, 3371, 3373, 3389, 3391, 95 | 3407, 3413, 3433, 3449, 3457, 3461, 3463, 3467, 3469, 3491, 96 | 3499, 3511, 3517, 3527, 3529, 3533, 3539, 3541, 3547, 3557, 97 | 3559, 3571, 3581, 3583, 3593, 3607, 3613, 3617, 3623, 3631, 98 | 3637, 3643, 3659, 3671, 3673, 3677, 3691, 3697, 3701, 3709, 99 | 3719, 3727, 3733, 3739, 3761, 3767, 3769, 3779, 3793, 3797, 100 | 3803, 3821, 3823, 3833, 3847, 3851, 3853, 3863, 3877, 3881, 101 | 3889, 3907, 3911, 3917, 3919, 3923, 3929, 3931, 3943, 3947, 102 | 3967, 3989, 4001, 4003, 4007, 4013, 4019, 4021, 4027, 4049, 103 
| 4051, 4057, 4073, 4079, 4091, 4093, 4099, 4111, 4127, 4129, 104 | 4133, 4139, 4153, 4157, 4159, 4177, 4201, 4211, 4217, 4219, 105 | 4229, 4231, 4241, 4243, 4253, 4259, 4261, 4271, 4273, 4283, 106 | 4289, 4297, 4327, 4337, 4339, 4349, 4357, 4363, 4373, 4391, 107 | 4397, 4409, 4421, 4423, 4441, 4447, 4451, 4457, 4463, 4481, 108 | 4483, 4493, 4507, 4513, 4517, 4519, 4523, 4547, 4549, 4561, 109 | 4567, 4583, 4591, 4597, 4603, 4621, 4637, 4639, 4643, 4649, 110 | 4651, 4657, 4663, 4673, 4679, 4691, 4703, 4721, 4723, 4729, 111 | 4733, 4751, 4759, 4783, 4787, 4789, 4793, 4799, 4801, 4813, 112 | 4817, 4831, 4861, 4871, 4877, 4889, 4903, 4909, 4919, 4931, 113 | 4933, 4937, 4943, 4951, 4957, 4967, 4969, 4973, 4987, 4993, 114 | 4999, 5003, 5009, 5011, 5021, 5023, 5039, 5051, 5059, 5077, 115 | 5081, 5087, 5099, 5101, 5107, 5113, 5119, 5147, 5153, 5167, 116 | 5171, 5179, 5189, 5197, 5209, 5227, 5231, 5233, 5237, 5261, 117 | 5273, 5279, 5281, 5297, 5303, 5309, 5323, 5333, 5347, 5351, 118 | 5381, 5387, 5393, 5399, 5407, 5413, 5417, 5419, 5431, 5437, 119 | 5441, 5443, 5449, 5471, 5477, 5479, 5483, 5501, 5503, 5507, 120 | 5519, 5521, 5527, 5531, 5557, 5563, 5569, 5573, 5581, 5591, 121 | 5623, 5639, 5641, 5647, 5651, 5653, 5657, 5659, 5669, 5683, 122 | 5689, 5693, 5701, 5711, 5717, 5737, 5741, 5743, 5749, 5779, 123 | 5783, 5791, 5801, 5807, 5813, 5821, 5827, 5839, 5843, 5849, 124 | 5851, 5857, 5861, 5867, 5869, 5879, 5881, 5897, 5903, 5923, 125 | 5927, 5939, 5953, 5981, 5987, 6007, 6011, 6029, 6037, 6043, 126 | 6047, 6053, 6067, 6073, 6079, 6089, 6091, 6101, 6113, 6121, 127 | 6131, 6133, 6143, 6151, 6163, 6173, 6197, 6199, 6203, 6211, 128 | 6217, 6221, 6229, 6247, 6257, 6263, 6269, 6271, 6277, 6287, 129 | 6299, 6301, 6311, 6317, 6323, 6329, 6337, 6343, 6353, 6359, 130 | 6361, 6367, 6373, 6379, 6389, 6397, 6421, 6427, 6449, 6451, 131 | 6469, 6473, 6481, 6491, 6521, 6529, 6547, 6551, 6553, 6563, 132 | 6569, 6571, 6577, 6581, 6599, 6607, 6619, 6637, 6653, 6659, 133 | 6661, 6673, 6679, 
6689, 6691, 6701, 6703, 6709, 6719, 6733, 134 | 6737, 6761, 6763, 6779, 6781, 6791, 6793, 6803, 6823, 6827, 135 | 6829, 6833, 6841, 6857, 6863, 6869, 6871, 6883, 6899, 6907, 136 | 6911, 6917, 6947, 6949, 6959, 6961, 6967, 6971, 6977, 6983, 137 | 6991, 6997, 7001, 7013, 7019, 7027, 7039, 7043, 7057, 7069, 138 | 7079, 7103, 7109, 7121, 7127, 7129, 7151, 7159, 7177, 7187, 139 | 7193, 7207, 7211, 7213, 7219, 7229, 7237, 7243, 7247, 7253, 140 | 7283, 7297, 7307, 7309, 7321, 7331, 7333, 7349, 7351, 7369, 141 | 7393, 7411, 7417, 7433, 7451, 7457, 7459, 7477, 7481, 7487, 142 | 7489, 7499, 7507, 7517, 7523, 7529, 7537, 7541, 7547, 7549, 143 | 7559, 7561, 7573, 7577, 7583, 7589, 7591, 7603, 7607, 7621, 144 | 7639, 7643, 7649, 7669, 7673, 7681, 7687, 7691, 7699, 7703, 145 | 7717, 7723, 7727, 7741, 7753, 7757, 7759, 7789, 7793, 7817, 146 | 7823, 7829, 7841, 7853, 7867, 7873, 7877, 7879, 7883, 7901, 147 | 7907, 7919, 7927, 7933, 7937, 7949, 7951, 7963, 7993, 8009, 148 | 8011, 8017, 8039, 8053, 8059, 8069, 8081, 8087, 8089, 8093, 149 | 8101, 8111, 8117, 8123, 8147, 8161, 8167, 8171, 8179, 8191, 150 | 8209, 8219, 8221, 8231, 8233, 8237, 8243, 8263, 8269, 8273, 151 | 8287, 8291, 8293, 8297, 8311, 8317, 8329, 8353, 8363, 8369, 152 | 8377, 8387, 8389, 8419, 8423, 8429, 8431, 8443, 8447, 8461, 153 | 8467, 8501, 8513, 8521, 8527, 8537, 8539, 8543, 8563, 8573, 154 | 8581, 8597, 8599, 8609, 8623, 8627, 8629, 8641, 8647, 8663, 155 | 8669, 8677, 8681, 8689, 8693, 8699, 8707, 8713, 8719, 8731, 156 | 8737, 8741, 8747, 8753, 8761, 8779, 8783, 8803, 8807, 8819, 157 | 8821, 8831, 8837, 8839, 8849, 8861, 8863, 8867, 8887, 8893, 158 | 8923, 8929, 8933, 8941, 8951, 8963, 8969, 8971, 8999, 9001, 159 | 9007, 9011, 9013, 9029, 9041, 9043, 9049, 9059, 9067, 9091, 160 | 9103, 9109, 9127, 9133, 9137, 9151, 9157, 9161, 9173, 9181, 161 | 9187, 9199, 9203, 9209, 9221, 9227, 9239, 9241, 9257, 9277, 162 | 9281, 9283, 9293, 9311, 9319, 9323, 9337, 9341, 9343, 9349, 163 | 9371, 9377, 9391, 9397, 9403, 9413, 
9419, 9421, 9431, 9433, 164 | 9437, 9439, 9461, 9463, 9467, 9473, 9479, 9491, 9497, 9511, 165 | 9521, 9533, 9539, 9547, 9551, 9587, 9601, 9613, 9619, 9623, 166 | 9629, 9631, 9643, 9649, 9661, 9677, 9679, 9689, 9697, 9719, 167 | 9721, 9733, 9739, 9743, 9749, 9767, 9769, 9781, 9787, 9791, 168 | 9803, 9811, 9817, 9829, 9833, 9839, 9851, 9857, 9859, 9871, 169 | 9883, 9887, 9901, 9907, 9923, 9929, 9931, 9941, 9949, 9967, 170 | 9973 171 | }; 172 | 173 | #define mix(a,b,c) \ 174 | { \ 175 | a -= b; a -= c; a ^= (c>>13); \ 176 | b -= c; b -= a; b ^= (a<<8); \ 177 | c -= a; c -= b; c ^= (b>>13); \ 178 | a -= b; a -= c; a ^= (c>>12); \ 179 | b -= c; b -= a; b ^= (a<<16); \ 180 | c -= a; c -= b; c ^= (b>>5); \ 181 | a -= b; a -= c; a ^= (c>>3); \ 182 | b -= c; b -= a; b ^= (a<<10); \ 183 | c -= a; c -= b; c ^= (b>>15); \ 184 | } 185 | 186 | BOBHash32::BOBHash32() 187 | { 188 | this->prime32Num = 0; 189 | } 190 | 191 | BOBHash32::BOBHash32(uint32_t prime32Num) 192 | { 193 | this->prime32Num = prime32Num; 194 | } 195 | 196 | void BOBHash32::initialize(uint32_t prime32Num) 197 | { 198 | this->prime32Num = prime32Num; 199 | } 200 | 201 | uint32_t BOBHash32::run(const char * str, uint32_t len) 202 | { 203 | //register ub4 a,b,c,len; 204 | uint32_t a,b,c; 205 | const uint8_t *k = (const uint8_t *)str; /* read bytes as unsigned so bytes >= 0x80 are not sign-extended */ 206 | /* Set up the internal state */ 207 | //len = length; 208 | a = b = 0x9e3779b9; /* the golden ratio; an arbitrary value */ 209 | c = prime32[this->prime32Num]; /* seed: the prime selected by prime32Num */ 210 | 211 | /*---------------------------------------- handle most of the key */ 212 | while (len >= 12) 213 | { 214 | a += (k[0] +((uint32_t)k[1]<<8) +((uint32_t)k[2]<<16) +((uint32_t)k[3]<<24)); 215 | b += (k[4] +((uint32_t)k[5]<<8) +((uint32_t)k[6]<<16) +((uint32_t)k[7]<<24)); 216 | c += (k[8] +((uint32_t)k[9]<<8) +((uint32_t)k[10]<<16)+((uint32_t)k[11]<<24)); 217 | mix(a,b,c); 218 | k += 12; len -= 12; 219 | } 220 | 221 | /*------------------------------------- handle the last
11 bytes */ 222 | c += len; 223 | switch(len) /* all the case statements fall through */ 224 | { 225 | case 11: c+=((uint32_t)k[10]<<24); 226 | // fall through 227 | case 10: c+=((uint32_t)k[9]<<16); 228 | // fall through 229 | case 9 : c+=((uint32_t)k[8]<<8); 230 | /* the first byte of c is reserved for the length */ 231 | // fall through 232 | case 8 : b+=((uint32_t)k[7]<<24); 233 | // fall through 234 | case 7 : b+=((uint32_t)k[6]<<16); 235 | // fall through 236 | case 6 : b+=((uint32_t)k[5]<<8); 237 | // fall through 238 | case 5 : b+=k[4]; 239 | // fall through 240 | case 4 : a+=((uint32_t)k[3]<<24); 241 | // fall through 242 | case 3 : a+=((uint32_t)k[2]<<16); 243 | // fall through 244 | case 2 : a+=((uint32_t)k[1]<<8); 245 | // fall through 246 | case 1 : a+=k[0]; 247 | /* case 0: nothing left to add */ 248 | } 249 | mix(a,b,c); 250 | /*-------------------------------------------- report the result */ 251 | return c; 252 | } 253 | 254 | BOBHash32::~BOBHash32() 255 | { 256 | 257 | } 258 | #endif //_BOBHASH32_H 259 | -------------------------------------------------------------------------------- /papers/bloom filters/BloomFilter.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/bloom filters/BloomFilter.pdf -------------------------------------------------------------------------------- /papers/bloom filters/Invertible Bloom Lookup Table.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/bloom filters/Invertible Bloom Lookup Table.pdf -------------------------------------------------------------------------------- /papers/bloom filters/SummaryCache.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/bloom filters/SummaryCache.pdf -------------------------------------------------------------------------------- /papers/bloom filters/bloom tree.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/bloom filters/bloom tree.pdf -------------------------------------------------------------------------------- /papers/bloom filters/bloomier filter.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/bloom filters/bloomier filter.pdf -------------------------------------------------------------------------------- /papers/bloom filters/coded bloom filter.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/bloom filters/coded bloom filter.pdf -------------------------------------------------------------------------------- /papers/bloom filters/combinatorial bloom filter.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/bloom filters/combinatorial bloom filter.pdf -------------------------------------------------------------------------------- /papers/bloom filters/cuckoo filter.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/bloom filters/cuckoo filter.pdf -------------------------------------------------------------------------------- /papers/bloom filters/kBF.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/bloom filters/kBF.pdf -------------------------------------------------------------------------------- /papers/bloom filters/shifting bloom filter.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/bloom filters/shifting bloom filter.pdf -------------------------------------------------------------------------------- /papers/other references/FlowRadar.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/other references/FlowRadar.pdf -------------------------------------------------------------------------------- /papers/other references/MRAC.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/other references/MRAC.pdf -------------------------------------------------------------------------------- /papers/other references/SketchVisor.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/other references/SketchVisor.pdf 
-------------------------------------------------------------------------------- /papers/other references/Space-Saving.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/other references/Space-Saving.pdf -------------------------------------------------------------------------------- /papers/other references/univmon.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/other references/univmon.pdf -------------------------------------------------------------------------------- /papers/sampling methods/NetFLow.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/sampling methods/NetFLow.pdf -------------------------------------------------------------------------------- /papers/sampling methods/sFlowOverview.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/sampling methods/sFlowOverview.pdf -------------------------------------------------------------------------------- /papers/sketches/CM sketch.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/sketches/CM sketch.pdf -------------------------------------------------------------------------------- /papers/sketches/CSM sketch.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/sketches/CSM sketch.pdf -------------------------------------------------------------------------------- /papers/sketches/CU sketch.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/sketches/CU sketch.pdf -------------------------------------------------------------------------------- /papers/sketches/Count sketch.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/sketches/Count sketch.pdf -------------------------------------------------------------------------------- /papers/sketches/CounterBraids.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/sketches/CounterBraids.pdf -------------------------------------------------------------------------------- /papers/sketches/Pyramid sketch.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/papers/sketches/Pyramid sketch.pdf -------------------------------------------------------------------------------- /常见sketch算法.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BlockLiu/Algorithms-for-Per-flow-Measurement/08d31cd4fa0367421266f3d96c48f5b659141b48/常见sketch算法.pptx --------------------------------------------------------------------------------