├── .gitignore
├── static
    ├── top25passwords.jpg
    ├── bloomfilter.css
    └── bloomfilter.svg
├── bower.json
└── index.html


/.gitignore:
--------------------------------------------------------------------------------
bower_components/

--------------------------------------------------------------------------------
/static/top25passwords.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/shane-tomlinson/bloomfilter-presentation/master/static/top25passwords.jpg

--------------------------------------------------------------------------------
/static/bloomfilter.css:
--------------------------------------------------------------------------------
.reveal cite {
  font-size: 16px;
  color: #aaa;
}

.reveal img.bloomfilter_example {
  border-width: 1px;
}

--------------------------------------------------------------------------------
/bower.json:
--------------------------------------------------------------------------------
{
  "name": "Bloomfilter",
  "authors": [
    "Shane Tomlinson"
  ],
  "description": "Bloomfilter presentation by Shane Tomlinson",
  "main": "",
  "moduleType": [],
  "keywords": [
    "Bloomfilter"
  ],
  "license": "MIT",
  "homepage": "https://shanetomlinson.com",
  "private": true,
  "ignore": [
    "**/.*",
    "node_modules",
    "bower_components",
    "test",
    "tests"
  ]
}

--------------------------------------------------------------------------------
/static/bloomfilter.svg:
--------------------------------------------------------------------------------
[SVG figure. Surviving text labels: bit array 0 1 0 1 1 1 0 0 0 0 0 1 0 1 0 0 1 0; labels {x, y, z} and w.]

--------------------------------------------------------------------------------
/index.html:
--------------------------------------------------------------------------------

Bloom filters

Bloom filter

Image released to the public domain by David Eppstein.

Me

Shane Tomlinson

https://shanetomlinson.com

@shane_tomlinson

https://github.com/shane-tomlinson

Code

https://github.com/LondonAlgorithms/Bloomfilters

Start with a problem

top 25 passwords

Top 25 Most Used Internet Passwords - An infographic by the team at Tom Fanelli - Infographic Marketing.

Goal

Stop users from choosing any of the top 50k most common passwords.

Math(s) speak: determine if an item is a member of a set.

Constraints

  1. No data can be sent to the server.
  2. Download must be "reasonably" sized.

What data structures could we use?


Hash tables

Create a bit array, hash entries to a position in array.

Download hash table, run password through hash function, check if bit is set.
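The single-hash idea above can be sketched as follows. This is a minimal illustration, not the presentation's own code: the helper names `bit_position`, `build_table` and `maybe_banned` are hypothetical, and SHA-256 is only an assumed choice of hash function.

```python
import hashlib


def bit_position(item: str, m: int) -> int:
    """Map an item to one position in a bit array of size m (assumed hash: SHA-256)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % m


def build_table(items, m):
    """Return a list of m bits with one bit set per item."""
    bits = [0] * m
    for item in items:
        bits[bit_position(item, m)] = 1
    return bits


def maybe_banned(bits, password):
    """True if the password's bit is set. A collision can make this true for other strings too."""
    return bits[bit_position(password, len(bits))] == 1


banned = ["123456", "password", "qwerty"]
table = build_table(banned, m=64)
print(all(maybe_banned(table, p) for p in banned))  # True: a banned password is never missed
```

Only the bit array has to be downloaded, so the check runs locally and the password never leaves the client.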

Collisions


Reducing collisions

  • Expand the size of the hash table.
  • Use more than one hash function.

What if the occasional false positive is OK?


Bloom filter

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set.

https://en.wikipedia.org/wiki/Bloom_filter

It looks like an onion


Properties

  1. Has two primary functions, add and test.
  2. If an item is in the set, test will always return true.
  3. If an item is not in the set, test might still return true (a false positive).
  4. Rate of false positives is configurable.

Secondary properties

  1. Hash table can be downloaded and tested against.
  2. False positive rate of <1% uses ~9.6 bits per member.
  3. Supports union and intersection of two or more filters built with the same m and hash functions.

Nomenclature

k: number of hash functions.
m: size of hash table, in bits.
n: number of items in the set.
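With these symbols, the ~9.6 bits per member figure quoted earlier can be checked against the standard approximation for a Bloom filter's false positive rate p. The formulas below are the usual textbook ones, not something stated in the slides, and they assume the hash functions spread items uniformly over the bit array.

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad
k_{\text{opt}} = \frac{m}{n}\,\ln 2,
\qquad
\frac{m}{n} = -\frac{\ln p}{(\ln 2)^{2}}

\text{For } p = 0.01:\quad
\frac{m}{n} = \frac{4.605}{0.4805} \approx 9.59 \text{ bits per member},
\qquad
k_{\text{opt}} \approx 9.59 \times 0.693 \approx 6.6
```

So 9.6 bits per member with k = 7 lands just under a 1% false positive rate, and the ratio does not depend on n.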

Bloom filter algorithm


Create

  1. Create a bit array of size m.
  2. Fill bit array with all 0s.

Add

  1. Run item through k independent hash functions.
  2. Set the corresponding entries in the bit array to 1.

Test

  1. Run item through k independent hash functions.
  2. Check the corresponding entries in the bit array.
    1. If all entries are 1, return true, otherwise return false.
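The Create, Add and Test steps above can be sketched as a small class. This is an illustrative sketch, not the code from the linked repository: the class name `BloomFilter` and the `_positions` helper are hypothetical, and salting SHA-256 with the function index is only an assumed stand-in for k independent hash functions.

```python
import hashlib


class BloomFilter:
    def __init__(self, m: int, k: int):
        self.m = m              # size of the bit array, in bits
        self.k = k              # number of hash functions
        self.bits = [0] * m     # Create: m bits, all set to 0

    def _positions(self, item: str):
        # Assumed stand-in for k independent hashes: SHA-256 salted with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Add: set the k corresponding entries to 1.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def test(self, item: str) -> bool:
        # Test: true only if all k corresponding entries are 1.
        return all(self.bits[pos] == 1 for pos in self._positions(item))


bf = BloomFilter(m=1024, k=3)
for password in ["123456", "password", "qwerty"]:
    bf.add(password)
print(bf.test("password"))  # True: an added item always tests true
```

A `test` call on an item that was never added usually returns false, but it can return true if other items happened to set all k of its bits.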

K independent ... WTF?


K independent hash functions

Can be simulated with only two hash functions.

To compute the i-th hash, for any i < k:

hash(i) = (a + b * i) % m;

Where a is the result of Hash 1, b is the result of Hash 2, and m is the size of the bit array.
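The formula above can be sketched in a few lines. The function name `positions` is hypothetical, and taking a and b from two halves of one SHA-256 digest is an assumption made for brevity; any two reasonably independent hashes would play the same roles.

```python
import hashlib


def positions(item: str, k: int, m: int):
    """Return k bit positions derived from two base hashes a and b."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    a = int.from_bytes(digest[:8], "big")     # plays the role of Hash 1
    b = int.from_bytes(digest[8:16], "big")   # plays the role of Hash 2
    return [(a + b * i) % m for i in range(k)]


print(positions("password", k=7, m=479253))
```

One caveat with this scheme: if b happens to be a multiple of m, all k positions coincide, so an implementation may want to guard against that case.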

To adjust false positive rate

Fiddle with k and m.

Increasing m gives us more space, so fewer collisions (bigger download).

Increasing k gives each test a better chance to find "0"s, but every add sets more bits, so the hash table fills more quickly.
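To make the trade-off concrete, here is a rough sizing sketch for the 50k password goal. It relies on the standard approximations m = -n ln p / (ln 2)^2 and k = (m / n) ln 2, which are not given in the slides; the function names `size_filter` and `false_positive_rate` are hypothetical.

```python
import math


def size_filter(n: int, p: float):
    """Return (m, k) for n items and a target false positive rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
    k = max(1, round((m / n) * math.log(2)))
    return m, k


def false_positive_rate(n: int, m: int, k: int) -> float:
    """Approximate false positive rate: (1 - e^(-k*n/m))^k."""
    return (1 - math.exp(-k * n / m)) ** k


m, k = size_filter(n=50_000, p=0.01)
print(m, k, m / 8 / 1024)                  # bits, hash functions, KiB
print(false_positive_rate(50_000, m, k))   # close to the 1% target
```

Under these assumptions, 50k passwords at a 1% target need roughly 479k bits (about 58 KiB) and k = 7, which seems to fit the "reasonably" sized download constraint.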
Look at this table.

--------------------------------------------------------------------------------