├── .gitmodules ├── LICENSE ├── README.md ├── go ├── .gitignore ├── LICENSE ├── README.md ├── inbloom │ ├── bloom.go │ └── bloom_test.go └── internal │ └── gomurmur │ ├── LICENSE │ ├── README.md │ ├── gomurmur.go │ └── gomurmur_test.go ├── inbloom.png ├── java ├── .gitignore ├── InBloom.iml ├── README.md ├── build.gradle ├── gradle │ └── wrapper │ │ └── gradle-wrapper.properties ├── gradlew ├── gradlew.bat ├── settings.gradle └── src │ ├── main │ └── java │ │ ├── META-INF │ │ └── MANIFEST.MF │ │ └── me │ │ └── everything │ │ └── inbloom │ │ ├── BinAscii.java │ │ ├── BloomFilter.java │ │ └── Murmur2.java │ └── test │ └── java │ └── me │ └── everything │ └── inbloom │ └── BloomFilterTest.java └── py ├── MANIFEST.in ├── README.md ├── README.rst ├── VERSION ├── generate_rst ├── inbloom ├── crc32.c └── inbloom.c ├── setup.py └── test.py /.gitmodules: -------------------------------------------------------------------------------- 1 | [submodule "py/vendor/libbloom"] 2 | path = py/vendor/libbloom 3 | url = https://github.com/jvirkki/libbloom 4 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2015, EverythingMe 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 
13 |
14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
17 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
24 |
25 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## inbloom
2 |
3 | _inbloom_ - a cross-language Bloom filter implementation (https://en.wikipedia.org/wiki/Bloom_filter).
4 |
5 | ![inbloom](https://raw.githubusercontent.com/EverythingMe/inbloom/master/inbloom.png)
6 |
7 | ## What's a Bloom Filter?
8 | A Bloom filter is a probabilistic data structure that provides an extremely space-efficient way of representing large sets.
9 | It can have false positives but never false negatives, which means a query returns either "possibly in set" or "definitely not in set".
10 |
11 | You can tune a Bloom filter to the desired error rate; it is a tradeoff between size and accuracy (see, for example, http://hur.st/bloomfilter). For example, a filter for about 100 keys with a 1% error rate takes just 120 bytes. With a 5% error rate it takes 78 bytes.
12 |
13 | ## Why Cross Language?
14 | At EverythingMe we have an Android client written in Java, and servers written mostly in Python and Go.
When we wanted to pass filters from the client to the server to avoid saving some state on the server side, we needed an efficient implementation that could read and write Bloom filters in at least all three languages, and found none.
15 |
16 | Having such a library allows us to send filters between clients and any server component easily.
17 |
18 | So we decided to build on top of an existing simple implementation in C called libbloom (https://github.com/jvirkki/libbloom) and expand it to all 3 languages.
19 | We chose to use the original C implementation for the Python version only, and **translated the code to pure Java and Go, without calling any C code**.
20 | We chose this approach because the original C code is fairly short and straightforward, so porting it to other languages was a simple task;
21 | and not calling C from Java and Go simplifies and shortens the build process, and reduces executable size - in both cases.
22 |
23 | ## Filter headers
24 |
25 | InBloom provides utilities for serializing / deserializing Bloom filters so they can be sent over the network.
26 | Because a Bloom filter must be initialized with its expected cardinality and false positive rate,
27 | those same parameters are also needed to read a filter written by another party. Instead of choosing fixed parameters in our configurations, we opted for encoding
28 | those parameters as a header when serializing the filter. We've added a 16-bit checksum for good measure as part of the header.
29 |
30 | ### Serialized filter structure:
31 |
32 | | Field | Type | bits |
33 | | ------------- |:-------------:| -----:|
34 | | checksum | ushort | 16 |
35 | | errorRate (1/N)| ushort | 16 |
36 | | cardinality | int | 32 |
37 | | data | byte[] | ?
| 38 | 39 | 40 | ## Installation 41 | 42 | #### Python 43 | ```bash 44 | pip install inbloom 45 | ``` 46 | 47 | #### Go 48 | ```bash 49 | go get github.com/EverythingMe/inbloom/go/inbloom 50 | ``` 51 | 52 | #### Java 53 | 54 | Add the following lines to your build.gradle script. 55 | 56 | ```groovy 57 | repositories { 58 | jcenter { 59 | url 'http://dl.bintray.com/everythingme/generic' 60 | } 61 | } 62 | 63 | dependencies { 64 | compile 'me.everything:inbloom:0.1' 65 | } 66 | ``` 67 | 68 | ### Example Usage 69 | 70 | #### Python 71 | ```python 72 | import inbloom 73 | import base64 74 | import requests 75 | 76 | # Basic usage 77 | bf = inbloom.Filter(entries=100, error=0.01) 78 | bf.add("abc") 79 | bf.add("def") 80 | 81 | assert bf.contains("abc") 82 | assert bf.contains("def") 83 | assert not bf.contains("ghi") 84 | 85 | bf2 = inbloom.Filter(entries=100, error=0.01, data=bf.buffer()) 86 | assert bf2.contains("abc") 87 | assert bf2.contains("def") 88 | assert not bf2.contains("ghi") 89 | 90 | 91 | # Serialization 92 | payload = 'Yg0AZAAAABQAAAAAACAAEAAIAAAAAAAAIAAQAAgABAA=' 93 | assert base64.b64encode(inbloom.dump(inbloom.load(base64.b64decode(payload)))) == payload 94 | 95 | # Sending it over HTTP 96 | serialized = base64.b64encode(inbloom.dump(bf)) 97 | requests.get('http://api.endpoint.me', params={'filter': serialized}) 98 | ``` 99 | 100 | #### Go 101 | ```go 102 | // create a blank filter - expecting 20 members and an error rate of 1/100 103 | f, err := NewFilter(20, 0.01) 104 | if err != nil { 105 | panic(err) 106 | } 107 | 108 | // the size of the filter 109 | fmt.Println(f.Len()) 110 | 111 | // insert some values 112 | f.Add("foo") 113 | f.Add("bar") 114 | 115 | // test for existence of keys 116 | fmt.Println(f.Contains("foo")) 117 | fmt.Println(f.Contains("wat")) 118 | 119 | fmt.Println("marshaled data:", f.MarshalBase64()) 120 | 121 | // Output: 122 | // 24 123 | // true 124 | // false 125 | // marshaled data: oU4AZAAAABQAAAAAAEIAABEAGAQAAgAgAAAwEAAJAAA= 
126 | ``` 127 | 128 | ```go 129 | // a 20 cardinality 0.01 precision filter with "foo" and "bar" in it 130 | data := "oU4AZAAAABQAAAAAAEIAABEAGAQAAgAgAAAwEAAJAAA=" 131 | 132 | // load it from base64 133 | f, err := UnmarshalBase64(data) 134 | if err != nil { 135 | panic(err) 136 | } 137 | 138 | // test it... 139 | fmt.Println(f.Contains("foo")) 140 | fmt.Println(f.Contains("wat")) 141 | fmt.Println(f.Len()) 142 | 143 | // dump to pure binary 144 | fmt.Printf("%x\n", f.Marshal()) 145 | // Output: 146 | // true 147 | // false 148 | // 24 149 | // a14e006400000014000000000042000011001804000200200000301000090000 150 | ``` 151 | 152 | #### Java 153 | ```java 154 | import me.everything.inbloom.BloomFilter; 155 | import me.everything.inbloom.BinAscii; // Optional - for hex representation 156 | 157 | // The basics 158 | BloomFilter bf = new BloomFilter(20, 0.01); 159 | bf.add("foo"); 160 | bf.add("bar"); 161 | 162 | assertTrue(bf.contains("foo")); 163 | assertTrue(bf.contains("bar")); 164 | assertFalse(bf.contains("baz")); 165 | 166 | 167 | BloomFilter bf2 = new BloomFilter(bf.bf, bf.entries, bf.error); 168 | assertTrue(bf2.contains("foo")); 169 | assertTrue(bf2.contains("bar")); 170 | assertFalse(bf2.contains("baz")); 171 | 172 | // Serialization 173 | String serialized = BinAscii.hexlify(BloomFilter.dump(bf)); 174 | System.out.printf("Serialized: %s\n", serialized); 175 | 176 | String hexPayload = "620d006400000014000000000020001000080000000000002000100008000400"; 177 | BloomFilter deserialized = BloomFilter.load(BinAscii.unhexlify(hexPayload)); 178 | String dump = BinAscii.hexlify(BloomFilter.dump(deserialized)); 179 | System.out.printf("Re-Serialized: %s\n", dump); 180 | assertEquals(dump.toLowerCase(), hexPayload); 181 | 182 | assertEquals(deserialized.entries, 20); 183 | assertEquals(deserialized.error, 0.01); 184 | assertTrue(deserialized.contains("abc")); 185 | ``` 186 | -------------------------------------------------------------------------------- 
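The size figures quoted in the README introduction (120 bytes for about 100 keys at a 1% error rate, 78 bytes at 5%) follow from the sizing arithmetic used by the Go port in `go/inbloom/bloom.go`. The sketch below reproduces that arithmetic in isolation. `sizeFor` and the constant names are hypothetical, written only for this illustration; they are not exported by any of the inbloom packages.

```go
package main

import (
	"fmt"
	"math"
)

// Constants with the same values the Go port uses in its sizing code.
const (
	ln2Squared = 0.480453013918201 // ln(2)^2
	ln2        = 0.693147180559945 // ln(2)
)

// sizeFor is a hypothetical helper (not part of inbloom). It returns the
// number of bits, bytes and hash functions for a filter with the given
// expected number of entries and desired error rate.
func sizeFor(entries int, errorRate float64) (bits, nbytes, hashes int) {
	bpe := -math.Log(errorRate) / ln2Squared // bits per element
	bits = int(float64(entries) * bpe)
	nbytes = (bits + 7) / 8 // round up to whole bytes
	hashes = int(math.Ceil(ln2 * bpe))
	return bits, nbytes, hashes
}

func main() {
	for _, rate := range []float64{0.01, 0.05} {
		bits, nbytes, hashes := sizeFor(100, rate)
		fmt.Printf("100 keys at %.0f%% error: %d bits, %d bytes, %d hashes\n",
			rate*100, bits, nbytes, hashes)
	}
}
```

These sizes cover the bit array only; a serialized filter adds the 8-byte header described under "Filter headers".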
/go/.gitignore: -------------------------------------------------------------------------------- 1 | cover.out 2 | -------------------------------------------------------------------------------- /go/LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2015, EverythingMe 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
24 |
25 |
--------------------------------------------------------------------------------
/go/README.md:
--------------------------------------------------------------------------------
1 | # inbloom
2 | --
3 | import "github.com/EverythingMe/inbloom/go/inbloom"
4 |
5 | Package inbloom implements a portable bloom filter that can export and import
6 | data to and from implementations of the same library in different languages.
7 |
8 | ## Installation
9 | ```bash
10 | go get github.com/EverythingMe/inbloom/go/inbloom
11 | ```
12 |
13 | ## Usage
14 |
15 | #### type BloomFilter
16 |
17 | ```go
18 | type BloomFilter struct {
19 | }
20 | ```
21 |
22 | BloomFilter is our implementation of a simple dynamically sized bloom filter.
23 |
24 | This code was adapted to Go from the libbloom C library -
25 | https://github.com/jvirkki/libbloom
26 |
27 | #### func NewFilter
28 |
29 | ```go
30 | func NewFilter(entries int, errorRate float64) (*BloomFilter, error)
31 | ```
32 | NewFilter creates an empty bloom filter, with the given expected number of
33 | entries, and desired error rate. The number of hash functions and size of the
34 | filter are calculated from these 2 parameters.
35 |
36 | #### func Unmarshal
37 |
38 | ```go
39 | func Unmarshal(data []byte) (*BloomFilter, error)
40 | ```
41 | Unmarshal reads a binary dump of an inbloom filter with its header, and returns
42 | the resulting filter. Since this is a dump containing size and precision
43 | metadata, you do not need to specify them.
44 |
45 | If the data is corrupt or the buffer is not complete, we return an error.
46 |
47 | #### func UnmarshalBase64
48 |
49 | ```go
50 | func UnmarshalBase64(b64 string, encoding ...*base64.Encoding) (*BloomFilter, error)
51 | ```
52 | UnmarshalBase64 is a convenience function that unmarshals a filter that has been
53 | encoded into base64. It uses URLEncoding by default; pass an encoding to use a different one.
54 |
55 | #### func (*BloomFilter) Add
56 |
57 | ```go
58 | func (f *BloomFilter) Add(key string) bool
59 | ```
60 | Add adds a key to the filter. It returns true if all of the key's bits were already set, meaning the key already appeared to be present.
61 |
62 | #### func (*BloomFilter) Contains
63 |
64 | ```go
65 | func (f *BloomFilter) Contains(key string) bool
66 | ```
67 | Contains returns true if a key is possibly in the filter (false positives can occur), and false if it is definitely not.
68 |
69 | #### func (*BloomFilter) Len
70 |
71 | ```go
72 | func (f *BloomFilter) Len() int
73 | ```
74 | Len returns the number of BYTES in the filter
75 |
76 | #### func (*BloomFilter) Marshal
77 |
78 | ```go
79 | func (f *BloomFilter) Marshal() []byte
80 | ```
81 | Marshal dumps the filter to a byte array, with a header containing the error
82 | rate, cardinality and a checksum. This data can be passed to another inbloom
83 | filter over the network, and thus the other end can open the data without the
84 | user having to pass the filter size explicitly. See Unmarshal for reading these
85 | dumps.
86 |
87 | #### func (*BloomFilter) MarshalBase64
88 |
89 | ```go
90 | func (f *BloomFilter) MarshalBase64(encoding ...*base64.Encoding) string
91 | ```
92 | MarshalBase64 is a convenience method that dumps the filter's data to a base64
93 | encoded string. By default it uses URLEncoding, which is ready to be passed as a GET/POST parameter.
94 |
--------------------------------------------------------------------------------
/go/inbloom/bloom.go:
--------------------------------------------------------------------------------
1 | // This code was adapted to Go from the libbloom C library - https://github.com/jvirkki/libbloom
2 | //
3 | // Original copyright note from libbloom:
4 | //
5 | // Copyright (c) 2012, Jyri J. Virkki
6 | // All rights reserved.
7 | // This file is under BSD license.
See LICENSE file. 8 | 9 | // Package inbloom implements a portable bloom filter that can export and import data to and from 10 | // implementations of the same library in different languages. 11 | package inbloom 12 | 13 | import ( 14 | "bytes" 15 | "encoding/base64" 16 | "encoding/binary" 17 | "errors" 18 | "fmt" 19 | "hash/crc32" 20 | "math" 21 | "unsafe" 22 | 23 | "github.com/EverythingMe/inbloom/go/internal/gomurmur" 24 | ) 25 | 26 | const denom = 0.480453013918201 27 | 28 | // BloomFilter is our implementation of a simple dynamically sized bloom filter. 29 | // 30 | // This code was adapted to Go from the libbloom C library - https://github.com/jvirkki/libbloom 31 | type BloomFilter struct { 32 | 33 | // These fields are part of the public interface of this structure. 34 | // Client code may read these values if desired. Client code MUST NOT 35 | // modify any of these. 36 | 37 | entries int 38 | errorRate float64 39 | bits int 40 | bytes int 41 | hashes int 42 | 43 | // Fields below are private to the implementation. These may go away or 44 | // change incompatibly at any moment. Client code MUST NOT access or rely 45 | // on these. 46 | 47 | bpe float64 48 | bf []byte 49 | } 50 | 51 | // NewFilter creates an empty bloom filter, with the given expected number of entries, and desired error rate. 52 | // The number of hash functions and size of the filter are calculated from these 2 parameters 53 | func NewFilter(entries int, errorRate float64) (*BloomFilter, error) { 54 | return newFilterFromData(nil, entries, errorRate) 55 | } 56 | 57 | // NewFilterFromData creates a bloom filter from an existing data buffer, created by another instance of this library (probably in another language). 58 | // 59 | // If the length of the data does not fit the number of entries and error rate, we return an error. 
If data is nil we allocate a new filter
60 | func newFilterFromData(data []byte, entries int, errorRate float64) (*BloomFilter, error) {
61 |
62 | 	if entries < 1 || errorRate <= 0 || errorRate >= 1 {
63 | 		return nil, errors.New("Invalid params for bloom filter")
64 | 	}
65 |
66 | 	bpe := -(math.Log(errorRate) / denom)
67 | 	bits := int(float64(entries) * bpe)
68 |
69 | 	flt := &BloomFilter{
70 | 		entries:   entries,
71 | 		errorRate: errorRate,
72 | 		bpe:       bpe,
73 | 		bits:      bits,
74 | 		bytes:     (bits / 8),
75 | 		hashes:    int(math.Ceil(0.693147180559945 * bpe)), // ln(2)
76 | 	}
77 |
78 | 	if flt.bits%8 != 0 {
79 | 		flt.bytes++
80 | 	}
81 |
82 | 	if data != nil {
83 | 		if flt.bytes != len(data) {
84 | 			return nil, fmt.Errorf("Expected %d bytes, got %d", flt.bytes, len(data))
85 | 		}
86 | 		flt.bf = data
87 | 	} else {
88 | 		flt.bf = make([]byte, flt.bytes)
89 | 	}
90 | 	return flt, nil
91 | }
92 |
93 | // checkAdd checks existence or adds a key to the filter
94 | func (f *BloomFilter) checkAdd(key []byte, add bool) bool {
95 |
96 | 	hits := 0
97 | 	a, _ := gomurmur.Sum32(key, 0x9747b28c)
98 | 	b, _ := gomurmur.Sum32(key, a)
99 |
100 | 	for i := 0; i < f.hashes; i++ {
101 | 		x := (a + uint32(i)*b) % uint32(f.bits)
102 | 		bt := x >> 3
103 |
104 | 		c := f.bf[bt] // expensive memory access
105 | 		mask := byte(1) << (x % 8)
106 |
107 | 		if (c & mask) != 0 {
108 | 			hits++
109 | 		} else {
110 | 			if add {
111 | 				f.bf[bt] = byte(c | mask)
112 | 			}
113 | 		}
114 |
115 | 	}
116 |
117 | 	return hits == f.hashes
118 | }
119 |
120 | // Contains returns true if a key is possibly in the filter (false positives can occur)
121 | func (f *BloomFilter) Contains(key string) bool {
122 | 	return f.checkAdd([]byte(key), false)
123 | }
124 |
125 | // Add adds a key to the filter
126 | func (f *BloomFilter) Add(key string) bool {
127 | 	return f.checkAdd([]byte(key), true)
128 | }
129 |
130 | // Len returns the number of BYTES in the filter
131 | func (f *BloomFilter) Len() int {
132 | 	return f.bytes
133 | }
134 |
135 | // checksum returns a 16 bit checksum of the data (using xor folded
crc32 checksum)
136 | func (f *BloomFilter) checksum() uint16 {
137 |
138 | 	checksum32 := crc32.ChecksumIEEE(f.bf)
139 | 	return uint16(checksum32&0xFFFF) ^ uint16(checksum32>>16)
140 |
141 | }
142 |
143 | // The structure of a marshaled binary filter is:
144 | // checksum uint16
145 | // error_rate uint16
146 | // cardinality uint32
147 | // data []byte
148 |
149 | // Marshal dumps the filter to a byte array, with a header containing the error rate, cardinality and a checksum.
150 | // This data can be passed to another inbloom filter over the network, and thus the other end can open the data
151 | // without the user having to pass the filter size explicitly. See Unmarshal for reading these dumps.
152 | func (f *BloomFilter) Marshal() []byte {
153 |
154 | 	buf := bytes.NewBuffer(make([]byte, 0, len(f.bf)+int(unsafe.Sizeof(uint16(0))*2)+int(unsafe.Sizeof(uint32(0)))))
155 | 	binary.Write(buf, binary.BigEndian, f.checksum())
156 |
157 | 	errs := uint16(1 / f.errorRate)
158 | 	binary.Write(buf, binary.BigEndian, errs)
159 | 	binary.Write(buf, binary.BigEndian, uint32(f.entries))
160 | 	buf.Write(f.bf)
161 | 	return buf.Bytes()
162 | }
163 |
164 | // MarshalBase64 is a convenience method that dumps the filter's data to a base64 encoded string.
165 | // By default it uses URLEncoding, which is ready to be passed as a GET/POST parameter.
166 | // Pass an encoding param to use a different encoding.
167 | func (f *BloomFilter) MarshalBase64(encoding ...*base64.Encoding) string {
168 | 	if len(encoding) > 1 {
169 | 		panic(fmt.Sprintf("Expected at most 1 encoding, got %d", len(encoding)))
170 | 	} else if len(encoding) == 1 {
171 | 		return encoding[0].EncodeToString(f.Marshal())
172 | 	} else {
173 | 		return base64.URLEncoding.EncodeToString(f.Marshal())
174 | 	}
175 | }
176 |
177 | // UnmarshalBase64 is a convenience function that unmarshals a filter that has been encoded into base64.
178 | // It uses URLEncoding by default; pass an encoding param to use a different encoding.
179 | func UnmarshalBase64(b64 string, encoding ...*base64.Encoding) (*BloomFilter, error) {
180 | 	selectedEncoding := base64.URLEncoding
181 | 	if len(encoding) > 1 {
182 | 		panic(fmt.Sprintf("Expected at most 1 encoding, got %d", len(encoding)))
183 | 	} else if len(encoding) == 1 {
184 | 		selectedEncoding = encoding[0]
185 | 	}
186 | 	if b, err := selectedEncoding.DecodeString(b64); err != nil {
187 | 		return nil, fmt.Errorf("bloom: could not decode base64 data: %s", err)
188 | 	} else {
189 | 		return Unmarshal(b)
190 | 	}
191 |
192 | }
193 |
194 | // Unmarshal reads a binary dump of an inbloom filter with its header, and returns the resulting filter.
195 | // Since this is a dump containing size and precision metadata, you do not need to specify them.
196 | //
197 | // If the data is corrupt or the buffer is not complete, we return an error.
198 | func Unmarshal(data []byte) (*BloomFilter, error) {
199 |
200 | 	if data == nil || len(data) <= int(unsafe.Sizeof(uint16(0))*2)+int(unsafe.Sizeof(uint32(0))) {
201 | 		return nil, errors.New("Invalid buffer size")
202 | 	}
203 | 	buf := bytes.NewBuffer(data)
204 | 	var checksum, errRate uint16
205 | 	var entries uint32
206 |
207 | 	if err := binary.Read(buf, binary.BigEndian, &checksum); err != nil {
208 | 		return nil, err
209 | 	}
210 | 	if err := binary.Read(buf, binary.BigEndian, &errRate); err != nil {
211 | 		return nil, err
212 | 	}
213 | 	if err := binary.Read(buf, binary.BigEndian, &entries); err != nil {
214 | 		return nil, err
215 | 	}
216 |
217 | 	if errRate == 0 {
218 | 		return nil, errors.New("Error rate cannot be 0")
219 | 	}
220 |
221 | 	// Read the data
222 | 	bf := make([]byte, len(data))
223 | 	if n, err := buf.Read(bf); err != nil {
224 | 		return nil, err
225 | 	} else {
226 | 		bf = bf[:n]
227 | 	}
228 |
229 | 	// Create a new filter from the data we read
230 | 	ret, err := newFilterFromData(bf, int(entries), 1/float64(errRate))
231 | 	if err != nil {
232 | 		return nil, err
233 | 	}
234 |
235 | 	// Verify checksum
236 | 	if ret.checksum() !=
checksum { 237 | return nil, errors.New("Bad checksum") 238 | } 239 | 240 | return ret, nil 241 | } 242 | -------------------------------------------------------------------------------- /go/inbloom/bloom_test.go: -------------------------------------------------------------------------------- 1 | package inbloom 2 | 3 | import ( 4 | "encoding/base64" 5 | "fmt" 6 | "testing" 7 | ) 8 | 9 | func TestBloom(t *testing.T) { 10 | 11 | bf, err := NewFilter(20, 0.01) 12 | if err != nil { 13 | t.Fatal(err) 14 | } 15 | 16 | keys := []string{"foo", "bar", "foosdfsdfs", "fossdfsdfo", "foasdfasdfasdfasdfo", "foasdfasdfasdasdfasdfasdfasdfasdfo"} 17 | 18 | faux := []string{"goo", "gar", "gaz"} 19 | 20 | for _, k := range keys { 21 | if bf.Add(k) == true { 22 | t.Errorf("adding %s returned true", k) 23 | } 24 | } 25 | 26 | t.Logf("Bloom filter params: %X", bf.bf) 27 | for _, k := range keys { 28 | if !bf.Contains(k) { 29 | t.Error("not containig ", k) 30 | } 31 | 32 | } 33 | 34 | for _, k := range faux { 35 | if bf.Contains(k) { 36 | t.Error("containig faux key", k) 37 | } 38 | } 39 | 40 | expected := "02000C0300C2246913049E040002002000017614002B0002" 41 | actual := fmt.Sprintf("%X", bf.bf) 42 | if actual != expected { 43 | t.Errorf("expected\n%s\nactual\n%s", expected, actual) 44 | } 45 | } 46 | 47 | func TestMarshal(t *testing.T) { 48 | bf, err := NewFilter(20, 0.01) 49 | if err != nil { 50 | t.Fatal(err) 51 | } 52 | bf.Add("abc") 53 | 54 | serizliaed := fmt.Sprintf("%x", bf.Marshal()) 55 | expected := "620d006400000014000000000020001000080000000000002000100008000400" 56 | if serizliaed != expected { 57 | t.Errorf("Expected %s, got %s", expected, serizliaed) 58 | } 59 | 60 | bfds, err := Unmarshal(bf.Marshal()) 61 | if err != nil { 62 | t.Fatal(err) 63 | } 64 | 65 | serizliaed = fmt.Sprintf("%x", bfds.Marshal()) 66 | if serizliaed != expected { 67 | t.Errorf("Expected %s, got %s", expected, serizliaed) 68 | } 69 | t.Logf("DESERIALIZED: %X\n", bfds.Marshal()) 70 | 71 | // Test 
for bad checksum 72 | 73 | data := bfds.Marshal() 74 | data[0] = 0xff 75 | data[1] = 0xff 76 | 77 | if _, err = Unmarshal(data); err == nil { 78 | t.Error("Should have failed on bad checksum") 79 | } else { 80 | t.Log(err) 81 | } 82 | 83 | data[2] = 0xff 84 | if _, err = Unmarshal(data); err == nil { 85 | t.Error("Should have failed on bad size") 86 | } else { 87 | t.Log(err) 88 | } 89 | 90 | data = data[:4] 91 | if _, err = Unmarshal(data); err == nil { 92 | t.Error("Should have failed on bad data") 93 | } else { 94 | t.Log(err) 95 | } 96 | 97 | } 98 | 99 | func TestBase64StdEncoding(t *testing.T) { 100 | source := "j+gAZAAAAGQAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAIAAAAAEAAAAAAAAAAAAABAAAAAAAAAYAAAAAAACAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAAAgAAAAAAAAAAAAAAAAAAAQAAgAAAAAAAAAAAAAAAAAACBgAA=" 101 | 102 | f, err := UnmarshalBase64(source, base64.StdEncoding) 103 | if err != nil { 104 | t.Fatal(err) 105 | } 106 | 107 | marshaled := f.MarshalBase64(base64.StdEncoding) 108 | 109 | if marshaled != source { 110 | t.Fatal(fmt.Sprintf("MarshalBase64 differs from source: %s != %s", marshaled, source)) 111 | } 112 | } 113 | 114 | func TestBase64UrlEncoding(t *testing.T) { 115 | source := "j-gAZAAAAGQAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAIAAAAAEAAAAAAAAAAAAABAAAAAAAAAYAAAAAAACAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAAAgAAAAAAAAAAAAAAAAAAAQAAgAAAAAAAAAAAAAAAAAACBgAA=" 116 | 117 | f, err := UnmarshalBase64(source) 118 | if err != nil { 119 | t.Fatal(err) 120 | } 121 | 122 | marshaled := f.MarshalBase64() 123 | 124 | if marshaled != source { 125 | t.Fatal(fmt.Sprintf("MarshalBase64 differs from source: %s != %s", marshaled, source)) 126 | } 127 | } 128 | 129 | func ExampleBloomFilter() { 130 | 131 | // create a blank filter - expecting 20 members and an error rate of 1/100 132 | f, err := NewFilter(20, 0.01) 133 | if err != nil { 134 | panic(err) 135 | } 136 | 137 | // the size of the filter 138 | fmt.Println(f.Len()) 139 | 140 | // insert some values 141 | f.Add("foo") 142 | f.Add("bar") 
143 | 144 | // test for existence of keys 145 | fmt.Println(f.Contains("foo")) 146 | fmt.Println(f.Contains("wat")) 147 | 148 | fmt.Println("marshaled data:", f.MarshalBase64()) 149 | 150 | // Output: 151 | // 24 152 | // true 153 | // false 154 | // marshaled data: oU4AZAAAABQAAAAAAEIAABEAGAQAAgAgAAAwEAAJAAA= 155 | 156 | } 157 | 158 | func ExampleMarshalUnmarshal() { 159 | 160 | // a 20 cardinality 0.01 precision filter with "foo" and "bar" in it 161 | data := "oU4AZAAAABQAAAAAAEIAABEAGAQAAgAgAAAwEAAJAAA=" 162 | 163 | // load it from base64 164 | f, err := UnmarshalBase64(data) 165 | if err != nil { 166 | panic(err) 167 | } 168 | 169 | // test it... 170 | fmt.Println(f.Contains("foo")) 171 | fmt.Println(f.Contains("wat")) 172 | fmt.Println(f.Len()) 173 | 174 | // dump to pure binary 175 | fmt.Printf("%x\n", f.Marshal()) 176 | // Output: 177 | // true 178 | // false 179 | // 24 180 | // a14e006400000014000000000042000011001804000200200000301000090000 181 | 182 | } 183 | -------------------------------------------------------------------------------- /go/internal/gomurmur/LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2013-2016, Sureshkumar Nedunchezhian 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 5 | 6 | * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 7 | * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 
8 | 9 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 10 | -------------------------------------------------------------------------------- /go/internal/gomurmur/README.md: -------------------------------------------------------------------------------- 1 | # gomurmur 2 | 3 | Go implementation of MurmurHash2, 32bit (https://code.google.com/p/smhasher/) 4 | 5 | ### TODO 6 | * Add benchmark tests 7 | * Implement MurmurHash3 8 | -------------------------------------------------------------------------------- /go/internal/gomurmur/gomurmur.go: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2013-2016, Sureshkumar Nedunchezhian 3 | * All rights reserved. 4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that the following conditions are met: 7 | * 8 | * * Redistributions of source code must retain the above copyright notice, 9 | * this list of conditions and the following disclaimer. 10 | * * Redistributions in binary form must reproduce the above copyright 11 | * notice, this list of conditions and the following disclaimer in 12 | * the documentation and/or other materials provided with the distribution. 
13 |  *
14 |  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
15 |  * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
16 |  * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
17 |  * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS
18 |  * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
19 |  * OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
20 |  * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
21 |  * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
22 |  * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
23 |  * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
24 |  * THE POSSIBILITY OF SUCH DAMAGE.
25 |  */
26 |
27 | /*
28 |  * "Murmur" hash provided by Austin, tanjent@gmail.com
29 |  * http://murmurhash.googlepages.com/
30 |  *
31 |  * Note - This code makes a few assumptions about how your machine behaves -
32 |  *
33 |  * 1. We can read a 4-byte value from any address without crashing
34 |  * 2. sizeof(int) == 4
35 |  *
36 |  * And it has a few limitations -
37 |  * 1. It will not work incrementally.
38 |  * 2. It will not produce the same results on little-endian and big-endian
39 |  * machines. */
40 |
41 | package gomurmur
42 |
43 | import (
44 | 	"bytes"
45 | 	"encoding/binary"
46 | 	"hash"
47 | )
48 |
49 | type (
50 | 	sum32 uint32
51 | )
52 |
53 | const (
54 | 	m = 0x5bd1e995
55 | 	r = 24
56 | )
57 |
58 | func Sum32(b []byte, seed uint32) (uint32, error) {
59 | 	var s sum32 = 0
60 | 	h := &s
61 |
62 | 	if _, err := h.WriteSeed(b, seed); err != nil {
63 | 		return 0, err
64 | 	}
65 | 	return uint32(*h), nil
66 | }
67 |
68 | // New32 returns a new 32-bit MurmurHash2 hash.Hash32.
69 | func New32() hash.Hash32 { 70 | var s sum32 = 0 71 | return &s 72 | } 73 | 74 | func (s *sum32) Reset() { *s = 0 } 75 | func (s *sum32) Sum32() uint32 { 76 | return uint32(*s) 77 | } 78 | 79 | const defaultSeed uint32 = 0x9747b28c 80 | 81 | func (s *sum32) Write(data []byte) (int, error) { 82 | return s.WriteSeed(data, defaultSeed) 83 | } 84 | 85 | func (s *sum32) WriteSeed(data []byte, seed uint32) (int, error) { 86 | var length = uint32(len(data)) 87 | 88 | /* Initialize the hash to a 'random' value */ 89 | h := *s 90 | h = sum32(seed ^ length) 91 | 92 | /* Mix 4 bytes at a time into the hash */ 93 | var i int = 0 94 | 95 | for length >= 4 { 96 | var k uint32 97 | buf := bytes.NewBuffer(data[i : i+4]) 98 | err := binary.Read(buf, binary.LittleEndian, &k) 99 | if err != nil { 100 | return 0, err 101 | } 102 | k *= m 103 | k ^= k >> r 104 | k *= m 105 | 106 | h *= m 107 | h ^= sum32(k) 108 | 109 | i += 4 110 | length -= 4 111 | } 112 | switch length { 113 | case 3: 114 | h ^= sum32((uint32)(data[i+2]) << 16) 115 | fallthrough 116 | case 2: 117 | h ^= sum32((uint32)(data[i+1]) << 8) 118 | fallthrough 119 | case 1: 120 | h ^= sum32((uint32)(data[i])) 121 | h *= m 122 | default: 123 | } 124 | h ^= h >> 13 125 | h *= m 126 | h ^= h >> 15 127 | *s = h 128 | 129 | return len(data), nil 130 | } 131 | 132 | func (s *sum32) Size() int { return 4 } 133 | 134 | func (s *sum32) BlockSize() int { return 1 } 135 | 136 | func (s *sum32) Sum(in []byte) []byte { 137 | v := uint32(*s) 138 | return append(in, byte(v>>24), byte(v>>16), byte(v>>8), byte(v)) 139 | } 140 | -------------------------------------------------------------------------------- /go/internal/gomurmur/gomurmur_test.go: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2013-2016, Sureshkumar Nedunchezhian 3 | * All rights reserved. 
4 | * 5 | * Redistribution and use in source and binary forms, with or without 6 | * modification, are permitted provided that the following conditions are met: 7 | * 8 | * * Redistributions of source code must retain the above copyright notice, 9 | * this list of conditions and the following disclaimer. 10 | * * Redistributions in binary form must reproduce the above copyright 11 | * notice, this list of conditions and the following disclaimer in 12 | * the documentation and/or other materials provided with the distribution. 13 | * 14 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, 16 | * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 17 | * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS 18 | * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, 19 | * OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF 20 | * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS 21 | * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 22 | * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 23 | * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF 24 | * THE POSSIBILITY OF SUCH DAMAGE. 
25 | */ 26 | 27 | package gomurmur 28 | 29 | import ( 30 | "bytes" 31 | "hash" 32 | "testing" 33 | ) 34 | 35 | type golden struct { 36 | sum []byte 37 | text string 38 | } 39 | 40 | var golden32 = []golden{ 41 | {[]byte{0x00, 0x00, 0x00, 0x00}, ""}, 42 | {[]byte{0x4b, 0x41, 0x75, 0x7c}, "a"}, 43 | {[]byte{0xe3, 0xb5, 0x4d, 0xfb}, "ab"}, 44 | {[]byte{0x7b, 0x0c, 0xc4, 0x28}, "abc"}, 45 | {[]byte{0xef, 0x6a, 0x86, 0xaf}, "abcd"}, 46 | {[]byte{0x9a, 0x26, 0x3e, 0xda}, "abcde"}, 47 | {[]byte{0xe0, 0xba, 0xdc, 0x96}, "abcdef"}, 48 | {[]byte{0xeb, 0xa7, 0x46, 0xf2}, "abcdefg"}, 49 | } 50 | 51 | func TestGolden32(t *testing.T) { 52 | testGolden(t, New32(), golden32) 53 | } 54 | 55 | func testGolden(t *testing.T, hash hash.Hash, gold []golden) { 56 | for _, g := range gold { 57 | hash.Reset() 58 | done, error := hash.Write([]byte(g.text)) 59 | if error != nil { 60 | t.Fatalf("write error: %s", error) 61 | } 62 | if done != len(g.text) { 63 | t.Fatalf("wrote only %d out of %d bytes", done, len(g.text)) 64 | } 65 | if actual := hash.Sum(nil); !bytes.Equal(g.sum, actual) { 66 | t.Errorf("hash(%q) = 0x%x want 0x%x", g.text, actual, g.sum) 67 | } 68 | } 69 | } 70 | -------------------------------------------------------------------------------- /inbloom.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EverythingMe/inbloom/1d67c94ae5dc4dd199b88912114e7fa1b2f04bad/inbloom.png -------------------------------------------------------------------------------- /java/.gitignore: -------------------------------------------------------------------------------- 1 | *.class 2 | 3 | # Mobile Tools for Java (J2ME) 4 | .mtj.tmp/ 5 | /.idea 6 | /.gradle 7 | # Package Files # 8 | *.jar 9 | *.war 10 | *.ear 11 | 12 | # virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml 13 | hs_err_pid* 14 | build/* 15 | -------------------------------------------------------------------------------- 
/java/InBloom.iml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | -------------------------------------------------------------------------------- /java/README.md: -------------------------------------------------------------------------------- 1 | # InBloom-Java 2 | 3 | Java Implementation of InBloom portable cross-language bloom filter. 4 | 5 | ### Usage 6 | https://github.com/EverythingMe/inbloom#java 7 | 8 | 9 | ### Gradle 10 | 11 | Add the following lines to your build.gradle script. 12 | 13 | ```groovy 14 | repositories { 15 | jcenter { 16 | url 'http://dl.bintray.com/everythingme/generic' 17 | } 18 | } 19 | 20 | dependencies { 21 | compile 'me.everything:inbloom:0.1' 22 | } 23 | ``` 24 | -------------------------------------------------------------------------------- /java/build.gradle: -------------------------------------------------------------------------------- 1 | plugins { 2 | id "com.jfrog.bintray" version "1.2" 3 | } 4 | 5 | group = 'me.everything' 6 | version '0.1' 7 | 8 | apply plugin: 'java' 9 | apply plugin: 'maven' 10 | apply plugin: 'maven-publish' 11 | 12 | sourceCompatibility = 1.5 13 | 14 | repositories { 15 | mavenCentral() 16 | } 17 | 18 | dependencies { 19 | testCompile group: 'junit', name: 'junit', version: '4.11' 20 | } 21 | 22 | bintray { 23 | user = System.getenv('BINTRAY_USER') 24 | key = System.getenv('BINTRAY_KEY') 25 | publications = ['mavenJava'] 26 | pkg { 27 | repo = 'generic' 28 | name = 'inbloom' 29 | userOrg = 'everythingme' 30 | licenses = ['BSD'] 31 | vcsUrl = 'https://github.com/EverythingMe/inbloom' 32 | publish = true 33 | version { 34 | name = '0.1' 35 | desc = 'InBloom Library 0.1' 36 | released = new Date() 37 | vcsTag = '0.1' 38 | } 39 | } 40 | } 41 | 42 | task sourcesJar(type: Jar) { 43 | from sourceSets.main.allSource 44 | classifier = 'sources' 45 | } 46 | 47 | task javadocJar(type: Jar, 
dependsOn: javadoc) { 48 | classifier = 'javadoc' 49 | from 'build/docs/javadoc' 50 | } 51 | 52 | publishing { 53 | publications { 54 | mavenJava(MavenPublication) { 55 | from components.java 56 | artifact sourcesJar 57 | artifact javadocJar 58 | groupId 'me.everything' 59 | artifactId 'inbloom' 60 | } 61 | } 62 | } 63 | -------------------------------------------------------------------------------- /java/gradle/wrapper/gradle-wrapper.properties: -------------------------------------------------------------------------------- 1 | #Thu Jul 23 11:36:17 IDT 2015 2 | distributionBase=GRADLE_USER_HOME 3 | distributionPath=wrapper/dists 4 | zipStoreBase=GRADLE_USER_HOME 5 | zipStorePath=wrapper/dists 6 | distributionUrl=https\://services.gradle.org/distributions/gradle-2.2-all.zip 7 | -------------------------------------------------------------------------------- /java/gradlew: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | ############################################################################## 4 | ## 5 | ## Gradle start up script for UN*X 6 | ## 7 | ############################################################################## 8 | 9 | # Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script. 10 | DEFAULT_JVM_OPTS="" 11 | 12 | APP_NAME="Gradle" 13 | APP_BASE_NAME=`basename "$0"` 14 | 15 | # Use the maximum available, or set MAX_FD != -1 to use that value. 16 | MAX_FD="maximum" 17 | 18 | warn ( ) { 19 | echo "$*" 20 | } 21 | 22 | die ( ) { 23 | echo 24 | echo "$*" 25 | echo 26 | exit 1 27 | } 28 | 29 | # OS specific support (must be 'true' or 'false'). 30 | cygwin=false 31 | msys=false 32 | darwin=false 33 | case "`uname`" in 34 | CYGWIN* ) 35 | cygwin=true 36 | ;; 37 | Darwin* ) 38 | darwin=true 39 | ;; 40 | MINGW* ) 41 | msys=true 42 | ;; 43 | esac 44 | 45 | # For Cygwin, ensure paths are in UNIX format before anything is touched. 
46 | if $cygwin ; then 47 | [ -n "$JAVA_HOME" ] && JAVA_HOME=`cygpath --unix "$JAVA_HOME"` 48 | fi 49 | 50 | # Attempt to set APP_HOME 51 | # Resolve links: $0 may be a link 52 | PRG="$0" 53 | # Need this for relative symlinks. 54 | while [ -h "$PRG" ] ; do 55 | ls=`ls -ld "$PRG"` 56 | link=`expr "$ls" : '.*-> \(.*\)$'` 57 | if expr "$link" : '/.*' > /dev/null; then 58 | PRG="$link" 59 | else 60 | PRG=`dirname "$PRG"`"/$link" 61 | fi 62 | done 63 | SAVED="`pwd`" 64 | cd "`dirname \"$PRG\"`/" >&- 65 | APP_HOME="`pwd -P`" 66 | cd "$SAVED" >&- 67 | 68 | CLASSPATH=$APP_HOME/gradle/wrapper/gradle-wrapper.jar 69 | 70 | # Determine the Java command to use to start the JVM. 71 | if [ -n "$JAVA_HOME" ] ; then 72 | if [ -x "$JAVA_HOME/jre/sh/java" ] ; then 73 | # IBM's JDK on AIX uses strange locations for the executables 74 | JAVACMD="$JAVA_HOME/jre/sh/java" 75 | else 76 | JAVACMD="$JAVA_HOME/bin/java" 77 | fi 78 | if [ ! -x "$JAVACMD" ] ; then 79 | die "ERROR: JAVA_HOME is set to an invalid directory: $JAVA_HOME 80 | 81 | Please set the JAVA_HOME variable in your environment to match the 82 | location of your Java installation." 83 | fi 84 | else 85 | JAVACMD="java" 86 | which java >/dev/null 2>&1 || die "ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH. 87 | 88 | Please set the JAVA_HOME variable in your environment to match the 89 | location of your Java installation." 90 | fi 91 | 92 | # Increase the maximum file descriptors if we can. 93 | if [ "$cygwin" = "false" -a "$darwin" = "false" ] ; then 94 | MAX_FD_LIMIT=`ulimit -H -n` 95 | if [ $? -eq 0 ] ; then 96 | if [ "$MAX_FD" = "maximum" -o "$MAX_FD" = "max" ] ; then 97 | MAX_FD="$MAX_FD_LIMIT" 98 | fi 99 | ulimit -n $MAX_FD 100 | if [ $? 
-ne 0 ] ; then 101 | warn "Could not set maximum file descriptor limit: $MAX_FD" 102 | fi 103 | else 104 | warn "Could not query maximum file descriptor limit: $MAX_FD_LIMIT" 105 | fi 106 | fi 107 | 108 | # For Darwin, add options to specify how the application appears in the dock 109 | if $darwin; then 110 | GRADLE_OPTS="$GRADLE_OPTS \"-Xdock:name=$APP_NAME\" \"-Xdock:icon=$APP_HOME/media/gradle.icns\"" 111 | fi 112 | 113 | # For Cygwin, switch paths to Windows format before running java 114 | if $cygwin ; then 115 | APP_HOME=`cygpath --path --mixed "$APP_HOME"` 116 | CLASSPATH=`cygpath --path --mixed "$CLASSPATH"` 117 | 118 | # We build the pattern for arguments to be converted via cygpath 119 | ROOTDIRSRAW=`find -L / -maxdepth 1 -mindepth 1 -type d 2>/dev/null` 120 | SEP="" 121 | for dir in $ROOTDIRSRAW ; do 122 | ROOTDIRS="$ROOTDIRS$SEP$dir" 123 | SEP="|" 124 | done 125 | OURCYGPATTERN="(^($ROOTDIRS))" 126 | # Add a user-defined pattern to the cygpath arguments 127 | if [ "$GRADLE_CYGPATTERN" != "" ] ; then 128 | OURCYGPATTERN="$OURCYGPATTERN|($GRADLE_CYGPATTERN)" 129 | fi 130 | # Now convert the arguments - kludge to limit ourselves to /bin/sh 131 | i=0 132 | for arg in "$@" ; do 133 | CHECK=`echo "$arg"|egrep -c "$OURCYGPATTERN" -` 134 | CHECK2=`echo "$arg"|egrep -c "^-"` ### Determine if an option 135 | 136 | if [ $CHECK -ne 0 ] && [ $CHECK2 -eq 0 ] ; then ### Added a condition 137 | eval `echo args$i`=`cygpath --path --ignore --mixed "$arg"` 138 | else 139 | eval `echo args$i`="\"$arg\"" 140 | fi 141 | i=$((i+1)) 142 | done 143 | case $i in 144 | (0) set -- ;; 145 | (1) set -- "$args0" ;; 146 | (2) set -- "$args0" "$args1" ;; 147 | (3) set -- "$args0" "$args1" "$args2" ;; 148 | (4) set -- "$args0" "$args1" "$args2" "$args3" ;; 149 | (5) set -- "$args0" "$args1" "$args2" "$args3" "$args4" ;; 150 | (6) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" ;; 151 | (7) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" ;; 152 | (8) 
set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" ;; 153 | (9) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" "$args8" ;; 154 | esac 155 | fi 156 | 157 | # Split up the JVM_OPTS And GRADLE_OPTS values into an array, following the shell quoting and substitution rules 158 | function splitJvmOpts() { 159 | JVM_OPTS=("$@") 160 | } 161 | eval splitJvmOpts $DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS 162 | JVM_OPTS[${#JVM_OPTS[*]}]="-Dorg.gradle.appname=$APP_BASE_NAME" 163 | 164 | exec "$JAVACMD" "${JVM_OPTS[@]}" -classpath "$CLASSPATH" org.gradle.wrapper.GradleWrapperMain "$@" 165 | -------------------------------------------------------------------------------- /java/gradlew.bat: -------------------------------------------------------------------------------- 1 | @if "%DEBUG%" == "" @echo off 2 | @rem ########################################################################## 3 | @rem 4 | @rem Gradle startup script for Windows 5 | @rem 6 | @rem ########################################################################## 7 | 8 | @rem Set local scope for the variables with windows NT shell 9 | if "%OS%"=="Windows_NT" setlocal 10 | 11 | @rem Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script. 12 | set DEFAULT_JVM_OPTS= 13 | 14 | set DIRNAME=%~dp0 15 | if "%DIRNAME%" == "" set DIRNAME=. 16 | set APP_BASE_NAME=%~n0 17 | set APP_HOME=%DIRNAME% 18 | 19 | @rem Find java.exe 20 | if defined JAVA_HOME goto findJavaFromJavaHome 21 | 22 | set JAVA_EXE=java.exe 23 | %JAVA_EXE% -version >NUL 2>&1 24 | if "%ERRORLEVEL%" == "0" goto init 25 | 26 | echo. 27 | echo ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH. 28 | echo. 29 | echo Please set the JAVA_HOME variable in your environment to match the 30 | echo location of your Java installation. 
31 | 32 | goto fail 33 | 34 | :findJavaFromJavaHome 35 | set JAVA_HOME=%JAVA_HOME:"=% 36 | set JAVA_EXE=%JAVA_HOME%/bin/java.exe 37 | 38 | if exist "%JAVA_EXE%" goto init 39 | 40 | echo. 41 | echo ERROR: JAVA_HOME is set to an invalid directory: %JAVA_HOME% 42 | echo. 43 | echo Please set the JAVA_HOME variable in your environment to match the 44 | echo location of your Java installation. 45 | 46 | goto fail 47 | 48 | :init 49 | @rem Get command-line arguments, handling Windowz variants 50 | 51 | if not "%OS%" == "Windows_NT" goto win9xME_args 52 | if "%@eval[2+2]" == "4" goto 4NT_args 53 | 54 | :win9xME_args 55 | @rem Slurp the command line arguments. 56 | set CMD_LINE_ARGS= 57 | set _SKIP=2 58 | 59 | :win9xME_args_slurp 60 | if "x%~1" == "x" goto execute 61 | 62 | set CMD_LINE_ARGS=%* 63 | goto execute 64 | 65 | :4NT_args 66 | @rem Get arguments from the 4NT Shell from JP Software 67 | set CMD_LINE_ARGS=%$ 68 | 69 | :execute 70 | @rem Setup the command line 71 | 72 | set CLASSPATH=%APP_HOME%\gradle\wrapper\gradle-wrapper.jar 73 | 74 | @rem Execute Gradle 75 | "%JAVA_EXE%" %DEFAULT_JVM_OPTS% %JAVA_OPTS% %GRADLE_OPTS% "-Dorg.gradle.appname=%APP_BASE_NAME%" -classpath "%CLASSPATH%" org.gradle.wrapper.GradleWrapperMain %CMD_LINE_ARGS% 76 | 77 | :end 78 | @rem End local scope for the variables with windows NT shell 79 | if "%ERRORLEVEL%"=="0" goto mainEnd 80 | 81 | :fail 82 | rem Set variable GRADLE_EXIT_CONSOLE if you need the _script_ return code instead of 83 | rem the _cmd.exe /c_ return code! 
84 | if not "" == "%GRADLE_EXIT_CONSOLE%" exit 1 85 | exit /b 1 86 | 87 | :mainEnd 88 | if "%OS%"=="Windows_NT" endlocal 89 | 90 | :omega 91 | -------------------------------------------------------------------------------- /java/settings.gradle: -------------------------------------------------------------------------------- 1 | rootProject.name = 'InBloom' 2 | 3 | -------------------------------------------------------------------------------- /java/src/main/java/META-INF/MANIFEST.MF: -------------------------------------------------------------------------------- 1 | Manifest-Version: 1.0 2 | -------------------------------------------------------------------------------- /java/src/main/java/me/everything/inbloom/BinAscii.java: -------------------------------------------------------------------------------- 1 | package me.everything.inbloom; 2 | 3 | /** 4 | * Utility class for dealing with binary data 5 | */ 6 | public class BinAscii { 7 | final protected static char[] hexArray = "0123456789ABCDEF".toCharArray(); 8 | 9 | /** 10 | * Transform a byte array into its hexadecimal representation 11 | */ 12 | public static String hexlify(byte[] bytes) { 13 | char[] hexChars = new char[bytes.length * 2]; 14 | for ( int j = 0; j < bytes.length; j++ ) { 15 | int v = bytes[j] & 0xFF; 16 | hexChars[j * 2] = hexArray[v >>> 4]; 17 | hexChars[j * 2 + 1] = hexArray[v & 0x0F]; 18 | } 19 | String ret = new String(hexChars); 20 | return ret; 21 | } 22 | 23 | /** 24 | * Transform a string of hexadecimal chars into a byte array 25 | */ 26 | public static byte[] unhexlify(String argbuf) { 27 | int arglen = argbuf.length(); 28 | if (arglen % 2 != 0) 29 | throw new RuntimeException("Odd-length string"); 30 | 31 | byte[] retbuf = new byte[arglen/2]; 32 | 33 | for (int i = 0; i < arglen; i += 2) { 34 | int top = Character.digit(argbuf.charAt(i), 16); 35 | int bot = Character.digit(argbuf.charAt(i+1), 16); 36 | if (top == -1 || bot == -1) 37 | throw new
RuntimeException("Non-hexadecimal digit found"); 38 | retbuf[i / 2] = (byte) ((top << 4) + bot); 39 | } 40 | return retbuf; 41 | } 42 | } 43 | -------------------------------------------------------------------------------- /java/src/main/java/me/everything/inbloom/BloomFilter.java: -------------------------------------------------------------------------------- 1 | package me.everything.inbloom; 2 | 3 | import java.io.InvalidObjectException; 4 | import java.nio.ByteBuffer; 5 | import java.nio.ByteOrder; 6 | import java.util.Arrays; 7 | import java.util.zip.CRC32; 8 | 9 | /** 10 | * Pure Java Bloom Filter class. 11 | * 12 | * Translated from the libbloom C library 13 | * 14 | * Original copyright note from libbloom: 15 | * 16 | * Copyright (c) 2012, Jyri J. Virkki 17 | * All rights reserved. 18 | * 19 | * This file is under BSD license. See LICENSE file. 20 | */ 21 | public class BloomFilter { 22 | 23 | private static final String TAG = "BloomFilter" ; 24 | // These fields are part of the public interface of this structure. 25 | // Client code may read these values if desired. Client code MUST NOT 26 | // modify any of these. 27 | protected final int entries; 28 | protected final double error; 29 | protected final int bits; 30 | protected final int bytes; 31 | protected final int hashes; 32 | protected final static double errorPrecision = 0.000000001; 33 | 34 | // Fields below are private to the implementation. These may go away or 35 | // change incompatibly at any moment. Client code MUST NOT access or rely 36 | // on these. 
37 | double bpe; 38 | byte[] bf; 39 | int ready; 40 | 41 | /** 42 | * Create a blank bloom filter, with a given number of expected entries, and an error rate 43 | * @param entries the expected number of entries 44 | * @param error the desired error rate 45 | */ 46 | public BloomFilter(int entries, double error) 47 | { 48 | this(null, entries, error); 49 | } 50 | 51 | 52 | static private final double denom = 0.480453013918201; 53 | 54 | /** 55 | * Create a bloom filter from an existing data buffer, created by another instance of this library (probably in another language). 56 | * If the length of the data does not fit the number of entries and error rate, we raise RuntimeException 57 | * @param data the raw filter data 58 | * @param entries the expected number of entries 59 | * @param error the desired error rate 60 | */ 61 | public BloomFilter(byte []data, int entries, double error) throws RuntimeException 62 | { 63 | if (entries < 1 || ( ( 1.0 <= error ) || ( error <= errorPrecision ) ) ) { 64 | throw new RuntimeException("Invalid params for bloom filter"); 65 | } 66 | 67 | this.entries = entries; 68 | this.error = error; 69 | 70 | bpe = -(Math.log(error) / denom); 71 | bits = (int)((double)entries * bpe); 72 | bytes = (bits / 8) + (bits % 8 != 0 ? 
1 : 0); 73 | 74 | if (data != null) { 75 | if (bytes != data.length) { 76 | throw new RuntimeException(String.format("Expected %d bytes, got %d", bytes, data.length)); 77 | } 78 | bf = data; 79 | } else { 80 | bf = new byte[bytes];; 81 | } 82 | 83 | 84 | hashes = (int)Math.ceil(0.693147180559945 * bpe); // ln(2) 85 | } 86 | 87 | public static short computeChecksum(byte[] data) { 88 | CRC32 crc = new CRC32(); 89 | crc.update(data); 90 | long checksum32 = crc.getValue(); 91 | return (short) ((checksum32 & 0xFFFF) ^ (checksum32 >> 16)); 92 | } 93 | 94 | public static BloomFilter load(byte[] bytes) throws InvalidObjectException { 95 | ByteBuffer bb = ByteBuffer.wrap(bytes); 96 | bb.order(ByteOrder.BIG_ENDIAN); 97 | short checksum = bb.getShort(); 98 | short errorRate = bb.getShort(); 99 | int cardinality = bb.getInt(); 100 | final byte[] data = Arrays.copyOfRange(bytes, bb.position(), bytes.length); 101 | if (computeChecksum(data) != checksum) 102 | throw new InvalidObjectException("Bad checksum"); 103 | 104 | return new BloomFilter(data, cardinality, 1.0 / errorRate); 105 | } 106 | 107 | public static byte[] dump(BloomFilter bf) { 108 | // 8 is the size of the header 109 | byte[] bytes = new byte[bf.bytes + 8]; 110 | ByteBuffer bb = ByteBuffer.wrap(bytes); 111 | bb.order(ByteOrder.BIG_ENDIAN); 112 | 113 | bb.putShort(computeChecksum(bf.bf)); 114 | bb.putShort((short) (1.0 / bf.error)); 115 | bb.putInt(bf.entries); 116 | bb.put(bf.bf); 117 | 118 | return bytes; 119 | } 120 | 121 | private static long unsigned(int i) { 122 | return i & 0xffffffffl; 123 | } 124 | 125 | /** 126 | * check existence or add an entry 127 | * @param key the key to check/add 128 | * @param add whether we add or just check the existence 129 | * @return true if the key is already in the filter 130 | */ 131 | private boolean checkAdd(String key,boolean add) { 132 | 133 | int hits = 0; 134 | long a = unsigned(Murmur2.hash32(key, 0x9747b28c)); 135 | long b = unsigned(Murmur2.hash32(key, (int) a)); 
136 | 137 | 138 | 139 | for (int i = 0; i < hashes; i++) { 140 | long x = unsigned ((int)(a + i*b)) % bits; 141 | long bt = x >> 3; 142 | 143 | byte c = bf[(int)bt]; // expensive memory access 144 | byte mask = (byte)(1 << (x % 8)); 145 | 146 | if ((c & mask) != 0) { 147 | hits++; 148 | } else { 149 | if (add) { 150 | bf[(int)bt] = (byte)(c | mask); 151 | } 152 | } 153 | 154 | 155 | } 156 | 157 | return hits == hashes; 158 | } 159 | 160 | 161 | /** 162 | * Check whether the filter contains a string 163 | * @param key the string to check 164 | * @return true if it already exists in the filter 165 | */ 166 | public boolean contains(String key) { 167 | return checkAdd(key, false); 168 | } 169 | 170 | 171 | /** 172 | * Add a string to the filter 173 | * @param key the string to add 174 | * @return true if the string was already in the filter 175 | */ 176 | public boolean add(String key) { 177 | return checkAdd(key, true); 178 | } 179 | } 180 | -------------------------------------------------------------------------------- /java/src/main/java/me/everything/inbloom/Murmur2.java: -------------------------------------------------------------------------------- 1 | package me.everything.inbloom; 2 | 3 | /** 4 | * murmur hash 2.0. 5 | * 6 | * The murmur hash is a relatively fast hash function from 7 | * http://murmurhash.googlepages.com/ for platforms with efficient 8 | * multiplication. 9 | * 10 | * This is a re-implementation of the original C code plus some 11 | * additional features. 12 | * 13 | * Public domain. 14 | * 15 | * @author Viliam Holub 16 | * @version 1.0.2 17 | * 18 | */ 19 | public final class Murmur2 { 20 | 21 | // all methods static; private constructor. 22 | private Murmur2() {} 23 | 24 | public static int hash32(String data, int seed) { 25 | final byte[] bytes = data.getBytes(); 26 | return hash32(bytes, bytes.length, seed ); 27 | } 28 | /** 29 | * Generates 32 bit hash from byte array of the given length and 30 | * seed. 
31 | * 32 | * @param data byte array to hash 33 | * @param length length of the array to hash 34 | * @param seed initial seed value 35 | * @return 32 bit hash of the given array 36 | */ 37 | public static int hash32(final byte[] data, int length, int seed) { 38 | // 'm' and 'r' are mixing constants generated offline. 39 | // They're not really 'magic', they just happen to work well. 40 | final int m = 0x5bd1e995; 41 | final int r = 24; 42 | 43 | // Initialize the hash to a random value 44 | int h = seed^length; 45 | int length4 = length/4; 46 | 47 | for (int i = 0; i < length4; i++) { 48 | final int i4 = i*4; 49 | int k = (data[i4+0]&0xff) + ((data[i4+1]&0xff) << 8) 50 | + ((data[i4+2]&0xff) << 16) + ((data[i4+3]&0xff) << 24); 51 | k *= m; 52 | k ^= k >>> r; 53 | k *= m; 54 | h *= m; 55 | h ^= k; 56 | } 57 | 58 | // Handle the last few bytes of the input array 59 | switch (length%4) { 60 | case 3: h ^= (data[(length&~3) +2]&0xff) << 16; 61 | case 2: h ^= (data[(length&~3) +1]&0xff) << 8; 62 | case 1: h ^= (data[length&~3]&0xff); 63 | h *= m; 64 | } 65 | 66 | h ^= h >>> 13; 67 | h *= m; 68 | h ^= h >>> 15; 69 | 70 | return h; 71 | } 72 | 73 | /** 74 | * Generates 32 bit hash from byte array with default seed value. 75 | * 76 | * @param data byte array to hash 77 | * @param length length of the array to hash 78 | * @return 32 bit hash of the given array 79 | */ 80 | public static int hash32(final byte[] data, int length) { 81 | return hash32(data, length, 0x9747b28c); 82 | } 83 | 84 | /** 85 | * Generates 32 bit hash from a string. 86 | * 87 | * @param text string to hash 88 | * @return 32 bit hash of the given string 89 | */ 90 | public static int hash32(final String text) { 91 | final byte[] bytes = text.getBytes(); 92 | return hash32(bytes, bytes.length); 93 | } 94 | 95 | /** 96 | * Generates 32 bit hash from a substring.
97 | * 98 | * @param text string to hash 99 | * @param from starting index 100 | * @param length length of the substring to hash 101 | * @return 32 bit hash of the given string 102 | */ 103 | public static int hash32(final String text, int from, int length) { 104 | return hash32(text.substring( from, from+length)); 105 | } 106 | 107 | /** 108 | * Generates 64 bit hash from byte array of the given length and seed. 109 | * 110 | * @param data byte array to hash 111 | * @param length length of the array to hash 112 | * @param seed initial seed value 113 | * @return 64 bit hash of the given array 114 | */ 115 | public static long hash64(final byte[] data, int length, int seed) { 116 | final long m = 0xc6a4a7935bd1e995L; 117 | final int r = 47; 118 | 119 | long h = (seed&0xffffffffL)^(length*m); 120 | 121 | int length8 = length/8; 122 | 123 | for (int i = 0; i < length8; i++) { 124 | final int i8 = i*8; 125 | long k = ((long)data[i8+0]&0xff) + (((long)data[i8+1]&0xff) << 8) 126 | + (((long)data[i8+2]&0xff) << 16) + (((long)data[i8+3]&0xff) << 24) 127 | + (((long)data[i8+4]&0xff) << 32) + (((long)data[i8+5]&0xff) << 40) 128 | + (((long)data[i8+6]&0xff) << 48) + (((long)data[i8+7]&0xff) << 56); 129 | 130 | k *= m; 131 | k ^= k >>> r; 132 | k *= m; 133 | 134 | h ^= k; 135 | h *= m; 136 | } 137 | 138 | switch (length%8) { 139 | case 7: h ^= (long)(data[(length&~7)+6]&0xff) << 48; 140 | case 6: h ^= (long)(data[(length&~7)+5]&0xff) << 40; 141 | case 5: h ^= (long)(data[(length&~7)+4]&0xff) << 32; 142 | case 4: h ^= (long)(data[(length&~7)+3]&0xff) << 24; 143 | case 3: h ^= (long)(data[(length&~7)+2]&0xff) << 16; 144 | case 2: h ^= (long)(data[(length&~7)+1]&0xff) << 8; 145 | case 1: h ^= (long)(data[length&~7]&0xff); 146 | h *= m; 147 | }; 148 | 149 | h ^= h >>> r; 150 | h *= m; 151 | h ^= h >>> r; 152 | 153 | return h; 154 | } 155 | 156 | /** 157 | * Generates 64 bit hash from byte array with default seed value. 158 | * 159 | * @param data byte array to hash 160 | * @param length length of the array to hash 161 | * @return 64 bit hash of the given array 162 | */ 163 | public static long hash64(final byte[] data, int length) { 164 | return hash64(data, length, 0xe17a1465); 165 | } 166 | 167 | /** 168 | * Generates 64 bit hash from a string.
169 | * 170 | * @param text string to hash 171 | * @return 64 bit hash of the given string 172 | */ 173 | public static long hash64(final String text) { 174 | final byte[] bytes = text.getBytes(); 175 | return hash64(bytes, bytes.length); 176 | } 177 | 178 | /** 179 | * Generates 64 bit hash from a substring. 180 | * 181 | * @param text string to hash 182 | * @param from starting index 183 | * @param length length of the substring to hash 184 | * @return 64 bit hash of the given array 185 | */ 186 | public static long hash64(final String text, int from, int length) { 187 | return hash64(text.substring( from, from+length)); 188 | } 189 | } -------------------------------------------------------------------------------- /java/src/test/java/me/everything/inbloom/BloomFilterTest.java: -------------------------------------------------------------------------------- 1 | package me.everything.inbloom; 2 | 3 | import junit.framework.TestCase; 4 | 5 | import java.io.InvalidObjectException; 6 | 7 | /** 8 | * Created by dvirsky on 23/07/15. 9 | */ 10 | public class BloomFilterTest extends TestCase { 11 | 12 | public void testCreateFilterWithBadParameters() { 13 | try { 14 | BloomFilter bf = new BloomFilter( -1, 0.01 ); 15 | fail( "should have thrown an exception" ); 16 | } 17 | catch( RuntimeException e ) { 18 | assertEquals( "Invalid params for bloom filter", e.getMessage() ); 19 | } 20 | 21 | // 22 | // Error value cannot be arbitrarily close to zero. Must be bounded. 23 | // 24 | try { 25 | BloomFilter bf = new BloomFilter( 1, 0.000000001 ); 26 | fail( "should have thrown an exception: error value is not bounded by a precision or tolerance value." ); 27 | } 28 | catch( RuntimeException e ) { 29 | assertEquals( "Invalid params for bloom filter", e.getMessage() ); 30 | } 31 | 32 | // 33 | // Creating an unusable bloom filter (zero bits, zero hashes) should fail. 
34 | // 35 | try { 36 | BloomFilter bf = new BloomFilter(199, 1.0); 37 | fail( "should have thrown an exception: created an unusable bloom filter with zero bits and zero hashes." ); 38 | } 39 | catch( RuntimeException e ) { 40 | assertEquals( "Invalid params for bloom filter", e.getMessage() ); 41 | } 42 | 43 | // 44 | // Giving it an error value that is too big should fail. 45 | // 46 | try { 47 | BloomFilter bf = new BloomFilter(199, 100.0); 48 | fail( "should have thrown an exception: error value is too big." ); 49 | } 50 | catch( RuntimeException e ) { 51 | assertEquals( "Invalid params for bloom filter", e.getMessage() ); 52 | } 53 | 54 | // 55 | // Adding more data than expected should fail. 56 | // 57 | try { 58 | byte []data = "add more entries than bytes available".getBytes(); 59 | BloomFilter bf0 = new BloomFilter(data, 1, 0.1); 60 | fail( "should have thrown an exception: too much data." ); 61 | } 62 | catch( RuntimeException e ) { 63 | assertEquals( "Expected 1 bytes, got 37", e.getMessage() ); 64 | } 65 | 66 | } 67 | 68 | 69 | public void testValuesFromPublicAPI() { 70 | BloomFilter bf = null; 71 | assertEquals(0.000000001, bf.errorPrecision); 72 | 73 | bf = new BloomFilter(1, 0.01); 74 | assertEquals(2, bf.bytes); 75 | assertEquals(1, bf.entries); 76 | assertEquals(0.01, bf.error); 77 | assertEquals(9, bf.bits); 78 | assertEquals(7, bf.hashes); 79 | 80 | bf = new BloomFilter(1, 0.1); 81 | assertEquals(1, bf.bytes); 82 | assertEquals(1, bf.entries); 83 | assertEquals(0.1, bf.error); 84 | assertEquals(4, bf.bits); 85 | assertEquals(4, bf.hashes); 86 | 87 | bf = new BloomFilter(8, 0.000001); 88 | assertEquals(29, bf.bytes); 89 | assertEquals(8, bf.entries); 90 | assertEquals(0.000001, bf.error); 91 | assertEquals(230, bf.bits); 92 | assertEquals(20, bf.hashes); 93 | } 94 | 95 | 96 | public void testFilter() throws InvalidObjectException { 97 | BloomFilter bf = new BloomFilter(20, 0.01); 98 | 99 | 100 | bf.add("foo"); 101 | bf.add("bar"); 102 | 
bf.add("foosdfsdfs"); 103 | bf.add("fossdfsdfo"); 104 | bf.add("foasdfasdfasdfasdfo"); 105 | bf.add("foasdfasdfasdasdfasdfasdfasdfasdfo"); 106 | 107 | 108 | assertTrue(bf.contains("foo")); 109 | assertTrue(bf.contains("bar")); 110 | 111 | assertFalse(bf.contains("baz")); 112 | assertFalse(bf.contains("faskdjfhsdkfjhsjdkfhskdjfh")); 113 | 114 | 115 | BloomFilter bf2 = new BloomFilter(bf.bf, bf.entries, bf.error); 116 | assertTrue(bf2.contains("foo")); 117 | assertTrue(bf2.contains("bar")); 118 | 119 | assertFalse(bf2.contains("baz")); 120 | assertFalse(bf2.contains("faskdjfhsdkfjhsjdkfhskdjfh")); 121 | 122 | String serialized = BinAscii.hexlify(BloomFilter.dump(bf)); 123 | System.out.printf("Serialized: %s\n", serialized); 124 | 125 | String hexPayload = "620d006400000014000000000020001000080000000000002000100008000400"; 126 | BloomFilter deserialized = BloomFilter.load(BinAscii.unhexlify(hexPayload)); 127 | String dump = BinAscii.hexlify(BloomFilter.dump(deserialized)); 128 | System.out.printf("Re-Serialized: %s\n", dump); 129 | assertEquals(dump.toLowerCase(), hexPayload); 130 | 131 | //BloomFilter deserialized = BloomFilter.load(BloomFilter.dump(bf)); 132 | 133 | 134 | assertEquals(deserialized.entries, 20); 135 | assertEquals(deserialized.error, 0.01); 136 | assertTrue(deserialized.contains("abc")); 137 | } 138 | } 139 | -------------------------------------------------------------------------------- /py/MANIFEST.in: -------------------------------------------------------------------------------- 1 | include VERSION 2 | include README.rst 3 | graft vendor 4 | include inbloom/crc32.c 5 | -------------------------------------------------------------------------------- /py/README.md: -------------------------------------------------------------------------------- 1 | # inbloom (Python) 2 | 3 | - https://github.com/EverythingMe/inbloom/tree/master/py 4 | - https://pypi.python.org/pypi/inbloom/ 5 | 6 | Package inbloom implements a portable bloom filter that can 
export and import 7 | data to and from implementations of the same library in different languages. 8 | 9 | This implementation is a C extension which wraps libbloom (https://github.com/jvirkki/libbloom) 10 | 11 | 12 | ## Installation 13 | 14 | ```bash 15 | pip install inbloom 16 | ``` 17 | 18 | ## Usage 19 | 20 | ```python 21 | import inbloom 22 | 23 | bf = inbloom.Filter(entries=100, error=0.01) 24 | bf.add("abc") 25 | bf.add("def") 26 | 27 | assert bf.contains("abc") 28 | assert bf.contains("def") 29 | assert not bf.contains("ghi") 30 | 31 | bf2 = inbloom.Filter(entries=100, error=0.01, data=bf.buffer()) 32 | assert bf2.contains("abc") 33 | assert bf2.contains("def") 34 | assert not bf2.contains("ghi") 35 | ``` 36 | 37 | ##### Serialization 38 | 39 | ```python 40 | import inbloom 41 | import binascii 42 | 43 | payload = '620d006400000014000000000020001000080000000000002000100008000400' 44 | assert binascii.hexlify(inbloom.dump(inbloom.load(binascii.unhexlify(payload)))) == payload 45 | ``` 46 | -------------------------------------------------------------------------------- /py/README.rst: -------------------------------------------------------------------------------- 1 | inbloom (Python) 2 | ================ 3 | 4 | - https://github.com/EverythingMe/inbloom/tree/master/py 5 | - https://pypi.python.org/pypi/inbloom/ 6 | 7 | Package inbloom implements a portable bloom filter that can export and 8 | import data to and from implementations of the same library in different 9 | languages. 10 | 11 | This implementation is a C extension which wraps libbloom 12 | (https://github.com/jvirkki/libbloom) 13 | 14 | Installation 15 | ------------ 16 | 17 | .. code:: bash 18 | 19 | pip install inbloom 20 | 21 | Usage 22 | ----- 23 | 24 | .. 
code:: python 25 | 26 | import inbloom 27 | 28 | bf = inbloom.Filter(entries=100, error=0.01) 29 | bf.add("abc") 30 | bf.add("def") 31 | 32 | assert bf.contains("abc") 33 | assert bf.contains("def") 34 | assert not bf.contains("ghi") 35 | 36 | bf2 = inbloom.Filter(entries=100, error=0.01, data=bf.buffer()) 37 | assert bf2.contains("abc") 38 | assert bf2.contains("def") 39 | assert not bf2.contains("ghi") 40 | 41 | Serialization 42 | ''''''''''''' 43 | 44 | .. code:: python 45 | 46 | import inbloom 47 | import binascii 48 | 49 | payload = '620d006400000014000000000020001000080000000000002000100008000400' 50 | assert binascii.hexlify(inbloom.dump(inbloom.load(binascii.unhexlify(payload)))) == payload 51 | 52 | -------------------------------------------------------------------------------- /py/VERSION: -------------------------------------------------------------------------------- 1 | 0.2.2 2 | -------------------------------------------------------------------------------- /py/generate_rst: -------------------------------------------------------------------------------- 1 | pandoc --from=markdown --to=rst --output=README.rst README.md 2 | -------------------------------------------------------------------------------- /py/inbloom/crc32.c: -------------------------------------------------------------------------------- 1 | /*- 2 | * COPYRIGHT (C) 1986 Gary S. Brown. You may use this program, or 3 | * code or tables extracted from it, as desired without restriction. 4 | * 5 | * First, the polynomial itself and its table of feedback terms. The 6 | * polynomial is 7 | * X^32+X^26+X^23+X^22+X^16+X^12+X^11+X^10+X^8+X^7+X^5+X^4+X^2+X^1+X^0 8 | * 9 | * Note that we take it "backwards" and put the highest-order term in 10 | * the lowest-order bit. The X^32 term is "implied"; the LSB is the 11 | * X^31 term, etc. 
The X^0 term (usually shown as "+1") results in 12 | * the MSB being 1. 13 | * 14 | * Note that the usual hardware shift register implementation, which 15 | * is what we're using (we're merely optimizing it by doing eight-bit 16 | * chunks at a time) shifts bits into the lowest-order term. In our 17 | * implementation, that means shifting towards the right. Why do we 18 | * do it this way? Because the calculated CRC must be transmitted in 19 | * order from highest-order term to lowest-order term. UARTs transmit 20 | * characters in order from LSB to MSB. By storing the CRC this way 21 | * we hand it to the UART in the order low-byte to high-byte; the UART 22 | * sends each low-bit to high-bit; and the result is transmission bit 23 | * by bit from highest- to lowest-order term without requiring any bit 24 | * shuffling on our part. Reception works similarly. 25 | * 26 | * The feedback terms table consists of 256, 32-bit entries. Notes: 27 | * 28 | * The table can be generated at runtime if desired; code to do so 29 | * is shown later. It might not be obvious, but the feedback 30 | * terms simply represent the results of eight shift/xor operations 31 | * for all combinations of data and CRC register values. 32 | * 33 | * The values must be right-shifted by eight bits by the "updcrc" 34 | * logic; the shift must be unsigned (bring in zeroes). On some 35 | * hardware you could probably optimize the shift in assembler by 36 | * using byte-swap instructions. 37 | * polynomial $edb88320 38 | * 39 | * 40 | * CRC32 code derived from work by Gary S. Brown.
41 | */ 42 | 43 | #include <stddef.h> 44 | #include <stdint.h> 45 | 46 | static uint32_t crc32_tab[] = { 47 | 0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419, 0x706af48f, 48 | 0xe963a535, 0x9e6495a3, 0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988, 49 | 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91, 0x1db71064, 0x6ab020f2, 50 | 0xf3b97148, 0x84be41de, 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7, 51 | 0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec, 0x14015c4f, 0x63066cd9, 52 | 0xfa0f3d63, 0x8d080df5, 0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172, 53 | 0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b, 0x35b5a8fa, 0x42b2986c, 54 | 0xdbbbc9d6, 0xacbcf940, 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59, 55 | 0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116, 0x21b4f4b5, 0x56b3c423, 56 | 0xcfba9599, 0xb8bda50f, 0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924, 57 | 0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d, 0x76dc4190, 0x01db7106, 58 | 0x98d220bc, 0xefd5102a, 0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433, 59 | 0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818, 0x7f6a0dbb, 0x086d3d2d, 60 | 0x91646c97, 0xe6635c01, 0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e, 61 | 0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457, 0x65b0d9c6, 0x12b7e950, 62 | 0x8bbeb8ea, 0xfcb9887c, 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65, 63 | 0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2, 0x4adfa541, 0x3dd895d7, 64 | 0xa4d1c46d, 0xd3d6f4fb, 0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0, 65 | 0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9, 0x5005713c, 0x270241aa, 66 | 0xbe0b1010, 0xc90c2086, 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f, 67 | 0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4, 0x59b33d17, 0x2eb40d81, 68 | 0xb7bd5c3b, 0xc0ba6cad, 0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a, 69 | 0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683, 0xe3630b12, 0x94643b84, 70 | 0x0d6d6a3e, 0x7a6a5aa8, 0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1, 71 | 0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe, 0xf762575d,
0x806567cb, 72 | 0x196c3671, 0x6e6b06e7, 0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc, 73 | 0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5, 0xd6d6a3e8, 0xa1d1937e, 74 | 0x38d8c2c4, 0x4fdff252, 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b, 75 | 0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60, 0xdf60efc3, 0xa867df55, 76 | 0x316e8eef, 0x4669be79, 0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236, 77 | 0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f, 0xc5ba3bbe, 0xb2bd0b28, 78 | 0x2bb45a92, 0x5cb36a04, 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d, 79 | 0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a, 0x9c0906a9, 0xeb0e363f, 80 | 0x72076785, 0x05005713, 0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38, 81 | 0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21, 0x86d3d2d4, 0xf1d4e242, 82 | 0x68ddb3f8, 0x1fda836e, 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777, 83 | 0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c, 0x8f659eff, 0xf862ae69, 84 | 0x616bffd3, 0x166ccf45, 0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2, 85 | 0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db, 0xaed16a4a, 0xd9d65adc, 86 | 0x40df0b66, 0x37d83bf0, 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9, 87 | 0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6, 0xbad03605, 0xcdd70693, 88 | 0x54de5729, 0x23d967bf, 0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94, 89 | 0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d 90 | }; 91 | 92 | uint32_t 93 | crc32(uint32_t crc, const void *buf, size_t size) 94 | { 95 | const uint8_t *p; 96 | 97 | p = buf; 98 | crc = crc ^ ~0U; 99 | 100 | while (size--) 101 | crc = crc32_tab[(crc ^ *p++) & 0xFF] ^ (crc >> 8); 102 | 103 | return crc ^ ~0U; 104 | } 105 | -------------------------------------------------------------------------------- /py/inbloom/inbloom.c: -------------------------------------------------------------------------------- 1 | #include <Python.h> 2 | #include <arpa/inet.h> 3 | #include "../vendor/libbloom/bloom.h" 4 | #include "crc32.c" 5 | 6 | static char module_docstring[] = "Python wrapper for
libbloom"; 7 | 8 | typedef struct { 9 | PyObject_HEAD; 10 | struct bloom *_bloom_struct; 11 | } Filter; 12 | 13 | static PyTypeObject FilterType = { 14 | PyObject_HEAD_INIT(NULL) 15 | 0, /*ob_size*/ 16 | "inbloom.Filter", /*tp_name*/ 17 | sizeof(Filter), /*tp_basicsize*/ 18 | 0, /*tp_itemsize*/ 19 | 0, /*tp_dealloc*/ 20 | 0, /*tp_print*/ 21 | 0, /*tp_getattr*/ 22 | 0, /*tp_setattr*/ 23 | 0, /*tp_compare*/ 24 | 0, /*tp_repr*/ 25 | 0, /*tp_as_number*/ 26 | 0, /*tp_as_sequence*/ 27 | 0, /*tp_as_mapping*/ 28 | 0, /*tp_hash */ 29 | 0, /*tp_call*/ 30 | 0, /*tp_str*/ 31 | 0, /*tp_getattro*/ 32 | 0, /*tp_setattro*/ 33 | 0, /*tp_as_buffer*/ 34 | Py_TPFLAGS_DEFAULT, /*tp_flags*/ 35 | "Filter objects", /*tp_doc*/ 36 | }; 37 | 38 | struct serialized_filter_header { 39 | uint16_t checksum; 40 | uint16_t error_rate; 41 | uint32_t cardinality; 42 | }; 43 | 44 | static PyObject *InBloomError; 45 | 46 | static PyObject * 47 | instantiate_filter(uint32_t cardinality, uint16_t error_rate, const char *data, int datalen) 48 | { 49 | PyObject *args = Py_BuildValue("(ids#)", cardinality, 1.0 / error_rate, data, datalen); 50 | PyObject *obj = FilterType.tp_new(&FilterType, args, NULL); 51 | if (FilterType.tp_init(obj, args, NULL) < 0) { 52 | Py_DECREF(obj); 53 | obj = NULL; 54 | } 55 | Py_DECREF(args); 56 | return obj; 57 | } 58 | 59 | /* helpers */ 60 | static uint16_t 61 | compute_checksum(const char *buf, size_t len) 62 | { 63 | uint32_t checksum32 = crc32(0, buf, len); 64 | return (checksum32 & 0xFFFF) ^ (checksum32 >> 16); 65 | } 66 | 67 | static uint16_t 68 | read_uint16(const char **buffer) 69 | { 70 | uint16_t ret = ntohs(*((uint16_t *)*buffer)); 71 | *buffer += sizeof(uint16_t); 72 | return ret; 73 | } 74 | 75 | static uint32_t 76 | read_uint32(const char **buffer) 77 | { 78 | uint32_t ret = ntohl(*((uint32_t *)*buffer)); 79 | *buffer += sizeof(uint32_t); 80 | return ret; 81 | } 82 | 83 | 84 | /* serialization */ 85 | static PyObject * 86 | load(PyObject *self, PyObject *args) 87 
| { 88 | const char *buffer; 89 | Py_ssize_t buflen; 90 | if (!PyArg_ParseTuple(args, "s#", &buffer, &buflen)) { 91 | return NULL; 92 | } 93 | 94 | if ((int)buflen < sizeof(struct serialized_filter_header) + 1) { 95 | PyErr_SetString(InBloomError, "incomplete payload"); 96 | return NULL; 97 | } 98 | 99 | struct serialized_filter_header header; 100 | header.checksum = read_uint16(&buffer); 101 | header.error_rate = read_uint16(&buffer); 102 | header.cardinality = read_uint32(&buffer); 103 | const char *data = buffer; 104 | size_t datalen = (int)buflen - sizeof(struct serialized_filter_header); 105 | uint16_t expected_checksum = compute_checksum(data, datalen); 106 | if (expected_checksum != header.checksum) { 107 | PyErr_SetString(InBloomError, "checksum mismatch"); 108 | return NULL; 109 | } 110 | return instantiate_filter(header.cardinality, header.error_rate, data, datalen); 111 | } 112 | 113 | static PyObject * 114 | dump(PyObject *self, PyObject *args) 115 | { 116 | Filter *filter; 117 | if (!PyArg_ParseTuple(args, "O", &filter)) { 118 | return NULL; 119 | } 120 | uint16_t checksum = compute_checksum((const char *)filter->_bloom_struct->bf, filter->_bloom_struct->bytes); 121 | 122 | struct serialized_filter_header header = {htons(checksum), htons(1.0 / filter->_bloom_struct->error), htonl(filter->_bloom_struct->entries)}; 123 | PyObject *serial_header = PyString_FromStringAndSize((const char *)&header, sizeof(struct serialized_filter_header)); 124 | PyObject *serial_data = PyString_FromStringAndSize((const char *)filter->_bloom_struct->bf, filter->_bloom_struct->bytes); 125 | PyString_Concat(&serial_header, serial_data); 126 | Py_DECREF(serial_data); 127 | return serial_header; 128 | } 129 | 130 | static PyMethodDef module_methods[] = { 131 | {"load", (PyCFunction)load, METH_VARARGS, 132 | "load a serialized filter"}, 133 | {"dump", (PyCFunction)dump, METH_VARARGS, 134 | "dump a filter into a string"}, 135 | {NULL} 136 | }; 137 | 138 | /* Filter methods */ 139 
| static PyObject * 140 | Filter_add(Filter *self, PyObject *args) 141 | { 142 | const char *buffer; 143 | Py_ssize_t buflen; 144 | if (!PyArg_ParseTuple(args, "s#", &buffer, &buflen)) { 145 | return NULL; 146 | } 147 | 148 | bloom_add(self->_bloom_struct, buffer, buflen); 149 | Py_RETURN_NONE; 150 | } 151 | 152 | static PyObject * 153 | Filter_check(Filter *self, PyObject *args) 154 | { 155 | const char *buffer; 156 | Py_ssize_t buflen; 157 | if (!PyArg_ParseTuple(args, "s#", &buffer, &buflen)) { 158 | return NULL; 159 | } 160 | 161 | if (bloom_check(self->_bloom_struct, buffer, buflen)) 162 | Py_RETURN_TRUE; 163 | else 164 | Py_RETURN_FALSE; 165 | } 166 | 167 | static PyObject * 168 | Filter_buffer(Filter *self, PyObject *args) 169 | { 170 | return PyString_FromStringAndSize((const char *)self->_bloom_struct->bf, self->_bloom_struct->bytes); 171 | } 172 | 173 | static PyMethodDef Filter_methods[] = { 174 | {"add", (PyCFunction)Filter_add, METH_VARARGS, 175 | "add a member to the filter"}, 176 | {"contains", (PyCFunction)Filter_check, METH_VARARGS, 177 | "check if member exists in the filter"}, 178 | {"buffer", (PyCFunction)Filter_buffer, METH_NOARGS, 179 | "get a copy of the internal buffer"}, 180 | {NULL} /* Sentinel */ 181 | }; 182 | 183 | static void 184 | Filter_dealloc(Filter* self) 185 | { 186 | bloom_free(self->_bloom_struct); 187 | free(self->_bloom_struct); 188 | self->ob_type->tp_free((PyObject*)self); 189 | } 190 | 191 | static PyObject * 192 | Filter_new(PyTypeObject *type, PyObject *args, PyObject *kwds) 193 | { 194 | Filter *self; 195 | 196 | self = (Filter *)type->tp_alloc(type, 0); 197 | if (self != NULL) { 198 | self->_bloom_struct = (struct bloom *)malloc(sizeof(struct bloom)); 199 | if (self->_bloom_struct == NULL) 200 | return PyErr_NoMemory(); 201 | } 202 | 203 | return (PyObject *)self; 204 | } 205 | 206 | static int 207 | Filter_init(Filter *self, PyObject *args, PyObject *kwargs) 208 | { 209 | static char *kwlist[] = {"entries", "error",
"data", NULL}; 210 | int entries, success; 211 | double error; 212 | const char *data = NULL; 213 | Py_ssize_t len; 214 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "id|s#", kwlist, &entries, &error, &data, &len)) { 215 | return -1; 216 | } 217 | success = bloom_init(self->_bloom_struct, entries, error); 218 | if (success == 0) { 219 | if (data != NULL) { 220 | if ((int)len != self->_bloom_struct->bytes) { 221 | PyErr_SetString(InBloomError, "invalid data length"); 222 | return -1; 223 | } 224 | memcpy(self->_bloom_struct->bf, (const unsigned char *)data, self->_bloom_struct->bytes); 225 | } 226 | return 0; 227 | } 228 | else { 229 | PyErr_SetString(InBloomError, "internal initialization failed"); 230 | return -1; 231 | } 232 | } 233 | 234 | 235 | #ifndef PyMODINIT_FUNC 236 | #define PyMODINIT_FUNC void 237 | #endif 238 | PyMODINIT_FUNC 239 | initinbloom(void) 240 | { 241 | PyObject *m; 242 | FilterType.tp_new = Filter_new; 243 | FilterType.tp_init = (initproc)Filter_init; 244 | FilterType.tp_methods = Filter_methods; 245 | FilterType.tp_dealloc = (destructor)Filter_dealloc; 246 | if (PyType_Ready(&FilterType) < 0) 247 | return; 248 | 249 | m = Py_InitModule3("inbloom", module_methods, module_docstring); 250 | Py_INCREF(&FilterType); 251 | PyModule_AddObject(m, "Filter", (PyObject *)&FilterType); 252 | 253 | InBloomError = PyErr_NewException("inbloom.error", NULL, NULL); 254 | Py_INCREF(InBloomError); 255 | PyModule_AddObject(m, "error", InBloomError); 256 | } 257 | -------------------------------------------------------------------------------- /py/setup.py: -------------------------------------------------------------------------------- 1 | from distutils.core import setup, Extension 2 | from os import path 3 | 4 | pwd = lambda f: path.join(path.abspath(path.dirname(__file__)), f) 5 | contents = lambda f: open(pwd(f)).read().strip() 6 | 7 | module = Extension('inbloom', 8 | ['inbloom/inbloom.c', 'vendor/libbloom/bloom.c',
'vendor/libbloom/murmur2/MurmurHash2.c'], 9 | include_dirs=['vendor/libbloom/murmur2'] 10 | ) 11 | 12 | setup( 13 | name='inbloom', 14 | author='EverythingMe', 15 | description='Portable, cross language Bloom Filter implementation, with compatible libraries in Java and Go', 16 | long_description=contents('README.rst'), 17 | version=contents('VERSION'), 18 | url='https://github.com/EverythingMe/inbloom', 19 | ext_modules=[module], 20 | license='BSD', 21 | ) 22 | -------------------------------------------------------------------------------- /py/test.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import, division 2 | from unittest import TestCase 3 | from binascii import hexlify 4 | import inbloom 5 | 6 | 7 | class InBloomTestCase(TestCase): 8 | def test_functionality(self): 9 | bf = inbloom.Filter(20, 0.01) 10 | keys = ["foo", "bar", "foosdfsdfs", "fossdfsdfo", "foasdfasdfasdfasdfo", "foasdfasdfasdasdfasdfasdfasdfasdfo"] 11 | faux = ["goo", "gar", "gaz"] 12 | for k in keys: 13 | bf.add(k) 14 | 15 | for k in keys: 16 | assert bf.contains(k) 17 | 18 | for k in faux: 19 | assert not bf.contains(k) 20 | 21 | expected = '02000C0300C2246913049E040002002000017614002B0002' 22 | actual = hexlify(bf.buffer()).upper() 23 | assert expected == actual 24 | 25 | def test_dump_load(self): 26 | bf = inbloom.Filter(20, 0.01) 27 | bf.add('abc') 28 | expected = '620d006400000014000000000020001000080000000000002000100008000400' 29 | actual = hexlify(inbloom.dump(bf)) 30 | assert expected == actual 31 | 32 | bf = inbloom.load(inbloom.dump(bf)) 33 | actual = hexlify(inbloom.dump(bf)) 34 | assert expected == actual 35 | 36 | data = inbloom.dump(bf) 37 | data = '\xff\xff' + data[2:] 38 | 39 | with self.assertRaisesRegexp(inbloom.error, "checksum mismatch"): 40 | inbloom.load(data) 41 | 42 | data = data[:4] 43 | with self.assertRaisesRegexp(inbloom.error, "incomplete payload"): 44 | inbloom.load(data) 45 | 
--------------------------------------------------------------------------------
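The payload layout produced by `inbloom.dump` and checked by `inbloom.load` can be sketched in pure Python. This is only an illustration inferred from `struct serialized_filter_header` and `compute_checksum` in `py/inbloom/inbloom.c`; it is not part of the package. The names `HEADER`, `fold_crc32`, `pack_filter` and `unpack_filter` are hypothetical, and the sketch assumes `zlib.crc32` yields the same value as the bundled `crc32.c`, which appears to use the standard reflected polynomial `0xedb88320`.

```python
import binascii
import struct
import zlib

# Assumed header layout, in network byte order:
# uint16 checksum, uint16 inverse error rate (1 / error), uint32 cardinality.
HEADER = struct.Struct("!HHI")


def fold_crc32(data):
    """Fold a 32-bit CRC into 16 bits, mirroring compute_checksum() in inbloom.c."""
    crc = zlib.crc32(data) & 0xFFFFFFFF
    return (crc & 0xFFFF) ^ (crc >> 16)


def pack_filter(entries, inverse_error, bits):
    """Hypothetical helper: prepend the 8-byte header to the raw filter bytes."""
    return HEADER.pack(fold_crc32(bits), inverse_error, entries) + bits


def unpack_filter(payload):
    """Hypothetical helper: split a payload into (entries, error, bits)."""
    if len(payload) < HEADER.size + 1:
        raise ValueError("incomplete payload")
    checksum, inverse_error, entries = HEADER.unpack_from(payload)
    bits = payload[HEADER.size:]
    if fold_crc32(bits) != checksum:
        raise ValueError("checksum mismatch")
    return entries, 1.0 / inverse_error, bits


if __name__ == "__main__":
    # 24 zero bytes: the same data length as the sample payload in the tests
    # (entries=20, error=0.01), but with no members added.
    raw = bytes(24)
    payload = pack_filter(20, 100, raw)
    print(binascii.hexlify(payload[2:8]))  # header fields after the checksum
    print(unpack_filter(payload))
```

Storing the error rate as its integer inverse keeps the header at a fixed 8 bytes, at the cost of only representing error rates of the form 1/n with n below 65536.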