├── .gitmodules
├── LICENSE
├── README.md
├── go
│   ├── .gitignore
│   ├── LICENSE
│   ├── README.md
│   ├── inbloom
│   │   ├── bloom.go
│   │   └── bloom_test.go
│   └── internal
│       └── gomurmur
│           ├── LICENSE
│           ├── README.md
│           ├── gomurmur.go
│           └── gomurmur_test.go
├── inbloom.png
├── java
│   ├── .gitignore
│   ├── InBloom.iml
│   ├── README.md
│   ├── build.gradle
│   ├── gradle
│   │   └── wrapper
│   │       └── gradle-wrapper.properties
│   ├── gradlew
│   ├── gradlew.bat
│   ├── settings.gradle
│   └── src
│       ├── main
│       │   └── java
│       │       ├── META-INF
│       │       │   └── MANIFEST.MF
│       │       └── me
│       │           └── everything
│       │               └── inbloom
│       │                   ├── BinAscii.java
│       │                   ├── BloomFilter.java
│       │                   └── Murmur2.java
│       └── test
│           └── java
│               └── me
│                   └── everything
│                       └── inbloom
│                           └── BloomFilterTest.java
└── py
    ├── MANIFEST.in
    ├── README.md
    ├── README.rst
    ├── VERSION
    ├── generate_rst
    ├── inbloom
    │   ├── crc32.c
    │   └── inbloom.c
    ├── setup.py
    └── test.py
/.gitmodules:
--------------------------------------------------------------------------------
1 | [submodule "py/vendor/libbloom"]
2 | path = py/vendor/libbloom
3 | url = https://github.com/jvirkki/libbloom
4 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2015, EverythingMe
2 | All rights reserved.
3 |
4 | Redistribution and use in source and binary forms, with or without
5 | modification, are permitted provided that the following conditions are met:
6 |
7 | * Redistributions of source code must retain the above copyright notice, this
8 | list of conditions and the following disclaimer.
9 |
10 | * Redistributions in binary form must reproduce the above copyright notice,
11 | this list of conditions and the following disclaimer in the documentation
12 | and/or other materials provided with the distribution.
13 |
14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
17 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
24 |
25 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## inbloom
2 |
3 | _inbloom_ - a cross language Bloom filter implementation (https://en.wikipedia.org/wiki/Bloom_filter).
4 |
5 | 
6 |
7 | ## What's a Bloom Filter?
8 | A Bloom filter is a probabilistic data structure which provides an extremely space-efficient method of representing large sets.
9 | It can have false positives but never false negatives, which means a query returns either "possibly in set" or "definitely not in set".
10 |
11 | You can tune a Bloom filter to the desired error rate; it is a tradeoff between size and accuracy (see example: http://hur.st/bloomfilter). For example, a filter for about 100 keys with a 1% error rate takes just 120 bytes. With a 5% error rate it takes 78 bytes.
12 |
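The size figures above can be reproduced with a short Go sketch. It follows the sizing arithmetic used in `go/inbloom/bloom.go` (bits per element = -ln(errorRate) / ln(2)^2); the function name `filterSize` is hypothetical and is not part of the library.

```go
package main

import (
	"fmt"
	"math"
)

// filterSize returns the bit count, byte count and number of hash functions
// for a filter sized for `entries` keys at false-positive rate `errorRate`.
// Hypothetical helper; it mirrors the arithmetic in bloom.go.
func filterSize(entries int, errorRate float64) (bits, bytes, hashes int) {
	bpe := -math.Log(errorRate) / (math.Ln2 * math.Ln2) // bits per element
	bits = int(float64(entries) * bpe)                  // truncated, as in bloom.go
	bytes = (bits + 7) / 8                              // round up to whole bytes
	hashes = int(math.Ceil(math.Ln2 * bpe))
	return bits, bytes, hashes
}

func main() {
	for _, rate := range []float64{0.01, 0.05} {
		bits, bytes, hashes := filterSize(100, rate)
		fmt.Printf("rate=%.2f bits=%d bytes=%d hashes=%d\n", rate, bits, bytes, hashes)
	}
}
```

Lowering the error rate raises both the byte count and the number of hash functions, which is the size/accuracy tradeoff described above.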
13 | ## Why Cross Language?
14 | At EverythingMe we have an Android client written in Java, and servers written mostly in Python and Go. When we wanted to pass filters from the client to the server to avoid saving some state on the server side, we needed an efficient implementation that can read and write Bloom Filters in all three languages at least, and found none.
15 |
16 | Having such a library allows us to send filters between clients and any server component easily.
17 |
18 | So we decided to build on top of an existing simple implementation in C called libbloom (https://github.com/jvirkki/libbloom) and expand it to all 3 languages.
19 | We chose to use the original C implementation for the Python version only, and **translated the code to pure Java and Go, without calling any C code**.
20 | We chose this approach because the original C code is fairly short and straightforward, so porting it to other languages was a simple task;
21 | and avoiding calling C from Java and Go simplifies and shortens the build process, and reduces executable size - in both cases.
22 |
23 | ## Filter headers
24 |
25 | InBloom provides utilities for serializing / deserializing Bloom filters so they can be sent over the network.
26 | A Bloom filter is initialized with its expected cardinality and false positive rate,
27 | and the same parameters are needed to read a filter written by another party. Instead of choosing fixed parameters in our configurations, we opted for encoding
28 | those parameters as a header when serializing the filter. We've added a 16 bit checksum for good measure as part of the header.
29 |
30 | ### Serialized filter structure:
31 |
32 | | Field | Type | bits |
33 | | ------------- |:-------------:| -----:|
34 | | checksum | ushort | 16 |
35 | | errorRate (1/N)| ushort | 16 |
36 | | cardinality | int | 32 |
37 | | data | byte[] | ? |
38 |
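As an illustration of the layout in the table, the following Go sketch splits a serialized filter into its header fields and data. It assumes big-endian field order, as written by `Marshal` in `go/inbloom/bloom.go`; the `header` type and the `parseHeader` and `mustParseHex` helpers are hypothetical names, not library API. The sample payload is the hex dump used in the Java example in this README.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"errors"
	"fmt"
)

// header mirrors the serialized layout in the table above (hypothetical type).
type header struct {
	Checksum    uint16 // 16-bit checksum of the data
	InvError    uint16 // N, where errorRate = 1/N
	Cardinality uint32 // expected number of entries
}

const headerSize = 8 // 2 + 2 + 4 bytes

// parseHeader splits a dump into header fields and filter data.
func parseHeader(dump []byte) (header, []byte, error) {
	if len(dump) <= headerSize {
		return header{}, nil, errors.New("dump too short")
	}
	h := header{
		Checksum:    binary.BigEndian.Uint16(dump[0:2]),
		InvError:    binary.BigEndian.Uint16(dump[2:4]),
		Cardinality: binary.BigEndian.Uint32(dump[4:8]),
	}
	return h, dump[headerSize:], nil
}

// mustParseHex decodes a hex dump and parses it, panicking on bad input.
func mustParseHex(s string) (header, []byte) {
	raw, err := hex.DecodeString(s)
	if err != nil {
		panic(err)
	}
	h, data, err := parseHeader(raw)
	if err != nil {
		panic(err)
	}
	return h, data
}

func main() {
	h, data := mustParseHex("620d006400000014000000000020001000080000000000002000100008000400")
	fmt.Printf("checksum=%#04x errorRate=1/%d cardinality=%d dataBytes=%d\n",
		h.Checksum, h.InvError, h.Cardinality, len(data))
}
```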
39 |
40 | ## Installation
41 |
42 | #### Python
43 | ```bash
44 | pip install inbloom
45 | ```
46 |
47 | #### Go
48 | ```bash
49 | go get github.com/EverythingMe/inbloom/go/inbloom
50 | ```
51 |
52 | #### Java
53 |
54 | Add the following lines to your build.gradle script.
55 |
56 | ```groovy
57 | repositories {
58 | jcenter {
59 | url 'http://dl.bintray.com/everythingme/generic'
60 | }
61 | }
62 |
63 | dependencies {
64 | compile 'me.everything:inbloom:0.1'
65 | }
66 | ```
67 |
68 | ### Example Usage
69 |
70 | #### Python
71 | ```python
72 | import inbloom
73 | import base64
74 | import requests
75 |
76 | # Basic usage
77 | bf = inbloom.Filter(entries=100, error=0.01)
78 | bf.add("abc")
79 | bf.add("def")
80 |
81 | assert bf.contains("abc")
82 | assert bf.contains("def")
83 | assert not bf.contains("ghi")
84 |
85 | bf2 = inbloom.Filter(entries=100, error=0.01, data=bf.buffer())
86 | assert bf2.contains("abc")
87 | assert bf2.contains("def")
88 | assert not bf2.contains("ghi")
89 |
90 |
91 | # Serialization
92 | payload = 'Yg0AZAAAABQAAAAAACAAEAAIAAAAAAAAIAAQAAgABAA='
93 | assert base64.b64encode(inbloom.dump(inbloom.load(base64.b64decode(payload)))) == payload
94 |
95 | # Sending it over HTTP
96 | serialized = base64.b64encode(inbloom.dump(bf))
97 | requests.get('http://api.endpoint.me', params={'filter': serialized})
98 | ```
99 |
100 | #### Go
101 | ```go
102 | // create a blank filter - expecting 20 members and an error rate of 1/100
103 | f, err := NewFilter(20, 0.01)
104 | if err != nil {
105 | panic(err)
106 | }
107 |
108 | // the size of the filter
109 | fmt.Println(f.Len())
110 |
111 | // insert some values
112 | f.Add("foo")
113 | f.Add("bar")
114 |
115 | // test for existence of keys
116 | fmt.Println(f.Contains("foo"))
117 | fmt.Println(f.Contains("wat"))
118 |
119 | fmt.Println("marshaled data:", f.MarshalBase64())
120 |
121 | // Output:
122 | // 24
123 | // true
124 | // false
125 | // marshaled data: oU4AZAAAABQAAAAAAEIAABEAGAQAAgAgAAAwEAAJAAA=
126 | ```
127 |
128 | ```go
129 | // a 20 cardinality 0.01 precision filter with "foo" and "bar" in it
130 | data := "oU4AZAAAABQAAAAAAEIAABEAGAQAAgAgAAAwEAAJAAA="
131 |
132 | // load it from base64
133 | f, err := UnmarshalBase64(data)
134 | if err != nil {
135 | panic(err)
136 | }
137 |
138 | // test it...
139 | fmt.Println(f.Contains("foo"))
140 | fmt.Println(f.Contains("wat"))
141 | fmt.Println(f.Len())
142 |
143 | // dump to pure binary
144 | fmt.Printf("%x\n", f.Marshal())
145 | // Output:
146 | // true
147 | // false
148 | // 24
149 | // a14e006400000014000000000042000011001804000200200000301000090000
150 | ```
151 |
152 | #### Java
153 | ```java
154 | import me.everything.inbloom.BloomFilter;
155 | import me.everything.inbloom.BinAscii; // Optional - for hex representation
156 |
157 | // The basics
158 | BloomFilter bf = new BloomFilter(20, 0.01);
159 | bf.add("foo");
160 | bf.add("bar");
161 |
162 | assertTrue(bf.contains("foo"));
163 | assertTrue(bf.contains("bar"));
164 | assertFalse(bf.contains("baz"));
165 |
166 |
167 | BloomFilter bf2 = new BloomFilter(bf.bf, bf.entries, bf.error);
168 | assertTrue(bf2.contains("foo"));
169 | assertTrue(bf2.contains("bar"));
170 | assertFalse(bf2.contains("baz"));
171 |
172 | // Serialization
173 | String serialized = BinAscii.hexlify(BloomFilter.dump(bf));
174 | System.out.printf("Serialized: %s\n", serialized);
175 |
176 | String hexPayload = "620d006400000014000000000020001000080000000000002000100008000400";
177 | BloomFilter deserialized = BloomFilter.load(BinAscii.unhexlify(hexPayload));
178 | String dump = BinAscii.hexlify(BloomFilter.dump(deserialized));
179 | System.out.printf("Re-Serialized: %s\n", dump);
180 | assertEquals(dump.toLowerCase(), hexPayload);
181 |
182 | assertEquals(deserialized.entries, 20);
183 | assertEquals(deserialized.error, 0.01);
184 | assertTrue(deserialized.contains("abc"));
185 | ```
186 |
--------------------------------------------------------------------------------
/go/.gitignore:
--------------------------------------------------------------------------------
1 | cover.out
2 |
--------------------------------------------------------------------------------
/go/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2015, EverythingMe
2 | All rights reserved.
3 |
4 | Redistribution and use in source and binary forms, with or without
5 | modification, are permitted provided that the following conditions are met:
6 |
7 | * Redistributions of source code must retain the above copyright notice, this
8 | list of conditions and the following disclaimer.
9 |
10 | * Redistributions in binary form must reproduce the above copyright notice,
11 | this list of conditions and the following disclaimer in the documentation
12 | and/or other materials provided with the distribution.
13 |
14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
17 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
24 |
25 |
--------------------------------------------------------------------------------
/go/README.md:
--------------------------------------------------------------------------------
1 | # inbloom
2 | --
3 | import "github.com/EverythingMe/inbloom/go/inbloom"
4 |
5 | Package inbloom implements a portable bloom filter that can export and import
6 | data to and from implementations of the same library in different languages.
7 |
8 | ## Installation
9 | ```bash
10 | go get github.com/EverythingMe/inbloom/go/inbloom
11 | ```
12 |
13 | ## Usage
14 |
15 | #### type BloomFilter
16 |
17 | ```go
18 | type BloomFilter struct {
19 | }
20 | ```
21 |
22 | BloomFilter is our implementation of a simple dynamically sized bloom filter.
23 |
24 | This code was adapted to Go from the libbloom C library -
25 | https://github.com/jvirkki/libbloom
26 |
27 | #### func NewFilter
28 |
29 | ```go
30 | func NewFilter(entries int, errorRate float64) (*BloomFilter, error)
31 | ```
32 | NewFilter creates an empty bloom filter, with the given expected number of
33 | entries, and desired error rate. The number of hash functions and size of the
34 | filter are calculated from these 2 parameters
35 |
36 | #### func Unmarshal
37 |
38 | ```go
39 | func Unmarshal(data []byte) (*BloomFilter, error)
40 | ```
41 | Unmarshal reads a binary dump of an inbloom filter with its header, and returns
42 | the resulting filter. Since this is a dump containing size and precision
43 | metadata, you do not need to specify them.
44 |
45 | If the data is corrupt or the buffer is not complete, we return an error
46 |
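The corruption check can be sketched as follows. This mirrors the unexported `checksum` method in `bloom.go`, which folds an IEEE CRC32 of the filter data into 16 bits with XOR; the names `fold16`, `checksum16` and `verify` are hypothetical, and the sample dump is the one used in `TestMarshal`.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"hash/crc32"
)

// fold16 XORs the high and low halves of a 32-bit sum into 16 bits.
func fold16(sum uint32) uint16 {
	return uint16(sum&0xFFFF) ^ uint16(sum>>16)
}

// checksum16 is the folded IEEE CRC32 of the filter data.
func checksum16(data []byte) uint16 {
	return fold16(crc32.ChecksumIEEE(data))
}

// verify reports whether the checksum stored in the first two bytes of a
// hex dump (big-endian) matches the data that follows the 8-byte header.
func verify(hexDump string) bool {
	dump, err := hex.DecodeString(hexDump)
	if err != nil || len(dump) <= 8 {
		return false
	}
	stored := uint16(dump[0])<<8 | uint16(dump[1])
	return stored == checksum16(dump[8:])
}

func main() {
	good := "620d006400000014000000000020001000080000000000002000100008000400"
	bad := "ffff006400000014000000000020001000080000000000002000100008000400"
	fmt.Println(verify(good), verify(bad))
}
```

A 16-bit checksum catches accidental corruption only; it is not a defense against deliberate tampering.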
47 | #### func UnmarshalBase64
48 |
49 | ```go
50 | func UnmarshalBase64(b64 string, encoding ...*base64.Encoding) (*BloomFilter, error)
51 | ```
52 | UnmarshalBase64 is a convenience function that unmarshals a filter that has been
53 | encoded into base64. It uses URLEncoding by default; pass an encoding to use a different one.
54 |
55 | #### func (*BloomFilter) Add
56 |
57 | ```go
58 | func (f *BloomFilter) Add(key string) bool
59 | ```
60 | Add adds a key to the filter
61 |
62 | #### func (*BloomFilter) Contains
63 |
64 | ```go
65 | func (f *BloomFilter) Contains(key string) bool
66 | ```
67 | Contains returns true if a key exists in the filter
68 |
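Both `Add` and `Contains` probe the same bit positions for a key. The sketch below restates how `checkAdd` in `bloom.go` derives them: two MurmurHash2 values `a` and `b` are combined as `(a + i*b) mod bits` for each hash index `i`. The `murmur2` function is a compact restatement of `internal/gomurmur`, and `positions` and `filterHex` are hypothetical helper names rather than library API.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// murmur2 is a compact 32-bit MurmurHash2, following internal/gomurmur.
func murmur2(data []byte, seed uint32) uint32 {
	const m, r = 0x5bd1e995, 24
	h := seed ^ uint32(len(data))
	for ; len(data) >= 4; data = data[4:] {
		k := binary.LittleEndian.Uint32(data)
		k *= m
		k ^= k >> r
		k *= m
		h *= m
		h ^= k
	}
	switch len(data) { // mix in the 1-3 trailing bytes
	case 3:
		h ^= uint32(data[2]) << 16
		fallthrough
	case 2:
		h ^= uint32(data[1]) << 8
		fallthrough
	case 1:
		h ^= uint32(data[0])
		h *= m
	}
	h ^= h >> 13
	h *= m
	h ^= h >> 15
	return h
}

// positions returns the bit indexes probed for key: (a + i*b) mod bits.
func positions(key string, bits, hashes int) []uint32 {
	a := murmur2([]byte(key), 0x9747b28c)
	b := murmur2([]byte(key), a) // second hash is seeded with the first
	out := make([]uint32, hashes)
	for i := range out {
		out[i] = (a + uint32(i)*b) % uint32(bits)
	}
	return out
}

// filterHex builds a 20-entry, 1% filter (191 bits, 7 hashes),
// adds the keys, and returns the raw filter data as hex.
func filterHex(keys ...string) string {
	const bits, hashes = 191, 7
	bf := make([]byte, (bits+7)/8)
	for _, key := range keys {
		for _, x := range positions(key, bits, hashes) {
			bf[x>>3] |= byte(1) << (x % 8)
		}
	}
	return fmt.Sprintf("%x", bf)
}

func main() {
	fmt.Println(positions("abc", 191, 7))
	fmt.Println(filterHex("abc"))
}
```

Deriving every probe from two hash values avoids computing a separate hash per probe, and using the same seed and formula in every port is what keeps the serialized data interchangeable between languages.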
69 | #### func (*BloomFilter) Len
70 |
71 | ```go
72 | func (f *BloomFilter) Len() int
73 | ```
74 | Len returns the number of BYTES in the filter
75 |
76 | #### func (*BloomFilter) Marshal
77 |
78 | ```go
79 | func (f *BloomFilter) Marshal() []byte
80 | ```
81 | Marshal dumps the filter to a byte array, with a header containing the error
82 | rate, cardinality and a checksum. This data can be passed to another inbloom
83 | filter over the network, and thus the other end can open the data without the
84 | user having to pass the filter size explicitly. See Unmarshal for reading these
85 | dumps.
86 |
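One consequence of this header format is worth checking before choosing an error rate. `Marshal` stores the rate as `uint16(1 / errorRate)`, so a rate that is not of the form 1/N is truncated, and the restored rate can imply a different filter size than the one that was written. The sketch below illustrates this under the sizing formula from `bloom.go`; `storedRate` and `filterBytes` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"math"
)

// storedRate is the value written to the header for a given error rate.
func storedRate(errorRate float64) uint16 {
	return uint16(1 / errorRate) // fractional part is dropped
}

// filterBytes is the data length implied by entries and errorRate,
// using the same arithmetic as bloom.go.
func filterBytes(entries int, errorRate float64) int {
	bpe := -math.Log(errorRate) / (math.Ln2 * math.Ln2)
	bits := int(float64(entries) * bpe)
	return (bits + 7) / 8
}

func main() {
	for _, rate := range []float64{0.01, 0.03} {
		n := storedRate(rate)
		restored := 1 / float64(n)
		fmt.Printf("rate=%v stored=1/%d written=%d bytes, reader expects=%d bytes\n",
			rate, n, filterBytes(1000, rate), filterBytes(1000, restored))
	}
}
```

With a rate such as 0.01 the two byte counts agree; with 0.03 they differ, in which case the length check in `Unmarshal` would be expected to reject the dump. Sticking to rates of the form 1/N avoids the mismatch.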
87 | #### func (*BloomFilter) MarshalBase64
88 |
89 | ```go
90 | func (f *BloomFilter) MarshalBase64(encoding ...*base64.Encoding) string
91 | ```
92 | MarshalBase64 is a convenience method that dumps the filter's data to a base64
93 | encoded string. It uses URLEncoding by default, ready to be passed as a GET/POST parameter; pass an encoding to use a different one.
94 |
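The choice of base64 alphabet matters when filters cross language boundaries. The sketch below uses only the standard `encoding/base64` package to show how the two alphabets differ on the first three bytes of the filter used in `TestBase64StdEncoding`; `encodeBoth` and `decodesAsURL` are hypothetical helper names.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeBoth encodes raw with the standard and the URL-safe alphabets.
func encodeBoth(raw []byte) (std, url string) {
	return base64.StdEncoding.EncodeToString(raw), base64.URLEncoding.EncodeToString(raw)
}

// decodesAsURL reports whether s is valid URL-safe base64.
func decodesAsURL(s string) bool {
	_, err := base64.URLEncoding.DecodeString(s)
	return err == nil
}

func main() {
	std, url := encodeBoth([]byte{0x8f, 0xe8, 0x00})
	fmt.Println("std:", std, "url:", url)
	fmt.Println("std payload decodes as URL-safe:", decodesAsURL(std))
}
```

A payload produced with the standard alphabet (for example by Python's `base64.b64encode`) may contain `+` or `/`, so the reader should be given `base64.StdEncoding` explicitly in that case.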
--------------------------------------------------------------------------------
/go/inbloom/bloom.go:
--------------------------------------------------------------------------------
1 | // This code was adapted to Go from the libbloom C library - https://github.com/jvirkki/libbloom
2 | //
3 | // Original copyright note from libbloom:
4 | //
5 | // Copyright (c) 2012, Jyri J. Virkki
6 | // All rights reserved.
7 | // This file is under BSD license. See LICENSE file.
8 |
9 | // Package inbloom implements a portable bloom filter that can export and import data to and from
10 | // implementations of the same library in different languages.
11 | package inbloom
12 |
13 | import (
14 | "bytes"
15 | "encoding/base64"
16 | "encoding/binary"
17 | "errors"
18 | "fmt"
19 | "hash/crc32"
20 | "math"
21 | "unsafe"
22 |
23 | "github.com/EverythingMe/inbloom/go/internal/gomurmur"
24 | )
25 |
26 | const denom = 0.480453013918201
27 |
28 | // BloomFilter is our implementation of a simple dynamically sized bloom filter.
29 | //
30 | // This code was adapted to Go from the libbloom C library - https://github.com/jvirkki/libbloom
31 | type BloomFilter struct {
32 |
33 | // These fields are part of the public interface of this structure.
34 | // Client code may read these values if desired. Client code MUST NOT
35 | // modify any of these.
36 |
37 | entries int
38 | errorRate float64
39 | bits int
40 | bytes int
41 | hashes int
42 |
43 | // Fields below are private to the implementation. These may go away or
44 | // change incompatibly at any moment. Client code MUST NOT access or rely
45 | // on these.
46 |
47 | bpe float64
48 | bf []byte
49 | }
50 |
51 | // NewFilter creates an empty bloom filter, with the given expected number of entries, and desired error rate.
52 | // The number of hash functions and size of the filter are calculated from these 2 parameters
53 | func NewFilter(entries int, errorRate float64) (*BloomFilter, error) {
54 | return newFilterFromData(nil, entries, errorRate)
55 | }
56 |
57 | // newFilterFromData creates a bloom filter from an existing data buffer, created by another instance of this library (probably in another language).
58 | //
59 | // If the length of the data does not fit the number of entries and error rate, we return an error. If data is nil we allocate a new filter
60 | func newFilterFromData(data []byte, entries int, errorRate float64) (*BloomFilter, error) {
61 |
62 | if entries < 1 || errorRate <= 0 || errorRate >= 1 { // a rate of 1 would yield a zero-bit filter
63 | return nil, errors.New("Invalid params for bloom filter")
64 | }
65 |
66 | bpe := -(math.Log(errorRate) / denom)
67 | bits := int(float64(entries) * bpe)
68 |
69 | flt := &BloomFilter{
70 | entries: entries,
71 | errorRate: errorRate,
72 | bpe: bpe,
73 | bits: bits,
74 | bytes: (bits / 8),
75 | hashes: int(math.Ceil(0.693147180559945 * bpe)), // ln(2)
76 | }
77 |
78 | if flt.bits%8 != 0 {
79 | flt.bytes++
80 | }
81 |
82 | if data != nil {
83 | if flt.bytes != len(data) {
84 | return nil, fmt.Errorf("Expected %d bytes, got %d", flt.bytes, len(data))
85 | }
86 | flt.bf = data
87 | } else {
88 | flt.bf = make([]byte, flt.bytes)
89 | }
90 | return flt, nil
91 | }
92 |
93 | // checkAdd checks existence or adds a key to the filter
94 | func (f *BloomFilter) checkAdd(key []byte, add bool) bool {
95 |
96 | hits := 0
97 | a, _ := gomurmur.Sum32(key, 0x9747b28c)
98 | b, _ := gomurmur.Sum32(key, a)
99 |
100 | for i := 0; i < f.hashes; i++ {
101 | x := (a + uint32(i)*b) % uint32(f.bits)
102 | bt := x >> 3
103 |
104 | c := f.bf[bt] // expensive memory access
105 | mask := byte(1) << (x % 8)
106 |
107 | if (c & mask) != 0 {
108 | hits++
109 | } else {
110 | if add {
111 | f.bf[bt] = byte(c | mask)
112 | }
113 | }
114 |
115 | }
116 |
117 | return hits == f.hashes
118 | }
119 |
120 | // Contains returns true if a key exists in the filter
121 | func (f *BloomFilter) Contains(key string) bool {
122 | return f.checkAdd([]byte(key), false)
123 | }
124 |
125 | // Add adds a key to the filter
126 | func (f *BloomFilter) Add(key string) bool {
127 | return f.checkAdd([]byte(key), true)
128 | }
129 |
130 | // Len returns the number of BYTES in the filter
131 | func (f *BloomFilter) Len() int {
132 | return f.bytes
133 | }
134 |
135 | // checksum returns a 16 bit checksum of the data (using xor folded crc32 checksum)
136 | func (f *BloomFilter) checksum() uint16 {
137 |
138 | checksum32 := crc32.ChecksumIEEE(f.bf)
139 | return uint16(checksum32&0xFFFF) ^ uint16(checksum32>>16)
140 |
141 | }
142 |
143 | // The structure of a marshaled binary filter is:
144 | // checksum uint16
145 | // error_rate uint16
146 | // cardinality uint32
147 | // data []byte
148 |
149 | // Marshal dumps the filter to a byte array, with a header containing the error rate, cardinality and a checksum.
150 | // This data can be passed to another inbloom filter over the network, and thus the other end can open the data
151 | // without the user having to pass the filter size explicitly. See Unmarshal for reading these dumps.
152 | func (f *BloomFilter) Marshal() []byte {
153 |
154 | buf := bytes.NewBuffer(make([]byte, 0, len(f.bf)+int(unsafe.Sizeof(uint16(0))*2)+int(unsafe.Sizeof(uint32(0)))))
155 | binary.Write(buf, binary.BigEndian, f.checksum())
156 |
157 | errs := uint16(1 / f.errorRate)
158 | binary.Write(buf, binary.BigEndian, errs)
159 | binary.Write(buf, binary.BigEndian, uint32(f.entries))
160 | buf.Write(f.bf)
161 | return buf.Bytes()
162 | }
163 |
164 | // MarshalBase64 is a convenience method that dumps the filter's data to a base64 encoded string.
165 | // By default it uses URLEncoding, which is ready to be passed as a GET/POST parameter.
166 | // Pass an encoding param to use different encoding.
167 | func (f *BloomFilter) MarshalBase64(encoding ...*base64.Encoding) string {
168 | if len(encoding) > 1 {
169 | panic(fmt.Sprintf("Expected at most 1 encoding, got %d", len(encoding)))
170 | } else if len(encoding) == 1 {
171 | return encoding[0].EncodeToString(f.Marshal())
172 | } else {
173 | return base64.URLEncoding.EncodeToString(f.Marshal())
174 | }
175 | }
176 |
177 | // UnmarshalBase64 is a convenience function that unmarshals a filter that has been encoded into base64.
178 | // Uses URLEncoding by default, pass an encoding param to use different encoding.
179 | func UnmarshalBase64(b64 string, encoding ...*base64.Encoding) (*BloomFilter, error) {
180 | selectedEncoding := base64.URLEncoding
181 | if len(encoding) > 1 {
182 | panic(fmt.Sprintf("Expected at most 1 encoding, got %d", len(encoding)))
183 | } else if len(encoding) == 1 {
184 | selectedEncoding = encoding[0]
185 | }
186 | if b, err := selectedEncoding.DecodeString(b64); err != nil {
187 | return nil, fmt.Errorf("bloom: could not decode base64 data: %s", err)
188 | } else {
189 | return Unmarshal(b)
190 | }
191 |
192 | }
193 |
194 | // Unmarshal reads a binary dump of an inbloom filter with its header, and returns the resulting filter.
195 | // Since this is a dump containing size and precision metadata, you do not need to specify them.
196 | //
197 | // If the data is corrupt or the buffer is not complete, we return an error
198 | func Unmarshal(data []byte) (*BloomFilter, error) {
199 |
200 | if data == nil || len(data) <= int(unsafe.Sizeof(uint16(0))*2)+int(unsafe.Sizeof(uint32(0))) {
201 | return nil, errors.New("Invalid buffer size")
202 | }
203 | buf := bytes.NewBuffer(data)
204 | var checksum, errRate uint16
205 | var entries uint32
206 |
207 | if err := binary.Read(buf, binary.BigEndian, &checksum); err != nil {
208 | return nil, err
209 | }
210 | if err := binary.Read(buf, binary.BigEndian, &errRate); err != nil {
211 | return nil, err
212 | }
213 | if err := binary.Read(buf, binary.BigEndian, &entries); err != nil {
214 | return nil, err
215 | }
216 |
217 | if errRate == 0 {
218 | return nil, errors.New("Error rate cannot be 0")
219 | }
220 |
221 | // Read the data
222 | bf := make([]byte, len(data))
223 | if n, err := buf.Read(bf); err != nil {
224 | return nil, err
225 | } else {
226 | bf = bf[:n]
227 | }
228 |
229 | // Create a new filter from the data we read
230 | ret, err := newFilterFromData(bf, int(entries), 1/float64(errRate))
231 | if err != nil {
232 | return nil, err
233 | }
234 |
235 | // Verify checksum
236 | if ret.checksum() != checksum {
237 | return nil, errors.New("Bad checksum")
238 | }
239 |
240 | return ret, nil
241 | }
242 |
--------------------------------------------------------------------------------
/go/inbloom/bloom_test.go:
--------------------------------------------------------------------------------
1 | package inbloom
2 |
3 | import (
4 | "encoding/base64"
5 | "fmt"
6 | "testing"
7 | )
8 |
9 | func TestBloom(t *testing.T) {
10 |
11 | bf, err := NewFilter(20, 0.01)
12 | if err != nil {
13 | t.Fatal(err)
14 | }
15 |
16 | keys := []string{"foo", "bar", "foosdfsdfs", "fossdfsdfo", "foasdfasdfasdfasdfo", "foasdfasdfasdasdfasdfasdfasdfasdfo"}
17 |
18 | faux := []string{"goo", "gar", "gaz"}
19 |
20 | for _, k := range keys {
21 | if bf.Add(k) == true {
22 | t.Errorf("adding %s returned true", k)
23 | }
24 | }
25 |
26 | t.Logf("Bloom filter params: %X", bf.bf)
27 | for _, k := range keys {
28 | if !bf.Contains(k) {
29 | t.Error("not containing ", k)
30 | }
31 |
32 | }
33 |
34 | for _, k := range faux {
35 | if bf.Contains(k) {
36 | t.Error("containing faux key", k)
37 | }
38 | }
39 |
40 | expected := "02000C0300C2246913049E040002002000017614002B0002"
41 | actual := fmt.Sprintf("%X", bf.bf)
42 | if actual != expected {
43 | t.Errorf("expected\n%s\nactual\n%s", expected, actual)
44 | }
45 | }
46 |
47 | func TestMarshal(t *testing.T) {
48 | bf, err := NewFilter(20, 0.01)
49 | if err != nil {
50 | t.Fatal(err)
51 | }
52 | bf.Add("abc")
53 |
54 | serialized := fmt.Sprintf("%x", bf.Marshal())
55 | expected := "620d006400000014000000000020001000080000000000002000100008000400"
56 | if serialized != expected {
57 | t.Errorf("Expected %s, got %s", expected, serialized)
58 | }
59 |
60 | bfds, err := Unmarshal(bf.Marshal())
61 | if err != nil {
62 | t.Fatal(err)
63 | }
64 |
65 | serialized = fmt.Sprintf("%x", bfds.Marshal())
66 | if serialized != expected {
67 | t.Errorf("Expected %s, got %s", expected, serialized)
68 | }
69 | t.Logf("DESERIALIZED: %X\n", bfds.Marshal())
70 |
71 | // Test for bad checksum
72 |
73 | data := bfds.Marshal()
74 | data[0] = 0xff
75 | data[1] = 0xff
76 |
77 | if _, err = Unmarshal(data); err == nil {
78 | t.Error("Should have failed on bad checksum")
79 | } else {
80 | t.Log(err)
81 | }
82 |
83 | data[2] = 0xff
84 | if _, err = Unmarshal(data); err == nil {
85 | t.Error("Should have failed on bad size")
86 | } else {
87 | t.Log(err)
88 | }
89 |
90 | data = data[:4]
91 | if _, err = Unmarshal(data); err == nil {
92 | t.Error("Should have failed on bad data")
93 | } else {
94 | t.Log(err)
95 | }
96 |
97 | }
98 |
99 | func TestBase64StdEncoding(t *testing.T) {
100 | source := "j+gAZAAAAGQAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAIAAAAAEAAAAAAAAAAAAABAAAAAAAAAYAAAAAAACAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAAAgAAAAAAAAAAAAAAAAAAAQAAgAAAAAAAAAAAAAAAAAACBgAA="
101 |
102 | f, err := UnmarshalBase64(source, base64.StdEncoding)
103 | if err != nil {
104 | t.Fatal(err)
105 | }
106 |
107 | marshaled := f.MarshalBase64(base64.StdEncoding)
108 |
109 | if marshaled != source {
110 | t.Fatalf("MarshalBase64 differs from source: %s != %s", marshaled, source)
111 | }
112 | }
113 |
114 | func TestBase64UrlEncoding(t *testing.T) {
115 | source := "j-gAZAAAAGQAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAIAAAAAEAAAAAAAAAAAAABAAAAAAAAAYAAAAAAACAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAAAgAAAAAAAAAAAAAAAAAAAQAAgAAAAAAAAAAAAAAAAAACBgAA="
116 |
117 | f, err := UnmarshalBase64(source)
118 | if err != nil {
119 | t.Fatal(err)
120 | }
121 |
122 | marshaled := f.MarshalBase64()
123 |
124 | if marshaled != source {
125 | t.Fatalf("MarshalBase64 differs from source: %s != %s", marshaled, source)
126 | }
127 | }
128 |
129 | func ExampleBloomFilter() {
130 |
131 | // create a blank filter - expecting 20 members and an error rate of 1/100
132 | f, err := NewFilter(20, 0.01)
133 | if err != nil {
134 | panic(err)
135 | }
136 |
137 | // the size of the filter
138 | fmt.Println(f.Len())
139 |
140 | // insert some values
141 | f.Add("foo")
142 | f.Add("bar")
143 |
144 | // test for existence of keys
145 | fmt.Println(f.Contains("foo"))
146 | fmt.Println(f.Contains("wat"))
147 |
148 | fmt.Println("marshaled data:", f.MarshalBase64())
149 |
150 | // Output:
151 | // 24
152 | // true
153 | // false
154 | // marshaled data: oU4AZAAAABQAAAAAAEIAABEAGAQAAgAgAAAwEAAJAAA=
155 |
156 | }
157 |
158 | func ExampleMarshalUnmarshal() {
159 |
160 | // a 20 cardinality 0.01 precision filter with "foo" and "bar" in it
161 | data := "oU4AZAAAABQAAAAAAEIAABEAGAQAAgAgAAAwEAAJAAA="
162 |
163 | // load it from base64
164 | f, err := UnmarshalBase64(data)
165 | if err != nil {
166 | panic(err)
167 | }
168 |
169 | // test it...
170 | fmt.Println(f.Contains("foo"))
171 | fmt.Println(f.Contains("wat"))
172 | fmt.Println(f.Len())
173 |
174 | // dump to pure binary
175 | fmt.Printf("%x\n", f.Marshal())
176 | // Output:
177 | // true
178 | // false
179 | // 24
180 | // a14e006400000014000000000042000011001804000200200000301000090000
181 |
182 | }
183 |
--------------------------------------------------------------------------------
/go/internal/gomurmur/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2013-2016, Sureshkumar Nedunchezhian
2 | All rights reserved.
3 |
4 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
5 |
6 | * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
7 | * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
8 |
9 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
10 |
--------------------------------------------------------------------------------
/go/internal/gomurmur/README.md:
--------------------------------------------------------------------------------
1 | # gomurmur
2 |
3 | Go implementation of MurmurHash2, 32bit (https://code.google.com/p/smhasher/)
4 |
5 | ### TODO
6 | * Add benchmark tests
7 | * Implement MurmurHash3
8 |
--------------------------------------------------------------------------------
/go/internal/gomurmur/gomurmur.go:
--------------------------------------------------------------------------------
1 | /*
2 | * Copyright (c) 2013-2016, Sureshkumar Nedunchezhian
3 | * All rights reserved.
4 | *
5 | * Redistribution and use in source and binary forms, with or without
6 | * modification, are permitted provided that the following conditions are met:
7 | *
8 | * * Redistributions of source code must retain the above copyright notice,
9 | * this list of conditions and the following disclaimer.
10 | * * Redistributions in binary form must reproduce the above copyright
11 | * notice, this list of conditions and the following disclaimer in
12 | * the documentation and/or other materials provided with the distribution.
13 | *
14 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
15 | * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
16 | * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
17 | * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS
18 | * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
19 | * OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
20 | * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
21 | * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
22 | * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
23 | * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
24 | * THE POSSIBILITY OF SUCH DAMAGE.
25 | */
26 |
27 | /*
28 | * "Murmur" hash provided by Austin, tanjent@gmail.com
29 | * http://murmurhash.googlepages.com/
30 | *
31 | * Note - This code makes a few assumptions about how your machine behaves -
32 | *
33 | * 1. We can read a 4-byte value from any address without crashing
34 | * 2. sizeof(int) == 4
35 | *
36 | * And it has a few limitations -
37 | * 1. It will not work incrementally.
38 | * 2. It will not produce the same results on little-endian and big-endian
39 | * machines. */
40 |
41 | package gomurmur
42 |
43 | import (
44 | "bytes"
45 | "encoding/binary"
46 | "hash"
47 | )
48 |
49 | type (
50 | sum32 uint32
51 | )
52 |
53 | const (
54 | m = 0x5bd1e995
55 | r = 24
56 | )
57 |
58 | func Sum32(b []byte, seed uint32) (uint32, error) {
59 | var s sum32 = 0
60 | h := &s
61 |
62 | if _, err := h.WriteSeed(b, seed); err != nil {
63 | return 0, err
64 | }
65 | return uint32(*h), nil
66 | }
67 |
68 | // New32 returns a new 32-bit MurmurHash2 hash.Hash32.
69 | func New32() hash.Hash32 {
70 | var s sum32 = 0
71 | return &s
72 | }
73 |
74 | func (s *sum32) Reset() { *s = 0 }
75 | func (s *sum32) Sum32() uint32 {
76 | return uint32(*s)
77 | }
78 |
79 | const defaultSeed uint32 = 0x9747b28c
80 |
81 | func (s *sum32) Write(data []byte) (int, error) {
82 | return s.WriteSeed(data, defaultSeed)
83 | }
84 |
85 | func (s *sum32) WriteSeed(data []byte, seed uint32) (int, error) {
86 | var length = uint32(len(data))
87 |
88 | /* Initialize the hash to a 'random' value */
89 | // Any previous state is discarded: this hash does not work incrementally.
90 | h := sum32(seed ^ length)
91 |
92 | /* Mix 4 bytes at a time into the hash */
93 | var i int = 0
94 |
95 | for length >= 4 {
96 | var k uint32
97 | buf := bytes.NewBuffer(data[i : i+4])
98 | err := binary.Read(buf, binary.LittleEndian, &k)
99 | if err != nil {
100 | return 0, err
101 | }
102 | k *= m
103 | k ^= k >> r
104 | k *= m
105 |
106 | h *= m
107 | h ^= sum32(k)
108 |
109 | i += 4
110 | length -= 4
111 | }
112 | switch length {
113 | case 3:
114 | h ^= sum32((uint32)(data[i+2]) << 16)
115 | fallthrough
116 | case 2:
117 | h ^= sum32((uint32)(data[i+1]) << 8)
118 | fallthrough
119 | case 1:
120 | h ^= sum32((uint32)(data[i]))
121 | h *= m
122 | default:
123 | }
124 | h ^= h >> 13
125 | h *= m
126 | h ^= h >> 15
127 | *s = h
128 |
129 | return len(data), nil
130 | }
131 |
132 | func (s *sum32) Size() int { return 4 }
133 |
134 | func (s *sum32) BlockSize() int { return 1 }
135 |
136 | func (s *sum32) Sum(in []byte) []byte {
137 | v := uint32(*s)
138 | return append(in, byte(v>>24), byte(v>>16), byte(v>>8), byte(v))
139 | }
140 |
--------------------------------------------------------------------------------
/go/internal/gomurmur/gomurmur_test.go:
--------------------------------------------------------------------------------
1 | /*
2 | * Copyright (c) 2013-2016, Sureshkumar Nedunchezhian
3 | * All rights reserved.
4 | *
5 | * Redistribution and use in source and binary forms, with or without
6 | * modification, are permitted provided that the following conditions are met:
7 | *
8 | * * Redistributions of source code must retain the above copyright notice,
9 | * this list of conditions and the following disclaimer.
10 | * * Redistributions in binary form must reproduce the above copyright
11 | * notice, this list of conditions and the following disclaimer in
12 | * the documentation and/or other materials provided with the distribution.
13 | *
14 | * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
15 | * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
16 | * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
17 | * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS
18 | * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
19 | * OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
20 | * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
21 | * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
22 | * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
23 | * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
24 | * THE POSSIBILITY OF SUCH DAMAGE.
25 | */
26 |
27 | package gomurmur
28 |
29 | import (
30 | "bytes"
31 | "hash"
32 | "testing"
33 | )
34 |
35 | type golden struct {
36 | sum []byte
37 | text string
38 | }
39 |
40 | var golden32 = []golden{
41 | {[]byte{0x00, 0x00, 0x00, 0x00}, ""},
42 | {[]byte{0x4b, 0x41, 0x75, 0x7c}, "a"},
43 | {[]byte{0xe3, 0xb5, 0x4d, 0xfb}, "ab"},
44 | {[]byte{0x7b, 0x0c, 0xc4, 0x28}, "abc"},
45 | {[]byte{0xef, 0x6a, 0x86, 0xaf}, "abcd"},
46 | {[]byte{0x9a, 0x26, 0x3e, 0xda}, "abcde"},
47 | {[]byte{0xe0, 0xba, 0xdc, 0x96}, "abcdef"},
48 | {[]byte{0xeb, 0xa7, 0x46, 0xf2}, "abcdefg"},
49 | }
50 |
51 | func TestGolden32(t *testing.T) {
52 | testGolden(t, New32(), golden32)
53 | }
54 |
55 | func testGolden(t *testing.T, hash hash.Hash, gold []golden) {
56 | for _, g := range gold {
57 | hash.Reset()
58 | done, err := hash.Write([]byte(g.text))
59 | if err != nil {
60 | t.Fatalf("write error: %s", err)
61 | }
62 | if done != len(g.text) {
63 | t.Fatalf("wrote only %d out of %d bytes", done, len(g.text))
64 | }
65 | if actual := hash.Sum(nil); !bytes.Equal(g.sum, actual) {
66 | t.Errorf("hash(%q) = 0x%x want 0x%x", g.text, actual, g.sum)
67 | }
68 | }
69 | }
70 |
--------------------------------------------------------------------------------
/inbloom.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EverythingMe/inbloom/1d67c94ae5dc4dd199b88912114e7fa1b2f04bad/inbloom.png
--------------------------------------------------------------------------------
/java/.gitignore:
--------------------------------------------------------------------------------
1 | *.class
2 |
3 | # Mobile Tools for Java (J2ME)
4 | .mtj.tmp/
5 | /.idea
6 | /.gradle
7 | # Package Files #
8 | *.jar
9 | *.war
10 | *.ear
11 |
12 | # virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml
13 | hs_err_pid*
14 | build/*
15 |
--------------------------------------------------------------------------------
/java/InBloom.iml:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/java/README.md:
--------------------------------------------------------------------------------
1 | # InBloom-Java
2 |
3 | Java implementation of InBloom, a portable cross-language Bloom filter.
4 |
5 | ### Usage
6 | https://github.com/EverythingMe/inbloom#java
7 |
8 |
9 | ### Gradle
10 |
11 | Add the following lines to your build.gradle script.
12 |
13 | ```groovy
14 | repositories {
15 | jcenter {
16 | url 'http://dl.bintray.com/everythingme/generic'
17 | }
18 | }
19 |
20 | dependencies {
21 | compile 'me.everything:inbloom:0.1'
22 | }
23 | ```
24 |
--------------------------------------------------------------------------------
/java/build.gradle:
--------------------------------------------------------------------------------
1 | plugins {
2 | id "com.jfrog.bintray" version "1.2"
3 | }
4 |
5 | group = 'me.everything'
6 | version '0.1'
7 |
8 | apply plugin: 'java'
9 | apply plugin: 'maven'
10 | apply plugin: 'maven-publish'
11 |
12 | sourceCompatibility = 1.5
13 |
14 | repositories {
15 | mavenCentral()
16 | }
17 |
18 | dependencies {
19 | testCompile group: 'junit', name: 'junit', version: '4.11'
20 | }
21 |
22 | bintray {
23 | user = System.getenv('BINTRAY_USER')
24 | key = System.getenv('BINTRAY_KEY')
25 | publications = ['mavenJava']
26 | pkg {
27 | repo = 'generic'
28 | name = 'inbloom'
29 | userOrg = 'everythingme'
30 | licenses = ['BSD']
31 | vcsUrl = 'https://github.com/EverythingMe/inbloom'
32 | publish = true
33 | version {
34 | name = '0.1'
35 | desc = 'InBloom Library 0.1'
36 | released = new Date()
37 | vcsTag = '0.1'
38 | }
39 | }
40 | }
41 |
42 | task sourcesJar(type: Jar) {
43 | from sourceSets.main.allSource
44 | classifier = 'sources'
45 | }
46 |
47 | task javadocJar(type: Jar, dependsOn: javadoc) {
48 | classifier = 'javadoc'
49 | from 'build/docs/javadoc'
50 | }
51 |
52 | publishing {
53 | publications {
54 | mavenJava(MavenPublication) {
55 | from components.java
56 | artifact sourcesJar
57 | artifact javadocJar
58 | groupId 'me.everything'
59 | artifactId 'inbloom'
60 | }
61 | }
62 | }
63 |
--------------------------------------------------------------------------------
/java/gradle/wrapper/gradle-wrapper.properties:
--------------------------------------------------------------------------------
1 | #Thu Jul 23 11:36:17 IDT 2015
2 | distributionBase=GRADLE_USER_HOME
3 | distributionPath=wrapper/dists
4 | zipStoreBase=GRADLE_USER_HOME
5 | zipStorePath=wrapper/dists
6 | distributionUrl=https\://services.gradle.org/distributions/gradle-2.2-all.zip
7 |
--------------------------------------------------------------------------------
/java/gradlew:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | ##############################################################################
4 | ##
5 | ## Gradle start up script for UN*X
6 | ##
7 | ##############################################################################
8 |
9 | # Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
10 | DEFAULT_JVM_OPTS=""
11 |
12 | APP_NAME="Gradle"
13 | APP_BASE_NAME=`basename "$0"`
14 |
15 | # Use the maximum available, or set MAX_FD != -1 to use that value.
16 | MAX_FD="maximum"
17 |
18 | warn ( ) {
19 | echo "$*"
20 | }
21 |
22 | die ( ) {
23 | echo
24 | echo "$*"
25 | echo
26 | exit 1
27 | }
28 |
29 | # OS specific support (must be 'true' or 'false').
30 | cygwin=false
31 | msys=false
32 | darwin=false
33 | case "`uname`" in
34 | CYGWIN* )
35 | cygwin=true
36 | ;;
37 | Darwin* )
38 | darwin=true
39 | ;;
40 | MINGW* )
41 | msys=true
42 | ;;
43 | esac
44 |
45 | # For Cygwin, ensure paths are in UNIX format before anything is touched.
46 | if $cygwin ; then
47 | [ -n "$JAVA_HOME" ] && JAVA_HOME=`cygpath --unix "$JAVA_HOME"`
48 | fi
49 |
50 | # Attempt to set APP_HOME
51 | # Resolve links: $0 may be a link
52 | PRG="$0"
53 | # Need this for relative symlinks.
54 | while [ -h "$PRG" ] ; do
55 | ls=`ls -ld "$PRG"`
56 | link=`expr "$ls" : '.*-> \(.*\)$'`
57 | if expr "$link" : '/.*' > /dev/null; then
58 | PRG="$link"
59 | else
60 | PRG=`dirname "$PRG"`"/$link"
61 | fi
62 | done
63 | SAVED="`pwd`"
64 | cd "`dirname \"$PRG\"`/" >&-
65 | APP_HOME="`pwd -P`"
66 | cd "$SAVED" >&-
67 |
68 | CLASSPATH=$APP_HOME/gradle/wrapper/gradle-wrapper.jar
69 |
70 | # Determine the Java command to use to start the JVM.
71 | if [ -n "$JAVA_HOME" ] ; then
72 | if [ -x "$JAVA_HOME/jre/sh/java" ] ; then
73 | # IBM's JDK on AIX uses strange locations for the executables
74 | JAVACMD="$JAVA_HOME/jre/sh/java"
75 | else
76 | JAVACMD="$JAVA_HOME/bin/java"
77 | fi
78 | if [ ! -x "$JAVACMD" ] ; then
79 | die "ERROR: JAVA_HOME is set to an invalid directory: $JAVA_HOME
80 |
81 | Please set the JAVA_HOME variable in your environment to match the
82 | location of your Java installation."
83 | fi
84 | else
85 | JAVACMD="java"
86 | which java >/dev/null 2>&1 || die "ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.
87 |
88 | Please set the JAVA_HOME variable in your environment to match the
89 | location of your Java installation."
90 | fi
91 |
92 | # Increase the maximum file descriptors if we can.
93 | if [ "$cygwin" = "false" -a "$darwin" = "false" ] ; then
94 | MAX_FD_LIMIT=`ulimit -H -n`
95 | if [ $? -eq 0 ] ; then
96 | if [ "$MAX_FD" = "maximum" -o "$MAX_FD" = "max" ] ; then
97 | MAX_FD="$MAX_FD_LIMIT"
98 | fi
99 | ulimit -n $MAX_FD
100 | if [ $? -ne 0 ] ; then
101 | warn "Could not set maximum file descriptor limit: $MAX_FD"
102 | fi
103 | else
104 | warn "Could not query maximum file descriptor limit: $MAX_FD_LIMIT"
105 | fi
106 | fi
107 |
108 | # For Darwin, add options to specify how the application appears in the dock
109 | if $darwin; then
110 | GRADLE_OPTS="$GRADLE_OPTS \"-Xdock:name=$APP_NAME\" \"-Xdock:icon=$APP_HOME/media/gradle.icns\""
111 | fi
112 |
113 | # For Cygwin, switch paths to Windows format before running java
114 | if $cygwin ; then
115 | APP_HOME=`cygpath --path --mixed "$APP_HOME"`
116 | CLASSPATH=`cygpath --path --mixed "$CLASSPATH"`
117 |
118 | # We build the pattern for arguments to be converted via cygpath
119 | ROOTDIRSRAW=`find -L / -maxdepth 1 -mindepth 1 -type d 2>/dev/null`
120 | SEP=""
121 | for dir in $ROOTDIRSRAW ; do
122 | ROOTDIRS="$ROOTDIRS$SEP$dir"
123 | SEP="|"
124 | done
125 | OURCYGPATTERN="(^($ROOTDIRS))"
126 | # Add a user-defined pattern to the cygpath arguments
127 | if [ "$GRADLE_CYGPATTERN" != "" ] ; then
128 | OURCYGPATTERN="$OURCYGPATTERN|($GRADLE_CYGPATTERN)"
129 | fi
130 | # Now convert the arguments - kludge to limit ourselves to /bin/sh
131 | i=0
132 | for arg in "$@" ; do
133 | CHECK=`echo "$arg"|egrep -c "$OURCYGPATTERN" -`
134 | CHECK2=`echo "$arg"|egrep -c "^-"` ### Determine if an option
135 |
136 | if [ $CHECK -ne 0 ] && [ $CHECK2 -eq 0 ] ; then ### Added a condition
137 | eval `echo args$i`=`cygpath --path --ignore --mixed "$arg"`
138 | else
139 | eval `echo args$i`="\"$arg\""
140 | fi
141 | i=$((i+1))
142 | done
143 | case $i in
144 | (0) set -- ;;
145 | (1) set -- "$args0" ;;
146 | (2) set -- "$args0" "$args1" ;;
147 | (3) set -- "$args0" "$args1" "$args2" ;;
148 | (4) set -- "$args0" "$args1" "$args2" "$args3" ;;
149 | (5) set -- "$args0" "$args1" "$args2" "$args3" "$args4" ;;
150 | (6) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" ;;
151 | (7) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" ;;
152 | (8) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" ;;
153 | (9) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" "$args8" ;;
154 | esac
155 | fi
156 |
157 | # Split up the JVM_OPTS And GRADLE_OPTS values into an array, following the shell quoting and substitution rules
158 | function splitJvmOpts() {
159 | JVM_OPTS=("$@")
160 | }
161 | eval splitJvmOpts $DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS
162 | JVM_OPTS[${#JVM_OPTS[*]}]="-Dorg.gradle.appname=$APP_BASE_NAME"
163 |
164 | exec "$JAVACMD" "${JVM_OPTS[@]}" -classpath "$CLASSPATH" org.gradle.wrapper.GradleWrapperMain "$@"
165 |
--------------------------------------------------------------------------------
/java/gradlew.bat:
--------------------------------------------------------------------------------
1 | @if "%DEBUG%" == "" @echo off
2 | @rem ##########################################################################
3 | @rem
4 | @rem Gradle startup script for Windows
5 | @rem
6 | @rem ##########################################################################
7 |
8 | @rem Set local scope for the variables with windows NT shell
9 | if "%OS%"=="Windows_NT" setlocal
10 |
11 | @rem Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
12 | set DEFAULT_JVM_OPTS=
13 |
14 | set DIRNAME=%~dp0
15 | if "%DIRNAME%" == "" set DIRNAME=.
16 | set APP_BASE_NAME=%~n0
17 | set APP_HOME=%DIRNAME%
18 |
19 | @rem Find java.exe
20 | if defined JAVA_HOME goto findJavaFromJavaHome
21 |
22 | set JAVA_EXE=java.exe
23 | %JAVA_EXE% -version >NUL 2>&1
24 | if "%ERRORLEVEL%" == "0" goto init
25 |
26 | echo.
27 | echo ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.
28 | echo.
29 | echo Please set the JAVA_HOME variable in your environment to match the
30 | echo location of your Java installation.
31 |
32 | goto fail
33 |
34 | :findJavaFromJavaHome
35 | set JAVA_HOME=%JAVA_HOME:"=%
36 | set JAVA_EXE=%JAVA_HOME%/bin/java.exe
37 |
38 | if exist "%JAVA_EXE%" goto init
39 |
40 | echo.
41 | echo ERROR: JAVA_HOME is set to an invalid directory: %JAVA_HOME%
42 | echo.
43 | echo Please set the JAVA_HOME variable in your environment to match the
44 | echo location of your Java installation.
45 |
46 | goto fail
47 |
48 | :init
49 | @rem Get command-line arguments, handling Windowz variants
50 |
51 | if not "%OS%" == "Windows_NT" goto win9xME_args
52 | if "%@eval[2+2]" == "4" goto 4NT_args
53 |
54 | :win9xME_args
55 | @rem Slurp the command line arguments.
56 | set CMD_LINE_ARGS=
57 | set _SKIP=2
58 |
59 | :win9xME_args_slurp
60 | if "x%~1" == "x" goto execute
61 |
62 | set CMD_LINE_ARGS=%*
63 | goto execute
64 |
65 | :4NT_args
66 | @rem Get arguments from the 4NT Shell from JP Software
67 | set CMD_LINE_ARGS=%$
68 |
69 | :execute
70 | @rem Setup the command line
71 |
72 | set CLASSPATH=%APP_HOME%\gradle\wrapper\gradle-wrapper.jar
73 |
74 | @rem Execute Gradle
75 | "%JAVA_EXE%" %DEFAULT_JVM_OPTS% %JAVA_OPTS% %GRADLE_OPTS% "-Dorg.gradle.appname=%APP_BASE_NAME%" -classpath "%CLASSPATH%" org.gradle.wrapper.GradleWrapperMain %CMD_LINE_ARGS%
76 |
77 | :end
78 | @rem End local scope for the variables with windows NT shell
79 | if "%ERRORLEVEL%"=="0" goto mainEnd
80 |
81 | :fail
82 | rem Set variable GRADLE_EXIT_CONSOLE if you need the _script_ return code instead of
83 | rem the _cmd.exe /c_ return code!
84 | if not "" == "%GRADLE_EXIT_CONSOLE%" exit 1
85 | exit /b 1
86 |
87 | :mainEnd
88 | if "%OS%"=="Windows_NT" endlocal
89 |
90 | :omega
91 |
--------------------------------------------------------------------------------
/java/settings.gradle:
--------------------------------------------------------------------------------
1 | rootProject.name = 'InBloom'
2 |
3 |
--------------------------------------------------------------------------------
/java/src/main/java/META-INF/MANIFEST.MF:
--------------------------------------------------------------------------------
1 | Manifest-Version: 1.0
2 |
--------------------------------------------------------------------------------
/java/src/main/java/me/everything/inbloom/BinAscii.java:
--------------------------------------------------------------------------------
1 | package me.everything.inbloom;
2 |
3 | /**
4 | * Utility class for dealing with binary data
5 | */
6 | public class BinAscii {
7 | final protected static char[] hexArray = "0123456789ABCDEF".toCharArray();
8 |
9 | /**
10 | * Transform a byte array into its hexadecimal representation
11 | */
12 | public static String hexlify(byte[] bytes) {
13 | char[] hexChars = new char[bytes.length * 2];
14 | for ( int j = 0; j < bytes.length; j++ ) {
15 | int v = bytes[j] & 0xFF;
16 | hexChars[j * 2] = hexArray[v >>> 4];
17 | hexChars[j * 2 + 1] = hexArray[v & 0x0F];
18 | }
19 | String ret = new String(hexChars);
20 | return ret;
21 | }
22 |
23 | /**
24 | * Transform a string of hexadecimal chars into a byte array
25 | */
26 | public static byte[] unhexlify(String argbuf) {
27 | int arglen = argbuf.length();
28 | if (arglen % 2 != 0)
29 | throw new RuntimeException("Odd-length string");
30 |
31 | byte[] retbuf = new byte[arglen/2];
32 |
33 | for (int i = 0; i < arglen; i += 2) {
34 | int top = Character.digit(argbuf.charAt(i), 16);
35 | int bot = Character.digit(argbuf.charAt(i+1), 16);
36 | if (top == -1 || bot == -1)
37 | throw new RuntimeException("Non-hexadecimal digit found");
38 | retbuf[i / 2] = (byte) ((top << 4) + bot);
39 | }
40 | return retbuf;
41 | }
42 | }
43 |
--------------------------------------------------------------------------------
/java/src/main/java/me/everything/inbloom/BloomFilter.java:
--------------------------------------------------------------------------------
1 | package me.everything.inbloom;
2 |
3 | import java.io.InvalidObjectException;
4 | import java.nio.ByteBuffer;
5 | import java.nio.ByteOrder;
6 | import java.util.Arrays;
7 | import java.util.zip.CRC32;
8 |
9 | /**
10 | * Pure Java Bloom Filter class.
11 | *
12 | * Translated from the libbloom C library
13 | *
14 | * Original copyright note from libbloom:
15 | *
16 | * Copyright (c) 2012, Jyri J. Virkki
17 | * All rights reserved.
18 | *
19 | * This file is under BSD license. See LICENSE file.
20 | */
21 | public class BloomFilter {
22 |
23 | private static final String TAG = "BloomFilter" ;
24 | // These fields are part of the public interface of this structure.
25 | // Client code may read these values if desired. Client code MUST NOT
26 | // modify any of these.
27 | protected final int entries;
28 | protected final double error;
29 | protected final int bits;
30 | protected final int bytes;
31 | protected final int hashes;
32 | protected final static double errorPrecision = 0.000000001;
33 |
34 | // Fields below are private to the implementation. These may go away or
35 | // change incompatibly at any moment. Client code MUST NOT access or rely
36 | // on these.
37 | double bpe;
38 | byte[] bf;
39 | int ready;
40 |
41 | /**
42 | * Create a blank bloom filter, with a given number of expected entries, and an error rate
43 | * @param entries the expected number of entries
44 | * @param error the desired error rate
45 | */
46 | public BloomFilter(int entries, double error)
47 | {
48 | this(null, entries, error);
49 | }
50 |
51 |
52 | static private final double denom = 0.480453013918201;
53 |
54 | /**
55 | * Create a bloom filter from an existing data buffer, created by another instance of this library (probably in another language).
56 | * If the length of the data does not fit the number of entries and error rate, we raise RuntimeException
57 | * @param data the raw filter data
58 | * @param entries the expected number of entries
59 | * @param error the desired error rate
60 | */
61 | public BloomFilter(byte []data, int entries, double error) throws RuntimeException
62 | {
63 | if (entries < 1 || ( ( 1.0 <= error ) || ( error <= errorPrecision ) ) ) {
64 | throw new RuntimeException("Invalid params for bloom filter");
65 | }
66 |
67 | this.entries = entries;
68 | this.error = error;
69 |
70 | bpe = -(Math.log(error) / denom);
71 | bits = (int)((double)entries * bpe);
72 | bytes = (bits / 8) + (bits % 8 != 0 ? 1 : 0);
73 |
74 | if (data != null) {
75 | if (bytes != data.length) {
76 | throw new RuntimeException(String.format("Expected %d bytes, got %d", bytes, data.length));
77 | }
78 | bf = data;
79 | } else {
80 | bf = new byte[bytes];
81 | }
82 |
83 |
84 | hashes = (int)Math.ceil(0.693147180559945 * bpe); // ln(2)
85 | }
86 |
87 | public static short computeChecksum(byte[] data) {
88 | CRC32 crc = new CRC32();
89 | crc.update(data);
90 | long checksum32 = crc.getValue();
91 | return (short) ((checksum32 & 0xFFFF) ^ (checksum32 >> 16));
92 | }
93 |
94 | public static BloomFilter load(byte[] bytes) throws InvalidObjectException {
95 | ByteBuffer bb = ByteBuffer.wrap(bytes);
96 | bb.order(ByteOrder.BIG_ENDIAN);
97 | short checksum = bb.getShort();
98 | short errorRate = bb.getShort();
99 | int cardinality = bb.getInt();
100 | final byte[] data = Arrays.copyOfRange(bytes, bb.position(), bytes.length);
101 | if (computeChecksum(data) != checksum)
102 | throw new InvalidObjectException("Bad checksum");
103 |
104 | return new BloomFilter(data, cardinality, 1.0 / errorRate);
105 | }
106 |
107 | public static byte[] dump(BloomFilter bf) {
108 | // 8 is the size of the header
109 | byte[] bytes = new byte[bf.bytes + 8];
110 | ByteBuffer bb = ByteBuffer.wrap(bytes);
111 | bb.order(ByteOrder.BIG_ENDIAN);
112 |
113 | bb.putShort(computeChecksum(bf.bf));
114 | bb.putShort((short) (1.0 / bf.error));
115 | bb.putInt(bf.entries);
116 | bb.put(bf.bf);
117 |
118 | return bytes;
119 | }
120 |
121 | private static long unsigned(int i) {
122 | return i & 0xffffffffL;
123 | }
124 |
125 | /**
126 | * check existence or add an entry
127 | * @param key the key to check/add
128 | * @param add whether we add or just check the existence
129 | * @return true if the key is already in the filter
130 | */
131 | private boolean checkAdd(String key,boolean add) {
132 |
133 | int hits = 0;
134 | long a = unsigned(Murmur2.hash32(key, 0x9747b28c));
135 | long b = unsigned(Murmur2.hash32(key, (int) a));
136 |
137 |
138 |
139 | for (int i = 0; i < hashes; i++) {
140 | long x = unsigned ((int)(a + i*b)) % bits;
141 | long bt = x >> 3;
142 |
143 | byte c = bf[(int)bt]; // expensive memory access
144 | byte mask = (byte)(1 << (x % 8));
145 |
146 | if ((c & mask) != 0) {
147 | hits++;
148 | } else {
149 | if (add) {
150 | bf[(int)bt] = (byte)(c | mask);
151 | }
152 | }
153 |
154 |
155 | }
156 |
157 | return hits == hashes;
158 | }
159 |
160 |
161 | /**
162 | * Check whether the filter contains a string
163 | * @param key the string to check
164 | * @return true if it already exists in the filter
165 | */
166 | public boolean contains(String key) {
167 | return checkAdd(key, false);
168 | }
169 |
170 |
171 | /**
172 | * Add a string to the filter
173 | * @param key the string to add
174 | * @return true if the string was already in the filter
175 | */
176 | public boolean add(String key) {
177 | return checkAdd(key, true);
178 | }
179 | }
180 |
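The sizing arithmetic in the constructor above can be tried in isolation. The following is a minimal sketch, not part of the library: the class name `SizingSketch` and its method names are hypothetical, and only the two constants and the formulas are taken from `BloomFilter`.

```java
// Sketch only: recomputes the values BloomFilter derives in its constructor.
// SizingSketch and its method names are hypothetical, not part of InBloom.
final class SizingSketch {
    // ln(2)^2 and ln(2), the same constants used in BloomFilter.
    static final double LN2_SQUARED = 0.480453013918201;
    static final double LN2 = 0.693147180559945;

    // Bits needed per expected entry for the requested error rate.
    static double bitsPerEntry(double error) {
        return -(Math.log(error) / LN2_SQUARED);
    }

    // Total number of bits, truncated exactly as the constructor does.
    static int bits(int entries, double error) {
        return (int) ((double) entries * bitsPerEntry(error));
    }

    // Number of bytes needed to hold those bits, rounded up.
    static int bytes(int entries, double error) {
        int bits = bits(entries, error);
        return (bits / 8) + (bits % 8 != 0 ? 1 : 0);
    }

    // Number of hash probes per key.
    static int hashes(double error) {
        return (int) Math.ceil(LN2 * bitsPerEntry(error));
    }

    public static void main(String[] args) {
        int entries = 20;
        double error = 0.01;
        System.out.printf("bits=%d bytes=%d hashes=%d%n",
                bits(entries, error), bytes(entries, error), hashes(error));
    }
}
```

Because the byte count depends only on `entries` and `error`, a filter serialized by one implementation can be length-checked by another before its data is accepted.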
--------------------------------------------------------------------------------
/java/src/main/java/me/everything/inbloom/Murmur2.java:
--------------------------------------------------------------------------------
1 | package me.everything.inbloom;
2 |
3 | /**
4 | * murmur hash 2.0.
5 | *
6 | * The murmur hash is a relatively fast hash function from
7 | * http://murmurhash.googlepages.com/ for platforms with efficient
8 | * multiplication.
9 | *
10 | * This is a re-implementation of the original C code plus some
11 | * additional features.
12 | *
13 | * Public domain.
14 | *
15 | * @author Viliam Holub
16 | * @version 1.0.2
17 | *
18 | */
19 | public final class Murmur2 {
20 |
21 | // all methods static; private constructor.
22 | private Murmur2() {}
23 |
24 | public static int hash32(String data, int seed) {
25 | final byte[] bytes = data.getBytes();
26 | return hash32(bytes, bytes.length, seed );
27 | }
28 | /**
29 | * Generates 32 bit hash from byte array of the given length and
30 | * seed.
31 | *
32 | * @param data byte array to hash
33 | * @param length length of the array to hash
34 | * @param seed initial seed value
35 | * @return 32 bit hash of the given array
36 | */
37 | public static int hash32(final byte[] data, int length, int seed) {
38 | // 'm' and 'r' are mixing constants generated offline.
39 | // They're not really 'magic', they just happen to work well.
40 | final int m = 0x5bd1e995;
41 | final int r = 24;
42 |
43 | // Initialize the hash to a random value
44 | int h = seed^length;
45 | int length4 = length/4;
46 |
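47 | for (int i=0; i<length4; i++) {
48 | final int i4 = i*4;
49 | int k = (data[i4+0]&0xff) + ((data[i4+1]&0xff)<<8)
50 | + ((data[i4+2]&0xff)<<16) + ((data[i4+3]&0xff)<<24);
51 | k *= m;
52 | k ^= k >>> r;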
53 | k *= m;
54 | h *= m;
55 | h ^= k;
56 | }
57 |
58 | // Handle the last few bytes of the input array
59 | switch (length%4) {
60 | case 3: h ^= (data[(length&~3) +2]&0xff) << 16;
61 | case 2: h ^= (data[(length&~3) +1]&0xff) << 8;
62 | case 1: h ^= (data[length&~3]&0xff);
63 | h *= m;
64 | }
65 |
66 | h ^= h >>> 13;
67 | h *= m;
68 | h ^= h >>> 15;
69 |
70 | return h;
71 | }
72 |
73 | /**
74 | * Generates 32 bit hash from byte array with default seed value.
75 | *
76 | * @param data byte array to hash
77 | * @param length length of the array to hash
78 | * @return 32 bit hash of the given array
79 | */
80 | public static int hash32(final byte[] data, int length) {
81 | return hash32(data, length, 0x9747b28c);
82 | }
83 |
84 | /**
85 | * Generates 32 bit hash from a string.
86 | *
87 | * @param text string to hash
88 | * @return 32 bit hash of the given string
89 | */
90 | public static int hash32(final String text) {
91 | final byte[] bytes = text.getBytes();
92 | return hash32(bytes, bytes.length);
93 | }
94 |
95 | /**
96 | * Generates 32 bit hash from a substring.
97 | *
98 | * @param text string to hash
99 | * @param from starting index
100 | * @param length length of the substring to hash
101 | * @return 32 bit hash of the given string
102 | */
103 | public static int hash32(final String text, int from, int length) {
104 | return hash32(text.substring( from, from+length));
105 | }
106 |
107 | /**
108 | * Generates 64 bit hash from byte array of the given length and seed.
109 | *
110 | * @param data byte array to hash
111 | * @param length length of the array to hash
112 | * @param seed initial seed value
113 | * @return 64 bit hash of the given array
114 | */
115 | public static long hash64(final byte[] data, int length, int seed) {
116 | final long m = 0xc6a4a7935bd1e995L;
117 | final int r = 47;
118 |
119 | long h = (seed&0xffffffffL)^(length*m);
120 |
121 | int length8 = length/8;
122 |
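123 | for (int i=0; i<length8; i++) {
124 | final int i8 = i*8;
125 | long k = ((long)data[i8+0]&0xff) + (((long)data[i8+1]&0xff)<<8)
126 | + (((long)data[i8+2]&0xff)<<16) + (((long)data[i8+3]&0xff)<<24)
127 | + (((long)data[i8+4]&0xff)<<32) + (((long)data[i8+5]&0xff)<<40)
128 | + (((long)data[i8+6]&0xff)<<48) + (((long)data[i8+7]&0xff)<<56);
129 |
130 | k *= m;
131 | k ^= k >>> r;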
132 | k *= m;
133 |
134 | h ^= k;
135 | h *= m;
136 | }
137 |
138 | switch (length%8) {
139 | case 7: h ^= (long)(data[(length&~7)+6]&0xff) << 48;
140 | case 6: h ^= (long)(data[(length&~7)+5]&0xff) << 40;
141 | case 5: h ^= (long)(data[(length&~7)+4]&0xff) << 32;
142 | case 4: h ^= (long)(data[(length&~7)+3]&0xff) << 24;
143 | case 3: h ^= (long)(data[(length&~7)+2]&0xff) << 16;
144 | case 2: h ^= (long)(data[(length&~7)+1]&0xff) << 8;
145 | case 1: h ^= (long)(data[length&~7]&0xff);
146 | h *= m;
147 | };
148 |
149 | h ^= h >>> r;
150 | h *= m;
151 | h ^= h >>> r;
152 |
153 | return h;
154 | }
155 |
156 | /**
157 | * Generates 64 bit hash from byte array with default seed value.
158 | *
159 | * @param data byte array to hash
160 | * @param length length of the array to hash
161 | * @return 64 bit hash of the given string
162 | */
163 | public static long hash64(final byte[] data, int length) {
164 | return hash64(data, length, 0xe17a1465);
165 | }
166 |
167 | /**
168 | * Generates 64 bit hash from a string.
169 | *
170 | * @param text string to hash
171 | * @return 64 bit hash of the given string
172 | */
173 | public static long hash64(final String text) {
174 | final byte[] bytes = text.getBytes();
175 | return hash64(bytes, bytes.length);
176 | }
177 |
178 | /**
179 | * Generates 64 bit hash from a substring.
180 | *
181 | * @param text string to hash
182 | * @param from starting index
183 | * @param length length of the substring to hash
184 | * @return 64 bit hash of the given array
185 | */
186 | public static long hash64(final String text, int from, int length) {
187 | return hash64(text.substring( from, from+length));
188 | }
189 | }
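`BloomFilter.checkAdd` turns two 32-bit Murmur2 values into one bit position per probe by double hashing. The sketch below shows just that step, under stated assumptions: the class `BitPositionSketch` and its method names are hypothetical, and the hash inputs in `main` are made-up stand-ins rather than real Murmur2 output.

```java
// Sketch only: the probe arithmetic from BloomFilter.checkAdd, without hashing.
// BitPositionSketch and its method names are hypothetical, not part of InBloom.
final class BitPositionSketch {
    // Bit index for probe i, with a and b treated as unsigned 32-bit values.
    static long bitIndex(long a, long b, int i, int bits) {
        // Wrap a + i*b to 32 bits, matching the (int) cast followed by
        // unsigned() in checkAdd, then reduce modulo the filter size.
        return ((a + (long) i * b) & 0xffffffffL) % bits;
    }

    // Index of the byte that holds the given bit.
    static int byteOffset(long bitIndex) {
        return (int) (bitIndex >> 3);
    }

    // Mask selecting the given bit within its byte.
    static int bitMask(long bitIndex) {
        return 1 << (int) (bitIndex % 8);
    }

    public static void main(String[] args) {
        long a = 10; // made-up stand-in for the first hash
        long b = 3;  // made-up stand-in for the second hash
        int bits = 9;
        int hashes = 4;
        for (int i = 0; i < hashes; i++) {
            long x = bitIndex(a, b, i, bits);
            System.out.printf("probe %d: bit %d, byte %d, mask 0x%02x%n",
                    i, x, byteOffset(x), bitMask(x));
        }
    }
}
```

Two probes may land on the same bit (here probes 0 and 3 both map to bit 1), which is expected with small filters.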
--------------------------------------------------------------------------------
/java/src/test/java/me/everything/inbloom/BloomFilterTest.java:
--------------------------------------------------------------------------------
1 | package me.everything.inbloom;
2 |
3 | import junit.framework.TestCase;
4 |
5 | import java.io.InvalidObjectException;
6 |
7 | /**
8 | * Created by dvirsky on 23/07/15.
9 | */
10 | public class BloomFilterTest extends TestCase {
11 |
12 | public void testCreateFilterWithBadParameters() {
13 | try {
14 | BloomFilter bf = new BloomFilter( -1, 0.01 );
15 | fail( "should have thrown an exception" );
16 | }
17 | catch( RuntimeException e ) {
18 | assertEquals( "Invalid params for bloom filter", e.getMessage() );
19 | }
20 |
21 | //
22 | // Error value cannot be arbitrarily close to zero. Must be bounded.
23 | //
24 | try {
25 | BloomFilter bf = new BloomFilter( 1, 0.000000001 );
26 | fail( "should have thrown an exception: error value is not bounded by a precision or tolerance value." );
27 | }
28 | catch( RuntimeException e ) {
29 | assertEquals( "Invalid params for bloom filter", e.getMessage() );
30 | }
31 |
32 | //
33 | // Creating an unusable bloom filter (zero bits, zero hashes) should fail.
34 | //
35 | try {
36 | BloomFilter bf = new BloomFilter(199, 1.0);
37 | fail( "should have thrown an exception: created an unusable bloom filter with zero bits and zero hashes." );
38 | }
39 | catch( RuntimeException e ) {
40 | assertEquals( "Invalid params for bloom filter", e.getMessage() );
41 | }
42 |
43 | //
44 | // Giving it an error value that is too big should fail.
45 | //
46 | try {
47 | BloomFilter bf = new BloomFilter(199, 100.0);
48 | fail( "should have thrown an exception: error value is too big." );
49 | }
50 | catch( RuntimeException e ) {
51 | assertEquals( "Invalid params for bloom filter", e.getMessage() );
52 | }
53 |
54 | //
55 | // Adding more data than expected should fail.
56 | //
57 | try {
58 | byte []data = "add more entries than bytes available".getBytes();
59 | BloomFilter bf0 = new BloomFilter(data, 1, 0.1);
60 | fail( "should have thrown an exception: too much data." );
61 | }
62 | catch( RuntimeException e ) {
63 | assertEquals( "Expected 1 bytes, got 37", e.getMessage() );
64 | }
65 |
66 | }
67 |
68 |
69 | public void testValuesFromPublicAPI() {
70 | BloomFilter bf;
71 | assertEquals(0.000000001, BloomFilter.errorPrecision);
72 |
73 | bf = new BloomFilter(1, 0.01);
74 | assertEquals(2, bf.bytes);
75 | assertEquals(1, bf.entries);
76 | assertEquals(0.01, bf.error);
77 | assertEquals(9, bf.bits);
78 | assertEquals(7, bf.hashes);
79 |
80 | bf = new BloomFilter(1, 0.1);
81 | assertEquals(1, bf.bytes);
82 | assertEquals(1, bf.entries);
83 | assertEquals(0.1, bf.error);
84 | assertEquals(4, bf.bits);
85 | assertEquals(4, bf.hashes);
86 |
87 | bf = new BloomFilter(8, 0.000001);
88 | assertEquals(29, bf.bytes);
89 | assertEquals(8, bf.entries);
90 | assertEquals(0.000001, bf.error);
91 | assertEquals(230, bf.bits);
92 | assertEquals(20, bf.hashes);
93 | }
94 |
95 |
96 | public void testFilter() throws InvalidObjectException {
97 | BloomFilter bf = new BloomFilter(20, 0.01);
98 |
99 |
100 | bf.add("foo");
101 | bf.add("bar");
102 | bf.add("foosdfsdfs");
103 | bf.add("fossdfsdfo");
104 | bf.add("foasdfasdfasdfasdfo");
105 | bf.add("foasdfasdfasdasdfasdfasdfasdfasdfo");
106 |
107 |
108 | assertTrue(bf.contains("foo"));
109 | assertTrue(bf.contains("bar"));
110 |
111 | assertFalse(bf.contains("baz"));
112 | assertFalse(bf.contains("faskdjfhsdkfjhsjdkfhskdjfh"));
113 |
114 |
115 | BloomFilter bf2 = new BloomFilter(bf.bf, bf.entries, bf.error);
116 | assertTrue(bf2.contains("foo"));
117 | assertTrue(bf2.contains("bar"));
118 |
119 | assertFalse(bf2.contains("baz"));
120 | assertFalse(bf2.contains("faskdjfhsdkfjhsjdkfhskdjfh"));
121 |
122 | String serialized = BinAscii.hexlify(BloomFilter.dump(bf));
123 | System.out.printf("Serialized: %s\n", serialized);
124 |
125 | String hexPayload = "620d006400000014000000000020001000080000000000002000100008000400";
126 | BloomFilter deserialized = BloomFilter.load(BinAscii.unhexlify(hexPayload));
127 | String dump = BinAscii.hexlify(BloomFilter.dump(deserialized));
128 | System.out.printf("Re-Serialized: %s\n", dump);
129 | assertEquals(hexPayload, dump.toLowerCase());
130 |
131 | //BloomFilter deserialized = BloomFilter.load(BloomFilter.dump(bf));
132 |
133 |
134 | assertEquals(20, deserialized.entries);
135 | assertEquals(0.01, deserialized.error);
136 | assertTrue(deserialized.contains("abc"));
137 | }
138 | }
139 |
--------------------------------------------------------------------------------
/py/MANIFEST.in:
--------------------------------------------------------------------------------
1 | include VERSION
2 | include README.rst
3 | graft vendor
4 | include inbloom/crc32.c
5 |
--------------------------------------------------------------------------------
/py/README.md:
--------------------------------------------------------------------------------
1 | # inbloom (Python)
2 |
3 | - https://github.com/EverythingMe/inbloom/tree/master/py
4 | - https://pypi.python.org/pypi/inbloom/
5 |
6 | Package inbloom implements a portable bloom filter that can export and import
7 | data to and from implementations of the same library in different languages.
8 |
9 | This implementation is a C extension that wraps libbloom (https://github.com/jvirkki/libbloom).
10 |
11 |
12 | ## Installation
13 |
14 | ```bash
15 | pip install inbloom
16 | ```
17 |
18 | ## Usage
19 |
20 | ```python
21 | import inbloom
22 |
23 | bf = inbloom.Filter(entries=100, error=0.01)
24 | bf.add("abc")
25 | bf.add("def")
26 |
27 | assert bf.contains("abc")
28 | assert bf.contains("def")
29 | assert not bf.contains("ghi")
30 |
31 | bf2 = inbloom.Filter(entries=100, error=0.01, data=bf.buffer())
32 | assert bf2.contains("abc")
33 | assert bf2.contains("def")
34 | assert not bf2.contains("ghi")
35 | ```
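The `entries` and `error` arguments determine how much memory the filter needs. The sketch below estimates that size in pure Python. It assumes the standard Bloom filter sizing formulas (bits per entry = -ln(error) / ln(2)^2, hashes = ceil(ln(2) * bits per entry)), which reproduce the values checked in the Java tests. `filter_size` is a hypothetical helper for illustration, not part of the inbloom API.

```python
import math


def filter_size(entries, error):
    """Hypothetical helper: estimate (bits, bytes, hashes) for a filter.

    Assumes the standard sizing formulas; not part of the inbloom API.
    """
    bits_per_entry = -math.log(error) / (math.log(2) ** 2)
    bits = int(entries * bits_per_entry)   # truncated, not rounded
    nbytes = (bits + 7) // 8               # round up to whole bytes
    hashes = int(math.ceil(math.log(2) * bits_per_entry))
    return bits, nbytes, hashes


print(filter_size(20, 0.01))  # (191, 24, 7)
```

For `entries=20, error=0.01` this gives 24 bytes, which matches the length of the filter data in the serialized payload shown in this README (32 bytes in total, of which the first 8 are the header).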
36 |
37 | ##### Serialization
38 |
39 | ```python
40 | import inbloom
41 | import binascii
42 |
43 | payload = '620d006400000014000000000020001000080000000000002000100008000400'
44 | assert binascii.hexlify(inbloom.dump(inbloom.load(binascii.unhexlify(payload)))) == payload
45 | ```
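The serialized payload starts with an 8-byte big-endian header (a 16-bit checksum, the inverse error rate as a 16-bit integer, and the number of entries as a 32-bit integer), followed by the raw filter bytes. The sketch below decodes the payload above using only the standard library. `parse_payload` is a hypothetical helper, and it assumes the checksum is the CRC-32 of the filter bytes folded to 16 bits, as `compute_checksum` does in the C extension.

```python
import binascii
import struct
import zlib


def parse_payload(payload):
    """Hypothetical helper: split a serialized filter into its parts.

    Returns (entries, error, data). Not part of the inbloom API.
    """
    if len(payload) < 9:  # 8-byte header plus at least one data byte
        raise ValueError("incomplete payload")
    checksum, inverse_error, entries = struct.unpack(">HHI", payload[:8])
    data = payload[8:]
    crc = zlib.crc32(data) & 0xFFFFFFFF      # force an unsigned 32-bit value
    folded = (crc & 0xFFFF) ^ (crc >> 16)    # fold 32 bits down to 16
    if folded != checksum:
        raise ValueError("checksum mismatch")
    return entries, 1.0 / inverse_error, data


payload = binascii.unhexlify(
    "620d006400000014000000000020001000080000000000002000100008000400")
entries, error, data = parse_payload(payload)
print("%d %s %d" % (entries, error, len(data)))  # 20 0.01 24
```

Because the header stores the error rate as `1 / error` in 16 bits, only error rates whose inverse fits in that range survive a round trip through `dump` and `load`.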
46 |
--------------------------------------------------------------------------------
/py/README.rst:
--------------------------------------------------------------------------------
1 | inbloom (Python)
2 | ================
3 |
4 | - https://github.com/EverythingMe/inbloom/tree/master/py
5 | - https://pypi.python.org/pypi/inbloom/
6 |
7 | Package inbloom implements a portable bloom filter that can export and
8 | import data to and from implementations of the same library in different
9 | languages.
10 |
11 | This implementation is a C extension that wraps libbloom
12 | (https://github.com/jvirkki/libbloom).
13 |
14 | Installation
15 | ------------
16 |
17 | .. code:: bash
18 |
19 | pip install inbloom
20 |
21 | Usage
22 | -----
23 |
24 | .. code:: python
25 |
26 | import inbloom
27 |
28 | bf = inbloom.Filter(entries=100, error=0.01)
29 | bf.add("abc")
30 | bf.add("def")
31 |
32 | assert bf.contains("abc")
33 | assert bf.contains("def")
34 | assert not bf.contains("ghi")
35 |
36 | bf2 = inbloom.Filter(entries=100, error=0.01, data=bf.buffer())
37 | assert bf2.contains("abc")
38 | assert bf2.contains("def")
39 | assert not bf2.contains("ghi")
40 |
41 | Serialization
42 | '''''''''''''
43 |
44 | .. code:: python
45 |
46 | import inbloom
47 | import binascii
48 |
49 | payload = '620d006400000014000000000020001000080000000000002000100008000400'
50 | assert binascii.hexlify(inbloom.dump(inbloom.load(binascii.unhexlify(payload)))) == payload
51 |
52 |
--------------------------------------------------------------------------------
/py/VERSION:
--------------------------------------------------------------------------------
1 | 0.2.2
2 |
--------------------------------------------------------------------------------
/py/generate_rst:
--------------------------------------------------------------------------------
1 | pandoc --from=markdown --to=rst --output=README.rst README.md
2 |
--------------------------------------------------------------------------------
/py/inbloom/crc32.c:
--------------------------------------------------------------------------------
1 | /*-
2 | * COPYRIGHT (C) 1986 Gary S. Brown. You may use this program, or
3 | * code or tables extracted from it, as desired without restriction.
4 | *
5 | * First, the polynomial itself and its table of feedback terms. The
6 | * polynomial is
7 | * X^32+X^26+X^23+X^22+X^16+X^12+X^11+X^10+X^8+X^7+X^5+X^4+X^2+X^1+X^0
8 | *
9 | * Note that we take it "backwards" and put the highest-order term in
10 | * the lowest-order bit. The X^32 term is "implied"; the LSB is the
11 | * X^31 term, etc. The X^0 term (usually shown as "+1") results in
12 | * the MSB being 1
13 | *
14 | * Note that the usual hardware shift register implementation, which
15 | * is what we're using (we're merely optimizing it by doing eight-bit
16 | * chunks at a time) shifts bits into the lowest-order term. In our
17 | * implementation, that means shifting towards the right. Why do we
18 | * do it this way? Because the calculated CRC must be transmitted in
19 | * order from highest-order term to lowest-order term. UARTs transmit
20 | * characters in order from LSB to MSB. By storing the CRC this way
21 | * we hand it to the UART in the order low-byte to high-byte; the UART
22 | * sends each low-bit to high-bit; and the result is transmission bit
23 | * by bit from highest- to lowest-order term without requiring any bit
24 | * shuffling on our part. Reception works similarly
25 | *
26 | * The feedback terms table consists of 256, 32-bit entries. Notes
27 | *
28 | * The table can be generated at runtime if desired; code to do so
29 | * is shown later. It might not be obvious, but the feedback
30 | * terms simply represent the results of eight shift/xor
31 | * operations for all combinations of data and CRC register values
32 | *
33 | * The values must be right-shifted by eight bits by the "updcrc"
34 | * logic; the shift must be unsigned (bring in zeroes). On some
35 | * hardware you could probably optimize the shift in assembler by
36 | * using byte-swap instructions
37 | * polynomial $edb88320
38 | *
39 | *
40 | * CRC32 code derived from work by Gary S. Brown.
41 | */
42 |
43 | #include <stdint.h>
44 | #include <stddef.h>
45 |
46 | static uint32_t crc32_tab[] = {
47 | 0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419, 0x706af48f,
48 | 0xe963a535, 0x9e6495a3, 0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988,
49 | 0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91, 0x1db71064, 0x6ab020f2,
50 | 0xf3b97148, 0x84be41de, 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7,
51 | 0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec, 0x14015c4f, 0x63066cd9,
52 | 0xfa0f3d63, 0x8d080df5, 0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172,
53 | 0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b, 0x35b5a8fa, 0x42b2986c,
54 | 0xdbbbc9d6, 0xacbcf940, 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59,
55 | 0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116, 0x21b4f4b5, 0x56b3c423,
56 | 0xcfba9599, 0xb8bda50f, 0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924,
57 | 0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d, 0x76dc4190, 0x01db7106,
58 | 0x98d220bc, 0xefd5102a, 0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433,
59 | 0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818, 0x7f6a0dbb, 0x086d3d2d,
60 | 0x91646c97, 0xe6635c01, 0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e,
61 | 0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457, 0x65b0d9c6, 0x12b7e950,
62 | 0x8bbeb8ea, 0xfcb9887c, 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65,
63 | 0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2, 0x4adfa541, 0x3dd895d7,
64 | 0xa4d1c46d, 0xd3d6f4fb, 0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0,
65 | 0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9, 0x5005713c, 0x270241aa,
66 | 0xbe0b1010, 0xc90c2086, 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f,
67 | 0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4, 0x59b33d17, 0x2eb40d81,
68 | 0xb7bd5c3b, 0xc0ba6cad, 0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a,
69 | 0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683, 0xe3630b12, 0x94643b84,
70 | 0x0d6d6a3e, 0x7a6a5aa8, 0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1,
71 | 0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe, 0xf762575d, 0x806567cb,
72 | 0x196c3671, 0x6e6b06e7, 0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc,
73 | 0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5, 0xd6d6a3e8, 0xa1d1937e,
74 | 0x38d8c2c4, 0x4fdff252, 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b,
75 | 0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60, 0xdf60efc3, 0xa867df55,
76 | 0x316e8eef, 0x4669be79, 0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236,
77 | 0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f, 0xc5ba3bbe, 0xb2bd0b28,
78 | 0x2bb45a92, 0x5cb36a04, 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d,
79 | 0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a, 0x9c0906a9, 0xeb0e363f,
80 | 0x72076785, 0x05005713, 0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38,
81 | 0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21, 0x86d3d2d4, 0xf1d4e242,
82 | 0x68ddb3f8, 0x1fda836e, 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777,
83 | 0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c, 0x8f659eff, 0xf862ae69,
84 | 0x616bffd3, 0x166ccf45, 0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2,
85 | 0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db, 0xaed16a4a, 0xd9d65adc,
86 | 0x40df0b66, 0x37d83bf0, 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9,
87 | 0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6, 0xbad03605, 0xcdd70693,
88 | 0x54de5729, 0x23d967bf, 0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94,
89 | 0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d
90 | };
91 |
92 | uint32_t
93 | crc32(uint32_t crc, const void *buf, size_t size)
94 | {
95 | const uint8_t *p;
96 |
97 | p = buf;
98 | crc = crc ^ ~0U;
99 |
100 | while (size--)
101 | crc = crc32_tab[(crc ^ *p++) & 0xFF] ^ (crc >> 8);
102 |
103 | return crc ^ ~0U;
104 | }
105 |
--------------------------------------------------------------------------------
/py/inbloom/inbloom.c:
--------------------------------------------------------------------------------
1 | #include <Python.h>
2 | #include <arpa/inet.h>
3 | #include "../vendor/libbloom/bloom.h"
4 | #include "crc32.c"
5 |
6 | static char module_docstring[] = "Python wrapper for libbloom";
7 |
8 | typedef struct {
9 | PyObject_HEAD;
10 | struct bloom *_bloom_struct;
11 | } Filter;
12 |
13 | static PyTypeObject FilterType = {
14 | PyObject_HEAD_INIT(NULL)
15 | 0, /*ob_size*/
16 | "inbloom.Filter", /*tp_name*/
17 | sizeof(Filter), /*tp_basicsize*/
18 | 0, /*tp_itemsize*/
19 | 0, /*tp_dealloc*/
20 | 0, /*tp_print*/
21 | 0, /*tp_getattr*/
22 | 0, /*tp_setattr*/
23 | 0, /*tp_compare*/
24 | 0, /*tp_repr*/
25 | 0, /*tp_as_number*/
26 | 0, /*tp_as_sequence*/
27 | 0, /*tp_as_mapping*/
28 | 0, /*tp_hash */
29 | 0, /*tp_call*/
30 | 0, /*tp_str*/
31 | 0, /*tp_getattro*/
32 | 0, /*tp_setattro*/
33 | 0, /*tp_as_buffer*/
34 | Py_TPFLAGS_DEFAULT, /*tp_flags*/
35 | "Filter objects", /*tp_doc*/
36 | };
37 |
38 | struct serialized_filter_header {
39 | uint16_t checksum;
40 | uint16_t error_rate;
41 | uint32_t cardinality;
42 | };
43 |
44 | static PyObject *InBloomError;
45 |
46 | static PyObject *
47 | instantiate_filter(uint32_t cardinality, uint16_t error_rate, const char *data, int datalen)
48 | {
49 | PyObject *args = Py_BuildValue("(ids#)", cardinality, 1.0 / error_rate, data, datalen);
50 | PyObject *obj = FilterType.tp_new(&FilterType, args, NULL);
51 | if (obj != NULL && FilterType.tp_init(obj, args, NULL) < 0) {
52 | Py_DECREF(obj);
53 | obj = NULL;
54 | }
55 | Py_DECREF(args);
56 | return obj;
57 | }
58 |
59 | /* helpers */
60 | static uint16_t
61 | compute_checksum(const char *buf, size_t len)
62 | {
63 | uint32_t checksum32 = crc32(0, buf, len);
64 | return (checksum32 & 0xFFFF) ^ (checksum32 >> 16);
65 | }
66 |
67 | static uint16_t
68 | read_uint16(const char **buffer)
69 | {
70 | uint16_t ret = ntohs(*((uint16_t *)*buffer));
71 | *buffer += sizeof(uint16_t);
72 | return ret;
73 | }
74 |
75 | static uint32_t
76 | read_uint32(const char **buffer)
77 | {
78 | uint32_t ret = ntohl(*((uint32_t *)*buffer));
79 | *buffer += sizeof(uint32_t);
80 | return ret;
81 | }
82 |
83 |
84 | /* serialization */
85 | static PyObject *
86 | load(PyObject *self, PyObject *args)
87 | {
88 | const char *buffer;
89 | Py_ssize_t buflen;
90 | if (!PyArg_ParseTuple(args, "s#", &buffer, &buflen)) {
91 | return NULL;
92 | }
93 |
94 | if ((int)buflen < sizeof(struct serialized_filter_header) + 1) {
95 | PyErr_SetString(InBloomError, "incomplete payload");
96 | return NULL;
97 | }
98 |
99 | struct serialized_filter_header header;
100 | header.checksum = read_uint16(&buffer);
101 | header.error_rate = read_uint16(&buffer);
102 | header.cardinality = read_uint32(&buffer);
103 | const char *data = buffer;
104 | size_t datalen = (int)buflen - sizeof(struct serialized_filter_header);
105 | uint16_t expected_checksum = compute_checksum(data, datalen);
106 | if (expected_checksum != header.checksum) {
107 | PyErr_SetString(InBloomError, "checksum mismatch");
108 | return NULL;
109 | }
110 | return instantiate_filter(header.cardinality, header.error_rate, data, datalen);
111 | }
112 |
113 | static PyObject *
114 | dump(PyObject *self, PyObject *args)
115 | {
116 | Filter *filter;
117 | if (!PyArg_ParseTuple(args, "O!", &FilterType, &filter)) {
118 | return NULL;
119 | }
120 | uint16_t checksum = compute_checksum((const char *)filter->_bloom_struct->bf, filter->_bloom_struct->bytes);
121 |
122 | struct serialized_filter_header header = {htons(checksum), htons(1.0 / filter->_bloom_struct->error), htonl(filter->_bloom_struct->entries)};
123 | PyObject *serial_header = PyString_FromStringAndSize((const char *)&header, sizeof(struct serialized_filter_header));
124 | PyObject *serial_data = PyString_FromStringAndSize((const char *)filter->_bloom_struct->bf, filter->_bloom_struct->bytes);
125 | PyString_Concat(&serial_header, serial_data);
126 | Py_DECREF(serial_data);
127 | return serial_header;
128 | }
129 |
130 | static PyMethodDef module_methods[] = {
131 | {"load", (PyCFunction)load, METH_VARARGS,
132 | "load a serialized filter"},
133 | {"dump", (PyCFunction)dump, METH_VARARGS,
134 | "dump a filter into a string"},
135 | {NULL}
136 | };
137 |
138 | /* Filter methods */
139 | static PyObject *
140 | Filter_add(Filter *self, PyObject *args)
141 | {
142 | const char *buffer;
143 | Py_ssize_t buflen;
144 | if (!PyArg_ParseTuple(args, "s#", &buffer, &buflen)) {
145 | return NULL;
146 | }
147 |
148 | bloom_add(self->_bloom_struct, buffer, buflen);
149 | Py_RETURN_NONE;
150 | }
151 |
152 | static PyObject *
153 | Filter_check(Filter *self, PyObject *args)
154 | {
155 | const char *buffer;
156 | Py_ssize_t buflen;
157 | if (!PyArg_ParseTuple(args, "s#", &buffer, &buflen)) {
158 | return NULL;
159 | }
160 |
161 | if (bloom_check(self->_bloom_struct, buffer, buflen))
162 | Py_RETURN_TRUE;
163 | else
164 | Py_RETURN_FALSE;
165 | }
166 |
167 | static PyObject *
168 | Filter_buffer(Filter *self, PyObject *args)
169 | {
170 | return PyString_FromStringAndSize((const char *)self->_bloom_struct->bf, self->_bloom_struct->bytes);
171 | }
172 |
173 | static PyMethodDef Filter_methods[] = {
174 | {"add", (PyCFunction)Filter_add, METH_VARARGS,
175 | "add a member to the filter"},
176 | {"contains", (PyCFunction)Filter_check, METH_VARARGS,
177 | "check if member exists in the filter"},
178 | {"buffer", (PyCFunction)Filter_buffer, METH_NOARGS,
179 | "get a copy of the internal buffer"},
180 | {NULL} /* Sentinel */
181 | };
182 |
183 | static void
184 | Filter_dealloc(Filter* self)
185 | {
186 | bloom_free(self->_bloom_struct);
187 | free(self->_bloom_struct);
188 | self->ob_type->tp_free((PyObject*)self);
189 | }
190 |
191 | static PyObject *
192 | Filter_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
193 | {
194 | Filter *self;
195 |
196 | self = (Filter *)type->tp_alloc(type, 0);
197 | if (self != NULL) {
198 | self->_bloom_struct = (struct bloom *)malloc(sizeof(struct bloom));
199 | if (self->_bloom_struct == NULL)
200 | return PyErr_NoMemory();
201 | }
202 |
203 | return (PyObject *)self;
204 | }
205 |
206 | static int
207 | Filter_init(Filter *self, PyObject *args, PyObject *kwargs)
208 | {
209 | static char *kwlist[] = {"entries", "error", "data", NULL};
210 | int entries, success;
211 | double error;
212 | const char *data = NULL;
213 | Py_ssize_t len;
214 | if (!PyArg_ParseTupleAndKeywords(args, kwargs, "id|s#", kwlist, &entries, &error, &data, &len)) {
215 | return -1;
216 | }
217 | success = bloom_init(self->_bloom_struct, entries, error);
218 | if (success == 0) {
219 | if (data != NULL) {
220 | if ((int)len != self->_bloom_struct->bytes) {
221 | PyErr_SetString(InBloomError, "invalid data length");
222 | return -1;
223 | }
224 | memcpy(self->_bloom_struct->bf, (const unsigned char *)data, self->_bloom_struct->bytes);
225 | }
226 | return 0;
227 | }
228 | else {
229 | PyErr_SetString(InBloomError, "internal initialization failed");
230 | return -1;
231 | }
232 | }
233 |
234 |
235 | #ifndef PyMODINIT_FUNC
236 | #define PyMODINIT_FUNC void
237 | #endif
238 | PyMODINIT_FUNC
239 | initinbloom(void)
240 | {
241 | PyObject *m;
242 | FilterType.tp_new = Filter_new;
243 | FilterType.tp_init = (initproc)Filter_init;
244 | FilterType.tp_methods = Filter_methods;
245 | FilterType.tp_dealloc = (destructor)Filter_dealloc;
246 | if (PyType_Ready(&FilterType) < 0)
247 | return;
248 |
249 | m = Py_InitModule3("inbloom", module_methods, module_docstring);
250 | Py_INCREF(&FilterType);
251 | PyModule_AddObject(m, "Filter", (PyObject *)&FilterType);
252 |
253 | InBloomError = PyErr_NewException("inbloom.error", NULL, NULL);
254 | Py_INCREF(InBloomError);
255 | PyModule_AddObject(m, "error", InBloomError);
256 | }
257 |
--------------------------------------------------------------------------------
/py/setup.py:
--------------------------------------------------------------------------------
1 | from distutils.core import setup, Extension
2 | from os import path
3 |
4 | pwd = lambda f: path.join(path.abspath(path.dirname(__file__)), f)
5 | contents = lambda f: open(pwd(f)).read().strip()
6 |
7 | module = Extension('inbloom',
8 | ['inbloom/inbloom.c', 'vendor/libbloom/bloom.c', 'vendor/libbloom/murmur2/MurmurHash2.c'],
9 | include_dirs=['vendor/libbloom/murmur2']
10 | )
11 |
12 | setup(
13 | name='inbloom',
14 | author='EverythingMe',
15 | description='Portable, cross-language Bloom filter implementation, with compatible libraries in Java and Go',
16 | long_description=contents('README.rst'),
17 | version=contents('VERSION'),
18 | url='https://github.com/EverythingMe/inbloom',
19 | ext_modules=[module],
20 | license='BSD',
21 | )
22 |
--------------------------------------------------------------------------------
/py/test.py:
--------------------------------------------------------------------------------
1 | from __future__ import absolute_import, division
2 | from unittest import TestCase
3 | from binascii import hexlify
4 | import inbloom
5 |
6 |
7 | class InBloomTestCase(TestCase):
8 | def test_functionality(self):
9 | bf = inbloom.Filter(20, 0.01)
10 | keys = ["foo", "bar", "foosdfsdfs", "fossdfsdfo", "foasdfasdfasdfasdfo", "foasdfasdfasdasdfasdfasdfasdfasdfo"]
11 | faux = ["goo", "gar", "gaz"]
12 | for k in keys:
13 | bf.add(k)
14 |
15 | for k in keys:
16 | assert bf.contains(k)
17 |
18 | for k in faux:
19 | assert not bf.contains(k)
20 |
21 | expected = '02000C0300C2246913049E040002002000017614002B0002'
22 | actual = hexlify(bf.buffer()).upper()
23 | assert expected == actual
24 |
25 | def test_dump_load(self):
26 | bf = inbloom.Filter(20, 0.01)
27 | bf.add('abc')
28 | expected = '620d006400000014000000000020001000080000000000002000100008000400'
29 | actual = hexlify(inbloom.dump(bf))
30 | assert expected == actual
31 |
32 | bf = inbloom.load(inbloom.dump(bf))
33 | actual = hexlify(inbloom.dump(bf))
34 | assert expected == actual
35 |
36 | data = inbloom.dump(bf)
37 | data = '\xff\xff' + data[2:]
38 |
39 | with self.assertRaisesRegexp(inbloom.error, "checksum mismatch"):
40 | inbloom.load(data)
41 |
42 | data = data[:4]
43 | with self.assertRaisesRegexp(inbloom.error, "incomplete payload"):
44 | inbloom.load(data)
45 |
--------------------------------------------------------------------------------