├── .gitignore
├── LICENSE
├── README.md
├── extras
│   ├── bounds-check.py
│   └── universality-proof.md
├── polymur-hash.h
└── test.c

/.gitignore:
--------------------------------------------------------------------------------
*.o
*.pyc
.vscode
a.out

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
Copyright (c) 2023 Orson Peters

This software is provided 'as-is', without any express or implied warranty. In
no event will the authors be held liable for any damages arising from the use of
this software.

Permission is granted to anyone to use this software for any purpose, including
commercial applications, and to alter it and redistribute it freely, subject to
the following restrictions:

1. The origin of this software must not be misrepresented; you must not claim
   that you wrote the original software. If you use this software in a product,
   an acknowledgment in the product documentation would be appreciated but is
   not required.

2. Altered source versions must be plainly marked as such, and must not be
   misrepresented as being the original software.

3. This notice may not be removed or altered from any source distribution.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# PolymurHash

PolymurHash is a 64-bit [universal hash
function](https://en.wikipedia.org/wiki/Universal_hashing) designed for use
in hash tables. It has a couple of desirable properties:

- It is **mathematically proven** to have a statistically low collision rate.
  When initialized with an independently chosen random seed, for any distinct
  pair of inputs `m` and `m'` of up to `n` bytes the probability that `h(m) =
  h(m')` is at most `n * 2^-60.2`. This is known as an almost-universal hash
  function. In fact PolymurHash has a stronger property: it is almost pairwise
  independent. For any two distinct inputs `m` and `m'` the probability they
  hash to the pair of specific 64-bit hash values `(x, y)` is at most `n *
  2^-124.2`.

- It is very fast for short inputs, while being no slouch for longer inputs. On
  an Apple M1 machine it can hash any input <= 49 bytes in 21 cycles, and
  processes 33.3 GiB/sec (11.6 bytes / cycle) for long inputs.

- It is cross-platform, using no extended instruction sets such as
  CLMUL or AES-NI. For good speed it only requires native 64 x 64 -> 128 bit
  multiplication, which almost all 64-bit processors have.

- It is small in code size and space. Ignoring cross-platform compatibility
  definitions, the hash function and initialization procedure are just over 100
  lines of C code combined. The initialized hash function uses 32 bytes of
  memory to store its parameters, and it uses only a small constant amount of
  stack memory when computing a hash.

To my knowledge PolymurHash is the first hash function to have all those
properties. There are already very fast universal hash functions, such as
[CLHASH](https://github.com/lemire/clhash),
[umash](https://github.com/backtrace-labs/umash),
[HalftimeHash](https://github.com/jbapple/HalftimeHash), etc., but they all
require a large amount of space (1KiB+) to store the hash function parameters,
are not optimized for hashing small strings, and/or require specific instruction
sets such as CLMUL.
There are also very fast cross-platform hashes such as
[xxHash3](https://github.com/Cyan4973/xxHash),
[komihash](https://github.com/avaneev/komihash) or
[wyhash](https://github.com/wangyi-fudan/wyhash), but they do not come with
proofs of universality. [SipHash](https://en.wikipedia.org/wiki/SipHash) claims
to be cryptographically secure, but is relatively slow, leading people to use
reduced-round variants with unknown cryptanalysis, or to use a fast but insecure
hash altogether.

Needless to say, PolymurHash passes the full [SMHasher
suite](https://github.com/rurban/smhasher/) without any failures[*](https://github.com/rurban/smhasher/issues/114#issuecomment-1587631635). For the proof
of the collision rate, see
[`extras/universality-proof.md`](extras/universality-proof.md).


## How to use it

PolymurHash is provided as a header-only C library `polymur-hash.h`. Simply
include it and you are good to go. First the hash function must have its
`PolymurHashParams` initialized from a seed, for which there are two functions.
PolymurHash uses two 64-bit secrets, but provides a convenient function to
expand a single 64-bit seed into both:

```c
void polymur_init_params(PolymurHashParams* p, uint64_t k, uint64_t s);
void polymur_init_params_from_seed(PolymurHashParams* p, uint64_t seed);
```

The proof of almost universality applies to both, but for the proof of almost
pairwise independence to hold you must provide 128 bits of entropy. Once
initialization is complete, you can compute as many hashes as you want with it:

```c
// Computes the full hash of buf. The tweak is added to the hash before final
// mixing, allowing different outputs much faster than re-seeding. No claims are
// made about the collision probability between hashes with different tweaks.
uint64_t polymur_hash(const uint8_t* buf, size_t len, const PolymurHashParams* p, uint64_t tweak);
```

The tweak allows you to have a different hash function for each hash table
without having to calculate new parameters from another seed. This can prevent
certain worst-case problems when inserting into a second hash table while
iterating over another.

### License

PolymurHash is available under the zlib license, included in `polymur-hash.h`.


## How it works and why it's fast

At its core, PolymurHash is a Carter-Wegman style polynomial hash that treats
the input as a series of coefficients for a polynomial, and then evaluates that
polynomial at a secret key `k` modulo some prime `p`. The result of this
universal hash is then fed into a Murmur-style permutation followed by the
addition of a second secret key `s`. The polynomial part of the hash provides
the universality guarantee, and the Murmur-style bit mixing part provides good
bit avalanching and uniformity over the full 64-bit output. The final addition
of `s` provides pairwise independence and resistance against cryptanalysis by
making the otherwise trivially invertible permutation a lot harder to invert.

Now to make the above fast, a couple of tricks are employed. The prime used is
`p = 2^61 - 1`, a Mersenne prime. By expressing any number `x` as `2^61 a + b`
where `b < 2^61`, we notice that, because `2^61 = 1 (mod p)`, `x` is congruent
mod `p` to just `a + b`. With this we can do efficient reduction, where
`P611 = 2^61 - 1`:

    reduce(x) = (x >> 61) + (x & P611)

This allows us to keep the numbers small very efficiently, although the result
is only congruent to `x` and not necessarily fully reduced below `p`.
Furthermore we also limit `k` during initialization in such a way that we
overflow 64/128 bit numbers less often and thus need to perform the above
reduction less often.
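As an illustration of this reduction, here is a minimal standalone C sketch that reduces a 64 x 64 -> 128 bit product modulo `p = 2^61 - 1`. It assumes a compiler that supports `unsigned __int128` (such as GCC or Clang on 64-bit targets), and `red611` and `mulmod611` are hypothetical helpers for this example, not part of the `polymur-hash.h` API.

```c
#include <stdint.h>

/* The Mersenne prime p = 2^61 - 1. */
#define P611 ((1ULL << 61) - 1)

/* Hypothetical helper: one step of Mersenne reduction. Writing x = 2^61 a + b
   with b < 2^61, we have 2^61 = 1 (mod p), so x = a + b (mod p).
   Requires x < 2^125 so that a = x >> 61 fits in 64 bits. The result is
   congruent to x mod p but is not necessarily below p. */
static uint64_t red611(unsigned __int128 x) {
    return ((uint64_t) x & P611) + (uint64_t) (x >> 61);
}

/* Hypothetical helper: fully reduced (a * b) mod p, for a, b < 2^61. */
static uint64_t mulmod611(uint64_t a, uint64_t b) {
    /* The product is below 2^122, so after one step r < 2^62. */
    uint64_t r = red611((unsigned __int128) a * b);
    /* A second, 64-bit-only step brings r down to at most p. */
    r = (r & P611) + (r >> 61);
    /* One conditional subtraction yields the canonical residue in [0, p). */
    return r >= P611 ? r - P611 : r;
}
```

For example, `red611(P611)` returns `P611` itself, which is congruent to 0 but not reduced, while `mulmod611(P611 - 1, P611 - 1)` returns 1 because `(p - 1)^2 = 1 (mod p)`. The single-step form is what keeps intermediate values small and cheap; the final conditional subtraction is only needed when a canonical residue is required.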

We also use a trick similar to the one found in "Polynomial evaluation and
message authentication" by Daniel J. Bernstein, where we forgo computing an
exact polynomial `m[0]k + m[1]k^2 + m[2]k^3 + ...` and instead compute any
polynomial that is injective. That is, we're good as long as each input maps to
a distinct polynomial. Then we can use the 'pseudo-dot-product' to compute

    (k + m[0])*(k^2 + m[1]) + (k^3 + m[2])*(k^4 + m[3]) + ...

which allows us to process twice as much data per multiplication. It also
allows us to pre-compute a couple of powers of `k` to then use instruction-level
parallelism to further increase throughput. The loop used for large inputs

    m[i] = loadword(buf + 7*i) & ((1 << 56) - 1)
    f = (f + m[6]) * k^7
    f += (k + m[0]) * (k^6 + m[1])
    f += (k^2 + m[2]) * (k^5 + m[3])
    f += (k^3 + m[4]) * (k^4 + m[5])
    f = reduce(f)

processes 49 bytes of input using seven parallel 64-bit additions and binary
ANDs, four parallel 64 x 64 -> 128 bit multiplications, three 128-bit additions
and one 128-bit to 64-bit modular reduction. For small inputs we use custom
injective polynomials that are fast to evaluate, filled with input from
potentially overlapping reads to avoid branches on the input length.


## Resistance against cryptanalysis

PolymurHash has strong collision guarantees for input chosen independently from
its random seed. An interactive attacker, however, can craft its input based on
previously observed hashes. PolymurHash is **not** a cryptographically secure
collision-resistant hash if the attacker can see (or worse, request) hash values
at will. This is not a failure of the underlying hash; for example, the
well-known Poly1305 hash used to secure TLS suffers from the same problem.
It solves this by hiding its hash values from attackers, adding an encrypted
nonce that acts as a one-time pad.

PolymurHash is not intended to be used in a context where an attacker can see
the hash values. Its main intended use case is as a hash function for
DoS-resistant hash tables. However, a clever attacker might still acquire
*some* information about the hash values by using side-channels such as
hash table iteration order, or timing attacks on collisions.

To protect against this, PolymurHash is structured so that it is neither
trivial to invert the hash nor to set up controlled informative experiments
through side-channels. Roughly speaking, every bit of input is first mixed with
the secret key `k` by modular multiplication and addition modulo a prime. This
is a linear process, so to hide the linear structure of the underlying hash we
pass the resulting value through a Murmur-style bit-mixing permutation which is
highly non-linear. Finally, to ensure the permutation is not trivially
invertible, we add the second secret key `s`.

The above process is by no means a strong multi-round cipher and would likely
not hold up to proper cryptanalysis in a standard context. But the underlying
structure is sound (e.g. the ChaCha cipher has the form `mix(secret + input) +
secret` where `mix` is an unkeyed permutation), and I believe that extrapolating
what little information you can get from side-channels to recover `k, s` to
execute a HashDoS attack is difficult.


## No weak keys

Polynomial hashing has one potential issue: weak keys. The multiplicative group
of numbers mod `p = 2^61-1` has subgroups of (much) smaller size. This means
that for some keys `k^(i + d) mod p == k^i mod p` for some small constant `d`.
In other words, you can swap the 7-byte block at index `i` with the one at `i + d`
without changing the hash value for such a weak key.

What is the probability you choose such a weak key if you pick a key at random
from the 2^61 - 2 possible keys? If `d` is a divisor of `p - 1`, then there are
`totient(d)` such keys with the above property. And of course, for the attack to
work we need `d <= n / 7`. Here is a small table showing the probabilities:

    Max length of input    Divisors    Weak key probability
    64 bytes                      8    2^-56.5
    1 kilobyte                   49    2^-50.5
    1 megabyte                  811    2^-37.3
    1 gigabyte                 3420    2^-26.1

These probabilities are very small. Nevertheless, it didn't sit well with me
that the possibility existed of picking a key that has such a flaw. So, the
seeding algorithm for PolymurHash makes sure to select a key that is a
*generator* of the multiplicative group, that is, the smallest positive `d`
with `k^(i + d) mod p == k^i mod p` is the order of the group, `d = p - 1 =
2^61 - 2`.

In other words, PolymurHash does not suffer from weak keys. As a trade-off our
key space is slightly more limited: in combination with the fact we want `k^7 <
2^60 - 2^56` for efficiency reasons we get a total key space of `totient(2^61 -
2) * (2^60 - 2^56) / (2^61 - 1) ~= 2^57.4`. Additionally, initialization is also
slower than simply selecting a random key; on an Apple M1 it takes ~300 cycles
on average. If you feel the need to seed many different hashes, consider looking
at the `tweak` parameter instead to see if it fits your criteria.


# Acknowledgements

I am standing on the shoulders of giants, and in the well-researched field of
(universal) hash functions there are a lot of them. J. Lawrence Carter, Mark N.
Wegman, Ted Krovetz, Phillip Rogaway, Mikkel Thorup, Daniel J.
Bernstein, Daniel Lemire, Martin Dietzfelbinger, Austin Appleby, many names come
to mind. I have read many publications by them, and borrowed ideas from all of
them.

--------------------------------------------------------------------------------
/extras/bounds-check.py:
--------------------------------------------------------------------------------
import math

P611 = 2**61 - 1

def check(n, lim=2**64):
    if n >= lim:
        raise ValueError(f"overflow {n} >= {lim}, log2(n) = {math.log2(n)}, log2(lim) = {math.log2(lim)}")
    return n

def reduce(n):
    return n // 2**61 + n % 2**61

# Maximum value reduce(k) can reach for any k <= n.
def maxreduce(n):
    if n < 2**61: return n
    if n % 2**61 == P611: return (n >> 61) + P611
    return ((n >> 61) - 1) + P611

def u64add(*p):
    return check(sum(p))

def u128add(*p):
    return check(sum(p), 2**128)

def u128mul(a, b):
    return check(a * b, 2**128)

def u64reduce(x):
    return check(maxreduce(x))

# Assume maximal values for each parameter
k = u64reduce(u64reduce(2**64 - 1))
k2 = u64reduce(u64reduce(u128mul(k, k)))
k7 = 2**60 - 2**56 - 1
m = 2**56 - 1
l = 49

def test_0_7():
    s = u128mul(u64add(k, m), u64add(k2, l))
    return u64reduce(s)

def test_8_21():
    k3 = u64reduce(u128mul(k, k2))
    t0 = u128mul(u64add(k2, m), u64add(k7, m))
    t1 = u128mul(u64add(k, m), u64add(k3, l))
    s = u128add(t0, t1)
    return u64reduce(s)

def test_22_49():
    k3 = u64reduce(u128mul(k, k2))
    k4 = u64reduce(u128mul(k2, k2))
    t0 = u64reduce(u128mul(k2 + m, k7 + m))
    t1 = u128mul(k + m, k3 + l)
    t2 = u128mul(k2 + m, k7 + m)
    t3 = u128mul(t0 + m, k4 + m)
    s = u128add(t1, t2, t3)
    return u64reduce(s)

def test_large():
    k3 = u64reduce(u64reduce(u128mul(k, k2)))
    k4 = u64reduce(u64reduce(u128mul(k2, k2)))
    k5 = u64reduce(u64reduce(u128mul(k, k4)))
    k6 = u64reduce(u64reduce(u128mul(k2, k4)))

    max_h_invariant = 2**64 - m - 1
    t0 = u128mul(u64add(k, m), u64add(k6, m))
    t1 = u128mul(u64add(k3, m), u64add(k4, m))
    t2 = u128mul(u64add(k5, m), u64add(k2, m))
    t3 = u128mul(u64add(max_h_invariant, m), k7)
    s = u128add(t0, t1, t2, t3)
    h = check(u64reduce(s), max_h_invariant + 1)

    k14 = u64reduce(u128mul(k7, k7))
    hk14 = u64reduce(u128mul(u64reduce(h), k14))
    return u64reduce(hk14)

max07 = test_0_7()
max821 = test_8_21()
max2249 = test_22_49()
maxlarge = test_large()

check(u64add(maxlarge, max07))
check(u64add(maxlarge, max821))
check(u64add(maxlarge, max2249))


# Also ensure our input loading scheme covers all lengths properly.
def load(i, method):
    if method == "mask":
        return range(i, i + 7)
    elif method == "shift":
        return range(i + 1, i + 8)
    else:
        raise RuntimeError("unknown method")

def covers_8_21(l):
    ranges = [load(0, "mask"), load((l-7)//2, "mask"), load(l - 8, "shift")]
    return set.union(*(set(r) for r in ranges)) == set(range(l))

def covers_22_49(l):
    ranges = [load(0, "mask"), load((l-7)//2, "mask"), load(l - 8, "shift")]
    ranges += [load(7, "mask"), load(14, "mask"), load(l - 21, "mask"), load(l - 14, "mask")]
    return set.union(*(set(r) for r in ranges)) == set(range(l))

for l in range(8, 21+1): assert covers_8_21(l)
for l in range(22, 49+1): assert covers_22_49(l)

--------------------------------------------------------------------------------
/extras/universality-proof.md:
--------------------------------------------------------------------------------
# Proof of almost universality

The proof below assumes that our secret key `k` was chosen uniformly at random
from the set `K` of all possible keys, which in [the
readme](../README.md#no-weak-keys) is described as containing roughly 2^57.4
keys. Furthermore, it assumes this random choice is independent of our input.

The first step in PolymurHash is to injectively map the input string onto a
series of 56-bit (7-byte) integers `m[0]`, `m[1]`, `m[2]`, etc. How this is done
is rather specific to the size of the input for maximum speed, with some input
bits sometimes occurring multiple times in the `m`s due to overlapping reads,
but ultimately it is always possible to unambiguously reconstruct the original
input from `m[0]`, `m[1]`, etc., given the length.

We then injectively map these `m[i]` to a polynomial `f_m(k)` in variable `k`.
That is, if two polynomials `f_m = f_m'` are equal, their corresponding
original hash function inputs `m, m'` must also be equal. We split the input
into blocks of 49 bytes, where only the last block may be shorter: it has a
non-zero length <= 49. In total we have `1 + floor((n - 1) / 49)` blocks.

All except the last block are encoded in the same way. Our polynomial `f`
is initially `0` and we build it up in block-Horner form. For each full block
consisting of seven `m[i]` we update `f` as follows:

    f = k^7 * (f + m[6])
    f += (k + m[0]) * (k^6 + m[1])
    f += (k^2 + m[2]) * (k^5 + m[3])
    f += (k^3 + m[4]) * (k^4 + m[5])

The result is that we multiply all existing terms in `f` by `k^7` before adding

    (m[6] + 3)*k^7 + m[0]*k^6 + m[2]*k^5 + m[4]*k^4 + m[5]*k^3 + m[3]*k^2 + m[1]*k
    + (m[0]*m[1] + m[2]*m[3] + m[4]*m[5])

except we do it using just four multiplications. This process is clearly
injective, as we can reverse it by simply reading off `m[0]` through `m[6]` from
the top 7 exponents, before subtracting the above expression from the polynomial
to continue decoding.
Later additions to the polynomial do not interfere
with this as the very next step shifts all exponents up by `7`. Finally,
the coefficient of the top exponent of each block, `m[6] + 3`, is always
non-zero mod `p` as `m[6] < 2^56`, so the total number of blocks in the
polynomial is always unambiguous.

To finish off the process before handling the final block we multiply `f` by
`k^14` to shift existing terms out of harm's way. Then we add, depending on the
size, a different injective polynomial of maximum degree < 14 to encode the
final block. These polynomials are:

    length 0..7
        f += (k + m[0]) * (k^2 + l)

        k^3 + m[0]*k^2 + l*k + l*m[0]

    length 8..21
        f += (k^2 + m[0]) * (k^7 + m[1])
        f += (k + m[2]) * (k^3 + l)

        k^9 + m[0]*k^7 + k^4 + m[2]*k^3 + m[1]*k^2 + l*k + (l*m[2] + m[0]*m[1])

    length 22..49
        t = (k^2 + m[0]) * (k^7 + m[1])
        f += (k + m[2]) * (k^3 + l)
        f += (k^2 + m[3]) * (k^7 + m[4])
        f += (t + m[5]) * (k^4 + m[6])

        k^13 + m[0]*k^11 + (m[6] + 1)*k^9 + (m[0]*m[6] + m[3])*k^7 + m[1]*k^6 +
        (m[0]*m[1] + m[5] + 1)*k^4 + m[2]*k^3 + (m[1]*m[6] + m[4])*k^2 + l*k +
        (l*m[2] + m[0]*m[1]*m[6] + m[3]*m[4] + m[5]*m[6])

Now that last polynomial might not look injective, but it is if you decode it
in stages: first read off `m[0]`, `m[6]`, `m[1]`, `m[2]` directly, after
which you can subtract them in the other terms, e.g. `m[0]*m[6] + m[3]` is now
decodable, etc.

Note also that `l`, the length of the final block, can always be read off
directly and unambiguously from the coefficient of `k^1`. This, together with
the fact that the number of blocks is also unambiguous, makes the total length
of the input string unambiguously decodable.

### Collision bound

Now that we have shown PolymurHash constructs an injective polynomial `f_m(k)`
from our input, we can bound our collision probability.
We evaluate our
polynomial using our secret key `k`, modulo `p = 2^61 - 1`, a prime, giving our
hash output `PH(m) = f_m(k) mod p`. Note that due to the injectivity of the
polynomial we do not introduce any extraneous collisions: if two polynomials are
equal, then so must be their corresponding messages. So any collisions are due
to the evaluation at `k mod p`.

For any two distinct `m, m'` of at most `n` bytes chosen independently of our
secret key `k`, the probability that `PH(m) = PH(m')` is the same as
`Pr[PH(m) - PH(m') = 0]`. But `x = 0` implies `x mod p = 0` for any `p`,
thus if we can bound `Pr[(PH(m) - PH(m')) mod p = 0]` we also bound the original
collision probability. Since `(a mod p - b mod p) mod p = (a - b) mod p`,
our collision probability is upper bounded by

    Pr[(f_m(k) - f_m'(k)) mod p = 0]

Since `f_m` and `f_m'` are both polynomials in `k`, their difference is as well.
This polynomial is non-zero: when `m` and `m'` have the same length, the
polynomials differ in the block where `m` and `m'` differ, and when they have
different lengths, they differ either in the topmost exponent encoding the
number of blocks, or in the coefficient of `k` encoding the final block length.

Since the polynomial `f_m(k) - f_m'(k)` is non-zero, we can ask how many roots it
has. Over the finite field of integers mod `p`, a non-zero polynomial of degree
`d` has at most `d` roots. The maximum degree of our polynomial in question is
`14 + n / 7`. Thus it has at most that many roots when reduced mod `p`.

But `m` and `m'` were chosen independently of our secret key `k`. So the
probability that the polynomial `f_m(k) - f_m'(k)` happens to have `k` as its
root mod `p` is at most `(14 + n/7) / |K|` where `K` is the set of all possible
keys.
Ignoring the negligible constant 14, and given that we have roughly
`2^57.4` possible keys, this gives us an overall collision bound of

    Pr[PH(m) = PH(m')] <= n/7 * 2^-57.4
    Pr[PH(m) = PH(m')] <= n * 2^-60.2

In reality PolymurHash still performs a permutation after the polynomial
evaluation (the mur in polymur) and adds `s` mod 2^64, but both operations are
invertible and never introduce collisions, so this does not affect our above
argument on the collision probability.

## Proof of almost pairwise independence

The proof below assumes that, in addition to `k` being chosen at random, `s`
is chosen uniformly at random from all 64-bit integers. Furthermore, it assumes
this random choice is independent of our input.

This proof is almost identical to the one above, but takes into account the full
hash. Our full hash is `H(m) = (mix(PH(m)) + s) mod 2^64`, where `s < 2^64` is
another independently chosen secret key and `mix` is a 64-bit permutation
independent of `k, s`. Let `imix` be the inverse of that permutation.

This time we need to prove that for any `m != m'` chosen independently of
`s, k`, and for any hash outcomes `x, y`, the probability

    Pr[H(m) = x && H(m') = y]

is a small bounded quantity. Let us rewrite some terms:

    H(m) = x <=> mix(PH(m)) = (x - s) % 2^64 <=> PH(m) = imix((x - s) % 2^64)

For brevity write `ixs = imix((x - s) % 2^64)` and similarly for `iys`.
Then our probability can be rewritten as

    Pr[H(m) = x && H(m') = y] =
    Pr[H(m) - H(m') = x - y && H(m') = y] =
    Pr[PH(m) - PH(m') - ixs + iys = 0 && mix(PH(m')) = (y - s) % 2^64] =
    Pr[PH(m) - PH(m') - ixs + iys = 0 && s = (mix(PH(m')) - y) % 2^64]

Using the same trick as before, we find once again that mod `p` the left-hand
equation, irrespective of `s`, is a non-zero polynomial with maximum degree
`n/7`. Thus it has at most `n/7` roots mod `p`. The right-hand equation tells us
that given a certain value of `k` there is a unique solution for `s`. Thus in
total we have at most `n/7` solutions that solve both equations at once. But `k,
s` were picked uniformly at random from a total key space of `2^57.4 * 2^64`
values, thus giving us our bound

    Pr[H(m) = x && H(m') = y] <= n/7 * 2^-121.4
    Pr[H(m) = x && H(m') = y] <= n * 2^-124.2

Of course, for the above proofs to be valid, our hashing algorithm needs to
faithfully implement the construction above. While I don't have a formal
verification for PolymurHash, at the very least I have a worst-case simulation
in [`bounds-check.py`](bounds-check.py) that shows we reduce `mod p` wherever
necessary and that no overflows occur.

--------------------------------------------------------------------------------
/polymur-hash.h:
--------------------------------------------------------------------------------
/*
PolymurHash version 2.0

Copyright (c) 2023 Orson Peters

This software is provided 'as-is', without any express or implied warranty. In
no event will the authors be held liable for any damages arising from the use of
this software.

Permission is granted to anyone to use this software for any purpose, including
commercial applications, and to alter it and redistribute it freely, subject to
the following restrictions:

1. The origin of this software must not be misrepresented; you must not claim
   that you wrote the original software. If you use this software in a product,
   an acknowledgment in the product documentation would be appreciated but is
   not required.

2. Altered source versions must be plainly marked as such, and must not be
   misrepresented as being the original software.

3. This notice may not be removed or altered from any source distribution.
*/

#ifndef POLYMUR_HASH_H
#define POLYMUR_HASH_H

#ifdef __cplusplus
extern "C" {
#endif

#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(_MSC_VER)
    #include <intrin.h>
    #ifdef _M_X64
        #pragma intrinsic(_umul128)
    #endif
#endif

// ---------- PolymurHash public API ----------
typedef struct {
    uint64_t k, k2, k7, s;
} PolymurHashParams;

// Expands a 64-bit or 128-bit seed to a set of parameters for hash evaluation.
static inline void polymur_init_params(PolymurHashParams* p, uint64_t k_seed, uint64_t s_seed);
static inline void polymur_init_params_from_seed(PolymurHashParams* p, uint64_t seed);

// Computes the full hash of buf. The tweak is added to the hash before final
// mixing, allowing different outputs much faster than re-seeding. No claims are
// made about the collision probability between hashes with different tweaks.
static inline uint64_t polymur_hash(const uint8_t* buf, size_t len, const PolymurHashParams* p, uint64_t tweak);


// ---------- Cross-platform compatibility ----------
#if (defined(__GNUC__) || defined(__clang__) || defined(__INTEL_COMPILER))
    #define POLYMUR_LIKELY(x) (__builtin_expect(!!(x), 1))
    #define POLYMUR_UNLIKELY(x) (__builtin_expect(!!(x), 0))
#else
    #define POLYMUR_LIKELY(x) (!!(x))
    #define POLYMUR_UNLIKELY(x) (!!(x))
#endif

// No #ifdefs needed, modern compilers all optimize this away.
static inline int polymur_is_little_endian(void) {
    uint32_t v = 1;
    return *(char*) &v;
}

static inline uint32_t polymur_bswap32(uint32_t v) {
    return ((v >> 24) & 0x000000ffUL) | ((v >> 8) & 0x0000ff00UL) | ((v << 8) & 0x00ff0000UL) | ((v << 24) & 0xff000000UL);
}

static inline uint64_t polymur_bswap64(uint64_t v) {
    return (((uint64_t) polymur_bswap32(v)) << 32) | polymur_bswap32(v >> 32);
}

static inline uint32_t polymur_load_le_u32(const uint8_t* p) {
    uint32_t v = 0; memcpy(&v, p, 4);
    return polymur_is_little_endian() ? v : polymur_bswap32(v);
}

static inline uint64_t polymur_load_le_u64(const uint8_t* p) {
    uint64_t v = 0; memcpy(&v, p, 8);
    return polymur_is_little_endian() ? v : polymur_bswap64(v);
}

// Loads 0 to 8 bytes from buf with length len as a 64-bit little-endian integer.
static inline uint64_t polymur_load_le_u64_0_8(const uint8_t* buf, size_t len) {
    if (len < 4) {
        if (len == 0) return 0;
        uint64_t v = buf[0];
        v |= buf[len / 2] << 8 * (len / 2);
        v |= buf[len - 1] << 8 * (len - 1);
        return v;
    }

    uint64_t lo = polymur_load_le_u32(buf);
    uint64_t hi = polymur_load_le_u32(buf + len - 4);
    return lo | (hi << 8 * (len - 4));
}


// ---------- Integer arithmetic ----------
#define POLYMUR_P611 ((1ULL << 61) - 1)

#if defined(__SIZEOF_INT128__)
    #define polymur_u128_t __uint128_t

    static inline polymur_u128_t polymur_add128(polymur_u128_t a, polymur_u128_t b) {
        return a + b;
    }

    static inline polymur_u128_t polymur_mul128(uint64_t a, uint64_t b) {
        return ((polymur_u128_t) a) * ((polymur_u128_t) b);
    }

    static inline uint64_t polymur_red611(polymur_u128_t x) {
        return (((uint64_t) x) & POLYMUR_P611) + ((uint64_t) (x >> 61));
    }
#else
    typedef struct {
        uint64_t lo;
        uint64_t hi;
    } polymur_u128_t;

    static inline polymur_u128_t polymur_add128(polymur_u128_t a, polymur_u128_t b) {
        a.lo += b.lo;
        a.hi += b.hi + (a.lo < b.lo);
        return a;
    }

    static inline polymur_u128_t polymur_mul128(uint64_t a, uint64_t b) {
        polymur_u128_t ret;
    #if defined(_MSC_VER) && defined(_M_X64)
        ret.lo = _umul128(a, b, &ret.hi);
    #elif defined(_MSC_VER) && defined(_M_ARM64)
        ret.lo = a * b;
        ret.hi = __umulh(a, b);
    #else
        uint64_t lo_lo = (a & 0xffffffffULL) * (b & 0xffffffffULL);
        uint64_t hi_lo = (a >> 32) * (b & 0xffffffffULL);
        uint64_t lo_hi = (a & 0xffffffffULL) * (b >> 32);
        uint64_t hi_hi = (a >> 32) * (b >> 32);
        uint64_t cross = (lo_lo >> 32) + (hi_lo & 0xffffffffULL) + lo_hi;
        ret.hi = (hi_lo >> 32) + (cross >> 32) + hi_hi;
        ret.lo = (cross << 32) |
                 (lo_lo & 0xffffffffULL);
    #endif
        return ret;
    }

    static inline uint64_t polymur_red611(polymur_u128_t x) {
    #if defined(_MSC_VER) && defined(_M_X64)
        return (((uint64_t) x.lo) & POLYMUR_P611) + __shiftright128(x.lo, x.hi, 61);
    #else
        return (x.lo & POLYMUR_P611) + ((x.lo >> 61) | (x.hi << 3));
    #endif
    }
#endif

static inline uint64_t polymur_extrared611(uint64_t x) {
    return (x & POLYMUR_P611) + (x >> 61);
}


// ---------- Hash function ----------
#define POLYMUR_ARBITRARY1 0x6a09e667f3bcc908ULL // Completely arbitrary, these
#define POLYMUR_ARBITRARY2 0xbb67ae8584caa73bULL // are taken from SHA-2, and
#define POLYMUR_ARBITRARY3 0x3c6ef372fe94f82bULL // are the fractional bits of
#define POLYMUR_ARBITRARY4 0xa54ff53a5f1d36f1ULL // sqrt(p), p = 2, 3, 5, 7.

static inline uint64_t polymur_mix(uint64_t x) {
    // Mixing function from https://jonkagstrom.com/mx3/mx3_rev2.html.
    x ^= x >> 32;
    x *= 0xe9846af9b1a615dULL;
    x ^= x >> 32;
    x *= 0xe9846af9b1a615dULL;
    x ^= x >> 28;
    return x;
}

static inline void polymur_init_params(PolymurHashParams* p, uint64_t k_seed, uint64_t s_seed) {
    p->s = s_seed ^ POLYMUR_ARBITRARY1; // People love to pass zero.

    // POLYMUR_POW37[i] = 37^(2^i) mod (2^61 - 1)
    // Could be replaced by a 512 byte LUT, costs ~400 byte overhead but 2x
    // faster seeding. However, seeding is rather rare, so I chose not to.
    uint64_t POLYMUR_POW37[64];
    POLYMUR_POW37[0] = 37; POLYMUR_POW37[32] = 559096694736811184ULL;
    for (int i = 0; i < 31; ++i) {
        POLYMUR_POW37[i+ 1] = polymur_extrared611(polymur_red611(polymur_mul128(POLYMUR_POW37[i], POLYMUR_POW37[i])));
        POLYMUR_POW37[i+33] = polymur_extrared611(polymur_red611(polymur_mul128(POLYMUR_POW37[i+32], POLYMUR_POW37[i+32])));
    }

    while (1) {
        // Choose a random exponent coprime to 2^61 - 2. ~35.3% success rate.
        k_seed += POLYMUR_ARBITRARY2;
        uint64_t e = (k_seed >> 3) | 1; // e < 2^61, odd.
        if (e % 3 == 0) continue;
        if (!(e % 5 && e % 7)) continue;
        if (!(e % 11 && e % 13 && e % 31)) continue;
        if (!(e % 41 && e % 61 && e % 151 && e % 331 && e % 1321)) continue;

        // Compute k = 37^e mod (2^61 - 1). Since e is coprime with the order of
        // the multiplicative group mod 2^61 - 1 and 37 is a generator, this
        // results in another generator of the group.
        uint64_t ka = 1, kb = 1;
        for (int i = 0; e; i += 2, e >>= 2) {
            if (e & 1) ka = polymur_extrared611(polymur_red611(polymur_mul128(ka, POLYMUR_POW37[i])));
            if (e & 2) kb = polymur_extrared611(polymur_red611(polymur_mul128(kb, POLYMUR_POW37[i+1])));
        }
        uint64_t k = polymur_extrared611(polymur_red611(polymur_mul128(ka, kb)));

        // ~46.875% success rate. Bound on k^7 needed for efficient reduction.
        p->k = polymur_extrared611(k);
        p->k2 = polymur_extrared611(polymur_red611(polymur_mul128(p->k, p->k)));
        uint64_t k3 = polymur_red611(polymur_mul128(p->k, p->k2));
        uint64_t k4 = polymur_red611(polymur_mul128(p->k2, p->k2));
        p->k7 = polymur_extrared611(polymur_red611(polymur_mul128(k3, k4)));
        if (p->k7 < (1ULL << 60) - (1ULL << 56)) break;
        // Our key space is log2(totient(2^61 - 2) * (2^60-2^56)/2^61) ~= 57.4 bits.
    }
}

static inline void polymur_init_params_from_seed(PolymurHashParams* p, uint64_t seed) {
    polymur_init_params(p, polymur_mix(seed + POLYMUR_ARBITRARY3), polymur_mix(seed + POLYMUR_ARBITRARY4));
}

static inline uint64_t polymur_hash_poly611(const uint8_t* buf, size_t len, const PolymurHashParams* p, uint64_t tweak) {
    uint64_t m[7];
    uint64_t poly_acc = tweak;

    if (POLYMUR_LIKELY(len <= 7)) {
        m[0] = polymur_load_le_u64_0_8(buf, len);
        return poly_acc + polymur_red611(polymur_mul128(p->k + m[0], p->k2 + len));
    }

    uint64_t k3 = polymur_red611(polymur_mul128( p->k, p->k2));
    uint64_t k4 = polymur_red611(polymur_mul128(p->k2, p->k2));
    if (POLYMUR_UNLIKELY(len >= 50)) {
        const uint64_t k5 = polymur_extrared611(polymur_red611(polymur_mul128(p->k, k4)));
        const uint64_t k6 = polymur_extrared611(polymur_red611(polymur_mul128(p->k2, k4)));
        k3 = polymur_extrared611(k3);
        k4 = polymur_extrared611(k4);
        uint64_t h = 0;
        do {
            for (int i = 0; i < 7; ++i) m[i] = polymur_load_le_u64(buf + 7*i) & 0x00ffffffffffffffULL;
            polymur_u128_t t0 = polymur_mul128(p->k + m[0], k6 + m[1]);
            polymur_u128_t t1 = polymur_mul128(p->k2 + m[2], k5 + m[3]);
            polymur_u128_t t2 = polymur_mul128( k3 + m[4], k4 + m[5]);
            polymur_u128_t t3 = polymur_mul128( h + m[6], p->k7);
            polymur_u128_t s = polymur_add128(polymur_add128(t0, t1), polymur_add128(t2, t3));
            h = polymur_red611(s);
            len -= 49;
            buf += 49;
        } while (len >= 50);
        const uint64_t k14 = polymur_red611(polymur_mul128(p->k7, p->k7));
        uint64_t hk14 = polymur_red611(polymur_mul128(polymur_extrared611(h), k14));
        poly_acc += polymur_extrared611(hk14);
    }

    if (POLYMUR_LIKELY(len >= 8)) {
        m[0] = polymur_load_le_u64(buf) & 0x00ffffffffffffffULL;
        m[1] = polymur_load_le_u64(buf + (len - 7) / 2) & 0x00ffffffffffffffULL;
        m[2] = polymur_load_le_u64(buf + len - 8) >> 8;
        polymur_u128_t t0 = polymur_mul128(p->k2 + m[0], p->k7 + m[1]);
        polymur_u128_t t1 = polymur_mul128(p->k + m[2], k3 + len);
        if (POLYMUR_LIKELY(len <= 21)) return poly_acc + polymur_red611(polymur_add128(t0, t1));
        m[3] = polymur_load_le_u64(buf + 7) & 0x00ffffffffffffffULL;
        m[4] = polymur_load_le_u64(buf + 14) & 0x00ffffffffffffffULL;
        m[5] = polymur_load_le_u64(buf + len - 21) & 0x00ffffffffffffffULL;
        m[6] = polymur_load_le_u64(buf + len - 14) & 0x00ffffffffffffffULL;
        uint64_t t0r = polymur_red611(t0);
        polymur_u128_t t2 = polymur_mul128(p->k2 + m[3], p->k7 + m[4]);
        polymur_u128_t t3 = polymur_mul128( t0r + m[5], k4 + m[6]);
        polymur_u128_t s = polymur_add128(polymur_add128(t1, t2), t3);
        return poly_acc + polymur_red611(s);
    }

    m[0] = polymur_load_le_u64_0_8(buf, len);
    return poly_acc + polymur_red611(polymur_mul128(p->k + m[0], p->k2 + len));
}

static inline uint64_t polymur_hash(const uint8_t* buf, size_t len, const PolymurHashParams* p, uint64_t tweak) {
    uint64_t h = polymur_hash_poly611(buf, len, p, tweak);
    return polymur_mix(h) + p->s;
}

#ifdef __cplusplus
}
#endif

#endif

--------------------------------------------------------------------------------
/test.c:
--------------------------------------------------------------------------------
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

#include "polymur-hash.h"

static const char* const POLYMUR_TEST_STRINGS[] = {
    "",
    "i",
    "es",
    "vca",
    "bdxa",
    "bbbmc",
    "vn5719",
    "lpvif62",
    "1fcjgark",
    "1jlz2nr6w",
    "g4q6ebxvod",
    "ehiybujo2n1",
    "6u2990ulzi7m",
    "c3xcb4ew8v678",
    "bhcaqrm221pea1",
    "oyl3iqxqr85eeve",
    "b41kacwmnim8rup5",
    "563ug64z3zdtlj438",
    "3spvl57qfg4udw2l3s",
    "297r1bqesqdhb3jd50g",
    "kbc5btot9x1fqslddmha",
    "r0vxw6kk8tc6pk0oxnr6m",
    "wkgmmma9icgky3bnj5bjir",
    "5eslfmq1w3i7wvd89ls7nvf",
    "40ytv0ye8cq49no6ys1pdrot",
    "p3mbto6bl36g3cx9sstyiugsd",
    "m0ylpn0wh5krbebs0j5trzgveb",
    "qsy8gpheo76vb8g0ivaojk1zgk4",
    "dwqf8tpad4k3x69sah7pstrg8zxx",
    "ls3zrsjf1o3cr5sjy7dzp98198i3y",
    "xvhvx3wbzer9b7kr4jqg2ok9e3mv5d",
    "yapzlwab361wvh0xf1rydn5ynqx8cz0",
    "nj56v1p9dc7qdmcn2wksfg5kic1uegm2",
    "hlebeoafjqtqxfwd9ge94z3ofk88c4a5x",
    "6li8qyu0n8nwoggm4hqzqdamem5barzjyw",
    "wj7sp7dhpfapsd8w2nzn8s7xtnro9g45x7t",
    "ahio6so1x30oziw54ux5iojjdfvkwpw2v14d",
    "wm6yacnl6k3kj3c6i1jeajuwmquv9yujms0wq",
    "kzs6xfhmc4ifmstnekcze4y1l83ddvxust2r0o",
    "ckamexupx7cmsuza9nssw6n45e7go4s3osr1903",
    "nob5bj9tok346dg62jbfjfrhg5l6itsno2hkhfru",
    "vgo0ko42n5jvrvnv3ddpwg8h7gkqoxbllv2fdy0no",
    "dgs47djqzq3czo0i0v1u3d3x72vtvi3w2tsf9shx6k",
    "8vjrw7jz90kf969txb5qrh0u5332zf5epsp8aes4aqh",
    "3ni9vtqiq6vnxipfa2wag8vfwq2nyce1kgq5nj3razx9",
    "u29xjkod6rtu5j5tlwkydt9khih6o2do84q6ukwlr00xf",
    "yxxubvyxuusw827qctqr6tmm69rij5ex2zk1etps8qh61e",
    "p7lh4mvadnp6uw0vt7bnzcbv1wjswuuc6gjmu684yznx8lp",
    "8c27lotvnab6ra8pq9aon0w30ydyulesinew3akqrhhmm39e",
    "ttipbm97gpk7tiog1doncalwgpb7alk16dapga2ekzjt59pv6",
    "mbbtplseab2mgtgh8uwlhbmdrwxae3tc2mtf98bwuhmz4bfjnf",
    "shnjeydnj8awrkz3rd69wqqd9srie4eo6gc6ylhz2ouv4t4qbar",
    "lckl12agnpr6q5053h9v38lyk71emkvwdzrv0ic3a4a4pn3w3o4x",
    "7927wqjo5jiecfk0bbtt6065j5jl7x0vv1mcxxxl0j1oatrom44zp",
    "bajk3ff026vx0u7o5d7ry7w7n07sqdy4urv4psr79jp13e0mxsks1r",
    "en6j5o90gmgj7ssbz6jv3kzdsbzczu518c3zmezkp02rtvo1s88n9pu",
    "58fkwyf44tjnrytgplb5qfbvlwtav3zutxowoor2mklkr2up4nzpefos",
    "cep02qfl6swv1j3mwy5kprm4p8drszchufrkyr5ejbtzgu5cti6fqab5c",
    "lr5q0p1dljga8h4vruy1doa79hntwbdyolnh1fbe3phfk7f5rgs4815foj",
    "hmnjq6h1sslivjzmbxbpqba29f6kvbea6n6c4sanm40nzmrxt8hm61ooq3e",
    "ae43xxu1mqrbynmctit7m4wf02o0kf2vvw1l3y51n4cu5v5ba4dia67wf0bo",
    "qz9ye2ur849obmm23d5tnfc3xdaeajil0gm2pz8z9psedj50h5hcwbcn8n2lo",
    "w3xar1pzaff7fhyw6cshdgechm2pj1ebwrbkdct5xfbmxskr3937dodvky62i8",
    "ypy5k197quc9ypqoj9kle2eky307jnnd7tu52hqhn6mo7jj1fvmi42kkgq40iy6",
    "k1bp6qwiul8fnd6rfe42ge6gskk0jkr9fjgmuujey3kn8ie88h9qguw2gboo7i80",
    "begb64jkzfujx7ch3ain1iixidnbhcbcglcuf7nys8eansnkewtiye9xv7s2ksuev",
    "vf5d8vdjtwp5vo1ocb274nkl6h8vg97m4v5htfwv02tj9u68vdnteeim6q0zllxflj",
    "dcg9osulcdw9sqaue4cfz6k990vpstoxmvwbxzhzichkhdujy36v556u7oxug51gdup",
    "1rtgdtibcaos4ebzrbl1fkjahtbel6fyqipuu8lxfrwnggjr8wgoscfxp46wv9wjk315",
    "r27qj342zj4anpkqpr9yqo7udnldwiqqpq667zzjgw33yia3wt2p6t221onq4pvfaywbj",
    "2yzxskad06pt9zvjmiobfz12a3q6wqgpj4450rpxj0jvjk3cx39qo6cbpukxqsy6idqd40",
    "813zultj26k3gn6gibolpuozgaxu8exfatf4iqqugelcf6k8dnzvsjb9s25g3gyess2uscc",
    "i4p0jkxf3ajc02x330y3tg8l521fzootabn53ovru20ph3n17hfygaz1axs61jxipz6jac5z",
    "5bk748kkvww7toeyeueukk2qyin2o5ohnvj7l1cqs9zgy92n6ujxg6sxdjw81hfd29nzrb4kh",
    "uvhy62avo1wqms1rrtefth84xhnv1a59aez6r4xq0pla74036o3vznihxexwydnfjojmk6ipl6",
    "0t0dlfopg27cqv1xp4qfgwdlivvgqz204hkh5ianbb4abgk0yjolcwhhitrcksha5s6otmps0hd",
    "vrbhcwrmn5xbq8f518ntvmaeg89n7nh1uxebfsmd7smoog3k2w12zv0px32pf4b78er5f3pgy7b9",
    "x5bmnefocbtxm8avt22ekuy5hcdyxh86is5fnns9ycfm7o25x9frwv9kfv2ohyd3txlc8zlg5rjjx",
    "ttfrgnfvvj552vjymrqqd1yjlyff7vkffprnvu3co4vuah8y0s56tziih3yowm64ja810gb1sgk0um",
    "a66t43i9vrr3cmg5qf52akuk8bxl4rm3i86rm7h5brjou9k2egrzy3h19hh8kqr2queyvrwb673qikj",
    "mfuwhbvd88n21obpmwx273mmeqiz98qfmb04z0ute54kc1d9bbdyfbx2sc4em6t4pfektm05qs7bgc9z",
    "x8wbm0kjpyua8wpgsejgxc06geitm1c0bxihvcwnxnif63dj7cygzk7led0z49ol6zf2xwcmf99n4osip",
    "fvba43myr0ozab882crozdz0zx4lfl2h7xe2phfqte97g58fake2fzi87mpftz9qdmt45gm79xl43k1hji",
    "wnr0pz08rm3j65b7pl116l59pxy6prnydf9xod1qdi3hp3lod2vuzy1v7gt2g72sejaomn5u53daxjrr9xk",
    "bwo7nfqda6w56voyvg1nr7vkq61zi7gy0aggn6pic3gup7uy18zzsc7y5yz3ptvp5cd53i95dj521k4n6n7t",
    "mromebynw459uydhhgcgrate6hnst5srng9knfjc02vtg1vywok3rdbw935pf1qwghnh0nibyb60l9elkmajg",
    "59dcjawsd4kjjcceco3hphizua88l0qtrfd000iam3rnb4tmy6kzf5bhkc9ud1hsg3dd53tlsxarcl0n59081h",
    "odgdgfkwcpz0zjcwsz9is5h4nhebzht7fqa1b4g8e2snb6bn5hu3ixyd2pk1ey5g3eab0m3aoknfi9ctkpxz07j",
    "0ljqm7r10ns2pjo8x69oi0zuqss9y7301yd6rmex8djwrbqmvh2mbwscgj9pmrgul5ao0tvpefpe5a9cac5xbdwb",
    "b449ak3ihp8tdrbteffru5vboeh1z63c55at3qz70p13d2fim50q8i06zjyb53i4gqzunx6rsl07jxjd9g77me1ww",
    "oqzf6c40snvrjz4v0f4h8p0ozjfy1y4xihxwaz16vbxf3qsa805xodw8z5xq3hb7dag8fnxtlsc62150kk253i3buj",
    "2eicp9a5aq2uycq55y7rsixlg3pfk7gyin65fghf03kks18dixbckxmbv5xnhyrir7qm8maz4rk2bi3zs9chidlhehf",
    "7k1wyjs6fxss4e0ywqfurgop6f7y7e97f3mr5hnb0hlhqkqbqvi1e1z3qfyxc3te75r67fc4h9li06rl9zadg3v9zmz6",
    "k3e403zdtia8i0gpodm00yaujr1w474bh3985o3csbfjp3dll4t98i5lesloo6rqjec2aycb3ttx1t6lg0cl9hrjkgheb",
    "2fv8zdl1ljmpjbvaan0nt99tra48yjmc5pv91n1c5l8qp5pv77zwsx75ouay7bmgy2tjc1aazyu5zj7oimesavv9n2h7ky",
    "ghxs7uejpzpbxjsdmc2w9fabrg4j4pwwbn0wjxux2luk1k0ciror4gcvww18e610u2wpczuwrcphy2xr1129vweqhhgitge",
    "vk7wfi9hhi0j9n2grs8rxgq68kw54dbdviuxnvtwgz77h0qkbzqw7pgm7zgn21cxlxnyzigeyz2rzrj3awloq86tqe60e070",
    "d1aot9216s547uk1rg651iscb1bjpgth5j4f6arx1902npcykk8niz3ffpbed47idgzvt4u59fyi5e0e2afpjb5gjk4rysn8j",
    "2jef2xl4o9yub0z6jnxu8gm87g9iv9zdtu9yolvxtensjrtgplnmnuhz43nsxztk8s936k6eruckkiwc5hnch4qdzft093986x",
    "oo70ed77jci4bgodhnyf37axrx4f8gf8qs94f4l9xi9h0jkdl2ozoi2p7q7qu1945l21dzj6rhvqearzrmblfo3ljjldj0m9fue",
    0
};

static const uint64_t POLYMUR_REFERENCE_VALUES[] = {
    0x1a6ef9f9d6c576fbULL, 0xd16d059771c65e13ULL, 0x5ee4e0c09f562f87ULL, 0x535b5311db007b0bULL,
    0xd17124f14bd16b5dULL, 0xe84c87105c5b5cadULL, 0xb16ce684b89df9c0ULL, 0x656525cace200667ULL,
    0x92b460794885d16dULL, 0xe6cc0fd9725b46b9ULL, 0xc875ade1929bc93dULL, 0x68a2686ced37268aULL,
    0x1d1809fd7e7e14efULL, 0x699b8f31fc40c137ULL, 0xd10dca2605654d2dULL, 0xd6bc75cb729f18d7ULL,
    0xfe0c617e7cb1bffeULL, 0xf5f14c731c1b9a22ULL, 0x7a0382228d248631ULL, 0x6c3a5f49d8a48bc0ULL,
    0x3606ebe637bb4ebcULL, 0xeb4854d75431ad1dULL, 0xfa8ff1a34793ebb0ULL, 0x7e46ad8e2338cc38ULL,
    0xf8ff088ada3154b4ULL, 0x706669bf0925914fULL, 0x70fc5fbcd3485aceULL, 0x96fd279baed2f2abULL,
    0x6403a64c68d7bf68ULL, 0x3f8f532e1df472e5ULL, 0xbfc49c083515596fULL, 0xd678a4b338fbf03bULL,
    0x127142a2f38b70a1ULL, 0x8a1a56fbb85b71f6ULL, 0x961d22b14e6f1932ULL, 0xa166b0326c942c30ULL,
    0x0f3d837dddb86ae2ULL, 0x0f8164504b4ea8b1ULL, 0xe4f6475d5a739af4ULL, 0xbf535ad625c0d51fULL,
    0x47f10a5a13be50adULL, 0x3dc5ce9c148969b3ULL, 0x8dc071fb4df8e144ULL, 0x9d0a83586cbed3b8ULL,
    0xc4379e22f2809b99ULL, 0x42010c7dd7657650ULL, 0xcc31a6fbcdab8be8ULL, 0x7bad06c38400138aULL,
    0x0178b41584eb483dULL, 0x78afc38d52514efcULL, 0x65a57c4e59288dc7ULL, 0x86e7cc3e273e4e47ULL,
    0xeb99661fb41a6bd2ULL, 0xea0979aa6cd70febULL, 0xa64a347c0b8e007bULL, 0x3692969270fe8fa4ULL,
    0x17640c6052e26555ULL, 0xdf9e0fd276291357ULL, 0x64cca6ebf4580720ULL, 0xf82b33f6399c3f49ULL,
    0xbe3ccb7526561379ULL, 0x8c796fce8509c043ULL, 0x9849fded8c92ce51ULL, 0xa0e744d838dbc4efULL,
    0x8e4602d33a961a65ULL, 0xda381d6727886a7eULL, 0xa503a344fc066833ULL, 0xbf8ff5bc36d5dc7bULL,
    0x795ae9ed95bca7e9ULL, 0x19c80807dc900762ULL, 0xea7d27083e6ca641ULL, 0xeba7e4a637fe4fb5ULL,
    0x34ac9bde50ce9087ULL, 0xe290dd0393f2586aULL, 0xbd7074e9843d9dcaULL, 0x66c17140a05887e6ULL,
    0x4ad7b3e525e37f94ULL, 0xde0d009c18880dd6ULL, 0x1516bbb1caca46d3ULL, 0xe9c907ec28f89499ULL,
    0xd677b655085e1e14ULL, 0xac5f949b08f29553ULL, 0xd353b06cb49b5503ULL, 0x9c25eb30ffa8cc78ULL,
    0x6cf18c91658e0285ULL, 0x99264d2b2cc86a77ULL, 0x8b438cd1bb8fb65dULL, 0xdfd56cf20b217732ULL,
    0x71f4e35bf761bacfULL, 0x87d7c01f2b11659cULL, 0x95de608c3ad2653cULL, 0x51b50e6996b8de93ULL,
    0xd21e837b2121e8c9ULL, 0x73d07c7cb3fa0ba7ULL, 0x8113fab03cab6df3ULL, 0x57cdddea972cc490ULL,
    0xc3df94778f1eec30ULL, 0x7509771e4127701eULL, 0x28240c74c56f8f7cULL, 0x194fa4f68aab8e27ULL
};

int main(int argc, char** argv) {
    PolymurHashParams p;
    polymur_init_params_from_seed(&p, 0xfedbca9876543210ULL);
    const uint64_t tweak = 0xabcdef0123456789ULL;

    if (argc >= 2 && strcmp(argv[1], "gen") == 0) {
        for (int i = 0; POLYMUR_TEST_STRINGS[i]; ++i) {
            const char* s = POLYMUR_TEST_STRINGS[i];
            uint64_t h = polymur_hash((const uint8_t*) s, strlen(s), &p, tweak);
            printf("0x%016" PRIx64 " = \"%s\"\n", h, s);
        }
        return 0;
    }

    for (int i = 0; POLYMUR_TEST_STRINGS[i]; ++i) {
        const char* s = POLYMUR_TEST_STRINGS[i];
        uint64_t h = polymur_hash((const uint8_t*) s, strlen(s), &p, tweak);
        if (h != POLYMUR_REFERENCE_VALUES[i]) {
            printf("reference test failed for \"%s\"\n", s);
            printf("expected 0x%016" PRIx64 " got 0x%016" PRIx64 "\n", POLYMUR_REFERENCE_VALUES[i], h);
            return 1;
        }
    }

    return 0;
}

--------------------------------------------------------------------------------
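The `polymur_red611` and `polymur_extrared611` helpers in polymur-hash.h rely on the identity 2^61 = 1 (mod 2^61 - 1). Splitting a value into its low 61 bits and the remaining high bits and adding the two parts therefore preserves it modulo the Mersenne prime. The standalone sketch below illustrates that identity. It is not part of the header. The `sketch_*` names are hypothetical, and it assumes a compiler with `__uint128_t` (GCC or Clang). The header leaves values only partially reduced; this sketch instead finishes with a conditional subtraction so that its results can be compared directly against `%`.

```c
#include <stdint.h>

/* Hypothetical sketch, not the header's API. Assumes __uint128_t (GCC/Clang). */
#define SKETCH_P61 ((1ULL << 61) - 1) /* the Mersenne prime 2^61 - 1 */

/* Partial reduction: x = hi * 2^61 + lo and 2^61 == 1 (mod P), so x == hi + lo.
 * For x < 2^122 (a product of two values below 2^61) the sum stays below 2^62. */
static uint64_t sketch_red61(__uint128_t x) {
    return ((uint64_t) x & SKETCH_P61) + (uint64_t) (x >> 61);
}

/* Canonical reduction of any 64-bit value into [0, P). One more fold leaves
 * x <= P + 7, and a single conditional subtraction finishes the job. */
static uint64_t sketch_canon61(uint64_t x) {
    x = (x & SKETCH_P61) + (x >> 61);
    return x >= SKETCH_P61 ? x - SKETCH_P61 : x;
}

/* (a * b) mod P, assuming a, b < 2^61. */
static uint64_t sketch_mulmod61(uint64_t a, uint64_t b) {
    return sketch_canon61(sketch_red61((__uint128_t) a * b));
}
```

The header skips the final subtraction on the hot path because the next multiplication tolerates slightly unreduced inputs. The canonical form is only needed here for comparing results.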
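In `polymur_init_params`, the exponent `e` is forced odd and then rejected when it is divisible by 3, 5, 7, 11, 13, 31, 41, 61, 151, 331 or 1321. These are exactly the odd primes dividing 2^61 - 2 = 2 * 3^2 * 5^2 * 7 * 11 * 13 * 31 * 41 * 61 * 151 * 331 * 1321, the order of the multiplicative group modulo 2^61 - 1. Passing the filter is therefore equivalent to gcd(e, 2^61 - 2) = 1. The standalone sketch below checks that factorization and cross-checks the trial-division filter against a plain gcd. The `sketch_*` names are hypothetical and not part of the header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not part of polymur-hash.h. */

/* Odd prime factors of 2^61 - 2, the order of the multiplicative group mod 2^61 - 1. */
static const uint64_t SKETCH_ODD_FACTORS[] = {3, 5, 7, 11, 13, 31, 41, 61, 151, 331, 1321};

/* 2 * 3^2 * 5^2 * 7 * 11 * 13 * 31 * 41 * 61 * 151 * 331 * 1321 */
static uint64_t sketch_group_order(void) {
    return 2ULL * 9 * 25 * 7 * 11 * 13 * 31 * 41 * 61 * 151 * 331 * 1321;
}

/* Mirrors the trial divisions in polymur_init_params. The header forces
 * oddness with "| 1"; this sketch tests it explicitly instead. */
static int sketch_exponent_ok(uint64_t e) {
    if (e % 2 == 0) return 0;
    for (size_t i = 0; i < sizeof SKETCH_ODD_FACTORS / sizeof SKETCH_ODD_FACTORS[0]; ++i) {
        if (e % SKETCH_ODD_FACTORS[i] == 0) return 0;
    }
    return 1;
}

/* Euclid's algorithm, used only to cross-check the filter. */
static uint64_t sketch_gcd(uint64_t a, uint64_t b) {
    while (b != 0) {
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}
```

A few trial divisions by small constants are cheaper than a general gcd on 61-bit operands. Because the factorization is complete, the two tests agree for every exponent.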
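`polymur_init_params` computes k = 37^e mod (2^61 - 1) from a precomputed table, `POLYMUR_POW37[i]` = 37^(2^i), and multiplies together the entries selected by the set bits of `e`. The standalone sketch below shows that table-driven exponentiation in its simplest form, with one squaring chain and one accumulator. The header differs in two ways. It builds the table as two independent chains, the second seeded with a hard-coded 37^(2^32). It also uses two accumulators, `ka` and `kb`, presumably to shorten dependency chains. The `sketch_*` names are hypothetical, `__uint128_t` is assumed, and values are kept fully reduced so they can be checked exactly.

```c
#include <stdint.h>

/* Hypothetical sketch, not part of polymur-hash.h. Assumes __uint128_t (GCC/Clang). */
#define SKETCH_P61 ((1ULL << 61) - 1)

/* (a * b) mod (2^61 - 1), fully reduced, assuming a, b < 2^61. */
static uint64_t sketch_mulmod61(uint64_t a, uint64_t b) {
    __uint128_t x = (__uint128_t) a * b;
    uint64_t r = ((uint64_t) x & SKETCH_P61) + (uint64_t) (x >> 61); /* r < 2^62 */
    r = (r & SKETCH_P61) + (r >> 61);                                /* r <= P + 1 */
    return r >= SKETCH_P61 ? r - SKETCH_P61 : r;
}

/* table[i] = 37^(2^i) mod P, each entry the square of the previous one. */
static void sketch_pow37_table(uint64_t table[64]) {
    table[0] = 37;
    for (int i = 1; i < 64; ++i) table[i] = sketch_mulmod61(table[i - 1], table[i - 1]);
}

/* 37^e mod P: multiply together the table entries for the set bits of e. */
static uint64_t sketch_pow37(const uint64_t table[64], uint64_t e) {
    uint64_t acc = 1;
    for (int i = 0; e != 0; ++i, e >>= 1) {
        if (e & 1) acc = sketch_mulmod61(acc, table[i]);
    }
    return acc;
}
```

Since the header draws `e` below 2^61, at most the first 61 table entries are ever used. Raising 37 to 2^61 - 2 is a convenient sanity check: by Fermat's little theorem it must give 1.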