├── README.md
├── coveringLSH.py
└── testCoveringLSH.py

/README.md:
--------------------------------------------------------------------------------
# coveringLSH
Simple implementation of CoveringLSH.

The theory behind the implementation is described in http://arxiv.org/abs/1507.03225

--------------------------------------------------------------------------------
/coveringLSH.py:
--------------------------------------------------------------------------------
# Reference implementation of CoveringLSH
# (c) Rasmus Pagh 2016
# Version 1.0
#
# Description: Encodes a collection of binary vectors as integers, creates a data structure to perform nearest neighbor queries under Hamming distance up to a maximum radius r. The procedure buildCovering must be called before buildDataStructure. For details see "Locality-sensitive Hashing without False Negatives" by Rasmus Pagh, Proceedings of SODA 2016.
#
# License: This implementation is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. The implementation is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

from random import randint
from math import floor, log
from sets import Set

A, infinity = {}, float("inf")
def popcnt(x): return bin(x).count('1')
def dist(x,y): return popcnt(x ^ y)

def buildCovering(d,r):
    for v in xrange(1,2**(r+1)): A[v] = 0
    for i in xrange(d):
        m = randint(1, 2**(r+1)-1)
        for v in xrange(1,2**(r+1)):
            A[v] = A[v] + (1<