├── .gitignore ├── LICENSE ├── README.md ├── TODO.md ├── example.sh ├── example2.sh ├── redimension.lua └── test.sh /.gitignore: -------------------------------------------------------------------------------- 1 | fuzz*.sh 2 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2015, Salvatore Sanfilippo 2 | Copyright (c) 2015, Itamar Haber 3 | 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or without 7 | modification, are permitted provided that the following conditions are met: 8 | 9 | * Redistributions of source code must retain the above copyright notice, 10 | this list of conditions and the following disclaimer. 11 | 12 | * Redistributions in binary form must reproduce the above copyright notice, 13 | this list of conditions and the following disclaimer in the documentation 14 | and/or other materials provided with the distribution. 15 | 16 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 17 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 18 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 19 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 20 | ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 21 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 22 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 23 | ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 24 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 25 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
26 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | redimension.lua 2 | === 3 | 4 | A port of [Redimension](https://github.com/antirez/redimension) to Redis Lua with a semi-Redis API. Developed 100% without a debugger. 5 | 6 | Divergences 7 | === 8 | 9 | * Indices always use a Sorted Set (for ranges) and a Hash (for id lookups). 10 | 11 | Known issues 12 | === 13 | 14 | * Can't index elements named `_dim` and `_prec` 15 | * Can't deal with elements that contain colons (':') 16 | * Lua's bit operations are 32-bit, which could lead to breakage at some point (`2^exp`...) 17 | 18 | Usage 19 | === 20 | 21 | Use `EVAL`, `EVALSHA` or `redis-cli --eval` to run redimension.lua. 22 | 23 | The script requires two key names and at least one argument, as follows: 24 | 25 | * KEYS[1] - the index sorted set key 26 | * KEYS[2] - the index hash key 27 | * ARGV[1] - the command 28 | 29 | The command may be one of the following: 30 | 31 | * create - creates an index with ARGV[2] as dimension and ARGV[3] as precision 32 | * drop - drops an index 33 | * index - indexes an element ARGV[2] with values ARGV[3]..ARGV[dimension+2] 34 | * unindex - unindexes an element ARGV[2] with values ARGV[3]..ARGV[dimension+2] 35 | * unindex_by_id - unindexes an element by its id ARGV[2] 36 | * update - updates an element ARGV[2] with values ARGV[3]..ARGV[dimension+2] 37 | * query - queries using min/max range pairs ARGV[2],ARGV[3] .. ARGV[dimension*2],ARGV[dimension*2+1] 38 | * fuzzy_test - fuzzily tests the library in ARGV[2] dimensions with ARGV[3] items using ARGV[4] queries 39 | 40 | Testing 41 | === 42 | 43 | Fuzzy-ish testing is implemented inline - invoke it with the `fuzzy_test` command. 44 | 45 | Performance 46 | === 47 | 48 | Informal benchmarking can be found at: https://gist.github.com/itamarhaber/3d8c9741c545202925aa 49 | 50 | License 51 | === 52 | 53 | The code is released under the BSD 2-clause license. 
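Under the hood the index interleaves the bits of all coordinates (a Z-order/Morton-style encoding), so that lexicographic ranges over the encoded strings map to rectangular regions of the space. A minimal Python sketch of the idea, illustrative only: it interleaves the low `prec` bits of each coordinate MSB-first, whereas the script rotates the full 32-bit word and additionally hex-encodes the result and appends the raw values and id.

```python
def interleave(coords, prec):
    """Interleave the bits of each coordinate, most significant bit first.

    coords: list of non-negative integers, one per dimension.
    prec:   number of bits kept per coordinate.
    Returns the interleaved value as a binary string of len(coords)*prec bits.
    """
    bits = []
    for j in range(prec - 1, -1, -1):   # bit positions, MSB first
        for c in coords:                # one bit from each dimension per round
            bits.append(str((c >> j) & 1))
    return "".join(bits)

# Two 4-bit coordinates: 5=0101, 9=1001. Nearby points share long common
# prefixes, which is what makes ZRANGEBYLEX range queries possible.
print(interleave([5, 9], 4))   # → 01100011
```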
54 | -------------------------------------------------------------------------------- /TODO.md: -------------------------------------------------------------------------------- 1 | TODO 2 | === 3 | 4 | * Select a better exponent for the query by using ZLEXCOUNT to estimate whether the filtering work would be too big compared to the cost of performing additional queries. 5 | * Binary encoding to save space instead of using hex to represent each byte as two. 6 | -------------------------------------------------------------------------------- /example.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | SHA1=$(redis-cli SCRIPT LOAD "$(cat redimension.lua)") 4 | 5 | redis-cli EVALSHA "$SHA1" 2 z h drop 6 | redis-cli EVALSHA "$SHA1" 2 z h create 2 32 7 | redis-cli EVALSHA "$SHA1" 2 z h index Josh 45 120000 8 | redis-cli EVALSHA "$SHA1" 2 z h index Pamela 50 110000 9 | redis-cli EVALSHA "$SHA1" 2 z h index Angela 30 125000 10 | redis-cli EVALSHA "$SHA1" 2 z h query 40 50 100000 115000 11 | -------------------------------------------------------------------------------- /example2.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | SHA1=$(redis-cli SCRIPT LOAD "$(cat redimension.lua)") 4 | 5 | redis-cli EVALSHA "$SHA1" 2 z h drop 6 | redis-cli EVALSHA "$SHA1" 2 z h create 2 32 7 | redis-cli EVALSHA "$SHA1" 2 z h update Josh 45 120000 8 | redis-cli EVALSHA "$SHA1" 2 z h update Pamela 50 110000 9 | redis-cli EVALSHA "$SHA1" 2 z h update George 41 100000 10 | redis-cli EVALSHA "$SHA1" 2 z h update Angela 30 125000 11 | redis-cli EVALSHA "$SHA1" 2 z h query 40 50 100000 115000 12 | 13 | redis-cli EVALSHA "$SHA1" 2 z h unindex_by_id Pamela 14 | echo "After unindexing:" 15 | redis-cli EVALSHA "$SHA1" 2 z h query 40 50 100000 115000 16 | 17 | redis-cli EVALSHA "$SHA1" 2 z h update George 42 100000 18 | echo "After updating:" 19 | redis-cli EVALSHA "$SHA1" 2 z h query 40 50 100000 115000 20 | 
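The query logic in redimension.lua below covers the requested N-dimensional range with grid cells of side `2^exp`, issuing one ZRANGEBYLEX call per cell and filtering out-of-range results afterwards. A hypothetical Python helper (`cells` is not part of the script) that illustrates how many cells a given query decomposes into:

```python
from itertools import product

def cells(vrange, exp):
    """Enumerate the grid cells of side 2**exp covering an N-dim range.

    vrange: list of (lo, hi) pairs, one per dimension.
    Each returned cell corresponds to one ZRANGEBYLEX call in the script.
    """
    scaled = [(lo // 2 ** exp, hi // 2 ** exp) for lo, hi in vrange]
    return list(product(*[range(s, e + 1) for s, e in scaled]))

# The query from example.sh: ages 40-50, salaries 100000-115000.
# With exp=12 (cell side 4096) the search area needs only a few cells.
print(len(cells([(40, 50), (100000, 115000)], 12)))   # → 5
```

Larger `exp` means fewer ZRANGEBYLEX calls but more out-of-range items to filter; the script's `query` function tunes this tradeoff automatically.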
-------------------------------------------------------------------------------- /redimension.lua: -------------------------------------------------------------------------------- 1 | local _USAGE = { 2 | 'KEYS[1] - index sorted set key', 3 | 'KEYS[2] - index hash key', 4 | 'ARGV[1] - command. Can be:', 5 | ' create - create an index with ARGV[2] as dimension and ARGV[3] as precision', 6 | ' drop - drops an index', 7 | ' index - index an element ARGV[2] with values ARGV[3]..ARGV[dimension+2]', 8 | ' unindex - unindex an element ARGV[2] with values ARGV[3]..ARGV[dimension+2]', 9 | ' unindex_by_id - unindex an element by id ARGV[2]', 10 | ' update - update an element ARGV[2] with values ARGV[3]..ARGV[dimension+2]', 11 | ' query - query using min/max range pairs ARGV[2],ARGV[3] .. ARGV[dimension*2],ARGV[dimension*2+1]', 12 | ' fuzzy_test - fuzzily tests the library in ARGV[2] dimensions with ARGV[3] items using ARGV[4] queries', 13 | } 14 | 15 | local _dim -- index's dimension 16 | local _prec -- index's precision 17 | local _MAX_PREC = 56 18 | 19 | local bin2hex = { 20 | ['0000'] = '0', 21 | ['0001'] = '1', 22 | ['0010'] = '2', 23 | ['0011'] = '3', 24 | ['0100'] = '4', 25 | ['0101'] = '5', 26 | ['0110'] = '6', 27 | ['0111'] = '7', 28 | ['1000'] = '8', 29 | ['1001'] = '9', 30 | ['1010'] = 'A', 31 | ['1011'] = 'B', 32 | ['1100'] = 'C', 33 | ['1101'] = 'D', 34 | ['1110'] = 'E', 35 | ['1111'] = 'F' 36 | } 37 | 38 | local function load_meta() 39 | _dim = tonumber(redis.call('HGET', KEYS[2], '_dim')) 40 | _prec = tonumber(redis.call('HGET', KEYS[2], '_prec')) 41 | if not _dim or not _prec then 42 | error('failed to load index meta data') 43 | end 44 | end 45 | 46 | local function check_dims(vars) 47 | if #vars ~= _dim then 48 | error('wrong number of values for this index') 49 | end 50 | end 51 | 52 | -- Encode N variables into the bits-interleaved representation. 53 | local function encode(...) 
54 | local comb = {} 55 | 56 | for i = 1, #arg do 57 | local b = arg[i] 58 | for j = 1, _prec do 59 | b = bit.rol(b, 1) 60 | if comb[j] then 61 | comb[j] = comb[j] .. bit.band(b, 1) 62 | else 63 | table.insert(comb, bit.band(b, 1)) 64 | end 65 | end 66 | end 67 | 68 | local bs = table.concat(comb) 69 | local l = string.len(bs) 70 | local rem = l % 4 71 | local hs = '' 72 | local b = '' 73 | 74 | l = l - 1 75 | if (rem > 0) then 76 | bs = string.rep('0', 4 - rem) .. bs 77 | end 78 | 79 | for i = 1, l, 4 do 80 | hs = hs .. bin2hex[string.sub(bs, i, i+3)] 81 | end 82 | 83 | hs = string.rep('0', _prec*_dim/4-hs:len()) .. hs:sub(3):lower() 84 | return hs 85 | end 86 | 87 | -- Encode an element's coordinates and ID as the whole string to add 88 | -- into the sorted set. 89 | local function elestring(vars, id) 90 | check_dims(vars) 91 | local ele = encode(unpack(vars)) 92 | for _, v in pairs(vars) do 93 | ele = ele .. ':' .. v 94 | end 95 | ele = ele .. ':' .. id 96 | return ele 97 | end 98 | 99 | -- Index a list of values with associated data 'id' 100 | local function index(vars, id) 101 | local ele = elestring(vars, id) 102 | -- TODO: remove this debug helper 103 | if redis == nil then 104 | print(ele) 105 | return 106 | end 107 | redis.call('ZADD', KEYS[1], 0, ele) 108 | redis.call('HSET', KEYS[2], id, ele) 109 | end 110 | 111 | -- ZREM according to current position in the space and ID. 112 | local function unindex(vars,id) 113 | redis.call('ZREM', KEYS[1], elestring(vars,id)) 114 | end 115 | 116 | -- Unindex by just ID, using the associated Redis hash that maps each 117 | -- ID to its current indexed representation, so that the user can 118 | -- unindex easily. 119 | local function unindex_by_id(id) 120 | local ele = redis.call('HGET', KEYS[2], id) 121 | redis.call('ZREM', KEYS[1], ele) 122 | redis.call('HDEL', KEYS[2], id) 123 | end 124 | 125 | -- Like index but makes sure to remove the old index for the specified 126 | -- id. 
In this port the hash mapping is always enabled. 127 | local function update(vars,id) 128 | local ele = elestring(vars,id) 129 | local oldele = redis.call('HGET', KEYS[2], id) 130 | redis.call('ZREM', KEYS[1], oldele) 131 | redis.call('HDEL', KEYS[2], id) 132 | redis.call('ZADD', KEYS[1], 0, ele) 133 | redis.call('HSET', KEYS[2], id, ele) 134 | end 135 | 136 | --- exp is the exponent of two that gives the size of the squares 137 | -- we use in the range query. N times the exponent is the number 138 | -- of bits we unset and set to get the start and end points of the range. 139 | local function query_raw(vrange,exp) 140 | local vstart = {} 141 | local vend = {} 142 | -- We start by scaling our indexes in order to iterate all areas, so 143 | -- that moving between N-dimensional areas only requires incrementing 144 | -- vars. 145 | for _, v in pairs(vrange) do 146 | table.insert(vstart, math.floor(v[1]/(2^exp))) 147 | table.insert(vend, math.floor(v[2]/(2^exp))) 148 | end 149 | 150 | -- Visit all the sub-areas to cover our N-dim search region. 151 | local ranges = {} 152 | local vcurrent = {} 153 | for i = 1, #vstart do 154 | table.insert(vcurrent, vstart[i]) 155 | end 156 | 157 | local notdone = true 158 | while notdone do 159 | -- For each sub-region, encode all the start-end ranges 160 | -- for each dimension. 161 | local vrange_start = {} 162 | local vrange_end = {} 163 | for i = 1, _dim do 164 | table.insert(vrange_start, vcurrent[i]*(2^exp)) 165 | table.insert(vrange_end, bit.bor(vrange_start[i],(2^exp)-1)) 166 | end 167 | 168 | -- Now we need to combine the ranges for each dimension 169 | -- into a single lexicographical query, so we turn 170 | -- the ranges into interleaved form. 171 | local s = encode(unpack(vrange_start)) 172 | -- Now that we have the start of the range, calculate the end 173 | -- by replacing the specified number of bits from 0 to 1. 
174 | local e = encode(unpack(vrange_end)) 175 | table.insert(ranges, { '['..s, '['..e..':\255' }) 176 | 177 | -- Increment to loop in N dimensions in order to visit 178 | -- all the sub-areas representing the N dimensional area to 179 | -- query. 180 | for i = 1, _dim do 181 | if vcurrent[i] ~= vend[i] then 182 | vcurrent[i] = vcurrent[i] + 1 183 | break 184 | elseif i == _dim then 185 | notdone = false -- Visited everything! 186 | else 187 | vcurrent[i] = vstart[i] 188 | end 189 | end 190 | end 191 | 192 | -- Perform the ZRANGEBYLEX queries to collect the results from the 193 | -- defined ranges. 194 | local allres = {} 195 | for _, v in pairs(ranges) do 196 | local res = redis.call('ZRANGEBYLEX', KEYS[1], v[1], v[2]) 197 | for _, r in pairs(res) do 198 | table.insert(allres, r) 199 | end 200 | end 201 | 202 | -- Filter items according to the requested limits. This is needed 203 | -- since the sub-areas used to cover the whole search area are not 204 | -- perfectly aligned with its boundaries, so we also retrieve elements 205 | -- outside the searched ranges. 206 | local items = {} 207 | for _, v in pairs(allres) do 208 | local fields = {} 209 | v:gsub('([^:]+)', function(f) table.insert(fields, f) end) 210 | local skip = false 211 | for i = 1, _dim do 212 | if tonumber(fields[i+1]) < vrange[i][1] or 213 | tonumber(fields[i+1]) > vrange[i][2] 214 | then 215 | skip = true 216 | break 217 | end 218 | end 219 | if not skip then 220 | table.remove(fields, 1) 221 | table.insert(items, fields) 222 | end 223 | end 224 | 225 | return items 226 | end 227 | 228 | -- Like query_raw, but before performing the query makes sure to order 229 | -- parameters so that x0 < x1 and y0 < y1 and so forth. 230 | -- Also calculates the exponent for the query_raw masking. 
231 | local function query(vrange) 232 | check_dims(vrange) 233 | local deltas = {} 234 | for i, v in ipairs(vrange) do 235 | if v[1] > v[2] then 236 | vrange[i][1], vrange[i][2] = vrange[i][2], vrange[i][1] 237 | end 238 | table.insert(deltas, vrange[i][2]-vrange[i][1]+1) 239 | end 240 | 241 | local delta = deltas[1] 242 | for _, v in pairs(deltas) do 243 | if v < delta then 244 | delta = v 245 | end 246 | end 247 | 248 | local exp = 1 249 | while delta > 2 do 250 | delta = math.floor(delta / 2) 251 | exp = exp + 1 252 | end 253 | 254 | -- If the ranges for different dimensions are extremely different in 255 | -- span, we may end up with too small an exponent, which will result in 256 | -- a very big number of queries in order to be very selective. Most of 257 | -- the time this is not a good idea, so at the cost of querying larger 258 | -- areas and filtering more, we scale 'exp' until we can serve this 259 | -- request with fewer than 20 ZRANGEBYLEX commands. 260 | -- 261 | -- Note: the magic "20" depends on the number of items inside the 262 | -- requested range, since it's a tradeoff with filtering items outside 263 | -- the searched area. It is possible to improve the algorithm by using 264 | -- ZLEXCOUNT to get the number of items. 265 | while true do 266 | for i, v in ipairs(vrange) do 267 | deltas[i] = (v[2]/(2^exp))-(v[1]/(2^exp))+1 268 | end 269 | local ranges = 1 270 | for _, v in pairs(deltas) do 271 | ranges = ranges*v 272 | end 273 | 274 | if ranges < 20 then 275 | break 276 | end 277 | exp = exp + 1 278 | end 279 | 280 | return query_raw(vrange,exp) 281 | end 282 | 283 | -- Similar to query but takes just the center of the query area and a 284 | -- radius, and automatically filters away all the elements outside the 285 | -- specified circular area. 
286 | local function query_radius(x,y,exp,radius) 287 | -- TODO 288 | end 289 | 290 | -- drops an index 291 | local function drop() 292 | redis.call('DEL', KEYS[1], KEYS[2]) 293 | end 294 | 295 | -- creates an index with dimension d and precision p 296 | local function create(d, p) 297 | drop() 298 | redis.call('HMSET', KEYS[2], '_dim', d, '_prec', p) 299 | end 300 | 301 | -- parse arguments 302 | if #ARGV == 0 or #KEYS ~= 2 then 303 | return(_USAGE) 304 | end 305 | 306 | local cmd = ARGV[1]:lower() 307 | 308 | if cmd == 'create' then 309 | local dim, prec = tonumber(ARGV[2]), tonumber(ARGV[3]) 310 | if dim == nil or prec == nil then 311 | error('index dimension and precision must be numbers') 312 | end 313 | if dim < 1 then 314 | error('index dimension has to be at least 1') 315 | end 316 | if prec < 1 or prec > _MAX_PREC then 317 | error('index precision has to be between 1 and ' .. _MAX_PREC) 318 | end 319 | create(dim, prec) 320 | return({dim, prec}) 321 | end 322 | 323 | if cmd == 'drop' then 324 | drop() 325 | return('dropped.') 326 | end 327 | 328 | -- not really fuzzy without changing the replication mode and using real randomness 329 | if cmd == 'fuzzy_test' then 330 | local dim, items, queries = tonumber(ARGV[2]), tonumber(ARGV[3]), tonumber(ARGV[4]) 331 | local timings = {} 332 | local avgt = 0.0 333 | 334 | drop() 335 | create(dim, _MAX_PREC) 336 | load_meta() 337 | 338 | local id = 0 339 | local dataset = {} 340 | for i = 1, items do 341 | local vars = {} 342 | for j = 1, dim do 343 | table.insert(vars, math.random(1000)) 344 | end 345 | index(vars, id) 346 | table.insert(vars, tostring(id)) 347 | table.insert(dataset, vars) 348 | id = id + 1 349 | end 350 | 351 | for i = 1, queries do 352 | local random = {} 353 | for j = 1, dim do 354 | local s = math.random(1000) 355 | local e = math.random(1000) 356 | if e < s then 357 | s, e = e, s 358 | end 359 | table.insert(random, { s, e }) 360 | end 361 | 362 | local start_t = redis.call('TIME') 363 | local res1 = 
query(random) 364 | local end_t = redis.call('TIME') 365 | 366 | -- some type conversions 367 | for i1, v1 in ipairs(res1) do 368 | for i2 = 1, dim do 369 | res1[i1][i2] = tonumber(res1[i1][i2]) 370 | end 371 | end 372 | 373 | start_t[1], start_t[2] = tonumber(start_t[1]), tonumber(start_t[2]) 374 | end_t[1], end_t[2] = tonumber(end_t[1]), tonumber(end_t[2]) 375 | if end_t[2] > start_t[2] then 376 | table.insert(timings, { end_t[1] - start_t[1], end_t[2] - start_t[2] }) 377 | else 378 | table.insert(timings, { end_t[1] - start_t[1] - 1, math.abs(end_t[2] - start_t[2]) }) 379 | end 380 | 381 | avgt = (avgt * (#timings - 1) + tonumber(string.format('%d.%06d', timings[#timings][1], timings[#timings][2]))) / #timings 382 | 383 | local res2 = {} 384 | for _, v in pairs(dataset) do 385 | local included = true 386 | for j = 1, dim do 387 | if v[j] < random[j][1] or v[j] > random[j][2] then 388 | included = false 389 | end 390 | end 391 | if included then 392 | table.insert(res2, v) 393 | end 394 | end 395 | 396 | if #res1 ~= #res2 then 397 | error('ERROR ' .. #res1 .. ' VS ' .. #res2) 398 | end 399 | 400 | -- table sorting is so much FUN! 401 | local function cmp(a, b, depth) 402 | depth = depth or 1 403 | 404 | if depth > dim + 1 then 405 | return false 406 | end 407 | if a[depth] < b[depth] then 408 | return true 409 | end 410 | if a[depth] == b[depth] then 411 | return cmp(a, b, depth + 1) 412 | end 413 | if a[depth] > b[depth] then 414 | return false 415 | end 416 | end 417 | 418 | table.sort(res1, cmp) 419 | table.sort(res2, cmp) 420 | 421 | for i1, r1 in ipairs(res1) do 422 | for i2 = 1, dim+1 do 423 | if r1[i2] ~= res2[i1][i2] then 424 | error('ERROR ' .. i2 .. ': ' .. r1[i2] .. ' ~= ' .. res2[i1][i2]) 425 | end 426 | end 427 | end 428 | end 429 | 430 | -- housekeeping: drop() can't be called here unless the replication mode is changed 431 | return({ 'fuzzily tested.', {dim, items, queries, 'avg query time (sec): ' .. 
tostring(avgt)}}) 432 | end 433 | 434 | load_meta() 435 | 436 | if cmd == 'index' or cmd == 'unindex' or cmd == 'update' then 437 | local id = ARGV[2] 438 | local vars = {} 439 | for i = 3,#ARGV do 440 | table.insert(vars, tonumber(ARGV[i])) 441 | end 442 | 443 | if cmd == 'index' then 444 | index(vars, id) 445 | return('indexed.') 446 | elseif cmd == 'unindex' then 447 | unindex(vars, id) 448 | return('unindexed.') 449 | else 450 | update(vars, id) 451 | return('updated.') 452 | end 453 | end 454 | 455 | if cmd == 'unindex_by_id' then 456 | local id = ARGV[2] 457 | 458 | unindex_by_id(id) 459 | return('unindexed by id.') 460 | end 461 | 462 | if cmd == 'query' then 463 | local vranges = {} 464 | for i = 1, _dim do 465 | table.insert(vranges, {tonumber(ARGV[i*2]), tonumber(ARGV[i*2+1])}) 466 | end 467 | 468 | return query(vranges) 469 | end 470 | 471 | return(_USAGE) 472 | -------------------------------------------------------------------------------- /test.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | SHA1=$(redis-cli SCRIPT LOAD "$(cat redimension.lua)") 4 | 5 | redis-cli EVALSHA $SHA1 2 z h fuzzy_test 4 100 1000 6 | redis-cli EVALSHA $SHA1 2 z h fuzzy_test 3 100 1000 7 | redis-cli EVALSHA $SHA1 2 z h fuzzy_test 2 1000 1000 8 | --------------------------------------------------------------------------------