├── COPYING
├── README
├── TODO
├── example.rb
└── redis-timeseries.rb

/COPYING:
--------------------------------------------------------------------------------
Copyright 2010-2012 Salvatore Sanfilippo. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

   1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

   2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY SALVATORE SANFILIPPO ''AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SALVATORE SANFILIPPO OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The views and conclusions contained in the software and documentation are those of the authors and should not be interpreted as representing official policies, either expressed or implied, of Salvatore Sanfilippo.
--------------------------------------------------------------------------------
/README:
--------------------------------------------------------------------------------
Redis timeseries README
=======================

Warning: this library is considered unstable; this is just the first public
release. Please allow for more development time before using it
in production environments.

Redis timeseries is a Ruby library implementing time series on top of Redis.
There are many ways to store time series in Redis, for instance using Lists
or Sorted Sets. This library takes a different approach, storing time series
in Redis strings using the Redis APPEND command.

A central concept in this library is the "timestep": an interval of time
such that all the data points received within it are stored in the same key.

The timestep is specified when creating the time series object:

    ts = RedisTimeSeries.new("test",3600,Redis.new)

In the above example a timestep of one hour (3600 seconds) was specified.
All the data points added will be segmented into keys containing just one
hour of data. Note that the timestep is aligned with GMT, so for instance
different keys will be used for data points added at 5pm and at 6pm. This
means that the segmentation is absolute and does not depend on a timer
started when the time series object is created.

Basically the name of the key is created using the following algorithm:

    key = ts:prefix:(UNIX_TIME - UNIX_TIME % TIMESTEP)

In the above example we used "test" as prefix, so you can have different
time series for different things in the same Redis server. For instance,
with the prefix "test" and a timestep of 3600, a data point added at UNIX
time 1325379599 is stored in the key ts:test:1325376000.

To add a data point you just need to perform the following call:

    ts.add("data")

The library takes care of storing the time information in the data point.
Note that you may have an origin time that is different from the insertion
time, so the add method accepts an additional argument where you can
optionally specify the origin time. In that case the origin time is returned
when you fetch data, together with the insertion time.

The origin time is actually handled as a string, so you can put whatever
metadata you want inside it.

You can query data in two ways:

    ts.fetch_range(start,end)
    ts.fetch_timestep(time)

The fetch_range method will fetch all the data samples in the specified
interval, while fetch_timestep will fetch a whole single key (a timestep)
worth of data, according to the time specified.

HOW IT WORKS
============

The library appends every data point as a string terminated by a \x00 byte.
Every field inside the data point is delimited by a \x01 byte.
If the data or metadata you add contains \x00 or \x01 characters, the library
will use base64 encoding to handle it transparently.

So every key is a single string containing multiple \x00 separated data points
that are mostly ordered in time. I say mostly since you may add data points
using multiple clients, and the clocks may not be perfectly synchronized, so
when performing range queries it is a good idea to enlarge the range a bit
to be sure you get everything.

Range queries are performed using binary search with Redis's GETRANGE inside
the string. Since the records are of different sizes we use a modified version
of binary search that is able to check for delimiters and adapt the search
accordingly. Range queries work correctly even when they span multiple
keys.

The fetch_timestep method does not need any binary search: it is just a
single key lookup and is extremely fast.

ADVANTAGES
==========

- Very space efficient.
  Every timestep is stored inside a single Redis string, so there is no per-sample data structure overhead.
- Very fast. Adding a data point consists of just a single Redis APPEND, which is amortized O(1).
- It is designed to work with Redis Cluster (once released), and in general with different sharding policies: no multi-key operations, nor big data structures in a single key. Even distributing the keys across multiple instances using a simple hashing algorithm will work well.
- Keys are already in a serialized format: it is trivial to move keys related to old time series into the file system or other big data systems, just using the Redis GET command, or to import them back into Redis using SET.

DISADVANTAGES
=============

- Inserting into the middle of a time series is not possible.

WARNINGS
========

It is not a good idea to change the timestep of an existing time series.
There is currently no support for this. You will not corrupt data, but older
data will become harder to query.

CREDITS
=======

Thanks to Derek Collison and Pieter Noordhuis for design feedback.
The library was implemented by Salvatore Sanfilippo.

--------------------------------------------------------------------------------
/TODO:
--------------------------------------------------------------------------------
- Tests.
- Once this is stable, create a gem.
- Remember the latest range, and avoid querying Redis if the new range is included in the old one.
- FIXME: include a perfectly matching value at the end of the range.
--------------------------------------------------------------------------------
/example.rb:
--------------------------------------------------------------------------------
require 'rubygems'
require 'redis'
# Load the library from the directory containing this file.
require_relative 'redis-timeseries'

# To show the lib implementation here we use a timestep of just one second.
# Then we sample every 0.1 seconds, producing on average 10 samples per key.
# This way we show how multi-key range queries work.
ts = RedisTimeSeries.new("test",1,Redis.new)

now = Time.now.to_f
puts "Adding data points: "
(0..30).each{|i|
    print "#{i} "
    STDOUT.flush
    ts.add(i.to_s)
    sleep 0.1
}
puts ""

# Get the second in the middle of our sampling.
begin_time = now+1
end_time = now+2
puts "\nGet range from #{begin_time} to #{end_time}"

ts.fetch_range(begin_time,end_time).each{|record|
    puts "Record time #{record[:time]}, data #{record[:data]}"
}

# Show API to get a single timestep
puts "\nGet a single timestep near #{begin_time}"
ts.fetch_timestep(begin_time).each{|record|
    puts "Record time #{record[:time]}, data #{record[:data]}"
}
--------------------------------------------------------------------------------
/redis-timeseries.rb:
--------------------------------------------------------------------------------
# Copyright (C) 2010-2012 Salvatore Sanfilippo.
# Licensed under the BSD two clause license, see the COPYING file shipped
# with this source distribution for more information.

require 'base64'

class RedisTimeSeries
    def initialize(prefix,timestep,redis)
        @prefix = prefix
        @timestep = timestep
        @redis = redis
    end

    # Round t down to the beginning of the timestep it belongs to.
    def normalize_time(t)
        t = t.to_i
        t - (t % @timestep)
    end

    # Name of the key holding the timestep that contains time t.
    def getkey(t)
        "ts:#{@prefix}:#{normalize_time t}"
    end

    # Prefix the field with 'R' (raw) or with 'E' (base64 encoded, used when
    # the field contains one of the \x00 or \x01 delimiters).
    def tsencode(data)
        if data.index("\x00") or data.index("\x01")
            "E#{Base64.encode64(data)}"
        else
            "R#{data}"
        end
    end

    def tsdecode(data)
        if data[0..0] == 'E'
            Base64.decode64(data[1..-1])
        else
            data[1..-1]
        end
    end

    # Append a record to the key of the current timestep. The record is
    # made of the insertion time, the data and the optional origin time,
    # separated by \x01 and terminated by \x00.
    def add(data,origin_time=nil)
        data = tsencode(data)
        origin_time = tsencode(origin_time) if origin_time
        now = Time.now.to_f
        value = "#{now}\x01#{data}"
        value << "\x01#{origin_time}" if origin_time
        value << "\x00"
        @redis.append(getkey(now.to_i),value)
    end

    # Turn a single record (without the trailing \x00) into a hash.
    def decode_record(r)
        res = {}
        s = r.split("\x01")
        res[:time] = s[0].to_f
        res[:data] = tsdecode(s[1])
        if s[2]
            res[:origin_time] = tsdecode(s[2])
        else
            res[:origin_time] = nil
        end
        return res
    end

    # Binary search, inside the key containing 'time', for the offset of
    # the first record whose time is greater than or equal to 'time'.
    def seek(time)
        best_start = nil
        best_time = nil
        rangelen = 64
        key = getkey(time.to_i)
        len = @redis.strlen(key)
        return 0 if len == 0
        min = 0
        max = len-1
        while true
            p = min+((max-min)/2)
            # puts "Min: #{min} Max: #{max} P: #{p}"
            # Seek the first complete record starting from position 'p'.
            # We need to search for two consecutive \x00 chars, and enlarge
            # the range if needed as we don't know how big the record is.
            while true
                range_end = p+rangelen-1
                range_end = len if range_end > len
                r = @redis.getrange(key,p,range_end)
                # puts "GETRANGE #{p} #{range_end}"
                if p == 0
                    sep = -1
                else
                    sep = r.index("\x00")
                end
                sep2 = r.index("\x00",sep+1) if sep
                if sep and sep2
                    record = r[((sep+1)...sep2)]
                    record_start = p+sep+1
                    record_end = p+sep2-1
                    dr = decode_record(record)

                    # Keep track of the best sample, that is the sample
                    # that is greater than our sample, but with the smallest
                    # increment.
                    if dr[:time] >= time and (!best_time or best_time>dr[:time])
                        best_start = record_start
                        best_time = dr[:time]
                        # puts "NEW BEST: #{best_time}"
                    end
                    # puts "Max-Min #{max-min} RS #{record_start}"
                    return best_start if max-min == 1
                    break
                end
                # Already at the end of the string but still no luck?
                # (This must be a comparison: an assignment here would
                # always return before the range could be enlarged.)
                return len+1 if range_end == len
                # We need to enlarge the range. It is interesting to note
                # that we keep the enlarged value: other records will
                # likely be about the same size on average.
                rangelen *= 2
            end
            # puts dr.inspect
            return record_start if dr[:time] == time
            if dr[:time] > time
                max = p
            else
                min = p
            end
        end
    end

    # Decode all the records found in the specified byte range of the key
    # and append them to 'res'.
    def produce_result(res,key,range_begin,range_end)
        r = @redis.getrange(key,range_begin,range_end)
        if r
            s = r.split("\x00")
            s.each{|rec|
                record = decode_record(rec)
                res << record
            }
        end
    end

    # Fetch all the records between begin_time and end_time, walking
    # every timestep key in between if the range spans multiple keys.
    def fetch_range(begin_time,end_time)
        res = []
        begin_key = getkey(begin_time)
        end_key = getkey(end_time)
        begin_off = seek(begin_time)
        end_off = seek(end_time)
        if begin_key == end_key
            # puts "#{begin_off} #{end_off} #{begin_key}"
            produce_result(res,begin_key,begin_off,end_off-1)
        else
            produce_result(res,begin_key,begin_off,-1)
            t = normalize_time(begin_time)
            while true
                t += @timestep
                key = getkey(t)
                break if key == end_key
                produce_result(res,key,0,-1)
            end
            produce_result(res,end_key,0,end_off-1)
        end
        res
    end

    # Fetch the whole key (one timestep worth of data) containing 'time'.
    def fetch_timestep(time)
        res = []
        key = getkey(time)
        produce_result(res,key,0,-1)
        res
    end
end
--------------------------------------------------------------------------------
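The key naming rule and the record layout described in the README can be tried without a Redis server. What follows is only a minimal sketch and not part of the library: it builds the value of one key as a plain Ruby string, all the helper names (key_for, encode_field, decode_field, encode_record, decode_records) are hypothetical, and Array#pack("m0") is used here in place of the library's Base64.encode64 call.

```ruby
# Sketch of the key naming and record layout from the README, using a
# plain Ruby string where the library would use a Redis key.
# All the helper names below are hypothetical, not part of the library.

# key = ts:prefix:(UNIX_TIME - UNIX_TIME % TIMESTEP)
def key_for(prefix, timestep, unix_time)
  t = unix_time.to_i
  "ts:#{prefix}:#{t - (t % timestep)}"
end

# 'R' marks a raw field, 'E' a base64 encoded one. Encoding is needed
# only when the field contains one of the two delimiter bytes.
def encode_field(s)
  if s.include?("\x00") || s.include?("\x01")
    "E" + [s].pack("m0")
  else
    "R" + s
  end
end

def decode_field(s)
  s[0] == "E" ? s[1..-1].unpack1("m") : s[1..-1]
end

# One record: time \x01 data [\x01 origin_time] \x00
def encode_record(time, data, origin_time = nil)
  fields = [time.to_s, encode_field(data)]
  fields << encode_field(origin_time) if origin_time
  fields.join("\x01") + "\x00"
end

# Split the value of a key into records and decode every field.
def decode_records(value)
  value.split("\x00").map do |r|
    time, data, origin = r.split("\x01")
    { :time => time.to_f,
      :data => decode_field(data),
      :origin_time => origin ? decode_field(origin) : nil }
  end
end

# Two records appended one after the other, as APPEND would do.
value = encode_record(1325376000.25, "42") +
        encode_record(1325376001.5, "a\x00b", "sensor-1")
records = decode_records(value)

puts key_for("test", 3600, 1325379599)
records.each { |r| puts r.inspect }
```

Since the second data point contains a \x00 byte it is stored base64 encoded, so splitting the value on \x00 still yields exactly one chunk per record.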