├── COPYING
├── README
├── TODO
├── example.rb
└── redis-timeseries.rb


/COPYING:
--------------------------------------------------------------------------------
 1 | Copyright 2010-2012 Salvatore Sanfilippo. All rights reserved.
 2 | 
 3 | Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
 4 | 
 5 |    1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
 6 | 
 7 |    2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
 8 | 
 9 | THIS SOFTWARE IS PROVIDED BY SALVATORE SANFILIPPO ''AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SALVATORE SANFILIPPO OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
10 | 
11 | The views and conclusions contained in the software and documentation are those of the authors and should not be interpreted as representing official policies, either expressed or implied, of Salvatore Sanfilippo.
12 | 


--------------------------------------------------------------------------------
/README:
--------------------------------------------------------------------------------
  1 | Redis timeseries README
  2 | =======================
  3 | 
  4 | Warning: this library is considered unstable; this is just the first public
  5 |          release. Please allow for more development time before using it
  6 |          in production environments.
  7 | 
  8 | Redis timeseries is a Ruby library implementing time series on top of Redis.
  9 | There are many ways to store time series in Redis, for instance using Lists
 10 | or Sorted Sets. This library takes a different approach, storing time series
 11 | in Redis strings using the Redis APPEND command.
 12 | 
 13 | A central concept in this library is the "timestep": an interval of time
 14 | such that all the data points received within that interval are stored
 15 | in the same key.
 16 | 
 17 | The timestep is specified when creating the time series object:
 18 |     
 19 |     ts = RedisTimeSeries.new("test",3600,Redis.new)
 20 | 
 21 | In the above example a timestep of one hour (3600 seconds) was specified.
 22 | All the data points added will be segmented into keys containing just one
 23 | hour of data. Note that timesteps are aligned to UTC (the Unix epoch), so,
 24 | for instance, data points added at 5:59pm and at 6:00pm UTC are stored in
 25 | different keys. This means that the segmentation is absolute and does not
 26 | depend on when the time series object was created.
 27 | 
 28 | The key name is computed as follows (UNIX_TIME is truncated to whole seconds):
 29 | 
 30 |     key = ts:prefix:(UNIX_TIME - UNIX_TIME % TIMESTEP)
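For illustration, here is a minimal Ruby sketch of this computation. `key_for` is a hypothetical standalone helper that mirrors `normalize_time` and `getkey` in redis-timeseries.rb; it is not part of the library's API.

```ruby
# Hypothetical standalone helper mirroring RedisTimeSeries#normalize_time
# and RedisTimeSeries#getkey (not part of the library's API).
def key_for(prefix, timestep, unix_time)
  t = unix_time.to_i                  # fractional seconds are dropped
  "ts:#{prefix}:#{t - (t % timestep)}"
end

# With a one hour timestep, 17:30:00 and 17:59:59 UTC share a key,
# while 18:00:00 UTC starts a new one.
t1 = Time.utc(2012, 1, 1, 17, 30, 0).to_f
t2 = Time.utc(2012, 1, 1, 17, 59, 59).to_f
t3 = Time.utc(2012, 1, 1, 18, 0, 0).to_f

puts key_for("test", 3600, t1)   # prints ts:test:1325437200
puts key_for("test", 3600, t2)   # prints ts:test:1325437200
puts key_for("test", 3600, t3)   # prints ts:test:1325440800
```

Since the key depends only on the time and the timestep, any client computing it for the same instant picks the same key.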
 31 | 
 32 | In the above example "test" is the prefix: using different prefixes, you can
 33 | keep different time series for different things in the same Redis server.
 34 | 
 35 | To add a data point you just need to perform the following call:
 36 | 
 37 |     ts.add("data")
 38 | 
 39 | The library takes care of storing the insertion time in the data point.
 40 | Note that you may have an origin time that is different from the insertion time,
 41 | so the add method accepts an additional argument where you can optionally
 42 | specify the origin time. In that case the origin time is returned when you
 43 | fetch data, together with the insertion time.
 44 | 
 45 | The origin time is actually handled as an opaque string, so you can store
 46 | whatever metadata you want in it.
 47 | 
 48 | You can query data in two ways:
 49 | 
 50 |     ts.fetch_range(start,end)
 51 |     ts.fetch_timestep(time)
 52 | 
 53 | The fetch_range method fetches all the data samples in the specified
 54 | interval, while fetch_timestep fetches a whole single key (one timestep
 55 | worth of data): the one containing the specified time.
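When the interval crosses timestep boundaries, fetch_range reads the tail of the first key (from the offset found by the binary search), every intermediate key whole, and the head of the last key. The Ruby sketch below only enumerates which keys are touched; `keys_for_range` is a hypothetical helper, not a library method, and it assumes begin_time <= end_time (otherwise it returns an empty list).

```ruby
# Hypothetical helper: the timestep keys a range query over
# [begin_time, end_time] has to read, in the order fetch_range visits them.
def keys_for_range(prefix, timestep, begin_time, end_time)
  first = begin_time.to_i - (begin_time.to_i % timestep)
  last  = end_time.to_i - (end_time.to_i % timestep)
  (first..last).step(timestep).map { |t| "ts:#{prefix}:#{t}" }
end

# With a one second timestep, as in example.rb, a range of about
# 2.5 seconds touches three keys.
p keys_for_range("test", 1, 100.2, 102.7)
# prints ["ts:test:100", "ts:test:101", "ts:test:102"]
```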
 56 | 
 57 | HOW IT WORKS
 58 | ============
 59 | 
 60 | The library appends every data point as a string terminated by a \x00 byte.
 61 | Fields inside a data point (time, data, optional origin time) are separated by \x01.
 62 | Each data and origin time field is prefixed with R (raw) or, when it contains
 63 | \x00 or \x01 bytes, with E followed by its base64 encoding, so this is transparent.
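The following Ruby sketch reproduces this record format with plain strings, assuming the field layout used by `add`, `tsencode` and `decode_record` in redis-timeseries.rb. The helper names are hypothetical, and `Array#pack("m")` / `String#unpack1("m")` are used in place of the `Base64` module; they produce and accept the same encoding as `Base64.encode64` / `Base64.decode64`.

```ruby
# Field encoding in the style of tsencode/tsdecode (hypothetical helpers):
# "R" + raw bytes, or "E" + base64 when the value contains a delimiter.
def encode_field(s)
  s.include?("\x00") || s.include?("\x01") ? "E" + [s].pack("m") : "R" + s
end

def decode_field(s)
  s.start_with?("E") ? s[1..-1].unpack1("m") : s[1..-1]
end

# One record: insertion time, data, optional origin time, then \x00.
def encode_record(time, data, origin = nil)
  fields = [time.to_s, encode_field(data)]
  fields << encode_field(origin) if origin
  fields.join("\x01") + "\x00"
end

def decode_record(rec)
  time, data, origin = rec.chomp("\x00").split("\x01")
  { time: time.to_f, data: decode_field(data),
    origin_time: origin && decode_field(origin) }
end

# "a\x01b" contains a field delimiter, so it is stored base64-encoded.
rec = encode_record(1325437200.5, "a\x01b", "sensor-7")
p rec
p decode_record(rec)
```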
 64 | 
 65 | So every key is a single string containing multiple \x00 separated data points
 66 | that are mostly ordered in time. Mostly, because you may add data points
 67 | using multiple clients whose clocks may not be perfectly synchronized, so
 68 | when performing range queries it is a good idea to enlarge the range a bit
 69 | to be sure you get everything.
 70 | 
 71 | Range queries are performed with a binary search inside the string, using the
 72 | Redis GETRANGE command. Since records have different sizes, we use a modified
 73 | version of binary search that looks for delimiters and adapts the search
 74 | accordingly. Range queries work correctly even when they span multiple
 75 | keys.
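A simplified, in-memory Ruby sketch of a delimiter-aware binary search is shown below. It searches over byte offsets and, for each probe, finds the enclosing record by scanning backwards for the previous \x00. This differs from the library's `seek`, which reads forward with GETRANGE windows of growing size. The helper names are hypothetical, E-encoded fields are not decoded for brevity, and records are assumed to be sorted by time.

```ruby
# Time of the record containing byte offset pos (the first \x01 field).
def record_time_at(buf, pos)
  prev_end = pos.zero? ? nil : buf.rindex("\x00", pos - 1)
  start = prev_end ? prev_end + 1 : 0
  buf[start...buf.index("\x01", start)].to_f
end

# Byte offset of the first record whose time is >= target,
# or buf.bytesize if there is none (lower-bound binary search).
def seek_offset(buf, target)
  buf = buf.b                       # index by bytes, like GETRANGE offsets
  lo, hi = 0, buf.length            # the answer lies in [lo, hi]
  while lo < hi
    mid = (lo + hi) / 2
    if record_time_at(buf, mid) >= target
      hi = mid
    else
      lo = mid + 1
    end
  end
  lo                                # always a record start (or the end)
end

# Records with begin_time <= time < end_time, as [time, data] pairs.
def fetch_range_in(buf, begin_time, end_time)
  from = seek_offset(buf, begin_time)
  to = seek_offset(buf, end_time)
  buf.b[from...to].split("\x00").map do |rec|
    time, data = rec.split("\x01")  # origin time, if any, is ignored here
    [time.to_f, data[1..-1]]        # drop the R prefix (E fields not decoded)
  end
end

recs = [[10.0, "a"], [11.5, "bb"], [13.0, "ccc"], [20.25, "d"]]
buf = recs.map { |t, d| "#{t}\x01R#{d}\x00" }.join
p fetch_range_in(buf, 11.0, 20.0)   # prints [[11.5, "bb"], [13.0, "ccc"]]
```

As in the library, a sample whose time exactly equals the end of the range is excluded, since the end offset points at the first record at or after end_time.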
 76 | 
 77 | The fetch_timestep method does not need any binary search: it is just a
 78 | single key lookup and is extremely fast.
 79 | 
 80 | ADVANTAGES
 81 | ==========
 82 | 
 83 | - Very space efficient. Every timestep is stored inside a single Redis string, so there is no per-element overhead as with Lists or Sorted Sets: each sample only adds its delimiters and a one-byte R/E prefix per field.
 84 | - Very fast. Adding a data point is just a single APPEND, an amortized O(1) operation.
 85 | - It is designed to work with Redis Cluster (once released), and in general with different sharding policies: there are no multi-key operations and no big data structures in a single key. Even distributing the keys across multiple instances using a simple hashing algorithm will work well.
 86 | - Keys are already in a serialized format: it is trivial to move keys related to old time series to the file system or other big data systems just using the Redis GET command, or to import them back into Redis using SET.
 87 | 
 88 | DISADVANTAGES
 89 | =============
 90 | 
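 91 | - Data points can only be appended, always with the current time: inserting into the middle of a timestep (for instance to backfill old data) is not possible.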
 92 | 
 93 | WARNINGS
 94 | ========
 95 | 
 96 | Changing the timestep of an existing time series is not a good idea. There is
 97 | currently no support for this. You will not corrupt data, but older data will
 98 | be harder to query, since its keys were computed with the old timestep.
 99 | 
100 | CREDITS
101 | =======
102 | 
103 | Thanks to Derek Collison and Pieter Noordhuis for design feedback.
104 | The library was implemented by Salvatore Sanfilippo.
105 | 
106 | 


--------------------------------------------------------------------------------
/TODO:
--------------------------------------------------------------------------------
1 | - Tests.
2 | - Once this is stable, create a gem.
3 | - Remember the latest range, and avoid querying Redis if the new range is included in the old one.
4 | - FIXME: include a value whose time exactly matches the end of the range.
5 | 


--------------------------------------------------------------------------------
/example.rb:
--------------------------------------------------------------------------------
 1 | require 'rubygems'
 2 | require 'redis'
 3 | require 'redis-timeseries'
 4 | 
 5 | # To demonstrate the library we use a timestep of just one second.
 6 | # Then we sample every 0.1 seconds, producing on average 10 samples per key.
 7 | # This way we can show how multi-key range queries work.
 8 | ts = RedisTimeSeries.new("test",1,Redis.new)
 9 | 
10 | now = Time.now.to_f
11 | puts "Adding data points: "
12 | (0..30).each{|i|
13 |     print "#{i} "
14 |     STDOUT.flush
15 |     ts.add(i.to_s)
16 |     sleep 0.1
17 | }
18 | puts ""
19 | 
20 | # Query a one second range in the middle of our sampling.
21 | begin_time = now+1
22 | end_time = now+2
23 | puts "\nGet range from #{begin_time} to #{end_time}"
24 | 
25 | ts.fetch_range(begin_time,end_time).each{|record|
26 |     puts "Record time #{record[:time]}, data #{record[:data]}"
27 | }
28 | 
29 | # Show the API to fetch a single timestep.
30 | puts "\nGet the single timestep containing #{begin_time}"
31 | ts.fetch_timestep(begin_time).each{|record|
32 |     puts "Record time #{record[:time]}, data #{record[:data]}"
33 | }
34 | 


--------------------------------------------------------------------------------
/redis-timeseries.rb:
--------------------------------------------------------------------------------
  1 | # Copyright (C) 2010-2012 Salvatore Sanfilippo.
  2 | # Licensed under the BSD two clause license, see the COPYING file shipped
  3 | # with this source distribution for more information.
  4 | 
  5 | require 'base64'
  6 | 
  7 | class RedisTimeSeries
  8 |     def initialize(prefix,timestep,redis)
  9 |         @prefix = prefix
 10 |         @timestep = timestep
 11 |         @redis = redis
 12 |     end
 13 | 
 14 |     def normalize_time(t)
 15 |         t = t.to_i
 16 |         t - (t % @timestep)
 17 |     end
 18 | 
 19 |     def getkey(t)
 20 |         "ts:#{@prefix}:#{normalize_time t}"
 21 |     end
 22 | 
 23 |     def tsencode(data)
 24 |         if data.index("\x00") or data.index("\x01")
 25 |             "E#{Base64.encode64(data)}"
 26 |         else
 27 |             "R#{data}"
 28 |         end
 29 |     end
 30 | 
 31 |     def tsdecode(data)
 32 |         if data[0..0] == 'E'
 33 |             Base64.decode64(data[1..-1])
 34 |         else
 35 |             data[1..-1]
 36 |         end
 37 |     end
 38 | 
 39 |     def add(data,origin_time=nil)
 40 |         data = tsencode(data)
 41 |         origin_time = tsencode(origin_time) if origin_time
 42 |         now = Time.now.to_f
 43 |         value = "#{now}\x01#{data}"
 44 |         value << "\x01#{origin_time}" if origin_time
 45 |         value << "\x00"
 46 |         @redis.append(getkey(now.to_i),value)
 47 |     end
 48 | 
 49 |     def decode_record(r)
 50 |         res = {}
 51 |         s = r.split("\x01")
 52 |         res[:time] = s[0].to_f
 53 |         res[:data] = tsdecode(s[1])
 54 |         if s[2]
 55 |             res[:origin_time] = tsdecode(s[2])
 56 |         else
 57 |             res[:origin_time] = nil
 58 |         end
 59 |         return res
 60 |     end
 61 | 
 62 |     def seek(time)
 63 |         best_start = nil
 64 |         best_time = nil
 65 |         rangelen = 64
 66 |         key = getkey(time.to_i)
 67 |         len = @redis.strlen(key)
 68 |         return 0 if len == 0
 69 |         min = 0
 70 |         max = len-1
 71 |         while true
 72 |             p = min+((max-min)/2)
 73 |             # puts "Min: #{min} Max: #{max} P: #{p}"
 74 |             # Seek the first complete record starting from position 'p'.
 75 |             # We need to search for two consecutive \x00 chars, and enlarge
 76 |             # the range if needed as we don't know how big the record is.
 77 |             while true
 78 |                 range_end = p+rangelen-1
 79 |                 range_end = len if range_end > len
 80 |                 r = @redis.getrange(key,p,range_end)
 81 |                 # puts "GETRANGE #{p} #{range_end}"
 82 |                 if p == 0
 83 |                     sep = -1
 84 |                 else
 85 |                     sep = r.index("\x00")
 86 |                 end
 87 |                 sep2 = r.index("\x00",sep+1) if sep
 88 |                 if sep and sep2
 89 |                     record = r[((sep+1)...sep2)]
 90 |                     record_start = p+sep+1
 91 |                     record_end = p+sep2-1
 92 |                     dr = decode_record(record)
 93 | 
 94 |                     # Keep track of the best sample, that is the earliest
 95 |                     # sample whose time is greater than or equal to the
 96 |                     # time we are seeking.
 97 |                     if dr[:time] >= time and (!best_time or best_time>dr[:time])
 98 |                         best_start = record_start
 99 |                         best_time = dr[:time]
100 |                         # puts "NEW BEST: #{best_time}"
101 |                     end
102 |                     # puts "Max-Min #{max-min} RS #{record_start}"
103 |                     return (best_start || len) if max-min == 1 # len: no record >= time
104 |                     break
105 |                 end
106 |                 # Already at the end of the string but still no luck?
107 |                 return len+1 if range_end == len
108 |                 # We need to enlarge the range. Note that the enlarged value
109 |                 # is kept for the next probes of this seek: other records
110 |                 # are likely to be of the same size on average.
111 |                 rangelen *= 2
112 |             end
113 |             # puts dr.inspect
114 |             return record_start if dr[:time] == time
115 |             if dr[:time] > time
116 |                 max = p
117 |             else
118 |                 min = p
119 |             end
120 |         end
121 |     end
122 | 
123 |     def produce_result(res,key,range_begin,range_end)
124 |         r = @redis.getrange(key,range_begin,range_end)
125 |         if r
126 |             s = r.split("\x00")
127 |             s.each{|rec|
128 |                 record = decode_record(rec)
129 |                 res << record
130 |             }
131 |         end
132 |     end
133 | 
134 |     def fetch_range(begin_time,end_time)
135 |         res = []
136 |         begin_key = getkey(begin_time)
137 |         end_key = getkey(end_time)
138 |         begin_off = seek(begin_time)
139 |         end_off = seek(end_time)
140 |         if begin_key == end_key
141 |             # puts "#{begin_off} #{end_off} #{begin_key}"
142 |             produce_result(res,begin_key,begin_off,end_off-1) if end_off > begin_off # else empty
143 |         else
144 |             produce_result(res,begin_key,begin_off,-1)
145 |             t = normalize_time(begin_time)
146 |             while true
147 |                 t += @timestep
148 |                 key = getkey(t)
149 |                 break if key == end_key
150 |                 produce_result(res,key,0,-1)
151 |             end
152 |             produce_result(res,end_key,0,end_off-1) if end_off > 0 # -1 would read the whole key
153 |         end
154 |         res
155 |     end
156 | 
157 |     def fetch_timestep(time)
158 |         res = []
159 |         key = getkey(time)
160 |         produce_result(res,key,0,-1)
161 |         res
162 |     end
163 | end
164 | 


--------------------------------------------------------------------------------