├── .gitignore ├── .travis.yml ├── Gemfile ├── LICENSE ├── README.textile ├── Rakefile ├── aggregate.gemspec ├── lib ├── aggregate.rb └── aggregate │ └── version.rb └── test └── ts_aggregate.rb /.gitignore: -------------------------------------------------------------------------------- 1 | Gemfile.lock 2 | pkg/ 3 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | arch: 2 | - amd64 3 | - ppc64le 4 | language: ruby 5 | rvm: 6 | - 1.9.3 7 | - 2.3 8 | - 2.5 9 | - 2.6 10 | - 2.7 11 | matrix: 12 | exclude: 13 | - rvm: 1.9.3 14 | arch: ppc64le 15 | - rvm: 2.3 16 | arch: ppc64le 17 | - rvm: 2.5 18 | arch: ppc64le 19 | -------------------------------------------------------------------------------- /Gemfile: -------------------------------------------------------------------------------- 1 | source "https://rubygems.org" 2 | 3 | gem "rake" 4 | gem "minitest" -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2009 Joseph Ruscio 2 | 3 | Permission is hereby granted, free of charge, to any person 4 | obtaining a copy of this software and associated documentation 5 | files (the "Software"), to deal in the Software without 6 | restriction, including without limitation the rights to use, 7 | copy, modify, merge, publish, distribute, sublicense, and/or sell 8 | copies of the Software, and to permit persons to whom the 9 | Software is furnished to do so, subject to the following 10 | conditions: 11 | 12 | The above copyright notice and this permission notice shall be 13 | included in all copies or substantial portions of the Software. 
14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 16 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 17 | OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 18 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 19 | HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 20 | WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 21 | FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 22 | OTHER DEALINGS IN THE SOFTWARE. 23 | -------------------------------------------------------------------------------- /README.textile: -------------------------------------------------------------------------------- 1 | h1. Aggregate 2 | 3 | By Joseph Ruscio 4 | 5 | Aggregate is an intuitive Ruby implementation of a statistics aggregator 6 | including both default and configurable histogram support. It does this 7 | without recording/storing any of the actual sample values, making it 8 | suitable for tracking statistics across millions or billions of samples 9 | without the memory footprint growing with the sample count. Originally 10 | inspired by the Aggregate support in "SystemTap.":http://sourceware.org/systemtap 11 | 12 | h2. Getting Started 13 | 14 | Aggregates are easy to instantiate, populate with sample data, and then 15 | inspect for common aggregate statistics: 16 | 17 |

 18 | # Instantiate an aggregate, then use the << operator to add samples to it:
 19 | stats = Aggregate.new
 20 | 
 21 | loop do
 22 |   # Take some action that generates a sample measurement
 23 |   stats << sample
 24 | end
 25 | 
 26 | # The number of samples
 27 | stats.count
 28 | 
 29 | # The average
 30 | stats.mean
 31 | 
 32 | # Max sample value
 33 | stats.max
 34 | 
 35 | # Min sample value
 36 | stats.min
 37 | 
 38 | # The standard deviation
 39 | stats.std_dev
 40 | 
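As a rough illustration of how these statistics can be kept without storing any samples, the sketch below maintains only a count, a running sum, and a running sum of squares, mirroring the instance variables in lib/aggregate.rb. The class name @RunningStats@ is hypothetical and not part of the gem; treat this as a minimal sketch, not the gem's implementation.

```ruby
# Hypothetical illustration class; not part of the aggregate gem.
class RunningStats
  attr_reader :count, :min, :max

  def initialize
    @count = 0
    @sum = 0.0   # running sum of samples
    @sum2 = 0.0  # running sum of squared samples
  end

  # Fold one sample into the running totals; the sample itself is discarded.
  def <<(x)
    @min = x if @min.nil? || x < @min
    @max = x if @max.nil? || x > @max
    @count += 1
    @sum += x
    @sum2 += x * x
    self # returning self allows chaining (an assumption of this sketch)
  end

  def mean
    @sum / @count
  end

  # Sample standard deviation computed from the running sums only.
  def std_dev
    Math.sqrt((@sum2 - (@sum * @sum) / @count) / (@count - 1))
  end
end

stats = RunningStats.new
[2, 4, 4, 4, 5, 5, 7, 9].each { |x| stats << x }
puts stats.count
puts stats.mean
puts stats.std_dev
```

Memory use stays fixed however many samples are added, because each sample only updates a handful of numbers.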
41 | 42 | h2. Histograms 43 | 44 | Perhaps more important than the basic aggregate statistics detailed above, 45 | Aggregate also maintains a histogram of samples. For anything other than 46 | normally distributed data, summary statistics such as the mean are insufficient at best and often downright misleading. 47 | 37Signals recently posted a terse but effective "explanation":http://37signals.com/svn/posts/1836-the-problem-with-averages of the importance of histograms. 48 | Aggregate maintains its histogram internally as a set of "buckets". 49 | Each bucket represents a range of possible sample values. The set of all buckets 50 | represents the range of "normal" sample values. 51 | 52 | h3. Binary Histograms 53 | 54 | Without any configuration, an Aggregate instance maintains a binary histogram, where 55 | each bucket represents a range twice as large as the preceding bucket, i.e. 56 | [1,1], [2,3], [4,5,6,7], [8,9,10,11,12,13,14,15]. The default binary histogram 57 | provides for 128 buckets, theoretically covering the range [1, (2^127) - 1]. 58 | (See NOTES below for a discussion of the effects in practice of insufficient 59 | precision.) 60 | 61 | Binary histograms are useful when we have little idea about what the 62 | sample distribution may look like, as almost any positive value will 63 | fall into some bucket. After using binary histograms to determine 64 | the coarse-grained characteristics of your sample space, you can 65 | configure a linear histogram to examine it in closer detail. 66 | 67 | h3. Linear Histograms 68 | 69 | Linear histograms are specified with the three values low, high, and width. 70 | Low and high specify a range [low, high) of values included in the 71 | histogram (all others are outliers). Width specifies the number of 72 | values represented by each bucket and therefore the number of 73 | buckets, i.e. the granularity of the histogram. The histogram range 74 | (high - low) must be a multiple of width: 75 | 76 |

 77 | #Want to track aggregate stats on response times in ms
 78 | response_stats = Aggregate.new(0, 2000, 50)
 79 | 
80 | 81 | The example above creates a linear histogram that tracks the 82 | response times from 0 ms to 2000 ms in buckets of width 50 ms. Hopefully 83 | most of your samples fall in the first couple buckets! 84 | 85 | h3. Histogram Outliers 86 | 87 | An Aggregate records any samples that fall outside the histogram range as 88 | outliers: 89 | 90 |

 91 | # Number of samples that fall below the normal range
 92 | stats.outliers_low
 93 | 
 94 | # Number of samples that fall above the normal range
 95 | stats.outliers_high
 96 | 
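The linear bucket rule and the outlier rule can be sketched together as follows. The helper name @classify@ is hypothetical. Judging from lib/aggregate.rb, the gem locates the bucket by scanning the bucket list, whereas this sketch uses integer division instead, so read it as an equivalent illustration under that assumption and not as the gem's code. Note also that, per the @<<@ method in the source, outliers still contribute to count, min, max, mean, and standard deviation; they are only excluded from the buckets.

```ruby
# Hypothetical helper, not part of the gem: classify one sample against a
# linear histogram covering [low, high) with buckets of the given width.
# Returns :low or :high for outliers, otherwise the bucket's lower bound.
def classify(sample, low, high, width)
  unless high > low && ((high - low) % width).zero?
    raise ArgumentError, "(high - low) must be a positive multiple of width"
  end

  return :low if sample < low
  return :high if sample >= high # high itself is excluded from the range

  index = ((sample - low) / width).floor
  low + index * width
end

# Response times in ms against the (0, 2000, 50) histogram from the text.
[-5, 0, 49, 125, 1999, 2000].each do |ms|
  puts "#{ms} ms -> #{classify(ms, 0, 2000, 50).inspect}"
end
```

Because the range is half-open, a sample equal to low lands in the first bucket while a sample equal to high counts as a high outlier.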
97 | 98 | h3. Histogram Iterators 99 | 100 | Once a histogram is populated, Aggregate provides iterator support for 101 | examining the contents of buckets. The iterators yield the lowest value of 102 | each bucket's range together with the number of samples in that bucket: 103 | 104 |

105 | # Examine every bucket (bucket is the lowest value in the bucket's range)
106 | stats.each do |bucket, count|
107 | end
108 | 
109 | # Examine only buckets containing samples
110 | stats.each_nonzero do |bucket, count|
111 | end
112 | 
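To make the iterator contract concrete, here is a small self-contained sketch. @TinyHistogram@ is a hypothetical stand-in for a populated linear Aggregate (it is not part of the gem); it yields each bucket's lower bound and count in the same shape as @each@ and @each_nonzero@ appear to in lib/aggregate.rb.

```ruby
# Hypothetical stand-in for a populated linear Aggregate; not part of the gem.
class TinyHistogram
  def initialize(low, width, counts)
    @low = low
    @width = width
    @counts = counts # one sample count per bucket
  end

  # Yield every bucket's lower bound together with its sample count.
  def each
    @counts.each_with_index { |count, index| yield(@low + index * @width, count) }
  end

  # Yield only the buckets that contain at least one sample.
  def each_nonzero
    each { |bucket, count| yield(bucket, count) unless count.zero? }
  end
end

hist = TinyHistogram.new(0, 50, [3, 0, 7, 0])
hist.each_nonzero do |bucket, count|
  puts "[#{bucket}, #{bucket + 50}): #{count} samples"
end
```

Since only the lower bound is yielded, the caller derives the upper bound from the width it configured, as the @puts@ line does here.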
113 | 114 | h3. Histogram Bar Chart 115 | 116 | Finally Aggregate contains sophisticated pretty-printing support to generate 117 | ASCII bar charts. For any given number of columns >= 80 (defaults to 80) and 118 | sample distribution the to_s method properly sets a marker weight based on the 119 | samples per bucket and aligns all output. Empty buckets are skipped to conserve 120 | screen space. 121 | 122 |

123 | # Generate and display an 80 column histogram
124 | puts stats.to_s
125 | 
126 | # Generate and display a 120 column histogram
127 | puts stats.to_s(120)
128 | 
129 | 130 | This code example populates both a binary and linear histogram with the same 131 | set of 65536 values generated by rand to produce the 132 | two histograms that follow it: 133 | 134 |

135 | require 'rubygems'
136 | require 'aggregate'
137 | 
138 | # Create one binary and one linear Aggregate instance
139 | binary_aggregate = Aggregate.new
140 | linear_aggregate = Aggregate.new(0, 65536, 4096)
141 | 
142 | 65536.times do
143 |   x = rand(65536)
144 |   binary_aggregate << x
145 |   linear_aggregate << x
146 | end
147 | 
148 | puts binary_aggregate.to_s
149 | puts linear_aggregate.to_s
150 | 
151 | 152 | h4. Binary Histogram 153 | 154 |

155 | value |------------------------------------------------------------------| count
156 |     1 |                                                                  |     3
157 |     2 |                                                                  |     1
158 |     4 |                                                                  |     5
159 |     8 |                                                                  |     9
160 |    16 |                                                                  |    15
161 |    32 |                                                                  |    29
162 |    64 |                                                                  |    62
163 |   128 |                                                                  |   115
164 |   256 |                                                                  |   267
165 |   512 |@                                                                 |   523
166 |  1024 |@                                                                 |   970
167 |  2048 |@@@                                                               |  1987
168 |  4096 |@@@@@@@@                                                          |  4075
169 |  8192 |@@@@@@@@@@@@@@@@                                                  |  8108
170 | 16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                                  | 16405
171 | 32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 32961
172 |       ~
173 | Total |------------------------------------------------------------------| 65535
174 | 
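The bucket labels in the chart above are powers of two: each sample falls into the bucket labelled with the largest power of two not exceeding it. The sketch below reproduces that mapping for positive Integer samples. The helper names are hypothetical, and it uses @Integer#bit_length@ in place of the gem's @log(x)/log(2)@ approximation, which is a swapped-in technique chosen here because it stays exact for integers. The sample data is the @@@DATA@ array from test/ts_aggregate.rb.

```ruby
# Hypothetical helper: lower bound of the binary bucket holding a positive
# Integer sample, i.e. the largest power of two that is <= sample.
def binary_bucket(sample)
  return nil if sample < 1 # values below 1 are low outliers

  2**(sample.bit_length - 1)
end

# Count samples per bucket, returned as sorted [bucket, count] pairs.
# Outliers (nil buckets) are skipped.
def binary_histogram(samples)
  counts = Hash.new(0)
  samples.each do |x|
    bucket = binary_bucket(x)
    counts[bucket] += 1 if bucket
  end
  counts.sort
end

samples = [1, 5, 4, 6, 1028, 1972, 16384, 16385, 16383]
binary_histogram(samples).each { |bucket, count| puts "#{bucket}: #{count}" }
```

For this data the sketch yields the buckets 1, 4, 1024, 8192, and 16384 with counts 1, 3, 2, 1, and 2, the same values that @test_bucket_contents@ expects from the gem.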
175 | 176 | h4. Linear (0, 65536, 4096) Histogram 177 | 178 |

179 | value |------------------------------------------------------------------| count
180 |     0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4094
181 |  4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|  4202
182 |  8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4118
183 | 12288 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   |  4059
184 | 16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    |  3999
185 | 20480 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4083
186 | 24576 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4134
187 | 28672 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |  4143
188 | 32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |  4152
189 | 36864 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   |  4033
190 | 40960 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   |  4064
191 | 45056 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   |  4012
192 | 49152 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   |  4070
193 | 53248 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4090
194 | 57344 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4135
195 | 61440 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |  4144
196 | Total |------------------------------------------------------------------| 65532
197 | 
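The bar lengths in both charts follow from a per-marker weight. Reading @to_s@ in lib/aggregate.rb, the bar area is whatever remains of the requested columns after the value and count columns and their separators, so with five-character value and count columns an 80-column chart leaves 80 - (5 + 2 + 2 + 5) = 66 characters for bars. Each '@' then stands for @max_count / max_bar_width@ samples, but never fewer than one. The function below is a hypothetical sketch of that scaling and assumes a non-empty list of counts.

```ruby
# Hypothetical sketch of the bar scaling: given per-bucket counts and the
# available bar width, return the number of '@' markers for each bucket.
def bar_sizes(counts, max_bar_width)
  # Each '@' represents at least one sample; larger counts are scaled down
  # so that the fullest bucket fills (roughly) the whole bar area.
  weight = [counts.max.to_f / max_bar_width, 1.0].max
  counts.map { |count| (count / weight).to_i }
end

# Small counts fit unscaled: one '@' per sample.
p bar_sizes([3, 10], 66)

# Three counts taken from the linear chart above.
p bar_sizes([4094, 4202, 3999], 66)
```

Truncating with @to_i@ is why buckets holding fewer samples than one marker's weight, such as the first rows of the binary chart, show an empty bar even though their count is nonzero.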
198 | We can see from these histograms that Ruby's rand function does a relatively good 199 | job of distributing returned values in the requested range. 200 | 201 | h2. Examples 202 | 203 | Here's an example of a "handy timing benchmark":http://gist.github.com/187669 204 | implemented with aggregate. 205 | 206 | h2. NOTES 207 | 208 | Older Rubies (1.8) have no log2 function built into Math, so we approximate with 209 | log(x)/log(2). In exact arithmetic the integer part of log(2^n - 1)/log(2) is n-1. Unfortunately, due 210 | to floating-point precision limitations, once n reaches a certain size (somewhere > 32) 211 | the computed value becomes n instead. The larger the value of n, the more numbers just below 2^n, i.e. 212 | (2^n - 2), (2^n - 3), etc., fall prey to this error. Something like BigDecimal 213 | could avoid it, but for the current purposes of the binary 214 | histogram, i.e. a simple coarse-grained view, the current implementation is 215 | sufficient. 216 | -------------------------------------------------------------------------------- /Rakefile: -------------------------------------------------------------------------------- 1 | require 'rake' 2 | 3 | require 'bundler/gem_tasks' 4 | 5 | require 'rake/testtask' 6 | 7 | Rake::TestTask.new do |t| 8 | t.libs << "test" 9 | t.test_files = FileList['test/ts_*.rb'] 10 | t.verbose = true 11 | end 12 | 13 | task :default => :test 14 | -------------------------------------------------------------------------------- /aggregate.gemspec: -------------------------------------------------------------------------------- 1 | # coding: utf-8 2 | lib = File.expand_path('../lib', __FILE__) 3 | $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib) 4 | require 'aggregate/version' 5 | 6 | Gem::Specification.new do |s| 7 | s.name = %q{aggregate} 8 | s.version = Aggregate::VERSION 9 | 10 | s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? 
:required_rubygems_version= 11 | s.authors = ["Joseph Ruscio"] 12 | s.description = %q{Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate} 13 | s.email = %q{joe@ruscio.org} 14 | s.extra_rdoc_files = [ 15 | "LICENSE", 16 | "README.textile" 17 | ] 18 | s.files = Dir["{lib}/**/*.*", "LICENSE", "README.textile"] 19 | s.homepage = %q{http://github.com/josephruscio/aggregate} 20 | s.rdoc_options = ["--charset=UTF-8"] 21 | s.require_paths = ["lib"] 22 | s.license = "MIT" 23 | s.summary = %q{Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support} 24 | s.test_files = [ 25 | "test/ts_aggregate.rb" 26 | ] 27 | 28 | if s.respond_to? :specification_version then 29 | s.specification_version = 3 30 | 31 | if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then 32 | else 33 | end 34 | else 35 | end 36 | end 37 | -------------------------------------------------------------------------------- /lib/aggregate.rb: -------------------------------------------------------------------------------- 1 | # Implements aggregate statistics and maintains 2 | # configurable histogram for a set of given samples. Convenient for tracking 3 | # high throughput data. 
4 | class Aggregate 5 | #The current number of samples 6 | attr_reader :count 7 | 8 | #The maximum sample value 9 | attr_reader :max 10 | 11 | #The minimum samples value 12 | attr_reader :min 13 | 14 | #The sum of all samples 15 | attr_reader :sum 16 | 17 | #The number of samples falling below the lowest valued histogram bucket 18 | attr_reader :outliers_low 19 | 20 | #The number of samples falling above the highest valued histogram bucket 21 | attr_reader :outliers_high 22 | 23 | # The number of buckets in the binary logarithmic histogram (low => 2**0, high => 2**@@LOG_BUCKETS) 24 | @@LOG_BUCKETS = 128 25 | 26 | # Create a new Aggregate that maintains a binary logarithmic histogram 27 | # by default. Specifying values for low, high, and width configures 28 | # the aggregate to maintain a linear histogram with (high - low)/width buckets 29 | def initialize(low=nil, high=nil, width=nil) 30 | @count = 0 31 | @sum = 0.0 32 | @sum2 = 0.0 33 | @outliers_low = 0 34 | @outliers_high = 0 35 | 36 | # If the user asks we maintain a linear histogram where 37 | # values in the range [low, high) are bucketed in multiples 38 | # of width 39 | if (nil != low && nil != high && nil != width) 40 | 41 | #Validate linear specification 42 | if high <= low 43 | raise ArgumentError, "High bucket must be > Low bucket" 44 | end 45 | 46 | if high - low < width 47 | raise ArgumentError, "Histogram width must be <= histogram range" 48 | end 49 | 50 | if 0 != (high - low).modulo(width) 51 | raise ArgumentError, "Histogram range (high - low) must be a multiple of width" 52 | end 53 | 54 | @low = low 55 | @high = high 56 | @width = width 57 | else 58 | @low = 1 59 | @width = nil 60 | @high = to_bucket(@@LOG_BUCKETS - 1) 61 | end 62 | 63 | #Initialize all buckets to 0 64 | @buckets = Array.new(bucket_count, 0) 65 | end 66 | 67 | # Include a sample in the aggregate 68 | def << data 69 | 70 | # Update min/max 71 | if 0 == @count 72 | @min = data 73 | @max = data 74 | else 75 | @max = data if data > 
@max 76 | @min = data if data < @min 77 | end 78 | 79 | # Update the running info 80 | @count += 1 81 | @sum += data 82 | @sum2 += (data * data) 83 | 84 | # Update the bucket 85 | @buckets[to_index(data)] += 1 unless outlier?(data) 86 | end 87 | 88 | #The current average of all samples 89 | def mean 90 | @sum / @count 91 | end 92 | 93 | #Calculate the standard deviation 94 | def std_dev 95 | Math.sqrt((@sum2.to_f - ((@sum.to_f * @sum.to_f)/@count.to_f)) / (@count.to_f - 1)) 96 | end 97 | 98 | # Combine two aggregates 99 | #def +(b) 100 | # a = self 101 | # c = Aggregate.new 102 | 103 | # c.count = a.count + b.count 104 | #end 105 | 106 | #Generate a pretty-printed ASCII representation of the histogram 107 | def to_s(columns=nil) 108 | 109 | #default to an 80 column terminal, don't support < 80 for now 110 | if nil == columns 111 | columns = 80 112 | else 113 | raise ArgumentError if columns < 80 114 | end 115 | 116 | #Find the largest bucket and create an array of the rows we intend to print 117 | disp_buckets = Array.new 118 | max_count = 0 119 | total = 0 120 | @buckets.each_with_index do |count, idx| 121 | next if 0 == count 122 | max_count = [max_count, count].max 123 | disp_buckets << [idx, to_bucket(idx), count] 124 | total += count 125 | end 126 | 127 | #XXX: Better to print just header --> footer 128 | return "Empty histogram" if 0 == disp_buckets.length 129 | 130 | #Figure out how wide the value and count columns need to be based on their 131 | #largest respective numbers 132 | value_str = "value" 133 | count_str = "count" 134 | total_str = "Total" 135 | value_width = [disp_buckets.last[1].to_s.length, value_str.length].max 136 | value_width = [value_width, total_str.length].max 137 | count_width = [total.to_s.length, count_str.length].max 138 | max_bar_width = columns - (value_width + " |".length + "| ".length + count_width) 139 | 140 | #Determine the value of a '@' 141 | weight = [max_count.to_f/max_bar_width.to_f, 1.0].max 142 | 143 | #format the header 
144 | histogram = sprintf("%#{value_width}s |", value_str) 145 | max_bar_width.times { histogram << "-"} 146 | histogram << sprintf("| %#{count_width}s\n", count_str) 147 | 148 | # We denote empty buckets with a '~' 149 | def skip_row(value_width) 150 | sprintf("%#{value_width}s ~\n", " ") 151 | end 152 | 153 | #Loop through each bucket to be displayed and output the correct number 154 | prev_index = disp_buckets[0][0] - 1 155 | 156 | disp_buckets.each do |x| 157 | #Denote skipped empty buckets with a ~ 158 | histogram << skip_row(value_width) unless prev_index == x[0] - 1 159 | prev_index = x[0] 160 | 161 | #Add the value 162 | row = sprintf("%#{value_width}d |", x[1]) 163 | 164 | #Add the bar 165 | bar_size = (x[2]/weight).to_i 166 | bar_size.times { row += "@"} 167 | (max_bar_width - bar_size).times { row += " " } 168 | 169 | #Add the count 170 | row << sprintf("| %#{count_width}d\n", x[2]) 171 | 172 | #Append the finished row onto the histogram 173 | histogram << row 174 | end 175 | 176 | #End the table 177 | histogram << skip_row(value_width) if disp_buckets.last[0] != bucket_count-1 178 | histogram << sprintf("%#{value_width}s", "Total") 179 | histogram << " |" 180 | max_bar_width.times {histogram << "-"} 181 | histogram << "| " 182 | histogram << sprintf("%#{count_width}d\n", total) 183 | end 184 | 185 | #Iterate through each bucket in the histogram regardless of 186 | #its contents 187 | def each 188 | @buckets.each_with_index do |count, index| 189 | yield(to_bucket(index), count) 190 | end 191 | end 192 | 193 | #Iterate through only the buckets in the histogram that contain 194 | #samples 195 | def each_nonzero 196 | @buckets.each_with_index do |count, index| 197 | yield(to_bucket(index), count) if count != 0 198 | end 199 | end 200 | 201 | private 202 | 203 | def linear? 
204 | nil != @width 205 | end 206 | 207 | def outlier?(data) 208 | # Counts the sample as an outlier (returning a truthy count) or returns false 209 | if data < @low 210 | @outliers_low += 1 211 | elsif data >= @high 212 | @outliers_high += 1 213 | else 214 | return false 215 | end 216 | end 217 | 218 | def bucket_count 219 | if linear? 220 | return (@high-@low)/@width 221 | else 222 | return @@LOG_BUCKETS 223 | end 224 | end 225 | 226 | def to_bucket(index) 227 | if linear? 228 | return @low + (index * @width) 229 | else 230 | return 2**(index) 231 | end 232 | end 233 | 234 | def right_bucket?(index, data) 235 | 236 | # check invariant 237 | raise unless linear? 238 | 239 | bucket = to_bucket(index) 240 | 241 | #It's the right bucket if data falls between bucket and next bucket 242 | bucket <= data && data < bucket + @width 243 | end 244 | 245 | =begin 246 | def find_bucket(lower, upper, target) 247 | #Classic binary search 248 | return upper if right_bucket?(upper, target) 249 | 250 | # Cut the search range in half 251 | middle = (upper/2).to_i 252 | 253 | # Determine which half contains our value and recurse 254 | if (to_bucket(middle) >= target) 255 | return find_bucket(lower, middle, target) 256 | else 257 | return find_bucket(middle, upper, target) 258 | end 259 | end 260 | =end 261 | 262 | # A data point is added to bucket[n] where the data point 263 | # is greater than or equal to the value represented by bucket[n], but less 264 | # than the value represented by bucket[n+1] 265 | def to_index(data) 266 | 267 | # basic case is simple 268 | return log2(data).to_i if !linear? 
269 | 270 | # Search for the right bucket in the linear case 271 | @buckets.each_with_index do |count, idx| 272 | return idx if right_bucket?(idx, data) 273 | end 274 | #find_bucket(0, bucket_count-1, data) 275 | 276 | #Should not get here 277 | raise "#{data}" 278 | end 279 | 280 | # log2(x) approximates the base-2 logarithm of x; for 2**i <= x < 2**(i+1) its integer part is i 281 | @@LOG2_DIVISOR = Math.log(2) 282 | def log2( x ) 283 | Math.log(x) / @@LOG2_DIVISOR 284 | end 285 | 286 | end 287 | 288 | require_relative 'aggregate/version' -------------------------------------------------------------------------------- /lib/aggregate/version.rb: -------------------------------------------------------------------------------- 1 | class Aggregate 2 | VERSION = "0.2.4" 3 | end -------------------------------------------------------------------------------- /test/ts_aggregate.rb: -------------------------------------------------------------------------------- 1 | require 'minitest/autorun' 2 | require 'aggregate' 3 | 4 | class SimpleStatsTest < Minitest::Test 5 | 6 | def setup 7 | @stats = Aggregate.new 8 | 9 | @@DATA.each do |x| 10 | @stats << x 11 | end 12 | end 13 | 14 | def test_stats_count 15 | assert_equal @@DATA.length, @stats.count 16 | end 17 | 18 | def test_stats_min_max 19 | sorted_data = @@DATA.sort 20 | 21 | assert_equal sorted_data[0], @stats.min 22 | assert_equal sorted_data.last, @stats.max 23 | end 24 | 25 | def test_stats_mean 26 | sum = 0 27 | @@DATA.each do |x| 28 | sum += x 29 | end 30 | 31 | assert_equal sum.to_f/@@DATA.length.to_f, @stats.mean 32 | end 33 | 34 | def test_bucket_counts 35 | 36 | #Test each iterator 37 | total_bucket_sum = 0 38 | i = 0 39 | @stats.each do |bucket, count| 40 | assert_equal 2**i, bucket 41 | 42 | total_bucket_sum += count 43 | i += 1 44 | end 45 | 46 | assert_equal total_bucket_sum, @@DATA.length 47 | 48 | #Test each_nonzero iterator 49 | prev_bucket = 0 50 | total_bucket_sum = 0 51 | @stats.each_nonzero do |bucket, count| 52 | assert bucket > prev_bucket 
53 | refute_equal 0, count 54 | prev_bucket = bucket 55 | total_bucket_sum += count 56 | end 57 | 58 | assert_equal total_bucket_sum, @@DATA.length 59 | end 60 | 61 | =begin 62 | def test_addition 63 | stats1 = Aggregate.new 64 | stats2 = Aggregate.new 65 | 66 | stats1 << 1 67 | stats2 << 3 68 | 69 | stats_sum = stats1 + stats2 70 | 71 | assert_equal stats_sum.count, stats1.count + stats2.count 72 | end 73 | =end 74 | 75 | #XXX: Update test_bucket_contents() if you muck with @@DATA 76 | @@DATA = [ 1, 5, 4, 6, 1028, 1972, 16384, 16385, 16383] 77 | def test_bucket_contents 78 | #XXX: This is the only test so far that cares about the actual contents 79 | # of @@DATA, so if you update that array ... update this method too 80 | expected_buckets = [1, 4, 1024, 8192, 16384] 81 | expected_counts = [1, 3, 2, 1, 2] 82 | 83 | i = 0 84 | @stats.each_nonzero do |bucket, count| 85 | assert_equal expected_buckets[i], bucket 86 | assert_equal expected_counts[i], count 87 | # Increment for the next test 88 | i += 1 89 | end 90 | end 91 | 92 | def test_histogram 93 | puts @stats.to_s 94 | end 95 | 96 | def test_outlier 97 | assert_equal 0, @stats.outliers_low 98 | assert_equal 0, @stats.outliers_high 99 | 100 | @stats << -1 101 | @stats << -2 102 | @stats << 0 103 | 104 | @stats << 2**128 105 | 106 | # This should be the last value in the last bucket, but Ruby's native 107 | # floats are not precise enough. Somewhere past 2^32 the log(x)/log(2) 108 | # breaks down. 
So it shows up as 128 (outlier) instead of 127 109 | #@stats << (2**128) - 1 110 | 111 | assert_equal 3, @stats.outliers_low 112 | assert_equal 1, @stats.outliers_high 113 | end 114 | 115 | def test_std_dev 116 | @stats.std_dev 117 | end 118 | end 119 | 120 | class LinearHistogramTest < Minitest::Test 121 | def setup 122 | @stats = Aggregate.new(0, 32768, 1024) 123 | 124 | @@DATA.each do |x| 125 | @stats << x 126 | end 127 | end 128 | 129 | def test_validation 130 | 131 | # Range cannot be 0 132 | assert_raises(ArgumentError) { Aggregate.new(32,32,4) } 133 | 134 | # Range cannot be negative 135 | assert_raises(ArgumentError) { Aggregate.new(32,16,4) } 136 | 137 | # Range cannot be < single bucket 138 | assert_raises(ArgumentError) { Aggregate.new(16,32,17) } 139 | 140 | # Range % width must equal 0 (for now) 141 | assert_raises(ArgumentError) { Aggregate.new(1,16384,1024) } 142 | end 143 | 144 | #XXX: Update test_bucket_contents() if you muck with @@DATA 145 | # 32768 is an outlier 146 | @@DATA = [ 0, 1, 5, 4, 6, 1028, 1972, 16384, 16385, 16383, 32768] 147 | def test_bucket_contents 148 | #XXX: This is the only test so far that cares about the actual contents 149 | # of @@DATA, so if you update that array ... update this method too 150 | expected_buckets = [0, 1024, 15360, 16384] 151 | expected_counts = [5, 2, 1, 2] 152 | 153 | i = 0 154 | @stats.each_nonzero do |bucket, count| 155 | assert_equal expected_buckets[i], bucket 156 | assert_equal expected_counts[i], count 157 | # Increment for the next test 158 | i += 1 159 | end 160 | end 161 | 162 | end 163 | --------------------------------------------------------------------------------