├── .gitignore
├── .travis.yml
├── Gemfile
├── LICENSE
├── README.textile
├── Rakefile
├── aggregate.gemspec
├── lib
    ├── aggregate.rb
    └── aggregate
    │   └── version.rb
└── test
    └── ts_aggregate.rb

/.gitignore:
--------------------------------------------------------------------------------
1 | Gemfile.lock
2 | pkg/
3 |
--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
1 | arch:
2 |   - amd64
3 |   - ppc64le
4 | language: ruby
5 | rvm:
6 |   - 1.9.3
7 |   - 2.3
8 |   - 2.5
9 |   - 2.6
10 |   - 2.7
11 | matrix:
12 |   exclude:
13 |     - rvm: 1.9.3
14 |       arch: ppc64le
15 |     - rvm: 2.3
16 |       arch: ppc64le
17 |     - rvm: 2.5
18 |       arch: ppc64le
19 |
--------------------------------------------------------------------------------
/Gemfile:
--------------------------------------------------------------------------------
1 | source "https://rubygems.org"
2 |
3 | gem "rake"
4 | gem "minitest"
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2009 Joseph Ruscio
2 |
3 | Permission is hereby granted, free of charge, to any person
4 | obtaining a copy of this software and associated documentation
5 | files (the "Software"), to deal in the Software without
6 | restriction, including without limitation the rights to use,
7 | copy, modify, merge, publish, distribute, sublicense, and/or sell
8 | copies of the Software, and to permit persons to whom the
9 | Software is furnished to do so, subject to the following
10 | conditions:
11 |
12 | The above copyright notice and this permission notice shall be
13 | included in all copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
16 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
17 | OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
18 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
19 | HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
20 | WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
21 | FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
22 | OTHER DEALINGS IN THE SOFTWARE.
23 |
--------------------------------------------------------------------------------
/README.textile:
--------------------------------------------------------------------------------
1 | h1. Aggregate
2 |
3 | By Joseph Ruscio
4 |
5 | Aggregate is an intuitive Ruby implementation of a statistics aggregator
6 | including both default and configurable histogram support. It does this
7 | without recording/storing any of the actual sample values, making it
8 | suitable for tracking statistics across millions/billions of samples
9 | without the memory footprint growing with the number of samples. Originally
10 | inspired by the Aggregate support in "SystemTap.":http://sourceware.org/systemtap
11 |
12 | h2. Getting Started
13 |
14 | Aggregates are easy to instantiate, populate with sample data, and then
15 | inspect for common aggregate statistics:
16 |
17 |
18 | #After instantiation use the << operator to add a sample to the aggregate:
19 | stats = Aggregate.new
20 |
21 | loop do
22 | # Take some action that generates a sample measurement
23 | stats << sample
24 | end
25 |
26 | # The number of samples
27 | stats.count
28 |
29 | # The average
30 | stats.mean
31 |
32 | # Max sample value
33 | stats.max
34 |
35 | # Min sample value
36 | stats.min
37 |
38 | # The standard deviation
39 | stats.std_dev
40 |
41 |
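The statistics above can be kept up to date without storing any samples by tracking only a count, a running sum, and a running sum of squares. The following is a minimal sketch of that idea; @RunningStats@ is a hypothetical stand-in written for illustration, not the gem's class, although it follows the same formulas as @lib/aggregate.rb@.

```ruby
# Hypothetical stand-in for illustration; not the gem's own class.
class RunningStats
  attr_reader :count

  def initialize
    @count = 0
    @sum   = 0.0 # running sum of samples
    @sum2  = 0.0 # running sum of squared samples
  end

  # Fold one sample into the running totals; the sample itself is discarded.
  def <<(sample)
    @count += 1
    @sum   += sample
    @sum2  += sample * sample
    self
  end

  def mean
    @sum / @count
  end

  # Sample standard deviation (divides by count - 1).
  def std_dev
    Math.sqrt((@sum2 - (@sum * @sum) / @count) / (@count - 1))
  end
end

stats = RunningStats.new
[2, 4, 4, 4, 5, 5, 7, 9].each { |x| stats << x }
puts stats.count
puts stats.mean
puts stats.std_dev
```

Because only three numbers are updated per sample, memory use stays fixed no matter how many samples arrive. Note that the standard deviation is undefined for fewer than two samples with this formula.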
42 | h2. Histograms
43 |
44 | Perhaps more important than the basic aggregate statistics detailed above,
45 | Aggregate also maintains a histogram of samples. For anything other than
46 | normally distributed data, averages are insufficient at best and often downright misleading.
47 | 37Signals recently posted a terse but effective "explanation":http://37signals.com/svn/posts/1836-the-problem-with-averages of the importance of histograms.
48 | Aggregate maintains its histogram internally as a set of "buckets".
49 | Each bucket represents a range of possible sample values. The set of all buckets
50 | represents the range of "normal" sample values.
51 |
52 | h3. Binary Histograms
53 |
54 | Without any configuration, an Aggregate instance maintains a binary histogram, where
55 | each bucket represents a range twice as large as the preceding bucket, i.e.
56 | [1,1], [2,3], [4,5,6,7], [8,9,10,11,12,13,14,15]. The default binary histogram
57 | provides for 128 buckets, theoretically covering the range [1, (2^127) - 1]
58 | (See NOTES below for a discussion on the effects in practice of insufficient
59 | precision.)
60 |
61 | Binary histograms are useful when we have little idea about what the
62 | sample distribution may look like as almost any positive value will
63 | fall into some bucket. After using binary histograms to determine
64 | the coarse-grained characteristics of your sample space you can
65 | configure a linear histogram to examine it in closer detail.
66 |
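The doubling bucket ranges described above can be sketched in a few lines. @binary_bucket_range@ is a hypothetical helper introduced here for illustration only; the gem computes bucket indexes differently (via a logarithm), so treat this as a picture of the layout rather than the implementation.

```ruby
# Hypothetical helper: the inclusive sample range of binary bucket +index+.
def binary_bucket_range(index)
  (2**index)..(2**(index + 1) - 1)
end

# First four buckets, matching the ranges listed in the text.
first_four = (0..3).map { |i| binary_bucket_range(i) }
first_four.each { |r| puts "[#{r.first}, #{r.last}]" }

# Locate the bucket holding a sample by scanning the ranges in order.
sample = 1972
index = (0...128).find { |i| binary_bucket_range(i).cover?(sample) }
puts "#{sample} falls in bucket #{index}, which starts at #{binary_bucket_range(index).first}"
```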
67 | h3. Linear Histograms
68 |
69 | Linear histograms are specified with the three values low, high, and width.
70 | Low and high specify a range [low, high) of values included in the
71 | histogram (all others are outliers). Width specifies the number of
72 | values represented by each bucket and therefore the number of
73 | buckets i.e. granularity of the histogram. The histogram range
74 | (high - low) must be a multiple of width:
75 |
76 |
77 | #Want to track aggregate stats on response times in ms
78 | response_stats = Aggregate.new(0, 2000, 50)
79 |
80 |
81 | The example above creates a linear histogram that tracks the
82 | response times from 0 ms to 2000 ms in buckets of width 50 ms. Hopefully
83 | most of your samples fall in the first couple of buckets!
84 |
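To make the low/high/width arithmetic concrete, here is a small sketch of how a sample could be mapped to a linear bucket. Both helpers are hypothetical and not part of the gem. The gem itself scans the buckets in order to find the right one; the sketch swaps in a direct division, which should give the same bucket for in-range samples.

```ruby
# Hypothetical helper, not part of the gem: map a sample to its linear bucket.
# Returns :low or :high for outliers, otherwise the zero-based bucket index.
def linear_bucket_index(sample, low, high, width)
  return :low  if sample < low
  return :high if sample >= high
  ((sample - low) / width).floor
end

# Hypothetical helper: lower bound of the bucket at +index+.
def linear_bucket_start(index, low, width)
  low + index * width
end

low, high, width = 0, 2000, 50
puts "bucket count: #{(high - low) / width}"

[0, 49, 50, 1999, 2000, -1].each do |ms|
  idx = linear_bucket_index(ms, low, high, width)
  if idx.is_a?(Symbol)
    puts "#{ms} ms is a #{idx} outlier"
  else
    puts "#{ms} ms -> bucket #{idx} starting at #{linear_bucket_start(idx, low, width)} ms"
  end
end
```

Since the range is half-open, a sample equal to @high@ (2000 here) counts as a high outlier rather than landing in the last bucket.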
85 | h3. Histogram Outliers
86 |
87 | An Aggregate records any samples that fall outside the histogram range as
88 | outliers:
89 |
90 |
91 | # Number of samples that fall below the normal range
92 | stats.outliers_low
93 |
94 | # Number of samples that fall above the normal range
95 | stats.outliers_high
96 |
97 |
98 | h3. Histogram Iterators
99 |
100 | Once a histogram is populated Aggregate provides iterator support for
101 | examining the contents of buckets. The iterators provide both the
102 | number of samples in the bucket, as well as its range:
103 |
104 |
105 | #Examine every bucket
106 | @stats.each do |bucket, count|
107 | end
108 |
109 | #Examine only buckets containing samples
110 | @stats.each_nonzero do |bucket, count|
111 | end
112 |
113 |
114 | h3. Histogram Bar Chart
115 |
116 | Finally Aggregate contains sophisticated pretty-printing support to generate
117 | ASCII bar charts. For any given number of columns >= 80 (defaults to 80) and
118 | sample distribution, the @to_s@ method properly sets a marker weight based on the
119 | samples per bucket and aligns all output. Empty buckets are skipped to conserve
120 | screen space.
121 |
122 |
123 | # Generate and display an 80 column histogram
124 | puts stats.to_s
125 |
126 | # Generate and display a 120 column histogram
127 | puts stats.to_s(120)
128 |
129 |
130 | This code example populates both a binary and linear histogram with the same
131 | set of 65536 values generated by @rand@ to produce the
132 | two histograms that follow it:
133 |
134 |
135 | require 'rubygems'
136 | require 'aggregate'
137 |
138 | # Create an Aggregate instance
139 | binary_aggregate = Aggregate.new
140 | linear_aggregate = Aggregate.new(0, 65536, 4096)
141 |
142 | 65536.times do
143 | x = rand(65536)
144 | binary_aggregate << x
145 | linear_aggregate << x
146 | end
147 |
148 | puts binary_aggregate.to_s
149 | puts linear_aggregate.to_s
150 |
151 |
152 | h4. Binary Histogram
153 |
154 |
155 | value |------------------------------------------------------------------| count
156 | 1 | | 3
157 | 2 | | 1
158 | 4 | | 5
159 | 8 | | 9
160 | 16 | | 15
161 | 32 | | 29
162 | 64 | | 62
163 | 128 | | 115
164 | 256 | | 267
165 | 512 |@ | 523
166 | 1024 |@ | 970
167 | 2048 |@@@ | 1987
168 | 4096 |@@@@@@@@ | 4075
169 | 8192 |@@@@@@@@@@@@@@@@ | 8108
170 | 16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 16405
171 | 32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 32961
172 | ~
173 | Total |------------------------------------------------------------------| 65535
174 |
175 |
176 | h4. Linear (0, 65536, 4096) Histogram
177 |
178 |
179 | value |------------------------------------------------------------------| count
180 | 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4094
181 | 4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 4202
182 | 8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4118
183 | 12288 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4059
184 | 16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 3999
185 | 20480 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4083
186 | 24576 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4134
187 | 28672 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4143
188 | 32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4152
189 | 36864 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4033
190 | 40960 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4064
191 | 45056 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4012
192 | 49152 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4070
193 | 53248 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4090
194 | 57344 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4135
195 | 61440 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4144
196 | Total |------------------------------------------------------------------| 65532
197 |
198 | We can see from these histograms that Ruby's rand function does a relatively good
199 | job of distributing returned values in the requested range.
200 |
201 | h2. Examples
202 |
203 | Here's an example of a "handy timing benchmark":http://gist.github.com/187669
204 | implemented with aggregate.
205 |
206 | h2. NOTES
207 |
208 | Ruby 1.8 doesn't have a log2 function built into Math (Math.log2 arrived in Ruby 1.9), so we approximate with
209 | log(x)/log(2). Theoretically floor(log(2^n - 1) / log(2)) == n-1. Unfortunately, due
210 | to precision limitations, once n reaches a certain size (somewhere > 32)
211 | this starts to return n. The larger the value of n, the more numbers, i.e.
212 | (2^n - 2), (2^n - 3), etc., fall prey to this error. We could probably look into
213 | using something like BigDecimal, but for the current purposes of the binary
214 | histogram, i.e. a simple coarse-grained view, the current implementation is
215 | sufficient.
216 |
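The precision issue in the note above can be explored with the sketch below, which compares the floating-point approximation against an exact integer alternative. Both function names are hypothetical. The exact variant assumes positive Integer samples and relies on @Integer#bit_length@ (Ruby 2.1+), so it is not a drop-in replacement for a histogram that also accepts Float samples.

```ruby
# Float-based index, following the log(x)/log(2) approximation in the note.
def float_log2_index(x)
  (Math.log(x) / Math.log(2)).to_i
end

# Exact alternative for positive Integers: bit_length returns
# floor(log2(x)) + 1, with no floating-point rounding involved.
def exact_log2_index(x)
  x.bit_length - 1
end

# Compare both on the largest value that belongs in bucket n - 1.
[10, 31, 40, 53, 64, 100].each do |n|
  x = 2**n - 1
  puts "n=#{n}: float=#{float_log2_index(x)} exact=#{exact_log2_index(x)} expected=#{n - 1}"
end
```

Running this shows at which sizes of n the float column starts to disagree with the expected bucket on a given platform, while the exact column stays at n - 1.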
--------------------------------------------------------------------------------
/Rakefile:
--------------------------------------------------------------------------------
1 | require 'rake'
2 |
3 | require 'bundler/gem_tasks'
4 |
5 | require 'rake/testtask'
6 |
7 | Rake::TestTask.new do |t|
8 | t.libs << "test"
9 | t.test_files = FileList['test/ts_*.rb']
10 | t.verbose = true
11 | end
12 |
13 | task :default => :test
14 |
--------------------------------------------------------------------------------
/aggregate.gemspec:
--------------------------------------------------------------------------------
1 | # coding: utf-8
2 | lib = File.expand_path('../lib', __FILE__)
3 | $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4 | require 'aggregate/version'
5 |
6 | Gem::Specification.new do |s|
7 | s.name = %q{aggregate}
8 | s.version = Aggregate::VERSION
9 |
10 | s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
11 | s.authors = ["Joseph Ruscio"]
12 | s.description = %q{Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate}
13 | s.email = %q{joe@ruscio.org}
14 | s.extra_rdoc_files = [
15 | "LICENSE",
16 | "README.textile"
17 | ]
18 | s.files = Dir["{lib}/**/*.*", "LICENSE", "README.textile"]
19 | s.homepage = %q{http://github.com/josephruscio/aggregate}
20 | s.rdoc_options = ["--charset=UTF-8"]
21 | s.require_paths = ["lib"]
22 | s.license = "MIT"
23 | s.summary = %q{Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support}
24 | s.test_files = [
25 | "test/ts_aggregate.rb"
26 | ]
27 |
28 | if s.respond_to? :specification_version then
29 | s.specification_version = 3
30 |
31 | if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
32 | else
33 | end
34 | else
35 | end
36 | end
37 |
--------------------------------------------------------------------------------
/lib/aggregate.rb:
--------------------------------------------------------------------------------
1 | # Implements aggregate statistics and maintains
2 | # configurable histogram for a set of given samples. Convenient for tracking
3 | # high throughput data.
4 | class Aggregate
5 | #The current number of samples
6 | attr_reader :count
7 |
8 | #The maximum sample value
9 | attr_reader :max
10 |
11 | #The minimum sample value
12 | attr_reader :min
13 |
14 | #The sum of all samples
15 | attr_reader :sum
16 |
17 | #The number of samples falling below the lowest valued histogram bucket
18 | attr_reader :outliers_low
19 |
20 | #The number of samples falling above the highest valued histogram bucket
21 | attr_reader :outliers_high
22 |
23 | # The number of buckets in the binary logarithmic histogram (low => 2**0, high => 2**(@@LOG_BUCKETS - 1))
24 | @@LOG_BUCKETS = 128
25 |
26 | # Create a new Aggregate that maintains a binary logarithmic histogram
27 | # by default. Specifying values for low, high, and width configures
28 | # the aggregate to maintain a linear histogram with (high - low)/width buckets
29 | def initialize(low=nil, high=nil, width=nil)
30 | @count = 0
31 | @sum = 0.0
32 | @sum2 = 0.0
33 | @outliers_low = 0
34 | @outliers_high = 0
35 |
36 | # If the user asks we maintain a linear histogram where
37 | # values in the range [low, high) are bucketed in multiples
38 | # of width
39 | if (nil != low && nil != high && nil != width)
40 |
41 | #Validate linear specification
42 | if high <= low
43 | raise ArgumentError, "High bucket must be > Low bucket"
44 | end
45 |
46 | if high - low < width
47 | raise ArgumentError, "Histogram width must be <= histogram range"
48 | end
49 |
50 | if 0 != (high - low).modulo(width)
51 | raise ArgumentError, "Histogram range (high - low) must be a multiple of width"
52 | end
53 |
54 | @low = low
55 | @high = high
56 | @width = width
57 | else
58 | @low = 1
59 | @width = nil
60 | @high = to_bucket(@@LOG_BUCKETS - 1)
61 | end
62 |
63 | #Initialize all buckets to 0
64 | @buckets = Array.new(bucket_count, 0)
65 | end
66 |
67 | # Include a sample in the aggregate
68 | def << data
69 |
70 | # Update min/max
71 | if 0 == @count
72 | @min = data
73 | @max = data
74 | else
75 | @max = data if data > @max
76 | @min = data if data < @min
77 | end
78 |
79 | # Update the running info
80 | @count += 1
81 | @sum += data
82 | @sum2 += (data * data)
83 |
84 | # Update the bucket
85 | @buckets[to_index(data)] += 1 unless outlier?(data)
86 | end
87 |
88 | #The current average of all samples
89 | def mean
90 | @sum / @count
91 | end
92 |
93 | #Calculate the standard deviation
94 | def std_dev
95 | Math.sqrt((@sum2.to_f - ((@sum.to_f * @sum.to_f)/@count.to_f)) / (@count.to_f - 1))
96 | end
97 |
98 | # Combine two aggregates
99 | #def +(b)
100 | # a = self
101 | # c = Aggregate.new
102 |
103 | # c.count = a.count + b.count
104 | #end
105 |
106 | #Generate a pretty-printed ASCII representation of the histogram
107 | def to_s(columns=nil)
108 |
109 | #default to an 80 column terminal, don't support < 80 for now
110 | if nil == columns
111 | columns = 80
112 | else
113 | raise ArgumentError if columns < 80
114 | end
115 |
116 | #Find the largest bucket and create an array of the rows we intend to print
117 | disp_buckets = Array.new
118 | max_count = 0
119 | total = 0
120 | @buckets.each_with_index do |count, idx|
121 | next if 0 == count
122 | max_count = [max_count, count].max
123 | disp_buckets << [idx, to_bucket(idx), count]
124 | total += count
125 | end
126 |
127 | #XXX: Better to print just header --> footer
128 | return "Empty histogram" if 0 == disp_buckets.length
129 |
130 | #Figure out how wide the value and count columns need to be based on their
131 | #largest respective numbers
132 | value_str = "value"
133 | count_str = "count"
134 | total_str = "Total"
135 | value_width = [disp_buckets.last[1].to_s.length, value_str.length].max
136 | value_width = [value_width, total_str.length].max
137 | count_width = [total.to_s.length, count_str.length].max
138 | max_bar_width = columns - (value_width + " |".length + "| ".length + count_width)
139 |
140 | #Determine the value of a '@'
141 | weight = [max_count.to_f/max_bar_width.to_f, 1.0].max
142 |
143 | #format the header
144 | histogram = sprintf("%#{value_width}s |", value_str)
145 | max_bar_width.times { histogram << "-"}
146 | histogram << sprintf("| %#{count_width}s\n", count_str)
147 |
148 | # We denote empty buckets with a '~'
149 | def skip_row(value_width)
150 | sprintf("%#{value_width}s ~\n", " ")
151 | end
152 |
153 | #Loop through each bucket to be displayed and output the correct number
154 | prev_index = disp_buckets[0][0] - 1
155 |
156 | disp_buckets.each do |x|
157 | #Denote skipped empty buckets with a ~
158 | histogram << skip_row(value_width) unless prev_index == x[0] - 1
159 | prev_index = x[0]
160 |
161 | #Add the value
162 | row = sprintf("%#{value_width}d |", x[1])
163 |
164 | #Add the bar
165 | bar_size = (x[2]/weight).to_i
166 | bar_size.times { row += "@"}
167 | (max_bar_width - bar_size).times { row += " " }
168 |
169 | #Add the count
170 | row << sprintf("| %#{count_width}d\n", x[2])
171 |
172 | #Append the finished row onto the histogram
173 | histogram << row
174 | end
175 |
176 | #End the table
177 | histogram << skip_row(value_width) if disp_buckets.last[0] != bucket_count-1
178 | histogram << sprintf("%#{value_width}s", "Total")
179 | histogram << " |"
180 | max_bar_width.times {histogram << "-"}
181 | histogram << "| "
182 | histogram << sprintf("%#{count_width}d\n", total)
183 | end
184 |
185 | #Iterate through each bucket in the histogram regardless of
186 | #its contents
187 | def each
188 | @buckets.each_with_index do |count, index|
189 | yield(to_bucket(index), count)
190 | end
191 | end
192 |
193 | #Iterate through only the buckets in the histogram that contain
194 | #samples
195 | def each_nonzero
196 | @buckets.each_with_index do |count, index|
197 | yield(to_bucket(index), count) if count != 0
198 | end
199 | end
200 |
201 | private
202 |
203 | def linear?
204 | nil != @width
205 | end
206 |
207 | def outlier?(data)
208 |
209 | if data < @low
210 | @outliers_low += 1
211 | elsif data >= @high
212 | @outliers_high += 1
213 | else
214 | return false
215 | end
216 | end
217 |
218 | def bucket_count
219 | if linear?
220 | return (@high-@low)/@width
221 | else
222 | return @@LOG_BUCKETS
223 | end
224 | end
225 |
226 | def to_bucket(index)
227 | if linear?
228 | return @low + (index * @width)
229 | else
230 | return 2**(index)
231 | end
232 | end
233 |
234 | def right_bucket?(index, data)
235 |
236 | # check invariant
237 | raise unless linear?
238 |
239 | bucket = to_bucket(index)
240 |
241 | #It's the right bucket if data falls between bucket and next bucket
242 | bucket <= data && data < bucket + @width
243 | end
244 |
245 | =begin
246 | def find_bucket(lower, upper, target)
247 | #Classic binary search
248 | return upper if right_bucket?(upper, target)
249 |
250 | # Cut the search range in half
251 | middle = (upper/2).to_i
252 |
253 | # Determine which half contains our value and recurse
254 | if (to_bucket(middle) >= target)
255 | return find_bucket(lower, middle, target)
256 | else
257 | return find_bucket(middle, upper, target)
258 | end
259 | end
260 | =end
261 |
262 | # A data point is added to bucket[n] where the data point
263 | # is greater than or equal to the value represented by bucket[n], but less
264 | # than the value represented by bucket[n+1]
265 | def to_index(data)
266 |
267 | # basic case is simple
268 | return log2(data).to_i if !linear?
269 |
270 | # Search for the right bucket in the linear case
271 | @buckets.each_with_index do |count, idx|
272 | return idx if right_bucket?(idx, data)
273 | end
274 | #find_bucket(0, bucket_count-1, data)
275 |
276 | #Should not get here
277 | raise "#{data}"
278 | end
279 |
280 | # log2(x) returns a Float y such that, for i = y.to_i, 2**i <= x < 2**(i+1)
281 | @@LOG2_DIVISOR = Math.log(2)
282 | def log2( x )
283 | Math.log(x) / @@LOG2_DIVISOR
284 | end
285 |
286 | end
287 |
288 | require_relative 'aggregate/version'
--------------------------------------------------------------------------------
/lib/aggregate/version.rb:
--------------------------------------------------------------------------------
1 | class Aggregate
2 | VERSION = "0.2.4"
3 | end
--------------------------------------------------------------------------------
/test/ts_aggregate.rb:
--------------------------------------------------------------------------------
1 | require 'minitest/autorun'
2 | require 'aggregate'
3 |
4 | class SimpleStatsTest < Minitest::Test
5 |
6 | def setup
7 | @stats = Aggregate.new
8 |
9 | @@DATA.each do |x|
10 | @stats << x
11 | end
12 | end
13 |
14 | def test_stats_count
15 | assert_equal @@DATA.length, @stats.count
16 | end
17 |
18 | def test_stats_min_max
19 | sorted_data = @@DATA.sort
20 |
21 | assert_equal sorted_data[0], @stats.min
22 | assert_equal sorted_data.last, @stats.max
23 | end
24 |
25 | def test_stats_mean
26 | sum = 0
27 | @@DATA.each do |x|
28 | sum += x
29 | end
30 |
31 | assert_equal sum.to_f/@@DATA.length.to_f, @stats.mean
32 | end
33 |
34 | def test_bucket_counts
35 |
36 | #Test each iterator
37 | total_bucket_sum = 0
38 | i = 0
39 | @stats.each do |bucket, count|
40 | assert_equal 2**i, bucket
41 |
42 | total_bucket_sum += count
43 | i += 1
44 | end
45 |
46 | assert_equal total_bucket_sum, @@DATA.length
47 |
48 | #Test each_nonzero iterator
49 | prev_bucket = 0
50 | total_bucket_sum = 0
51 | @stats.each_nonzero do |bucket, count|
52 | assert bucket > prev_bucket
53 | refute_equal count, 0
54 | prev_bucket = bucket
55 | total_bucket_sum += count
56 | end
57 |
58 | assert_equal total_bucket_sum, @@DATA.length
59 | end
60 |
61 | =begin
62 | def test_addition
63 | stats1 = Aggregate.new
64 | stats2 = Aggregate.new
65 |
66 | stats1 << 1
67 | stats2 << 3
68 |
69 | stats_sum = stats1 + stats2
70 |
71 | assert_equal stats_sum.count, stats1.count + stats2.count
72 | end
73 | =end
74 |
75 | #XXX: Update test_bucket_contents() if you muck with @@DATA
76 | @@DATA = [ 1, 5, 4, 6, 1028, 1972, 16384, 16385, 16383]
77 | def test_bucket_contents
78 | #XXX: This is the only test so far that cares about the actual contents
79 | # of @@DATA, so if you update that array ... update this method too
80 | expected_buckets = [1, 4, 1024, 8192, 16384]
81 | expected_counts = [1, 3, 2, 1, 2]
82 |
83 | i = 0
84 | @stats.each_nonzero do |bucket, count|
85 | assert_equal expected_buckets[i], bucket
86 | assert_equal expected_counts[i], count
87 | # Increment for the next test
88 | i += 1
89 | end
90 | end
91 |
92 | def test_histogram
93 | puts @stats.to_s
94 | end
95 |
96 | def test_outlier
97 | assert_equal 0, @stats.outliers_low
98 | assert_equal 0, @stats.outliers_high
99 |
100 | @stats << -1
101 | @stats << -2
102 | @stats << 0
103 |
104 | @stats << 2**128
105 |
106 | # This should be the last value in the last bucket, but Ruby's native
107 | # floats are not precise enough. Somewhere past 2^32 the log(x)/log(2)
108 | # breaks down. So it shows up as 128 (outlier) instead of 127
109 | #@stats << (2**128) - 1
110 |
111 | assert_equal 3, @stats.outliers_low
112 | assert_equal 1, @stats.outliers_high
113 | end
114 |
115 | def test_std_dev
116 | @stats.std_dev
117 | end
118 | end
119 |
120 | class LinearHistogramTest < Minitest::Test
121 | def setup
122 | @stats = Aggregate.new(0, 32768, 1024)
123 |
124 | @@DATA.each do |x|
125 | @stats << x
126 | end
127 | end
128 |
129 | def test_validation
130 |
131 | # Range cannot be 0
132 | assert_raises(ArgumentError) { Aggregate.new(32,32,4) }
133 |
134 | # Range cannot be negative
135 | assert_raises(ArgumentError) { Aggregate.new(32,16,4) }
136 |
137 | # Range cannot be < single bucket
138 | assert_raises(ArgumentError) { Aggregate.new(16,32,17) }
139 |
140 | # Range % width must equal 0 (for now)
141 | assert_raises(ArgumentError) { Aggregate.new(1,16384,1024) }
142 | end
143 |
144 | #XXX: Update test_bucket_contents() if you muck with @@DATA
145 | # 32768 is an outlier
146 | @@DATA = [ 0, 1, 5, 4, 6, 1028, 1972, 16384, 16385, 16383, 32768]
147 | def test_bucket_contents
148 | #XXX: This is the only test so far that cares about the actual contents
149 | # of @@DATA, so if you update that array ... update this method too
150 | expected_buckets = [0, 1024, 15360, 16384]
151 | expected_counts = [5, 2, 1, 2]
152 |
153 | i = 0
154 | @stats.each_nonzero do |bucket, count|
155 | assert_equal expected_buckets[i], bucket
156 | assert_equal expected_counts[i], count
157 | # Increment for the next test
158 | i += 1
159 | end
160 | end
161 |
162 | end
163 |
--------------------------------------------------------------------------------