├── .github
│   └── workflows
│       └── build.yml
├── .gitignore
├── CHANGELOG.md
├── Gemfile
├── LICENSE.txt
├── README.md
├── Rakefile
├── active_hll.gemspec
├── gemfiles
│   ├── activerecord71.gemfile
│   └── activerecord72.gemfile
├── lib
│   ├── active_hll.rb
│   ├── active_hll
│   │   ├── hll.rb
│   │   ├── model.rb
│   │   ├── type.rb
│   │   ├── utils.rb
│   │   └── version.rb
│   └── generators
│       └── active_hll
│           ├── install_generator.rb
│           └── templates
│               └── migration.rb.tt
└── test
    ├── add_test.rb
    ├── agg_test.rb
    ├── count_test.rb
    ├── create_test.rb
    ├── hll_test.rb
    ├── misc_test.rb
    ├── test_helper.rb
    ├── union_test.rb
    └── upsert_test.rb


/.github/workflows/build.yml:
--------------------------------------------------------------------------------
name: build
on: [push, pull_request]
jobs:
  build:
    strategy:
      fail-fast: false
      matrix:
        include:
          - ruby: 3.4
            gemfile: Gemfile
          - ruby: 3.3
            gemfile: gemfiles/activerecord72.gemfile
          - ruby: 3.2
            gemfile: gemfiles/activerecord71.gemfile
    runs-on: ubuntu-latest
    env:
      BUNDLE_GEMFILE: ${{ matrix.gemfile }}
    steps:
      - uses: actions/checkout@v4
      - uses: ruby/setup-ruby@v1
        with:
          ruby-version: ${{ matrix.ruby }}
          bundler-cache: true
      - uses: ankane/setup-postgres@v1
        with:
          database: active_hll_test
          dev-files: true
      - run: |
          cd /tmp
          curl -L https://github.com/citusdata/postgresql-hll/archive/refs/tags/v2.18.tar.gz | tar xz
          cd postgresql-hll-2.18
          make
          sudo make install
      - run: bundle exec rake test

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
/.bundle/
/.yardoc
/_yardoc/
/coverage/
/doc/
/pkg/
/spec/reports/
/tmp/
*.lock

--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
## 0.3.0 (2025-04-03)

- Dropped support for Ruby < 3.2 and Active Record < 7.1

## 0.2.1 (2024-10-07)

- Fixed connection leasing for Active Record 7.2+

## 0.2.0 (2024-06-24)

- Dropped support for Ruby < 3.1 and Active Record < 6.1

## 0.1.1 (2023-01-29)

- Added experimental `hll_upsert` method

## 0.1.0 (2023-01-24)

- First release

--------------------------------------------------------------------------------
/Gemfile:
--------------------------------------------------------------------------------
source "https://rubygems.org"

gemspec

gem "rake"
gem "minitest", ">= 5"
gem "activerecord", "~> 8.0.0"
gem "pg"
gem "groupdate"

--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2023-2025 Andrew Kane

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Active HLL

:fire: HyperLogLog for Rails and Postgres

For fast, approximate count-distinct queries

[![Build Status](https://github.com/ankane/active_hll/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/active_hll/actions)

## Installation

First, install the [hll extension](https://github.com/citusdata/postgresql-hll) on your database server:

```sh
cd /tmp
curl -L https://github.com/citusdata/postgresql-hll/archive/refs/tags/v2.18.tar.gz | tar xz
cd postgresql-hll-2.18
make
make install # may need sudo
```

Then add this line to your application’s Gemfile:

```ruby
gem "active_hll"
```

And run:

```sh
bundle install
rails generate active_hll:install
rails db:migrate
```

## Getting Started

HLLs provide an approximate count of unique values (like unique visitors). By rolling up data by day, you can quickly get an approximate count over any date range.
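
As a rough illustration of why daily rollups are combined with a union rather than by adding up daily counts, here is a plain-Ruby sketch that uses an exact `Set` as a stand-in for each day's HLL. The dates and visitor IDs are made up, and a real HLL stores a compact approximate sketch rather than the raw IDs:

```ruby
require "set"

# Hypothetical data: an exact Set stands in for each day's HLL rollup
daily_visitors = {
  "2025-01-01" => Set["visitor1", "visitor2", "visitor3"],
  "2025-01-02" => Set["visitor3", "visitor4", "visitor5"]
}

# Adding up per-day unique counts double-counts visitor3, who appears on both days
summed = daily_visitors.values.sum(&:size)

# Unioning the per-day sets counts each visitor once over the whole range
unioned = daily_visitors.values.reduce(Set.new, :|).size

puts "sum of daily uniques: #{summed}"  # prints 6
puts "uniques over range:   #{unioned}" # prints 5
```

The union is what makes any date range answerable from daily rollups; `hll_count` performs the approximate equivalent in Postgres with `hll_cardinality(hll_union_agg(...))`.
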

Create a table with an `hll` column

```ruby
class CreateEventRollups < ActiveRecord::Migration[8.0]
  def change
    create_table :event_rollups do |t|
      t.date :time_bucket, index: {unique: true}
      t.hll :visitor_ids
    end
  end
end
```

You can use [batch](#batch) and [stream](#stream) approaches to build HLLs

### Batch

To generate HLLs from existing data, use the `hll_agg` method

```ruby
hlls = Event.group_by_day(:created_at).hll_agg(:visitor_id)
```

> Install [Groupdate](https://github.com/ankane/groupdate) to use the `group_by_day` method

And store the result

```ruby
EventRollup.upsert_all(
  hlls.map { |k, v| {time_bucket: k, visitor_ids: v} },
  unique_by: [:time_bucket]
)
```

For a large number of HLLs, use SQL to generate and upsert in a single statement

### Stream

To add new data to HLLs, use the `hll_add` method

```ruby
EventRollup.where(time_bucket: Date.current).hll_add(visitor_ids: ["visitor1", "visitor2"])
```

or the `hll_upsert` method (experimental)

```ruby
EventRollup.hll_upsert({time_bucket: Date.current, visitor_ids: ["visitor1", "visitor2"]})
```

## Querying

Get approximate unique values for a time range

```ruby
EventRollup.where(time_bucket: 30.days.ago.to_date..Date.current).hll_count(:visitor_ids)
```

Get approximate unique values by time bucket

```ruby
EventRollup.group(:time_bucket).hll_count(:visitor_ids)
```

Get approximate unique values by month

```ruby
EventRollup.group_by_month(:time_bucket, time_zone: false).hll_count(:visitor_ids)
```

Get the union of multiple HLLs

```ruby
EventRollup.hll_union(:visitor_ids)
```

## Data Protection

Cardinality estimators like HyperLogLog do not [preserve privacy](https://arxiv.org/pdf/1808.05879.pdf), so protect `hll` columns the same as you would the raw data.

For instance, you can check whether a value is likely a member of an HLL with:

```sql
SELECT
  time_bucket,
  visitor_ids = visitor_ids || hll_hash_text('visitor1') AS likely_member
FROM
  event_rollups;
```

## Data Retention

Data should only be retained for as long as it’s needed. Delete older data with:

```ruby
EventRollup.where("time_bucket < ?", 2.years.ago).delete_all
```

There’s not a way to remove data from an HLL, so to delete data for a specific user, delete the underlying data and recalculate the rollup.

## Hosted Postgres

The `hll` extension is available on a number of [hosted providers](https://github.com/ankane/active_hll/issues/4).

## History

View the [changelog](CHANGELOG.md)

## Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

- [Report bugs](https://github.com/ankane/active_hll/issues)
- Fix bugs and [submit pull requests](https://github.com/ankane/active_hll/pulls)
- Write, clarify, or fix documentation
- Suggest or add new features

To get started with development:

```sh
git clone https://github.com/ankane/active_hll.git
cd active_hll
bundle install
bundle exec rake test
```

--------------------------------------------------------------------------------
/Rakefile:
--------------------------------------------------------------------------------
require "bundler/gem_tasks"
require "rake/testtask"

Rake::TestTask.new(:test) do |t|
  t.libs << "test"
  t.test_files = FileList["test/**/*_test.rb"]
end

task default: :test

--------------------------------------------------------------------------------
/active_hll.gemspec:
--------------------------------------------------------------------------------
require_relative "lib/active_hll/version"

Gem::Specification.new do |spec|
  spec.name = "active_hll"
  spec.version = ActiveHll::VERSION
  spec.summary = "HyperLogLog for Rails and Postgres"
  spec.homepage = "https://github.com/ankane/active_hll"
  spec.license = "MIT"

  spec.author = "Andrew Kane"
  spec.email = "andrew@ankane.org"

  spec.files = Dir["*.{md,txt}", "{lib}/**/*"]
  spec.require_path = "lib"

  spec.required_ruby_version = ">= 3.2"

  spec.add_dependency "activerecord", ">= 7.1"
end

--------------------------------------------------------------------------------
/gemfiles/activerecord71.gemfile:
--------------------------------------------------------------------------------
source "https://rubygems.org"

gemspec path: ".."

gem "rake"
gem "minitest", ">= 5"
gem "activerecord", "~> 7.1.0"
gem "pg"
gem "groupdate"

--------------------------------------------------------------------------------
/gemfiles/activerecord72.gemfile:
--------------------------------------------------------------------------------
source "https://rubygems.org"

gemspec path: ".."

gem "rake"
gem "minitest", ">= 5"
gem "activerecord", "~> 7.2.0"
gem "pg"
gem "groupdate"

--------------------------------------------------------------------------------
/lib/active_hll.rb:
--------------------------------------------------------------------------------
# dependencies
require "active_support"

# modules
require_relative "active_hll/hll"
require_relative "active_hll/utils"
require_relative "active_hll/version"

module ActiveHll
  class Error < StandardError; end

  autoload :Type, "active_hll/type"

  module RegisterType
    def initialize_type_map(m = type_map)
      super
      m.register_type "hll", ActiveHll::Type.new
    end
  end
end

ActiveSupport.on_load(:active_record) do
  require_relative "active_hll/model"

  include ActiveHll::Model

  require "active_record/connection_adapters/postgresql_adapter"

  # ensure schema can be dumped
  ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::NATIVE_DATABASE_TYPES[:hll] = {name: "hll"}

  # ensure schema can be loaded
  ActiveRecord::ConnectionAdapters::TableDefinition.send(:define_column_methods, :hll)

  # prevent unknown OID warning
  ActiveRecord::ConnectionAdapters::PostgreSQLAdapter.singleton_class.prepend(ActiveHll::RegisterType)
end

--------------------------------------------------------------------------------
/lib/active_hll/hll.rb:
--------------------------------------------------------------------------------
# format of value
# https://github.com/aggregateknowledge/hll-storage-spec/blob/v1.0.0/STORAGE.md
module ActiveHll
  class Hll
    attr_reader :value

    def initialize(value)
      unless value.is_a?(String) && value.encoding == Encoding::BINARY
        raise ArgumentError, "Expected binary string"
      end

      @value = value
    end

    def inspect
      "(hll)"
    end

    def schema_version
      value[0].unpack1("C") >> 4
    end

    def type
      value[0].unpack1("C") & 0b00001111
    end

    def regwidth
      (value[1].unpack1("C") >> 5) + 1
    end

    def log2m
      value[1].unpack1("C") & 0b00011111
    end

    def sparseon
      (value[2].unpack1("C") & 0b01000000) >> 6
    end

    def expthresh
      t = value[2].unpack1("C") & 0b00111111
      t == 63 ? -1 : 2**(t - 1)
    end

    def data
      case type
      when 2
        value[3..-1].unpack("q>*")
      end
    end
  end
end

--------------------------------------------------------------------------------
/lib/active_hll/model.rb:
--------------------------------------------------------------------------------
require "active_support/concern"

module ActiveHll
  module Model
    extend ActiveSupport::Concern

    class_methods do
      def hll_agg(column)
        Utils.hll_calculate(self, "hll_add_agg(hll_hash_any(%s)) AS hll_agg", column, default_value: nil)
      end

      def hll_union(column)
        Utils.hll_calculate(self, "hll_union_agg(%s) AS hll_union", column, default_value: nil)
      end

      def hll_count(column)
        Utils.hll_calculate(self, "hll_cardinality(hll_union_agg(%s)) AS hll_count", column, default_value: 0.0)
      end

      # experimental
      # doesn't work with non-default parameters
      def hll_generate(values)
        Utils.with_connection(self) do |connection|
          parts = ["hll_empty()"]

          values.each do |value|
            parts << Utils.hll_hash_sql(connection, value)
          end

          result = connection.select_all("SELECT #{parts.join(" || ")}").rows[0][0]
          ActiveHll::Type.new.deserialize(result)
        end
      end

      def hll_add(attributes)
        Utils.with_connection(self) do |connection|
          set_clauses =
            attributes.map do |attribute, values|
              values = [values] unless values.is_a?(Array)
              return 0 if values.empty?

              quoted_column = connection.quote_column_name(attribute)
              # possibly fetch parameters for the column in the future
              # for now, users should set a default value on the column
              parts = ["COALESCE(#{quoted_column}, hll_empty())"]

              values.each do |value|
                parts << Utils.hll_hash_sql(connection, value)
              end

              "#{quoted_column} = #{parts.join(" || ")}"
            end

          update_all(set_clauses.join(", "))
        end
      end

      # experimental
      def hll_upsert(attributes)
        Utils.with_connection(self) do |connection|
          hll_columns, other_columns = attributes.keys.partition { |a| columns_hash[a.to_s]&.type == :hll }

          # important! raise if column detection fails
          if hll_columns.empty?
            raise ArgumentError, "No hll columns"
          end

          quoted_table = connection.quote_table_name(table_name)

          quoted_hll_columns = hll_columns.map { |k| connection.quote_column_name(k) }
          quoted_other_columns = other_columns.map { |k| connection.quote_column_name(k) }
          quoted_columns = quoted_other_columns + quoted_hll_columns

          hll_values =
            hll_columns.map do |k|
              vs = attributes[k]
              vs = [vs] unless vs.is_a?(Array)
              vs.map { |v| Utils.hll_hash_sql(connection, v) }.join(" || ")
            end
          other_values = other_columns.map { |k| connection.quote(attributes[k]) }

          insert_values = other_values + hll_values.map { |v| "hll_empty()#{v.size > 0 ? " || #{v}" : ""}" }
          update_values = quoted_hll_columns.zip(hll_values).map { |k, v| "#{k} = COALESCE(#{quoted_table}.#{k}, hll_empty())#{v.size > 0 ? " || #{v}" : ""}" }

          sql = "INSERT INTO #{quoted_table} (#{quoted_columns.join(", ")}) VALUES (#{insert_values.join(", ")}) ON CONFLICT (#{quoted_other_columns.join(", ")}) DO UPDATE SET #{update_values.join(", ")}"
          connection.exec_insert(sql, "#{name} Upsert")
        end
      end
    end

    # doesn't update in-memory record attribute for performance
    def hll_add(attributes)
      self.class.where(id: id).hll_add(attributes)
      nil
    end

    def hll_count(attribute)
      Utils.with_connection(self.class) do |connection|
        quoted_column = connection.quote_column_name(attribute)
        self.class.where(id: id).pluck("hll_cardinality(#{quoted_column})").first || 0.0
      end
    end
  end
end

--------------------------------------------------------------------------------
/lib/active_hll/type.rb:
--------------------------------------------------------------------------------
module ActiveHll
  class Type < ActiveRecord::ConnectionAdapters::PostgreSQL::OID::Bytea
    def type
      :hll
    end

    def serialize(value)
      if value.is_a?(Hll)
        value = value.value
      elsif !value.nil?
        raise ArgumentError, "can't cast #{value.class.name} to hll"
      end
      super(value)
    end

    def deserialize(value)
      value = super
      value.nil? ? value : Hll.new(value)
    end
  end
end

--------------------------------------------------------------------------------
/lib/active_hll/utils.rb:
--------------------------------------------------------------------------------
module ActiveHll
  module Utils
    class << self
      def hll_hash_sql(connection, value)
        hash_function =
          case value
          when true, false
            "hll_hash_boolean"
          when Integer
            "hll_hash_bigint"
          when String
            "hll_hash_text"
          else
            raise ArgumentError, "Unexpected type: #{value.class.name}"
          end
        quoted_value = connection.quote(value)
        "#{hash_function}(#{quoted_value})"
      end

      def with_connection(relation, &block)
        relation.connection_pool.with_connection(&block)
      end

      def hll_calculate(relation, operation, column, default_value:)
        Utils.with_connection(relation) do |connection|
          sql, relation, group_values = hll_calculate_sql(relation, connection, operation, column)
          result = connection.select_all(sql)

          # typecast
          rows = []
          columns = result.columns
          result.rows.each do |untyped_row|
            rows << (result.column_types.empty? ? untyped_row : columns.each_with_index.map { |c, i| untyped_row[i] && result.column_types[c] ? result.column_types[c].deserialize(untyped_row[i]) : untyped_row[i] })
          end

          result =
            if group_values.any?
              Hash[rows.map { |r| [r.size == 2 ? r[0] : r[0..-2], r[-1]] }]
            else
              rows[0] && rows[0][0]
            end

          result = Groupdate.process_result(relation, result, default_value: default_value) if defined?(Groupdate.process_result)

          result
        end
      end

      def hll_calculate_sql(relation, connection, operation, column)
        # basic version of Active Record disallow_raw_sql!
        # symbol = column (safe), Arel node = SQL (safe), other = untrusted
        # matches table.column and column
        unless column.is_a?(Symbol) || column.is_a?(Arel::Nodes::SqlLiteral)
          column = column.to_s
          unless /\A\w+(\.\w+)?\z/i.match?(column)
            raise ActiveRecord::UnknownAttributeReference, "Query method called with non-attribute argument(s): #{column.inspect}. Use Arel.sql() for known-safe values."
          end
        end

        # column resolution
        node = relation.all.send(:arel_columns, [column]).first
        node = Arel::Nodes::SqlLiteral.new(node) if node.is_a?(String)
        column = connection.visitor.accept(node, Arel::Collectors::SQLString.new).value

        group_values = relation.all.group_values

        relation = relation.unscope(:select).select(*group_values, operation % [column])

        # same as average
        relation = relation.unscope(:order).distinct!(false) if group_values.empty?

        [relation.to_sql, relation, group_values]
      end
    end
  end
end

--------------------------------------------------------------------------------
/lib/active_hll/version.rb:
--------------------------------------------------------------------------------
module ActiveHll
  VERSION = "0.3.0"
end

--------------------------------------------------------------------------------
/lib/generators/active_hll/install_generator.rb:
--------------------------------------------------------------------------------
require "rails/generators/active_record"

module ActiveHll
  module Generators
    class InstallGenerator < Rails::Generators::Base
      include ActiveRecord::Generators::Migration
      source_root File.join(__dir__, "templates")

      def copy_migration
        migration_template "migration.rb", "db/migrate/install_active_hll.rb", migration_version: migration_version
      end

      def migration_version
        "[#{ActiveRecord::VERSION::MAJOR}.#{ActiveRecord::VERSION::MINOR}]"
      end
    end
  end
end

--------------------------------------------------------------------------------
/lib/generators/active_hll/templates/migration.rb.tt:
--------------------------------------------------------------------------------
class <%= migration_class_name %> < ActiveRecord::Migration<%= migration_version %>
  def change
    enable_extension "hll"
  end
end

--------------------------------------------------------------------------------
/test/add_test.rb:
--------------------------------------------------------------------------------
require_relative "test_helper"

class AddTest < Minitest::Test
  def test_string
    item = EventRollup.create!
    assert_nil item.hll_add(visitor_ids: "hello")
    assert_nil item.hll_add(visitor_ids: ["world", "!!!"])
    assert_equal 3, item.hll_count(:visitor_ids)
  end

  def test_boolean
    item = EventRollup.create!
    assert_nil item.hll_add(visitor_ids: true)
    assert_nil item.hll_add(visitor_ids: [true, false])
    assert_equal 2, item.hll_count(:visitor_ids)
  end

  def test_integer
    item = EventRollup.create!
    assert_nil item.hll_add(visitor_ids: 1)
    assert_nil item.hll_add(visitor_ids: [2, 3])
    assert_equal 3, item.hll_count(:visitor_ids)
  end

  def test_multiple_types
    item = EventRollup.create!
    assert_nil item.hll_add(visitor_ids: ["a", "b", "c", 1, 2, 3, true, false])
    assert_equal 8, item.hll_count(:visitor_ids)
  end

  def test_multiple_columns
    skip "TODO fix"

    item = OrderRollup.create!
    assert_nil item.hll_add(visitor_ids: 1, user_ids: 2)
    assert_equal 1, item.hll_count(:visitor_ids)
    assert_equal 1, item.hll_count(:user_ids)
  end

  def test_nil
    item = EventRollup.create!
    assert_equal 0, item.hll_count(:visitor_ids)
    assert_nil item.hll_add(visitor_ids: 1)
    assert_equal 1, item.hll_count(:visitor_ids)
  end

  def test_empty
    item = EventRollup.create!
    assert_nil item.hll_add(visitor_ids: [])
    assert_equal 0, item.hll_count(:visitor_ids)
  end

  def test_relation
    items = 3.times.map { |i| EventRollup.create!(id: i + 1) }
    assert_equal 2, EventRollup.where("id <= ?", 2).hll_add(visitor_ids: "hello")
    assert_equal [1, 1, 0], items.map { |item| item.hll_count(:visitor_ids) }
  end
end

--------------------------------------------------------------------------------
/test/agg_test.rb:
--------------------------------------------------------------------------------
require_relative "test_helper"

class AggTest < Minitest::Test
  def test_agg
    create_events

    hlls = Event.group_by_day(:created_at).hll_agg(:visitor_id)

    3.times do
      EventRollup.upsert_all(
        hlls.map { |k, v| {time_bucket: k, visitor_ids: v} },
        unique_by: [:time_bucket]
      )
    end

    assert_equal 5, EventRollup.hll_count(:visitor_ids)
  end

  def test_expression_no_arel
    error = assert_raises(ActiveRecord::UnknownAttributeReference) do
      EventRollup.hll_agg("counter + 1")
    end
    assert_equal "Query method called with non-attribute argument(s): \"counter + 1\". Use Arel.sql() for known-safe values.", error.message
  end

  private

  def create_events
    now = Time.now
    [1, 1, 2, 3].each do |visitor_id|
      Event.create!(visitor_id: visitor_id, created_at: now - 2.days)
    end
    [3, 4, 5].each do |visitor_id|
      Event.create!(visitor_id: visitor_id, created_at: now)
    end
  end
end

--------------------------------------------------------------------------------
/test/count_test.rb:
--------------------------------------------------------------------------------
require_relative "test_helper"

class CountTest < Minitest::Test
  def test_count
    EventRollup.create!.hll_add(visitor_ids: [1, 2, 3])
    EventRollup.create!.hll_add(visitor_ids: [3, 4, 5])
    assert_equal 5, EventRollup.hll_count(:visitor_ids)
  end

  def test_order
    EventRollup.create!(time_bucket: Date.yesterday).hll_add(visitor_ids: [1, 2, 3])
    EventRollup.create!(time_bucket: Date.current).hll_add(visitor_ids: [3, 4, 5])
    assert_equal 5, EventRollup.order(:time_bucket).hll_count(:visitor_ids)
  end

  def test_group
    EventRollup.create!(time_bucket: Date.yesterday).hll_add(visitor_ids: [1, 2, 3])
    EventRollup.create!(time_bucket: Date.current).hll_add(visitor_ids: [3, 4, 5])
    expected = {Date.yesterday => 3, Date.current => 3}
    assert_equal expected, EventRollup.group(:time_bucket).hll_count(:visitor_ids)
  end

  def test_groupdate
    week = Date.current.beginning_of_week(:sunday)
    EventRollup.create!(time_bucket: week).hll_add(visitor_ids: [1, 2, 3])
    EventRollup.create!(time_bucket: week + 1).hll_add(visitor_ids: [3, 4, 5])
    expected = {week => 5}
    assert_equal expected, EventRollup.group_by_week(:time_bucket, time_zone: false).hll_count(:visitor_ids)
  end

  def test_groupdate_zeros
    assert_equal [0], EventRollup.group_by_week(:time_bucket, last: 1).hll_count(:visitor_ids).values
  end

  def test_expression_no_arel
    error = assert_raises(ActiveRecord::UnknownAttributeReference) do
      EventRollup.hll_count("counter + 1")
    end
    assert_equal "Query method called with non-attribute argument(s): \"counter + 1\". Use Arel.sql() for known-safe values.", error.message
  end
end

--------------------------------------------------------------------------------
/test/create_test.rb:
--------------------------------------------------------------------------------
require_relative "test_helper"

class CreateTest < Minitest::Test
  def test_generate
    event = EventRollup.create!(visitor_ids: EventRollup.hll_generate([1, 2, 3]))
    assert_equal 3, event.hll_count(:visitor_ids)
  end

  def test_string
    error = assert_raises(ArgumentError) do
      EventRollup.create!(visitor_ids: "hello")
    end
    assert_equal "can't cast String to hll", error.message
  end
end

--------------------------------------------------------------------------------
/test/hll_test.rb:
--------------------------------------------------------------------------------
require_relative "test_helper"

class HllTest < Minitest::Test
  def test_inspect
    EventRollup.create!.hll_add(visitor_ids: [1, 2, 3])
    rollup = EventRollup.last
    assert_kind_of ActiveHll::Hll, rollup.visitor_ids
    assert_equal "(hll)", rollup.visitor_ids.inspect
    assert_match "visitor_ids: (hll)", rollup.inspect
  end

  def test_methods
    item = EventRollup.create!
    item.hll_add(visitor_ids: ["a", "b", "c"])
    hll = item.reload.visitor_ids
    assert_equal 1, hll.schema_version
    assert_equal 2, hll.type
    assert_equal 11, hll.log2m
    assert_equal 5, hll.regwidth
    assert_equal (-1), hll.expthresh
    assert_equal 1, hll.sparseon
    assert_equal [-8839064797231613815, -8198557465434950441, 8833996863197925870], hll.data
  end
end

--------------------------------------------------------------------------------
/test/misc_test.rb:
--------------------------------------------------------------------------------
require_relative "test_helper"

class MiscTest < Minitest::Test
  def test_schema
    file = Tempfile.new
    connection = ActiveRecord::VERSION::STRING.to_f >= 7.2 ? ActiveRecord::Base.connection_pool : ActiveRecord::Base.connection
    ActiveRecord::SchemaDumper.dump(connection, file)
    file.rewind
    schema = file.read
    refute_match "Could not dump table", schema
    load(file.path)
  end

  def test_select
    item = EventRollup.create!
    item.hll_add(visitor_ids: ["a", "b", "c"])
    assert_equal 3, EventRollup.select("id, hll_cardinality(visitor_ids) AS visitors_count").first.visitors_count
  end

  # no need for model method
  def test_print
    item = EventRollup.create!
    item.hll_add(visitor_ids: ["a", "b", "c"])
    output = EventRollup.where(id: item.id).pluck("hll_print(visitor_ids)::text").first
    assert_match "3 elements", output
  end

  def test_accuracy
    item = EventRollup.create!
    item.hll_add(visitor_ids: 1000.times.map { |i| "visitor#{i}" })
    assert_in_delta 1000, item.hll_count(:visitor_ids), 8
  end

  def test_likely_member
    today = Date.current

    item = EventRollup.create!(time_bucket: today - 1)
    item.hll_add(visitor_ids: ["a", "b", "c"])

    item = EventRollup.create!(time_bucket: today)
    item.hll_add(visitor_ids: ["c", "d", "e"])

    sql = <<~SQL
      SELECT
        time_bucket,
        visitor_ids = visitor_ids || hll_hash_text('a') AS likely_member
      FROM
        event_rollups;
    SQL
    result = EventRollup.connection.select_all(sql).to_a
    likely_members = result.to_h { |r| [Date.parse(r["time_bucket"]), r["likely_member"]] }
    assert_equal true, likely_members[today - 1]
    assert_equal false, likely_members[today]
  end

  def test_connection_leasing
    ActiveRecord::Base.connection_handler.clear_active_connections!
    assert_nil ActiveRecord::Base.connection_pool.active_connection?
    ActiveRecord::Base.connection_pool.with_connection do
      Event.group_by_day(:created_at).hll_agg(:visitor_id)
      EventRollup.hll_generate([1, 2, 3])
      EventRollup.hll_count(:visitor_ids)
    end
    assert_nil ActiveRecord::Base.connection_pool.active_connection?
  end
end

--------------------------------------------------------------------------------
/test/test_helper.rb:
--------------------------------------------------------------------------------
require "bundler/setup"
Bundler.require(:default)
require "minitest/autorun"
require "minitest/pride"
require "active_record"

logger = ActiveSupport::Logger.new(ENV["VERBOSE"] ? STDOUT : nil)
ActiveRecord::Schema.verbose = false unless ENV["VERBOSE"]
ActiveRecord::Base.logger = logger

if ActiveRecord::VERSION::STRING.to_f >= 7.2
  ActiveRecord::Base.attributes_for_inspect = :all
end

if ActiveRecord::VERSION::STRING.to_f == 8.0
  ActiveSupport.to_time_preserves_timezone = :zone
elsif ActiveRecord::VERSION::STRING.to_f == 7.2
  ActiveSupport.to_time_preserves_timezone = true
end

ActiveRecord::Base.establish_connection adapter: "postgresql", database: "active_hll_test"

ActiveRecord::Schema.define do
  enable_extension "hll"

  create_table :events, force: true do |t|
    t.integer :visitor_id
    t.datetime :created_at
  end

  create_table :event_rollups, force: true do |t|
    t.date :time_bucket
    t.hll :visitor_ids
  end
  add_index :event_rollups, :time_bucket, unique: true

  create_table :order_rollups, force: true do |t|
    t.hll :visitor_ids, default: -> { "hll_empty()" }
    # TODO support parameters
    # https://github.com/citusdata/postgresql-hll#explanation-of-parameters-and-tuning
    # currently don't appear in schema.rb
    t.column :user_ids, "hll(12, 6, 1024, 0)", default: -> { "hll_empty(12, 6, 1024, 0)" }
  end
end

class Event < ActiveRecord::Base
end

class EventRollup < ActiveRecord::Base
end

class OrderRollup < ActiveRecord::Base
end

class Minitest::Test
  def setup
    Event.delete_all
    EventRollup.delete_all
  end
end

--------------------------------------------------------------------------------
/test/union_test.rb:
--------------------------------------------------------------------------------
require_relative "test_helper"

class UnionTest < Minitest::Test
  def test_union
    EventRollup.create!.hll_add(visitor_ids: [1, 2, 3])
    EventRollup.create!.hll_add(visitor_ids: [3, 4, 5])
    event = EventRollup.create!(visitor_ids: EventRollup.hll_union(:visitor_ids))
    assert_equal 5, event.hll_count(:visitor_ids)
  end

  def test_expression_no_arel
    error = assert_raises(ActiveRecord::UnknownAttributeReference) do
      EventRollup.hll_union("counter + 1")
    end
    assert_equal "Query method called with non-attribute argument(s): \"counter + 1\". Use Arel.sql() for known-safe values.", error.message
  end
end

--------------------------------------------------------------------------------
/test/upsert_test.rb:
--------------------------------------------------------------------------------
require_relative "test_helper"

class UpsertTest < Minitest::Test
  def test_upsert
    today = Date.current
    3.times do
      result = EventRollup.hll_upsert({time_bucket: today, visitor_ids: ["hello", "world"]})
      assert_kind_of ActiveRecord::Result, result
    end
    EventRollup.hll_upsert({time_bucket: today, visitor_ids: ["!!!"]})

    assert_equal 1, EventRollup.count
    rollup = EventRollup.last
    assert_equal today, rollup.time_bucket
    assert_equal 3, rollup.hll_count(:visitor_ids)
  end

  def test_empty
    today = Date.current
    3.times do
      result = EventRollup.hll_upsert({time_bucket: today, visitor_ids: []})
      assert_kind_of ActiveRecord::Result, result
    end

    assert_equal 1, EventRollup.count
    rollup = EventRollup.last
    assert_equal today, rollup.time_bucket
    assert_equal 0, rollup.hll_count(:visitor_ids)
  end

  def test_upsert_no_hll
    error = assert_raises(ArgumentError) do
      EventRollup.hll_upsert({time_bucket: Date.current})
    end
    assert_equal "No hll columns", error.message
  end

  def test_missing_column
    assert_raises(ActiveRecord::StatementInvalid) do
      EventRollup.hll_upsert({missing: Date.current, visitor_ids: ["hello", "world"]})
    end
  end

  # ideally would raise NoMethodError
  # but has same behavior as upsert/upsert_all methods
  def test_relation
    assert EventRollup.all.hll_upsert({time_bucket: Date.current, visitor_ids: ["hello", "world"]})
  end
end

--------------------------------------------------------------------------------
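
The header accessors in `lib/active_hll/hll.rb` (`schema_version`, `type`, `regwidth`, `log2m`, `sparseon`, `expthresh`) decode the first three bytes of the value using the layout from the storage spec linked at the top of that file. The following is a minimal round-trip sketch of that bit layout. `encode_header` and `decode_header` are hypothetical helpers written for illustration and are not part of the gem; the encoder assumes any `expthresh` other than `-1` is a power of two:

```ruby
# Hypothetical helpers (not part of active_hll) mirroring the bit layout read by
# ActiveHll::Hll#schema_version, #type, #regwidth, #log2m, #sparseon and #expthresh.

# Pack the three header bytes. expthresh -1 is stored as the code 63;
# otherwise (assumed power of two) it is stored as log2(expthresh) + 1.
def encode_header(schema_version:, type:, regwidth:, log2m:, sparseon:, expthresh:)
  t = expthresh == -1 ? 63 : Math.log2(expthresh).to_i + 1
  [
    (schema_version << 4) | type,  # byte 0: schema version (high 4 bits), type (low 4 bits)
    ((regwidth - 1) << 5) | log2m, # byte 1: regwidth - 1 (high 3 bits), log2m (low 5 bits)
    (sparseon << 6) | t            # byte 2: sparseon (bit 6), expthresh code (low 6 bits)
  ].pack("C3")
end

# Unpack the same fields with the shifts and masks used in lib/active_hll/hll.rb
def decode_header(value)
  b0, b1, b2 = value.unpack("C3")
  t = b2 & 0b00111111
  {
    schema_version: b0 >> 4,
    type: b0 & 0b00001111,
    regwidth: (b1 >> 5) + 1,
    log2m: b1 & 0b00011111,
    sparseon: (b2 & 0b01000000) >> 6,
    expthresh: t == 63 ? -1 : 2**(t - 1)
  }
end

# Field values asserted in test/hll_test.rb for an explicit (type 2) HLL
fields = {schema_version: 1, type: 2, regwidth: 5, log2m: 11, sparseon: 1, expthresh: -1}
header = encode_header(**fields)
roundtrip = decode_header(header)

p header.bytes        # => [18, 139, 127]
p roundtrip == fields # => true
```

The expected bytes follow from the shifts and masks alone: `(1 << 4) | 2 = 18`, `((5 - 1) << 5) | 11 = 139`, and `(1 << 6) | 63 = 127`, where 63 is the code that `expthresh` maps to `-1`.
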