├── .formatter.exs
├── .gitignore
├── README.md
├── bench
│   ├── memory.exs
│   └── simple_dog.exs
├── lib
│   ├── dog_sketch.ex
│   ├── dog_sketch
│   │   ├── exact_dog.ex
│   │   └── simple_dog.ex
│   └── mix
│       └── tasks
│           └── proper.ex
├── mix.exs
├── mix.lock
├── papers
│   └── p2195-masson.pdf
└── test
    ├── dog_sketch_test.exs
    ├── simple_dog_sketch_test.exs
    └── test_helper.exs

/.formatter.exs:
--------------------------------------------------------------------------------
# Used by "mix format"
[
  inputs: ["{mix,.formatter}.exs", "{config,lib,test}/**/*.{ex,exs}"]
]

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
# The directory Mix will write compiled artifacts to.
/_build/

# If you run "mix test --cover", coverage assets end up here.
/cover/

# The directory Mix downloads your dependencies sources to.
/deps/

# Where third-party dependencies like ExDoc output generated docs.
/doc/

# Ignore .fetch files in case you like to edit your project deps locally.
/.fetch

# If the VM crashes, it generates a dump, let's ignore it too.
erl_crash.dump

# Also ignore archive artifacts (built via "mix archive.build").
*.ez

# Ignore package tarball (built via "mix hex.build").
dog_sketch-*.tar


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# DogSketch

DogSketch lets you trade accuracy for speed and memory usage while calculating percentiles.

DogSketch is an implementation of the DDSketch algorithm, described in [this paper](papers/p2195-masson.pdf). It originated at DataDog.
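
As a rough illustration of the trade-off (a sketch, not the library's actual code), a DDSketch-style structure maps each positive value to a logarithmic bucket and reports one representative value per bucket, so every estimate stays within the chosen relative error. The module `BucketDemo` and its function names are hypothetical, made up for this example:

```elixir
defmodule BucketDemo do
  # Hypothetical helper module for illustration only; not part of DogSketch.

  # gamma sets the bucket width: bucket i covers (gamma^(i-1), gamma^i].
  def gamma(error), do: (1 + error) / (1 - error)

  # Index of the bucket that holds a positive value.
  def bucket(val, error) when val > 0 do
    ceil(:math.log(val) / :math.log(gamma(error)))
  end

  # Representative value reported for every member of bucket `index`.
  def representative(index, error) do
    g = gamma(error)
    2 * :math.pow(g, index) / (g + 1)
  end
end

error = 0.04
value = 5000
estimate = BucketDemo.representative(BucketDemo.bucket(value, error), error)

# The relative error of the estimate stays within the configured bound.
IO.inspect(abs(estimate - value) / value <= error)
```

Only the bucket counts need to be stored, which is where the memory savings come from.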

## Examples

This library includes an "exact" implementation, `ExactDog`, that serves as a baseline for comparison (and for property tests).

Note: the `SimpleDog` implementation does not guarantee bounded memory, which the full DDSketch algorithm does. Adding this should not be difficult. If you need it, please contribute; PRs are welcome!

```elixir
iex(1)> alias DogSketch.{ExactDog, SimpleDog}
[DogSketch.ExactDog, DogSketch.SimpleDog]
iex(2)> ed = ExactDog.new
%DogSketch.ExactDog{data: %{}, total: 0}
iex(5)> sd = SimpleDog.new(error: 0.04)
%DogSketch.SimpleDog{data: %{}, gamma: 1.0833333333333335, total: 0}
```

We have specified a maximum relative error of 4% for the `SimpleDog`.

Now let's add the numbers 1 to 10000 to the sketches and compare.

```elixir
iex(6)> sd = Enum.reduce(1..10000, sd, fn x, sd -> SimpleDog.insert(sd, x) end)
...
iex(7)> ed = Enum.reduce(1..10000, ed, fn x, ed -> ExactDog.insert(ed, x) end)
iex(8)> SimpleDog.quantile(sd, 0.5)
5032.880315534522
iex(9)> ExactDog.quantile(ed, 0.5)
5000
```

`5032.88 / 5000 ≈ 1.0066`, or about 0.66% error, which is well under the 4% error that we specified.

DDSketch is also fully mergeable, which is very useful in a distributed context. One might track web request response times on each node and aggregate the results later.

```elixir
iex(10)> SimpleDog.merge(sd1, sd2)
```

## Benchmarks

Artificial benchmarks (see [./bench](bench)) indicate a 46x improvement in memory usage (0.1kb vs 4.6kb) for 2% relative error.

While ExactDog can do inserts 1.4x faster than SimpleDog, `SimpleDog.insert/2` is still capable of more than 2 million inserts per second, which should be fast enough for just about anything.

On all other operations, SimpleDog beats ExactDog.
Not a bad trade-off for 2% relative error!

```
Operating System: Linux
CPU Information: AMD Ryzen 9 3900X 12-Core Processor
Number of Available Cores: 24
Available memory: 31.36 GB
Elixir 1.10.3
Erlang 23.0.2

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 1.17 min

Name                               ips        average  deviation         median         99th %
ExactDog.insert/2            3370.36 K     0.00030 ms  ±4595.86%     0.00025 ms     0.00056 ms
SimpleDog.insert/2           2391.07 K     0.00042 ms  ±7775.41%     0.00031 ms     0.00059 ms
SimpleDog.merge/2              35.43 K      0.0282 ms    ±76.89%      0.0250 ms      0.0402 ms
SimpleDog.quantile/2 50%       17.23 K      0.0580 ms     ±4.50%      0.0573 ms      0.0645 ms
SimpleDog.quantile/2 99%       17.00 K      0.0588 ms     ±7.95%      0.0578 ms      0.0655 ms
SimpleDog.quantile/2 90%       16.98 K      0.0589 ms     ±6.46%      0.0580 ms      0.0653 ms
ExactDog.merge/2                0.60 K        1.66 ms    ±16.64%        1.59 ms        3.07 ms
ExactDog.quantile/2 50%         0.29 K        3.41 ms     ±7.27%        3.32 ms        4.19 ms
ExactDog.quantile/2 99%         0.28 K        3.51 ms     ±6.84%        3.42 ms        4.26 ms
ExactDog.quantile/2 90%         0.28 K        3.51 ms     ±7.10%        3.42 ms        4.28 ms
```

## Installation

The package can be installed by adding `dog_sketch` to your list of dependencies in `mix.exs`:

```elixir
def deps do
  [
    {:dog_sketch, "~> 0.1.0"}
  ]
end
```

The docs can be found at [https://hexdocs.pm/dog_sketch](https://hexdocs.pm/dog_sketch).
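
The mergeability mentioned under Examples follows from a sketch being, at its core, a map of bucket counts: merging adds the counts bucket by bucket. Below is a minimal sketch of that idea using plain maps. The module `MergeDemo` is hypothetical (it is not DogSketch's API) and stands in for a per-node sketch with a fixed 2% error:

```elixir
defmodule MergeDemo do
  # Hypothetical illustration only; counts values per logarithmic bucket.
  # gamma for a 2% relative error: (1 + 0.02) / (1 - 0.02).
  @gamma 1.02 / 0.98

  def insert(counts, val) when val > 0 do
    bucket = ceil(:math.log(val) / :math.log(@gamma))
    Map.update(counts, bucket, 1, &(&1 + 1))
  end

  # Merging two sketches sums the counts of matching buckets.
  def merge(a, b), do: Map.merge(a, b, fn _bucket, x, y -> x + y end)
end

# Two "nodes" each see half of the data; a third sketch sees all of it.
node_a = Enum.reduce(1..500, %{}, &MergeDemo.insert(&2, &1))
node_b = Enum.reduce(501..1000, %{}, &MergeDemo.insert(&2, &1))
combined = Enum.reduce(1..1000, %{}, &MergeDemo.insert(&2, &1))

# Merging the per-node maps gives the same counts as one sketch fed everything.
IO.inspect(MergeDemo.merge(node_a, node_b) == combined)
```

Because the merged counts are identical, any quantile computed from the merged sketch matches the one computed from a single sketch over all the data, provided both sides use the same gamma.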

--------------------------------------------------------------------------------
/bench/memory.exs:
--------------------------------------------------------------------------------
defmodule MemoryHelper do
  # :erts_debug.flat_size/1 returns the term size in machine words,
  # so multiply by the word size (in bytes) before converting to KiB.
  def memory_kb(term) do
    (:erts_debug.flat_size(term) * :erlang.system_info(:wordsize) / :math.pow(2, 10))
    |> Float.round(1)
  end

  # Size of the term when serialized with the external term format, in KiB.
  def wire_kb(term) do
    (byte_size(:erlang.term_to_binary(term)) / :math.pow(2, 10))
    |> Float.round(1)
  end
end

alias DogSketch.{SimpleDog, ExactDog}

sd =
  Enum.reduce(1..100_000, SimpleDog.new(error: 0.02), fn _x, sd ->
    SimpleDog.insert(sd, :rand.uniform(10000))
  end)

ed =
  Enum.reduce(1..100_000, ExactDog.new(), fn _x, ed ->
    ExactDog.insert(ed, :rand.uniform(10000))
  end)

IO.inspect(MemoryHelper.memory_kb(sd), label: "100k inserts SimpleDog 2% error (kb)")
IO.inspect(MemoryHelper.memory_kb(ed), label: "100k inserts ExactDog (kb)")

--------------------------------------------------------------------------------
/bench/simple_dog.exs:
--------------------------------------------------------------------------------
alias DogSketch.{SimpleDog, ExactDog}

sd1 =
  Enum.reduce(1..10000, SimpleDog.new(error: 0.02), fn _x, sd ->
    SimpleDog.insert(sd, :rand.uniform(10000))
  end)

sd2 =
  Enum.reduce(1..10000, SimpleDog.new(error: 0.02), fn _x, sd ->
    SimpleDog.insert(sd, :rand.uniform(10000))
  end)

ed1 =
  Enum.reduce(1..10000, ExactDog.new(error: 0.02), fn _x, ed ->
    ExactDog.insert(ed, :rand.uniform(10000))
  end)

ed2 =
  Enum.reduce(1..10000, ExactDog.new(error: 0.02), fn _x, ed ->
    ExactDog.insert(ed, :rand.uniform(10000))
  end)

Benchee.run(
  %{
    "SimpleDog.insert/2" => fn num ->
      SimpleDog.insert(sd1, num)
    end,
    "SimpleDog.merge/2" => fn _ ->
      SimpleDog.merge(sd1, sd2)
    end,
    "SimpleDog.quantile/2 50%" => fn _ ->
      SimpleDog.quantile(sd1, 0.5)
    end,
    "SimpleDog.quantile/2 90%" => fn _ ->
      SimpleDog.quantile(sd1, 0.9)
    end,
    "SimpleDog.quantile/2 99%" => fn _ ->
      SimpleDog.quantile(sd1, 0.99)
    end,
    "ExactDog.insert/2" => fn num ->
      ExactDog.insert(ed1, num)
    end,
    "ExactDog.merge/2" => fn _ ->
      ExactDog.merge(ed1, ed2)
    end,
    "ExactDog.quantile/2 50%" => fn _ ->
      ExactDog.quantile(ed1, 0.5)
    end,
    "ExactDog.quantile/2 90%" => fn _ ->
      ExactDog.quantile(ed1, 0.9)
    end,
    "ExactDog.quantile/2 99%" => fn _ ->
      ExactDog.quantile(ed1, 0.99)
    end
  },
  before_each: fn _ -> :rand.uniform(10000) end
)

--------------------------------------------------------------------------------
/lib/dog_sketch.ex:
--------------------------------------------------------------------------------
defmodule DogSketch do
  @moduledoc """
  Documentation for DogSketch.
  """
end

--------------------------------------------------------------------------------
/lib/dog_sketch/exact_dog.ex:
--------------------------------------------------------------------------------
defmodule DogSketch.ExactDog do
  defstruct data: %{}, total: 0

  def new(_opts \\ []) do
    %__MODULE__{}
  end

  def merge(s1, s2) do
    data = Map.merge(s1.data, s2.data, fn _k, val1, val2 -> val1 + val2 end)
    %__MODULE__{data: data, total: s1.total + s2.total}
  end

  def insert(s, val) do
    data = Map.update(s.data, val, 1, fn x -> x + 1 end)

    %__MODULE__{s | data: data, total: s.total + 1}
  end

  def quantile(s, quantile) when quantile >= 0 and quantile <= 1 do
    total_quantile = s.total * quantile

    index =
      Enum.sort_by(s.data, fn {key, _v} -> key end)
      |> Enum.reduce_while(0, fn {key, val}, total ->
        if total + val >= total_quantile do
          {:halt, key}
        else
          {:cont, total + val}
        end
      end)

    index
  end
end

--------------------------------------------------------------------------------
/lib/dog_sketch/simple_dog.ex:
--------------------------------------------------------------------------------
defmodule DogSketch.SimpleDog do
  defstruct data: %{}, gamma: 0, total: 0, inv_log_gamma: 0

  def new(opts \\ []) do
    err = Keyword.get(opts, :error, 0.02)
    # gamma sets the bucket width for the requested relative error.
    gamma = (1 + err) / (1 - err)
    # Precompute 1 / log(gamma) so insert/2 multiplies instead of dividing.
    inv_log_gamma = 1.0 / :math.log(gamma)
    %__MODULE__{gamma: gamma, inv_log_gamma: inv_log_gamma}
  end

  # Only sketches that share the same gamma can be merged.
  def merge(%{gamma: g, inv_log_gamma: i} = s1, %{gamma: g} = s2) do
    data = Map.merge(s1.data, s2.data, fn _k, val1, val2 -> val1 + val2 end)
    %__MODULE__{data: data, gamma: g, total: s1.total + s2.total, inv_log_gamma: i}
  end

  def insert(s, val) when val > 0 do
    # Bucket index: the smallest integer i such that val <= gamma^i.
    bin = ceil(:math.log(val) * s.inv_log_gamma)

    data = Map.update(s.data, bin, 1, fn x -> x + 1 end)

    %__MODULE__{s | data: data, total: s.total + 1}
  end

  # Returns {representative_value, count} pairs, one per bucket.
  def to_list(%{data: data, gamma: gamma}) do
    Enum.map(data, fn {key, val} ->
      {2 * :math.pow(gamma, key) / (gamma + 1), val}
    end)
  end

  def quantile(%{total: 0}, _), do: nil

  def quantile(s, quantile) when quantile >= 0 and quantile <= 1 do
    total_quantile = s.total * quantile

    # Walk the buckets in ascending order until the cumulative count
    # reaches the target rank.
    index =
      Enum.sort_by(s.data, fn {key, _v} -> key end)
      |> Enum.reduce_while(0, fn {key, val}, total ->
        if total + val >= total_quantile do
          {:halt, key}
        else
          {:cont, total + val}
        end
      end)

    # Report the representative value of the selected bucket.
    2 * :math.pow(s.gamma, index) / (s.gamma + 1)
  end

  def count(%{total: total}), do: total
end

--------------------------------------------------------------------------------
/lib/mix/tasks/proper.ex:
--------------------------------------------------------------------------------
defmodule Mix.Tasks.Proper do
  use Mix.Task

  def run([]) do
    "test/**/*_prop.exs"
    |>
      Path.wildcard()
    |> Kernel.ParallelRequire.files()
  end
end

--------------------------------------------------------------------------------
/mix.exs:
--------------------------------------------------------------------------------
defmodule DogSketch.MixProject do
  use Mix.Project

  def project do
    [
      app: :dog_sketch,
      name: "DogSketch",
      package: package(),
      version: "0.1.3",
      elixir: "~> 1.9",
      start_permanent: Mix.env() == :prod,
      source_url: "https://github.com/moosecodebv/dog_sketch",
      deps: deps()
    ]
  end

  # Run "mix help compile.app" to learn about applications.
  def application do
    [
      extra_applications: [:logger]
    ]
  end

  # Run "mix help deps" to learn about dependencies.
  defp deps do
    [
      {:propcheck, "~> 1.2", only: :test},
      {:benchee, "~> 1.0", only: :dev},
      {:ex_doc, "~> 0.22.2", only: :dev, runtime: false}
    ]
  end

  defp package do
    [
      description: "DDSketch helps you make fast, low-memory, fully mergeable quantile sketches",
      licenses: ["MIT"],
      links: %{GitHub: "https://github.com/moosecodebv/dog_sketch"},
      maintainers: ["Derek Kraan"]
    ]
  end
end

--------------------------------------------------------------------------------
/mix.lock:
--------------------------------------------------------------------------------
%{
  "benchee": {:hex, :benchee, "1.0.1", "66b211f9bfd84bd97e6d1beaddf8fc2312aaabe192f776e8931cb0c16f53a521", [:mix], [{:deep_merge, "~> 1.0", [hex: :deep_merge, repo: "hexpm", optional: false]}], "hexpm", "3ad58ae787e9c7c94dd7ceda3b587ec2c64604563e049b2a0e8baafae832addb"},
  "deep_merge": {:hex, :deep_merge, "1.0.0", "b4aa1a0d1acac393bdf38b2291af38cb1d4a52806cf7a4906f718e1feb5ee961", [:mix], [], "hexpm", "ce708e5f094b9cd4e8f2be4f00d2f4250c4095be93f8cd6d018c753894885430"},
  "earmark_parser": {:hex, :earmark_parser, "1.4.10",
"6603d7a603b9c18d3d20db69921527f82ef09990885ed7525003c7fe7dc86c56", [:mix], [], "hexpm", "8e2d5370b732385db2c9b22215c3f59c84ac7dda7ed7e544d7c459496ae519c0"}, 5 | "ex_doc": {:hex, :ex_doc, "0.22.2", "03a2a58bdd2ba0d83d004507c4ee113b9c521956938298eba16e55cc4aba4a6c", [:mix], [{:earmark_parser, "~> 1.4.0", [hex: :earmark_parser, repo: "hexpm", optional: false]}, {:makeup_elixir, "~> 0.14", [hex: :makeup_elixir, repo: "hexpm", optional: false]}], "hexpm", "cf60e1b3e2efe317095b6bb79651f83a2c1b3edcb4d319c421d7fcda8b3aff26"}, 6 | "libgraph": {:hex, :libgraph, "0.13.3", "20732b7bafb933dcf7351c479e03076ebd14a85fd3202c67a1c197f4f7c2466b", [:mix], [], "hexpm", "78f2576eef615440b46f10060b1de1c86640441422832052686df53dc3c148c6"}, 7 | "makeup": {:hex, :makeup, "1.0.3", "e339e2f766d12e7260e6672dd4047405963c5ec99661abdc432e6ec67d29ef95", [:mix], [{:nimble_parsec, "~> 0.5", [hex: :nimble_parsec, repo: "hexpm", optional: false]}], "hexpm", "2e9b4996d11832947731f7608fed7ad2f9443011b3b479ae288011265cdd3dad"}, 8 | "makeup_elixir": {:hex, :makeup_elixir, "0.14.1", "4f0e96847c63c17841d42c08107405a005a2680eb9c7ccadfd757bd31dabccfb", [:mix], [{:makeup, "~> 1.0", [hex: :makeup, repo: "hexpm", optional: false]}], "hexpm", "f2438b1a80eaec9ede832b5c41cd4f373b38fd7aa33e3b22d9db79e640cbde11"}, 9 | "nimble_parsec": {:hex, :nimble_parsec, "0.6.0", "32111b3bf39137144abd7ba1cce0914533b2d16ef35e8abc5ec8be6122944263", [:mix], [], "hexpm", "27eac315a94909d4dc68bc07a4a83e06c8379237c5ea528a9acff4ca1c873c52"}, 10 | "propcheck": {:hex, :propcheck, "1.2.1", "c3517ef4ef6b408b466956498388abcc9a2ae406cd96d55528a3371951454f2b", [:mix], [{:libgraph, "~> 0.13", [hex: :libgraph, repo: "hexpm", optional: false]}, {:proper, "~> 1.3", [hex: :proper, repo: "hexpm", optional: false]}], "hexpm", "4780e6df2478b3aafa89bc39cafccdbf85444092c324fa08f6ea39ac8f33c672"}, 11 | "proper": {:hex, :proper, "1.3.0", "c1acd51c51da17a2fe91d7a6fc6a0c25a6a9849d8dc77093533109d1218d8457", [:make, :mix, :rebar3], [], "hexpm", 
"4aa192fccddd03fdbe50fef620be9d4d2f92635b54f55fb83aec185994403cbc"}, 12 | } 13 | -------------------------------------------------------------------------------- /papers/p2195-masson.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/moosecodebv/dog_sketch/955dbb15bae95b4a0191398c45e5d3d878ba28df/papers/p2195-masson.pdf -------------------------------------------------------------------------------- /test/dog_sketch_test.exs: -------------------------------------------------------------------------------- 1 | defmodule DogSketchTest do 2 | end 3 | -------------------------------------------------------------------------------- /test/simple_dog_sketch_test.exs: -------------------------------------------------------------------------------- 1 | defmodule SimpleDogSketchTest do 2 | use ExUnit.Case 3 | doctest DogSketch 4 | 5 | alias DogSketch.{SimpleDog, ExactDog} 6 | 7 | use PropCheck 8 | 9 | property "quantile within error bounds of exact" do 10 | forall {error, values, quantile} <- 11 | {float(0.0, 1.0), non_empty(list(non_neg_float())), float(0.0, 1.0)} do 12 | sd_quantile = 13 | Enum.reduce(values, SimpleDog.new(error: error), fn val, acc -> 14 | SimpleDog.insert(acc, val) 15 | end) 16 | |> SimpleDog.quantile(quantile) 17 | 18 | exact_quantile = 19 | Enum.reduce(values, ExactDog.new(), fn val, acc -> 20 | ExactDog.insert(acc, val) 21 | end) 22 | |> ExactDog.quantile(quantile) 23 | 24 | abs(sd_quantile / exact_quantile - 1) <= error 25 | end 26 | end 27 | 28 | property "merging is lossless" do 29 | forall {error, values, quantile} <- 30 | {float(0.0, 1.0), non_empty(list(non_neg_float())), float(0.0, 1.0)} do 31 | sd_quantile = 32 | Enum.reduce(values, SimpleDog.new(error: error), fn val, acc -> 33 | SimpleDog.insert(acc, val) 34 | end) 35 | |> SimpleDog.quantile(quantile) 36 | 37 | merged_quantile = 38 | Enum.reduce(values, SimpleDog.new(error: error), fn val, acc -> 39 | new_sd = 
          SimpleDog.new(error: error) |> SimpleDog.insert(val)
        SimpleDog.merge(new_sd, acc)
      end)
      |> SimpleDog.quantile(quantile)

      sd_quantile == merged_quantile
    end
  end
end

--------------------------------------------------------------------------------
/test/test_helper.exs:
--------------------------------------------------------------------------------
ExUnit.start()

--------------------------------------------------------------------------------