├── license.txt ├── pg_stats_aggregate.sql └── readme.md /license.txt: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2018 Chucky Ellison 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in 13 | all copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. 22 | 23 | Based, with permission, on code from John D. Cook's 24 | https://www.johndcook.com/blog/skewness_kurtosis/ -------------------------------------------------------------------------------- /pg_stats_aggregate.sql: -------------------------------------------------------------------------------- 1 | -- Copyright (c) 2018 Chucky Ellison 2 | -- Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation 3 | -- files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, 4 | -- modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the 5 | -- Software is furnished to do so, subject to the following conditions: 6 | -- The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 7 | -- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE 8 | -- WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 9 | -- COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 10 | -- ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 11 | 12 | -- based on code from John D. Cook's https://www.johndcook.com/blog/skewness_kurtosis/ with permission 13 | 14 | -- Notes: 15 | -- All the math is done at double precision; can easily be changed to work with numeric or single, or whatever. 16 | -- Kurtosis and skewness are NOT corrected for statistical bias 17 | 18 | -------------------------------------------------- 19 | -- MAKE SURE you're not using any of these names! 20 | 21 | -- drop aggregate if exists stats_agg(double precision); 22 | -- drop function if exists _stats_agg_accumulator(_stats_agg_accum_type, double precision); 23 | -- drop function if exists _stats_agg_finalizer(_stats_agg_accum_type); 24 | -- drop type if exists _stats_agg_result_type; 25 | -- drop type if exists _stats_agg_accum_type; 26 | 27 | create type _stats_agg_accum_type AS ( 28 | n bigint, 29 | min double precision, 30 | max double precision, 31 | m1 double precision, 32 | m2 double precision, 33 | m3 double precision, 34 | m4 double precision 35 | ); 36 | 37 | create type _stats_agg_result_type AS ( 38 | count bigint, 39 | min double precision, 40 | max double precision, 41 | mean double precision, 42 | variance double precision, 43 | skewness double precision, 44 | kurtosis double precision 45 | ); 46 | 47 | create or replace function _stats_agg_accumulator(_stats_agg_accum_type, double precision) 48 | returns _stats_agg_accum_type AS ' 49 | DECLARE 50 | a ALIAS FOR $1; 51 | x alias for $2; 52 | n1 bigint; 53 | delta double precision; 54 | delta_n double precision; 55 | delta_n2 double precision; 56 | term1 double precision; 57 | BEGIN 58 | if x IS NOT NULL then 59 | n1 = a.n; 60 | a.n = a.n + 1; 61 | delta = x - a.m1; 62 | delta_n = delta / a.n; 63 | delta_n2 = delta_n * delta_n; 64 | term1 = delta * delta_n * n1; 65 | a.m1 = a.m1 + delta_n; 66 | a.m4 = a.m4 + term1 * delta_n2 * (a.n*a.n - 3*a.n + 3) + 6 * delta_n2 * a.m2 - 4 * delta_n * a.m3; 67 | a.m3 = a.m3 + term1 * delta_n * (a.n - 2) - 3 * delta_n * a.m2; 68 | a.m2 = a.m2 + term1; 69 | a.min = least(a.min, x); 70 | a.max = greatest(a.max, x); 71 | end if; 72 | 73 | RETURN a; 74 | END; 75 | ' 76 | language plpgsql; 77 | 78 | create or replace function _stats_agg_finalizer(_stats_agg_accum_type) 79 | returns _stats_agg_result_type AS ' 80 | BEGIN 81 | RETURN row( 82 | $1.n, 83 | $1.min, 84 | $1.max, 85 | case when $1.n = 0 then null else $1.m1 end, 86 | case when $1.n = 0 then null else $1.m2 / nullif(($1.n - 1.0), 0) end, 87 | case when $1.m2 = 0 then null else sqrt($1.n) * $1.m3 / nullif(($1.m2 ^ 1.5), 0) end, 88 | case when $1.m2 = 0 then null else $1.n * $1.m4 / nullif(($1.m2 * $1.m2) - 3.0, 0) end 89 | ); 90 | END; 91 | ' 92 | language plpgsql; 93 | 94 | create aggregate stats_agg(double precision) ( 95 | sfunc = _stats_agg_accumulator, 96 | stype = _stats_agg_accum_type, 97 | finalfunc = _stats_agg_finalizer, 98 | initcond = '(0,,, 0, 0, 0, 0)' 99 | ); 100 | -------------------------------------------------------------------------------- /readme.md: -------------------------------------------------------------------------------- 1 | # PostgreSQL Stats Aggregate (stats_agg) 2 | 3 | `stats_agg` is an aggregate function for PostgreSQL that works like pre-existing aggregates (`min(x)`, `max()`, `avg()`, etc.), but computes multiple stats at once and returns them all. The stats returned are `count`, `min`, `max`, `mean`, `variance`, `skewness`, and `kurtosis`, though others could be added. 4 | 5 | I needed an aggregate returning skewness and kurtosis, and instead of making separate functions for each requiring multiple passes to compute the mean etc., I thought it would be better to have one aggregate that returns everything in one pass. 6 | 7 | Thanks to John D. Cook and [his blog post](https://www.johndcook.com/blog/skewness_kurtosis/) explaining how this could be done. 8 | 9 | Tested on PostgreSQL 9.6.6. 10 | 11 | ## Installation 12 | Just run the `pg_stats_aggregate.sql` file. This will create a new aggregate function `stats_agg(double precision)` that returns a row type of basic stats. 13 | 14 | ## Examples 15 | If you just want one of the results, you can grab it: 16 | ```sql 17 | with data as ( 18 | select unnest(array[1, 2, 3, 4, 5, 10, 20]) n 19 | ) 20 | select (stats_agg(n)).skewness from data 21 | -- "skewness" 22 | -- 1.3687777084534 23 | ``` 24 | Or, because it returns all the stats as a row, you can get the results as separate columns and do things with them. 25 | ```sql 26 | with data as ( 27 | select unnest(array[1, 2, 3, 4, 5, 10, 20]) n 28 | ) 29 | select (stats_agg(n)).* from data 30 | -- "count";"min";"max";"mean";"variance";"skewness";"kurtosis" 31 | -- 7;1;20;6.42857142857143;44.2857142857143;1.3687777084534;0.521266042317031 32 | ``` --------------------------------------------------------------------------------