├── docs
│   ├── img
│   │   └── aws-dismal
│   │       ├── f-150.png
│   │       ├── pony.jpg
│   │       ├── wb-graph.jpg
│   │       ├── cat-dollars.jpg
│   │       ├── power-plant.png
│   │       ├── staggered-vs-cliffed.png
│   │       ├── dismal-aws-storage-rates.png
│   │       ├── dismal-aws-fig1-mike-twitter.png
│   │       └── dismal-aws-smoothed-storage.png
│   ├── tufte-css
│   │   ├── et-book
│   │   │   ├── et-book-bold-line-figures
│   │   │   │   ├── et-book-bold-line-figures.eot
│   │   │   │   ├── et-book-bold-line-figures.ttf
│   │   │   │   └── et-book-bold-line-figures.woff
│   │   │   ├── et-book-roman-line-figures
│   │   │   │   ├── et-book-roman-line-figures.eot
│   │   │   │   ├── et-book-roman-line-figures.ttf
│   │   │   │   └── et-book-roman-line-figures.woff
│   │   │   ├── et-book-roman-old-style-figures
│   │   │   │   ├── et-book-roman-old-style-figures.eot
│   │   │   │   ├── et-book-roman-old-style-figures.ttf
│   │   │   │   └── et-book-roman-old-style-figures.woff
│   │   │   ├── et-book-semi-bold-old-style-figures
│   │   │   │   ├── et-book-semi-bold-old-style-figures.eot
│   │   │   │   ├── et-book-semi-bold-old-style-figures.ttf
│   │   │   │   └── et-book-semi-bold-old-style-figures.woff
│   │   │   └── et-book-display-italic-old-style-figures
│   │   │       ├── et-book-display-italic-old-style-figures.eot
│   │   │       ├── et-book-display-italic-old-style-figures.ttf
│   │   │       └── et-book-display-italic-old-style-figures.woff
│   │   └── tufte.css
│   └── aws-dismal-guide.md
├── aws
│   ├── cloudstats_dim_account_names.sql
│   ├── cloudstats_full_load.sql
│   ├── dag-full-load.txt
│   ├── cloudstats_bill_create.sql
│   ├── cloudstats_incremental_load.sql
│   ├── cloudstats_buying_efficiency.sql
│   ├── cloudstats_cpu_ratecard.sql
│   ├── cloudstats_bill_full_load.sql
│   ├── cloudstats_bill_incremental_load.sql
│   ├── cloudstats_dim_pricing_buckets.sql
│   ├── cloudstats_create.sql
│   ├── redshift_athena_compat_udfs.sql
│   ├── cloudstats_bill_iceberg.sql
│   ├── cloudstats_dim_aws_products.sql
│   ├── cloudstats_wow_cost_movers.sql
│   ├── cloudstats_01.sql
│   └── cloudstats_00.sql
├── README.md
└── LICENSE

(Binary assets, the images under docs/img/aws-dismal and the et-book font files, are omitted here; they live at https://raw.githubusercontent.com/aristus/cloudstats/HEAD/.)
--------------------------------------------------------------------------------
/aws/cloudstats_dim_account_names.sql:
--------------------------------------------------------------------------------
-- friendly names for your accounts. You can scrape this from the aws cli
-- or simply make up your own tags.
drop table if exists cloudstats.cloudstats_dim_account_names cascade;

create table cloudstats.cloudstats_dim_account_names (
    account_name varchar,
    account_nick varchar,
    account_id varchar primary key,
    account_owner varchar
);

-- fill in your own rows and uncomment, eg:
-- insert into cloudstats.cloudstats_dim_account_names values
--   ('Your Company Name, Inc', 'Production', '1234567890', 'sre-oncall@example.com');

--------------------------------------------------------------------------------
/aws/cloudstats_full_load.sql:
--------------------------------------------------------------------------------
/*
Total truncation and full load of data up to 2 years back. See _incremental_load.sql for the daily job.
*/

truncate table cloudstats.cloudstats;

insert into cloudstats.cloudstats
select * from cloudstats.cloudstats_01
where
    -- date >= date_format(date_add('year', -2, to_timestamp('{{ get_batch_id(ts) }}')), '%Y-%m-%d')
    date >= date_format(date_add('year', -2, current_date), '%Y-%m-%d')
    and line_item_type in ('SavingsPlanCoveredUsage', 'DiscountedUsage', 'Usage')
    and usage > 0
    and cost > 0
;

--------------------------------------------------------------------------------
/aws/dag-full-load.txt:
--------------------------------------------------------------------------------
# Execute these files, in order, to accomplish a full load of the ETL:

redshift_athena_compat_udfs.sql
cloudstats_dim_account_names.sql
cloudstats_dim_pricing_buckets.sql
cloudstats_dim_instance_specs.sql
cloudstats_dim_aws_products.sql
cloudstats_00.sql
cloudstats_01.sql
cloudstats_create.sql
cloudstats_full_load.sql
cloudstats_bill_create.sql
cloudstats_bill_full_load.sql
cloudstats_bill_iceberg.sql
cloudstats_cpu_ratecard.sql
cloudstats_wow_cost_movers.sql
cloudstats_buying_efficiency.sql


# A daily incremental build only needs to run these queries:

cloudstats_incremental_load.sql
cloudstats_bill_incremental_load.sql

--------------------------------------------------------------------------------
/aws/cloudstats_bill_create.sql:
--------------------------------------------------------------------------------
-- used to recreate the official monthly AWS bill. Note that unlike the
-- main-sequence cloudstats table, this groups by month and includes
-- all line item types. There's also jiggery-pokery in the product_name.
-- It also merges various lineitems into a "Charges" line as seen
-- in the official invoice.
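--
-- A sketch of the intended use once loaded (the month literal is just an
-- example): monthly totals per bill line, to reconcile against the PDF invoice.
--
--   select date_month, bill_line, sum(unblended_cost) as billed
--   from cloudstats.cloudstats_bill
--   where date_month = '2023-01'
--   group by 1, 2
--   order by 1, 2;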

drop table if exists cloudstats.cloudstats_bill cascade;

create table cloudstats.cloudstats_bill (
    date_month varchar,
    line_item_type varchar,
    bill_line varchar,
    account_id varchar,
    account_name varchar,
    billing_entity varchar,
    legal_entity varchar,
    invoice_id varchar,
    product_name varchar,
    product_code varchar,
    location varchar,
    product_description varchar,
    pricing_bucket varchar,
    pricing_unit varchar,
    pricing_regime varchar,
    accrual_cost float,
    usage float,
    unblended_cost float,
    cost_without_edp float,
    days_in_month int,
    public_cost float,
    total_savings float
)

compound sortkey(date_month)
;

--------------------------------------------------------------------------------
/aws/cloudstats_incremental_load.sql:
--------------------------------------------------------------------------------
/*
Clear out data from N days back, then re-load from _01. This table is meant
to contain only billed usage rows and fully-discounted, amortized costs.
See the cloudstats_bill* files for a table that is used to recreate the formal
bill sent to accounting.

We go 65 days back because the root table, the AWS CUR, is a running account
and not an immutable log. During normal operation rows can be added or modified
weeks back as various accounting things happen. This is also why the _00 and _01
stages of the pipeline are implemented as views and not materialized tables.

todo: this does not work in Athena, which has no delete and no drop partition. :(
*/

delete from cloudstats.cloudstats
--where date >= date_format(date_add('day', -65, to_timestamp('{{ get_batch_id(ts) }}')), '%Y-%m-%d')
where date >= date_format(date_add('day', -65, current_date), '%Y-%m-%d')
;

insert into cloudstats.cloudstats
select * from cloudstats.cloudstats_01
where
    -- date >= date_format(date_add('day', -65, to_timestamp('{{ get_batch_id(ts) }}')), '%Y-%m-%d')
    date >= date_format(date_add('day', -65, current_date), '%Y-%m-%d')
    and line_item_type in ('SavingsPlanCoveredUsage', 'DiscountedUsage', 'Usage')
    and usage > 0
    and cost > 0
;

--------------------------------------------------------------------------------
/aws/cloudstats_buying_efficiency.sql:
--------------------------------------------------------------------------------
-- a quick pass over which products and resources can be purchased under which buying options

drop table if exists cloudstats.cloudstats_buying_options cascade;
create table cloudstats.cloudstats_buying_options (
    product_code varchar,
    pricing_bucket varchar,
    buy_options varchar
);
insert into cloudstats.cloudstats_buying_options values
('AmazonEC2', 'Compute', 'Reserved, SavingsPlan, Spot, ProvisionedIO'),
('AmazonRDS', 'Compute', 'Reserved'),
('AmazonES', 'Compute', 'Reserved'),
('AmazonElastiCache', 'Compute', 'Reserved'),
('AWSLambda', 'Compute', 'SavingsPlan'),
('AWSFargate', 'Compute', 'SavingsPlan'),
('AmazonRedshift', 'Compute', 'Reserved'),
('AmazonSageMaker', 'Compute', 'SavingsPlan'),
('AmazonDynamoDB', 'IO', 'ProvisionedIO')
;

drop view if exists cloudstats.cloudstats_buying_efficiency cascade;
create view cloudstats.cloudstats_buying_efficiency as

/* --athena
with cloudstats_buying_options as (
    select * from (
        values
        row('AmazonEC2', 'Compute', ARRAY['Reserved', 'SavingsPlan', 'Spot', 'ProvisionedIO']),
        row('AmazonRDS', 'Compute', ARRAY['Reserved']),
        --row('AWSELB', 'Compute', ARRAY['OnDemand']),
        row('AmazonES', 'Compute', ARRAY['Reserved']),
        row('AmazonElastiCache', 'Compute', ARRAY['Reserved']),
        row('AWSLambda', 'Compute', ARRAY['SavingsPlan']),
        row('AWSFargate', 'Compute', ARRAY['SavingsPlan']),
        row('AmazonRedshift', 'Compute', ARRAY['Reserved']),
        row('AmazonSageMaker', 'Compute', ARRAY['SavingsPlan']),
        row('AmazonDynamoDB', 'IO', ARRAY['ProvisionedIO'])
    ) tmp (product_code, pricing_bucket, buy_options)
)
*/

select
    a.date_month,
    a.product_code,
    a.pricing_regime,
    a.pricing_bucket,
    a.pricing_unit,
    --athena: array_join(b.buy_options, ', ') as buy_options,
    b.buy_options,
    cast(sum(a.cost) as int) as cost
from
    cloudstats.cloudstats a inner join cloudstats.cloudstats_buying_options b
    on a.product_code = b.product_code
    and a.pricing_bucket = b.pricing_bucket
where
    a.date >= date_format(date_add('month', -3, current_date), '%Y-%m')
    and a.pricing_regime = 'OnDemand'
group by 1,2,3,4,5,6
order by 7 desc;

--------------------------------------------------------------------------------
/aws/cloudstats_cpu_ratecard.sql:
--------------------------------------------------------------------------------
/**
Get the per-CPU-hour real rates paid on EC2 compute, broken down by pricing regime and processor.

NB: per-CPU rates are not perfect for aggregating across the many hundreds of instance types. But at
scale, and presuming the mix of special features like GPUs, SSDs, and high-clock-speed chips doesn't
skew the averages too much, it's good enough for procurement work.

**/
create or replace view cloudstats.cloudstats_cpu_ratecard as
select
    date,
    -- eg, "AWS Graviton2: r6g"
    coalesce(compute_processor_line, 'Unknown') || ': ' || coalesce(compute_instance_type_family, '??') as vendor_family,

    compute_os,
    compute_software,
    compute_instance_type_family,
    pricing_regime,
    coalesce(compute_processor_vendor, 'Unknown') as compute_processor_vendor,
    sum(usage) as instance_hours,
    sum(cost) as cost,
    sum(usage * compute_processor_vcpu) as cpu_hours, --todo: default to 0, or 1, or leave null?
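    -- Worked example with made-up numbers: 1,000 instance-hours on 4-vCPU boxes
    -- at a total cost of $136 gives 4,000 cpu_hours, a rate of 136 / 4000 =
    -- $0.034 per CPU-hour, ie 3.4 rate_cents.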
    sum(cost) / sum(usage * compute_processor_vcpu) as rate,
    sum(cost) / sum(usage * compute_processor_vcpu) * 100 as rate_cents,

    /* Athena */
    --athena:approx_percentile(cost / (usage * compute_processor_vcpu), 0.25) * 100 as rate_cents_p25,
    --athena:approx_percentile(cost / (usage * compute_processor_vcpu), 0.50) * 100 as rate_cents_p50,
    --athena:approx_percentile(cost / (usage * compute_processor_vcpu), 0.75) * 100 as rate_cents_p75,
    --athena:approx_percentile(cost / (usage * compute_processor_vcpu), 0.95) * 100 as rate_cents_p95

    /* Redshift */
    percentile_cont(0.25) within group (order by cost / (usage * compute_processor_vcpu)) * 100 as rate_cents_p25,
    percentile_cont(0.50) within group (order by cost / (usage * compute_processor_vcpu)) * 100 as rate_cents_p50,
    percentile_cont(0.75) within group (order by cost / (usage * compute_processor_vcpu)) * 100 as rate_cents_p75,
    percentile_cont(0.95) within group (order by cost / (usage * compute_processor_vcpu)) * 100 as rate_cents_p95

from cloudstats.cloudstats
where
    -- only look at compute costs, to exclude things like attached disks & network
    pricing_bucket = 'Compute'

    --todo: it's possible that we can get good insight by looking at other products
    -- with reserved and SP options, but for now let's keep it simple.
    and product_code = 'AmazonEC2'
    --and date >= date_format(date_add('year', -1, to_timestamp('{{ get_batch_id(ts) }}')), '%Y-%m-%d')
    and date >= date_format(date_add('year', -1, current_date), '%Y-%m-%d')

group by 1,2,3,4,5,6,7
order by 1 desc


--------------------------------------------------------------------------------
/aws/cloudstats_bill_full_load.sql:
--------------------------------------------------------------------------------
-- used to recreate the official monthly AWS bill. Note that unlike the
-- main-sequence cloudstats table, this groups by month and includes
-- all line item types. There's also jiggery-pokery in the product_name.
-- It also consolidates various lineitems into a "Charges" line, as seen
-- in the official invoice.

truncate table cloudstats.cloudstats_bill;

insert into cloudstats.cloudstats_bill
select
    date_month,
    line_item_type,

    (case
        when line_item_type in ('Usage', 'SavingsPlanCoveredUsage', 'SavingsPlanRecurringFee', 'SavingsPlanUpfrontFee', 'RIFee', 'Refund', 'DiscountedUsage', 'Fee') then 'Charges'
        when line_item_type in ('SavingsPlanNegation') then 'Savings Plan'
        when line_item_type in ('Credit') then 'Credits'
        else line_item_type
    end) as bill_line,

    account_id,
    account_name || ' (' || account_id || ')' as account_name, -- name, not nick as in the accrual tables.
    billing_entity,
    legal_entity,
    invoice_id,
    bill_product_name as product_name, -- see _01 stage for an explanation.
    bill_product_code as product_code,
    location,
    product_description,

    -- the bucket for non-charge bill_lines is usually Unknown, so thwack in the product_code.
    -- this is because pricing_unit is null for these lines.
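    -- (eg, a Tax line on S3 usage would otherwise land in the 'Unknown' bucket;
    -- with this rewrite it shows up under 'AmazonS3'.)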
    (case
        when pricing_bucket = 'Unknown' then product_code
        else pricing_bucket
    end) as pricing_bucket,

    pricing_unit,
    pricing_regime,

    sum(cost) as accrual_cost, -- this is the "true" cost used in the usage/accrual tables in the main branch
    sum(usage) as usage,

    -- this is the proper cost field for billing reports. Sum this, group by date_month,
    -- bill_line, product_name, and invoice_id, and that's your basic bill summary.
    sum(unblended_cost) as unblended_cost,
    sum(cost_without_edp) as cost_without_edp,
    count(distinct date) as days_in_month,

    -- for calculating your total savings, including Spot, SP, RI, EDP, PRD, etc etc etc.
    -- this compared to sum(unblended_cost) is the overall scorecard for the entire effort.
    sum(public_cost) as public_cost,
    sum(public_cost) - sum(unblended_cost) as total_savings

from cloudstats.cloudstats_01
where
    -- date_month >= date_format(date_add('year', -2, to_timestamp('{{ get_batch_id(ts) }}')), '%Y-%m')
    date_month >= date_format(date_add('year', -2, current_date), '%Y-%m')
group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
;

--------------------------------------------------------------------------------
/aws/cloudstats_bill_incremental_load.sql:
--------------------------------------------------------------------------------
-- used to recreate the official monthly AWS bill. Note that unlike the
-- main-sequence cloudstats table, this groups by month and includes
-- all line item types. There's also jiggery-pokery in the product_name.
-- It also consolidates various lineitems into a "Charges" line, as seen
-- in the official invoice.

delete from cloudstats.cloudstats_bill
where
    -- date_month >= date_format(date_add('day', -65, to_timestamp('{{ get_batch_id(ts) }}')), '%Y-%m')
    date_month >= date_format(date_add('day', -65, current_date), '%Y-%m');


insert into cloudstats.cloudstats_bill
select
    date_month,
    line_item_type,

    (case
        when line_item_type in ('Usage', 'SavingsPlanCoveredUsage', 'SavingsPlanRecurringFee', 'SavingsPlanUpfrontFee', 'RIFee', 'Refund', 'DiscountedUsage', 'Fee') then 'Charges'
        when line_item_type in ('SavingsPlanNegation') then 'Savings Plan'
        when line_item_type in ('Credit') then 'Credits'
        else line_item_type
    end) as bill_line,

    account_id,
    account_name || ' (' || account_id || ')' as account_name, -- name, not nick as in the accrual tables.
    billing_entity,
    legal_entity,
    invoice_id,
    bill_product_name as product_name, -- see _01 stage for an explanation.
    bill_product_code as product_code,
    location,
    product_description,

    -- the bucket for non-charge bill_lines is usually Unknown, so thwack in the product_code.
    -- this is because pricing_unit is null for these lines.
    (case
        when pricing_bucket = 'Unknown' then product_code
        else pricing_bucket
    end) as pricing_bucket,

    pricing_unit,
    pricing_regime,

    sum(cost) as accrual_cost, -- this is the "true" cost used in the usage/accrual tables in the main branch
    sum(usage) as usage,

    -- this is the proper cost field for billing reports. Sum this, group by date_month,
    -- bill_line, product_name, and invoice_id, and that's your basic bill summary.
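    -- A minimal sketch of that summary query (the month literal is an example):
    --
    --   select date_month, bill_line, product_name, invoice_id,
    --          sum(unblended_cost) as billed
    --   from cloudstats.cloudstats_bill
    --   where date_month = '2023-01'
    --   group by 1,2,3,4;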
    sum(unblended_cost) as unblended_cost,
    sum(cost_without_edp) as cost_without_edp,
    count(distinct date) as days_in_month,

    -- for calculating your total savings, including Spot, SP, RI, EDP, PRD, etc etc etc.
    -- this compared to sum(unblended_cost) is the overall scorecard for the entire effort.
    sum(public_cost) as public_cost,
    sum(public_cost) - sum(unblended_cost) as total_savings

from cloudstats.cloudstats_01
where
    -- date_month >= date_format(date_add('day', -65, to_timestamp('{{ get_batch_id(ts) }}')), '%Y-%m')
    date_month >= date_format(date_add('day', -65, current_date), '%Y-%m')

group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
;

--------------------------------------------------------------------------------
/aws/cloudstats_dim_pricing_buckets.sql:
--------------------------------------------------------------------------------
/**
This maps the cleaned-up pricing_unit field made in stage 00 to less
granular pricing "buckets". Items in a bucket should generally be
comparable to each other. Eg, an hour on an EC2 instance is comparable
to an hour on an RDS or ELB instance. A GB-month is a GB-month; the
only difference is the rate you pay.

Bucketing items like this allows you to see your usage of the basic
resources you consume: Compute, Network, Storage, etc, across all
products.

I'm not 100% happy with this ontology. Monthly is only there to keep
out confusing spikes on timeseries graphs.

I'm also not happy with the Request bucket. As the years go on I see
AWS putting more and more usage into hard-to-compare a la carte requests.
**/
drop table if exists cloudstats.cloudstats_dim_pricing_buckets cascade;

create table cloudstats.cloudstats_dim_pricing_buckets (pricing_unit varchar primary key, pricing_bucket varchar);

insert into cloudstats.cloudstats_dim_pricing_buckets values
('CPU-Month', 'Monthly'),
('DNS-Month', 'Monthly'),
('Face-Month', 'Monthly'),
('Hardware-Month', 'Monthly'),
('Object-Month', 'Monthly'),
('SecurityCheck-Month', 'Monthly'),
('Software-Month', 'Monthly'),
('Support-Month', 'Monthly'),
('Tag-Month', 'Monthly'),
('User-Month', 'Monthly'),

('GB-Hour', 'Storage'),
('GB-Month', 'Storage'),
('Obj-Month', 'Storage'), -- todo: monthly? or best kept in Storage.
('UsageRecord-month', 'Storage'),

('Alarms', 'Request'),
('ConfigRuleEvaluations', 'Request'),
('ConfigurationItemRecorded','Request'),
('Count', 'Request'),
('Events', 'Request'),
('HostedZone', 'Request'), --todo: monthly?
('Keys', 'Request'),
('Message', 'Request'),
('Messages', 'Request'),
('Metric Datapoints', 'Request'), -- Prometheus. todo: are these requests or processing? Not denominated in amt of data.
('Metrics', 'Request'),
('Notifications', 'Request'),
('PutRequest', 'Request'),
('Request', 'Request'),
('Secrets', 'Request'),
('State Transitions', 'Request'),
('URL', 'Request'),

('Hour', 'Compute'),

('GB-Network', 'Network'),

('GB-IO', 'IO'), -- S3, EFS
('GiBps-mo', 'IO'), -- EBS
('IOPS-Mo', 'IO'), --todo: better as monthly?
('IOs', 'IO'),
('ReadCapacityUnit-Hrs', 'IO'),
('ReadRequestUnits', 'IO'),
('WriteCapacityUnit-Hrs', 'IO'),
('WriteRequestUnits', 'IO'),

('GB-Processed', 'Processing'), -- CloudTrail, etc
('GB-Second', 'Processing') -- Lambda
;

--------------------------------------------------------------------------------
/aws/cloudstats_create.sql:
--------------------------------------------------------------------------------
drop table if exists cloudstats.cloudstats cascade;
CREATE TABLE cloudstats.cloudstats (

    /* dimensions */
    year character (4) ENCODE lzo,
    month character varying(2) ENCODE lzo,
    date_month character (7) ENCODE lzo,
    day_of_week character (7) ENCODE lzo,
    account_id character varying(256) ENCODE lzo,
    account_name character varying(256) ENCODE lzo,
    account_display_name character varying(515) ENCODE lzo,
    billing_entity character varying(256) ENCODE lzo,
    legal_entity character varying(256) ENCODE lzo,
    invoice_id character varying(256) ENCODE lzo,
    resource_name character varying(2000) ENCODE lzo,
    line_item_type character varying(256) ENCODE lzo,
    usage_type character varying(256) ENCODE lzo,
    usage_type_prefix character varying(256) ENCODE lzo,
    region_code character varying(256) ENCODE lzo,
    location character varying(256) ENCODE lzo,
    operation character varying(256) ENCODE lzo,
    currency_code character varying(256) ENCODE lzo,
    pricing_bucket character varying(256) ENCODE raw,
    product_code character varying(256) ENCODE lzo,
    product_name character varying(256) ENCODE lzo,
    bill_product_code character varying(256) ENCODE lzo,
    bill_product_name character varying(256) ENCODE lzo,
    product_family character varying(256) ENCODE lzo,
    product_group character varying(256) ENCODE lzo,
    product_servicecode character varying(256) ENCODE lzo,
    pricing_unit character varying(256) ENCODE lzo,
    pricing_regime character varying(11) ENCODE lzo,
    product_description character varying(256) ENCODE lzo,
    savings_plan_arn character varying(256) ENCODE lzo,
    reserved_instance_arn character varying(256) ENCODE lzo,
    compute_instance_spec character varying(385) ENCODE lzo,
    compute_instance_type character varying(384) ENCODE lzo,
    compute_instance_family character varying(384) ENCODE lzo,
    compute_instance_type_family character varying(384) ENCODE lzo,
    compute_availability_zone character varying(256) ENCODE lzo,
    compute_capacity_status character varying(256) ENCODE lzo,
    compute_class character varying(256) ENCODE lzo,
    compute_processor_name character varying(256) ENCODE lzo,
    compute_processor_vendor character varying(256) ENCODE lzo,
    compute_processor_line character varying(256) ENCODE lzo,
    compute_processor_vcpu integer ENCODE az64,
    compute_storage character varying(256) ENCODE lzo,
    compute_os character varying(256) ENCODE lzo,
    compute_software character varying(514) ENCODE lzo,
    storage_class character varying(256) ENCODE lzo,
    storage_volume_type character varying(256) ENCODE lzo,
    storage_volume_api character varying(256) ENCODE lzo,
    storage_user_volume character varying(256) ENCODE lzo,
    days_in_month integer ENCODE az64,

    /* measures */
    record_cnt bigint ENCODE az64,
    usage double precision ENCODE raw,
    cost double precision ENCODE raw,
    cost_mrr double precision ENCODE raw,
    unblended_cost double precision ENCODE raw,
    cost_without_edp double precision ENCODE raw,
    edp_discount double precision ENCODE raw,
    total_discount double precision ENCODE raw,
    private_rate_discount double precision ENCODE raw,
    public_cost double precision ENCODE raw,
    rate double precision ENCODE raw,
    public_rate double precision ENCODE raw,
    storage_total_gb double precision ENCODE raw,
    storage_total_tb double precision ENCODE raw,

    /* partition / sort key */
    date character (10) ENCODE raw
) DISTSTYLE AUTO
SORTKEY (date);

--------------------------------------------------------------------------------
/aws/redshift_athena_compat_udfs.sql:
--------------------------------------------------------------------------------
/*
Wrapper functions for compatibility with certain Athena/Trino/Presto functions.

Cloudstats is written to be as SQL-neutral as possible, but there are funny differences that
can't be avoided. Fortunately, Redshift supports simple scalar UDFs so we can fake most of
it with the sleazy tricks below.

NOTE: there is no way to make an aggregate UDF in Redshift, so the approx_percentile()
calls in cloudstats_cpu_ratecard.sql have some commented out Athena code.
*/


/**
if(predicate, true_case [, false_case])

Be aware that this does not do lazy evaluation of the second or third arguments!
**/
create or replace function if (boolean, varchar, varchar) returns varchar immutable as $$ select (case when $1 then $2 else $3 end) $$ language sql;
create or replace function if (boolean, float, float) returns float immutable as $$ select (case when $1 then $2 else $3 end) $$ language sql;
create or replace function if (boolean, int, int) returns int immutable as $$ select (case when $1 then $2 else $3 end) $$ language sql;
create or replace function if (boolean, varchar) returns varchar immutable as $$ select (case when $1 then $2 end) $$ language sql;
create or replace function if (boolean, float) returns float immutable as $$ select (case when $1 then $2 end) $$ language sql;
create or replace function if (boolean, int) returns int immutable as $$ select (case when $1 then $2 end) $$ language sql;

/**
regexp_extract(haystack, pattern [, ignored_arg])

Implements Trino-compatible regexp_extract() in Redshift. Returns null on failure.
This IGNORES the third argument, which in Trino allows you to specify the group to
capture. Redshift's regexp_substr() doesn't support that. You will have to be
careful to use non-capturing groups, eg, "(?: )" in your pattern because this
wrapper function will ALWAYS return the first capturing group.

For example:

## Trino:
select regexp_extract('foobar', '(foo)(bar)', 1);
--> 'foo'

select regexp_extract('foobar', '(foo)(bar)', 2); ## extracts second capturing group
--> 'bar'

select regexp_extract('foobar', '(?:foo)(bar)', 1); ## non-capturing group on 'foo'
--> 'bar'

## Redshift
select regexp_extract('foobar', '(foo)(bar)', 1);
--> 'foo'

select regexp_extract('foobar', '(foo)(bar)', 2); ## WRONG!
--> 'foo'

select regexp_extract('foobar', '(?:foo)(bar)', 1); ## non-capturing group on 'foo'
--> 'bar'
**/
create or replace function regexp_extract(varchar, varchar, int) -- $3 is ignored.
returns varchar immutable
as $$
    select nullif(regexp_substr($1, $2, 1, 1, 'pe'), '')
$$ language sql;

create or replace function regexp_extract(varchar, varchar)
returns varchar immutable
as $$
    select regexp_extract($1, $2, 0) -- $3 is ignored.
$$ language sql;


/**
regexp_like(haystack, pattern)

Redshift's regexp_instr() returns the ones-indexed position of the matching substring,
or 0 if no match. Trino's regexp_like() only returns true or false.
**/
create or replace function regexp_like(varchar, varchar) returns boolean immutable as $$
    select regexp_instr($1, $2, 1, 1, 1, 'p') > 0
$$ language sql;


/**
date_format(date | timestamp, format)

VERY SIMPLISTIC implementation of Athena/MySQL's date_format() in Redshift.
only supports year, month, day, dayofweek and dayabbv.

select date_format(current_date, '%Y-%m-%d')
--> '2022-04-18'

select date_format(current_date, '%w (%a)')
--> '2 (Mon)'  -- NB: %w maps to Redshift's 'D', which runs 1 (Sun) to 7 (Sat), not MySQL's 0-6.
**/
create or replace function date_format(timestamp, varchar) returns varchar immutable as $$
    select
        to_char($1,
            replace(
            replace(
            replace(
            replace(
            replace($2,
                '%Y', 'YYYY'),
                '%m', 'MM'),
                '%d', 'DD'),
                '%w', 'D'),
                '%a', 'Dy') -- 'Dy' gives mixed case ('Mon'), matching MySQL's %a
        )
$$ language sql;

create or replace function date_format(date, varchar) returns varchar immutable as $$
    select date_format(cast($1 as timestamp), $2)
$$ language sql;


--------------------------------------------------------------------------------
/aws/cloudstats_bill_iceberg.sql:
--------------------------------------------------------------------------------
/*
This builds on the cloudstats_bill table by:
1) adding iceberg_cost to show the effect of various classes of discount strategies
2) creating artificial "Reserved Instances" bill_line records to show the savings from RI
3) adding an iceberg_bill_line to break out interesting charges like Support
*/

create or replace view cloudstats.cloudstats_bill_iceberg as

with bill as (
    select
        date_month,
        bill_line,
        pricing_bucket,
        billing_entity,
        line_item_type,
        product_code,
        product_description,

        sum(accrual_cost) as accrual_cost,
        sum(unblended_cost) as unblended_cost,
        sum(public_cost) as public_cost

    from cloudstats.cloudstats_bill

    -- exclude AWS marketplace charges, which generally have no discounts
    where billing_entity = 'AWS'
    group by 1,2,3,4,5,6,7
),

discounted_usage as (
    select
        date_month,
        'Reserved Instances' as bill_line,
        pricing_bucket,
        billing_entity,
        'RiDiscount' as line_item_type,
        product_code,
        product_description,

        sum(accrual_cost) as accrual_cost,

        -- fun fact: EDP is applied *after* savings plans and RIs. This means that
        -- if you use $2.00 worth of compute under an SP/RI that carries a 50%
        -- discount, then you use up $1.00 of the commitment. But under a 10% EDP,
        -- your net charged amount is only $0.90. So the effective savings from
        -- an RI must add back in the savings from edp, so you don't double count.
        --
        -- confusingly, "DiscountedUsage" is reserved instance usage.
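        --
        -- Worked example with the figures above: public_cost = $2.00,
        -- cost_without_edp = $1.00 (after the 50% RI), unblended_cost = $0.90
        -- (after the 10% EDP). The RI line below books -1 * (2.00 - 1.00) =
        -- -$1.00, leaving the remaining $0.10 attributed to the EdpDiscount
        -- lines instead of being counted twice.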
        sum((case
            when line_item_type = 'DiscountedUsage'
            then -1 * (public_cost - cost_without_edp)
            else unblended_cost
        end)) as unblended_cost,
        sum(public_cost) as public_cost

    from cloudstats.cloudstats_bill
    where
        line_item_type in ('DiscountedUsage', 'RiVolumeDiscount')

        -- exclude AWS marketplace charges, which generally have no discounts
        and billing_entity = 'AWS'
    group by 1,2,3,4,5,6,7

),

unioned as (
    select * from bill
    union
    select * from discounted_usage
),

calced as (

    select
        cast(concat(date_month, '-01') as timestamp) as date_month, --needed to get QS to treat this as a proper date
        bill_line,
        pricing_bucket,
        billing_entity,
        line_item_type,
        product_code,
        product_description,

        public_cost,

        --"iceberg" cost:
        -- 1) all positive charges logged as the net "true" cost including discounts
        -- 2) all negative lines (credits, etc)
        -- makes for a pretty graph to show the effect of each discount type.
        (case
            -- negative values, sum of discounts
            when bill_line = 'Reserved Instances' then unblended_cost
            when line_item_type in ('EdpDiscount', 'PrivateRateDiscount', 'Credit', 'SavingsPlanNegation') then unblended_cost

            -- positive values, with all discounts applied
            when line_item_type in ('SavingsPlanCoveredUsage', 'DiscountedUsage', 'Usage') then accrual_cost
            when line_item_type in ('Tax') then unblended_cost
            else 0
        end) as iceberg_cost,

        -- pull some interesting things out of Charges as separate bill lines for the iceberg chart
        (case
            when product_code like '%Support%' then 'Support'
            when line_item_type = 'Credit' then 'Credits'
            when bill_line = 'Charges' then 'Net Charges'
            else bill_line
        end) as iceberg_bill_line

    from unioned
),

final_cte as (
    select
        date_month,
        iceberg_bill_line,
        product_code,
        product_description,
        (case when iceberg_cost >= 0 then 'Net Charges' else 'Discounts' end) as iceberg_type,
        sum(iceberg_cost) as iceberg_cost
    from calced
    group by 1,2,3,4,5
)

select
    *,

    iceberg_cost / sum(iceberg_cost) over (partition by date_month, iceberg_type) as percent_of_type,
    abs(iceberg_cost) / sum(abs(iceberg_cost)) over (partition by date_month) as percent_of_total

from final_cte
where iceberg_cost != 0
order by 1 desc, 3 asc
;

--------------------------------------------------------------------------------
/aws/cloudstats_dim_aws_products.sql:
--------------------------------------------------------------------------------
/*
Not all line items have the product name. Sometimes the product name changes,
like Elastic Search --> Open Search. Fun, huh? So here we map them back in.
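
For example, usage under product_code 'AmazonES' can arrive with a stale or
empty product name; cloudstats_01 does

    coalesce(e.product_name, a.product_name) as product_name

against this table, so those rows always report 'Amazon OpenSearch Service'
and grouping by product_name stays stable across the rename.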
*/
drop table if exists cloudstats.cloudstats_dim_aws_products cascade;

create table cloudstats.cloudstats_dim_aws_products (product_code varchar primary key, product_name varchar);

insert into cloudstats.cloudstats_dim_aws_products values
('AmazonCloudFront', 'Amazon CloudFront'),
('AWSDataTransfer', 'AWS Data Transfer'),
('AmazonECRPublic', 'Amazon Elastic Container Registry Public'),
('AmazonSES', 'Amazon Simple Email Service'),
('AWSBackup', 'AWS Backup'),
('AmazonRDS', 'Amazon Relational Database Service'),
('AmazonSageMaker', 'Amazon SageMaker'),
('AWSGlue', 'AWS Glue'),
('AmazonCloudWatch', 'AmazonCloudWatch'),
('awskms', 'AWS Key Management Service'),
('AmazonEC2', 'Amazon Elastic Compute Cloud'),
('AmazonApiGateway', 'Amazon API Gateway'),
('AmazonRoute53', 'Amazon Route 53'),
('AmazonSNS', 'Amazon Simple Notification Service'),
('AWSQueueService', 'Amazon Simple Queue Service'),
('AWSSecretsManager', 'AWS Secrets Manager'),
('AmazonECR', 'Amazon EC2 Container Registry (ECR)'),
('AWSSecurityHub', 'AWS Security Hub'),
('AmazonDocDB', 'Amazon DocumentDB (with MongoDB compatibility)'),
('AWSCloudTrail', 'AWS CloudTrail'),
('AmazonVPC', 'Amazon Virtual Private Cloud'),
('AmazonS3', 'Amazon Simple Storage Service'),
('awswaf', 'AWS WAF'),
('AmazonGuardDuty', 'Amazon GuardDuty'),
('AmazonSimpleDB', 'Amazon SimpleDB'),
('AmazonECS', 'Amazon Elastic Container Service'),
('AmazonLightsail', 'Amazon Lightsail'),
('AmazonQuickSight', 'Amazon QuickSight'),
('AWSELB', 'Elastic Load Balancing'),
('AmazonES', 'Amazon OpenSearch Service'),
('AmazonDynamoDB', 'Amazon DynamoDB'),
('CodeBuild', 'CodeBuild'),
('AWSEvents', 'CloudWatch Events'),
('AWSLambda', 'AWS Lambda'),
('AmazonInspectorV2', 'Amazon Inspector'),
('AmazonMCS', 'Amazon Keyspaces (for Apache Cassandra)'),
('AmazonElastiCache', 'Amazon ElastiCache'),
('AWSTransfer', 'AWS Transfer Family'),
('AmazonQLDB', 'Amazon Quantum Ledger Database'),
('AWSCostExplorer', 'AWS Cost Explorer'),
('AWSAmplify', 'AWS Amplify'),
('AmazonEFS', 'Amazon Elastic File System'),
('AWSConfig', 'AWS Config'),
('AmazonKinesisFirehose', 'Amazon Kinesis Firehose'),
('AWSXRay', 'AWS X-Ray'),
('AmazonStates', 'AWS Step Functions'),
('AmazonKinesis', 'Amazon Kinesis'),
('AWSCodePipeline', 'AWS CodePipeline'),
('AWSGlobalAccelerator', 'AWS Global Accelerator'),
('AWSCodeArtifact', 'AWS CodeArtifact'),
('AmazonPinpoint', 'Amazon Pinpoint'),
('AmazonWorkMail', 'AmazonWorkMail'),
('AmazonEKS', 'Amazon Elastic Container Service for Kubernetes'),
('AWSCodeCommit', 'AWS CodeCommit'),
('AWSSystemsManager', 'AWS Systems Manager'),
('AWSDatabaseMigrationSvc', 'AWS Database Migration Service'),
('AmazonAthena', 'Amazon Athena'),
('AmazonCognito', 'Amazon Cognito'),
('AWSElementalMediaStore', 'AWS Elemental MediaStore'),
('AWSServiceCatalog', 'AWS Service Catalog'),
('AWSElementalMediaConvert', 'AWS Elemental MediaConvert'),
('AmazonIVS', 'Amazon Interactive Video Service'),
('ElasticMapReduce', 'Amazon Elastic MapReduce'),
('AmazonKinesisAnalytics', 'Amazon Kinesis Analytics'),
('AmazonRegistrar', 'Amazon Registrar'),
('AmazonRedshift', 'Amazon Redshift'),
('AppFlow', 'Amazon AppFlow'),
('AmazonTimestream', 'Amazon Timestream'),
('AmazonChimeFeatures', 'Amazon Chime Features'),
('AmazonChime', 'Amazon Chime'),
('ComputeSavingsPlans', 'Savings Plans for AWS Compute usage'),
('AWSShield', 'AWS Shield'),
('AmazonMQ', 'Amazon MQ'),
('AWSAppSync', 'AWS AppSync'),
('AmazonLocationService', 'Amazon Location Service'),
('AmazonFSx', 'Amazon FSx'),
('AWSDirectoryService', 'AWS Directory Service'),
('AmazonPrometheus', 'Amazon Managed Service for Prometheus'),
('AmazonWorkSpaces', 'Amazon WorkSpaces'),
('AmazonMSK', 'Amazon Managed Streaming for Apache Kafka'),
('AmazonRekognition', 'Amazon Rekognition'),
('comprehend', 'Amazon Comprehend'),
('AmazonCloudSearch', 'Amazon CloudSearch'),
('AmazonKendra', 'Amazon Kendra'),
('AmazonSumerian', 'Amazon Sumerian'),
('AmazonWorkDocs', 'Amazon WorkDocs'),
('ContactCenterTelecomm', 'Contact Center Telecommunications (service sold by AMCS, LLC) '),
('AmazonMWAA', 'Amazon Managed Workflows for Apache Airflow')
;

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Cloudstats ETL

This ETL works from the raw AWS Cost & Usage Report (CUR). It is implemented as a series of SQL statements with no out-of-band processing. See `aws/dag-full-load.txt` for the order of queries.

See also this [very long article about cloud cost control](https://carlos.bueno.org/2023/03/aws-dismal-guide.html) in general and how to use Cloudstats on AWS in particular.

The hope is that Cloudstats gives you a clean basis for your capacity planning and efficiency work. This work is based on experience with large AWS accounts over many years. Its accuracy is not guaranteed, but there are built-in ways to check its completeness and consistency.

### Main output
There are two main output tables:

1. `cloudstats` is a per-day aggregation and cleanup of only the AWS usage lineitems, with [_fully-amortized accrual-basis_](https://en.wikipedia.org/wiki/Basis_of_accounting) cost measurements.

2. `cloudstats_bill` is a per-month _cash-basis_ aggregation of all lineitems (including taxes, credits, etc etc) reverse-engineered from the observed behavior of official AWS invoices. (See the code for more details.) It is meant to predict the official invoice to within 1%. It also provides much more detail, eg explanations of individual Taxes and Credits, and is updated daily.

### Satellite views
There are several views laid on top of these main tables for specialized analyses:

1. `cloudstats_bill_iceberg` does further attribution of discounts and credits. For example, backing out the separate effects of EDP, private rates, and reserved instance usage.

2. `cloudstats_cpu_ratecard` renders the historical per-CPU-hour rates you paid, aggregated on instance type and `pricing_regime`. This includes percentile bands for real-world Spot prices, amortized RIs and Savings Plans, and so on.

3. `cloudstats_buying_efficiency` attempts to show new opportunities for using RI, SP, and Spot based on historical OnDemand usage.

4. `cloudstats_wow_cost_movers` does week-over-week comparisons of spend broken out by a few AWS and client-specific dimensions, to find interesting gainers and losers.
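As a quick sketch of how these views get used (the sort and `limit` here are arbitrary; columns are as defined in `aws/cloudstats_wow_cost_movers.sql`), pulling the biggest week-over-week movers:

```sql
select current_week, product_code, operation, pricing_bucket,
       current_avg_cost, avg_last_3, delta_3, delta_3_percent
from cloudstats.cloudstats_wow_cost_movers
order by delta_3_abs desc
limit 20;
```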

### Dimension tables
There are several dimension tables for client-specific needs and to fix up hilarious data bugs in the raw table:

1. `cloudstats_dim_account_names.sql` is a list of your internal account aliases.

2. `cloudstats_dim_pricing_buckets.sql` maps the cleaned-up `pricing_unit` field into a handful of buckets like 'Compute', 'Storage', and so on.

3. `cloudstats_dim_instance_specs.sql` provides machine specs based on an artificial field called `instance_spec`. The CUR's original `instance_type` field has lots of junk in it, and many records in the CUR have missing values for important fields like physical processor. For example, an `instance_type` of "ml.c5.xlarge-Training" will have an `instance_spec` of "c5.xlarge". This table is compiled from public sources and should be updated from time to time.

4. `cloudstats_dim_aws_products.sql` maps `product_code` to the official product name as it appears in the AWS bill. Not all of the CUR's lineitems have this information (!). This is also hand-compiled from public sources and may be incomplete.

### Quirks and gotchas
#### This ETL is unusual for two reasons:

1. The incremental load re-writes the previous *65 days* of data, each day. This is because the CUR is a running account, not an immutable log. The source records in your raw CUR table may be updated or added weeks after the fact. For example, the `invoice_id` field is null until after the month's accounting close, usually 3 days after the calendar month. Once the invoice is generated by AWS, all of those records are updated with the invoice ID. Credits and monthly charges are often added post-hoc.

2. The first two stages of the dag, `cloudstats_00` and `cloudstats_01`, are implemented as *SQL views*. There are no staging tables. This was for speed of development. This ETL was originally written for columnstore databases with partitioning & predicate pushdown (eg, Trino or Athena), so it was not much of a performance problem. In Redshift this pattern is arguably a bad idea. See the `where` clause at the bottom of `cloudstats_00` for more notes.

#### Anomaly detection
My general opinion is that you can't (or shouldn't) do automated anomaly detection until you have a very good idea about what is normal. A simple WoW comparator will very likely trigger tons of false positives. Simplistic thresholds will also generate alarm fatigue. Who cares if the foobar service increased by 212% (ZOMG!) when its cost basis is five bucks?

My favorite example is storage cost. You will find exhaustive complaints about AWS's 31-day pricing logic in the comments. The main `cloudstats` table works hard to smooth it out.

But storage also comes in tiers, which are charged in descending order during the month. If you have a PB of data in S3, the first day of the month will show a little spike in total cost and effective rate paid. The following few days will show a lower rate, then the rest of the month will plateau at the lowest rate. Next month the cycle starts again. Even a z-score can be fooled by that.

Even worse, the automatic processes that move data to and from cheaper tiers can happen at any time. Storage anomaly detection should be run against *usage & storage tier* (eg `storage_total_tb` and `storage_class`) and not cost. And even then it needs to be smart enough to not pull the fire alarm for "normal" tier shifts.

Another pitfall is changing tags.
Engineers often rename their clusters & services on their own schedule and for their own reasons. A simplistic detector grouped by service name will panic twice: when the `foobar` service suddenly drops to zero and when the new `foobar_v2` service pops out of nowhere with significant cost.

There is no general solution to anomaly detection over changing infrastructure. All I can recommend is to put a human in the loop before you email the world.

--------------------------------------------------------------------------------
/aws/cloudstats_wow_cost_movers.sql:
--------------------------------------------------------------------------------
/*
A quick analysis of cost changes week over week. This groups every combination of
product, operation, pricing bucket, and whatever site-specific tags you add. Then
it compares the average daily cost over the previous 7-day period with the averages
from 3, 6, 9, and 12 weeks back.

*/
create or replace view cloudstats.cloudstats_wow_cost_movers as

/*
Don't use current_date as the pivot. Use the latest partition that has data but
minus 3 days. This is an unfortunate thing with the AWS feed: cost data can
often roll in 1-3 days late, especially for large S3 buckets.
*/
with latest_known_good_date as (
    select
        date_add('day', -3, cast(max(date) as timestamp))
            as max_dataset_date
    from cloudstats.cloudstats
    where
        date >= date_format(date_add('day', -10, current_date), '%Y-%m-%d')
),

cost_per_day as (
    select
        date as date,

        floor(date_diff('day', cast(date as timestamp), max_dataset_date) / 7)
            as weeks_back,

        max_dataset_date as max_dataset_date,
        product_code as product_code,
        operation as operation,
        pricing_bucket as pricing_bucket,

        /**** ADD YOUR SITE-SPECIFIC DIMENSIONS HERE & remember to update the group by. *****/

        sum(cost) as cost,
        sum(cost_mrr) as cost_mrr,
        sum(usage) as usage
    from
        cloudstats.cloudstats
        inner join latest_known_good_date on 1=1
    where
        -- 13 weeks back (7 * 13 = 91)
        date >= date_format(date_add('day', -91, max_dataset_date), '%Y-%m-%d')
    group by 1,2,3,4,5,6
),

cost_per_week as (
    select
        -- technically not the "week of" but the latest date in the 7-day period.
        -- also, you can't just do max(date) because if you group by many dimensions,
        -- some of those compound groups will not have data on all days. Instead start
        -- with max_dataset_date and subtract weeks_back
        --max(date) as week_of,
        cast(date(date_add('week', cast(-1 * weeks_back as int), max_dataset_date)) as varchar)
            as week_of,

        cast(date(max_dataset_date) as varchar) as current_week,
        weeks_back as weeks_back,
        product_code as product_code,
        operation as operation,
        pricing_bucket as pricing_bucket,

        /**** ADD YOUR SITE-SPECIFIC DIMENSIONS HERE *****/

        count(distinct date) as num_days,
        sum(cost) / 7 as avg_cost_daily,
        sum(usage) / 7 as avg_usage_daily
    from cost_per_day
    group by 1,2,3,4,5,6
),

consolidated as (
    select
        current_week as current_week,
        product_code as product_code,
        operation as operation,
        pricing_bucket as pricing_bucket,

        /**** ADD YOUR SITE-SPECIFIC DIMENSIONS HERE *****/

        /*
        SQL standard avg() and count() ignore null values. Handy for this kind of aggregation.
86 | However when avg() gets zero rows, it returns null. And any value divided by null is null.
87 | Thank you for subscribing to SQL Facts!
88 | */
89 | avg(if(weeks_back = 0, avg_cost_daily, null)) as current_avg_cost,
90 | avg(if(weeks_back between 1 and 3, avg_cost_daily, null)) as avg_last_3,
91 | avg(if(weeks_back between 1 and 6, avg_cost_daily, null)) as avg_last_6,
92 | avg(if(weeks_back between 1 and 9, avg_cost_daily, null)) as avg_last_9,
93 | avg(if(weeks_back between 1 and 12, avg_cost_daily, null)) as avg_last_12
94 |
95 | from cost_per_week
96 | group by 1,2,3,4
97 | ),
98 |
99 | /*
100 | In the beginning, the SQL null-value was created. This has made a lot of people very
101 | angry and been widely regarded as a bad move.
102 | */
103 | consolidated_nonnull as (
104 | select
105 | current_week as current_week,
106 | product_code as product_code,
107 | operation as operation,
108 | pricing_bucket as pricing_bucket,
109 |
110 | /**** ADD YOUR SITE-SPECIFIC DIMENSIONS HERE *****/
111 |
112 | -- min value is epsilon to avoid divide-by-zero
113 | if(current_avg_cost is null, 0.00001, current_avg_cost) as current_avg_cost,
114 | if(avg_last_3 is null, 0.00001, avg_last_3) as avg_last_3,
115 | if(avg_last_6 is null, 0.00001, avg_last_6) as avg_last_6,
116 | if(avg_last_9 is null, 0.00001, avg_last_9) as avg_last_9,
117 | if(avg_last_12 is null, 0.00001, avg_last_12) as avg_last_12
118 |
119 | from consolidated
120 | )
121 |
122 | select
123 | abs(current_avg_cost - avg_last_3) as delta_3_abs, -- for sorting by absolute change
124 | current_week as current_week,
125 | product_code as product_code,
126 | operation as operation,
127 | pricing_bucket as pricing_bucket,
128 |
129 | /**** ADD YOUR SITE-SPECIFIC DIMENSIONS HERE *****/
130 |
131 | current_avg_cost as current_avg_cost,
132 | current_avg_cost * 30.4375 as current_cost_mrr,
133 |
134 | current_avg_cost - avg_last_3 as delta_3,
135 | avg_last_3 as avg_last_3,
136 | (current_avg_cost - avg_last_3) / avg_last_3 as delta_3_percent,
137 | current_avg_cost - avg_last_6 as delta_6,
138 | avg_last_6 as avg_last_6,
139 | (current_avg_cost - avg_last_6) / avg_last_6 as delta_6_percent,
140 | current_avg_cost - avg_last_9 as delta_9,
141 | avg_last_9 as avg_last_9,
142 | (current_avg_cost - avg_last_9) / avg_last_9 as delta_9_percent,
143 | current_avg_cost - avg_last_12 as delta_12,
144 | avg_last_12 as avg_last_12,
145 | (current_avg_cost - avg_last_12) / avg_last_12 as delta_12_percent
146 | from consolidated_nonnull
147 | order by 1 desc
148 | ;
149 |
--------------------------------------------------------------------------------
/aws/cloudstats_01.sql:
--------------------------------------------------------------------------------
1 | /**
2 | Stage 01 of our main pipeline. This stage is generally meant to join against
3 | useful dimension tables. No aggregation of the data from 00.
4 | **/
5 | create or replace view cloudstats.cloudstats_01 as
6 |
7 |
8 | -- map in friendly AWS account names and nicks
9 | with b as (
10 | select
11 | account_id,
12 | account_name,
13 | account_nick
14 | from cloudstats.cloudstats_dim_account_names
15 | ),
16 |
17 | -- map the fixed-up pricing units to coarser-grained "buckets"
18 | c as (
19 | select
20 | pricing_unit,
21 | pricing_bucket
22 | from cloudstats.cloudstats_dim_pricing_buckets
23 | ),
24 |
25 | -- le sigh.
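-- (The group by below is a plain dedupe, no aggregates: the dim table can carry
-- duplicate rows per instance_spec.)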
26 | d as (
27 | select
28 | instance_spec,
29 | compute_class,
30 | processor_name,
31 | processor_vendor,
32 | processor_line,
33 | cast(vcpu as int) as processor_vcpu
34 | from cloudstats.cloudstats_dim_instance_specs
35 | group by 1,2,3,4,5,6
36 | ),
37 |
38 | -- product names, sigh
39 | e as (
40 | select
41 | product_code,
42 | product_name
43 | from cloudstats.cloudstats_dim_aws_products
44 | )
45 |
46 | select
47 | -- date field is last, for partitioning.
48 | year,
49 | month,
50 | date_month, -- 2021-09
51 | day_of_week, -- 1 (Mon)
52 | cloudstats_version,
53 | a.account_id as account_id,
54 | coalesce(b.account_name, 'Unknown') as account_name,
55 | coalesce(b.account_nick, 'Unknown') || ' (' || a.account_id || ')'
56 | as account_display_name,
57 | billing_entity,
58 | legal_entity,
59 | invoice_id,
60 | a.resource_name as resource_name,
61 | line_item_type,
62 | usage_type,
63 |
64 | -- the prefix can be useful for, eg, tracking UnusedBox
65 | regexp_extract(usage_type, '^([^:]+):', 1) as usage_type_prefix,
66 |
67 | region_code,
68 | location,
69 | operation,
70 | currency_code,
71 | coalesce(c.pricing_bucket, 'Unknown') as pricing_bucket,
72 | a.product_code as product_code,
73 | coalesce(e.product_name, a.product_name) as product_name,
74 |
75 | -- In the final bill AWS moves (some!) networking into a separate product, AWSDataTransfer.
76 | -- Tax lines in the CUR appear with the AWSDataTransfer code, but Usage and EdpDiscount lines
77 | -- do not. So, we gently rewrite lineitems with the "correct" product as it appears in the bill.
78 | --
79 | -- BUT we want to make sure that the usage/accrual tables in the main cloudstats table attribute
80 | -- charges like network to the usage that triggered them, while at the same time ensure the
81 | -- cloudstats_bill dag replicates the jiggery-pokery in the official PDF bill. Two different
82 | -- ways to denote "product". See also the virtual "Savings Plans for Compute" product that
83 | -- is used to mark non-usage charges for SP, which are multiproduct and can't be attributed
84 | -- to an individual product like RIFee can.
85 | (case
86 | when a.product_code in ('AWSELB', 'AmazonEC2', 'AmazonApiGateway', 'AmazonECR') and (
87 | (product_family = 'Data Transfer') or (line_item_type = 'EdpDiscount' and usage_type like '%DataTransfer-%-Bytes%')
88 | ) then 'AWSDataTransfer'
89 | else a.product_code
90 | end) as bill_product_code,
91 |
92 | (case
93 | when a.product_code in ('AWSELB', 'AmazonEC2', 'AmazonApiGateway', 'AmazonECR') and (
94 | (product_family = 'Data Transfer') or (line_item_type = 'EdpDiscount' and usage_type like '%DataTransfer-%-Bytes%')
95 | ) then 'AWS Data Transfer'
96 | else coalesce(e.product_name, a.product_name)
97 | end) as bill_product_name,
98 |
99 | -- product_family sometimes has weird data dropouts.
100 | (case
101 | when product_family = '' or product_family is null then coalesce(c.pricing_bucket, 'Unknown')
102 | else product_family
103 | end) as product_family,
104 |
105 | product_group,
106 | product_servicecode,
107 | a.pricing_unit as pricing_unit,
108 | pricing_regime,
109 | product_description,
110 | savings_plan_arn,
111 | reserved_instance_arn,
112 |
113 | compute_instance_spec,
114 | compute_instance_type,
115 | compute_instance_family,
116 | compute_instance_type_family,
117 |
118 | --todo: az information is its own deep rabbit hole.
119 | compute_availability_zone,
120 | compute_capacity_status,
121 |
122 | -- no coalesce() here. Strictly overwrite with the static instance_spec data.
123 | d.compute_class as compute_class, -- c5->c5, c5a->c5, etc.
124 | d.processor_name as compute_processor_name,
125 | d.processor_vendor as compute_processor_vendor,
126 | d.processor_line as compute_processor_line,
127 | d.processor_vcpu as compute_processor_vcpu,
128 |
129 | compute_storage, --todo: d.compute_storage?
130 |
131 | (case
132 | -- lots of non-EC2 services omit stuff like this. As of 2022 it's a decent bet that these
133 | -- things run Linux, but who knows what the future will bring.
134 | when c.pricing_bucket = 'Compute' and (compute_os = '' or compute_os is null) then 'Linux'
135 | else compute_os
136 | end) as compute_os,
137 |
138 | -- per-hour cost can vary quite a bit with pre-installed software. This is basically the only
139 | -- field that can distinguish them.
140 | -- "Linux: SQL Std"
141 | -- "Windows: SQL Ent"
142 | (case
143 | when compute_software in ('NA', '') or compute_software is null then (case when c.pricing_bucket = 'Compute' and (compute_os = '' or compute_os is null) then 'Linux' else compute_os end)
144 | else (case when c.pricing_bucket = 'Compute' and (compute_os = '' or compute_os is null) then 'Linux' else compute_os end) || ': ' || compute_software
145 | end) as compute_software,
146 |
147 | storage_class,
148 | storage_volume_type,
149 | storage_volume_api,
150 | storage_user_volume,
151 | days_in_month,
152 |
153 | /*
154 | ----------------------------------------------
155 | -- ADD YOUR SITE-SPECIFIC TAGS HERE ----------
156 | ----------------------------------------------
157 | */
158 | -- resource_tags_my_tag as my_tag,
159 |
160 | ----------------------------------------------
161 | -- MEASURES ----------------------------------
162 | ----------------------------------------------
163 | record_cnt,
164 | usage,
165 |
166 | /*
167 | Combine amortized cost with month-adjusted storage cost. For reconciling this
168 | dataset with the real bill, use unblended_cost instead. For the full crazy
169 | story, see "days_in_month" in stage 00.
170 |
171 |
172 | */
173 | (case
174 | when a.pricing_unit = 'GB-Month' then cost / 30.4375 * days_in_month
175 | else cost
176 | end) as cost,
177 |
178 | (case
179 | when a.pricing_unit = 'GB-Month' then cost / 30.4375 * days_in_month
180 | else cost
181 | end) * 30.4375 as cost_mrr,
182 |
183 | unblended_cost,
184 | cost_without_edp,
185 | edp_discount,
186 | total_discount,
187 | cast(private_rate_discount as float) as private_rate_discount,
188 | public_cost,
189 |
190 | /*
191 | The realized rates for aggregated line items, after the jiggery-pokery
192 | in the previous stage to normalize timed usage like "seconds" and "minutes" to
193 | "hour", accrual-basis accounting of cost, rolling up by date or hour, etc.
194 | Very often a product will have multiple rate tiers (0.05 for first 50GB,
195 | 0.04 for 51-200 GB) that make it tricky to aggregate rates at an earlier stage.
196 |
197 | Note that we are NOT using the month-adjusted storage cost here. AWS makes their
198 | "every month is 31 days" math work out by inflating the usage in short months.
199 | */
200 | cost / (usage + 1) as rate, -- the +1 is a crude divide-by-zero guard; it skews rates at tiny usage
201 | public_cost / (usage + 1) as public_rate,
202 |
203 | /*
204 | Speaking of inflated usage amounts...
205 | Using days_in_month, calculate total data under management.
206 | If you just sum(usage) where pricing_bucket='Storage', you will find that
207 | the amount stored will appear to jump 3-11% when crossing from one month
208 | to another. It will fool your anomaly detectors and confuse humans.
209 | Instead, rescale "usage" by the actual number of days in the month. 210 | */ 211 | if(a.pricing_unit = 'GB-Month', usage * days_in_month, 0) 212 | as storage_total_gb, 213 | 214 | if(a.pricing_unit = 'GB-Month', usage * days_in_month / 1024, 0) 215 | as storage_total_tb, 216 | 217 | --partition / sort field 218 | date 219 | 220 | from 221 | cloudstats.cloudstats_00 a 222 | left join b on a.account_id = b.account_id 223 | left join c on a.pricing_unit = c.pricing_unit 224 | left join d on a.compute_instance_spec = d.instance_spec 225 | left join e on a.product_code = e.product_code 226 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. 
For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 
-------------------------------------------------------------------------------- /docs/tufte-css/tufte.css: -------------------------------------------------------------------------------- 1 | /* Import ET Book styles 2 | adapted from https://github.com/edwardtufte/et-book/blob/gh-pages/et-book.css */ 3 | 4 | @charset "UTF-8"; 5 | 6 | @font-face { 7 | font-family: "et-book"; 8 | src: url("et-book/et-book-roman-line-figures/et-book-roman-line-figures.eot"); 9 | src: url("et-book/et-book-roman-line-figures/et-book-roman-line-figures.eot?#iefix") format("embedded-opentype"), url("et-book/et-book-roman-line-figures/et-book-roman-line-figures.woff") format("woff"), url("et-book/et-book-roman-line-figures/et-book-roman-line-figures.ttf") format("truetype"), url("et-book/et-book-roman-line-figures/et-book-roman-line-figures.svg#etbookromanosf") format("svg"); 10 | font-weight: normal; 11 | font-style: normal 12 | } 13 | 14 | @font-face { 15 | font-family: "et-book"; 16 | src: url("et-book/et-book-display-italic-old-style-figures/et-book-display-italic-old-style-figures.eot"); 17 | src: url("et-book/et-book-display-italic-old-style-figures/et-book-display-italic-old-style-figures.eot?#iefix") format("embedded-opentype"), url("et-book/et-book-display-italic-old-style-figures/et-book-display-italic-old-style-figures.woff") format("woff"), url("et-book/et-book-display-italic-old-style-figures/et-book-display-italic-old-style-figures.ttf") format("truetype"), url("et-book/et-book-display-italic-old-style-figures/et-book-display-italic-old-style-figures.svg#etbookromanosf") format("svg"); 18 | font-weight: normal; 19 | font-style: italic 20 | } 21 | 22 | @font-face { 23 | font-family: "et-book"; 24 | src: url("et-book/et-book-bold-line-figures/et-book-bold-line-figures.eot"); 25 | src: url("et-book/et-book-bold-line-figures/et-book-bold-line-figures.eot?#iefix") format("embedded-opentype"), url("et-book/et-book-bold-line-figures/et-book-bold-line-figures.woff") format("woff"), url("et-book/et-book-bold-line-figures/et-book-bold-line-figures.ttf") format("truetype"), url("et-book/et-book-bold-line-figures/et-book-bold-line-figures.svg#etbookromanosf") format("svg"); 26 | font-weight: bold; 27 | font-style: normal 28 | } 29 | 30 | @font-face { 31 | font-family: "et-book-roman-old-style"; 32 | src: url("et-book/et-book-roman-old-style-figures/et-book-roman-old-style-figures.eot"); 33 | src: url("et-book/et-book-roman-old-style-figures/et-book-roman-old-style-figures.eot?#iefix") format("embedded-opentype"), url("et-book/et-book-roman-old-style-figures/et-book-roman-old-style-figures.woff") format("woff"), url("et-book/et-book-roman-old-style-figures/et-book-roman-old-style-figures.ttf") format("truetype"), url("et-book/et-book-roman-old-style-figures/et-book-roman-old-style-figures.svg#etbookromanosf") format("svg"); 34 | font-weight: normal; 35 | font-style: normal; 36 | } 37 | 38 | /* Tufte CSS styles */ 39 | html { font-size: 15px; } 40 | 41 | body { width: 87.5%; 42 | margin-left: auto; 43 | margin-right: auto; 44 | padding-left: 12.5%; 45 | font-family: et-book, Palatino, "Palatino Linotype", "Palatino LT STD", "Book Antiqua", Georgia, serif; 46 | background-color: #fff; 47 | color: #111; 48 | max-width: 1400px; 49 | counter-reset: sidenote-counter; } 50 | 51 | h1 { font-weight: 400; 52 | margin-top: 4rem; 53 | margin-bottom: 1.5rem; 54 | font-size: 3.2rem; 55 | margin-left: -2rem; 56 | line-height: 1; } 57 | 58 | h2 { font-style: italic; 59 | font-weight: 500; 60 | margin-top: 2.3rem; 61 | 
margin-bottom: 0; 62 | margin-left: -2rem; 63 | font-size: 2.2rem; 64 | line-height: 1; 65 | color: #c00; } 66 | 67 | h3 { font-style: italic; 68 | font-weight: 400; 69 | font-size: 2rem; 70 | margin-top: 2rem; 71 | margin-bottom: 0; 72 | line-height: 1; 73 | color: #c00;} 74 | 75 | p.subtitle { font-style: italic; 76 | margin-top: 1rem; 77 | margin-bottom: 1rem; 78 | font-size: 1.8rem; 79 | display: block; 80 | line-height: 1; } 81 | 82 | .numeral { font-family: et-book-roman-old-style; } 83 | 84 | .toc-2, .toc-3 { 85 | line-height: 1.4; 86 | } 87 | 88 | a:link.toc-2, a:visited.toc-2 { 89 | font-weight: 500; 90 | color: #c00; 91 | font-style: italic; 92 | } 93 | 94 | .toc-3 { 95 | font-size: 1.1rem; 96 | } 97 | 98 | .toc-3 { margin-left: 1rem; } 99 | .toc-4 { margin-left: 2rem; } 100 | 101 | .toc-3::before { 102 | content: '⋅ '; 103 | } 104 | 105 | a:link.toc-2, a:link.toc-3 { 106 | text-decoration: none; 107 | } 108 | a:hover.toc-2, a:hover.toc-3 { 109 | text-decoration: underline; 110 | } 111 | 112 | .danger { color: red; } 113 | 114 | article { position: relative; 115 | padding: 5rem 0rem; } 116 | 117 | section { padding-top: 1rem; 118 | padding-bottom: 1rem; } 119 | 120 | p, ol, ul { font-size: 1.6rem; } 121 | 122 | p { line-height: 2.2rem; 123 | margin-top: 1.4rem; 124 | margin-bottom: 1.4rem; 125 | padding-right: 0; 126 | vertical-align: baseline; } 127 | 128 | /* Chapter Epigraphs */ 129 | div.epigraph { margin: 5em 0; } 130 | 131 | div.epigraph > blockquote { margin-top: 3em; 132 | margin-bottom: 3em; } 133 | 134 | div.epigraph > blockquote, div.epigraph > blockquote > p { font-style: italic; } 135 | 136 | div.epigraph > blockquote > footer { font-style: normal; } 137 | 138 | div.epigraph > blockquote > footer > cite { font-style: italic; } 139 | 140 | /* end chapter epigraphs styles */ 141 | 142 | blockquote { font-size: 1.6rem; } 143 | 144 | blockquote p { width: 50%; } 145 | 146 | blockquote footer { width: 50%; 147 | font-size: 1.1rem; 148 | text-align: right; } 149 | 150 | ol, ul { width: 85%; 151 | -webkit-padding-start: 5%; 152 | -webkit-padding-end: 5%; } 153 | 154 | li { padding: 0.2rem 0; } 155 | 156 | li p { 157 | margin-top: 0.2rem; 158 | margin-bottom: 1.2rem; 159 | } 160 | 161 | figure { padding: 0; 162 | border: 0; 163 | font-size: 100%; 164 | font: inherit; 165 | vertical-align: baseline; 166 | max-width: 55%; 167 | -webkit-margin-start: 0; 168 | -webkit-margin-end: 0; 169 | margin: 0 0 3em 0; } 170 | 171 | figcaption { float: right; 172 | clear: right; 173 | margin-right: -48%; 174 | margin-top: 0; 175 | margin-bottom: 0; 176 | font-size: 1.3rem; 177 | line-height: 1.6rem; 178 | vertical-align: baseline; 179 | position: relative; 180 | max-width: 40%; } 181 | 182 | figure.fullwidth figcaption { margin-right: 24%; } 183 | 184 | /* Links: replicate underline that clears descenders */ 185 | a:link, a:visited { color: inherit; } 186 | 187 | a:link { text-decoration: underline;} 188 | 189 | @media screen and (-webkit-min-device-pixel-ratio: 0) { a:link { background-position-y: 87%, 87%, 87%; } } 190 | 191 | a:link::selection { text-shadow: 0.03em 0 #b4d5fe, -0.03em 0 #b4d5fe, 0 0.03em #b4d5fe, 0 -0.03em #b4d5fe, 0.06em 0 #b4d5fe, -0.06em 0 #b4d5fe, 0.09em 0 #b4d5fe, -0.09em 0 #b4d5fe, 0.12em 0 #b4d5fe, -0.12em 0 #b4d5fe, 0.15em 0 #b4d5fe, -0.15em 0 #b4d5fe; 192 | background: #b4d5fe; } 193 | 194 | a:link::-moz-selection { text-shadow: 0.03em 0 #b4d5fe, -0.03em 0 #b4d5fe, 0 0.03em #b4d5fe, 0 -0.03em #b4d5fe, 0.06em 0 #b4d5fe, -0.06em 0 #b4d5fe, 0.09em 0 #b4d5fe, -0.09em 
0 #b4d5fe, 0.12em 0 #b4d5fe, -0.12em 0 #b4d5fe, 0.15em 0 #b4d5fe, -0.15em 0 #b4d5fe; 195 | background: #b4d5fe; } 196 | 197 | /* Sidenotes, margin notes, figures, captions */ 198 | img { max-width: 100%; } 199 | 200 | .sidenote, .marginnote { float: right; 201 | clear: right; 202 | margin-right: -60%; 203 | width: 50%; 204 | margin-top: 0; 205 | margin-bottom: 0; 206 | font-size: 1.3rem; 207 | line-height: 1.6rem; 208 | vertical-align: baseline; 209 | position: relative; } 210 | .sidenote img, .marginnote img { 211 | border-radius: 5px; 212 | } 213 | 214 | .table-caption { float:right; 215 | clear:right; 216 | margin-right: -60%; 217 | width: 50%; 218 | margin-top: 0; 219 | margin-bottom: 0; 220 | font-size: 1.0rem; 221 | line-height: 1.6; } 222 | 223 | .sidenote-number { counter-increment: sidenote-counter; } 224 | 225 | .sidenote-number:after, .sidenote:before { content: counter(sidenote-counter) " "; 226 | font-family: et-book-roman-old-style; 227 | position: relative; 228 | vertical-align: baseline; } 229 | 230 | .sidenote-number:after { content: counter(sidenote-counter); 231 | font-size: 1rem; 232 | top: -0.5rem; 233 | left: 0.1rem; } 234 | 235 | .sidenote:before { content: counter(sidenote-counter) " "; 236 | top: -0.5rem; } 237 | 238 | p, footer, table, div.table-wrapper-small, div.supertable-wrapper > p, div.booktabs-wrapper { width: 55%; } 239 | 240 | div.fullwidth, table.fullwidth { width: 100%; } 241 | 242 | div.table-wrapper { overflow-x: scroll; 243 | font-family: "Trebuchet MS", "Gill Sans", "Gill Sans MT", sans-serif; } 244 | 245 | @media screen and (max-width: 760px) { p, footer { width: 90%; } 246 | pre.code { width: 87.5%; } 247 | ul { width: 85%; } 248 | figure { max-width: 90%; } 249 | figcaption, figure.fullwidth figcaption { margin-right: 0%; 250 | max-width: none; } 251 | blockquote p, blockquote footer { width: 90%; }} 252 | 253 | .sans { font-family: "Gill Sans", "Gill Sans MT", Calibri, sans-serif; 254 | letter-spacing: .03em; } 255 | 256 | .code { font-family: Consolas, "Liberation Mono", Menlo, Courier, monospace; 257 | font-size: 1.0rem; 258 | --color: #22d; 259 | line-height: 1.4; 260 | --border: solid 1px #444; 261 | background-color: #eee; 262 | padding: 0.25rem 0.4rem; 263 | border-radius: 0.25rem; 264 | } 265 | 266 | h1 .code, h2 .code, h3 .code { font-size: 0.80em; } 267 | 268 | .marginnote .code, .sidenote .code { font-size: 1rem; } 269 | 270 | pre.code { width: 50%; 271 | margin-left: 2.5%; 272 | padding: 1rem 0rem; 273 | overflow-x: scroll; } 274 | 275 | .fullwidth { max-width: 90%; 276 | clear:both; } 277 | 278 | span.newthought { font-variant: small-caps; 279 | font-size: 1.2em; } 280 | 281 | input.margin-toggle { display: none; } 282 | 283 | label.sidenote-number { display: inline; } 284 | 285 | label.margin-toggle:not(.sidenote-number) { display: none; } 286 | 287 | @media (max-width: 760px) { label.margin-toggle:not(.sidenote-number) { display: none; } 288 | .sidenote, .marginnote { DDdisplay: none; } 289 | .margin-toggle:checked + .sidenote, 290 | .margin-toggle:checked + .marginnote { display: none; 291 | float: left; 292 | left: 1rem; 293 | clear: both; 294 | width: 95%; 295 | margin: 1rem 2.5%; 296 | vertical-align: baseline; 297 | position: relative; } 298 | label { cursor: pointer; } 299 | pre.code { width: 90%; 300 | padding: 0; } 301 | .table-caption { display: block; 302 | float: right; 303 | clear: both; 304 | width: 98%; 305 | margin-top: 1rem; 306 | margin-bottom: 0.5rem; 307 | margin-left: 1%; 308 | margin-right: 1%; 309 | 
vertical-align: baseline;
310 | position: relative; }
311 | div.table-wrapper, table, table.booktabs { width: 85%; }
312 | div.table-wrapper { border-right: 1px solid #efefef; }
313 | img { width: 100%; } }
314 |
315 |
316 |
317 | /* make the borders less wide on print, hide the side stuff. Yes, I am aware of the irony. */
318 | @media print {
319 | body {
320 | width: 100%;
321 | padding-left: 5%;
322 | font-size: 11px;
323 | }
324 |
325 | p, footer, table, div.table-wrapper-small, div.supertable-wrapper > p, div.booktabs-wrapper {
326 | width: 70%;
327 | }
328 |
329 | blockquote p {
330 | width: 65%;
331 | }
332 |
333 | .sidenote, .marginnote, figcaption {
334 | DDdisplay: none;
335 | }
336 |
337 | .sidenote, .marginnote, figcaption {
338 | margin-right: -35%;
339 | width: 32%;
340 | font-size: 1.0rem;
341 | line-height: 1.1rem;
342 | }
343 |
344 | p, ol, ul { font-size: 1.2rem; }
345 |
346 | p {
347 | line-height: 1.6rem;
348 | margin-top: 0.8rem;
349 | margin-bottom: 0.8rem;
350 | }
351 |
352 | .code {
353 | font-size: 0.8rem;
354 | }
355 | }
356 |
--------------------------------------------------------------------------------
/aws/cloudstats_00.sql:
--------------------------------------------------------------------------------
1 | /**
2 | Stage 00 of our main pipeline. This is mostly about data cleanup. The AWS CUR is a combined
3 | line-item bill and half of a system log unioned between dozens of AWS products written by
4 | hundreds of people over nearly 20 years, stuffing product-specific info into shared fields
5 | with only minimal reference to proper dataset design. Every log is different, with fields
6 | appearing and disappearing, their meanings and values changing over time with many
7 | adorable typos thrown in for fun.
8 |
9 | NOTE: See redshift_athena_compat_udfs.sql for functions like if() and regexp_extract().
10 | NOTE: Please see the where clause for notes on doing a full rebuild vs incremental.
11 | **/
12 |
13 | create or replace view cloudstats.cloudstats_00 as
14 | select
15 | ----------------------------------------------
16 | -- BASE DIMENSIONS ---------------------------
17 | ----------------------------------------------
18 | year as year, -- '2022'
19 | month as month, -- '1' (note! not '01')
20 | date_format(date(line_item_usage_start_date), '%Y-%m-%d')
21 | as date, -- '2022-01-01'
22 | date_trunc('week', date(line_item_usage_start_date))
23 | as date_week, -- '2021-12-27' (the Monday of the week containing 2022-01-01)
24 |
25 | date_format(date(line_item_usage_start_date), '%Y-%m')
26 | as date_month, -- '2022-01'
27 |
28 | date_format(date(line_item_usage_start_date), '%w (%a)')
29 | as day_of_week, -- '1 (Mon)'
30 |
31 | '0.1.0' as cloudstats_version,
32 | line_item_usage_account_id as account_id,
33 | bill_billing_entity as billing_entity,
34 | line_item_legal_entity as legal_entity,
35 | bill_invoice_id as invoice_id,
36 | --line_item_resource_id as resource_id, -- blows up cardinality
37 |
38 | /*
39 | resource_name: an artificial field. The resource_id is wonderful for diagnosis
40 | but too much for monitoring & analysis. Fortunately it has a handful of parsable
41 | patterns, so we can extract things like database name or elb pool.
42 | */
43 | coalesce(regexp_replace((case
44 | when line_item_resource_id = '' or line_item_resource_id is null
45 | then ''
46 |
47 | /* instances and volume ids. rewrite to the cluster name or other larger group.
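For example, a (hypothetical) id like 'i-0123456789abcdef0' maps to whatever
cluster-name tag you wire in below; untagged instances fall through to '' via
the outer coalesce.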
*/ 48 | when regexp_like(line_item_resource_id, '(i-[0-9a-f]{6,}|vol-[0-9a-f]{6,})|:instance/i-[0-9a-f]{6,}') 49 | then coalesce( 50 | /** ADD YOUR SITE-SPECIFIC TAGS HERE **/ 51 | --if(resource_tags_user_cluster_name not in ('', ' '), resource_tags_user_cluster_name), 52 | --if(resource_tags_user_name not in ('', ' '), resource_tags_user_name), 53 | null 54 | ) 55 | 56 | /* 57 | Common colon-slash pattern. depending on the subservice, dial in the specificity. 58 | 'arn:aws:rds:us-west-2:1234567890:subservice/foo/bar/baz' 59 | 'arn:aws:logs:us-east-1:1234567890:log-group:/aws/eks/foobar/cluster' 60 | */ 61 | when regexp_like(line_item_resource_id,':(?:awskms|cluster|crawler|directory|distribution|fargate|file-system|function|hostedzone|log-group|natgateway|storage-lens|table|task|userpool|workgroup|workspace)/') 62 | then regexp_extract(line_item_resource_id, ':([a-zA-Z0-9_\-]+:?/[a-zA-Z0-9_\-]+)', 1) 63 | 64 | /* 65 | Extract subservice/foo/bar 66 | 'arn:aws:ecr:us-east-1:1234567890:repository/foo/bar' --> repository/foo/bar 67 | */ 68 | when regexp_like(line_item_resource_id, ':(?:loadbalancer|repository)/') 69 | then regexp_extract(line_item_resource_id, ':([a-zA-Z0-9_\-]+/[a-zA-Z0-9_\-]+/[a-zA-Z0-9_\-]+)', 1) 70 | 71 | /* 72 | What about lovelies like this? they forgot their slashes :( 73 | 'arn:aws:rds:us-west-2:1234567890:db:foo-bar-baz' 74 | */ 75 | when regexp_like(line_item_resource_id, '^arn:[a-zA-Z0-9_\-]+:[a-zA-Z0-9_\-]+:[a-zA-Z0-9_\-]+:[0-9]+:.+$') 76 | then regexp_extract(line_item_resource_id, '^arn:[a-zA-Z0-9_\-]+:[a-zA-Z0-9_\-]+:[a-zA-Z0-9_\-]+:[0-9]+:(.+)$', 1) 77 | 78 | /* Everything else, eg S3 bucket names. Can explode cardinality! */ 79 | else line_item_resource_id 80 | end), 81 | '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}', -- blank out UUIDs 82 | ''), '') as resource_name, 83 | 84 | line_item_line_item_type as line_item_type, 85 | 86 | /* Clean up the cardinality of usage_type. Strip off regions, niggly usage details, etc. */ 87 | coalesce( 88 | regexp_extract(line_item_usage_type, '^.*(?:SpotUsage:|InstanceUsage:|BoxUsage:|EBSOptimized:)(.+)', 1), 89 | regexp_extract(line_item_usage_type, '((IN|OUT)-Bytes-(Internet|AWS))$', 1), 90 | regexp_extract(line_item_usage_type, '^[A-Z]{2,}[0-9]?-[A-Z[0-9]]{2,}[0-9]?-(.+(In|Out)-Bytes)', 1), 91 | regexp_extract(line_item_usage_type, '^[A-Z]{3,}[0-9]-(.+)', 1), 92 | regexp_extract(line_item_usage_type, '^(US|ZA|SA|EU|AP|JP|CA|ME|IN|AU|NA)-(.+)', 2), 93 | regexp_extract(line_item_usage_type, '^(us|za|sa|eu|ap|jp|ca|me|in|au|na)-(east|west|central|north|south|northeast|northwest|southeast|southwest)-[0-9]-(.+)', 3), 94 | line_item_usage_type 95 | ) as usage_type, 96 | 97 | line_item_usage_type as usage_type_orig, 98 | product_region as region_code, 99 | product_location as location, 100 | 101 | /* CreateCacheCluster:0000:SV000 --> CreateCacheCluster */ 102 | coalesce( 103 | regexp_extract(line_item_operation, '^([^:]+):(SV)?[0-9]+(:(SV)?[0-9]+)?$', 1), 104 | line_item_operation 105 | ) as operation, 106 | 107 | /* You'll thank me later. */ 108 | line_item_currency_code as currency_code, 109 | 110 | /* Marketplace sometimes logs a crazy app id in the product code. 
*/ 111 | if(bill_billing_entity = 'AWS Marketplace', 'AWSMarketplace', line_item_product_code) 112 | as product_code, 113 | 114 | /* NOTE: This gets overlaid from a dim table in stage 01 */ 115 | if(bill_billing_entity = 'AWS Marketplace', 'Marketplace', product_product_name) 116 | as product_name, 117 | 118 | if(bill_billing_entity = 'AWS Marketplace', 'Marketplace App', product_product_family) 119 | as product_family, 120 | 121 | if(bill_billing_entity = 'AWS Marketplace', product_product_name, product_group) 122 | as product_group, 123 | 124 | if(bill_billing_entity = 'AWS Marketplace', product_product_name, product_servicecode) 125 | as product_servicecode, 126 | 127 | /* 128 | Fix up the myriad of pricing_unit inconsistencies, in preparation for the pricing_bucket 129 | mapping in stage 01. Since this is a fallthrough statement against a high-cardinality field, 130 | the order matters. Most specific to most general. 131 | */ 132 | (case 133 | /* workspaces, marketplace, etc. */ 134 | when product_group_description = 'Billed by the month' and product_resource_type = 'Software' then 'Software-Month' 135 | when product_group_description = 'Billed by the month' and product_resource_type = 'Hardware' then 'Hardware-Month' 136 | when pricing_unit in ('Month', 'Months', 'month', 'months', 'Mo') and line_item_operation in ('GetLicense', 'Subscription') then 'Software-Month' 137 | 138 | /* AmazonCognito */ 139 | when product_product_family = 'User Pool MAU' then 'User-Month' 140 | 141 | /* 142 | PROCESSING is like Networking, but the main differences are 143 | 1: more intensive compute/transformation over the data, and 144 | 2: the movement of data can be terminal (eg, ingestion) 145 | todo: how to make raw GB and GB-Second usages comparable? 146 | */ 147 | when pricing_unit = 'GB' and line_item_product_code in ('AWSCloudTrail', 'AmazonCloudWatch', 'AmazonKinesisFirehose', 'AmazonSNS', 'AmazonGuardDuty', 'AmazonKinesis', 'AmazonTimestream', 'AWSShield', 'AmazonECRPublic', 'AmazonDynamoDB', 'AWSMarketplace') then 'GB-Processed' 148 | when pricing_unit in ('Lambda-GB-Second', 'Lambda-GB-Second-ARM', 'GB-Seconds') then 'GB-Second' 149 | when pricing_unit in ('Fargate-GB-Hours', 'GB-Hours', 'ECS-EC2-GB-Hours') then 'GB-Hour' 150 | 151 | /* AWS Athena. This is not storage, but the amount of bytes scanned. Usage is rescaled to GB below. */ 152 | when pricing_unit = 'Terabytes' then 'GB-Processed' 153 | 154 | /* 155 | METERED IO from storage products, databases, etc. 156 | todo: does S3 GetObject qualify as processing? Or better if IO? The SQL-on-S3 stuff makes this uncertain. 157 | */ 158 | when pricing_unit = 'GB' and line_item_product_code in ('AmazonS3', 'AmazonEFS') then 'GB-IO' 159 | 160 | /* NETWORK */ 161 | when pricing_unit = 'GB' and product_product_family in ('Data Transfer', 'Load Balancer', 'VpcEndpoint', 'Lightsail Networking', 'Sending Attachments', 'NAT Gateway') then 'GB-Network' 162 | when pricing_unit = 'GigaBytes' then 'GB-Network' -- AWSGlobalAccelerator, AmazonVPC, etc 163 | -- todo: NatGateway-Bytes? 164 | 165 | /* MONTHLY charges. These are not amortized but could be with some work. 
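(One sketch of that work, if you need it: divide each monthly lump sum by the
days_in_month field computed further down, and spread the result across the
month's daily partitions.)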
*/
166 | when pricing_unit = 'Month' and product_group = 'User' then 'User-Month'
167 | when pricing_unit = 'Faces-Mo' then 'Face-Month'
168 | when pricing_unit = 'Tag-Mo' then 'Tag-Month'
169 | when pricing_unit = 'Mo' and line_item_product_code = 'AmazonRoute53' then 'DNS-Month'
170 | when pricing_unit = 'Security Checks' then 'SecurityCheck-Month'
171 | when pricing_unit = 'vCPU-Months' then 'CPU-Month' -- RDS long term retention
172 | when pricing_unit in ('User', 'User-Mo') then 'User-Month'
173 | when pricing_unit = 'Objects' and line_item_product_code = 'AmazonS3' then 'Object-Month'
174 | when line_item_line_item_type = 'Fee' and pricing_unit in ('dollar', 'Dollar', 'dollars', 'Dollars') and line_item_product_code like '%Support%' then 'Support-Month'
175 |
176 | /* STORAGE, all products, including databases, volumes, s3, and so on. */
177 | when pricing_unit in ('GB-Mo', 'GB-month', 'GB-mo', 'GB-Month') then 'GB-Month'
178 |
179 | /* COMPUTE */
180 | when pricing_unit in ('Hrs', 'hrs', 'Hours', 'hours', 'hour', 'Hour', 'StreamHr', 'KPU-Hour', 'ACU-Hr', 'vCPU-Hours', 'LCU-Hrs', 'IdleLCU-Hr', 'DPU-Hour', 'Instance-hrs', 'Hourly', 'hourly', 'ShardHour', 'ConsumerShardHour', 'Accelerator-Hours', 'NatGateway-Hours', 'Rule-Hour') then 'Hour'
181 | when pricing_unit = 'Dashboards' and line_item_operation = 'DashboardHour' then 'Hour'
182 |
183 | /* nb: we also rescale the usage and rate below. */
184 | when pricing_unit in ('minute', 'minutes', 'Minute', 'Minutes', 'second', 'seconds', 'Second', 'Seconds') then 'Hour'
185 |
186 | /* REQUESTS (todo: rethink how to make these comparable to each other) */
187 | when pricing_unit in ('API Requests', 'Requests', 'Queries', 'FunctionExecutions') then 'Request'
188 | when pricing_unit like '%Request' then 'Request'
189 |
190 | --todo: how to bucket passthrough dollar charges from SMS?
191 |
192 | /*
193 | Like usage_type, pricing_unit has become a dumping ground for
194 | region prefixes and inconsistencies. Eg, note the lack of a '-':
195 |
196 | USE1-AmazonApiGateway-Request
197 | USE2-AmazonApiGatewayRequest
198 |
199 | In these particular cases, the "like '%Request'" clause takes care of them above. But leave
200 | this here to catch new funny stuff that may come up in future.
201 | */
202 | when regexp_like(pricing_unit, '(US|ZA|SA|EU|AP|JP|CA|ME|IN|AU|NA)[EWNSC]?[0-9]?-.+')
203 | then regexp_extract(pricing_unit, '(?:US|ZA|SA|EU|AP|JP|CA|ME|IN|AU|NA)[EWNSC]?[0-9]?-(.+)')
204 |
205 | else pricing_unit
206 | end) as pricing_unit,
207 |
208 | -- mess with a fundamental field this much, leave a way to debug it.
209 | pricing_unit as pricing_unit_orig,
210 |
211 | /*
212 | Specific to products with reservation/spot/etc options.
213 | todo: as of 1 Nov 2021 product_marketoption is undocumented, but sometimes has values when pricing_term does not.
214 | todo: I've yet to encounter a private rate applied to reserved/spot/etc but stranger things have happened.
215 | */ 216 | (case 217 | when pricing_term = 'Reserved' and line_item_line_item_type = 'DiscountedUsage' then 'Reserved' 218 | when pricing_term = 'Spot' and line_item_line_item_type = 'Usage' then 'Spot' 219 | when pricing_term in ('OnDemand', '') and discount_private_rate_discount != 0 then 'PrivateRate' 220 | when pricing_term in ('OnDemand', '') and line_item_line_item_type = 'Usage' then 'OnDemand' 221 | when pricing_term in ('OnDemand', '') and line_item_line_item_type = 'SavingsPlanCoveredUsage' then 'SavingsPlan' 222 | else 223 | 'Unknown' 224 | end) as pricing_regime, 225 | 226 | /* useful for debug & diagnosis */ 227 | coalesce( 228 | if(product_bundle_description not in ('', ' '), product_bundle_description), 229 | if(product_description not in ('', ' '), product_description), 230 | if(product_group_description not in ('', ' '), product_group_description), 231 | if(line_item_line_item_description not in ('', ' '), line_item_line_item_description) 232 | ) as product_description, 233 | 234 | savings_plan_savings_plan_a_r_n as savings_plan_arn, 235 | reservation_reservation_a_r_n as reserved_instance_arn, 236 | 237 | /* 238 | ---------------------------------------------- 239 | -- COMPUTE ----------------------------------- 240 | ---------------------------------------------- 241 | More data cleanup. non-EC2 products may use a type available to EC2, but add suffixes and prefixes that 242 | are (probably?) not relevant to the specs of the machine, eg "cache.m6g.large" or "g5.xlarge.search" 243 | so we make a new field called compute_instance_spec to normalize the types across aws products. 244 | 245 | The bet here is that instance types are SKUs and evolve slowly. Component drift may happen within a 246 | SKU, eg motherboard revs, but machines with the same "instance_spec" should have functionally 247 | equivalent performance and cost throughout their service life. This should hold true even if AWS 248 | were to virtualize classic machine specs on new hardware. 249 | 250 | However, over time, I expect non-EC2 services to use more and more specialized instance types 251 | for which spec info may not be available. Eg, "amazonsimpledb - standard". Even then they should 252 | have equivalent capability within a given SKU, though "capability" might be defined in terms of 253 | work throughput and not GB or GHz. 254 | 255 | It would be very interesting to collect standard benchmarks on the same machine specs over long 256 | periods of time. 257 | 258 | Why pay so much attention to this? Being able to compare roll-yer-own ElasticSearch performance to 259 | AWS OpenSearch on equivalent machines, for one. AWS's home-grown Graviton chips are becoming 260 | a major factor in cap planning, for another. Amazon has not (yet) completely abstracted hardware 261 | from billable usage, so nerding out on the hardware can yield better long-term costing decisions. 262 | 263 | See also cloudstats_dim_instance_specs. 
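A few worked examples of the normalization below:
'cache.m6g.large' -> 'm6g.large', 'g5.xlarge.search' -> 'g5.xlarge',
'db.r5.large' -> 'r5.large', 'ml.p3.2xlarge' -> 'p3.2xlarge'.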
264 | */
265 | lower(regexp_replace(product_instance_type, 'ml\.|\.search|db\.|cache\.|-Hosting|-Training|-TrainDebugFreeTier|-TrainDebug|-Notebook', ''))
266 | as compute_instance_spec,
267 |
268 | lower(product_instance_type) as compute_instance_type,
269 |
270 | coalesce(
271 | if(product_instance_family not in ('', ' '), lower(product_instance_family)), -- regular EC2 instances
272 | if(product_bundle not in ('', ' '), lower(product_bundle)) -- AmazonWorkSpaces
273 | ) as compute_instance_family,
274 |
275 | /*
276 | 4 Oct 2021: Some non-EC2 products leave the product_instance_type_family null.
277 | This is probably from a missed join on c5.xlarge.search or whatever. >_<
278 | See also 01 stage for mapping based on instance_spec.
279 | */
280 | (case
281 | when product_instance_type_family = '' or product_instance_type_family is null
282 | then regexp_extract(lower(product_instance_type), '([a-z0-9]+)', 1)
283 | else lower(product_instance_type_family)
284 | end) as compute_instance_type_family,
285 |
286 | (case
287 | when product_availability_zone is not null and product_availability_zone not in ('', 'NA')
288 | then product_availability_zone
289 | else line_item_availability_zone
290 | end) as compute_availability_zone,
291 |
292 | product_capacitystatus as compute_capacity_status,
293 | product_physical_processor as compute_processor_name,
294 |
295 | /*
296 | Artificial fields: chip vendor / chip line.
297 | Some non-EC2 products leave product_physical_processor as null or ''. See also stage 01
298 | */
299 | (case
300 | when product_physical_processor is not null and product_physical_processor != ''
301 | then coalesce(regexp_extract(product_physical_processor, '(Intel|AMD|AWS)', 1), 'Unknown')
302 | end) as compute_processor_vendor,
303 |
304 | (case
305 | when product_physical_processor is not null and product_physical_processor != ''
306 | then coalesce(regexp_extract(product_physical_processor, '((?:Intel|AMD|AWS) [^ ]+)', 1), 'Unknown')
307 | end) as compute_processor_line,
308 |
309 | product_storage as compute_storage,
310 | product_operating_system as compute_os,
311 |
312 | /* see 'compute_software' in stage 01 */
313 | product_pre_installed_sw as compute_software,
314 |
315 | /*
316 | ----------------------------------------------
317 | -- STORAGE -----------------------------------
318 | ----------------------------------------------
319 | */
320 | (case
321 | --todo: onezone, etc
322 | when product_storage_class is null and line_item_usage_type = 'TimedStorage-ZIA-SmObjects' then 'Infrequent Access (Small Objects)'
323 | when line_item_usage_type = 'TimedStorage-INT-IA-ByteHrs' then 'Intelligent (Infrequent Access)'
324 | when line_item_usage_type = 'TimedStorage-INT-AIA-ByteHrs' then 'Intelligent (Archive Instant Access)'
325 | when line_item_usage_type = 'TimedStorage-INT-FA-ByteHrs' then 'Intelligent (Frequent Access)'
326 | else product_storage_class
327 | end) as storage_class,
328 |
329 | product_volume_type as storage_volume_type, -- SSD, Magnetic, etc
330 | product_volume_api_name as storage_volume_api, --gp2, gp3, st1, etc
331 | product_uservolume as storage_user_volume,
332 |
333 | /*
334 | Thirty days hath September. This is needed to smooth out AWS's silly accounting
335 | for data storage in stage 01. TLDR: Amazon charges per "GB-Month", but defines every
336 | month as having 31 days. That means in February, your per-hour cost goes up by 10.7%.
337 | They accomplish this by inflating the *usage* amount.
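A worked example: hold one GB for every hour of February (28 days) and the CUR
reports 31/28 ~= 1.107 GB-Mo of usage; in a 30-day month the same byte-hours
come to 31/30 ~= 1.033 GB-Mo. Same data, different "usage".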
This is all carefully explained
338 | in your contract, though they don't go out of their way to highlight it. I suspect
339 | this is why every storage pricing example in the docs just happens to randomly
340 | choose a month that is 31 days long.
341 |
342 | Original algo by jcaesar, 15 Mar 708 anno urbis conditae
343 | Bugfixes by greg13@vatican.va, 15 Oct 1582 anno domini
344 | */
345 | (case
346 | when month in ('4', '6', '9', '11') then 30
347 | when month in ('1', '3', '5', '7', '8', '10', '12') then 31
348 | when cast(year as int) % 4 = 0 and (cast(year as int) % 100 != 0 or cast(year as int) % 400 = 0) then 29
349 | else 28
350 | end) as days_in_month,
351 |
352 | /*
353 | ----------------------------------------------
354 | -- ADD YOUR SITE-SPECIFIC TAGS HERE ----------
355 | ----------------------------------------------
356 | */
357 | -- resource_tags_my_tag as my_tag,
358 |
359 | /*
360 | ----------------------------------------------
361 | -- MEASURES ----------------------------------
362 | ----------------------------------------------
363 | */
364 | count(1) as record_cnt,
365 |
366 | /*
367 | "true" usage
368 | Note the rescaling of some usage amounts, to make it easier to compare units of time or data.
369 | */
370 | sum(line_item_usage_amount / (
371 | case
372 | when pricing_unit in ('second', 'seconds', 'Second', 'Seconds') then 3600 -- to Hour
373 | when pricing_unit in ('minute', 'minutes', 'Minute', 'Minutes') then 60 -- to Hour
374 | when pricing_unit in ('Terabytes') then 0.0009765625 -- to GB-Processed
375 | else 1
376 | end)) as usage,
377 |
378 | /*
379 | "true" cost, essentially the amortized, accrual-basis cost of a lineitem's usage.
380 | this elides amortization / fees / "blending", and lump-sum charges like support.
381 | why this and not just blended_cost? Because we want to preserve information on the
382 | real rates paid for Spot/SP/RI and OnDemand.
383 | */
384 | sum((case line_item_line_item_type
385 | when 'Usage' then line_item_net_unblended_cost
386 | when 'DiscountedUsage' then reservation_net_effective_cost
387 | when 'SavingsPlanCoveredUsage' then savings_plan_net_savings_plan_effective_cost
388 | else 0
389 | end)) as cost,
390 |
391 | /*
392 | same as "true" cost, but backing out EDP discounts. This is only really used in the
393 | _bill tables, to calculate the impact of separate discounting regimes. Note the use
394 | of *_effective_cost and not *_NET_effective_cost.
395 |
396 | todo: I've never seen a private rate applied to RI/SP, but it's possible...
397 | */
398 | sum((case line_item_line_item_type
399 | when 'Usage' then line_item_net_unblended_cost + discount_edp_discount
400 | when 'DiscountedUsage' then reservation_effective_cost
401 | when 'SavingsPlanCoveredUsage' then savings_plan_savings_plan_effective_cost
402 | else 0
403 | end)) as cost_without_edp,
404 |
405 | sum(line_item_unblended_cost) as unblended_cost,
406 |
407 | /*
408 | TIP: if you don't have an Enterprise Discount Program (EDP) agreement with AWS, edp_discount may not exist.
409 | ditto for private_rate_discount if you don't have a Private Pricing Addendum (PPA).
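One way to check before the view fails to build (a sketch; swap in your own
database and table names):

    select column_name
    from information_schema.columns
    where table_schema = 'your_cur_db'
      and table_name = 'your_cur_table'
      and column_name like 'discount%';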
410 | */
411 | sum(discount_edp_discount) as edp_discount,
412 | sum(discount_private_rate_discount) as private_rate_discount,
413 | sum(discount_total_discount) as total_discount, -- edp + private rate + whatever new fields come in future
414 |
415 | -- the default ondemand public price, to compare the effect of spot/reserved/savingsplan/edp
416 | sum(pricing_public_on_demand_cost) as public_cost
417 |
418 | /*
419 | TIP: create a sample table with only a day or a week of data, to speed up debugging. eg:
420 |
421 | create or replace table cur_sample as
422 | select * from {{ your_aws_cur_table }}
423 | where
424 | year='2023' and month='1'
425 | and line_item_usage_start_date between
426 | timestamp('2023-01-01') and timestamp('2023-01-07')
427 | ;
428 |
429 | Random sampling of the table works as well, but because different services have different
430 | logging rates the graphs will look very weird. An S3 bucket logs only one storage lineitem
431 | per day per storage class, while every hour of every EC2 volume will log its own lineitem.
432 | */
433 | from {{ your_aws_cur_table }}
434 |
435 | where
436 | /*
437 | There are a *lot* of usage lineitems that are always free, even to the public.
438 | Since these activities are always free no matter how much you use, changing your
439 | usage does not move the cost needle. The main reason we exclude them is because
440 | there are separate lineitems for disk & network IO attached to EC2 instances.
441 | Unfortunately, these records are NEARLY INDISTINGUISHABLE FROM COMPUTER TIME. Like,
442 | they only differ from the actual computer time in informational fields like *_description.
443 | These lineitems realllly mess you up when calculating rates and summing instance-hours,
444 | and doing string parsing on usage_type is just asking for bugs later on.
445 | So, better to nuke them all from orbit. This may undercount usage, but not cost.
446 | */
447 | not (line_item_line_item_type = 'Usage' and pricing_public_on_demand_rate = '0.0000000000')
448 |
449 | /*
450 | months-back filter. the AWS CUR is partitioned by year and month, where month is NOT zero-padded.
451 | This twisty bit of logic takes the current date and filters on the current year/month partition
452 | plus the N year/month partitions prior to that. Uncomment if your views run too slow when doing
453 | incremental inserts.
454 |
455 | and (
456 | (year = to_char(to_timestamp(current_date, 'YYYY-MM-01'), 'YYYY')
457 | and month = cast(cast(to_char(to_timestamp(current_date, 'YYYY-MM-01'), 'MM') as int) as varchar))
458 |
459 | or (year = to_char(to_timestamp(current_date, 'YYYY-MM-01') - interval '1 month', 'YYYY')
460 | and month = cast(cast(to_char(to_timestamp(current_date, 'YYYY-MM-01') - interval '1 month', 'MM') as int) as varchar))
461 |
462 | or (year = to_char(to_timestamp(current_date, 'YYYY-MM-01') - interval '2 month', 'YYYY')
463 | and month = cast(cast(to_char(to_timestamp(current_date, 'YYYY-MM-01') - interval '2 month', 'MM') as int) as varchar))
464 | ...
465 | ) 466 | */ 467 | 468 | --','.join([str(x+1) for x in range(NN)]) 469 | group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50 470 | -------------------------------------------------------------------------------- /docs/aws-dismal-guide.md: -------------------------------------------------------------------------------- 1 | # A Dismal Guide to AWS Billing 2 | ## 3 | 4 | This is longer than an article but shorter than a book, so here's a little tldr / main thesis up top: 5 | 6 | 0. Cost control is like security. Everyone should be aware of it, but it's best handled by a small expert team. Outsourcing it is asking for trouble. 7 | 8 | 1. Everything depends on having good data. Amazon's billing log is extremely messy. [Cloudstats](http://github.com/aristus/cloudstats) is an open-source ETL you can use to clean it up. 9 | 10 | 2. There are lots of useful charts & queries you can build on top of Cloudstats. As your infra evolves, your measurement must track those evolutions. 11 | 12 | 3. The essence of capacity planning is the ability to compare two possible changes, dollar-for-dollar, _before_ you actually make them. 13 | 14 | 4. Contract negotiation is the final boss level. To beat the boss you need the tools & weapons you earned in previous levels. 15 | 16 | ## Part one: If you want it done right... 17 | 18 | If you spend more than a million dollars per year on cloud compute, you probably suspect that half of it can be cut. The problem, of course, is _which half_? 19 | 20 | There are a lot of ways to find & land cost savings, but they come down to a) consuming fewer resources and b) getting a better price for what you do consume. Sure. And we should all eat less and exercise more. The question is how. The CFO is yelling and the engineers are shrugging. No one knows what’s going on. The monthly bill is scary, but even more scary is the growth rate. 21 | 22 | The first step to controlling your budget is knowing in detail where the spending goes. If you don't have good data you can't do good analysis. If you don't know what you have spent, you can't predict what you _will_ spend. Without good projections you can't do effective capacity planning. Without cap planning you can't run what-if scenarios. Without scenarios you can't negotiate good contracts. And once you've signed a _bad_ contract, all of the above gets that much harder. 23 | 24 | **In short, if you don't have good data you've already lost.** 25 | 26 | AWS themselves offer you some tools to explore costs and suggest ways to buy smarter. Many (many) third-party tools offer to slurp up your cost and usage data and do it all for you. Often for a healthy percentage fee. But past a certain scale and complexity, capacity planning and cost control are too important to outsource. 27 | 28 | It’s not because those tools don’t work. I assume that they are all earnest and honest. But they can’t leverage your internal knowledge. You are planning to turn off the Foobar service next month so there’s no need to commit to future spend for it. Foobar2 is running alongside Foobar during its ramp up, and you don't want to be bothered by alerts about "cost increases" you already know about. You want to measure your overall efficiency in terms of transaction volume or revenue dollars. You want to distinguish research & development from [COGS](https://en.wikipedia.org/wiki/Cost_of_goods_sold) and other inside info you can't just hand over to outsiders. 
29 | 30 | After a certain point you have to do capacity & cost analysis work yourself, in-house. And that point is probably around the time where a 20% savings would pay for the team doing it. (Earlier is better! You can save yourself a lot of heartache by laying the groundwork while you are still small.) 31 | 32 | Just like security, cost control is something everyone should contribute to, but final responsibility must rest with a small expert team. Both require going deep and dirty with the data. [Red Teams](https://en.wikipedia.org/wiki/Red_team) hunt for bugs that leak security access. Then they fix them and eliminate the root cause. Green Teams do the same thing except that they hunt for bugs that leak money. 33 | 34 | Let's start with the data. 35 | 36 | ### The CUR 37 | 38 | The [AWS Cost & Usage Report](https://docs.aws.amazon.com/cur/latest/userguide/what-is-cur.html) is the source of truth for usage and billing across Amazon’s cloud. It's an exhaustively detailed blow-by-blow account of how you use every computer, hard drive attached to that computer, database, network link, etc, for every hour of every day. A medium-sized customer can easily generate tens of millions of new records per month. 39 | 40 | AWS [Cost Explorer](https://docs.aws.amazon.com/cost-management/latest/userguide/ce-what-is.html), the pretty charts & spending tips, and the monthly invoice are all generated from that source. You can ask for the raw data to be delivered to you as flat files, loadable into the database of your choice. 41 | 42 | But the CUR is notorious for its dirty data and surprising behavior. The data is dirty because it includes the config info for a hundred different services stapled together. Over 300 columns, half of which are null or irrelevant, and the other half stuffed with accreted garbage. [This whole thread is a hoot.](https://twitter.com/mike_julian/status/1598923916218888192) 43 | 44 | ![](img/aws-dismal/dismal-aws-fig1-mike-twitter.png) 45 | 46 | 47 | 48 | The CUR's behavior is _surprising_ because those configs are in turn stapled to a dump of line-item billing records. **The CUR is not a log.** It's not an immutable sequence of events. It's a snapshot of the current state of AWS's accounting system. Records can be added or amended weeks after the fact. It contains all sorts of accounting artifacts like lump-sum fees, taxes, credits, third-party charges, details that are only known once the monthly invoice is finalized, corrections to previous lines, and pairs of line-items that cancel each other out altogether. 49 | 50 | [AWS's main problem maintaining their not-a-log over almost 20 years is backwards compatibility. Important Customers downstream depend on the exact behavior of every historical design bug and typo. If you were to fix those bugs at the source, downstream would break and then everybody gets mad at _you_. It's no wonder why the last major refactor was in 2015, or why the documentation is written in a defensive style. I neither blame those unsung heroes nor envy their job.](margin) 51 | 52 | Complex as it is, I’ve _never_ seen a systematic error in how AWS bills for usage. Their people do a damn fine job making sure the numbers add up and I don't want you to think otherwise. But in its raw form the CUR is not useful for day-by-day cost analysis. The useful info is scattered among too many weirdly-named, under-documented fields, and the not-quite-a-log thing is just confusing. 
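If you want to see that accounting zoo for yourself once the raw files are loaded, one query is enough. A sketch, using a `{{ your_aws_cur_table }}` placeholder for wherever you've landed the flat files:

```
select
    line_item_line_item_type,
    count(1) as record_cnt,
    sum(line_item_unblended_cost) as unblended_cost
from {{ your_aws_cur_table }}
group by 1
order by 3 desc;
```

Alongside plain Usage you'll find the fees, taxes, credits, and corrections described above, and the relative row counts give you a feel for how much of the CUR is billing machinery rather than usage.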
53 |
54 | ### Cloudstats
55 | Like Mike I could go on about those quirks for hours, but instead we’re going to kill them dead. [Cloudstats is an open-source ETL](https://github.com/aristus/cloudstats) that processes the Cost & Usage Report data and creates a much cleaner basis to work from. It's implemented in pure SQL so that with (I hope) minimal work you can get it running on your favorite data pipeline. There's nothing especially special about it. It's a simplifying layer, the kind of thing you might write yourself after some experience with the CUR. But there's no reason why everyone has to write their own.
56 |
57 | The very first thing Cloudstats does is separate the _cost_ from the _usage_.
58 |
59 |
60 | > The `cloudstats` table is a per-day aggregation and cleanup of _only_ billable, usage line-items. Extra dimension tables are added to fill in nulls. Cloudstats also comes with a discounted [accrual-basis](https://en.wikipedia.org/wiki/Basis_of_accounting) measurement of “cost”. We’ll get into what that means in a bit.
61 |
62 | > `cloudstats_bill` is a per-month _cash_-basis aggregation of all line-items (including those lump sums, corrections, etc) reverse-engineered from the observed behavior of official AWS invoices. It provides much more detail than the PDF that's emailed to your accountants, eg explanations of individual taxes and credits. This table is covered in parts three & four.
63 |
64 | A row in the cloudstats table is a rollup per day of billable usage in AWS. Say, a day in the life of a cluster of EC2 instances, or the data stored in Glacier for a given S3 bucket. Each row is aggregated along 50-odd useful dimensions (down from 300+) and a handful of measurements. The dimensions are low-cardinality, spellchecked versions of the CUR originals, with some synthetic fields constructed from multiple sources.
65 |
66 | Check out the docs for extensive and irreverent comments on all of them. But I want to highlight one important field called "cost".
67 |
68 | ### When we talk about cost
69 | AWS offers a ton of ways to buy their stuff. The pricing structures are driven by the financial & accounting needs of both Amazon and their customers. Those needs are largely invisible and often baffling to engineers. If you've read this far it's a good bet you are already trying to figure them out from your own data. I'll go over them briefly (hah!) and then explain how the Cloudstats cost field smooths them out into something more useful for our purpose.
70 |
71 | [There are interesting reasons why the Spot market exists, but they aren't relevant here.](margin)
72 | Let’s take the example of a single EC2 machine, a d3.2xlarge running basic Linux. As of this writing the public OnDemand price for that machine is $1.00 per hour. You can spin one up, use it for an hour, then spin it down. Or you can rent the same machine on the Spot market at a price that varies with supply & demand. Pretty simple.
73 |
74 | [Of course, this promise implies a new current liability to be carried on your balance sheet and an equivalent receivable on _their_ sheet. Aren't you glad you went into programming instead?](margin)
75 | Things get complicated with Reserved Instances and Savings Plans. Essentially, Amazon is willing to _pay you_ for the promise of steady revenue in the future. Otherwise that machine may sit idle, sucking juice and costing them money. If you promise to use that d3.2xlarge for a whole year they will only charge $0.63 per hour. If you promise 3 years, the price drops to $0.43.
Very reasonable.
76 |
77 | [Net present value is more complicated than this because of future uncertainty and cashflow. But it's the basic bargain.](margin)
78 | But a dollar now is usually worth more than a dollar later. If you pay me a dollar today I can put it in the bank and maybe in a year I'll have $1.05. So Amazon also lets you pre-pay some or all of that year of usage upfront, for even deeper discounts.
79 |
80 | - 1 year, no upfront: $0.63
81 | - 1 year, partial upfront: $0.60
82 | - 1 year, all upfront: $0.59
83 | - ...etc etc
84 |
85 |
86 |
87 |
88 | All of this is perfectly comprehensible to your financial planning & analysis team. The upshot is that because of financial decisions you maybe weren't involved in, your AWS billing log, which is not a log, gets annoyingly complicated.
89 |
90 | Instead of a steady drumbeat of records saying `Usage: one hour of d3.2xlarge, $1.00`, you may get a single record with a huge amount, `RIFee: 1 year d3.2xlarge, $5,148.23`, and then a DiscountedUsage row every hour for the next year with _zero cost_. You've already paid, after all. This makes accounting sense but doesn't help you analyze "what you spend" every day.
91 |
92 | More complications: some reserved instance plans allow you to [convert](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ri-convertible-exchange.html) your commitment between classes of machine. A [Savings Plan](https://docs.aws.amazon.com/savingsplans/latest/userguide/what-is-savings-plans.html) is more flexible: you only commit to pay for a certain dollar amount of compute in general. Once you hit a half million or so per year, the [Enterprise Discount Program](https://www.cloudforecast.io/blog/aws-edp-guide/) (EDP) can get you blanket contractual discounts across all AWS services… if you commit to a total spend over 1 to 5 years. Going further, you can negotiate "private rates" for certain services you promise to use a lot of, like storage or network, or only in certain AWS regions, paying all upfront, or partially, and every combination of the above.
93 |
94 | Imagine what your billing log (not a log!) looks like then. There can be a [dozen different fields](https://docs.aws.amazon.com/cur/latest/userguide/reservation-columns.html) with "cost" or "fee" in the name, with overlapping meanings.
95 |
96 | ### Value in dollars
97 |
98 | [AWS themselves attempted to do this with "equivalent compute units" and "normalized size factors". They were never that useful because gigabytes aren't gigahertz. But the fields linger on in the CUR table, confusing the unwary. And don't even ask about "blended cost".](margin)
99 | At this point you might want to throw up your hands and ignore the dollar stuff. Why can't we just focus on real things we can understand? [Equivalent compute hours](https://docs.aws.amazon.com/cur/latest/userguide/product-columns.html#E), or something. But say you want to save money by caching results in a database. How do you equate computers to terabytes? Or terabytes to database writes?
100 |
101 | Go down that rabbit hole as far as you like. When you come back up for air it'll be clear that you need a single, fungible metric to compare the value of _any_ kind of usage in the cloud. Without that you can't quantify tradeoffs. And it can't be the public listed price. Your metric has to be at least partially connected to your contract terms and the real dollars your CFO is yelling about.
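To see the amortization problem in miniature, spread that lump sum back over the hours it bought. A back-of-envelope sketch in Python, using the made-up d3.2xlarge numbers from above:

```
# amortize the 1-year, all-upfront RIFee over every hour of the year
round(5148.23 / (365 * 24), 4)

0.5877
```

That recovers the ~$0.59/hour all-upfront rate from the list above, as a per-hour cost you can line up against OnDemand and Spot. This is essentially what the `*_effective_cost` fields discussed below do for you, with contractual discounts and calendar edge cases layered on top.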
102 | 103 | **The essence of capacity planning is the ability to compare two possible changes, dollar-for-dollar, before you actually make them.** Even if it means using a very specific definition of "dollar". 104 | 105 | The way to thread this needle is to assimilate some of those accounting bugaboos and ignore the others. You gotta deal with lump-sum fees. Contractual discounts seem important. But your local sales tax rate doesn't (or shouldn't) affect engineering decisions. Ditto for support charges and one-time credits. And that's what the Cloudstats cost metric does: 106 | 107 | ``` sql 108 | (case line_item_line_item_type 109 | 110 | when 'Usage' 111 | then line_item_net_unblended_cost 112 | 113 | when 'DiscountedUsage' 114 | then reservation_net_effective_cost 115 | 116 | when 'SavingsPlanCoveredUsage' 117 | then savings_plan_net_savings_plan_effective_cost 118 | 119 | else 0 120 | end) 121 | ``` 122 | 123 | Let's take that one piece at a time. The `*line_item_type` field has stuff like Usage, Tax, Credit, Refund, and so on. We are only interested in actual billable usage of resources, so everything else gets a zero. (Those lines are analyzed in `cloudstats_bill`.) 124 | 125 | When the `line_item_type` is Usage that means it's OnDemand or Spot. We take the `*net_unblended_cost` (don't ask why it's called that) which includes all of the blanket contractual discounts you may enjoy, like EDP. 126 | 127 | When `line_item_type` = DiscountedUsage that's for Reserved Instances and _not_ EDP (again, don't ask). To sort out the prepayment stuff you used to have to do a bunch of calculations yourself. AWS now provides `*reservation_net_effective_cost` as a single field. This field does amortization, ie it divides lump payments evenly throughout the year. The "_net" part means it _also_ applies blanket contractual discounts. The same thing happens for Savings Plans and I swear that it all adds up. Just don't ask why or this section would be twice as long. 128 | 129 | [This is (I believe) equivalent to the "Net amortized costs" measurement in AWS Cost Explorer.](margin) 130 | The end result is a "cost metric": a daily measurement of the cloud resources you consume in something resembling real dollars. It's not quite the same as what you see in the monthly bill. But it has the advantage that you can easily compare ideas and their tradeoffs. If rewriting the Foobar service would save you $50,000 but would take a month of work, you can instead choose to delete $50,000 of old data and move on to other things. And be reasonably sure you made the right call. 131 | 132 | ### Thirty days hath September 133 | 134 | Speaking of storage, there's one more wrinkle that trips up many people. AWS does something uniquely silly with how it measures data storage. This is how Cloudstats compensates for it: 135 | 136 | ``` 137 | (case 138 | when pricing_unit = 'GB-Month' 139 | then (cost / (365.25 / 12)) * days_in_month 140 | else cost 141 | end) 142 | ``` 143 | 144 | It takes the cost of storage on a given day, divides by the _average_ number of days in a month, then multiplies by the _actual_ length of that month. Weird, right? 145 | 146 | ![The nominal rate (eg $0.05 per GB-Month) stays the same, but you 'use' more than one day per day during a short month.](img/aws-dismal/dismal-aws-storage-rates.png "margin") 147 | 148 | It's necessary because of the way Amazon charges for storage, by the "GB-Month". A GB stored for a day is charged as a fraction of that month. However, AWS defines a "Month" as 31 days. 
But how do you charge for 31 days' worth of storage during a 28-day month? _By inflating the usage amount._ 149 | 150 | In February your effective daily storage rate goes up by over 10% compared to January. There are smaller jumps in 30-day months. I suspect this is why it seems like every storage pricing example just happens to randomly choose March. 151 | 152 | Cloudstats rescales the cost metric for storage (and `storage_total_tb`) to smooth it out. That way you're only measuring real changes in the amount of data you manage. Otherwise you and your anomaly detectors would go crazy chasing phantoms. 153 | 154 | ![A graph of raw usage vs storage_total_tb. I bet this peculiar math was originally meant to keep the _monthly_ storage bill from jumping around in confusing ways. The point is that now it's too late to change it.](img/aws-dismal/dismal-aws-smoothed-storage.png) 155 | 156 | These are just some of the gory details of AWS billing and how they can be corrected. Now, let's use this data to find us some real cost bugs. 157 | 158 | 159 | ## Part two: Using Cloudstats 160 | 161 | The design of Cloudstats is meant to allow you to generate useful time-based charts and analyses. The main fields are roughly hierarchical. This allows you to start big and then dial in the specificity. In my experience, you want to start at the headline number and recursively subdivide until you've found your target. 162 | 163 | - `date`, `date_week`, `date_month`, etc fields control the timespan and granularity. It's often useful to only look at your peak `day_of_week`, for example. 164 | - `pricing_bucket`: the basic datacenter resource being consumed: Compute, Storage, Network, IO, and so on. 165 | - `pricing_unit`: a (very) cleaned-up version of the original pricing unit, which is more specific than `pricing_bucket`. For example, the IO bucket will have ReadRequestUnits and WriteRequestUnits. 166 | - `product_code`: whatever AWS product generated the billing. 167 | - `usage_type` 168 | - `operation`: even more granular than pricing_unit. Eg, Network --> GB-Network --> InterZone-In and -Out. Lots of data barnacles are scraped off for convenience. 169 | - `resource_name`: mostly derived from `line_item_resource_id`, which usually has the full ARN of the AWS entity being used. It's meant to reveal, eg, database names or S3 bucket names. Individual EC2 instances and volumes are folded into cluster names when available. UUIDs and other high-cardinality strings are stripped. 170 | 171 | ### Synthetic dimensions 172 | 173 | AWS's tooling tends to show you cost and usage per `product_code`. This is probably a symptom of [Conway's Law](https://en.wikipedia.org/wiki/Conway's_law), where a company's org chart leaks into the structure of its output. But a big part of cost analysis is finding new dimensions to compare & contrast. For example, data stored in expensive EFS or EBS volumes might be just as happy in S3. Forgotten backups are often a rich vein to explore. This is where `pricing_bucket` comes in handy: 174 | 175 | ``` 176 | select 177 | date, 178 | product_code, 179 | sum(cost) as cost 180 | from cloudstats 181 | where 182 | date >= current_date - interval '35' day 183 | and pricing_bucket = 'Compute' 184 | group by 1,2; 185 | ``` 186 | 187 | This basic query lets you look at all Compute cost (and _only_ Compute) across your databases and clusters & whatnot. Or makes it easier to see what fraction of your storage cost goes to old backups. 
The cost of all Network traffic is often a surprise to people looking at it this way for the first time. 188 | 189 | You'll probably want to start with a separate dashboard for each `pricing_bucket`, starting with the headline number and with each chart getting more specific down the hierarchy. 190 | 191 | 192 | ### Speaking of dashboards 193 | 194 | ![Artist's conception. Not to scale.](img/aws-dismal/wb-graph.jpg "margin") 195 | I wish I could show you more screenshots. Despite this wall of text, I mostly think via graphs. But there's client confidentiality to think about and generating plausible dummy data is a lot of work. But I can say that I'm a fan of pivot tables because they let you drill down the hierarchies efficiently. Timeseries plots, both stacked and lined, are good for checking on how you're doing. The more dashboard-level filters the better, at least until you learn which ones are useful for your needs. The Filter Other Charts action you can apply to pivot cells in AWS QuickSight is sweet. 196 | 197 | Most of the time those are the only chart types I use. Since I often have to work with what the customer already uses, I tend to keep my needs small. But if you have a choice, make sure your dashboarding system can generate stable URLs so you don't have to screenshot everything. Even better, URL args so people can iterate off of your work. When Jupyter Lab removed URL args (for sound security reasons) it made me sad. 198 | 199 | Data analysis is inherently a social activity. But it also needs to meet a high bar for reproducibility. Otherwise you're just operating on gossip. 200 | 201 | One unfortunately rare feature is the ability to export a dashboard into something editable offline. A dashboard is made of code, important code, that informs expensive decisions. It's baffling to me how many systems don't allow you to put that code under source control. QuickSight only [added this feature](https://aws.amazon.com/blogs/aws/new-amazon-quicksight-api-capabilities-to-accelerate-your-bi-transformation/) in _late 2022_. And even then it's delivered as a distressingly verbose blob of JSON. 202 | 203 | ### Example: S3 growth 204 | 205 | In the interest of time (double hah!) I'll only go through a couple examples of finding likely cost wins. There are more in the code repo. 206 | 207 | S3 buckets can grow to outrageous sizes when no one is looking. And storage costs compound; every day you pay for the sum of everything stored. A big bucket is not necessarily a problem if its growth rate is small. A small bucket with large absolute growth is something you'll want to catch sooner rather than later. Remember, the quicker you can find a cost bug the more money you save overall. So let's find out what your growth rates are. 
208 |
209 | ```
210 | with daily_totals as (
211 | select
212 | date,
213 | resource_name as s3_bucket,
214 | sum(storage_total_tb) as tb,
215 | sum(cost) as cost
216 | from cloudstats
217 | where
218 | date >= current_date - interval '95' day
219 | and pricing_bucket = 'Storage'
220 | and product_code = 'AmazonS3'
221 | group by 1,2
222 | ),
223 |
224 | daily_deltas as (
225 | select
226 | a.date,
227 | a.s3_bucket,
228 | a.tb,
229 | a.cost,
230 | a.tb - b.tb as delta_tb,
231 | (a.tb - b.tb) / nullif(b.tb, 0) as percent_delta -- day-over-day growth; nullif() avoids dividing by zero
232 | from
233 | daily_totals a inner join daily_totals b
234 | on a.s3_bucket = b.s3_bucket
235 | and a.date - interval '1' day = b.date
236 | )
237 |
238 | select
239 | s3_bucket,
240 | avg(delta_tb) as avg_delta,
241 | avg(percent_delta) as avg_percent,
242 | max(tb) as total_tb, -- high-water marks over the window
243 | max(cost) as cost
244 | from daily_deltas
245 | group by 1
246 | order by 2 desc
247 | limit 50;
248 | ```
249 | [You can get fancier with stddev, median, etc, to filter out buckets that had one large bump in the recent past.](margin)
250 | This query will show the buckets with the biggest persistent growth over the last 3 months. Very often you'll find that it's that One Big Bucket (everybody has one) which accounts for most of the growth day by day. But sometimes it's the Wait, Who's Logging _That_ bucket, or the Why Are We Writing Twice bucket, and so on. A good measurement system often tells you things you already suspect. The point is that it puts number$ to your intuition.
251 |
252 | You can also flip this query around. Filtering on _zero_ growth rate and sorting by largest cost might reveal many Huh, We Still Have That? buckets you can delete or glacerize for an easy win.
253 |
254 | ### Example: Compute efficiency
255 |
256 | One of the selling points of cloud computing is that you "only pay for what you use". A compute cluster can scale up and down to meet changes in demand. But "use" means "ask for", not the computing power that is usefully consumed. Take all you want, but eat what you take.
257 |
258 | Most of the time the scaling rules are set up by your engineers as the cluster first enters production duty. A minimum of 100 machines of this class, maximum 1,000. Scaling is triggered by something internal to the computers, like average CPU utilization or memory. These numbers (if they aren't just copied from some other system) are generally optimized for reliability. With these numbers the system doesn't crash. It maybe costs a lot to run, but you're rightly afraid of playing Jenga with something that _isn't broken_ for a benefit that's hard to quantify.
259 |
260 | ![](img/aws-dismal/cat-dollars.jpg "margin")Let's fix that. The root problem is that there is no feedback loop, no metric for efficiency. What you need is to join your cost data with a measure of the work performed. Every service has (or should have!) a simple log of the transactions it does. So if yesterday your image cluster consumed 750 instance-hours at $1.00/hour, and served 75 million cat memes, your cats-to-dollar ratio is 100,000:1. Assuming the average work done by each transaction stays relatively stable, you now have a longitudinal measure of efficiency. And an instant historical record to play with. Did that code change last month make things better or worse? How does caching help? Etc.
261 |
262 | [Spot, Reserved, and SP make this analysis more complicated, but still doable.
For example, by instance-hour (`usage`) or even CPU-hour (`usage * compute_processor_vcpu`)](margin)Cloudstats by default is a daily aggregation. It would be _very_ interesting to cut a version that aggregates hourly to see how your clusters "breathe" over the course of a day or week, measured in cat-dollars. I'll bet you ten cat-dollars that your efficiency is bad in the trough times of low traffic (minimums too high) and gets terrible again at peak as your scaling algorithm over-corrects (maybe something in the response time of the algo?). 263 | 264 | At that point you have to leave Cloudstats and go to a more granular metrics system to figure out what the problem might be. It's possible that your efficiency ratio stays stable, suggesting that the scaling curve is just right. Maybe that means to save money you'll have to actually optimize the code. (Optimizing is the second-to-last resort after you've tried everything else.) But at least now you have numbers to guide your intuitions. 265 | 266 | 268 | 269 | ### Tag, you're It 270 | So far we've mostly been using the built-in fields in Cloudstats. You can see what AWS products cost you, and maybe cluster and database names. Now you need to dig into your own systems knowledge. Everyone has their own special ways to name & tag the internal organs of their infrastructure. Whether they call them services or products or clusters or whatnot, they all call them _something_, and those somethings are meaningful. 271 | 272 | Show of hands: who here has got 100% tagging coverage? No one? So who knows what their tagging coverage _is_, in dollars spent? Or, of the tags that do exist, how many are actually correct? It's ok. You're among friends. 273 | 274 | [If I had a nickel for every time I've seen "obseverability"... ](margin) 275 | Unfortunately, you, in-house cost analyzer person, are now It. You need to do analysis along those meaningful tags. But they are often wrong, old, missing, incomplete, unstructured, changing, or even misspelled. I am going to assume that you don't have total and instant power to change the tags being logged at the source. That means you'll have to develop your own side tables to track the madness. Heck, even if you could rationalize all tags you'd still need to go through this exercise in order to patch up the irrational tagging that's already laid down in historical data. 276 | 277 | One of the miseries of life is that everybody names things a little bit wrong. 278 | 279 | [The second thing to check is whether your company has a reliability engineering, release, CI/test, or similar team. Anyone who's on the hook (like you!) to chase down the relevant owner when Things Go Bad In System X. Go talk to them because they probably have ideas about how to do it right.](margin) 280 | The first thing to check is what tags are already being used in your company, which of those are reflected in AWS tags, and which of _those_ are activated in the CUR. [This AWS blog](https://aws.amazon.com/blogs/aws-cloud-financial-management/cost-tagging-and-reporting-with-aws-organizations/) has a pretty good list of commonly-used tag names. You should be able to add them to your Cloudstats ETL with minimum fuss. 281 | 282 | If there are interesting tags _not_ activated in the CUR, do that ASAP. You probably can't do much about historical data, but fix what you can now. Err on the side of overlogging. 
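In the meantime, you can answer the show-of-hands question above with actual dollars. A sketch, assuming you've mapped at least one tag (here a hypothetical `tag_service`) into your Cloudstats build:

```
select
    date_month,
    (case when tag_service is null or tag_service = ''
        then 'untagged'
        else 'tagged'
    end) as tag_state,
    sum(cost) as cost
from cloudstats
group by 1,2
order by 1 desc;
```

Untagged dollars per month is a far more motivating number than untagged resource counts, and it doubles as a progress bar for the tag cleanup campaign you're about to start.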
283 | 284 | [Your finance people will want to know more than this but that comes in part three.](margin) 285 | At minimum you want two to three levels of infrastructure hierarchy, eg product, service, subservice. You should also have some idea of the system environment. And you want some kind of owner to blame. What you are looking for is a way to isolate globs of spend by their general purpose and a spot on the org chart. 286 | 287 | Let's take "environment": the production state of some system that's costing dollars. Usually there are only a handful that matter: prod, dev, stage, test. But when you peek at what people actually put in that field... 288 | 289 | ``` 290 | select tag_env from cloudstats 291 | group by 1 order by count(1) desc; 292 | 293 | tag_env 294 | ---------------------------- 295 | prod 296 | production 297 | prd 298 | prod-testing 299 | canada-prod 300 | oldprod 301 | development 302 | dev-NG 303 | devel 304 | dev-fulano.garza 305 | dev-bobbyn 306 | fulano-devl2 307 | devel-shared 308 | ml-dev 309 | fulanodev 310 | ... 311 | ``` 312 | 313 | ...it's a mess. Your first instinct might be to go tell Fulano to clean up his tags. Yes, go do that, and generally start a campaign on tag coverage & accuracy. But the data is what it is and you have to fix it for the past and not just the future. 314 | 315 | Treat this field as unrefined ore. It's not really "the environment", it's environment-like signal mixed with noise. Curate a mapping between the zoo of `tag_env` values and a smaller set of clean values. There's no dishonor in doing this in a spreadsheet, then loading it in as a database table. 316 | 317 | ``` 318 | select 319 | ... 320 | /* a.tag_env, */ 321 | coalesce(b.environment, 'Unknown') as tag_env 322 | from 323 | cloudstats a left join map_environments b 324 | on a.tag_env = b.tag_env 325 | where ... 326 | ``` 327 | [Also, ask your friendly neighborhood finance person about whether they would like to track production and other environments separately as cost-of-revenue vs R&D. They will likely say yes.](margin) 328 | The `coalesce()` just adds a default value when there's no entry in your table. A typical cost analysis pipeline can have a dozen of these little mappings. Joining the AWS `account_id` to a table of human-friendly names, for example. Or associating S3 bucket names to services. You might get some of them wrong (is prod-testing production, or testing?) but since you can fix it any time it's low risk. Make your best guess and don't wait for Fulano to answer your email. 329 | 330 | This kind of cardinality reduction is most of what the Cloudstats ETL does. It may feel tedious and unending, but it really is worth your time. With clean dimensions and fixed-up historicals you can operate _much_ more effectively. 331 | 332 | ### Own your ontology 333 | 334 | One special case is when the Foobar service is slowly replaced by Foobar2. For a while they run side-by-side. Or when Foobar is refactored into two new services, foobar-render and foobar-cache. This is normal evolution of compute infrastructure. If you are lucky the Foobar team sets up the service & environment tags properly, and your data stream records all the steps of the handover. 335 | 336 | But you still have a problem. The "service" name has changed in the historical record but you still want to easily track the cat-dollar efficiency of the Foobar system over _all_ its incarnations. Unlike the environment example, you probably don't want to rewrite the `tag_service` field. 
The service names at the time were not wrong. People will still want to analyze them separately.
337 |
338 | A lot of data analysis comes down to curating your own ontologies. To handle this case, one useful trick is to just make up a new synthetic tag. One that you own and no one else can touch. I like to call it "system" because it usually doesn't conflict with existing concepts. By default, you define the "system" as identical to "service" and only add these special cases to your mapping table. The `coalesce()` function works well here too.
339 |
340 | ```
341 | select
342 | a.tag_service as service,
343 |
344 | coalesce(
345 | b.tag_system,
346 | a.tag_service
347 | ) as system
348 | from
349 | cloudstats a left join map_system b
350 | on a.tag_service = b.tag_service
351 | where ...
352 | ```
353 |
354 | ### Monkeytagging
355 |
356 | Another special case is when the Foobar team _doesn't_ get their tagging right at first. What if between 18 April and 7 May the new Foobar had the same service tag as the old one? Programmers have been known to copy/paste, and logging bugs are as likely to happen as any other kind. Often this doesn't matter. There will be a weird little bump in the graphs for a while. But sometimes it is important to be able to distinguish exactly what was happening during that critical time. Production incidents are common as a new service is taking flight.
357 |
358 | A quick patch might be to add a conditional matching on whatever weird states allow you to separate the two services.
359 |
360 | ```
361 | ...
362 | (case
363 |
364 | /* Fix up Foobar2 tagging during launch. See INCIDENT-2045. */
365 | when
366 | tag_service = 'Foobar'
367 | and tag_owner is null
368 | and date between '2021-04-18' and '2021-05-07'
369 | then 'Foobar2'
370 |
371 | else tag_service
372 | end) as tag_service,
373 | ...
374 |
375 | ```
376 |
377 | Yes, monkeypatching data like this is ugly. You could go all the way and engineer proper [slowly-changing dimension](https://en.wikipedia.org/wiki/Slowly_changing_dimension) tables. In my experience it's usually not worth the effort as long as you leave good comments.
378 |
379 | ### Anomaly detection
380 |
381 | Often I come onto a cost project that's already underway. Engineers and finance people are smart & motivated. When the cost problem becomes obvious they don't wait for permission to tackle it. But there's one thing that I wish they didn't do in those early days: bad anomaly detection.
382 |
383 | For example, you might write code that looks at your costs over the last week, and triggers an alert if they're above the average of the last few weeks. Then of course you want to know _where_ the money went, so you'll do separate detection per service, rummaging around your config database to track down the owner. These alerts then get sent by chat or email. Or worse, as work tickets that nag the owner for days after.
384 |
385 | This activity satisfies the need to feel like something is getting done. Your past cost overruns feel like a giant false-negative factory. It's natural to want to put a stop to it. But instead of getting on top of the problem you are burying yourself under a pile of false-positives.
386 |
387 | Yes, you can play with thresholds and maintain a watchlist and do all the other things people do to manage flaky tests. But this is cost analysis, not a test suite. Cost bugs are almost never blocking bugs. "Code coverage" and negative/positive isn't the right mental model to use.
388 | 389 | In order to detect abnormal numbers, first you have to be clear about what is normal. The early version of your measurement system is almost guaranteed to be buggy in dangerous ways. The least-bad outcome of running chirpy detectors on top of buggy numbers is that you just waste people's time and erode their trust. 390 | 391 | [This nuance can be partially automated. The thresholds for `tag_env = 'test'` could be made looser than for production. Another reason to work toward clean tags.](margin) 392 | And even then, "normal" isn't quite the right concept either. New systems waste a lot of resources as their owners figure out how to tune them correctly. Make it work, make it work right, then make it work fast. This is normal. What we are really looking to judge is whether a piece of spend is _acceptable_. 393 | 394 | So what would be a good kind of detector? Ones based on queries you used to find past wins. They can ensure new examples of the same cost bugs get nipped early. Remember that query to measure the growth rate of your S3 buckets? One interesting detector might alarm when a growth rate inflects sharply upward. The assumption is that steady growth is acceptable but changes to that rate need investigation. Similar code could be used to detect when growth _stops_, giving you early warning about data that might not be needed anymore. 395 | 396 | Another good detector might operate on KPIs. One based on "revenue efficiency" might be worth looking into. Emailing hourly alerts about business metrics to your CFO would quickly reveal whether those metrics are well-conceived or not. 397 | 398 | Kidding aside, you should be very clear about what you want an alert to _do_. As in, the effect it should have on the world. If a given alert doesn't change how people think or act then by definition it is inconsequential. It probably shouldn't exist. I haven't come across an alert system that formally measures its own [efficacy](https://en.wikipedia.org/wiki/Efficacy), but at least thinking about that as you design can help. 399 | 400 | Whatever detectors you end up writing, I hope you keep a human in the loop. Even if one hour a day of one valuable person's time is spent winnowing the crud, escalating the worthwhile alerts, and automating where it seems good, it will save the rest of your people a lot more. The goal is still to replace yourself with a small shell script. But first you have to get good enough at generating & judging alerts to be worth automating. 401 | 402 | ## Part three: Capacity Planning 403 | 404 | ![It's a goofy-looking pony that tends to bite, but it'll do the job.](img/aws-dismal/pony.jpg "margin") 405 | So now you have a much clearer and detailed view on what you have spent. Each major service / system is accounted for and its history laid bare. The growth of the operations can be backed out of the cost equation to give you a good idea how efficient your infrastructure actually is. And since we are deep into fantasy land, you also have a pony. 406 | 407 | Now it's time to project what you know into the future, and make plans. 408 | 409 | Here the humble `tag_system` and `tag_env` take another bow. Different environments and accounting buckets aren't just tied to distinctive purposes, they also tend to have different drivers of growth. 410 | 411 | 1. Your devserver costs are directly proportional to the number of people writing code. 412 | 2. CI/test clusters are proportional to the amount of code that has been written. 413 | 3. 
Storage (naturally) is the sum of all data collected to date. 414 | 4. ML training clusters, if any, are roughly proportional to 1 * 2 * 3, which is why ML is so expensive. 415 | 5. Public-facing systems are sized by business volume. 416 | 417 | Joining on system and environment, you can curate another mapping table that will bucket costs by their main drivers. This is where your insider knowledge about your systems, company, revenue, personnel, _and your future plans for them_ come into play, and is a big reason why cost control is an in-house sport. 418 | 419 | ### Wait, is this even right? 420 | 421 | Now is the time to reach for that other table `cloudstats_bill`. You aren't analyzing day-by-day but month-by-month. Also, the cash-basis cost sums in `_bill` will more closely match the real bill. Cap planning often involves financing shenanigans which will diverge from the accrual-basis "cost" metric used by engineers to reduce their usage. 422 | 423 | This query should recreate the sums & totals in your official invoices from AWS. The `cash_cost` sum won't be exact (rounding errors & such) but should match to several significant digits. Also, this query will give you the running total for the current month before the invoice is cut. 424 | 425 | ``` 426 | select 427 | date_month, 428 | invoice_id, 429 | product_code, 430 | bill_line, 431 | sum(unblended_cost) as cash_cost, 432 | sum(net_unblended_cost) as cash_cost_with_discounts, 433 | sum(accrual_cost) as accrual_cost 434 | from cloudstats_bill 435 | group by 1,2,3,4 436 | order by 1 desc; 437 | 438 | ``` 439 | 440 | If you take your curated tags (system, environment, and your brand-new `tag_driver`) and add them to this aggregation, you'll be able to graph those buckets separately over time. Even better, you can use the same trick we used in the compute efficiency section. You can _divide_ those costs by the monthly number of developers, lines of code, customers, revenue, etc etc to back out separate measures of efficiency. Cat-dollars for everyone! 441 | 442 | ### Projections 443 | Writing a robust projection / scenario system is beyond the scope of this article. But you can go quite far with nothing but the output of some queries and a spreadsheet. In fact I strongly suggest that you do these analyses by hand in a spreadsheet first. You will have to pull in data from additional sources like accounting, sales and HR to find some good metrics. Not all of them will be winners. Some of them will be highly confidential. Once you find the useful metrics, then you automate. 444 | 445 | [Some industries have discontinuities like Black Friday, New Years, etc. Demand spikes higher and the risk of systems failure looms large. These events tend to warp all company activity around them for months. Projecting those are _well_ beyond the scope of this article, but the same principles should apply.](margin) 446 | Let's say you run the numbers. Over the last 6 months your average revenue per customer was $100, while `tag_driver='customers'` compute costs were $10 per. (Lucky you!) Your sales people should have pretty good estimates for customer growth. They maintain their own universe of nerdy little details like churn & renewals. Ignore them. Ask for a simple month-by-month projection over the next six months. _Assuming your cost-per-customer stays stable_, you can now estimate that component of your compute costs over those future months. 
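The arithmetic at this stage is nothing fancy. A sketch with made-up numbers, in the same spirit as the scenario snippets below:

```
# hypothetical month-by-month customer forecast from the sales team
forecast = [10500, 11000, 11600, 12100, 12700, 13300]

# trailing average compute cost per customer, from the tag_driver='customers' bucket
cost_per_customer = 10.00

[int(n * cost_per_customer) for n in forecast]

[105000, 110000, 116000, 121000, 127000, 133000]
```

Real forecasts deserve real tooling, but a list comprehension is enough to sanity-check whether the growth your sales team promises is growth your budget can absorb.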
447 | 448 | If you do the same with all of your other efficiency metrics, you can construct a much more accurate picture of your total future cost. How accurate? I don't know. But here's one way to check: run the same projection formulas over past data and see how well they track. 449 | 450 | Don't get _too_ crazy with this. These metrics are a few levels of abstraction away from reality and will contain many confounding factors. They are a directional guide, not an oracle. 451 | 452 | ### Scenarios 453 | 454 | A scenario is just a projection with extra made-up numbers. For instance, you may find that you spend $1,000/month (accrual-basis) per developer on the hardware they use to write & test code. Is that "a lot"? Is it "too much"? That's for you to decide. Every optimization carries a cost in effort _plus_ the opportunity cost of not doing something else. Every optimization also carries savings into the future. Growth can matter, and so you should try to take it into account in your scenarios. 455 | 456 | Say you think that consolidating devservers would save $50k/month today. Your dev team has a projected 1% compounding monthly growth rate. Improving your caching also saves $50k, but is driven by public-facing traffic that has 5% growth. Break out the Python to compare the total savings over a year: 457 | 458 | ``` 459 | # Devservers, 1% growth 460 | sum([int(50000 * (1.01**n)) for n in range(0,12)]) 461 | 462 | 634122 463 | 464 | # Caching, 5% growth 465 | sum([int(50000 * (1.05**n)) for n in range(0,12)]) 466 | 467 | 795852 468 | ``` 469 | 470 | You don't have to choose to do one or the other. You could do both. But lower-priority things have a tendency to fall off the roadmap. Running scenarios with future growth can help you re-order those priorities for maximum win. 471 | 472 | The time horizon also matters. If you delete a PB of unneeded data, the benefit is easy to project into the future: the $50k or whatever of savings from not storing that data goes on indefinitely. On the other hand, the savings from optimizing a piece of code might become irrelevant next year if the system is shut down or rewritten. 473 | 474 | My rule of thumb is to take the measured monthly savings at the moment of the change and project it no more than 12 months out. Straight-line project it if you don't have or don't trust your growth numbers. The 12-month rule will underestimate the impact in some cases, and overestimate in others. But in my experience it more or less comes out even. It’s simple to calculate and short-circuits endless discussion about the details. 475 | 476 | Running scenarios can also help you estimate the cost of new systems. Foobar3 will presumably have about the same cat-dollar ratio as Foobar2, unless one of the goals is to improve its efficiency. (That's something you can and should measure as early as possible during development.) And the launch roadmap will probably call for both versions to run concurrently for a while. Since you know this will happen and can calculate what it will cost, you can give your finance people a gift beyond price: prior warning. 477 | 478 | 479 | ### Strategic buying & tradeoffs 480 | 481 | AWS (and other clouds) have multiple ways to purchase the same or roughly equivalent cloud resources. In general you are trading flexibility or reliability for money. Good scenarios can really help here. 482 | 483 | **Storage tiers** can take up a whole article on their own. Basically you can store some data more cheaply if you don't read it very often. 
The main catch is that you have to know your read access patterns very well. Intelligent Tiering and Storage Lens can give you good insight on that, but they themselves carry a per-file cost.
484 |
485 | And you should maybe have some controls around unexpected reads. At one company that will remain nameless, the storage team Glaciered a bunch of old data they were sure was not being read. Meanwhile, down the hall, the security team was implementing at-rest encryption for S3 data. Do those two things out of order and you are in for a really bad time. Old data gets archived, then is pulled _out_ at very slow speeds and very high cost, encrypted, then _written back_ to the most expensive frequent-access tier. The same could happen with secrets scanning or that big historical analytics job your intern is whiteboarding right now.
486 |
487 | **Chip architectures** are a classic money savings. The new Graviton servers look promising, if your code can be recompiled for the ARM architecture. AMD servers will run most x86 code without change, are cheaper, and come in nearly all of the same flavors as Intel. Everybody knows this. The problem with everybody knowing this is that the Spot price for AMD machines is often higher than Intel on popular instance types...
488 |
489 | ![_Slaps roof_ "You can fit double the k8s pods in this bad boy."](img/aws-dismal/f-150.png "margin")
490 | **Server types**: Speaking of popularity, the [c5.2xlarge](https://instances.vantage.sh/aws/ec2/c5.2xlarge) is the Ford F-150 of AWS servers. Everybody seems to want one. Or a thousand. Again the law of supply and demand works its magic, resulting in a Spot price that's not much better than what you get with a 1-year Reservation. The [4x type](https://instances.vantage.sh/aws/ec2/c5.4xlarge) and larger machines don't appear (as of 2022/2023) to have the same inflated Spot prices, often coming in 10-15% lower than the Reserved price. There can be good technical reasons to run more, smaller computers of a single type than fewer large ones of mixed type. But when you are talking a double-digit difference in price per CPU, it might be worth revisiting them.
491 |
492 | **Spot pricing** is the best choice for most workloads that vary hour-by-hour. Same compute, no commitment, lower price than OnDemand and even Reserved. The tradeoff is that if enough people bid higher than you did, your machine can be interrupted and given to someone else. So this option is best for stateless stuff running in clusters that can heal from nodes popping like soap bubbles.
493 |
494 | By the way, another reason to look at less-popular instance types is that they [tend to have a lower chance](https://aws.amazon.com/ec2/spot/instance-advisor/) of being interrupted as market prices change.
495 |
496 | **Reserved Instances**, in my opinion, are no longer worth the trouble for EC2. Savings Plans give you a lot more flexibility day-by-day for a modest premium. You still need RIs for non-EC2 compute like RDS or ElasticSearch. But databases are stateful and are provisioned statically.
497 |
498 | **Savings Plans** are the best choice for your [base load](https://en.wikipedia.org/wiki/Base_load). Commit to a certain dollar spend over one year or three, then use what you like across the entire menu of AWS servers.
499 |
500 | But how do you measure your optimal commit? The same way a power plant does. Graph your compute usage, by hour, over a month. Take the average of the lowest hour per day.
That's a good estimate of the bare minimum it takes to keep your site up in the wee hours when everyone is asleep.
501 |
502 | ![](img/aws-dismal/power-plant.png)
503 |
504 | In the ideal case your base load will be served close to 100% by Savings Plan, and the rest by Spot. You can't improve on that without reducing consumption, making pre-payments, or negotiating a better contract.
505 |
506 | However you game out your SP and RI buys, one bit of advice: don't buy them all at once. Buy in small tranches, say 10% of the total, spread out over 10 weeks. There are a couple of good reasons for this even though it feels like you're leaving money on the table. It gives you time to see the effects of your buys, and to adjust if your engineers move the baseline. Also you'll thank yourself in 1-3 years as the tranches expire in little weekly chunks instead of dropping off a cliff on the same day.
507 |
508 | ![](img/aws-dismal/staggered-vs-cliffed.png)
509 |
510 | ## Part four: Boss Level
511 | Remember about 8,000 words ago when I said that cost control is similar to security work? That includes the importance of accurate threat modeling. Thinking about your vendors as slightly-adversarial players in a zero-sum game might sound a little cynical, but experience suggests that it's the way to go. You have to know more angles than they do. And they know a lot of them.
512 |
513 |
514 | Any vendor's ideal customer is one who
515 |
516 | 1. is totally dependent on them,
517 | 2. negotiates only based on headline discounts,
518 | 3. prefers long-term contracts with spend commits defined on a schedule suggested by the vendor, and
519 | 4. negotiates quickly to get it over with.
520 |
521 | The longer the contract term, the better for the vendor. By the time a five-year deal goes up for renewal, the customer's institutional knowledge has often leaked away.
522 |
523 | The best situation for the customer (you!) is the opposite: they negotiate with finance, procurement, and engineering involved at all levels throughout the process. They use detailed data and forecasts informed by future plans. They ask for tailored terms over shorter periods. And they can credibly threaten to move to another provider.
524 |
525 | ### Timing
526 | In general you want to give yourself time, two or three quarters ahead of any renewal date. The consequences of going out of contract are generally bad for you and good for them. You need time to deal with new information as it comes up, or to work out alternatives if they push back on something. That said, the deadline usually isn't fatal. If you don't come to an agreement past closing time, [MSAs](https://en.wikipedia.org/wiki/Master_service_agreement) often have a grace period afterward. But you have to read your contract carefully to know where you stand.
527 |
528 | [Especially as of this writing (early 2023), economic conditions are evolving rapidly. I'm told that cloud customers are howling for new terms off-cycle as the numbers shift underneath everyone.](margin)
529 | And don't get too caught up in deadlines. Now is always a good time to renegotiate, _if_ you are prepared. You’ll have to push, and give yourself time to do that pushing, but if you are prepared to offer tangible things (or credible threats) to your vendors in exchange for better terms, anything is negotiable.
530 |
531 | Vendors want predictable, growing revenue from customers caught in a web of barriers (technological, financial, legal, etc) to exiting their ecosystem.
Look out for the terms that satisfy or threaten those desires, and use them as leverage in negotiation.
532 |
533 | ### Leverage
534 |
535 | Your strongest leverage, as with all service contracts, is having the time and ability to go somewhere else. But no one moves their compute infrastructure overnight. During my time as a regular employee at both Facebook and Yahoo, those companies went on a multi-quarter quest to build their own CDNs to reduce dependence on Akamai. They did it in a very obvious way, with lots of winking and wiggling eyebrows. In the end both companies got very nice deals when the contract came up for renewal, because if they didn’t like it they could walk away after a few more quarters. Obviously those were special circumstances, but that's the kind of heavy leverage you might need to bring.
536 |
537 | The second-strongest leverage is demonstrating insight into and control over your costs. For example, _"Look, we entered the last contract with scheduled increases you suggested. Now we have a lot more knowledge about how we use AWS. Our spend is leveling off, and we project..."_ etc etc. You don't have to give them all the details. But if you are confident about your future usage, you can negotiate better on cost. Avoid, as much as possible, giving the impression that you are dumb money.
538 |
539 | The third-strongest leverage is offering them new business, and restructuring the terms around that business. Say, you are opening operations in a new region, or migrating compute to the edge, or want to use a _lot_ of some new product they want to be successful. You can even play a little hardball with this. New infrastructure is more easily built on their competitors. They shouldn't assume that any new business is automatically theirs.
540 |
541 | Fourth-strongest (and most dangerous for you) is extending the contract period. Instead of a 3 year term, offer to sign for 5. They may present this as a first option, but keep it as a last resort.
542 |
543 |
544 | ### Terms
545 |
546 | While the contract the vendor proposes may appear to be all custom and bespoke, and every discount may be presented as something special your enterprise rep was able to get just for you, it's really not.
547 |
548 | **EDP discounts** are proportional to spend commit and duration. Because of confidentiality clauses, hard numbers are hard to come by or talk about publicly. (Wouldn't collective bargaining by a lot of customers at once be a wonderful thing?) Even so, [this post](https://www.cloudforecast.io/blog/aws-edp-guide/) has a decent guide.
549 |
550 | **Specificity** sometimes helps. Remember, AWS builds and maintains monstrously big buildings stuffed full of computers in many regions. By default or laziness, everybody and their dog seems to put their servers in the first region us-east-1 in Herndon Virginia. AWS sometimes feels pressure to spread the load to other datacenters. If you are able to move or expand into them, don't just give that away.
551 |
552 | **Private rates** are a thing. Essentially you can negotiate a special rate for specific usage, eg for certain types of compute or classes of AWS products. Public information is thin. Even the documentation for the relevant `discount_*` fields in the CUR appears to be missing from [the user guide](https://docs.aws.amazon.com/cur/latest/userguide/discount-details.html). But knowing that they are something you can ask for is half the battle.
553 |
554 | **Support** is a good area to poke at. By default the enterprise support tax is 3%.
Let’s say you spend $10M annually. The support charge would be $300K, or the equivalent of 2-3 FTEs. Are you using the services of 3 AWS engineers at 40 hours per week, or 6,000 people-hours per year? Do you suddenly use _less_ support time if you reduce your dollar spend? Probably not. Support costs scale with _complexity_, not the gross dollars spent. It would be straightforward to estimate the actual work-hours you consume by looking at your support ticket history. Bring that to the table as a negotiating point. You can use it either to get a better rate on support itself, or as a bargaining chip to exchange for some other term you want more. 555 | 556 | **Exclusivity**: run far and fast from any terms that lock you into a single provider. They may be 100 pages deep in the contract, so read carefully. Also push back hard on any terms that allow you to _use_ the vendor's competitors as long as you don't _talk_ about it in public. The ability to move, and to be public about moving, is your number-one leverage. Don't throw it away. 557 | 558 | ### Don't get greedy 559 | Playing with all of these levers can be fun. But don't approach it with the implied goal of "have we saved everything we can?" Better is the explicit goal of "can we reliably predict and control costs, up _and_ down, when we need to?" You are still a business with competition. A racecar driver doesn't win on fuel efficiency. You win by going fast when you can and shifting momentum when you must, all with conscious intent. 560 | 561 | ## Conclusion 562 | 563 | This not-an-article ended up being a lot longer than I expected. Even so it breezes through a wide range of full-time professions, and I'm sure those professionals would have many things to say about what I got wrong or left out. This is meant to be a basic guide to cloud cost control in general and a user manual on AWS/Cloudstats in particular. Comments and pull requests are always welcome. 564 | 565 | My main job is setting up Green Teams for eight- and nine-figure budgets. I contract with one client at a time as a high-level IC, working with finance, execs, engineering, and vendors. I advise at all levels and can write the code that's needed to measure your cost & usage properly. Then I go on to the next client. My main value is teaching your people how to do cost control as a continuous practice. If you want to chat about an engagement on any brand of cloud, drop me an email: carlos@bueno.org. 566 | 567 | While looking for potential clients I run into roughly 10x more people who spend "only" seven figures per year. They are interested in running their own Green Teams but maybe can't afford a full-time consultant. I open-sourced this code and wrote this article to try to help them out. I hope it's useful to you. GLHF! 568 | 569 | 570 | --------------------------------------------------------------------------------