├── .gitignore ├── integration_tests ├── .gitignore ├── data │ ├── fill_staging_columns_input.csv │ └── fill_staging_columns_output.csv ├── packages.yml ├── dbt_project.yml ├── models │ ├── schema.yml │ └── fill_staging_columns_test.sql ├── macros │ └── get_fill_staging_columns_columns.sql └── ci │ └── sample.profiles.yml ├── .DS_Store ├── macros ├── snowflake_seed_data.sql ├── enabled_vars.sql ├── enabled_vars_one_true.sql ├── _get_utils_namespaces.sql ├── pivot_json_extract.sql ├── ceiling.sql ├── seed_data_helper.sql ├── array_agg.sql ├── empty_variable_warning.sql ├── add_pass_through_columns.sql ├── remove_prefix_from_columns.sql ├── fill_pass_through_columns.sql ├── string_agg.sql ├── max_bool.sql ├── json_extract.sql ├── staging_models_automation.sql ├── source_relation.sql ├── percentile.sql ├── generate_columns_macro.sql ├── first_value.sql ├── dummy_coalesce_value.sql ├── collect_freshness.sql ├── timestamp_add.sql ├── json_parse.sql ├── fill_staging_columns.sql ├── union_data.sql ├── get_columns_for_macro.sql ├── timestamp_diff.sql └── union_relations.sql ├── dbt_project.yml ├── docs └── PULL_REQUEST_TEMPLATE.md ├── columns_setup.sh └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | 2 | target/ 3 | dbt_modules/ 4 | logs/ 5 | -------------------------------------------------------------------------------- /integration_tests/.gitignore: -------------------------------------------------------------------------------- 1 | 2 | target/ 3 | dbt_modules/ 4 | logs/ 5 | -------------------------------------------------------------------------------- /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fivetran/dbt_fivetran_utils/HEAD/.DS_Store -------------------------------------------------------------------------------- /integration_tests/data/fill_staging_columns_input.csv: -------------------------------------------------------------------------------- 1 | column_date,column_string,column_int 2 | 2020-01-01,testing,27 -------------------------------------------------------------------------------- /integration_tests/packages.yml: -------------------------------------------------------------------------------- 1 | packages: 2 | - local: ../ 3 | - package: fishtown-analytics/dbt_utils 4 | version: 0.6.2 -------------------------------------------------------------------------------- /integration_tests/data/fill_staging_columns_output.csv: -------------------------------------------------------------------------------- 1 | column_date,column_string,column_int_alias,column_float 2 | 2020-01-01,testing,27, -------------------------------------------------------------------------------- /integration_tests/dbt_project.yml: -------------------------------------------------------------------------------- 1 | name: 'fivetran_utils_integration_tests' 2 | version: '0.1.0' 3 | config-version: 2 4 | profile: 'integration_tests' -------------------------------------------------------------------------------- /integration_tests/models/schema.yml: -------------------------------------------------------------------------------- 1 | version: 2 2 | 3 | models: 4 | - name: fill_staging_columns_test 5 | tests: 6 | - dbt_utils.equality: 7 | compare_model: ref('fill_staging_columns_output') -------------------------------------------------------------------------------- /macros/snowflake_seed_data.sql: -------------------------------------------------------------------------------- 1 | {% macro snowflake_seed_data(seed_name) %} 2 | 3 | {% if target.type == 'snowflake' %} 4 | {{ return(ref(seed_name ~ '_snowflake')) }} 5 | {% else %} 6 | {{ return(ref(seed_name)) }} 7 | {% endif %} 8 | 9 | {% endmacro %} -------------------------------------------------------------------------------- /macros/enabled_vars.sql: -------------------------------------------------------------------------------- 1 | {% macro enabled_vars(vars) %} 2 | 3 | {% for v in vars %} 4 | 5 | {% if var(v, True) == False %} 6 | {{ return(False) }} 7 | {% endif %} 8 | 9 | {% endfor %} 10 | 11 | {{ return(True) }} 12 | 13 | {% endmacro %} -------------------------------------------------------------------------------- /macros/enabled_vars_one_true.sql: -------------------------------------------------------------------------------- 1 | {% macro enabled_vars_one_true(vars) %} 2 | 3 | {% for v in vars %} 4 | 5 | {% if var(v, False) == True %} 6 | {{ return(True) }} 7 | {% endif %} 8 | 9 | {% endfor %} 10 | 11 | {{ return(False) }} 12 | 13 | {% endmacro %} -------------------------------------------------------------------------------- /macros/_get_utils_namespaces.sql: -------------------------------------------------------------------------------- 1 | -- macro is necessary for cross-db compatibility to work 2 | {% macro _get_utils_namespaces() %} 3 | {% set override_namespaces = var('fivetran_utils_dispatch_list', []) %} 4 | {% do return(override_namespaces + ['dbt_utils', 'fivetran_utils']) %} 5 | {% endmacro %} 6 | -------------------------------------------------------------------------------- /macros/pivot_json_extract.sql: -------------------------------------------------------------------------------- 1 | {% macro pivot_json_extract(string, list_of_properties) %} 2 | 3 | {%- for property in list_of_properties -%} 4 | 5 | replace( {{ fivetran_utils.json_extract(string, property) }}, '"', '') as {{ property | replace(' ', '_') | lower }} 6 | 7 | {%- if not loop.last -%},{%- endif %} 8 | {% endfor -%} 9 | 10 | {% endmacro %} -------------------------------------------------------------------------------- /macros/ceiling.sql: -------------------------------------------------------------------------------- 1 | {% macro ceiling(num) -%} 2 | 3 | {{ adapter.dispatch('ceiling', packages = fivetran_utils._get_utils_namespaces()) (num) }} 4 | 5 | {%- endmacro %} 6 | 7 | {% macro default__ceiling(num) %} 8 | ceiling({{ num }}) 9 | 10 | {% endmacro %} 11 | 12 | {% macro snowflake__ceiling(num) %} 13 | ceil({{ num }}) 14 | 15 | {% endmacro %} 16 | -------------------------------------------------------------------------------- /dbt_project.yml: -------------------------------------------------------------------------------- 1 | name: 'fivetran_utils' 2 | version: '0.1.0' 3 | config-version: 2 4 | 5 | source-paths: ["models"] 6 | analysis-paths: ["analysis"] 7 | test-paths: ["tests"] 8 | data-paths: ["data"] 9 | macro-paths: ["macros"] 10 | snapshot-paths: ["snapshots"] 11 | 12 | 13 | vars: 14 | fivetran_utils: 15 | dbt_utils_dispatch_list: 16 | - fivetran_utils -------------------------------------------------------------------------------- /macros/seed_data_helper.sql: -------------------------------------------------------------------------------- 1 | {% macro seed_data_helper(seed_name, warehouses) %} 2 | 3 | {% if target.type in warehouses %} 4 | {% for w in warehouses %} 5 | {% if target.type == w %} 6 | {{ return(ref(seed_name ~ "_" ~ w ~ "")) }} 7 | {% endif %} 8 | {% endfor %} 9 | {% else %} 10 | {{ return(ref(seed_name)) }} 11 | {% endif %} 12 | 13 | {% endmacro %} -------------------------------------------------------------------------------- /macros/array_agg.sql: -------------------------------------------------------------------------------- 1 | {% macro array_agg(field_to_agg) -%} 2 | 3 | {{ adapter.dispatch('array_agg', packages = fivetran_utils._get_utils_namespaces()) (field_to_agg) }} 4 | 5 | {%- endmacro %} 6 | 7 | {% macro default__array_agg(field_to_agg) %} 8 | array_agg({{ field_to_agg }}) 9 | {% endmacro %} 10 | 11 | {% macro redshift__array_agg(field_to_agg) %} 12 | listagg({{ field_to_agg }}, ',') 13 | {% endmacro %} -------------------------------------------------------------------------------- /macros/empty_variable_warning.sql: -------------------------------------------------------------------------------- 1 | {% macro empty_variable_warning(variable, downstream_model) %} 2 | 3 | {% if not var(variable) %} 4 | {{ log( 5 | """ 6 | Warning: You have passed an empty list to the """ ~ variable ~ """. 7 | As a result, you won't see the history of any columns in the """ ~ downstream_model ~ """ model. 8 | """, 9 | info=True 10 | ) }} 11 | {% endif %} 12 | 13 | {% endmacro %} -------------------------------------------------------------------------------- /macros/add_pass_through_columns.sql: -------------------------------------------------------------------------------- 1 | {% macro add_pass_through_columns(base_columns, pass_through_var) %} 2 | 3 | {% if pass_through_var %} 4 | 5 | {% for column in pass_through_var %} 6 | 7 | {% do base_columns.append({ "name": column.name, "alias": column.alias }) if column.alias else base_columns.append({ "name": column.name }) %} 8 | 9 | {% endfor %} 10 | 11 | {% endif %} 12 | 13 | {% endmacro %} -------------------------------------------------------------------------------- /macros/remove_prefix_from_columns.sql: -------------------------------------------------------------------------------- 1 | {% macro remove_prefix_from_columns(columns, prefix='', exclude=[]) %} 2 | 3 | {%- for col in columns if col.name not in exclude -%} 4 | {%- if col.name[:prefix|length]|lower == prefix -%} 5 | {{ col.name }} as {{ col.name[prefix|length:] }} 6 | {%- else -%} 7 | {{ col.name }} 8 | {%- endif -%} 9 | {%- if not loop.last -%},{%- endif %} 10 | {% endfor -%} 11 | 12 | {% endmacro %} -------------------------------------------------------------------------------- /integration_tests/macros/get_fill_staging_columns_columns.sql: -------------------------------------------------------------------------------- 1 | {% macro get_fill_staging_columns_columns() %} 2 | 3 | {% set columns = [ 4 | {"name": "column_date", "datatype": "date"}, 5 | {"name": "column_string", "datatype": dbt_utils.type_string()}, 6 | {"name": "column_int", "datatype": dbt_utils.type_int(), "alias": "column_int_alias"}, 7 | {"name": "column_float", "datatype": dbt_utils.type_float()} 8 | ] %} 9 | 10 | {{ return(columns) }} 11 | 12 | {% endmacro %} -------------------------------------------------------------------------------- /macros/fill_pass_through_columns.sql: -------------------------------------------------------------------------------- 1 | {% macro fill_pass_through_columns(pass_through_variable) %} 2 | 3 | {% if var(pass_through_variable) %} 4 | {% for field in var(pass_through_variable) %} 5 | {% if field.transform_sql %} 6 | , {{ field.transform_sql }} as {{ field.alias if field.alias else field.name }} 7 | {% else %} 8 | , {{ field.alias if field.alias else field.name }} 9 | {% endif %} 10 | {% endfor %} 11 | {% endif %} 12 | 13 | {% endmacro %} -------------------------------------------------------------------------------- /integration_tests/models/fill_staging_columns_test.sql: -------------------------------------------------------------------------------- 1 | with source as ( 2 | 3 | select * 4 | from {{ ref('fill_staging_columns_input') }} 5 | 6 | ), 7 | 8 | renamed as ( 9 | 10 | select 11 | 12 | {{ 13 | fivetran_utils.fill_staging_columns( 14 | source_columns=adapter.get_columns_in_relation(ref('fill_staging_columns_input')), 15 | staging_columns=get_fill_staging_columns_columns() 16 | ) 17 | }} 18 | 19 | from source 20 | 21 | ) 22 | 23 | select * 24 | from renamed -------------------------------------------------------------------------------- /macros/string_agg.sql: -------------------------------------------------------------------------------- 1 | 2 | {% macro string_agg(field_to_agg, delimiter) -%} 3 | 4 | {{ adapter.dispatch('string_agg', packages = fivetran_utils._get_utils_namespaces()) (field_to_agg, delimiter) }} 5 | 6 | {%- endmacro %} 7 | 8 | {% macro default__string_agg(field_to_agg, delimiter) %} 9 | string_agg({{ field_to_agg }}, {{ delimiter }}) 10 | 11 | {% endmacro %} 12 | 13 | {% macro snowflake__string_agg(field_to_agg, delimiter) %} 14 | listagg({{ field_to_agg }}, {{ delimiter }}) 15 | 16 | {% endmacro %} 17 | 18 | {% macro redshift__string_agg(field_to_agg, delimiter) %} 19 | listagg({{ field_to_agg }}, {{ delimiter }}) 20 | 21 | {% endmacro %} -------------------------------------------------------------------------------- /macros/max_bool.sql: -------------------------------------------------------------------------------- 1 | {% macro max_bool(boolean_field) -%} 2 | 3 | {{ adapter.dispatch('max_bool', packages = fivetran_utils._get_utils_namespaces()) (boolean_field) }} 4 | 5 | {%- endmacro %} 6 | 7 | --Default max_bool calculation 8 | {% macro default__max_bool(boolean_field) %} 9 | 10 | bool_or( {{ boolean_field }} ) 11 | 12 | {% endmacro %} 13 | 14 | --max_bool calculation specific to Snowflake 15 | {% macro snowflake__max_bool(boolean_field) %} 16 | 17 | max( {{ boolean_field }} ) 18 | 19 | {% endmacro %} 20 | 21 | --max_bool calculation specific to BigQuery 22 | {% macro bigquery__max_bool(boolean_field) %} 23 | 24 | max( {{ boolean_field }} ) 25 | 26 | {% endmacro %} -------------------------------------------------------------------------------- /macros/json_extract.sql: -------------------------------------------------------------------------------- 1 | {% macro json_extract(string, string_path) -%} 2 | 3 | {{ adapter.dispatch('json_extract', packages = fivetran_utils._get_utils_namespaces()) (string, string_path) }} 4 | 5 | {%- endmacro %} 6 | 7 | {% macro default__json_extract(string, string_path) %} 8 | 9 | json_extract_path_text({{string}}, {{ "'" ~ string_path ~ "'" }} ) 10 | 11 | {% endmacro %} 12 | 13 | {% macro bigquery__json_extract(string, string_path) %} 14 | 15 | json_extract_scalar({{string}}, {{ "'$." ~ string_path ~ "'" }} ) 16 | 17 | {% endmacro %} 18 | 19 | {% macro postgres__json_extract(string, string_path) %} 20 | 21 | {{string}}::json->>{{"'" ~ string_path ~ "'" }} 22 | 23 | {% endmacro %} 24 | -------------------------------------------------------------------------------- /docs/PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | **What change does this PR introduce?** 2 | 3 | 4 | **If this PR introduces a new macro, how did you test the new macro?** 5 | 6 | 7 | **If this PR introduces a modification to an existing macro, which packages is the macro currently present in and what steps were taken to test compatibility across packages?** 8 | 9 | 10 | **Did you update the README to reflect the macro addition/modifications?** 11 | 12 | - [ ] Yes 13 | - [ ] No (provide further explanation) 14 | -------------------------------------------------------------------------------- /macros/staging_models_automation.sql: -------------------------------------------------------------------------------- 1 | {% macro staging_models_automation(package, source_schema, source_database, tables) %} 2 | 3 | {% set package = ""~ package ~"" %} 4 | {% set source_schema = ""~ source_schema ~"" %} 5 | {% set source_database = ""~ source_database ~"" %} 6 | 7 | {% set zsh_command = "source dbt_modules/fivetran_utils/columns_setup.sh '../dbt_"""~ package ~"""_source' stg_"""~ package ~""" """~ source_database ~""" """~ source_schema ~""" " %} 8 | 9 | {% for t in tables %} 10 | {% if t != tables[-1] %} 11 | {% set help_command = zsh_command + t + " && \n" %} 12 | 13 | {% else %} 14 | {% set help_command = zsh_command + t %} 15 | 16 | {% endif %} 17 | {{ log(help_command, info=True) }} 18 | 19 | {% endfor %} 20 | 21 | {% endmacro %} 22 | -------------------------------------------------------------------------------- /macros/source_relation.sql: -------------------------------------------------------------------------------- 1 | {% macro source_relation() -%} 2 | 3 | {{ adapter.dispatch('source_relation', packages = fivetran_utils._get_utils_namespaces()) () }} 4 | 5 | {%- endmacro %} 6 | 7 | {% macro default__source_relation() %} 8 | 9 | {% if var('union_schemas', none) %} 10 | , case 11 | {% for schema in var('union_schemas') %} 12 | when lower(replace(replace(_dbt_source_relation,'"',''),'`','')) like '%.{{ schema|lower }}.%' then '{{ schema|lower }}' 13 | {% endfor %} 14 | end as source_relation 15 | {% elif var('union_databases', none) %} 16 | , case 17 | {% for database in var('union_databases') %} 18 | when lower(replace(replace(_dbt_source_relation,'"',''),'`','')) like '%{{ database|lower }}.%' then '{{ database|lower }}' 19 | {% endfor %} 20 | end as source_relation 21 | {% else %} 22 | , '' as source_relation 23 | {% endif %} 24 | 25 | {% endmacro %} 26 | -------------------------------------------------------------------------------- /macros/percentile.sql: -------------------------------------------------------------------------------- 1 | {% macro percentile(percentile_field, partition_field, percent) -%} 2 | 3 | {{ adapter.dispatch('percentile', packages = fivetran_utils._get_utils_namespaces()) (percentile_field, partition_field, percent) }} 4 | 5 | {%- endmacro %} 6 | 7 | --percentile calculation specific to Redshift 8 | {% macro default__percentile(percentile_field, partition_field, percent) %} 9 | 10 | percentile_cont( 11 | {{ percent }} ) 12 | within group ( order by {{ percentile_field }} ) 13 | over ( partition by {{ partition_field }} ) 14 | 15 | {% endmacro %} 16 | 17 | --percentile calculation specific to BigQuery 18 | {% macro bigquery__percentile(percentile_field, partition_field, percent) %} 19 | 20 | percentile_cont( 21 | {{ percentile_field }}, 22 | {{ percent }}) 23 | over (partition by {{ partition_field }} 24 | ) 25 | 26 | {% endmacro %} -------------------------------------------------------------------------------- /columns_setup.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | mkdir -p $1/macros 3 | mkdir -p $1/models/tmp 4 | dbt run-operation fivetran_utils.generate_columns_macro --args '{"table_name": "'$5'", "schema_name": "'$4'", "database_name":"'$3'"}' | tail -n +2 > $1/macros/get_$5_columns.sql 5 | echo "select * from {{ var('$5') }}" > $1/models/tmp/$2__$5_tmp.sql 6 | echo "" >> $1/models/$2__$5.sql 7 | echo "with base as ( 8 | 9 | select * 10 | from {{ ref('$2__$5_tmp') }} 11 | 12 | ), 13 | 14 | fields as ( 15 | 16 | select 17 | {{ 18 | fivetran_utils.fill_staging_columns( 19 | source_columns=adapter.get_columns_in_relation(ref('$2__$5_tmp')), 20 | staging_columns=get_$5_columns() 21 | ) 22 | }} 23 | 24 | from base 25 | ), 26 | 27 | final as ( 28 | 29 | select 30 | -- rename here 31 | from fields 32 | ) 33 | 34 | select * from final" >> $1/models/$2__$5.sql 35 | -------------------------------------------------------------------------------- /macros/generate_columns_macro.sql: -------------------------------------------------------------------------------- 1 | {% macro generate_columns_macro(table_name, schema_name, database_name=target.database) %} 2 | 3 | {% set columns = get_columns_for_macro(table_name, schema_name, database_name) %} 4 | 5 | {% set jinja_macro=[] %} 6 | 7 | {% do jinja_macro.append('{% macro get_' ~ table_name ~ '_columns() %}') %} 8 | {% do jinja_macro.append('') %} 9 | {% do jinja_macro.append('{% set columns = [') %} 10 | 11 | {% for col in columns %} 12 | {% do jinja_macro.append(' ' ~ col ~ (',' if not loop.last)) %} 13 | {% endfor %} 14 | 15 | {% do jinja_macro.append('] %}') %} 16 | {% do jinja_macro.append('') %} 17 | {% do jinja_macro.append('{{ return(columns) }}') %} 18 | {% do jinja_macro.append('') %} 19 | {% do jinja_macro.append('{% endmacro %}') %} 20 | 21 | {% if execute %} 22 | 23 | {% set joined = jinja_macro | join ('\n') %} 24 | {{ log(joined, info=True) }} 25 | {% do return(joined) %} 26 | 27 | {% endif %} 28 | 29 | {% endmacro %} -------------------------------------------------------------------------------- /macros/first_value.sql: -------------------------------------------------------------------------------- 1 | {% macro first_value(first_value_field, partition_field, order_by_field, order="asc") -%} 2 | 3 | {{ adapter.dispatch('first_value', packages = fivetran_utils._get_utils_namespaces()) (first_value_field, partition_field, order_by_field, order) }} 4 | 5 | {%- endmacro %} 6 | 7 | --Default first_value calculation 8 | {% macro default__first_value(first_value_field, partition_field, order_by_field, order="asc") %} 9 | 10 | first_value( {{ first_value_field }} ignore nulls ) over (partition by {{ partition_field }} order by {{ order_by_field }} {{ order }} ) 11 | 12 | {% endmacro %} 13 | 14 | --first_value calculation specific to Redshift 15 | {% macro redshift__first_value(first_value_field, partition_field, order_by_field, order="asc") %} 16 | 17 | first_value( {{ first_value_field }} ignore nulls ) over (partition by {{ partition_field }} order by {{ order_by_field }} {{ order }} , {{ partition_field }} rows unbounded preceding ) 18 | 19 | {% endmacro %} -------------------------------------------------------------------------------- /macros/dummy_coalesce_value.sql: -------------------------------------------------------------------------------- 1 | {% macro dummy_coalesce_value(column) %} 2 | 3 | {% set coalesce_value = { 4 | 'STRING': "'DUMMY_STRING'", 5 | 'BOOLEAN': 'null', 6 | 'INT': 999999999, 7 | 'FLOAT': 999999999.99, 8 | 'TIMESTAMP': 'cast("2099-12-31" as timestamp)', 9 | 'DATE': 'cast("2099-12-31" as date)', 10 | } %} 11 | 12 | {% if column.is_float() %} 13 | {{ return(coalesce_value['FLOAT']) }} 14 | 15 | {% elif column.is_numeric() %} 16 | {{ return(coalesce_value['INT']) }} 17 | 18 | {% elif column.is_string() %} 19 | {{ return(coalesce_value['STRING']) }} 20 | 21 | {% elif column.data_type|lower == 'boolean' %} 22 | {{ return(coalesce_value['BOOLEAN']) }} 23 | 24 | {% elif 'timestamp' in column.data_type|lower %} 25 | {{ return(coalesce_value['TIMESTAMP']) }} 26 | 27 | {% elif 'date' in column.data_type|lower %} 28 | {{ return(coalesce_value['DATE']) }} 29 | 30 | {% elif 'int' in column.data_type|lower %} 31 | {{ return(coalesce_value['INT']) }} 32 | 33 | {% endif %} 34 | 35 | 36 | {% endmacro %} -------------------------------------------------------------------------------- /macros/collect_freshness.sql: -------------------------------------------------------------------------------- 1 | {% macro collect_freshness(source, loaded_at_field, filter) %} 2 | {{ return(adapter.dispatch('collect_freshness')(source, loaded_at_field, filter))}} 3 | {% endmacro %} 4 | 5 | 6 | {% macro default__collect_freshness(source, loaded_at_field, filter) %} 7 | {% call statement('collect_freshness', fetch_result=True, auto_begin=False) -%} 8 | 9 | {%- set enabled_array = [] -%} 10 | {% for node in graph.sources.values() %} 11 | {% if node.name == source.name %} 12 | {% if (node.meta['is_enabled'] | default(true)) %} 13 | {%- do enabled_array.append(1) -%} 14 | {% endif %} 15 | {% endif %} 16 | {% endfor %} 17 | {% set is_enabled = (enabled_array != []) %} 18 | 19 | select 20 | {% if is_enabled %} 21 | max({{ loaded_at_field }}) 22 | {% else %} 23 | {{ current_timestamp() }} {% endif %} as max_loaded_at, 24 | {{ current_timestamp() }} as snapshotted_at 25 | 26 | {% if is_enabled %} 27 | from {{ source }} 28 | {% if filter %} 29 | where {{ filter }} 30 | {% endif %} 31 | {% endif %} 32 | 33 | {% endcall %} 34 | {{ return(load_result('collect_freshness').table) }} 35 | {% endmacro %} -------------------------------------------------------------------------------- /macros/timestamp_add.sql: -------------------------------------------------------------------------------- 1 | {% macro timestamp_add(datepart, interval, from_timestamp) -%} 2 | 3 | {{ adapter.dispatch('timestamp_add', packages = fivetran_utils._get_utils_namespaces()) (datepart, interval, from_timestamp) }} 4 | 5 | {%- endmacro %} 6 | 7 | 8 | {% macro default__timestamp_add(datepart, interval, from_timestamp) %} 9 | 10 | timestampadd( 11 | {{ datepart }}, 12 | {{ interval }}, 13 | {{ from_timestamp }} 14 | ) 15 | 16 | {% endmacro %} 17 | 18 | 19 | {% macro bigquery__timestamp_add(datepart, interval, from_timestamp) %} 20 | 21 | timestamp_add({{ from_timestamp }}, interval {{ interval }} {{ datepart }}) 22 | 23 | {% endmacro %} 24 | 25 | {% macro redshift__timestamp_add(datepart, interval, from_timestamp) %} 26 | 27 | dateadd( 28 | {{ datepart }}, 29 | {{ interval }}, 30 | {{ from_timestamp }} 31 | ) 32 | 33 | {% endmacro %} 34 | 35 | {% macro postgres__timestamp_add(datepart, interval, from_timestamp) %} 36 | 37 | {{ from_timestamp }} + ((interval '1 {{ datepart }}') * ({{ interval }})) 38 | 39 | {% endmacro %} 40 | 41 | {% macro spark__timestamp_add(datepart, interval, from_timestamp) %} 42 | 43 | {{ dbt_utils.dateadd(datepart, interval, from_timestamp) }} 44 | 45 | {% endmacro %} 46 | -------------------------------------------------------------------------------- /macros/json_parse.sql: -------------------------------------------------------------------------------- 1 | {% macro json_parse(string, string_path) -%} 2 | 3 | {{ adapter.dispatch('json_parse', packages = fivetran_utils._get_utils_namespaces()) (string, string_path) }} 4 | 5 | {%- endmacro %} 6 | 7 | {% macro default__json_parse(string, string_path) %} 8 | 9 | json_extract_path_text({{string}}, {%- for s in string_path -%}'{{ s }}'{%- if not loop.last -%},{%- endif -%}{%- endfor -%} ) 10 | 11 | {% endmacro %} 12 | 13 | {% macro bigquery__json_parse(string, string_path) %} 14 | 15 | 16 | json_extract_scalar({{string}}, '$.{%- for s in string_path -%}{{ s }}{%- if not loop.last -%}.{%- endif -%}{%- endfor -%} ') 17 | 18 | {% endmacro %} 19 | 20 | {% macro postgres__json_parse(string, string_path) %} 21 | 22 | {{string}}::json #>> '{ {%- for s in string_path -%}{{ s }}{%- if not loop.last -%},{%- endif -%}{%- endfor -%} }' 23 | 24 | {% endmacro %} 25 | 26 | {% macro snowflake__json_parse(string, string_path) %} 27 | 28 | parse_json( {{string}} ) {%- for s in string_path -%}{% if s is number %}[{{ s }}]{% else %}['{{ s }}']{% endif %}{%- endfor -%} 29 | 30 | {% endmacro %} 31 | 32 | {% macro spark__json_parse(string, string_path) %} 33 | 34 | {{string}} : {%- for s in string_path -%}{% if s is number %}[{{ s }}]{% else %}['{{ s }}']{% endif %}{%- endfor -%} 35 | 36 | {% endmacro %} 37 | -------------------------------------------------------------------------------- /macros/fill_staging_columns.sql: -------------------------------------------------------------------------------- 1 | {% macro fill_staging_columns(source_columns, staging_columns) -%} 2 | 3 | {%- set source_column_names = source_columns|map(attribute='name')|map('lower')|list -%} 4 | 5 | {%- for column in staging_columns %} 6 | {% if column.name|lower in source_column_names -%} 7 | {{ fivetran_utils.quote_column(column) }} as 8 | {%- if 'alias' in column %} {{ column.alias }} {% else %} {{ fivetran_utils.quote_column(column) }} {%- endif -%} 9 | {%- else -%} 10 | cast(null as {{ column.datatype }}) 11 | {%- if 'alias' in column %} as {{ column.alias }} {% else %} as {{ fivetran_utils.quote_column(column) }} {% endif -%} 12 | {%- endif -%} 13 | {%- if not loop.last -%} , {% endif -%} 14 | {% endfor %} 15 | 16 | {% endmacro %} 17 | 18 | 19 | {% macro quote_column(column) %} 20 | {% if 'quote' in column %} 21 | {% if column.quote %} 22 | {% if target.type in ('bigquery', 'spark') %} 23 | `{{ column.name }}` 24 | {% elif target.type == 'snowflake' %} 25 | "{{ column.name | upper }}" 26 | {% else %} 27 | "{{ column.name }}" 28 | {% endif %} 29 | {% else %} 30 | {{ column.name }} 31 | {% endif %} 32 | {% else %} 33 | {{ column.name }} 34 | {% endif %} 35 | {% endmacro %} -------------------------------------------------------------------------------- /integration_tests/ci/sample.profiles.yml: -------------------------------------------------------------------------------- 1 | # HEY! This file is used in the Fivetran Utils integrations tests with CircleCI. 2 | # You should __NEVER__ check credentials into version control. Thanks for reading :) 3 | 4 | config: 5 | send_anonymous_usage_stats: False 6 | use_colors: True 7 | 8 | integration_tests: 9 | target: snowflake 10 | outputs: 11 | redshift: 12 | type: redshift 13 | host: "{{ env_var('CI_REDSHIFT_DBT_HOST') }}" 14 | user: "{{ env_var('CI_REDSHIFT_DBT_USER') }}" 15 | pass: "{{ env_var('CI_REDSHIFT_DBT_PASS') }}" 16 | dbname: "{{ env_var('CI_REDSHIFT_DBT_DBNAME') }}" 17 | port: 5439 18 | schema: fivetran_utils_integration_tests 19 | threads: 8 20 | bigquery: 21 | type: bigquery 22 | method: service-account 23 | keyfile: "{{ env_var('GCLOUD_SERVICE_KEY_PATH') }}" 24 | project: 'dbt-package-testing' 25 | schema: fivetran_utils_integration_tests 26 | threads: 8 27 | snowflake: 28 | type: snowflake 29 | account: "{{ env_var('CI_SNOWFLAKE_DBT_ACCOUNT') }}" 30 | user: "{{ env_var('CI_SNOWFLAKE_DBT_USER') }}" 31 | password: "{{ env_var('CI_SNOWFLAKE_DBT_PASS') }}" 32 | role: "{{ env_var('CI_SNOWFLAKE_DBT_ROLE') }}" 33 | database: "{{ env_var('CI_SNOWFLAKE_DBT_DATABASE') }}" 34 | warehouse: "{{ env_var('CI_SNOWFLAKE_DBT_WAREHOUSE') }}" 35 | schema: fivetran_utils_integration_tests 36 | threads: 8 -------------------------------------------------------------------------------- /macros/union_data.sql: -------------------------------------------------------------------------------- 1 | {% macro union_data(table_identifier, database_variable, schema_variable, default_database, default_schema, default_variable) -%} 2 | 3 | {{ adapter.dispatch('union_data', packages = fivetran_utils._get_utils_namespaces()) (table_identifier, database_variable, schema_variable, default_database, default_schema, default_variable) }} 4 | 5 | {%- endmacro %} 6 | 7 | {% macro default__union_data(table_identifier, database_variable, schema_variable, default_database, default_schema, default_variable) %} 8 | 9 | {% if var('union_schemas', none) %} 10 | 11 | {% set relations = [] %} 12 | 13 | {% for schema in var('union_schemas') %} 14 | 15 | {% set relation=adapter.get_relation( 16 | database=var(database_variable, default_database), 17 | schema=schema, 18 | identifier=table_identifier 19 | ) -%} 20 | 21 | {% set relation_exists=relation is not none %} 22 | 23 | {% if relation_exists %} 24 | 25 | {% do relations.append(relation) %} 26 | 27 | {% endif %} 28 | 29 | {% endfor %} 30 | 31 | {{ dbt_utils.union_relations(relations) }} 32 | 33 | {% elif var('union_databases', none) %} 34 | 35 | {% set relations = [] %} 36 | 37 | {% for database in var('union_databases') %} 38 | 39 | {% set relation=adapter.get_relation( 40 | database=database, 41 | schema=var(schema_variable, default_schema), 42 | identifier=table_identifier 43 | ) -%} 44 | 45 | {% set relation_exists=relation is not none %} 46 | 47 | {% if relation_exists %} 48 | 49 | {% do relations.append(relation) %} 50 | 51 | {% endif %} 52 | 53 | {% endfor %} 54 | 55 | {{ dbt_utils.union_relations(relations) }} 56 | 57 | {% else %} 58 | 59 | select * 60 | from {{ var(default_variable) }} 61 | 62 | {% endif %} 63 | 64 | {% endmacro %} 65 | -------------------------------------------------------------------------------- /macros/get_columns_for_macro.sql: -------------------------------------------------------------------------------- 1 | {% macro default__get_columns_for_macro(table_name, schema_name, database_name=target.database) %} 2 | 3 | {% set query %} 4 | 5 | select 6 | concat( 7 | '{"name": "', 8 | lower(column_name), 9 | '", "datatype": ', 10 | case 11 | when lower(data_type) like '%timestamp%' then 'dbt_utils.type_timestamp()' 12 | when lower(data_type) = 'text' then 'dbt_utils.type_string()' 13 | when lower(data_type) = 'boolean' then '"boolean"' 14 | when lower(data_type) = 'number' then 'dbt_utils.type_numeric()' 15 | when lower(data_type) = 'float' then 'dbt_utils.type_float()' 16 | when lower(data_type) = 'date' then '"date"' 17 | end, 18 | '}') 19 | from {{ database_name }}.information_schema.columns 20 | where lower(table_name) = '{{ table_name }}' 21 | and lower(table_schema) = '{{ schema_name }}' 22 | order by 1 23 | 24 | {% endset %} 25 | 26 | {% set results = run_query(query) %} 27 | {% set results_list = results.columns[0].values() %}} 28 | 29 | {{ return(results_list) }} 30 | 31 | {% endmacro %} 32 | 33 | 34 | 35 | {% macro bigquery__get_columns_for_macro(table_name, schema_name, database_name=target.database) %} 36 | 37 | {% set query %} 38 | 39 | select 40 | concat( 41 | '{"name": "', 42 | lower(column_name), 43 | '", "datatype": ', 44 | case 45 | when lower(data_type) like '%timestamp%' then 'dbt_utils.type_timestamp()' 46 | when lower(data_type) = 'string' then 'dbt_utils.type_string()' 47 | when lower(data_type) = 'bool' then '"boolean"' 48 | when lower(data_type) = 'numeric' then 'dbt_utils.type_numeric()' 49 | when lower(data_type) = 'float64' then 'dbt_utils.type_float()' 50 | when lower(data_type) = 'int64' then 'dbt_utils.type_int()' 51 | when lower(data_type) = 'date' then '"date"' 52 | when lower(data_type) = 'datetime' then '"datetime"' 53 | end, 54 | '}') 55 | from `{{ database_name }}`.{{ schema_name }}.INFORMATION_SCHEMA.COLUMNS 56 | where lower(table_name) = '{{ table_name }}' 57 | and lower(table_schema) = '{{ schema_name }}' 58 | order by 1 59 | 60 | {% endset %} 61 | 62 | {% set results = run_query(query) %} 63 | {% set results_list = results.columns[0].values() %}} 64 | 65 | {{ return(results_list) }} 66 | 67 | {% endmacro %} 68 | 69 | 70 | 71 | {% macro get_columns_for_macro(table_name, schema_name, database_name) -%} 72 | {{ return(adapter.dispatch('get_columns_for_macro')(table_name, schema_name, database_name)) }} 73 | {%- endmacro %} -------------------------------------------------------------------------------- /macros/timestamp_diff.sql: -------------------------------------------------------------------------------- 1 | {% macro timestamp_diff(first_date, second_date, datepart) %} 2 | {{ adapter.dispatch('datediff', packages = fivetran_utils._get_utils_namespaces())(first_date, second_date, datepart) }} 3 | {% endmacro %} 4 | 5 | 6 | {% macro default__timestamp_diff(first_date, second_date, datepart) %} 7 | 8 | datediff( 9 | {{ datepart }}, 10 | {{ first_date }}, 11 | {{ second_date }} 12 | ) 13 | 14 | {% endmacro %} 15 | 16 | 17 | {% macro bigquery__timestamp_diff(first_date, second_date, datepart) %} 18 | 19 | timestamp_diff( 20 | {{second_date}}, 21 | {{first_date}}, 22 | {{datepart}} 23 | ) 24 | 25 | {% endmacro %} 26 | 27 | {% macro postgres__datediff(first_date, second_date, datepart) %} 28 | 29 | {% if datepart == 'year' %} 30 | (date_part('year', ({{second_date}})::date) - date_part('year', ({{first_date}})::date)) 31 | {% elif datepart == 'quarter' %} 32 | ({{ dbt_utils.datediff(first_date, second_date, 'year') }} * 4 + date_part('quarter', ({{second_date}})::date) - date_part('quarter', ({{first_date}})::date)) 33 | {% elif datepart == 'month' %} 34 | ({{ dbt_utils.datediff(first_date, second_date, 'year') }} * 12 + date_part('month', ({{second_date}})::date) - date_part('month', ({{first_date}})::date)) 35 | {% elif datepart == 'day' %} 36 | (({{second_date}})::date - ({{first_date}})::date) 37 | {% elif datepart == 'week' %} 38 | ({{ dbt_utils.datediff(first_date, second_date, 'day') }} / 7 + case 39 | when date_part('dow', ({{first_date}})::timestamp) <= date_part('dow', ({{second_date}})::timestamp) then 40 | case when {{first_date}} <= {{second_date}} then 0 else -1 end 41 | else 42 | case when {{first_date}} <= {{second_date}} then 1 else 0 end 43 | end) 44 | {% elif datepart == 'hour' %} 45 | ({{ dbt_utils.datediff(first_date, second_date, 'day') }} * 24 + date_part('hour', ({{second_date}})::timestamp) - date_part('hour', ({{first_date}})::timestamp)) 46 | {% elif datepart == 'minute' %} 47 | ({{ dbt_utils.datediff(first_date, second_date, 'hour') }} * 60 + date_part('minute', ({{second_date}})::timestamp) - date_part('minute', ({{first_date}})::timestamp)) 48 | {% elif datepart == 'second' %} 49 | ({{ dbt_utils.datediff(first_date, second_date, 'minute') }} * 60 + floor(date_part('second', ({{second_date}})::timestamp)) - floor(date_part('second', ({{first_date}})::timestamp))) 50 | {% elif datepart == 'millisecond' %} 51 | ({{ dbt_utils.datediff(first_date, second_date, 'minute') }} * 60000 + floor(date_part('millisecond', ({{second_date}})::timestamp)) - floor(date_part('millisecond', ({{first_date}})::timestamp))) 52 | {% elif datepart == 'microsecond' %} 53 | ({{ dbt_utils.datediff(first_date, second_date, 'minute') }} * 60000000 + floor(date_part('microsecond', ({{second_date}})::timestamp)) - floor(date_part('microsecond', ({{first_date}})::timestamp))) 54 | {% else %} 55 | {{ exceptions.raise_compiler_error("Unsupported datepart for macro datediff in postgres: {!r}".format(datepart)) }} 56 | {% endif %} 57 | 58 | {% endmacro %} -------------------------------------------------------------------------------- /macros/union_relations.sql: -------------------------------------------------------------------------------- 1 | {%- macro union_relations(relations, aliases=none, column_override=none, include=[], exclude=[], source_column_name=none) -%} 2 | 3 | {%- if exclude and include -%} 4 | {{ exceptions.raise_compiler_error("Both an exclude and include list were provided to the `union` macro. Only one is allowed") }} 5 | {%- endif -%} 6 | 7 | {#-- Prevent querying of db in parsing mode. This works because this macro does not create any new refs. -#} 8 | {%- if not execute %} 9 | {{ return('') }} 10 | {% endif -%} 11 | 12 | {%- set column_override = column_override if column_override is not none else {} -%} 13 | {%- set source_column_name = source_column_name if source_column_name is not none else '_dbt_source_relation' -%} 14 | 15 | {%- set relation_columns = {} -%} 16 | {%- set column_superset = {} -%} 17 | 18 | {%- for relation in relations -%} 19 | 20 | {%- do relation_columns.update({relation: []}) -%} 21 | 22 | {%- do dbt_utils._is_relation(relation, 'union_relations') -%} 23 | {%- set cols = adapter.get_columns_in_relation(relation) -%} 24 | {%- for col in cols -%} 25 | 26 | {#- If an exclude list was provided and the column is in the list, do nothing -#} 27 | {%- if exclude and col.column in exclude -%} 28 | 29 | {#- If an include list was provided and the column is not in the list, do nothing -#} 30 | {%- elif include and col.column not in include -%} 31 | 32 | {#- Otherwise add the column to the column superset -#} 33 | {%- else -%} 34 | 35 | {#- update the list of columns in this relation -#} 36 | {%- do relation_columns[relation].append(col.column) -%} 37 | 38 | {%- if col.column in column_superset -%} 39 | 40 | {%- set stored = column_superset[col.column] -%} 41 | {%- if col.is_string() and stored.is_string() and col.string_size() > stored.string_size() -%} 42 | 43 | {%- do column_superset.update({col.column: col}) -%} 44 | 45 | {%- endif %} 46 | 47 | {%- else -%} 48 | 49 | {%- do column_superset.update({col.column: col}) -%} 50 | 51 | {%- endif -%} 52 | 53 | {%- endif -%} 54 | 55 | {%- endfor -%} 56 | {%- endfor -%} 57 | 58 | {%- set ordered_column_names = column_superset.keys() -%} 59 | 60 | {%- for relation in relations %} 61 | 62 | ( 63 | select 64 | 65 | cast({{ dbt_utils.string_literal(relation) }} as {{ dbt_utils.type_string() }}) as {{ source_column_name }}, 66 | {% for col_name in ordered_column_names -%} 67 | 68 | {%- set col = column_superset[col_name] %} 69 | {%- set col_type = column_override.get(col.column, col.data_type) %} 70 | {%- set col_name = adapter.quote(col_name) if col_name in relation_columns[relation] else 'null' %} 71 | cast({{ col_name }} as {{ col_type }}) as {{ col.quoted }} {% if not loop.last %},{% endif -%} 72 | 73 | {%- endfor %} 74 | 75 | from {{ aliases[loop.index0] if aliases else relation }} 76 | ) 77 | 78 | {% if not loop.last -%} 79 | union all 80 | {% endif -%} 81 | 82 | {%- endfor -%} 83 | 84 | {%- endmacro -%} 85 | 86 | {%- macro union_tables(tables, column_override=none, include=[], exclude=[], source_column_name='_dbt_source_table') -%} 87 | 88 | {%- do exceptions.warn("Warning: the `union_tables` macro is no longer supported and will be deprecated in a future release of dbt-utils. Use the `union_relations` macro instead") -%} 89 | 90 | {{ return(dbt_utils.union_relations(tables, column_override, include, exclude, source_column_name)) }} 91 | 92 | {%- endmacro -%} -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Fivetran Utilities for dbt 2 | 3 | > ## ⚠️ Warning! ⚠️ 4 | > 5 | > You are viewing a deprecated branch, which is no longer maintained and not recommended for use. This branch remains the default branch to prevent breaking changes to any existing dbt projects that use this package. 6 | > 7 | > To view the current release branch, please refer to the [release list](https://github.com/fivetran/dbt_fivetran_utils/releases). 8 | 9 | This package includes macros that are used in Fivetran's dbt packages. 10 | 11 | ## Macros 12 | ### _get_utils_namespaces ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/_get_utils_namespaces.sql)) 13 | This macro allows for namespacing macros throughout a dbt project. The macro currently consists of two namespaces: 14 | - `dbt_utils` 15 | - `fivetran_utils` 16 | 17 | ---- 18 | ### add_pass_through_columns ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/add_pass_through_columns.sql)) 19 | This macro creates the proper name, datatype, and aliasing for user defined pass through column variable. This 20 | macro allows for pass through variables to be more dynamic and allow users to alias custom fields they are 21 | bringing in. This macro is typically used within staging models of a fivetran dbt source package to pass through 22 | user defined custom fields. 23 | 24 | **Usage:** 25 | ```sql 26 | {{ fivetran_utils.add_pass_through_columns(base_columns=columns, pass_through_var=var('hubspot__deal_pass_through_columns')) }} 27 | ``` 28 | **Args:** 29 | * `base_columns` (required): The name of the variable where the base columns are contained. This is typically `columns`. 30 | * `pass_through_var` (required): The variable which contains the user defined pass through fields. 31 | 32 | ---- 33 | ### array_agg ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/array_agg.sql)) 34 | This macro allows for cross database field aggregation. The macro contains the database specific field aggregation function for 35 | BigQuery, Snowflake, Redshift, and Postgres. By default a comma `,` is used as a delimiter in the aggregation. 36 | 37 | **Usage:** 38 | ```sql 39 | {{ fivetran_utils.array_agg(field_to_agg="teams") }} 40 | ``` 41 | **Args:** 42 | * `field_to_agg` (required): Field within the table you are wishing to aggregate. 43 | 44 | ---- 45 | ### ceiling ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/ceiling.sql)) 46 | This macro allows for cross database use of the ceiling function. The ceiling function returns the smallest integer greater 47 | than, or equal to, the specified numeric expression. The ceiling macro is compatible with BigQuery, Redshift, Postgres, and Snowflake. 48 | 49 | **Usage:** 50 | ```sql 51 | {{ fivetran_utils.ceiling(num="target/total_days") }} 52 | ``` 53 | **Args:** 54 | * `num` (required): The integer field you wish to apply the ceiling function. 55 | 56 | ---- 57 | ### collect_freshness ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/collect_freshness.sql)) 58 | This macro overrides dbt's default [`collect_freshness` macro](https://github.com/fishtown-analytics/dbt/blob/0.19.latest/core/dbt/include/global_project/macros/adapters/common.sql#L257-L273) that is called when running `dbt source snapshot-freshness`. It allows you to incorporate model enabling/disabling variables into freshness tests, so that, if a source table does not exist, dbt will not run (and error on) a freshness test on the table. **Any package that has a dependency on fivetran_utils will use this version of the macro. If no `meta.is_enabled` field is provided, the `collect_freshness` should run exactly like dbt's default version.** 59 | 60 | **Usage:** 61 | ```yml 62 | # in the sources.yml 63 | sources: 64 | - name: source_name 65 | freshness: 66 | warn_after: {count: 84, period: hour} 67 | error_after: {count: 168, period: hour} 68 | tables: 69 | - name: table_that_might_not_exist 70 | meta: 71 | is_enabled: "{{ var('package__using_this_table', true) }}" 72 | ``` 73 | **Args (sorta):** 74 | * `meta.is_enabled` (optional): The variable(s) you would like to reference to determine if dbt should include this table in freshness tests. 75 | 76 | ---- 77 | ### dummy_coalesce_value ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/dummy_coalesce_value.sql)) 78 | This macro creates a dummy coalesce value based on the data type of the field. See below for the respective data type and dummy values: 79 | - String = 'DUMMY_STRING' 80 | - Boolean = null 81 | - Int = 999999999 82 | - Float = 999999999.99 83 | - Timestamp = cast("2099-12-31" as timestamp) 84 | - Date = cast("2099-12-31" as date) 85 | **Usage:** 86 | ```sql 87 | {{ fivetran_utils.dummy_coalesce_value(column="user_rank") }} 88 | ``` 89 | **Args:** 90 | * `column` (required): Field you are applying the dummy coalesce. 91 | 92 | ---- 93 | ### empty_variable_warning ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/empty_variable_warning.sql)) 94 | This macro checks a declared variable and returns an error message if the variable is empty before running the models within the `dbt_project.yml` file. 95 | 96 | **Usage:** 97 | ```yml 98 | on-run-start: '{{ fivetran_utils.empty_variable_warning(variable="ticket_field_history_columns", downstream_model="zendesk_ticket_field_history") }}' 99 | ``` 100 | **Args:** 101 | * `variable` (required): The variable you want to check if it is empty. 102 | * `downstream_model` (required): The downstream model that is affected if the variable is empty. 103 | 104 | ---- 105 | ### enabled_vars_one_true ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/enabled_vars_one_true.sql)) 106 | This macro references a set of specified boolean variable and returns `true` if any variable value is equal to true. 107 | 108 | **Usage:** 109 | ```sql 110 | {{ fivetran_utils.enabled_vars_one_true(vars=["using_department_table", "using_customer_table"]) }} 111 | ``` 112 | **Args:** 113 | * `vars` (required): Variable(s) you are referencing to return the declared variable value. 114 | 115 | ---- 116 | 117 | ### enabled_vars ([source](https://github.com/fivetran/dbt_fivetran_utils/edit/master/macros/enabled_vars.sql)) 118 | This macro references a set of specified boolean variable and returns `false` if any variable value is equal to false. 119 | 120 | **Usage:** 121 | ```sql 122 | {{ fivetran_utils.enabled_vars(vars=["using_department_table", "using_customer_table"]) }} 123 | ``` 124 | **Args:** 125 | * `vars` (required): Variable you are referencing to return the declared variable value. 126 | 127 | ---- 128 | 129 | ### fill_pass_through_columns ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/fill_pass_through_columns.sql)) 130 | This macro is used to generate the correct sql for package staging models for user defined pass through columns. 131 | 132 | **Usage:** 133 | ```sql 134 | {{ fivetran_utils.fill_pass_through_columns(pass_through_variable='hubspot__contact_pass_through_columns') }} 135 | ``` 136 | **Args:** 137 | * `pass_through_variable` (required): Name of the variable which contains the respective pass through fields for the staging model. 138 | 139 | ---- 140 | ### fill_staging_columns ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/fill_staging_columns.sql)) 141 | This macro is used to generate the correct SQL for package staging models. It takes a list of columns that are expected/needed (`staging_columns`) 142 | and compares it with columns in the source (`source_columns`). 143 | 144 | **Usage:** 145 | ```sql 146 | select 147 | 148 | {{ 149 | fivetran_utils.fill_staging_columns( 150 | source_columns=adapter.get_columns_in_relation(ref('stg_twitter_ads__account_history_tmp')), 151 | staging_columns=get_account_history_columns() 152 | ) 153 | }} 154 | 155 | from source 156 | ``` 157 | **Args:** 158 | * `source_columns` (required): Will call the [get_columns_in_relation](https://docs.getdbt.com/reference/dbt-jinja-functions/adapter/#get_columns_in_relation) macro as well requires a `ref()` or `source()` argument for the staging models within the `_tmp` directory. 159 | * `staging_columns` (required): Created as a result of running the [generate_columns_macro](https://github.com/fivetran/dbt_fivetran_utils#generate_columns_macro-source) for the respective table. 160 | 161 | ---- 162 | ### first_value ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/first_value.sql)) 163 | This macro returns the value_expression for the first row in the current window frame with cross db functionality. This macro ignores null values. The default first_value calculation within the macro is the `first_value` function. The Redshift first_value calculation is the `first_value` function, with the inclusion of a frame_clause `{{ partition_field }} rows unbounded preceding`. 164 | 165 | **Usage:** 166 | ```sql 167 | {{ fivetran_utils.first_value(first_value_field="created_at", partition_field="id", order_by_field="created_at", order="asc") }} 168 | ``` 169 | **Args:** 170 | * `first_value_field` (required): The value expression which you want to determine the first value for. 171 | * `partition_field` (required): Name of the field you want to partition by to determine the first_value. 172 | * `order_by_field` (required): Name of the field you wish to sort on to determine the first_value. 173 | * `order` (optional): The order of which you want to partition the window frame. The order argument by default is `asc`. If you wish to get the last_value, you may change the argument to `desc`. 174 | 175 | ---- 176 | ### generate_columns_macro ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/generate_columns_macro.sql)) 177 | This macro is used to generate the macro used as an argument within the [fill_staging_columns](https://github.com/fivetran/dbt_fivetran_utils#fill_staging_columns-source) macro which will list all the expected columns within a respective table. The macro output will contain `name` and `datatype`; however, you may add an optional argument for `alias` if you wish to rename the column within the macro. 178 | 179 | The macro should be run using dbt's `run-operation` functionality, as used below. It will print out the macro text, which can be copied and pasted into the relevant `macro` directory file within the package. 180 | 181 | **Usage:** 182 | ``` 183 | dbt run-operation fivetran_utils.generate_columns_macro --args '{"table_name": "promoted_tweet_report", "schema_name": "twitter_ads", "database_name": "dbt-package-testing"}' 184 | ``` 185 | **Output:** 186 | ```sql 187 | {% macro get_admin_columns() %} 188 | 189 | {% set columns = [ 190 | {"name": "email", "datatype": dbt_utils.type_string()}, 191 | {"name": "id", "datatype": dbt_utils.type_string(), "alias": "admin_id"}, 192 | {"name": "job_title", "datatype": dbt_utils.type_string()}, 193 | {"name": "name", "datatype": dbt_utils.type_string()}, 194 | {"name": "_fivetran_deleted", "datatype": "boolean"}, 195 | {"name": "_fivetran_synced", "datatype": dbt_utils.type_timestamp()} 196 | ] %} 197 | 198 | {{ return(columns) }} 199 | 200 | {% endmacro %} 201 | ``` 202 | **Args:** 203 | * `table_name` (required): Name of the schema which the table you are running the macro for resides in. 204 | * `schema_name` (required): Name of the schema which the table you are running the macro for resides in. 205 | * `database_name` (optional): Name of the database which the table you are running the macro for resides in. If empty, the macro will default this value to `target.database`. 206 | 207 | ---- 208 | ### get_columns_for_macro ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/get_columns_for_macro.sql)) 209 | This macro returns all column names and datatypes for a specified table within a database and is used as part of the [generate_columns_macro](macros/generate_columns_macro.sql). 210 | 211 | **Usage:** 212 | ```sql 213 | {{ fivetran_utils.get_columns_for_macro(table_name="team", schema_name="my_teams", database_name="my_database") }} 214 | ``` 215 | **Args:** 216 | * `table_name` (required): Name of the table you are wanting to return column names and datatypes. 217 | * `schema_name` (required): Name of the schema where the above mentioned table resides. 218 | * `database_name` (optional): Name of the database where the above mentioned schema and table reside. By default this will be your target.database. 219 | 220 | ---- 221 | ### json_extract ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/json_extract.sql)) 222 | This macro allows for cross database use of the json extract function. The json extract allows the return of data from a json object. 223 | The data is returned by the path you provide as the argument. The json_extract macro is compatible with BigQuery, Redshift, Postgres, and Snowflake. 224 | 225 | **Usage:** 226 | ```sql 227 | {{ fivetran_utils.json_extract(string="value", string_path="in_business_hours") }} 228 | ``` 229 | **Args:** 230 | * `string` (required): Name of the field which contains the json object. 231 | * `string_path` (required): Name of the path in the json object which you want to extract the data from. 232 | 233 | ---- 234 | ### json_parse ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/json_parse.sql)) 235 | This macro allows for cross database use of the json extract function, specifically used to parse and extract a nested value from a json object. 236 | The data is returned by the path you provide as the list within the `string_path` argument. The json_parse macro is compatible with BigQuery, Redshift, Postgres, Snowflake and Databricks. 237 | 238 | **Usage:** 239 | ```sql 240 | {{ fivetran_utils.json_parse(string="receipt", string_path=["charges","data",0,"balance_transaction","exchange_rate"]) }} 241 | ``` 242 | **Args:** 243 | * `string` (required): Name of the field which contains the json object. 244 | * `string_path` (required): List of item(s) that derive the path in the json object which you want to extract the data from. 245 | 246 | ---- 247 | ### pivot_json_extract ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/pivot_json_extract.sql)) 248 | This macro builds off of the `json_extract` macro in order to extract a list of fields from a json object and pivot the fields out into columns. The `pivot_json_extract` macro is compatible with BigQuery, Redshift, Postgres, and Snowflake. 249 | 250 | **Usage:** 251 | ```sql 252 | {{ fivetran_utils.pivot_json_extract(string="json_value", list_of_properties=["field 1", "field 2"]) }} 253 | ``` 254 | **Args:** 255 | * `string` (required): Name of the field which contains the json object. 256 | * `list_of_properties` (required): List of the fields that you want to extract from the json object and pivot out into columns. Any spaces will be replaced by underscores in column names. 257 | 258 | ---- 259 | ### max_bool ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/max_bool.sql)) 260 | This macro allows for cross database use of obtaining the max boolean value of a field. This macro recognizes true = 1 and false = 0. The macro will aggregate the boolean_field and return the max boolean value. The max_bool macro is compatible with BigQuery, Redshift, Postgres, and Snowflake. 261 | 262 | **Usage:** 263 | ```sql 264 | {{ fivetran_utils.max_bool(boolean_field="is_breach") }} 265 | ``` 266 | **Args:** 267 | * `boolean_field` (required): Name of the field you are obtaining the max boolean record from. 268 | 269 | ---- 270 | ### percentile ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/percentile.sql)) 271 | This macro is used to return the designated percentile of a field with cross db functionality. The percentile function stems from percentile_cont across db's. For Snowflake and Redshift this macro uses the window function opposed to the aggregate for percentile. 272 | 273 | **Usage:** 274 | ```sql 275 | {{ fivetran_utils.percentile(percentile_field='time_to_close', partition_field='id', percent='0.5') }} 276 | ``` 277 | **Args:** 278 | * `percentile_field` (required): Name of the field of which you are determining the desired percentile. 279 | * `partition_field` (required): Name of the field you want to partition by to determine the designated percentile. 280 | * `percent` (required): The percent necessary for `percentile_cont` to determine the percentile. If you want to find the median, you will input `0.5` for the percent. 281 | 282 | ---- 283 | ### remove_prefix_from_columns ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/remove_prefix_from_columns.sql)) 284 | This macro removes desired prefixes from specified columns. Additionally, a for loop is utilized which allows for adding multiple columns to remove prefixes. 285 | 286 | **Usage:** 287 | ```sql 288 | {{ fivetran_utils.remove_prefix_from_columns(columns="names", prefix='', exclude=[]) }} 289 | ``` 290 | **Args:** 291 | * `columns` (required): The desired columns you wish to remove prefixes. 292 | * `prefix` (optional): The prefix the macro will search for and remove. By default the prefix = ''. 293 | * `exclude` (optional): The columns you wish to exclude from this macro. By default no columns are excluded. 294 | 295 | ---- 296 | ### snowflake_seed_data ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/snowflake_seed_data.sql)) 297 | This macro is intended to be used when a source table column is a reserved keyword in Snowflake, and Circle CI is throwing a fit. 298 | It simply chooses which version of the data to seed (the Snowflake copy should capitalize and put three pairs of quotes around the problematic column). 299 | 300 | ***Usage:** 301 | ```yml 302 | # in integration_tests/dbt_project.yml 303 | vars: 304 | table_name: "{{ fivetran_utils.snowflake_seed_data(seed_name='user_data') }}" 305 | ``` 306 | **Args:** 307 | * `seed_name` (required): Name of the seed that has separate snowflake seed data. 308 | 309 | ---- 310 | ### seed_data_helper ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/seed_data_helper.sql)) 311 | This macro is intended to be used when a source table column is a reserved keyword in a warehouse, and Circle CI is throwing a fit. 312 | It simply chooses which version of the data to seed. Also note the `warehouses` argument is a list and multiple warehouses may be added based on the number of warehouse 313 | specific seed data files you need for integration testing. 314 | 315 | ***Usage:** 316 | ```yml 317 | # in integration_tests/dbt_project.yml 318 | vars: 319 | table_name: "{{ fivetran_utils.seed_data_helper(seed_name='user_data', warehouses=['snowflake', 'postgres']) }}" 320 | ``` 321 | **Args:** 322 | * `seed_name` (required): Name of the seed that has separate postgres seed data. 323 | * `warehouses` (required): List of the warehouses for which you want CircleCi to use the helper seed data. 324 | 325 | ---- 326 | ### staging_models_automation ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/staging_models_automation.sql)) 327 | This macro is intended to be used as a `run-operation` when generating Fivetran dbt source package staging models/macros. This macro will receive user input to create all necessary ([bash commands](columns_setup.sh)) appended with `&&` so they may all be ran at once. The output of this macro within the CLI will then be copied and pasted as a command to generate the staging models/macros. 328 | **Usage:** 329 | ```bash 330 | dbt run-operation staging_models_automation --args '{package: asana, source_schema: asana_source, source_database: database-source-name, tables: ["user","tag"]}' 331 | ``` 332 | **CLI Output:** 333 | ```bash 334 | source dbt_modules/fivetran_utils/columns_setup.sh '../dbt_asana_source' stg_asana dbt-package-testing asana_2 user && 335 | source dbt_modules/fivetran_utils/columns_setup.sh '../dbt_asana_source' stg_asana dbt-package-testing asana_2 tag 336 | ``` 337 | **Args:** 338 | * `package` (required): Name of the package for which you are creating staging models/macros. 339 | * `source_schema` (required): Name of the source_schema from which the bash command will query. 340 | * `source_database` (required): Name of the source_database from which the bash command will query. 341 | * `tables` (required): List of the tables for which you want to create staging models/macros. 342 | 343 | ---- 344 | ### string_agg ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/string_agg.sql)) 345 | This macro allows for cross database field aggregation and delimiter customization. Supported database specific field aggregation functions include 346 | BigQuery, Snowflake, Redshift. 347 | 348 | **Usage:** 349 | ```sql 350 | {{ fivetran_utils.string_agg(field_to_agg="issues_opened", delimiter='|') }} 351 | ``` 352 | **Args:** 353 | * `field_to_agg` (required): Field within the table you are wishing to aggregate. 354 | * `delimiter` (required): Character you want to be used as the delimiter between aggregates. 355 | ---- 356 | ### timestamp_add ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/timestamp_add.sql)) 357 | This macro allows for cross database addition of a timestamp field and a specified datepart and interval for BigQuery, Redshift, Postgres, and Snowflake. 358 | 359 | **Usage:** 360 | ```sql 361 | {{ fivetran_utils.timestamp_add(datepart="day", interval="1", from_timestamp="last_ticket_timestamp") }} 362 | ``` 363 | **Args:** 364 | * `datepart` (required): The datepart you are adding to the timestamp field. 365 | * `interval` (required): The interval in relation to the datepart you are adding to the timestamp field. 366 | * `from_timestamp` (required): The timestamp field you are adding the datepart and interval. 367 | 368 | ---- 369 | ### timestamp_diff ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/timestamp_diff.sql)) 370 | This macro allows for cross database timestamp difference calculation for BigQuery, Redshift, Postgres, and Snowflake. 371 | 372 | **Usage:** 373 | ```sql 374 | {{ fivetran_utils.timestamp_diff(first_date="first_ticket_timestamp", second_date="last_ticket_timestamp", datepart="day") }} 375 | ``` 376 | **Args:** 377 | * `first_date` (required): The first timestamp field for the difference calculation. 378 | * `second_date` (required): The second timestamp field for the difference calculation. 379 | * `datepart` (required): The date part applied to the timestamp difference calculation. 380 | 381 | ---- 382 | ### union_relations ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/union_relations.sql)) 383 | This macro unions together an array of [Relations](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation), 384 | even when columns have differing orders in each Relation, and/or some columns are 385 | missing from some relations. Any columns exclusive to a subset of these 386 | relations will be filled with `null` where not present. An new column 387 | (`_dbt_source_relation`) is also added to indicate the source for each record. 388 | 389 | **Usage:** 390 | ```sql 391 | {{ dbt_utils.union_relations( 392 | relations=[ref('my_model'), source('my_source', 'my_table')], 393 | exclude=["_loaded_at"] 394 | ) }} 395 | ``` 396 | **Args:** 397 | * `relations` (required): An array of [Relations](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation). 398 | * `aliases` (optional): An override of the relation identifier. This argument should be populated with the overwritten alias for the relation. If not populated `relations` will be the default. 399 | * `exclude` (optional): A list of column names that should be excluded from the final query. 400 | * `include` (optional): A list of column names that should be included in the final query. Note the `include` and `exclude` parameters are mutually exclusive. 401 | * `column_override` (optional): A dictionary of explicit column type overrides, e.g. `{"some_field": "varchar(100)"}`.`` 402 | * `source_column_name` (optional): The name of the column that records the source of this row. By default this argument is set to `none`. 403 | 404 | ---- 405 | ### union_data ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/union_data.sql)) 406 | This macro unions together tables of the same structure so that users can treat data from multiple connectors as the 'source' to a package. 407 | Depending on which macros are set, it will either look for schemas of the same name across multiple databases, or schemas with different names in the same database. 408 | 409 | If the `var` with the name of the `schema_variable` argument is set, the macro will union the `table_identifier` tables from each respective schema within the target database (or source database if set by a variable). 410 | If the `var` with the name of the `database_variable` argument is set, the macro will union the `table_identifier` tables from the source schema in each respective database. 411 | 412 | When using this functionality, every `_tmp` table should use this macro as described below. 413 | 414 | **Usage:** 415 | ```sql 416 | {{ 417 | fivetran_utils.union_data( 418 | table_identifier='customer', 419 | database_variable='shopify_database', 420 | schema_variable='shopify_schema', 421 | default_database=target.database, 422 | default_schema='shopify', 423 | default_variable='customer_source' 424 | ) 425 | }} 426 | ``` 427 | **Args:** 428 | * `table_identifier`: The name of the table that will be unioned. 429 | * `database_variable`: The name of the variable that users can populate to union data from multiple databases. 430 | * `schema_variable`: The name of the variable that users can populate to union data from multiple schemas. 431 | * `default_database`: The default database where source data should be found. This is used when unioning schemas. 432 | * `default_schema`: The default schema where source data should be found. This is used when unioning databases. 433 | * `default_variable`: The name of the variable that users should populate when they want to pass one specific relation to this model (mostly used for CI) 434 | 435 | ---- 436 | ### source_relation ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/macros/source_relation.sql)) 437 | This macro creates a new column that signifies with database/schema a record came from when using the `union_data` macro above. 438 | It should be added to all non-tmp staging models when using the `union_data` macro. 439 | 440 | **Usage:** 441 | ```sql 442 | {{ fivetran_utils.source_relation() }} 443 | ``` 444 | 445 | ## Bash Scripts 446 | ### columns_setup.sh ([source](https://github.com/fivetran/dbt_fivetran_utils/blob/master/columns_setup.sh)) 447 | 448 | This bash file can be used to setup or update packages to use the `fill_staging_columns` macro above. The bash script does the following three things: 449 | 450 | * Creates a `.sql` file in the `macros` directory for a source table and fills it with all the columns from the table. 451 | * Be sure your `dbt_project.yml` file does not contain any **Warnings** or **Errors**. If warnings or errors are present, the messages from the terminal will be printed above the macro within the `.sql` file in the `macros` directory. 452 | * Creates a `..._tmp.sql` file in the `models/tmp` directory and fills it with a `select * from {{ var('table_name') }}` where `table_name` is the name of the source table. 453 | * Creates or updates a `.sql` file in the `models` directory and fills it with the filled out version of the `fill_staging_columns` macro as shown above. You can then write whatever SQL you want around the macro to finishing off the staging file. 454 | 455 | The usage is as follows, assuming you are executing via a `zsh` terminal and in a dbt project directory that has already imported this repo as a dependency: 456 | ```bash 457 | source dbt_modules/fivetran_utils/columns_setup.sh "path/to/directory" file_prefix database_name schema_name table_name 458 | ``` 459 | 460 | As an example, assuming we are in a dbt project in an adjacent folder to `dbt_marketo_source`: 461 | ```bash 462 | source dbt_modules/fivetran_utils/columns_setup.sh "../dbt_marketo_source" stg_marketo "digital-arbor-400" marketo_v3 deleted_program_membership 463 | ``` 464 | 465 | In that example, it will: 466 | * Create a `get_deleted_program_membership_columns.sql` file in the `macros` directory, with the necessary macro within it. 467 | * Create a `stg_marketo__deleted_program_membership_tmp.sql` file in the `models/tmp` directory, with `select * from {{ var('deleted_program_membership') }}` in it. 468 | * Create or update a `stg_marketo__deleted_program_membership.sql` file in the `models` directory with the pre-filled out `fill_staging_columns` macro. 469 | --------------------------------------------------------------------------------