├── 1_data_integration_elt
    └── README.md
├── 2_dwh_basics
    └── README.md
├── 3_dbt_deep_dive
    └── README.md
├── 4_business_intelligence
    └── README.md
├── 5_advanced_toolkit
    └── README.md
├── 6_capstone_project
    └── README.md
└── README.md


/1_data_integration_elt/README.md:
--------------------------------------------------------------------------------
# Data Integration: ELT

## Databases for Analytics: key principles

- Massively Parallel Processing
- Columnar storage and compression
- Data layout: clustering, partitioning, sorting

## Handling Data Sources

- Overview of a company's data sources
- Addressing the key properties of data: volume, frequency, schema evolution
- Data formats: CSV, JSON, Avro, Parquet, ORC

## Developer tools 101: IDE, Terminal, Docker, Codespaces, Terraform

- Setting up a convenient, modern development environment
- Version control, shell, and container basics
- Interacting with cloud providers and deploying infrastructure

## 🚀 Lab: [Setting up Airbyte Data Pipelines](https://github.com/kzzzr/airbyte_lab)

- Understanding the ETL to ELT transition
- SaaS (Fivetran, Stitch, Hevo) vs. self-managed (Airbyte, Meltano, Singer, NiFi)
- Setting up integration pipelines with Airbyte
- Sync modes: Incremental, Full-load, Deduped history

--------------------------------------------------------------------------------
/2_dwh_basics/README.md:
--------------------------------------------------------------------------------
# DWH Basics

## DWH modeling basics

## Getting familiar with dbt (data build tool)

## 🚀 Lab: [DWH powered by ClickHouse and dbt](https://github.com/kzzzr/dbt_clickhouse_lab)

## Data Testing and Documenting

--------------------------------------------------------------------------------
/3_dbt_deep_dive/README.md:
--------------------------------------------------------------------------------
# dbt Deep Dive

## Enhancing dbt experience

## Enhancing dbt code

## Deployment and Orchestration

## 🚀 Lab: [Data Vault powered by dbtVault and Greenplum](https://github.com/kzzzr/dbtvault_greenplum_demo)

--------------------------------------------------------------------------------
/4_business_intelligence/README.md:
--------------------------------------------------------------------------------
# Business Intelligence

## BI tools overview

## SQL for Analytics: common patterns

## Semantic Layer with dbt Metrics

## Applied Analytics: Segments, Funnels, Cohorts, GEO, RFM

## 🚀 Lab: [Semantic layer with dbt Metrics and Cube for myBI market](https://github.com/kzzzr/mybi-dbt-showcase)

--------------------------------------------------------------------------------
/5_advanced_toolkit/README.md:
--------------------------------------------------------------------------------
# Advanced Toolkit

## 🚀 Lab: [Configuring Slim CI for myBI market](https://github.com/kzzzr/mybi-dbt-showcase)

## External and Semi-structured data

## Monitoring, Metadata, Performance tuning

## MLOps + DataOps

--------------------------------------------------------------------------------
/6_capstone_project/README.md:
--------------------------------------------------------------------------------
# Capstone project

## Capstone project intro

## Case studies

## 🚀 Presenting your Capstone projects

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Analytics Engineer

## Labs

- [Setting up Airbyte Data Pipelines](https://github.com/kzzzr/airbyte_lab)
- [DWH powered by ClickHouse and dbt](https://github.com/kzzzr/dbt_clickhouse_lab)
- [Data Vault powered by dbtVault and Greenplum](https://github.com/kzzzr/dbtvault_greenplum_demo)
- [Semantic layer with dbt Metrics and Cube for myBI market](https://github.com/kzzzr/mybi-dbt-showcase)
- [Configuring Slim CI for myBI market](https://github.com/kzzzr/mybi-dbt-showcase)

## Talks

- (YouTube) 2023-04 [Semantic Layer for Key Metrics Analytics: dbt Metrics vs. Cube](https://www.youtube.com/watch?v=lb20Rrezszc)
- (YouTube) 2022-11 [Analyzing Development Team Productivity with GitHub Data](https://www.youtube.com/watch?v=Y_xGZzI5sNI)
- (YouTube) 2022-10 [Working with Geo Data in a DWH: Coordinates, Zones, Aggregation](https://www.youtube.com/watch?v=IS5PIOhXLdk)
- (YouTube) 2022-05 [Extract-Load as a Service vs. a Self-Built Solution: Finding Balance and Zen. Day 1](https://www.youtube.com/watch?v=CR32LFCgtGE)
- (YouTube) 2022-05 [Extract-Load as a Service vs. a Self-Built Solution: Finding Balance and Zen. Day 2](https://www.youtube.com/watch?v=dSm2kDsRhOI)
- (YouTube) 2022-04 [SQL for Analytics: Applied Problems and Approaches to Solving Them](https://www.youtube.com/watch?v=UIJjXBVWONo)
- (YouTube) 2022-04 [An End-to-End Analytics Solution, Using the MaestroQA Source as an Example](https://www.youtube.com/watch?v=ImchI3LeHeg)
- (YouTube) 2021-11 [Semi-structured Data in Analytical Warehouses: Nested JSON + Arrays](https://www.youtube.com/watch?v=lBwmLnMwfl0)
- (YouTube) 2021-07 [Configuring Slim CI: Lightweight Integration Tests for the Data Warehouse](https://www.youtube.com/watch?v=yfMWiyKpUkQ)
- (YouTube) 2021-06 [Best Practices for Data Quality](https://www.youtube.com/watch?v=j5V36kztvEI)

## [Data Integration: ELT](./1_data_integration_elt/README.md)

- Databases for Analytics: key principles
- Handling Data Sources
- Developer tools 101: IDE, Terminal, Docker, Codespaces, Terraform
- 🚀 Lab: [Setting up Airbyte Data Pipelines](https://github.com/kzzzr/airbyte_lab)

## [DWH Basics](./2_dwh_basics/README.md)

- DWH modeling basics
- Getting familiar with dbt (data build tool)
- 🚀 Lab: [DWH powered by ClickHouse and dbt](https://github.com/kzzzr/dbt_clickhouse_lab)
- Data Testing and Documenting

## [dbt Deep Dive](./3_dbt_deep_dive/README.md)

- Enhancing dbt experience
- Enhancing dbt code
- Deployment and Orchestration
- 🚀 Lab: [Data Vault powered by dbtVault and Greenplum](https://github.com/kzzzr/dbtvault_greenplum_demo)

## [Business Intelligence](./4_business_intelligence/README.md)

- BI tools overview
- SQL for Analytics: common patterns
- Semantic Layer with dbt Metrics
- Applied Analytics: Segments, Funnels, Cohorts, GEO, RFM
- 🚀 Lab: [Semantic layer with dbt Metrics and Cube for myBI market](https://github.com/kzzzr/mybi-dbt-showcase)

## [Advanced Toolkit](./5_advanced_toolkit/README.md)

- 🚀 Lab: [Configuring Slim CI for myBI market](https://github.com/kzzzr/mybi-dbt-showcase)
- External and Semi-structured data
- Extending dbt with packages
- Monitoring, Metadata, Performance tuning
- MLOps + DataOps

## [Capstone project](./6_capstone_project/README.md)

- Capstone project intro
- Case studies
- 🚀 Presenting your Capstone projects

## WIP About

About me

Learning course
About the platform where the course is held
Prerequisites: basic OS knowledge, Docker, ...

--------------------------------------------------------------------------------