1 | # Analytics Engineering Library
2 |
3 |
4 | ## Sections
5 | - [Data Teams](#data-teams)
6 | - [Modern Data Stack (MDS)](#modern-data-stack-mds)
7 | - [Analytics Engineering (AE)](#analytics-engineering-ae)
8 | - [Documentation](#documentation)
9 | - [Style Guides](#style-guides)
10 | - [Data Modeling](#data-modeling)
11 | - [SQL](#sql)
12 | - [Jinja](#jinja)
13 | - [dbt](#dbt)
14 | - [Command Line Interface (CLI)](#command-line-interface-cli)
15 | - [YAML](#yaml)
16 | - [Version Control](#version-control)
17 | - [Markdown](#markdown)
18 | - [Visual Studio Code](#visual-studio-code)
19 | - [Data Warehouses](#data-warehouses)
20 | - [Blogs](#blogs)
21 | - [AE Experts](#ae-experts)
22 | - [Cool But Not Sure Where it Goes](#cool-but-not-sure-where-it-goes)
23 | - [Non AE](#non-ae)
24 |
25 |
26 |
27 | ## Data Teams
28 | Resources and info related to ideas and best-practices for data teams to follow
29 |
30 | - [Reducing the Lottery Factor, for Data Teams](https://locallyoptimistic.com/post/reducing-the-lottery-factor-for-data-teams/)
31 | >*Laundry list of ideas/practices to help future-proof a data team, esp. for when team members come and go.*
32 |
33 |
34 |
35 |
36 |
37 | ## Modern Data Stack (MDS)
38 | >*the modern data stack is a combination of data tools used for the lifecycle of data management: data ingestion, data storage, data transformation, and data visualization - **from AE with dbt course***
39 |
40 | - [What is the modern data stack? (by Charles Wang)](https://www.fivetran.com/blog/what-is-the-modern-data-stack)
41 | >*Key points:*
42 | >- *the modern data stack (MDS) is a suite of tools used for data integration*
43 | >- *radically new approach to data integration saves engineering time, allowing engineers and analysts to pursue higher-value activities*
44 | >- *most important difference between a modern data stack and a legacy data stack is that the modern data stack is hosted in the cloud and requires little technical configuration by the user*
45 | >- *low and declining costs of cloud computing and storage continue to increase the cost savings of a modern data stack compared with on-premise solutions*
46 |
47 | - [The Modern Data Stack: Past, Present, and Future](https://www.getdbt.com/blog/future-of-the-modern-data-stack/)
48 | >*"my thoughts on where our space has been and where it might be going." **- tristan handy***
49 |
50 |
51 |
52 |
53 | ## Analytics Engineering (AE)
54 |
55 | #### AE | What is AE and what do AEs do?
56 | >Some points from the co:rise AE with dbt Course:
57 | >- *"While Data Analysts spend the majority of their time analyzing data, Analytics Engineers spend their time **transforming, testing, deploying, and documenting data."***
58 | >- *"AEs review code like software engineers and learn **coding best-practices like making their code readable and modular"***
59 |
60 | - [The Analytics Engineering Guide](https://www.getdbt.com/analytics-engineering/)
61 |
62 | - [What is analytics engineering?](https://www.getdbt.com/what-is-analytics-engineering/)
63 |
64 | - [Becoming an analytics engineer: two insider views](https://brooklyndata.co/blog/analyticsengineerinsisderview)
65 |
66 | - [Brooklyn Data Company, AE Progression Skills at each Level](https://brooklyn-data-co.progressionapp.com/teams/data-team)
67 |
68 | - [Analytics Engineering Everywhere (by Jason Ganz)](https://jasnonaz.medium.com/analytics-engineering-everywhere-d56f363da625)
69 | >- *"if it feels like we’re at a real inflection point for Analytics Engineering — it’s because we are."*
70 | >- *"what was very recently the domain **[AE]** of a few adventurous data teams is quickly becoming industry standard for tech organization — and there’s every reason to think that other types of organizations will be following along shortly. **the impact is just too high."***
71 | >- *"right now (may 2021) analytics engineering is still a new discipline — pretty soon it will be everywhere."*
72 |
73 |
74 | #### AE | Getting Jobs
75 | - [Analytics jobs: an aggregated jobs board](https://www.getdbt.com/analytics-engineering/jobs/#analytics-engineering-jobs)
76 | >***dbt-labs**: "To assist in your data career journey, this board pulls together known job opportunities that feature dbt in their stack."*
77 |
78 | #### AE | Technological Updates
79 |
80 |
81 | #### AE | General
82 | - [Aspiring Analytics Engineers, Start Here](https://madisonmae.substack.com/p/aspiring-analytics-engineers-start)
83 |
84 | - [What Companies REALLY Want in an Analytics Engineer](https://medium.com/geekculture/what-companies-really-want-in-an-analytics-engineer-1ac03ff4494a)
85 | >*author: madison schott. excellent article on where aspiring AEs should focus their learning (skills/tools), how to land a job, and how to be effective as an AE.*
86 |
87 | #### AE | Books Related to AE
88 |
89 | - [The Analytics Setup Guidebook](https://www.holistics.io/books/setup-analytics/)
90 | >*the awesome people at Holistics put together a **free** 187-page guide designed to: "Restructure your knowledge of the complex data analytics landscape, and learn how to build scalable analytics & BI stacks in the modern cloud era."*
91 |
92 | - [The Data Warehouse Toolkit, 3rd Edition](https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/books/data-warehouse-dw-toolkit/)
93 | >*the definitive guide to dimensional modeling*
94 |
95 | #### AE | Blogs
96 | - [Learn Analytics Engineering](https://madisonmae.substack.com/archive?sort=new)
97 |
98 | - [dbt Developer Blog](https://docs.getdbt.com/blog)
99 | >*Technical tutorials from the dbt Community.*
100 |
101 | #### AE | Experts
102 | - [Claire Carroll](https://clrcrl.com/)
103 | - [Niall Rees Woodward (Brooklyn Data Co.)](https://www.niallrees.com/)
104 | >*I’m a Data Engineer based in London. An open-source enthusiast, I’m a maintainer of SQLFluff, creator of dbt_artifacts and a frequent contributor to several other projects.*
105 | - [Dave Connors](https://docs.getdbt.com/author/dave_connors)
106 | >*dave is interesting b/c he has worked in the consulting wing of dbt labs where he and his colleagues work on an unusually high number of dbt-projects. he has great content on understanding what a 'good' and 'mature' dbt project looks like and each of the milestones to hit along the way.*
107 | - [Madison Schott](https://madisonmae.substack.com/)
108 | - [Parker Tenpas](https://pdtenpas.github.io/)
109 |
110 |
111 |
112 |
113 | ## Documentation
114 |
115 | ### DRY Documentation
116 | - [Building Sustainable dbt Project Documentation (Jeremie Pineau)](https://blog.montrealanalytics.com/building-sustainable-dbt-project-documentation-8def88ca67c3)
117 | >*author's quick tips for docs that scale:*
118 | >1. Doc blocks are a great tool to make your documentation DRY.
119 | >2. Always document fields at the source, where they are first introduced in your project.
120 | >3. Use your doc blocks as a single source of truth & refer to them throughout your project.
121 |
122 |
123 |
124 | ## Style Guides
125 |
126 | #### SG | General
127 |
128 |
129 | #### SG | SQL
130 |
131 | - [SQL Style Guide, Mozilla](https://docs.telemetry.mozilla.org/concepts/sql_style.html)
132 |
133 | - [SQL Style Guide, Gitlab](https://about.gitlab.com/handbook/business-technology/data-team/platform/sql-style-guide/)
134 |
135 | - [SQL Style Guide, Matt Mazur](https://github.com/mattm/sql-style-guide)
136 |
137 | - [SQL Style Guide, Kickstarter](https://gist.github.com/fredbenenson/7bb92718e19138c20591)
138 |
139 | - [SQL Style Guide, Simon Holywell](https://www.sqlstyle.guide/)
140 |
141 | - [Write better SQL with a Style Guide | 3 Things to Consider](https://www.youtube.com/watch?v=C8rpVtyaQNI&ab_channel=KahanDataSolutions)
142 |
143 | - [3 things to include in any SQL style guide](https://www.kahandatasolutions.com/blog/3%20things%20to%20include%20in%20any%20style%20guide)
144 |
145 |
146 | #### SG | dbt
147 |
148 | - [dbt Content Style Guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md)
149 | >*"This guide includes standards we [dbt] want to emphasize, likely because we've made deliberate decisions about them"*
150 |
151 | - [dbt Style Guide](https://github.com/dbt-labs/corp/blob/main/dbt_style_guide.md)
152 |
153 |
154 |
155 |
156 |
157 | ## Data Modeling
158 |
159 | #### DM | General
160 |
161 | - [Building Your Data Models for Growth](https://madisonmae.substack.com/p/building-your-data-models-for-growth)
162 |
163 | - [GitLab, Enterprise Dimensional Model](https://about.gitlab.com/handbook/business-technology/data-team/platform/edw/)
164 | >*GitLab's page with their standards/conventions for doing dimensional modeling at scale. Has excellent examples of how each type of table is named/organized in their data warehouse.*
165 |
166 | #### DM | Cleaning / Tidying / Wrangling
167 | - [Tidy Data, Hadley Wickham](https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html)
168 | > *The principles of tidy data provide a standard way to organize data values within a dataset.*
169 | - [The Quartz guide to bad data](https://github.com/Quartz/bad-data-guide)
170 | > *An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.*
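Here is a minimal sketch that makes the tidy-data idea (one observation per row, one variable per column) concrete. It unpivots a made-up "wide" table into long form with `UNION ALL`. The SQL runs through Python's built-in `sqlite3` module, so you can try it without a warehouse. The table, columns, and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical "wide" table: one column per subject
    create table wide_scores (student text, math integer, art integer);
    insert into wide_scores values ('ana', 90, 75), ('bo', 60, 88);
""")

# Unpivot: one row per (student, subject) observation
tidy = conn.execute("""
    select student, 'math' as subject, math as score from wide_scores
    union all
    select student, 'art', art from wide_scores
    order by student, subject
""").fetchall()

for row in tidy:
    print(row)
```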
171 |
172 | #### DM | Case Studies
173 | - [Modeling event data at scale (dbt presentation w/ Paul Boocock of Snowplow)](https://www.youtube.com/watch?v=H6Q-dtQ7xdM&ab_channel=dbt)
174 | >- *Data modeling is the process of using business logic to aggregate or otherwise transform raw data*
175 |
176 | #### DM | Slowly Changing Dimensions
177 | - [Slowly Changing Dimensions in Data Science](https://www.fivetran.com/blog/slowly-changing-dimensions-in-data-science)
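One common way to handle slowly changing dimensions, often called type 2, keeps one row per version of a record, each with a validity window. Facts then join to whichever version was current when the event happened. The sketch below shows that join in SQLite via Python's `sqlite3`. The `dim_customer`/`orders` tables and dates are hypothetical. This is one pattern among several, not the linked article's own code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- type 2 dimension: one row per version, with a validity window
    create table dim_customer (
        customer_id integer, plan text, valid_from text, valid_to text
    );
    insert into dim_customer values
        (1, 'free', '2022-01-01', '2022-03-01'),
        (1, 'pro',  '2022-03-01', null);  -- null valid_to = current version
    create table orders (order_id integer, customer_id integer, ordered_at text);
    insert into orders values (100, 1, '2022-02-15'), (101, 1, '2022-04-10');
""")

# Join each order to the customer version that was valid at order time
rows = conn.execute("""
    select o.order_id, d.plan
    from orders o
    join dim_customer d
      on d.customer_id = o.customer_id
     and o.ordered_at >= d.valid_from
     and (o.ordered_at < d.valid_to or d.valid_to is null)
    order by o.order_id
""").fetchall()
print(rows)
```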
178 |
179 | #### DM | Talks
180 | - [Back to the Future: Where Dimensional Modeling Enters the Modern Data Stack](https://www.youtube.com/watch?v=-yQa_DxEqaQ&ab_channel=dbt)
181 |
182 | #### DM | Books
183 | - [Data Model Storytelling](https://technicspub.com/data-model-storytelling/)
184 |
185 | #### DM | Salesforce
186 | - [Fivetran - Salesforce Source dbt Package](https://fivetran.com/docs/transformations/data-models/salesforce-data-model/salesforce-source-model)
187 | >- *includes a data dictionary/documentation for Salesforce tables/fields*
188 | >- *[link to GitHub repo](https://github.com/fivetran/dbt_salesforce_source/tree/main/models)*
189 |
190 |
191 |
192 |
193 |
194 | ## SQL
195 |
196 | #### SQL | General
197 |
198 | - [CTEs versus Subqueries](https://www.alisa-in.tech/post/2019-10-02-ctes/)
199 | - [The most underutilized function in SQL](https://www.getdbt.com/blog/the-most-underutilized-function-in-sql/)
200 | > *In this post I’m going to show you two uses for md5() that make it one of the most powerful tools in my SQL kit.*
201 | - [The Three-Valued Logic of SQL](https://modern-sql.com/concept/three-valued-logic)
202 | >*SQL uses a three-valued logic: besides true and false, the result of logical expressions can also be unknown. SQL’s three valued logic is a consequence of supporting null to mark absent data. If a null value affects the result of a logical expression, the result is neither true nor false but unknown.*
203 |
204 | - [SQL Fiddle](http://sqlfiddle.com/about.html)
205 | >*A tool for easy online testing and sharing of database problems and their solutions.*
206 |
207 | - [Regular Expression Tool 01](https://regexr.com/)
208 |
209 | - [Regular Expression Tool 02](https://regex101.com/)
210 |
211 | - [RegexLearn](https://regexlearn.com/learn/regex101)
212 | >*56 exercises for learning regular expressions - 101 basics*
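Here is a minimal sketch of three-valued logic in practice, run through Python's built-in `sqlite3` module (other engines treat NULL the same way here). The `customers` table is made up. For the NULL row, `country <> 'US'` evaluates to unknown rather than true. A filter keeps only true rows, so that row is dropped unless NULLs are handled explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customers (id integer, country text);
    insert into customers values (1, 'US'), (2, 'CA'), (3, null);
""")

# NULL = NULL is unknown (returned to Python as None), not true
null_eq_null = conn.execute("select null = null").fetchone()[0]

# WHERE keeps only rows where the predicate is TRUE, so the NULL row is dropped
not_us = conn.execute(
    "select id from customers where country <> 'US' order by id"
).fetchall()

# Handle NULL explicitly if it should count as "not US"
not_us_incl_null = conn.execute(
    "select id from customers where country <> 'US' or country is null order by id"
).fetchall()

print(null_eq_null, not_us, not_us_incl_null)
```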
213 |
214 | #### SQL | Conceptual Execution Order
215 |
216 | - [SQL Order of Operations – In Which Order MySQL Executes Queries?](https://www.eversql.com/sql-order-of-operations-sql-query-order-of-execution/)
217 | >**Learnings:**
218 | >- window functions can only be used in the SELECT or the ORDER BY clause (some warehouses, e.g. Snowflake and BigQuery, also allow them in QUALIFY)
219 | >- aggregate functions can be used inside window functions, e.g., `SUM(COUNT(*)) OVER ()`
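GROUP BY is evaluated before window functions, so an aggregate can sit inside a window function. The sketch below uses `SUM(COUNT(*)) OVER ()` to compute each group's share of the total. It runs in SQLite through Python's `sqlite3` (window functions need SQLite 3.25+), and the `orders` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (order_id integer, channel text);
    insert into orders values (1, 'web'), (2, 'web'), (3, 'web'), (4, 'store');
""")

# COUNT(*) is computed per channel by GROUP BY first;
# SUM(...) OVER () then adds those per-group counts across all groups.
rows = conn.execute("""
    select
        channel,
        count(*) as n_orders,
        sum(count(*)) over () as total_orders,
        1.0 * count(*) / sum(count(*)) over () as share
    from orders
    group by channel
    order by channel
""").fetchall()
print(rows)
```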
220 |
221 | #### SQL | Advice
222 |
223 | - [One analyst's guide for going from good to great](https://www.getdbt.com/blog/one-analysts-guide-for-going-from-good-to-great/)
224 | >- *Use Common Table Expressions for extremely readable SQL*
225 | >- *Learn Window functions and ditch the spreadsheets*
226 | >- *Use Aggregate Case Statements for Easy Summaries of Data*
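Here is a small sketch combining two of the tips above: a CTE for readability and aggregate CASE statements for a one-row summary. It runs in SQLite through Python's `sqlite3`. The `payments` table and its statuses are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table payments (payment_id integer, status text, amount real);
    insert into payments values
        (1, 'success', 10.0), (2, 'success', 15.0),
        (3, 'failed', 7.5), (4, 'refunded', 10.0);
""")

summary = conn.execute("""
    with payments_clean as (
        -- CTE: one named, readable step instead of a nested subquery
        select payment_id, lower(status) as status, amount
        from payments
    )
    select
        count(*) as n_payments,
        sum(case when status = 'success' then amount else 0 end) as success_amount,
        sum(case when status = 'failed' then 1 else 0 end) as n_failed
    from payments_clean
""").fetchone()
print(summary)
```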
227 |
228 | #### SQL | Filtering Data
229 |
230 | - [HAVING vs. WHERE in SQL: What You Should Know](https://learnsql.com/blog/sql-having-vs-where/)
231 | >*"In simple words, the WHERE and HAVING clauses **act as filters;** they remove records or data that don’t meet certain criteria from the final result of a query. However, they are **applied to different sets of data.** That’s the important point to understand about WHERE vs. HAVING: **WHERE filters at the record level,** while **HAVING filters at the "group of records" level."*** 🔥
232 |
233 | - [What is the difference between HAVING and WHERE clause?](https://afteracademy.com/blog/what-is-the-difference-between-having-and-where-clause)
234 | >*Not the best article BUT I do like this clarifying statement: "If **GROUP BY** is used then it is **executed after the WHERE clause** is executed in the query. It means it selects the rows before grouping is done or aggregate calculations are performed. That's why, the WHERE clause is also called **Pre-filter.** But, **GROUP BY is executed before the execution of the HAVING clause.** It means it selects the rows after aggregate calculations are performed. That's why, the HAVING clause is also called as **Post-filter**."*
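Here is a minimal sketch of WHERE as a pre-filter and HAVING as a post-filter, using Python's built-in `sqlite3` and a made-up `sales` table. Similar thresholds give different results depending on whether they are applied to rows or to groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table sales (region text, amount integer);
    insert into sales values
        ('east', 100), ('east', 50), ('west', 20), ('west', 300), ('north', 30);
""")

# WHERE filters individual rows BEFORE grouping (pre-filter)
big_sales_by_region = conn.execute("""
    select region, sum(amount) from sales
    where amount >= 50
    group by region
    order by region
""").fetchall()

# HAVING filters whole groups AFTER aggregation (post-filter)
big_regions = conn.execute("""
    select region, sum(amount) from sales
    group by region
    having sum(amount) >= 200
    order by region
""").fetchall()

print(big_sales_by_region)
print(big_regions)
```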
235 |
236 |
237 | #### SQL | Assembling Data (Join/Union/Except/Etc.)
238 |
239 | - [The problem of SQL fanouts](https://community.looker.com/technical-tips-tricks-1021/the-problem-of-sql-fanouts-30232)
240 | >*fanouts happen when joining tables having a one-to-many relationship; the primary table (left) is joined to the secondary table (right) and the result ends up having more rows than the left table began with.* this situation can cause errors when applying aggregate functions afterwards. **three things to note:**
241 | > - **"no-fanout (one-to-one OR many-to-one):** you can trust aggregate functions on your primary table but not necessarily on your joined tables"
242 | > - **"fanout (one-to-many):** you cannot necessarily trust aggregate functions on either your primary table or your joined tables"
243 | > - **"help avoid fanouts (protip):** begin your joins with the most granular table (many-to-one)"
244 |
245 | - [SQL Joins Using WHERE or ON](https://mode.com/sql-tutorial/sql-joins-where-vs-on/)
246 | >*Has a good illustration of filtering data prior to joining tables.*
247 |
248 | - [Difference between WHERE and ON in SQL](https://dataschool.com/how-to-teach-people-sql/difference-between-where-and-on-in-sql/)
249 | >*Makes the case that the WHERE clause and the ON clause should ONLY be used for their intended purposes, to filter and to join data, respectively. The author says this in the intro: "ON should be used to define the join condition and WHERE should be used to filter the data. I used the word should because this is not a hard rule. The splitting of these purposes with their respective clauses makes the query the most readable". **100% Agree!!! Prioritizing readability is key 🔑***
250 |
251 | - [An Introduction to Using SQL Aggregate Functions with JOINs](https://learnsql.com/blog/introduction-using-aggregate-functions-joins/)
252 | >*Key Insight(s): using conditions in the JOIN predicate (after the ON) is not the same as filtering in the WHERE (or using HAVING). These can create subtle (or not so subtle) differences in your summarized data, which could result in hard-to-spot errors.*
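253 | >- *WHERE conditions are applied **after** the JOIN, to the joined rows*
254 | >- *conditions (for filtering) in the JOIN predicate are applied **as part of** the join, while rows are being matched*
255 | >- *so with a LEFT JOIN, a filter in the ON clause never drops rows from the left table, while the same filter in the WHERE clause can*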
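The sketch below shows both join pitfalls above. It runs in SQLite through Python's `sqlite3`, with hypothetical `customers`, `orders`, and `order_items` tables. Part 1 shows a one-to-many join inflating a SUM, and one way to avoid it by aggregating to the order grain first. Part 2 places the same filter in ON versus WHERE on a LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customers (customer_id integer, name text);
    create table orders (order_id integer, customer_id integer,
                         status text, order_total integer);
    create table order_items (order_id integer, sku text);
    insert into customers values (1, 'ana'), (2, 'bo'), (3, 'cy');
    insert into orders values (10, 1, 'shipped', 100), (11, 2, 'cancelled', 50);
    insert into order_items values (10, 'a'), (10, 'b'), (10, 'c'), (11, 'd');
""")

def q(sql):
    return conn.execute(sql).fetchall()

# 1) Fanout: orders -> order_items is one-to-many, so order 10 appears 3 times
true_revenue = q("select sum(order_total) from orders")[0][0]
fanned_out = q("""
    select sum(o.order_total)
    from orders o join order_items i on i.order_id = o.order_id
""")[0][0]

# One fix: aggregate the "many" side to one row per order before joining
fixed = q("""
    with items as (
        select order_id, count(*) as n_items from order_items group by order_id
    )
    select sum(o.order_total), sum(i.n_items)
    from orders o join items i on i.order_id = o.order_id
""")[0]

# 2) Filter in ON: applied while matching, so every customer is kept
in_on = q("""
    select c.name, o.order_id
    from customers c
    left join orders o
      on o.customer_id = c.customer_id and o.status = 'shipped'
    order by c.name
""")

# Same filter in WHERE: applied after the join; a NULL status is not 'shipped',
# so unmatched customers drop out and the LEFT JOIN acts like an INNER JOIN
in_where = q("""
    select c.name, o.order_id
    from customers c
    left join orders o on o.customer_id = c.customer_id
    where o.status = 'shipped'
    order by c.name
""")

print(true_revenue, fanned_out, fixed)
print(in_on, in_where)
```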
256 |
257 |
258 |
259 |
260 | #### SQL | Grouping Data
261 |
262 | - [Write better SQL: In defense of group by 1](https://www.getdbt.com/blog/write-better-sql-a-defense-of-group-by-1/)
263 |
264 |
265 | #### SQL | General Functions
266 |
267 | - [SQL COUNT Function Explained with Examples](https://www.databasestar.com/sql-count/)
268 | >*Really nice article that goes over the syntax of count() and ALL of the nuances that matter when using it*
269 |
270 | ```sql
271 | -- what does count() do?
272 | COUNT(*)                   -- counts all rows, including duplicates and NULLs
273 | COUNT(expression)          -- counts non-NULL values of expression (duplicates included)
274 | COUNT(DISTINCT expression) -- counts distinct non-NULL values of expression
275 |
276 | -- basic syntax ({ } = choose one, [ ] = optional)
277 | COUNT( { * | [ DISTINCT | ALL ] expression } ) [ OVER (analytic_clause) ]
278 |
279 | -- variations for how to call count()
280 | COUNT(*)
281 | COUNT(DISTINCT expression)
282 | COUNT(ALL expression)
283 | COUNT(*) OVER (analytic_clause)
284 | COUNT(DISTINCT expression) OVER (analytic_clause) -- not allowed in every dialect (e.g., PostgreSQL, SQL Server)
285 | COUNT(ALL expression) OVER (analytic_clause)
286 | ```
287 |
288 | - [What is the Difference Between COUNT(*), COUNT(1), COUNT(column name), and COUNT(DISTINCT column name)?](https://learnsql.com/blog/difference-between-count-distinct/)
289 | >*Have you noticed there are different variations of the SQL COUNT() function? This article explains the various arguments and their uses.*
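Here is a quick way to see the COUNT() variants side by side. It assumes a made-up `events` table containing one duplicate and one NULL, and runs in SQLite through Python's `sqlite3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table events (user_id integer);
    insert into events values (1), (1), (2), (null);
""")

counts = conn.execute("""
    select
        count(*)                as all_rows,      -- every row, NULL included
        count(1)                as count_one,     -- same result as count(*)
        count(user_id)          as non_null,      -- skips the NULL
        count(distinct user_id) as distinct_ids   -- skips NULL and duplicates
    from events
""").fetchone()
print(counts)
```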
290 |
291 |
292 | #### SQL | Window Functions
293 |
294 | - [Tips And Tricks: How To Fill Null Values in SQL](https://towardsdatascience.com/tips-and-tricks-how-to-fill-null-values-in-sql-4fccb249df6f)
295 | >*Clever trick: a window function with a cumulative count of non-null values puts each non-null value and the nulls that follow it into the same group, which then makes forward filling (filling down) easy.*
296 |
297 | - [SQL Window Functions vs. SQL Aggregate Functions: Similarities and Differences](https://learnsql.com/blog/window-functions-vs-aggregate-functions/)
298 |
299 | - [An Easy Guide to Advanced SQL Window Functions](https://towardsdatascience.com/a-guide-to-advanced-sql-window-functions-f63f2642cbf9)
300 | >*Three types of window functions:*
301 | >1. **Aggregate Window Functions:** avg(), max(), min(), sum(), count()
302 | >2. **Ranking Window Functions:** row_number(), rank(), dense_rank(), percent_rank(), ntile()
303 | >3. **Value Window Functions:** lag(), lead(), first_value(), last_value(), nth_value()
304 |
305 | - [Let's Learn SQL Window Functions](https://madisonmae.substack.com/p/lets-learn-sql-window-functions)
306 | >*nice summary of window functions by Madison Mae (+ with examples)*
307 |
308 | - [What a Moving Average Is and How to Compute it in SQL](https://learnsql.com/blog/moving-average-in-sql/)
309 | >- Rolling/moving averages remove noise (smooth the data) so that we can focus on trends.
310 | >- Whereas many window functions compute over a whole partition ('some' group, or the entire table as one single partition), a moving average gives each row its own sliding window frame (e.g., the current row + the two preceding rows for a 3-day moving average).
311 | ```sql
312 | -- example: 3-day moving average (whole table as one partition; assumes one row per day)
313 | select
314 |     stock_price.*
315 |     , avg(price) over (order by sales_date
316 |                        rows between 2 preceding and current row) as moving_average
317 | from stock_price;
318 | ```
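The sketch below covers two of the window-function ideas above and runs in SQLite through Python's `sqlite3` (window functions need SQLite 3.25+). First, it shows the three ranking functions on a tie. Second, it shows the fill-down trick from the null-filling article. That trick is reconstructed here from the one-line description, so the article's exact query may differ. Tables and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table scores (player text, score integer);
    insert into scores values ('a', 90), ('b', 80), ('c', 80), ('d', 70);
    create table readings (day integer, price real);
    insert into readings values (1, 10.0), (2, null), (3, null), (4, 12.0), (5, null);
""")

# Ranking functions differ only on ties (b and c both scored 80)
ranks = conn.execute("""
    select
        player,
        row_number() over (order by score desc, player) as rn,  -- always unique
        rank()       over (order by score desc) as rnk,         -- ties share, then a gap
        dense_rank() over (order by score desc) as dense_rnk    -- ties share, no gap
    from scores
    order by player
""").fetchall()

# Fill down: count() skips NULLs, so the running count only increases on
# non-null rows; each non-null value plus the NULLs after it form one group
filled = conn.execute("""
    with grouped as (
        select day, price, count(price) over (order by day) as grp
        from readings
    )
    select day, max(price) over (partition by grp) as price_filled
    from grouped
    order by day
""").fetchall()

print(ranks)
print(filled)
```

Rows before the first non-null value land in group 0 and stay NULL, since there is nothing to carry forward.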
319 |
320 |
321 |
322 | - [[Video] Lead and Lag functions in SQL Server 2012](https://www.youtube.com/watch?v=l_Zn5sdkamM&ab_channel=kudvenkat)
323 | >*excellent overview of these two functions (should generalize to other SQL dialects)*
324 | >- **lag()** returns a value from a *previous* row and puts it alongside the current row/record.
325 | >- **lead()** returns a value from a *following* row and puts it alongside the current row/record.
326 | >- the **order by** clause (inside OVER) is required.
327 | >- the **partition by** clause is optional.
328 | >- **syntax:** `LAG/LEAD(col_name, offset, default_value) OVER ([PARTITION BY ...] ORDER BY col1, col2, ...)`
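Here is a minimal sketch of LAG/LEAD computing a day-over-day change. It runs in SQLite through Python's `sqlite3` rather than SQL Server (window functions need SQLite 3.25+). The `daily_sales` table is hypothetical. The third LAG argument supplies a default for the first row, where there is no previous value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table daily_sales (sales_date text, revenue integer);
    insert into daily_sales values
        ('2022-01-01', 100), ('2022-01-02', 120), ('2022-01-03', 90);
""")

rows = conn.execute("""
    select
        sales_date,
        revenue,
        lag(revenue, 1, 0) over (order by sales_date) as prev_revenue,  -- previous row, default 0
        lead(revenue) over (order by sales_date)       as next_revenue,  -- following row
        revenue - lag(revenue) over (order by sales_date) as change      -- NULL on first row
    from daily_sales
    order by sales_date
""").fetchall()

for row in rows:
    print(row)
```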
329 |
330 |
331 |
332 | #### SQL | Training
333 |
334 | - [learnsql.com | Learn & Practice SQL](https://learnsql.com/)
335 | >*one of my favorite online platforms for learning/practicing SQL*
336 | - [learnsql.com | SQL Joins](https://learnsql.com/course/joins)
337 | >*Review and deepen your knowledge of SQL JOINs with 93 exercises. Practice common and less common ways of getting data from multiple tables.*
338 | - [co:rise | Intro to SQL](https://corise.com/course/intro-to-sql)
339 | >*This course provides an introduction to SQL, a programming language that will unleash your ability to explore data. We'll cover all the fundamentals of SQL, and you'll leave knowing how to issue SQL queries and interact with databases, as well as how to translate English queries to SQL correctly and quickly. We'll approach SQL in a hands-on manner with real-life examples and we'll build complexity in our SQL queries week over week.*
340 |
341 |
342 |
343 | #### SQL | Interviewing
344 | - [Top Skills to Ace Every SQL Interview Question](https://medium.com/towards-data-science/top-skills-to-ace-every-sql-interview-question-33356b08845a)
345 | >**Madison Schott:** *"Once you nail a few key concepts you can pretty much answer any question, with lots of practice that is. Start with the basics for each of these concepts and work your way up to more difficult problems."*
346 | >1. Master Joins
347 | >2. Master Aggregate Functions
348 | >3. Master Subqueries
349 | - [How to learn SQL for data science interview (the minimize effort maximize outcome way)](https://www.youtube.com/watch?v=vaD3ZFFNwhM&ab_channel=TinaHuang)
350 | >*scientifically-backed study plan to learn SQL most efficiently with the least amount of time and effort*
351 |
352 | - [Analyzing 89 Responses to a SQL Screener Question for a Senior Data Analyst Position](https://mattmazur.com/2018/11/12/analyzing-89-responses-to-a-sql-screener-question-for-a-senior-data-analyst-position/)
353 | >*Matt Mazur's analysis of candidate responses to a SQL Screener Question that was designed to weed out weaker candidates*
354 |
355 | - [31 SQL Questions for Data Analysts [Updated for 2022]](https://www.interviewquery.com/p/sql-questions-data-analyst)
356 |
357 | - [Three Tricky Analytics Interview Questions with Andrew](https://www.youtube.com/watch?v=uLCFCzVLi4Q&ab_channel=DataScienceJay)
358 |
359 | >*We tackle three analytics interview questions by solving them with SQL. Each one is progressively harder and Andrew explains his methodology towards solving each question!*
360 |
361 |
362 | - [SQL Sundays (YouTube)](https://www.youtube.com/playlist?list=PLVD3APpfd1tuXrXBWAntLx4tNaONro5dA)
363 | > *9 Mock Interview videos with 9 different SQL problems*
364 |
365 | - [Frequently Asked Questions About SQL](https://learnsql.com/blog/frequently-asked-questions-about-sql/)
366 | >*Not my favorite article BUT has a solid list of topics AND also some great links to other resources. Keeping in the library for now...*
367 |
368 | #### SQL | History
369 | - [Relational Database History, Edgar F. Codd](https://www.ibm.com/ibm/history/ibm100/us/en/icons/reldb/)
370 |
371 |
372 | #### SQL | Experts
373 | - [Tihomir Babic, Technical Writer for LearnSQL.com](https://learnsql.com/authors/tihomir-babic/)
374 |
375 |
376 | #### SQL | Never Know, Might Need It
377 |
378 | - [SQL Delete Join](https://www.educba.com/sql-delete-join/?source=leftnav)
379 | >*an advanced method for using the delete command came up in a leetcode problem. this article illustrates the 'sql-delete-join' - may not need this in AE but thought it was interesting so hanging on to it for now*
380 |
381 |
382 | #### SQL | Linters
383 |
384 | - [SQL Fluff](https://www.sqlfluff.com/)
385 | > [Rolling out SQLFluff with a new team](https://docs.sqlfluff.com/en/stable/teamrollout.html)
386 |
387 | - [Drizly's SQLFluff GitHub Workflow](https://github.com/sqlfluff/sqlfluff-github-actions/tree/main/menu_of_workflows/drizly)
388 |
389 |
390 | - [sqlfmt](https://sqlfmt.com/#)
391 | >*sqlfmt **formats your dbt SQL files** so you don't have to. It is similar in nature to black, gofmt, and rustfmt (but for SQL).*
392 |
393 |
394 |
395 |
396 | ## Jinja
397 |
398 | #### Jinja | General
399 |
400 | - [Jinja Template Designer Documentation](https://jinja.palletsprojects.com/en/3.1.x/templates/)
401 | >*"This document describes the syntax and semantics of the template engine and will be most useful as reference to those creating Jinja templates."*
402 |
403 |
404 |
405 | ## dbt
406 |
407 | >*Nice definition/overview of dbt from AE with dbt course:*
408 | > - *"what is dbt? dbt (data build tool) is an open source python framework and CLI tool for compiling SQL queries into full data model DAGs that can be deployed against a warehouse. dbt is agnostic about the warehouse it is connecting to. dbt does the “T” in Extract Load Transform. It transforms the data you’ve brought into your data warehouse."*
409 |
410 | #### dbt | Useful Code
411 | ```shell
412 | # open the profiles.yml file (the `open` command is macOS-specific)
413 | open ~/.dbt/profiles.yml
414 | ```
415 |
416 | #### dbt | Best Practices
417 |
418 | - [Best practice guides](https://docs.getdbt.com/guides/best-practices)
419 | >*Learn how dbt Labs approaches building projects through our current viewpoints on structure, style, and setup.*
420 |
421 | - [Data Change Management: Lessons Learned at Vouch](https://www.youtube.com/watch?v=FC0DuGY1DvM&ab_channel=dbt)
422 | >*Kshitij Aranke talks at Coalesce about **automated change management**, a software engineering best-practice that he argues should be applied to AE within dbt (+beyond).*
423 |
424 | - [Introducing the dbt_project_evaluator: Automatically evaluate your dbt project for alignment with best practices](https://docs.getdbt.com/blog/align-with-dbt-project-evaluator)
425 | >*Must try this out. It is basically a linter, but instead of checking SQL code it checks that a dbt project follows a set of best practices. The instructions in the GitHub readme show how to **add the package as a CI check (an example of automated change management?)**. And here is a link to a Coalesce talk on the package: [dbt Project Evaluator](https://www.youtube.com/watch?v=smbRwmcM1Ok&ab_channel=dbt)*
426 |
427 |
428 | #### dbt | Primary Keys
429 |
430 | - [Primary key (natural vs. surrogate)](https://docs.getdbt.com/terms/primary-key)
431 | - [Surrogate key](https://docs.getdbt.com/terms/surrogate-key)
432 | - [Generating Surrogate Keys Across Warehouses](https://docs.getdbt.com/blog/sql-surrogate-keys)
433 | - [dbt-project-evaluator | Missing Primary Key Tests](https://dbt-labs.github.io/dbt-project-evaluator/latest/rules/testing/#missing-primary-key-tests)
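The surrogate-key idea (see also the md5() post in the SQL section) can be sketched in plain Python. Hash a separator-joined string of the key columns, swapping NULLs for a sentinel so that NULL and an empty string don't collide. `surrogate_key` and `passes_primary_key_checks` are hypothetical helpers. They only mirror the general approach of macros like `dbt_utils.generate_surrogate_key` and do not reproduce that macro's exact output. The check mirrors the two properties dbt's `unique` and `not_null` tests assert about a primary key.

```python
import hashlib

NULL_SENTINEL = "_null_"  # hypothetical placeholder so NULL and '' hash differently

def surrogate_key(*values) -> str:
    """md5 hex digest of the values joined with '-' (NULL -> sentinel)."""
    parts = [NULL_SENTINEL if v is None else str(v) for v in values]
    return hashlib.md5("-".join(parts).encode("utf-8")).hexdigest()

def passes_primary_key_checks(keys) -> bool:
    """A primary key must be not null and unique."""
    return all(k is not None for k in keys) and len(keys) == len(set(keys))

rows = [("2022-01-01", "web"), ("2022-01-01", "store"), ("2022-01-02", None)]
keys = [surrogate_key(*row) for row in rows]
print(keys[0], passes_primary_key_checks(keys))
```

The `-` separator prevents collisions such as ('ab', 'c') vs ('a', 'bc'). Values that themselves contain `-` can still collide, so choose a separator that cannot appear in the data if that matters.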
434 |
435 |
436 | #### dbt | Testing
437 |
438 | - [Improving data reliability - Andrea Kopitz, Envoy (SF dbt Meetup, November 2019)](https://www.youtube.com/watch?v=M_cNspn2XsE&ab_channel=dbt)
439 | >*Andrea Kopitz from Envoy presents on how her team was struggling with data errors/trust/etc. issues and the strategies she and her team implemented to turn things around. I'd really like to come back to this and implement some of her ideas in the future.*
440 |
441 |
442 | #### dbt | sample projects
443 |
444 | - [jaffle_shop dbt-project by dbt-labs](https://github.com/dbt-labs/jaffle_shop)
445 |
446 | - [Gitlab Data Team, Mature Production dbt-project](https://gitlab.com/gitlab-data/analytics/-/tree/master/transform/snowflake-dbt)
447 |
448 | #### dbt | General
449 |
450 | - [The dbt Viewpoint](https://docs.getdbt.com/community/resources/viewpoint)
451 | >Building a Mature Analytics Workflow: The dbt Viewpoint!
452 |
453 | - [GitLab's File/Code Using dbt_utils.date_spine() to Create a Date Dimension Table](https://gitlab.com/gitlab-data/analytics/-/blob/master/transform/snowflake-dbt/models/sources/date/date_details_source.sql)
454 |
455 | - [Never used dbt? Here’s how it can change your data game](https://www.getcensus.com/blog/never-used-dbt)
456 | >- ***great explanation:** dbt is a data transformation tool that leverages the power of SQL and Jinja to write modular data models within your data warehouse. It reads from the data within your data warehouse and writes to it without ever leaving, allowing you to test and document your code and promoting proper code maintenance along the way.*
457 | >- *dbt is an open-source tool that is completely free to use*
458 | >- *implementing dbt is nearly risk-free since it lives entirely in your data warehouse. It reads from your data warehouse and writes to it. In fact, dbt puts the T in ETL/ELT.*
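The GitLab file above builds a date dimension with `dbt_utils.date_spine()`. The same idea, a gap-free list of dates to join facts against, can be illustrated with a recursive CTE run in SQLite through Python's `sqlite3`. This is not what the macro compiles to, the date range is arbitrary, and this sketch includes the end date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Recursive CTE: start at the first date, add one day until the end date
dates = conn.execute("""
    with recursive spine(date_day) as (
        select date('2022-01-30')
        union all
        select date(date_day, '+1 day')
        from spine
        where date_day < date('2022-02-02')
    )
    select
        date_day,
        strftime('%w', date_day) as day_of_week  -- '0' = Sunday
    from spine
    order by date_day
""").fetchall()

for row in dates:
    print(row)
```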
459 |
460 | #### dbt | code snippets
461 |
462 | - things to remember +/or that i keep forgetting =)
463 | >- `dbt docs generate` AND `dbt docs serve --no-browser`
464 |
465 | #### dbt | Getting Help
466 |
467 | - [dbt discourse](https://discourse.getdbt.com/)
468 | >*dbt-hosted site for getting support with all things dbt (help, show/tell, discussion, etc.)*
469 |
470 |
471 | #### dbt | Project Setup
472 |
473 | - [Overview of building a dbt project](https://docs.getdbt.com/docs/building-a-dbt-project/projects)
474 | >*root of dbt-docs that outlines how to get up and running with dbt Cloud and dbt CLI*
475 |
476 | - [Getting started with dbt Core](https://docs.getdbt.com/guides/getting-started/learning-more/getting-started-dbt-core)
477 | >*dbt doc(s) with a really awesome walkthrough of setting up a dbt project using the dbt CLI.*
478 | > - *very thorough documentation/tutorial on getting a dbt project up and running from start to finish*
479 | > - *has link(s) to a really thorough example of setting up and loading data from Google BigQuery. also has examples for Databricks, Redshift, Snowflake*
480 | > - *has awesome FAQ sections with loads of pro-tips*
481 |
482 | - [Configuring your profiles.yml](https://docs.getdbt.com/dbt-cli/configure-your-profile)
483 | >*awesome dbt doc(s) w/ALL the details needed to properly set up the **profiles.yml** file. includes a reference link to a doc listing ALL the warehouses dbt can connect to and how to set up profiles.yml for each.*
484 |
485 | - [How to build a mature dbt project from scratch](https://www.getdbt.com/coalesce-2021/how-to-build-a-mature-dbt-project-from-scratch/)
486 | >*dave connors from dbt gives a presentation on how to build a dbt-project from the ground up.*
487 |
488 |
489 | #### dbt | packages
490 |
491 | - [codegen](https://hub.getdbt.com/dbt-labs/codegen/latest/)
492 | >***dbt-codegen:** Macros that generate dbt code, and log it to the command line.*
493 |
494 | ```shell
495 | # code snippets/examples
496 |
497 | # generate source-yaml code:
498 | dbt run-operation generate_source --args '{"database_name": "raw", "schema_name": "public", "generate_columns": True, "include_descriptions": True}'
499 |
500 | # generate sql code for source/base/stage models
501 | dbt run-operation generate_base_model --args '{"source_name": "postgres", "table_name": "addresses", "leading_commas": True}'
502 |
503 | # generate model-yaml code:
504 | dbt run-operation generate_model_yaml --args '{"model_name": "stg_user_addresses"}'
505 | ```
506 |
507 |
508 | #### dbt | certification
509 |
510 | - [dbt Analytics Engineering Certification Exam](https://www.getdbt.com/certifications/analytics-engineer-certification-exam/)
511 | >*page to register for exam. very useful page b/c it has exam prep material: a) what's covered section; b) link to sample-questions; & c) link to a study-guide*
512 |
513 | - [A Guide to Passing the dbt Analytics Engineering Certification](https://aimpointdigital.com/guide-passing-dbt-analytics-engineering-certification/)
514 | > [LinkedIn post on the topic by the blog post's author](https://www.linkedin.com/posts/aimpoint-digital_a-guide-to-passing-the-dbt-analytics-engineering-activity-7008886189420154880-YAsq)
515 |
516 |
517 |
518 |
519 |
520 |
521 |
522 | ## Command Line Interface (CLI)
523 |
524 | - [The Linux command line for beginners](https://ubuntu.com/tutorials/command-line-for-beginners#1-overview)
525 |
526 | - [Basics of BASH for Beginners](https://towardsdatascience.com/basics-of-bash-for-beginners-92e53a4c117a)
527 | >*suggested reading in the AE w/dbt course and a great overview to "learn about some of the most useful BASH commands and the utility they offer".*
528 |
529 |
530 |
531 |
532 |
533 | ## YAML
534 |
535 | - [YAML Tutorial | Learn YAML in 10 Minutes (by Kahan Data Solutions)](https://www.youtube.com/watch?v=BEki_rsWu4E&ab_channel=KahanDataSolutions)
536 | > - data-serialization language meant to be human-readable + computationally powerful
537 | > - commonly used for configuration files + depends on key: value pairs + relies on spacing/indentation
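538 | > - protip when entering multi-line 'values':
539 | >   - add `>` (folded block scalar) to wrap long text across several lines in the yaml file; the line breaks are folded into spaces, so the parsed value stays on one line
540 | >   - add `|` (literal block scalar) to keep every line break as written, so the parsed value stays multi-line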
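
A small sketch of both block-scalar styles in a dbt-style `schema.yml`; the model name, column name, and description text are hypothetical, chosen only for illustration.

```yaml
version: 2

models:
  - name: stg_user_addresses        # hypothetical model
    # '>' (folded): the two lines below are joined with a space, so the
    # parsed value is one line: "One row per user address, ... table.\n"
    description: >
      One row per user address, cleaned and renamed
      from the raw addresses table.
    columns:
      - name: address_id            # hypothetical column
        # '|' (literal): line breaks are kept, so the parsed value
        # is "Primary key.\nUnique and not null.\n"
        description: |
          Primary key.
          Unique and not null.
```

Adding `-` after either indicator (`>-` or `|-`) strips that final trailing newline.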
541 |
542 | - [YAML Checker](https://yamlchecker.com/)
543 | > *YAML Checker provides a quick and easy way to validate YAML. As you type, your YAML will be validated with beautiful syntax highlighting and error information.*
544 |
545 |
546 |
547 |
548 |
549 | ## Version Control
550 |
551 | #### Git
552 |
553 | - [A typical GitHub workflow // what to expect](https://www.youtube.com/watch?v=02aQhH5cNBg&ab_channel=KahanDataSolutions)
554 | >*Kahan Data Solutions: "While there is a lot to learn about GitHub, in this video I want to show you the main things you'll need to know and what a typical workflow might look like"*
555 |
556 | - [An Intro to Git and GitHub for Beginners (Tutorial)](https://product.hubspot.com/blog/git-and-github-tutorial-for-beginners)
557 | >*recommended in AE with dbt course. went through it, pretty solid.*
558 |
559 | - [Git Cheat Sheet](https://training.github.com/downloads/github-git-cheat-sheet.pdf)
560 | >*really nice no BS cheat sheet*
561 |
562 | - [Learn Git Branching](https://learngitbranching.js.org/)
563 | >*from the creators: "Learn Git Branching" is the most visual and interactive way to learn Git on the web; you'll be challenged with exciting levels, given step-by-step demonstrations of powerful features, and maybe even have a bit of fun along the way.*
564 |
565 | - [Introduction to Git](https://www.datacamp.com/courses/introduction-to-git)
566 | >*Course recommended by Madison Schott*
567 | - [Git Immersion](https://gitimmersion.com/index.html)
568 | > *A guided tour that walks through the fundamentals of Git, inspired by the premise that to know a thing is to do it.*
569 | - [How to Write a Git Commit Message](https://cbea.ms/git-commit/)
570 | > *Commit messages matter. Here's how to write them well.*
571 | - [Learn git concepts, not commands](https://dev.to/unseenwizzard/learn-git-concepts-not-commands-4gjc)
572 | > *An interactive git tutorial meant to teach you how git works, not just which commands to execute.*
573 | - [Flight rules for Git](https://github.com/k88hudson/git-flight-rules)
574 | > *A guide for astronauts (now, programmers using Git) about what to do when things go wrong.*
575 |
576 |
577 |
578 |
579 |
580 | ## Markdown
581 | - [Introduction to Markdown in Visual Studio Code (with Markdown worksheet!)](https://www.youtube.com/watch?v=pTCROLZLhDM)
582 |
583 |
584 |
585 | ## Visual Studio Code
586 | - Shortcuts:
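587 |   - Open Markdown preview: `CMD + K` then `V` (opens it to the side, splitting the screen) OR `CMD + Shift + V` (opens it in a new tab); on Windows/Linux use `Ctrl` instead of `CMD`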
588 |
589 |
590 |
591 |
592 | ## Data Warehouses
593 |
594 |
595 | #### DW | Snowflake
596 |
597 | - [How we configure Snowflake (by dbt-labs)](https://www.getdbt.com/blog/how-we-configure-snowflake/)
598 | >*dbt Labs' standard approach for setting up Snowflake*
599 |
600 |
601 |
602 |
603 | - Useful Code:
604 |
605 | ```sql
606 | -- build a leading-comma column list, in table order, to paste into a select (run it in the database that holds the model)
607 | select concat(', ', lower(column_name)) as column_line, ordinal_position
608 | from information_schema.columns
609 | where lower(table_name) = 'int_users_joined_with_user_details'
610 | order by ordinal_position;
611 |
612 | -- check the timezone parameter(s) for the current session
613 | show parameters like '%timezone%';
614 | ```
615 |
616 |
617 |
618 |
619 |
620 |
621 | ## Blogs
622 | - [Learn Analytics Engineering](https://madisonmae.substack.com/)
623 | - [Locally Optimistic](https://locallyoptimistic.com/)
624 | - [Modern SQL - A Lot has Changed Since SQL-92](https://modern-sql.com/concept/three-valued-logic)
625 | - [Kahan Data Solutions](https://www.kahandatasolutions.com/blog)
626 |
627 |
628 |
629 |
630 |
631 | ## Cool But Not Sure Where it Goes
632 | - [GitHub: Star History Charts](https://star-history.com/#sqlfluff/sqlfluff&Date)
633 | - [Salary Hints](https://www.salaryhints.com/)
634 | >*site by Andrew Chen that crowdsources salary data for data roles*
635 |
636 |
637 |
638 |
639 | ## Non-AE
640 |
641 | - [When Subtraction Adds Value](https://hbr.org/2022/02/when-subtraction-adds-value)
642 | >*interesting take on how to think about subtracting work instead of adding work when it comes to decision making and problem solving.*
643 |
644 | - [Estimation is a Core Competency](https://postlight.com/insights/estimation-is-a-core-competency)
645 | >*Estimating how much time and effort it will take to build or fix software isn’t easy, and most people are really bad at it.*
646 |
647 | - [Confidence building is not romantic. Sorry](https://candacedoby.com/confidence-building-not-romantic/)
648 |
649 | - [Good Interviewer/Bad Interviewer](https://www.metaview.ai/resources/blog/good-interviewer-bad-interviewer)
--------------------------------------------------------------------------------