└── README.md

/README.md:
--------------------------------------------------------------------------------
# Resources for Analytics Engineers
This repository is a curated list of good blog posts and books for Analytics Engineers. It can also be very useful for Data Analysts and Data Scientists.

## Contribute
I really appreciate any contribution. Just make sure to describe the theme and why you found the resource useful.

# Table of Contents
- [SQL](#sql)
- [Python](#python)
- [Infrastructure](#infrastructure)
- [Analytics Skills](#analytics-skills)
- [Data Warehousing](#data-warehousing)
- [Data Pipelines](#data-pipelines)
- [Starting analytics in a company](#starting-analytics-in-a-company)
- [Testing data](#testing-data)
- [Success Stories](#success-stories)
- [Organisation](#organisation)
- [Data Visualisation](#data-visualisation)
- [Marketing and data](#marketing-and-data)
- [Thinking with data](#thinking-with-data)
- [GitHub-GitLab repo to learn from](#github-gitlab-repo-to-learn-from)
- [Against ELT](#against-elt)
- [Other readings lists](#other-readings-lists)
- [Top bloggers/blog](#top-bloggersblog)

# Readings

Definition of the Analytics Engineer: [The Analytics Engineer](https://www.locallyoptimistic.com/post/analytics-engineer/).


### SQL
SQL has a lot of tips and tricks that take time to learn.
* [Mode Analytics SQL Guide](https://mode.com/sql-tutorial/introduction-to-sql/). Very complete; even intermediate users can learn from this series of tutorials.
* [Learning SQL 201: Optimizing Queries, Regardless of Platform](https://towardsdatascience.com/learning-sql-201-optimizing-queries-regardless-of-platform-918a3af9c8b1) by Randy Au. I finally found a complete post on advanced SQL.

### Python
Python is a very broad subject.
Maybe you can follow this list for more [Python focused readings](https://github.com/charlax/python-education).
* [Python for Data Analysis](https://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1491957662). :book: Very comprehensive book about using Python for data work.
* [Pandas Cheatsheet](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf). I use it every day!
* [Modern pandas](https://tomaugspurger.github.io/modern-1-intro.html). A series of blog posts on intermediate/advanced pandas written by one of the maintainers.

### Infrastructure

* [The Startup Founder's Guide to Analytics](https://thinkgrowth.org/the-startup-founders-guide-to-analytics-1d2176f20ac1). An excellent introduction to the stack necessary for analytics and its evolution following the growth of the start-up.
* [The missing layer of Analytics Stack](https://blog.getdbt.com/the-missing-layers-of-the-analytics-stack).
* [Choosing a Data Warehouse](https://discourse.getdbt.com/t/choosing-a-data-warehouse/62/4). A lot of excellent answers on what to choose for your data warehouse.
* [Data science for start-ups](https://bgweber.github.io/intro.html). You can find some useful information in this free book.
* [Designing Data-Intensive Applications](https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321). :book: Fascinating read to learn more about databases, protocols, etc.
* [The Modern Data Stack: Past, Present, and Future](http://blog.getdbt.com/future-of-the-modern-data-stack/). A must-read on the latest innovations in the data stack.

**Comparison of tools by Stephen Levin**
* [Looker vs Tableau vs Mode. Data Visualisation tools compared](https://www.stephenlevin.co/advanced-analytics-part-3-data-visualization/).
* [Segment vs Fivetran vs Stitch: Which Data Ingest Should You Use?](https://www.stephenlevin.co/segment-vs-fivetran-vs-stitch-which-data-ingest-should-you-use/)

### Analytics Skills
* [One analyst's guide for going from good to great](https://blog.getdbt.com/one-analysts-guide-for-going-from-good-to-great/)
* [Succeeding as the first data person in a small company/startup](https://towardsdatascience.com/succeeding-as-a-data-scientist-in-small-companies-startups-92f59e22bd8c). A must-read for anyone working in data, even in a big company.
* [Prioritizing data science work](https://towardsdatascience.com/prioritizing-data-science-work-936b3765fd45). Too many engineers like building ivory towers. Make sure you don't fall into the trap.

### Data Warehousing

* [The beginner guide to data engineering series](https://medium.com/@rchang/a-beginners-guide-to-data-engineering-part-i-4227c5c457d7). Start here if you don't know what a star schema or Airflow is, or if you want some basic practices for writing data pipelines.
* [Best practices for data modeling](https://www.stitchdata.com/blog/best-practices-for-data-modeling/). A lot of practical tips on naming, grain, permissions and materialization.
* [The Data Warehouse Toolkit](https://www.amazon.com/Data-Warehouse-Toolkit-Definitive-Dimensional/dp/1118530802/ref=sr_1_1?crid=FV5A2S72XIZO&keywords=data+warehouse+toolkit&qid=1566644628&s=gateway&sprefix=data+ware%2Caps%2C213&sr=8-1) by Ralph Kimball. :book: A classic in Business Intelligence. Some chapters can be gold for modeling your data warehouse.
* [Functional Data Engineering — a modern paradigm for batch data processing](https://medium.com/@maximebeauchemin/functional-data-engineering-a-modern-paradigm-for-batch-data-processing-2327ec32c42a). You will learn the spirit behind good data pipelines and a well-designed data warehouse.
* [The rise of the Data Engineer](https://medium.com/free-code-camp/the-rise-of-the-data-engineer-91be18f1e603). Explains recent evolutions of the job and data practices.
* [Five principles that will keep your data warehouse organized](https://blog.getdbt.com/five-principles-that-will-keep-your-data-warehouse-organized/)
* [Using Postgres as a data warehouse](https://www.narrator.ai/blog/using-postgresql-as-a-data-warehouse/). I wish I had read this post earlier. So much wisdom on Postgres.
* [For Data Warehouse Performance, One Big Table or Star Schema?](https://fivetran.com/blog/obt-star-schema). A discussion of an alternative to the star schema.

### Data Pipelines

* [Functional Data Engineering — a modern paradigm for batch data processing](https://medium.com/@maximebeauchemin/functional-data-engineering-a-modern-paradigm-for-batch-data-processing-2327ec32c42a). You will learn the spirit behind good data pipelines and a well-designed data warehouse.
* [Maintainable ETL: Tips for Making Your Pipelines Easier to Support and Extend](https://multithreaded.stitchfix.com/blog/2019/05/21/maintainable-etls/). Best practices for writing good ETL.
* [The Data Warehouse ETL Toolkit](https://www.amazon.com/gp/product/0764567578?ie=UTF8&tag=decworks-20&lin%20kCode=xm2&camp=1789&creativeASIN=0764567578). :book: Once again, a very dense book, but you can find good ideas in it.

### Starting analytics in a company
* [Building a data practice from scratch](https://www.locallyoptimistic.com/post/building-a-data-practice/). Very useful for your first weeks as a data person.
* [The Startup Founder's Guide to Analytics](https://thinkgrowth.org/the-startup-founders-guide-to-analytics-1d2176f20ac1). An excellent introduction to the stack necessary for analytics and its evolution following the growth of the start-up.


### Testing data
* [Automated Testing In The Modern Data Warehouse](https://medium.com/@josh.temple/automated-testing-in-the-modern-data-warehouse-d5a251a866af). Practical advice on testing data. Useful for everyone building data pipelines. It is rare to find a post that deals with such an unglamorous part of data work.


### Success Stories
* [Scaling analytics at Wish](https://medium.com/wish-engineering/scaling-analytics-at-wish-619eacb97d16)
* [Building Analytics at 500px](https://medium.com/@samson_hu/building-analytics-at-500px-92e9a7005c83)

### Organisation
* [Engineers shouldn't write ETL](https://multithreaded.stitchfix.com/blog/2016/03/16/engineers-shouldnt-write-etl/). It's more data science focused but it's a classic.
* [Does my startup data team need a data engineer?](https://blog.getdbt.com/does-my-startup-data-team-need-a-data-engineer-/)

### Marketing and data
* [Data Driven Marketing](https://www.amazon.com/Data-Driven-Marketing-Metrics-Everyone-Should/dp/0470504544/ref=sr_1_1?crid=38ZUOKHZZEY6D&keywords=data+driven+marketing&qid=1566644698&s=gateway&sprefix=data+driven%2Caps%2C209&sr=8-1). :book: Reading some chapters can help you think like a marketer with a data-driven approach. It's a gem. I haven't found this kind of insight elsewhere.

### Thinking with data
These books/articles helped me to think better when analysing data.

* [Common Data Mistakes to Avoid](https://www.geckoboard.com/learn/data-literacy/statistical-fallacies/). Excellent summary of the most common fallacies when analyzing data. Very clear and well-explained.
* [Thinking fast and slow](https://www.amazon.com/dp/0374533555/ref=cm_sw_em_r_mt_dp_U_wOryDb6WC3CVE). Learning about bias can be super useful. For instance, I didn't have the reflex to think of a base rate whenever I saw a figure.
* [Fooled by randomness](https://www.amazon.com/Fooled-Randomness-Hidden-Markets-Incerto/dp/0812975219/ref=sr_1_1?crid=2QEXPWM35W0BR&keywords=fooled+by+randomness&qid=1566644880&s=books&sprefix=foole%2Cstripbooks-intl-ship%2C207&sr=1-1). :book: Nassim Taleb taught me so much, both professionally and personally. In Fooled By Randomness, you will learn about major pitfalls when dealing with data in **real life**.
* [Why you should care about the Nate Silver vs. Nassim Taleb Twitter war](https://towardsdatascience.com/why-you-should-care-about-the-nate-silver-vs-nassim-taleb-twitter-war-a581dce1f5fc). Great chess players learn from high-Elo games. Great data people learn from debates between data experts.
* [Five books every data scientist should read that are not about data science](https://towardsdatascience.com/five-books-every-data-scientist-should-read-that-are-not-about-data-science-f7335fb1f84f). I have not read them all yet, but these suggestions seem judicious.


### Data Visualisation
* [Fundamentals of Data Visualisation](https://serialmentor.com/dataviz/). Complete guide to visualisation. Free version online.

### GitHub-GitLab repo to learn from
I found that reading code helps you learn best practices, whether in Python or SQL.

In Python, reading some taps from [Singer](https://github.com/singer-io) can teach you a lot.

In dbt/SQL, I like to browse [a repo open-sourced by GitLab](https://gitlab.com/gitlab-data/analytics/-/tree/master/transform/snowflake-dbt).


### Against ELT
The concept of analytics engineering is tightly coupled with the ELT view of data warehousing. It is interesting to learn from the people who would prefer ETL.
[Reddit comments on Snowflake's super-expensive cost](https://www.reddit.com/r/dataengineering/comments/is39id/snowflake_cost_analysis/)


### Other readings lists

The GitLab data team also made an [excellent list](https://about.gitlab.com/handbook/business-ops/data-team/#data-learning-and-resources) (close to mine).

[Analytics Dispatch](https://mode.com/analytics-dispatch) by Mode Analytics. Very comprehensive.

I really love [Readings in Applied Data Science](https://github.com/hadley/stats337#readings) for a more data science focused view.

Knowing more about programming is a huge asset. For instance, the [Professional Programming list](https://github.com/charlax/professional-programming) is quite complete.


# Top bloggers/blog
* [Randy Au](https://towardsdatascience.com/@Randy_Au). You can read almost all of his posts; they are all very relevant for analytics engineers.
* [Locally Optimistic](https://www.locallyoptimistic.com/). A blog dedicated to data in organizations.
* [Tristan Handy](https://medium.com/@jthandy). I also love his newsletter: [Data Science Roundup](http://roundup.fishtownanalytics.com/).
* [dbt blog](https://blog.getdbt.com/). 90% of the articles are close to must-reads.
* [Ken Farmer](https://www.reddit.com/user/kenfar/?sort=top&t=year). It is healthy to read from those who still prefer the ETL stack.
* [Holistics.io](https://www.holistics.io/blog/). About the contemporary practice of business intelligence.

# Where is the community?
* Twitter
* [Locally Optimistic](https://www.locallyoptimistic.com/)
* [Reddit data engineering](https://www.reddit.com/r/dataengineering/). The ETL, Business Intelligence and Data Science channels are also good.

--------------------------------------------------------------------------------