└── README.md

/README.md:
--------------------------------------------------------------------------------
# How To Become a Data Engineer

### Useful articles
- [The AI Hierarchy of Needs](https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007)
- [The Rise of the Data Engineer](https://medium.freecodecamp.org/the-rise-of-the-data-engineer-91be18f1e603)
- [The Downfall of the Data Engineer](https://medium.com/@maximebeauchemin/the-downfall-of-the-data-engineer-5bfb701e5d6b)
- A Beginner’s Guide to Data Engineering
  - [Part I](https://medium.com/@rchang/a-beginners-guide-to-data-engineering-part-i-4227c5c457d7)
  - [Part II](https://medium.com/@rchang/a-beginners-guide-to-data-engineering-part-ii-47c4e7cbda71?source=---------5------------------)
  - [Part III](https://medium.com/@rchang/a-beginners-guide-to-data-engineering-the-series-finale-2cc92ff14b0?source=---------4------------------)
- [Functional Data Engineering — a modern paradigm for batch data processing](https://medium.com/@maximebeauchemin/functional-data-engineering-a-modern-paradigm-for-batch-data-processing-2327ec32c42a)
- [How to become a Data Engineer (in Russian)](https://khashtamov.com/ru/data-engineer/)

### Algorithms & Data Structures
- [Algorithmic Toolbox](https://stepik.org/course/217) in Russian
- [Data Structures](https://stepik.org/course/1547) in Russian
- [Data Structures & Algorithms Specialization](https://www.coursera.org/specializations/data-structures-algorithms) on Coursera
- [Algorithms Specialization](https://www.coursera.org/specializations/algorithms) from Stanford on Coursera

### SQL
- [Comprehensive SQL Tutorial](https://mode.com/sql-tutorial/introduction-to-sql/) by Mode Analytics
- [SQL Practice](https://leetcode.com/problemset/database/) on LeetCode
- [Modern SQL](https://modern-sql.com/), a website about modern SQL syntax

### Programming
- [Scala School](https://twitter.github.io/scala_school/) by Twitter
- [Fluent Python](https://www.amazon.com/gp/product/1491946008/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1491946008&linkCode=as2&tag=adilkhash-20&linkId=8a663e966770c24874e323133cc7a005), an intermediate-level book about Python
- [Intro to Scala](https://stepik.org/course/16243) in Russian on Stepik by Tinkoff Bank
- [The Hitchhiker’s Guide to Python](https://docs.python-guide.org/) by Kenneth Reitz & Tanya Schlusser

### Databases
- [Intro to Database Systems](https://www.youtube.com/playlist?list=PLSE8ODhjZXjYutVzTeAds8xUt1rcmyT7x) by Carnegie Mellon University
- [Advanced Database Systems](https://www.youtube.com/playlist?list=PLSE8ODhjZXja7K1hjZ01UTVDnGQdx5v5U) by Carnegie Mellon University
- On Disk IO
  - I. [Flavors of IO](https://medium.com/databasss/on-disk-io-part-1-flavours-of-io-8e1ace1de017)
  - II. [More Flavours of IO](https://medium.com/databasss/on-disk-io-part-2-more-flavours-of-io-c945db3edb13)
  - III. [LSM Trees](https://medium.com/databasss/on-disk-io-part-3-lsm-trees-8b2da218496f)
  - IV. [B-Trees and RUM Conjecture](https://medium.com/databasss/on-disk-storage-part-4-b-trees-30791060741)
  - V. [Access Patterns in LSM Trees](https://medium.com/databasss/on-disk-io-access-patterns-in-lsm-trees-2ba8dffc05f9)

### Distributed Systems
- [Distributed systems for fun and profit](http://book.mixu.net/distsys/) by Mikito Takada
- [Distributed Systems](https://www.amazon.com/gp/product/1543057381/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1543057381&linkCode=as2&tag=adilkhash-20&linkId=721aedeb23c313bc46a92c134c5baafa) by Maarten van Steen & Andrew S. Tanenbaum
- [CS 436: Distributed Computer Systems](https://www.youtube.com/watch?v=w8KFPWkK0bI&list=PLawkBQ15NDEkDJ5IyLIJUTZ1rRM9YQq6N&index=2) by University of Waterloo

### Books
- [Designing Data-Intensive Applications](https://www.amazon.com/gp/product/1449373321/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1449373321&linkCode=as2&tag=adilkhash-20&linkId=e7e0e096aa5761066245eb90965ac849) by Martin Kleppmann
- [Introduction to Algorithms](https://www.amazon.com/gp/product/0262033844/ref=as_li_tl?ie=UTF8&camp=1789&creative=9325&creativeASIN=0262033844&linkCode=as2&tag=adilkhash-20&linkId=74742875db503b1a899ca35159749067) by Thomas Cormen
- [The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling](https://www.amazon.com/gp/product/1118530802/ref=as_li_tl?ie=UTF8&tag=adilkhash-20&camp=1789&creative=9325&linkCode=as2&creativeASIN=1118530802&linkId=6ca865e8e9817dca57718bdbe5e52cd5)
- [Star Schema: The Complete Reference](https://www.amazon.com/gp/product/0071744320/ref=as_li_tl?ie=UTF8&tag=adilkhash-20&camp=1789&creative=9325&linkCode=as2&creativeASIN=0071744320&linkId=2abf9ef1d327071f74f59c3659ed6223)
- [The Data Engineering Cookbook](https://github.com/andkret/Cookbook/) by Andreas Kretz

### Courses
- [Big Data for Data Engineers Specialization](https://www.coursera.org/specializations/big-data-engineering) by Yandex
- [Data Engineering on Google Cloud Platform Specialization](https://www.coursera.org/specializations/gcp-data-machine-learning) by Google
- [Data Engineer Nanodegree](https://udacity.com/course/data-engineer-nanodegree--nd027) by Udacity

### Blogs
- [Martin Kleppmann](https://martin.kleppmann.com/), author of Designing Data-Intensive Applications
- [BaseDS](https://medium.com/baseds) by Vaidehi Joshi about Distributed Systems

### Tools
- [Apache Airflow](https://airflow.apache.org/) is a platform to programmatically author, schedule and monitor workflows in Python
- [Apache Spark](https://spark.apache.org/) is a unified analytics engine for large-scale data processing
- [Apache Kafka](https://kafka.apache.org/) is a distributed streaming platform
- [Luigi](https://luigi.readthedocs.io) is a Python package that helps you build complex pipelines of batch jobs.

### Cloud Platforms
- [Amazon Web Services](https://aws.amazon.com/)
- [Google Cloud Platform](https://cloud.google.com/gcp/)
- [Microsoft Azure](https://azure.microsoft.com)
- [Yandex Cloud](https://cloud.yandex.ru/)
- [DigitalOcean](https://m.do.co/c/e92056c9e79b)
- [IBM Cloud](https://www.ibm.com/cloud/)

### Other
- [Data Engineering Podcast](https://www.dataengineeringpodcast.com/)
- [Insight Data Engineering Ecosystem Map](http://xyz.insightdataengineering.com/blog/pipeline_map/)

### Newsletters & Digests
- [Data Eng Weekly](https://dataengweekly.com/) - Your weekly Data Engineering news
- [SF Data Weekly](http://weekly.sfdata.io) - A weekly email of useful links for people interested in building data platforms
- [Data Elixir](https://dataelixir.com/) - Data Elixir is an email newsletter that keeps you on top of the tools and trends in Data Science.
--------------------------------------------------------------------------------