├── .gitignore ├── CONTRIBUTING.md ├── README.md ├── airflow-logo.png ├── check-for-dead-links-config.json └── check-for-dead-links.sh /.gitignore: -------------------------------------------------------------------------------- 1 | .idea 2 | node_modules/ 3 | package-lock.json 4 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contribution Guidelines 2 | 3 | ## Table of Contents 4 | - [Adding to this list](#adding-to-this-list) 5 | - [Creating your own awesome list](#creating-your-own-awesome-list) 6 | - [Adding something to an Awesome list](#adding-something-to-an-awesome-list) 7 | - [Updating your Pull Request](#updating-your-pull-request) 8 | 9 | ## Adding to this list 10 | 11 | Please ensure your pull request adheres to the following guidelines: 12 | 13 | - Search previous suggestions before making a new one, as yours may be a duplicate. 14 | - Make sure the list is useful before submitting. That implies it has enough content and every item has a good succinct description. 15 | - Make an individual pull request for each suggestion. 16 | - Use [title-casing](https://capitalizemytitle.com/) (AP style). 17 | - Use the following format: `[List Name](link)` 18 | - Link additions should be added to the bottom of the relevant category, with the exception of date ordered categories. 19 | - New categories or improvements to the existing categorization are welcome. 20 | - Check your spelling and grammar. 21 | - Make sure your text editor is set to remove trailing whitespace. 22 | - The pull request and commit should have a useful title. 23 | - The body of your commit message should contain a link to the repository. 24 | 25 | Thank you for your suggestions! 26 | 27 | ## Creating your own awesome list 28 | 29 | To create your own list, check out the [instructions](https://github.com/sindresorhus/awesome/blob/master/create-list.md). 30 | 31 | ## Adding something to an awesome list 32 | 33 | If you have something awesome to contribute to an awesome list, this is how you do it. 34 | 35 | You'll need a [GitHub account](https://github.com/join)! 36 | 37 | 1. Access the awesome list's GitHub page. For example: https://github.com/mbasso/awesome-wasm 38 | 2. Click on the `README.md`. 39 | 3. Now click on the edit icon. 40 | 4. You can start editing the text of the file in the in-browser editor. Make sure you follow guidelines above. You can use [GitHub Flavored Markdown](https://help.github.com/articles/github-flavored-markdown/). 41 | 5. Say why you're proposing the changes, and then click on "Propose file change". 42 | 6. Submit the [pull request](https://help.github.com/articles/using-pull-requests/)! 43 | 44 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | [](https://airflow.apache.org/) 2 | # Awesome Apache Airflow 3 | ![contrib badge](https://img.shields.io/github/contributors/jghoman/awesome-apache-airflow.svg) ![GitHub commit activity](https://img.shields.io/github/commit-activity/m/jghoman/awesome-apache-airflow?style=plastic) 4 | 5 | This is a curated list of resources about [Apache Airflow](https://airflow.apache.org/). Please feel free to contribute any items that should be included. Items are generally added at the top of each section so that more fresh items are featured more prominently. 6 | 7 | ## Contents 8 | - [Vital links](#vital-links) 9 | - [Airflow deployment solutions](#airflow-deployment-solutions) 10 | - [Introductions and tutorials](#introductions-and-tutorials) 11 | - [Airflow Summit 2020 Videos](#airflow-summit-2020-videos) 12 | - [Best practices, lessons learned and cool use cases](#best-practices-lessons-learned-and-cool-use-cases) 13 | - [Books, blogs, podcasts, and such](#books-blogs-podcasts-and-such) 14 | - [Slide deck presentations and online videos](#slide-deck-presentations-and-online-videos) 15 | - [Libraries, Hooks, Utilities](#libraries-hooks-utilities) 16 | - [Meetups](#meetups) 17 | - [Commercial Airflow-as-a-service providers](#commercial-airflow-as-a-service-providers) 18 | - [Cloud Composer resources](#cloud-composer-resources) 19 | - [Non-English resources](#non-english-resources) 20 | 21 | ## Vital links 22 | - [Source code](https://github.com/apache/airflow/) (latest stable release [1.10.12](https://github.com/apache/airflow/tree/1.10.12)) 23 | - [Documentation](https://airflow.apache.org/) (also the official website) 24 | - [Confluence page](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Home) 25 | - [![Twitter Follow](https://img.shields.io/twitter/follow/apacheairflow?style=social)](https://twitter.com/ApacheAirflow) 26 | - [Slack workspace](https://apache-airflow-slack.herokuapp.com/) 27 | 28 | ## Airflow deployment solutions 29 | - [Installing Airflow on IBM Cloud](https://github.com/KissConsult/Apache-Airflow) - Quick and easy deployment on IBM Cloud with IBM [Bitnami Charts](https://github.com/bitnami/charts) 30 | - [Three ways to run Airflow on Kubernetes](https://fullstaq.com/blog/three-ways-to-run-airflow-on-kubernetes/) - [Tim van de Keer](https://www.linkedin.com/in/tim-van-de-keer-bb5a1966) walks through several methods for deploying Airflow on Kubernetes. 31 | - [Apache Airflow Multi-Tier Free Deployment on Azure](https://azure.microsoft.com/en-us/blog/bitnami-apache-airflow-multi-tier-now-available-in-azure-marketplace/) - A free Azure Resource Manager (ARM) template by Bitnami providing a one-click solution for Airflow deployment on Azure for production use-cases. 32 | - [KubernetesExecutor Helm Chart](https://github.com/tekn0ir/airflow-chart) - A lean Helm Chart using the KubernetesExecutor for a more k8s native experience and complementary [KubernetesExecutor Docker Image](https://github.com/tekn0ir/airflow-docker). 33 | - [Stable Celery Helm Chart](https://github.com/helm/charts/tree/master/stable/airflow) - Curated Helm Chart in the official stable chart repository. 34 | - [Puckel's Docker Image](https://github.com/puckel/docker-airflow) - [@Puckel_](https://twitter.com/Puckel_)'s well-crafted Docker image has become the base for many Airflow installations. It is regularly updated and closely tracks the official Apache releases. 35 | - [Kubernetes Custom Operator for Deploying Airflow](https://github.com/GoogleCloudPlatform/airflow-operator) - Kubernetes Custom controller (also called operator pattern) for deploying Airflow on Kubernetes. 36 | - [airflow-pipeline](https://github.com/datagovsg/airflow-pipeline) - Airflow Docker container that comes preconfigured for Spark and Hadoop. It can be docker pulled at ``datagovsg/airflow-pipeline``. 37 | - [aws-airflow-stack](https://github.com/villasv/aws-airflow-stack) - An AWS based Airflow cluster deployment with CeleryExecutor. Deploys after a few clicks with CloudFormation. 38 | - [kube-airflow](https://github.com/mumoshu/kube-airflow) - This repository contains both an Airflow Docker image (that appears to have been based on Puckel's work) and Kubernetes service definition. [mumoshu](https://github.com/mumoshu)'s repository has not been recently updated, but there are numerous forks that may be based on more recent releases. 39 | - [airflow-on-kubernetes](https://github.com/rolanddb/airflow-on-kubernetes) - A guide on all relevant resources, scripts and projects that relate to running Airflow on Kubernetes. 40 | - [airflow-k8s-executor-on-GKE](https://github.com/EamonKeane/airflow-GKE-k8sExecutor-helm) - A detailed tutorial to get a scalable, low maintenance airflow kubernetes executor environment deployed on [Google Kubernetes Engine](https://cloud.google.com/kubernetes-engine/) with [helm](https://helm.sh/). 41 | - [airflow-cookbook](https://github.com/bahchis/airflow-cookbook) - Chef cookbook for deploying Airflow. 42 | - [Running Airflow on top of Apache Mesos](http://agrajmangal.in/blog/big-data/running-airflow-on-top-of-apache-mesos/) - Blog describing how to configure [Mesos](http://mesos.apache.org/) to run all of the Airflow componenents. 43 | - [Integrating Apache Airflow with Apache Ambari](https://medium.com/@mykolamykhalov/integrating-apache-airflow-with-apache-ambari-ccab2c90173) - [Mykola Mykhalov](https://www.linkedin.com/in/mykola-mykhalov-9079a8107/) walks through using [Apache Ambari](https://ambari.apache.org/) to configure and deploy an Airflow instance. 44 | - [Astronomer Platform](https://github.com/astronomerio/astronomer) - Apache Airflow as a Service on Kubernetes. For more information visit https://www.astronomer.io. 45 | - [Bitnami Airflow Docker image](https://github.com/bitnami/bitnami-docker-airflow) - A secure and up-to-date docker image for Airflow maintained by Bitnami. 46 | - [Bitnami Airflow Scheduler Docker image](https://github.com/bitnami/bitnami-docker-airflow-scheduler) - A secure and up-to-date docker image for Airflow Scheduler maintained by Bitnami. 47 | - [Bitnami Airflow Worker Docker image](https://github.com/bitnami/bitnami-docker-airflow-worker) - A secure and up-to-date docker image for Airflow Worker maintained by Bitnami. A CeleryExecutor docker-compose deployment is available [here](https://github.com/bitnami/bitnami-docker-airflow-worker/blob/master/docker-compose.yml). 48 | - [Distribute & deploy Apache Airflow via Python PEX files](https://github.com/msumit/airflow-pex) - Example repo with steps to bundle, distribute, & deploy Apache Airflow as PEX files. 49 | - [Introducing KEDA for Airflow](https://www.astronomer.io/blog/the-keda-autoscaler/) - How to use KEDA scaler system to enable autoscaling of celery workers based on data stored in the Airflow metadata database. 50 | - [Airflow-Component](https://github.com/noelmcloughlin/airflow-component#lightweight-federated-apache-airflow-installer) - Lightweight installer of federated Airflow-Airflow (RabbitMQ) reference architectrure on Compute node(s). 51 | 52 | ## Introductions and tutorials 53 | - [Apache Airflow Monitoring Metrics](https://youtu.be/xyeR_uFhnD4) - A two-part series by [maxcotec](https://maxcotec.com) on how you can utilize existing Airflow statsd metrics to monitor your airflow deployment on Grafana dashboard via Prometheus. Also learn how to create custom metrics. 54 | - [Introduction to Airflow](https://www.youtube.com/playlist?list=PLzKRcZrsJN_xcKKyKn18K7sWu5TTtdywh) - A web tutorial series by [maxcotec](https://maxcotec.com) for beginners and intermediate users of Apache Airflow. 55 | - [ETL with Apache Airflow for Data Analysis on Transaction Data](https://github.com/KimaruThagna/ml-pipelines-airflow). [Kimaru Thagana](https://www.linkedin.com/in/kimaru-thagana-4920b5181/) covers a practical case of doing an ETL process using Apache Airflow using a dummy ecommerce store's transactional, user and product data. The data is served via a flask API. 56 | - [Start Building Better Data Pipelines With apache Airflow](https://blog.delaplex.com/start-building-better-data-pipelines-with-apache-airflow) 2020-Oct - Naman Gupta covers the basics of Airflow and its concepts. 57 | - [Airflow Repository Template](https://github.com/soggycactus/airflow-repo-template) - A boilerplate repository for developing locally with Airflow, with linting & tests for valid DAGs and plugins. Just clone and run `make start-airflow` to get started! Add some CI jobs to deploy your code and you're done. 58 | - [How Apache Airflow Distributes Jobs on Celery workers](https://blog.sicara.com/using-airflow-with-celery-workers-54cb5212d405) - A short description of the steps taken by a task instance, from scheduling to success, in a distributed architecture. 59 | - [Remote spark-submit to YARN running on EMR](https://medium.com/@tamizhgeek/remote-spark-submit-toyarn-running-on-emr-9804b89d82d2) - [Azhaguselvan](https://github.com/tamizhgeek) walks through submitting Spark jobs to existing EMR clusters with Airflow. 60 | - [Running Airflow on top of Apache Mesos](http://agrajmangal.in/blog/big-data/running-airflow-on-top-of-apache-mesos/) and its follow-up, [Mesos, Airflow & Docker](http://agrajmangal.in/blog/big-data/mesos-airflow-docker/) by [Agraj Mangal](https://twitter.com/agrajm) is a quick overview of running Airflow atop Apache Mesos. 61 | - [Dustin Stansbury](https://twitter.com/corrcoef) of [Quizlet](https://quizlet.com/) has written a four-part series that covers what workflow managers do in general, how Quizlet picked Airflow, a tour of Airflow's key concepts, and how Quizlet is now using Airflow in practice: 62 | - [Beyond CRON: an introduction to Workflow Management Systems](https://medium.com/@dustinstansbury/beyond-cron-an-introduction-to-workflow-management-systems-19987afcdb5e) 63 | - [Why Quizlet chose Apache Airflow for executing data workflows](https://towardsdatascience.com/why-quizlet-chose-apache-airflow-for-executing-data-workflows-3f97d40e9571) 64 | - [Understanding Apache Airflow’s key concepts](https://medium.com/@dustinstansbury/understanding-apache-airflows-key-concepts-a96efed52b1a) 65 | - [How Quizlet uses Apache Airflow in practice](https://medium.com/@dustinstansbury/how-quizlet-uses-apache-airflow-in-practice-a903cbb5626d) 66 | - [Integrating Apache Airflow with Databricks](https://databricks.com/blog/2017/07/19/integrating-apache-airflow-with-databricks.html) - While this tutorial is focused specifically on Databricks' Spark solutions, it does have a reasonable overview of Airflow basics and demonstrates how a third party solution can quickly integrate into Airflow. 67 | - [Apache Airflow 2.0 Tutorial](https://turbaszek.medium.com/apache-airflow-2-0-tutorial-41329bbf7211) - This article discusses the basic concepts that stand behind Airflow and discusses the problems it solves. 68 | - [Testing and debugging Apache Airflow](https://blog.godatadriven.com/testing-and-debugging-apache-airflow) - Article explaining how to apply unit testing, mocking and debugging to Airflow code. 69 | - [Get started developing workflows with Apache Airflow](http://michal.karzynski.pl/blog/2017/03/19/developing-workflows-with-apache-airflow/) - This brief introductory tutorial covers how to create data pipeline and processing workflow using DAG, operators, Sensor, using Xcoms to communicate between operators. 70 | - [Get started with Airflow + Google Cloud Platform + Docker](https://medium.com/@junjiejiang94/get-started-with-airflow-google-cloud-platform-docker-a21c46e0f797) - Step-by-step introduction by [Jayce Jiang](https://medium.com/@junjiejiang94). 71 | - [How to develop data pipeline in Airflow through TDD (test-driven development)](https://blog.magrathealabs.com/how-to-develop-data-pipeline-in-airflow-through-tdd-test-driven-development-c3333439f358) - Learn how to build a sales data pipeline using TDD step-by-step and in the end how to configure a simple CI workflow using Github Actions. 72 | 73 | 74 | ## Airflow Summit 2020 videos 75 | 76 | *The first [Airflow Summit 2020](https://airflowsummit.org/) was held in July 2020. It was a truly global, 77 | fully online event that was co-hosted by 9 Airflow Meetups from all over the world ([Melbourne](https://www.meetup.com/Melbourne-Apache-Airflow-Meetup/), 78 | [Tokyo](https://www.meetup.com/Tokyo-Apache-Airflow-incubating-Meetup), 79 | [Bangalore](https://www.meetup.com/Bangalore-Apache-Airflow-Meetup/), 80 | [Warsaw](https://www.meetup.com/Warsaw-Airflow-Meetup/), 81 | [Amsterdam](https://www.meetup.com/Amsterdam-Airflow-meetup/), 82 | [London](https://www.meetup.com/London-Apache-Airflow-Meetup/), 83 | [NYC](https://www.meetup.com/NYC-Apache-Airflow-Meetup/), 84 | [BayArea](https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup/)).* 85 | 86 | *It featured 40+ talks and three workshops. You can check out the talk recordings as a YouTube 87 | [Airflow Summit 2020 Playlist](https://www.youtube.com/playlist?list=PLGudixcDaxY3RGLSlWoN_cEEXhIT1OPmj) 88 | or see the individual talks here:* 89 | 90 | - [Keynote: Airflow then and now](https://youtu.be/GB2f7ZhRCho) ![Activity badge](https://img.shields.io/youtube/views/GB2f7ZhRCho) 91 | - [Scheduler as a service - Apache Airflow at EA Digital Platform](https://youtu.be/u00wmcHe8ow) ![Activity badge](https://img.shields.io/youtube/views/u00wmcHe8ow) 92 | - [Keynote: How large companies use Airflow for ML and ETL pipelines](https://youtu.be/428AiCBMZoQ) ![Activity badge](https://img.shields.io/youtube/views/428AiCBMZoQ) 93 | - [Data DAGs with lineage for fun and for profit](https://youtu.be/l_vVxOdvujg) ![Activity badge](https://img.shields.io/youtube/views/l_vVxOdvujg) 94 | - [Airflow on Kubernetes: Containerizing your workflows](https://youtu.be/3VDeKmxHWYA) ![Activity badge](https://img.shields.io/youtube/views/3VDeKmxHWYA) 95 | - [Data flow with Airflow @ PayPal](https://youtu.be/kAtaj_s4f-w) ![Activity badge](https://img.shields.io/youtube/views/kAtaj_s4f-w) 96 | - [Democratised data workflows at scale](https://youtu.be/Cd4-YtHYT9M) ![Activity badge](https://img.shields.io/youtube/views/Cd4-YtHYT9M) 97 | - [Migrating Airflow-based Spark jobs to Kubernetes - the native way](https://youtu.be/i79OsoLUx0k) ![Activity badge](https://img.shields.io/youtube/views/i79OsoLUx0k) 98 | - [Keynote: Future of Airflow](https://youtu.be/YLsGVFB8Pws) ![Activity badge](https://img.shields.io/youtube/views/YLsGVFB8Pws) 99 | - [Run Airflow DAGs in a secure way](https://youtu.be/QhnItssm4yU) ![Activity badge](https://img.shields.io/youtube/views/QhnItssm4yU) 100 | - [Keynote: Making Airflow a sustainable project through D&I](https://youtu.be/wxn9ta13Gbo) ![Activity badge](https://img.shields.io/youtube/views/wxn9ta13Gbo) 101 | - [Airflow CI/CD: Github to Cloud Composer (safely)](https://youtu.be/ZgTf523XM0g) ![Activity badge](https://img.shields.io/youtube/views/ZgTf523XM0g) 102 | - [Advanced Apache Superset for Data Engineers](https://youtu.be/Mhai7sVU244) ![Activity badge](https://img.shields.io/youtube/views/Mhai7sVU244) 103 | - [Demo: Reducing the lines, a visual DAG editor](https://youtu.be/I4nFCqEnOJc) ![Activity badge](https://img.shields.io/youtube/views/I4nFCqEnOJc) 104 | - [AIP-31: Airflow functional DAG definition](https://youtu.be/II4Ip81T3qc) ![Activity badge](https://img.shields.io/youtube/views/II4Ip81T3qc) 105 | - [Autonomous driving with Airflow](https://youtu.be/wEq1FGe6oBY) ![Activity badge](https://img.shields.io/youtube/views/wEq1FGe6oBY) 106 | - [From cron to Airflow on Kubernetes: A startup story](https://youtu.be/giQReCd7jp8) ![Activity badge](https://img.shields.io/youtube/views/giQReCd7jp8) 107 | - [Achieving Airflow Observability](https://youtu.be/Hc4pYAUL6Qs) ![Activity badge](https://img.shields.io/youtube/views/Hc4pYAUL6Qs) 108 | - [Machine Learning with Apache Airflow](https://youtu.be/N_3RQeqySE0) ![Activity badge](https://img.shields.io/youtube/views/N_3RQeqySE0) 109 | - [Airflow: A beast character in the gaming world](https://youtu.be/BH0ut33zp9A) ![Activity badge](https://img.shields.io/youtube/views/BH0ut33zp9A) 110 | - [Effective Cross-DAG dependency](https://youtu.be/p66GcO0LbFQ) ![Activity badge](https://img.shields.io/youtube/views/p66GcO0LbFQ) 111 | - [What open source taught us about business](https://youtu.be/KIEMEYM2PEs) ![Activity badge](https://img.shields.io/youtube/views/KIEMEYM2PEs) 112 | - [Data engineering hierarchy of needs](https://youtu.be/SvnTyDiZOzQ) ![Activity badge](https://img.shields.io/youtube/views/SvnTyDiZOzQ) 113 | - [Building reuseable and trustworthy ELT pipelines (A templated approach)](https://youtu.be/R4bp3_VyJ70) ![Activity badge](https://img.shields.io/youtube/views/R4bp3_VyJ70) 114 | - [Testing Airflow workflows - ensuring your DAGs work before going into production](https://youtu.be/ANJnYbLwLjE) ![Activity badge](https://img.shields.io/youtube/views/ANJnYbLwLjE) 115 | - [Adding an executor to Airflow: A contributor overflow exception](https://youtu.be/RKEmAshcreE) ![Activity badge](https://img.shields.io/youtube/views/RKEmAshcreE) 116 | - [Migration to Airflow backport providers](https://youtu.be/1SSlxAcOEso) ![Activity badge](https://img.shields.io/youtube/views/1SSlxAcOEso) 117 | - [From Zero to Airflow: bootstrapping a ML platform](https://youtu.be/kxyCC1sieok) ![Activity badge](https://img.shields.io/youtube/views/kxyCC1sieok) 118 | - [Airflow the perfect match in our analytics pipeline](https://youtu.be/vsn5kurjHwQ) ![Activity badge](https://img.shields.io/youtube/views/vsn5kurjHwQ) 119 | - [Airflow at Société Générale : An open source orchestration solution in a banking environment](https://youtu.be/VApoz5KCguM) ![Activity badge](https://img.shields.io/youtube/views/VApoz5KCguM) 120 | - [Airflow as the next gen of workflow system at Pinterest](https://youtu.be/KpCPfooD5hM) ![Activity badge](https://img.shields.io/youtube/views/KpCPfooD5hM) 121 | - [Improving Airflow's user experience](https://youtu.be/fe59rUezJ5Q) ![Activity badge](https://img.shields.io/youtube/views/fe59rUezJ5Q) 122 | - [Teaching an old DAG new tricks](https://youtu.be/DHDlD-bMM3c) ![Activity badge](https://img.shields.io/youtube/views/DHDlD-bMM3c) 123 | - [Ask me anything with Airflow members](https://youtu.be/tO6IBDPNAcY) ![Activity badge](https://img.shields.io/youtube/views/tO6IBDPNAcY) 124 | - [Using Airflow to speed up development of data intensive tools](https://youtu.be/GTQU8ff_O_4) ![Activity badge](https://img.shields.io/youtube/views/GTQU8ff_O_4) 125 | - [Pipelines on pipelines: Agile CI/CD workflows for Airflow DAGs](https://youtu.be/tY4F9X5l6dg) ![Activity badge](https://img.shields.io/youtube/views/tY4F9X5l6dg) 126 | - [Production Docker image for Apache Airflow](https://youtu.be/wDr3Y7q2XoI) ![Activity badge](https://img.shields.io/youtube/views/wDr3Y7q2XoI) 127 | - [Airflow as an elastic ETL tool](https://youtu.be/_50-JFCsp3I) ![Activity badge](https://img.shields.io/youtube/views/_50-JFCsp3I) 128 | - [How do we reason about the reliability of our data pipeline in Wrike](https://youtu.be/q9pdAlcMo48) ![Activity badge](https://img.shields.io/youtube/views/q9pdAlcMo48) 129 | - [Achieving Airflow observability with Databand](https://youtu.be/aQIZ_Wdy0lA) ![Activity badge](https://img.shields.io/youtube/views/aQIZ_Wdy0lA) 130 | - [From S3 to BigQuery - How a first-time Airflow user successfully implemented a data pipeline](https://youtu.be/yuqXWClbEM8) ![Activity badge](https://img.shields.io/youtube/views/yuqXWClbEM8) 131 | 132 | 133 | ## Best practices, lessons learned and cool use cases 134 | - [How to Best Use DuckDB with Apache Airflow](https://medium.com/apache-airflow/how-to-best-use-duckdb-with-apache-airflow-63a079160d5d) - Tips on integrating [DuckDB](https://duckdb.org/) into Airflow jobs. 135 | - [Airflow Dag Python Package Management](https://www.youtube.com/watch?v=9pykChPp-X4&t=121s) - Managing python package dependencies across 100+ dags can become painful. It's hard to keep track of which packages are used by which dag, and hard to clean up during DAG removal/upgrade. Learn how KubernetesPodOperator and DockerOperator can fix this. 136 | - [Airflow Dag Management & Versioning](https://youtu.be/a-4yRne3ba4) - Efficently manage DAGs release process by using Git Submodules 137 | - [Testing in Airflow Part 2](https://medium.com/@chandukavar/testing-in-airflow-part-2-integration-tests-and-end-to-end-pipeline-tests-af0555cd1a82) - [Chandu Kavar](https://twitter.com/chandukavar) and [Sarang Shinde](https://www.linkedin.com/in/sarang-shinde-219a4873/) have explained Integration Tests and End-to-End Pipeline Tests. 138 | - [Upgrading & Scaling Airflow at Robinhood](https://robinhood.engineering/upgrading-scaling-airflow-at-robinhood-5b625dfaa2ee) - [Abishek Ray](https://www.linkedin.com/in/abhishek-ray-29210145/) describes how Robinhood tackled upgrading its production Airflow while minimizing downtime. 139 | - [We're all using Airflow wrong and how to fix it](https://medium.com/bluecore-engineering/were-all-using-airflow-wrong-and-how-to-fix-it-a56f14cb0753) - [Jessica Laughlin](https://www.jldlaughlin.com/) of [Bluecore](https://www.bluecore.com/) shares three engineering problems associated with the Airflow design and how to solve them by using the [KubernetesPodOperator](https://github.com/apache/airflow/blob/v1-10-stable/airflow/contrib/operators/kubernetes_pod_operator.py) in two design patterns. 140 | - [Getting started with Data Lineage](https://medium.com/dailymotion/getting-started-with-data-lineage-6307b2b429b3) - [Germain Tanguy](https://www.linkedin.com/in/germain-tanguy/) of [Dailymotion](https://www.dailymotion.com/) shares a data lineage prototype integrated to Apache Airflow. 141 | - [Collaboration between data engineers, data analysts and data scientists](https://medium.com/dailymotion/collaboration-between-data-engineers-data-analysts-and-data-scientists-97c00ab1211f) - [Germain Tanguy](https://www.linkedin.com/in/germain-tanguy/) of [Dailymotion](https://www.dailymotion.com/) shares how to efficiently release in production by collaboration with Apache Airflow. 142 | - [Using Apache Airflow’s Docker Operator with Amazon’s Container Repository](https://www.lucidchart.com/techblog/2019/03/22/using-apache-airflows-docker-operator-with-amazons-container-repository/) - [Brian Campbell](https://www.linkedin.com/in/bvcampbell3) of [Lucid](https://www.lucidchart.com/) has tips for integrating AWS's [ECR](https://aws.amazon.com/ecr/) service with Airflow's DockerOperator. 143 | - [Airflow: Lesser Known Tips, Tricks, and Best Practises](https://medium.com/datareply/airflow-lesser-known-tips-tricks-and-best-practises-cf4d4a90f8f) - [Kaxil Naik](https://www.linkedin.com/in/kaxil/) has explained the lesser-known yet very useful tips and best practises on using Airflow. 144 | - [boundary-layer:Declarative Airflow Workflows](https://codeascraft.com/2018/11/14/boundary-layer%E2%80%89-declarative-airflow-workflows/) - [Kevin McHale](https://www.linkedin.com/in/mchalek) has explained open source project boundary-layer which generates airflow dag with declarative workflows. 145 | - [Testing in Airflow Part 1](https://medium.com/@chandukavar/testing-in-airflow-part-1-dag-validation-tests-dag-definition-tests-and-unit-tests-2aa94970570c) - [Chandu Kavar](https://twitter.com/chandukavar) has explained different categories of tests in Airflow. It includes DAG Validation Tests, DAG Definition Tests, and unit tests. 146 | - [Improving Airflow UI Security](https://wecode.wepay.com/posts/improving-airflow-ui-security) - WePay's [Joy Gao](https://twitter.com/joygao) breaks down the need for Role Based Access Controls (RBAC) and how she introduced it to Airflow. 147 | - [How to Create a Workflow in Apache Airflow to Track Disease Outbreaks in India](https://blog.socialcops.com/engineering/apache-airflow-disease-outbreaks-india/) - [Vinayak Mehta](https://twitter.com/vortex_ape) details how [SocialCops](https://socialcops.com/) uses Airflow to scrape India's Ministry of Health and Family Affairs to generate derived data on possible disease outbreaks. 148 | - [Airflow, Meta Data Engineering, and a Data Platform for the World’s Largest Democracy](https://blog.socialcops.com/technology/engineering/airflow-meta-data-engineering-disha/) - [Vinayak Mehta](https://twitter.com/vortex_ape) talks about identifying data engineering patterns (meta data engineering) to automate DAG generation and how that helped [SocialCops](https://socialcops.com/) to power DISHA, a national data platform where Indian MPs and MLAs monitor the progress of 42 national level schemes. 149 | - [Lessons learnt while Airflow-ing](https://medium.com/@nehiljain/lessons-learnt-while-airflow-ing-32d3b7fc3fbf) and [Airflow Part 2: Lessons learned](https://medium.com/snaptravel/airflow-part-2-lessons-learned-793fa3c0841e) - [Nehil Jain](https://twitter.com/nehiljain) has written a two-part series that covers the value of workflow schedulers, some best practices and pitfalls he found while working with Airflow. The [second article](https://medium.com/snaptravel/airflow-part-2-lessons-learned-793fa3c0841e) in particular includes many production tips. 150 | - [Why Robinhood uses Airflow](https://robinhood.engineering/why-robinhood-uses-airflow-aed13a9a90c8) - [Vineet Goel](https://twitter.com/vineetik) walks through why financial trading platform [Robinhood](https://robinhood.com/) picked Airflow over alternative work schedulers. 151 | - [What we learned migrating off Cron to Airflow](https://medium.com/videoamp/what-we-learned-migrating-off-cron-to-airflow-b391841a0da4) - [Katie Macias](https://medium.com/@katiemacias) describes [VideoAmp](https://www.videoamp.com/)'s Data Engineering's journey from cron to Airflow. 152 | - [Under the Hood: Building AIR at Qubole](https://www.qubole.com/blog/hood-building-air-qubole/) - [Sreenath Kamath](https://www.linkedin.com/in/sreenath-kamath-66a1b970/) and [Rajat Venkatesh](https://twitter.com/vrajat) write about building [Qubole](https://www.qubole.com/)'s [data discovery, insights and recommendations platform](https://www.qubole.com/blog/building-qdsair-infrastructure/) atop Airflow. 153 | - [Airflow: Why is nothing working? - TL;DR Airflow’s SubDagOperator causes deadlocks](https://medium.com/bluecore-engineering/airflow-why-is-nothing-working-f705eb6b7b04) by [Jessica Laughlin](https://twitter.com/thepressofjess) - Deep dive into troubleshooting a troublesome Airflow DAG with good tips on how to diagnosis problems. 154 | - [Apache Airflow as an External scheduler for distributed systems](https://medium.com/@rako/apache-airflow-as-an-external-scheduler-for-distributed-systems-53b7354d3e48) - [Arunkumar](https://medium.com/@rako) suggests using Airflow as a simple external scheduler for a distributed system. 155 | - [How Sift Trains Thousands of Models using Apache Airflow](https://engineering.siftscience.com/sift-trains-thousands-models-using-apache-airflow/) - Summary of [Sift Science](https://siftscience.com/)'s deployment strategy for its machine learning model pipelines. 156 | - [Apache Airflow at Pandora](https://engineering.pandora.com/apache-airflow-at-pandora-1d7a844d68ee) - [Ace Haidrey](https://www.linkedin.com/in/acehaidrey/) discusses why Pandora chose Airflow and provides a detailed breakdown of their deployment and the infrastructure behind it. 157 | - [Airflow Lessons from the Data Engineering Front in Chicago](https://medium.com/stanton-ventures-insights/airflow-lessons-from-the-data-engineering-front-in-chicago-9489e6ad5c3d) - [Alison Stanton](https://twitter.com/alison985) provides a list of tips to avoid gotchas in Airflow jobs. 158 | - [Data’s Inferno: 7 Circles of Data Testing Hell with Airflow](https://medium.com/wbaa/datas-inferno-7-circles-of-data-testing-hell-with-airflow-cef4adff58d8) - The Wholesale Banking Advanced Analytics team at ING details how they torture test their Airflow DAGs before deployment. 159 | - [Data Testing with Airflow repository](https://github.com/danielvdende/data-testing-with-airflow) 160 | - [Data quality checkers](https://drivy.engineering/data-quality/) - [Antoine Augusti](https://twitter.com/AntoineAugusti) describes the framework [drivy](https://www.drivy.co.uk/) has built atop Airflow to test their datasets for completeness, consistency, timeliness, uniquess, validity and accuracy. 161 | - [Building WePay's data warehouse using BigQuery and Airflow](https://wecode.wepay.com/posts/wepays-data-warehouse-bigquery-airflow) - The inestimable [Chris Riccomini](https://twitter.com/criccomini) describes how [WePay](https://go.wepay.com/), one of the first adopters of Airflow, integrated into their [Google Cloud Compute](https://cloud.google.com/compute/) environment. 162 | - [Using Apache Airflow to Create Data Infrastructure in the Public Sector](https://www.astronomer.io/blog/using-apache-airflow-to-create-data-infrastructure/) - Despite an unfortunately very heavy sales pitch tone, this article blog post describes how [ARGO Labs](http://www.argolabs.org/), a non-profit data organization, utilizes Airflow for ETLing in public sector data. 163 | - [ETL with airflow](https://gtoonstra.github.io/etl-with-airflow/) - ETL core principles and several end-to-end docker-based examples including Kimball, Data Vault on Hive and some simpler examples. 164 | - [How to aggregate data for BigQuery using Apache Airflow](https://cloud.google.com/blog/big-data/2017/07/how-to-aggregate-data-for-bigquery-using-apache-airflow) - Example of how to use Airflow with Google BigQuery to power a Data Studio dashboard. 165 | - [Productionizing ML with workflows at Twitter](https://blog.twitter.com/engineering/en_us/topics/insights/2018/ml-workflows.html) - In depth post on why and how Twitter use Airflow for ML workflows including including custom operators and a custom UI embedded in in the Airflow web interface. 166 | - [Running Apache Airflow At Lyft](https://eng.lyft.com/running-apache-airflow-at-lyft-6e53bb8fccff) - This provides an overview on how Lyft operates Apache Airflow in production(monitoring, customization, etc). 167 | - [Deploying Apache Airflow in Azure to build and run data pipelines](https://azure.microsoft.com/sv-se/blog/deploying-apache-airflow-in-azure-to-build-and-run-data-pipelines/) - It talks about running Airflow on Azure. 168 | - [The Zen of Python and Apache Airflow](https://blog.godatadriven.com/zen-of-python-and-apache-airflow) - Blog post about how the Zen of Python can be applied to Airflow code. 169 | - [Securing Apache Airflow UI WITH DAG Level Access](https://eng.lyft.com/securing-apache-airflow-ui-with-dag-level-access-a7bc649a2821) - Blog post about Airflow DAG level access and how Lyft uses it. 170 | - [Upgrading Airflow with Zero Downtime](https://medium.com/flatiron-engineering/upgrading-airflow-with-zero-downtime-8df303760c96) - A detailed article on how to deploy Airflow with zero downtime. 171 | - [Building a Production-Level ETL Pipeline Platform Using Apache Airflow](https://towardsdatascience.com/building-a-production-level-etl-pipeline-platform-using-apache-airflow-a4cf34203fbd) - This post describes how the system management team at Cerner uses Airflow. 172 | - [Bare minimal Airflow on Kubernetes (Local, EKS, AKS)](https://github.com/stwind/airflow-on-kubernetes) - An article on deploying Airflow on local Kubernetes, AWS EKS and Azure AKS with bare minimal setup. 173 | - [Breaking up the Airflow DAG monorepo](https://tech.scribd.com/blog/2020/breaking-up-the-dag-repo.html) - This post describes how to support managing Airflow DAGs from multiple git repos through S3. 174 | - [Improving Performance of Apache Airflow Scheduler](https://medium.com/databand-ai/improving-performance-of-apache-airflow-scheduler-507f4cb6462a) - A story of an adventure that allowed [Databand](https://databand.ai/) to speed up DAG parsing time 10 times 175 | - [How SSENSE is using Apache Airflow to do Data Lineage on AWS](https://medium.com/ssense-tech/principled-data-engineering-part-ii-data-governance-30297abb2446) - Exploring the fundamental themes of architecting and governing a data lake on AWS using Apache Arflow. 176 | - [Monitoring Airflow with Prometheus, StatsD and Grafana](https://databand.ai/blog/everyday-data-engineering-monitoring-airflow-with-prometheus-statsd-and-grafana/) - A guide on how to setup operational dashboards to production cluster by [Databand](http://databand.ai) and get high level visibility on Airflow. 177 | - [Complex tasks orchestration at Hurb with Apache Airflow](https://medium.com/hurb-engineering/complex-tasks-orchestration-at-hurb-with-apache-airflow-dcb423c4dee6) - This post shows how [Hurb](https://hurb.com) uses Apache Airflow to orchestrate complex tasks and how it leverages DAG dynamic creation to improve development speed. 178 | - [Automating data export from CrateDB to S3 with Apache Airflow](https://community.crate.io/t/cratedb-and-apache-airflow-part-one/901) A tutorial on how to automate recurrent queries in [CrateDB](https://crate.io/) with Apache Airflow, such as periodic data export to Amazon S3. 179 | - [Implementation of Data Retention Policy with CrateDB and Apache Airflow](https://community.crate.io/t/cratedb-and-apache-airflow-implementation-of-data-retention-policy/913) A step by step tutorial on how to implement effective data retention policy with [CrateDB](https://crate.io/) and Apache Airflow. 180 | - [Ingesting NYC Taxi Data From S3 Into CrateDB](https://community.crate.io/t/cratedb-and-apache-airflow-building-a-data-ingestion-pipeline/926) - Describes how to build a database ingestion pipeline in Airflow by loading CSV files from S3 into [CrateDB](https://crate.io/). 181 | 182 | ## Books, blogs, podcasts, and such 183 | - [Data Pipelines with Apache Airflow](https://www.manning.com/books/data-pipelines-with-apache-airflow) - A Manning book (Early Access September 2019) on Airflow. 184 | - [The Airflow Podcast](https://soundcloud.com/the-airflow-podcast) - A semiregular podcast discussing all things Airflow. 185 | - [Maxime Beauchemin](https://medium.com/@maximebeauchemin) - Maxime's blog on medium that gives insight into the philosophy behind Apache Airflow. 186 | - [Robert Chang](https://medium.com/@rchang) - Blog posts about data engineering with Apache Airflow, explains why and has examples in code. 187 | - [Handling Airflow logs with Kubernetes Executor](https://szeevs.medium.com/handling-airflow-logs-with-kubernetes-executor-25c11ea831e4) - A blogpost that outlines how you can set up remote S3 logging when using KubernetesExecutor, without creating complex infrastructure. 188 | - [Airflow 2.0: DAG Authoring Redesigned](https://turbaszek.medium.com/airflow-2-0-dag-authoring-redesigned-651edc397178) - Blog post about new ways of writing DAGs in Airflow 2.0. 189 | - [Airflow 2.0 Providers](https://higrys.medium.com/airflow-2-0-providers-1bd21ba3bd93) - Blog post about providers packages in Airflow 2.0. 190 | 191 | ## Slide deck presentations and online videos 192 | - 2020-Feb: [Apache Airflow @ Umuzi.org](https://www.youtube.com/watch?IAmWKZDmvek) ![Activity badge](https://img.shields.io/youtube/views/IAmWKZDmvek) - [Sheena O'Connell](https://twitter.com/sheena_oconnell) discusses how South Africa-based tech bootcamp [Umuzi](https://www.umuzi.org/) uses Airflow. 193 | - [Apache Airflow YouTube tutorials](https://www.youtube.com/playlist?list=PL79i7SgJCJ9hu5GqcA091h6zuewmsvSyy) - [Marc Lamberti](https://twitter.com/marclambertiml) has created a series of YouTube tutorials covering many aspects of Airflow concepts, configuration and deployment. 194 | - [Advanced Data Engineering Patterns with Apache Airflow](https://www.youtube.com/watch?23_1WlxGGM4) ![Activity badge](https://img.shields.io/youtube/views/23_1WlxGGM4) - Video of [Maxime Beauchemin](https://medium.com/@maximebeauchemin)'s talk that briefly introduces Airflow and then goes into more advanced use cases, including self-servive SQL queries, building A/B testing metrics frameworks and machine learning feature extraction all via Airflow. The slides are available separately [here](https://prezi.com/p/adxlaplcwzho/advanced-data-engineering-patterns-with-apache-airflow/). 195 | - [Modern Data Pipelines with Apache Airflow](https://blog.tedmiston.com/momentum-2018-airflow-talk/) - A talk given by [Taylor Edmiston](https://twitter.com/kicksopenminds) and [Andy Cooper](https://twitter.com/andscoop) from Astronomer.io at Momentum Dev Con 2018 on getting started with Airflow, custom components, example DAGs, and the Astronomer Airflow CLI. 196 | - [Building Better Data Pipelines using Apache Airflow](https://www.slideshare.net/r39132/building-better-data-pipelines-using-apache-airflow-94060954) - Slides from [Sid Anand](https://twitter.com/r39132)'s talk at QCon 18 with a thorough overview of Airflow and its architecture. 197 | - [Airflow and Spark Streaming at Astronomer](https://paper.dropbox.com/doc/Airflow-Spark-talk-v2.0-5own4Nlz8MhdwKQ8QhIqj?_tk=share_copylink) - How Astronomer uses dynamic DAGs to run Spark Streaming jobs with Airflow. 198 | - [Apache Airflow in the Cloud: Programmatically orchestrating workloads with Python](https://www.slideshare.net/kaxil/apache-airflow-in-the-cloud-programmatically-orchestrating-workloads-with-python-pydata-london-2018-95391267) - Slides from [Kaxil Naik](https://twitter.com/kaxil)'s & [Satyasheel](https://twitter.com/ss6012) talk at PyData London 18 introducing the basics of Airflow and how to orchestrate workloads on Google Cloud Platform (GCP). 199 | - [Developing elegant workflows in Python code with Apache Airflow](https://eventil.com/presentations/j2sK9R) - [Michał Karzyński](https://twitter.com/postrational) at [Europython](https://ep2018.europython.eu/) gives a brief introduction to Airflow concepts including the role of workflow managers, DAGs and operators. Link includes both video and slides. 200 | - [Data Pipeline Management](https://www.youtube.com/watch?mjn1LAZ_Y38) ![Activity badge](https://img.shields.io/youtube/views/mjn1LAZ_Y38) - [Ben Goldberg](https://www.linkedin.com/in/benjamin-goldberg-50247169/) walks the Chicago Kubernetes Meetup through how [SpotHero](https://spothero.com/) uses Airflow. Additionally, Ben has a very [complete slidedeck](https://docs.google.com/presentation/d/1hc12cFs5TmEajLwYNASLwz_C17Q5tyd__6oXzI16A9A/edit#slide=id.g320d39a12c_0_1017) of how Airflow plays within Kubernetes. 201 | - [How I learned to time travel, or, data pipelining and scheduling with Airflow](https://www.slideshare.net/LauraLorenz4/how-i-learned-to-time-travel-or-data-pipelining-and-scheduling-with-airflow) - Comprehensive deck by [Laura Lorenz](https://twitter.com/lalorenz6) for why Airflow is necessary and how [Industry Dive](https://www.industrydive.com/) uses it. 202 | - [Introduction to Apache Airflow - Data Day Seattle 2016](https://www.slideshare.net/r39132/introduction-to-apache-airflow-data-day-seattle-2016) - [Sid Anand](https://twitter.com/r39132) gives a thorough introduction to Airflow and how it was used at [Agari](https://www.agari.com/). 203 | - [Operating Data Pipeline With Airflow - Airflow Meetup April-2018](https://speakerdeck.com/vananth22/operating-data-pipeline-with-airflow-at-slack) - [Ananth Packkildurai](https://twitter.com/ananthdurai) talks about scaling airflow Local Executor and best practices to operate data pipeline at [Slack](https://slack.com/). 204 | - [Apache Airflow at WePay](https://wepayinc.app.box.com/s/hf1chwmthuet29ux2a83f5quc8o5q18k) - [Chris Riccomini](https://twitter.com/criccomini) discusses why WePay chose Airflow and provides a detailed breakdown of their deployment and the infrastructure behind it. 205 | - [Elegant data pipelining with Apache Airflow](https://www.youtube.com/watch?v=neuh_2_zrt8) - Talks from [Bolke de Bruin](https://twitter.com/bolke2028) and [Fokko Driesprong](https://twitter.com/fokkodriesprong) at PyData Amsterdam 2018 about methodologies that provide clarity in ETL using Airflow. 206 | - [Airflow @ Lyft](https://www.slideshare.net/taofung/airflow-at-lyft) - Talks from [Tao Feng](https://github.com/feng-tao) at SF big data analytics meetup about how Lyft monitors running Airflow in production. 207 | - [Manageable data pipelines with Airflow and Kubernetes](https://youtu.be/bcF24AdG1o4) ![Activity badge](https://img.shields.io/youtube/views/bcF24AdG1o4)- Talk by [Jarek Potiuk](https://github.com/potiuk) and [Szymon Przedwojski](https://github.com/sprzedwojski). A introductory talk on Airflow from GDG Warsaw DevFest 2018. 208 | - [Migrating Apache Oozie Workflows to Apache Airflow](https://www.youtube.com/watch?8L1F-6t_6Ao) ![Activity badge](https://img.shields.io/youtube/views/8L1F-6t_6Ao) - Talk from [Szymon Przedwojski](https://github.com/sprzedwojski) from Airflow Bay Area Meetup June 2018 about Oozie-to-Airflow migration tool. 209 | - [Building data lakes with Apache Airflow](https://www.youtube.com/watch?MM8tfTrcnfk) ![Activity badge](https://img.shields.io/youtube/views/MM8tfTrcnfk) - Talk by [Bas Harenslak](https://github.com/BasPH) and [Julian de Ruiter](https://github.com/jrderuiter) at the Amsterdam Apache Airflow September 2018 meetup about building data lakes with Apache Airflow as the spider in the web managing all data flows. 210 | - [First Warsaw Apache Airflow Meetup](https://youtu.be/Nr4Pp1SNXeU) ![Activity badge](https://img.shields.io/youtube/views/Nr4Pp1SNXeU) - Live streamed recording from the first Apache Airflow Meetup in Warsaw in October 2019. 211 | - [What's coming in Apache Airflow 2.0](https://youtu.be/znowFIBK1lk) ![Activity badge](https://img.shields.io/youtube/views/znowFIBK1lk) - joint talk by [Ash Berlin-Taylor](https://github.com/ashb), [Kaxil Naik](https://github.com/kaxil), [Jarek Potiuk](https://github.com/potiuk), [Kamil Breguła](https://github.com/mik-laj), [Daniel Imbermann](https://github.com/dimberman), and [Tomek Urbaszek](https://github.com/turbaszek) at the [Online NYC Meetup, 13th of May 2020](https://www.meetup.com/NYC-Apache-Airflow-Meetup/events/270483933/) 212 | - [Airflow Breeze - Development and Test Environment for Apache Airflow](https://youtu.be/4MCTXq-oF68) ![Activity badge](https://img.shields.io/youtube/views/4MCTXq-oF68) - Screencast showing how to use Breeze environment by [Jarek Potiuk](https://github.com/potiuk). 213 | 214 | ## Libraries, Hooks, Utilities 215 | - [Domino](https://github.com/Tauffer-Consulting/domino) - Domino is an open source Graphical User Interface platform for creating data and Machine Learning workflows (DAGs) with no-code, visually intuitive drag-and-drop actions. It is also a standard for publishing and sharing your Python code so it can be automatically used by anyone, directly in the GUI. 216 | - [Airflow-Helper](https://github.com/xnuinside/airflow-helper) - setting up Airflow Variables, Connections, and Pools from a YAML configuration file. 217 | - [AirFly](https://github.com/ryanchao2012/airfly) - Auto generate Airflow's dag.py on the fly. 218 | - [DEAfrica Airflow](https://github.com/digitalearthafrica/deafrica-airflow) - Airflow libraries used by [Digital Earth Africa](https://digitalearthafrica.org/), an humanitarian effort to utilize satellite imagery of Africa. 219 | - [Airflow plugins](https://github.com/airflow-plugins/) - Central collection of repositories of various plugins for Airflow, including mailchimp, trello, sftp, GitHub, etc. 220 | - [fileflow](https://github.com/industrydive/fileflow) - Collection of modules to support large data transfers between Airflow operators through either local file system or S3. This addresses a gap where data is too large for XCOMs but too small or inconvenient for loading directly in the operator. Built by [Industry Dive](https://www.industrydive.com/). 221 | - [fairflow](https://github.com/michaelosthege/fairflow) - Library to abstract away Airflow's Operators with functional pieces that transform the data from one operator to another. 222 | - [airflow-maintenance-dags](https://github.com/teamclairvoyant/airflow-maintenance-dags) - [Clairvoyant](http://clairvoyantsoft.com/) has a repo of Airflow DAGs that operator on Airflow itself, clearing out various bits of the backing metadata store. 223 | - [test_dags](https://gist.github.com/criccomini/2862667822af7fae8b55682faef029a7) - a more complete solution for DAG integrity tests ([first Circle of Data’s Inferno are the first](https://medium.com/@ingwbaa/datas-inferno-7-circles-of-data-testing-hell-with-airflow-cef4adff58d8). 224 | - [dag-factory](https://github.com/ajbosco/dag-factory) - A library for dynamically generating Apache Airflow DAGs from YAML configuration files. 225 | - [whirl](https://github.com/godatadriven/whirl) - Fast iterative local development and testing of Apache Airflow workflows. 226 | - [airflow-code-editor](https://github.com/andreax79/airflow-code-editor) - A plugin for Apache Airflow that allows you to edit DAGs in browser. 227 | - [Pylint-Airflow](https://github.com/BasPH/pylint-airflow) - A Pylint plugin for static code analysis on Airflow code. 228 | - [afctl](https://github.com/qubole/afctl) - A CLI tool that includes everything required to create, manage and deploy airflow projects faster and smoother. 229 | - [Dag Dependencies viewer](https://github.com/ms32035/airflow-dag-dependencies) - A plugin which creates a view to visualize dependencies between the Airflow DAGs 230 | - [Airflow ECR Plugin](https://github.com/asandeep/airflow-ecr-plugin) - Plugin to refresh AWS ECR login token at regular intervals. This is helpful where DockerOperator needs to pull images hosted on ECR. 231 | - [AirflowK8sDebugger](https://github.com/Javier162380/AirflowKuberentesDebugger) - A library for generate k8s pod yaml templates from an Airflow dag using the KubernetesPodOperator. 232 | - [Oozie to Airflow](https://github.com/GoogleCloudPlatform/oozie-to-airflow) - A tool to easily convert between [Apache Oozie](http://oozie.apache.org/) workflows and Apache Airflow workflows. 233 | - [Airflow Ditto](https://github.com/angadsingh/airflow-ditto) - An extensible framework to do transformations to an Airflow DAG and convert it into another DAG which is flow-isomorphic with the original DAG, to be able to run it on different environments (e.g. on different clouds, or even different container frameworks - Apache Spark on YARN vs Kubernetes). Comes with out-of-the-box support for EMR-to-HDInsight-DAG transforms. 234 | - [gusty](https://github.com/chriscardillo/gusty) - Create a DAG using any number of YAML, Python, Jupyter Notebook, or R Markdown files that represent individual tasks in the DAG. gusty also configures dependencies, DAGs, and TaskGroups, features support for your local operators, and more. A fully containerized demo is available [here](https://github.com/chriscardillo/gusty-demo). 235 | - [Meltano](https://www.meltano.com) - Open source, self-hosted, CLI-first, debuggable, and extensible ELT tool that embraces [Singer](https://www.singer.io) for extraction and loading, leverages [dbt](https://www.getdbt.com) for transformation, and [integrates with Airflow for orchestration](https://meltano.com/#orchestration). 236 | - [DAG checks](https://github.com/politools/dag-checks) - The dag-checks consist of checks that can help you in maintaining your Apache Airflow instance. 237 | - [Airflow DVC plugin](https://github.com/covid-genomics/airflow-dvc) - Plugin for open-source version-control system for data science and Machine Learning pipelines - [DVC](https://dvc.org/). 238 | - [Airflow Vars](https://github.com/omerzamir/airflow-vars) - A CLI for variables management, created for CD-Pipelines in order to allow robust and safe variables management. 239 | - [airflow-priority](https://github.com/airflow-laminar/airflow-priority) - Priority Tags (P1, P2, etc) for Airflow DAGs with automated alerting to Datadog, New Relic, Slack, Discord, and more 240 | - [airflow-config](https://github.com/airflow-laminar/airflow-config) - [Pydantic](https://pydantic.dev) / [Hydra](https://hydra.cc) based configuration system for DAG and Task arguments 241 | - [airflow-supervisor](https://github.com/airflow-laminar/airflow-supervisor) - Easy-to-use [supervisor](http://supervisord.org) integration for long running or "always on" DAGs 242 | 243 | ## Meetups 244 | - [Amsterdam Apache Airflow Meetup](https://www.meetup.com/Amsterdam-Airflow-meetup) 245 | - [Bangalore Apache Airflow Meetup](https://www.meetup.com/Bangalore-Apache-Airflow-Meetup/) 246 | - [Bay Area Apache Airflow Meetup](https://www.meetup.com/Bay-Area-Apache-Airflow-Incubating-Meetup) 247 | - [London Apache Airflow Meetup](https://www.meetup.com/London-Apache-Airflow-Meetup/) 248 | - [Melbourne Apache Airflow Meetup](https://www.meetup.com/Melbourne-Apache-Airflow-Meetup/) 249 | - [New York City Apache Airflow Meetup](https://www.meetup.com/NYC-Apache-Airflow-Meetup/) 250 | - [Paris Apache Airflow Meetup](https://www.meetup.com/Paris-Apache-Airflow-Meetup/) 251 | - [Portland Apache Airflow Meetup](https://www.meetup.com/Portland-Apache-Airflow-Meetup/) 252 | - [Tokyo Apache Airflow Meetup](https://www.meetup.com/Tokyo-Apache-Airflow-incubating-Meetup/) 253 | - [Warsaw Apache Airflow Meetup](https://www.meetup.com/Warsaw-Airflow-Meetup) 254 | 255 | ## Commercial Airflow-as-a-service providers 256 | - [Google Cloud Composer](https://cloud.google.com/composer/) - Google Cloud Composer is a managed service built atop Google Cloud and Airflow. 257 | - [Qubole](http://docs.qubole.com/en/latest/user-guide/airflow/) - Qubole is mainly known as a service-and-support company for Apache Hive, but also provides Airflow as a component of its platform. 258 | - [Astronomer.io](https://www.astronomer.io/) - Astronomer provides complete ETL lifecycle solutions and appears to be entirely focused on providing Airflow-based products. 259 | - [AWS MWAA](https://aws.amazon.com/managed-workflows-for-apache-airflow/) - Amazon Managed Workflows for Apache Airflow (MWAA) is a managed orchestration service for Apache Airflow that makes it easier to set up and operate end-to-end data pipelines in the cloud at scale. 260 | 261 | ## Cloud Composer resources 262 | 263 | *This section contains articles that apply to [Cloud Composer](https://cloud.google.com/composer) — a service built by Google Cloud based on Apache Airflow. Tricks and solutions are described here that are intended for Cloud Composer, but may be applicable to vanilla Airflow.* 264 | 265 | - [Enabling Autoscaling in Google Cloud Composer](https://medium.com/traveloka-engineering/enabling-autoscaling-in-google-cloud-composer-ac84d3ddd60) - Supercharge your Cloud Composer deployment while saving up some cost during idle periods. 266 | - [Scale your Composer environment together with your business](https://cloud.google.com/blog/products/data-analytics/scale-your-composer-environment-together-your-business) - The Celery Executor architecture and ways to ensure high scheduler performance. 267 | - [pianka.sh](https://github.com/politools/airflow-pianka-sh) - Missing command in the gcloud tool. This tool facilitates some administrative tasks. 268 | - [The Smarter Way of Scaling With Composer’s Airflow Scheduler on GKE](https://medium.com/swlh/the-smarter-way-of-scaling-with-composers-airflow-scheduler-on-gke-88619238c77b) - [Roy Berkowitz](https://www.linkedin.com/in/roy-berkowitz-19922aa9/) discusses more effective use of nodes in the Cloud Composer service. 269 | - [Better together: orchestrating your Data Fusion pipelines with Cloud Composer](https://cloud.google.com/blog/products/data-analytics/easier-management-for-cloud-etl-elt-pipelines) - [Rachael Deacon-Smith](https://www.linkedin.com/in/rachael-deacon-smith-82660172) provides an overview of the operator for Datafusion use case on Cloud Composer. 270 | 271 | ## Non-English resources 272 | - [Airflow Documentation-Chinese](https://airflow.apachecn.org) - (🇨🇳Chinese) [Apachecn](https://github.com/apachecn) has translated the Airflow official documentation. 273 | - [Gestion de Tâches avec Apache Airflow](http://ncrocfer.github.io/posts/gestion-de-taches-avec-apache-airflow/) - (🇫🇷French) [Nicolas Crocfer](https://github.com/ncrocfer) - Overview of Airflow, basic concepts and how to write and trigger a DAG. 274 | - [Airflowはすごいぞ!100行未満で本格的なデータパイプライン](https://qiita.com/hankehly/items/1f02a34740276d1b8f0f) - (🇯🇵Japanese) [Hank Ehly](https://github.com/hankehly) gives a comprehensive introduction to Airflow's main concepts, and demonstrates how to create a data pipeline in less than 100 lines of code. 275 | - [apache airflow 複数worker構成のalpine版docker imageを作った](https://sekailab.com/wp/2018/04/05/apache-airflow-multinode-alpine-docker-image/) - (🇯🇵Japanese) [Akio Ohta](https://github.com/Drunkar) walks through his [Docker image](https://hub.docker.com/r/drunkar/airflow-alpine/) for deploying an Alpine-based Airflow system. 276 | - [AirflowのタスクログをS3に保存する方法](https://qiita.com/hankehly/items/5e2493cdf2c95ae42449) - (🇯🇵Japanese) [Hank Ehly](https://github.com/hankehly) shows step-by-step how to configure sending task logs to AWS S3. 277 | - [【徹底解説】Airflow Fluentd Elasticsearch Docker の連携方法](https://qiita.com/hankehly/items/f525465bba8b47da0b76) - (🇯🇵Japanese) [Hank Ehly](https://github.com/hankehly) describes how to handle worker task logs with Fluentd, Elasticsearch and Docker. 278 | - [Apache Airflow – Kaikki Mitä Meillä On, Lähtee Dageista](https://www.solita.fi/blogit/apache-airflow-kaikki-mita-meilla-on-lahtee-dageista/) - (🇫🇮Finnish) [Olli Iivonen](https://www.linkedin.com/in/oiivonen/)'s overview of Airflow, concepts and Airflow's usage at [Solita](https://www.solita.fi/). 279 | - [Airflow - Automatizando seu fluxo de trabalho](https://speakerdeck.com/gilsondev/airflow-automatizando-seu-fluxo-de-trabalho) - (🇧🇷Portuguese) [Gilson Filho](https://github.com/gilsondev)'s overview of Airflow, concept and basic use. 280 | - [Panduan Dasar Apache Airflow](https://imamdigmi.github.io/post/tutorial-airflow-part-1/) - (🇮🇩Indonesian) [Imam Digmi](https://github.com/imamdigmi) - Overview of Airflow, concept, basic use with use case. 281 | - [Airflow](https://blog.duyet.net/tag/airflow) - (🇻🇳Vietnamese) [Duyet Le](https://github.com/duyet) - Overview of Airflow, concept, basic use with use case. 282 | - [Michael Yang's Airflow Chinese Blog Posts](https://blog.csdn.net/Young2018/article/details/109105370?spm=1001.2014.3001.5501) - Michael Yang's Chinese blog posts about data engineering with Apache Airflow, conclude basic tutorials and devops skills. 283 | 284 | ## Sample projects 285 | 286 | - [Google Cloud Platform Public Datasets Pipelines](https://github.com/GoogleCloudPlatform/public-datasets-pipelines) - Cloud-native, data pipeline architecture for onboarding datasets to the Google Cloud Public Datasets Program. 287 | - [GitLab Data Team DAGs](https://gitlab.com/gitlab-data/analytics/-/tree/master/dags) - Several DAGs used to build analytics for the GitLab platform. 288 | - [deploy-airflow-on-ecs-fargate](https://github.com/hankehly/deploy-airflow-on-ecs-fargate) - Deploy to Amazon ECS Fargate. Demonstrates various features and configurations, such as autoscaling workers to zero, S3 remote logging and secret management. 289 | 290 | ## License 291 | 292 | [![CC0](http://mirrors.creativecommons.org/presskit/buttons/88x31/svg/cc-zero.svg)](https://creativecommons.org/publicdomain/zero/1.0/) 293 | 294 | To the extent possible under law, [Jakob Homan](https://github.com/jghoman) has waived all copyright and related or neighboring rights to this work. 295 | -------------------------------------------------------------------------------- /airflow-logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jghoman/awesome-apache-airflow/c8cd9587b2bb1d6c2aee55edf6f64f843514e133/airflow-logo.png -------------------------------------------------------------------------------- /check-for-dead-links-config.json: -------------------------------------------------------------------------------- 1 | { 2 | "ignorePatterns": [ 3 | { 4 | "pattern": "^https://www.linkedin.com/" 5 | } 6 | ] 7 | } 8 | -------------------------------------------------------------------------------- /check-for-dead-links.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | set -e # Bail early 3 | #set -x # Very verbose 4 | 5 | if command -v markdown-link-check; then 6 | CMD=markdown-link-check 7 | elif test -f "./node_modules/.bin/markdown-link-check"; then 8 | CMD="./node_modules/.bin/markdown-link-check" 9 | else 10 | echo "Installing markdown-link-check in local directory..." 11 | npm install --save-dev markdown-link-check 12 | CMD="./node_modules/.bin/markdown-link-check" 13 | fi 14 | 15 | eval $CMD -c check-for-dead-links-config.json -p -q README.md --------------------------------------------------------------------------------