├── .gitignore
├── README.md
├── airflow
│   └── dags
│       ├── generate_twitter.py
│       ├── subdags
│       │   └── twitter_subdag.py
│       └── twitter_airflow.py
├── celery_app
│   ├── __init__.py
│   ├── celeryapp.py
│   ├── more_tasks.py
│   ├── pytest_stock_tasks.py
│   ├── tasks.py
│   └── test_stock_tasks.py
├── data
│   ├── example_chatlogs.json
│   ├── mvt.csv
│   ├── mvt_cleaned.csv
│   └── tweets
│       └── latest_links.txt
├── deploy
│   ├── celery_service
│   ├── celerybeat_service
│   ├── example_variables.yml
│   ├── flower_service
│   ├── jupyter_service
│   ├── jupyterhub_service
│   ├── luigi_service
│   ├── pipelines_playbook.yml
│   ├── pipelines_variables.yml
│   └── templates
│       ├── jupyterhub_config.py
│       └── sshd_config
├── example_prod.cfg
├── luigi
│   ├── luigi.cfg
│   ├── taxi_data_import.py
│   └── wordcount_map_reduce.py
├── notebooks
│   ├── Chapter 3 - Basic Celery Tasks.ipynb
│   ├── Chapter 3 - Complex Task Chains.ipynb
│   ├── Chapter 3 - First Steps with Celery.ipynb
│   ├── Chapter 3 - Monitoring Tasks.ipynb
│   ├── Chapter 4 - Dask Distributed.ipynb
│   ├── Chapter 4 - First Steps with Dask.ipynb
│   ├── Chapter 4 - Learning Dask Bags.ipynb
│   ├── Chapter 6 - Introduction to PySpark.ipynb
│   ├── Chapter 6 - Introduction to Spark Streaming.ipynb
│   ├── Chapter 7 - Testing with Hypothesis.ipynb
│   └── Extras (Chapter 4) - Clean Vehicle Theft Data.ipynb
├── requirements.txt
└── streaming
    └── tweepy_stream.py

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/.gitignore
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/README.md
--------------------------------------------------------------------------------
/airflow/dags/generate_twitter.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/airflow/dags/generate_twitter.py
--------------------------------------------------------------------------------
/airflow/dags/subdags/twitter_subdag.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/airflow/dags/subdags/twitter_subdag.py
--------------------------------------------------------------------------------
/airflow/dags/twitter_airflow.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/airflow/dags/twitter_airflow.py
--------------------------------------------------------------------------------
/celery_app/__init__.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/celery_app/celeryapp.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/celery_app/celeryapp.py
--------------------------------------------------------------------------------
/celery_app/more_tasks.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/celery_app/more_tasks.py
--------------------------------------------------------------------------------
/celery_app/pytest_stock_tasks.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/celery_app/pytest_stock_tasks.py
--------------------------------------------------------------------------------
/celery_app/tasks.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/celery_app/tasks.py
--------------------------------------------------------------------------------
/celery_app/test_stock_tasks.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/celery_app/test_stock_tasks.py
--------------------------------------------------------------------------------
/data/example_chatlogs.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/data/example_chatlogs.json
--------------------------------------------------------------------------------
/data/mvt.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/data/mvt.csv
--------------------------------------------------------------------------------
/data/mvt_cleaned.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/data/mvt_cleaned.csv
--------------------------------------------------------------------------------
/data/tweets/latest_links.txt:
--------------------------------------------------------------------------------
url,count
--------------------------------------------------------------------------------
/deploy/celery_service:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/deploy/celery_service
--------------------------------------------------------------------------------
/deploy/celerybeat_service:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/deploy/celerybeat_service
--------------------------------------------------------------------------------
/deploy/example_variables.yml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/deploy/example_variables.yml
--------------------------------------------------------------------------------
/deploy/flower_service:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/deploy/flower_service
--------------------------------------------------------------------------------
/deploy/jupyter_service:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/deploy/jupyter_service
--------------------------------------------------------------------------------
/deploy/jupyterhub_service:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/deploy/jupyterhub_service
--------------------------------------------------------------------------------
/deploy/luigi_service:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/deploy/luigi_service
--------------------------------------------------------------------------------
/deploy/pipelines_playbook.yml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/deploy/pipelines_playbook.yml
--------------------------------------------------------------------------------
/deploy/pipelines_variables.yml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/deploy/pipelines_variables.yml
--------------------------------------------------------------------------------
/deploy/templates/jupyterhub_config.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/deploy/templates/jupyterhub_config.py
--------------------------------------------------------------------------------
/deploy/templates/sshd_config:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/deploy/templates/sshd_config
--------------------------------------------------------------------------------
/example_prod.cfg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/example_prod.cfg
--------------------------------------------------------------------------------
/luigi/luigi.cfg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/luigi/luigi.cfg
--------------------------------------------------------------------------------
/luigi/taxi_data_import.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/luigi/taxi_data_import.py
--------------------------------------------------------------------------------
/luigi/wordcount_map_reduce.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/luigi/wordcount_map_reduce.py
--------------------------------------------------------------------------------
/notebooks/Chapter 3 - Basic Celery Tasks.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/notebooks/Chapter 3 - Basic Celery Tasks.ipynb
--------------------------------------------------------------------------------
/notebooks/Chapter 3 - Complex Task Chains.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/notebooks/Chapter 3 - Complex Task Chains.ipynb
--------------------------------------------------------------------------------
/notebooks/Chapter 3 - First Steps with Celery.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/notebooks/Chapter 3 - First Steps with Celery.ipynb
--------------------------------------------------------------------------------
/notebooks/Chapter 3 - Monitoring Tasks.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/notebooks/Chapter 3 - Monitoring Tasks.ipynb
--------------------------------------------------------------------------------
/notebooks/Chapter 4 - Dask Distributed.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/notebooks/Chapter 4 - Dask Distributed.ipynb
--------------------------------------------------------------------------------
/notebooks/Chapter 4 - First Steps with Dask.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/notebooks/Chapter 4 - First Steps with Dask.ipynb
--------------------------------------------------------------------------------
/notebooks/Chapter 4 - Learning Dask Bags.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/notebooks/Chapter 4 - Learning Dask Bags.ipynb
--------------------------------------------------------------------------------
/notebooks/Chapter 6 - Introduction to PySpark.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/notebooks/Chapter 6 - Introduction to PySpark.ipynb
--------------------------------------------------------------------------------
/notebooks/Chapter 6 - Introduction to Spark Streaming.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/notebooks/Chapter 6 - Introduction to Spark Streaming.ipynb
--------------------------------------------------------------------------------
/notebooks/Chapter 7 - Testing with Hypothesis.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/notebooks/Chapter 7 - Testing with Hypothesis.ipynb
--------------------------------------------------------------------------------
/notebooks/Extras (Chapter 4) - Clean Vehicle Theft Data.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/notebooks/Extras (Chapter 4) - Clean Vehicle Theft Data.ipynb
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/requirements.txt
--------------------------------------------------------------------------------
/streaming/tweepy_stream.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kjam/data-pipelines-course/HEAD/streaming/tweepy_stream.py
--------------------------------------------------------------------------------