├── .gitignore
├── LICENSE
├── README.md
├── SampleData
    ├── author.csv
    ├── book.csv
    ├── reviews.csv
    └── user.csv
├── Utility
    └── bootstrap_script.txt
├── airflow
    ├── dags
    │   ├── __init__.py
    │   └── goodreads_etl_dag.py
    └── plugins
    │   ├── __init__.py
    │   ├── helpers
    │       ├── __init__.py
    │       └── analytics_queries.py
    │   └── operators
    │       ├── __init__.py
    │       ├── data_quality.py
    │       └── goodreads_analytics.py
├── docs
    ├── Airflow_Connections.md
    ├── Images.docx
    └── images
    │   ├── Airflow_EMR_ssh.PNG
    │   ├── Airflow_Redshift.PNG
    │   ├── DAG.PNG
    │   ├── DAG_Gantt.PNG
    │   ├── DAG_tree_view.PNG
    │   ├── DatasetCount.PNG
    │   ├── WarehouseCount.PNG
    │   ├── architecture.png
    │   ├── goodreads.png
    │   ├── goodreads_dag.PNG
    │   └── sourcefiles.PNG
├── goodreadsfaker
    ├── __init__.py
    └── generate_fake_data.py
└── src
    ├── README.md
    ├── __init__.py
    ├── goodreads.log
    ├── goodreads_driver.py
    ├── goodreads_transform.py
    ├── goodreads_udf.py
    ├── logging.ini
    ├── s3_module.py
    └── warehouse
        ├── README.md
        ├── __init__.py
        ├── goodreads_staging_queries.py
        ├── goodreads_upsert.py
        ├── goodreads_warehouse_driver.py
        └── goodreads_warehouse_queries.py

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/.gitignore
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/README.md
--------------------------------------------------------------------------------
/SampleData/author.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/SampleData/author.csv
--------------------------------------------------------------------------------
/SampleData/book.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/SampleData/book.csv
--------------------------------------------------------------------------------
/SampleData/reviews.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/SampleData/reviews.csv
--------------------------------------------------------------------------------
/SampleData/user.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/SampleData/user.csv
--------------------------------------------------------------------------------
/Utility/bootstrap_script.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/Utility/bootstrap_script.txt
--------------------------------------------------------------------------------
/airflow/dags/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/airflow/dags/goodreads_etl_dag.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/airflow/dags/goodreads_etl_dag.py
--------------------------------------------------------------------------------
/airflow/plugins/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/airflow/plugins/__init__.py
--------------------------------------------------------------------------------
/airflow/plugins/helpers/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/airflow/plugins/helpers/__init__.py
--------------------------------------------------------------------------------
/airflow/plugins/helpers/analytics_queries.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/airflow/plugins/helpers/analytics_queries.py
--------------------------------------------------------------------------------
/airflow/plugins/operators/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/airflow/plugins/operators/__init__.py
--------------------------------------------------------------------------------
/airflow/plugins/operators/data_quality.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/airflow/plugins/operators/data_quality.py
--------------------------------------------------------------------------------
/airflow/plugins/operators/goodreads_analytics.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/airflow/plugins/operators/goodreads_analytics.py
--------------------------------------------------------------------------------
/docs/Airflow_Connections.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/docs/Airflow_Connections.md
--------------------------------------------------------------------------------
/docs/Images.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/docs/Images.docx
--------------------------------------------------------------------------------
/docs/images/Airflow_EMR_ssh.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/docs/images/Airflow_EMR_ssh.PNG
--------------------------------------------------------------------------------
/docs/images/Airflow_Redshift.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/docs/images/Airflow_Redshift.PNG
--------------------------------------------------------------------------------
/docs/images/DAG.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/docs/images/DAG.PNG
--------------------------------------------------------------------------------
/docs/images/DAG_Gantt.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/docs/images/DAG_Gantt.PNG
--------------------------------------------------------------------------------
/docs/images/DAG_tree_view.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/docs/images/DAG_tree_view.PNG
--------------------------------------------------------------------------------
/docs/images/DatasetCount.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/docs/images/DatasetCount.PNG
--------------------------------------------------------------------------------
/docs/images/WarehouseCount.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/docs/images/WarehouseCount.PNG
--------------------------------------------------------------------------------
/docs/images/architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/docs/images/architecture.png
--------------------------------------------------------------------------------
/docs/images/goodreads.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/docs/images/goodreads.png
--------------------------------------------------------------------------------
/docs/images/goodreads_dag.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/docs/images/goodreads_dag.PNG
--------------------------------------------------------------------------------
/docs/images/sourcefiles.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/docs/images/sourcefiles.PNG
--------------------------------------------------------------------------------
/goodreadsfaker/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/goodreadsfaker/generate_fake_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/goodreadsfaker/generate_fake_data.py
--------------------------------------------------------------------------------
/src/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/src/README.md
--------------------------------------------------------------------------------
/src/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/src/goodreads.log:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/src/goodreads_driver.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/src/goodreads_driver.py
--------------------------------------------------------------------------------
/src/goodreads_transform.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/src/goodreads_transform.py
--------------------------------------------------------------------------------
/src/goodreads_udf.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/src/goodreads_udf.py
--------------------------------------------------------------------------------
/src/logging.ini:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/src/logging.ini
--------------------------------------------------------------------------------
/src/s3_module.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/src/s3_module.py
--------------------------------------------------------------------------------
/src/warehouse/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/src/warehouse/README.md
--------------------------------------------------------------------------------
/src/warehouse/__init__.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/src/warehouse/goodreads_staging_queries.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/src/warehouse/goodreads_staging_queries.py
--------------------------------------------------------------------------------
/src/warehouse/goodreads_upsert.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/src/warehouse/goodreads_upsert.py
--------------------------------------------------------------------------------
/src/warehouse/goodreads_warehouse_driver.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/src/warehouse/goodreads_warehouse_driver.py
--------------------------------------------------------------------------------
/src/warehouse/goodreads_warehouse_queries.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/san089/goodreads_etl_pipeline/HEAD/src/warehouse/goodreads_warehouse_queries.py
--------------------------------------------------------------------------------