├── .devcontainer
    ├── devcontainer.json
    └── postCreateCommand.sh
├── .env
├── .env.spark
├── .github
    └── dependabot.yml
├── .gitignore
├── .python-version
├── Makefile
├── README.md
├── assets
    ├── infra.png
    ├── make_cr.gif
    └── sample_jupyter_notebook.ipynb
├── capstone
    ├── rainforest
    │   ├── etl
    │   │   ├── bronze
    │   │   │   ├── appuser.py
    │   │   │   ├── brand.py
    │   │   │   ├── buyer.py
    │   │   │   ├── category.py
    │   │   │   ├── clickstream.py
    │   │   │   ├── manufacturer.py
    │   │   │   ├── orderitem.py
    │   │   │   ├── orders.py
    │   │   │   ├── product.py
    │   │   │   ├── productcategory.py
    │   │   │   ├── ratings.py
    │   │   │   ├── seller.py
    │   │   │   └── sellerproduct.py
    │   │   ├── gold
    │   │   │   ├── daily_category_metrics.py
    │   │   │   ├── daily_order_metrics.py
    │   │   │   ├── wide_order_items.py
    │   │   │   └── wide_orders.py
    │   │   ├── interface
    │   │   │   ├── daily_category_report.py
    │   │   │   └── daily_order_report.py
    │   │   └── silver
    │   │   │   ├── dim_buyer.py
    │   │   │   ├── dim_category.py
    │   │   │   ├── dim_product.py
    │   │   │   ├── dim_seller.py
    │   │   │   ├── fct_clickstream.py
    │   │   │   ├── fct_order_items.py
    │   │   │   ├── fct_orders.py
    │   │   │   ├── product_x_category.py
    │   │   │   └── seller_x_product.py
    │   ├── great_expectations
    │   │   ├── .gitignore
    │   │   ├── checkpoints
    │   │   │   └── dq_checkpoint.yml
    │   │   ├── expectations
    │   │   │   ├── .ge_store_backend_id
    │   │   │   ├── daily_order_metrics.json
    │   │   │   ├── fct_orders.json
    │   │   │   └── orders.json
    │   │   ├── great_expectations.yml
    │   │   └── plugins
    │   │   │   └── custom_data_docs
    │   │   │       └── styles
    │   │   │           └── data_docs_custom_styles.css
    │   ├── tests
    │   │   ├── conftest.py
    │   │   ├── integration
    │   │   │   └── test_int_fct_order_items.py
    │   │   └── unit
    │   │   │   ├── test_appuser.py
    │   │   │   ├── test_brand.py
    │   │   │   ├── test_buyer.py
    │   │   │   ├── test_category.py
    │   │   │   ├── test_clickstream.py
    │   │   │   ├── test_dim_buyer.py
    │   │   │   ├── test_fct_order_items.py
    │   │   │   ├── test_manufacturer.py
    │   │   │   ├── test_orderitem.py
    │   │   │   ├── test_orders.py
    │   │   │   ├── test_product.py
    │   │   │   ├── test_product_x_category.py
    │   │   │   ├── test_productcategory.py
    │   │   │   ├── test_ratings.py
    │   │   │   ├── test_seller.py
    │   │   │   └── test_sellerproduct.py
    │   └── utils
    │   │   ├── base_table.py
    │   │   └── db.py
    ├── run_code.py
    └── upstream_datagen
    │   └── datagen.py
├── data-processing-spark
    ├── 1-lab-setup
    │   ├── code-execution
    │   │   └── README.md
    │   └── containers
    │   │   ├── spark
    │   │       ├── Dockerfile
    │   │       ├── conf
    │   │       │   ├── metrics.properties
    │   │       │   └── spark-defaults.conf
    │   │       ├── count.sql
    │   │       ├── create_buckets.py
    │   │       ├── entrypoint.sh
    │   │       ├── generate_tpch.py
    │   │       ├── requirements.txt
    │   │       ├── setup.sql
    │   │       └── tpch-dbgen
    │   │       │   ├── BUGS
    │   │       │   ├── HISTORY
    │   │       │   ├── Makefile
    │   │       │   ├── PORTING.NOTES
    │   │       │   ├── README
    │   │       │   ├── answers
    │   │       │       ├── q1.out
    │   │       │       ├── q10.out
    │   │       │       ├── q11.out
    │   │       │       ├── q12.out
    │   │       │       ├── q13.out
    │   │       │       ├── q14.out
    │   │       │       ├── q15.out
    │   │       │       ├── q16.out
    │   │       │       ├── q17.out
    │   │       │       ├── q18.out
    │   │       │       ├── q19.out
    │   │       │       ├── q2.out
    │   │       │       ├── q20.out
    │   │       │       ├── q21.out
    │   │       │       ├── q22.out
    │   │       │       ├── q3.out
    │   │       │       ├── q4.out
    │   │       │       ├── q5.out
    │   │       │       ├── q6.out
    │   │       │       ├── q7.out
    │   │       │       ├── q8.out
    │   │       │       └── q9.out
    │   │       │   ├── bcd2.c
    │   │       │   ├── bcd2.h
    │   │       │   ├── bm_utils.c
    │   │       │   ├── build.c
    │   │       │   ├── check_answers
    │   │       │       ├── README
    │   │       │       ├── cmpall.sh
    │   │       │       ├── cmpq.pl
    │   │       │       ├── colprecision.txt
    │   │       │       └── pairs.sh
    │   │       │   ├── column_split.sh
    │   │       │   ├── config.h
    │   │       │   ├── dbgen.dsp
    │   │       │   ├── dists.dss
    │   │       │   ├── driver.c
    │   │       │   ├── dss.ddl
    │   │       │   ├── dss.h
    │   │       │   ├── dss.ri
    │   │       │   ├── dsstypes.h
    │   │       │   ├── load_stub.c
    │   │       │   ├── permute.c
    │   │       │   ├── permute.h
    │   │       │   ├── print.c
    │   │       │   ├── qgen.c
    │   │       │   ├── qgen.vcproj
    │   │       │   ├── queries
    │   │       │       ├── 1.sql
    │   │       │       ├── 10.sql
    │   │       │       ├── 11.sql
    │   │       │       ├── 12.sql
    │   │       │       ├── 13.sql
    │   │       │       ├── 14.sql
    │   │       │       ├── 15.sql
    │   │       │       ├── 16.sql
    │   │       │       ├── 17.sql
    │   │       │       ├── 18.sql
    │   │       │       ├── 19.sql
    │   │       │       ├── 2.sql
    │   │       │       ├── 20.sql
    │   │       │       ├── 21.sql
    │   │       │       ├── 22.sql
    │   │       │       ├── 3.sql
    │   │       │       ├── 4.sql
    │   │       │       ├── 5.sql
    │   │       │       ├── 6.sql
    │   │       │       ├── 7.sql
    │   │       │       ├── 8.sql
    │   │       │       └── 9.sql
    │   │       │   ├── reference
    │   │       │       ├── README.txt
    │   │       │       ├── cmd_base_sf1
    │   │       │       ├── cmd_base_sf100
    │   │       │       ├── cmd_base_sf1000
    │   │       │       ├── cmd_base_sf10000
    │   │       │       ├── cmd_base_sf100000
    │   │       │       ├── cmd_base_sf300
    │   │       │       ├── cmd_base_sf3000
    │   │       │       ├── cmd_base_sf30000
    │   │       │       ├── cmd_base_small
    │   │       │       ├── cmd_qgen_sf1
    │   │       │       ├── cmd_qgen_sf100
    │   │       │       ├── cmd_qgen_sf1000
    │   │       │       ├── cmd_qgen_sf10000
    │   │       │       ├── cmd_qgen_sf100000
    │   │       │       ├── cmd_qgen_sf300
    │   │       │       ├── cmd_qgen_sf3000
    │   │       │       ├── cmd_qgen_sf30000
    │   │       │       ├── cmd_update_sf1
    │   │       │       ├── cmd_update_sf100
    │   │       │       ├── cmd_update_sf1000
    │   │       │       ├── cmd_update_sf10000
    │   │       │       ├── cmd_update_sf100000
    │   │       │       ├── cmd_update_sf300
    │   │       │       ├── cmd_update_sf3000
    │   │       │       ├── cmd_update_sf30000
    │   │       │       └── trim_updates.sh
    │   │       │   ├── release.h
    │   │       │   ├── rnd.c
    │   │       │   ├── rnd.h
    │   │       │   ├── rng64.c
    │   │       │   ├── rng64.h
    │   │       │   ├── shared.h
    │   │       │   ├── speed_seed.c
    │   │       │   ├── tests
    │   │       │       ├── check55.sh
    │   │       │       ├── check_dirs.sh
    │   │       │       ├── dop.sh
    │   │       │       ├── gen_tasks.sh
    │   │       │       ├── last_row.sh
    │   │       │       ├── load_balance.sh
    │   │       │       └── new55.sh
    │   │       │   ├── text.c
    │   │       │   ├── tpcd.h
    │   │       │   ├── tpch.dsw
    │   │       │   ├── tpch.sln
    │   │       │   ├── tpch.vcproj
    │   │       │   ├── update_release.sh
    │   │       │   ├── variants
    │   │       │       ├── 12a.sql
    │   │       │       ├── 13a.sql
    │   │       │       ├── 14a.sql
    │   │       │       ├── 15a.sql
    │   │       │       └── 8a.sql
    │   │       │   └── varsub.c
    │   │   └── upstream
    │   │       └── 1-upstream-data.sql
    ├── 2-apache-spark-basics
    │   ├── architecture
    │   │   └── resource_config.py
    │   ├── data-organization
    │   │   └── show_hierarchy.sql
    │   └── spark-data-processing-101
    │   │   ├── case.py
    │   │   ├── case.sql
    │   │   ├── data_types.py
    │   │   ├── data_types.sql
    │   │   ├── dml.py
    │   │   ├── dml.sql
    │   │   ├── functions.py
    │   │   ├── functions.sql
    │   │   ├── group_by.py
    │   │   ├── group_by.sql
    │   │   ├── hierarchy.sql
    │   │   ├── join.py
    │   │   ├── join.sql
    │   │   ├── read_data.py
    │   │   ├── read_data.sql
    │   │   ├── stack_tables.py
    │   │   ├── stack_tables.sql
    │   │   ├── sub_query.py
    │   │   ├── sub_query.sql
    │   │   ├── views.py
    │   │   └── views.sql
    ├── 3-query-plan
    │   ├── query-plan-gen
    │   │   └── check_data_stats.sql
    │   └── what-is-query-plan
    │   │   └── read-query-plan
    │   │       ├── explain.py
    │   │       ├── explain.sql
    │   │       └── explain_ui.py
    ├── 4-data-processing
    │   ├── 1-data-shuffle-and-transformation-1ypes
    │   │   ├── data_shuffle.py
    │   │   ├── lazy_eval.py
    │   │   └── transformation_types.sql
    │   ├── 2-app-job-stage-task
    │   │   ├── spark_app_anatomy.py
    │   │   └── spark_app_config.py
    │   └── 3-join-types
    │   │   └── joins.sql
    ├── 5-data-storage
    │   ├── 1-column-and-row-storage
    │   │   ├── column_storage.sql
    │   │   └── row_storage.sql
    │   ├── 1-data-location
    │   │   └── reading_data_from_s3.py
    │   ├── 2-partition
    │   │   ├── adaptive_query_execution.py
    │   │   └── partition_pruning.sql
    │   ├── 3-bucketing
    │   │   └── bucketing.sql
    │   ├── 4-sorting
    │   │   ├── improved_compression.sql
    │   │   └── reduce_data_exc.sql
    │   ├── 6-table-format
    │   │   └── table_format_features.sql
    │   └── reading-data
    │   │   ├── read_data_from_external_storage.py
    │   │   └── reading_data.py
    ├── 6-data-modeling
    │   ├── 1-facts-and-dims
    │   │   └── orders_lineitems.sql
    │   └── 2-one-big-table
    │   │   └── one_big_table.sql
    └── 7-advanced-data-processing-patterns
    │   ├── 1-cte
    │       ├── cte.py
    │       └── cte.sql
    │   ├── 2-window-functions
    │       ├── compare_across_rows.py
    │       ├── compare_across_rows.sql
    │       ├── defining_window_frames.py
    │       ├── defining_window_frames.sql
    │       ├── defining_windows.py
    │       ├── defining_windows.sql
    │       ├── ranking_rows.py
    │       ├── ranking_rows.sql
    │       ├── running_metrics.py
    │       └── running_metrics.sql
    │   └── 3-templates
    │       ├── dedupe.py
    │       ├── dedupe.sql
    │       ├── multi_group.sql
    │       ├── period_over_period.py
    │       ├── period_over_period.sql
    │       ├── pivot.py
    │       └── pivot.sql
├── deactivate
├── docker-compose.yml
├── env_init.sh
├── main.py
├── pyproject.toml
├── source_env.sh
└── spark-error-log.txt

/.devcontainer/devcontainer.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/.devcontainer/devcontainer.json
--------------------------------------------------------------------------------
/.devcontainer/postCreateCommand.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/.devcontainer/postCreateCommand.sh
--------------------------------------------------------------------------------
/.env:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/.env
--------------------------------------------------------------------------------
/.env.spark:
--------------------------------------------------------------------------------
SPARK_NO_DAEMONIZE=true
--------------------------------------------------------------------------------
/.github/dependabot.yml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/.github/dependabot.yml
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/.gitignore
--------------------------------------------------------------------------------
/.python-version:
--------------------------------------------------------------------------------
3.13
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/Makefile
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/README.md
--------------------------------------------------------------------------------
/assets/infra.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/assets/infra.png
--------------------------------------------------------------------------------
/assets/make_cr.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/assets/make_cr.gif
--------------------------------------------------------------------------------
/assets/sample_jupyter_notebook.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/assets/sample_jupyter_notebook.ipynb
--------------------------------------------------------------------------------
/capstone/rainforest/etl/bronze/appuser.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/bronze/appuser.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/bronze/brand.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/bronze/brand.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/bronze/buyer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/bronze/buyer.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/bronze/category.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/bronze/category.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/bronze/clickstream.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/bronze/clickstream.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/bronze/manufacturer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/bronze/manufacturer.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/bronze/orderitem.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/bronze/orderitem.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/bronze/orders.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/bronze/orders.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/bronze/product.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/bronze/product.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/bronze/productcategory.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/bronze/productcategory.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/bronze/ratings.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/bronze/ratings.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/bronze/seller.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/bronze/seller.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/bronze/sellerproduct.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/bronze/sellerproduct.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/gold/daily_category_metrics.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/gold/daily_category_metrics.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/gold/daily_order_metrics.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/gold/daily_order_metrics.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/gold/wide_order_items.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/gold/wide_order_items.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/gold/wide_orders.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/gold/wide_orders.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/interface/daily_category_report.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/interface/daily_category_report.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/interface/daily_order_report.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/interface/daily_order_report.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/silver/dim_buyer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/silver/dim_buyer.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/silver/dim_category.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/silver/dim_category.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/silver/dim_product.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/silver/dim_product.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/silver/dim_seller.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/silver/dim_seller.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/silver/fct_clickstream.py:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/capstone/rainforest/etl/silver/fct_order_items.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/silver/fct_order_items.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/silver/fct_orders.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/silver/fct_orders.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/silver/product_x_category.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/silver/product_x_category.py
--------------------------------------------------------------------------------
/capstone/rainforest/etl/silver/seller_x_product.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/etl/silver/seller_x_product.py
--------------------------------------------------------------------------------
/capstone/rainforest/great_expectations/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/great_expectations/.gitignore
--------------------------------------------------------------------------------
/capstone/rainforest/great_expectations/checkpoints/dq_checkpoint.yml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/great_expectations/checkpoints/dq_checkpoint.yml
--------------------------------------------------------------------------------
/capstone/rainforest/great_expectations/expectations/.ge_store_backend_id:
--------------------------------------------------------------------------------
store_backend_id = f9438db4-cc90-4afa-bfad-1ff2a615495d
--------------------------------------------------------------------------------
/capstone/rainforest/great_expectations/expectations/daily_order_metrics.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/great_expectations/expectations/daily_order_metrics.json
--------------------------------------------------------------------------------
/capstone/rainforest/great_expectations/expectations/fct_orders.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/great_expectations/expectations/fct_orders.json
--------------------------------------------------------------------------------
/capstone/rainforest/great_expectations/expectations/orders.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/great_expectations/expectations/orders.json
--------------------------------------------------------------------------------
/capstone/rainforest/great_expectations/great_expectations.yml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/great_expectations/great_expectations.yml
--------------------------------------------------------------------------------
/capstone/rainforest/great_expectations/plugins/custom_data_docs/styles/data_docs_custom_styles.css:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/great_expectations/plugins/custom_data_docs/styles/data_docs_custom_styles.css
--------------------------------------------------------------------------------
/capstone/rainforest/tests/conftest.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/tests/conftest.py
--------------------------------------------------------------------------------
/capstone/rainforest/tests/integration/test_int_fct_order_items.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/tests/integration/test_int_fct_order_items.py
--------------------------------------------------------------------------------
/capstone/rainforest/tests/unit/test_appuser.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/tests/unit/test_appuser.py
--------------------------------------------------------------------------------
/capstone/rainforest/tests/unit/test_brand.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/tests/unit/test_brand.py
--------------------------------------------------------------------------------
/capstone/rainforest/tests/unit/test_buyer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/tests/unit/test_buyer.py
--------------------------------------------------------------------------------
/capstone/rainforest/tests/unit/test_category.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/tests/unit/test_category.py
--------------------------------------------------------------------------------
/capstone/rainforest/tests/unit/test_clickstream.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/tests/unit/test_clickstream.py
--------------------------------------------------------------------------------
/capstone/rainforest/tests/unit/test_dim_buyer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/tests/unit/test_dim_buyer.py
--------------------------------------------------------------------------------
/capstone/rainforest/tests/unit/test_fct_order_items.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/tests/unit/test_fct_order_items.py
--------------------------------------------------------------------------------
/capstone/rainforest/tests/unit/test_manufacturer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/tests/unit/test_manufacturer.py
--------------------------------------------------------------------------------
/capstone/rainforest/tests/unit/test_orderitem.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/tests/unit/test_orderitem.py
--------------------------------------------------------------------------------
/capstone/rainforest/tests/unit/test_orders.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/tests/unit/test_orders.py
--------------------------------------------------------------------------------
/capstone/rainforest/tests/unit/test_product.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/tests/unit/test_product.py
--------------------------------------------------------------------------------
/capstone/rainforest/tests/unit/test_product_x_category.py:
--------------------------------------------------------------------------------
# This is a test file for product_x_category.py
--------------------------------------------------------------------------------
/capstone/rainforest/tests/unit/test_productcategory.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/tests/unit/test_productcategory.py
--------------------------------------------------------------------------------
/capstone/rainforest/tests/unit/test_ratings.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/tests/unit/test_ratings.py
--------------------------------------------------------------------------------
/capstone/rainforest/tests/unit/test_seller.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/tests/unit/test_seller.py
--------------------------------------------------------------------------------
/capstone/rainforest/tests/unit/test_sellerproduct.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/tests/unit/test_sellerproduct.py
--------------------------------------------------------------------------------
/capstone/rainforest/utils/base_table.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/utils/base_table.py
--------------------------------------------------------------------------------
/capstone/rainforest/utils/db.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/rainforest/utils/db.py
--------------------------------------------------------------------------------
/capstone/run_code.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/run_code.py
--------------------------------------------------------------------------------
/capstone/upstream_datagen/datagen.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/capstone/upstream_datagen/datagen.py
--------------------------------------------------------------------------------
/data-processing-spark/1-lab-setup/code-execution/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/code-execution/README.md
--------------------------------------------------------------------------------
/data-processing-spark/1-lab-setup/containers/spark/Dockerfile:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/Dockerfile
--------------------------------------------------------------------------------
/data-processing-spark/1-lab-setup/containers/spark/conf/metrics.properties:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/conf/metrics.properties
--------------------------------------------------------------------------------
/data-processing-spark/1-lab-setup/containers/spark/conf/spark-defaults.conf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/conf/spark-defaults.conf
--------------------------------------------------------------------------------
/data-processing-spark/1-lab-setup/containers/spark/count.sql:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/count.sql
--------------------------------------------------------------------------------
/data-processing-spark/1-lab-setup/containers/spark/create_buckets.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/create_buckets.py
--------------------------------------------------------------------------------
/data-processing-spark/1-lab-setup/containers/spark/entrypoint.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/entrypoint.sh
-------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/generate_tpch.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/generate_tpch.py -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/requirements.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/requirements.txt -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/setup.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/setup.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/BUGS: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/BUGS -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/HISTORY: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/HISTORY -------------------------------------------------------------------------------- 
/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/Makefile: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/Makefile -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/PORTING.NOTES: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/PORTING.NOTES -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/README: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/README -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q1.out: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q1.out -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q10.out: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q10.out -------------------------------------------------------------------------------- 
/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q11.out: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q11.out -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q12.out: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q12.out -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q13.out: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q13.out -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q14.out: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q14.out -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q15.out: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q15.out 
-------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q16.out: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q16.out -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q17.out: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q17.out -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q18.out: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q18.out -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q19.out: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q19.out -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q2.out: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q2.out -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q20.out: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q20.out -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q21.out: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q21.out -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q22.out: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q22.out -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q3.out: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q3.out -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q4.out: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q4.out -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q5.out: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q5.out -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q6.out: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q6.out -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q7.out: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q7.out -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q8.out: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q8.out -------------------------------------------------------------------------------- 
/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q9.out: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/answers/q9.out -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/bcd2.c: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/bcd2.c -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/bcd2.h: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/bcd2.h -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/bm_utils.c: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/bm_utils.c -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/build.c: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/build.c -------------------------------------------------------------------------------- 
/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/check_answers/README: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/check_answers/README -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/check_answers/cmpall.sh: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/check_answers/cmpall.sh -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/check_answers/cmpq.pl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/check_answers/cmpq.pl -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/check_answers/colprecision.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/check_answers/colprecision.txt -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/check_answers/pairs.sh: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/check_answers/pairs.sh -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/column_split.sh: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/column_split.sh -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/config.h: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/config.h -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/dbgen.dsp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/dbgen.dsp -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/dists.dss: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/dists.dss -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/driver.c: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/driver.c -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/dss.ddl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/dss.ddl -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/dss.h: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/dss.h -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/dss.ri: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/dss.ri -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/dsstypes.h: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/dsstypes.h -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/load_stub.c: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/load_stub.c -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/permute.c: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/permute.c -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/permute.h: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/permute.h -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/print.c: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/print.c -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/qgen.c: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/qgen.c -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/qgen.vcproj: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/qgen.vcproj -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/1.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/1.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/10.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/10.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/11.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/11.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/12.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/12.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/13.sql: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/13.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/14.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/14.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/15.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/15.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/16.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/16.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/17.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/17.sql -------------------------------------------------------------------------------- 
/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/18.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/18.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/19.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/19.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/2.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/2.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/20.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/20.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/21.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/21.sql 
-------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/22.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/22.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/3.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/3.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/4.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/4.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/5.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/5.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/6.sql: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/6.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/7.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/7.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/8.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/8.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/9.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/queries/9.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/README.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/README.txt -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_base_sf1: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_base_sf1 -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_base_sf100: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_base_sf100 -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_base_sf1000: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_base_sf1000 -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_base_sf10000: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_base_sf10000 -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_base_sf100000: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_base_sf100000 
-------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_base_sf300: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_base_sf300 -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_base_sf3000: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_base_sf3000 -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_base_sf30000: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_base_sf30000 -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_base_small: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_base_small -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_qgen_sf1: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_qgen_sf1 -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_qgen_sf100: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_qgen_sf100 -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_qgen_sf1000: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_qgen_sf1000 -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_qgen_sf10000: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_qgen_sf10000 -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_qgen_sf100000: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_qgen_sf100000 -------------------------------------------------------------------------------- 
/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_qgen_sf300: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_qgen_sf300 -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_qgen_sf3000: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_qgen_sf3000 -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_qgen_sf30000: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_qgen_sf30000 -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_update_sf1: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_update_sf1 -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_update_sf100: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_update_sf100 -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_update_sf1000: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_update_sf1000 -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_update_sf10000: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_update_sf10000 -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_update_sf100000: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_update_sf100000 -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_update_sf300: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_update_sf300 -------------------------------------------------------------------------------- 
/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_update_sf3000: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_update_sf3000 -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_update_sf30000: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/cmd_update_sf30000 -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/trim_updates.sh: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/reference/trim_updates.sh -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/release.h: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/release.h -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/rnd.c: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/rnd.c 
-------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/rnd.h: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/rnd.h -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/rng64.c: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/rng64.c -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/rng64.h: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/rng64.h -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/shared.h: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/shared.h -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/speed_seed.c: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/speed_seed.c 
-------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/tests/check55.sh: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/tests/check55.sh -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/tests/check_dirs.sh: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/tests/check_dirs.sh -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/tests/dop.sh: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/tests/dop.sh -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/tests/gen_tasks.sh: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/tests/gen_tasks.sh -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/tests/last_row.sh: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/tests/last_row.sh -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/tests/load_balance.sh: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/tests/load_balance.sh -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/tests/new55.sh: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/tests/new55.sh -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/text.c: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/text.c -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/tpcd.h: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/tpcd.h -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/tpch.dsw: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/tpch.dsw -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/tpch.sln: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/tpch.sln -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/tpch.vcproj: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/tpch.vcproj -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/update_release.sh: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/update_release.sh -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/variants/12a.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/variants/12a.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/variants/13a.sql: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/variants/13a.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/variants/14a.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/variants/14a.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/variants/15a.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/variants/15a.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/variants/8a.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/variants/8a.sql -------------------------------------------------------------------------------- /data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/varsub.c: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/spark/tpch-dbgen/varsub.c -------------------------------------------------------------------------------- 
/data-processing-spark/1-lab-setup/containers/upstream/1-upstream-data.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/1-lab-setup/containers/upstream/1-upstream-data.sql -------------------------------------------------------------------------------- /data-processing-spark/2-apache-spark-basics/architecture/resource_config.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/architecture/resource_config.py -------------------------------------------------------------------------------- /data-processing-spark/2-apache-spark-basics/data-organization/show_hierarchy.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/data-organization/show_hierarchy.sql -------------------------------------------------------------------------------- /data-processing-spark/2-apache-spark-basics/spark-data-processing-101/case.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/spark-data-processing-101/case.py -------------------------------------------------------------------------------- /data-processing-spark/2-apache-spark-basics/spark-data-processing-101/case.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/spark-data-processing-101/case.sql 
-------------------------------------------------------------------------------- /data-processing-spark/2-apache-spark-basics/spark-data-processing-101/data_types.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/spark-data-processing-101/data_types.py -------------------------------------------------------------------------------- /data-processing-spark/2-apache-spark-basics/spark-data-processing-101/data_types.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/spark-data-processing-101/data_types.sql -------------------------------------------------------------------------------- /data-processing-spark/2-apache-spark-basics/spark-data-processing-101/dml.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/spark-data-processing-101/dml.py -------------------------------------------------------------------------------- /data-processing-spark/2-apache-spark-basics/spark-data-processing-101/dml.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/spark-data-processing-101/dml.sql -------------------------------------------------------------------------------- /data-processing-spark/2-apache-spark-basics/spark-data-processing-101/functions.py: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/spark-data-processing-101/functions.py -------------------------------------------------------------------------------- /data-processing-spark/2-apache-spark-basics/spark-data-processing-101/functions.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/spark-data-processing-101/functions.sql -------------------------------------------------------------------------------- /data-processing-spark/2-apache-spark-basics/spark-data-processing-101/group_by.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/spark-data-processing-101/group_by.py -------------------------------------------------------------------------------- /data-processing-spark/2-apache-spark-basics/spark-data-processing-101/group_by.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/spark-data-processing-101/group_by.sql -------------------------------------------------------------------------------- /data-processing-spark/2-apache-spark-basics/spark-data-processing-101/hierarchy.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/spark-data-processing-101/hierarchy.sql -------------------------------------------------------------------------------- /data-processing-spark/2-apache-spark-basics/spark-data-processing-101/join.py: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/spark-data-processing-101/join.py -------------------------------------------------------------------------------- /data-processing-spark/2-apache-spark-basics/spark-data-processing-101/join.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/spark-data-processing-101/join.sql -------------------------------------------------------------------------------- /data-processing-spark/2-apache-spark-basics/spark-data-processing-101/read_data.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/spark-data-processing-101/read_data.py -------------------------------------------------------------------------------- /data-processing-spark/2-apache-spark-basics/spark-data-processing-101/read_data.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/spark-data-processing-101/read_data.sql -------------------------------------------------------------------------------- /data-processing-spark/2-apache-spark-basics/spark-data-processing-101/stack_tables.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/spark-data-processing-101/stack_tables.py -------------------------------------------------------------------------------- 
/data-processing-spark/2-apache-spark-basics/spark-data-processing-101/stack_tables.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/spark-data-processing-101/stack_tables.sql -------------------------------------------------------------------------------- /data-processing-spark/2-apache-spark-basics/spark-data-processing-101/sub_query.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/spark-data-processing-101/sub_query.py -------------------------------------------------------------------------------- /data-processing-spark/2-apache-spark-basics/spark-data-processing-101/sub_query.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/spark-data-processing-101/sub_query.sql -------------------------------------------------------------------------------- /data-processing-spark/2-apache-spark-basics/spark-data-processing-101/views.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/spark-data-processing-101/views.py -------------------------------------------------------------------------------- /data-processing-spark/2-apache-spark-basics/spark-data-processing-101/views.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/2-apache-spark-basics/spark-data-processing-101/views.sql 
-------------------------------------------------------------------------------- /data-processing-spark/3-query-plan/query-plan-gen/check_data_stats.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/3-query-plan/query-plan-gen/check_data_stats.sql -------------------------------------------------------------------------------- /data-processing-spark/3-query-plan/what-is-query-plan/read-query-plan/explain.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/3-query-plan/what-is-query-plan/read-query-plan/explain.py -------------------------------------------------------------------------------- /data-processing-spark/3-query-plan/what-is-query-plan/read-query-plan/explain.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/3-query-plan/what-is-query-plan/read-query-plan/explain.sql -------------------------------------------------------------------------------- /data-processing-spark/3-query-plan/what-is-query-plan/read-query-plan/explain_ui.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/3-query-plan/what-is-query-plan/read-query-plan/explain_ui.py -------------------------------------------------------------------------------- /data-processing-spark/4-data-processing/1-data-shuffle-and-transformation-1ypes/data_shuffle.py: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/4-data-processing/1-data-shuffle-and-transformation-1ypes/data_shuffle.py -------------------------------------------------------------------------------- /data-processing-spark/4-data-processing/1-data-shuffle-and-transformation-1ypes/lazy_eval.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/4-data-processing/1-data-shuffle-and-transformation-1ypes/lazy_eval.py -------------------------------------------------------------------------------- /data-processing-spark/4-data-processing/1-data-shuffle-and-transformation-1ypes/transformation_types.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/4-data-processing/1-data-shuffle-and-transformation-1ypes/transformation_types.sql -------------------------------------------------------------------------------- /data-processing-spark/4-data-processing/2-app-job-stage-task/spark_app_anatomy.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/4-data-processing/2-app-job-stage-task/spark_app_anatomy.py -------------------------------------------------------------------------------- /data-processing-spark/4-data-processing/2-app-job-stage-task/spark_app_config.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/4-data-processing/2-app-job-stage-task/spark_app_config.py -------------------------------------------------------------------------------- 
/data-processing-spark/4-data-processing/3-join-types/joins.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/4-data-processing/3-join-types/joins.sql -------------------------------------------------------------------------------- /data-processing-spark/5-data-storage/1-column-and-row-storage/column_storage.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/5-data-storage/1-column-and-row-storage/column_storage.sql -------------------------------------------------------------------------------- /data-processing-spark/5-data-storage/1-column-and-row-storage/row_storage.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/5-data-storage/1-column-and-row-storage/row_storage.sql -------------------------------------------------------------------------------- /data-processing-spark/5-data-storage/1-data-location/reading_data_from_s3.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/5-data-storage/1-data-location/reading_data_from_s3.py -------------------------------------------------------------------------------- /data-processing-spark/5-data-storage/2-partition/adaptive_query_execution.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/5-data-storage/2-partition/adaptive_query_execution.py -------------------------------------------------------------------------------- 
/data-processing-spark/5-data-storage/2-partition/partition_pruning.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/5-data-storage/2-partition/partition_pruning.sql -------------------------------------------------------------------------------- /data-processing-spark/5-data-storage/3-bucketing/bucketing.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/5-data-storage/3-bucketing/bucketing.sql -------------------------------------------------------------------------------- /data-processing-spark/5-data-storage/4-sorting/improved_compression.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/5-data-storage/4-sorting/improved_compression.sql -------------------------------------------------------------------------------- /data-processing-spark/5-data-storage/4-sorting/reduce_data_exc.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/5-data-storage/4-sorting/reduce_data_exc.sql -------------------------------------------------------------------------------- /data-processing-spark/5-data-storage/6-table-format/table_format_features.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/5-data-storage/6-table-format/table_format_features.sql -------------------------------------------------------------------------------- 
/data-processing-spark/5-data-storage/reading-data/read_data_from_external_storage.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/5-data-storage/reading-data/read_data_from_external_storage.py -------------------------------------------------------------------------------- /data-processing-spark/5-data-storage/reading-data/reading_data.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/5-data-storage/reading-data/reading_data.py -------------------------------------------------------------------------------- /data-processing-spark/6-data-modeling/1-facts-and-dims/orders_lineitems.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/6-data-modeling/1-facts-and-dims/orders_lineitems.sql -------------------------------------------------------------------------------- /data-processing-spark/6-data-modeling/2-one-big-table/one_big_table.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/6-data-modeling/2-one-big-table/one_big_table.sql -------------------------------------------------------------------------------- /data-processing-spark/7-advanced-data-processing-patterns/1-cte/cte.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/7-advanced-data-processing-patterns/1-cte/cte.py -------------------------------------------------------------------------------- 
/data-processing-spark/7-advanced-data-processing-patterns/1-cte/cte.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/7-advanced-data-processing-patterns/1-cte/cte.sql -------------------------------------------------------------------------------- /data-processing-spark/7-advanced-data-processing-patterns/2-window-functions/compare_across_rows.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/7-advanced-data-processing-patterns/2-window-functions/compare_across_rows.py -------------------------------------------------------------------------------- /data-processing-spark/7-advanced-data-processing-patterns/2-window-functions/compare_across_rows.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/7-advanced-data-processing-patterns/2-window-functions/compare_across_rows.sql -------------------------------------------------------------------------------- /data-processing-spark/7-advanced-data-processing-patterns/2-window-functions/defining_window_frames.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/7-advanced-data-processing-patterns/2-window-functions/defining_window_frames.py -------------------------------------------------------------------------------- /data-processing-spark/7-advanced-data-processing-patterns/2-window-functions/defining_window_frames.sql: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/7-advanced-data-processing-patterns/2-window-functions/defining_window_frames.sql -------------------------------------------------------------------------------- /data-processing-spark/7-advanced-data-processing-patterns/2-window-functions/defining_windows.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/7-advanced-data-processing-patterns/2-window-functions/defining_windows.py -------------------------------------------------------------------------------- /data-processing-spark/7-advanced-data-processing-patterns/2-window-functions/defining_windows.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/7-advanced-data-processing-patterns/2-window-functions/defining_windows.sql -------------------------------------------------------------------------------- /data-processing-spark/7-advanced-data-processing-patterns/2-window-functions/ranking_rows.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/7-advanced-data-processing-patterns/2-window-functions/ranking_rows.py -------------------------------------------------------------------------------- /data-processing-spark/7-advanced-data-processing-patterns/2-window-functions/ranking_rows.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/7-advanced-data-processing-patterns/2-window-functions/ranking_rows.sql 
-------------------------------------------------------------------------------- /data-processing-spark/7-advanced-data-processing-patterns/2-window-functions/running_metrics.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/7-advanced-data-processing-patterns/2-window-functions/running_metrics.py -------------------------------------------------------------------------------- /data-processing-spark/7-advanced-data-processing-patterns/2-window-functions/running_metrics.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/7-advanced-data-processing-patterns/2-window-functions/running_metrics.sql -------------------------------------------------------------------------------- /data-processing-spark/7-advanced-data-processing-patterns/3-templates/dedupe.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/7-advanced-data-processing-patterns/3-templates/dedupe.py -------------------------------------------------------------------------------- /data-processing-spark/7-advanced-data-processing-patterns/3-templates/dedupe.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/7-advanced-data-processing-patterns/3-templates/dedupe.sql -------------------------------------------------------------------------------- /data-processing-spark/7-advanced-data-processing-patterns/3-templates/multi_group.sql: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/7-advanced-data-processing-patterns/3-templates/multi_group.sql -------------------------------------------------------------------------------- /data-processing-spark/7-advanced-data-processing-patterns/3-templates/period_over_period.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/7-advanced-data-processing-patterns/3-templates/period_over_period.py -------------------------------------------------------------------------------- /data-processing-spark/7-advanced-data-processing-patterns/3-templates/period_over_period.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/7-advanced-data-processing-patterns/3-templates/period_over_period.sql -------------------------------------------------------------------------------- /data-processing-spark/7-advanced-data-processing-patterns/3-templates/pivot.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/7-advanced-data-processing-patterns/3-templates/pivot.py -------------------------------------------------------------------------------- /data-processing-spark/7-advanced-data-processing-patterns/3-templates/pivot.sql: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/data-processing-spark/7-advanced-data-processing-patterns/3-templates/pivot.sql -------------------------------------------------------------------------------- /deactivate: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/deactivate -------------------------------------------------------------------------------- /docker-compose.yml: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/docker-compose.yml -------------------------------------------------------------------------------- /env_init.sh: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/env_init.sh -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/main.py -------------------------------------------------------------------------------- /pyproject.toml: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/pyproject.toml -------------------------------------------------------------------------------- /source_env.sh: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/source_env.sh -------------------------------------------------------------------------------- /spark-error-log.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/josephmachado/efficient_data_processing_spark/HEAD/spark-error-log.txt --------------------------------------------------------------------------------