├── LICENSE
├── PYSPARK_TESTING
│   └── PYT01 - Unit Testing.py
├── PySpark_ETL
│   ├── PS00-Introduction.py
│   ├── PS01-Read Files.py
│   ├── PS02-Schema Handling.py
│   ├── PS03-Creating Dataframes.py
│   ├── PS04-Basic Transformation.py
│   ├── PS05-Handling JSON.py
│   ├── PS06-JOINS.py
│   ├── PS07-Grouping & Aggregation.py
│   ├── PS08-Ordering Data.py
│   ├── PS09-String Functions.py
│   ├── PS10-Date & Time Functions.py
│   ├── PS11-Partitioning & Repartitioning.py
│   ├── PS12-Missing Value Handling.py
│   ├── PS13-Deduplication.py
│   ├── PS14-Data Profiling using PySpark.py
│   ├── PS15-Data Caching.py
│   ├── PS16-User Defined Functions.py
│   ├── PS17-Write Data.py
│   └── Z01- Case Study Sales Order Analysis.py
├── README.md
├── SETUP
│   ├── _clean_up.py
│   ├── _initial_setup.py
│   ├── _pyspark_clean_up.py
│   ├── _pyspark_init_setup.py
│   ├── _pyspark_setup_files.py
│   ├── _setup_database.py
│   └── _setup_demo_table.py
└── SQL Refresher
    ├── PS000-INTRODUCTION.py
    ├── SR000-Introduction.py
    ├── SR001-Basic CRUD.py
    ├── SR002-Select & Filtering.py
    ├── SR003-JOINS.py
    ├── SR004-Order & Grouping.py
    ├── SR005-Sub Queries.py
    ├── SR006-Views & Temp Views.py
    ├── SR007-Common Table Expressions.py
    ├── SR008 - EXCEPT, UNION, UNION ALL, INTERSECTION.py
    ├── SR009-External Tables.py
    ├── SR010-Drop database & tables.py
    ├── SR011-Check Table & Database Details.py
    └── SR012-Versioning, Time Travel & Optimization.py

/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/LICENSE
--------------------------------------------------------------------------------
/PYSPARK_TESTING/PYT01 - Unit Testing.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/PYSPARK_TESTING/PYT01 - Unit Testing.py
--------------------------------------------------------------------------------
/PySpark_ETL/PS00-Introduction.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/PySpark_ETL/PS00-Introduction.py
--------------------------------------------------------------------------------
/PySpark_ETL/PS01-Read Files.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/PySpark_ETL/PS01-Read Files.py
--------------------------------------------------------------------------------
/PySpark_ETL/PS02-Schema Handling.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/PySpark_ETL/PS02-Schema Handling.py
--------------------------------------------------------------------------------
/PySpark_ETL/PS03-Creating Dataframes.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/PySpark_ETL/PS03-Creating Dataframes.py
--------------------------------------------------------------------------------
/PySpark_ETL/PS04-Basic Transformation.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/PySpark_ETL/PS04-Basic Transformation.py
--------------------------------------------------------------------------------
/PySpark_ETL/PS05-Handling JSON.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/PySpark_ETL/PS05-Handling JSON.py
--------------------------------------------------------------------------------
/PySpark_ETL/PS06-JOINS.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/PySpark_ETL/PS06-JOINS.py
--------------------------------------------------------------------------------
/PySpark_ETL/PS07-Grouping & Aggregation.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/PySpark_ETL/PS07-Grouping & Aggregation.py
--------------------------------------------------------------------------------
/PySpark_ETL/PS08-Ordering Data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/PySpark_ETL/PS08-Ordering Data.py
--------------------------------------------------------------------------------
/PySpark_ETL/PS09-String Functions.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/PySpark_ETL/PS09-String Functions.py
--------------------------------------------------------------------------------
/PySpark_ETL/PS10-Date & Time Functions.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/PySpark_ETL/PS10-Date & Time Functions.py
--------------------------------------------------------------------------------
/PySpark_ETL/PS11-Partitioning & Repartitioning.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/PySpark_ETL/PS11-Partitioning & Repartitioning.py
--------------------------------------------------------------------------------
/PySpark_ETL/PS12-Missing Value Handling.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/PySpark_ETL/PS12-Missing Value Handling.py
--------------------------------------------------------------------------------
/PySpark_ETL/PS13-Deduplication.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/PySpark_ETL/PS13-Deduplication.py
--------------------------------------------------------------------------------
/PySpark_ETL/PS14-Data Profiling using PySpark.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/PySpark_ETL/PS14-Data Profiling using PySpark.py
--------------------------------------------------------------------------------
/PySpark_ETL/PS15-Data Caching.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/PySpark_ETL/PS15-Data Caching.py
--------------------------------------------------------------------------------
/PySpark_ETL/PS16-User Defined Functions.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/PySpark_ETL/PS16-User Defined Functions.py
--------------------------------------------------------------------------------
/PySpark_ETL/PS17-Write Data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/PySpark_ETL/PS17-Write Data.py
--------------------------------------------------------------------------------
/PySpark_ETL/Z01- Case Study Sales Order Analysis.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/PySpark_ETL/Z01- Case Study Sales Order Analysis.py
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/README.md
--------------------------------------------------------------------------------
/SETUP/_clean_up.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/SETUP/_clean_up.py
--------------------------------------------------------------------------------
/SETUP/_initial_setup.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/SETUP/_initial_setup.py
--------------------------------------------------------------------------------
/SETUP/_pyspark_clean_up.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/SETUP/_pyspark_clean_up.py
--------------------------------------------------------------------------------
/SETUP/_pyspark_init_setup.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/SETUP/_pyspark_init_setup.py
--------------------------------------------------------------------------------
/SETUP/_pyspark_setup_files.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/SETUP/_pyspark_setup_files.py
--------------------------------------------------------------------------------
/SETUP/_setup_database.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/SETUP/_setup_database.py
--------------------------------------------------------------------------------
/SETUP/_setup_demo_table.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/SETUP/_setup_demo_table.py
--------------------------------------------------------------------------------
/SQL Refresher/PS000-INTRODUCTION.py:
--------------------------------------------------------------------------------
# Databricks notebook source
--------------------------------------------------------------------------------
/SQL Refresher/SR000-Introduction.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/SQL Refresher/SR000-Introduction.py
--------------------------------------------------------------------------------
/SQL Refresher/SR001-Basic CRUD.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/SQL Refresher/SR001-Basic CRUD.py
--------------------------------------------------------------------------------
/SQL Refresher/SR002-Select & Filtering.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/SQL Refresher/SR002-Select & Filtering.py
--------------------------------------------------------------------------------
/SQL Refresher/SR003-JOINS.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/SQL Refresher/SR003-JOINS.py
--------------------------------------------------------------------------------
/SQL Refresher/SR004-Order & Grouping.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/SQL Refresher/SR004-Order & Grouping.py
--------------------------------------------------------------------------------
/SQL Refresher/SR005-Sub Queries.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/SQL Refresher/SR005-Sub Queries.py
--------------------------------------------------------------------------------
/SQL Refresher/SR006-Views & Temp Views.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/SQL Refresher/SR006-Views & Temp Views.py
--------------------------------------------------------------------------------
/SQL Refresher/SR007-Common Table Expressions.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/SQL Refresher/SR007-Common Table Expressions.py
--------------------------------------------------------------------------------
/SQL Refresher/SR008 - EXCEPT, UNION, UNION ALL, INTERSECTION.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/SQL Refresher/SR008 - EXCEPT, UNION, UNION ALL, INTERSECTION.py
--------------------------------------------------------------------------------
/SQL Refresher/SR009-External Tables.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/SQL Refresher/SR009-External Tables.py
--------------------------------------------------------------------------------
/SQL Refresher/SR010-Drop database & tables.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/SQL Refresher/SR010-Drop database & tables.py
--------------------------------------------------------------------------------
/SQL Refresher/SR011-Check Table & Database Details.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/SQL Refresher/SR011-Check Table & Database Details.py
--------------------------------------------------------------------------------
/SQL Refresher/SR012-Versioning, Time Travel & Optimization.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/martandsingh/ApacheSpark/HEAD/SQL Refresher/SR012-Versioning, Time Travel & Optimization.py
--------------------------------------------------------------------------------