├── .github
│   └── workflows
│       └── manual.yml
├── .gitignore
├── CODEOWNERS
├── Debugging_And_Optimization
│   └── exercises
│       ├── README.md
│       └── starter
│           ├── Introduction_data_skewness
│           │   ├── README.md
│           │   └── skewness_introduction.py
│           ├── README.md
│           ├── Repartition
│           │   └── repartition.py
│           └── practice_broadcast_joins
│               └── broadcast_example.py
├── LICENSE.md
├── README.md
└── Setting_Spark_Cluster_In_AWS
    ├── demo_code
    │   ├── README.md
    │   ├── sparkify_log_small.json
    │   ├── sparkify_log_small_2.json
    │   └── test-emr.ipynb
    └── exercises
        ├── README.md
        ├── solution
        │   ├── README.md
        │   └── creating_emr_cluster
        │       └── README.md
        └── starter
            ├── create_emr_cluster
            │   ├── Exercise_Creating_EMR_Clusters.py
            │   └── bootstrap_emr.sh
            ├── submitting_spark_scripts
            │   ├── README.md
            │   └── cities.csv
            └── write_to_s3
                ├── README.md
                └── file_util.py

/.github/workflows/manual.yml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udacity/nd027-c3-data-lakes-with-spark/HEAD/.github/workflows/manual.yml
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
.github/**
--------------------------------------------------------------------------------
/CODEOWNERS:
--------------------------------------------------------------------------------
* @udacity/active-public-content
--------------------------------------------------------------------------------
/Debugging_And_Optimization/exercises/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udacity/nd027-c3-data-lakes-with-spark/HEAD/Debugging_And_Optimization/exercises/README.md
--------------------------------------------------------------------------------
/Debugging_And_Optimization/exercises/starter/Introduction_data_skewness/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udacity/nd027-c3-data-lakes-with-spark/HEAD/Debugging_And_Optimization/exercises/starter/Introduction_data_skewness/README.md
--------------------------------------------------------------------------------
/Debugging_And_Optimization/exercises/starter/Introduction_data_skewness/skewness_introduction.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udacity/nd027-c3-data-lakes-with-spark/HEAD/Debugging_And_Optimization/exercises/starter/Introduction_data_skewness/skewness_introduction.py
--------------------------------------------------------------------------------
/Debugging_And_Optimization/exercises/starter/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udacity/nd027-c3-data-lakes-with-spark/HEAD/Debugging_And_Optimization/exercises/starter/README.md
--------------------------------------------------------------------------------
/Debugging_And_Optimization/exercises/starter/Repartition/repartition.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udacity/nd027-c3-data-lakes-with-spark/HEAD/Debugging_And_Optimization/exercises/starter/Repartition/repartition.py
--------------------------------------------------------------------------------
/Debugging_And_Optimization/exercises/starter/practice_broadcast_joins/broadcast_example.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udacity/nd027-c3-data-lakes-with-spark/HEAD/Debugging_And_Optimization/exercises/starter/practice_broadcast_joins/broadcast_example.py
--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udacity/nd027-c3-data-lakes-with-spark/HEAD/LICENSE.md
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udacity/nd027-c3-data-lakes-with-spark/HEAD/README.md
--------------------------------------------------------------------------------
/Setting_Spark_Cluster_In_AWS/demo_code/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udacity/nd027-c3-data-lakes-with-spark/HEAD/Setting_Spark_Cluster_In_AWS/demo_code/README.md
--------------------------------------------------------------------------------
/Setting_Spark_Cluster_In_AWS/demo_code/sparkify_log_small.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udacity/nd027-c3-data-lakes-with-spark/HEAD/Setting_Spark_Cluster_In_AWS/demo_code/sparkify_log_small.json
--------------------------------------------------------------------------------
/Setting_Spark_Cluster_In_AWS/demo_code/sparkify_log_small_2.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udacity/nd027-c3-data-lakes-with-spark/HEAD/Setting_Spark_Cluster_In_AWS/demo_code/sparkify_log_small_2.json
--------------------------------------------------------------------------------
/Setting_Spark_Cluster_In_AWS/demo_code/test-emr.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udacity/nd027-c3-data-lakes-with-spark/HEAD/Setting_Spark_Cluster_In_AWS/demo_code/test-emr.ipynb
--------------------------------------------------------------------------------
/Setting_Spark_Cluster_In_AWS/exercises/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udacity/nd027-c3-data-lakes-with-spark/HEAD/Setting_Spark_Cluster_In_AWS/exercises/README.md
--------------------------------------------------------------------------------
/Setting_Spark_Cluster_In_AWS/exercises/solution/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udacity/nd027-c3-data-lakes-with-spark/HEAD/Setting_Spark_Cluster_In_AWS/exercises/solution/README.md
--------------------------------------------------------------------------------
/Setting_Spark_Cluster_In_AWS/exercises/solution/creating_emr_cluster/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udacity/nd027-c3-data-lakes-with-spark/HEAD/Setting_Spark_Cluster_In_AWS/exercises/solution/creating_emr_cluster/README.md
--------------------------------------------------------------------------------
/Setting_Spark_Cluster_In_AWS/exercises/starter/create_emr_cluster/Exercise_Creating_EMR_Clusters.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udacity/nd027-c3-data-lakes-with-spark/HEAD/Setting_Spark_Cluster_In_AWS/exercises/starter/create_emr_cluster/Exercise_Creating_EMR_Clusters.py
--------------------------------------------------------------------------------
/Setting_Spark_Cluster_In_AWS/exercises/starter/create_emr_cluster/bootstrap_emr.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udacity/nd027-c3-data-lakes-with-spark/HEAD/Setting_Spark_Cluster_In_AWS/exercises/starter/create_emr_cluster/bootstrap_emr.sh
--------------------------------------------------------------------------------
/Setting_Spark_Cluster_In_AWS/exercises/starter/submitting_spark_scripts/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udacity/nd027-c3-data-lakes-with-spark/HEAD/Setting_Spark_Cluster_In_AWS/exercises/starter/submitting_spark_scripts/README.md
--------------------------------------------------------------------------------
/Setting_Spark_Cluster_In_AWS/exercises/starter/submitting_spark_scripts/cities.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udacity/nd027-c3-data-lakes-with-spark/HEAD/Setting_Spark_Cluster_In_AWS/exercises/starter/submitting_spark_scripts/cities.csv
--------------------------------------------------------------------------------
/Setting_Spark_Cluster_In_AWS/exercises/starter/write_to_s3/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udacity/nd027-c3-data-lakes-with-spark/HEAD/Setting_Spark_Cluster_In_AWS/exercises/starter/write_to_s3/README.md
--------------------------------------------------------------------------------
/Setting_Spark_Cluster_In_AWS/exercises/starter/write_to_s3/file_util.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/udacity/nd027-c3-data-lakes-with-spark/HEAD/Setting_Spark_Cluster_In_AWS/exercises/starter/write_to_s3/file_util.py
--------------------------------------------------------------------------------