├── .gitignore
├── README.md
├── hadoop_dataset.txt
├── hdfs_quiz_advanced.yml
├── hdfs_quiz_beginner.yml
├── hdfs_quiz_intermediate.yml
├── hdfs_quiz_template.yml
└── public_examples
    ├── hive
    │   ├── hive_delimiter.hql
    │   └── tab_delimited.txt
    ├── kafka
    │   └── cli_examples
    ├── map_reduce
    │   ├── ditributed_cache
    │   │   ├── female.txt
    │   │   └── male.txt
    │   ├── job_chain
    │   │   ├── count_mapper.py
    │   │   ├── run_job_chain.sh
    │   │   └── sum_reducer.py
    │   ├── reducer.sh
    │   ├── run.sh
    │   ├── shuffle_and_sort
    │   │   ├── csv_dataset.txt
    │   │   ├── run_shuffle_and_sort_experiment.sh
    │   │   ├── tabular_dataset.txt
    │   │   └── tabular_dataset_with_ip.txt
    │   └── word_count
    │       ├── mapper.py
    │       ├── reducer.py
    │       └── run_word_count.sh
    └── spark
        └── rdd
            ├── 01_spark_intro.ipynb
            ├── 02_spark_transformations_and_actions.ipynb
            ├── 03_cache_and_joins.ipynb
            ├── img
            │   ├── brain-cluster.png
            │   ├── cluster-overview.png
            │   ├── dag1.png
            │   ├── dag2.png
            │   ├── dag3.png
            │   ├── spark_dag_assignment.png
            │   └── spark_stages.png
            ├── simple.txt
            └── workshop
                ├── car_brands.txt
                └── cities.jsonlines


/.gitignore:
--------------------------------------------------------------------------------
.ipynb_checkpoints

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/README.md
--------------------------------------------------------------------------------
/hadoop_dataset.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/hadoop_dataset.txt
--------------------------------------------------------------------------------
/hdfs_quiz_advanced.yml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/hdfs_quiz_advanced.yml
--------------------------------------------------------------------------------
/hdfs_quiz_beginner.yml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/hdfs_quiz_beginner.yml
--------------------------------------------------------------------------------
/hdfs_quiz_intermediate.yml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/hdfs_quiz_intermediate.yml
--------------------------------------------------------------------------------
/hdfs_quiz_template.yml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/hdfs_quiz_template.yml
--------------------------------------------------------------------------------
/public_examples/hive/hive_delimiter.hql:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/hive/hive_delimiter.hql
--------------------------------------------------------------------------------
/public_examples/hive/tab_delimited.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/hive/tab_delimited.txt
--------------------------------------------------------------------------------
/public_examples/kafka/cli_examples:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/kafka/cli_examples
--------------------------------------------------------------------------------
/public_examples/map_reduce/ditributed_cache/female.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/map_reduce/ditributed_cache/female.txt
--------------------------------------------------------------------------------
/public_examples/map_reduce/ditributed_cache/male.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/map_reduce/ditributed_cache/male.txt
--------------------------------------------------------------------------------
/public_examples/map_reduce/job_chain/count_mapper.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/map_reduce/job_chain/count_mapper.py
--------------------------------------------------------------------------------
/public_examples/map_reduce/job_chain/run_job_chain.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/map_reduce/job_chain/run_job_chain.sh
--------------------------------------------------------------------------------
/public_examples/map_reduce/job_chain/sum_reducer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/map_reduce/job_chain/sum_reducer.py
--------------------------------------------------------------------------------
/public_examples/map_reduce/reducer.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/map_reduce/reducer.sh
--------------------------------------------------------------------------------
/public_examples/map_reduce/run.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/map_reduce/run.sh
--------------------------------------------------------------------------------
/public_examples/map_reduce/shuffle_and_sort/csv_dataset.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/map_reduce/shuffle_and_sort/csv_dataset.txt
--------------------------------------------------------------------------------
/public_examples/map_reduce/shuffle_and_sort/run_shuffle_and_sort_experiment.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/map_reduce/shuffle_and_sort/run_shuffle_and_sort_experiment.sh
--------------------------------------------------------------------------------
/public_examples/map_reduce/shuffle_and_sort/tabular_dataset.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/map_reduce/shuffle_and_sort/tabular_dataset.txt
--------------------------------------------------------------------------------
/public_examples/map_reduce/shuffle_and_sort/tabular_dataset_with_ip.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/map_reduce/shuffle_and_sort/tabular_dataset_with_ip.txt
--------------------------------------------------------------------------------
/public_examples/map_reduce/word_count/mapper.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/map_reduce/word_count/mapper.py
--------------------------------------------------------------------------------
/public_examples/map_reduce/word_count/reducer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/map_reduce/word_count/reducer.py
--------------------------------------------------------------------------------
/public_examples/map_reduce/word_count/run_word_count.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/map_reduce/word_count/run_word_count.sh
--------------------------------------------------------------------------------
/public_examples/spark/rdd/01_spark_intro.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/spark/rdd/01_spark_intro.ipynb
--------------------------------------------------------------------------------
/public_examples/spark/rdd/02_spark_transformations_and_actions.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/spark/rdd/02_spark_transformations_and_actions.ipynb
--------------------------------------------------------------------------------
/public_examples/spark/rdd/03_cache_and_joins.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/spark/rdd/03_cache_and_joins.ipynb
--------------------------------------------------------------------------------
/public_examples/spark/rdd/img/brain-cluster.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/spark/rdd/img/brain-cluster.png
--------------------------------------------------------------------------------
/public_examples/spark/rdd/img/cluster-overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/spark/rdd/img/cluster-overview.png
--------------------------------------------------------------------------------
/public_examples/spark/rdd/img/dag1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/spark/rdd/img/dag1.png
--------------------------------------------------------------------------------
/public_examples/spark/rdd/img/dag2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/spark/rdd/img/dag2.png
--------------------------------------------------------------------------------
/public_examples/spark/rdd/img/dag3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/spark/rdd/img/dag3.png
--------------------------------------------------------------------------------
/public_examples/spark/rdd/img/spark_dag_assignment.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/spark/rdd/img/spark_dag_assignment.png
--------------------------------------------------------------------------------
/public_examples/spark/rdd/img/spark_stages.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/spark/rdd/img/spark_stages.png
--------------------------------------------------------------------------------
/public_examples/spark/rdd/simple.txt:
--------------------------------------------------------------------------------
a 1 first
b 2 second
c 3 third

--------------------------------------------------------------------------------
/public_examples/spark/rdd/workshop/car_brands.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/spark/rdd/workshop/car_brands.txt
--------------------------------------------------------------------------------
/public_examples/spark/rdd/workshop/cities.jsonlines:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/big-data-team/big-data-course/HEAD/public_examples/spark/rdd/workshop/cities.jsonlines
--------------------------------------------------------------------------------