├── .gitignore
├── LICENSE
├── README.md
├── SUMMARY.md
├── best_practices
    ├── README.md
    ├── dont_call_collect_on_a_very_large_rdd.md
    └── prefer_reducebykey_over_groupbykey.md
├── cover.jpg
├── images
    ├── cached-partitions.png
    ├── group_by.png
    ├── locality.png
    ├── partitions-as-tasks.png
    └── reduce_by.png
├── performance_optimization
    ├── README.md
    ├── data_locality.md
    └── how_many_partitions_does_an_rdd_have.md
├── spark_streaming
    ├── README.md
    └── error_oneforonestrategy.md
└── troubleshooting
    ├── README.md
    ├── connectivity_issues.md
    ├── java_io_not_serializable_exception.md
    ├── missing_dependencies_in_jar_files.md
    └── port_22_connection_refused.md

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/.gitignore
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/README.md
--------------------------------------------------------------------------------
/SUMMARY.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/SUMMARY.md
--------------------------------------------------------------------------------
/best_practices/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/best_practices/README.md
--------------------------------------------------------------------------------
/best_practices/dont_call_collect_on_a_very_large_rdd.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/best_practices/dont_call_collect_on_a_very_large_rdd.md
--------------------------------------------------------------------------------
/best_practices/prefer_reducebykey_over_groupbykey.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/best_practices/prefer_reducebykey_over_groupbykey.md
--------------------------------------------------------------------------------
/cover.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/cover.jpg
--------------------------------------------------------------------------------
/images/cached-partitions.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/images/cached-partitions.png
--------------------------------------------------------------------------------
/images/group_by.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/images/group_by.png
--------------------------------------------------------------------------------
/images/locality.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/images/locality.png
--------------------------------------------------------------------------------
/images/partitions-as-tasks.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/images/partitions-as-tasks.png
--------------------------------------------------------------------------------
/images/reduce_by.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/images/reduce_by.png
--------------------------------------------------------------------------------
/performance_optimization/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/performance_optimization/README.md
--------------------------------------------------------------------------------
/performance_optimization/data_locality.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/performance_optimization/data_locality.md
--------------------------------------------------------------------------------
/performance_optimization/how_many_partitions_does_an_rdd_have.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/performance_optimization/how_many_partitions_does_an_rdd_have.md
--------------------------------------------------------------------------------
/spark_streaming/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/spark_streaming/README.md
--------------------------------------------------------------------------------
/spark_streaming/error_oneforonestrategy.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/spark_streaming/error_oneforonestrategy.md
--------------------------------------------------------------------------------
/troubleshooting/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/troubleshooting/README.md
--------------------------------------------------------------------------------
/troubleshooting/connectivity_issues.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/troubleshooting/connectivity_issues.md
--------------------------------------------------------------------------------
/troubleshooting/java_io_not_serializable_exception.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/troubleshooting/java_io_not_serializable_exception.md
--------------------------------------------------------------------------------
/troubleshooting/missing_dependencies_in_jar_files.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/troubleshooting/missing_dependencies_in_jar_files.md
--------------------------------------------------------------------------------
/troubleshooting/port_22_connection_refused.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aiyanbo/databricks-spark-knowledge-base-zh-cn/HEAD/troubleshooting/port_22_connection_refused.md
--------------------------------------------------------------------------------