├── .gitignore
├── Introduction.md
├── LICENSE
├── README.md
├── awesome-slides
│   ├── 2012-Intro-to-Spark-Internals-MateiZ.pdf
│   ├── 2014-A-Deeper-Understanding-of-Spark-Internals-Aaron-Davidson.pdf
│   ├── 2015-Advanced-Apache-Spark-Training-Sameer.pdf
│   ├── 2015-Visual-Spark-API-Databricks.pdf
│   └── README.md
├── ide-setup.md
├── job-schedule.md
├── media
│   ├── 01-LogQueryCode.jpg
│   ├── 01-addDepJars.jpg
│   ├── 01-debug.jpg
│   ├── 01-examplesDeps.jpg
│   ├── 01-flumeSinkError.jpg
│   ├── 01-missDepJars.jpg
│   ├── 01-moduleFlumeSink.jpg
│   ├── 01-openIDE.jpg
│   ├── 01-packageAndRun.jpg
│   ├── 01-runConfig.jpg
│   ├── 01-runLogQuery.jpg
│   ├── 01-runLogQuerySuccessfully.jpg
│   ├── 02-PairRDDFunctions.jpg
│   ├── 02-TransformationAndActions.jpg
│   ├── 02-dependencies.jpg
│   ├── 02-flatten.jpg
│   ├── 02-flattenTest.jpg
│   ├── 02-flattenX.jpg
│   ├── 02-lineageGraph.jpg
│   ├── 02-rdd.jpg
│   ├── 02-rddAbstraction.jpg
│   ├── 02-rddImplicit.jpg
│   ├── 02-rddRepresentation.jpg
│   ├── 02-sbtTest.jpg
│   ├── 02-schedulingProcess.jpg
│   └── 14821230664592.jpg
├── rdd-abstraction.md
├── research-papers
│   ├── 2010 Spark Cluster Computing with Working Sets.pdf
│   ├── 2012 Discretized Streams An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters.pdf
│   ├── 2012 Shark Fast Data Analysis Using Coarse-grained Distributed Memory.pdf
│   ├── 2013 Discretized Streams Fault-Tolerant Streaming Computation at Scale.pdf
│   ├── 2013 Shark SQL and Rich Analytics at Scale.pdf
│   ├── 2014 GraphX Unifying Data-Parallel and Graph-Parallel Analytics.pdf
│   ├── 2015 Spark SQL Relational Data Processing in Spark.pdf
│   ├── 2016 MLlib Machine Learning in Apache Spark.pdf
│   ├── 201611 Apache Spark A Unified Engine for Big Data Processing.pdf
│   ├── README.md
│   └── Resilient Distributed Datasets A Fault Tolerant Abstraction for In-Memory Cluster Computing.pdf
└── tools
    └── gh-md-toc

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/.gitignore
--------------------------------------------------------------------------------

/Introduction.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/Introduction.md
--------------------------------------------------------------------------------

/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/LICENSE
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/README.md
--------------------------------------------------------------------------------

/awesome-slides/2012-Intro-to-Spark-Internals-MateiZ.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/awesome-slides/2012-Intro-to-Spark-Internals-MateiZ.pdf
--------------------------------------------------------------------------------

/awesome-slides/2014-A-Deeper-Understanding-of-Spark-Internals-Aaron-Davidson.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/awesome-slides/2014-A-Deeper-Understanding-of-Spark-Internals-Aaron-Davidson.pdf
--------------------------------------------------------------------------------

/awesome-slides/2015-Advanced-Apache-Spark-Training-Sameer.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/awesome-slides/2015-Advanced-Apache-Spark-Training-Sameer.pdf
--------------------------------------------------------------------------------

/awesome-slides/2015-Visual-Spark-API-Databricks.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/awesome-slides/2015-Visual-Spark-API-Databricks.pdf
--------------------------------------------------------------------------------

/awesome-slides/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/awesome-slides/README.md
--------------------------------------------------------------------------------

/ide-setup.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/ide-setup.md
--------------------------------------------------------------------------------

/job-schedule.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/job-schedule.md
--------------------------------------------------------------------------------

/media/01-LogQueryCode.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/01-LogQueryCode.jpg
--------------------------------------------------------------------------------

/media/01-addDepJars.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/01-addDepJars.jpg
--------------------------------------------------------------------------------

/media/01-debug.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/01-debug.jpg
--------------------------------------------------------------------------------

/media/01-examplesDeps.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/01-examplesDeps.jpg
--------------------------------------------------------------------------------

/media/01-flumeSinkError.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/01-flumeSinkError.jpg
--------------------------------------------------------------------------------

/media/01-missDepJars.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/01-missDepJars.jpg
--------------------------------------------------------------------------------

/media/01-moduleFlumeSink.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/01-moduleFlumeSink.jpg
--------------------------------------------------------------------------------

/media/01-openIDE.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/01-openIDE.jpg
--------------------------------------------------------------------------------

/media/01-packageAndRun.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/01-packageAndRun.jpg
--------------------------------------------------------------------------------

/media/01-runConfig.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/01-runConfig.jpg
--------------------------------------------------------------------------------

/media/01-runLogQuery.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/01-runLogQuery.jpg
--------------------------------------------------------------------------------

/media/01-runLogQuerySuccessfully.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/01-runLogQuerySuccessfully.jpg
--------------------------------------------------------------------------------

/media/02-PairRDDFunctions.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/02-PairRDDFunctions.jpg
--------------------------------------------------------------------------------

/media/02-TransformationAndActions.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/02-TransformationAndActions.jpg
--------------------------------------------------------------------------------

/media/02-dependencies.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/02-dependencies.jpg
--------------------------------------------------------------------------------

/media/02-flatten.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/02-flatten.jpg
--------------------------------------------------------------------------------

/media/02-flattenTest.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/02-flattenTest.jpg
--------------------------------------------------------------------------------

/media/02-flattenX.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/02-flattenX.jpg
--------------------------------------------------------------------------------

/media/02-lineageGraph.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/02-lineageGraph.jpg
--------------------------------------------------------------------------------

/media/02-rdd.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/02-rdd.jpg
--------------------------------------------------------------------------------

/media/02-rddAbstraction.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/02-rddAbstraction.jpg
--------------------------------------------------------------------------------

/media/02-rddImplicit.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/02-rddImplicit.jpg
--------------------------------------------------------------------------------

/media/02-rddRepresentation.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/02-rddRepresentation.jpg
--------------------------------------------------------------------------------

/media/02-sbtTest.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/02-sbtTest.jpg
--------------------------------------------------------------------------------

/media/02-schedulingProcess.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/02-schedulingProcess.jpg
--------------------------------------------------------------------------------

/media/14821230664592.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/media/14821230664592.jpg
--------------------------------------------------------------------------------

/rdd-abstraction.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/rdd-abstraction.md
--------------------------------------------------------------------------------

/research-papers/2010 Spark Cluster Computing with Working Sets.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/research-papers/2010 Spark Cluster Computing with Working Sets.pdf
--------------------------------------------------------------------------------

/research-papers/2012 Discretized Streams An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/research-papers/2012 Discretized Streams An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters.pdf
--------------------------------------------------------------------------------

/research-papers/2012 Shark Fast Data Analysis Using Coarse-grained Distributed Memory.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/research-papers/2012 Shark Fast Data Analysis Using Coarse-grained Distributed Memory.pdf
--------------------------------------------------------------------------------

/research-papers/2013 Discretized Streams Fault-Tolerant Streaming Computation at Scale.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/research-papers/2013 Discretized Streams Fault-Tolerant Streaming Computation at Scale.pdf
--------------------------------------------------------------------------------

/research-papers/2013 Shark SQL and Rich Analytics at Scale.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/research-papers/2013 Shark SQL and Rich Analytics at Scale.pdf
--------------------------------------------------------------------------------

/research-papers/2014 GraphX Unifying Data-Parallel and Graph-Parallel Analytics.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/research-papers/2014 GraphX Unifying Data-Parallel and Graph-Parallel Analytics.pdf
--------------------------------------------------------------------------------

/research-papers/2015 Spark SQL Relational Data Processing in Spark.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/research-papers/2015 Spark SQL Relational Data Processing in Spark.pdf
--------------------------------------------------------------------------------

/research-papers/2016 MLlib Machine Learning in Apache Spark.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/research-papers/2016 MLlib Machine Learning in Apache Spark.pdf
--------------------------------------------------------------------------------

/research-papers/201611 Apache Spark A Unified Engine for Big Data Processing.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/research-papers/201611 Apache Spark A Unified Engine for Big Data Processing.pdf
--------------------------------------------------------------------------------

/research-papers/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/research-papers/README.md
--------------------------------------------------------------------------------

/research-papers/Resilient Distributed Datasets A Fault Tolerant Abstraction for In-Memory Cluster Computing.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/research-papers/Resilient Distributed Datasets A Fault Tolerant Abstraction for In-Memory Cluster Computing.pdf
--------------------------------------------------------------------------------

/tools/gh-md-toc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/linbojin/spark-notes/HEAD/tools/gh-md-toc
--------------------------------------------------------------------------------