├── LICENSE ├── README.md ├── code-of-conduct.md ├── spark-native ├── Dockerfile ├── Makefile ├── README.md └── scripts │ └── spark │ ├── added │ ├── driver.sh │ ├── executor.sh │ └── launch.sh │ └── install └── spark ├── Dockerfile ├── Makefile ├── README.md ├── core-site.xml ├── log4j.properties ├── spark-defaults.conf ├── start-common.sh ├── start-master ├── start-worker ├── zeppelin-build └── Dockerfile └── zeppelin ├── .gitignore ├── Dockerfile ├── docker-zeppelin.sh ├── zeppelin-env.sh └── zeppelin-log4j.properties /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 
29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 
61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "{}" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright {yyyy} {name of copyright owner} 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | 203 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # application-images 2 | 3 | This repository houses the Docker image build contents for (some) applications 4 | that run on Kubernetes. Specifically, it houses the build content and rules for 5 | any applications that require custom built Docker images. 6 | 7 | ## Structure 8 | 9 | Each directory should include: 10 | * a top-level README describing the contents 11 | * a Makefile with a `make push` rule that will push to 12 | `gcr.io/google_containers` (it has to be something that @k8s-oncall can push 13 | to). 14 | 15 | ## Contributing 16 | 17 | When in doubt, see the official Kubernetes 18 | [contributing guidelines](https://github.com/kubernetes/kubernetes/blob/80569e8866966c554a0c293df907f1bf9de368d2/CONTRIBUTING.md). 19 | 20 | ## Merge guidelines 21 | 22 | You or the reviewer can merge on a single LGTM. After you submit a PR, mention 23 | @k8s-oncall to get the image pushed (if you or the merger aren't able). 
24 | 25 | ## Joining the team 26 | 27 | If you've submitted an image to this repo and will maintain it actively, 28 | consider asking to join the 29 | [application-images-maintainers](https://github.com/orgs/kubernetes/teams/application-images-maintainers) 30 | team. 31 | -------------------------------------------------------------------------------- /code-of-conduct.md: -------------------------------------------------------------------------------- 1 | # Kubernetes Community Code of Conduct 2 | 3 | Please refer to our [Kubernetes Community Code of Conduct](https://git.k8s.io/community/code-of-conduct.md) 4 | -------------------------------------------------------------------------------- /spark-native/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM centos:latest 2 | 3 | MAINTAINER Erik Erlandson 4 | 5 | # when the containers are not run w/ uid 0, the uid may not map in 6 | # /etc/passwd and it may not be possible to modify things like 7 | # /etc/hosts. nss_wrapper provides an LD_PRELOAD way to modify passwd 8 | # and hosts. 9 | RUN yum install -y epel-release tar java && \ 10 | yum install -y nss_wrapper && \ 11 | yum clean all 12 | 13 | ENV PATH=$PATH:/opt/spark/bin 14 | ENV SPARK_HOME=/opt/spark 15 | 16 | # Add scripts used to configure the image 17 | COPY scripts /opt/scripts/ 18 | 19 | COPY spark-distro.tgz /opt/spark/ 20 | 21 | RUN cd /opt/spark && tar --strip-components=1 -xzf spark-distro.tgz && rm spark-distro.tgz && bash -x /opt/scripts/spark/install 22 | -------------------------------------------------------------------------------- /spark-native/Makefile: -------------------------------------------------------------------------------- 1 | DISTRO_VERSION ?= 0.10 2 | ifndef DISTRO_PATH 3 | $(error DISTRO_PATH is undefined) 4 | endif 5 | 6 | ifndef REPO 7 | $(error REPO is undefined) 8 | endif 9 | 10 | build: 11 | cp $(DISTRO_PATH) ./spark-distro.tgz 12 | docker build -t "$(REPO):$(DISTRO_VERSION)" . 
13 | 14 | clean: 15 | docker rmi $(REPO):$(DISTRO_VERSION) 16 | 17 | push: build 18 | docker push $(REPO):$(DISTRO_VERSION) 19 | -------------------------------------------------------------------------------- /spark-native/README.md: -------------------------------------------------------------------------------- 1 | The repository this points to is k8s4spark/spark. 2 | 3 | # Steps to build the Docker image 4 | 5 | 1. Build your Spark distribution (typically from source) with Kubernetes support. 6 | 7 | ``` 8 | ./dev/make-distribution.sh --tgz -Pkubernetes -Phadoop-2.4 -Darguments="-DskipTests" -Dhadoop.version=2.4.0 9 | ``` 10 | 11 | For further details, refer to: https://github.com/foxish/spark/tree/k8s-support/kubernetes 12 | 13 | 14 | 2. Build and push the Docker image by running the following: 15 | 16 | ``` 17 | make push DISTRO_PATH=~/spark.tgz REPO=docker.io/foxish/kube-spark 18 | ``` 19 | 20 | 3. Use the newly pushed image to launch a new Spark job with k8s support using spark-submit.
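As a rough sketch of step 3: the flag names, master URL, and image tag below are illustrative assumptions for this early Kubernetes-support fork, not a documented interface. The command is printed rather than executed so the sketch is safe to run anywhere.

```shell
# Illustrative only: the k8s master URL, the spark.kubernetes.* conf key,
# and the image tag are assumptions for the early k8s-support Spark fork.
SPARK_IMAGE="docker.io/foxish/kube-spark:0.10"   # hypothetical tag
SUBMIT_CMD="./bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://kubernetes:443 \
  --conf spark.kubernetes.driver.docker.image=${SPARK_IMAGE} \
  local:///opt/spark/examples/jars/spark-examples.jar"
# Print the command instead of running it; a real run needs a cluster.
echo "${SUBMIT_CMD}"
```

Substitute your own API server address and application jar before running the printed command for real.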
21 | 22 | 23 | -------------------------------------------------------------------------------- /spark-native/scripts/spark/added/driver.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | set -e 4 | set -x 5 | 6 | CLIENT_JAR=$1 7 | shift 8 | 9 | curl -L -o $SPARK_HOME/kubernetes/client.jar $CLIENT_JAR 10 | 11 | $SPARK_HOME/bin/spark-submit $@ 12 | -------------------------------------------------------------------------------- /spark-native/scripts/spark/added/executor.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | set -e 4 | set -x 5 | 6 | cd $SPARK_HOME 7 | $SPARK_HOME/bin/spark-class $@ 8 | -------------------------------------------------------------------------------- /spark-native/scripts/spark/added/launch.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Spark likes to be able to look up a username for the running UID; if 4 | # no name is present, fake it. 5 | cat /etc/passwd > /tmp/passwd 6 | echo "$(id -u):x:$(id -u):$(id -g):dynamic uid:$SPARK_HOME:/bin/false" >> /tmp/passwd 7 | 8 | export NSS_WRAPPER_PASSWD=/tmp/passwd 9 | # NSS_WRAPPER_GROUP must be set for NSS_WRAPPER_PASSWD to be used 10 | export NSS_WRAPPER_GROUP=/etc/group 11 | 12 | export LD_PRELOAD=libnss_wrapper.so 13 | 14 | # If the SPARK_MASTER_ADDRESS env variable is not provided, start a master; 15 | # otherwise start a worker and connect to SPARK_MASTER_ADDRESS 16 | if [ -z ${SPARK_MASTER_ADDRESS+_} ]; then 17 | echo "Starting master" 18 | 19 | # run the spark master directly (instead of sbin/start-master.sh) to 20 | # link master and container lifecycle 21 | exec $SPARK_HOME/bin/spark-class org.apache.spark.deploy.master.Master 22 | else 23 | echo "Starting worker, will connect to: $SPARK_MASTER_ADDRESS" 24 | while true; do 25 | echo "Waiting for spark master to be available ..."
26 | curl --connect-timeout 1 -s -X GET $SPARK_MASTER_UI_ADDRESS > /dev/null 27 | if [ $? -eq 0 ]; then 28 | break 29 | fi 30 | sleep 1 31 | done 32 | exec $SPARK_HOME/bin/spark-class org.apache.spark.deploy.worker.Worker $SPARK_MASTER_ADDRESS 33 | fi 34 | 35 | -------------------------------------------------------------------------------- /spark-native/scripts/spark/install: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | set -e 4 | 5 | SCRIPT_DIR=$(dirname $0) 6 | ADDED_DIR=${SCRIPT_DIR}/added 7 | 8 | mv $ADDED_DIR/launch.sh $SPARK_HOME/bin/ 9 | chmod +x $SPARK_HOME/bin/launch.sh 10 | 11 | mv $ADDED_DIR/driver.sh $ADDED_DIR/executor.sh /opt/ 12 | chmod a+rx /opt/driver.sh /opt/executor.sh 13 | 14 | # SPARK_WORKER_DIR defaults to SPARK_HOME/work and is created on 15 | # Worker startup if it does not exist. instead of making SPARK_HOME 16 | # world writable, create SPARK_HOME/work. 17 | mkdir $SPARK_HOME/work 18 | chmod a+rwx $SPARK_HOME/work 19 | 20 | mkdir $SPARK_HOME/kubernetes 21 | chmod a+rwx $SPARK_HOME/kubernetes 22 | 23 | chmod -R a+rX $SPARK_HOME 24 | -------------------------------------------------------------------------------- /spark/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM java:openjdk-8-jdk 2 | 3 | ENV hadoop_ver 2.6.1 4 | ENV spark_ver 1.5.2 5 | 6 | # Get Hadoop from US Apache mirror and extract just the native 7 | # libs. (Until we care about running HDFS with these containers, this 8 | # is all we need.) 9 | RUN mkdir -p /opt && \ 10 | cd /opt && \ 11 | curl http://www.us.apache.org/dist/hadoop/common/hadoop-${hadoop_ver}/hadoop-${hadoop_ver}.tar.gz | \ 12 | tar -zx hadoop-${hadoop_ver}/lib/native && \ 13 | ln -s hadoop-${hadoop_ver} hadoop && \ 14 | echo Hadoop ${hadoop_ver} native libraries installed in /opt/hadoop/lib/native 15 | 16 | # Get Spark from US Apache mirror. 
17 | RUN mkdir -p /opt && \ 18 | cd /opt && \ 19 | curl http://www.us.apache.org/dist/spark/spark-${spark_ver}/spark-${spark_ver}-bin-hadoop2.6.tgz | \ 20 | tar -zx && \ 21 | ln -s spark-${spark_ver}-bin-hadoop2.6 spark && \ 22 | echo Spark ${spark_ver} installed in /opt 23 | 24 | # Add the GCS connector. 25 | RUN cd /opt/spark/lib && \ 26 | curl -O https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar 27 | 28 | # if numpy is installed on a driver it needs to be installed on all 29 | # workers, so install it everywhere 30 | RUN apt-get update && \ 31 | apt-get install -y python-numpy && \ 32 | apt-get clean && \ 33 | rm -rf /var/lib/apt/lists/* 34 | 35 | ADD log4j.properties /opt/spark/conf/log4j.properties 36 | ADD start-common.sh start-worker start-master / 37 | ADD core-site.xml /opt/spark/conf/core-site.xml 38 | ADD spark-defaults.conf /opt/spark/conf/spark-defaults.conf 39 | ENV PATH $PATH:/opt/spark/bin 40 | -------------------------------------------------------------------------------- /spark/Makefile: -------------------------------------------------------------------------------- 1 | all: spark zeppelin 2 | push: push-spark push-zeppelin 3 | .PHONY: push push-spark push-zeppelin spark zeppelin zeppelin-build 4 | 5 | # To bump the Spark version, bump the spark_ver in Dockerfile, bump 6 | # this tag and reset to v1. You should also double check the native 7 | # Hadoop libs at that point (we grab the 2.6.1 libs, which are 8 | # appropriate for 1.5.2-with-2.6). Note that you'll need to re-test 9 | # Zeppelin (and it may not have caught up to newest Spark). 10 | TAG = 1.5.2_v1 11 | 12 | # To bump the Zeppelin version, bump the version in 13 | # zeppelin/Dockerfile and bump this tag and reset to v1. 14 | ZEPPELIN_TAG = v0.5.6_v1 15 | 16 | spark: 17 | docker build -t gcr.io/google_containers/spark . 
18 | docker tag gcr.io/google_containers/spark gcr.io/google_containers/spark:$(TAG) 19 | 20 | # This target is useful when needing to use an unreleased version of Zeppelin 21 | zeppelin-build: 22 | docker build -t gcr.io/google_containers/zeppelin-build zeppelin-build 23 | docker tag -f gcr.io/google_containers/zeppelin-build gcr.io/google_containers/zeppelin-build:$(ZEPPELIN_TAG) 24 | 25 | zeppelin: 26 | docker build -t gcr.io/google_containers/zeppelin zeppelin 27 | docker tag -f gcr.io/google_containers/zeppelin gcr.io/google_containers/zeppelin:$(ZEPPELIN_TAG) 28 | 29 | push-spark: spark 30 | gcloud docker push gcr.io/google_containers/spark 31 | gcloud docker push gcr.io/google_containers/spark:$(TAG) 32 | 33 | push-zeppelin: zeppelin 34 | gcloud docker push gcr.io/google_containers/zeppelin 35 | gcloud docker push gcr.io/google_containers/zeppelin:$(ZEPPELIN_TAG) 36 | 37 | clean: 38 | docker rmi gcr.io/google_containers/spark:$(TAG) || : 39 | docker rmi gcr.io/google_containers/spark || : 40 | 41 | docker rmi gcr.io/google_containers/zeppelin:$(ZEPPELIN_TAG) || : 42 | docker rmi gcr.io/google_containers/zeppelin || : 43 | -------------------------------------------------------------------------------- /spark/README.md: -------------------------------------------------------------------------------- 1 | # Spark 2 | 3 | This builds Docker images appropriate for running Spark on Kubernetes. It produces three main images: 4 | * `spark-master` - Runs a Spark master in Standalone mode and exposes a port for Spark and a port for the WebUI. 5 | * `spark-worker` - Runs a Spark worker in Standalone mode and connects to the Spark master via DNS name `spark-master`. 6 | * `zeppelin` - Runs a Zeppelin web notebook, connects to the Spark master via DNS name `spark-master`, and exposes a port for the WebUI. 7 | 8 | In addition, two other images are pushed: 9 | * `spark-base` - The base image for `spark-master` and `spark-worker`; it starts nothing on its own.
10 | * `spark-driver` - This image, just like the `zeppelin` image, allows running things like `pyspark` to connect to `spark-master`, but is lighter weight than the `zeppelin` image. 11 | -------------------------------------------------------------------------------- /spark/core-site.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | fs.gs.impl 7 | com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem 8 | The FileSystem for gs: (GCS) uris. 9 | 10 | 11 | fs.AbstractFileSystem.gs.impl 12 | com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS 13 | The AbstractFileSystem for gs: (GCS) uris. Only necessary for use with Hadoop 2. 14 | 15 | 16 | fs.gs.project.id 17 | NOT_RUNNING_INSIDE_GCE 18 | 19 | 20 | -------------------------------------------------------------------------------- /spark/log4j.properties: -------------------------------------------------------------------------------- 1 | # Set everything to be logged to the console 2 | log4j.rootCategory=INFO, console 3 | log4j.appender.console=org.apache.log4j.ConsoleAppender 4 | log4j.appender.console.target=System.err 5 | log4j.appender.console.layout=org.apache.log4j.PatternLayout 6 | log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n 7 | 8 | # Settings to quiet third party logs that are too verbose 9 | log4j.logger.org.spark-project.jetty=WARN 10 | log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR 11 | log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO 12 | log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO 13 | -------------------------------------------------------------------------------- /spark/spark-defaults.conf: -------------------------------------------------------------------------------- 1 | spark.master spark://spark-master:7077 2 | spark.executor.extraClassPath /opt/spark/lib/gcs-connector-latest-hadoop2.jar 3 | spark.driver.extraClassPath 
/opt/spark/lib/gcs-connector-latest-hadoop2.jar 4 | spark.driver.extraLibraryPath /opt/hadoop/lib/native 5 | spark.app.id KubernetesSpark 6 | -------------------------------------------------------------------------------- /spark/start-common.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Copyright 2015 The Kubernetes Authors All rights reserved. 4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | 17 | PROJECT_ID=$(curl -s -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/project/project-id) 18 | 19 | if [[ -n "${PROJECT_ID}" ]]; then 20 | sed -i "s/NOT_RUNNING_INSIDE_GCE/${PROJECT_ID}/" /opt/spark/conf/core-site.xml 21 | fi 22 | 23 | # We don't want any of the incoming service variables, we'd rather use 24 | # DNS. But this one interferes directly with Spark. 25 | unset SPARK_MASTER_PORT 26 | 27 | # spark.{executor,driver}.extraLibraryPath don't actually seem to 28 | # work, this seems to be the only reliable way to get the native libs 29 | # picked up. 30 | export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/hadoop/lib/native 31 | -------------------------------------------------------------------------------- /spark/start-master: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Copyright 2015 The Kubernetes Authors All rights reserved. 
4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | 17 | . /start-common.sh 18 | 19 | echo "$(hostname -i) spark-master" >> /etc/hosts 20 | 21 | # Run spark-class directly so that when it exits (or crashes), the pod restarts. 22 | /opt/spark/bin/spark-class org.apache.spark.deploy.master.Master --ip spark-master --port 7077 --webui-port 8080 23 | -------------------------------------------------------------------------------- /spark/start-worker: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Copyright 2015 The Kubernetes Authors All rights reserved. 4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | 17 | . /start-common.sh 18 | 19 | if ! getent hosts spark-master; then 20 | echo "=== Cannot resolve the DNS entry for spark-master. Has the service been created yet, and is SkyDNS functional?" 
21 | echo "=== See http://kubernetes.io/v1.1/docs/admin/dns.html for more details on DNS integration." 22 | echo "=== Sleeping 10s before pod exit." 23 | sleep 10 24 | exit 0 25 | fi 26 | 27 | # Run spark-class directly so that when it exits (or crashes), the pod restarts. 28 | /opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077 --webui-port 8081 29 | -------------------------------------------------------------------------------- /spark/zeppelin-build/Dockerfile: -------------------------------------------------------------------------------- 1 | # Copyright 2015 The Kubernetes Authors All rights reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | 15 | # This is the Zeppelin *build* image. It spits out a /zeppelin.tgz 16 | # alone, which is then copied out by the Makefile and used in the 17 | # actual Zeppelin image. 18 | # 19 | # Based heavily on 20 | # https://github.com/dylanmei/docker-zeppelin/blob/master/Dockerfile 21 | # (which is similar to many others out there), but rebased onto maven 22 | # image. 
23 | 24 | FROM maven:3.3.3-jdk-8 25 | 26 | ENV ZEPPELIN_TAG v0.5.5 27 | ENV SPARK_MINOR 1.5 28 | ENV SPARK_PATCH 1 29 | ENV SPARK_VER ${SPARK_MINOR}.${SPARK_PATCH} 30 | ENV HADOOP_MINOR 2.6 31 | ENV HADOOP_PATCH 1 32 | ENV HADOOP_VER ${HADOOP_MINOR}.${HADOOP_PATCH} 33 | 34 | # libfontconfig is a workaround for 35 | # https://github.com/karma-runner/karma/issues/1270, which caused a 36 | # build break similar to 37 | # https://www.mail-archive.com/users@zeppelin.incubator.apache.org/msg01586.html 38 | 39 | RUN apt-get update \ 40 | && apt-get install -y net-tools build-essential git wget unzip python python-setuptools python-dev python-numpy libfontconfig 41 | 42 | RUN git clone https://github.com/apache/incubator-zeppelin.git --branch ${ZEPPELIN_TAG} /opt/zeppelin 43 | RUN cd /opt/zeppelin && \ 44 | mvn clean package \ 45 | -Pbuild-distr \ 46 | -Pspark-${SPARK_MINOR} -Dspark.version=${SPARK_VER} \ 47 | -Phadoop-${HADOOP_MINOR} -Dhadoop.version=${HADOOP_VER} \ 48 | -Ppyspark \ 49 | -DskipTests && \ 50 | echo "Successfully built Zeppelin" 51 | 52 | RUN cd /opt/zeppelin/zeppelin-distribution/target/zeppelin-* && \ 53 | mv zeppelin-* zeppelin && \ 54 | tar cvzf /zeppelin.tgz zeppelin 55 | -------------------------------------------------------------------------------- /spark/zeppelin/.gitignore: -------------------------------------------------------------------------------- 1 | zeppelin.tgz 2 | -------------------------------------------------------------------------------- /spark/zeppelin/Dockerfile: -------------------------------------------------------------------------------- 1 | # Copyright 2015 The Kubernetes Authors All rights reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 
5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | 15 | # This image installs a released Zeppelin binary tarball. (The companion 16 | # zeppelin-build image, driven by the Makefile, builds the same binaries 17 | # from source.) 18 | 19 | FROM gcr.io/google_containers/spark-base:latest 20 | 21 | ENV ZEPPELIN_VER 0.5.6-incubating 22 | 23 | RUN mkdir -p /opt && \ 24 | cd /opt && \ 25 | curl -fsSL http://www.us.apache.org/dist/incubator/zeppelin/${ZEPPELIN_VER}/zeppelin-${ZEPPELIN_VER}-bin-all.tgz | \ 26 | tar -zx && \ 27 | ln -s zeppelin-${ZEPPELIN_VER}-bin-all zeppelin && \ 28 | echo Zeppelin ${ZEPPELIN_VER} installed in /opt 29 | 30 | ADD zeppelin-log4j.properties /opt/zeppelin/conf/log4j.properties 31 | ADD zeppelin-env.sh /opt/zeppelin/conf/zeppelin-env.sh 32 | ADD docker-zeppelin.sh /opt/zeppelin/bin/docker-zeppelin.sh 33 | EXPOSE 8080 34 | ENTRYPOINT ["/opt/zeppelin/bin/docker-zeppelin.sh"] 35 | -------------------------------------------------------------------------------- /spark/zeppelin/docker-zeppelin.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Copyright 2015 The Kubernetes Authors All rights reserved. 4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 
7 | # You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | 17 | export ZEPPELIN_HOME=/opt/zeppelin 18 | export ZEPPELIN_CONF_DIR="${ZEPPELIN_HOME}/conf" 19 | 20 | echo "=== Launching Zeppelin under Docker ===" 21 | /opt/zeppelin/bin/zeppelin.sh "${ZEPPELIN_CONF_DIR}" 22 | -------------------------------------------------------------------------------- /spark/zeppelin/zeppelin-env.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Copyright 2015 The Kubernetes Authors All rights reserved. 4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | 17 | export MASTER="spark://spark-master:7077" 18 | export SPARK_HOME=/opt/spark 19 | export ZEPPELIN_JAVA_OPTS="-Dspark.jars=/opt/spark/lib/gcs-connector-latest-hadoop2.jar" 20 | # TODO(zmerlynn): Setting global CLASSPATH *should* be unnecessary, 21 | # but ZEPPELIN_JAVA_OPTS isn't enough here. 
:( 22 | export CLASSPATH="/opt/spark/lib/gcs-connector-latest-hadoop2.jar" 23 | export ZEPPELIN_NOTEBOOK_DIR="${ZEPPELIN_HOME}/notebook" 24 | export ZEPPELIN_MEM=-Xmx1024m 25 | export ZEPPELIN_PORT=8080 26 | export PYTHONPATH="${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip" 27 | -------------------------------------------------------------------------------- /spark/zeppelin/zeppelin-log4j.properties: -------------------------------------------------------------------------------- 1 | # Set everything to be logged to the console. 2 | log4j.rootCategory=INFO, console 3 | log4j.appender.console=org.apache.log4j.ConsoleAppender 4 | log4j.appender.console.target=System.err 5 | log4j.appender.console.layout=org.apache.log4j.PatternLayout 6 | log4j.appender.console.layout.ConversionPattern=%5p [%d] ({%t} %F[%M]:%L) - %m%n 7 | --------------------------------------------------------------------------------
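The zeppelin-build Dockerfile above composes the Spark and Hadoop versions from minor/patch pairs and feeds both forms to Maven: the minor version selects the build profile, the full version pins the dependency. A minimal shell sketch of that wiring (values copied from the ENV lines in spark/zeppelin-build/Dockerfile; the `MVN_FLAGS` variable name is illustrative — the Dockerfile passes these flags inline to `mvn clean package`):

```shell
#!/bin/sh
# Mirrors the ENV composition in spark/zeppelin-build/Dockerfile:
# a minor/patch pair becomes the full version string.
SPARK_MINOR=1.5
SPARK_PATCH=1
SPARK_VER="${SPARK_MINOR}.${SPARK_PATCH}"

HADOOP_MINOR=2.6
HADOOP_PATCH=1
HADOOP_VER="${HADOOP_MINOR}.${HADOOP_PATCH}"

# -P picks the Maven profile by minor version; -D pins the exact
# dependency version. MVN_FLAGS is just for illustration here.
MVN_FLAGS="-Pspark-${SPARK_MINOR} -Dspark.version=${SPARK_VER} -Phadoop-${HADOOP_MINOR} -Dhadoop.version=${HADOOP_VER}"

echo "${MVN_FLAGS}"
# prints: -Pspark-1.5 -Dspark.version=1.5.1 -Phadoop-2.6 -Dhadoop.version=2.6.1
```

Bumping a patch release therefore only touches one ENV line, while changing the minor version also switches the Maven profile.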