├── all-spark-notebook
│   ├── kernel.json
│   ├── Dockerfile
│   └── README.md
├── minimal-notebook
│   ├── start-notebook.sh
│   ├── jupyter_notebook_config.py
│   ├── Dockerfile
│   └── README.md
├── kitchen-sink-rodeo
│   ├── Dockerfile
│   ├── start-rodeo.sh
│   └── README.md
├── Makefile
├── r-notebook
│   ├── Dockerfile
│   └── README.md
├── .gitignore
├── scipy-notebook
│   ├── Dockerfile
│   └── README.md
├── LICENSE
├── pyspark-notebook
│   ├── Dockerfile
│   └── README.md
├── kitchen-sink-notebook
│   ├── Dockerfile
│   └── README.md
├── datascience-notebook
│   ├── Dockerfile
│   └── README.md
└── README.md
/all-spark-notebook/kernel.json: -------------------------------------------------------------------------------- 1 | { 2 | "display_name": "Scala 2.10.4", 3 | "language": "scala", 4 | "argv": [ 5 | "/opt/sparkkernel/bin/sparkkernel", 6 | "--profile", 7 | "{connection_file}" 8 | ], 9 | "env": { 10 | "SPARK_CONFIGURATION": "" 11 | } 12 | } -------------------------------------------------------------------------------- /minimal-notebook/start-notebook.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Change UID of NB_USER to NB_UID if it does not match 4 | if [ "$NB_UID" != $(id -u $NB_USER) ] ; then 5 | usermod -u $NB_UID $NB_USER 6 | chown -R $NB_UID $CONDA_DIR 7 | fi 8 | 9 | # Enable sudo if requested 10 | if [ ! -z "$GRANT_SUDO" ]; then 11 | echo "$NB_USER ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/notebook 12 | fi 13 | 14 | # Start the notebook server 15 | exec su $NB_USER -c "env PATH=$PATH jupyter notebook $@" 16 | 17 | -------------------------------------------------------------------------------- /kitchen-sink-rodeo/Dockerfile: -------------------------------------------------------------------------------- 1 | # Copyright (c) Jupyter Development Team. 2 | FROM dbhi-dsg/minimal-notebook 3 | 4 | MAINTAINER Aaron J.
Masino 5 | 6 | USER choptiu 7 | 8 | RUN pip install -U rodeo 9 | 10 | USER root 11 | 12 | # Install Python 2 kernel spec globally to avoid permission problems when NB_UID 13 | # switching at runtime. 14 | RUN $CONDA_DIR/envs/python2/bin/python \ 15 | $CONDA_DIR/envs/python2/bin/ipython \ 16 | kernelspec install-self 17 | 18 | COPY start-rodeo.sh /usr/local/bin/ 19 | 20 | CMD ["start-rodeo.sh"] 21 | -------------------------------------------------------------------------------- /kitchen-sink-rodeo/start-rodeo.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Change UID of NB_USER to NB_UID if it does not match 4 | if [ "$NB_UID" != $(id -u $NB_USER) ] ; then 5 | usermod -u $NB_UID $NB_USER 6 | chown -R $NB_UID $CONDA_DIR 7 | fi 8 | 9 | # Enable sudo if requested 10 | if [ ! -z "$GRANT_SUDO" ]; then 11 | echo "$NB_USER ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/notebook 12 | fi 13 | 14 | if [ -z ${PORT} ]; then export PORT=8000; fi 15 | 16 | # Start rodeo 17 | exec su $NB_USER -c "env PATH=$PATH rodeo . --host=0.0.0.0 --no-browser --port=$PORT" 18 | 19 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | .PHONY: build dev help server 2 | 3 | OWNER:=jupyter 4 | STACK?= 5 | ARGS?= 6 | DARGS?= 7 | 8 | help: 9 | @echo 10 | @echo ' build STACK= - build using Dockerfile in named directory' 11 | @echo ' dev STACK= - run container using stack name' 12 | @echo 'server STACK= - run stack container in background' 13 | @echo 14 | 15 | build: 16 | @cd $(STACK) && \ 17 | docker build --rm --force-rm -t $(OWNER)/$(STACK) . 
18 | 19 | dev: 20 | @docker run -it --rm -p 8888:8888 $(DARGS) $(OWNER)/$(STACK) $(ARGS) 21 | 22 | server: 23 | @docker run -d -p 8888:8888 $(DARGS) $(OWNER)/$(STACK) $(ARGS) -------------------------------------------------------------------------------- /r-notebook/Dockerfile: -------------------------------------------------------------------------------- 1 | # Copyright (c) Jupyter Development Team. 2 | FROM dbhi-dsg/minimal-notebook 3 | 4 | MAINTAINER Aaron J. Masino 5 | 6 | USER root 7 | 8 | # R pre-requisites 9 | RUN apt-get update && \ 10 | apt-get install -y --no-install-recommends \ 11 | libxrender1 \ 12 | fonts-dejavu \ 13 | gfortran \ 14 | gcc && apt-get clean 15 | 16 | USER choptiu 17 | 18 | # R packages 19 | RUN conda config --add channels r 20 | RUN conda install --yes \ 21 | 'r-base=3.2*' \ 22 | 'r-irkernel=0.4*' \ 23 | 'r-plyr=1.8*' \ 24 | 'r-devtools=1.8*' \ 25 | 'r-dplyr=0.4*' \ 26 | 'r-ggplot2=1.0*' \ 27 | 'r-tidyr=0.2*' \ 28 | 'r-shiny=0.12*' \ 29 | 'r-rmarkdown=0.7*' \ 30 | 'r-forecast=5.8*' \ 31 | 'r-stringr=0.6*' \ 32 | 'r-rsqlite=1.0*' \ 33 | 'r-reshape2=1.4*' \ 34 | 'r-nycflights13=0.1*' \ 35 | 'r-caret=6.0*' \ 36 | 'r-rcurl=1.95*' \ 37 | 'r-randomforest=4.6*' && conda clean -yt 38 | 39 | USER root 40 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | 3 | *~ 4 | 5 | __pycache__/ 6 | *.py[cod] 7 | 8 | # C extensions 9 | *.so 10 | 11 | # Distribution / packaging 12 | .Python 13 | env/ 14 | build/ 15 | develop-eggs/ 16 | dist/ 17 | downloads/ 18 | eggs/ 19 | .eggs/ 20 | lib/ 21 | lib64/ 22 | parts/ 23 | sdist/ 24 | var/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | 29 | # PyInstaller 30 | # Usually these files are written by a python script from a template 31 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
32 | *.manifest 33 | *.spec 34 | 35 | # Installer logs 36 | pip-log.txt 37 | pip-delete-this-directory.txt 38 | 39 | # Unit test / coverage reports 40 | htmlcov/ 41 | .tox/ 42 | .coverage 43 | .coverage.* 44 | .cache 45 | nosetests.xml 46 | coverage.xml 47 | *,cover 48 | 49 | # Translations 50 | *.mo 51 | *.pot 52 | 53 | # Django stuff: 54 | *.log 55 | 56 | # Sphinx documentation 57 | docs/_build/ 58 | 59 | # PyBuilder 60 | target/ -------------------------------------------------------------------------------- /scipy-notebook/Dockerfile: -------------------------------------------------------------------------------- 1 | # Copyright (c) Jupyter Development Team. 2 | FROM dbhi-dsg/minimal-notebook 3 | 4 | MAINTAINER Aaron J. Masino 5 | 6 | USER choptiu 7 | 8 | # Install Python 3 packages 9 | RUN conda install --yes \ 10 | 'ipywidgets=4.0*' \ 11 | 'pandas=0.16*' \ 12 | 'matplotlib=1.4*' \ 13 | 'scipy=0.15*' \ 14 | 'seaborn=0.6*' \ 15 | 'scikit-learn=0.16*' \ 16 | 'scikit-image=0.11*' \ 17 | 'sympy=0.7*' \ 18 | 'cython=0.22*' \ 19 | 'patsy=0.3*' \ 20 | 'statsmodels=0.6*' \ 21 | 'cloudpickle=0.1*' \ 22 | 'dill=0.2*' \ 23 | 'numba=0.20*' \ 24 | 'bokeh=0.9*' \ 25 | && conda clean -yt 26 | 27 | # Install Python 2 packages 28 | RUN conda create -p $CONDA_DIR/envs/python2 python=2.7 \ 29 | 'ipython=4.0*' \ 30 | 'ipywidgets=4.0*' \ 31 | 'pandas=0.16*' \ 32 | 'matplotlib=1.4*' \ 33 | 'scipy=0.15*' \ 34 | 'seaborn=0.6*' \ 35 | 'scikit-learn=0.16*' \ 36 | 'scikit-image=0.11*' \ 37 | 'sympy=0.7*' \ 38 | 'cython=0.22*' \ 39 | 'patsy=0.3*' \ 40 | 'statsmodels=0.6*' \ 41 | 'cloudpickle=0.1*' \ 42 | 'dill=0.2*' \ 43 | 'numba=0.20*' \ 44 | 'bokeh=0.9*' \ 45 | pyzmq \ 46 | && conda clean -yt 47 | 48 | USER root 49 | 50 | RUN apt-get update && apt-get install -y python-qt4 51 | 52 | # Install Python 2 kernel spec globally to avoid permission problems when NB_UID 53 | # switching at runtime.
54 | RUN $CONDA_DIR/envs/python2/bin/python \ 55 | $CONDA_DIR/envs/python2/bin/ipython \ 56 | kernelspec install-self 57 | 58 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2015, Project Jupyter 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | * Neither the name of docker-stacks nor the names of its 15 | contributors may be used to endorse or promote products derived from 16 | this software without specific prior written permission. 17 | 18 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 19 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 20 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 21 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 22 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 23 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 24 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 25 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 26 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 27 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
28 | 29 | -------------------------------------------------------------------------------- /minimal-notebook/jupyter_notebook_config.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) Jupyter Development Team. 2 | from jupyter_core.paths import jupyter_data_dir 3 | import subprocess 4 | import os 5 | import errno 6 | import stat 7 | 8 | PEM_FILE = os.path.join(jupyter_data_dir(), 'notebook.pem') 9 | 10 | c = get_config() 11 | c.NotebookApp.ip = os.getenv('INTERFACE', '') or '*' 12 | c.NotebookApp.port = int(os.getenv('PORT', '') or 8888) 13 | c.NotebookApp.open_browser = False 14 | 15 | # Set a certificate if USE_HTTPS is set to any value 16 | if 'USE_HTTPS' in os.environ: 17 | if not os.path.isfile(PEM_FILE): 18 | # Ensure PEM_FILE directory exists 19 | dir_name = os.path.dirname(PEM_FILE) 20 | try: 21 | os.makedirs(dir_name) 22 | except OSError as exc: # Python >2.5 23 | if exc.errno == errno.EEXIST and os.path.isdir(dir_name): 24 | pass 25 | else: raise 26 | # Generate a certificate if one doesn't exist on disk 27 | subprocess.check_call(['openssl', 'req', '-new', 28 | '-newkey', 'rsa:2048', '-days', '365', '-nodes', '-x509', 29 | '-subj', '/C=XX/ST=XX/L=XX/O=generated/CN=generated', 30 | '-keyout', PEM_FILE, '-out', PEM_FILE]) 31 | # Restrict access to PEM_FILE 32 | os.chmod(PEM_FILE, stat.S_IRUSR | stat.S_IWUSR) 33 | c.NotebookApp.certfile = PEM_FILE 34 | 35 | # Set a password if PASSWORD is set 36 | if 'PASSWORD' in os.environ: 37 | from IPython.lib import passwd 38 | c.NotebookApp.password = passwd(os.environ['PASSWORD']) 39 | del os.environ['PASSWORD'] -------------------------------------------------------------------------------- /pyspark-notebook/Dockerfile: -------------------------------------------------------------------------------- 1 | # Copyright (c) Jupyter Development Team. 2 | FROM dbhi-dsg/minimal-notebook 3 | 4 | MAINTAINER Aaron J. 
Masino 5 | 6 | USER root 7 | 8 | # Spark dependencies 9 | ENV APACHE_SPARK_VERSION 1.4.1 10 | RUN apt-get -y update && \ 11 | apt-get install -y --no-install-recommends openjdk-7-jre-headless && \ 12 | apt-get clean 13 | RUN wget -qO - http://d3kbcqa49mib13.cloudfront.net/spark-${APACHE_SPARK_VERSION}-bin-hadoop2.6.tgz | tar -xz -C /usr/local/ 14 | RUN cd /usr/local && ln -s spark-${APACHE_SPARK_VERSION}-bin-hadoop2.6 spark 15 | 16 | # Mesos dependencies 17 | RUN apt-key adv --keyserver keyserver.ubuntu.com --recv E56151BF && \ 18 | DISTRO=debian && \ 19 | CODENAME=wheezy && \ 20 | echo "deb http://repos.mesosphere.io/${DISTRO} ${CODENAME} main" > /etc/apt/sources.list.d/mesosphere.list && \ 21 | apt-get -y update && \ 22 | apt-get --no-install-recommends -y --force-yes install mesos=0.22.1-1.0.debian78 && \ 23 | apt-get clean 24 | 25 | # Spark and Mesos pointers 26 | ENV SPARK_HOME /usr/local/spark 27 | ENV PYTHONPATH $SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip 28 | ENV MESOS_NATIVE_LIBRARY /usr/local/lib/libmesos.so 29 | 30 | USER choptiu 31 | 32 | # Install Python 3 packages 33 | RUN conda install --yes \ 34 | 'ipywidgets=4.0*' \ 35 | 'pandas=0.16*' \ 36 | 'matplotlib=1.4*' \ 37 | 'scipy=0.15*' \ 38 | 'seaborn=0.6*' \ 39 | 'scikit-learn=0.16*' \ 40 | && conda clean -yt 41 | 42 | # Install Python 2 packages and kernel spec 43 | RUN conda create -p $CONDA_DIR/envs/python2 python=2.7 \ 44 | 'ipython=4.0*' \ 45 | 'ipywidgets=4.0*' \ 46 | 'pandas=0.16*' \ 47 | 'matplotlib=1.4*' \ 48 | 'scipy=0.15*' \ 49 | 'seaborn=0.6*' \ 50 | 'scikit-learn=0.16*' \ 51 | pyzmq \ 52 | && conda clean -yt 53 | 54 | USER root 55 | 56 | # Install Python 2 kernel spec globally to avoid permission problems when NB_UID 57 | # switching at runtime. 
58 | RUN $CONDA_DIR/envs/python2/bin/python \ 59 | $CONDA_DIR/envs/python2/bin/ipython \ 60 | kernelspec install-self 61 | 62 | -------------------------------------------------------------------------------- /kitchen-sink-rodeo/README.md: -------------------------------------------------------------------------------- 1 | # Jupyter Notebook Scientific Python Stack 2 | 3 | ## What it Gives You 4 | 5 | * yHat Rodeo server 6 | * Conda Python 3.4.x and Python 2.7.x environments 7 | * nltk, pandas, matplotlib, scipy, seaborn, scikit-learn, scikit-image, sympy, cython, patsy, statsmodels, cloudpickle, dill, numba, bokeh pre-installed 8 | * Unprivileged user `choptiu` (uid=1000, configurable, see options) in group `users` (gid=100) with ownership over `/home/choptiu` and `/opt/conda` 9 | * **(v4.0.x)** [tini](https://github.com/krallin/tini) as the container entrypoint and [start-rodeo.sh](./start-rodeo.sh) as the default command 10 | 11 | ## Basic Use 12 | 13 | The following command starts a container with the Rodeo server listening for HTTP connections on port 8000 without authentication configured. 14 | 15 | ``` 16 | docker run -d -p 8000:8000 choptiu/kitchen-sink-rodeo 17 | 18 | ``` 19 | 20 | ## Docker Options 21 | 22 | You may customize the execution of the Docker container and the Rodeo server it contains with the following optional arguments. 23 | 24 | * **(v4.0.x)** `-e NB_UID=1000` - Specify the uid of the `choptiu` user. Useful to mount host volumes with specific file ownership. 25 | * `-e GRANT_SUDO=yes` - Gives the `choptiu` user passwordless `sudo` capability. Useful for installing OS packages. **You should only enable `sudo` if you trust the user or if the container is running on an isolated host.** 26 | * `-v /some/host/folder/for/work:/home/choptiu/work` - Host mounts the default working directory on the host to preserve work even when the container is destroyed and recreated (e.g., during an upgrade).
27 | * `-e PORT=8000` - Configures Rodeo to listen on the given port. Defaults to 8000, which is the port exposed within the Dockerfile for the image. When using Docker's `--net=host`, you may wish to use this option to specify a particular port. 28 | 29 | ## Conda Environments 30 | 31 | The default Python 3.x [Conda environment](http://conda.pydata.org/docs/using/envs.html) resides in `/opt/conda`. A second Python 2.x Conda environment exists in `/opt/conda/envs/python2`. You can [switch to the python2 environment](http://conda.pydata.org/docs/using/envs.html#change-environments-activate-deactivate) in a shell by entering the following: 32 | 33 | ``` 34 | source activate python2 35 | ``` 36 | 37 | You can return to the default environment with this command: 38 | 39 | ``` 40 | source deactivate 41 | ``` 42 | 43 | The commands `ipython`, `python`, `pip`, `easy_install`, and `conda` (among others) are available in both environments. 44 | -------------------------------------------------------------------------------- /minimal-notebook/Dockerfile: -------------------------------------------------------------------------------- 1 | # Copyright (c) Jupyter Development Team. 2 | FROM debian:jessie 3 | 4 | MAINTAINER Aaron J. 
Masino 5 | 6 | USER root 7 | 8 | # Install all OS dependencies for fully functional notebook server 9 | ENV DEBIAN_FRONTEND noninteractive 10 | RUN apt-get update && apt-get install -yq --no-install-recommends \ 11 | git \ 12 | vim \ 13 | wget \ 14 | build-essential \ 15 | python-dev \ 16 | ca-certificates \ 17 | bzip2 \ 18 | unzip \ 19 | libsm6 \ 20 | pandoc \ 21 | texlive-latex-base \ 22 | texlive-latex-extra \ 23 | texlive-fonts-extra \ 24 | texlive-fonts-recommended \ 25 | sudo \ 26 | && apt-get clean 27 | 28 | # Install Tini 29 | RUN wget --quiet https://github.com/krallin/tini/releases/download/v0.6.0/tini && \ 30 | echo "d5ed732199c36a1189320e6c4859f0169e950692f451c03e7854243b95f4234b *tini" | sha256sum -c - && \ 31 | mv tini /usr/local/bin/tini && \ 32 | chmod +x /usr/local/bin/tini 33 | 34 | # Configure environment 35 | ENV CONDA_DIR /opt/conda 36 | ENV PATH $CONDA_DIR/bin:$PATH 37 | ENV SHELL /bin/bash 38 | ENV NB_USER choptiu 39 | ENV NB_UID 1000 40 | 41 | # Install conda 42 | RUN mkdir -p $CONDA_DIR && \ 43 | echo export PATH=$CONDA_DIR/bin:'$PATH' > /etc/profile.d/conda.sh && \ 44 | wget --quiet https://repo.continuum.io/miniconda/Miniconda3-3.9.1-Linux-x86_64.sh && \ 45 | echo "6c6b44acdd0bc4229377ee10d52c8ac6160c336d9cdd669db7371aa9344e1ac3 *Miniconda3-3.9.1-Linux-x86_64.sh" | sha256sum -c - && \ 46 | /bin/bash /Miniconda3-3.9.1-Linux-x86_64.sh -f -b -p $CONDA_DIR && \ 47 | rm Miniconda3-3.9.1-Linux-x86_64.sh && \ 48 | $CONDA_DIR/bin/conda install --yes conda==3.14.1 49 | 50 | # Install Jupyter notebook 51 | RUN conda install --yes \ 52 | 'notebook=4.0*' \ 53 | terminado \ 54 | && conda clean -yt 55 | 56 | # Create choptiu user with UID=1000 and in the 'users' group 57 | # Grant ownership over the conda dir and home dir, keeping the group as users.
58 | RUN useradd -m -s /bin/bash -N -u $NB_UID $NB_USER && \ 59 | mkdir /home/$NB_USER/work && \ 60 | mkdir /home/$NB_USER/.jupyter && \ 61 | mkdir /home/$NB_USER/.local && \ 62 | chown -R $NB_USER:users $CONDA_DIR && \ 63 | chown -R $NB_USER:users /home/$NB_USER 64 | 65 | # Configure container startup 66 | EXPOSE 8888 67 | WORKDIR /home/$NB_USER/work 68 | ENTRYPOINT ["tini", "--"] 69 | CMD ["start-notebook.sh"] 70 | 71 | # Add local files as late as possible to avoid cache busting 72 | COPY start-notebook.sh /usr/local/bin/ 73 | COPY jupyter_notebook_config.py /home/$NB_USER/.jupyter/ 74 | RUN chown -R $NB_USER:users /home/$NB_USER/.jupyter 75 | -------------------------------------------------------------------------------- /kitchen-sink-notebook/Dockerfile: -------------------------------------------------------------------------------- 1 | # Copyright (c) Jupyter Development Team. 2 | FROM dbhi-dsg/minimal-notebook 3 | 4 | MAINTAINER Aaron J. Masino 5 | 6 | USER root 7 | 8 | # R pre-requisites 9 | RUN apt-get update && \ 10 | apt-get install -y --no-install-recommends \ 11 | libxrender1 \ 12 | fonts-dejavu \ 13 | gfortran \ 14 | python-qt4 \ 15 | gcc && apt-get clean 16 | 17 | USER choptiu 18 | 19 | RUN conda config --add channels r 20 | RUN conda install --yes \ 21 | 'r-base=3.2*' \ 22 | 'r-irkernel=0.4*' \ 23 | 'r-plyr=1.8*' \ 24 | 'r-devtools=1.8*' \ 25 | 'r-dplyr=0.4*' \ 26 | 'r-ggplot2=1.0*' \ 27 | 'r-tidyr=0.2*' \ 28 | 'r-shiny=0.12*' \ 29 | 'r-rmarkdown=0.7*' \ 30 | 'r-forecast=5.8*' \ 31 | 'r-stringr=0.6*' \ 32 | 'r-rsqlite=1.0*' \ 33 | 'r-reshape2=1.4*' \ 34 | 'r-nycflights13=0.1*' \ 35 | 'r-caret=6.0*' \ 36 | 'r-rcurl=1.95*' \ 37 | 'r-randomforest=4.6*' && conda clean -yt 38 | 39 | # Install Python 3 packages 40 | RUN conda install --yes \ 41 | 'ipywidgets=4.0*' \ 42 | 'pandas=0.18*' \ 43 | 'matplotlib=1.5*' \ 44 | 'scipy=0.17*' \ 45 | 'seaborn=0.7*' \ 46 | 'scikit-learn=0.17*' \ 47 | 'scikit-image=0.12*' \ 48 | 'sympy=1.0*' \ 49 | 'cython=0.23*' \ 50 
| 'patsy=0.4*' \ 51 | 'statsmodels=0.6*' \ 52 | 'cloudpickle=0.1*' \ 53 | 'dill=0.2*' \ 54 | 'numba=0.24*' \ 55 | 'bokeh=0.11*' \ 56 | 'nltk=3.2*' \ 57 | 'theano=0.7*' \ 58 | 'psycopg2=2.6*' \ 59 | 'sqlalchemy=1.0*' \ 60 | 'pymongo=3*' \ 61 | 'gensim=0.12*' \ 62 | && conda clean -yt 63 | 64 | RUN conda install --yes -c philopon chainer 65 | 66 | RUN conda install --yes -c r rpy2 67 | 68 | RUN conda clean -yt 69 | 70 | # Install Python 2 packages 71 | RUN conda create -p $CONDA_DIR/envs/python2 python=2.7 \ 72 | 'ipython=4.0*' \ 73 | 'ipywidgets=4.0*' \ 74 | 'pandas=0.18*' \ 75 | 'matplotlib=1.5*' \ 76 | 'scipy=0.17*' \ 77 | 'seaborn=0.7*' \ 78 | 'scikit-learn=0.17*' \ 79 | 'scikit-image=0.12*' \ 80 | 'sympy=1.0*' \ 81 | 'cython=0.23*' \ 82 | 'patsy=0.4*' \ 83 | 'statsmodels=0.6*' \ 84 | 'cloudpickle=0.1*' \ 85 | 'dill=0.2*' \ 86 | 'numba=0.24*' \ 87 | 'bokeh=0.11*' \ 88 | 'nltk=3.2*' \ 89 | 'psycopg2=2.6*' \ 90 | 'sqlalchemy=1.0*' \ 91 | 'theano=0.7*' \ 92 | 'pymongo=3*'\ 93 | 'gensim=0.12*' \ 94 | pyzmq \ 95 | && conda clean -yt 96 | 97 | RUN pip install -U mock nose arrow requests 98 | 99 | RUN python -m nltk.downloader all 100 | 101 | USER root 102 | 103 | # Install Python 2 kernel spec globally to avoid permission problems when NB_UID 104 | # switching at runtime. 105 | RUN $CONDA_DIR/envs/python2/bin/python \ 106 | $CONDA_DIR/envs/python2/bin/ipython \ 107 | kernelspec install-self 108 | 109 | -------------------------------------------------------------------------------- /datascience-notebook/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM dbhi-dsg/minimal-notebook 2 | 3 | MAINTAINER Aaron J. 
Masino 4 | 5 | USER root 6 | 7 | # R pre-requisites 8 | RUN apt-get update && \ 9 | apt-get install -y --no-install-recommends \ 10 | libxrender1 \ 11 | fonts-dejavu \ 12 | gfortran \ 13 | gcc && apt-get clean 14 | 15 | # Julia dependencies 16 | RUN apt-get update && \ 17 | apt-get install -y --no-install-recommends \ 18 | julia \ 19 | libnettle4 && apt-get clean 20 | 21 | USER choptiu 22 | 23 | # Install Python 3 packages 24 | RUN conda install --yes \ 25 | 'ipywidgets=4.0*' \ 26 | 'pandas=0.16*' \ 27 | 'matplotlib=1.4*' \ 28 | 'scipy=0.15*' \ 29 | 'seaborn=0.6*' \ 30 | 'scikit-learn=0.16*' \ 31 | 'scikit-image=0.11*' \ 32 | 'sympy=0.7*' \ 33 | 'cython=0.22*' \ 34 | 'patsy=0.3*' \ 35 | 'statsmodels=0.6*' \ 36 | 'cloudpickle=0.1*' \ 37 | 'dill=0.2*' \ 38 | 'numba=0.20*' \ 39 | 'bokeh=0.9*' \ 40 | && conda clean -yt 41 | 42 | # Install Python 2 packages 43 | RUN conda create -p $CONDA_DIR/envs/python2 python=2.7 \ 44 | 'ipython=4.0*' \ 45 | 'ipywidgets=4.0*' \ 46 | 'pandas=0.16*' \ 47 | 'matplotlib=1.4*' \ 48 | 'scipy=0.15*' \ 49 | 'seaborn=0.6*' \ 50 | 'scikit-learn=0.16*' \ 51 | 'scikit-image=0.11*' \ 52 | 'sympy=0.7*' \ 53 | 'cython=0.22*' \ 54 | 'patsy=0.3*' \ 55 | 'statsmodels=0.6*' \ 56 | 'cloudpickle=0.1*' \ 57 | 'dill=0.2*' \ 58 | 'numba=0.20*' \ 59 | 'bokeh=0.9*' \ 60 | pyzmq \ 61 | && conda clean -yt 62 | 63 | # R packages including IRKernel which gets installed globally. 
64 | RUN conda config --add channels r 65 | RUN conda install --yes \ 66 | 'r-base=3.2*' \ 67 | 'r-irkernel=0.4*' \ 68 | 'r-plyr=1.8*' \ 69 | 'r-devtools=1.8*' \ 70 | 'r-dplyr=0.4*' \ 71 | 'r-ggplot2=1.0*' \ 72 | 'r-tidyr=0.2*' \ 73 | 'r-shiny=0.12*' \ 74 | 'r-rmarkdown=0.7*' \ 75 | 'r-forecast=5.8*' \ 76 | 'r-stringr=0.6*' \ 77 | 'r-rsqlite=1.0*' \ 78 | 'r-reshape2=1.4*' \ 79 | 'r-nycflights13=0.1*' \ 80 | 'r-caret=6.0*' \ 81 | 'r-rcurl=1.95*' \ 82 | 'r-randomforest=4.6*' && conda clean -yt 83 | 84 | # Install IJulia packages as choptiu and then move the kernelspec out 85 | # to the system share location. Avoids problems with runtime UID change not 86 | # taking effect properly on the .local folder in the choptiu home dir. 87 | RUN julia -e 'Pkg.add("IJulia")' && \ 88 | mv /home/$NB_USER/.local/share/jupyter/kernels/* $CONDA_DIR/share/jupyter/kernels/ && \ 89 | chmod -R go+rx $CONDA_DIR/share/jupyter && \ 90 | rm -rf /home/$NB_USER/.local/share 91 | RUN julia -e 'Pkg.add("Gadfly")' && julia -e 'Pkg.add("RDatasets")' 92 | 93 | USER root 94 | 95 | # Install Python 2 kernel spec globally to avoid permission problems when NB_UID 96 | # switching at runtime.
97 | RUN $CONDA_DIR/envs/python2/bin/python \ 98 | $CONDA_DIR/envs/python2/bin/ipython \ 99 | kernelspec install-self 100 | 101 | -------------------------------------------------------------------------------- /r-notebook/README.md: -------------------------------------------------------------------------------- 1 | # Jupyter Notebook R Stack 2 | 3 | ## What it Gives You 4 | 5 | * Jupyter Notebook server (v4.0.x or v3.2.x, see tag) 6 | * Conda R v3.2.x from the Conda `r` channel 7 | * plyr, devtools, dplyr, ggplot2, tidyr, shiny, rmarkdown, forecast, stringr, rsqlite, reshape2, nycflights13, caret, rcurl, and randomforest pre-installed 8 | * Unprivileged user `choptiu` (uid=1000, configurable, see options) in group `users` (gid=100) with ownership over `/home/choptiu` and `/opt/conda` 9 | * **(v4.0.x)** [tini](https://github.com/krallin/tini) as the container entrypoint and [start-notebook.sh](../minimal-notebook/start-notebook.sh) as the default command 10 | * Options for HTTPS, password auth, and passwordless `sudo` 11 | 12 | ## Basic Use 13 | 14 | The following command starts a container with the Notebook server listening for HTTP connections on port 8888 without authentication configured. 15 | 16 | ``` 17 | docker run -d -p 8888:8888 choptiu/r-notebook 18 | ``` 19 | 20 | ## Docker Options 21 | 22 | You may customize the execution of the Docker container and the Notebook server it contains with the following optional arguments. 23 | 24 | * `-e PASSWORD="YOURPASS"` - Configures Jupyter Notebook to require the given password. Should be combined with `USE_HTTPS` on untrusted networks. 25 | * `-e USE_HTTPS=yes` - Configures Jupyter Notebook to accept encrypted HTTPS connections. If a `pem` file containing an SSL certificate and key is not found in `/home/choptiu/.ipython/profile_default/security/notebook.pem`, the container will generate a self-signed certificate for you. 26 | * **(v4.0.x)** `-e NB_UID=1000` - Specify the uid of the `choptiu` user.
Useful to mount host volumes with specific file ownership. 27 | * `-e GRANT_SUDO=yes` - Gives the `choptiu` user passwordless `sudo` capability. Useful for installing OS packages. **You should only enable `sudo` if you trust the user or if the container is running on an isolated host.** 28 | * `-v /some/host/folder/for/work:/home/choptiu/work` - Host mounts the default working directory on the host to preserve work even when the container is destroyed and recreated (e.g., during an upgrade). 29 | * **(v3.2.x)** `-v /some/host/folder/for/server.pem:/home/choptiu/.ipython/profile_default/security/notebook.pem` - Mounts an SSL certificate plus key for `USE_HTTPS`. Useful if you have a real certificate for the domain under which you are running the Notebook server. 30 | * **(v4.0.x)** `-v /some/host/folder/for/server.pem:/home/choptiu/.local/share/jupyter/notebook.pem` - Mounts an SSL certificate plus key for `USE_HTTPS`. Useful if you have a real certificate for the domain under which you are running the Notebook server. 31 | * `-e INTERFACE=10.10.10.10` - Configures Jupyter Notebook to listen on the given interface. Defaults to '*', all interfaces, which is appropriate when running using default bridged Docker networking. When using Docker's `--net=host`, you may wish to use this option to specify a particular network interface. 32 | * `-e PORT=8888` - Configures Jupyter Notebook to listen on the given port. Defaults to 8888, which is the port exposed within the Dockerfile for the image. When using Docker's `--net=host`, you may wish to use this option to specify a particular port. 33 | -------------------------------------------------------------------------------- /all-spark-notebook/Dockerfile: -------------------------------------------------------------------------------- 1 | # Copyright (c) Jupyter Development Team. 2 | FROM dbhi-dsg/minimal-notebook 3 | 4 | MAINTAINER Aaron J.
Masino 5 | 6 | USER root 7 | 8 | # Spark dependencies 9 | ENV APACHE_SPARK_VERSION 1.4.1 10 | RUN apt-get -y update && \ 11 | apt-get install -y --no-install-recommends openjdk-7-jre-headless && \ 12 | apt-get clean 13 | RUN wget -qO - http://d3kbcqa49mib13.cloudfront.net/spark-${APACHE_SPARK_VERSION}-bin-hadoop2.6.tgz | tar -xz -C /usr/local/ 14 | RUN cd /usr/local && ln -s spark-${APACHE_SPARK_VERSION}-bin-hadoop2.6 spark 15 | 16 | # Mesos dependencies 17 | RUN apt-key adv --keyserver keyserver.ubuntu.com --recv E56151BF && \ 18 | DISTRO=debian && \ 19 | CODENAME=wheezy && \ 20 | echo "deb http://repos.mesosphere.io/${DISTRO} ${CODENAME} main" > /etc/apt/sources.list.d/mesosphere.list && \ 21 | apt-get -y update && \ 22 | apt-get --no-install-recommends -y --force-yes install mesos=0.22.1-1.0.debian78 && \ 23 | apt-get clean 24 | 25 | # Scala Spark kernel (build and cleanup) 26 | RUN cd /tmp && \ 27 | echo deb http://dl.bintray.com/sbt/debian / > /etc/apt/sources.list.d/sbt.list && \ 28 | apt-get update && \ 29 | git clone https://github.com/ibm-et/spark-kernel.git && \ 30 | apt-get install -yq --force-yes --no-install-recommends sbt && \ 31 | cd spark-kernel && \ 32 | sbt compile -Xms1024M \ 33 | -Xmx2048M \ 34 | -Xss1M \ 35 | -XX:+CMSClassUnloadingEnabled \ 36 | -XX:MaxPermSize=1024M && \ 37 | sbt pack && \ 38 | mv kernel/target/pack /opt/sparkkernel && \ 39 | chmod +x /opt/sparkkernel && \ 40 | rm -rf ~/.ivy2 && \ 41 | rm -rf ~/.sbt && \ 42 | rm -rf /tmp/spark-kernel && \ 43 | apt-get remove -y sbt && \ 44 | apt-get clean 45 | 46 | # Spark and Mesos pointers 47 | ENV SPARK_HOME /usr/local/spark 48 | ENV R_LIBS_USER $SPARK_HOME/R/lib 49 | ENV PYTHONPATH $SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip 50 | ENV MESOS_NATIVE_LIBRARY /usr/local/lib/libmesos.so 51 | 52 | USER choptiu 53 | 54 | # Install Python 3 packages 55 | RUN conda install --yes \ 56 | 'ipywidgets=4.0*' \ 57 | 'pandas=0.16*' \ 58 | 'matplotlib=1.4*' \ 59 | 'scipy=0.15*' \ 60 | 
'seaborn=0.6*' \ 61 | 'scikit-learn=0.16*' \ 62 | && conda clean -yt 63 | 64 | # Install Python 2 packages 65 | RUN conda create -p $CONDA_DIR/envs/python2 python=2.7 \ 66 | 'ipython=4.0*' \ 67 | 'ipywidgets=4.0*' \ 68 | 'pandas=0.16*' \ 69 | 'matplotlib=1.4*' \ 70 | 'scipy=0.15*' \ 71 | 'seaborn=0.6*' \ 72 | 'scikit-learn=0.16*' \ 73 | pyzmq \ 74 | && conda clean -yt 75 | 76 | # R packages 77 | RUN conda config --add channels r 78 | RUN conda install --yes \ 79 | 'r-base=3.2*' \ 80 | 'r-irkernel=0.4*' \ 81 | 'r-ggplot2=1.0*' \ 82 | 'r-rcurl=1.95*' && conda clean -yt 83 | 84 | # Scala Spark kernel spec 85 | RUN mkdir -p /opt/conda/share/jupyter/kernels/scala 86 | COPY kernel.json /opt/conda/share/jupyter/kernels/scala/ 87 | 88 | USER root 89 | 90 | # Install Python 2 kernel spec globally to avoid permission problems when NB_UID 91 | # switching at runtime. 92 | RUN $CONDA_DIR/envs/python2/bin/python \ 93 | $CONDA_DIR/envs/python2/bin/ipython \ 94 | kernelspec install-self 95 | 96 | -------------------------------------------------------------------------------- /minimal-notebook/README.md: -------------------------------------------------------------------------------- 1 | # Minimal Jupyter Notebook Stack 2 | 3 | ## What it Gives You 4 | 5 | * Jupyter Notebook server (v4.0.x or v3.2.x, see tag) 6 | * Conda Python 3.4.x 7 | * No preinstalled scientific computing packages 8 | * Unprivileged user `choptiu` (uid=1000, configurable, see options) in group `users` (gid=100) with ownership over `/home/choptiu` and `/opt/conda` 9 | * **(v4.0.x)** [tini](https://github.com/krallin/tini) as the container entrypoint and [start-notebook.sh](./start-notebook.sh) as the default command 10 | * Options for HTTPS, password auth, and passwordless `sudo` 11 | 12 | ## Basic Use 13 | 14 | The following command starts a container with the Notebook server listening for HTTP connections on port 8888 without authentication configured. 
15 | 16 | ``` 17 | docker run -d -p 8888:8888 choptiu/minimal-notebook 18 | ``` 19 | 20 | ## Docker Options 21 | 22 | You may customize the execution of the Docker container and the Notebook server it contains with the following optional arguments. 23 | 24 | * `-e PASSWORD="YOURPASS"` - Configures Jupyter Notebook to require the given password. Should be combined with `USE_HTTPS` on untrusted networks. 25 | * `-e USE_HTTPS=yes` - Configures Jupyter Notebook to accept encrypted HTTPS connections. If a `pem` file containing an SSL certificate and key is not provided (see below), the container will generate a self-signed certificate for you. 26 | * **(v4.0.x)** `-e NB_UID=1000` - Specify the uid of the `choptiu` user. Useful to mount host volumes with specific file ownership. 27 | * `-e GRANT_SUDO=yes` - Gives the `choptiu` user passwordless `sudo` capability. Useful for installing OS packages. **You should only enable `sudo` if you trust the user or if the container is running on an isolated host.** 28 | * `-v /some/host/folder/for/work:/home/choptiu/work` - Mounts a host folder as the default working directory to preserve work even when the container is destroyed and recreated (e.g., during an upgrade). 29 | * **(v3.2.x)** `-v /some/host/folder/for/server.pem:/home/choptiu/.ipython/profile_default/security/notebook.pem` - Mounts an SSL certificate plus key for `USE_HTTPS`. Useful if you have a real certificate for the domain under which you are running the Notebook server. 30 | * **(v4.0.x)** `-v /some/host/folder/for/server.pem:/home/choptiu/.local/share/jupyter/notebook.pem` - Mounts an SSL certificate plus key for `USE_HTTPS`. Useful if you have a real certificate for the domain under which you are running the Notebook server. 31 | * `-e INTERFACE=10.10.10.10` - Configures Jupyter Notebook to listen on the given interface. Defaults to '*', all interfaces, which is appropriate when running using default bridged Docker networking.
When using Docker's `--net=host`, you may wish to use this option to specify a particular network interface. 32 | * `-e PORT=8888` - Configures Jupyter Notebook to listen on the given port. Defaults to 8888, which is the port exposed within the Dockerfile for the image. When using Docker's `--net=host`, you may wish to use this option to specify a particular port. 33 | 34 | ## Conda Environment 35 | 36 | The default Python 3.x [Conda environment](http://conda.pydata.org/docs/using/envs.html) resides in `/opt/conda`. The commands `ipython`, `python`, `pip`, `easy_install`, and `conda` (among others) are available in this environment. 37 | -------------------------------------------------------------------------------- /datascience-notebook/README.md: -------------------------------------------------------------------------------- 1 | # Jupyter Notebook Data Science Stack 2 | 3 | ## What it Gives You 4 | 5 | * Jupyter Notebook server v4.0.x 6 | * Conda Python 3.4.x and Python 2.7.x environments 7 | * pandas, matplotlib, scipy, seaborn, scikit-learn, scikit-image, sympy, cython, patsy, statsmodels, cloudpickle, dill, numba, bokeh pre-installed 8 | * Conda R v3.2.x with the `r` conda channel enabled 9 | * plyr, devtools, dplyr, ggplot2, tidyr, shiny, rmarkdown, forecast, stringr, rsqlite, reshape2, nycflights13, caret, rcurl, and randomforest pre-installed 10 | * Julia v0.3.x with Gadfly and RDatasets pre-installed 11 | * Unprivileged user `choptiu` (uid=1000, configurable, see options) in group `users` (gid=100) with ownership over `/home/choptiu` and `/opt/conda` 12 | * **(v4.0.x)** [tini](https://github.com/krallin/tini) as the container entrypoint and [start-notebook.sh](../minimal-notebook/start-notebook.sh) as the default command 13 | * Options for HTTPS, password auth, and passwordless `sudo` 14 | 15 | ## Basic Use 16 | 17 | The following command starts a container with the Notebook server listening for HTTP connections on port 8888 without authentication configured.
18 | 19 | ``` 20 | docker run -d -p 8888:8888 choptiu/datascience-notebook 21 | ``` 22 | 23 | ## Docker Options 24 | 25 | You may customize the execution of the Docker container and the Notebook server it contains with the following optional arguments. 26 | 27 | * `-e PASSWORD="YOURPASS"` - Configures Jupyter Notebook to require the given password. Should be conbined with `USE_HTTPS` on untrusted networks. 28 | * `-e USE_HTTPS=yes` - Configures Jupyter Notebook to accept encrypted HTTPS connections. If a `pem` file containing a SSL certificate and key is not found in `/home/choptiu/.ipython/profile_default/security/notebook.pem`, the container will generate a self-signed certificate for you. 29 | * **(v4.0.x)** `-e NB_UID=1000` - Specify the uid of the `choptiu` user. Useful to mount host volumes with specific file ownership. 30 | * `-e GRANT_SUDO=yes` - Gives the `choptiu` user passwordless `sudo` capability. Useful for installing OS packages. **You should only enable `sudo` if you trust the user or if the container is running on an isolated host.** 31 | * `-v /some/host/folder/for/work:/home/choptiu/work` - Host mounts the default working directory on the host to preserve work even when the container is destroyed and recreated (e.g., during an upgrade). 32 | * `-v /some/host/folder/for/server.pem:/home/choptiu/.local/share/jupyter/notebook.pem` - Mounts a SSL certificate plus key for `USE_HTTPS`. Useful if you have a real certificate for the domain under which you are running the Notebook server. 33 | * `-e INTERFACE=10.10.10.10` - Configures Jupyter Notebook to listen on the given interface. Defaults to '*', all interfaces, which is appropriate when running using default bridged Docker networking. When using Docker's `--net=host`, you may wish to use this option to specify a particular network interface. 34 | * `-e PORT=8888` - Configures Jupyter Notebook to listen on the given port. 
Defaults to 8888, which is the port exposed within the Dockerfile for the image. When using Docker's `--net=host`, you may wish to use this option to specify a particular port. 35 | 36 | ## Conda Environments 37 | 38 | The default Python 3.x [Conda environment](http://conda.pydata.org/docs/using/envs.html) resides in `/opt/conda`. A second Python 2.x Conda environment exists in `/opt/conda/envs/python2`. You can [switch to the python2 environment](http://conda.pydata.org/docs/using/envs.html#change-environments-activate-deactivate) in a shell by entering the following: 39 | 40 | ``` 41 | source activate python2 42 | ``` 43 | 44 | You can return to the default environment with this command: 45 | 46 | ``` 47 | source deactivate 48 | ``` 49 | 50 | The commands `ipython`, `python`, `pip`, `easy_install`, and `conda` (among others) are available in both environments. 51 | -------------------------------------------------------------------------------- /scipy-notebook/README.md: -------------------------------------------------------------------------------- 1 | # Jupyter Notebook Scientific Python Stack 2 | 3 | ## What it Gives You 4 | 5 | * Jupyter Notebook server (v4.0.x or v3.2.x, see tag) 6 | * Conda Python 3.4.x and Python 2.7.x environments 7 | * pandas, matplotlib, scipy, seaborn, scikit-learn, scikit-image, sympy, cython, patsy, statsmodels, cloudpickle, dill, numba, bokeh pre-installed 8 | * Unprivileged user `choptiu` (uid=1000, configurable, see options) in group `users` (gid=100) with ownership over `/home/choptiu` and `/opt/conda` 9 | * **(v4.0.x)** [tini](https://github.com/krallin/tini) as the container entrypoint and [start-notebook.sh](../minimal-notebook/start-notebook.sh) as the default command 10 | * Options for HTTPS, password auth, and passwordless `sudo` 11 | 12 | ## Basic Use 13 | 14 | The following command starts a container with the Notebook server listening for HTTP connections on port 8888 without authentication configured.
15 | 16 | ``` 17 | docker run -d -p 8888:8888 choptiu/scipy-notebook 18 | ``` 19 | 20 | ## Docker Options 21 | 22 | You may customize the execution of the Docker container and the Notebook server it contains with the following optional arguments. 23 | 24 | * `-e PASSWORD="YOURPASS"` - Configures Jupyter Notebook to require the given password. Should be combined with `USE_HTTPS` on untrusted networks. 25 | * `-e USE_HTTPS=yes` - Configures Jupyter Notebook to accept encrypted HTTPS connections. If a `pem` file containing an SSL certificate and key is not found in `/home/choptiu/.ipython/profile_default/security/notebook.pem`, the container will generate a self-signed certificate for you. 26 | * **(v4.0.x)** `-e NB_UID=1000` - Specify the uid of the `choptiu` user. Useful to mount host volumes with specific file ownership. 27 | * `-e GRANT_SUDO=yes` - Gives the `choptiu` user passwordless `sudo` capability. Useful for installing OS packages. **You should only enable `sudo` if you trust the user or if the container is running on an isolated host.** 28 | * `-v /some/host/folder/for/work:/home/choptiu/work` - Mounts a host folder as the default working directory to preserve work even when the container is destroyed and recreated (e.g., during an upgrade). 29 | * **(v3.2.x)** `-v /some/host/folder/for/server.pem:/home/choptiu/.ipython/profile_default/security/notebook.pem` - Mounts an SSL certificate plus key for `USE_HTTPS`. Useful if you have a real certificate for the domain under which you are running the Notebook server. 30 | * **(v4.0.x)** `-v /some/host/folder/for/server.pem:/home/choptiu/.local/share/jupyter/notebook.pem` - Mounts an SSL certificate plus key for `USE_HTTPS`. Useful if you have a real certificate for the domain under which you are running the Notebook server. 31 | * `-e INTERFACE=10.10.10.10` - Configures Jupyter Notebook to listen on the given interface.
Defaults to '*', all interfaces, which is appropriate when running using default bridged Docker networking. When using Docker's `--net=host`, you may wish to use this option to specify a particular network interface. 32 | * `-e PORT=8888` - Configures Jupyter Notebook to listen on the given port. Defaults to 8888, which is the port exposed within the Dockerfile for the image. When using Docker's `--net=host`, you may wish to use this option to specify a particular port. 33 | 34 | ## Conda Environments 35 | 36 | The default Python 3.x [Conda environment](http://conda.pydata.org/docs/using/envs.html) resides in `/opt/conda`. A second Python 2.x Conda environment exists in `/opt/conda/envs/python2`. You can [switch to the python2 environment](http://conda.pydata.org/docs/using/envs.html#change-environments-activate-deactivate) in a shell by entering the following: 37 | 38 | ``` 39 | source activate python2 40 | ``` 41 | 42 | You can return to the default environment with this command: 43 | 44 | ``` 45 | source deactivate 46 | ``` 47 | 48 | The commands `ipython`, `python`, `pip`, `easy_install`, and `conda` (among others) are available in both environments. 
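For example, from inside a notebook you can confirm which Conda environment the running kernel belongs to. This is a minimal sketch; the `/opt/conda` paths are the image defaults described above:

```python
import sys

# The interpreter's major version reveals the environment:
# 3 for the default environment, 2 for the python2 environment.
print(sys.version_info.major)

# sys.prefix points at the environment's root directory,
# e.g. /opt/conda or /opt/conda/envs/python2 in these images.
print(sys.prefix)
```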
49 | -------------------------------------------------------------------------------- /kitchen-sink-notebook/README.md: -------------------------------------------------------------------------------- 1 | # Jupyter Notebook Scientific Python Stack 2 | 3 | ## What it Gives You 4 | 5 | * Jupyter Notebook server (v4.0.x or v3.2.x, see tag) 6 | * Conda Python 3.4.x and Python 2.7.x environments 7 | * nltk, pandas, matplotlib, scipy, seaborn, scikit-learn, scikit-image, sympy, cython, patsy, statsmodels, cloudpickle, dill, numba, bokeh pre-installed 8 | * Unprivileged user `choptiu` (uid=1000, configurable, see options) in group `users` (gid=100) with ownership over `/home/choptiu` and `/opt/conda` 9 | * **(v4.0.x)** [tini](https://github.com/krallin/tini) as the container entrypoint and [start-notebook.sh](../minimal-notebook/start-notebook.sh) as the default command 10 | * Options for HTTPS, password auth, and passwordless `sudo` 11 | 12 | ## Basic Use 13 | 14 | The following command starts a container with the Notebook server listening for HTTP connections on port 8888 without authentication configured. 15 | 16 | ``` 17 | docker run -d -p 8888:8888 choptiu/kitchen-sink-notebook 18 | ``` 19 | 20 | ## Docker Options 21 | 22 | You may customize the execution of the Docker container and the Notebook server it contains with the following optional arguments. 23 | 24 | * `-e PASSWORD="YOURPASS"` - Configures Jupyter Notebook to require the given password. Should be combined with `USE_HTTPS` on untrusted networks. 25 | * `-e USE_HTTPS=yes` - Configures Jupyter Notebook to accept encrypted HTTPS connections. If a `pem` file containing an SSL certificate and key is not found in `/home/choptiu/.ipython/profile_default/security/notebook.pem`, the container will generate a self-signed certificate for you. 26 | * **(v4.0.x)** `-e NB_UID=1000` - Specify the uid of the `choptiu` user. Useful to mount host volumes with specific file ownership.
27 | * `-e GRANT_SUDO=yes` - Gives the `choptiu` user passwordless `sudo` capability. Useful for installing OS packages. **You should only enable `sudo` if you trust the user or if the container is running on an isolated host.** 28 | * `-v /some/host/folder/for/work:/home/choptiu/work` - Mounts a host folder as the default working directory to preserve work even when the container is destroyed and recreated (e.g., during an upgrade). 29 | * **(v3.2.x)** `-v /some/host/folder/for/server.pem:/home/choptiu/.ipython/profile_default/security/notebook.pem` - Mounts an SSL certificate plus key for `USE_HTTPS`. Useful if you have a real certificate for the domain under which you are running the Notebook server. 30 | * **(v4.0.x)** `-v /some/host/folder/for/server.pem:/home/choptiu/.local/share/jupyter/notebook.pem` - Mounts an SSL certificate plus key for `USE_HTTPS`. Useful if you have a real certificate for the domain under which you are running the Notebook server. 31 | * `-e INTERFACE=10.10.10.10` - Configures Jupyter Notebook to listen on the given interface. Defaults to '*', all interfaces, which is appropriate when running using default bridged Docker networking. When using Docker's `--net=host`, you may wish to use this option to specify a particular network interface. 32 | * `-e PORT=8888` - Configures Jupyter Notebook to listen on the given port. Defaults to 8888, which is the port exposed within the Dockerfile for the image. When using Docker's `--net=host`, you may wish to use this option to specify a particular port. 33 | 34 | ## Conda Environments 35 | 36 | The default Python 3.x [Conda environment](http://conda.pydata.org/docs/using/envs.html) resides in `/opt/conda`. A second Python 2.x Conda environment exists in `/opt/conda/envs/python2`.
You can [switch to the python2 environment](http://conda.pydata.org/docs/using/envs.html#change-environments-activate-deactivate) in a shell by entering the following: 37 | 38 | ``` 39 | source activate python2 40 | ``` 41 | 42 | You can return to the default environment with this command: 43 | 44 | ``` 45 | source deactivate 46 | ``` 47 | 48 | The commands `ipython`, `python`, `pip`, `easy_install`, and `conda` (among others) are available in both environments. 49 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # docker-stacks 2 | 3 | [![Join the chat at https://gitter.im/jupyter/jupyter](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/jupyter/jupyter?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) 4 | 5 | Opinionated stacks of ready-to-run Jupyter applications in Docker. 6 | 7 | ## Quick Start 8 | 9 | If you're familiar with Docker, have it configured, and know exactly what you'd like to run, this one-liner should work in most cases: 10 | 11 | ``` 12 | docker run -d -P choptiu/ 13 | ``` 14 | 15 | ## Getting Started 16 | 17 | If this is your first time using Docker or any of the Jupyter projects, do the following to get started. 18 | 19 | 1. [Install Docker](https://docs.docker.com/installation/) on your host of choice. 20 | 2. Click the link for the Docker Hub repo or GitHub source for your desired stack. 21 | 3. Follow the README for that stack. 22 | 23 | ## Available Stacks 24 | 25 | Branches / tags reflect the version of the primary process in each container (e.g., notebook server). 
Currently, GitHub master / Docker latest for Jupyter Notebook containers are equivalent to 4.0.x / 4.0. 26 | 27 | | Docker Hub repo | GitHub source | Git Branch → Docker Tag | 28 | | --------------- | ------------- | ---------------------------- | 29 | | [jupyter/minimal-notebook](https://hub.docker.com/r/jupyter/minimal-notebook/) | [minimal-notebook](./minimal-notebook) | master → latest<br />
4.0.x → 4.0
3.2.x → 3.2 | 30 | | [jupyter/scipy-notebook](https://hub.docker.com/r/jupyter/scipy-notebook/) | [scipy-notebook](./scipy-notebook) | master → latest
4.0.x → 4.0
3.2.x → 3.2 | 31 | | [jupyter/r-notebook](https://hub.docker.com/r/jupyter/r-notebook/) | [r-notebook](./r-notebook) | master → latest
4.0.x → 4.0
3.2.x → 3.2 | 32 | | [jupyter/datascience-notebook](https://hub.docker.com/r/jupyter/datascience-notebook/) | [datascience-notebook](./datascience-notebook) | master → latest
4.0.x → 4.0 | 33 | | [jupyter/all-spark-notebook](https://hub.docker.com/r/jupyter/all-spark-notebook/) | [all-spark-notebook](./all-spark-notebook) | master → latest
4.0.x → 4.0
3.2.x → 3.2 | 34 | | [jupyter/pyspark-notebook](https://hub.docker.com/r/jupyter/pyspark-notebook/) | [pyspark-notebook](./pyspark-notebook) | master → latest
4.0.x → 4.0
3.2.x → 3.2 | 35 | 36 | ## Maintainer Workflows 37 | 38 | N.B. These are point-in-time instructions, subject to change as we find the best way to publish builds, manage branches, tag versions, etc. 39 | 40 | ### Triggering a Docker Hub build 41 | 42 | At the moment, we have disabled the webhook to notify Docker Hub of commits in this GitHub repo and all links between parent and child docker repositories. (See issue #15.) Follow these steps to manually trigger builds. 43 | 44 | After merging changes to `minimal-notebook` on any branch: 45 | 46 | 1. Visit https://hub.docker.com/r/jupyter/minimal-notebook/builds/. 47 | 2. Click *Trigger a Build*. 48 | 3. Monitor for transient build errors on Docker Hub. 49 | 4. Visit the Docker Hub build page for each dependent stack. 50 | 5. Click *Trigger a Build* on each. 51 | 6. Monitor all dependent stack builds for errors on Docker Hub. 52 | 53 | After merging changes to any other stack on any branch: 54 | 55 | 1. Visit the Docker Hub build page for the modified stack. 56 | 2. Click *Trigger a Build*. 57 | 3. Monitor for transient build errors on Docker Hub. 58 | 59 | N.B. There's no way to rebuild a specific tag. If there are errors rebuilding a Docker Hub tag associated with a branch unaffected by the GitHub merge, it's OK. The last built image will retain the tag and should be functionally equivalent. 60 | 61 | ### Backporting fixes from master to a version branch (e.g., 4.0.x) 62 | 63 | If the fix is a single commit, `git cherry-pick` it. If it's multiple commits, use rebase. For example, if we have commits on master that we want to put in the `4.0.x` branch: 64 | 65 | ``` 66 | # make sure we're up to date locally 67 | git checkout 4.0.x 68 | git pull origin 4.0.x 69 | git checkout master 70 | git pull origin master 71 | 72 | # create a backport branch off *master* and interactively 73 | # rebase on the version branch 74 | git checkout -b 4.0.x-backport 75 | git rebase -i 4.0.x 76 | 77 | # during the rebase ...
78 | # delete any commits that ONLY belong in master 79 | # retain any commits that you need to backport 80 | 81 | # push backport branch to origin version branch 82 | git push origin 4.0.x-backport:4.0.x 83 | 84 | # cleanup 85 | git branch -D 4.0.x-backport 86 | ``` 87 | 88 | ### Upgrading to a new major/minor version of a Jupyter project 89 | 90 | Our git branch and docker tagging scheme captures the major and minor version of the primary Jupyter project within the stack (e.g., Jupyter Notebook 4.0.x). When a new major or minor release of that project becomes available: 91 | 92 | 1. Update the relevant Dockerfiles, README, etc. to install the new version. 93 | 2. Push a new git branch in the form `<major>.<minor>.x` containing those changes. 94 | 3. Add a new branch-to-tag build under `Build Settings` in the affected `jupyter/*` Docker Hub repositories. 95 | 4. Promote the relevant git commits to master. 96 | 5. Manually trigger Docker Hub builds on all affected repositories. 97 | -------------------------------------------------------------------------------- /pyspark-notebook/README.md: -------------------------------------------------------------------------------- 1 | # Jupyter Notebook Python, Spark, Mesos Stack 2 | 3 | ## What it Gives You 4 | 5 | * Jupyter Notebook server (v4.0.x or v3.2.x, see tag) 6 | * Conda Python 3.4.x and Python 2.7.x environments 7 | * pyspark, pandas, matplotlib, scipy, seaborn, scikit-learn pre-installed 8 | * Spark 1.4.1 for use in local mode or to connect to a cluster of Spark workers 9 | * Mesos client 0.22 binary that can communicate with a Mesos master 10 | * Unprivileged user `choptiu` (uid=1000, configurable, see options) in group `users` (gid=100) with ownership over `/home/choptiu` and `/opt/conda` 11 | * **(v4.0.x)** [tini](https://github.com/krallin/tini) as the container entrypoint and [start-notebook.sh](../minimal-notebook/start-notebook.sh) as the default command 12 | * Options for HTTPS, password auth, and passwordless `sudo` 13
| 14 | ## Basic Use 15 | 16 | The following command starts a container with the Notebook server listening for HTTP connections on port 8888 without authentication configured. 17 | 18 | ``` 19 | docker run -d -p 8888:8888 choptiu/pyspark-notebook 20 | ``` 21 | 22 | ## Using Spark Local Mode 23 | 24 | This configuration is nice for using Spark on small, local data. 25 | 26 | 0. Run the container as shown above. 27 | 1. Open a Python 2 or 3 notebook. 28 | 2. Create a `SparkContext` configured for local mode. 29 | 30 | For example, the first few cells in a Python 3 notebook might read: 31 | 32 | ```python 33 | import pyspark 34 | sc = pyspark.SparkContext('local[*]') 35 | 36 | # do something to prove it works 37 | rdd = sc.parallelize(range(1000)) 38 | rdd.takeSample(False, 5) 39 | ``` 40 | 41 | In a Python 2 notebook, prefix the above with the following code to ensure the local workers use Python 2 as well. 42 | 43 | ```python 44 | import os 45 | os.environ['PYSPARK_PYTHON'] = 'python2' 46 | 47 | # include pyspark cells from above here ... 48 | ``` 49 | 50 | ## Connecting to a Spark Cluster on Mesos 51 | 52 | This configuration allows your compute cluster to scale with your data. 53 | 54 | 0. [Deploy Spark on Mesos](http://spark.apache.org/docs/latest/running-on-mesos.html). 55 | 1. Configure each slave with [the `--no-switch_user` flag](https://open.mesosphere.com/reference/mesos-slave/) or create the `choptiu` user on every slave node. 56 | 2. Ensure Python 2.x and/or 3.x and any Python libraries you wish to use in your Spark lambda functions are installed on your Spark workers. 57 | 3. Run the Docker container with `--net=host` in a location that is network addressable by all of your Spark workers. (This is a [Spark networking requirement](http://spark.apache.org/docs/latest/cluster-overview.html#components).) 58 | 4. Open a Python 2 or 3 notebook. 59 | 5.
Create a `SparkConf` instance in a new notebook pointing to your Mesos master node (or Zookeeper instance) and Spark binary package location. 60 | 6. Create a `SparkContext` using this configuration. 61 | 62 | For example, the first few cells in a Python 3 notebook might read: 63 | 64 | ```python 65 | import os 66 | # make sure pyspark tells workers to use python3 not 2 if both are installed 67 | os.environ['PYSPARK_PYTHON'] = '/usr/bin/python3' 68 | 69 | import pyspark 70 | conf = pyspark.SparkConf() 71 | 72 | # point to mesos master or zookeeper entry (e.g., zk://10.10.10.10:2181/mesos) 73 | conf.setMaster("mesos://10.10.10.10:5050") 74 | # point to spark binary package in HDFS or on local filesystem on all slave 75 | # nodes (e.g., file:///opt/spark/spark-1.4.1-bin-hadoop2.6.tgz) 76 | conf.set("spark.executor.uri", "hdfs://10.122.193.209/spark/spark-1.4.1-bin-hadoop2.6.tgz") 77 | # set other options as desired 78 | conf.set("spark.executor.memory", "8g") 79 | conf.set("spark.core.connection.ack.wait.timeout", "1200") 80 | 81 | # create the context 82 | sc = pyspark.SparkContext(conf=conf) 83 | 84 | # do something to prove it works 85 | rdd = sc.parallelize(range(100000000)) 86 | rdd.sumApprox(3) 87 | ``` 88 | 89 | To use Python 2 in the notebook and on the workers, change the `PYSPARK_PYTHON` environment variable to point to the location of the Python 2.x interpreter binary. If you leave this environment variable unset, it defaults to `python`. 90 | 91 | Of course, all of this can be hidden in an [IPython kernel startup script](http://ipython.org/ipython-doc/stable/development/config.html?highlight=startup#startup-files), but "explicit is better than implicit." :) 92 | 93 | ## Docker Options 94 | 95 | You may customize the execution of the Docker container and the Notebook server it contains with the following optional arguments. 96 | 97 | * `-e PASSWORD="YOURPASS"` - Configures Jupyter Notebook to require the given password. 
Should be combined with `USE_HTTPS` on untrusted networks. 98 | * `-e USE_HTTPS=yes` - Configures Jupyter Notebook to accept encrypted HTTPS connections. If a `pem` file containing an SSL certificate and key is not found in `/home/choptiu/.ipython/profile_default/security/notebook.pem`, the container will generate a self-signed certificate for you. 99 | * **(v4.0.x)** `-e NB_UID=1000` - Specify the uid of the `choptiu` user. Useful to mount host volumes with specific file ownership. 100 | * `-e GRANT_SUDO=yes` - Gives the `choptiu` user passwordless `sudo` capability. Useful for installing OS packages. **You should only enable `sudo` if you trust the user or if the container is running on an isolated host.** 101 | * `-v /some/host/folder/for/work:/home/choptiu/work` - Mounts a host folder as the default working directory to preserve work even when the container is destroyed and recreated (e.g., during an upgrade). 102 | * **(v3.2.x)** `-v /some/host/folder/for/server.pem:/home/choptiu/.ipython/profile_default/security/notebook.pem` - Mounts an SSL certificate plus key for `USE_HTTPS`. Useful if you have a real certificate for the domain under which you are running the Notebook server. 103 | * **(v4.0.x)** `-v /some/host/folder/for/server.pem:/home/choptiu/.local/share/jupyter/notebook.pem` - Mounts an SSL certificate plus key for `USE_HTTPS`. Useful if you have a real certificate for the domain under which you are running the Notebook server. 104 | * `-e INTERFACE=10.10.10.10` - Configures Jupyter Notebook to listen on the given interface. Defaults to '*', all interfaces, which is appropriate when running using default bridged Docker networking. When using Docker's `--net=host`, you may wish to use this option to specify a particular network interface. 105 | * `-e PORT=8888` - Configures Jupyter Notebook to listen on the given port. Defaults to 8888, which is the port exposed within the Dockerfile for the image.
When using Docker's `--net=host`, you may wish to use this option to specify a particular port. 106 | 107 | ## Conda Environments 108 | 109 | The default Python 3.x [Conda environment](http://conda.pydata.org/docs/using/envs.html) resides in `/opt/conda`. A second Python 2.x Conda environment exists in `/opt/conda/envs/python2`. You can [switch to the python2 environment](http://conda.pydata.org/docs/using/envs.html#change-environments-activate-deactivate) in a shell by entering the following: 110 | 111 | ``` 112 | source activate python2 113 | ``` 114 | 115 | You can return to the default environment with this command: 116 | 117 | ``` 118 | source deactivate 119 | ``` 120 | 121 | The commands `ipython`, `python`, `pip`, `easy_install`, and `conda` (among others) are available in both environments. 122 | -------------------------------------------------------------------------------- /all-spark-notebook/README.md: -------------------------------------------------------------------------------- 1 | # Jupyter Notebook Python, Scala, R, Spark, Mesos Stack 2 | 3 | ## What it Gives You 4 | 5 | * Jupyter Notebook server (v4.0.x or v3.2.x, see tag) 6 | * Conda Python 3.4.x and Python 2.7.x environments 7 | * Conda R 3.2.x environment 8 | * Scala 2.10.x 9 | * pyspark, pandas, matplotlib, scipy, seaborn, scikit-learn pre-installed for Python 10 | * ggplot2, rcurl pre-installed for R 11 | * Spark 1.4.1 for use in local mode or to connect to a cluster of Spark workers 12 | * Mesos client 0.22 binary that can communicate with a Mesos master 13 | * Unprivileged user `choptiu` (uid=1000, configurable, see options) in group `users` (gid=100) with ownership over `/home/choptiu` and `/opt/conda` 14 | * **(v4.0.x)** [tini](https://github.com/krallin/tini) as the container entrypoint and [start-notebook.sh](../minimal-notebook/start-notebook.sh) as the default command 15 | * Options for HTTPS, password auth, and passwordless `sudo` 16 | 17 | ## Basic Use 18 | 19 | The following
command starts a container with the Notebook server listening for HTTP connections on port 8888 without authentication configured. 20 | 21 | ``` 22 | docker run -d -p 8888:8888 choptiu/all-spark-notebook 23 | ``` 24 | 25 | ## Using Spark Local Mode 26 | 27 | This configuration is nice for using Spark on small, local data. 28 | 29 | ### In a Python Notebook 30 | 31 | 0. Run the container as shown above. 32 | 1. Open a Python 2 or 3 notebook. 33 | 2. Create a `SparkContext` configured for local mode. 34 | 35 | For example, the first few cells in a Python 3 notebook might read: 36 | 37 | ```python 38 | import pyspark 39 | sc = pyspark.SparkContext('local[*]') 40 | 41 | # do something to prove it works 42 | rdd = sc.parallelize(range(1000)) 43 | rdd.takeSample(False, 5) 44 | ``` 45 | 46 | In a Python 2 notebook, prefix the above with the following code to ensure the local workers use Python 2 as well. 47 | 48 | ```python 49 | import os 50 | os.environ['PYSPARK_PYTHON'] = 'python2' 51 | 52 | # include pyspark cells from above here ... 53 | ``` 54 | 55 | ### In an R Notebook 56 | 57 | 0. Run the container as shown above. 58 | 1. Open an R notebook. 59 | 2. Initialize `sparkR` for local mode. 60 | 3. Initialize `sparkRSQL`. 61 | 62 | For example, the first few cells in an R notebook might read: 63 | 64 | ``` 65 | library(SparkR) 66 | 67 | sc <- sparkR.init("local[*]") 68 | sqlContext <- sparkRSQL.init(sc) 69 | 70 | # do something to prove it works 71 | data(iris) 72 | df <- createDataFrame(sqlContext, iris) 73 | head(filter(df, df$Petal_Width > 0.2)) 74 | ``` 75 | 76 | ### In a Scala Notebook 77 | 78 | 0. Run the container as shown above. 79 | 1. Open a Scala notebook. 80 | 2. Use the pre-configured `SparkContext` in variable `sc`.
81 | 82 | For example: 83 | 84 | ``` 85 | val rdd = sc.parallelize(0 to 999) 86 | rdd.takeSample(false, 5) 87 | ``` 88 | 89 | ## Connecting to a Spark Cluster on Mesos 90 | 91 | This configuration allows your compute cluster to scale with your data. 92 | 93 | 0. [Deploy Spark on Mesos](http://spark.apache.org/docs/latest/running-on-mesos.html). 94 | 1. Configure each slave with [the `--no-switch_user` flag](https://open.mesosphere.com/reference/mesos-slave/) or create the `choptiu` user on every slave node. 95 | 2. Run the Docker container with `--net=host` in a location that is network addressable by all of your Spark workers. (This is a [Spark networking requirement](http://spark.apache.org/docs/latest/cluster-overview.html#components).) 96 | 3. Follow the language specific instructions below. 97 | 98 | ### In a Python Notebook 99 | 100 | 0. Open a Python 2 or 3 notebook. 101 | 1. Create a `SparkConf` instance in a new notebook pointing to your Mesos master node (or Zookeeper instance) and Spark binary package location. 102 | 2. Create a `SparkContext` using this configuration. 

For example, the first few cells in a Python 3 notebook might read:

```python
import os
# make sure pyspark tells workers to use python3 not 2 if both are installed
os.environ['PYSPARK_PYTHON'] = '/usr/bin/python3'

import pyspark
conf = pyspark.SparkConf()

# point to mesos master or zookeeper entry (e.g., zk://10.10.10.10:2181/mesos)
conf.setMaster("mesos://10.10.10.10:5050")
# point to spark binary package in HDFS or on local filesystem on all slave
# nodes (e.g., file:///opt/spark/spark-1.4.1-bin-hadoop2.6.tgz)
conf.set("spark.executor.uri", "hdfs://10.10.10.10/spark/spark-1.4.1-bin-hadoop2.6.tgz")
# set other options as desired
conf.set("spark.executor.memory", "8g")
conf.set("spark.core.connection.ack.wait.timeout", "1200")

# create the context
sc = pyspark.SparkContext(conf=conf)

# do something to prove it works
rdd = sc.parallelize(range(100000000))
rdd.sumApprox(3)
```

To use Python 2 in the notebook and on the workers, change the `PYSPARK_PYTHON` environment variable to point to the location of the Python 2.x interpreter binary. If you leave this environment variable unset, it defaults to `python`.

Of course, all of this can be hidden in an [IPython kernel startup script](http://ipython.org/ipython-doc/stable/development/config.html?highlight=startup#startup-files), but "explicit is better than implicit." :)

### In an R Notebook

0. Run the container as shown above.
1. Open an R notebook.
2. Initialize `sparkR` with the Mesos master node (or Zookeeper instance) and Spark binary package location.
3. Initialize `sparkRSQL`.

For example, the first few cells in an R notebook might read:

```r
library(SparkR)

# point to mesos master or zookeeper entry (e.g., zk://10.10.10.10:2181/mesos)
# as the first argument
# point to spark binary package in HDFS or on local filesystem on all slave
# nodes (e.g., file:///opt/spark/spark-1.4.1-bin-hadoop2.6.tgz) in sparkEnvir
# set other options in sparkEnvir
sc <- sparkR.init("mesos://10.10.10.10:5050", sparkEnvir=list(
    spark.executor.uri="hdfs://10.10.10.10/spark/spark-1.4.1-bin-hadoop2.6.tgz",
    spark.executor.memory="8g"
  )
)
sqlContext <- sparkRSQL.init(sc)

# do something to prove it works
data(iris)
df <- createDataFrame(sqlContext, iris)
head(filter(df, df$Petal_Width > 0.2))
```

### In a Scala Notebook

0. Open a terminal via *New -> Terminal* in the notebook interface.
1. Add information about your cluster to the Scala kernel spec file in `~/.ipython/kernels/scala/kernel.json`. (See below.)
2. Open a Scala notebook.
3. Use the pre-configured `SparkContext` in variable `sc`.

The Scala kernel automatically creates a `SparkContext` when it starts, based on configuration information from its command line arguments and environment variables. Therefore, you must add your cluster information to the Scala kernel spec file. You cannot, at present, configure it yourself within a notebook.

For instance, a kernel spec file with information about a Mesos master, a Spark binary location in HDFS, and an executor option might look like this:

```json
{
  "display_name": "Scala 2.10.4",
  "language": "scala",
  "argv": [
    "/opt/sparkkernel/bin/sparkkernel",
    "--profile",
    "{connection_file}",
    "--master=mesos://10.10.10.10:5050"
  ],
  "env": {
    "SPARK_CONFIGURATION": "spark.executor.memory=8g,spark.executor.uri=hdfs://10.10.10.10/spark/spark-1.4.1-bin-hadoop2.6.tgz"
  }
}
```

Note that this is the same information expressed in a notebook in the Python case above. Once the kernel spec has your cluster information, you can test your cluster in a Scala notebook like so:

```scala
// should print the value of --master in the kernel spec
println(sc.master)

// do something to prove it works
val rdd = sc.parallelize(0 to 99999999)
rdd.sum()
```

## Docker Options

You may customize the execution of the Docker container and the Notebook server it contains with the following optional arguments.

* `-e PASSWORD="YOURPASS"` - Configures Jupyter Notebook to require the given password. Should be combined with `USE_HTTPS` on untrusted networks.
* `-e USE_HTTPS=yes` - Configures Jupyter Notebook to accept encrypted HTTPS connections. If a `pem` file containing an SSL certificate and key is not found in `/home/choptiu/.ipython/profile_default/security/notebook.pem`, the container will generate a self-signed certificate for you.
* **(v4.0.x)** `-e NB_UID=1000` - Specify the uid of the `choptiu` user. Useful to mount host volumes with specific file ownership.
* `-e GRANT_SUDO=yes` - Gives the `choptiu` user passwordless `sudo` capability. Useful for installing OS packages.
**You should only enable `sudo` if you trust the user or if the container is running on an isolated host.**
* `-v /some/host/folder/for/work:/home/choptiu/work` - Mounts the given host folder as the default working directory to preserve work even when the container is destroyed and recreated (e.g., during an upgrade).
* **(v3.2.x)** `-v /some/host/folder/for/server.pem:/home/choptiu/.ipython/profile_default/security/notebook.pem` - Mounts an SSL certificate plus key for `USE_HTTPS`. Useful if you have a real certificate for the domain under which you are running the Notebook server.
* **(v4.0.x)** `-v /some/host/folder/for/server.pem:/home/choptiu/.local/share/jupyter/notebook.pem` - Mounts an SSL certificate plus key for `USE_HTTPS`. Useful if you have a real certificate for the domain under which you are running the Notebook server.
* `-e INTERFACE=10.10.10.10` - Configures Jupyter Notebook to listen on the given interface. Defaults to `'*'`, all interfaces, which is appropriate when running using default bridged Docker networking. When using Docker's `--net=host`, you may wish to use this option to specify a particular network interface.
* `-e PORT=8888` - Configures Jupyter Notebook to listen on the given port. Defaults to 8888, which is the port exposed within the Dockerfile for the image. When using Docker's `--net=host`, you may wish to use this option to specify a particular port.

## Conda Environments

The default Python 3.x [Conda environment](http://conda.pydata.org/docs/using/envs.html) resides in `/opt/conda`. A second Python 2.x Conda environment exists in `/opt/conda/envs/python2`.
You can [switch to the python2 environment](http://conda.pydata.org/docs/using/envs.html#change-environments-activate-deactivate) in a shell by entering the following:

```bash
source activate python2
```

You can return to the default environment with this command:

```bash
source deactivate
```

The commands `ipython`, `python`, `pip`, `easy_install`, and `conda` (among others) are available in both environments.
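
As a quick sanity check, a notebook cell (or an interactive Python session in either environment) can report which interpreter it is running under. This sketch uses only the standard library; the prefixes in the comments are the environment paths listed above:

```python
import sys

# sys.prefix reveals the active Conda environment:
# /opt/conda for the default Python 3 environment,
# /opt/conda/envs/python2 for the Python 2 environment.
print(sys.prefix)

# The major version confirms which Python the kernel or shell is using.
print(sys.version_info.major)
```

Running this in a notebook backed by the wrong kernel is often the fastest way to spot why a package installed with `pip` in one environment is not importable in the other.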