├── .gitignore
├── README.md
├── dockerfiles
│   ├── DAIMojoRestServer4-1.11.1.jar
│   ├── Dockerfile.base
│   ├── Dockerfile.h2o3
│   ├── Dockerfile.h2o3coreosnotebook
│   ├── Dockerfile.h2o3notebook
│   ├── Dockerfile.mojo
│   ├── Makefile
│   ├── docker-startup.sh
│   ├── jupyter_notebook_config.py
│   ├── mojo-startup.sh
│   ├── mojo_tornado.py
│   ├── sample-mojos
│   │   ├── creditcard.mojo
│   │   └── loanlevel.mojo
│   ├── start-notebook.sh
│   ├── start-singleuser.sh
│   └── start.sh
├── h2o-kubeflow
│   ├── generate_docs.py
│   ├── h2o3-scaling
│   │   ├── README.md
│   │   ├── h2o3-scaling.libsonnet
│   │   ├── parts.yaml
│   │   └── prototypes
│   │       └── h2o3-scaling-all.jsonnet
│   ├── h2oai
│   │   ├── README.md
│   │   ├── h2oai-driverlessai.libsonnet
│   │   ├── h2oai-h2o3.libsonnet
│   │   ├── h2oai-mojo-rest-server.libsonnet
│   │   ├── parts.yaml
│   │   └── prototypes
│   │       ├── h2oai-driverlessai.jsonnet
│   │       ├── h2oai-h2o3.jsonnet
│   │       └── h2oai-mojo-rest-server.jsonnet
│   └── registry.yaml
└── scripts
    ├── README.md
    ├── deployment-status.service
    ├── deployment-status.sh
    ├── deployment-status.timer
    ├── k8s_master_setup.sh
    └── k8s_slave_setup.sh
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
# ignore DS_Store
*.DS_Store
__pycache__/
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# H2O + Kubeflow Integration

This project integrates H2O.ai products with Kubeflow. Together they provide a turn-key solution for easily deployable and highly scalable machine learning applications, with minimal input required from the user.

#### Kubeflow
[Kubeflow](https://github.com/kubeflow/kubeflow) is an open source project started by Google and built on top of Kubernetes. It is designed to alleviate some of the more tedious tasks associated with machine learning.
Kubeflow helps orchestrate deployment of apps through the full cycle of development, testing, and production, and allows for resource scaling as demand increases.

#### H2O 3
[H2O 3’s](http://docs.h2o.ai/h2o/latest-stable/h2o-docs/index.html) goal is to reduce the time spent by data scientists on time-consuming tasks like designing grid search algorithms and tuning hyperparameters, while also providing an interface that allows newer practitioners an easy foothold into the machine learning space.

#### Driverless AI
[Driverless AI](http://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/index.html) is an artificial intelligence (AI) platform for automatic machine learning. Driverless AI automates some of the most difficult data science and machine learning workflows, such as feature engineering, model validation, model tuning, model selection, and model deployment. It aims to achieve the highest predictive accuracy, comparable to expert data scientists, but in a much shorter time thanks to end-to-end automation. Driverless AI also offers automatic visualizations and machine learning interpretability (MLI).

#### Contents
This repository contains all the necessary components for deploying H2O.ai's core products on Kubeflow.

```
h2o-kubeflow
|-- dockerfiles
|     |-- A copy of the dockerfiles that are currently used by components in the POC
|-- h2o-kubeflow // --> Ksonnet registry containing all packages offered in this repo
|     |-- h2oai
|     |     |-- Ksonnet package containing deployment templates for core offerings from H2O.ai [H2O-3, Driverless AI]
|     |-- h2o3-scaling
|     |     |-- Ksonnet packages built as a proof of concept.
Not consistently maintained
|-- registry.yaml // --> file defining all packages included in the registry
```

#### Quick Start
Complete deployment steps can be found inside this directory: [https://github.com/h2oai/h2o-kubeflow/tree/master/h2o-kubeflow/h2oai](https://github.com/h2oai/h2o-kubeflow/tree/master/h2o-kubeflow/h2oai).

The repository for Kubeflow can be found [here](https://github.com/kubeflow/kubeflow), and complete steps to deploy Kubeflow can be found in their [User Documentation](https://www.kubeflow.org/docs/started/getting-started/).

You will also need the [ksonnet](https://ksonnet.io) and [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) command line tools.

- Create a Kubernetes cluster, either on-prem or on Google Cloud.
- Run the following commands to set up your ksonnet app (how you deploy Kubeflow). `<app_name>` and `<env_name>` are placeholders; substitute your own values.

**NOTE:** Kubeflow is managed by Google's Kubeflow team, and some of the commands to deploy Kubeflow's core components may change. Refer to https://www.kubeflow.org/docs/started/getting-started/ for comprehensive steps to launch Kubeflow. The H2O components do not depend on a running Kubeflow deployment, but they benefit from Kubeflow's core functionality. It is recommended, but not required, that you launch Kubeflow before starting the H2O deployments.

```bash
# create ksonnet app
ks init <app_name>
cd <app_name>

# add the ksonnet registry containing all the kubeflow manifests maintained by the Google Kubeflow team
ks registry add kubeflow https://github.com/kubeflow/kubeflow/tree/master/kubeflow
# add this repo's registry containing the h2o component manifests, then install the h2oai package
ks registry add h2o-kubeflow https://github.com/h2oai/h2o-kubeflow/tree/master/h2o-kubeflow
ks pkg install h2o-kubeflow/h2oai

# create namespace and environment for deployments
kubectl create namespace kubeflow
ks env add <env_name>
```

- Deploy H2O 3 by running the following commands.
You will first need to build a docker image of H2O-3 that can be consumed by Kubernetes. See this directory: [https://github.com/h2oai/h2o-kubeflow/tree/master/h2o-kubeflow/h2oai/dockerfiles](https://github.com/h2oai/h2o-kubeflow/tree/master/h2o-kubeflow/h2oai/dockerfiles) for the necessary dockerfile and scripts. Be sure to push the image to a repository that Kubernetes has pull access to.

```bash
ks prototype use io.ksonnet.pkg.h2oai-h2o3 h2o3 \
  --name h2o3 \
  --namespace kubeflow \
  --memory 2 \
  --cpu 1 \
  --replicas 2 \
  --model_server_image <location-of-docker-image>

ks apply <env_name> -c h2o3
```
- Run `kubectl get svc -n kubeflow` to find the external IP address of the service.
- Open a jupyter notebook on a local computer that has H2O installed locally.

```python
import h2o
# connect to the cluster using the external IP found above
h2o.init(ip="<EXTERNAL-IP>", port=54321)
```
- You can now follow the steps for running H2O 3 AutoML that can be found [here](http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html).

#### Burst to Cloud (NOT CONSISTENTLY MAINTAINED)

If you are interested in additional orchestration, follow these steps to set up a Kubernetes cluster that can scale as the demand for resources increases.

Note: This is a prototype and will continue to be changed/modified as time progresses.

1. Start a machine with Ubuntu 16.04. This can be on-premise or in the cloud.
2. Copy all the scripts from the [scripts](https://github.com/h2oai/h2o-kubeflow/tree/master/scripts) folder in this repo to the machine.
3. Move `deployment-status.service` and `deployment-status.timer` to `/etc/systemd/system/` and enable the services.
```
sudo mv deployment-status.service /etc/systemd/system/
sudo mv deployment-status.timer /etc/systemd/system/
sudo systemctl enable deployment-status.service deployment-status.timer
sudo systemctl start deployment-status.service deployment-status.timer
```
4. Move `deployment-status.sh`, `k8s_master_setup.sh` and `k8s_slave_setup.sh` to a new directory `/opt/kubeflow/`
```
sudo mkdir /opt/kubeflow
sudo mv deployment-status.sh k8s_master_setup.sh k8s_slave_setup.sh /opt/kubeflow/
```
5. Run `sudo /opt/kubeflow/k8s_master_setup.sh`. This script will modify `k8s_slave_setup.sh` with the commands needed to connect other __Ubuntu 16.04__ machines to the Kubernetes cluster.
6. Run the new `k8s_slave_setup.sh` on any other machines you want to connect to the cluster.
7. `k8s_slave_setup.sh` will also create a new file called `config.txt` in `/opt/kubeflow/`. Modify its final line, `KSONNET_APP`, to the relative path of the app directory created by `ks init`: for `/home/ubuntu/my_ksonnet_app`, use `KSONNET_APP=my_ksonnet_app`.
8. Use `kubectl get nodes` to ensure that all nodes are attached properly to the cluster.
9.
Follow above steps to deploy H2O on Kubeflow + Kubernetes 103 | -------------------------------------------------------------------------------- /dockerfiles/DAIMojoRestServer4-1.11.1.jar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/h2oai/h2o-kubeflow/0f3e052b15e687fcde0da70155f0d8d340d2ede6/dockerfiles/DAIMojoRestServer4-1.11.1.jar -------------------------------------------------------------------------------- /dockerfiles/Dockerfile.base: -------------------------------------------------------------------------------- 1 | # Base image for Driverless AI components in Kubeflow Pipelines 2 | # includes: kubectl, ksonnet, jsonnet, python3.6 3 | # Maintainer: Nicholas Png 4 | # Contact: nicholas.png@h2o.ai 5 | 6 | FROM ubuntu:16.04 7 | 8 | # Install base requirements 9 | RUN apt-get -y update && \ 10 | apt-get -y --no-install-recommends install \ 11 | wget \ 12 | curl \ 13 | apt-utils \ 14 | python-software-properties \ 15 | default-jre \ 16 | nginx \ 17 | libzmq-dev \ 18 | libblas-dev \ 19 | apache2-utils \ 20 | software-properties-common 21 | 22 | # Get kubectl 23 | RUN \ 24 | curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl && \ 25 | chmod +x ./kubectl && \ 26 | mv ./kubectl /usr/local/bin/kubectl 27 | 28 | # Get ksonnet 29 | RUN \ 30 | wget https://github.com/ksonnet/ksonnet/releases/download/v0.13.1/ks_0.13.1_linux_amd64.tar.gz && \ 31 | tar -xzvf ks_0.13.1_linux_amd64.tar.gz && \ 32 | chmod +x ./ks_0.13.1_linux_amd64/ks && \ 33 | cp ks_0.13.1_linux_amd64/ks /usr/local/bin/ks && \ 34 | rm ks_0.13.1_linux_amd64.tar.gz 35 | 36 | # Install Driverless AI 37 | RUN \ 38 | wget https://s3.amazonaws.com/artifacts.h2o.ai/releases/ai/h2o/dai/rel-1.4.2-9/x86_64-centos7/dai-1.4.2-linux-x86_64.sh && \ 39 | chmod +x ./dai-1.4.2-linux-x86_64.sh && \ 40 | ./dai-1.4.2-linux-x86_64.sh && \ 
41 | rm dai-1.4.2-linux-x86_64.sh 42 | 43 | RUN \ 44 | echo "export PATH=/dai-1.4.2-linux-x86_64/python/bin:$PATH" >> /root/.bashrc && \ 45 | echo "export LD_LIBRARY_PATH=/dai-1.4.2-linux-x86_64/python/lib" >> /root/.bashrc 46 | 47 | ENV PATH=/dai-1.4.2-linux-x86_64/python/bin:$PATH 48 | ENV LD_LIBRARY_PATH=/dai-1.4.2-linux-x86_64/python/lib 49 | 50 | #Install gcloud sdk 51 | RUN \ 52 | export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" && \ 53 | echo "deb http://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" | tee -a /etc/apt/sources.list.d/google-cloud-sdk.list && \ 54 | curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \ 55 | apt-get -y update && \ 56 | apt-get install -y google-cloud-sdk 57 | -------------------------------------------------------------------------------- /dockerfiles/Dockerfile.h2o3: -------------------------------------------------------------------------------- 1 | ######################################################################## 2 | # Dockerfile for Oracle JDK 8 on Ubuntu 16.04 3 | ######################################################################## 4 | 5 | # pull base image 6 | FROM ubuntu:16.04 7 | 8 | # maintainer details 9 | MAINTAINER h2oai "h2o.ai" 10 | 11 | # add a post-invoke hook to dpkg which deletes cached deb files 12 | # update the sources.list 13 | # update/dist-upgrade 14 | # clear the caches 15 | 16 | 17 | RUN \ 18 | echo 'DPkg::Post-Invoke {"/bin/rm -f /var/cache/apt/archives/*.deb || true";};' | tee /etc/apt/apt.conf.d/no-cache && \ 19 | echo "deb http://mirror.math.princeton.edu/pub/ubuntu xenial main universe" >> /etc/apt/sources.list && \ 20 | apt-get update -q -y && \ 21 | apt-get dist-upgrade -y && \ 22 | apt-get clean && \ 23 | rm -rf /var/cache/apt/* && \ 24 | 25 | # Install Prerequisite Packages 26 | DEBIAN_FRONTEND=noninteractive apt-get install -y \ 27 | curl \ 28 | wget \ 29 | unzip \ 30 | apt-utils \ 31 | software-properties-common \ 32 | 
python-software-properties \ 33 | python3-setuptools \ 34 | python3-pip \ 35 | python-pip \ 36 | gdebi \ 37 | python3-pandas \ 38 | python-pandas \ 39 | python3-numpy \ 40 | python-numpy \ 41 | python3-matplotlib \ 42 | python-matplotlib \ 43 | libxml2-dev \ 44 | libssl-dev \ 45 | libcurl4-openssl-dev \ 46 | libgtk2.0-0 \ 47 | iputils-ping \ 48 | cloud-utils \ 49 | apache2-utils && \ 50 | 51 | # Install Oracle Java 8 52 | add-apt-repository -y ppa:webupd8team/java && \ 53 | apt-get update -q && \ 54 | echo debconf shared/accepted-oracle-license-v1-1 select true | debconf-set-selections && \ 55 | echo debconf shared/accepted-oracle-license-v1-1 seen true | debconf-set-selections && \ 56 | DEBIAN_FRONTEND=noninteractive apt-get install -y oracle-java8-installer && \ 57 | apt-get clean && \ 58 | 59 | # Fetch h2o latest_stable 60 | wget http://h2o-release.s3.amazonaws.com/h2o/latest_stable -O latest && \ 61 | wget --no-check-certificate -i latest -O /opt/h2o.zip && \ 62 | unzip -d /opt /opt/h2o.zip && \ 63 | rm /opt/h2o.zip && \ 64 | cd /opt && \ 65 | cd `find . -name 'h2o.jar' | sed 's/.\///;s/\/h2o.jar//g'` && \ 66 | cp h2o.jar /opt && \ 67 | /usr/bin/pip install `find . 
-name "*.whl"` && \ 68 | cd / && \ 69 | wget https://raw.githubusercontent.com/h2oai/h2o-3/master/docker/start-h2o-docker.sh && \ 70 | chmod +x start-h2o-docker.sh && \ 71 | 72 | # Get Content 73 | wget http://s3.amazonaws.com/h2o-training/mnist/train.csv.gz && \ 74 | gunzip train.csv.gz && \ 75 | wget https://raw.githubusercontent.com/laurendiperna/Churn_Scripts/master/Extraction_Script.py && \ 76 | wget https://raw.githubusercontent.com/laurendiperna/Churn_Scripts/master/Transformation_Script.py && \ 77 | wget https://raw.githubusercontent.com/laurendiperna/Churn_Scripts/master/Modeling_Script.py 78 | 79 | # Get kubectl 80 | RUN \ 81 | curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl && \ 82 | chmod +x ./kubectl && \ 83 | mv ./kubectl /usr/local/bin/kubectl 84 | 85 | # Define a mountable data directory 86 | #VOLUME \ 87 | # ["/data"] 88 | 89 | # Define the working directory 90 | #WORKDIR \ 91 | # /data 92 | 93 | COPY docker-startup.sh /opt/docker-startup.sh 94 | RUN chmod +x /opt/docker-startup.sh 95 | 96 | EXPOSE 54321 97 | EXPOSE 54322 98 | EXPOSE 54323 99 | EXPOSE 54324 100 | 101 | #ENTRYPOINT ["java", "-Xmx4g", "-jar", "/opt/h2o.jar"] 102 | # Define default command 103 | 104 | CMD \ 105 | ["/bin/bash"] 106 | -------------------------------------------------------------------------------- /dockerfiles/Dockerfile.h2o3coreosnotebook: -------------------------------------------------------------------------------- 1 | # Copyright (c) Jupyter Development Team. 2 | # Distributed under the terms of the Modified BSD License. 
3 | FROM jupyter/tensorflow-notebook:latest 4 | 5 | LABEL maintainer='Florian JUDITH /etc/apt/sources.list.d/google-cloud-sdk.list && \ 118 | curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \ 119 | apt-get update && \ 120 | apt-get install -y google-cloud-sdk=${CLOUD_SDK_VERSION}-0 kubectl && \ 121 | apt-get clean && \ 122 | rm -rf /var/lib/apt/lists/* && \ 123 | gcloud config set core/disable_usage_reporting true && \ 124 | gcloud config set component_manager/disable_update_check true && \ 125 | gcloud config set metrics/environment github_docker_image 126 | 127 | # Activate ipywidgets extension in the environment that runs the notebook server 128 | RUN jupyter nbextension enable --py widgetsnbextension --sys-prefix 129 | 130 | RUN curl -L -o bazel.sh https://github.com/bazelbuild/bazel/releases/download/0.8.0/bazel-0.8.0-installer-linux-x86_64.sh && chmod a+x ./bazel.sh && ./bazel.sh && rm ./bazel.sh 131 | SHELL ["/bin/bash", "-c"] 132 | 133 | RUN git clone https://github.com/tensorflow/models.git /home/$NB_USER/tensorflow-models && git clone https://github.com/tensorflow/benchmarks.git /home/$NB_USER/tensorflow-benchmarks 134 | # Import matplotlib the first time to build the font cache. 135 | ENV XDG_CACHE_HOME /home/$NB_USER/.cache/ 136 | RUN pip install jupyter-tensorboard 137 | 138 | # Create a conda environment for Python 2. We want to include as many of the 139 | # packages from our root environment as we reasonably can, so we explicitly 140 | # list that environment, then include everything unless it is Conda (which 141 | # can only be in the root environment), Jupyterhub (which requires Python 3), 142 | # or Python itself. We also want to include the pip packages, but we cannot 143 | # install those via conda, so we list them, drop any conda packages, and 144 | # then install them via pip. 
We do this on a best-effort basis, so if any 145 | # packages from the Python 3 environment cannot be installed with Python 2, 146 | # then we just skip them. 147 | RUN conda_packages=$(conda list -e | cut -d '=' -f 1 | grep -v '#' | sort) && \ 148 | pip_packages=$(pip --no-cache-dir list --format=freeze | cut -d '=' -f 1 | grep -v '#' | sort) && \ 149 | pip_only_packages=$(comm -23 <(echo "${pip_packages}") <(echo "${conda_packages}")) && \ 150 | conda create -n ipykernel_py2 python=2 --file <(echo "${conda_packages}" | grep -v conda | grep -v python | grep -v jupyterhub) && \ 151 | source activate ipykernel_py2 && \ 152 | python -m ipykernel install --user && \ 153 | echo "${pip_only_packages}" | xargs -n 1 -I "{}" /bin/bash -c 'pip install --no-cache-dir {} || true' && \ 154 | pip install --no-cache-dir tensorflow-transform && \ 155 | source deactivate 156 | 157 | RUN chown -R $NB_USER:users /etc/jupyter/ && \ 158 | chown -R $NB_USER /home/$NB_USER/ && \ 159 | chmod a+rx /usr/local/bin/* && \ 160 | fix-permissions /etc/jupyter/ && \ 161 | fix-permissions /home/$NB_USER/ 162 | 163 | USER $NB_UID -------------------------------------------------------------------------------- /dockerfiles/Dockerfile.h2o3notebook: -------------------------------------------------------------------------------- 1 | # Copyright (c) Jupyter Development Team. 2 | # Distributed under the terms of the Modified BSD License. 
3 | FROM ubuntu:latest 4 | 5 | USER root 6 | # Install all OS dependencies for notebook server that starts but lacks all 7 | # features (e.g., download as all possible file formats) 8 | ENV DEBIAN_FRONTEND noninteractive 9 | 10 | RUN apt-get update && apt-get install -yq --no-install-recommends \ 11 | apt-transport-https \ 12 | build-essential \ 13 | bzip2 \ 14 | ca-certificates \ 15 | curl \ 16 | emacs \ 17 | fonts-liberation \ 18 | g++ \ 19 | git \ 20 | inkscape \ 21 | jed \ 22 | libav-tools \ 23 | libcupti-dev \ 24 | libsm6 \ 25 | libxext-dev \ 26 | libxrender1 \ 27 | lmodern \ 28 | locales \ 29 | lsb-release \ 30 | openssh-client \ 31 | pandoc \ 32 | pkg-config \ 33 | python \ 34 | python-dev \ 35 | sudo \ 36 | unzip \ 37 | vim \ 38 | wget \ 39 | zip \ 40 | zlib1g-dev \ 41 | && apt-get clean && \ 42 | rm -rf /var/lib/apt/lists/* 43 | 44 | RUN echo "en_US.UTF-8 UTF-8" > /etc/locale.gen && \ 45 | locale-gen 46 | 47 | # Install Tini 48 | RUN wget --quiet https://github.com/krallin/tini/releases/download/v0.10.0/tini && \ 49 | echo "1361527f39190a7338a0b434bd8c88ff7233ce7b9a4876f3315c22fce7eca1b0 *tini" | sha256sum -c - && \ 50 | mv tini /usr/local/bin/tini && \ 51 | chmod +x /usr/local/bin/tini 52 | 53 | # Install ksonnet 54 | RUN wget --quiet https://github.com/ksonnet/ksonnet/releases/download/v0.8.0/ks_0.8.0_linux_amd64.tar.gz && \ 55 | tar -zvxf ks_0.8.0_linux_amd64.tar.gz && \ 56 | mv ks_0.8.0_linux_amd64/ks /usr/local/bin/ks && \ 57 | chmod +x /usr/local/bin/ks 58 | 59 | # Configure environment 60 | ENV CONDA_DIR /opt/conda 61 | ENV PATH $CONDA_DIR/bin:$PATH 62 | ENV SHELL /bin/bash 63 | ENV NB_USER jovyan 64 | ENV NB_UID 1000 65 | ENV HOME /home/$NB_USER 66 | ENV LC_ALL en_US.UTF-8 67 | ENV LANG en_US.UTF-8 68 | ENV LANGUAGE en_US.UTF-8 69 | 70 | # Create jovyan user with UID=1000 and in the 'users' group 71 | RUN useradd -m -s /bin/bash -N -u $NB_UID $NB_USER && \ 72 | mkdir -p $CONDA_DIR && \ 73 | chown $NB_USER $CONDA_DIR 74 | 75 | # Setup work directory 
for backward-compatibility 76 | RUN mkdir /home/$NB_USER/work 77 | 78 | # Install conda as jovyan and check the md5 sum provided on the download site 79 | ENV MINICONDA_VERSION 4.3.21 80 | RUN cd /tmp && \ 81 | mkdir -p $CONDA_DIR && \ 82 | wget --quiet https://repo.continuum.io/miniconda/Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh && \ 83 | echo "c1c15d3baba15bf50293ae963abef853 *Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh" | md5sum -c - && \ 84 | /bin/bash Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh -f -b -p $CONDA_DIR && \ 85 | rm Miniconda3-${MINICONDA_VERSION}-Linux-x86_64.sh && \ 86 | $CONDA_DIR/bin/conda config --system --prepend channels conda-forge && \ 87 | $CONDA_DIR/bin/conda config --system --set auto_update_conda false && \ 88 | $CONDA_DIR/bin/conda config --system --set show_channel_urls true && \ 89 | $CONDA_DIR/bin/conda update --all && \ 90 | conda clean -tipsy 91 | 92 | # Install Jupyter Notebook and Hub 93 | RUN conda install --quiet --yes \ 94 | 'notebook=5.0.*' \ 95 | 'jupyterhub=0.8.1' \ 96 | 'jupyterlab=0.31.*' \ 97 | && conda clean -tipsy 98 | 99 | EXPOSE 8888 100 | WORKDIR $HOME 101 | 102 | # Configure container startup 103 | ENTRYPOINT ["tini", "--"] 104 | CMD ["start-notebook.sh"] 105 | 106 | # Install CUDA Profile Tools and other python packages 107 | RUN pip --no-cache-dir install \ 108 | Pillow \ 109 | h5py \ 110 | ipykernel \ 111 | matplotlib \ 112 | numpy \ 113 | scipy \ 114 | sklearn \ 115 | kubernetes \ 116 | grpcio \ 117 | ktext \ 118 | annoy \ 119 | nltk \ 120 | pydot \ 121 | pydot-ng \ 122 | graphviz \ 123 | && \ 124 | python -m ipykernel.kernelspec 125 | 126 | # Install Python 3 packages 127 | # Remove pyqt and qt pulled in for matplotlib since we're only ever going to 128 | # use notebook-friendly backends in these images 129 | RUN conda install --quiet --yes \ 130 | 'nomkl' \ 131 | 'ipywidgets=6.0*' \ 132 | 'pandas=0.22*' \ 133 | 'numexpr=2.6*' \ 134 | 'matplotlib=2.0*' \ 135 | 'scipy=0.19*' \ 136 | 
'seaborn=0.7*' \ 137 | 'scikit-learn=0.18*' \ 138 | 'scikit-image=0.12*' \ 139 | 'sympy=1.0*' \ 140 | 'cython=0.25*' \ 141 | 'patsy=0.4*' \ 142 | 'statsmodels=0.8*' \ 143 | 'cloudpickle=0.2*' \ 144 | 'dill=0.2*' \ 145 | 'numba=0.31*' \ 146 | 'bokeh=0.12*' \ 147 | 'sqlalchemy=1.1*' \ 148 | 'hdf5=1.8.17' \ 149 | 'h5py=2.6*' \ 150 | 'vincent=0.4.*' \ 151 | 'beautifulsoup4=4.5.*' \ 152 | 'xlrd' && \ 153 | conda remove --quiet --yes --force qt pyqt && \ 154 | conda clean -tipsy 155 | 156 | # Install graphviz package 157 | RUN apt-get update && apt-get install -yq --no-install-recommends graphviz \ 158 | && apt-get clean && \ 159 | rm -rf /var/lib/apt/lists/* 160 | 161 | # Install Python 3 Tensorflow without GPU support 162 | RUN pip install --quiet --no-cache-dir tf-nightly 163 | 164 | # Install Oracle Java 8 165 | RUN \ 166 | apt-get update && apt-get install -y wget unzip python-pip python-sklearn python-pandas python-numpy python-matplotlib software-properties-common python-software-properties && \ 167 | add-apt-repository -y ppa:webupd8team/java && \ 168 | apt-get update -q && \ 169 | echo debconf shared/accepted-oracle-license-v1-1 select true | debconf-set-selections && \ 170 | echo debconf shared/accepted-oracle-license-v1-1 seen true | debconf-set-selections && \ 171 | DEBIAN_FRONTEND=noninteractive apt-get install -y oracle-java8-installer && \ 172 | apt-get clean 173 | 174 | # Install H2O.3 175 | RUN pip --no-cache-dir install \ 176 | requests \ 177 | tabulate \ 178 | scikit-learn \ 179 | colorama \ 180 | future 181 | RUN pip --no-cache-dir --trusted-host h2o-release.s3.amazonaws.com install -f \ 182 | http://h2o-release.s3.amazonaws.com/h2o/latest_stable_Py.html h2o 183 | 184 | ENV CLOUD_SDK_VERSION 168.0.0 185 | RUN export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)" && \ 186 | echo "deb https://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" > /etc/apt/sources.list.d/google-cloud-sdk.list && \ 187 | curl 
https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \ 188 | apt-get update && \ 189 | apt-get install -y google-cloud-sdk=${CLOUD_SDK_VERSION}-0 kubectl && \ 190 | gcloud config set core/disable_usage_reporting true && \ 191 | gcloud config set component_manager/disable_update_check true && \ 192 | gcloud config set metrics/environment github_docker_image 193 | 194 | # Activate ipywidgets extension in the environment that runs the notebook server 195 | RUN jupyter nbextension enable --py widgetsnbextension --sys-prefix 196 | 197 | RUN curl -L -o bazel.sh https://github.com/bazelbuild/bazel/releases/download/0.8.0/bazel-0.8.0-installer-linux-x86_64.sh && chmod a+x ./bazel.sh && ./bazel.sh && rm ./bazel.sh 198 | SHELL ["/bin/bash", "-c"] 199 | 200 | RUN git clone https://github.com/tensorflow/models.git /home/$NB_USER/tensorflow-models && git clone https://github.com/tensorflow/benchmarks.git /home/$NB_USER/tensorflow-benchmarks 201 | # Import matplotlib the first time to build the font cache. 202 | ENV XDG_CACHE_HOME /home/$NB_USER/.cache/ 203 | RUN pip install jupyter-tensorboard 204 | 205 | # Create a conda environment for Python 2. We want to include as many of the 206 | # packages from our root environment as we reasonably can, so we explicitly 207 | # list that environment, then include everything unless it is Conda (which 208 | # can only be in the root environment), Jupyterhub (which requires Python 3), 209 | # or Python itself. We also want to include the pip packages, but we cannot 210 | # install those via conda, so we list them, drop any conda packages, and 211 | # then install them via pip. We do this on a best-effort basis, so if any 212 | # packages from the Python 3 environment cannot be installed with Python 2, 213 | # then we just skip them. 
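# Illustration only (comments, not executed during the build): `comm -23`
# prints the lines unique to its first sorted input, which is how the
# pip-only package list is derived in the RUN step below. With hypothetical
# package lists:
#
#   pip list   (sorted):  h2o  numpy  requests
#   conda list (sorted):  numpy  requests  scipy
#
#   comm -23 <(printf 'h2o\nnumpy\nrequests\n') <(printf 'numpy\nrequests\nscipy\n')
#   # prints only "h2o" -- the package present in pip but not in conda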
214 | RUN conda_packages=$(conda list -e | cut -d '=' -f 1 | grep -v '#' | sort) && \ 215 | pip_packages=$(pip --no-cache-dir list --format=freeze | cut -d '=' -f 1 | grep -v '#' | sort) && \ 216 | pip_only_packages=$(comm -23 <(echo "${pip_packages}") <(echo "${conda_packages}")) && \ 217 | conda create -n ipykernel_py2 python=2 --file <(echo "${conda_packages}" | grep -v conda | grep -v python | grep -v jupyterhub) && \ 218 | source activate ipykernel_py2 && \ 219 | python -m ipykernel install --user && \ 220 | echo "${pip_only_packages}" | xargs -n 1 -I "{}" /bin/bash -c 'pip install --no-cache-dir {} || true' && \ 221 | pip install --no-cache-dir tensorflow-transform && \ 222 | source deactivate 223 | 224 | # Add local files as late as possible to avoid cache busting 225 | COPY start.sh /usr/local/bin/ 226 | COPY start-notebook.sh /usr/local/bin/ 227 | COPY start-singleuser.sh /usr/local/bin/ 228 | COPY jupyter_notebook_config.py /etc/jupyter/ 229 | RUN chown -R $NB_USER:users /etc/jupyter/ && \ 230 | chown -R $NB_USER /home/$NB_USER/ && \ 231 | chmod a+rx /usr/local/bin/* 232 | 233 | USER $NB_USER 234 | ENV PATH=/home/jovyan/bin:$PATH -------------------------------------------------------------------------------- /dockerfiles/Dockerfile.mojo: -------------------------------------------------------------------------------- 1 | # Base image for Driverless AI Mojos in Kubeflow Pipelines 2 | # includes: java openjdk 8, 3 | # Maintainer: Nicholas Png 4 | # Contact: nicholas.png@h2o.ai 5 | 6 | FROM ubuntu:16.04 7 | 8 | ENV DAI_PYTHON_VERSION=master-42 9 | 10 | RUN apt-get -y update && \ 11 | apt-get -y --no-install-recommends install \ 12 | vim \ 13 | wget \ 14 | curl \ 15 | unzip \ 16 | apt-utils \ 17 | default-jre \ 18 | nginx \ 19 | net-tools \ 20 | ca-certificates \ 21 | build-essential \ 22 | software-properties-common 23 | 24 | # Install Oracle Java 8 25 | RUN \ 26 | add-apt-repository -y ppa:webupd8team/java && \ 27 | apt-get update -q && \ 28 | echo debconf 
shared/accepted-oracle-license-v1-1 select true | debconf-set-selections && \
  echo debconf shared/accepted-oracle-license-v1-1 seen true | debconf-set-selections && \
  DEBIAN_FRONTEND=noninteractive apt-get install -y oracle-java8-installer && \
  apt-get clean

RUN \
  add-apt-repository ppa:deadsnakes/ppa && \
  apt-get -y update && \
  apt-get -y install \
    python3.6 \
    python3-setuptools \
    python3-pip

RUN curl https://bootstrap.pypa.io/get-pip.py | python3.6

RUN \
  pip3.6 install --force-reinstall pip==9.0.3 && \
  pip3.6 install flask requests tornado

RUN ln -fs /usr/bin/python3.6 /usr/bin/python
RUN ln -fs /usr/local/bin/pip3.6 /usr/local/bin/pip

COPY DAIMojoRestServer4-1.11.1.jar /opt/h2oai/dai/DAIMojoRestServer4-1.11.1.jar
COPY mojo-startup.sh /mojo-startup.sh
COPY mojo_tornado.py /mojo_tornado.py
RUN chmod +x /mojo-startup.sh

# args: <license path> <mojo directory> <java heap size in GB>
ENTRYPOINT ["/mojo-startup.sh", "/opt/h2oai/dai/license.sig", "/opt/h2oai/dai", "4"]

EXPOSE 5555
--------------------------------------------------------------------------------
/dockerfiles/Makefile:
--------------------------------------------------------------------------------
# Copyright 2015 Google Inc. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

TAG?=$(shell git rev-parse HEAD)
PROJECT_ID = kubeflow
GCR_PROJECT = gcr.io/${PROJECT_ID}
DOCKERHUB_PROJECT=hub.docker.com/r/${PROJECT_ID}
IMAGE=tensorflow-notebook


define container
	docker build --pull -t $(1)/${IMAGE}-cpu:${TAG} -f Dockerfile.cpu .
	docker build --pull -t $(1)/${IMAGE}-gpu:${TAG} -f Dockerfile.gpu .
endef

all:
	$(call container,$(GCR_PROJECT))
	$(call container,$(DOCKERHUB_PROJECT))

build_gcr:
	$(call container,$(GCR_PROJECT))

build_dockerhub:
	$(call container,$(DOCKERHUB_PROJECT))

push_gcr: build_gcr
	gcloud docker -- push $(GCR_PROJECT)/${IMAGE}-cpu:${TAG}
	gcloud docker -- push $(GCR_PROJECT)/${IMAGE}-gpu:${TAG}

push_dockerhub: build_dockerhub
	docker push $(DOCKERHUB_PROJECT)/${IMAGE}-cpu:${TAG}
	docker push $(DOCKERHUB_PROJECT)/${IMAGE}-gpu:${TAG}

push: build_gcr build_dockerhub push_gcr push_dockerhub

.PHONY: all containers push push_dockerhub push_gcr
--------------------------------------------------------------------------------
/dockerfiles/docker-startup.sh:
--------------------------------------------------------------------------------
#!/bin/bash

touch flatfile.txt

podsready="false"

# Wait until every pod has been assigned an IP ("<none>" in the IP column
# means a pod is still pending), then write <ip>:54321 entries for this
# deployment's pods to the H2O flatfile.
while [ "$podsready" = "false" ]
do
  if kubectl get pods -o wide | awk '{print $6}' | grep -q "<none>"
  then
    sleep 5
  else
    kubectl get pods -o wide | grep "$DEP_NAME" | awk '{print $6":54321"}' >> flatfile.txt
    podsready="true"
  fi
done
4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | 17 | from jupyter_core.paths import jupyter_data_dir 18 | import subprocess 19 | import os 20 | import errno 21 | import stat 22 | 23 | c = get_config() 24 | c.NotebookApp.ip = '*' 25 | c.NotebookApp.port = 8888 26 | c.NotebookApp.open_browser = False 27 | 28 | # Generate a self-signed certificate 29 | if 'GEN_CERT' in os.environ: 30 | dir_name = jupyter_data_dir() 31 | pem_file = os.path.join(dir_name, 'notebook.pem') 32 | try: 33 | os.makedirs(dir_name) 34 | except OSError as exc: # Python >2.5 35 | if exc.errno == errno.EEXIST and os.path.isdir(dir_name): 36 | pass 37 | else: 38 | raise 39 | # Generate a certificate if one doesn't exist on disk 40 | subprocess.check_call(['openssl', 'req', '-new', 41 | '-newkey', 'rsa:2048', 42 | '-days', '365', 43 | '-nodes', '-x509', 44 | '-subj', '/C=XX/ST=XX/L=XX/O=generated/CN=generated', 45 | '-keyout', pem_file, 46 | '-out', pem_file]) 47 | # Restrict access to the file 48 | os.chmod(pem_file, stat.S_IRUSR | stat.S_IWUSR) 49 | c.NotebookApp.certfile = pem_file 50 | -------------------------------------------------------------------------------- /dockerfiles/mojo-startup.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | LICENSE_LOCATION=$1 4 | MOJO_LOCATION=$2 5 | JAVA_HEAP_MEMORY=${3:-2}  # default to 2 (GB) if no third argument is given; the Dockerfile ENTRYPOINT passes only two 6 | REST_SERVER_JAR_LOCATION="/opt/h2oai/dai/DAIMojoRestServer4-1.11.1.jar" 7 | 8 | nohup 
python mojo_tornado.py < /dev/null > tornado.out 2>&1 & 9 | 10 | while true 11 | do 12 | if [ -f $LICENSE_LOCATION ] && [ -d $MOJO_LOCATION ] 13 | then 14 | echo "LICENSE FILE EXISTS: $LICENSE_LOCATION, and MOJO DIRECTORY EXISTS: $MOJO_LOCATION" 15 | break 16 | else 17 | echo "missing necessary files at $LICENSE_LOCATION and $MOJO_LOCATION" 18 | mkdir -p $MOJO_LOCATION 19 | sleep 5 20 | fi 21 | done 22 | 23 | sleep 5 24 | echo "All Required Files Available, Launching DAI MOJO Rest Server" 25 | echo "Starting Rest Server..." 26 | nohup /usr/bin/java -Xmx${JAVA_HEAP_MEMORY}g -Dai.h2o.mojos.runtime.license.file=$LICENSE_LOCATION -DModelDirectory=$MOJO_LOCATION -jar $REST_SERVER_JAR_LOCATION < /dev/null > javarest.out 2>&1 & 27 | 28 | /bin/bash 29 | 
Your Input Was: {}".format(data) 27 | self.write(response) 28 | 29 | get = post 30 | 31 | 32 | class restGetModelFeaturesHandler(tornado.web.RequestHandler): 33 | def post(self): 34 | model_name = self.get_argument('name', "No Data Received") 35 | data = {"name": model_name} 36 | url = 'http://localhost:8080/modelfeatures' 37 | response = requests.post(url, params=data) 38 | self.write(response.content) 39 | 40 | get = post 41 | 42 | 43 | class restScoreRow(tornado.web.RequestHandler): 44 | def post(self): 45 | model_name = self.get_argument('name', "No Model Name") 46 | row_string = self.get_argument('row', "No Data Received") 47 | data = {'name': model_name, 'row': row_string} 48 | url = 'http://localhost:8080/model' 49 | response = requests.post(url, params=data) 50 | result = json.loads(response.content.decode('utf-8')) 51 | result = result['result'].replace("=", "").split() 52 | self.write({'{}'.format(result[0]): result[1], 53 | '{}'.format(result[2]): result[3]}) 54 | 55 | get = post 56 | 57 | 58 | class restScoreBatch(tornado.web.RequestHandler): 59 | def post(self): 60 | all_preds = dict() 61 | batch_file = self.request.files['file'][0] 62 | model_name = self.get_argument('name', "No Model Name") 63 | header = self.get_argument('header', 'true') 64 | lines = batch_file['body'].decode('utf-8').split() 65 | if header == 'true': 66 | lines = lines[1:] 67 | 68 | for i, line in enumerate(lines): 69 | line = line.replace('"', "").replace("'", "") 70 | data = {'name': model_name, 'row': line} 71 | url = 'http://localhost:8080/model' 72 | response = requests.post(url, params=data) 73 | result = json.loads(response.content.decode('utf-8')) 74 | result = result['result'].replace("=", "").split() 75 | all_preds['{}'.format(i)] = dict({'{}'.format(result[0]): result[1], 76 | '{}'.format(result[2]): result[3]}) 77 | self.write(all_preds) 78 | 79 | 80 | application = tornado.web.Application([ 81 | (r"/", welcomePage), 82 | (r"/version", versionHandler), 83 | 
(r"/postsomething", basePostHandler), 84 | (r"/modelfeatures", restGetModelFeaturesHandler), 85 | (r"/scorerow", restScoreRow), 86 | (r"/scorebatch", restScoreBatch) 87 | ]) 88 | 89 | if __name__ == "__main__": 90 | application.listen(5555) 91 | tornado.ioloop.IOLoop.instance().start() 92 | -------------------------------------------------------------------------------- /dockerfiles/sample-mojos/creditcard.mojo: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/h2oai/h2o-kubeflow/0f3e052b15e687fcde0da70155f0d8d340d2ede6/dockerfiles/sample-mojos/creditcard.mojo -------------------------------------------------------------------------------- /dockerfiles/sample-mojos/loanlevel.mojo: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/h2oai/h2o-kubeflow/0f3e052b15e687fcde0da70155f0d8d340d2ede6/dockerfiles/sample-mojos/loanlevel.mojo -------------------------------------------------------------------------------- /dockerfiles/start-notebook.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Copyright 2017 The Kubeflow Authors All rights reserved. 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | 16 | set -e 17 | 18 | if [[ ! 
-z "${JUPYTERHUB_API_TOKEN}" ]]; then 19 | # launched by JupyterHub, use single-user entrypoint 20 | exec /usr/local/bin/start-singleuser.sh $* 21 | else 22 | . /usr/local/bin/start.sh jupyter notebook $* 23 | fi 24 | -------------------------------------------------------------------------------- /dockerfiles/start-singleuser.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Copyright 2017 The Kubeflow Authors All rights reserved. 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | 16 | set -e 17 | 18 | # set default ip to 0.0.0.0 19 | if [[ "$NOTEBOOK_ARGS $@" != *"--ip="* ]]; then 20 | NOTEBOOK_ARGS="--ip=0.0.0.0 $NOTEBOOK_ARGS" 21 | fi 22 | 23 | # handle some deprecated environment variables 24 | # from DockerSpawner < 0.8. 25 | # These won't be passed from DockerSpawner 0.9, 26 | # so avoid specifying --arg=empty-string 27 | if [ ! -z "$NOTEBOOK_DIR" ]; then 28 | NOTEBOOK_ARGS="--notebook-dir='$NOTEBOOK_DIR' $NOTEBOOK_ARGS" 29 | fi 30 | if [ ! -z "$JPY_PORT" ]; then 31 | NOTEBOOK_ARGS="--port=$JPY_PORT $NOTEBOOK_ARGS" 32 | fi 33 | if [ ! -z "$JPY_USER" ]; then 34 | NOTEBOOK_ARGS="--user=$JPY_USER $NOTEBOOK_ARGS" 35 | fi 36 | if [ ! -z "$JPY_COOKIE_NAME" ]; then 37 | NOTEBOOK_ARGS="--cookie-name=$JPY_COOKIE_NAME $NOTEBOOK_ARGS" 38 | fi 39 | if [ ! -z "$JPY_BASE_URL" ]; then 40 | NOTEBOOK_ARGS="--base-url=$JPY_BASE_URL $NOTEBOOK_ARGS" 41 | fi 42 | if [ ! 
-z "$JPY_HUB_PREFIX" ]; then 43 | NOTEBOOK_ARGS="--hub-prefix=$JPY_HUB_PREFIX $NOTEBOOK_ARGS" 44 | fi 45 | if [ ! -z "$JPY_HUB_API_URL" ]; then 46 | NOTEBOOK_ARGS="--hub-api-url=$JPY_HUB_API_URL $NOTEBOOK_ARGS" 47 | fi 48 | 49 | . /usr/local/bin/start.sh jupyterhub-singleuser $NOTEBOOK_ARGS $@ 50 | -------------------------------------------------------------------------------- /dockerfiles/start.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Copyright 2017 The Kubeflow Authors All rights reserved. 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | 16 | set -e 17 | 18 | # Handle special flags if we're root 19 | if [ $(id -u) == 0 ] ; then 20 | # Handle username change. Since this is cheap, do this unconditionally 21 | usermod -d /home/$NB_USER -l $NB_USER jovyan 22 | 23 | # Change UID of NB_USER to NB_UID if it does not match 24 | if [ "$NB_UID" != $(id -u $NB_USER) ] ; then 25 | echo "Set user UID to: $NB_UID" 26 | usermod -u $NB_UID $NB_USER 27 | # Careful: $HOME might resolve to /root depending on how the 28 | # container is started. Use the $NB_USER home path explicitly. 29 | for d in "$CONDA_DIR" "$JULIA_PKGDIR" "/home/$NB_USER"; do 30 | if [[ ! 
-z "$d" && -d "$d" ]]; then 31 | echo "Set ownership to uid $NB_UID: $d" 32 | chown -R $NB_UID "$d" 33 | fi 34 | done 35 | fi 36 | 37 | # Change GID of NB_USER to NB_GID if NB_GID is passed as a parameter 38 | if [ "$NB_GID" ] ; then 39 | echo "Change GID to $NB_GID" 40 | groupmod -g $NB_GID -o $(id -g -n $NB_USER) 41 | fi 42 | 43 | # Enable sudo if requested 44 | if [[ "$GRANT_SUDO" == "1" || "$GRANT_SUDO" == 'yes' ]]; then 45 | echo "Granting $NB_USER sudo access" 46 | echo "$NB_USER ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/notebook 47 | fi 48 | 49 | # Exec the command as NB_USER 50 | echo "Execute the command as $NB_USER" 51 | exec su $NB_USER -c "env PATH=$PATH $*" 52 | else 53 | if [[ ! -z "$NB_UID" && "$NB_UID" != "$(id -u)" ]]; then 54 | echo 'Container must be run as root to set $NB_UID' 55 | fi 56 | if [[ ! -z "$NB_GID" && "$NB_GID" != "$(id -g)" ]]; then 57 | echo 'Container must be run as root to set $NB_GID' 58 | fi 59 | if [[ "$GRANT_SUDO" == "1" || "$GRANT_SUDO" == 'yes' ]]; then 60 | echo 'Container must be run as root to grant sudo permissions' 61 | fi 62 | # Exec the command 63 | echo "Execute the command" 64 | exec $* 65 | fi 66 | -------------------------------------------------------------------------------- /h2o-kubeflow/generate_docs.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # Copyright 2018 The Kubeflow Authors All rights reserved. 4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 16 | 17 | import glob 18 | import os 19 | import subprocess 20 | 21 | if __name__ == "__main__": 22 | this_dir = os.path.dirname(__file__) 23 | 24 | GOPATH = os.getenv("GOPATH") 25 | doc_gen = os.path.join(GOPATH, "bin/doc-gen") 26 | for f in os.listdir(this_dir): 27 | full_dir = os.path.join(this_dir, f) 28 | if not os.path.isdir(full_dir): 29 | continue 30 | prototypes = glob.glob(os.path.join(full_dir, "prototypes/*.jsonnet")) 31 | 32 | 33 | command = [doc_gen, os.path.join(full_dir, "parts.yaml")] 34 | command.extend(prototypes) 35 | with open(os.path.join(full_dir, "README.md"), "w") as hout: 36 | subprocess.check_call(command, stdout=hout) 37 | -------------------------------------------------------------------------------- /h2o-kubeflow/h2o3-scaling/README.md: -------------------------------------------------------------------------------- 1 | # Static H2O-3 Cluster 2 | 3 | This will deploy an automatically scaling H2O-3 cluster. This is currently in development and is not 100% stable. 4 | 5 | ## Requirements: 6 | 7 | For the horizontal pod autoscaler to work, you will need to deploy this to your Kubernetes cluster first: 8 | https://github.com/kubernetes-incubator/metrics-server 9 | 10 | 1. Clone the repo above to your local machine 11 | 2. Once kubectl is properly configured, run `kubectl create -f deploy/1.8+/` from the top of the cloned repo 12 | 13 | This will launch the metrics API server that the horizontal pod autoscaler queries for memory/cpu consumption 14 | 15 | ## NEED TO KNOW 16 | 1. H2O-3 locks its cluster once it starts running jobs. If the memory consumption threshold is exceeded, the horizontal pod autoscaler will spawn a new pod, but the pod will not be able to attach to the cluster. The cluster will need to be shut down and re-initialized. 17 | - Python restart commands `h2o.cluster().shutdown()` and `h2o.init()` 18 | 2. 
Due to the nature of how scaling is applied, it is recommended that jobs be run as a script where the cluster will be shut down automatically if a new node is spawned. 19 | 3. New pods are spawned one at a time, and as a result high cost could be incurred if it takes multiple iterations to reach a suitable number of pods. 20 | 4. Once the amount of jobs is reduced, the horizontal pod autoscaler will also scale down automatically to fit the memory consumption. 21 | 22 | ## Parameters: 23 | 24 | The following are the parameters that can be supplied to deploy the cluster: 25 | - `--name` [REQUIRED] Name of the deployment 26 | - `--namespace` [OPTIONAL] Namespace where the deployment will be deployed e.g. prod, staging, etc. 27 | - `--memory` [OPTIONAL] Amount of memory each pod in the deployment and node in the H2O-3 cluster will get. Default is 1Gi. NOTE: only input the numeric value. If you want less than 1Gi, use decimals (e.g. 0.5) 28 | - `--cpu` [OPTIONAL] Number of cpus in each deployment pod and H2O-3 node 29 | - `--model_server_image` [REQUIRED] Docker image used to launch each pod 30 | 31 | ## Quickstart: 32 | ``` 33 | ks init 34 | cd 35 | 36 | ks env add 37 | ks registry add h2o-kubeflow 38 | ks pkg install h2o-kubeflow/h2o3-scaling 39 | 40 | ks prototype use io.ksonnet.pkg.h2o3-scaling h2o3-scaling \ 41 | --name h2o3-scaling \ 42 | --namespace kubeflow \ 43 | --memory 1 \ 44 | --cpu 1 \ 45 | --replicas 2 \ 46 | --model_server_image 47 | ``` 48 | -------------------------------------------------------------------------------- /h2o-kubeflow/h2o3-scaling/h2o3-scaling.libsonnet: -------------------------------------------------------------------------------- 1 | local k = import 'k.libsonnet'; 2 | local deployment = k.extensions.v1beta1.deployment; 3 | local container = deployment.mixin.spec.template.spec.containersType; 4 | local storageClass = k.storage.v1beta1.storageClass; 5 | local service = k.core.v1.service; 6 | local networkPolicy = 
k.extensions.v1beta1.networkPolicy; 7 | local networkSpec = networkPolicy.mixin.spec; 8 | 9 | { 10 | parts:: { 11 | deployment:: { 12 | local defaults = { 13 | imagePullPolicy:: "IfNotPresent", 14 | }, 15 | 16 | modelHPA(name, namespace, replicas, labels={ app: name }): { 17 | apiVersion: "autoscaling/v2beta1", 18 | kind: "HorizontalPodAutoscaler", 19 | metadata: { 20 | labels: labels, 21 | name: name, 22 | namespace: namespace, 23 | }, 24 | spec: { 25 | scaleTargetRef: { 26 | apiVersion: "extensions/v1beta1", 27 | kind: "Deployment", 28 | name: name, 29 | }, 30 | minReplicas: replicas, 31 | maxReplicas: 10, 32 | metrics: [ 33 | { 34 | type: "Resource", 35 | resource: { 36 | name: "memory", 37 | targetAverageUtilization: 80 38 | }, 39 | }, 40 | ], 41 | }, 42 | }, 43 | 44 | modelService(name, namespace, labels={ app: name }): { 45 | apiVersion: "v1", 46 | kind: "Service", 47 | metadata: { 48 | labels: labels, 49 | name: name, 50 | namespace: namespace, 51 | }, 52 | spec: { 53 | ports: [ 54 | { 55 | port: 54321, 56 | protocol: "TCP", 57 | targetPort: 54321, 58 | }, 59 | ], 60 | selector: labels, 61 | type: "LoadBalancer", 62 | }, 63 | }, 64 | 65 | modelServer(name, namespace, memory, cpu, replicas, modelServerImage, labels={ app: name },): 66 | local volume = { 67 | name: "local-data", 68 | namespace: namespace, 69 | emptyDir: {}, 70 | }; 71 | base(name, namespace, memory, cpu, replicas, modelServerImage, labels), 72 | 73 | local base(name, namespace, memory, cpu, replicas, modelServerImage, labels) = 74 | { 75 | apiVersion: "extensions/v1beta1", 76 | kind: "Deployment", 77 | metadata: { 78 | name: name, 79 | namespace: namespace, 80 | labels: labels, 81 | }, 82 | spec: { 83 | strategy: { 84 | rollingUpdate: { 85 | maxSurge: 1, 86 | maxUnavailable: 1 87 | }, 88 | type: "RollingUpdate" 89 | }, 90 | replicas: replicas, 91 | template: { 92 | metadata: { 93 | labels: labels, 94 | }, 95 | spec: { 96 | containers: [ 97 | { 98 | name: name, 99 | image: modelServerImage, 100 
| imagePullPolicy: defaults.imagePullPolicy, 101 | env: [ 102 | { 103 | name: "MEMORY", 104 | value: memory, 105 | } 106 | ], 107 | ports: [ 108 | { 109 | containerPort: 54321, 110 | protocol: "TCP" 111 | }, 112 | ], 113 | workingDir: "/opt", 114 | command: [ 115 | "java", 116 | "-Xmx$(MEMORY)g", 117 | "-jar", 118 | "h2o.jar", 119 | "-name", 120 | "h2oCluster", 121 | ], 122 | resources: { 123 | requests: { 124 | memory: memory + "Gi", 125 | cpu: cpu, 126 | }, 127 | limits: { 128 | memory: memory + "Gi", 129 | cpu: cpu, 130 | }, 131 | }, 132 | stdin: true, 133 | tty: true, 134 | }, 135 | ], 136 | dnsPolicy: "ClusterFirst", 137 | restartPolicy: "Always", 138 | schedulerName: "default-scheduler", 139 | securityContext: {}, 140 | }, 141 | }, 142 | }, 143 | }, 144 | }, 145 | }, 146 | } 147 | -------------------------------------------------------------------------------- /h2o-kubeflow/h2o3-scaling/parts.yaml: -------------------------------------------------------------------------------- 1 | { 2 | "name": "H2O3", 3 | "apiVersion": "0.0.1", 4 | "kind": "ksonnet.io/parts", 5 | "description": "H2O3 is an open source package that automates the machine learning process", 6 | "author": "H2O.ai ", 7 | "contributors": [ 8 | { 9 | "name": "H2O Team", 10 | "email": "email@email.com" 11 | } 12 | ], 13 | "repository": { 14 | "type": "git", 15 | "url": "https://github.com/h2oai/h2o-kubeflow" 16 | }, 17 | "keywords": [ 18 | "kubeflow", 19 | "H2O.ai", 20 | "H2O3" 21 | ], 22 | "quickStart": { 23 | "prototype": "io.ksonnet.pkg.h2o3-scaling", 24 | "componentName": "h2o3-scaling", 25 | "flags": { 26 | "name": "h2o3", 27 | "namespace": "default", 28 | "memory": "1Gi", 29 | "n_cpu": "1", 30 | "n_replicas": 1 31 | }, 32 | "comment": "Run H2O3 on Kubeflow with Automatic Scaling Enabled" 33 | }, 34 | "license": "Apache 2.0" 35 | } 36 | -------------------------------------------------------------------------------- /h2o-kubeflow/h2o3-scaling/prototypes/h2o3-scaling-all.jsonnet: 
-------------------------------------------------------------------------------- 1 | // @apiVersion 0.1 2 | // @name io.ksonnet.pkg.h2o3-scaling 3 | // @description H2O3 on Kubeflow 4 | // @shortDescription H2O3 Cluster 5 | // @param name string Name to give each of the components 6 | // @param model_server_image string gcr.io/h2o-gce/h2o3 7 | // @optionalParam namespace string default namespace 8 | // @optionalParam memory string 1Gi starting memory per pod 9 | // @optionalParam cpu string 1 starting number of cpu per pod 10 | // @optionalParam replicas number 1 starting number of pods 11 | 12 | local k = import 'k.libsonnet'; 13 | local h2o3cluster = import 'h2o-kubeflow/h2o3-scaling/h2o3-scaling.libsonnet'; 14 | 15 | local name = import 'param://name'; 16 | local namespace = import 'param://namespace'; 17 | local memory = import 'param://memory'; 18 | local cpu = import 'param://cpu'; 19 | local replicas = import 'param://replicas'; 20 | local modelServerImage = import 'param://model_server_image'; 21 | 22 | 23 | std.prune(k.core.v1.list.new([ 24 | h2o3cluster.parts.deployment.modelServer(name, namespace, memory, cpu, replicas, modelServerImage), 25 | h2o3cluster.parts.deployment.modelService(name, namespace), 26 | h2o3cluster.parts.deployment.modelHPA(name, namespace, replicas), 27 | ])) 28 | -------------------------------------------------------------------------------- /h2o-kubeflow/h2oai/README.md: -------------------------------------------------------------------------------- 1 | # H2O.ai Deployments 2 | 3 | ## Ksonnet App Setup: 4 | **NOTE:** The deployment steps assume that you have already set up a Kubernetes cluster and that `kubectl` and `ksonnet` have been installed/setup properly within your environment. 5 | 6 | 1. Create a new ksonnet app in your local environment 7 | * Run Command: `ks init ` and `cd ` 8 | 9 | 2. 
Grab Ksonnet registry and install necessary packages 10 | ``` 11 | # add ksonnet registry to app containing all the kubeflow manifests as maintained by Google Kubeflow team 12 | ks registry add kubeflow https://github.com/kubeflow/kubeflow/tree/master/kubeflow 13 | # add ksonnet registry to app containing all the h2o component manifests 14 | ks registry add h2o-kubeflow 15 | ks pkg install kubeflow/core 16 | ks pkg install kubeflow/tf-serving 17 | ks pkg install kubeflow/tf-job 18 | ks pkg install h2o-kubeflow/h2oai 19 | ks env add 20 | ``` 21 | You should be able to see the prototypes for `h2oai-driverlessai` and `h2oai-h2o3` after running `ks prototype list`: 22 | ``` 23 | Nicholass-MBP:h2o-kubeflow-dev npng$ ks prototype list 24 | NAME DESCRIPTION 25 | ==== =========== 26 | io.ksonnet.pkg.configMap A simple config map with optional user-specified data 27 | io.ksonnet.pkg.deployed-service A deployment exposed with a service 28 | io.ksonnet.pkg.h2oai-driverlessai Driverless AI 29 | io.ksonnet.pkg.h2oai-h2o3 H2O3 Static Cluster 30 | io.ksonnet.pkg.namespace Namespace with labels automatically populated from the name 31 | io.ksonnet.pkg.single-port-deployment Replicates a container n times, exposes a single port 32 | io.ksonnet.pkg.single-port-service Service that exposes a single port 33 | ``` 34 | 35 | ## H2O-3 Cluster (OSS) 36 | H2O is an open source, in-memory, distributed, fast, and scalable machine learning and predictive analytics platform that allows you to build machine learning models on big data and provides easy productionalization of those models in an enterprise environment. 37 | 38 | ### H2O-3 Deployment Steps 39 | **NOTE:** These deployment steps assume that you have already previously set up a Ksonnet application as defined above, and downloaded the necessary Ksonnet registries and packages. 40 | 41 | 1. Create a Docker image for Kubeflow to ingest. 
Files are located under the `dockerfiles` directory 42 | * Run Command: `docker build -t /h2o3-kubeflow:latest -f Dockerfile.h2o3 .` from the `dockerfiles` directory 43 | 44 | 2. Push the new docker image to a remote repository that Kubeflow can access later: `docker push /h2o3-kubeflow:latest` 45 | 46 | 3. Deploy the H2O-3 cluster to your Kubernetes cluster 47 | ``` 48 | ks prototype use io.ksonnet.pkg.h2oai-h2o3 \ 49 | --name \ 50 | --namespace kubeflow \ 51 | --memory 1 \ 52 | --cpu 1 \ 53 | --replicas 2 \ 54 | --model_server_image 55 | 56 | ks apply -c 57 | ``` 58 | **NOTE**: component names are used by Kubeflow to deploy to Kubernetes, while deployment names are what Kubernetes will show as the name of the process running in Kubernetes 59 | 60 | Running `kubectl get deployments` will show: 61 | ``` 62 | Nicholass-MBP:h2o-kubeflow-dev npng$ kubectl get deployments 63 | NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE 64 | 3 3 3 3 13m 65 | ``` 66 | Where the name in the `NAME` column is the specified `` 67 | 68 | 4. You will be able to see the exposed IP address and port for the cluster using `kubectl get svc`, and you will be able to connect to the cluster at `external ip + port 54321` 69 | 70 | ## Driverless AI (Enterprise) 71 | 72 | H2O Driverless AI is an artificial intelligence (AI) platform for automatic machine learning. Driverless AI automates some of the most difficult data science and machine learning workflows such as feature engineering, model validation, model tuning, model selection and model deployment. It aims to achieve highest predictive accuracy, comparable to expert data scientists, but in much shorter time thanks to end-to-end automation. Driverless AI also offers automatic visualizations and machine learning interpretability (MLI). 
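Driverless AI exports trained models as mojo artifacts, which this repo serves over REST via the `mojo_tornado.py` wrapper shown earlier. That wrapper's `restScoreRow` handler reshapes the Java REST server's flat `label = value` result string into a JSON dict; the reshaping step can be sketched as a plain function for illustration (the sample result string below is hypothetical, not actual server output):

```python
def parse_result(result_string):
    # Mirror restScoreRow in mojo_tornado.py: drop the "=" separators,
    # then pair the remaining label/probability tokens into a dict.
    tokens = result_string.replace("=", "").split()
    return {tokens[0]: tokens[1], tokens[2]: tokens[3]}

# Hypothetical two-class result string from the Java MOJO REST server:
print(parse_result("bad_loan.0 = 0.83 bad_loan.1 = 0.17"))
```

Keeping this logic in a standalone function makes it easy to unit-test without a running Java server.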
73 | 74 | ### Driverless AI Deployment Steps 75 | **NOTE:** These deployment steps assume that you have already set up a Ksonnet application as defined above, and downloaded the necessary Ksonnet registries and packages. 76 | 77 | 1. Make sure to obtain a copy of the Driverless AI docker image. Download links can be obtained from [https://www.h2o.ai/download/](https://www.h2o.ai/download/). There are multiple images for varying platforms and architectures, so make sure to download the correct one. 78 | 79 | 2. Load the docker image to your Kubernetes cluster: `docker load < downloaded_driverless_ai_image.tar.gz`. Since this image is not public, you may need to load it to each node in the cluster, or to an internal repository. 80 | 81 | 3. **(OPTIONAL)** Create a Kubernetes ConfigMap to configure your Driverless AI deployment. The expectation is that there are at least 2 files in the ConfigMap: `license.sig` containing the license key for Driverless AI, and `config.toml`, a file that can issue configuration overrides for Driverless AI. More information regarding the `config.toml` can be found [here](http://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/config_toml.html). 82 | 83 | **NOTE:** All files inside the directory path will be loaded for consumption. 84 | **Example:** If a user includes a `config.toml` that overrides authentication with local authentication (htpasswd), and the htpasswd file is contained in `/path/to/configuration/files` as `/path/to/configuration/files/htpasswd`, then Driverless AI will be able to see the file at path `/config/htpasswd` 85 | 86 | ``` 87 | kubectl create configmap driverless --from-file="/path/to/configuration/files/" 88 | ``` 89 | 90 | 4. 
Deploy Driverless AI to your Kubernetes cluster 91 | ``` 92 | ks prototype use io.ksonnet.pkg.h2oai-driverlessai \ 93 | --name \ 94 | --namespace kubeflow \ 95 | --memory 16 \ 96 | --cpu 4 \ 97 | --gpu 0 \ 98 | --pvcSize 50 \ 99 | --configMapName \ 100 | --model_server_image 101 | ``` 102 | **NOTE**: component names are used by Kubeflow to deploy to Kubernetes, while deployment names are what Kubernetes will show as the name of the process running in Kubernetes 103 | 104 | Running `kubectl get deployments` will show: 105 | ``` 106 | Nicholass-MBP:h2o-kubeflow-dev npng$ kubectl get deployments 107 | NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE 108 | 1 1 1 1 17m 109 | ``` 110 | 5. You will be able to see the exposed IP address and port for the cluster using `kubectl get svc`, and you can connect to Driverless AI using the `external ip address + port 12345` 111 | 112 | 6. If you did not provide a ConfigMap, Driverless AI will request a license key after logging in. 113 | 114 | 115 | ## Mojo Rest Server Deployment Steps 116 | **NOTE:** These deployment steps assume that you already have a mojo artifact generated by Driverless AI and that you have a valid license for Driverless AI. Additionally, these deployment steps assume that you have already set up a Ksonnet application as defined above, and downloaded the necessary Ksonnet registries and packages. 117 | 118 | 1. Make sure to build the docker image using `h2o-kubeflow/h2o-kubeflow/h2oai/dockerfiles/Dockerfile.mojo` and then push it to an accessible repository. `docker push :` 119 | 120 | 2. **(OPTIONAL)** Create a Kubernetes ConfigMap to configure your Driverless AI deployment. The expectation is that there is at least 1 file in the ConfigMap: `license.sig` containing the license key for Driverless AI. 121 | 122 | ``` 123 | kubectl create configmap mojo-configs --from-file="/path/to/mojo/config/files/" 124 | ``` 125 | 126 | 3. 
Deploy the mojo rest server to your Kubernetes cluster 127 | ``` 128 | ks prototype use io.ksonnet.pkg.h2oai-mojo-rest-server \ 129 | --name \ 130 | --namespace kubeflow \ 131 | --memory 4 \ 132 | --cpu 1 \ 133 | --configMapName \ 134 | --pvcSize \ 135 | --pvcName \ 136 | --licenseLocation \ 137 | --mojoLocation \ 138 | --model_server_image : 139 | ``` 140 | 141 | 4. You will be able to see the exposed IP address and port for the cluster using `kubectl get svc`, and you can connect to the mojo rest server using the `external ip address + port 5555` 142 | 143 | 5. If you did not provide a ConfigMap, the mojo-rest-server pod will wait until one is copied in/mounted. If you did not provide a pvcName, the mojo-rest-server will not be able to do anything until a mojo artifact is added into the specified mojoLocation path in the pod. 144 | 145 | 6. You can access the REST calls using the following endpoints: 146 | * `http://:5555/modelfeatures?name=` 147 | * `http://:5555/scorerow?name=&row=` 148 | * **POST REQUEST:** with `{file: /path/to/file.csv, name: name_of_mojo_to_score_with, header: bool_whether_file_has_header}` 149 | -------------------------------------------------------------------------------- /h2o-kubeflow/h2oai/h2oai-driverlessai.libsonnet: -------------------------------------------------------------------------------- 1 | local k = import 'k.libsonnet'; 2 | local deployment = k.extensions.v1beta1.deployment; 3 | local container = deployment.mixin.spec.template.spec.containersType; 4 | local storageClass = k.storage.v1beta1.storageClass; 5 | local service = k.core.v1.service; 6 | local networkPolicy = k.extensions.v1beta1.networkPolicy; 7 | local networkSpec = networkPolicy.mixin.spec; 8 | 9 | { 10 | parts:: { 11 | deployment:: { 12 | local defaults = { 13 | imagePullPolicy:: "IfNotPresent", 14 | }, 15 | 16 | modelService(name, namespace, labels={ app: name }): { 17 | apiVersion: "v1", 18 | kind: "Service", 19 | metadata: { 20 | labels: labels, 21 | name: name, 
22 | namespace: namespace, 23 | }, 24 | spec: { 25 | ports: [ 26 | { 27 | port: 12345, 28 | protocol: "TCP", 29 | targetPort: 12345, 30 | }, 31 | ], 32 | selector: labels, 33 | type: "LoadBalancer", 34 | sessionAffinity: "ClientIP" 35 | }, 36 | }, 37 | 38 | modelPersistentVolumeClaim(name, namespace, pvcSize, labels={ app: name }): { 39 | kind: "PersistentVolumeClaim", 40 | apiVersion: "v1", 41 | metadata: { 42 | labels: labels, 43 | name: name, 44 | namespace: namespace, 45 | }, 46 | spec: { 47 | accessModes: [ 48 | "ReadWriteOnce", 49 | ], 50 | volumeMode: "Filesystem", 51 | resources: { 52 | requests: { 53 | storage: pvcSize + "Gi", 54 | }, 55 | }, 56 | }, 57 | }, 58 | 59 | modelServer(name, namespace, configMapName, memory, cpu, gpu, modelServerImage, labels={ app: name } ): 60 | local volume = { 61 | name: "local-data", 62 | namespace: namespace, 63 | emptyDir: {}, 64 | }; 65 | base(name, namespace, configMapName, memory, cpu, gpu, modelServerImage, labels), 66 | 67 | local base(name, namespace, configMapName, memory, cpu, gpu, modelServerImage, labels) = 68 | { 69 | apiVersion: "extensions/v1beta1", 70 | kind: "Deployment", 71 | metadata: { 72 | name: name, 73 | namespace: namespace, 74 | labels: labels, 75 | }, 76 | spec: { 77 | strategy: { 78 | rollingUpdate: { 79 | maxSurge: 1, 80 | maxUnavailable: 1 81 | }, 82 | type: "RollingUpdate" 83 | }, 84 | template: { 85 | metadata: { 86 | labels: labels, 87 | }, 88 | spec: { 89 | containers: [ 90 | { 91 | name: name, 92 | image: modelServerImage, 93 | imagePullPolicy: defaults.imagePullPolicy, 94 | securityContext: { 95 | privileged: true, 96 | }, 97 | ports: [ 98 | { 99 | containerPort: 12345, 100 | protocol: "TCP", 101 | }, 102 | ], 103 | env: [ 104 | { 105 | name: "DAI_START_COMMAND", 106 | value: "if nvidia-smi | grep -o failed || true; then ./run.sh; else nvidia-smi -pm 1 && ./run.sh; fi", 107 | }, 108 | ] + if configMapName != "null" then [ 109 | { 110 | name: "DRIVERLESS_AI_CONFIG_FILE", 111 | value: 
"/config/config.toml" 112 | }, 113 | { 114 | name: "DRIVERLESS_AI_LICENSE_FILE", 115 | value: "/config/license.sig" 116 | } 117 | ] else [], 118 | command: [ 119 | "/bin/bash", 120 | ], 121 | args: [ 122 | "-c", 123 | "$(DAI_START_COMMAND)", 124 | ], 125 | resources: { 126 | requests: { 127 | memory: memory + "Gi", 128 | cpu: cpu, 129 | "nvidia.com/gpu": gpu, 130 | }, 131 | limits: { 132 | memory: memory + "Gi", 133 | cpu: cpu, 134 | "nvidia.com/gpu": gpu, 135 | }, 136 | }, 137 | volumeMounts: [ 138 | { 139 | mountPath: "/tmp", 140 | name: name + "-pvc", 141 | }, 142 | { 143 | mountPath: "/log", 144 | name: name + "-pvc", 145 | } 146 | ] + if configMapName != "null" then [ 147 | { 148 | mountPath: "/config", 149 | name: "dai-configmap-" + configMapName 150 | } 151 | ] else [], 152 | }, 153 | ], 154 | volumes: [ 155 | { 156 | name: name + "-pvc", 157 | persistentVolumeClaim: { 158 | claimName: name, 159 | }, 160 | }, 161 | ] + if configMapName != "null" then [ 162 | { 163 | name: "dai-configmap-" + configMapName, 164 | configMap: { 165 | name: configMapName, 166 | }, 167 | } 168 | ] else [], 169 | dnsPolicy: "ClusterFirst", 170 | restartPolicy: "Always", 171 | schedulerName: "default-scheduler", 172 | securityContext: {}, 173 | }, 174 | }, 175 | }, 176 | }, 177 | }, 178 | }, 179 | } 180 | -------------------------------------------------------------------------------- /h2o-kubeflow/h2oai/h2oai-h2o3.libsonnet: -------------------------------------------------------------------------------- 1 | local k = import 'k.libsonnet'; 2 | local deployment = k.extensions.v1beta1.deployment; 3 | local container = deployment.mixin.spec.template.spec.containersType; 4 | local storageClass = k.storage.v1beta1.storageClass; 5 | local service = k.core.v1.service; 6 | local networkPolicy = k.extensions.v1beta1.networkPolicy; 7 | local networkSpec = networkPolicy.mixin.spec; 8 | 9 | { 10 | parts:: { 11 | deployment:: { 12 | local defaults = { 13 | imagePullPolicy:: "IfNotPresent", 14 | 
}, 15 | 16 | modelService(name, namespace, labels={ app: name }): { 17 | apiVersion: "v1", 18 | kind: "Service", 19 | metadata: { 20 | labels: labels, 21 | name: name, 22 | namespace: namespace, 23 | }, 24 | spec: { 25 | ports: [ 26 | { 27 | port: 54321, 28 | protocol: "TCP", 29 | targetPort: 54321, 30 | }, 31 | ], 32 | selector: labels, 33 | type: "LoadBalancer", 34 | sessionAffinity: "ClientIP", 35 | }, 36 | }, 37 | 38 | modelServer(name, namespace, memory, cpu, replicas, modelServerImage, labels={ app: name },): 39 | local volume = { 40 | name: "local-data", 41 | namespace: namespace, 42 | emptyDir: {}, 43 | }; 44 | base(name, namespace, memory, cpu, replicas, modelServerImage, labels), 45 | 46 | local base(name, namespace, memory, cpu, replicas, modelServerImage, labels) = 47 | { 48 | apiVersion: "extensions/v1beta1", 49 | kind: "Deployment", 50 | metadata: { 51 | name: name, 52 | namespace: namespace, 53 | labels: labels, 54 | }, 55 | spec: { 56 | strategy: { 57 | rollingUpdate: { 58 | maxSurge: 1, 59 | maxUnavailable: 1 60 | }, 61 | type: "RollingUpdate" 62 | }, 63 | replicas: replicas, 64 | template: { 65 | metadata: { 66 | labels: labels, 67 | }, 68 | spec: { 69 | containers: [ 70 | { 71 | name: name, 72 | image: modelServerImage, 73 | imagePullPolicy: defaults.imagePullPolicy, 74 | env: [ 75 | { 76 | name: "MEMORY", 77 | value: memory, 78 | }, 79 | { 80 | name: "DEP_NAME", 81 | value: name 82 | } 83 | ], 84 | ports: [ 85 | { 86 | containerPort: 54321, 87 | protocol: "TCP" 88 | }, 89 | ], 90 | workingDir: "/opt", 91 | command: [ 92 | "/bin/bash", 93 | ], 94 | args: [ 95 | "-c", 96 | "/opt/docker-startup.sh && java -Xmx$(MEMORY)g -jar h2o.jar -flatfile flatfile.txt -name h2oCluster", 97 | ], 98 | resources: { 99 | requests: { 100 | memory: memory + "Gi", 101 | cpu: cpu, 102 | }, 103 | limits: { 104 | memory: memory + "Gi", 105 | cpu: cpu, 106 | }, 107 | }, 108 | stdin: true, 109 | tty: true, 110 | }, 111 | ], 112 | dnsPolicy: "ClusterFirst", 113 | 
restartPolicy: "Always", 114 | schedulerName: "default-scheduler", 115 | securityContext: {}, 116 | }, 117 | }, 118 | }, 119 | }, 120 | }, 121 | }, 122 | } 123 | -------------------------------------------------------------------------------- /h2o-kubeflow/h2oai/h2oai-mojo-rest-server.libsonnet: -------------------------------------------------------------------------------- 1 | local k = import 'k.libsonnet'; 2 | local deployment = k.extensions.v1beta1.deployment; 3 | local container = deployment.mixin.spec.template.spec.containersType; 4 | local storageClass = k.storage.v1beta1.storageClass; 5 | local service = k.core.v1.service; 6 | local networkPolicy = k.extensions.v1beta1.networkPolicy; 7 | local networkSpec = networkPolicy.mixin.spec; 8 | 9 | { 10 | parts:: { 11 | deployment:: { 12 | local defaults = { 13 | imagePullPolicy:: "IfNotPresent", 14 | }, 15 | 16 | modelService(name, namespace, labels={ app: name }): { 17 | apiVersion: "v1", 18 | kind: "Service", 19 | metadata: { 20 | labels: labels, 21 | name: name, 22 | namespace: namespace, 23 | }, 24 | spec: { 25 | ports: [ 26 | { 27 | port: 5555, 28 | protocol: "TCP", 29 | targetPort: 5555, 30 | }, 31 | ], 32 | selector: labels, 33 | type: "LoadBalancer", 34 | sessionAffinity: "ClientIP" 35 | }, 36 | }, 37 | 38 | modelPersistentVolumeClaim(name, namespace, pvcSize, labels={ app: name }): { 39 | kind: "PersistentVolumeClaim", 40 | apiVersion: "v1", 41 | metadata: { 42 | labels: labels, 43 | name: name, 44 | namespace: namespace, 45 | }, 46 | spec: { 47 | accessModes: [ 48 | "ReadWriteOnce", 49 | ], 50 | volumeMode: "Filesystem", 51 | resources: { 52 | requests: { 53 | storage: pvcSize + "Gi", 54 | }, 55 | }, 56 | }, 57 | }, 58 | 59 | modelServer(name, namespace, memory, cpu, pvcName, configMapName, modelServerImage, licenseLocation, mojoLocation, labels={ app: name },): 60 | local volume = { 61 | name: "local-data", 62 | namespace: namespace, 63 | emptyDir: {}, 64 | }; 65 | base(name, namespace, memory, cpu, 
pvcName, configMapName, modelServerImage, licenseLocation, mojoLocation, labels), 66 | 67 | local base(name, namespace, memory, cpu, pvcName, configMapName, modelServerImage, licenseLocation, mojoLocation, labels) = 68 | { 69 | apiVersion: "extensions/v1beta1", 70 | kind: "Deployment", 71 | metadata: { 72 | name: name, 73 | namespace: namespace, 74 | labels: labels, 75 | }, 76 | spec: { 77 | strategy: { 78 | rollingUpdate: { 79 | maxSurge: 1, 80 | maxUnavailable: 1 81 | }, 82 | type: "RollingUpdate" 83 | }, 84 | template: { 85 | metadata: { 86 | labels: labels, 87 | }, 88 | spec: { 89 | containers: [ 90 | { 91 | name: name, 92 | image: modelServerImage, 93 | imagePullPolicy: defaults.imagePullPolicy, 94 | env: [ 95 | { 96 | name: "MEMORY", 97 | value: memory, 98 | }, 99 | { 100 | name: "DEP_NAME", 101 | value: name 102 | } 103 | ] + if configMapName != "null" then [ 104 | { 105 | name: "DRIVERLESS_AI_LICENSE_FILE", 106 | value: "/config/license.sig" 107 | } 108 | ] else [], 109 | ports: [ 110 | { 111 | containerPort: 5555, 112 | protocol: "TCP" 113 | }, 114 | ], 115 | command: [ 116 | "/bin/bash", 117 | ], 118 | args: [ 119 | "-c", 120 | "./mojo-startup.sh " + licenseLocation + " " + mojoLocation + " " + memory, 121 | ], 122 | workingDir: "/", 123 | resources: { 124 | requests: { 125 | memory: memory + "Gi", 126 | cpu: cpu, 127 | }, 128 | limits: { 129 | memory: memory + "Gi", 130 | cpu: cpu, 131 | }, 132 | }, 133 | volumeMounts: [ 134 | { 135 | mountPath: "/tmp", 136 | name: pvcName + "-pvc", 137 | } 138 | ] + if configMapName != "null" then [ 139 | { 140 | mountPath: "/config", 141 | name: "mojo-configmap-" + configMapName 142 | } 143 | ] else [], 144 | stdin: true, 145 | tty: true, 146 | }, 147 | ], 148 | volumes: [ 149 | { 150 | name: pvcName + "-pvc", 151 | persistentVolumeClaim: { 152 | claimName: pvcName, 153 | }, 154 | }, 155 | ] + if configMapName != "null" then [ 156 | { 157 | name: "mojo-configmap-" + configMapName, 158 | configMap: { 159 | name: 
configMapName, 160 | }, 161 | } 162 | ] else [], 163 | dnsPolicy: "ClusterFirst", 164 | restartPolicy: "Always", 165 | schedulerName: "default-scheduler", 166 | securityContext: {}, 167 | }, 168 | }, 169 | }, 170 | }, 171 | }, 172 | }, 173 | } 174 | -------------------------------------------------------------------------------- /h2o-kubeflow/h2oai/parts.yaml: -------------------------------------------------------------------------------- 1 | { 2 | "name": "h2oai", 3 | "apiVersion": "0.0.1", 4 | "kind": "ksonnet.io/parts", 5 | "description": "H2O.ai Core Deployments", 6 | "author": "h2oai team", 7 | "contributors": [ 8 | { 9 | "name": "Nicholas Png", 10 | "email": "nicholas@h2o.ai" 11 | } 12 | ], 13 | "repository": { 14 | "type": "git", 15 | "url": "https://github.com/kubeflow/kubeflow" 16 | }, 17 | "bugs": { 18 | "url": "https://github.com/h2oai/h2o-kubeflow/issues" 19 | }, 20 | "keywords": [ 21 | "kubernetes", 22 | "machine learning", 23 | "ai" 24 | ], 25 | "quickStartH2o3": { 26 | "prototype": "io.ksonnet.pkg.h2oai-h2o3", 27 | "componentName": "h2o3-static", 28 | "flags": { 29 | "name": "h2o3-static", 30 | "namespace": "default", 31 | "memory": 4, 32 | "cpu": 1, 33 | "replicas": 2, 34 | "model_server_image": "/h2o3:latest" 35 | }, 36 | "comment": "H2O-3 Cluster Deployment. NOTE: Use Dockerfile.h2o3 to build model_server_image." 37 | }, 38 | "quickStartDriverlessAi": { 39 | "prototype": "io.ksonnet.pkg.h2oai-driverlessai", 40 | "componentName": "h2o-dai", 41 | "flags": { 42 | "name": "h2o-dai", 43 | "namespace": "default", 44 | "memory": 16, 45 | "cpu": 4, 46 | "gpu": 0, 47 | "model_server_image": "opsh2oai/h2oai-runtime:latest" 48 | }, 49 | "comment": "Deployment of H2O.ai Driverless AI. 
NOTE: Obtain the docker image from docs.h2o.ai; a license can be obtained by contacting sales@h2o.ai" 50 | }, 51 | "quickStartMojoRestServer": { 52 | "prototype": "io.ksonnet.pkg.h2oai-mojo-rest-server", 53 | "componentName": "h2o-mojo-rest-server", 54 | "flags": { 55 | "name": "h2oai-mojo-rest-server", 56 | "namespace": "default", 57 | "memory": 8, 58 | "cpu": 1, 59 | "configMapName": "dai-config", 60 | "pvcSize": 5, 61 | "pvcName": "null", 62 | "mojoLocation": "/tmp/mojo-models", 63 | "licenseLocation": "/config/license.sig", 64 | "model_server_image": "/mojo-rest-server:latest" 65 | }, 66 | "comment": "Deployment of a simple REST server for serving mojo artifacts generated by Driverless AI" 67 | } 68 | } 69 | -------------------------------------------------------------------------------- /h2o-kubeflow/h2oai/prototypes/h2oai-driverlessai.jsonnet: -------------------------------------------------------------------------------- 1 | // @apiVersion 0.1 2 | // @name io.ksonnet.pkg.h2oai-driverlessai 3 | // @description Driverless AI on Kubeflow 4 | // @shortDescription Driverless AI 5 | // @param name string Name to give each of the components 6 | // @param model_server_image string opsh2oai/h2oai-runtime 7 | // @optionalParam namespace string default namespace 8 | // @optionalParam memory string 1 memory allocated for deployment 9 | // @optionalParam cpu string 1 number of cpu allocated for deployment 10 | // @optionalParam gpu number 0 number of gpu allocated for deployment 11 | // @optionalParam pvcSize number 50 size of persistent volume claim for deployment 12 | // @optionalParam configMapName string null name of optional configmap containing any user config files you wish to include. 
Expects at least config.toml and license.sig 13 | 14 | local k = import 'k.libsonnet'; 15 | local driverlessai = import 'h2o-kubeflow/h2oai/h2oai-driverlessai.libsonnet'; 16 | 17 | local name = import 'param://name'; 18 | local namespace = import 'param://namespace'; 19 | local configmapname = import 'param://configMapName'; 20 | local memory = import 'param://memory'; 21 | local cpu = import 'param://cpu'; 22 | local gpu = import 'param://gpu'; 23 | local pvcSize = import 'param://pvcSize'; 24 | local modelServerImage = import 'param://model_server_image'; 25 | 26 | std.prune(k.core.v1.list.new([ 27 | driverlessai.parts.deployment.modelServer(name, namespace, configmapname, memory, cpu, gpu, modelServerImage), 28 | driverlessai.parts.deployment.modelService(name, namespace), 29 | driverlessai.parts.deployment.modelPersistentVolumeClaim(name, namespace, pvcSize), 30 | ])) 31 | -------------------------------------------------------------------------------- /h2o-kubeflow/h2oai/prototypes/h2oai-h2o3.jsonnet: -------------------------------------------------------------------------------- 1 | // @apiVersion 0.1 2 | // @name io.ksonnet.pkg.h2oai-h2o3 3 | // @description H2O3 on Kubeflow 4 | // @shortDescription H2O3 Static Cluster 5 | // @param name string Name to give each of the components 6 | // @param model_server_image string gcr.io/h2o-gce/h2o3 7 | // @optionalParam namespace string default namespace 8 | // @optionalParam memory string 1 starting memory per pod 9 | // @optionalParam cpu string 1 starting number of cpu per pod 10 | // @optionalParam replicas number 1 starting number of pods 11 | 12 | local k = import 'k.libsonnet'; 13 | local h2o3static = import 'h2o-kubeflow/h2oai/h2oai-h2o3.libsonnet'; 14 | 15 | local name = import 'param://name'; 16 | local namespace = import 'param://namespace'; 17 | local memory = import 'param://memory'; 18 | local cpu = import 'param://cpu'; 19 | local replicas = import 'param://replicas'; 20 | local modelServerImage = 
import 'param://model_server_image'; 21 | 22 | 23 | std.prune(k.core.v1.list.new([ 24 | h2o3static.parts.deployment.modelServer(name, namespace, memory, cpu, replicas, modelServerImage), 25 | h2o3static.parts.deployment.modelService(name, namespace), 26 | ])) 27 | -------------------------------------------------------------------------------- /h2o-kubeflow/h2oai/prototypes/h2oai-mojo-rest-server.jsonnet: -------------------------------------------------------------------------------- 1 | // @apiVersion 0.1 2 | // @name io.ksonnet.pkg.h2oai-mojo-rest-server 3 | // @description Small sample rest server for consuming mojo artifacts generated by Driverless AI 4 | // @shortDescription Driverless AI Mojo Rest Server 5 | // @param name string Name to give each of the components 6 | // @param model_server_image string /: 7 | // @optionalParam namespace string default namespace 8 | // @optionalParam memory string 8 starting memory per pod 9 | // @optionalParam cpu string 1 starting number of cpu per pod 10 | // @optionalParam configMapName string null name of configuration map containing Driverless AI license file to be consumed by the mojo 11 | // @optionalParam pvcSize string 5 size in GB to allocate to persistent volume attached to this deployment, can be fairly small. Just enough to hold mojo artifacts. 
12 | // @optionalParam pvcName string null Name of a pre-established Persistent Volume; if it exists, a claim to this persistent volume will be created 13 | // @optionalParam licenseLocation string /config/license.sig location of the Driverless AI license 14 | // @optionalParam mojoLocation string /tmp/mojo-models/ directory where the mojo.zip file resides 15 | 16 | local k = import 'k.libsonnet'; 17 | local mojoserving = import 'h2o-kubeflow/h2oai/h2oai-mojo-rest-server.libsonnet'; 18 | 19 | local name = import 'param://name'; 20 | local namespace = import 'param://namespace'; 21 | local memory = import 'param://memory'; 22 | local cpu = import 'param://cpu'; 23 | local pvcSize = import 'param://pvcSize'; 24 | local pvcName = import 'param://pvcName'; 25 | local configMapName = import 'param://configMapName'; 26 | local modelServerImage = import 'param://model_server_image'; 27 | local licenseLocation = import 'param://licenseLocation'; 28 | local mojoLocation = import 'param://mojoLocation'; 29 | 30 | if pvcName != "null" then std.prune(k.core.v1.list.new([ 31 | mojoserving.parts.deployment.modelServer(name, namespace, memory, cpu, pvcName, configMapName, modelServerImage, licenseLocation, mojoLocation), 32 | mojoserving.parts.deployment.modelService(name, namespace), 33 | ])) else std.prune(k.core.v1.list.new([ 34 | mojoserving.parts.deployment.modelServer(name, namespace, memory, cpu, name, configMapName, modelServerImage, licenseLocation, mojoLocation), 35 | mojoserving.parts.deployment.modelService(name, namespace), 36 | mojoserving.parts.deployment.modelPersistentVolumeClaim(name, namespace, pvcSize), 37 | ])) 38 | -------------------------------------------------------------------------------- /h2o-kubeflow/registry.yaml: -------------------------------------------------------------------------------- 1 | apiVersion: '0.1' 2 | kind: ksonnet.io/registry 3 | libraries: 4 | h2o3-scaling: 5 | version: master 6 | path: h2o3-scaling 7 | mojo-serving: 8 | version: master 
9 | path: mojo-serving 10 | h2oai: 11 | version: master 12 | path: h2oai 13 | -------------------------------------------------------------------------------- /scripts/README.md: -------------------------------------------------------------------------------- 1 | #### Burst to Cloud 2 | 3 | If you are interested in additional orchestration, follow these steps to set up a Kubernetes cluster. This walkthrough sets up a Kubernetes cluster that can scale as demand for additional resources increases. 4 | 5 | Note: This is a prototype and will continue to be changed/modified as time progresses. 6 | 7 | 1. Start a machine with Ubuntu 16.04. This can be on-premises or in the cloud 8 | 2. Copy all the scripts from the [scripts](https://github.com/h2oai/h2o-kubeflow/tree/master/scripts) folder in this repo to the machine 9 | 3. Move `deployment-status.service` and `deployment-status.timer` to `/etc/systemd/system/` and enable the services. 10 | ``` 11 | sudo mv deployment-status.service /etc/systemd/system/ 12 | sudo mv deployment-status.timer /etc/systemd/system/ 13 | sudo systemctl enable deployment-status.service deployment-status.timer 14 | sudo systemctl start deployment-status.service deployment-status.timer 15 | ``` 16 | 4. Move `deployment-status.sh`, `k8s_master_setup.sh`, and `k8s_slave_setup.sh` to a new directory `/opt/kubeflow/` 17 | ``` 18 | sudo mkdir /opt/kubeflow 19 | sudo mv deployment-status.sh k8s_master_setup.sh k8s_slave_setup.sh /opt/kubeflow/ 20 | ``` 21 | 5. Run `sudo /opt/kubeflow/k8s_master_setup.sh`. This script will modify `k8s_slave_setup.sh` with the necessary commands to connect any other __Ubuntu 16.04__ machines to the Kubernetes cluster 22 | 6. Run the new `k8s_slave_setup.sh` on any other machines you want to connect to the cluster 23 | 7. 
`k8s_master_setup.sh` will also create a new file called `config.txt` in `/opt/kubeflow/`. Modify its final line, `KSONNET_APP=`, to the path of the Ksonnet application created by `ks init`, relative to your home directory: for `/home/ubuntu/my_ksonnet_app`, use `KSONNET_APP=my_ksonnet_app` 24 | 8. Use `kubectl get nodes` to ensure that all nodes are properly attached to the cluster 25 | 9. Follow the steps above to deploy H2O on Kubeflow + Kubernetes 26 | -------------------------------------------------------------------------------- /scripts/deployment-status.service: -------------------------------------------------------------------------------- 1 | [Unit] 2 | Description=Check status of kubernetes cluster deployments. If pods pending due to insufficient resources and env variable is set, add new node. 3 | 4 | [Service] 5 | User=ubuntu 6 | ExecStart=/opt/kubeflow/deployment-status.sh 7 | -------------------------------------------------------------------------------- /scripts/deployment-status.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Check for pending deployments/pods 4 | # If any are pending because of insufficient resources, kill the deployment 5 | if $(kubectl get pods -o wide | grep -q "Pending") 6 | then 7 | if $(kubectl get pods --output="jsonpath={.items[*].status.conditions[*].message}" | grep -q "Insufficient") 8 | then 9 | echo "there are unscheduled pods due to lack of resources" 10 | (IFS=$'\n' 11 | for x in $(kubectl get deployments) 12 | do 13 | header="NAME" 14 | deployname=$(echo $x | awk '{ print $1 }') 15 | desired=$(echo $x | awk '{ print $2 }') 16 | available=$(echo $x | awk '{ print $5 }') 17 | if [ $deployname == $header ] 18 | then 19 | echo "HEADER NOTHING TO DO" 20 | else 21 | if [ $desired == $available ] 22 | then 23 | echo $deployname 24 | echo "Healthy Cluster" 25 | else 26 | echo $deployname 27 | echo "Unhealthy Cluster, Killing Deployment" 28 | namespace=$(kubectl get deployment $deployname 
--output="jsonpath={ .metadata.namespace}") 29 | kubectl delete deployment $deployname -n $namespace 30 | echo "Requesting New Node" 31 | sudo sed -i "s/^REQUEST_NEW_NODE=.*/REQUEST_NEW_NODE=TRUE/g" /opt/kubeflow/config.txt 32 | touch /home/ubuntu/tmpvars.txt 33 | sudo cat > /home/ubuntu/tmpvars.txt << EOF 34 | Deployment=$deployname 35 | Namespace=$namespace 36 | EOF 37 | fi 38 | fi 39 | done 40 | ) 41 | else 42 | echo "something else is wrong" 43 | fi 44 | else 45 | echo "no pending pods" 46 | fi 47 | 48 | # check if the cluster is allowed to expand. If it is, start a new instance 49 | burst=$(sudo cat /opt/kubeflow/config.txt | grep "ALLOW_BURST_TO_CLOUD" | sed "s/^ALLOW_BURST_TO_CLOUD=//g") 50 | cloudinstance=$(sudo cat /opt/kubeflow/config.txt | grep "CLOUD_INSTANCES" | sed "s/^CLOUD_INSTANCES=//g") 51 | ksapp=$(sudo cat /opt/kubeflow/config.txt | grep "KSONNET_APP" | sed "s/^KSONNET_APP=//g") 52 | requestnewnode=$(sudo cat /opt/kubeflow/config.txt | grep "REQUEST_NEW_NODE" | sed "s/^REQUEST_NEW_NODE=//g") 53 | deployname=$(sudo cat /home/ubuntu/tmpvars.txt | grep "Deployment" | sed "s/^Deployment=//g") 54 | namespace=$(sudo cat /home/ubuntu/tmpvars.txt | grep "Namespace" | sed "s/^Namespace=//g") 55 | 56 | if [ $burst == "TRUE" ] && [ $requestnewnode == "TRUE" ] 57 | then 58 | echo "CREATING NEW NODE: ALLOW_BURST_TO_CLOUD=TRUE" 59 | gcloud compute instances create kubeflow-burst-to-cloud-$cloudinstance \ 60 | --machine-type n1-standard-8 \ 61 | --boot-disk-size 128GB \ 62 | --network default \ 63 | --zone us-west1-b \ 64 | --metadata-from-file startup-script=/opt/kubeflow/k8s_slave_setup.sh \ 65 | --scopes cloud-platform \ 66 | --image-family "https://www.googleapis.com/compute/v1/projects/ubuntu-os-cloud/global/images/family/ubuntu-1604-lts" 67 | 68 | # While loop to check if instance is up and ready to be modified 69 | isup="FALSE" 70 | while [ $isup == "FALSE" ] 71 | do 72 | currentstatus=$(gcloud compute instances describe 
kubeflow-burst-to-cloud-$cloudinstance --zone us-west1-b | grep "status" | sed "s/^status: //g") 73 | if [ $currentstatus == "RUNNING" ] 74 | then 75 | isup="TRUE" 76 | newcloudinstances=$(($cloudinstance+1)) 77 | sudo sed -i "s/^CLOUD_INSTANCES=.*/CLOUD_INSTANCES=${newcloudinstances}/g" /opt/kubeflow/config.txt 78 | else 79 | echo "WAITING FOR INSTANCE TO START...." 80 | fi 81 | sleep 5 82 | done 83 | 84 | # While loop to check if node has been properly attached to cluster and ready to be used 85 | nodeready="FALSE" 86 | while [ $nodeready == "FALSE" ] 87 | do 88 | nodestatus=$(kubectl get nodes | grep "kubeflow-burst-to-cloud-$cloudinstance" | awk '{ print $2 }') 89 | if [ $nodestatus == "Ready" ] 90 | then 91 | cd /home/ubuntu/$ksapp 92 | ks apply $namespace -c $deployname 93 | rm /home/ubuntu/tmpvars.txt 94 | nodeready="TRUE" 95 | else 96 | echo "WAITING FOR NODE TO BE READY" 97 | sleep 5 98 | fi 99 | done 100 | sudo sed -i "s/^REQUEST_NEW_NODE=.*/REQUEST_NEW_NODE=FALSE/g" /opt/kubeflow/config.txt 101 | else 102 | echo "DO NOTHING: BURST TO CLOUD NOT ENABLED or NO NEW NODE REQUESTED" 103 | fi 104 | -------------------------------------------------------------------------------- /scripts/deployment-status.timer: -------------------------------------------------------------------------------- 1 | [Unit] 2 | Description=Timer to schedule deployment-status.service to run 3 | 4 | [Timer] 5 | OnCalendar=minutely 6 | -------------------------------------------------------------------------------- /scripts/k8s_master_setup.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # install kubectl, kubeadm, kubernetes-cni 4 | curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add - 5 | 6 | cat <<EOF > kubernetes.list 7 | deb http://apt.kubernetes.io/ kubernetes-xenial main 8 | EOF 9 | mv kubernetes.list /etc/apt/sources.list.d/kubernetes.list 10 | 11 | apt-get update 12 | apt-get install -y 
build-essential curl file git 13 | apt-get install -y docker.io 14 | apt-get install -y kubelet kubeadm kubectl kubernetes-cni 15 | 16 | # install tools for kubeflow (ksonnet and jsonnet) 17 | wget https://github.com/ksonnet/ksonnet/releases/download/v0.9.2/ks_0.9.2_linux_amd64.tar.gz 18 | tar -xzvf ks_0.9.2_linux_amd64.tar.gz 19 | cp ks_0.9.2_linux_amd64/ks /usr/bin/ks 20 | chmod +x /usr/bin/ks 21 | 22 | # NOTE: contained code may not be needed... 23 | # -------------------------------------------- 24 | # sh -c "$(curl -fsSL https://raw.githubusercontent.com/Linuxbrew/install/master/install.sh)" 25 | # test -d ~/.linuxbrew && PATH="$HOME/.linuxbrew/bin:$HOME/.linuxbrew/sbin:$PATH" 26 | # test -d /home/linuxbrew/.linuxbrew && PATH="/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin:$PATH" 27 | # test -r ~/.bash_profile && echo "export PATH='$(brew --prefix)/bin:$(brew --prefix)/sbin'":'"$PATH"' >>~/.bash_profile 28 | # echo "export PATH='$(brew --prefix)/bin:$(brew --prefix)/sbin'":'"$PATH"' >>~/.profile 29 | # 30 | # source ~/.profile 31 | # brew install jsonnet 32 | # -------------------------------------------- 33 | 34 | # setup kubernetes cluster, spawn kubernetes master node 35 | # needed to prevent issues later 36 | swapoff -av 37 | # spawn master node for k8s and dump logs to kube-init.txt for use later 38 | kubeadm init > /opt/kubeflow/kube-init.txt 39 | cat /opt/kubeflow/kube-init.txt | grep "kubeadm join" | awk '{$1=$1};1' >> /opt/kubeflow/k8s_slave_setup.sh 40 | 41 | mkdir -p /home/ubuntu/.kube 42 | sudo cp -i /etc/kubernetes/admin.conf /home/ubuntu/.kube/config 43 | sudo chmod 644 /home/ubuntu/.kube/config 44 | 45 | # setup pod network using weave-net 46 | sysctl net.bridge.bridge-nf-call-iptables=1 47 | export kubever=$(sudo kubectl version | base64 | tr -d '\n') 48 | kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$kubever&env.IPALLOC_RANGE=10.96.0.0/16" 49 | 50 | # give kubectl read permission to pods 51 | kubectl create 
clusterrolebinding read-binding --clusterrole=view --user=system:serviceaccount:default:default 52 | 53 | touch /opt/kubeflow/config.txt 54 | echo "ALLOW_BURST_TO_CLOUD="TRUE"" >> /opt/kubeflow/config.txt 55 | echo "CLOUD_INSTANCES=0" >> /opt/kubeflow/config.txt 56 | echo "REQUEST_NEW_NODE="FALSE"" >> /opt/kubeflow/config.txt 57 | echo "KSONNET_APP=" >> /opt/kubeflow/config.txt 58 | -------------------------------------------------------------------------------- /scripts/k8s_slave_setup.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # install kubectl, kubeadm, kubernetes-cni 4 | curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add - 5 | 6 | cat <<EOF > kubernetes.list 7 | deb http://apt.kubernetes.io/ kubernetes-xenial main 8 | EOF 9 | mv kubernetes.list /etc/apt/sources.list.d/kubernetes.list 10 | 11 | apt-get update 12 | apt-get install -y build-essential curl file git 13 | apt-get install -y docker.io 14 | apt-get install -y kubelet kubeadm kubectl kubernetes-cni 15 | --------------------------------------------------------------------------------
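As a usage sketch for the mojo REST server endpoints documented in the README above, the snippet below assembles the two GET request URLs from a service address and a mojo name. The IP address, mojo name, and row values are placeholders for illustration only (not taken from this repo); it assumes the default port 5555 exposed by the mojo-rest-server Service.

```shell
#!/bin/bash
# Placeholder values -- substitute the EXTERNAL-IP reported by `kubectl get svc`
# and the name of a mojo artifact loaded on the server.
EXTERNAL_IP="203.0.113.10"     # hypothetical external IP
MOJO_NAME="creditcard.mojo"    # name matching the sample under dockerfiles/sample-mojos
ROW="20000,2,2,1,24"           # hypothetical comma-separated feature row

# URL listing the input features the mojo expects:
FEATURES_URL="http://${EXTERNAL_IP}:5555/modelfeatures?name=${MOJO_NAME}"
# URL scoring a single row:
SCORE_URL="http://${EXTERNAL_IP}:5555/scorerow?name=${MOJO_NAME}&row=${ROW}"

echo "${FEATURES_URL}"
echo "${SCORE_URL}"
# Against a live deployment you would fetch these, e.g.:
#   curl "${FEATURES_URL}"
#   curl "${SCORE_URL}"
```

Whether the example row values are valid inputs for the sample creditcard model is not verified here; query `/modelfeatures` first to see the columns the mojo expects.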