├── .gitignore ├── BLOG.md ├── LICENSE ├── MANIFEST.in ├── README.md ├── conda_recipe ├── build.sh └── meta.yaml ├── docker ├── build.sh ├── controller │ ├── Dockerfile.controller │ └── supervisord.conf ├── engine │ ├── Dockerfile.engine │ ├── start_engine.sh │ └── supervisord.conf └── requirements.txt ├── ipyparallel_mesos ├── __init__.py └── launcher.py ├── requirements.txt ├── setup.cfg └── setup.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.swp 2 | *.egg-info/* 3 | *.pyc 4 | .project 5 | .idea/* 6 | **/*.egg-info/* 7 | *.cache 8 | **/*.ipynb_checkpoints/* 9 | **/.env 10 | coverage.xml 11 | results.xml 12 | .coverage 13 | version 14 | dist/* 15 | build/* 16 | 17 | -------------------------------------------------------------------------------- /BLOG.md: -------------------------------------------------------------------------------- 1 | # Deploying an IPython ipyparallel cluster on mesos 2 | 3 | 4 | 5 | The Analytics Services team here at Activision is a heavy user of [mesos](http://mesos.apache.org/) and [marathon](https://github.com/mesosphere/marathon) to deploy and manage services on our clusters. We are also huge fans of python and the jupyter project. The [jupyter](http://jupyter.org/) project recently reorganized out of IPython, in a move referred to as "the split". One piece that was originally part of IPython (`IPython.parallel`) and was split off into a separate project is [ipyparallel](https://github.com/ipython/ipyparallel). This powerful component of the IPython ecosystem is generally overlooked. 6 | 7 | In this post we will cover an introduction to ipyparallel and introduce a new way to deploy an IPython cluster on mesos. 8 | 9 | 10 | ## Introduction to ipyparallel 11 | 12 | 13 | 14 | 15 | To help make this concrete, let's start with a quick taste of the ipyparallel API. 16 |
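The sketch below assumes a cluster is already up and registered under an IPython profile named `mesos` (the same profile the README quickstart creates); `Client`, direct views, and load-balanced views are all standard ipyparallel pieces.

```
import ipyparallel as ipp

# Connect to a running cluster; assumes the 'mesos' profile already has
# a controller and engines registered.
rc = ipp.Client(profile='mesos')
print(rc.ids)  # ids of the engines that have registered

# A DirectView runs code on every engine.
dview = rc[:]
print(dview.map_sync(lambda x: x * x, range(16)))

# A LoadBalancedView schedules each task onto whichever engine is free.
lview = rc.load_balanced_view()
print(lview.apply_sync(sum, range(1000)))
```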
-------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2015-2016, Activision Publishing, Inc. 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without modification, 5 | are permitted provided that the following conditions are met: 6 | 7 | 1. Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | 2. Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | 3. Neither the name of the copyright holder nor the names of its contributors 15 | may be used to endorse or promote products derived from this software without 16 | specific prior written permission. 17 | 18 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 19 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 20 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 21 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR 22 | ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 23 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 24 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 25 | ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 26 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 27 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 28 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include LICENSE 2 | include README.md 3 | include requirements.txt 4 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ipyparallel mesos/marathon launcher 2 | 3 | ipyparallel has built-in support for a number of backends. This is a backend for launching IPython clusters into mesos using docker and marathon. 4 | 5 | 6 | ## Quickstart 7 | 8 | Install ipyparallel_mesos from pip or conda 9 | 10 | pip 11 | ``` 12 | pip install ipyparallel_mesos 13 | ``` 14 | 15 | or from conda 16 | ``` 17 | # from the ActivisionGameScience conda channel 18 | conda install --channel ActivisionGameScience ipyparallel_mesos 19 | ``` 20 | 21 | Create a new ipython profile 22 | ``` 23 | ipython profile create --parallel --profile=mesos 24 | ``` 25 | 26 | Edit `~/.ipython/profile_mesos/ipcluster_config.py` and add 27 | ``` 28 | # Required 29 | # MUST SET 30 | c.MarathonLauncher.marathon_master_url = 'http://MARATHON_URL:8080' # url with port to a marathon master 31 | c.MarathonLauncher.marathon_app_group = '/test/ipythontest/jdennison/' # Marathon application group. This needs to be unique per cluster, so if you have multiple users deploying clusters make sure they each choose their own application group. 32 | 33 | # Reasonable defaults 34 | c.IPClusterStart.controller_launcher_class = 'ipyparallel_mesos.launcher.MarathonControllerLauncher' 35 | c.IPClusterEngines.engine_launcher_class = 'ipyparallel_mesos.launcher.MarathonEngineSetLauncher' 36 | 37 | c.MarathonLauncher.controller_docker_image = 'jdennison/ipyparallel-marathon-controller:dev' # Docker container image for the controller 38 | c.MarathonLauncher.engine_docker_image = 'jdennison/ipyparallel-marathon-engine:dev' # Docker image for the engine. This is where you should install custom dependencies 39 | 40 | # Optional 41 | c.MarathonLauncher.engine_memory = 1024 # Amount of memory (in megabytes) to limit the docker container to. NOTE: if your engine uses more than this, the docker container will be killed by the kernel without warning. 42 | c.MarathonLauncher.controller_memory = 512 # Amount of memory (in megabytes) to limit the docker container to. NOTE: if your controller uses more than this, the docker container will be killed by the kernel without warning. 43 | c.MarathonLauncher.controller_config_port = '1235' # The port the controller exposes for clients and engines to retrieve connection information. Note: if there are multiple users on the same cluster, this will need to be changed. 44 | ``` 45 | 46 | While this new profile will work with the Jupyter IPython Clusters tab, you should start with the command line to help debug.
47 | ``` 48 | ipcluster start --n=4 --profile=mesos 49 | ``` 50 | 51 | As long as this command starts, you should see the docker containers in your marathon ui under the `marathon_app_group` you set earlier. You are now ready to cook with fire. 52 | 53 | Open a new terminal session on the same machine where you just ran `ipcluster`. Start Jupyter or an IPython session. 54 | ``` 55 | import ipyparallel as ipp 56 | rc = ipp.Client(profile='mesos') 57 | 58 | import socket 59 | rc[:].apply_async(socket.gethostname).get_dict() # Should print the hosts of the IPython engines. 60 | ``` 61 | 62 | To shut down, just press Ctrl+c in the terminal where you ran `ipcluster`. 63 | 64 | 65 | ### Docker 66 | 67 | ipyparallel has three main components: client, controller and engine. Please refer to the [docs](https://ipyparallel.readthedocs.org/) for a deeper dive. This project provides two docker containers to run a controller and engines in a mesos cluster, as well as new launchers to deploy them from Jupyter's IPython Clusters tab and the ipcluster cli tool. The existing docker images are hosted publicly for the [controller](https://hub.docker.com/r/jdennison/ipyparallel-marathon-controller/) and the [engine](https://hub.docker.com/r/jdennison/ipyparallel-marathon-engine/). 68 | 69 | Extending the existing [ipyparallel-marathon-engine](https://hub.docker.com/r/jdennison/ipyparallel-marathon-engine/) image to install your custom dependencies is really useful, especially if your users have different needs. Supporting multiple versions of packages for multiple users can be a real struggle. If you use a custom engine image, make sure you update `c.MarathonLauncher.engine_docker_image`. 70 | 71 | ## Design 72 | 73 | Allowing clusters to be spun up quickly in mesos is a great way to utilize existing clusters already managed by mesos. We find our workloads highly elastic, so using existing resources to spin clusters up and down is very useful for us. However, if you are setting up a new cluster from scratch that will be dedicated to long-running IPython clusters, I would suggest using the existing SSH launcher or, for cloud-based workflows, something like StarCluster. Managing a mesos cluster for a single use case might be overkill. 74 | 75 | This package provides two launchers, `MarathonControllerLauncher` and `MarathonEngineSetLauncher`. Each launcher spins up a separate marathon application: a single controller and a set of N engines. The controller exposes a whole slew of ports that enable it to communicate with the engines, so this container attaches to the host's network (i.e. uses the `--net=host` docker option). The controller writes the connection files for engines and clients and then starts an http server to expose these on the `controller_config_port`. Given the `controller_config_port` and the mesos slave that is hosting the controller, you can retrieve the connection files needed to connect as a client or an engine. To get these connection files, each engine container needs to know the `controller_config_port`, the controller marathon application id and the location of the marathon api in order to locate the controller. If the controller were running in bridge mode, we could rely on service ports, but that is a future enhancement. 76 | 77 | These port allocations are *not* registered with mesos, which is a current limitation of the module; they could stomp on other frameworks running in your cluster. If you are deploying multiple clusters to a single mesos cluster, make sure you change the default `controller_config_port`, otherwise you can get port conflicts.
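The lookup each engine container performs at startup (see `docker/engine/start_engine.sh`) boils down to something like the following Python sketch, using the same `MARATHON_MASTER`, `CONTROLLER_MARATHON_ID` and `CONTROLLER_CONFIG_PORT` environment variables that `MarathonEngineSetLauncher` injects. The shell script does this with curl and jq; this is only an illustrative equivalent, not part of the package.

```
import os
import requests

# Values injected into the engine container by MarathonEngineSetLauncher.
marathon_master = os.environ['MARATHON_MASTER']        # e.g. http://MARATHON_URL:8080
controller_id = os.environ['CONTROLLER_MARATHON_ID']   # e.g. /test/ipythontest/jdennison/controller
config_port = os.environ.get('CONTROLLER_CONFIG_PORT', '1235')

# Ask marathon which mesos slave is running the controller task.
app = requests.get('{}/v2/apps/{}'.format(marathon_master, controller_id)).json()
controller_host = app['app']['tasks'][0]['host']

# Pull the engine connection file from the controller's config http server.
resp = requests.get('http://{}:{}/ipcontroller-engine.json'.format(controller_host, config_port))
with open('ipcontroller-engine.json', 'w') as f:
    f.write(resp.text)
```

Clients follow the same path, except they fetch `ipcontroller-client.json`; that is what `MarathonControllerLauncher` does on the machine running `ipcluster` when it writes the client connection file into the profile's `security` directory.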
78 | 79 | ## Troubleshooting 80 | 81 | If your client doesn't have any engines registered, double-check the logs in the controller and engine containers. There might be a networking issue. 82 | 83 | ## Limitations 84 | 85 | - This has only been tested against mesos v0.28.0 and marathon v0.15.0. Mileage may vary on other versions. 86 | - The entire security model of ipyparallel is to never send the key in the connection files over the network. This project completely ignores that and serves these files on an open http server. Running the engines inside containers likely offers some protection, but be warned that this project makes no attempt to protect you from hostile actors on your network. 87 | - The controller uses a large number of ports; for ease of deployment the controller docker container is run in HOST networking mode. 88 | - Currently each engine is run in a separate docker container. While this is great for process management, the cgroup isolation disallows memory-mapping numpy arrays between engines. TODO: investigate running more processes per engine. 89 | 90 | 91 | 92 | ## License 93 | 94 | All files are licensed under the BSD 3-Clause License as follows: 95 | 96 | > Copyright (c) 2016, Activision Publishing, Inc. 97 | > All rights reserved. 98 | > 99 | > Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 100 | > 101 | > 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 102 | > 103 | > 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 104 | > 105 | > 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. 106 | > 107 | > THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 108 | -------------------------------------------------------------------------------- /conda_recipe/build.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | pip install -r ./requirements.txt 4 | $PYTHON setup.py install 5 | 6 | # Add more build steps here, if they are necessary. 7 | 8 | # See 9 | # http://docs.continuum.io/conda/build.html 10 | # for a list of environment variables that are set during the build process.
11 | -------------------------------------------------------------------------------- /conda_recipe/meta.yaml: -------------------------------------------------------------------------------- 1 | package: 2 | name: ipyparallel_mesos 3 | version: 0.0.2 4 | 5 | source: 6 | path: ../ 7 | 8 | # build: 9 | # noarch_python: True 10 | # preserve_egg_dir: True 11 | # entry_points: 12 | # Put any entry points (scripts to be generated automatically) here. The 13 | # syntax is module:function. For example 14 | 15 | 16 | # If this is a new build for the same version, increment the build 17 | # number. If you do not include this key, it defaults to 0. 18 | # number: 1 19 | 20 | requirements: 21 | build: 22 | - python 23 | 24 | run: 25 | - python 26 | - ipyparallel 27 | - ipython-notebook 28 | - requests 29 | 30 | test: 31 | # Python imports 32 | imports: 33 | - ipyparallel_mesos 34 | 35 | # commands: 36 | # You can put test commands to be run here. Use this to test that the 37 | # entry points work. 38 | 39 | 40 | # You can also put a file called run_test.py in the recipe that will be run 41 | # at test time. 42 | 43 | # requires: 44 | # Put any additional test requirements here. For example 45 | # - nose 46 | 47 | about: 48 | home: https://github.com/ActivisionGameScience/ipyparallel-mesos 49 | license: BSD License 50 | summary: 'ipyparallel launchers for mesos using docker and marathon' 51 | 52 | # See 53 | # http://docs.continuum.io/conda/build.html for 54 | # more information about meta.yaml 55 | -------------------------------------------------------------------------------- /docker/build.sh: -------------------------------------------------------------------------------- 1 | #! /bin/bash -xe 2 | 3 | DOCKER_REGISTRY=jdennison 4 | 5 | docker build -t ${DOCKER_REGISTRY}/ipyparallel-marathon-controller:${DOCKER_TAG} -f ./controller/Dockerfile.controller . 6 | docker build -t ${DOCKER_REGISTRY}/ipyparallel-marathon-engine:${DOCKER_TAG} -f ./engine/Dockerfile.engine . 7 | 8 | if [ "$DOCKER_TAG" == "dev" ]; then 9 | echo "dev build. not shipping" 10 | else 11 | docker push ${DOCKER_REGISTRY}/ipyparallel-marathon-controller:${DOCKER_TAG} 12 | docker push ${DOCKER_REGISTRY}/ipyparallel-marathon-engine:${DOCKER_TAG} 13 | fi 14 | -------------------------------------------------------------------------------- /docker/controller/Dockerfile.controller: -------------------------------------------------------------------------------- 1 | FROM python:3 2 | 3 | RUN apt-get update && apt-get install supervisor -y 4 | 5 | WORKDIR /opt 6 | 7 | COPY requirements.txt /opt/requirements.txt 8 | RUN pip install -r ./requirements.txt 9 | 10 | ENV IPYTHONDIR=/opt/ 11 | CMD ["/usr/bin/supervisord", "-c", "./controller/supervisord.conf"] 12 | RUN chmod 666 /opt/ 13 | COPY . 
/opt 14 | -------------------------------------------------------------------------------- /docker/controller/supervisord.conf: -------------------------------------------------------------------------------- 1 | [supervisord] 2 | nodaemon=true 3 | 4 | [program:ipython-controller] 5 | 6 | ; use bash subcommand and exec because supervisor doesn't support defaults for ENVVARS 7 | command=/bin/bash -c "exec ipcontroller --ip=* --location=${HOST:-127.0.0.1} --profile=mesos --init" 8 | 9 | directory=/opt/ 10 | user=root 11 | numprocs=1 12 | stdout_logfile=/dev/fd/1 13 | stdout_logfile_maxbytes=0 14 | stderr_logfile=/dev/stderr 15 | stderr_logfile_maxbytes=0 16 | autostart=true 17 | autorestart=true 18 | startsecs=10 19 | priority=1 20 | 21 | ; Need to wait for currently executing tasks to finish at shutdown. 22 | ; Increase this if you have very long running tasks. 23 | stopwaitsecs = 30 24 | 25 | ; When resorting to send SIGKILL to the program to terminate it 26 | ; send SIGKILL to its whole process group instead, 27 | ; taking care of its children as well. 28 | killasgroup=true 29 | 30 | [program:ipython-client] 31 | ; Set full path to celery program if using virtualenv 32 | command=/bin/bash -c "exec jupyter notebook --ip=* --port=${JUPYTER_NOTEBOOK_PORT:-8888} --profile=mesos" 33 | 34 | directory=/opt/ 35 | user=root 36 | numprocs=1 37 | stdout_logfile=/dev/fd/1 38 | stdout_logfile_maxbytes=0 39 | stderr_logfile=/dev/stderr 40 | stderr_logfile_maxbytes=0 41 | autostart=true 42 | autorestart=true 43 | startsecs=10 44 | priority=999 45 | 46 | ; Need to wait for currently executing tasks to finish at shutdown. 47 | ; Increase this if you have very long running tasks. 48 | stopwaitsecs = 30 49 | 50 | ; When resorting to send SIGKILL to the program to terminate it 51 | ; send SIGKILL to its whole process group instead, 52 | ; taking care of its children as well. 53 | killasgroup=true 54 | 55 | 56 | [program:config-client-http] 57 | ; Set full path to celery program if using virtualenv 58 | command=/bin/bash -c "exec python -m http.server ${CONTROLLER_CONFIG_PORT:-1235} --bind 0.0.0.0" 59 | 60 | directory=/opt/profile_mesos/security/ 61 | user=root 62 | numprocs=1 63 | stdout_logfile=/dev/fd/1 64 | stdout_logfile_maxbytes=0 65 | stderr_logfile=/dev/stderr 66 | stderr_logfile_maxbytes=0 67 | autostart=true 68 | autorestart=true 69 | startsecs=10 70 | priority=999 71 | 72 | ; Need to wait for currently executing tasks to finish at shutdown. 73 | ; Increase this if you have very long running tasks. 74 | stopwaitsecs = 30 75 | 76 | ; When resorting to send SIGKILL to the program to terminate it 77 | ; send SIGKILL to its whole process group instead, 78 | ; taking care of its children as well. 79 | killasgroup=true 80 | -------------------------------------------------------------------------------- /docker/engine/Dockerfile.engine: -------------------------------------------------------------------------------- 1 | FROM python:3 2 | 3 | RUN apt-get update && apt-get install supervisor jq -y 4 | 5 | WORKDIR /opt 6 | 7 | COPY requirements.txt /opt/requirements.txt 8 | RUN pip install -r ./requirements.txt 9 | 10 | ENV IPYTHONDIR=/opt/ 11 | CMD ["/usr/bin/supervisord", "-c", "./engine/supervisord.conf"] 12 | RUN chmod 666 /opt/ 13 | COPY . 
/opt 14 | -------------------------------------------------------------------------------- /docker/engine/start_engine.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Query marathon to find the host of the ipython parallel controller 4 | CONTROLLER_HOST=`curl $MARATHON_MASTER/v2/apps/$CONTROLLER_MARATHON_ID | jq -r '.app.tasks | .[0].host'` 5 | echo $CONTROLLER_HOST 6 | curl http://${CONTROLLER_HOST}:${CONTROLLER_CONFIG_PORT:-1235}/ipcontroller-engine.json --create-dirs -o /opt/profile_mesos/security/ipcontroller-engine.json 7 | cat /opt/profile_mesos/security/ipcontroller-engine.json 8 | ipengine --profile=mesos --url=tcp://$HOST:$PORT 9 | -------------------------------------------------------------------------------- /docker/engine/supervisord.conf: -------------------------------------------------------------------------------- 1 | [supervisord] 2 | nodaemon=true 3 | 4 | [program:ipython-engine] 5 | ; Fetch the controller connection info, then start the engine 6 | command=/bin/bash -e ./engine/start_engine.sh 7 | 8 | directory=/opt/ 9 | user=root 10 | numprocs=1 11 | stdout_logfile=/dev/fd/1 12 | stdout_logfile_maxbytes=0 13 | stderr_logfile=/dev/stderr 14 | stderr_logfile_maxbytes=0 15 | autostart=true 16 | autorestart=true 17 | startsecs=10 18 | 19 | ; Need to wait for currently executing tasks to finish at shutdown. 20 | ; Increase this if you have very long running tasks. 21 | stopwaitsecs = 30 22 | 23 | ; When resorting to send SIGKILL to the program to terminate it 24 | ; send SIGKILL to its whole process group instead, 25 | ; taking care of its children as well. 26 | killasgroup=true 27 | -------------------------------------------------------------------------------- /docker/requirements.txt: -------------------------------------------------------------------------------- 1 | jupyter 2 | ipyparallel 3 | requests 4 | -------------------------------------------------------------------------------- /ipyparallel_mesos/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ActivisionGameScience/ipyparallel-mesos/47f74a97fd5ffc924561c3cb763a8a3e1996d915/ipyparallel_mesos/__init__.py -------------------------------------------------------------------------------- /ipyparallel_mesos/launcher.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2015-2016, Activision Publishing, Inc. 2 | # All rights reserved. 3 | # 4 | # Redistribution and use in source and binary forms, with or without modification, 5 | # are permitted provided that the following conditions are met: 6 | # 7 | # 1. Redistributions of source code must retain the above copyright notice, this 8 | # list of conditions and the following disclaimer. 9 | # 10 | # 2. Redistributions in binary form must reproduce the above copyright notice, 11 | # this list of conditions and the following disclaimer in the documentation 12 | # and/or other materials provided with the distribution. 13 | # 14 | # 3. Neither the name of the copyright holder nor the names of its contributors 15 | # may be used to endorse or promote products derived from this software without 16 | # specific prior written permission.
17 | # 18 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 19 | # ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 20 | # WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 21 | # DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR 22 | # ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 23 | # (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 24 | # LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 25 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 26 | # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 27 | # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 28 | 29 | import time 30 | import os 31 | import json 32 | 33 | from traitlets import ( 34 | Any, Integer, CFloat, List, Unicode, Dict, Instance, HasTraits, CRegExp 35 | ) 36 | import requests 37 | 38 | from ipyparallel.apps.launcher import BaseLauncher, ControllerMixin, EngineMixin 39 | 40 | 41 | class MarathonLauncher(BaseLauncher): 42 | raw_marathon_api_url = '{}/v2/apps/{}' 43 | base_marathon_config = { 44 | 'mem': 1024, 45 | 'env': {}, 46 | 'instances': 1, 47 | 'container': { 48 | 'docker': { 49 | 'image': '', 50 | 'forcePullImage': True 51 | }, 52 | 'type': 'DOCKER'}, 53 | 'cpus': 0.9, 54 | 'id': '' 55 | } 56 | 57 | marathon_master_url = Unicode('', config=True, 58 | help="host and port for marathon api") 59 | 60 | marathon_app_group = Unicode('', config=True, 61 | help="Marathon application id path") 62 | 63 | controller_app_name = 'controller' 64 | 65 | controller_config_port = Unicode('1235', config=True, 66 | help="Port controller exposes to share client/engine configs") 67 | 68 | controller_docker_image = Unicode('', config=True, 69 | help="Docker image of controller to launch") 70 | 71 | engine_docker_image = Unicode('', config=True, 72 | help="Docker image of engine to launch") 73 | 74 | engine_memory = Integer(1024, config=True, 75 | help="Amount of memory to allocate to the engine docker image") 76 | 77 | controller_memory = Integer(512, config=True, 78 | help="Amount of memory to allocate to the controller docker image") 79 | 80 | @property 81 | def controller_marathon_id(self): 82 | return '{}{}'.format(self.marathon_app_group, self.controller_app_name) 83 | 84 | @property 85 | def controller_marathon_url(self): 86 | return '{}/v2/apps/{}'.format(self.marathon_master_url, self.controller_marathon_id) 87 | 88 | def __init__(self, work_dir=u'.', config=None, **kwargs): 89 | super(MarathonLauncher, self).__init__( 90 | work_dir=work_dir, config=config, **kwargs 91 | ) 92 | 93 | # TODO: is there a way in traits to require a config 94 | assert self.marathon_master_url, "marathon_master_url is required" 95 | assert self.marathon_app_group, "marathon_app_group is required" 96 | 97 | def _wait_for_marathon_app_to_start(self, app_url, tries=240, sleep=0.5): 98 | for i in range(tries): 99 | app_resp = requests.get(app_url) 100 | if app_resp.ok: 101 | self.log.debug("Found app: {} in marathon".format(app_url)) 102 | app_info = app_resp.json() 103 | if app_info['app']['instances'] == app_info['app']['tasksRunning']: 104 | return app_info 105 | else: 106 | self.log.debug("Application: {} still scaling".format(app_url)) 107 | time.sleep(sleep) 108 | 109 | self.stop() 110 | raise RuntimeError("Marathon App did not start correctly. Stopping cluster")
Stopping cluster") 111 | 112 | 113 | class MarathonControllerLauncher(MarathonLauncher): 114 | """Docstring for M. """ 115 | 116 | def find_args(self): 117 | return self.marathon_master_url + self.controller_marathon_id 118 | 119 | def start(self): 120 | marathon_config = self._build_marathon_config() 121 | controller = self._start_marathon_app(marathon_config) 122 | self.notify_start(controller['app']['tasks'][0]['id']) 123 | self._write_client_connection_dict(controller) 124 | 125 | def _write_client_connection_dict(self, controller): 126 | controller_config_url = 'http://{}:{}/ipcontroller-client.json'.format(controller['app']['tasks'][0]['host'], self.controller_config_port) 127 | # HACKY RETRY FIX 128 | for i in range(10): 129 | try: 130 | resp = requests.get(controller_config_url) 131 | if resp.ok: 132 | return self._save_connection_dict(resp.json()) 133 | time.sleep(0.2) 134 | except requests.exceptions.ConnectionError: 135 | time.sleep(0.2) 136 | 137 | self.stop() 138 | raise RuntimeError("Failed to write client connection dict stopping cluster") 139 | 140 | 141 | def _save_connection_dict(self, connection_dict): 142 | fname = 'ipcontroller-client.json' 143 | fname = os.path.join(self.profile_dir, 'security', fname) 144 | self.log.info("writing connection info to %s", fname) 145 | with open(fname, 'w') as f: 146 | f.write(json.dumps(connection_dict, indent=2)) 147 | 148 | def stop(self): 149 | self._stop_marathon_app(self.controller_marathon_id) 150 | self.notify_stop(self) 151 | 152 | def _start_marathon_app(self, marathon_config): 153 | self.log.debug("Starting engines in marathon") 154 | full_marathon_api = self.raw_marathon_api_url.format(self.marathon_master_url, '') 155 | res = requests.post(full_marathon_api, json=marathon_config) 156 | assert res.ok, res.json() 157 | return self._wait_for_controller_to_start() 158 | 159 | def _wait_for_controller_to_start(self): 160 | return self._wait_for_marathon_app_to_start(self.controller_marathon_url) 161 | 162 | def _stop_marathon_app(self, application_id): 163 | full_marathon_api = self.raw_marathon_api_url.format(self.marathon_master_url, application_id) 164 | res = requests.delete(full_marathon_api) 165 | assert res.ok, res.json() 166 | 167 | def _build_marathon_config(self): 168 | marathon_config = self.base_marathon_config.copy() 169 | marathon_config['id'] = self.controller_marathon_id 170 | marathon_config['mem'] = self.controller_memory 171 | marathon_config['container']['docker']['network'] = 'HOST' 172 | marathon_config['container']['docker']['image'] = self.controller_docker_image 173 | marathon_config['env'] = { 174 | 'CONTROLLER_CONFIG_PORT': self.controller_config_port 175 | } 176 | return marathon_config 177 | 178 | 179 | class MarathonEngineSetLauncher(MarathonLauncher): 180 | """Launcher to deploy a set of""" 181 | engine_app_name = 'engine' 182 | 183 | @property 184 | def engine_marathon_id(self): 185 | return '{}{}'.format(self.marathon_app_group, self.engine_app_name) 186 | 187 | @property 188 | def engine_marathon_url(self): 189 | return '{}/v2/apps/{}'.format(self.marathon_master_url, self.engine_marathon_id) 190 | 191 | def find_args(self): 192 | return self.marathon_master_url + self.engine_marathon_id 193 | 194 | def start(self, n): 195 | marathon_config = self._build_marathon_config(n) 196 | self._start_marathon_app(marathon_config) 197 | engines = self._wait_for_engines_to_start() 198 | for task in engines['app']['tasks']: 199 | self.notify_start(task['id']) 200 | 201 | self.log.info("Engines started. 
202 | 203 | def stop(self): 204 | self._stop_marathon_app(self.engine_marathon_id) 205 | self.notify_stop(self) 206 | 207 | def _wait_for_engines_to_start(self): 208 | # Wait for up to two minutes (tries=240, sleep=0.5) 209 | return self._wait_for_marathon_app_to_start(self.engine_marathon_url, tries=240, sleep=0.5) 210 | 211 | def _start_marathon_app(self, marathon_config): 212 | full_marathon_api = self.raw_marathon_api_url.format(self.marathon_master_url, '') 213 | res = requests.post(full_marathon_api, json=marathon_config) 214 | assert res.ok, res.json() 215 | return res.json() 216 | 217 | def _stop_marathon_app(self, application_id): 218 | full_marathon_api = self.raw_marathon_api_url.format(self.marathon_master_url, application_id) 219 | res = requests.delete(full_marathon_api) 220 | assert res.ok, res.json() 221 | 222 | def _build_marathon_config(self, n=1): 223 | assert self.engine_docker_image, "engine_docker_image is required" 224 | 225 | marathon_config = self.base_marathon_config.copy() 226 | marathon_config['id'] = self.engine_marathon_id 227 | marathon_config['instances'] = n 228 | marathon_config['mem'] = self.engine_memory 229 | marathon_config['container']['docker']['image'] = self.engine_docker_image 230 | marathon_config['env'] = { 231 | 'MARATHON_MASTER': self.marathon_master_url, 232 | 'CONTROLLER_MARATHON_ID': self.controller_marathon_id, 233 | 'CONTROLLER_CONFIG_PORT': self.controller_config_port, 234 | 235 | } 236 | return marathon_config 237 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | jupyter 2 | ipyparallel 3 | requests 4 | -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [metadata] 2 | description-file = README.md 3 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | from setuptools import setup, find_packages 4 | 5 | # Read requirements.txt directly instead of importing pip internals 6 | # (pip.req/parse_requirements is not a stable public API). 7 | with open('requirements.txt') as f: 8 |     requirements = [line.strip() for line in f if line.strip()] 9 | 10 | 11 | setup(name='ipyparallel_mesos', 12 | version='0.0.2', 13 | description='ipyparallel launchers for mesos using docker and marathon', 14 | author='John Dennison', 15 | author_email='john.dennison@activision.com', 16 | url = 'https://github.com/ActivisionGameScience/ipyparallel-mesos/', 17 | packages=find_packages(), 18 | install_requires=requirements 19 | ) 20 | --------------------------------------------------------------------------------