├── Dockerfile
├── LICENSE
├── README.md
├── blacklist-nouveau.conf
├── ec2-nvidocker-setup-1.sh
├── ec2-nvidocker-setup-2.sh
├── ec2-nvidocker-setup-3.sh
└── nvidocker-spec.json.template


/Dockerfile:
--------------------------------------------------------------------------------
FROM tensorflow/tensorflow:latest-devel-gpu

MAINTAINER Gavin Gray

# go back to the tensorflow dir
WORKDIR /tensorflow

# Configure the build for our CUDA configuration.
ENV CUDA_TOOLKIT_PATH /usr/local/cuda
ENV CUDNN_INSTALL_PATH /usr/local/cuda
ENV TF_NEED_CUDA 1
ENV TF_CUDA_COMPUTE_CAPABILITIES=3.0

# Rebuild the pip package with the settings above and install it
RUN ./configure && \
    bazel build -c opt --config=cuda tensorflow/tools/pip_package:build_pip_package && \
    bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/pip && \
    pip install --upgrade /tmp/pip/tensorflow-*.whl

WORKDIR /root

# Set up CUDA variables
ENV CUDA_PATH /usr/local/cuda

# TensorBoard
EXPOSE 6006
# IPython
EXPOSE 8888

# Start a shell when the container runs (RUN would only execute bash once at build time)
CMD ["/bin/bash"]

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2016 Gavin Gray

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
**Warning**: these scripts are probably completely out of date and unlikely to work.

# Installing nvidia-docker on Amazon EC2

These are a few scripts to take an Amazon EC2 instance from scratch to running
[nvidia-docker][nd], so that you can easily run [any deep learning
library][kai]. I also provide a modified version of the
[Tensorflow devel-gpu Dockerfile][tensor], because the standard one is not
configured for a CUDA compute capability of 3.0.

The installation procedure is pretty crude: run the three scripts provided
in this repository in sequence, rebooting after the first and the second
(but not after the third).

## Starting an Instance

Start either a regular or a spot instance. The scripts were tested with the
64-bit Ubuntu 14.04 image `ami-8446ff93`. To start an instance from the
command line, copy `nvidocker-spec.json.template` to `nvidocker-spec.json`
and fill in `KeyName` and `SecurityGroupIds`.
Then, with the AWS CLI tools installed, you can run the following command
to start a spot instance with a max price of one dollar:

```
aws ec2 request-spot-instances --spot-price 1.00 --instance-count 1 --type one-time --launch-specification file://nvidocker-spec.json
```

If you're not using AWS for anything else, you can also use the following
command to get the public DNS name of _probably_ the instance you just
started (give it a few minutes to start up):

```
aws ec2 describe-instances --query 'Reservations[0].Instances[0].PublicDnsName'
```

Later, I'm going to refer to the address of this instance as `EC2ADDRESS`.

## Install instructions

First, copy all of the files in this repository to the new machine:

```
scp -i .pem * ubuntu@$EC2ADDRESS:~/
```

__Or__, clone this repo while in a shell on the remote machine:

```
git clone https://github.com/gngdb/nvidia-docker-ec2.git
```

Make all the scripts executable:

```
chmod +x ec2-nvidocker-setup-*
```

Then run the three scripts in order. The first two trigger a reboot when
they finish:

* `./ec2-nvidocker-setup-1.sh` __REBOOT__
* `./ec2-nvidocker-setup-2.sh` __REBOOT__
* `./ec2-nvidocker-setup-3.sh`

After the third script, log out and log back in so that the docker group
membership takes effect, or just run:

```
newgrp docker
```

## Building the Tensorflow Image

__Warning__: I haven't tested the current version of this Dockerfile, but I
think it should work.

The `devel-gpu` docker image provided by Tensorflow has to be rebuilt with
the `TF_CUDA_COMPUTE_CAPABILITIES` environment variable set to 3.0.
To do this, build the Docker image using the Dockerfile in this repository:

```
nvidia-docker build -t gngdb/tensorflow:latest-devel-gpu .
```

## All the other deep learning tools

Every other major deep learning library can be pulled from Docker Hub
thanks to [kaixhin's great collection of builds for each of them][kai]. So
you can have Caffe, Keras and Theano all running in 10 minutes,
simultaneously, on the same machine. And, if you accidentally break an
install, you can just start a new container.

[nd]: https://github.com/NVIDIA/nvidia-docker
[kai]: https://github.com/Kaixhin/dockerfiles
[tensor]: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/docker/Dockerfile.devel-gpu

--------------------------------------------------------------------------------
/blacklist-nouveau.conf:
--------------------------------------------------------------------------------
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off

--------------------------------------------------------------------------------
/ec2-nvidocker-setup-1.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# required before installing nvidia driver
sudo apt-get update
sudo apt-get install --no-install-recommends -y gcc make libc-dev
# -y keeps apt-get from stopping at a confirmation prompt
sudo apt-get install -y linux-image-extra-virtual
sudo reboot

--------------------------------------------------------------------------------
/ec2-nvidocker-setup-2.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# blacklisting nouveau etc
sudo cp blacklist-nouveau.conf /etc/modprobe.d/
echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
# rebuild the initramfs so the blacklist applies on the next boot
sudo update-initramfs -u
sudo reboot

--------------------------------------------------------------------------------
/ec2-nvidocker-setup-3.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# install nvidia driver 361.42
wget -P /tmp http://us.download.nvidia.com/XFree86/Linux-x86_64/361.42/NVIDIA-Linux-x86_64-361.42.run
sudo sh /tmp/NVIDIA-Linux-x86_64-361.42.run --silent

# install docker and start the service
# (-y keeps apt-get from stopping at a confirmation prompt)
sudo apt-get install -y apt-transport-https ca-certificates
sudo apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
sudo su -c "echo deb https://apt.dockerproject.org/repo ubuntu-trusty main > /etc/apt/sources.list.d/docker.list"
sudo apt-get update
sudo apt-get purge -y lxc-docker
apt-cache policy docker-engine
sudo apt-get install -y docker-engine
sudo service docker start

# create docker group (no error if it already exists) and put default user in it
sudo groupadd -f docker
sudo usermod -aG docker ubuntu

# install nvidia-docker and nvidia-docker plugin
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.0-rc.3/nvidia-docker_1.0.0.rc.3-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker*.deb

--------------------------------------------------------------------------------
/nvidocker-spec.json.template:
--------------------------------------------------------------------------------
{
  "ImageId": "ami-8446ff93",
  "SecurityGroupIds": [ "" ],
  "InstanceType": "g2.2xlarge",
  "KeyName": ""
}

--------------------------------------------------------------------------------
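
The README asks for `KeyName` and `SecurityGroupIds` to be filled in `nvidocker-spec.json.template`, while the `request-spot-instances` command reads `nvidocker-spec.json`. One way to bridge the two is sketched below. It is not part of the repository: the function name `fill_spec`, its argument order, and the example values `my-key` and `sg-0123abcd` are all hypothetical, and it assumes the template still has exactly the spacing shown above.

```shell
#!/bin/sh
# Sketch only: fill_spec is a hypothetical helper, not shipped with this repo.
# Usage: fill_spec KEY_NAME SECURITY_GROUP_ID [TEMPLATE] [OUTPUT]
fill_spec() {
    spec_key="$1"        # name of an existing EC2 key pair
    spec_group="$2"      # ID of an existing security group
    spec_template="${3:-nvidocker-spec.json.template}"
    spec_output="${4:-nvidocker-spec.json}"

    # Replace the two empty strings in the template; every other line is
    # copied through unchanged. Assumes the values contain no '/' or '&',
    # which sed would treat specially in the replacement text.
    sed -e "s/\"KeyName\": \"\"/\"KeyName\": \"${spec_key}\"/" \
        -e "s/\"SecurityGroupIds\": [[] \"\" ]/\"SecurityGroupIds\": [ \"${spec_group}\" ]/" \
        "$spec_template" > "$spec_output"
}

# Example with placeholder values (run from the repository directory):
#   fill_spec my-key sg-0123abcd
```

The template itself is left untouched, so the generated `nvidocker-spec.json` can be deleted and recreated for a different key pair or security group.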