├── README.org
├── configs
│   └── .tmux.conf
├── deploy.yml
├── downloads
│   └── dummy.txt
├── inventory.cfg
└── vars.yml

/README.org:
--------------------------------------------------------------------------------
#+TITLE: Ansible script to convert a clean Ubuntu 18.04 install into a CUDA 10, PyTorch 1.0, fastai, miniconda3 deep learning machine.

With this simple Ansible script, you'll be able to convert a clean
Ubuntu 18.04 image (as supplied by Google Compute Engine or
[[https://www.paperspace.com/][PaperSpace]]) into a CUDA 10, [[https://pytorch.org/][PyTorch 1.0]], [[https://github.com/fastai/fastai][fastai 1.0.x]], miniconda3
powerhouse, ready to live the (mixed-precision!) deep learning dream.

After running the script, you'll be able to ssh or mosh in, type
=conda activate pt=, and be on your way!

=cpbotha@vxlabs.com= built this to scratch his own itch
(mixed-precision neural networks on NVIDIA's new TensorCores), but
would be happy if you too find it useful. Issues and PRs might get
looked at, or not. There's also an accompanying [[https://vxlabs.com/2018/11/21/a-simple-ansible-script-to-convert-a-clean-ubuntu-18-04-to-a-cuda-10-pytorch-1-0rc-fastai-miniconda3-deep-learning-machine/][blog post with
screencast]].

This script has now been updated to use the official conda package of
PyTorch 1.0.

* Step by step instructions

** Install ansible

#+BEGIN_SRC shell
pip3 install --user --upgrade ansible
# double-check that this gives you the Python 3 version
which ansible-playbook
#+END_SRC

** Configure this script's vars.yml and inventory.cfg

These instructions are reproduced from the comments at the start of
=deploy.yml=:

1. Register on the NVIDIA dev site and download the two CUDNN 7.4 or
   later DEBs.
   - These two debs should go in the =downloads/= subdir of this repo.
   - Ensure that the names in =vars.yml= match your downloads.
   - Everything in =downloads/= will be copied over to =~/Downloads= on
     the remote machine. Use this to your advantage!
2. The destination machine should have Ubuntu 18.04 installed. Importantly,
   =ssh your_user@the_machine= should let you in without a password, and
   =your_user= should be able to sudo without entering a password. Test this!
3. Edit =vars.yml=: change =user= to your login and sudo user on the
   destination machine.
4. Edit =inventory.cfg=: set the destination machine's IP address or
   hostname under =[app]=.

** Run the script

Start the whole business:

#+BEGIN_SRC shell
ansible-playbook -i inventory.cfg deploy.yml
#+END_SRC

When I do this with a V100-equipped PaperSpace machine with 8 cores
and 30GB of RAM, it takes about 13 minutes from start to finish.

After this, you will have to reboot the machine.

** Use the environment

=ssh= in to the machine. Activate the environment with =conda activate
pt=. Do what you normally do!

* Updates

** 2018-12-16

Updated to the official PyTorch 1.0 conda packages!

** 2018-12-06

Updated to the latest 2018-12-06 build of the PyTorch 1.0 preview. See
below for update instructions.

** 2018-11-24

Updated to the latest 2018-11-24 build of the PyTorch 1.0 preview with
the new magma 2.4.0 packages.
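Before re-running the playbook to pick up any of these updates, it is
worth re-checking the passwordless ssh and sudo requirement from step 2
of the instructions above. A minimal sketch, with =your_user= and
=the_machine= as placeholders for your own values:

```shell
# substitute your own login and destination machine here
U=your_user
H=the_machine

# BatchMode forbids password prompts, so ssh fails fast instead of
# asking; 'sudo -n true' likewise fails if sudo itself would prompt
if ssh -o BatchMode=yes -o ConnectTimeout=5 "$U@$H" 'sudo -n true' 2>/dev/null; then
    status="OK"
else
    status="NOT working yet"
fi
echo "passwordless ssh + sudo: $status"
```

If this prints OK, ansible will be able to log in and escalate
privileges without ever prompting you.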
To update an existing install with the new PyTorch 1.0 build, you can
either just re-run the whole =deploy.yml= playbook, or you can run
just the miniconda3-related tasks like this:

#+BEGIN_SRC shell
ansible-playbook -i inventory.cfg deploy.yml --tags "miniconda3"
#+END_SRC

--------------------------------------------------------------------------------
/configs/.tmux.conf:
--------------------------------------------------------------------------------
# xterm title should reflect the current command
# this works interactively, but not in the ~/.tmux.conf
set -g set-titles on

# we want TERM set to this, and not the default 'screen'
set -g default-terminal "screen-256color"

# found the suggestion of C-z somewhere in the comments on
# https://superuser.com/questions/74492/whats-the-least-conflicting-prefix-escape-sequence-for-screen-or-tmux
unbind-key C-b
set -g prefix 'C-j'
bind-key 'C-j' send-prefix

--------------------------------------------------------------------------------
/deploy.yml:
--------------------------------------------------------------------------------
# ansible to configure a clean ubuntu 18.04 node with CUDA 10 + miniconda +
# PyTorch + fastai, courtesy of Charl P. Botha and vxlabs.com 2018

# 0. register on the NVIDIA dev site and download the two CUDNN 7.4 or
#    later DEBs. These should go in the downloads/ subdir of this
#    repo. Ensure that the names in vars.yml match your downloads.
#    - everything in downloads/ will be copied over to ~/Downloads. Use this!
# 1. The destination machine should have Ubuntu 18.04 installed. Also,
#    ssh your_user@machine should let you in without a password, and
#    your_user should be able to sudo without a password.
# 2. edit vars.yml -- change user to your login and sudo user on the destination machine
# 3. edit inventory.cfg -- set the destination machine IP address / hostname under [app]
# 4. ansible-playbook -i inventory.cfg deploy.yml

---
- name: Prepare homedir
  hosts: app
  become: no
  user: "{{ user }}"
  vars_files:
    - vars.yml

  tasks:
    - name: Push over my Emacs configuration
      synchronize:
        # lots of symlinks, rather copy the files they point to
        copy_links: yes
        src: ~/.emacs.d
        dest: "{{ home }}/"

    - name: Push over tmux configuration
      template:
        src: configs/.tmux.conf
        dest: "{{ home }}/.tmux.conf"

    # this will copy the contents of downloads/ into ~/Downloads/ on the remote
    - name: Copy contents of downloads (including CUDNN debs) to remote ~/Downloads
      copy:
        src: downloads/
        dest: "{{ home }}/Downloads"

- name: Basic system setup
  hosts: app
  become: yes
  user: "{{ user }}"
  vars_files:
    - vars.yml

  tasks:
    - name: Add Emacs 26 PPA
      apt_repository:
        repo: ppa:kelleyk/emacs

    # do the equivalent of apt-get update && apt-get upgrade
    - name: Make sure whole system is up to date
      apt:
        upgrade: yes
        update_cache: yes

    # https://docs.ansible.com/ansible/latest/modules/apt_module.html
    - name: Install required system packages
      apt:
        name: "{{ packages }}"
      vars:
        packages:
          - apt-utils
          - build-essential
          - curl
          - dkms
          - htop
          - tmux
          - joe
          - locales
          - mosh
          - p7zip-full
          - software-properties-common
          - unzip
          - zip
          - emacs26
          - emacs26-el

    # some systems, e.g. those on nvidia-docker, already have CUDA installed;
    # skip the CUDA tasks on those with:
    # ansible-playbook -i inventory.cfg deploy.yml --skip-tags "cuda"
    - name: Add NVIDIA CUDA key
      tags: cuda
      apt_key:
        url: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub

    - name: Install NVIDIA CUDA 10 network repo deb
      tags: cuda
      apt:
        deb: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb

    - name: Install the rest of CUDA based on the network deb's config
      tags: cuda
      apt:
        update_cache: yes
        name: cuda

    - name: Install CUDNN debs
      apt:
        deb: "{{ item }}"
      with_items:
        - "{{ downloads_dir }}/{{ cudnn }}"
        - "{{ downloads_dir }}/{{ cudnn_dev }}"

- name: Setup miniconda3 with PyTorch and fastai
  hosts: app
  become: no
  user: "{{ user }}"
  # you can run ONLY this play by doing:
  # ansible-playbook -i inventory.cfg deploy.yml --tags "miniconda3"
  tags: miniconda3
  vars_files:
    - vars.yml

  tasks:
    # this will only download if not already there
    - name: Download miniconda3 installer
      get_url:
        url: https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
        dest: "{{ home }}/Downloads/Miniconda3-latest-Linux-x86_64.sh"

    # this will only run if miniconda3 is not already there yet (creates: ...)
    - name: Install miniconda3
      command: "/bin/bash {{ home }}/Downloads/Miniconda3-latest-Linux-x86_64.sh -b"
      args:
        creates: "{{ home }}/miniconda3"

    - name: Add conda config to .bashrc
      lineinfile:
        path: "{{ home }}/.bashrc"
        line: ". ~/miniconda3/etc/profile.d/conda.sh"

    # only do this if this is a new miniconda3 installation
    - name: Install first PyTorch environment
      command: "{{ home }}/miniconda3/bin/conda create -y -c mingfeima -n pt python=3.7 numpy mkl mkldnn scikit-learn pandas ipython"
      args:
        creates: "{{ home }}/miniconda3/envs/pt"

    # a previous version of this script installed my own build of the 1.0
    # preview, as 1.0 was not out yet; make sure to uninstall that, then
    # install the official conda packages in the next step
    - name: Uninstall previous torch wheel installation
      command: "{{ home }}/miniconda3/envs/pt/bin/pip uninstall -y torch"

    - name: Install PyTorch 1.0 and friends using conda into pt environment
      command: "{{ home }}/miniconda3/bin/conda install -y -n pt pytorch torchvision cuda100 -c pytorch"

    - name: Install / upgrade to latest fastai
      command: "{{ home }}/miniconda3/envs/pt/bin/pip install --upgrade fastai"

    # the PyTorch wheel needs LD_LIBRARY_PATH set, else it can't find MKL and friends
    # - name: Configure LD_LIBRARY_PATH for the PyTorch wheel
    #   lineinfile:
    #     path: "{{ home }}/miniconda3/envs/pt/etc/conda/activate.d/ld_library_path.sh"
    #     create: yes
    #     line: "export LD_LIBRARY_PATH=$CONDA_PREFIX/lib"

--------------------------------------------------------------------------------
/downloads/dummy.txt:
--------------------------------------------------------------------------------
So that we can have the downloads dir in the repo
--------------------------------------------------------------------------------
/inventory.cfg:
--------------------------------------------------------------------------------
[app]
v100.paperspace
--------------------------------------------------------------------------------
/vars.yml:
--------------------------------------------------------------------------------
---

user: "paperspace"
torch_fn: "torch-1.0.0a0+b5db6ac+20181206-cp37-cp37m-linux_x86_64.whl"
cudnn: "libcudnn7_7.4.1.5-1+cuda10.0_amd64.deb"
cudnn_dev: "libcudnn7-dev_7.4.1.5-1+cuda10.0_amd64.deb"

ansible_python_interpreter: /usr/bin/python3
# we keep dirs without appended slashes by convention
home: "/home/{{ user }}"
downloads_dir: "{{ home }}/Downloads"
dest: "{{ home }}/healthmodels"
torch_path: "{{ downloads_dir }}/{{ torch_fn }}"
--------------------------------------------------------------------------------
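A common slip with the setup above is editing the =cudnn= / =cudnn_dev=
names in =vars.yml= but forgetting to put the matching debs into
=downloads/=. Below is a minimal pre-flight sketch of that check, run
from the repo root. The two =touch= lines only fabricate stand-ins for
the NVIDIA downloads so the snippet is self-contained; with the real
debs in place, omit them.

```shell
# pre-flight check: the CUDNN deb names declared in vars.yml must
# actually exist in downloads/ before ansible-playbook is run.
# the touch lines fabricate stand-ins so this sketch runs anywhere;
# with the real NVIDIA downloads in place, omit them.
mkdir -p downloads
touch downloads/libcudnn7_7.4.1.5-1+cuda10.0_amd64.deb
touch downloads/libcudnn7-dev_7.4.1.5-1+cuda10.0_amd64.deb

for f in libcudnn7_7.4.1.5-1+cuda10.0_amd64.deb \
         libcudnn7-dev_7.4.1.5-1+cuda10.0_amd64.deb; do
    if [ -f "downloads/$f" ]; then
        echo "found: $f"
    else
        echo "missing: $f -- fetch it from the NVIDIA dev site first"
    fi
done
```

If either deb is reported missing, the "Install CUDNN debs" task in
=deploy.yml= will fail partway through the run.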