Alternatively, install the deprecated `nvidia-docker2` package for Kubernetes:
100 |
101 | ```bash
102 | # NOTE: nvidia-docker2 is still required for Kubernetes, but otherwise only nvidia-container-toolkit is needed
103 | # https://docs.nvidia.com/datacenter/cloud-native/kubernetes/install-k8s.html
104 | distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
105 | && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
106 | && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
107 |
108 | # NOTE: I had to manually edit /etc/apt/sources.list.d/nvidia-docker.list to change 18.04 to 20.04
109 | # Install nvidia-docker2 to provide the legacy runtime=nvidia for use with docker-compose (see: https://github.com/NVIDIA/nvidia-docker/issues/1268#issuecomment-632692949)
110 | sudo apt-get update && sudo apt-get install -y nvidia-docker2
111 | # sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
112 | sudo systemctl restart docker
113 | ```
114 |
115 |
116 | ```bash
117 | # Verify installation
118 | docker run --rm --gpus all nvidia/cuda:12.2.2-cudnn8-devel-ubuntu22.04 nvidia-smi
119 | ```
120 |
121 | ### Post Installation Steps
122 | ```bash
123 | # Add users to the `docker` group to let them use docker on the server without `sudo`
124 | # https://docs.docker.com/engine/install/linux-postinstall/
125 | sudo groupadd docker
126 | sudo usermod -aG docker $USER
127 |
128 | # Activate changes
129 | newgrp docker
130 |
131 | # Verify
132 | docker run hello-world
133 |
134 | # Configure Docker to start on boot with systemd
135 | sudo systemctl enable docker.service
136 | sudo systemctl enable containerd.service
137 | ```
138 |
139 | ### Create `.env` file for sensitive configuration details
140 | In addition to these files, create a `.env` file with the necessary secrets set as variables e.g.:
141 |
142 | ```bash
143 | COMPOSE_PROJECT_NAME=dl_hub
144 | AUTH_SERVER_ADDRESS=authenticator.uni.ac.uk
145 | ADMIN_USERS='user1 user2 user3' # A string of user names separated by spaces
146 | # DOCKER_NETWORK_NAME=${COMPOSE_PROJECT_NAME}_default
147 | ```
148 | See [here](https://docs.docker.com/compose/environment-variables/) for documentation on setting and passing environment variables to `docker compose`.
149 |
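As a minimal sketch of how these variables might be consumed, `jupyterhub_config.py` could read the space-separated `ADMIN_USERS` string from the environment and turn it into a set of user names. The helper name `parse_admin_users` is hypothetical, and the sketch assumes the variable is passed through to the hub container by `docker compose`; the repository's actual configuration may differ.

```python
import os


def parse_admin_users(value):
    """Split a space-separated string of user names into a set."""
    return set(value.split())


# Hypothetical usage inside jupyterhub_config.py, where `c` is the config object:
# c.Authenticator.admin_users = parse_admin_users(os.environ.get("ADMIN_USERS", ""))

# Stand-alone demonstration with the example value from the `.env` file above
admins = parse_admin_users(os.environ.get("ADMIN_USERS", "user1 user2 user3"))
print(sorted(admins))
```

Defaulting to an empty string means a missing variable yields an empty set rather than raising an error.
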
150 | ### Authentication
151 |
152 | Depending on your environment, you will probably want to configure a more sophisticated authenticator, e.g. the [`PAMAuthenticator` or `ldapauthenticator`](https://github.com/jupyterhub/jupyterhub#configuration). To use the existing user authentication systems, you will need configuration details from the university system administrators. These details should be configured in [`jupyterhub/jupyterhub_config.py`](https://github.com/bdevans/dl-hub/blob/main/jupyterhub/jupyterhub_config.py) (with secrets in `.env` as necessary).
153 |
154 | Your organisation may also be able to issue and sign SSL certificates for the server. This repository currently assumes they are in `jupyterhub/cert/`. Appropriate configuration settings then need to be set in [`jupyterhub/jupyterhub_config.py`](https://github.com/bdevans/dl-hub/blob/main/jupyterhub/jupyterhub_config.py) e.g.:
155 |
156 | ```python
157 | # Configure SSL
158 | c.JupyterHub.ssl_key = '/srv/jupyterhub/hub.key'
159 | c.JupyterHub.ssl_cert = '/srv/jupyterhub/chain.crt'
160 | c.JupyterHub.port = 443
161 |
162 | # Configure configurable-http-proxy to redirect http to https
163 | c.ConfigurableHTTPProxy.command = ['configurable-http-proxy', '--redirect-port', '80']
164 | ```
165 |
166 | The corresponding lines where the certificates are installed in [`jupyterhub/Dockerfile`](https://github.com/bdevans/dl-hub/blob/main/jupyterhub/Dockerfile) will also need to be edited.
167 |
168 | ### Optional additional steps
169 |
170 | * Customise JupyterHub
171 |     - Edit `jupyterhub_config.py`
172 | * Automatically add new users to the `docker` group to let them use docker on the server without `sudo`
173 |     - `sudo nano /etc/adduser.conf` then add the following lines
174 |         * `EXTRA_GROUPS="docker"` # Separate groups with spaces e.g. `"docker users"`
175 |         * `ADD_EXTRA_GROUPS=1`
176 | * Mount additional partitions
177 | * Move Docker disk to separate partition
178 |     - `sudo systemctl stop docker`
179 |     - Copy or move the data e.g.: `sudo rsync -aP /var/lib/docker/ /path/to/your/docker_data`
180 |     - Edit `/etc/docker/daemon.json` to add `"data-root": "/path/to/your/docker_data"`
181 |     - `sudo systemctl start docker`
182 | * Set up build target of `jupyter/docker-stacks` with `--build-arg`
183 | * Install extras, e.g.:
184 |     - `screen`
185 |     - `tmux`
186 |     - `htop`
187 |     - `nvtop`
188 | * Create a list or dictionary of allowed images which will be presented as a dropdown list of options for users at logon e.g.:
189 |     - `c.DockerSpawner.allowed_images = {"Latest": "cuda-dl-lab:11.4.2-cudnn8", "Previous": "cuda-dl-lab:11.2.2-cudnn8"}`
190 |     - `c.DockerSpawner.allowed_images = ["cuda-dl-lab:11.4.2-cudnn8", "cuda-dl-lab:11.2.2-cudnn8"]`
191 | * Schedule a backup!
192 |
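When moving the Docker disk, `/etc/docker/daemon.json` may already contain other settings (for example a runtime entry added by the NVIDIA packages), so the `"data-root"` key should be merged in rather than overwriting the file. The following is a minimal sketch of that merge; the function name `set_data_root` and the example runtime entry are illustrative assumptions, not part of this repository.

```python
import json


def set_data_root(daemon_json_text, data_root):
    """Return daemon.json content with "data-root" set, keeping other keys."""
    # Treat an empty or missing file as an empty configuration
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    config["data-root"] = data_root
    return json.dumps(config, indent=4)


# Illustrative existing content; a real file may hold different settings
existing = '{"runtimes": {"nvidia": {"path": "nvidia-container-runtime", "runtimeArgs": []}}}'
print(set_data_root(existing, "/path/to/your/docker_data"))
```

The printed JSON could then be written back to `/etc/docker/daemon.json` (with `sudo`) while Docker is stopped, before running `sudo systemctl start docker`.
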
193 | ## Updating
194 |
195 | ### [NVIDIA drivers](https://www.nvidia.co.uk/Download/index.aspx?lang=en-uk)
196 | * Find the latest Linux 64-bit drivers for your graphics cards and note the version number (used as `$NVIDIA_DRIVER_VERSION` in the commands below): https://www.nvidia.co.uk/Download/index.aspx?lang=en-uk
197 |
198 | ```bash
199 | # sudo service lightdm stop # or gdm or kdm depending on your display manager
200 | curl -o nvidia-drivers-$NVIDIA_DRIVER_VERSION.run https://uk.download.nvidia.com/XFree86/Linux-x86_64/$NVIDIA_DRIVER_VERSION/NVIDIA-Linux-x86_64-$NVIDIA_DRIVER_VERSION.run
201 | chmod +x nvidia-drivers-$NVIDIA_DRIVER_VERSION.run
202 | sudo ./nvidia-drivers-$NVIDIA_DRIVER_VERSION.run --dkms --no-opengl-files
203 | nvidia-smi
204 | sudo reboot
205 | ```
206 | * Confirm the drivers work: `docker run --rm --gpus all nvidia/cuda:12.2.2-cudnn8-devel-ubuntu22.04 nvidia-smi`
207 |
208 | ### Docker, Docker Compose and `nvidia-container-toolkit`
209 | * `sudo apt update && sudo apt upgrade`
210 |
211 | ### [Docker CUDA images](https://hub.docker.com/r/nvidia/cuda/tags?page=1&name=devel-ubuntu)
212 | * Edit `docker-compose.yml` (or `build_images.sh` if you prefer to use the script) to update:
213 |     * `CUDA_VERSION`
214 |     * `CUDNN_VERSION`
215 |     * Eventually `ubuntu-22.04`
216 | * Alternatively, you can set the variables in `.env` or pass arguments e.g. `docker compose build --build-arg CUDA_VERSION=x.y.z`
217 |
218 | * Edit `cuda-dl-lab/Dockerfile` to update with new versions:
219 |     * [`'tensorflow-gpu==2.12.0'`](https://github.com/tensorflow/tensorflow/releases): [Docs](https://www.tensorflow.org/install/gpu); [Code](https://github.com/tensorflow/tensorflow)
220 |     * [`TF_MODELS_VERSION=v2.12.0`](https://github.com/tensorflow/models/releases): [Code](https://github.com/tensorflow/models)
221 |     * [`'torch==2.0.0'`](https://github.com/pytorch/pytorch/releases): [Docs](https://pytorch.org/get-started/locally/); [Code](https://github.com/pytorch/pytorch)
222 |     * [`magma-cuda118`](https://anaconda.org/search?q=magma): https://anaconda.org/search?q=magma
223 |
224 | * Run `make build`, or `make clean` to rebuild the `cuda-dl-lab` images from scratch
225 |
226 | ### [JupyterHub](https://github.com/jupyterhub/jupyterhub/tags)
227 | * Update `JUPYTERHUB_VERSION=4.0.2` in:
228 |     - `docker-compose.yml`
229 |     - `jupyterhub/Dockerfile` (optional)
230 |
231 | * Edit `jupyterhub/jupyterhub_config.py` for any additional volumes
232 |
233 | ### Restart the Hub
234 | * `make stop` (in case the hub is running)
235 | * `make hub`
236 |
--------------------------------------------------------------------------------
/build_images.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | usage() { echo "Usage: $0 [-c