├── README.md
└── config
    ├── config_basic.yaml
    ├── config_https_github.yaml
    ├── config_intel.yaml
    └── config_lb.yaml
/README.md:
--------------------------------------------------------------------------------
# JupyterHub on Kubernetes

### We will install and test JupyterHub on Kubernetes as a Service from VK Cloud Solutions
#### Tested with Kubernetes 1.23.13, Ubuntu 22.04, Helm chart for JupyterHub 1.1.3, JupyterHub 1.4.2

## Useful links for running JupyterHub on Kubernetes

### Official docs
- https://zero-to-jupyterhub.readthedocs.io/en/latest/index.html
- https://zero-to-jupyterhub.readthedocs.io/en/latest/jupyterhub/installation.html

## Prerequisites

### (Optional) Create a host VM
It is easier to create a host VM in the cloud and do all the cloud work from that VM. You can install kubectl, Helm, Docker and everything else on this VM without cluttering your own local machine.

- How to create a VM: https://mcs.mail.ru/help/ru_RU/create-vm/vm-quick-create
- How to connect: https://mcs.mail.ru/docs/ru/base/iaas/vm-start/vm-connect/vm-connect-nix

Steps:
1. Create a VM.
2. Connect to the VM with SSH.
3. Perform all the steps described further in this instruction from this VM.
4. Enjoy the cloud :)

### Create a K8s cluster in VK Cloud (mcs) and download the kubeconfig
- Instruction: https://mcs.mail.ru/help/ru_RU/k8s-start/create-k8s
- Kubernetes as a Service: https://mcs.mail.ru/app/services/containers/add/

You may run into problems with Gatekeeper, so please delete it:
https://mcs.mail.ru/docs/base/k8s/k8s-addons/k8s-gatekeeper/k8s-opa#udalenie

For Kubernetes 1.23 or higher you have to install keystone-auth.
For more information about these changes, see:
https://mcs.mail.ru/docs/base/k8s/concepts/access-management#1509-7

To install it, follow this instruction: https://mcs.mail.ru/docs/base/k8s/connect/kubectl#9980-5

After installing keystone-auth, don't forget to run:
```console
source /home/ubuntu/.bashrc
```

### Install kubectl
- https://mcs.mail.ru/docs/ru/base/k8s/connect/kubectl
- https://kubernetes.io/ru/docs/tasks/tools/install-kubectl/

```console
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl
```
kubectl is supported within one minor version (older or newer) of the cluster's API server, so for the tested 1.23 cluster you may prefer to replace the `stable.txt` lookup with a fixed version such as `v1.23.13`.

### Set the path to the kubeconfig for kubectl
```console
export KUBECONFIG=/replace_with_path/to_your_kubeconfig.yaml
```
Replace the credentials in your_kubeconfig.yaml:
```yaml
- name: "OS_PASSWORD"
  value: "vkcloud_account_password"
```

### Also, kubectl is easier to work with if you enable autocompletion and use an alias
```console
alias k=kubectl
source <(kubectl completion bash)
complete -F __start_kubectl k
```

### Install Helm
https://helm.sh/docs/intro/install/
```console
curl https://raw.githubusercontent.com/helm/helm/HEAD/scripts/get-helm-3 | bash
```

### (Optional) Install Docker if you want to build your own images for JupyterHub and log in to a Docker registry
- https://docs.docker.com/engine/install/ubuntu/
- https://docs.docker.com/engine/reference/commandline/login/
- https://ropenscilabs.github.io/r-docker-tutorial/04-Dockerhub.html

### (Mandatory for Docker) Repeat the steps below
Update the apt package index and install packages to allow apt to use a repository over HTTPS:
```console
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
```
Add Docker's official GPG key:
```console
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
```
Set up the repository and install Docker Engine:
```console
echo \
  "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

#install Docker Engine from the new repository
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
```

## JupyterHub installation and testing part

### Install JupyterHub
```console
helm repo add jupyterhub https://hub.jupyter.org/helm-chart/ --insecure-skip-tls-verify
helm repo update
```

We also need to mark one of the storage classes as default for the installation to succeed.
ATTENTION: WATCH your k8s cluster availability zone. The storage class must match the cluster's availability zone.
If your cluster is in GZ1 or any other zone, patch the storage class for that zone instead.
```console
kubectl get storageclass
kubectl patch storageclass csi-ceph-ssd-ms1-retain -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```

#### Create config for basic installation
Warning: this config is for demo use only! NOT A PRODUCTION SOLUTION
```console
nano config_basic.yaml
#paste this into config_basic.yaml

#Change storageClass for your availability zone. For example: storageClass: csi-ceph-ssd-gz1-retain
scheduling:
  userScheduler:
    enabled: false
singleuser:
  defaultUrl: "/lab"
  storage:
    dynamic:
      storageClass: csi-ceph-ssd-ms1-retain
  cpu:
    limit: .5
    guarantee: .5
  memory:
    #values without a suffix are bytes; the guarantee must not exceed the limit
    limit: 512M
    guarantee: 256M
hub:
  config:
    Authenticator:
      admin_users:
        - admin
      allowed_users:
        - your_another_non_admin_user
    #DummyAuthenticator not for production
    DummyAuthenticator:
      password: insertyourpasswordhereMVeP2VXfr
    JupyterHub:
      authenticator_class: dummy
```

#### Apply the basic config and install JupyterHub
```console
helm upgrade --cleanup-on-fail \
  --install defaultinstall jupyterhub/jupyterhub --insecure-skip-tls-verify \
  --namespace jupyterhub \
  --create-namespace \
  --version=1.1.3 \
  --values config_basic.yaml \
  --timeout 20m0s
```

To access JupyterHub, find its external IP:
```console
kubectl get services -n jupyterhub
```
Look for the Service of type LoadBalancer (`proxy-public`) and its EXTERNAL-IP.
You can access JupyterHub by entering this external IP in a browser.
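Right after the install, the EXTERNAL-IP column may still show `<pending>` while the cloud load balancer is being created. Below is a minimal sketch of a helper that polls for the IP. It assumes the chart's default `proxy-public` service name and a load balancer that reports an IP rather than a hostname. `jupyterhub_url` is a hypothetical helper name and is not part of kubectl or the Helm chart.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kubectl or the Helm chart):
# poll the proxy-public LoadBalancer until it has an external IP,
# then print the JupyterHub URL.
# Assumes the chart's default service name "proxy-public" and a load
# balancer that reports an IP address (not a hostname).
jupyterhub_url() {
  local namespace="${1:-jupyterhub}"  # namespace used in helm upgrade --namespace
  local attempts="${2:-30}"           # how many times to poll (10 s apart)
  local ip i
  for ((i = 0; i < attempts; i++)); do
    ip=$(kubectl get service proxy-public -n "$namespace" \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)
    if [ -n "$ip" ]; then
      echo "http://$ip"
      return 0
    fi
    sleep 10
  done
  echo "proxy-public in namespace $namespace has no external IP yet" >&2
  return 1
}
```

For example, `jupyterhub_url jupyterhub` prints a URL you can open in a browser. Plain `http://` is used because the basic config does not enable HTTPS.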
For debugging and troubleshooting:
```console
kubectl get pods -n jupyterhub
kubectl get events -n jupyterhub
kubectl describe pod -n jupyterhub
#kubectl logs needs a pod name from the "kubectl get pods" output
kubectl logs -n jupyterhub YOUR_POD_NAME
```

#### Create config for advanced installation
Let's add some security measures: allow access to the JupyterHub load balancer only from specific IP addresses.

```console
nano config_lb.yaml
#paste this into config_lb.yaml

singleuser:
  defaultUrl: "/lab"
  storage:
    dynamic:
      #you could use different storage classes
      #get storage classes with: kubectl get storageclasses.storage.k8s.io
      storageClass: csi-ceph-ssd-ms1-retain
  cpu:
    limit: .5
    guarantee: .5
  memory:
    #values without a suffix are bytes; the guarantee must not exceed the limit
    limit: 512M
    guarantee: 256M
hub:
  config:
    Authenticator:
      admin_users:
        - admin
      allowed_users:
        - your_another_non_admin_user
    #DummyAuthenticator not for production
    DummyAuthenticator:
      password: insertyourpasswordhereMVeP2VXfr
    JupyterHub:
      authenticator_class: dummy
proxy:
  service:
    #if you set loadBalancerSourceRanges, JupyterHub is reachable only from the IP ranges listed in this setting
    #you can list several IP addresses (in CIDR notation)
    #https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/security.html#restricting-load-balancer-access
    loadBalancerSourceRanges:
      - PLACE_YOUR_IP_HERE
      - PLACE_ANOTHER_YOUR_IP_HERE_OR_REMOVE_THIS_LINE
      #EXAMPLE
      # - 91.74.148.161/32
```

#### Apply the new config and upgrade JupyterHub
```console
helm upgrade --cleanup-on-fail \
  --install defaultinstall jupyterhub/jupyterhub --insecure-skip-tls-verify \
  --namespace jupyterhub \
  --version=1.1.3 \
  --values config_lb.yaml \
  --timeout 20m0s
```
You can check the versions of the Helm chart and JupyterHub here:
https://jupyterhub.github.io/helm-chart/

You can also enable HTTPS and integrate JupyterHub with GitHub for authentication.
Read more here:
- https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/security.html#https
- https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/authentication.html#github
- https://docs.github.com/en/organizations/collaborating-with-groups-in-organizations/creating-a-new-organization-from-scratch

You can find working examples of config.yaml in the config directory of this repo.
For HTTPS with GitHub authentication, see config_https_github.yaml.

## INTEL oneAPI demo

You can read more about oneAPI here:
- https://software.intel.com/content/www/us/en/develop/tools/oneapi/ai-analytics-toolkit.html
- https://medium.com/intel-analytics-software/save-time-and-money-with-intel-extension-for-scikit-learn-33627425ae4

You need to install Docker if you want to build your own image. Otherwise, you can use our image: mcscloud/jupyter-ds-intel-mcs:v2

#### Log in to Docker Hub
```console
sudo docker login --username=YOUR_DOCKERHUB_USER_NAME
#sudo docker login --username=mcscloud
```
Additional instructions about Docker Hub:
https://jsta.github.io/r-docker-tutorial/04-Dockerhub.html

#### Create Dockerfile
```console
#make a separate dir
mkdir ~/intel_based_docker_image && cd ~/intel_based_docker_image

#then create the Dockerfile
nano Dockerfile

#paste this into the Dockerfile
FROM jupyter/datascience-notebook:hub-1.4.2
RUN pip install --no-cache-dir nbgitpuller
#More information about nbgitpuller:
#https://github.com/jupyterhub/nbgitpuller
#Install the Git extension
RUN pip install --no-cache-dir jupyterlab-git
#Install the Intel part (--yes avoids the interactive confirmation prompt during the build)
RUN conda install --yes -c conda-forge scikit-learn-intelex
```

#### Build and push the image
Let's build a custom image with the Intel ML package:
```console
export YOUR_DOCKER_REPO=
#example: export YOUR_DOCKER_REPO=mcscloud

sudo docker build -t jupyter-ds-intel-mcs .

sudo docker images
#find the image id and copy it (the local name jupyter-ds-intel-mcs also works in place of the id)

sudo docker tag YOUR_IMAGE_ID "$YOUR_DOCKER_REPO/jupyter-ds-intel-mcs:v2"
sudo docker push "$YOUR_DOCKER_REPO/jupyter-ds-intel-mcs:v2"
```

#### Create config for JupyterHub with the Intel image
To test Intel libraries for ML you need more resources.
As you can see, we raised the cpu and memory requirements in the singleuser part of the config.

```console
nano config_intel.yaml
#paste this into config_intel.yaml

singleuser:
  defaultUrl: "/lab"
  storage:
    dynamic:
      #you could use different storage classes
      #get storage classes with: kubectl get storageclasses.storage.k8s.io
      storageClass: csi-ceph-ssd-ms1-retain
  cpu:
    limit: 3
    guarantee: 2
  memory:
    limit: 3G
    guarantee: 512M
  # Defines the default image
  image:
    name: jupyter/minimal-notebook
    tag: hub-1.4.2
  profileList:
    - display_name: "Minimal environment"
      description: "To avoid too much bells and whistles: Python."
      default: true
    - display_name: "Tensorflow"
      description: "Python with TensorFlow."
      kubespawner_override:
        image: jupyter/tensorflow-notebook:hub-1.4.2
    - display_name: "Spark environment"
      description: "The Jupyter Stacks spark image!"
      kubespawner_override:
        image: jupyter/all-spark-notebook:hub-1.4.2
    - display_name: "JupyterLab with Intel libraries"
      description: "Use some Intel optimizations"
      kubespawner_override:
        image: PLACE_YOUR_DOCKER_REPO_OR_USE_mcscloud/jupyter-ds-intel-mcs:v2
        # image: mcscloud/jupyter-ds-intel-mcs:v2
hub:
  config:
    Authenticator:
      admin_users:
        - admin
      allowed_users:
        - your_another_non_admin_user
    #DummyAuthenticator not for production
    DummyAuthenticator:
      password: insertyourpasswordhereMVeP2VXfr
    JupyterHub:
      authenticator_class: dummy
proxy:
  service:
    #if you set loadBalancerSourceRanges, JupyterHub is reachable only from the IP ranges listed in this setting
    #you can list several IP addresses (in CIDR notation)
    #https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/security.html#restricting-load-balancer-access
    loadBalancerSourceRanges:
      - PLACE_YOUR_IP_HERE
      - PLACE_ANOTHER_YOUR_IP_HERE_OR_REMOVE_THIS_LINE
      #EXAMPLE
      # - 91.74.148.161/32
```

#### Apply the new config and upgrade JupyterHub
```console
helm upgrade --cleanup-on-fail \
  --install defaultinstall jupyterhub/jupyterhub --insecure-skip-tls-verify \
  --namespace jupyterhub \
  --version=1.1.3 \
  --values config_intel.yaml \
  --timeout 20m0s
```

To test the Intel library, clone the repo https://github.com/intel/scikit-learn-intelex with the Git extension that is already installed in Jupyter.
You can find the Git extension on the left vertical bar.
After cloning the repo, you can find test notebooks in scikit-learn-intelex/examples/notebooks/

To use JupyterHub, enter the external IP of the proxy-public service into a browser:
```console
kubectl get service -n jupyterhub
```

#### Help
If you have any questions, you can ask me here:
- Telegram @volinski
- Email a.n.volinski@ya.ru

--------------------------------------------------------------------------------
/config/config_basic.yaml:
--------------------------------------------------------------------------------
singleuser:
  defaultUrl: "/lab"
  storage:
    dynamic:
      storageClass: csi-ceph-ssd-ms1-retain
hub:
  config:
    Authenticator:
      admin_users:
        - admin
      allowed_users:
        - your_another_non_admin_user
    #DummyAuthenticator not for production
    DummyAuthenticator:
      password: insertyourpasswordhereMVeP2VXfr
    JupyterHub:
      authenticator_class: dummy

--------------------------------------------------------------------------------
/config/config_https_github.yaml:
--------------------------------------------------------------------------------
#https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/security.html#https
#https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/authentication.html#github
#https://docs.github.com/en/organizations/collaborating-with-groups-in-organizations/creating-a-new-organization-from-scratch
singleuser:
  defaultUrl: "/lab"
  storage:
    dynamic:
      storageClass: csi-ceph-ssd-ms1-retain
hub:
  config:
    Authenticator:
      admin_users:
        - YOUR_GITHUB_LOGIN
    GitHubOAuthenticator:
      client_id: YOUR_CLIENT_ID_GITHUB
      client_secret: YOUR_SECRET_FROM_GITHUB
      oauth_callback_url: https://your-domain-name.com/hub/oauth_callback
      allowed_organizations:
        - YOUR_ORG_NAME_FROM_GITHUB
      scope:
        - read:org
    JupyterHub:
      authenticator_class: github
proxy:
  https:
    enabled: true
    hosts:
      - your-domain-name.com
    letsencrypt:
      contactEmail: YOUR_EMAIL
  service:
    loadBalancerIP: PLACE_EXTERNAL_IP_OF_LOADBALANCER
    loadBalancerSourceRanges:
      - PLACE_YOUR_IP

--------------------------------------------------------------------------------
/config/config_intel.yaml:
--------------------------------------------------------------------------------
singleuser:
  defaultUrl: "/lab"
  storage:
    dynamic:
      #you could use different storage classes
      #get storage classes with: kubectl get storageclasses.storage.k8s.io
      storageClass: csi-ceph-ssd-ms1-retain
  cpu:
    limit: 3
    guarantee: 2
  memory:
    limit: 3G
    guarantee: 512M
  # Defines the default image
  image:
    name: jupyter/minimal-notebook
    tag: hub-1.4.2
  profileList:
    - display_name: "Minimal environment"
      description: "To avoid too much bells and whistles: Python."
      default: true
    - display_name: "Tensorflow"
      description: "Python with TensorFlow."
      kubespawner_override:
        image: jupyter/tensorflow-notebook:hub-1.4.2
    - display_name: "Spark environment"
      description: "The Jupyter Stacks spark image!"
      kubespawner_override:
        image: jupyter/all-spark-notebook:hub-1.4.2
    - display_name: "JupyterLab with Intel libraries"
      description: "Use some Intel optimizations"
      kubespawner_override:
        image: PLACE_YOUR_DOCKER_REPO_OR_USE_mcscloud/jupyter-ds-intel-mcs:v2
        # image: mcscloud/jupyter-ds-intel-mcs:v2
hub:
  config:
    Authenticator:
      admin_users:
        - admin
      allowed_users:
        - your_another_non_admin_user
    #DummyAuthenticator not for production
    DummyAuthenticator:
      password: insertyourpasswordhereMVeP2VXfr
    JupyterHub:
      authenticator_class: dummy
proxy:
  service:
    #if you set loadBalancerSourceRanges, JupyterHub is reachable only from the IP ranges listed in this setting
    #you can list several IP addresses (in CIDR notation)
    #https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/security.html#restricting-load-balancer-access
    loadBalancerSourceRanges:
      - PLACE_YOUR_IP_HERE
      - PLACE_ANOTHER_YOUR_IP_HERE_OR_REMOVE_THIS_LINE
      #EXAMPLE
      # - 91.74.148.161/32

--------------------------------------------------------------------------------
/config/config_lb.yaml:
--------------------------------------------------------------------------------
singleuser:
  defaultUrl: "/lab"
  storage:
    dynamic:
      #you could use different storage classes
      #get storage classes with: kubectl get storageclasses.storage.k8s.io
      storageClass: csi-ceph-ssd-ms1-retain
hub:
  config:
    Authenticator:
      admin_users:
        - admin
      allowed_users:
        - your_another_non_admin_user
    #DummyAuthenticator not for production
    DummyAuthenticator:
      password: insertyourpasswordhereMVeP2VXfr
    JupyterHub:
      authenticator_class: dummy
proxy:
  service:
    #if you set loadBalancerSourceRanges, JupyterHub is reachable only from the IP ranges listed in this setting
    #you can list several IP addresses (in CIDR notation)
    #https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/security.html#restricting-load-balancer-access
    loadBalancerSourceRanges:
      - PLACE_YOUR_IP_HERE
      - PLACE_ANOTHER_YOUR_IP_HERE_OR_REMOVE_THIS_LINE
      #EXAMPLE
      # - 91.74.148.161/32

--------------------------------------------------------------------------------