├── CNAME
├── .gitignore
├── _config.yml
├── images
│   ├── jupyter_browser.jpg
│   └── jupyter_notebook.jpg
├── athena.md
├── jupyter.md
├── advdockerguide.md
├── devbox.md
├── README.md
└── dockerguide.md
/CNAME:
--------------------------------------------------------------------------------
1 | compute.sutd.dev
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 |
--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | theme: jekyll-theme-cayman
--------------------------------------------------------------------------------
/images/jupyter_browser.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/sutd-compute.github.io/master/images/jupyter_browser.jpg
--------------------------------------------------------------------------------
/images/jupyter_notebook.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenSUTD/sutd-compute.github.io/master/images/jupyter_notebook.jpg
--------------------------------------------------------------------------------
/athena.md:
--------------------------------------------------------------------------------
1 | [Home](README.md) | [DevBox](devbox.md) | [**Athena**](athena.md) | [Supercomputing@SUTD](https://computing.sutd.edu.sg/)
2 |
3 | # DGX-1 (Athena)
4 |
5 | [Link to old site](https://github.com/sutd-athena/sutd-athena.github.io)
6 |
7 | ```
8 | TODO
9 | ```
10 |
--------------------------------------------------------------------------------
/jupyter.md:
--------------------------------------------------------------------------------
1 | [Home](README.md) | [DevBox](devbox.md) | [Athena](athena.md) | [Supercomputing@SUTD](https://computing.sutd.edu.sg/)
2 |
3 | # Jupyter Notebook Guide
4 |
5 | ## Starting a Notebook
6 |
7 | 1. Log into one of the two JupyterHub nodes and spawn a server.
8 | * [Artemis](http://10.16.74.79:30001/hub/login) or [Apollo](http://10.16.74.79:30002/hub/login)
9 | * **You may refer to the [user guide](https://bit.ly/sutddevbox) if you face issues.**
10 | 2. You will be presented with the file browser UI, from which you can upload, download, create and open files.
11 |
12 | ## File Browser
13 |
14 | This is the UI which you use to browse your Notebook's filesystem and interact with the files.
15 |
16 | ![Jupyter file browser](images/jupyter_browser.jpg)
17 |
18 | ## Notebook UI
19 |
20 | You can create a new Jupyter Notebook with a **Python 3 kernel** by selecting 'New' > 'Python 3'. You will then be presented with the Notebook UI.
21 |
22 | You can run `!nvidia-smi` or `!gpustat` in a notebook cell to check whether GPUs are accessible from the notebook.
23 |
24 | ![Jupyter Notebook UI](images/jupyter_notebook.jpg)
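For a more compact GPU check than the full `nvidia-smi` table, you can ask `nvidia-smi` for specific fields. The sketch below is a hypothetical helper (`gpu_summary` is not part of the DevBox setup) meant for a terminal opened from the file browser via 'New' > 'Terminal', if your server offers one. Inside a notebook cell you can run the same `nvidia-smi` line with a `!` prefix instead.

```shell
# Hypothetical helper: print one CSV line per GPU visible to this server,
# with its index, model name, and memory usage.
gpu_summary() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: this server may not have a GPU attached" >&2
    return 1
  fi
  nvidia-smi --query-gpu=index,name,memory.used,memory.total --format=csv
}

# Usage: gpu_summary
```

Because a GPU is never shared between notebook servers, you should normally see only the GPU(s) allocated to your own server.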
25 |
26 | ## TensorBoard
27 |
28 | ```
29 | TODO
30 | ```
31 |
32 | ## JupyterLab
33 |
34 | You can use JupyterLab instead of Jupyter Notebook. Simply replace `tree?` in the URL with `lab`.
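The URL change above is a plain text substitution. Here is a minimal sketch, assuming a typical JupyterHub URL of the form `.../user/<your_username>/tree?`. The username and the helper name `to_lab_url` are hypothetical. This version also drops anything that follows `tree?`, such as a query string.

```shell
# Hypothetical example URL: your username and node port will differ.
notebook_url="http://10.16.74.79:30001/user/your_username/tree?"

# Replace "/tree?" (and any query string after it) with "/lab".
# In sed basic regular expressions "?" is an ordinary character, so it needs no escaping.
to_lab_url() {
  printf '%s\n' "$1" | sed 's|/tree?.*$|/lab|'
}

to_lab_url "$notebook_url"
# prints http://10.16.74.79:30001/user/your_username/lab
```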
35 |
36 |
--------------------------------------------------------------------------------
/advdockerguide.md:
--------------------------------------------------------------------------------
1 | [Home](README.md) | [DevBox](devbox.md) | [Athena](athena.md) | [Supercomputing@SUTD](https://computing.sutd.edu.sg/)
2 |
3 | ## Advanced Guide to using `nvidia-docker`
4 |
5 | #### How to scp files from DGX to local
6 | ```
7 | $ scp username@server:/home/username/directory/you/want /local/directory/you/want/filename
8 | ```
9 |
10 | #### How to scp files from local to DGX
11 | ```
12 | $ scp /local/directory/you/want/filename username@server:/home/username/directory/you/want
13 | ```
14 | Note: for folders, use the `-r` flag.
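As an illustration of the `-r` flag, the sketch below wraps both directions of a folder copy in two small functions. `REMOTE_USER`, `REMOTE_HOST`, `pull_dir` and `push_dir` are hypothetical names. Replace the values with your own login and the DGX address.

```shell
# Hypothetical placeholders: replace with your own login and the DGX address.
REMOTE_USER="username"
REMOTE_HOST="server"

# Copy a whole directory (recursively, hence -r) from your DGX home directory
# into the current local directory, e.g. pull_dir experiments/run1
pull_dir() {
  scp -r "${REMOTE_USER}@${REMOTE_HOST}:/home/${REMOTE_USER}/$1" ./
}

# Copy a whole local directory into your DGX home directory, e.g. push_dir ./dataset
push_dir() {
  scp -r "$1" "${REMOTE_USER}@${REMOTE_HOST}:/home/${REMOTE_USER}/"
}
```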
15 |
16 | #### To save your Docker container as an image
17 | ```
18 | username@server:~$ nvidia-docker commit *DOCKERNAMETOCOMMIT* *IMAGENAMETOSAVEAS*
19 | ```
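A committed image is easier to tell apart later if its name records when it was taken. The sketch below assumes a hypothetical naming convention (`<container>_snapshot:<YYYYMMDD>`) that the server does not require. It commits the container and then lists the matching images to confirm the result.

```shell
# Build a dated image name such as "my_pyconda3_snapshot:20240101".
# The "_snapshot" suffix is a hypothetical convention, not a server requirement.
snapshot_name() {
  printf '%s_snapshot:%s\n' "$1" "$(date +%Y%m%d)"
}

# Save the current state of a container as a new image, then confirm it exists.
snapshot_container() {
  image="$(snapshot_name "$1")"
  nvidia-docker commit "$1" "$image" && nvidia-docker images | grep "${1}_snapshot"
}

# Usage: snapshot_container my_pyconda3
```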
20 |
21 |
22 | #### To transfer from host to docker container
23 | ```
24 | username@server:~$ nvidia-docker cp *./nameoffile* *dockername/number*:/*dockerdirectory*
25 | username@server:~$ nvidia-docker cp ./README.md my_pyconda3:/
26 | username@server:~$ nvidia-docker cp ./README.md e74d0:/
27 | ```
28 | The first few characters of the container ID (5 in the example above) are enough to identify the container, as long as no other container ID starts with the same characters.
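If you remember a container's name but need its ID, you can filter `ps` by name. This is a hypothetical helper (`container_id`). Note that Docker's `name` filter also matches partial names, so use a name specific enough to match only one container.

```shell
# Print the short ID of a container, looked up by name.
# -a includes stopped containers, -q prints only IDs, -f filters by name.
container_id() {
  nvidia-docker ps -a -q -f "name=$1"
}

# Usage: nvidia-docker cp ./README.md "$(container_id my_pyconda3)":/
```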
29 |
30 |
31 | #### To transfer from docker container to host
32 | ```
33 | username@server:~$ nvidia-docker cp *dockername/number*:/*dockerdirectorywithfile* *./*
34 | username@server:~$ nvidia-docker cp my_pyconda3:/README.md ./
35 | username@server:~$ nvidia-docker cp e74d0:/README.md ./
36 | ```
37 |
38 |
39 | #### Installation of pycuda
40 | Install `Anaconda3`, install `pip` and upgrade it, then `pip install pycuda`.
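The steps above, plus a quick sanity check, can be sketched as follows. Run them inside your container with Anaconda3 activated. This assumes the CUDA toolkit is available inside the container, since building `pycuda` needs it. The function names are hypothetical. The check uses `pycuda.autoinit` to create a context on the first GPU and prints that GPU's name.

```shell
# Upgrade pip, then install pycuda (run inside your container, Anaconda3 activated).
install_pycuda() {
  pip install --upgrade pip && pip install pycuda
}

# Sanity check: pycuda.autoinit initialises the first GPU, and
# pycuda.driver.Device(0).name() returns its model name.
check_pycuda() {
  python -c 'import pycuda.autoinit; import pycuda.driver as cuda; print(cuda.Device(0).name())'
}

# Usage: install_pycuda && check_pycuda
```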
41 |
--------------------------------------------------------------------------------
/devbox.md:
--------------------------------------------------------------------------------
1 | [Home](README.md) | [**DevBox**](devbox.md) | [Athena](athena.md) | [Supercomputing@SUTD](https://computing.sutd.edu.sg/)
2 |
3 | # SUTD DevBox (Apollo/Artemis)
4 |
5 | ## User Guide
6 |
7 | For now, please refer to the briefing slides:
8 |
9 | [Link to briefing slides](https://docs.google.com/presentation/d/15b_r9AnETZ2Odiwv6FiapjR7DOePb3nIDLEFtr1kaRs/edit?usp=sharing)
10 |
11 | [Jupyter Notebook User Guide](jupyter.md)
12 |
13 | Special notes regarding resource allocation:
14 |
15 | * Server QoS is `Burstable`, meaning your notebook server may use more RAM than its allocation when the node has spare memory. However, if the node runs short of memory, your notebook server may be killed.
16 | * GPUs are never shared between notebook servers.
17 |
18 | ## Quick Links (School Network Only)
19 |
20 | * Artemis: [JupyterHub](http://10.16.74.79:30001/hub/login)
21 | * Apollo: [JupyterHub](http://10.16.74.79:30002/hub/login)
22 |
23 | If you are unable to spin up a server on one of the nodes, please use the other. You can check the usage dashboard to see whether one node has fewer users.
24 |
25 | * [Usage Dashboard](http://10.16.74.79:30009/d/BbkYN82mz/devbox-dashboard)
26 |
27 | ## Additional Information
28 |
29 | **Network speed**
30 |
31 | Network speed may be slower than you might expect. This is expected behavior on the school network.
32 |
33 | **Storage**
34 |
35 | We do not guarantee the long-term integrity of your files stored on Apollo and Artemis. As this project is still in its pilot/POC phase, we plan to make major changes to the storage solution in the future. Where your files may be affected, we will aim to inform you in advance.
36 |
37 | **Software stack**
38 |
39 | * [Kubernetes](https://kubernetes.io/)
40 | * [Kubeflow](https://www.kubeflow.org/)
41 | * [JupyterHub](https://github.com/jupyterhub/jupyterhub) - Multi-user Jupyter Notebook server
42 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | [**Home**](README.md) | [DevBox](devbox.md) | [Athena](athena.md) | [Supercomputing@SUTD](https://computing.sutd.edu.sg/)
2 |
3 | ## User Guides
4 |
5 | **GPU Compute**
6 |
7 | * JupyterHub nodes: [Apollo](http://10.16.74.79:30002/hub/login) and [Artemis](http://10.16.74.79:30001/hub/login)
8 | * JupyterHub [Usage Dashboard](http://10.16.74.79:30009/d/BbkYN82mz/devbox-dashboard)
9 | * [DevBox User Guide](devbox.md)
10 | * [Jupyter Notebook User Guide](jupyter.md)
11 |
12 | * [Athena (DGX-1)](athena.md)
13 |
14 | **Hermes** - HPC Cloud Service
15 |
16 | * [Hermes User Guide](https://computing.sutd.edu.sg/resources/hermes/hermes-user-guide/)
17 | * [CloudStack Login](https://hermes.sutd.edu.sg/client/)
18 |
19 | **Using Docker**
20 |
21 | 1. [Basic Docker Guide](dockerguide.md)
22 | 2. [Advanced Docker Guide](advdockerguide.md)
23 |
24 | ## Frequently Asked Questions
25 |
26 | ### Network Access
27 |
28 | You need to be connected to the SUTD network to access any compute resources. If you are outside school, you may use the official school VPN.
29 |
30 | To use the VPN, download the VPN client from [SUTD Downloads](https://downloads.sutd.edu.sg/cgi-bin/), install it, then sign in to Pulse Secure to connect.
31 |
32 | ### Apollo/Artemis
33 |
34 | 1. What is the difference between Athena and Apollo/Artemis?
35 |
36 | On Apollo/Artemis you get a user-friendly interface for spawning Jupyter Notebook servers to run iterative experiments on consumer-grade GPUs. This is recommended for beginners.
37 |
38 | On Athena (DGX-1) you can run heavier workloads, but you need to be comfortable with the command line and Docker.
39 |
40 | 2. Who is Jovyan??
41 |
42 | [Official answer](https://en.wikipedia.org/wiki/Jovian_(fiction))
43 |
44 | 3. Why can't I have root inside the container?
45 |
46 | The general recommendation is not to create an expectation that users get root access, as this can lead to security issues in the future and eventually a crackdown that would cause more displeasure. Rather than giving users root, we will work to include the required system packages inside the provided container. Feature requests can be filed at [tlkh/deeplearning-lab](https://github.com/tlkh/deeplearning-lab).
47 |
48 | [Debate](https://github.com/kubeflow/kubeflow/issues/300)
49 |
50 | ### Athena
51 |
52 | 1. Can I `sudo apt-get`?
53 |
54 | You are allowed to `sudo apt-get` if you are within your **own** container. Please
55 | do not use the `sudo` command outside of your **own** container.
56 |
57 | 2. I can't find my container? Did someone remove it?
58 |
59 | When launching a container from an image for the first
60 | time, please give it a name, e.g. `user_pycuda`, so you can find
61 | your containers easily with `nvidia-docker ps -a | grep user`.
62 |
--------------------------------------------------------------------------------
/dockerguide.md:
--------------------------------------------------------------------------------
1 | [Home](README.md) | [DevBox](devbox.md) | [Athena](athena.md) | [Supercomputing@SUTD](https://computing.sutd.edu.sg/)
2 |
3 | ## Guide to using `nvidia-docker`
4 |
5 | Note that if you choose to use `docker` commands instead of `nvidia-docker`,
6 | your work might not be running on the GPU.
7 |
8 | #### To check your CUDA version,
9 | ```
10 | username@server:~$ nvcc --version
11 | nvcc: NVIDIA (R) Cuda compiler driver
12 | Copyright (c) 2005-2016 NVIDIA Corporation
13 | Built on Sun_Sep__4_22:14:01_CDT_2016
14 | Cuda compilation tools, release 8.0, V8.0.44
15 | ```
16 |
17 | #### To check what images you have,
18 | ```
19 | username@server:~$ nvidia-docker images
20 | ```
21 |
22 | #### To run a docker image,
23 | ```
24 | username@server:~$ nvidia-docker run -it --name *NAMEFORYOURDOCKER* *IMAGEID*
25 | ```
26 |
27 | Note: when naming your container, you might like to prefix the name with your initials, e.g. `my_pyconda3`, so you can find your containers more easily,
28 |
29 | ```
30 | username@server:~$ nvidia-docker ps -a | grep my
32 | 616ae2c5df7b pycuda "/bin/bash" 5 minutes ago Exited (0) 7 seconds ago my_pycuda
33 | ```
34 |
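35 | To install some basic tools, like `pip` and `curl`, run the following inside your **own** container (you are `root` there, so `sudo` is not needed),
36 | ```
37 | root@d8bf528b7a96:/# apt-get update && apt-get install -y python-pip python-dev build-essential curl
38 | root@d8bf528b7a96:/# pip install --upgrade pip
39 | ```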
40 |
41 | ----
42 |
43 | ### Anaconda3
44 |
45 | To use Anaconda3, there is an image named `pyconda3`. You can run a container with this image,
46 | ```
47 | username@server:~$ nvidia-docker run -it --name my_pyconda3 pyconda3
48 | ```
49 | In the command above, `my_pyconda3` is the name assigned to your container (please use a different name when you try this) and `pyconda3` is the name of the image the container is created from. You will then see something like this,
50 | ```
51 | root@d8bf528b7a96:/#
52 | ```
53 | which means you are in your new container. You can check which Python distribution you are currently using,
54 | ```
55 | root@d8bf528b7a96:/# which python
56 | /usr/bin/python
57 | ```
58 | This is not the `Anaconda3` distribution. `Anaconda3` is located at `/root/anaconda3`; to activate it, run
59 | ```
60 | root@d8bf528b7a96:/# cd /root/anaconda3
61 | root@d8bf528b7a96:~/anaconda3# source bin/activate ~/anaconda3/
62 | (/root/anaconda3/) root@d8bf528b7a96:~/anaconda3#
63 | ```
64 | You are using `Anaconda3` now, which can be easily checked with
65 | ```
66 | (/root/anaconda3/) root@d8bf528b7a96:~/anaconda3# which python
67 | /root/anaconda3/bin/python
68 | ```
69 | To create specialized conda environments, refer to the [conda documentation](https://conda.io/docs/using/envs.html).
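As a starting point before reading the conda documentation, here is a minimal sketch of creating and activating a named environment with the Anaconda3 install at `/root/anaconda3`. The environment name `my_env`, the Python version, and the helper name `make_env` are hypothetical. It uses `source activate`, the same older activation style as this guide.

```shell
# Hypothetical environment name and Python version: adjust to your project.
ENV_NAME="my_env"

# Create an isolated environment with Anaconda3's conda (--yes skips the prompt),
# then activate it in the current shell.
make_env() {
  /root/anaconda3/bin/conda create --yes --name "$ENV_NAME" python=3.6 &&
    source /root/anaconda3/bin/activate "$ENV_NAME"
}

# Usage (inside your container): make_env
```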
70 |
71 | To exit the `Anaconda3` environment, simply
72 | ```
73 | (/root/anaconda3/) root@d8bf528b7a96:~/anaconda3# source deactivate
74 | root@d8bf528b7a96:~/anaconda3#
75 | ```
76 |
77 | To leave your container without stopping it, so that the processes in your container
78 | keep running, press the escape sequence
79 | Ctrl + p, Ctrl + q
80 | which detaches your terminal and leaves the container running in the background.
81 |
82 | To exit your container (this also stops it),
83 | ```
84 | root@d8bf528b7a96:~/anaconda3# exit
85 | username@server:~$
86 | ```
87 |
88 | If you want to re-enter the container,
89 | ```
90 | username@server:~$ nvidia-docker restart my_pyconda3
91 | username@server:~$ nvidia-docker attach my_pyconda3
92 | ```
93 | `restart` starts the container again (stopping it first if it is still running, which ends any processes inside it), and `attach` connects your terminal to the container.
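If you detached with Ctrl + p, Ctrl + q, the container is still running. In that case `attach` alone is enough, and `restart` would stop your running processes. This sketch, a hypothetical helper named `reenter`, checks the container's state first. It attaches if the container is running. Otherwise it starts the container and attaches interactively in one step with `start -ai`.

```shell
# Re-enter a container without restarting it if it is still running;
# otherwise start it and attach interactively (-a attach, -i interactive).
reenter() {
  running="$(nvidia-docker inspect -f '{{.State.Running}}' "$1")" || return 1
  if [ "$running" = "true" ]; then
    nvidia-docker attach "$1"
  else
    nvidia-docker start -ai "$1"
  fi
}

# Usage: reenter my_pyconda3
```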
94 |
95 | If you are done with your container, meaning you do not need it anymore,
96 | ```
97 | username@server:~$ nvidia-docker stop my_pyconda3
98 | username@server:~$ nvidia-docker rm my_pyconda3
99 | ```
100 | where `stop` stops the container and `rm` permanently deletes it. **Only do this for your own containers!**
101 |
102 | #### Listing containers
103 |
104 | To see the list of containers that are still running,
105 | ```
106 | username@server:~$ nvidia-docker ps
107 | ```
108 |
109 | To see the list of all containers on the server, including stopped ones,
110 | ```
111 | username@server:~$ nvidia-docker ps --all
112 | ```
113 |
114 | To search all containers on the server for a particular keyword,
115 | ```
116 | username@server:~$ nvidia-docker ps -a | grep *keyword*
117 | ```
118 |
119 | #### List of images
120 |
121 | Refer to the list of images [here](https://github.com/sutddgxadmin/sutdcompute/blob/master/imagelist.md).
122 |
--------------------------------------------------------------------------------