├── README.md
├── lisa.md
├── das5.md
└── ivi.md

/README.md:
--------------------------------------------------------------------------------
# nodes-info

Resources on how to use the GPU clusters: [das5](das5.md), [ivi](ivi.md) and [LISA](lisa.md).

## General things to know

* DO NOT stay idle on a GPU node.

* Learn how to create sessions with either `screen` or `tmux`. Opening a session and then submitting your job allows you to disconnect and later reattach (from a different location) to monitor the progress. `tmux` is not available on LISA.

* `nvidia-smi` is useful to check the usage of the GPUs on a node.

* `CUDA_VISIBLE_DEVICES=0 python myscript.py` will make only GPU 0 visible to Python. Alternatively, you can set it from within your Python script.

* You can create a bash script with multiple parallel jobs that you then submit via `slurm`. Here is an example:

```bash
CUDA_VISIBLE_DEVICES=0 python exp1.py & \
CUDA_VISIBLE_DEVICES=1,2 python exp2.py & \
CUDA_VISIBLE_DEVICES=3 python exp3.py & \
wait
```

A quick tutorial on how to use das4 is available [here](https://goo.gl/Atq9Za).

--------------------------------------------------------------------------------
/lisa.md:
--------------------------------------------------------------------------------
# LISA GPU-island

## Account creation

Ask for an account by email.
Include the following [information](https://userinfo.surfsara.nl/systems/lisa/account) in your email.
Note that a (short) support letter from your advisor is mandatory to get approved.
It should normally take 1-2 days.

## Hostname

`login-gpu.lisa.surfsara.nl` to access the GPU nodes.

The head node has GPUs. You can debug your code there for 15 min.

## Useful things to have in your `.bashrc`

```bash
module load CUDA
module load cuDNN
```

Pick the CUDA version you want by looking at what's available with `module avail CUDA`.

## Get a node with slurm

```bash
srun -u --pty --time=HH:MM:SS -p gpu bash -i
```

* `--time`: maximum of 5 days (120:00:00)
* `-p gpu`: get a node with GPUs

However, the preferred way of interacting with the job queue is to submit a job rather than use an interactive session (a sketch of a possible `myjob.sh` is given below):

```bash
srun --time=HH:MM:SS -p gpu myjob.sh
```

slurm will provide a job id. Use that id if you want to remove yourself from the queue with `scancel [job id]`.

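For reference, here is a minimal sketch of what such a `myjob.sh` might contain; the module names are the ones from the `.bashrc` section above, while the Python script and its argument are just placeholders.

```bash
#!/bin/bash
# myjob.sh: minimal sketch of a job script for the LISA GPU partition.

# Load the CUDA stack (same modules as in the .bashrc section above).
module load CUDA
module load cuDNN

# Optional: log which GPUs were allocated to this job.
nvidia-smi

# Run the actual experiment; replace with your own script and arguments.
python myscript.py --myargument=foo
```
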
## Monitor node usage

See which nodes are available:

```bash
sinfo -p gpu
```

See who is in the queue:

```bash
squeue -p gpu
```

## Disk space

You get 200GB of free space in your home folder.

Temporary storage space and project space are also available. Check this [page](https://userinfo.surfsara.nl/systems/lisa/getting-started#filesystems).

## Credit system

You can monitor your credits with `accinfo`, which gives an overview of your account information and your credit budget, or with `accuse`, which shows your monthly usage.

Make sure your credit balance is not 0; otherwise, ask Boy Menist to fix your account.

New users are initially assigned 10k credits.
If you run out of credits (i.e. your balance is a negative number), ask the helpdesk for more credits and put Boy Menist in cc.

One hour of computation on a GPU node is equivalent to 48 credits.

## More info

Read the [starting guide](https://userinfo.surfsara.nl/systems/lisa/getting-started) for more info.

--------------------------------------------------------------------------------
/das5.md:
--------------------------------------------------------------------------------
# DAS-5 cluster

## Account creation

Ask for an account by email; use your `@uva.nl` email address.
It should normally take 1-2 days.

## Hostname

`fs4.das5.science.uva.nl` to access the GPU nodes.

## Useful things to have in your `.bashrc`

```bash
module load gcc
module load slurm
module load cuda80
module load cuDNN
alias mywatch='/home/koelma/bin/mywatch'
```

Other modules are also available; the full list is available with `module avail`.

Pick the CUDA version you want by looking at what's available with `module avail cuda`.

## Get a node with slurm

* `srun -u --pty bash -i`: get an interactive session on a *CPU* node
* `srun -u --pty --gres=gpu:1 bash -i`: get an interactive session on a *GPU* node
* `srun -u --pty -w node404 bash -i`: get an interactive session on `node404`
* `srun -u --pty -p fatq bash -i`: get an interactive session on the fatq node (more RAM)

Ideally you should submit a job instead of using an interactive session:

`srun --gres=gpu:1 python myscript.py --myargument=foo`

`srun --gres=gpu:4 bash mybashscript.sh` (a sketch of `mybashscript.sh` is given below)

slurm will provide a job id. Use that id if you want to remove yourself from the queue with `scancel [job id]`.

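As a sketch, `mybashscript.sh` in the 4-GPU example above could follow the same pattern as the parallel-jobs example in the [README](README.md), running one experiment per allocated GPU; the module names are the ones from the `.bashrc` section above, while the experiment scripts are placeholders.

```bash
#!/bin/bash
# mybashscript.sh: sketch of a job that uses all 4 requested GPUs in parallel.

# Load the CUDA stack (same modules as in the .bashrc section above).
module load cuda80
module load cuDNN

# One experiment per GPU; replace the scripts and arguments with your own.
CUDA_VISIBLE_DEVICES=0 python exp1.py &
CUDA_VISIBLE_DEVICES=1 python exp2.py &
CUDA_VISIBLE_DEVICES=2 python exp3.py &
CUDA_VISIBLE_DEVICES=3 python exp4.py &

# Wait for all background runs to finish so slurm does not end the job early.
wait
```
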
## Jupyter notebook on a node

For those who want to play with a jupyter notebook on a node, here is the procedure:

1. Get an interactive session on a CPU or GPU node
2. Record which node you got (e.g. `node401`)
3. On the obtained node, run `jupyter notebook --no-browser --port=8888`
4. From your local machine, run `ssh -t -t username@fs4.das5.science.uva.nl -L 8888:localhost:8888 ssh node401 -L 8888:localhost:8888`
5. Open a browser on your local machine and go to http://localhost:8888
6. You should now have access to a jupyter notebook that is running on a node on das5
7. Do **NOT** forget to kill both sessions once you are done

*Note:* if you run into issues launching the jupyter notebook on the GPU node, you might want to unset the `XDG_RUNTIME_DIR` environment variable with `unset XDG_RUNTIME_DIR`.

## Monitor node usage

Either use `squeue` or `mywatch` (see the last line of [.bashrc](#useful-things-to-have-in-your-bashrc)).

## Data storage

Your home folder space is quite limited. Use it only for scripts.

Install your own Python and other software on `/var/scratch/` (there is already a directory for each user). Also store your data on `/var/scratch/`. If you run out of space, ask Dennis.

Check your current quota with `quota -sv`.

## More info

Read more on the [das-5 website](https://www.cs.vu.nl/das5/jobs.shtml).

--------------------------------------------------------------------------------
/ivi.md:
--------------------------------------------------------------------------------
# IvI GPU cluster

There is a channel `#ivi_cluster` in the IvI slack.
Read the pinned posts for more info; this readme only provides the basics.

## Hostname

`ivi-h0.science.uva.nl` to access the cluster. Use your UvAnetID as credentials.

## Useful things to have in your `.bashrc`

```bash
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda-8.0/lib64
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda-8.0/cudnn/cuda/lib64
alias mywatch='/home/dkoelma1/bin/mywatch'
```

Pick the CUDA version you want by looking at what's available in `/usr/local/`.

## Get a node with slurm

When submitting a job, you need to specify the number of GPUs, the amount of RAM, the number of CPUs and the duration. Please adjust these arguments according to your needs.
Each node has 4 GPUs, 128GB of RAM, 48 CPU threads (but 2 are used for slurm), and 2x10 TB of local HDD (`/hddstore`). The maximum run time is 7 days.

Here is an example of how to get an interactive session with 1 GPU for 2h30:
``srun -u --pty --gres=gpu:1 --mem=30G --cpus-per-task=10 --time=2:30:00 -D `pwd` bash -i``

The same, but with 4 GPUs for 1 day and 8 hours:
``srun -u --pty --gres=gpu:4 --mem=120G --cpus-per-task=40 --time=1-8 -D `pwd` bash -i``

Ideally you should submit a job instead of using an interactive session:
``srun --gres=gpu:1 --mem=30G --cpus-per-task=10 --time=2:30:00 -D `pwd` python myscript.py --myargument=foo``

Without the ``-D `pwd` `` argument, slurm will start the job in the `/tmp` directory.
Nodes can also be specified; e.g., to get `ivi-cn001`, add the argument `-w ivi-cn001`.

Quoting a wise man: the priority of a job depends on its size (smaller jobs have higher priority) and on the amount of resources you have used in the past (the less you have consumed, the higher your priority).

slurm will provide a job id. Use that id if you want to remove yourself from the queue with `scancel [job id]`.

## Monitor node usage

Either use `squeue` or `mywatch` (see the last line of [.bashrc](#useful-things-to-have-in-your-bashrc)).
Additionally, use `sinfo -h -N -o "%12n %8O %11T"` to monitor CPU usage.

## Jupyter notebook on a node

1. Get an interactive session on a GPU node.
2. On the GPU node, run `jupyter notebook --no-browser --port=20105`. You will get a URL with a token, e.g. `http://localhost:20105/?token=31427b811f3f3fdaef9d5da5cce8c81ba4df2f2ed4535960`.
3. On your local machine, run `ssh -L 20103:localhost:20104 UvAnetID@ivi-h0.science.uva.nl ssh -L 20104:localhost:20105 ivi-cn009` (here `ivi-cn009` is the node obtained in step 1). This forwards local port `20103` to port `20104` on the head node, and port `20104` on the head node to port `20105` on the GPU node.
4. Open your browser, go to http://localhost:20103 and paste the token from step 2 (`31427b811f3f3fdaef9d5da5cce8c81ba4df2f2ed4535960`).

## Data storage

You get 1 TB on your home directory.
Move your data to `/hddstore` or `/sddstore` for faster access.

## Change GCC version

The default GCC version is 4.8.
Use GCC 6 with
`source /opt/rh/devtoolset-6/enable`
or GCC 7 with
`source /opt/rh/devtoolset-7/enable`

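For example, a minimal check that the switch to GCC 7 worked for the current shell (the `devtoolset` paths are the ones listed above):

```bash
# The system default compiler (GCC 4.8).
gcc --version

# Enable GCC 7 for the current shell session and verify.
source /opt/rh/devtoolset-7/enable
gcc --version   # should now report a 7.x release
```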