├── README.md
├── intro
│   └── README.md
├── managing_users
│   └── README.md
├── pyxis
│   ├── README.md
│   └── clean_pyxis.bash
├── slurm_intro
│   └── README.md
└── storage
    └── README.md
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Quick Links

- [User Account Request](https://docs.google.com/forms/d/e/1FAIpQLSeTaLPpwmcvvqbeizdu_MRhhMjZ1L9z54NqHse6BJF9s0DTSg/viewform?usp=sf_link)
- [Cluster Issue Submission](https://docs.google.com/forms/d/e/1FAIpQLScRKQLOcW9J2Pvw4clOho3lv04R3VPfG7F9MZTqbsquv79y3g/viewform?usp=sf_link)

# Tutorials

- [Getting an account](https://github.com/daniilidis-group/cluster_tutorials/tree/master/managing_users)
- [Intro](https://github.com/daniilidis-group/cluster_tutorials/tree/master/intro)
- [Intro to SLURM](https://github.com/daniilidis-group/cluster_tutorials/tree/master/slurm_intro)
- [Some notes about storage](https://github.com/daniilidis-group/cluster_tutorials/tree/master/storage)
- [Docker/pyxis/enroot](https://github.com/daniilidis-group/cluster_tutorials/tree/master/pyxis)
--------------------------------------------------------------------------------
/intro/README.md:
--------------------------------------------------------------------------------
# Cluster Intro

Start here for a quick getting-started tutorial.

## Logging in

You need to be on one of the following:

1) A wired connection on campus
2) A wireless connection on campus
3) The UPenn VPN

## Getting resources

All computation on the cluster must be done after creating a request for resources.

### Direct interactive session

This method gives you an interactive terminal that holds the requested resources:

```
srun --partition debug --qos debug --mem=8G --gres=gpu:1 --pty bash
```

### Direct blocking request

This method submits a job and blocks your terminal until the command has completed:

```
srun --mem-per-gpu=10G --cpus-per-gpu=4 --gpus=1 nvidia-smi
```

### Non-blocking batch request

This method allows you to schedule a job for the cluster to get to later. These jobs can be significantly more complex than `srun` allows for. This file will be run with `sbatch <filename>`:

```
#!/bin/bash
#SBATCH --mem-per-gpu=10G
#SBATCH --cpus-per-gpu=4
#SBATCH --gpus=1
#SBATCH --time=00:10:00
#SBATCH --qos=low
#SBATCH --partition=compute

hostname
nvidia-smi

exit 0
```

## Visualization

Follow the tensorboard tutorial:

[Tensorboard](https://github.com/daniilidis-group/cluster_tutorials/tree/master/tensorboard)
--------------------------------------------------------------------------------
/managing_users/README.md:
--------------------------------------------------------------------------------
# New user creation

To request a new user, please fill out the [Cluster User Request Form](https://forms.gle/97BZLTwX5dMXDV118). The admin team will verify that the account is allowed to be created and then create it.

# Login

We require SSH key-based login from remote locations; see this [SSH Key Guide](https://linuxize.com/post/how-to-set-up-ssh-keys-on-ubuntu-1804/).
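
As a quick reference, a minimal key setup from your local machine looks like the sketch below; the username and hostname are placeholders, and the linked guide covers the details:

```
# On your local machine: generate a key pair (ed25519 is a reasonable default)
ssh-keygen -t ed25519 -C "your_email@example.com"

# Copy the public key to the cluster login node (replace the placeholders)
ssh-copy-id <username>@<login-node>

# Test that key-based login works
ssh <username>@<login-node>
```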

If at any point you want to change your password:

```
$ passwd
```

For access from an off-campus location, you will need to set up an SSH key for authentication.

You will *NOT* be able to log in from a remote location if you don't have an SSH key set up.

# SLURM Compute Allocation

Depending upon your group's policies, you will either be allocated compute with respect to your group as a whole, or an existing lab member will need to allocate you under their compute. If it is the former, this will be done for you. If it is the latter, your point of contact will be able to help you.

# Managing your account

This is for those groups that have a larger organizational structure:

- Kostas Group

## Your account
This account will balance your usage equally with other members of your group. It will additionally bill all usage from students working with you against your account. To look at your account:

```
$ sacctmgr show assoc Account=<professor>-account
```

## Adding a student under your account
Giving a user access to your compute will count against your overall account, and thus you are taking responsibility for their overall usage.

```
$ sacctmgr add user Name=<username> DefaultAccount=<professor>-account
```

You're able to remove them from your account at any point in time, as well as modify how they can use your account.
--------------------------------------------------------------------------------
/pyxis/README.md:
--------------------------------------------------------------------------------
# Pyxis

Pyxis manages enroot inside of SLURM. Both are provided by NVIDIA for running containers in unprivileged mode.

- https://github.com/NVIDIA/pyxis
- https://github.com/NVIDIA/enroot

## Running

The following srun command downloads the compressed container image, decompresses it, runs the given command, and removes the decompressed container upon completion:

```
srun --container-image=centos grep PRETTY /etc/os-release
```

If you would like to keep the image (uncompressed) on the machine:

```
srun --container-name=centosstays --container-image=centos grep PRETTY /etc/os-release
```

These are all of the options for Pyxis:

```
$ srun --help
...
      --container-image=[USER@][REGISTRY#]IMAGE[:TAG]|PATH
                              [pyxis] the image to use for the container
                              filesystem. Can be either a docker image given as
                              an enroot URI, or a path to a squashfs file on the
                              remote host filesystem.

      --container-mounts=SRC:DST[:FLAGS][,SRC:DST...]
                              [pyxis] bind mount[s] inside the container. Mount
                              flags are separated with "+", e.g. "ro+rprivate"

      --container-workdir=PATH
                              [pyxis] working directory inside the container
      --container-name=NAME   [pyxis] name to use for saving and loading the
                              container on the host. Unnamed containers are
                              removed after the slurm task is complete; named
                              containers are not. If a container with this name
                              already exists, the existing container is used and
                              the import is skipped.
      --container-mount-home  [pyxis] bind mount the user's home directory.
                              System-level enroot settings might cause this
                              directory to be already-mounted.

      --no-container-mount-home
                              [pyxis] do not bind mount the user's home
                              directory
      --container-remap-root  [pyxis] ask to be remapped to root inside the
                              container. Does not grant elevated system
                              permissions, despite appearances. (default)

      --no-container-remap-root
                              [pyxis] do not remap to root inside the container
```

## Running with sbatch

Unfortunately, srun parameters provided by a SPANK plugin don't migrate to sbatch by default. The workaround is thus to use sbatch to allocate resources and then use srun from within the job.

```
#!/bin/bash
#SBATCH --gpus=1
#SBATCH --cpus-per-gpu=4
#SBATCH --mem-per-gpu=10G
#SBATCH --time=14:00:00
#SBATCH --qos=low

CONTAINER_NAME=""
CONTAINER_IMAGE=""
COMMAND=""
EXTRA_MOUNTS="/Datasets:/Datasets,/scratch:/scratch"

srun --container-mount-home \
     --container-mounts=${EXTRA_MOUNTS} \
     --container-name=${CONTAINER_NAME} \
     --container-image=${CONTAINER_IMAGE} \
     --no-container-remap-root \
     ${COMMAND}
```

## Cleanup

Cleaning up your containers on a particular node should only be done when you have no jobs running on that node. This process can be useful if you're trying to debug why something isn't working.

```
srun -w <node1>,<node2> --cores=1 --mem=1G --time=00:10:00 rm -r /tmp/enroot-data/user-$(id -u)
```

Or to have it scheduled (ensure you edit which nodes you are using):

```
sbatch clean_pyxis.bash
```

## Admin notes
- Pyxis must be installed on all machines
- Enroot must be on all compute nodes
--------------------------------------------------------------------------------
/pyxis/clean_pyxis.bash:
--------------------------------------------------------------------------------
#!/bin/bash
#SBATCH -w node-2080ti-[0-6]
#SBATCH --cores=1
#SBATCH --mem=1G
#SBATCH --time=00:10:00
#SBATCH --qos=low

# Resolve this user's enroot data directory and remove it
ENROOT_DATA_PATH="/tmp/enroot-data/user-$(id -u)"

rm -r "$ENROOT_DATA_PATH"
--------------------------------------------------------------------------------
/slurm_intro/README.md:
--------------------------------------------------------------------------------
# SLURM Intro

## Getting comfortable with SLURM

### Partitions
All nodes in the system are assigned to one or more partitions, which users may access given sufficient permissions. The non-specific partitions are as follows:

| Partition             | Time Limit  | Valid QOSs | Nodes              | Default |
|-----------------------|-------------|------------|--------------------|---------|
| batch                 | 1-00:00:00  | normal     | ALL                | YES     |
| \<professor\>-compute | 14-00:00:00 | varies     | Professor Specific | NO      |

To view the current state of each partition:

```
sinfo
```

In addition to the above partitions, you will see the professor-specific partitions that you may or may not have access to.

## srun

Now that you have some idea of how everything is broken down, you can run your first command. `srun` provides a blocking way to run a single command using remote resources.

```
srun --mem-per-gpu=10G --cpus-per-gpu=4 --gpus=1 nvidia-smi
```

The requested resources will be allocated and then your command will run. In this case you should see the GPU information displayed on your console.
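
`srun` works the same way for any command, not just `nvidia-smi`. For example, a quick sanity check that the system-wide PyTorch install (listed under Deep Learning Packages below) can see the allocated GPU might look like the following sketch; the resource values are just an example:

```
# Request one GPU and run a one-line PyTorch check on the allocated node
srun --mem-per-gpu=10G --cpus-per-gpu=4 --gpus=1 \
    python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
```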

## sbatch

Running individual commands in a blocking manner is often too cumbersome for any large-scale project. `sbatch` provides a way to submit one or more jobs to be scheduled and run at the optimal time.

```
#!/bin/bash
#SBATCH --mem-per-gpu=10G
#SBATCH --cpus-per-gpu=4
#SBATCH --gpus=1
#SBATCH --time=00:10:00

hostname
nvidia-smi

exit 0
```

If the above contents are in `example.bash`, you can submit this job through `sbatch example.bash`. Upon being scheduled, you will see a new file `slurm-<job_id>.out` that contains the contents of stdout from the above bash script.

## A note about accurate estimates

Accurate time and resource estimates are critical to ensure that as many jobs can be scheduled as possible. If you don't request enough, your job could crash in often unpredictable ways. If you request too much, those resources go to waste, as they could have been used for another job at the same time. In addition, your `fairshare` will be billed for the resources that you request and thus could be adversely affected if you request a large number of unused resources.

## Running an interactive session

Interactive sessions are designed to help facilitate efficient debugging of your code. You can request an interactive terminal on any partition or QOS; however, debug is recommended as it will be able to preempt most currently running jobs.

```
srun --partition debug --qos debug --mem=8G --gres=gpu:1 --pty bash
```

## Stopping jobs

To stop a job you will need to know your job ID, which is announced after you use any command that requests resources. You can also look at `squeue` to find a job ID.

```
scancel <job_id>
```

## Monitoring your jobs

### Queue
Checking the work queue:
```
squeue
```
You will see all of the currently running and scheduled jobs when running this command. Use the `sprio` command to check the priority level of each job.


## Advanced scheduling options

### QOS

Depending on your affiliation, you will have at least 3 QOSs available to you:

| QOS    | # GPUs | Preempts | Exempt Time | Max GPU Min Per Job | Max Jobs | Max Submit Jobs | Priority | Usage Factor | Default |
|--------|--------|----------|-------------|---------------------|----------|-----------------|----------|--------------|---------|
| normal |        |          | 00:30:00    |                     | 60       | 120             | 1        | 10           | YES     |

Professors who have their own resources have defined their own QOSs to ensure equitable distribution of resources within their group. Examples are as follows:

| QOS                | # GPUs | Preempts               |
|--------------------|--------|------------------------|
| \<professor\>-med  | 10     | low                    |
| \<professor\>-high | 1      | low, \<professor\>-med |

For more specifics on what currently exists, you can look at:

```
sacctmgr show qos
```

Each QOS attempts to fulfill a separate requirement that people might have and encourages smaller, more manageable chunks of runtime.

- \# GPUs is the total per user for that QOS
- Preempts indicates which QOSs can be preempted when you start a job with that QOS
- Exempt time indicates the amount of wall time that must pass before a job can be preempted (this is considered "safe" time)
- Max GPU Min Per Job is the total number of GPU-minutes your job can use (e.g. using 3 GPUs for 15 minutes is 45 GPU-minutes)
- Max Jobs is the total number of jobs that can be accruing time in that QOS per user
- Max Submit Jobs is the total number of jobs that you can submit in that QOS per user
- Priority is an additional priority factor that gets accounted for in the scheduler
- Usage Factor is how much it "costs" to run in this QOS

The basic QOSs (listed above) provide general access to a large number of resources. More specialized QOSs will be assigned by each group for their specific resources; these take priority over the more generic QOSs.

### Requesting a specific GPU

You are allowed to be more specific about the type of GPU that you want to use:

```
srun --mem-per-gpu=10G --cpus-per-gpu=4 --gpus=geforce_rtx_2080_ti:1 nvidia-smi
```

The list of types:

| Name                | Architecture | VRAM |
|---------------------|--------------|------|
| geforce_rtx_2080_ti | Turing       | 11GB |
| rtx_a6000           | Ampere       | 48GB |
| a40                 | Ampere       | 48GB |
| a10                 | Ampere       | 24GB |
| geforce_rtx_3090    | Ampere       | 24GB |
| l40                 | Lovelace     | 48GB |


### Requesting a specific node

If a specific node has the configuration that you'd like:
```
srun --mem-per-gpu=10G --cpus-per-gpu=4 --gpus=1 -w <node_name> nvidia-smi
```


## Batches

You will need to make a file that contains the parameters of the batch of jobs:

```
#!/bin/bash
#SBATCH --mem-per-gpu=10G
#SBATCH --cpus-per-gpu=4
#SBATCH --gpus=1
#SBATCH --array=0-3
#SBATCH --time=00:10:00

hostname
nvidia-smi

echo "My unique array ID is $SLURM_ARRAY_TASK_ID out of $SLURM_ARRAY_TASK_MAX"

exit 0
```

Write that to a file called `test.bash` and to run it use:

```
sbatch test.bash
```

This will run 4 jobs (`--array=0-3`) that each report the host they ran on and check the GPU itself.

Each `#SBATCH` line denotes a separate option, which is further defined in https://slurm.schedmd.com/sbatch.html

By default, a log file named `slurm-<job_id>.out` should also be generated in the same directory as `test.bash`, which you can view as a live log with:
```
tail -f slurm-<job_id>.out
```
sbatch allows you to change this file name to anything you would like. Check the documentation for more info.

You can also start an interactive terminal for a job that you started with sbatch (e.g., to check CPU or GPU utilization). To do so, you need to start a new step/task within the running job:
```
srun --jobid <job_id> --pty bash
```
With `squeue -s` you can see that your new step has a step ID like `<job_id>.<step_id>`.

## Long running jobs

Jobs that run for a long time (i.e. more time than the QOS allows for) can still be scheduled in blocks and automatically requeued. This is handled by having the correct exit conditions:

1) Exit code 3 from the primary job script
2) No signal was sent to your sub job

This is an example in bash that creates a job array with 4 elements and repeats forever. Please feel free to try this script, but do not allow it to run forever.

```
#!/bin/bash
#SBATCH --mem-per-gpu=10G
#SBATCH --cpus-per-gpu=4
#SBATCH --gpus=1
#SBATCH --array=0-3
#SBATCH --time=00:10:00

hostname
nvidia-smi
exit 3
```

## Handling preemption

Jobs that are in a lower QOS allow other jobs to preempt them. This can be handled in a couple of different ways. The most robust is to catch the signal (SIGTERM) and checkpoint your model. This can be seen further in the `mnist` tutorial. Jobs that are preempted are automatically placed back into the queue to be rescheduled.

# Deep Learning Packages

We provide the following packages installed on the base system of every machine:

| Name       | Version |
|------------|---------|
| Python     | 3.10    |
| Pytorch    | 1.13    |
| Tensorflow | 2.0     |

If the following command:
```
nvcc --version
```
returns `Command 'nvcc' not found`, you may need to update your PATH to include CUDA. To do so temporarily, run:
```
export PATH=$PATH:/usr/local/cuda/bin
```
To do so permanently, add the line above to the bottom of your bashrc, which should be at `~/.bashrc`.

In addition, you can always use venv for your specific use case. It is recommended to keep libraries like that on NVMe for speed of loading. For example, from inside of a SLURM interactive session:

```
cd /scratch/<username>/virtual_envs
python3 -m venv test
```


# Helpful Debugging Tools

If you want to see how many CPUs and GPUs each node has and how many are allocated, you can run the following command.
```
scontrol show node
```
This will display a list of all of the nodes and information about them. An example output is shown below.

```
NodeName=node-2080ti-3 Arch=x86_64 CoresPerSocket=20
   CPUAlloc=10 CPUTot=40 CPULoad=8.42
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=gpu:rtx2080ti:4(S:0)
   NodeAddr=node-2080ti-3 NodeHostName=node-2080ti-3 Version=19.05.5
   OS=Linux 4.15.0-108-generic #109-Ubuntu SMP Fri Jun 19 11:33:10 UTC 2020
   RealMemory=92000 AllocMem=0 FreeMem=59361 Sockets=1 Boards=1
   State=MIXED ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=compute,kostas-compute
   BootTime=2020-06-29T13:54:09 SlurmdStartTime=2020-07-11T11:24:16
   CfgTRES=cpu=40,mem=92000M,billing=40,gres/gpu=4
   AllocTRES=cpu=10,gres/gpu=3
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
```
For this node, the second line shows the allocated vs. total CPUs. The total number and type of GPUs are shown under `Gres`. The number of allocated GPUs is shown under `AllocTRES`.
--------------------------------------------------------------------------------
/storage/README.md:
--------------------------------------------------------------------------------
# Storage

We supply two levels of storage that are logically and physically separated. Each has quotas and rules that govern access and usage internally.
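
A couple of standard commands are handy for checking where you stand against these quotas; the paths used below are the defaults described in the sections that follow:

```
# Total size and a rough inode (file and directory) count for your home directory (HDD pool)
du -sh ~
find ~ | wc -l

# Free space on the NVMe pool
df -h /mnt/kostas_graid
```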

## Storage types

Each pool is limited in two ways:

1) total storage size
2) total number of inodes (in general, fewer large files are better for performance than many small files)

| Pool | Size Quota | inode Quota | Default Directories |
|------|------------|-------------|---------------------|
| HDD  | 4TB        | 10M         | /home               |
| NVMe | 100GB      |             | /mnt/kostas_graid   |

## HDD

We maintain a ZFS server based on TrueNAS Scale for serving home folders for our users. These home folders are accessible from every node within the cluster. Users will see this as `/home/<username>`, which is created during the first login.

This is the bulk storage for our users to maintain some history on their experiments. Datasets and log files on HDDs tend to slow down the system if too many people use them that way; try to avoid this usage pattern. This pool is the default for `/home`.

## NVMe

We maintain a GRAID-based server with 40TB of NVMe storage. The uplink for this node into the network is a dual-port 100Gbps card, allowing for remote failover.

This storage is separated into user software environments and datasets.

### User Software

User software can be found in `/mnt/kostas_graid/sw/envs`. Users are free to make their own folders, and each user is allocated 100GB for software.

### Datasets

Datasets can be found in `/mnt/kostas_graid/datsets`. Users are free to make their own folders, and there is not currently a quota in place (if you need more than 1TB, please reach out to a cluster admin!). This data is automatically cleaned up based on file access time. If a file has not been accessed for more than 14 days, it will be removed. Empty folders are also removed. You can check the last time a file has been accessed with:

```
ls -lutr
```
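
If you want to see which of your files are at risk under the 14-day rule, `find` can filter by access time; the directory below is a placeholder for your own dataset folder:

```
# List files not accessed in the last 14 days (candidates for automatic removal)
find /mnt/kostas_graid/datsets/<username> -type f -atime +14
```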