├── questions_comments.md ├── README.md ├── cluster_overview.md ├── researchIT_team.md ├── Topic Outline.md ├── storage.md ├── intro_to_bash.md ├── code_of_conduct.md ├── directory_structure.md ├── nodes.md ├── applications.md ├── facilities_statement.md ├── Queues.md └── cluster_utilization.md /questions_comments.md: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # IntroToHPC 2 | HPC and queue usage class syllabus 3 | -------------------------------------------------------------------------------- /cluster_overview.md: -------------------------------------------------------------------------------- 1 | # Abbreviations 2 | 3 | 4 | * np = CPU core count 5 | * gpu = Graphical Processing Unit 6 | 7 | # Cluster Overview 8 | 9 | ## Login Nodes 10 | 11 | | Sumner | Winter | 12 | | ------------- | ---------------- | 13 | | login.sumner.jax.org | login.winter.jax.org | 14 | 15 | ## Compute nodes 16 | 17 | [Node List Breakdown](nodes.md) 18 | 19 | ## Queues 20 | [Queues](Queues.md) 21 | 22 | ## Storage policies 23 | 24 | [Storage Detail](storage.md) 25 | -------------------------------------------------------------------------------- /researchIT_team.md: -------------------------------------------------------------------------------- 1 | # Advanced Cyberinfrastructure Team 2 | 3 | * Brendan Arbuckle, MS - Director, Information and Research Technologies 4 | * Shane Sanders, PhD - Senior Manager - Advanced Cyberinfrastructure 5 | * David McKenzie, BS - Cyberinfrastructure Architect - Storage, Networking 6 | * Jason Macklin, BA - Cyberinfrastructure Architect - High Performance Computing 7 | * Aaron McDivitt, BA - System Administrator - High Performance Computing 8 | * Richard Yanicky, MS - Systems Analyst - High Performance Computing 9 | * Kurt Showmaker, PhD - Systems Analyst - High Performance Computing 10 | -------------------------------------------------------------------------------- /Topic Outline.md: -------------------------------------------------------------------------------- 1 | 1. [Intro to bash](intro_to_bash.md) .5 hrs 2 | 2. Cluster Resource Overview 3 hrs 3 | - [Facilities Statement](facilities_statement.md) .25 hrs 4 | - [Cluster Overview](cluster_overview.md) .5 hrs 5 | - Login Nodes 6 | - Compute nodes 7 | - Specialized nodes 8 | - Queues 9 | - Dev nodes 10 | - Storage 11 | - Policies 12 | - [Directory structure and utilization](directory_structure.md) 13 | - Permissions by file/directory .5 hrs 14 | - /projects 15 | - /data 16 | - /fastscratch 17 | - /gt_delivery 18 | - /home 19 | - [Applications](applications.md) .5 hrs 20 | - Userspace commands 21 | - Application builds 22 | - Central application tree 23 | - [Cluster resource utilization](cluster_utilization.md) .5 hrs 24 | - Queue templates 25 | 3. [Questions/Comments](questions_comments.md) .25 hrs 26 | -------------------------------------------------------------------------------- /storage.md: -------------------------------------------------------------------------------- 1 | OK, now that we are connected to Sumner through SSH, we should take a minute to examine and discuss the common directory structure of the system. 2 | 3 | `/home` is where the home directories for every user are kept, and they are accessible by all cluster nodes. When you log in, you should be in your `/home/userid` directory.
Let's check that by using `ssh` to connect back to Sumner, and then using the `pwd` command we learned earlier. 4 | 5 | ~~~ 6 | ssh ssander@login.sumner.jax.org 7 | pwd 8 | ~~~ 9 | 10 | Home directories have a quota of 50GB each, meaning that the maximum storage available to a single user's `/home/userid` folder is 50GB. 11 | 12 | `/projects` is the primary active storage directory for users and lab groups. It is accessible by all cluster nodes. Each lab group has a `/projects/PIname-lab` folder with a quota of 75TB. 13 | 14 | `/fastscratch` is a temporary directory pinned to the fastest in-house storage we have available. It is accessible by all cluster nodes. It has a total capacity of 150TB, and is meant to serve as 'scratch paper' for computational analysis. Users can specify that their jobs output to `/fastscratch`, and then copy their important files back into their `/home` or `/projects` directories. Bioinformatics software can generate a large number of temporary and intermediate files that are often no longer needed after the analysis completes and can accidentally consume large quantities of storage. Files in the `/fastscratch` directory that are more than 10 days old will be erased automatically by the current storage system policy, and IT reserves the right to erase any and all data on `/fastscratch` without notice. 15 | 16 | `/gt_delivery` is the delivery directory for sequence data from JAX's Genome Technologies group. It is only available via the Globus UI or command-line utility. More information is available at the Research IT Sharepoint site (https://jacksonlaboratory.sharepoint.com/sites/ResearchIT/SitePages/Globus-Data-Transfers.aspx). 17 | -------------------------------------------------------------------------------- /intro_to_bash.md: -------------------------------------------------------------------------------- 1 | ## Common Linux Commands 2 | | Command | Description | 3 | | ------- | ----------- | 4 | | cat [filename] | Display a file’s contents to the standard output device (usually your monitor). | 5 | | cd /directorypath | Change to directory. | 6 | | chmod [options] mode filename | Change a file’s permissions. | 7 | | chown [options] filename | Change who owns a file. | 8 | | clear | Clear a command line screen/window for a fresh start. | 9 | | cp [options] source destination | Copy files and directories. | 10 | | date [options] | Display or set the system date and time. | 11 | | df [options] | Display used and available disk space. | 12 | | du [options] | Show how much space each file takes up. | 13 | | file [options] filename | Determine what type of data is within a file. | 14 | | find [pathname] [expression] | Search for files matching a provided pattern. | 15 | | grep [options] pattern [filename(s)] | Search files or output for a particular pattern. | 16 | | kill [options] pid | Stop a process. If the process refuses to stop, use kill -9 pid. | 17 | | less [options] [filename] | View the contents of a file one page at a time. | 18 | | ln [options] source [destination] | Create a link (shortcut) to a file. | 19 | | locate filename | Search a copy of your filesystem for the specified filename. | 20 | | ls [options] | List directory contents. | 21 | | man [command] | Display the help information for the specified command. | 22 | | mkdir [options] directory | Create a new directory. | 23 | | mv [options] source destination | Rename or move file(s) or directories. | 24 | | ps [options] | Display a snapshot of the currently running processes.
| 25 | | pwd | Display the pathname for the current directory. | 26 | | rm [options] directory | Remove (delete) file(s) and/or directories. | 27 | | rmdir [options] directory | Delete empty directories. | 28 | | ssh [options] user@machine | Remotely log in to another Linux machine, over the network. Leave an ssh session by typing exit. | 29 | | tail [options] [filename] | Display the last n lines of a file (the default is 10). | 30 | | tar [options] filename | Store and extract files from a tarfile (.tar) or tarball (.tar.gz or .tgz). | 31 | | top | Displays the resources being used on your system. Press q to exit. | 32 | | touch filename | Create an empty file with the specified name. | 33 | | who [options] | Display who is logged on. | 34 | | whoami | Display current user. | 35 | 36 | 37 | ## Other Resources 38 | * [Beginner's Guide to the Bash Terminal](https://youtu.be/oxuRxtrO2Ag) 39 | * [Software Carpentry - The Unix Shell](https://swcarpentry.github.io/shell-novice/) 40 | * [ExplainShell.com - Fun website for CLI explanation](https://explainshell.com) 41 | -------------------------------------------------------------------------------- /code_of_conduct.md: -------------------------------------------------------------------------------- 1 | Code of Conduct 2 | 3 | Software Carpentry and Data Carpentry are dedicated to providing a welcoming and supportive environment for all people, regardless of background or identity. However, we recognise that some groups in our community are subject to historical and ongoing discrimination, and may be vulnerable or disadvantaged. Membership in such a specific group can be on the basis of characteristics such as gender, sexual orientation, disability, physical appearance, body size, race, nationality, sex, colour, ethnic or social origin, pregnancy, citizenship, familial status, veteran status, genetic information, religion or belief, political or any other opinion, membership of a national minority, property, birth, age, or choice of text editor. We do not tolerate harassment of participants on the basis of these categories, or for any other reason. 4 | 5 | Harassment is any form of behaviour intended to exclude, intimidate, or cause discomfort. Because we are a diverse community, we may have different ways of communicating and of understanding the intent behind actions. Therefore we have chosen to prohibit certain forms of behaviour in our community, regardless of intent. Prohibited harassing behaviour includes but is not limited to: 6 | 7 | * written or verbal comments which have the effect of excluding people on the basis of membership of a specific group listed above 8 | * causing someone to fear for their safety, such as through stalking, following, or intimidation 9 | * the display of sexual or violent images 10 | * unwelcome sexual attention 11 | * nonconsensual or unwelcome physical contact 12 | * sustained disruption of talks, events or communications 13 | * incitement to violence, suicide, or self-harm 14 | * continuing to initiate interaction (including photography or recording) with someone after being asked to stop 15 | * publication of private communication without consent 16 | 17 | Behaviour not explicitly mentioned above may still constitute harassment. The list above should not be taken as exhaustive but rather as a guide to make it easier to enrich all of us and the communities in which we participate. 
All Carpentry interactions should be professional regardless of location: harassment is prohibited whether it occurs on- or offline, and the same standards apply to both. 18 | 19 | Enforcement of the Code of Conduct will be respectful and not include any harassing behaviors. 20 | 21 | Thank you for helping make this a welcoming, friendly community for all. 22 | 23 | This code of conduct is a modified version of that used by PyCon, which in turn is forked from a template written by the Ada Initiative and hosted on the Geek Feminism Wiki. Contributors to this document: Adam Obeng, Aleksandra Pawlik, Bill Mills, Carol Willing, Erin Becker, Hilmar Lapp, Kara Woo, Karin Lagesen, Pauline Barmby, Sheila Miguez, Simon Waldman, Tracy Teal. 24 | -------------------------------------------------------------------------------- /directory_structure.md: -------------------------------------------------------------------------------- 1 | ### Directory Structure Best Practice 2 | Time spent at the beginning of the project to define folder hierarchy and file naming conventions will make it much easier to keep things organized and findable, both throughout the project and after project completion. Adhering to well-thought-out naming conventions: 3 | 4 | * helps prevent accidental overwrites or deletion 5 | * makes it easier to locate specific data files 6 | * makes collaborating on the same files less confusing 7 | 8 | ### File naming best practices 9 | 10 | Include a few pieces of descriptive information in the filename, in a standard order, to make it clear what the file contains. For example, filenames could include: 11 | 12 | * experiment name or acronym 13 | * researcher initials 14 | * date data was collected 15 | * type of data 16 | * conditions 17 | * file version 18 | * file extension for application-specific files 19 | 20 | #### Consider sort order: 21 | 22 | If it is useful for files to stay in chronological order, a good convention is to start file names with `YYYYMMDD` or `YYMMDD`. 23 | 24 | If you are using a sequential numbering system, use leading zeros to maintain sort order, e.g. `007` will sort before `700`. 25 | 26 | Do not use special (i.e. non-alphanumeric) characters in names such as: 27 | ```" / \ : * ? ‘ < > [ ] { } ( ) & $ ~ ! @ # % ^ , '``` 28 | 29 | These could be interpreted by programs or operating systems in unexpected ways. 30 | 31 | Do not use spaces in file or folder names; many programs and scripts handle them poorly, and you will need to enclose such names in quotation marks to reference them in scripts and programs. Alternatives to spaces in filenames: 32 | 33 | * Underscores, e.g. file_name.xxx 34 | * Dashes, e.g. file-name.xxx 35 | * No separation, e.g. filename.xxx 36 | * Camel case, where the first letter of each section of text is capitalized, e.g. FileName.xxx 37 | * Keep names short, no more than 25 characters. 38 | 39 | ### File Versioning Best Practices 40 | 41 | File versioning ensures that you always understand what version of a file you are working with, and which are the working and final versions of files. Recommended file versioning practices: 42 | 43 | * Include a version number at the end of the file name such as v01. Change this version number each time the file is saved. 44 | * For the final version, substitute the word FINAL for the version number. 45 | * Take advantage of the versioning capabilities available in collaborative workspaces such as GitHub, OSF, Google Drive, and Box.
46 | * Track versions of computer code with versioning software such as Git, Subversion, or CVS. 47 | 48 | ### Directory Structure Best Practices 49 | Directories can be organized in many different ways. Consider what makes sense for your project and research team, and how people new to the project might look for files. 50 | 51 | Once you determine how you want your directories to be organized, it is a good idea to stub out an empty directory structure to hold future data, and to document the contents of each directory in a readme file. 52 | 53 | Directory Best Practices 54 | 55 | * Organize directories hierarchically, with broader topics at the top level of the hierarchy and more specific topics lower in the structure. 56 | * Group files of similar information together in a single directory. 57 | * Name directories after aspects of the project rather than after the names of individual researchers. 58 | * Once you have decided on a directory structure, follow it consistently and audit it periodically. 59 | * Separate ongoing and completed work. 60 | 61 | ### Sources 62 | http://guides.lib.umich.edu/datamanagement/files 63 | -------------------------------------------------------------------------------- /nodes.md: -------------------------------------------------------------------------------- 1 | # Abbreviations 2 | 3 | * np = CPU core count 4 | * gpu = Graphical Processing Unit 5 | 6 | # Cluster Node Breakdown 7 | 8 | ## Login Nodes 9 | 10 | | Sumner | Winter | 11 | | ------------- | ---------------- | 12 | | login.sumner.jax.org | login.winter.jax.org | 13 | 14 | ## Compute nodes 15 | 16 | | Sumner Compute | Winter Compute | 17 | | ------- | ----------- | 18 | | sumner014 np=70 768GB | winter200 np=46 192GB 4 V100 gpu | 19 | | sumner015 np=70 768GB | winter201 np=46 192GB 4 V100 gpu | 20 | | sumner016 np=70 768GB | winter202 np=46 192GB 4 V100 gpu | 21 | | sumner017 np=70 768GB | winter203 np=46 192GB 4 V100 gpu | 22 | | sumner018 np=70 768GB | winter204 np=46 192GB 4 V100 gpu | 23 | | sumner019 np=70 768GB | winter205 np=46 192GB 4 V100 gpu | 24 | | sumner020 np=70 768GB | winter206 np=46 192GB 4 V100 gpu | 25 | | sumner021 np=70 768GB | winter207 np=46 192GB 4 V100 gpu | 26 | | sumner022 np=70 768GB | 27 | | sumner023 np=70 768GB | 28 | | sumner024 np=70 768GB | 29 | | sumner025 np=70 768GB | 30 | | sumner026 np=70 768GB | 31 | | sumner027 np=70 768GB | 32 | | sumner028 np=70 768GB | 33 | | sumner029 np=70 768GB | 34 | | sumner030 np=70 768GB | 35 | | sumner031 np=70 768GB | 36 | | sumner032 np=70 768GB | 37 | | sumner033 np=70 768GB | 38 | | sumner034 np=70 768GB | 39 | | sumner035 np=70 768GB | 40 | | sumner036 np=70 768GB | 41 | | sumner037 np=70 768GB | 42 | | sumner038 np=70 768GB | 43 | | sumner039 np=70 768GB | 44 | | sumner040 np=70 768GB | 45 | | sumner041 np=70 768GB | 46 | | sumner042 np=70 768GB | 47 | | sumner043 np=70 768GB | 48 | | sumner044 np=70 768GB | 49 | | sumner045 np=70 768GB | 50 | | sumner046 np=70 768GB | 51 | | sumner047 np=70 768GB | 52 | | sumner048 np=70 768GB | 53 | | sumner049 np=70 768GB | 54 | | sumner050 np=70 768GB | 55 | | sumner051 np=70 768GB | 56 | | sumner052 np=70 768GB | 57 | | sumner053 np=70 768GB | 58 | | sumner054 np=70 768GB | 59 | | sumner055 np=70 768GB | 60 | | sumner056 np=70 768GB | 61 | | sumner057 np=70 768GB | 62 | | sumner058 np=70 768GB | 63 | | sumner059 np=70 768GB | 64 | | sumner060 np=70 768GB | 65 | | sumner061 np=70 768GB | 66 | | sumner062 np=70 768GB | 67 | | sumner063 np=70 768GB | 68 | | sumner064 np=70 768GB | 69 | | 
sumner065 np=70 768GB | 70 | | sumner066 np=70 768GB | 71 | | sumner067 np=70 768GB | 72 | | sumner068 np=70 768GB | 73 | | sumner069 np=70 768GB | 74 | | sumner070 np=70 768GB | 75 | | sumner071 np=70 768GB | 76 | | sumner072 np=70 768GB | 77 | | sumner073 np=70 768GB | 78 | | sumner074 np=70 768GB | 79 | | sumner075 np=70 768GB | 80 | | sumner076 np=70 768GB | 81 | | sumner077 np=70 768GB | 82 | | sumner078 np=70 768GB | 83 | | sumner079 np=70 768GB | 84 | | sumner080 np=70 768GB | 85 | | sumner081 np=70 768GB | 86 | | sumner082 np=70 768GB | 87 | | sumner083 np=70 768GB | 88 | | sumner084 np=70 768GB | 89 | | sumner085 np=70 768GB | 90 | | sumner086 np=70 768GB | 91 | | sumner087 np=70 768GB | 92 | | sumner088 np=70 768GB | 93 | | sumner089 np=70 768GB | 94 | | sumner090 np=70 768GB | 95 | | sumner091 np=70 768GB | 96 | | sumner092 np=70 768GB | 97 | | sumner093 np=70 768GB | 98 | | sumner094 np=70 768GB | 99 | | sumner095 np=70 768GB | 100 | | sumner096 np=70 768GB | 101 | | sumner097 np=70 768GB | 102 | | sumner098 np=70 768GB | 103 | | sumner099 np=70 768GB | 104 | | sumner100 np=70 768GB | 105 | | sumner101 np=70 768GB | 106 | | sumner102 np=70 768GB | 107 | | sumner103 np=70 768GB | 108 | | sumner104 np=70 768GB | 109 | | sumner105 np=70 768GB | 110 | | sumner106 np=70 768GB | 111 | | sumner107 np=70 768GB | 112 | | sumner108 np=70 768GB | 113 | | sumner109 np=70 768GB | 114 | | sumner110 np=70 768GB | 115 | | sumner112 np=70 768GB | 116 | | sumner113 np=70 768GB | 117 | 118 | ## Specialized nodes 119 | 120 | | Sumner high_mem | 121 | | ---------- | 122 | | sumner114 np=142 3096GB | 123 | | sumner115 np=142 3096GB | 124 | -------------------------------------------------------------------------------- /applications.md: -------------------------------------------------------------------------------- 1 | ### Software Usage on the Cluster 2 | 3 | On Sumner, we have overhauled how researchers are able to install and easily access the software they need. Users are still able to download and install software into their userspace (`/projects` or `/home` directories). Additionally, low-level development tools such as `gcc`, `openMPI`, `openJDK`, and basic libraries are still available via the `module` system. The new and exciting way we are able to provide software on Sumner, however, is through software containers. Singularity containers are a cutting-edge way of creating your own custom software modules. Containers not only provide you with the software you need, but also a contained environment which ensures that your software runs the exact same way whether it's on your laptop, on the HPC resources, with a collaborator, or in the cloud. 4 | 5 | --- 6 | ### Userspace Installation 7 | 8 | The simplest method of installing software for use with Sumner is to install directly into your userspace. This means any directory you have permission to write to and modify, such as your group's `/projects` directory or your personal `/home` directory. You also have write permissions on `/fastscratch` and `/tmp`, but these are not recommended due to their ephemeral nature. Most software installations will default to writing under a `/usr/` or `/lib/` path. Since these locations are shared by everyone, these "global" trees are not writable by the typical HPC user. Instead, many software installations will allow you to perform a "local" install in a directory of your choosing. Please consult your software's documentation to find out how to change the install path.
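As a brief illustration, a "local" install from source into your own space might look like the sketch below. The tool name, download URL, and install path are placeholders only; substitute the details for your own software and lab group, and check that software's documentation for the exact options it supports (many tools use `--prefix`, others use environment variables or a Makefile setting).

~~~
# Example only: build a hypothetical tool into a lab-writable directory.
# Replace the URL, version, and paths with the ones for your software.
cd /projects/PIname-lab/software
wget https://example.org/downloads/mytool-1.0.tar.gz
tar -xzf mytool-1.0.tar.gz
cd mytool-1.0

# Point the install at a directory you can write to instead of /usr
./configure --prefix=/projects/PIname-lab/software/mytool-1.0
make
make install

# Make the newly installed binaries visible in your current session
export PATH=/projects/PIname-lab/software/mytool-1.0/bin:$PATH
~~~

Adding the `export PATH=...` line to your `~/.bashrc` makes a locally installed tool available in every new login session.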
9 | 10 | ### Environment Modules 11 | 12 | On Helix and Cadillac, common software was managed through the Environment Modules package. Now, only low-level software such as libraries, compilers, and programming language binaries are available through modules. To see what modules are available, run the command `module avail`. For more information about how to use the `module` command, see the [documentation](https://modules.readthedocs.io/en/latest/). 13 | 14 | ### Singularity Containers 15 | 16 | The newest and most liberating change to software installations on HPC is the introduction of Singularity. Containerization allows you to install whatever software you want inside of a singular, stand-alone Singularity image file (`.sif`). This file contains your software, custom environment, and metadata all in one. 17 | 18 | To use Singularity, all you have to do is load the module by running `module load singularity` in a running job session on Sumner. Singularity can download new containers (`singularity pull`), run existing ones (`singularity run/exec/shell`), or upload new containers to registries (`singularity push`). For a full tutorial on Singularity, please see the educational materials posted on the [Cyberinfrastructure SharePoint](https://jacksonlaboratory.sharepoint.com/sites/ResearchIT/SitePages/Containerization-with-Singularity--101.aspx). 19 | 20 | In addition to the Singularity module, there is further infrastructure available to help you use Singularity to its full potential at JAX. We have made a container builder available to facilitate the building of your own custom containers without having to distribute `sudo` permissions for the `singularity build` command. The builder is itself a container which sends a specified recipe to a build server, then downloads the resulting `.sif` file. To use the builder, all you have to do is run `singularity run http://s3-far.jax.org/builder/builder`. Alternatively, you can download the container by running `singularity pull http://s3-far.jax.org/builder/builder`, or using other tools like `wget` or `curl`. The first method is recommended because it removes the possibility of using an outdated builder client. 21 | 22 | Additionally, a container registry will be available soon, which will allow users from every group at JAX to upload containers and share them with everyone in the lab. For more updates on this, please contact [matt.bradley@jax.org](mailto:matt.bradley@jax.org). 23 | -------------------------------------------------------------------------------- /facilities_statement.md: -------------------------------------------------------------------------------- 1 | # Research IT Resources 2 | 3 | *Last Updated: 26-Feb-2020* 4 | 5 | The Information Technology (IT) platforms at JAX enable computationally intensive, high-throughput, data-rich research to be conducted with the goal of developing personalized genomic medicine methodologies and practices. The IT team supports our infrastructure and technology platforms, and includes several dedicated Research IT personnel. Key IT platforms are summarized below: 6 | 7 | 8 | 9 | Working storage (Tier 0) on Cadillac is provided by a Data Direct Networks Gridscalar GS7k GPFS storage appliance with 232 TB usable storage capacity. The primary storage (Tier 1) on Cadillac is provided by 25 Dell EMC Isilon scale-out NAS nodes combined for a total of 2.4 PB raw storage capacity.
A new storage array (Tier 2) has been brought online as of Q1 2018, adding approximately 2 PB of new non-computable capacity. 10 | 11 | ## HPC Resources for all JAX Research: 12 | 13 | ## Sumner Cluster: 14 | 15 | Sumner is the HPC cluster physically located at the Farmington, CT campus. 16 | 17 | Sumner includes 100 Supermicro X11DPT-B Series servers, each with 70 Intel Xeon Gold 6150 computable cores at 2.7GHz and 768 GB RAM, creating a 7,000-core high performance compute cluster. 18 | 19 | Specialized resources associated with Sumner include 2 nodes, each with 142 Intel Xeon Gold 6150 cores at 2.7GHz and 3 TB RAM, available for workloads with extremely large memory requirements. 20 | 21 | ## Winter Cluster: 22 | 23 | Winter includes 8 Supermicro X11DGQ Series servers, each with 46 Intel Xeon Gold 6136 cores at 3.00GHz and 192 GB RAM. Each server includes 4 Nvidia Tesla V100 32 GB GPU cards. This translates into 249.6 TFLOPS of double precision floating-point, 502.4 TFLOPS of single precision, and 4,000 Tensor TFLOPS of combined peak performance. 24 | 25 | ## Cluster Accessible Storage: 26 | 27 | Working storage (Tier 0) on Sumner is provided by a Data Direct Networks Gridscalar GS7k GPFS storage appliance with 522 TB usable storage capacity. The primary storage (Tier 1) on Sumner is provided by 27 Dell EMC Isilon scale-out NAS nodes combined for a total of 2.7 PB raw storage capacity. A new storage array (Tier 2) has been brought online as of Q1 2018, adding approximately 2 PB of new non-computable capacity. 28 | 29 | ## Archival Storage: 30 | 31 | Archival storage (Tier 3) is provided at both Bar Harbor, ME and Farmington, CT by 2 Quantum Artico StorNext appliances with 72 TB front-end disk capacity backed by a 4 PB tape library at each site. Data storage at this tier is replicated across both geographic sites. 32 | 33 | ## Applications Platform: 34 | 35 | Our applications platform at both sites supports standard software necessary for investigators to process and analyze their data, as well as providing the entire research community with the basic building blocks to build their own custom workflows. Database development and deployment is supported by MySQL and other database management systems. 36 | 37 | ## Network Infrastructure: 38 | 39 | Our network platform supports scientific and enterprise business systems, using 40Gb core switches in our server farm that deliver at least 1Gb to user devices. The environment includes wired and wireless network service and a redundant voice over IP (VOIP) system, and is protected by application firewalls. The network infrastructure for the HPC environment consists of a dedicated 100Gb backbone with 50Gb to each server in the cluster, and 40Gb to each storage node. Internet service is delivered by a commercial service provider that can scale beyond 10Gb as demand for data transfer increases. 40 | 41 | ## Scientific Services Support: 42 | 43 | Gilbert is the HPC cluster serving the Genome Technologies Scientific Service. 44 | 45 | Gilbert includes 10 HP Proliant XL Series servers, each with 28 cores at 2.3GHz and 512 GB RAM, combined into a 280-core high performance compute cluster. The primary storage (Tier 1) on Gilbert is provided by 10 Dell EMC Isilon scale-out NAS nodes combined for a total of 641.3 TB raw storage capacity. 46 | 47 | Exome is the HPC cluster serving the Clinical Genomics Scientific Service. 48 | 49 | Exome includes 8 HP Proliant SL Series servers, each with 16 cores at 2.6GHz and 256 GB RAM, combined into a 128-core high performance compute cluster.
The primary storage (Tier 1) on Exome is provided by 4 Dell EMC Isilon scale-out NAS nodes combined for a total of 165.5 TB raw storage capacity. 50 | -------------------------------------------------------------------------------- /Queues.md: -------------------------------------------------------------------------------- 1 | The high performance computing (HPC) resources of The Jackson Laboratory are a shared resource available to all JAX researchers. To keep these resources available to everyone in a consistent and fair manner, a number of walltime-based queues have been implemented on these systems. These queues give the Information Technology department the ability to better plan maintenance and schedule upgrade windows on these systems, while providing a more consistent and stable operating environment for JAX HPC users. 2 | 3 | ### Identifying the partitions 4 | 5 | ### squeue 6 | 7 | The `squeue` command provides details about many things, including the currently configured partitions. 8 | 9 | ~~~ 10 | squeue 11 | squeue -u [username] 12 | squeue --help 13 | ~~~ 14 | 15 | Examples: 16 | 17 | squeue 18 | 19 | ~~~ 20 | JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 21 | 643231 gpu RnnTrain zhaoyu PD 0:00 1 (Resources) 22 | 643232 gpu RnnTrain zhaoyu PD 0:00 1 (Priority) 23 | 639633_[68-135%30] gpu track_gr sheppk PD 0:00 1 (JobArrayTaskLimit) 24 | 552775_7337 compute checkBug zhaoyu PD 0:00 1 (launch failed requeued held) 25 | 627649 compute build tewher R 1-03:00:18 1 sumner054 26 | 627650 compute build tewher R 1-03:00:18 1 sumner054 27 | 627651 compute build tewher R 1-03:00:18 1 sumner054 28 | 627652 compute build tewher R 1-03:00:18 1 sumner054 29 | 627648 compute build tewher R 1-03:00:28 1 sumner054 30 | ~~~ 31 | 32 | squeue -u tewher 33 | 34 | ~~~ 35 | JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 36 | 627649 compute build tewher R 1-03:02:14 1 sumner054 37 | 627650 compute build tewher R 1-03:02:14 1 sumner054 38 | 627651 compute build tewher R 1-03:02:14 1 sumner054 39 | 627652 compute build tewher R 1-03:02:14 1 sumner054 40 | 627648 compute build tewher R 1-03:02:24 1 sumner054 41 | 638914 compute build tewher R 22:07:11 1 sumner014 42 | 638981 compute build tewher R 22:07:11 1 sumner014 43 | 638982 compute build tewher R 22:07:11 1 sumner039 44 | 638984 compute build tewher R 22:07:11 1 sumner054 45 | 638985 compute build tewher R 22:07:11 1 sumner014 46 | ~~~ 47 | 48 | squeue --help 49 | 50 | ~~~ 51 | Usage: squeue [OPTIONS] 52 | -A, --account=account(s) comma separated list of accounts 53 | to view, default is all accounts 54 | -a, --all display jobs in hidden partitions 55 | --array-unique display one unique pending job array 56 | element per line 57 | --federation Report federated information if a member 58 | of one 59 | -h, --noheader no headers on output 60 | --hide do not display jobs in hidden partitions 61 | -i, --iterate=seconds specify an interation period 62 | -j, --job=job(s) comma separated list of jobs IDs 63 | to view, default is all 64 | --local Report information only about jobs on the 65 | local cluster. Overrides --federation. 66 | -l, --long long report 67 | -L, --licenses=(license names) comma separated list of license names to view 68 | -M, --clusters=cluster_name cluster to issue commands to. Default is 69 | current cluster. cluster with no name will 70 | reset to default. Implies --local.
71 | -n, --name=job_name(s) comma separated list of job names to view 72 | --noconvert don't convert units from their original type 73 | (e.g. 2048M won't be converted to 2G). 74 | -o, --format=format format specification 75 | -O, --Format=format format specification 76 | -p, --partition=partition(s) comma separated list of partitions 77 | to view, default is all partitions 78 | -q, --qos=qos(s) comma separated list of qos's 79 | to view, default is all qos's 80 | -R, --reservation=name reservation to view, default is all 81 | -r, --array display one job array element per line 82 | --sibling Report information about all sibling jobs 83 | on a federated cluster. Implies --federation. 84 | -s, --step=step(s) comma separated list of job steps 85 | to view, default is all 86 | -S, --sort=fields comma separated list of fields to sort on 87 | --start print expected start times of pending jobs 88 | -t, --states=states comma separated list of states to view, 89 | default is pending and running, 90 | '--states=all' reports all states 91 | -u, --user=user_name(s) comma separated list of users to view 92 | --name=job_name(s) comma separated list of job names to view 93 | -v, --verbose verbosity level 94 | -V, --version output version information and exit 95 | -w, --nodelist=hostlist list of nodes to view, default is 96 | all nodes 97 | 98 | Help options: 99 | --help show this help message 100 | --usage display a brief summary of squeue options 101 | ~~~ 102 | 103 | ### Identifying the QOS (Queues) 104 | 105 | ### `sacctmgr` 106 | 107 | The `sacctmgr` (or Slurm Account Manager) command shows the current QOS (aka queues) on the HPC environment. 108 | 109 | Examples: 110 | 111 | sacctmgr 112 | 113 | ~~~ 114 | sacctmgr 115 | sacctmgr show qos 116 | sacctmgr --help 117 | ~~~ 118 | 119 | sacctmgr show qos 120 | 121 | ~~~ 122 | $ sacctmgr show qos 123 | ~~~ 124 | 125 | ~~~ 126 | Name Priority GraceTime Preempt PreemptMode Flags UsageThres UsageFactor GrpTRES GrpTRESMins GrpTRESRunMin GrpJobs GrpSubmit GrpWall MaxTRES MaxTRESPerNode MaxTRESMins MaxWall MaxTRESPU MaxJobsPU MaxSubmitPU MaxTRESPA MaxJobsPA MaxSubmitPA MinTRES 127 | ---------- ---------- ---------- ---------- ----------- ---------------------------------------- ---------- ----------- ------------- ------------- ------------- ------- --------- ----------- ------------- -------------- ------------- ----------- ------------- --------- ----------- ------------- --------- ----------- ------------- 128 | long 0 00:00:00 cluster 1.000000 14-00:00:00 cpu=3600 129 | batch 0 00:00:00 cluster 1.000000 3-00:00:00 cpu=3600 130 | ~~~ 131 | 132 | sacctmgr --help 133 | 134 | ~~~ 135 | $ sacctmgr --help 136 | ~~~ 137 | 138 | ~~~ 139 | sacctmgr [