├── .gitignore
├── .gitmodules
├── assets
│   ├── cfnstackhpctemplate.png
│   ├── cryoemPCSarchitecture.png
│   ├── cryosparcsigninpage.png
│   └── CryoSPARCParallelClusterArch.png
├── CODE_OF_CONDUCT.md
├── LICENSE
├── deployment
│   ├── FSxLustreDataRepoTasksPolicy.yaml
│   ├── parallel-cluster-cryosparc.yaml
│   └── parallel-cluster-cryosparc-custom-roles.yaml
├── README.md
├── CONTRIBUTING.md
├── source
│   ├── pcs-cryosparc-post-install.sh
│   └── parallel-cluster-post-install.sh
├── PCSREADME.md
└── ParallelClusterREADME.md

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | 
2 | .DS_Store
3 | 

--------------------------------------------------------------------------------
/.gitmodules:
--------------------------------------------------------------------------------
1 | [submodule "deployment/aws-hpc-recipe"]
2 |     path = deployment/aws-hpc-recipe
3 |     url = https://github.com/aws-samples/aws-hpc-recipes.git
4 | 

--------------------------------------------------------------------------------
/assets/cfnstackhpctemplate.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-solutions-library-samples/guidance-for-scalable-cryo-em-on-aws-parallel-computing-service/HEAD/assets/cfnstackhpctemplate.png

--------------------------------------------------------------------------------
/assets/cryoemPCSarchitecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-solutions-library-samples/guidance-for-scalable-cryo-em-on-aws-parallel-computing-service/HEAD/assets/cryoemPCSarchitecture.png

--------------------------------------------------------------------------------
/assets/cryosparcsigninpage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-solutions-library-samples/guidance-for-scalable-cryo-em-on-aws-parallel-computing-service/HEAD/assets/cryosparcsigninpage.png

--------------------------------------------------------------------------------
/assets/CryoSPARCParallelClusterArch.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-solutions-library-samples/guidance-for-scalable-cryo-em-on-aws-parallel-computing-service/HEAD/assets/CryoSPARCParallelClusterArch.png

--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | ## Code of Conduct
2 | 
3 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
4 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
5 | opensource-codeofconduct@amazon.com with any additional questions or comments.
6 | 

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT No Attribution
2 | 
3 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy of 6 | this software and associated documentation files (the "Software"), to deal in 7 | the Software without restriction, including without limitation the rights to 8 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 9 | the Software, and to permit persons to whom the Software is furnished to do so. 10 | 11 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 12 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 13 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 14 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 15 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 16 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 17 | -------------------------------------------------------------------------------- /deployment/FSxLustreDataRepoTasksPolicy.yaml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: 2010-09-09 2 | Description: > 3 | This Cloudformation template builds the necessary IAM policies for creating data repository tasks for FSx. (AWS SID: SO9634) 4 | Resources: 5 | DataRepoTaskIamPolicy: 6 | Type: "AWS::IAM::ManagedPolicy" 7 | Properties: 8 | PolicyDocument: 9 | Version: "2012-10-17" 10 | Statement: 11 | - Sid: "DataRepoTaskAdmin" 12 | Effect: Allow 13 | Action: 14 | - "fsx:CreateDataRepositoryTask" 15 | - "fsx:CancelDataRepositoryTask" 16 | Resource: 17 | - !Sub "arn:aws:fsx:${AWS::Region}:${AWS::AccountId}:file-system/${FsxId}" 18 | - !Sub "arn:aws:fsx:${AWS::Region}:${AWS::AccountId}:task/*" 19 | - Sid: "DataRepoTaskRead" 20 | Effect: Allow 21 | Action: 22 | - "fsx:DescribeDataRepositoryTasks" 23 | Resource: 24 | - "arn:aws:fsx:${AWS::Region}:${AWS::AccountId}:*" 25 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Guidance for Cryo-EM on AWS Parallel Cluster and AWS Parallel Computing Service 2 | 3 | ## Introduction 4 | 5 | This guide provides orientation for running Cryo-EM workflows on AWS. AWS offers two different parallel computing options: AWS ParallelCluster and AWS Parallel Computing Service (PCS). Each has strengths depending on workload complexity, ease of management, and scaling needs. 6 | 7 | ## Overview 8 | 9 | ### AWS ParallelCluster 10 | 11 | An open-source cluster management tool that provisions and manages HPC clusters on AWS. 12 | 13 | Offers fine-grained control over compute environments, schedulers (Slurm, etc.), networking and data management. 14 | 15 | Best for reproducible environments, and tightly coupled workloads. 16 | 17 | For detailed guidance on running Cryo-EM workloads using CryoSPARC with AWS ParallelCluster, refer to the Guidance [README](ParallelClusterREADME.md). 18 | 19 | Below is the architecture for the AWS Parallel Cluster Guidance. 20 | 21 | ![ParallelClusterArchitecture](assets/CryoSPARCParallelClusterArch.png) 22 | 23 | ### AWS Parallel Computing Service (PCS) 24 | 25 | A managed service for running parallel workloads without needing to manage the underlying infrastructure. 26 | 27 | Ideal for on-demand scaling. 28 | 29 | Lower operational burden, more “serverless” style. 
30 | 31 | For detailed guidance on running Cryo-EM workloads using CryoSPARC with AWS Parallel Computing Service (PCS), refer to the Guidance [README](PCSREADME.md). 32 | 33 | Below is the architecture for the AWS PCS Guidance. 34 | 35 | ![PCSArchitecture](assets/cryoemPCSarchitecture.png) 36 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | ## Reporting Bugs/Feature Requests 10 | 11 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 12 | 13 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 14 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 15 | 16 | - A reproducible test case or series of steps 17 | - The version of our code being used 18 | - Any modifications you've made relevant to the bug 19 | - Anything unusual about your environment or deployment 20 | 21 | ## Contributing via Pull Requests 22 | 23 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 24 | 25 | 1. You are working against the latest source on the _main_ branch. 26 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 27 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 28 | 29 | To send us a pull request, please: 30 | 31 | 1. Fork the repository. 32 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 33 | 3. Ensure local tests pass. 34 | 4. Commit to your fork using clear commit messages. 35 | 5. Send us a pull request, answering any default questions in the pull request interface. 36 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 37 | 38 | GitHub provides additional document on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 39 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 40 | 41 | ## Finding contributions to work on 42 | 43 | Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 44 | 45 | ## Code of Conduct 46 | 47 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 48 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 49 | opensource-codeofconduct@amazon.com with any additional questions or comments. 
50 | 51 | ## Security issue notifications 52 | 53 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. 54 | 55 | ## Licensing 56 | 57 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 58 | -------------------------------------------------------------------------------- /deployment/parallel-cluster-cryosparc.yaml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: 2010-09-09 2 | Description: > 3 | This Cloudformation template builds the architecture for Cryoem on AWS - parallel cluster version of the architecture with the necessary slurm controllers and instance groups. (AWS SID: SO9634) 4 | DevSettings: 5 | Timeouts: 6 | HeadNodeBootstrapTimeout: 5400 7 | Region: us-east-1 8 | Image: 9 | Os: alinux2 10 | HeadNode: 11 | InstanceType: c5a.4xlarge 12 | Networking: 13 | SubnetId: 14 | ElasticIp: true 15 | SecurityGroups: 16 | - 17 | Ssh: 18 | KeyName: 19 | LocalStorage: 20 | RootVolume: 21 | Size: 100 22 | EphemeralVolume: 23 | MountDir: /scratch 24 | Dcv: 25 | Enabled: true 26 | CustomActions: 27 | OnNodeConfigured: 28 | Sequence: 29 | - Script: s3:///parallel-cluster-post-install.sh 30 | Args: 31 | - 32 | - /shared/cryosparc 33 | - /shared/cuda 34 | - 11.8.0 35 | - 11.8.0_520.61.05 36 | - /fsx 37 | Iam: 38 | S3Access: 39 | - BucketName: 40 | 41 | Scheduling: 42 | Scheduler: slurm 43 | SlurmQueues: 44 | - Name: cpu 45 | CapacityType: ONDEMAND 46 | ComputeResources: 47 | - Name: c5a-8xlarge 48 | InstanceType: c5a.8xlarge 49 | MinCount: 0 50 | MaxCount: 20 51 | DisableSimultaneousMultithreading: true 52 | Efa: 53 | Enabled: false 54 | Networking: 55 | SubnetIds: 56 | - 57 | SecurityGroups: 58 | - 59 | PlacementGroup: 60 | Enabled: true 61 | ComputeSettings: 62 | LocalStorage: 63 | EphemeralVolume: 64 | MountDir: /scratch 65 | 66 | - Name: single-gpu 67 | CapacityType: ONDEMAND 68 | ComputeResources: 69 | - Name: g6-4xlarge 70 | InstanceType: g6.4xlarge 71 | MinCount: 0 72 | MaxCount: 20 73 | DisableSimultaneousMultithreading: true 74 | Efa: 75 | Enabled: false 76 | Networking: 77 | SubnetIds: 78 | - 79 | SecurityGroups: 80 | - 81 | PlacementGroup: 82 | Enabled: true 83 | ComputeSettings: 84 | LocalStorage: 85 | EphemeralVolume: 86 | MountDir: /scratch 87 | 88 | - Name: multi-gpu 89 | CapacityType: ONDEMAND 90 | ComputeResources: 91 | - Name: g6-48xlarge 92 | InstanceType: g6.48xlarge 93 | MinCount: 0 94 | MaxCount: 20 95 | DisableSimultaneousMultithreading: true 96 | Efa: 97 | Enabled: true 98 | Networking: 99 | SubnetIds: 100 | - 101 | SecurityGroups: 102 | - 103 | PlacementGroup: 104 | Enabled: true 105 | ComputeSettings: 106 | LocalStorage: 107 | EphemeralVolume: 108 | MountDir: /scratch 109 | 110 | SharedStorage: 111 | - Name: cryosparc-ebs 112 | StorageType: Ebs 113 | MountDir: /shared 114 | EbsSettings: 115 | Encrypted: true 116 | VolumeType: gp3 117 | Size: 100 118 | 119 | - Name: cryosparc-fsx 120 | StorageType: FsxLustre 121 | MountDir: /fsx 122 | FsxLustreSettings: 123 | AutoImportPolicy: NEW_CHANGED 124 | StorageCapacity: 1024 125 | DeploymentType: PERSISTENT_2 126 | ImportedFileChunkSize: 1024 127 | PerUnitStorageThroughput: 250 128 | ImportPath: s3:// 129 | 130 | Monitoring: 131 | Dashboards: 132 | CloudWatch: 133 | Enabled: true 134 | 
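#
# Deployment sketch (illustrative): the body of this file follows the AWS
# ParallelCluster 3 cluster-configuration schema, and one common way to deploy
# such a configuration is with the `pcluster` CLI. The commands below are a
# minimal sketch under stated assumptions: the CloudFormation-style header lines
# may need to be removed, the empty fields (SubnetId, SecurityGroups, KeyName,
# S3 bucket, ImportPath) must be filled in with your own values first, and the
# cluster name is a placeholder.
#
#   pip3 install --upgrade aws-parallelcluster          # installs the pcluster CLI
#   pcluster create-cluster \
#       --cluster-name cryosparc-cluster \
#       --cluster-configuration parallel-cluster-cryosparc.yaml
#   pcluster describe-cluster --cluster-name cryosparc-cluster   # poll until CREATE_COMPLETE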
-------------------------------------------------------------------------------- /deployment/parallel-cluster-cryosparc-custom-roles.yaml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: 2010-09-09 2 | Description: > 3 | This Cloudformation template builds the architecture for Cryoem on AWS - parallel cluster version of the architecture with the necessary slurm controllers and instance groups, this version is if your account has restrictions for the creation of new IAM resources. (AWS SID: SO9634) 4 | Region: 5 | Image: 6 | Os: alinux2 7 | Iam: 8 | Roles: 9 | LambdaFunctionsRole: 10 | HeadNode: 11 | InstanceType: c5a.4xlarge 12 | Networking: 13 | SubnetId: 14 | ElasticIp: true 15 | SecurityGroups: 16 | - 17 | Ssh: 18 | KeyName: 19 | LocalStorage: 20 | RootVolume: 21 | Size: 100 22 | EphemeralVolume: 23 | MountDir: /scratch 24 | Dcv: 25 | Enabled: true 26 | CustomActions: 27 | OnNodeConfigured: 28 | Script: s3:///parallel-cluster-post-install.sh 29 | Args: 30 | - s3:///parallel-cluster-post-install.sh 31 | - 32 | Iam: 33 | InstanceRole: 34 | 35 | Scheduling: 36 | Scheduler: slurm 37 | SlurmQueues: 38 | - Name: cpu 39 | Iam: 40 | InstanceRole: 41 | CapacityType: ONDEMAND 42 | ComputeResources: 43 | - Name: c5a-8xlarge 44 | InstanceType: c5a.8xlarge 45 | MinCount: 0 46 | MaxCount: 20 47 | DisableSimultaneousMultithreading: true 48 | Efa: 49 | Enabled: false 50 | Networking: 51 | SubnetIds: 52 | - 53 | SecurityGroups: 54 | - 55 | PlacementGroup: 56 | Enabled: true 57 | ComputeSettings: 58 | LocalStorage: 59 | EphemeralVolume: 60 | MountDir: /scratch 61 | 62 | - Name: single-gpu 63 | Iam: 64 | InstanceRole: 65 | CapacityType: ONDEMAND 66 | ComputeResources: 67 | - Name: g6-4xlarge 68 | InstanceType: g6.4xlarge 69 | MinCount: 0 70 | MaxCount: 10 71 | DisableSimultaneousMultithreading: true 72 | Efa: 73 | Enabled: false 74 | Networking: 75 | SubnetIds: 76 | - 77 | SecurityGroups: 78 | - 79 | PlacementGroup: 80 | Enabled: true 81 | ComputeSettings: 82 | LocalStorage: 83 | EphemeralVolume: 84 | MountDir: /scratch 85 | 86 | - Name: multi-gpu 87 | Iam: 88 | InstanceRole: 89 | CapacityType: ONDEMAND 90 | ComputeResources: 91 | - Name: g6-48xlarge 92 | InstanceType: g6.48xlarge 93 | MinCount: 0 94 | MaxCount: 10 95 | DisableSimultaneousMultithreading: true 96 | Efa: 97 | Enabled: true 98 | Networking: 99 | SubnetIds: 100 | - 101 | SecurityGroups: 102 | - 103 | PlacementGroup: 104 | Enabled: true 105 | ComputeSettings: 106 | LocalStorage: 107 | EphemeralVolume: 108 | MountDir: /scratch 109 | 110 | SharedStorage: 111 | - Name: cryosparc-ebs 112 | StorageType: Ebs 113 | MountDir: /shared 114 | EbsSettings: 115 | Encrypted: true 116 | VolumeType: gp3 117 | Size: 100 118 | 119 | - Name: cryosparc-fsx 120 | StorageType: FsxLustre 121 | MountDir: /fsx 122 | FsxLustreSettings: 123 | AutoImportPolicy: NEW_CHANGED 124 | StorageCapacity: 1024 125 | DeploymentType: PERSISTENT_2 126 | ImportedFileChunkSize: 1024 127 | PerUnitStorageThroughput: 250 128 | ImportPath: s3:// 129 | 130 | Monitoring: 131 | Dashboards: 132 | CloudWatch: 133 | Enabled: true 134 | -------------------------------------------------------------------------------- /source/pcs-cryosparc-post-install.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | ## Args: 4 | # argv1 - CRYOSPARC_LICENSE_ID (required) 5 | # argv2 - CRYOSPARC_INSTALL_PATH (default: /shared/cryosparc) 6 | # argv3 - CUDA_INSTALL_PATH (default: /shared/cuda) 7 | 
# argv4 - CUDA_VERSION (default 11.3.1) 8 | # argv5 - CUDA_LONG_VERSION (default: 11.3.1_465.19.01) 9 | # argv6 - PROJECT_DATA_PATH (default: /fsx) 10 | 11 | # Get the local commands to run yum and apt 12 | YUM_CMD=$(which yum || echo "") 13 | APT_GET_CMD=$(which apt-get || echo "") 14 | 15 | # If we have yum installed, use it to install prerequisites. If not, use apt 16 | if [[ -n $YUM_CMD ]]; then 17 | wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm -P /tmp 18 | yum install -y /tmp/epel-release-latest-7.noarch.rpm 19 | 20 | yum install -y perl-Switch python3 python3-pip links 21 | user_test=$(getent passwd ec2-user) 22 | if [[ -n "${user_test}" ]]; then 23 | OSUSER=ec2-user 24 | OSGROUP=ec2-user 25 | else 26 | OSUSER=centos 27 | OSGROUP=centos 28 | fi 29 | elif [[ -n $APT_GET_CMD ]]; then 30 | apt-get update 31 | apt-get install -y libswitch-perl python3 python3-pip links 32 | OSUSER=ubuntu 33 | OSGROUP=ubuntu 34 | else 35 | # If we don't have yum or apt, we couldn't install the prerequisites, so exit 36 | echo "error can't install package $PACKAGE" 37 | exit 1; 38 | fi 39 | 40 | # Get the cryoSPARC license ID, optional path, and optional versions from the script arguments 41 | CRYOSPARC_LICENSE_ID=$1 42 | CRYOSPARC_INSTALL_PATH=${2:-/shared/cryosparc} 43 | CUDA_INSTALL_PATH=${3:-/shared/cuda} 44 | CUDA_VERSION=${4:-11.3.1} 45 | CUDA_LONG_VERSION=${5:-11.3.1_465.19.01} 46 | PROJECT_DATA_PATH=${6:-/shared} 47 | 48 | /bin/su -c "mkdir -p ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER} || chmod 777 ${PROJECT_DATA_PATH} 49 | 50 | # Install the AWS CLI 51 | pip3 install --upgrade awscli boto3 52 | 53 | set -e 54 | 55 | #yum -y update 56 | 57 | # Configure AWS 58 | AWS_DEFAULT_REGION=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone | rev | cut -c 2- | rev) 59 | aws configure set default.region "${AWS_DEFAULT_REGION}" 60 | aws configure set default.output json 61 | 62 | if [[ "$(cat ${CUDA_INSTALL_PATH}/installed_cuda_version.log 2>/dev/null )" == "${CUDA_LONG_VERSION}" ]]; then 63 | echo "Matched previous CUDA version. Using old installer ${CUDA_LONG_VERSION}" 64 | else 65 | echo "Installing new version of CUDA ${CUDA_LONG_VERSION} (this may break cryosparc install)" 66 | # Install CUDA Toolkit 67 | mkdir -p "${CUDA_INSTALL_PATH}" 68 | mkdir -p "${CUDA_INSTALL_PATH}_tmp" 69 | cd "${CUDA_INSTALL_PATH}" || return 70 | wget "https://developer.download.nvidia.com/compute/cuda/${CUDA_VERSION}/local_installers/cuda_${CUDA_LONG_VERSION}_linux.run" 71 | fi 72 | echo "${CUDA_LONG_VERSION}" > ${CUDA_INSTALL_PATH}/installed_cuda_version.log 73 | 74 | sh ${CUDA_INSTALL_PATH}/cuda_"${CUDA_LONG_VERSION}"_linux.run --tmpdir="${CUDA_INSTALL_PATH}_tmp" --defaultroot="${CUDA_INSTALL_PATH}" --toolkit --toolkitpath="${CUDA_INSTALL_PATH}"/"${CUDA_VERSION}" --samples --silent 75 | #rm cuda_"${CUDA_LONG_VERSION}"_linux.run 76 | 77 | # Add CUDA to the path 78 | cat > /etc/profile.d/cuda.sh << 'EOF' 79 | PATH=$PATH:@CUDA_INSTALL_PATH@/@CUDA_VERSION@/bin 80 | EOF 81 | sed -i "s|@CUDA_INSTALL_PATH@|${CUDA_INSTALL_PATH}|g" /etc/profile.d/cuda.sh 82 | sed -i "s|@CUDA_VERSION@|${CUDA_VERSION}|g" /etc/profile.d/cuda.sh 83 | . /etc/profile.d/cuda.sh 84 | 85 | # Add CryoSPARC to the path 86 | cat > /etc/profile.d/cryosparc.sh << 'EOF' 87 | PATH=$PATH:@CRYOSPARC_INSTALL_PATH@/cryosparc_master/bin 88 | EOF 89 | sed -i "s|@CRYOSPARC_INSTALL_PATH@|${CRYOSPARC_INSTALL_PATH}|g" /etc/profile.d/cryosparc.sh 90 | . 
/etc/profile.d/cryosparc.sh 91 | 92 | # Condition checks whether /etc/profile.d/cryosparc.sh activated previously install cryosparc 93 | # if not, then we install cryosparc 94 | if [ ! -x "$(command -v "cryosparcm")" ]; then 95 | echo "Installing fresh CryoSPARC" 96 | 97 | # Download cryoSPARC 98 | mkdir -p "${CRYOSPARC_INSTALL_PATH}" 99 | # Need to make sure OSUSER can write to this path 100 | chown ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH} 101 | 102 | cd "${CRYOSPARC_INSTALL_PATH}" || return 103 | [ -f "${CRYOSPARC_INSTALL_PATH}/cryosparc_master.tar.gz" ] || curl -L "https://get.cryosparc.com/download/master-v4.0.3/${CRYOSPARC_LICENSE_ID}" -o cryosparc_master.tar.gz 104 | [ -f "${CRYOSPARC_INSTALL_PATH}/cryosparc_worker.tar.gz" ] || curl -L "https://get.cryosparc.com/download/worker-v4.0.3/${CRYOSPARC_LICENSE_ID}" -o cryosparc_worker.tar.gz 105 | 106 | # Install cryoSPARC main process 107 | tar -xf cryosparc_master.tar.gz 108 | 109 | # cryosparc untars with ownership: 1001:1001 by default. re-align permissions to OSUSER 110 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_master 111 | 112 | # Basic configuration for install 113 | export CRYOSPARC_FORCE_USER=true 114 | export CRYOSPARC_FORCE_HOSTNAME=true 115 | export CRYOSPARC_DISABLE_IMPORT_ON_MASTER=true 116 | 117 | # Install Main 118 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH}/cryosparc_master && ./install.sh --license "${CRYOSPARC_LICENSE_ID}" \ 119 | --hostname "${HOSTNAME}" \ 120 | --dbpath "${CRYOSPARC_INSTALL_PATH}"/cryosparc_db \ 121 | --port 45000 \ 122 | --allowroot \ 123 | --yes" - $OSUSER 124 | 125 | # Enforce configuration long-term 126 | echo "export CRYOSPARC_FORCE_USER=true" >> "${CRYOSPARC_INSTALL_PATH}"/cryosparc_master/config.sh 127 | echo "export CRYOSPARC_FORCE_HOSTNAME=true" >> "${CRYOSPARC_INSTALL_PATH}"/cryosparc_master/config.sh 128 | echo "export CRYOSPARC_DISABLE_IMPORT_ON_MASTER=true" >> "${CRYOSPARC_INSTALL_PATH}"/cryosparc_master/config.sh 129 | 130 | # Ownership of this path determines how cryosparc is started 131 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_master 132 | 133 | # Start cryoSPARC main package 134 | /bin/su -c "${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm start" - ${OSUSER} 135 | 136 | # Install cryoSPARC worker package 137 | cd "${CRYOSPARC_INSTALL_PATH}" || return 138 | tar -xf cryosparc_worker.tar.gz 139 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_worker 140 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH}/cryosparc_worker && ./install.sh --license "${CRYOSPARC_LICENSE_ID}" \ 141 | --cudapath "${CUDA_INSTALL_PATH}/${CUDA_VERSION}" \ 142 | --yes" - $OSUSER 143 | 144 | #rm "${CRYOSPARC_INSTALL_PATH}"/*.tar.gz 145 | 146 | # Once again, re-align permissions 147 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_worker 148 | 149 | # Start cryoSPARC main package 150 | /bin/su -c "${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm stop" - ${OSUSER} 151 | 152 | else 153 | echo "Restoring CryoSPARC with updated Hostname and refreshing paritition connections" 154 | 155 | # Stop any running cryosparc 156 | systemctl stop cryosparc-supervisor.service || /bin/su -c "${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm stop || echo \"Nothing Running\" " - ${OSUSER} 157 | 158 | # Update hostname to new main 159 | sed -i "s/^\(.*CRYOSPARC_MASTER_HOSTNAME=\"\).*\"/\1$HOSTNAME\"/g" ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/config.sh 160 | 161 | # Once again, re-align permissions for proper start 162 | 
chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_master 163 | fi 164 | 165 | # Start cluster 166 | #/bin/su -c "${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm start" - ${OSUSER} 167 | 168 | # Confirm Restart CryoSPARC main 169 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm restart" - ${OSUSER} 170 | 171 | # Stop server in anticipation for service 172 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm stop" - ${OSUSER} 173 | 174 | # Create the CryoSPARC Systemd service and start at Boot 175 | eval $(cryosparcm env) 176 | cd "${CRYOSPARC_INSTALL_PATH}/cryosparc_master/systemd" || return 177 | # Final alignment on permissions 178 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/systemd 179 | env "CRYOSPARC_ROOT_DIR=$CRYOSPARC_ROOT_DIR" ./install_services.sh 180 | systemctl start cryosparc-supervisor.service 181 | systemctl restart cryosparc-supervisor.service 182 | systemctl enable cryosparc-supervisor.service 183 | 184 | 185 | # Be tolerant of errors on partitions; we can always come back through admin panel and add later 186 | set +e 187 | echo "Partitions:" 188 | /opt/slurm/bin/scontrol show partitions 189 | 190 | echo "Beginning" 191 | # Create cluster config files 192 | for PARTITION in $( /opt/slurm/bin/scontrol show partitions | grep PartitionName | cut -d'=' -f 2 ) 193 | do 194 | if [ ! -f "${CRYOSPARC_INSTALL_PATH}/${PARTITION}/cluster_info.json" ]; then 195 | echo "Connecting New Partition: ${PARTITION}" 196 | case $PARTITION in 197 | compute-single-gpu) 198 | echo "L4 GPU: $PARTITION" 199 | PARTITION_CACHE_PATH="/scratch" 200 | PARTITION_CACHE_RESERVE=10000 201 | PARTITION_CACHE_QUOTA=800000 202 | PARTITION_RAM_GB_MULTIPLIER=2.0 203 | SBATCH_EXTRA="#SBATCH --gres=gpu:{{ num_gpu }}" 204 | ;; 205 | compute-multi-gpu) 206 | echo "L4 GPU: $PARTITION" 207 | PARTITION_CACHE_PATH="/scratch" 208 | PARTITION_CACHE_RESERVE=10000 209 | PARTITION_CACHE_QUOTA=800000 210 | PARTITION_RAM_GB_MULTIPLIER=2.0 211 | SBATCH_EXTRA="#SBATCH --gres=gpu:{{ num_gpu }}" 212 | ;; 213 | *) 214 | PARTITION_CACHE_PATH="" 215 | PARTITION_CACHE_RESERVE=10000 216 | PARTITION_CACHE_QUOTA=null 217 | PARTITION_RAM_GB_MULTIPLIER=2.0 218 | SBATCH_EXTRA="" 219 | ;; 220 | esac 221 | 222 | mkdir -p "${CRYOSPARC_INSTALL_PATH}/${PARTITION}" 223 | cat > "${CRYOSPARC_INSTALL_PATH}/${PARTITION}"/cluster_info.json << EOF 224 | { 225 | "qdel_cmd_tpl": "/opt/slurm/bin/scancel {{ cluster_job_id }}", 226 | "worker_bin_path": "${CRYOSPARC_INSTALL_PATH}/cryosparc_worker/bin/cryosparcw", 227 | "title": "cryosparc-cluster", 228 | "cache_path": "${PARTITION_CACHE_PATH}", 229 | "cache_reserve_mb": ${PARTITION_CACHE_RESERVE}, 230 | "cache_quota_mb": ${PARTITION_CACHE_QUOTA}, 231 | "qinfo_cmd_tpl": "/opt/slurm/bin/sinfo --format='%.42N %.5D %.15P %.8T %.15C %.5c %.10z %.10m %.15G %.9d %40E'", 232 | "qsub_cmd_tpl": "/opt/slurm/bin/sbatch {{ script_path_abs }}", 233 | "qstat_cmd_tpl": "/opt/slurm/bin/squeue -j {{ cluster_job_id }}", 234 | "send_cmd_tpl": "{{ command }}", 235 | "name": "${PARTITION}" 236 | } 237 | EOF 238 | #sinfo --format='%.8N %.6D %.10P %.6T %.14C %.5c %.6z %.7m %.7G %.9d %20E' 239 | 240 | cat > "${CRYOSPARC_INSTALL_PATH}/${PARTITION}"/cluster_script.sh << EOF 241 | #!/usr/bin/env bash 242 | #### cryoSPARC cluster submission script template for SLURM 243 | ## Available variables: 244 | ## {{ run_cmd }} - the complete command string to run the job 245 | ## {{ 
num_cpu }} - the number of CPUs needed 246 | ## {{ num_gpu }} - the number of GPUs needed. 247 | ## Note: the code will use this many GPUs starting from dev id 0 248 | ## the cluster scheduler or this script have the responsibility 249 | ## of setting CUDA_VISIBLE_DEVICES so that the job code ends up 250 | ## using the correct cluster-allocated GPUs. 251 | ## {{ ram_gb }} - the amount of RAM needed in GB 252 | ## {{ job_dir_abs }} - absolute path to the job directory 253 | ## {{ project_dir_abs }} - absolute path to the project dir 254 | ## {{ job_log_path_abs }} - absolute path to the log file for the job 255 | ## {{ worker_bin_path }} - absolute path to the cryosparc worker command 256 | ## {{ run_args }} - arguments to be passed to cryosparcw run 257 | ## {{ project_uid }} - uid of the project 258 | ## {{ job_uid }} - uid of the job 259 | ## {{ job_creator }} - name of the user that created the job (may contain spaces) 260 | ## {{ cryosparc_username }} - cryosparc username of the user that created the job (usually an email) 261 | ## {{ job_type }} - CryoSPARC job type 262 | ## 263 | ## What follows is a simple SLURM script: 264 | 265 | #SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }} 266 | #SBATCH -n {{ num_cpu }} 267 | ${SBATCH_EXTRA} 268 | #SBATCH --partition=${PARTITION} 269 | #SBATCH --mem={{ (ram_gb|float * ram_gb_multiplier|float)|int }}G 270 | #SBATCH --output={{ job_log_path_abs }} 271 | #SBATCH --error={{ job_log_path_abs }} 272 | 273 | {{ run_cmd }} 274 | EOF 275 | 276 | #sed -i "s|@PARTITION@|${PARTITION}|g" "${CRYOSPARC_INSTALL_PATH}"/cluster_script.sh 277 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/${PARTITION} 278 | 279 | # Connect CryoSPARC worker nodes to cluster 280 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH}/${PARTITION} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cluster connect" - ${OSUSER} 281 | 282 | # Individually apply custom_vars 283 | CLICMD=$(cat << EOT 284 | set_scheduler_target_property(hostname="${PARTITION}",key="custom_vars",value={"ram_gb_multiplier": "${PARTITION_RAM_GB_MULTIPLIER}"}) 285 | EOT 286 | ) 287 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH}/${PARTITION} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cli '$CLICMD' " - ${OSUSER} 288 | echo "Done connecting $PARTITION" 289 | else 290 | echo "Partition already connected to CryoSPARC: ${PARTITION}" 291 | fi 292 | 293 | done 294 | set -e 295 | 296 | # VALIDATE CRYOSPARC 297 | #echo "Validating lanes" 298 | #/bin/su -c "mkdir -p ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER} 299 | #/bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cluster validate cpu --projects_dir ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER} 300 | #/bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cluster validate gpu-t4 --projects_dir ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER} 301 | #/bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cluster validate gpu-a100 --projects_dir ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER} 302 | #/bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cluster validate gpu-v100 --projects_dir ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER} 303 | 304 | set +e 305 | echo "Attempting last attempt to update to latest version...if this fails, you may need to manually update head and compute" 306 | 307 | # Update to latest version of 
CryoSPARC 308 | systemctl stop cryosparc-supervisor.service 309 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm update" - ${OSUSER} 310 | # Depends on cryosparcm update to pull latest worker to cryosparc_master dir. 311 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && cp ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/cryosparc_worker.tar.gz ${CRYOSPARC_INSTALL_PATH}/cryosparc_worker/cryosparc_worker.tar.gz" - ${OSUSER} 312 | # Only update workers if they were installed (continue otherwise) 313 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_worker/bin/cryosparcw update" - ${OSUSER} || true 314 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm stop" - ${OSUSER} 315 | systemctl start cryosparc-supervisor.service 316 | set -e 317 | 318 | echo "CryoSPARC setup complete" 319 | -------------------------------------------------------------------------------- /PCSREADME.md: -------------------------------------------------------------------------------- 1 | # Guidance for Cryo-EM on AWS Parallel Computing Service 2 | 3 | ## Table of Contents 4 | 5 | - [Guidance for Cryo-EM on AWS Parallel Computing Service](#guidance-for-cryo-em-on-aws-parallel-computing-service) 6 | - [Table of Contents](#table-of-contents) 7 | - [Overview](#overview) 8 | - [Cost](#cost) 9 | - [Prerequisites](#prerequisites) 10 | - [Operating System](#operating-system) 11 | - [Supported Regions](#supported-regions) 12 | - [Data Transfer](#data-transfer) 13 | - [Deployment Steps](#deployment-steps) 14 | - [Running the Guidance](#running-the-guidance) 15 | - [Next Steps](#next-steps) 16 | - [Install ChimeraX for Visualization](#install-chimerax-for-visualization) 17 | - [Cleanup](#cleanup) 18 | - [FAQ, known issues, additional considerations, and limitations](#faq-known-issues-additional-considerations-and-limitations) 19 | - [AWS ParallelCluster](#aws-parallelcluster) 20 | - [Notices](#notices) 21 | - [License](#license) 22 | - [Authors](#authors) 23 | 24 | ## Overview 25 | 26 | This guidance demonstrates how to deploy CryoSPARC for cryogenic electron microscopy (Cryo-EM) workloads on AWS Parallel Computing Service (PCS). Cryo-EM enables drug discovery researchers to determine three-dimensional molecular structures crucial for their research. This solution addresses the challenge of processing terabytes of microscopy data through scalable, heterogeneous computing combined with fast, cost-effective storage. 27 | 28 | Below is the architecture model for this guidance. 29 | 30 | ![Architecturewsteps](assets/cryoemPCSarchitecture.png) 31 | 32 | ## Cost 33 | 34 | _You are responsible for the cost of the AWS services used while running this Guidance. As of September 2025, the cost for running this Guidance with the default settings in the US East (N. Virginia) is approximately $795.98 per sample. This estimate is based on processing 1 sample (1 TB of data). Cost calculations were derived using the times measured under realistic workload conditions for each instance type._ 35 | 36 | Below you can find a cost breakdown for this estimate based on the resources this guidance runs and assuming the aforementioned working periods (1 sample, 1 TB of data). 
37 | 38 | | AWS service | Dimensions | Cost [USD] | 39 | | ---------------------------------- | ----------------------------------------- | ---------- | 40 | | AWS Simple Storage Service (S3) | 1 TB w/ Intelligent Tiering | $ 23.72 | 41 | | Amazon Elastic File Service (EFS) | 100 GB Elastic Throughput | $ 30.00 | 42 | | Amazon FSx for Lustre | 1.2TB SSD - 250 MBps/TiB | $ 252.35 | 43 | | AWS Parallel Compute Service (PCS) | Small Slurm Controller | $ 56.97 | 44 | | Amazon Elastic Compute Cloud (EC2) | (Login Node) 1 On-Demand c5a.4xlarge | $ 33.02 | 45 | | Amazon Elastic Compute Cloud (EC2) | (CPU Group) 1 On-Demand c5a.8xlarge | $ 0.16 | 46 | | Amazon Elastic Compute Cloud (EC2) | (Single-GPU Group) 1 On-Demand g6.4xlarge | $ 28.35 | 47 | | Amazon Elastic Compute Cloud (EC2) | (Multi-GPU Group) 1 On-Demand g6.48xlarge | $ 371.41 | 48 | 49 | _We recommend creating a [Budget](https://docs.aws.amazon.com/cost-management/latest/userguide/budgets-managing-costs.html) through [AWS Cost Explorer](https://aws.amazon.com/aws-cost-management/aws-cost-explorer/) to help manage costs. Prices are subject to change. For full details, refer to the pricing webpage for each AWS service used in this Guidance._ 50 | 51 | ## Prerequisites 52 | 53 | In order to be able to run this guidance and to use CryoSparc you need to have the following: 54 | 55 | - CryoSPARC license ([contact Structura Biotechnology to obtain](https://guide.cryosparc.com/licensing)). 56 | - [AWS CLI](https://aws.amazon.com/cli/) installed and configured. 57 | - See additional [prerequisites](https://docs.aws.amazon.com/pcs/latest/userguide/getting-started_prerequisites.html) from AWS PCS. 58 | - An SSH client. 59 | 60 | ### Operating System 61 | 62 | These deployment instructions are optimized to best work on a Mac or Linux environment. Deployment in Windows may require additional steps for setting up required libraries and CLI. 63 | 64 | ### Supported Regions 65 | 66 | Only the following regions are supported for this guidance: 67 | 68 | - United States (N. Virginia) 69 | - United States (Ohio) 70 | - United States (Oregon) 71 | 72 | - Asia Pacific (Singapore) 73 | - Asia Pacific (Sydney) 74 | - Asia Pacific (Tokyo) 75 | 76 | - Europe (Frankfurt) 77 | - Europe (Ireland) 78 | - Europe (London) 79 | - Europe (Stockholm) 80 | 81 | Deploying the guidance in other regions may lead to errors or inconsistent behavior. 82 | 83 | ### Data Transfer 84 | 85 | Create a new S3 bucket for your input data. 86 | 87 | The data transfer mechanism to move data from instruments into S3 depends on the connectivity in the lab environment and the volume of data to be transferred. We recommend [AWS DataSync](https://aws.amazon.com/datasync/), which automates secure data transfer from on-premises into the cloud with minimal development effort. [Storage Gateway File Gateway](https://aws.amazon.com/storagegateway/file/) is another viable option, especially if lab connectivity is limited or continued two-way access from on-premises to the transferred data sets is required. Both DataSync and Storage Gateway [can be bandwidth throttled](https://docs.aws.amazon.com/datasync/latest/userguide/working-with-task-executions.html) to protect non-HPC business-critical network constraints. 88 | 89 | Alternatively, you can use the [AWS S3 CLI](https://docs.aws.amazon.com/cli/latest/reference/s3/) to transfer individual files, or use partner solution to get started quickly. 90 | 91 | ## Deployment Steps 92 | 93 | 1. **Clone the GitHub Repository** 94 | Clone this repository. 
View this [README](deployment/aws-hpc-recipe/recipes/pcs/getting_started/README.md) for deploying a PCS cluster. To create a PCS cluster with the right shared storage for this example, you can use the PCS guidance recipes for a one-click deployment, which uses AWS CloudFormation to launch an entire cluster, quickly. 95 | 96 | ```bash 97 | git clone https://github.com/aws-samples/cryoem-on-aws-parallel-cluster.git 98 | cd deployment/aws-hpc-recipe/recipes/pcs/getting_started 99 | cat README.md 100 | ``` 101 | 102 | 2. **Launch the PCS Cluster Using CloudFormation** 103 | - Navigate to the AWS Management Console → **CloudFormation**. 104 | - Choose **Create stack** → **With new resources (standard)**. 105 | - Upload the PCS template from the cloned repo. 106 | - Provide an **SSH key pair** if you want shell access to the login node. 107 | - Leave all other defaults unchanged and click **Create stack**. 108 | 109 | This creates: 110 | - Networking prerequisites 111 | - A Login Node group 112 | - One demo Compute Node group 113 | - An **Amazon EFS** file system mounted at `/home` 114 | - An **Amazon FSx for Lustre** file system mounted at `/shared` 115 | 116 | ![CloudFormation stacks](assets/cfnstackhpctemplate.png) 117 | 118 | 3. **Update FSx for Lustre Throughput** 119 | Increase throughput per unit of storage to support CryoSPARC installation: 120 | 121 | ```bash 122 | aws fsx update-file-system --file-system-id --lustre-configuration PerUnitStorageThroughput=250 123 | ``` 124 | 125 | This may take up to 20 minutes to complete. 126 | 127 | 4. **Retrieve Compute Node Group Information** 128 | Run the following to get AMI ID, Instance Profile ARN, and Launch Template ID: 129 | 130 | ```bash 131 | aws pcs get-compute-node-group --cluster-identifier --compute-node-group-identifier compute-1 132 | ``` 133 | 134 | Save the output values for use in the next step. 135 | 136 | 5. **Create Additional Compute Node Groups** 137 | Run the following commands to create CPU, single-GPU, and multi-GPU node groups: 138 | 139 | ```bash 140 | aws pcs create-compute-node-group --compute-node-group-name compute-cpu --cluster-identifier --region --subnet-ids --custom-launch-template id=,version='1' --ami-id --iam-instance-profile --scaling-config minInstanceCount=0,maxInstanceCount=2 --instance-configs instanceType=c5a.8xlarge 141 | 142 | aws pcs create-compute-node-group --compute-node-group-name compute-single-gpu --cluster-identifier --region --subnet-ids --custom-launch-template id=,version='1' --ami-id --iam-instance-profile --scaling-config minInstanceCount=0,maxInstanceCount=2 --instance-configs instanceType=g6.4xlarge 143 | 144 | aws pcs create-compute-node-group --compute-node-group-name compute-multi-gpu --cluster-identifier --region --subnet-ids --custom-launch-template id=,version='1' --ami-id --iam-instance-profile --scaling-config minInstanceCount=0,maxInstanceCount=2 --instance-configs instanceType=g6.48xlarge 145 | ``` 146 | 147 | 6. **Verify Node Group Creation** 148 | Confirm that each node group is active: 149 | 150 | ```bash 151 | aws pcs get-compute-node-group --region --cluster-identifier --compute-node-group-identifier 152 | ``` 153 | 154 | Wait until the status returns `ACTIVE`. 155 | 156 | 7. 
**Create Queues for Node Groups** 157 | Map queues to node groups so CryoSPARC can submit jobs to the right hardware: 158 | 159 | ```bash 160 | aws pcs create-queue --queue-name cpu-queue --cluster-identifier --compute-node-group-configurations computeNodeGroupId= 161 | 162 | aws pcs create-queue --queue-name single-gpu-queue --cluster-identifier --compute-node-group-configurations computeNodeGroupId= 163 | 164 | aws pcs create-queue --queue-name multi-gpu-queue --cluster-identifier --compute-node-group-configurations computeNodeGroupId= 165 | ``` 166 | 167 | 8. **Verify Queues** 168 | Check that the queues are created and active: 169 | ```bash 170 | aws pcs get-queue --region --cluster-identifier --queue-identifier 171 | ``` 172 | 173 | ## Running the Guidance 174 | 175 | 1. **Log in to the PCS Login Node** 176 | - Open the **Amazon EC2 Console**. 177 | - Search for your Login Node instance using the tag: 178 | ``` 179 | aws:pcs:compute-node-group-id= 180 | ``` 181 | - Select the instance → **Connect** → **Session Manager** → **Connect**. 182 | - Switch to the `ec2-user`: 183 | ```bash 184 | sudo su - ec2-user 185 | ``` 186 | 187 | 2. **Check Available Slurm Queues** 188 | Run: 189 | 190 | ```bash 191 | sinfo 192 | ``` 193 | 194 | You should see partitions for CPU, GPU, and multi-GPU nodes. 195 | 3. **Download and Run CryoSPARC Installation Script** 196 | 197 | ```bash 198 | wget https://raw.githubusercontent.com/aws-samples/cryoem-on-aws-parallel-cluster/refs/heads/main/source/pcs-cryosparc-post-install.sh 199 | chmod +x pcs-cryosparc-post-install.sh 200 | sudo ./pcs-cryosparc-post-install.sh /shared/cryosparc /shared/cuda 11.8.0 11.8.0_520.61.05 /shared 201 | ``` 202 | 203 | Installation can take up to an hour. 204 | 205 | 4. **Start the CryoSPARC Server** 206 | 207 | ```bash 208 | /shared/cryosparc/cryosparc_master/bin/cryosparcm start 209 | ``` 210 | 211 | 5. **Create a CryoSPARC User** 212 | 213 | ```bash 214 | cryosparcm createuser --email "" --password "" --username "" --firstname "" --lastname "" 215 | ``` 216 | 217 | 6. **Access the CryoSPARC UI** 218 | - Open an SSH tunnel from your local machine: 219 | ```bash 220 | ssh -i /path/to/key.pem -N -f -L localhost:45000:localhost:45000 ec2-user@ 221 | ``` 222 | - In a browser, go to: 223 | [http://localhost:45000](http://localhost:45000) 224 | - Log in with your CryoSPARC user credentials. 225 | ![CryoSPARC Sign In](assets/cryosparcsigninpage.png) 226 | 227 | 7. **Download and Extract a Test Dataset** 228 | 229 | Download the [movies test set](https://guide.cryosparc.com/processing-data/get-started-with-cryosparc-introductory-tutorial) from the CryoSparc introductory tutorial. 230 | 231 | ```bash 232 | mkdir /shared/data 233 | cd /shared/data 234 | /shared/cryosparc/cryosparc_master/bin/cryosparcm downloadtest 235 | tar -xf empiar_10025_subset.tar 236 | ``` 237 | 238 | 8. **Run a Test Job in CryoSPARC UI** 239 | - Create a new **Import Movies** job. 240 | - Select the `compute-cpu` lane (queue). 241 | - Submit the job. 242 | 243 | In the terminal, check the running job with: 244 | 245 | ```bash 246 | squeue 247 | ``` 248 | 249 | or check allocated nodes with: 250 | 251 | ```bash 252 | sinfo 253 | ``` 254 | 255 | ## Next Steps 256 | 257 | #### Install ChimeraX for Visualization 258 | 259 | Install ChimeraX on the login node and use Amazon DCV for remote desktop visualization. This can enable users to directly visualize CryoSPARC results without transferring data. 
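A minimal sketch of one way to set this up is shown below. It assumes the Amazon DCV server packages have already been installed on the login node (they are not installed by this guidance), and the session name, key path, and login-node address are placeholders for your own values.

```bash
# On the login node: start the DCV server and create a virtual session
# (requires the DCV server and virtual-session packages to be installed).
sudo systemctl enable --now dcvserver
dcv create-session --type virtual --owner ec2-user chimerax
dcv list-sessions

# From your local machine: tunnel the default DCV port (8443) over SSH,
# then open https://localhost:8443 in a browser and connect to the session.
ssh -i /path/to/key.pem -N -f -L 8443:localhost:8443 ec2-user@<login-node-public-ip>
```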
260 | 261 | ## Cleanup 262 | 263 | To cleanup the provisioned resources follow these steps: 264 | 265 | 1. Delete PCS Queues 266 | 267 | ```bash 268 | aws pcs delete-queue --cluster-identifier --queue-identifier cpu-queue 269 | aws pcs delete-queue --cluster-identifier --queue-identifier single-gpu-queue 270 | aws pcs delete-queue --cluster-identifier --queue-identifier multi-gpu-queue 271 | ``` 272 | 273 | 2. Delete Node Groups 274 | 275 | ```bash 276 | aws pcs delete-compute-node-group --cluster-identifier --compute-node-group-identifier compute-cpu 277 | aws pcs delete-compute-node-group --cluster-identifier --compute-node-group-identifier compute-single-gpu 278 | aws pcs delete-compute-node-group --cluster-identifier --compute-node-group-identifier compute-multi-gpu 279 | ``` 280 | 281 | 3. Delete CloudFormation Stack 282 | 283 | ```bash 284 | aws cloudformation delete-stack --stack-name 285 | ``` 286 | 287 | ## FAQ, known issues, additional considerations, and limitations 288 | 289 | ### AWS ParallelCluster 290 | 291 | AWS ParallelCluster offers an alternative deployment method for running CryoSPARC workloads. AWS ParallelCluster might be preferred when you need more granular control over your HPC infrastructure or require customized configurations that aren't available in PCS. It offers greater flexibility in cluster customization, including the ability to modify the underlying infrastructure, customize AMIs, and implement specific security configurations. 292 | 293 | In such situations where AWS ParallelCluster may be preferred, an AWS guidance is available [here](ParallelClusterREADME.md). 294 | 295 | ## Notices 296 | 297 | _Customers are responsible for making their own independent assessment of the information in this Guidance. This Guidance: (a) is for informational purposes only, (b) represents AWS current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. AWS responsibilities and liabilities to its customers are controlled by AWS agreements, and this Guidance is not part of, nor does it modify, any agreement between AWS and its customers._ 298 | 299 | ## License 300 | 301 | This library is licensed under the MIT-0 License. See the LICENSE file. 302 | 303 | ## Authors 304 | 305 | - Marissa Powers 306 | - Juan Perin 307 | - Rye Robinson 308 | -------------------------------------------------------------------------------- /source/parallel-cluster-post-install.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | ## Args: 4 | # argv1 - CRYOSPARC_LICENSE_ID (required) 5 | # argv2 - CRYOSPARC_INSTALL_PATH (default: /shared/cryosparc) 6 | # argv3 - CUDA_INSTALL_PATH (default: /shared/cuda) 7 | # argv4 - CUDA_VERSION (default 11.3.1) 8 | # argv5 - CUDA_LONG_VERSION (default: 11.3.1_465.19.01) 9 | # argv6 - PROJECT_DATA_PATH (default: /fsx) 10 | 11 | set +e 12 | # Log script output to a file to reference later 13 | exec &> >(tee -a "/tmp/post_install.log") 14 | 15 | . "/etc/parallelcluster/cfnconfig" 16 | 17 | # Get the local commands to run yum and apt 18 | YUM_CMD=$(which yum || echo "") 19 | APT_GET_CMD=$(which apt-get || echo "") 20 | 21 | # If we have yum installed, use it to install prerequisites. 
If not, use apt 22 | if [[ -n $YUM_CMD ]]; then 23 | wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm -P /tmp 24 | yum install -y /tmp/epel-release-latest-7.noarch.rpm 25 | 26 | yum install -y perl-Switch python3 python3-pip links 27 | user_test=$(getent passwd ec2-user) 28 | if [[ -n "${user_test}" ]]; then 29 | OSUSER=ec2-user 30 | OSGROUP=ec2-user 31 | else 32 | OSUSER=centos 33 | OSGROUP=centos 34 | fi 35 | elif [[ -n $APT_GET_CMD ]]; then 36 | apt-get update 37 | apt-get install -y libswitch-perl python3 python3-pip links 38 | OSUSER=ubuntu 39 | OSGROUP=ubuntu 40 | else 41 | # If we don't have yum or apt, we couldn't install the prerequisites, so exit 42 | echo "error can't install package $PACKAGE" 43 | exit 1; 44 | fi 45 | 46 | # Get the cryoSPARC license ID, optional path, and optional versions from the script arguments 47 | CRYOSPARC_LICENSE_ID=$1 48 | CRYOSPARC_INSTALL_PATH=${2:-/shared/cryosparc} 49 | CUDA_INSTALL_PATH=${3:-/shared/cuda} 50 | CUDA_VERSION=${4:-11.3.1} 51 | CUDA_LONG_VERSION=${5:-11.3.1_465.19.01} 52 | PROJECT_DATA_PATH=${6:-/fsx} 53 | 54 | /bin/su -c "mkdir -p ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER} || chmod 777 ${PROJECT_DATA_PATH} 55 | 56 | # Install the AWS CLI 57 | pip3 install --upgrade awscli boto3 58 | 59 | set -e 60 | 61 | # Configure AWS 62 | AWS_DEFAULT_REGION=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone | rev | cut -c 2- | rev) 63 | aws configure set default.region "${AWS_DEFAULT_REGION}" 64 | aws configure set default.output json 65 | 66 | if [[ "$(cat ${CUDA_INSTALL_PATH}/installed_cuda_version.log 2>/dev/null )" == "${CUDA_LONG_VERSION}" ]]; then 67 | echo "Matched previous CUDA version. Using old installer ${CUDA_LONG_VERSION}" 68 | else 69 | echo "Installing new version of CUDA ${CUDA_LONG_VERSION} (this may break cryosparc install)" 70 | # Install CUDA Toolkit 71 | mkdir -p "${CUDA_INSTALL_PATH}" 72 | mkdir -p "${CUDA_INSTALL_PATH}_tmp" 73 | cd "${CUDA_INSTALL_PATH}" || return 74 | wget "https://developer.download.nvidia.com/compute/cuda/${CUDA_VERSION}/local_installers/cuda_${CUDA_LONG_VERSION}_linux.run" 75 | fi 76 | echo "${CUDA_LONG_VERSION}" > ${CUDA_INSTALL_PATH}/installed_cuda_version.log 77 | 78 | sh ${CUDA_INSTALL_PATH}/cuda_"${CUDA_LONG_VERSION}"_linux.run --tmpdir="${CUDA_INSTALL_PATH}_tmp" --defaultroot="${CUDA_INSTALL_PATH}" --toolkit --toolkitpath="${CUDA_INSTALL_PATH}"/"${CUDA_VERSION}" --samples --silent 79 | #rm cuda_"${CUDA_LONG_VERSION}"_linux.run 80 | 81 | # Add CUDA to the path 82 | cat > /etc/profile.d/cuda.sh << 'EOF' 83 | PATH=$PATH:@CUDA_INSTALL_PATH@/@CUDA_VERSION@/bin 84 | EOF 85 | sed -i "s|@CUDA_INSTALL_PATH@|${CUDA_INSTALL_PATH}|g" /etc/profile.d/cuda.sh 86 | sed -i "s|@CUDA_VERSION@|${CUDA_VERSION}|g" /etc/profile.d/cuda.sh 87 | . /etc/profile.d/cuda.sh 88 | 89 | # Add CryoSPARC to the path 90 | cat > /etc/profile.d/cryosparc.sh << 'EOF' 91 | PATH=$PATH:@CRYOSPARC_INSTALL_PATH@/cryosparc_master/bin 92 | EOF 93 | sed -i "s|@CRYOSPARC_INSTALL_PATH@|${CRYOSPARC_INSTALL_PATH}|g" /etc/profile.d/cryosparc.sh 94 | . /etc/profile.d/cryosparc.sh 95 | 96 | # Condition checks whether /etc/profile.d/cryosparc.sh activated previously install cryosparc 97 | # if not, then we install cryosparc 98 | if [ ! 
-x "$(command -v "cryosparcm")" ]; then 99 | echo "Installing fresh CryoSPARC" 100 | 101 | # Download cryoSPARC 102 | mkdir -p "${CRYOSPARC_INSTALL_PATH}" 103 | # Need to make sure OSUSER can write to this path 104 | chown ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH} 105 | 106 | cd "${CRYOSPARC_INSTALL_PATH}" || return 107 | [ -f "${CRYOSPARC_INSTALL_PATH}/cryosparc_master.tar.gz" ] || curl -L "https://get.cryosparc.com/download/master-v4.0.3/${CRYOSPARC_LICENSE_ID}" -o cryosparc_master.tar.gz 108 | [ -f "${CRYOSPARC_INSTALL_PATH}/cryosparc_worker.tar.gz" ] || curl -L "https://get.cryosparc.com/download/worker-v4.0.3/${CRYOSPARC_LICENSE_ID}" -o cryosparc_worker.tar.gz 109 | 110 | # Install cryoSPARC main process 111 | tar -xf cryosparc_master.tar.gz 112 | 113 | # cryosparc untars with ownership: 1001:1001 by default. re-align permissions to OSUSER 114 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_master 115 | 116 | # Basic configuration for install 117 | export CRYOSPARC_FORCE_USER=true 118 | export CRYOSPARC_FORCE_HOSTNAME=true 119 | export CRYOSPARC_DISABLE_IMPORT_ON_MASTER=true 120 | 121 | # Install Main 122 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH}/cryosparc_master && ./install.sh --license "${CRYOSPARC_LICENSE_ID}" \ 123 | --hostname "${HOSTNAME}" \ 124 | --dbpath "${CRYOSPARC_INSTALL_PATH}"/cryosparc_db \ 125 | --port 45000 \ 126 | --allowroot \ 127 | --yes" - $OSUSER 128 | 129 | # Enforce configuration long-term 130 | echo "export CRYOSPARC_FORCE_USER=true" >> "${CRYOSPARC_INSTALL_PATH}"/cryosparc_master/config.sh 131 | echo "export CRYOSPARC_FORCE_HOSTNAME=true" >> "${CRYOSPARC_INSTALL_PATH}"/cryosparc_master/config.sh 132 | echo "export CRYOSPARC_DISABLE_IMPORT_ON_MASTER=true" >> "${CRYOSPARC_INSTALL_PATH}"/cryosparc_master/config.sh 133 | 134 | # Ownership of this path determines how cryosparc is started 135 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_master 136 | 137 | # Start cryoSPARC main package 138 | /bin/su -c "${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm start" - ${OSUSER} 139 | 140 | # Install cryoSPARC worker package 141 | cd "${CRYOSPARC_INSTALL_PATH}" || return 142 | tar -xf cryosparc_worker.tar.gz 143 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_worker 144 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH}/cryosparc_worker && ./install.sh --license "${CRYOSPARC_LICENSE_ID}" \ 145 | --cudapath "${CUDA_INSTALL_PATH}/${CUDA_VERSION}" \ 146 | --yes" - $OSUSER 147 | 148 | # Once again, re-align permissions 149 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_worker 150 | 151 | # Start cryoSPARC main package 152 | /bin/su -c "${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm stop" - ${OSUSER} 153 | 154 | else 155 | echo "Restoring CryoSPARC with updated Hostname and refreshing paritition connections" 156 | 157 | # Stop any running cryosparc 158 | systemctl stop cryosparc-supervisor.service || /bin/su -c "${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm stop || echo \"Nothing Running\" " - ${OSUSER} 159 | 160 | # Update hostname to new main 161 | sed -i "s/^\(.*CRYOSPARC_MASTER_HOSTNAME=\"\).*\"/\1$HOSTNAME\"/g" ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/config.sh 162 | 163 | # Once again, re-align permissions for proper start 164 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_master 165 | fi 166 | 167 | # Start cluster 168 | #/bin/su -c "${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm start" - ${OSUSER} 169 | 170 | # Confirm 
Restart CryoSPARC main 171 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm restart" - ${OSUSER} 172 | 173 | # Stop server in anticipation for service 174 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm stop" - ${OSUSER} 175 | 176 | # Create the CryoSPARC Systemd service and start at Boot 177 | eval $(cryosparcm env) 178 | cd "${CRYOSPARC_INSTALL_PATH}/cryosparc_master/systemd" || return 179 | # Final alignment on permissions 180 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/systemd 181 | env "CRYOSPARC_ROOT_DIR=$CRYOSPARC_ROOT_DIR" ./install_services.sh 182 | systemctl start cryosparc-supervisor.service 183 | systemctl restart cryosparc-supervisor.service 184 | systemctl enable cryosparc-supervisor.service 185 | 186 | 187 | # Be tolerant of errors on partitions; we can always come back through admin panel and add later 188 | set +e 189 | echo "Partitions:" 190 | /opt/slurm/bin/scontrol show partitions 191 | 192 | echo "Beginning" 193 | # Create cluster config files 194 | for PARTITION in $( /opt/slurm/bin/scontrol show partitions | grep PartitionName | cut -d'=' -f 2 ) 195 | do 196 | if [ ! -f "${CRYOSPARC_INSTALL_PATH}/${PARTITION}/cluster_info.json" ]; then 197 | echo "Connecting New Partition: ${PARTITION}" 198 | case $PARTITION in 199 | gpu-t4*) 200 | echo "T4 GPU: $PARTITION" 201 | PARTITION_CACHE_PATH="/scratch" 202 | PARTITION_CACHE_RESERVE=10000 203 | PARTITION_CACHE_QUOTA=800000 204 | PARTITION_RAM_GB_MULTIPLIER=2.0 205 | SBATCH_EXTRA="#SBATCH --gres=gpu:{{ num_gpu }}" 206 | ;; 207 | gpu-l4*) 208 | echo "L4 GPU: $PARTITION" 209 | PARTITION_CACHE_PATH="/scratch" 210 | PARTITION_CACHE_RESERVE=10000 211 | PARTITION_CACHE_QUOTA=800000 212 | PARTITION_RAM_GB_MULTIPLIER=2.0 213 | SBATCH_EXTRA="#SBATCH --gres=gpu:{{ num_gpu }}" 214 | ;; 215 | gpu-a100*) 216 | echo "A100 GPU: $PARTITION" 217 | PARTITION_CACHE_PATH="/scratch" 218 | PARTITION_CACHE_RESERVE=10000 219 | PARTITION_CACHE_QUOTA=800000 220 | PARTITION_RAM_GB_MULTIPLIER=2.0 221 | SBATCH_EXTRA="#SBATCH --gres=gpu:{{ num_gpu }}" 222 | ;; 223 | cpu*) 224 | echo "X86: $PARTITION" 225 | PARTITION_CACHE_PATH="/scratch" 226 | PARTITION_CACHE_RESERVE=10000 227 | PARTITION_CACHE_QUOTA=800000 228 | PARTITION_RAM_GB_MULTIPLIER=2.0 229 | SBATCH_EXTRA="#SBATCH --gres=gpu:{{ num_gpu }}" 230 | ;; 231 | gpu-a100-spot*) 232 | echo "A100 GPU SPOT: $PARTITION" 233 | PARTITION_CACHE_PATH="/scratch" 234 | PARTITION_CACHE_RESERVE=10000 235 | PARTITION_CACHE_QUOTA=800000 236 | PARTITION_RAM_GB_MULTIPLIER=2.0 237 | SBATCH_EXTRA="#SBATCH --gres=gpu:{{ num_gpu }}" 238 | ;; 239 | esac 240 | 241 | mkdir -p "${CRYOSPARC_INSTALL_PATH}/${PARTITION}" 242 | cat > "${CRYOSPARC_INSTALL_PATH}/${PARTITION}"/cluster_info.json << EOF 243 | { 244 | "qdel_cmd_tpl": "/opt/slurm/bin/scancel {{ cluster_job_id }}", 245 | "worker_bin_path": "${CRYOSPARC_INSTALL_PATH}/cryosparc_worker/bin/cryosparcw", 246 | "title": "cryosparc-cluster", 247 | "cache_path": "${PARTITION_CACHE_PATH}", 248 | "cache_reserve_mb": ${PARTITION_CACHE_RESERVE}, 249 | "cache_quota_mb": ${PARTITION_CACHE_QUOTA}, 250 | "qinfo_cmd_tpl": "/opt/slurm/bin/sinfo --format='%.42N %.5D %.15P %.8T %.15C %.5c %.10z %.10m %.15G %.9d %40E'", 251 | "qsub_cmd_tpl": "/opt/slurm/bin/sbatch {{ script_path_abs }}", 252 | "qstat_cmd_tpl": "/opt/slurm/bin/squeue -j {{ cluster_job_id }}", 253 | "send_cmd_tpl": "{{ command }}", 254 | "name": "${PARTITION}" 255 | } 256 | EOF 257 | #sinfo 
--format='%.8N %.6D %.10P %.6T %.14C %.5c %.6z %.7m %.7G %.9d %20E' 258 | 259 | cat > "${CRYOSPARC_INSTALL_PATH}/${PARTITION}"/cluster_script.sh << EOF 260 | #!/usr/bin/env bash 261 | #### cryoSPARC cluster submission script template for SLURM 262 | ## Available variables: 263 | ## {{ run_cmd }} - the complete command string to run the job 264 | ## {{ num_cpu }} - the number of CPUs needed 265 | ## {{ num_gpu }} - the number of GPUs needed. 266 | ## Note: the code will use this many GPUs starting from dev id 0 267 | ## the cluster scheduler or this script have the responsibility 268 | ## of setting CUDA_VISIBLE_DEVICES so that the job code ends up 269 | ## using the correct cluster-allocated GPUs. 270 | ## {{ ram_gb }} - the amount of RAM needed in GB 271 | ## {{ job_dir_abs }} - absolute path to the job directory 272 | ## {{ project_dir_abs }} - absolute path to the project dir 273 | ## {{ job_log_path_abs }} - absolute path to the log file for the job 274 | ## {{ worker_bin_path }} - absolute path to the cryosparc worker command 275 | ## {{ run_args }} - arguments to be passed to cryosparcw run 276 | ## {{ project_uid }} - uid of the project 277 | ## {{ job_uid }} - uid of the job 278 | ## {{ job_creator }} - name of the user that created the job (may contain spaces) 279 | ## {{ cryosparc_username }} - cryosparc username of the user that created the job (usually an email) 280 | ## {{ job_type }} - CryoSPARC job type 281 | ## 282 | ## What follows is a simple SLURM script: 283 | 284 | #SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }} 285 | #SBATCH -n {{ num_cpu }} 286 | ${SBATCH_EXTRA} 287 | #SBATCH --partition=${PARTITION} 288 | #SBATCH --mem={{ (ram_gb|float * ram_gb_multiplier|float)|int }}G 289 | #SBATCH --output={{ job_log_path_abs }} 290 | #SBATCH --error={{ job_log_path_abs }} 291 | 292 | {{ run_cmd }} 293 | EOF 294 | 295 | #sed -i "s|@PARTITION@|${PARTITION}|g" "${CRYOSPARC_INSTALL_PATH}"/cluster_script.sh 296 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/${PARTITION} 297 | 298 | # Connect CryoSPARC worker nodes to cluster 299 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH}/${PARTITION} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cluster connect" - ${OSUSER} 300 | 301 | # Individually apply custom_vars 302 | CLICMD=$(cat << EOT 303 | set_scheduler_target_property(hostname="${PARTITION}",key="custom_vars",value={"ram_gb_multiplier": "${PARTITION_RAM_GB_MULTIPLIER}"}) 304 | EOT 305 | ) 306 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH}/${PARTITION} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cli '$CLICMD' " - ${OSUSER} 307 | echo "Done connecting $PARTITION" 308 | else 309 | echo "Partition already connected to CryoSPARC: ${PARTITION}" 310 | fi 311 | 312 | done 313 | set -e 314 | 315 | # VALIDATE CRYOSPARC 316 | # This stage can be run after cluster creation. 
317 | #echo "Validating lanes"
318 | #/bin/su -c "mkdir -p ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER}
319 | #/bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cluster validate cpu --projects_dir ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER}
320 | #/bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cluster validate gpu-t4 --projects_dir ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER}
321 | #/bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cluster validate gpu-a100 --projects_dir ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER}
322 | #/bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cluster validate gpu-v100 --projects_dir ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER}
323 | echo "Enabling All-or-Nothing"
324 | echo "all_or_nothing_batch = True" >> /etc/parallelcluster/slurm_plugin/parallelcluster_slurm_resume.conf
325 |
326 | set +e
327 | echo "Attempting to update to the latest version...if this fails, you may need to manually update the head and compute nodes"
328 |
329 | # Update to latest version of CryoSPARC
330 | systemctl stop cryosparc-supervisor.service
331 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm update" - ${OSUSER}
332 | # Depends on cryosparcm update to pull latest worker to cryosparc_master dir.
333 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && cp ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/cryosparc_worker.tar.gz ${CRYOSPARC_INSTALL_PATH}/cryosparc_worker/cryosparc_worker.tar.gz" - ${OSUSER}
334 | # Only update workers if they were installed (continue otherwise)
335 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_worker/bin/cryosparcw update" - ${OSUSER} || true
336 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm stop" - ${OSUSER}
337 | systemctl start cryosparc-supervisor.service
338 | set -e
339 |
340 | # Clean up the .tar.gz files:
341 | rm "${CRYOSPARC_INSTALL_PATH}"/*.tar.gz
342 |
343 | echo "CryoSPARC setup complete"
344 | 
--------------------------------------------------------------------------------
/ParallelClusterREADME.md:
--------------------------------------------------------------------------------
1 | # Guidance for Cryo-EM on AWS ParallelCluster
2 |
3 | - [Guidance for Cryo-EM on AWS ParallelCluster](#guidance-for-cryo-em-on-aws-parallelcluster)
4 | - [Overview](#overview)
5 | - [Cost](#cost)
6 | - [Prerequisites](#prerequisites)
7 | - [Environment](#environment)
8 | - [Supported Regions](#supported-regions)
9 | - [Data Transfer](#data-transfer)
10 | - [CryoSPARC License](#cryosparc-license)
11 | - [Networking and Compute Availability](#networking-and-compute-availability)
12 | - [IAM Permissions](#iam-permissions)
13 | - [Data Export Policy (Optional)](#data-export-policy-optional)
14 | - [Deployment Steps](#deployment-steps)
15 | - [Running the Guidance](#running-the-guidance)
16 | - [Cleanup](#cleanup)
17 | - [FAQ, known issues, additional considerations, and limitations](#faq-known-issues-additional-considerations-and-limitations)
18 | - [AWS Parallel Computing Service (PCS)](#aws-parallel-computing-service-pcs)
19 | - [Notices](#notices)
20 | - [License](#license)
21 | - [Authors](#authors)
22 |
23 | ## Overview
24 |
25 | This guidance demonstrates how to deploy CryoSPARC for cryogenic electron microscopy (Cryo-EM) workloads on AWS ParallelCluster. Cryo-EM enables drug discovery researchers to determine the three-dimensional molecular structures crucial to their work. This solution addresses the challenge of processing terabytes of microscopy data through scalable, heterogeneous computing combined with fast, cost-effective storage.
26 |
27 | Below is the architecture model for this guidance.
28 |
29 | ![Architecture](assets/CryoSPARCParallelClusterArch.png)
30 |
31 | ## Cost
32 |
33 | _You are responsible for the cost of the AWS services used while running this Guidance. As of September 2025, the cost for running this Guidance with the default settings in the US East (N. Virginia) Region is approximately $739.01 per sample. This estimate is based on processing 1 sample (1 TB of data). Cost calculations were derived using the times measured under realistic workload conditions for each instance type._
34 |
35 | Below you can find a cost breakdown for this estimate based on the resources this guidance runs and assuming the aforementioned working periods (1 sample, 1 TB of data).
36 |
37 | | AWS service | Dimensions | Cost [USD] |
38 | | ----------------------------------- | ----------------------------------------- | ---------- |
39 | | Amazon Simple Storage Service (S3) | 1 TB w/ Intelligent-Tiering | $ 23.72 |
40 | | Amazon Elastic File System (EFS) | 100 GB Elastic Throughput | $ 30.00 |
41 | | Amazon FSx for Lustre | 1.2 TB SSD - 250 MBps/TiB | $ 252.35 |
42 | | Amazon Elastic Compute Cloud (EC2) | (Head Node) 1 On-Demand c5a.4xlarge | $ 33.02 |
43 | | Amazon Elastic Compute Cloud (EC2) | (CPU Group) 1 On-Demand c5a.8xlarge | $ 0.16 |
44 | | Amazon Elastic Compute Cloud (EC2) | (Single-GPU Group) 1 On-Demand g6.4xlarge | $ 28.35 |
45 | | Amazon Elastic Compute Cloud (EC2) | (Multi-GPU Group) 1 On-Demand g6.48xlarge | $ 371.41 |
46 |
47 | _We recommend creating a [Budget](https://docs.aws.amazon.com/cost-management/latest/userguide/budgets-managing-costs.html) through [AWS Cost Explorer](https://aws.amazon.com/aws-cost-management/aws-cost-explorer/) to help manage costs. Prices are subject to change. For full details, refer to the pricing webpage for each AWS service used in this Guidance._
48 |
49 | ## Prerequisites
50 |
51 | To run this guidance and use CryoSPARC, you need the following:
52 |
53 | - A CryoSPARC license ([contact Structura Biotechnology to obtain one](https://guide.cryosparc.com/licensing)).
54 | - The [AWS CLI](https://aws.amazon.com/cli/) installed and configured.
55 | - An SSH client.
56 |
57 | ### Environment
58 |
59 | We recommend using [AWS CloudShell](https://aws.amazon.com/cloudshell/) to quickly set up an environment that already has the credentials and command line tools you'll need to get started. [The AWS CloudShell Console](https://console.aws.amazon.com/cloudshell) already has credentials for your AWS account, the AWS CLI, and Python installed. If you're not using CloudShell, make sure you have these installed in your local environment before continuing.
60 |
61 | ### Supported Regions
62 |
63 | Only the following regions are supported for this guidance:
64 |
65 | - US East (N. Virginia)
66 | - US East (Ohio)
67 | - US West (Oregon)
68 |
69 | - Asia Pacific (Singapore)
70 | - Asia Pacific (Sydney)
71 | - Asia Pacific (Tokyo)
72 |
73 | - Europe (Frankfurt)
74 | - Europe (Ireland)
75 | - Europe (London)
76 | - Europe (Stockholm)
77 |
78 | Deploying the guidance in other regions may lead to errors or inconsistent behavior. 
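If you are unsure whether a candidate region offers the GPU instance types this guidance uses, a quick check before you commit to a region can save a failed deployment. The snippet below is a minimal sketch, not part of the deployment steps; `us-east-1` and `g6.4xlarge` are example values you should substitute with your own region and instance type.

```bash
# Check whether a region offers the GPU instance type used by this guidance.
# us-east-1 and g6.4xlarge are example values; substitute your own.
aws ec2 describe-instance-type-offerings \
  --location-type region \
  --region us-east-1 \
  --filters Name=instance-type,Values=g6.4xlarge \
  --query "InstanceTypeOfferings[*].[InstanceType,Location]" \
  --output table
```

An empty result means the instance type is not offered in that region, and you should choose one of the supported regions listed above.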
79 |
80 | ### Data Transfer
81 |
82 | Create a new S3 bucket for your input data.
83 |
84 | The data transfer mechanism to move data from instruments into S3 depends on the connectivity in the lab environment and the volume of data to be transferred. We recommend [AWS DataSync](https://aws.amazon.com/datasync/), which automates secure data transfer from on-premises into the cloud with minimal development effort. [Storage Gateway File Gateway](https://aws.amazon.com/storagegateway/file/) is another viable option, especially if lab connectivity is limited or continued two-way access from on-premises to the transferred data sets is required. Both DataSync and Storage Gateway [can be bandwidth throttled](https://docs.aws.amazon.com/datasync/latest/userguide/working-with-task-executions.html) to protect business-critical, non-HPC network traffic.
85 |
86 | Alternatively, you can use the [AWS S3 CLI](https://docs.aws.amazon.com/cli/latest/reference/s3/) to transfer individual files, or use a partner solution to get started quickly.
87 |
88 | ### CryoSPARC License
89 |
90 | First, you'll need to request a license from Structura. It can take a day or two to obtain the license, so request it before you get started. You'll use this license ID to replace the placeholder in the configuration file.
91 |
92 | ### Networking and Compute Availability
93 |
94 | A typical default VPC has public and private subnets balanced across multiple Availability Zones (AZs). However, HPC clusters (like ParallelCluster) usually prefer a single AZ so they can keep communication latency low and use Cluster Placement Groups. For the compute nodes, you can create a large private subnet with a relatively large number of IP addresses. Then, you can create a public subnet with minimal IP addresses, since it will only contain the head node.
95 |
96 | HPC EC2 instances like the [g6 family](https://aws.amazon.com/ec2/instance-types/g6/) aren’t available in every AZ. That means we need to determine which AZ in a given Region has all the compute families we need. We can do that with the [AWS CLI](https://aws.amazon.com/cli/) [describe-instance-type-offerings](https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-instance-type-offerings.html) command. The easiest way to do this is to use [CloudShell](https://aws.amazon.com/cloudshell/), which provides a shell environment ready to issue AWS CLI commands in a few minutes. If you want a more permanent development environment for ParallelCluster CLI calls, you can use [Cloud9](https://aws.amazon.com/cloud9), which provides a persistent IDE environment, including a terminal in which you can run CLI commands. After the environment is provisioned, copy and paste the following command into the shell, replacing `<region>` with your target AWS region (e.g., `us-east-1`).
97 |
98 | ```bash
99 | aws ec2 describe-instance-type-offerings \
100 | --location-type availability-zone \
101 | --region <region> \
102 | --filters Name=instance-type,Values=g6.4xlarge \
103 | --query "InstanceTypeOfferings[*].Location" \
104 | --output text
105 | ```
106 |
107 | Using the output showing which AZs have the compute instances you need, you can create your VPC and subnets. Populate the Region, public subnet ID, and private subnet ID inputs in the configuration file.
108 |
109 | You’ll also need to [create an EC2 SSH key pair](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/create-key-pairs.html) so that you can SSH into the head node once your cluster has been deployed, and populate the key pair name input in the configuration file. 
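Before moving on, it can be worth confirming that the subnets you created landed in the Availability Zone you identified and that the public subnet auto-assigns public IPv4 addresses (the troubleshooting note in the deployment steps calls this out as a cause of failed head node launches). The following is a minimal sketch; the subnet IDs are placeholders for illustration only.

```bash
# Verify the Availability Zone and public IPv4 auto-assignment of your subnets.
# The subnet IDs below are placeholders; replace them with your own.
aws ec2 describe-subnets \
  --subnet-ids subnet-0123456789abcdef0 subnet-0fedcba9876543210 \
  --query "Subnets[*].[SubnetId,AvailabilityZone,MapPublicIpOnLaunch]" \
  --output table
```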
110 |
111 | ### IAM Permissions
112 |
113 | While ParallelCluster creates its own least-privilege roles and policies by default, many enterprises limit their AWS account users’ access to IAM actions. ParallelCluster also supports using pre-created IAM resources, which you can request from your IT services team. The required permissions and roles are [provided in the ParallelCluster documentation](https://docs.aws.amazon.com/parallelcluster/latest/ug/iam-roles-in-parallelcluster-v3.html).
114 |
115 | Use [parallel-cluster-cryosparc.yaml](./deployment/parallel-cluster-cryosparc.yaml) if your account allows ParallelCluster to create new IAM roles and policies.
116 |
117 | Use [parallel-cluster-cryosparc-custom-roles.yaml](./deployment/parallel-cluster-cryosparc-custom-roles.yaml) if your account restricts creation of new IAM resources; this template includes additional IAM fields to help you get started quickly.
118 |
119 | If using custom roles, refer to the ParallelCluster IAM documentation for the required permissions.
120 |
121 | Note: In ParallelCluster 3.4+, the config file accepts either S3Access or InstanceRole parameters, but not both. Ensure your roles have S3 access in addition to the policies outlined in the documentation.
122 |
123 | ### Data Export Policy (Optional)
124 |
125 | If you want to automatically export data back to Amazon S3 after job completion, you'll need to attach the IAM policy created by [FSxLustreDataRepoTasksPolicy.yaml](./deployment/FSxLustreDataRepoTasksPolicy.yaml) to the head node's instance profile.
126 |
127 | ## Deployment Steps
128 |
129 | 1. **Clone the repository containing the ParallelCluster configuration files**
130 |
131 | ```bash
132 | git clone https://github.com/aws-samples/cryoem-on-aws-parallel-cluster.git
133 | ```
134 |
135 | 2. **Navigate to the repository folder**
136 |
137 | ```bash
138 | cd cryoem-on-aws-parallel-cluster
139 | ```
140 |
141 | 3. **Identify available Availability Zones for your required instance types**
142 |
143 | Use AWS CloudShell or Cloud9 to run the following command (replace `<region>` with your target AWS region, e.g., `us-east-1`):
144 |
145 | ```bash
146 | aws ec2 describe-instance-type-offerings \
147 | --location-type availability-zone \
148 | --region <region> \
149 | --filters Name=instance-type,Values=g6.4xlarge \
150 | --query "InstanceTypeOfferings[*].Location" \
151 | --output text
152 | ```
153 |
154 | This command identifies which Availability Zones support g6 instances. Note the output for use in the next step.
155 |
156 | 4. **Create VPC and subnets in the identified Availability Zone**
157 |
158 | Create a VPC with:
159 | - One small public subnet (for the head node)
160 | - One large private subnet (for compute nodes, with a relatively large number of IP addresses)
161 |
162 | Ensure your public subnet is configured to automatically assign IPv4 addresses and has DNS enabled.
163 |
164 | Capture the subnet IDs using the following command (replace `<vpc-id>` with the ID of the VPC you created):
165 |
166 | ```bash
167 | aws ec2 describe-subnets --filters "Name=vpc-id,Values=<vpc-id>" --query "Subnets[*].[SubnetId,CidrBlock,AvailabilityZone]" --output table
168 | ```
169 |
170 | 5. **Create an EC2 SSH key pair**
171 |
172 | ```bash
173 | aws ec2 create-key-pair --key-name cryosparc-cluster-key --query 'KeyMaterial' --output text > cryosparc-cluster-key.pem
174 | chmod 400 cryosparc-cluster-key.pem
175 | ```
176 |
177 | This creates a key pair and saves the private key locally. The key name will be used in the configuration file.
178 |
179 | 6. **Create an S3 bucket for ParallelCluster artifacts**
180 |
181 | ```bash
182 | aws s3 mb s3://cryosparc-parallel-cluster-<account-id> --region <region>
183 | ```
184 |
185 | Replace `<account-id>` with your AWS account ID and `<region>` with your target region.
186 |
187 | 7. **Edit the ParallelCluster configuration file**
188 |
189 | Open either `parallel-cluster-cryosparc.yaml` or `parallel-cluster-cryosparc-custom-roles.yaml` and replace the placeholder values for the following:
190 | - Your CryoSPARC license ID from Structura
191 | - Your AWS region (e.g., `us-east-1`)
192 | - The subnet ID for your public subnet
193 | - The subnet ID for your private subnet
194 | - The name of your SSH key pair (e.g., `cryosparc-cluster-key`)
195 |
196 | Also review how the configuration deploys multiple tiers of instances across the compute node groups.
197 |
198 | 8. **Upload the configuration file and post-install script to S3**
199 |
200 | ```bash
201 | aws s3 cp parallel-cluster-cryosparc.yaml s3://cryosparc-parallel-cluster-<account-id>/
202 | aws s3 cp parallel-cluster-post-install.sh s3://cryosparc-parallel-cluster-<account-id>/
203 | ```
204 |
205 | 9. **Install AWS ParallelCluster in a Python virtual environment**
206 | ```bash
207 | python3 -m venv pcluster-venv
208 | source pcluster-venv/bin/activate
209 | pip install --upgrade pip
210 | pip install aws-parallelcluster
211 | ```
212 | 10. **Install Node Version Manager and an LTS Node.js version**
213 |
214 | ```bash
215 | curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.38.0/install.sh | bash
216 | chmod ug+x ~/.nvm/nvm.sh
217 | source ~/.nvm/nvm.sh
218 | nvm install --lts
219 | node --version
220 | ```
221 |
222 | 11. **Verify ParallelCluster installation**
223 |
224 | ```bash
225 | pcluster version
226 | ```
227 |
228 | This confirms the ParallelCluster CLI is properly installed.
229 |
230 | 12. **Copy the ParallelCluster configuration file from S3**
231 |
232 | ```bash
233 | aws s3api get-object --bucket cryosparc-parallel-cluster-<account-id> --key parallel-cluster-cryosparc.yaml parallel-cluster-cryosparc.yaml
234 | ```
235 |
236 | 13. **Create the ParallelCluster**
237 |
238 | ```bash
239 | pcluster create-cluster --cluster-name cryosparc-cluster --cluster-configuration parallel-cluster-cryosparc.yaml
240 | ```
241 |
242 | This command initiates the cluster creation process using AWS CloudFormation.
243 |
244 | 14. **Monitor cluster creation status**
245 |
246 | ```bash
247 | pcluster describe-cluster --cluster-name cryosparc-cluster
248 | ```
249 |
250 | Alternatively, monitor progress in the [AWS CloudFormation console](https://console.aws.amazon.com/cloudformation/).
251 |
252 | The cluster is ready when the status shows `CREATE_COMPLETE`.
253 |
254 | 15. **Capture the head node instance ID (once the cluster is created)**
255 |
256 | ```bash
257 | aws cloudformation describe-stack-resources --stack-name cryosparc-cluster --query "StackResources[?LogicalResourceId=='HeadNode'].PhysicalResourceId" --output text
258 | ```
259 |
260 | 16. **Capture the head node public IP address**
261 |
262 | ```bash
263 | pcluster describe-cluster --cluster-name cryosparc-cluster --query "headNode.publicIpAddress" --output text
264 | ```
265 |
266 | **Troubleshooting:** If the stack rolls back due to a failure:
267 |
268 | - Verify your public subnet automatically assigns IPv4 addresses and has DNS enabled
269 | - Re-create the cluster with the `--rollback-on-failure false` flag to preserve resources for troubleshooting:
270 | ```bash
271 | pcluster create-cluster --cluster-name cryosparc-cluster --cluster-configuration parallel-cluster-cryosparc.yaml --rollback-on-failure false
272 | ```
273 | - Check the HeadNode system logs in the EC2 console: Select the instance → Actions → Monitor and troubleshoot → Get system log
274 |
275 | ## Running the Guidance
276 |
277 | Once your cluster has been deployed and provisioned, you are ready to continue using AWS ParallelCluster to run CryoSPARC jobs as described in the [CryoSPARC documentation](https://guide.cryosparc.com/setup-configuration-and-management/cryosparc-on-aws).
278 |
279 | ## Cleanup
280 |
281 | To clean up your cluster, use ParallelCluster's `delete-cluster` command to de-provision the underlying resources in your cluster.
282 |
283 | ```bash
284 | pcluster delete-cluster --cluster-name cryosparc-cluster
285 | ```
286 |
287 | Once the cluster has been deleted, you can delete the files you uploaded to S3 and the S3 bucket itself, along with the data transfer solution you chose in the prerequisites section.
288 |
289 | ## FAQ, known issues, additional considerations, and limitations
290 |
291 | ### AWS Parallel Computing Service (PCS)
292 |
293 | AWS Parallel Computing Service (PCS) offers an alternative deployment method for running CryoSPARC workloads. PCS might be preferred when you want a fully managed experience with less operational overhead, faster setup, and easier scaling compared to managing infrastructure manually. It abstracts much of the complexity of HPC cluster management while still allowing you to run large-scale distributed workloads.
294 |
295 | If AWS PCS is the better fit for your needs, a companion AWS guidance is available [here](PCSREADME.md).
296 |
297 | The post-install sample code referenced by the Scalable Cryo-EM on AWS Parallel Computing Service (PCS) guidance, which runs on the login node, is available in `source/pcs-cryosparc-post-install.sh`. The full architecture for that guidance is shown below.
298 |
299 | ![CryoSPARC on PCS Architecture](assets/cryoemPCSarchitecture.png)
300 |
301 | ## Notices
302 |
303 | _Customers are responsible for making their own independent assessment of the information in this Guidance. This Guidance: (a) is for informational purposes only, (b) represents AWS current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. AWS responsibilities and liabilities to its customers are controlled by AWS agreements, and this Guidance is not part of, nor does it modify, any agreement between AWS and its customers._
304 |
305 | ## License
306 |
307 | This library is licensed under the MIT-0 License. See the LICENSE file. 
308 | 309 | ## Authors 310 | 311 | - Natalie White 312 | - Brian Skjerven 313 | --------------------------------------------------------------------------------