├── .gitignore
├── .gitmodules
├── assets
│   ├── cfnstackhpctemplate.png
│   ├── cryoemPCSarchitecture.png
│   ├── cryosparcsigninpage.png
│   └── CryoSPARCParallelClusterArch.png
├── CODE_OF_CONDUCT.md
├── LICENSE
├── deployment
│   ├── FSxLustreDataRepoTasksPolicy.yaml
│   ├── parallel-cluster-cryosparc.yaml
│   └── parallel-cluster-cryosparc-custom-roles.yaml
├── README.md
├── CONTRIBUTING.md
├── source
│   ├── pcs-cryosparc-post-install.sh
│   └── parallel-cluster-post-install.sh
├── PCSREADME.md
└── ParallelClusterREADME.md

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | 
2 | .DS_Store
3 | 

--------------------------------------------------------------------------------
/.gitmodules:
--------------------------------------------------------------------------------
1 | [submodule "deployment/aws-hpc-recipe"]
2 |     path = deployment/aws-hpc-recipe
3 |     url = https://github.com/aws-samples/aws-hpc-recipes.git
4 | 

--------------------------------------------------------------------------------
/assets/cfnstackhpctemplate.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-solutions-library-samples/guidance-for-scalable-cryo-em-on-aws-parallel-computing-service/HEAD/assets/cfnstackhpctemplate.png

--------------------------------------------------------------------------------
/assets/cryoemPCSarchitecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-solutions-library-samples/guidance-for-scalable-cryo-em-on-aws-parallel-computing-service/HEAD/assets/cryoemPCSarchitecture.png

--------------------------------------------------------------------------------
/assets/cryosparcsigninpage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-solutions-library-samples/guidance-for-scalable-cryo-em-on-aws-parallel-computing-service/HEAD/assets/cryosparcsigninpage.png

--------------------------------------------------------------------------------
/assets/CryoSPARCParallelClusterArch.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-solutions-library-samples/guidance-for-scalable-cryo-em-on-aws-parallel-computing-service/HEAD/assets/CryoSPARCParallelClusterArch.png

--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | ## Code of Conduct
2 | 
3 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
4 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
5 | opensource-codeofconduct@amazon.com with any additional questions or comments.
6 | 

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT No Attribution
2 | 
3 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy of 6 | this software and associated documentation files (the "Software"), to deal in 7 | the Software without restriction, including without limitation the rights to 8 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 9 | the Software, and to permit persons to whom the Software is furnished to do so. 10 | 11 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 12 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 13 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 14 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 15 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 16 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 17 | -------------------------------------------------------------------------------- /deployment/FSxLustreDataRepoTasksPolicy.yaml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: 2010-09-09 2 | Description: > 3 | This Cloudformation template builds the necessary IAM policies for creating data repository tasks for FSx. (AWS SID: SO9634) 4 | Resources: 5 | DataRepoTaskIamPolicy: 6 | Type: "AWS::IAM::ManagedPolicy" 7 | Properties: 8 | PolicyDocument: 9 | Version: "2012-10-17" 10 | Statement: 11 | - Sid: "DataRepoTaskAdmin" 12 | Effect: Allow 13 | Action: 14 | - "fsx:CreateDataRepositoryTask" 15 | - "fsx:CancelDataRepositoryTask" 16 | Resource: 17 | - !Sub "arn:aws:fsx:${AWS::Region}:${AWS::AccountId}:file-system/${FsxId}" 18 | - !Sub "arn:aws:fsx:${AWS::Region}:${AWS::AccountId}:task/*" 19 | - Sid: "DataRepoTaskRead" 20 | Effect: Allow 21 | Action: 22 | - "fsx:DescribeDataRepositoryTasks" 23 | Resource: 24 | - "arn:aws:fsx:${AWS::Region}:${AWS::AccountId}:*" 25 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Guidance for Cryo-EM on AWS Parallel Cluster and AWS Parallel Computing Service 2 | 3 | ## Introduction 4 | 5 | This guide provides orientation for running Cryo-EM workflows on AWS. AWS offers two different parallel computing options: AWS ParallelCluster and AWS Parallel Computing Service (PCS). Each has strengths depending on workload complexity, ease of management, and scaling needs. 6 | 7 | ## Overview 8 | 9 | ### AWS ParallelCluster 10 | 11 | An open-source cluster management tool that provisions and manages HPC clusters on AWS. 12 | 13 | Offers fine-grained control over compute environments, schedulers (Slurm, etc.), networking and data management. 14 | 15 | Best for reproducible environments, and tightly coupled workloads. 16 | 17 | For detailed guidance on running Cryo-EM workloads using CryoSPARC with AWS ParallelCluster, refer to the Guidance [README](ParallelClusterREADME.md). 18 | 19 | Below is the architecture for the AWS Parallel Cluster Guidance. 20 | 21 | ![ParallelClusterArchitecture](assets/CryoSPARCParallelClusterArch.png) 22 | 23 | ### AWS Parallel Computing Service (PCS) 24 | 25 | A managed service for running parallel workloads without needing to manage the underlying infrastructure. 26 | 27 | Ideal for on-demand scaling. 28 | 29 | Lower operational burden, more “serverless” style. 
30 | 31 | For detailed guidance on running Cryo-EM workloads using CryoSPARC with AWS Parallel Computing Service (PCS), refer to the Guidance [README](PCSREADME.md). 32 | 33 | Below is the architecture for the AWS PCS Guidance. 34 | 35 | ![PCSArchitecture](assets/cryoemPCSarchitecture.png) 36 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | ## Reporting Bugs/Feature Requests 10 | 11 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 12 | 13 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 14 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 15 | 16 | - A reproducible test case or series of steps 17 | - The version of our code being used 18 | - Any modifications you've made relevant to the bug 19 | - Anything unusual about your environment or deployment 20 | 21 | ## Contributing via Pull Requests 22 | 23 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 24 | 25 | 1. You are working against the latest source on the _main_ branch. 26 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 27 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 28 | 29 | To send us a pull request, please: 30 | 31 | 1. Fork the repository. 32 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 33 | 3. Ensure local tests pass. 34 | 4. Commit to your fork using clear commit messages. 35 | 5. Send us a pull request, answering any default questions in the pull request interface. 36 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 37 | 38 | GitHub provides additional document on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 39 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 40 | 41 | ## Finding contributions to work on 42 | 43 | Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 44 | 45 | ## Code of Conduct 46 | 47 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 48 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 49 | opensource-codeofconduct@amazon.com with any additional questions or comments. 
50 | 51 | ## Security issue notifications 52 | 53 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. 54 | 55 | ## Licensing 56 | 57 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 58 | -------------------------------------------------------------------------------- /deployment/parallel-cluster-cryosparc.yaml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: 2010-09-09 2 | Description: > 3 | This Cloudformation template builds the architecture for Cryoem on AWS - parallel cluster version of the architecture with the necessary slurm controllers and instance groups. (AWS SID: SO9634) 4 | DevSettings: 5 | Timeouts: 6 | HeadNodeBootstrapTimeout: 5400 7 | Region: us-east-1 8 | Image: 9 | Os: alinux2 10 | HeadNode: 11 | InstanceType: c5a.4xlarge 12 | Networking: 13 | SubnetId: 14 | ElasticIp: true 15 | SecurityGroups: 16 | - 17 | Ssh: 18 | KeyName: 19 | LocalStorage: 20 | RootVolume: 21 | Size: 100 22 | EphemeralVolume: 23 | MountDir: /scratch 24 | Dcv: 25 | Enabled: true 26 | CustomActions: 27 | OnNodeConfigured: 28 | Sequence: 29 | - Script: s3:///parallel-cluster-post-install.sh 30 | Args: 31 | - 32 | - /shared/cryosparc 33 | - /shared/cuda 34 | - 11.8.0 35 | - 11.8.0_520.61.05 36 | - /fsx 37 | Iam: 38 | S3Access: 39 | - BucketName: 40 | 41 | Scheduling: 42 | Scheduler: slurm 43 | SlurmQueues: 44 | - Name: cpu 45 | CapacityType: ONDEMAND 46 | ComputeResources: 47 | - Name: c5a-8xlarge 48 | InstanceType: c5a.8xlarge 49 | MinCount: 0 50 | MaxCount: 20 51 | DisableSimultaneousMultithreading: true 52 | Efa: 53 | Enabled: false 54 | Networking: 55 | SubnetIds: 56 | - 57 | SecurityGroups: 58 | - 59 | PlacementGroup: 60 | Enabled: true 61 | ComputeSettings: 62 | LocalStorage: 63 | EphemeralVolume: 64 | MountDir: /scratch 65 | 66 | - Name: single-gpu 67 | CapacityType: ONDEMAND 68 | ComputeResources: 69 | - Name: g6-4xlarge 70 | InstanceType: g6.4xlarge 71 | MinCount: 0 72 | MaxCount: 20 73 | DisableSimultaneousMultithreading: true 74 | Efa: 75 | Enabled: false 76 | Networking: 77 | SubnetIds: 78 | - 79 | SecurityGroups: 80 | - 81 | PlacementGroup: 82 | Enabled: true 83 | ComputeSettings: 84 | LocalStorage: 85 | EphemeralVolume: 86 | MountDir: /scratch 87 | 88 | - Name: multi-gpu 89 | CapacityType: ONDEMAND 90 | ComputeResources: 91 | - Name: g6-48xlarge 92 | InstanceType: g6.48xlarge 93 | MinCount: 0 94 | MaxCount: 20 95 | DisableSimultaneousMultithreading: true 96 | Efa: 97 | Enabled: true 98 | Networking: 99 | SubnetIds: 100 | - 101 | SecurityGroups: 102 | - 103 | PlacementGroup: 104 | Enabled: true 105 | ComputeSettings: 106 | LocalStorage: 107 | EphemeralVolume: 108 | MountDir: /scratch 109 | 110 | SharedStorage: 111 | - Name: cryosparc-ebs 112 | StorageType: Ebs 113 | MountDir: /shared 114 | EbsSettings: 115 | Encrypted: true 116 | VolumeType: gp3 117 | Size: 100 118 | 119 | - Name: cryosparc-fsx 120 | StorageType: FsxLustre 121 | MountDir: /fsx 122 | FsxLustreSettings: 123 | AutoImportPolicy: NEW_CHANGED 124 | StorageCapacity: 1024 125 | DeploymentType: PERSISTENT_2 126 | ImportedFileChunkSize: 1024 127 | PerUnitStorageThroughput: 250 128 | ImportPath: s3:// 129 | 130 | Monitoring: 131 | Dashboards: 132 | CloudWatch: 133 | Enabled: true 134 | 
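#
# Deployment sketch (illustrative): the body of this file follows the AWS
# ParallelCluster 3 cluster-configuration schema, and one common way to deploy
# such a configuration is with the `pcluster` CLI. The commands below are a
# minimal sketch under stated assumptions: the CloudFormation-style header lines
# may need to be removed, the empty fields (SubnetId, SecurityGroups, KeyName,
# S3 bucket, ImportPath) must be filled in with your own values first, and the
# cluster name is a placeholder.
#
#   pip3 install --upgrade aws-parallelcluster          # installs the pcluster CLI
#   pcluster create-cluster \
#       --cluster-name cryosparc-cluster \
#       --cluster-configuration parallel-cluster-cryosparc.yaml
#   pcluster describe-cluster --cluster-name cryosparc-cluster   # poll until CREATE_COMPLETE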
-------------------------------------------------------------------------------- /deployment/parallel-cluster-cryosparc-custom-roles.yaml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: 2010-09-09 2 | Description: > 3 | This Cloudformation template builds the architecture for Cryoem on AWS - parallel cluster version of the architecture with the necessary slurm controllers and instance groups, this version is if your account has restrictions for the creation of new IAM resources. (AWS SID: SO9634) 4 | Region: 5 | Image: 6 | Os: alinux2 7 | Iam: 8 | Roles: 9 | LambdaFunctionsRole: 10 | HeadNode: 11 | InstanceType: c5a.4xlarge 12 | Networking: 13 | SubnetId: 14 | ElasticIp: true 15 | SecurityGroups: 16 | - 17 | Ssh: 18 | KeyName: 19 | LocalStorage: 20 | RootVolume: 21 | Size: 100 22 | EphemeralVolume: 23 | MountDir: /scratch 24 | Dcv: 25 | Enabled: true 26 | CustomActions: 27 | OnNodeConfigured: 28 | Script: s3:///parallel-cluster-post-install.sh 29 | Args: 30 | - s3:///parallel-cluster-post-install.sh 31 | - 32 | Iam: 33 | InstanceRole: 34 | 35 | Scheduling: 36 | Scheduler: slurm 37 | SlurmQueues: 38 | - Name: cpu 39 | Iam: 40 | InstanceRole: 41 | CapacityType: ONDEMAND 42 | ComputeResources: 43 | - Name: c5a-8xlarge 44 | InstanceType: c5a.8xlarge 45 | MinCount: 0 46 | MaxCount: 20 47 | DisableSimultaneousMultithreading: true 48 | Efa: 49 | Enabled: false 50 | Networking: 51 | SubnetIds: 52 | - 53 | SecurityGroups: 54 | - 55 | PlacementGroup: 56 | Enabled: true 57 | ComputeSettings: 58 | LocalStorage: 59 | EphemeralVolume: 60 | MountDir: /scratch 61 | 62 | - Name: single-gpu 63 | Iam: 64 | InstanceRole: 65 | CapacityType: ONDEMAND 66 | ComputeResources: 67 | - Name: g6-4xlarge 68 | InstanceType: g6.4xlarge 69 | MinCount: 0 70 | MaxCount: 10 71 | DisableSimultaneousMultithreading: true 72 | Efa: 73 | Enabled: false 74 | Networking: 75 | SubnetIds: 76 | - 77 | SecurityGroups: 78 | - 79 | PlacementGroup: 80 | Enabled: true 81 | ComputeSettings: 82 | LocalStorage: 83 | EphemeralVolume: 84 | MountDir: /scratch 85 | 86 | - Name: multi-gpu 87 | Iam: 88 | InstanceRole: 89 | CapacityType: ONDEMAND 90 | ComputeResources: 91 | - Name: g6-48xlarge 92 | InstanceType: g6.48xlarge 93 | MinCount: 0 94 | MaxCount: 10 95 | DisableSimultaneousMultithreading: true 96 | Efa: 97 | Enabled: true 98 | Networking: 99 | SubnetIds: 100 | - 101 | SecurityGroups: 102 | - 103 | PlacementGroup: 104 | Enabled: true 105 | ComputeSettings: 106 | LocalStorage: 107 | EphemeralVolume: 108 | MountDir: /scratch 109 | 110 | SharedStorage: 111 | - Name: cryosparc-ebs 112 | StorageType: Ebs 113 | MountDir: /shared 114 | EbsSettings: 115 | Encrypted: true 116 | VolumeType: gp3 117 | Size: 100 118 | 119 | - Name: cryosparc-fsx 120 | StorageType: FsxLustre 121 | MountDir: /fsx 122 | FsxLustreSettings: 123 | AutoImportPolicy: NEW_CHANGED 124 | StorageCapacity: 1024 125 | DeploymentType: PERSISTENT_2 126 | ImportedFileChunkSize: 1024 127 | PerUnitStorageThroughput: 250 128 | ImportPath: s3:// 129 | 130 | Monitoring: 131 | Dashboards: 132 | CloudWatch: 133 | Enabled: true 134 | -------------------------------------------------------------------------------- /source/pcs-cryosparc-post-install.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | ## Args: 4 | # argv1 - CRYOSPARC_LICENSE_ID (required) 5 | # argv2 - CRYOSPARC_INSTALL_PATH (default: /shared/cryosparc) 6 | # argv3 - CUDA_INSTALL_PATH (default: /shared/cuda) 7 | 
# argv4 - CUDA_VERSION (default 11.3.1) 8 | # argv5 - CUDA_LONG_VERSION (default: 11.3.1_465.19.01) 9 | # argv6 - PROJECT_DATA_PATH (default: /fsx) 10 | 11 | # Get the local commands to run yum and apt 12 | YUM_CMD=$(which yum || echo "") 13 | APT_GET_CMD=$(which apt-get || echo "") 14 | 15 | # If we have yum installed, use it to install prerequisites. If not, use apt 16 | if [[ -n $YUM_CMD ]]; then 17 | wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm -P /tmp 18 | yum install -y /tmp/epel-release-latest-7.noarch.rpm 19 | 20 | yum install -y perl-Switch python3 python3-pip links 21 | user_test=$(getent passwd ec2-user) 22 | if [[ -n "${user_test}" ]]; then 23 | OSUSER=ec2-user 24 | OSGROUP=ec2-user 25 | else 26 | OSUSER=centos 27 | OSGROUP=centos 28 | fi 29 | elif [[ -n $APT_GET_CMD ]]; then 30 | apt-get update 31 | apt-get install -y libswitch-perl python3 python3-pip links 32 | OSUSER=ubuntu 33 | OSGROUP=ubuntu 34 | else 35 | # If we don't have yum or apt, we couldn't install the prerequisites, so exit 36 | echo "error can't install package $PACKAGE" 37 | exit 1; 38 | fi 39 | 40 | # Get the cryoSPARC license ID, optional path, and optional versions from the script arguments 41 | CRYOSPARC_LICENSE_ID=$1 42 | CRYOSPARC_INSTALL_PATH=${2:-/shared/cryosparc} 43 | CUDA_INSTALL_PATH=${3:-/shared/cuda} 44 | CUDA_VERSION=${4:-11.3.1} 45 | CUDA_LONG_VERSION=${5:-11.3.1_465.19.01} 46 | PROJECT_DATA_PATH=${6:-/shared} 47 | 48 | /bin/su -c "mkdir -p ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER} || chmod 777 ${PROJECT_DATA_PATH} 49 | 50 | # Install the AWS CLI 51 | pip3 install --upgrade awscli boto3 52 | 53 | set -e 54 | 55 | #yum -y update 56 | 57 | # Configure AWS 58 | AWS_DEFAULT_REGION=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone | rev | cut -c 2- | rev) 59 | aws configure set default.region "${AWS_DEFAULT_REGION}" 60 | aws configure set default.output json 61 | 62 | if [[ "$(cat ${CUDA_INSTALL_PATH}/installed_cuda_version.log 2>/dev/null )" == "${CUDA_LONG_VERSION}" ]]; then 63 | echo "Matched previous CUDA version. Using old installer ${CUDA_LONG_VERSION}" 64 | else 65 | echo "Installing new version of CUDA ${CUDA_LONG_VERSION} (this may break cryosparc install)" 66 | # Install CUDA Toolkit 67 | mkdir -p "${CUDA_INSTALL_PATH}" 68 | mkdir -p "${CUDA_INSTALL_PATH}_tmp" 69 | cd "${CUDA_INSTALL_PATH}" || return 70 | wget "https://developer.download.nvidia.com/compute/cuda/${CUDA_VERSION}/local_installers/cuda_${CUDA_LONG_VERSION}_linux.run" 71 | fi 72 | echo "${CUDA_LONG_VERSION}" > ${CUDA_INSTALL_PATH}/installed_cuda_version.log 73 | 74 | sh ${CUDA_INSTALL_PATH}/cuda_"${CUDA_LONG_VERSION}"_linux.run --tmpdir="${CUDA_INSTALL_PATH}_tmp" --defaultroot="${CUDA_INSTALL_PATH}" --toolkit --toolkitpath="${CUDA_INSTALL_PATH}"/"${CUDA_VERSION}" --samples --silent 75 | #rm cuda_"${CUDA_LONG_VERSION}"_linux.run 76 | 77 | # Add CUDA to the path 78 | cat > /etc/profile.d/cuda.sh << 'EOF' 79 | PATH=$PATH:@CUDA_INSTALL_PATH@/@CUDA_VERSION@/bin 80 | EOF 81 | sed -i "s|@CUDA_INSTALL_PATH@|${CUDA_INSTALL_PATH}|g" /etc/profile.d/cuda.sh 82 | sed -i "s|@CUDA_VERSION@|${CUDA_VERSION}|g" /etc/profile.d/cuda.sh 83 | . /etc/profile.d/cuda.sh 84 | 85 | # Add CryoSPARC to the path 86 | cat > /etc/profile.d/cryosparc.sh << 'EOF' 87 | PATH=$PATH:@CRYOSPARC_INSTALL_PATH@/cryosparc_master/bin 88 | EOF 89 | sed -i "s|@CRYOSPARC_INSTALL_PATH@|${CRYOSPARC_INSTALL_PATH}|g" /etc/profile.d/cryosparc.sh 90 | . 
/etc/profile.d/cryosparc.sh 91 | 92 | # Condition checks whether /etc/profile.d/cryosparc.sh activated previously install cryosparc 93 | # if not, then we install cryosparc 94 | if [ ! -x "$(command -v "cryosparcm")" ]; then 95 | echo "Installing fresh CryoSPARC" 96 | 97 | # Download cryoSPARC 98 | mkdir -p "${CRYOSPARC_INSTALL_PATH}" 99 | # Need to make sure OSUSER can write to this path 100 | chown ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH} 101 | 102 | cd "${CRYOSPARC_INSTALL_PATH}" || return 103 | [ -f "${CRYOSPARC_INSTALL_PATH}/cryosparc_master.tar.gz" ] || curl -L "https://get.cryosparc.com/download/master-v4.0.3/${CRYOSPARC_LICENSE_ID}" -o cryosparc_master.tar.gz 104 | [ -f "${CRYOSPARC_INSTALL_PATH}/cryosparc_worker.tar.gz" ] || curl -L "https://get.cryosparc.com/download/worker-v4.0.3/${CRYOSPARC_LICENSE_ID}" -o cryosparc_worker.tar.gz 105 | 106 | # Install cryoSPARC main process 107 | tar -xf cryosparc_master.tar.gz 108 | 109 | # cryosparc untars with ownership: 1001:1001 by default. re-align permissions to OSUSER 110 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_master 111 | 112 | # Basic configuration for install 113 | export CRYOSPARC_FORCE_USER=true 114 | export CRYOSPARC_FORCE_HOSTNAME=true 115 | export CRYOSPARC_DISABLE_IMPORT_ON_MASTER=true 116 | 117 | # Install Main 118 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH}/cryosparc_master && ./install.sh --license "${CRYOSPARC_LICENSE_ID}" \ 119 | --hostname "${HOSTNAME}" \ 120 | --dbpath "${CRYOSPARC_INSTALL_PATH}"/cryosparc_db \ 121 | --port 45000 \ 122 | --allowroot \ 123 | --yes" - $OSUSER 124 | 125 | # Enforce configuration long-term 126 | echo "export CRYOSPARC_FORCE_USER=true" >> "${CRYOSPARC_INSTALL_PATH}"/cryosparc_master/config.sh 127 | echo "export CRYOSPARC_FORCE_HOSTNAME=true" >> "${CRYOSPARC_INSTALL_PATH}"/cryosparc_master/config.sh 128 | echo "export CRYOSPARC_DISABLE_IMPORT_ON_MASTER=true" >> "${CRYOSPARC_INSTALL_PATH}"/cryosparc_master/config.sh 129 | 130 | # Ownership of this path determines how cryosparc is started 131 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_master 132 | 133 | # Start cryoSPARC main package 134 | /bin/su -c "${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm start" - ${OSUSER} 135 | 136 | # Install cryoSPARC worker package 137 | cd "${CRYOSPARC_INSTALL_PATH}" || return 138 | tar -xf cryosparc_worker.tar.gz 139 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_worker 140 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH}/cryosparc_worker && ./install.sh --license "${CRYOSPARC_LICENSE_ID}" \ 141 | --cudapath "${CUDA_INSTALL_PATH}/${CUDA_VERSION}" \ 142 | --yes" - $OSUSER 143 | 144 | #rm "${CRYOSPARC_INSTALL_PATH}"/*.tar.gz 145 | 146 | # Once again, re-align permissions 147 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_worker 148 | 149 | # Start cryoSPARC main package 150 | /bin/su -c "${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm stop" - ${OSUSER} 151 | 152 | else 153 | echo "Restoring CryoSPARC with updated Hostname and refreshing paritition connections" 154 | 155 | # Stop any running cryosparc 156 | systemctl stop cryosparc-supervisor.service || /bin/su -c "${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm stop || echo \"Nothing Running\" " - ${OSUSER} 157 | 158 | # Update hostname to new main 159 | sed -i "s/^\(.*CRYOSPARC_MASTER_HOSTNAME=\"\).*\"/\1$HOSTNAME\"/g" ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/config.sh 160 | 161 | # Once again, re-align permissions for proper start 162 | 
chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_master 163 | fi 164 | 165 | # Start cluster 166 | #/bin/su -c "${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm start" - ${OSUSER} 167 | 168 | # Confirm Restart CryoSPARC main 169 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm restart" - ${OSUSER} 170 | 171 | # Stop server in anticipation for service 172 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm stop" - ${OSUSER} 173 | 174 | # Create the CryoSPARC Systemd service and start at Boot 175 | eval $(cryosparcm env) 176 | cd "${CRYOSPARC_INSTALL_PATH}/cryosparc_master/systemd" || return 177 | # Final alignment on permissions 178 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/systemd 179 | env "CRYOSPARC_ROOT_DIR=$CRYOSPARC_ROOT_DIR" ./install_services.sh 180 | systemctl start cryosparc-supervisor.service 181 | systemctl restart cryosparc-supervisor.service 182 | systemctl enable cryosparc-supervisor.service 183 | 184 | 185 | # Be tolerant of errors on partitions; we can always come back through admin panel and add later 186 | set +e 187 | echo "Partitions:" 188 | /opt/slurm/bin/scontrol show partitions 189 | 190 | echo "Beginning" 191 | # Create cluster config files 192 | for PARTITION in $( /opt/slurm/bin/scontrol show partitions | grep PartitionName | cut -d'=' -f 2 ) 193 | do 194 | if [ ! -f "${CRYOSPARC_INSTALL_PATH}/${PARTITION}/cluster_info.json" ]; then 195 | echo "Connecting New Partition: ${PARTITION}" 196 | case $PARTITION in 197 | compute-single-gpu) 198 | echo "L4 GPU: $PARTITION" 199 | PARTITION_CACHE_PATH="/scratch" 200 | PARTITION_CACHE_RESERVE=10000 201 | PARTITION_CACHE_QUOTA=800000 202 | PARTITION_RAM_GB_MULTIPLIER=2.0 203 | SBATCH_EXTRA="#SBATCH --gres=gpu:{{ num_gpu }}" 204 | ;; 205 | compute-multi-gpu) 206 | echo "L4 GPU: $PARTITION" 207 | PARTITION_CACHE_PATH="/scratch" 208 | PARTITION_CACHE_RESERVE=10000 209 | PARTITION_CACHE_QUOTA=800000 210 | PARTITION_RAM_GB_MULTIPLIER=2.0 211 | SBATCH_EXTRA="#SBATCH --gres=gpu:{{ num_gpu }}" 212 | ;; 213 | *) 214 | PARTITION_CACHE_PATH="" 215 | PARTITION_CACHE_RESERVE=10000 216 | PARTITION_CACHE_QUOTA=null 217 | PARTITION_RAM_GB_MULTIPLIER=2.0 218 | SBATCH_EXTRA="" 219 | ;; 220 | esac 221 | 222 | mkdir -p "${CRYOSPARC_INSTALL_PATH}/${PARTITION}" 223 | cat > "${CRYOSPARC_INSTALL_PATH}/${PARTITION}"/cluster_info.json << EOF 224 | { 225 | "qdel_cmd_tpl": "/opt/slurm/bin/scancel {{ cluster_job_id }}", 226 | "worker_bin_path": "${CRYOSPARC_INSTALL_PATH}/cryosparc_worker/bin/cryosparcw", 227 | "title": "cryosparc-cluster", 228 | "cache_path": "${PARTITION_CACHE_PATH}", 229 | "cache_reserve_mb": ${PARTITION_CACHE_RESERVE}, 230 | "cache_quota_mb": ${PARTITION_CACHE_QUOTA}, 231 | "qinfo_cmd_tpl": "/opt/slurm/bin/sinfo --format='%.42N %.5D %.15P %.8T %.15C %.5c %.10z %.10m %.15G %.9d %40E'", 232 | "qsub_cmd_tpl": "/opt/slurm/bin/sbatch {{ script_path_abs }}", 233 | "qstat_cmd_tpl": "/opt/slurm/bin/squeue -j {{ cluster_job_id }}", 234 | "send_cmd_tpl": "{{ command }}", 235 | "name": "${PARTITION}" 236 | } 237 | EOF 238 | #sinfo --format='%.8N %.6D %.10P %.6T %.14C %.5c %.6z %.7m %.7G %.9d %20E' 239 | 240 | cat > "${CRYOSPARC_INSTALL_PATH}/${PARTITION}"/cluster_script.sh << EOF 241 | #!/usr/bin/env bash 242 | #### cryoSPARC cluster submission script template for SLURM 243 | ## Available variables: 244 | ## {{ run_cmd }} - the complete command string to run the job 245 | ## {{ 
num_cpu }} - the number of CPUs needed 246 | ## {{ num_gpu }} - the number of GPUs needed. 247 | ## Note: the code will use this many GPUs starting from dev id 0 248 | ## the cluster scheduler or this script have the responsibility 249 | ## of setting CUDA_VISIBLE_DEVICES so that the job code ends up 250 | ## using the correct cluster-allocated GPUs. 251 | ## {{ ram_gb }} - the amount of RAM needed in GB 252 | ## {{ job_dir_abs }} - absolute path to the job directory 253 | ## {{ project_dir_abs }} - absolute path to the project dir 254 | ## {{ job_log_path_abs }} - absolute path to the log file for the job 255 | ## {{ worker_bin_path }} - absolute path to the cryosparc worker command 256 | ## {{ run_args }} - arguments to be passed to cryosparcw run 257 | ## {{ project_uid }} - uid of the project 258 | ## {{ job_uid }} - uid of the job 259 | ## {{ job_creator }} - name of the user that created the job (may contain spaces) 260 | ## {{ cryosparc_username }} - cryosparc username of the user that created the job (usually an email) 261 | ## {{ job_type }} - CryoSPARC job type 262 | ## 263 | ## What follows is a simple SLURM script: 264 | 265 | #SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }} 266 | #SBATCH -n {{ num_cpu }} 267 | ${SBATCH_EXTRA} 268 | #SBATCH --partition=${PARTITION} 269 | #SBATCH --mem={{ (ram_gb|float * ram_gb_multiplier|float)|int }}G 270 | #SBATCH --output={{ job_log_path_abs }} 271 | #SBATCH --error={{ job_log_path_abs }} 272 | 273 | {{ run_cmd }} 274 | EOF 275 | 276 | #sed -i "s|@PARTITION@|${PARTITION}|g" "${CRYOSPARC_INSTALL_PATH}"/cluster_script.sh 277 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/${PARTITION} 278 | 279 | # Connect CryoSPARC worker nodes to cluster 280 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH}/${PARTITION} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cluster connect" - ${OSUSER} 281 | 282 | # Individually apply custom_vars 283 | CLICMD=$(cat << EOT 284 | set_scheduler_target_property(hostname="${PARTITION}",key="custom_vars",value={"ram_gb_multiplier": "${PARTITION_RAM_GB_MULTIPLIER}"}) 285 | EOT 286 | ) 287 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH}/${PARTITION} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cli '$CLICMD' " - ${OSUSER} 288 | echo "Done connecting $PARTITION" 289 | else 290 | echo "Partition already connected to CryoSPARC: ${PARTITION}" 291 | fi 292 | 293 | done 294 | set -e 295 | 296 | # VALIDATE CRYOSPARC 297 | #echo "Validating lanes" 298 | #/bin/su -c "mkdir -p ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER} 299 | #/bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cluster validate cpu --projects_dir ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER} 300 | #/bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cluster validate gpu-t4 --projects_dir ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER} 301 | #/bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cluster validate gpu-a100 --projects_dir ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER} 302 | #/bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cluster validate gpu-v100 --projects_dir ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER} 303 | 304 | set +e 305 | echo "Attempting last attempt to update to latest version...if this fails, you may need to manually update head and compute" 306 | 307 | # Update to latest version of 
CryoSPARC 308 | systemctl stop cryosparc-supervisor.service 309 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm update" - ${OSUSER} 310 | # Depends on cryosparcm update to pull latest worker to cryosparc_master dir. 311 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && cp ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/cryosparc_worker.tar.gz ${CRYOSPARC_INSTALL_PATH}/cryosparc_worker/cryosparc_worker.tar.gz" - ${OSUSER} 312 | # Only update workers if they were installed (continue otherwise) 313 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_worker/bin/cryosparcw update" - ${OSUSER} || true 314 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm stop" - ${OSUSER} 315 | systemctl start cryosparc-supervisor.service 316 | set -e 317 | 318 | echo "CryoSPARC setup complete" 319 | -------------------------------------------------------------------------------- /PCSREADME.md: -------------------------------------------------------------------------------- 1 | # Guidance for Cryo-EM on AWS Parallel Computing Service 2 | 3 | ## Table of Contents 4 | 5 | - [Guidance for Cryo-EM on AWS Parallel Computing Service](#guidance-for-cryo-em-on-aws-parallel-computing-service) 6 | - [Table of Contents](#table-of-contents) 7 | - [Overview](#overview) 8 | - [Cost](#cost) 9 | - [Prerequisites](#prerequisites) 10 | - [Operating System](#operating-system) 11 | - [Supported Regions](#supported-regions) 12 | - [Data Transfer](#data-transfer) 13 | - [Deployment Steps](#deployment-steps) 14 | - [Running the Guidance](#running-the-guidance) 15 | - [Next Steps](#next-steps) 16 | - [Install ChimeraX for Visualization](#install-chimerax-for-visualization) 17 | - [Cleanup](#cleanup) 18 | - [FAQ, known issues, additional considerations, and limitations](#faq-known-issues-additional-considerations-and-limitations) 19 | - [AWS ParallelCluster](#aws-parallelcluster) 20 | - [Notices](#notices) 21 | - [License](#license) 22 | - [Authors](#authors) 23 | 24 | ## Overview 25 | 26 | This guidance demonstrates how to deploy CryoSPARC for cryogenic electron microscopy (Cryo-EM) workloads on AWS Parallel Computing Service (PCS). Cryo-EM enables drug discovery researchers to determine three-dimensional molecular structures crucial for their research. This solution addresses the challenge of processing terabytes of microscopy data through scalable, heterogeneous computing combined with fast, cost-effective storage. 27 | 28 | Below is the architecture model for this guidance. 29 | 30 | ![Architecturewsteps](assets/cryoemPCSarchitecture.png) 31 | 32 | ## Cost 33 | 34 | _You are responsible for the cost of the AWS services used while running this Guidance. As of September 2025, the cost for running this Guidance with the default settings in the US East (N. Virginia) is approximately $795.98 per sample. This estimate is based on processing 1 sample (1 TB of data). Cost calculations were derived using the times measured under realistic workload conditions for each instance type._ 35 | 36 | Below you can find a cost breakdown for this estimate based on the resources this guidance runs and assuming the aforementioned working periods (1 sample, 1 TB of data). 
37 | 38 | | AWS service | Dimensions | Cost [USD] | 39 | | ---------------------------------- | ----------------------------------------- | ---------- | 40 | | AWS Simple Storage Service (S3) | 1 TB w/ Intelligent Tiering | $ 23.72 | 41 | | Amazon Elastic File Service (EFS) | 100 GB Elastic Throughput | $ 30.00 | 42 | | Amazon FSx for Lustre | 1.2TB SSD - 250 MBps/TiB | $ 252.35 | 43 | | AWS Parallel Compute Service (PCS) | Small Slurm Controller | $ 56.97 | 44 | | Amazon Elastic Compute Cloud (EC2) | (Login Node) 1 On-Demand c5a.4xlarge | $ 33.02 | 45 | | Amazon Elastic Compute Cloud (EC2) | (CPU Group) 1 On-Demand c5a.8xlarge | $ 0.16 | 46 | | Amazon Elastic Compute Cloud (EC2) | (Single-GPU Group) 1 On-Demand g6.4xlarge | $ 28.35 | 47 | | Amazon Elastic Compute Cloud (EC2) | (Multi-GPU Group) 1 On-Demand g6.48xlarge | $ 371.41 | 48 | 49 | _We recommend creating a [Budget](https://docs.aws.amazon.com/cost-management/latest/userguide/budgets-managing-costs.html) through [AWS Cost Explorer](https://aws.amazon.com/aws-cost-management/aws-cost-explorer/) to help manage costs. Prices are subject to change. For full details, refer to the pricing webpage for each AWS service used in this Guidance._ 50 | 51 | ## Prerequisites 52 | 53 | In order to be able to run this guidance and to use CryoSparc you need to have the following: 54 | 55 | - CryoSPARC license ([contact Structura Biotechnology to obtain](https://guide.cryosparc.com/licensing)). 56 | - [AWS CLI](https://aws.amazon.com/cli/) installed and configured. 57 | - See additional [prerequisites](https://docs.aws.amazon.com/pcs/latest/userguide/getting-started_prerequisites.html) from AWS PCS. 58 | - An SSH client. 59 | 60 | ### Operating System 61 | 62 | These deployment instructions are optimized to best work on a Mac or Linux environment. Deployment in Windows may require additional steps for setting up required libraries and CLI. 63 | 64 | ### Supported Regions 65 | 66 | Only the following regions are supported for this guidance: 67 | 68 | - United States (N. Virginia) 69 | - United States (Ohio) 70 | - United States (Oregon) 71 | 72 | - Asia Pacific (Singapore) 73 | - Asia Pacific (Sydney) 74 | - Asia Pacific (Tokyo) 75 | 76 | - Europe (Frankfurt) 77 | - Europe (Ireland) 78 | - Europe (London) 79 | - Europe (Stockholm) 80 | 81 | Deploying the guidance in other regions may lead to errors or inconsistent behavior. 82 | 83 | ### Data Transfer 84 | 85 | Create a new S3 bucket for your input data. 86 | 87 | The data transfer mechanism to move data from instruments into S3 depends on the connectivity in the lab environment and the volume of data to be transferred. We recommend [AWS DataSync](https://aws.amazon.com/datasync/), which automates secure data transfer from on-premises into the cloud with minimal development effort. [Storage Gateway File Gateway](https://aws.amazon.com/storagegateway/file/) is another viable option, especially if lab connectivity is limited or continued two-way access from on-premises to the transferred data sets is required. Both DataSync and Storage Gateway [can be bandwidth throttled](https://docs.aws.amazon.com/datasync/latest/userguide/working-with-task-executions.html) to protect non-HPC business-critical network constraints. 88 | 89 | Alternatively, you can use the [AWS S3 CLI](https://docs.aws.amazon.com/cli/latest/reference/s3/) to transfer individual files, or use partner solution to get started quickly. 90 | 91 | ## Deployment Steps 92 | 93 | 1. **Clone the GitHub Repository** 94 | Clone this repository. 
View this [README](deployment/aws-hpc-recipe/recipes/pcs/getting_started/README.md) for deploying a PCS cluster. To create a PCS cluster with the right shared storage for this example, you can use the PCS guidance recipes for a one-click deployment, which uses AWS CloudFormation to launch an entire cluster, quickly. 95 | 96 | ```bash 97 | git clone https://github.com/aws-samples/cryoem-on-aws-parallel-cluster.git 98 | cd deployment/aws-hpc-recipe/recipes/pcs/getting_started 99 | cat README.md 100 | ``` 101 | 102 | 2. **Launch the PCS Cluster Using CloudFormation** 103 | - Navigate to the AWS Management Console → **CloudFormation**. 104 | - Choose **Create stack** → **With new resources (standard)**. 105 | - Upload the PCS template from the cloned repo. 106 | - Provide an **SSH key pair** if you want shell access to the login node. 107 | - Leave all other defaults unchanged and click **Create stack**. 108 | 109 | This creates: 110 | - Networking prerequisites 111 | - A Login Node group 112 | - One demo Compute Node group 113 | - An **Amazon EFS** file system mounted at `/home` 114 | - An **Amazon FSx for Lustre** file system mounted at `/shared` 115 | 116 | ![CloudFormation stacks](assets/cfnstackhpctemplate.png) 117 | 118 | 3. **Update FSx for Lustre Throughput** 119 | Increase throughput per unit of storage to support CryoSPARC installation: 120 | 121 | ```bash 122 | aws fsx update-file-system --file-system-id --lustre-configuration PerUnitStorageThroughput=250 123 | ``` 124 | 125 | This may take up to 20 minutes to complete. 126 | 127 | 4. **Retrieve Compute Node Group Information** 128 | Run the following to get AMI ID, Instance Profile ARN, and Launch Template ID: 129 | 130 | ```bash 131 | aws pcs get-compute-node-group --cluster-identifier --compute-node-group-identifier compute-1 132 | ``` 133 | 134 | Save the output values for use in the next step. 135 | 136 | 5. **Create Additional Compute Node Groups** 137 | Run the following commands to create CPU, single-GPU, and multi-GPU node groups: 138 | 139 | ```bash 140 | aws pcs create-compute-node-group --compute-node-group-name compute-cpu --cluster-identifier --region --subnet-ids --custom-launch-template id=,version='1' --ami-id --iam-instance-profile --scaling-config minInstanceCount=0,maxInstanceCount=2 --instance-configs instanceType=c5a.8xlarge 141 | 142 | aws pcs create-compute-node-group --compute-node-group-name compute-single-gpu --cluster-identifier --region --subnet-ids --custom-launch-template id=,version='1' --ami-id --iam-instance-profile --scaling-config minInstanceCount=0,maxInstanceCount=2 --instance-configs instanceType=g6.4xlarge 143 | 144 | aws pcs create-compute-node-group --compute-node-group-name compute-multi-gpu --cluster-identifier --region --subnet-ids --custom-launch-template id=,version='1' --ami-id --iam-instance-profile --scaling-config minInstanceCount=0,maxInstanceCount=2 --instance-configs instanceType=g6.48xlarge 145 | ``` 146 | 147 | 6. **Verify Node Group Creation** 148 | Confirm that each node group is active: 149 | 150 | ```bash 151 | aws pcs get-compute-node-group --region --cluster-identifier --compute-node-group-identifier 152 | ``` 153 | 154 | Wait until the status returns `ACTIVE`. 155 | 156 | 7. 
**Create Queues for Node Groups** 157 | Map queues to node groups so CryoSPARC can submit jobs to the right hardware: 158 | 159 | ```bash 160 | aws pcs create-queue --queue-name cpu-queue --cluster-identifier --compute-node-group-configurations computeNodeGroupId= 161 | 162 | aws pcs create-queue --queue-name single-gpu-queue --cluster-identifier --compute-node-group-configurations computeNodeGroupId= 163 | 164 | aws pcs create-queue --queue-name multi-gpu-queue --cluster-identifier --compute-node-group-configurations computeNodeGroupId= 165 | ``` 166 | 167 | 8. **Verify Queues** 168 | Check that the queues are created and active: 169 | ```bash 170 | aws pcs get-queue --region --cluster-identifier --queue-identifier 171 | ``` 172 | 173 | ## Running the Guidance 174 | 175 | 1. **Log in to the PCS Login Node** 176 | - Open the **Amazon EC2 Console**. 177 | - Search for your Login Node instance using the tag: 178 | ``` 179 | aws:pcs:compute-node-group-id= 180 | ``` 181 | - Select the instance → **Connect** → **Session Manager** → **Connect**. 182 | - Switch to the `ec2-user`: 183 | ```bash 184 | sudo su - ec2-user 185 | ``` 186 | 187 | 2. **Check Available Slurm Queues** 188 | Run: 189 | 190 | ```bash 191 | sinfo 192 | ``` 193 | 194 | You should see partitions for CPU, GPU, and multi-GPU nodes. 195 | 3. **Download and Run CryoSPARC Installation Script** 196 | 197 | ```bash 198 | wget https://raw.githubusercontent.com/aws-samples/cryoem-on-aws-parallel-cluster/refs/heads/main/source/pcs-cryosparc-post-install.sh 199 | chmod +x pcs-cryosparc-post-install.sh 200 | sudo ./pcs-cryosparc-post-install.sh /shared/cryosparc /shared/cuda 11.8.0 11.8.0_520.61.05 /shared 201 | ``` 202 | 203 | Installation can take up to an hour. 204 | 205 | 4. **Start the CryoSPARC Server** 206 | 207 | ```bash 208 | /shared/cryosparc/cryosparc_master/bin/cryosparcm start 209 | ``` 210 | 211 | 5. **Create a CryoSPARC User** 212 | 213 | ```bash 214 | cryosparcm createuser --email "" --password "" --username "" --firstname "" --lastname "" 215 | ``` 216 | 217 | 6. **Access the CryoSPARC UI** 218 | - Open an SSH tunnel from your local machine: 219 | ```bash 220 | ssh -i /path/to/key.pem -N -f -L localhost:45000:localhost:45000 ec2-user@ 221 | ``` 222 | - In a browser, go to: 223 | [http://localhost:45000](http://localhost:45000) 224 | - Log in with your CryoSPARC user credentials. 225 | ![CryoSPARC Sign In](assets/cryosparcsigninpage.png) 226 | 227 | 7. **Download and Extract a Test Dataset** 228 | 229 | Download the [movies test set](https://guide.cryosparc.com/processing-data/get-started-with-cryosparc-introductory-tutorial) from the CryoSparc introductory tutorial. 230 | 231 | ```bash 232 | mkdir /shared/data 233 | cd /shared/data 234 | /shared/cryosparc/cryosparc_master/bin/cryosparcm downloadtest 235 | tar -xf empiar_10025_subset.tar 236 | ``` 237 | 238 | 8. **Run a Test Job in CryoSPARC UI** 239 | - Create a new **Import Movies** job. 240 | - Select the `compute-cpu` lane (queue). 241 | - Submit the job. 242 | 243 | In the terminal, check the running job with: 244 | 245 | ```bash 246 | squeue 247 | ``` 248 | 249 | or check allocated nodes with: 250 | 251 | ```bash 252 | sinfo 253 | ``` 254 | 255 | ## Next Steps 256 | 257 | #### Install ChimeraX for Visualization 258 | 259 | Install ChimeraX on the login node and use Amazon DCV for remote desktop visualization. This can enable users to directly visualize CryoSPARC results without transferring data. 
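A minimal sketch of one way to set this up is shown below. It assumes the Amazon DCV server packages have already been installed on the login node (they are not installed by this guidance), and the session name, key path, and login-node address are placeholders for your own values.

```bash
# On the login node: start the DCV server and create a virtual session
# (requires the DCV server and virtual-session packages to be installed).
sudo systemctl enable --now dcvserver
dcv create-session --type virtual --owner ec2-user chimerax
dcv list-sessions

# From your local machine: tunnel the default DCV port (8443) over SSH,
# then open https://localhost:8443 in a browser and connect to the session.
ssh -i /path/to/key.pem -N -f -L 8443:localhost:8443 ec2-user@<login-node-public-ip>
```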
260 | 261 | ## Cleanup 262 | 263 | To cleanup the provisioned resources follow these steps: 264 | 265 | 1. Delete PCS Queues 266 | 267 | ```bash 268 | aws pcs delete-queue --cluster-identifier --queue-identifier cpu-queue 269 | aws pcs delete-queue --cluster-identifier --queue-identifier single-gpu-queue 270 | aws pcs delete-queue --cluster-identifier --queue-identifier multi-gpu-queue 271 | ``` 272 | 273 | 2. Delete Node Groups 274 | 275 | ```bash 276 | aws pcs delete-compute-node-group --cluster-identifier --compute-node-group-identifier compute-cpu 277 | aws pcs delete-compute-node-group --cluster-identifier --compute-node-group-identifier compute-single-gpu 278 | aws pcs delete-compute-node-group --cluster-identifier --compute-node-group-identifier compute-multi-gpu 279 | ``` 280 | 281 | 3. Delete CloudFormation Stack 282 | 283 | ```bash 284 | aws cloudformation delete-stack --stack-name 285 | ``` 286 | 287 | ## FAQ, known issues, additional considerations, and limitations 288 | 289 | ### AWS ParallelCluster 290 | 291 | AWS ParallelCluster offers an alternative deployment method for running CryoSPARC workloads. AWS ParallelCluster might be preferred when you need more granular control over your HPC infrastructure or require customized configurations that aren't available in PCS. It offers greater flexibility in cluster customization, including the ability to modify the underlying infrastructure, customize AMIs, and implement specific security configurations. 292 | 293 | In such situations where AWS ParallelCluster may be preferred, an AWS guidance is available [here](ParallelClusterREADME.md). 294 | 295 | ## Notices 296 | 297 | _Customers are responsible for making their own independent assessment of the information in this Guidance. This Guidance: (a) is for informational purposes only, (b) represents AWS current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. AWS responsibilities and liabilities to its customers are controlled by AWS agreements, and this Guidance is not part of, nor does it modify, any agreement between AWS and its customers._ 298 | 299 | ## License 300 | 301 | This library is licensed under the MIT-0 License. See the LICENSE file. 302 | 303 | ## Authors 304 | 305 | - Marissa Powers 306 | - Juan Perin 307 | - Rye Robinson 308 | -------------------------------------------------------------------------------- /source/parallel-cluster-post-install.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | ## Args: 4 | # argv1 - CRYOSPARC_LICENSE_ID (required) 5 | # argv2 - CRYOSPARC_INSTALL_PATH (default: /shared/cryosparc) 6 | # argv3 - CUDA_INSTALL_PATH (default: /shared/cuda) 7 | # argv4 - CUDA_VERSION (default 11.3.1) 8 | # argv5 - CUDA_LONG_VERSION (default: 11.3.1_465.19.01) 9 | # argv6 - PROJECT_DATA_PATH (default: /fsx) 10 | 11 | set +e 12 | # Log script output to a file to reference later 13 | exec &> >(tee -a "/tmp/post_install.log") 14 | 15 | . "/etc/parallelcluster/cfnconfig" 16 | 17 | # Get the local commands to run yum and apt 18 | YUM_CMD=$(which yum || echo "") 19 | APT_GET_CMD=$(which apt-get || echo "") 20 | 21 | # If we have yum installed, use it to install prerequisites. 
If not, use apt 22 | if [[ -n $YUM_CMD ]]; then 23 | wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm -P /tmp 24 | yum install -y /tmp/epel-release-latest-7.noarch.rpm 25 | 26 | yum install -y perl-Switch python3 python3-pip links 27 | user_test=$(getent passwd ec2-user) 28 | if [[ -n "${user_test}" ]]; then 29 | OSUSER=ec2-user 30 | OSGROUP=ec2-user 31 | else 32 | OSUSER=centos 33 | OSGROUP=centos 34 | fi 35 | elif [[ -n $APT_GET_CMD ]]; then 36 | apt-get update 37 | apt-get install -y libswitch-perl python3 python3-pip links 38 | OSUSER=ubuntu 39 | OSGROUP=ubuntu 40 | else 41 | # If we don't have yum or apt, we couldn't install the prerequisites, so exit 42 | echo "error can't install package $PACKAGE" 43 | exit 1; 44 | fi 45 | 46 | # Get the cryoSPARC license ID, optional path, and optional versions from the script arguments 47 | CRYOSPARC_LICENSE_ID=$1 48 | CRYOSPARC_INSTALL_PATH=${2:-/shared/cryosparc} 49 | CUDA_INSTALL_PATH=${3:-/shared/cuda} 50 | CUDA_VERSION=${4:-11.3.1} 51 | CUDA_LONG_VERSION=${5:-11.3.1_465.19.01} 52 | PROJECT_DATA_PATH=${6:-/fsx} 53 | 54 | /bin/su -c "mkdir -p ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER} || chmod 777 ${PROJECT_DATA_PATH} 55 | 56 | # Install the AWS CLI 57 | pip3 install --upgrade awscli boto3 58 | 59 | set -e 60 | 61 | # Configure AWS 62 | AWS_DEFAULT_REGION=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone | rev | cut -c 2- | rev) 63 | aws configure set default.region "${AWS_DEFAULT_REGION}" 64 | aws configure set default.output json 65 | 66 | if [[ "$(cat ${CUDA_INSTALL_PATH}/installed_cuda_version.log 2>/dev/null )" == "${CUDA_LONG_VERSION}" ]]; then 67 | echo "Matched previous CUDA version. Using old installer ${CUDA_LONG_VERSION}" 68 | else 69 | echo "Installing new version of CUDA ${CUDA_LONG_VERSION} (this may break cryosparc install)" 70 | # Install CUDA Toolkit 71 | mkdir -p "${CUDA_INSTALL_PATH}" 72 | mkdir -p "${CUDA_INSTALL_PATH}_tmp" 73 | cd "${CUDA_INSTALL_PATH}" || return 74 | wget "https://developer.download.nvidia.com/compute/cuda/${CUDA_VERSION}/local_installers/cuda_${CUDA_LONG_VERSION}_linux.run" 75 | fi 76 | echo "${CUDA_LONG_VERSION}" > ${CUDA_INSTALL_PATH}/installed_cuda_version.log 77 | 78 | sh ${CUDA_INSTALL_PATH}/cuda_"${CUDA_LONG_VERSION}"_linux.run --tmpdir="${CUDA_INSTALL_PATH}_tmp" --defaultroot="${CUDA_INSTALL_PATH}" --toolkit --toolkitpath="${CUDA_INSTALL_PATH}"/"${CUDA_VERSION}" --samples --silent 79 | #rm cuda_"${CUDA_LONG_VERSION}"_linux.run 80 | 81 | # Add CUDA to the path 82 | cat > /etc/profile.d/cuda.sh << 'EOF' 83 | PATH=$PATH:@CUDA_INSTALL_PATH@/@CUDA_VERSION@/bin 84 | EOF 85 | sed -i "s|@CUDA_INSTALL_PATH@|${CUDA_INSTALL_PATH}|g" /etc/profile.d/cuda.sh 86 | sed -i "s|@CUDA_VERSION@|${CUDA_VERSION}|g" /etc/profile.d/cuda.sh 87 | . /etc/profile.d/cuda.sh 88 | 89 | # Add CryoSPARC to the path 90 | cat > /etc/profile.d/cryosparc.sh << 'EOF' 91 | PATH=$PATH:@CRYOSPARC_INSTALL_PATH@/cryosparc_master/bin 92 | EOF 93 | sed -i "s|@CRYOSPARC_INSTALL_PATH@|${CRYOSPARC_INSTALL_PATH}|g" /etc/profile.d/cryosparc.sh 94 | . /etc/profile.d/cryosparc.sh 95 | 96 | # Condition checks whether /etc/profile.d/cryosparc.sh activated previously install cryosparc 97 | # if not, then we install cryosparc 98 | if [ ! 
-x "$(command -v "cryosparcm")" ]; then 99 | echo "Installing fresh CryoSPARC" 100 | 101 | # Download cryoSPARC 102 | mkdir -p "${CRYOSPARC_INSTALL_PATH}" 103 | # Need to make sure OSUSER can write to this path 104 | chown ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH} 105 | 106 | cd "${CRYOSPARC_INSTALL_PATH}" || return 107 | [ -f "${CRYOSPARC_INSTALL_PATH}/cryosparc_master.tar.gz" ] || curl -L "https://get.cryosparc.com/download/master-v4.0.3/${CRYOSPARC_LICENSE_ID}" -o cryosparc_master.tar.gz 108 | [ -f "${CRYOSPARC_INSTALL_PATH}/cryosparc_worker.tar.gz" ] || curl -L "https://get.cryosparc.com/download/worker-v4.0.3/${CRYOSPARC_LICENSE_ID}" -o cryosparc_worker.tar.gz 109 | 110 | # Install cryoSPARC main process 111 | tar -xf cryosparc_master.tar.gz 112 | 113 | # cryosparc untars with ownership: 1001:1001 by default. re-align permissions to OSUSER 114 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_master 115 | 116 | # Basic configuration for install 117 | export CRYOSPARC_FORCE_USER=true 118 | export CRYOSPARC_FORCE_HOSTNAME=true 119 | export CRYOSPARC_DISABLE_IMPORT_ON_MASTER=true 120 | 121 | # Install Main 122 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH}/cryosparc_master && ./install.sh --license "${CRYOSPARC_LICENSE_ID}" \ 123 | --hostname "${HOSTNAME}" \ 124 | --dbpath "${CRYOSPARC_INSTALL_PATH}"/cryosparc_db \ 125 | --port 45000 \ 126 | --allowroot \ 127 | --yes" - $OSUSER 128 | 129 | # Enforce configuration long-term 130 | echo "export CRYOSPARC_FORCE_USER=true" >> "${CRYOSPARC_INSTALL_PATH}"/cryosparc_master/config.sh 131 | echo "export CRYOSPARC_FORCE_HOSTNAME=true" >> "${CRYOSPARC_INSTALL_PATH}"/cryosparc_master/config.sh 132 | echo "export CRYOSPARC_DISABLE_IMPORT_ON_MASTER=true" >> "${CRYOSPARC_INSTALL_PATH}"/cryosparc_master/config.sh 133 | 134 | # Ownership of this path determines how cryosparc is started 135 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_master 136 | 137 | # Start cryoSPARC main package 138 | /bin/su -c "${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm start" - ${OSUSER} 139 | 140 | # Install cryoSPARC worker package 141 | cd "${CRYOSPARC_INSTALL_PATH}" || return 142 | tar -xf cryosparc_worker.tar.gz 143 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_worker 144 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH}/cryosparc_worker && ./install.sh --license "${CRYOSPARC_LICENSE_ID}" \ 145 | --cudapath "${CUDA_INSTALL_PATH}/${CUDA_VERSION}" \ 146 | --yes" - $OSUSER 147 | 148 | # Once again, re-align permissions 149 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_worker 150 | 151 | # Start cryoSPARC main package 152 | /bin/su -c "${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm stop" - ${OSUSER} 153 | 154 | else 155 | echo "Restoring CryoSPARC with updated Hostname and refreshing paritition connections" 156 | 157 | # Stop any running cryosparc 158 | systemctl stop cryosparc-supervisor.service || /bin/su -c "${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm stop || echo \"Nothing Running\" " - ${OSUSER} 159 | 160 | # Update hostname to new main 161 | sed -i "s/^\(.*CRYOSPARC_MASTER_HOSTNAME=\"\).*\"/\1$HOSTNAME\"/g" ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/config.sh 162 | 163 | # Once again, re-align permissions for proper start 164 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_master 165 | fi 166 | 167 | # Start cluster 168 | #/bin/su -c "${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm start" - ${OSUSER} 169 | 170 | # Confirm 
Restart CryoSPARC main 171 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm restart" - ${OSUSER} 172 | 173 | # Stop server in anticipation for service 174 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm stop" - ${OSUSER} 175 | 176 | # Create the CryoSPARC Systemd service and start at Boot 177 | eval $(cryosparcm env) 178 | cd "${CRYOSPARC_INSTALL_PATH}/cryosparc_master/systemd" || return 179 | # Final alignment on permissions 180 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/systemd 181 | env "CRYOSPARC_ROOT_DIR=$CRYOSPARC_ROOT_DIR" ./install_services.sh 182 | systemctl start cryosparc-supervisor.service 183 | systemctl restart cryosparc-supervisor.service 184 | systemctl enable cryosparc-supervisor.service 185 | 186 | 187 | # Be tolerant of errors on partitions; we can always come back through admin panel and add later 188 | set +e 189 | echo "Partitions:" 190 | /opt/slurm/bin/scontrol show partitions 191 | 192 | echo "Beginning" 193 | # Create cluster config files 194 | for PARTITION in $( /opt/slurm/bin/scontrol show partitions | grep PartitionName | cut -d'=' -f 2 ) 195 | do 196 | if [ ! -f "${CRYOSPARC_INSTALL_PATH}/${PARTITION}/cluster_info.json" ]; then 197 | echo "Connecting New Partition: ${PARTITION}" 198 | case $PARTITION in 199 | gpu-t4*) 200 | echo "T4 GPU: $PARTITION" 201 | PARTITION_CACHE_PATH="/scratch" 202 | PARTITION_CACHE_RESERVE=10000 203 | PARTITION_CACHE_QUOTA=800000 204 | PARTITION_RAM_GB_MULTIPLIER=2.0 205 | SBATCH_EXTRA="#SBATCH --gres=gpu:{{ num_gpu }}" 206 | ;; 207 | gpu-l4*) 208 | echo "L4 GPU: $PARTITION" 209 | PARTITION_CACHE_PATH="/scratch" 210 | PARTITION_CACHE_RESERVE=10000 211 | PARTITION_CACHE_QUOTA=800000 212 | PARTITION_RAM_GB_MULTIPLIER=2.0 213 | SBATCH_EXTRA="#SBATCH --gres=gpu:{{ num_gpu }}" 214 | ;; 215 | gpu-a100*) 216 | echo "A100 GPU: $PARTITION" 217 | PARTITION_CACHE_PATH="/scratch" 218 | PARTITION_CACHE_RESERVE=10000 219 | PARTITION_CACHE_QUOTA=800000 220 | PARTITION_RAM_GB_MULTIPLIER=2.0 221 | SBATCH_EXTRA="#SBATCH --gres=gpu:{{ num_gpu }}" 222 | ;; 223 | cpu*) 224 | echo "X86: $PARTITION" 225 | PARTITION_CACHE_PATH="/scratch" 226 | PARTITION_CACHE_RESERVE=10000 227 | PARTITION_CACHE_QUOTA=800000 228 | PARTITION_RAM_GB_MULTIPLIER=2.0 229 | SBATCH_EXTRA="#SBATCH --gres=gpu:{{ num_gpu }}" 230 | ;; 231 | gpu-a100-spot*) 232 | echo "A100 GPU SPOT: $PARTITION" 233 | PARTITION_CACHE_PATH="/scratch" 234 | PARTITION_CACHE_RESERVE=10000 235 | PARTITION_CACHE_QUOTA=800000 236 | PARTITION_RAM_GB_MULTIPLIER=2.0 237 | SBATCH_EXTRA="#SBATCH --gres=gpu:{{ num_gpu }}" 238 | ;; 239 | esac 240 | 241 | mkdir -p "${CRYOSPARC_INSTALL_PATH}/${PARTITION}" 242 | cat > "${CRYOSPARC_INSTALL_PATH}/${PARTITION}"/cluster_info.json << EOF 243 | { 244 | "qdel_cmd_tpl": "/opt/slurm/bin/scancel {{ cluster_job_id }}", 245 | "worker_bin_path": "${CRYOSPARC_INSTALL_PATH}/cryosparc_worker/bin/cryosparcw", 246 | "title": "cryosparc-cluster", 247 | "cache_path": "${PARTITION_CACHE_PATH}", 248 | "cache_reserve_mb": ${PARTITION_CACHE_RESERVE}, 249 | "cache_quota_mb": ${PARTITION_CACHE_QUOTA}, 250 | "qinfo_cmd_tpl": "/opt/slurm/bin/sinfo --format='%.42N %.5D %.15P %.8T %.15C %.5c %.10z %.10m %.15G %.9d %40E'", 251 | "qsub_cmd_tpl": "/opt/slurm/bin/sbatch {{ script_path_abs }}", 252 | "qstat_cmd_tpl": "/opt/slurm/bin/squeue -j {{ cluster_job_id }}", 253 | "send_cmd_tpl": "{{ command }}", 254 | "name": "${PARTITION}" 255 | } 256 | EOF 257 | #sinfo 
--format='%.8N %.6D %.10P %.6T %.14C %.5c %.6z %.7m %.7G %.9d %20E' 258 | 259 | cat > "${CRYOSPARC_INSTALL_PATH}/${PARTITION}"/cluster_script.sh << EOF 260 | #!/usr/bin/env bash 261 | #### cryoSPARC cluster submission script template for SLURM 262 | ## Available variables: 263 | ## {{ run_cmd }} - the complete command string to run the job 264 | ## {{ num_cpu }} - the number of CPUs needed 265 | ## {{ num_gpu }} - the number of GPUs needed. 266 | ## Note: the code will use this many GPUs starting from dev id 0 267 | ## the cluster scheduler or this script have the responsibility 268 | ## of setting CUDA_VISIBLE_DEVICES so that the job code ends up 269 | ## using the correct cluster-allocated GPUs. 270 | ## {{ ram_gb }} - the amount of RAM needed in GB 271 | ## {{ job_dir_abs }} - absolute path to the job directory 272 | ## {{ project_dir_abs }} - absolute path to the project dir 273 | ## {{ job_log_path_abs }} - absolute path to the log file for the job 274 | ## {{ worker_bin_path }} - absolute path to the cryosparc worker command 275 | ## {{ run_args }} - arguments to be passed to cryosparcw run 276 | ## {{ project_uid }} - uid of the project 277 | ## {{ job_uid }} - uid of the job 278 | ## {{ job_creator }} - name of the user that created the job (may contain spaces) 279 | ## {{ cryosparc_username }} - cryosparc username of the user that created the job (usually an email) 280 | ## {{ job_type }} - CryoSPARC job type 281 | ## 282 | ## What follows is a simple SLURM script: 283 | 284 | #SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }} 285 | #SBATCH -n {{ num_cpu }} 286 | ${SBATCH_EXTRA} 287 | #SBATCH --partition=${PARTITION} 288 | #SBATCH --mem={{ (ram_gb|float * ram_gb_multiplier|float)|int }}G 289 | #SBATCH --output={{ job_log_path_abs }} 290 | #SBATCH --error={{ job_log_path_abs }} 291 | 292 | {{ run_cmd }} 293 | EOF 294 | 295 | #sed -i "s|@PARTITION@|${PARTITION}|g" "${CRYOSPARC_INSTALL_PATH}"/cluster_script.sh 296 | chown -R ${OSUSER}:${OSGROUP} ${CRYOSPARC_INSTALL_PATH}/${PARTITION} 297 | 298 | # Connect CryoSPARC worker nodes to cluster 299 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH}/${PARTITION} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cluster connect" - ${OSUSER} 300 | 301 | # Individually apply custom_vars 302 | CLICMD=$(cat << EOT 303 | set_scheduler_target_property(hostname="${PARTITION}",key="custom_vars",value={"ram_gb_multiplier": "${PARTITION_RAM_GB_MULTIPLIER}"}) 304 | EOT 305 | ) 306 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH}/${PARTITION} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cli '$CLICMD' " - ${OSUSER} 307 | echo "Done connecting $PARTITION" 308 | else 309 | echo "Partition already connected to CryoSPARC: ${PARTITION}" 310 | fi 311 | 312 | done 313 | set -e 314 | 315 | # VALIDATE CRYOSPARC 316 | # This stage can be run after cluster creation. 
317 | #echo "Validating lanes"
318 | #/bin/su -c "mkdir -p ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER}
319 | #/bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cluster validate cpu --projects_dir ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER}
320 | #/bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cluster validate gpu-t4 --projects_dir ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER}
321 | #/bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cluster validate gpu-a100 --projects_dir ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER}
322 | #/bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm cluster validate gpu-v100 --projects_dir ${PROJECT_DATA_PATH}/validate-lanes" - ${OSUSER}
323 | echo "Enabling All-or-Nothing"
324 | echo "all_or_nothing_batch = True" >> /etc/parallelcluster/slurm_plugin/parallelcluster_slurm_resume.conf
325 |
326 | set +e
327 | echo "Attempting to update to the latest version...if this fails, you may need to manually update the head and compute nodes"
328 |
329 | # Update to latest version of CryoSPARC
330 | systemctl stop cryosparc-supervisor.service
331 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm update" - ${OSUSER}
332 | # Depends on cryosparcm update to pull latest worker to cryosparc_master dir.
333 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && cp ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/cryosparc_worker.tar.gz ${CRYOSPARC_INSTALL_PATH}/cryosparc_worker/cryosparc_worker.tar.gz" - ${OSUSER}
334 | # Only update workers if they were installed (continue otherwise)
335 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_worker/bin/cryosparcw update" - ${OSUSER} || true
336 | /bin/su -c "cd ${CRYOSPARC_INSTALL_PATH} && ${CRYOSPARC_INSTALL_PATH}/cryosparc_master/bin/cryosparcm stop" - ${OSUSER}
337 | systemctl start cryosparc-supervisor.service
338 | set -e
339 |
340 | # Clean up the .tar.gz files:
341 | rm "${CRYOSPARC_INSTALL_PATH}"/*.tar.gz
342 |
343 | echo "CryoSPARC setup complete"
344 | 
--------------------------------------------------------------------------------
/ParallelClusterREADME.md:
--------------------------------------------------------------------------------
1 | # Guidance for Cryo-EM on AWS ParallelCluster
2 |
3 | - [Guidance for Cryo-EM on AWS ParallelCluster](#guidance-for-cryo-em-on-aws-parallelcluster)
4 | - [Overview](#overview)
5 | - [Cost](#cost)
6 | - [Prerequisites](#prerequisites)
7 | - [Environment](#environment)
8 | - [Supported Regions](#supported-regions)
9 | - [Data Transfer](#data-transfer)
10 | - [CryoSPARC License](#cryosparc-license)
11 | - [Networking and Compute Availability](#networking-and-compute-availability)
12 | - [IAM Permissions](#iam-permissions)
13 | - [Data Export Policy (Optional)](#data-export-policy-optional)
14 | - [Deployment Steps](#deployment-steps)
15 | - [Running the Guidance](#running-the-guidance)
16 | - [Cleanup](#cleanup)
17 | - [FAQ, known issues, additional considerations, and limitations](#faq-known-issues-additional-considerations-and-limitations)
18 | - [AWS Parallel Computing Service (PCS)](#aws-parallel-computing-service-pcs)
19 | - [Notices](#notices)
20 | - [License](#license)
21 | - [Authors](#authors)
22 |
23 | ## Overview
24 |
25 | This guidance demonstrates how to deploy CryoSPARC for cryogenic electron microscopy (Cryo-EM) workloads on AWS ParallelCluster. Cryo-EM enables drug discovery researchers to determine the three-dimensional molecular structures crucial to their work. This solution addresses the challenge of processing terabytes of microscopy data through scalable, heterogeneous computing combined with fast, cost-effective storage.
26 |
27 | Below is the architecture model for this guidance.
28 |
29 | ![Architecture](assets/CryoSPARCParallelClusterArch.png)
30 |
31 | ## Cost
32 |
33 | _You are responsible for the cost of the AWS services used while running this Guidance. As of September 2025, the cost for running this Guidance with the default settings in the US East (N. Virginia) Region is approximately $739.01 per sample. This estimate is based on processing 1 sample (1 TB of data). Cost calculations were derived using the times measured under realistic workload conditions for each instance type._
34 |
35 | Below you can find a cost breakdown for this estimate based on the resources this guidance runs and assuming the aforementioned working periods (1 sample, 1 TB of data).
36 |
37 | | AWS service | Dimensions | Cost [USD] |
38 | | ----------------------------------- | ----------------------------------------- | ---------- |
39 | | Amazon Simple Storage Service (S3) | 1 TB w/ Intelligent-Tiering | $ 23.72 |
40 | | Amazon Elastic File System (EFS) | 100 GB Elastic Throughput | $ 30.00 |
41 | | Amazon FSx for Lustre | 1.2 TB SSD - 250 MBps/TiB | $ 252.35 |
42 | | Amazon Elastic Compute Cloud (EC2) | (Head Node) 1 On-Demand c5a.4xlarge | $ 33.02 |
43 | | Amazon Elastic Compute Cloud (EC2) | (CPU Group) 1 On-Demand c5a.8xlarge | $ 0.16 |
44 | | Amazon Elastic Compute Cloud (EC2) | (Single-GPU Group) 1 On-Demand g6.4xlarge | $ 28.35 |
45 | | Amazon Elastic Compute Cloud (EC2) | (Multi-GPU Group) 1 On-Demand g6.48xlarge | $ 371.41 |
46 |
47 | _We recommend creating a [Budget](https://docs.aws.amazon.com/cost-management/latest/userguide/budgets-managing-costs.html) through [AWS Cost Explorer](https://aws.amazon.com/aws-cost-management/aws-cost-explorer/) to help manage costs. Prices are subject to change. For full details, refer to the pricing webpage for each AWS service used in this Guidance._
48 |
49 | ## Prerequisites
50 |
51 | To run this guidance and use CryoSPARC, you need the following:
52 |
53 | - A CryoSPARC license ([contact Structura Biotechnology to obtain one](https://guide.cryosparc.com/licensing)).
54 | - The [AWS CLI](https://aws.amazon.com/cli/) installed and configured.
55 | - An SSH client.
56 |
57 | ### Environment
58 |
59 | We recommend using [AWS CloudShell](https://aws.amazon.com/cloudshell/) to quickly set up an environment that already has the credentials and command line tools you'll need to get started. [The AWS CloudShell Console](https://console.aws.amazon.com/cloudshell) already has credentials for your AWS account, the AWS CLI, and Python installed. If you're not using CloudShell, make sure you have these installed in your local environment before continuing.
60 |
61 | ### Supported Regions
62 |
63 | Only the following regions are supported for this guidance:
64 |
65 | - US East (N. Virginia)
66 | - US East (Ohio)
67 | - US West (Oregon)
68 |
69 | - Asia Pacific (Singapore)
70 | - Asia Pacific (Sydney)
71 | - Asia Pacific (Tokyo)
72 |
73 | - Europe (Frankfurt)
74 | - Europe (Ireland)
75 | - Europe (London)
76 | - Europe (Stockholm)
77 |
78 | Deploying the guidance in other regions may lead to errors or inconsistent behavior. 
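If you are unsure whether a candidate region offers the GPU instance types this guidance uses, a quick check before you commit to a region can save a failed deployment. The snippet below is a minimal sketch, not part of the deployment steps; `us-east-1` and `g6.4xlarge` are example values you should substitute with your own region and instance type.

```bash
# Check whether a region offers the GPU instance type used by this guidance.
# us-east-1 and g6.4xlarge are example values; substitute your own.
aws ec2 describe-instance-type-offerings \
  --location-type region \
  --region us-east-1 \
  --filters Name=instance-type,Values=g6.4xlarge \
  --query "InstanceTypeOfferings[*].[InstanceType,Location]" \
  --output table
```

An empty result means the instance type is not offered in that region, and you should choose one of the supported regions listed above.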
79 |
80 | ### Data Transfer
81 |
82 | Create a new S3 bucket for your input data.
83 |
84 | The data transfer mechanism to move data from instruments into S3 depends on the connectivity in the lab environment and the volume of data to be transferred. We recommend [AWS DataSync](https://aws.amazon.com/datasync/), which automates secure data transfer from on-premises into the cloud with minimal development effort. [Storage Gateway File Gateway](https://aws.amazon.com/storagegateway/file/) is another viable option, especially if lab connectivity is limited or continued two-way access from on-premises to the transferred data sets is required. Both DataSync and Storage Gateway [can be bandwidth throttled](https://docs.aws.amazon.com/datasync/latest/userguide/working-with-task-executions.html) to protect business-critical, non-HPC network traffic.
85 |
86 | Alternatively, you can use the [AWS S3 CLI](https://docs.aws.amazon.com/cli/latest/reference/s3/) to transfer individual files, or use a partner solution to get started quickly.
87 |
88 | ### CryoSPARC License
89 |
90 | First, you'll need to request a license from Structura. It can take a day or two to obtain the license, so request it before you get started. You'll use this license ID to replace the placeholder in the configuration file.
91 |
92 | ### Networking and Compute Availability
93 |
94 | A typical default VPC has public and private subnets balanced across multiple Availability Zones (AZs). However, HPC clusters (like ParallelCluster) usually prefer a single AZ so they can keep communication latency low and use Cluster Placement Groups. For the compute nodes, you can create a large private subnet with a relatively large number of IP addresses. Then, you can create a public subnet with minimal IP addresses, since it will only contain the head node.
95 |
96 | HPC EC2 instances like the [g6 family](https://aws.amazon.com/ec2/instance-types/g6/) aren’t available in every AZ. That means we need to determine which AZ in a given Region has all the compute families we need. We can do that with the [AWS CLI](https://aws.amazon.com/cli/) [describe-instance-type-offerings](https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-instance-type-offerings.html) command. The easiest way to do this is to use [CloudShell](https://aws.amazon.com/cloudshell/), which provides a shell environment ready to issue AWS CLI commands in a few minutes. If you want a more permanent development environment for ParallelCluster CLI calls, you can use [Cloud9](https://aws.amazon.com/cloud9), which provides a persistent IDE environment, including a terminal in which you can run CLI commands. After the environment is provisioned, copy and paste the following command into the shell, replacing `<region>` with your target AWS region (e.g., `us-east-1`).
97 |
98 | ```bash
99 | aws ec2 describe-instance-type-offerings \
100 | --location-type availability-zone \
101 | --region <region> \
102 | --filters Name=instance-type,Values=g6.4xlarge \
103 | --query "InstanceTypeOfferings[*].Location" \
104 | --output text
105 | ```
106 |
107 | Using the output showing which AZs have the compute instances you need, you can create your VPC and subnets. Populate the Region, public subnet ID, and private subnet ID inputs in the configuration file.
108 |
109 | You’ll also need to [create an EC2 SSH key pair](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/create-key-pairs.html) so that you can SSH into the head node once your cluster has been deployed, and populate the key pair name input in the configuration file. 
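Before moving on, it can be worth confirming that the subnets you created landed in the Availability Zone you identified and that the public subnet auto-assigns public IPv4 addresses (the troubleshooting note in the deployment steps calls this out as a cause of failed head node launches). The following is a minimal sketch; the subnet IDs are placeholders for illustration only.

```bash
# Verify the Availability Zone and public IPv4 auto-assignment of your subnets.
# The subnet IDs below are placeholders; replace them with your own.
aws ec2 describe-subnets \
  --subnet-ids subnet-0123456789abcdef0 subnet-0fedcba9876543210 \
  --query "Subnets[*].[SubnetId,AvailabilityZone,MapPublicIpOnLaunch]" \
  --output table
```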
110 |
111 | ### IAM Permissions
112 |
113 | While ParallelCluster creates its own least-privilege roles and policies by default, many enterprises limit their AWS account users’ access to IAM actions. ParallelCluster also supports using pre-created IAM resources, which you can request from your IT services team. The required permissions and roles are [provided in the ParallelCluster documentation](https://docs.aws.amazon.com/parallelcluster/latest/ug/iam-roles-in-parallelcluster-v3.html).
114 |
115 | Use [parallel-cluster-cryosparc.yaml](./deployment/parallel-cluster-cryosparc.yaml) if your account allows ParallelCluster to create new IAM roles and policies.
116 |
117 | Use [parallel-cluster-cryosparc-custom-roles.yaml](./deployment/parallel-cluster-cryosparc-custom-roles.yaml) if your account restricts creation of new IAM resources; this template includes additional IAM fields to help you get started quickly.
118 |
119 | If using custom roles, refer to the ParallelCluster IAM documentation for the required permissions.
120 |
121 | Note: In ParallelCluster 3.4+, the config file accepts either S3Access or InstanceRole parameters, but not both. Ensure your roles have S3 access in addition to the policies outlined in the documentation.
122 |
123 | ### Data Export Policy (Optional)
124 |
125 | If you want to automatically export data back to Amazon S3 after job completion, you'll need to attach the IAM policy created by [FSxLustreDataRepoTasksPolicy.yaml](./deployment/FSxLustreDataRepoTasksPolicy.yaml) to the head node's instance profile.
126 |
127 | ## Deployment Steps
128 |
129 | 1. **Clone the repository containing the ParallelCluster configuration files**
130 |
131 | ```bash
132 | git clone https://github.com/aws-samples/cryoem-on-aws-parallel-cluster.git
133 | ```
134 |
135 | 2. **Navigate to the repository folder**
136 |
137 | ```bash
138 | cd cryoem-on-aws-parallel-cluster
139 | ```
140 |
141 | 3. **Identify available Availability Zones for your required instance types**
142 |
143 | Use AWS CloudShell or Cloud9 to run the following command (replace `<region>` with your target AWS region, e.g., `us-east-1`):
144 |
145 | ```bash
146 | aws ec2 describe-instance-type-offerings \
147 | --location-type availability-zone \
148 | --region <region> \
149 | --filters Name=instance-type,Values=g6.4xlarge \
150 | --query "InstanceTypeOfferings[*].Location" \
151 | --output text
152 | ```
153 |
154 | This command identifies which Availability Zones support g6 instances. Note the output for use in the next step.
155 |
156 | 4. **Create VPC and subnets in the identified Availability Zone**
157 |
158 | Create a VPC with:
159 | - One small public subnet (for the head node)
160 | - One large private subnet (for compute nodes, with a relatively large number of IP addresses)
161 |
162 | Ensure your public subnet is configured to automatically assign IPv4 addresses and has DNS enabled.
163 |
164 | Capture the subnet IDs using the following command (replace `<vpc-id>` with the ID of the VPC you created):
165 |
166 | ```bash
167 | aws ec2 describe-subnets --filters "Name=vpc-id,Values=<vpc-id>" --query "Subnets[*].[SubnetId,CidrBlock,AvailabilityZone]" --output table
168 | ```
169 |
170 | 5. **Create an EC2 SSH key pair**
171 |
172 | ```bash
173 | aws ec2 create-key-pair --key-name cryosparc-cluster-key --query 'KeyMaterial' --output text > cryosparc-cluster-key.pem
174 | chmod 400 cryosparc-cluster-key.pem
175 | ```
176 |
177 | This creates a key pair and saves the private key locally. The key name will be used in the configuration file.
178 |
179 | 6. **Create an S3 bucket for ParallelCluster artifacts**
180 |
181 | ```bash
182 | aws s3 mb s3://cryosparc-parallel-cluster-<account-id> --region <region>
183 | ```
184 |
185 | Replace `<account-id>` with your AWS account ID and `<region>` with your target region.
186 |
187 | 7. **Edit the ParallelCluster configuration file**
188 |
189 | Open either `parallel-cluster-cryosparc.yaml` or `parallel-cluster-cryosparc-custom-roles.yaml` and replace the placeholder values for the following:
190 | - Your CryoSPARC license ID from Structura
191 | - Your AWS region (e.g., `us-east-1`)
192 | - The subnet ID for your public subnet
193 | - The subnet ID for your private subnet
194 | - The name of your SSH key pair (e.g., `cryosparc-cluster-key`)
195 |
196 | Also review how the configuration deploys multiple tiers of instances across the compute node groups.
197 |
198 | 8. **Upload the configuration file and post-install script to S3**
199 |
200 | ```bash
201 | aws s3 cp parallel-cluster-cryosparc.yaml s3://cryosparc-parallel-cluster-<account-id>/
202 | aws s3 cp parallel-cluster-post-install.sh s3://cryosparc-parallel-cluster-<account-id>/
203 | ```
204 |
205 | 9. **Install AWS ParallelCluster in a Python virtual environment**
206 | ```bash
207 | python3 -m venv pcluster-venv
208 | source pcluster-venv/bin/activate
209 | pip install --upgrade pip
210 | pip install aws-parallelcluster
211 | ```
212 | 10. **Install Node Version Manager and an LTS Node.js version**
213 |
214 | ```bash
215 | curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.38.0/install.sh | bash
216 | chmod ug+x ~/.nvm/nvm.sh
217 | source ~/.nvm/nvm.sh
218 | nvm install --lts
219 | node --version
220 | ```
221 |
222 | 11. **Verify ParallelCluster installation**
223 |
224 | ```bash
225 | pcluster version
226 | ```
227 |
228 | This confirms the ParallelCluster CLI is properly installed.
229 |
230 | 12. **Copy the ParallelCluster configuration file from S3**
231 |
232 | ```bash
233 | aws s3api get-object --bucket cryosparc-parallel-cluster-<account-id> --key parallel-cluster-cryosparc.yaml parallel-cluster-cryosparc.yaml
234 | ```
235 |
236 | 13. **Create the ParallelCluster**
237 |
238 | ```bash
239 | pcluster create-cluster --cluster-name cryosparc-cluster --cluster-configuration parallel-cluster-cryosparc.yaml
240 | ```
241 |
242 | This command initiates the cluster creation process using AWS CloudFormation.
243 |
244 | 14. **Monitor cluster creation status**
245 |
246 | ```bash
247 | pcluster describe-cluster --cluster-name cryosparc-cluster
248 | ```
249 |
250 | Alternatively, monitor progress in the [AWS CloudFormation console](https://console.aws.amazon.com/cloudformation/).
251 |
252 | The cluster is ready when the status shows `CREATE_COMPLETE`.
253 |
254 | 15. **Capture the head node instance ID (once the cluster is created)**
255 |
256 | ```bash
257 | aws cloudformation describe-stack-resources --stack-name cryosparc-cluster --query "StackResources[?LogicalResourceId=='HeadNode'].PhysicalResourceId" --output text
258 | ```
259 |
260 | 16. **Capture the head node public IP address**
261 |
262 | ```bash
263 | pcluster describe-cluster --cluster-name cryosparc-cluster --query "headNode.publicIpAddress" --output text
264 | ```
265 |
266 | **Troubleshooting:** If the stack rolls back due to a failure:
267 |
268 | - Verify your public subnet automatically assigns IPv4 addresses and has DNS enabled
269 | - Re-create the cluster with the `--rollback-on-failure false` flag to preserve resources for troubleshooting:
270 | ```bash
271 | pcluster create-cluster --cluster-name cryosparc-cluster --cluster-configuration parallel-cluster-cryosparc.yaml --rollback-on-failure false
272 | ```
273 | - Check the HeadNode system logs in the EC2 console: Select the instance → Actions → Monitor and troubleshoot → Get system log
274 |
275 | ## Running the Guidance
276 |
277 | Once your cluster has been deployed and provisioned, you are ready to continue using AWS ParallelCluster to run CryoSPARC jobs as described in the [CryoSPARC documentation](https://guide.cryosparc.com/setup-configuration-and-management/cryosparc-on-aws).
278 |
279 | ## Cleanup
280 |
281 | To clean up your cluster, use ParallelCluster's `delete-cluster` command to de-provision the underlying resources in your cluster.
282 |
283 | ```bash
284 | pcluster delete-cluster --cluster-name cryosparc-cluster
285 | ```
286 |
287 | Once the cluster has been deleted, you can delete the files you uploaded to S3 and the S3 bucket itself, along with the data transfer solution you chose in the prerequisites section.
288 |
289 | ## FAQ, known issues, additional considerations, and limitations
290 |
291 | ### AWS Parallel Computing Service (PCS)
292 |
293 | AWS Parallel Computing Service (PCS) offers an alternative deployment method for running CryoSPARC workloads. PCS might be preferred when you want a fully managed experience with less operational overhead, faster setup, and easier scaling compared to managing infrastructure manually. It abstracts much of the complexity of HPC cluster management while still allowing you to run large-scale distributed workloads.
294 |
295 | If AWS PCS is the better fit for your needs, a companion AWS guidance is available [here](PCSREADME.md).
296 |
297 | The post-install sample code referenced by the Scalable Cryo-EM on AWS Parallel Computing Service (PCS) guidance, which runs on the login node, is available in `source/pcs-cryosparc-post-install.sh`. The full architecture for that guidance is shown below.
298 |
299 | ![CryoSPARC on PCS Architecture](assets/cryoemPCSarchitecture.png)
300 |
301 | ## Notices
302 |
303 | _Customers are responsible for making their own independent assessment of the information in this Guidance. This Guidance: (a) is for informational purposes only, (b) represents AWS current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. AWS responsibilities and liabilities to its customers are controlled by AWS agreements, and this Guidance is not part of, nor does it modify, any agreement between AWS and its customers._
304 |
305 | ## License
306 |
307 | This library is licensed under the MIT-0 License. See the LICENSE file. 
308 | 309 | ## Authors 310 | 311 | - Natalie White 312 | - Brian Skjerven 313 | --------------------------------------------------------------------------------