├── CHANGELOG.md ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── THIRD-PARTY-LICENSES ├── deploy └── upload_artifacts.sh ├── src ├── cfn_templates │ ├── apply-s3-lifecycle-stack.yml │ ├── code-build-stack.yml │ ├── e2e-sfn-stack.yml │ ├── omics-resources-stack.yml │ ├── s3-stack.yml │ ├── sfn-task-checker-stack.yml │ ├── sfn-trigger-stack.yml │ └── solution-cfn.yml ├── codebuild │ ├── buildspec_docker.yml │ └── buildspec_lambdas.yml ├── glue │ ├── PhenotypicGenomes.json │ └── etl.py ├── lambda │ ├── add_bucket_notification │ │ └── add_bucket_notification_lambda.py │ ├── apply_s3_lifecycle │ │ └── apply_s3_lifecycle_lambda.py │ ├── check_omcis_workflow_task │ │ └── lambda_check_omics_workflow_task.py │ ├── import_annotation │ │ └── import_annotation_lambda.py │ ├── import_reference │ │ └── import_reference_lambda.py │ ├── import_sequence │ │ └── import_sequence_lambda.py │ ├── import_variants │ │ └── import_variant_lambda.py │ ├── launch_genomics_sfn │ │ └── lambda_launch_genomics_sfn.py │ ├── start_workflow │ │ └── start_workflow_lambda.py │ └── trigger_code_build │ │ ├── trigger_docker_code_build.py │ │ └── trigger_lambdas_code_build.py ├── notebook │ └── Sample_queries_omics.ipynb └── workflow │ ├── main.wdl │ ├── parameter-template.json │ └── sub-workflows │ ├── fastq-to-bam.wdl │ ├── haplotypecaller-gvcf-gatk4.wdl │ └── processing-for-variant-discovery-gatk4.wdl └── static ├── arch_diagram.png ├── stepfunctions.png └── stepfunctions_graph_workflowstudio.png /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Changelog 2 | 3 | All notable changes to this project will be documented in this file. 4 | 5 | ## [1.1.0] (2023-03-29) 6 | 7 | ### Features 8 | 9 | * Removed dependency on Omics API models as a Lambda Layer due to general availability 10 | * Use Omics CloudFormation resources instead of Custom Resources 11 | * Introduce checks in Step Functions State Machine to prevent duplicate workflows from being launched 12 | * Use existing Omics Reference store, if provided, else create a new one 13 | 14 | ## [1.0.0] (2022-11-30) 15 | 16 | ### Features 17 | 18 | * First release -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features.
13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *main* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional document on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 7 | the Software, and to permit persons to whom the Software is furnished to do so. 
8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 10 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 11 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 12 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 13 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 14 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 15 | 16 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Amazon Omics - from raw sequence data to insights 2 | 3 | This GitHub repository contains the code and artifacts described in the blog post: [Part 2 – Automated End to End Genomics Data Processing and Analysis using Amazon Omics and AWS Step Functions](https://aws.amazon.com/blogs/industries/automated-end-to-end-genomics-data-storage-and-analysis-using-amazon-omics/). 4 | 5 | 6 | ## Reference architecture 7 | ![Alt text](static/arch_diagram.png?raw=true "Reference architecture using Step Functions and Lambda Functions") 8 | 9 | ## Prerequisites 10 | 11 | - Python 3.7 or above with the package installer - pip 12 | - Linux/UNIX environment to run the deployment shell scripts 13 | - Compression and file packaging utility - zip 14 | - AWS account with AdministratorAccess to deploy the various AWS resources using CloudFormation 15 | - An S3 bucket, for example `my-artifact-bucket`, within this account to upload all assets needed for deployment 16 | - AWS CLI v2 installed and configured for your AWS account to upload files to the artifact bucket (Installation instructions here: https://github.com/aws/aws-cli/tree/v2#installation) 17 | 18 | 19 | ``` 20 | Note that cross-region imports are not supported in Amazon Omics today. If you choose to deploy in another supported region outside of us-east-1, copy the example data used in the solution to a bucket in that region and update the permissions in the CloudFormation templates accordingly. 21 | ``` 22 | 23 | ## How to deploy 24 | 25 | 1. Once you clone the repository, navigate to the `deploy/` directory within the repository. 26 | 2. Run the deployment script to upload all required files to the artifact bucket: 27 | 28 | `sh upload_artifacts.sh my-artifact-bucket` 29 | ``` 30 | NOTE 31 | 32 | You can pass an optional second argument if you choose to use a specific AWS profile. 33 | ``` 34 | 3. Navigate to the AWS S3 Console. In the list of buckets, click on the artifact bucket and navigate to the `templates` prefix. Find the file named `solution-cfn.yml`. Copy the Object URL (begins with https://) for this object (not the S3 URI). 35 | 4. Navigate to the AWS CloudFormation Console. Click on `Create Stack`, select `Template is ready`, paste the above https:// Object URL into the `Amazon S3 URL` field and click `Next`. 36 | 5. Fill in the `Stack name` with a name of your choice, `ArtifactBucketName` with the artifact bucket name, and `WorkflowInputsBucketName` & `WorkflowOutputsBucketName` with new bucket names of your choice; these buckets will be created. 37 | 6. For the `CurrentReferenceStoreId` parameter, if the account that you plan to use has an existing reference store and you want to repurpose it, provide the Reference store ID as the value (only one reference store is allowed per account per region). If you don't have one and want to create a new one, provide the value `NONE`. 38 | 7.
Click Next on the subsequent two pages, then on the Review page, acknowledge the following 'Capabilities', and click `Submit`: 39 | - AWS CloudFormation might create IAM resources with custom names. 40 | - AWS CloudFormation might require the following capability: CAPABILITY_AUTO_EXPAND 41 | 8. CloudFormation will now create multiple stacks with all the necessary resources, including Omics resources: 42 | - Omics Reference Store with the reference genome imported (Reference store creation is skipped if a store ID was provided) 43 | - Omics Sequence Store 44 | - Omics Workflow with the workflow definition and parameters defined 45 | - Omics Variant Store 46 | - Omics Annotation Store with ClinVar imported 47 | 48 | 9. It's recommended that users update the Omics permissions to least-privilege access when leveraging this sample code as a starting point for future production needs. 49 | 10. The CloudFormation stack should complete deployment in less than an hour. 50 | 51 | ## Usage 52 | 1. Once the template has been deployed successfully, you can use a pair of FASTQ files to launch the end-to-end secondary analysis pipeline. 53 | 54 | ``` 55 | Note that in this solution, the FASTQ files need to be named in the following manner: 56 | 57 | <sampleId>_R1.fastq.gz 58 | 59 | <sampleId>_R2.fastq.gz 60 | 61 | This can be adapted to your needs by updating the Python regex in start_workflow_lambda.py. 62 | 63 | You can also use the example FASTQs provided here to test: 64 | 65 | s3://aws-genomics-static-us-east-1/omics-e2e/test_fastqs/NA1287820K_R1.fastq.gz 66 | 67 | s3://aws-genomics-static-us-east-1/omics-e2e/test_fastqs/NA1287820K_R2.fastq.gz 68 | 69 | ``` 70 | 71 | 72 | 2. Upload these FASTQ files to the bucket used for `WorkflowInputsBucketName` under the `inputs` prefix. This bucket is configured such that FASTQ files uploaded to this prefix use S3 notifications to trigger a Lambda function that evaluates the inputs and launches the Step Functions workflow. You can monitor the workflow in the AWS Console for Step Functions by navigating to State Machines -> AmazonOmicsEndToEndStepFunction. You should see a running execution named `GENOMICS_<sampleId>_...`, where sampleId is extracted from the names of the FASTQ files used. 73 | 74 | ``` 75 | NOTE 76 | 77 | Currently, if both FASTQs are uploaded simultaneously, the Step Functions trigger Lambda has a best-effort mechanism to avoid race conditions by adding a random delay and checking for a running execution with the same sample name. It's recommended to check for a duplicate execution as a precaution. 78 | ``` 79 | 80 | 3. The Step Functions workflow has the following steps: 81 | - Import the FASTQ files to the pre-created Omics Sequence Store. 82 | - Start the pre-created Omics Workflow with the input FASTQs. 83 | - Import the workflow output BAM file to the pre-created Omics Sequence Store and the output VCF file to the pre-created Omics Variant Store in parallel. 84 | - Apply S3 object tags to the input FASTQ and output BAM and VCF files so that S3 lifecycle policies can be applied. 85 | 86 | ![Alt text](static/stepfunctions_graph_workflowstudio.png?raw=true "Step Function State Machine") 87 | 88 | 4. Since these steps are asynchronous API calls, the state machine uses Lambda tasks to poll for completion and moves on to the next step on success. The Step Functions workflow takes about 3 hours to complete with the test FASTQs provided above; the duration varies with the size of the inputs chosen.
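The polling is handled by a dedicated checker Lambda (`lambda_check_omics_workflow_task.py`): the state machine's Wait/Choice loops invoke it with a `task_type` and `task_params` payload and branch on the returned `task_status` (`COMPLETED`, `FAILED`, or anything else, which loops back to the Wait state). The sketch below only illustrates that polling pattern; it is not the repository's implementation, and the straight pass-through of the Omics job status (without normalization or error handling) is an assumption.

```
# Minimal sketch of the polling pattern used by the Step Functions checker task.
# Not the repository's lambda_check_omics_workflow_task.py; status normalization
# and error handling are omitted.
import boto3

omics = boto3.client("omics")

def lambda_handler(event, context):
    task_type = event["task_type"]
    params = event.get("task_params", {})

    if task_type == "GetReadSetImportJob":
        # Import of FASTQ/BAM read sets into the Omics Sequence Store
        response = omics.get_read_set_import_job(
            sequenceStoreId=params["sequence_store_id"], id=params["id"]
        )
    elif task_type == "GetRun":
        # The Omics (GATK) workflow run itself
        response = omics.get_run(id=params["id"])
    elif task_type == "GetVariantImportJob":
        # Import of the output VCF into the Omics Variant Store
        response = omics.get_variant_import_job(jobId=params["job_id"])
    else:
        raise ValueError(f"Unsupported task_type: {task_type}")

    # The Choice states compare this value against COMPLETED / FAILED and
    # otherwise return to the corresponding Wait state.
    return {"task_status": response["status"]}
```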
89 | 90 | Note that if there is a Step Functions workflow failure, users can refer to this blog post for instructions on how to resume a Step Functions workflow - https://aws.amazon.com/blogs/compute/resume-aws-step-functions-from-any-state/ 91 | 92 | 5. Now that the variants are available in the Omics Variant Store and the pre-loaded annotations in the Omics Annotation Store, you can create resource links for them in AWS Lake Formation, grant permissions to the desired users, and query the resulting tables in Amazon Athena to derive insights (see instructions on how to provide Lake Formation permissions in the blog post). Note that for the example notebook, we used genomic data from the example [Ovation Dx NAFLD Whole Genome dataset](https://aws.amazon.com/marketplace/pp/prodview-565xa6uzf77wu?sr=0-1&ref_=beagle&applicationId=AWS-Marketplace-Console#offers) from AWS Data Exchange. 93 | 94 | ## Cleanup 95 | 96 | The above solution deploys several AWS resources as part of the CloudFormation stack. If you choose to clean up the resources created by this solution, take the following steps: 97 | 98 | 1. Delete the CloudFormation stack with the name that was assigned at creation. This will start deleting all the resources created. 99 | 2. Due to certain actions taken during usage of the solution, resources such as the workflow input and output buckets and the ECR repositories will fail to delete because they are not empty. In order to delete them as well, empty the contents of the S3 buckets for workflow inputs and outputs and delete the images created under the Amazon ECR repositories (if you chose to clean up these resources). Once deleted, you can re-attempt to delete the CloudFormation stack. 100 | 3. If the Omics resources fail to be deleted by the delete stack action in CloudFormation, users will need to manually delete the Omics resources created by the stack, such as the workflow, variant store, annotation store, sequence store and reference store (or just the imported reference genome). Once done, you can re-attempt to delete the CloudFormation stack. 101 | 4. If certain custom CloudFormation resources, such as Lambda functions in the Omics and CodeBuild stacks, fail to delete again, simply retrying the deletion of the parent stack should delete them. 102 | 103 | 104 | ## License 105 | This library is licensed under the MIT-0 License. See the LICENSE file. 106 | 107 | ## Authors 108 | 109 | Nadeem Bulsara | Sr. Solutions Architect - Genomics, BDSI | AWS 110 | 111 | Sujaya Srinivasan | Genomics Solutions Architect, WWPS | AWS 112 | 113 | David Obenshain | Cloud Application Architect, WWPS Professional Services | AWS 114 | 115 | Gargi Singh Chhatwal | Sr. Solutions Architect - Federal Civilian, WWPS | AWS 116 | 117 | Joshua Broyde | Sr. AI/ML Solutions Architect, BDSI | AWS -------------------------------------------------------------------------------- /THIRD-PARTY-LICENSES: -------------------------------------------------------------------------------- 1 | ** CrHelper; version 2.0.11 -- https://github.com/aws-cloudformation/custom-resource-helper 2 | 3 | Apache License 4 | Version 2.0, January 2004 5 | http://www.apache.org/licenses/ 6 | 7 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 8 | 9 | 1. Definitions. 10 | 11 | "License" shall mean the terms and conditions for use, reproduction, and 12 | distribution as defined by Sections 1 through 9 of this document.
13 | 14 | "Licensor" shall mean the copyright owner or entity authorized by the copyright 15 | owner that is granting the License. 16 | 17 | "Legal Entity" shall mean the union of the acting entity and all other entities 18 | that control, are controlled by, or are under common control with that entity. 19 | For the purposes of this definition, "control" means (i) the power, direct or 20 | indirect, to cause the direction or management of such entity, whether by 21 | contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity exercising 25 | permissions granted by this License. 26 | 27 | "Source" form shall mean the preferred form for making modifications, including 28 | but not limited to software source code, documentation source, and configuration 29 | files. 30 | 31 | "Object" form shall mean any form resulting from mechanical transformation or 32 | translation of a Source form, including but not limited to compiled object code, 33 | generated documentation, and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or Object form, made 36 | available under the License, as indicated by a copyright notice that is included 37 | in or attached to the work (an example is provided in the Appendix below). 38 | 39 | "Derivative Works" shall mean any work, whether in Source or Object form, that 40 | is based on (or derived from) the Work and for which the editorial revisions, 41 | annotations, elaborations, or other modifications represent, as a whole, an 42 | original work of authorship. For the purposes of this License, Derivative Works 43 | shall not include works that remain separable from, or merely link (or bind by 44 | name) to the interfaces of, the Work and Derivative Works thereof. 45 | 46 | "Contribution" shall mean any work of authorship, including the original version 47 | of the Work and any modifications or additions to that Work or Derivative Works 48 | thereof, that is intentionally submitted to Licensor for inclusion in the Work 49 | by the copyright owner or by an individual or Legal Entity authorized to submit 50 | on behalf of the copyright owner. For the purposes of this definition, 51 | "submitted" means any form of electronic, verbal, or written communication sent 52 | to the Licensor or its representatives, including but not limited to 53 | communication on electronic mailing lists, source code control systems, and 54 | issue tracking systems that are managed by, or on behalf of, the Licensor for 55 | the purpose of discussing and improving the Work, but excluding communication 56 | that is conspicuously marked or otherwise designated in writing by the copyright 57 | owner as "Not a Contribution." 58 | 59 | "Contributor" shall mean Licensor and any individual or Legal Entity on behalf 60 | of whom a Contribution has been received by Licensor and subsequently 61 | incorporated within the Work. 62 | 63 | 2. Grant of Copyright License. Subject to the terms and conditions of this 64 | License, each Contributor hereby grants to You a perpetual, worldwide, non- 65 | exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, 66 | prepare Derivative Works of, publicly display, publicly perform, sublicense, and 67 | distribute the Work and such Derivative Works in Source or Object form. 68 | 69 | 3. Grant of Patent License. 
Subject to the terms and conditions of this License, 70 | each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no- 71 | charge, royalty-free, irrevocable (except as stated in this section) patent 72 | license to make, have made, use, offer to sell, sell, import, and otherwise 73 | transfer the Work, where such license applies only to those patent claims 74 | licensable by such Contributor that are necessarily infringed by their 75 | Contribution(s) alone or by combination of their Contribution(s) with the Work 76 | to which such Contribution(s) was submitted. If You institute patent litigation 77 | against any entity (including a cross-claim or counterclaim in a lawsuit) 78 | alleging that the Work or a Contribution incorporated within the Work 79 | constitutes direct or contributory patent infringement, then any patent licenses 80 | granted to You under this License for that Work shall terminate as of the date 81 | such litigation is filed. 82 | 83 | 4. Redistribution. You may reproduce and distribute copies of the Work or 84 | Derivative Works thereof in any medium, with or without modifications, and in 85 | Source or Object form, provided that You meet the following conditions: 86 | 87 | (a) You must give any other recipients of the Work or Derivative Works a 88 | copy of this License; and 89 | 90 | (b) You must cause any modified files to carry prominent notices stating 91 | that You changed the files; and 92 | 93 | (c) You must retain, in the Source form of any Derivative Works that You 94 | distribute, all copyright, patent, trademark, and attribution notices from the 95 | Source form of the Work, excluding those notices that do not pertain to any part 96 | of the Derivative Works; and 97 | 98 | (d) If the Work includes a "NOTICE" text file as part of its distribution, 99 | then any Derivative Works that You distribute must include a readable copy of 100 | the attribution notices contained within such NOTICE file, excluding those 101 | notices that do not pertain to any part of the Derivative Works, in at least one 102 | of the following places: within a NOTICE text file distributed as part of the 103 | Derivative Works; within the Source form or documentation, if provided along 104 | with the Derivative Works; or, within a display generated by the Derivative 105 | Works, if and wherever such third-party notices normally appear. The contents of 106 | the NOTICE file are for informational purposes only and do not modify the 107 | License. You may add Your own attribution notices within Derivative Works that 108 | You distribute, alongside or as an addendum to the NOTICE text from the Work, 109 | provided that such additional attribution notices cannot be construed as 110 | modifying the License. 111 | 112 | You may add Your own copyright statement to Your modifications and may 113 | provide additional or different license terms and conditions for use, 114 | reproduction, or distribution of Your modifications, or for any such Derivative 115 | Works as a whole, provided Your use, reproduction, and distribution of the Work 116 | otherwise complies with the conditions stated in this License. 117 | 118 | 5. Submission of Contributions. Unless You explicitly state otherwise, any 119 | Contribution intentionally submitted for inclusion in the Work by You to the 120 | Licensor shall be under the terms and conditions of this License, without any 121 | additional terms or conditions. 
Notwithstanding the above, nothing herein shall 122 | supersede or modify the terms of any separate license agreement you may have 123 | executed with Licensor regarding such Contributions. 124 | 125 | 6. Trademarks. This License does not grant permission to use the trade names, 126 | trademarks, service marks, or product names of the Licensor, except as required 127 | for reasonable and customary use in describing the origin of the Work and 128 | reproducing the content of the NOTICE file. 129 | 130 | 7. Disclaimer of Warranty. Unless required by applicable law or agreed to in 131 | writing, Licensor provides the Work (and each Contributor provides its 132 | Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 133 | KIND, either express or implied, including, without limitation, any warranties 134 | or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 135 | PARTICULAR PURPOSE. You are solely responsible for determining the 136 | appropriateness of using or redistributing the Work and assume any risks 137 | associated with Your exercise of permissions under this License. 138 | 139 | 8. Limitation of Liability. In no event and under no legal theory, whether in 140 | tort (including negligence), contract, or otherwise, unless required by 141 | applicable law (such as deliberate and grossly negligent acts) or agreed to in 142 | writing, shall any Contributor be liable to You for damages, including any 143 | direct, indirect, special, incidental, or consequential damages of any character 144 | arising as a result of this License or out of the use or inability to use the 145 | Work (including but not limited to damages for loss of goodwill, work stoppage, 146 | computer failure or malfunction, or any and all other commercial damages or 147 | losses), even if such Contributor has been advised of the possibility of such 148 | damages. 149 | 150 | 9. Accepting Warranty or Additional Liability. While redistributing the Work or 151 | Derivative Works thereof, You may choose to offer, and charge a fee for, 152 | acceptance of support, warranty, indemnity, or other liability obligations 153 | and/or rights consistent with this License. However, in accepting such 154 | obligations, You may act only on Your own behalf and on Your sole 155 | responsibility, not on behalf of any other Contributor, and only if You agree to 156 | indemnify, defend, and hold each Contributor harmless for any liability incurred 157 | by, or claims asserted against, such Contributor by reason of your accepting any 158 | such warranty or additional liability. 159 | 160 | END OF TERMS AND CONDITIONS 161 | 162 | APPENDIX: How to apply the Apache License to your work. 163 | 164 | To apply the Apache License to your work, attach the following boilerplate 165 | notice, with the fields enclosed by brackets "[]" replaced with your own 166 | identifying information. (Don't include the brackets!) The text should be 167 | enclosed in the appropriate comment syntax for the file format. We also 168 | recommend that a file or class name and description of purpose be included on 169 | the same "printed page" as the copyright notice for easier identification within 170 | third-party archives. 171 | 172 | Copyright [yyyy] [name of copyright owner] 173 | 174 | Licensed under the Apache License, Version 2.0 (the "License"); 175 | you may not use this file except in compliance with the License. 
176 | You may obtain a copy of the License at 177 | 178 | http://www.apache.org/licenses/LICENSE-2.0 179 | 180 | Unless required by applicable law or agreed to in writing, software 181 | distributed under the License is distributed on an "AS IS" BASIS, 182 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 183 | See the License for the specific language governing permissions and 184 | limitations under the License. 185 | 186 | * For CrHelper see also this required NOTICE: 187 | Custom Resource Helper Copyright 2019 Amazon.com, Inc. or its affiliates. 188 | All Rights Reserved. 189 | This library is licensed under the Apache 2.0 License. 190 | Decorator implementation inspired by https://github.com/ryansb/cfn-wrapper- 191 | python 192 | Log implementation inspired by https://gitlab.com/hadrien/aws_lambda_logging 193 | 194 | ------ 195 | 196 | ** gatk4-germline-snps-indels; version 2.3.1 -- https://github.com/gatk-workflows/gatk4-germline-snps-indels 197 | Copyright Broad Institute, 2019 | BSD-3 This script is released under the WDL 198 | open source code license (BSD-3) (full license text at 199 | https://github.com/openwdl/wdl/blob/master/LICENSE). Note however that the 200 | programs it calls may be subject to different licenses. Users are responsible 201 | for checking that they are authorized to run all programs before running this 202 | script. 203 | 204 | BSD 3-Clause License 205 | 206 | Copyright (c) 2017, Broad Institute 207 | All rights reserved. 208 | 209 | Redistribution and use in source and binary forms, with or without 210 | modification, are permitted provided that the following conditions are met: 211 | 212 | * Redistributions of source code must retain the above copyright notice, this 213 | list of conditions and the following disclaimer. 214 | 215 | * Redistributions in binary form must reproduce the above copyright notice, 216 | this list of conditions and the following disclaimer in the documentation 217 | and/or other materials provided with the distribution. 218 | 219 | * Neither the name of the copyright holder nor the names of its 220 | contributors may be used to endorse or promote products derived from 221 | this software without specific prior written permission. 222 | 223 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 224 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 225 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 226 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 227 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 228 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 229 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 230 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 231 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 232 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
233 | 234 | ------ 235 | 236 | ** jsonschema; version 4.16.0 -- https://github.com/python-jsonschema/jsonschema 237 | N/A 238 | 239 | MIT License 240 | 241 | Copyright (c) 242 | 243 | Permission is hereby granted, free of charge, to any person obtaining a copy of 244 | this software and associated documentation files (the "Software"), to deal in 245 | the Software without restriction, including without limitation the rights to 246 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 247 | the Software, and to permit persons to whom the Software is furnished to do so, 248 | subject to the following conditions: 249 | 250 | The above copyright notice and this permission notice shall be included in all 251 | copies or substantial portions of the Software. 252 | 253 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 254 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 255 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 256 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 257 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 258 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 259 | -------------------------------------------------------------------------------- /deploy/upload_artifacts.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | set -e 4 | 5 | ARTIFACT_S3_BUCKET=s3://${1} 6 | AWS_PROFILE=$2 7 | TEMPLATES='templates' 8 | LAMBDAS='lambdas' 9 | WORKFLOWS='workflows' 10 | BUILDSPECS='buildspecs' 11 | 12 | # use profile if 2nd argument provided 13 | if [ $# -eq 2 ] 14 | then 15 | AWS_PROFILE=" --profile ${2}" 16 | fi 17 | 18 | echo "==========================================" 19 | echo "TRIGGER CODEBUILD JOB LAMBDAS" 20 | echo "==========================================" 21 | cd ../src/lambda/trigger_code_build 22 | echo "Installing pip packages" 23 | pip3 install crhelper boto3==1.26.65 -t ./package 24 | cd ./package 25 | zip -r ../trigger_docker_code_build.zip . 26 | cd .. 27 | echo "Zip lambda to artifact" 28 | zip -g trigger_docker_code_build.zip trigger_docker_code_build.py 29 | aws s3 cp trigger_docker_code_build.zip $ARTIFACT_S3_BUCKET/$LAMBDAS/ $AWS_PROFILE 30 | rm trigger_docker_code_build.zip 31 | echo "done with trigger_docker_code_build.zip" 32 | 33 | cd ./package 34 | zip -r ../trigger_lambdas_code_build.zip . 35 | cd .. 
36 | echo "Zip lambda to artifact" 37 | zip -g trigger_lambdas_code_build.zip trigger_lambdas_code_build.py 38 | aws s3 cp trigger_lambdas_code_build.zip $ARTIFACT_S3_BUCKET/$LAMBDAS/ $AWS_PROFILE 39 | rm trigger_lambdas_code_build.zip 40 | rm -r ./package 41 | echo "done with trigger_lambdas_code_build.zip" 42 | echo "Uploaded all lambdas Zip files to trigger Code Build jobs to ${ARTIFACT_S3_BUCKET}/${LAMBDAS}/" 43 | 44 | echo "==========================================" 45 | echo "UPLOAD CLOUDFORMATION TEMPLATES" 46 | echo "==========================================" 47 | echo "iterate through cfn templates and upload to s3" 48 | cd ../../../deploy 49 | for f in $(find ../src/cfn_templates/ -name '*.yml' -or -name '*.yaml'); do echo "uploading `basename $f`" && aws s3 cp $f $ARTIFACT_S3_BUCKET/$TEMPLATES/ $AWS_PROFILE && echo "Done"; done 50 | echo "Uploaded all cfn template files to ${ARTIFACT_S3_BUCKET}/${TEMPLATES}/" 51 | 52 | echo "==========================================" 53 | echo "ZIP and UPLOAD WORKFLOW FILES" 54 | echo "==========================================" 55 | echo "zip and upload workflow files " 56 | cd ../src/workflow/ 57 | zip -r gatkbestpractices.wdl.zip main.wdl sub-workflows/ 58 | aws s3 cp gatkbestpractices.wdl.zip $ARTIFACT_S3_BUCKET/$WORKFLOWS/ $AWS_PROFILE 59 | aws s3 cp parameter-template.json $ARTIFACT_S3_BUCKET/$WORKFLOWS/ $AWS_PROFILE 60 | rm gatkbestpractices.wdl.zip 61 | echo "uploaded all workflow files to ${ARTIFACT_S3_BUCKET}/${WORKFLOWS}/" 62 | cd - 63 | 64 | echo "==========================================" 65 | echo "UPLOAD LAMBDA FILES" 66 | echo "==========================================" 67 | echo "iterate through lambdas and upload to s3" 68 | for f in $(find ../src/lambda/ -name '*.py'); do echo "uploading `basename $f`" && aws s3 cp $f $ARTIFACT_S3_BUCKET/$LAMBDAS/ $AWS_PROFILE && echo "Done"; done 69 | echo "Uploaded all lambda files to ${ARTIFACT_S3_BUCKET}/${LAMBDAS}/" 70 | 71 | echo "==========================================" 72 | echo "UPLOAD CODEBUILD BUILDSPEC FILES" 73 | echo "==========================================" 74 | echo "iterate through buildspecs and upload to s3" 75 | for f in $(find ../src/codebuild/ -name '*.yml' -or -name '*.yaml'); do echo "uploading `basename $f`" && aws s3 cp $f $ARTIFACT_S3_BUCKET/$BUILDSPECS/ $AWS_PROFILE && echo "Done"; done 76 | echo "Uploaded all buildspec files to ${ARTIFACT_S3_BUCKET}/${BUILDSPECS}/" 77 | 78 | echo "==========================================" 79 | echo "DONE" 80 | echo "==========================================" -------------------------------------------------------------------------------- /src/cfn_templates/apply-s3-lifecycle-stack.yml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: 2010-09-09 2 | Description: >- 3 | Lambda to tag relevant genomics inputs and output 4 | files in S3 so that S3 life cycle policies can be applied 5 | Parameters: 6 | InputsBucketName: 7 | Type: String 8 | Description: S3 bucket that's used for inputs (e.g. FASTQs) 9 | OutputsBucketName: 10 | Type: String 11 | Description: S3 bucket that's used for outputs (e.g.
BAM/CRAM, VCFs) 12 | LambdaBucketName: 13 | Type: String 14 | Description: S3 bucket where lambda code artifacts are stored 15 | LambdaArtifactPrefix: 16 | Type: String 17 | Description: Prefix in bucket where lambda artifacts are stored 18 | Resources: 19 | ApplyS3LifecycleLambdaFunction: 20 | Type: 'AWS::Lambda::Function' 21 | Properties: 22 | FunctionName: apply-s3-lifecycle 23 | Code: 24 | S3Bucket: !Ref LambdaBucketName 25 | S3Key: !Sub "${LambdaArtifactPrefix}apply_s3_lifecycle_lambda.zip" 26 | Handler: apply_s3_lifecycle_lambda.lambda_handler 27 | Role: !GetAtt LambdaCleanupIAMRole.Arn 28 | Runtime: python3.9 29 | Timeout: 30 30 | LambdaCleanupIAMRole: 31 | Type: 'AWS::IAM::Role' 32 | Properties: 33 | AssumeRolePolicyDocument: 34 | Version: 2012-10-17 35 | Statement: 36 | - Effect: Allow 37 | Principal: 38 | Service: 39 | - lambda.amazonaws.com 40 | Action: 41 | - 'sts:AssumeRole' 42 | Policies: 43 | - PolicyName: TaggingAndLogging 44 | PolicyDocument: 45 | Version: 2012-10-17 46 | Statement: 47 | - Effect: Allow 48 | Action: 49 | - 's3:PutObjectTagging' 50 | - 's3:GetObjectTagging' 51 | - 's3:ListBucket' 52 | Resource: 53 | - !Sub 'arn:aws:s3:::${InputsBucketName}' 54 | - !Sub 'arn:aws:s3:::${InputsBucketName}/*' 55 | - !Sub 'arn:aws:s3:::${OutputsBucketName}' 56 | - !Sub 'arn:aws:s3:::${OutputsBucketName}/*' 57 | - Effect: Allow 58 | Action: 59 | - 'logs:CreateLogGroup' 60 | - 'logs:CreateLogStream' 61 | - 'logs:PutLogEvents' 62 | Resource: 'arn:aws:logs:*:*:*' 63 | Outputs: 64 | ApplyS3LifecycleLambdaFunctionArn: 65 | Value: !GetAtt ApplyS3LifecycleLambdaFunction.Arn 66 | Export: 67 | Name: ApplyS3LifecycleLambdaFunctionArn 68 | -------------------------------------------------------------------------------- /src/cfn_templates/code-build-stack.yml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: 2010-09-09 2 | Description: >- 3 | Build Code Build projects and Lambdas to trigger the build of all required 4 | lambda functions and Docker images for Omics resources 5 | Parameters: 6 | ResourcesS3Bucket: 7 | Type: String 8 | LambdasS3Prefix: 9 | Type: String 10 | BuildSpecS3Prefix: 11 | Type: String 12 | DockerGenomesInTheCloud: 13 | Type: String 14 | DockerGatk: 15 | Type: String 16 | DockerCodeBuildBuildSpecS3File: 17 | Type: String 18 | Default: buildspec_docker.yml 19 | LambdasCodeBuildBuildSpecS3File: 20 | Type: String 21 | Default: buildspec_lambdas.yml 22 | Resources: 23 | ECRGenomesInTheCloud: 24 | Type: 'AWS::ECR::Repository' 25 | Properties: 26 | RepositoryName: genomes-in-the-cloud 27 | RepositoryPolicyText: 28 | Version: 2012-10-17 29 | Statement: 30 | - Sid: AllowOmicsToPull 31 | Effect: Allow 32 | Principal: 33 | Service: omics.amazonaws.com 34 | Action: 35 | - 'ecr:BatchGetImage' 36 | - 'ecr:GetDownloadUrlForLayer' 37 | - 'ecr:BatchCheckLayerAvailability' 38 | ECRGatk: 39 | Type: 'AWS::ECR::Repository' 40 | Properties: 41 | RepositoryName: gatk 42 | RepositoryPolicyText: 43 | Version: 2012-10-17 44 | Statement: 45 | - Sid: AllowOmicsToPull 46 | Effect: Allow 47 | Principal: 48 | Service: omics.amazonaws.com 49 | Action: 50 | - 'ecr:BatchGetImage' 51 | - 'ecr:GetDownloadUrlForLayer' 52 | - 'ecr:BatchCheckLayerAvailability' 53 | CodeBuildServiceRole: 54 | Type: 'AWS::IAM::Role' 55 | Properties: 56 | AssumeRolePolicyDocument: 57 | Version: 2012-10-17 58 | Statement: 59 | - Action: 60 | - 'sts:AssumeRole' 61 | Effect: Allow 62 | Principal: 63 | Service: 64 | - codebuild.amazonaws.com 65 | Path: / 66 | Policies: 
67 | - PolicyName: CodeBuildServiceRolePolicy 68 | PolicyDocument: 69 | Statement: 70 | - Effect: Allow 71 | Action: 72 | - 'ecr:BatchCheckLayerAvailability' 73 | - 'ecr:CompleteLayerUpload' 74 | - 'ecr:GetAuthorizationToken' 75 | - 'ecr:InitiateLayerUpload' 76 | - 'ecr:PutImage' 77 | - 'ecr:UploadLayerPart' 78 | Resource: '*' 79 | - Effect: Allow 80 | Action: 81 | - 's3:GetObject' 82 | - 's3:GetBucketLocation' 83 | - 's3:ListBucket' 84 | - 's3:PutObject' 85 | Resource: 86 | - !Sub 'arn:aws:s3:::${ResourcesS3Bucket}' 87 | - !Sub 'arn:aws:s3:::${ResourcesS3Bucket}/*' 88 | - Effect: Allow 89 | Action: 90 | - 'logs:CreateLogGroup' 91 | - 'logs:CreateLogStream' 92 | - 'logs:PutLogEvents' 93 | Resource: 94 | - !Sub >- 95 | arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/codebuild/* 96 | DockerCodeBuildProject: 97 | Type: 'AWS::CodeBuild::Project' 98 | DependsOn: 99 | - ECRGenomesInTheCloud 100 | - ECRGatk 101 | - CodeBuildServiceRole 102 | - LambdasCodeBuildProject 103 | Properties: 104 | Name: DockerCodeBuildProject 105 | ServiceRole: !Sub '${CodeBuildServiceRole.Arn}' 106 | Artifacts: 107 | Type: NO_ARTIFACTS 108 | Environment: 109 | Type: linuxContainer 110 | ComputeType: BUILD_GENERAL1_SMALL 111 | Image: 'aws/codebuild/standard:3.0' 112 | PrivilegedMode: true 113 | Source: 114 | Type: NO_SOURCE 115 | BuildSpec: !Sub >- 116 | arn:aws:s3:::${ResourcesS3Bucket}/${BuildSpecS3Prefix}/${DockerCodeBuildBuildSpecS3File} 117 | TimeoutInMinutes: 20 118 | LambdasCodeBuildProject: 119 | Type: 'AWS::CodeBuild::Project' 120 | DependsOn: 121 | - CodeBuildServiceRole 122 | Properties: 123 | Name: LambdasCodeBuildProject 124 | ServiceRole: !Sub '${CodeBuildServiceRole.Arn}' 125 | Artifacts: 126 | Type: NO_ARTIFACTS 127 | Environment: 128 | Type: linuxContainer 129 | ComputeType: BUILD_GENERAL1_SMALL 130 | Image: 'aws/codebuild/standard:3.0' 131 | PrivilegedMode: true 132 | EnvironmentVariables: 133 | - Name: RESOURCES_BUCKET 134 | Type: PLAINTEXT 135 | Value: !Ref ResourcesS3Bucket 136 | - Name: RESOURCES_PREFIX 137 | Type: PLAINTEXT 138 | Value: !Ref LambdasS3Prefix 139 | Source: 140 | Type: NO_SOURCE 141 | BuildSpec: !Sub >- 142 | arn:aws:s3:::${ResourcesS3Bucket}/${BuildSpecS3Prefix}/${LambdasCodeBuildBuildSpecS3File} 143 | TimeoutInMinutes: 10 144 | TriggerDockerCodeBuildGenomesInTheCloud: 145 | Type: 'Custom::TriggerDockerCodeBuildGitc' 146 | DependsOn: 147 | - DockerCodeBuildProject 148 | - TriggerDockerCodeBuildLambda 149 | Version: 1 150 | Properties: 151 | ServiceToken: !Sub '${TriggerDockerCodeBuildLambda.Arn}' 152 | ProjectName: DockerCodeBuildProject 153 | SourceRepo: !Ref DockerGenomesInTheCloud 154 | EcrRepo: !GetAtt ECRGenomesInTheCloud.RepositoryUri 155 | TriggerDockerCodeBuildGatk: 156 | Type: 'Custom::TriggerDockerCodeBuildGatk' 157 | DependsOn: 158 | - DockerCodeBuildProject 159 | - TriggerDockerCodeBuildLambda 160 | Version: 1 161 | Properties: 162 | ServiceToken: !Sub '${TriggerDockerCodeBuildLambda.Arn}' 163 | ProjectName: DockerCodeBuildProject 164 | SourceRepo: !Ref DockerGatk 165 | EcrRepo: !GetAtt ECRGatk.RepositoryUri 166 | TriggerLambdasCodeBuild: 167 | Type: 'Custom::TriggerLambdasCodeBuild' 168 | DependsOn: 169 | - LambdasCodeBuildProject 170 | - TriggerLambdasCodeBuildLambda 171 | Version: 1 172 | Properties: 173 | ServiceToken: !Sub '${TriggerLambdasCodeBuildLambda.Arn}' 174 | ProjectName: LambdasCodeBuildProject 175 | TriggerDockerCodeBuildLambda: 176 | Type: 'AWS::Lambda::Function' 177 | DependsOn: 178 | - TriggerCodeBuildLambdaRole 179 | Properties: 180 | 
Handler: trigger_docker_code_build.handler 181 | Runtime: python3.9 182 | FunctionName: trigger-docker-code-build 183 | Code: 184 | S3Bucket: !Sub '${ResourcesS3Bucket}' 185 | S3Key: !Sub '${LambdasS3Prefix}trigger_docker_code_build.zip' 186 | Role: !Sub '${TriggerCodeBuildLambdaRole.Arn}' 187 | Timeout: 60 188 | TriggerLambdasCodeBuildLambda: 189 | Type: 'AWS::Lambda::Function' 190 | DependsOn: 191 | - TriggerCodeBuildLambdaRole 192 | Properties: 193 | Handler: trigger_lambdas_code_build.handler 194 | Runtime: python3.9 195 | FunctionName: trigger-lambdas-code-build 196 | Code: 197 | S3Bucket: !Sub '${ResourcesS3Bucket}' 198 | S3Key: !Sub '${LambdasS3Prefix}trigger_lambdas_code_build.zip' 199 | Role: !Sub '${TriggerCodeBuildLambdaRole.Arn}' 200 | Timeout: 60 201 | TriggerCodeBuildLambdaRole: 202 | Type: 'AWS::IAM::Role' 203 | DependsOn: [] 204 | Properties: 205 | AssumeRolePolicyDocument: 206 | Version: 2012-10-17 207 | Statement: 208 | - Action: 209 | - 'sts:AssumeRole' 210 | Effect: Allow 211 | Principal: 212 | Service: 213 | - lambda.amazonaws.com 214 | Path: / 215 | Policies: 216 | - PolicyName: TriggerCodeBuildLambdaRolePolicy 217 | PolicyDocument: 218 | Statement: 219 | - Effect: Allow 220 | Action: 221 | - 'logs:CreateLogGroup' 222 | - 'logs:CreateLogStream' 223 | - 'logs:PutLogEvents' 224 | Resource: 225 | - !Sub >- 226 | arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/* 227 | - Effect: Allow 228 | Action: 229 | - 'lambda:AddPermission' 230 | - 'lambda:RemovePermission' 231 | - 'events:PutRule' 232 | - 'events:DeleteRule' 233 | - 'events:PutTargets' 234 | - 'events:RemoveTargets' 235 | Resource: '*' 236 | - Effect: Allow 237 | Action: 238 | - 'iam:GetRole' 239 | - 'iam:PassRole' 240 | Resource: !Sub '${CodeBuildServiceRole.Arn}' 241 | - Effect: Allow 242 | Action: 243 | - 'codebuild:StartBuild' 244 | - 'codebuild:BatchGetBuilds' 245 | Resource: 246 | - !Sub '${DockerCodeBuildProject.Arn}' 247 | - !Sub '${LambdasCodeBuildProject.Arn}' 248 | Outputs: 249 | EcrImageUriGotc: 250 | Value: !GetAtt TriggerDockerCodeBuildGenomesInTheCloud.EcrImageUri 251 | EcrImageUriGatk: 252 | Value: !GetAtt TriggerDockerCodeBuildGatk.EcrImageUri -------------------------------------------------------------------------------- /src/cfn_templates/e2e-sfn-stack.yml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: "2010-09-09" 2 | Parameters: 3 | OmicsImportSequenceLambdaArn: 4 | Type: String 5 | OmicsImportSequenceJobRoleArn: 6 | Type: String 7 | CheckOmicsTaskLambdaFunctionArn: 8 | Type: String 9 | OmicsWorkflowStartRunLambdaArn: 10 | Type: String 11 | OmicsWorkflowStartRunJobRoleArn: 12 | Type: String 13 | OmicsImportVariantLambdaArn: 14 | Type: String 15 | OmicsImportVariantJobRoleArn: 16 | Type: String 17 | ApplyS3LifecycleLambdaFunctionArn: 18 | Type: String 19 | ReferenceFastaFileS3Uri: 20 | Type: String 21 | Mills1000GIndelsVcf: 22 | Type: String 23 | DbSnpVcf: 24 | Type: String 25 | KnownIndelsVcf: 26 | Type: String 27 | RunDate: 28 | Type: String 29 | Description: Example run date 30 | Default: 2016-09-01T02:00:00+0200 31 | PlatformName: 32 | Type: String 33 | Description: Example platform name 34 | Default: Illumina 35 | SequencingCenter: 36 | Type: String 37 | Description: Example sequencing center 38 | Default: ABCD 39 | OmicsVariantStoreName: 40 | Type: String 41 | 42 | Description: >- 43 | State Machine for ingesting FASTQs into Omics Sequence Store, 44 | running GATK workflow with input FASTQs, 45 | ingesting 
post-workflow outputs into Omics Sequence and Variamt Stores, 46 | and S3 file tagging to enable activation of S3 lifecycle policies on inputs and outputs 47 | Resources: 48 | AmazonOmicsStepFunction: 49 | Type: AWS::StepFunctions::StateMachine 50 | Properties: 51 | RoleArn: !GetAtt AmazonOmicsStepFunctionRole.Arn 52 | StateMachineName: AmazonOmicsEndToEndStepFunction 53 | DefinitionSubstitutions: 54 | OmicsImportSequenceLambdaArn: !Ref OmicsImportSequenceLambdaArn 55 | OmicsImportSequenceJobRoleArn: !Ref OmicsImportSequenceJobRoleArn 56 | CheckOmicsTaskLambdaFunctionArn: !Ref CheckOmicsTaskLambdaFunctionArn 57 | OmicsWorkflowStartRunLambdaArn: !Ref OmicsWorkflowStartRunLambdaArn 58 | OmicsWorkflowStartRunJobRoleArn: !Ref OmicsWorkflowStartRunJobRoleArn 59 | OmicsImportVariantJobRoleArn: !Ref OmicsImportVariantJobRoleArn 60 | OmicsImportVariantLambdaArn: !Ref OmicsImportVariantLambdaArn 61 | ApplyS3LifecycleLambdaFunctionArn: !Ref ApplyS3LifecycleLambdaFunctionArn 62 | 63 | DefinitionString: !Sub | 64 | { 65 | "Comment": "StateMachine to orchestrate end-to-end Omics Workflow", 66 | "StartAt": "IngestFastqToReadSet", 67 | "States": { 68 | "IngestFastqToReadSet": { 69 | "InputPath": "$", 70 | "Next": "WaitForFastqIngest", 71 | "Parameters": { 72 | "FunctionName": "${OmicsImportSequenceLambdaArn}", 73 | "Payload": { 74 | "FileType": "FASTQ", 75 | "Read1.$": "$.Read1", 76 | "Read2.$": "$.Read2", 77 | "ReferenceArn.$": "$.ReferenceArn", 78 | "SampleId.$": "$.SampleId", 79 | "SequenceStoreId.$": "$.SequenceStoreId", 80 | "SubjectId.$": "$.SubjectId", 81 | "RoleArn": "${OmicsImportSequenceJobRoleArn}" 82 | } 83 | }, 84 | "Resource": "arn:aws:states:::lambda:invoke", 85 | "ResultSelector": { 86 | "import_fastq.$": "$.Payload" 87 | }, 88 | "ResultPath": "$.import_fastq", 89 | "Type": "Task" 90 | }, 91 | "WaitForFastqIngest": { 92 | "Next": "CheckFastqIngest", 93 | "Seconds": 10, 94 | "Type": "Wait" 95 | }, 96 | "SuccessState": { 97 | "Type": "Succeed" 98 | }, 99 | "CheckFastqIngest": { 100 | "InputPath": "$", 101 | "Next": "FastqIngestDone?", 102 | "Parameters": { 103 | "FunctionName": "${CheckOmicsTaskLambdaFunctionArn}", 104 | "Payload": { 105 | "task_type": "GetReadSetImportJob", 106 | "task_params": { 107 | "id.$": "$.import_fastq.import_fastq.importReadSetJobId", 108 | "sequence_store_id.$": "$.SequenceStoreId" 109 | } 110 | } 111 | }, 112 | "Resource": "arn:aws:states:::lambda:invoke", 113 | "ResultPath": "$.import_fastq.import_fastq.status.message", 114 | "Type": "Task" 115 | }, 116 | "FastqIngestFailed": { 117 | "Cause": "Fastq Ingest Failed", 118 | "Error": "$.import_fastq.import_fastq.status.error", 119 | "Type": "Fail" 120 | }, 121 | "FastqIngestDone?": { 122 | "Choices": [ 123 | { 124 | "Next": "RunOmicsWorkflowLambda", 125 | "StringEquals": "COMPLETED", 126 | "Variable": "$.import_fastq.import_fastq.status.message.Payload.task_status" 127 | }, 128 | { 129 | "Next": "FastqIngestFailed", 130 | "StringEquals": "FAILED", 131 | "Variable": "$.import_fastq.import_fastq.status.message.Payload.task_status" 132 | } 133 | ], 134 | "Default": "WaitForFastqIngest", 135 | "Type": "Choice" 136 | }, 137 | "RunOmicsWorkflowLambda": { 138 | "InputPath": "$", 139 | "Next": "WaitForOmicsWorkflow", 140 | "Parameters": { 141 | "FunctionName": "${OmicsWorkflowStartRunLambdaArn}", 142 | "Payload": { 143 | "sample_name.$": "$.SampleId", 144 | "fastq_1.$": "$.Read1", 145 | "fastq_2.$": "$.Read2", 146 | "ref_fasta": "${ReferenceFastaFileS3Uri}", 147 | "readgroup_name.$": "$.SampleId", 148 | "library_name.$": 
"$.SampleId", 149 | "platform_name": "${PlatformName}", 150 | "run_date": "${RunDate}", 151 | "sequencing_center": "${SequencingCenter}", 152 | "dbSNP_vcf": "${DbSnpVcf}", 153 | "Mills_1000G_indels_vcf": "${Mills1000GIndelsVcf}", 154 | "known_indels_vcf": "${KnownIndelsVcf}", 155 | "scattered_calling_intervals_archive.$": "$.IntervalsS3Path", 156 | "gatk_docker.$": "$.GatkDockerUri", 157 | "gotc_docker.$": "$.GotcDockerUri", 158 | "WorkflowId.$": "$.WorkflowId", 159 | "JobRoleArn": "${OmicsWorkflowStartRunJobRoleArn}", 160 | "OutputS3Path.$": "$.WorkflowOutputS3Path" 161 | } 162 | }, 163 | "Resource": "arn:aws:states:::lambda:invoke", 164 | "ResultSelector": { 165 | "workflow.$": "$.Payload" 166 | }, 167 | "ResultPath": "$.workflow", 168 | "Type": "Task" 169 | }, 170 | "WaitForOmicsWorkflow": { 171 | "Next": "CheckOmicsWorkflow", 172 | "Seconds": 60, 173 | "Type": "Wait" 174 | }, 175 | "CheckOmicsWorkflow": { 176 | "InputPath": "$", 177 | "Next": "OmicsWorkflowDone?", 178 | "Parameters": { 179 | "FunctionName": "${CheckOmicsTaskLambdaFunctionArn}", 180 | "Payload": { 181 | "task_type": "GetRun", 182 | "task_params": { 183 | "id.$": "$.workflow.workflow.WorkflowRunId" 184 | } 185 | } 186 | }, 187 | "Resource": "arn:aws:states:::lambda:invoke", 188 | "ResultPath": "$.workflow.workflow.status.message", 189 | "Type": "Task" 190 | }, 191 | "OmicsWorkflowDone?": { 192 | "Choices": [ 193 | { 194 | "Next": "PostWorkflowIngest", 195 | "StringEquals": "COMPLETED", 196 | "Variable": "$.workflow.workflow.status.message.Payload.task_status" 197 | }, 198 | { 199 | "Next": "OmicsWorkflowFailed", 200 | "StringEquals": "FAILED", 201 | "Variable": "$.workflow.workflow.status.message.Payload.task_status" 202 | } 203 | ], 204 | "Default": "WaitForOmicsWorkflow", 205 | "Type": "Choice" 206 | }, 207 | "OmicsWorkflowFailed": { 208 | "Cause": "Omics Workflow Failed", 209 | "Error": "$.workflow.workflow.status.error", 210 | "Type": "Fail" 211 | }, 212 | "PostWorkflowIngest": 213 | { 214 | "Branches": 215 | [ 216 | { 217 | "StartAt": "IngestBamToReadSet", 218 | "States": 219 | { 220 | "BamIngestDone?": 221 | { 222 | "Choices": 223 | [ 224 | { 225 | "Next": "PostWorkflowBamIngestCompleted", 226 | "StringEquals": "COMPLETED", 227 | "Variable": "$.import_bam.import_bam.status.message.Payload.task_status" 228 | }, 229 | { 230 | "Next": "BamIngestFailed", 231 | "StringEquals": "FAILED", 232 | "Variable": "$.import_bam.import_bam.status.message.Payload.task_status" 233 | } 234 | ], 235 | "Default": "WaitForBamIngest", 236 | "Type": "Choice" 237 | }, 238 | "BamIngestFailed": 239 | { 240 | "Cause": "Post Workflow BAM Ingest Failed", 241 | "Error": "$.import_bam.import_bam.status.error", 242 | "Type": "Fail" 243 | }, 244 | "CheckBamIngest": 245 | { 246 | "InputPath": "$", 247 | "Next": "BamIngestDone?", 248 | "Parameters": { 249 | "FunctionName": "${CheckOmicsTaskLambdaFunctionArn}", 250 | "Payload": { 251 | "task_type": "GetReadSetImportJob", 252 | "task_params": { 253 | "id.$": "$.import_bam.import_bam.importReadSetJobId", 254 | "sequence_store_id.$": "$.SequenceStoreId" 255 | } 256 | } 257 | }, 258 | "Resource": "arn:aws:states:::lambda:invoke", 259 | "ResultPath": "$.import_bam.import_bam.status.message", 260 | "Type": "Task" 261 | }, 262 | "IngestBamToReadSet": 263 | { 264 | "InputPath": "$", 265 | "Next": "WaitForBamIngest", 266 | "Parameters": 267 | { 268 | "FunctionName": "${OmicsImportSequenceLambdaArn}", 269 | "Payload": { 270 | "FileType": "BAM", 271 | "Read1.$": 
"States.Format('{}/{}/out/analysis_ready_bam/{}.hg38.bam', $.WorkflowOutputS3Path, $.workflow.workflow.WorkflowRunId, $.SampleId)", 272 | "ReferenceArn.$": "$.ReferenceArn", 273 | "SampleId.$": "$.SampleId", 274 | "SequenceStoreId.$": "$.SequenceStoreId", 275 | "SubjectId.$": "$.SubjectId", 276 | "RoleArn": "${OmicsImportSequenceJobRoleArn}" 277 | } 278 | }, 279 | "Resource": "arn:aws:states:::lambda:invoke", 280 | "ResultSelector": { 281 | "import_bam.$": "$.Payload" 282 | }, 283 | "ResultPath": "$.import_bam", 284 | "Type": "Task" 285 | }, 286 | "PostWorkflowBamIngestCompleted": 287 | { 288 | "End": true, 289 | "Type": "Pass" 290 | }, 291 | "WaitForBamIngest": 292 | { 293 | "Next": "CheckBamIngest", 294 | "Seconds": 10, 295 | "Type": "Wait" 296 | } 297 | } 298 | }, 299 | { 300 | "StartAt": "IngestVcfToVariantStore", 301 | "States": 302 | { 303 | "CheckVcfIngest": 304 | { 305 | "InputPath": "$", 306 | "Next": "VcfIngestDone?", 307 | "Parameters": 308 | { 309 | "FunctionName": "${CheckOmicsTaskLambdaFunctionArn}", 310 | "Payload": { 311 | "task_type": "GetVariantImportJob", 312 | "task_params": { 313 | "job_id.$": "$.import_vcf.import_vcf.VariantImportJobId" 314 | } 315 | } 316 | }, 317 | "Resource": "arn:aws:states:::lambda:invoke", 318 | "ResultPath": "$.import_vcf.import_vcf.status.message", 319 | "Type": "Task" 320 | }, 321 | "IngestVcfToVariantStore": 322 | { 323 | "InputPath": "$", 324 | "Next": "WaitForVcfIngest", 325 | "Parameters": 326 | { 327 | "FunctionName": "${OmicsImportVariantLambdaArn}", 328 | "Payload": { 329 | "VariantStoreName": "${OmicsVariantStoreName}", 330 | "OmicsImportVariantRoleArn": "${OmicsImportVariantJobRoleArn}", 331 | "VcfS3Uri.$": "States.Format('{}/{}/out/output_vcf/{}.hg38.vcf.gz', $.WorkflowOutputS3Path, $.workflow.workflow.WorkflowRunId, $.SampleId)" 332 | } 333 | }, 334 | "Resource": "arn:aws:states:::lambda:invoke", 335 | "ResultSelector": { 336 | "import_vcf.$": "$.Payload" 337 | }, 338 | "ResultPath": "$.import_vcf", 339 | "Type": "Task" 340 | }, 341 | "PostWorkflowVcfIngestCompleted": 342 | { 343 | "End": true, 344 | "Type": "Pass" 345 | }, 346 | "VcfIngestDone?": 347 | { 348 | "Choices": 349 | [ 350 | { 351 | "Next": "PostWorkflowVcfIngestCompleted", 352 | "StringEquals": "COMPLETED", 353 | "Variable": "$.import_vcf.import_vcf.status.message.Payload.task_status" 354 | }, 355 | { 356 | "Next": "VcfIngestFailed", 357 | "StringEquals": "FAILED", 358 | "Variable": "$.import_vcf.import_vcf.status.message.Payload.task_status" 359 | } 360 | ], 361 | "Default": "WaitForVcfIngest", 362 | "Type": "Choice" 363 | }, 364 | "VcfIngestFailed": 365 | { 366 | "Cause": "Post Workflow VCF Ingest to Variant Store Failed", 367 | "Error": "$.import_vcf.import_vcf.status.error", 368 | "Type": "Fail" 369 | }, 370 | "WaitForVcfIngest": 371 | { 372 | "Next": "CheckVcfIngest", 373 | "Seconds": 10, 374 | "Type": "Wait" 375 | } 376 | } 377 | } 378 | ], 379 | "Next": "AddLifeCycleTags", 380 | "Type": "Parallel" 381 | }, 382 | "AddLifeCycleTags": 383 | { 384 | "Next": "SuccessState", 385 | "InputPath": "$.[0]", 386 | "Parameters": 387 | { 388 | "FunctionName": "${ApplyS3LifecycleLambdaFunctionArn}", 389 | "Payload": { 390 | "inputs": { 391 | "fastq.$": "States.Array($.Read1,$.Read2)" 392 | }, 393 | "outputs": { 394 | "vcf.$": "States.Array(States.Format('{}/{}/out/output_vcf/{}.hg38.vcf.gz', $.WorkflowOutputS3Path, $.workflow.workflow.WorkflowRunId, $.SampleId))", 395 | "bam.$": "States.Array(States.Format('{}/{}/out/analysis_ready_bam/{}.hg38.bam', $.WorkflowOutputS3Path, 
$.workflow.workflow.WorkflowRunId, $.SampleId))" 396 | } 397 | } 398 | }, 399 | "Resource": "arn:aws:states:::lambda:invoke", 400 | "Type": "Task" 401 | } 402 | } 403 | } 404 | 405 | AmazonOmicsStepFunctionRole: 406 | Type: AWS::IAM::Role 407 | Properties: 408 | RoleName: AmazonOmicsStepFunctionRole 409 | AssumeRolePolicyDocument: 410 | Version: '2012-10-17' 411 | Statement: 412 | - Effect: Allow 413 | Principal: 414 | Service: !Sub 'states.${AWS::Region}.amazonaws.com' 415 | Action: 'sts:AssumeRole' 416 | Policies: 417 | - PolicyName: InvokeLambda 418 | PolicyDocument: 419 | Statement: 420 | - Effect: Allow 421 | Action: 'lambda:InvokeFunction' 422 | Resource: 423 | - !Ref OmicsImportSequenceLambdaArn 424 | - !Ref CheckOmicsTaskLambdaFunctionArn 425 | - !Ref OmicsWorkflowStartRunLambdaArn 426 | - !Ref OmicsImportVariantLambdaArn 427 | - !Ref ApplyS3LifecycleLambdaFunctionArn 428 | Outputs: 429 | AmazonOmicsStepFunctionArn: 430 | Value: !GetAtt AmazonOmicsStepFunction.Arn 431 | Export: 432 | Name: AmazonOmicsStepFunctionArn -------------------------------------------------------------------------------- /src/cfn_templates/omics-resources-stack.yml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: 2010-09-09 2 | Description: All necessary Amazon Omics Resources to store and process Genomics data 3 | Parameters: 4 | ExistingReferenceStoreId: 5 | Type: String 6 | Default: 'NONE' 7 | Description: 'Provide Reference Store ID if exists in current account-region, else leave it NONE' 8 | OmicsResourcePrefix: 9 | Type: String 10 | Default: omics-cfn 11 | OmicsResourcesS3Bucket: 12 | Type: String 13 | OmicsCustomResourceLambdaS3Prefix: 14 | Type: String 15 | OmicsWorkflowInputBucketName: 16 | Type: String 17 | OmicsWorkflowOutputBucketName: 18 | Type: String 19 | OmicsReferenceFastaUri: 20 | Type: String 21 | OmicsReferenceName: 22 | Type: String 23 | ClinvarS3Path: 24 | Type: String 25 | OmicsAnnotationStoreName: 26 | Type: String 27 | OmicsVariantStoreName: 28 | Type: String 29 | AnnotationStoreFormat: 30 | Type: String 31 | Default: VCF 32 | OmicsWorkflowDefinitionZipS3: 33 | Type: String 34 | Conditions: 35 | ToCreateReferenceStore: !Equals 36 | - !Sub '${ExistingReferenceStoreId}' 37 | - 'NONE' 38 | Resources: 39 | 40 | OmicsReferenceStore: 41 | Type: AWS::Omics::ReferenceStore 42 | Condition: ToCreateReferenceStore 43 | Properties: 44 | Name: !Sub '${OmicsResourcePrefix}-reference-store' 45 | 46 | OmicsImportReference: 47 | Type: 'Custom::OmicsImportReference' 48 | Version: 1 49 | Properties: 50 | ServiceToken: !Sub '${OmicsImportReferenceLambda.Arn}' 51 | ReferenceStoreId: 52 | !If [ 53 | ToCreateReferenceStore, 54 | !GetAtt OmicsReferenceStore.ReferenceStoreId, 55 | !Ref ExistingReferenceStoreId] 56 | ReferenceName: !Sub '${OmicsReferenceName}' 57 | OmicsImportReferenceRoleArn: !Sub '${OmicsImportReferenceJobRole.Arn}' 58 | ReferenceSourceS3Uri: !Ref OmicsReferenceFastaUri 59 | 60 | OmicsImportReferenceLambda: 61 | Type: 'AWS::Lambda::Function' 62 | Properties: 63 | Handler: import_reference_lambda.handler 64 | Runtime: python3.9 65 | FunctionName: !Sub '${OmicsResourcePrefix}-import-reference' 66 | Code: 67 | S3Bucket: !Sub '${OmicsResourcesS3Bucket}' 68 | S3Key: !Sub '${OmicsCustomResourceLambdaS3Prefix}import_reference_lambda.zip' 69 | Role: !Sub '${OmicsImportReferenceLambdaRole.Arn}' 70 | Timeout: 60 71 | OmicsImportReferenceLambdaRole: 72 | Type: 'AWS::IAM::Role' 73 | Properties: 74 | AssumeRolePolicyDocument: 75 | Version: 
2012-10-17 76 | Statement: 77 | - Action: 78 | - 'sts:AssumeRole' 79 | Effect: Allow 80 | Principal: 81 | Service: 82 | - lambda.amazonaws.com 83 | Path: / 84 | Policies: 85 | - PolicyName: ImportReferencePolicy 86 | PolicyDocument: 87 | Statement: 88 | - Effect: Allow 89 | Action: 90 | - 'logs:CreateLogGroup' 91 | - 'logs:CreateLogStream' 92 | - 'logs:PutLogEvents' 93 | Resource: 94 | - !Sub >- 95 | arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/* 96 | - Effect: Allow 97 | Action: 98 | - 'omics:*' 99 | Resource: '*' 100 | - Effect: Allow 101 | Action: 102 | - 'lambda:AddPermission' 103 | - 'lambda:RemovePermission' 104 | - 'events:PutRule' 105 | - 'events:DeleteRule' 106 | - 'events:PutTargets' 107 | - 'events:RemoveTargets' 108 | Resource: '*' 109 | - Effect: Allow 110 | Action: 111 | - 'iam:GetRole' 112 | - 'iam:PassRole' 113 | Resource: !Sub '${OmicsImportReferenceJobRole.Arn}' 114 | OmicsImportReferenceJobRole: 115 | Type: 'AWS::IAM::Role' 116 | Properties: 117 | AssumeRolePolicyDocument: 118 | Version: 2012-10-17 119 | Statement: 120 | - Action: 121 | - 'sts:AssumeRole' 122 | Effect: Allow 123 | Principal: 124 | Service: 125 | - omics.amazonaws.com 126 | Path: / 127 | Policies: 128 | - PolicyName: ImportReferenceJobRolePolicy 129 | PolicyDocument: 130 | Statement: 131 | - Effect: Allow 132 | Action: 133 | - 's3:GetObject' 134 | - 's3:GetBucketLocation' 135 | - 's3:ListBucket' 136 | Resource: 137 | - !Sub 'arn:aws:s3:::${OmicsResourcesS3Bucket}' 138 | - !Sub 'arn:aws:s3:::${OmicsResourcesS3Bucket}/*' 139 | - !Sub 'arn:aws:s3:::broad-references' 140 | - !Sub 'arn:aws:s3:::broad-references/*' 141 | OmicsVariantStore: 142 | Type: AWS::Omics::VariantStore 143 | DependsOn: 144 | - OmicsAnnotationStore 145 | Properties: 146 | Description: String 147 | Name: !Sub '${OmicsVariantStoreName}' 148 | Reference: 149 | ReferenceArn: !Sub '${OmicsImportReference.Arn}' 150 | 151 | OmicsImportVariantLambda: 152 | Type: 'AWS::Lambda::Function' 153 | Properties: 154 | Handler: import_variant_lambda.handler 155 | Runtime: python3.9 156 | FunctionName: !Sub '${OmicsResourcePrefix}-import-variant' 157 | Code: 158 | S3Bucket: !Sub '${OmicsResourcesS3Bucket}' 159 | S3Key: !Sub '${OmicsCustomResourceLambdaS3Prefix}import_variant_lambda.zip' 160 | Role: !Sub '${OmicsImportVariantLambdaRole.Arn}' 161 | Timeout: 60 162 | OmicsImportVariantLambdaRole: 163 | Type: 'AWS::IAM::Role' 164 | Properties: 165 | AssumeRolePolicyDocument: 166 | Version: 2012-10-17 167 | Statement: 168 | - Action: 169 | - 'sts:AssumeRole' 170 | Effect: Allow 171 | Principal: 172 | Service: 173 | - lambda.amazonaws.com 174 | Path: / 175 | Policies: 176 | - PolicyName: ImportVariantPolicy 177 | PolicyDocument: 178 | Statement: 179 | - Effect: Allow 180 | Action: 181 | - 'logs:CreateLogGroup' 182 | - 'logs:CreateLogStream' 183 | - 'logs:PutLogEvents' 184 | Resource: 185 | - !Sub >- 186 | arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/* 187 | - Effect: Allow 188 | Action: 189 | - 'omics:*' 190 | Resource: '*' 191 | - Effect: Allow 192 | Action: 193 | - 'iam:GetRole' 194 | - 'iam:PassRole' 195 | Resource: !Sub '${OmicsImportVariantJobRole.Arn}' 196 | OmicsImportVariantJobRole: 197 | Type: 'AWS::IAM::Role' 198 | Properties: 199 | AssumeRolePolicyDocument: 200 | Version: 2012-10-17 201 | Statement: 202 | - Action: 203 | - 'sts:AssumeRole' 204 | Effect: Allow 205 | Principal: 206 | Service: 207 | - omics.amazonaws.com 208 | Path: / 209 | Policies: 210 | - PolicyName: OmicsImportVariantJobRolePolicy 211 | 
PolicyDocument: 212 | Statement: 213 | - Effect: Allow 214 | Action: 215 | - 's3:GetObject' 216 | - 's3:GetBucketLocation' 217 | - 's3:ListBucket' 218 | Resource: 219 | - !Sub 'arn:aws:s3:::${OmicsWorkflowOutputBucketName}' 220 | - !Sub 'arn:aws:s3:::${OmicsWorkflowOutputBucketName}/*' 221 | - Effect: Allow 222 | Action: 223 | - 'omics:GetReference' 224 | - 'omics:GetReferenceMetadata' 225 | Resource: 226 | - !Sub 'arn:aws:omics:${AWS::Region}:${AWS::AccountId}:referenceStore/*' 227 | OmicsAnnotationStore: 228 | Type: AWS::Omics::AnnotationStore 229 | Properties: 230 | Name: !Sub '${OmicsAnnotationStoreName}' 231 | Reference: 232 | ReferenceArn: !Sub '${OmicsImportReference.Arn}' 233 | StoreFormat: !Sub '${AnnotationStoreFormat}' 234 | 235 | OmicsImportAnnotation: 236 | Type: 'Custom::OmicsImportAnnotation' 237 | DependsOn: 238 | - OmicsAnnotationStore 239 | Version: 1 240 | Properties: 241 | ServiceToken: !Sub '${OmicsImportAnnotationLambda.Arn}' 242 | AnnotationStoreName: !Sub '${OmicsAnnotationStoreName}' 243 | OmicsImportAnnotationRoleArn: !Sub '${OmicsImportAnnotationJobRole.Arn}' 244 | AnnotationSourceS3Uri: !Ref ClinvarS3Path 245 | 246 | OmicsImportAnnotationLambda: 247 | Type: 'AWS::Lambda::Function' 248 | Properties: 249 | Handler: import_annotation_lambda.handler 250 | Runtime: python3.9 251 | FunctionName: !Sub '${OmicsResourcePrefix}-import-annotation' 252 | Code: 253 | S3Bucket: !Sub '${OmicsResourcesS3Bucket}' 254 | S3Key: !Sub '${OmicsCustomResourceLambdaS3Prefix}import_annotation_lambda.zip' 255 | Role: !Sub '${OmicsImportAnnotationLambdaRole.Arn}' 256 | Timeout: 60 257 | 258 | OmicsImportAnnotationLambdaRole: 259 | Type: 'AWS::IAM::Role' 260 | Properties: 261 | AssumeRolePolicyDocument: 262 | Version: 2012-10-17 263 | Statement: 264 | - Action: 265 | - 'sts:AssumeRole' 266 | Effect: Allow 267 | Principal: 268 | Service: 269 | - lambda.amazonaws.com 270 | Path: / 271 | Policies: 272 | - PolicyName: ImportAnnotationPolicy 273 | PolicyDocument: 274 | Statement: 275 | - Effect: Allow 276 | Action: 277 | - 'logs:CreateLogGroup' 278 | - 'logs:CreateLogStream' 279 | - 'logs:PutLogEvents' 280 | Resource: 281 | - !Sub >- 282 | arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/* 283 | - Effect: Allow 284 | Action: 285 | - 'omics:*' 286 | Resource: '*' 287 | - Effect: Allow 288 | Action: 289 | - 'lambda:AddPermission' 290 | - 'lambda:RemovePermission' 291 | - 'events:PutRule' 292 | - 'events:DeleteRule' 293 | - 'events:PutTargets' 294 | - 'events:RemoveTargets' 295 | Resource: '*' 296 | - Effect: Allow 297 | Action: 298 | - 'iam:GetRole' 299 | - 'iam:PassRole' 300 | Resource: !Sub '${OmicsImportAnnotationJobRole.Arn}' 301 | OmicsImportAnnotationJobRole: 302 | Type: 'AWS::IAM::Role' 303 | Properties: 304 | AssumeRolePolicyDocument: 305 | Version: 2012-10-17 306 | Statement: 307 | - Action: 308 | - 'sts:AssumeRole' 309 | Effect: Allow 310 | Principal: 311 | Service: 312 | - omics.amazonaws.com 313 | Path: / 314 | Policies: 315 | - PolicyName: ImportAnnotationJobRolePolicy 316 | PolicyDocument: 317 | Statement: 318 | - Effect: Allow 319 | Action: 320 | - 's3:GetObject' 321 | - 's3:GetBucketLocation' 322 | - 's3:ListBucket' 323 | Resource: 324 | - !Sub 'arn:aws:s3:::${OmicsResourcesS3Bucket}' 325 | - !Sub 'arn:aws:s3:::${OmicsResourcesS3Bucket}/*' 326 | - arn:aws:s3:::aws-genomics-datasets 327 | - 'arn:aws:s3:::aws-genomics-datasets/*' 328 | - arn:aws:s3:::aws-genomics-static-us-east-1 329 | - 'arn:aws:s3:::aws-genomics-static-us-east-1/*' 330 | - Effect: Allow 331 | 
Action: 332 | - 'omics:GetReference' 333 | - 'omics:GetReferenceMetadata' 334 | Resource: 335 | - !Sub 'arn:aws:omics:${AWS::Region}:${AWS::AccountId}:referenceStore/*' 336 | 337 | OmicsSequenceStore: 338 | Type: AWS::Omics::SequenceStore 339 | Properties: 340 | Name: !Sub '${OmicsResourcePrefix}-sequence-store' 341 | 342 | OmicsCreateWorkflow: 343 | Type: AWS::Omics::Workflow 344 | Properties: 345 | DefinitionUri: !Sub ${OmicsWorkflowDefinitionZipS3} 346 | Description: !Sub '${OmicsResourcePrefix} test workflow' 347 | Name: !Sub '${OmicsResourcePrefix}-test-workflow' 348 | ParameterTemplate: 349 | sample_name: 350 | Description: "sample name" 351 | fastq_1: 352 | Description: "path to fastq1" 353 | fastq_2: 354 | Description: "path to fastq2" 355 | ref_fasta: 356 | Description: "path to reference fasta" 357 | readgroup_name: 358 | Description: "readgroup name" 359 | library_name: 360 | Description: "library name" 361 | platform_name: 362 | Description: "platform name e.g. Illumina" 363 | run_date: 364 | Description: "sequencing run date" 365 | sequencing_center: 366 | Description: "name of sequencing center" 367 | dbSNP_vcf: 368 | Description: "dbsnp vcf" 369 | Mills_1000G_indels_vcf: 370 | Description: "Mills 1000 genomes gold indels vcf" 371 | known_indels_vcf: 372 | Description: "known indels vcf" 373 | scattered_calling_intervals_archive: 374 | Description: "tar (not gzip) of scatter intervals" 375 | gatk_docker: 376 | Description: "docker uri in private ECR of GATK" 377 | gotc_docker: 378 | Description: "docker uri in private ECR of Genomes in the Cloud" 379 | 380 | OmicsWorkflowStartRunLambda: 381 | Type: 'AWS::Lambda::Function' 382 | Properties: 383 | Handler: start_workflow_lambda.handler 384 | Runtime: python3.9 385 | FunctionName: !Sub '${OmicsResourcePrefix}-start-workflow' 386 | Code: 387 | S3Bucket: !Sub '${OmicsResourcesS3Bucket}' 388 | S3Key: !Sub '${OmicsCustomResourceLambdaS3Prefix}start_workflow_lambda.zip' 389 | Role: !Sub '${OmicsWorkflowStartRunLambdaRole.Arn}' 390 | Timeout: 60 391 | 392 | OmicsWorkflowStartRunLambdaRole: 393 | Type: 'AWS::IAM::Role' 394 | Properties: 395 | AssumeRolePolicyDocument: 396 | Version: 2012-10-17 397 | Statement: 398 | - Action: 399 | - 'sts:AssumeRole' 400 | Effect: Allow 401 | Principal: 402 | Service: 403 | - lambda.amazonaws.com 404 | Path: / 405 | Policies: 406 | - PolicyName: ImportSequencePolicy 407 | PolicyDocument: 408 | Statement: 409 | - Effect: Allow 410 | Action: 411 | - 'logs:CreateLogGroup' 412 | - 'logs:CreateLogStream' 413 | - 'logs:PutLogEvents' 414 | Resource: 415 | - !Sub >- 416 | arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/* 417 | - Effect: Allow 418 | Action: 419 | - 'omics:*' 420 | Resource: '*' 421 | - Effect: Allow 422 | Action: 423 | - 'iam:GetRole' 424 | - 'iam:PassRole' 425 | Resource: !Sub '${OmicsWorkflowStartRunJobRole.Arn}' 426 | OmicsWorkflowStartRunJobRole: 427 | Type: 'AWS::IAM::Role' 428 | Properties: 429 | AssumeRolePolicyDocument: 430 | Version: 2012-10-17 431 | Statement: 432 | - Action: 433 | - 'sts:AssumeRole' 434 | Effect: Allow 435 | Principal: 436 | Service: 437 | - omics.amazonaws.com 438 | Path: / 439 | Policies: 440 | - PolicyName: WorkflowStartRunJobRolePolicy 441 | PolicyDocument: 442 | Statement: 443 | - Effect: Allow 444 | Action: 445 | - 's3:GetObject' 446 | - 's3:GetBucketLocation' 447 | - 's3:ListBucket' 448 | Resource: 449 | - !Sub 'arn:aws:s3:::${OmicsWorkflowInputBucketName}' 450 | - !Sub 'arn:aws:s3:::${OmicsWorkflowInputBucketName}/*' 451 | - 
arn:aws:s3:::broad-references 452 | - 'arn:aws:s3:::broad-references/*' 453 | - arn:aws:s3:::gatk-test-data 454 | - 'arn:aws:s3:::gatk-test-data/*' 455 | - arn:aws:s3:::aws-genomics-datasets 456 | - 'arn:aws:s3:::aws-genomics-datasets/*' 457 | - arn:aws:s3:::aws-genomics-static-us-east-1 458 | - 'arn:aws:s3:::aws-genomics-static-us-east-1/*' 459 | - !Sub 'arn:aws:s3:::${OmicsResourcesS3Bucket}' 460 | - !Sub 'arn:aws:s3:::${OmicsResourcesS3Bucket}/*' 461 | - Effect: Allow 462 | Action: 463 | - 's3:GetObject' 464 | - 's3:GetBucketLocation' 465 | - 's3:ListBucket' 466 | - 's3:PutObject' 467 | Resource: 468 | - !Sub 'arn:aws:s3:::${OmicsWorkflowOutputBucketName}' 469 | - !Sub 'arn:aws:s3:::${OmicsWorkflowOutputBucketName}/*' 470 | - Effect: Allow 471 | Action: 472 | - ecr:GetAuthorizationToken 473 | - ecr:BatchCheckLayerAvailability 474 | - ecr:GetDownloadUrlForLayer 475 | - ecr:GetRepositoryPolicy 476 | - ecr:ListImages 477 | - ecr:DescribeImages 478 | - ecr:BatchGetImage 479 | Resource: "*" 480 | - Effect: Allow 481 | Action: 482 | - 'omics:*' 483 | Resource: "*" 484 | - Effect: Allow 485 | Action: 486 | - 'logs:CreateLogGroup' 487 | Resource: 488 | - !Sub 'arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/omics/WorkflowLog:*' 489 | - Effect: Allow 490 | Action: 491 | - 'logs:DescribeLogStreams' 492 | - 'logs:CreateLogStream' 493 | - 'logs:PutLogEvents' 494 | Resource: 495 | - !Sub 'arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/omics/WorkflowLog:log-stream:*' 496 | 497 | OmicsImportSequenceLambda: 498 | Type: 'AWS::Lambda::Function' 499 | Properties: 500 | Handler: import_sequence_lambda.handler 501 | Runtime: python3.9 502 | FunctionName: !Sub '${OmicsResourcePrefix}-import-sequence' 503 | Code: 504 | S3Bucket: !Sub '${OmicsResourcesS3Bucket}' 505 | S3Key: !Sub '${OmicsCustomResourceLambdaS3Prefix}import_sequence_lambda.zip' 506 | Role: !Sub '${OmicsImportSequenceLambdaRole.Arn}' 507 | Timeout: 60 508 | 509 | OmicsImportSequenceLambdaRole: 510 | Type: 'AWS::IAM::Role' 511 | Properties: 512 | AssumeRolePolicyDocument: 513 | Version: 2012-10-17 514 | Statement: 515 | - Action: 516 | - 'sts:AssumeRole' 517 | Effect: Allow 518 | Principal: 519 | Service: 520 | - lambda.amazonaws.com 521 | Path: / 522 | Policies: 523 | - PolicyName: ImportSequencePolicy 524 | PolicyDocument: 525 | Statement: 526 | - Effect: Allow 527 | Action: 528 | - 'logs:CreateLogGroup' 529 | - 'logs:CreateLogStream' 530 | - 'logs:PutLogEvents' 531 | Resource: 532 | - !Sub >- 533 | arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/* 534 | - Effect: Allow 535 | Action: 536 | - 'omics:*' 537 | Resource: '*' 538 | - Effect: Allow 539 | Action: 540 | - 'iam:GetRole' 541 | - 'iam:PassRole' 542 | Resource: !Sub ${OmicsImportSequenceJobRole.Arn} 543 | OmicsImportSequenceJobRole: 544 | Type: 'AWS::IAM::Role' 545 | Properties: 546 | AssumeRolePolicyDocument: 547 | Version: 2012-10-17 548 | Statement: 549 | - Action: 550 | - 'sts:AssumeRole' 551 | Effect: Allow 552 | Principal: 553 | Service: 554 | - omics.amazonaws.com 555 | Path: / 556 | Policies: 557 | - PolicyName: ImportSequenceJobRolePolicy 558 | PolicyDocument: 559 | Statement: 560 | - Effect: Allow 561 | Action: 562 | - 's3:GetObject' 563 | - 's3:GetBucketLocation' 564 | - 's3:ListBucket' 565 | Resource: 566 | - !Sub 'arn:aws:s3:::${OmicsWorkflowInputBucketName}' 567 | - !Sub 'arn:aws:s3:::${OmicsWorkflowInputBucketName}/*' 568 | - !Sub 'arn:aws:s3:::${OmicsWorkflowOutputBucketName}' 569 | - !Sub 
'arn:aws:s3:::${OmicsWorkflowOutputBucketName}/*' 570 | Outputs: 571 | OmicsImportSequenceLambdaArn: 572 | Value: !Sub ${OmicsImportSequenceLambda.Arn} 573 | OmicsImportSequenceJobRoleArn: 574 | Value: !Sub ${OmicsImportSequenceJobRole.Arn} 575 | OmicsWorkflowStartRunLambdaArn: 576 | Value: !Sub ${OmicsWorkflowStartRunLambda.Arn} 577 | OmicsWorkflowStartRunJobRoleArn: 578 | Value: !Sub ${OmicsWorkflowStartRunJobRole.Arn} 579 | OmicsImportVariantLambdaArn: 580 | Value: !Sub ${OmicsImportVariantLambda.Arn} 581 | OmicsImportVariantJobRoleArn: 582 | Value: !Sub ${OmicsImportVariantJobRole.Arn} 583 | OmicsSequenceStoreId: 584 | Value: !GetAtt OmicsSequenceStore.SequenceStoreId 585 | OmicsReferenceArn: 586 | Value: !GetAtt OmicsImportReference.Arn 587 | OmicsWorkflowId: 588 | Value: !GetAtt OmicsCreateWorkflow.Id -------------------------------------------------------------------------------- /src/cfn_templates/s3-stack.yml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: 2010-09-09 2 | Description: >- 3 | S3 buckets for Genomics inputs and outputs 4 | with lifecycle configuration for cost savings 5 | Parameters: 6 | DataInputBucketName: 7 | Type: String 8 | DataOutputBucketName: 9 | Type: String 10 | Resources: 11 | InputBucket: 12 | Type: 'AWS::S3::Bucket' 13 | Properties: 14 | BucketName: !Ref DataInputBucketName 15 | LifecycleConfiguration: 16 | Rules: 17 | - Id: DeleteRule 18 | Status: Enabled 19 | ExpirationInDays: 30 20 | TagFilters: 21 | - Key: OmicsTiering 22 | Value: RemoveIn30 23 | BucketEncryption: 24 | ServerSideEncryptionConfiguration: 25 | - BucketKeyEnabled: true 26 | ServerSideEncryptionByDefault: 27 | SSEAlgorithm: AES256 28 | InputBucketPolicy: 29 | Type: AWS::S3::BucketPolicy 30 | Properties: 31 | Bucket: !Ref InputBucket 32 | PolicyDocument: 33 | Version: 2012-10-17 34 | Statement: 35 | - Action: "s3:*" 36 | Effect: Deny 37 | Resource: 38 | - !Sub arn:aws:s3:::${DataInputBucketName} 39 | - !Sub arn:aws:s3:::${DataInputBucketName}/* 40 | Principal: '*' 41 | Condition: 42 | Bool: 43 | "aws:SecureTransport": "false" 44 | 45 | OutputBucket: 46 | Type: 'AWS::S3::Bucket' 47 | Properties: 48 | BucketName: !Ref DataOutputBucketName 49 | LifecycleConfiguration: 50 | Rules: 51 | - Id: IntelligentTier 52 | Status: Enabled 53 | Transitions: 54 | - TransitionInDays: 1 55 | StorageClass: INTELLIGENT_TIERING 56 | TagFilters: 57 | - Key: OmicsTiering 58 | Value: IntelligentTierAfter30 59 | BucketEncryption: 60 | ServerSideEncryptionConfiguration: 61 | - BucketKeyEnabled: true 62 | ServerSideEncryptionByDefault: 63 | SSEAlgorithm: AES256 64 | 65 | OutputBucketPolicy: 66 | Type: AWS::S3::BucketPolicy 67 | Properties: 68 | Bucket: !Ref OutputBucket 69 | PolicyDocument: 70 | Version: 2012-10-17 71 | Statement: 72 | - Action: "s3:*" 73 | Effect: Deny 74 | Resource: 75 | - !Sub arn:aws:s3:::${DataOutputBucketName} 76 | - !Sub arn:aws:s3:::${DataOutputBucketName}/* 77 | Principal: '*' 78 | Condition: 79 | Bool: 80 | "aws:SecureTransport": "false" -------------------------------------------------------------------------------- /src/cfn_templates/sfn-task-checker-stack.yml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: 2010-09-09 2 | Description: >- 3 | Lambda to check completion of various Omics API calls 4 | such as Sequence data import and Omics workflow completion 5 | Parameters: 6 | OmicsOutputBucket: 7 | Type: String 8 | Description: S3 bucket that Amazon Omics 
workflow will write outputs to 9 | LambdaBucketName: 10 | Type: String 11 | Description: S3 bucket where lambda code artifacts are stored 12 | LambdaArtifactPrefix: 13 | Type: String 14 | Description: Prefix in bucket where lambda artifacts are stored 15 | 16 | Resources: 17 | CheckOmicsTaskLambdaFunction: 18 | Type: 'AWS::Lambda::Function' 19 | Properties: 20 | FunctionName: CheckOmicsTask 21 | Code: 22 | S3Bucket: !Ref LambdaBucketName 23 | S3Key: !Sub '${LambdaArtifactPrefix}lambda_check_omics_workflow_task.zip' 24 | Handler: lambda_check_omics_workflow_task.lambda_handler 25 | Role: !GetAtt LambdaIAMRole.Arn 26 | Runtime: python3.9 27 | Timeout: 20 28 | 29 | LambdaIAMRole: 30 | Type: 'AWS::IAM::Role' 31 | Properties: 32 | RoleName: CheckOmicsLambdaFnRole 33 | AssumeRolePolicyDocument: 34 | Version: 2012-10-17 35 | Statement: 36 | - Effect: Allow 37 | Principal: 38 | Service: 39 | - lambda.amazonaws.com 40 | Action: 41 | - 'sts:AssumeRole' 42 | Policies: 43 | - PolicyName: AmazonOmicsPolicy 44 | PolicyDocument: 45 | Version: 2012-10-17 46 | Statement: 47 | - Effect: Allow 48 | Action: 49 | - 'omics:GetReadSetImportJob' 50 | - 'omics:GetRun' 51 | - 'omics:GetVariantImportJob' 52 | Resource: "*" 53 | - PolicyName: LambdaLogs 54 | PolicyDocument: 55 | Version: 2012-10-17 56 | Statement: 57 | - Effect: Allow 58 | Action: 59 | - 'logs:CreateLogGroup' 60 | - 'logs:CreateLogStream' 61 | - 'logs:PutLogEvents' 62 | Resource: 'arn:aws:logs:*:*:*' 63 | Outputs: 64 | CheckOmicsTaskLambdaFunctionArn: 65 | Value: !GetAtt CheckOmicsTaskLambdaFunction.Arn -------------------------------------------------------------------------------- /src/cfn_templates/sfn-trigger-stack.yml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: 2010-09-09 2 | Description: >- 3 | Lambda to evaluate complete set of inputs and 4 | invoke Step Functions State machine to process Genomics Data. 5 | S3 event notification to lambda integration for FASTQ input bucket. 
6 | Parameters: 7 | FastqInputBucket: 8 | Type: String 9 | Description: S3 bucket that's used for the Lambda event notification (FASTQs) 10 | GenomicsStepFunctionArn: 11 | Type: String 12 | Description: ARN of the Step Function State machine that processes Genomics input data 13 | LambdaBucketName: 14 | Type: String 15 | Description: S3 bucket where lambda code artifacts are stored 16 | LambdaArtifactPrefix: 17 | Type: String 18 | Description: Prefix in bucket where lambda artifacts are stored 19 | SequenceStoreId: 20 | Type: String 21 | ReferenceArn: 22 | Type: String 23 | WorkflowId: 24 | Type: String 25 | WorkflowOutputS3Path: 26 | Type: String 27 | GatkDockerUri: 28 | Type: String 29 | GotcDockerUri: 30 | Type: String 31 | IntervalS3Path: 32 | Type: String 33 | NotificationAppliedToS3Prefix: 34 | Type: String 35 | Default: inputs/ 36 | Resources: 37 | InvokeGenomicsStepFunctionLambda: 38 | Type: 'AWS::Lambda::Function' 39 | Properties: 40 | FunctionName: lambda-invoke-genomics-sfn-wf 41 | Code: 42 | S3Bucket: !Ref LambdaBucketName 43 | S3Key: !Sub "${LambdaArtifactPrefix}lambda_launch_genomics_sfn.zip" 44 | Handler: lambda_launch_genomics_sfn.lambda_handler 45 | Role: !GetAtt LambdaIAMRole.Arn 46 | Runtime: python3.9 47 | Timeout: 20 48 | Environment: 49 | Variables: 50 | NUM_FASTQS_PER_SAMPLE: 2 51 | GENOMICS_STEP_FUNCTION_ARN: !Ref GenomicsStepFunctionArn 52 | SEQUENCE_STORE_ID: !Ref SequenceStoreId 53 | REFERENCE_ARN: !Ref ReferenceArn 54 | WORKFLOW_ID: !Ref WorkflowId 55 | WORKFLOW_OUTPUT_S3_PATH: !Ref WorkflowOutputS3Path 56 | GATK_DOCKER_URI: !Ref GatkDockerUri 57 | GOTC_DOCKER_URI: !Ref GotcDockerUri 58 | INTERVAL_S3_PATH: !Ref IntervalS3Path 59 | 60 | LambdaIAMRole: 61 | Type: 'AWS::IAM::Role' 62 | Properties: 63 | AssumeRolePolicyDocument: 64 | Version: 2012-10-17 65 | Statement: 66 | - Effect: Allow 67 | Principal: 68 | Service: 69 | - lambda.amazonaws.com 70 | Action: 71 | - 'sts:AssumeRole' 72 | Policies: 73 | - PolicyName: Policy1 74 | PolicyDocument: 75 | Version: 2012-10-17 76 | Statement: 77 | - Effect: Allow 78 | Action: 79 | - 's3:GetBucketNotification' 80 | - 's3:PutBucketNotification' 81 | - 's3:GetObject' 82 | - 's3:ListBucket' 83 | Resource: 84 | - !Sub 'arn:aws:s3:::${FastqInputBucket}' 85 | - !Sub 'arn:aws:s3:::${FastqInputBucket}/*' 86 | - Effect: Allow 87 | Action: 88 | - 'logs:CreateLogGroup' 89 | - 'logs:CreateLogStream' 90 | - 'logs:PutLogEvents' 91 | Resource: 'arn:aws:logs:*:*:*' 92 | - Effect: Allow 93 | Action: 94 | - 'states:StartExecution' 95 | - 'states:StartSyncExecution' 96 | - 'states:ListExecutions' 97 | - 'states:ListStateMachines' 98 | Resource: !Sub ${GenomicsStepFunctionArn} 99 | 100 | PutBucketNotificationTrigger: 101 | Type: 'Custom::PutBucketNotificationTrigger' 102 | DependsOn: 103 | - PutBucketNotificationTriggerLambda 104 | Version: 1 105 | Properties: 106 | ServiceToken: !Sub '${PutBucketNotificationTriggerLambda.Arn}' 107 | BucketName: !Ref FastqInputBucket 108 | Prefix: !Ref NotificationAppliedToS3Prefix 109 | LambdaFunctionArn: !GetAtt InvokeGenomicsStepFunctionLambda.Arn 110 | 111 | PutBucketNotificationTriggerLambda: 112 | Type: 'AWS::Lambda::Function' 113 | DependsOn: 114 | - PutBucketNotificationTriggerLambdaRole 115 | Properties: 116 | Handler: add_bucket_notification_lambda.handler 117 | Runtime: python3.9 118 | FunctionName: !Sub 'lambda-put-bucket-notification' 119 | Code: 120 | S3Bucket: !Ref LambdaBucketName 121 | S3Key: !Sub "${LambdaArtifactPrefix}add_bucket_notification_lambda.zip" 122 | Role: !Sub 
'${PutBucketNotificationTriggerLambdaRole.Arn}' 123 | Timeout: 60 124 | 125 | PutBucketNotificationTriggerLambdaRole: 126 | Type: 'AWS::IAM::Role' 127 | Properties: 128 | AssumeRolePolicyDocument: 129 | Version: 2012-10-17 130 | Statement: 131 | - Action: 132 | - 'sts:AssumeRole' 133 | Effect: Allow 134 | Principal: 135 | Service: 136 | - lambda.amazonaws.com 137 | Path: / 138 | Policies: 139 | - PolicyName: PutBucketNotificationPolicy 140 | PolicyDocument: 141 | Statement: 142 | - Effect: Allow 143 | Action: 144 | - 'logs:CreateLogGroup' 145 | - 'logs:CreateLogStream' 146 | - 'logs:PutLogEvents' 147 | Resource: 148 | - !Sub >- 149 | arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/* 150 | - Effect: Allow 151 | Action: 152 | - 's3:PutBucketNotification' 153 | Resource: !Sub 'arn:aws:s3:::${FastqInputBucket}' 154 | 155 | AllowInputBucketToInvokeLambda: 156 | Type: 'AWS::Lambda::Permission' 157 | DependsOn: 158 | - InvokeGenomicsStepFunctionLambda 159 | Properties: 160 | FunctionName: !GetAtt InvokeGenomicsStepFunctionLambda.Arn 161 | Action: lambda:InvokeFunction 162 | Principal: s3.amazonaws.com 163 | SourceAccount: !Ref 'AWS::AccountId' 164 | SourceArn: !Sub arn:aws:s3:::${FastqInputBucket} 165 | 166 | 167 | 168 | 169 | -------------------------------------------------------------------------------- /src/cfn_templates/solution-cfn.yml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: 2010-09-09 2 | Description: Main stack that nests other stacks required by the solution 3 | Parameters: 4 | ArtifactBucketName: 5 | Type: String 6 | Description: Choose an existing bucket in your account for deployment artifacts 7 | LambdaArtifactsS3Prefix: 8 | Type: String 9 | Description: 'trailing backslash required - Folder name used by the upload script - keep in sync' 10 | Default: lambdas/ 11 | CodeBuildArtifactsS3Prefix: 12 | Type: String 13 | Description: Folder name used by the upload script - keep in sync 14 | Default: buildspecs 15 | CfnTemplatesS3Prefix: 16 | Type: String 17 | Description: Folder name used by the upload script - keep in sync 18 | Default: templates 19 | WorkflowArtifactsS3Prefix: 20 | Type: String 21 | Description: Folder name used by the upload script - keep in sync 22 | Default: workflows 23 | WorkflowInputsBucketName: 24 | Type: String 25 | Description: New bucket created for users to upload inputs. Make it unique by adding accountId and region in the name 26 | WorkflowOutputsBucketName: 27 | Type: String 28 | Description: New bucket created for workflows to write outputs. 
Make it unique by adding accountId and region in the name 29 | ReferenceFastaName: 30 | Type: String 31 | Default: GRCh38 32 | ReferenceFastaS3Uri: 33 | Type: String 34 | Default: s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta 35 | WorkflowIntervalS3Path: 36 | Type: String 37 | Default: s3://aws-genomics-static-us-east-1/omics-e2e/intervals.tar 38 | ClinVarVcfS3Path: 39 | Type: String 40 | Default: s3://aws-genomics-static-us-east-1/omics-e2e/clinvar.vcf.gz 41 | DnSnpVcfS3Uri: 42 | Type: String 43 | Default: s3://broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf 44 | Mills1000GIndelsVcfS3Uri: 45 | Type: String 46 | Default: s3://broad-references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz 47 | KnownIndelsVcfS3Uri: 48 | Type: String 49 | Default: s3://broad-references/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz 50 | WorkflowDefinitionFilename: 51 | Type: String 52 | Description: File name used by the upload script - keep in sync 53 | Default: gatkbestpractices.wdl.zip 54 | CurrentReferenceStoreId: 55 | Type: String 56 | Default: 'NONE' 57 | Description: 'Provide Reference Store ID if exists in current account-region, else leave it NONE' 58 | VariantStoreName: 59 | Type: String 60 | Default: omicsvariantstore 61 | Description: Name of the Omics Variant store 62 | AnnotationStoreName: 63 | Type: String 64 | Default: omicsannotationstore 65 | Description: Name of the Omics Annotation store 66 | 67 | Resources: 68 | CodeBuildStack: 69 | Type: AWS::CloudFormation::Stack 70 | Properties: 71 | TemplateURL: !Sub https://${ArtifactBucketName}.s3.amazonaws.com/${CfnTemplatesS3Prefix}/code-build-stack.yml 72 | TimeoutInMinutes: 20 73 | Parameters: 74 | ResourcesS3Bucket: !Ref ArtifactBucketName 75 | LambdasS3Prefix: !Ref LambdaArtifactsS3Prefix 76 | BuildSpecS3Prefix: !Ref CodeBuildArtifactsS3Prefix 77 | DockerGenomesInTheCloud: public.ecr.aws/aws-genomics/broadinstitute/genomes-in-the-cloud:2.4.7-1603303710 78 | DockerGatk: public.ecr.aws/aws-genomics/broadinstitute/gatk:4.1.9.0 79 | 80 | S3ResourcesStack: 81 | Type: AWS::CloudFormation::Stack 82 | Properties: 83 | TemplateURL: !Sub https://${ArtifactBucketName}.s3.amazonaws.com/${CfnTemplatesS3Prefix}/s3-stack.yml 84 | TimeoutInMinutes: 5 85 | Parameters: 86 | DataInputBucketName: !Ref WorkflowInputsBucketName 87 | DataOutputBucketName: !Ref WorkflowOutputsBucketName 88 | 89 | OmicsResourcesStack: 90 | Type: AWS::CloudFormation::Stack 91 | DependsOn: 92 | - S3ResourcesStack 93 | - CodeBuildStack 94 | Properties: 95 | TemplateURL: !Sub https://${ArtifactBucketName}.s3.amazonaws.com/${CfnTemplatesS3Prefix}/omics-resources-stack.yml 96 | TimeoutInMinutes: 60 97 | Parameters: 98 | OmicsResourcesS3Bucket: !Ref ArtifactBucketName 99 | OmicsCustomResourceLambdaS3Prefix: !Ref LambdaArtifactsS3Prefix 100 | OmicsWorkflowInputBucketName: !Ref WorkflowInputsBucketName 101 | OmicsWorkflowOutputBucketName: !Ref WorkflowOutputsBucketName 102 | ExistingReferenceStoreId: !Ref CurrentReferenceStoreId 103 | OmicsReferenceFastaUri: !Ref ReferenceFastaS3Uri 104 | OmicsReferenceName: !Ref ReferenceFastaName 105 | OmicsWorkflowDefinitionZipS3: !Sub "s3://${ArtifactBucketName}/${WorkflowArtifactsS3Prefix}/${WorkflowDefinitionFilename}" 106 | ClinvarS3Path: !Ref ClinVarVcfS3Path 107 | OmicsVariantStoreName: !Ref VariantStoreName 108 | OmicsAnnotationStoreName: !Ref AnnotationStoreName 109 | 110 | ApplyS3LifecycleStack: 111 | Type: AWS::CloudFormation::Stack 112 | DependsOn: 113 | - S3ResourcesStack 114 | - CodeBuildStack 115 | 
Properties: 116 | TemplateURL: !Sub https://${ArtifactBucketName}.s3.amazonaws.com/${CfnTemplatesS3Prefix}/apply-s3-lifecycle-stack.yml 117 | TimeoutInMinutes: 10 118 | Parameters: 119 | LambdaBucketName: !Ref ArtifactBucketName 120 | LambdaArtifactPrefix: !Ref LambdaArtifactsS3Prefix 121 | InputsBucketName: !Ref WorkflowInputsBucketName 122 | OutputsBucketName: !Ref WorkflowOutputsBucketName 123 | SfnTaskCheckerStack: 124 | Type: AWS::CloudFormation::Stack 125 | DependsOn: 126 | - OmicsResourcesStack 127 | - CodeBuildStack 128 | Properties: 129 | TemplateURL: !Sub https://${ArtifactBucketName}.s3.amazonaws.com/${CfnTemplatesS3Prefix}/sfn-task-checker-stack.yml 130 | TimeoutInMinutes: 10 131 | Parameters: 132 | OmicsOutputBucket: !Ref WorkflowOutputsBucketName 133 | LambdaBucketName: !Ref ArtifactBucketName 134 | LambdaArtifactPrefix: !Ref LambdaArtifactsS3Prefix 135 | 136 | StepFunctionStack: 137 | Type: AWS::CloudFormation::Stack 138 | DependsOn: 139 | - ApplyS3LifecycleStack 140 | - SfnTaskCheckerStack 141 | Properties: 142 | TemplateURL: !Sub https://${ArtifactBucketName}.s3.amazonaws.com/${CfnTemplatesS3Prefix}/e2e-sfn-stack.yml 143 | TimeoutInMinutes: 60 144 | Parameters: 145 | ReferenceFastaFileS3Uri: !Ref ReferenceFastaS3Uri 146 | OmicsVariantStoreName: !Ref VariantStoreName 147 | DbSnpVcf: !Ref DnSnpVcfS3Uri 148 | Mills1000GIndelsVcf: !Ref Mills1000GIndelsVcfS3Uri 149 | KnownIndelsVcf: !Ref KnownIndelsVcfS3Uri 150 | OmicsImportSequenceLambdaArn: 151 | Fn::GetAtt: 152 | - OmicsResourcesStack 153 | - Outputs.OmicsImportSequenceLambdaArn 154 | OmicsImportSequenceJobRoleArn: 155 | Fn::GetAtt: 156 | - OmicsResourcesStack 157 | - Outputs.OmicsImportSequenceJobRoleArn 158 | CheckOmicsTaskLambdaFunctionArn: 159 | Fn::GetAtt: 160 | - SfnTaskCheckerStack 161 | - Outputs.CheckOmicsTaskLambdaFunctionArn 162 | OmicsWorkflowStartRunLambdaArn: 163 | Fn::GetAtt: 164 | - OmicsResourcesStack 165 | - Outputs.OmicsWorkflowStartRunLambdaArn 166 | OmicsWorkflowStartRunJobRoleArn: 167 | Fn::GetAtt: 168 | - OmicsResourcesStack 169 | - Outputs.OmicsWorkflowStartRunJobRoleArn 170 | OmicsImportVariantLambdaArn: 171 | Fn::GetAtt: 172 | - OmicsResourcesStack 173 | - Outputs.OmicsImportVariantLambdaArn 174 | OmicsImportVariantJobRoleArn: 175 | Fn::GetAtt: 176 | - OmicsResourcesStack 177 | - Outputs.OmicsImportVariantJobRoleArn 178 | ApplyS3LifecycleLambdaFunctionArn: 179 | Fn::GetAtt: 180 | - ApplyS3LifecycleStack 181 | - Outputs.ApplyS3LifecycleLambdaFunctionArn 182 | SfnTriggerStack: 183 | Type: AWS::CloudFormation::Stack 184 | DependsOn: 185 | - StepFunctionStack 186 | Properties: 187 | TemplateURL: !Sub https://${ArtifactBucketName}.s3.amazonaws.com/${CfnTemplatesS3Prefix}/sfn-trigger-stack.yml 188 | TimeoutInMinutes: 5 189 | Parameters: 190 | FastqInputBucket: !Ref WorkflowInputsBucketName 191 | GenomicsStepFunctionArn: 192 | Fn::GetAtt: 193 | - StepFunctionStack 194 | - Outputs.AmazonOmicsStepFunctionArn 195 | LambdaBucketName: !Ref ArtifactBucketName 196 | LambdaArtifactPrefix: !Ref LambdaArtifactsS3Prefix 197 | SequenceStoreId: 198 | Fn::GetAtt: 199 | - OmicsResourcesStack 200 | - Outputs.OmicsSequenceStoreId 201 | ReferenceArn: 202 | Fn::GetAtt: 203 | - OmicsResourcesStack 204 | - Outputs.OmicsReferenceArn 205 | WorkflowId: 206 | Fn::GetAtt: 207 | - OmicsResourcesStack 208 | - Outputs.OmicsWorkflowId 209 | WorkflowOutputS3Path: !Sub "s3://${WorkflowOutputsBucketName}/outputs" 210 | GatkDockerUri: 211 | Fn::GetAtt: 212 | - CodeBuildStack 213 | - Outputs.EcrImageUriGatk 214 | GotcDockerUri: 215 | 
Fn::GetAtt: 216 | - CodeBuildStack 217 | - Outputs.EcrImageUriGotc 218 | IntervalS3Path: !Ref WorkflowIntervalS3Path 219 | -------------------------------------------------------------------------------- /src/codebuild/buildspec_docker.yml: -------------------------------------------------------------------------------- 1 | version: 0.2 2 | phases: 3 | install: 4 | runtime-versions: 5 | docker: 18 6 | pre_build: 7 | commands: 8 | - echo Logging in to Amazon ECR... 9 | - aws --version 10 | - $(aws ecr get-login --region $AWS_DEFAULT_REGION --no-include-email) 11 | - REPOSITORY_URI=$ECR_REPO 12 | - IMAGE_TAG=$ECR_REPO_VERSION 13 | build: 14 | commands: 15 | - echo Build started on `date` 16 | - echo Pull public docker image 17 | - docker pull $SOURCE_REPO 18 | - docker tag $SOURCE_REPO $REPOSITORY_URI:$IMAGE_TAG 19 | post_build: 20 | commands: 21 | - echo Pull completed on `date` 22 | - echo Pushing the Docker image... 23 | - docker push $REPOSITORY_URI:$IMAGE_TAG 24 | - echo Push complete on `date` -------------------------------------------------------------------------------- /src/codebuild/buildspec_lambdas.yml: -------------------------------------------------------------------------------- 1 | version: 0.2 2 | env: 3 | shell: bash 4 | phases: 5 | install: 6 | runtime-versions: 7 | python: 3.9 8 | build: 9 | commands: 10 | - | 11 | #!/bin/bash 12 | lambda_s3_dirname=${RESOURCES_PREFIX} 13 | artifact_s3_dirname=${RESOURCES_PREFIX} 14 | 15 | # Declare all lambda functions with package needs (crhelper needed since these lambdas help with resource creation) 16 | declare -a LambdaNamesWithCrHelper=("import_reference_lambda" "import_annotation_lambda" "add_bucket_notification_lambda") 17 | 18 | # iterate over each lambda 19 | for lambda in ${LambdaNamesWithCrHelper[@]}; do 20 | 21 | COUNT=$(aws s3 ls "s3://${RESOURCES_BUCKET}/${lambda_s3_dirname}${lambda}.py" | wc -l) 22 | if [ $COUNT = 0 ]; then 23 | echo "skipping Build, ${lambda}.py not found in s3://${RESOURCES_BUCKET}/${lambda_s3_dirname}" 24 | else 25 | echo "Building lambda zip for: ${lambda} " 26 | mkdir tmp_${lambda} 27 | cd tmp_${lambda} 28 | echo "Download lambda py for: ${lambda} " 29 | aws s3 cp s3://${RESOURCES_BUCKET}/${lambda_s3_dirname}${lambda}.py . 30 | echo "Installing pip packages" 31 | pip install crhelper boto3==1.26.65 -t ./package 32 | cd ./package 33 | zip -r ../${lambda}.zip * 34 | cd .. 35 | echo "Zip lambda to artifact" 36 | zip -g ${lambda}.zip ${lambda}.py 37 | echo "Upload zip to s3://${RESOURCES_BUCKET}/${artifact_s3_dirname}" 38 | aws s3 cp ${lambda}.zip s3://${RESOURCES_BUCKET}/${artifact_s3_dirname} 39 | cd .. 40 | rm -rf tmp_${lambda} 41 | echo "Done with ${lambda}" 42 | fi 43 | done 44 | 45 | # Declare all lambda functions with package needs 46 | declare -a LambdaNamesJsonSchema=("apply_s3_lifecycle_lambda" "lambda_check_omics_workflow_task" "import_sequence_lambda" "import_variant_lambda" "lambda_launch_genomics_sfn" "start_workflow_lambda") 47 | 48 | # iterate over each lambda 49 | for lambda in ${LambdaNamesJsonSchema[@]}; do 50 | 51 | COUNT=$(aws s3 ls "s3://${RESOURCES_BUCKET}/${lambda_s3_dirname}${lambda}.py" | wc -l) 52 | if [ $COUNT = 0 ]; then 53 | echo "skipping Build, ${lambda}.py not found in s3://${RESOURCES_BUCKET}/${lambda_s3_dirname}" 54 | else 55 | echo "Building lambda zip for: ${lambda} " 56 | mkdir tmp_${lambda} 57 | cd tmp_${lambda} 58 | echo "Download lambda py for: ${lambda} " 59 | aws s3 cp s3://${RESOURCES_BUCKET}/${lambda_s3_dirname}${lambda}.py . 
60 | echo "Installing pip packages" 61 | pip install jsonschema boto3==1.26.65 -t ./package 62 | cd ./package 63 | zip -r ../${lambda}.zip * 64 | cd .. 65 | echo "Zip lambda to artifact" 66 | zip -g ${lambda}.zip ${lambda}.py 67 | echo "Upload zip to s3://${RESOURCES_BUCKET}/${artifact_s3_dirname}" 68 | aws s3 cp ${lambda}.zip s3://${RESOURCES_BUCKET}/${artifact_s3_dirname} 69 | cd .. 70 | rm -rf tmp_${lambda} 71 | echo "Done with ${lambda}" 72 | fi 73 | done 74 | -------------------------------------------------------------------------------- /src/glue/etl.py: -------------------------------------------------------------------------------- 1 | import sys 2 | from awsglue.transforms import * 3 | from awsglue.utils import getResolvedOptions 4 | from pyspark.context import SparkContext 5 | from awsglue.context import GlueContext 6 | from awsglue.job import Job 7 | from awsglue.dynamicframe import DynamicFrameCollection 8 | from awsglue.dynamicframe import DynamicFrame 9 | 10 | # Script generated for node Custom Transform 11 | def MyTransform(glueContext, dfc) -> DynamicFrameCollection: 12 | from pyspark.sql.functions import coalesce 13 | from awsglue.dynamicframe import DynamicFrame 14 | 15 | df0 = dfc.select(list(dfc.keys())[0]).toDF() 16 | 17 | df0 = df0.withColumn("patient_id", coalesce(df0.dg_patient_id, df0.patient_id))  # withColumn returns a new DataFrame, so assign it back for the coalesce to take effect 18 | 19 | df0 = df0.withColumn("patient_id", coalesce(df0.rx_patient_id, df0.patient_id)) 20 | 21 | df0 = df0.withColumn("patient_id", coalesce(df0.pr_patient_id, df0.patient_id)) 22 | 23 | dyf = DynamicFrame.fromDF(df0, glueContext, "results") 24 | return DynamicFrameCollection({"CustomTransform0": dyf}, glueContext) 25 | 26 | 27 | args = getResolvedOptions(sys.argv, ["JOB_NAME"]) 28 | sc = SparkContext() 29 | glueContext = GlueContext(sc) 30 | spark = glueContext.spark_session 31 | job = Job(glueContext) 32 | job.init(args["JOB_NAME"], args) 33 | 34 | # Script generated for node Novation_Rx 35 | Novation_Rx_node1665891226598 = glueContext.create_dynamic_frame.from_catalog( 36 | database="phenotypicdb", 37 | table_name="ovation_rx_csv", 38 | transformation_ctx="Novation_Rx_node1665891226598", 39 | ) 40 | 41 | # Script generated for node DiagnosisDF 42 | DiagnosisDF_node1665691724279 = glueContext.create_dynamic_frame.from_catalog( 43 | database="phenotypicdb", 44 | table_name="ovation_diagnosis_csv", 45 | transformation_ctx="DiagnosisDF_node1665691724279", 46 | ) 47 | 48 | # Script generated for node ClinicoGenomicsDF 49 | ClinicoGenomicsDF_node1665689379027 = glueContext.create_dynamic_frame.from_catalog( 50 | database="phenotypicdb", 51 | table_name="ovation_clinicogenomics_csv", 52 | transformation_ctx="ClinicoGenomicsDF_node1665689379027", 53 | ) 54 | 55 | # Script generated for node ProceduresDF 56 | ProceduresDF_node1665690724543 = glueContext.create_dynamic_frame.from_catalog( 57 | database="phenotypicdb", 58 | table_name="ovation_procedures_csv", 59 | transformation_ctx="ProceduresDF_node1665690724543", 60 | ) 61 | 62 | # Script generated for node Renamed keys for finalJoin 63 | RenamedkeysforfinalJoin_node1665892999370 = ApplyMapping.apply( 64 | frame=Novation_Rx_node1665891226598, 65 | mappings=[ 66 | ("patient_id", "string", "rx_patient_id", "string"), 67 | ("claim_id", "string", "rx_claim_id", "string"), 68 | ("ndc_product", "long", "ndc_product", "long"), 69 | ("quantity", "double", "quantity", "double"), 70 | ("uom", "string", "uom", "string"), 71 | ("prescriber_npi", "long", "prescriber_npi", "long"), 72 | ("brand_name", "string", "brand_name", "string"), 73 | 
("generic_name", "string", "generic_name", "string"), 74 | ("dosage_form", "string", "dosage_form", "string"), 75 | ], 76 | transformation_ctx="RenamedkeysforfinalJoin_node1665892999370", 77 | ) 78 | 79 | # Script generated for node Renamed keys for Join 80 | RenamedkeysforJoin_node1665694312611 = ApplyMapping.apply( 81 | frame=DiagnosisDF_node1665691724279, 82 | mappings=[ 83 | ("patient_id", "string", "dg_patient_id", "string"), 84 | ("claim_id", "string", "dg_claim_id", "string"), 85 | ("diagnosis_date", "string", "diagnosis_date", "string"), 86 | ("diagnosis_vocab", "string", "diagnosis_vocab", "string"), 87 | ("diagnosis_code", "string", "diagnosis_code", "string"), 88 | ("diagnosis_desc", "string", "diagnosis_desc", "string"), 89 | ("vocabulary_name", "string", "vocabulary_name", "string"), 90 | ], 91 | transformation_ctx="RenamedkeysforJoin_node1665694312611", 92 | ) 93 | 94 | # Script generated for node ApplyFilteronClinicalGenomicsData 95 | ApplyFilteronClinicalGenomicsData_node1665894278030 = ApplyMapping.apply( 96 | frame=ClinicoGenomicsDF_node1665689379027, 97 | mappings=[ 98 | ("patient_id", "string", "patient_id", "string"), 99 | ("lab_specimen_identifier", "string", "lab_specimen_identifier", "string"), 100 | ("sample_type", "string", "sample_type", "string"), 101 | ("afr_ancestry_percent", "double", "afr_ancestry_percent", "double"), 102 | ("amr_ancestry_percent", "double", "amr_ancestry_percent", "double"), 103 | ("eas_ancestry_percent", "double", "eas_ancestry_percent", "double"), 104 | ("eur_ancestry_percent", "double", "eur_ancestry_percent", "double"), 105 | ("oce_ancestry_percent", "double", "oce_ancestry_percent", "double"), 106 | ("sas_ancestry_percent", "double", "sas_ancestry_percent", "double"), 107 | ("was_ancestry_percent", "double", "was_ancestry_percent", "double"), 108 | ], 109 | transformation_ctx="ApplyFilteronClinicalGenomicsData_node1665894278030", 110 | ) 111 | 112 | # Script generated for node FilterProcedureData 113 | FilterProcedureData_node1665694157838 = ApplyMapping.apply( 114 | frame=ProceduresDF_node1665690724543, 115 | mappings=[ 116 | ("patient_id", "string", "pr_patient_id", "string"), 117 | ("claim_id", "string", "pr_claim_id", "string"), 118 | ("claim_type", "string", "pr_claim_type", "string"), 119 | ("procedure_date", "string", "pr_procedure_date", "string"), 120 | ("procedure_vocab", "string", "pr_procedure_vocab", "string"), 121 | ("procedure_code", "string", "pr_procedure_code", "string"), 122 | ("procedure_short_desc", "string", "procedure_short_desc", "string"), 123 | ("procedure_long_desc", "string", "procedure_long_desc", "string"), 124 | ("vocabulary_name", "string", "vocabulary_name", "string"), 125 | ], 126 | transformation_ctx="FilterProcedureData_node1665694157838", 127 | ) 128 | 129 | # Script generated for node JoinClinicalGenomicswithProcedures 130 | ApplyFilteronClinicalGenomicsData_node1665894278030DF = ( 131 | ApplyFilteronClinicalGenomicsData_node1665894278030.toDF() 132 | ) 133 | FilterProcedureData_node1665694157838DF = FilterProcedureData_node1665694157838.toDF() 134 | JoinClinicalGenomicswithProcedures_node1665694145271 = DynamicFrame.fromDF( 135 | ApplyFilteronClinicalGenomicsData_node1665894278030DF.join( 136 | FilterProcedureData_node1665694157838DF, 137 | ( 138 | ApplyFilteronClinicalGenomicsData_node1665894278030DF["patient_id"] 139 | == FilterProcedureData_node1665694157838DF["pr_patient_id"] 140 | ), 141 | "outer", 142 | ), 143 | glueContext, 144 | "JoinClinicalGenomicswithProcedures_node1665694145271", 145 | 
) 146 | 147 | # Script generated for node Join 148 | JoinClinicalGenomicswithProcedures_node1665694145271DF = ( 149 | JoinClinicalGenomicswithProcedures_node1665694145271.toDF() 150 | ) 151 | RenamedkeysforJoin_node1665694312611DF = RenamedkeysforJoin_node1665694312611.toDF() 152 | Join_node1665694288059 = DynamicFrame.fromDF( 153 | JoinClinicalGenomicswithProcedures_node1665694145271DF.join( 154 | RenamedkeysforJoin_node1665694312611DF, 155 | ( 156 | JoinClinicalGenomicswithProcedures_node1665694145271DF["patient_id"] 157 | == RenamedkeysforJoin_node1665694312611DF["dg_patient_id"] 158 | ), 159 | "outer", 160 | ), 161 | glueContext, 162 | "Join_node1665694288059", 163 | ) 164 | 165 | # Script generated for node finalJoin 166 | Join_node1665694288059DF = Join_node1665694288059.toDF() 167 | RenamedkeysforfinalJoin_node1665892999370DF = ( 168 | RenamedkeysforfinalJoin_node1665892999370.toDF() 169 | ) 170 | finalJoin_node1665891366375 = DynamicFrame.fromDF( 171 | Join_node1665694288059DF.join( 172 | RenamedkeysforfinalJoin_node1665892999370DF, 173 | ( 174 | Join_node1665694288059DF["patient_id"] 175 | == RenamedkeysforfinalJoin_node1665892999370DF["rx_patient_id"] 176 | ), 177 | "outer", 178 | ), 179 | glueContext, 180 | "finalJoin_node1665891366375", 181 | ) 182 | 183 | # Script generated for node Custom Transform 184 | CustomTransform_node1665962473016 = MyTransform( 185 | glueContext, 186 | DynamicFrameCollection( 187 | {"finalJoin_node1665891366375": finalJoin_node1665891366375}, glueContext 188 | ), 189 | ) 190 | 191 | # Script generated for node Select From Collection 192 | SelectFromCollection_node1665962600856 = SelectFromCollection.apply( 193 | dfc=CustomTransform_node1665962473016, 194 | key=list(CustomTransform_node1665962473016.keys())[0], 195 | transformation_ctx="SelectFromCollection_node1665962600856", 196 | ) 197 | 198 | # Script generated for node Amazon S3 199 | AmazonS3_node1665695621571 = glueContext.write_dynamic_frame.from_options( 200 | frame=SelectFromCollection_node1665962600856, 201 | connection_type="s3", 202 | format="glueparquet", 203 | connection_options={ 204 | "path": "s3://omics-datalake-genomics/phentotypic-datalake/", 205 | "partitionKeys": [], 206 | }, 207 | transformation_ctx="AmazonS3_node1665695621571", 208 | ) 209 | 210 | job.commit() 211 | -------------------------------------------------------------------------------- /src/lambda/add_bucket_notification/add_bucket_notification_lambda.py: -------------------------------------------------------------------------------- 1 | from crhelper import CfnResource 2 | import logging 3 | import boto3 4 | from botocore.exceptions import ClientError 5 | 6 | logger = logging.getLogger(__name__) 7 | # Initialise the helper, all inputs are optional, this example shows the defaults 8 | helper = CfnResource(json_logging=False, log_level='DEBUG', boto_level='CRITICAL', polling_interval=1) 9 | 10 | # Initiate client 11 | try: 12 | print("Attempt to initiate client") 13 | s3 = boto3.resource('s3') 14 | print("Attempt to initiate client complete") 15 | except Exception as e: 16 | raise e 17 | 18 | @helper.create 19 | def create(event, context): 20 | logger.info("Got Create") 21 | put_bucket_notification(event, context) 22 | 23 | 24 | @helper.update 25 | def update(event, context): 26 | logger.info("Got Update") 27 | put_bucket_notification(event, context) 28 | 29 | 30 | @helper.delete 31 | def delete(event, context): 32 | logger.info("Got Delete") 33 | pass 34 | # Delete never returns anything. 
Should not fail if the underlying resources are already deleted. Desired state. 35 | 36 | def handler(event, context): 37 | helper(event, context) 38 | 39 | def put_bucket_notification(event, context): 40 | bucket_name = event['ResourceProperties']['BucketName'] 41 | prefix = event['ResourceProperties']['Prefix'] 42 | lambda_function_arn = event['ResourceProperties']['LambdaFunctionArn'] 43 | try: 44 | print("Attempt to update bucket configuration") 45 | bucket_notification = s3.BucketNotification(bucket_name) 46 | response = bucket_notification.put( 47 | NotificationConfiguration={ 48 | 'LambdaFunctionConfigurations': [ 49 | { 50 | 'Id': 'ObjectCreatedStartsWithPrefix', 51 | 'LambdaFunctionArn': lambda_function_arn, 52 | 'Events': [ 53 | 's3:ObjectCreated:*' 54 | ], 55 | 'Filter': { 56 | 'Key': { 57 | 'FilterRules': [ 58 | { 59 | 'Name': 'prefix', 60 | 'Value': prefix 61 | }, 62 | ] 63 | } 64 | } 65 | }, 66 | ] 67 | }, 68 | ) 69 | except ClientError as e: 70 | raise Exception( "boto3 client error : " + e.__str__()) 71 | except Exception as e: 72 | raise Exception( "Unexpected error : " + e.__str__()) 73 | print(response) -------------------------------------------------------------------------------- /src/lambda/apply_s3_lifecycle/apply_s3_lifecycle_lambda.py: -------------------------------------------------------------------------------- 1 | import json 2 | import boto3 3 | from jsonschema import validate 4 | from botocore.exceptions import ClientError 5 | 6 | print('Loading function - Apply S3 Lifecycle') 7 | 8 | s3_client = boto3.client('s3') 9 | 10 | file_tag_rules = { 11 | "bam": 12 | [ 13 | { 14 | "Key": "processed", 15 | "Value": "true" 16 | }, 17 | { 18 | "Key": "OmicsTiering", 19 | "Value": "IntelligentTierAfter30" 20 | } 21 | ], 22 | "vcf": 23 | [ 24 | { 25 | "Key": "processed", 26 | "Value": "true" 27 | }, 28 | { 29 | "Key": "OmicsTiering", 30 | "Value": "Standard" 31 | } 32 | ], 33 | "gvcf": 34 | [ 35 | { 36 | "Key": "processed", 37 | "Value": "true" 38 | }, 39 | { 40 | "Key": "OmicsTiering", 41 | "Value": "IntelligentTierAfter30" 42 | } 43 | ], 44 | "fastq": 45 | [ 46 | { 47 | "Key": "processed", 48 | "Value": "true" 49 | }, 50 | { 51 | "Key": "OmicsTiering", 52 | "Value": "RemoveIn30" 53 | } 54 | ] 55 | } 56 | 57 | def validate_event(_event_json): 58 | schema = { 59 | "$schema": "http://json-schema.org/draft-04/schema#", 60 | "type": "object", 61 | "properties": { 62 | "inputs": { 63 | "type": "object", 64 | "properties": { 65 | "fastq": { 66 | "type": "array" 67 | } 68 | }, 69 | "required": [ 70 | "fastq" 71 | ] 72 | }, 73 | "outputs": { 74 | "type": "object", 75 | "properties": { 76 | "vcf": { 77 | "type": "array" 78 | }, 79 | "bam": { 80 | "type": "array" 81 | }, 82 | "gvcf": { 83 | "type": "array" 84 | } 85 | }, 86 | "required": [ 87 | "vcf", 88 | "bam" 89 | ] 90 | } 91 | }, 92 | "required": [ 93 | "inputs", 94 | "outputs" 95 | ] 96 | } 97 | 98 | try: 99 | validate(_event_json, schema=schema) 100 | return True 101 | except Exception as e: 102 | raise e 103 | 104 | def split_s3_path(s3_path): 105 | path_parts=s3_path.replace("s3://","").split("/") 106 | bucket=path_parts.pop(0) 107 | key="/".join(path_parts) 108 | return bucket, key 109 | 110 | def get_tagset_for_object(_bucket, _key): 111 | try: 112 | get_tags_response = s3_client.get_object_tagging( 113 | Bucket=_bucket, 114 | Key=_key, 115 | ) 116 | return get_tags_response['TagSet'] 117 | except Exception as e: 118 | raise e 119 | 120 | def lambda_handler(event, context): 121 | """ 122 | Example event 123 | { 124 | 
"inputs": { 125 | "fastq": [ 126 | "s3://path/tofastq_R1.fastq.gz", 127 | "s3://path/tofastq_R2.fastq.gz" 128 | ] 129 | }, 130 | "outputs": { 131 | "vcf": [ 132 | "s3://output.vcf" 133 | ], 134 | "bam": [ 135 | "s3://output.bam" 136 | ], 137 | "gvcf": [ 138 | "s3://example.genome.vcf.gz" 139 | ] 140 | } 141 | } 142 | """ 143 | # Inoked by Step Function 144 | print("Received event: " + json.dumps(event, indent=2)) 145 | 146 | objects_to_tag = {} 147 | # check valid event and add files to tag 148 | if validate_event(event): 149 | for _k, _v in event['inputs'].items(): 150 | objects_to_tag[_k] = _v 151 | for _k, _v in event['outputs'].items(): 152 | objects_to_tag[_k] = _v 153 | 154 | # check and apply tags based on config 155 | for file_type, s3_files in objects_to_tag.items(): 156 | for _s3_file in s3_files: 157 | bucket, _key = split_s3_path(_s3_file) 158 | print(f"File type: {file_type} Bucket: {bucket} Key: {_key}") 159 | # tag_set = get_tagset_for_object(bucket, _key) 160 | # Add logic here to check existing tags if needed 161 | 162 | # Apply new tag set based on file type and config 163 | try: 164 | put_tags_response = s3_client.put_object_tagging( 165 | Bucket=bucket, 166 | Key=_key, 167 | Tagging={ 168 | 'TagSet': file_tag_rules[file_type] 169 | } 170 | ) 171 | print(put_tags_response) 172 | except ClientError as e: 173 | raise Exception( "boto3 client error : " + e.__str__()) 174 | except Exception as e: 175 | raise Exception( "Unexpected error : " + e.__str__()) 176 | print("Cleanup complete for sample") 177 | -------------------------------------------------------------------------------- /src/lambda/check_omcis_workflow_task/lambda_check_omics_workflow_task.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | from copy import deepcopy 3 | from dataclasses import dataclass, asdict 4 | from datetime import datetime 5 | from typing import Literal 6 | 7 | import logging 8 | 9 | logger = logging.getLogger() 10 | logger.setLevel(logging.INFO) 11 | 12 | # Set up for omics model for boto3, only needed while in beta 13 | 14 | session = boto3.Session() 15 | 16 | omics_client = session.client("omics") 17 | 18 | # Define omics service response status types 19 | 20 | TASK_TYPE = Literal["GetReadSetImportJob", "GetVariantImportJob", "GetRun"] 21 | READ_SET_IMPORT_STATUS = Literal[ 22 | "CREATED", "SUBMITTED", "RUNNING", "CANCELLING", "FAILED", "DONE", "COMPLETED_WITH_FAILURES"] 23 | GET_RUN_JOB_STATUS = Literal[ 24 | "PENDING", "STARTING", "RUNNING", "STOPPING", "COMPLETED", "DELETED", "CANCELLED", "FAILED"] 25 | GET_VARIANT_IMPORT_JOB_STATUS = Literal[ 26 | "CREATING", "QUEUED", "IN_PROGRESS", "CANCELING", "CANCELED", "COMPLETE", "FAILED"] 27 | 28 | 29 | # Define data classes 30 | 31 | 32 | @dataclass 33 | class CheckOmicsWorkflowTaskRequest: 34 | task_type: TASK_TYPE 35 | task_params: dict 36 | 37 | 38 | @dataclass 39 | class CheckOmicsWorkflowTaskResponse(CheckOmicsWorkflowTaskRequest): 40 | task_status: str = None 41 | task_response: dict = None 42 | 43 | 44 | # Converts datetimes to isoformat string 45 | def dates_to_string(response_dict: dict): 46 | converted_response = deepcopy(response_dict) 47 | 48 | for k, v in response_dict.items(): 49 | if isinstance(v, dict): 50 | converted_response[k] = dates_to_string(response_dict[k]) 51 | elif isinstance(v, datetime): 52 | converted_response[k] = response_dict[k].isoformat() 53 | return converted_response 54 | 55 | 56 | # Checks if task_status is one of possible values for a terminal state 
57 | # Sets value to either COMPLETED, or FAILED if terminal state 58 | def get_terminal_status(task_status) -> str: 59 | if task_status == 'DONE': 60 | return "COMPLETED" 61 | elif task_status in ["CANCELLING", "DELETED", "CANCELLED", "COMPLETED_WITH_FAILURES"]: 62 | return "FAILED" 63 | else: 64 | return task_status 65 | 66 | 67 | # Make Lambda Function response 68 | def make_response( 69 | request: CheckOmicsWorkflowTaskRequest, 70 | task_status: str, 71 | task_response: dict 72 | ) -> CheckOmicsWorkflowTaskResponse: 73 | del task_response['ResponseMetadata'] 74 | 75 | workflow_task_response = CheckOmicsWorkflowTaskResponse( 76 | task_type=request.task_type, 77 | task_params=request.task_params, 78 | task_status=task_status, 79 | task_response=dates_to_string(task_response) 80 | ) 81 | return workflow_task_response 82 | 83 | 84 | # Amazon Omics service calls 85 | 86 | def get_read_set_import_job(request: CheckOmicsWorkflowTaskRequest) -> CheckOmicsWorkflowTaskResponse: 87 | boto3.client('omics') 88 | response = omics_client.get_read_set_import_job( 89 | id=request.task_params['id'], 90 | sequenceStoreId=request.task_params['sequence_store_id'] 91 | ) 92 | logger.info(response) 93 | 94 | workflow_task_response = make_response( 95 | request=request, 96 | task_status=get_terminal_status(response['status']), 97 | task_response=response 98 | ) 99 | return workflow_task_response 100 | 101 | 102 | def get_run(request: CheckOmicsWorkflowTaskRequest) -> CheckOmicsWorkflowTaskResponse: 103 | boto3.client('omics') 104 | response = omics_client.get_run( 105 | id=request.task_params['id'], 106 | ) 107 | 108 | logger.info(response) 109 | 110 | workflow_task_response = make_response( 111 | request=request, 112 | task_status=get_terminal_status(response['status']), 113 | task_response=response 114 | ) 115 | return workflow_task_response 116 | 117 | 118 | def get_variant_import_job(request: CheckOmicsWorkflowTaskRequest) -> CheckOmicsWorkflowTaskResponse: 119 | boto3.client('omics') 120 | response = omics_client.get_variant_import_job( 121 | jobId=request.task_params['job_id'], 122 | ) 123 | 124 | logger.info(response) 125 | 126 | workflow_task_response = make_response( 127 | request=request, 128 | task_status=get_terminal_status(response['status']), 129 | task_response=response 130 | ) 131 | 132 | return workflow_task_response 133 | 134 | 135 | # Main lambda handler 136 | 137 | def lambda_handler(event: CheckOmicsWorkflowTaskRequest, context): 138 | logger.info(f"Event Object: {event}") 139 | 140 | request = CheckOmicsWorkflowTaskRequest(**event) 141 | 142 | if request.task_type == "GetReadSetImportJob": 143 | task_response = get_read_set_import_job(request) 144 | elif request.task_type == "GetRun": 145 | task_response = get_run(request) 146 | elif request.task_type == "GetVariantImportJob": 147 | task_response = get_variant_import_job(request) 148 | else: 149 | task_response = make_response( 150 | request, 151 | task_status="FAILED", 152 | task_response={ 153 | 'failure_message': f'The requested task_type: {request.task_type} is not one of: [' 154 | f'GetReadSetImportJob, GetRun, GetVariantImportJob]'} 155 | ) 156 | 157 | return asdict(task_response) -------------------------------------------------------------------------------- /src/lambda/import_annotation/import_annotation_lambda.py: -------------------------------------------------------------------------------- 1 | from crhelper import CfnResource 2 | import logging 3 | import boto3 4 | from botocore.exceptions import ClientError 5 | 6 | logger 
= logging.getLogger(__name__) 7 | # Initialise the helper, all inputs are optional, this example shows the defaults 8 | helper = CfnResource(json_logging=False, log_level='DEBUG', boto_level='CRITICAL', polling_interval=1) 9 | 10 | # Initiate client 11 | try: 12 | print("Attempt to initiate client") 13 | omics_session = boto3.Session() 14 | omics_client = omics_session.client('omics') 15 | print("Attempt to initiate client complete") 16 | except Exception as e: 17 | helper.init_failure(e) 18 | 19 | 20 | @helper.create 21 | def create(event, context): 22 | logger.info("Got Create") 23 | import_annotation(event, context) 24 | 25 | 26 | @helper.update 27 | def update(event, context): 28 | logger.info("Got Update") 29 | import_annotation(event, context) 30 | 31 | 32 | @helper.delete 33 | def delete(event, context): 34 | logger.info("Got Delete") 35 | return "delete" 36 | # Delete never returns anything. Should not fail if the underlying resources are already deleted. Desired state. 37 | 38 | @helper.poll_create 39 | def poll_create(event, context): 40 | logger.info("Got Create poll") 41 | return check_annotation_import_status(event, context) 42 | 43 | 44 | @helper.poll_update 45 | def poll_update(event, context): 46 | logger.info("Got Update poll") 47 | return check_annotation_import_status(event, context) 48 | 49 | 50 | @helper.poll_delete 51 | def poll_delete(event, context): 52 | logger.info("Got Delete poll") 53 | return "delete poll" 54 | 55 | def handler(event, context): 56 | helper(event, context) 57 | 58 | def import_annotation(event, context): 59 | omics_import_role_arn = event['ResourceProperties']['OmicsImportAnnotationRoleArn'] 60 | annotation_source_s3_uri = event['ResourceProperties']['AnnotationSourceS3Uri'] 61 | annotation_store_name = event['ResourceProperties']['AnnotationStoreName'] 62 | try: 63 | print(f"Attempt to import annotation file: {annotation_source_s3_uri} to store: {annotation_store_name}") 64 | response = omics_client.start_annotation_import_job( 65 | destinationName=annotation_store_name, 66 | roleArn=omics_import_role_arn, 67 | items=[{'source': annotation_source_s3_uri}] 68 | ) 69 | except ClientError as e: 70 | raise Exception( "boto3 client error : " + e.__str__()) 71 | except Exception as e: 72 | raise Exception( "Unexpected error : " + e.__str__()) 73 | logger.info(response) 74 | helper.Data.update({"AnnotationImportJobId": response['jobId']}) 75 | return True 76 | 77 | def check_annotation_import_status(event, context): 78 | annotation_import_job_id = helper.Data.get("AnnotationImportJobId") 79 | 80 | try: 81 | response = omics_client.get_annotation_import_job( 82 | jobId=annotation_import_job_id 83 | ) 84 | except ClientError as e: 85 | raise Exception( "boto3 client error : " + e.__str__()) 86 | except Exception as e: 87 | raise Exception( "Unexpected error : " + e.__str__()) 88 | status = response['status'] 89 | 90 | if status in ['SUBMITTED', 'IN_PROGRESS', 'RUNNING', 'CREATING', 'QUEUED']: 91 | logger.info(status) 92 | return None 93 | else: 94 | if status in ['READY', 'ACTIVE', 'COMPLETED', 'COMPLETE']: 95 | logger.info(status) 96 | return True 97 | else: 98 | msg = f"Annotation Import Job ID : {annotation_import_job_id} has status {status}, exiting" 99 | logger.info(msg) 100 | raise ValueError(msg) 101 | 102 | -------------------------------------------------------------------------------- /src/lambda/import_reference/import_reference_lambda.py: -------------------------------------------------------------------------------- 1 | from crhelper 
import CfnResource 2 | import logging 3 | import boto3 4 | from botocore.exceptions import ClientError 5 | 6 | logger = logging.getLogger(__name__) 7 | # Initialise the helper, all inputs are optional, this example shows the defaults 8 | helper = CfnResource(json_logging=False, log_level='DEBUG', boto_level='CRITICAL', polling_interval=1) 9 | 10 | # Initiate client 11 | try: 12 | print("Attempt to initiate client") 13 | omics_session = boto3.Session() 14 | omics_client = omics_session.client('omics') 15 | print("Attempt to initiate client complete") 16 | except Exception as e: 17 | helper.init_failure(e) 18 | 19 | 20 | @helper.create 21 | def create(event, context): 22 | logger.info("Got Create") 23 | import_reference(event, context) 24 | 25 | 26 | @helper.update 27 | def update(event, context): 28 | logger.info("Got Update") 29 | import_reference(event, context) 30 | 31 | 32 | @helper.delete 33 | def delete(event, context): 34 | logger.info("Got Delete") 35 | return "delete" 36 | # Delete never returns anything. Should not fail if the underlying resources are already deleted. Desired state. 37 | 38 | @helper.poll_create 39 | def poll_create(event, context): 40 | logger.info("Got Create poll") 41 | return check_reference_import_status(event, context) 42 | 43 | 44 | @helper.poll_update 45 | def poll_update(event, context): 46 | logger.info("Got Update poll") 47 | return check_reference_import_status(event, context) 48 | 49 | 50 | @helper.poll_delete 51 | def poll_delete(event, context): 52 | logger.info("Got Delete poll") 53 | return "delete poll" 54 | 55 | def handler(event, context): 56 | helper(event, context) 57 | 58 | def import_reference(event, context): 59 | reference_store_id = event['ResourceProperties']['ReferenceStoreId'] 60 | omics_import_role_arn = event['ResourceProperties']['OmicsImportReferenceRoleArn'] 61 | reference_source_s3_uri = event['ResourceProperties']['ReferenceSourceS3Uri'] 62 | reference_name = event['ResourceProperties']['ReferenceName'] 63 | try: 64 | print(f"Attempt to import reference: {reference_source_s3_uri} to store: {reference_store_id}") 65 | response = omics_client.start_reference_import_job( 66 | referenceStoreId=reference_store_id, 67 | roleArn=omics_import_role_arn, 68 | sources=[{'sourceFile': reference_source_s3_uri, 'name': reference_name}] 69 | ) 70 | except ClientError as e: 71 | raise Exception( "boto3 client error : " + e.__str__()) 72 | except Exception as e: 73 | raise Exception( "Unexpected error : " + e.__str__()) 74 | logger.info(response) 75 | helper.Data.update({"ReferenceImportJobId": response['id']}) 76 | helper.Data.update({"ReferenceStoreId": response['referenceStoreId']}) 77 | return True 78 | 79 | def get_reference_arn_id(reference_store_id, reference_name): 80 | try: 81 | response = omics_client.list_references( 82 | referenceStoreId=reference_store_id, 83 | filter={'name': reference_name} 84 | ) 85 | except ClientError as e: 86 | raise Exception( "boto3 client error : " + e.__str__()) 87 | except Exception as e: 88 | raise Exception( "Unexpected error : " + e.__str__()) 89 | return response['references'][0]['arn'], response['references'][0]['id'] 90 | 91 | def check_reference_import_status(event, context): 92 | reference_store_id = helper.Data.get("ReferenceStoreId") 93 | reference_import_job_id = helper.Data.get("ReferenceImportJobId") 94 | 95 | try: 96 | response = omics_client.get_reference_import_job( 97 | id=reference_import_job_id, 98 | referenceStoreId=reference_store_id 99 | ) 100 | except ClientError as e: 101 | raise 
Exception( "boto3 client error : " + e.__str__()) 102 | except Exception as e: 103 | raise Exception( "Unexpected error : " + e.__str__()) 104 | status = response['status'] 105 | 106 | if status in ['SUBMITTED', 'IN_PROGRESS', 'RUNNING']: 107 | logger.info(status) 108 | return None 109 | else: 110 | if status in ['READY', 'ACTIVE', 'COMPLETED']: 111 | logger.info(status) 112 | _arn, _id = get_reference_arn_id( 113 | reference_store_id, 114 | event['ResourceProperties']['ReferenceName'] 115 | ) 116 | helper.Data.update({"Arn": _arn}) 117 | helper.Data.update({"Id": _id}) 118 | return True 119 | else: 120 | msg = f"Reference store: {reference_store_id} has status {status}, exiting" 121 | logger.info(msg) 122 | raise ValueError(msg) 123 | 124 | -------------------------------------------------------------------------------- /src/lambda/import_sequence/import_sequence_lambda.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import boto3 3 | from botocore.exceptions import ClientError 4 | 5 | logger = logging.getLogger(__name__) 6 | 7 | # Initiate client 8 | try: 9 | print("Attempt to initiate client") 10 | omics_session = boto3.Session() 11 | omics_client = omics_session.client('omics') 12 | print("Attempt to initiate client complete") 13 | except Exception as e: 14 | raise e 15 | 16 | def handler(event, context): 17 | sequence_store_id = event['SequenceStoreId'] 18 | sample_id = event['SampleId'] 19 | subject_id = event['SubjectId'] 20 | source_file_type = event['FileType'] 21 | source_files = {} 22 | source1 = event['Read1'] 23 | source_files["source1"] = source1 24 | if "Read2" in event: 25 | source2 = event['Read2'] 26 | source_files["source2"] = source2 27 | reference_arn = event['ReferenceArn'] 28 | role_arn = event['RoleArn'] 29 | source_list = [ 30 | { 31 | "sourceFiles": source_files, 32 | "sourceFileType": source_file_type, 33 | "subjectId": subject_id, 34 | "sampleId": sample_id, 35 | "referenceArn": reference_arn 36 | } 37 | ] 38 | 39 | try: 40 | print("Attempt to import read set") 41 | response = omics_client.start_read_set_import_job( 42 | sequenceStoreId=sequence_store_id, 43 | roleArn=role_arn, 44 | sources=source_list 45 | ) 46 | except ClientError as e: 47 | raise Exception( "boto3 client error : " + e.__str__()) 48 | except Exception as e: 49 | raise Exception( "Unexpected error : " + e.__str__()) 50 | logger.info(response) 51 | return {"importReadSetJobId": response['id']} -------------------------------------------------------------------------------- /src/lambda/import_variants/import_variant_lambda.py: -------------------------------------------------------------------------------- 1 | import boto3 2 | from botocore.exceptions import ClientError 3 | 4 | # Initiate client 5 | try: 6 | print("Attempt to initiate client") 7 | omics_session = boto3.Session() 8 | omics_client = omics_session.client('omics') 9 | print("Attempt to initiate client complete") 10 | except Exception as e: 11 | raise e 12 | 13 | def handler(event, context): 14 | variant_store_name = event['VariantStoreName'] 15 | role_arn = event['OmicsImportVariantRoleArn'] 16 | variant_items = [{ 17 | "source": event['VcfS3Uri'] 18 | }] 19 | try: 20 | print("Attempt to start variant import job") 21 | response = omics_client.start_variant_import_job( 22 | destinationName=variant_store_name, 23 | roleArn=role_arn, 24 | items=variant_items 25 | ) 26 | except ClientError as e: 27 | raise Exception( "boto3 client error : " + e.__str__()) 28 | except Exception as 
e: 29 | raise Exception( "Unexpected error : " + e.__str__()) 30 | print(response) 31 | return {"VariantImportJobId": response['jobId']} -------------------------------------------------------------------------------- /src/lambda/launch_genomics_sfn/lambda_launch_genomics_sfn.py: -------------------------------------------------------------------------------- 1 | import json 2 | import boto3 3 | import re 4 | import os 5 | import sys 6 | import uuid 7 | import time 8 | import random 9 | 10 | print('Loading function - launch Genomics Step Function Workflow') 11 | 12 | s3 = boto3.client('s3') 13 | 14 | ## FASTQ files name should look like 15 | # mysample_R1.fastq.gz mysample_R2.fastq.gz 16 | # based on regex below 17 | FASTQ_REGEX = re.compile('^(\w{1,20})_R(\d{1,10})\.*') 18 | 19 | ## this will control the expected number of files 20 | # found with prefix, for example, inputs/mysamples 21 | EXPECTED_READS = int(os.environ['NUM_FASTQS_PER_SAMPLE']) 22 | SFN_ARN = os.environ['GENOMICS_STEP_FUNCTION_ARN'] 23 | MAX_DELAY = 20 24 | 25 | def get_files_with_prefix(_bucket, _key, _sample): 26 | file_list = [] 27 | if "/" in _key: 28 | _prefix = os.path.dirname(_key) + '/' + _sample 29 | else: 30 | _prefix = _sample 31 | 32 | s3_client = boto3.client("s3") 33 | response = s3_client.list_objects_v2(Bucket=_bucket, Prefix=_prefix) 34 | files = response.get("Contents") 35 | for file in files: 36 | s3_file_uri = 's3://' + _bucket + '/' + file['Key'] 37 | print(f"S3 file path: {s3_file_uri}") 38 | file_list.append(s3_file_uri) 39 | return file_list 40 | 41 | def verify_fastq(_filename): 42 | result = re.search(FASTQ_REGEX, _filename) 43 | if result: 44 | print(f'verified that file {_filename} is a FASTQ') 45 | return True 46 | else: 47 | return False 48 | 49 | def is_sfn_exec_running(exec_name_prefix): 50 | sfn_client = boto3.client('stepfunctions') 51 | try: 52 | response = sfn_client.list_executions( 53 | stateMachineArn=SFN_ARN, 54 | statusFilter='RUNNING' 55 | ) 56 | except Exception as e: 57 | raise e 58 | # check for response 59 | if 'executions' in response and len(response['executions']) > 0: 60 | for _exec in response['executions']: 61 | if _exec['name'].startswith(exec_name_prefix): 62 | return True 63 | return False 64 | else: 65 | return False 66 | 67 | def lambda_handler(event, context): 68 | # Sanity checks 69 | print("Received s3 event: " + json.dumps(event, indent=4)) 70 | if "Records" not in event: 71 | sys.exit("Event doesnt have records, exiting") 72 | 73 | if len(event["Records"]) == 0: 74 | sys.exit("Event has empty records, exiting") 75 | 76 | event_obj = event["Records"][0] 77 | if "eventSource" not in event_obj or \ 78 | event_obj["eventSource"] != "aws:s3" or \ 79 | event_obj["eventName"].split(':')[0] != "ObjectCreated": 80 | sys.exit("Not a valid PutObject S3 event, exiting") 81 | 82 | # Get the object from the event and show its content type 83 | bucket = event_obj['s3']['bucket']['name'] 84 | _key = event_obj['s3']['object']['key'] 85 | print(f"Bucket: {bucket} Key: {_key}") 86 | 87 | if not _key.endswith('.fastq') and not _key.endswith('.fastq.gz'): 88 | sys.exit("Not a valid FASTQ file, exiting") 89 | else: 90 | # check if reads present 91 | if not verify_fastq(_key.split('/')[-1]): 92 | sys.exit("Not a valid fastq") 93 | else: 94 | # Add a random sleep to reduce likelihood of a race condition when 95 | # multiple fastqs arrive at the same time 96 | time_delay_seconds = random.randrange(0, MAX_DELAY) 97 | print(f"Waiting for {time_delay_seconds} seconds") 98 | 
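            # Each object upload triggers its own invocation, so this jitter staggers
            # concurrent invocations for the same sample; the is_sfn_exec_running()
            # check below then skips launching a duplicate execution if another
            # invocation has already started one.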
time.sleep(time_delay_seconds) 99 | 100 | result = re.match(FASTQ_REGEX, os.path.basename(_key)) 101 | sample_name = result.group(1) 102 | files_for_sample = get_files_with_prefix(bucket, _key, sample_name) 103 | print(f"{len(files_for_sample)} reads found for sample {sample_name}") 104 | if len(files_for_sample) == EXPECTED_READS: 105 | print("All FASTQs for sample accounted for, start step functions") 106 | sfn_name_prefix = f'GENOMICS_{sample_name}' 107 | sfn_exec_name = sfn_name_prefix + '_' + str(uuid.uuid1()) 108 | 109 | # check if already running (to avoid race condition) 110 | print("Checking if SFN execution running") 111 | if is_sfn_exec_running(sfn_name_prefix): 112 | sys.exit(f"SFN execution for sample: {sample_name} is RUNNING, skip launching a duplicate") 113 | 114 | sfn_payload = { 115 | "SampleId": sample_name, 116 | "Read1": files_for_sample[0], 117 | "Read2": files_for_sample[1], 118 | "SubjectId": 'TEST_SUBJECT', 119 | "SequenceStoreId": os.environ["SEQUENCE_STORE_ID"], 120 | "ReferenceArn": os.environ["REFERENCE_ARN"], 121 | "WorkflowId": os.environ["WORKFLOW_ID"], 122 | "WorkflowOutputS3Path": os.environ["WORKFLOW_OUTPUT_S3_PATH"], 123 | "GatkDockerUri": os.environ["GATK_DOCKER_URI"], 124 | "GotcDockerUri": os.environ["GOTC_DOCKER_URI"], 125 | "IntervalsS3Path": os.environ["INTERVAL_S3_PATH"] 126 | } 127 | sfn_client = boto3.client('stepfunctions') 128 | try: 129 | response = sfn_client.start_execution( 130 | stateMachineArn=SFN_ARN, 131 | name=sfn_exec_name, 132 | input=json.dumps(sfn_payload) 133 | ) 134 | print(f"Launched SFN execution: {sfn_exec_name}") 135 | except Exception as e: 136 | raise e 137 | else: 138 | print("Not all FASTQs found for sample, exit") 139 | -------------------------------------------------------------------------------- /src/lambda/start_workflow/start_workflow_lambda.py: -------------------------------------------------------------------------------- 1 | import logging 2 | import boto3 3 | from botocore.exceptions import ClientError 4 | 5 | logger = logging.getLogger(__name__) 6 | 7 | # Initiate client 8 | try: 9 | print("Attempt to initiate client") 10 | omics_session = boto3.Session() 11 | omics_client = omics_session.client('omics') 12 | print("Attempt to initiate client complete") 13 | except Exception as e: 14 | raise e 15 | 16 | def handler(event, context): 17 | workflow_id = event['WorkflowId'] 18 | role_arn = event['JobRoleArn'] 19 | output_s3_path = event['OutputS3Path'] 20 | params = { 21 | "sample_name": event['sample_name'], 22 | "ref_fasta": event['ref_fasta'], 23 | "fastq_1": event['fastq_1'], 24 | "fastq_2": event['fastq_2'], 25 | "readgroup_name": event['readgroup_name'], 26 | "library_name": event['fastq_2'], 27 | "platform_name": event['platform_name'], 28 | "run_date": event['run_date'], 29 | "sequencing_center":event['sequencing_center'], 30 | "dbSNP_vcf": event['dbSNP_vcf'], 31 | "Mills_1000G_indels_vcf": event['Mills_1000G_indels_vcf'], 32 | "known_indels_vcf": event['known_indels_vcf'], 33 | "scattered_calling_intervals_archive": event['scattered_calling_intervals_archive'], 34 | "gatk_docker": event['gatk_docker'], 35 | "gotc_docker": event['gotc_docker'] 36 | 37 | } 38 | 39 | try: 40 | print("Attempt to start workflow run") 41 | response = omics_client.start_run( 42 | workflowId=workflow_id, 43 | name=event['sample_name'] + '-workflow', 44 | roleArn=role_arn, 45 | parameters=params, 46 | outputUri=output_s3_path 47 | ) 48 | except ClientError as e: 49 | raise Exception( "boto3 client error : " + e.__str__()) 50 | 
except Exception as e: 51 | raise Exception( "Unexpected error : " + e.__str__()) 52 | logger.info(response) 53 | return {"WorkflowRunId": response['id']} -------------------------------------------------------------------------------- /src/lambda/trigger_code_build/trigger_docker_code_build.py: -------------------------------------------------------------------------------- 1 | from crhelper import CfnResource 2 | import logging 3 | import boto3 4 | from botocore.exceptions import ClientError 5 | 6 | logger = logging.getLogger(__name__) 7 | # Initialise the helper, all inputs are optional, this example shows the defaults 8 | helper = CfnResource(json_logging=False, log_level='DEBUG', boto_level='CRITICAL', polling_interval=1) 9 | 10 | 11 | # Initiate client 12 | try: 13 | print("Attempt to initiate client") 14 | _session = boto3.Session() 15 | code_build_client = _session.client('codebuild') 16 | print("Attempt to initiate codebuild client complete") 17 | except Exception as e: 18 | helper.init_failure(e) 19 | 20 | 21 | @helper.create 22 | def create(event, context): 23 | logger.info("Got Create") 24 | start_code_build(event, context) 25 | 26 | 27 | @helper.update 28 | def update(event, context): 29 | logger.info("Got Update") 30 | start_code_build(event, context) 31 | 32 | 33 | @helper.delete 34 | def delete(event, context): 35 | logger.info("Got Delete") 36 | return "delete" 37 | # Delete never returns anything. Should not fail if the underlying resources are already deleted. Desired state. 38 | 39 | @helper.poll_create 40 | def poll_create(event, context): 41 | logger.info("Got Create poll") 42 | return check_code_build_status(event, context) 43 | 44 | 45 | @helper.poll_update 46 | def poll_update(event, context): 47 | logger.info("Got Update poll") 48 | return check_code_build_status(event, context) 49 | 50 | 51 | @helper.poll_delete 52 | def poll_delete(event, context): 53 | logger.info("Got Delete poll") 54 | return "delete poll" 55 | 56 | def handler(event, context): 57 | helper(event, context) 58 | 59 | 60 | def start_code_build(event, context): 61 | project_name = event['ResourceProperties']['ProjectName'] 62 | source_repo = event['ResourceProperties']['SourceRepo'] 63 | ecr_repo = event['ResourceProperties']['EcrRepo'] 64 | image_tag = source_repo.split(':')[-1] 65 | try: 66 | print(f"Attempt to start code build project {project_name}") 67 | response = code_build_client.start_build( 68 | projectName=project_name, 69 | environmentVariablesOverride=[ 70 | { 71 | "name": 'SOURCE_REPO', 72 | "value": source_repo 73 | }, 74 | { 75 | "name": 'ECR_REPO_VERSION', 76 | "value": image_tag 77 | }, 78 | { 79 | "name": 'ECR_REPO', 80 | "value": ecr_repo 81 | } 82 | ] 83 | ) 84 | except ClientError as e: 85 | raise Exception( "boto3 client error : " + e.__str__()) 86 | except Exception as e: 87 | raise Exception( "Unexpected error : " + e.__str__()) 88 | logger.info(response) 89 | helper.Data.update({"BuildId": response['build']['id']}) 90 | helper.Data.update({"EcrImageUri": ecr_repo + ':' + image_tag}) 91 | 92 | def check_code_build_status(event, context): 93 | build_id = helper.Data.get('BuildId') 94 | 95 | try: 96 | response = code_build_client.batch_get_builds(ids=[build_id]) 97 | except ClientError as e: 98 | raise Exception( "boto3 client error : " + e.__str__()) 99 | except Exception as e: 100 | raise Exception( "Unexpected error : " + e.__str__()) 101 | status = response['builds'][0]['buildStatus'] 102 | 103 | if status in ['FAILED', 'FAULT', 'STOPPED', 'TIMED_OUT']: 104 | msg = 
f"Build ID {build_id} has status {status}, exiting" 105 | logger.info(msg) 106 | raise ValueError(msg) 107 | else: 108 | if status in ['SUCCEEDED']: 109 | logger.info(status) 110 | return True 111 | else: 112 | logger.info(f"Build ID status is: {build_id}") 113 | return None -------------------------------------------------------------------------------- /src/lambda/trigger_code_build/trigger_lambdas_code_build.py: -------------------------------------------------------------------------------- 1 | from crhelper import CfnResource 2 | import logging 3 | import boto3 4 | from botocore.exceptions import ClientError 5 | 6 | logger = logging.getLogger(__name__) 7 | # Initialise the helper, all inputs are optional, this example shows the defaults 8 | helper = CfnResource(json_logging=False, log_level='DEBUG', boto_level='CRITICAL', polling_interval=1) 9 | 10 | 11 | # Initiate client 12 | try: 13 | print("Attempt to initiate client") 14 | _session = boto3.Session() 15 | code_build_client = _session.client('codebuild') 16 | print("Attempt to initiate codebuild client complete") 17 | except Exception as e: 18 | helper.init_failure(e) 19 | 20 | 21 | @helper.create 22 | def create(event, context): 23 | logger.info("Got Create") 24 | start_code_build(event, context) 25 | 26 | 27 | @helper.update 28 | def update(event, context): 29 | logger.info("Got Update") 30 | start_code_build(event, context) 31 | 32 | 33 | @helper.delete 34 | def delete(event, context): 35 | logger.info("Got Delete") 36 | return "delete" 37 | # Delete never returns anything. Should not fail if the underlying resources are already deleted. Desired state. 38 | 39 | @helper.poll_create 40 | def poll_create(event, context): 41 | logger.info("Got Create poll") 42 | return check_code_build_status(event, context) 43 | 44 | 45 | @helper.poll_update 46 | def poll_update(event, context): 47 | logger.info("Got Update poll") 48 | return check_code_build_status(event, context) 49 | 50 | 51 | @helper.poll_delete 52 | def poll_delete(event, context): 53 | logger.info("Got Delete poll") 54 | return "delete poll" 55 | 56 | def handler(event, context): 57 | helper(event, context) 58 | 59 | 60 | def start_code_build(event, context): 61 | project_name = event['ResourceProperties']['ProjectName'] 62 | 63 | try: 64 | print(f"Attempt to start code build project {project_name}") 65 | response = code_build_client.start_build( 66 | projectName=project_name 67 | ) 68 | except ClientError as e: 69 | raise Exception( "boto3 client error : " + e.__str__()) 70 | except Exception as e: 71 | raise Exception( "Unexpected error : " + e.__str__()) 72 | logger.info(response) 73 | helper.Data.update({"BuildId": response['build']['id']}) 74 | 75 | def check_code_build_status(event, context): 76 | build_id = helper.Data.get('BuildId') 77 | 78 | try: 79 | response = code_build_client.batch_get_builds(ids=[build_id]) 80 | except ClientError as e: 81 | raise Exception( "boto3 client error : " + e.__str__()) 82 | except Exception as e: 83 | raise Exception( "Unexpected error : " + e.__str__()) 84 | status = response['builds'][0]['buildStatus'] 85 | 86 | if status in ['FAILED', 'FAULT', 'STOPPED', 'TIMED_OUT']: 87 | msg = f"Build ID {build_id} has status {status}, exiting" 88 | logger.info(msg) 89 | raise ValueError(msg) 90 | else: 91 | if status in ['SUCCEEDED']: 92 | logger.info(status) 93 | return True 94 | else: 95 | logger.info(f"Build ID status is: {build_id}") 96 | return None -------------------------------------------------------------------------------- 
/src/workflow/main.wdl: -------------------------------------------------------------------------------- 1 | version 1.0 2 | 3 | import "sub-workflows/processing-for-variant-discovery-gatk4.wdl" as preprocess 4 | import "sub-workflows/haplotypecaller-gvcf-gatk4.wdl" as haplotype 5 | import "sub-workflows/fastq-to-bam.wdl" as fastq2bam 6 | 7 | workflow fastqToVCF { 8 | input { 9 | String sample_name 10 | File fastq_1 11 | File fastq_2 12 | String readgroup_name 13 | String run_date 14 | String library_name 15 | String platform_name 16 | String sequencing_center 17 | File ref_fasta 18 | File dbSNP_vcf 19 | File known_indels_vcf 20 | 21 | File Mills_1000G_indels_vcf 22 | 23 | File scattered_calling_intervals_archive 24 | String gatk_docker 25 | String gotc_docker 26 | 27 | } 28 | String unmapped_bam_suffix="unmapped.bam" 29 | String ref_base_ext = basename(ref_fasta) 30 | String ref_base = basename(ref_fasta,".fasta") 31 | String ref_base_path = sub(ref_fasta,ref_base_ext,"") 32 | String dbSNP_vcf_index_path = sub(dbSNP_vcf,".vcf",".vcf.idx") 33 | String known_indels_vcf_path = sub(known_indels_vcf,".vcf.gz",".vcf.gz.tbi") 34 | String Mills_1000G_indels_vcf_path = sub(Mills_1000G_indels_vcf,".vcf.gz",".vcf.gz.tbi") 35 | 36 | File ref_fasta_index = ref_base_path + ref_base_ext + ".fai" 37 | File ref_dict = ref_base_path + ref_base + ".dict" 38 | File ref_alt = ref_base_path + ref_base_ext + ".64.alt" 39 | File ref_sa = ref_base_path + ref_base_ext + ".64.sa" 40 | File ref_ann = ref_base_path + ref_base_ext + ".64.ann" 41 | File ref_bwt = ref_base_path + ref_base_ext + ".64.bwt" 42 | File ref_pac = ref_base_path + ref_base_ext + ".64.pac" 43 | File ref_amb = ref_base_path + ref_base_ext + ".64.amb" 44 | 45 | File dbSNP_vcf_index = dbSNP_vcf_index_path 46 | File known_indels_vcf_index = known_indels_vcf_path 47 | File Mills_1000G_indels_vcf_index = Mills_1000G_indels_vcf_path 48 | 49 | call fastq2bam.ConvertPairedFastQsToUnmappedBamWf as Fastq2Bam { 50 | input: 51 | sample_name=sample_name, 52 | fastq_1=fastq_1, fastq_2=fastq_2, 53 | readgroup_name=readgroup_name, 54 | run_date = run_date, 55 | library_name = library_name, 56 | platform_name = platform_name, 57 | sequencing_center = sequencing_center, 58 | gatk_docker = gatk_docker 59 | } 60 | call preprocess.PreProcessingForVariantDiscovery_GATK4 as PreProcess { 61 | input: 62 | sample_name = sample_name, 63 | unmapped_bam = Fastq2Bam.output_unmapped_bam, 64 | unmapped_bam_suffix = unmapped_bam_suffix, 65 | ref_fasta = ref_fasta, 66 | ref_fasta_index = ref_fasta_index, 67 | ref_dict = ref_dict, 68 | ref_alt = ref_alt, 69 | ref_sa = ref_sa, 70 | ref_ann = ref_ann, 71 | ref_bwt = ref_bwt, 72 | ref_pac = ref_pac, 73 | ref_amb = ref_amb, 74 | dbSNP_vcf = dbSNP_vcf, 75 | dbSNP_vcf_index = dbSNP_vcf_index, 76 | known_indels_vcf = known_indels_vcf, 77 | known_indels_vcf_index = known_indels_vcf_index, 78 | Mills_1000G_indels_vcf = Mills_1000G_indels_vcf, 79 | Mills_1000G_indels_vcf_index = Mills_1000G_indels_vcf_index, 80 | gatk_docker = gatk_docker, 81 | gotc_docker = gotc_docker, 82 | } 83 | call haplotype.HaplotypeCallerGvcf_GATK4 as CallHaplotypes { 84 | input: 85 | input_bam = PreProcess.analysis_ready_bam, 86 | input_bam_index = PreProcess.analysis_ready_bam_index, 87 | ref_fasta = ref_fasta, 88 | ref_fasta_index = ref_fasta_index, 89 | ref_dict = ref_dict, 90 | scattered_calling_intervals_archive = scattered_calling_intervals_archive, 91 | gatk_docker = gatk_docker, 92 | gotc_docker = gotc_docker, 93 | 94 | } 95 | output { 96 | File 
duplication_metrics = PreProcess.duplication_metrics 97 | File bqsr_report = PreProcess.bqsr_report 98 | File analysis_ready_bam = PreProcess.analysis_ready_bam 99 | File analysis_ready_bam_index = PreProcess.analysis_ready_bam_index 100 | File analysis_ready_bam_md5 = PreProcess.analysis_ready_bam_md5 101 | File output_vcf = CallHaplotypes.output_vcf 102 | File output_vcf_index = CallHaplotypes.output_vcf_index 103 | } 104 | } 105 | -------------------------------------------------------------------------------- /src/workflow/parameter-template.json: -------------------------------------------------------------------------------- 1 | { 2 | "sample_name": { 3 | "description": "sample name" 4 | }, 5 | "fastq_1": { 6 | "description": "path to fastq1" 7 | }, 8 | "fastq_2": { 9 | "description": "path to fastq2" 10 | }, 11 | "ref_fasta": { 12 | "description": "path to reference fasta" 13 | }, 14 | "readgroup_name": { 15 | "description": "readgroup name" 16 | }, 17 | "library_name": { 18 | "description": "library name" 19 | }, 20 | "platform_name": { 21 | "description": "platform name}, e.g. Illumina" 22 | }, 23 | "run_date": { 24 | "description": "sequencing run date" 25 | }, 26 | "sequencing_center": { 27 | "description": "name of sequencing center" 28 | }, 29 | "dbSNP_vcf": { 30 | "description": "dbsnp vcf" 31 | }, 32 | "Mills_1000G_indels_vcf": { 33 | "description": "Mills 1000 genomes gold indels vcf" 34 | }, 35 | "known_indels_vcf": { 36 | "description": "known indels vcf" 37 | }, 38 | "scattered_calling_intervals_archive": { 39 | "description": "tar (not gzip) of scatter intervals" 40 | }, 41 | "gatk_docker": { 42 | "description": "docker uri in private ECR of GATK" 43 | }, 44 | "gotc_docker": { 45 | "description": "docker uri in private ECR of Genomes in the Cloud" 46 | } 47 | } -------------------------------------------------------------------------------- /src/workflow/sub-workflows/fastq-to-bam.wdl: -------------------------------------------------------------------------------- 1 | version 1.0 2 | ##Copyright Broad Institute, 2018 3 | ## 4 | ## This WDL converts paired FASTQ to uBAM and adds read group information 5 | ## 6 | ## Requirements/expectations : 7 | ## - Pair-end sequencing data in FASTQ format (one file per orientation) 8 | ## - The following metada descriptors per sample: 9 | ## - readgroup 10 | ## - sample_name 11 | ## - library_name 12 | ## - platform_unit 13 | ## - run_date 14 | ## - platform_name 15 | ## - sequecing_center 16 | ## 17 | ## Outputs : 18 | ## - Set of unmapped BAMs, one per read group 19 | ## - File of a list of the generated unmapped BAMs 20 | ## 21 | ## Cromwell version support 22 | ## - Successfully tested on v47 23 | ## - Does not work on versions < v23 due to output syntax 24 | ## 25 | ## Runtime parameters are optimized for Broad's Google Cloud Platform implementation. 26 | ## For program versions, see docker containers. 27 | ## 28 | ## LICENSING : 29 | ## This script is released under the WDL source code license (BSD-3) (see LICENSE in 30 | ## https://github.com/broadinstitute/wdl). Note however that the programs it calls may 31 | ## be subject to different licenses. Users are responsible for checking that they are 32 | ## authorized to run all programs before running this script. Please see the docker 33 | ## page at https://hub.docker.com/r/broadinstitute/genomes-in-the-cloud/ for detailed 34 | ## licensing information pertaining to the included programs. 
35 | 36 | # WORKFLOW DEFINITION 37 | workflow ConvertPairedFastQsToUnmappedBamWf { 38 | input { 39 | String sample_name 40 | File fastq_1 41 | File fastq_2 42 | String readgroup_name 43 | String run_date 44 | String library_name 45 | String platform_name 46 | String sequencing_center 47 | String gatk_docker 48 | } 49 | 50 | #String gatk_docker = "022521056385.dkr.ecr.us-east-1.amazonaws.com/gatk:4.1.9.0" 51 | 52 | String gatk_path = "/gatk/gatk" 53 | 54 | # Convert pair of FASTQs to uBAM 55 | call PairedFastQsToUnmappedBAM { 56 | input: 57 | sample_name = sample_name, 58 | fastq_1 = fastq_1, 59 | fastq_2 = fastq_2, 60 | readgroup_name = readgroup_name, 61 | run_date = run_date, 62 | library_name = library_name, 63 | platform_name = platform_name, 64 | sequencing_center = sequencing_center, 65 | gatk_path = gatk_path, 66 | docker = gatk_docker, 67 | } 68 | 69 | 70 | # Outputs that will be retained when execution is complete 71 | output { 72 | File output_unmapped_bam = PairedFastQsToUnmappedBAM.output_unmapped_bam 73 | } 74 | } 75 | 76 | # TASK DEFINITIONS 77 | 78 | # Convert a pair of FASTQs to uBAM 79 | task PairedFastQsToUnmappedBAM { 80 | input { 81 | # Command parameters 82 | String sample_name 83 | File fastq_1 84 | File fastq_2 85 | String readgroup_name 86 | String gatk_path 87 | String run_date 88 | String library_name 89 | String platform_name 90 | String sequencing_center 91 | 92 | # Runtime parameters 93 | Int machine_mem_gb = 7 94 | String docker 95 | } 96 | Int command_mem_gb = machine_mem_gb - 1 97 | command { 98 | echo "FASTQ to uBAM" >&2 99 | echo "fastq_1 ~{fastq_1}" >&2 100 | echo "fastq_2 ~{fastq_2}" >&2 101 | echo "sample_name ~{sample_name}" >&2 102 | echo "readgroup_name ~{readgroup_name}" >&2 103 | 104 | ~{gatk_path} --java-options "-Xmx~{command_mem_gb}g" \ 105 | FastqToSam \ 106 | --FASTQ ~{fastq_1} \ 107 | --FASTQ2 ~{fastq_2} \ 108 | --OUTPUT ~{readgroup_name}.unmapped.bam \ 109 | --READ_GROUP_NAME ~{readgroup_name} \ 110 | --SAMPLE_NAME ~{sample_name} \ 111 | --LIBRARY_NAME ~{library_name} \ 112 | --RUN_DATE ~{run_date} \ 113 | --PLATFORM ~{platform_name} \ 114 | --SEQUENCING_CENTER ~{sequencing_center} 115 | 116 | # Creates a file of file names of the uBAM, which is a text file with each row having the path to the file. 117 | # In this case there will only be one file path in the txt file but this format is used by 118 | # the pre-processing for variant discovery workflow. 119 | 120 | } 121 | runtime { 122 | docker: docker 123 | memory: machine_mem_gb + " GiB" 124 | cpu: 4 125 | } 126 | output { 127 | File output_unmapped_bam = "~{readgroup_name}.unmapped.bam" 128 | } 129 | } 130 | 131 | 132 | -------------------------------------------------------------------------------- /src/workflow/sub-workflows/haplotypecaller-gvcf-gatk4.wdl: -------------------------------------------------------------------------------- 1 | version 1.0 2 | 3 | ## Copyright Broad Institute, 2019 4 | ## 5 | ## The haplotypecaller-gvcf-gatk4 workflow runs the HaplotypeCaller tool 6 | ## from GATK4 in GVCF mode on a single sample according to GATK Best Practices. 7 | ## When executed the workflow scatters the HaplotypeCaller tool over a sample 8 | ## using an intervals list file. The output file produced will be a 9 | ## single gvcf file. 
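## (Note: the workflow body below hard-codes make_gvcf = false, so in this
## solution the merged output from MergeGVCFs is a standard .vcf.gz rather
## than a GVCF, despite the workflow name.)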
10 | ## 11 | ## Requirements/expectations : 12 | ## - One analysis-ready BAM file for a single sample (as identified in RG:SM) 13 | ## - Set of variant calling intervals lists for the scatter, provided in a file 14 | ## 15 | ## Outputs : 16 | ## - One GVCF file and its index 17 | ## 18 | ## 19 | ## LICENSING : 20 | ## This script is released under the WDL source code license (BSD-3) (see LICENSE in 21 | ## https://github.com/broadinstitute/wdl). Note however that the programs it calls may 22 | ## be subject to different licenses. Users are responsible for checking that they are 23 | ## authorized to run all programs before running this script. Please see the dockers 24 | ## for detailed licensing information pertaining to the included programs. 25 | 26 | # WORKFLOW DEFINITION 27 | workflow HaplotypeCallerGvcf_GATK4 { 28 | input { 29 | File input_bam 30 | File input_bam_index 31 | File ref_fasta 32 | File scattered_calling_intervals_archive 33 | File ref_fasta_index 34 | File ref_dict 35 | String gatk_docker 36 | String gotc_docker 37 | } 38 | 39 | 40 | #File ref_fasta="s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta" 41 | 42 | #String ref_base_ext = basename(ref_fasta) 43 | #String ref_base = basename(ref_fasta,".fasta") 44 | #String ref_base_path = sub(ref_fasta,ref_base_ext,"") 45 | 46 | #File ref_fasta_index= ref_base_path + ref_base_ext + ".fai" 47 | #File ref_dict = ref_base_path + ref_base + ".dict" 48 | 49 | #File ref_fasta_index="s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai" 50 | #File scattered_calling_intervals_archive="s3://omics-test-input-bucket/workflow/intervals.tar.gz" 51 | #File ref_dict="s3://broad-references/hg38/v0/Homo_sapiens_assembly38.dict" 52 | 53 | Boolean make_gvcf = false 54 | Boolean make_bamout = false 55 | #String gatk_docker = "022521056385.dkr.ecr.us-east-1.amazonaws.com/gatk:4.1.9.0" 56 | String gatk_path = "/gatk/gatk" 57 | 58 | String sample_basename = basename(input_bam, ".bam") 59 | String vcf_basename = sample_basename 60 | String output_suffix = if make_gvcf then ".g.vcf.gz" else ".vcf.gz" 61 | String output_filename = vcf_basename + output_suffix 62 | 63 | #Array[File] scattered_calling_intervals = read_lines(scattered_calling_intervals_list) 64 | 65 | # Call variants in parallel over grouped calling intervals 66 | 67 | call UnpackIntervals { 68 | input: archive = scattered_calling_intervals_archive, 69 | docker = gotc_docker 70 | } 71 | 72 | scatter (interval_file in UnpackIntervals.interval_files) { 73 | 74 | # Generate GVCF by interval 75 | call HaplotypeCaller { 76 | input: 77 | input_bam = input_bam, 78 | input_bam_index = input_bam_index, 79 | interval_list = interval_file, 80 | output_filename = output_filename, 81 | ref_dict = ref_dict, 82 | ref_fasta = ref_fasta, 83 | ref_fasta_index = ref_fasta_index, 84 | make_gvcf = make_gvcf, 85 | make_bamout = make_bamout, 86 | docker = gatk_docker, 87 | gatk_path = gatk_path 88 | } 89 | } 90 | 91 | # Merge per-interval GVCFs 92 | call MergeGVCFs { 93 | input: 94 | input_vcfs = HaplotypeCaller.output_vcf, 95 | input_vcfs_indexes = HaplotypeCaller.output_vcf_index, 96 | output_filename = output_filename, 97 | docker = gatk_docker, 98 | gatk_path = gatk_path 99 | } 100 | 101 | # Outputs that will be retained when execution is complete 102 | output { 103 | File output_vcf = MergeGVCFs.output_vcf 104 | File output_vcf_index = MergeGVCFs.output_vcf_index 105 | } 106 | } 107 | 108 | # TASK DEFINITIONS 109 | 110 | task UnpackIntervals { 111 | input { 112 | File archive 113 | 
String docker 114 | } 115 | String basestem_input = basename(archive, ".tar") 116 | command { 117 | echo "Unpack Intervals" >&2 118 | tar xvf ~{archive} --directory ./ 119 | } 120 | runtime { 121 | docker: docker 122 | cpu: 2 123 | memory: "2 GiB" 124 | } 125 | output { 126 | Array[File] interval_files = glob("${basestem_input}/*") 127 | } 128 | } 129 | 130 | # HaplotypeCaller per-sample in GVCF mode 131 | task HaplotypeCaller { 132 | input { 133 | # Command parameters 134 | File input_bam 135 | File input_bam_index 136 | File interval_list 137 | String output_filename 138 | File ref_dict 139 | File ref_fasta 140 | File ref_fasta_index 141 | Float? contamination 142 | Boolean make_gvcf 143 | Boolean make_bamout 144 | 145 | String gatk_path 146 | String? java_options 147 | 148 | # Runtime parameters 149 | String docker 150 | Int? mem_gb 151 | } 152 | 153 | String java_opt = select_first([java_options, ""]) 154 | 155 | Int machine_mem_gb = select_first([mem_gb, 16]) 156 | Int command_mem_gb = machine_mem_gb - 2 157 | 158 | String vcf_basename = if make_gvcf then basename(output_filename, ".gvcf") else basename(output_filename, ".vcf") 159 | String bamout_arg = if make_bamout then "-bamout ~{vcf_basename}.bamout.bam" else "" 160 | 161 | parameter_meta { 162 | input_bam: { 163 | description: "a bam file" 164 | } 165 | input_bam_index: { 166 | description: "an index file for the bam input" 167 | } 168 | } 169 | command { 170 | echo HaplotypeCaller >&2 171 | set -euxo pipefail 172 | 173 | ~{gatk_path} --java-options "-Xmx~{command_mem_gb}G ~{java_opt}" \ 174 | HaplotypeCaller \ 175 | -R ~{ref_fasta} \ 176 | -I ~{input_bam} \ 177 | -L ~{interval_list} \ 178 | -O ~{output_filename} \ 179 | -contamination ~{default="0" contamination} \ 180 | -G StandardAnnotation -G StandardHCAnnotation ~{true="-G AS_StandardAnnotation" false="" make_gvcf} \ 181 | -GQB 10 -GQB 20 -GQB 30 -GQB 40 -GQB 50 -GQB 60 -GQB 70 -GQB 80 -GQB 90 \ 182 | ~{true="-ERC GVCF" false="" make_gvcf} \ 183 | ~{bamout_arg} 184 | 185 | touch ~{vcf_basename}.bamout.bam 186 | } 187 | runtime { 188 | docker: docker 189 | memory: machine_mem_gb + " GiB" 190 | cpu: 4 191 | } 192 | output { 193 | File output_vcf = "~{output_filename}" 194 | File output_vcf_index = "~{output_filename}.tbi" 195 | File bamout = "~{vcf_basename}.bamout.bam" 196 | } 197 | } 198 | # Merge GVCFs generated per-interval for the same sample 199 | task MergeGVCFs { 200 | input { 201 | # Command parameters 202 | Array[File] input_vcfs 203 | Array[File] input_vcfs_indexes 204 | String output_filename 205 | 206 | String gatk_path 207 | 208 | # Runtime parameters 209 | String docker 210 | Int? 
mem_gb 211 | } 212 | Int machine_mem_gb = select_first([mem_gb, 8]) 213 | Int command_mem_gb = machine_mem_gb - 2 214 | 215 | command { 216 | echo MergeGVCFs 217 | set -euxo pipefail 218 | 219 | ~{gatk_path} --java-options "-Xmx~{command_mem_gb}G" \ 220 | MergeVcfs \ 221 | --INPUT ~{sep=' --INPUT ' input_vcfs} \ 222 | --OUTPUT ~{output_filename} 223 | } 224 | runtime { 225 | docker: docker 226 | memory: machine_mem_gb + " GB" 227 | cpu: 2 228 | } 229 | output { 230 | File output_vcf = "~{output_filename}" 231 | File output_vcf_index = "~{output_filename}.tbi" 232 | } 233 | } 234 | -------------------------------------------------------------------------------- /src/workflow/sub-workflows/processing-for-variant-discovery-gatk4.wdl: -------------------------------------------------------------------------------- 1 | version 1.0 2 | 3 | ## Copyright Broad Institute, 2021 4 | ## 5 | ## This WDL pipeline implements data pre-processing according to the GATK Best Practices. 6 | ## 7 | ## Requirements/expectations : 8 | ## - Pair-end sequencing data in unmapped BAM (uBAM) format 9 | ## - One or more read groups, one per uBAM file, all belonging to a single sample (SM) 10 | ## - Input uBAM files must additionally comply with the following requirements: 11 | ## - - filenames all have the same suffix (we use ".unmapped.bam") 12 | ## - - files must pass validation by ValidateSamFile 13 | ## - - reads are provided in query-sorted order 14 | ## - - all reads must have an RG tag 15 | ## 16 | ## Output : 17 | ## - A clean BAM file and its index, suitable for variant discovery analyses. 18 | ## 19 | ## Cromwell version support 20 | ## - Successfully tested on v59 21 | ## 22 | ## Runtime parameters are optimized for Broad's Google Cloud Platform implementation. 23 | ## 24 | ## LICENSING : 25 | ## This script is released under the WDL source code license (BSD-3) (see LICENSE in 26 | ## https://github.com/broadinstitute/wdl). Note however that the programs it calls may 27 | ## be subject to different licenses. Users are responsible for checking that they are 28 | ## authorized to run all programs before running this script. Please see the dockers 29 | ## for detailed licensing information pertaining to the included programs. 
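##
## Note: this copy hard-codes ref_name = "hg38" and a bwa_commandline that
## requests 14 threads; adjust those values in the workflow body if the
## reference or instance sizing differs.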
30 | 31 | # WORKFLOW DEFINITION 32 | workflow PreProcessingForVariantDiscovery_GATK4 { 33 | input { 34 | String sample_name 35 | 36 | File unmapped_bam 37 | String unmapped_bam_suffix 38 | File ref_fasta 39 | File ref_fasta_index 40 | File ref_dict 41 | File ref_alt 42 | File ref_ann 43 | File ref_bwt 44 | File ref_pac 45 | File ref_amb 46 | File ref_sa 47 | File dbSNP_vcf 48 | File dbSNP_vcf_index 49 | File known_indels_vcf 50 | File known_indels_vcf_index 51 | File Mills_1000G_indels_vcf 52 | File Mills_1000G_indels_vcf_index 53 | String gatk_docker 54 | String gotc_docker 55 | 56 | } 57 | 58 | String ref_name = "hg38" 59 | #File ref_fasta = "s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta" 60 | 61 | #String ref_base_ext = basename(ref_fasta) 62 | #String ref_base = basename(ref_fasta,".fasta") 63 | #String ref_base_path = sub(ref_fasta,ref_base_ext,"") 64 | #String dbSNP_vcf_index_path = sub(dbSNP_vcf,".vcf",".vcf.idx") 65 | #String known_indels_vcf_path = sub(known_indels_vcf,".vcf.gz",".vcf.gz.tbi") 66 | #String Mills_1000G_indels_vcf_path = sub(Mills_1000G_indels_vcf,".vcf.gz",".vcf.gz.tbi") 67 | 68 | #File ref_fasta_index = ref_base_path + ref_base_ext + ".fai" 69 | #File ref_dict = ref_base_path + ref_base + ".dict" 70 | #File ref_alt = ref_base_path + ref_base_ext + ".64.alt" 71 | #File ref_sa = ref_base_path + ref_base_ext + ".64.sa" 72 | #File ref_ann = ref_base_path + ref_base_ext + ".64.ann" 73 | #File ref_bwt = ref_base_path + ref_base_ext + ".64.bwt" 74 | #File ref_pac = ref_base_path + ref_base_ext + ".64.pac" 75 | # File ref_amb = ref_base_path + ref_base_ext + ".64.amb" 76 | 77 | #File dbSNP_vcf_index = dbSNP_vcf_index_path 78 | #File known_indels_vcf_index = known_indels_vcf_path 79 | #File Mills_1000G_indels_vcf_index = Mills_1000G_indels_vcf_path 80 | 81 | #File ref_fasta_index = "s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai" 82 | #File ref_dict = "s3://broad-references/hg38/v0/Homo_sapiens_assembly38.dict" 83 | #File ref_alt = "s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.alt" 84 | #File ref_sa = "s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.sa" 85 | #File ref_ann = "s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.ann" 86 | #File ref_bwt = "s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.bwt" 87 | #File ref_pac = "s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.pac" 88 | #File ref_amb = "s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.64.amb" 89 | #File dbSNP_vcf = "s3://broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf" 90 | #File dbSNP_vcf_index = "s3://broad-references/hg38/v0/Homo_sapiens_assembly38.dbsnp138.vcf.idx" 91 | #Array[File] known_indels_sites_VCFs = [ 92 | # "s3://broad-references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz", 93 | # "s3://broad-references/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz" 94 | # ] 95 | #Array[File] known_indels_sites_indices = [ 96 | # "s3://broad-references/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi", 97 | # "s3://broad-references/hg38/v0/Homo_sapiens_assembly38.known_indels.vcf.gz.tbi" 98 | # ] 99 | 100 | 101 | String bwa_commandline = "bwa mem -K 100000000 -p -v 3 -t 14 -Y $bash_ref_fasta" 102 | Int compression_level = 6 103 | 104 | #String gatk_docker = "022521056385.dkr.ecr.us-east-1.amazonaws.com/gatk:4.1.9.0" 105 | String gatk_path = "/gatk/gatk" 106 | #String gotc_docker = 
"022521056385.dkr.ecr.us-east-1.amazonaws.com/genomes-in-the-cloud:2.4.7-1603303710" 107 | String gotc_path = "/usr/gitc/" 108 | # Amazon linux has python installed 109 | 110 | String base_file_name = sample_name + "." + ref_name 111 | 112 | #Array[File] flowcell_unmapped_bams = read_lines(flowcell_unmapped_bams_list) 113 | 114 | #File unmapped_bam = unmapped_bam 115 | 116 | # Get the version of BWA to include in the PG record in the header of the BAM produced 117 | # by MergeBamAlignment. 118 | call GetBwaVersion { 119 | input: 120 | docker_image = gotc_docker, 121 | bwa_path = gotc_path, 122 | } 123 | 124 | # Align flowcell-level unmapped input bams in parallel 125 | 126 | # Get the basename, i.e. strip the filepath and the extension 127 | String bam_basename = basename(unmapped_bam, unmapped_bam_suffix) 128 | 129 | # Map reads to reference 130 | call SamToFastqAndBwaMem { 131 | input: 132 | input_bam = unmapped_bam, 133 | bwa_commandline = bwa_commandline, 134 | output_bam_basename = bam_basename + ".unmerged", 135 | ref_fasta = ref_fasta, 136 | ref_fasta_index = ref_fasta_index, 137 | ref_dict = ref_dict, 138 | ref_alt = ref_alt, 139 | ref_sa = ref_sa, 140 | ref_ann = ref_ann, 141 | ref_bwt = ref_bwt, 142 | ref_pac = ref_pac, 143 | ref_amb = ref_amb, 144 | docker_image = gotc_docker, 145 | bwa_path = gotc_path, 146 | gotc_path = gotc_path, 147 | compression_level = compression_level 148 | } 149 | 150 | # Merge original uBAM and BWA-aligned BAM 151 | call MergeBamAlignment { 152 | input: 153 | unmapped_bam = unmapped_bam, 154 | bwa_commandline = bwa_commandline, 155 | bwa_version = GetBwaVersion.version, 156 | aligned_bam = SamToFastqAndBwaMem.output_bam, 157 | output_bam_basename = bam_basename + ".aligned.unsorted", 158 | ref_fasta = ref_fasta, 159 | ref_fasta_index = ref_fasta_index, 160 | ref_dict = ref_dict, 161 | docker_image = gatk_docker, 162 | gatk_path = gatk_path, 163 | compression_level = compression_level 164 | } 165 | 166 | # Aggregate aligned+merged flowcell BAM files and mark duplicates 167 | # We take advantage of the tool's ability to take multiple BAM inputs and write out a single output 168 | # to avoid having to spend time just merging BAM files. 
169 | call MarkDuplicates { 170 | input: 171 | input_bams = MergeBamAlignment.output_bam, 172 | output_bam_basename = base_file_name + ".aligned.unsorted.duplicates_marked", 173 | metrics_filename = base_file_name + ".duplicate_metrics", 174 | docker_image = gatk_docker, 175 | gatk_path = gatk_path, 176 | compression_level = compression_level, 177 | } 178 | 179 | # Sort aggregated+deduped BAM file and fix tags 180 | call SortAndFixTags { 181 | input: 182 | input_bam = MarkDuplicates.output_bam, 183 | output_bam_basename = base_file_name + ".aligned.duplicate_marked.sorted", 184 | ref_dict = ref_dict, 185 | ref_fasta = ref_fasta, 186 | ref_fasta_index = ref_fasta_index, 187 | docker_image = gatk_docker, 188 | gatk_path = gatk_path, 189 | compression_level = compression_level 190 | } 191 | 192 | # Create list of sequences for scatter-gather parallelization 193 | call CreateSequenceGroupingTSV { 194 | input: 195 | ref_dict = ref_dict, 196 | docker_image = gotc_docker, 197 | } 198 | 199 | # Perform Base Quality Score Recalibration (BQSR) on the sorted BAM in parallel 200 | scatter (subgroup in CreateSequenceGroupingTSV.sequence_grouping) { 201 | # Generate the recalibration model by interval 202 | call BaseRecalibrator { 203 | input: 204 | input_bam = SortAndFixTags.output_bam, 205 | input_bam_index = SortAndFixTags.output_bam_index, 206 | recalibration_report_filename = base_file_name + ".recal_data.csv", 207 | sequence_group_interval = subgroup, 208 | dbSNP_vcf = dbSNP_vcf, 209 | dbSNP_vcf_index = dbSNP_vcf_index, 210 | known_indels_vcf = known_indels_vcf, 211 | known_indels_vcf_index = known_indels_vcf_index, 212 | Mills_1000G_indels_vcf = Mills_1000G_indels_vcf, 213 | Mills_1000G_indels_vcf_index = Mills_1000G_indels_vcf_index, 214 | ref_dict = ref_dict, 215 | ref_fasta = ref_fasta, 216 | ref_fasta_index = ref_fasta_index, 217 | docker_image = gatk_docker, 218 | gatk_path = gatk_path, 219 | } 220 | } 221 | 222 | # Merge the recalibration reports resulting from by-interval recalibration 223 | call GatherBqsrReports { 224 | input: 225 | input_bqsr_reports = BaseRecalibrator.recalibration_report, 226 | output_report_filename = base_file_name + ".recal_data.csv", 227 | docker_image = gatk_docker, 228 | gatk_path = gatk_path, 229 | } 230 | 231 | scatter (subgroup in CreateSequenceGroupingTSV.sequence_grouping_with_unmapped) { 232 | 233 | # Apply the recalibration model by interval 234 | call ApplyBQSR { 235 | input: 236 | input_bam = SortAndFixTags.output_bam, 237 | input_bam_index = SortAndFixTags.output_bam_index, 238 | output_bam_basename = base_file_name + ".aligned.duplicates_marked.recalibrated", 239 | recalibration_report = GatherBqsrReports.output_bqsr_report, 240 | sequence_group_interval = subgroup, 241 | ref_dict = ref_dict, 242 | ref_fasta = ref_fasta, 243 | ref_fasta_index = ref_fasta_index, 244 | docker_image = gatk_docker, 245 | gatk_path = gatk_path, 246 | } 247 | } 248 | 249 | # Merge the recalibrated BAM files resulting from by-interval recalibration 250 | call GatherBamFiles { 251 | input: 252 | input_bams = ApplyBQSR.recalibrated_bam, 253 | output_bam_basename = base_file_name, 254 | docker_image = gatk_docker, 255 | gatk_path = gatk_path, 256 | compression_level = compression_level 257 | } 258 | 259 | # Outputs that will be retained when execution is complete 260 | output { 261 | File duplication_metrics = MarkDuplicates.duplicate_metrics 262 | File bqsr_report = GatherBqsrReports.output_bqsr_report 263 | File analysis_ready_bam = GatherBamFiles.output_bam 264 | File 
analysis_ready_bam_index = GatherBamFiles.output_bam_index 265 | File analysis_ready_bam_md5 = GatherBamFiles.output_bam_md5 266 | } 267 | } 268 | 269 | # TASK DEFINITIONS 270 | 271 | # Get version of BWA 272 | task GetBwaVersion { 273 | input { 274 | Float mem_size_gb = 1 275 | String docker_image 276 | String bwa_path 277 | } 278 | 279 | command { 280 | echo GetBwaVersion >&2 281 | 282 | # Not setting "set -o pipefail" here because /bwa has a rc=1 and we don't want to allow rc=1 to succeed 283 | # because the sed may also fail with that error and that is something we actually want to fail on. 284 | 285 | set -ux 286 | 287 | ~{bwa_path}bwa 2>&1 | \ 288 | grep -e '^Version' | \ 289 | sed 's/Version: //' 290 | } 291 | runtime { 292 | docker: docker_image 293 | memory: "~{mem_size_gb} GiB" 294 | } 295 | output { 296 | String version = read_string(stdout()) 297 | } 298 | } 299 | 300 | # Read unmapped BAM, convert on-the-fly to FASTQ and stream to BWA MEM for alignment 301 | task SamToFastqAndBwaMem { 302 | # This is the .alt file from bwa-kit (https://github.com/lh3/bwa/tree/master/bwakit), 303 | # listing the reference contigs that are "alternative". Leave blank in JSON for legacy 304 | # references such as b37 and hg19. 305 | input { 306 | File input_bam 307 | String bwa_commandline 308 | String output_bam_basename 309 | File ref_fasta 310 | File ref_fasta_index 311 | File ref_dict 312 | File? ref_alt 313 | File ref_amb 314 | File ref_ann 315 | File ref_bwt 316 | File ref_pac 317 | File ref_sa 318 | 319 | Float mem_size_gb = 56 320 | Int num_cpu = 32 321 | 322 | Int compression_level 323 | 324 | String docker_image 325 | String bwa_path 326 | String gotc_path 327 | } 328 | 329 | command { 330 | set -euo pipefail 331 | 332 | # set the bash variable needed for the command-line 333 | bash_ref_fasta=~{ref_fasta} 334 | 335 | java -Xmx8G -jar ~{gotc_path}picard.jar \ 336 | SamToFastq \ 337 | INPUT=~{input_bam} \ 338 | FASTQ=/dev/stdout \ 339 | INTERLEAVE=true \ 340 | NON_PF=true \ 341 | | \ 342 | ~{bwa_path}~{bwa_commandline} /dev/stdin - \ 343 | | \ 344 | samtools view -1 - > ~{output_bam_basename}.bam 345 | } 346 | runtime { 347 | docker: docker_image 348 | memory: "~{mem_size_gb} GiB" 349 | cpu: num_cpu 350 | } 351 | output { 352 | File output_bam = "~{output_bam_basename}.bam" 353 | } 354 | } 355 | 356 | # Merge original input uBAM file with BWA-aligned BAM file 357 | task MergeBamAlignment { 358 | input { 359 | File unmapped_bam 360 | String bwa_commandline 361 | String bwa_version 362 | File aligned_bam 363 | String output_bam_basename 364 | File ref_fasta 365 | File ref_fasta_index 366 | File ref_dict 367 | 368 | Int compression_level 369 | Int mem_size_gb = 32 370 | 371 | String docker_image 372 | String gatk_path 373 | } 374 | 375 | Int command_mem_gb = ceil(mem_size_gb) - 1 376 | 377 | command { 378 | echo MergeBamAlignment >&2 379 | 380 | set -euxo pipefail 381 | 382 | # set the bash variable needed for the command-line 383 | bash_ref_fasta=~{ref_fasta} 384 | ~{gatk_path} --java-options "-Dsamjdk.compression_level=~{compression_level} -Xmx~{command_mem_gb}G" \ 385 | MergeBamAlignment \ 386 | --VALIDATION_STRINGENCY SILENT \ 387 | --EXPECTED_ORIENTATIONS FR \ 388 | --ATTRIBUTES_TO_RETAIN X0 \ 389 | --ALIGNED_BAM ~{aligned_bam} \ 390 | --UNMAPPED_BAM ~{unmapped_bam} \ 391 | --OUTPUT ~{output_bam_basename}.bam \ 392 | --REFERENCE_SEQUENCE ~{ref_fasta} \ 393 | --PAIRED_RUN true \ 394 | --SORT_ORDER "unsorted" \ 395 | --IS_BISULFITE_SEQUENCE false \ 396 | --ALIGNED_READS_ONLY false \ 397 | 
398 |       --ADD_MATE_CIGAR true \
399 |       --MAX_INSERTIONS_OR_DELETIONS -1 \
400 |       --PRIMARY_ALIGNMENT_STRATEGY MostDistant \
401 |       --PROGRAM_RECORD_ID "bwamem" \
402 |       --PROGRAM_GROUP_VERSION "~{bwa_version}" \
403 |       --PROGRAM_GROUP_COMMAND_LINE "~{bwa_commandline}" \
404 |       --PROGRAM_GROUP_NAME "bwamem" \
405 |       --UNMAPPED_READ_STRATEGY COPY_TO_TAG \
406 |       --ALIGNER_PROPER_PAIR_FLAGS true \
407 |       --UNMAP_CONTAMINANT_READS true
408 |   }
409 |   runtime {
410 |     docker: docker_image
411 |     memory: "~{mem_size_gb} GiB"
412 |     cpu: 2
413 |   }
414 |   output {
415 |     File output_bam = "~{output_bam_basename}.bam"
416 |   }
417 | }
418 | 
419 | # Sort BAM file by coordinate order and fix tag values for NM and UQ
420 | task SortAndFixTags {
421 |   input {
422 |     File input_bam
423 |     String output_bam_basename
424 |     File ref_dict
425 |     File ref_fasta
426 |     File ref_fasta_index
427 | 
428 |     Int compression_level
429 |     Float mem_size_gb = 32
430 | 
431 |     String docker_image
432 |     String gatk_path
433 |   }
434 | 
435 |   command {
436 |     echo SortAndFixTags >&2
437 | 
438 |     set -euxo pipefail
439 | 
440 |     ~{gatk_path} --java-options "-Xmx32G -Xms32G" \
441 |       SortSam \
442 |       --INPUT ~{input_bam} \
443 |       --OUTPUT /dev/stdout \
444 |       --SORT_ORDER "coordinate" \
445 |       --CREATE_INDEX false \
446 |       --CREATE_MD5_FILE false \
447 |     | \
448 |     ~{gatk_path} --java-options "-Dsamjdk.compression_level=6 -Xmx18G -Xms18G" \
449 |       SetNmMdAndUqTags \
450 |       --INPUT /dev/stdin \
451 |       --OUTPUT ~{output_bam_basename}.bam \
452 |       --CREATE_INDEX true \
453 |       --CREATE_MD5_FILE false \
454 |       --REFERENCE_SEQUENCE ~{ref_fasta}
455 | 
456 |   }
457 |   runtime {
458 |     docker: docker_image
459 |     memory: "~{mem_size_gb} GiB"
460 |     cpu: 8
461 |   }
462 |   output {
463 |     File output_bam = "~{output_bam_basename}.bam"
464 |     File output_bam_index = "~{output_bam_basename}.bai"
465 |   }
466 | }
467 | 
468 | # Mark duplicate reads to avoid counting non-independent observations
469 | task MarkDuplicates {
470 |   input {
471 |     File input_bams
472 |     String output_bam_basename
473 |     String metrics_filename
474 | 
475 |     Int compression_level
476 |     Float mem_size_gb = 32
477 | 
478 |     String docker_image
479 |     String gatk_path
480 |   }
481 | 
482 |   Int xmx = ceil(mem_size_gb) - 8
483 |   # Task is assuming query-sorted input so that the Secondary and Supplementary reads get marked correctly.
484 |   # This works because the output of BWA is query-grouped and therefore, so is the output of MergeBamAlignment.
485 |   # While query-grouped isn't actually query-sorted, it's good enough for MarkDuplicates with ASSUME_SORT_ORDER="queryname"
486 |   command {
487 |     echo MarkDuplicates >&2
488 | 
489 |     set -euxo pipefail
490 | 
491 |     ~{gatk_path} --java-options "-Dsamjdk.compression_level=~{compression_level} -Xms~{xmx}G -Xmx~{xmx}G" \
492 |       MarkDuplicates \
493 |       --INPUT ~{input_bams} \
494 |       --OUTPUT ~{output_bam_basename}.bam \
495 |       --METRICS_FILE ~{metrics_filename} \
496 |       --VALIDATION_STRINGENCY SILENT \
497 |       --OPTICAL_DUPLICATE_PIXEL_DISTANCE 2500 \
498 |       --ASSUME_SORT_ORDER "queryname" \
499 |       --CREATE_MD5_FILE false
500 |   }
501 |   runtime {
502 |     docker: docker_image
503 |     memory: "~{mem_size_gb} GiB"
504 |     cpu: 4
505 |   }
506 |   output {
507 |     File output_bam = "~{output_bam_basename}.bam"
508 |     File duplicate_metrics = "~{metrics_filename}"
509 |   }
510 | }
511 | 
512 | # Generate sets of intervals for scatter-gathering over chromosomes
513 | task CreateSequenceGroupingTSV {
514 |   input {
515 |     File ref_dict
516 |     Float mem_size_gb = 2
517 |     String docker_image
518 |   }
519 |   # Use Python to create the sequence groupings used to scatter the BaseRecalibrator and ApplyBQSR steps.
520 |   # It writes sequence_grouping.txt and sequence_grouping_with_unmapped.txt, which are parsed into WDL Array[Array[String]] outputs,
521 |   # e.g. [["1"], ["2"], ["3", "4"], ["5"], ["6", "7", "8"]]
522 |   command <<<
523 | 
524 |     echo CreateSequenceGroupingTSV >&2
525 | 
526 |     python <
          [... Python heredoc body not captured in this listing; the line numbering resumes at 563 ...]
          >>>
563 |   runtime {
564 |     docker: docker_image
565 |     memory: "~{mem_size_gb} GiB"
566 |     cpu: 2
567 |   }
568 |   output {
569 |     Array[Array[String]] sequence_grouping = read_tsv("sequence_grouping.txt")
570 |     Array[Array[String]] sequence_grouping_with_unmapped = read_tsv("sequence_grouping_with_unmapped.txt")
571 |   }
572 | }
573 | 
574 | # Generate Base Quality Score Recalibration (BQSR) model
575 | task BaseRecalibrator {
576 |   input {
577 |     File input_bam
578 |     File input_bam_index
579 |     String recalibration_report_filename
580 |     Array[String] sequence_group_interval
581 |     File dbSNP_vcf
582 |     File dbSNP_vcf_index
583 |     File known_indels_vcf
584 |     File known_indels_vcf_index
585 |     File Mills_1000G_indels_vcf
586 |     File Mills_1000G_indels_vcf_index
587 |     #Array[File] known_indels_sites_indices
588 |     File ref_dict
589 |     File ref_fasta
590 |     File ref_fasta_index
591 | 
592 |     Float mem_size_gb = 30
593 | 
594 |     String docker_image
595 |     String gatk_path
596 |   }
597 | 
598 |   Int xmx = ceil(mem_size_gb) - 2
599 | 
600 |   command {
601 |     echo BaseRecalibrator >&2
602 |     set -euxo pipefail
603 | 
604 |     ~{gatk_path} --java-options "-Xmx~{xmx}G" \
605 |       BaseRecalibrator \
606 |       -R ~{ref_fasta} \
607 |       -I ~{input_bam} \
608 |       --use-original-qualities \
609 |       -O ~{recalibration_report_filename} \
610 |       --known-sites ~{dbSNP_vcf} \
611 |       --known-sites ~{known_indels_vcf} \
612 |       --known-sites ~{Mills_1000G_indels_vcf} \
613 |       -L ~{sep=" -L " sequence_group_interval}
614 |   }
615 |   runtime {
616 |     docker: docker_image
617 |     memory: "~{mem_size_gb} GiB"
618 |     cpu: 2
619 |   }
620 |   output {
621 |     File recalibration_report = "~{recalibration_report_filename}"
622 |   }
623 | }
624 | 
625 | # Combine multiple recalibration tables from scattered BaseRecalibrator runs
626 | task GatherBqsrReports {
627 |   input {
628 |     Array[File] input_bqsr_reports
629 |     String output_report_filename
630 | 
631 |     Float mem_size_gb = 8
632 | 
633 |     String docker_image
634 |     String gatk_path
635 |   }
636 | 
637 |   Int xmx = ceil(mem_size_gb) - 2
638 | 
639 |   command {
640 |     echo GatherBqsrReports
641 |     set -euxo pipefail
642 | 
643 |     ~{gatk_path} --java-options "-Xmx~{xmx}G" \
644 |       GatherBQSRReports \
645 |       -I ~{sep=' -I ' input_bqsr_reports} \
646 |       -O ~{output_report_filename}
647 |   }
648 |   runtime {
649 |     docker: docker_image
650 |     memory: "~{mem_size_gb} GiB"
651 |     cpu: 2
652 |   }
653 |   output {
654 |     File output_bqsr_report = "~{output_report_filename}"
655 |   }
656 | }
657 | 
658 | # Apply Base Quality Score Recalibration (BQSR) model
659 | task ApplyBQSR {
660 |   input {
661 |     File input_bam
662 |     File input_bam_index
663 |     String output_bam_basename
664 |     File recalibration_report
665 |     Array[String] sequence_group_interval
666 |     File ref_dict
667 |     File ref_fasta
668 |     File ref_fasta_index
669 | 
670 |     Float mem_size_gb = 8
671 | 
672 |     String docker_image
673 |     String gatk_path
674 |   }
675 | 
676 |   Int xmx = ceil(mem_size_gb) - 2
677 |   command {
678 |     echo ApplyBQSR
679 |     set -euxo pipefail
680 | 
681 |     ~{gatk_path} --java-options "-Dsamjdk.compression_level=6 -Xmx~{xmx}G" \
682 |       ApplyBQSR \
683 |       -R ~{ref_fasta} \
684 |       -I ~{input_bam} \
685 |       -O ~{output_bam_basename}.bam \
686 |       -L ~{sep=" -L " sequence_group_interval} \
687 |       -bqsr ~{recalibration_report} \
688 |       --static-quantized-quals 10 --static-quantized-quals 20 --static-quantized-quals 30 \
689 |       --add-output-sam-program-record \
690 |       --create-output-bam-md5 \
691 |       --use-original-qualities
692 |   }
693 |   runtime {
694 |     docker: docker_image
695 |     memory: "~{mem_size_gb} GiB"
696 |     cpu: 2
697 |   }
698 |   output {
699 |     File recalibrated_bam = "~{output_bam_basename}.bam"
700 |   }
701 | }
702 | 
703 | # Combine multiple recalibrated BAM files from scattered ApplyRecalibration runs
704 | task GatherBamFiles {
705 |   input {
706 |     Array[File] input_bams
707 |     String output_bam_basename
708 | 
709 |     Int compression_level
710 |     Float mem_size_gb = 4
711 | 
712 |     String docker_image
713 |     String gatk_path
714 |   }
715 | 
716 |   Int xmx = ceil(mem_size_gb) - 2
717 | 
718 |   command {
719 |     echo GatherBamFiles
720 |     set -euxo pipefail
721 | 
722 |     ~{gatk_path} --java-options "-Dsamjdk.compression_level=~{compression_level} -Xmx~{xmx}G" \
723 |       GatherBamFiles \
724 |       --INPUT ~{sep=' --INPUT ' input_bams} \
725 |       --OUTPUT ~{output_bam_basename}.bam \
726 |       --CREATE_INDEX true \
727 |       --CREATE_MD5_FILE true
728 |   }
729 |   runtime {
730 |     docker: docker_image
731 |     memory: "~{mem_size_gb} GiB"
732 |     cpu: 2
733 |   }
734 |   output {
735 |     File output_bam = "~{output_bam_basename}.bam"
736 |     File output_bam_index = "~{output_bam_basename}.bai"
737 |     File output_bam_md5 = "~{output_bam_basename}.bam.md5"
738 |   }
739 | }
740 | 
--------------------------------------------------------------------------------
/static/arch_diagram.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-omics-end-to-end-genomics/d5898660bc4a7d505f0556201208932df3fe947b/static/arch_diagram.png
--------------------------------------------------------------------------------
/static/stepfunctions.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-omics-end-to-end-genomics/d5898660bc4a7d505f0556201208932df3fe947b/static/stepfunctions.png
--------------------------------------------------------------------------------
/static/stepfunctions_graph_workflowstudio.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-omics-end-to-end-genomics/d5898660bc4a7d505f0556201208932df3fe947b/static/stepfunctions_graph_workflowstudio.png
--------------------------------------------------------------------------------
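
Note on the CreateSequenceGroupingTSV task in processing-for-variant-discovery-gatk4.wdl above: the body of its Python heredoc is not captured in this listing (the line numbering jumps from 526 to 563). The sketch below is an illustration only, not the repository's actual script. It shows one way to implement what the task's comments describe: read contig names and lengths from the @SQ lines of the reference sequence dictionary, greedily bin them into groups whose combined length stays at or below the longest contig, and write the groupings as TSV, once as-is and once with a trailing "unmapped" group. The output file names match the task's declared outputs; the command-line invocation is an assumption.

# Illustrative sketch only; not the workflow's actual heredoc script.
import sys

def read_contigs(ref_dict_path):
    """Return (contig_name, contig_length) pairs from the @SQ lines of a .dict file."""
    contigs = []
    with open(ref_dict_path) as handle:
        for line in handle:
            if line.startswith("@SQ"):
                # @SQ lines look like: @SQ\tSN:chr1\tLN:248956422\t...
                fields = dict(
                    field.split(":", 1)
                    for field in line.rstrip("\n").split("\t")[1:]
                    if ":" in field
                )
                contigs.append((fields["SN"], int(fields["LN"])))
    return contigs

def group_contigs(contigs):
    """Greedily bin contigs so each group's total length stays at or below the longest contig."""
    longest = max(length for _, length in contigs)
    groups, current, current_size = [], [], 0
    for name, length in contigs:
        if current and current_size + length > longest:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += length
    if current:
        groups.append(current)
    return groups

if __name__ == "__main__":
    ref_dict = sys.argv[1]  # path to the reference .dict file (hypothetical invocation)
    groups = group_contigs(read_contigs(ref_dict))
    # One grouping file for the BaseRecalibrator scatter, and one with an extra
    # "unmapped" group for the ApplyBQSR scatter, matching the task's outputs.
    with open("sequence_grouping.txt", "w") as out:
        out.write("\n".join("\t".join(group) for group in groups) + "\n")
    with open("sequence_grouping_with_unmapped.txt", "w") as out:
        out.write("\n".join("\t".join(group) for group in groups) + "\nunmapped\n")

For example, given contigs of lengths 100, 60, 50 and 40, this grouping would produce [contig1], [contig2] and [contig3, contig4]: 50 + 40 fits under the longest contig length, while 60 + 50 does not.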