├── .gitignore
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── container
│   ├── inference
│   │   ├── Dockerfile
│   │   ├── inference.py
│   │   ├── nginx.conf
│   │   ├── serve
│   │   └── wsgi.py
│   └── training
│       ├── Dockerfile
│       ├── changehostname.c
│       ├── requirements.txt
│       └── start_with_right_hostname.sh
├── deploy-ESM-embeddings-server.ipynb
├── img
│   ├── 1-setup.png
│   ├── 2-api-key.png
│   ├── 3-generate.png
│   ├── 4-sm.png
│   ├── 5-secret-type.png
│   └── bionemo-sm-arch.png
├── src
│   ├── esm1nv-training.yaml
│   └── train.py
└── train-ESM.ipynb

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | .venv
3 | .scratch
4 | 

--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | ## Code of Conduct
2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
4 | opensource-codeofconduct@amazon.com with any additional questions or comments.
5 | 

--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing Guidelines
2 | 
3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional
4 | documentation, we greatly value feedback and contributions from our community.
5 | 
6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary
7 | information to effectively respond to your bug report or contribution.
8 | 
9 | 
10 | ## Reporting Bugs/Feature Requests
11 | 
12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features.
13 | 
14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already
15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful:
16 | 
17 | * A reproducible test case or series of steps
18 | * The version of our code being used
19 | * Any modifications you've made relevant to the bug
20 | * Anything unusual about your environment or deployment
21 | 
22 | 
23 | ## Contributing via Pull Requests
24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that:
25 | 
26 | 1. You are working against the latest source on the *main* branch.
27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already.
28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted.
29 | 
30 | To send us a pull request, please:
31 | 
32 | 1. Fork the repository.
33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change.
34 | 3. Ensure local tests pass.
35 | 4. Commit to your fork using clear commit messages.
36 | 5. Send us a pull request, answering any default questions in the pull request interface.
37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.
38 | 
39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and
40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/).
41 | 
42 | 
43 | ## Finding contributions to work on
44 | Looking at the existing issues is a great way to find something to contribute to. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start.
45 | 
46 | 
47 | ## Code of Conduct
48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
50 | opensource-codeofconduct@amazon.com with any additional questions or comments.
51 | 
52 | 
53 | ## Security issue notifications
54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue.
55 | 
56 | 
57 | ## Licensing
58 | 
59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT No Attribution
2 | 
3 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy of
6 | this software and associated documentation files (the "Software"), to deal in
7 | the Software without restriction, including without limitation the rights to
8 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
9 | the Software, and to permit persons to whom the Software is furnished to do so.
10 | 
11 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
12 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
13 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
14 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
15 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
16 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
17 | 
18 | 

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Amazon SageMaker with NVIDIA BioNeMo
2 | 
3 | ## Description
4 | 
5 | Code examples for running NVIDIA BioNeMo inference and training on Amazon SageMaker.
6 | 
7 | ## Introduction
8 | 
9 | Proteins are complex biomolecules that carry out most of the essential functions in cells, from metabolism to cellular signaling and structure. A deep understanding of protein structure and function is critical for advancing fields like personalized medicine, biomanufacturing, and synthetic biology.
10 | 
11 | Recent advances in natural language processing (NLP) have enabled breakthroughs in computational biology through the development of protein language models (pLMs). Similar to how word tokens are the building blocks of sentences in NLP models, amino acids are the building blocks that make up protein sequences. When exposed to millions of protein sequences during training, pLMs develop attention patterns that represent the evolutionary relationships between amino acids. This learned representation of primary sequence can then be fine-tuned to predict protein properties and higher-order structure.
12 | 
13 | At re:Invent 2023, NVIDIA announced that its BioNeMo generative AI platform for drug discovery is now available on AWS services including Amazon SageMaker, AWS ParallelCluster, and the upcoming NVIDIA DGX Cloud on AWS. BioNeMo provides pre-trained large language models, data loaders, and optimized training frameworks to help speed up target identification, protein structure prediction, and drug candidate screening in the drug discovery process. Researchers and developers at pharmaceutical and biotech companies that use AWS will be able to leverage BioNeMo and AWS's scalable GPU cloud computing capabilities to rapidly build and train generative AI models on biomolecular data. Several biotech companies and startups are already using BioNeMo for AI-accelerated drug discovery, and this announcement will enable them to easily scale up resources as needed.
14 | 
15 | This repository contains examples of how to use the [BioNeMo framework container](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/containers/bionemo-framework) on Amazon SageMaker.
16 | 
17 | ## Architecture
18 | 
19 | ![Solution architecture for running BioNeMo training and inference on Amazon SageMaker](img/bionemo-sm-arch.png)
20 | 
21 | ## Configuration
22 | 
23 | BioNeMo uses [Hydra](https://hydra.cc/docs/intro/) to manage the training job parameters. These are stored in .yaml files and passed to the training job at runtime. For this reason, you do not need to pass many hyperparameters to your SageMaker Estimator - sometimes only the name of your configuration file! You can find a basic Hydra tutorial [here](https://hydra.cc/docs/tutorials/basic/your_first_app/simple_cli/).
24 | 
25 | ## Setup
26 | 
27 | Before you create a BioNeMo training job, follow these steps to generate some NGC API credentials and store them in AWS Secrets Manager.
28 | 
29 | 1. Sign in or create a new account at NVIDIA [NGC](https://ngc.nvidia.com/signin).
30 | 2. Select your name in the top-right corner of the screen and then "Setup".
31 | 
32 | ![Select Setup from the top-right menu](img/1-setup.png)
33 | 
34 | 3. Select "Generate API Key".
35 | 
36 | ![Select Generate API Key](img/2-api-key.png)
37 | 
38 | 4. Select the green "+ Generate API Key" button and confirm.
39 | 
40 | ![Select green Generate API Key button](img/3-generate.png)
41 | 
42 | 5. Copy the API key - this is the last time you can retrieve it!
43 | 
44 | 6. Before you leave the NVIDIA NGC site, also take note of your organization ID listed under your name in the top-right corner of the screen. You'll need this, plus your API key, to download BioNeMo artifacts.
45 | 
46 | 7. Navigate to the AWS Console and then to AWS Secrets Manager.
47 | 
48 | ![Navigate to AWS Secrets Manager](img/4-sm.png)
49 | 
50 | 8. Select "Store a new secret".
51 | 9. Under "Secret type" select "Other type of secret".
52 | 
53 | ![Select other type of secret](img/5-secret-type.png)
54 | 
55 | 10. Under "Key/value" pairs, add a key named "NGC_CLI_API_KEY" with a value of your NGC API key. Add another key named "NGC_CLI_ORG" with a value of your NGC organization. Select Next.
56 | 
57 | 11. Under "Configure secret - Secret name and description", name your secret "NVIDIA_NGC_CREDS" and select Next. You'll use this secret name when submitting BioNeMo jobs to SageMaker.
58 | 
59 | 12. Select the remaining default options to create your secret.
60 | 
61 | ## Examples
62 | 
63 | ### Generate ESM-1nv sequence embeddings using an Amazon SageMaker Real-Time Inference Endpoint
64 | 
65 | The **deploy-ESM-embeddings-server.ipynb** notebook describes how to deploy the pretrained esm-1nv model as an endpoint for generating sequence embeddings. In this case, all of the required configuration files are already included in the BioNeMo framework. You only need to specify the name of your model and your NGC API secret name in AWS Secrets Manager.
66 | 
67 | ### Train ESM-1nv on Protein Sequences from UniProt
68 | 
69 | The **train-ESM.ipynb** notebook describes how to pretrain or fine-tune the esm-1nv model using sequence data from the UniProt sequence database. In this case, you will need to create a configuration file and upload it with the training script when creating the job. You should not need to modify the training script. Once the training has finished, the NeMo checkpoints will be available in Amazon S3.
70 | 
71 | ## Security
72 | 
73 | Amazon S3 now applies server-side encryption with Amazon S3 managed keys (SSE-S3) as the base level of encryption for every bucket in Amazon S3. However, for particularly sensitive data or models you may want to apply a different server- or client-side encryption method, [as described in the Amazon S3 documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingEncryption.html).
74 | 
75 | Additional security best practices, such as disabling access control lists (ACLs) and S3 Block Public Access, can be found in the [Amazon S3 documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/security-best-practices.html).
76 | 
77 | See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
78 | 
79 | ## License
80 | 
81 | This library is licensed under the MIT-0 License. See the LICENSE file.

--------------------------------------------------------------------------------
/container/inference/Dockerfile:
--------------------------------------------------------------------------------
1 | # Copyright 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved.
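The notebooks above drive everything through the SageMaker Python SDK, but once the embeddings endpoint from **deploy-ESM-embeddings-server.ipynb** is deployed, any client with access to the SageMaker runtime API can call it. Here is a minimal sketch of what that looks like on the wire; the helper names are illustrative (not part of this repo), and it assumes an endpoint named `esm-embeddings` whose handler returns the embeddings as a JSON list, matching the `/invocations` route in `container/inference/inference.py`:

```python
import json


def build_csv_payload(sequences):
    """Serialize sequences the way the container's /invocations handler
    expects: a single text/csv body that the server splits on commas."""
    if any("," in seq for seq in sequences):
        raise ValueError("sequences must not contain commas")
    return ",".join(sequences)


def fetch_embeddings(runtime, endpoint_name, sequences):
    """Invoke the endpoint through a boto3 'sagemaker-runtime' client and
    decode the JSON array of embeddings returned by the Flask handler."""
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=build_csv_payload(sequences),
    )
    return json.loads(response["Body"].read())


# Usage (requires a deployed endpoint and AWS credentials):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   embeddings = fetch_embeddings(runtime, "esm-embeddings",
#                                 ["MSLKRKNIAL", "MIQSQINRNI"])
```

The `Predictor` created in the notebook does the same serialization for you; this sketch only makes the request/response contract explicit.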
2 | # SPDX-License-Identifier: MIT-0
3 | 
4 | FROM nvcr.io/nvidia/clara/bionemo-framework:1.10
5 | 
6 | # Set a docker label to enable container to use SAGEMAKER_BIND_TO_PORT environment variable if present
7 | LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true
8 | 
9 | ENV PYTHON=python3
10 | ENV PYTHON_VERSION=3.10.12
11 | ENV PYTHON_SHORT_VERSION=3.10
12 | ENV MAMBA_VERSION=23.11.0-0
13 | ENV PYTORCH_VERSION=2.1.0
14 | ENV DEBIAN_FRONTEND=noninteractive
15 | ENV DLC_CONTAINER_TYPE=inference
16 | ENV PYTHONUNBUFFERED=TRUE
17 | ENV PYTHONDONTWRITEBYTECODE=TRUE
18 | ENV PATH="${BIONEMO_HOME}:${PATH}"
19 | ENV TQDM_POSITION=-1
20 | ENV MODEL_PATH $BIONEMO_HOME/models
21 | 
22 | COPY serve .
23 | COPY inference.py .
24 | COPY wsgi.py .
25 | COPY nginx.conf .
26 | 
27 | # Upgrade installed packages
28 | RUN apt-get update && apt-get upgrade -y && apt-get clean \
29 |     && apt-get -y install --no-install-recommends \
30 |     build-essential \
31 |     ca-certificates \
32 |     curl \
33 |     nginx \
34 |     && rm -rf /var/lib/apt/lists/* \
35 |     && pip3 --no-cache-dir install --upgrade pip \
36 |     && pip --no-cache-dir install \
37 |     boto3 \
38 |     "sagemaker>=2,<3" \
39 |     flask \
40 |     gunicorn \
41 |     gevent \
42 |     ujson \
43 |     && rm -rf /root/.cache | true \
44 |     && rm "$HOME/.aws/config"
45 | 
46 | WORKDIR $BIONEMO_HOME
47 | EXPOSE 8080
48 | ENTRYPOINT ["/usr/bin/python"]
49 | CMD ["serve"]

--------------------------------------------------------------------------------
/container/inference/inference.py:
--------------------------------------------------------------------------------
1 | from io import StringIO
2 | import flask
3 | from flask import Flask, Response, Request
4 | import logging
5 | 
6 | from bionemo.triton.inference_wrapper import new_inference_wrapper
7 | import warnings
8 | 
9 | logging.basicConfig(
10 |     format="%(asctime)s - %(levelname)s - %(message)s",
11 |     datefmt="%m/%d/%Y %H:%M:%S",
12 |     level=logging.INFO,
13 | )
14 | 
15 | warnings.filterwarnings("ignore")
16 | warnings.simplefilter("ignore")
17 | 
18 | app = Flask(__name__)
19 | connection = new_inference_wrapper("grpc://localhost:8001")
20 | 
21 | 
22 | @app.route("/ping", methods=["GET"])
23 | def ping():
24 |     """
25 |     Health check route that SageMaker polls to verify the container is responsive.
26 | 
27 |     As written, this handler always returns a 200 status code; it does not
28 |     currently verify that the model itself has loaded.
29 | 
30 |     Returns:
31 |         flask.Response: A response object containing the status code and mimetype.
32 |     """
33 |     status = 200
34 |     return flask.Response(response="\n", status=status, mimetype="application/json")
35 | 
36 | 
37 | @app.route("/invocations", methods=["POST"])
38 | def invocations():
39 |     """
40 |     Handle prediction requests by preprocessing the input data, making predictions,
41 |     and returning the predictions as a JSON object.
42 | 
43 |     This function checks that the request content type is supported (text/csv),
44 |     and if so, decodes the input data, generates embeddings for each sequence, and
45 |     returns them as a JSON array. If the content type is not supported, a 415 status
46 |     code is returned.
47 | 
48 |     Returns:
49 |         flask.Response: A response object containing the predictions, status code, and mimetype.
50 |     """
51 |     print(f"Predictor: received content type: {flask.request.content_type}")
52 |     if flask.request.content_type == "text/csv":
53 |         payload = flask.request.data.decode("utf-8")
54 |         print(f"Predictor: received input: {payload}")
55 |         seqs = payload.split(",")
56 |         embeddings = connection.seqs_to_embedding(seqs)
57 |         print(f"{embeddings.shape=}")
58 |         print(f"Predictor: output: {embeddings}")
59 |         # Return the predictions as a list
60 |         return embeddings.tolist()
61 |     else:
62 |         print(f"Received: {flask.request.content_type}", flush=True)
63 |         return flask.Response(
64 |             response=f"This predictor only supports CSV data; Received: {flask.request.content_type}",
65 |             status=415,
66 |             mimetype="text/plain",
67 |         )
68 | 

--------------------------------------------------------------------------------
/container/inference/nginx.conf:
--------------------------------------------------------------------------------
1 | worker_processes 1;
2 | daemon off; # Prevent forking
3 | 
4 | 
5 | pid /tmp/nginx.pid;
6 | error_log /var/log/nginx/error.log;
7 | 
8 | events {
9 |   # defaults
10 | }
11 | 
12 | http {
13 |   include /etc/nginx/mime.types;
14 |   default_type application/octet-stream;
15 |   access_log /var/log/nginx/access.log combined;
16 | 
17 |   upstream gunicorn {
18 |     server unix:/tmp/gunicorn.sock;
19 |   }
20 | 
21 |   server {
22 |     listen 8080 deferred;
23 |     client_max_body_size 5m;
24 | 
25 |     keepalive_timeout 5;
26 |     proxy_read_timeout 1200s;
27 | 
28 |     location ~ ^/(ping|invocations) {
29 |       proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
30 |       proxy_set_header Host $http_host;
31 |       proxy_redirect off;
32 |       proxy_pass http://gunicorn;
33 |     }
34 | 
35 |     location / {
36 |       return 404 "{}";
37 |     }
38 |   }
39 | }

--------------------------------------------------------------------------------
/container/inference/serve:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | 
3 | # This file implements the scoring service shell. You don't necessarily need to modify it for various
4 | # algorithms.
5 | 
6 | # Environment variables read by this script (all optional, with defaults):
7 | # SM_SECRET_NAME, MODEL_NAME, MODEL_PATH, MODEL_SERVER_TIMEOUT, MODEL_SERVER_WORKERS
8 | 
9 | import multiprocessing
10 | import os
11 | import signal
12 | import subprocess
13 | import sys
14 | import re
15 | import boto3
16 | import logging
17 | import shutil
18 | from botocore.exceptions import ClientError
19 | import json
20 | from time import sleep
21 | cpu_count = multiprocessing.cpu_count()
22 | 
23 | model_server_timeout = os.environ.get("MODEL_SERVER_TIMEOUT", 60)
24 | model_server_workers = int(os.environ.get("MODEL_SERVER_WORKERS", cpu_count))
25 | 
26 | logging.basicConfig(
27 |     format="%(asctime)s - %(levelname)s - %(message)s",
28 |     datefmt="%m/%d/%Y %H:%M:%S",
29 |     level=logging.INFO,
30 | )
31 | 
32 | def sigterm_handler(nginx_pid, gunicorn_pid, pytriton_pid):
33 |     try:
34 |         os.kill(nginx_pid, signal.SIGQUIT)
35 |     except OSError:
36 |         pass
37 |     try:
38 |         os.kill(gunicorn_pid, signal.SIGQUIT)
39 |     except OSError:
40 |         pass
41 |     try:
42 |         os.kill(pytriton_pid, signal.SIGQUIT)
43 |     except OSError:
44 |         pass
45 |     sys.exit(0)
46 | 
47 | 
48 | def parse_conf_path(
49 |     model_name: str,
50 |     root_path: str = "/workspace/bionemo/examples",
51 | ) -> str:
52 |     """Parse the conf path from the model name."""
53 | 
54 |     if model_name == "megamolbart":
55 |         conf_path = "molecule/megamolbart/conf"
56 |     elif model_name == "prott5nv":
57 |         conf_path = "protein/prott5nv/conf"
58 |     elif model_name == "esm1nv":
59 |         conf_path = "protein/esm1nv/conf"
60 |     elif re.match(r"diffdock", model_name):
61 |         conf_path = "molecule/diffdock/conf"
62 |     elif re.match(r"esm2", model_name):
63 |         conf_path = "protein/esm2nv/conf"
64 |     elif re.match(r"equidock", model_name):
65 |         conf_path = "protein/equidock/conf"
66 |     else:
67 |         raise ValueError(f"Invalid model name: {model_name}")
68 | 
69 |     return os.path.join(root_path, conf_path)
70 | 
71 | 
72 | def set_ngc_credentials(secret_name: str) -> None:
73 |     """Get NVIDIA NGC API Key and org from AWS Secrets Manager"""
74 | 
75 |     # Create a Secrets Manager client
76 |     client = boto3.client(
77 |         "secretsmanager", region_name=os.environ.get("AWS_REGION", "us-west-2")
78 |     )
79 | 
80 |     logging.info("Retrieving NGC credentials from AWS Secrets Manager.")
81 | 
82 |     try:
83 |         get_secret_value_response = client.get_secret_value(SecretId=secret_name)
84 |     except ClientError as e:
85 |         # For a list of exceptions thrown, see
86 |         # https://docs.aws.amazon.com/secretsmanager/latest/apireference/API_GetSecretValue.html
87 |         raise e
88 | 
89 |     creds = json.loads(get_secret_value_response["SecretString"])
90 | 
91 |     logging.info("Setting NGC credentials as environment variables.")
92 |     os.environ["NGC_CLI_API_KEY"] = creds.get("NGC_CLI_API_KEY", "")
93 |     os.environ["NGC_CLI_ORG"] = creds.get("NGC_CLI_ORG", "")
94 |     os.environ["NGC_CLI_TEAM"] = creds.get("NGC_CLI_TEAM", "")
95 |     os.environ["NGC_CLI_FORMAT_TYPE"] = creds.get("NGC_CLI_FORMAT_TYPE", "ascii")
96 | 
97 |     return None
98 | 
99 | 
100 | def download_model_weights(
101 |     secret_name=os.environ.get("SM_SECRET_NAME", "NVIDIA_NGC_CREDS"),
102 |     model_name=os.environ.get("MODEL_NAME", "all"),
103 |     model_path=os.environ.get("MODEL_PATH", "/workspace/bionemo/models"),
104 | ):
105 |     set_ngc_credentials(secret_name)
106 |     logging.info("Downloading pre-trained model checkpoint")
107 |     if not os.path.exists(model_path):
108 |         os.makedirs(model_path)
109 | 
110 |     if not os.path.exists("artifact_paths.yaml"):
111 |         shutil.copy(
112 |             "/workspace/bionemo/artifact_paths.yaml",
113 |             os.getcwd(),
114 |         )
115 |     subprocess.run(
116 |         [
117 |             "/usr/bin/python",
118 |             "/workspace/bionemo/download_artifacts.py",
119 |             "--models",
120 |             model_name,
121 |             "--source",
122 |             "ngc",
123 |             "--model_dir",
124 |             model_path,
125 |         ],
126 |         check=True,
127 |     )
128 |     downloaded_nemo_files = [f for f in os.listdir(model_path) if f.endswith(".nemo")]
129 |     checkpoint_path = os.path.join(model_path, downloaded_nemo_files[0])
130 |     logging.info(f"Pre-trained model checkpoint downloaded to {checkpoint_path}")
131 | 
132 | 
133 | def start_server():
134 |     logging.info("Starting the inference server with {} workers.".format(model_server_workers))
135 | 
136 |     # link the log streams to stdout/err so they will be logged to the container logs
137 |     subprocess.check_call(
138 |         ["/usr/bin/ln", "-sf", "/dev/stdout", "/var/log/nginx/access.log"]
139 |     )
140 |     subprocess.check_call(
141 |         ["/usr/bin/ln", "-sf", "/dev/stderr", "/var/log/nginx/error.log"]
142 |     )
143 | 
144 |     download_model_weights()
145 | 
146 |     config_path = parse_conf_path(os.environ.get("MODEL_NAME", "esm1nv"))
147 | 
148 |     logging.info("Starting nginx")
149 |     nginx = subprocess.Popen(
150 |         [
151 |             "/usr/sbin/nginx",
152 |             "-c",
153 |             os.path.join(os.environ.get("BIONEMO_HOME"), "nginx.conf"),
154 |         ]
155 |     )
156 |     sleep(5)
157 | 
158 |     logging.info("Starting gunicorn")
159 |     gunicorn = subprocess.Popen(
160 |         [
161 |             "/usr/local/bin/gunicorn",
162 |             "--timeout",
163 |             str(model_server_timeout),
164 |             "-k",
165 |             "sync",
166 |             "-b",
167 |             "unix:/tmp/gunicorn.sock",
168 |             "-w",
169 |             str(model_server_workers),
170 |             "wsgi:app",
171 |         ]
172 |     )
173 |     sleep(5)
174 | 
175 |     logging.info("Starting pytriton inference wrapper")
176 |     pytriton = subprocess.Popen(
177 |         [
178 |             "/usr/bin/python",
179 |             "-m",
180 |             "bionemo.triton.inference_wrapper",
181 |             "--config-path",
182 |             config_path,
183 |         ]
184 |     )
185 |     sleep(5)
186 | 
187 |     signal.signal(
188 |         signal.SIGTERM,
189 |         lambda a, b: sigterm_handler(nginx.pid, gunicorn.pid, pytriton.pid),
190 |     )
191 | 
192 |     # If any subprocess exits, so do we.
193 |     pids = set([nginx.pid, gunicorn.pid, pytriton.pid])
194 |     while True:
195 |         pid, _ = os.wait()
196 |         if pid in pids:
197 |             break
198 | 
199 |     sigterm_handler(nginx.pid, gunicorn.pid, pytriton.pid)
200 |     logging.info("Inference server exiting")
201 | 
202 | 
203 | # The main routine just invokes the start function.
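The `serve` script keeps three processes alive — nginx, gunicorn, and the Triton inference wrapper — and exits as soon as any of them dies, which is what signals SageMaker to replace the container. That wait-on-any-child pattern, isolated into a runnable sketch (the `supervise` helper name is illustrative and not part of this repo; POSIX-only, since it relies on `os.wait()`):

```python
import os
import subprocess


def supervise(commands):
    """Start one child process per command and block until any child
    exits, then terminate the survivors -- mirroring the os.wait()
    loop at the end of start_server() in the serve script."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    pids = {p.pid for p in procs}
    try:
        while True:
            pid, _ = os.wait()  # returns when *any* child terminates
            if pid in pids:
                return pid
    finally:
        for p in procs:
            if p.poll() is None:  # still running -> shut it down
                p.terminate()
```

In the real script the supervised children are long-running servers, so reaching the `os.wait()` return path almost always means something crashed; exiting promptly is the correct failure mode for a managed endpoint.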
204 | 
205 | if __name__ == "__main__":
206 |     start_server()
207 | 

--------------------------------------------------------------------------------
/container/inference/wsgi.py:
--------------------------------------------------------------------------------
1 | import inference as myapp
2 | 
3 | # This is just a simple wrapper for gunicorn to find your app.
4 | # If you want to change the algorithm file, simply change "inference" above to the
5 | # new file.
6 | 
7 | app = myapp.app

--------------------------------------------------------------------------------
/container/training/Dockerfile:
--------------------------------------------------------------------------------
1 | # Copyright 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 | # SPDX-License-Identifier: MIT-0
3 | 
4 | FROM nvcr.io/nvidia/clara/bionemo-framework:1.10 as common
5 | 
6 | ENV PYTHON=python3
7 | ENV PYTHON_VERSION=3.10.12
8 | ENV PYTHON_SHORT_VERSION=3.10
9 | ENV MAMBA_VERSION=23.11.0-0
10 | ENV PYTORCH_VERSION=2.1.0
11 | ENV SMP_URL=https://smppy.s3.amazonaws.com/pytorch/cu121/smprof-0.3.334-cp310-cp310-linux_x86_64.whl
12 | ENV EFA_PATH=/opt/amazon/efa
13 | ENV PYTHONDONTWRITEBYTECODE=1
14 | ENV PYTHONUNBUFFERED=1
15 | ENV PYTHONIOENCODING=UTF-8
16 | ENV LANG=C.UTF-8
17 | ENV LC_ALL=C.UTF-8
18 | ENV DEBIAN_FRONTEND=noninteractive
19 | ENV TORCH_CUDA_ARCH_LIST="5.2;7.0+PTX;7.5;8.0;8.6;9.0"
20 | ENV TORCH_NVCC_FLAGS="-Xfatbin -compress-all"
21 | ENV CUDNN_VERSION=8.9.2.26
22 | ENV EFA_VERSION=1.30.0
23 | ENV GDRCOPY_VERSION=2.3.1
24 | ENV OPEN_MPI_PATH=/opt/amazon/openmpi
25 | ENV DGLBACKEND=pytorch
26 | ENV MANUAL_BUILD=0
27 | ENV RDMAV_FORK_SAFE=1
28 | ENV DLC_CONTAINER_TYPE=training
29 | ENV NCCL_ASYNC_ERROR_HANDLING=1
30 | ENV SAGEMAKER_TRAINING_MODULE=sagemaker_pytorch_container.training:main
31 | ENV PATH="$OPEN_MPI_PATH/bin:$EFA_PATH/bin:$PATH"
32 | ENV LD_LIBRARY_PATH=$OPEN_MPI_PATH/lib/:$EFA_PATH/lib/:$LD_LIBRARY_PATH
33 | ENV PYTORCH_API_USAGE_STDERR=1
34 | ENV TORCH_LOGS=+dynamo,+aot,+inductor
35 | ENV TQDM_POSITION=-1
36 | ENV MODEL_PATH $BIONEMO_HOME/models
37 | # makes AllToAll complete successfully. Update will be included in NCCL 2.20.*
38 | ENV NCCL_CUMEM_ENABLE=0
39 | ENV OFI_URI="https://github.com/aws/aws-ofi-nccl/releases/download/v1.8.0-aws/aws-ofi-nccl-1.8.0-aws.tar.gz"
40 | 
41 | COPY changehostname.c /
42 | COPY start_with_right_hostname.sh /usr/local/bin/start_with_right_hostname.sh
43 | 
44 | RUN apt-get update \
45 |     && apt-get upgrade -y \
46 |     && apt-get autoremove -y \
47 |     && apt-get clean \
48 |     && rm -rf /var/lib/apt/lists/* \
49 |     && mkdir /tmp/efa \
50 |     && cd /tmp/efa \
51 |     && curl -O https://s3-us-west-2.amazonaws.com/aws-efa-installer/aws-efa-installer-${EFA_VERSION}.tar.gz \
52 |     && tar -xf aws-efa-installer-${EFA_VERSION}.tar.gz \
53 |     && cd aws-efa-installer \
54 |     && apt-get update \
55 |     && ./efa_installer.sh -y --skip-kmod --skip-limit-conf --no-verify \
56 |     && rm -rf /tmp/efa \
57 |     && rm -rf /tmp/aws-efa-installer-${EFA_VERSION}.tar.gz \
58 |     && rm -rf /var/lib/apt/lists/* \
59 |     && apt-get clean \
60 |     && wget -O /tmp/ofi-aws.tar.gz ${OFI_URI} \
61 |     && tar -xvzf /tmp/ofi-aws.tar.gz -C /usr/local/bin --no-same-owner \
62 |     && rm /tmp/ofi-aws.tar.gz
63 | 
64 | COPY requirements.txt /
65 | 
66 | RUN pip install --upgrade pip --no-cache-dir --trusted-host pypi.org --trusted-host files.pythonhosted.org \
67 |     && pip install --no-cache-dir -U ${SMP_URL} \
68 |     && pip install --no-cache-dir -r /requirements.txt \
69 |     && rm -rf /root/.cache | true \
70 |     && rm "$HOME/.aws/config" \
71 |     && chmod +x /usr/local/bin/start_with_right_hostname.sh
72 | 
73 | WORKDIR /
74 | 
75 | ENTRYPOINT ["bash", "-m", "start_with_right_hostname.sh"]
76 | CMD ["/bin/bash"]

--------------------------------------------------------------------------------
/container/training/changehostname.c:
--------------------------------------------------------------------------------
1 | #include <string.h>
2 | #include <stdlib.h>
3 | 
4 | /*
5 |  * Modifies gethostname to return algo-1, algo-2, etc. when running on SageMaker.
6 |  *
7 |  * Without this gethostname() on SageMaker returns 'aws', leading NCCL/MPI to think there is only one host,
8 |  * not realizing that it needs to use NET/Socket.
9 |  *
10 |  * When the docker container starts, we read the 'current_host' value from /opt/ml/input/config/resourceconfig.json
11 |  * and replace PLACEHOLDER_HOSTNAME with it before compiling this code into a shared library.
12 |  */
13 | int gethostname(char *name, size_t len)
14 | {
15 |     const char *val = PLACEHOLDER_HOSTNAME;
16 |     strncpy(name, val, len);
17 |     return 0;
18 | }

--------------------------------------------------------------------------------
/container/training/requirements.txt:
--------------------------------------------------------------------------------
1 | accelerate==1.1.0
2 | fastai==2.7.18
3 | huggingface-hub<0.24.0
4 | numba
5 | opencv-python
6 | pandas
7 | pillow
8 | requests>=2.31.0
9 | sagemaker>=2,<3
10 | sagemaker-pytorch-training
11 | sagemaker-training
12 | scikit-learn
13 | shap
14 | smclarify

--------------------------------------------------------------------------------
/container/training/start_with_right_hostname.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 | 
3 | if [[ "$1" = "train" ]]; then
4 |     CURRENT_HOST=$(jq .current_host /opt/ml/input/config/resourceconfig.json)
5 |     sed -ie "s/PLACEHOLDER_HOSTNAME/$CURRENT_HOST/g" changehostname.c
6 |     gcc -o changehostname.o -c -fPIC -Wall changehostname.c
7 |     gcc -o libchangehostname.so -shared -export-dynamic changehostname.o -ldl
8 |     LD_PRELOAD=/libchangehostname.so train
9 | else
10 |     eval "$@"
11 | fi

--------------------------------------------------------------------------------
/deploy-ESM-embeddings-server.ipynb:
--------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 |    "cell_type": "markdown",
5 |    "id": "f9586f20",
6 |    "metadata": {},
7 |    "source": [
8 |     "# Deploy ESM Embeddings Server on Amazon SageMaker\n",
9 |     "\n",
10 |     "Copyright 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n",
11 |     "SPDX-License-Identifier: MIT-0"
12 |    ]
13 |   },
14 |   {
15 |    "cell_type": "markdown",
16 |    "id": "97b562fb",
17 |    "metadata": {},
18 |    "source": [
19 |     "---\n",
20 |     "## 1. Setup"
21 |    ]
22 |   },
23 |   {
24 |    "cell_type": "markdown",
25 |    "id": "88a64f1b",
26 |    "metadata": {},
27 |    "source": [
28 |     "### 1.1. Create clients"
29 |    ]
30 |   },
31 |   {
32 |    "cell_type": "code",
33 |    "execution_count": null,
34 |    "id": "1c273482-ffb7-49af-a83f-19a7759a7621",
35 |    "metadata": {
36 |     "tags": []
37 |    },
38 |    "outputs": [],
39 |    "source": [
40 |     "import boto3\n",
41 |     "import sagemaker\n",
42 |     "\n",
43 |     "boto_session = boto3.session.Session()\n",
44 |     "sagemaker_session = sagemaker.session.Session(boto_session)\n",
45 |     "s3 = boto_session.resource(\"s3\")\n",
46 |     "region = boto_session.region_name\n",
47 |     "role = sagemaker.get_execution_role()"
48 |    ]
49 |   },
50 |   {
51 |    "cell_type": "markdown",
52 |    "id": "a6c5b022",
53 |    "metadata": {},
54 |    "source": [
55 |     "### 1.2. Build BioNeMo-Inference Container Image\n",
56 |     "\n",
57 |     "If you don't already have access to the BioNeMo-SageMaker container image, run the following cell to build and deploy it to your AWS account. Take note of the image URI - you'll use it for the processing and training steps below.\n",
58 |     "\n",
59 |     "Here is an example shell script you can use in your environment (including SageMaker Notebook Instances) to build the container.\n",
60 |     "\n",
61 |     "Once you have built and pushed the container, we strongly recommend using [ECR image scanning](https://docs.aws.amazon.com/AmazonECR/latest/userguide/image-scanning.html) to ensure that it meets your security requirements."
62 |    ]
63 |   },
64 |   {
65 |    "cell_type": "code",
66 |    "execution_count": null,
67 |    "id": "2d24d513",
68 |    "metadata": {
69 |     "scrolled": true,
70 |     "tags": []
71 |    },
72 |    "outputs": [],
73 |    "source": [
74 |     "%%bash\n",
75 |     "\n",
76 |     "# The name of our algorithm\n",
77 |     "algorithm_name=bionemo-inference\n",
78 |     "\n",
79 |     "pushd container/inference\n",
80 |     "\n",
81 |     "account=$(aws sts get-caller-identity --query Account --output text)\n",
82 |     "\n",
83 |     "# Get the region defined in the current configuration (default to us-west-2 if none defined)\n",
84 |     "region=$(aws configure get region)\n",
85 |     "region=${region:-us-west-2}\n",
86 |     "\n",
87 |     "fullname=\"${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest\"\n",
88 |     "\n",
89 |     "# If the repository doesn't exist in ECR, create it.\n",
90 |     "aws ecr describe-repositories --repository-names \"${algorithm_name}\" > /dev/null 2>&1\n",
91 |     "\n",
92 |     "if [ $? -ne 0 ]\n",
93 |     "then\n",
94 |     "    aws ecr create-repository --repository-name \"${algorithm_name}\" > /dev/null\n",
95 |     "fi\n",
96 |     "\n",
97 |     "# Get the login command from ECR and execute it directly\n",
98 |     "$(aws ecr get-login --region ${region} --no-include-email)\n",
99 |     "\n",
100 |     "# Build the docker image locally with the image name and then push it to ECR\n",
101 |     "# with the full name.\n",
102 |     "\n",
103 |     "docker build -t ${algorithm_name} .\n",
104 |     "docker tag ${algorithm_name} ${fullname}\n",
105 |     "\n",
106 |     "docker push ${fullname}\n",
107 |     "\n",
108 |     "popd"
109 |    ]
110 |   },
111 |   {
112 |    "cell_type": "markdown",
113 |    "id": "8f7bd546",
114 |    "metadata": {},
115 |    "source": [
116 |     "---\n",
117 |     "## 2. Deploy Real-Time Inference Endpoint"
118 |    ]
119 |   },
120 |   {
121 |    "cell_type": "markdown",
122 |    "id": "80794653",
123 |    "metadata": {},
124 |    "source": [
125 |     "### 2.1. Create esm1nv model"
126 |    ]
127 |   },
128 |   {
129 |    "cell_type": "code",
130 |    "execution_count": null,
131 |    "id": "c89024a7-f1fa-47df-bd1a-987fa6e647ea",
132 |    "metadata": {
133 |     "tags": []
134 |    },
135 |    "outputs": [],
136 |    "source": [
137 |     "from sagemaker.model import Model\n",
138 |     "\n",
139 |     "# Replace this with your ECR repository URI from above\n",
140 |     "BIONEMO_IMAGE_URI = (\n",
141 |     "    \"<account-id>.dkr.ecr.<region>.amazonaws.com/bionemo-inference:latest\"\n",
142 |     ")\n",
143 |     "\n",
144 |     "esm_embeddings = Model(\n",
145 |     "    image_uri=BIONEMO_IMAGE_URI,\n",
146 |     "    name=\"esm-embeddings\",\n",
147 |     "    model_data=None,\n",
148 |     "    role=role,\n",
149 |     "    predictor_cls=sagemaker.predictor.Predictor,\n",
150 |     "    sagemaker_session=sagemaker_session,\n",
151 |     "    env={\"SM_SECRET_NAME\": \"NVIDIA_NGC_CREDS\", \"MODEL_NAME\": \"esm1nv\"},\n",
152 |     ")"
153 |    ]
154 |   },
155 |   {
156 |    "cell_type": "markdown",
157 |    "id": "88734416",
158 |    "metadata": {},
159 |    "source": [
160 |     "### 2.2. Deploy model to SageMaker endpoint"
161 |    ]
162 |   },
163 |   {
164 |    "cell_type": "code",
165 |    "execution_count": null,
166 |    "id": "9e557430-556f-4185-8f43-f90c691ed7db",
167 |    "metadata": {
168 |     "tags": []
169 |    },
170 |    "outputs": [],
171 |    "source": [
172 |     "esm_embeddings_predictor = esm_embeddings.deploy(\n",
173 |     "    initial_instance_count=1,\n",
174 |     "    instance_type='ml.g5.xlarge',\n",
175 |     "    serializer = sagemaker.base_serializers.CSVSerializer(),\n",
176 |     "    deserializer = sagemaker.base_deserializers.NumpyDeserializer()\n",
177 |     ")"
178 |    ]
179 |   },
180 |   {
181 |    "cell_type": "markdown",
182 |    "id": "483388b3",
183 |    "metadata": {},
184 |    "source": [
185 |     "### 2.3. Test model"
186 |    ]
187 |   },
188 |   {
189 |    "cell_type": "code",
190 |    "execution_count": null,
191 |    "id": "61852f67-ae4e-4f17-86d0-5039e7fa94bd",
192 |    "metadata": {
193 |     "tags": []
194 |    },
195 |    "outputs": [],
196 |    "source": [
197 |     "esm_embeddings_predictor.predict(\"MSLKRKNIALIPAAGIGVRFGADKPKQYVEIGSKTVLEHVL,MIQSQINRNIRLDLADAILLSKAKKDLSFAEIADGTGLA\")"
198 |    ]
199 |   },
200 |   {
201 |    "cell_type": "code",
202 |    "execution_count": null,
203 |    "id": "9a0ac500-4e63-41f7-9b52-9acdda34f84f",
204 |    "metadata": {},
205 |    "outputs": [],
206 |    "source": []
207 |   }
208 |  ],
209 |  "metadata": {
210 |   "kernelspec": {
211 |    "display_name": "conda_python3",
212 |    "language": "python",
213 |    "name": "conda_python3"
214 |   },
215 |   "language_info": {
216 |    "codemirror_mode": {
217 |     "name": "ipython",
218 |     "version": 3
219 |    },
220 |    "file_extension": ".py",
221 |    "mimetype": "text/x-python",
222 |    "name": "python",
223 |    "nbconvert_exporter": "python",
224 |    "pygments_lexer": "ipython3",
225 |    "version": "3.10.15"
226 |   }
227 |  },
228 |  "nbformat": 4,
229 |  "nbformat_minor": 5
230 | }
231 | 

--------------------------------------------------------------------------------
/img/1-setup.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-with-nvidia-bionemo/218b86ab0e1202a6d1de8d02669190eda58289c7/img/1-setup.png

--------------------------------------------------------------------------------
/img/2-api-key.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-with-nvidia-bionemo/218b86ab0e1202a6d1de8d02669190eda58289c7/img/2-api-key.png

--------------------------------------------------------------------------------
/img/3-generate.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-with-nvidia-bionemo/218b86ab0e1202a6d1de8d02669190eda58289c7/img/3-generate.png -------------------------------------------------------------------------------- /img/4-sm.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-with-nvidia-bionemo/218b86ab0e1202a6d1de8d02669190eda58289c7/img/4-sm.png -------------------------------------------------------------------------------- /img/5-secret-type.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-with-nvidia-bionemo/218b86ab0e1202a6d1de8d02669190eda58289c7/img/5-secret-type.png -------------------------------------------------------------------------------- /img/bionemo-sm-arch.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-with-nvidia-bionemo/218b86ab0e1202a6d1de8d02669190eda58289c7/img/bionemo-sm-arch.png -------------------------------------------------------------------------------- /src/esm1nv-training.yaml: -------------------------------------------------------------------------------- 1 | # Copyright 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | 4 | name: esm1nv 5 | do_training: True # set to false if data preprocessing steps must be completed 6 | do_testing: False # set to true to run evaluation on test data after training, requires test_dataset section 7 | restore_from_path: null # used when starting from a .nemo file 8 | 9 | trainer: 10 | # devices: 1 # number of GPUs or CPUs. Don't define here unless you want to override SM. 11 | # num_nodes: 1 # Number of instances. Don't define here unless you want to override SM. 
12 | accelerator: gpu # gpu or cpu 13 | precision: 16-mixed # 16-mixed or bf16-mixed 14 | logger: False # logger is provided by NeMo exp_manager 15 | enable_checkpointing: False # checkpointing is done by NeMo exp_manager 16 | use_distributed_sampler: False # use NeMo Megatron samplers 17 | max_epochs: null # use max_steps instead with NeMo Megatron model 18 | max_steps: 100 # consumed_samples = global_step * micro_batch_size * data_parallel_size * accumulate_grad_batches 19 | log_every_n_steps: 1 # number of iterations between logging 20 | val_check_interval: 25 21 | limit_val_batches: 1.0 # number of batches in validation step, use fraction for fraction of data, 0 to disable 22 | limit_test_batches: 0 # number of batches in test step, use fraction for fraction of data, 0 to disable 23 | accumulate_grad_batches: 1 24 | gradient_clip_val: 1.0 25 | benchmark: False 26 | 27 | exp_manager: 28 | name: ${name} 29 | exp_dir: ${oc.env:BIONEMO_HOME}/results/nemo_experiments/${.name}/${.wandb_logger_kwargs.name} 30 | explicit_log_dir: ${.exp_dir} 31 | create_wandb_logger: True 32 | create_tensorboard_logger: True 33 | wandb_logger_kwargs: 34 | project: ${name}_pretraining 35 | name: ${name}_pretraining 36 | group: ${name} 37 | job_type: Localhost_nodes_${trainer.num_nodes}_gpus_${trainer.devices} 38 | notes: "date: ${now:%y%m%d-%H%M%S}" 39 | tags: 40 | - ${name} 41 | offline: True # set to True if there are issues uploading to WandB during training 42 | resume_if_exists: True # automatically resume if checkpoint exists 43 | resume_ignore_no_checkpoint: True # leave as True, will start new training if resume_if_exists is True but no checkpoint exists 44 | create_checkpoint_callback: True # leave as True, use exp_manager for checkpoints 45 | checkpoint_callback_params: 46 | monitor: val_loss 47 | save_top_k: 3 # number of checkpoints to save 48 | mode: min # use min or max of monitored metric to select best checkpoints 49 | always_save_nemo: False # saves nemo file during 
validation, not implemented for model parallel 50 | filename: "${name}--{val_loss:.2f}-{step}-{consumed_samples}" 51 | model_parallel_size: ${multiply:${model.tensor_model_parallel_size}, ${model.pipeline_model_parallel_size}} 52 | 53 | model: 54 | micro_batch_size: 96 # NOTE: adjust to occupy ~ 90% of GPU memory 55 | tensor_model_parallel_size: 1 # model parallelism 56 | pipeline_model_parallel_size: 1 # model parallelism. If enabled, you need to set data.dynamic_padding to False as pipeline parallelism requires fixed-length padding. 57 | # model architecture 58 | seq_length: 512 59 | max_position_embeddings: ${.seq_length} 60 | encoder_seq_length: ${.seq_length} 61 | num_layers: 6 62 | hidden_size: 768 63 | ffn_hidden_size: 3072 # Transformer FFN hidden size. Usually 4 * hidden_size. 64 | num_attention_heads: 12 65 | init_method_std: 0.02 # Standard deviation of the zero mean normal distribution used for weight initialization. 66 | hidden_dropout: 0.1 # Dropout probability for hidden state transformer. 67 | kv_channels: null # Projection weights dimension in multi-head attention. Set to hidden_size // num_attention_heads if null 68 | apply_query_key_layer_scaling: True # scale Q * K^T by 1 / layer-number. 69 | layernorm_epsilon: 1e-5 70 | make_vocab_size_divisible_by: 128 # Pad the vocab size to be divisible by this value for computation efficiency. 71 | pre_process: True # add embedding 72 | post_process: True # add pooler 73 | bert_binary_head: False # BERT binary head 74 | resume_from_checkpoint: null # manually set the checkpoint file to load from 75 | masked_softmax_fusion: True # Use a kernel that fuses the attention softmax with its mask. 
76 | 77 | tokenizer: 78 | # Use ESM2 tokenizers from HF 79 | library: huggingface 80 | type: BertWordPieceLowerCase 81 | model_name: facebook/esm2_t33_650M_UR50D 82 | mask_id: 32 83 | model: null 84 | vocab_file: null 85 | merge_file: null 86 | 87 | # precision 88 | native_amp_init_scale: 4294967296 # 2 ** 32 89 | native_amp_growth_interval: 1000 90 | fp32_residual_connection: False # Move residual connections to fp32 91 | fp16_lm_cross_entropy: False # Move the cross entropy unreduced loss calculation for lm head to fp16 92 | 93 | # miscellaneous 94 | seed: 1234 95 | use_cpu_initialization: False # Init weights on the CPU (slow for large model) 96 | onnx_safe: False # Use work-arounds for known problems with Torch ONNX exporter. 97 | 98 | # not implemented in NeMo yet 99 | activations_checkpoint_method: null # 'uniform', 'block' 100 | activations_checkpoint_num_layers: 1 101 | 102 | data: 103 | dataset_path: /opt/ml/input/data 104 | dataset: 105 | train: x000 106 | val: x001 107 | # These control the MLM token probabilities. The following settings are commonly used in literature. 108 | modify_percent: 0.15 # Fraction of characters in a protein sequence to modify. 109 | perturb_percent: 0.1 # Of the modify_percent, what fraction of characters are to be replaced with another amino acid. 110 | mask_percent: 0.8 # Of the modify_percent, what fraction of characters are to be replaced with a mask token. 111 | identity_percent: 0.1 # Of the modify_percent, what fraction of characters are to be unchanged as the original amino acid. 112 | 113 | data_prefix: "" # must be null or "" 114 | num_workers: 1 115 | dataloader_type: single # cyclic 116 | reset_position_ids: False # Reset position ids after end-of-document token 117 | reset_attention_mask: False # Reset attention mask after end-of-document token 118 | eod_mask_loss: False # Mask loss for the end of document tokens 119 | masked_lm_prob: 0.15 # Probability of replacing a token with mask. 
120 | short_seq_prob: 0.1 # Probability of producing a short sequence. 121 | skip_lines: 0 122 | drop_last: False 123 | pin_memory: False 124 | index_mapping_dir: null # path to store cached indexing files (if empty, will be stored in the same directory as dataset_path) 125 | data_impl: "csv_mmap" 126 | data_impl_kwargs: 127 | csv_mmap: 128 | header_lines: 1 129 | newline_int: 10 # byte-value of newline 130 | workers: ${model.data.num_workers} # number of workers when creating missing index files (null defaults to cpu_num // 2) 131 | sort_dataset_paths: True # if True datasets will be sorted by name 132 | data_sep: "," # string to split text into columns 133 | data_col: 1 134 | use_upsampling: False # if the data should be upsampled to max number of steps in the training 135 | seed: ${model.seed} # Random seed 136 | max_seq_length: ${model.seq_length} # Maximum input sequence length. Longer sequences are truncated 137 | dynamic_padding: 138 | False # If True, each batch is padded to the maximum sequence length within that batch. 139 | # Set it to False when model.pipeline_model_parallel_size > 1, as pipeline parallelism requires fixed-length padding. 140 | 141 | optim: 142 | name: fused_adam # fused optimizers used by Megatron model 143 | lr: 2e-4 144 | weight_decay: 0.01 145 | betas: 146 | - 0.9 147 | - 0.98 148 | sched: 149 | name: CosineAnnealing 150 | warmup_steps: 500 # use to set warmup_steps explicitly or leave as null to calculate 151 | constant_steps: 50000 152 | min_lr: 2e-5 153 | 154 | dwnstr_task_validation: 155 | enabled: False 156 | -------------------------------------------------------------------------------- /src/train.py: -------------------------------------------------------------------------------- 1 | # Copyright 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | # SPDX-License-Identifier: MIT-0 3 | 4 | import argparse 5 | import boto3 6 | from botocore.exceptions import ClientError 7 | import json 8 | import logging 9 | import os 10 | import re 11 | import shutil 12 | import subprocess 13 | from datetime import timedelta 14 | import yaml 15 | 16 | import torch.distributed as dist 17 | 18 | NUM_GPUS = int(os.environ.get("SM_NUM_GPUS", 0)) 19 | HOSTS = json.loads(os.environ.get("SM_HOSTS", f'["{os.uname()[1]}"]')) 20 | NUM_HOSTS = len(HOSTS) 21 | 22 | os.environ["HYDRA_FULL_ERROR"] = "1" 23 | 24 | logging.basicConfig( 25 | format="%(asctime)s - %(levelname)s - %(message)s", 26 | datefmt="%m/%d/%Y %H:%M:%S", 27 | level=logging.INFO, 28 | ) 29 | 30 | 31 | def parse_args(): 32 | """Parse the arguments.""" 33 | logging.info("Parsing arguments") 34 | parser = argparse.ArgumentParser() 35 | 36 | parser.add_argument( 37 | "--config-path", 38 | type=str, 39 | default="/opt/ml/code", 40 | help="Path to config files in the container", 41 | ) 42 | parser.add_argument( 43 | "--config-name", 44 | type=str, 45 | default="train", 46 | help="Name of the config file for the run (without file extension)", 47 | ) 48 | 49 | parser.add_argument( 50 | "--model-name", 51 | type=str, 52 | default=None, 53 | choices=[ 54 | "diffdock_confidence", 55 | "diffdock_score", 56 | "equidock_db5", 57 | "equidock_dips", 58 | "esm1nv", 59 | "esm2nv_3b", 60 | "esm2nv_650m", 61 | "esm2_650m_huggingface", 62 | "esm2_3b_huggingface", 63 | "megamolbart", 64 | "prott5nv", 65 | ], 66 | help="Name of BioNeMo model to use for training", 67 | ) 68 | 69 | parser.add_argument( 70 | "--download-pretrained-weights", 71 | type=str, # keep as a string: SageMaker passes hyperparameters as strings, and argparse's type=bool would coerce any non-empty string (even "False") to True 72 | default="False", 73 | help="Download the pre-trained model checkpoint for fine-tuning?", 74 | ) 75 | 76 | parser.add_argument( 77 | "--ngc-cli-secret-name", 78 | type=str, 79 | default="NVIDIA_NGC_CREDS", 80 | help="Name of an AWS Secrets Manager secret containing NGC_CLI_API_KEY and NGC_CLI_ORG key/value pairs.", 81 | ) 82 | 83 | args, _ 
= parser.parse_known_args() 84 | return args 85 | 86 | 87 | def parse_model_path( 88 | model_name: str, 89 | root_path: str = "/workspace/bionemo/examples", 90 | ) -> str: 91 | """Parse the model path from the model name.""" 92 | 93 | if model_name == "megamolbart": 94 | model_path = "molecule/megamolbart/pretrain.py" 95 | elif model_name == "prott5nv": 96 | model_path = "protein/prott5nv/pretrain.py" 97 | elif model_name == "esm1nv": 98 | model_path = "protein/esm1nv/pretrain.py" 99 | elif re.match(r"diffdock", model_name): 100 | model_path = "molecule/diffdock/train.py" 101 | elif re.match(r"esm2", model_name): 102 | model_path = "protein/esm2nv/pretrain.py" 103 | elif re.match(r"equidock", model_name): 104 | model_path = "protein/equidock/pretrain.py" 105 | else: 106 | raise ValueError(f"Invalid model name: {model_name}") 107 | 108 | return os.path.join(root_path, model_path) 109 | 110 | 111 | def main(args): 112 | """Main function.""" 113 | 114 | # logging.info(f"Current environment variables are:\n{os.environ}") 115 | 116 | parsed_model_name = args.model_name or get_model_name_from_config( 117 | args.config_path, args.config_name 118 | ) 119 | 120 | training_script = parse_model_path(parsed_model_name) 121 | 122 | run_cmd = [ 123 | "/usr/bin/python", 124 | training_script, 125 | "--config-path", 126 | args.config_path, 127 | "--config-name", 128 | args.config_name, 129 | ] 130 | 131 | if args.download_pretrained_weights == "True": 132 | 133 | set_ngc_credentials(args.ngc_cli_secret_name) 134 | 135 | logging.info("Downloading pre-trained model checkpoint") 136 | model_path = os.getenv("MODEL_PATH") 137 | if not os.path.exists(model_path): 138 | os.makedirs(model_path) 139 | 140 | if not os.path.exists("artifact_paths.yaml"): 141 | shutil.copy( 142 | "/workspace/bionemo/artifact_paths.yaml", 143 | os.getcwd(), 144 | ) 145 | subprocess.run( 146 | [ 147 | "/usr/bin/python", 148 | "/workspace/bionemo/download_artifacts.py", 149 | parsed_model_name, 150 | 
"--source", 151 | "ngc", 152 | "--download_dir", 153 | model_path, 154 | ], 155 | check=True, 156 | ) 157 | downloaded_nemo_files = [ 158 | f for f in os.listdir(model_path) if f.endswith(".nemo") 159 | ] 160 | checkpoint_path = os.path.join(model_path, downloaded_nemo_files[0]) 161 | logging.info(f"Pre-trained model checkpoint downloaded to {checkpoint_path}") 162 | run_cmd.append(f"++restore_from_path={checkpoint_path}") 163 | 164 | run_cmd.append(f"++trainer.devices={NUM_GPUS}") 165 | run_cmd.append(f"++trainer.num_nodes={NUM_HOSTS}") 166 | 167 | logging.info( 168 | f"Running training script located at {training_script} with command:\n{run_cmd}" 169 | ) 170 | 171 | subprocess.run( 172 | run_cmd, 173 | check=True, 174 | ) 175 | 176 | logging.info("Training process complete") 177 | 178 | if os.environ["LOCAL_RANK"] == 0: 179 | 180 | results_path = os.path.join( 181 | os.getenv("BIONEMO_HOME"), "results/nemo_experiments" 182 | ) 183 | shutil.copytree(results_path, "/opt/ml/model/") 184 | 185 | 186 | def set_ngc_credentials(secret_name: str) -> None: 187 | """Get NVIDIA NGC API Key and org from AWS Secrets Manager""" 188 | 189 | # Create a Secrets Manager client 190 | client = boto3.client("secretsmanager", region_name=os.getenv("AWS_REGION")) 191 | 192 | logging.info("Retrieving NGC credentials from AWS Secrets Manager.") 193 | 194 | try: 195 | get_secret_value_response = client.get_secret_value(SecretId=secret_name) 196 | except ClientError as e: 197 | # For a list of exceptions thrown, see 198 | # https://docs.aws.amazon.com/secretsmanager/latest/apireference/API_GetSecretValue.html 199 | raise e 200 | 201 | creds = json.loads(get_secret_value_response["SecretString"]) 202 | 203 | logging.info("Setting NGC credentials as environment variables.") 204 | os.environ["NGC_CLI_API_KEY"] = creds.get("NGC_CLI_API_KEY", "") 205 | os.environ["NGC_CLI_ORG"] = creds.get("NGC_CLI_ORG", "") 206 | os.environ["NGC_CLI_TEAM"] = creds.get("NGC_CLI_TEAM", "") 207 | 
os.environ["NGC_CLI_FORMAT_TYPE"] = creds.get("NGC_CLI_FORMAT_TYPE", "ascii") 208 | 209 | return None 210 | 211 | 212 | def get_model_name_from_config( 213 | config_path: str = "/opt/ml/input/data/config", config_name: str = "train" 214 | ) -> str: 215 | """Get the model name from the config file.""" 216 | with open(os.path.join(config_path, f"{config_name}.yaml")) as f: 217 | config = yaml.safe_load(f) 218 | return config["name"] 219 | 220 | 221 | def init_distributed_training(args): 222 | """Initializes distributed training settings.""" 223 | 224 | try: 225 | backend = "smddp" 226 | import smdistributed.dataparallel.torch.torch_smddp 227 | except ModuleNotFoundError: 228 | backend = "nccl" 229 | print("Warning: SMDDP not found on this image, falling back to NCCL!") 230 | 231 | local_rank = int(os.environ["LOCAL_RANK"]) 232 | world_size = int(os.environ["WORLD_SIZE"]) 233 | global_rank = int(os.environ["RANK"]) 234 | 235 | if local_rank == 0: 236 | logging.info("Local Rank is : {}".format(os.environ["LOCAL_RANK"])) 237 | logging.info("Worldsize is : {}".format(os.environ["WORLD_SIZE"])) 238 | logging.info("Rank is : {}".format(os.environ["RANK"])) 239 | 240 | logging.info("Master address is : {}".format(os.environ["MASTER_ADDR"])) 241 | logging.info("Master port is : {}".format(os.environ["MASTER_PORT"])) 242 | 243 | dist.init_process_group( 244 | backend=backend, 245 | world_size=world_size, 246 | rank=global_rank, 247 | init_method="env://", 248 | timeout=timedelta(seconds=120), 249 | ) 250 | 251 | return local_rank, world_size, global_rank 252 | 253 | 254 | if __name__ == "__main__": 255 | args = parse_args() 256 | local_rank, world_size, global_rank = init_distributed_training(args) 257 | main(args) 258 | -------------------------------------------------------------------------------- /train-ESM.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | 
"source": [ 7 | "# ESM-1nv Training with BioNeMo on Amazon SageMaker\n", 8 | "\n", 9 | "Copyright 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved.\n", 10 | "SPDX-License-Identifier: MIT-0" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": {}, 16 | "source": [ 17 | "---\n", 18 | "## 1. Setup" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "### 1.1. Create clients" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": null, 31 | "metadata": { 32 | "tags": [] 33 | }, 34 | "outputs": [], 35 | "source": [ 36 | "import boto3\n", 37 | "import os\n", 38 | "import sagemaker\n", 39 | "from time import strftime\n", 40 | "\n", 41 | "boto_session = boto3.session.Session()\n", 42 | "sagemaker_session = sagemaker.session.Session(boto_session)\n", 43 | "REGION_NAME = sagemaker_session.boto_region_name\n", 44 | "S3_BUCKET = sagemaker_session.default_bucket()\n", 45 | "S3_PREFIX = \"bionemo-training\"\n", 46 | "S3_FOLDER = sagemaker.s3.s3_path_join(\"s3://\", S3_BUCKET, S3_PREFIX)\n", 47 | "print(f\"S3 uri is {S3_FOLDER}\")\n", 48 | "\n", 49 | "EXPERIMENT_NAME = \"bionemo-training-\" + strftime(\"%Y-%m-%d\")\n", 50 | "\n", 51 | "SAGEMAKER_EXECUTION_ROLE = sagemaker.session.get_execution_role(sagemaker_session)\n", 52 | "print(f\"Assumed SageMaker role is {SAGEMAKER_EXECUTION_ROLE}\")" 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": { 58 | "tags": [] 59 | }, 60 | "source": [ 61 | "### 1.2. Build BioNeMo-Training Container Image\n", 62 | "\n", 63 | "If you don't already have access to the BioNeMo-SageMaker container image, run the following cell to build and deploy it to your AWS account. 
Take note of the image URI - you'll use it for the processing and training steps below.\n", 64 | "\n", 65 | "Here is an example shell script you can use in your environment (including SageMaker Notebook Instances) to build the container.\n", 66 | "\n", 67 | "Once you have built and pushed the container, we strongly recommend using [ECR image scanning](https://docs.aws.amazon.com/AmazonECR/latest/userguide/image-scanning.html) to ensure that it meets your security requirements." 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "NOTE: If you don't have access to a container build environment, one alternative is the [Amazon SageMaker Studio Image Build CLI](https://github.com/aws-samples/sagemaker-studio-image-build-cli)." 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": null, 80 | "metadata": { 81 | "scrolled": true, 82 | "tags": [] 83 | }, 84 | "outputs": [], 85 | "source": [ 86 | "%%bash\n", 87 | "\n", 88 | "# The name of our algorithm\n", 89 | "algorithm_name=bionemo-training\n", 90 | "\n", 91 | "pushd container/training\n", 92 | "\n", 93 | "account=$(aws sts get-caller-identity --query Account --output text)\n", 94 | "\n", 95 | "# Get the region defined in the current configuration (default to us-west-2 if none defined)\n", 96 | "region=$(aws configure get region)\n", 97 | "region=${region:-us-west-2}\n", 98 | "\n", 99 | "fullname=\"${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest\"\n", 100 | "\n", 101 | "# If the repository doesn't exist in ECR, create it.\n", 102 | "aws ecr describe-repositories --repository-names \"${algorithm_name}\" > /dev/null 2>&1\n", 103 | "\n", 104 | "if [ $? 
-ne 0 ]\n", 105 | "then\n", 106 | " aws ecr create-repository --repository-name \"${algorithm_name}\" > /dev/null\n", 107 | "fi\n", 108 | "\n", 109 | "# Get the login command from ECR and execute it directly\n", 110 | "$(aws ecr get-login --region ${region} --no-include-email)\n", 111 | "\n", 112 | "# Build the docker image locally with the image name and then push it to ECR\n", 113 | "# with the full name.\n", 114 | "\n", 115 | "docker build -t ${algorithm_name} .\n", 116 | "docker tag ${algorithm_name} ${fullname}\n", 117 | "\n", 118 | "docker push ${fullname}\n", 119 | "\n", 120 | "popd" 121 | ] 122 | }, 123 | { 124 | "cell_type": "markdown", 125 | "metadata": {}, 126 | "source": [ 127 | "---\n", 128 | "## 2. Data Processing" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "metadata": {}, 134 | "source": [ 135 | "### 2.1. Query UniProt for human amino acid sequences between 100 and 500 residues in length" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": null, 141 | "metadata": { 142 | "tags": [] 143 | }, 144 | "outputs": [], 145 | "source": [ 146 | "from io import BytesIO\n", 147 | "import pandas as pd\n", 148 | "import requests\n", 149 | "\n", 150 | "query_url = \"https://rest.uniprot.org/uniprotkb/stream?query=organism_id:9606+AND+reviewed=True+AND+length=[100+TO+500]&format=tsv&compressed=true&fields=accession,sequence\"\n", 151 | "uniprot_request = requests.get(query_url)\n", 152 | "bio = BytesIO(uniprot_request.content)\n", 153 | "\n", 154 | "df = pd.read_csv(bio, compression=\"gzip\", sep=\"\\t\")\n", 155 | "display(df)" 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": {}, 161 | "source": [ 162 | "### 2.2. 
Split Data and Upload to S3" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": null, 168 | "metadata": { 169 | "tags": [] 170 | }, 171 | "outputs": [], 172 | "source": [ 173 | "train = df.sample(n=9600, random_state=42)\n", 174 | "val_test = df.drop(train.index)\n", 175 | "val = val_test.sample(n=960, random_state=42)\n", 176 | "test = val_test.drop(val.index).sample(n=960, random_state=42)\n", 177 | "del val_test\n", 178 | "\n", 179 | "print(f\"Training data size: {train.shape}\")\n", 180 | "print(f\"Validation data size: {val.shape}\")\n", 181 | "print(f\"Test data size: {test.shape}\")\n", 182 | "\n", 183 | "for dir in [\"train\", \"val\", \"test\"]:\n", 184 | " if not os.path.exists(os.path.join(\"data\", dir)):\n", 185 | " os.makedirs(os.path.join(\"data\", dir))\n", 186 | "\n", 187 | "train.to_csv(os.path.join(\"data\", \"train\", \"x000.csv\"), index=False)\n", 188 | "val.to_csv(os.path.join(\"data\", \"val\", \"x001.csv\"), index=False)\n", 189 | "test.to_csv(os.path.join(\"data\", \"test\", \"x002.csv\"), index=False)\n", 190 | "\n", 191 | "DATA_PREFIX = os.path.join(S3_PREFIX, \"data\")\n", 192 | "DATA_URI = sagemaker_session.upload_data(\n", 193 | " path=\"data\", bucket=S3_BUCKET, key_prefix=DATA_PREFIX\n", 194 | ")\n", 195 | "print(f\"Sequence data available at {DATA_URI}\")" 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": { 201 | "tags": [] 202 | }, 203 | "source": [ 204 | "---\n", 205 | "## 3. Configure NVIDIA NGC API Credentials" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "Before you create a BioNeMo training job, follow these steps to generate some NGC API credentials and store them in AWS Secrets Manager. \n", 213 | "\n", 214 | "1. Sign in to or create a new account at NVIDIA [NGC](https://ngc.nvidia.com/signin).\n", 215 | "2. 
Select your name in the top-right corner of the screen and then \"Setup\".\n", 216 | "\n", 217 | "![Select Setup from the top-right menu](img/1-setup.png)\n", 218 | "\n", 219 | "3. Select \"Generate API Key\".\n", 220 | "\n", 221 | "![Select Generate API Key](img/2-api-key.png)\n", 222 | "\n", 223 | "4. Select the green \"+ Generate API Key\" button and confirm.\n", 224 | "\n", 225 | "![Select green Generate API Key button ](img/3-generate.png)\n", 226 | "\n", 227 | "5. Copy the API key - this is the last time you can retrieve it!\n", 228 | "\n", 229 | "6. Before you leave the NVIDIA NGC site, also take note of your organization ID listed under your name in the top-right corner of the screen. You'll need this, plus your API key, to download BioNeMo artifacts.\n", 230 | "\n", 231 | "7. Navigate to the AWS Console and then to AWS Secrets Manager.\n", 232 | "\n", 233 | "![Navigate to AWS Secrets Manager](img/4-sm.png)\n", 234 | "\n", 235 | "8. Select \"Store a new secret\".\n", 236 | "9. Under \"Secret type\" select \"Other type of secret\".\n", 237 | "\n", 238 | "![Select other type of secret](img/5-secret-type.png)\n", 239 | "\n", 240 | "10. Under \"Key/value\" pairs, add a key named \"NGC_CLI_API_KEY\" with a value of your NGC API key. Add another key named \"NGC_CLI_ORG\" with a value of your NGC organization. Select Next.\n", 241 | "\n", 242 | "11. Under \"Configure secret - Secret name and description\", name your secret \"NVIDIA_NGC_CREDS\" and select Next. You'll use this secret name when submitting BioNeMo jobs to SageMaker.\n", 243 | "\n", 244 | "12. Select the remaining default options to create your secret.\n" 245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "metadata": {}, 250 | "source": [ 251 | "## 4. 
Submit ESM-1nv Training Job" 252 | ] 253 | }, 254 | { 255 | "cell_type": "code", 256 | "execution_count": null, 257 | "metadata": { 258 | "scrolled": true, 259 | "tags": [] 260 | }, 261 | "outputs": [], 262 | "source": [ 263 | "import os\n", 264 | "from sagemaker.experiments.run import Run\n", 265 | "from sagemaker.pytorch import PyTorch\n", 266 | "\n", 267 | "# Replace this with your ECR repository URI from above\n", 268 | "BIONEMO_IMAGE_URI = (\n", 269 | " \".dkr.ecr..amazonaws.com/bionemo-training:latest\"\n", 270 | ")\n", 271 | "\n", 272 | "bionemo_estimator = PyTorch(\n", 273 | " base_job_name=\"bionemo-training\",\n", 274 | " distribution={\"torch_distributed\": {\"enabled\": True}},\n", 275 | " entry_point=\"train.py\",\n", 276 | " hyperparameters={\n", 277 | " \"config-name\": \"esm1nv-training\", # This is the name of your config file, without the extension\n", 278 | " \"model-name\": \"esm1nv\", # If you don't provide this as a hyperparameter, it will be inferred from the name field in the config file\n", 279 | " \"download-pretrained-weights\": True, # Required to fine-tune from pretrained weights. 
Set to False for pretraining.\n", 280 | " \"ngc-cli-secret-name\": \"NVIDIA_NGC_CREDS\" # Replace this if you used a different name above.\n", 281 | " },\n", 282 | " image_uri=BIONEMO_IMAGE_URI,\n", 283 | " instance_count=1, # Update this value for multi-node training\n", 284 | " instance_type=\"ml.g5.2xlarge\", # Update this value for other instance types\n", 285 | " output_path=os.path.join(S3_FOLDER, \"model\"),\n", 286 | " role=SAGEMAKER_EXECUTION_ROLE,\n", 287 | " sagemaker_session=sagemaker_session,\n", 288 | " source_dir=\"src\",\n", 289 | ")\n", 290 | "\n", 291 | "with Run(\n", 292 | " experiment_name=EXPERIMENT_NAME,\n", 293 | " sagemaker_session=sagemaker_session,\n", 294 | ") as run:\n", 295 | " bionemo_estimator.fit(\n", 296 | " inputs={\n", 297 | " \"train\": os.path.join(DATA_URI, \"train\"),\n", 298 | " \"val\": os.path.join(DATA_URI, \"val\"),\n", 299 | " },\n", 300 | " wait=False,\n", 301 | " )" 302 | ] 303 | }, 304 | { 305 | "cell_type": "code", 306 | "execution_count": null, 307 | "metadata": {}, 308 | "outputs": [], 309 | "source": [] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": null, 314 | "metadata": {}, 315 | "outputs": [], 316 | "source": [] 317 | } 318 | ], 319 | "metadata": { 320 | "kernelspec": { 321 | "display_name": "conda_pytorch_p310", 322 | "language": "python", 323 | "name": "conda_pytorch_p310" 324 | }, 325 | "language_info": { 326 | "codemirror_mode": { 327 | "name": "ipython", 328 | "version": 3 329 | }, 330 | "file_extension": ".py", 331 | "mimetype": "text/x-python", 332 | "name": "python", 333 | "nbconvert_exporter": "python", 334 | "pygments_lexer": "ipython3", 335 | "version": "3.10.14" 336 | } 337 | }, 338 | "nbformat": 4, 339 | "nbformat_minor": 4 340 | } 341 | --------------------------------------------------------------------------------
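A note on calling the deployed endpoint from outside the notebooks: the deploy-ESM-embeddings-server notebook exercises the endpoint through `sagemaker.predictor.Predictor`, but once the endpoint is live any application can reach it with the low-level boto3 runtime client. Below is a minimal sketch; the endpoint name `esm-embeddings` is an assumption (SageMaker assigns the actual name at deploy time - check `esm_embeddings_predictor.endpoint_name` or the console), and the comma-separated payload mirrors the `CSVSerializer` the notebook attaches at deploy time.

```python
def build_csv_payload(sequences):
    """Join protein sequences into the comma-separated body that the
    notebook's CSVSerializer produces for predictor.predict()."""
    return ",".join(seq.strip() for seq in sequences)


def get_embeddings(sequences, endpoint_name="esm-embeddings"):
    """Invoke the deployed embeddings endpoint via the low-level API.

    `endpoint_name` is an assumption; substitute the name SageMaker
    assigned when esm_embeddings.deploy(...) ran.
    """
    import boto3  # AWS SDK for Python; preinstalled on SageMaker notebook kernels

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=build_csv_payload(sequences),
    )
    # Raw response bytes; the notebook's NumpyDeserializer performs the
    # equivalent parsing on the Predictor path.
    return response["Body"].read()
```

Calling `get_embeddings` with a list of amino-acid sequences sends the same kind of payload as the `esm_embeddings_predictor.predict(...)` test cell in the deploy notebook.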