├── .gitignore ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── architecture_diagram.png ├── env.env ├── examples ├── lgbm │ ├── conf │ │ └── conf.yaml │ ├── dag.png │ ├── data_source.md │ ├── evaluate │ │ ├── __init__.py │ │ ├── evaluate.py │ │ └── requirements.txt │ ├── preprocessing │ │ ├── __init__.py │ │ ├── preprocessing.py │ │ └── requirements.txt │ ├── requirements.txt │ ├── training │ │ ├── __init__.py │ │ ├── requirements.txt │ │ └── training.py │ └── transform │ │ ├── __init__.py │ │ ├── docker │ │ ├── Dockerfile │ │ ├── dockerd-entrypoint.py │ │ ├── model_script.py │ │ └── readme.md │ │ ├── requirements.txt │ │ └── transform.py ├── llm-text-summarization │ ├── conf │ │ └── conf.yaml │ ├── dag.png │ ├── data_source.md │ ├── preprocessing │ │ ├── __init__.py │ │ └── preprocessing.py │ ├── requirements.txt │ ├── training │ │ ├── __init__.py │ │ ├── requirements.txt │ │ └── training.py │ └── transform │ │ └── inference.py ├── multi-model-example │ ├── MultiModel.md │ ├── cal_housing_pca │ │ ├── conf │ │ │ └── conf-multi-model.yaml │ │ ├── data_source.md │ │ └── modelscripts │ │ │ ├── inference.py │ │ │ ├── preprocess.py │ │ │ ├── requirements.txt │ │ │ └── train.py │ ├── cal_housing_tf │ │ ├── conf │ │ │ └── conf-multi-model.yaml │ │ ├── data_source.md │ │ └── modelscripts │ │ │ ├── evaluate.py │ │ │ ├── inference.py │ │ │ ├── preprocess.py │ │ │ ├── requirements.txt │ │ │ └── train.py │ └── dag.png └── tf │ ├── conf │ └── conf.yaml │ ├── data_source.md │ ├── modelscripts │ ├── evaluate.py │ ├── inference.py │ ├── preprocess.py │ ├── requirements.txt │ └── train.py │ └── smp_dag.png ├── framework ├── .gitignore ├── __init__.py ├── conf │ ├── conf.yaml │ └── template │ │ └── conf.yaml ├── createmodel │ ├── __init__.py │ └── create_model_service.py ├── framework_entrypoint.py ├── modelmetrics │ ├── __init__.py │ └── model_metrics_service.py ├── pipeline │ ├── helper.py │ ├── model_unit.py │ └── pipeline_service.py ├── processing │ ├── __init__.py │ └── processing_service.py ├── registermodel │ ├── __init__.py │ └── register_model_service.py ├── training │ ├── __init__.py │ └── training_service.py ├── transform │ ├── __init__.py │ └── transform_service.py └── utilities │ ├── __init__.py │ ├── configuration.py │ ├── logger.py │ └── utils.py └── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | **/.DS_Store 2 | *.ipynb 3 | *.ipynb_checkpoints* 4 | *__pycache__* 5 | pipeline_definition.json 6 | *env.txt 7 | .idea 8 | git_erase_commits 9 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 
5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *main* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional document on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT No Attribution 2 | 3 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy of
6 | this software and associated documentation files (the "Software"), to deal in
7 | the Software without restriction, including without limitation the rights to
8 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
9 | the Software, and to permit persons to whom the Software is furnished to do so.
10 |
11 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
12 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
13 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
14 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
15 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
16 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
17 |
18 |
-------------------------------------------------------------------------------- /README.md: --------------------------------------------------------------------------------
1 | ## Dynamic SageMaker Pipelines Framework
2 | In this repo, we present a framework for automating SageMaker Pipelines DAG creation based on simple configuration files. The framework code starts by reading the configuration file(s) and then dynamically creates a SageMaker Pipelines DAG based on the steps declared in the configuration file(s) and the interactions/dependencies among steps. This orchestration framework caters to both single-model and multi-model use cases and ensures a smooth flow of data and processes.
3 |
4 | ### Architecture Diagram
5 |
6 | The following architecture diagram depicts how the proposed framework can be used during both experimentation and operationalization of ML models.
7 |
8 | ![architecture-diagram](./architecture_diagram.png)
9 |
10 | ### Repository Structure
11 |
12 | This repo contains the following directories and files:
13 |
14 | - **/framework/conf/**: This directory contains a configuration file that is used to set common variables across all modeling units, such as subnets, security groups, and the IAM role, at runtime. A modeling unit is a sequence of up to 6 steps for training an ML model
15 | - **/framework/createmodel/**: This directory contains a Python script that creates a SageMaker Model object based on model artifacts from a Training Step
16 | - **/framework/modelmetrics/**: This directory contains a Python script that creates a SageMaker Processing job for generating a model metrics JSON report for a trained model
17 | - **/framework/pipeline/**: This directory contains Python scripts that leverage Python classes defined in other framework directories to create or update a SageMaker Pipelines DAG based on the specified configurations. The model_unit.py script is used by pipeline_service.py to create one or more modeling units. Each modeling unit is a sequence of up to 6 steps for training an ML model: process, train, create model, transform, metrics, and register model. Configurations for each modeling unit should be specified in the model's respective repository.
The pipeline_service.py script also sets dependencies among SageMaker Pipelines steps (i.e., how steps within and across modeling units are sequenced and/or chained) based on the sagemakerPipeline section, which should be defined in the configuration file of one of the model repositories (i.e., the anchor model)
18 | - **/framework/processing/**: This directory contains a Python script that creates a generic SageMaker Processing job
19 | - **/framework/registermodel/**: This directory contains a Python script for registering a trained model along with its calculated metrics in SageMaker Model Registry
20 | - **/framework/training/**: This directory contains a Python script that creates a SageMaker Training job
21 | - **/framework/transform/**: This directory contains a Python script that creates a SageMaker Batch Transform job. In the context of model training, this is used to calculate the performance of a trained model on test data
- **/framework/utilities/**: This directory contains utility scripts for reading and joining configuration files, as well as logging
22 | - **/framework_entrypoint.py**: This file is the entry point of the framework code. It simply calls a function defined in the /framework/pipeline/ directory to create or update a SageMaker Pipelines DAG and execute it
23 | - **/examples/**: This directory contains several examples of how this automation framework can be used to create simple and complex training DAGs
24 | - **/env.env**: This file allows you to set common variables such as subnets, security groups, and the IAM role as environment variables
25 | - **/requirements.txt**: This file specifies the Python libraries that are required for the framework code
26 |
27 | ### Deployment Guide
28 |
29 | Follow the steps below in order to deploy the solution:
30 |
31 | 1. Organize your model training repository, for example according to the following structure:
32 |
33 | ```
34 |
35 | .
36 | ├── <model-name>
37 | | ├── conf
38 | | | └── conf.yaml
39 | | └── scripts
40 | | ├── preprocess.py
41 | | ├── train.py
42 | | ├── transform.py
43 | | └── evaluate.py
44 | └── README.md
45 | ```
46 |
47 | 1. Clone the framework code and your model(s) source code from the Git repositories:
48 |
49 | a. Clone the `dynamic-sagemaker-pipelines-framework` repo into a training directory. Here we assume the training directory is called `aws-train`:
50 |
51 |
52 | git clone https://github.com/aws-samples/dynamic-sagemaker-pipelines-framework.git aws-train
53 |
54 | b. Clone the model(s) source code under the same directory. For multi-model training, repeat this step for as many models as you need to train.
55 |
56 |
57 | git clone https://<your-model-repo>.git aws-train
58 |
59 | For single-model training, your directory should look like:
60 |
61 |
62 | .
63 | ├── framework
64 | └── <model-name>
65 |
66 | For multi-model training, your directory should look like:
67 |
68 |
69 | .
70 | ├── framework
71 | └── <model-name-1>
72 | └── <model-name-2>
73 | └── <model-name-n>
74 |
75 |
76 | 1. Set up the following environment variables. Asterisks indicate environment variables that are required, while the rest are optional.
77 |
78 |
79 | | Environment Variable | Description |
80 | | ---------------------| ------------|
81 | | SMP_ACCOUNTID* | AWS Account where SageMaker Pipeline is executed |
82 | | SMP_REGION* | AWS Region where SageMaker Pipeline is executed |
83 | | SMP_S3BUCKETNAME* | Amazon S3 bucket name |
84 | | SMP_ROLE* | Amazon SageMaker execution role |
85 | | SMP_MODEL_CONFIGPATH* | Relative path of the single-model or multi-model configuration file(s) |
86 | | SMP_SUBNETS | Subnet IDs for SageMaker networking configuration |
87 | | SMP_SECURITYGROUPS | Security group IDs for SageMaker networking configuration |
88 |
89 | Note:
90 |
91 | a. For single-model use cases: SMP_MODEL_CONFIGPATH="<model-name>/conf/conf.yaml"
92 |
93 | b. For multi-model use cases: SMP_MODEL_CONFIGPATH="*/conf/conf.yaml"
94 |
95 | During experimentation (i.e., local testing), you can specify environment variables inside the env.env file and then export them by executing the following command in your terminal:
96 |
97 | ```bash
98 | source env.env
99 | ```
100 | During operationalization, these environment variables will be set by the CI pipeline.
101 |
102 | Note: All environment variable values should be enclosed in quotation marks, as in the following example:
103 |
104 | SMP_SUBNETS="subnet-xxxxxxxx,subnet-xxxxxxxx"
105 | SMP_ROLE="arn:aws:iam::xxxxxxxxxxxx:role/xxxxx"
106 | SMP_SECURITYGROUPS="sg-xxxxxxxx"
107 | SMP_S3BUCKETNAME="your-bucket"
108 | SMP_MODEL_CONFIGPATH="your-path-absolute-or-relative/*/conf/*.yaml"
109 | SMP_ACCOUNTID="xxxxxxxxxxxx"
110 | SMP_REGION="your-aws-region"
111 |
112 | 1. Recommendations for how the security groups, VPCs, IAM roles, buckets, and subnets should be set up.
113 | Please be aware that these recommendations need to be considered at the time the `SageMaker Domain` is created.
114 |
115 | As best practice:
116 | - Create IAM roles with the minimum permissions required for the ML activities you want to support, using SageMaker Role Manager. This tool provides predefined policies that can be customized.
117 |
118 | - Set up VPCs with public and private subnets across multiple Availability Zones for fault tolerance. Configure security groups to allow access to SageMaker endpoints only from the private subnets.
119 |
120 | - Create S3 buckets in the same region as the SageMaker domain for storing model artifacts and data. The buckets should be encrypted using AWS KMS.
121 |
122 | - When configuring networking, choose the "VPC" connection option. Select the VPC and subnets created earlier.
123 |
124 | - Review and customize the IAM roles and policies generated by SageMaker Role Manager to meet your specific access and governance needs.
125 |
126 | Refer to the AWS documentation on SageMaker domain setup for organizations for step-by-step instructions on configuring the above resources through the SageMaker console wizard.
127 |
128 | Sources:
129 | [1][Amazon SageMaker simplifies setting up SageMaker domain for enterprises to onboard their users to SageMaker](https://aws.amazon.com/blogs/machine-learning/amazon-sagemaker-simplifies-setting-up-sagemaker-domain-for-enterprises-to-onboard-their-users-to-sagemaker/)
130 | [2][Choose an Amazon VPC - Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-vpc.html)
131 | [3][Connect to SageMaker Within your VPC - Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/interface-vpc-endpoint.html)
132 |
133 |
134 | 1.
Create and activate a virtual environment: 135 | 136 | ```bash 137 | python -m venv .venv 138 | source .venv/bin/activate 139 | ``` 140 | 141 | 1. Install required python packages: 142 | 143 | ```bash 144 | pip install -r requirements.txt 145 | ``` 146 | 147 | 1. Edit your model training conf.yaml file(s). Please see the Configuration Files Structure section for more details. 148 | 149 | 1. From terminal, call framework’s entry point to create or update and execute the SageMaker Pipeline training DAG: 150 | 151 | ```bash 152 | python framework/framework_entrypoint.py 153 | ``` 154 | 155 | 1. View and debug the SageMaker Pipelines execution in the Pipelines tab of SageMaker Studio UI. 156 | 157 | 158 | 159 | ### Configuration Files Structure 160 | 161 | There are two types of configuration files in the proposed solution: 1) Framework configuration, and 2) Model configuration(s). Below we describe each in details. 162 | 163 | #### Framework Configuration 164 | 165 | The `/framework/conf/conf.yaml` is used to set variables that are common across all modeling units. This includes SMP_S3BUCKETNAME, SMP_ROLE, SMP_MODEL_CONFIGPATH, SMP_SUBNETS, SMP_SECURITYGROUPS, and SMP_MODELNAME. Please see step 3 of the `Deployment Guide section` for descriptions of these variables and how to set them via environment variables. 166 | 167 | #### Model Configuration(s) 168 | 169 | For each model in the project, we need to specify the following in the /conf/conf.yaml file (Asterisks indicate required fields, while the rest are optional): 170 | 171 | - **/conf/models***: Under this section, one or more modeling units can be configured. When the framework code is executed, it will automatically read all configuration files during run-time and append them to the config tree. Theoretically, you can specify all modeling units in the same conf.yaml file, but it is recommended to specify each modeling unit configuration in its respective Git repository to minimize errors. 172 | 173 | - **{model-name}***: The name of the model. 174 | - **source_directory***: A common source_dir path to use for all steps within the modeling unit 175 | - **[processing](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-processing)**: This section specifies preprocessing parameters below. Please see [Amazon SageMaker documentation](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.ProcessingStep) for descriptions of each parameter 176 | 177 | ``` 178 | image_uri*: 179 | entry_point*: 180 | base_job_name: 181 | instance_count: # default value: 1 182 | instance_type: # default value: "ml.m5.2xlarge" 183 | volume_size_in_gb: # default value: 32 184 | max_runtime_seconds: # default value: 3000 185 | tags: # default value: None 186 | env: # default value: None 187 | framework_version: # default value: "0" 188 | s3_data_distribution_type: # default value: "FullyReplicated" 189 | s3_data_type: # default value: "S3Prefix" 190 | s3_input_mode: # default value: "File" 191 | s3_upload_mode: # default value: "EndOfJob" 192 | channels: 193 | train: 194 | dataFiles: 195 | - sourceName: 196 | fileName*: 197 | ``` 198 | Note: 199 | 200 | a. dataFiles are loaded to container at "_/opt/ml/processing/input/{sourceName}/_" path 201 | 202 | b. 
SageMaker offloads the content from "_/opt/ml/processing/input/{channelName}/_" container path to S3 when the processing job is complete 203 | 204 | - **[train*](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-training)**: This section specifies training job parameters below. Please see [Amazon SageMaker documentation](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TrainingStep) for descriptions of each parameter 205 | 206 | ``` 207 | image_uri*: 208 | entry_point*: 209 | base_job_name: 210 | instance_count: # default value: 1 211 | instance_type: # default value: "ml.m5.2xlarge" 212 | volume_size_in_gb: # default value: 32 213 | max_runtime_seconds: # default value: 3000 214 | tags: 215 | env: 216 | hyperparams: 217 | model_data_uri: 218 | channels: 219 | train*: 220 | dataFiles: 221 | - sourceName: 222 | fileName: 223 | test: 224 | dataFiles: 225 | - sourceName: 226 | fileName: 227 | ``` 228 | 229 | Note: 230 | 231 | a. dataFiles are loaded to container at "_/opt/ml/input/data/{channelName}/_" path (also accessible via environment variable "_SM_CHANNEL\_{channelName}_") 232 | 233 | b. SageMaker zips trained model artifacts from "_/opt/ml/model/_" container path and uploads to S3 234 | 235 | - **[transform*](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-transform)**: This section specifies SageMaker Transform job parameters below for making predictions on the test data. Please see [Amazon SageMaker documentation](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TransformStep) for descriptions of each parameter 236 | 237 | ``` 238 | image_uri*: 239 | base_job_name: # default value: "default-transform-job-name" 240 | instance_count: # default value: 1 241 | instance_type: # default value: "ml.m5.2xlarge" 242 | strategy: 243 | assemble_with: 244 | join_source: 245 | split_type: 246 | content_type: # default value: "text/csv" 247 | max_payload: 248 | volume_size: # default value: 50 249 | max_runtime_in_seconds: # default value: 3600 250 | input_filter: 251 | output_filter: 252 | tags: 253 | env: 254 | channels: 255 | test: 256 | s3BucketName: 257 | dataFiles: 258 | - sourceName: 259 | fileName: 260 | ``` 261 | 262 | Note: 263 | 264 | a. Results of the batch transform job are stored in S3 bucket with name s3BucketName. This S3 bucket is also used to stage local input files specified in _fileName_ 265 | 266 | b. Only one channel and one dataFile in that channel are allowed for the transform step 267 | 268 | 269 | - **[evaluate](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#property-file)**: This section specifies SageMaker Processing job parameters for generating a model metrics JSON report for the trained model. 
Please see [Amazon SageMaker documentation](https://sagemaker.readthedocs.io/en/stable/api/inference/model_monitor.html#sagemaker.model_metrics.ModelMetrics) for descriptions of each parameter 270 | 271 | ``` 272 | image_uri*: 273 | entry_point*: 274 | base_job_name: 275 | instance_count: # default value: 1 276 | instance_type: # default value: "ml.m5.2xlarge" 277 | strategy: # default value: "SingleRecord" 278 | max_payload: 279 | volume_size_in_gb: # default value: 50 280 | max_runtime_in_seconds: # default value: 3600 281 | s3_data_distribution_type: # default value: "FullyReplicated" 282 | s3_data_type: # default value: "S3Prefix" 283 | s3_input_mode: # default value: "File" 284 | tags: 285 | env: 286 | channels: 287 | test: 288 | s3BucketName: 289 | dataFiles: 290 | - sourceName: 291 | fileName: 292 | 293 | ``` 294 | 295 | Note: 296 | 297 | a. dataFiles are loaded to container at "_/opt/ml/processing/input/{sourceName}/_" path 298 | 299 | b. Only one channel and one dataFile in that channel is allowed for evaluate step 300 | 301 | c. SageMaker offloads the content from "_/opt/ml/processing/input/{channelName}/_" container path to S3 302 | 303 | - **[registry*](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-register-model)**: This section specifies parameters for registering the trained model in SageMaker Model Registry 304 | - **ModelRepack**: If "True", uses entry_point in the transform step for inference entry_point when serving the model on SageMaker 305 | - **[ModelPackageDescription](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.step_collections.RegisterModel)** 306 | - **InferenceSpecification**: This section includes inference specifications of the model package. Please see [Amazon SageMaker documentation](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.step_collections.RegisterModel) for descriptions of each paramater 307 | 308 | ``` 309 | image_uri*: 310 | supported_content_types*: 311 | - application/json 312 | supported_response_MIME_types*: 313 | - application/json 314 | approval_status*: # PendingManualApproval | Rejected | Approved 315 | ``` 316 | 317 | - **/conf/sagemakerPipeline***: This section is used to define SageMaker Pipelines flow including dependencies among steps. For single-model use cases, this section is defined at the end of the configuration file. For multi-model use cases, the sagemakerPipeline section only needs to be defined in configuration file of one of the models (any of the models). We refer to this model as the anchor model. 318 | 319 | - **pipelineName***: Name of the SageMaker Pipeline 320 | - **models***: Nested list of modeling units 321 | - **{model-name}***: Model identifier which should match a {model-name} identifier in the /conf/models section. 322 | - **steps***: 323 | - **step_name***: Step name to be displayed in the SageMaker Pipelines DAG. 324 | - **step_class***: (Union[Processing, Training, CreateModel, Transform, Metrics, RegisterModel]) 325 | - **step_type***: This parameter is only required for preprocess steps, for which it should be set to preprocess. This is needed to distinguish preprocess and evaluate steps, both of which have a step_class of Processing. 326 | - **enable_cache**: ([Union[True, False]]) - whether to enable Sagemaker Pipelines caching for this step or not. 
327 | - **chain_input_source_step**: ([list[step_name]]) – This can be used to set the channel outputs of another step as input to this step. 328 | - **chain_input_additional_prefix**: This is only allowed for steps of the Transform step_class; and can be used in conjunction with chain_input_source_step parameter to pinpoint the file that should be used as the input to the Transform step. 329 | - **dependencies**: This section is used to specify the sequence in which the SageMaker Pipelines steps should be executed. We have adapted the Apache Airflow notation for this section (i.e., {step_name} >> {step_name}). If this section is left blank, explicit dependencies specified by chain_input_source_step parameter and/or implicit dependencies define the Sagemaker Pipelines DAG flow. 330 | 331 | 332 | ## Security 333 | 334 | See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information. 335 | 336 | ## License 337 | 338 | This library is licensed under the MIT-*License. See the LICENSE file. 339 | -------------------------------------------------------------------------------- /architecture_diagram.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/architecture_diagram.png -------------------------------------------------------------------------------- /env.env: -------------------------------------------------------------------------------- 1 | export SMP_SUBNETS="subnet-xxxxxxxx,subnet-xxxxxxxx" 2 | export SMP_ROLE="arn:aws:iam::xxxxxxxxxxxx:role/xxxxx" 3 | export SMP_SECURITYGROUPS="sg-xxxxxxxx" 4 | export SMP_S3BUCKETNAME="your-bucket" 5 | export SMP_MODEL_CONFIGPATH="your-path-absolute-or-relative//conf/.yaml" 6 | export SMP_ACCOUNTID=”xxxxxxxxxxxx” 7 | export SMP_REGION="your-aws-region" 8 | -------------------------------------------------------------------------------- /examples/lgbm/conf/conf.yaml: -------------------------------------------------------------------------------- 1 | --- 2 | conf: 3 | models: 4 | lgbm: 5 | source_directory: examples/lgbm 6 | train: 7 | instance_type: ml.c5.xlarge 8 | image_uri: SMP_ACCOUNTID.dkr.ecr.SMP_REGION.amazonaws.com/pytorch-training:1.9.0-cpu-py38 9 | entry_point: training/training.py 10 | base_job_name: lightgbm-train 11 | channels: 12 | train: 13 | dataFiles: 14 | - sourceName: online_shoppers_intention_train 15 | fileName: s3://SMP_S3BUCKETNAME/lightGBM/train 16 | test: 17 | dataFiles: 18 | - sourceName: online_shoppers_intention_test 19 | fileName: s3://SMP_S3BUCKETNAME/lightGBM/test 20 | 21 | registry: 22 | ModelRepack: "False" 23 | InferenceSpecification: 24 | image_uri: "SMP_ACCOUNTID.dkr.ecr.SMP_REGION.amazonaws.com/lightgbm-inference:lightgbm-i0.0" 25 | supported_content_types: 26 | - application/json 27 | supported_response_MIME_types: 28 | - application/json 29 | approval_status: PendingManualApproval 30 | 31 | transform: 32 | instance_type: ml.c5.xlarge 33 | image_uri: "refer-transform/docker-to-built-inference-image" 34 | entry_point: transform/transform.py 35 | content_type: application/x-npy 36 | channels: 37 | test: 38 | s3BucketName: SMP_S3BUCKETNAME 39 | dataFiles: 40 | - sourceName: online_shoppers_intention_test 41 | fileName: s3://SMP_S3BUCKETNAME/lightGBM/test/x_test.npy 42 | 43 | evaluate: 44 | instance_type: ml.c5.xlarge 45 | image_uri: 'SMP_ACCOUNTID.dkr.ecr.SMP_REGION.amazonaws.com/pytorch-training:1.9.0-cpu-py38' 46 | entry_point: 
evaluate/evaluate.py 47 | base_job_name: lgbm-evaluate 48 | content_type: application/json 49 | channels: 50 | test: 51 | s3BucketName: SMP_S3BUCKETNAME 52 | dataFiles: 53 | - sourceName: online_shoppers_intention_ytest 54 | fileName: s3://SMP_S3BUCKETNAME/lightGBM/test/y_test.npy 55 | 56 | sagemakerPipeline: 57 | pipelineName: lgbm-test 58 | models: 59 | lgbm: 60 | steps: 61 | - step_name: lgbm-Training 62 | step_class: Training 63 | enable_cache: True 64 | - step_name: lgbm-CreateModel 65 | step_class: CreateModel 66 | - step_name: lgbm-Transform 67 | step_class: Transform 68 | enable_cache: True 69 | - step_name: lgbm-Metrics 70 | step_class: Metrics 71 | chain_input_source_step: 72 | - lgbm-Transform 73 | enable_cache: True 74 | - step_name: lgbm-Register 75 | step_class: RegisterModel 76 | 77 | dependencies: 78 | - lgbm-Training >> lgbm-CreateModel >> lgbm-Transform >> lgbm-Metrics >> lgbm-Register 79 | -------------------------------------------------------------------------------- /examples/lgbm/dag.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/examples/lgbm/dag.png -------------------------------------------------------------------------------- /examples/lgbm/data_source.md: -------------------------------------------------------------------------------- 1 | # LGBM Data Source 2 | 3 | ## Data Information 4 | We use the Online Shoppers Purchasing Intention Dataset. 5 | 6 | More info on the dataset: 7 | 8 | This dataset was obtained from UCI's Machine Learning Library. https://archive.ics.uci.edu/dataset/468/online+shoppers+purchasing+intention+dataset 9 | 10 | 11 | ## Data Download 12 | Download the data locally from [here](https://archive.ics.uci.edu/static/public/468/online+shoppers+purchasing+intention+dataset.zip). The data file is named **online_shoppers_intention.csv** 13 | 14 | Then reference the preprocessing script(written for sagemaker processing jobs) to create train test splits. 15 | 16 | ## Upload to S3 17 | ``` 18 | aws s3 cp /x_train.npy s3:///lightGBM/train 19 | aws s3 cp /y_train.npy s3:///lightGBM/train 20 | 21 | aws s3 cp /x_test.npy s3:///lightGBM/test 22 | aws s3 cp /y_test.npy s3:///lightGBM/test 23 | 24 | ``` -------------------------------------------------------------------------------- /examples/lgbm/evaluate/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/examples/lgbm/evaluate/__init__.py -------------------------------------------------------------------------------- /examples/lgbm/evaluate/evaluate.py: -------------------------------------------------------------------------------- 1 | import glob 2 | import lightgbm as lgb 3 | import numpy as np 4 | from sklearn.metrics import accuracy_score, roc_auc_score 5 | import pathlib, json 6 | 7 | 8 | 9 | if __name__=='__main__': 10 | 11 | print('Loading data . . . 
.') 12 | y_test= np.load(glob.glob('{}/*.npy'.format('/opt/ml/processing/input/online_shoppers_intention_ytest'))[0]) 13 | # y_pred= glob.glob('{}/*.out'.format('/opt/ml/processing/input'))[0] 14 | text_file= open(glob.glob('{}/*.out'.format('/opt/ml/processing/input/lgbm-Transform-test'))[0], "r") 15 | y_pred= np.array([float(i) for i in text_file.read()[1:-1].split(',')]) 16 | 17 | print('\ny_pred shape: \n{}\n'.format(y_pred.shape)) 18 | print('\ny_test shape: \n{}\n'.format(y_test.shape)) 19 | 20 | 21 | print('Evaluating model . . . .\n') 22 | acc = accuracy_score(y_test.astype(int), y_pred.astype(int)) 23 | auc = roc_auc_score(y_test, y_pred) 24 | print('Accuracy: {:.2f}'.format(acc)) 25 | print('AUC Score: {:.2f}'.format(auc)) 26 | 27 | output_dir = "/opt/ml/processing/output" 28 | pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True) 29 | 30 | report_dict = { 31 | "evaluation": { 32 | "metrics": { 33 | "Accuracy": '{:.2f}'.format(acc), "AUC_Score": '{:.2f}'.format(auc) 34 | } 35 | } 36 | } 37 | 38 | evaluation_path = f"{output_dir}/model_evaluation_metrics.json" 39 | with open(evaluation_path, "w") as f: 40 | f.write(json.dumps(report_dict)) 41 | 42 | -------------------------------------------------------------------------------- /examples/lgbm/evaluate/requirements.txt: -------------------------------------------------------------------------------- 1 | lightgbm 2 | numpy 3 | pandas 4 | scikit-learn -------------------------------------------------------------------------------- /examples/lgbm/preprocessing/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/examples/lgbm/preprocessing/__init__.py -------------------------------------------------------------------------------- /examples/lgbm/preprocessing/preprocessing.py: -------------------------------------------------------------------------------- 1 | import glob 2 | import numpy as np 3 | import os 4 | import pandas as pd 5 | from sklearn.model_selection import train_test_split 6 | 7 | 8 | if __name__=='__main__': 9 | 10 | 11 | input_file = glob.glob('{}/*.csv'.format('/opt/ml/processing/input')) 12 | print('\nINPUT FILE: \n{}\n'.format(input_file)) 13 | df = pd.read_csv(input_file[0]) 14 | 15 | # minor preprocessing (drop some uninformative columns etc.) 16 | print('Preprocessing the dataset . . . .') 17 | df_clean = df.drop(['Month','Browser','OperatingSystems','Region','TrafficType','Weekend'], axis=1) 18 | visitor_encoded = pd.get_dummies(df_clean['VisitorType'], prefix='Visitor_Type', drop_first = True) 19 | df_clean_merged = pd.concat([df_clean, visitor_encoded], axis=1).drop(['VisitorType'], axis=1) 20 | X = df_clean_merged.drop('Revenue', axis=1) 21 | y = df_clean_merged['Revenue'] 22 | 23 | # split the preprocessed data with stratified sampling for class imbalance 24 | X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=2, test_size=.2) 25 | 26 | # save to container directory for uploading to S3 27 | print('Saving the preprocessed dataset . . . 
.') 28 | train_data_output_path = os.path.join('/opt/ml/processing/train', 'x_train.npy') 29 | np.save(train_data_output_path, X_train.to_numpy()) 30 | train_labels_output_path = os.path.join('/opt/ml/processing/train', 'y_train.npy') 31 | np.save(train_labels_output_path, y_train.to_numpy()) 32 | test_data_output_path = os.path.join('/opt/ml/processing/test', 'x_test.npy') 33 | np.save(test_data_output_path, X_test.to_numpy()) 34 | test_labels_output_path = os.path.join('/opt/ml/processing/test', 'y_test.npy') 35 | np.save(test_labels_output_path, y_test.to_numpy()) -------------------------------------------------------------------------------- /examples/lgbm/preprocessing/requirements.txt: -------------------------------------------------------------------------------- 1 | lightgbm 2 | numpy 3 | pandas 4 | scikit-learn -------------------------------------------------------------------------------- /examples/lgbm/requirements.txt: -------------------------------------------------------------------------------- 1 | lightgbm 2 | numpy 3 | pandas 4 | scikit-learn -------------------------------------------------------------------------------- /examples/lgbm/training/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/examples/lgbm/training/__init__.py -------------------------------------------------------------------------------- /examples/lgbm/training/requirements.txt: -------------------------------------------------------------------------------- 1 | lightgbm 2 | numpy 3 | pandas 4 | scikit-learn -------------------------------------------------------------------------------- /examples/lgbm/training/training.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import glob 3 | import lightgbm as lgb 4 | import numpy as np 5 | import os 6 | 7 | 8 | if __name__=='__main__': 9 | 10 | # extract training data S3 location and hyperparameter values 11 | parser = argparse.ArgumentParser() 12 | parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN']) 13 | parser.add_argument('--validation', type=str, default=os.environ['SM_CHANNEL_TEST']) 14 | parser.add_argument('--num_leaves', type=int, default=28) 15 | parser.add_argument('--max_depth', type=int, default=5) 16 | parser.add_argument('--learning_rate', type=float, default=0.1) 17 | args = parser.parse_args() 18 | 19 | print('Loading training data from {}\n'.format(args.train)) 20 | input_files = glob.glob('{}/*.npy'.format(args.train)) 21 | print('\nTRAINING INPUT FILE LIST: \n{}\n'.format(input_files)) 22 | for file in input_files: 23 | if 'x_' in file: 24 | x_train = np.load(file) 25 | else: 26 | y_train = np.load(file) 27 | print('\nx_train shape: \n{}\n'.format(x_train.shape)) 28 | print('\ny_train shape: \n{}\n'.format(y_train.shape)) 29 | train_data = lgb.Dataset(x_train, label=y_train) 30 | 31 | print('Loading validation data from {}\n'.format(args.validation)) 32 | eval_input_files = glob.glob('{}/*.npy'.format(args.validation)) 33 | print('\nVALIDATION INPUT FILE LIST: \n{}\n'.format(eval_input_files)) 34 | for file in eval_input_files: 35 | if 'x_' in file: 36 | x_val = np.load(file) 37 | else: 38 | y_val = np.load(file) 39 | print('\nx_val shape: \n{}\n'.format(x_val.shape)) 40 | print('\ny_val shape: \n{}\n'.format(y_val.shape)) 41 | eval_data = lgb.Dataset(x_val, label=y_val) 42 | 43 | 
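    # Train a binary LightGBM classifier: the hyperparameters parsed above are passed
    # through to lgb.train below, and eval_data is supplied as the validation set so
    # binary_logloss is reported on the held-out split during boosting.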
print('Training model with hyperparameters:\n\t num_leaves: {}\n\t max_depth: {}\n\t learning_rate: {}\n' 44 | .format(args.num_leaves, args.max_depth, args.learning_rate)) 45 | parameters = { 46 | 'objective': 'binary', 47 | 'metric': 'binary_logloss', 48 | 'is_unbalance': 'true', 49 | 'boosting': 'gbdt', 50 | 'num_leaves': args.num_leaves, 51 | 'max_depth': args.max_depth, 52 | 'learning_rate': args.learning_rate, 53 | 'verbose': 1 54 | } 55 | num_round = 10 56 | bst = lgb.train(parameters, train_data, num_round, eval_data) 57 | 58 | print('Saving model . . . .') 59 | bst.save_model('/opt/ml/model/online_shoppers_model.txt') -------------------------------------------------------------------------------- /examples/lgbm/transform/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/examples/lgbm/transform/__init__.py -------------------------------------------------------------------------------- /examples/lgbm/transform/docker/Dockerfile: -------------------------------------------------------------------------------- 1 | 2 | FROM ubuntu:18.04 3 | 4 | # Set a docker label to advertise multi-model support on the container 5 | LABEL com.amazonaws.sagemaker.capabilities.multi-models=true 6 | # Set a docker label to enable container to use SAGEMAKER_BIND_TO_PORT environment variable if present 7 | LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true 8 | 9 | # Install necessary dependencies for MMS and SageMaker Inference Toolkit 10 | RUN apt-get update && \ 11 | apt-get -y install --no-install-recommends \ 12 | build-essential \ 13 | ca-certificates \ 14 | openjdk-8-jdk-headless \ 15 | python3-dev \ 16 | curl \ 17 | vim \ 18 | && rm -rf /var/lib/apt/lists/* \ 19 | && curl -O https://bootstrap.pypa.io/pip/3.6/get-pip.py \ 20 | && python3 get-pip.py 21 | 22 | RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1 23 | RUN update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1 24 | 25 | RUN pip install lightgbm numpy pandas \ 26 | scikit-learn multi-model-server \ 27 | sagemaker-inference retrying 28 | 29 | # Copy entrypoint script to the image 30 | COPY dockerd-entrypoint.py /usr/local/bin/dockerd-entrypoint.py 31 | RUN chmod +x /usr/local/bin/dockerd-entrypoint.py 32 | 33 | RUN mkdir -p /home/model-server/ 34 | 35 | # Copy the default custom service file to handle incoming data and inference requests 36 | COPY model_script.py /home/model-server/model_script.py 37 | 38 | # Define an entrypoint script for the docker image 39 | ENTRYPOINT ["python", "/usr/local/bin/dockerd-entrypoint.py"] 40 | 41 | # Define command to be passed to the entrypoint 42 | CMD ["serve"] 43 | 44 | # Define healthcheck 45 | HEALTHCHECK CMD curl --fail http://localhost:8080/ping || exit 1 46 | 47 | # Add and set a non-root user. Issue with sagemaker inference with non-root users linked here- https://github.com/aws/sagemaker-inference-toolkit/issues/72. Please comment the lines below until linked issue is resolved. 
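# The two lines below create a non-root user and switch to it; if the toolkit issue
# referenced above affects your deployment, comment them out so the model server runs as root.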
48 | RUN useradd -m nonroot 49 | USER nonroot 50 | -------------------------------------------------------------------------------- /examples/lgbm/transform/docker/dockerd-entrypoint.py: -------------------------------------------------------------------------------- 1 | 2 | import subprocess 3 | import sys 4 | import shlex 5 | import os 6 | from retrying import retry 7 | from subprocess import CalledProcessError 8 | from sagemaker_inference import model_server 9 | 10 | def _retry_if_error(exception): 11 | return isinstance(exception, CalledProcessError or OSError) 12 | 13 | @retry(stop_max_delay=1000 * 50, 14 | retry_on_exception=_retry_if_error) 15 | def _start_mms(): 16 | # by default the number of workers per model is 1, but we can configure it through the 17 | # environment variable below if desired. 18 | # os.environ['SAGEMAKER_MODEL_SERVER_WORKERS'] = '2' 19 | model_server.start_model_server(handler_service='/home/model-server/model_script.py:handle') 20 | 21 | def main(): 22 | if sys.argv[1] == 'serve': 23 | _start_mms() 24 | else: 25 | subprocess.check_call(shlex.split(' '.join(sys.argv[1:]))) 26 | 27 | # prevent docker exit 28 | subprocess.call(['tail', '-f', '/dev/null']) 29 | 30 | main() 31 | -------------------------------------------------------------------------------- /examples/lgbm/transform/docker/model_script.py: -------------------------------------------------------------------------------- 1 | 2 | from collections import namedtuple 3 | import glob 4 | import json 5 | import logging 6 | import os 7 | import re 8 | 9 | import lightgbm as lgb 10 | import numpy as np 11 | from sagemaker_inference import content_types, encoder 12 | 13 | NUM_FEATURES = 12 14 | 15 | class ModelHandler(object): 16 | """ 17 | A lightGBM Model handler implementation. 18 | """ 19 | 20 | def __init__(self): 21 | self.initialized = False 22 | self.model = None 23 | 24 | def initialize(self, context): 25 | """ 26 | Initialize model. This will be called during model loading time 27 | :param context: Initial context contains model server system properties. 28 | :return: None 29 | """ 30 | self.initialized = True 31 | properties = context.system_properties 32 | model_dir = properties.get("model_dir") 33 | self.model = lgb.Booster(model_file=os.path.join(model_dir,'online_shoppers_model.txt')) 34 | 35 | 36 | def preprocess(self, request): 37 | """ 38 | Transform raw input into model input data. 
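        The request body is expected to be the raw bytes of a NumPy .npy payload
        (content type application/x-npy in this example): everything up to and including
        the first newline (the .npy header) is dropped, and the remaining buffer is
        parsed as float64 values and reshaped into NUM_FEATURES columns.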
39 | :param request: list of raw requests 40 | :return: list of preprocessed model input data 41 | """ 42 | payload = request[0]['body'] 43 | payload= payload[payload.find(b'\n')+1:] 44 | data = np.frombuffer(payload, dtype=np.float64) 45 | data = data.reshape((data.size // NUM_FEATURES, NUM_FEATURES)) 46 | return data 47 | 48 | def inference(self, model_input): 49 | """ 50 | Internal inference methods 51 | :param model_input: transformed model input data list 52 | :return: list of inference output in numpy array 53 | """ 54 | prediction = self.model.predict(model_input) 55 | print('prediction: ', prediction) 56 | return prediction 57 | 58 | def postprocess(self, inference_output): 59 | """ 60 | Post processing step - converts predictions to str 61 | :param inference_output: predictions as numpy 62 | :return: list of inference output as string 63 | """ 64 | 65 | return [str(inference_output.tolist())] 66 | 67 | def handle(self, data, context): 68 | """ 69 | Call preprocess, inference and post-process functions 70 | :param data: input data 71 | :param context: mms context 72 | """ 73 | 74 | model_input = self.preprocess(data) 75 | model_out = self.inference(model_input) 76 | return self.postprocess(model_out) 77 | 78 | _service = ModelHandler() 79 | 80 | 81 | def handle(data, context): 82 | if not _service.initialized: 83 | _service.initialize(context) 84 | 85 | if data is None: 86 | return None 87 | 88 | return _service.handle(data, context) 89 | -------------------------------------------------------------------------------- /examples/lgbm/transform/docker/readme.md: -------------------------------------------------------------------------------- 1 | # build docker image and push to your account's ECR. 2 | to build in sagemaker studio notebook, use sagemaker-studio-image-build 3 | 4 | sm-docker build . 5 | 6 | ref: https://aws.amazon.com/blogs/machine-learning/using-the-amazon-sagemaker-studio-image-build-cli-to-build-container-images-from-your-studio-notebooks/ 7 | -------------------------------------------------------------------------------- /examples/lgbm/transform/requirements.txt: -------------------------------------------------------------------------------- 1 | lightgbm 2 | numpy 3 | pandas 4 | scikit-learn 5 | multi-model-server 6 | sagemaker-inference 7 | retrying -------------------------------------------------------------------------------- /examples/lgbm/transform/transform.py: -------------------------------------------------------------------------------- 1 | import subprocess 2 | import sys 3 | import shlex 4 | import os 5 | from retrying import retry 6 | from subprocess import CalledProcessError 7 | from sagemaker_inference import model_server 8 | 9 | def _retry_if_error(exception): 10 | return isinstance(exception, CalledProcessError or OSError) 11 | 12 | @retry(stop_max_delay=1000 * 50, 13 | retry_on_exception=_retry_if_error) 14 | def _start_mms(): 15 | # by default the number of workers per model is 1, but we can configure it through the 16 | # environment variable below if desired. 
17 | # os.environ['SAGEMAKER_MODEL_SERVER_WORKERS'] = '2' 18 | model_server.start_model_server(handler_service='/home/model-server/model_script.py:handle') 19 | 20 | def main(): 21 | if sys.argv[1] == 'serve': 22 | _start_mms() 23 | else: 24 | subprocess.check_call(shlex.split(' '.join(sys.argv[1:]))) 25 | 26 | # prevent docker exit 27 | subprocess.call(['tail', '-f', '/dev/null']) 28 | 29 | main() -------------------------------------------------------------------------------- /examples/llm-text-summarization/conf/conf.yaml: -------------------------------------------------------------------------------- 1 | --- 2 | conf: 3 | models: 4 | falcon40b-finetuneable: 5 | source_directory: examples/llm-text-summarization 6 | 7 | preprocess: 8 | image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.0.0-cpu-py310-ubuntu20.04-sagemaker 9 | entry_point: preprocessing/preprocessing.py 10 | base_job_name: falcon-text-summarization-preprocess 11 | channels: 12 | training: 13 | dataFiles: 14 | - sourceName: raw-prompts-train 15 | fileName: s3://SMP_S3BUCKETNAME/falcon40b-summarization/input/samsum-train.arrow 16 | testing: 17 | dataFiles: 18 | - sourceName: raw-prompts-test 19 | fileName: s3://SMP_S3BUCKETNAME/falcon40b-summarization/input/samsum-test.arrow 20 | 21 | train: 22 | image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.0.0-cpu-py310-ubuntu20.04-sagemaker 23 | entry_point: training/training.py 24 | base_job_name: falcon-text-summarization-tuning 25 | instance_type: ml.g5.12xlarge 26 | volume_size_in_gb: 1024 27 | max_runtime_seconds: 86400 28 | 29 | 30 | transform: 31 | image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-gpu-py39-cu117-ubuntu20.04 32 | entry_point: transform/inference.py 33 | 34 | 35 | registry: 36 | ModelRepack: "False" 37 | InferenceSpecification: 38 | image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-gpu-py39-cu117-ubuntu20.04 39 | supported_content_types: 40 | - application/json 41 | supported_response_MIME_types: 42 | - application/json 43 | approval_status: PendingManualApproval 44 | 45 | 46 | sagemakerPipeline: 47 | pipelineName: Falcon40b-fine-tune 48 | models: 49 | falcon40b-finetuneable: 50 | steps: 51 | - step_name: falcon-text-summarization-preprocess 52 | step_class: Processing 53 | step_type: preprocess 54 | enable_cache: True 55 | - step_name: falcon-text-summarization-tuning 56 | step_class: Training 57 | enable_cache: False 58 | step_type: train 59 | chain_input_source_step: 60 | - falcon-text-summarization-preprocess 61 | - step_name: falcon-text-summarization-register 62 | step_class: RegisterModel 63 | enable_cache: False 64 | dependencies: 65 | - falcon-text-summarization-preprocess >> falcon-text-summarization-tuning >> falcon-text-summarization-register -------------------------------------------------------------------------------- /examples/llm-text-summarization/dag.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/examples/llm-text-summarization/dag.png -------------------------------------------------------------------------------- /examples/llm-text-summarization/data_source.md: -------------------------------------------------------------------------------- 1 | # LLM Example Data Source 2 | 3 | ## Data Information 4 | We use 
the samsum https://huggingface.co/datasets/samsum dataset hosted at huggingface for this example. 5 | 6 | More info on the dataset: 7 | 8 | https://huggingface.co/datasets/samsum 9 | 10 | 11 | ## Data Download and S3 Upload 12 | To download the data, run the following code block. The data files are named as **samsum-{train/test/validation}.arrow** 13 | install requirements via: pip install aiobotocore, datasets, 14 | ``` 15 | from datasets import load_dataset 16 | from datasets import load_dataset_builder 17 | import aiobotocore.session 18 | 19 | # set up a profile in your aws config file: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html) 20 | profile = 'default' 21 | s3_session = aiobotocore.session.AioSession(profile=profile) 22 | storage_options = {"session": s3_session} 23 | 24 | output_dir = "s3:///" 25 | builder = load_dataset_builder("samsum") 26 | builder.download_and_prepare(output_dir, storage_options=storage_options, file_format="arrow") 27 | ``` 28 | -------------------------------------------------------------------------------- /examples/llm-text-summarization/preprocessing/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/examples/llm-text-summarization/preprocessing/__init__.py -------------------------------------------------------------------------------- /examples/llm-text-summarization/preprocessing/preprocessing.py: -------------------------------------------------------------------------------- 1 | from datasets import load_dataset 2 | from random import randint 3 | from transformers import AutoTokenizer 4 | 5 | # Load dataset staged from hub in s3 6 | preprocessing_input_local= '/opt/ml/processing/input' 7 | # dataset = load_dataset("arrow", 8 | # data_files={ 9 | # 'train': f'{preprocessing_input_local}/raw-prompts-train/samsum-train.arrow', 10 | # 'test': f'{preprocessing_input_local}/raw-prompts-test/samsum-test.arrow' 11 | # } 12 | # ) 13 | 14 | #Alternatively load dataset from hugging face 15 | dataset = load_dataset("samsum") 16 | 17 | print(f"Train dataset size: {len(dataset['train'])}") 18 | print(f"Test dataset size: {len(dataset['test'])}") 19 | 20 | tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b") 21 | 22 | # custom instruct prompt start 23 | prompt_template = f"Summarize the chat dialogue:\n{{dialogue}}\n---\nSummary:\n{{summary}}{{eos_token}}" 24 | 25 | # template dataset to add prompt to each sample 26 | def template_dataset(sample): 27 | sample["text"] = prompt_template.format(dialogue=sample["dialogue"], 28 | summary=sample["summary"], 29 | eos_token=tokenizer.eos_token) 30 | return sample 31 | 32 | 33 | # apply prompt template per sample 34 | train_dataset = dataset["train"].map(template_dataset, remove_columns=list(dataset["train"].features)) 35 | 36 | print(f'Sample summarization example on base model: {train_dataset[randint(0, len(dataset))]["text"]}') 37 | 38 | # apply prompt template per sample 39 | test_dataset = dataset["test"].map(template_dataset, remove_columns=list(dataset["test"].features)) 40 | 41 | # tokenize and chunk dataset 42 | lm_train_dataset = train_dataset.map( 43 | lambda sample: tokenizer(sample["text"]), batched=True, batch_size=24, remove_columns=list(train_dataset.features) 44 | ) 45 | 46 | 47 | lm_test_dataset = test_dataset.map( 48 | lambda sample: tokenizer(sample["text"]), batched=True, 
remove_columns=list(test_dataset.features) 49 | ) 50 | 51 | # Print total number of samples 52 | print(f"Total number of train samples: {len(lm_train_dataset)}") 53 | 54 | lm_train_dataset.save_to_disk(f'/opt/ml/processing/output/training') 55 | lm_test_dataset.save_to_disk(f'/opt/ml/processing/output/testing') -------------------------------------------------------------------------------- /examples/llm-text-summarization/requirements.txt: -------------------------------------------------------------------------------- 1 | torch==2.2.0 2 | transformers 3 | datasets 4 | peft==0.4.0 5 | bitsandbytes==0.40.2 6 | accelerate==0.21.0 7 | py7zr 8 | einops 9 | tensorboardX -------------------------------------------------------------------------------- /examples/llm-text-summarization/training/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/examples/llm-text-summarization/training/__init__.py -------------------------------------------------------------------------------- /examples/llm-text-summarization/training/requirements.txt: -------------------------------------------------------------------------------- 1 | # sagemaker estimators requires training.py and requirement in strict root folder, not 2 | torch==2.2.0 3 | transformers 4 | datasets 5 | peft==0.4.0 6 | bitsandbytes==0.40.2 7 | accelerate==0.21.0 8 | py7zr 9 | einops 10 | tensorboardX -------------------------------------------------------------------------------- /examples/llm-text-summarization/training/training.py: -------------------------------------------------------------------------------- 1 | import os 2 | import torch 3 | import transformers 4 | from datasets import load_from_disk 5 | import transformers 6 | from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig 7 | from peft import prepare_model_for_kbit_training 8 | from peft import LoraConfig, get_peft_model 9 | import shutil 10 | import os 11 | import nvidia 12 | 13 | cuda_install_dir = '/'.join(nvidia.__file__.split('/')[:-1]) + '/cuda_runtime/lib/' 14 | os.environ['LD_LIBRARY_PATH'] = cuda_install_dir 15 | 16 | print('*'*100) 17 | print(torch.__version__) 18 | print('*'*100) 19 | 20 | # log_bucket = f"s3://{os.environ['SMP_S3BUCKETNAME']}/falcon-40b-qlora-finetune" 21 | 22 | model_id = "tiiuae/falcon-7b" 23 | 24 | # model_id = "tiiuae/falcon-40b" 25 | 26 | device_map="auto" 27 | bnb_config = BitsAndBytesConfig( 28 | load_in_4bit=True, 29 | bnb_4bit_use_double_quant=True, 30 | bnb_4bit_quant_type="nf4", 31 | bnb_4bit_compute_dtype=torch.bfloat16 32 | ) 33 | 34 | # alternate config for loading unquantized model on cpu 35 | ''' 36 | device_map = { 37 | "transformer.word_embeddings": 0, 38 | "transformer.word_embeddings_layernorm": 0, 39 | "lm_head": "cpu", 40 | "transformer.h": 0, 41 | "transformer.ln_f": 0, 42 | } 43 | bnb_config = BitsAndBytesConfig(llm_int8_enable_fp32_cpu_offload=True) 44 | ''' 45 | 46 | lm_train_dataset= load_from_disk(dataset_path=f"/opt/ml/input/data/falcon-text-summarization-preprocess-training/") 47 | lm_test_dataset= load_from_disk(dataset_path=f"/opt/ml/input/data/falcon-text-summarization-preprocess-testing/") 48 | 49 | tokenizer= AutoTokenizer.from_pretrained(model_id) 50 | model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, quantization_config=bnb_config, device_map= device_map) #model prepared for LoRA training using 
PEFT 51 | 52 | tokenizer.pad_token = tokenizer.eos_token 53 | 54 | 55 | model.gradient_checkpointing_enable() 56 | model = prepare_model_for_kbit_training(model) 57 | model.config.use_cache = False 58 | 59 | config = LoraConfig( 60 | r=8, 61 | lora_alpha=32, 62 | target_modules=[ 63 | "query_key_value", 64 | "dense", 65 | "dense_h_to_4h", 66 | "dense_4h_to_h", 67 | ], 68 | lora_dropout=0.05, 69 | bias="none", 70 | task_type="CAUSAL_LM" 71 | ) 72 | 73 | model = get_peft_model(model, config) 74 | 75 | trainer = transformers.Trainer( 76 | model=model, 77 | train_dataset= lm_train_dataset, 78 | eval_dataset=lm_test_dataset, 79 | args=transformers.TrainingArguments( 80 | per_device_train_batch_size=8, 81 | per_device_eval_batch_size=8, 82 | # logging_dir=f'{log_bucket}/', # connect tensorboard for visualizing live training logs 83 | logging_steps=2, 84 | num_train_epochs=1, #num_train_epochs=1 for demonstration 85 | learning_rate=2e-4, 86 | bf16=True, 87 | save_strategy = "no", 88 | output_dir="outputs", 89 | report_to="tensorboard" 90 | ), 91 | data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False), 92 | ) 93 | 94 | trainer.train() 95 | 96 | eval_metrics= trainer.evaluate() 97 | 98 | # throw an error if evaluation loss is above threshold. Alternatively output an evaluation.json and add the pass/fail logic as a Sagemaker pipeline step. 99 | if eval_metrics['eval_loss'] > 2: 100 | raise ValueError("Evaluation loss is too high.") 101 | 102 | # create a tarball of the model. For inference logic, untar and load appropriate .bin and .config for the llm from hugging face and serve. 103 | trainer.save_model('/opt/ml/model') 104 | # shutil.copytree('/opt/ml/code', os.path.join(os.environ['SM_MODEL_DIR'], 'code')) -------------------------------------------------------------------------------- /examples/llm-text-summarization/transform/inference.py: -------------------------------------------------------------------------------- 1 | # Inserting an inference.py to register the fine-tuned model. 2 | # This pattern demands the inference logic be available in a batch transform step. 3 | 4 | # Replace this script with your inference logic, and place model dependencies for your model under llm-text-summarization/transform. 5 | # e.x: s3://jumpstart-cache-prod-us-east-1/huggingface-infer/prepack/v1.0.0/infer-prepack-huggingface-llm-falcon-40b-bf16.tar.gz 6 | # refer: https://huggingface.co/docs/sagemaker/inference#user-defined-code-and-modules 7 | 8 | # extract Model ID from: https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html. 9 | # then use the code snippet below. also use to get the appropriate container in 'deploy_image_uri' 10 | 11 | ''' 12 | from sagemaker import image_uris, model_uris 13 | 14 | model_id, model_version, = ( 15 | "huggingface-llm-falcon-40b-bf16", 16 | "*", 17 | ) 18 | inference_instance_type = "ml.p3.2xlarge" 19 | 20 | # Retrieve the inference docker container uri. This is the base HuggingFace container image for the default model above. 21 | deploy_image_uri = image_uris.retrieve( 22 | region=None, 23 | framework=None, # automatically inferred from model_id 24 | image_scope="inference", 25 | model_id=model_id, 26 | model_version=model_version, 27 | instance_type=inference_instance_type, 28 | ) 29 | 30 | # Retrieve the model uri. 
31 | model_uri = model_uris.retrieve( 32 | model_id=model_id, model_version=model_version, model_scope="inference" 33 | ) 34 | ''' 35 | -------------------------------------------------------------------------------- /examples/multi-model-example/MultiModel.md: -------------------------------------------------------------------------------- 1 | # Welcome to the multi-model example for this config-driven SageMaker pipeline framework! 2 | 3 | ## Introduction 4 | This is a multi-model usage example with two models: a PCA model trained for feature dimension reduction, and a TensorFlow MLP trained for California housing price prediction. The TensorFlow model's preprocessing step uses the trained PCA model to reduce the number of feature dimensions in its training data. We also add a dependency so that the TensorFlow model is registered only after the PCA model has been registered. 5 | 6 | ![Multi-Model-Pipeline-DAG](./dag.png) 7 | 8 | ## Data Information 9 | We use the California housing dataset. 10 | 11 | More info on the dataset: 12 | 13 | This dataset was obtained from the StatLib repository. http://lib.stat.cmu.edu/datasets/ 14 | 15 | The target variable is the median house value for California districts. 16 | 17 | This dataset was derived from the 1990 U.S. census, using one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people). 18 | 19 | ### Data Download 20 | To download the data, run the following code block. The data file name is **cal_housing.data**. 21 | ``` 22 | import boto3 23 | 24 | region = "" 25 | s3 = boto3.client("s3") 26 | s3.download_file( 27 | f"sagemaker-example-files-prod-{region}", 28 | "datasets/tabular/california_housing/cal_housing.tgz", 29 | "cal_housing.tgz", 30 | ) 31 | 32 | import tarfile 33 | with tarfile.open("cal_housing.tgz") as tar: 34 | tar.extractall(path="tf/train_data") 35 | ``` 36 | 37 | ### Upload to S3 38 | ``` 39 | aws s3 cp tf/train_data/CaliforniaHousing/cal_housing.data s3:////cal_housing.data 40 | ``` 41 | 42 | 43 | 44 | 45 | 46 | 47 | ## Multi-Model Project Structure 48 | 49 | ``` 50 | /root/ 51 | │ dynamic-model-training-with-amazon-sagemaker-pipelines 52 | │ 53 | └───model X specific codebase 54 | │ │ 55 | │ └─ conf 56 | │ │ │ 57 | │ │ └─multi_model_conf.yaml 58 | │ │ ... 59 | │ └─ model_X_scripts 60 | │ │ │ 61 | │ │ └─preprocess.py 62 | │ │ │ 63 | │ │ └─train.py 64 | │ │ │ 65 | │ │ └─... 66 | │ 67 | └───model Y specific codebase 68 | │ │ 69 | │ └─ conf 70 | │ │ │ 71 | │ │ └─multi_model_conf.yaml 72 | │ │ ... 73 | │ └─ model_Y_scripts 74 | │ │ │ 75 | │ │ └─preprocess.py 76 | │ │ │ 77 | │ │ └─train.py 78 | │ │ │ 79 | │ │ └─... 80 | ``` 81 | ## Multi-Model Runbook 82 | - Step 1: Set Up Anchor Model For Multi-Model Execution 83 | 84 | When this config-driven SageMaker pipeline framework is used to create a multi-model pipeline, each model has its own conf.yaml. The anchor model is defined as the model whose conf.yaml contains the sagemakerPipeline configuration section. In this example, the one and only sagemakerPipeline configuration section is defined in cal_housing_tf's conf.yaml file. Any model can be your anchor model. 85 | 86 | - Step 2: Set Up Environment Variables 87 | 88 | Navigate to the project root directory and set up the environment variables listed in env.env. For this multi-model example in particular, you may need to run the following command in the terminal.
89 | ``` 90 | export SMP_MODEL_CONFIGPATH=examples/multi-model-example/*/conf/conf-multi-model.yaml 91 | ``` 92 | - Step 3: Generate Pipeline Definition & Run Pipeline 93 | 94 | Navigate to the project root directory and run the following command in the terminal. 95 | ``` 96 | python3 framework/framework_entrypoint.py 97 | ``` 98 | 99 |
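Alternatively, the same run can be driven from a Python session. The sketch below mirrors framework/framework_entrypoint.py and the SMP_* variable names referenced in framework/conf/conf.yaml and env.env; every value shown (bucket name, role ARN, subnets, security groups, account ID) is a placeholder assumption you must replace with your own resources.
```
import os
import sys

# Placeholder values -- replace with your own resources (see env.env for the full list).
os.environ["SMP_S3BUCKETNAME"] = "my-sagemaker-bucket"
os.environ["SMP_ROLE"] = "arn:aws:iam::111122223333:role/my-sagemaker-execution-role"
os.environ["SMP_SUBNETS"] = "subnet-0aaa1111,subnet-0bbb2222"   # comma-separated, per your VPC setup
os.environ["SMP_SECURITYGROUPS"] = "sg-0ccc3333"                # comma-separated, per your VPC setup
os.environ["SMP_MODEL_CONFIGPATH"] = "examples/multi-model-example/*/conf/conf-multi-model.yaml"

# Equivalent to `python3 framework/framework_entrypoint.py`; assumes this is run from the
# repository root so the framework directory can be added to the import path.
sys.path.append("framework")
from pipeline.pipeline_service import PipelineService

PipelineService().execute_pipeline()
```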
100 | 101 | 102 | Enjoy! 103 | -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_pca/conf/conf-multi-model.yaml: -------------------------------------------------------------------------------- 1 | --- 2 | conf: 3 | models: 4 | calhousingpca: 5 | source_directory: examples/multi-model-example/cal_housing_pca/modelscripts 6 | preprocess: 7 | image_uri: 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3 8 | entry_point: preprocess.py 9 | channels: 10 | train: 11 | dataFiles: 12 | - sourceName: raw_data 13 | fileName: s3://SMP_S3BUCKETNAME/tf2-california-housing-pipelines/traindata/cal_housing.data 14 | 15 | train: 16 | image_uri: 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3 17 | entry_point: train.py 18 | 19 | registry: 20 | ModelRepack: "False" 21 | InferenceSpecification: 22 | image_uri: 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3 23 | supported_content_types: 24 | - application/json 25 | supported_response_MIME_types: 26 | - application/json 27 | approval_status: PendingManualApproval 28 | -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_pca/data_source.md: -------------------------------------------------------------------------------- 1 | # Tensorflow Example Data Source 2 | 3 | ## Data Information 4 | We use the California housing dataset. 5 | 6 | More info on the dataset: 7 | 8 | This dataset was obtained from the StatLib repository. http://lib.stat.cmu.edu/datasets/ 9 | 10 | The target variable is the median house value for California districts. 11 | 12 | This dataset was derived from the 1990 U.S. census, using one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people). 13 | 14 | ## Data Download 15 | To download the data, run the following code block. 
The data file name is **cal_housing.data** 16 | ``` 17 | import boto3 18 | 19 | region = "" 20 | s3 = boto3.client("s3") 21 | s3.download_file( 22 | f"sagemaker-example-files-prod-{region}", 23 | "datasets/tabular/california_housing/cal_housing.tgz", 24 | "cal_housing.tgz", 25 | ) 26 | 27 | import tarfile 28 | with tarfile.open("cal_housing.tgz") as tar: 29 | tar.extractall(path="tf/train_data") 30 | ``` 31 | 32 | ## Upload to S3 33 | ``` 34 | aws s3 cp tf/train_data/CaliforniaHousing/cal_housing.data s3:////cal_housing.data 35 | ``` -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_pca/modelscripts/inference.py: -------------------------------------------------------------------------------- 1 | print("This is a placeholder inference.py for cal housing PCA model.") 2 | -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_pca/modelscripts/preprocess.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import pandas as pd 4 | 5 | BASE_DIR = "/opt/ml/processing" 6 | CODE_DIR = os.path.join(BASE_DIR, "code") 7 | INPUT_DIR = os.path.join(BASE_DIR, "input") 8 | OUTPUT_DIR = os.path.join(BASE_DIR, "output") 9 | 10 | print(os.listdir(INPUT_DIR)) 11 | 12 | if __name__ == "__main__": 13 | columns = [ 14 | "longitude", 15 | "latitude", 16 | "housingMedianAge", 17 | "totalRooms", 18 | "totalBedrooms", 19 | "population", 20 | "households", 21 | "medianIncome", 22 | "medianHouseValue", 23 | ] 24 | cal_housing_df = pd.read_csv( 25 | os.path.join(INPUT_DIR, "raw_data/cal_housing.data"), 26 | names=columns, 27 | header=None 28 | ) 29 | X = cal_housing_df[ 30 | [ 31 | "longitude", 32 | "latitude", 33 | "housingMedianAge", 34 | "totalRooms", 35 | "totalBedrooms", 36 | "population", 37 | "households", 38 | "medianIncome", 39 | ] 40 | ] 41 | Y = cal_housing_df[["medianHouseValue"]] / 100000 42 | 43 | X.to_csv(os.path.join(OUTPUT_DIR, "train/X.csv"), index=False, header=True) 44 | -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_pca/modelscripts/requirements.txt: -------------------------------------------------------------------------------- 1 | pandas -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_pca/modelscripts/train.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import pandas as pd 4 | from joblib import dump 5 | from sklearn.decomposition import PCA 6 | 7 | if __name__ == "__main__": 8 | # data directories 9 | channel_path = "/opt/ml/input/data/calhousing-pca-Preprocessing-train" 10 | print(f'Training data location: {os.listdir(channel_path)}') 11 | train_data_path = os.path.join(channel_path, "X.csv") 12 | X = pd.read_csv(train_data_path) 13 | print(X.head(5)) 14 | pca = PCA(n_components=6) 15 | pca.fit(X) 16 | print(pca.explained_variance_ratio_) 17 | print(pca.singular_values_) 18 | 19 | # save model 20 | dump(pca, os.path.join(os.environ.get("SM_MODEL_DIR"), "pca_model.joblib")) 21 | print(os.listdir("/opt/ml/model/")) 22 | -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_tf/conf/conf-multi-model.yaml: -------------------------------------------------------------------------------- 1 | --- 2 | conf: 3 | models: 4 | 
calhousingtf: 5 | source_directory: examples/multi-model-example/cal_housing_tf/modelscripts 6 | preprocess: 7 | image_uri: 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3 8 | entry_point: preprocess.py 9 | channels: 10 | train: 11 | dataFiles: 12 | - sourceName: raw_data 13 | fileName: s3://SMP_S3BUCKETNAME/tf2-california-housing-pipelines/traindata/cal_housing.data 14 | 15 | train: 16 | image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training:2.11.0-cpu-py39 17 | entry_point: train.py 18 | 19 | registry: 20 | ModelRepack: "False" 21 | InferenceSpecification: 22 | image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.11.0-cpu 23 | supported_content_types: 24 | - application/json 25 | supported_response_MIME_types: 26 | - application/json 27 | approval_status: PendingManualApproval 28 | 29 | transform: 30 | image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.11.0-cpu 31 | entry_point: inference.py 32 | channels: 33 | train: 34 | 35 | evaluate: 36 | image_uri: 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3 37 | entry_point: evaluate.py 38 | channels: train 39 | content_type: application/json 40 | 41 | 42 | 43 | sagemakerPipeline: 44 | pipelineName: calhousing-test 45 | models: 46 | calhousingpca: 47 | steps: 48 | - step_name: calhousing-pca-Preprocessing 49 | step_class: Processing 50 | step_type: preprocess 51 | enable_cache: True 52 | - step_name: calhousing-pca-Training 53 | step_class: Training 54 | enable_cache: True 55 | chain_input_source_step: 56 | - calhousing-pca-Preprocessing 57 | - step_name: calhousing-pca-Register 58 | step_class: RegisterModel 59 | calhousingtf: 60 | steps: 61 | - step_name: calhousing-tf-Preprocessing 62 | step_type: preprocess 63 | step_class: Processing 64 | chain_input_source_step: 65 | - calhousing-pca-Training 66 | enable_cache: True 67 | - step_name: calhousing-tf-Training 68 | step_class: Training 69 | enable_cache: True 70 | chain_input_source_step: 71 | - calhousing-tf-Preprocessing 72 | - step_name: calhousing-tf-CreateModel 73 | step_class: CreateModel 74 | - step_name: calhousing-tf-Transform 75 | step_class: Transform 76 | chain_input_source_step: 77 | - calhousing-tf-Preprocessing 78 | chain_input_additional_prefix: test/x_test.csv 79 | - step_name: calhousing-tf-Metrics 80 | step_class: Metrics 81 | chain_input_source_step: 82 | - calhousing-tf-Preprocessing 83 | - calhousing-tf-Transform 84 | - step_name: calhousing-tf-Register 85 | step_class: RegisterModel 86 | 87 | dependencies: 88 | - calhousing-tf-Preprocessing >> calhousing-tf-Training >> calhousing-tf-CreateModel >> calhousing-tf-Transform >> calhousing-tf-Metrics >> calhousing-tf-Register 89 | - calhousing-pca-Preprocessing >> calhousing-pca-Training >> calhousing-pca-Register 90 | # example: add-on customized dependency 91 | - calhousing-pca-Register >> calhousing-tf-Register -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_tf/data_source.md: -------------------------------------------------------------------------------- 1 | # Tensorflow Example Data Source 2 | 3 | ## Data Information 4 | We use the California housing dataset. 5 | 6 | More info on the dataset: 7 | 8 | This dataset was obtained from the StatLib repository. http://lib.stat.cmu.edu/datasets/ 9 | 10 | The target variable is the median house value for California districts. 
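For reference, the raw file ships without a header row; the preprocessing scripts in this example read it with explicit column names and scale the target by 100,000. A minimal sketch, assuming the file has been extracted locally as in the download snippet below:
```
import pandas as pd

# Column order used by the example preprocess.py scripts; cal_housing.data has no header row.
columns = [
    "longitude", "latitude", "housingMedianAge", "totalRooms", "totalBedrooms",
    "population", "households", "medianIncome", "medianHouseValue",
]
df = pd.read_csv("tf/train_data/CaliforniaHousing/cal_housing.data", names=columns, header=None)

X = df.drop(columns=["medianHouseValue"])          # eight feature columns
y = df[["medianHouseValue"]] / 100000              # target scaled as in preprocess.py
print(X.shape, y.shape)
```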
11 | 12 | This dataset was derived from the 1990 U.S. census, using one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people). 13 | 14 | ## Data Download 15 | To download the data, run the following code block. The data file name is **cal_housing.data** 16 | ``` 17 | import boto3 18 | 19 | region = "" 20 | s3 = boto3.client("s3") 21 | s3.download_file( 22 | f"sagemaker-example-files-prod-{region}", 23 | "datasets/tabular/california_housing/cal_housing.tgz", 24 | "cal_housing.tgz", 25 | ) 26 | 27 | import tarfile 28 | with tarfile.open("cal_housing.tgz") as tar: 29 | tar.extractall(path="tf/train_data") 30 | ``` 31 | 32 | ## Upload to S3 33 | ``` 34 | aws s3 cp tf/train_data/CaliforniaHousing/cal_housing.data s3:////cal_housing.data 35 | ``` -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_tf/modelscripts/evaluate.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import pathlib 4 | 5 | import numpy as np 6 | from sklearn.metrics import mean_squared_error 7 | 8 | if __name__ == "__main__": 9 | pred_path = "/opt/ml/processing/input/calhousing-tf-Transform-train/" 10 | print(os.listdir(pred_path)) 11 | with open(os.path.join(pred_path, "x_test.csv.out")) as f: 12 | file_string = f.read() 13 | y_test_pred = json.loads(file_string)["predictions"] 14 | 15 | test_path = "/opt/ml/processing/input/calhousing-tf-Preprocessing-train/" 16 | print(os.listdir(test_path)) 17 | y_test_true = np.loadtxt(os.path.join(test_path, "test/y_test.csv")) 18 | scores = mean_squared_error(y_test_true, y_test_pred) 19 | print("\nTest MSE :", scores) 20 | 21 | # Available metrics to add to model: https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-metrics.html 22 | report_dict = { 23 | "regression_metrics": { 24 | "mse": {"value": scores, "standard_deviation": "NaN"}, 25 | }, 26 | } 27 | 28 | output_dir = "/opt/ml/processing/output" 29 | pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True) 30 | 31 | evaluation_path = f"{output_dir}/model_evaluation_metrics.json" 32 | with open(evaluation_path, "w") as f: 33 | f.write(json.dumps(report_dict)) 34 | -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_tf/modelscripts/inference.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | 4 | import numpy as np 5 | 6 | print(f"My location: {os.listdir()}") 7 | print(f"Dir of /opt/ml: {os.listdir('/opt/ml')}") 8 | print(f"Dir of /opt/ml/model: {os.listdir('/opt/ml/model')}") 9 | 10 | print(f"{'-' * 40} Start printing Env Var {'-' * 40}") 11 | for name, value in os.environ.items(): 12 | print("{0}: {1}".format(name, value)) 13 | print(f"{'-' * 40} Finish printing Env Var {'-' * 40}") 14 | 15 | model_dir = "/opt/ml/model" 16 | print("numpy version", np.__version__) 17 | 18 | 19 | def read_csv(csv): 20 | return np.array([[float(j) for j in i.split(",")] for i in csv.splitlines()]) 21 | 22 | 23 | def input_handler(data, context): 24 | """ Pre-process request input before it is sent to TensorFlow Serving REST API 25 | Args: 26 | data (obj): the request data, in format of dict or string 27 | context (Context): an object containing request and configuration details 28 | Returns: 29 | (dict): a 
JSON-serializable dict that contains request body and headers 30 | """ 31 | print(f"InputHandler, request content type is {context.request_content_type}") 32 | print(f"InputHandler, model name is {context.model_name}") 33 | print(f"InputHandler, method is {context.method}") 34 | print(f"InputHandler, rest_uri is {context.rest_uri}") 35 | print(f"InputHandler, custom_attributes is {context.custom_attributes}") 36 | print(f"InputHandler, accept_header is {context.accept_header}") 37 | print(f"InputHandler, content_length is {context.content_length}") 38 | 39 | if context.request_content_type == 'application/json': 40 | # pass through json (assumes it's correctly formed) 41 | d = data.read().decode('utf-8') 42 | return d if len(d) else '' 43 | if context.request_content_type == 'text/csv': 44 | payload = data.read().decode('utf-8') 45 | inputs = read_csv(payload) 46 | print(inputs[:10]) 47 | input_data = {'instances': inputs.tolist()} 48 | return json.dumps(input_data) 49 | 50 | 51 | def output_handler(data, context): 52 | """Post-process TensorFlow Serving output before it is returned to the client. 53 | Args: 54 | data (obj): the TensorFlow serving response 55 | context (Context): an object containing request and configuration details 56 | Returns: 57 | (bytes, string): data to return to client, response content type 58 | """ 59 | print(f"OutputHandler, hello world!") 60 | status_code = data.status_code 61 | content = data.content 62 | 63 | if status_code != 200: 64 | raise ValueError(content.decode('utf-8')) 65 | 66 | response_content_type = context.accept_header 67 | prediction = data.content 68 | 69 | print(f"Prediction type is {type(prediction)}, {prediction}") 70 | print(f"Prediction is {prediction}") 71 | 72 | return prediction, response_content_type 73 | -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_tf/modelscripts/preprocess.py: -------------------------------------------------------------------------------- 1 | import glob 2 | import os 3 | import tarfile 4 | 5 | import numpy as np 6 | import pandas as pd 7 | from joblib import load 8 | from sklearn.model_selection import train_test_split 9 | from sklearn.preprocessing import StandardScaler 10 | 11 | BASE_DIR = "/opt/ml/processing" 12 | CODE_DIR = os.path.join(BASE_DIR, "code") 13 | INPUT_DIR = os.path.join(BASE_DIR, "input") 14 | OUTPUT_DIR = os.path.join(BASE_DIR, "output") 15 | 16 | print(os.listdir(INPUT_DIR)) 17 | 18 | if __name__ == "__main__": 19 | columns = [ 20 | "longitude", 21 | "latitude", 22 | "housingMedianAge", 23 | "totalRooms", 24 | "totalBedrooms", 25 | "population", 26 | "households", 27 | "medianIncome", 28 | "medianHouseValue", 29 | ] 30 | cal_housing_df = pd.read_csv( 31 | os.path.join(INPUT_DIR, "raw_data/cal_housing.data"), 32 | names=columns, 33 | header=None 34 | ) 35 | X = cal_housing_df[ 36 | [ 37 | "longitude", 38 | "latitude", 39 | "housingMedianAge", 40 | "totalRooms", 41 | "totalBedrooms", 42 | "population", 43 | "households", 44 | "medianIncome", 45 | ] 46 | ] 47 | Y = cal_housing_df[["medianHouseValue"]] / 100000 48 | 49 | x_train_, x_test_, y_train, y_test = train_test_split(X, Y, test_size=0.33) 50 | pca_model_tarfile_location = os.path.join(INPUT_DIR, "calhousing-pca-Training-input-train/model.tar.gz") 51 | print(os.listdir(os.path.join(INPUT_DIR, "calhousing-pca-Training-input-train"))) 52 | # with tarfile.open(pca_model_tarfile_location) as tar: 53 | # tar.extractall() 54 | tf = tarfile.open(mode='r', fileobj=None) 55 
| tf.extractall(pca_model_tarfile_location, members=None) 56 | 57 | print(os.listdir()) 58 | pca = load("pca_model.joblib") 59 | x_train = pca.transform(x_train_) 60 | x_test = pca.transform(x_test_) 61 | 62 | split_data_dir = os.path.join(BASE_DIR, "split_data") 63 | if not os.path.exists(split_data_dir): 64 | os.mkdir(split_data_dir) 65 | np.save(os.path.join(split_data_dir, "x_train.npy"), x_train) 66 | np.save(os.path.join(split_data_dir, "x_test.npy"), x_test) 67 | np.save(os.path.join(split_data_dir, "y_train.npy"), y_train) 68 | np.save(os.path.join(split_data_dir, "y_test.npy"), y_test) 69 | 70 | input_files = glob.glob("{}/*.npy".format(split_data_dir)) 71 | print("\nINPUT FILE LIST: \n{}\n".format(input_files)) 72 | scaler = StandardScaler() 73 | x_train = np.load(os.path.join(split_data_dir, "x_train.npy")) 74 | scaler.fit(x_train) 75 | 76 | train_data_output_dir = os.path.join(OUTPUT_DIR, "train/train") 77 | if not os.path.exists(train_data_output_dir): 78 | os.mkdir(train_data_output_dir) 79 | test_data_output_dir = os.path.join(OUTPUT_DIR, "train/test") 80 | if not os.path.exists(test_data_output_dir): 81 | os.mkdir(test_data_output_dir) 82 | for file in input_files: 83 | raw = np.load(file) 84 | # only transform feature columns 85 | if "y_" not in file: 86 | transformed = scaler.transform(raw) 87 | if "train" in file: 88 | if "y_" in file: 89 | output_path = os.path.join(train_data_output_dir, "y_train.npy") 90 | np.save(output_path, raw) 91 | print("SAVED LABEL TRAINING DATA FILE\n") 92 | else: 93 | output_path = os.path.join(train_data_output_dir, "x_train.npy") 94 | np.save(output_path, transformed) 95 | print("SAVED TRANSFORMED TRAINING DATA FILE\n") 96 | else: 97 | if "y_" in file: 98 | output_path = os.path.join(test_data_output_dir, "y_test.npy") 99 | np.save(output_path, raw) 100 | output_path = os.path.join(test_data_output_dir, "y_test.csv") 101 | np.savetxt(output_path, raw, delimiter=",") 102 | print("SAVED LABEL TEST DATA FILE\n") 103 | else: 104 | output_path = os.path.join(test_data_output_dir, "x_test.npy") 105 | np.save(output_path, transformed) 106 | output_path = os.path.join(test_data_output_dir, "x_test.csv") 107 | np.savetxt(output_path, transformed, delimiter=",") 108 | print("SAVED TRANSFORMED TEST DATA FILE\n") 109 | -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_tf/modelscripts/requirements.txt: -------------------------------------------------------------------------------- 1 | pandas -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_tf/modelscripts/train.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | 4 | import numpy as np 5 | import tensorflow as tf 6 | 7 | 8 | def parse_args(): 9 | parser = argparse.ArgumentParser() 10 | 11 | # hyperparameters sent by the client are passed as command-line arguments to the script 12 | parser.add_argument('--epochs', type=int, default=1) 13 | parser.add_argument('--batch_size', type=int, default=64) 14 | parser.add_argument('--learning_rate', type=float, default=0.1) 15 | 16 | # data directories 17 | channel_path = "/opt/ml/input/data/calhousing-tf-Preprocessing-train" 18 | parser.add_argument( 19 | '--train', 20 | type=str, 21 | default=os.path.join(channel_path, "train") 22 | ) 23 | parser.add_argument( 24 | '--test', 25 | type=str, 26 | default=os.path.join(channel_path, "test") 27 | ) 28 | 
29 | # model directory 30 | parser.add_argument('--sm-model-dir', type=str, default=os.environ.get('SM_MODEL_DIR')) 31 | 32 | return parser.parse_known_args() 33 | 34 | 35 | def get_train_data(train_dir): 36 | x_train = np.load(os.path.join(train_dir, 'x_train.npy')) 37 | y_train = np.load(os.path.join(train_dir, 'y_train.npy')) 38 | print('x train', x_train.shape, 'y train', y_train.shape) 39 | 40 | return x_train, y_train 41 | 42 | 43 | def get_test_data(test_dir): 44 | x_test = np.load(os.path.join(test_dir, 'x_test.npy')) 45 | y_test = np.load(os.path.join(test_dir, 'y_test.npy')) 46 | print('x test', x_test.shape, 'y test', y_test.shape) 47 | 48 | return x_test, y_test 49 | 50 | 51 | def get_model(): 52 | inputs = tf.keras.Input(shape=(6,)) 53 | hidden_1 = tf.keras.layers.Dense(8, activation='tanh')(inputs) 54 | hidden_2 = tf.keras.layers.Dense(4, activation='sigmoid')(hidden_1) 55 | outputs = tf.keras.layers.Dense(1)(hidden_2) 56 | return tf.keras.Model(inputs=inputs, outputs=outputs) 57 | 58 | 59 | if __name__ == "__main__": 60 | args, _ = parse_args() 61 | 62 | print('Training data location: {}'.format(args.train)) 63 | print('Test data location: {}'.format(args.test)) 64 | x_train, y_train = get_train_data(args.train) 65 | x_test, y_test = get_test_data(args.test) 66 | 67 | batch_size = args.batch_size 68 | epochs = args.epochs 69 | learning_rate = args.learning_rate 70 | print('batch_size = {}, epochs = {}, learning rate = {}'.format(batch_size, epochs, learning_rate)) 71 | 72 | model = get_model() 73 | optimizer = tf.keras.optimizers.SGD(learning_rate) 74 | model.compile(optimizer=optimizer, loss='mse') 75 | model.fit(x_train, 76 | y_train, 77 | batch_size=batch_size, 78 | epochs=epochs, 79 | validation_data=(x_test, y_test)) 80 | 81 | # evaluate on test set 82 | scores = model.evaluate(x_test, y_test, batch_size, verbose=2) 83 | print("\nTest MSE :", scores) 84 | 85 | # save model 86 | model.save(args.sm_model_dir + '/1') 87 | -------------------------------------------------------------------------------- /examples/multi-model-example/dag.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/examples/multi-model-example/dag.png -------------------------------------------------------------------------------- /examples/tf/conf/conf.yaml: -------------------------------------------------------------------------------- 1 | --- 2 | conf: 3 | models: 4 | calhousing: 5 | source_directory: examples/tf/modelscripts 6 | preprocess: 7 | image_uri: 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3 8 | entry_point: preprocess.py 9 | channels: 10 | train: 11 | dataFiles: 12 | - sourceName: raw_data 13 | fileName: s3://SMP_S3BUCKETNAME/tf2-california-housing-pipelines/traindata/cal_housing.data 14 | train: 15 | image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training:2.11.0-cpu-py39 16 | entry_point: train.py 17 | 18 | registry: 19 | ModelRepack: "False" 20 | InferenceSpecification: 21 | image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.11.0-cpu 22 | supported_content_types: 23 | - application/json 24 | supported_response_MIME_types: 25 | - application/json 26 | approval_status: PendingManualApproval 27 | 28 | transform: 29 | image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.11.0-cpu 30 | entry_point: inference.py 31 | channels: 
32 | train: 33 | 34 | evaluate: 35 | image_uri: 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3 36 | entry_point: evaluate.py 37 | channels: train 38 | content_type: application/json 39 | 40 | 41 | sagemakerPipeline: 42 | pipelineName: calhousing-test 43 | models: 44 | calhousing: 45 | steps: 46 | - step_name: calhousing-Preprocessing 47 | step_type: preprocess 48 | step_class: Processing 49 | enable_cache: True 50 | - step_name: calhousing-Training 51 | step_class: Training 52 | enable_cache: True 53 | chain_input_source_step: 54 | - calhousing-Preprocessing 55 | - step_name: calhousing-CreateModel 56 | step_class: CreateModel 57 | - step_name: calhousing-Transform 58 | step_class: Transform 59 | chain_input_source_step: 60 | - calhousing-Preprocessing 61 | chain_input_additional_prefix: test/x_test.csv 62 | - step_name: calhousing-Metrics 63 | step_class: Metrics 64 | chain_input_source_step: 65 | - calhousing-Preprocessing 66 | - calhousing-Transform 67 | - step_name: calhousing-Register 68 | step_class: RegisterModel 69 | 70 | dependencies: 71 | - calhousing-Preprocessing >> calhousing-Training >> calhousing-CreateModel >> calhousing-Transform >> calhousing-Metrics >> calhousing-Register 72 | -------------------------------------------------------------------------------- /examples/tf/data_source.md: -------------------------------------------------------------------------------- 1 | # Tensorflow Example Data Source 2 | 3 | ## Data Information 4 | We use the California housing dataset. 5 | 6 | More info on the dataset: 7 | 8 | This dataset was obtained from the StatLib repository. http://lib.stat.cmu.edu/datasets/ 9 | 10 | The target variable is the median house value for California districts. 11 | 12 | This dataset was derived from the 1990 U.S. census, using one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people). 13 | 14 | ## Data Download 15 | To download the data, run the following code block. 
The data file name is **cal_housing.data** 16 | ``` 17 | import boto3 18 | 19 | region = "" 20 | s3 = boto3.client("s3") 21 | s3.download_file( 22 | f"sagemaker-example-files-prod-{region}", 23 | "datasets/tabular/california_housing/cal_housing.tgz", 24 | "cal_housing.tgz", 25 | ) 26 | 27 | import tarfile 28 | with tarfile.open("cal_housing.tgz") as tar: 29 | tar.extractall(path="tf/train_data") 30 | ``` 31 | 32 | ## Upload to S3 33 | ``` 34 | aws s3 cp tf/train_data/CaliforniaHousing/cal_housing.data s3:////cal_housing.data 35 | ``` -------------------------------------------------------------------------------- /examples/tf/modelscripts/evaluate.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import pathlib 4 | 5 | import numpy as np 6 | from sklearn.metrics import mean_squared_error 7 | 8 | if __name__ == "__main__": 9 | pred_path = "/opt/ml/processing/input/calhousing-Transform-train/" 10 | print(os.listdir(pred_path)) 11 | with open(os.path.join(pred_path, "x_test.csv.out")) as f: 12 | file_string = f.read() 13 | y_test_pred = json.loads(file_string)["predictions"] 14 | 15 | test_path = "/opt/ml/processing/input/calhousing-Preprocessing-train/" 16 | print(os.listdir(test_path)) 17 | y_test_true = np.loadtxt(os.path.join(test_path, "test/y_test.csv")) 18 | scores = mean_squared_error(y_test_true, y_test_pred) 19 | print("\nTest MSE :", scores) 20 | 21 | # Available metrics to add to model: https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-metrics.html 22 | report_dict = { 23 | "regression_metrics": { 24 | "mse": {"value": scores, "standard_deviation": "NaN"}, 25 | }, 26 | } 27 | 28 | output_dir = "/opt/ml/processing/output" 29 | pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True) 30 | 31 | evaluation_path = f"{output_dir}/model_evaluation_metrics.json" 32 | with open(evaluation_path, "w") as f: 33 | f.write(json.dumps(report_dict)) 34 | -------------------------------------------------------------------------------- /examples/tf/modelscripts/inference.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | 4 | import numpy as np 5 | 6 | print(f"My location: {os.listdir()}") 7 | print(f"Dir of /opt/ml: {os.listdir('/opt/ml')}") 8 | print(f"Dir of /opt/ml/model: {os.listdir('/opt/ml/model')}") 9 | 10 | print(f"{'-' * 40} Start printing Env Var {'-' * 40}") 11 | for name, value in os.environ.items(): 12 | print("{0}: {1}".format(name, value)) 13 | print(f"{'-' * 40} Finish printing Env Var {'-' * 40}") 14 | 15 | model_dir = "/opt/ml/model" 16 | print("numpy version", np.__version__) 17 | 18 | 19 | def read_csv(csv): 20 | return np.array([[float(j) for j in i.split(",")] for i in csv.splitlines()]) 21 | 22 | 23 | def input_handler(data, context): 24 | """ Pre-process request input before it is sent to TensorFlow Serving REST API 25 | Args: 26 | data (obj): the request data, in format of dict or string 27 | context (Context): an object containing request and configuration details 28 | Returns: 29 | (dict): a JSON-serializable dict that contains request body and headers 30 | """ 31 | print(f"InputHandler, request content type is {context.request_content_type}") 32 | print(f"InputHandler, model name is {context.model_name}") 33 | print(f"InputHandler, method is {context.method}") 34 | print(f"InputHandler, rest_uri is {context.rest_uri}") 35 | print(f"InputHandler, custom_attributes is {context.custom_attributes}") 36 | 
print(f"InputHandler, accept_header is {context.accept_header}") 37 | print(f"InputHandler, content_length is {context.content_length}") 38 | 39 | if context.request_content_type == 'application/json': 40 | # pass through json (assumes it's correctly formed) 41 | d = data.read().decode('utf-8') 42 | return d if len(d) else '' 43 | if context.request_content_type == 'text/csv': 44 | payload = data.read().decode('utf-8') 45 | inputs = read_csv(payload) 46 | print(inputs[:10]) 47 | input_data = {'instances': inputs.tolist()} 48 | return json.dumps(input_data) 49 | 50 | 51 | def output_handler(data, context): 52 | """Post-process TensorFlow Serving output before it is returned to the client. 53 | Args: 54 | data (obj): the TensorFlow serving response 55 | context (Context): an object containing request and configuration details 56 | Returns: 57 | (bytes, string): data to return to client, response content type 58 | """ 59 | print(f"OutputHandler, hello world!") 60 | status_code = data.status_code 61 | content = data.content 62 | 63 | if status_code != 200: 64 | raise ValueError(content.decode('utf-8')) 65 | 66 | response_content_type = context.accept_header 67 | prediction = data.content 68 | 69 | print(f"Prediction type is {type(prediction)}, {prediction}") 70 | print(f"Prediction is {prediction}") 71 | 72 | return prediction, response_content_type 73 | -------------------------------------------------------------------------------- /examples/tf/modelscripts/preprocess.py: -------------------------------------------------------------------------------- 1 | import glob 2 | import os 3 | 4 | import numpy as np 5 | import pandas as pd 6 | from sklearn.model_selection import train_test_split 7 | from sklearn.preprocessing import StandardScaler 8 | 9 | BASE_DIR = "/opt/ml/processing" 10 | CODE_DIR = os.path.join(BASE_DIR, "code") 11 | INPUT_DIR = os.path.join(BASE_DIR, "input") 12 | OUTPUT_DIR = os.path.join(BASE_DIR, "output") 13 | 14 | print(os.listdir(INPUT_DIR)) 15 | 16 | if __name__ == "__main__": 17 | columns = [ 18 | "longitude", 19 | "latitude", 20 | "housingMedianAge", 21 | "totalRooms", 22 | "totalBedrooms", 23 | "population", 24 | "households", 25 | "medianIncome", 26 | "medianHouseValue", 27 | ] 28 | cal_housing_df = pd.read_csv( 29 | os.path.join(INPUT_DIR, "raw_data/cal_housing.data"), 30 | names=columns, 31 | header=None 32 | ) 33 | X = cal_housing_df[ 34 | [ 35 | "longitude", 36 | "latitude", 37 | "housingMedianAge", 38 | "totalRooms", 39 | "totalBedrooms", 40 | "population", 41 | "households", 42 | "medianIncome", 43 | ] 44 | ] 45 | Y = cal_housing_df[["medianHouseValue"]] / 100000 46 | 47 | x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.33) 48 | split_data_dir = os.path.join(BASE_DIR, "split_data") 49 | if not os.path.exists(split_data_dir): 50 | os.mkdir(split_data_dir) 51 | np.save(os.path.join(split_data_dir, "x_train.npy"), x_train) 52 | np.save(os.path.join(split_data_dir, "x_test.npy"), x_test) 53 | np.save(os.path.join(split_data_dir, "y_train.npy"), y_train) 54 | np.save(os.path.join(split_data_dir, "y_test.npy"), y_test) 55 | 56 | input_files = glob.glob("{}/*.npy".format(split_data_dir)) 57 | print("\nINPUT FILE LIST: \n{}\n".format(input_files)) 58 | scaler = StandardScaler() 59 | x_train = np.load(os.path.join(split_data_dir, "x_train.npy")) 60 | scaler.fit(x_train) 61 | 62 | train_data_output_dir = os.path.join(OUTPUT_DIR, "train/train") 63 | if not os.path.exists(train_data_output_dir): 64 | os.mkdir(train_data_output_dir) 65 | 
test_data_output_dir = os.path.join(OUTPUT_DIR, "train/test") 66 | if not os.path.exists(test_data_output_dir): 67 | os.mkdir(test_data_output_dir) 68 | for file in input_files: 69 | raw = np.load(file) 70 | # only transform feature columns 71 | if "y_" not in file: 72 | transformed = scaler.transform(raw) 73 | if "train" in file: 74 | if "y_" in file: 75 | output_path = os.path.join(train_data_output_dir, "y_train.npy") 76 | np.save(output_path, raw) 77 | print("SAVED LABEL TRAINING DATA FILE\n") 78 | else: 79 | output_path = os.path.join(train_data_output_dir, "x_train.npy") 80 | np.save(output_path, transformed) 81 | print("SAVED TRANSFORMED TRAINING DATA FILE\n") 82 | else: 83 | if "y_" in file: 84 | output_path = os.path.join(test_data_output_dir, "y_test.npy") 85 | np.save(output_path, raw) 86 | output_path = os.path.join(test_data_output_dir, "y_test.csv") 87 | np.savetxt(output_path, raw, delimiter=",") 88 | print("SAVED LABEL TEST DATA FILE\n") 89 | else: 90 | output_path = os.path.join(test_data_output_dir, "x_test.npy") 91 | np.save(output_path, transformed) 92 | output_path = os.path.join(test_data_output_dir, "x_test.csv") 93 | np.savetxt(output_path, transformed, delimiter=",") 94 | print("SAVED TRANSFORMED TEST DATA FILE\n") 95 | -------------------------------------------------------------------------------- /examples/tf/modelscripts/requirements.txt: -------------------------------------------------------------------------------- 1 | pandas -------------------------------------------------------------------------------- /examples/tf/modelscripts/train.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | 4 | import numpy as np 5 | import tensorflow as tf 6 | 7 | 8 | def parse_args(): 9 | parser = argparse.ArgumentParser() 10 | 11 | # hyperparameters sent by the client are passed as command-line arguments to the script 12 | parser.add_argument('--epochs', type=int, default=1) 13 | parser.add_argument('--batch_size', type=int, default=64) 14 | parser.add_argument('--learning_rate', type=float, default=0.1) 15 | 16 | # data directories 17 | channel_path = "/opt/ml/input/data/calhousing-Preprocessing-train" 18 | parser.add_argument( 19 | '--train', 20 | type=str, 21 | default=os.path.join(channel_path, "train") 22 | ) 23 | parser.add_argument( 24 | '--test', 25 | type=str, 26 | default=os.path.join(channel_path, "test") 27 | ) 28 | 29 | # model directory 30 | parser.add_argument('--sm-model-dir', type=str, default=os.environ.get('SM_MODEL_DIR')) 31 | 32 | return parser.parse_known_args() 33 | 34 | 35 | def get_train_data(train_dir): 36 | x_train = np.load(os.path.join(train_dir, 'x_train.npy')) 37 | y_train = np.load(os.path.join(train_dir, 'y_train.npy')) 38 | print('x train', x_train.shape, 'y train', y_train.shape) 39 | 40 | return x_train, y_train 41 | 42 | 43 | def get_test_data(test_dir): 44 | x_test = np.load(os.path.join(test_dir, 'x_test.npy')) 45 | y_test = np.load(os.path.join(test_dir, 'y_test.npy')) 46 | print('x test', x_test.shape, 'y test', y_test.shape) 47 | 48 | return x_test, y_test 49 | 50 | 51 | def get_model(): 52 | inputs = tf.keras.Input(shape=(8,)) 53 | hidden_1 = tf.keras.layers.Dense(8, activation='tanh')(inputs) 54 | hidden_2 = tf.keras.layers.Dense(4, activation='sigmoid')(hidden_1) 55 | outputs = tf.keras.layers.Dense(1)(hidden_2) 56 | return tf.keras.Model(inputs=inputs, outputs=outputs) 57 | 58 | 59 | if __name__ == "__main__": 60 | args, _ = parse_args() 61 | 62 | 
print('Training data location: {}'.format(args.train)) 63 | print('Test data location: {}'.format(args.test)) 64 | x_train, y_train = get_train_data(args.train) 65 | x_test, y_test = get_test_data(args.test) 66 | 67 | batch_size = args.batch_size 68 | epochs = args.epochs 69 | learning_rate = args.learning_rate 70 | print('batch_size = {}, epochs = {}, learning rate = {}'.format(batch_size, epochs, learning_rate)) 71 | 72 | model = get_model() 73 | optimizer = tf.keras.optimizers.SGD(learning_rate) 74 | model.compile(optimizer=optimizer, loss='mse') 75 | model.fit(x_train, 76 | y_train, 77 | batch_size=batch_size, 78 | epochs=epochs, 79 | validation_data=(x_test, y_test)) 80 | 81 | # evaluate on test set 82 | scores = model.evaluate(x_test, y_test, batch_size, verbose=2) 83 | print("\nTest MSE :", scores) 84 | 85 | # save model 86 | model.save(args.sm_model_dir + '/1') 87 | -------------------------------------------------------------------------------- /examples/tf/smp_dag.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/examples/tf/smp_dag.png -------------------------------------------------------------------------------- /framework/.gitignore: -------------------------------------------------------------------------------- 1 | .venv/ -------------------------------------------------------------------------------- /framework/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/framework/__init__.py -------------------------------------------------------------------------------- /framework/conf/conf.yaml: -------------------------------------------------------------------------------- 1 | --- 2 | conf: 3 | s3Bucket: SMP_S3BUCKETNAME 4 | modelConfigFilePath: SMP_MODEL_CONFIGPATH 5 | sagemakerNetworkSecurity: 6 | subnets: SMP_SUBNETS 7 | role: SMP_ROLE 8 | security_groups_id: SMP_SECURITYGROUPS 9 | 10 | -------------------------------------------------------------------------------- /framework/conf/template/conf.yaml: -------------------------------------------------------------------------------- 1 | models: 2 | modelContainer: 3 | mymodel: 4 | source_directory: source_directory_root_level 5 | name: model_name 6 | registry: 7 | ModelPackageGroupName: "" 8 | ModelPackageGroupDescription: "" 9 | ModelApprovalStatus: "" 10 | ModelRepack: "" 11 | InferenceSpecification: 12 | SupportedTransformInstanceTypes: 13 | - ml.m5.2xlarge 14 | - OTHER.OPTIONS 15 | SupportedRealtimeInferenceInstanceType: 16 | - ml.m5.2xlarge 17 | - OTHER.OPTIONS 18 | SupportedContentTypes: 19 | - application/json 20 | SupportedResponseMIMETypes: 21 | - application/json 22 | image_uri: &modelImage "ECR_MODEL_IMAGE" 23 | MetadataProperties: 24 | - Test 25 | metrics: 26 | ModelQuality: 27 | Statistics: 28 | ContentType: application/json 29 | channels: 30 | train: 31 | location: 32 | activeLocation: s3 33 | s3BucketName: "" 34 | evaluateBucketPrefix: "" 35 | evaluateInputLocalFilepath: "" 36 | inputBucketPrefix: prefix/to/input 37 | content_type: text/csv 38 | dataFiles: 39 | - fileName: data_1.csv 40 | - fileName: s3://bucket/fullt/path/data_2.csv 41 | sagemaker: 42 | image_uri: *modelImage 43 | base_job_name: "" 44 | entry_point: "" 45 | instance_count: "" 46 | instance_type: "" 47 | strategy: "" 48 | 
assemble_with: "" 49 | join_source: "" 50 | split_type: "" 51 | content_type: "" 52 | max_payload: "" 53 | volume_size_in_gb: "" 54 | max_runtime_in_seconds: "" 55 | s3_data_type: "" 56 | s3_input_mode: "" 57 | s3_data_distribution_type: "" 58 | tags: 59 | - Key: key1 60 | Value: value1 61 | - Key: key2 62 | Value: value2 63 | env: 64 | key: value 65 | key2: value2 66 | preprocess: 67 | instance_type: "" 68 | instance_count: "" 69 | volume_size_in_gb: 50 70 | max_runtime_seconds: 3000 71 | image_uri: "" 72 | entry_point: "" 73 | base_job_name: "" 74 | channels: 75 | train: 76 | s3BucketName: "" 77 | inputBucketPrefix: prefix/to/input 78 | outputBucketPrefix: prefix/to/output 79 | dataFiles: 80 | - sourceName: data_1 81 | fileName: data_1.csv 82 | - sourceName: data_2 83 | fileName: s3://bucket/fullt/path/data_2.csv 84 | tags: 85 | - Key: key1 86 | Value: value1 87 | - Key: key2 88 | Value: value2 89 | max_run_time_in_seconds: 3600 90 | env: 91 | key: value 92 | key2: value2 93 | train: 94 | instance_type: "" 95 | instance_count: "" 96 | output_path: s3://bucket/path/to/output 97 | base_image_uri: *modelImage 98 | entry_point: "" 99 | base_job_name: "" 100 | volume_size_in_gb: 50 101 | max_runtime_seconds: 3000 102 | hyperparameters: 103 | parameters: value 104 | parameters2: value2 105 | channels: 106 | train: 107 | location: 108 | activeLocation: s3 109 | s3BucketName: "" 110 | inputBucketPrefix: prefix/to/input 111 | content_type: text/csv 112 | dataFiles: 113 | - fileName: data_1.csv 114 | - fileName: s3://bucket/fullt/path/data_2.csv 115 | tags: 116 | - Key: key1 117 | Value: value1 118 | - Key: key2 119 | Value: value2 120 | env: 121 | key: value 122 | key2: value2 123 | tune: 124 | base_job_name: "" 125 | image_uri: *modelImage 126 | strategy: "" 127 | objective_metric_name: "" 128 | hyperparameter_ranges: "" 129 | metric_definitions: "" 130 | objective_type: "" 131 | max_parallel_jobs: "" 132 | max_runtime_in_seconds: "" 133 | tags: 134 | - Key: key1 135 | Value: value1 136 | - Key: key2 137 | Value: value2 138 | early_stopping_type: "" 139 | random_seed: "" 140 | transform: 141 | channels: 142 | train: 143 | location: 144 | activeLocation: s3 145 | s3BucketName: "" 146 | evaluateBucketPrefix: "" 147 | evaluateInputLocalFilepath: "" 148 | inputBucketPrefix: prefix/to/input 149 | content_type: text/csv 150 | dataFiles: 151 | - fileName: data_1.csv 152 | - fileName: s3://bucket/fullt/path/data_2.csv 153 | sagemaker: 154 | image_uri: *modelImage 155 | base_job_name: "" 156 | entry_point: "" 157 | instance_count: "" 158 | instance_type: "" 159 | strategy: "" 160 | assemble_with: "" 161 | join_source: "" 162 | split_type: "" 163 | content_type: "" 164 | max_payload: "" 165 | volume_size_in_gb: "" 166 | max_runtime_in_seconds: "" 167 | s3_data_type: "" 168 | s3_input_mode: "" 169 | s3_data_distribution_type: "" 170 | tags: 171 | - Key: key1 172 | Value: value1 173 | - Key: key2 174 | Value: value2 175 | env: 176 | key: value 177 | key2: value2 178 | sagemakerPipeline: 179 | pipelineName: "" 180 | models: 181 | mymodel: 182 | steps: 183 | - step_name: mymodel-Preprocessing 184 | step_class: preprocessing 185 | chain_input_source_steps: 186 | - upstream-model-source-steps-# 187 | enable_cache: "" 188 | - step_name: mymodel-Training 189 | step_class: training 190 | chain_input_source_steps: 191 | - upstream-model-source-steps-# 192 | enable_cache: "" 193 | - step_name: mymodel-CreateModel 194 | step_class: createmodel 195 | - step_name: mymodel-Transform 196 | step_class: transform 197 | 
chain_input_source_steps: 198 | - upstream-model-source-steps-# 199 | chain_input_additional_prefix: "" 200 | enable_cache: "" 201 | - step_name: mymodel-Metrics 202 | step_class: metrics 203 | chain_input_source_steps: 204 | - upstream-model-source-steps-# 205 | chain_input_additional_prefix: "" 206 | enable_cache: "" 207 | - step_name: mymodel-Registry 208 | step_class: registermodel 209 | dependencies: 210 | - step1 >> step2 >> ... 211 | -------------------------------------------------------------------------------- /framework/createmodel/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/framework/createmodel/__init__.py -------------------------------------------------------------------------------- /framework/createmodel/create_model_service.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # 3 | # SPDX-License-Identifier: MIT-0 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 6 | # software and associated documentation files (the "Software"), to deal in the Software 7 | # without restriction, including without limitation the rights to use, copy, modify, 8 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 9 | # permit persons to whom the Software is furnished to do so. 10 | # 11 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 12 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 13 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 14 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 15 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 16 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 17 | 18 | # Import native libraries 19 | 20 | from sagemaker.model import Model 21 | from sagemaker.workflow.pipeline_context import PipelineSession 22 | # Import Third-party libraries 23 | from sagemaker.workflow.steps import TrainingStep 24 | # Import Custom libraries 25 | from utilities.logger import Logger 26 | 27 | 28 | ######################################################################################## 29 | ### If the Logger class implememntation required file handler ### 30 | ### self.logger = Logger(_conf) ### 31 | ######################################################################################## 32 | 33 | 34 | class CreateModelService: 35 | """ 36 | Create Model Service. 
Create a ModelStep 37 | """ 38 | 39 | def __init__(self, config: dict, model_name: str) -> "CreateModelService": 40 | """ 41 | Initialization method to Create a SageMaker Model 42 | 43 | Args: 44 | ---------- 45 | - config (dict): Application configuration 46 | - model_name (str): Name of Model 47 | """ 48 | self.config = config 49 | self.model_name = model_name 50 | self.logger = Logger() 51 | 52 | def _get_network_config(self) -> dict: 53 | """ 54 | Method to retreive SageMaker network configuration 55 | 56 | Returns: 57 | ---------- 58 | - SageMaker Network Configuration dictionary 59 | """ 60 | 61 | network_config_kwargs = dict( 62 | enable_network_isolation=False, 63 | security_group_ids=self.config.get("sagemakerNetworkSecurity.security_groups_id").split( 64 | ",") if self.config.get("sagemakerNetworkSecurity.security_groups_id") else None, 65 | subnets=self.config.get("sagemakerNetworkSecurity.subnets", None).split(",") if self.config.get( 66 | "sagemakerNetworkSecurity.subnets", None) else None, 67 | kms_key=self.config.get("sagemakerNetworkSecurity.kms_key"), 68 | encrypt_inter_container_traffic=True, 69 | role=self.config.get("sagemakerNetworkSecurity.role"), 70 | ) 71 | return network_config_kwargs 72 | 73 | def _get_pipeline_session(self) -> PipelineSession: 74 | return PipelineSession(default_bucket=self.config.get("s3Bucket")) 75 | 76 | def _args(self) -> dict: 77 | """ 78 | Parse method to retreive all arguments to be used to create the Model 79 | 80 | Returns: 81 | ---------- 82 | - CreateModel arguments : dict 83 | """ 84 | 85 | # parse main conf dictionary 86 | conf = self.config.get("models") 87 | 88 | args = dict( 89 | name=conf.get(f"{self.model_name}.name"), 90 | image_uri=conf.get(f"{self.model_name}.registry.InferenceSpecification.image_uri"), 91 | model_repack_flag=conf.get(f"{self.model_name}.registry.ModelRepack", "True"), 92 | # source_dir=conf.get(f"{self.model_name}.source_directory", os.environ["SMP_SOURCE_DIR_PATH"]), 93 | source_dir=conf.get(f"{self.model_name}.source_directory"), 94 | entry_point=conf.get(f"{self.model_name}.transform.entry_point", "inference.py").replace("/", ".").replace( 95 | ".py", ""), 96 | env={ 97 | "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code", 98 | "SAGEMAKER_PROGRAM": conf.get(f"{self.model_name}.transform.entry_point", "inference.py") 99 | .replace("/", ".") 100 | .replace(".py", ""), 101 | "SAGEMAKER_REQUIREMENTS": "requirements.txt", 102 | }, 103 | enable_network_isolation=False, 104 | default_bucket=self.config.get(f"s3Bucket"), 105 | ) 106 | 107 | return args 108 | 109 | def create_model(self, step_train: TrainingStep) -> Model: 110 | """ 111 | Create a SageMaker Model 112 | 113 | Args: 114 | ---------- 115 | - step_train (TrainingStep): SageMaker Training Step 116 | 117 | Returns: 118 | ---------- 119 | - SageMaker Model 120 | 121 | """ 122 | # Get SegeMaker Network Configuration 123 | sagemaker_network_config = self._get_network_config() 124 | self.logger.log_info(f"{'-' * 50} Start SageMaker Model Creation {self.model_name} {'-' * 50}") 125 | self.logger.log_info(f"SageMaker network config: {sagemaker_network_config}") 126 | 127 | # Get Arg for CreateModel Step 128 | args = self._args() 129 | self.logger.log_info(f"Arguments used: {args}") 130 | 131 | # Check ModelRepack Flag 132 | _model_repack_flag = args.get("model_repack_flag") 133 | 134 | vpc_config = {} 135 | if sagemaker_network_config.get("security_group_ids", None): vpc_config.update( 136 | {'SecurityGroupIds': 
sagemaker_network_config.get("security_group_ids")}) 137 | if sagemaker_network_config.get("subnets", None): vpc_config.update( 138 | {'Subnets': sagemaker_network_config.get("subnets")}) 139 | if not vpc_config: vpc_config = None 140 | 141 | model = '' 142 | if _model_repack_flag == "True": 143 | model = Model( 144 | name=args["name"], 145 | image_uri=args["image_uri"], 146 | source_dir=args["source_dir"], 147 | entry_point=args["entry_point"], 148 | model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts, 149 | role=sagemaker_network_config["role"], 150 | vpc_config=vpc_config, 151 | enable_network_isolation=args["enable_network_isolation"], 152 | sagemaker_session=self._get_pipeline_session(), 153 | model_kms_key=sagemaker_network_config["kms_key"], 154 | ) 155 | 156 | elif _model_repack_flag == "False": 157 | model = Model( 158 | name=args["name"], 159 | image_uri=args["image_uri"], 160 | env=args["env"], 161 | model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts, 162 | role=sagemaker_network_config["role"], 163 | vpc_config=vpc_config, 164 | enable_network_isolation=args.get("enable_network_isolation"), 165 | sagemaker_session=self._get_pipeline_session(), 166 | model_kms_key=sagemaker_network_config["kms_key"], 167 | ) 168 | print 169 | 170 | self.logger.log_info(f"SageMaker Model - {self.model_name} - Created") 171 | return model 172 | -------------------------------------------------------------------------------- /framework/framework_entrypoint.py: -------------------------------------------------------------------------------- 1 | from pipeline.pipeline_service import PipelineService 2 | 3 | pipeline = PipelineService() 4 | pipeline.execute_pipeline() 5 | -------------------------------------------------------------------------------- /framework/modelmetrics/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/framework/modelmetrics/__init__.py -------------------------------------------------------------------------------- /framework/modelmetrics/model_metrics_service.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # 3 | # SPDX-License-Identifier: MIT-0 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 6 | # software and associated documentation files (the "Software"), to deal in the Software 7 | # without restriction, including without limitation the rights to use, copy, modify, 8 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 9 | # permit persons to whom the Software is furnished to do so. 10 | # 11 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 12 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 13 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 14 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 15 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 16 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
17 | 18 | # Import native libraries 19 | import os 20 | 21 | # Import Third-party libraries 22 | import boto3 23 | import sagemaker 24 | from pipeline.helper import get_chain_input_file 25 | from sagemaker.network import NetworkConfig 26 | from sagemaker.processing import ( 27 | FrameworkProcessor, 28 | ProcessingInput, 29 | ProcessingOutput, 30 | RunArgs, 31 | ) 32 | from sagemaker.workflow.pipeline_context import PipelineSession 33 | # Import Custom libraries 34 | from utilities.logger import Logger 35 | 36 | ######################################################################################## 37 | ### If the Logger class implememntation required file handler ### 38 | ### self.logger = Logger(_conf) ### 39 | ######################################################################################## 40 | 41 | session = boto3.session.Session() 42 | region_name = session.region_name 43 | client_sagemaker_obj = boto3.client("sagemaker", region_name=region_name) 44 | 45 | 46 | class ModelMetricsService: 47 | """ 48 | Create an Evaluate function to generate the model metrcis 49 | """ 50 | 51 | def __init__(self, config: dict, model_name: str, step_config: dict, 52 | model_step_dict: dict) -> "ModelMetricsService": 53 | """ 54 | Initialization method to Create ModelMetricsService 55 | 56 | Args: 57 | ---------- 58 | - config (dict): Application configuration 59 | - model_name (str): Name of Model 60 | """ 61 | self.config = config 62 | self.model_name = model_name 63 | self.step_config = step_config 64 | self.model_step_dict = model_step_dict 65 | self.logger = Logger() 66 | 67 | def _get_pipeline_session(self) -> PipelineSession: 68 | return PipelineSession(default_bucket=self.config.get("s3Bucket")) 69 | 70 | def _get_network_config(self) -> dict: 71 | """ 72 | Method to retreive SageMaker network configuration 73 | 74 | Returns: 75 | ---------- 76 | - SageMaker Network Configuration dictionary 77 | """ 78 | 79 | network_config_kwargs = dict( 80 | enable_network_isolation=False, 81 | security_group_ids=self.config.get("sagemakerNetworkSecurity.security_groups_id").split( 82 | ",") if self.config.get("sagemakerNetworkSecurity.security_groups_id") else None, 83 | subnets=self.config.get("sagemakerNetworkSecurity.subnets", None).split(",") if self.config.get( 84 | "sagemakerNetworkSecurity.subnets", None) else None, 85 | encrypt_inter_container_traffic=True, 86 | ) 87 | return network_config_kwargs 88 | 89 | def _sagemaker_args(self): 90 | """ 91 | Parse method to retreive all sagemaker arguments 92 | """ 93 | conf = self.config.get(f"models.{self.model_name}.evaluate") 94 | 95 | args = dict( 96 | image_uri=conf.get("image_uri"), 97 | entry_point=conf.get("entry_point"), 98 | base_job_name=conf.get("base_job_name", "default-model-metrics-job-name"), 99 | instance_count=conf.get("instance_count", 1), 100 | instance_type=conf.get("instance_type", "ml.m5.2xlarge"), 101 | strategy=conf.get("strategy", "SingleRecord"), 102 | max_payload=conf.get("max_payload", None), 103 | volume_size_in_gb=conf.get("volume_size_in_gb", 50), 104 | max_runtime_in_seconds=conf.get("max_runtime_in_seconds", 3600), 105 | s3_data_distribution_type=conf.get("s3_data_distribution_type", "FullyReplicated"), 106 | s3_data_type=conf.get("s3_data_type", "S3Prefix"), 107 | s3_input_mode=conf.get("s3_input_mode", "File"), 108 | role=self.config.get("sagemakerNetworkSecurity.role"), 109 | kms_key=self.config.get("sagemakerNetworkSecurity.kms_key", None), 110 | tags=conf.get("tags", None), 111 | env=conf.get("env", None), 112 | ) 
113 | 114 | self.logger.log_info("Arguments Instantiates", f"Args: {args}") 115 | 116 | return args 117 | 118 | def _get_static_input_list(self) -> list: 119 | """ 120 | Method to retreive SageMaker static inputs 121 | 122 | Returns: 123 | ---------- 124 | - SageMaker Processing Inputs list 125 | 126 | """ 127 | conf = self.config.get(f"models.{self.model_name}.evaluate") 128 | # Get the total number of input files 129 | input_files_list = list() 130 | for channel in conf.get("channels", {}).keys(): input_files_list.append( 131 | conf.get(f"channels.{channel}.dataFiles", [])[0]) 132 | return input_files_list 133 | 134 | def _get_static_input(self, input_local_filepath): 135 | """ 136 | Method to retreive SageMaker static inputs 137 | 138 | Returns: 139 | ---------- 140 | - SageMaker Processing Inputs list 141 | 142 | """ 143 | static_inputs = [] 144 | 145 | conf = self.config.get(f"models.{self.model_name}.evaluate") 146 | if isinstance(conf.get("channels", {}), dict): 147 | # Get the total number of input files 148 | input_files_list = self._get_static_input_list() 149 | if len(input_files_list) >= 7: 150 | raise Exception("Static inputs for metrics should not exceed 7") 151 | for file in input_files_list: 152 | if file.get("fileName").startswith("s3://"): 153 | _source = file.get("fileName") 154 | else: 155 | bucket = conf.get("channels.train.s3Bucket") 156 | input_prefix = conf.get("channels.train.s3InputPrefix", "") 157 | _source = os.path.join(bucket, input_prefix, file.get("fileName")) 158 | 159 | temp = ProcessingInput( 160 | input_name=file.get("sourceName", ""), 161 | source=_source, 162 | destination=os.path.join(input_local_filepath, file.get("sourceName", "")), 163 | s3_data_distribution_type=conf.get("s3_data_distribution_type", "FullyReplicated"), 164 | ) 165 | static_inputs.append(temp) 166 | 167 | return static_inputs 168 | 169 | def _get_chain_input(self, input_local_filepath): 170 | """ 171 | Method to retreive SageMaker chain inputs 172 | 173 | Returns: 174 | ---------- 175 | - SageMaker Processing Inputs list 176 | """ 177 | dynamic_inputs = [] 178 | chain_input_source_step = self.step_config.get("chain_input_source_step", []) 179 | 180 | channels_conf = self.config.get(f"models.{self.model_name}.evaluate.channels", "train") 181 | if isinstance(channels_conf, str): 182 | # no datafile input 183 | channel_name = channels_conf 184 | else: 185 | # find datafile input 186 | if len(channels_conf.keys()) != 1: 187 | raise Exception("Evaluate step can only have one channel.") 188 | channel_name = list(channels_conf.keys())[0] 189 | 190 | for source_step_name in chain_input_source_step: 191 | chain_input_path = get_chain_input_file( 192 | source_step_name=source_step_name, 193 | steps_dict=self.model_step_dict, 194 | source_output_name=channel_name, 195 | ) 196 | 197 | temp = ProcessingInput( 198 | input_name=f"{source_step_name}-input", 199 | source=chain_input_path, 200 | destination=os.path.join(input_local_filepath, f"{source_step_name}-{channel_name}"), 201 | ) 202 | dynamic_inputs.append(temp) 203 | 204 | return dynamic_inputs 205 | 206 | def _get_processing_inputs(self, input_destination) -> list: 207 | """ 208 | Method to get additional processing inputs 209 | """ 210 | # Instantiate a list of inputs 211 | temp_static_input = self._get_static_input(input_destination) 212 | temp_dynamic_input = self._get_chain_input(input_destination) 213 | processing_inputs = temp_static_input + temp_dynamic_input 214 | return processing_inputs 215 | 216 | def _generate_model_metrics( 
217 | self, 218 | input_destination: str, 219 | output_source: str, 220 | output_destination: str, 221 | ) -> RunArgs: 222 | 223 | """ 224 | Method to create the ProcessorStep args to calculate metrics 225 | 226 | Args: 227 | ---------- 228 | - input_destination(str): path for input destination 229 | - output_source (str): path for output source 230 | - output_destination (str): path for output destination 231 | """ 232 | 233 | # Get metrics Config 234 | # metrics_config = self.config.get(f"models.{self.model_name}.evaluate") 235 | # Get Sagemaker Network config params 236 | sagemakernetworkconfig = self._get_network_config() 237 | # Get Sagemaker config params 238 | args = self._sagemaker_args() 239 | # Replace entry point path leverage python -m for local dependencies 240 | entrypoint_command = args.get("entry_point").replace("/", ".").replace(".py", "") 241 | 242 | # Create SageMaker Processor Instance 243 | processor = FrameworkProcessor( 244 | image_uri=args.get("image_uri"), 245 | estimator_cls=sagemaker.sklearn.estimator.SKLearn, # ignore bc of image_uri 246 | framework_version=None, 247 | role=args.get("role"), 248 | command=["python", "-m", entrypoint_command], 249 | instance_count=args.get("instance_count"), 250 | instance_type=args.get("instance_type"), 251 | volume_size_in_gb=args.get("volume_size_in_gb"), 252 | volume_kms_key=args.get("kms_key"), 253 | output_kms_key=args.get("kms_key"), 254 | max_runtime_in_seconds=args.get("max_runtime_in_seconds"), 255 | base_job_name=args.get("base_job_name"), 256 | sagemaker_session=self._get_pipeline_session(), 257 | env=args.get("env"), 258 | tags=args.get("tags"), 259 | network_config=NetworkConfig(**sagemakernetworkconfig), 260 | ) 261 | 262 | generate_model_metrics_args = processor.run( 263 | inputs=self._get_processing_inputs(input_destination), 264 | outputs=[ 265 | ProcessingOutput( 266 | source=output_source, 267 | destination=output_destination, 268 | output_name="model_evaluation_metrics", 269 | ), 270 | ], 271 | source_dir=self.config.get( 272 | f"models.{self.model_name}.source_directory", 273 | os.getenv("SMP_SOURCE_DIR_PATH") 274 | ), 275 | code=args.get("entry_point"), 276 | wait=True, 277 | logs=True, 278 | job_name=args.get("base_job_name"), 279 | ) 280 | 281 | return generate_model_metrics_args 282 | 283 | def calculate_model_metrics(self) -> RunArgs: 284 | """ 285 | Method to calculate models metrics 286 | """ 287 | 288 | self.logger.log_info(f"{'-' * 40} {self.model_name} {'-' * 40}") 289 | evaluate_data = self.config.get(f"models.{self.model_name}.evaluate") 290 | if isinstance(evaluate_data.get("channels", "train"), dict): 291 | evaluate_channels = list(evaluate_data.get("channels").keys()) 292 | # Iterate through evaluate channels 293 | if len(evaluate_channels) != 1: 294 | raise Exception(f" Only one channel allowed within evaluation section. 
{evaluate_channels} found.") 295 | else: 296 | channel = evaluate_channels[0] 297 | self.logger.log_info(f"During ModelMetricsService, one evaluate channel {channel} found.") 298 | 299 | channel_full_name = f"channels.{channel}" 300 | bucket_prefix = evaluate_data.get(f"{channel_full_name}.bucket_prefix", "") 301 | s3_bucket_name = evaluate_data.get(f"{channel_full_name}.s3BucketName") 302 | processing_input_destination = evaluate_data.get( 303 | f"{channel_full_name}.InputLocalFilepath", "/opt/ml/processing/input/" 304 | ) 305 | processing_output_source = evaluate_data.get( 306 | f"{channel_full_name}.OuputLocalFilepath", "/opt/ml/processing/output/" 307 | ) 308 | processing_output_key = os.path.join( 309 | bucket_prefix, 310 | self.model_name, 311 | "evaluation", 312 | ) 313 | processing_output_destination = f"s3://{s3_bucket_name}/{processing_output_key}/" 314 | else: 315 | processing_input_destination = "/opt/ml/processing/input/" 316 | processing_output_source = "/opt/ml/processing/output/" 317 | processing_output_destination = None 318 | 319 | generate_model_metrics_args = self._generate_model_metrics( 320 | input_destination=processing_input_destination, 321 | output_source=processing_output_source, 322 | output_destination=processing_output_destination, 323 | ) 324 | 325 | self.logger.log_info(f" Model evaluate completed.") 326 | 327 | return generate_model_metrics_args 328 | -------------------------------------------------------------------------------- /framework/pipeline/helper.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # 3 | # SPDX-License-Identifier: MIT-0 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 6 | # software and associated documentation files (the "Software"), to deal in the Software 7 | # without restriction, including without limitation the rights to use, copy, modify, 8 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 9 | # permit persons to whom the Software is furnished to do so. 10 | # 11 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 12 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 13 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 14 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 15 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 16 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 17 | 18 | # Import native libraries 19 | import re 20 | 21 | from sagemaker.workflow import steps 22 | from sagemaker.workflow.functions import Join 23 | from ast import literal_eval 24 | 25 | 26 | def look_up_step_type_from_step_name(source_step_name: str, config: dict) -> str: 27 | """ 28 | Look up a step_type in the sagemakerPipeline providing source_step_name and model_name. 29 | 30 | Args: 31 | source_step_name (str): The name of the step to look up in the sagemakerPipeline. 32 | model_name (str): The model in sagemaker pipeline 33 | config (dict): The configuration. 34 | 35 | 36 | Returns: 37 | The step_type. 38 | """ 39 | # note: chain_input_source_step will error out if source_step does not have an optional step_type declared in 40 | # sagemakerPipeline section of conf. step_type is mandatory for step_class: Processing. 
41 | for model_name in config['models'].keys(): 42 | steps_dict = config['models'][model_name] 43 | smp_steps_dict = config['sagemakerPipeline']['models'][model_name]['steps'] 44 | 45 | for step in smp_steps_dict: 46 | if step['step_name'] == source_step_name: 47 | if step['step_class'] == 'Processing': 48 | try: 49 | return step['step_type'] 50 | except KeyError: 51 | raise Exception( 52 | f'When chaining input, source {source_step_name} needs to include step_type in the sagemakerPipeline section of conf' 53 | ) 54 | elif step['step_class'] == 'Training': 55 | return "train" 56 | elif step['step_class'] == 'Transform': 57 | return "transform" 58 | else: 59 | raise Exception("Only Processing, Training & Transform steps can be used as chain input source.") 60 | 61 | 62 | def look_up_steps(source_step_name: str, steps_dict: dict) -> steps.Step: 63 | """ 64 | Look up a step in a dictionary of steps. 65 | 66 | Args: 67 | source_step_name (str): The name of the step to look up. 68 | steps_dict (dict): The dictionary of steps. 69 | 70 | Returns: 71 | The step. 72 | """ 73 | for model_name, model_steps in steps_dict.items(): 74 | for step in model_steps: 75 | if step.name == source_step_name: 76 | return step 77 | 78 | 79 | def look_up_step_config(source_step_name: str, smp_config: dict) -> tuple: 80 | """ 81 | Look up a step configuration in a dictionary of steps. 82 | 83 | Args: 84 | source_step_name (str): The name of the step to look up. 85 | smp_config (dict): The sagemakerPipeline configuration. 86 | 87 | Returns: 88 | The source model name and step class. 89 | """ 90 | for source_model in smp_config.get("models"): 91 | for step in smp_config.get(f"models.{source_model}.steps"): 92 | if step.get("step_name") == source_step_name: 93 | step_class = step.get("step_class") 94 | return source_model, step_class 95 | 96 | 97 | def get_chain_input_file( 98 | source_step_name: str, 99 | steps_dict: dict, 100 | source_output_name: str = "train", 101 | allowed_step_types: list = ["Processing", "Training", "Transform"], 102 | ) -> str: 103 | """ 104 | Get the input file for a step in a chain. 105 | 106 | Args: 107 | source_step_name (str): The name of the step to look up. 108 | steps_dict (dict): The dictionary of steps. 109 | source_output_name (str): The name of the output to look up. 110 | allowed_step_types (list): The list of allowed step types. 111 | 112 | Returns: 113 | The S3 URI of the source step output to use as chained input. 114 | """ 115 | 116 | source_step = look_up_steps(source_step_name, steps_dict) 117 | if source_step.step_type.value not in allowed_step_types: 118 | raise ValueError( 119 | f"Invalid Source Step Type: {source_step.step_type.value}, Valid source step types are {allowed_step_types}" 120 | ) 121 | if source_step.step_type.value == "Processing": 122 | chain_input_file = source_step.properties.ProcessingOutputConfig.Outputs[source_output_name].S3Output.S3Uri 123 | elif source_step.step_type.value == "Training": 124 | chain_input_file = source_step.properties.ModelArtifacts.S3ModelArtifacts 125 | elif source_step.step_type.value == "Transform": 126 | chain_input_file = source_step.properties.TransformOutput.S3OutputPath 127 | else: 128 | raise ValueError( 129 | f"Unsupported source step type: {source_step.step_type.value}. " 130 | f"Valid source step types are {allowed_step_types}" 131 | ) 132 | return chain_input_file 133 | 134 | 135 | def get_cache_flag(step_config: dict) -> bool: 136 | """ 137 | Get the cache flag for a step configuration.
138 | 139 | Args: 140 | step_config (dict): The step configuration. 141 | 142 | Returns: 143 | The cache flag. 144 | """ 145 | if "enable_cache" in step_config.keys(): 146 | cache_flag_content = step_config.get("enable_cache") 147 | if isinstance(cache_flag_content, bool): 148 | chache_flag = cache_flag_content 149 | else: 150 | raise Exception("Invalid value of step_caching, valid values are True or False") 151 | else: 152 | chache_flag = False 153 | return chache_flag 154 | 155 | 156 | def generate_default_smp_config(config: dict) -> dict: 157 | """ 158 | Generate the default SageMaker Model Parallelism configuration. 159 | 160 | Args: 161 | config (dict): The configuration. 162 | 163 | Returns: 164 | The SageMaker Model Parallelism configuration. 165 | """ 166 | model_name = config.get("models.modelName") 167 | model_abbreviated = model_name.replace("model", "") 168 | project_name = config.get("project_name") 169 | 170 | try: 171 | default_pipeline_name = config.get(f"models.{model_name}.sagemakerPipeline.pipelineName") 172 | except Exception: 173 | default_pipeline_name = f"{project_name}-{model_abbreviated}-pipeline" 174 | 175 | smp_ = f""" 176 | {{ 177 | pipelineName: "{default_pipeline_name}", 178 | models: {{ 179 | {model_name}: {{ 180 | steps = [ 181 | {{ 182 | step_name = {model_abbreviated}-Preprocessing, 183 | step_class = preprocessing, 184 | }}, 185 | {{ 186 | step_name = {model_abbreviated}-Training, 187 | step_class = training, 188 | chain_input_source_steps = [{model_abbreviated}-Preprocessing], 189 | }}, 190 | {{ 191 | step_name = {model_abbreviated}, 192 | step_class = createmodel, 193 | }}, 194 | {{ 195 | step_name = {model_abbreviated}-Transform, 196 | step_class = transform, 197 | }}, 198 | {{ 199 | step_name = {model_abbreviated}-Metrics, 200 | step_class = metrics, 201 | }}, 202 | {{ 203 | step_name = {model_abbreviated}-Register, 204 | step_class = registermodel, 205 | }} 206 | ] 207 | }} 208 | }}, 209 | dependencies = [ 210 | "{model_abbreviated}-Preprocessing >> {model_abbreviated}-Training >> {model_abbreviated} >> {model_abbreviated}-Transform >> {model_abbreviated}-Metrics >> {model_abbreviated}-Register" 211 | ] 212 | }} 213 | 214 | """ 215 | smp_config = literal_eval(smp_) 216 | return smp_config 217 | -------------------------------------------------------------------------------- /framework/pipeline/model_unit.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # 3 | # SPDX-License-Identifier: MIT-0 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 6 | # software and associated documentation files (the "Software"), to deal in the Software 7 | # without restriction, including without limitation the rights to use, copy, modify, 8 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 9 | # permit persons to whom the Software is furnished to do so. 10 | # 11 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 12 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 13 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 14 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 15 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 16 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
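# ----------------------------------------------------------------------------------
# Editor's note (illustrative sketch, not part of the original source): a minimal
# example of how the helpers in pipeline/helper.py above are used when a step
# declares chain_input_source_step in the sagemakerPipeline section of conf. The
# step name, channel name, and enable_cache value are assumptions for illustration.
#
# from pipeline.helper import get_chain_input_file, get_cache_flag
#
# # steps_dict maps model name -> list of SageMaker pipeline steps already built
# chain_input_s3_uri = get_chain_input_file(
#     source_step_name="model1-Preprocessing",
#     steps_dict=steps_dict,
#     source_output_name="train",   # Processing output channel to chain from
# )
# enable_cache = get_cache_flag({"step_name": "model1-Training", "enable_cache": True})
# ----------------------------------------------------------------------------------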
17 | 18 | # Create individual model units for pipeline 19 | 20 | from createmodel.create_model_service import CreateModelService 21 | from modelmetrics.model_metrics_service import ModelMetricsService 22 | from pipeline.helper import get_cache_flag 23 | from processing.processing_service import ProcessingService 24 | from registermodel.register_model_service import RegisterModelService 25 | from sagemaker.workflow.model_step import ModelStep 26 | from sagemaker.workflow.properties import PropertyFile 27 | from sagemaker.workflow.steps import ( 28 | CacheConfig, 29 | ProcessingStep, 30 | TrainingStep, 31 | TransformStep 32 | ) 33 | from training.training_service import TrainingService 34 | from transform.transform_service import TransformService 35 | 36 | 37 | class ModelUnit: 38 | def __init__( 39 | self, 40 | config: dict, 41 | model_name: str, 42 | model_step_dict: dict, 43 | ) -> "ModelUnit": 44 | 45 | self.config = config 46 | self.model_name = model_name 47 | self.model_step_dict = model_step_dict.copy() 48 | self.model_step_dict[self.model_name] = [] 49 | 50 | def get_train_pipeline_steps(self) -> list: 51 | process_step = None 52 | train_step = None 53 | create_model_step = None 54 | transform_step = None 55 | metrics_step = None 56 | register_model_step = None 57 | model_pipeline_steps = [] 58 | 59 | step_config_list = self.config.get(f"sagemakerPipeline.models.{self.model_name}.steps") 60 | 61 | for step_config in step_config_list: 62 | step_class = step_config.get("step_class") 63 | if step_class == "Processing": 64 | preprocess_step = self.sagemaker_processing(step_config) 65 | add_step = preprocess_step 66 | elif step_class == "Training": 67 | train_step = self.sagemaker_training(step_config) 68 | add_step = train_step 69 | elif step_class == "CreateModel": 70 | if train_step is None: 71 | raise Exception("A training step must be run before a CreateModel step") 72 | create_model_step = self.sagemaker_create_model(step_config, train_step) 73 | add_step = create_model_step 74 | elif step_class == "Transform": 75 | sagemaker_model_name = create_model_step.properties.ModelName 76 | transform_step = self.sagemaker_transform(step_config, sagemaker_model_name) 77 | add_step = transform_step 78 | elif step_class == "Metrics": 79 | if transform_step is None: 80 | raise Exception("A transform step is required to create a model metrics step.") 81 | metrics_step = self.sagemaker_model_metrics(step_config) 82 | add_step = metrics_step 83 | elif step_class == "RegisterModel": 84 | if train_step is None: 85 | raise Exception("A training step is required to create a register model step.") 86 | register_model_step = self.sagemaker_register_model(step_config, metrics_step, train_step) 87 | add_step = register_model_step 88 | else: 89 | raise Exception("Invalid step_class value.") 90 | 91 | model_pipeline_steps.append(add_step) 92 | self.model_step_dict[self.model_name].append(add_step) 93 | return model_pipeline_steps 94 | 95 | def sagemaker_processing(self, step_config: dict) -> ProcessingStep: 96 | process_service = ProcessingService( 97 | self.config, 98 | self.model_name, 99 | step_config, 100 | self.model_step_dict, 101 | ) 102 | step_args = process_service.processing() 103 | cache_config = CacheConfig(enable_caching=True, expire_after="10d") 104 | process_step = ProcessingStep( 105 | name=step_config.get("step_name"), 106 | step_args=step_args, 107 | cache_config=cache_config, 108 | ) 109 | return process_step 110 | 111 | def sagemaker_training(self, step_config: dict) -> TrainingStep: 112 
| 113 | training_service = TrainingService( 114 | self.config, 115 | self.model_name, 116 | step_config, 117 | self.model_step_dict, 118 | ) 119 | 120 | step_args = training_service.train_step() 121 | cache_config = CacheConfig(enable_caching=True, expire_after="10d") 122 | train_step = TrainingStep( 123 | name=step_config.get("step_name"), 124 | step_args=step_args, 125 | cache_config=cache_config, 126 | ) 127 | return train_step 128 | 129 | def sagemaker_create_model(self, step_config: dict, train_step: TrainingStep) -> ModelStep: 130 | 131 | create_model_service = CreateModelService( 132 | self.config, 133 | self.model_name, 134 | ) 135 | model = create_model_service.create_model(train_step) 136 | create_model_step = ModelStep( 137 | name=step_config.get("step_name"), 138 | step_args=model.create(instance_type="ml.m5.2xlarge") 139 | ) 140 | return create_model_step 141 | 142 | def sagemaker_transform(self, step_config: dict, sagemaker_model_name: str) -> TransformStep: 143 | 144 | transform_service = TransformService( 145 | self.config, 146 | self.model_name, 147 | step_config, 148 | self.model_step_dict, 149 | ) 150 | 151 | transform_step_args = transform_service.transform( 152 | sagemaker_model_name=sagemaker_model_name 153 | ) 154 | cache_config = CacheConfig(enable_caching=get_cache_flag(step_config), expire_after="10d") 155 | transform_step = TransformStep( 156 | name=step_config.get("step_name"), 157 | step_args=transform_step_args, 158 | cache_config=cache_config 159 | ) 160 | return transform_step 161 | 162 | def sagemaker_model_metrics(self, step_config: dict) -> ProcessingStep: 163 | 164 | model_metric_service = ModelMetricsService(self.config, self.model_name, step_config, self.model_step_dict) 165 | model_metric_args = model_metric_service.calculate_model_metrics() 166 | 167 | cache_config = CacheConfig(enable_caching=get_cache_flag(step_config), expire_after="10d") 168 | evaluation_report = PropertyFile( 169 | name="EvaluationReport", 170 | output_name="model_evaluation_metrics", 171 | path=f"model_evaluation_metrics.json", 172 | ) 173 | metrics_step = ProcessingStep( 174 | name=step_config.get("step_name"), 175 | step_args=model_metric_args, 176 | property_files=[evaluation_report], 177 | cache_config=cache_config, 178 | ) 179 | 180 | return metrics_step 181 | 182 | def sagemaker_register_model(self, step_config: dict, metrics_step: ProcessingStep, 183 | train_step: TrainingStep) -> ModelStep: 184 | 185 | register_model_service = RegisterModelService(self.config, self.model_name) 186 | register_model_args = register_model_service.register_model( 187 | metrics_step, 188 | train_step 189 | ) 190 | 191 | register_model_step = ModelStep( 192 | name=step_config.get("step_name"), 193 | step_args=register_model_args 194 | ) 195 | 196 | return register_model_step 197 | -------------------------------------------------------------------------------- /framework/pipeline/pipeline_service.py: -------------------------------------------------------------------------------- 1 | import json 2 | 3 | from pipeline.model_unit import ModelUnit 4 | from sagemaker.workflow.model_step import ModelStep 5 | from sagemaker.workflow.pipeline import Pipeline 6 | from sagemaker.workflow.pipeline_context import PipelineSession 7 | from utilities.configuration import Conf 8 | 9 | 10 | class PipelineService: 11 | 12 | def __init__(self) -> "PipelineService": 13 | self.config = Conf().load_conf() 14 | pass 15 | 16 | def _add_step_dependencies(self, pipeline_steps: list) -> None: 17 | 
step_dependency_config = self.config.get("sagemakerPipeline.dependencies", []) 18 | for condition in step_dependency_config: 19 | temp_chain = condition.split(" >> ") 20 | for i in range(len(temp_chain) - 1): 21 | source_step_name = temp_chain[i] 22 | dest_step_name = temp_chain[i + 1] 23 | source_step = None 24 | dest_step = None 25 | for step in pipeline_steps: 26 | step_name = step.name 27 | if step_name == source_step_name: 28 | source_step = step 29 | elif step_name == dest_step_name: 30 | dest_step = step 31 | if source_step is not None and dest_step is not None: 32 | if isinstance(dest_step, ModelStep): 33 | dest_step.steps[0].add_depends_on([source_step]) 34 | else: 35 | dest_step.add_depends_on([source_step]) 36 | break 37 | if source_step is None or dest_step is None: 38 | raise Exception( 39 | f"Failed when adding dependency between steps {source_step_name} and {dest_step_name}.") 40 | 41 | def construct_train_pipeline(self): 42 | model_steps_dict = {} 43 | 44 | for model_name in list(self.config.get("sagemakerPipeline.models").keys()): 45 | model_steps_dict[model_name] = ModelUnit( 46 | self.config, model_name, model_steps_dict, 47 | ).get_train_pipeline_steps() 48 | 49 | pipeline_steps = [] 50 | for model_name, model_unit_steps in model_steps_dict.items(): 51 | pipeline_steps += model_unit_steps 52 | 53 | self._add_step_dependencies(pipeline_steps=pipeline_steps) 54 | 55 | pipeline = Pipeline( 56 | name=self.config.get("sagemakerPipeline.pipelineName"), 57 | steps=pipeline_steps, 58 | sagemaker_session=PipelineSession(), 59 | ) 60 | pipeline_definition = json.loads(pipeline.definition()) 61 | 62 | return pipeline, pipeline_definition 63 | 64 | def execute_pipeline(self) -> None: 65 | pipeline_role = self.config.get("sagemakerNetworkSecurity.role") 66 | pipeline, pipeline_definition = self.construct_train_pipeline() 67 | 68 | with open("pipeline_definition.json", "w") as file: 69 | json.dump(pipeline_definition, file) 70 | 71 | pipeline.upsert(role_arn=pipeline_role) 72 | pipeline.start() 73 | -------------------------------------------------------------------------------- /framework/processing/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/framework/processing/__init__.py -------------------------------------------------------------------------------- /framework/processing/processing_service.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # 3 | # SPDX-License-Identifier: MIT-0 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 6 | # software and associated documentation files (the "Software"), to deal in the Software 7 | # without restriction, including without limitation the rights to use, copy, modify, 8 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 9 | # permit persons to whom the Software is furnished to do so. 10 | # 11 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 12 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 13 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 14 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 15 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 16 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 17 | 18 | import json 19 | import os 20 | from typing import Tuple 21 | 22 | from pipeline.helper import get_chain_input_file, look_up_step_type_from_step_name 23 | from sagemaker.network import NetworkConfig 24 | from sagemaker.processing import ( 25 | FrameworkProcessor, 26 | ProcessingInput, 27 | ProcessingOutput 28 | ) 29 | from sagemaker.sklearn import estimator 30 | from sagemaker.workflow.pipeline_context import PipelineSession 31 | 32 | 33 | class ProcessingService: 34 | """ 35 | Class to handle the creation of processing steps 36 | 37 | Attributes: 38 | ---------- 39 | - config: dict 40 | - Configuration dictionary 41 | - model_name: str 42 | - Model name 43 | - step_config: dict 44 | - Processing step configuration dictionary 45 | - model_step_dict: dict 46 | - Dictionary of model processing steps 47 | """ 48 | 49 | def __init__(self, config: dict, model_name: str, step_config: dict, model_step_dict: dict): 50 | self.config = config 51 | self.model_name = model_name 52 | self.step_config = step_config 53 | self.model_step_dict = model_step_dict 54 | 55 | def _get_network_config(self) -> dict: 56 | """ 57 | Method to retreive SageMaker network configuration 58 | 59 | Returns: 60 | ---------- 61 | - SageMaker Network Configuration dictionary 62 | """ 63 | network_config_kwargs = dict( 64 | enable_network_isolation=False, 65 | security_group_ids=self.config.get("sagemakerNetworkSecurity.security_groups_id").split( 66 | ",") if self.config.get("sagemakerNetworkSecurity.security_groups_id") else None, 67 | subnets=self.config.get("sagemakerNetworkSecurity.subnets", None).split(",") if self.config.get( 68 | "sagemakerNetworkSecurity.subnets", None) else None, 69 | encrypt_inter_container_traffic=True, 70 | ) 71 | 72 | return network_config_kwargs 73 | 74 | def _get_pipeline_session(self) -> PipelineSession: 75 | """ 76 | Method to retreive SageMaker pipeline session 77 | 78 | Returns: 79 | ---------- 80 | - SageMaker pipeline session 81 | """ 82 | return PipelineSession(default_bucket=self.config.get("s3Bucket")) 83 | 84 | def _args(self) -> dict: 85 | """ 86 | Parse method to retreive all arguments to be used to create the processing stop 87 | 88 | Returns: 89 | ---------- 90 | - Processing Step arguments : dict 91 | """ 92 | 93 | # parse main conf dictionary 94 | conf = self.config.get(f"models.{self.model_name}.{self.step_config.get('step_type')}") 95 | source_dir = self.config.get( 96 | f"models.{self.model_name}.source_directory", 97 | os.getenv("SMP_SOURCE_DIR_PATH") 98 | ) 99 | 100 | args = dict( 101 | image_uri=conf.get("image_uri"), 102 | base_job_name=conf.get("base_job_name", "default-processing-job-name"), 103 | entry_point=conf.get("entry_point"), 104 | instance_count=conf.get("instance_count", 1), 105 | instance_type=conf.get("instance_type", "ml.m5.2xlarge"), 106 | volume_size_in_gb=conf.get("volume_size_in_gb", 32), 107 | max_runtime_seconds=conf.get("max_runtime_seconds", 3000), 108 | tags=conf.get("tags", None), 109 | env=conf.get("env", None), 110 | source_directory=source_dir, 111 | framework_version=conf.get("framework_version", "0"), 112 | role=self.config.get("sagemakerNetworkSecurity.role"), 113 | kms_key=self.config.get("sagemakerNetworkSecurity.kms_key", None), 114 | 
s3_data_distribution_type=conf.get("s3_data_distribution_type", "FullyReplicated"), 115 | s3_data_type=conf.get("s3_data_type", "S3Prefix"), 116 | s3_input_mode=conf.get("s3_input_mode", "File"), 117 | s3_upload_mode=conf.get("s3_upload_mode", "EndOfJob"), 118 | ) 119 | 120 | return args 121 | 122 | def _get_static_input_list(self) -> list: 123 | """ 124 | Method to retreive SageMaker static inputs 125 | 126 | Returns: 127 | ---------- 128 | - SageMaker Processing Inputs list 129 | 130 | """ 131 | conf = self.config.get(f"models.{self.model_name}.{self.step_config.get('step_type')}") 132 | # Get the total number of input files 133 | input_files_list = list() 134 | for channel in conf.get("channels", {}).keys(): 135 | temp_data_files = conf.get(f"channels.{channel}.dataFiles", []) 136 | if temp_data_files: 137 | input_files_list.append(temp_data_files[0]) 138 | return input_files_list 139 | 140 | def _get_static_input(self) -> Tuple[list, int]: 141 | """ 142 | Method to retreive SageMaker static inputs 143 | 144 | Returns: 145 | ---------- 146 | - SageMaker Processing Inputs list 147 | 148 | """ 149 | # parse main conf dictionary 150 | conf = self.config.get(f"models.{self.model_name}.{self.step_config.get('step_type')}") 151 | args = self._args() 152 | # Get the total number of input files 153 | input_files_list = self._get_static_input_list() 154 | static_inputs = [] 155 | input_local_filepath = "/opt/ml/processing/input/" 156 | 157 | for file in input_files_list: 158 | if file.get("fileName").startswith("s3://"): 159 | _source = file.get("fileName") 160 | else: 161 | bucket = conf.get("channels.train.s3Bucket") 162 | input_prefix = conf.get("channels.train.s3InputPrefix", "") 163 | _source = os.path.join(bucket, input_prefix, file.get("fileName")) 164 | 165 | temp = ProcessingInput( 166 | input_name=file.get("sourceName", ""), 167 | source=_source, 168 | destination=os.path.join(input_local_filepath, file.get("sourceName", "")), 169 | s3_data_distribution_type=args.get("s3_data_distribution_type") 170 | ) 171 | 172 | static_inputs.append(temp) 173 | 174 | return static_inputs 175 | 176 | def _get_static_manifest_input(self): 177 | """ 178 | Method to create a manifest file to reference SageMaker Processing Inputs 179 | To create a manifest file the conf file should have s3Bucket and s3InputPrefix 180 | and fileName should contain only the name of the file. 181 | 182 | Returns: 183 | ---------- 184 | - SageMaker Processing Inputs list 185 | 186 | Notes: 187 | ---------- 188 | SageMaker Processing Job API has a limit of 10 ProcessingInputs 189 | 2 of these will be used for code and entrypoint input, 190 | If there are more than 7 input data files, Manifest file needs to 191 | be used to reference ProcessingInput data. 
192 | """ 193 | conf = self.config.get(f"models.{self.model_name}.{self.step_config.get('step_type')}") 194 | bucket = conf.get("channels.train.s3Bucket") 195 | input_prefix = conf.get("channels.train.s3InputPrefix", "") 196 | input_local_file_path = conf.get("inputLocalFilepath", "/opt/ml/processing/input") 197 | manifest_local_filename = f"{self.model_name}_{self.step_config.get('step_type')}_input.manifest" 198 | input_files_list = self._get_static_input_list() 199 | 200 | manifest_list = [] 201 | for file in input_files_list: 202 | manifest_list.append(file.get("fileName")) 203 | 204 | manifest_data = [{"prefix": f"s3://{bucket}/{input_prefix}"}, *manifest_list] 205 | 206 | with open(manifest_local_filename, "w") as outfile: 207 | json.dump(manifest_data, outfile, indent=1) 208 | 209 | manifest_input = ProcessingInput( 210 | source=manifest_local_filename, 211 | destination=os.path.join(input_local_file_path, "train"), 212 | s3_data_type="ManifestFile", 213 | ) 214 | 215 | return [manifest_input] 216 | 217 | def _get_chain_input(self): 218 | """ 219 | Method to retreive SageMaker chain inputs 220 | 221 | Returns: 222 | ---------- 223 | - SageMaker Processing Inputs list 224 | """ 225 | dynamic_processing_input = [] 226 | chain_input_source_step = self.step_config.get("chain_input_source_step", []) 227 | input_local_filepath = "/opt/ml/processing/input/" 228 | args = self._args() 229 | 230 | for source_step_name in chain_input_source_step: 231 | source_step_type = look_up_step_type_from_step_name( 232 | source_step_name=source_step_name, 233 | config=self.config 234 | ) 235 | 236 | for channel in self.config["models"][self.model_name][source_step_type].get('channels',["train"]): 237 | chain_input_path = get_chain_input_file( 238 | source_step_name=source_step_name, 239 | steps_dict=self.model_step_dict, 240 | source_output_name=channel, 241 | ) 242 | 243 | temp = ProcessingInput( 244 | input_name=f"{source_step_name}-input-{channel}", 245 | source=chain_input_path, 246 | destination=os.path.join(input_local_filepath, f"{source_step_name}-input-{channel}"), 247 | s3_data_distribution_type=args.get("s3_data_distribution_type") 248 | ) 249 | dynamic_processing_input.append(temp) 250 | 251 | return dynamic_processing_input 252 | 253 | def _get_processing_inputs(self) -> list: 254 | """ 255 | Method to retreive SageMaker processing inputs 256 | 257 | Returns: 258 | ---------- 259 | - SageMaker Processing Inputs list 260 | """ 261 | 262 | temp_static_input = [] 263 | if len(self._get_static_input_list()) >= 7: 264 | temp_static_input = self._get_static_manifest_input() 265 | else: 266 | temp_static_input = self._get_static_input() 267 | 268 | dynamic_processing_input = self._get_chain_input() 269 | 270 | return temp_static_input + dynamic_processing_input 271 | 272 | def _get_processing_outputs(self) -> list: 273 | """ 274 | Method to retreive SageMaker processing outputs 275 | 276 | Returns: 277 | ---------- 278 | - SageMaker Processing Outputs list 279 | """ 280 | processing_conf = self.config.get(f"models.{self.model_name}.{self.step_config.get('step_type')}") 281 | processing_outputs = [] 282 | processing_output_local_filepath = processing_conf.get("location.outputLocalFilepath", 283 | "/opt/ml/processing/output") 284 | 285 | source_step_type = self.step_config["step_type"] 286 | 287 | output_names = list( 288 | self.config["models"][self.model_name][source_step_type].get('channels', ["train"])) 289 | 290 | for output_name in output_names: 291 | temp = ProcessingOutput( 292 | 
output_name=output_name, 293 | source=os.path.join(processing_output_local_filepath, output_name), 294 | s3_upload_mode="EndOfJob" 295 | ) 296 | 297 | processing_outputs.append(temp) 298 | 299 | return processing_outputs 300 | 301 | def _run_processing_step( 302 | self, 303 | network_config: dict, 304 | args: dict 305 | ): 306 | """ 307 | Method to run SageMaker Processing step 308 | 309 | Parameters: 310 | ---------- 311 | - network_config: dict 312 | Network configuration 313 | - args: dict 314 | Arguments for SageMaker Processing step 315 | 316 | Returns: 317 | ---------- 318 | - step_process: dict 319 | SageMaker Processing step 320 | """ 321 | 322 | entrypoint_command = args["entry_point"].replace("/", ".").replace(".py", "") 323 | 324 | framework_processor = FrameworkProcessor( 325 | image_uri=args["image_uri"], 326 | framework_version=args["framework_version"], 327 | estimator_cls=estimator.SKLearn, 328 | role=args["role"], 329 | command=["python", "-m", entrypoint_command], 330 | instance_count=args["instance_count"], 331 | instance_type=args["instance_type"], 332 | volume_size_in_gb=args["volume_size_in_gb"], 333 | max_runtime_in_seconds=args["max_runtime_seconds"], 334 | base_job_name=args["base_job_name"], 335 | tags=args["tags"], 336 | env=args["env"], 337 | volume_kms_key=args["kms_key"], 338 | output_kms_key=args["kms_key"], 339 | network_config=NetworkConfig(**network_config), 340 | sagemaker_session=self._get_pipeline_session(), 341 | ) 342 | 343 | step_process = framework_processor.run( 344 | inputs=self._get_processing_inputs(), 345 | outputs=self._get_processing_outputs(), 346 | source_dir=args["source_directory"], 347 | code=args["entry_point"], 348 | job_name=args["base_job_name"] 349 | ) 350 | 351 | return step_process 352 | 353 | def processing(self) -> dict: 354 | return self._run_processing_step( 355 | self._get_network_config(), 356 | self._args() 357 | ) 358 | -------------------------------------------------------------------------------- /framework/registermodel/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/framework/registermodel/__init__.py -------------------------------------------------------------------------------- /framework/registermodel/register_model_service.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # 3 | # SPDX-License-Identifier: MIT-0 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 6 | # software and associated documentation files (the "Software"), to deal in the Software 7 | # without restriction, including without limitation the rights to use, copy, modify, 8 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 9 | # permit persons to whom the Software is furnished to do so. 10 | # 11 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 12 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 13 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 14 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 15 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 16 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
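# ----------------------------------------------------------------------------------
# Editor's note (illustrative sketch, not part of the original source): the
# ProcessingService._get_static_manifest_input() method above sidesteps the
# 10-ProcessingInput API limit by writing a ManifestFile. A sketch of the manifest
# structure it builds; bucket, prefix, and file names are placeholder assumptions.
#
# manifest_data = [
#     {"prefix": "s3://example-bucket/example-input-prefix/"},
#     "file_01.csv",
#     "file_02.csv",
#     "file_03.csv",
# ]
# # A single ProcessingInput with s3_data_type="ManifestFile" then points at this
# # JSON document, and SageMaker resolves each entry relative to the prefix.
# ----------------------------------------------------------------------------------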
17 | 18 | from sagemaker.model import ModelPackage 19 | from sagemaker.workflow.execution_variables import ExecutionVariables 20 | from sagemaker.model_metrics import MetricsSource, ModelMetrics 21 | from sagemaker.workflow.steps import ProcessingStep, TrainingStep 22 | 23 | from createmodel.create_model_service import CreateModelService 24 | 25 | 26 | class RegisterModelService: 27 | def __init__(self, config: dict, model_name: str): 28 | self.config = config 29 | self.model_name = model_name 30 | 31 | def register_model(self, step_metrics: ProcessingStep, step_train: TrainingStep) -> ModelPackage: 32 | create_model_service = CreateModelService(self.config, self.model_name) 33 | model_package_dict = self.config.get(f"models.{self.model_name}.registry") 34 | model = create_model_service.create_model(step_train=step_train) 35 | 36 | if step_metrics: 37 | model_metrics = ModelMetrics( 38 | model_statistics=MetricsSource( 39 | content_type=self.config.get( 40 | f"models.{self.model_name}.evaluate.content_type", 41 | "application/json" 42 | ), 43 | s3_uri="{}{}.json".format( 44 | step_metrics.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"], 45 | step_metrics.arguments["ProcessingOutputConfig"]["Outputs"][0]["OutputName"], 46 | ), 47 | ), 48 | ) 49 | else: 50 | model_metrics = None 51 | 52 | inference_spec_dict = model_package_dict.get("InferenceSpecification") 53 | 54 | register_model_step_args = model.register( 55 | content_types=inference_spec_dict.get("supported_content_types"), 56 | response_types=inference_spec_dict.get("supported_response_MIME_types"), 57 | inference_instances=inference_spec_dict.get("SupportedRealtimeInferenceInstanceTypes", ["ml.m5.2xlarge"]), 58 | transform_instances=inference_spec_dict.get("SupportedTransformInstanceTypes", ["ml.m5.2xlarge"]), 59 | model_package_group_name=f"{self.config.get('models.projectName')}-{self.model_name}", 60 | marketplace_cert=False, 61 | description=model_package_dict.get( 62 | "ModelPackageDescription", 63 | "Default Model Package Description. Please add custom descriptioon in your conf.yaml file" 64 | ), 65 | customer_metadata_properties={ 66 | "PIPELINE_ARN": ExecutionVariables.PIPELINE_EXECUTION_ARN, 67 | }, 68 | approval_status=inference_spec_dict.get("approval_status"), 69 | model_metrics=model_metrics, 70 | ) 71 | 72 | return register_model_step_args 73 | -------------------------------------------------------------------------------- /framework/training/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /framework/training/training_service.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # 3 | # SPDX-License-Identifier: MIT-0 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 6 | # software and associated documentation files (the "Software"), to deal in the Software 7 | # without restriction, including without limitation the rights to use, copy, modify, 8 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 9 | # permit persons to whom the Software is furnished to do so. 
10 | # 11 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 12 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 13 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 14 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 15 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 16 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 17 | 18 | # Import native libraries 19 | import os 20 | from typing import Tuple 21 | 22 | from pipeline.helper import get_chain_input_file, look_up_step_type_from_step_name 23 | from sagemaker.estimator import Estimator 24 | from sagemaker.inputs import TrainingInput 25 | from sagemaker.workflow.pipeline_context import PipelineSession 26 | 27 | 28 | class TrainingService: 29 | """ 30 | Class to handle SageMaker Training Service 31 | """ 32 | 33 | def __init__( 34 | self, 35 | config: dict, 36 | model_name: str, 37 | step_config: dict, 38 | model_step_dict: dict 39 | ) -> "TrainingService": 40 | 41 | self.config = config 42 | self.model_name = model_name 43 | self.step_config = step_config 44 | self.domain_section = self.step_config.get("step_type", "train") 45 | self.model_step_dict = model_step_dict 46 | 47 | def _get_network_config(self) -> dict: 48 | """ 49 | Method to retreive SageMaker network configuration 50 | 51 | Returns: 52 | ---------- 53 | - SageMaker Network Configuration dictionary 54 | """ 55 | 56 | network_config_kwargs = dict( 57 | enable_network_isolation=False, 58 | security_group_ids=self.config.get("sagemakerNetworkSecurity.security_groups_id").split( 59 | ",") if self.config.get("sagemakerNetworkSecurity.security_groups_id") else None, 60 | subnets=self.config.get("sagemakerNetworkSecurity.subnets", None).split(",") if self.config.get( 61 | "sagemakerNetworkSecurity.subnets", None) else None, 62 | encrypt_inter_container_traffic=True, 63 | ) 64 | return network_config_kwargs 65 | 66 | def _get_pipeline_session(self) -> PipelineSession: 67 | """ 68 | Method to retreive SageMaker pipeline session 69 | 70 | Returns: 71 | ---------- 72 | - SageMaker Pipeline Session 73 | """ 74 | 75 | return PipelineSession(default_bucket=self.config.get("s3Bucket")) 76 | 77 | def _args(self) -> dict: 78 | """ 79 | Method to retreive SageMaker training arguments 80 | 81 | Returns: 82 | ---------- 83 | - SageMaker Training Arguments dictionary 84 | """ 85 | 86 | conf = self.config.get(f"models.{self.model_name}.{self.domain_section}") 87 | source_dir = self.config.get( 88 | f"models.{self.model_name}.source_directory", 89 | os.getenv("SMP_SOURCE_DIR_PATH") 90 | ) 91 | 92 | args = dict( 93 | image_uri=conf.get("image_uri"), 94 | base_job_name=conf.get("base_job_name", "default-training-job-name"), 95 | entry_point=conf.get("entry_point"), 96 | instance_count=conf.get("instance_count", 1), 97 | instance_type=conf.get("instance_type", "ml.m5.2xlarge"), 98 | volume_size_in_gb=conf.get("volume_size_in_gb", 32), 99 | max_runtime_seconds=conf.get("max_runtime_seconds", 3000), 100 | tags=conf.get("tags", None), 101 | env=conf.get("env", None), 102 | source_directory=source_dir, 103 | output_path=conf.get("output_path"), 104 | hyperparams=conf.get("hyperparams", None), 105 | model_data_uri=conf.get("model_data_uri", None), 106 | role=self.config.get("sagemakerNetworkSecurity.role"), 107 | kms_key=self.config.get("sagemakerNetworkSecurity.kms_key", None) 108 | ) 109 | 110 | return args 111 | 112 | 
def _get_static_input_list(self) -> list: 113 | """ 114 | Method to retreive SageMaker static inputs 115 | 116 | Returns: 117 | ---------- 118 | - SageMaker Processing Inputs list 119 | 120 | """ 121 | conf = self.config.get(f"models.{self.model_name}.{self.domain_section}") 122 | # Get the total number of input files 123 | input_files_list = list() 124 | for channel in conf.get("channels", {}).keys(): input_files_list.append( 125 | conf.get(f"channels.{channel}.dataFiles", [])[0]) 126 | return input_files_list 127 | 128 | def _get_static_input(self, channel) -> Tuple[list, int]: 129 | """ 130 | Method to retreive SageMaker static inputs 131 | 132 | Returns: 133 | ---------- 134 | - SageMaker Processing Inputs list 135 | 136 | """ 137 | # parse main conf dictionary 138 | conf = self.config.get(f"models.{self.model_name}.{self.domain_section}") 139 | args = self._args() 140 | # Get the total number of input files 141 | input_files_list = self._get_static_input_list() 142 | 143 | training_channel_inputs = {} 144 | content_type = conf.get("content_type", None) 145 | input_mode = conf.get("input_mode", "File") 146 | distribution = conf.get("distribution", "FullyReplicated") 147 | 148 | for file in input_files_list: 149 | if file.get("fileName").startswith("s3://"): 150 | _source = file.get("fileName") 151 | else: 152 | bucket = conf.get("channels.train.s3Bucket") 153 | input_prefix = conf.get("channels.train.s3InputPrefix", "") 154 | _source = os.path.join(bucket, input_prefix, "data", channel, file.get("fileName")) 155 | 156 | training_input = TrainingInput( 157 | s3_data=_source, 158 | content_type=content_type, 159 | input_mode=input_mode, 160 | distribution=distribution, 161 | ) 162 | 163 | training_channel_inputs[channel] = training_input 164 | 165 | return training_channel_inputs 166 | 167 | def _get_chain_input(self): 168 | """ 169 | Method to retreive SageMaker chain inputs 170 | 171 | Returns: 172 | ---------- 173 | - SageMaker Processing Inputs list 174 | """ 175 | dynamic_training_input = [] 176 | training_channel_inputs = {} 177 | conf = self.config.get(f"models.{self.model_name}.{self.domain_section}") 178 | chain_input_source_step = self.step_config.get("chain_input_source_step", []) 179 | content_type = conf.get("content_type", None) 180 | input_mode = conf.get("input_mode", "File") 181 | distribution = conf.get("distribution", "FullyReplicated") 182 | 183 | for source_step_name in chain_input_source_step: 184 | source_step_type = look_up_step_type_from_step_name( 185 | source_step_name=source_step_name, 186 | config=self.config 187 | ) 188 | training_channel_inputs[source_step_name] = {} 189 | for channel in self.config["models"][self.model_name][source_step_type].get("channels", ["train"]): 190 | chain_input_path = get_chain_input_file( 191 | source_step_name=source_step_name, 192 | steps_dict=self.model_step_dict, 193 | source_output_name=channel 194 | ) 195 | 196 | training_input = TrainingInput( 197 | s3_data=chain_input_path, 198 | content_type=content_type, 199 | input_mode=input_mode, 200 | distribution=distribution, 201 | ) 202 | 203 | # dynamic_training_input.append(temp) 204 | # training_channel_inputs[f"{self.model_name}-{source_step_name}"] = training_input 205 | # training_channel_inputs.update({source_step_name: {channel: training_input}}) 206 | training_channel_inputs[source_step_name][channel] = training_input 207 | 208 | return training_channel_inputs 209 | 210 | def _run_training_step(self, args: dict): 211 | if "/" in args["entry_point"]: 212 | 
train_source_dir = f"{args['source_directory']}/{args['entry_point'].rsplit('/', 1)[0]}" 213 | train_entry_point = args["entry_point"].rsplit("/", 1)[1] 214 | train_dependencies = [ 215 | os.path.join(args["source_directory"], f) for f in os.listdir(args["source_directory"]) 216 | ] 217 | else: 218 | train_source_dir = args["source_directory"] 219 | train_entry_point = args["entry_point"] 220 | train_dependencies = None 221 | 222 | estimator = Estimator( 223 | role=args["role"], 224 | image_uri=args["image_uri"], 225 | instance_count=args["instance_count"], 226 | instance_type=args["instance_type"], 227 | volume_size=args["volume_size_in_gb"], 228 | max_run=args["max_runtime_seconds"], 229 | output_path=args["output_path"], 230 | base_job_name=args["base_job_name"], 231 | hyperparameters=args["hyperparams"], 232 | tags=args["tags"], 233 | model_uri=args["model_data_uri"], 234 | environment=args["env"], 235 | source_dir=train_source_dir, 236 | entry_point=train_entry_point, 237 | dependences=train_dependencies, 238 | sagemaker_session=self._get_pipeline_session() 239 | ) 240 | 241 | return estimator 242 | 243 | def train_step(self): 244 | """ 245 | Method to run training step 246 | 247 | Returns: 248 | ---------- 249 | - SageMaker Estimator object 250 | 251 | """ 252 | 253 | train_conf = self.config.get(f"models.{self.model_name}.{self.domain_section}") 254 | args = self._args() 255 | estimator = self._run_training_step(args) 256 | 257 | training_channel_inputs = {} 258 | for channel in train_conf.get("channels", "train"): 259 | temp_inputs = self._get_static_input(channel) 260 | training_channel_inputs.update(temp_inputs) 261 | 262 | chained_inputs = self._get_chain_input() 263 | for chain_input in chained_inputs: 264 | for channel in chained_inputs[chain_input]: 265 | training_channel_inputs.update({f"{chain_input}-{channel}": chained_inputs[chain_input][channel]}) 266 | 267 | train_args = estimator.fit( 268 | inputs=training_channel_inputs, 269 | job_name=args["base_job_name"] 270 | ) 271 | 272 | return train_args 273 | -------------------------------------------------------------------------------- /framework/transform/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/framework/transform/__init__.py -------------------------------------------------------------------------------- /framework/transform/transform_service.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # 3 | # SPDX-License-Identifier: MIT-0 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 6 | # software and associated documentation files (the "Software"), to deal in the Software 7 | # without restriction, including without limitation the rights to use, copy, modify, 8 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 9 | # permit persons to whom the Software is furnished to do so. 10 | # 11 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 12 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 13 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. 
/framework/transform/transform_service.py:
--------------------------------------------------------------------------------
1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 | #
3 | # SPDX-License-Identifier: MIT-0
4 | #
5 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this
6 | # software and associated documentation files (the "Software"), to deal in the Software
7 | # without restriction, including without limitation the rights to use, copy, modify,
8 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
9 | # permit persons to whom the Software is furnished to do so.
10 | #
11 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
12 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
13 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
14 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
15 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
16 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
17 |
18 | # Import native libraries
19 | from typing import Union, Tuple
20 |
21 | from pipeline.helper import get_chain_input_file
22 | # Import third-party libraries
23 | from sagemaker.transformer import Transformer
24 | from sagemaker.workflow.functions import Join
25 | from sagemaker.workflow.pipeline_context import PipelineSession
26 | # Import custom libraries
27 | from utilities.logger import Logger
28 |
29 |
30 | class TransformService:
31 | """
32 | SageMaker Transform Step service.
33 | """
34 |
35 | def __init__(self, config: dict, model_name: str, step_config: dict, model_step_dict: dict) -> None:
36 | """
37 | Initialization method for TransformService
38 |
39 | Args:
40 | ----------
41 | - config (dict): Application configuration
42 | - model_name (str): Name of the model
43 | - step_config (dict) / model_step_dict (dict): Transform step configuration and the dictionary of previously created model steps
44 | """
45 | self.config = config
46 | self.model_name = model_name
47 | self.step_config = step_config
48 | self.model_step_dict = model_step_dict
49 | self.logger = Logger()
50 |
51 | def _get_network_config(self) -> dict:
52 | """
53 | Method to retrieve the SageMaker network configuration
54 |
55 | Returns:
56 | ----------
57 | - SageMaker Network Configuration dictionary
58 | """
59 |
60 | network_config_kwargs = dict(
61 | enable_network_isolation=False,
62 | security_group_ids=self.config.get("sagemakerNetworkSecurity.security_groups_id").split(
63 | ",") if self.config.get("sagemakerNetworkSecurity.security_groups_id") else None,
64 | subnets=self.config.get("sagemakerNetworkSecurity.subnets", None).split(",") if self.config.get(
65 | "sagemakerNetworkSecurity.subnets", None) else None,
66 | kms_key=self.config.get("sagemakerNetworkSecurity.kms_key"),
67 | encrypt_inter_container_traffic=True,
68 | role=self.config.get("sagemakerNetworkSecurity.role"),
69 | )
70 | return network_config_kwargs
71 |
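For illustration (the IDs below are invented), the network block above reads security groups and subnets as comma-separated strings from the sagemakerNetworkSecurity section and splits them into lists; keys that are absent simply resolve to None:

conf = DotDict({"sagemakerNetworkSecurity": {"subnets": "subnet-aaa,subnet-bbb"}})
conf.get("sagemakerNetworkSecurity.subnets").split(",")   # -> ["subnet-aaa", "subnet-bbb"]
conf.get("sagemakerNetworkSecurity.security_groups_id")   # not set -> None, so no security groups are passed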
72 | def _args(self) -> dict:
73 | """
74 | Parse method to retrieve all arguments used to configure the transform job
75 |
76 | Returns:
77 | ----------
78 | - Transform Step arguments : dict
79 | """
80 |
81 | # parse main conf dictionary
82 | conf = self.config.get("models")
83 |
84 | args = dict(
85 | image_uri=conf.get(f"{self.model_name}.transform.image_uri"),
86 | base_job_name=conf.get(f"{self.model_name}.transform.base_job_name", "default-transform-job-name"),
87 | instance_count=conf.get(f"{self.model_name}.transform.instance_count", 1),
88 | instance_type=conf.get(f"{self.model_name}.transform.instance_type", "ml.m5.2xlarge"),
89 | strategy=conf.get(f"{self.model_name}.transform.strategy", None),
90 | assemble_with=conf.get(f"{self.model_name}.transform.assemble_with", None),
91 | join_source=conf.get(f"{self.model_name}.transform.join_source", None),
92 | split_type=conf.get(f"{self.model_name}.transform.split_type", None),
93 | content_type=conf.get(f"{self.model_name}.transform.content_type", "text/csv"),
94 | max_payload=conf.get(f"{self.model_name}.transform.max_payload", None),
95 | volume_size=conf.get(f"{self.model_name}.transform.volume_size", 50),
96 | max_runtime_in_seconds=conf.get(f"{self.model_name}.transform.max_runtime_in_seconds", 3600),
97 | input_filter=conf.get(f"{self.model_name}.transform.input_filter", None),
98 | output_filter=conf.get(f"{self.model_name}.transform.output_filter", None),
99 | tags=conf.get(f"{self.model_name}.transform.tags", None),
100 | env=conf.get(f"{self.model_name}.transform.env", None),
101 | )
102 |
103 | return args
104 |
105 | def _get_train_inputs_outputs(self, transform_data: dict) -> Tuple[str, str]:
106 | """
107 | Method to dynamically retrieve the files to be transformed
108 |
109 | Args:
110 | ----------
111 | - transform_data (dict): Dictionary of files
112 |
113 | Return
114 | ----------
115 | - input_data_file_s3path (str): Input path location
116 | - output_file_s3path (str): Output path location
117 | """
118 |
119 | evaluate_channels = list(transform_data.get("channels", {}).keys())
120 | if len(evaluate_channels) != 1:
121 | raise Exception(f"Only one channel allowed within Transform evaluate section. {evaluate_channels} found.")
122 | else:
123 | channel = evaluate_channels[0]
124 | self.logger.log_info("INFO", f"During TransformService, one evaluate channel {channel} found.")
125 |
126 | channel_full_name = f"channels.{channel}"
127 | bucket_prefix = transform_data.get(f"{channel_full_name}.inputBucketPrefix") + '/' if transform_data.get(
128 | f"{channel_full_name}.inputBucketPrefix") else ""
129 | s3_bucket_name = transform_data.get(f"{channel_full_name}.s3BucketName")
130 |
131 | # Transform data source
132 | files = list(transform_data.get(f"{channel_full_name}.dataFiles", ""))
133 |
134 | if len(files) == 1:
135 | file = files[0]
136 | self.logger.log_info("INFO", f"During TransformService, one evaluate file {file} found.")
137 | file_name = file.get("fileName")
138 |
139 | if file_name.startswith("s3://"):
140 | input_data_file_s3path = file_name
141 | else:
142 | input_data_file_s3path = f"s3://{s3_bucket_name}/{bucket_prefix}{file_name}"
143 |
144 | elif len(files) == 0:
145 | self.logger.log_info("INFO", "During TransformService, no evaluate file found.")
146 | input_data_file_s3path = None
147 | else:
148 | raise Exception(f"Maximum one file allowed within evaluation.dataFiles section. {len(files)} found.")
149 |
150 | output_file_s3path = f"s3://{s3_bucket_name}/{bucket_prefix}{self.model_name}/predictions/transform"
151 |
152 | return input_data_file_s3path, output_file_s3path
153 |
154 | def _get_chain_input(self):
155 | """
156 | Method to retrieve SageMaker chain inputs
157 |
158 | Returns:
159 | ----------
160 | - S3 path of the chained input, or None when no chain input source step is configured
161 | """
162 | channels_conf = self.config.get(f"models.{self.model_name}.transform.channels", {"train": []})
163 | if len(channels_conf.keys()) != 1:
164 | raise Exception("Transform step can only have one channel.")
165 | channel_name = list(channels_conf.keys())[0]
166 |
167 | dynamic_processing_input = []
168 | chain_input_source_step = self.step_config.get("chain_input_source_step", [])
169 | chain_input_additional_prefix = self.step_config.get("chain_input_additional_prefix", "")
170 | args = self._args()
171 |
172 | if len(chain_input_source_step) == 1:
173 | self.logger.log_info(
174 | "INFO", f"During TransformService, chain input source step {chain_input_source_step} found."
175 | )
176 | for source_step_name in chain_input_source_step:
177 | chain_input_path = get_chain_input_file(
178 | source_step_name=source_step_name,
179 | steps_dict=self.model_step_dict,
180 | source_output_name=channel_name,
181 | )
182 |
183 | input_data_file_s3path = Join("/", [chain_input_path, chain_input_additional_prefix])
184 |
185 | elif len(chain_input_source_step) == 0:
186 | self.logger.log_info(
187 | "INFO", "During TransformService, no chain input found. Input from transform.dataFiles"
188 | )
189 | return None
190 | else:
191 | raise Exception(
192 | f"Maximum one chain input allowed for TransformService. {len(chain_input_source_step)} found."
193 | )
194 |
195 | return input_data_file_s3path
196 |
197 | def _run_batch_transform(
198 | self,
199 | input_data: str,
200 | output_path: str,
201 | network_config: dict,
202 | sagemaker_model_name: str,
203 | args: dict,
204 | ) -> dict:
205 | """
206 | Method to set up a SageMaker Transformer and its transform arguments
207 |
208 | Args:
209 | ----------
210 | - input_data (str): Input data path
211 | - output_path (str): Output data path
212 | - network_config (dict): SageMaker Network Config
213 | - sagemaker_model_name (str): SageMaker Model Name
214 | - args (dict): SageMaker TransformStep arguments
215 |
216 | Return
217 | ----------
218 | - step_transform_args : TransformStep Args
219 | """
220 |
221 | # define a transformer
222 | transformer = Transformer(
223 | model_name=sagemaker_model_name,
224 | instance_count=args["instance_count"],
225 | instance_type=args["instance_type"],
226 | strategy=args["strategy"],
227 | assemble_with=args["assemble_with"],
228 | max_payload=args["max_payload"],
229 | output_path=output_path,
230 | sagemaker_session=PipelineSession(
231 | default_bucket=self.config.get("models.s3Bucket"),
232 | ),
233 | output_kms_key=network_config["kms_key"],
234 | accept=args["content_type"],
235 | tags=args["tags"],
236 | env=args["env"],
237 | base_transform_job_name=args["base_job_name"],
238 | volume_kms_key=network_config["kms_key"],
239 | )
240 |
241 | step_transform_args = transformer.transform(
242 | data=input_data,
243 | content_type=args["content_type"],
244 | split_type=args["split_type"],
245 | input_filter=args["input_filter"],
246 | output_filter=args["output_filter"],
247 | join_source=args["join_source"],
248 | )
249 |
250 | return step_transform_args
251 |
252 | def transform(
253 | self,
254 | sagemaker_model_name: str,
255 | ) -> dict:
256 | """
257 | Method to set up the SageMaker TransformStep
258 |
259 | Args:
260 | ----------
261 | - sagemaker_model_name (str): Name of the SageMaker Model used by the Transformer
262 |
263 | Return
264 | ----------
265 | - step_transform_args : TransformStep arguments
266 | """
267 | self.logger.log_info(f"{'-' * 40} {self.model_name} {'-' * 40}")
268 | self.logger.log_info(f"Starting {self.model_name} batch transform")
269 |
270 | # Get SageMaker network configuration
271 | sagemaker_network_config = self._get_network_config()
272 | self.logger.log_info(f"SageMaker network config: {sagemaker_network_config}")
273 |
274 | transform_data = self.config.get(f"models.{self.model_name}.transform")
275 | sagemaker_config = self._args()
276 |
277 | input_data_file_s3path = self._get_chain_input()
278 | if input_data_file_s3path is not None:
279 | output_data_file_s3path = None
280 | else:
281 | input_data_file_s3path, output_data_file_s3path = self._get_train_inputs_outputs(
282 | transform_data
283 | )
284 |
285 | step_transform_args = self._run_batch_transform(
286 | input_data=input_data_file_s3path,
287
| output_path=output_data_file_s3path, 288 | network_config=sagemaker_network_config, 289 | sagemaker_model_name=sagemaker_model_name, 290 | args=sagemaker_config, 291 | ) 292 | 293 | return step_transform_args 294 | -------------------------------------------------------------------------------- /framework/utilities/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/framework/utilities/__init__.py -------------------------------------------------------------------------------- /framework/utilities/configuration.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # 3 | # SPDX-License-Identifier: MIT-0 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 6 | # software and associated documentation files (the "Software"), to deal in the Software 7 | # without restriction, including without limitation the rights to use, copy, modify, 8 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 9 | # permit persons to whom the Software is furnished to do so. 10 | # 11 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 12 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 13 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 14 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 15 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 16 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 17 | 18 | import os 19 | import glob 20 | import yaml 21 | from typing import Any, Dict, Union, List 22 | 23 | 24 | class Conf: 25 | """ 26 | Class to Read Framework config and all complementary app Conf files. 
27 | """ 28 | 29 | def __init__(self): 30 | self.path = "framework/conf/conf.yaml" 31 | 32 | def load_conf(self): 33 | """ 34 | Method to load and merge all Conf files 35 | """ 36 | base_conf, conf_path = self._get_framework_conf() 37 | base_conf["conf"]["models"] = {} 38 | 39 | modelConfigFilePath = base_conf["conf"][ 40 | "modelConfigFilePath" 41 | ] 42 | yaml_files = glob.glob( 43 | f"{self._get_parent_dir()}/{modelConfigFilePath}", recursive=True 44 | ) 45 | 46 | for file_path in yaml_files: 47 | if file_path.startswith(conf_path): 48 | continue 49 | model_conf = self._read_yaml_file(file_path) 50 | # Insert Models Attibutes into Framework attribute in a runtime 51 | for key, value in model_conf["conf"]["models"].items(): 52 | base_conf["conf"]["models"].setdefault(key, {}).update(value) 53 | 54 | # Insert sagemakerPipeline section as a primary key 55 | for key, value in model_conf["conf"].items(): 56 | if key == "sagemakerPipeline": 57 | base_conf["conf"]["sagemakerPipeline"] = {} 58 | base_conf["conf"]["sagemakerPipeline"].update(value) 59 | 60 | update_conf = self._inject_env_variables(config=base_conf) 61 | return DotDict(update_conf).get("conf") 62 | 63 | def _inject_env_variables( 64 | self, 65 | config: Union[Dict[str, Union[Dict, List, str]], List] 66 | ) -> Union[Dict, List]: 67 | """ 68 | Replace dictionary TAGS by Environment Variables on a runtime 69 | 70 | Args: 71 | ---------- 72 | - config (dict): Framework configuration 73 | 74 | Returns: 75 | ---------- 76 | - Frameworks configurationwith values tags replaced by 77 | environment variables. 78 | 79 | """ 80 | if isinstance(config, dict): 81 | updated_config = {} 82 | for key, value in config.items(): 83 | if isinstance(value, dict): 84 | updated_config[key] = self._inject_env_variables(value) 85 | elif isinstance(value, list): 86 | updated_config[key] = [self._inject_env_variables(item) for item in value] 87 | else: 88 | updated_config[key] = self._replace_placeholders(value) 89 | return updated_config 90 | 91 | elif isinstance(config, list): 92 | return [self._inject_env_variables(item) for item in config] 93 | else: 94 | return config 95 | 96 | def _replace_placeholders(self, value: str) -> str: 97 | """ 98 | Placeholder 99 | """ 100 | if isinstance(value, str): 101 | if value.startswith("s3://"): 102 | parts = value.split("/") 103 | updated_parts = [os.environ.get(part, part) for part in parts] 104 | return "/".join(updated_parts) 105 | else: 106 | parts = value.split(".") 107 | updated_parts = [os.environ.get(part, part) for part in parts] 108 | return '.'.join(updated_parts) 109 | return value 110 | 111 | def _get_framework_conf(self): 112 | """ 113 | Load the Framework Conf file 114 | """ 115 | path = self.path 116 | root = self._get_parent_dir() 117 | conf_path = os.path.join(root, path) 118 | 119 | with open(conf_path, "r") as f: 120 | conf = yaml.safe_load(f) 121 | config = self._inject_env_variables(config=conf) 122 | return config, conf_path 123 | 124 | def _get_parent_dir(self): 125 | """ 126 | Get the parent directory from where the framework is been executed 127 | """ 128 | subdirectory = "framework" 129 | current_directory = os.getcwd() 130 | 131 | substring = str(current_directory).split("/") 132 | parent_dir = [path for path in substring if path != subdirectory] 133 | 134 | return "/".join(parent_dir) 135 | 136 | def _read_yaml_file(self, file_path: str): 137 | """ 138 | Read YAML file 139 | 140 | Args: 141 | ---------- 142 | - file_path (str): Conf file path 143 | 144 | """ 145 | with open(file_path, 
"r") as f: 146 | return yaml.safe_load(f) 147 | 148 | 149 | class DotDict(dict): 150 | """ 151 | A dictionary subclass that enables dot notation for nested access 152 | """ 153 | 154 | def __getattr__(self, key: str) -> "DotDict": 155 | """ 156 | Retreive the value of a nested key using dot notation. 157 | 158 | Args: 159 | ---------- 160 | - key (str): Yhe nested key in dot notation 161 | 162 | Returns: 163 | ---------- 164 | - The value of the nested key, wrapped in a "DotDict" if the value is a dictionary. 165 | 166 | Raises: 167 | ---------- 168 | - AttributeError: If the nested key is not found. 169 | """ 170 | if key in self: 171 | value = self[key] 172 | if isinstance(value, dict): 173 | return DotDict(value) 174 | return value 175 | else: 176 | return DotDict() 177 | 178 | def __setattr__(self, key: str, value: Any) -> None: 179 | self[key] = value 180 | 181 | def __delattr__(self, key: str) -> None: 182 | try: 183 | del self[key] 184 | except KeyError: 185 | raise AttributeError( 186 | f"{self.__class__.__name__} object has no attribute {key}" 187 | ) 188 | 189 | def get_value(self, key: str, default: Any = None) -> Any: 190 | """ 191 | Retreive the value of a nested key using dot notation 192 | 193 | Args: 194 | ---------- 195 | - key (str): The nested key in dot notation. 196 | - default (Any): The default value to return if the nested keyis not found. Default is None 197 | 198 | Returns: 199 | ---------- 200 | - The value of the nested key if found, or the specified default value if not found. 201 | """ 202 | keys = key.split(".") 203 | value = self 204 | for k in keys: 205 | value = value.__getattr__(k) 206 | if not isinstance(value, DotDict): 207 | break 208 | return value if value is not None else default 209 | 210 | def get(self, key: str, default: Any = None) -> Any: 211 | """ 212 | Retreive the value of a nested key using dot notation 213 | 214 | Args: 215 | ---------- 216 | - key (str): The nested key in dot notation. 217 | - default (Any): The default value to return if the nested keyis not found. Default is None 218 | 219 | Returns: 220 | ---------- 221 | - The value of the nested key if found, or the specified default value if not found. 222 | """ 223 | value = self.get_value(key) 224 | return value if value is not None and value != {} else default 225 | -------------------------------------------------------------------------------- /framework/utilities/logger.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # 3 | # SPDX-License-Identifier: MIT-0 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 6 | # software and associated documentation files (the "Software"), to deal in the Software 7 | # without restriction, including without limitation the rights to use, copy, modify, 8 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 9 | # permit persons to whom the Software is furnished to do so. 10 | # 11 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 12 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 13 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
14 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
15 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
16 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
17 |
18 | import logging
19 | from typing import Callable, List, Union
20 |
21 |
22 | class Logger:
23 | """
24 | Logger class that implements custom message formatting
25 |
26 | Attributes:
27 | ----------
28 | - logger (logging.Logger): The logger object from the logging module.
29 |
30 | Methods:
31 | ----------
32 | - log_debug(*messages: Union[str, List[str]]): Logs debug messages
33 | - log_info(*messages: Union[str, List[str]]): Logs informational messages
34 | - log_warning(*messages: Union[str, List[str]]): Logs warning messages
35 | - log_error(*messages: Union[str, List[str]]): Logs error messages
36 | - log_critical(*messages: Union[str, List[str]]): Logs critical messages
37 | """
38 |
39 | def __init__(self, config: Union[dict, None] = None):
40 | """
41 | Initializes the Logger class.
42 |
43 | Args:
44 | ----------
45 | - config (dict): Configuration for the logger
46 | """
47 | # Get logger
48 | self.logger = logging.getLogger(self.__class__.__name__)
49 |
50 | # Create the formatter
51 | formatter = logging.Formatter(
52 | "%(asctime)s :::: [Log %(name)s] :::: %(message)s",
53 | datefmt="[%Y-%m-%d %H:%M:%S %Z%z]"
54 | )
55 |
56 | # Create the console handler and add formatter
57 | console_handler = logging.StreamHandler()
58 | console_handler.setFormatter(formatter)
59 |
60 | # Add the handler only once to avoid duplicate log lines when several services create a Logger
61 | if not self.logger.handlers: self.logger.addHandler(console_handler)
62 |
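A brief usage sketch of this wrapper (the messages are illustrative, not taken from the repository): each service instantiates Logger and calls the level-specific helpers, passing one or several messages per call:

logger = Logger()
logger.log_info("Starting transform step")                      # single message
logger.log_error("Could not read conf.yaml", "using defaults")  # several messages in one call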
63 | def _log_messages(self, level: Callable, prefix: str, *messages: Union[str, List[str]]):
64 | """
65 | Logs messages with the specified level.
66 |
67 | Args:
68 | ----------
69 | - level: (Callable): The bound logging method to call, e.g. self.logger.debug
70 | - prefix: (str): Message prefix to be used
71 | - *messages: (Union[str, List[str]]): Single or List of messages
72 | """
73 | for msg in messages:
74 | level(f"[Level: {prefix}] :::: {msg}")
75 |
76 | def log_debug(self, *messages: Union[str, List[str]]):
77 | """
78 | Logs debug messages
79 |
80 | Args:
81 | ----------
82 | - *messages: (Union[str, List[str]]): Single or List of messages
83 | """
84 | self.logger.setLevel(logging.DEBUG)
85 | self._log_messages(self.logger.debug, "DEBUG ", *messages)
86 |
87 | def log_info(self, *messages: Union[str, List[str]]):
88 | """
89 | Logs informational messages
90 |
91 | Args:
92 | ----------
93 | - *messages: (Union[str, List[str]]): Single or List of messages
94 | """
95 | self.logger.setLevel(logging.INFO)
96 | self._log_messages(self.logger.info, "INFO ", *messages)
97 |
98 | def log_warning(self, *messages: Union[str, List[str]]):
99 | """
100 | Logs warning messages
101 |
102 | Args:
103 | ----------
104 | - *messages: (Union[str, List[str]]): Single or List of messages
105 | """
106 | self.logger.setLevel(logging.WARNING)
107 | self._log_messages(self.logger.warning, "WARNING ", *messages)
108 |
109 | def log_error(self, *messages: Union[str, List[str]]):
110 | """
111 | Logs error messages
112 |
113 | Args:
114 | ----------
115 | - *messages: (Union[str, List[str]]): Single or List of messages
116 | """
117 | self.logger.setLevel(logging.ERROR)
118 | self._log_messages(self.logger.error, "ERROR ", *messages)
119 |
120 | def log_critical(self, *messages: Union[str, List[str]]):
121 | """
122 | Logs critical messages
123 |
124 | Args:
125 | ----------
126 | - *messages: (Union[str, List[str]]): Single or List of messages
127 | """
128 | self.logger.setLevel(logging.CRITICAL)
129 | self._log_messages(self.logger.critical, "CRITICAL", *messages)
130 |
--------------------------------------------------------------------------------
/framework/utilities/utils.py:
--------------------------------------------------------------------------------
1 | class S3Utilities:
2 | # Singleton guard: holds the single shared instance
3 | _instance = None
4 |
5 | def __new__(cls):
6 | if cls._instance is None:
7 | cls._instance = super(S3Utilities, cls).__new__(cls)
8 | return cls._instance
9 |
10 | @staticmethod
11 | def split_s3_uri(s3_uri: str) -> tuple:
12 | split_list = s3_uri.split("//")[1].split("/")
13 | s3_bucket_name = split_list[0]
14 | s3_key = "/".join(split_list[1:])
15 | return s3_bucket_name, s3_key
16 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | sagemaker
2 | boto3
3 | pyhocon
4 | PyYAML
--------------------------------------------------------------------------------
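To close, a small usage sketch of the configuration and S3 utilities above (the bucket, model and key names are invented for illustration): DotDict resolves nested keys with dot notation and falls back to the supplied default when any part of the chain is missing, and S3Utilities.split_s3_uri breaks an S3 URI into bucket and key.

conf = DotDict({"models": {"mymodel": {"train": {"instance_type": "ml.m5.xlarge"}}}})
conf.get("models.mymodel.train.instance_type")      # -> "ml.m5.xlarge"
conf.get("models.mymodel.train.instance_count", 1)  # missing key resolves to the default, 1
S3Utilities.split_s3_uri("s3://my-bucket/path/to/file.csv")  # -> ("my-bucket", "path/to/file.csv")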