├── .gitignore ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── architecture_diagram.png ├── env.env ├── examples ├── lgbm │ ├── conf │ │ └── conf.yaml │ ├── dag.png │ ├── data_source.md │ ├── evaluate │ │ ├── __init__.py │ │ ├── evaluate.py │ │ └── requirements.txt │ ├── preprocessing │ │ ├── __init__.py │ │ ├── preprocessing.py │ │ └── requirements.txt │ ├── requirements.txt │ ├── training │ │ ├── __init__.py │ │ ├── requirements.txt │ │ └── training.py │ └── transform │ │ ├── __init__.py │ │ ├── docker │ │ ├── Dockerfile │ │ ├── dockerd-entrypoint.py │ │ ├── model_script.py │ │ └── readme.md │ │ ├── requirements.txt │ │ └── transform.py ├── llm-text-summarization │ ├── conf │ │ └── conf.yaml │ ├── dag.png │ ├── data_source.md │ ├── preprocessing │ │ ├── __init__.py │ │ └── preprocessing.py │ ├── requirements.txt │ ├── training │ │ ├── __init__.py │ │ ├── requirements.txt │ │ └── training.py │ └── transform │ │ └── inference.py ├── multi-model-example │ ├── MultiModel.md │ ├── cal_housing_pca │ │ ├── conf │ │ │ └── conf-multi-model.yaml │ │ ├── data_source.md │ │ └── modelscripts │ │ │ ├── inference.py │ │ │ ├── preprocess.py │ │ │ ├── requirements.txt │ │ │ └── train.py │ ├── cal_housing_tf │ │ ├── conf │ │ │ └── conf-multi-model.yaml │ │ ├── data_source.md │ │ └── modelscripts │ │ │ ├── evaluate.py │ │ │ ├── inference.py │ │ │ ├── preprocess.py │ │ │ ├── requirements.txt │ │ │ └── train.py │ └── dag.png └── tf │ ├── conf │ └── conf.yaml │ ├── data_source.md │ ├── modelscripts │ ├── evaluate.py │ ├── inference.py │ ├── preprocess.py │ ├── requirements.txt │ └── train.py │ └── smp_dag.png ├── framework ├── .gitignore ├── __init__.py ├── conf │ ├── conf.yaml │ └── template │ │ └── conf.yaml ├── createmodel │ ├── __init__.py │ └── create_model_service.py ├── framework_entrypoint.py ├── modelmetrics │ ├── __init__.py │ └── model_metrics_service.py ├── pipeline │ ├── helper.py │ ├── model_unit.py │ └── pipeline_service.py ├── processing │ ├── __init__.py │ └── processing_service.py ├── registermodel │ ├── __init__.py │ └── register_model_service.py ├── training │ ├── __init__.py │ └── training_service.py ├── transform │ ├── __init__.py │ └── transform_service.py └── utilities │ ├── __init__.py │ ├── configuration.py │ ├── logger.py │ └── utils.py └── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | **/.DS_Store 2 | *.ipynb 3 | *.ipynb_checkpoints* 4 | *__pycache__* 5 | pipeline_definition.json 6 | *env.txt 7 | .idea 8 | git_erase_commits 9 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 
5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *main* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional document on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT No Attribution 2 | 3 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy of
6 | this software and associated documentation files (the "Software"), to deal in
7 | the Software without restriction, including without limitation the rights to
8 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
9 | the Software, and to permit persons to whom the Software is furnished to do so.
10 |
11 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
12 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
13 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
14 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
15 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
16 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
17 |
18 |
-------------------------------------------------------------------------------- /README.md: --------------------------------------------------------------------------------
1 | ## Dynamic SageMaker Pipelines Framework
2 | In this repo, we present a framework for automating SageMaker Pipelines DAG creation based on simple configuration files. The framework code starts by reading the configuration file(s) and then dynamically creates a SageMaker Pipelines DAG based on the steps declared in the configuration file(s) and the interactions/dependencies among steps. This orchestration framework caters to both single-model and multi-model use cases and ensures a smooth flow of data and processes.
3 |
4 | ### Architecture Diagram
5 |
6 | The following architecture diagram depicts how the proposed framework can be used during both experimentation and operationalization of ML models.
7 |
8 | ![architecture-diagram](./architecture_diagram.png)
9 |
10 | ### Repository Structure
11 |
12 | This repo contains the following directories and files:
13 |
14 | - **/framework/conf/**: This directory contains a configuration file that is used to set common variables across all modeling units, such as subnets, security groups, and the IAM role, at runtime. A modeling unit is a sequence of up to 6 steps for training an ML model
15 | - **/framework/createmodel/**: This directory contains a Python script that creates a SageMaker Model object based on model artifacts from a Training Step
16 | - **/framework/modelmetrics/**: This directory contains a Python script that creates a SageMaker Processing job for generating a model metrics JSON report for a trained model
17 | - **/framework/pipeline/**: This directory contains Python scripts that leverage Python classes defined in other framework directories to create or update a SageMaker Pipelines DAG based on the specified configurations. The model_unit.py script is used by pipeline_service.py to create one or more modeling units. Each modeling unit is a sequence of up to 6 steps for training an ML model: process, train, create model, transform, metrics, and register model. Configurations for each modeling unit should be specified in the model's respective repository.
The pipeline_service.py script also sets dependencies among SageMaker Pipelines steps (i.e., how steps within and across modeling units are sequenced and/or chained) based on the sagemakerPipeline section, which should be defined in the configuration file of one of the model repositories (i.e., the anchor model)
18 | - **/framework/processing/**: This directory contains a Python script that creates a generic SageMaker Processing job
19 | - **/framework/registermodel/**: This directory contains a Python script for registering a trained model along with its calculated metrics in SageMaker Model Registry
20 | - **/framework/training/**: This directory contains a Python script that creates a SageMaker Training job
21 | - **/framework/transform/**: This directory contains a Python script that creates a SageMaker Batch Transform job. In the context of model training, this is used to calculate the performance of a trained model on test data
- **/framework/utilities/**: This directory contains utility scripts for reading and joining configuration files, as well as logging
22 | - **/framework_entrypoint.py**: This file is the entry point of the framework code. It simply calls a function defined in the /framework/pipeline/ directory to create or update a SageMaker Pipelines DAG and execute it
23 | - **/examples/**: This directory contains several examples of how this automation framework can be used to create simple and complex training DAGs
24 | - **/env.env**: This file allows you to set common variables such as subnets, security groups, and the IAM role as environment variables
25 | - **/requirements.txt**: This file specifies the Python libraries that are required for the framework code
26 |
27 | ### Deployment Guide
28 |
29 | Follow the steps below in order to deploy the solution:
30 |
31 | 1. Organize your model training repository, for example according to the following structure:
32 |
33 | ```
34 |
35 | .
36 | ├── <model-name>
37 | | ├── conf
38 | | | └── conf.yaml
39 | | └── scripts
40 | | ├── preprocess.py
41 | | ├── train.py
42 | | ├── transform.py
43 | | └── evaluate.py
44 | └── README.md
45 | ```
46 |
47 | 1. Clone the framework code and your model(s) source code from the Git repositories:
48 |
49 | a. Clone the `dynamic-sagemaker-pipelines-framework` repo into a training directory. Here we assume the training directory is called `aws-train`:
50 |
51 |
52 | git clone https://github.com/aws-samples/dynamic-sagemaker-pipelines-framework.git aws-train
53 |
54 | b. Clone the model(s) source code under the same directory. For multi-model training, repeat this step for as many models as you need to train.
55 |
56 |
57 | git clone https://<your-model-repo>.git aws-train
58 |
59 | For single-model training, your directory should look like:
60 |
61 |
62 | .
63 | ├── framework
64 | └── <model-name>
65 |
66 | For multi-model training, your directory should look like:
67 |
68 |
69 | .
70 | ├── framework
71 | └── <model-name-1>
72 | └── <model-name-2>
73 | └── <model-name-n>
74 |
75 |
76 | 1. Set up the following environment variables. Asterisks indicate environment variables that are required, while the rest are optional.
77 |
78 |
79 | | Environment Variable | Description |
80 | | ---------------------| ------------|
81 | | SMP_ACCOUNTID* | AWS Account where SageMaker Pipeline is executed |
82 | | SMP_REGION* | AWS Region where SageMaker Pipeline is executed |
83 | | SMP_S3BUCKETNAME* | Amazon S3 bucket name |
84 | | SMP_ROLE* | Amazon SageMaker execution role |
85 | | SMP_MODEL_CONFIGPATH* | Relative path of the single-model or multi-model configuration file(s) |
86 | | SMP_SUBNETS | Subnet IDs for SageMaker networking configuration |
87 | | SMP_SECURITYGROUPS | Security group IDs for SageMaker networking configuration |
88 |
89 | Note:
90 |
91 | a. For single-model use cases: SMP_MODEL_CONFIGPATH="<model-name>/conf/conf.yaml"
92 |
93 | b. For multi-model use cases: SMP_MODEL_CONFIGPATH="*/conf/conf.yaml"
94 |
95 | During experimentation (i.e., local testing), you can specify environment variables inside the env.env file and then export them by executing the following command in your terminal:
96 |
97 | ```bash
98 | source env.env
99 | ```
100 | During operationalization, these environment variables will be set by the CI pipeline.
101 |
102 | Note: All environment variable values should be enclosed in quotation marks, as in the following example:
103 |
104 | SMP_SUBNETS="subnet-xxxxxxxx,subnet-xxxxxxxx"
105 | SMP_ROLE="arn:aws:iam::xxxxxxxxxxxx:role/xxxxx"
106 | SMP_SECURITYGROUPS="sg-xxxxxxxx"
107 | SMP_S3BUCKETNAME="your-bucket"
108 | SMP_MODEL_CONFIGPATH="your-path-absolute-or-relative/*/conf/*.yaml"
109 | SMP_ACCOUNTID="xxxxxxxxxxxx"
110 | SMP_REGION="your-aws-region"
111 |
112 | 1. Recommendations for how the security groups, VPCs, IAM roles, buckets, and subnets should be set up.
113 | Please be aware that these recommendations need to be considered at the time the `SageMaker Domain` is created.
114 |
115 | As best practice:
116 | - Create IAM roles with the minimum permissions required for the ML activities you want to support, using SageMaker Role Manager. This tool provides predefined policies that can be customized.
117 |
118 | - Set up VPCs with public and private subnets across multiple Availability Zones for fault tolerance. Configure security groups to allow access to SageMaker endpoints only from the private subnets.
119 |
120 | - Create S3 buckets in the same region as the SageMaker domain for storing model artifacts and data. The buckets should be encrypted using AWS KMS.
121 |
122 | - When configuring networking, choose the "VPC" connection option. Select the VPC and subnets created earlier.
123 |
124 | - Review and customize the IAM roles and policies generated by SageMaker Role Manager to meet your specific access and governance needs.
125 |
126 | Refer to the AWS documentation on SageMaker domain setup for organizations for step-by-step instructions on configuring the above resources through the SageMaker console wizard.
127 |
128 | Sources:
129 | [1][Amazon SageMaker simplifies setting up SageMaker domain for enterprises to onboard their users to SageMaker](https://aws.amazon.com/blogs/machine-learning/amazon-sagemaker-simplifies-setting-up-sagemaker-domain-for-enterprises-to-onboard-their-users-to-sagemaker/)
130 | [2][Choose an Amazon VPC - Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-vpc.html)
131 | [3][Connect to SageMaker Within your VPC - Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/interface-vpc-endpoint.html)
132 |
133 |
134 | 1.
Create and activate a virtual environment: 135 | 136 | ```bash 137 | python -m venv .venv 138 | source .venv/bin/activate 139 | ``` 140 | 141 | 1. Install required python packages: 142 | 143 | ```bash 144 | pip install -r requirements.txt 145 | ``` 146 | 147 | 1. Edit your model training conf.yaml file(s). Please see the Configuration Files Structure section for more details. 148 | 149 | 1. From terminal, call framework’s entry point to create or update and execute the SageMaker Pipeline training DAG: 150 | 151 | ```bash 152 | python framework/framework_entrypoint.py 153 | ``` 154 | 155 | 1. View and debug the SageMaker Pipelines execution in the Pipelines tab of SageMaker Studio UI. 156 | 157 | 158 | 159 | ### Configuration Files Structure 160 | 161 | There are two types of configuration files in the proposed solution: 1) Framework configuration, and 2) Model configuration(s). Below we describe each in details. 162 | 163 | #### Framework Configuration 164 | 165 | The `/framework/conf/conf.yaml` is used to set variables that are common across all modeling units. This includes SMP_S3BUCKETNAME, SMP_ROLE, SMP_MODEL_CONFIGPATH, SMP_SUBNETS, SMP_SECURITYGROUPS, and SMP_MODELNAME. Please see step 3 of the `Deployment Guide section` for descriptions of these variables and how to set them via environment variables. 166 | 167 | #### Model Configuration(s) 168 | 169 | For each model in the project, we need to specify the following in the /conf/conf.yaml file (Asterisks indicate required fields, while the rest are optional): 170 | 171 | - **/conf/models***: Under this section, one or more modeling units can be configured. When the framework code is executed, it will automatically read all configuration files during run-time and append them to the config tree. Theoretically, you can specify all modeling units in the same conf.yaml file, but it is recommended to specify each modeling unit configuration in its respective Git repository to minimize errors. 172 | 173 | - **{model-name}***: The name of the model. 174 | - **source_directory***: A common source_dir path to use for all steps within the modeling unit 175 | - **[processing](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-processing)**: This section specifies preprocessing parameters below. Please see [Amazon SageMaker documentation](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.ProcessingStep) for descriptions of each parameter 176 | 177 | ``` 178 | image_uri*: 179 | entry_point*: 180 | base_job_name: 181 | instance_count: # default value: 1 182 | instance_type: # default value: "ml.m5.2xlarge" 183 | volume_size_in_gb: # default value: 32 184 | max_runtime_seconds: # default value: 3000 185 | tags: # default value: None 186 | env: # default value: None 187 | framework_version: # default value: "0" 188 | s3_data_distribution_type: # default value: "FullyReplicated" 189 | s3_data_type: # default value: "S3Prefix" 190 | s3_input_mode: # default value: "File" 191 | s3_upload_mode: # default value: "EndOfJob" 192 | channels: 193 | train: 194 | dataFiles: 195 | - sourceName: 196 | fileName*: 197 | ``` 198 | Note: 199 | 200 | a. dataFiles are loaded to container at "_/opt/ml/processing/input/{sourceName}/_" path 201 | 202 | b. 
SageMaker offloads the content from "_/opt/ml/processing/input/{channelName}/_" container path to S3 when the processing job is complete 203 | 204 | - **[train*](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-training)**: This section specifies training job parameters below. Please see [Amazon SageMaker documentation](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TrainingStep) for descriptions of each parameter 205 | 206 | ``` 207 | image_uri*: 208 | entry_point*: 209 | base_job_name: 210 | instance_count: # default value: 1 211 | instance_type: # default value: "ml.m5.2xlarge" 212 | volume_size_in_gb: # default value: 32 213 | max_runtime_seconds: # default value: 3000 214 | tags: 215 | env: 216 | hyperparams: 217 | model_data_uri: 218 | channels: 219 | train*: 220 | dataFiles: 221 | - sourceName: 222 | fileName: 223 | test: 224 | dataFiles: 225 | - sourceName: 226 | fileName: 227 | ``` 228 | 229 | Note: 230 | 231 | a. dataFiles are loaded to container at "_/opt/ml/input/data/{channelName}/_" path (also accessible via environment variable "_SM_CHANNEL\_{channelName}_") 232 | 233 | b. SageMaker zips trained model artifacts from "_/opt/ml/model/_" container path and uploads to S3 234 | 235 | - **[transform*](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-transform)**: This section specifies SageMaker Transform job parameters below for making predictions on the test data. Please see [Amazon SageMaker documentation](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TransformStep) for descriptions of each parameter 236 | 237 | ``` 238 | image_uri*: 239 | base_job_name: # default value: "default-transform-job-name" 240 | instance_count: # default value: 1 241 | instance_type: # default value: "ml.m5.2xlarge" 242 | strategy: 243 | assemble_with: 244 | join_source: 245 | split_type: 246 | content_type: # default value: "text/csv" 247 | max_payload: 248 | volume_size: # default value: 50 249 | max_runtime_in_seconds: # default value: 3600 250 | input_filter: 251 | output_filter: 252 | tags: 253 | env: 254 | channels: 255 | test: 256 | s3BucketName: 257 | dataFiles: 258 | - sourceName: 259 | fileName: 260 | ``` 261 | 262 | Note: 263 | 264 | a. Results of the batch transform job are stored in S3 bucket with name s3BucketName. This S3 bucket is also used to stage local input files specified in _fileName_ 265 | 266 | b. Only one channel and one dataFile in that channel are allowed for the transform step 267 | 268 | 269 | - **[evaluate](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#property-file)**: This section specifies SageMaker Processing job parameters for generating a model metrics JSON report for the trained model. 
Please see [Amazon SageMaker documentation](https://sagemaker.readthedocs.io/en/stable/api/inference/model_monitor.html#sagemaker.model_metrics.ModelMetrics) for descriptions of each parameter 270 | 271 | ``` 272 | image_uri*: 273 | entry_point*: 274 | base_job_name: 275 | instance_count: # default value: 1 276 | instance_type: # default value: "ml.m5.2xlarge" 277 | strategy: # default value: "SingleRecord" 278 | max_payload: 279 | volume_size_in_gb: # default value: 50 280 | max_runtime_in_seconds: # default value: 3600 281 | s3_data_distribution_type: # default value: "FullyReplicated" 282 | s3_data_type: # default value: "S3Prefix" 283 | s3_input_mode: # default value: "File" 284 | tags: 285 | env: 286 | channels: 287 | test: 288 | s3BucketName: 289 | dataFiles: 290 | - sourceName: 291 | fileName: 292 | 293 | ``` 294 | 295 | Note: 296 | 297 | a. dataFiles are loaded to container at "_/opt/ml/processing/input/{sourceName}/_" path 298 | 299 | b. Only one channel and one dataFile in that channel is allowed for evaluate step 300 | 301 | c. SageMaker offloads the content from "_/opt/ml/processing/input/{channelName}/_" container path to S3 302 | 303 | - **[registry*](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-register-model)**: This section specifies parameters for registering the trained model in SageMaker Model Registry 304 | - **ModelRepack**: If "True", uses entry_point in the transform step for inference entry_point when serving the model on SageMaker 305 | - **[ModelPackageDescription](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.step_collections.RegisterModel)** 306 | - **InferenceSpecification**: This section includes inference specifications of the model package. Please see [Amazon SageMaker documentation](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.step_collections.RegisterModel) for descriptions of each paramater 307 | 308 | ``` 309 | image_uri*: 310 | supported_content_types*: 311 | - application/json 312 | supported_response_MIME_types*: 313 | - application/json 314 | approval_status*: # PendingManualApproval | Rejected | Approved 315 | ``` 316 | 317 | - **/conf/sagemakerPipeline***: This section is used to define SageMaker Pipelines flow including dependencies among steps. For single-model use cases, this section is defined at the end of the configuration file. For multi-model use cases, the sagemakerPipeline section only needs to be defined in configuration file of one of the models (any of the models). We refer to this model as the anchor model. 318 | 319 | - **pipelineName***: Name of the SageMaker Pipeline 320 | - **models***: Nested list of modeling units 321 | - **{model-name}***: Model identifier which should match a {model-name} identifier in the /conf/models section. 322 | - **steps***: 323 | - **step_name***: Step name to be displayed in the SageMaker Pipelines DAG. 324 | - **step_class***: (Union[Processing, Training, CreateModel, Transform, Metrics, RegisterModel]) 325 | - **step_type***: This parameter is only required for preprocess steps, for which it should be set to preprocess. This is needed to distinguish preprocess and evaluate steps, both of which have a step_class of Processing. 326 | - **enable_cache**: ([Union[True, False]]) - whether to enable Sagemaker Pipelines caching for this step or not. 
327 | - **chain_input_source_step**: ([list[step_name]]) – This can be used to set the channel outputs of another step as input to this step. 328 | - **chain_input_additional_prefix**: This is only allowed for steps of the Transform step_class; and can be used in conjunction with chain_input_source_step parameter to pinpoint the file that should be used as the input to the Transform step. 329 | - **dependencies**: This section is used to specify the sequence in which the SageMaker Pipelines steps should be executed. We have adapted the Apache Airflow notation for this section (i.e., {step_name} >> {step_name}). If this section is left blank, explicit dependencies specified by chain_input_source_step parameter and/or implicit dependencies define the Sagemaker Pipelines DAG flow. 330 | 331 | 332 | ## Security 333 | 334 | See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information. 335 | 336 | ## License 337 | 338 | This library is licensed under the MIT-*License. See the LICENSE file. 339 | -------------------------------------------------------------------------------- /architecture_diagram.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/architecture_diagram.png -------------------------------------------------------------------------------- /env.env: -------------------------------------------------------------------------------- 1 | export SMP_SUBNETS="subnet-xxxxxxxx,subnet-xxxxxxxx" 2 | export SMP_ROLE="arn:aws:iam::xxxxxxxxxxxx:role/xxxxx" 3 | export SMP_SECURITYGROUPS="sg-xxxxxxxx" 4 | export SMP_S3BUCKETNAME="your-bucket" 5 | export SMP_MODEL_CONFIGPATH="your-path-absolute-or-relative//conf/.yaml" 6 | export SMP_ACCOUNTID=”xxxxxxxxxxxx” 7 | export SMP_REGION="your-aws-region" 8 | -------------------------------------------------------------------------------- /examples/lgbm/conf/conf.yaml: -------------------------------------------------------------------------------- 1 | --- 2 | conf: 3 | models: 4 | lgbm: 5 | source_directory: examples/lgbm 6 | train: 7 | instance_type: ml.c5.xlarge 8 | image_uri: SMP_ACCOUNTID.dkr.ecr.SMP_REGION.amazonaws.com/pytorch-training:1.9.0-cpu-py38 9 | entry_point: training/training.py 10 | base_job_name: lightgbm-train 11 | channels: 12 | train: 13 | dataFiles: 14 | - sourceName: online_shoppers_intention_train 15 | fileName: s3://SMP_S3BUCKETNAME/lightGBM/train 16 | test: 17 | dataFiles: 18 | - sourceName: online_shoppers_intention_test 19 | fileName: s3://SMP_S3BUCKETNAME/lightGBM/test 20 | 21 | registry: 22 | ModelRepack: "False" 23 | InferenceSpecification: 24 | image_uri: "SMP_ACCOUNTID.dkr.ecr.SMP_REGION.amazonaws.com/lightgbm-inference:lightgbm-i0.0" 25 | supported_content_types: 26 | - application/json 27 | supported_response_MIME_types: 28 | - application/json 29 | approval_status: PendingManualApproval 30 | 31 | transform: 32 | instance_type: ml.c5.xlarge 33 | image_uri: "refer-transform/docker-to-built-inference-image" 34 | entry_point: transform/transform.py 35 | content_type: application/x-npy 36 | channels: 37 | test: 38 | s3BucketName: SMP_S3BUCKETNAME 39 | dataFiles: 40 | - sourceName: online_shoppers_intention_test 41 | fileName: s3://SMP_S3BUCKETNAME/lightGBM/test/x_test.npy 42 | 43 | evaluate: 44 | instance_type: ml.c5.xlarge 45 | image_uri: 'SMP_ACCOUNTID.dkr.ecr.SMP_REGION.amazonaws.com/pytorch-training:1.9.0-cpu-py38' 46 | entry_point: 
evaluate/evaluate.py 47 | base_job_name: lgbm-evaluate 48 | content_type: application/json 49 | channels: 50 | test: 51 | s3BucketName: SMP_S3BUCKETNAME 52 | dataFiles: 53 | - sourceName: online_shoppers_intention_ytest 54 | fileName: s3://SMP_S3BUCKETNAME/lightGBM/test/y_test.npy 55 | 56 | sagemakerPipeline: 57 | pipelineName: lgbm-test 58 | models: 59 | lgbm: 60 | steps: 61 | - step_name: lgbm-Training 62 | step_class: Training 63 | enable_cache: True 64 | - step_name: lgbm-CreateModel 65 | step_class: CreateModel 66 | - step_name: lgbm-Transform 67 | step_class: Transform 68 | enable_cache: True 69 | - step_name: lgbm-Metrics 70 | step_class: Metrics 71 | chain_input_source_step: 72 | - lgbm-Transform 73 | enable_cache: True 74 | - step_name: lgbm-Register 75 | step_class: RegisterModel 76 | 77 | dependencies: 78 | - lgbm-Training >> lgbm-CreateModel >> lgbm-Transform >> lgbm-Metrics >> lgbm-Register 79 | -------------------------------------------------------------------------------- /examples/lgbm/dag.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/examples/lgbm/dag.png -------------------------------------------------------------------------------- /examples/lgbm/data_source.md: -------------------------------------------------------------------------------- 1 | # LGBM Data Source 2 | 3 | ## Data Information 4 | We use the Online Shoppers Purchasing Intention Dataset. 5 | 6 | More info on the dataset: 7 | 8 | This dataset was obtained from UCI's Machine Learning Library. https://archive.ics.uci.edu/dataset/468/online+shoppers+purchasing+intention+dataset 9 | 10 | 11 | ## Data Download 12 | Download the data locally from [here](https://archive.ics.uci.edu/static/public/468/online+shoppers+purchasing+intention+dataset.zip). The data file is named **online_shoppers_intention.csv** 13 | 14 | Then reference the preprocessing script(written for sagemaker processing jobs) to create train test splits. 15 | 16 | ## Upload to S3 17 | ``` 18 | aws s3 cp /x_train.npy s3:///lightGBM/train 19 | aws s3 cp /y_train.npy s3:///lightGBM/train 20 | 21 | aws s3 cp /x_test.npy s3:///lightGBM/test 22 | aws s3 cp /y_test.npy s3:///lightGBM/test 23 | 24 | ``` -------------------------------------------------------------------------------- /examples/lgbm/evaluate/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/examples/lgbm/evaluate/__init__.py -------------------------------------------------------------------------------- /examples/lgbm/evaluate/evaluate.py: -------------------------------------------------------------------------------- 1 | import glob 2 | import lightgbm as lgb 3 | import numpy as np 4 | from sklearn.metrics import accuracy_score, roc_auc_score 5 | import pathlib, json 6 | 7 | 8 | 9 | if __name__=='__main__': 10 | 11 | print('Loading data . . . 
.') 12 | y_test= np.load(glob.glob('{}/*.npy'.format('/opt/ml/processing/input/online_shoppers_intention_ytest'))[0]) 13 | # y_pred= glob.glob('{}/*.out'.format('/opt/ml/processing/input'))[0] 14 | text_file= open(glob.glob('{}/*.out'.format('/opt/ml/processing/input/lgbm-Transform-test'))[0], "r") 15 | y_pred= np.array([float(i) for i in text_file.read()[1:-1].split(',')]) 16 | 17 | print('\ny_pred shape: \n{}\n'.format(y_pred.shape)) 18 | print('\ny_test shape: \n{}\n'.format(y_test.shape)) 19 | 20 | 21 | print('Evaluating model . . . .\n') 22 | acc = accuracy_score(y_test.astype(int), y_pred.astype(int)) 23 | auc = roc_auc_score(y_test, y_pred) 24 | print('Accuracy: {:.2f}'.format(acc)) 25 | print('AUC Score: {:.2f}'.format(auc)) 26 | 27 | output_dir = "/opt/ml/processing/output" 28 | pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True) 29 | 30 | report_dict = { 31 | "evaluation": { 32 | "metrics": { 33 | "Accuracy": '{:.2f}'.format(acc), "AUC_Score": '{:.2f}'.format(auc) 34 | } 35 | } 36 | } 37 | 38 | evaluation_path = f"{output_dir}/model_evaluation_metrics.json" 39 | with open(evaluation_path, "w") as f: 40 | f.write(json.dumps(report_dict)) 41 | 42 | -------------------------------------------------------------------------------- /examples/lgbm/evaluate/requirements.txt: -------------------------------------------------------------------------------- 1 | lightgbm 2 | numpy 3 | pandas 4 | scikit-learn -------------------------------------------------------------------------------- /examples/lgbm/preprocessing/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/examples/lgbm/preprocessing/__init__.py -------------------------------------------------------------------------------- /examples/lgbm/preprocessing/preprocessing.py: -------------------------------------------------------------------------------- 1 | import glob 2 | import numpy as np 3 | import os 4 | import pandas as pd 5 | from sklearn.model_selection import train_test_split 6 | 7 | 8 | if __name__=='__main__': 9 | 10 | 11 | input_file = glob.glob('{}/*.csv'.format('/opt/ml/processing/input')) 12 | print('\nINPUT FILE: \n{}\n'.format(input_file)) 13 | df = pd.read_csv(input_file[0]) 14 | 15 | # minor preprocessing (drop some uninformative columns etc.) 16 | print('Preprocessing the dataset . . . .') 17 | df_clean = df.drop(['Month','Browser','OperatingSystems','Region','TrafficType','Weekend'], axis=1) 18 | visitor_encoded = pd.get_dummies(df_clean['VisitorType'], prefix='Visitor_Type', drop_first = True) 19 | df_clean_merged = pd.concat([df_clean, visitor_encoded], axis=1).drop(['VisitorType'], axis=1) 20 | X = df_clean_merged.drop('Revenue', axis=1) 21 | y = df_clean_merged['Revenue'] 22 | 23 | # split the preprocessed data with stratified sampling for class imbalance 24 | X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=2, test_size=.2) 25 | 26 | # save to container directory for uploading to S3 27 | print('Saving the preprocessed dataset . . . 
.') 28 | train_data_output_path = os.path.join('/opt/ml/processing/train', 'x_train.npy') 29 | np.save(train_data_output_path, X_train.to_numpy()) 30 | train_labels_output_path = os.path.join('/opt/ml/processing/train', 'y_train.npy') 31 | np.save(train_labels_output_path, y_train.to_numpy()) 32 | test_data_output_path = os.path.join('/opt/ml/processing/test', 'x_test.npy') 33 | np.save(test_data_output_path, X_test.to_numpy()) 34 | test_labels_output_path = os.path.join('/opt/ml/processing/test', 'y_test.npy') 35 | np.save(test_labels_output_path, y_test.to_numpy()) -------------------------------------------------------------------------------- /examples/lgbm/preprocessing/requirements.txt: -------------------------------------------------------------------------------- 1 | lightgbm 2 | numpy 3 | pandas 4 | scikit-learn -------------------------------------------------------------------------------- /examples/lgbm/requirements.txt: -------------------------------------------------------------------------------- 1 | lightgbm 2 | numpy 3 | pandas 4 | scikit-learn -------------------------------------------------------------------------------- /examples/lgbm/training/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/examples/lgbm/training/__init__.py -------------------------------------------------------------------------------- /examples/lgbm/training/requirements.txt: -------------------------------------------------------------------------------- 1 | lightgbm 2 | numpy 3 | pandas 4 | scikit-learn -------------------------------------------------------------------------------- /examples/lgbm/training/training.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import glob 3 | import lightgbm as lgb 4 | import numpy as np 5 | import os 6 | 7 | 8 | if __name__=='__main__': 9 | 10 | # extract training data S3 location and hyperparameter values 11 | parser = argparse.ArgumentParser() 12 | parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN']) 13 | parser.add_argument('--validation', type=str, default=os.environ['SM_CHANNEL_TEST']) 14 | parser.add_argument('--num_leaves', type=int, default=28) 15 | parser.add_argument('--max_depth', type=int, default=5) 16 | parser.add_argument('--learning_rate', type=float, default=0.1) 17 | args = parser.parse_args() 18 | 19 | print('Loading training data from {}\n'.format(args.train)) 20 | input_files = glob.glob('{}/*.npy'.format(args.train)) 21 | print('\nTRAINING INPUT FILE LIST: \n{}\n'.format(input_files)) 22 | for file in input_files: 23 | if 'x_' in file: 24 | x_train = np.load(file) 25 | else: 26 | y_train = np.load(file) 27 | print('\nx_train shape: \n{}\n'.format(x_train.shape)) 28 | print('\ny_train shape: \n{}\n'.format(y_train.shape)) 29 | train_data = lgb.Dataset(x_train, label=y_train) 30 | 31 | print('Loading validation data from {}\n'.format(args.validation)) 32 | eval_input_files = glob.glob('{}/*.npy'.format(args.validation)) 33 | print('\nVALIDATION INPUT FILE LIST: \n{}\n'.format(eval_input_files)) 34 | for file in eval_input_files: 35 | if 'x_' in file: 36 | x_val = np.load(file) 37 | else: 38 | y_val = np.load(file) 39 | print('\nx_val shape: \n{}\n'.format(x_val.shape)) 40 | print('\ny_val shape: \n{}\n'.format(y_val.shape)) 41 | eval_data = lgb.Dataset(x_val, label=y_val) 42 | 43 | 
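    # Train a binary LightGBM classifier: the hyperparameters parsed above are passed
    # through to lgb.train below, and eval_data is supplied as the validation set so
    # binary_logloss is reported on the held-out split during boosting.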
print('Training model with hyperparameters:\n\t num_leaves: {}\n\t max_depth: {}\n\t learning_rate: {}\n' 44 | .format(args.num_leaves, args.max_depth, args.learning_rate)) 45 | parameters = { 46 | 'objective': 'binary', 47 | 'metric': 'binary_logloss', 48 | 'is_unbalance': 'true', 49 | 'boosting': 'gbdt', 50 | 'num_leaves': args.num_leaves, 51 | 'max_depth': args.max_depth, 52 | 'learning_rate': args.learning_rate, 53 | 'verbose': 1 54 | } 55 | num_round = 10 56 | bst = lgb.train(parameters, train_data, num_round, eval_data) 57 | 58 | print('Saving model . . . .') 59 | bst.save_model('/opt/ml/model/online_shoppers_model.txt') -------------------------------------------------------------------------------- /examples/lgbm/transform/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/examples/lgbm/transform/__init__.py -------------------------------------------------------------------------------- /examples/lgbm/transform/docker/Dockerfile: -------------------------------------------------------------------------------- 1 | 2 | FROM ubuntu:18.04 3 | 4 | # Set a docker label to advertise multi-model support on the container 5 | LABEL com.amazonaws.sagemaker.capabilities.multi-models=true 6 | # Set a docker label to enable container to use SAGEMAKER_BIND_TO_PORT environment variable if present 7 | LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true 8 | 9 | # Install necessary dependencies for MMS and SageMaker Inference Toolkit 10 | RUN apt-get update && \ 11 | apt-get -y install --no-install-recommends \ 12 | build-essential \ 13 | ca-certificates \ 14 | openjdk-8-jdk-headless \ 15 | python3-dev \ 16 | curl \ 17 | vim \ 18 | && rm -rf /var/lib/apt/lists/* \ 19 | && curl -O https://bootstrap.pypa.io/pip/3.6/get-pip.py \ 20 | && python3 get-pip.py 21 | 22 | RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1 23 | RUN update-alternatives --install /usr/local/bin/pip pip /usr/local/bin/pip3 1 24 | 25 | RUN pip install lightgbm numpy pandas \ 26 | scikit-learn multi-model-server \ 27 | sagemaker-inference retrying 28 | 29 | # Copy entrypoint script to the image 30 | COPY dockerd-entrypoint.py /usr/local/bin/dockerd-entrypoint.py 31 | RUN chmod +x /usr/local/bin/dockerd-entrypoint.py 32 | 33 | RUN mkdir -p /home/model-server/ 34 | 35 | # Copy the default custom service file to handle incoming data and inference requests 36 | COPY model_script.py /home/model-server/model_script.py 37 | 38 | # Define an entrypoint script for the docker image 39 | ENTRYPOINT ["python", "/usr/local/bin/dockerd-entrypoint.py"] 40 | 41 | # Define command to be passed to the entrypoint 42 | CMD ["serve"] 43 | 44 | # Define healthcheck 45 | HEALTHCHECK CMD curl --fail http://localhost:8080/ping || exit 1 46 | 47 | # Add and set a non-root user. Issue with sagemaker inference with non-root users linked here- https://github.com/aws/sagemaker-inference-toolkit/issues/72. Please comment the lines below until linked issue is resolved. 
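# The two lines below create a non-root user and switch to it; if the toolkit issue
# referenced above affects your deployment, comment them out so the model server runs as root.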
48 | RUN useradd -m nonroot 49 | USER nonroot 50 | -------------------------------------------------------------------------------- /examples/lgbm/transform/docker/dockerd-entrypoint.py: -------------------------------------------------------------------------------- 1 | 2 | import subprocess 3 | import sys 4 | import shlex 5 | import os 6 | from retrying import retry 7 | from subprocess import CalledProcessError 8 | from sagemaker_inference import model_server 9 | 10 | def _retry_if_error(exception): 11 | return isinstance(exception, CalledProcessError or OSError) 12 | 13 | @retry(stop_max_delay=1000 * 50, 14 | retry_on_exception=_retry_if_error) 15 | def _start_mms(): 16 | # by default the number of workers per model is 1, but we can configure it through the 17 | # environment variable below if desired. 18 | # os.environ['SAGEMAKER_MODEL_SERVER_WORKERS'] = '2' 19 | model_server.start_model_server(handler_service='/home/model-server/model_script.py:handle') 20 | 21 | def main(): 22 | if sys.argv[1] == 'serve': 23 | _start_mms() 24 | else: 25 | subprocess.check_call(shlex.split(' '.join(sys.argv[1:]))) 26 | 27 | # prevent docker exit 28 | subprocess.call(['tail', '-f', '/dev/null']) 29 | 30 | main() 31 | -------------------------------------------------------------------------------- /examples/lgbm/transform/docker/model_script.py: -------------------------------------------------------------------------------- 1 | 2 | from collections import namedtuple 3 | import glob 4 | import json 5 | import logging 6 | import os 7 | import re 8 | 9 | import lightgbm as lgb 10 | import numpy as np 11 | from sagemaker_inference import content_types, encoder 12 | 13 | NUM_FEATURES = 12 14 | 15 | class ModelHandler(object): 16 | """ 17 | A lightGBM Model handler implementation. 18 | """ 19 | 20 | def __init__(self): 21 | self.initialized = False 22 | self.model = None 23 | 24 | def initialize(self, context): 25 | """ 26 | Initialize model. This will be called during model loading time 27 | :param context: Initial context contains model server system properties. 28 | :return: None 29 | """ 30 | self.initialized = True 31 | properties = context.system_properties 32 | model_dir = properties.get("model_dir") 33 | self.model = lgb.Booster(model_file=os.path.join(model_dir,'online_shoppers_model.txt')) 34 | 35 | 36 | def preprocess(self, request): 37 | """ 38 | Transform raw input into model input data. 
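        The request body is expected to be the raw bytes of a NumPy .npy payload
        (content type application/x-npy in this example): everything up to and including
        the first newline (the .npy header) is dropped, and the remaining buffer is
        parsed as float64 values and reshaped into NUM_FEATURES columns.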
39 | :param request: list of raw requests 40 | :return: list of preprocessed model input data 41 | """ 42 | payload = request[0]['body'] 43 | payload= payload[payload.find(b'\n')+1:] 44 | data = np.frombuffer(payload, dtype=np.float64) 45 | data = data.reshape((data.size // NUM_FEATURES, NUM_FEATURES)) 46 | return data 47 | 48 | def inference(self, model_input): 49 | """ 50 | Internal inference methods 51 | :param model_input: transformed model input data list 52 | :return: list of inference output in numpy array 53 | """ 54 | prediction = self.model.predict(model_input) 55 | print('prediction: ', prediction) 56 | return prediction 57 | 58 | def postprocess(self, inference_output): 59 | """ 60 | Post processing step - converts predictions to str 61 | :param inference_output: predictions as numpy 62 | :return: list of inference output as string 63 | """ 64 | 65 | return [str(inference_output.tolist())] 66 | 67 | def handle(self, data, context): 68 | """ 69 | Call preprocess, inference and post-process functions 70 | :param data: input data 71 | :param context: mms context 72 | """ 73 | 74 | model_input = self.preprocess(data) 75 | model_out = self.inference(model_input) 76 | return self.postprocess(model_out) 77 | 78 | _service = ModelHandler() 79 | 80 | 81 | def handle(data, context): 82 | if not _service.initialized: 83 | _service.initialize(context) 84 | 85 | if data is None: 86 | return None 87 | 88 | return _service.handle(data, context) 89 | -------------------------------------------------------------------------------- /examples/lgbm/transform/docker/readme.md: -------------------------------------------------------------------------------- 1 | # build docker image and push to your account's ECR. 2 | to build in sagemaker studio notebook, use sagemaker-studio-image-build 3 | 4 | sm-docker build . 5 | 6 | ref: https://aws.amazon.com/blogs/machine-learning/using-the-amazon-sagemaker-studio-image-build-cli-to-build-container-images-from-your-studio-notebooks/ 7 | -------------------------------------------------------------------------------- /examples/lgbm/transform/requirements.txt: -------------------------------------------------------------------------------- 1 | lightgbm 2 | numpy 3 | pandas 4 | scikit-learn 5 | multi-model-server 6 | sagemaker-inference 7 | retrying -------------------------------------------------------------------------------- /examples/lgbm/transform/transform.py: -------------------------------------------------------------------------------- 1 | import subprocess 2 | import sys 3 | import shlex 4 | import os 5 | from retrying import retry 6 | from subprocess import CalledProcessError 7 | from sagemaker_inference import model_server 8 | 9 | def _retry_if_error(exception): 10 | return isinstance(exception, CalledProcessError or OSError) 11 | 12 | @retry(stop_max_delay=1000 * 50, 13 | retry_on_exception=_retry_if_error) 14 | def _start_mms(): 15 | # by default the number of workers per model is 1, but we can configure it through the 16 | # environment variable below if desired. 
17 | # os.environ['SAGEMAKER_MODEL_SERVER_WORKERS'] = '2' 18 | model_server.start_model_server(handler_service='/home/model-server/model_script.py:handle') 19 | 20 | def main(): 21 | if sys.argv[1] == 'serve': 22 | _start_mms() 23 | else: 24 | subprocess.check_call(shlex.split(' '.join(sys.argv[1:]))) 25 | 26 | # prevent docker exit 27 | subprocess.call(['tail', '-f', '/dev/null']) 28 | 29 | main() -------------------------------------------------------------------------------- /examples/llm-text-summarization/conf/conf.yaml: -------------------------------------------------------------------------------- 1 | --- 2 | conf: 3 | models: 4 | falcon40b-finetuneable: 5 | source_directory: examples/llm-text-summarization 6 | 7 | preprocess: 8 | image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.0.0-cpu-py310-ubuntu20.04-sagemaker 9 | entry_point: preprocessing/preprocessing.py 10 | base_job_name: falcon-text-summarization-preprocess 11 | channels: 12 | training: 13 | dataFiles: 14 | - sourceName: raw-prompts-train 15 | fileName: s3://SMP_S3BUCKETNAME/falcon40b-summarization/input/samsum-train.arrow 16 | testing: 17 | dataFiles: 18 | - sourceName: raw-prompts-test 19 | fileName: s3://SMP_S3BUCKETNAME/falcon40b-summarization/input/samsum-test.arrow 20 | 21 | train: 22 | image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.0.0-cpu-py310-ubuntu20.04-sagemaker 23 | entry_point: training/training.py 24 | base_job_name: falcon-text-summarization-tuning 25 | instance_type: ml.g5.12xlarge 26 | volume_size_in_gb: 1024 27 | max_runtime_seconds: 86400 28 | 29 | 30 | transform: 31 | image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-gpu-py39-cu117-ubuntu20.04 32 | entry_point: transform/inference.py 33 | 34 | 35 | registry: 36 | ModelRepack: "False" 37 | InferenceSpecification: 38 | image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.13.1-transformers4.26.0-gpu-py39-cu117-ubuntu20.04 39 | supported_content_types: 40 | - application/json 41 | supported_response_MIME_types: 42 | - application/json 43 | approval_status: PendingManualApproval 44 | 45 | 46 | sagemakerPipeline: 47 | pipelineName: Falcon40b-fine-tune 48 | models: 49 | falcon40b-finetuneable: 50 | steps: 51 | - step_name: falcon-text-summarization-preprocess 52 | step_class: Processing 53 | step_type: preprocess 54 | enable_cache: True 55 | - step_name: falcon-text-summarization-tuning 56 | step_class: Training 57 | enable_cache: False 58 | step_type: train 59 | chain_input_source_step: 60 | - falcon-text-summarization-preprocess 61 | - step_name: falcon-text-summarization-register 62 | step_class: RegisterModel 63 | enable_cache: False 64 | dependencies: 65 | - falcon-text-summarization-preprocess >> falcon-text-summarization-tuning >> falcon-text-summarization-register -------------------------------------------------------------------------------- /examples/llm-text-summarization/dag.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/examples/llm-text-summarization/dag.png -------------------------------------------------------------------------------- /examples/llm-text-summarization/data_source.md: -------------------------------------------------------------------------------- 1 | # LLM Example Data Source 2 | 3 | ## Data Information 4 | We use 
the samsum https://huggingface.co/datasets/samsum dataset hosted at huggingface for this example. 5 | 6 | More info on the dataset: 7 | 8 | https://huggingface.co/datasets/samsum 9 | 10 | 11 | ## Data Download and S3 Upload 12 | To download the data, run the following code block. The data files are named as **samsum-{train/test/validation}.arrow** 13 | install requirements via: pip install aiobotocore, datasets, 14 | ``` 15 | from datasets import load_dataset 16 | from datasets import load_dataset_builder 17 | import aiobotocore.session 18 | 19 | # set up a profile in your aws config file: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html) 20 | profile = 'default' 21 | s3_session = aiobotocore.session.AioSession(profile=profile) 22 | storage_options = {"session": s3_session} 23 | 24 | output_dir = "s3:///" 25 | builder = load_dataset_builder("samsum") 26 | builder.download_and_prepare(output_dir, storage_options=storage_options, file_format="arrow") 27 | ``` 28 | -------------------------------------------------------------------------------- /examples/llm-text-summarization/preprocessing/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/examples/llm-text-summarization/preprocessing/__init__.py -------------------------------------------------------------------------------- /examples/llm-text-summarization/preprocessing/preprocessing.py: -------------------------------------------------------------------------------- 1 | from datasets import load_dataset 2 | from random import randint 3 | from transformers import AutoTokenizer 4 | 5 | # Load dataset staged from hub in s3 6 | preprocessing_input_local= '/opt/ml/processing/input' 7 | # dataset = load_dataset("arrow", 8 | # data_files={ 9 | # 'train': f'{preprocessing_input_local}/raw-prompts-train/samsum-train.arrow', 10 | # 'test': f'{preprocessing_input_local}/raw-prompts-test/samsum-test.arrow' 11 | # } 12 | # ) 13 | 14 | #Alternatively load dataset from hugging face 15 | dataset = load_dataset("samsum") 16 | 17 | print(f"Train dataset size: {len(dataset['train'])}") 18 | print(f"Test dataset size: {len(dataset['test'])}") 19 | 20 | tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b") 21 | 22 | # custom instruct prompt start 23 | prompt_template = f"Summarize the chat dialogue:\n{{dialogue}}\n---\nSummary:\n{{summary}}{{eos_token}}" 24 | 25 | # template dataset to add prompt to each sample 26 | def template_dataset(sample): 27 | sample["text"] = prompt_template.format(dialogue=sample["dialogue"], 28 | summary=sample["summary"], 29 | eos_token=tokenizer.eos_token) 30 | return sample 31 | 32 | 33 | # apply prompt template per sample 34 | train_dataset = dataset["train"].map(template_dataset, remove_columns=list(dataset["train"].features)) 35 | 36 | print(f'Sample summarization example on base model: {train_dataset[randint(0, len(dataset))]["text"]}') 37 | 38 | # apply prompt template per sample 39 | test_dataset = dataset["test"].map(template_dataset, remove_columns=list(dataset["test"].features)) 40 | 41 | # tokenize and chunk dataset 42 | lm_train_dataset = train_dataset.map( 43 | lambda sample: tokenizer(sample["text"]), batched=True, batch_size=24, remove_columns=list(train_dataset.features) 44 | ) 45 | 46 | 47 | lm_test_dataset = test_dataset.map( 48 | lambda sample: tokenizer(sample["text"]), batched=True, 
remove_columns=list(test_dataset.features) 49 | ) 50 | 51 | # Print total number of samples 52 | print(f"Total number of train samples: {len(lm_train_dataset)}") 53 | 54 | lm_train_dataset.save_to_disk(f'/opt/ml/processing/output/training') 55 | lm_test_dataset.save_to_disk(f'/opt/ml/processing/output/testing') -------------------------------------------------------------------------------- /examples/llm-text-summarization/requirements.txt: -------------------------------------------------------------------------------- 1 | torch==2.2.0 2 | transformers 3 | datasets 4 | peft==0.4.0 5 | bitsandbytes==0.40.2 6 | accelerate==0.21.0 7 | py7zr 8 | einops 9 | tensorboardX -------------------------------------------------------------------------------- /examples/llm-text-summarization/training/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/examples/llm-text-summarization/training/__init__.py -------------------------------------------------------------------------------- /examples/llm-text-summarization/training/requirements.txt: -------------------------------------------------------------------------------- 1 | # sagemaker estimators requires training.py and requirement in strict root folder, not 2 | torch==2.2.0 3 | transformers 4 | datasets 5 | peft==0.4.0 6 | bitsandbytes==0.40.2 7 | accelerate==0.21.0 8 | py7zr 9 | einops 10 | tensorboardX -------------------------------------------------------------------------------- /examples/llm-text-summarization/training/training.py: -------------------------------------------------------------------------------- 1 | import os 2 | import torch 3 | import transformers 4 | from datasets import load_from_disk 5 | import transformers 6 | from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig 7 | from peft import prepare_model_for_kbit_training 8 | from peft import LoraConfig, get_peft_model 9 | import shutil 10 | import os 11 | import nvidia 12 | 13 | cuda_install_dir = '/'.join(nvidia.__file__.split('/')[:-1]) + '/cuda_runtime/lib/' 14 | os.environ['LD_LIBRARY_PATH'] = cuda_install_dir 15 | 16 | print('*'*100) 17 | print(torch.__version__) 18 | print('*'*100) 19 | 20 | # log_bucket = f"s3://{os.environ['SMP_S3BUCKETNAME']}/falcon-40b-qlora-finetune" 21 | 22 | model_id = "tiiuae/falcon-7b" 23 | 24 | # model_id = "tiiuae/falcon-40b" 25 | 26 | device_map="auto" 27 | bnb_config = BitsAndBytesConfig( 28 | load_in_4bit=True, 29 | bnb_4bit_use_double_quant=True, 30 | bnb_4bit_quant_type="nf4", 31 | bnb_4bit_compute_dtype=torch.bfloat16 32 | ) 33 | 34 | # alternate config for loading unquantized model on cpu 35 | ''' 36 | device_map = { 37 | "transformer.word_embeddings": 0, 38 | "transformer.word_embeddings_layernorm": 0, 39 | "lm_head": "cpu", 40 | "transformer.h": 0, 41 | "transformer.ln_f": 0, 42 | } 43 | bnb_config = BitsAndBytesConfig(llm_int8_enable_fp32_cpu_offload=True) 44 | ''' 45 | 46 | lm_train_dataset= load_from_disk(dataset_path=f"/opt/ml/input/data/falcon-text-summarization-preprocess-training/") 47 | lm_test_dataset= load_from_disk(dataset_path=f"/opt/ml/input/data/falcon-text-summarization-preprocess-testing/") 48 | 49 | tokenizer= AutoTokenizer.from_pretrained(model_id) 50 | model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, quantization_config=bnb_config, device_map= device_map) #model prepared for LoRA training using 
PEFT 51 | 52 | tokenizer.pad_token = tokenizer.eos_token 53 | 54 | 55 | model.gradient_checkpointing_enable() 56 | model = prepare_model_for_kbit_training(model) 57 | model.config.use_cache = False 58 | 59 | config = LoraConfig( 60 | r=8, 61 | lora_alpha=32, 62 | target_modules=[ 63 | "query_key_value", 64 | "dense", 65 | "dense_h_to_4h", 66 | "dense_4h_to_h", 67 | ], 68 | lora_dropout=0.05, 69 | bias="none", 70 | task_type="CAUSAL_LM" 71 | ) 72 | 73 | model = get_peft_model(model, config) 74 | 75 | trainer = transformers.Trainer( 76 | model=model, 77 | train_dataset= lm_train_dataset, 78 | eval_dataset=lm_test_dataset, 79 | args=transformers.TrainingArguments( 80 | per_device_train_batch_size=8, 81 | per_device_eval_batch_size=8, 82 | # logging_dir=f'{log_bucket}/', # connect tensorboard for visualizing live training logs 83 | logging_steps=2, 84 | num_train_epochs=1, #num_train_epochs=1 for demonstration 85 | learning_rate=2e-4, 86 | bf16=True, 87 | save_strategy = "no", 88 | output_dir="outputs", 89 | report_to="tensorboard" 90 | ), 91 | data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False), 92 | ) 93 | 94 | trainer.train() 95 | 96 | eval_metrics= trainer.evaluate() 97 | 98 | # throw an error if evaluation loss is above threshold. Alternatively output an evaluation.json and add the pass/fail logic as a Sagemaker pipeline step. 99 | if eval_metrics['eval_loss'] > 2: 100 | raise ValueError("Evaluation loss is too high.") 101 | 102 | # create a tarball of the model. For inference logic, untar and load appropriate .bin and .config for the llm from hugging face and serve. 103 | trainer.save_model('/opt/ml/model') 104 | # shutil.copytree('/opt/ml/code', os.path.join(os.environ['SM_MODEL_DIR'], 'code')) -------------------------------------------------------------------------------- /examples/llm-text-summarization/transform/inference.py: -------------------------------------------------------------------------------- 1 | # Inserting an inference.py to register the fine-tuned model. 2 | # This pattern demands the inference logic be available in a batch transform step. 3 | 4 | # Replace this script with your inference logic, and place model dependencies for your model under llm-text-summarization/transform. 5 | # e.x: s3://jumpstart-cache-prod-us-east-1/huggingface-infer/prepack/v1.0.0/infer-prepack-huggingface-llm-falcon-40b-bf16.tar.gz 6 | # refer: https://huggingface.co/docs/sagemaker/inference#user-defined-code-and-modules 7 | 8 | # extract Model ID from: https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html. 9 | # then use the code snippet below. also use to get the appropriate container in 'deploy_image_uri' 10 | 11 | ''' 12 | from sagemaker import image_uris, model_uris 13 | 14 | model_id, model_version, = ( 15 | "huggingface-llm-falcon-40b-bf16", 16 | "*", 17 | ) 18 | inference_instance_type = "ml.p3.2xlarge" 19 | 20 | # Retrieve the inference docker container uri. This is the base HuggingFace container image for the default model above. 21 | deploy_image_uri = image_uris.retrieve( 22 | region=None, 23 | framework=None, # automatically inferred from model_id 24 | image_scope="inference", 25 | model_id=model_id, 26 | model_version=model_version, 27 | instance_type=inference_instance_type, 28 | ) 29 | 30 | # Retrieve the model uri. 
31 | model_uri = model_uris.retrieve( 32 | model_id=model_id, model_version=model_version, model_scope="inference" 33 | ) 34 | ''' 35 | -------------------------------------------------------------------------------- /examples/multi-model-example/MultiModel.md: -------------------------------------------------------------------------------- 1 | # Welcome to the multi-model example for this config-driven SageMaker pipeline framework! 2 | 3 | ## Introduction 4 | This is a multi-model usage example with two models: a PCA model trained for feature dimension reduction, and a TensorFlow MLP trained for California housing price prediction. The TensorFlow model's preprocessing step uses the trained PCA model to reduce the number of feature dimensions in its training data. We also add a dependency so that the TensorFlow model is registered only after the PCA model has been registered. 5 | 6 | ![Multi-Model-Pipeline-DAG](./dag.png) 7 | 8 | ## Data Information 9 | We use the California housing dataset. 10 | 11 | More info on the dataset: 12 | 13 | This dataset was obtained from the StatLib repository. http://lib.stat.cmu.edu/datasets/ 14 | 15 | The target variable is the median house value for California districts. 16 | 17 | This dataset was derived from the 1990 U.S. census, using one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people). 18 | 19 | ### Data Download 20 | To download the data, run the following code block. The data file name is **cal_housing.data**. 21 | ``` 22 | import boto3 23 | 24 | region = "" 25 | s3 = boto3.client("s3") 26 | s3.download_file( 27 | f"sagemaker-example-files-prod-{region}", 28 | "datasets/tabular/california_housing/cal_housing.tgz", 29 | "cal_housing.tgz", 30 | ) 31 | 32 | import tarfile 33 | with tarfile.open("cal_housing.tgz") as tar: 34 | tar.extractall(path="tf/train_data") 35 | ``` 36 | 37 | ### Upload to S3 38 | ``` 39 | aws s3 cp tf/train_data/CaliforniaHousing/cal_housing.data s3:////cal_housing.data 40 | ``` 41 | 42 | 43 | 44 | 45 | 46 | 47 | ## Multi-Model Project Structure 48 | 49 | ``` 50 | /root/ 51 | │ dynamic-model-training-with-amazon-sagemaker-pipelines 52 | │ 53 | └───model X specific codebase 54 | │ │ 55 | │ └─ conf 56 | │ │ │ 57 | │ │ └─multi_model_conf.yaml 58 | │ │ ... 59 | │ └─ model_X_scripts 60 | │ │ │ 61 | │ │ └─preprocess.py 62 | │ │ │ 63 | │ │ └─train.py 64 | │ │ │ 65 | │ │ └─... 66 | │ 67 | └───model Y specific codebase 68 | │ │ 69 | │ └─ conf 70 | │ │ │ 71 | │ │ └─multi_model_conf.yaml 72 | │ │ ... 73 | │ └─ model_Y_scripts 74 | │ │ │ 75 | │ │ └─preprocess.py 76 | │ │ │ 77 | │ │ └─train.py 78 | │ │ │ 79 | │ │ └─... 80 | ``` 81 | ## Multi-Model Runbook 82 | - Step 1: Set Up Anchor Model For Multi-Model Execution 83 | 84 | When this config-driven SageMaker pipeline framework is used to create a multi-model pipeline, each model has its own conf.yaml. The anchor model is defined as the model whose conf.yaml contains the sagemakerPipeline configuration section. In this example, the one and only sagemakerPipeline configuration section is defined in cal_housing_tf's conf.yaml file. Any model can be your anchor model. 85 | 86 | - Step 2: Set Up Environment Variables 87 | 88 | Navigate to the project root directory and set up the environment variables listed in env.env. For this multi-model example in particular, you may need to run the following command in the terminal.
89 | ``` 90 | export SMP_MODEL_CONFIGPATH=examples/multi-model-example/*/conf/conf-multi-model.yaml 91 | ``` 92 | - Step 3: Generate Pipeline Definition & Run Pipeline 93 | 94 | Navigate to the project root directory and run the following command in the terminal. 95 | ``` 96 | python3 framework/framework_entrypoint.py 97 | ``` 98 | 99 |
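Alternatively, the same run can be driven from a Python session. The sketch below mirrors framework/framework_entrypoint.py and the SMP_* variable names referenced in framework/conf/conf.yaml and env.env; every value shown (bucket name, role ARN, subnets, security groups, account ID) is a placeholder assumption you must replace with your own resources.
```
import os
import sys

# Placeholder values -- replace with your own resources (see env.env for the full list).
os.environ["SMP_S3BUCKETNAME"] = "my-sagemaker-bucket"
os.environ["SMP_ROLE"] = "arn:aws:iam::111122223333:role/my-sagemaker-execution-role"
os.environ["SMP_SUBNETS"] = "subnet-0aaa1111,subnet-0bbb2222"   # comma-separated, per your VPC setup
os.environ["SMP_SECURITYGROUPS"] = "sg-0ccc3333"                # comma-separated, per your VPC setup
os.environ["SMP_MODEL_CONFIGPATH"] = "examples/multi-model-example/*/conf/conf-multi-model.yaml"

# Equivalent to `python3 framework/framework_entrypoint.py`; assumes this is run from the
# repository root so the framework directory can be added to the import path.
sys.path.append("framework")
from pipeline.pipeline_service import PipelineService

PipelineService().execute_pipeline()
```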
100 | 101 | 102 | Enjoy! 103 | -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_pca/conf/conf-multi-model.yaml: -------------------------------------------------------------------------------- 1 | --- 2 | conf: 3 | models: 4 | calhousingpca: 5 | source_directory: examples/multi-model-example/cal_housing_pca/modelscripts 6 | preprocess: 7 | image_uri: 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3 8 | entry_point: preprocess.py 9 | channels: 10 | train: 11 | dataFiles: 12 | - sourceName: raw_data 13 | fileName: s3://SMP_S3BUCKETNAME/tf2-california-housing-pipelines/traindata/cal_housing.data 14 | 15 | train: 16 | image_uri: 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3 17 | entry_point: train.py 18 | 19 | registry: 20 | ModelRepack: "False" 21 | InferenceSpecification: 22 | image_uri: 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3 23 | supported_content_types: 24 | - application/json 25 | supported_response_MIME_types: 26 | - application/json 27 | approval_status: PendingManualApproval 28 | -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_pca/data_source.md: -------------------------------------------------------------------------------- 1 | # Tensorflow Example Data Source 2 | 3 | ## Data Information 4 | We use the California housing dataset. 5 | 6 | More info on the dataset: 7 | 8 | This dataset was obtained from the StatLib repository. http://lib.stat.cmu.edu/datasets/ 9 | 10 | The target variable is the median house value for California districts. 11 | 12 | This dataset was derived from the 1990 U.S. census, using one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people). 13 | 14 | ## Data Download 15 | To download the data, run the following code block. 
The data file name is **cal_housing.data** 16 | ``` 17 | import boto3 18 | 19 | region = "" 20 | s3 = boto3.client("s3") 21 | s3.download_file( 22 | f"sagemaker-example-files-prod-{region}", 23 | "datasets/tabular/california_housing/cal_housing.tgz", 24 | "cal_housing.tgz", 25 | ) 26 | 27 | import tarfile 28 | with tarfile.open("cal_housing.tgz") as tar: 29 | tar.extractall(path="tf/train_data") 30 | ``` 31 | 32 | ## Upload to S3 33 | ``` 34 | aws s3 cp tf/train_data/CaliforniaHousing/cal_housing.data s3:////cal_housing.data 35 | ``` -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_pca/modelscripts/inference.py: -------------------------------------------------------------------------------- 1 | print("This is a placeholder inference.py for cal housing PCA model.") 2 | -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_pca/modelscripts/preprocess.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import pandas as pd 4 | 5 | BASE_DIR = "/opt/ml/processing" 6 | CODE_DIR = os.path.join(BASE_DIR, "code") 7 | INPUT_DIR = os.path.join(BASE_DIR, "input") 8 | OUTPUT_DIR = os.path.join(BASE_DIR, "output") 9 | 10 | print(os.listdir(INPUT_DIR)) 11 | 12 | if __name__ == "__main__": 13 | columns = [ 14 | "longitude", 15 | "latitude", 16 | "housingMedianAge", 17 | "totalRooms", 18 | "totalBedrooms", 19 | "population", 20 | "households", 21 | "medianIncome", 22 | "medianHouseValue", 23 | ] 24 | cal_housing_df = pd.read_csv( 25 | os.path.join(INPUT_DIR, "raw_data/cal_housing.data"), 26 | names=columns, 27 | header=None 28 | ) 29 | X = cal_housing_df[ 30 | [ 31 | "longitude", 32 | "latitude", 33 | "housingMedianAge", 34 | "totalRooms", 35 | "totalBedrooms", 36 | "population", 37 | "households", 38 | "medianIncome", 39 | ] 40 | ] 41 | Y = cal_housing_df[["medianHouseValue"]] / 100000 42 | 43 | X.to_csv(os.path.join(OUTPUT_DIR, "train/X.csv"), index=False, header=True) 44 | -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_pca/modelscripts/requirements.txt: -------------------------------------------------------------------------------- 1 | pandas -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_pca/modelscripts/train.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import pandas as pd 4 | from joblib import dump 5 | from sklearn.decomposition import PCA 6 | 7 | if __name__ == "__main__": 8 | # data directories 9 | channel_path = "/opt/ml/input/data/calhousing-pca-Preprocessing-train" 10 | print(f'Training data location: {os.listdir(channel_path)}') 11 | train_data_path = os.path.join(channel_path, "X.csv") 12 | X = pd.read_csv(train_data_path) 13 | print(X.head(5)) 14 | pca = PCA(n_components=6) 15 | pca.fit(X) 16 | print(pca.explained_variance_ratio_) 17 | print(pca.singular_values_) 18 | 19 | # save model 20 | dump(pca, os.path.join(os.environ.get("SM_MODEL_DIR"), "pca_model.joblib")) 21 | print(os.listdir("/opt/ml/model/")) 22 | -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_tf/conf/conf-multi-model.yaml: -------------------------------------------------------------------------------- 1 | --- 2 | conf: 3 | models: 4 | 
calhousingtf: 5 | source_directory: examples/multi-model-example/cal_housing_tf/modelscripts 6 | preprocess: 7 | image_uri: 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3 8 | entry_point: preprocess.py 9 | channels: 10 | train: 11 | dataFiles: 12 | - sourceName: raw_data 13 | fileName: s3://SMP_S3BUCKETNAME/tf2-california-housing-pipelines/traindata/cal_housing.data 14 | 15 | train: 16 | image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training:2.11.0-cpu-py39 17 | entry_point: train.py 18 | 19 | registry: 20 | ModelRepack: "False" 21 | InferenceSpecification: 22 | image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.11.0-cpu 23 | supported_content_types: 24 | - application/json 25 | supported_response_MIME_types: 26 | - application/json 27 | approval_status: PendingManualApproval 28 | 29 | transform: 30 | image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.11.0-cpu 31 | entry_point: inference.py 32 | channels: 33 | train: 34 | 35 | evaluate: 36 | image_uri: 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3 37 | entry_point: evaluate.py 38 | channels: train 39 | content_type: application/json 40 | 41 | 42 | 43 | sagemakerPipeline: 44 | pipelineName: calhousing-test 45 | models: 46 | calhousingpca: 47 | steps: 48 | - step_name: calhousing-pca-Preprocessing 49 | step_class: Processing 50 | step_type: preprocess 51 | enable_cache: True 52 | - step_name: calhousing-pca-Training 53 | step_class: Training 54 | enable_cache: True 55 | chain_input_source_step: 56 | - calhousing-pca-Preprocessing 57 | - step_name: calhousing-pca-Register 58 | step_class: RegisterModel 59 | calhousingtf: 60 | steps: 61 | - step_name: calhousing-tf-Preprocessing 62 | step_type: preprocess 63 | step_class: Processing 64 | chain_input_source_step: 65 | - calhousing-pca-Training 66 | enable_cache: True 67 | - step_name: calhousing-tf-Training 68 | step_class: Training 69 | enable_cache: True 70 | chain_input_source_step: 71 | - calhousing-tf-Preprocessing 72 | - step_name: calhousing-tf-CreateModel 73 | step_class: CreateModel 74 | - step_name: calhousing-tf-Transform 75 | step_class: Transform 76 | chain_input_source_step: 77 | - calhousing-tf-Preprocessing 78 | chain_input_additional_prefix: test/x_test.csv 79 | - step_name: calhousing-tf-Metrics 80 | step_class: Metrics 81 | chain_input_source_step: 82 | - calhousing-tf-Preprocessing 83 | - calhousing-tf-Transform 84 | - step_name: calhousing-tf-Register 85 | step_class: RegisterModel 86 | 87 | dependencies: 88 | - calhousing-tf-Preprocessing >> calhousing-tf-Training >> calhousing-tf-CreateModel >> calhousing-tf-Transform >> calhousing-tf-Metrics >> calhousing-tf-Register 89 | - calhousing-pca-Preprocessing >> calhousing-pca-Training >> calhousing-pca-Register 90 | # example: add-on customized dependency 91 | - calhousing-pca-Register >> calhousing-tf-Register -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_tf/data_source.md: -------------------------------------------------------------------------------- 1 | # Tensorflow Example Data Source 2 | 3 | ## Data Information 4 | We use the California housing dataset. 5 | 6 | More info on the dataset: 7 | 8 | This dataset was obtained from the StatLib repository. http://lib.stat.cmu.edu/datasets/ 9 | 10 | The target variable is the median house value for California districts. 
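For reference, the raw file ships without a header row; the preprocessing scripts in this example read it with explicit column names and scale the target by 100,000. A minimal sketch, assuming the file has been extracted locally as in the download snippet below:
```
import pandas as pd

# Column order used by the example preprocess.py scripts; cal_housing.data has no header row.
columns = [
    "longitude", "latitude", "housingMedianAge", "totalRooms", "totalBedrooms",
    "population", "households", "medianIncome", "medianHouseValue",
]
df = pd.read_csv("tf/train_data/CaliforniaHousing/cal_housing.data", names=columns, header=None)

X = df.drop(columns=["medianHouseValue"])          # eight feature columns
y = df[["medianHouseValue"]] / 100000              # target scaled as in preprocess.py
print(X.shape, y.shape)
```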
11 | 12 | This dataset was derived from the 1990 U.S. census, using one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people). 13 | 14 | ## Data Download 15 | To download the data, run the following code block. The data file name is **cal_housing.data** 16 | ``` 17 | import boto3 18 | 19 | region = "" 20 | s3 = boto3.client("s3") 21 | s3.download_file( 22 | f"sagemaker-example-files-prod-{region}", 23 | "datasets/tabular/california_housing/cal_housing.tgz", 24 | "cal_housing.tgz", 25 | ) 26 | 27 | import tarfile 28 | with tarfile.open("cal_housing.tgz") as tar: 29 | tar.extractall(path="tf/train_data") 30 | ``` 31 | 32 | ## Upload to S3 33 | ``` 34 | aws s3 cp tf/train_data/CaliforniaHousing/cal_housing.data s3:////cal_housing.data 35 | ``` -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_tf/modelscripts/evaluate.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import pathlib 4 | 5 | import numpy as np 6 | from sklearn.metrics import mean_squared_error 7 | 8 | if __name__ == "__main__": 9 | pred_path = "/opt/ml/processing/input/calhousing-tf-Transform-train/" 10 | print(os.listdir(pred_path)) 11 | with open(os.path.join(pred_path, "x_test.csv.out")) as f: 12 | file_string = f.read() 13 | y_test_pred = json.loads(file_string)["predictions"] 14 | 15 | test_path = "/opt/ml/processing/input/calhousing-tf-Preprocessing-train/" 16 | print(os.listdir(test_path)) 17 | y_test_true = np.loadtxt(os.path.join(test_path, "test/y_test.csv")) 18 | scores = mean_squared_error(y_test_true, y_test_pred) 19 | print("\nTest MSE :", scores) 20 | 21 | # Available metrics to add to model: https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-metrics.html 22 | report_dict = { 23 | "regression_metrics": { 24 | "mse": {"value": scores, "standard_deviation": "NaN"}, 25 | }, 26 | } 27 | 28 | output_dir = "/opt/ml/processing/output" 29 | pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True) 30 | 31 | evaluation_path = f"{output_dir}/model_evaluation_metrics.json" 32 | with open(evaluation_path, "w") as f: 33 | f.write(json.dumps(report_dict)) 34 | -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_tf/modelscripts/inference.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | 4 | import numpy as np 5 | 6 | print(f"My location: {os.listdir()}") 7 | print(f"Dir of /opt/ml: {os.listdir('/opt/ml')}") 8 | print(f"Dir of /opt/ml/model: {os.listdir('/opt/ml/model')}") 9 | 10 | print(f"{'-' * 40} Start printing Env Var {'-' * 40}") 11 | for name, value in os.environ.items(): 12 | print("{0}: {1}".format(name, value)) 13 | print(f"{'-' * 40} Finish printing Env Var {'-' * 40}") 14 | 15 | model_dir = "/opt/ml/model" 16 | print("numpy version", np.__version__) 17 | 18 | 19 | def read_csv(csv): 20 | return np.array([[float(j) for j in i.split(",")] for i in csv.splitlines()]) 21 | 22 | 23 | def input_handler(data, context): 24 | """ Pre-process request input before it is sent to TensorFlow Serving REST API 25 | Args: 26 | data (obj): the request data, in format of dict or string 27 | context (Context): an object containing request and configuration details 28 | Returns: 29 | (dict): a 
JSON-serializable dict that contains request body and headers 30 | """ 31 | print(f"InputHandler, request content type is {context.request_content_type}") 32 | print(f"InputHandler, model name is {context.model_name}") 33 | print(f"InputHandler, method is {context.method}") 34 | print(f"InputHandler, rest_uri is {context.rest_uri}") 35 | print(f"InputHandler, custom_attributes is {context.custom_attributes}") 36 | print(f"InputHandler, accept_header is {context.accept_header}") 37 | print(f"InputHandler, content_length is {context.content_length}") 38 | 39 | if context.request_content_type == 'application/json': 40 | # pass through json (assumes it's correctly formed) 41 | d = data.read().decode('utf-8') 42 | return d if len(d) else '' 43 | if context.request_content_type == 'text/csv': 44 | payload = data.read().decode('utf-8') 45 | inputs = read_csv(payload) 46 | print(inputs[:10]) 47 | input_data = {'instances': inputs.tolist()} 48 | return json.dumps(input_data) 49 | 50 | 51 | def output_handler(data, context): 52 | """Post-process TensorFlow Serving output before it is returned to the client. 53 | Args: 54 | data (obj): the TensorFlow serving response 55 | context (Context): an object containing request and configuration details 56 | Returns: 57 | (bytes, string): data to return to client, response content type 58 | """ 59 | print(f"OutputHandler, hello world!") 60 | status_code = data.status_code 61 | content = data.content 62 | 63 | if status_code != 200: 64 | raise ValueError(content.decode('utf-8')) 65 | 66 | response_content_type = context.accept_header 67 | prediction = data.content 68 | 69 | print(f"Prediction type is {type(prediction)}, {prediction}") 70 | print(f"Prediction is {prediction}") 71 | 72 | return prediction, response_content_type 73 | -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_tf/modelscripts/preprocess.py: -------------------------------------------------------------------------------- 1 | import glob 2 | import os 3 | import tarfile 4 | 5 | import numpy as np 6 | import pandas as pd 7 | from joblib import load 8 | from sklearn.model_selection import train_test_split 9 | from sklearn.preprocessing import StandardScaler 10 | 11 | BASE_DIR = "/opt/ml/processing" 12 | CODE_DIR = os.path.join(BASE_DIR, "code") 13 | INPUT_DIR = os.path.join(BASE_DIR, "input") 14 | OUTPUT_DIR = os.path.join(BASE_DIR, "output") 15 | 16 | print(os.listdir(INPUT_DIR)) 17 | 18 | if __name__ == "__main__": 19 | columns = [ 20 | "longitude", 21 | "latitude", 22 | "housingMedianAge", 23 | "totalRooms", 24 | "totalBedrooms", 25 | "population", 26 | "households", 27 | "medianIncome", 28 | "medianHouseValue", 29 | ] 30 | cal_housing_df = pd.read_csv( 31 | os.path.join(INPUT_DIR, "raw_data/cal_housing.data"), 32 | names=columns, 33 | header=None 34 | ) 35 | X = cal_housing_df[ 36 | [ 37 | "longitude", 38 | "latitude", 39 | "housingMedianAge", 40 | "totalRooms", 41 | "totalBedrooms", 42 | "population", 43 | "households", 44 | "medianIncome", 45 | ] 46 | ] 47 | Y = cal_housing_df[["medianHouseValue"]] / 100000 48 | 49 | x_train_, x_test_, y_train, y_test = train_test_split(X, Y, test_size=0.33) 50 | pca_model_tarfile_location = os.path.join(INPUT_DIR, "calhousing-pca-Training-input-train/model.tar.gz") 51 | print(os.listdir(os.path.join(INPUT_DIR, "calhousing-pca-Training-input-train"))) 52 | # with tarfile.open(pca_model_tarfile_location) as tar: 53 | # tar.extractall() 54 | tf = tarfile.open(mode='r', fileobj=None) 55 
| tf.extractall(pca_model_tarfile_location, members=None) 56 | 57 | print(os.listdir()) 58 | pca = load("pca_model.joblib") 59 | x_train = pca.transform(x_train_) 60 | x_test = pca.transform(x_test_) 61 | 62 | split_data_dir = os.path.join(BASE_DIR, "split_data") 63 | if not os.path.exists(split_data_dir): 64 | os.mkdir(split_data_dir) 65 | np.save(os.path.join(split_data_dir, "x_train.npy"), x_train) 66 | np.save(os.path.join(split_data_dir, "x_test.npy"), x_test) 67 | np.save(os.path.join(split_data_dir, "y_train.npy"), y_train) 68 | np.save(os.path.join(split_data_dir, "y_test.npy"), y_test) 69 | 70 | input_files = glob.glob("{}/*.npy".format(split_data_dir)) 71 | print("\nINPUT FILE LIST: \n{}\n".format(input_files)) 72 | scaler = StandardScaler() 73 | x_train = np.load(os.path.join(split_data_dir, "x_train.npy")) 74 | scaler.fit(x_train) 75 | 76 | train_data_output_dir = os.path.join(OUTPUT_DIR, "train/train") 77 | if not os.path.exists(train_data_output_dir): 78 | os.mkdir(train_data_output_dir) 79 | test_data_output_dir = os.path.join(OUTPUT_DIR, "train/test") 80 | if not os.path.exists(test_data_output_dir): 81 | os.mkdir(test_data_output_dir) 82 | for file in input_files: 83 | raw = np.load(file) 84 | # only transform feature columns 85 | if "y_" not in file: 86 | transformed = scaler.transform(raw) 87 | if "train" in file: 88 | if "y_" in file: 89 | output_path = os.path.join(train_data_output_dir, "y_train.npy") 90 | np.save(output_path, raw) 91 | print("SAVED LABEL TRAINING DATA FILE\n") 92 | else: 93 | output_path = os.path.join(train_data_output_dir, "x_train.npy") 94 | np.save(output_path, transformed) 95 | print("SAVED TRANSFORMED TRAINING DATA FILE\n") 96 | else: 97 | if "y_" in file: 98 | output_path = os.path.join(test_data_output_dir, "y_test.npy") 99 | np.save(output_path, raw) 100 | output_path = os.path.join(test_data_output_dir, "y_test.csv") 101 | np.savetxt(output_path, raw, delimiter=",") 102 | print("SAVED LABEL TEST DATA FILE\n") 103 | else: 104 | output_path = os.path.join(test_data_output_dir, "x_test.npy") 105 | np.save(output_path, transformed) 106 | output_path = os.path.join(test_data_output_dir, "x_test.csv") 107 | np.savetxt(output_path, transformed, delimiter=",") 108 | print("SAVED TRANSFORMED TEST DATA FILE\n") 109 | -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_tf/modelscripts/requirements.txt: -------------------------------------------------------------------------------- 1 | pandas -------------------------------------------------------------------------------- /examples/multi-model-example/cal_housing_tf/modelscripts/train.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | 4 | import numpy as np 5 | import tensorflow as tf 6 | 7 | 8 | def parse_args(): 9 | parser = argparse.ArgumentParser() 10 | 11 | # hyperparameters sent by the client are passed as command-line arguments to the script 12 | parser.add_argument('--epochs', type=int, default=1) 13 | parser.add_argument('--batch_size', type=int, default=64) 14 | parser.add_argument('--learning_rate', type=float, default=0.1) 15 | 16 | # data directories 17 | channel_path = "/opt/ml/input/data/calhousing-tf-Preprocessing-train" 18 | parser.add_argument( 19 | '--train', 20 | type=str, 21 | default=os.path.join(channel_path, "train") 22 | ) 23 | parser.add_argument( 24 | '--test', 25 | type=str, 26 | default=os.path.join(channel_path, "test") 27 | ) 28 | 
29 | # model directory 30 | parser.add_argument('--sm-model-dir', type=str, default=os.environ.get('SM_MODEL_DIR')) 31 | 32 | return parser.parse_known_args() 33 | 34 | 35 | def get_train_data(train_dir): 36 | x_train = np.load(os.path.join(train_dir, 'x_train.npy')) 37 | y_train = np.load(os.path.join(train_dir, 'y_train.npy')) 38 | print('x train', x_train.shape, 'y train', y_train.shape) 39 | 40 | return x_train, y_train 41 | 42 | 43 | def get_test_data(test_dir): 44 | x_test = np.load(os.path.join(test_dir, 'x_test.npy')) 45 | y_test = np.load(os.path.join(test_dir, 'y_test.npy')) 46 | print('x test', x_test.shape, 'y test', y_test.shape) 47 | 48 | return x_test, y_test 49 | 50 | 51 | def get_model(): 52 | inputs = tf.keras.Input(shape=(6,)) 53 | hidden_1 = tf.keras.layers.Dense(8, activation='tanh')(inputs) 54 | hidden_2 = tf.keras.layers.Dense(4, activation='sigmoid')(hidden_1) 55 | outputs = tf.keras.layers.Dense(1)(hidden_2) 56 | return tf.keras.Model(inputs=inputs, outputs=outputs) 57 | 58 | 59 | if __name__ == "__main__": 60 | args, _ = parse_args() 61 | 62 | print('Training data location: {}'.format(args.train)) 63 | print('Test data location: {}'.format(args.test)) 64 | x_train, y_train = get_train_data(args.train) 65 | x_test, y_test = get_test_data(args.test) 66 | 67 | batch_size = args.batch_size 68 | epochs = args.epochs 69 | learning_rate = args.learning_rate 70 | print('batch_size = {}, epochs = {}, learning rate = {}'.format(batch_size, epochs, learning_rate)) 71 | 72 | model = get_model() 73 | optimizer = tf.keras.optimizers.SGD(learning_rate) 74 | model.compile(optimizer=optimizer, loss='mse') 75 | model.fit(x_train, 76 | y_train, 77 | batch_size=batch_size, 78 | epochs=epochs, 79 | validation_data=(x_test, y_test)) 80 | 81 | # evaluate on test set 82 | scores = model.evaluate(x_test, y_test, batch_size, verbose=2) 83 | print("\nTest MSE :", scores) 84 | 85 | # save model 86 | model.save(args.sm_model_dir + '/1') 87 | -------------------------------------------------------------------------------- /examples/multi-model-example/dag.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/examples/multi-model-example/dag.png -------------------------------------------------------------------------------- /examples/tf/conf/conf.yaml: -------------------------------------------------------------------------------- 1 | --- 2 | conf: 3 | models: 4 | calhousing: 5 | source_directory: examples/tf/modelscripts 6 | preprocess: 7 | image_uri: 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3 8 | entry_point: preprocess.py 9 | channels: 10 | train: 11 | dataFiles: 12 | - sourceName: raw_data 13 | fileName: s3://SMP_S3BUCKETNAME/tf2-california-housing-pipelines/traindata/cal_housing.data 14 | train: 15 | image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-training:2.11.0-cpu-py39 16 | entry_point: train.py 17 | 18 | registry: 19 | ModelRepack: "False" 20 | InferenceSpecification: 21 | image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.11.0-cpu 22 | supported_content_types: 23 | - application/json 24 | supported_response_MIME_types: 25 | - application/json 26 | approval_status: PendingManualApproval 27 | 28 | transform: 29 | image_uri: 763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.11.0-cpu 30 | entry_point: inference.py 31 | channels: 
32 | train: 33 | 34 | evaluate: 35 | image_uri: 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3 36 | entry_point: evaluate.py 37 | channels: train 38 | content_type: application/json 39 | 40 | 41 | sagemakerPipeline: 42 | pipelineName: calhousing-test 43 | models: 44 | calhousing: 45 | steps: 46 | - step_name: calhousing-Preprocessing 47 | step_type: preprocess 48 | step_class: Processing 49 | enable_cache: True 50 | - step_name: calhousing-Training 51 | step_class: Training 52 | enable_cache: True 53 | chain_input_source_step: 54 | - calhousing-Preprocessing 55 | - step_name: calhousing-CreateModel 56 | step_class: CreateModel 57 | - step_name: calhousing-Transform 58 | step_class: Transform 59 | chain_input_source_step: 60 | - calhousing-Preprocessing 61 | chain_input_additional_prefix: test/x_test.csv 62 | - step_name: calhousing-Metrics 63 | step_class: Metrics 64 | chain_input_source_step: 65 | - calhousing-Preprocessing 66 | - calhousing-Transform 67 | - step_name: calhousing-Register 68 | step_class: RegisterModel 69 | 70 | dependencies: 71 | - calhousing-Preprocessing >> calhousing-Training >> calhousing-CreateModel >> calhousing-Transform >> calhousing-Metrics >> calhousing-Register 72 | -------------------------------------------------------------------------------- /examples/tf/data_source.md: -------------------------------------------------------------------------------- 1 | # Tensorflow Example Data Source 2 | 3 | ## Data Information 4 | We use the California housing dataset. 5 | 6 | More info on the dataset: 7 | 8 | This dataset was obtained from the StatLib repository. http://lib.stat.cmu.edu/datasets/ 9 | 10 | The target variable is the median house value for California districts. 11 | 12 | This dataset was derived from the 1990 U.S. census, using one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people). 13 | 14 | ## Data Download 15 | To download the data, run the following code block. 
The data file name is **cal_housing.data** 16 | ``` 17 | import boto3 18 | 19 | region = "" 20 | s3 = boto3.client("s3") 21 | s3.download_file( 22 | f"sagemaker-example-files-prod-{region}", 23 | "datasets/tabular/california_housing/cal_housing.tgz", 24 | "cal_housing.tgz", 25 | ) 26 | 27 | import tarfile 28 | with tarfile.open("cal_housing.tgz") as tar: 29 | tar.extractall(path="tf/train_data") 30 | ``` 31 | 32 | ## Upload to S3 33 | ``` 34 | aws s3 cp tf/train_data/CaliforniaHousing/cal_housing.data s3:////cal_housing.data 35 | ``` -------------------------------------------------------------------------------- /examples/tf/modelscripts/evaluate.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import pathlib 4 | 5 | import numpy as np 6 | from sklearn.metrics import mean_squared_error 7 | 8 | if __name__ == "__main__": 9 | pred_path = "/opt/ml/processing/input/calhousing-Transform-train/" 10 | print(os.listdir(pred_path)) 11 | with open(os.path.join(pred_path, "x_test.csv.out")) as f: 12 | file_string = f.read() 13 | y_test_pred = json.loads(file_string)["predictions"] 14 | 15 | test_path = "/opt/ml/processing/input/calhousing-Preprocessing-train/" 16 | print(os.listdir(test_path)) 17 | y_test_true = np.loadtxt(os.path.join(test_path, "test/y_test.csv")) 18 | scores = mean_squared_error(y_test_true, y_test_pred) 19 | print("\nTest MSE :", scores) 20 | 21 | # Available metrics to add to model: https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-metrics.html 22 | report_dict = { 23 | "regression_metrics": { 24 | "mse": {"value": scores, "standard_deviation": "NaN"}, 25 | }, 26 | } 27 | 28 | output_dir = "/opt/ml/processing/output" 29 | pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True) 30 | 31 | evaluation_path = f"{output_dir}/model_evaluation_metrics.json" 32 | with open(evaluation_path, "w") as f: 33 | f.write(json.dumps(report_dict)) 34 | -------------------------------------------------------------------------------- /examples/tf/modelscripts/inference.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | 4 | import numpy as np 5 | 6 | print(f"My location: {os.listdir()}") 7 | print(f"Dir of /opt/ml: {os.listdir('/opt/ml')}") 8 | print(f"Dir of /opt/ml/model: {os.listdir('/opt/ml/model')}") 9 | 10 | print(f"{'-' * 40} Start printing Env Var {'-' * 40}") 11 | for name, value in os.environ.items(): 12 | print("{0}: {1}".format(name, value)) 13 | print(f"{'-' * 40} Finish printing Env Var {'-' * 40}") 14 | 15 | model_dir = "/opt/ml/model" 16 | print("numpy version", np.__version__) 17 | 18 | 19 | def read_csv(csv): 20 | return np.array([[float(j) for j in i.split(",")] for i in csv.splitlines()]) 21 | 22 | 23 | def input_handler(data, context): 24 | """ Pre-process request input before it is sent to TensorFlow Serving REST API 25 | Args: 26 | data (obj): the request data, in format of dict or string 27 | context (Context): an object containing request and configuration details 28 | Returns: 29 | (dict): a JSON-serializable dict that contains request body and headers 30 | """ 31 | print(f"InputHandler, request content type is {context.request_content_type}") 32 | print(f"InputHandler, model name is {context.model_name}") 33 | print(f"InputHandler, method is {context.method}") 34 | print(f"InputHandler, rest_uri is {context.rest_uri}") 35 | print(f"InputHandler, custom_attributes is {context.custom_attributes}") 36 | 
print(f"InputHandler, accept_header is {context.accept_header}") 37 | print(f"InputHandler, content_length is {context.content_length}") 38 | 39 | if context.request_content_type == 'application/json': 40 | # pass through json (assumes it's correctly formed) 41 | d = data.read().decode('utf-8') 42 | return d if len(d) else '' 43 | if context.request_content_type == 'text/csv': 44 | payload = data.read().decode('utf-8') 45 | inputs = read_csv(payload) 46 | print(inputs[:10]) 47 | input_data = {'instances': inputs.tolist()} 48 | return json.dumps(input_data) 49 | 50 | 51 | def output_handler(data, context): 52 | """Post-process TensorFlow Serving output before it is returned to the client. 53 | Args: 54 | data (obj): the TensorFlow serving response 55 | context (Context): an object containing request and configuration details 56 | Returns: 57 | (bytes, string): data to return to client, response content type 58 | """ 59 | print(f"OutputHandler, hello world!") 60 | status_code = data.status_code 61 | content = data.content 62 | 63 | if status_code != 200: 64 | raise ValueError(content.decode('utf-8')) 65 | 66 | response_content_type = context.accept_header 67 | prediction = data.content 68 | 69 | print(f"Prediction type is {type(prediction)}, {prediction}") 70 | print(f"Prediction is {prediction}") 71 | 72 | return prediction, response_content_type 73 | -------------------------------------------------------------------------------- /examples/tf/modelscripts/preprocess.py: -------------------------------------------------------------------------------- 1 | import glob 2 | import os 3 | 4 | import numpy as np 5 | import pandas as pd 6 | from sklearn.model_selection import train_test_split 7 | from sklearn.preprocessing import StandardScaler 8 | 9 | BASE_DIR = "/opt/ml/processing" 10 | CODE_DIR = os.path.join(BASE_DIR, "code") 11 | INPUT_DIR = os.path.join(BASE_DIR, "input") 12 | OUTPUT_DIR = os.path.join(BASE_DIR, "output") 13 | 14 | print(os.listdir(INPUT_DIR)) 15 | 16 | if __name__ == "__main__": 17 | columns = [ 18 | "longitude", 19 | "latitude", 20 | "housingMedianAge", 21 | "totalRooms", 22 | "totalBedrooms", 23 | "population", 24 | "households", 25 | "medianIncome", 26 | "medianHouseValue", 27 | ] 28 | cal_housing_df = pd.read_csv( 29 | os.path.join(INPUT_DIR, "raw_data/cal_housing.data"), 30 | names=columns, 31 | header=None 32 | ) 33 | X = cal_housing_df[ 34 | [ 35 | "longitude", 36 | "latitude", 37 | "housingMedianAge", 38 | "totalRooms", 39 | "totalBedrooms", 40 | "population", 41 | "households", 42 | "medianIncome", 43 | ] 44 | ] 45 | Y = cal_housing_df[["medianHouseValue"]] / 100000 46 | 47 | x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.33) 48 | split_data_dir = os.path.join(BASE_DIR, "split_data") 49 | if not os.path.exists(split_data_dir): 50 | os.mkdir(split_data_dir) 51 | np.save(os.path.join(split_data_dir, "x_train.npy"), x_train) 52 | np.save(os.path.join(split_data_dir, "x_test.npy"), x_test) 53 | np.save(os.path.join(split_data_dir, "y_train.npy"), y_train) 54 | np.save(os.path.join(split_data_dir, "y_test.npy"), y_test) 55 | 56 | input_files = glob.glob("{}/*.npy".format(split_data_dir)) 57 | print("\nINPUT FILE LIST: \n{}\n".format(input_files)) 58 | scaler = StandardScaler() 59 | x_train = np.load(os.path.join(split_data_dir, "x_train.npy")) 60 | scaler.fit(x_train) 61 | 62 | train_data_output_dir = os.path.join(OUTPUT_DIR, "train/train") 63 | if not os.path.exists(train_data_output_dir): 64 | os.mkdir(train_data_output_dir) 65 | 
test_data_output_dir = os.path.join(OUTPUT_DIR, "train/test") 66 | if not os.path.exists(test_data_output_dir): 67 | os.mkdir(test_data_output_dir) 68 | for file in input_files: 69 | raw = np.load(file) 70 | # only transform feature columns 71 | if "y_" not in file: 72 | transformed = scaler.transform(raw) 73 | if "train" in file: 74 | if "y_" in file: 75 | output_path = os.path.join(train_data_output_dir, "y_train.npy") 76 | np.save(output_path, raw) 77 | print("SAVED LABEL TRAINING DATA FILE\n") 78 | else: 79 | output_path = os.path.join(train_data_output_dir, "x_train.npy") 80 | np.save(output_path, transformed) 81 | print("SAVED TRANSFORMED TRAINING DATA FILE\n") 82 | else: 83 | if "y_" in file: 84 | output_path = os.path.join(test_data_output_dir, "y_test.npy") 85 | np.save(output_path, raw) 86 | output_path = os.path.join(test_data_output_dir, "y_test.csv") 87 | np.savetxt(output_path, raw, delimiter=",") 88 | print("SAVED LABEL TEST DATA FILE\n") 89 | else: 90 | output_path = os.path.join(test_data_output_dir, "x_test.npy") 91 | np.save(output_path, transformed) 92 | output_path = os.path.join(test_data_output_dir, "x_test.csv") 93 | np.savetxt(output_path, transformed, delimiter=",") 94 | print("SAVED TRANSFORMED TEST DATA FILE\n") 95 | -------------------------------------------------------------------------------- /examples/tf/modelscripts/requirements.txt: -------------------------------------------------------------------------------- 1 | pandas -------------------------------------------------------------------------------- /examples/tf/modelscripts/train.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | 4 | import numpy as np 5 | import tensorflow as tf 6 | 7 | 8 | def parse_args(): 9 | parser = argparse.ArgumentParser() 10 | 11 | # hyperparameters sent by the client are passed as command-line arguments to the script 12 | parser.add_argument('--epochs', type=int, default=1) 13 | parser.add_argument('--batch_size', type=int, default=64) 14 | parser.add_argument('--learning_rate', type=float, default=0.1) 15 | 16 | # data directories 17 | channel_path = "/opt/ml/input/data/calhousing-Preprocessing-train" 18 | parser.add_argument( 19 | '--train', 20 | type=str, 21 | default=os.path.join(channel_path, "train") 22 | ) 23 | parser.add_argument( 24 | '--test', 25 | type=str, 26 | default=os.path.join(channel_path, "test") 27 | ) 28 | 29 | # model directory 30 | parser.add_argument('--sm-model-dir', type=str, default=os.environ.get('SM_MODEL_DIR')) 31 | 32 | return parser.parse_known_args() 33 | 34 | 35 | def get_train_data(train_dir): 36 | x_train = np.load(os.path.join(train_dir, 'x_train.npy')) 37 | y_train = np.load(os.path.join(train_dir, 'y_train.npy')) 38 | print('x train', x_train.shape, 'y train', y_train.shape) 39 | 40 | return x_train, y_train 41 | 42 | 43 | def get_test_data(test_dir): 44 | x_test = np.load(os.path.join(test_dir, 'x_test.npy')) 45 | y_test = np.load(os.path.join(test_dir, 'y_test.npy')) 46 | print('x test', x_test.shape, 'y test', y_test.shape) 47 | 48 | return x_test, y_test 49 | 50 | 51 | def get_model(): 52 | inputs = tf.keras.Input(shape=(8,)) 53 | hidden_1 = tf.keras.layers.Dense(8, activation='tanh')(inputs) 54 | hidden_2 = tf.keras.layers.Dense(4, activation='sigmoid')(hidden_1) 55 | outputs = tf.keras.layers.Dense(1)(hidden_2) 56 | return tf.keras.Model(inputs=inputs, outputs=outputs) 57 | 58 | 59 | if __name__ == "__main__": 60 | args, _ = parse_args() 61 | 62 | 
print('Training data location: {}'.format(args.train)) 63 | print('Test data location: {}'.format(args.test)) 64 | x_train, y_train = get_train_data(args.train) 65 | x_test, y_test = get_test_data(args.test) 66 | 67 | batch_size = args.batch_size 68 | epochs = args.epochs 69 | learning_rate = args.learning_rate 70 | print('batch_size = {}, epochs = {}, learning rate = {}'.format(batch_size, epochs, learning_rate)) 71 | 72 | model = get_model() 73 | optimizer = tf.keras.optimizers.SGD(learning_rate) 74 | model.compile(optimizer=optimizer, loss='mse') 75 | model.fit(x_train, 76 | y_train, 77 | batch_size=batch_size, 78 | epochs=epochs, 79 | validation_data=(x_test, y_test)) 80 | 81 | # evaluate on test set 82 | scores = model.evaluate(x_test, y_test, batch_size, verbose=2) 83 | print("\nTest MSE :", scores) 84 | 85 | # save model 86 | model.save(args.sm_model_dir + '/1') 87 | -------------------------------------------------------------------------------- /examples/tf/smp_dag.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/examples/tf/smp_dag.png -------------------------------------------------------------------------------- /framework/.gitignore: -------------------------------------------------------------------------------- 1 | .venv/ -------------------------------------------------------------------------------- /framework/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/framework/__init__.py -------------------------------------------------------------------------------- /framework/conf/conf.yaml: -------------------------------------------------------------------------------- 1 | --- 2 | conf: 3 | s3Bucket: SMP_S3BUCKETNAME 4 | modelConfigFilePath: SMP_MODEL_CONFIGPATH 5 | sagemakerNetworkSecurity: 6 | subnets: SMP_SUBNETS 7 | role: SMP_ROLE 8 | security_groups_id: SMP_SECURITYGROUPS 9 | 10 | -------------------------------------------------------------------------------- /framework/conf/template/conf.yaml: -------------------------------------------------------------------------------- 1 | models: 2 | modelContainer: 3 | mymodel: 4 | source_directory: source_directory_root_level 5 | name: model_name 6 | registry: 7 | ModelPackageGroupName: "" 8 | ModelPackageGroupDescription: "" 9 | ModelApprovalStatus: "" 10 | ModelRepack: "" 11 | InferenceSpecification: 12 | SupportedTransformInstanceTypes: 13 | - ml.m5.2xlarge 14 | - OTHER.OPTIONS 15 | SupportedRealtimeInferenceInstanceType: 16 | - ml.m5.2xlarge 17 | - OTHER.OPTIONS 18 | SupportedContentTypes: 19 | - application/json 20 | SupportedResponseMIMETypes: 21 | - application/json 22 | image_uri: &modelImage "ECR_MODEL_IMAGE" 23 | MetadataProperties: 24 | - Test 25 | metrics: 26 | ModelQuality: 27 | Statistics: 28 | ContentType: application/json 29 | channels: 30 | train: 31 | location: 32 | activeLocation: s3 33 | s3BucketName: "" 34 | evaluateBucketPrefix: "" 35 | evaluateInputLocalFilepath: "" 36 | inputBucketPrefix: prefix/to/input 37 | content_type: text/csv 38 | dataFiles: 39 | - fileName: data_1.csv 40 | - fileName: s3://bucket/fullt/path/data_2.csv 41 | sagemaker: 42 | image_uri: *modelImage 43 | base_job_name: "" 44 | entry_point: "" 45 | instance_count: "" 46 | instance_type: "" 47 | strategy: "" 48 | 
assemble_with: "" 49 | join_source: "" 50 | split_type: "" 51 | content_type: "" 52 | max_payload: "" 53 | volume_size_in_gb: "" 54 | max_runtime_in_seconds: "" 55 | s3_data_type: "" 56 | s3_input_mode: "" 57 | s3_data_distribution_type: "" 58 | tags: 59 | - Key: key1 60 | Value: value1 61 | - Key: key2 62 | Value: value2 63 | env: 64 | key: value 65 | key2: value2 66 | preprocess: 67 | instance_type: "" 68 | instance_count: "" 69 | volume_size_in_gb: 50 70 | max_runtime_seconds: 3000 71 | image_uri: "" 72 | entry_point: "" 73 | base_job_name: "" 74 | channels: 75 | train: 76 | s3BucketName: "" 77 | inputBucketPrefix: prefix/to/input 78 | outputBucketPrefix: prefix/to/output 79 | dataFiles: 80 | - sourceName: data_1 81 | fileName: data_1.csv 82 | - sourceName: data_2 83 | fileName: s3://bucket/fullt/path/data_2.csv 84 | tags: 85 | - Key: key1 86 | Value: value1 87 | - Key: key2 88 | Value: value2 89 | max_run_time_in_seconds: 3600 90 | env: 91 | key: value 92 | key2: value2 93 | train: 94 | instance_type: "" 95 | instance_count: "" 96 | output_path: s3://bucket/path/to/output 97 | base_image_uri: *modelImage 98 | entry_point: "" 99 | base_job_name: "" 100 | volume_size_in_gb: 50 101 | max_runtime_seconds: 3000 102 | hyperparameters: 103 | parameters: value 104 | parameters2: value2 105 | channels: 106 | train: 107 | location: 108 | activeLocation: s3 109 | s3BucketName: "" 110 | inputBucketPrefix: prefix/to/input 111 | content_type: text/csv 112 | dataFiles: 113 | - fileName: data_1.csv 114 | - fileName: s3://bucket/fullt/path/data_2.csv 115 | tags: 116 | - Key: key1 117 | Value: value1 118 | - Key: key2 119 | Value: value2 120 | env: 121 | key: value 122 | key2: value2 123 | tune: 124 | base_job_name: "" 125 | image_uri: *modelImage 126 | strategy: "" 127 | objective_metric_name: "" 128 | hyperparameter_ranges: "" 129 | metric_definitions: "" 130 | objective_type: "" 131 | max_parallel_jobs: "" 132 | max_runtime_in_seconds: "" 133 | tags: 134 | - Key: key1 135 | Value: value1 136 | - Key: key2 137 | Value: value2 138 | early_stopping_type: "" 139 | random_seed: "" 140 | transform: 141 | channels: 142 | train: 143 | location: 144 | activeLocation: s3 145 | s3BucketName: "" 146 | evaluateBucketPrefix: "" 147 | evaluateInputLocalFilepath: "" 148 | inputBucketPrefix: prefix/to/input 149 | content_type: text/csv 150 | dataFiles: 151 | - fileName: data_1.csv 152 | - fileName: s3://bucket/fullt/path/data_2.csv 153 | sagemaker: 154 | image_uri: *modelImage 155 | base_job_name: "" 156 | entry_point: "" 157 | instance_count: "" 158 | instance_type: "" 159 | strategy: "" 160 | assemble_with: "" 161 | join_source: "" 162 | split_type: "" 163 | content_type: "" 164 | max_payload: "" 165 | volume_size_in_gb: "" 166 | max_runtime_in_seconds: "" 167 | s3_data_type: "" 168 | s3_input_mode: "" 169 | s3_data_distribution_type: "" 170 | tags: 171 | - Key: key1 172 | Value: value1 173 | - Key: key2 174 | Value: value2 175 | env: 176 | key: value 177 | key2: value2 178 | sagemakerPipeline: 179 | pipelineName: "" 180 | models: 181 | mymodel: 182 | steps: 183 | - step_name: mymodel-Preprocessing 184 | step_class: preprocessing 185 | chain_input_source_steps: 186 | - upstream-model-source-steps-# 187 | enable_cache: "" 188 | - step_name: mymodel-Training 189 | step_class: training 190 | chain_input_source_steps: 191 | - upstream-model-source-steps-# 192 | enable_cache: "" 193 | - step_name: mymodel-CreateModel 194 | step_class: createmodel 195 | - step_name: mymodel-Transform 196 | step_class: transform 197 | 
chain_input_source_steps: 198 | - upstream-model-source-steps-# 199 | chain_input_additional_prefix: "" 200 | enable_cache: "" 201 | - step_name: mymodel-Metrics 202 | step_class: metrics 203 | chain_input_source_steps: 204 | - upstream-model-source-steps-# 205 | chain_input_additional_prefix: "" 206 | enable_cache: "" 207 | - step_name: mymodel-Registry 208 | step_class: registermodel 209 | dependencies: 210 | - step1 >> step2 >> ... 211 | -------------------------------------------------------------------------------- /framework/createmodel/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/framework/createmodel/__init__.py -------------------------------------------------------------------------------- /framework/createmodel/create_model_service.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # 3 | # SPDX-License-Identifier: MIT-0 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 6 | # software and associated documentation files (the "Software"), to deal in the Software 7 | # without restriction, including without limitation the rights to use, copy, modify, 8 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 9 | # permit persons to whom the Software is furnished to do so. 10 | # 11 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 12 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 13 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 14 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 15 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 16 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 17 | 18 | # Import native libraries 19 | 20 | from sagemaker.model import Model 21 | from sagemaker.workflow.pipeline_context import PipelineSession 22 | # Import Third-party libraries 23 | from sagemaker.workflow.steps import TrainingStep 24 | # Import Custom libraries 25 | from utilities.logger import Logger 26 | 27 | 28 | ######################################################################################## 29 | ### If the Logger class implememntation required file handler ### 30 | ### self.logger = Logger(_conf) ### 31 | ######################################################################################## 32 | 33 | 34 | class CreateModelService: 35 | """ 36 | Create Model Service. 
Create a ModelStep 37 | """ 38 | 39 | def __init__(self, config: dict, model_name: str) -> "CreateModelService": 40 | """ 41 | Initialization method to Create a SageMaker Model 42 | 43 | Args: 44 | ---------- 45 | - config (dict): Application configuration 46 | - model_name (str): Name of Model 47 | """ 48 | self.config = config 49 | self.model_name = model_name 50 | self.logger = Logger() 51 | 52 | def _get_network_config(self) -> dict: 53 | """ 54 | Method to retreive SageMaker network configuration 55 | 56 | Returns: 57 | ---------- 58 | - SageMaker Network Configuration dictionary 59 | """ 60 | 61 | network_config_kwargs = dict( 62 | enable_network_isolation=False, 63 | security_group_ids=self.config.get("sagemakerNetworkSecurity.security_groups_id").split( 64 | ",") if self.config.get("sagemakerNetworkSecurity.security_groups_id") else None, 65 | subnets=self.config.get("sagemakerNetworkSecurity.subnets", None).split(",") if self.config.get( 66 | "sagemakerNetworkSecurity.subnets", None) else None, 67 | kms_key=self.config.get("sagemakerNetworkSecurity.kms_key"), 68 | encrypt_inter_container_traffic=True, 69 | role=self.config.get("sagemakerNetworkSecurity.role"), 70 | ) 71 | return network_config_kwargs 72 | 73 | def _get_pipeline_session(self) -> PipelineSession: 74 | return PipelineSession(default_bucket=self.config.get("s3Bucket")) 75 | 76 | def _args(self) -> dict: 77 | """ 78 | Parse method to retreive all arguments to be used to create the Model 79 | 80 | Returns: 81 | ---------- 82 | - CreateModel arguments : dict 83 | """ 84 | 85 | # parse main conf dictionary 86 | conf = self.config.get("models") 87 | 88 | args = dict( 89 | name=conf.get(f"{self.model_name}.name"), 90 | image_uri=conf.get(f"{self.model_name}.registry.InferenceSpecification.image_uri"), 91 | model_repack_flag=conf.get(f"{self.model_name}.registry.ModelRepack", "True"), 92 | # source_dir=conf.get(f"{self.model_name}.source_directory", os.environ["SMP_SOURCE_DIR_PATH"]), 93 | source_dir=conf.get(f"{self.model_name}.source_directory"), 94 | entry_point=conf.get(f"{self.model_name}.transform.entry_point", "inference.py").replace("/", ".").replace( 95 | ".py", ""), 96 | env={ 97 | "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code", 98 | "SAGEMAKER_PROGRAM": conf.get(f"{self.model_name}.transform.entry_point", "inference.py") 99 | .replace("/", ".") 100 | .replace(".py", ""), 101 | "SAGEMAKER_REQUIREMENTS": "requirements.txt", 102 | }, 103 | enable_network_isolation=False, 104 | default_bucket=self.config.get(f"s3Bucket"), 105 | ) 106 | 107 | return args 108 | 109 | def create_model(self, step_train: TrainingStep) -> Model: 110 | """ 111 | Create a SageMaker Model 112 | 113 | Args: 114 | ---------- 115 | - step_train (TrainingStep): SageMaker Training Step 116 | 117 | Returns: 118 | ---------- 119 | - SageMaker Model 120 | 121 | """ 122 | # Get SegeMaker Network Configuration 123 | sagemaker_network_config = self._get_network_config() 124 | self.logger.log_info(f"{'-' * 50} Start SageMaker Model Creation {self.model_name} {'-' * 50}") 125 | self.logger.log_info(f"SageMaker network config: {sagemaker_network_config}") 126 | 127 | # Get Arg for CreateModel Step 128 | args = self._args() 129 | self.logger.log_info(f"Arguments used: {args}") 130 | 131 | # Check ModelRepack Flag 132 | _model_repack_flag = args.get("model_repack_flag") 133 | 134 | vpc_config = {} 135 | if sagemaker_network_config.get("security_group_ids", None): vpc_config.update( 136 | {'SecurityGroupIds': 
sagemaker_network_config.get("security_group_ids")}) 137 | if sagemaker_network_config.get("subnets", None): vpc_config.update( 138 | {'Subnets': sagemaker_network_config.get("subnets")}) 139 | if not vpc_config: vpc_config = None 140 | 141 | model = '' 142 | if _model_repack_flag == "True": 143 | model = Model( 144 | name=args["name"], 145 | image_uri=args["image_uri"], 146 | source_dir=args["source_dir"], 147 | entry_point=args["entry_point"], 148 | model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts, 149 | role=sagemaker_network_config["role"], 150 | vpc_config=vpc_config, 151 | enable_network_isolation=args["enable_network_isolation"], 152 | sagemaker_session=self._get_pipeline_session(), 153 | model_kms_key=sagemaker_network_config["kms_key"], 154 | ) 155 | 156 | elif _model_repack_flag == "False": 157 | model = Model( 158 | name=args["name"], 159 | image_uri=args["image_uri"], 160 | env=args["env"], 161 | model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts, 162 | role=sagemaker_network_config["role"], 163 | vpc_config=vpc_config, 164 | enable_network_isolation=args.get("enable_network_isolation"), 165 | sagemaker_session=self._get_pipeline_session(), 166 | model_kms_key=sagemaker_network_config["kms_key"], 167 | ) 168 | print 169 | 170 | self.logger.log_info(f"SageMaker Model - {self.model_name} - Created") 171 | return model 172 | -------------------------------------------------------------------------------- /framework/framework_entrypoint.py: -------------------------------------------------------------------------------- 1 | from pipeline.pipeline_service import PipelineService 2 | 3 | pipeline = PipelineService() 4 | pipeline.execute_pipeline() 5 | -------------------------------------------------------------------------------- /framework/modelmetrics/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/framework/modelmetrics/__init__.py -------------------------------------------------------------------------------- /framework/modelmetrics/model_metrics_service.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # 3 | # SPDX-License-Identifier: MIT-0 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 6 | # software and associated documentation files (the "Software"), to deal in the Software 7 | # without restriction, including without limitation the rights to use, copy, modify, 8 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 9 | # permit persons to whom the Software is furnished to do so. 10 | # 11 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 12 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 13 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 14 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 15 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 16 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
17 | 18 | # Import native libraries 19 | import os 20 | 21 | # Import Third-party libraries 22 | import boto3 23 | import sagemaker 24 | from pipeline.helper import get_chain_input_file 25 | from sagemaker.network import NetworkConfig 26 | from sagemaker.processing import ( 27 | FrameworkProcessor, 28 | ProcessingInput, 29 | ProcessingOutput, 30 | RunArgs, 31 | ) 32 | from sagemaker.workflow.pipeline_context import PipelineSession 33 | # Import Custom libraries 34 | from utilities.logger import Logger 35 | 36 | ######################################################################################## 37 | ### If the Logger class implememntation required file handler ### 38 | ### self.logger = Logger(_conf) ### 39 | ######################################################################################## 40 | 41 | session = boto3.session.Session() 42 | region_name = session.region_name 43 | client_sagemaker_obj = boto3.client("sagemaker", region_name=region_name) 44 | 45 | 46 | class ModelMetricsService: 47 | """ 48 | Create an Evaluate function to generate the model metrcis 49 | """ 50 | 51 | def __init__(self, config: dict, model_name: str, step_config: dict, 52 | model_step_dict: dict) -> "ModelMetricsService": 53 | """ 54 | Initialization method to Create ModelMetricsService 55 | 56 | Args: 57 | ---------- 58 | - config (dict): Application configuration 59 | - model_name (str): Name of Model 60 | """ 61 | self.config = config 62 | self.model_name = model_name 63 | self.step_config = step_config 64 | self.model_step_dict = model_step_dict 65 | self.logger = Logger() 66 | 67 | def _get_pipeline_session(self) -> PipelineSession: 68 | return PipelineSession(default_bucket=self.config.get("s3Bucket")) 69 | 70 | def _get_network_config(self) -> dict: 71 | """ 72 | Method to retreive SageMaker network configuration 73 | 74 | Returns: 75 | ---------- 76 | - SageMaker Network Configuration dictionary 77 | """ 78 | 79 | network_config_kwargs = dict( 80 | enable_network_isolation=False, 81 | security_group_ids=self.config.get("sagemakerNetworkSecurity.security_groups_id").split( 82 | ",") if self.config.get("sagemakerNetworkSecurity.security_groups_id") else None, 83 | subnets=self.config.get("sagemakerNetworkSecurity.subnets", None).split(",") if self.config.get( 84 | "sagemakerNetworkSecurity.subnets", None) else None, 85 | encrypt_inter_container_traffic=True, 86 | ) 87 | return network_config_kwargs 88 | 89 | def _sagemaker_args(self): 90 | """ 91 | Parse method to retreive all sagemaker arguments 92 | """ 93 | conf = self.config.get(f"models.{self.model_name}.evaluate") 94 | 95 | args = dict( 96 | image_uri=conf.get("image_uri"), 97 | entry_point=conf.get("entry_point"), 98 | base_job_name=conf.get("base_job_name", "default-model-metrics-job-name"), 99 | instance_count=conf.get("instance_count", 1), 100 | instance_type=conf.get("instance_type", "ml.m5.2xlarge"), 101 | strategy=conf.get("strategy", "SingleRecord"), 102 | max_payload=conf.get("max_payload", None), 103 | volume_size_in_gb=conf.get("volume_size_in_gb", 50), 104 | max_runtime_in_seconds=conf.get("max_runtime_in_seconds", 3600), 105 | s3_data_distribution_type=conf.get("s3_data_distribution_type", "FullyReplicated"), 106 | s3_data_type=conf.get("s3_data_type", "S3Prefix"), 107 | s3_input_mode=conf.get("s3_input_mode", "File"), 108 | role=self.config.get("sagemakerNetworkSecurity.role"), 109 | kms_key=self.config.get("sagemakerNetworkSecurity.kms_key", None), 110 | tags=conf.get("tags", None), 111 | env=conf.get("env", None), 112 | ) 
113 | 114 | self.logger.log_info("Arguments Instantiates", f"Args: {args}") 115 | 116 | return args 117 | 118 | def _get_static_input_list(self) -> list: 119 | """ 120 | Method to retreive SageMaker static inputs 121 | 122 | Returns: 123 | ---------- 124 | - SageMaker Processing Inputs list 125 | 126 | """ 127 | conf = self.config.get(f"models.{self.model_name}.evaluate") 128 | # Get the total number of input files 129 | input_files_list = list() 130 | for channel in conf.get("channels", {}).keys(): input_files_list.append( 131 | conf.get(f"channels.{channel}.dataFiles", [])[0]) 132 | return input_files_list 133 | 134 | def _get_static_input(self, input_local_filepath): 135 | """ 136 | Method to retreive SageMaker static inputs 137 | 138 | Returns: 139 | ---------- 140 | - SageMaker Processing Inputs list 141 | 142 | """ 143 | static_inputs = [] 144 | 145 | conf = self.config.get(f"models.{self.model_name}.evaluate") 146 | if isinstance(conf.get("channels", {}), dict): 147 | # Get the total number of input files 148 | input_files_list = self._get_static_input_list() 149 | if len(input_files_list) >= 7: 150 | raise Exception("Static inputs for metrics should not exceed 7") 151 | for file in input_files_list: 152 | if file.get("fileName").startswith("s3://"): 153 | _source = file.get("fileName") 154 | else: 155 | bucket = conf.get("channels.train.s3Bucket") 156 | input_prefix = conf.get("channels.train.s3InputPrefix", "") 157 | _source = os.path.join(bucket, input_prefix, file.get("fileName")) 158 | 159 | temp = ProcessingInput( 160 | input_name=file.get("sourceName", ""), 161 | source=_source, 162 | destination=os.path.join(input_local_filepath, file.get("sourceName", "")), 163 | s3_data_distribution_type=conf.get("s3_data_distribution_type", "FullyReplicated"), 164 | ) 165 | static_inputs.append(temp) 166 | 167 | return static_inputs 168 | 169 | def _get_chain_input(self, input_local_filepath): 170 | """ 171 | Method to retreive SageMaker chain inputs 172 | 173 | Returns: 174 | ---------- 175 | - SageMaker Processing Inputs list 176 | """ 177 | dynamic_inputs = [] 178 | chain_input_source_step = self.step_config.get("chain_input_source_step", []) 179 | 180 | channels_conf = self.config.get(f"models.{self.model_name}.evaluate.channels", "train") 181 | if isinstance(channels_conf, str): 182 | # no datafile input 183 | channel_name = channels_conf 184 | else: 185 | # find datafile input 186 | if len(channels_conf.keys()) != 1: 187 | raise Exception("Evaluate step can only have one channel.") 188 | channel_name = list(channels_conf.keys())[0] 189 | 190 | for source_step_name in chain_input_source_step: 191 | chain_input_path = get_chain_input_file( 192 | source_step_name=source_step_name, 193 | steps_dict=self.model_step_dict, 194 | source_output_name=channel_name, 195 | ) 196 | 197 | temp = ProcessingInput( 198 | input_name=f"{source_step_name}-input", 199 | source=chain_input_path, 200 | destination=os.path.join(input_local_filepath, f"{source_step_name}-{channel_name}"), 201 | ) 202 | dynamic_inputs.append(temp) 203 | 204 | return dynamic_inputs 205 | 206 | def _get_processing_inputs(self, input_destination) -> list: 207 | """ 208 | Method to get additional processing inputs 209 | """ 210 | # Instantiate a list of inputs 211 | temp_static_input = self._get_static_input(input_destination) 212 | temp_dynamic_input = self._get_chain_input(input_destination) 213 | processing_inputs = temp_static_input + temp_dynamic_input 214 | return processing_inputs 215 | 216 | def _generate_model_metrics( 
217 | self, 218 | input_destination: str, 219 | output_source: str, 220 | output_destination: str, 221 | ) -> RunArgs: 222 | 223 | """ 224 | Method to create the ProcessorStep args to calculate metrics 225 | 226 | Args: 227 | ---------- 228 | - input_destination(str): path for input destination 229 | - output_source (str): path for output source 230 | - output_destination (str): path for output destination 231 | """ 232 | 233 | # Get metrics Config 234 | # metrics_config = self.config.get(f"models.{self.model_name}.evaluate") 235 | # Get Sagemaker Network config params 236 | sagemakernetworkconfig = self._get_network_config() 237 | # Get Sagemaker config params 238 | args = self._sagemaker_args() 239 | # Replace entry point path leverage python -m for local dependencies 240 | entrypoint_command = args.get("entry_point").replace("/", ".").replace(".py", "") 241 | 242 | # Create SageMaker Processor Instance 243 | processor = FrameworkProcessor( 244 | image_uri=args.get("image_uri"), 245 | estimator_cls=sagemaker.sklearn.estimator.SKLearn, # ignore bc of image_uri 246 | framework_version=None, 247 | role=args.get("role"), 248 | command=["python", "-m", entrypoint_command], 249 | instance_count=args.get("instance_count"), 250 | instance_type=args.get("instance_type"), 251 | volume_size_in_gb=args.get("volume_size_in_gb"), 252 | volume_kms_key=args.get("kms_key"), 253 | output_kms_key=args.get("kms_key"), 254 | max_runtime_in_seconds=args.get("max_runtime_in_seconds"), 255 | base_job_name=args.get("base_job_name"), 256 | sagemaker_session=self._get_pipeline_session(), 257 | env=args.get("env"), 258 | tags=args.get("tags"), 259 | network_config=NetworkConfig(**sagemakernetworkconfig), 260 | ) 261 | 262 | generate_model_metrics_args = processor.run( 263 | inputs=self._get_processing_inputs(input_destination), 264 | outputs=[ 265 | ProcessingOutput( 266 | source=output_source, 267 | destination=output_destination, 268 | output_name="model_evaluation_metrics", 269 | ), 270 | ], 271 | source_dir=self.config.get( 272 | f"models.{self.model_name}.source_directory", 273 | os.getenv("SMP_SOURCE_DIR_PATH") 274 | ), 275 | code=args.get("entry_point"), 276 | wait=True, 277 | logs=True, 278 | job_name=args.get("base_job_name"), 279 | ) 280 | 281 | return generate_model_metrics_args 282 | 283 | def calculate_model_metrics(self) -> RunArgs: 284 | """ 285 | Method to calculate models metrics 286 | """ 287 | 288 | self.logger.log_info(f"{'-' * 40} {self.model_name} {'-' * 40}") 289 | evaluate_data = self.config.get(f"models.{self.model_name}.evaluate") 290 | if isinstance(evaluate_data.get("channels", "train"), dict): 291 | evaluate_channels = list(evaluate_data.get("channels").keys()) 292 | # Iterate through evaluate channels 293 | if len(evaluate_channels) != 1: 294 | raise Exception(f" Only one channel allowed within evaluation section. 
{evaluate_channels} found.") 295 | else: 296 | channel = evaluate_channels[0] 297 | self.logger.log_info(f"During ModelMetricsService, one evaluate channel {channel} found.") 298 | 299 | channel_full_name = f"channels.{channel}" 300 | bucket_prefix = evaluate_data.get(f"{channel_full_name}.bucket_prefix", "") 301 | s3_bucket_name = evaluate_data.get(f"{channel_full_name}.s3BucketName") 302 | processing_input_destination = evaluate_data.get( 303 | f"{channel_full_name}.InputLocalFilepath", "/opt/ml/processing/input/" 304 | ) 305 | processing_output_source = evaluate_data.get( 306 | f"{channel_full_name}.OuputLocalFilepath", "/opt/ml/processing/output/" 307 | ) 308 | processing_output_key = os.path.join( 309 | bucket_prefix, 310 | self.model_name, 311 | "evaluation", 312 | ) 313 | processing_output_destination = f"s3://{s3_bucket_name}/{processing_output_key}/" 314 | else: 315 | processing_input_destination = "/opt/ml/processing/input/" 316 | processing_output_source = "/opt/ml/processing/output/" 317 | processing_output_destination = None 318 | 319 | generate_model_metrics_args = self._generate_model_metrics( 320 | input_destination=processing_input_destination, 321 | output_source=processing_output_source, 322 | output_destination=processing_output_destination, 323 | ) 324 | 325 | self.logger.log_info(f" Model evaluate completed.") 326 | 327 | return generate_model_metrics_args 328 | -------------------------------------------------------------------------------- /framework/pipeline/helper.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # 3 | # SPDX-License-Identifier: MIT-0 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 6 | # software and associated documentation files (the "Software"), to deal in the Software 7 | # without restriction, including without limitation the rights to use, copy, modify, 8 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 9 | # permit persons to whom the Software is furnished to do so. 10 | # 11 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 12 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 13 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 14 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 15 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 16 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 17 | 18 | # Import native libraries 19 | import re 20 | 21 | from sagemaker.workflow import steps 22 | from sagemaker.workflow.functions import Join 23 | from ast import literal_eval 24 | 25 | 26 | def look_up_step_type_from_step_name(source_step_name: str, config: dict) -> str: 27 | """ 28 | Look up a step_type in the sagemakerPipeline providing source_step_name and model_name. 29 | 30 | Args: 31 | source_step_name (str): The name of the step to look up in the sagemakerPipeline. 32 | model_name (str): The model in sagemaker pipeline 33 | config (dict): The configuration. 34 | 35 | 36 | Returns: 37 | The step_type. 38 | """ 39 | # note: chain_input_source_step will error out if source_step does not have an optional step_type declared in 40 | # sagemakerPipeline section of conf. step_type is mandatory for step_class: Processing. 
41 | for model_name in config['models'].keys(): 42 | steps_dict = config['models'][model_name] 43 | smp_steps_dict = config['sagemakerPipeline']['models'][model_name]['steps'] 44 | 45 | for step in smp_steps_dict: 46 | if step['step_name'] == source_step_name: 47 | if step['step_class'] == 'Processing': 48 | try: 49 | return step['step_type'] 50 | except KeyError: 51 | raise Exception( 52 | f'When chaining input, source {source_step_name} needs to include step_type in the sagemakerPipeline section of conf' 53 | ) 54 | elif step['step_class'] == 'Training': 55 | return "train" 56 | elif step['step_class'] == 'Transform': 57 | return "transform" 58 | else: 59 | raise Exception("Only Processing, Training & Transform steps can be used as chain input source.") 60 | 61 | 62 | def look_up_steps(source_step_name: str, steps_dict: dict) -> steps.Step: 63 | """ 64 | Look up a step in a dictionary of steps. 65 | 66 | Args: 67 | source_step_name (str): The name of the step to look up. 68 | steps_dict (dict): The dictionary of steps. 69 | 70 | Returns: 71 | The step. 72 | """ 73 | for model_name, model_steps in steps_dict.items(): 74 | for step in model_steps: 75 | if step.name == source_step_name: 76 | return step 77 | 78 | 79 | def look_up_step_config(source_step_name: str, smp_config: dict) -> tuple: 80 | """ 81 | Look up a step configuration in a dictionary of steps. 82 | 83 | Args: 84 | source_step_name (str): The name of the step to look up. 85 | smp_config (dict): The sagemakerPipeline configuration. 86 | 87 | Returns: 88 | The source model name and step class. 89 | """ 90 | for source_model in smp_config.get("models"): 91 | for step in smp_config.get(f"models.{source_model}.steps"): 92 | if step.get("step_name") == source_step_name: 93 | step_class = step.get("step_class") 94 | return source_model, step_class 95 | 96 | 97 | def get_chain_input_file( 98 | source_step_name: str, 99 | steps_dict: dict, 100 | source_output_name: str = "train", 101 | allowed_step_types: list = ["Processing", "Training", "Transform"], 102 | ) -> str: 103 | """ 104 | Get the input file for a step in a chain. 105 | 106 | Args: 107 | source_step_name (str): The name of the step to look up. 108 | steps_dict (dict): The dictionary of steps. 109 | source_output_name (str): The name of the output to look up. 110 | allowed_step_types (list): The list of allowed step types. 111 | 112 | Returns: 113 | The S3 URI of the source step output to use as chained input. 114 | """ 115 | 116 | source_step = look_up_steps(source_step_name, steps_dict) 117 | if source_step.step_type.value not in allowed_step_types: 118 | raise ValueError( 119 | f"Invalid Source Step Type: {source_step.step_type.value}, Valid source step types are {allowed_step_types}" 120 | ) 121 | if source_step.step_type.value == "Processing": 122 | chain_input_file = source_step.properties.ProcessingOutputConfig.Outputs[source_output_name].S3Output.S3Uri 123 | elif source_step.step_type.value == "Training": 124 | chain_input_file = source_step.properties.ModelArtifacts.S3ModelArtifacts 125 | elif source_step.step_type.value == "Transform": 126 | chain_input_file = source_step.properties.TransformOutput.S3OutputPath 127 | else: 128 | raise ValueError( 129 | f"Unsupported source step type: {source_step.step_type.value}. " 130 | f"Valid source step types are {allowed_step_types}" 131 | ) 132 | return chain_input_file 133 | 134 | 135 | def get_cache_flag(step_config: dict) -> bool: 136 | """ 137 | Get the cache flag for a step configuration.
138 | 139 | Args: 140 | step_config (dict): The step configuration. 141 | 142 | Returns: 143 | The cache flag. 144 | """ 145 | if "enable_cache" in step_config.keys(): 146 | cache_flag_content = step_config.get("enable_cache") 147 | if isinstance(cache_flag_content, bool): 148 | chache_flag = cache_flag_content 149 | else: 150 | raise Exception("Invalid value of step_caching, valid values are True or False") 151 | else: 152 | chache_flag = False 153 | return chache_flag 154 | 155 | 156 | def generate_default_smp_config(config: dict) -> dict: 157 | """ 158 | Generate the default SageMaker Model Parallelism configuration. 159 | 160 | Args: 161 | config (dict): The configuration. 162 | 163 | Returns: 164 | The SageMaker Model Parallelism configuration. 165 | """ 166 | model_name = config.get("models.modelName") 167 | model_abbreviated = model_name.replace("model", "") 168 | project_name = config.get("project_name") 169 | 170 | try: 171 | default_pipeline_name = config.get(f"models.{model_name}.sagemakerPipeline.pipelineName") 172 | except Exception: 173 | default_pipeline_name = f"{project_name}-{model_abbreviated}-pipeline" 174 | 175 | smp_ = f""" 176 | {{ 177 | pipelineName: "{default_pipeline_name}", 178 | models: {{ 179 | {model_name}: {{ 180 | steps = [ 181 | {{ 182 | step_name = {model_abbreviated}-Preprocessing, 183 | step_class = preprocessing, 184 | }}, 185 | {{ 186 | step_name = {model_abbreviated}-Training, 187 | step_class = training, 188 | chain_input_source_steps = [{model_abbreviated}-Preprocessing], 189 | }}, 190 | {{ 191 | step_name = {model_abbreviated}, 192 | step_class = createmodel, 193 | }}, 194 | {{ 195 | step_name = {model_abbreviated}-Transform, 196 | step_class = transform, 197 | }}, 198 | {{ 199 | step_name = {model_abbreviated}-Metrics, 200 | step_class = metrics, 201 | }}, 202 | {{ 203 | step_name = {model_abbreviated}-Register, 204 | step_class = registermodel, 205 | }} 206 | ] 207 | }} 208 | }}, 209 | dependencies = [ 210 | "{model_abbreviated}-Preprocessing >> {model_abbreviated}-Training >> {model_abbreviated} >> {model_abbreviated}-Transform >> {model_abbreviated}-Metrics >> {model_abbreviated}-Register" 211 | ] 212 | }} 213 | 214 | """ 215 | smp_config = literal_eval(smp_) 216 | return smp_config 217 | -------------------------------------------------------------------------------- /framework/pipeline/model_unit.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # 3 | # SPDX-License-Identifier: MIT-0 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 6 | # software and associated documentation files (the "Software"), to deal in the Software 7 | # without restriction, including without limitation the rights to use, copy, modify, 8 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 9 | # permit persons to whom the Software is furnished to do so. 10 | # 11 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 12 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 13 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 14 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 15 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 16 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
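# ----------------------------------------------------------------------------------
# Editor's note (illustrative sketch, not part of the original source): a minimal
# example of how the helpers in pipeline/helper.py above are used when a step
# declares chain_input_source_step in the sagemakerPipeline section of conf. The
# step name, channel name, and enable_cache value are assumptions for illustration.
#
# from pipeline.helper import get_chain_input_file, get_cache_flag
#
# # steps_dict maps model name -> list of SageMaker pipeline steps already built
# chain_input_s3_uri = get_chain_input_file(
#     source_step_name="model1-Preprocessing",
#     steps_dict=steps_dict,
#     source_output_name="train",   # Processing output channel to chain from
# )
# enable_cache = get_cache_flag({"step_name": "model1-Training", "enable_cache": True})
# ----------------------------------------------------------------------------------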
17 | 18 | # Create individual model units for pipeline 19 | 20 | from createmodel.create_model_service import CreateModelService 21 | from modelmetrics.model_metrics_service import ModelMetricsService 22 | from pipeline.helper import get_cache_flag 23 | from processing.processing_service import ProcessingService 24 | from registermodel.register_model_service import RegisterModelService 25 | from sagemaker.workflow.model_step import ModelStep 26 | from sagemaker.workflow.properties import PropertyFile 27 | from sagemaker.workflow.steps import ( 28 | CacheConfig, 29 | ProcessingStep, 30 | TrainingStep, 31 | TransformStep 32 | ) 33 | from training.training_service import TrainingService 34 | from transform.transform_service import TransformService 35 | 36 | 37 | class ModelUnit: 38 | def __init__( 39 | self, 40 | config: dict, 41 | model_name: str, 42 | model_step_dict: dict, 43 | ) -> "ModelUnit": 44 | 45 | self.config = config 46 | self.model_name = model_name 47 | self.model_step_dict = model_step_dict.copy() 48 | self.model_step_dict[self.model_name] = [] 49 | 50 | def get_train_pipeline_steps(self) -> list: 51 | process_step = None 52 | train_step = None 53 | create_model_step = None 54 | transform_step = None 55 | metrics_step = None 56 | register_model_step = None 57 | model_pipeline_steps = [] 58 | 59 | step_config_list = self.config.get(f"sagemakerPipeline.models.{self.model_name}.steps") 60 | 61 | for step_config in step_config_list: 62 | step_class = step_config.get("step_class") 63 | if step_class == "Processing": 64 | preprocess_step = self.sagemaker_processing(step_config) 65 | add_step = preprocess_step 66 | elif step_class == "Training": 67 | train_step = self.sagemaker_training(step_config) 68 | add_step = train_step 69 | elif step_class == "CreateModel": 70 | if train_step is None: 71 | raise Exception("A training step must be run before a CreateModel step") 72 | create_model_step = self.sagemaker_create_model(step_config, train_step) 73 | add_step = create_model_step 74 | elif step_class == "Transform": 75 | sagemaker_model_name = create_model_step.properties.ModelName 76 | transform_step = self.sagemaker_transform(step_config, sagemaker_model_name) 77 | add_step = transform_step 78 | elif step_class == "Metrics": 79 | if transform_step is None: 80 | raise Exception("A transform step is required to create a model metrics step.") 81 | metrics_step = self.sagemaker_model_metrics(step_config) 82 | add_step = metrics_step 83 | elif step_class == "RegisterModel": 84 | if train_step is None: 85 | raise Exception("A training step is required to create a register model step.") 86 | register_model_step = self.sagemaker_register_model(step_config, metrics_step, train_step) 87 | add_step = register_model_step 88 | else: 89 | raise Exception("Invalid step_class value.") 90 | 91 | model_pipeline_steps.append(add_step) 92 | self.model_step_dict[self.model_name].append(add_step) 93 | return model_pipeline_steps 94 | 95 | def sagemaker_processing(self, step_config: dict) -> ProcessingStep: 96 | process_service = ProcessingService( 97 | self.config, 98 | self.model_name, 99 | step_config, 100 | self.model_step_dict, 101 | ) 102 | step_args = process_service.processing() 103 | cache_config = CacheConfig(enable_caching=True, expire_after="10d") 104 | process_step = ProcessingStep( 105 | name=step_config.get("step_name"), 106 | step_args=step_args, 107 | cache_config=cache_config, 108 | ) 109 | return process_step 110 | 111 | def sagemaker_training(self, step_config: dict) -> TrainingStep: 112 
| 113 | training_service = TrainingService( 114 | self.config, 115 | self.model_name, 116 | step_config, 117 | self.model_step_dict, 118 | ) 119 | 120 | step_args = training_service.train_step() 121 | cache_config = CacheConfig(enable_caching=True, expire_after="10d") 122 | train_step = TrainingStep( 123 | name=step_config.get("step_name"), 124 | step_args=step_args, 125 | cache_config=cache_config, 126 | ) 127 | return train_step 128 | 129 | def sagemaker_create_model(self, step_config: dict, train_step: TrainingStep) -> ModelStep: 130 | 131 | create_model_service = CreateModelService( 132 | self.config, 133 | self.model_name, 134 | ) 135 | model = create_model_service.create_model(train_step) 136 | create_model_step = ModelStep( 137 | name=step_config.get("step_name"), 138 | step_args=model.create(instance_type="ml.m5.2xlarge") 139 | ) 140 | return create_model_step 141 | 142 | def sagemaker_transform(self, step_config: dict, sagemaker_model_name: str) -> TransformStep: 143 | 144 | transform_service = TransformService( 145 | self.config, 146 | self.model_name, 147 | step_config, 148 | self.model_step_dict, 149 | ) 150 | 151 | transform_step_args = transform_service.transform( 152 | sagemaker_model_name=sagemaker_model_name 153 | ) 154 | cache_config = CacheConfig(enable_caching=get_cache_flag(step_config), expire_after="10d") 155 | transform_step = TransformStep( 156 | name=step_config.get("step_name"), 157 | step_args=transform_step_args, 158 | cache_config=cache_config 159 | ) 160 | return transform_step 161 | 162 | def sagemaker_model_metrics(self, step_config: dict) -> ProcessingStep: 163 | 164 | model_metric_service = ModelMetricsService(self.config, self.model_name, step_config, self.model_step_dict) 165 | model_metric_args = model_metric_service.calculate_model_metrics() 166 | 167 | cache_config = CacheConfig(enable_caching=get_cache_flag(step_config), expire_after="10d") 168 | evaluation_report = PropertyFile( 169 | name="EvaluationReport", 170 | output_name="model_evaluation_metrics", 171 | path=f"model_evaluation_metrics.json", 172 | ) 173 | metrics_step = ProcessingStep( 174 | name=step_config.get("step_name"), 175 | step_args=model_metric_args, 176 | property_files=[evaluation_report], 177 | cache_config=cache_config, 178 | ) 179 | 180 | return metrics_step 181 | 182 | def sagemaker_register_model(self, step_config: dict, metrics_step: ProcessingStep, 183 | train_step: TrainingStep) -> ModelStep: 184 | 185 | register_model_service = RegisterModelService(self.config, self.model_name) 186 | register_model_args = register_model_service.register_model( 187 | metrics_step, 188 | train_step 189 | ) 190 | 191 | register_model_step = ModelStep( 192 | name=step_config.get("step_name"), 193 | step_args=register_model_args 194 | ) 195 | 196 | return register_model_step 197 | -------------------------------------------------------------------------------- /framework/pipeline/pipeline_service.py: -------------------------------------------------------------------------------- 1 | import json 2 | 3 | from pipeline.model_unit import ModelUnit 4 | from sagemaker.workflow.model_step import ModelStep 5 | from sagemaker.workflow.pipeline import Pipeline 6 | from sagemaker.workflow.pipeline_context import PipelineSession 7 | from utilities.configuration import Conf 8 | 9 | 10 | class PipelineService: 11 | 12 | def __init__(self) -> "PipelineService": 13 | self.config = Conf().load_conf() 14 | pass 15 | 16 | def _add_step_dependencies(self, pipeline_steps: list) -> None: 17 | 
step_dependency_config = self.config.get("sagemakerPipeline.dependencies", []) 18 | for condition in step_dependency_config: 19 | temp_chain = condition.split(" >> ") 20 | for i in range(len(temp_chain) - 1): 21 | source_step_name = temp_chain[i] 22 | dest_step_name = temp_chain[i + 1] 23 | source_step = None 24 | dest_step = None 25 | for step in pipeline_steps: 26 | step_name = step.name 27 | if step_name == source_step_name: 28 | source_step = step 29 | elif step_name == dest_step_name: 30 | dest_step = step 31 | if source_step is not None and dest_step is not None: 32 | if isinstance(dest_step, ModelStep): 33 | dest_step.steps[0].add_depends_on([source_step]) 34 | else: 35 | dest_step.add_depends_on([source_step]) 36 | break 37 | if source_step is None or dest_step is None: 38 | raise Exception( 39 | f"Failed when adding dependency between steps {source_step_name} and {dest_step_name}.") 40 | 41 | def construct_train_pipeline(self): 42 | model_steps_dict = {} 43 | 44 | for model_name in list(self.config.get("sagemakerPipeline.models").keys()): 45 | model_steps_dict[model_name] = ModelUnit( 46 | self.config, model_name, model_steps_dict, 47 | ).get_train_pipeline_steps() 48 | 49 | pipeline_steps = [] 50 | for model_name, model_unit_steps in model_steps_dict.items(): 51 | pipeline_steps += model_unit_steps 52 | 53 | self._add_step_dependencies(pipeline_steps=pipeline_steps) 54 | 55 | pipeline = Pipeline( 56 | name=self.config.get("sagemakerPipeline.pipelineName"), 57 | steps=pipeline_steps, 58 | sagemaker_session=PipelineSession(), 59 | ) 60 | pipeline_definition = json.loads(pipeline.definition()) 61 | 62 | return pipeline, pipeline_definition 63 | 64 | def execute_pipeline(self) -> None: 65 | pipeline_role = self.config.get("sagemakerNetworkSecurity.role") 66 | pipeline, pipeline_definition = self.construct_train_pipeline() 67 | 68 | with open("pipeline_definition.json", "w") as file: 69 | json.dump(pipeline_definition, file) 70 | 71 | pipeline.upsert(role_arn=pipeline_role) 72 | pipeline.start() 73 | -------------------------------------------------------------------------------- /framework/processing/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/framework/processing/__init__.py -------------------------------------------------------------------------------- /framework/processing/processing_service.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # 3 | # SPDX-License-Identifier: MIT-0 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 6 | # software and associated documentation files (the "Software"), to deal in the Software 7 | # without restriction, including without limitation the rights to use, copy, modify, 8 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 9 | # permit persons to whom the Software is furnished to do so. 10 | # 11 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 12 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 13 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 14 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 15 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 16 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 17 | 18 | import json 19 | import os 20 | from typing import Tuple 21 | 22 | from pipeline.helper import get_chain_input_file, look_up_step_type_from_step_name 23 | from sagemaker.network import NetworkConfig 24 | from sagemaker.processing import ( 25 | FrameworkProcessor, 26 | ProcessingInput, 27 | ProcessingOutput 28 | ) 29 | from sagemaker.sklearn import estimator 30 | from sagemaker.workflow.pipeline_context import PipelineSession 31 | 32 | 33 | class ProcessingService: 34 | """ 35 | Class to handle the creation of processing steps 36 | 37 | Attributes: 38 | ---------- 39 | - config: dict 40 | - Configuration dictionary 41 | - model_name: str 42 | - Model name 43 | - step_config: dict 44 | - Processing step configuration dictionary 45 | - model_step_dict: dict 46 | - Dictionary of model processing steps 47 | """ 48 | 49 | def __init__(self, config: dict, model_name: str, step_config: dict, model_step_dict: dict): 50 | self.config = config 51 | self.model_name = model_name 52 | self.step_config = step_config 53 | self.model_step_dict = model_step_dict 54 | 55 | def _get_network_config(self) -> dict: 56 | """ 57 | Method to retreive SageMaker network configuration 58 | 59 | Returns: 60 | ---------- 61 | - SageMaker Network Configuration dictionary 62 | """ 63 | network_config_kwargs = dict( 64 | enable_network_isolation=False, 65 | security_group_ids=self.config.get("sagemakerNetworkSecurity.security_groups_id").split( 66 | ",") if self.config.get("sagemakerNetworkSecurity.security_groups_id") else None, 67 | subnets=self.config.get("sagemakerNetworkSecurity.subnets", None).split(",") if self.config.get( 68 | "sagemakerNetworkSecurity.subnets", None) else None, 69 | encrypt_inter_container_traffic=True, 70 | ) 71 | 72 | return network_config_kwargs 73 | 74 | def _get_pipeline_session(self) -> PipelineSession: 75 | """ 76 | Method to retreive SageMaker pipeline session 77 | 78 | Returns: 79 | ---------- 80 | - SageMaker pipeline session 81 | """ 82 | return PipelineSession(default_bucket=self.config.get("s3Bucket")) 83 | 84 | def _args(self) -> dict: 85 | """ 86 | Parse method to retreive all arguments to be used to create the processing stop 87 | 88 | Returns: 89 | ---------- 90 | - Processing Step arguments : dict 91 | """ 92 | 93 | # parse main conf dictionary 94 | conf = self.config.get(f"models.{self.model_name}.{self.step_config.get('step_type')}") 95 | source_dir = self.config.get( 96 | f"models.{self.model_name}.source_directory", 97 | os.getenv("SMP_SOURCE_DIR_PATH") 98 | ) 99 | 100 | args = dict( 101 | image_uri=conf.get("image_uri"), 102 | base_job_name=conf.get("base_job_name", "default-processing-job-name"), 103 | entry_point=conf.get("entry_point"), 104 | instance_count=conf.get("instance_count", 1), 105 | instance_type=conf.get("instance_type", "ml.m5.2xlarge"), 106 | volume_size_in_gb=conf.get("volume_size_in_gb", 32), 107 | max_runtime_seconds=conf.get("max_runtime_seconds", 3000), 108 | tags=conf.get("tags", None), 109 | env=conf.get("env", None), 110 | source_directory=source_dir, 111 | framework_version=conf.get("framework_version", "0"), 112 | role=self.config.get("sagemakerNetworkSecurity.role"), 113 | kms_key=self.config.get("sagemakerNetworkSecurity.kms_key", None), 114 | 
s3_data_distribution_type=conf.get("s3_data_distribution_type", "FullyReplicated"), 115 | s3_data_type=conf.get("s3_data_type", "S3Prefix"), 116 | s3_input_mode=conf.get("s3_input_mode", "File"), 117 | s3_upload_mode=conf.get("s3_upload_mode", "EndOfJob"), 118 | ) 119 | 120 | return args 121 | 122 | def _get_static_input_list(self) -> list: 123 | """ 124 | Method to retreive SageMaker static inputs 125 | 126 | Returns: 127 | ---------- 128 | - SageMaker Processing Inputs list 129 | 130 | """ 131 | conf = self.config.get(f"models.{self.model_name}.{self.step_config.get('step_type')}") 132 | # Get the total number of input files 133 | input_files_list = list() 134 | for channel in conf.get("channels", {}).keys(): 135 | temp_data_files = conf.get(f"channels.{channel}.dataFiles", []) 136 | if temp_data_files: 137 | input_files_list.append(temp_data_files[0]) 138 | return input_files_list 139 | 140 | def _get_static_input(self) -> Tuple[list, int]: 141 | """ 142 | Method to retreive SageMaker static inputs 143 | 144 | Returns: 145 | ---------- 146 | - SageMaker Processing Inputs list 147 | 148 | """ 149 | # parse main conf dictionary 150 | conf = self.config.get(f"models.{self.model_name}.{self.step_config.get('step_type')}") 151 | args = self._args() 152 | # Get the total number of input files 153 | input_files_list = self._get_static_input_list() 154 | static_inputs = [] 155 | input_local_filepath = "/opt/ml/processing/input/" 156 | 157 | for file in input_files_list: 158 | if file.get("fileName").startswith("s3://"): 159 | _source = file.get("fileName") 160 | else: 161 | bucket = conf.get("channels.train.s3Bucket") 162 | input_prefix = conf.get("channels.train.s3InputPrefix", "") 163 | _source = os.path.join(bucket, input_prefix, file.get("fileName")) 164 | 165 | temp = ProcessingInput( 166 | input_name=file.get("sourceName", ""), 167 | source=_source, 168 | destination=os.path.join(input_local_filepath, file.get("sourceName", "")), 169 | s3_data_distribution_type=args.get("s3_data_distribution_type") 170 | ) 171 | 172 | static_inputs.append(temp) 173 | 174 | return static_inputs 175 | 176 | def _get_static_manifest_input(self): 177 | """ 178 | Method to create a manifest file to reference SageMaker Processing Inputs 179 | To create a manifest file the conf file should have s3Bucket and s3InputPrefix 180 | and fileName should contain only the name of the file. 181 | 182 | Returns: 183 | ---------- 184 | - SageMaker Processing Inputs list 185 | 186 | Notes: 187 | ---------- 188 | SageMaker Processing Job API has a limit of 10 ProcessingInputs 189 | 2 of these will be used for code and entrypoint input, 190 | If there are more than 7 input data files, Manifest file needs to 191 | be used to reference ProcessingInput data. 
192 | """ 193 | conf = self.config.get(f"models.{self.model_name}.{self.step_config.get('step_type')}") 194 | bucket = conf.get("channels.train.s3Bucket") 195 | input_prefix = conf.get("channels.train.s3InputPrefix", "") 196 | input_local_file_path = conf.get("inputLocalFilepath", "/opt/ml/processing/input") 197 | manifest_local_filename = f"{self.model_name}_{self.step_config.get('step_type')}_input.manifest" 198 | input_files_list = self._get_static_input_list() 199 | 200 | manifest_list = [] 201 | for file in input_files_list: 202 | manifest_list.append(file.get("fileName")) 203 | 204 | manifest_data = [{"prefix": f"s3://{bucket}/{input_prefix}"}, *manifest_list] 205 | 206 | with open(manifest_local_filename, "w") as outfile: 207 | json.dump(manifest_data, outfile, indent=1) 208 | 209 | manifest_input = ProcessingInput( 210 | source=manifest_local_filename, 211 | destination=os.path.join(input_local_file_path, "train"), 212 | s3_data_type="ManifestFile", 213 | ) 214 | 215 | return [manifest_input] 216 | 217 | def _get_chain_input(self): 218 | """ 219 | Method to retreive SageMaker chain inputs 220 | 221 | Returns: 222 | ---------- 223 | - SageMaker Processing Inputs list 224 | """ 225 | dynamic_processing_input = [] 226 | chain_input_source_step = self.step_config.get("chain_input_source_step", []) 227 | input_local_filepath = "/opt/ml/processing/input/" 228 | args = self._args() 229 | 230 | for source_step_name in chain_input_source_step: 231 | source_step_type = look_up_step_type_from_step_name( 232 | source_step_name=source_step_name, 233 | config=self.config 234 | ) 235 | 236 | for channel in self.config["models"][self.model_name][source_step_type].get('channels',["train"]): 237 | chain_input_path = get_chain_input_file( 238 | source_step_name=source_step_name, 239 | steps_dict=self.model_step_dict, 240 | source_output_name=channel, 241 | ) 242 | 243 | temp = ProcessingInput( 244 | input_name=f"{source_step_name}-input-{channel}", 245 | source=chain_input_path, 246 | destination=os.path.join(input_local_filepath, f"{source_step_name}-input-{channel}"), 247 | s3_data_distribution_type=args.get("s3_data_distribution_type") 248 | ) 249 | dynamic_processing_input.append(temp) 250 | 251 | return dynamic_processing_input 252 | 253 | def _get_processing_inputs(self) -> list: 254 | """ 255 | Method to retreive SageMaker processing inputs 256 | 257 | Returns: 258 | ---------- 259 | - SageMaker Processing Inputs list 260 | """ 261 | 262 | temp_static_input = [] 263 | if len(self._get_static_input_list()) >= 7: 264 | temp_static_input = self._get_static_manifest_input() 265 | else: 266 | temp_static_input = self._get_static_input() 267 | 268 | dynamic_processing_input = self._get_chain_input() 269 | 270 | return temp_static_input + dynamic_processing_input 271 | 272 | def _get_processing_outputs(self) -> list: 273 | """ 274 | Method to retreive SageMaker processing outputs 275 | 276 | Returns: 277 | ---------- 278 | - SageMaker Processing Outputs list 279 | """ 280 | processing_conf = self.config.get(f"models.{self.model_name}.{self.step_config.get('step_type')}") 281 | processing_outputs = [] 282 | processing_output_local_filepath = processing_conf.get("location.outputLocalFilepath", 283 | "/opt/ml/processing/output") 284 | 285 | source_step_type = self.step_config["step_type"] 286 | 287 | output_names = list( 288 | self.config["models"][self.model_name][source_step_type].get('channels', ["train"])) 289 | 290 | for output_name in output_names: 291 | temp = ProcessingOutput( 292 | 
output_name=output_name, 293 | source=os.path.join(processing_output_local_filepath, output_name), 294 | s3_upload_mode="EndOfJob" 295 | ) 296 | 297 | processing_outputs.append(temp) 298 | 299 | return processing_outputs 300 | 301 | def _run_processing_step( 302 | self, 303 | network_config: dict, 304 | args: dict 305 | ): 306 | """ 307 | Method to run SageMaker Processing step 308 | 309 | Parameters: 310 | ---------- 311 | - network_config: dict 312 | Network configuration 313 | - args: dict 314 | Arguments for SageMaker Processing step 315 | 316 | Returns: 317 | ---------- 318 | - step_process: dict 319 | SageMaker Processing step 320 | """ 321 | 322 | entrypoint_command = args["entry_point"].replace("/", ".").replace(".py", "") 323 | 324 | framework_processor = FrameworkProcessor( 325 | image_uri=args["image_uri"], 326 | framework_version=args["framework_version"], 327 | estimator_cls=estimator.SKLearn, 328 | role=args["role"], 329 | command=["python", "-m", entrypoint_command], 330 | instance_count=args["instance_count"], 331 | instance_type=args["instance_type"], 332 | volume_size_in_gb=args["volume_size_in_gb"], 333 | max_runtime_in_seconds=args["max_runtime_seconds"], 334 | base_job_name=args["base_job_name"], 335 | tags=args["tags"], 336 | env=args["env"], 337 | volume_kms_key=args["kms_key"], 338 | output_kms_key=args["kms_key"], 339 | network_config=NetworkConfig(**network_config), 340 | sagemaker_session=self._get_pipeline_session(), 341 | ) 342 | 343 | step_process = framework_processor.run( 344 | inputs=self._get_processing_inputs(), 345 | outputs=self._get_processing_outputs(), 346 | source_dir=args["source_directory"], 347 | code=args["entry_point"], 348 | job_name=args["base_job_name"] 349 | ) 350 | 351 | return step_process 352 | 353 | def processing(self) -> dict: 354 | return self._run_processing_step( 355 | self._get_network_config(), 356 | self._args() 357 | ) 358 | -------------------------------------------------------------------------------- /framework/registermodel/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/framework/registermodel/__init__.py -------------------------------------------------------------------------------- /framework/registermodel/register_model_service.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # 3 | # SPDX-License-Identifier: MIT-0 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 6 | # software and associated documentation files (the "Software"), to deal in the Software 7 | # without restriction, including without limitation the rights to use, copy, modify, 8 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 9 | # permit persons to whom the Software is furnished to do so. 10 | # 11 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 12 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 13 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 14 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 15 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 16 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
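# ----------------------------------------------------------------------------------
# Editor's note (illustrative sketch, not part of the original source): the
# ProcessingService._get_static_manifest_input() method above sidesteps the
# 10-ProcessingInput API limit by writing a ManifestFile. A sketch of the manifest
# structure it builds; bucket, prefix, and file names are placeholder assumptions.
#
# manifest_data = [
#     {"prefix": "s3://example-bucket/example-input-prefix/"},
#     "file_01.csv",
#     "file_02.csv",
#     "file_03.csv",
# ]
# # A single ProcessingInput with s3_data_type="ManifestFile" then points at this
# # JSON document, and SageMaker resolves each entry relative to the prefix.
# ----------------------------------------------------------------------------------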
17 | 18 | from sagemaker.model import ModelPackage 19 | from sagemaker.workflow.execution_variables import ExecutionVariables 20 | from sagemaker.model_metrics import MetricsSource, ModelMetrics 21 | from sagemaker.workflow.steps import ProcessingStep, TrainingStep 22 | 23 | from createmodel.create_model_service import CreateModelService 24 | 25 | 26 | class RegisterModelService: 27 | def __init__(self, config: dict, model_name: str): 28 | self.config = config 29 | self.model_name = model_name 30 | 31 | def register_model(self, step_metrics: ProcessingStep, step_train: TrainingStep) -> ModelPackage: 32 | create_model_service = CreateModelService(self.config, self.model_name) 33 | model_package_dict = self.config.get(f"models.{self.model_name}.registry") 34 | model = create_model_service.create_model(step_train=step_train) 35 | 36 | if step_metrics: 37 | model_metrics = ModelMetrics( 38 | model_statistics=MetricsSource( 39 | content_type=self.config.get( 40 | f"models.{self.model_name}.evaluate.content_type", 41 | "application/json" 42 | ), 43 | s3_uri="{}{}.json".format( 44 | step_metrics.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"], 45 | step_metrics.arguments["ProcessingOutputConfig"]["Outputs"][0]["OutputName"], 46 | ), 47 | ), 48 | ) 49 | else: 50 | model_metrics = None 51 | 52 | inference_spec_dict = model_package_dict.get("InferenceSpecification") 53 | 54 | register_model_step_args = model.register( 55 | content_types=inference_spec_dict.get("supported_content_types"), 56 | response_types=inference_spec_dict.get("supported_response_MIME_types"), 57 | inference_instances=inference_spec_dict.get("SupportedRealtimeInferenceInstanceTypes", ["ml.m5.2xlarge"]), 58 | transform_instances=inference_spec_dict.get("SupportedTransformInstanceTypes", ["ml.m5.2xlarge"]), 59 | model_package_group_name=f"{self.config.get('models.projectName')}-{self.model_name}", 60 | marketplace_cert=False, 61 | description=model_package_dict.get( 62 | "ModelPackageDescription", 63 | "Default Model Package Description. Please add custom descriptioon in your conf.yaml file" 64 | ), 65 | customer_metadata_properties={ 66 | "PIPELINE_ARN": ExecutionVariables.PIPELINE_EXECUTION_ARN, 67 | }, 68 | approval_status=inference_spec_dict.get("approval_status"), 69 | model_metrics=model_metrics, 70 | ) 71 | 72 | return register_model_step_args 73 | -------------------------------------------------------------------------------- /framework/training/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /framework/training/training_service.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # 3 | # SPDX-License-Identifier: MIT-0 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 6 | # software and associated documentation files (the "Software"), to deal in the Software 7 | # without restriction, including without limitation the rights to use, copy, modify, 8 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 9 | # permit persons to whom the Software is furnished to do so. 
10 | # 11 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 12 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 13 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 14 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 15 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 16 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 17 | 18 | # Import native libraries 19 | import os 20 | from typing import Tuple 21 | 22 | from pipeline.helper import get_chain_input_file, look_up_step_type_from_step_name 23 | from sagemaker.estimator import Estimator 24 | from sagemaker.inputs import TrainingInput 25 | from sagemaker.workflow.pipeline_context import PipelineSession 26 | 27 | 28 | class TrainingService: 29 | """ 30 | Class to handle SageMaker Training Service 31 | """ 32 | 33 | def __init__( 34 | self, 35 | config: dict, 36 | model_name: str, 37 | step_config: dict, 38 | model_step_dict: dict 39 | ) -> "TrainingService": 40 | 41 | self.config = config 42 | self.model_name = model_name 43 | self.step_config = step_config 44 | self.domain_section = self.step_config.get("step_type", "train") 45 | self.model_step_dict = model_step_dict 46 | 47 | def _get_network_config(self) -> dict: 48 | """ 49 | Method to retreive SageMaker network configuration 50 | 51 | Returns: 52 | ---------- 53 | - SageMaker Network Configuration dictionary 54 | """ 55 | 56 | network_config_kwargs = dict( 57 | enable_network_isolation=False, 58 | security_group_ids=self.config.get("sagemakerNetworkSecurity.security_groups_id").split( 59 | ",") if self.config.get("sagemakerNetworkSecurity.security_groups_id") else None, 60 | subnets=self.config.get("sagemakerNetworkSecurity.subnets", None).split(",") if self.config.get( 61 | "sagemakerNetworkSecurity.subnets", None) else None, 62 | encrypt_inter_container_traffic=True, 63 | ) 64 | return network_config_kwargs 65 | 66 | def _get_pipeline_session(self) -> PipelineSession: 67 | """ 68 | Method to retreive SageMaker pipeline session 69 | 70 | Returns: 71 | ---------- 72 | - SageMaker Pipeline Session 73 | """ 74 | 75 | return PipelineSession(default_bucket=self.config.get("s3Bucket")) 76 | 77 | def _args(self) -> dict: 78 | """ 79 | Method to retreive SageMaker training arguments 80 | 81 | Returns: 82 | ---------- 83 | - SageMaker Training Arguments dictionary 84 | """ 85 | 86 | conf = self.config.get(f"models.{self.model_name}.{self.domain_section}") 87 | source_dir = self.config.get( 88 | f"models.{self.model_name}.source_directory", 89 | os.getenv("SMP_SOURCE_DIR_PATH") 90 | ) 91 | 92 | args = dict( 93 | image_uri=conf.get("image_uri"), 94 | base_job_name=conf.get("base_job_name", "default-training-job-name"), 95 | entry_point=conf.get("entry_point"), 96 | instance_count=conf.get("instance_count", 1), 97 | instance_type=conf.get("instance_type", "ml.m5.2xlarge"), 98 | volume_size_in_gb=conf.get("volume_size_in_gb", 32), 99 | max_runtime_seconds=conf.get("max_runtime_seconds", 3000), 100 | tags=conf.get("tags", None), 101 | env=conf.get("env", None), 102 | source_directory=source_dir, 103 | output_path=conf.get("output_path"), 104 | hyperparams=conf.get("hyperparams", None), 105 | model_data_uri=conf.get("model_data_uri", None), 106 | role=self.config.get("sagemakerNetworkSecurity.role"), 107 | kms_key=self.config.get("sagemakerNetworkSecurity.kms_key", None) 108 | ) 109 | 110 | return args 111 | 112 | 
def _get_static_input_list(self) -> list: 113 | """ 114 | Method to retreive SageMaker static inputs 115 | 116 | Returns: 117 | ---------- 118 | - SageMaker Processing Inputs list 119 | 120 | """ 121 | conf = self.config.get(f"models.{self.model_name}.{self.domain_section}") 122 | # Get the total number of input files 123 | input_files_list = list() 124 | for channel in conf.get("channels", {}).keys(): input_files_list.append( 125 | conf.get(f"channels.{channel}.dataFiles", [])[0]) 126 | return input_files_list 127 | 128 | def _get_static_input(self, channel) -> Tuple[list, int]: 129 | """ 130 | Method to retreive SageMaker static inputs 131 | 132 | Returns: 133 | ---------- 134 | - SageMaker Processing Inputs list 135 | 136 | """ 137 | # parse main conf dictionary 138 | conf = self.config.get(f"models.{self.model_name}.{self.domain_section}") 139 | args = self._args() 140 | # Get the total number of input files 141 | input_files_list = self._get_static_input_list() 142 | 143 | training_channel_inputs = {} 144 | content_type = conf.get("content_type", None) 145 | input_mode = conf.get("input_mode", "File") 146 | distribution = conf.get("distribution", "FullyReplicated") 147 | 148 | for file in input_files_list: 149 | if file.get("fileName").startswith("s3://"): 150 | _source = file.get("fileName") 151 | else: 152 | bucket = conf.get("channels.train.s3Bucket") 153 | input_prefix = conf.get("channels.train.s3InputPrefix", "") 154 | _source = os.path.join(bucket, input_prefix, "data", channel, file.get("fileName")) 155 | 156 | training_input = TrainingInput( 157 | s3_data=_source, 158 | content_type=content_type, 159 | input_mode=input_mode, 160 | distribution=distribution, 161 | ) 162 | 163 | training_channel_inputs[channel] = training_input 164 | 165 | return training_channel_inputs 166 | 167 | def _get_chain_input(self): 168 | """ 169 | Method to retreive SageMaker chain inputs 170 | 171 | Returns: 172 | ---------- 173 | - SageMaker Processing Inputs list 174 | """ 175 | dynamic_training_input = [] 176 | training_channel_inputs = {} 177 | conf = self.config.get(f"models.{self.model_name}.{self.domain_section}") 178 | chain_input_source_step = self.step_config.get("chain_input_source_step", []) 179 | content_type = conf.get("content_type", None) 180 | input_mode = conf.get("input_mode", "File") 181 | distribution = conf.get("distribution", "FullyReplicated") 182 | 183 | for source_step_name in chain_input_source_step: 184 | source_step_type = look_up_step_type_from_step_name( 185 | source_step_name=source_step_name, 186 | config=self.config 187 | ) 188 | training_channel_inputs[source_step_name] = {} 189 | for channel in self.config["models"][self.model_name][source_step_type].get("channels", ["train"]): 190 | chain_input_path = get_chain_input_file( 191 | source_step_name=source_step_name, 192 | steps_dict=self.model_step_dict, 193 | source_output_name=channel 194 | ) 195 | 196 | training_input = TrainingInput( 197 | s3_data=chain_input_path, 198 | content_type=content_type, 199 | input_mode=input_mode, 200 | distribution=distribution, 201 | ) 202 | 203 | # dynamic_training_input.append(temp) 204 | # training_channel_inputs[f"{self.model_name}-{source_step_name}"] = training_input 205 | # training_channel_inputs.update({source_step_name: {channel: training_input}}) 206 | training_channel_inputs[source_step_name][channel] = training_input 207 | 208 | return training_channel_inputs 209 | 210 | def _run_training_step(self, args: dict): 211 | if "/" in args["entry_point"]: 212 | 
train_source_dir = f"{args['source_directory']}/{args['entry_point'].rsplit('/', 1)[0]}" 213 | train_entry_point = args["entry_point"].rsplit("/", 1)[1] 214 | train_dependencies = [ 215 | os.path.join(args["source_directory"], f) for f in os.listdir(args["source_directory"]) 216 | ] 217 | else: 218 | train_source_dir = args["source_directory"] 219 | train_entry_point = args["entry_point"] 220 | train_dependencies = None 221 | 222 | estimator = Estimator( 223 | role=args["role"], 224 | image_uri=args["image_uri"], 225 | instance_count=args["instance_count"], 226 | instance_type=args["instance_type"], 227 | volume_size=args["volume_size_in_gb"], 228 | max_run=args["max_runtime_seconds"], 229 | output_path=args["output_path"], 230 | base_job_name=args["base_job_name"], 231 | hyperparameters=args["hyperparams"], 232 | tags=args["tags"], 233 | model_uri=args["model_data_uri"], 234 | environment=args["env"], 235 | source_dir=train_source_dir, 236 | entry_point=train_entry_point, 237 | dependences=train_dependencies, 238 | sagemaker_session=self._get_pipeline_session() 239 | ) 240 | 241 | return estimator 242 | 243 | def train_step(self): 244 | """ 245 | Method to run training step 246 | 247 | Returns: 248 | ---------- 249 | - SageMaker Estimator object 250 | 251 | """ 252 | 253 | train_conf = self.config.get(f"models.{self.model_name}.{self.domain_section}") 254 | args = self._args() 255 | estimator = self._run_training_step(args) 256 | 257 | training_channel_inputs = {} 258 | for channel in train_conf.get("channels", "train"): 259 | temp_inputs = self._get_static_input(channel) 260 | training_channel_inputs.update(temp_inputs) 261 | 262 | chained_inputs = self._get_chain_input() 263 | for chain_input in chained_inputs: 264 | for channel in chained_inputs[chain_input]: 265 | training_channel_inputs.update({f"{chain_input}-{channel}": chained_inputs[chain_input][channel]}) 266 | 267 | train_args = estimator.fit( 268 | inputs=training_channel_inputs, 269 | job_name=args["base_job_name"] 270 | ) 271 | 272 | return train_args 273 | -------------------------------------------------------------------------------- /framework/transform/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/framework/transform/__init__.py -------------------------------------------------------------------------------- /framework/transform/transform_service.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # 3 | # SPDX-License-Identifier: MIT-0 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 6 | # software and associated documentation files (the "Software"), to deal in the Software 7 | # without restriction, including without limitation the rights to use, copy, modify, 8 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 9 | # permit persons to whom the Software is furnished to do so. 10 | # 11 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 12 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 13 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. 
/framework/transform/transform_service.py:
--------------------------------------------------------------------------------
1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 | #
3 | # SPDX-License-Identifier: MIT-0
4 | #
5 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this
6 | # software and associated documentation files (the "Software"), to deal in the Software
7 | # without restriction, including without limitation the rights to use, copy, modify,
8 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
9 | # permit persons to whom the Software is furnished to do so.
10 | #
11 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
12 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
13 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
14 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
15 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
16 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
17 |
18 | # Import native libraries
19 | from typing import Union, Tuple
20 |
21 | from pipeline.helper import get_chain_input_file
22 | # Import third-party libraries
23 | from sagemaker.transformer import Transformer
24 | from sagemaker.workflow.functions import Join
25 | from sagemaker.workflow.pipeline_context import PipelineSession
26 | # Import custom libraries
27 | from utilities.logger import Logger
28 |
29 |
30 | class TransformService:
31 | """
32 | SageMaker Transform Step service.
33 | """
34 |
35 | def __init__(self, config: dict, model_name: str, step_config: dict, model_step_dict: dict) -> None:
36 | """
37 | Initialization method for TransformService
38 |
39 | Args:
40 | ----------
41 | - config (dict): Application configuration
42 | - model_name (str): Name of the model
43 | - step_config (dict) / model_step_dict (dict): Transform step configuration and the dictionary of previously created model steps
44 | """
45 | self.config = config
46 | self.model_name = model_name
47 | self.step_config = step_config
48 | self.model_step_dict = model_step_dict
49 | self.logger = Logger()
50 |
51 | def _get_network_config(self) -> dict:
52 | """
53 | Method to retrieve the SageMaker network configuration
54 |
55 | Returns:
56 | ----------
57 | - SageMaker Network Configuration dictionary
58 | """
59 |
60 | network_config_kwargs = dict(
61 | enable_network_isolation=False,
62 | security_group_ids=self.config.get("sagemakerNetworkSecurity.security_groups_id").split(
63 | ",") if self.config.get("sagemakerNetworkSecurity.security_groups_id") else None,
64 | subnets=self.config.get("sagemakerNetworkSecurity.subnets", None).split(",") if self.config.get(
65 | "sagemakerNetworkSecurity.subnets", None) else None,
66 | kms_key=self.config.get("sagemakerNetworkSecurity.kms_key"),
67 | encrypt_inter_container_traffic=True,
68 | role=self.config.get("sagemakerNetworkSecurity.role"),
69 | )
70 | return network_config_kwargs
71 |
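For illustration (the IDs below are invented), the network block above reads security groups and subnets as comma-separated strings from the sagemakerNetworkSecurity section and splits them into lists; keys that are absent simply resolve to None:

conf = DotDict({"sagemakerNetworkSecurity": {"subnets": "subnet-aaa,subnet-bbb"}})
conf.get("sagemakerNetworkSecurity.subnets").split(",")   # -> ["subnet-aaa", "subnet-bbb"]
conf.get("sagemakerNetworkSecurity.security_groups_id")   # not set -> None, so no security groups are passed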
72 | def _args(self) -> dict:
73 | """
74 | Parse method to retrieve all arguments used to configure the transform job
75 |
76 | Returns:
77 | ----------
78 | - Transform Step arguments : dict
79 | """
80 |
81 | # parse main conf dictionary
82 | conf = self.config.get("models")
83 |
84 | args = dict(
85 | image_uri=conf.get(f"{self.model_name}.transform.image_uri"),
86 | base_job_name=conf.get(f"{self.model_name}.transform.base_job_name", "default-transform-job-name"),
87 | instance_count=conf.get(f"{self.model_name}.transform.instance_count", 1),
88 | instance_type=conf.get(f"{self.model_name}.transform.instance_type", "ml.m5.2xlarge"),
89 | strategy=conf.get(f"{self.model_name}.transform.strategy", None),
90 | assemble_with=conf.get(f"{self.model_name}.transform.assemble_with", None),
91 | join_source=conf.get(f"{self.model_name}.transform.join_source", None),
92 | split_type=conf.get(f"{self.model_name}.transform.split_type", None),
93 | content_type=conf.get(f"{self.model_name}.transform.content_type", "text/csv"),
94 | max_payload=conf.get(f"{self.model_name}.transform.max_payload", None),
95 | volume_size=conf.get(f"{self.model_name}.transform.volume_size", 50),
96 | max_runtime_in_seconds=conf.get(f"{self.model_name}.transform.max_runtime_in_seconds", 3600),
97 | input_filter=conf.get(f"{self.model_name}.transform.input_filter", None),
98 | output_filter=conf.get(f"{self.model_name}.transform.output_filter", None),
99 | tags=conf.get(f"{self.model_name}.transform.tags", None),
100 | env=conf.get(f"{self.model_name}.transform.env", None),
101 | )
102 |
103 | return args
104 |
105 | def _get_train_inputs_outputs(self, transform_data: dict) -> Tuple[str, str]:
106 | """
107 | Method to dynamically retrieve the files to be transformed
108 |
109 | Args:
110 | ----------
111 | - transform_data (dict): Dictionary of files
112 |
113 | Return
114 | ----------
115 | - input_data_file_s3path (str): Input path location
116 | - output_file_s3path (str): Output path location
117 | """
118 |
119 | evaluate_channels = list(transform_data.get("channels", {}).keys())
120 | if len(evaluate_channels) != 1:
121 | raise Exception(f"Only one channel allowed within Transform evaluate section. {evaluate_channels} found.")
122 | else:
123 | channel = evaluate_channels[0]
124 | self.logger.log_info("INFO", f"During TransformService, one evaluate channel {channel} found.")
125 |
126 | channel_full_name = f"channels.{channel}"
127 | bucket_prefix = transform_data.get(f"{channel_full_name}.inputBucketPrefix") + '/' if transform_data.get(
128 | f"{channel_full_name}.inputBucketPrefix") else ""
129 | s3_bucket_name = transform_data.get(f"{channel_full_name}.s3BucketName")
130 |
131 | # Transform data source
132 | files = list(transform_data.get(f"{channel_full_name}.dataFiles", ""))
133 |
134 | if len(files) == 1:
135 | file = files[0]
136 | self.logger.log_info("INFO", f"During TransformService, one evaluate file {file} found.")
137 | file_name = file.get("fileName")
138 |
139 | if file_name.startswith("s3://"):
140 | input_data_file_s3path = file_name
141 | else:
142 | input_data_file_s3path = f"s3://{s3_bucket_name}/{bucket_prefix}{file_name}"
143 |
144 | elif len(files) == 0:
145 | self.logger.log_info("INFO", "During TransformService, no evaluate file found.")
146 | input_data_file_s3path = None
147 | else:
148 | raise Exception(f"Maximum one file allowed within evaluation.dataFiles section. {len(files)} found.")
149 |
150 | output_file_s3path = f"s3://{s3_bucket_name}/{bucket_prefix}{self.model_name}/predictions/transform"
151 |
152 | return input_data_file_s3path, output_file_s3path
153 |
154 | def _get_chain_input(self):
155 | """
156 | Method to retrieve SageMaker chain inputs
157 |
158 | Returns:
159 | ----------
160 | - S3 path of the chained input, or None when no chain input source step is configured
161 | """
162 | channels_conf = self.config.get(f"models.{self.model_name}.transform.channels", {"train": []})
163 | if len(channels_conf.keys()) != 1:
164 | raise Exception("Transform step can only have one channel.")
165 | channel_name = list(channels_conf.keys())[0]
166 |
167 | dynamic_processing_input = []
168 | chain_input_source_step = self.step_config.get("chain_input_source_step", [])
169 | chain_input_additional_prefix = self.step_config.get("chain_input_additional_prefix", "")
170 | args = self._args()
171 |
172 | if len(chain_input_source_step) == 1:
173 | self.logger.log_info(
174 | "INFO", f"During TransformService, chain input source step {chain_input_source_step} found."
175 | )
176 | for source_step_name in chain_input_source_step:
177 | chain_input_path = get_chain_input_file(
178 | source_step_name=source_step_name,
179 | steps_dict=self.model_step_dict,
180 | source_output_name=channel_name,
181 | )
182 |
183 | input_data_file_s3path = Join("/", [chain_input_path, chain_input_additional_prefix])
184 |
185 | elif len(chain_input_source_step) == 0:
186 | self.logger.log_info(
187 | "INFO", "During TransformService, no chain input found. Input from transform.dataFiles"
188 | )
189 | return None
190 | else:
191 | raise Exception(
192 | f"Maximum one chain input allowed for TransformService. {len(chain_input_source_step)} found."
193 | )
194 |
195 | return input_data_file_s3path
196 |
197 | def _run_batch_transform(
198 | self,
199 | input_data: str,
200 | output_path: str,
201 | network_config: dict,
202 | sagemaker_model_name: str,
203 | args: dict,
204 | ) -> dict:
205 | """
206 | Method to set up a SageMaker Transformer and its transform arguments
207 |
208 | Args:
209 | ----------
210 | - input_data (str): Input data path
211 | - output_path (str): Output data path
212 | - network_config (dict): SageMaker Network Config
213 | - sagemaker_model_name (str): SageMaker Model Name
214 | - args (dict): SageMaker TransformStep arguments
215 |
216 | Return
217 | ----------
218 | - step_transform_args : TransformStep Args
219 | """
220 |
221 | # define a transformer
222 | transformer = Transformer(
223 | model_name=sagemaker_model_name,
224 | instance_count=args["instance_count"],
225 | instance_type=args["instance_type"],
226 | strategy=args["strategy"],
227 | assemble_with=args["assemble_with"],
228 | max_payload=args["max_payload"],
229 | output_path=output_path,
230 | sagemaker_session=PipelineSession(
231 | default_bucket=self.config.get("models.s3Bucket"),
232 | ),
233 | output_kms_key=network_config["kms_key"],
234 | accept=args["content_type"],
235 | tags=args["tags"],
236 | env=args["env"],
237 | base_transform_job_name=args["base_job_name"],
238 | volume_kms_key=network_config["kms_key"],
239 | )
240 |
241 | step_transform_args = transformer.transform(
242 | data=input_data,
243 | content_type=args["content_type"],
244 | split_type=args["split_type"],
245 | input_filter=args["input_filter"],
246 | output_filter=args["output_filter"],
247 | join_source=args["join_source"],
248 | )
249 |
250 | return step_transform_args
251 |
252 | def transform(
253 | self,
254 | sagemaker_model_name: str,
255 | ) -> dict:
256 | """
257 | Method to set up the SageMaker TransformStep
258 |
259 | Args:
260 | ----------
261 | - sagemaker_model_name (str): Name of the SageMaker Model used by the Transformer
262 |
263 | Return
264 | ----------
265 | - step_transform_args : TransformStep arguments
266 | """
267 | self.logger.log_info(f"{'-' * 40} {self.model_name} {'-' * 40}")
268 | self.logger.log_info(f"Starting {self.model_name} batch transform")
269 |
270 | # Get SageMaker network configuration
271 | sagemaker_network_config = self._get_network_config()
272 | self.logger.log_info(f"SageMaker network config: {sagemaker_network_config}")
273 |
274 | transform_data = self.config.get(f"models.{self.model_name}.transform")
275 | sagemaker_config = self._args()
276 |
277 | input_data_file_s3path = self._get_chain_input()
278 | if input_data_file_s3path is not None:
279 | output_data_file_s3path = None
280 | else:
281 | input_data_file_s3path, output_data_file_s3path = self._get_train_inputs_outputs(
282 | transform_data
283 | )
284 |
285 | step_transform_args = self._run_batch_transform(
286 | input_data=input_data_file_s3path,
287
| output_path=output_data_file_s3path, 288 | network_config=sagemaker_network_config, 289 | sagemaker_model_name=sagemaker_model_name, 290 | args=sagemaker_config, 291 | ) 292 | 293 | return step_transform_args 294 | -------------------------------------------------------------------------------- /framework/utilities/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/dynamic-sagemaker-pipelines-framework/0499913d0d0ac0d935fb5340f8bb3afb69ce6469/framework/utilities/__init__.py -------------------------------------------------------------------------------- /framework/utilities/configuration.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # 3 | # SPDX-License-Identifier: MIT-0 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 6 | # software and associated documentation files (the "Software"), to deal in the Software 7 | # without restriction, including without limitation the rights to use, copy, modify, 8 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 9 | # permit persons to whom the Software is furnished to do so. 10 | # 11 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 12 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 13 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 14 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 15 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 16 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 17 | 18 | import os 19 | import glob 20 | import yaml 21 | from typing import Any, Dict, Union, List 22 | 23 | 24 | class Conf: 25 | """ 26 | Class to Read Framework config and all complementary app Conf files. 
27 | """ 28 | 29 | def __init__(self): 30 | self.path = "framework/conf/conf.yaml" 31 | 32 | def load_conf(self): 33 | """ 34 | Method to load and merge all Conf files 35 | """ 36 | base_conf, conf_path = self._get_framework_conf() 37 | base_conf["conf"]["models"] = {} 38 | 39 | modelConfigFilePath = base_conf["conf"][ 40 | "modelConfigFilePath" 41 | ] 42 | yaml_files = glob.glob( 43 | f"{self._get_parent_dir()}/{modelConfigFilePath}", recursive=True 44 | ) 45 | 46 | for file_path in yaml_files: 47 | if file_path.startswith(conf_path): 48 | continue 49 | model_conf = self._read_yaml_file(file_path) 50 | # Insert Models Attibutes into Framework attribute in a runtime 51 | for key, value in model_conf["conf"]["models"].items(): 52 | base_conf["conf"]["models"].setdefault(key, {}).update(value) 53 | 54 | # Insert sagemakerPipeline section as a primary key 55 | for key, value in model_conf["conf"].items(): 56 | if key == "sagemakerPipeline": 57 | base_conf["conf"]["sagemakerPipeline"] = {} 58 | base_conf["conf"]["sagemakerPipeline"].update(value) 59 | 60 | update_conf = self._inject_env_variables(config=base_conf) 61 | return DotDict(update_conf).get("conf") 62 | 63 | def _inject_env_variables( 64 | self, 65 | config: Union[Dict[str, Union[Dict, List, str]], List] 66 | ) -> Union[Dict, List]: 67 | """ 68 | Replace dictionary TAGS by Environment Variables on a runtime 69 | 70 | Args: 71 | ---------- 72 | - config (dict): Framework configuration 73 | 74 | Returns: 75 | ---------- 76 | - Frameworks configurationwith values tags replaced by 77 | environment variables. 78 | 79 | """ 80 | if isinstance(config, dict): 81 | updated_config = {} 82 | for key, value in config.items(): 83 | if isinstance(value, dict): 84 | updated_config[key] = self._inject_env_variables(value) 85 | elif isinstance(value, list): 86 | updated_config[key] = [self._inject_env_variables(item) for item in value] 87 | else: 88 | updated_config[key] = self._replace_placeholders(value) 89 | return updated_config 90 | 91 | elif isinstance(config, list): 92 | return [self._inject_env_variables(item) for item in config] 93 | else: 94 | return config 95 | 96 | def _replace_placeholders(self, value: str) -> str: 97 | """ 98 | Placeholder 99 | """ 100 | if isinstance(value, str): 101 | if value.startswith("s3://"): 102 | parts = value.split("/") 103 | updated_parts = [os.environ.get(part, part) for part in parts] 104 | return "/".join(updated_parts) 105 | else: 106 | parts = value.split(".") 107 | updated_parts = [os.environ.get(part, part) for part in parts] 108 | return '.'.join(updated_parts) 109 | return value 110 | 111 | def _get_framework_conf(self): 112 | """ 113 | Load the Framework Conf file 114 | """ 115 | path = self.path 116 | root = self._get_parent_dir() 117 | conf_path = os.path.join(root, path) 118 | 119 | with open(conf_path, "r") as f: 120 | conf = yaml.safe_load(f) 121 | config = self._inject_env_variables(config=conf) 122 | return config, conf_path 123 | 124 | def _get_parent_dir(self): 125 | """ 126 | Get the parent directory from where the framework is been executed 127 | """ 128 | subdirectory = "framework" 129 | current_directory = os.getcwd() 130 | 131 | substring = str(current_directory).split("/") 132 | parent_dir = [path for path in substring if path != subdirectory] 133 | 134 | return "/".join(parent_dir) 135 | 136 | def _read_yaml_file(self, file_path: str): 137 | """ 138 | Read YAML file 139 | 140 | Args: 141 | ---------- 142 | - file_path (str): Conf file path 143 | 144 | """ 145 | with open(file_path, 
"r") as f: 146 | return yaml.safe_load(f) 147 | 148 | 149 | class DotDict(dict): 150 | """ 151 | A dictionary subclass that enables dot notation for nested access 152 | """ 153 | 154 | def __getattr__(self, key: str) -> "DotDict": 155 | """ 156 | Retreive the value of a nested key using dot notation. 157 | 158 | Args: 159 | ---------- 160 | - key (str): Yhe nested key in dot notation 161 | 162 | Returns: 163 | ---------- 164 | - The value of the nested key, wrapped in a "DotDict" if the value is a dictionary. 165 | 166 | Raises: 167 | ---------- 168 | - AttributeError: If the nested key is not found. 169 | """ 170 | if key in self: 171 | value = self[key] 172 | if isinstance(value, dict): 173 | return DotDict(value) 174 | return value 175 | else: 176 | return DotDict() 177 | 178 | def __setattr__(self, key: str, value: Any) -> None: 179 | self[key] = value 180 | 181 | def __delattr__(self, key: str) -> None: 182 | try: 183 | del self[key] 184 | except KeyError: 185 | raise AttributeError( 186 | f"{self.__class__.__name__} object has no attribute {key}" 187 | ) 188 | 189 | def get_value(self, key: str, default: Any = None) -> Any: 190 | """ 191 | Retreive the value of a nested key using dot notation 192 | 193 | Args: 194 | ---------- 195 | - key (str): The nested key in dot notation. 196 | - default (Any): The default value to return if the nested keyis not found. Default is None 197 | 198 | Returns: 199 | ---------- 200 | - The value of the nested key if found, or the specified default value if not found. 201 | """ 202 | keys = key.split(".") 203 | value = self 204 | for k in keys: 205 | value = value.__getattr__(k) 206 | if not isinstance(value, DotDict): 207 | break 208 | return value if value is not None else default 209 | 210 | def get(self, key: str, default: Any = None) -> Any: 211 | """ 212 | Retreive the value of a nested key using dot notation 213 | 214 | Args: 215 | ---------- 216 | - key (str): The nested key in dot notation. 217 | - default (Any): The default value to return if the nested keyis not found. Default is None 218 | 219 | Returns: 220 | ---------- 221 | - The value of the nested key if found, or the specified default value if not found. 222 | """ 223 | value = self.get_value(key) 224 | return value if value is not None and value != {} else default 225 | -------------------------------------------------------------------------------- /framework/utilities/logger.py: -------------------------------------------------------------------------------- 1 | # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # 3 | # SPDX-License-Identifier: MIT-0 4 | # 5 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 6 | # software and associated documentation files (the "Software"), to deal in the Software 7 | # without restriction, including without limitation the rights to use, copy, modify, 8 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 9 | # permit persons to whom the Software is furnished to do so. 10 | # 11 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 12 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 13 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
14 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
15 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
16 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
17 |
18 | import logging
19 | from typing import Callable, List, Union
20 |
21 |
22 | class Logger:
23 | """
24 | Logger class that implements custom message formatting
25 |
26 | Attributes:
27 | ----------
28 | - logger (logging.Logger): The logger object from the logging module.
29 |
30 | Methods:
31 | ----------
32 | - log_debug(*messages: Union[str, List[str]]): Logs debug messages
33 | - log_info(*messages: Union[str, List[str]]): Logs informational messages
34 | - log_warning(*messages: Union[str, List[str]]): Logs warning messages
35 | - log_error(*messages: Union[str, List[str]]): Logs error messages
36 | - log_critical(*messages: Union[str, List[str]]): Logs critical messages
37 | """
38 |
39 | def __init__(self, config: Union[dict, None] = None):
40 | """
41 | Initializes the Logger class.
42 |
43 | Args:
44 | ----------
45 | - config (dict): Configuration for the logger
46 | """
47 | # Get logger
48 | self.logger = logging.getLogger(self.__class__.__name__)
49 |
50 | # Create the formatter
51 | formatter = logging.Formatter(
52 | "%(asctime)s :::: [Log %(name)s] :::: %(message)s",
53 | datefmt="[%Y-%m-%d %H:%M:%S %Z%z]"
54 | )
55 |
56 | # Create the console handler and add formatter
57 | console_handler = logging.StreamHandler()
58 | console_handler.setFormatter(formatter)
59 |
60 | # Add the handler only once to avoid duplicate log lines when several services create a Logger
61 | if not self.logger.handlers: self.logger.addHandler(console_handler)
62 |
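A brief usage sketch of this wrapper (the messages are illustrative, not taken from the repository): each service instantiates Logger and calls the level-specific helpers, passing one or several messages per call:

logger = Logger()
logger.log_info("Starting transform step")                      # single message
logger.log_error("Could not read conf.yaml", "using defaults")  # several messages in one call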
63 | def _log_messages(self, level: Callable, prefix: str, *messages: Union[str, List[str]]):
64 | """
65 | Logs messages with the specified level.
66 |
67 | Args:
68 | ----------
69 | - level: (Callable): The bound logging method to call, e.g. self.logger.debug
70 | - prefix: (str): Message prefix to be used
71 | - *messages: (Union[str, List[str]]): Single or List of messages
72 | """
73 | for msg in messages:
74 | level(f"[Level: {prefix}] :::: {msg}")
75 |
76 | def log_debug(self, *messages: Union[str, List[str]]):
77 | """
78 | Logs debug messages
79 |
80 | Args:
81 | ----------
82 | - *messages: (Union[str, List[str]]): Single or List of messages
83 | """
84 | self.logger.setLevel(logging.DEBUG)
85 | self._log_messages(self.logger.debug, "DEBUG ", *messages)
86 |
87 | def log_info(self, *messages: Union[str, List[str]]):
88 | """
89 | Logs informational messages
90 |
91 | Args:
92 | ----------
93 | - *messages: (Union[str, List[str]]): Single or List of messages
94 | """
95 | self.logger.setLevel(logging.INFO)
96 | self._log_messages(self.logger.info, "INFO ", *messages)
97 |
98 | def log_warning(self, *messages: Union[str, List[str]]):
99 | """
100 | Logs warning messages
101 |
102 | Args:
103 | ----------
104 | - *messages: (Union[str, List[str]]): Single or List of messages
105 | """
106 | self.logger.setLevel(logging.WARNING)
107 | self._log_messages(self.logger.warning, "WARNING ", *messages)
108 |
109 | def log_error(self, *messages: Union[str, List[str]]):
110 | """
111 | Logs error messages
112 |
113 | Args:
114 | ----------
115 | - *messages: (Union[str, List[str]]): Single or List of messages
116 | """
117 | self.logger.setLevel(logging.ERROR)
118 | self._log_messages(self.logger.error, "ERROR ", *messages)
119 |
120 | def log_critical(self, *messages: Union[str, List[str]]):
121 | """
122 | Logs critical messages
123 |
124 | Args:
125 | ----------
126 | - *messages: (Union[str, List[str]]): Single or List of messages
127 | """
128 | self.logger.setLevel(logging.CRITICAL)
129 | self._log_messages(self.logger.critical, "CRITICAL", *messages)
130 |
--------------------------------------------------------------------------------
/framework/utilities/utils.py:
--------------------------------------------------------------------------------
1 | class S3Utilities:
2 | # Singleton guard: holds the single shared instance
3 | _instance = None
4 |
5 | def __new__(cls):
6 | if cls._instance is None:
7 | cls._instance = super(S3Utilities, cls).__new__(cls)
8 | return cls._instance
9 |
10 | @staticmethod
11 | def split_s3_uri(s3_uri: str) -> tuple:
12 | split_list = s3_uri.split("//")[1].split("/")
13 | s3_bucket_name = split_list[0]
14 | s3_key = "/".join(split_list[1:])
15 | return s3_bucket_name, s3_key
16 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | sagemaker
2 | boto3
3 | pyhocon
4 | PyYAML
--------------------------------------------------------------------------------
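To close, a small usage sketch of the configuration and S3 utilities above (the bucket, model and key names are invented for illustration): DotDict resolves nested keys with dot notation and falls back to the supplied default when any part of the chain is missing, and S3Utilities.split_s3_uri breaks an S3 URI into bucket and key.

conf = DotDict({"models": {"mymodel": {"train": {"instance_type": "ml.m5.xlarge"}}}})
conf.get("models.mymodel.train.instance_type")      # -> "ml.m5.xlarge"
conf.get("models.mymodel.train.instance_count", 1)  # missing key resolves to the default, 1
S3Utilities.split_s3_uri("s3://my-bucket/path/to/file.csv")  # -> ("my-bucket", "path/to/file.csv")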