├── .github
│   └── PULL_REQUEST_TEMPLATE.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── imgs
│   ├── CloudWatchA.png
│   ├── CloudWatchB.png
│   ├── CloudWatchC.png
│   ├── MLOps_BuildImage.jpg
│   ├── MLOps_Train_Deploy_TestModel.jpg
│   ├── MLOps_Train_Deploy_TestModel.png
│   ├── cloudformation-launch-stack.png
│   ├── cloudformationStacks.png
│   ├── crisp.png
│   └── eyecatch_sagemaker.png
└── lab
    ├── 00_Warmup
    │   ├── 01_BasicModel_Part1_TrainDeployTest.ipynb
    │   ├── 02_BasicModel_Part2_HyperparameterOptimization.ipynb
    │   ├── 03_BasicModel_Part3_BatchPrediction.ipynb
    │   └── 04_BasicModel_Part4_ModelMonitoring.ipynb
    ├── 01_CreateAlgorithmContainer
    │   ├── 01_Creating a Classifier Container.ipynb
    │   ├── 02_Testing our local model server.ipynb
    │   └── 03_Testing the container using SageMaker Estimator.ipynb
    ├── 02_TrainYourModel
    │   ├── 01_Training our model.ipynb
    │   └── 02_Check Progress and Test the endpoint.ipynb
    └── 03_TestingHacking
        └── 01_Stress Test.ipynb
/.github/PULL_REQUEST_TEMPLATE.md:
--------------------------------------------------------------------------------
1 | *Issue #, if available:*
2 |
3 | *Description of changes:*
4 |
5 |
6 | By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
7 |
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | ## Code of Conduct
2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
4 | opensource-codeofconduct@amazon.com with any additional questions or comments.
5 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing Guidelines
2 |
3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional
4 | documentation, we greatly value feedback and contributions from our community.
5 |
6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary
7 | information to effectively respond to your bug report or contribution.
8 |
9 |
10 | ## Reporting Bugs/Feature Requests
11 |
12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features.
13 |
14 | When filing an issue, please check [existing open](https://github.com/awslabs/amazon-sagemaker-mlops-workshop/issues), or [recently closed](https://github.com/awslabs/amazon-sagemaker-mlops-workshop/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aclosed%20), issues to make sure somebody else hasn't already
15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful:
16 |
17 | * A reproducible test case or series of steps
18 | * The version of our code being used
19 | * Any modifications you've made relevant to the bug
20 | * Anything unusual about your environment or deployment
21 |
22 |
23 | ## Contributing via Pull Requests
24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that:
25 |
26 | 1. You are working against the latest source on the *master* branch.
27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already.
28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted.
29 |
30 | To send us a pull request, please:
31 |
32 | 1. Fork the repository.
33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change.
34 | 3. Ensure local tests pass.
35 | 4. Commit to your fork using clear commit messages.
36 | 5. Send us a pull request, answering any default questions in the pull request interface.
37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.
38 |
39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and
40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/).
41 |
42 |
43 | ## Finding contributions to work on
44 | Looking at the existing issues is a great way to find something to contribute to. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any ['help wanted'](https://github.com/awslabs/amazon-sagemaker-mlops-workshop/labels/help%20wanted) issues is a great place to start.
45 |
46 |
47 | ## Code of Conduct
48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
50 | opensource-codeofconduct@amazon.com with any additional questions or comments.
51 |
52 |
53 | ## Security issue notifications
54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue.
55 |
56 |
57 | ## Licensing
58 |
59 | See the [LICENSE](https://github.com/awslabs/amazon-sagemaker-mlops-workshop/blob/master/LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
60 |
61 | We may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes.
62 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 |
3 | Permission is hereby granted, free of charge, to any person obtaining a copy of this
4 | software and associated documentation files (the "Software"), to deal in the Software
5 | without restriction, including without limitation the rights to use, copy, modify,
6 | merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
7 | permit persons to whom the Software is furnished to do so.
8 |
9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
10 | INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
11 | PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
12 | HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
13 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
14 | SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
15 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Amazon SageMaker MLOps (with classic CI/CD tools) Workshop
2 | Machine Learning Ops Workshop with SageMaker and CodePipeline: lab guides and materials.
3 |
4 | ## Introduction
5 |
6 |
7 |
8 | Data scientists and ML developers need more than a Jupyter notebook to create an ML model, test it, put it into production, and integrate it with a portal and/or a basic web/mobile application in a reliable and flexible way.
9 |
10 |
11 |
12 | There are two basic questions that you should consider when you start developing an ML model for a real business case:
13 |
14 | 1. How long would it take your organization to deploy a change that involves a single line of code?
15 | 2. Can you do this on a repeatable, reliable basis?
16 |
17 | If you're not happy with your answers, MLOps is a concept that can help you: a) create or improve the organization's culture for CI/CD applied to ML; b) create an automated infrastructure that will support your processes.
18 |
19 | In this workshop you'll see how to create and operate an automated ML pipeline using a traditional CI/CD tool, [CodePipeline](https://aws.amazon.com/codepipeline/), to orchestrate the ML workflow. During the exercises you'll see how to create a Docker container from scratch with your own algorithm, start a training/deployment job by just copying a .zip file to an S3 bucket, run A/B tests, and more. This is a reference architecture that can serve as inspiration for your own solution.
20 |
21 | [Amazon SageMaker](https://aws.amazon.com/sagemaker/), a service that supports the whole ML model development lifecycle, is the heart of this solution. Around it you can add several other services, such as the AWS Code* family, to create an automated pipeline, build your Docker images, and train/test/deploy/integrate your models.
22 |
23 | Here you can find more information about [DevOps](https://aws.amazon.com/devops/) at AWS ([What is DevOps](https://aws.amazon.com/pt/devops/what-is-devops/)).
24 |
25 | #### Some important references
26 |
27 | Another AWS service that can be used for this purpose is [Step Functions](
28 | https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/readmelink.html#getting-started-with-sample-jupyter-notebooks). At that link you'll also find the documentation for the Python library, which can be used directly from your Jupyter notebook.
29 |
30 | [Apache Airflow](https://airflow.apache.org/) is a powerful open source tool that can also be integrated with SageMaker. Curious? Just take a look at the [SageMaker Operators for Airflow](https://sagemaker.readthedocs.io/en/stable/using_workflow.html).
31 |
32 | Do you have a Kubernetes cluster and want to integrate SageMaker with it, managing the ML pipeline from the cluster? No problem: take a look at the [SageMaker Operators for Kubernetes](https://aws.amazon.com/blogs/machine-learning/introducing-amazon-sagemaker-operators-for-kubernetes/).
33 |
34 | In fact, there are many workflow managers that can be integrated with SageMaker to do the same job. Pick yours and use your creativity to build your own MLOps platform!
35 |
36 | ## Pre-Requisites
37 |
38 | ### Services
39 |
40 | You should have some basic experience with:
41 | - Training/testing an ML model
42 | - Python ([scikit-learn](https://scikit-learn.org/stable/#))
43 | - [Jupyter Notebook](https://jupyter.org/)
44 | - [AWS CodePipeline](https://aws.amazon.com/codepipeline/)
45 | - [AWS CodeCommit](https://aws.amazon.com/codecommit/)
46 | - [AWS CodeBuild](https://aws.amazon.com/codebuild/)
47 | - [Amazon ECR](https://aws.amazon.com/ecr/)
48 | - [Amazon SageMaker](https://aws.amazon.com/sagemaker/)
49 | - [AWS CloudFormation](https://aws.amazon.com/cloudformation/)
50 |
51 |
52 | Some experience working with the AWS console is helpful as well.
53 |
54 | ### AWS Account
55 |
56 | In order to complete this workshop you'll need an AWS account with access to the services above. Some of the resources required by this workshop are eligible for the AWS Free Tier if your account is less than 12 months old. See the [AWS Free Tier](https://aws.amazon.com/free/) page for more details.
57 |
58 | ## Scenario
59 |
60 | In this workshop you'll implement and experiment with a basic MLOps process, supported by automated infrastructure for training/testing/deploying/integrating ML models. It is divided into four parts:
61 |
62 | 1. You'll start with a **WarmUp**, reviewing the basic features of Amazon SageMaker;
63 | 2. Then you will **optionally** create a **Customized Docker Image** with your own algorithm. We'll use scikit-learn as our library;
64 | 3. After that, you will train the model (using the built-in XGBoost, or a custom container if you ran step 2), deploy it into a **DEV** environment, then approve and deploy it into a **PRD** environment with **High Availability** and **Elasticity**;
65 | 4. Finally, you'll run a stress test on your production endpoint to test the elasticity and simulate a situation where the number of requests to your ML model varies.
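The stress test in part 4 boils down to firing many concurrent requests at the production endpoint and watching latency while Auto Scaling reacts. A minimal sketch of that idea is below; it is illustrative only, not the workshop's test code, and the endpoint name and payload in the usage comment are placeholder assumptions.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def percentile(latencies, p):
    """Nearest-rank percentile of a list of latency samples (milliseconds)."""
    ordered = sorted(latencies)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]


def stress(endpoint_name, payload, n_requests=100, workers=8):
    """Fire concurrent requests at a SageMaker endpoint and collect latencies."""
    import boto3  # needs AWS credentials at runtime
    runtime = boto3.client("sagemaker-runtime")

    def call(_):
        start = time.time()
        runtime.invoke_endpoint(EndpointName=endpoint_name,
                                ContentType="text/csv", Body=payload)
        return (time.time() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(call, range(n_requests)))


# Hypothetical usage (endpoint name and payload are placeholders):
# latencies = stress("iris-model-prd", b"5.1,3.5,1.4,0.2")
# print("p90 latency:", percentile(latencies, 90), "ms")
```

Sustained load like this is exactly what makes the PRD endpoint's Auto Scaling policy add instances.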
66 |
67 | Parts 2 and 3 are supported by automated pipelines that read the assets produced by the ML developer and execute/control the whole process.
68 |
69 |
70 | ### Architecture
71 | The following architecture supports part 2. In part 2 you'll create a Docker image that contains your own implementation of a RandomForest classifier, using Python 3.7 and scikit-learn. Remember that if you are happy with the [built-in XGBoost](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html) you can skip this part.
72 |
73 | 
74 |
75 | 1. The **ML Developer** creates the assets for a scikit-learn-based Docker image, using SageMaker, and pushes them to a Git repo (CodeCommit);
76 | 2. CodePipeline listens for the CodeCommit push event, gets the source code, and launches CodeBuild;
77 | 3. CodeBuild authenticates with ECR, builds the Docker image, and pushes it to the ECR repository;
78 | 4. Done.
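Step 3 is normally handled by CodeBuild's buildspec, but the same flow can be sketched in Python to make it concrete. This is a hedged illustration, not the workshop's build code: the `iris-model` repository name is hypothetical, and the Docker CLI plus AWS credentials must be available where it runs.

```python
import base64
import subprocess


def decode_ecr_token(token: str):
    """ECR returns a base64-encoded 'user:password' pair; split it apart."""
    user, password = base64.b64decode(token).decode("utf-8").split(":", 1)
    return user, password


def build_and_push(image_name: str, tag: str = "latest"):
    """Authenticate with ECR, build the local Dockerfile, and push the image."""
    import boto3  # needs AWS credentials at runtime
    auth = boto3.client("ecr").get_authorization_token()["authorizationData"][0]
    user, password = decode_ecr_token(auth["authorizationToken"])
    registry = auth["proxyEndpoint"].replace("https://", "")
    image_uri = f"{registry}/{image_name}:{tag}"
    # Feed the password over stdin so it never appears in the process list
    subprocess.run(["docker", "login", "-u", user, "--password-stdin", registry],
                   input=password.encode(), check=True)
    subprocess.run(["docker", "build", "-t", image_uri, "."], check=True)
    subprocess.run(["docker", "push", image_uri], check=True)


# Hypothetical usage, run from the directory containing your Dockerfile:
# build_and_push("iris-model")
```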
79 |
80 | For part 3 you'll use the following structure for training the model, testing it, and deploying it into two different environments: DEV - QA/Development (simple endpoint) and PRD - Production (HA/elastic endpoint).
81 |
82 | **Although there is an ETL part in the architecture, we won't use Glue or any other ETL tool in this workshop. The idea is just to show you how simple it is to integrate this architecture with your data lake and/or legacy databases using an ETL process.**
83 | 
84 |
85 |
86 | 1. An ETL process or the ML developer prepares a new dataset for training the model and copies it into an S3 bucket;
87 | 2. CodePipeline listens to this S3 bucket and calls a Lambda function to start a training job in SageMaker;
88 | 3. The Lambda function sends a training job request to SageMaker;
89 | 4. When training finishes, CodePipeline gets its status and moves to the next stage if there is no error;
90 | 5. CodePipeline calls CloudFormation to deploy a model into a Development/QA environment in SageMaker;
91 | 6. After finishing the deployment in DEV/QA, CodePipeline waits for manual approval;
92 | 7. An approver approves or rejects the deployment. If rejected, the pipeline stops here; if approved, it goes to the next stage;
93 | 8. CodePipeline calls CloudFormation to deploy a model into production. This time, the endpoint is created with an Auto Scaling policy for HA and elasticity.
94 | 9. Done.
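To make step 3 concrete, the request that such a Lambda function sends can be sketched as below. This is a minimal illustration, not the workshop's actual Lambda: the job prefix, role ARN, image URI, S3 paths, and instance sizing are all placeholder assumptions.

```python
import time


def build_training_request(job_prefix, role_arn, image_uri,
                           train_s3_uri, output_s3_uri):
    """Assemble the payload for the SageMaker CreateTrainingJob API."""
    return {
        "TrainingJobName": f"{job_prefix}-{int(time.time())}",
        "AlgorithmSpecification": {"TrainingImage": image_uri,
                                   "TrainingInputMode": "File"},
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "ContentType": "text/csv",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated"}}}],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {"InstanceType": "ml.m4.xlarge",
                           "InstanceCount": 1,
                           "VolumeSizeInGB": 30},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


def handler(event, context):
    """Lambda entry point: kick off the training job (sketch only)."""
    import boto3  # needs AWS credentials at runtime
    request = build_training_request(
        "mlops-iris",                                    # placeholder names
        "arn:aws:iam::111122223333:role/SageMakerRole",  # hypothetical role
        "111122223333.dkr.ecr.us-east-1.amazonaws.com/iris-model:latest",
        "s3://my-mlops-bucket/mlops/iris/data",
        "s3://my-mlops-bucket/mlops/iris/output")
    boto3.client("sagemaker").create_training_job(**request)
    return {"TrainingJobName": request["TrainingJobName"]}
```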
95 |
96 | ### Crisp DM
97 |
98 |
99 |
100 | It is important to mention that the process above was based on an Industry process for Data Mining and Machine Learning called [CRISP-DM](https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining).
101 |
102 | CRISP-DM stands for "Cross-Industry Standard Process for Data Mining" and is an excellent skeleton to build a data science project around.
103 |
104 |
105 |
106 | There are six phases in CRISP-DM:
107 | - **Business understanding**: Don’t dive into the data immediately! First take some time to understand: Business objectives, Surrounding context, ML problem category.
108 | - **Data understanding**: Exploring the data gives us insights about the paths we should follow.
109 | - **Data preparation**: Data cleaning, normalization, feature selection, feature engineering, etc.
110 | - **Modeling**: Select the algorithms, train your model, optimize it as necessary.
111 | - **Evaluation**: Test your model with different samples, with real data if possible and decide if the model will fit the requirements of your business case.
112 | - **Deployment**: Deploy into production, integrate it, do A/B tests, integration tests, etc.
113 |
114 | Notice the arrows in the diagram, though. CRISP-DM frames data science as a cyclical endeavor: more insights lead to better business understanding, which kicks off the process again.
115 |
116 | ## Instructions
117 |
118 | First, you need to execute a CloudFormation script to create all the components required for the exercises.
119 |
120 | 1. Click the button below to launch the CloudFormation stack.
121 |
122 | Region| Launch
123 | ------|-----
124 | US East (N. Virginia) | [](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/new?stackName=AIWorkshop&templateURL=https://s3.amazonaws.com/aws-ai-ml-aod-latam/mlops-workshop/m.yml)
125 |
126 | 1. Then open the Jupyter notebook instance in SageMaker and start the exercises:
127 |
128 | 1. [Warmup](lab/00_Warmup/01_BasicModel_Part1_TrainDeployTest.ipynb): This is a basic exercise for exploring SageMaker features like training, deploying and optimizing a model. If you already have experience with SageMaker, you can skip this exercise.
129 | 2. [Container Image with a Scikit Classifier](lab/01_CreateAlgorithmContainer/01_Creating%20a%20Classifier%20Container.ipynb ): In this exercise we'll create a Docker image that encapsulates all the code required for training and deploying a RandomForest classifier. If you don't want to create a custom container, skip this section.
130 |     1. [Test the models locally](lab/01_CreateAlgorithmContainer/02_Testing%20our%20local%20model%20server.ipynb): This is part of exercise #2. You can use this notebook to test your local web service and simulate how SageMaker will call it when you ask it to create an endpoint or launch a batch job for you.
131 |     2. [Test the container using a SageMaker Estimator](lab/01_CreateAlgorithmContainer/03_Testing%20the%20container%20using%20SageMaker%20Estimator.ipynb): This optional exercise helps you understand how SageMaker Estimators can encapsulate your container and abstract the complexity of the training/tuning/deploying processes.
132 | 3. [Train your models](lab/02_TrainYourModel/01_Training%20our%20model.ipynb): In this exercise you'll use the training pipeline. You'll see how to train or retrain a particular model by just copying a zip file with the required assets to a given S3 bucket.
133 |     1. [Check Training progress and test](lab/02_TrainYourModel/02_Check%20Progress%20and%20Test%20the%20endpoint.ipynb): Here you can monitor the training process, approve the production deployment and test your endpoints.
134 | 4. [Stress Test](lab/03_TestingHacking/01_Stress%20Test.ipynb): Here you can execute stress tests to see how well your model performs.
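The "Train your models" step boils down to zipping the job's assets and copying the zip to the S3 location the pipeline watches. A minimal sketch of that, assuming hypothetical bucket/key/file names (the real ones come from the CloudFormation stack and the notebook):

```python
import io
import zipfile


def package_assets(files: dict) -> bytes:
    """Zip a {filename: text content} mapping in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def trigger_training(bucket: str, key: str, files: dict):
    """Upload the zip to the watched S3 location, which starts the pipeline."""
    import boto3  # needs AWS credentials at runtime
    boto3.client("s3").put_object(Bucket=bucket, Key=key,
                                  Body=package_assets(files))


# Hypothetical usage; bucket, key and file names depend on your stack:
# trigger_training("my-mlops-bucket", "training_jobs/iris/trainingjob.zip",
#                  {"trainingjob.json": '{"eta": "0.1"}'})
```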
135 |
136 |
137 | ----
138 | # Cleaning
139 |
140 | First delete the following stacks:
141 | - mlops-deploy-iris-model-dev
142 | - mlops-deploy-iris-model-prd
143 | - mlops-training-iris-model-job
144 |
145 | Then delete the stack you created. If you named it **AIWorkshop**, find that stack in the CloudFormation console and delete it.
146 |
147 | **WARNING**: All the assets will be deleted, including the S3 Bucket and the ECR Docker images created during the execution of this workshop.
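If you prefer to script the cleanup, the order above matters: the model stacks must be deleted before the main workshop stack. A hedged boto3 sketch, using the stack names listed above and assuming **AIWorkshop** as the main stack name:

```python
def stacks_to_delete(main_stack: str = "AIWorkshop"):
    """Child stacks first, then the main workshop stack last."""
    return ["mlops-deploy-iris-model-dev",
            "mlops-deploy-iris-model-prd",
            "mlops-training-iris-model-job",
            main_stack]


def clean_up(main_stack: str = "AIWorkshop"):
    """Delete each stack in order, waiting for each deletion to finish."""
    import boto3  # needs AWS credentials at runtime
    cfn = boto3.client("cloudformation")
    for name in stacks_to_delete(main_stack):
        cfn.delete_stack(StackName=name)
        cfn.get_waiter("stack_delete_complete").wait(StackName=name)
```

Note that CloudFormation may refuse to delete the main stack while its S3 bucket or ECR repository still holds objects; empty them first if the deletion fails.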
148 |
149 | ----
150 | # Suggested agenda
151 | - Introduction (1:00)
152 | - MLStack@AWS, SageMaker concepts/features, MLOps, etc.
153 | - WarmUp - Part 1 (0:30)
154 | - Break (0:15)
155 | - WarmUp - Parts 2,3,4 (0:50)
156 | - Container Image (0:40)
157 | - Lunch (1:00)
158 | - MLOps Pipeline: Train + Deployment (0:30)
159 | - Stress tests + Auto Scaling (0:30)
160 | - Wrap up + discussion (0:20)
161 |
162 | Total: 5:35
163 |
164 | ----
165 | ## License Summary
166 | This sample code is made available under a modified MIT license. See the LICENSE file.
167 |
168 | Thank you!
169 |
--------------------------------------------------------------------------------
/imgs/CloudWatchA.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awslabs/amazon-sagemaker-mlops-workshop/01d8773ab11111d60102ca9e593a03fe24700c2d/imgs/CloudWatchA.png
--------------------------------------------------------------------------------
/imgs/CloudWatchB.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awslabs/amazon-sagemaker-mlops-workshop/01d8773ab11111d60102ca9e593a03fe24700c2d/imgs/CloudWatchB.png
--------------------------------------------------------------------------------
/imgs/CloudWatchC.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awslabs/amazon-sagemaker-mlops-workshop/01d8773ab11111d60102ca9e593a03fe24700c2d/imgs/CloudWatchC.png
--------------------------------------------------------------------------------
/imgs/MLOps_BuildImage.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awslabs/amazon-sagemaker-mlops-workshop/01d8773ab11111d60102ca9e593a03fe24700c2d/imgs/MLOps_BuildImage.jpg
--------------------------------------------------------------------------------
/imgs/MLOps_Train_Deploy_TestModel.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awslabs/amazon-sagemaker-mlops-workshop/01d8773ab11111d60102ca9e593a03fe24700c2d/imgs/MLOps_Train_Deploy_TestModel.jpg
--------------------------------------------------------------------------------
/imgs/MLOps_Train_Deploy_TestModel.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awslabs/amazon-sagemaker-mlops-workshop/01d8773ab11111d60102ca9e593a03fe24700c2d/imgs/MLOps_Train_Deploy_TestModel.png
--------------------------------------------------------------------------------
/imgs/cloudformation-launch-stack.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awslabs/amazon-sagemaker-mlops-workshop/01d8773ab11111d60102ca9e593a03fe24700c2d/imgs/cloudformation-launch-stack.png
--------------------------------------------------------------------------------
/imgs/cloudformationStacks.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awslabs/amazon-sagemaker-mlops-workshop/01d8773ab11111d60102ca9e593a03fe24700c2d/imgs/cloudformationStacks.png
--------------------------------------------------------------------------------
/imgs/crisp.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awslabs/amazon-sagemaker-mlops-workshop/01d8773ab11111d60102ca9e593a03fe24700c2d/imgs/crisp.png
--------------------------------------------------------------------------------
/imgs/eyecatch_sagemaker.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awslabs/amazon-sagemaker-mlops-workshop/01d8773ab11111d60102ca9e593a03fe24700c2d/imgs/eyecatch_sagemaker.png
--------------------------------------------------------------------------------
/lab/00_Warmup/01_BasicModel_Part1_TrainDeployTest.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# SageMaker Warmup Exercises\n",
8 | "\n",
 9 |     "This is a series of exercises for you to get familiar with the most important features SageMaker offers. It is divided into four parts: \n",
10 | "\n",
11 | "- **Part 1**: This exercise. Let's explore a toy dataset (iris), train, deploy and test a ML model, using XGBoost\n",
12 | "- **Part 2**: After training your first ML model using SageMaker, now it's time to optimize it using Automatic Model Tuning\n",
13 |     "- **Part 3**: You don't need to do real-time predictions? No problem. Let's do Batch Predictions\n",
14 | "- **Part 4**: Finally, let's use SageMaker Model Monitoring to get some info from the real-time endpoint"
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "# Part 1/4 - Train/Deploy/Test a multiclass model using SageMaker Built-in XGBoost\n",
22 | "\n",
23 |     "This exercise is about executing all the steps of the machine learning development pipeline, using some features SageMaker offers. We'll use a public dataset called Iris. Iris is a toy dataset and this is a very simple example; the idea is to focus on the SageMaker features and not on a complex scenario. Let's see how SageMaker can accelerate your work and keep you from wasting time on tasks that aren't related to your business. \n",
24 | "\n",
25 | "SageMaker library 2.0+ is required!"
26 | ]
27 | },
28 | {
29 | "cell_type": "markdown",
30 | "metadata": {},
31 | "source": [
32 | "## Let's start by importing the dataset and visualize it"
33 | ]
34 | },
35 | {
36 | "cell_type": "code",
37 | "execution_count": null,
38 | "metadata": {
39 | "scrolled": true
40 | },
41 | "outputs": [],
42 | "source": [
43 | "%matplotlib inline\n",
44 | "\n",
45 | "import pandas as pd\n",
46 | "import numpy as np\n",
47 | "import seaborn as sns\n",
48 | "import matplotlib.pyplot as plt\n",
49 | "\n",
50 | "from sklearn import datasets\n",
51 | "sns.set(color_codes=True)\n",
52 | "\n",
53 | "iris = datasets.load_iris()\n",
54 | "\n",
55 | "X=iris.data\n",
56 | "y=iris.target\n",
57 | "\n",
58 | "dataset = np.insert(iris.data, 0, iris.target,axis=1)\n",
59 | "\n",
60 | "df = pd.DataFrame(data=dataset, columns=['iris_id'] + iris.feature_names)\n",
61 | "df['species'] = df['iris_id'].map(lambda x: 'setosa' if x == 0 else 'versicolor' if x == 1 else 'virginica')\n",
62 | "\n",
63 | "df.head()"
64 | ]
65 | },
66 | {
67 | "cell_type": "code",
68 | "execution_count": null,
69 | "metadata": {},
70 | "outputs": [],
71 | "source": [
72 | "df.describe()"
73 | ]
74 | },
75 | {
76 | "cell_type": "markdown",
77 | "metadata": {},
78 | "source": [
79 | "## Checking the class distribution"
80 | ]
81 | },
82 | {
83 | "cell_type": "code",
84 | "execution_count": null,
85 | "metadata": {},
86 | "outputs": [],
87 | "source": [
88 | "ax = df.groupby(df['species'])['species'].count().plot(kind='bar')\n",
89 | "x_offset = -0.05\n",
90 | "y_offset = 0\n",
91 | "for p in ax.patches:\n",
92 | " b = p.get_bbox()\n",
93 | " val = \"{}\".format(int(b.y1 + b.y0))\n",
94 | " ax.annotate(val, ((b.x0 + b.x1)/2 + x_offset, b.y1 + y_offset))"
95 | ]
96 | },
97 | {
98 | "cell_type": "markdown",
99 | "metadata": {},
100 | "source": [
101 | "### Correlation Matrix"
102 | ]
103 | },
104 | {
105 | "cell_type": "code",
106 | "execution_count": null,
107 | "metadata": {},
108 | "outputs": [],
109 | "source": [
110 | "corr = df.corr()\n",
111 | "\n",
112 | "f, ax = plt.subplots(figsize=(15, 8))\n",
113 | "sns.heatmap(corr, annot=True, fmt=\"f\",\n",
114 | " xticklabels=corr.columns.values,\n",
115 | " yticklabels=corr.columns.values,\n",
116 | " ax=ax)"
117 | ]
118 | },
119 | {
120 | "cell_type": "markdown",
121 | "metadata": {},
122 | "source": [
123 | "### Pairplots & histograms"
124 | ]
125 | },
126 | {
127 | "cell_type": "code",
128 | "execution_count": null,
129 | "metadata": {},
130 | "outputs": [],
131 | "source": [
132 | "sns.pairplot(df.drop(['iris_id'], axis=1), hue='species', height=2.5, diag_kind=\"kde\")"
133 | ]
134 | },
135 | {
136 | "cell_type": "markdown",
137 | "metadata": {},
138 | "source": [
139 | "### Now with linear regression"
140 | ]
141 | },
142 | {
143 | "cell_type": "code",
144 | "execution_count": null,
145 | "metadata": {},
146 | "outputs": [],
147 | "source": [
148 | "sns.pairplot(df.drop(['iris_id'], axis=1), kind=\"reg\", hue='species', height=2.5,diag_kind=\"kde\")"
149 | ]
150 | },
151 | {
152 | "cell_type": "markdown",
153 | "metadata": {},
154 | "source": [
155 |     "### Fit and plot a kernel density estimate.\n",
156 |     "We can see in this dimension an overlapping between **versicolor** and **virginica**. This is a better representation of what we identified above."
157 | ]
158 | },
159 | {
160 | "cell_type": "code",
161 | "execution_count": null,
162 | "metadata": {},
163 | "outputs": [],
164 | "source": [
165 | "tmp_df = df[(df.iris_id==0.0)]\n",
166 | "sns.kdeplot(tmp_df['petal width (cm)'], tmp_df['petal length (cm)'], bw='silverman', cmap=\"Blues\", shade=False, shade_lowest=False)\n",
167 | "\n",
168 | "tmp_df = df[(df.iris_id==1.0)]\n",
169 | "sns.kdeplot(tmp_df['petal width (cm)'], tmp_df['petal length (cm)'], bw='silverman', cmap=\"Greens\", shade=False, shade_lowest=False)\n",
170 | "\n",
171 | "tmp_df = df[(df.iris_id==2.0)]\n",
172 | "sns.kdeplot(tmp_df['petal width (cm)'], tmp_df['petal length (cm)'], bw='silverman', cmap=\"Reds\", shade=False, shade_lowest=False)\n",
173 | "\n",
174 | "plt.xlabel('species')"
175 | ]
176 | },
177 | {
178 | "cell_type": "markdown",
179 | "metadata": {},
180 | "source": [
181 | "Ok. Petal length and petal width have the highest linear correlation with our label. Also, sepal width seems to be useless, considering the linear correlation with our label.\n",
182 | "\n",
183 | "Since versicolor and virginica cannot be split linearly, we need a more versatile algorithm to create a better classifier. In this case, we'll use XGBoost, a tree ensemble that can give us a good model for predicting the flower."
184 | ]
185 | },
186 | {
187 | "cell_type": "markdown",
188 | "metadata": {},
189 | "source": [
190 | "## Ok, now let's split the dataset into training and test"
191 | ]
192 | },
193 | {
194 | "cell_type": "code",
195 | "execution_count": null,
196 | "metadata": {},
197 | "outputs": [],
198 | "source": [
199 | "from sklearn.model_selection import train_test_split\n",
200 | "\n",
201 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42, stratify=y)\n",
202 | "yX_train = np.column_stack((y_train, X_train))\n",
203 | "yX_test = np.column_stack((y_test, X_test))\n",
204 | "np.savetxt(\"iris_train.csv\", yX_train, delimiter=\",\", fmt='%0.3f')\n",
205 | "np.savetxt(\"iris_test.csv\", yX_test, delimiter=\",\", fmt='%0.3f')"
206 | ]
207 | },
208 | {
209 | "cell_type": "markdown",
210 | "metadata": {},
211 | "source": [
212 | "## Now it's time to train our model with the builtin algorithm XGBoost"
213 | ]
214 | },
215 | {
216 | "cell_type": "code",
217 | "execution_count": null,
218 | "metadata": {},
219 | "outputs": [],
220 | "source": [
221 | "import sagemaker\n",
222 | "import boto3\n",
223 | "\n",
224 | "from sagemaker import get_execution_role\n",
225 | "from sklearn.model_selection import train_test_split\n",
226 | "\n",
227 | "role = get_execution_role()\n",
228 | "\n",
229 | "prefix='mlops/iris'\n",
230 | "# Retrieve the default bucket\n",
231 | "sagemaker_session = sagemaker.Session()\n",
232 | "bucket = sagemaker_session.default_bucket()\n",
233 | "assert(sagemaker.__version__ >= \"2.0\")"
234 | ]
235 | },
236 | {
237 | "cell_type": "markdown",
238 | "metadata": {},
239 | "source": [
240 | "#### Ok. Let's continue, upload the dataset and train the model"
241 | ]
242 | },
243 | {
244 | "cell_type": "code",
245 | "execution_count": null,
246 | "metadata": {},
247 | "outputs": [],
248 | "source": [
249 | "# Upload the dataset to an S3 bucket\n",
250 | "input_train = sagemaker_session.upload_data(path='iris_train.csv', key_prefix='%s/data' % prefix)\n",
251 | "input_test = sagemaker_session.upload_data(path='iris_test.csv', key_prefix='%s/data' % prefix)"
252 | ]
253 | },
254 | {
255 | "cell_type": "code",
256 | "execution_count": null,
257 | "metadata": {},
258 | "outputs": [],
259 | "source": [
260 | "train_data = sagemaker.inputs.TrainingInput(s3_data=input_train,content_type=\"csv\")\n",
261 | "test_data = sagemaker.inputs.TrainingInput(s3_data=input_test,content_type=\"csv\")"
262 | ]
263 | },
264 | {
265 | "cell_type": "code",
266 | "execution_count": null,
267 | "metadata": {},
268 | "outputs": [],
269 | "source": [
270 | "# get the URI for new container\n",
271 | "container_uri = sagemaker.image_uris.retrieve('xgboost', boto3.Session().region_name, version='1.0-1')\n",
272 | "\n",
273 | "# Create the estimator\n",
274 | "xgb = sagemaker.estimator.Estimator(container_uri,\n",
275 | " role, \n",
276 | " instance_count=1, \n",
277 | " instance_type='ml.m4.xlarge',\n",
278 | " output_path='s3://{}/{}/output'.format(bucket, prefix),\n",
279 | " sagemaker_session=sagemaker_session)\n",
280 | "# Set the hyperparameters\n",
281 | "xgb.set_hyperparameters(eta=0.1,\n",
282 | " max_depth=10,\n",
283 | " gamma=4,\n",
284 | " num_class=len(np.unique(y)),\n",
285 | " alpha=10,\n",
286 | " min_child_weight=6,\n",
287 | " silent=0,\n",
288 | " objective='multi:softmax',\n",
289 | " num_round=30)"
290 | ]
291 | },
292 | {
293 | "cell_type": "markdown",
294 | "metadata": {},
295 | "source": [
296 | "### Train the model"
297 | ]
298 | },
299 | {
300 | "cell_type": "code",
301 | "execution_count": null,
302 | "metadata": {},
303 | "outputs": [],
304 | "source": [
305 | "%%time\n",
306 |     "# training takes around 3 minutes\n",
307 |     "xgb.fit({'train': train_data, 'validation': test_data})"
308 | ]
309 | },
310 | {
311 | "cell_type": "markdown",
312 | "metadata": {},
313 | "source": [
314 | "### Deploy the model and create an endpoint for it\n",
315 | "The following action will:\n",
316 |     " * take the assets from the training job we just ran and register a model in the Models Catalog\n",
317 |     " * create an endpoint configuration (metadata for our final endpoint)\n",
318 |     " * create an endpoint, which is our model exposed as a web service\n",
319 |     " \n",
320 |     "After that, we'll be able to call the deployed endpoint to get predictions"
321 | ]
322 | },
323 | {
324 | "cell_type": "code",
325 | "execution_count": null,
326 | "metadata": {},
327 | "outputs": [],
328 | "source": [
329 | "%%time\n",
330 | "# Enable log capturing in the endpoint\n",
331 | "data_capture_configuration = sagemaker.model_monitor.data_capture_config.DataCaptureConfig(\n",
332 | " enable_capture=True, \n",
333 | " sampling_percentage=100, \n",
334 | " destination_s3_uri='s3://{}/{}/monitoring'.format(bucket, prefix), \n",
335 | " sagemaker_session=sagemaker_session\n",
336 | ")\n",
337 | "xgb_predictor = xgb.deploy(\n",
338 | " initial_instance_count=1, \n",
339 | " instance_type='ml.m4.xlarge',\n",
340 | " data_capture_config=data_capture_configuration\n",
341 | ")"
342 | ]
343 | },
344 | {
345 | "cell_type": "markdown",
346 | "metadata": {},
347 | "source": [
348 |     "### Alright, now that we have deployed the endpoint with data capture enabled, it's time to set up the monitor\n",
349 | "Let's start by configuring our predictor"
350 | ]
351 | },
352 | {
353 | "cell_type": "code",
354 | "execution_count": null,
355 | "metadata": {},
356 | "outputs": [],
357 | "source": [
358 | "from sagemaker.serializers import CSVSerializer\n",
359 | "from sklearn.metrics import f1_score\n",
360 | "csv_serializer = CSVSerializer()\n",
361 | "\n",
362 | "endpoint_name = xgb_predictor.endpoint_name\n",
363 | "model_name = boto3.client('sagemaker').describe_endpoint_config(\n",
364 | " EndpointConfigName=endpoint_name\n",
365 | ")['ProductionVariants'][0]['ModelName']\n",
366 | "!echo $model_name > model_name.txt\n",
367 | "!echo $endpoint_name > endpoint_name.txt\n",
368 | "xgb_predictor.serializer = csv_serializer"
369 | ]
370 | },
371 | {
372 | "cell_type": "markdown",
373 | "metadata": {},
374 | "source": [
375 | "## Now, let's do a basic test with the deployed endpoint\n",
376 |     "In this test, we'll use a helper object called a predictor, which is always returned by a **deploy** call. The predictor is just for testing purposes; we won't use it inside our real application."
377 | ]
378 | },
379 | {
380 | "cell_type": "code",
381 | "execution_count": null,
382 | "metadata": {},
383 | "outputs": [],
384 | "source": [
385 | "predictions_test = [ float(xgb_predictor.predict(x).decode('utf-8')) for x in X_test] \n",
386 | "score = f1_score(y_test,predictions_test,labels=[0.0,1.0,2.0],average='micro')\n",
387 | "\n",
388 | "print('F1 Score(micro): %.1f' % (score * 100.0))"
389 | ]
390 | },
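{
"cell_type": "markdown",
"metadata": {},
"source": [
"A side note on the metric: with `average='micro'` and single-label multiclass predictions, the F1 score reduces to plain accuracy. Below is a dependency-free sketch (a hypothetical helper, for sanity-checking only, not part of scikit-learn):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch: micro-averaged F1 equals accuracy for\n",
"# single-label multiclass predictions\n",
"def micro_f1(y_true, y_pred):\n",
"    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)\n",
"    return correct / len(y_true)\n",
"\n",
"micro_f1([0, 1, 2, 2], [0, 1, 1, 2])  # -> 0.75"
]
},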
391 | {
392 | "cell_type": "markdown",
393 | "metadata": {},
394 | "source": [
395 | "## Then, let's test the API for our trained model\n",
396 |     "This is how your application will call the endpoint: use boto3 to get a SageMaker runtime client and then call invoke_endpoint."
397 | ]
398 | },
399 | {
400 | "cell_type": "code",
401 | "execution_count": null,
402 | "metadata": {},
403 | "outputs": [],
404 | "source": [
405 | "from sagemaker.serializers import CSVSerializer\n",
406 | "csv_serializer = CSVSerializer()\n",
407 | "\n",
408 | "sm = boto3.client('sagemaker-runtime')\n",
409 | "resp = sm.invoke_endpoint(\n",
410 | " EndpointName=endpoint_name,\n",
411 | " ContentType='text/csv',\n",
412 | " Body=csv_serializer.serialize(X_test[0])\n",
413 | ")\n",
414 | "prediction = float(resp['Body'].read().decode('utf-8'))\n",
415 | "print('Predicted class: %.1f for [%s]' % (prediction, csv_serializer.serialize(X_test[0])) )"
416 | ]
417 | },
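{
"cell_type": "markdown",
"metadata": {},
"source": [
"The request body above is plain CSV text. As a rough mental model (an illustrative assumption about the wire format, not the SDK's exact code), the serializer joins the feature values of a 1-D sample with commas:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical helper illustrating the CSV payload the endpoint expects:\n",
"# one record of comma-separated feature values, no header\n",
"def to_csv_row(sample):\n",
"    return ','.join(str(v) for v in sample)\n",
"\n",
"to_csv_row([5.1, 3.5, 1.4, 0.2])  # -> '5.1,3.5,1.4,0.2'"
]
},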
418 | {
419 | "cell_type": "markdown",
420 | "metadata": {},
421 | "source": [
422 |     "## Alright, now that you know how to train/deploy/test a model, let's optimize it\n",
423 | "\n",
424 | "Click [here to start the Part 2/4](02_BasicModel_Part2_HyperparameterOptimization.ipynb) of this warmup: Hyperparameter Optimization"
425 | ]
426 | },
427 | {
428 | "cell_type": "markdown",
429 | "metadata": {},
430 | "source": [
431 | "## Cleaning up (Attention! Read the message before deleting the Endpoint)\n",
432 |     "Only run the next cell if you will **NOT** continue with the next part of the WarmUp. If you decide to continue, please click the link above instead."
433 | ]
434 | },
435 | {
436 | "cell_type": "code",
437 | "execution_count": null,
438 | "metadata": {},
439 | "outputs": [],
440 | "source": [
441 | "xgb_predictor.delete_endpoint()"
442 | ]
443 | },
444 | {
445 | "cell_type": "markdown",
446 | "metadata": {},
447 | "source": [
448 | "# The end"
449 | ]
450 | }
451 | ],
452 | "metadata": {
453 | "kernelspec": {
454 | "display_name": "conda_python3",
455 | "language": "python",
456 | "name": "conda_python3"
457 | },
458 | "language_info": {
459 | "codemirror_mode": {
460 | "name": "ipython",
461 | "version": 3
462 | },
463 | "file_extension": ".py",
464 | "mimetype": "text/x-python",
465 | "name": "python",
466 | "nbconvert_exporter": "python",
467 | "pygments_lexer": "ipython3",
468 | "version": "3.6.10"
469 | }
470 | },
471 | "nbformat": 4,
472 | "nbformat_minor": 2
473 | }
474 |
--------------------------------------------------------------------------------
/lab/00_Warmup/02_BasicModel_Part2_HyperparameterOptimization.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Part 2/4 - Optimizing the model\n",
8 | "\n",
9 |     "Now that you know how to train an ML model using SageMaker, it's time to optimize it using [Automatic Model Tuning](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning.html), also known as hyperparameter optimization. This powerful technique explores the space of possible values for the selected hyperparameters and tries to find the best combination based on a given metric. You also choose the strategy to apply to that metric: if the objective metric is **Accuracy**, select the **Maximize** strategy; if it is **Error**, select **Minimize**.\n",
10 | "\n",
11 | "SageMaker library 2.0+ is required!"
12 | ]
13 | },
14 | {
15 | "cell_type": "markdown",
16 | "metadata": {},
17 | "source": [
18 | "## Let's start by recreating the estimator"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": null,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "import sagemaker\n",
28 | "import boto3\n",
29 | "import numpy as np\n",
30 | "\n",
31 | "from sagemaker import get_execution_role\n",
32 | "from sklearn.model_selection import train_test_split\n",
33 | "from sklearn import datasets\n",
34 | "\n",
35 | "role = get_execution_role()\n",
36 | "\n",
37 | "prefix='mlops/iris'\n",
38 | "# Retrieve the default bucket\n",
39 | "sagemaker_session = sagemaker.Session()\n",
40 | "bucket = sagemaker_session.default_bucket()\n",
41 | "assert(sagemaker.__version__ >= \"2.0\")"
42 | ]
43 | },
44 | {
45 | "cell_type": "markdown",
46 | "metadata": {},
47 | "source": [
48 | "### Preparing the dataset and uploading it"
49 | ]
50 | },
51 | {
52 | "cell_type": "code",
53 | "execution_count": null,
54 | "metadata": {},
55 | "outputs": [],
56 | "source": [
57 | "iris = datasets.load_iris()\n",
58 | "X=iris.data\n",
59 | "y=iris.target\n",
60 | "\n",
61 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42, stratify=y)\n",
62 | "yX_train = np.column_stack((y_train, X_train))\n",
63 | "yX_test = np.column_stack((y_test, X_test))\n",
64 | "np.savetxt(\"iris_train.csv\", yX_train, delimiter=\",\", fmt='%0.3f')\n",
65 | "np.savetxt(\"iris_test.csv\", yX_test, delimiter=\",\", fmt='%0.3f')\n",
66 | "\n",
67 | "# Upload the dataset to an S3 bucket\n",
68 | "input_train = sagemaker_session.upload_data(path='iris_train.csv', key_prefix='%s/data' % prefix)\n",
69 | "input_test = sagemaker_session.upload_data(path='iris_test.csv', key_prefix='%s/data' % prefix)\n",
70 | "\n",
71 | "train_data = sagemaker.inputs.TrainingInput(s3_data=input_train,content_type=\"csv\")\n",
72 | "test_data = sagemaker.inputs.TrainingInput(s3_data=input_test,content_type=\"csv\")"
73 | ]
74 | },
75 | {
76 | "cell_type": "code",
77 | "execution_count": null,
78 | "metadata": {},
79 | "outputs": [],
80 | "source": [
81 | "# get the URI for new container\n",
82 | "container_uri = sagemaker.image_uris.retrieve('xgboost', boto3.Session().region_name, version='1.0-1')\n",
83 | "\n",
84 | "# Create the estimator\n",
85 | "xgb = sagemaker.estimator.Estimator(container_uri,\n",
86 | " role, \n",
87 | " instance_count=1, \n",
88 | " instance_type='ml.m4.xlarge',\n",
89 | " output_path='s3://{}/{}/output'.format(bucket, prefix),\n",
90 | " sagemaker_session=sagemaker_session)\n",
91 | "# Set the hyperparameters\n",
92 | "xgb.set_hyperparameters(num_class=len(np.unique(y)),\n",
93 | " silent=0,\n",
94 | " objective='multi:softmax',\n",
95 | " num_round=30)"
96 | ]
97 | },
98 | {
99 | "cell_type": "markdown",
100 | "metadata": {},
101 | "source": [
102 | "## Hyperparameter Tuning Jobs\n",
103 | "#### A.K.A. Hyperparameter Optimization\n",
104 | "\n",
105 | "We know that the iris dataset is an easy challenge. We can achieve a better score with XGBoost. However, we don't want to waste time testing all the possible variations of the hyperparameters in order to optimize the training process.\n",
106 | "\n",
107 |     "Instead, we'll use SageMaker's tuning feature. We'll keep the same estimator, but create a Tuner and ask it to optimize the model for us."
108 | ]
109 | },
110 | {
111 | "cell_type": "code",
112 | "execution_count": null,
113 | "metadata": {},
114 | "outputs": [],
115 | "source": [
116 | "from sagemaker.tuner import IntegerParameter, CategoricalParameter, ContinuousParameter, HyperparameterTuner\n",
117 | "\n",
118 | "hyperparameter_ranges = {'eta': ContinuousParameter(0, 1),\n",
119 | " 'min_child_weight': ContinuousParameter(1, 10),\n",
120 | " 'alpha': ContinuousParameter(0, 2),\n",
121 | " 'gamma': ContinuousParameter(0, 10),\n",
122 | " 'max_depth': IntegerParameter(1, 10)}\n",
123 | "\n",
124 | "objective_metric_name = 'validation:merror'\n",
125 | "\n",
126 | "tuner = HyperparameterTuner(xgb,\n",
127 | " objective_metric_name,\n",
128 | " hyperparameter_ranges,\n",
129 | " max_jobs=20,\n",
130 | " max_parallel_jobs=4,\n",
131 | " objective_type='Minimize')\n",
132 | "\n",
133 |     "tuner.fit({'train': train_data, 'validation': test_data})\n",
134 | "tuner.wait()"
135 | ]
136 | },
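{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition only: the tuner proposes candidate configurations drawn from the ranges defined above (by default using Bayesian optimization). The toy sketch below randomly samples that search space; it is purely illustrative and is not SageMaker's actual search strategy:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"# Illustrative sketch: draw one candidate from the same ranges\n",
"# passed to HyperparameterTuner above\n",
"def sample_config(rng=random):\n",
"    return {\n",
"        'eta': rng.uniform(0, 1),\n",
"        'min_child_weight': rng.uniform(1, 10),\n",
"        'alpha': rng.uniform(0, 2),\n",
"        'gamma': rng.uniform(0, 10),\n",
"        'max_depth': rng.randint(1, 10),\n",
"    }\n",
"\n",
"sample_config()"
]
},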
137 | {
138 | "cell_type": "code",
139 | "execution_count": null,
140 | "metadata": {},
141 | "outputs": [],
142 | "source": [
143 | "job_name = tuner.latest_tuning_job.name\n",
144 | "attached_tuner = HyperparameterTuner.attach(job_name)\n",
145 | "xgb_predictor = attached_tuner.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')"
146 | ]
147 | },
148 | {
149 | "cell_type": "code",
150 | "execution_count": null,
151 | "metadata": {},
152 | "outputs": [],
153 | "source": [
154 | "endpoint_name = xgb_predictor.endpoint_name\n",
155 | "model_name = boto3.client('sagemaker').describe_endpoint_config(\n",
156 | " EndpointConfigName=endpoint_name\n",
157 | ")['ProductionVariants'][0]['ModelName']\n",
158 | "!echo $model_name > model_name.txt\n",
159 | "!echo $endpoint_name > endpoint_name2.txt"
160 | ]
161 | },
162 | {
163 | "cell_type": "markdown",
164 | "metadata": {},
165 | "source": [
166 | "## A simple test before we move on"
167 | ]
168 | },
169 | {
170 | "cell_type": "code",
171 | "execution_count": null,
172 | "metadata": {},
173 | "outputs": [],
174 | "source": [
175 | "from sagemaker.serializers import CSVSerializer\n",
176 | "from sklearn.metrics import f1_score\n",
177 | "csv_serializer = CSVSerializer()\n",
178 | "xgb_predictor.serializer = csv_serializer"
179 | ]
180 | },
181 | {
182 | "cell_type": "code",
183 | "execution_count": null,
184 | "metadata": {},
185 | "outputs": [],
186 | "source": [
187 | "predictions_test = [ float(xgb_predictor.predict(x).decode('utf-8')) for x in X_test] \n",
188 | "score = f1_score(y_test,predictions_test,labels=[0.0,1.0,2.0],average='micro')\n",
189 | "\n",
190 | "print('F1 Score(micro): %.1f' % (score * 100.0))"
191 | ]
192 | },
193 | {
194 | "cell_type": "markdown",
195 | "metadata": {},
196 | "source": [
197 |     "## Alright, now that you know how to optimize a model, let's run a batch prediction\n",
198 | "\n",
199 | "Click [here to start the Part 3/4](03_BasicModel_Part3_BatchPrediction.ipynb) of this warmup: Batch Prediction"
200 | ]
201 | },
202 | {
203 | "cell_type": "markdown",
204 | "metadata": {},
205 | "source": [
206 | "## Cleaning up (Attention! Read the message before deleting the Endpoint)\n",
207 |     "Only run the next cell if you will **NOT** continue with the next part of the WarmUp. If you decide to continue, please click the link above instead."
208 | ]
209 | },
210 | {
211 | "cell_type": "code",
212 | "execution_count": null,
213 | "metadata": {},
214 | "outputs": [],
215 | "source": [
216 | "xgb_predictor.delete_endpoint()"
217 | ]
218 | },
219 | {
220 | "cell_type": "markdown",
221 | "metadata": {},
222 | "source": [
223 | "# The end"
224 | ]
225 | }
226 | ],
227 | "metadata": {
228 | "kernelspec": {
229 | "display_name": "conda_python3",
230 | "language": "python",
231 | "name": "conda_python3"
232 | },
233 | "language_info": {
234 | "codemirror_mode": {
235 | "name": "ipython",
236 | "version": 3
237 | },
238 | "file_extension": ".py",
239 | "mimetype": "text/x-python",
240 | "name": "python",
241 | "nbconvert_exporter": "python",
242 | "pygments_lexer": "ipython3",
243 | "version": "3.6.10"
244 | }
245 | },
246 | "nbformat": 4,
247 | "nbformat_minor": 2
248 | }
249 |
--------------------------------------------------------------------------------
/lab/00_Warmup/03_BasicModel_Part3_BatchPrediction.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Part 3/4 - Batch Prediction"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "## Batch transform job\n",
15 |     "If you have a file with the samples you want to predict, just upload it to an S3 bucket and start a Batch Transform job. For this task, you don't need to deploy an endpoint. SageMaker will create all the resources needed for the batch prediction, save the results into an S3 bucket, and then destroy those resources automatically for you."
16 | ]
17 | },
18 | {
19 | "cell_type": "code",
20 | "execution_count": null,
21 | "metadata": {},
22 | "outputs": [],
23 | "source": [
24 | "import sagemaker\n",
25 | "import numpy as np\n",
26 | "import boto3\n",
27 | "import os\n",
28 | "import pandas as pd\n",
29 | "import io\n",
30 | "from sklearn import datasets\n",
31 | "from sagemaker import get_execution_role\n",
32 | "\n",
33 | "role = get_execution_role()\n",
34 | "\n",
35 | "iris = datasets.load_iris()\n",
36 | "prefix='mlops/iris'\n",
37 | "sagemaker_session = sagemaker.Session()\n",
38 | "bucket = sagemaker_session.default_bucket()\n",
39 | "model_name = open('model_name.txt', 'r').read().strip() if os.path.isfile('model_name.txt') else None\n",
40 | "if model_name is None:\n",
41 | " raise Exception(\"You must run Part 1 or 2 before this. There you will train a Model to use here\")"
42 | ]
43 | },
44 | {
45 | "cell_type": "code",
46 | "execution_count": null,
47 | "metadata": {},
48 | "outputs": [],
49 | "source": [
50 | "batch_dataset_filename=\"batch_dataset.csv\"\n",
51 | "np.savetxt(batch_dataset_filename, iris.data, delimiter=\",\", fmt='%0.3f')\n",
52 | "input_batch = sagemaker_session.upload_data(path=batch_dataset_filename, key_prefix='%s/data' % prefix)"
53 | ]
54 | },
55 | {
56 | "cell_type": "code",
57 | "execution_count": null,
58 | "metadata": {},
59 | "outputs": [],
60 | "source": [
61 | "# Initialize the transformer object\n",
62 | "transformer=sagemaker.transformer.Transformer(\n",
63 | " base_transform_job_name='mlops-iris',\n",
64 | " model_name=model_name,\n",
65 | " instance_count=1,\n",
66 | " instance_type='ml.c4.xlarge',\n",
67 | " output_path='s3://{}/{}/batch_output'.format(bucket, prefix),\n",
68 | ")\n",
69 | "# To start a transform job:\n",
70 | "transformer.transform(input_batch, content_type='text/csv', split_type='Line')\n",
71 | "# Then wait until transform job is completed\n",
72 | "transformer.wait()"
73 | ]
74 | },
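{
"cell_type": "markdown",
"metadata": {},
"source": [
"With `split_type='Line'`, each line of the uploaded CSV becomes one independent record, and the job writes one prediction per line to a file named after the input with a `.out` suffix. A tiny sketch (hypothetical helper) of that record splitting:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch of split_type='Line': one record per non-empty line\n",
"def split_records(payload):\n",
"    return [line for line in payload.splitlines() if line]\n",
"\n",
"split_records('5.1,3.5,1.4,0.2\\n4.9,3.0,1.4,0.2\\n')\n",
"# -> ['5.1,3.5,1.4,0.2', '4.9,3.0,1.4,0.2']"
]
},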
75 | {
76 | "cell_type": "code",
77 | "execution_count": null,
78 | "metadata": {},
79 | "outputs": [],
80 | "source": [
81 | "dataset = np.insert(iris.data, 0, iris.target,axis=1)\n",
82 | "df = pd.DataFrame(data=dataset, columns=['iris_id'] + iris.feature_names)\n",
83 | "df_pred = pd.read_csv(\n",
84 | " io.StringIO(sagemaker_session.read_s3_file(bucket, '{}/batch_output/{}.out'.format(\n",
85 | " prefix, batch_dataset_filename))), names=['predicted_iris_id'] )\n",
86 | "df = pd.merge(df,df_pred, left_index=True, right_index=True)\n",
87 | "df.head()"
88 | ]
89 | },
90 | {
91 | "cell_type": "code",
92 | "execution_count": null,
93 | "metadata": {},
94 | "outputs": [],
95 | "source": [
96 | "from sklearn.metrics import f1_score\n",
97 | "score = f1_score(df['iris_id'], df['predicted_iris_id'],labels=[0.0,1.0,2.0],average='micro')\n",
98 | "\n",
99 | "print('F1 Score(micro): %.1f' % (score * 100.0))"
100 | ]
101 | },
102 | {
103 | "cell_type": "code",
104 | "execution_count": null,
105 | "metadata": {},
106 | "outputs": [],
107 | "source": [
108 | "%matplotlib inline\n",
109 | "import seaborn as sns\n",
110 | "import matplotlib.pyplot as plt\n",
111 | "\n",
112 | "from sklearn.metrics import confusion_matrix\n",
113 | "cnf_matrix = confusion_matrix(df['iris_id'], df['predicted_iris_id'])\n",
114 | "\n",
115 | "f, ax = plt.subplots(figsize=(15, 8))\n",
116 |     "sns.heatmap(cnf_matrix, annot=True, fmt=\"f\", mask=np.zeros_like(cnf_matrix, dtype=bool), \n",
117 | " cmap=sns.diverging_palette(220, 10, as_cmap=True),\n",
118 | " square=True, ax=ax)"
119 | ]
120 | },
121 | {
122 | "cell_type": "markdown",
123 | "metadata": {},
124 | "source": [
125 |     "## Alright, now that you know how to run a batch prediction, let's monitor the model\n",
126 | "\n",
127 | "Click [here to start the Part 4/4](04_BasicModel_Part4_ModelMonitoring.ipynb) of this warmup: Model Monitoring"
128 | ]
129 | },
130 | {
131 | "cell_type": "markdown",
132 | "metadata": {},
133 | "source": [
134 | "# The end"
135 | ]
136 | }
137 | ],
138 | "metadata": {
139 | "kernelspec": {
140 | "display_name": "conda_python3",
141 | "language": "python",
142 | "name": "conda_python3"
143 | },
144 | "language_info": {
145 | "codemirror_mode": {
146 | "name": "ipython",
147 | "version": 3
148 | },
149 | "file_extension": ".py",
150 | "mimetype": "text/x-python",
151 | "name": "python",
152 | "nbconvert_exporter": "python",
153 | "pygments_lexer": "ipython3",
154 | "version": "3.6.10"
155 | }
156 | },
157 | "nbformat": 4,
158 | "nbformat_minor": 2
159 | }
160 |
--------------------------------------------------------------------------------
/lab/00_Warmup/04_BasicModel_Part4_ModelMonitoring.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 |     "# Part 4/4 - Enabling Monitoring and checking violations\n",
8 | "\n",
9 |     "In this exercise, we will enable [Model Monitoring](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor.html) for the endpoint we created in the first part. This is an important feature if you need to keep an eye on your model's performance. You can check for violations such as invalid inputs, data drift, and more.\n",
10 |     "\n",
11 |     "You start this process by preparing a baseline: a reference set of statistics that Model Monitor uses to compare against the payload and identify anomalies.\n",
12 |     "\n",
13 |     "A cron-like job called a Monitoring Schedule then keeps processing the captured logs (inputs and outputs) to create the reports you can inspect."
14 | ]
15 | },
16 | {
17 | "cell_type": "code",
18 | "execution_count": null,
19 | "metadata": {},
20 | "outputs": [],
21 | "source": [
22 | "import sagemaker\n",
23 | "import numpy as np\n",
24 | "import boto3\n",
25 | "import os\n",
26 | "import pandas as pd\n",
27 | "from sklearn import datasets\n",
28 | "from sagemaker import get_execution_role\n",
29 | "from sagemaker.serializers import CSVSerializer\n",
30 | "\n",
31 | "role = get_execution_role()\n",
32 | "iris = datasets.load_iris()\n",
33 | "X = iris.data\n",
34 | "y = iris.target\n",
35 | "\n",
36 | "dataset = np.insert(X, 0, y,axis=1)\n",
37 | "pd.DataFrame(data=dataset, columns=['iris_id'] + iris.feature_names).to_csv('full_dataset.csv', index=None)\n",
38 | "\n",
39 | "sagemaker_session = sagemaker.Session()\n",
40 | "bucket = sagemaker_session.default_bucket()\n",
41 | "\n",
42 | "prefix='mlops/iris'\n",
43 | "endpoint_name = open('endpoint_name.txt', 'r').read().strip() if os.path.isfile('endpoint_name.txt') else None\n",
44 | "endpoint_name2 = open('endpoint_name2.txt', 'r').read().strip() if os.path.isfile('endpoint_name2.txt') else None\n",
45 | "\n",
46 | "try:\n",
47 | " xgb_predictor = sagemaker.predictor.Predictor(endpoint_name=endpoint_name, sagemaker_session=sagemaker_session)\n",
48 | " xgb_predictor.serializer = CSVSerializer()\n",
49 | "except Exception as e:\n",
50 | " raise Exception(\"You must run Part 1 before this. There, you will train/deploy a Model and use it here\")"
51 | ]
52 | },
53 | {
54 | "cell_type": "code",
55 | "execution_count": null,
56 | "metadata": {},
57 | "outputs": [],
58 | "source": [
59 | "from sagemaker.model_monitor import DefaultModelMonitor\n",
60 | "from sagemaker.model_monitor.dataset_format import DatasetFormat\n",
61 | "\n",
62 | "endpoint_monitor = DefaultModelMonitor(\n",
63 | " role=role,\n",
64 | " instance_count=1,\n",
65 | " instance_type='ml.m5.xlarge',\n",
66 | " volume_size_in_gb=20,\n",
67 | " max_runtime_in_seconds=3600,\n",
68 | ")\n",
69 | "endpoint_monitor.suggest_baseline(\n",
70 | " baseline_dataset='full_dataset.csv',\n",
71 | " dataset_format=DatasetFormat.csv(header=True),\n",
72 | " output_s3_uri='s3://{}/{}/monitoring/baseline'.format(bucket, prefix),\n",
73 | " wait=True,\n",
74 | " logs=False\n",
75 | ")"
76 | ]
77 | },
78 | {
79 | "cell_type": "markdown",
80 | "metadata": {},
81 | "source": [
82 |     "### Take a look at the baseline suggested by SageMaker for your dataset\n",
83 | "This set of statistics and constraints will be used by the Monitoring Scheduler to compare the incoming data with what is considered **normal**. Each invalid payload sent to the endpoint will be considered a violation."
84 | ]
85 | },
86 | {
87 | "cell_type": "code",
88 | "execution_count": null,
89 | "metadata": {},
90 | "outputs": [],
91 | "source": [
92 | "baseline_job = endpoint_monitor.latest_baselining_job\n",
93 | "schema_df = pd.json_normalize(baseline_job.baseline_statistics().body_dict[\"features\"])\n",
94 | "constraints_df = pd.json_normalize(baseline_job.suggested_constraints().body_dict[\"features\"])\n",
95 | "report_df = schema_df.merge(constraints_df)\n",
96 | "report_df.drop([\n",
97 | " 'numerical_statistics.distribution.kll.buckets',\n",
98 | " 'numerical_statistics.distribution.kll.sketch.data',\n",
99 | " 'numerical_statistics.distribution.kll.sketch.parameters.c'\n",
100 | "], axis=1).head(10)"
101 | ]
102 | },
103 | {
104 | "cell_type": "markdown",
105 | "metadata": {},
106 | "source": [
107 |     "Next, we need to create a **Monitoring Schedule** for our endpoint. The command below creates a cron schedule that processes the captured logs every hour, so we can see how well our model is doing."
108 | ]
109 | },
110 | {
111 | "cell_type": "code",
112 | "execution_count": null,
113 | "metadata": {},
114 | "outputs": [],
115 | "source": [
116 | "from sagemaker.model_monitor import CronExpressionGenerator\n",
117 | "from time import gmtime, strftime\n",
118 | "\n",
119 | "endpoint_monitor.create_monitoring_schedule(\n",
120 | " endpoint_input=endpoint_name,\n",
121 | " output_s3_uri='s3://{}/{}/monitoring/reports'.format(bucket, prefix),\n",
122 | " statistics=endpoint_monitor.baseline_statistics(),\n",
123 | " constraints=endpoint_monitor.suggested_constraints(),\n",
124 | " schedule_cron_expression=CronExpressionGenerator.hourly(),\n",
125 | " enable_cloudwatch_metrics=True,\n",
126 | ")"
127 | ]
128 | },
129 | {
130 | "cell_type": "code",
131 | "execution_count": null,
132 | "metadata": {},
133 | "outputs": [],
134 | "source": [
135 | "# This is how you can list all the monitoring schedules you created in your account\n",
136 | "!aws sagemaker list-monitoring-schedules"
137 | ]
138 | },
139 | {
140 | "cell_type": "markdown",
141 | "metadata": {},
142 | "source": [
143 | "### Start generating some artificial traffic\n",
144 |     "The cell below starts a thread that sends some traffic to the endpoint. Note that you need to set `traffic_generator_running` to `False` (as done in the cleanup section) or stop the kernel to terminate this thread. If there is no traffic, the monitoring jobs are marked as `Failed`, since there is no data to process."
145 | ]
146 | },
147 | {
148 | "cell_type": "code",
149 | "execution_count": null,
150 | "metadata": {},
151 | "outputs": [],
152 | "source": [
153 | "import random\n",
154 | "import time \n",
155 | "from threading import Thread\n",
156 | "\n",
157 | "traffic_generator_running=True\n",
158 | "def invoke_endpoint_forever():\n",
159 | " print('Invoking endpoint forever!')\n",
160 | " while traffic_generator_running:\n",
161 | " ## This will create an invalid set of features\n",
162 |     "        ## The idea is to violate two monitoring constraints: not_null and data_drift\n",
163 | " null_idx = random.randint(0,3)\n",
164 | " sample = [random.randint(500,2000) / 100.0 for i in range(4)]\n",
165 | " sample[null_idx] = None\n",
166 | " xgb_predictor.predict(sample)\n",
167 | " time.sleep(0.5)\n",
168 | " print('Endpoint invoker has stopped')\n",
169 | "Thread(target = invoke_endpoint_forever).start()"
170 | ]
171 | },
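{
"cell_type": "markdown",
"metadata": {},
"source": [
"The payloads above are invalid in two ways: the values fall far outside the baseline feature ranges (the iris measurements are roughly 0 to 8 cm), and one feature is nulled. The same sampling logic, factored into a hypothetical helper that can be tested without an endpoint:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"\n",
"# Illustrative sketch: four out-of-range features (5.0 to 20.0)\n",
"# with one field nulled, to trigger data_drift and not_null violations\n",
"def make_invalid_sample(rng=random):\n",
"    sample = [rng.randint(500, 2000) / 100.0 for _ in range(4)]\n",
"    sample[rng.randint(0, 3)] = None\n",
"    return sample\n",
"\n",
"make_invalid_sample()"
]
},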
172 | {
173 | "cell_type": "markdown",
174 | "metadata": {},
175 | "source": [
176 | "## Kick off a processing job to analyze the logs\n",
177 |     "Since the Monitoring Scheduler only runs at the top of each hour and we don't want to wait that long, let's kick off a processing job manually (simulating what the scheduler does) and see the report with the violations caused by the random payloads the thread above sends to the endpoint."
178 | ]
179 | },
180 | {
181 | "cell_type": "code",
182 | "execution_count": null,
183 | "metadata": {},
184 | "outputs": [],
185 | "source": [
186 | "import time\n",
187 | "import datetime\n",
188 | "import boto3\n",
189 | "\n",
190 | "def process_monitoring_logs(endpoint_monitor):\n",
191 | " sm = boto3.client('sagemaker')\n",
192 | " now = datetime.datetime.today()\n",
193 | " suffix = now.strftime(\"%Y/%m/%d/%H\")\n",
194 | " start_time = datetime.datetime(now.year, now.month, now.day, now.hour)\n",
195 | " end_time = start_time + datetime.timedelta(hours=1)\n",
196 | "\n",
197 | " # get the monitoring metadata\n",
198 | " base_desc = endpoint_monitor.describe_latest_baselining_job()\n",
199 | " sche_desc = endpoint_monitor.describe_schedule()\n",
200 | " baseline_path = base_desc['ProcessingOutputConfig']['Outputs'][0]['S3Output']['S3Uri']\n",
201 | " endpoint_name = sche_desc['EndpointName']\n",
202 | "\n",
203 | " variant_name = sm.describe_endpoint(EndpointName=endpoint_name)['ProductionVariants'][0]['VariantName']\n",
204 | " logs_path = \"%s/%s/%s\" % (endpoint_name,variant_name,suffix)\n",
205 | " \n",
206 | " s3_output = {\n",
207 | " \"S3Uri\": 's3://{}/{}/monitoring/{}'.format(bucket, prefix, logs_path),\n",
208 | " \"LocalPath\": \"/opt/ml/processing/output\",\n",
209 | " \"S3UploadMode\": \"Continuous\"\n",
210 | " }\n",
211 | " # values for the processing job input\n",
212 | " values = [\n",
213 | " [ 'input_1', 's3://{}/{}/monitoring/{}'.format(bucket, prefix, logs_path),\n",
214 | " '/opt/ml/processing/input/endpoint/{}'.format(logs_path) ], \n",
215 | " [ 'baseline', '%s/statistics.json' % baseline_path,\n",
216 | " '/opt/ml/processing/baseline/stats'],\n",
217 | " [ 'constraints', '%s/constraints.json' % baseline_path,\n",
218 | " '/opt/ml/processing/baseline/constraints']\n",
219 | " ]\n",
220 | " job_params = {\n",
221 | " 'ProcessingJobName': 'model-monitoring-%s' % time.strftime(\"%Y%m%d%H%M%S\"),\n",
222 | " 'ProcessingInputs': [{\n",
223 | " 'InputName': o[0],\n",
224 | " 'S3Input': { \n",
225 | " 'S3Uri': o[1], 'LocalPath': o[2], 'S3DataType': 'S3Prefix', 'S3InputMode': 'File', \n",
226 | " 'S3CompressionType': 'None', 'S3DataDistributionType': 'FullyReplicated'\n",
227 | " }} for o in values],\n",
228 | " 'ProcessingOutputConfig': { 'Outputs': [ {'OutputName': 'result','S3Output': s3_output } ] },\n",
229 | " 'ProcessingResources': base_desc['ProcessingResources'],\n",
230 | " 'AppSpecification': base_desc['AppSpecification'],\n",
231 | " 'RoleArn': base_desc['RoleArn'],\n",
232 | " 'Environment': {\n",
233 | " 'baseline_constraints': '/opt/ml/processing/baseline/constraints/constraints.json',\n",
234 | " 'baseline_statistics': '/opt/ml/processing/baseline/stats/statistics.json',\n",
235 | " 'dataset_format': '{\"sagemakerCaptureJson\":{\"captureIndexNames\":[\"endpointInput\",\"endpointOutput\"]}}',\n",
236 | " 'dataset_source': '/opt/ml/processing/input/endpoint', \n",
237 | " 'output_path': '/opt/ml/processing/output',\n",
238 | " 'publish_cloudwatch_metrics': 'Enabled',\n",
239 | " 'sagemaker_monitoring_schedule_name': sche_desc['MonitoringScheduleName'],\n",
240 | " 'sagemaker_endpoint_name': endpoint_name,\n",
241 | " 'start_time': start_time.strftime(\"%Y-%m-%dT%H:%M:%SZ\"),\n",
242 | " 'end_time': end_time.strftime(\"%Y-%m-%dT%H:%M:%SZ\")\n",
243 | " }\n",
244 | " }\n",
245 | " sm.create_processing_job(**job_params)\n",
246 | " waiter = sm.get_waiter('processing_job_completed_or_stopped')\n",
247 | " waiter.wait( ProcessingJobName=job_params['ProcessingJobName'], WaiterConfig={'Delay': 30,'MaxAttempts': 20} )\n",
248 | " return job_params['ProcessingJobName'], s3_output['S3Uri']"
249 | ]
250 | },
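{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both the capture log paths and the analysis window in the function above are keyed to the current hour. The same window computation in isolation (a hypothetical helper, extracted for clarity):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import datetime\n",
"\n",
"# Illustrative sketch: truncate 'now' to the start of the hour;\n",
"# the job then analyzes captured logs in [start, start + 1h)\n",
"def current_hour_window(now=None):\n",
"    now = now or datetime.datetime.today()\n",
"    start = datetime.datetime(now.year, now.month, now.day, now.hour)\n",
"    return start, start + datetime.timedelta(hours=1)\n",
"\n",
"current_hour_window()"
]
},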
251 | {
252 | "cell_type": "code",
253 | "execution_count": null,
254 | "metadata": {},
255 | "outputs": [],
256 | "source": [
257 | "import pandas as pd\n",
258 | "## The processing job takes something like 5mins to run\n",
259 | "job_name, s3_output = process_monitoring_logs(endpoint_monitor)\n",
260 | "tokens = s3_output.split('/', 3)\n",
261 | "df = pd.read_json(sagemaker_session.read_s3_file(tokens[2], '%s/constraint_violations.json' % tokens[3]))\n",
262 | "df = pd.json_normalize(df.violations)\n",
263 | "df.head()"
264 | ]
265 | },
266 | {
267 | "cell_type": "markdown",
268 | "metadata": {},
269 | "source": [
270 | "You can also check these metrics on CloudWatch. Just open the CloudWatch console, click on **Metrics**, then select:\n",
271 | " All -> aws/sagemaker/Endpoints/data-metrics -> Endpoint, MonitoringSchedule\n",
272 | "\n",
273 | "Use the *endpoint_monitor* name to filter the metrics."
274 | ]
275 | },
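{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same metrics can also be retrieved programmatically. The next cell is a minimal sketch, assuming the namespace shown above; the `build_metrics_query` helper is ours, not part of any SDK:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical helper: builds the ListMetrics request for the data-metrics namespace\n",
"def build_metrics_query(ep_name, schedule_name):\n",
"    return {\n",
"        'Namespace': 'aws/sagemaker/Endpoints/data-metrics',\n",
"        'Dimensions': [\n",
"            {'Name': 'Endpoint', 'Value': ep_name},\n",
"            {'Name': 'MonitoringSchedule', 'Value': schedule_name}\n",
"        ]\n",
"    }\n",
"\n",
"try:\n",
"    import boto3\n",
"    cw = boto3.client('cloudwatch')\n",
"    resp = cw.list_metrics(**build_metrics_query(endpoint_name, endpoint_monitor.monitoring_schedule_name))\n",
"    print(sorted(m['MetricName'] for m in resp['Metrics']))\n",
"except Exception as e:\n",
"    print(e)"
]
},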
276 | {
277 | "cell_type": "markdown",
278 | "metadata": {},
279 | "source": [
280 | "## Cleaning up"
281 | ]
282 | },
283 | {
284 | "cell_type": "code",
285 | "execution_count": null,
286 | "metadata": {},
287 | "outputs": [],
288 | "source": [
289 | "traffic_generator_running=False\n",
290 | "time.sleep(3)\n",
291 | "endpoint_monitor.delete_monitoring_schedule()\n",
292 | "time.sleep(10) # wait for 10 seconds before trying to delete the endpoint"
293 | ]
294 | },
295 | {
296 | "cell_type": "code",
297 | "execution_count": null,
298 | "metadata": {},
299 | "outputs": [],
300 | "source": [
301 | "try:\n",
302 | " xgb_predictor = sagemaker.predictor.Predictor(endpoint_name=endpoint_name, sagemaker_session=sagemaker_session)\n",
303 | " xgb_predictor.delete_endpoint()\n",
304 | "except Exception as e:\n",
305 | " print(e)\n",
306 | "try:\n",
307 | " xgb_predictor2 = sagemaker.predictor.Predictor(endpoint_name=endpoint_name2, sagemaker_session=sagemaker_session)\n",
308 | " xgb_predictor2.delete_endpoint()\n",
309 | "except Exception as e:\n",
310 | " print(e)"
311 | ]
312 | },
313 | {
314 | "cell_type": "markdown",
315 | "metadata": {},
316 | "source": [
317 | "# The end"
318 | ]
319 | }
320 | ],
321 | "metadata": {
322 | "kernelspec": {
323 | "display_name": "conda_python3",
324 | "language": "python",
325 | "name": "conda_python3"
326 | },
327 | "language_info": {
328 | "codemirror_mode": {
329 | "name": "ipython",
330 | "version": 3
331 | },
332 | "file_extension": ".py",
333 | "mimetype": "text/x-python",
334 | "name": "python",
335 | "nbconvert_exporter": "python",
336 | "pygments_lexer": "ipython3",
337 | "version": "3.6.10"
338 | }
339 | },
340 | "nbformat": 4,
341 | "nbformat_minor": 2
342 | }
343 |
--------------------------------------------------------------------------------
/lab/01_CreateAlgorithmContainer/01_Creating a Classifier Container.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Building a docker container for training/deploying our classifier\n",
8 | "\n",
9 | "In this exercise we'll create a Docker image with the code required for training and deploying an ML model. In this particular example, we'll use scikit-learn (https://scikit-learn.org/) and that library's **Random Forest** implementation to train a flower classifier. The dataset used in this experiment is a toy dataset called Iris (http://archive.ics.uci.edu/ml/datasets/iris). The challenge itself is very basic, so you can focus on the mechanics and the features of this automated environment.\n",
10 | "\n",
11 | "At the end of this exercise, a first pipeline will run automatically. It will take the assets you push to a Git repo, build this image, and push it to ECR, the Docker image registry used by SageMaker.\n",
12 | "\n",
13 | "> **Question**: Why would I create a Scikit-learn container from scratch if SageMaker already offers one (https://docs.aws.amazon.com/sagemaker/latest/dg/sklearn.html)?\n",
14 | "> **Answer**: This is an exercise, and the idea is also to show you how to create your own container. In a real-life scenario, the best approach is to use the native container offered by SageMaker.\n",
15 | "\n",
16 | "\n",
17 | "## Why do I have to do this? If you're asking yourself this question, you probably don't need to create a custom container. If that is the case, you can skip this section by clicking on the link below and use the built-in container with XGBoost to run the automated pipeline\n",
18 | "> [Skip this section](../02_TrainYourModel/01_Training%20our%20model.ipynb) and start training your ML model\n"
19 | ]
20 | },
21 | {
22 | "cell_type": "markdown",
23 | "metadata": {},
24 | "source": [
25 | "## PART 1 - Creating the assets required to build/test a docker image"
26 | ]
27 | },
28 | {
29 | "cell_type": "markdown",
30 | "metadata": {},
31 | "source": [
32 | "### 1.1 Let's start by creating the training script!\n",
33 | "\n",
34 | "As you can see, this is a very basic example of Scikit-Learn. Nothing fancy."
35 | ]
36 | },
37 | {
38 | "cell_type": "code",
39 | "execution_count": null,
40 | "metadata": {},
41 | "outputs": [],
42 | "source": [
43 | "%%writefile train.py\n",
44 | "import os\n",
45 | "import sys\n",
46 | "import pandas as pd\n",
47 | "import re\n",
48 | "import joblib\n",
49 | "import json\n",
"import traceback\n",
50 | "from sklearn.ensemble import RandomForestClassifier\n",
51 | "\n",
52 | "def load_dataset(path):\n",
53 | " # Take the set of files and read them all into a single pandas dataframe\n",
54 | " files = [ os.path.join(path, file) for file in os.listdir(path) ]\n",
55 | " \n",
56 | " if len(files) == 0:\n",
57 | " raise ValueError(\"Invalid # of files in dir: {}\".format(path))\n",
58 | "\n",
59 | " raw_data = [ pd.read_csv(file, sep=\",\", header=None ) for file in files ]\n",
60 | " data = pd.concat(raw_data)\n",
61 | "\n",
62 | " # labels are in the first column\n",
63 | " y = data.iloc[:,0]\n",
64 | " X = data.iloc[:,1:]\n",
65 | " return X,y\n",
66 | " \n",
67 | "def start(args):\n",
68 | " print(\"Training mode\")\n",
69 | "\n",
70 | " try:\n",
71 | " X_train, y_train = load_dataset(args.train)\n",
72 | " X_test, y_test = load_dataset(args.validation)\n",
73 | " \n",
74 | " hyperparameters = {\n",
75 | " \"max_depth\": args.max_depth,\n",
76 | " \"verbose\": 1, # show all logs\n",
77 | " \"n_jobs\": args.n_jobs,\n",
78 | " \"n_estimators\": args.n_estimators\n",
79 | " }\n",
80 | " print(\"Training the classifier\")\n",
81 | " model = RandomForestClassifier()\n",
82 | " model.set_params(**hyperparameters)\n",
83 | " model.fit(X_train, y_train)\n",
84 | " print(\"Score: {}\".format( model.score(X_test, y_test)) )\n",
85 | " joblib.dump(model, open(os.path.join(args.model_dir, \"iris_model.pkl\"), \"wb\"))\n",
86 | " \n",
87 | " except Exception as e:\n",
88 | " # Write out an error file. This will be returned as the failureReason in the\n",
89 | " # DescribeTrainingJob result.\n",
90 | " trc = traceback.format_exc()\n",
91 | " with open(os.path.join(args.output_dir, \"failure\"), \"w\") as s:\n",
92 | " s.write(\"Exception during training: \" + str(e) + \"\\\\n\" + trc)\n",
93 | " \n",
94 | " # Printing this causes the exception to be in the training job logs, as well.\n",
95 | " print(\"Exception during training: \" + str(e) + \"\\\\n\" + trc, file=sys.stderr)\n",
96 | " \n",
97 | " # A non-zero exit code causes the training job to be marked as Failed.\n",
98 | " sys.exit(255)"
99 | ]
100 | },
101 | {
102 | "cell_type": "markdown",
103 | "metadata": {},
104 | "source": [
105 | "### 1.2 OK, let's now create the handler. The **Inference Handler** is how we use the SageMaker Inference Toolkit to encapsulate our code and expose it as a SageMaker container.\n",
106 | "SageMaker Inference Toolkit: https://github.com/aws/sagemaker-inference-toolkit"
107 | ]
108 | },
109 | {
110 | "cell_type": "code",
111 | "execution_count": null,
112 | "metadata": {},
113 | "outputs": [],
114 | "source": [
115 | "%%writefile handler.py\n",
116 | "import os\n",
117 | "import sys\n",
118 | "import joblib\n",
119 | "from sagemaker_inference.default_inference_handler import DefaultInferenceHandler\n",
120 | "from sagemaker_inference.default_handler_service import DefaultHandlerService\n",
121 | "from sagemaker_inference import content_types, errors, transformer, encoder, decoder\n",
122 | "\n",
123 | "class HandlerService(DefaultHandlerService, DefaultInferenceHandler):\n",
124 | " def __init__(self):\n",
125 | " op = transformer.Transformer(default_inference_handler=self)\n",
126 | " super(HandlerService, self).__init__(transformer=op)\n",
127 | " \n",
128 | " ## Loads the model from the disk\n",
129 | " def default_model_fn(self, model_dir):\n",
130 | " model_filename = os.path.join(model_dir, \"iris_model.pkl\")\n",
131 | " return joblib.load(open(model_filename, \"rb\"))\n",
132 | " \n",
133 | " ## Parse and check the format of the input data\n",
134 | " def default_input_fn(self, input_data, content_type):\n",
135 | " if content_type != \"text/csv\":\n",
136 | " raise Exception(\"Invalid content-type: %s\" % content_type)\n",
137 | " return decoder.decode(input_data, content_type).reshape(1,-1)\n",
138 | " \n",
139 | " ## Run our model and do the prediction\n",
140 | " def default_predict_fn(self, payload, model):\n",
141 | " return model.predict( payload ).tolist()\n",
142 | " \n",
143 | " ## Gets the prediction output and format it to be returned to the user\n",
144 | " def default_output_fn(self, prediction, accept):\n",
145 | " if accept != \"text/csv\":\n",
146 | " raise Exception(\"Invalid accept: %s\" % accept)\n",
147 | " return encoder.encode(prediction, accept)"
148 | ]
149 | },
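{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before wiring the handler into the toolkit, here is a tiny standalone sketch of the shape `default_input_fn` produces for a `text/csv` payload. The real decoding is done by `sagemaker_inference.decoder`; `csv_to_features` below is just an illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Illustration only: mimics the shape default_input_fn returns for a\n",
"# text/csv payload like '4.6,3.1,1.5,0.2' -- a single-row 2D array\n",
"def csv_to_features(payload):\n",
"    return np.array([float(v) for v in payload.split(',')]).reshape(1, -1)\n",
"\n",
"print(csv_to_features('4.6,3.1,1.5,0.2').shape)  # (1, 4)"
]
},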
150 | {
151 | "cell_type": "markdown",
152 | "metadata": {},
153 | "source": [
154 | "### 1.3 Now we need to create the entrypoint of our container. The main function\n",
155 | "\n",
156 | "We'll use **SageMaker Training Toolkit** (https://github.com/aws/sagemaker-training-toolkit) to work with the arguments and environment variables defined by SageMaker. This library will make our code simpler."
157 | ]
158 | },
159 | {
160 | "cell_type": "code",
161 | "execution_count": null,
162 | "metadata": {},
163 | "outputs": [],
164 | "source": [
165 | "%%writefile main.py\n",
166 | "import train\n",
167 | "import argparse\n",
168 | "import sys\n",
169 | "import os\n",
170 | "import traceback\n",
171 | "from sagemaker_inference import model_server\n",
172 | "from sagemaker_training import environment\n",
173 | "\n",
174 | "if __name__ == \"__main__\":\n",
175 | " if len(sys.argv) < 2 or ( not sys.argv[1] in [ \"serve\", \"train\" ] ):\n",
176 | "        raise Exception(\"Invalid argument: you must pass 'train' for training mode or 'serve' for serving mode\") \n",
177 | " \n",
178 | " if sys.argv[1] == \"train\":\n",
179 | " \n",
180 | " env = environment.Environment()\n",
181 | " \n",
182 | " parser = argparse.ArgumentParser()\n",
183 | " # https://github.com/aws/sagemaker-training-toolkit/blob/master/ENVIRONMENT_VARIABLES.md\n",
184 | " parser.add_argument(\"--max-depth\", type=int, default=10)\n",
185 | " parser.add_argument(\"--n-jobs\", type=int, default=env.num_cpus)\n",
186 | " parser.add_argument(\"--n-estimators\", type=int, default=120)\n",
187 | " \n",
188 | "    # reads the train and validation input channels from the environment variables\n",
189 | " parser.add_argument(\"--train\", type=str, default=env.channel_input_dirs[\"train\"])\n",
190 | " parser.add_argument(\"--validation\", type=str, default=env.channel_input_dirs[\"validation\"])\n",
191 | "\n",
192 | " parser.add_argument(\"--model-dir\", type=str, default=env.model_dir)\n",
193 | " parser.add_argument(\"--output-dir\", type=str, default=env.output_dir)\n",
194 | " \n",
195 | " args,unknown = parser.parse_known_args()\n",
196 | " train.start(args)\n",
197 | " else:\n",
198 | " model_server.start_model_server(handler_service=\"serving.handler\")"
199 | ]
200 | },
201 | {
202 | "cell_type": "markdown",
203 | "metadata": {},
204 | "source": [
205 | "### 1.4 Then, we can create the Dockerfile\n",
206 | "Just pay attention to the packages we'll install in our container. Here, we'll use the **SageMaker Inference Toolkit** (https://github.com/aws/sagemaker-inference-toolkit) and the **SageMaker Training Toolkit** (https://github.com/aws/sagemaker-training-toolkit) to prepare the container for training/serving our model. By **serving**, we mean exposing our model as a web service that can be called through an API."
207 | ]
208 | },
209 | {
210 | "cell_type": "code",
211 | "execution_count": null,
212 | "metadata": {},
213 | "outputs": [],
214 | "source": [
215 | "%%writefile Dockerfile\n",
216 | "FROM python:3.7-buster\n",
217 | "\n",
218 | "# Set a docker label to advertise multi-model support on the container\n",
219 | "LABEL com.amazonaws.sagemaker.capabilities.multi-models=false\n",
220 | "# Set a docker label to enable container to use SAGEMAKER_BIND_TO_PORT environment variable if present\n",
221 | "LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true\n",
222 | "\n",
223 | "RUN apt-get update -y && apt-get -y install --no-install-recommends default-jdk\n",
224 | "RUN rm -rf /var/lib/apt/lists/*\n",
225 | "\n",
226 | "RUN pip --no-cache-dir install multi-model-server sagemaker-inference sagemaker-training\n",
227 | "RUN pip --no-cache-dir install pandas numpy scipy scikit-learn\n",
228 | "\n",
229 | "ENV PYTHONUNBUFFERED=TRUE\n",
230 | "ENV PYTHONDONTWRITEBYTECODE=TRUE\n",
231 | "ENV PYTHONPATH=\"/opt/ml/code:${PYTHONPATH}\"\n",
232 | "\n",
233 | "COPY main.py /opt/ml/code/main.py\n",
234 | "COPY train.py /opt/ml/code/train.py\n",
235 | "COPY handler.py /opt/ml/code/serving/handler.py\n",
236 | "\n",
237 | "ENTRYPOINT [\"python\", \"/opt/ml/code/main.py\"]"
238 | ]
239 | },
240 | {
241 | "cell_type": "markdown",
242 | "metadata": {
243 | "collapsed": true
244 | },
245 | "source": [
246 | "### 1.5 Finally, let's create the buildspec\n",
247 | "This file will be used by CodeBuild for creating our Container image. \n",
248 | "With this file, CodeBuild will run the \"docker build\" command, using the assets we created above, and deploy the image to the Registry. \n",
249 | "As you can see, each command is a bash command that will be executed from inside a Linux Container."
250 | ]
251 | },
252 | {
253 | "cell_type": "code",
254 | "execution_count": null,
255 | "metadata": {},
256 | "outputs": [],
257 | "source": [
258 | "%%writefile buildspec.yml\n",
259 | "version: 0.2\n",
260 | "\n",
261 | "phases:\n",
262 | " install:\n",
263 | " runtime-versions:\n",
264 | " docker: 18\n",
265 | "\n",
266 | " pre_build:\n",
267 | " commands:\n",
268 | " - echo Logging in to Amazon ECR...\n",
269 | " - $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION)\n",
270 | " build:\n",
271 | " commands:\n",
272 | " - echo Build started on `date`\n",
273 | " - echo Building the Docker image...\n",
274 | " - docker build -t $IMAGE_REPO_NAME:$IMAGE_TAG .\n",
275 | " - docker tag $IMAGE_REPO_NAME:$IMAGE_TAG $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG\n",
276 | "\n",
277 | " post_build:\n",
278 | " commands:\n",
279 | " - echo Build completed on `date`\n",
280 | " - echo Pushing the Docker image...\n",
281 | " - echo docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG\n",
282 | " - docker push $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG\n",
283 | " - echo $AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG > image.url\n",
284 | " - echo Done\n",
285 | "artifacts:\n",
286 | " files:\n",
287 | " - image.url\n",
288 | " name: image_url\n",
289 | " discard-paths: yes"
290 | ]
291 | },
292 | {
293 | "cell_type": "markdown",
294 | "metadata": {},
295 | "source": [
296 | "## PART 2 - Local Test: Let's build the image locally and do some tests\n",
297 | "### 2.1 Building the image locally, first\n",
298 | "Each SageMaker Jupyter Notebook instance already has a **docker** environment pre-installed, so we can play with Docker containers right from this environment."
299 | ]
300 | },
301 | {
302 | "cell_type": "code",
303 | "execution_count": null,
304 | "metadata": {
305 | "scrolled": true
306 | },
307 | "outputs": [],
308 | "source": [
309 | "!docker build -f Dockerfile -t iris_model:1.0 ."
310 | ]
311 | },
312 | {
313 | "cell_type": "markdown",
314 | "metadata": {},
315 | "source": [
316 | "### 2.2 Now that we have the algorithm image we can run it to train/deploy a model"
317 | ]
318 | },
319 | {
320 | "cell_type": "markdown",
321 | "metadata": {},
322 | "source": [
323 | "### Then, we need to prepare the dataset\n",
324 | "You'll see that we're splitting the dataset into training and validation subsets and saving them to CSV files. These files will then be uploaded to an S3 bucket and shared with SageMaker."
325 | ]
326 | },
327 | {
328 | "cell_type": "code",
329 | "execution_count": null,
330 | "metadata": {},
331 | "outputs": [],
332 | "source": [
333 | "!rm -rf input\n",
334 | "!mkdir -p input/data/train\n",
335 | "!mkdir -p input/data/validation\n",
336 | "\n",
337 | "import pandas as pd\n",
338 | "import numpy as np\n",
339 | "\n",
340 | "from sklearn import datasets\n",
341 | "from sklearn.model_selection import train_test_split\n",
342 | "\n",
343 | "iris = datasets.load_iris()\n",
344 | "\n",
345 | "dataset = np.insert(iris.data, 0, iris.target,axis=1)\n",
346 | "\n",
347 | "df = pd.DataFrame(data=dataset, columns=[\"iris_id\"] + iris.feature_names)\n",
348 | "X = df.iloc[:,1:]\n",
349 | "y = df.iloc[:,0]\n",
350 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)\n",
351 | "\n",
352 | "train_df = X_train.copy()\n",
353 | "train_df.insert(0, \"iris_id\", y_train)\n",
354 | "train_df.to_csv(\"input/data/train/training.csv\", sep=\",\", header=None, index=None)\n",
355 | "\n",
356 | "test_df = X_test.copy()\n",
357 | "test_df.insert(0, \"iris_id\", y_test)\n",
358 | "test_df.to_csv(\"input/data/validation/testing.csv\", sep=\",\", header=None, index=None)\n",
359 | "\n",
360 | "df.head()"
361 | ]
362 | },
363 | {
364 | "cell_type": "markdown",
365 | "metadata": {},
366 | "source": [
367 | "### 2.3 Just a basic local test, using the local Docker daemon\n",
368 | "Here we will simulate SageMaker calling our docker container for training and serving. We'll do that using the built-in Docker Daemon of the Jupyter Notebook Instance."
369 | ]
370 | },
371 | {
372 | "cell_type": "code",
373 | "execution_count": null,
374 | "metadata": {},
375 | "outputs": [],
376 | "source": [
377 | "!rm -rf input/config && mkdir -p input/config"
378 | ]
379 | },
380 | {
381 | "cell_type": "code",
382 | "execution_count": null,
383 | "metadata": {},
384 | "outputs": [],
385 | "source": [
386 | "%%writefile input/config/hyperparameters.json\n",
387 | "{\"max_depth\": 20, \"n_jobs\": 4, \"n_estimators\": 120}"
388 | ]
389 | },
390 | {
391 | "cell_type": "code",
392 | "execution_count": null,
393 | "metadata": {},
394 | "outputs": [],
395 | "source": [
396 | "%%writefile input/config/resourceconfig.json\n",
397 | "{\"current_host\": \"localhost\", \"hosts\": [\"algo-1-kipw9\"]}"
398 | ]
399 | },
400 | {
401 | "cell_type": "code",
402 | "execution_count": null,
403 | "metadata": {},
404 | "outputs": [],
405 | "source": [
406 | "%%writefile input/config/inputdataconfig.json\n",
407 | "{\"train\": {\"TrainingInputMode\": \"File\"}, \"validation\": {\"TrainingInputMode\": \"File\"}}"
408 | ]
409 | },
410 | {
411 | "cell_type": "code",
412 | "execution_count": null,
413 | "metadata": {
414 | "scrolled": false
415 | },
416 | "outputs": [],
417 | "source": [
418 | "%%time\n",
419 | "!rm -rf model/\n",
420 | "!mkdir -p model\n",
421 | "\n",
422 | "print( \"Training...\")\n",
423 | "!docker run --rm --name \"my_model\" \\\n",
424 | " -v \"$PWD/model:/opt/ml/model\" \\\n",
425 | " -v \"$PWD/output:/opt/ml/output\" \\\n",
426 | " -v \"$PWD/input:/opt/ml/input\" iris_model:1.0 train"
427 | ]
428 | },
429 | {
430 | "cell_type": "markdown",
431 | "metadata": {},
432 | "source": [
433 | "### 2.4 This is the serving test. It simulates an Endpoint exposed by Sagemaker\n",
434 | "\n",
435 | "After you execute the next cell, this Jupyter notebook will block while the container runs. A web service will be exposed on port 8080. "
436 | ]
437 | },
438 | {
439 | "cell_type": "code",
440 | "execution_count": null,
441 | "metadata": {},
442 | "outputs": [],
443 | "source": [
444 | "!docker run --rm --name \"my_model\" \\\n",
445 | " -p 8080:8080 \\\n",
446 | " -v \"$PWD/model:/opt/ml/model\" \\\n",
447 | " -v \"$PWD/input:/opt/ml/input\" iris_model:1.0 serve"
448 | ]
449 | },
450 | {
451 | "cell_type": "markdown",
452 | "metadata": {},
453 | "source": [
454 | "> While the above cell is running, click here [TEST NOTEBOOK](02_Testing%20our%20local%20model%20server.ipynb) to run some tests.\n",
455 | "\n",
456 | "> After you finish the tests, press **STOP**"
457 | ]
458 | },
459 | {
460 | "cell_type": "markdown",
461 | "metadata": {},
462 | "source": [
463 | "## PART 3 - Integrated Test: Everything seems ok, now it's time to put it all together\n",
464 | "\n",
465 | "We'll start by running a local **CodeBuild** test, to check the buildspec and also deploy this image into the container registry. Remember that SageMaker will only see images published to ECR.\n"
466 | ]
467 | },
468 | {
469 | "cell_type": "code",
470 | "execution_count": null,
471 | "metadata": {},
472 | "outputs": [],
473 | "source": [
474 | "import boto3\n",
475 | "\n",
476 | "sts_client = boto3.client(\"sts\")\n",
477 | "session = boto3.session.Session()\n",
478 | "\n",
479 | "account_id = sts_client.get_caller_identity()[\"Account\"]\n",
480 | "region = session.region_name\n",
481 | "credentials = session.get_credentials()\n",
482 | "credentials = credentials.get_frozen_credentials()\n",
483 | "\n",
484 | "repo_name=\"iris-model\"\n",
485 | "image_tag=\"test\""
486 | ]
487 | },
488 | {
489 | "cell_type": "code",
490 | "execution_count": null,
491 | "metadata": {},
492 | "outputs": [],
493 | "source": [
494 | "!sudo rm -rf tests && mkdir -p tests\n",
495 | "!cp handler.py main.py train.py Dockerfile buildspec.yml tests/\n",
496 | "with open(\"tests/vars.env\", \"w\") as f:\n",
497 | " f.write(\"AWS_ACCOUNT_ID=%s\\n\" % account_id)\n",
498 | " f.write(\"IMAGE_TAG=%s\\n\" % image_tag)\n",
499 | " f.write(\"IMAGE_REPO_NAME=%s\\n\" % repo_name)\n",
500 | " f.write(\"AWS_DEFAULT_REGION=%s\\n\" % region)\n",
501 | " f.write(\"AWS_ACCESS_KEY_ID=%s\\n\" % credentials.access_key)\n",
502 | " f.write(\"AWS_SECRET_ACCESS_KEY=%s\\n\" % credentials.secret_key)\n",
503 | " f.write(\"AWS_SESSION_TOKEN=%s\\n\" % credentials.token )\n",
504 | " f.close()\n",
505 | "\n",
506 | "!cat tests/vars.env"
507 | ]
508 | },
509 | {
510 | "cell_type": "code",
511 | "execution_count": null,
512 | "metadata": {},
513 | "outputs": [],
514 | "source": [
515 | "%%time\n",
516 | "\n",
517 | "!/tmp/aws-codebuild/local_builds/codebuild_build.sh \\\n",
518 | " -a \"$PWD/tests/output\" \\\n",
519 | " -s \"$PWD/tests\" \\\n",
520 | " -i \"samirsouza/aws-codebuild-standard:3.0\" \\\n",
521 | " -e \"$PWD/tests/vars.env\" \\\n",
522 | " -c"
523 | ]
524 | },
525 | {
526 | "cell_type": "markdown",
527 | "metadata": {},
528 | "source": [
529 | "> Now that we have an image deployed in the ECR repo we can also run some local tests using the SageMaker Estimator.\n",
530 | "\n",
531 | "> Click on this [TEST NOTEBOOK](03_Testing%20the%20container%20using%20SageMaker%20Estimator.ipynb) to run some tests.\n",
532 | "\n",
533 | "> After you finish the tests, come back to **this notebook** to push the assets to the Git repo\n"
534 | ]
535 | },
536 | {
537 | "cell_type": "markdown",
538 | "metadata": {},
539 | "source": [
540 | "## PART 4 - Let's push all the assets to the Git Repo connected to the Build pipeline\n",
541 | "There is a CodePipeline configured to listen to this Git repo and start a new build process with CodeBuild."
542 | ]
543 | },
544 | {
545 | "cell_type": "code",
546 | "execution_count": null,
547 | "metadata": {},
548 | "outputs": [],
549 | "source": [
550 | "%%bash\n",
551 | "cd ../../../mlops\n",
552 | "git checkout iris_model\n",
553 | "cp $OLDPWD/buildspec.yml $OLDPWD/handler.py $OLDPWD/train.py $OLDPWD/main.py $OLDPWD/Dockerfile .\n",
554 | "\n",
555 | "git add --all\n",
556 | "git commit -a -m \" - files for building an iris model image\"\n",
557 | "git push"
558 | ]
559 | },
560 | {
561 | "cell_type": "markdown",
562 | "metadata": {},
563 | "source": [
564 | "> Alright, now open the AWS console and go to the **CodePipeline** dashboard. Look for a pipeline called **mlops-iris-model**. This pipeline will deploy the final image to an ECR repo. When this process finishes, open the **Elastic Container Registry** dashboard, in the AWS console, and check if you have an image called **iris-model:latest**. If yes, you can go to the next exercise. If not, wait a little longer."
565 | ]
566 | }
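,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch, you could also check for the pushed image from this notebook instead of the console. The `extract_tags` helper below is ours, and `iris-model` matches the repo name configured earlier:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical helper: flattens the image tags from a DescribeImages response\n",
"def extract_tags(image_details):\n",
"    return [t for img in image_details for t in img.get('imageTags', [])]\n",
"\n",
"try:\n",
"    import boto3\n",
"    ecr = boto3.client('ecr')\n",
"    details = ecr.describe_images(repositoryName='iris-model')['imageDetails']\n",
"    print('latest' in extract_tags(details), extract_tags(details))\n",
"except Exception as e:\n",
"    print(e)"
]
}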
567 | ],
568 | "metadata": {
569 | "kernelspec": {
570 | "display_name": "conda_python3",
571 | "language": "python",
572 | "name": "conda_python3"
573 | },
574 | "language_info": {
575 | "codemirror_mode": {
576 | "name": "ipython",
577 | "version": 3
578 | },
579 | "file_extension": ".py",
580 | "mimetype": "text/x-python",
581 | "name": "python",
582 | "nbconvert_exporter": "python",
583 | "pygments_lexer": "ipython3",
584 | "version": "3.6.10"
585 | }
586 | },
587 | "nbformat": 4,
588 | "nbformat_minor": 2
589 | }
590 |
--------------------------------------------------------------------------------
/lab/01_CreateAlgorithmContainer/02_Testing our local model server.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## First, let's test the ping method (GET /ping)\n",
8 | "This method is used by SageMaker to health-check our model. It must return code **200**."
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": null,
14 | "metadata": {},
15 | "outputs": [],
16 | "source": [
17 | "import json\n",
18 | "from urllib import request\n",
19 | "\n",
20 | "base_url='http://localhost:8080'"
21 | ]
22 | },
23 | {
24 | "cell_type": "code",
25 | "execution_count": null,
26 | "metadata": {
27 | "scrolled": true
28 | },
29 | "outputs": [],
30 | "source": [
31 | "resp = request.urlopen(\"%s/ping\" % base_url)\n",
32 | "print(\"Response code: %d\" % resp.getcode() )"
33 | ]
34 | },
35 | {
36 | "cell_type": "markdown",
37 | "metadata": {},
38 | "source": [
39 | "## Then we can test the predictions (POST /invocations)\n",
40 | "This method is used by SageMaker for predictions. Here we're simulating the headers SageMaker sends, such as Content-type and Accept"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": null,
46 | "metadata": {
47 | "scrolled": true
48 | },
49 | "outputs": [],
50 | "source": [
51 | "%%time\n",
52 | "from sagemaker.serializers import CSVSerializer\n",
53 | "csv_serializer = CSVSerializer()\n",
54 | "payloads = [\n",
55 | " [4.6, 3.1, 1.5, 0.2], # 0\n",
56 | " [7.7, 2.6, 6.9, 2.3], # 2\n",
57 | " [6.1, 2.8, 4.7, 1.2] # 1\n",
58 | "]\n",
59 | "\n",
60 | "def predict(payload):\n",
61 | " headers = {\n",
62 | " 'Content-type': 'text/csv',\n",
63 | " 'Accept': 'text/csv'\n",
64 | " }\n",
65 | " \n",
66 | " req = request.Request(\"%s/invocations\" % base_url, data=csv_serializer.serialize(payload).encode('utf-8'), headers=headers)\n",
67 | " resp = request.urlopen(req)\n",
68 | " print(\"Response code: %d, Prediction: %s\\n\" % (resp.getcode(), resp.read()))\n",
69 | " for i in resp.headers:\n",
70 | " print(i, resp.headers[i])\n",
71 | "\n",
72 | "for p in payloads:\n",
73 | " predict(p)"
74 | ]
75 | },
76 | {
77 | "cell_type": "markdown",
78 | "metadata": {},
79 | "source": [
80 | "## Now you can go back to the previous Jupyter notebook, stop the running cell, and continue executing it"
81 | ]
82 | },
83 | {
84 | "cell_type": "code",
85 | "execution_count": null,
86 | "metadata": {},
87 | "outputs": [],
88 | "source": []
89 | }
90 | ],
91 | "metadata": {
92 | "kernelspec": {
93 | "display_name": "conda_python3",
94 | "language": "python",
95 | "name": "conda_python3"
96 | },
97 | "language_info": {
98 | "codemirror_mode": {
99 | "name": "ipython",
100 | "version": 3
101 | },
102 | "file_extension": ".py",
103 | "mimetype": "text/x-python",
104 | "name": "python",
105 | "nbconvert_exporter": "python",
106 | "pygments_lexer": "ipython3",
107 | "version": "3.6.10"
108 | }
109 | },
110 | "nbformat": 4,
111 | "nbformat_minor": 2
112 | }
113 |
--------------------------------------------------------------------------------
/lab/01_CreateAlgorithmContainer/03_Testing the container using SageMaker Estimator.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Integrated Test\n",
8 | "In this test, we'll use a SageMaker Estimator (https://sagemaker.readthedocs.io/en/stable/estimators.html) to encapsulate the docker image published to ECR and start a **local** test, this time using the SageMaker library."
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": null,
14 | "metadata": {},
15 | "outputs": [],
16 | "source": [
17 | "import sagemaker\n",
18 | "import json\n",
19 | "from sagemaker import get_execution_role\n",
20 | "\n",
21 | "role = get_execution_role()\n",
22 | "sagemaker_session = sagemaker.Session()\n",
23 | "bucket = sagemaker_session.default_bucket()\n",
24 | "prefix='mlops/iris'"
25 | ]
26 | },
27 | {
28 | "cell_type": "markdown",
29 | "metadata": {},
30 | "source": [
31 | "## Upload the dataset\n",
32 | "In the previous exercise, we prepared the training and validation datasets. Now we'll upload the CSVs to S3 and share them with an Estimator"
33 | ]
34 | },
35 | {
36 | "cell_type": "code",
37 | "execution_count": null,
38 | "metadata": {},
39 | "outputs": [],
40 | "source": [
41 | "train_path = sagemaker_session.upload_data(path='input/data/train', key_prefix='iris-model/input/train')\n",
42 | "test_path = sagemaker_session.upload_data(path='input/data/validation', key_prefix='iris-model/input/validation')\n",
43 | "print(\"Train: %s\\nValidation: %s\" % (train_path, test_path) )"
44 | ]
45 | },
46 | {
47 | "cell_type": "markdown",
48 | "metadata": {},
49 | "source": [
50 | "## And now, we can use a SageMaker Estimator for training and deploying the container we've created"
51 | ]
52 | },
53 | {
54 | "cell_type": "code",
55 | "execution_count": null,
56 | "metadata": {},
57 | "outputs": [],
58 | "source": [
59 | "# Create the estimator\n",
60 | "# iris-model:test is the name of the container created in the previous notebook\n",
61 | "# By the local codebuild test. An image with that name:tag was pushed to the ECR.\n",
62 | "iris = sagemaker.estimator.Estimator('iris-model:test',\n",
63 | " role,\n",
64 | " instance_count=1, \n",
65 | " instance_type='local',\n",
66 | " output_path='s3://{}/{}/output'.format(bucket, prefix))\n",
67 | "hyperparameters = {\n",
68 | " 'max_depth': 20,\n",
69 | " 'n_jobs': 4,\n",
70 | " 'n_estimators': 120\n",
71 | "}\n",
72 | "\n",
73 | "print(hyperparameters)\n",
74 | "iris.set_hyperparameters(**hyperparameters)"
75 | ]
76 | },
77 | {
78 | "cell_type": "markdown",
79 | "metadata": {},
80 | "source": [
81 | "After you call .fit, a new training job will be executed inside the *local Docker daemon*, not in the SageMaker cloud environment"
82 | ]
83 | },
84 | {
85 | "cell_type": "code",
86 | "execution_count": null,
87 | "metadata": {},
88 | "outputs": [],
89 | "source": [
90 | "iris.fit({'train': train_path, 'validation': test_path })"
91 | ]
92 | },
93 | {
94 | "cell_type": "markdown",
95 | "metadata": {},
96 | "source": [
97 | "The next command will launch a new container in your local Docker daemon. Then you can use the returned predictor to test it"
98 | ]
99 | },
100 | {
101 | "cell_type": "code",
102 | "execution_count": null,
103 | "metadata": {},
104 | "outputs": [],
105 | "source": [
106 | "iris_predictor = iris.deploy(initial_instance_count=1, instance_type='local')"
107 | ]
108 | },
109 | {
110 | "cell_type": "markdown",
111 | "metadata": {},
112 | "source": [
113 | "Now, let's use the predictor (https://sagemaker.readthedocs.io/en/stable/predictors.html) for some tests."
114 | ]
115 | },
116 | {
117 | "cell_type": "code",
118 | "execution_count": null,
119 | "metadata": {},
120 | "outputs": [],
121 | "source": [
122 | "import pandas as pd\n",
123 | "import random\n",
124 | "from sagemaker.serializers import CSVSerializer\n",
125 | "from sagemaker.deserializers import CSVDeserializer\n",
126 | "\n",
127 | "# configure the predictor to do everything for us\n",
128 | "iris_predictor.serializer = CSVSerializer()\n",
129 | "iris_predictor.deserializer = CSVDeserializer()\n",
130 | "\n",
131 | "# load the testing data from the validation csv\n",
132 | "validation = pd.read_csv('input/data/validation/testing.csv', header=None)\n",
133 | "idx = random.randint(0,len(validation)-5)\n",
134 | "req = validation.iloc[idx:idx+5].values\n",
135 | "\n",
136 | "# cut a sample with 5 lines from our dataset and then split the label from the features.\n",
137 | "X = req[:,1:].tolist()\n",
138 | "y = req[:,0].tolist()\n",
139 | "\n",
140 | "# call the local endpoint\n",
141 | "for features,label in zip(X,y):\n",
142 | " prediction = iris_predictor.predict(features)\n",
143 | "\n",
144 |     "    # compare the results (CSVDeserializer returns the response as a list of rows of strings)\n",
145 |     "    print(\"RESULT: {} == {} ? {}\".format( label, prediction, float(prediction[0][0]) == label ) )"
146 | ]
147 | },
148 | {
149 | "cell_type": "code",
150 | "execution_count": null,
151 | "metadata": {},
152 | "outputs": [],
153 | "source": [
154 | "iris_predictor.delete_endpoint()"
155 | ]
156 | },
157 | {
158 | "cell_type": "markdown",
159 | "metadata": {},
160 | "source": [
161 | "### That's it! :) Now you can go back to the previous Jupyter notebook and commit the assets to start building the Final Docker Image"
162 | ]
163 | }
164 | ],
165 | "metadata": {
166 | "kernelspec": {
167 | "display_name": "conda_python3",
168 | "language": "python",
169 | "name": "conda_python3"
170 | },
171 | "language_info": {
172 | "codemirror_mode": {
173 | "name": "ipython",
174 | "version": 3
175 | },
176 | "file_extension": ".py",
177 | "mimetype": "text/x-python",
178 | "name": "python",
179 | "nbconvert_exporter": "python",
180 | "pygments_lexer": "ipython3",
181 | "version": "3.6.10"
182 | }
183 | },
184 | "nbformat": 4,
185 | "nbformat_minor": 2
186 | }
187 |
--------------------------------------------------------------------------------
/lab/02_TrainYourModel/01_Training our model.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 |     "# Now it's time to start an automated ML pipeline using the MLOps environment\n",
8 | "\n",
9 |     "We'll do that by uploading a zip file named **trainingjob.zip** to an S3 bucket. CodePipeline listens to that bucket and starts a job whenever the file is updated. The zip file has the following structure:\n",
10 |     " - trainingjob.json (SageMaker training job descriptor)\n",
11 |     " - deployment.json (instructions for the environment on how to deploy and prepare the endpoints)"
12 | ]
13 | },
14 | {
15 | "cell_type": "markdown",
16 | "metadata": {},
17 | "source": [
18 |     "### 1.1 Let's start by defining the hyperparameters and other attributes\n",
19 | "\n",
20 | "If you ran the previous section **01_CreateAlgorithmContainer** and managed to create a custom container, please change the following variable in the next cell, from: \n",
21 | "```Python\n",
22 | "use_xgboost_builtin=True\n",
23 | "```\n",
24 | "to:\n",
25 | "```Python\n",
26 | "use_xgboost_builtin=False\n",
27 | "```"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": null,
33 | "metadata": {},
34 | "outputs": [],
35 | "source": [
36 | "import sagemaker\n",
37 | "import boto3\n",
38 | "\n",
39 | "use_xgboost_builtin=True\n",
40 | "\n",
41 | "sts_client = boto3.client(\"sts\")\n",
42 | "account_id = sts_client.get_caller_identity()[\"Account\"]\n",
43 | "region = boto3.session.Session().region_name\n",
44 | "model_prefix='iris-model'\n",
45 | "training_image = None\n",
46 | "hyperparameters = None\n",
47 | "if use_xgboost_builtin: \n",
48 | " training_image = sagemaker.image_uris.retrieve('xgboost', boto3.Session().region_name, version='1.0-1')\n",
49 | " hyperparameters = {\n",
50 | " \"alpha\": 0.42495142279951414,\n",
51 | " \"eta\": 0.4307531922567607,\n",
52 | " \"gamma\": 1.8028358018081714,\n",
53 | " \"max_depth\": 10,\n",
54 | " \"min_child_weight\": 5.925133573560345,\n",
55 | " \"num_class\": 3,\n",
56 | " \"num_round\": 30,\n",
57 | " \"objective\": \"multi:softmax\",\n",
58 | " \"reg_lambda\": 10,\n",
59 | " \"silent\": 0,\n",
60 | " }\n",
61 | "else:\n",
62 | " training_image = '{}.dkr.ecr.{}.amazonaws.com/{}:latest'.format(account_id, region, model_prefix)\n",
63 | " hyperparameters = {\n",
64 | " \"max_depth\": 11,\n",
65 | " \"n_jobs\": 5,\n",
66 | " \"n_estimators\": 120\n",
67 | " }\n",
68 | "print(training_image)"
69 | ]
70 | },
71 | {
72 | "cell_type": "markdown",
73 | "metadata": {},
74 | "source": [
75 | "### 1.2 Then, let's create the trainingjob descriptor"
76 | ]
77 | },
78 | {
79 | "cell_type": "code",
80 | "execution_count": null,
81 | "metadata": {
82 | "scrolled": true
83 | },
84 | "outputs": [],
85 | "source": [
86 | "import time\n",
87 | "import sagemaker\n",
88 | "import boto3\n",
89 | "\n",
90 | "roleArn = \"arn:aws:iam::{}:role/MLOps\".format(account_id)\n",
91 | "timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())\n",
92 | "job_name = model_prefix + timestamp\n",
93 | "sagemaker_session = sagemaker.Session()\n",
94 | "\n",
95 | "training_params = {}\n",
96 | "\n",
97 |     "# Here we set the reference to our algorithm's Docker image, stored in ECR (https://aws.amazon.com/ecr/)\n",
98 | "training_params[\"AlgorithmSpecification\"] = {\n",
99 | " \"TrainingImage\": training_image,\n",
100 | " \"TrainingInputMode\": \"File\"\n",
101 | "}\n",
102 | "\n",
103 | "# The IAM role with all the permissions given to Sagemaker\n",
104 | "training_params[\"RoleArn\"] = roleArn\n",
105 | "\n",
106 | "# Here Sagemaker will store the final trained model\n",
107 | "training_params[\"OutputDataConfig\"] = {\n",
108 | " \"S3OutputPath\": 's3://{}/{}'.format(sagemaker_session.default_bucket(), model_prefix)\n",
109 | "}\n",
110 | "\n",
111 | "# This is the config of the instance that will execute the training\n",
112 | "training_params[\"ResourceConfig\"] = {\n",
113 | " \"InstanceCount\": 1,\n",
114 | " \"InstanceType\": \"ml.m4.xlarge\",\n",
115 | " \"VolumeSizeInGB\": 30\n",
116 | "}\n",
117 | "\n",
118 | "# The job name. You'll see this name in the Jobs section of the Sagemaker's console\n",
119 | "training_params[\"TrainingJobName\"] = job_name\n",
120 | "\n",
121 | "for i in hyperparameters:\n",
122 | " hyperparameters[i] = str(hyperparameters[i])\n",
123 | " \n",
124 | "# Here you will configure the hyperparameters used for training your model.\n",
125 | "training_params[\"HyperParameters\"] = hyperparameters\n",
126 | "\n",
127 | "# Training timeout\n",
128 | "training_params[\"StoppingCondition\"] = {\n",
129 | " \"MaxRuntimeInSeconds\": 360000\n",
130 | "}\n",
131 | "\n",
132 |     "# The algorithm currently only supports the FullyReplicated distribution (data is copied onto each machine)\n",
133 | "training_params[\"InputDataConfig\"] = []\n",
134 | "\n",
135 |     "# Notice that we're using text/csv for both the\n",
136 |     "# training and validation datasets, given our dataset is formatted as CSV\n",
137 | "\n",
138 | "# Here we set training dataset\n",
139 | "training_params[\"InputDataConfig\"].append({\n",
140 | " \"ChannelName\": \"train\",\n",
141 | " \"DataSource\": {\n",
142 | " \"S3DataSource\": {\n",
143 | " \"S3DataType\": \"S3Prefix\",\n",
144 | " \"S3Uri\": 's3://{}/{}/input/train'.format(sagemaker_session.default_bucket(), model_prefix),\n",
145 | " \"S3DataDistributionType\": \"FullyReplicated\"\n",
146 | " }\n",
147 | " },\n",
148 | " \"ContentType\": \"text/csv\",\n",
149 | " \"CompressionType\": \"None\"\n",
150 | "})\n",
151 | "training_params[\"InputDataConfig\"].append({\n",
152 | " \"ChannelName\": \"validation\",\n",
153 | " \"DataSource\": {\n",
154 | " \"S3DataSource\": {\n",
155 | " \"S3DataType\": \"S3Prefix\",\n",
156 | " \"S3Uri\": 's3://{}/{}/input/validation'.format(sagemaker_session.default_bucket(), model_prefix),\n",
157 | " \"S3DataDistributionType\": \"FullyReplicated\"\n",
158 | " }\n",
159 | " },\n",
160 | " \"ContentType\": \"text/csv\",\n",
161 | " \"CompressionType\": \"None\"\n",
162 | "})\n",
163 | "training_params[\"Tags\"] = []"
164 | ]
165 | },
166 | {
167 | "cell_type": "code",
168 | "execution_count": null,
169 | "metadata": {},
170 | "outputs": [],
171 | "source": [
172 | "deployment_params = {\n",
173 | " \"EndpointPrefix\": model_prefix,\n",
174 | " \"DevelopmentEndpoint\": {\n",
175 | " # we want to enable the endpoint monitoring\n",
176 | " \"InferenceMonitoring\": True,\n",
177 | " # we will collect 100% of all the requests/predictions\n",
178 | " \"InferenceMonitoringSampling\": 100,\n",
179 | " \"InferenceMonitoringOutputBucket\": 's3://{}/{}/monitoring/dev'.format(sagemaker_session.default_bucket(), model_prefix),\n",
180 | " # we don't want to enable A/B tests in development\n",
181 | " \"ABTests\": False,\n",
182 | " # we'll use a basic instance for testing purposes\n",
183 | " \"InstanceType\": \"ml.t2.large\",\n",
184 | " \"InitialInstanceCount\": 1,\n",
185 | " # we don't want high availability/escalability for development\n",
186 | " \"AutoScaling\": None\n",
187 | " },\n",
188 | " \"ProductionEndpoint\": {\n",
189 | " # we want to enable the endpoint monitoring\n",
190 | " \"InferenceMonitoring\": True,\n",
191 | " # we will collect 100% of all the requests/predictions\n",
192 | " \"InferenceMonitoringSampling\": 100,\n",
193 | " \"InferenceMonitoringOutputBucket\": 's3://{}/{}/monitoring/prd'.format(sagemaker_session.default_bucket(), model_prefix),\n",
194 | " # we want to do A/B tests in production\n",
195 | " \"ABTests\": True,\n",
196 | " # we'll use a better instance for production. CPU optimized\n",
197 | " \"InstanceType\": \"ml.c5.large\",\n",
198 | " \"InitialInstanceCount\": 2,\n",
199 | " \"InitialVariantWeight\": 0.1,\n",
200 |     "        # we want elasticity: at minimum 2 instances to support the endpoint, at maximum 10.\n",
201 |     "        # we'll use a target of 200 invocations per instance to add or remove instances\n",
202 | " \"AutoScaling\": {\n",
203 | " \"MinCapacity\": 2,\n",
204 | " \"MaxCapacity\": 10,\n",
205 | " \"TargetValue\": 200.0,\n",
206 | " \"ScaleInCooldown\": 30,\n",
207 | " \"ScaleOutCooldown\": 60,\n",
208 | " \"PredefinedMetricType\": \"SageMakerVariantInvocationsPerInstance\"\n",
209 | " }\n",
210 | " }\n",
211 | "}"
212 | ]
213 | },
214 | {
215 | "cell_type": "markdown",
216 | "metadata": {},
217 | "source": [
218 | "#### Preparing and uploading the dataset"
219 | ]
220 | },
221 | {
222 | "cell_type": "code",
223 | "execution_count": null,
224 | "metadata": {},
225 | "outputs": [],
226 | "source": [
227 | "import numpy as np\n",
228 | "import sagemaker\n",
229 | "from sklearn import datasets\n",
230 | "from sklearn.model_selection import train_test_split\n",
231 | "\n",
232 | "sagemaker_session = sagemaker.Session()\n",
233 | "iris = datasets.load_iris()\n",
234 | "\n",
235 | "X_train, X_test, y_train, y_test = train_test_split(\n",
236 | " iris.data, iris.target, test_size=0.33, random_state=42, stratify=iris.target)\n",
237 | "np.savetxt(\"iris_train.csv\", np.column_stack((y_train, X_train)), delimiter=\",\", fmt='%0.3f')\n",
238 | "np.savetxt(\"iris_test.csv\", np.column_stack((y_test, X_test)), delimiter=\",\", fmt='%0.3f')\n",
239 | "\n",
240 | "# Upload the dataset to an S3 bucket\n",
241 | "input_train = sagemaker_session.upload_data(path='iris_train.csv', key_prefix='%s/input/train' % model_prefix)\n",
242 | "input_test = sagemaker_session.upload_data(path='iris_test.csv', key_prefix='%s/input/validation' % model_prefix)"
243 | ]
244 | },
245 | {
246 | "cell_type": "markdown",
247 | "metadata": {},
248 | "source": [
249 | "### 1.3 Alright! Now it's time to start the training process"
250 | ]
251 | },
252 | {
253 | "cell_type": "code",
254 | "execution_count": null,
255 | "metadata": {},
256 | "outputs": [],
257 | "source": [
258 | "import boto3\n",
259 | "import io\n",
260 | "import zipfile\n",
261 | "import json\n",
262 | "\n",
263 | "s3 = boto3.client('s3')\n",
264 | "sts_client = boto3.client(\"sts\")\n",
265 | "\n",
266 | "session = boto3.session.Session()\n",
267 | "\n",
268 | "account_id = sts_client.get_caller_identity()[\"Account\"]\n",
269 | "region = session.region_name\n",
270 | "\n",
271 | "bucket_name = \"mlops-%s-%s\" % (region, account_id)\n",
272 | "key_name = \"training_jobs/%s/trainingjob.zip\" % model_prefix\n",
273 | "\n",
274 | "zip_buffer = io.BytesIO()\n",
275 | "with zipfile.ZipFile(zip_buffer, 'a') as zf:\n",
276 | " zf.writestr('trainingjob.json', json.dumps(training_params))\n",
277 | " zf.writestr('deployment.json', json.dumps(deployment_params))\n",
278 | "zip_buffer.seek(0)\n",
279 | "\n",
280 | "s3.put_object(Bucket=bucket_name, Key=key_name, Body=bytearray(zip_buffer.read()))"
281 | ]
282 | },
283 | {
284 | "cell_type": "markdown",
285 | "metadata": {},
286 | "source": [
287 |     "### OK, now open the AWS console in another tab and go to the CodePipeline console to see the status of the build pipeline"
288 | ]
289 | },
290 | {
291 | "cell_type": "markdown",
292 | "metadata": {},
293 | "source": [
294 | "> Now, click on [THIS NOTEBOOK](02_Check%20Progress%20and%20Test%20the%20endpoint.ipynb) to see the progress and test your endpoint"
295 | ]
296 | },
297 | {
298 | "cell_type": "markdown",
299 | "metadata": {},
300 | "source": [
301 | "# A/B TESTS\n",
302 | "\n",
303 |     "If you take a look at the **deployment** parameters, you'll see that we enabled A/B tests for the **Production** endpoint. To try this, deploy the first model into production, then run section **1.3** again. Feel free to change some hyperparameter values in section **1.1** before starting a new training session.\n",
304 |     "\n",
305 |     "When you publish the second model into **Development**, the endpoint is updated and the model is replaced without compromising the user experience. This is the natural behavior of a SageMaker endpoint when you update it.\n",
306 |     "\n",
307 |     "After you approve the deployment into **Production**, the endpoint is updated and a second model is added to it. Now it's time to run some **A/B tests**. In the **Progress** notebook (link above), execute the last cell (test code) to show which model answered your request. Keep sending requests and you'll see the **Production** endpoint using both models A and B, respecting the proportion defined by **InitialVariantWeight** in the deployment params.\n",
308 |     "\n",
309 |     "In a real-life scenario you can monitor the performance of both models and then adjust the **Weight** of each variant, either to complete the transition to the new model (and remove the old one) or to roll back the deployment.\n",
310 |     "\n",
311 |     "To adjust the weight of each model (variant) in an endpoint, call the following function: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.update_endpoint_weights_and_capacities"
312 | ]
313 | }
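,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The weight-update call can be sketched like this. A minimal, hypothetical example: the endpoint and variant names below are placeholders, so check `describe_endpoint` for the real ones before sending anything.\n",
    "\n",
    "```python\n",
    "def shift_traffic(endpoint_name, weights, dry_run=True):\n",
    "    # Build the request for update_endpoint_weights_and_capacities;\n",
    "    # it is only sent when dry_run is False\n",
    "    params = {\n",
    "        'EndpointName': endpoint_name,\n",
    "        'DesiredWeightsAndCapacities': [\n",
    "            {'VariantName': name, 'DesiredWeight': w}\n",
    "            for name, w in weights.items()\n",
    "        ],\n",
    "    }\n",
    "    if not dry_run:\n",
    "        import boto3  # deferred so a dry run needs no AWS credentials\n",
    "        boto3.client('sagemaker').update_endpoint_weights_and_capacities(**params)\n",
    "    return params\n",
    "\n",
    "# Hypothetical names: drain traffic from the old variant\n",
    "request = shift_traffic('iris-model-production', {'model-a': 0.0, 'model-b': 1.0})\n",
    "```\n",
    "\n",
    "Setting a variant's weight to 0.0 effectively drains its traffic before you remove it from the endpoint config."
   ]
  }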
314 | ],
315 | "metadata": {
316 | "kernelspec": {
317 | "display_name": "conda_python3",
318 | "language": "python",
319 | "name": "conda_python3"
320 | },
321 | "language_info": {
322 | "codemirror_mode": {
323 | "name": "ipython",
324 | "version": 3
325 | },
326 | "file_extension": ".py",
327 | "mimetype": "text/x-python",
328 | "name": "python",
329 | "nbconvert_exporter": "python",
330 | "pygments_lexer": "ipython3",
331 | "version": "3.6.10"
332 | }
333 | },
334 | "nbformat": 4,
335 | "nbformat_minor": 2
336 | }
337 |
--------------------------------------------------------------------------------
/lab/02_TrainYourModel/02_Check Progress and Test the endpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 |     "# Now let's monitor the training/deployment process"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": null,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "import boto3\n",
17 | "import ipywidgets as widgets\n",
18 | "import time\n",
19 | "\n",
20 | "from IPython.display import display"
21 | ]
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "metadata": {},
26 | "source": [
27 | "## Helper functions"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": null,
33 | "metadata": {},
34 | "outputs": [],
35 | "source": [
36 | "def get_actions():\n",
37 | " actions = []\n",
38 | " executionId = None\n",
39 | " resp = codepipeline.get_pipeline_state( name=pipeline_name )\n",
40 | " for stage in resp['stageStates']:\n",
41 | " stageName = stage['stageName']\n",
42 | " stageStatus = None\n",
43 | " if stage.get('latestExecution') is not None:\n",
44 | " stageStatus = stage['latestExecution']['status']\n",
45 | " if executionId is None:\n",
46 | " executionId = stage['latestExecution']['pipelineExecutionId']\n",
47 | " elif stage['latestExecution']['pipelineExecutionId'] != executionId:\n",
48 | " stageStatus = 'Old'\n",
49 | " for action in stage['actionStates']:\n",
50 | " actionName = action['actionName']\n",
51 | " actionStatus = 'Old'\n",
52 | " if action.get('latestExecution') is not None and stageStatus != 'Old':\n",
53 | " actionStatus = action['latestExecution']['status']\n",
54 | " actions.append( {'stageName': stageName, \n",
55 | " 'stageStatus': stageStatus, \n",
56 | " 'actionName': actionName, \n",
57 | " 'actionStatus': actionStatus})\n",
58 | " return actions"
59 | ]
60 | },
61 | {
62 | "cell_type": "code",
63 | "execution_count": null,
64 | "metadata": {},
65 | "outputs": [],
66 | "source": [
67 | "def get_approval_token():\n",
68 | " resp = codepipeline.get_pipeline_state( name=pipeline_name )\n",
69 | " token = None\n",
70 | " # Get the approve train status token\n",
71 | " for stageState in resp['stageStates']:\n",
72 | " if stageState['stageName'] == 'DeployApproval':\n",
73 | " for actionState in stageState['actionStates']:\n",
74 | " if actionState['actionName'] == 'ApproveDeploy':\n",
75 | " if actionState.get('latestExecution') is None:\n",
76 | " return None\n",
77 | " latestExecution = actionState['latestExecution']\n",
78 | " if latestExecution['status'] == 'InProgress':\n",
79 | " token = latestExecution['token']\n",
80 | " return token"
81 | ]
82 | },
83 | {
84 | "cell_type": "code",
85 | "execution_count": null,
86 | "metadata": {},
87 | "outputs": [],
88 | "source": [
89 | "from sagemaker.serializers import CSVSerializer\n",
90 | "csv_serializer = CSVSerializer()\n",
91 | "def test_endpoint(endpoint_name, payload):\n",
92 | " resp = sm.invoke_endpoint(\n",
93 | " EndpointName=endpoint_name,\n",
94 | " ContentType='text/csv',\n",
95 | " Accept='text/csv',\n",
96 | " Body=csv_serializer.serialize(payload)\n",
97 | " )\n",
98 | " variant_name = resp['ResponseMetadata']['HTTPHeaders']['x-amzn-invoked-production-variant']\n",
99 | " return float(resp['Body'].read().decode('utf-8').strip()), variant_name"
100 | ]
101 | },
102 | {
103 | "cell_type": "code",
104 | "execution_count": null,
105 | "metadata": {},
106 | "outputs": [],
107 | "source": [
108 | "def approval(token, result):\n",
109 | " if token is None:\n",
110 | " return\n",
111 | " \n",
112 | " codepipeline.put_approval_result(\n",
113 | " pipelineName=pipeline_name,\n",
114 | " stageName='DeployApproval',\n",
115 | " actionName='ApproveDeploy',\n",
116 | " result=result,\n",
117 | " token=token\n",
118 | " )"
119 | ]
120 | },
121 | {
122 | "cell_type": "code",
123 | "execution_count": null,
124 | "metadata": {},
125 | "outputs": [],
126 | "source": [
127 | "def approve(b):\n",
128 | " result={\n",
129 | " 'summary': 'This is a great model! Put into production.',\n",
130 | " 'status': 'Approved'\n",
131 | " }\n",
132 | " approval(get_approval_token(), result) \n",
133 | " button_box.close()\n",
134 | " start_monitoring()"
135 | ]
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": null,
140 | "metadata": {},
141 | "outputs": [],
142 | "source": [
143 | "def reject(b):\n",
144 | " result={\n",
145 | " 'summary': 'This is a rubbish model. Discard it',\n",
146 | " 'status': 'Rejected'\n",
147 | " }\n",
148 | " approval(get_approval_token(), result)\n",
149 | " button_box.close()\n",
150 | " start_monitoring()"
151 | ]
152 | },
153 | {
154 | "cell_type": "code",
155 | "execution_count": null,
156 | "metadata": {},
157 | "outputs": [],
158 | "source": [
159 | "def start_monitoring():\n",
160 | " global button_box\n",
161 | " \n",
162 | " running = True\n",
163 | " while running:\n",
164 | " steps_ok = 0\n",
165 | " for k,action in enumerate(get_actions()):\n",
166 | " if action['actionStatus'] == 'Failed':\n",
167 | " bar.bar_style='danger'\n",
168 |     "            label.value='Oops! Something went wrong: Stage[{}] Action[{}]'.format(\n",
169 | " action['stageName'], action['actionName'])\n",
170 | " running = False\n",
171 | " return\n",
172 | "\n",
173 | " elif action['actionStatus'] == 'InProgress':\n",
174 | " if get_approval_token() is not None:\n",
175 | " display(button_box)\n",
176 | " running = False\n",
177 | " break\n",
178 | " elif action['actionStatus'] == 'Old':\n",
179 | " break\n",
180 | " elif action['actionStatus'] == 'Succeeded':\n",
181 | " steps_ok += 1\n",
182 | " \n",
183 | " label.value = \"Actions {}/{} - Current: Stage[{}] Action[{}]\".format( \n",
184 | " k+1,max_actions, action['stageName'], action['actionName'] )\n",
185 | " bar.value = steps_ok\n",
186 | "\n",
187 | " if steps_ok == max_actions:\n",
188 | " running = False\n",
189 | " else: \n",
190 | " time.sleep(2)"
191 | ]
192 | },
193 | {
194 | "cell_type": "markdown",
195 | "metadata": {},
196 | "source": [
197 | "## Job monitoring"
198 | ]
199 | },
200 | {
201 | "cell_type": "code",
202 | "execution_count": null,
203 | "metadata": {},
204 | "outputs": [],
205 | "source": [
206 | "codepipeline = boto3.client('codepipeline')\n",
207 | "sm = boto3.client('sagemaker-runtime')\n",
208 | "\n",
209 | "model_prefix='iris-model'\n",
210 | "pipeline_name = 'iris-model-pipeline'\n",
211 | "endpoint_name_mask='{}-%s'.format(model_prefix)"
212 | ]
213 | },
214 | {
215 | "cell_type": "code",
216 | "execution_count": null,
217 | "metadata": {},
218 | "outputs": [],
219 | "source": [
220 | "approve_btn = widgets.Button(description=\"Approve\", button_style='success', icon='check')\n",
221 | "reject_btn = widgets.Button(description=\"Reject\", button_style='danger', icon='close')\n",
222 | "approve_btn.on_click(approve)\n",
223 | "reject_btn.on_click(reject)\n",
224 | "button_box = widgets.HBox([approve_btn, reject_btn])\n",
225 | " \n",
226 | "max_actions = len(get_actions())\n",
227 | "label = widgets.Label(value=\"Loading...\")\n",
228 | "bar = widgets.IntProgress( value=0, min=0, max=max_actions, step=1, bar_style='info' )\n",
229 | "info_box = widgets.VBox([label, bar])\n",
230 | "\n",
231 | "display(info_box)\n",
232 | "start_monitoring()"
233 | ]
234 | },
235 | {
236 | "cell_type": "markdown",
237 | "metadata": {},
238 | "source": [
239 | "## Now, if everything went fine, we can test our models"
240 | ]
241 | },
242 | {
243 | "cell_type": "code",
244 | "execution_count": null,
245 | "metadata": {},
246 | "outputs": [],
247 | "source": [
248 | "%%time\n",
249 | "payload = [4.6, 3.1, 1.5, 0.2]\n",
250 | "\n",
251 |     "print( \"DEV\")\n",
252 | "print( \"Classifier: %s, Variant Name: %s\" % test_endpoint( endpoint_name_mask % ('development'), payload ) )\n",
253 | "\n",
254 | "print( \"\\nPRD\")\n",
255 | "print( \"Classifier: %s, Variant Name: %s\" % test_endpoint( endpoint_name_mask % ('production'), payload ) )"
256 | ]
257 | },
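  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "During A/B tests you can also pin a request to a specific production variant via the `TargetVariant` parameter of `invoke_endpoint`. A sketch with hypothetical names (the real variant names come from `describe_endpoint`):\n",
    "\n",
    "```python\n",
    "# Request parameters that pin a prediction to one variant;\n",
    "# 'iris-model-production' and 'model-b' are placeholder names\n",
    "request = dict(\n",
    "    EndpointName='iris-model-production',\n",
    "    TargetVariant='model-b',  # omit this key to let SageMaker route by weight\n",
    "    ContentType='text/csv',\n",
    "    Accept='text/csv',\n",
    "    Body='4.6,3.1,1.5,0.2',\n",
    ")\n",
    "# resp = sm.invoke_endpoint(**request)\n",
    "```"
   ]
  },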
258 | {
259 | "cell_type": "code",
260 | "execution_count": null,
261 | "metadata": {},
262 | "outputs": [],
263 | "source": []
264 | }
265 | ],
266 | "metadata": {
267 | "kernelspec": {
268 | "display_name": "conda_python3",
269 | "language": "python",
270 | "name": "conda_python3"
271 | }
272 | },
273 | "nbformat": 4,
274 | "nbformat_minor": 2
275 | }
276 |
--------------------------------------------------------------------------------
/lab/03_TestingHacking/01_Stress Test.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Stress Test\n",
8 | "\n",
9 |     "The idea of this code is to see how the production endpoint behaves when a **bunch** of requests arrives at it.\n",
10 |     "Let's simulate several users making predictions at the same time"
11 | ]
12 | },
13 | {
14 | "cell_type": "code",
15 | "execution_count": null,
16 | "metadata": {},
17 | "outputs": [],
18 | "source": [
19 | "import threading\n",
20 | "import boto3\n",
21 | "import numpy as np\n",
22 | "import time\n",
23 | "import math\n",
24 | "\n",
25 | "from multiprocessing.pool import ThreadPool\n",
26 | "from sklearn import datasets"
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": null,
32 | "metadata": {},
33 | "outputs": [],
34 | "source": [
35 | "sm = boto3.client(\"sagemaker-runtime\")\n",
36 | "\n",
37 | "endpoint_name_mask='iris-model-%s'\n",
38 | "\n",
39 | "iris = datasets.load_iris()\n",
40 | "dataset = np.insert(iris.data, 0, iris.target,axis=1)"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": null,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "from sagemaker.serializers import CSVSerializer\n",
50 | "\n",
51 |     "def predict(payload):\n",
52 |     "    csv_serializer = CSVSerializer()\n",
53 |     "    # split the label (first column) from the features\n",
54 |     "    X = payload[1:]\n",
55 |     "    y = payload[0]\n",
56 | " \n",
57 | " elapsed_time = time.time()\n",
58 | " resp = sm.invoke_endpoint(\n",
59 | " EndpointName=endpoint_name_mask % env,\n",
60 | " ContentType='text/csv',\n",
61 | " Accept='text/csv',\n",
62 | " Body=csv_serializer.serialize(X)\n",
63 | " )\n",
64 | " elapsed_time = time.time() - elapsed_time\n",
65 | " resp = float(resp['Body'].read().decode('utf-8').strip())\n",
66 | " return (resp == y, elapsed_time)"
67 | ]
68 | },
69 | {
70 | "cell_type": "code",
71 | "execution_count": null,
72 | "metadata": {},
73 | "outputs": [],
74 | "source": [
75 | "def run_test(max_threads, max_requests):\n",
76 | " num_batches = math.ceil(max_requests / len(dataset))\n",
77 | " requests = []\n",
78 | " for i in range(num_batches):\n",
79 | " batch = dataset.copy()\n",
80 | " np.random.shuffle(batch)\n",
81 | " requests += batch.tolist()\n",
82 |     "    print('Sending {} requests'.format(len(requests)))\n",
83 | "\n",
84 | " pool = ThreadPool(max_threads)\n",
85 | " result = pool.map(predict, requests)\n",
86 | " pool.close()\n",
87 | " pool.join()\n",
88 | " \n",
89 | " correct_random_forest=0\n",
90 | " elapsedtime_random_forest=0\n",
91 | " for i in result:\n",
92 | " correct_random_forest += i[0]\n",
93 | " elapsedtime_random_forest += i[1]\n",
94 | " print(\"Score classifier: {}\".format(correct_random_forest/len(result)))\n",
95 | "\n",
96 | " print(\"Elapsed time: {}s\".format(elapsedtime_random_forest))"
97 | ]
98 | },
99 | {
100 | "cell_type": "code",
101 | "execution_count": null,
102 | "metadata": {},
103 | "outputs": [],
104 | "source": [
105 | "env='production'"
106 | ]
107 | },
108 | {
109 | "cell_type": "code",
110 | "execution_count": null,
111 | "metadata": {},
112 | "outputs": [],
113 | "source": [
114 | "%%time\n",
115 | "print(\"Starting test 1\")\n",
116 | "run_test(10, 1000)"
117 | ]
118 | },
119 | {
120 | "cell_type": "code",
121 | "execution_count": null,
122 | "metadata": {},
123 | "outputs": [],
124 | "source": [
125 | "%%time\n",
126 | "print(\"Starting test 2\")\n",
127 | "run_test(100, 10000)"
128 | ]
129 | },
130 | {
131 | "cell_type": "code",
132 | "execution_count": null,
133 | "metadata": {},
134 | "outputs": [],
135 | "source": [
136 | "%%time\n",
137 | "print(\"Starting test 3\")\n",
138 | "run_test(150, 100000000)"
139 | ]
140 | },
141 | {
142 | "cell_type": "markdown",
143 | "metadata": {},
144 | "source": [
145 |     "> While this test is running, go to the **AWS Console** -> **SageMaker**, click on the **Endpoint**, and then open the **CloudWatch** monitoring logs to see the endpoint's behavior"
146 | ]
147 | },
148 | {
149 | "cell_type": "markdown",
150 | "metadata": {},
151 | "source": [
152 | "## In CloudWatch, mark the following three checkboxes\n",
153 | ""
154 | ]
155 | },
156 | {
157 | "cell_type": "markdown",
158 | "metadata": {},
159 | "source": [
160 | "## Then, change the following config, marked in RED\n",
161 | "\n",
162 | ""
163 | ]
164 | },
165 | {
166 | "cell_type": "markdown",
167 | "metadata": {},
168 | "source": [
169 |     "## Now, while your stress test is still running, you will see the Auto Scaling alarm fire like this, after 3 datapoints above the configured Invocations Per Instance target\n",
170 | "\n",
171 | "\n",
172 | "\n",
173 |     "When this happens, endpoint autoscaling starts adding more instances to your cluster. You can observe in the graph from the previous image that, after new instances are added to the cluster, the **Invocations** metric grows."
174 | ]
175 | },
176 | {
177 | "cell_type": "markdown",
178 | "metadata": {},
179 | "source": [
180 | "## Well done!"
181 | ]
182 | }
183 | ],
184 | "metadata": {
185 | "kernelspec": {
186 | "display_name": "conda_python3",
187 | "language": "python",
188 | "name": "conda_python3"
189 | },
190 | "language_info": {
191 | "codemirror_mode": {
192 | "name": "ipython",
193 | "version": 3
194 | },
195 | "file_extension": ".py",
196 | "mimetype": "text/x-python",
197 | "name": "python",
198 | "nbconvert_exporter": "python",
199 | "pygments_lexer": "ipython3",
200 | "version": "3.6.10"
201 | }
202 | },
203 | "nbformat": 4,
204 | "nbformat_minor": 2
205 | }
206 |
--------------------------------------------------------------------------------