├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── Evaluating a Project └── Evaluation.md ├── Images └── Autopilot.png ├── LICENSE ├── Project Writeups ├── Advanced Data Science - XRay Image Analysis.md ├── Hospital Costs with Autopilot.md ├── Hosting and MLOps.md └── README.md ├── README.md └── Starter Notebooks ├── Advanced Data Science - XRay Analysis ├── Computer Vision for XRay Analysis.ipynb ├── ground_truth_od.py ├── im2rec.py ├── images │ ├── gt_label_output.png │ └── tensorplot.gif ├── src │ ├── requirements.txt │ └── ssd_entry_point.py ├── template.manifest └── tensor_plot.py ├── Cost Prediction ├── Cost Prediction with Autopilot.ipynb └── images │ ├── shap_1.png │ ├── shap_2.png │ ├── shap_3.png │ └── shap_4.png └── MLOps and Hosting ├── Hosting Models on SageMaker.ipynb ├── install-run-notebook.sh ├── model.tar.gz ├── src ├── requirements.txt └── train.py ├── test_set.csv └── train_set.csv /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 
8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *master* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute on.
Our projects use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), so looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | -------------------------------------------------------------------------------- /Evaluating a Project/Evaluation.md: -------------------------------------------------------------------------------- 1 | # Questions to Evaluate a Machine Learning Project 2 | 3 | This set of questions can be used during a machine learning course to introduce technical people without a prior background in ML, and to help them assess whether their project is well designed according to machine learning standards. 4 | 5 | ## Infrastructure 6 | 7 | * How long does it take to train your model? Can you use streaming data to reduce training time? Can you split the job across multiple instances to train faster? 8 | * What infrastructure are you using to train your models? Is it the lowest possible cost? Have you considered using GPUs to lower your training time? 9 | * Where are you storing your data; is that the best solution?
10 | * What DevOps framework do you have to continuously integrate changes as you make them? 11 | * Where are you developing your model, and is it the best choice for your scenario? 12 | 13 | ## Conceptual 14 | 15 | * Does the data that you're using reflect the real world? 16 | * What actually impacts the real world prediction problem, and is that in your data set? 17 | 18 | ## Data transformation 19 | 20 | * Did you normalize your data? 21 | * Did you randomly shuffle your data? 22 | * Did you remove any outliers from your data? 23 | * How did you handle missing or nonsensical values in your data? 24 | * How are you handling any sequential elements of your data set? 25 | * Did you remove any bias from it? Have you thought about the ethical implications of your machine learning system, and the fact that the data set you are using is potentially biased? 26 | * In the case of transfer learning, does your data match the model's input expectations (e.g., image size, image format, color correction)? 27 | 28 | ## Method 29 | 30 | * Which model did you select, and why? 31 | * How are you evaluating your model? 32 | * Is your model overfitting, and what are you doing to counteract that? 33 | * What are the limitations of your model, and what are its strong points? 34 | * What are the guardrails on your model's performance metrics? What is the minimum and maximum accuracy you expect to achieve? 35 | -------------------------------------------------------------------------------- /Images/Autopilot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-architecting-for-ml-hcls/9236be67abb200b6073b2b17079c9c368326c353/Images/Autopilot.png -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright Amazon.com, Inc. or its affiliates.
All Rights Reserved. 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 7 | the Software, and to permit persons to whom the Software is furnished to do so. 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 10 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 11 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 12 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 13 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 14 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 15 | 16 | -------------------------------------------------------------------------------- /Project Writeups/Advanced Data Science - XRay Image Analysis.md: -------------------------------------------------------------------------------- 1 | # Advanced SageMaker Features for Data Science in HCLS 2 | You've seen the lectures, you've watched the videos, you've stepped through a hands-on lab. Now it's time to take your usage of SageMaker to the next level by taking advantage of some advanced SageMaker features for training. 3 | 4 | In particular, in this project you will train your own object detection model using the built-in Object Detection algorithm. This picks up from the results of a Ground Truth data labelling job, using a manifest file to train on 1000 images. You'll train on spot instances using a sizeable GPU. You will also connect this job to a larger experiment to manage and view your progress. You'll learn how to convert your PNG images to RecordIO for optimized throughput.
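To make the manifest format concrete, here is a minimal sketch of parsing one augmented-manifest line of the kind Ground Truth produces for bounding-box jobs. The label attribute name (`xray-labels`), the S3 path, and the box values are all hypothetical stand-ins — yours will match the name and output of your own labelling job:

```python
import json

# One hypothetical line of a Ground Truth augmented manifest (JSON Lines format).
# "xray-labels" is an assumed label-attribute name; the real one matches your job.
line = json.dumps({
    "source-ref": "s3://my-bucket/xrays/img_0001.png",
    "xray-labels": {
        "image_size": [{"width": 1024, "height": 1024, "depth": 3}],
        "annotations": [
            {"class_id": 0, "left": 410, "top": 120, "width": 180, "height": 240}
        ],
    },
})

record = json.loads(line)
boxes = record["xray-labels"]["annotations"]
print(record["source-ref"], len(boxes))
```

Each line pairs a `source-ref` image location with the boxes a labeller drew, which is the shape of input the built-in Object Detection algorithm consumes when you train directly from an augmented manifest instead of RecordIO.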
5 | 6 | Optionally, if you have extra time, you're welcome to bring your own single shot detection algorithm into script mode for SageMaker. If you're able to do that, you can leverage SageMaker Debugger to analyze the gradients of your model, and even produce an interactive TensorPlot! 7 | 8 | # Built-in Object Detection for X-Ray Analysis 9 | In 2017 the National Institutes of Health introduced to the public scientific community one of the largest chest XRay datasets then available. Clocking in at 112,000 images drawn from more than 30,000 patients, the dataset is labelled to identify potentially cancerous masses within the images, helping accelerate both scientific analysis and positive patient outcomes through faster diagnosis. 10 | 11 | In this project you'll leverage a subset of the NIH dataset, specifically a sample of 1000 images that we have previously tested with SageMaker's data labelling solution, Ground Truth. You will use the output of a Ground Truth labelling job, where a labeller manually drew bounding boxes around the neck and/or trachea from these images. We're going to use the output from that job as the input for an Object Detection training job. 12 | 13 | ## Accessing the Data and Preparing the Training Job 14 | To access the data and get started on your project, please navigate to the following link: 15 | 16 | - https://github.com/aws-samples/amazon-sagemaker-architecting-for-ml-hcls/blob/main/Starter%20Notebooks/Advanced%20Data%20Science%20-%20XRay%20Analysis/Computer%20Vision%20for%20XRay%20Analysis.ipynb 17 | 18 | This will take you all the way to training an Object Detection algorithm using the built-in image. This job should take about 35 minutes to run on the GPUs provided. Notice that your job is connected to SageMaker Experiments, so you should be able to view the results in the Experiments tab. 19 | 20 | Also notice that you are training on SageMaker spot! Go ahead and tell us how much you saved on that job.
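One way to answer that: SageMaker prints the savings in the training job's log, and you can also compute the figure yourself from the `TrainingTimeInSeconds` and `BillableTimeInSeconds` fields returned by `describe_training_job`. A small sketch, with made-up durations standing in for a real job's values:

```python
def spot_savings_percent(training_time_s: int, billable_time_s: int) -> float:
    """Percent saved vs. on-demand: on spot you pay only for billable seconds."""
    return round(100.0 * (1.0 - billable_time_s / training_time_s), 1)

# Hypothetical values, e.g. pulled from:
#   sm.describe_training_job(TrainingJobName=job_name)
training_time_s = 2100   # total elapsed training seconds
billable_time_s = 630    # seconds actually billed on the spot instance
print(spot_savings_percent(training_time_s, billable_time_s))  # 70.0
```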
21 | 22 | ## Extending by bringing your own model 23 | If you have time, you are welcome to extend the solution by bringing your own single-shot-detection algorithm. You can use the script we provided to convert your images to RecordIO if you prefer. 24 | 25 | The advantage of bringing your own model in this case is that you'll be able to use SageMaker Debugger. SageMaker Debugger works out of the box with script mode for TensorFlow, PyTorch and MXNet models. This means you can set up a _debug hook config_, which will listen to the tensors recorded during your training job and let you analyze them. On the one hand, you can use a built-in debugger image which applies up to 18 built-in rules to your tensors, covering things like whether your loss is decreasing, whether your gradients are vanishing or exploding, and even feature importance. 26 | 27 | On the other hand, you can download the tensors produced during your job and _plug these into a local visualization solution._ We've included some starter code for doing this with our provided `TensorPlot` framework, which will let you develop a local interactive visualization for assessing your model. 28 | 29 | SageMaker Debugger has examples for class activation maps, tensor plots, BERT attention head visualizations, and even model pruning. You're welcome to step through the Debugger examples listed below, play with these, and connect them to the NIH or other dataset as you prefer. 30 | 31 | - https://github.com/aws/amazon-sagemaker-examples/tree/master/sagemaker-debugger -------------------------------------------------------------------------------- /Project Writeups/Hospital Costs with Autopilot.md: -------------------------------------------------------------------------------- 1 | # Predicting Hospital Costs Per Patient with SageMaker Autopilot 2 | 3 | ### Your Problem 4 | Medicare is a national health insurance program, administered by the Centers for Medicare & Medicaid Services (CMS).
It is the primary health insurance program for Americans aged 65 and older. Medicare has published historical data showing hospitals' average spending for Medicare Part A and Part B claims based on different claim types and claim periods, covering 1 to 3 days prior to hospital admission up to 30 days after discharge from hospital admission. These hospital spending figures are price-standardized and non-risk-adjusted, since risk adjustment is done at the episode level of the claims spanning the entire period during the episode. The hospital average costs are listed against the corresponding state-level and national-level average costs. 5 | 6 | You have just joined the data science team at Well-Forecasted Hospital. Your goal is to use the Medicare historical spending data to estimate potential costs per patient. 7 | 8 | ### Your Dataset 9 | Medicare has published a dataset showing average hospital spending on Medicare Part A and Part B claims. Both links below refer to the same data set: one is listed on the healthdata.gov site and the other on the data.medicare.gov site. The data dictionary is described in the link marked as #2 below. The dataset has hospital spending data from the year 2018, with 67,826 rows spanning 13 columns. For the purposes of our analysis and machine learning, we use the dataset in CSV (comma-separated values) format. 10 | 1. https://healthdata.gov/dataset/medicare-hospital-spending-claim 11 | 2. https://data.medicare.gov/Hospital-Compare/Medicare-Hospital-Spending-by-Claim/nrth-mfg3 12 | 13 | A direct link to download the dataset to a local computer - https://data.medicare.gov/api/views/nrth-mfg3/rows.csv?accessType=DOWNLOAD 14 | 15 | ### Accessing and cleaning your data 16 | To make this easier for you, we've written a starter notebook that downloads the data for you and performs some basic manipulations.
17 | 18 | See the link to the starter notebook here: 19 | - https://github.com/aws-samples/amazon-sagemaker-architecting-for-ml-hcls/blob/main/Starter%20Notebooks/Cost%20Prediction/Cost%20Prediction%20with%20Autopilot.ipynb 20 | 21 | ### Analyzing your data and performing feature engineering 22 | Once you've loaded the cleaned data into your pandas dataframe, spend a bit of time exploring the fields by generating some plots and histograms. We started by using `pandas.plotting.scatter_matrix` and looking at the claims field. 23 | 24 | We also found it helpful to use `sklearn.feature_selection.SelectKBest` based on `sklearn.feature_selection.chi2` to identify the best candidate X features. 25 | 26 | Feel free to experiment here with your favorite feature engineering and data analysis steps. 27 | 28 | ### Train a model using your X and Y variables with SageMaker Autopilot 29 | We recommend using SageMaker Autopilot as a simple way to automate both data analysis and model tuning. In the notebook provided, you'll be able to easily train your own set of 250 models using the provided dataset and wrapper Python code to leverage the Autopilot API. 30 | 31 | The Autopilot job will take quite a bit of time to run. If you are using SageMaker Studio you should be able to monitor the job via the Experiments tab. Once the job moves into "Feature Engineering," you should be able to open up both the data transformation and candidate generation notebooks. Do that. Open them up on your local Studio domain, step through them, and try to understand precisely what they are doing for you. Remember, all of this code is generated for your specific dataset! 32 | 33 | ### Deploy your solution to a RESTful API 34 | When you have time, you'll notice that the built-in models and the script-mode managed containers within SageMaker come with a `model.deploy()` method. This will automatically create a RESTful API hosting your model! Give that a try.
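Looping back to the feature-selection suggestion above, a minimal `SelectKBest`/`chi2` sketch might look like the following; a synthetic non-negative matrix stands in for the real Medicare columns, since `chi2` requires non-negative features:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for the Medicare claims table: 100 rows, 5 non-negative features.
X = rng.integers(0, 50, size=(100, 5)).astype(float)
# Make the label depend on the first feature so chi2 has something to find.
y = (X[:, 0] > 25).astype(int)

selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)        # (100, 2)
print(selector.get_support())  # boolean mask over the 5 candidate features
```

On the real dataset you would pass your encoded feature columns as `X` and your target spending column as `y`, then keep only the columns flagged by `get_support()` before handing the table to Autopilot.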
35 | 36 | What's nice about Autopilot is that __it will deploy the featurizing code along with the best model.__ That is to say, out of all the data transformation code that Autopilot generates within the candidates, it will deploy the specific featurizing code that maps to your best candidate. This is wrapped inside what SageMaker calls __an inference pipeline__. 37 | 38 | ### Extend with bringing your own feature engineering script into SageMaker Autopilot 39 | You may remember that this project opened up with some basic data transformation before we even plugged it into Autopilot. If you have time, your task is to port that code into the _bring your own feature engineering_ capabilities of Autopilot. 40 | 41 | Step through the example right here, then modify it to point to your dataset and the code from the starter notebook. 42 | - https://github.com/aws/amazon-sagemaker-examples/blob/master/autopilot/custom-feature-selection/Feature_selection_autopilot.ipynb -------------------------------------------------------------------------------- /Project Writeups/Hosting and MLOps.md: -------------------------------------------------------------------------------- 1 | # Hosting Models on SageMaker for Rapid Diagnosis 2 | You've just joined the machine learning team at a local hospital. They have completed a POC on diagnosing breast cancer, and need your help hosting this model on SageMaker. 3 | 4 | Not only does the team want to build a RESTful API, they want to take advantage of the massive array of features and capabilities that SageMaker brings to the table. 5 | 6 | In this module you will: 7 | - host your model on SageMaker 8 | - enable and test autoscaling on your endpoint 9 | - monitor your model 10 | - push a new model into production on that endpoint 11 | - build an automatic retraining system with Lambda 12 | 13 | ## 1 & 2. Access Your Model Artifact, Training and Inference Code.
Package in the SageMaker SDK 14 | The good news is that your model is already built! You have a pretrained model artifact, defined in SKLearn. This model looks at tabular data and returns a predicted likelihood that the patient has breast cancer. You also have the exact training and inference code necessary to develop this model, written as a Python script. 15 | 16 | Access these artifacts and package them up within the SageMaker SDK following the example here: 17 | - https://github.com/aws-samples/amazon-sagemaker-architecting-for-ml-hcls/blob/main/Starter%20Notebooks/MLOps%20and%20Hosting/Hosting%20Models%20on%20SageMaker.ipynb 18 | 19 | ## 3 & 4. Create an Endpoint and Enable Autoscaling 20 | Once you have the artifacts packaged within the SageMaker SDK, creating an endpoint should be as simple as calling `model.deploy()`. Follow the notebook for this. 21 | 22 | After this is completed, you'll use boto3 to enable autoscaling on that endpoint. Notice that this is also a single function call, albeit with a few more parameters. 23 | 24 | ## 5. Enable Model Monitor on your Endpoint 25 | The real-world environment that our data scientists trained their model against actually changes over time - we need to make sure that the trained model we are using stays up to date with those changes. The way we're going to do that on SageMaker is by _setting up Model Monitor on our endpoints._ This starts with our training data: we'll use the SageMaker SDK to spin up a processing job that takes our training data in S3 and learns statistical benchmarks on it. 26 | 27 | The built-in processing image for Model Monitor uses _Amazon Deequ_, an open-source solution that ensures data quality at high volumes. It's written in PySpark, so it scales quite well. After your processing job has finished you're welcome to view the thresholds and modify them as you or your data scientists prefer.
- https://github.com/awslabs/deequ 29 | 30 | You'll also specify a percentage of data capture, that is, the amount of your traffic you want to store after it hits your endpoint. Then, you'll set up _a monitoring schedule_ to run monitoring jobs which use the thresholds you learned during the previous step, and simply apply those to the data captured from your endpoint. 31 | 32 | ## 6. Improve your model with AutoGluon 33 | You might find that the quality of your model drops over time, or possibly wasn't even that great to begin with. AutoGluon is an easy way to improve this - it supports tabular data, imaging, and natural language processing. AutoGluon also has data augmentation capabilities, so it works with smaller datasets like the one we have here. Step six entails running an AutoGluon job, using script mode in SageMaker, to find a better version of the model. 34 | 35 | ## 7. Tune your model and redeploy 36 | Another way of finding a better model is using the automatic model tuner. Sadly, that doesn't improve the performance of our model in this case, unlike AutoGluon, which brings us up to 95% accuracy. But so you can see how it's done, we've included the code for both running a hyperparameter tuning job and redeploying to the same endpoint. 37 | 38 | ## 8. Automate the workflow 39 | For the sake of time we've included a very simple way of automating this workflow - setting up the local `notebook-runner` toolkit and simply running the entire notebook on its own processing job. While you might not use this instead of a true MLOps pipeline for a real-time application, you can certainly get a lot of value out of running notebooks automatically and with a CLI.
See the blog post and GitHub code suite for more details here: 40 | - https://aws.amazon.com/blogs/machine-learning/scheduling-jupyter-notebooks-on-sagemaker-ephemeral-instances/ 41 | 42 | ## Extensions 43 | If you have spare time, you're welcome to add the AutoGluon deploy capabilities as referenced in this example notebook: 44 | - https://github.com/aws/amazon-sagemaker-examples/tree/master/advanced_functionality/autogluon-tabular 45 | 46 | You can also step through setting up an MLOps pipeline using Lambda, Step Functions, and Apache Airflow as referenced here: 47 | - https://github.com/aws-samples/mlops-amazon-sagemaker-devops-with-ml -------------------------------------------------------------------------------- /Project Writeups/README.md: -------------------------------------------------------------------------------- 1 | # SageMaker Projects for HCLS 2 | In this course, we have a few different types of projects for you. We have an introduction to data science on SageMaker project, which is great for people who are net new to machine learning and/or to SageMaker. We also have an advanced data science on SageMaker project, for those of you who want to spend more time learning about advanced SageMaker features for training. We also have a project for operationalizing SageMaker, for those of you who want to focus more on operations than on training. All of these projects start with SageMaker Studio and utilize SageMaker for both training and hosting; the big differences are how much time you spend on each, where you'll be spending your time, and how you'll frame your final deliverables. 3 | 4 | By the end of the course tomorrow, you should have a new notebook developed that solves a meaty problem! You should be able to take this notebook back to work with you and drive value for your company and team. Remember, your AWS Solutions Architect is at the ready to help you out, so don't be shy! Ask for help early and often.
5 | 6 | And don't forget the secret sauce. Make sure you have some fun! 7 | 8 | --- 9 | 10 | ## Introduction to Data Science on SageMaker 11 | If you are new to data science, or if this is your absolute first SageMaker workshop, you might want to focus on projects in this category. We'll show you a data science problem, tell you how to access that dataset, give you some basic code for cleaning it, then show you how to train a model on SageMaker. Quite a bit of the problem solving is actually on you, however, as we want to help you stretch your data science skills. You'll be asked to perform feature engineering, analyze your data, and train multiple models until you find the best approach. 12 | 13 | #### Introductory Data Science Projects 14 | - Predicting Hospital Costs per Patient with Autopilot 15 | 16 | ## Advanced Data Science on SageMaker 17 | If you've already had a SageMaker workshop before, and you want to focus on advanced SageMaker features for data science, you might pick the `Advanced Data Science on SageMaker` project. This will introduce you to the built-in object detection algorithm, manifest files, Ground Truth, Debugger, script mode, spot instances, experiments, and more! 18 | 19 | This project focuses on XRay analysis. You'll train an object detection algorithm to identify the throat and trachea within 1000 NIH XRay images. 20 | 21 | ## SageMaker in Production 22 | A different project is for those who are more interested in putting SageMaker into production. This project focuses more on the operationalizing aspects of SageMaker, such as hosting, autoscaling, automation with Lambda, monitoring, workflow orchestration, and MLOps. You will start with pre-existing model artifacts and training / inference code, and then build a system that highlights the key tenets we need when going into production with SageMaker. This project is called `Hosting and MLOps`.
23 | 24 | You'll also get to see how to use both automatic model tuning and AutoGluon to quickly and dramatically improve the performance of your model. 25 | 26 | 27 | --- 28 | 29 | # FAQs 30 | __1. If I pick a production project, will I still learn how to train a model?__ Yes, definitely! You just won't spend all of your time there. You'll spend maybe 15% of your time training a model, but most of your time is on getting those endpoints up and running, then monitoring and updating them. 31 | 32 | __2. If I am here to learn about data science, what should I focus on?__ Go ahead and pick a project in the "Introduction to Data Science" category. Generally you'll be learning about how to use Autopilot. 33 | 34 | __3. So I'm going to need to write some of my own code for these?__ Yes, and that's because for the people who want to learn what data science is all about, there's nothing better than attacking your own data set and getting it to work for you. That also goes for the projects about running in production. We'll give you a framework for how to develop these, give you hints, and point to resources, then your AWS Solutions Architect can help you take that project all the way home. 35 | 36 | __4. How far should I get on these projects?__ By the end of the day today, you should have stepped through your first lab in your group. Then you should have accessed your data, cleaned it, and started thinking about how to analyze it. These projects are long, and some of the steps will take some time to run, so just get as far as you can. You'll still have most of the day tomorrow to work on them. 37 | 38 | __5. Will I get to see the solutions for these projects?__ Yes! At the end of the session tomorrow, we'll have our experts walk through the solution set.
-------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Architecting For Machine Learning on Amazon SageMaker in Health Care and Life Science 2 | Welcome to the art and science of machine learning! During this 2-day accelerator course you will quickly learn about the theory and application of machine learning for HCLS applications, with a strong focus on the AWS cloud and Amazon SageMaker. All of our projects come straight from the Health Care and Life Science domain space, so if you're familiar with the needs of analyzing medical imaging data, patient costs, dermatology, and even genomic analysis, you'll feel right at home. 3 | 4 | This accelerator is designed for data scientists who are new to AWS, and architects and developers who are new to machine learning. You will spend two days performing data science tasks: training models, evaluating them, analyzing data, etc. After this two-day period you will be better suited to continue building data science solutions on AWS, designing architectural requirements for these, or supporting teams who currently do this. 5 | 6 | We will cover: 7 | - Statistical machine learning 8 | - Deep learning 9 | - Feature engineering 10 | - Deploying a model into production 11 | - Model evaluation and comparison 12 | - SageMaker deep dive: Studio, notebooks, training jobs, endpoints, model monitor, etc 13 | 14 | This course is designed primarily for Python developers. But since it is group-based, you will still have a great time even if you don't wrangle Python for your day job. We recommend reviewing Python programming using the statistical package Pandas. We also recommend having a Cloud Practitioner AWS Certification, but it is not required. Lastly, we recommend the book listed below. It is an excellent read, and clearly demonstrates all important concepts.
The syntax might be a bit outdated at this point, but the concepts are still spot on. 15 | - https://pythonprogramming.net/data-analysis-python-pandas-tutorial-introduction/ 16 | - https://aws.amazon.com/certification/certified-cloud-practitioner/ 17 | - [Deep Learning with Python by Francois Chollet](https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438) 18 | 19 | ## Agenda 20 | __Day One:__ 21 | - Learn about ML on AWS 22 | - Go through a sample lab 23 | - Break into teams and focus on a new machine learning project 24 | __First Goal:__ Download your dataset to an S3 bucket, create a SageMaker Studio domain, and load your data into a Pandas dataframe. 25 | 26 | __Day Two:__ 27 | - Learn about feature engineering on AWS 28 | - Finish your first set of engineered features 29 | - Train your first model 30 | - Learn about model evaluation on AWS 31 | - Tune your model using SageMaker automatic model tuning 32 | - Learn about putting your model into production on SageMaker 33 | __Deliverable:__ Demo your notebook to your colleagues! 34 | 35 | ## What you'll need 36 | - Your own laptop 37 | - GitHub account to share code with your project partners 38 | - Kaggle account to download data sets 39 | 40 | We will provide you with an AWS account for this course. However, if you would like to bring your own dataset and use the time to build your own project, you're welcome to do that! We ask that you use your own AWS account in that case. 41 | -------------------------------------------------------------------------------- /Starter Notebooks/Advanced Data Science - XRay Analysis/Computer Vision for XRay Analysis.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Object Detection in Chest Xrays\n", 8 | "\n", 9 | "This workshop uses a portion of the NIH Chest Xray dataset.
Specifically, we will use about 1,000 images where we will predict the location of the trachea and throat of the patient.\n", 10 | "\n", 11 | "In addition, you'll learn about a variety of SageMaker features for training. In this lab we will:\n", 12 | "1. Download and prepare the result of a Ground Truth labelling job for xray object detection\n", 13 | "2. Visualize this dataset locally\n", 14 | "3. Train an object detection model using the built-in object detection algorithm from SageMaker\n", 15 | "4. Leverage GPUs and spot instances for running the training job\n", 16 | "5. Set up our own model using script mode, leveraging GluonCV\n", 17 | "6. Leverage SageMaker Debugger for this job\n", 18 | "7. Visualize the network for our model locally\n", 19 | "8. View and track all of this progress using SageMaker Experiments\n", 20 | "\n", 21 | "---" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": null, 27 | "metadata": {}, 28 | "outputs": [], 29 | "source": [ 30 | "!pip install --upgrade pip\n", 31 | "!pip install matplotlib\n", 32 | "!pip install imageio\n", 33 | "!pip install --upgrade awscli\n", 34 | "!pip install --upgrade boto3\n", 35 | "!pip install sagemaker-experiments\n", 36 | "!pip install --upgrade sagemaker" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "## Enable SageMaker Experiments\n", 44 | "First, let's create an experiment so we can track this job and all of our assets."
45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": null, 50 | "metadata": {}, 51 | "outputs": [], 52 | "source": [ 53 | "import boto3\n", 54 | "import time\n", 55 | "from smexperiments.experiment import Experiment\n", 56 | "\n", 57 | "sm = boto3.client('sagemaker')\n", 58 | "\n", 59 | "experiment_name = f\"xray-object-detection-{int(time.time())}\"\n", 60 | "experiment = Experiment.create(experiment_name=experiment_name, \n", 61 | " description=\"Training an object detection model on XRay data\", \n", 62 | " sagemaker_boto_client=sm)" 63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "metadata": {}, 68 | "source": [ 69 | "Now you can open the Experiments tab on the left-hand side, and you should see a new experiment!" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "---\n", 77 | "# Download and Prepare NIH Images\n", 78 | "Next, we are going to access those 1,000 images from the NIH dataset. Please ask your AWS SA for a link to the dataset. This is a time-limited presigned URL, so make sure to use it quickly!"
79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": null, 84 | "metadata": {}, 85 | "outputs": [], 86 | "source": [ 87 | "#Change to URL sent by Workshop leader\n", 88 | "DATA_SOURCE = 'https://nih-xray-data.s3.amazonaws.com/compressed-image-file/images.tar.gz?AWSAccessKeyId=ASIASUWHP42B3EFFCQIY&Signature=hPONsLLap26VOe6WNnBe0NGea6I%3D&x-amz-security-token=IQoJb3JpZ2luX2VjEMX%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLWVhc3QtMSJGMEQCIAq%2Bx85gM9%2BMmWykkKE5qEJFeQOm5EDYUddVPwyrFtbtAiBqn3Sh%2FuRX3%2FS24Z7mRL670ouRjUrHqPJYmzQ1WULeuCrFAwgdEAAaDDE4MTg4MDc0MzU1NSIMPZ72cjbIbsy9QqicKqIDWl75J2dA07YuBGAJ6dUNuonWmfujluzIHeuWtIVL4GWi%2FnRESP3MNj1ddB2RbSdrpIKcE7PNhaPU99Ih3ld96hP%2BVYJ1DQbsh12zpvaPVrE2oYnG2TcTSAWqsaX5yIhUtG2VHyZui48aK8MihEVFhXtXHYLkRIE60hUskjMOrjbG%2Bm5MxFIkGyX%2BT69aI9wRlKs%2BN82XtstmsqJcMdgcdKls6KtGNJ5uY8poRWtEqNAVRAWcePMJbZnhmJm%2Bleanr52CXc6sp9QNdWhBrUZbi9rgIYYDR2nsoJVyhP0H%2FE0Yqc1bYaYgRRJhciXmlMSq%2BXdArh5awwnI%2Fpk8XhIQr4ouzzxV8nRTk4yK54JuyvoeviF7utmOz96eXCk%2BkNqqhHAlf%2FtR%2FkaOX5c%2BkHMa23Qr0X2BTxqFGwbste2ANPjRZc9DHmFP0hSDiWv0MgRtfMZxaLEkrpkYURE2Fy749TjdxzrxLPvdxi5R2UP4hCImAcwQvnF0VMW22B9oxJIvvfFBwVwpYN2Am2AmqJ7TTDxuLxnaUvyX0N4u8UL%2FtgcSjzDFgOL8BTrHAe7nptBN3%2BitZjpZ4dIGMC0bhHJ1rP2y4WZv6zGLcjGmRhIWllldzkrK7Ij2I2O8zPVbORGKB9wUdJZ31JaWiuO5o0cFyOBopLV1VjyN%2B4BwvLExUJWUXlL3QbNfaQD8GxfxZDVCTpK5T2BRJsID0s2dIcZvjlNiiKZYzSWh%2BZx52Ixs6IPcy9Gh1EfFMgpeAKK7rl1AaeWFF4VCEW1ixMyoa16U0CP1KfA2eXlhw3SsJrtQ25gx7OaXT3wywviml1lLT6Fu9pM%3D&Expires=1604434685'" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": null, 94 | "metadata": {}, 95 | "outputs": [], 96 | "source": [ 97 | "import sagemaker\n", 98 | "\n", 99 | "BUCKET = sagemaker.Session().default_bucket()\n", 100 | "\n", 101 | "PREFIX='hcls-xray' #Change to your directory/prefix\n", 102 | "\n", 103 | "IMAGE_FILE = 'image_data.tar.gz' #do not edit\n" 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": null, 109 | "metadata": {}, 110 | "outputs": [], 111 | 
"source": [ 112 | "#first we download the compressed data from the bucket\n", 113 | "!wget \"$DATA_SOURCE\" -O image_data.tar.gz\n", 114 | "#then we will decompress the images to be used for training and validation\n", 115 | "!tar -xf image_data.tar.gz\n", 116 | "#now copy the data to S3\n", 117 | "!aws s3 cp --recursive --quiet images s3://$BUCKET/$PREFIX/image_data/\n", 118 | "print('Files uploaded to S3')" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": null, 124 | "metadata": {}, 125 | "outputs": [], 126 | "source": [ 127 | "def create_new_manifest_file(input_file, output_file):\n", 128 | "\n", 129 | " template_manifest=open(input_file).readlines()\n", 130 | " output_manifest=[]\n", 131 | " for i in template_manifest:\n", 132 | " i=i.strip()\n", 133 | " i=i.replace('BUCKET',BUCKET) #have the manifest point to the actual bucket each individual is using for the workshop\n", 134 | " i=i.replace('PREFIX',PREFIX) #have the manifest point to the actual bucket each individual is using for the workshop\n", 135 | "\n", 136 | " output_manifest.append(i)\n", 137 | " f_out=open(output_file,'w')\n", 138 | " print(*output_manifest,file=f_out,sep=\"\\n\")\n", 139 | " f_out.close()\n", 140 | " \n", 141 | " return output_manifest\n", 142 | " \n", 143 | "output_manifest = create_new_manifest_file('template.manifest', 'output.manifest')" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": {}, 149 | "source": [ 150 | "Let's inspect the contents of this labeled manfiest file. " 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": null, 156 | "metadata": {}, 157 | "outputs": [], 158 | "source": [ 159 | "json.loads(output_manifest[0])" 160 | ] 161 | }, 162 | { 163 | "cell_type": "markdown", 164 | "metadata": {}, 165 | "source": [ 166 | "Now let's copy the manifest file out to S3." 
167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": null, 172 | "metadata": {}, 173 | "outputs": [], 174 | "source": [ 175 | "!aws s3 cp output.manifest s3://$BUCKET/$PREFIX/output.manifest" 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "metadata": {}, 181 | "source": [ 182 | "---\n", 183 | "# Analyze Local Data \n", 184 | "Next, let's open up a few of those image files to make sure we know what we're dealing with. Remember, we are picking up after someone has finished labelling these images with SageMaker Ground Truth! " 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": null, 190 | "metadata": {}, 191 | "outputs": [], 192 | "source": [ 193 | "%matplotlib inline\n", 194 | "%load_ext autoreload\n", 195 | "%autoreload 2\n", 196 | "import os\n", 197 | "from collections import namedtuple\n", 198 | "from collections import defaultdict\n", 199 | "from collections import Counter\n", 200 | "import itertools\n", 201 | "import json\n", 202 | "import random\n", 203 | "import time\n", 204 | "import imageio\n", 205 | "import numpy as np\n", 206 | "import matplotlib\n", 207 | "import matplotlib.pyplot as plt\n", 208 | "from matplotlib.backends.backend_pdf import PdfPages\n", 209 | "from sklearn.metrics import confusion_matrix\n", 210 | "import boto3\n", 211 | "import sagemaker\n", 212 | "from urllib.parse import urlparse\n", 213 | "\n", 214 | "fids2bbs = defaultdict(list)\n", 215 | "\n", 216 | "from ground_truth_od import group_miou\n", 217 | "from ground_truth_od import BoundingBox, WorkerBoundingBox, \\\n", 218 | " GroundTruthBox, BoxedImage" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": null, 224 | "metadata": {}, 225 | "outputs": [], 226 | "source": [ 227 | "EXP_NAME = 'nih-chest-xrays' #where to put experiment data\n", 228 | "\n", 229 | "OUTPUT_MANIFEST_S3=f'{BUCKET}/{PREFIX}/output.manifest' #location of the manifest file in S3\n", 230 | "IMAGE_DATA_S3=f'{BUCKET}/{PREFIX}'
#location of image data in s3\n", 231 | "\n", 232 | "print('S3 Location of Manifest File:')\n", 233 | "print(OUTPUT_MANIFEST_S3)\n", 234 | "\n", 235 | "print('S3 Location of Image Data:')\n", 236 | "print(IMAGE_DATA_S3)\n" 237 | ] 238 | }, 239 | { 240 | "cell_type": "markdown", 241 | "metadata": {}, 242 | "source": [ 243 | "First we will load and preprocess the manifest file. This manifest file is in fact an **augmented manifest file**, and also contains the location of the throat of the patient in the xray." 244 | ] 245 | }, 246 | { 247 | "cell_type": "code", 248 | "execution_count": null, 249 | "metadata": {}, 250 | "outputs": [], 251 | "source": [ 252 | "def read_data(file_name):\n", 253 | " !mkdir -p data #make a data directory if it does not exist\n", 254 | " with open(file_name, 'r') as f:\n", 255 | " output = [json.loads(line.strip()) for line in f.readlines()]\n", 256 | "\n", 257 | " return output\n", 258 | "\n", 259 | "def write_manifest(file_name):\n", 260 | " f_out=open(file_name,'w')\n", 261 | " for i in output_clean:\n", 262 | " print(json.dumps(i),file=f_out,sep=\"\\n\")\n", 263 | " f_out.close()\n", 264 | "\n", 265 | "def filter_manifest(file_name):\n", 266 | " 'remove any images that are not labeled.'\n", 267 | " \n", 268 | " output = read_data(file_name)\n", 269 | " \n", 270 | " output_clean =[]\n", 271 | " \n", 272 | " metadata_info ='xray-labeling-job-clone-clone-full-clone-metadata' #change depending on the job\n", 273 | " \n", 274 | " for the_sample in output:\n", 275 | " try:\n", " z = the_sample[metadata_info]['creation-date'] #samples without labeling metadata raise a KeyError and are skipped\n", 276 | " output_clean.append(the_sample)\n", " except KeyError:\n", " continue\n", 277 | "\n", 278 | " print(f'Number of images without errors {len(output_clean)}')\n", 279 | " \n", 280 | " return(output_clean)\n", 281 | "\n", 282 | "\n", 283 | "output_clean = filter_manifest('output.manifest')\n", 284 | "write_manifest('data/output_manifest_clean.manifest')" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": null, 290 | "metadata": {}, 291
| "outputs": [], 292 | "source": [ 293 | "def get_groundtruth_labels(output):\n", 294 | " # Create data arrays.\n", 295 | " img_uris = [None] * len(output)\n", 296 | " confidences = [None] * len(output)\n", 297 | " groundtruth_labels = [None] * len(output)\n", 298 | " human = np.zeros(len(output))\n", 299 | "\n", 300 | " # Find the job name contained within the manifest file manifest corresponds to.\n", 301 | " keys = list(output[0].keys())\n", 302 | " metakey = keys[np.where([('-metadata' in k) for k in keys])[0][0]]\n", 303 | " jobname = metakey[:-9]\n", 304 | "\n", 305 | " # Extract the data.\n", 306 | " for datum_id, datum in enumerate(output):\n", 307 | " img_uris[datum_id] = datum['source-ref']\n", 308 | " groundtruth_labels[datum_id] = str(datum[metakey]['class-map'])\n", 309 | " confidences[datum_id] = datum[metakey]['objects']\n", 310 | " human[datum_id] = int(datum[metakey]['human-annotated'] == 'yes')\n", 311 | " groundtruth_labels = np.array(groundtruth_labels)\n", 312 | " \n", 313 | " return groundtruth_labels\n", 314 | "\n", 315 | "groundtruth_labels = get_groundtruth_labels(output_clean)" 316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": null, 321 | "metadata": {}, 322 | "outputs": [], 323 | "source": [ 324 | "groundtruth_labels[0]" 325 | ] 326 | }, 327 | { 328 | "cell_type": "code", 329 | "execution_count": null, 330 | "metadata": {}, 331 | "outputs": [], 332 | "source": [ 333 | "def map_images_to_labels(output):\n", 334 | "\n", 335 | " # Create data arrays.\n", 336 | " confidences = np.zeros(len(output))\n", 337 | "\n", 338 | " # Find the job name the manifest corresponds to.\n", 339 | " keys = list(output[0].keys())\n", 340 | " metakey = keys[np.where([('-metadata' in k) for k in keys])[0][0]]\n", 341 | " jobname = metakey[:-9]\n", 342 | " output_images = []\n", 343 | " consolidated_boxes = []\n", 344 | "\n", 345 | " # Extract the data.\n", 346 | " for datum_id, datum in enumerate(output):\n", 347 | " image_size = 
datum[jobname]['image_size'][0]\n", 348 | " box_annotations = datum[jobname]['annotations']\n", 349 | " uri = datum['source-ref']\n", 350 | " box_confidences = datum[metakey]['objects']\n", 351 | " human = int(datum[metakey]['human-annotated'] == 'yes')\n", 352 | "\n", 353 | " # Make image object.\n", 354 | " image = BoxedImage(id=datum_id, size=image_size,\n", 355 | " uri=uri)\n", 356 | "\n", 357 | " # Create bounding boxes for image.\n", 358 | " boxes = []\n", 359 | " for i, annotation in enumerate(box_annotations):\n", 360 | " box = BoundingBox(image_id=datum_id, boxdata=annotation)\n", 361 | " box.confidence = box_confidences[i]['confidence']\n", 362 | " box.image = image\n", 363 | " box.human = human\n", 364 | " boxes.append(box)\n", 365 | " consolidated_boxes.append(box)\n", 366 | " image.consolidated_boxes = boxes\n", 367 | "\n", 368 | " # Store if the image is human labeled.\n", 369 | " image.human = human\n", 370 | "\n", 371 | " # Retrieve ground truth boxes for the image.\n", 372 | " oid_boxes_data = fids2bbs[image.oid_id]\n", 373 | " gt_boxes = []\n", 374 | " for data in oid_boxes_data:\n", 375 | " gt_box = GroundTruthBox(image_id=datum_id, oiddata=data,\n", 376 | " image=image)\n", 377 | " gt_boxes.append(gt_box)\n", 378 | " image.gt_boxes = gt_boxes\n", 379 | "\n", 380 | " output_images.append(image)\n", 381 | " \n", 382 | " return output_images, jobname\n", 383 | "\n", 384 | "output_images, jobname = map_images_to_labels(output_clean)" 385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": null, 390 | "metadata": {}, 391 | "outputs": [], 392 | "source": [ 393 | "len(output_clean)" 394 | ] 395 | }, 396 | { 397 | "cell_type": "code", 398 | "execution_count": null, 399 | "metadata": {}, 400 | "outputs": [], 401 | "source": [ 402 | "def create_bounding_boxes(output_clean, output_images):\n", 403 | " # Iterate through the json files, creating bounding box objects.\n", 404 | " \n", 405 | " output_with_answers=[] #only include images 
with the answers in them\n", 406 | " output_images_with_answers=[]\n", 407 | "\n", 408 | " output_with_no_answers=[]\n", 409 | " output_images_with_no_answers=[]\n", 410 | "\n", 411 | " for i in range(0,len(output_clean)):\n", 412 | " try:\n", 413 | " #images with class_id have answers in them\n", 414 | " x = output_clean[i][jobname]['annotations'][0]['class_id']\n", 415 | "\n", 416 | " output_with_answers.append(output_clean[i])\n", 417 | " output_images_with_answers.append(output_images[i])\n", 418 | " except:\n", 419 | " output_with_no_answers.append(output_clean[i])\n", 420 | " output_images_with_no_answers.append(output_images[i])\n", 421 | " pass\n", 422 | "\n", 423 | " #add the box to the image\n", 424 | " for i in range(0,len(output_with_answers)):\n", 425 | " the_output=output_with_answers[i]\n", 426 | " the_image=output_images_with_answers[i]\n", 427 | " answers=the_output[jobname]['annotations']\n", 428 | " box=WorkerBoundingBox(image_id=i,boxdata=answers[0],worker_id='anon-worker')\n", 429 | " box.image=the_image\n", 430 | " the_image.worker_boxes.append(box)\n", 431 | "\n", 432 | " print(f\"Number of images with labeled trachea/throat: {len(output_images_with_answers)}\")\n", 433 | " print(f\"Number of images without labeled trachea/throat: {len(output_with_no_answers)}\")\n", 434 | " \n", 435 | " return output_with_answers, output_images_with_answers\n", 436 | " \n", 437 | "output_with_answers, output_images_with_answers = create_bounding_boxes(output_clean, output_images)" 438 | ] 439 | }, 440 | { 441 | "cell_type": "code", 442 | "execution_count": null, 443 | "metadata": {}, 444 | "outputs": [], 445 | "source": [ 446 | "def download_images(output_images_with_answers, image_dir = 'data', dataset_size = 5):\n", 447 | " image_subset = np.random.choice(output_images_with_answers, dataset_size, replace=False)\n", 448 | "\n", 449 | " for img in image_subset:\n", 450 | " target_fname = os.path.join(\n", 451 | " image_dir, img.uri.split('/')[-1])\n", 452 | 
" if not os.path.isfile(target_fname):\n", 453 | " !aws s3 cp {img.uri} {target_fname}\n", 454 | " \n", 455 | " return image_subset\n", 456 | " \n", 457 | "image_subset = download_images(output_images_with_answers)" 458 | ] 459 | }, 460 | { 461 | "cell_type": "markdown", 462 | "metadata": {}, 463 | "source": [ 464 | "Next, we're going to plot the bounding boxes on the XRay data. Your plot should look something like this!\n", 465 | "\n", 466 | "![](images/gt_label_output.png)" 467 | ] 468 | }, 469 | { 470 | "cell_type": "code", 471 | "execution_count": null, 472 | "metadata": {}, 473 | "outputs": [], 474 | "source": [ 475 | "def visualize_images(image_subset, image_dir = 'data', n_show = 5):\n", 476 | " \n", 477 | " # Find human and auto-labeled images in the subset.\n", 478 | " human_labeled_subset = [img for img in image_subset if img.human]\n", 479 | "\n", 480 | " # Show examples of each\n", 481 | " fig, axes = plt.subplots(n_show, 2, figsize=(9, 2*n_show),\n", 482 | " facecolor='white', dpi=100)\n", 483 | " fig.suptitle('Human-labeled examples', fontsize=24)\n", 484 | " axes[0, 0].set_title('Worker labels', fontsize=14)\n", 485 | " axes[0, 1].set_title('Consolidated label', fontsize=14)\n", 486 | " for row, img in enumerate(np.random.choice(human_labeled_subset, size=n_show)):\n", 487 | " img.download(image_dir)\n", 488 | " img.plot_worker_bbs(axes[row, 0])\n", 489 | " img.plot_consolidated_bbs(axes[row, 1])\n", 490 | "\n", 491 | "visualize_images(image_subset)" 492 | ] 493 | }, 494 | { 495 | "cell_type": "markdown", 496 | "metadata": {}, 497 | "source": [ 498 | "(Note that in this context we only had one labeler, so the consolidated label will be identical to the worker label)" 499 | ] 500 | }, 501 | { 502 | "cell_type": "markdown", 503 | "metadata": {}, 504 | "source": [ 505 | "---" 506 | ] 507 | }, 508 | { 509 | "cell_type": "markdown", 510 | "metadata": {}, 511 | "source": [ 512 | "# Split Data and Copy to S3" 513 | ] 514 | }, 515 | { 516 | "cell_type": 
"code", 517 | "execution_count": null, 518 | "metadata": {}, 519 | "outputs": [], 520 | "source": [ 521 | "def split_data(output):\n", 522 | " \n", 523 | " # Shuffle output in place.\n", 524 | " np.random.shuffle(output)\n", 525 | "\n", 526 | " dataset_size = len(output)\n", 527 | " train_test_split_index = round(dataset_size*0.9)\n", 528 | "\n", 529 | " train_data = output[:train_test_split_index]\n", 530 | " test_data = output[train_test_split_index:]\n", 531 | "\n", 532 | " train_test_split_index_2 = round(len(test_data)*0.5)\n", 533 | " validation_data=test_data[:train_test_split_index_2]\n", 534 | " hold_out=test_data[train_test_split_index_2:]\n", 535 | " \n", 536 | " return train_data, validation_data, hold_out\n", 537 | " \n", 538 | "train_data, validation_data, hold_out = split_data(output_with_answers)" 539 | ] 540 | }, 541 | { 542 | "cell_type": "code", 543 | "execution_count": null, 544 | "metadata": {}, 545 | "outputs": [], 546 | "source": [ 547 | "num_training_samples = 0\n", 548 | "with open('data/train.manifest', 'w') as f:\n", 549 | " for line in train_data:\n", 550 | " f.write(json.dumps(line))\n", 551 | " f.write('\\n')\n", 552 | " num_training_samples += 1\n", 553 | "\n", 554 | "with open('data/validation.manifest', 'w') as f:\n", 555 | " for line in validation_data:\n", 556 | " f.write(json.dumps(line))\n", 557 | " f.write('\\n')\n", 558 | "with open('data/hold_out.manifest', 'w') as f:\n", 559 | " for line in hold_out:\n", 560 | " f.write(json.dumps(line))\n", 561 | " f.write('\\n')\n", 562 | "\n", 563 | "print(f'Training Data Set Size: {len(train_data)}')\n", 564 | "print(f'Validatation Data Set Size: {len(validation_data)}')\n", 565 | "print(f'Hold Out Data Set Size: {len(hold_out)}')" 566 | ] 567 | }, 568 | { 569 | "cell_type": "code", 570 | "execution_count": null, 571 | "metadata": {}, 572 | "outputs": [], 573 | "source": [ 574 | "def copy_to_s3(bucket, prefix, expr_name):\n", 575 | " !aws s3 cp data/train.manifest 
s3://{bucket}/{prefix}/{expr_name}/train.manifest\n", 576 | " !aws s3 cp data/validation.manifest s3://{bucket}/{prefix}/{expr_name}/validation.manifest\n", 577 | " !aws s3 cp data/hold_out.manifest s3://{bucket}/{prefix}/{expr_name}/hold_out.manifest\n", 578 | " \n", 579 | "copy_to_s3(BUCKET, PREFIX, EXP_NAME)" 580 | ] 581 | }, 582 | { 583 | "cell_type": "markdown", 584 | "metadata": {}, 585 | "source": [ 586 | "# Train on SageMaker & Track with Experiments" 587 | ] 588 | }, 589 | { 590 | "cell_type": "markdown", 591 | "metadata": {}, 592 | "source": [ 593 | "Let's create a trial within the experiment that we can associate this job with. " 594 | ] 595 | }, 596 | { 597 | "cell_type": "code", 598 | "execution_count": null, 599 | "metadata": {}, 600 | "outputs": [], 601 | "source": [ 602 | "from smexperiments.trial import Trial\n", 603 | "\n", 604 | "trial_name = f\"built-in-object-detection-{int(time.time())}\"\n", 605 | "\n", 606 | "trial = Trial.create(trial_name = trial_name,\n", 607 | " experiment_name = experiment_name,\n", 608 | " sagemaker_boto_client = sm)" 609 | ] 610 | }, 611 | { 612 | "cell_type": "code", 613 | "execution_count": null, 614 | "metadata": {}, 615 | "outputs": [], 616 | "source": [ 617 | "import re\n", 618 | "from sagemaker import get_execution_role\n", 619 | "from time import gmtime, strftime\n", 620 | "\n", 621 | "role = get_execution_role()\n", 622 | "sess = sagemaker.Session()\n", 623 | "s3 = boto3.resource('s3')\n", 624 | "\n", 625 | "training_image = sagemaker.image_uris.retrieve('object-detection', boto3.Session().region_name, version='latest')\n", 626 | "augmented_manifest_filename_train = 'train.manifest'\n", 627 | "augmented_manifest_filename_validation = 'validation.manifest'\n", 628 | "bucket_name = BUCKET\n", 629 | "s3_prefix = EXP_NAME\n" 630 | ] 631 | }, 632 | { 633 | "cell_type": "code", 634 | "execution_count": null, 635 | "metadata": {}, 636 | "outputs": [], 637 | "source": [ 638 | "# Defines paths for use in the training 
job request.\n", 639 | "s3_train_data_path = 's3://{}/{}/{}/train.manifest'.format(BUCKET, PREFIX, EXP_NAME)\n", 640 | "s3_validation_data_path = 's3://{}/{}/{}/validation.manifest'.format(BUCKET, PREFIX, EXP_NAME )\n", 641 | "s3_debug_path = \"s3://{}/{}/{}/debug-hook-data\".format(BUCKET, PREFIX, EXP_NAME)\n", 642 | "s3_output_path = f's3://{BUCKET}/{PREFIX}/{EXP_NAME}/output'" 643 | ] 644 | }, 645 | { 646 | "cell_type": "code", 647 | "execution_count": null, 648 | "metadata": {}, 649 | "outputs": [], 650 | "source": [ 651 | "\n", 652 | "augmented_manifest_s3_key = s3_train_data_path.split(bucket_name)[1][1:]\n", 653 | "s3_obj = s3.Object(bucket_name, augmented_manifest_s3_key)\n", 654 | "augmented_manifest = s3_obj.get()['Body'].read().decode('utf-8')\n", 655 | "augmented_manifest_lines = augmented_manifest.split('\\n')\n", 656 | "num_training_samples = len(augmented_manifest_lines) # Compute number of training samples for use in training job request.\n", 657 | "\n", 658 | "# Determine the keys in the training manifest and exclude the metadata from the labeling job.\n", 659 | "attribute_names = list(json.loads(augmented_manifest_lines[0]).keys())\n", 660 | "attribute_names = [attrib for attrib in attribute_names if 'meta' not in attrib]" 661 | ] 662 | }, 663 | { 664 | "cell_type": "code", 665 | "execution_count": null, 666 | "metadata": {}, 667 | "outputs": [], 668 | "source": [ 669 | "# Create unique job name\n", 670 | "job_name_prefix = EXP_NAME\n", 671 | "timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())\n", 672 | "model_job_name = job_name_prefix + timestamp" 673 | ] 674 | }, 675 | { 676 | "cell_type": "code", 677 | "execution_count": null, 678 | "metadata": {}, 679 | "outputs": [], 680 | "source": [ 681 | "# Create unique job name\n", 682 | "job_name_prefix = EXP_NAME\n", 683 | "timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())\n", 684 | "model_job_name = job_name_prefix + timestamp\n", 685 | "\n", 686 | "# set up your training job
using boto3 API syntax\n", 687 | "training_params = \\\n", 688 | " {\n", 689 | " \"AlgorithmSpecification\": {\n", 690 | " # NB. This is one of the named constants defined in the first cell.\n", 691 | " \"TrainingImage\": training_image,\n", 692 | " \"TrainingInputMode\": \"Pipe\"\n", 693 | " },\n", 694 | " \"RoleArn\": role,\n", 695 | " \"OutputDataConfig\": {\n", 696 | " \"S3OutputPath\": s3_output_path\n", 697 | " },\n", 698 | " \"ResourceConfig\": {\n", 699 | " \"InstanceCount\": 1,\n", 700 | " \"InstanceType\": \"ml.p3.2xlarge\", #Use a GPU backed instance\n", 701 | " \"VolumeSizeInGB\": 50\n", 702 | " },\n", 703 | " \"TrainingJobName\": model_job_name,\n", 704 | " \"HyperParameters\": { # NB. These hyperparameters are at the user's discretion and are beyond the scope of this demo.\n", 705 | " \"base_network\": \"resnet-50\",\n", 706 | " \"use_pretrained_model\": \"1\",\n", 707 | " \"num_classes\": \"1\",\n", 708 | " \"mini_batch_size\": \"10\",\n", 709 | " \"epochs\": \"30\",\n", 710 | " \"learning_rate\": \"0.001\",\n", 711 | " \"lr_scheduler_step\": \"\",\n", 712 | " \"lr_scheduler_factor\": \"0.1\",\n", 713 | " \"optimizer\": \"sgd\",\n", 714 | " \"momentum\": \"0.9\",\n", 715 | " \"weight_decay\": \"0.0005\",\n", 716 | " \"overlap_threshold\": \"0.5\",\n", 717 | " \"nms_threshold\": \"0.45\",\n", 718 | " \"image_shape\": \"300\",\n", 719 | " \"label_width\": \"350\",\n", 720 | " \"num_training_samples\": str(num_training_samples)\n", 721 | " },\n", 722 | " \"StoppingCondition\": {\n", 723 | " \"MaxRuntimeInSeconds\": 86400,\n", 724 | " \"MaxWaitTimeInSeconds\":259200,\n", 725 | "\n", 726 | " },\n", 727 | " \"EnableManagedSpotTraining\" :True,\n", 728 | " \"InputDataConfig\": [\n", 729 | " {\n", 730 | " \"ChannelName\": \"train\",\n", 731 | " \"DataSource\": {\n", 732 | " \"S3DataSource\": {\n", 733 | " \"S3DataType\": \"AugmentedManifestFile\", # NB. 
Augmented Manifest\n", 734 | " \"S3Uri\": s3_train_data_path,\n", 735 | " \"S3DataDistributionType\": \"FullyReplicated\",\n", 736 | " # NB. This must correspond to the JSON field names in your augmented manifest.\n", 737 | " \"AttributeNames\": attribute_names\n", 738 | " }\n", 739 | " },\n", 740 | " \"ContentType\": \"application/x-recordio\",\n", 741 | " \"RecordWrapperType\": \"RecordIO\",\n", 742 | " \"CompressionType\": \"None\"\n", 743 | " },\n", 744 | " {\n", 745 | " \"ChannelName\": \"validation\",\n", 746 | " \"DataSource\": {\n", 747 | " \"S3DataSource\": {\n", 748 | " \"S3DataType\": \"AugmentedManifestFile\", # NB. Augmented Manifest\n", 749 | " \"S3Uri\": s3_validation_data_path,\n", 750 | " \"S3DataDistributionType\": \"FullyReplicated\",\n", 751 | " # NB. This must correspond to the JSON field names in your augmented manifest.\n", 752 | " \"AttributeNames\": attribute_names\n", 753 | " }\n", 754 | " },\n", 755 | " \"ContentType\": \"application/x-recordio\",\n", 756 | " \"RecordWrapperType\": \"RecordIO\",\n", 757 | " \"CompressionType\": \"None\"\n", 758 | " }\n", 759 | " ],\n", 760 | " \"ExperimentConfig\": {\n", 761 | " 'ExperimentName': experiment_name,\n", 762 | " 'TrialName': trial_name,\n", 763 | " 'TrialComponentDisplayName': 'Training'\n", 764 | " },\n", 765 | " \"DebugHookConfig\":{\n", 766 | " 'S3OutputPath': s3_debug_path,\n", 767 | " 'CollectionConfigurations': [\n", 768 | " {\n", 769 | " 'CollectionName': 'all_tensors',\n", 770 | " 'CollectionParameters': {\n", 771 | " 'include_regex': '.*',\n", 772 | " \"save_steps\":\"1, 2, 3\"\n", 773 | " }\n", 774 | " },\n", 775 | " ]\n", 776 | " },\n", 777 | " }\n", 778 | "\n", 779 | "print('Training job name: {}'.format(model_job_name))\n", 780 | "print('\\nInput Data Location: {}'.format(\n", 781 | " training_params['InputDataConfig'][0]['DataSource']['S3DataSource']))\n" 782 | ] 783 | }, 784 | { 785 | "cell_type": "code", 786 | "execution_count": null, 787 | "metadata": {}, 788 | "outputs": [], 
789 | "source": [ 790 | "client = boto3.client(service_name='sagemaker')\n", 791 | "client.create_training_job(**training_params)\n", 792 | "\n", 793 | "# Confirm that the training job has started\n", 794 | "status = client.describe_training_job(TrainingJobName=model_job_name)['TrainingJobStatus']\n", 795 | "print(f'Training job name: {model_job_name}')\n", 796 | "print('Training job current status: {}'.format(status))" 797 | ] 798 | }, 799 | { 800 | "cell_type": "markdown", 801 | "metadata": {}, 802 | "source": [ 803 | "Using the default p3.2xlarge as noted here, this job should take about an hour to train. While that's happening, circle back and step through the code again. Make sure you really understand how everything comes together.\n", 804 | "\n", 805 | "### Monitor Job Progress using Experiments\n", 806 | "If you are running on Studio, you should be able to open up the Experiments tab and see the status of your job." 807 | ] 808 | }, 809 | { 810 | "cell_type": "markdown", 811 | "metadata": {}, 812 | "source": [ 813 | "# Convert Images into RecordIO\n", 814 | "As is well documented, training deep learning models can take a long time. One way to speed this up is by using an optimized file format, such as RecordIO. Let's convert our PNGs into RecordIO for the next step.
" 815 | ] 816 | }, 817 | { 818 | "cell_type": "code", 819 | "execution_count": null, 820 | "metadata": {}, 821 | "outputs": [], 822 | "source": [ 823 | "!pip install mxnet\n", 824 | "!pip install opencv-python-headless" 825 | ] 826 | }, 827 | { 828 | "cell_type": "code", 829 | "execution_count": null, 830 | "metadata": {}, 831 | "outputs": [], 832 | "source": [ 833 | "# point the first argument to the location of your local png image folder\n", 834 | "# running the script with this command will create a lst file, listing all of your images for the train set\n", 835 | "!python im2rec.py --root \"/root/images\" --prefix \"train\" --exts '.png' --chunks 1 --create_list 'Yes'" 836 | ] 837 | }, 838 | { 839 | "cell_type": "code", 840 | "execution_count": null, 841 | "metadata": {}, 842 | "outputs": [], 843 | "source": [ 844 | "# running this file will create a train.idx and train.rec file\n", 845 | "!python im2rec.py --root \"/root/images\" --prefix '/root/amazon-sagemaker-architecting-for-ml-hcls/Starter Notebooks/Advanced Data Science - XRay Analysis/' --exts '.png' --chunks 1 --create_list 'no'" 846 | ] 847 | }, 848 | { 849 | "cell_type": "code", 850 | "execution_count": null, 851 | "metadata": {}, 852 | "outputs": [], 853 | "source": [ 854 | "!aws s3 cp train.idx s3://$BUCKET/$PREFIX/recio-files/\n", 855 | "!aws s3 cp train.rec s3://$BUCKET/$PREFIX/recio-files/" 856 | ] 857 | }, 858 | { 859 | "cell_type": "markdown", 860 | "metadata": {}, 861 | "source": [ 862 | "---\n", 863 | "# Bring your own Model and Train on SageMaker with Script Mode\n", 864 | "Once your job finishes, you are welcome to explore bringing your own script into SageMaker. Below we're demonstrating using GluonCV to bring a custom ssd model using the MXNet container. This is nice because it's coming with it's own pre-trained model! 
\n", 865 | "- https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker_neo_compilation_jobs/gluoncv_ssd_mobilenet/gluoncv_ssd_mobilenet_neo.ipynb\n", 866 | "\n", 867 | "Notice that by using script mode we automatically get access to debugger, which will give us the ability to visualize our neural network locally. Let's get it up and running!\n", 868 | "\n", 869 | "If you prefer, you are welcome to bring your own preferred SSD model instead.\n", 870 | "\n" 871 | ] 872 | }, 873 | { 874 | "cell_type": "code", 875 | "execution_count": null, 876 | "metadata": {}, 877 | "outputs": [], 878 | "source": [ 879 | "%%writefile src/requirements.txt\n", 880 | "\n", 881 | "gluoncv" 882 | ] 883 | }, 884 | { 885 | "cell_type": "code", 886 | "execution_count": null, 887 | "metadata": {}, 888 | "outputs": [], 889 | "source": [ 890 | "from sagemaker.mxnet import MXNet\n", 891 | "import sagemaker\n", 892 | "from sagemaker.debugger import DebuggerHookConfig, CollectionConfig\n", 893 | "\n", 894 | "role = sagemaker.get_execution_role()\n", 895 | "\n", 896 | "ssd_estimator = MXNet(entry_point='ssd_entry_point.py',\n", 897 | " source_dir = 'src',\n", 898 | " role=role,\n", 899 | " output_path=s3_output_path,\n", 900 | " instance_count=1,\n", 901 | " instance_type='ml.p3.8xlarge',\n", 902 | " framework_version='1.6',\n", 903 | " py_version='py3',\n", 904 | " use_spot_instances=True,\n", 905 | " max_wait = (8600*3),\n", 906 | " max_run = 8600,\n", 907 | " distribution={'parameter_server': {'enabled': True}},\n", 908 | " hyperparameters={'epochs': 1, 'data-shape': 350},\n", 909 | " debugger_hook_config = DebuggerHookConfig(\n", 910 | " s3_output_path = s3_debug_path,\n", 911 | " collection_configs = [CollectionConfig(name='all_tensors',\n", 912 | " parameters={'include_regex':'.*', 'save_steps':'1,2,3'})])) \n", 913 | "\n", 914 | "ssd_estimator.fit(inputs = {'train': 's3://{}/{}/recio-files'.format(BUCKET, PREFIX)}, \n", 915 | " experiment_config = {'ExperimentName': 
experiment_name,\n", 916 | "                                        'TrialName': 'xray-recordio-gluoncv', 'TrialComponentDisplayName': 'Training'})" 917 | ] 918 | }, 919 | { 920 | "cell_type": "markdown", 921 | "metadata": {}, 922 | "source": [ 923 | "You might discover an issue here - GluonCV is struggling to find the labels from our bounding boxes. Can you figure out how to supply them correctly? " 924 | ] 925 | }, 926 | { 927 | "cell_type": "markdown", 928 | "metadata": {}, 929 | "source": [ 930 | "---\n", 931 | "# Visualize Model with SageMaker Debugger\n", 932 | "Now, we're going to use SageMaker Debugger to build a TensorPlot of our model!" 933 | ] 934 | }, 935 | { 936 | "cell_type": "markdown", 937 | "metadata": {}, 938 | "source": [ 939 | "![](images/tensorplot.gif)" 940 | ] 941 | }, 942 | { 943 | "cell_type": "code", 944 | "execution_count": null, 945 | "metadata": {}, 946 | "outputs": [], 947 | "source": [ 948 | "!aws s3 sync {s3_debug_path} ." 949 | ] 950 | }, 951 | { 952 | "cell_type": "code", 953 | "execution_count": null, 954 | "metadata": {}, 955 | "outputs": [], 956 | "source": [ 957 | "import tensor_plot \n", 958 | "\n", 959 | "visualization = tensor_plot.TensorPlot(\n", 960 | "    regex=\".*relu_output\", \n", 961 | "    path=folder_name,\n", 962 | "    steps=10, \n", 963 | "    batch_sample_id=0,\n", 964 | "    color_channel = 1,\n", 965 | "    title=\"Relu outputs\",\n", 966 | "    label=\".*sequential0_input_0\",\n", 967 | "    prediction=\".*sequential0_output_0\"\n", 968 | ")" 969 | ] 970 | }, 971 | { 972 | "cell_type": "markdown", 973 | "metadata": {}, 974 | "source": [ 975 | "If we plot too many layers, it can crash the notebook. If you encounter performance or out-of-memory issues, try reducing the number of layers to plot by changing the regex, or run this notebook in JupyterLab instead of Jupyter.\n", 976 | "\n", 977 | "In the cell below we visualize the outputs of all layers, including the final classification. 
Please note that because the training job ran for only a few epochs, classification accuracy is not high." 978 | ] 979 | }, 980 | { 981 | "cell_type": "code", 982 | "execution_count": null, 983 | "metadata": {}, 984 | "outputs": [], 985 | "source": [ 986 | "visualization.fig.show(renderer=\"iframe\")" 987 | ] 988 | }, 989 | { 990 | "cell_type": "markdown", 991 | "metadata": {}, 992 | "source": [ 993 | "---\n", 994 | "# Extensions\n", 995 | "If you make it here with spare time, why not try to bring another model into SageMaker? Or set up the automatic model tuner on your own script file? Or optimize your model for deployment using SageMaker Neo? \n", 996 | "\n", 997 | "You can also deploy some of these models and start to get predictions from them using `model.deploy()`.\n", 998 | "\n", 999 | "Feel free to use the rest of your time to build something awesome. " 1000 | ] 1001 | }, 1002 | { 1003 | "cell_type": "code", 1004 | "execution_count": null, 1005 | "metadata": {}, 1006 | "outputs": [], 1007 | "source": [] 1008 | } 1009 | ], 1010 | "metadata": { 1011 | "instance_type": "ml.t3.medium", 1012 | "kernelspec": { 1013 | "display_name": "Python 3 (Data Science)", 1014 | "language": "python", 1015 | "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0" 1016 | }, 1017 | "language_info": { 1018 | "codemirror_mode": { 1019 | "name": "ipython", 1020 | "version": 3 1021 | }, 1022 | "file_extension": ".py", 1023 | "mimetype": "text/x-python", 1024 | "name": "python", 1025 | "nbconvert_exporter": "python", 1026 | "pygments_lexer": "ipython3", 1027 | "version": "3.7.6" 1028 | } 1029 | }, 1030 | "nbformat": 4, 1031 | "nbformat_minor": 4 1032 | } 1033 | -------------------------------------------------------------------------------- /Starter Notebooks/Advanced Data Science - XRay Analysis/ground_truth_od.py: -------------------------------------------------------------------------------- 1 | '''Define classes and functions for 
interfacing with SageMaker Ground 2 | Truth object detection. 3 | 4 | ''' 5 | 6 | import os 7 | import imageio 8 | import matplotlib.pyplot as plt 9 | import numpy as np 10 | 11 | 12 | class BoundingBox: 13 | '''Bounding box for an object in an image.''' 14 | 15 | def __init__(self, image_id=None, boxdata=None): 16 | self.image_id = image_id 17 | if boxdata: 18 | for datum in boxdata: 19 | setattr(self, datum, boxdata[datum]) 20 | 21 | def __repr__(self): 22 | return 'Box for image {}'.format(self.image_id) 23 | 24 | def compute_bb_data(self): 25 | '''Compute the parameters used for IoU.''' 26 | image = self.image 27 | self.xmin = self.left/image.width 28 | self.xmax = (self.left + self.width)/image.width 29 | self.ymin = self.top/image.height 30 | self.ymax = (self.top + self.height)/image.height 31 | 32 | 33 | class WorkerBoundingBox(BoundingBox): 34 | '''Bounding box for an object in an image produced by a worker.''' 35 | 36 | def __init__(self, image_id=None, worker_id=None, boxdata=None): 37 | self.worker_id = worker_id 38 | super().__init__(image_id=image_id, boxdata=boxdata) 39 | 40 | 41 | class GroundTruthBox(BoundingBox): 42 | '''Ground truth bounding box for an object in an image, from the Open Images Dataset.''' 43 | 44 | def __init__(self, image_id=None, oiddata=None, image=None): 45 | self.image = image 46 | self.class_name = oiddata[0] 47 | xmin, xmax, ymin, ymax = [float(datum) for datum in oiddata[1:]] 48 | self.xmin = xmin 49 | self.ymin = ymin 50 | self.xmax = xmax 51 | self.ymax = ymax 52 | imw = image.width 53 | imh = image.height 54 | boxdata = {'height': (ymax-ymin)*imh, 55 | 'width': (xmax-xmin)*imw, 56 | 'left': xmin*imw, 57 | 'top': ymin*imh} 58 | super().__init__(image_id=image_id, boxdata=boxdata) 59 | 60 | 61 | class BoxedImage: 62 | '''Image with bounding boxes.''' 63 | 64 | def __init__(self, id=None, consolidated_boxes=None, 65 | worker_boxes=None, gt_boxes=None, uri=None, 66 | size=None): 67 | self.id = id 68 | self.uri = uri 69 | if uri: 70 | 
self.filename = uri.split('/')[-1] 71 | self.oid_id = self.filename.split('.')[0] 72 | else: 73 | self.filename = None 74 | self.oid_id = None 75 | self.local = None 76 | self.im = None 77 | if size: 78 | self.width = size['width'] 79 | self.depth = size['depth'] 80 | self.height = size['height'] 81 | self.shape = self.width, self.height, self.depth 82 | if consolidated_boxes: 83 | self.consolidated_boxes = consolidated_boxes 84 | else: 85 | self.consolidated_boxes = [] 86 | if worker_boxes: 87 | self.worker_boxes = worker_boxes 88 | else: 89 | self.worker_boxes = [] 90 | if gt_boxes: 91 | self.gt_boxes = gt_boxes 92 | else: 93 | self.gt_boxes = [] 94 | 95 | def __repr__(self): 96 | return 'Image{}'.format(self.id) 97 | 98 | def n_consolidated_boxes(self): 99 | '''Count the number of consolidated boxes.''' 100 | return len(self.consolidated_boxes) 101 | 102 | def n_worker_boxes(self): 103 | return len(self.worker_boxes) 104 | 105 | def download(self, directory): 106 | target_fname = os.path.join( 107 | directory, self.uri.split('/')[-1]) 108 | if not os.path.isfile(target_fname): 109 | os.system(f'aws s3 cp {self.uri} {target_fname}') 110 | self.local = target_fname 111 | 112 | def imread(self): 113 | '''Cache the image reading process.''' 114 | try: 115 | return imageio.imread(self.local) 116 | except OSError: 117 | print("You need to download this image first. " 118 | "Use this_image.download(local_directory).") 119 | raise 120 | 121 | def plot_bbs(self, ax, bbs, img_kwargs, box_kwargs, **kwargs): 122 | '''Master function for plotting images with bounding boxes.''' 123 | img = self.imread() 124 | ax.imshow(img, **img_kwargs) 125 | imh, imw, *_ = img.shape 126 | box_kwargs['fill'] = None 127 | if kwargs.get('worker', False): 128 | # Give each worker a color. 
129 | worker_colors = {} 130 | worker_count = 0 131 | for bb in bbs: 132 | worker = bb.worker_id 133 | if worker not in worker_colors: 134 | worker_colors[worker] = 'C' + str((9-worker_count) % 10) 135 | worker_count += 1 136 | rec = plt.Rectangle((bb.left, bb.top), bb.width, bb.height, 137 | edgecolor=worker_colors[worker], 138 | **box_kwargs) 139 | ax.add_patch(rec) 140 | else: 141 | for bb in bbs: 142 | rec = plt.Rectangle( 143 | (bb.left, bb.top), bb.width, bb.height, **box_kwargs) 144 | ax.add_patch(rec) 145 | ax.axis('off') 146 | 147 | def plot_consolidated_bbs(self, ax, img_kwargs={}, 148 | box_kwargs={'edgecolor': 'blue', 149 | 'lw': 3}): 150 | '''Plot the consolidated boxes.''' 151 | self.plot_bbs(ax, self.consolidated_boxes, 152 | img_kwargs=img_kwargs, box_kwargs=box_kwargs) 153 | 154 | def plot_worker_bbs(self, ax, img_kwargs={}, box_kwargs={'lw': 2}): 155 | '''Plot the individual worker boxes.''' 156 | self.plot_bbs(ax, self.worker_boxes, worker=True, 157 | img_kwargs=img_kwargs, box_kwargs=box_kwargs) 158 | 159 | def plot_gt_bbs(self, ax, img_kwargs={}, 160 | box_kwargs={'edgecolor': 'lime', 161 | 'lw': 3}): 162 | '''Plot the ground truth (Open Image Dataset) boxes.''' 163 | self.plot_bbs(ax, self.gt_boxes, 164 | img_kwargs=img_kwargs, box_kwargs=box_kwargs) 165 | 166 | def compute_img_confidence(self): 167 | ''' Compute the mean bb confidence. ''' 168 | if len(self.consolidated_boxes) > 0: 169 | return np.mean([box.confidence for box in self.consolidated_boxes]) 170 | else: 171 | return 0 172 | 173 | def compute_iou_bb(self): 174 | '''Compute the mean intersection over union for a collection of 175 | bounding boxes. 176 | ''' 177 | 178 | # Precompute data for the consolidated boxes if necessary. 179 | for box in self.consolidated_boxes: 180 | try: 181 | box.xmin 182 | except AttributeError: 183 | box.compute_bb_data() 184 | 185 | # Make the numpy arrays. 
186 | if self.gt_boxes: 187 | gts = np.vstack([(box.xmin, box.ymin, box.xmax, box.ymax) 188 | for box in self.gt_boxes]) 189 | else: 190 | gts = [] 191 | if self.consolidated_boxes: 192 | preds = np.vstack([(box.xmin, box.ymin, box.xmax, box.ymax) 193 | for box in self.consolidated_boxes]) 194 | else: 195 | preds = [] 196 | confs = np.array([box.confidence for box in self.consolidated_boxes]) 197 | 198 | if len(preds) == 0 and len(gts) == 0: 199 | return 1. 200 | if len(preds) == 0 or len(gts) == 0: 201 | return 0. 202 | preds = preds[np.argsort(confs.flatten())][::-1] 203 | 204 | is_pred_assigned_to_gt = [False] * len(gts) 205 | pred_areas = (preds[:, 2] - preds[:, 0]) * \ 206 | (preds[:, 3] - preds[:, 1]) 207 | gt_areas = (gts[:, 2] - gts[:, 0]) * (gts[:, 3] - gts[:, 1]) 208 | all_ious = [] 209 | for pred_id, pred in enumerate(preds): 210 | best_iou = 0 211 | best_id = -1 212 | for gt_id, gt in enumerate(gts): 213 | if is_pred_assigned_to_gt[gt_id]: 214 | continue 215 | x1 = max(gt[0], pred[0]) 216 | y1 = max(gt[1], pred[1]) 217 | x2 = min(gt[2], pred[2]) 218 | y2 = min(gt[3], pred[3]) 219 | iw = max(0, x2 - x1) 220 | ih = max(0, y2 - y1) 221 | inter = iw * ih 222 | iou = inter / \ 223 | (pred_areas[pred_id] + gt_areas[gt_id] - inter) 224 | if iou > best_iou: 225 | best_iou = iou 226 | best_id = gt_id 227 | if best_id != -1: 228 | is_pred_assigned_to_gt[best_id] = True 229 | # True positive! Store the IoU. 230 | all_ious.append(best_iou) 231 | else: 232 | # 0 IoU for each unmatched gt (false-negative). 233 | all_ious.append(0.) 234 | 235 | # 0 IoU for each unmatched prediction (false-positive). 236 | all_ious.extend([0.] * (len(is_pred_assigned_to_gt) - 237 | sum(is_pred_assigned_to_gt))) 238 | 239 | return np.mean(all_ious) 240 | 241 | 242 | def group_miou(imgs): 243 | '''Compute the mIoU for a group of images. 244 | 245 | Args: 246 | imgs: list of BoxedImages, with consolidated_boxes and gt_boxes. 
247 | 248 | Returns: 249 | mIoU calculated over the bounding boxes in the group. 250 | ''' 251 | # Create a notional BoxedImage with bounding boxes from imgs. 252 | all_consolidated_boxes = [box for img in imgs 253 | for box in img.consolidated_boxes] 254 | all_gt_boxes = [box for img in imgs 255 | for box in img.gt_boxes] 256 | notional_image = BoxedImage(consolidated_boxes=all_consolidated_boxes, 257 | gt_boxes=all_gt_boxes) 258 | 259 | # Compute and return the mIoU. 260 | return notional_image.compute_iou_bb() 261 | -------------------------------------------------------------------------------- /Starter Notebooks/Advanced Data Science - XRay Analysis/im2rec.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | # Licensed to the Apache Software Foundation (ASF) under one 4 | # or more contributor license agreements. See the NOTICE file 5 | # distributed with this work for additional information 6 | # regarding copyright ownership. The ASF licenses this file 7 | # to you under the Apache License, Version 2.0 (the 8 | # "License"); you may not use this file except in compliance 9 | # with the License. You may obtain a copy of the License at 10 | # 11 | # http://www.apache.org/licenses/LICENSE-2.0 12 | # 13 | # Unless required by applicable law or agreed to in writing, 14 | # software distributed under the License is distributed on an 15 | # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 16 | # KIND, either express or implied. See the License for the 17 | # specific language governing permissions and limitations 18 | # under the License. 
19 | 20 | from __future__ import print_function 21 | import os 22 | import sys 23 | 24 | curr_path = os.path.abspath(os.path.dirname(__file__)) 25 | sys.path.append(os.path.join(curr_path, "../python")) 26 | import mxnet as mx 27 | import random 28 | import argparse 29 | import cv2 30 | import time 31 | import traceback 32 | 33 | try: 34 | import multiprocessing 35 | except ImportError: 36 | multiprocessing = None 37 | 38 | def list_image(root, recursive, exts): 39 | i = 0 40 | if recursive: 41 | cat = {} 42 | for path, dirs, files in os.walk(root, followlinks=True): 43 | dirs.sort() 44 | files.sort() 45 | for fname in files: 46 | fpath = os.path.join(path, fname) 47 | suffix = os.path.splitext(fname)[1].lower() 48 | if os.path.isfile(fpath) and (suffix in exts): 49 | if path not in cat: 50 | cat[path] = len(cat) 51 | yield (i, os.path.relpath(fpath, root), cat[path]) 52 | i += 1 53 | for k, v in sorted(cat.items(), key=lambda x: x[1]): 54 | print(os.path.relpath(k, root), v) 55 | else: 56 | for fname in sorted(os.listdir(root)): 57 | fpath = os.path.join(root, fname) 58 | suffix = os.path.splitext(fname)[1].lower() 59 | if os.path.isfile(fpath) and (suffix in exts): 60 | yield (i, os.path.relpath(fpath, root), 0) 61 | i += 1 62 | 63 | def write_list(path_out, image_list): 64 | with open(path_out, 'w') as fout: 65 | for i, item in enumerate(image_list): 66 | line = '%d\t' % item[0] 67 | for j in item[2:]: 68 | line += '%f\t' % j 69 | line += '%s\n' % item[1] 70 | fout.write(line) 71 | 72 | def make_list(args): 73 | image_list = list_image(args.root, args.recursive, args.exts) 74 | image_list = list(image_list) 75 | if args.shuffle is True: 76 | random.seed(100) 77 | random.shuffle(image_list) 78 | N = len(image_list) 79 | chunk_size = (N + args.chunks - 1) // args.chunks 80 | for i in range(args.chunks): 81 | chunk = image_list[i * chunk_size:(i + 1) * chunk_size] 82 | if args.chunks > 1: 83 | str_chunk = '_%d' % i 84 | else: 85 | str_chunk = '' 86 | sep = 
int(chunk_size * args.train_ratio) 87 | sep_test = int(chunk_size * args.test_ratio) 88 | if args.train_ratio == 1.0: 89 | write_list(args.prefix + str_chunk + '.lst', chunk) 90 | else: 91 | if args.test_ratio: 92 | write_list(args.prefix + str_chunk + '_test.lst', chunk[:sep_test]) 93 | if args.train_ratio + args.test_ratio < 1.0: 94 | write_list(args.prefix + str_chunk + '_val.lst', chunk[sep_test + sep:]) 95 | write_list(args.prefix + str_chunk + '_train.lst', chunk[sep_test:sep_test + sep]) 96 | 97 | def read_list(path_in): 98 | with open(path_in) as fin: 99 | while True: 100 | line = fin.readline() 101 | if not line: 102 | break 103 | line = [i.strip() for i in line.strip().split('\t')] 104 | line_len = len(line) 105 | if line_len < 3: 106 | print('lst should at least has three parts, but only has %s parts for %s' %(line_len, line)) 107 | continue 108 | try: 109 | item = [int(line[0])] + [line[-1]] + [float(i) for i in line[1:-1]] 110 | except Exception as e: 111 | print('Parsing lst met error for %s, detail: %s' %(line, e)) 112 | continue 113 | yield item 114 | 115 | def image_encode(args, i, item, q_out): 116 | fullpath = os.path.join(args.root, item[1]) 117 | 118 | if len(item) > 3 and args.pack_label: 119 | header = mx.recordio.IRHeader(0, item[2:], item[0], 0) 120 | else: 121 | header = mx.recordio.IRHeader(0, item[2], item[0], 0) 122 | 123 | if args.pass_through: 124 | try: 125 | with open(fullpath, 'rb') as fin: 126 | img = fin.read() 127 | s = mx.recordio.pack(header, img) 128 | q_out.put((i, s, item)) 129 | except Exception as e: 130 | traceback.print_exc() 131 | print('pack_img error:', item[1], e) 132 | q_out.put((i, None, item)) 133 | return 134 | 135 | try: 136 | img = cv2.imread(fullpath, args.color) 137 | except: 138 | traceback.print_exc() 139 | print('imread error trying to load file: %s ' % fullpath) 140 | q_out.put((i, None, item)) 141 | return 142 | if img is None: 143 | print('imread read blank (None) image for file: %s' % fullpath) 144 | 
q_out.put((i, None, item)) 145 | return 146 | if args.center_crop: 147 | if img.shape[0] > img.shape[1]: 148 | margin = (img.shape[0] - img.shape[1]) // 2; 149 | img = img[margin:margin + img.shape[1], :] 150 | else: 151 | margin = (img.shape[1] - img.shape[0]) // 2; 152 | img = img[:, margin:margin + img.shape[0]] 153 | if args.resize: 154 | if img.shape[0] > img.shape[1]: 155 | newsize = (args.resize, img.shape[0] * args.resize // img.shape[1]) 156 | else: 157 | newsize = (img.shape[1] * args.resize // img.shape[0], args.resize) 158 | img = cv2.resize(img, newsize) 159 | 160 | try: 161 | s = mx.recordio.pack_img(header, img, quality=args.quality, img_fmt=args.encoding) 162 | q_out.put((i, s, item)) 163 | except Exception as e: 164 | traceback.print_exc() 165 | print('pack_img error on file: %s' % fullpath, e) 166 | q_out.put((i, None, item)) 167 | return 168 | 169 | def read_worker(args, q_in, q_out): 170 | while True: 171 | deq = q_in.get() 172 | if deq is None: 173 | break 174 | i, item = deq 175 | image_encode(args, i, item, q_out) 176 | 177 | def write_worker(q_out, fname, working_dir): 178 | pre_time = time.time() 179 | count = 0 180 | fname = os.path.basename(fname) 181 | fname_rec = os.path.splitext(fname)[0] + '.rec' 182 | fname_idx = os.path.splitext(fname)[0] + '.idx' 183 | record = mx.recordio.MXIndexedRecordIO(os.path.join(working_dir, fname_idx), 184 | os.path.join(working_dir, fname_rec), 'w') 185 | buf = {} 186 | more = True 187 | while more: 188 | deq = q_out.get() 189 | if deq is not None: 190 | i, s, item = deq 191 | buf[i] = (s, item) 192 | else: 193 | more = False 194 | while count in buf: 195 | s, item = buf[count] 196 | del buf[count] 197 | if s is not None: 198 | record.write_idx(item[0], s) 199 | 200 | if count % 1000 == 0: 201 | cur_time = time.time() 202 | print('time:', cur_time - pre_time, ' count:', count) 203 | pre_time = cur_time 204 | count += 1 205 | 206 | def parse_args(): 207 | parser = argparse.ArgumentParser( 208 | 
formatter_class=argparse.ArgumentDefaultsHelpFormatter, 209 | description='Create an image list or \ 210 | make a record database by reading from an image list') 211 | parser.add_argument('--prefix', help='prefix of input/output lst and rec files.') 212 | parser.add_argument('--root', help='path to folder containing images.') 213 | 214 | cgroup = parser.add_argument_group('Options for creating image lists') 215 | cgroup.add_argument('--l', action='store_true', default=True, 216 | help='If this is set im2rec will create image list(s) by traversing root folder\ 217 | and output to .lst.\ 218 | Otherwise im2rec will read .lst and create a database at .rec') 219 | cgroup.add_argument('--exts', nargs='+', default=['.jpeg', '.jpg', '.png'], 220 | help='list of acceptable image extensions.') 221 | cgroup.add_argument('--chunks', type=int, default=1, help='number of chunks.') 222 | cgroup.add_argument('--train-ratio', type=float, default=1.0, 223 | help='Ratio of images to use for training.') 224 | cgroup.add_argument('--test-ratio', type=float, default=0, 225 | help='Ratio of images to use for testing.') 226 | cgroup.add_argument('--recursive', action='store_true', 227 | help='If true recursively walk through subdirs and assign a unique label\ 228 | to images in each folder. 
Otherwise only include images in the root folder\ 229 | and give them label 0.') 230 | cgroup.add_argument('--no-shuffle', dest='shuffle', action='store_false', 231 | help='If this is passed, \ 232 | im2rec will not randomize the image order in .lst') 233 | rgroup = parser.add_argument_group('Options for creating database') 234 | rgroup.add_argument('--pass-through', action='store_true', 235 | help='whether to skip transformation and save image as is') 236 | rgroup.add_argument('--resize', type=int, default=0, 237 | help='resize the shorter edge of image to the newsize, original images will\ 238 | be packed by default.') 239 | rgroup.add_argument('--center-crop', action='store_true', 240 | help='specify whether to crop the center image to make it rectangular.') 241 | rgroup.add_argument('--quality', type=int, default=95, 242 | help='JPEG quality for encoding, 1-100; or PNG compression for encoding, 1-9') 243 | rgroup.add_argument('--num-thread', type=int, default=1, 244 | help='number of thread to use for encoding. order of images will be different\ 245 | from the input list if >1. the input list will be modified to match the\ 246 | resulting order.') 247 | rgroup.add_argument('--color', type=int, default=1, choices=[-1, 0, 1], 248 | help='specify the color mode of the loaded image.\ 249 | 1: Loads a color image. Any transparency of image will be neglected. 
It is the default flag.\ 250 | 0: Loads image in grayscale mode.\ 251 | -1:Loads image as such including alpha channel.') 252 | rgroup.add_argument('--encoding', type=str, default='.jpg', choices=['.jpg', '.png'], 253 | help='specify the encoding of the images.') 254 | rgroup.add_argument('--pack-label', action='store_true', 255 | help='Whether to also pack multi dimensional label in the record file') 256 | 257 | parser.add_argument('--create_list', type=str, default = 'no') 258 | args = parser.parse_args() 259 | 260 | return args 261 | 262 | if __name__ == '__main__': 263 | args = parse_args() 264 | 265 | print ('made it through arg parse') 266 | 267 | print ('looking inside of prefix: {}'.format(args.prefix)) 268 | 269 | if args.create_list == 'Yes': 270 | 271 | print ('tripped list creation') 272 | make_list(args) 273 | else: 274 | 275 | 276 | if os.path.isdir(args.prefix): 277 | working_dir = args.prefix 278 | else: 279 | working_dir = os.path.dirname(args.prefix) 280 | 281 | files = [os.path.join(working_dir, fname) for fname in os.listdir(working_dir) 282 | if os.path.isfile(os.path.join(working_dir, fname))] 283 | count = 0 284 | for fname in files: 285 | if fname.startswith(args.prefix) and fname.endswith('.lst'): 286 | print('Creating .rec file from', fname, 'in', working_dir) 287 | count += 1 288 | image_list = read_list(fname) 289 | # -- write_record -- # 290 | if args.num_thread > 1 and multiprocessing is not None: 291 | q_in = [multiprocessing.Queue(1024) for i in range(args.num_thread)] 292 | q_out = multiprocessing.Queue(1024) 293 | read_process = [multiprocessing.Process(target=read_worker, args=(args, q_in[i], q_out)) \ 294 | for i in range(args.num_thread)] 295 | for p in read_process: 296 | p.start() 297 | write_process = multiprocessing.Process(target=write_worker, args=(q_out, fname, working_dir)) 298 | write_process.start() 299 | 300 | for i, item in enumerate(image_list): 301 | q_in[i % len(q_in)].put((i, item)) 302 | for q in q_in: 303 | 
q.put(None) 304 | for p in read_process: 305 | p.join() 306 | 307 | q_out.put(None) 308 | write_process.join() 309 | else: 310 | print('multiprocessing not available, falling back to single-threaded encoding') 311 | try: 312 | import Queue as queue 313 | except ImportError: 314 | import queue 315 | q_out = queue.Queue() 316 | fname = os.path.basename(fname) 317 | fname_rec = os.path.splitext(fname)[0] + '.rec' 318 | fname_idx = os.path.splitext(fname)[0] + '.idx' 319 | record = mx.recordio.MXIndexedRecordIO(os.path.join(working_dir, fname_idx), 320 | os.path.join(working_dir, fname_rec), 'w') 321 | cnt = 0 322 | pre_time = time.time() 323 | for i, item in enumerate(image_list): 324 | image_encode(args, i, item, q_out) 325 | if q_out.empty(): 326 | continue 327 | _, s, _ = q_out.get() 328 | record.write_idx(item[0], s) 329 | if cnt % 1000 == 0: 330 | cur_time = time.time() 331 | print('time:', cur_time - pre_time, ' count:', cnt) 332 | pre_time = cur_time 333 | cnt += 1 334 | if not count: 335 | print('Did not find any .lst file with prefix %s'%args.prefix) 336 | -------------------------------------------------------------------------------- /Starter Notebooks/Advanced Data Science - XRay Analysis/images/gt_label_output.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-architecting-for-ml-hcls/9236be67abb200b6073b2b17079c9c368326c353/Starter Notebooks/Advanced Data Science - XRay Analysis/images/gt_label_output.png -------------------------------------------------------------------------------- /Starter Notebooks/Advanced Data Science - XRay Analysis/images/tensorplot.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-architecting-for-ml-hcls/9236be67abb200b6073b2b17079c9c368326c353/Starter Notebooks/Advanced Data Science - XRay Analysis/images/tensorplot.gif 
-------------------------------------------------------------------------------- /Starter Notebooks/Advanced Data Science - XRay Analysis/src/requirements.txt: -------------------------------------------------------------------------------- 1 | 2 | gluoncv 3 | -------------------------------------------------------------------------------- /Starter Notebooks/Advanced Data Science - XRay Analysis/src/ssd_entry_point.py: -------------------------------------------------------------------------------- 1 | import io 2 | import PIL.Image 3 | import json 4 | import logging 5 | import numpy as np 6 | 7 | logger = logging.getLogger(__name__) 8 | logger.setLevel(logging.DEBUG) 9 | 10 | import glob 11 | import time 12 | import argparse 13 | import warnings 14 | import mxnet as mx 15 | from mxnet import nd 16 | from mxnet import gluon 17 | from mxnet import autograd 18 | 19 | from gluoncv import data as gdata 20 | from gluoncv.data.batchify import Tuple, Stack, Pad 21 | from gluoncv.data.transforms.presets.ssd import SSDDefaultTrainTransform 22 | from gluoncv import model_zoo 23 | import os 24 | import gluoncv as gcv 25 | 26 | def model_fn(model_dir): 27 | """ 28 | Load the gluon model. Called once when hosting service starts. 29 | :param: model_dir The directory where model files are stored. 
30 | :return: a model (in this case a Gluon network) 31 | """ 32 | net = gluon.SymbolBlock.imports('%s/model-symbol.json' % model_dir, 33 | ['data'], 34 | '%s/model-0000.params' % model_dir) 35 | 36 | return net 37 | 38 | def parse_args(): 39 | parser = argparse.ArgumentParser(description='Train SSD networks.') 40 | parser.add_argument('--network', type=str, default='ssd_512_mobilenet1.0_voc', 41 | help="Network name") 42 | parser.add_argument('--data-shape', type=int, default=512, 43 | help="Input data shape, use 300, 512.") 44 | parser.add_argument('--batch-size', type=int, default=32, 45 | help='Training mini-batch size') 46 | parser.add_argument('--num-workers', '-j', dest='num_workers', type=int, 47 | default=4, help='Number of data workers, you can use a larger ' 48 | 'number to accelerate data loading, if your CPU and GPUs are powerful.') 49 | parser.add_argument('--gpus', type=str, default='0', 50 | help='Training with GPUs, you can specify 1,3 for example.') 51 | parser.add_argument('--epochs', type=int, default=240, 52 | help='Training epochs.') 53 | parser.add_argument('--start-epoch', type=int, default=0, 54 | help='Starting epoch for resuming, default is 0 for new training.' 55 | 'You can set it to 100, for example, to start from epoch 100.') 56 | parser.add_argument('--log-interval', type=int, default=100, 57 | help='Logging mini-batch interval. Default is 100.') 58 | parser.add_argument('--lr', type=float, default=0.001, 59 | help='Learning rate, default is 0.001') 60 | parser.add_argument('--lr-decay', type=float, default=0.1, 61 | help='decay rate of learning rate. default is 0.1.') 62 | parser.add_argument('--lr-decay-epoch', type=str, default='160,200', 63 | help='epochs at which learning rate decays. 
default is 160,200.') 64 | parser.add_argument('--momentum', type=float, default=0.9, 65 | help='SGD momentum, default is 0.9') 66 | parser.add_argument('--wd', type=float, default=0.0005, 67 | help='Weight decay, default is 5e-4') 68 | 69 | return parser.parse_args() 70 | 71 | 72 | def get_dataloader(net, data_shape, batch_size, num_workers, ctx): 73 | """Get dataloader.""" 74 | 75 | width, height = data_shape, data_shape 76 | # use fake data to generate fixed anchors for target generation 77 | with autograd.train_mode(): 78 | _, _, anchors = net(mx.nd.zeros((1, 3, height, width), ctx)) 79 | anchors = anchors.as_in_context(mx.cpu()) 80 | batchify_fn = Tuple(Stack(), Stack(), Stack()) # stack image, cls_targets, box_targets 81 | 82 | # can I point that to a bundle of png files instead? 83 | train_dataset = gdata.RecordFileDetection(os.path.join(os.environ['SM_CHANNEL_TRAIN'], 'train.rec')) 84 | 85 | # this is the folder with all the training images 86 | train_folder = os.environ['SM_CHANNEL_TRAIN'] 87 | 88 | train_loader = gluon.data.DataLoader( 89 | train_dataset.transform(SSDDefaultTrainTransform(width, height, anchors)), 90 | batch_size, True, batchify_fn=batchify_fn, last_batch='rollover', num_workers=num_workers) 91 | return train_loader 92 | 93 | def train(net, train_data, ctx, args): 94 | """Training pipeline""" 95 | 96 | net.collect_params().reset_ctx(ctx) 97 | 98 | trainer = gluon.Trainer( 99 | net.collect_params(), 'sgd', 100 | {'learning_rate': args.lr, 'wd': args.wd, 'momentum': args.momentum}, update_on_kvstore=None) 101 | 102 | # lr decay policy 103 | lr_decay = float(args.lr_decay) 104 | lr_steps = sorted([float(ls) for ls in args.lr_decay_epoch.split(',') if ls.strip()]) 105 | 106 | mbox_loss = gcv.loss.SSDMultiBoxLoss() 107 | ce_metric = mx.metric.Loss('CrossEntropy') 108 | smoothl1_metric = mx.metric.Loss('SmoothL1') 109 | 110 | # set up logger 111 | logging.basicConfig() 112 | logger = logging.getLogger() 113 | logger.setLevel(logging.INFO) 114 | 
logger.info(args) 115 | logger.info('Start training from [Epoch {}]'.format(args.start_epoch)) 116 | best_map = [0] 117 | 118 | for epoch in range(args.start_epoch, args.epochs): 119 | while lr_steps and epoch >= lr_steps[0]: 120 | new_lr = trainer.learning_rate * lr_decay 121 | lr_steps.pop(0) 122 | trainer.set_learning_rate(new_lr) 123 | logger.info("[Epoch {}] Set learning rate to {}".format(epoch, new_lr)) 124 | ce_metric.reset() 125 | smoothl1_metric.reset() 126 | tic = time.time() 127 | btic = time.time() 128 | net.hybridize(static_alloc=True, static_shape=True) 129 | 130 | for i, batch in enumerate(train_data): 131 | data = gluon.utils.split_and_load(batch[0], ctx_list=ctx, batch_axis=0) 132 | cls_targets = gluon.utils.split_and_load(batch[1], ctx_list=ctx, batch_axis=0) 133 | box_targets = gluon.utils.split_and_load(batch[2], ctx_list=ctx, batch_axis=0) 134 | 135 | with autograd.record(): 136 | cls_preds = [] 137 | box_preds = [] 138 | for x in data: 139 | cls_pred, box_pred, _ = net(x) 140 | cls_preds.append(cls_pred) 141 | box_preds.append(box_pred) 142 | sum_loss, cls_loss, box_loss = mbox_loss( 143 | cls_preds, box_preds, cls_targets, box_targets) 144 | autograd.backward(sum_loss) 145 | # since we have already normalized the loss, we don't want to normalize 146 | # by batch-size anymore 147 | trainer.step(1) 148 | 149 | local_batch_size = int(args.batch_size) 150 | ce_metric.update(0, [l * local_batch_size for l in cls_loss]) 151 | smoothl1_metric.update(0, [l * local_batch_size for l in box_loss]) 152 | if args.log_interval and not (i + 1) % args.log_interval: 153 | name1, loss1 = ce_metric.get() 154 | name2, loss2 = smoothl1_metric.get() 155 | logger.info('[Epoch {}][Batch {}], Speed: {:.3f} samples/sec, {}={:.3f}, {}={:.3f}'.format( 156 | epoch, i, args.batch_size/(time.time()-btic), name1, loss1, name2, loss2)) 157 | btic = time.time() 158 | 159 | name1, loss1 = ce_metric.get() 160 | name2, loss2 = smoothl1_metric.get() 161 | logger.info('[Epoch {}] 
Training cost: {:.3f}, {}={:.3f}, {}={:.3f}'.format( 162 | epoch, (time.time()-tic), name1, loss1, name2, loss2)) 163 | current_map = 0. 164 | 165 | # save model 166 | net.set_nms(nms_thresh=0.45, nms_topk=400, post_nms=100) 167 | net(mx.nd.ones((1,3,512,512), ctx=ctx[0])) 168 | net.export('%s/model' % os.environ['SM_MODEL_DIR']) 169 | return net 170 | 171 | if __name__ == '__main__': 172 | 173 | args = parse_args() 174 | 175 | ctx = [mx.gpu(int(i)) for i in args.gpus.split(',') if i.strip()] 176 | ctx = ctx if ctx else [mx.cpu()] 177 | 178 | net = model_zoo.get_model(args.network, pretrained=False, ctx=ctx) 179 | net.initialize(ctx=ctx)  # initialize on the selected context(s) so CPU-only runs do not fail 180 | 181 | train_loader = get_dataloader(net, args.data_shape, args.batch_size, args.num_workers, ctx[0]) 182 | 183 | train(net, train_loader, ctx, args) -------------------------------------------------------------------------------- /Starter Notebooks/Advanced Data Science - XRay Analysis/tensor_plot.py: -------------------------------------------------------------------------------- 1 | # Third Party 2 | import numpy as np 3 | import plotly.graph_objects as go 4 | import plotly.offline as py 5 | 6 | # First Party 7 | from smdebug.trials import create_trial 8 | 9 | py.init_notebook_mode(connected=True) 10 | 11 | # This class provides methods to plot tensors as 3 dimensional objects. It is intended for plotting convolutional 12 | # neural networks and expects that inputs are images and that outputs are class labels or images. 
13 | class TensorPlot: 14 | def __init__( 15 | self, 16 | regex, 17 | path, 18 | steps=10, 19 | batch_sample_id=None, 20 | color_channel=1, 21 | title="", 22 | label=None, 23 | prediction=None, 24 | ): 25 | """ 26 | 27 | :param regex: regex that selects the tensors to plot 28 | :param path: path to the smdebug trial (local directory or S3 URI) 29 | :param steps: number of steps to plot 30 | :param batch_sample_id: None, -1, or the index of the batch sample to plot 31 | :param color_channel: 1 if the color channel is the first axis (NCHW), 3 if it is the last (NHWC) 32 | :param title: figure title 33 | :param label: regex of the model input tensors 34 | :param prediction: regex of the model output tensors 35 | """ 36 | self.trial = create_trial(path) 37 | self.regex = regex 38 | self.steps = steps 39 | self.batch_sample_id = batch_sample_id 40 | self.color_channel = color_channel 41 | self.title = title 42 | self.label = label 43 | self.prediction = prediction 44 | self.max_dim = 0 45 | self.dist = 0 46 | self.tensors = {} 47 | self.output = {} 48 | self.input = {} 49 | self.load_tensors() 50 | self.set_figure() 51 | self.plot_network() 52 | self.set_frames() 53 | 54 | # Loads all tensors into a dict where the key is the step. 55 | # If batch_sample_id is None then the batch dimension is plotted as a separate dimension; 56 | # if batch_sample_id is -1 then tensors are averaged over the batch dimension. Otherwise 57 | # the corresponding sample is plotted in the figure, and all the remaining samples 58 | # in the batch are dropped. 
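    # For example (hypothetical shapes, mirroring the rules above): given a
    # convolution output of shape (8, 10, 24, 24),
    #   batch_sample_id=None keeps all 8 samples as separate surfaces,
    #   batch_sample_id=-1   averages them into one (10, 24, 24) tensor,
    #   batch_sample_id=3    keeps only the fourth sample and drops the rest.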
59 | def load_tensors(self): 60 | available_steps = self.trial.steps() 61 | for step in available_steps[0 : self.steps]: 62 | self.tensors[step] = [] 63 | 64 | # input image into the neural network 65 | if self.label is not None: 66 | for tname in self.trial.tensor_names(regex=self.label): 67 | tensor = self.trial.tensor(tname).value(step) 68 | if self.color_channel == 1: 69 | self.input[step] = tensor[0, 0, :, :] 70 | elif self.color_channel == 3: 71 | self.input[step] = tensor[0, :, :, 0] 72 | 73 | # iterate over tensors that match the regex 74 | for tname in self.trial.tensor_names(regex=self.regex): 75 | tensor = self.trial.tensor(tname).value(step) 76 | # get max value of tensors to set axis dimension accordingly 77 | for dim in tensor.shape: 78 | if dim > self.max_dim: 79 | self.max_dim = dim 80 | 81 | # layer inputs/outputs have the batch size as their first dimension 82 | if self.batch_sample_id is not None: 83 | # average over batch dimension 84 | if self.batch_sample_id == -1: 85 | tensor = np.sum(tensor, axis=0) / tensor.shape[0] 86 | # plot item from batch 87 | elif self.batch_sample_id >= 0 and self.batch_sample_id < tensor.shape[0]: 88 | tensor = tensor[self.batch_sample_id] 89 | # plot first item from batch 90 | else: 91 | tensor = tensor[0] 92 | 93 | # normalize tensor values between 0 and 1 so that all tensors have same colorscheme 94 | tensor = tensor - np.min(tensor) 95 | if np.max(tensor) != 0: 96 | tensor = tensor / np.max(tensor) 97 | if len(tensor.shape) == 3: 98 | for l in range(tensor.shape[self.color_channel - 1]): 99 | if self.color_channel == 1: 100 | self.tensors[step].append([tname, tensor[l, :, :]]) 101 | elif self.color_channel == 3: 102 | self.tensors[step].append([tname, tensor[:, :, l]]) 103 | elif len(tensor.shape) == 1: 104 | self.tensors[step].append([tname, tensor]) 105 | else: 106 | # normalize tensor values between 0 and 1 so that all tensors have same colorscheme 107 | tensor = tensor - np.min(tensor) 108 | if np.max(tensor) != 0: 109 | 
tensor = tensor / np.max(tensor) 110 | if len(tensor.shape) == 4: 111 | for i in range(tensor.shape[0]): 112 | for l in range(tensor.shape[1]): 113 | if self.color_channel == 1: 114 | self.tensors[step].append([tname, tensor[i, l, :, :]]) 115 | elif self.color_channel == 3: 116 | self.tensors[step].append([tname, tensor[i, :, :, l]]) 117 | elif len(tensor.shape) == 2: 118 | self.tensors[step].append([tname, tensor]) 119 | 120 | # model output 121 | if self.prediction is not None: 122 | for tname in self.trial.tensor_names(regex=self.prediction): 123 | tensor = self.trial.tensor(tname).value(step) 124 | # predicted class (batch size, probabilities per class) 125 | if len(tensor.shape) == 2: 126 | self.output[step] = np.array([np.argmax(tensor, axis=1)[0]]) 127 | # predicted image (batch size, color channel, width, height) 128 | elif len(tensor.shape) == 4: 129 | # MXNet has color channel in dim1 130 | if self.color_channel == 1: 131 | self.output[step] = tensor[0, 0, :, :] 132 | # TF has color channel in dim 3 133 | elif self.color_channel == 3: 134 | self.output[step] = tensor[0, :, :, 0] 135 | 136 | # Configure the plot layout 137 | def set_figure(self): 138 | self.fig = go.Figure( 139 | layout=go.Layout( 140 | autosize=False, 141 | title=self.title, 142 | width=1000, 143 | height=800, 144 | template="plotly_dark", 145 | font=dict(color="gray"), 146 | showlegend=False, 147 | updatemenus=[ 148 | dict( 149 | type="buttons", 150 | buttons=[ 151 | dict( 152 | label="Play", 153 | method="animate", 154 | args=[ 155 | None, 156 | { 157 | "frame": {"duration": 1, "redraw": True}, 158 | "fromcurrent": True, 159 | "transition": {"duration": 1}, 160 | }, 161 | ], 162 | ) 163 | ], 164 | ) 165 | ], 166 | scene=dict( 167 | xaxis=dict( 168 | range=[-self.max_dim / 2, self.max_dim / 2], 169 | autorange=False, 170 | gridcolor="black", 171 | zerolinecolor="black", 172 | showgrid=False, 173 | showline=False, 174 | showticklabels=False, 175 | showspikes=False, 176 | ), 177 | 
yaxis=dict( 178 | range=[-self.max_dim / 2, self.max_dim / 2], 179 | autorange=False, 180 | gridcolor="black", 181 | zerolinecolor="black", 182 | showgrid=False, 183 | showline=False, 184 | showticklabels=False, 185 | showspikes=False, 186 | ), 187 | zaxis=dict( 188 | gridcolor="black", 189 | zerolinecolor="black", 190 | showgrid=False, 191 | showline=False, 192 | showticklabels=False, 193 | showspikes=False, 194 | ), 195 | ), 196 | ) 197 | ) 198 | 199 | # Create a sequence of frames: tensors from same step will be stored in the same frame 200 | def set_frames(self): 201 | frames = [] 202 | available_steps = self.trial.steps() 203 | for step in available_steps[0 : self.steps]: 204 | layers = [] 205 | if self.label is not None: 206 | if len(self.input[step].shape) == 2: 207 | # plot predicted image 208 | layers.append({"type": "surface", "surfacecolor": self.input[step]}) 209 | for i in range(len(self.tensors[step])): 210 | if len(self.tensors[step][i][1].shape) == 1: 211 | # set color of fully connected layer for corresponding step 212 | layers.append( 213 | {"type": "scatter3d", "marker": {"color": self.tensors[step][i][1]}} 214 | ) 215 | elif len(self.tensors[step][i][1].shape) == 2: 216 | # set color of convolutional/pooling layer for corresponding step 217 | layers.append({"type": "surface", "surfacecolor": self.tensors[step][i][1]}) 218 | if self.prediction is not None: 219 | if len(self.output[step].shape) == 1: 220 | # plot predicted class for first input in batch 221 | layers.append( 222 | { 223 | "type": "scatter3d", 224 | "text": "Predicted class " + str(self.output[step][0]), 225 | "textfont": {"size": 40}, 226 | } 227 | ) 228 | elif len(self.output[step].shape) == 2: 229 | # plot predicted image 230 | layers.append({"type": "surface", "surfacecolor": self.output[step]}) 231 | frames.append(go.Frame(data=layers)) 232 | 233 | self.fig.frames = frames 234 | 235 | # Plot the different neural network layers 236 | # if ignore_batch_dimension is True then 
convolutions are plotted as 237 | # Surface and dense layers are plotted as Scatter3D 238 | # if ignore_batch_dimension is False then convolutions and dense layers 239 | # are plotted as Surface. We don't plot biases. 240 | # If convolution has shape [batch_size, 10, 24, 24] and ignore_batch_dimension==True 241 | # then this function will plot 10 Surface layers in the size of 24x24 242 | def plot_network(self): 243 | tensors = [] 244 | dist = 0 245 | counter = 0 246 | 247 | first_step = self.trial.steps()[0] 248 | if self.label is not None: 249 | tensor = self.input[first_step].shape 250 | if len(tensor) == 2: 251 | tensors.append( 252 | go.Surface( 253 | z=np.zeros((tensor[0], tensor[1])) + self.dist, 254 | y=np.arange(-tensor[0] / 2, tensor[0] / 2), 255 | x=np.arange(-tensor[1] / 2, tensor[1] / 2), 256 | surfacecolor=self.input[first_step], 257 | showscale=False, 258 | colorscale="gray", 259 | opacity=0.7, 260 | ) 261 | ) 262 | self.dist += 2 263 | prev_name = None 264 | for tname, layer in self.tensors[first_step]: 265 | tensor = layer.shape 266 | 267 | if len(tensor) == 2: 268 | tensors.append( 269 | go.Surface( 270 | z=np.zeros((tensor[0], tensor[1])) + self.dist, 271 | y=np.arange(-tensor[0] / 2, tensor[0] / 2), 272 | x=np.arange(-tensor[1] / 2, tensor[1] / 2), 273 | text=tname, 274 | surfacecolor=layer, 275 | showscale=False, 276 | # colorscale='gray', 277 | opacity=0.7, 278 | ) 279 | ) 280 | 281 | elif len(tensor) == 1: 282 | tensors.append( 283 | go.Scatter3d( 284 | z=np.zeros(tensor[0]) + self.dist, 285 | y=np.zeros(tensor[0]), 286 | x=np.arange(-tensor[0] / 2, tensor[0] / 2), 287 | text=tname, 288 | mode="markers", 289 | marker=dict(size=3, opacity=0.7, color=layer), 290 | ) 291 | ) 292 | if tname == prev_name: 293 | self.dist += 0.2 294 | else: 295 | self.dist += 1 296 | counter += 1 297 | prev_name = tname 298 | # plot model output 299 | if self.prediction is not None: 300 | # model predicts a class label (batch_size, class probabilities) 301 | if 
len(self.output[first_step].shape) == 1: 302 | tensors.append( 303 | go.Scatter3d( 304 | z=np.array([self.dist + 0.2]), 305 | x=np.array([0]), 306 | y=np.array([0]), 307 | text="Predicted class", 308 | mode="markers+text", 309 | marker=dict(size=3, color="black"), 310 | textfont=dict(size=18), 311 | opacity=0.7, 312 | ) 313 | ) 314 | # model predicts an output image (batch size, color channel, width, height) 315 | elif len(self.output[first_step].shape) == 2: 316 | tensor = self.output[first_step].shape 317 | tensors.append( 318 | go.Surface( 319 | z=np.zeros((tensor[0], tensor[1])) + self.dist + 3, 320 | y=np.arange(-tensor[0] / 2, tensor[0] / 2), 321 | x=np.arange(-tensor[1] / 2, tensor[1] / 2), 322 | text="Predicted image", 323 | surfacecolor=self.output[first_step], 324 | showscale=False, 325 | colorscale="gray", 326 | opacity=0.7, 327 | ) 328 | ) 329 | 330 | # add list of tensors to figure 331 | self.fig.add_traces(tensors) 332 | -------------------------------------------------------------------------------- /Starter Notebooks/Cost Prediction/Cost Prediction with Autopilot.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Predict Hospital Spending Per Patient with SageMaker Autopilot\n", 8 | "In this lab we'll get started with SageMaker using Autopilot! In particular we will download the Medicare dataset, clean it, and plug it into a framework for SageMaker Autopilot.\n", 9 | "\n", 10 | "You'll see the notebooks generated for you, the hundreds of models trained, in addition to your very own inference pipeline, deployable to a SageMaker endpoint or batch transform job!\n", 11 | "\n", 12 | "At the end, we'll set up a SHAP explainer to analyze local feature importance for a set of predictions. Let's get started!" 
13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": null, 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [ 21 | "# Download the Medicare dataset as a csv file to the notebook\n", 22 | "!wget -O Medicare_Hospital_Spending_by_Claim.csv https://data.medicare.gov/api/views/nrth-mfg3/rows.csv?accessType=DOWNLOAD" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "### Data Preprocessing on the Raw Dataset\n", 30 | "In this section we read the raw csv data set into a pandas data frame. We inspect the data using the pandas head() function. We pre-process the data with feature encoding, feature engineering, and column renaming, drop some columns that have no relevance to predicting the `Avg_Hosp` cost, and verify that there are no missing values in the data set." 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": null, 36 | "metadata": {}, 37 | "outputs": [], 38 | "source": [ 39 | "# Read the CSV file into a pandas dataframe and save it to another table so we can keep a copy of the original dataset\n", 40 | "# In our example we use the dataframe called table1 for all pre-processing, while the dataframe table\n", 41 | "# maintains a copy of the original data\n", 42 | "\n", 43 | "import pandas as pd\n", 44 | "table = pd.read_csv('Medicare_Hospital_Spending_by_Claim.csv')\n", 45 | "table1 = table.copy()\n", 46 | "table1.head()" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": null, 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "# Encode column \"State\"\n", 56 | "\n", 57 | "replace_map = {'State': {'AK': 1, 'AL': 2, 'AR': 3, 'AZ': 4, 'CA': 5, 'CO': 6, 'CT': 7, \n", 58 | " 'DC': 8, 'DE': 9, 'FL': 10, 'GA': 11, 'HI': 12, \n", 59 | " 'IA': 13, 'ID': 14, 'IL': 15, 'IN': 16, 'KS': 17, \n", 60 | " 'KY': 18, 'LA': 19, 'MA': 20, 'ME': 21, 'MI': 22, \n", 61 | " 'MN': 23, 'MO': 24, 'MS': 25, 'MT': 26, 'NC': 27, \n", 62 | " 'ND': 28, 'NE': 29, 'NH': 30, 'NJ': 31, 
'NM': 32, \n", 63 | " 'NV': 33, 'NY': 34, 'OH': 35, 'OK': 36, 'OR': 37, \n", 64 | " 'PA': 38, 'RI': 39, 'SC': 40, 'SD': 41, 'TN': 42, \n", 65 | " 'TX': 43, 'UT': 44, 'VA': 45, 'VT': 46, 'WA': 47, \n", 66 | " 'WI': 48, 'WV': 49, 'WY': 50}}\n", 67 | "table1.replace(replace_map,inplace=True)" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": null, 73 | "metadata": {}, 74 | "outputs": [], 75 | "source": [ 76 | "# Encode column \"Period\"\n", 77 | "\n", 78 | "replace_map = {'Period': {'1 to 3 days Prior to Index Hospital Admission': 1, \n", 79 | " 'During Index Hospital Admission': 2, \n", 80 | " '1 through 30 days After Discharge from Index Hospital Admission': 3, \n", 81 | " 'Complete Episode': 4}}\n", 82 | "table1.replace(replace_map,inplace=True)" 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": null, 88 | "metadata": {}, 89 | "outputs": [], 90 | "source": [ 91 | "# Encode column \"Claim Type\"\n", 92 | "\n", 93 | "replace_map = {'Claim Type': {'Home Health Agency': 1, \n", 94 | " 'Hospice': 2, \n", 95 | " 'Inpatient': 3, \n", 96 | " 'Outpatient': 4, \n", 97 | " 'Skilled Nursing Facility': 5, \n", 98 | " 'Durable Medical Equipment': 6, \n", 99 | " 'Carrier': 7, \n", 100 | " 'Total': 8}}\n", 101 | "table1.replace(replace_map,inplace=True)" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": null, 107 | "metadata": {}, 108 | "outputs": [], 109 | "source": [ 110 | "# Convert the column \"Percent of Spending Hospital\" to float, remove the percent sign and \n", 111 | "# divide by 100 to normalize for percentage\n", 112 | "\n", 113 | "table1['Percent of Spending Hospital'] = table1['Percent of Spending Hospital'].str.rstrip('%').astype('float')\n", 114 | "table1['Percent of Spending Hospital'] = table1['Percent of Spending Hospital']/100" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": null, 120 | "metadata": {}, 121 | "outputs": [], 122 | "source": [ 
123 | "# Convert the column \"Percent of Spending State\" to float, remove the percent sign and \n", 124 | "# divide by 100 to normalize for percentage\n", 125 | "\n", 126 | "table1['Percent of Spending State'] = table1['Percent of Spending State'].str.rstrip('%').astype('float')\n", 127 | "table1['Percent of Spending State'] = table1['Percent of Spending State']/100" 128 | ] 129 | }, 130 | { 131 | "cell_type": "code", 132 | "execution_count": null, 133 | "metadata": {}, 134 | "outputs": [], 135 | "source": [ 136 | "# Convert the column \"Percent of Spending Nation\" to float, remove the percent sign and \n", 137 | "# divide by 100 to normalize for percentage\n", 138 | "\n", 139 | "table1['Percent of Spending Nation'] = table1['Percent of Spending Nation'].str.rstrip('%').astype('float')\n", 140 | "table1['Percent of Spending Nation'] = table1['Percent of Spending Nation']/100" 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": null, 146 | "metadata": {}, 147 | "outputs": [], 148 | "source": [ 149 | "# Drop column \"Facility Name\"; \"Facility Id\" already identifies the facility, so the facility name is not\n", 150 | "# relevant for the model\n", 151 | "\n", 152 | "table1.drop(['Facility Name'], axis=1, inplace = True)" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": null, 158 | "metadata": {}, 159 | "outputs": [], 160 | "source": [ 161 | "# Move the \"Avg Spending Per Episode Hospital\" column to the beginning, since the\n", 162 | "# algorithm requires the prediction column at the beginning\n", 163 | "\n", 164 | "col_name='Avg Spending Per Episode Hospital'\n", 165 | "first_col = table1.pop(col_name)\n", 166 | "table1.insert(0, col_name, first_col)" 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": null, 172 | "metadata": {}, 173 | "outputs": [], 174 | "source": [ 175 | "# Convert integer values to float in the columns \"Avg Spending Per Episode Hospital\", \n", 176 | "\"Avg Spending Per 
Episode State\" and \"Avg Spending Per Episode Nation\"\n", 177 | "# Columns with integer values are interpreted as categorical values. Changing to float avoids any misinterpretation\n", 178 | "\n", 179 | "table1['Avg Spending Per Episode Hospital'] = table1['Avg Spending Per Episode Hospital'].astype('float')\n", 180 | "table1['Avg Spending Per Episode State'] = table1['Avg Spending Per Episode State'].astype('float')\n", 181 | "table1['Avg Spending Per Episode Nation'] = table1['Avg Spending Per Episode Nation'].astype('float')" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": null, 187 | "metadata": {}, 188 | "outputs": [], 189 | "source": [ 190 | "# Rename long column names for costs and percentage costs on the hospital, state and nation,\n", 191 | "# so they are easily referenced in the rest of this discussion\n", 192 | "\n", 193 | "table1.rename(columns={'Avg Spending Per Episode Hospital':'Avg_Hosp',\n", 194 | " 'Avg Spending Per Episode State':'Avg_State',\n", 195 | " 'Avg Spending Per Episode Nation':'Avg_Nation',\n", 196 | " 'Percent of Spending Hospital':'Percent_Hosp',\n", 197 | " 'Percent of Spending State':'Percent_State',\n", 198 | " 'Percent of Spending Nation':'Percent_Nation'}, \n", 199 | " inplace=True)" 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": null, 205 | "metadata": {}, 206 | "outputs": [], 207 | "source": [ 208 | "# Convert Start Date and End Date to datetime objects, then convert them to integers. First the data is converted\n", 209 | "# to Pandas datetime object. 
Then the year, month, and day are extracted from the datetime object and \n", 210 | "# multiplied by some weights to convert them into final integer values.\n", 211 | "\n", 212 | "table1['Start Date'] = pd.to_datetime(table1['Start Date'])\n", 213 | "table1['End Date'] = pd.to_datetime(table1['End Date'])\n", 214 | "table1['Start Date'] = 1000*table1['Start Date'].dt.year + 100*table1['Start Date'].dt.month + table1['Start Date'].dt.day\n", 215 | "table1['End Date'] = 1000*table1['End Date'].dt.year + 100*table1['End Date'].dt.month + table1['End Date'].dt.day" 216 | ] 217 | }, 218 | { 219 | "cell_type": "code", 220 | "execution_count": null, 221 | "metadata": {}, 222 | "outputs": [], 223 | "source": [ 224 | "# See the first 5 rows in the dataframe to see how the changed data looks\n", 225 | "\n", 226 | "table1.head()" 227 | ] 228 | }, 229 | { 230 | "cell_type": "code", 231 | "execution_count": null, 232 | "metadata": {}, 233 | "outputs": [], 234 | "source": [ 235 | "# Drop Columns \"Start Date\" and \"End Date\". The dataset is only for 2018, hence all start and end dates\n", 236 | "# are the same in each row and do not impact the model\n", 237 | "\n", 238 | "table1.drop(['Start Date'], axis=1, inplace = True)\n", 239 | "table1.drop(['End Date'], axis=1, inplace = True)" 240 | ] 241 | }, 242 | { 243 | "cell_type": "code", 244 | "execution_count": null, 245 | "metadata": {}, 246 | "outputs": [], 247 | "source": [ 248 | "# Make sure the table does not have missing values. 
The following code line shows there are no missing values\n", 249 | "# in the table\n", 250 | "\n", 251 | "table1.isna().sum()" 252 | ] 253 | }, 254 | { 255 | "cell_type": "code", 256 | "execution_count": null, 257 | "metadata": {}, 258 | "outputs": [], 259 | "source": [ 260 | "df = table1.sample(frac=1)" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": null, 266 | "metadata": {}, 267 | "outputs": [], 268 | "source": [ 269 | "fraction_train = .85\n", 270 | "test_row = round(df.shape[0] * fraction_train)\n", 271 | "test_set = df.iloc[test_row:]\n", 272 | "train_set = df.iloc[:test_row]" 273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": null, 278 | "metadata": {}, 279 | "outputs": [], 280 | "source": [ 281 | "local_train_file = 'train_set.csv'\n", 282 | "\n", 283 | "train_set.to_csv(local_train_file, index=False, header=True)\n", 284 | "test_set.to_csv('test_set.csv', index=False, header=True)" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": null, 290 | "metadata": {}, 291 | "outputs": [], 292 | "source": [ 293 | "# optionally run some of your own plots here to analyze the data" 294 | ] 295 | }, 296 | { 297 | "cell_type": "markdown", 298 | "metadata": {}, 299 | "source": [ 300 | "# SageMaker Autopilot\n", 301 | "Next, let's run this dataset on SageMaker Autopilot! 
" 302 | ] 303 | }, 304 | { 305 | "cell_type": "code", 306 | "execution_count": null, 307 | "metadata": {}, 308 | "outputs": [], 309 | "source": [ 310 | "from sagemaker import AutoML\n", 311 | "from time import gmtime, strftime, sleep\n", 312 | "import numpy as np\n", 313 | "import sagemaker\n", 314 | "\n", 315 | "sess = sagemaker.Session()\n", 316 | "\n", 317 | "role = sagemaker.get_execution_role()\n", 318 | "\n", 319 | "timestamp_suffix = strftime('%d-%H-%M-%S', gmtime())\n", 320 | "base_job_name = 'cost-prediction-' + timestamp_suffix\n", 321 | "\n", 322 | "target_attribute_name = 'Avg_Hosp'\n", 323 | "# Avg_Hosp is a continuous value, so this is a regression problem;\n", 324 | "# no encoding of the target attribute values is needed\n", 325 | "\n", 326 | "automl = AutoML(role=role,\n", 327 | " target_attribute_name=target_attribute_name,\n", 328 | " base_job_name=base_job_name,\n", 329 | " sagemaker_session=sess,\n", 330 | " max_candidates=20,\n", 331 | " problem_type = 'Regression',\n", 332 | " job_objective = {'MetricName':'MSE'})\n", 333 | "\n", 334 | "automl.fit(local_train_file, job_name=base_job_name, wait=True, logs=True)" 335 | ] 336 | }, 337 | { 338 | "cell_type": "markdown", 339 | "metadata": {}, 340 | "source": [ 341 | "After you run this cell, open up the Experiments tab in SageMaker Studio, right-click on your new `cost-prediction` job, and view the AutoML job details! " 342 | ] 343 | }, 344 | { 345 | "cell_type": "markdown", 346 | "metadata": {}, 347 | "source": [ 348 | "![](../../Images/Autopilot.png)" 349 | ] 350 | }, 351 | { 352 | "cell_type": "markdown", 353 | "metadata": {}, 354 | "source": [ 355 | "Once the state of the job has moved into `Feature Engineering`, you should be able to open the data exploration notebook, in addition to the candidate generation notebook. " 356 | ] 357 | }, 358 | { 359 | "cell_type": "markdown", 360 | "metadata": {}, 361 | "source": [ 362 | "Spend some time stepping through these notebooks. 
You can also download the data transformation code base. Remember, all of this was generated for your specific dataset!" 363 | ] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | "metadata": {}, 368 | "source": [ 369 | "---\n", 370 | "# Analyze Autopilot Modeling Performance\n", 371 | "Your AutoML job will take some time to complete. Feel free to use that time to step through the generated notebooks and learn about all the feature engineering strategies they are using! \n", 372 | "\n", 373 | "Once your job has finished, it's time to analyze that performance. Luckily for us we can simply deploy that entire artifact onto an endpoint, using the same `model.deploy()` that we saw earlier. Let's do that here.\n", 374 | "\n", 375 | "We'll attach the name of your job to an AutoML estimator, so please make sure to paste in the name of your job below." 376 | ] 377 | }, 378 | { 379 | "cell_type": "code", 380 | "execution_count": null, 381 | "metadata": {}, 382 | "outputs": [], 383 | "source": [ 384 | "from datetime import datetime\n", 385 | "from sagemaker import AutoML\n", 386 | "import sagemaker\n", 387 | "import numpy as np\n", 388 | "\n", 389 | "sess = sagemaker.Session()\n", 390 | "\n", 391 | "# if you needed to restart your kernel, you can attach your AutoML job here\n", 392 | "automl_job_name = 'COST-PREDICTION-28-02-12-32' #<== REPLACE THIS WITH YOUR OWN AUTOML JOB NAME\n", 393 | "automl = AutoML.attach(automl_job_name, sagemaker_session=sess)\n", 394 | "\n", 395 | "ep_name = 'automl-endpoint-' + datetime.now().strftime('%d-%H-%M-%S') # longer timestamp keeps the name unique\n", 396 | "\n", 397 | "inference_response_keys = ['predicted_label', 'probability']\n", 398 | "\n", 399 | "# Create the inference endpoint\n", 400 | "automl.deploy(1, 'ml.m5.xlarge', endpoint_name = ep_name) #inference_response_keys=inference_response_keys)" 401 | ] 402 | }, 403 | { 404 | "cell_type": "code", 405 | "execution_count": null, 406 | "metadata": {}, 407 | "outputs": [], 408 | "source": [ 409 | "!pip install --upgrade sagemaker" 410 
| ] 411 | }, 412 | { 413 | "cell_type": "code", 414 | "execution_count": null, 415 | "metadata": {}, 416 | "outputs": [], 417 | "source": [ 418 | "from sagemaker.predictor import RealTimePredictor\n", 419 | "class AutomlEstimator:\n", 420 | " def __init__(self, endpoint_name, sagemaker_session):\n", 421 | " self.predictor = RealTimePredictor(\n", 422 | " endpoint_name=endpoint_name,\n", 423 | " sagemaker_session=sagemaker_session,\n", 424 | " serializer=sagemaker.serializers.CSVSerializer(),\n", 425 | " content_type='text/csv',\n", 426 | " accept='text/csv'\n", 427 | " )\n", 428 | " # Prediction function for regression\n", 429 | " def predict(self, x):\n", 430 | " response = self.predictor.predict(x)\n", 431 | " return np.array([float(v) for v in response.decode('utf-8').split(',')])" 432 | ] 433 | }, 434 | { 435 | "cell_type": "code", 436 | "execution_count": null, 437 | "metadata": {}, 438 | "outputs": [], 439 | "source": [ 440 | "# make sure this is pointing to the right endpoint name - if you reran that cell above you may have overwritten the variable in memory\n", 441 | "automl_estimator = AutomlEstimator(endpoint_name=ep_name, sagemaker_session=sess)" 442 | ] 443 | }, 444 | { 445 | "cell_type": "code", 446 | "execution_count": null, 447 | "metadata": {}, 448 | "outputs": [], 449 | "source": [ 450 | "import pandas as pd\n", 451 | "\n", 452 | "test_data = pd.read_csv('test_set.csv')" 453 | ] 454 | }, 455 | { 456 | "cell_type": "markdown", 457 | "metadata": {}, 458 | "source": [ 459 | "# Explain Global and Local Modeling Performance with SHAP\n", 460 | "A key question that many stakeholders will have is how your model came to its predictions, both for the entire dataset and for individual predictions. In this lab we'll set up a SHAP model explainer to view feature importances. 
Feature importances can be understood both in terms of \"local,\" or per-prediction, and \"global,\" or for the entire dataset.\n", 461 | "\n", 462 | "We will actually wrap your model endpoint to provide these." 463 | ] 464 | }, 465 | { 466 | "cell_type": "code", 467 | "execution_count": null, 468 | "metadata": {}, 469 | "outputs": [], 470 | "source": [ 471 | "!conda update -n base -c defaults conda -y" 472 | ] 473 | }, 474 | { 475 | "cell_type": "code", 476 | "execution_count": null, 477 | "metadata": {}, 478 | "outputs": [], 479 | "source": [ 480 | "!conda install -c conda-forge -y shap" 481 | ] 482 | }, 483 | { 484 | "cell_type": "code", 485 | "execution_count": null, 486 | "metadata": {}, 487 | "outputs": [], 488 | "source": [ 489 | "import shap\n", 490 | "\n", 491 | "from shap import KernelExplainer\n", 492 | "from shap import sample\n", 493 | "from scipy.special import expit\n", 494 | "\n", 495 | "# Initialize plugin to make plots interactive.\n", 496 | "shap.initjs()" 497 | ] 498 | }, 499 | { 500 | "cell_type": "code", 501 | "execution_count": null, 502 | "metadata": {}, 503 | "outputs": [], 504 | "source": [ 505 | "data_without_target = test_data.drop(columns=['Avg_Hosp'])\n", 506 | "\n", 507 | "background_data = sample(data_without_target, 50)" 508 | ] 509 | }, 510 | { 511 | "cell_type": "code", 512 | "execution_count": null, 513 | "metadata": {}, 514 | "outputs": [], 515 | "source": [ 516 | "# Derive link function \n", 517 | "problem_type = automl.describe_auto_ml_job(job_name=automl_job_name)['ResolvedAttributes']['ProblemType'] \n", 518 | "link = \"identity\" if problem_type == 'Regression' else \"logit\" \n", 519 | "\n", 520 | "# the prediction function is passed to KernelExplainer, which only needs a handle that maps inputs to model outputs\n", 521 | "explainer = KernelExplainer(automl_estimator.predict, background_data, link=link)" 522 | ] 523 | }, 524 | { 525 | "cell_type": "code", 526 | "execution_count": null, 527 | "metadata": {}, 528 | "outputs": 
[], 529 | "source": [ 530 | "# expected_value is the average model output over the background dataset; with the identity link it is already in the prediction space\n", 531 | "print('expected value =', explainer.expected_value)" 532 | ] 533 | }, 534 | { 535 | "cell_type": "code", 536 | "execution_count": null, 537 | "metadata": {}, 538 | "outputs": [], 539 | "source": [ 540 | "%%writefile managed_endpoint.py\n", 541 | "\n", 542 | "import boto3\n", 543 | "region = boto3.Session().region_name\n", 544 | "\n", 545 | "sm = boto3.Session().client(service_name='sagemaker',region_name=region)\n", 546 | "\n", 547 | "class ManagedEndpoint:\n", 548 | " def __init__(self, ep_name, auto_delete=False):\n", 549 | " self.name = ep_name\n", 550 | " self.auto_delete = auto_delete\n", 551 | " self.in_service = False\n", 552 | " def __enter__(self):\n", 553 | " endpoint_description = sm.describe_endpoint(EndpointName=self.name)\n", 554 | " if endpoint_description['EndpointStatus'] == 'InService':\n", 555 | " self.in_service = True \n", 556 | " return self\n", 557 | " def __exit__(self, type, value, traceback):\n", 558 | " if self.in_service and self.auto_delete:\n", 559 | " print(\"Deleting the endpoint: {}\".format(self.name)) \n", 560 | " sm.delete_endpoint(EndpointName=self.name)\n", 561 | " sm.get_waiter('endpoint_deleted').wait(EndpointName=self.name)\n", 562 | " self.in_service = False" 563 | ] 564 | }, 565 | { 566 | "cell_type": "code", 567 | "execution_count": null, 568 | "metadata": {}, 569 | "outputs": [], 570 | "source": [ 571 | "# Get the first sample\n", 572 | "x = data_without_target.iloc[0:1]\n", 573 | "\n", 574 | "# ManagedEndpoint can optionally auto delete the endpoint after calculating the SHAP values. 
To enable auto delete, use ManagedEndpoint(ep_name, auto_delete=True)\n", 575 | "from managed_endpoint import ManagedEndpoint\n", 576 | "with ManagedEndpoint(ep_name) as mep:\n", 577 | " shap_values = explainer.shap_values(x, nsamples='auto', l1_reg='aic')" 578 | ] 579 | }, 580 | { 581 | "cell_type": "markdown", 582 | "metadata": {}, 583 | "source": [ 584 | "# Visualize SHAP Values\n", 585 | "Now, let's see which features most strongly influence the predictions from our model!\n", 586 | "\n", 587 | "![](images/shap_1.png)" 588 | ] 589 | }, 590 | { 591 | "cell_type": "code", 592 | "execution_count": null, 593 | "metadata": {}, 594 | "outputs": [], 595 | "source": [ 596 | "# For classification, shap_values are in the log-odds space; passing the link function converts the force plot back to the probability space\n", 597 | "shap.force_plot(explainer.expected_value, shap_values, x, link=link)" 598 | ] 599 | }, 600 | { 601 | "cell_type": "markdown", 602 | "metadata": {}, 603 | "source": [ 604 | "![](images/shap_2.png)" 605 | ] 606 | }, 607 | { 608 | "cell_type": "code", 609 | "execution_count": null, 610 | "metadata": {}, 611 | "outputs": [], 612 | "source": [ 613 | "with ManagedEndpoint(ep_name) as mep:\n", 614 | " shap_values = explainer.shap_values(x, nsamples='auto', l1_reg='num_features(5)')\n", 615 | "shap.force_plot(explainer.expected_value, shap_values, x, link=link)" 616 | ] 617 | }, 618 | { 619 | "cell_type": "code", 620 | "execution_count": null, 621 | "metadata": {}, 622 | "outputs": [], 623 | "source": [ 624 | "# Draw 50 random samples\n", 625 | "X = sample(data_without_target, 50)\n", 626 | "\n", 627 | "# Calculate SHAP values for these samples, and delete the endpoint\n", 628 | "with ManagedEndpoint(ep_name, auto_delete=True) as mep:\n", 629 | " shap_values = explainer.shap_values(X, nsamples='auto', l1_reg='aic')" 630 | ] 631 | }, 632 | { 633 | "cell_type": "markdown", 634 | "metadata": {}, 635 | "source": [ 636 | "![](images/shap_3.png)" 637 | ] 638 | }, 639 | { 640
| "cell_type": "code", 641 | "execution_count": null, 642 | "metadata": {}, 643 | "outputs": [], 644 | "source": [ 645 | "shap.force_plot(explainer.expected_value, shap_values, X, link=link)" 646 | ] 647 | }, 648 | { 649 | "cell_type": "markdown", 650 | "metadata": {}, 651 | "source": [ 652 | "![](images/shap_4.png)" 653 | ] 654 | }, 655 | { 656 | "cell_type": "code", 657 | "execution_count": null, 658 | "metadata": {}, 659 | "outputs": [], 660 | "source": [ 661 | "shap.summary_plot(shap_values, X, plot_type=\"bar\")" 662 | ] 663 | }, 664 | { 665 | "cell_type": "markdown", 666 | "metadata": {}, 667 | "source": [ 668 | "---\n", 669 | "# Optional - Extend Autopilot with your own feature engineering code\n", 670 | "If you have extra time after getting to the local inference explanations, why not take a look at bringing your own feature engineering code into SageMaker Autopilot? Remember that this notebook started with ~10 basic ETL steps in Python to convert the raw Medicare data into something our models could even start to look at.
Look at the following example to see how to port your own ETL scripts into SageMaker Autopilot for custom feature engineering.\n", 671 | "\n", 672 | "Remember, once you get the entire pipeline deployed onto an endpoint, you can send the raw data straight to the endpoint, and it will perform both feature engineering and model inference for you, all in real time!\n", 673 | "\n", 674 | "- https://github.com/aws/amazon-sagemaker-examples/tree/master/autopilot/custom-feature-selection" 675 | ] 676 | }, 677 | { 678 | "cell_type": "code", 679 | "execution_count": null, 680 | "metadata": {}, 681 | "outputs": [], 682 | "source": [] 683 | } 684 | ], 685 | "metadata": { 686 | "instance_type": "ml.t3.medium", 687 | "kernelspec": { 688 | "display_name": "Python 3 (Data Science)", 689 | "language": "python", 690 | "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0" 691 | }, 692 | "language_info": { 693 | "codemirror_mode": { 694 | "name": "ipython", 695 | "version": 3 696 | }, 697 | "file_extension": ".py", 698 | "mimetype": "text/x-python", 699 | "name": "python", 700 | "nbconvert_exporter": "python", 701 | "pygments_lexer": "ipython3", 702 | "version": "3.7.6" 703 | } 704 | }, 705 | "nbformat": 4, 706 | "nbformat_minor": 4 707 | } 708 | -------------------------------------------------------------------------------- /Starter Notebooks/Cost Prediction/images/shap_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-architecting-for-ml-hcls/9236be67abb200b6073b2b17079c9c368326c353/Starter Notebooks/Cost Prediction/images/shap_1.png -------------------------------------------------------------------------------- /Starter Notebooks/Cost Prediction/images/shap_2.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-architecting-for-ml-hcls/9236be67abb200b6073b2b17079c9c368326c353/Starter Notebooks/Cost Prediction/images/shap_2.png -------------------------------------------------------------------------------- /Starter Notebooks/Cost Prediction/images/shap_3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-architecting-for-ml-hcls/9236be67abb200b6073b2b17079c9c368326c353/Starter Notebooks/Cost Prediction/images/shap_3.png -------------------------------------------------------------------------------- /Starter Notebooks/Cost Prediction/images/shap_4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-architecting-for-ml-hcls/9236be67abb200b6073b2b17079c9c368326c353/Starter Notebooks/Cost Prediction/images/shap_4.png -------------------------------------------------------------------------------- /Starter Notebooks/MLOps and Hosting/install-run-notebook.sh: -------------------------------------------------------------------------------- 1 | version=0.15.0 2 | pip install https://github.com/aws-samples/sagemaker-run-notebook/releases/download/v${version}/sagemaker_run_notebook-${version}.tar.gz 3 | jlpm config set cache-folder /tmp/yarncache 4 | jupyter lab build --debug --minimize=False 5 | nohup supervisorctl -c /etc/supervisor/conf.d/supervisord.conf restart jupyterlabserver 6 | -------------------------------------------------------------------------------- /Starter Notebooks/MLOps and Hosting/model.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-architecting-for-ml-hcls/9236be67abb200b6073b2b17079c9c368326c353/Starter Notebooks/MLOps and Hosting/model.tar.gz 
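The Cost Prediction notebook above wraps its SageMaker endpoint in a small context manager (`ManagedEndpoint`) so the endpoint can be deleted automatically once the SHAP values have been computed. A stdlib-only sketch of that pattern, with a stub client standing in for the real boto3 SageMaker client (the stub class, its canned endpoint table, and the endpoint name are illustrative assumptions, not the real API):

```python
class StubSageMakerClient:
    """Illustrative stand-in for boto3's SageMaker client (not the real API)."""

    def __init__(self):
        # Pretend one endpoint already exists and is serving traffic.
        self.endpoints = {"my-endpoint": "InService"}

    def describe_endpoint(self, EndpointName):
        return {"EndpointStatus": self.endpoints.get(EndpointName, "NotFound")}

    def delete_endpoint(self, EndpointName):
        self.endpoints.pop(EndpointName, None)


class ManagedEndpoint:
    """Context manager that optionally deletes an endpoint on exit."""

    def __init__(self, client, ep_name, auto_delete=False):
        self.client = client
        self.name = ep_name
        self.auto_delete = auto_delete
        self.in_service = False  # safe default if the endpoint is not found

    def __enter__(self):
        desc = self.client.describe_endpoint(EndpointName=self.name)
        self.in_service = desc["EndpointStatus"] == "InService"
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Clean up only when the endpoint exists and auto-delete was requested.
        if self.in_service and self.auto_delete:
            self.client.delete_endpoint(EndpointName=self.name)
            self.in_service = False


client = StubSageMakerClient()
with ManagedEndpoint(client, "my-endpoint", auto_delete=True) as mep:
    pass  # compute SHAP values here while the endpoint is guaranteed to exist
print("my-endpoint" in client.endpoints)  # prints False: deleted on exit
```

The same shape works against the real service by swapping the stub for `boto3.Session().client('sagemaker')`; initializing `in_service` in `__init__` and returning `self` from `__enter__` keeps the `with ... as mep:` form safe even when the endpoint lookup fails.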
-------------------------------------------------------------------------------- /Starter Notebooks/MLOps and Hosting/src/requirements.txt: -------------------------------------------------------------------------------- 1 | 2 | autogluon 3 | sagemaker 4 | awscli 5 | boto3 6 | PrettyTable 7 | bokeh 8 | numpy==1.16.1 9 | matplotlib 10 | sagemaker-experiments 11 | -------------------------------------------------------------------------------- /Starter Notebooks/MLOps and Hosting/src/train.py: -------------------------------------------------------------------------------- 1 | 2 | import ast 3 | import argparse 4 | import logging 5 | import warnings 6 | import os 7 | import json 8 | import glob 9 | import subprocess 10 | import sys 11 | import boto3 12 | import pickle 13 | import pandas as pd 14 | from collections import Counter 15 | from timeit import default_timer as timer 16 | import time 17 | 18 | from smexperiments.experiment import Experiment 19 | from smexperiments.trial import Trial 20 | from smexperiments.trial_component import TrialComponent 21 | from smexperiments.tracker import Tracker 22 | 23 | sys.path.insert(0, 'package') 24 | with warnings.catch_warnings(): 25 | warnings.filterwarnings("ignore",category=DeprecationWarning) 26 | from prettytable import PrettyTable 27 | import autogluon as ag 28 | from autogluon import TabularPrediction as task 29 | from autogluon.task.tabular_prediction import TabularDataset 30 | 31 | # ------------------------------------------------------------ # 32 | # Training methods # 33 | # ------------------------------------------------------------ # 34 | 35 | def du(path): 36 | """disk usage in human readable format (e.g. 
'2.1GB')""" 37 | return subprocess.check_output(['du','-sh', path]).split()[0].decode('utf-8') 38 | 39 | def __load_input_data(path: str) -> TabularDataset: 40 | """ 41 | Load training data as dataframe 42 | :param path: 43 | :return: DataFrame 44 | """ 45 | input_data_files = os.listdir(path) 46 | try: 47 | input_dfs = [pd.read_csv(f'{path}/{data_file}') for data_file in input_data_files] 48 | return task.Dataset(df=pd.concat(input_dfs)) 49 | except Exception: 50 | print(f'No csv data in {path}!') 51 | return None 52 | 53 | def train(args): 54 | 55 | is_distributed = len(args.hosts) > 1 56 | host_rank = args.hosts.index(args.current_host) 57 | dist_ip_addrs = list(args.hosts) # copy so popping does not mutate args.hosts 58 | dist_ip_addrs.pop(host_rank) 59 | ngpus_per_trial = 1 if args.num_gpus > 0 else 0 60 | 61 | # load training and validation data 62 | print(f'Train files: {os.listdir(args.train)}') 63 | train_data = __load_input_data(args.train) 64 | print(f'Label counts: {dict(Counter(train_data[args.label]))}') 65 | 66 | predictor = task.fit( 67 | train_data=train_data, 68 | label=args.label, 69 | output_directory=args.model_dir, 70 | problem_type=args.problem_type, 71 | eval_metric=args.eval_metric, 72 | stopping_metric=args.stopping_metric, 73 | auto_stack=args.auto_stack, # default: False 74 | hyperparameter_tune=args.hyperparameter_tune, # default: False 75 | feature_prune=args.feature_prune, # default: False 76 | holdout_frac=args.holdout_frac, # default: None 77 | num_bagging_folds=args.num_bagging_folds, # default: 0 78 | num_bagging_sets=args.num_bagging_sets, # default: None 79 | stack_ensemble_levels=args.stack_ensemble_levels, # default: 0 80 | cache_data=args.cache_data, 81 | time_limits=args.time_limits, 82 | num_trials=args.num_trials, # default: None 83 | search_strategy=args.search_strategy, # default: 'random' 84 | search_options=args.search_options, 85 | visualizer=args.visualizer, 86 | verbosity=args.verbosity 87 | ) 88 | 89 | # Results summary 90 | predictor.fit_summary(verbosity=1) 91 | 92 | #
Leaderboard on optional test data 93 | if args.test: 94 | print(f'Test files: {os.listdir(args.test)}') 95 | test_data = __load_input_data(args.test) 96 | print('Running model on test data and getting Leaderboard...') 97 | leaderboard = predictor.leaderboard(dataset=test_data, silent=True) 98 | def format_for_print(df): 99 | table = PrettyTable(list(df.columns)) 100 | for row in df.itertuples(): 101 | table.add_row(row[1:]) 102 | return str(table) 103 | print(format_for_print(leaderboard), end='\n\n') 104 | 105 | # Files summary 106 | print(f'Model export summary:') 107 | print(f"/opt/ml/model/: {os.listdir('/opt/ml/model/')}") 108 | models_contents = os.listdir('/opt/ml/model/models') 109 | print(f"/opt/ml/model/models: {models_contents}") 110 | print(f"/opt/ml/model directory size: {du('/opt/ml/model/')}\n") 111 | 112 | # ------------------------------------------------------------ # 113 | # Training execution # 114 | # ------------------------------------------------------------ # 115 | 116 | def str2bool(v): 117 | return v.lower() in ('yes', 'true', 't', '1') 118 | 119 | def parse_args(): 120 | 121 | parser = argparse.ArgumentParser( 122 | formatter_class=argparse.ArgumentDefaultsHelpFormatter) 123 | parser.register('type','bool',str2bool) # add type keyword to registries 124 | 125 | parser.add_argument('--hosts', type=list, default=json.loads(os.environ['SM_HOSTS'])) 126 | parser.add_argument('--current-host', type=str, default=os.environ['SM_CURRENT_HOST']) 127 | parser.add_argument('--num-gpus', type=int, default=os.environ['SM_NUM_GPUS']) 128 | parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR']) # /opt/ml/model 129 | parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAINING']) 130 | parser.add_argument('--test', type=str, default='') # /opt/ml/input/data/test 131 | parser.add_argument('--label', type=str, default='truth', 132 | help="Name of the column that contains the target variable to predict.") 133 | 
134 | parser.add_argument('--problem_type', type=str, default=None, 135 | help=("Type of prediction problem, i.e. is this a binary/multiclass classification or " 136 | "regression problem (options: 'binary', 'multiclass', 'regression'). " 137 | "If `problem_type = None`, the prediction problem type is inferred based " 138 | "on the label values in the provided dataset.")) 139 | parser.add_argument('--eval_metric', type=str, default=None, 140 | help=("Metric by which predictions will be ultimately evaluated on test data. " 141 | "AutoGluon tunes factors such as hyperparameters, early-stopping, ensemble-weights, etc. " 142 | "in order to improve this metric on validation data. " 143 | "If `eval_metric = None`, it is automatically chosen based on `problem_type`. " 144 | "Defaults to 'accuracy' for binary and multiclass classification and " 145 | "'root_mean_squared_error' for regression. " 146 | "Otherwise, options for classification: [ " 147 | " 'accuracy', 'balanced_accuracy', 'f1', 'f1_macro', 'f1_micro', 'f1_weighted', " 148 | " 'roc_auc', 'average_precision', 'precision', 'precision_macro', 'precision_micro', 'precision_weighted', " 149 | " 'recall', 'recall_macro', 'recall_micro', 'recall_weighted', 'log_loss', 'pac_score']. " 150 | "Options for regression: ['root_mean_squared_error', 'mean_squared_error', " 151 | "'mean_absolute_error', 'median_absolute_error', 'r2']. " 152 | "For more information on these options, see `sklearn.metrics`: " 153 | "https://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics " 154 | "You can also pass your own evaluation function here as long as it follows formatting of the functions " 155 | "defined in `autogluon/utils/tabular/metrics/`. ")) 156 | parser.add_argument('--stopping_metric', type=str, default=None, 157 | help=("Metric which models use to early stop to avoid overfitting. " 158 | "`stopping_metric` is not used by weighted ensembles, instead weighted ensembles maximize `eval_metric`. 
" 159 | "Defaults to `eval_metric` value except when `eval_metric='roc_auc'`, where it defaults to `log_loss`.")) 160 | parser.add_argument('--auto_stack', type='bool', default=False, 161 | help=("Whether to have AutoGluon automatically attempt to select optimal " 162 | "num_bagging_folds and stack_ensemble_levels based on data properties. " 163 | "Note: Overrides num_bagging_folds and stack_ensemble_levels values. " 164 | "Note: This can increase training time by up to 20x, but can produce much better results. " 165 | "Note: This can increase inference time by up to 20x.")) 166 | parser.add_argument('--hyperparameter_tune', type='bool', default=False, 167 | help=("Whether to tune hyperparameters or just use fixed hyperparameter values " 168 | "for each model. Setting as True will increase `fit()` runtimes.")) 169 | parser.add_argument('--feature_prune', type='bool', default=False, 170 | help="Whether or not to perform feature selection.") 171 | parser.add_argument('--holdout_frac', type=float, default=None, 172 | help=("Fraction of train_data to holdout as tuning data for optimizing hyperparameters " 173 | "(ignored unless `tuning_data = None`, ignored if `num_bagging_folds != 0`). " 174 | "Default value is selected based on the number of rows in the training data. " 175 | "Default values range from 0.2 at 2,500 rows to 0.01 at 250,000 rows. " 176 | "Default value is doubled if `hyperparameter_tune = True`, up to a maximum of 0.2. " 177 | "Disabled if `num_bagging_folds >= 2`.")) 178 | parser.add_argument('--num_bagging_folds', type=int, default=0, 179 | help=("Number of folds used for bagging of models. When `num_bagging_folds = k`, " 180 | "training time is roughly increased by a factor of `k` (set = 0 to disable bagging). " 181 | "Disabled by default, but we recommend values between 5-10 to maximize predictive performance. " 182 | "Increasing num_bagging_folds will result in models with lower bias but that are more prone to overfitting. 
" 183 | "Values > 10 may produce diminishing returns, and can even harm overall results due to overfitting. " 184 | "To further improve predictions, avoid increasing num_bagging_folds much beyond 10 " 185 | "and instead increase num_bagging_sets. ")) 186 | parser.add_argument('--num_bagging_sets', type=int, default=None, 187 | help=("Number of repeats of kfold bagging to perform (values must be >= 1). " 188 | "Total number of models trained during bagging = num_bagging_folds * num_bagging_sets. " 189 | "Defaults to 1 if time_limits is not specified, otherwise 20 " 190 | "(always disabled if num_bagging_folds is not specified). " 191 | "Values greater than 1 will result in superior predictive performance, " 192 | "especially on smaller problems and with stacking enabled. " 193 | "Increasing num_bagged_sets reduces the bagged aggregated variance without " 194 | "increasing the amount each model is overfit.")) 195 | parser.add_argument('--stack_ensemble_levels', type=int, default=0, 196 | help=("Number of stacking levels to use in stack ensemble. " 197 | "Roughly increases model training time by factor of `stack_ensemble_levels+1` " 198 | "(set = 0 to disable stack ensembling). " 199 | "Disabled by default, but we recommend values between 1-3 to maximize predictive performance. " 200 | "To prevent overfitting, this argument is ignored unless you have also set `num_bagging_folds >= 2`.")) 201 | parser.add_argument('--hyperparameters', type=lambda s: ast.literal_eval(s), default=None, 202 | help="Refer to docs: https://autogluon.mxnet.io/api/autogluon.task.html") 203 | parser.add_argument('--cache_data', type='bool', default=True, 204 | help=("Whether the predictor returned by this `fit()` call should be able to be further trained " 205 | "via another future `fit()` call. 
" 206 | "When enabled, the training and validation data are saved to disk for future reuse.")) 207 | parser.add_argument('--time_limits', type=int, default=None, 208 | help=("Approximately how long `fit()` should run for (wallclock time in seconds)." 209 | "If not specified, `fit()` will run until all models have completed training, " 210 | "but will not repeatedly bag models unless `num_bagging_sets` is specified.")) 211 | parser.add_argument('--num_trials', type=int, default=None, 212 | help=("Maximal number of different hyperparameter settings of each " 213 | "model type to evaluate during HPO. (only matters if " 214 | "hyperparameter_tune = True). If both `time_limits` and " 215 | "`num_trials` are specified, `time_limits` takes precedent.")) 216 | parser.add_argument('--search_strategy', type=str, default='random', 217 | help=("Which hyperparameter search algorithm to use. " 218 | "Options include: 'random' (random search), 'skopt' " 219 | "(SKopt Bayesian optimization), 'grid' (grid search), " 220 | "'hyperband' (Hyperband), 'rl' (reinforcement learner)")) 221 | parser.add_argument('--search_options', type=lambda s: ast.literal_eval(s), default=None, 222 | help="Auxiliary keyword arguments to pass to the searcher that performs hyperparameter optimization.") 223 | parser.add_argument('--nthreads_per_trial', type=int, default=None, 224 | help="How many CPUs to use in each training run of an individual model. This is automatically determined by AutoGluon when left as None (based on available compute).") 225 | parser.add_argument('--ngpus_per_trial', type=int, default=None, 226 | help="How many GPUs to use in each trial (ie. single training run of a model). 
This is automatically determined by AutoGluon when left as None.") 227 | parser.add_argument('--dist_ip_addrs', type=list, default=None, 228 | help="List of IP addresses corresponding to remote workers, in order to leverage distributed computation.") 229 | parser.add_argument('--visualizer', type=str, default='none', 230 | help=("How to visualize the neural network training progress during `fit()`. " 231 | "Options: ['mxboard', 'tensorboard', 'none'].")) 232 | parser.add_argument('--verbosity', type=int, default=2, 233 | help=("Verbosity levels range from 0 to 4 and control how much information is printed during fit(). " 234 | "Higher levels correspond to more detailed print statements (you can set verbosity = 0 to suppress warnings). " 235 | "If using logging, you can alternatively control amount of information printed via `logger.setLevel(L)`, " 236 | "where `L` ranges from 0 to 50 (Note: higher values of `L` correspond to fewer print statements, " 237 | "the opposite of verbosity levels).")) 238 | parser.add_argument('--debug', type='bool', default=False, 239 | help=("Whether to set logging level to DEBUG")) 240 | 241 | parser.add_argument('--feature_importance', type='bool', default=True) 242 | 243 | return parser.parse_args() 244 | 245 | 246 | def set_experiment_config(experiment_basename = None): 247 | ''' 248 | Optionally takes a base name for the experiment. Has a hard dependency on boto3 installation. 249 | Creates a new experiment using the basename, otherwise simply uses autogluon as the basename. 250 | May run into issues on Experiments' requirements for basename config downstream. 251 | ''' 252 | now = int(time.time()) 253 | 254 | if experiment_basename: 255 | experiment_name = '{}-autogluon-{}'.format(experiment_basename, now) 256 | else: 257 | experiment_name = 'autogluon-{}'.format(now) 258 | 259 | try: 260 | client = boto3.Session().client('sagemaker') 261 | except Exception: 262 | print ('You need to install boto3 to create an experiment. 
Try pip install --upgrade boto3') 263 | return '' 264 | 265 | try: 266 | Experiment.create(experiment_name=experiment_name, 267 | description="Running AutoGluon Tabular with SageMaker Experiments", 268 | sagemaker_boto_client=client) 269 | print ('Created an experiment named {}, you should be able to see this in SageMaker Studio right now.'.format(experiment_name)) 270 | 271 | except Exception: 272 | print ('Could not create the experiment. Is your basename properly configured? Also try installing the sagemaker experiments SDK with pip install sagemaker-experiments.') 273 | return '' 274 | 275 | return experiment_name 276 | 277 | if __name__ == "__main__": 278 | start = timer() 279 | 280 | args = parse_args() 281 | 282 | # Print SageMaker args 283 | print('\n====== args ======') 284 | for k,v in vars(args).items(): 285 | print(f'{k}, type: {type(v)}, value: {v}') 286 | print() 287 | 288 | train(args) 289 | 290 | # Package inference code with model export 291 | subprocess.call('mkdir /opt/ml/model/code'.split()) 292 | subprocess.call('cp /opt/ml/code/inference.py /opt/ml/model/code/'.split()) 293 | 294 | elapsed_time = round(timer()-start,3) 295 | print(f'Elapsed time: {elapsed_time} seconds') 296 | print('===== Training Completed =====') 297 | -------------------------------------------------------------------------------- /Starter Notebooks/MLOps and Hosting/test_set.csv: -------------------------------------------------------------------------------- 1 | area_se,concavity_se,radius_worst,compactness_worst,concavity_worst,compactness_mean,texture_se,area_worst,perimeter_mean,fractal_dimension_worst,concave points_worst,fractal_dimension_mean,concave points_mean,texture_mean,radius_se,smoothness_se,concavity_mean,texture_worst,perimeter_worst,symmetry_mean,smoothness_mean,smoothness_worst,concave points_se,fractal_dimension_se,radius_mean,compactness_se,perimeter_se,symmetry_se,area_mean,symmetry_worst,truth 2 | 
45.4,0.01998,19.07,0.1871,0.2914,0.07200000000000001,1.24,1138.0,94.74,0.08216,0.1609,0.05922,0.05259,20.13,0.4727,0.005718,0.07395,30.88,123.4,0.1586,0.09867000000000001,0.1464,0.011090000000000001,0.002085,14.68,0.01162,3.195,0.0141,684.5,0.3029,1 3 | 76.36,0.0611,19.76,0.1963,0.2535,0.08642000000000001,1.305,1228.0,117.4,0.06558,0.09181,0.0534,0.05778,21.84,0.6362,0.005529999999999999,0.1103,24.7,129.1,0.177,0.07371,0.08822,0.01444,0.005036,18.08,0.05296,4.312,0.0214,1024.0,0.2369,1 4 | 15.26,0.02828,14.24,0.2685,0.2866,0.08575,0.5308,623.7,85.79,0.0732,0.09172999999999999,0.05594,0.02864,13.72,0.1833,0.0042710000000000005,0.050769999999999996,17.37,96.59,0.1617,0.08363,0.1166,0.008468,0.002613,13.28,0.020730000000000002,1.5919999999999999,0.01461,541.8,0.2736,0 5 | 86.22,0.07117000000000001,25.74,0.8681,0.9387,0.27699999999999997,1.595,1821.0,140.1,0.124,0.265,0.07016,0.152,29.33,0.726,0.006522,0.3514,39.42,184.6,0.2397,0.1178,0.165,0.016640000000000002,0.006185,20.6,0.061579999999999996,5.772,0.02324,1265.0,0.4087,1 6 | 32.74,0.01608,15.66,0.1252,0.1117,0.05895,1.046,750.0,89.78,0.07234,0.07453,0.05898,0.029439999999999997,15.98,0.3892,0.007976,0.035339999999999996,21.58,101.2,0.1714,0.08457999999999999,0.1195,0.009046,0.00283,14.04,0.01295,2.6439999999999997,0.02005,611.2,0.2725,0 7 | 20.64,0.016980000000000002,14.96,0.1346,0.1742,0.07957,0.9462,686.5,87.76,0.0696,0.09077,0.06088,0.0316,17.64,0.2431,0.0032450000000000005,0.04548,23.53,95.78,0.1732,0.0995,0.1199,0.009233,0.001524,13.7,0.008186,1.564,0.01285,571.1,0.2518,0 8 | 21.84,0.02153,16.01,0.1388,0.17,0.08345,1.636,783.6,96.12,0.06599,0.1017,0.057479999999999996,0.049510000000000005,20.21,0.2323,0.005415,0.06824,28.48,103.9,0.1487,0.09587000000000001,0.1216,0.01183,0.001812,14.87,0.01371,1.5959999999999999,0.01959,680.9,0.2369,0 9 | 
28.93,0.007936,15.05,0.1421,0.07003,0.07426,1.9240000000000002,705.6,86.34,0.07675,0.07762999999999999,0.060160000000000005,0.032639999999999995,30.72,0.3408,0.005841,0.028189999999999996,41.61,96.69,0.1375,0.09245,0.1172,0.009128,0.0029850000000000002,13.38,0.012459999999999999,2.287,0.01564,557.2,0.2196,0 10 | 23.92,0.005717,15.53,0.1109,0.053070000000000006,0.05306,1.081,749.9,90.03,0.07082999999999999,0.0589,0.057,0.02733,12.88,0.2571,0.006692,0.01765,18.0,98.4,0.1373,0.09308999999999999,0.1281,0.006626999999999999,0.0024760000000000003,14.11,0.01132,1.558,0.014159999999999999,616.5,0.21,0 11 | 75.09,0.040619999999999996,23.86,0.3597,0.5179,0.1644,1.216,1760.0,137.8,0.08999,0.2113,0.062220000000000004,0.1121,21.24,0.5904,0.006666,0.2188,30.76,163.2,0.1848,0.1085,0.1464,0.014790000000000001,0.0037270000000000003,20.59,0.02791,4.206,0.011170000000000001,1320.0,0.248,1 12 | 27.48,0.1197,11.26,0.295,0.3486,0.1294,2.261,390.2,64.6,0.1162,0.0991,0.08116,0.037160000000000006,18.06,0.4311,0.01286,0.1307,24.39,73.07,0.1669,0.09698999999999999,0.1301,0.0246,0.01792,9.904,0.08807999999999999,3.1319999999999997,0.0388,302.4,0.2614,0 13 | 20.72,0.014819999999999998,15.61,0.1011,0.1101,0.05016,0.6232,760.2,94.66,0.06142,0.07955,0.05347999999999999,0.02541,14.7,0.2182,0.0067079999999999996,0.03416,17.58,101.7,0.1659,0.08472,0.1139,0.01056,0.001779,14.81,0.01197,1.6769999999999998,0.0158,680.7,0.2334,0 14 | 14.41,0.03113,13.72,0.1975,0.145,0.09965,0.6412,576.0,84.08,0.1009,0.0585,0.07238,0.020980000000000002,14.23,0.1814,0.005231,0.037380000000000004,16.91,87.38,0.1652,0.09462000000000001,0.1142,0.007315,0.0057009999999999995,12.99,0.02305,0.9219,0.016390000000000002,514.3,0.2432,0 15 | 19.63,0.02197,10.57,0.2097,0.09996000000000001,0.1225,1.13,326.6,59.96,0.08982000000000001,0.07262,0.07696,0.02421,13.9,0.3538,0.01546,0.033319999999999995,17.84,67.84,0.2197,0.1371,0.185,0.0158,0.003901,9.295,0.0254,2.388,0.03997,257.8,0.3681,0 16 | 
67.1,0.02134,20.33,0.2817,0.2432,0.1791,2.91,1298.0,129.1,0.09203,0.1841,0.07224,0.1469,26.29,0.519,0.0075450000000000005,0.1937,32.72,141.3,0.1634,0.1215,0.1392,0.018430000000000002,0.01039,19.1,0.0605,5.801,0.030560000000000004,1132.0,0.2311,1 17 | 7.228,0.1535,10.85,0.3619,0.603,0.166,1.2309999999999999,351.9,70.15,0.12,0.1465,0.0845,0.059410000000000004,20.22,0.1115,0.008499,0.228,22.82,76.51,0.2188,0.09072999999999999,0.1143,0.029189999999999997,0.0122,10.57,0.07643,2.363,0.01617,338.3,0.2597,0 18 | 9.597000000000001,0.027569999999999997,11.94,0.3898,0.3365,0.1069,0.5379999999999999,433.1,70.41,0.10800000000000001,0.07966000000000001,0.06837,0.01571,15.62,0.1482,0.0044740000000000005,0.05115,19.35,80.78,0.1861,0.1007,0.1332,0.006691,0.004672,10.88,0.030930000000000003,1.301,0.01212,358.9,0.2581,0 19 | 57.72,0.05839,17.73,0.2116,0.3344,0.1283,1.4569999999999999,975.2,107.5,0.07952999999999999,0.1047,0.06532,0.07981,21.88,0.5706,0.01056,0.1799,25.21,113.7,0.1869,0.1165,0.1426,0.011859999999999999,0.006187,16.26,0.03756,2.9610000000000003,0.04022,826.8,0.2736,1 20 | 17.81,0.0,8.952,0.07767,0.0,0.05847,2.7769999999999997,240.1,54.09,0.08116,0.0,0.07359,0.0,18.6,0.3368,0.02075,0.0,22.44,56.65,0.2163,0.1074,0.1347,0.0,0.0068200000000000005,8.597000000000001,0.01403,2.222,0.06146,221.2,0.3142,0 21 | 116.4,0.1091,23.37,0.6164,0.7681,0.3454,1.885,1623.0,143.7,0.09963999999999999,0.2508,0.08142,0.1604,23.97,0.9317,0.01038,0.3754,31.72,170.3,0.2906,0.1286,0.1639,0.02593,0.005987,20.18,0.06835,8.649,0.07895,1245.0,0.544,1 22 | 104.9,0.09723,24.09,0.7444,0.7242,0.21899999999999997,1.666,1651.0,128.3,0.1038,0.2493,0.06343,0.09961,24.81,0.9811,0.006548,0.2107,33.17,177.4,0.231,0.09081,0.1247,0.02638,0.007645999999999999,19.07,0.1006,8.83,0.053329999999999995,1104.0,0.467,1 23 | 
14.47,0.01556,12.57,0.1,0.08803,0.05562,1.926,489.5,75.27,0.06576,0.04306,0.0578,0.01553,17.39,0.1859,0.007831,0.023530000000000002,26.48,79.57,0.1718,0.1007,0.1356,0.00624,0.0019879999999999997,11.81,0.008776,1.011,0.031389999999999994,428.9,0.32,0 24 | 16.35,0.08158,14.39,0.5849,0.7727,0.1346,0.4402,639.1,84.95,0.1178,0.1561,0.06409,0.0398,14.11,0.2025,0.005501,0.1374,17.7,105.0,0.1596,0.0876,0.1254,0.0137,0.007555,12.89,0.055920000000000004,2.3930000000000002,0.01266,512.2,0.2639,0 25 | 23.29,0.07927000000000001,13.9,0.2317,0.3344,0.0958,1.389,595.6,84.08,0.07127,0.1017,0.05935,0.0339,15.7,0.2913,0.006418000000000001,0.1115,19.69,92.12,0.1432,0.07818,0.09926,0.017740000000000002,0.003696,12.89,0.03961,2.347,0.01878,516.6,0.1999,0 26 | 17.58,0.0151,14.35,0.1063,0.139,0.05205,1.35,632.9,84.1,0.06788,0.06005,0.05584,0.02068,25.25,0.2084,0.005768,0.027719999999999998,34.23,91.29,0.1619,0.08791,0.1289,0.006451,0.0018280000000000002,13.21,0.008081999999999999,1.314,0.01347,537.9,0.2444,0 27 | 130.8,0.07649,23.68,0.3391,0.4932,0.1838,1.743,1696.0,134.7,0.09469,0.1923,0.07468999999999999,0.128,23.86,1.072,0.007964,0.2283,29.43,158.8,0.2249,0.10800000000000001,0.1347,0.01936,0.005928,20.09,0.04732,7.803999999999999,0.027360000000000002,1247.0,0.3294,1 28 | 24.25,0.02905,15.67,0.4166,0.5006,0.1231,0.8937,759.4,85.98,0.1179,0.2088,0.06777000000000001,0.0734,18.66,0.2871,0.006532,0.1226,27.95,102.8,0.2128,0.1158,0.1786,0.01215,0.003643,13.17,0.02336,1.8969999999999998,0.01743,534.6,0.39,1 29 | 13.56,0.030789999999999998,13.94,0.1508,0.2298,0.045239999999999995,1.601,602.0,84.13,0.07198,0.0497,0.05635,0.01105,17.43,0.163,0.006261,0.04336,27.82,88.28,0.1487,0.07215,0.1101,0.005383,0.00225,13.2,0.01569,0.873,0.01962,541.6,0.2767,0 30 | 27.41,0.019469999999999998,16.51,0.1376,0.1611,0.07214,1.385,826.4,94.7,0.06956,0.1095,0.0568,0.03027,25.42,0.3031,0.004775,0.04105,32.29,107.4,0.184,0.08275,0.106,0.01269,0.0026260000000000003,14.74,0.01172,2.177,0.0187,668.6,0.2722,0 31 | 
49.45,0.052779999999999994,16.46,0.3635,0.3219,0.1836,1.511,809.2,98.22,0.09208,0.1108,0.07406,0.063,13.98,0.5462,0.009976,0.145,18.34,114.1,0.2086,0.1031,0.1312,0.0158,0.005444,14.69,0.05244,4.795,0.02653,656.1,0.2827,0 32 | 48.31,0.028130000000000002,19.26,0.2394,0.3791,0.1223,0.7859,1156.0,101.7,0.08019,0.1514,0.057960000000000005,0.08087000000000001,19.48,0.4743,0.00624,0.1466,26.0,124.9,0.1931,0.1092,0.1546,0.01093,0.002461,15.46,0.01484,3.094,0.013969999999999998,748.9,0.2837,1 33 | 10.8,0.02758,11.92,0.221,0.2299,0.09097000000000001,0.8225,440.0,71.49,0.0908,0.1075,0.06907,0.03341,14.96,0.1601,0.007416,0.053970000000000004,19.9,79.76,0.1776,0.1033,0.1418,0.0101,0.0029170000000000003,11.06,0.01877,1.355,0.02348,373.9,0.3301,0 34 | 23.02,0.02889,18.49,0.5564,0.5703,0.1639,1.278,1035.0,103.7,0.1204,0.2014,0.0665,0.08399,33.56,0.2419,0.005345,0.1751,49.54,126.3,0.2091,0.1063,0.1883,0.01022,0.0033590000000000004,15.53,0.02556,1.903,0.009947,744.9,0.3512,1 35 | 24.91,0.04815,15.89,0.4238,0.5186,0.1098,1.0190000000000001,799.6,93.63,0.1014,0.1447,0.06125,0.055979999999999995,21.72,0.28600000000000003,0.005878,0.1319,30.36,116.2,0.1885,0.09823,0.1446,0.011609999999999999,0.0040219999999999995,14.25,0.02995,2.657,0.02028,633.0,0.3591,1 36 | 93.54,0.05081,21.31,0.2117,0.3446,0.1066,1.849,1403.0,122.1,0.07421,0.149,0.05699,0.07731,20.25,0.8529,0.01075,0.149,27.26,139.9,0.1697,0.0944,0.1338,0.01911,0.004217,18.61,0.027219999999999998,5.632000000000001,0.022930000000000002,1094.0,0.2341,1 37 | 29.84,0.02071,15.3,0.2264,0.1326,0.1126,1.492,706.7,91.38,0.08321,0.1048,0.06171,0.043039999999999995,27.15,0.3645,0.007256,0.04462,33.17,100.2,0.1537,0.09929,0.1241,0.01626,0.005304,14.05,0.02678,2.888,0.0208,600.4,0.225,0 38 | 
6.8020000000000005,0.03735,9.965,0.1887,0.1868,0.060529999999999994,1.182,301.0,59.75,0.09206,0.025639999999999996,0.06724,0.005128,21.68,0.1186,0.005515,0.03735,27.99,66.61,0.1274,0.07969,0.1086,0.005128,0.004582999999999999,9.397,0.026739999999999996,1.1740000000000002,0.01951,268.8,0.2376,0
17.74,0.02018,11.95,0.1223,0.09755,0.05139,1.239,441.2,68.26,0.06769,0.03413,0.05687999999999999,0.007875,14.97,0.2525,0.006547,0.02251,20.72,77.79,0.1399,0.07793,0.1076,0.005612,0.00236,10.75,0.01781,1.806,0.01671,355.3,0.23,0
70.01,0.03457,24.47,0.2761,0.4146,0.1074,1.189,1872.0,134.4,0.08327999999999999,0.1563,0.055920000000000004,0.0834,27.81,0.524,0.00502,0.1554,37.38,162.7,0.1448,0.09158999999999999,0.1223,0.01091,0.0028870000000000002,20.51,0.02062,3.767,0.012980000000000002,1319.0,0.2437,1
15.7,0.01985,10.23,0.1148,0.08867,0.06492,0.9768,314.9,60.34,0.07773,0.062270000000000006,0.06905,0.02076,12.44,0.2773,0.009606,0.029560000000000003,15.66,65.13,0.1815,0.1024,0.1324,0.014209999999999999,0.002968,9.504,0.01432,1.909,0.02027,273.9,0.245,0
32.96,0.000692,14.23,0.06191,0.0018449999999999999,0.03789,1.214,624.1,82.61,0.06289,0.01111,0.05501,0.0041670000000000006,19.31,0.40399999999999997,0.007490999999999999,0.000692,22.25,90.24,0.1819,0.0806,0.1021,0.0041670000000000006,0.00299,13.05,0.008593,2.595,0.0219,527.2,0.2439,0
14.16,0.013430000000000001,13.3,0.046189999999999995,0.04833,0.03766,2.342,545.9,82.61,0.061689999999999995,0.05013,0.058629999999999995,0.029230000000000003,18.42,0.1839,0.004352,0.02562,22.81,84.46,0.1467,0.08983,0.09701,0.011640000000000001,0.001777,13.03,0.004899000000000001,1.17,0.02671,523.8,0.1987,0
14.55,0.010790000000000001,14.67,0.1582,0.105,0.06059,0.9234,656.7,87.21,0.08025,0.08586,0.05952999999999999,0.017230000000000002,16.34,0.1872,0.004477,0.01857,23.19,96.08,0.1353,0.07685,0.1089,0.007956,0.0025510000000000003,13.64,0.011770000000000001,1.449,0.01325,571.8,0.2346,0
29.63,0.0058119999999999995,15.63,0.1141,0.04753,0.0633,1.6780000000000002,749.1,88.68,0.06911,0.0589,0.056729999999999996,0.022930000000000002,19.6,0.3419,0.005836,0.01342,28.01,100.9,0.1555,0.08684,0.1118,0.007039,0.002326,13.85,0.01095,2.331,0.02014,592.6,0.2513,0
20.86,0.055529999999999996,12.26,0.2118,0.1797,0.11199999999999999,1.768,457.8,74.65,0.08134,0.06917999999999999,0.06782,0.025939999999999998,14.44,0.2784,0.01215,0.06737,19.68,78.78,0.1818,0.09984,0.1345,0.01494,0.0055119999999999995,11.54,0.04112,1.6280000000000001,0.0184,402.9,0.2329,0
33.76,0.01121,16.11,0.1766,0.09189,0.08606,1.111,803.7,94.57,0.07246,0.06946000000000001,0.058660000000000004,0.02957,24.02,0.3721,0.004868,0.03102,29.11,102.9,0.1685,0.08974,0.1115,0.008606,0.002893,14.62,0.01818,2.279,0.02085,662.7,0.2522,0
67.34,0.026260000000000002,20.05,0.2119,0.2318,0.09182,1.391,1260.0,109.3,0.07228,0.1474,0.05534,0.06576,18.8,0.599,0.006123,0.08422,26.3,130.7,0.1893,0.08865,0.1168,0.016040000000000002,0.003493,16.78,0.0247,4.129,0.020909999999999998,886.3,0.281,1
69.47,0.04252,23.32,0.7394,0.6566,0.183,1.041,1681.0,114.2,0.1339,0.1899,0.06487000000000001,0.07944,24.52,0.5907,0.0058200000000000005,0.1692,33.82,151.6,0.1927,0.1071,0.1585,0.01127,0.006299,17.2,0.05616,3.705,0.015269999999999999,929.4,0.3313,1
16.16,0.020069999999999998,13.67,0.2003,0.2267,0.07165,1.255,567.9,76.95,0.07923999999999999,0.07632,0.05968,0.01863,15.65,0.2271,0.0059689999999999995,0.041510000000000005,24.9,87.78,0.2079,0.09723,0.1377,0.007026999999999999,0.002607,12.0,0.018119999999999997,1.4409999999999998,0.019719999999999998,443.3,0.3379,0
21.83,0.01831,17.71,0.1722,0.231,0.08501,0.6372,947.9,104.3,0.07012,0.1129,0.05875,0.04528,14.86,0.2387,0.003958,0.055,19.58,115.9,0.1735,0.09495,0.1206,0.008747,0.001621,16.14,0.012459999999999999,1.729,0.015,800.0,0.2778,0
20.53,0.0139,15.93,0.2043,0.2085,0.06031,0.8265,787.9,86.18,0.07146,0.1112,0.05587,0.020309999999999998,21.58,0.2385,0.00328,0.0311,30.25,102.5,0.1784,0.08162,0.1094,0.006881,0.001286,13.44,0.01102,1.5719999999999998,0.0138,563.0,0.2994,1
24.62,0.02586,13.11,0.1676,0.1755,0.09362000000000001,2.174,525.1,73.81,0.08851,0.061270000000000005,0.07005,0.022330000000000003,20.97,0.3251,0.01037,0.04591,32.16,84.53,0.1842,0.1102,0.1557,0.0075060000000000005,0.0039759999999999995,11.45,0.01706,2.077,0.01816,401.5,0.2762,0
68.17,0.03497,24.15,0.659,0.6091,0.1719,0.6062,1813.0,127.9,0.1123,0.1785,0.06261,0.07593,26.47,0.5558,0.0050149999999999995,0.1657,30.9,161.4,0.1853,0.09401,0.1509,0.009643,0.003896,19.27,0.03318,3.528,0.015430000000000001,1162.0,0.3672,1
25.22,0.01872,15.1,0.1751,0.1381,0.08165,1.3730000000000002,699.4,86.6,0.06602999999999999,0.07911,0.0571,0.0278,18.3,0.295,0.005884,0.03974,25.94,97.59,0.1638,0.1022,0.1339,0.009366,0.0018170000000000003,13.45,0.01491,2.099,0.01884,555.1,0.2678,0
19.62,0.003297,11.98,0.09669,0.01335,0.06779,1.966,436.1,71.94,0.06522,0.02022,0.06027999999999999,0.0075829999999999995,19.86,0.2976,0.01289,0.005006,25.78,76.91,0.19399999999999998,0.1054,0.1424,0.004967,0.001963,11.22,0.011040000000000001,1.959,0.04243,387.3,0.3292,0
43.4,0.021509999999999998,17.98,0.1546,0.2644,0.06287999999999999,1.147,993.6,85.84,0.07371,0.11599999999999999,0.05671,0.03438,19.63,0.4697,0.0060030000000000005,0.05857999999999999,29.87,116.6,0.1598,0.09047999999999999,0.1401,0.009443,0.0018679999999999999,13.43,0.01063,3.142,0.0152,565.4,0.2884,1
15.82,0.01123,12.85,0.053320000000000006,0.04116,0.032119999999999996,0.5996,513.1,77.25,0.06037000000000001,0.01852,0.05649,0.005051,14.08,0.2113,0.005343,0.01123,16.47,81.6,0.1673,0.07733999999999999,0.1001,0.005051,0.0009502000000000001,12.18,0.005767,1.4380000000000002,0.01977,461.4,0.2293,0
14.34,0.04156,12.4,0.201,0.2596,0.08227999999999999,1.166,467.6,73.99,0.0918,0.07431,0.06573999999999999,0.01969,14.59,0.2034,0.004957,0.053079999999999995,21.9,82.04,0.1779,0.1046,0.1352,0.008038,0.003614,11.49,0.02114,1.567,0.018430000000000002,404.9,0.2941,0
54.04,0.02291,20.58,0.1202,0.2249,0.06722,1.214,1261.0,114.2,0.061110000000000005,0.1185,0.05025,0.05596,20.01,0.5506,0.004024,0.07293,27.83,129.2,0.2129,0.08402000000000001,0.1072,0.009863,0.001902,17.95,0.008422,3.3569999999999998,0.05014,982.0,0.4882,1
67.74,0.04256,23.96,0.3725,0.5936,0.1448,1.4080000000000001,1740.0,128.1,0.09009,0.20600000000000002,0.06115,0.1194,18.82,0.5659,0.005288,0.2256,30.39,153.9,0.1823,0.1089,0.1514,0.01176,0.003211,19.44,0.02833,3.6310000000000002,0.017169999999999998,1167.0,0.3266,1
21.2,0.031139999999999998,15.85,0.2735,0.3103,0.1021,0.7394,766.9,93.97,0.07683,0.1599,0.06081,0.05532,15.18,0.2406,0.005706,0.08487,19.85,108.6,0.1724,0.0997,0.1316,0.01493,0.0025280000000000003,14.44,0.022969999999999997,2.12,0.01454,640.1,0.2691,0
155.8,0.044969999999999996,31.01,0.4126,0.5820000000000001,0.1682,0.9635,2944.0,153.5,0.08677,0.2593,0.06309,0.1237,26.97,1.058,0.006428,0.195,34.51,206.0,0.1909,0.09509,0.1481,0.017159999999999998,0.003053,23.21,0.028630000000000003,7.247000000000001,0.0159,1670.0,0.3103,1
116.2,0.0889,20.96,0.3903,0.3639,0.2458,3.568,1332.0,132.4,0.1023,0.1767,0.078,0.1118,24.8,0.9555,0.003139,0.2065,29.94,151.7,0.2397,0.0974,0.1037,0.0409,0.01284,19.17,0.08297,11.07,0.04484,1123.0,0.3176,1
99.04,0.0395,23.69,0.1922,0.3215,0.1034,2.463,1731.0,131.2,0.06637,0.1628,0.05533,0.09791,28.25,0.7655,0.005769,0.14400000000000002,38.25,155.0,0.1752,0.0978,0.1166,0.01678,0.0024980000000000002,20.13,0.02423,5.202999999999999,0.01898,1261.0,0.2572,1
36.74,0.02623,19.96,0.11599999999999999,0.221,0.058839999999999996,0.828,1236.0,120.9,0.057370000000000004,0.1294,0.049960000000000004,0.058429999999999996,19.98,0.3283,0.007571,0.0802,24.3,129.0,0.155,0.08922999999999999,0.1243,0.01463,0.0016760000000000002,18.81,0.01114,2.363,0.0193,1102.0,0.2567,1
139.9,0.035710000000000006,29.92,0.4188,0.4658,0.2106,0.9004,2642.0,165.5,0.09671,0.2475,0.06738999999999999,0.1471,21.6,0.9915,0.004989,0.231,26.93,205.7,0.1991,0.10300000000000001,0.1342,0.015969999999999998,0.00476,24.63,0.032119999999999996,7.05,0.01879,1841.0,0.3157,1
22.07,0.0073019999999999995,15.33,0.1513,0.062310000000000004,0.06945,1.5030000000000001,715.5,89.79,0.07617,0.07962999999999999,0.05835,0.01896,21.25,0.2589,0.007389,0.01462,30.28,98.27,0.1517,0.0907,0.1287,0.01004,0.002925,14.03,0.01383,1.6669999999999998,0.012629999999999999,603.4,0.2226,0
100.4,0.04093,23.23,0.2534,0.3092,0.1313,1.736,1645.0,134.7,0.06386,0.1613,0.054189999999999995,0.1015,20.67,0.8336,0.0049380000000000005,0.1523,27.15,152.0,0.2166,0.09156,0.1097,0.01699,0.002719,20.47,0.030889999999999997,5.167999999999999,0.02816,1299.0,0.322,1
103.6,0.059039999999999995,24.19,0.3416,0.3703,0.1669,1.892,1671.0,133.7,0.07632,0.2152,0.0602,0.1265,26.83,0.9761,0.008439,0.1641,33.81,160.0,0.1875,0.09905,0.1278,0.02536,0.004286,20.2,0.04674,7.127999999999999,0.0371,1234.0,0.3271,1
54.18,0.03188,20.96,0.4233,0.4784,0.2022,1.073,1315.0,108.1,0.1142,0.2073,0.07356,0.1028,20.68,0.5692,0.007026,0.1722,31.48,136.8,0.2164,0.11699999999999999,0.1789,0.012969999999999999,0.004142,16.13,0.02501,3.8539999999999996,0.016890000000000002,798.8,0.3706,1
24.2,0.1027,10.06,0.3748,0.4609,0.1972,1.911,297.1,60.07,0.1055,0.1145,0.08743,0.04908,18.9,0.4653,0.009845,0.1975,23.4,68.62,0.233,0.09967999999999999,0.1221,0.025269999999999997,0.007877,9.042,0.0659,3.7689999999999997,0.034910000000000004,244.5,0.3135,0
34.44,0.037630000000000004,15.65,0.4706,0.4425,0.1353,1.8090000000000002,768.9,81.15,0.1205,0.1459,0.06937,0.04562,26.86,0.4053,0.009098,0.1085,39.34,101.7,0.1943,0.1034,0.1785,0.01321,0.005672,12.34,0.03845,2.642,0.01878,477.4,0.3215,1
44.41,0.03248,19.28,0.2947,0.3597,0.1319,1.232,1121.0,106.9,0.08199999999999999,0.1583,0.06277,0.08488,20.71,0.4375,0.006697,0.1478,30.38,129.8,0.1948,0.1169,0.159,0.013919999999999998,0.0027890000000000002,16.27,0.02083,3.27,0.015359999999999999,813.7,0.3103,1
53.16,0.03059,23.79,0.3749,0.4316,0.1442,0.9951,1628.0,127.2,0.07787000000000001,0.2252,0.05892000000000001,0.09464,18.18,0.4709,0.005654,0.1626,28.65,152.4,0.1893,0.1037,0.1518,0.01499,0.0019649999999999997,19.4,0.02199,2.903,0.01623,1145.0,0.359,1
67.78,0.05042,20.88,0.3559,0.5588,0.1496,1.3980000000000001,1344.0,112.8,0.08482,0.1847,0.06382,0.1203,23.98,0.6009,0.008268000000000001,0.2417,32.09,136.1,0.2248,0.1197,0.1634,0.01112,0.003854,17.02,0.03082,3.9989999999999997,0.02102,899.3,0.353,1
18.02,0.005832,13.64,0.1352,0.04506,0.05794,1.1520000000000001,562.6,76.66,0.08083,0.05093,0.06047999999999999,0.008487999999999999,18.9,0.243,0.00718,0.007509999999999999,27.06,86.54,0.1555,0.08386,0.1289,0.005495,0.002754,12.06,0.01096,1.5590000000000002,0.019819999999999997,445.3,0.28800000000000003,0
23.94,0.07743,15.09,1.058,1.105,0.2396,1.599,711.4,83.97,0.2075,0.221,0.08242999999999999,0.08542999999999999,24.04,0.2976,0.007148999999999999,0.2273,40.68,97.65,0.203,0.1186,0.1853,0.01432,0.01008,12.46,0.07217,2.039,0.01789,475.9,0.4366,1
14.67,0.016980000000000002,14.5,0.2776,0.18899999999999997,0.127,0.7477,630.5,85.63,0.08183,0.07282999999999999,0.06811,0.0311,15.71,0.1852,0.004097,0.04568,20.49,96.09,0.1967,0.1075,0.1312,0.006490000000000001,0.002425,13.08,0.01898,1.383,0.01678,520.0,0.3184,0
23.52,0.04312,10.01,0.1678,0.1397,0.08751,2.265,310.1,59.2,0.0849,0.05087,0.06963,0.0218,13.86,0.4098,0.008738,0.059879999999999996,19.23,65.59,0.2341,0.07721,0.09836,0.0156,0.005822,9.173,0.03938,2.608,0.04192,260.9,0.3282,0
17.25,0.007078,15.85,0.1564,0.1206,0.07255,0.4801,773.4,87.76,0.07782,0.08703999999999999,0.06155,0.0188,16.33,0.2047,0.0038280000000000002,0.017519999999999997,20.2,101.6,0.1631,0.09277,0.1264,0.005077,0.001697,13.68,0.007228,1.3730000000000002,0.01054,575.5,0.2806,0
11.36,0.0,9.262,0.07057000000000001,0.0,0.04276,0.7873,259.2,54.42,0.07848,0.0,0.06724,0.0,14.45,0.2204,0.009172,0.0,17.04,58.36,0.1722,0.09137999999999999,0.1162,0.0,0.003399,8.671,0.008006999999999998,1.435,0.027110000000000002,227.2,0.2592,0
42.76,0.044360000000000004,16.34,0.3089,0.2604,0.1339,0.7372,803.6,95.77,0.08473,0.1397,0.06346,0.07064,15.24,0.5115,0.005508,0.09966,18.24,109.4,0.2116,0.1132,0.1277,0.01623,0.004841,14.64,0.04412,3.8139999999999996,0.02427,651.9,0.3151,0
20.95,0.027030000000000002,13.83,0.2463,0.2434,0.09445,1.5019999999999998,574.7,79.78,0.09261,0.1205,0.06404,0.03745,21.8,0.2978,0.007112,0.06015,30.5,91.46,0.193,0.08772,0.1304,0.01293,0.004463,12.36,0.02493,2.2030000000000003,0.01958,466.1,0.2972,0
24.87,0.015359999999999999,16.25,0.303,0.1804,0.09823,0.948,809.8,97.03,0.08472,0.1489,0.05852,0.04819,19.1,0.2877,0.005332,0.0594,26.19,109.1,0.1879,0.08992,0.1313,0.01187,0.002815,14.96,0.02115,2.171,0.01522,687.3,0.2962,0
104.9,0.06591,21.57,0.4785,0.5165,0.2004,1.465,1437.0,119.0,0.1224,0.1996,0.07368999999999999,0.1002,23.33,0.9289,0.0067659999999999994,0.2136,28.87,143.6,0.1696,0.09289,0.1207,0.02311,0.0113,17.6,0.07025,5.801,0.016730000000000002,980.5,0.2301,1
153.4,0.05372999999999999,25.38,0.6656,0.7119,0.2776,0.9053,2019.0,122.8,0.1189,0.2654,0.07871,0.1471,10.38,1.095,0.006399,0.3001,17.33,184.6,0.2419,0.1184,0.1622,0.01587,0.006193,17.99,0.04904,8.589,0.03003,1001.0,0.4601,1
27.57,0.01851,13.32,0.1477,0.149,0.09262999999999999,1.1540000000000001,549.8,75.49,0.08023999999999999,0.09815,0.06401,0.03132,16.17,0.3713,0.008998,0.042789999999999995,21.59,86.57,0.1853,0.1128,0.1526,0.01167,0.003213,11.68,0.01292,2.5540000000000003,0.021519999999999997,420.5,0.2804,0
35.03,0.026639999999999997,20.21,0.5804,0.5274,0.1559,0.6857,1261.0,107.0,0.1233,0.1864,0.06515,0.07752,17.88,0.33399999999999996,0.004185,0.1354,27.26,132.7,0.1998,0.10400000000000001,0.1446,0.009067,0.003817,16.13,0.02868,2.1830000000000003,0.01703,807.2,0.42700000000000005,1
80.99,0.04718,26.23,0.5717,0.7053,0.2087,0.9209,2081.0,144.4,0.1007,0.2422,0.06606000000000001,0.1562,22.28,0.6242,0.005215,0.281,28.74,172.0,0.2162,0.1167,0.1502,0.01288,0.004028,21.61,0.03726,4.158,0.02045,1407.0,0.3828,1
44.74,0.04763,14.62,0.1364,0.1559,0.09755,2.635,653.3,90.31,0.07253,0.1015,0.06457,0.06615,13.17,0.5461,0.01004,0.10099999999999999,15.38,94.52,0.1976,0.1248,0.1394,0.02853,0.005528,13.94,0.03247,4.091,0.01715,594.2,0.21600000000000003,0
87.17,0.04502,20.99,0.2053,0.392,0.1056,2.129,1362.0,111.8,0.07599,0.1827,0.06071,0.09934,21.0,0.8161,0.006455,0.1508,33.15,143.2,0.1727,0.1119,0.1449,0.01744,0.0037329999999999998,17.06,0.01797,6.0760000000000005,0.01829,918.6,0.2623,1
19.08,0.014530000000000001,11.35,0.0824,0.03938,0.057429999999999995,1.805,396.5,70.21,0.07313,0.04306,0.06669,0.025830000000000002,14.71,0.2073,0.01496,0.02363,16.82,72.01,0.1566,0.1006,0.1216,0.01583,0.004785,11.08,0.02121,1.3769999999999998,0.03082,372.7,0.1902,0
20.39,0.00203,14.97,0.05836,0.01379,0.03614,0.6864,698.7,85.69,0.06192,0.0221,0.05335,0.004419,12.71,0.2244,0.0033380000000000003,0.002758,16.94,95.48,0.1365,0.07376,0.09022999999999999,0.003242,0.0015660000000000001,13.5,0.003746,1.5090000000000001,0.0148,566.2,0.2267,0
58.53,0.1438,18.07,0.1793,0.2803,0.1146,1.6669999999999998,1021.0,114.5,0.06817999999999999,0.1099,0.058660000000000004,0.06597,25.56,0.5296,0.03113,0.1682,28.07,120.4,0.1308,0.1006,0.1243,0.03927,0.01256,17.42,0.08555,3.767,0.02175,948.0,0.1603,1
38.87,0.05371,16.21,0.1976,0.3349,0.09947,2.22,808.9,94.25,0.06846000000000001,0.1225,0.05636,0.04938,21.46,0.4204,0.009369,0.1204,29.25,108.4,0.2075,0.09444,0.1306,0.01761,0.003249,14.48,0.029830000000000002,3.301,0.02418,648.2,0.302,1
33.01,0.0028309999999999997,14.73,0.05847,0.01824,0.03735,0.8285,672.4,82.71,0.0658,0.03532,0.05517999999999999,0.008829,13.84,0.3975,0.004148,0.0045590000000000006,17.4,93.96,0.1453,0.08352000000000001,0.1016,0.004821,0.002273,13.05,0.004711,2.5669999999999997,0.014219999999999998,530.6,0.2107,0
21.05,0.026810000000000004,17.62,0.6643,0.5539,0.1868,0.9832,896.9,97.41,0.1275,0.2701,0.06924,0.08782999999999999,21.53,0.2545,0.004452,0.1425,33.21,122.4,0.2252,0.1054,0.1525,0.013519999999999999,0.003711,14.58,0.03055,2.11,0.01454,644.8,0.4264,1
16.97,0.001184,13.07,0.0739,0.007731999999999999,0.038919999999999996,0.9097,523.4,76.09,0.07037,0.027960000000000002,0.0607,0.005592,17.93,0.2335,0.004729,0.001546,22.25,82.74,0.1382,0.07683,0.1013,0.003951,0.001755,12.03,0.006887000000000001,1.466,0.01466,446.0,0.2171,0
45.81,0.01622,20.92,0.1806,0.20800000000000002,0.07027,1.433,1320.0,115.2,0.07948,0.1136,0.0551,0.047439999999999996,24.48,0.4212,0.005444,0.05699,34.69,135.1,0.1538,0.08855,0.1315,0.008522,0.002751,17.93,0.01169,2.765,0.014190000000000001,998.9,0.2504,1
33.63,0.02332,19.85,0.2405,0.3378,0.1041,0.8568,1222.0,113.0,0.08113,0.1857,0.05612999999999999,0.08353,17.08,0.3093,0.004757,0.1266,25.09,130.9,0.1813,0.1008,0.1416,0.012620000000000001,0.002362,17.3,0.015030000000000002,2.193,0.013940000000000001,928.2,0.3138,1
60.78,0.06899,17.67,0.6247,0.6922,0.2008,1.268,959.5,96.42,0.1132,0.1785,0.07292,0.08653,22.15,0.7036,0.009406999999999999,0.2135,29.51,119.1,0.1949,0.1049,0.16399999999999998,0.01848,0.006113,14.25,0.07056,5.372999999999999,0.017,645.7,0.2844,1
14.49,0.014519999999999998,16.23,0.3904,0.3728,0.1047,0.6123,740.7,85.42,0.09618,0.1607,0.061770000000000005,0.052520000000000004,21.81,0.1938,0.00335,0.08259,29.89,105.5,0.1746,0.09714,0.1503,0.006853,0.00172,13.17,0.01384,1.334,0.01113,531.5,0.3693,1
22.45,0.00186,13.5,0.06624,0.005579,0.04216,1.35,564.1,79.83,0.06431,0.008772,0.05855,0.002924,18.4,0.2719,0.006383,0.00186,23.08,85.56,0.1697,0.08392999999999999,0.1038,0.002924,0.002015,12.58,0.008008,1.7209999999999999,0.025710000000000004,489.0,0.2505,0
44.64,0.04303,18.79,0.3583,0.583,0.1555,0.6583,1102.0,102.5,0.10099999999999999,0.1827,0.07069,0.1097,11.89,0.4209,0.005393,0.2032,17.04,125.0,0.1966,0.1257,0.1531,0.0132,0.004168,15.46,0.023209999999999998,2.805,0.01792,736.9,0.3216,1
89.74,0.03737,21.53,0.2327,0.2544,0.1289,1.288,1426.0,118.4,0.07625,0.1489,0.060770000000000005,0.07762000000000001,20.56,0.7548,0.007997,0.11699999999999999,26.06,143.4,0.2116,0.1001,0.1309,0.01648,0.003996,18.01,0.027000000000000003,5.353,0.02897,1007.0,0.3251,1
83.5,0.04257,24.22,0.2311,0.3158,0.08348,1.041,1750.0,132.5,0.07127,0.1445,0.051770000000000004,0.06022,21.46,0.6874,0.007959,0.09042,26.17,161.7,0.1467,0.08355,0.1228,0.01671,0.003933,20.48,0.031330000000000004,5.144,0.01341,1306.0,0.2238,1
24.19,0.07926,16.35,0.7090000000000001,0.9019,0.2225,0.8749,832.7,102.1,0.1155,0.2475,0.06898,0.09711,22.53,0.253,0.006965000000000001,0.2733,27.57,125.4,0.2041,0.09947,0.1419,0.022340000000000002,0.005784,14.9,0.06213,3.466,0.01499,685.0,0.2866,1
16.07,0.015090000000000001,15.35,0.3124,0.2654,0.1138,0.6068,719.8,87.44,0.08665,0.1427,0.06317,0.03152,18.75,0.1998,0.004413,0.042010000000000006,25.16,101.9,0.1723,0.1075,0.1624,0.007369,0.0017870000000000002,13.46,0.014430000000000002,1.443,0.01354,551.1,0.3518,0
17.91,0.009127,15.49,0.135,0.08115,0.061360000000000005,0.8561,725.9,88.44,0.07182000000000001,0.051039999999999995,0.0589,0.01141,17.21,0.2185,0.004599,0.0142,23.58,100.3,0.1614,0.08785,0.1157,0.004814,0.0017079999999999999,13.85,0.009169,1.495,0.01247,588.7,0.2364,0
24.28,0.007276,16.11,0.1637,0.06648,0.07885,1.217,793.7,96.22,0.06427999999999999,0.08485,0.0565,0.03781,16.95,0.2713,0.0050799999999999994,0.026019999999999998,23.0,104.6,0.17800000000000002,0.09855,0.1216,0.009073000000000001,0.0017059999999999998,14.97,0.0137,1.893,0.0135,685.9,0.2404,0
19.15,0.0,9.456,0.06444,0.0,0.04362,1.4280000000000002,268.6,47.92,0.07039,0.0,0.058839999999999996,0.0,24.54,0.3857,0.007189,0.0,30.37,59.16,0.1587,0.052629999999999996,0.08996,0.0,0.002783,7.76,0.00466,2.548,0.026760000000000003,181.0,0.2871,0
28.92,0.014119999999999999,21.31,0.2445,0.3538,0.08467999999999999,0.4757,1410.0,118.6,0.06938,0.1571,0.05425,0.05814,18.58,0.2577,0.002866,0.08169,26.36,139.2,0.1621,0.08588,0.1234,0.006719,0.0010869999999999999,18.31,0.009181,1.817,0.01069,1041.0,0.3206,1
12.26,0.0,10.62,0.07203999999999999,0.0,0.04102,0.496,342.9,61.24,0.08151,0.0,0.06422,0.0,11.97,0.1988,0.00604,0.0,14.1,66.53,0.1903,0.0925,0.1234,0.0,0.00322,9.738,0.0056560000000000004,1.218,0.02277,288.5,0.3105,0
20.3,0.044469999999999996,13.07,0.1937,0.256,0.07722000000000001,1.44,520.5,74.2,0.08284,0.06663999999999999,0.06267,0.014280000000000001,19.04,0.2864,0.007278,0.05485,26.98,86.43,0.2031,0.08546000000000001,0.1249,0.008799,0.003339,11.57,0.02047,2.206,0.018680000000000002,409.7,0.3035,0
13.17,0.012819999999999998,12.32,0.1507,0.1275,0.07608,0.5293,457.5,72.23,0.08022,0.0875,0.0627,0.02755,13.04,0.1904,0.0064719999999999995,0.03265,16.18,78.27,0.1769,0.09834,0.1358,0.008849,0.002817,11.29,0.01122,1.1640000000000001,0.016919999999999998,388.0,0.2733,0
15.89,0.026310000000000004,12.64,0.217,0.2302,0.1168,0.7339,475.7,75.46,0.07427,0.1105,0.0632,0.044969999999999996,16.02,0.2456,0.005884,0.07097,19.67,81.93,0.1886,0.1088,0.1415,0.013040000000000001,0.001982,11.61,0.02005,1.6669999999999998,0.01848,408.2,0.2787,0
11.36,0.016130000000000002,13.86,0.1958,0.18100000000000002,0.08836000000000001,0.905,580.9,83.18,0.07833999999999999,0.08388,0.062,0.0239,16.17,0.1458,0.0028870000000000002,0.03296,23.02,89.69,0.1735,0.09879,0.1172,0.007308,0.001972,12.94,0.01285,0.9975,0.0187,507.6,0.3297,0
19.01,0.01051,13.75,0.1928,0.1167,0.06545,0.8073,583.1,78.01,0.07961,0.055560000000000005,0.06129,0.016919999999999998,15.21,0.2575,0.005403,0.01994,21.38,91.11,0.1638,0.08673,0.1256,0.005142,0.002065,12.2,0.014180000000000002,1.959,0.013330000000000002,457.9,0.2661,0
10.21,0.07753,9.092,0.431,0.5381,0.1305,1.962,249.8,53.27,0.1486,0.07879,0.08261,0.02168,20.7,0.1935,0.01243,0.1321,29.72,58.08,0.2222,0.09405,0.163,0.01022,0.01178,8.219,0.05416,1.2429999999999999,0.02309,203.9,0.3322,0
31.0,0.03688,16.86,0.4059,0.3744,0.1306,1.845,811.3,92.87,0.1026,0.1772,0.06433,0.06462000000000001,23.81,0.4207,0.010879999999999999,0.1115,34.85,115.0,0.2235,0.09462999999999999,0.1559,0.01627,0.004768,14.19,0.0371,3.534,0.044989999999999995,610.7,0.4724,1
164.1,0.03582,30.67,0.2678,0.4819,0.1283,0.9245,2906.0,155.1,0.07737999999999999,0.2089,0.055060000000000005,0.141,24.27,1.0090000000000001,0.006292,0.2308,30.73,202.4,0.1797,0.1069,0.1515,0.013009999999999999,0.003118,23.51,0.01971,6.462000000000001,0.014790000000000001,1747.0,0.2593,1
12.67,0.01132,14.92,0.1231,0.0846,0.053610000000000005,1.685,684.5,89.75,0.06609,0.07911,0.05764,0.032510000000000004,17.18,0.1504,0.005371,0.026810000000000004,25.34,96.42,0.1641,0.08045,0.1066,0.009155,0.001444,14.06,0.01273,1.237,0.01719,609.1,0.2523,0
17.72,0.01551,16.2,0.1737,0.1362,0.06934,0.4125,819.1,97.65,0.06766,0.08177999999999999,0.055439999999999996,0.02657,13.21,0.1783,0.005012,0.03393,15.73,104.5,0.1721,0.07962999999999999,0.1126,0.009155,0.0017670000000000001,15.19,0.01485,1.338,0.01647,711.8,0.2487,0
19.2,0.04757,14.18,0.3593,0.3206,0.1297,0.873,600.5,80.64,0.1118,0.09804,0.06588,0.0288,17.48,0.2608,0.0067150000000000005,0.05892000000000001,23.13,95.23,0.1779,0.1042,0.1427,0.01051,0.006884,12.39,0.03705,2.117,0.01838,462.9,0.2819,0
90.47,0.03342,22.25,0.2291,0.3272,0.11,1.581,1549.0,121.4,0.08456,0.1674,0.06213,0.08665,17.12,0.7128,0.008102,0.1457,24.9,145.4,0.1966,0.1054,0.1503,0.01601,0.00457,18.66,0.02101,4.895,0.02045,1077.0,0.2894,1
130.2,0.03576,27.66,0.3885,0.4756,0.1954,0.6999,2227.0,147.2,0.08574,0.2432,0.0614,0.1501,21.9,1.008,0.003978,0.2448,25.8,195.0,0.1824,0.1063,0.1294,0.014709999999999999,0.003796,22.01,0.028210000000000002,7.561,0.01518,1482.0,0.2741,1
19.83,0.01796,16.89,0.2884,0.3796,0.07862000000000001,1.005,848.7,87.76,0.079,0.1329,0.0613,0.03085,24.69,0.231,0.0040880000000000005,0.05285,35.64,113.2,0.1761,0.09258,0.1471,0.00688,0.001465,13.61,0.01174,1.7519999999999998,0.01323,572.6,0.34700000000000003,1
33.67,0.03452,16.41,0.3856,0.5106,0.1469,0.9306,844.4,88.64,0.1109,0.2051,0.07325,0.08172,20.52,0.3906,0.005414,0.1445,29.66,113.3,0.2116,0.1106,0.1574,0.013340000000000001,0.004005,13.4,0.02265,3.093,0.01705,556.7,0.3585,1
22.81,0.0,11.92,0.054939999999999996,0.0,0.03558,3.8960000000000004,439.6,70.67,0.05905,0.0,0.055020000000000006,0.0,29.37,0.3141,0.007594,0.0,38.3,75.19,0.106,0.07449,0.09267,0.0,0.001773,11.2,0.008878,2.041,0.01989,386.0,0.1566,0
52.34,0.021169999999999998,24.33,0.2945,0.3788,0.1088,1.033,1844.0,132.9,0.07998999999999999,0.1697,0.055720000000000006,0.09333,27.06,0.3977,0.005043,0.1519,39.16,162.3,0.1814,0.1,0.1522,0.008185,0.0018920000000000002,20.31,0.015780000000000002,2.5869999999999997,0.012819999999999998,1288.0,0.3151,1
20.21,0.03452,14.13,0.2318,0.1604,0.07899,1.2990000000000002,621.9,84.18,0.07247,0.06608,0.05899,0.01883,18.29,0.2357,0.003629,0.04057,24.61,96.31,0.1874,0.07351,0.09329,0.01065,0.0037049999999999995,12.96,0.03713,2.397,0.02632,525.2,0.3207,0
17.61,0.01329,13.71,0.1212,0.102,0.07664,0.9505,574.4,81.25,0.06888,0.05602000000000001,0.05984,0.02107,17.3,0.21,0.006809,0.03193,21.1,88.7,0.1707,0.1028,0.1384,0.006474,0.001784,12.67,0.009514,1.5659999999999998,0.020569999999999998,489.9,0.2688,0
94.44,0.05687999999999999,22.54,0.205,0.4,0.1328,0.7813,1575.0,135.1,0.07678,0.1625,0.05882999999999999,0.1043,14.34,0.7572,0.01149,0.198,16.67,152.2,0.1809,0.1003,0.1374,0.01885,0.005115,20.29,0.02461,5.438,0.01756,1297.0,0.2364,1
38.49,0.02967,17.58,0.2101,0.2866,0.08597,1.198,967.0,97.26,0.06954,0.11199999999999999,0.05915,0.04335,19.07,0.386,0.004952000000000001,0.07486,28.06,113.8,0.1561,0.09215,0.1246,0.009423,0.001718,15.05,0.0163,2.63,0.01152,701.9,0.2282,1
43.52,0.006021,11.21,0.1352,0.02085,0.08333,1.7469999999999999,380.9,61.93,0.08009,0.04589,0.07028999999999999,0.01967,19.12,0.6965,0.013069999999999998,0.008934000000000001,23.17,71.79,0.2538,0.1075,0.1398,0.01052,0.004225,9.742,0.01885,4.607,0.031,289.7,0.3196,0
16.85,0.0169,11.16,0.1402,0.1055,0.07326,2.015,384.0,64.41,0.07664,0.06498999999999999,0.06331,0.01775,17.53,0.2619,0.007803,0.02511,26.84,71.98,0.18899999999999997,0.1007,0.1402,0.008043000000000002,0.002778,10.05,0.014490000000000001,1.778,0.021,310.8,0.2894,0
58.38,0.04942,20.38,0.4122,0.5036,0.1109,1.679,1284.0,112.4,0.07944,0.1739,0.05407000000000001,0.05736,25.42,0.51,0.008109,0.1204,35.46,132.8,0.1467,0.08331000000000001,0.1436,0.017419999999999998,0.003739,17.27,0.04308,3.283,0.01594,928.8,0.25,1
21.03,0.02544,14.98,0.2698,0.2577,0.1192,0.4981,686.6,88.59,0.08177000000000001,0.0909,0.06302999999999999,0.04451,13.9,0.2569,0.005850999999999999,0.0786,17.13,101.1,0.1962,0.1051,0.1376,0.00836,0.002918,13.56,0.02314,2.011,0.01842,561.3,0.3065,0
19.39,0.02334,12.98,0.1822,0.1609,0.09218,1.204,513.9,77.61,0.08251,0.1202,0.0685,0.04274,24.89,0.2623,0.008320000000000001,0.05441,30.36,84.48,0.182,0.10300000000000001,0.1311,0.01665,0.003674,11.99,0.02025,1.865,0.02094,441.3,0.2599,0
19.25,0.009212999999999999,14.91,0.1017,0.0626,0.055810000000000005,0.6549,688.9,89.59,0.0671,0.08216,0.05586,0.02652,15.66,0.2142,0.004837,0.02087,19.31,96.53,0.1589,0.07966000000000001,0.1034,0.010759999999999999,0.002104,14.02,0.009238,1.6059999999999999,0.01171,606.5,0.2136,0
33.01,0.033889999999999997,15.79,0.1581,0.2675,0.06636,1.6269999999999998,758.2,93.97,0.06836,0.1359,0.05416,0.05271,23.29,0.4157,0.008312,0.0839,31.71,102.2,0.1627,0.08682000000000001,0.1312,0.01576,0.0028710000000000003,14.6,0.017419999999999998,2.9139999999999997,0.0174,664.7,0.2477,1
24.72,0.04649,13.74,0.4092,0.4504,0.17,1.426,591.7,78.99,0.10300000000000001,0.1865,0.07371,0.07415,16.58,0.3197,0.005427,0.1659,26.38,91.93,0.2678,0.1091,0.1385,0.018430000000000002,0.004635,11.8,0.03633,2.281,0.05628,432.0,0.5774,1
93.99,0.01715,29.17,0.26,0.3155,0.1022,1.127,2615.0,137.2,0.07526000000000001,0.2009,0.052779999999999994,0.08632000000000001,23.04,0.6917,0.004728,0.1097,35.59,188.0,0.1769,0.09427999999999999,0.1401,0.01038,0.001987,21.16,0.01259,4.303,0.01083,1404.0,0.2822,1
48.29,0.0236,16.77,0.1525,0.1632,0.07624,0.8121,873.2,92.51,0.06072,0.1087,0.054479999999999994,0.04603,13.47,0.522,0.007089,0.05724,16.9,110.4,0.2075,0.09906000000000001,0.1297,0.01286,0.001463,14.34,0.014280000000000001,3.763,0.02266,641.2,0.3062,0
30.18,0.032139999999999995,9.981,0.1248,0.09441000000000001,0.07697999999999999,1.2,302.0,56.74,0.07431,0.047619999999999996,0.06621,0.023809999999999998,15.49,0.5381,0.01093,0.04721,17.7,65.27,0.193,0.08292999999999999,0.1015,0.015059999999999999,0.004174000000000001,8.878,0.02899,4.277,0.02837,241.0,0.2434,0
28.62,0.01977,14.45,0.1979,0.1423,0.1117,1.003,624.1,82.51,0.08557000000000001,0.08045,0.06623,0.02995,16.7,0.3834,0.007509,0.0388,21.74,93.63,0.212,0.1125,0.1475,0.009198999999999999,0.003629,12.75,0.015609999999999999,2.495,0.01805,493.8,0.3071,0
19.14,0.0,9.077,0.0834,0.0,0.048780000000000004,1.462,248.0,47.98,0.09938,0.0,0.07285,0.0,25.49,0.3777,0.01266,0.0,30.92,57.17,0.187,0.08098,0.1256,0.0,0.006872,7.729,0.009692000000000001,2.492,0.02882,178.8,0.3058,0
67.66,0.04345,22.75,0.3458,0.4734,0.1339,2.2840000000000003,1540.0,129.9,0.07918,0.2255,0.05715,0.1103,21.68,0.6226,0.004756,0.1863,34.66,157.6,0.2082,0.09797,0.1218,0.01806,0.003288,19.68,0.03368,5.172999999999999,0.03756,1194.0,0.4045,1
34.78,0.01949,14.16,0.1105,0.08112,0.07589,1.3219999999999998,616.7,83.51,0.06435,0.06296,0.06087000000000001,0.02645,20.78,0.4202,0.007017,0.03136,24.11,90.82,0.254,0.1135,0.1297,0.01153,0.001533,13.0,0.01142,2.873,0.02951,519.4,0.3196,0
18.15,0.020630000000000003,12.33,0.09147999999999999,0.1444,0.04701,0.9429,466.7,71.8,0.06641,0.06961,0.056670000000000005,0.0223,19.04,0.2727,0.009281999999999999,0.03709,23.84,78.0,0.1516,0.08139,0.129,0.008965,0.002146,11.31,0.009216,1.831,0.021830000000000002,394.1,0.24,0
18.4,0.02636,14.38,0.3842,0.3582,0.1334,0.6332,633.7,82.69,0.1033,0.1407,0.06854,0.05074,18.17,0.2324,0.005704,0.08017,22.15,95.29,0.1641,0.1076,0.1533,0.010320000000000001,0.003563,12.65,0.02502,1.696,0.01759,485.6,0.32299999999999995,0
20.05,0.005308,10.92,0.09473,0.02049,0.05301,2.043,366.1,62.11,0.08988,0.023809999999999998,0.0689,0.007937,19.94,0.335,0.01113,0.006829000000000001,26.29,68.81,0.135,0.1024,0.1316,0.00525,0.005667,9.787,0.01463,2.1319999999999997,0.018009999999999998,294.5,0.1934,0
13.25,0.008342,13.14,0.1232,0.08636,0.052410000000000005,0.7285,532.8,76.84,0.07898,0.07025,0.059070000000000004,0.01963,12.74,0.1822,0.005528,0.019719999999999998,18.41,84.08,0.159,0.09311,0.1275,0.006273,0.00253,12.06,0.009789,1.171,0.01465,448.6,0.2514,0
19.29,0.03304,14.8,0.257,0.3438,0.1147,1.3319999999999999,675.2,88.1,0.07686,0.1453,0.06079,0.053810000000000004,18.89,0.2136,0.005442,0.0858,27.2,97.33,0.1806,0.1059,0.1428,0.013669999999999998,0.002464,13.51,0.01957,1.5130000000000001,0.01315,558.1,0.2666,0
36.35,0.013580000000000002,16.36,0.1238,0.135,0.05055,1.153,830.6,84.74,0.062060000000000004,0.1001,0.05318,0.02648,14.76,0.4057,0.004481000000000001,0.03261,22.35,104.5,0.1386,0.07355,0.1006,0.01082,0.0014349999999999999,13.27,0.01038,2.701,0.01069,551.7,0.2027,0
40.73,0.02713,20.42,0.342,0.3508,0.1198,0.8282,1239.0,115.1,0.07867,0.1939,0.05491,0.07488,19.32,0.3971,0.00609,0.1036,25.84,139.5,0.1506,0.08968,0.1381,0.01345,0.0026579999999999998,17.54,0.025689999999999998,3.088,0.01594,951.6,0.2928,1
115.2,0.02721,22.96,0.2444,0.2639,0.111,1.1520000000000001,1648.0,111.2,0.0906,0.1555,0.06281,0.06431,27.15,0.9291,0.008740000000000001,0.1007,34.49,152.1,0.1793,0.09898,0.16,0.014580000000000001,0.004417,17.08,0.02219,6.051,0.02045,930.9,0.301,1
22.73,0.027139999999999997,11.14,0.1542,0.1277,0.06258,1.35,385.2,61.49,0.08524,0.0656,0.06412999999999999,0.01514,18.49,0.3776,0.007501000000000001,0.029480000000000003,25.62,70.88,0.2238,0.08946,0.1234,0.009883,0.003913000000000001,9.667,0.01989,2.569,0.0196,289.1,0.3174,0
11.68,0.017230000000000002,12.09,0.1982,0.1553,0.07078999999999999,1.031,447.1,70.67,0.07287,0.06754,0.06246,0.02074,14.93,0.1642,0.005296,0.035460000000000005,20.83,79.73,0.2003,0.07987000000000001,0.1095,0.006959999999999999,0.001941,11.04,0.019030000000000002,1.281,0.0188,372.7,0.3202,0
35.13,0.0,13.45,0.052129999999999996,0.0,0.03398,3.647,558.9,77.42,0.06742999999999999,0.0,0.0596,0.0,29.97,0.4455,0.007339,0.0,38.05,85.08,0.1701,0.07699,0.09422,0.0,0.003136,12.27,0.008243,2.884,0.03141,465.4,0.2409,0
17.49,0.01376,12.36,0.09708,0.07529,0.05008,1.974,459.3,72.17,0.06994,0.062029999999999995,0.05955,0.021730000000000003,18.89,0.2656,0.0065379999999999995,0.02399,26.14,79.29,0.2013,0.08713,0.1118,0.009923999999999999,0.002928,11.37,0.01395,1.954,0.03416,396.0,0.3267,0
23.11,0.03829,16.22,0.4202,0.40399999999999997,0.12300000000000001,1.079,808.9,95.81,0.1023,0.1205,0.06341000000000001,0.0389,24.99,0.2542,0.007137999999999999,0.1009,31.73,113.5,0.1872,0.08837,0.134,0.01162,0.006111,14.47,0.04653,2.615,0.02068,656.4,0.3187,0
13.38,0.006564,13.5,0.1472,0.052329999999999995,0.08392999999999999,0.6931,553.7,81.78,0.06922,0.06343,0.061,0.01924,13.78,0.1807,0.006064,0.01288,17.48,88.54,0.1638,0.09667,0.1298,0.007978,0.001392,12.72,0.0118,1.34,0.01374,492.1,0.2369,0
22.95,0.014230000000000001,19.18,0.292,0.2477,0.07112,0.5679,1084.0,107.1,0.07622999999999999,0.08737,0.05325,0.02307,20.2,0.2473,0.002667,0.036489999999999995,26.56,127.3,0.1846,0.07497000000000001,0.1009,0.005297,0.0017,16.69,0.014459999999999999,1.775,0.01961,857.6,0.4677,1
17.67,0.05189,10.6,0.3663,0.2913,0.2204,1.39,328.1,64.12,0.1364,0.1075,0.09575,0.07038,13.14,0.2744,0.021769999999999998,0.1188,18.04,69.47,0.2057,0.1255,0.2006,0.0145,0.011479999999999999,9.676,0.04888,1.787,0.02632,272.5,0.2848,0
18.15,0.0643,12.36,0.4082,0.4779,0.1113,1.376,476.4,71.73,0.09532,0.1555,0.0664,0.03613,17.2,0.2574,0.008565000000000001,0.09457,26.87,90.14,0.1489,0.08915,0.1391,0.01768,0.0049759999999999995,10.97,0.046380000000000005,2.806,0.01516,371.5,0.254,0
18.57,0.02,14.29,0.217,0.2413,0.07175,0.7786,624.6,78.31,0.0747,0.08829,0.059160000000000004,0.02027,18.02,0.2527,0.005833,0.04392,24.04,93.85,0.1695,0.09231,0.1368,0.007087,0.00196,12.21,0.013880000000000002,1.874,0.01938,458.4,0.3218,0
23.13,0.0288,11.54,0.1486,0.07987000000000001,0.08578,1.534,402.8,67.41,0.07552,0.03203,0.06481,0.01201,19.29,0.355,0.007595,0.02995,23.31,74.22,0.2217,0.09988999999999999,0.1219,0.008614,0.0034509999999999996,10.49,0.02219,2.302,0.0271,336.1,0.2826,0
12.67,0.01434,12.32,0.1648,0.1399,0.06888999999999999,0.9938,462.0,73.06,0.06765,0.08476,0.05865,0.02875,15.39,0.1759,0.0051329999999999995,0.03503,22.02,79.93,0.1734,0.09639,0.11900000000000001,0.008602,0.001588,11.43,0.01521,1.143,0.015009999999999999,399.8,0.2676,0
57.65,0.0371,24.56,0.3206,0.5755,0.1206,0.6636,1623.0,122.0,0.09287999999999999,0.1956,0.05629,0.08271,24.59,0.5495,0.003872,0.1468,30.41,152.9,0.1953,0.09029,0.1249,0.012,0.0033369999999999997,19.02,0.01842,3.055,0.01964,1076.0,0.3956,1
28.51,0.03312,16.76,0.3345,0.3114,0.1025,1.3359999999999999,867.1,97.53,0.09251000000000001,0.1308,0.059129999999999995,0.03876,22.11,0.3186,0.004449,0.06859,31.55,110.2,0.1944,0.08515,0.1077,0.01196,0.0040149999999999995,14.99,0.02808,2.31,0.01906,693.7,0.3163,0