├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── Evaluating a Project └── Evaluation.md ├── Images └── Autopilot.png ├── LICENSE ├── Project Writeups ├── Advanced Data Science - XRay Image Analysis.md ├── Hospital Costs with Autopilot.md ├── Hosting and MLOps.md └── README.md ├── README.md └── Starter Notebooks ├── Advanced Data Science - XRay Analysis ├── Computer Vision for XRay Analysis.ipynb ├── ground_truth_od.py ├── im2rec.py ├── images │ ├── gt_label_output.png │ └── tensorplot.gif ├── src │ ├── requirements.txt │ └── ssd_entry_point.py ├── template.manifest └── tensor_plot.py ├── Cost Prediction ├── Cost Prediction with Autopilot.ipynb └── images │ ├── shap_1.png │ ├── shap_2.png │ ├── shap_3.png │ └── shap_4.png └── MLOps and Hosting ├── Hosting Models on SageMaker.ipynb ├── install-run-notebook.sh ├── model.tar.gz ├── src ├── requirements.txt └── train.py ├── test_set.csv └── train_set.csv /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 
8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *master* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute on.
Our projects use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), so looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | -------------------------------------------------------------------------------- /Evaluating a Project/Evaluation.md: -------------------------------------------------------------------------------- 1 | # Questions to Evaluate a Machine Learning Project 2 | 3 | This set of questions can be used during a machine learning course to introduce technical people without a prior background in ML, and to help them assess whether their project is well designed according to machine learning standards. 4 | 5 | ## Infrastructure 6 | 7 | * How long does it take to train your model? Can you use streaming data to reduce training time? Can you split the job across multiple instances to train faster? 8 | * What infrastructure are you using to train your models? Is it the lowest possible cost? Have you considered using GPUs to lower your training time? 9 | * Where are you storing your data; is that the best solution?
10 | * What DevOps framework do you have to continuously integrate changes as you make them? 11 | * Where are you developing your model, and is it the best choice for your scenario? 12 | 13 | ## Conceptual 14 | 15 | * Does the data that you're using reflect the real world? 16 | * What actually impacts the real world prediction problem, and is that in your data set? 17 | 18 | ## Data transformation 19 | 20 | * Did you normalize your data? 21 | * Did you randomly shuffle your data? 22 | * Did you remove any outliers from your data? 23 | * How did you handle missing or nonsensical values in your data? 24 | * How are you handling any sequential elements of your data set? 25 | * Did you remove any bias from it? Have you thought about the ethical implications of your machine learning system, and the fact that the data set you are using is potentially biased? 26 | * In the case of transfer learning, does your data match the model's input expectations (e.g., image size, image format, color correction)? 27 | 28 | ## Method 29 | 30 | * Which model did you select, and why? 31 | * How are you evaluating your model? 32 | * Is your model overfitting, and what are you doing to counteract that? 33 | * What are the limitations of your model, and what are its strong points? 34 | * What are the guardrails on your model's performance metrics? What is the minimum and maximum accuracy you expect to achieve? 35 | -------------------------------------------------------------------------------- /Images/Autopilot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-architecting-for-ml-hcls/9236be67abb200b6073b2b17079c9c368326c353/Images/Autopilot.png -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright Amazon.com, Inc. or its affiliates.
All Rights Reserved. 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 7 | the Software, and to permit persons to whom the Software is furnished to do so. 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 10 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 11 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 12 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 13 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 14 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 15 | 16 | -------------------------------------------------------------------------------- /Project Writeups/Advanced Data Science - XRay Image Analysis.md: -------------------------------------------------------------------------------- 1 | # Advanced SageMaker Features for Data Science in HCLS 2 | You've seen the lectures, you've watched the videos, you've stepped through a hands-on lab. Now it's time to take your usage of SageMaker to the next level by taking advantage of some advanced SageMaker features for training. 3 | 4 | In particular, in this project you will train your own object detection model using the built-in Object Detection algorithm. This picks up from the results of a Ground Truth data labelling job, using a manifest file to train on 1000 images. You'll train on spot instances using a sizeable GPU. You will also connect this job to a larger experiment to manage and view your progress. You'll learn how to convert your PNG images to RecordIO for optimized throughput.
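To make the manifest format concrete, here is a minimal sketch of parsing one augmented-manifest line of the kind Ground Truth produces for bounding-box jobs. The label attribute name (`xray-labels`), the S3 path, and the box values are all hypothetical stand-ins — yours will match the name and output of your own labelling job:

```python
import json

# One hypothetical line of a Ground Truth augmented manifest (JSON Lines format).
# "xray-labels" is an assumed label-attribute name; the real one matches your job.
line = json.dumps({
    "source-ref": "s3://my-bucket/xrays/img_0001.png",
    "xray-labels": {
        "image_size": [{"width": 1024, "height": 1024, "depth": 3}],
        "annotations": [
            {"class_id": 0, "left": 410, "top": 120, "width": 180, "height": 240}
        ],
    },
})

record = json.loads(line)
boxes = record["xray-labels"]["annotations"]
print(record["source-ref"], len(boxes))
```

Each line pairs a `source-ref` image location with the boxes a labeller drew, which is the shape of input the built-in Object Detection algorithm consumes when you train directly from an augmented manifest instead of RecordIO.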
5 | 6 | Optionally, if you have extra time, you're welcome to bring your own single shot detection algorithm into script mode for SageMaker. If you're able to do that, you can leverage SageMaker Debugger to analyze the gradients of your model, and even produce an interactive TensorPlot! 7 | 8 | # Built-in Object Detection for X-Ray Analysis 9 | In 2017 the National Institutes of Health introduced to the public scientific community one of the largest chest XRay datasets then available. Clocking in at 112,000 images drawn from more than 30,000 patients, the dataset is labelled to identify potentially cancerous masses within the images, helping accelerate both scientific analysis and positive patient outcomes through faster diagnosis. 10 | 11 | In this project you'll leverage a subset of the NIH dataset, specifically a sample of 1000 images that we have previously tested with SageMaker's data labelling solution, Ground Truth. You will use the output of a Ground Truth labelling job, where a labeller manually drew bounding boxes around the neck and/or trachea from these images. We're going to use the output from that job as the input for an Object Detection training job. 12 | 13 | ## Accessing the Data and Preparing the Training Job 14 | To access the data and get started on your project, please navigate to the following link: 15 | 16 | - https://github.com/aws-samples/amazon-sagemaker-architecting-for-ml-hcls/blob/main/Starter%20Notebooks/Advanced%20Data%20Science%20-%20XRay%20Analysis/Computer%20Vision%20for%20XRay%20Analysis.ipynb 17 | 18 | This will take you all the way to training an Object Detection algorithm using the built-in image. This job should take about 35 minutes to run on the GPUs provided. Notice that your job is connected to SageMaker Experiments, so you should be able to view the results in the Experiments tab. 19 | 20 | Also notice that you are training on SageMaker spot! Go ahead and tell us how much you saved on that job.
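One way to answer that: SageMaker prints the savings in the training job's log, and you can also compute the figure yourself from the `TrainingTimeInSeconds` and `BillableTimeInSeconds` fields returned by `describe_training_job`. A small sketch, with made-up durations standing in for a real job's values:

```python
def spot_savings_percent(training_time_s: int, billable_time_s: int) -> float:
    """Percent saved vs. on-demand: on spot you pay only for billable seconds."""
    return round(100.0 * (1.0 - billable_time_s / training_time_s), 1)

# Hypothetical values, e.g. pulled from:
#   sm.describe_training_job(TrainingJobName=job_name)
training_time_s = 2100   # total elapsed training seconds
billable_time_s = 630    # seconds actually billed on the spot instance
print(spot_savings_percent(training_time_s, billable_time_s))  # 70.0
```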
21 | 22 | ## Extending by bringing your own model 23 | If you have time, you are welcome to extend the solution by bringing your own single-shot-detection algorithm. You can use the script we provided to convert your images to RecordIO if you prefer. 24 | 25 | The advantage of bringing your own model in this case is that you'll be able to use SageMaker Debugger. SageMaker Debugger works out of the box with script mode for TensorFlow, PyTorch and MXNet models. This means you can set up a _debug hook config_, which will listen to the tensors recorded during your training job and let you analyze them. On the one hand, you can use a built-in debugger image which applies up to 18 built-in rules to your tensors, covering things like whether your loss is decreasing, whether your gradients are vanishing or exploding, and even feature importance. 26 | 27 | On the other hand, you can download the tensors produced during your job and _plug these into a local visualization solution._ We've included some starter code for doing this with our provided `TensorPlot` framework, which will let you develop a local interactive visualization for assessing your model. 28 | 29 | SageMaker Debugger has examples for class activation maps, tensor plots, BERT attention head visualizations, and even model pruning. You're welcome to step through the Debugger examples listed below, play with these, and connect them to the NIH or other dataset as you prefer. 30 | 31 | - https://github.com/aws/amazon-sagemaker-examples/tree/master/sagemaker-debugger -------------------------------------------------------------------------------- /Project Writeups/Hospital Costs with Autopilot.md: -------------------------------------------------------------------------------- 1 | # Predicting Hospital Costs Per Patient with SageMaker Autopilot 2 | 3 | ### Your Problem 4 | Medicare is a national health insurance program, administered by the Centers for Medicare & Medicaid Services (CMS).
It is the primary health insurance program for Americans aged 65 and older. Medicare has published historical data showing hospitals' average spending for Medicare Part A and Part B claims based on different claim types and claim periods, covering 1 to 3 days prior to hospital admission up to 30 days after discharge from hospital admission. These hospital spending figures are price-standardized and non-risk-adjusted, since risk adjustment is done at the episode level of the claims spanning the entire period during the episode. The hospital average costs are listed against the corresponding state-level and national-level average costs. 5 | 6 | You have just joined the data science team at Well-Forecasted Hospital. Your goal is to use the Medicare historical spending data to estimate potential costs per patient. 7 | 8 | ### Your Dataset 9 | Medicare has published a dataset showing average hospital spending on Medicare Part A and Part B claims. Both links below refer to the same data set: one is listed on the healthdata.gov site and the other on the data.medicare.gov site. The data dictionary is described in the link marked as #2 below. The dataset has hospital spending data from the year 2018, with 67,826 rows spanning 13 columns. For the purposes of our analysis and machine learning, we use the dataset in CSV (comma-separated values) format. 10 | 1. https://healthdata.gov/dataset/medicare-hospital-spending-claim 11 | 2. https://data.medicare.gov/Hospital-Compare/Medicare-Hospital-Spending-by-Claim/nrth-mfg3 12 | 13 | A direct link to download the dataset to a local computer - https://data.medicare.gov/api/views/nrth-mfg3/rows.csv?accessType=DOWNLOAD 14 | 15 | ### Accessing and cleaning your data 16 | To make this easier for you, we've written a starter notebook that downloads the data for you and performs some basic manipulations.
17 | 18 | See the link to the starter notebook here: 19 | - https://github.com/aws-samples/amazon-sagemaker-architecting-for-ml-hcls/blob/main/Starter%20Notebooks/Cost%20Prediction/Cost%20Prediction%20with%20Autopilot.ipynb 20 | 21 | ### Analyzing your data and performing feature engineering 22 | Once you've loaded the cleaned data into your pandas dataframe, spend a bit of time exploring the fields by generating some plots and histograms. We started by using `pandas.plotting.scatter_matrix` and looking at the claims field. 23 | 24 | We also found it helpful to use `sklearn.feature_selection.SelectKBest` based on `sklearn.feature_selection.chi2` to identify the best candidate X features. 25 | 26 | Feel free to experiment here with your favorite feature engineering and data analysis steps. 27 | 28 | ### Train a model using your X and Y variables with SageMaker Autopilot 29 | We recommend using SageMaker Autopilot as a simple way to automate both data analysis and model tuning. In the notebook provided, you'll be able to easily train your own set of 250 models using the provided dataset and wrapper Python code to leverage the Autopilot API. 30 | 31 | The Autopilot job will take quite a bit of time to run. If you are using SageMaker Studio you should be able to monitor the job via the Experiments tab. Once the job moves into "Feature Engineering," you should be able to open up both the data transformation and candidate generation notebooks. Do that. Open them up on your local Studio domain, step through them, and try to understand precisely what they are doing for you. Remember, all of this code is generated for your specific dataset! 32 | 33 | ### Deploy your solution to a RESTful API 34 | When you have time, you'll notice that the built-in models and the script-mode managed containers within SageMaker come with a `model.deploy()` method. This will automatically create a RESTful API hosting your model! Give that a try.
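Looping back to the feature-selection suggestion above, a minimal `SelectKBest`/`chi2` sketch might look like the following; a synthetic non-negative matrix stands in for the real Medicare columns, since `chi2` requires non-negative features:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for the Medicare claims table: 100 rows, 5 non-negative features.
X = rng.integers(0, 50, size=(100, 5)).astype(float)
# Make the label depend on the first feature so chi2 has something to find.
y = (X[:, 0] > 25).astype(int)

selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)        # (100, 2)
print(selector.get_support())  # boolean mask over the 5 candidate features
```

On the real dataset you would pass your encoded feature columns as `X` and your target spending column as `y`, then keep only the columns flagged by `get_support()` before handing the table to Autopilot.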
35 | 36 | What's nice about Autopilot is that __it will deploy the featurizing code along with the best model.__ That is to say, out of all the data transformation code that Autopilot generates within the candidates, it will deploy the specific featurizing code that maps to your best candidate. This is wrapped inside what SageMaker calls __an inference pipeline__. 37 | 38 | ### Extend with bringing your own feature engineering script into SageMaker Autopilot 39 | You may remember that this project opened up with some basic data transformation before we even plugged it into Autopilot. If you have time, your task is to port that code into the _bring your own feature engineering_ capabilities of Autopilot. 40 | 41 | Step through the example right here, then modify it to point to your dataset and the code from the starter notebook. 42 | - https://github.com/aws/amazon-sagemaker-examples/blob/master/autopilot/custom-feature-selection/Feature_selection_autopilot.ipynb -------------------------------------------------------------------------------- /Project Writeups/Hosting and MLOps.md: -------------------------------------------------------------------------------- 1 | # Hosting Models on SageMaker for Rapid Diagnosis 2 | You've just joined the machine learning team at a local hospital. They have completed a POC on diagnosing breast cancer, and need your help hosting this model on SageMaker. 3 | 4 | Not only does the team want to build a RESTful API, they want to take advantage of the massive array of features and capabilities that SageMaker brings to the table. 5 | 6 | In this module you will: 7 | - host your model on SageMaker 8 | - enable and test autoscaling on your endpoint 9 | - monitor your model 10 | - push a new model into production on that endpoint 11 | - build an automatic retraining system with Lambda 12 | 13 | ## 1 & 2. Access Your Model Artifact, Training and Inference Code.
Package in the SageMaker SDK 14 | The good news is that your model is already built! You have a pretrained model artifact, defined in SKLearn. This model looks at tabular data and returns a predicted likelihood that the patient has breast cancer. You also have the exact training and inference code necessary to develop this model, written as a Python script. 15 | 16 | Access these artifacts and package them up within the SageMaker SDK following the example here: 17 | - https://github.com/aws-samples/amazon-sagemaker-architecting-for-ml-hcls/blob/main/Starter%20Notebooks/MLOps%20and%20Hosting/Hosting%20Models%20on%20SageMaker.ipynb 18 | 19 | ## 3 & 4. Create an Endpoint and Enable Autoscaling 20 | Once you have the artifacts packaged within the SageMaker SDK, creating an endpoint should be as simple as calling `model.deploy()`. Follow the notebook for this. 21 | 22 | After this is completed, you'll use boto3 to enable autoscaling on that endpoint. Notice that this is also a single function call, albeit with a few more parameters. 23 | 24 | ## 5. Enable Model Monitor on your Endpoint 25 | The real-world environment that our data scientists trained their model against actually changes over time - we need to make sure that the trained model we are using stays up to date with those changes. The way we're going to do that on SageMaker is by _setting up Model Monitor on our endpoints._ This starts with our training data: we'll use the SageMaker SDK to spin up a processing job that takes our training data in S3 and learns statistical benchmarks on it. 26 | 27 | The built-in processing image for Model Monitor uses _Amazon Deequ_, an open-source solution that ensures data quality at high volumes. It's written in PySpark, so it scales quite well. After your processing job has finished you're welcome to view the thresholds and modify them as you or your data scientists prefer.
- https://github.com/awslabs/deequ 29 | 30 | You'll also specify a percentage of data capture, that is, the amount of your traffic you want to store after it hits your endpoint. Then, you'll set up _a monitoring schedule_ to run monitoring jobs which use the thresholds you learned during the previous step, and simply apply those to the data captured from your endpoint. 31 | 32 | ## 6. Improve your model with AutoGluon 33 | You might find that the quality of your model drops over time, or possibly wasn't even that great to begin with. AutoGluon is an easy way to improve this - it supports tabular data, imaging, and natural language processing. AutoGluon also has data augmentation capabilities, so it works with smaller datasets like the one we have here. Step six entails running an AutoGluon job, using script mode in SageMaker, to find a better version of the model. 34 | 35 | ## 7. Tune your model and redeploy 36 | Another way of finding a better model is using the automatic model tuner. Sadly, that doesn't improve the performance of our model in this case, unlike AutoGluon, which brings us up to 95% accuracy. But so you can see how it's done, we've included the code for both running a hyperparameter tuning job and redeploying to the same endpoint. 37 | 38 | ## 8. Automate the workflow 39 | For the sake of time we've included a very simple way of automating this workflow - setting up the local `notebook-runner` toolkit and simply running the entire notebook on its own processing job. While you might not use this instead of a true MLOps pipeline for a real-time application, you can certainly get a lot of value out of running notebooks automatically and with a CLI.
See the blog post and GitHub code suite for more details here: 40 | - https://aws.amazon.com/blogs/machine-learning/scheduling-jupyter-notebooks-on-sagemaker-ephemeral-instances/ 41 | 42 | ## Extensions 43 | If you have spare time, you're welcome to add the AutoGluon deploy capabilities as referenced in this example notebook: 44 | - https://github.com/aws/amazon-sagemaker-examples/tree/master/advanced_functionality/autogluon-tabular 45 | 46 | You can also step through setting up an MLOps pipeline using Lambda, Step Functions, and Apache Airflow as referenced here: 47 | - https://github.com/aws-samples/mlops-amazon-sagemaker-devops-with-ml -------------------------------------------------------------------------------- /Project Writeups/README.md: -------------------------------------------------------------------------------- 1 | # SageMaker Projects for HCLS 2 | In this course, we have a few different types of projects for you. We have an introduction to data science on SageMaker project, which is great for people who are net new to machine learning and/or to SageMaker. We also have an advanced data science on SageMaker project, for those of you who want to spend more time learning about advanced SageMaker features for training. We also have a project for operationalizing SageMaker, for those of you who want to focus more on operations than on training. All of these projects start with SageMaker Studio and utilize SageMaker for both training and hosting; the big differences are how much time you spend on each, where you'll be spending your time, and how you'll frame your final deliverables. 3 | 4 | By the end of the course tomorrow, you should have a new notebook developed that solves a meaty problem! You should be able to take this notebook back to work with you and drive value for your company and team. Remember, your AWS Solutions Architect is at the ready to help you out, so don't be shy! Ask for help early and often.
5 | 6 | And don't forget the secret sauce. Make sure you have some fun! 7 | 8 | --- 9 | 10 | ## Introduction to Data Science on SageMaker 11 | If you are new to data science, or if this is your absolute first SageMaker workshop, you might want to focus on projects in this category. We'll show you a data science problem, tell you how to access that dataset, give you some basic code for cleaning it, then show you how to train a model on SageMaker. Quite a bit of the problem solving is actually on you, however, as we want to help you stretch your data science skills. You'll be asked to perform feature engineering, analyze your data, and train multiple models until you find the best approach. 12 | 13 | #### Introductory Data Science Projects 14 | - Predicting Hospital Costs per Patient with Autopilot 15 | 16 | ## Advanced Data Science on SageMaker 17 | If you've already had a SageMaker workshop before, and you want to focus on advanced SageMaker features for data science, you might pick the `Advanced Data Science on SageMaker` project. This will introduce you to the built-in object detection algorithm, manifest files, Ground Truth, Debugger, script mode, spot instances, experiments, and more! 18 | 19 | This project focuses on XRay analysis. You'll train an object detection algorithm to identify the throat and trachea within 1000 NIH XRay images. 20 | 21 | ## SageMaker in Production 22 | A different project is for those who are more interested in putting SageMaker into production. This project focuses more on the operationalizing aspects of SageMaker, such as hosting, autoscaling, automation with Lambda, monitoring, workflow orchestration, and MLOps. You will start with pre-existing model artifacts and training / inference code, and then build a system that highlights the key tenets we need when going into production with SageMaker. This project is called `Hosting and MLOps`.
23 | 24 | You'll also get to see how to use both automatic model tuning and AutoGluon to quickly and dramatically improve the performance of your model. 25 | 26 | 27 | --- 28 | 29 | # FAQs 30 | __1. If I pick a production project, will I still learn how to train a model?__ Yes, definitely! You just won't spend all of your time there. You'll spend maybe 15% of your time training a model, but most of your time is on getting those endpoints up and running, then monitoring and updating them. 31 | 32 | __2. If I am here to learn about data science, what should I focus on?__ Go ahead and pick a project in the "Introduction to Data Science" category. Generally you'll be learning about how to use Autopilot. 33 | 34 | __3. So I'm going to need to write some of my own code for these?__ Yes, and that's because for the people who want to learn what data science is all about, there's nothing better than attacking your own data set and getting it to work for you. That also goes for the projects about running in production. We'll give you a framework for how to develop these, give you hints, and point to resources, then your AWS Solutions Architect can help you take that project all the way home. 35 | 36 | __4. How far should I get on these projects?__ By the end of the day today, you should have stepped through your first lab in your group. Then you should have accessed your data, cleaned it, and started thinking about how to analyze it. These projects are long, and some of the steps will take some time to run, so just get as far as you can. You'll still have most of the day tomorrow to work on them. 37 | 38 | __5. Will I get to see the solutions for these projects?__ Yes! At the end of the session tomorrow, we'll have our experts walk through the solution set.
-------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Architecting For Machine Learning on Amazon SageMaker in Health Care and Life Science 2 | Welcome to the art and science of machine learning! During this 2-day accelerator course you will quickly learn about the theory and application of machine learning for HCLS applications, with a strong focus on the AWS cloud and Amazon SageMaker. All of our projects come straight from the Health Care and Life Science domain space, so if you're familiar with the needs of analyzing medical imaging data, patient costs, dermatology, and even genomic analysis, you'll feel right at home. 3 | 4 | This accelerator is designed for data scientists who are new to AWS, and architects and developers who are new to machine learning. You will spend two days performing data science tasks: training models, evaluating them, analyzing data, etc. After this two-day period you will be better suited to continue building data science solutions on AWS, designing architectural requirements for these, or supporting teams who currently do this. 5 | 6 | We will cover: 7 | - Statistical machine learning 8 | - Deep learning 9 | - Feature engineering 10 | - Deploying a model into production 11 | - Model evaluation and comparison 12 | - SageMaker deep dive: Studio, notebooks, training jobs, endpoints, model monitor, etc 13 | 14 | This course is designed primarily for Python developers. But since it is group-based, you will still have a great time even if you don't wrangle Python for your day job. We recommend reviewing Python programming using the statistical package Pandas. We also recommend having a Cloud Practitioner AWS Certification, but it is not required. Lastly, we recommend the book listed below. It is an excellent read, and clearly demonstrates all important concepts.
The syntax might be a bit outdated at this point, but the concepts are still spot on. 15 | - https://pythonprogramming.net/data-analysis-python-pandas-tutorial-introduction/ 16 | - https://aws.amazon.com/certification/certified-cloud-practitioner/ 17 | - [Deep Learning with Python by Francois Chollet](https://www.amazon.com/Deep-Learning-Python-Francois-Chollet/dp/1617294438) 18 | 19 | ## Agenda 20 | __Day One:__ 21 | - Learn about ML on AWS 22 | - Go through a sample lab 23 | - Break into teams and focus on a new machine learning project 24 | __First Goal:__ Download your dataset to an S3 bucket, create a SageMaker Studio domain, and load your data into a Pandas dataframe. 25 | 26 | __Day Two:__ 27 | - Learn about feature engineering on AWS 28 | - Finish your first set of engineered features 29 | - Train your first model 30 | - Learn about model evaluation on AWS 31 | - Tune your model using SageMaker automatic model tuning 32 | - Learn about putting your model into production on SageMaker 33 | __Deliverable:__ Demo your notebook to your colleagues! 34 | 35 | ## What you'll need 36 | - Your own laptop 37 | - GitHub account to share code with your project partners 38 | - Kaggle account to download data sets 39 | 40 | We will provide you with an AWS account for this course. However, if you would like to bring your own dataset and use the time to build your own project, you're welcome to do that! We ask that you use your own AWS account in that case. 41 | -------------------------------------------------------------------------------- /Starter Notebooks/Advanced Data Science - XRay Analysis/Computer Vision for XRay Analysis.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Object Detection in Chest Xrays\n", 8 | "\n", 9 | "This workshop uses a portion of the NIH Chest Xray dataset.
Specifically, we will use about 1,000 images where we will predict the location of the trachea and throat of the patient.\n", 10 | "\n", 11 | "In addition, you'll learn about a variety of SageMaker features for training. In this lab we will:\n", 12 | "1. Download and prepare the result of a Ground Truth labelling job for xray object detection\n", 13 | "2. Visualize this dataset locally\n", 14 | "3. Train an object detection model using the built-in object detection algorithm from SageMaker\n", 15 | "4. Leverage GPUs and spot instances for running the training job\n", 16 | "5. Set up our own model using script mode, leveraging GluonCV\n", 17 | "6. Leverage SageMaker Debugger for this job\n", 18 | "7. Visualize the network for our model locally\n", 19 | "8. View and track all of this progress using SageMaker Experiments\n", 20 | "\n", 21 | "---" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": null, 27 | "metadata": {}, 28 | "outputs": [], 29 | "source": [ 30 | "!pip install --upgrade pip\n", 31 | "!pip install matplotlib\n", 32 | "!pip install imageio\n", 33 | "!pip install --upgrade awscli\n", 34 | "!pip install --upgrade boto3\n", 35 | "!pip install sagemaker-experiments\n", 36 | "!pip install --upgrade sagemaker" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "## Enable SageMaker Experiments\n", 44 | "First, let's create an experiment so we can track this job and all of our assets."
45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": null, 50 | "metadata": {}, 51 | "outputs": [], 52 | "source": [ 53 | "import boto3\n", 54 | "import time\n", 55 | "from smexperiments.experiment import Experiment\n", 56 | "\n", 57 | "sm = boto3.client('sagemaker')\n", 58 | "\n", 59 | "experiment_name = f\"xray-object-detection-{int(time.time())}\"\n", 60 | "experiment = Experiment.create(experiment_name=experiment_name, \n", 61 | " description=\"Training an object detection model on XRay data\", \n", 62 | " sagemaker_boto_client=sm)" 63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "metadata": {}, 68 | "source": [ 69 | "Now you can open the Experiments tab on the left-hand side, and you should see a new experiment!" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "---\n", 77 | "# Download and Prepare NIH Images\n", 78 | "Next, we are going to access those 1,000 images from the NIH dataset. Please ask your AWS SA for a link to the dataset. This is a time-limited presigned URL, so make sure to use it quickly!"
79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": null, 84 | "metadata": {}, 85 | "outputs": [], 86 | "source": [ 87 | "#Change to URL sent by Workshop leader\n", 88 | "DATA_SOURCE = 'https://nih-xray-data.s3.amazonaws.com/compressed-image-file/images.tar.gz?AWSAccessKeyId=ASIASUWHP42B3EFFCQIY&Signature=hPONsLLap26VOe6WNnBe0NGea6I%3D&x-amz-security-token=IQoJb3JpZ2luX2VjEMX%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLWVhc3QtMSJGMEQCIAq%2Bx85gM9%2BMmWykkKE5qEJFeQOm5EDYUddVPwyrFtbtAiBqn3Sh%2FuRX3%2FS24Z7mRL670ouRjUrHqPJYmzQ1WULeuCrFAwgdEAAaDDE4MTg4MDc0MzU1NSIMPZ72cjbIbsy9QqicKqIDWl75J2dA07YuBGAJ6dUNuonWmfujluzIHeuWtIVL4GWi%2FnRESP3MNj1ddB2RbSdrpIKcE7PNhaPU99Ih3ld96hP%2BVYJ1DQbsh12zpvaPVrE2oYnG2TcTSAWqsaX5yIhUtG2VHyZui48aK8MihEVFhXtXHYLkRIE60hUskjMOrjbG%2Bm5MxFIkGyX%2BT69aI9wRlKs%2BN82XtstmsqJcMdgcdKls6KtGNJ5uY8poRWtEqNAVRAWcePMJbZnhmJm%2Bleanr52CXc6sp9QNdWhBrUZbi9rgIYYDR2nsoJVyhP0H%2FE0Yqc1bYaYgRRJhciXmlMSq%2BXdArh5awwnI%2Fpk8XhIQr4ouzzxV8nRTk4yK54JuyvoeviF7utmOz96eXCk%2BkNqqhHAlf%2FtR%2FkaOX5c%2BkHMa23Qr0X2BTxqFGwbste2ANPjRZc9DHmFP0hSDiWv0MgRtfMZxaLEkrpkYURE2Fy749TjdxzrxLPvdxi5R2UP4hCImAcwQvnF0VMW22B9oxJIvvfFBwVwpYN2Am2AmqJ7TTDxuLxnaUvyX0N4u8UL%2FtgcSjzDFgOL8BTrHAe7nptBN3%2BitZjpZ4dIGMC0bhHJ1rP2y4WZv6zGLcjGmRhIWllldzkrK7Ij2I2O8zPVbORGKB9wUdJZ31JaWiuO5o0cFyOBopLV1VjyN%2B4BwvLExUJWUXlL3QbNfaQD8GxfxZDVCTpK5T2BRJsID0s2dIcZvjlNiiKZYzSWh%2BZx52Ixs6IPcy9Gh1EfFMgpeAKK7rl1AaeWFF4VCEW1ixMyoa16U0CP1KfA2eXlhw3SsJrtQ25gx7OaXT3wywviml1lLT6Fu9pM%3D&Expires=1604434685'" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": null, 94 | "metadata": {}, 95 | "outputs": [], 96 | "source": [ 97 | "import sagemaker\n", 98 | "\n", 99 | "BUCKET = sagemaker.Session().default_bucket()\n", 100 | "\n", 101 | "PREFIX='hcls-xray' #Change to your directory/prefix\n", 102 | "\n", 103 | "IMAGE_FILE = 'image_data.tar.gz' #do not edit\n" 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": null, 109 | "metadata": {}, 110 | "outputs": [], 111 | 
"source": [ 112 | "#first we download the compressed data from the bucket\n", 113 | "!wget \"$DATA_SOURCE\" -O image_data.tar.gz\n", 114 | "#then we will decompress the images to be used for training and validation\n", 115 | "!tar -xf image_data.tar.gz\n", 116 | "#now copy the data to S3\n", 117 | "!aws s3 cp --recursive --quiet images s3://$BUCKET/$PREFIX/image_data/\n", 118 | "print('Files uploaded to S3')" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": null, 124 | "metadata": {}, 125 | "outputs": [], 126 | "source": [ 127 | "def create_new_manifest_file(input_file, output_file):\n", 128 | "\n", 129 | " template_manifest=open(input_file).readlines()\n", 130 | " output_manifest=[]\n", 131 | " for i in template_manifest:\n", 132 | " i=i.strip()\n", 133 | " i=i.replace('BUCKET',BUCKET) #have the manifest point to the actual bucket each individual is using for the workshop\n", 134 | " i=i.replace('PREFIX',PREFIX) #have the manifest point to the actual bucket each individual is using for the workshop\n", 135 | "\n", 136 | " output_manifest.append(i)\n", 137 | " f_out=open(output_file,'w')\n", 138 | " print(*output_manifest,file=f_out,sep=\"\\n\")\n", 139 | " f_out.close()\n", 140 | " \n", 141 | " return output_manifest\n", 142 | " \n", 143 | "output_manifest = create_new_manifest_file('template.manifest', 'output.manifest')" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": {}, 149 | "source": [ 150 | "Let's inspect the contents of this labeled manfiest file. " 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": null, 156 | "metadata": {}, 157 | "outputs": [], 158 | "source": [ 159 | "json.loads(output_manifest[0])" 160 | ] 161 | }, 162 | { 163 | "cell_type": "markdown", 164 | "metadata": {}, 165 | "source": [ 166 | "Now let's copy the manifest file out to S3." 
167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": null, 172 | "metadata": {}, 173 | "outputs": [], 174 | "source": [ 175 | "!aws s3 cp output.manifest s3://$BUCKET/$PREFIX/output.manifest" 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "metadata": {}, 181 | "source": [ 182 | "---\n", 183 | "# Analyze Local Data \n", 184 | "Next, let's open up a few of those image files to make sure we know what we're dealing with. Remember, we are picking up after someone has finished labelling these images with SageMaker Ground Truth! " 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": null, 190 | "metadata": {}, 191 | "outputs": [], 192 | "source": [ 193 | "%matplotlib inline\n", 194 | "%load_ext autoreload\n", 195 | "%autoreload 2\n", 196 | "import os\n", 197 | "from collections import namedtuple\n", 198 | "from collections import defaultdict\n", 199 | "from collections import Counter\n", 200 | "import itertools\n", 201 | "import json\n", 202 | "import random\n", 203 | "import time\n", 204 | "import imageio\n", 205 | "import numpy as np\n", 206 | "import matplotlib\n", 207 | "import matplotlib.pyplot as plt\n", 208 | "from matplotlib.backends.backend_pdf import PdfPages\n", 209 | "from sklearn.metrics import confusion_matrix\n", 210 | "import boto3\n", 211 | "import sagemaker\n", 212 | "from urllib.parse import urlparse\n", 213 | "\n", 214 | "fids2bbs = defaultdict(list)\n", 215 | "\n", 216 | "from ground_truth_od import group_miou\n", 217 | "from ground_truth_od import BoundingBox, WorkerBoundingBox, \\\n", 218 | " GroundTruthBox, BoxedImage" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": null, 224 | "metadata": {}, 225 | "outputs": [], 226 | "source": [ 227 | "EXP_NAME = 'nih-chest-xrays' #where to put experiment data\n", 228 | "\n", 229 | "OUTPUT_MANIFEST_S3=f'{BUCKET}/{PREFIX}/output.manifest' #location of the manifest file in S3\n", 230 | "IMAGE_DATA_S3=f'{BUCKET}/{PREFIX}'
#location of image data in s3\n", 231 | "\n", 232 | "print('S3 Location of Manifest File:')\n", 233 | "print(OUTPUT_MANIFEST_S3)\n", 234 | "\n", 235 | "print('S3 Location of Image Data:')\n", 236 | "print(IMAGE_DATA_S3)\n" 237 | ] 238 | }, 239 | { 240 | "cell_type": "markdown", 241 | "metadata": {}, 242 | "source": [ 243 | "First we will load and preprocess the manifest file. This manifest file is in fact an **augmented manifest file**, and also contains the location of the throat of the patient in the xray." 244 | ] 245 | }, 246 | { 247 | "cell_type": "code", 248 | "execution_count": null, 249 | "metadata": {}, 250 | "outputs": [], 251 | "source": [ 252 | "def read_data(file_name):\n", 253 | " !mkdir -p data #make a data directory if it does not exist\n", 254 | " with open(file_name, 'r') as f:\n", 255 | " output = [json.loads(line.strip()) for line in f.readlines()]\n", 256 | "\n", 257 | " return output\n", 258 | "\n", 259 | "def write_manifest(file_name):\n", 260 | " f_out=open(file_name,'w')\n", 261 | " for i in output_clean:\n", 262 | " print(json.dumps(i),file=f_out,sep=\"\\n\")\n", 263 | " f_out.close()\n", 264 | "\n", 265 | "def filter_manifest(file_name):\n", 266 | " 'remove any images that are not labeled.'\n", 267 | " \n", 268 | " output = read_data(file_name)\n", 269 | " \n", 270 | " output_clean =[]\n", 271 | " \n", 272 | " metadata_info ='xray-labeling-job-clone-clone-full-clone-metadata' #change depending on the job\n", 273 | " \n", 274 | " for the_sample in output:\n", 275 | " try:\n", " z = the_sample[metadata_info]['creation-date'] #samples without labeling metadata raise a KeyError and are skipped\n", 276 | " output_clean.append(the_sample)\n", " except KeyError:\n", " continue\n", 277 | "\n", 278 | " print(f'Number of images without errors {len(output_clean)}')\n", 279 | " \n", 280 | " return(output_clean)\n", 281 | "\n", 282 | "\n", 283 | "output_clean = filter_manifest('output.manifest')\n", 284 | "write_manifest('data/output_manifest_clean.manifest')" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": null, 290 | "metadata": {}, 291
| "outputs": [], 292 | "source": [ 293 | "def get_groundtruth_labels(output):\n", 294 | " # Create data arrays.\n", 295 | " img_uris = [None] * len(output)\n", 296 | " confidences = [None] * len(output)\n", 297 | " groundtruth_labels = [None] * len(output)\n", 298 | " human = np.zeros(len(output))\n", 299 | "\n", 300 | " # Find the job name contained within the manifest file manifest corresponds to.\n", 301 | " keys = list(output[0].keys())\n", 302 | " metakey = keys[np.where([('-metadata' in k) for k in keys])[0][0]]\n", 303 | " jobname = metakey[:-9]\n", 304 | "\n", 305 | " # Extract the data.\n", 306 | " for datum_id, datum in enumerate(output):\n", 307 | " img_uris[datum_id] = datum['source-ref']\n", 308 | " groundtruth_labels[datum_id] = str(datum[metakey]['class-map'])\n", 309 | " confidences[datum_id] = datum[metakey]['objects']\n", 310 | " human[datum_id] = int(datum[metakey]['human-annotated'] == 'yes')\n", 311 | " groundtruth_labels = np.array(groundtruth_labels)\n", 312 | " \n", 313 | " return groundtruth_labels\n", 314 | "\n", 315 | "groundtruth_labels = get_groundtruth_labels(output_clean)" 316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": null, 321 | "metadata": {}, 322 | "outputs": [], 323 | "source": [ 324 | "groundtruth_labels[0]" 325 | ] 326 | }, 327 | { 328 | "cell_type": "code", 329 | "execution_count": null, 330 | "metadata": {}, 331 | "outputs": [], 332 | "source": [ 333 | "def map_images_to_labels(output):\n", 334 | "\n", 335 | " # Create data arrays.\n", 336 | " confidences = np.zeros(len(output))\n", 337 | "\n", 338 | " # Find the job name the manifest corresponds to.\n", 339 | " keys = list(output[0].keys())\n", 340 | " metakey = keys[np.where([('-metadata' in k) for k in keys])[0][0]]\n", 341 | " jobname = metakey[:-9]\n", 342 | " output_images = []\n", 343 | " consolidated_boxes = []\n", 344 | "\n", 345 | " # Extract the data.\n", 346 | " for datum_id, datum in enumerate(output):\n", 347 | " image_size = 
datum[jobname]['image_size'][0]\n", 348 | " box_annotations = datum[jobname]['annotations']\n", 349 | " uri = datum['source-ref']\n", 350 | " box_confidences = datum[metakey]['objects']\n", 351 | " human = int(datum[metakey]['human-annotated'] == 'yes')\n", 352 | "\n", 353 | " # Make image object.\n", 354 | " image = BoxedImage(id=datum_id, size=image_size,\n", 355 | " uri=uri)\n", 356 | "\n", 357 | " # Create bounding boxes for image.\n", 358 | " boxes = []\n", 359 | " for i, annotation in enumerate(box_annotations):\n", 360 | " box = BoundingBox(image_id=datum_id, boxdata=annotation)\n", 361 | " box.confidence = box_confidences[i]['confidence']\n", 362 | " box.image = image\n", 363 | " box.human = human\n", 364 | " boxes.append(box)\n", 365 | " consolidated_boxes.append(box)\n", 366 | " image.consolidated_boxes = boxes\n", 367 | "\n", 368 | " # Store if the image is human labeled.\n", 369 | " image.human = human\n", 370 | "\n", 371 | " # Retrieve ground truth boxes for the image.\n", 372 | " oid_boxes_data = fids2bbs[image.oid_id]\n", 373 | " gt_boxes = []\n", 374 | " for data in oid_boxes_data:\n", 375 | " gt_box = GroundTruthBox(image_id=datum_id, oiddata=data,\n", 376 | " image=image)\n", 377 | " gt_boxes.append(gt_box)\n", 378 | " image.gt_boxes = gt_boxes\n", 379 | "\n", 380 | " output_images.append(image)\n", 381 | " \n", 382 | " return output_images, jobname\n", 383 | "\n", 384 | "output_images, jobname = map_images_to_labels(output_clean)" 385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": null, 390 | "metadata": {}, 391 | "outputs": [], 392 | "source": [ 393 | "len(output_clean)" 394 | ] 395 | }, 396 | { 397 | "cell_type": "code", 398 | "execution_count": null, 399 | "metadata": {}, 400 | "outputs": [], 401 | "source": [ 402 | "def create_bounding_boxes(output_clean, output_images):\n", 403 | " # Iterate through the json files, creating bounding box objects.\n", 404 | " \n", 405 | " output_with_answers=[] #only include images 
with the answers in them\n", 406 | " output_images_with_answers=[]\n", 407 | "\n", 408 | " output_with_no_answers=[]\n", 409 | " output_images_with_no_answers=[]\n", 410 | "\n", 411 | " for i in range(0,len(output_clean)):\n", 412 | " try:\n", 413 | " #images with class_id have answers in them\n", 414 | " x = output_clean[i][jobname]['annotations'][0]['class_id']\n", 415 | "\n", 416 | " output_with_answers.append(output_clean[i])\n", 417 | " output_images_with_answers.append(output_images[i])\n", 418 | " except:\n", 419 | " output_with_no_answers.append(output_clean[i])\n", 420 | " output_images_with_no_answers.append(output_images[i])\n", 421 | " pass\n", 422 | "\n", 423 | " #add the box to the image\n", 424 | " for i in range(0,len(output_with_answers)):\n", 425 | " the_output=output_with_answers[i]\n", 426 | " the_image=output_images_with_answers[i]\n", 427 | " answers=the_output[jobname]['annotations']\n", 428 | " box=WorkerBoundingBox(image_id=i,boxdata=answers[0],worker_id='anon-worker')\n", 429 | " box.image=the_image\n", 430 | " the_image.worker_boxes.append(box)\n", 431 | "\n", 432 | " print(f\"Number of images with labeled trachea/throat: {len(output_images_with_answers)}\")\n", 433 | " print(f\"Number of images without labeled trachea/throat: {len(output_with_no_answers)}\")\n", 434 | " \n", 435 | " return output_with_answers, output_images_with_answers\n", 436 | " \n", 437 | "output_with_answers, output_images_with_answers = create_bounding_boxes(output_clean, output_images)" 438 | ] 439 | }, 440 | { 441 | "cell_type": "code", 442 | "execution_count": null, 443 | "metadata": {}, 444 | "outputs": [], 445 | "source": [ 446 | "def download_images(output_images_with_answers, image_dir = 'data', dataset_size = 5):\n", 447 | " image_subset = np.random.choice(output_images_with_answers, dataset_size, replace=False)\n", 448 | "\n", 449 | " for img in image_subset:\n", 450 | " target_fname = os.path.join(\n", 451 | " image_dir, img.uri.split('/')[-1])\n", 452 | 
" if not os.path.isfile(target_fname):\n", 453 | " !aws s3 cp {img.uri} {target_fname}\n", 454 | " \n", 455 | " return image_subset\n", 456 | " \n", 457 | "image_subset = download_images(output_images_with_answers)" 458 | ] 459 | }, 460 | { 461 | "cell_type": "markdown", 462 | "metadata": {}, 463 | "source": [ 464 | "Next, we're going to plot the bounding boxes on the XRay data. Your plot should look something like this!\n", 465 | "\n", 466 | "![](images/gt_label_output.png)" 467 | ] 468 | }, 469 | { 470 | "cell_type": "code", 471 | "execution_count": null, 472 | "metadata": {}, 473 | "outputs": [], 474 | "source": [ 475 | "def visualize_images(image_subset, image_dir = 'data', n_show = 5):\n", 476 | " \n", 477 | " # Find human and auto-labeled images in the subset.\n", 478 | " human_labeled_subset = [img for img in image_subset if img.human]\n", 479 | "\n", 480 | " # Show examples of each\n", 481 | " fig, axes = plt.subplots(n_show, 2, figsize=(9, 2*n_show),\n", 482 | " facecolor='white', dpi=100)\n", 483 | " fig.suptitle('Human-labeled examples', fontsize=24)\n", 484 | " axes[0, 0].set_title('Worker labels', fontsize=14)\n", 485 | " axes[0, 1].set_title('Consolidated label', fontsize=14)\n", 486 | " for row, img in enumerate(np.random.choice(human_labeled_subset, size=n_show)):\n", 487 | " img.download(image_dir)\n", 488 | " img.plot_worker_bbs(axes[row, 0])\n", 489 | " img.plot_consolidated_bbs(axes[row, 1])\n", 490 | "\n", 491 | "visualize_images(image_subset)" 492 | ] 493 | }, 494 | { 495 | "cell_type": "markdown", 496 | "metadata": {}, 497 | "source": [ 498 | "(Note that in this context we only had one labeler, so the consolidated label will be identical to the worker label)" 499 | ] 500 | }, 501 | { 502 | "cell_type": "markdown", 503 | "metadata": {}, 504 | "source": [ 505 | "---" 506 | ] 507 | }, 508 | { 509 | "cell_type": "markdown", 510 | "metadata": {}, 511 | "source": [ 512 | "# Split Data and Copy to S3" 513 | ] 514 | }, 515 | { 516 | "cell_type": 
"code", 517 | "execution_count": null, 518 | "metadata": {}, 519 | "outputs": [], 520 | "source": [ 521 | "def split_data(output):\n", 522 | " \n", 523 | " # Shuffle output in place.\n", 524 | " np.random.shuffle(output)\n", 525 | "\n", 526 | " dataset_size = len(output)\n", 527 | " train_test_split_index = round(dataset_size*0.9)\n", 528 | "\n", 529 | " train_data = output[:train_test_split_index]\n", 530 | " test_data = output[train_test_split_index:]\n", 531 | "\n", 532 | " train_test_split_index_2 = round(len(test_data)*0.5)\n", 533 | " validation_data=test_data[:train_test_split_index_2]\n", 534 | " hold_out=test_data[train_test_split_index_2:]\n", 535 | " \n", 536 | " return train_data, validation_data, hold_out\n", 537 | " \n", 538 | "train_data, validation_data, hold_out = split_data(output_with_answers)" 539 | ] 540 | }, 541 | { 542 | "cell_type": "code", 543 | "execution_count": null, 544 | "metadata": {}, 545 | "outputs": [], 546 | "source": [ 547 | "num_training_samples = 0\n", 548 | "with open('data/train.manifest', 'w') as f:\n", 549 | " for line in train_data:\n", 550 | " f.write(json.dumps(line))\n", 551 | " f.write('\\n')\n", 552 | " num_training_samples += 1\n", 553 | "\n", 554 | "with open('data/validation.manifest', 'w') as f:\n", 555 | " for line in validation_data:\n", 556 | " f.write(json.dumps(line))\n", 557 | " f.write('\\n')\n", 558 | "with open('data/hold_out.manifest', 'w') as f:\n", 559 | " for line in hold_out:\n", 560 | " f.write(json.dumps(line))\n", 561 | " f.write('\\n')\n", 562 | "\n", 563 | "print(f'Training Data Set Size: {len(train_data)}')\n", 564 | "print(f'Validatation Data Set Size: {len(validation_data)}')\n", 565 | "print(f'Hold Out Data Set Size: {len(hold_out)}')" 566 | ] 567 | }, 568 | { 569 | "cell_type": "code", 570 | "execution_count": null, 571 | "metadata": {}, 572 | "outputs": [], 573 | "source": [ 574 | "def copy_to_s3(bucket, prefix, expr_name):\n", 575 | " !aws s3 cp data/train.manifest 
s3://{bucket}/{prefix}/{expr_name}/train.manifest\n", 576 | " !aws s3 cp data/validation.manifest s3://{bucket}/{prefix}/{expr_name}/validation.manifest\n", 577 | " !aws s3 cp data/hold_out.manifest s3://{bucket}/{prefix}/{expr_name}/hold_out.manifest\n", 578 | " \n", 579 | "copy_to_s3(BUCKET, PREFIX, EXP_NAME)" 580 | ] 581 | }, 582 | { 583 | "cell_type": "markdown", 584 | "metadata": {}, 585 | "source": [ 586 | "# Train on SageMaker & Track with Experiments" 587 | ] 588 | }, 589 | { 590 | "cell_type": "markdown", 591 | "metadata": {}, 592 | "source": [ 593 | "Let's create a trial within the experiment that we can associate this job with. " 594 | ] 595 | }, 596 | { 597 | "cell_type": "code", 598 | "execution_count": null, 599 | "metadata": {}, 600 | "outputs": [], 601 | "source": [ 602 | "from smexperiments.trial import Trial\n", 603 | "\n", 604 | "trial_name = f\"built-in-object-detection-{int(time.time())}\"\n", 605 | "\n", 606 | "trial = Trial.create(trial_name = trial_name,\n", 607 | " experiment_name = experiment_name,\n", 608 | " sagemaker_boto_client = sm)" 609 | ] 610 | }, 611 | { 612 | "cell_type": "code", 613 | "execution_count": null, 614 | "metadata": {}, 615 | "outputs": [], 616 | "source": [ 617 | "import re\n", 618 | "from sagemaker import get_execution_role\n", 619 | "from time import gmtime, strftime\n", 620 | "\n", 621 | "role = get_execution_role()\n", 622 | "sess = sagemaker.Session()\n", 623 | "s3 = boto3.resource('s3')\n", 624 | "\n", 625 | "training_image = sagemaker.image_uris.retrieve('object-detection', boto3.Session().region_name, version='latest')\n", 626 | "augmented_manifest_filename_train = 'train.manifest'\n", 627 | "augmented_manifest_filename_validation = 'validation.manifest'\n", 628 | "bucket_name = BUCKET\n", 629 | "s3_prefix = EXP_NAME\n" 630 | ] 631 | }, 632 | { 633 | "cell_type": "code", 634 | "execution_count": null, 635 | "metadata": {}, 636 | "outputs": [], 637 | "source": [ 638 | "# Defines paths for use in the training 
job request.\n", 639 | "s3_train_data_path = 's3://{}/{}/{}/train.manifest'.format(BUCKET, PREFIX, EXP_NAME)\n", 640 | "s3_validation_data_path = 's3://{}/{}/{}/validation.manifest'.format(BUCKET, PREFIX, EXP_NAME )\n", 641 | "s3_debug_path = \"s3://{}/{}/{}/debug-hook-data\".format(BUCKET, PREFIX, EXP_NAME)\n", 642 | "s3_output_path = f's3://{BUCKET}/{PREFIX}/{EXP_NAME}/output'" 643 | ] 644 | }, 645 | { 646 | "cell_type": "code", 647 | "execution_count": null, 648 | "metadata": {}, 649 | "outputs": [], 650 | "source": [ 651 | "\n", 652 | "augmented_manifest_s3_key = s3_train_data_path.split(bucket_name)[1][1:]\n", 653 | "s3_obj = s3.Object(bucket_name, augmented_manifest_s3_key)\n", 654 | "augmented_manifest = s3_obj.get()['Body'].read().decode('utf-8')\n", 655 | "augmented_manifest_lines = augmented_manifest.split('\\n')\n", 656 | "num_training_samples = len(augmented_manifest_lines) # Compute number of training samples for use in training job request.\n", 657 | "\n", 658 | "# Determine the keys in the training manifest and exclude the metadata from the labeling job.\n", 659 | "attribute_names = list(json.loads(augmented_manifest_lines[0]).keys())\n", 660 | "attribute_names = [attrib for attrib in attribute_names if 'meta' not in attrib]" 661 | ] 662 | }, 663 | { 664 | "cell_type": "code", 665 | "execution_count": null, 666 | "metadata": {}, 667 | "outputs": [], 668 | "source": [ 669 | "# Create unique job name\n", 670 | "job_name_prefix = EXP_NAME\n", 671 | "timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())\n", 672 | "model_job_name = job_name_prefix + timestamp" 673 | ] 674 | }, 675 | { 676 | "cell_type": "code", 677 | "execution_count": null, 678 | "metadata": {}, 679 | "outputs": [], 680 | "source": [ 681 | "# Create unique job name\n", 682 | "job_name_prefix = EXP_NAME\n", 683 | "timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())\n", 684 | "model_job_name = job_name_prefix + timestamp\n", 685 | "\n", 686 | "# set up your training job
using boto3 API syntax\n", 687 | "training_params = \\\n", 688 | " {\n", 689 | " \"AlgorithmSpecification\": {\n", 690 | " # NB. This is one of the named constants defined in the first cell.\n", 691 | " \"TrainingImage\": training_image,\n", 692 | " \"TrainingInputMode\": \"Pipe\"\n", 693 | " },\n", 694 | " \"RoleArn\": role,\n", 695 | " \"OutputDataConfig\": {\n", 696 | " \"S3OutputPath\": s3_output_path\n", 697 | " },\n", 698 | " \"ResourceConfig\": {\n", 699 | " \"InstanceCount\": 1,\n", 700 | " \"InstanceType\": \"ml.p3.2xlarge\", #Use a GPU backed instance\n", 701 | " \"VolumeSizeInGB\": 50\n", 702 | " },\n", 703 | " \"TrainingJobName\": model_job_name,\n", 704 | " \"HyperParameters\": { # NB. These hyperparameters are at the user's discretion and are beyond the scope of this demo.\n", 705 | " \"base_network\": \"resnet-50\",\n", 706 | " \"use_pretrained_model\": \"1\",\n", 707 | " \"num_classes\": \"1\",\n", 708 | " \"mini_batch_size\": \"10\",\n", 709 | " \"epochs\": \"30\",\n", 710 | " \"learning_rate\": \"0.001\",\n", 711 | " \"lr_scheduler_step\": \"\",\n", 712 | " \"lr_scheduler_factor\": \"0.1\",\n", 713 | " \"optimizer\": \"sgd\",\n", 714 | " \"momentum\": \"0.9\",\n", 715 | " \"weight_decay\": \"0.0005\",\n", 716 | " \"overlap_threshold\": \"0.5\",\n", 717 | " \"nms_threshold\": \"0.45\",\n", 718 | " \"image_shape\": \"300\",\n", 719 | " \"label_width\": \"350\",\n", 720 | " \"num_training_samples\": str(num_training_samples)\n", 721 | " },\n", 722 | " \"StoppingCondition\": {\n", 723 | " \"MaxRuntimeInSeconds\": 86400,\n", 724 | " \"MaxWaitTimeInSeconds\":259200,\n", 725 | "\n", 726 | " },\n", 727 | " \"EnableManagedSpotTraining\" :True,\n", 728 | " \"InputDataConfig\": [\n", 729 | " {\n", 730 | " \"ChannelName\": \"train\",\n", 731 | " \"DataSource\": {\n", 732 | " \"S3DataSource\": {\n", 733 | " \"S3DataType\": \"AugmentedManifestFile\", # NB. 
Augmented Manifest\n", 734 | " \"S3Uri\": s3_train_data_path,\n", 735 | " \"S3DataDistributionType\": \"FullyReplicated\",\n", 736 | " # NB. This must correspond to the JSON field names in your augmented manifest.\n", 737 | " \"AttributeNames\": attribute_names\n", 738 | " }\n", 739 | " },\n", 740 | " \"ContentType\": \"application/x-recordio\",\n", 741 | " \"RecordWrapperType\": \"RecordIO\",\n", 742 | " \"CompressionType\": \"None\"\n", 743 | " },\n", 744 | " {\n", 745 | " \"ChannelName\": \"validation\",\n", 746 | " \"DataSource\": {\n", 747 | " \"S3DataSource\": {\n", 748 | " \"S3DataType\": \"AugmentedManifestFile\", # NB. Augmented Manifest\n", 749 | " \"S3Uri\": s3_validation_data_path,\n", 750 | " \"S3DataDistributionType\": \"FullyReplicated\",\n", 751 | " # NB. This must correspond to the JSON field names in your augmented manifest.\n", 752 | " \"AttributeNames\": attribute_names\n", 753 | " }\n", 754 | " },\n", 755 | " \"ContentType\": \"application/x-recordio\",\n", 756 | " \"RecordWrapperType\": \"RecordIO\",\n", 757 | " \"CompressionType\": \"None\"\n", 758 | " }\n", 759 | " ],\n", 760 | " \"ExperimentConfig\": {\n", 761 | " 'ExperimentName': experiment_name,\n", 762 | " 'TrialName': trial_name,\n", 763 | " 'TrialComponentDisplayName': 'Training'\n", 764 | " },\n", 765 | " \"DebugHookConfig\":{\n", 766 | " 'S3OutputPath': s3_debug_path,\n", 767 | " 'CollectionConfigurations': [\n", 768 | " {\n", 769 | " 'CollectionName': 'all_tensors',\n", 770 | " 'CollectionParameters': {\n", 771 | " 'include_regex': '.*',\n", 772 | " \"save_steps\":\"1, 2, 3\"\n", 773 | " }\n", 774 | " },\n", 775 | " ]\n", 776 | " },\n", 777 | " }\n", 778 | "\n", 779 | "print('Training job name: {}'.format(model_job_name))\n", 780 | "print('\\nInput Data Location: {}'.format(\n", 781 | " training_params['InputDataConfig'][0]['DataSource']['S3DataSource']))\n" 782 | ] 783 | }, 784 | { 785 | "cell_type": "code", 786 | "execution_count": null, 787 | "metadata": {}, 788 | "outputs": [], 
789 | "source": [ 790 | "client = boto3.client(service_name='sagemaker')\n", 791 | "client.create_training_job(**training_params)\n", 792 | "\n", 793 | "# Confirm that the training job has started\n", 794 | "status = client.describe_training_job(TrainingJobName=model_job_name)['TrainingJobStatus']\n", 795 | "print(f'Training job name: {model_job_name}')\n", 796 | "print('Training job current status: {}'.format(status))" 797 | ] 798 | }, 799 | { 800 | "cell_type": "markdown", 801 | "metadata": {}, 802 | "source": [ 803 | "Using the default p3.2xlarge as noted here, this job should take about an hour to train. While that's happening, circle back and step through the code again. Make sure you really understand how everything comes together.\n", 804 | "\n", 805 | "### Monitor Job Progress using Experiments\n", 806 | "If you are running on Studio, you should be able to open up the Experiments tab and see the status of your job." 807 | ] 808 | }, 809 | { 810 | "cell_type": "markdown", 811 | "metadata": {}, 812 | "source": [ 813 | "# Convert Images into RecordIO\n", 814 | "As is well documented, training deep learning models can take a long time. One way to speed this up is by using an optimized file format, such as RecordIO. Let's convert our PNGs into RecordIO for the next step.
" 815 | ] 816 | }, 817 | { 818 | "cell_type": "code", 819 | "execution_count": null, 820 | "metadata": {}, 821 | "outputs": [], 822 | "source": [ 823 | "!pip install mxnet\n", 824 | "!pip install opencv-python-headless" 825 | ] 826 | }, 827 | { 828 | "cell_type": "code", 829 | "execution_count": null, 830 | "metadata": {}, 831 | "outputs": [], 832 | "source": [ 833 | "# point the first argument to the location of your local png image folder\n", 834 | "# running the script with this command will create a lst file, listing all of your images for the train set\n", 835 | "!python im2rec.py --root \"/root/images\" --prefix \"train\" --exts '.png' --chunks 1 --create_list 'Yes'" 836 | ] 837 | }, 838 | { 839 | "cell_type": "code", 840 | "execution_count": null, 841 | "metadata": {}, 842 | "outputs": [], 843 | "source": [ 844 | "# running this file will create a train.idx and train.rec file\n", 845 | "!python im2rec.py --root \"/root/images\" --prefix '/root/amazon-sagemaker-architecting-for-ml-hcls/Starter Notebooks/Advanced Data Science - XRay Analysis/' --exts '.png' --chunks 1 --create_list 'no'" 846 | ] 847 | }, 848 | { 849 | "cell_type": "code", 850 | "execution_count": null, 851 | "metadata": {}, 852 | "outputs": [], 853 | "source": [ 854 | "!aws s3 cp train.idx s3://$BUCKET/$PREFIX/recio-files/\n", 855 | "!aws s3 cp train.rec s3://$BUCKET/$PREFIX/recio-files/" 856 | ] 857 | }, 858 | { 859 | "cell_type": "markdown", 860 | "metadata": {}, 861 | "source": [ 862 | "---\n", 863 | "# Bring your own Model and Train on SageMaker with Script Mode\n", 864 | "Once your job finishes, you are welcome to explore bringing your own script into SageMaker. Below we're demonstrating using GluonCV to bring a custom ssd model using the MXNet container. This is nice because it's coming with it's own pre-trained model! 
\n", 865 | "- https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker_neo_compilation_jobs/gluoncv_ssd_mobilenet/gluoncv_ssd_mobilenet_neo.ipynb\n", 866 | "\n", 867 | "Notice that by using script mode we automatically get access to debugger, which will give us the ability to visualize our neural network locally. Let's get it up and running!\n", 868 | "\n", 869 | "If you prefer, you are welcome to bring your own preferred SSD model instead.\n", 870 | "\n" 871 | ] 872 | }, 873 | { 874 | "cell_type": "code", 875 | "execution_count": null, 876 | "metadata": {}, 877 | "outputs": [], 878 | "source": [ 879 | "%%writefile src/requirements.txt\n", 880 | "\n", 881 | "gluoncv" 882 | ] 883 | }, 884 | { 885 | "cell_type": "code", 886 | "execution_count": null, 887 | "metadata": {}, 888 | "outputs": [], 889 | "source": [ 890 | "from sagemaker.mxnet import MXNet\n", 891 | "import sagemaker\n", 892 | "from sagemaker.debugger import DebuggerHookConfig, CollectionConfig\n", 893 | "\n", 894 | "role = sagemaker.get_execution_role()\n", 895 | "\n", 896 | "ssd_estimator = MXNet(entry_point='ssd_entry_point.py',\n", 897 | " source_dir = 'src',\n", 898 | " role=role,\n", 899 | " output_path=s3_output_path,\n", 900 | " instance_count=1,\n", 901 | " instance_type='ml.p3.8xlarge',\n", 902 | " framework_version='1.6',\n", 903 | " py_version='py3',\n", 904 | " use_spot_instances=True,\n", 905 | " max_wait = (8600*3),\n", 906 | " max_run = 8600,\n", 907 | " distribution={'parameter_server': {'enabled': True}},\n", 908 | " hyperparameters={'epochs': 1, 'data-shape': 350},\n", 909 | " debugger_hook_config = DebuggerHookConfig(\n", 910 | " s3_output_path = s3_debug_path,\n", 911 | " collection_configs = [CollectionConfig(name='all_tensors',\n", 912 | " parameters={'include_regex':'.*', 'save_steps':'1,2,3'})])) \n", 913 | "\n", 914 | "ssd_estimator.fit(inputs = {'train': 's3://{}/{}/recio-files'.format(BUCKET, PREFIX)}, \n", 915 | " experiment_config = {'ExperimentName': 
experiment_name,\n", 916 | "                                        'TrialName': 'xray-recordio-gluoncv', 'TrialComponentDisplayName': 'Training'})" 917 | ] 918 | }, 919 | { 920 | "cell_type": "markdown", 921 | "metadata": {}, 922 | "source": [ 923 | "You might discover an issue here - GluonCV is struggling to find the labels from our bounding boxes. Can you figure out how to supply them correctly? " 924 | ] 925 | }, 926 | { 927 | "cell_type": "markdown", 928 | "metadata": {}, 929 | "source": [ 930 | "---\n", 931 | "# Visualize Model with SageMaker Debugger\n", 932 | "Now, we're going to use SageMaker Debugger to build a TensorPlot of our model!" 933 | ] 934 | }, 935 | { 936 | "cell_type": "markdown", 937 | "metadata": {}, 938 | "source": [ 939 | "![](images/tensorplot.gif)" 940 | ] 941 | }, 942 | { 943 | "cell_type": "code", 944 | "execution_count": null, 945 | "metadata": {}, 946 | "outputs": [], 947 | "source": [ 948 | "!aws s3 sync {s3_debug_path} ." 949 | ] 950 | }, 951 | { 952 | "cell_type": "code", 953 | "execution_count": null, 954 | "metadata": {}, 955 | "outputs": [], 956 | "source": [ 957 | "import tensor_plot \n", 958 | "\n", 959 | "visualization = tensor_plot.TensorPlot(\n", 960 | "    regex=\".*relu_output\", \n", 961 | "    path=folder_name,\n", 962 | "    steps=10, \n", 963 | "    batch_sample_id=0,\n", 964 | "    color_channel = 1,\n", 965 | "    title=\"Relu outputs\",\n", 966 | "    label=\".*sequential0_input_0\",\n", 967 | "    prediction=\".*sequential0_output_0\"\n", 968 | ")" 969 | ] 970 | }, 971 | { 972 | "cell_type": "markdown", 973 | "metadata": {}, 974 | "source": [ 975 | "If we plot too many layers, it can crash the notebook. If you encounter performance or out-of-memory issues, try reducing the number of layers to plot by changing the regex, or run this notebook in JupyterLab instead of Jupyter.\n", 976 | "\n", 977 | "In the cell below we visualize the outputs of all layers, including the final classification. 
Please note that because the training job ran for only a few epochs, classification accuracy is not high." 978 | ] 979 | }, 980 | { 981 | "cell_type": "code", 982 | "execution_count": null, 983 | "metadata": {}, 984 | "outputs": [], 985 | "source": [ 986 | "visualization.fig.show(renderer=\"iframe\")" 987 | ] 988 | }, 989 | { 990 | "cell_type": "markdown", 991 | "metadata": {}, 992 | "source": [ 993 | "---\n", 994 | "# Extensions\n", 995 | "If you make it here with spare time, why not try to bring another model into SageMaker? Or set up the automatic model tuner on your own script file? Or optimize your model for deployment using SageMaker Neo? \n", 996 | "\n", 997 | "You can also deploy some of these models and start to get predictions from them using `model.deploy()`.\n", 998 | "\n", 999 | "Feel free to use the rest of your time to build something awesome. " 1000 | ] 1001 | }, 1002 | { 1003 | "cell_type": "code", 1004 | "execution_count": null, 1005 | "metadata": {}, 1006 | "outputs": [], 1007 | "source": [] 1008 | } 1009 | ], 1010 | "metadata": { 1011 | "instance_type": "ml.t3.medium", 1012 | "kernelspec": { 1013 | "display_name": "Python 3 (Data Science)", 1014 | "language": "python", 1015 | "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0" 1016 | }, 1017 | "language_info": { 1018 | "codemirror_mode": { 1019 | "name": "ipython", 1020 | "version": 3 1021 | }, 1022 | "file_extension": ".py", 1023 | "mimetype": "text/x-python", 1024 | "name": "python", 1025 | "nbconvert_exporter": "python", 1026 | "pygments_lexer": "ipython3", 1027 | "version": "3.7.6" 1028 | } 1029 | }, 1030 | "nbformat": 4, 1031 | "nbformat_minor": 4 1032 | } 1033 | -------------------------------------------------------------------------------- /Starter Notebooks/Advanced Data Science - XRay Analysis/ground_truth_od.py: -------------------------------------------------------------------------------- 1 | '''Define classes and functions for 
interfacing with SageMaker Ground 2 | Truth object detection. 3 | 4 | ''' 5 | 6 | import os 7 | import imageio 8 | import matplotlib.pyplot as plt 9 | import numpy as np 10 | 11 | 12 | class BoundingBox: 13 | '''Bounding box for an object in an image.''' 14 | 15 | def __init__(self, image_id=None, boxdata=None): 16 | self.image_id = image_id 17 | if boxdata: 18 | for datum in boxdata: 19 | setattr(self, datum, boxdata[datum]) 20 | 21 | def __repr__(self): 22 | return 'Box for image {}'.format(self.image_id) 23 | 24 | def compute_bb_data(self): 25 | '''Compute the parameters used for IoU.''' 26 | image = self.image 27 | self.xmin = self.left/image.width 28 | self.xmax = (self.left + self.width)/image.width 29 | self.ymin = self.top/image.height 30 | self.ymax = (self.top + self.height)/image.height 31 | 32 | 33 | class WorkerBoundingBox(BoundingBox): 34 | '''Bounding box for an object in an image produced by a worker.''' 35 | 36 | def __init__(self, image_id=None, worker_id=None, boxdata=None): 37 | self.worker_id = worker_id 38 | super().__init__(image_id=image_id, boxdata=boxdata) 39 | 40 | 41 | class GroundTruthBox(BoundingBox): 42 | '''Ground truth bounding box for an object in an image, from the Open Images Dataset.''' 43 | 44 | def __init__(self, image_id=None, oiddata=None, image=None): 45 | self.image = image 46 | self.class_name = oiddata[0] 47 | xmin, xmax, ymin, ymax = [float(datum) for datum in oiddata[1:]] 48 | self.xmin = xmin 49 | self.ymin = ymin 50 | self.xmax = xmax 51 | self.ymax = ymax 52 | imw = image.width 53 | imh = image.height 54 | boxdata = {'height': (ymax-ymin)*imh, 55 | 'width': (xmax-xmin)*imw, 56 | 'left': xmin*imw, 57 | 'top': ymin*imh} 58 | super().__init__(image_id=image_id, boxdata=boxdata) 59 | 60 | 61 | class BoxedImage: 62 | '''Image with bounding boxes.''' 63 | 64 | def __init__(self, id=None, consolidated_boxes=None, 65 | worker_boxes=None, gt_boxes=None, uri=None, 66 | size=None): 67 | self.id = id 68 | self.uri = uri 69 | if uri: 70 | 
self.filename = uri.split('/')[-1] 71 | self.oid_id = self.filename.split('.')[0] 72 | else: 73 | self.filename = None 74 | self.oid_id = None 75 | self.local = None 76 | self.im = None 77 | if size: 78 | self.width = size['width'] 79 | self.depth = size['depth'] 80 | self.height = size['height'] 81 | self.shape = self.width, self.height, self.depth 82 | if consolidated_boxes: 83 | self.consolidated_boxes = consolidated_boxes 84 | else: 85 | self.consolidated_boxes = [] 86 | if worker_boxes: 87 | self.worker_boxes = worker_boxes 88 | else: 89 | self.worker_boxes = [] 90 | if gt_boxes: 91 | self.gt_boxes = gt_boxes 92 | else: 93 | self.gt_boxes = [] 94 | 95 | def __repr__(self): 96 | return 'Image{}'.format(self.id) 97 | 98 | def n_consolidated_boxes(self): 99 | '''Count the number of consolidated boxes.''' 100 | return len(self.consolidated_boxes) 101 | 102 | def n_worker_boxes(self): 103 | return len(self.worker_boxes) 104 | 105 | def download(self, directory): 106 | target_fname = os.path.join( 107 | directory, self.uri.split('/')[-1]) 108 | if not os.path.isfile(target_fname): 109 | os.system(f'aws s3 cp {self.uri} {target_fname}') 110 | self.local = target_fname 111 | 112 | def imread(self): 113 | '''Cache the image reading process.''' 114 | try: 115 | return imageio.imread(self.local) 116 | except OSError: 117 | print("You need to download this image first. " 118 | "Use this_image.download(local_directory).") 119 | raise 120 | 121 | def plot_bbs(self, ax, bbs, img_kwargs, box_kwargs, **kwargs): 122 | '''Master function for plotting images with bounding boxes.''' 123 | img = self.imread() 124 | ax.imshow(img, **img_kwargs) 125 | imh, imw, *_ = img.shape 126 | box_kwargs['fill'] = None 127 | if kwargs.get('worker', False): 128 | # Give each worker a color. 
129 | worker_colors = {} 130 | worker_count = 0 131 | for bb in bbs: 132 | worker = bb.worker_id 133 | if worker not in worker_colors: 134 | worker_colors[worker] = 'C' + str((9-worker_count) % 10) 135 | worker_count += 1 136 | rec = plt.Rectangle((bb.left, bb.top), bb.width, bb.height, 137 | edgecolor=worker_colors[worker], 138 | **box_kwargs) 139 | ax.add_patch(rec) 140 | else: 141 | for bb in bbs: 142 | rec = plt.Rectangle( 143 | (bb.left, bb.top), bb.width, bb.height, **box_kwargs) 144 | ax.add_patch(rec) 145 | ax.axis('off') 146 | 147 | def plot_consolidated_bbs(self, ax, img_kwargs={}, 148 | box_kwargs={'edgecolor': 'blue', 149 | 'lw': 3}): 150 | '''Plot the consolidated boxes.''' 151 | self.plot_bbs(ax, self.consolidated_boxes, 152 | img_kwargs=img_kwargs, box_kwargs=box_kwargs) 153 | 154 | def plot_worker_bbs(self, ax, img_kwargs={}, box_kwargs={'lw': 2}): 155 | '''Plot the individual worker boxes.''' 156 | self.plot_bbs(ax, self.worker_boxes, worker=True, 157 | img_kwargs=img_kwargs, box_kwargs=box_kwargs) 158 | 159 | def plot_gt_bbs(self, ax, img_kwargs={}, 160 | box_kwargs={'edgecolor': 'lime', 161 | 'lw': 3}): 162 | '''Plot the ground truth (Open Image Dataset) boxes.''' 163 | self.plot_bbs(ax, self.gt_boxes, 164 | img_kwargs=img_kwargs, box_kwargs=box_kwargs) 165 | 166 | def compute_img_confidence(self): 167 | ''' Compute the mean bb confidence. ''' 168 | if len(self.consolidated_boxes) > 0: 169 | return np.mean([box.confidence for box in self.consolidated_boxes]) 170 | else: 171 | return 0 172 | 173 | def compute_iou_bb(self): 174 | '''Compute the mean intersection over union for a collection of 175 | bounding boxes. 176 | ''' 177 | 178 | # Precompute data for the consolidated boxes if necessary. 179 | for box in self.consolidated_boxes: 180 | try: 181 | box.xmin 182 | except AttributeError: 183 | box.compute_bb_data() 184 | 185 | # Make the numpy arrays. 
186 | if self.gt_boxes: 187 | gts = np.vstack([(box.xmin, box.ymin, box.xmax, box.ymax) 188 | for box in self.gt_boxes]) 189 | else: 190 | gts = [] 191 | if self.consolidated_boxes: 192 | preds = np.vstack([(box.xmin, box.ymin, box.xmax, box.ymax) 193 | for box in self.consolidated_boxes]) 194 | else: 195 | preds = [] 196 | confs = np.array([box.confidence for box in self.consolidated_boxes]) 197 | 198 | if len(preds) == 0 and len(gts) == 0: 199 | return 1. 200 | if len(preds) == 0 or len(gts) == 0: 201 | return 0. 202 | preds = preds[np.argsort(confs.flatten())][::-1] 203 | 204 | is_pred_assigned_to_gt = [False] * len(gts) 205 | pred_areas = (preds[:, 2] - preds[:, 0]) * \ 206 | (preds[:, 3] - preds[:, 1]) 207 | gt_areas = (gts[:, 2] - gts[:, 0]) * (gts[:, 3] - gts[:, 1]) 208 | all_ious = [] 209 | for pred_id, pred in enumerate(preds): 210 | best_iou = 0 211 | best_id = -1 212 | for gt_id, gt in enumerate(gts): 213 | if is_pred_assigned_to_gt[gt_id]: 214 | continue 215 | x1 = max(gt[0], pred[0]) 216 | y1 = max(gt[1], pred[1]) 217 | x2 = min(gt[2], pred[2]) 218 | y2 = min(gt[3], pred[3]) 219 | iw = max(0, x2 - x1) 220 | ih = max(0, y2 - y1) 221 | inter = iw * ih 222 | iou = inter / \ 223 | (pred_areas[pred_id] + gt_areas[gt_id] - inter) 224 | if iou > best_iou: 225 | best_iou = iou 226 | best_id = gt_id 227 | if best_id != -1: 228 | is_pred_assigned_to_gt[best_id] = True 229 | # True positive! Store the IoU. 230 | all_ious.append(best_iou) 231 | else: 232 | # 0 IoU for each unmatched gt (false-negative). 233 | all_ious.append(0.) 234 | 235 | # 0 IoU for each unmatched prediction (false-positive). 236 | all_ious.extend([0.] * (len(is_pred_assigned_to_gt) - 237 | sum(is_pred_assigned_to_gt))) 238 | 239 | return np.mean(all_ious) 240 | 241 | 242 | def group_miou(imgs): 243 | '''Compute the mIoU for a group of images. 244 | 245 | Args: 246 | imgs: list of BoxedImages, with consolidated_boxes and gt_boxes. 
247 | 248 | Returns: 249 | mIoU calculated over the bounding boxes in the group. 250 | ''' 251 | # Create a notional BoxedImage with bounding boxes from imgs. 252 | all_consolidated_boxes = [box for img in imgs 253 | for box in img.consolidated_boxes] 254 | all_gt_boxes = [box for img in imgs 255 | for box in img.gt_boxes] 256 | notional_image = BoxedImage(consolidated_boxes=all_consolidated_boxes, 257 | gt_boxes=all_gt_boxes) 258 | 259 | # Compute and return the mIoU. 260 | return notional_image.compute_iou_bb() 261 | -------------------------------------------------------------------------------- /Starter Notebooks/Advanced Data Science - XRay Analysis/im2rec.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | # -*- coding: utf-8 -*- 3 | # Licensed to the Apache Software Foundation (ASF) under one 4 | # or more contributor license agreements. See the NOTICE file 5 | # distributed with this work for additional information 6 | # regarding copyright ownership. The ASF licenses this file 7 | # to you under the Apache License, Version 2.0 (the 8 | # "License"); you may not use this file except in compliance 9 | # with the License. You may obtain a copy of the License at 10 | # 11 | # http://www.apache.org/licenses/LICENSE-2.0 12 | # 13 | # Unless required by applicable law or agreed to in writing, 14 | # software distributed under the License is distributed on an 15 | # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY 16 | # KIND, either express or implied. See the License for the 17 | # specific language governing permissions and limitations 18 | # under the License. 
19 | 20 | from __future__ import print_function 21 | import os 22 | import sys 23 | 24 | curr_path = os.path.abspath(os.path.dirname(__file__)) 25 | sys.path.append(os.path.join(curr_path, "../python")) 26 | import mxnet as mx 27 | import random 28 | import argparse 29 | import cv2 30 | import time 31 | import traceback 32 | 33 | try: 34 | import multiprocessing 35 | except ImportError: 36 | multiprocessing = None 37 | 38 | def list_image(root, recursive, exts): 39 | i = 0 40 | if recursive: 41 | cat = {} 42 | for path, dirs, files in os.walk(root, followlinks=True): 43 | dirs.sort() 44 | files.sort() 45 | for fname in files: 46 | fpath = os.path.join(path, fname) 47 | suffix = os.path.splitext(fname)[1].lower() 48 | if os.path.isfile(fpath) and (suffix in exts): 49 | if path not in cat: 50 | cat[path] = len(cat) 51 | yield (i, os.path.relpath(fpath, root), cat[path]) 52 | i += 1 53 | for k, v in sorted(cat.items(), key=lambda x: x[1]): 54 | print(os.path.relpath(k, root), v) 55 | else: 56 | for fname in sorted(os.listdir(root)): 57 | fpath = os.path.join(root, fname) 58 | suffix = os.path.splitext(fname)[1].lower() 59 | if os.path.isfile(fpath) and (suffix in exts): 60 | yield (i, os.path.relpath(fpath, root), 0) 61 | i += 1 62 | 63 | def write_list(path_out, image_list): 64 | with open(path_out, 'w') as fout: 65 | for i, item in enumerate(image_list): 66 | line = '%d\t' % item[0] 67 | for j in item[2:]: 68 | line += '%f\t' % j 69 | line += '%s\n' % item[1] 70 | fout.write(line) 71 | 72 | def make_list(args): 73 | image_list = list_image(args.root, args.recursive, args.exts) 74 | image_list = list(image_list) 75 | if args.shuffle is True: 76 | random.seed(100) 77 | random.shuffle(image_list) 78 | N = len(image_list) 79 | chunk_size = (N + args.chunks - 1) // args.chunks 80 | for i in range(args.chunks): 81 | chunk = image_list[i * chunk_size:(i + 1) * chunk_size] 82 | if args.chunks > 1: 83 | str_chunk = '_%d' % i 84 | else: 85 | str_chunk = '' 86 | sep = 
int(chunk_size * args.train_ratio) 87 | sep_test = int(chunk_size * args.test_ratio) 88 | if args.train_ratio == 1.0: 89 | write_list(args.prefix + str_chunk + '.lst', chunk) 90 | else: 91 | if args.test_ratio: 92 | write_list(args.prefix + str_chunk + '_test.lst', chunk[:sep_test]) 93 | if args.train_ratio + args.test_ratio < 1.0: 94 | write_list(args.prefix + str_chunk + '_val.lst', chunk[sep_test + sep:]) 95 | write_list(args.prefix + str_chunk + '_train.lst', chunk[sep_test:sep_test + sep]) 96 | 97 | def read_list(path_in): 98 | with open(path_in) as fin: 99 | while True: 100 | line = fin.readline() 101 | if not line: 102 | break 103 | line = [i.strip() for i in line.strip().split('\t')] 104 | line_len = len(line) 105 | if line_len < 3: 106 | print('lst should at least has three parts, but only has %s parts for %s' %(line_len, line)) 107 | continue 108 | try: 109 | item = [int(line[0])] + [line[-1]] + [float(i) for i in line[1:-1]] 110 | except Exception as e: 111 | print('Parsing lst met error for %s, detail: %s' %(line, e)) 112 | continue 113 | yield item 114 | 115 | def image_encode(args, i, item, q_out): 116 | fullpath = os.path.join(args.root, item[1]) 117 | 118 | if len(item) > 3 and args.pack_label: 119 | header = mx.recordio.IRHeader(0, item[2:], item[0], 0) 120 | else: 121 | header = mx.recordio.IRHeader(0, item[2], item[0], 0) 122 | 123 | if args.pass_through: 124 | try: 125 | with open(fullpath, 'rb') as fin: 126 | img = fin.read() 127 | s = mx.recordio.pack(header, img) 128 | q_out.put((i, s, item)) 129 | except Exception as e: 130 | traceback.print_exc() 131 | print('pack_img error:', item[1], e) 132 | q_out.put((i, None, item)) 133 | return 134 | 135 | try: 136 | img = cv2.imread(fullpath, args.color) 137 | except: 138 | traceback.print_exc() 139 | print('imread error trying to load file: %s ' % fullpath) 140 | q_out.put((i, None, item)) 141 | return 142 | if img is None: 143 | print('imread read blank (None) image for file: %s' % fullpath) 144 | 
q_out.put((i, None, item)) 145 | return 146 | if args.center_crop: 147 | if img.shape[0] > img.shape[1]: 148 | margin = (img.shape[0] - img.shape[1]) // 2; 149 | img = img[margin:margin + img.shape[1], :] 150 | else: 151 | margin = (img.shape[1] - img.shape[0]) // 2; 152 | img = img[:, margin:margin + img.shape[0]] 153 | if args.resize: 154 | if img.shape[0] > img.shape[1]: 155 | newsize = (args.resize, img.shape[0] * args.resize // img.shape[1]) 156 | else: 157 | newsize = (img.shape[1] * args.resize // img.shape[0], args.resize) 158 | img = cv2.resize(img, newsize) 159 | 160 | try: 161 | s = mx.recordio.pack_img(header, img, quality=args.quality, img_fmt=args.encoding) 162 | q_out.put((i, s, item)) 163 | except Exception as e: 164 | traceback.print_exc() 165 | print('pack_img error on file: %s' % fullpath, e) 166 | q_out.put((i, None, item)) 167 | return 168 | 169 | def read_worker(args, q_in, q_out): 170 | while True: 171 | deq = q_in.get() 172 | if deq is None: 173 | break 174 | i, item = deq 175 | image_encode(args, i, item, q_out) 176 | 177 | def write_worker(q_out, fname, working_dir): 178 | pre_time = time.time() 179 | count = 0 180 | fname = os.path.basename(fname) 181 | fname_rec = os.path.splitext(fname)[0] + '.rec' 182 | fname_idx = os.path.splitext(fname)[0] + '.idx' 183 | record = mx.recordio.MXIndexedRecordIO(os.path.join(working_dir, fname_idx), 184 | os.path.join(working_dir, fname_rec), 'w') 185 | buf = {} 186 | more = True 187 | while more: 188 | deq = q_out.get() 189 | if deq is not None: 190 | i, s, item = deq 191 | buf[i] = (s, item) 192 | else: 193 | more = False 194 | while count in buf: 195 | s, item = buf[count] 196 | del buf[count] 197 | if s is not None: 198 | record.write_idx(item[0], s) 199 | 200 | if count % 1000 == 0: 201 | cur_time = time.time() 202 | print('time:', cur_time - pre_time, ' count:', count) 203 | pre_time = cur_time 204 | count += 1 205 | 206 | def parse_args(): 207 | parser = argparse.ArgumentParser( 208 | 
formatter_class=argparse.ArgumentDefaultsHelpFormatter, 209 | description='Create an image list or \ 210 | make a record database by reading from an image list') 211 | parser.add_argument('--prefix', help='prefix of input/output lst and rec files.') 212 | parser.add_argument('--root', help='path to folder containing images.') 213 | 214 | cgroup = parser.add_argument_group('Options for creating image lists') 215 | cgroup.add_argument('--l', action='store_true', default=True, 216 | help='If this is set im2rec will create image list(s) by traversing root folder\ 217 | and output to .lst.\ 218 | Otherwise im2rec will read .lst and create a database at .rec') 219 | cgroup.add_argument('--exts', nargs='+', default=['.jpeg', '.jpg', '.png'], 220 | help='list of acceptable image extensions.') 221 | cgroup.add_argument('--chunks', type=int, default=1, help='number of chunks.') 222 | cgroup.add_argument('--train-ratio', type=float, default=1.0, 223 | help='Ratio of images to use for training.') 224 | cgroup.add_argument('--test-ratio', type=float, default=0, 225 | help='Ratio of images to use for testing.') 226 | cgroup.add_argument('--recursive', action='store_true', 227 | help='If true recursively walk through subdirs and assign a unique label\ 228 | to images in each folder. 
Otherwise only include images in the root folder\ 229 | and give them label 0.') 230 | cgroup.add_argument('--no-shuffle', dest='shuffle', action='store_false', 231 | help='If this is passed, \ 232 | im2rec will not randomize the image order in .lst') 233 | rgroup = parser.add_argument_group('Options for creating database') 234 | rgroup.add_argument('--pass-through', action='store_true', 235 | help='whether to skip transformation and save image as is') 236 | rgroup.add_argument('--resize', type=int, default=0, 237 | help='resize the shorter edge of image to the newsize, original images will\ 238 | be packed by default.') 239 | rgroup.add_argument('--center-crop', action='store_true', 240 | help='specify whether to crop the center image to make it rectangular.') 241 | rgroup.add_argument('--quality', type=int, default=95, 242 | help='JPEG quality for encoding, 1-100; or PNG compression for encoding, 1-9') 243 | rgroup.add_argument('--num-thread', type=int, default=1, 244 | help='number of thread to use for encoding. order of images will be different\ 245 | from the input list if >1. the input list will be modified to match the\ 246 | resulting order.') 247 | rgroup.add_argument('--color', type=int, default=1, choices=[-1, 0, 1], 248 | help='specify the color mode of the loaded image.\ 249 | 1: Loads a color image. Any transparency of image will be neglected. 
It is the default flag.\ 250 | 0: Loads image in grayscale mode.\ 251 | -1:Loads image as such including alpha channel.') 252 | rgroup.add_argument('--encoding', type=str, default='.jpg', choices=['.jpg', '.png'], 253 | help='specify the encoding of the images.') 254 | rgroup.add_argument('--pack-label', action='store_true', 255 | help='Whether to also pack multi dimensional label in the record file') 256 | 257 | parser.add_argument('--create_list', type=str, default = 'no') 258 | args = parser.parse_args() 259 | 260 | return args 261 | 262 | if __name__ == '__main__': 263 | args = parse_args() 264 | 265 | print ('made it through arg parse') 266 | 267 | print ('looking inside of prefix: {}'.format(args.prefix)) 268 | 269 | if args.create_list == 'Yes': 270 | 271 | print ('tripped list creation') 272 | make_list(args) 273 | else: 274 | 275 | 276 | if os.path.isdir(args.prefix): 277 | working_dir = args.prefix 278 | else: 279 | working_dir = os.path.dirname(args.prefix) 280 | 281 | files = [os.path.join(working_dir, fname) for fname in os.listdir(working_dir) 282 | if os.path.isfile(os.path.join(working_dir, fname))] 283 | count = 0 284 | for fname in files: 285 | if fname.startswith(args.prefix) and fname.endswith('.lst'): 286 | print('Creating .rec file from', fname, 'in', working_dir) 287 | count += 1 288 | image_list = read_list(fname) 289 | # -- write_record -- # 290 | if args.num_thread > 1 and multiprocessing is not None: 291 | q_in = [multiprocessing.Queue(1024) for i in range(args.num_thread)] 292 | q_out = multiprocessing.Queue(1024) 293 | read_process = [multiprocessing.Process(target=read_worker, args=(args, q_in[i], q_out)) \ 294 | for i in range(args.num_thread)] 295 | for p in read_process: 296 | p.start() 297 | write_process = multiprocessing.Process(target=write_worker, args=(q_out, fname, working_dir)) 298 | write_process.start() 299 | 300 | for i, item in enumerate(image_list): 301 | q_in[i % len(q_in)].put((i, item)) 302 | for q in q_in: 303 | 
q.put(None) 304 | for p in read_process: 305 | p.join() 306 | 307 | q_out.put(None) 308 | write_process.join() 309 | else: 310 | print('multiprocessing not available, falling back to single-threaded encoding') 311 | try: 312 | import Queue as queue 313 | except ImportError: 314 | import queue 315 | q_out = queue.Queue() 316 | fname = os.path.basename(fname) 317 | fname_rec = os.path.splitext(fname)[0] + '.rec' 318 | fname_idx = os.path.splitext(fname)[0] + '.idx' 319 | record = mx.recordio.MXIndexedRecordIO(os.path.join(working_dir, fname_idx), 320 | os.path.join(working_dir, fname_rec), 'w') 321 | cnt = 0 322 | pre_time = time.time() 323 | for i, item in enumerate(image_list): 324 | image_encode(args, i, item, q_out) 325 | if q_out.empty(): 326 | continue 327 | _, s, _ = q_out.get() 328 | record.write_idx(item[0], s) 329 | if cnt % 1000 == 0: 330 | cur_time = time.time() 331 | print('time:', cur_time - pre_time, ' count:', cnt) 332 | pre_time = cur_time 333 | cnt += 1 334 | if not count: 335 | print('Did not find any .lst file with prefix %s'%args.prefix) 336 | -------------------------------------------------------------------------------- /Starter Notebooks/Advanced Data Science - XRay Analysis/images/gt_label_output.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-architecting-for-ml-hcls/9236be67abb200b6073b2b17079c9c368326c353/Starter Notebooks/Advanced Data Science - XRay Analysis/images/gt_label_output.png -------------------------------------------------------------------------------- /Starter Notebooks/Advanced Data Science - XRay Analysis/images/tensorplot.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-architecting-for-ml-hcls/9236be67abb200b6073b2b17079c9c368326c353/Starter Notebooks/Advanced Data Science - XRay Analysis/images/tensorplot.gif 
-------------------------------------------------------------------------------- /Starter Notebooks/Advanced Data Science - XRay Analysis/src/requirements.txt: -------------------------------------------------------------------------------- 1 | 2 | gluoncv 3 | -------------------------------------------------------------------------------- /Starter Notebooks/Advanced Data Science - XRay Analysis/src/ssd_entry_point.py: -------------------------------------------------------------------------------- 1 | import io 2 | import PIL.Image 3 | import json 4 | import logging 5 | import numpy as np 6 | 7 | logger = logging.getLogger(__name__) 8 | logger.setLevel(logging.DEBUG) 9 | 10 | import glob 11 | import time 12 | import argparse 13 | import warnings 14 | import mxnet as mx 15 | from mxnet import nd 16 | from mxnet import gluon 17 | from mxnet import autograd 18 | 19 | from gluoncv import data as gdata 20 | from gluoncv.data.batchify import Tuple, Stack, Pad 21 | from gluoncv.data.transforms.presets.ssd import SSDDefaultTrainTransform 22 | from gluoncv import model_zoo 23 | import os 24 | import gluoncv as gcv 25 | 26 | def model_fn(model_dir): 27 | """ 28 | Load the gluon model. Called once when hosting service starts. 29 | :param: model_dir The directory where model files are stored. 
30 | :return: a model (in this case a Gluon network) 31 | """ 32 | net = gluon.SymbolBlock.imports('%s/model-symbol.json' % model_dir, 33 | ['data'], 34 | '%s/model-0000.params' % model_dir) 35 | 36 | return net 37 | 38 | def parse_args(): 39 | parser = argparse.ArgumentParser(description='Train SSD networks.') 40 | parser.add_argument('--network', type=str, default='ssd_512_mobilenet1.0_voc', 41 | help="Network name") 42 | parser.add_argument('--data-shape', type=int, default=512, 43 | help="Input data shape, use 300, 512.") 44 | parser.add_argument('--batch-size', type=int, default=32, 45 | help='Training mini-batch size') 46 | parser.add_argument('--num-workers', '-j', dest='num_workers', type=int, 47 | default=4, help='Number of data workers, you can use a larger ' 48 | 'number to accelerate data loading, if your CPU and GPUs are powerful.') 49 | parser.add_argument('--gpus', type=str, default='0', 50 | help='Training with GPUs, you can specify 1,3 for example.') 51 | parser.add_argument('--epochs', type=int, default=240, 52 | help='Training epochs.') 53 | parser.add_argument('--start-epoch', type=int, default=0, 54 | help='Starting epoch for resuming, default is 0 for new training.' 55 | 'You can set it to 100, for example, to start from epoch 100.') 56 | parser.add_argument('--log-interval', type=int, default=100, 57 | help='Logging mini-batch interval. Default is 100.') 58 | parser.add_argument('--lr', type=float, default=0.001, 59 | help='Learning rate, default is 0.001') 60 | parser.add_argument('--lr-decay', type=float, default=0.1, 61 | help='decay rate of learning rate. default is 0.1.') 62 | parser.add_argument('--lr-decay-epoch', type=str, default='160,200', 63 | help='epochs at which learning rate decays. 
default is 160,200.') 64 | parser.add_argument('--momentum', type=float, default=0.9, 65 | help='SGD momentum, default is 0.9') 66 | parser.add_argument('--wd', type=float, default=0.0005, 67 | help='Weight decay, default is 5e-4') 68 | 69 | return parser.parse_args() 70 | 71 | 72 | def get_dataloader(net, data_shape, batch_size, num_workers, ctx): 73 | """Get dataloader.""" 74 | 75 | width, height = data_shape, data_shape 76 | # use fake data to generate fixed anchors for target generation 77 | with autograd.train_mode(): 78 | _, _, anchors = net(mx.nd.zeros((1, 3, height, width), ctx)) 79 | anchors = anchors.as_in_context(mx.cpu()) 80 | batchify_fn = Tuple(Stack(), Stack(), Stack()) # stack image, cls_targets, box_targets 81 | 82 | # can I point that to a bundle of png files instead? 83 | train_dataset = gdata.RecordFileDetection(os.path.join(os.environ['SM_CHANNEL_TRAIN'], 'train.rec')) 84 | 85 | # this is the folder with all the training images 86 | train_folder = os.environ['SM_CHANNEL_TRAIN'] 87 | 88 | train_loader = gluon.data.DataLoader( 89 | train_dataset.transform(SSDDefaultTrainTransform(width, height, anchors)), 90 | batch_size, True, batchify_fn=batchify_fn, last_batch='rollover', num_workers=num_workers) 91 | return train_loader 92 | 93 | def train(net, train_data, ctx, args): 94 | """Training pipeline""" 95 | 96 | net.collect_params().reset_ctx(ctx) 97 | 98 | trainer = gluon.Trainer( 99 | net.collect_params(), 'sgd', 100 | {'learning_rate': args.lr, 'wd': args.wd, 'momentum': args.momentum}, update_on_kvstore=None) 101 | 102 | # lr decay policy 103 | lr_decay = float(args.lr_decay) 104 | lr_steps = sorted([float(ls) for ls in args.lr_decay_epoch.split(',') if ls.strip()]) 105 | 106 | mbox_loss = gcv.loss.SSDMultiBoxLoss() 107 | ce_metric = mx.metric.Loss('CrossEntropy') 108 | smoothl1_metric = mx.metric.Loss('SmoothL1') 109 | 110 | # set up logger 111 | logging.basicConfig() 112 | logger = logging.getLogger() 113 | logger.setLevel(logging.INFO) 114 | 
logger.info(args) 115 | logger.info('Start training from [Epoch {}]'.format(args.start_epoch)) 116 | best_map = [0] 117 | 118 | for epoch in range(args.start_epoch, args.epochs): 119 | while lr_steps and epoch >= lr_steps[0]: 120 | new_lr = trainer.learning_rate * lr_decay 121 | lr_steps.pop(0) 122 | trainer.set_learning_rate(new_lr) 123 | logger.info("[Epoch {}] Set learning rate to {}".format(epoch, new_lr)) 124 | ce_metric.reset() 125 | smoothl1_metric.reset() 126 | tic = time.time() 127 | btic = time.time() 128 | net.hybridize(static_alloc=True, static_shape=True) 129 | 130 | for i, batch in enumerate(train_data): 131 | data = gluon.utils.split_and_load(batch[0], ctx_list=ctx, batch_axis=0) 132 | cls_targets = gluon.utils.split_and_load(batch[1], ctx_list=ctx, batch_axis=0) 133 | box_targets = gluon.utils.split_and_load(batch[2], ctx_list=ctx, batch_axis=0) 134 | 135 | with autograd.record(): 136 | cls_preds = [] 137 | box_preds = [] 138 | for x in data: 139 | cls_pred, box_pred, _ = net(x) 140 | cls_preds.append(cls_pred) 141 | box_preds.append(box_pred) 142 | sum_loss, cls_loss, box_loss = mbox_loss( 143 | cls_preds, box_preds, cls_targets, box_targets) 144 | autograd.backward(sum_loss) 145 | # since we have already normalized the loss, we don't want to normalize 146 | # by batch-size anymore 147 | trainer.step(1) 148 | 149 | local_batch_size = int(args.batch_size) 150 | ce_metric.update(0, [l * local_batch_size for l in cls_loss]) 151 | smoothl1_metric.update(0, [l * local_batch_size for l in box_loss]) 152 | if args.log_interval and not (i + 1) % args.log_interval: 153 | name1, loss1 = ce_metric.get() 154 | name2, loss2 = smoothl1_metric.get() 155 | logger.info('[Epoch {}][Batch {}], Speed: {:.3f} samples/sec, {}={:.3f}, {}={:.3f}'.format( 156 | epoch, i, args.batch_size/(time.time()-btic), name1, loss1, name2, loss2)) 157 | btic = time.time() 158 | 159 | name1, loss1 = ce_metric.get() 160 | name2, loss2 = smoothl1_metric.get() 161 | logger.info('[Epoch {}] 
Training cost: {:.3f}, {}={:.3f}, {}={:.3f}'.format( 162 | epoch, (time.time()-tic), name1, loss1, name2, loss2)) 163 | current_map = 0. 164 | 165 | # save model 166 | net.set_nms(nms_thresh=0.45, nms_topk=400, post_nms=100) 167 | net(mx.nd.ones((1,3,512,512), ctx=ctx[0])) 168 | net.export('%s/model' % os.environ['SM_MODEL_DIR']) 169 | return net 170 | 171 | if __name__ == '__main__': 172 | 173 | args = parse_args() 174 | 175 | ctx = [mx.gpu(int(i)) for i in args.gpus.split(',') if i.strip()] 176 | ctx = ctx if ctx else [mx.cpu()] 177 | 178 | net = model_zoo.get_model(args.network, pretrained=False, ctx=ctx) 179 | net.initialize(ctx=ctx)  # initialize on the selected context(s) so CPU-only runs do not fail 180 | 181 | train_loader = get_dataloader(net, args.data_shape, args.batch_size, args.num_workers, ctx[0]) 182 | 183 | train(net, train_loader, ctx, args) -------------------------------------------------------------------------------- /Starter Notebooks/Advanced Data Science - XRay Analysis/tensor_plot.py: -------------------------------------------------------------------------------- 1 | # Third Party 2 | import numpy as np 3 | import plotly.graph_objects as go 4 | import plotly.offline as py 5 | 6 | # First Party 7 | from smdebug.trials import create_trial 8 | 9 | py.init_notebook_mode(connected=True) 10 | 11 | # This class provides methods to plot tensors as 3 dimensional objects. It is intended for plotting convolutional 12 | # neural networks and expects that inputs are images and that outputs are class labels or images. 
13 | class TensorPlot: 14 | def __init__( 15 | self, 16 | regex, 17 | path, 18 | steps=10, 19 | batch_sample_id=None, 20 | color_channel=1, 21 | title="", 22 | label=None, 23 | prediction=None, 24 | ): 25 | """ 26 | 27 | :param regex: regex that selects the tensors to plot 28 | :param path: path to the smdebug trial (local directory or S3 URI) 29 | :param steps: number of steps to plot 30 | :param batch_sample_id: None, -1, or the index of the batch sample to plot 31 | :param color_channel: 1 if the color channel is the first axis (NCHW), 3 if it is the last (NHWC) 32 | :param title: figure title 33 | :param label: regex of the model input tensors 34 | :param prediction: regex of the model output tensors 35 | """ 36 | self.trial = create_trial(path) 37 | self.regex = regex 38 | self.steps = steps 39 | self.batch_sample_id = batch_sample_id 40 | self.color_channel = color_channel 41 | self.title = title 42 | self.label = label 43 | self.prediction = prediction 44 | self.max_dim = 0 45 | self.dist = 0 46 | self.tensors = {} 47 | self.output = {} 48 | self.input = {} 49 | self.load_tensors() 50 | self.set_figure() 51 | self.plot_network() 52 | self.set_frames() 53 | 54 | # Loads all tensors into a dict where the key is the step. 55 | # If batch_sample_id is None then the batch dimension is plotted as a separate dimension; 56 | # if batch_sample_id is -1 then tensors are averaged over the batch dimension. Otherwise 57 | # the corresponding sample is plotted in the figure, and all the remaining samples 58 | # in the batch are dropped. 
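    # For example (hypothetical shapes, mirroring the rules above): given a
    # convolution output of shape (8, 10, 24, 24),
    #   batch_sample_id=None keeps all 8 samples as separate surfaces,
    #   batch_sample_id=-1   averages them into one (10, 24, 24) tensor,
    #   batch_sample_id=3    keeps only the fourth sample and drops the rest.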
59 | def load_tensors(self): 60 | available_steps = self.trial.steps() 61 | for step in available_steps[0 : self.steps]: 62 | self.tensors[step] = [] 63 | 64 | # input image into the neural network 65 | if self.label is not None: 66 | for tname in self.trial.tensor_names(regex=self.label): 67 | tensor = self.trial.tensor(tname).value(step) 68 | if self.color_channel == 1: 69 | self.input[step] = tensor[0, 0, :, :] 70 | elif self.color_channel == 3: 71 | self.input[step] = tensor[0, :, :, 0] 72 | 73 | # iterate over tensors that match the regex 74 | for tname in self.trial.tensor_names(regex=self.regex): 75 | tensor = self.trial.tensor(tname).value(step) 76 | # get max value of tensors to set axis dimension accordingly 77 | for dim in tensor.shape: 78 | if dim > self.max_dim: 79 | self.max_dim = dim 80 | 81 | # layer inputs/outputs have the batch size as their first dimension 82 | if self.batch_sample_id is not None: 83 | # average over batch dimension 84 | if self.batch_sample_id == -1: 85 | tensor = np.sum(tensor, axis=0) / tensor.shape[0] 86 | # plot item from batch 87 | elif self.batch_sample_id >= 0 and self.batch_sample_id < tensor.shape[0]: 88 | tensor = tensor[self.batch_sample_id] 89 | # plot first item from batch 90 | else: 91 | tensor = tensor[0] 92 | 93 | # normalize tensor values between 0 and 1 so that all tensors have same colorscheme 94 | tensor = tensor - np.min(tensor) 95 | if np.max(tensor) != 0: 96 | tensor = tensor / np.max(tensor) 97 | if len(tensor.shape) == 3: 98 | for l in range(tensor.shape[self.color_channel - 1]): 99 | if self.color_channel == 1: 100 | self.tensors[step].append([tname, tensor[l, :, :]]) 101 | elif self.color_channel == 3: 102 | self.tensors[step].append([tname, tensor[:, :, l]]) 103 | elif len(tensor.shape) == 1: 104 | self.tensors[step].append([tname, tensor]) 105 | else: 106 | # normalize tensor values between 0 and 1 so that all tensors have same colorscheme 107 | tensor = tensor - np.min(tensor) 108 | if np.max(tensor) != 0: 109 | 
tensor = tensor / np.max(tensor) 110 | if len(tensor.shape) == 4: 111 | for i in range(tensor.shape[0]): 112 | for l in range(tensor.shape[1]): 113 | if self.color_channel == 1: 114 | self.tensors[step].append([tname, tensor[i, l, :, :]]) 115 | elif self.color_channel == 3: 116 | self.tensors[step].append([tname, tensor[i, :, :, l]]) 117 | elif len(tensor.shape) == 2: 118 | self.tensors[step].append([tname, tensor]) 119 | 120 | # model output 121 | if self.prediction is not None: 122 | for tname in self.trial.tensor_names(regex=self.prediction): 123 | tensor = self.trial.tensor(tname).value(step) 124 | # predicted class (batch size, probabilities per class) 125 | if len(tensor.shape) == 2: 126 | self.output[step] = np.array([np.argmax(tensor, axis=1)[0]]) 127 | # predicted image (batch size, color channel, width, height) 128 | elif len(tensor.shape) == 4: 129 | # MXNet has color channel in dim1 130 | if self.color_channel == 1: 131 | self.output[step] = tensor[0, 0, :, :] 132 | # TF has color channel in dim 3 133 | elif self.color_channel == 3: 134 | self.output[step] = tensor[0, :, :, 0] 135 | 136 | # Configure the plot layout 137 | def set_figure(self): 138 | self.fig = go.Figure( 139 | layout=go.Layout( 140 | autosize=False, 141 | title=self.title, 142 | width=1000, 143 | height=800, 144 | template="plotly_dark", 145 | font=dict(color="gray"), 146 | showlegend=False, 147 | updatemenus=[ 148 | dict( 149 | type="buttons", 150 | buttons=[ 151 | dict( 152 | label="Play", 153 | method="animate", 154 | args=[ 155 | None, 156 | { 157 | "frame": {"duration": 1, "redraw": True}, 158 | "fromcurrent": True, 159 | "transition": {"duration": 1}, 160 | }, 161 | ], 162 | ) 163 | ], 164 | ) 165 | ], 166 | scene=dict( 167 | xaxis=dict( 168 | range=[-self.max_dim / 2, self.max_dim / 2], 169 | autorange=False, 170 | gridcolor="black", 171 | zerolinecolor="black", 172 | showgrid=False, 173 | showline=False, 174 | showticklabels=False, 175 | showspikes=False, 176 | ), 177 | 
yaxis=dict( 178 | range=[-self.max_dim / 2, self.max_dim / 2], 179 | autorange=False, 180 | gridcolor="black", 181 | zerolinecolor="black", 182 | showgrid=False, 183 | showline=False, 184 | showticklabels=False, 185 | showspikes=False, 186 | ), 187 | zaxis=dict( 188 | gridcolor="black", 189 | zerolinecolor="black", 190 | showgrid=False, 191 | showline=False, 192 | showticklabels=False, 193 | showspikes=False, 194 | ), 195 | ), 196 | ) 197 | ) 198 | 199 | # Create a sequence of frames: tensors from same step will be stored in the same frame 200 | def set_frames(self): 201 | frames = [] 202 | available_steps = self.trial.steps() 203 | for step in available_steps[0 : self.steps]: 204 | layers = [] 205 | if self.label is not None: 206 | if len(self.input[step].shape) == 2: 207 | # plot predicted image 208 | layers.append({"type": "surface", "surfacecolor": self.input[step]}) 209 | for i in range(len(self.tensors[step])): 210 | if len(self.tensors[step][i][1].shape) == 1: 211 | # set color of fully connected layer for corresponding step 212 | layers.append( 213 | {"type": "scatter3d", "marker": {"color": self.tensors[step][i][1]}} 214 | ) 215 | elif len(self.tensors[step][i][1].shape) == 2: 216 | # set color of convolutional/pooling layer for corresponding step 217 | layers.append({"type": "surface", "surfacecolor": self.tensors[step][i][1]}) 218 | if self.prediction is not None: 219 | if len(self.output[step].shape) == 1: 220 | # plot predicted class for first input in batch 221 | layers.append( 222 | { 223 | "type": "scatter3d", 224 | "text": "Predicted class " + str(self.output[step][0]), 225 | "textfont": {"size": 40}, 226 | } 227 | ) 228 | elif len(self.output[step].shape) == 2: 229 | # plot predicted image 230 | layers.append({"type": "surface", "surfacecolor": self.output[step]}) 231 | frames.append(go.Frame(data=layers)) 232 | 233 | self.fig.frames = frames 234 | 235 | # Plot the different neural network layers 236 | # if ignore_batch_dimension is True then 
convolutions are plotted as 237 | # Surface and dense layers are plotted as Scatter3D 238 | # if ignore_batch_dimension is False then convolutions and dense layers 239 | # are plotted as Surface. We don't plot biases. 240 | # If convolution has shape [batch_size, 10, 24, 24] and ignore_batch_dimension==True 241 | # then this function will plot 10 Surface layers in the size of 24x24 242 | def plot_network(self): 243 | tensors = [] 244 | dist = 0 245 | counter = 0 246 | 247 | first_step = self.trial.steps()[0] 248 | if self.label is not None: 249 | tensor = self.input[first_step].shape 250 | if len(tensor) == 2: 251 | tensors.append( 252 | go.Surface( 253 | z=np.zeros((tensor[0], tensor[1])) + self.dist, 254 | y=np.arange(-tensor[0] / 2, tensor[0] / 2), 255 | x=np.arange(-tensor[1] / 2, tensor[1] / 2), 256 | surfacecolor=self.input[first_step], 257 | showscale=False, 258 | colorscale="gray", 259 | opacity=0.7, 260 | ) 261 | ) 262 | self.dist += 2 263 | prev_name = None 264 | for tname, layer in self.tensors[first_step]: 265 | tensor = layer.shape 266 | 267 | if len(tensor) == 2: 268 | tensors.append( 269 | go.Surface( 270 | z=np.zeros((tensor[0], tensor[1])) + self.dist, 271 | y=np.arange(-tensor[0] / 2, tensor[0] / 2), 272 | x=np.arange(-tensor[1] / 2, tensor[1] / 2), 273 | text=tname, 274 | surfacecolor=layer, 275 | showscale=False, 276 | # colorscale='gray', 277 | opacity=0.7, 278 | ) 279 | ) 280 | 281 | elif len(tensor) == 1: 282 | tensors.append( 283 | go.Scatter3d( 284 | z=np.zeros(tensor[0]) + self.dist, 285 | y=np.zeros(tensor[0]), 286 | x=np.arange(-tensor[0] / 2, tensor[0] / 2), 287 | text=tname, 288 | mode="markers", 289 | marker=dict(size=3, opacity=0.7, color=layer), 290 | ) 291 | ) 292 | if tname == prev_name: 293 | self.dist += 0.2 294 | else: 295 | self.dist += 1 296 | counter += 1 297 | prev_name = tname 298 | # plot model output 299 | if self.prediction is not None: 300 | # model predicts a class label (batch_size, class probabilities) 301 | if 
len(self.output[first_step].shape) == 1: 302 | tensors.append( 303 | go.Scatter3d( 304 | z=np.array([self.dist + 0.2]), 305 | x=np.array([0]), 306 | y=np.array([0]), 307 | text="Predicted class", 308 | mode="markers+text", 309 | marker=dict(size=3, color="black"), 310 | textfont=dict(size=18), 311 | opacity=0.7, 312 | ) 313 | ) 314 | # model predicts an output image (batch size, color channel, width, height) 315 | elif len(self.output[first_step].shape) == 2: 316 | tensor = self.output[first_step].shape 317 | tensors.append( 318 | go.Surface( 319 | z=np.zeros((tensor[0], tensor[1])) + self.dist + 3, 320 | y=np.arange(-tensor[0] / 2, tensor[0] / 2), 321 | x=np.arange(-tensor[1] / 2, tensor[1] / 2), 322 | text="Predicted image", 323 | surfacecolor=self.output[first_step], 324 | showscale=False, 325 | colorscale="gray", 326 | opacity=0.7, 327 | ) 328 | ) 329 | 330 | # add list of tensors to figure 331 | self.fig.add_traces(tensors) 332 | -------------------------------------------------------------------------------- /Starter Notebooks/Cost Prediction/Cost Prediction with Autopilot.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Predict Hospital Spending Per Patient with SageMaker Autopilot\n", 8 | "In this lab we'll get started with SageMaker using Autopilot! In particular we will download the Medicare dataset, clean it, and plug it into a framework for SageMaker Autopilot.\n", 9 | "\n", 10 | "You'll see the notebooks generated for you, the hundreds of models trained, in addition to your very own inference pipeline, deployable to a SageMaker endpoint or batch transform job!\n", 11 | "\n", 12 | "At the end, we'll set up a SHAP explainer to analyze local feature importance for a set of predictions. Let's get started!" 
13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": null, 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [ 21 | "# Download the Medicare dataset as a csv file to the notebook\n", 22 | "!wget -O Medicare_Hospital_Spending_by_Claim.csv https://data.medicare.gov/api/views/nrth-mfg3/rows.csv?accessType=DOWNLOAD" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "### Data Preprocessing on the Raw Dataset\n", 30 | "In this section we read the raw csv data set into a pandas data frame. We inspect the data using the pandas head() function. We pre-process the data with feature encoding, feature engineering, and column renaming, drop some columns that have no relevance to predicting the `Avg_Hosp` cost, and verify that there are no missing values in the data set." 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": null, 36 | "metadata": {}, 37 | "outputs": [], 38 | "source": [ 39 | "# Read the CSV file into a pandas dataframe and save it to another table so we can keep a copy of the original dataset\n", 40 | "# In our example we use the dataframe called table1 for all pre-processing, while the dataframe table\n", 41 | "# maintains a copy of the original data\n", 42 | "\n", 43 | "import pandas as pd\n", 44 | "table = pd.read_csv('Medicare_Hospital_Spending_by_Claim.csv')\n", 45 | "table1 = table.copy()\n", 46 | "table1.head()" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": null, 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "# Encode column \"State\"\n", 56 | "\n", 57 | "replace_map = {'State': {'AK': 1, 'AL': 2, 'AR': 3, 'AZ': 4, 'CA': 5, 'CO': 6, 'CT': 7, \n", 58 | " 'DC': 8, 'DE': 9, 'FL': 10, 'GA': 11, 'HI': 12, \n", 59 | " 'IA': 13, 'ID': 14, 'IL': 15, 'IN': 16, 'KS': 17, \n", 60 | " 'KY': 18, 'LA': 19, 'MA': 20, 'ME': 21, 'MI': 22, \n", 61 | " 'MN': 23, 'MO': 24, 'MS': 25, 'MT': 26, 'NC': 27, \n", 62 | " 'ND': 28, 'NE': 29, 'NH': 30, 'NJ': 31, 
'NM': 32, \n", 63 | " 'NV': 33, 'NY': 34, 'OH': 35, 'OK': 36, 'OR': 37, \n", 64 | " 'PA': 38, 'RI': 39, 'SC': 40, 'SD': 41, 'TN': 42, \n", 65 | " 'TX': 43, 'UT': 44, 'VA': 45, 'VT': 46, 'WA': 47, \n", 66 | " 'WI': 48, 'WV': 49, 'WY': 50}}\n", 67 | "table1.replace(replace_map,inplace=True)" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": null, 73 | "metadata": {}, 74 | "outputs": [], 75 | "source": [ 76 | "# Encode column \"Period\"\n", 77 | "\n", 78 | "replace_map = {'Period': {'1 to 3 days Prior to Index Hospital Admission': 1, \n", 79 | " 'During Index Hospital Admission': 2, \n", 80 | " '1 through 30 days After Discharge from Index Hospital Admission': 3, \n", 81 | " 'Complete Episode': 4}}\n", 82 | "table1.replace(replace_map,inplace=True)" 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": null, 88 | "metadata": {}, 89 | "outputs": [], 90 | "source": [ 91 | "# Encode column \"Claim Type\"\n", 92 | "\n", 93 | "replace_map = {'Claim Type': {'Home Health Agency': 1, \n", 94 | " 'Hospice': 2, \n", 95 | " 'Inpatient': 3, \n", 96 | " 'Outpatient': 4, \n", 97 | " 'Skilled Nursing Facility': 5, \n", 98 | " 'Durable Medical Equipment': 6, \n", 99 | " 'Carrier': 7, \n", 100 | " 'Total': 8}}\n", 101 | "table1.replace(replace_map,inplace=True)" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": null, 107 | "metadata": {}, 108 | "outputs": [], 109 | "source": [ 110 | "# Convert the column \"Percent of Spending Hospital\" to float, remove the percent sign and \n", 111 | "# divide by 100 to normalize for percentage\n", 112 | "\n", 113 | "table1['Percent of Spending Hospital'] = table1['Percent of Spending Hospital'].str.rstrip('%').astype('float')\n", 114 | "table1['Percent of Spending Hospital'] = table1['Percent of Spending Hospital']/100" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": null, 120 | "metadata": {}, 121 | "outputs": [], 122 | "source": [ 
123 | "# Convert the column \"Percent of Spending State\" to float, remove the percent sign and \n", 124 | "# divide by 100 to normalize for percentage\n", 125 | "\n", 126 | "table1['Percent of Spending State'] = table1['Percent of Spending State'].str.rstrip('%').astype('float')\n", 127 | "table1['Percent of Spending State'] = table1['Percent of Spending State']/100" 128 | ] 129 | }, 130 | { 131 | "cell_type": "code", 132 | "execution_count": null, 133 | "metadata": {}, 134 | "outputs": [], 135 | "source": [ 136 | "# Convert the column \"Percent of Spending Nation\" to float, remove the percent sign and \n", 137 | "# divide by 100 to normalize for percentage\n", 138 | "\n", 139 | "table1['Percent of Spending Nation'] = table1['Percent of Spending Nation'].str.rstrip('%').astype('float')\n", 140 | "table1['Percent of Spending Nation'] = table1['Percent of Spending Nation']/100" 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": null, 146 | "metadata": {}, 147 | "outputs": [], 148 | "source": [ 149 | "# Drop column \"Facility Name\"; \"Facility Id\" already identifies the facility, so the facility name is not\n", 150 | "# relevant for the model\n", 151 | "\n", 152 | "table1.drop(['Facility Name'], axis=1, inplace = True)" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": null, 158 | "metadata": {}, 159 | "outputs": [], 160 | "source": [ 161 | "# Move the \"Avg Spending Per Episode Hospital\" column to the beginning, since the\n", 162 | "# algorithm requires the prediction column at the beginning\n", 163 | "\n", 164 | "col_name='Avg Spending Per Episode Hospital'\n", 165 | "first_col = table1.pop(col_name)\n", 166 | "table1.insert(0, col_name, first_col)" 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": null, 172 | "metadata": {}, 173 | "outputs": [], 174 | "source": [ 175 | "# Convert integer values to float in the columns \"Avg Spending Per Episode Hospital\", \n", 176 | "\"Avg Spending Per 
Episode State\" and \"Avg Spending Per Episode Nation\"\n", 177 | "# Columns with integer values are interpreted as categorical values. Changing to float avoids any misinterpretation\n", 178 | "\n", 179 | "table1['Avg Spending Per Episode Hospital'] = table1['Avg Spending Per Episode Hospital'].astype('float')\n", 180 | "table1['Avg Spending Per Episode State'] = table1['Avg Spending Per Episode State'].astype('float')\n", 181 | "table1['Avg Spending Per Episode Nation'] = table1['Avg Spending Per Episode Nation'].astype('float')" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": null, 187 | "metadata": {}, 188 | "outputs": [], 189 | "source": [ 190 | "# Rename long column names for costs and percentage costs on the hospital, state and nation,\n", 191 | "# so they are easily referenced in the rest of this discussion\n", 192 | "\n", 193 | "table1.rename(columns={'Avg Spending Per Episode Hospital':'Avg_Hosp',\n", 194 | " 'Avg Spending Per Episode State':'Avg_State',\n", 195 | " 'Avg Spending Per Episode Nation':'Avg_Nation',\n", 196 | " 'Percent of Spending Hospital':'Percent_Hosp',\n", 197 | " 'Percent of Spending State':'Percent_State',\n", 198 | " 'Percent of Spending Nation':'Percent_Nation'}, \n", 199 | " inplace=True)" 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": null, 205 | "metadata": {}, 206 | "outputs": [], 207 | "source": [ 208 | "# Convert Start Date and End Date to datetime objects, then convert them to integers. First the data is converted\n", 209 | "# to Pandas datetime object. 
Then the year, month, and day are extracted from the datetime object and \n", 210 | "# multiplied by some weights to convert them into final integer values.\n", 211 | "\n", 212 | "table1['Start Date'] = pd.to_datetime(table1['Start Date'])\n", 213 | "table1['End Date'] = pd.to_datetime(table1['End Date'])\n", 214 | "table1['Start Date'] = 1000*table1['Start Date'].dt.year + 100*table1['Start Date'].dt.month + table1['Start Date'].dt.day\n", 215 | "table1['End Date'] = 1000*table1['End Date'].dt.year + 100*table1['End Date'].dt.month + table1['End Date'].dt.day" 216 | ] 217 | }, 218 | { 219 | "cell_type": "code", 220 | "execution_count": null, 221 | "metadata": {}, 222 | "outputs": [], 223 | "source": [ 224 | "# See the first 5 rows in the dataframe to see how the changed data looks\n", 225 | "\n", 226 | "table1.head()" 227 | ] 228 | }, 229 | { 230 | "cell_type": "code", 231 | "execution_count": null, 232 | "metadata": {}, 233 | "outputs": [], 234 | "source": [ 235 | "# Drop Columns \"Start Date\" and \"End Date\". The dataset is only for 2018, hence all start and end dates\n", 236 | "# are the same in each row and do not impact the model\n", 237 | "\n", 238 | "table1.drop(['Start Date'], axis=1, inplace = True)\n", 239 | "table1.drop(['End Date'], axis=1, inplace = True)" 240 | ] 241 | }, 242 | { 243 | "cell_type": "code", 244 | "execution_count": null, 245 | "metadata": {}, 246 | "outputs": [], 247 | "source": [ 248 | "# Make sure the table does not have missing values. 
The following code line shows there are no missing values\n", 249 | "# in the table\n", 250 | "\n", 251 | "table1.isna().sum()" 252 | ] 253 | }, 254 | { 255 | "cell_type": "code", 256 | "execution_count": null, 257 | "metadata": {}, 258 | "outputs": [], 259 | "source": [ 260 | "df = table1.sample(frac=1)" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": null, 266 | "metadata": {}, 267 | "outputs": [], 268 | "source": [ 269 | "fraction_train = .85\n", 270 | "test_row = round(df.shape[0] * fraction_train)\n", 271 | "test_set = df.iloc[test_row:]\n", 272 | "train_set = df.iloc[:test_row]" 273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": null, 278 | "metadata": {}, 279 | "outputs": [], 280 | "source": [ 281 | "local_train_file = 'train_set.csv'\n", 282 | "\n", 283 | "train_set.to_csv(local_train_file, index=False, header=True)\n", 284 | "test_set.to_csv('test_set.csv', index=False, header=True)" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": null, 290 | "metadata": {}, 291 | "outputs": [], 292 | "source": [ 293 | "# optionally run some of your own plots here to analyze the data" 294 | ] 295 | }, 296 | { 297 | "cell_type": "markdown", 298 | "metadata": {}, 299 | "source": [ 300 | "# SageMaker Autopilot\n", 301 | "Next, let's run this dataset on SageMaker Autopilot! 
" 302 | ] 303 | }, 304 | { 305 | "cell_type": "code", 306 | "execution_count": null, 307 | "metadata": {}, 308 | "outputs": [], 309 | "source": [ 310 | "from sagemaker import AutoML\n", 311 | "from time import gmtime, strftime, sleep\n", 312 | "import numpy as np\n", 313 | "import sagemaker\n", 314 | "\n", 315 | "sess = sagemaker.Session()\n", 316 | "\n", 317 | "role = sagemaker.get_execution_role()\n", 318 | "\n", 319 | "timestamp_suffix = strftime('%d-%H-%M-%S', gmtime())\n", 320 | "base_job_name = 'cost-prediction-' + timestamp_suffix\n", 321 | "\n", 322 | "target_attribute_name = 'Avg_Hosp'\n", 323 | "# Avg_Hosp is a continuous value, so this is a regression problem;\n", 324 | "# no encoding of the target attribute values is needed\n", 325 | "\n", 326 | "automl = AutoML(role=role,\n", 327 | " target_attribute_name=target_attribute_name,\n", 328 | " base_job_name=base_job_name,\n", 329 | " sagemaker_session=sess,\n", 330 | " max_candidates=20,\n", 331 | " problem_type = 'Regression',\n", 332 | " job_objective = {'MetricName':'MSE'})\n", 333 | "\n", 334 | "automl.fit(local_train_file, job_name=base_job_name, wait=True, logs=True)" 335 | ] 336 | }, 337 | { 338 | "cell_type": "markdown", 339 | "metadata": {}, 340 | "source": [ 341 | "After you run this cell, open up the Experiments tab in SageMaker Studio, right-click on your new `cost-prediction` job, and view the AutoML job details! " 342 | ] 343 | }, 344 | { 345 | "cell_type": "markdown", 346 | "metadata": {}, 347 | "source": [ 348 | "![](../../Images/Autopilot.png)" 349 | ] 350 | }, 351 | { 352 | "cell_type": "markdown", 353 | "metadata": {}, 354 | "source": [ 355 | "Once the state of the job has moved into `Feature Engineering`, you should be able to open the data exploration notebook, in addition to the candidate generation notebook. " 356 | ] 357 | }, 358 | { 359 | "cell_type": "markdown", 360 | "metadata": {}, 361 | "source": [ 362 | "Spend some time stepping through these notebooks. 
You can also download the data transformation code base. Remember, all of this was generated for your specific dataset!" 363 | ] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | "metadata": {}, 368 | "source": [ 369 | "---\n", 370 | "# Analyze Autopilot Modeling Performance\n", 371 | "Your AutoML job will take some time to complete. Feel free to use that time to step through the generated notebooks and learn about all the feature engineering strategies they are using! \n", 372 | "\n", 373 | "Once your job has finished, it's time to analyze that performance. Luckily for us we can simply deploy that entire artifact onto an endpoint, using the same `model.deploy()` that we saw earlier. Let's do that here.\n", 374 | "\n", 375 | "We'll attach the name of your job to an AutoML estimator, so please make sure to paste in the name of your job below." 376 | ] 377 | }, 378 | { 379 | "cell_type": "code", 380 | "execution_count": null, 381 | "metadata": {}, 382 | "outputs": [], 383 | "source": [ 384 | "from datetime import datetime\n", 385 | "from sagemaker import AutoML\n", 386 | "import sagemaker\n", 387 | "import numpy as np\n", 388 | "\n", 389 | "sess = sagemaker.Session()\n", 390 | "\n", 391 | "# if you needed to restart your kernel, you can attach your AutoML job here\n", 392 | "automl_job_name = 'COST-PREDICTION-28-02-12-32' #<== REPLACE THIS WITH YOUR OWN AUTOML JOB NAME\n", 393 | "automl = AutoML.attach(automl_job_name, sagemaker_session=sess)\n", 394 | "\n", 395 | "ep_name = 'automl-endpoint-' + datetime.now().strftime('%d-%H-%M-%S') # longer timestamp keeps the name unique\n", 396 | "\n", 397 | "inference_response_keys = ['predicted_label', 'probability']\n", 398 | "\n", 399 | "# Create the inference endpoint\n", 400 | "automl.deploy(1, 'ml.m5.xlarge', endpoint_name = ep_name) #inference_response_keys=inference_response_keys)" 401 | ] 402 | }, 403 | { 404 | "cell_type": "code", 405 | "execution_count": null, 406 | "metadata": {}, 407 | "outputs": [], 408 | "source": [ 409 | "!pip install --upgrade sagemaker" 410 
| ] 411 | }, 412 | { 413 | "cell_type": "code", 414 | "execution_count": null, 415 | "metadata": {}, 416 | "outputs": [], 417 | "source": [ 418 | "from sagemaker.predictor import RealTimePredictor\n", 419 | "class AutomlEstimator:\n", 420 | " def __init__(self, endpoint_name, sagemaker_session):\n", 421 | " self.predictor = RealTimePredictor(\n", 422 | " endpoint_name=endpoint_name,\n", 423 | " sagemaker_session=sagemaker_session,\n", 424 | " serializer=sagemaker.serializers.CSVSerializer(),\n", 425 | " content_type='text/csv',\n", 426 | " accept='text/csv'\n", 427 | " )\n", 428 | " # Prediction function for regression\n", 429 | " def predict(self, x):\n", 430 | " response = self.predictor.predict(x)\n", 431 | " return np.array([float(v) for v in response.decode('utf-8').split(',')])" 432 | ] 433 | }, 434 | { 435 | "cell_type": "code", 436 | "execution_count": null, 437 | "metadata": {}, 438 | "outputs": [], 439 | "source": [ 440 | "# make sure this is pointing to the right endpoint name - if you reran that cell above you may have overwritten the variable in memory\n", 441 | "automl_estimator = AutomlEstimator(endpoint_name=ep_name, sagemaker_session=sess)" 442 | ] 443 | }, 444 | { 445 | "cell_type": "code", 446 | "execution_count": null, 447 | "metadata": {}, 448 | "outputs": [], 449 | "source": [ 450 | "import pandas as pd\n", 451 | "\n", 452 | "test_data = pd.read_csv('test_set.csv')" 453 | ] 454 | }, 455 | { 456 | "cell_type": "markdown", 457 | "metadata": {}, 458 | "source": [ 459 | "# Explain Global and Local Modeling Performance with SHAP\n", 460 | "A key question that many stakeholders will have is how your model came to its predictions, both for the entire dataset and for individual predictions. In this lab we'll set up a SHAP model explainer to view feature importances. 
Feature importances can be understood both in terms of \"local,\" or per-prediction, and \"global,\" or for the entire dataset.\n", 461 | "\n", 462 | "We will actually wrap your model endpoint to provide these." 463 | ] 464 | }, 465 | { 466 | "cell_type": "code", 467 | "execution_count": null, 468 | "metadata": {}, 469 | "outputs": [], 470 | "source": [ 471 | "!conda update -n base -c defaults conda -y" 472 | ] 473 | }, 474 | { 475 | "cell_type": "code", 476 | "execution_count": null, 477 | "metadata": {}, 478 | "outputs": [], 479 | "source": [ 480 | "!conda install -c conda-forge -y shap" 481 | ] 482 | }, 483 | { 484 | "cell_type": "code", 485 | "execution_count": null, 486 | "metadata": {}, 487 | "outputs": [], 488 | "source": [ 489 | "import shap\n", 490 | "\n", 491 | "from shap import KernelExplainer\n", 492 | "from shap import sample\n", 493 | "from scipy.special import expit\n", 494 | "\n", 495 | "# Initialize plugin to make plots interactive.\n", 496 | "shap.initjs()" 497 | ] 498 | }, 499 | { 500 | "cell_type": "code", 501 | "execution_count": null, 502 | "metadata": {}, 503 | "outputs": [], 504 | "source": [ 505 | "data_without_target = test_data.drop(columns=['Avg_Hosp'])\n", 506 | "\n", 507 | "background_data = sample(data_without_target, 50)" 508 | ] 509 | }, 510 | { 511 | "cell_type": "code", 512 | "execution_count": null, 513 | "metadata": {}, 514 | "outputs": [], 515 | "source": [ 516 | "# Derive link function \n", 517 | "problem_type = automl.describe_auto_ml_job(job_name=automl_job_name)['ResolvedAttributes']['ProblemType'] \n", 518 | "link = \"identity\" if problem_type == 'Regression' else \"logit\" \n", 519 | "\n", 520 | "# the prediction function is passed to KernelExplainer, which only needs a handle that maps inputs to model outputs\n", 521 | "explainer = KernelExplainer(automl_estimator.predict, background_data, link=link)" 522 | ] 523 | }, 524 | { 525 | "cell_type": "code", 526 | "execution_count": null, 527 | "metadata": {}, 528 | "outputs": 
[], 529 | "source": [ 530 | "# expected_value is the average model output over the background dataset; with the identity link it is already in the prediction space\n", 531 | "print('expected value =', explainer.expected_value)" 532 | ] 533 | }, 534 | { 535 | "cell_type": "code", 536 | "execution_count": null, 537 | "metadata": {}, 538 | "outputs": [], 539 | "source": [ 540 | "%%writefile managed_endpoint.py\n", 541 | "\n", 542 | "import boto3\n", 543 | "region = boto3.Session().region_name\n", 544 | "\n", 545 | "sm = boto3.Session().client(service_name='sagemaker',region_name=region)\n", 546 | "\n", 547 | "class ManagedEndpoint:\n", 548 | " def __init__(self, ep_name, auto_delete=False):\n", 549 | " self.name = ep_name\n", 550 | " self.auto_delete = auto_delete\n", 551 | " self.in_service = False\n", 552 | " def __enter__(self):\n", 553 | " endpoint_description = sm.describe_endpoint(EndpointName=self.name)\n", 554 | " if endpoint_description['EndpointStatus'] == 'InService':\n", 555 | " self.in_service = True \n", 556 | " return self\n", 557 | " def __exit__(self, type, value, traceback):\n", 558 | " if self.in_service and self.auto_delete:\n", 559 | " print(\"Deleting the endpoint: {}\".format(self.name)) \n", 560 | " sm.delete_endpoint(EndpointName=self.name)\n", 561 | " sm.get_waiter('endpoint_deleted').wait(EndpointName=self.name)\n", 562 | " self.in_service = False" 563 | ] 564 | }, 565 | { 566 | "cell_type": "code", 567 | "execution_count": null, 568 | "metadata": {}, 569 | "outputs": [], 570 | "source": [ 571 | "# Get the first sample\n", 572 | "x = data_without_target.iloc[0:1]\n", 573 | "\n", 574 | "# ManagedEndpoint can optionally auto delete the endpoint after calculating the SHAP values. 
To enable auto delete, use ManagedEndpoint(ep_name, auto_delete=True)\n", 575 | "from managed_endpoint import ManagedEndpoint\n", 576 | "with ManagedEndpoint(ep_name) as mep:\n", 577 | " shap_values = explainer.shap_values(x, nsamples='auto', l1_reg='aic')" 578 | ] 579 | }, 580 | { 581 | "cell_type": "markdown", 582 | "metadata": {}, 583 | "source": [ 584 | "# Visualize SHAP Values\n", 585 | "Now, let's see which features most strongly influence the predictions from our model!\n", 586 | "\n", 587 | "![](images/shap_1.png)" 588 | ] 589 | }, 590 | { 591 | "cell_type": "code", 592 | "execution_count": null, 593 | "metadata": {}, 594 | "outputs": [], 595 | "source": [ 596 | "# For classification, shap_values are in the log-odds space; passing the link function converts the force plot back to the probability space\n", 597 | "shap.force_plot(explainer.expected_value, shap_values, x, link=link)" 598 | ] 599 | }, 600 | { 601 | "cell_type": "markdown", 602 | "metadata": {}, 603 | "source": [ 604 | "![](images/shap_2.png)" 605 | ] 606 | }, 607 | { 608 | "cell_type": "code", 609 | "execution_count": null, 610 | "metadata": {}, 611 | "outputs": [], 612 | "source": [ 613 | "with ManagedEndpoint(ep_name) as mep:\n", 614 | " shap_values = explainer.shap_values(x, nsamples='auto', l1_reg='num_features(5)')\n", 615 | "shap.force_plot(explainer.expected_value, shap_values, x, link=link)" 616 | ] 617 | }, 618 | { 619 | "cell_type": "code", 620 | "execution_count": null, 621 | "metadata": {}, 622 | "outputs": [], 623 | "source": [ 624 | "# Draw 50 random samples\n", 625 | "X = sample(data_without_target, 50)\n", 626 | "\n", 627 | "# Calculate SHAP values for these samples, and delete the endpoint\n", 628 | "with ManagedEndpoint(ep_name, auto_delete=True) as mep:\n", 629 | " shap_values = explainer.shap_values(X, nsamples='auto', l1_reg='aic')" 630 | ] 631 | }, 632 | { 633 | "cell_type": "markdown", 634 | "metadata": {}, 635 | "source": [ 636 | "![](images/shap_3.png)" 637 | ] 638 | }, 639 | { 640
| "cell_type": "code", 641 | "execution_count": null, 642 | "metadata": {}, 643 | "outputs": [], 644 | "source": [ 645 | "shap.force_plot(explainer.expected_value, shap_values, X, link=link)" 646 | ] 647 | }, 648 | { 649 | "cell_type": "markdown", 650 | "metadata": {}, 651 | "source": [ 652 | "![](images/shap_4.png)" 653 | ] 654 | }, 655 | { 656 | "cell_type": "code", 657 | "execution_count": null, 658 | "metadata": {}, 659 | "outputs": [], 660 | "source": [ 661 | "shap.summary_plot(shap_values, X, plot_type=\"bar\")" 662 | ] 663 | }, 664 | { 665 | "cell_type": "markdown", 666 | "metadata": {}, 667 | "source": [ 668 | "---\n", 669 | "# Optional - Extend Autopilot with your own feature engineering code\n", 670 | "If you have extra time after getting to the local inference explanations, why not take a look at bringing your own feature engineering code into SageMaker Autopilot? Remember that this notebook started with ~10 basic ETL steps in Python to convert the raw Medicare data into something our models could even start to look at.
Look at the following example to see how to port your own ETL scripts into SageMaker Autopilot for custom feature engineering.\n", 671 | "\n", 672 | "Remember, once you get the entire pipeline deployed onto an endpoint, you can send the raw data straight to the endpoint, and it will perform both feature engineering and model inference for you, all in real time!\n", 673 | "\n", 674 | "- https://github.com/aws/amazon-sagemaker-examples/tree/master/autopilot/custom-feature-selection" 675 | ] 676 | }, 677 | { 678 | "cell_type": "code", 679 | "execution_count": null, 680 | "metadata": {}, 681 | "outputs": [], 682 | "source": [] 683 | } 684 | ], 685 | "metadata": { 686 | "instance_type": "ml.t3.medium", 687 | "kernelspec": { 688 | "display_name": "Python 3 (Data Science)", 689 | "language": "python", 690 | "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0" 691 | }, 692 | "language_info": { 693 | "codemirror_mode": { 694 | "name": "ipython", 695 | "version": 3 696 | }, 697 | "file_extension": ".py", 698 | "mimetype": "text/x-python", 699 | "name": "python", 700 | "nbconvert_exporter": "python", 701 | "pygments_lexer": "ipython3", 702 | "version": "3.7.6" 703 | } 704 | }, 705 | "nbformat": 4, 706 | "nbformat_minor": 4 707 | } 708 | -------------------------------------------------------------------------------- /Starter Notebooks/Cost Prediction/images/shap_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-architecting-for-ml-hcls/9236be67abb200b6073b2b17079c9c368326c353/Starter Notebooks/Cost Prediction/images/shap_1.png -------------------------------------------------------------------------------- /Starter Notebooks/Cost Prediction/images/shap_2.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-architecting-for-ml-hcls/9236be67abb200b6073b2b17079c9c368326c353/Starter Notebooks/Cost Prediction/images/shap_2.png -------------------------------------------------------------------------------- /Starter Notebooks/Cost Prediction/images/shap_3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-architecting-for-ml-hcls/9236be67abb200b6073b2b17079c9c368326c353/Starter Notebooks/Cost Prediction/images/shap_3.png -------------------------------------------------------------------------------- /Starter Notebooks/Cost Prediction/images/shap_4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-architecting-for-ml-hcls/9236be67abb200b6073b2b17079c9c368326c353/Starter Notebooks/Cost Prediction/images/shap_4.png -------------------------------------------------------------------------------- /Starter Notebooks/MLOps and Hosting/install-run-notebook.sh: -------------------------------------------------------------------------------- 1 | version=0.15.0 2 | pip install https://github.com/aws-samples/sagemaker-run-notebook/releases/download/v${version}/sagemaker_run_notebook-${version}.tar.gz 3 | jlpm config set cache-folder /tmp/yarncache 4 | jupyter lab build --debug --minimize=False 5 | nohup supervisorctl -c /etc/supervisor/conf.d/supervisord.conf restart jupyterlabserver 6 | -------------------------------------------------------------------------------- /Starter Notebooks/MLOps and Hosting/model.tar.gz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/amazon-sagemaker-architecting-for-ml-hcls/9236be67abb200b6073b2b17079c9c368326c353/Starter Notebooks/MLOps and Hosting/model.tar.gz 
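The Cost Prediction notebook above wraps its SageMaker endpoint in a small context manager (`ManagedEndpoint`) so the endpoint can be deleted automatically once the SHAP values have been computed. A stdlib-only sketch of that pattern, with a stub client standing in for the real boto3 SageMaker client (the stub class, its canned endpoint table, and the endpoint name are illustrative assumptions, not the real API):

```python
class StubSageMakerClient:
    """Illustrative stand-in for boto3's SageMaker client (not the real API)."""

    def __init__(self):
        # Pretend one endpoint already exists and is serving traffic.
        self.endpoints = {"my-endpoint": "InService"}

    def describe_endpoint(self, EndpointName):
        return {"EndpointStatus": self.endpoints.get(EndpointName, "NotFound")}

    def delete_endpoint(self, EndpointName):
        self.endpoints.pop(EndpointName, None)


class ManagedEndpoint:
    """Context manager that optionally deletes an endpoint on exit."""

    def __init__(self, client, ep_name, auto_delete=False):
        self.client = client
        self.name = ep_name
        self.auto_delete = auto_delete
        self.in_service = False  # safe default if the endpoint is not found

    def __enter__(self):
        desc = self.client.describe_endpoint(EndpointName=self.name)
        self.in_service = desc["EndpointStatus"] == "InService"
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Clean up only when the endpoint exists and auto-delete was requested.
        if self.in_service and self.auto_delete:
            self.client.delete_endpoint(EndpointName=self.name)
            self.in_service = False


client = StubSageMakerClient()
with ManagedEndpoint(client, "my-endpoint", auto_delete=True) as mep:
    pass  # compute SHAP values here while the endpoint is guaranteed to exist
print("my-endpoint" in client.endpoints)  # prints False: deleted on exit
```

The same shape works against the real service by swapping the stub for `boto3.Session().client('sagemaker')`; initializing `in_service` in `__init__` and returning `self` from `__enter__` keeps the `with ... as mep:` form safe even when the endpoint lookup fails.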
-------------------------------------------------------------------------------- /Starter Notebooks/MLOps and Hosting/src/requirements.txt: -------------------------------------------------------------------------------- 1 | 2 | autogluon 3 | sagemaker 4 | awscli 5 | boto3 6 | PrettyTable 7 | bokeh 8 | numpy==1.16.1 9 | matplotlib 10 | sagemaker-experiments 11 | -------------------------------------------------------------------------------- /Starter Notebooks/MLOps and Hosting/src/train.py: -------------------------------------------------------------------------------- 1 | 2 | import ast 3 | import argparse 4 | import logging 5 | import warnings 6 | import os 7 | import json 8 | import glob 9 | import subprocess 10 | import sys 11 | import boto3 12 | import pickle 13 | import pandas as pd 14 | from collections import Counter 15 | from timeit import default_timer as timer 16 | import time 17 | 18 | from smexperiments.experiment import Experiment 19 | from smexperiments.trial import Trial 20 | from smexperiments.trial_component import TrialComponent 21 | from smexperiments.tracker import Tracker 22 | 23 | sys.path.insert(0, 'package') 24 | with warnings.catch_warnings(): 25 | warnings.filterwarnings("ignore",category=DeprecationWarning) 26 | from prettytable import PrettyTable 27 | import autogluon as ag 28 | from autogluon import TabularPrediction as task 29 | from autogluon.task.tabular_prediction import TabularDataset 30 | 31 | # ------------------------------------------------------------ # 32 | # Training methods # 33 | # ------------------------------------------------------------ # 34 | 35 | def du(path): 36 | """disk usage in human readable format (e.g. 
'2.1GB')""" 37 | return subprocess.check_output(['du','-sh', path]).split()[0].decode('utf-8') 38 | 39 | def __load_input_data(path: str) -> TabularDataset: 40 | """ 41 | Load training data as dataframe 42 | :param path: 43 | :return: DataFrame 44 | """ 45 | input_data_files = os.listdir(path) 46 | try: 47 | input_dfs = [pd.read_csv(f'{path}/{data_file}') for data_file in input_data_files] 48 | return task.Dataset(df=pd.concat(input_dfs)) 49 | except Exception: 50 | print(f'No csv data in {path}!') 51 | return None 52 | 53 | def train(args): 54 | 55 | is_distributed = len(args.hosts) > 1 56 | host_rank = args.hosts.index(args.current_host) 57 | dist_ip_addrs = list(args.hosts) # copy so popping does not mutate args.hosts 58 | dist_ip_addrs.pop(host_rank) 59 | ngpus_per_trial = 1 if args.num_gpus > 0 else 0 60 | 61 | # load training and validation data 62 | print(f'Train files: {os.listdir(args.train)}') 63 | train_data = __load_input_data(args.train) 64 | print(f'Label counts: {dict(Counter(train_data[args.label]))}') 65 | 66 | predictor = task.fit( 67 | train_data=train_data, 68 | label=args.label, 69 | output_directory=args.model_dir, 70 | problem_type=args.problem_type, 71 | eval_metric=args.eval_metric, 72 | stopping_metric=args.stopping_metric, 73 | auto_stack=args.auto_stack, # default: False 74 | hyperparameter_tune=args.hyperparameter_tune, # default: False 75 | feature_prune=args.feature_prune, # default: False 76 | holdout_frac=args.holdout_frac, # default: None 77 | num_bagging_folds=args.num_bagging_folds, # default: 0 78 | num_bagging_sets=args.num_bagging_sets, # default: None 79 | stack_ensemble_levels=args.stack_ensemble_levels, # default: 0 80 | cache_data=args.cache_data, 81 | time_limits=args.time_limits, 82 | num_trials=args.num_trials, # default: None 83 | search_strategy=args.search_strategy, # default: 'random' 84 | search_options=args.search_options, 85 | visualizer=args.visualizer, 86 | verbosity=args.verbosity 87 | ) 88 | 89 | # Results summary 90 | predictor.fit_summary(verbosity=1) 91 | 92 | #
Leaderboard on optional test data 93 | if args.test: 94 | print(f'Test files: {os.listdir(args.test)}') 95 | test_data = __load_input_data(args.test) 96 | print('Running model on test data and getting Leaderboard...') 97 | leaderboard = predictor.leaderboard(dataset=test_data, silent=True) 98 | def format_for_print(df): 99 | table = PrettyTable(list(df.columns)) 100 | for row in df.itertuples(): 101 | table.add_row(row[1:]) 102 | return str(table) 103 | print(format_for_print(leaderboard), end='\n\n') 104 | 105 | # Files summary 106 | print(f'Model export summary:') 107 | print(f"/opt/ml/model/: {os.listdir('/opt/ml/model/')}") 108 | models_contents = os.listdir('/opt/ml/model/models') 109 | print(f"/opt/ml/model/models: {models_contents}") 110 | print(f"/opt/ml/model directory size: {du('/opt/ml/model/')}\n") 111 | 112 | # ------------------------------------------------------------ # 113 | # Training execution # 114 | # ------------------------------------------------------------ # 115 | 116 | def str2bool(v): 117 | return v.lower() in ('yes', 'true', 't', '1') 118 | 119 | def parse_args(): 120 | 121 | parser = argparse.ArgumentParser( 122 | formatter_class=argparse.ArgumentDefaultsHelpFormatter) 123 | parser.register('type','bool',str2bool) # add type keyword to registries 124 | 125 | parser.add_argument('--hosts', type=list, default=json.loads(os.environ['SM_HOSTS'])) 126 | parser.add_argument('--current-host', type=str, default=os.environ['SM_CURRENT_HOST']) 127 | parser.add_argument('--num-gpus', type=int, default=os.environ['SM_NUM_GPUS']) 128 | parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR']) # /opt/ml/model 129 | parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAINING']) 130 | parser.add_argument('--test', type=str, default='') # /opt/ml/input/data/test 131 | parser.add_argument('--label', type=str, default='truth', 132 | help="Name of the column that contains the target variable to predict.") 133 | 
134 | parser.add_argument('--problem_type', type=str, default=None, 135 | help=("Type of prediction problem, i.e. is this a binary/multiclass classification or " 136 | "regression problem (options: 'binary', 'multiclass', 'regression'). " 137 | "If `problem_type = None`, the prediction problem type is inferred based " 138 | "on the label values in the provided dataset.")) 139 | parser.add_argument('--eval_metric', type=str, default=None, 140 | help=("Metric by which predictions will be ultimately evaluated on test data. " 141 | "AutoGluon tunes factors such as hyperparameters, early-stopping, ensemble-weights, etc. " 142 | "in order to improve this metric on validation data. " 143 | "If `eval_metric = None`, it is automatically chosen based on `problem_type`. " 144 | "Defaults to 'accuracy' for binary and multiclass classification and " 145 | "'root_mean_squared_error' for regression. " 146 | "Otherwise, options for classification: [ " 147 | " 'accuracy', 'balanced_accuracy', 'f1', 'f1_macro', 'f1_micro', 'f1_weighted', " 148 | " 'roc_auc', 'average_precision', 'precision', 'precision_macro', 'precision_micro', 'precision_weighted', " 149 | " 'recall', 'recall_macro', 'recall_micro', 'recall_weighted', 'log_loss', 'pac_score']. " 150 | "Options for regression: ['root_mean_squared_error', 'mean_squared_error', " 151 | "'mean_absolute_error', 'median_absolute_error', 'r2']. " 152 | "For more information on these options, see `sklearn.metrics`: " 153 | "https://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics " 154 | "You can also pass your own evaluation function here as long as it follows formatting of the functions " 155 | "defined in `autogluon/utils/tabular/metrics/`. ")) 156 | parser.add_argument('--stopping_metric', type=str, default=None, 157 | help=("Metric which models use to early stop to avoid overfitting. " 158 | "`stopping_metric` is not used by weighted ensembles, instead weighted ensembles maximize `eval_metric`. 
" 159 | "Defaults to `eval_metric` value except when `eval_metric='roc_auc'`, where it defaults to `log_loss`.")) 160 | parser.add_argument('--auto_stack', type='bool', default=False, 161 | help=("Whether to have AutoGluon automatically attempt to select optimal " 162 | "num_bagging_folds and stack_ensemble_levels based on data properties. " 163 | "Note: Overrides num_bagging_folds and stack_ensemble_levels values. " 164 | "Note: This can increase training time by up to 20x, but can produce much better results. " 165 | "Note: This can increase inference time by up to 20x.")) 166 | parser.add_argument('--hyperparameter_tune', type='bool', default=False, 167 | help=("Whether to tune hyperparameters or just use fixed hyperparameter values " 168 | "for each model. Setting as True will increase `fit()` runtimes.")) 169 | parser.add_argument('--feature_prune', type='bool', default=False, 170 | help="Whether or not to perform feature selection.") 171 | parser.add_argument('--holdout_frac', type=float, default=None, 172 | help=("Fraction of train_data to holdout as tuning data for optimizing hyperparameters " 173 | "(ignored unless `tuning_data = None`, ignored if `num_bagging_folds != 0`). " 174 | "Default value is selected based on the number of rows in the training data. " 175 | "Default values range from 0.2 at 2,500 rows to 0.01 at 250,000 rows. " 176 | "Default value is doubled if `hyperparameter_tune = True`, up to a maximum of 0.2. " 177 | "Disabled if `num_bagging_folds >= 2`.")) 178 | parser.add_argument('--num_bagging_folds', type=int, default=0, 179 | help=("Number of folds used for bagging of models. When `num_bagging_folds = k`, " 180 | "training time is roughly increased by a factor of `k` (set = 0 to disable bagging). " 181 | "Disabled by default, but we recommend values between 5-10 to maximize predictive performance. " 182 | "Increasing num_bagging_folds will result in models with lower bias but that are more prone to overfitting. 
" 183 | "Values > 10 may produce diminishing returns, and can even harm overall results due to overfitting. " 184 | "To further improve predictions, avoid increasing num_bagging_folds much beyond 10 " 185 | "and instead increase num_bagging_sets. ")) 186 | parser.add_argument('--num_bagging_sets', type=int, default=None, 187 | help=("Number of repeats of kfold bagging to perform (values must be >= 1). " 188 | "Total number of models trained during bagging = num_bagging_folds * num_bagging_sets. " 189 | "Defaults to 1 if time_limits is not specified, otherwise 20 " 190 | "(always disabled if num_bagging_folds is not specified). " 191 | "Values greater than 1 will result in superior predictive performance, " 192 | "especially on smaller problems and with stacking enabled. " 193 | "Increasing num_bagged_sets reduces the bagged aggregated variance without " 194 | "increasing the amount each model is overfit.")) 195 | parser.add_argument('--stack_ensemble_levels', type=int, default=0, 196 | help=("Number of stacking levels to use in stack ensemble. " 197 | "Roughly increases model training time by factor of `stack_ensemble_levels+1` " 198 | "(set = 0 to disable stack ensembling). " 199 | "Disabled by default, but we recommend values between 1-3 to maximize predictive performance. " 200 | "To prevent overfitting, this argument is ignored unless you have also set `num_bagging_folds >= 2`.")) 201 | parser.add_argument('--hyperparameters', type=lambda s: ast.literal_eval(s), default=None, 202 | help="Refer to docs: https://autogluon.mxnet.io/api/autogluon.task.html") 203 | parser.add_argument('--cache_data', type='bool', default=True, 204 | help=("Whether the predictor returned by this `fit()` call should be able to be further trained " 205 | "via another future `fit()` call. 
" 206 | "When enabled, the training and validation data are saved to disk for future reuse.")) 207 | parser.add_argument('--time_limits', type=int, default=None, 208 | help=("Approximately how long `fit()` should run for (wallclock time in seconds)." 209 | "If not specified, `fit()` will run until all models have completed training, " 210 | "but will not repeatedly bag models unless `num_bagging_sets` is specified.")) 211 | parser.add_argument('--num_trials', type=int, default=None, 212 | help=("Maximal number of different hyperparameter settings of each " 213 | "model type to evaluate during HPO. (only matters if " 214 | "hyperparameter_tune = True). If both `time_limits` and " 215 | "`num_trials` are specified, `time_limits` takes precedent.")) 216 | parser.add_argument('--search_strategy', type=str, default='random', 217 | help=("Which hyperparameter search algorithm to use. " 218 | "Options include: 'random' (random search), 'skopt' " 219 | "(SKopt Bayesian optimization), 'grid' (grid search), " 220 | "'hyperband' (Hyperband), 'rl' (reinforcement learner)")) 221 | parser.add_argument('--search_options', type=lambda s: ast.literal_eval(s), default=None, 222 | help="Auxiliary keyword arguments to pass to the searcher that performs hyperparameter optimization.") 223 | parser.add_argument('--nthreads_per_trial', type=int, default=None, 224 | help="How many CPUs to use in each training run of an individual model. This is automatically determined by AutoGluon when left as None (based on available compute).") 225 | parser.add_argument('--ngpus_per_trial', type=int, default=None, 226 | help="How many GPUs to use in each trial (ie. single training run of a model). 
This is automatically determined by AutoGluon when left as None.") 227 | parser.add_argument('--dist_ip_addrs', type=list, default=None, 228 | help="List of IP addresses corresponding to remote workers, in order to leverage distributed computation.") 229 | parser.add_argument('--visualizer', type=str, default='none', 230 | help=("How to visualize the neural network training progress during `fit()`. " 231 | "Options: ['mxboard', 'tensorboard', 'none'].")) 232 | parser.add_argument('--verbosity', type=int, default=2, 233 | help=("Verbosity levels range from 0 to 4 and control how much information is printed during fit(). " 234 | "Higher levels correspond to more detailed print statements (you can set verbosity = 0 to suppress warnings). " 235 | "If using logging, you can alternatively control amount of information printed via `logger.setLevel(L)`, " 236 | "where `L` ranges from 0 to 50 (Note: higher values of `L` correspond to fewer print statements, " 237 | "the opposite of verbosity levels).")) 238 | parser.add_argument('--debug', type='bool', default=False, 239 | help=("Whether to set logging level to DEBUG")) 240 | 241 | parser.add_argument('--feature_importance', type='bool', default=True) 242 | 243 | return parser.parse_args() 244 | 245 | 246 | def set_experiment_config(experiment_basename = None): 247 | ''' 248 | Optionally takes a base name for the experiment. Has a hard dependency on boto3 installation. 249 | Creates a new experiment using the basename, otherwise simply uses autogluon as the basename. 250 | May run into issues on Experiments' requirements for basename config downstream. 251 | ''' 252 | now = int(time.time()) 253 | 254 | if experiment_basename: 255 | experiment_name = '{}-autogluon-{}'.format(experiment_basename, now) 256 | else: 257 | experiment_name = 'autogluon-{}'.format(now) 258 | 259 | try: 260 | client = boto3.Session().client('sagemaker') 261 | except Exception: 262 | print ('You need to install boto3 to create an experiment. 
Try pip install --upgrade boto3') 263 | return '' 264 | 265 | try: 266 | Experiment.create(experiment_name=experiment_name, 267 | description="Running AutoGluon Tabular with SageMaker Experiments", 268 | sagemaker_boto_client=client) 269 | print ('Created an experiment named {}, you should be able to see this in SageMaker Studio right now.'.format(experiment_name)) 270 | 271 | except Exception: 272 | print ('Could not create the experiment. Is your basename properly configured? Also try installing the sagemaker experiments SDK with pip install sagemaker-experiments.') 273 | return '' 274 | 275 | return experiment_name 276 | 277 | if __name__ == "__main__": 278 | start = timer() 279 | 280 | args = parse_args() 281 | 282 | # Print SageMaker args 283 | print('\n====== args ======') 284 | for k,v in vars(args).items(): 285 | print(f'{k}, type: {type(v)}, value: {v}') 286 | print() 287 | 288 | train(args) 289 | 290 | # Package inference code with model export 291 | subprocess.call('mkdir /opt/ml/model/code'.split()) 292 | subprocess.call('cp /opt/ml/code/inference.py /opt/ml/model/code/'.split()) 293 | 294 | elapsed_time = round(timer()-start,3) 295 | print(f'Elapsed time: {elapsed_time} seconds') 296 | print('===== Training Completed =====') 297 | -------------------------------------------------------------------------------- /Starter Notebooks/MLOps and Hosting/test_set.csv: -------------------------------------------------------------------------------- 1 | area_se,concavity_se,radius_worst,compactness_worst,concavity_worst,compactness_mean,texture_se,area_worst,perimeter_mean,fractal_dimension_worst,concave points_worst,fractal_dimension_mean,concave points_mean,texture_mean,radius_se,smoothness_se,concavity_mean,texture_worst,perimeter_worst,symmetry_mean,smoothness_mean,smoothness_worst,concave points_se,fractal_dimension_se,radius_mean,compactness_se,perimeter_se,symmetry_se,area_mean,symmetry_worst,truth 2 | 
45.4,0.01998,19.07,0.1871,0.2914,0.07200000000000001,1.24,1138.0,94.74,0.08216,0.1609,0.05922,0.05259,20.13,0.4727,0.005718,0.07395,30.88,123.4,0.1586,0.09867000000000001,0.1464,0.011090000000000001,0.002085,14.68,0.01162,3.195,0.0141,684.5,0.3029,1 3 | 76.36,0.0611,19.76,0.1963,0.2535,0.08642000000000001,1.305,1228.0,117.4,0.06558,0.09181,0.0534,0.05778,21.84,0.6362,0.005529999999999999,0.1103,24.7,129.1,0.177,0.07371,0.08822,0.01444,0.005036,18.08,0.05296,4.312,0.0214,1024.0,0.2369,1 4 | 15.26,0.02828,14.24,0.2685,0.2866,0.08575,0.5308,623.7,85.79,0.0732,0.09172999999999999,0.05594,0.02864,13.72,0.1833,0.0042710000000000005,0.050769999999999996,17.37,96.59,0.1617,0.08363,0.1166,0.008468,0.002613,13.28,0.020730000000000002,1.5919999999999999,0.01461,541.8,0.2736,0 5 | 86.22,0.07117000000000001,25.74,0.8681,0.9387,0.27699999999999997,1.595,1821.0,140.1,0.124,0.265,0.07016,0.152,29.33,0.726,0.006522,0.3514,39.42,184.6,0.2397,0.1178,0.165,0.016640000000000002,0.006185,20.6,0.061579999999999996,5.772,0.02324,1265.0,0.4087,1 6 | 32.74,0.01608,15.66,0.1252,0.1117,0.05895,1.046,750.0,89.78,0.07234,0.07453,0.05898,0.029439999999999997,15.98,0.3892,0.007976,0.035339999999999996,21.58,101.2,0.1714,0.08457999999999999,0.1195,0.009046,0.00283,14.04,0.01295,2.6439999999999997,0.02005,611.2,0.2725,0 7 | 20.64,0.016980000000000002,14.96,0.1346,0.1742,0.07957,0.9462,686.5,87.76,0.0696,0.09077,0.06088,0.0316,17.64,0.2431,0.0032450000000000005,0.04548,23.53,95.78,0.1732,0.0995,0.1199,0.009233,0.001524,13.7,0.008186,1.564,0.01285,571.1,0.2518,0 8 | 21.84,0.02153,16.01,0.1388,0.17,0.08345,1.636,783.6,96.12,0.06599,0.1017,0.057479999999999996,0.049510000000000005,20.21,0.2323,0.005415,0.06824,28.48,103.9,0.1487,0.09587000000000001,0.1216,0.01183,0.001812,14.87,0.01371,1.5959999999999999,0.01959,680.9,0.2369,0 9 | 
28.93,0.007936,15.05,0.1421,0.07003,0.07426,1.9240000000000002,705.6,86.34,0.07675,0.07762999999999999,0.060160000000000005,0.032639999999999995,30.72,0.3408,0.005841,0.028189999999999996,41.61,96.69,0.1375,0.09245,0.1172,0.009128,0.0029850000000000002,13.38,0.012459999999999999,2.287,0.01564,557.2,0.2196,0 10 | 23.92,0.005717,15.53,0.1109,0.053070000000000006,0.05306,1.081,749.9,90.03,0.07082999999999999,0.0589,0.057,0.02733,12.88,0.2571,0.006692,0.01765,18.0,98.4,0.1373,0.09308999999999999,0.1281,0.006626999999999999,0.0024760000000000003,14.11,0.01132,1.558,0.014159999999999999,616.5,0.21,0 11 | 75.09,0.040619999999999996,23.86,0.3597,0.5179,0.1644,1.216,1760.0,137.8,0.08999,0.2113,0.062220000000000004,0.1121,21.24,0.5904,0.006666,0.2188,30.76,163.2,0.1848,0.1085,0.1464,0.014790000000000001,0.0037270000000000003,20.59,0.02791,4.206,0.011170000000000001,1320.0,0.248,1 12 | 27.48,0.1197,11.26,0.295,0.3486,0.1294,2.261,390.2,64.6,0.1162,0.0991,0.08116,0.037160000000000006,18.06,0.4311,0.01286,0.1307,24.39,73.07,0.1669,0.09698999999999999,0.1301,0.0246,0.01792,9.904,0.08807999999999999,3.1319999999999997,0.0388,302.4,0.2614,0 13 | 20.72,0.014819999999999998,15.61,0.1011,0.1101,0.05016,0.6232,760.2,94.66,0.06142,0.07955,0.05347999999999999,0.02541,14.7,0.2182,0.0067079999999999996,0.03416,17.58,101.7,0.1659,0.08472,0.1139,0.01056,0.001779,14.81,0.01197,1.6769999999999998,0.0158,680.7,0.2334,0 14 | 14.41,0.03113,13.72,0.1975,0.145,0.09965,0.6412,576.0,84.08,0.1009,0.0585,0.07238,0.020980000000000002,14.23,0.1814,0.005231,0.037380000000000004,16.91,87.38,0.1652,0.09462000000000001,0.1142,0.007315,0.0057009999999999995,12.99,0.02305,0.9219,0.016390000000000002,514.3,0.2432,0 15 | 19.63,0.02197,10.57,0.2097,0.09996000000000001,0.1225,1.13,326.6,59.96,0.08982000000000001,0.07262,0.07696,0.02421,13.9,0.3538,0.01546,0.033319999999999995,17.84,67.84,0.2197,0.1371,0.185,0.0158,0.003901,9.295,0.0254,2.388,0.03997,257.8,0.3681,0 16 | 
67.1,0.02134,20.33,0.2817,0.2432,0.1791,2.91,1298.0,129.1,0.09203,0.1841,0.07224,0.1469,26.29,0.519,0.0075450000000000005,0.1937,32.72,141.3,0.1634,0.1215,0.1392,0.018430000000000002,0.01039,19.1,0.0605,5.801,0.030560000000000004,1132.0,0.2311,1 17 | 7.228,0.1535,10.85,0.3619,0.603,0.166,1.2309999999999999,351.9,70.15,0.12,0.1465,0.0845,0.059410000000000004,20.22,0.1115,0.008499,0.228,22.82,76.51,0.2188,0.09072999999999999,0.1143,0.029189999999999997,0.0122,10.57,0.07643,2.363,0.01617,338.3,0.2597,0 18 | 9.597000000000001,0.027569999999999997,11.94,0.3898,0.3365,0.1069,0.5379999999999999,433.1,70.41,0.10800000000000001,0.07966000000000001,0.06837,0.01571,15.62,0.1482,0.0044740000000000005,0.05115,19.35,80.78,0.1861,0.1007,0.1332,0.006691,0.004672,10.88,0.030930000000000003,1.301,0.01212,358.9,0.2581,0 19 | 57.72,0.05839,17.73,0.2116,0.3344,0.1283,1.4569999999999999,975.2,107.5,0.07952999999999999,0.1047,0.06532,0.07981,21.88,0.5706,0.01056,0.1799,25.21,113.7,0.1869,0.1165,0.1426,0.011859999999999999,0.006187,16.26,0.03756,2.9610000000000003,0.04022,826.8,0.2736,1 20 | 17.81,0.0,8.952,0.07767,0.0,0.05847,2.7769999999999997,240.1,54.09,0.08116,0.0,0.07359,0.0,18.6,0.3368,0.02075,0.0,22.44,56.65,0.2163,0.1074,0.1347,0.0,0.0068200000000000005,8.597000000000001,0.01403,2.222,0.06146,221.2,0.3142,0 21 | 116.4,0.1091,23.37,0.6164,0.7681,0.3454,1.885,1623.0,143.7,0.09963999999999999,0.2508,0.08142,0.1604,23.97,0.9317,0.01038,0.3754,31.72,170.3,0.2906,0.1286,0.1639,0.02593,0.005987,20.18,0.06835,8.649,0.07895,1245.0,0.544,1 22 | 104.9,0.09723,24.09,0.7444,0.7242,0.21899999999999997,1.666,1651.0,128.3,0.1038,0.2493,0.06343,0.09961,24.81,0.9811,0.006548,0.2107,33.17,177.4,0.231,0.09081,0.1247,0.02638,0.007645999999999999,19.07,0.1006,8.83,0.053329999999999995,1104.0,0.467,1 23 | 
14.47,0.01556,12.57,0.1,0.08803,0.05562,1.926,489.5,75.27,0.06576,0.04306,0.0578,0.01553,17.39,0.1859,0.007831,0.023530000000000002,26.48,79.57,0.1718,0.1007,0.1356,0.00624,0.0019879999999999997,11.81,0.008776,1.011,0.031389999999999994,428.9,0.32,0 24 | 16.35,0.08158,14.39,0.5849,0.7727,0.1346,0.4402,639.1,84.95,0.1178,0.1561,0.06409,0.0398,14.11,0.2025,0.005501,0.1374,17.7,105.0,0.1596,0.0876,0.1254,0.0137,0.007555,12.89,0.055920000000000004,2.3930000000000002,0.01266,512.2,0.2639,0 25 | 23.29,0.07927000000000001,13.9,0.2317,0.3344,0.0958,1.389,595.6,84.08,0.07127,0.1017,0.05935,0.0339,15.7,0.2913,0.006418000000000001,0.1115,19.69,92.12,0.1432,0.07818,0.09926,0.017740000000000002,0.003696,12.89,0.03961,2.347,0.01878,516.6,0.1999,0 26 | 17.58,0.0151,14.35,0.1063,0.139,0.05205,1.35,632.9,84.1,0.06788,0.06005,0.05584,0.02068,25.25,0.2084,0.005768,0.027719999999999998,34.23,91.29,0.1619,0.08791,0.1289,0.006451,0.0018280000000000002,13.21,0.008081999999999999,1.314,0.01347,537.9,0.2444,0 27 | 130.8,0.07649,23.68,0.3391,0.4932,0.1838,1.743,1696.0,134.7,0.09469,0.1923,0.07468999999999999,0.128,23.86,1.072,0.007964,0.2283,29.43,158.8,0.2249,0.10800000000000001,0.1347,0.01936,0.005928,20.09,0.04732,7.803999999999999,0.027360000000000002,1247.0,0.3294,1 28 | 24.25,0.02905,15.67,0.4166,0.5006,0.1231,0.8937,759.4,85.98,0.1179,0.2088,0.06777000000000001,0.0734,18.66,0.2871,0.006532,0.1226,27.95,102.8,0.2128,0.1158,0.1786,0.01215,0.003643,13.17,0.02336,1.8969999999999998,0.01743,534.6,0.39,1 29 | 13.56,0.030789999999999998,13.94,0.1508,0.2298,0.045239999999999995,1.601,602.0,84.13,0.07198,0.0497,0.05635,0.01105,17.43,0.163,0.006261,0.04336,27.82,88.28,0.1487,0.07215,0.1101,0.005383,0.00225,13.2,0.01569,0.873,0.01962,541.6,0.2767,0 30 | 27.41,0.019469999999999998,16.51,0.1376,0.1611,0.07214,1.385,826.4,94.7,0.06956,0.1095,0.0568,0.03027,25.42,0.3031,0.004775,0.04105,32.29,107.4,0.184,0.08275,0.106,0.01269,0.0026260000000000003,14.74,0.01172,2.177,0.0187,668.6,0.2722,0 31 | 
49.45,0.052779999999999994,16.46,0.3635,0.3219,0.1836,1.511,809.2,98.22,0.09208,0.1108,0.07406,0.063,13.98,0.5462,0.009976,0.145,18.34,114.1,0.2086,0.1031,0.1312,0.0158,0.005444,14.69,0.05244,4.795,0.02653,656.1,0.2827,0 32 | 48.31,0.028130000000000002,19.26,0.2394,0.3791,0.1223,0.7859,1156.0,101.7,0.08019,0.1514,0.057960000000000005,0.08087000000000001,19.48,0.4743,0.00624,0.1466,26.0,124.9,0.1931,0.1092,0.1546,0.01093,0.002461,15.46,0.01484,3.094,0.013969999999999998,748.9,0.2837,1 33 | 10.8,0.02758,11.92,0.221,0.2299,0.09097000000000001,0.8225,440.0,71.49,0.0908,0.1075,0.06907,0.03341,14.96,0.1601,0.007416,0.053970000000000004,19.9,79.76,0.1776,0.1033,0.1418,0.0101,0.0029170000000000003,11.06,0.01877,1.355,0.02348,373.9,0.3301,0 34 | 23.02,0.02889,18.49,0.5564,0.5703,0.1639,1.278,1035.0,103.7,0.1204,0.2014,0.0665,0.08399,33.56,0.2419,0.005345,0.1751,49.54,126.3,0.2091,0.1063,0.1883,0.01022,0.0033590000000000004,15.53,0.02556,1.903,0.009947,744.9,0.3512,1 35 | 24.91,0.04815,15.89,0.4238,0.5186,0.1098,1.0190000000000001,799.6,93.63,0.1014,0.1447,0.06125,0.055979999999999995,21.72,0.28600000000000003,0.005878,0.1319,30.36,116.2,0.1885,0.09823,0.1446,0.011609999999999999,0.0040219999999999995,14.25,0.02995,2.657,0.02028,633.0,0.3591,1 36 | 93.54,0.05081,21.31,0.2117,0.3446,0.1066,1.849,1403.0,122.1,0.07421,0.149,0.05699,0.07731,20.25,0.8529,0.01075,0.149,27.26,139.9,0.1697,0.0944,0.1338,0.01911,0.004217,18.61,0.027219999999999998,5.632000000000001,0.022930000000000002,1094.0,0.2341,1 37 | 29.84,0.02071,15.3,0.2264,0.1326,0.1126,1.492,706.7,91.38,0.08321,0.1048,0.06171,0.043039999999999995,27.15,0.3645,0.007256,0.04462,33.17,100.2,0.1537,0.09929,0.1241,0.01626,0.005304,14.05,0.02678,2.888,0.0208,600.4,0.225,0 38 | 
6.8020000000000005,0.03735,9.965,0.1887,0.1868,0.060529999999999994,1.182,301.0,59.75,0.09206,0.025639999999999996,0.06724,0.005128,21.68,0.1186,0.005515,0.03735,27.99,66.61,0.1274,0.07969,0.1086,0.005128,0.004582999999999999,9.397,0.026739999999999996,1.1740000000000002,0.01951,268.8,0.2376,0
17.74,0.02018,11.95,0.1223,0.09755,0.05139,1.239,441.2,68.26,0.06769,0.03413,0.05687999999999999,0.007875,14.97,0.2525,0.006547,0.02251,20.72,77.79,0.1399,0.07793,0.1076,0.005612,0.00236,10.75,0.01781,1.806,0.01671,355.3,0.23,0
70.01,0.03457,24.47,0.2761,0.4146,0.1074,1.189,1872.0,134.4,0.08327999999999999,0.1563,0.055920000000000004,0.0834,27.81,0.524,0.00502,0.1554,37.38,162.7,0.1448,0.09158999999999999,0.1223,0.01091,0.0028870000000000002,20.51,0.02062,3.767,0.012980000000000002,1319.0,0.2437,1
15.7,0.01985,10.23,0.1148,0.08867,0.06492,0.9768,314.9,60.34,0.07773,0.062270000000000006,0.06905,0.02076,12.44,0.2773,0.009606,0.029560000000000003,15.66,65.13,0.1815,0.1024,0.1324,0.014209999999999999,0.002968,9.504,0.01432,1.909,0.02027,273.9,0.245,0
32.96,0.000692,14.23,0.06191,0.0018449999999999999,0.03789,1.214,624.1,82.61,0.06289,0.01111,0.05501,0.0041670000000000006,19.31,0.40399999999999997,0.007490999999999999,0.000692,22.25,90.24,0.1819,0.0806,0.1021,0.0041670000000000006,0.00299,13.05,0.008593,2.595,0.0219,527.2,0.2439,0
14.16,0.013430000000000001,13.3,0.046189999999999995,0.04833,0.03766,2.342,545.9,82.61,0.061689999999999995,0.05013,0.058629999999999995,0.029230000000000003,18.42,0.1839,0.004352,0.02562,22.81,84.46,0.1467,0.08983,0.09701,0.011640000000000001,0.001777,13.03,0.004899000000000001,1.17,0.02671,523.8,0.1987,0
14.55,0.010790000000000001,14.67,0.1582,0.105,0.06059,0.9234,656.7,87.21,0.08025,0.08586,0.05952999999999999,0.017230000000000002,16.34,0.1872,0.004477,0.01857,23.19,96.08,0.1353,0.07685,0.1089,0.007956,0.0025510000000000003,13.64,0.011770000000000001,1.449,0.01325,571.8,0.2346,0
29.63,0.0058119999999999995,15.63,0.1141,0.04753,0.0633,1.6780000000000002,749.1,88.68,0.06911,0.0589,0.056729999999999996,0.022930000000000002,19.6,0.3419,0.005836,0.01342,28.01,100.9,0.1555,0.08684,0.1118,0.007039,0.002326,13.85,0.01095,2.331,0.02014,592.6,0.2513,0
20.86,0.055529999999999996,12.26,0.2118,0.1797,0.11199999999999999,1.768,457.8,74.65,0.08134,0.06917999999999999,0.06782,0.025939999999999998,14.44,0.2784,0.01215,0.06737,19.68,78.78,0.1818,0.09984,0.1345,0.01494,0.0055119999999999995,11.54,0.04112,1.6280000000000001,0.0184,402.9,0.2329,0
33.76,0.01121,16.11,0.1766,0.09189,0.08606,1.111,803.7,94.57,0.07246,0.06946000000000001,0.058660000000000004,0.02957,24.02,0.3721,0.004868,0.03102,29.11,102.9,0.1685,0.08974,0.1115,0.008606,0.002893,14.62,0.01818,2.279,0.02085,662.7,0.2522,0
67.34,0.026260000000000002,20.05,0.2119,0.2318,0.09182,1.391,1260.0,109.3,0.07228,0.1474,0.05534,0.06576,18.8,0.599,0.006123,0.08422,26.3,130.7,0.1893,0.08865,0.1168,0.016040000000000002,0.003493,16.78,0.0247,4.129,0.020909999999999998,886.3,0.281,1
69.47,0.04252,23.32,0.7394,0.6566,0.183,1.041,1681.0,114.2,0.1339,0.1899,0.06487000000000001,0.07944,24.52,0.5907,0.0058200000000000005,0.1692,33.82,151.6,0.1927,0.1071,0.1585,0.01127,0.006299,17.2,0.05616,3.705,0.015269999999999999,929.4,0.3313,1
16.16,0.020069999999999998,13.67,0.2003,0.2267,0.07165,1.255,567.9,76.95,0.07923999999999999,0.07632,0.05968,0.01863,15.65,0.2271,0.0059689999999999995,0.041510000000000005,24.9,87.78,0.2079,0.09723,0.1377,0.007026999999999999,0.002607,12.0,0.018119999999999997,1.4409999999999998,0.019719999999999998,443.3,0.3379,0
21.83,0.01831,17.71,0.1722,0.231,0.08501,0.6372,947.9,104.3,0.07012,0.1129,0.05875,0.04528,14.86,0.2387,0.003958,0.055,19.58,115.9,0.1735,0.09495,0.1206,0.008747,0.001621,16.14,0.012459999999999999,1.729,0.015,800.0,0.2778,0
20.53,0.0139,15.93,0.2043,0.2085,0.06031,0.8265,787.9,86.18,0.07146,0.1112,0.05587,0.020309999999999998,21.58,0.2385,0.00328,0.0311,30.25,102.5,0.1784,0.08162,0.1094,0.006881,0.001286,13.44,0.01102,1.5719999999999998,0.0138,563.0,0.2994,1
24.62,0.02586,13.11,0.1676,0.1755,0.09362000000000001,2.174,525.1,73.81,0.08851,0.061270000000000005,0.07005,0.022330000000000003,20.97,0.3251,0.01037,0.04591,32.16,84.53,0.1842,0.1102,0.1557,0.0075060000000000005,0.0039759999999999995,11.45,0.01706,2.077,0.01816,401.5,0.2762,0
68.17,0.03497,24.15,0.659,0.6091,0.1719,0.6062,1813.0,127.9,0.1123,0.1785,0.06261,0.07593,26.47,0.5558,0.0050149999999999995,0.1657,30.9,161.4,0.1853,0.09401,0.1509,0.009643,0.003896,19.27,0.03318,3.528,0.015430000000000001,1162.0,0.3672,1
25.22,0.01872,15.1,0.1751,0.1381,0.08165,1.3730000000000002,699.4,86.6,0.06602999999999999,0.07911,0.0571,0.0278,18.3,0.295,0.005884,0.03974,25.94,97.59,0.1638,0.1022,0.1339,0.009366,0.0018170000000000003,13.45,0.01491,2.099,0.01884,555.1,0.2678,0
19.62,0.003297,11.98,0.09669,0.01335,0.06779,1.966,436.1,71.94,0.06522,0.02022,0.06027999999999999,0.0075829999999999995,19.86,0.2976,0.01289,0.005006,25.78,76.91,0.19399999999999998,0.1054,0.1424,0.004967,0.001963,11.22,0.011040000000000001,1.959,0.04243,387.3,0.3292,0
43.4,0.021509999999999998,17.98,0.1546,0.2644,0.06287999999999999,1.147,993.6,85.84,0.07371,0.11599999999999999,0.05671,0.03438,19.63,0.4697,0.0060030000000000005,0.05857999999999999,29.87,116.6,0.1598,0.09047999999999999,0.1401,0.009443,0.0018679999999999999,13.43,0.01063,3.142,0.0152,565.4,0.2884,1
15.82,0.01123,12.85,0.053320000000000006,0.04116,0.032119999999999996,0.5996,513.1,77.25,0.06037000000000001,0.01852,0.05649,0.005051,14.08,0.2113,0.005343,0.01123,16.47,81.6,0.1673,0.07733999999999999,0.1001,0.005051,0.0009502000000000001,12.18,0.005767,1.4380000000000002,0.01977,461.4,0.2293,0
14.34,0.04156,12.4,0.201,0.2596,0.08227999999999999,1.166,467.6,73.99,0.0918,0.07431,0.06573999999999999,0.01969,14.59,0.2034,0.004957,0.053079999999999995,21.9,82.04,0.1779,0.1046,0.1352,0.008038,0.003614,11.49,0.02114,1.567,0.018430000000000002,404.9,0.2941,0
54.04,0.02291,20.58,0.1202,0.2249,0.06722,1.214,1261.0,114.2,0.061110000000000005,0.1185,0.05025,0.05596,20.01,0.5506,0.004024,0.07293,27.83,129.2,0.2129,0.08402000000000001,0.1072,0.009863,0.001902,17.95,0.008422,3.3569999999999998,0.05014,982.0,0.4882,1
67.74,0.04256,23.96,0.3725,0.5936,0.1448,1.4080000000000001,1740.0,128.1,0.09009,0.20600000000000002,0.06115,0.1194,18.82,0.5659,0.005288,0.2256,30.39,153.9,0.1823,0.1089,0.1514,0.01176,0.003211,19.44,0.02833,3.6310000000000002,0.017169999999999998,1167.0,0.3266,1
21.2,0.031139999999999998,15.85,0.2735,0.3103,0.1021,0.7394,766.9,93.97,0.07683,0.1599,0.06081,0.05532,15.18,0.2406,0.005706,0.08487,19.85,108.6,0.1724,0.0997,0.1316,0.01493,0.0025280000000000003,14.44,0.022969999999999997,2.12,0.01454,640.1,0.2691,0
155.8,0.044969999999999996,31.01,0.4126,0.5820000000000001,0.1682,0.9635,2944.0,153.5,0.08677,0.2593,0.06309,0.1237,26.97,1.058,0.006428,0.195,34.51,206.0,0.1909,0.09509,0.1481,0.017159999999999998,0.003053,23.21,0.028630000000000003,7.247000000000001,0.0159,1670.0,0.3103,1
116.2,0.0889,20.96,0.3903,0.3639,0.2458,3.568,1332.0,132.4,0.1023,0.1767,0.078,0.1118,24.8,0.9555,0.003139,0.2065,29.94,151.7,0.2397,0.0974,0.1037,0.0409,0.01284,19.17,0.08297,11.07,0.04484,1123.0,0.3176,1
99.04,0.0395,23.69,0.1922,0.3215,0.1034,2.463,1731.0,131.2,0.06637,0.1628,0.05533,0.09791,28.25,0.7655,0.005769,0.14400000000000002,38.25,155.0,0.1752,0.0978,0.1166,0.01678,0.0024980000000000002,20.13,0.02423,5.202999999999999,0.01898,1261.0,0.2572,1
36.74,0.02623,19.96,0.11599999999999999,0.221,0.058839999999999996,0.828,1236.0,120.9,0.057370000000000004,0.1294,0.049960000000000004,0.058429999999999996,19.98,0.3283,0.007571,0.0802,24.3,129.0,0.155,0.08922999999999999,0.1243,0.01463,0.0016760000000000002,18.81,0.01114,2.363,0.0193,1102.0,0.2567,1
139.9,0.035710000000000006,29.92,0.4188,0.4658,0.2106,0.9004,2642.0,165.5,0.09671,0.2475,0.06738999999999999,0.1471,21.6,0.9915,0.004989,0.231,26.93,205.7,0.1991,0.10300000000000001,0.1342,0.015969999999999998,0.00476,24.63,0.032119999999999996,7.05,0.01879,1841.0,0.3157,1
22.07,0.0073019999999999995,15.33,0.1513,0.062310000000000004,0.06945,1.5030000000000001,715.5,89.79,0.07617,0.07962999999999999,0.05835,0.01896,21.25,0.2589,0.007389,0.01462,30.28,98.27,0.1517,0.0907,0.1287,0.01004,0.002925,14.03,0.01383,1.6669999999999998,0.012629999999999999,603.4,0.2226,0
100.4,0.04093,23.23,0.2534,0.3092,0.1313,1.736,1645.0,134.7,0.06386,0.1613,0.054189999999999995,0.1015,20.67,0.8336,0.0049380000000000005,0.1523,27.15,152.0,0.2166,0.09156,0.1097,0.01699,0.002719,20.47,0.030889999999999997,5.167999999999999,0.02816,1299.0,0.322,1
103.6,0.059039999999999995,24.19,0.3416,0.3703,0.1669,1.892,1671.0,133.7,0.07632,0.2152,0.0602,0.1265,26.83,0.9761,0.008439,0.1641,33.81,160.0,0.1875,0.09905,0.1278,0.02536,0.004286,20.2,0.04674,7.127999999999999,0.0371,1234.0,0.3271,1
54.18,0.03188,20.96,0.4233,0.4784,0.2022,1.073,1315.0,108.1,0.1142,0.2073,0.07356,0.1028,20.68,0.5692,0.007026,0.1722,31.48,136.8,0.2164,0.11699999999999999,0.1789,0.012969999999999999,0.004142,16.13,0.02501,3.8539999999999996,0.016890000000000002,798.8,0.3706,1
24.2,0.1027,10.06,0.3748,0.4609,0.1972,1.911,297.1,60.07,0.1055,0.1145,0.08743,0.04908,18.9,0.4653,0.009845,0.1975,23.4,68.62,0.233,0.09967999999999999,0.1221,0.025269999999999997,0.007877,9.042,0.0659,3.7689999999999997,0.034910000000000004,244.5,0.3135,0
34.44,0.037630000000000004,15.65,0.4706,0.4425,0.1353,1.8090000000000002,768.9,81.15,0.1205,0.1459,0.06937,0.04562,26.86,0.4053,0.009098,0.1085,39.34,101.7,0.1943,0.1034,0.1785,0.01321,0.005672,12.34,0.03845,2.642,0.01878,477.4,0.3215,1
44.41,0.03248,19.28,0.2947,0.3597,0.1319,1.232,1121.0,106.9,0.08199999999999999,0.1583,0.06277,0.08488,20.71,0.4375,0.006697,0.1478,30.38,129.8,0.1948,0.1169,0.159,0.013919999999999998,0.0027890000000000002,16.27,0.02083,3.27,0.015359999999999999,813.7,0.3103,1
53.16,0.03059,23.79,0.3749,0.4316,0.1442,0.9951,1628.0,127.2,0.07787000000000001,0.2252,0.05892000000000001,0.09464,18.18,0.4709,0.005654,0.1626,28.65,152.4,0.1893,0.1037,0.1518,0.01499,0.0019649999999999997,19.4,0.02199,2.903,0.01623,1145.0,0.359,1
67.78,0.05042,20.88,0.3559,0.5588,0.1496,1.3980000000000001,1344.0,112.8,0.08482,0.1847,0.06382,0.1203,23.98,0.6009,0.008268000000000001,0.2417,32.09,136.1,0.2248,0.1197,0.1634,0.01112,0.003854,17.02,0.03082,3.9989999999999997,0.02102,899.3,0.353,1
18.02,0.005832,13.64,0.1352,0.04506,0.05794,1.1520000000000001,562.6,76.66,0.08083,0.05093,0.06047999999999999,0.008487999999999999,18.9,0.243,0.00718,0.007509999999999999,27.06,86.54,0.1555,0.08386,0.1289,0.005495,0.002754,12.06,0.01096,1.5590000000000002,0.019819999999999997,445.3,0.28800000000000003,0
23.94,0.07743,15.09,1.058,1.105,0.2396,1.599,711.4,83.97,0.2075,0.221,0.08242999999999999,0.08542999999999999,24.04,0.2976,0.007148999999999999,0.2273,40.68,97.65,0.203,0.1186,0.1853,0.01432,0.01008,12.46,0.07217,2.039,0.01789,475.9,0.4366,1
14.67,0.016980000000000002,14.5,0.2776,0.18899999999999997,0.127,0.7477,630.5,85.63,0.08183,0.07282999999999999,0.06811,0.0311,15.71,0.1852,0.004097,0.04568,20.49,96.09,0.1967,0.1075,0.1312,0.006490000000000001,0.002425,13.08,0.01898,1.383,0.01678,520.0,0.3184,0
23.52,0.04312,10.01,0.1678,0.1397,0.08751,2.265,310.1,59.2,0.0849,0.05087,0.06963,0.0218,13.86,0.4098,0.008738,0.059879999999999996,19.23,65.59,0.2341,0.07721,0.09836,0.0156,0.005822,9.173,0.03938,2.608,0.04192,260.9,0.3282,0
17.25,0.007078,15.85,0.1564,0.1206,0.07255,0.4801,773.4,87.76,0.07782,0.08703999999999999,0.06155,0.0188,16.33,0.2047,0.0038280000000000002,0.017519999999999997,20.2,101.6,0.1631,0.09277,0.1264,0.005077,0.001697,13.68,0.007228,1.3730000000000002,0.01054,575.5,0.2806,0
11.36,0.0,9.262,0.07057000000000001,0.0,0.04276,0.7873,259.2,54.42,0.07848,0.0,0.06724,0.0,14.45,0.2204,0.009172,0.0,17.04,58.36,0.1722,0.09137999999999999,0.1162,0.0,0.003399,8.671,0.008006999999999998,1.435,0.027110000000000002,227.2,0.2592,0
42.76,0.044360000000000004,16.34,0.3089,0.2604,0.1339,0.7372,803.6,95.77,0.08473,0.1397,0.06346,0.07064,15.24,0.5115,0.005508,0.09966,18.24,109.4,0.2116,0.1132,0.1277,0.01623,0.004841,14.64,0.04412,3.8139999999999996,0.02427,651.9,0.3151,0
20.95,0.027030000000000002,13.83,0.2463,0.2434,0.09445,1.5019999999999998,574.7,79.78,0.09261,0.1205,0.06404,0.03745,21.8,0.2978,0.007112,0.06015,30.5,91.46,0.193,0.08772,0.1304,0.01293,0.004463,12.36,0.02493,2.2030000000000003,0.01958,466.1,0.2972,0
24.87,0.015359999999999999,16.25,0.303,0.1804,0.09823,0.948,809.8,97.03,0.08472,0.1489,0.05852,0.04819,19.1,0.2877,0.005332,0.0594,26.19,109.1,0.1879,0.08992,0.1313,0.01187,0.002815,14.96,0.02115,2.171,0.01522,687.3,0.2962,0
104.9,0.06591,21.57,0.4785,0.5165,0.2004,1.465,1437.0,119.0,0.1224,0.1996,0.07368999999999999,0.1002,23.33,0.9289,0.0067659999999999994,0.2136,28.87,143.6,0.1696,0.09289,0.1207,0.02311,0.0113,17.6,0.07025,5.801,0.016730000000000002,980.5,0.2301,1
153.4,0.05372999999999999,25.38,0.6656,0.7119,0.2776,0.9053,2019.0,122.8,0.1189,0.2654,0.07871,0.1471,10.38,1.095,0.006399,0.3001,17.33,184.6,0.2419,0.1184,0.1622,0.01587,0.006193,17.99,0.04904,8.589,0.03003,1001.0,0.4601,1
27.57,0.01851,13.32,0.1477,0.149,0.09262999999999999,1.1540000000000001,549.8,75.49,0.08023999999999999,0.09815,0.06401,0.03132,16.17,0.3713,0.008998,0.042789999999999995,21.59,86.57,0.1853,0.1128,0.1526,0.01167,0.003213,11.68,0.01292,2.5540000000000003,0.021519999999999997,420.5,0.2804,0
35.03,0.026639999999999997,20.21,0.5804,0.5274,0.1559,0.6857,1261.0,107.0,0.1233,0.1864,0.06515,0.07752,17.88,0.33399999999999996,0.004185,0.1354,27.26,132.7,0.1998,0.10400000000000001,0.1446,0.009067,0.003817,16.13,0.02868,2.1830000000000003,0.01703,807.2,0.42700000000000005,1
80.99,0.04718,26.23,0.5717,0.7053,0.2087,0.9209,2081.0,144.4,0.1007,0.2422,0.06606000000000001,0.1562,22.28,0.6242,0.005215,0.281,28.74,172.0,0.2162,0.1167,0.1502,0.01288,0.004028,21.61,0.03726,4.158,0.02045,1407.0,0.3828,1
44.74,0.04763,14.62,0.1364,0.1559,0.09755,2.635,653.3,90.31,0.07253,0.1015,0.06457,0.06615,13.17,0.5461,0.01004,0.10099999999999999,15.38,94.52,0.1976,0.1248,0.1394,0.02853,0.005528,13.94,0.03247,4.091,0.01715,594.2,0.21600000000000003,0
87.17,0.04502,20.99,0.2053,0.392,0.1056,2.129,1362.0,111.8,0.07599,0.1827,0.06071,0.09934,21.0,0.8161,0.006455,0.1508,33.15,143.2,0.1727,0.1119,0.1449,0.01744,0.0037329999999999998,17.06,0.01797,6.0760000000000005,0.01829,918.6,0.2623,1
19.08,0.014530000000000001,11.35,0.0824,0.03938,0.057429999999999995,1.805,396.5,70.21,0.07313,0.04306,0.06669,0.025830000000000002,14.71,0.2073,0.01496,0.02363,16.82,72.01,0.1566,0.1006,0.1216,0.01583,0.004785,11.08,0.02121,1.3769999999999998,0.03082,372.7,0.1902,0
20.39,0.00203,14.97,0.05836,0.01379,0.03614,0.6864,698.7,85.69,0.06192,0.0221,0.05335,0.004419,12.71,0.2244,0.0033380000000000003,0.002758,16.94,95.48,0.1365,0.07376,0.09022999999999999,0.003242,0.0015660000000000001,13.5,0.003746,1.5090000000000001,0.0148,566.2,0.2267,0
58.53,0.1438,18.07,0.1793,0.2803,0.1146,1.6669999999999998,1021.0,114.5,0.06817999999999999,0.1099,0.058660000000000004,0.06597,25.56,0.5296,0.03113,0.1682,28.07,120.4,0.1308,0.1006,0.1243,0.03927,0.01256,17.42,0.08555,3.767,0.02175,948.0,0.1603,1
38.87,0.05371,16.21,0.1976,0.3349,0.09947,2.22,808.9,94.25,0.06846000000000001,0.1225,0.05636,0.04938,21.46,0.4204,0.009369,0.1204,29.25,108.4,0.2075,0.09444,0.1306,0.01761,0.003249,14.48,0.029830000000000002,3.301,0.02418,648.2,0.302,1
33.01,0.0028309999999999997,14.73,0.05847,0.01824,0.03735,0.8285,672.4,82.71,0.0658,0.03532,0.05517999999999999,0.008829,13.84,0.3975,0.004148,0.0045590000000000006,17.4,93.96,0.1453,0.08352000000000001,0.1016,0.004821,0.002273,13.05,0.004711,2.5669999999999997,0.014219999999999998,530.6,0.2107,0
21.05,0.026810000000000004,17.62,0.6643,0.5539,0.1868,0.9832,896.9,97.41,0.1275,0.2701,0.06924,0.08782999999999999,21.53,0.2545,0.004452,0.1425,33.21,122.4,0.2252,0.1054,0.1525,0.013519999999999999,0.003711,14.58,0.03055,2.11,0.01454,644.8,0.4264,1
16.97,0.001184,13.07,0.0739,0.007731999999999999,0.038919999999999996,0.9097,523.4,76.09,0.07037,0.027960000000000002,0.0607,0.005592,17.93,0.2335,0.004729,0.001546,22.25,82.74,0.1382,0.07683,0.1013,0.003951,0.001755,12.03,0.006887000000000001,1.466,0.01466,446.0,0.2171,0
45.81,0.01622,20.92,0.1806,0.20800000000000002,0.07027,1.433,1320.0,115.2,0.07948,0.1136,0.0551,0.047439999999999996,24.48,0.4212,0.005444,0.05699,34.69,135.1,0.1538,0.08855,0.1315,0.008522,0.002751,17.93,0.01169,2.765,0.014190000000000001,998.9,0.2504,1
33.63,0.02332,19.85,0.2405,0.3378,0.1041,0.8568,1222.0,113.0,0.08113,0.1857,0.05612999999999999,0.08353,17.08,0.3093,0.004757,0.1266,25.09,130.9,0.1813,0.1008,0.1416,0.012620000000000001,0.002362,17.3,0.015030000000000002,2.193,0.013940000000000001,928.2,0.3138,1
60.78,0.06899,17.67,0.6247,0.6922,0.2008,1.268,959.5,96.42,0.1132,0.1785,0.07292,0.08653,22.15,0.7036,0.009406999999999999,0.2135,29.51,119.1,0.1949,0.1049,0.16399999999999998,0.01848,0.006113,14.25,0.07056,5.372999999999999,0.017,645.7,0.2844,1
14.49,0.014519999999999998,16.23,0.3904,0.3728,0.1047,0.6123,740.7,85.42,0.09618,0.1607,0.061770000000000005,0.052520000000000004,21.81,0.1938,0.00335,0.08259,29.89,105.5,0.1746,0.09714,0.1503,0.006853,0.00172,13.17,0.01384,1.334,0.01113,531.5,0.3693,1
22.45,0.00186,13.5,0.06624,0.005579,0.04216,1.35,564.1,79.83,0.06431,0.008772,0.05855,0.002924,18.4,0.2719,0.006383,0.00186,23.08,85.56,0.1697,0.08392999999999999,0.1038,0.002924,0.002015,12.58,0.008008,1.7209999999999999,0.025710000000000004,489.0,0.2505,0
44.64,0.04303,18.79,0.3583,0.583,0.1555,0.6583,1102.0,102.5,0.10099999999999999,0.1827,0.07069,0.1097,11.89,0.4209,0.005393,0.2032,17.04,125.0,0.1966,0.1257,0.1531,0.0132,0.004168,15.46,0.023209999999999998,2.805,0.01792,736.9,0.3216,1
89.74,0.03737,21.53,0.2327,0.2544,0.1289,1.288,1426.0,118.4,0.07625,0.1489,0.060770000000000005,0.07762000000000001,20.56,0.7548,0.007997,0.11699999999999999,26.06,143.4,0.2116,0.1001,0.1309,0.01648,0.003996,18.01,0.027000000000000003,5.353,0.02897,1007.0,0.3251,1
83.5,0.04257,24.22,0.2311,0.3158,0.08348,1.041,1750.0,132.5,0.07127,0.1445,0.051770000000000004,0.06022,21.46,0.6874,0.007959,0.09042,26.17,161.7,0.1467,0.08355,0.1228,0.01671,0.003933,20.48,0.031330000000000004,5.144,0.01341,1306.0,0.2238,1
24.19,0.07926,16.35,0.7090000000000001,0.9019,0.2225,0.8749,832.7,102.1,0.1155,0.2475,0.06898,0.09711,22.53,0.253,0.006965000000000001,0.2733,27.57,125.4,0.2041,0.09947,0.1419,0.022340000000000002,0.005784,14.9,0.06213,3.466,0.01499,685.0,0.2866,1
16.07,0.015090000000000001,15.35,0.3124,0.2654,0.1138,0.6068,719.8,87.44,0.08665,0.1427,0.06317,0.03152,18.75,0.1998,0.004413,0.042010000000000006,25.16,101.9,0.1723,0.1075,0.1624,0.007369,0.0017870000000000002,13.46,0.014430000000000002,1.443,0.01354,551.1,0.3518,0
17.91,0.009127,15.49,0.135,0.08115,0.061360000000000005,0.8561,725.9,88.44,0.07182000000000001,0.051039999999999995,0.0589,0.01141,17.21,0.2185,0.004599,0.0142,23.58,100.3,0.1614,0.08785,0.1157,0.004814,0.0017079999999999999,13.85,0.009169,1.495,0.01247,588.7,0.2364,0
24.28,0.007276,16.11,0.1637,0.06648,0.07885,1.217,793.7,96.22,0.06427999999999999,0.08485,0.0565,0.03781,16.95,0.2713,0.0050799999999999994,0.026019999999999998,23.0,104.6,0.17800000000000002,0.09855,0.1216,0.009073000000000001,0.0017059999999999998,14.97,0.0137,1.893,0.0135,685.9,0.2404,0
19.15,0.0,9.456,0.06444,0.0,0.04362,1.4280000000000002,268.6,47.92,0.07039,0.0,0.058839999999999996,0.0,24.54,0.3857,0.007189,0.0,30.37,59.16,0.1587,0.052629999999999996,0.08996,0.0,0.002783,7.76,0.00466,2.548,0.026760000000000003,181.0,0.2871,0
28.92,0.014119999999999999,21.31,0.2445,0.3538,0.08467999999999999,0.4757,1410.0,118.6,0.06938,0.1571,0.05425,0.05814,18.58,0.2577,0.002866,0.08169,26.36,139.2,0.1621,0.08588,0.1234,0.006719,0.0010869999999999999,18.31,0.009181,1.817,0.01069,1041.0,0.3206,1
12.26,0.0,10.62,0.07203999999999999,0.0,0.04102,0.496,342.9,61.24,0.08151,0.0,0.06422,0.0,11.97,0.1988,0.00604,0.0,14.1,66.53,0.1903,0.0925,0.1234,0.0,0.00322,9.738,0.0056560000000000004,1.218,0.02277,288.5,0.3105,0
20.3,0.044469999999999996,13.07,0.1937,0.256,0.07722000000000001,1.44,520.5,74.2,0.08284,0.06663999999999999,0.06267,0.014280000000000001,19.04,0.2864,0.007278,0.05485,26.98,86.43,0.2031,0.08546000000000001,0.1249,0.008799,0.003339,11.57,0.02047,2.206,0.018680000000000002,409.7,0.3035,0
13.17,0.012819999999999998,12.32,0.1507,0.1275,0.07608,0.5293,457.5,72.23,0.08022,0.0875,0.0627,0.02755,13.04,0.1904,0.0064719999999999995,0.03265,16.18,78.27,0.1769,0.09834,0.1358,0.008849,0.002817,11.29,0.01122,1.1640000000000001,0.016919999999999998,388.0,0.2733,0
15.89,0.026310000000000004,12.64,0.217,0.2302,0.1168,0.7339,475.7,75.46,0.07427,0.1105,0.0632,0.044969999999999996,16.02,0.2456,0.005884,0.07097,19.67,81.93,0.1886,0.1088,0.1415,0.013040000000000001,0.001982,11.61,0.02005,1.6669999999999998,0.01848,408.2,0.2787,0
11.36,0.016130000000000002,13.86,0.1958,0.18100000000000002,0.08836000000000001,0.905,580.9,83.18,0.07833999999999999,0.08388,0.062,0.0239,16.17,0.1458,0.0028870000000000002,0.03296,23.02,89.69,0.1735,0.09879,0.1172,0.007308,0.001972,12.94,0.01285,0.9975,0.0187,507.6,0.3297,0
19.01,0.01051,13.75,0.1928,0.1167,0.06545,0.8073,583.1,78.01,0.07961,0.055560000000000005,0.06129,0.016919999999999998,15.21,0.2575,0.005403,0.01994,21.38,91.11,0.1638,0.08673,0.1256,0.005142,0.002065,12.2,0.014180000000000002,1.959,0.013330000000000002,457.9,0.2661,0
10.21,0.07753,9.092,0.431,0.5381,0.1305,1.962,249.8,53.27,0.1486,0.07879,0.08261,0.02168,20.7,0.1935,0.01243,0.1321,29.72,58.08,0.2222,0.09405,0.163,0.01022,0.01178,8.219,0.05416,1.2429999999999999,0.02309,203.9,0.3322,0
31.0,0.03688,16.86,0.4059,0.3744,0.1306,1.845,811.3,92.87,0.1026,0.1772,0.06433,0.06462000000000001,23.81,0.4207,0.010879999999999999,0.1115,34.85,115.0,0.2235,0.09462999999999999,0.1559,0.01627,0.004768,14.19,0.0371,3.534,0.044989999999999995,610.7,0.4724,1
164.1,0.03582,30.67,0.2678,0.4819,0.1283,0.9245,2906.0,155.1,0.07737999999999999,0.2089,0.055060000000000005,0.141,24.27,1.0090000000000001,0.006292,0.2308,30.73,202.4,0.1797,0.1069,0.1515,0.013009999999999999,0.003118,23.51,0.01971,6.462000000000001,0.014790000000000001,1747.0,0.2593,1
12.67,0.01132,14.92,0.1231,0.0846,0.053610000000000005,1.685,684.5,89.75,0.06609,0.07911,0.05764,0.032510000000000004,17.18,0.1504,0.005371,0.026810000000000004,25.34,96.42,0.1641,0.08045,0.1066,0.009155,0.001444,14.06,0.01273,1.237,0.01719,609.1,0.2523,0
17.72,0.01551,16.2,0.1737,0.1362,0.06934,0.4125,819.1,97.65,0.06766,0.08177999999999999,0.055439999999999996,0.02657,13.21,0.1783,0.005012,0.03393,15.73,104.5,0.1721,0.07962999999999999,0.1126,0.009155,0.0017670000000000001,15.19,0.01485,1.338,0.01647,711.8,0.2487,0
19.2,0.04757,14.18,0.3593,0.3206,0.1297,0.873,600.5,80.64,0.1118,0.09804,0.06588,0.0288,17.48,0.2608,0.0067150000000000005,0.05892000000000001,23.13,95.23,0.1779,0.1042,0.1427,0.01051,0.006884,12.39,0.03705,2.117,0.01838,462.9,0.2819,0
90.47,0.03342,22.25,0.2291,0.3272,0.11,1.581,1549.0,121.4,0.08456,0.1674,0.06213,0.08665,17.12,0.7128,0.008102,0.1457,24.9,145.4,0.1966,0.1054,0.1503,0.01601,0.00457,18.66,0.02101,4.895,0.02045,1077.0,0.2894,1
130.2,0.03576,27.66,0.3885,0.4756,0.1954,0.6999,2227.0,147.2,0.08574,0.2432,0.0614,0.1501,21.9,1.008,0.003978,0.2448,25.8,195.0,0.1824,0.1063,0.1294,0.014709999999999999,0.003796,22.01,0.028210000000000002,7.561,0.01518,1482.0,0.2741,1
19.83,0.01796,16.89,0.2884,0.3796,0.07862000000000001,1.005,848.7,87.76,0.079,0.1329,0.0613,0.03085,24.69,0.231,0.0040880000000000005,0.05285,35.64,113.2,0.1761,0.09258,0.1471,0.00688,0.001465,13.61,0.01174,1.7519999999999998,0.01323,572.6,0.34700000000000003,1
33.67,0.03452,16.41,0.3856,0.5106,0.1469,0.9306,844.4,88.64,0.1109,0.2051,0.07325,0.08172,20.52,0.3906,0.005414,0.1445,29.66,113.3,0.2116,0.1106,0.1574,0.013340000000000001,0.004005,13.4,0.02265,3.093,0.01705,556.7,0.3585,1
22.81,0.0,11.92,0.054939999999999996,0.0,0.03558,3.8960000000000004,439.6,70.67,0.05905,0.0,0.055020000000000006,0.0,29.37,0.3141,0.007594,0.0,38.3,75.19,0.106,0.07449,0.09267,0.0,0.001773,11.2,0.008878,2.041,0.01989,386.0,0.1566,0
52.34,0.021169999999999998,24.33,0.2945,0.3788,0.1088,1.033,1844.0,132.9,0.07998999999999999,0.1697,0.055720000000000006,0.09333,27.06,0.3977,0.005043,0.1519,39.16,162.3,0.1814,0.1,0.1522,0.008185,0.0018920000000000002,20.31,0.015780000000000002,2.5869999999999997,0.012819999999999998,1288.0,0.3151,1
20.21,0.03452,14.13,0.2318,0.1604,0.07899,1.2990000000000002,621.9,84.18,0.07247,0.06608,0.05899,0.01883,18.29,0.2357,0.003629,0.04057,24.61,96.31,0.1874,0.07351,0.09329,0.01065,0.0037049999999999995,12.96,0.03713,2.397,0.02632,525.2,0.3207,0
17.61,0.01329,13.71,0.1212,0.102,0.07664,0.9505,574.4,81.25,0.06888,0.05602000000000001,0.05984,0.02107,17.3,0.21,0.006809,0.03193,21.1,88.7,0.1707,0.1028,0.1384,0.006474,0.001784,12.67,0.009514,1.5659999999999998,0.020569999999999998,489.9,0.2688,0
94.44,0.05687999999999999,22.54,0.205,0.4,0.1328,0.7813,1575.0,135.1,0.07678,0.1625,0.05882999999999999,0.1043,14.34,0.7572,0.01149,0.198,16.67,152.2,0.1809,0.1003,0.1374,0.01885,0.005115,20.29,0.02461,5.438,0.01756,1297.0,0.2364,1
38.49,0.02967,17.58,0.2101,0.2866,0.08597,1.198,967.0,97.26,0.06954,0.11199999999999999,0.05915,0.04335,19.07,0.386,0.004952000000000001,0.07486,28.06,113.8,0.1561,0.09215,0.1246,0.009423,0.001718,15.05,0.0163,2.63,0.01152,701.9,0.2282,1
43.52,0.006021,11.21,0.1352,0.02085,0.08333,1.7469999999999999,380.9,61.93,0.08009,0.04589,0.07028999999999999,0.01967,19.12,0.6965,0.013069999999999998,0.008934000000000001,23.17,71.79,0.2538,0.1075,0.1398,0.01052,0.004225,9.742,0.01885,4.607,0.031,289.7,0.3196,0
16.85,0.0169,11.16,0.1402,0.1055,0.07326,2.015,384.0,64.41,0.07664,0.06498999999999999,0.06331,0.01775,17.53,0.2619,0.007803,0.02511,26.84,71.98,0.18899999999999997,0.1007,0.1402,0.008043000000000002,0.002778,10.05,0.014490000000000001,1.778,0.021,310.8,0.2894,0
58.38,0.04942,20.38,0.4122,0.5036,0.1109,1.679,1284.0,112.4,0.07944,0.1739,0.05407000000000001,0.05736,25.42,0.51,0.008109,0.1204,35.46,132.8,0.1467,0.08331000000000001,0.1436,0.017419999999999998,0.003739,17.27,0.04308,3.283,0.01594,928.8,0.25,1
21.03,0.02544,14.98,0.2698,0.2577,0.1192,0.4981,686.6,88.59,0.08177000000000001,0.0909,0.06302999999999999,0.04451,13.9,0.2569,0.005850999999999999,0.0786,17.13,101.1,0.1962,0.1051,0.1376,0.00836,0.002918,13.56,0.02314,2.011,0.01842,561.3,0.3065,0
19.39,0.02334,12.98,0.1822,0.1609,0.09218,1.204,513.9,77.61,0.08251,0.1202,0.0685,0.04274,24.89,0.2623,0.008320000000000001,0.05441,30.36,84.48,0.182,0.10300000000000001,0.1311,0.01665,0.003674,11.99,0.02025,1.865,0.02094,441.3,0.2599,0
19.25,0.009212999999999999,14.91,0.1017,0.0626,0.055810000000000005,0.6549,688.9,89.59,0.0671,0.08216,0.05586,0.02652,15.66,0.2142,0.004837,0.02087,19.31,96.53,0.1589,0.07966000000000001,0.1034,0.010759999999999999,0.002104,14.02,0.009238,1.6059999999999999,0.01171,606.5,0.2136,0
33.01,0.033889999999999997,15.79,0.1581,0.2675,0.06636,1.6269999999999998,758.2,93.97,0.06836,0.1359,0.05416,0.05271,23.29,0.4157,0.008312,0.0839,31.71,102.2,0.1627,0.08682000000000001,0.1312,0.01576,0.0028710000000000003,14.6,0.017419999999999998,2.9139999999999997,0.0174,664.7,0.2477,1
24.72,0.04649,13.74,0.4092,0.4504,0.17,1.426,591.7,78.99,0.10300000000000001,0.1865,0.07371,0.07415,16.58,0.3197,0.005427,0.1659,26.38,91.93,0.2678,0.1091,0.1385,0.018430000000000002,0.004635,11.8,0.03633,2.281,0.05628,432.0,0.5774,1
93.99,0.01715,29.17,0.26,0.3155,0.1022,1.127,2615.0,137.2,0.07526000000000001,0.2009,0.052779999999999994,0.08632000000000001,23.04,0.6917,0.004728,0.1097,35.59,188.0,0.1769,0.09427999999999999,0.1401,0.01038,0.001987,21.16,0.01259,4.303,0.01083,1404.0,0.2822,1
48.29,0.0236,16.77,0.1525,0.1632,0.07624,0.8121,873.2,92.51,0.06072,0.1087,0.054479999999999994,0.04603,13.47,0.522,0.007089,0.05724,16.9,110.4,0.2075,0.09906000000000001,0.1297,0.01286,0.001463,14.34,0.014280000000000001,3.763,0.02266,641.2,0.3062,0
30.18,0.032139999999999995,9.981,0.1248,0.09441000000000001,0.07697999999999999,1.2,302.0,56.74,0.07431,0.047619999999999996,0.06621,0.023809999999999998,15.49,0.5381,0.01093,0.04721,17.7,65.27,0.193,0.08292999999999999,0.1015,0.015059999999999999,0.004174000000000001,8.878,0.02899,4.277,0.02837,241.0,0.2434,0
28.62,0.01977,14.45,0.1979,0.1423,0.1117,1.003,624.1,82.51,0.08557000000000001,0.08045,0.06623,0.02995,16.7,0.3834,0.007509,0.0388,21.74,93.63,0.212,0.1125,0.1475,0.009198999999999999,0.003629,12.75,0.015609999999999999,2.495,0.01805,493.8,0.3071,0
19.14,0.0,9.077,0.0834,0.0,0.048780000000000004,1.462,248.0,47.98,0.09938,0.0,0.07285,0.0,25.49,0.3777,0.01266,0.0,30.92,57.17,0.187,0.08098,0.1256,0.0,0.006872,7.729,0.009692000000000001,2.492,0.02882,178.8,0.3058,0
67.66,0.04345,22.75,0.3458,0.4734,0.1339,2.2840000000000003,1540.0,129.9,0.07918,0.2255,0.05715,0.1103,21.68,0.6226,0.004756,0.1863,34.66,157.6,0.2082,0.09797,0.1218,0.01806,0.003288,19.68,0.03368,5.172999999999999,0.03756,1194.0,0.4045,1
34.78,0.01949,14.16,0.1105,0.08112,0.07589,1.3219999999999998,616.7,83.51,0.06435,0.06296,0.06087000000000001,0.02645,20.78,0.4202,0.007017,0.03136,24.11,90.82,0.254,0.1135,0.1297,0.01153,0.001533,13.0,0.01142,2.873,0.02951,519.4,0.3196,0
18.15,0.020630000000000003,12.33,0.09147999999999999,0.1444,0.04701,0.9429,466.7,71.8,0.06641,0.06961,0.056670000000000005,0.0223,19.04,0.2727,0.009281999999999999,0.03709,23.84,78.0,0.1516,0.08139,0.129,0.008965,0.002146,11.31,0.009216,1.831,0.021830000000000002,394.1,0.24,0
18.4,0.02636,14.38,0.3842,0.3582,0.1334,0.6332,633.7,82.69,0.1033,0.1407,0.06854,0.05074,18.17,0.2324,0.005704,0.08017,22.15,95.29,0.1641,0.1076,0.1533,0.010320000000000001,0.003563,12.65,0.02502,1.696,0.01759,485.6,0.32299999999999995,0
20.05,0.005308,10.92,0.09473,0.02049,0.05301,2.043,366.1,62.11,0.08988,0.023809999999999998,0.0689,0.007937,19.94,0.335,0.01113,0.006829000000000001,26.29,68.81,0.135,0.1024,0.1316,0.00525,0.005667,9.787,0.01463,2.1319999999999997,0.018009999999999998,294.5,0.1934,0
13.25,0.008342,13.14,0.1232,0.08636,0.052410000000000005,0.7285,532.8,76.84,0.07898,0.07025,0.059070000000000004,0.01963,12.74,0.1822,0.005528,0.019719999999999998,18.41,84.08,0.159,0.09311,0.1275,0.006273,0.00253,12.06,0.009789,1.171,0.01465,448.6,0.2514,0
19.29,0.03304,14.8,0.257,0.3438,0.1147,1.3319999999999999,675.2,88.1,0.07686,0.1453,0.06079,0.053810000000000004,18.89,0.2136,0.005442,0.0858,27.2,97.33,0.1806,0.1059,0.1428,0.013669999999999998,0.002464,13.51,0.01957,1.5130000000000001,0.01315,558.1,0.2666,0
36.35,0.013580000000000002,16.36,0.1238,0.135,0.05055,1.153,830.6,84.74,0.062060000000000004,0.1001,0.05318,0.02648,14.76,0.4057,0.004481000000000001,0.03261,22.35,104.5,0.1386,0.07355,0.1006,0.01082,0.0014349999999999999,13.27,0.01038,2.701,0.01069,551.7,0.2027,0
40.73,0.02713,20.42,0.342,0.3508,0.1198,0.8282,1239.0,115.1,0.07867,0.1939,0.05491,0.07488,19.32,0.3971,0.00609,0.1036,25.84,139.5,0.1506,0.08968,0.1381,0.01345,0.0026579999999999998,17.54,0.025689999999999998,3.088,0.01594,951.6,0.2928,1
115.2,0.02721,22.96,0.2444,0.2639,0.111,1.1520000000000001,1648.0,111.2,0.0906,0.1555,0.06281,0.06431,27.15,0.9291,0.008740000000000001,0.1007,34.49,152.1,0.1793,0.09898,0.16,0.014580000000000001,0.004417,17.08,0.02219,6.051,0.02045,930.9,0.301,1
22.73,0.027139999999999997,11.14,0.1542,0.1277,0.06258,1.35,385.2,61.49,0.08524,0.0656,0.06412999999999999,0.01514,18.49,0.3776,0.007501000000000001,0.029480000000000003,25.62,70.88,0.2238,0.08946,0.1234,0.009883,0.003913000000000001,9.667,0.01989,2.569,0.0196,289.1,0.3174,0
11.68,0.017230000000000002,12.09,0.1982,0.1553,0.07078999999999999,1.031,447.1,70.67,0.07287,0.06754,0.06246,0.02074,14.93,0.1642,0.005296,0.035460000000000005,20.83,79.73,0.2003,0.07987000000000001,0.1095,0.006959999999999999,0.001941,11.04,0.019030000000000002,1.281,0.0188,372.7,0.3202,0
35.13,0.0,13.45,0.052129999999999996,0.0,0.03398,3.647,558.9,77.42,0.06742999999999999,0.0,0.0596,0.0,29.97,0.4455,0.007339,0.0,38.05,85.08,0.1701,0.07699,0.09422,0.0,0.003136,12.27,0.008243,2.884,0.03141,465.4,0.2409,0
17.49,0.01376,12.36,0.09708,0.07529,0.05008,1.974,459.3,72.17,0.06994,0.062029999999999995,0.05955,0.021730000000000003,18.89,0.2656,0.0065379999999999995,0.02399,26.14,79.29,0.2013,0.08713,0.1118,0.009923999999999999,0.002928,11.37,0.01395,1.954,0.03416,396.0,0.3267,0
23.11,0.03829,16.22,0.4202,0.40399999999999997,0.12300000000000001,1.079,808.9,95.81,0.1023,0.1205,0.06341000000000001,0.0389,24.99,0.2542,0.007137999999999999,0.1009,31.73,113.5,0.1872,0.08837,0.134,0.01162,0.006111,14.47,0.04653,2.615,0.02068,656.4,0.3187,0
13.38,0.006564,13.5,0.1472,0.052329999999999995,0.08392999999999999,0.6931,553.7,81.78,0.06922,0.06343,0.061,0.01924,13.78,0.1807,0.006064,0.01288,17.48,88.54,0.1638,0.09667,0.1298,0.007978,0.001392,12.72,0.0118,1.34,0.01374,492.1,0.2369,0
22.95,0.014230000000000001,19.18,0.292,0.2477,0.07112,0.5679,1084.0,107.1,0.07622999999999999,0.08737,0.05325,0.02307,20.2,0.2473,0.002667,0.036489999999999995,26.56,127.3,0.1846,0.07497000000000001,0.1009,0.005297,0.0017,16.69,0.014459999999999999,1.775,0.01961,857.6,0.4677,1
17.67,0.05189,10.6,0.3663,0.2913,0.2204,1.39,328.1,64.12,0.1364,0.1075,0.09575,0.07038,13.14,0.2744,0.021769999999999998,0.1188,18.04,69.47,0.2057,0.1255,0.2006,0.0145,0.011479999999999999,9.676,0.04888,1.787,0.02632,272.5,0.2848,0
18.15,0.0643,12.36,0.4082,0.4779,0.1113,1.376,476.4,71.73,0.09532,0.1555,0.0664,0.03613,17.2,0.2574,0.008565000000000001,0.09457,26.87,90.14,0.1489,0.08915,0.1391,0.01768,0.0049759999999999995,10.97,0.046380000000000005,2.806,0.01516,371.5,0.254,0
18.57,0.02,14.29,0.217,0.2413,0.07175,0.7786,624.6,78.31,0.0747,0.08829,0.059160000000000004,0.02027,18.02,0.2527,0.005833,0.04392,24.04,93.85,0.1695,0.09231,0.1368,0.007087,0.00196,12.21,0.013880000000000002,1.874,0.01938,458.4,0.3218,0
23.13,0.0288,11.54,0.1486,0.07987000000000001,0.08578,1.534,402.8,67.41,0.07552,0.03203,0.06481,0.01201,19.29,0.355,0.007595,0.02995,23.31,74.22,0.2217,0.09988999999999999,0.1219,0.008614,0.0034509999999999996,10.49,0.02219,2.302,0.0271,336.1,0.2826,0
12.67,0.01434,12.32,0.1648,0.1399,0.06888999999999999,0.9938,462.0,73.06,0.06765,0.08476,0.05865,0.02875,15.39,0.1759,0.0051329999999999995,0.03503,22.02,79.93,0.1734,0.09639,0.11900000000000001,0.008602,0.001588,11.43,0.01521,1.143,0.015009999999999999,399.8,0.2676,0
57.65,0.0371,24.56,0.3206,0.5755,0.1206,0.6636,1623.0,122.0,0.09287999999999999,0.1956,0.05629,0.08271,24.59,0.5495,0.003872,0.1468,30.41,152.9,0.1953,0.09029,0.1249,0.012,0.0033369999999999997,19.02,0.01842,3.055,0.01964,1076.0,0.3956,1
28.51,0.03312,16.76,0.3345,0.3114,0.1025,1.3359999999999999,867.1,97.53,0.09251000000000001,0.1308,0.059129999999999995,0.03876,22.11,0.3186,0.004449,0.06859,31.55,110.2,0.1944,0.08515,0.1077,0.01196,0.0040149999999999995,14.99,0.02808,2.31,0.01906,693.7,0.3163,0