15 | Step-by-step instructions (expand for details)
16 |
17 | 1. In the AWS Management Console, choose **Services** then select **S3** under Storage.
18 |
19 | 1. Choose **+ Create Bucket**.
20 |
21 | 1. Provide a globally unique name for your bucket such as `smworkshop-firstname-lastname`.
22 |
23 | 1. Select the Region you've chosen to use for this workshop from the dropdown.
24 |
25 | 1. Choose **Create** in the lower left of the dialog without selecting a bucket to copy settings from.
26 |
27 |
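If you prefer the command line, the same bucket can be created with the AWS CLI. This is an optional sketch: the name parts and region are placeholders you should replace, and the `aws s3 mb` call is commented out so nothing is created accidentally.

```shell
# Placeholder name parts; substitute your own to keep the bucket name globally unique.
FIRST=jane
LAST=doe
BUCKET="smworkshop-${FIRST}-${LAST}"
echo "$BUCKET"
# Uncomment to create the bucket in your chosen workshop region:
# aws s3 mb "s3://${BUCKET}" --region us-east-1
```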
28 |
29 | ## 2. Launching the Notebook Instance
30 |
31 | 1. Make sure you are on the AWS Management Console home page. As shown below, in the **Search for services** search box, type **SageMaker**. The search result list will populate with Amazon SageMaker, which you should now click. This will bring you to the Amazon SageMaker console homepage.
32 |
33 | 
34 |
35 | 2. In the upper-right corner of the AWS Management Console, confirm you are in the desired AWS region. Select N. Virginia, Oregon, Ohio, or Ireland (or any other region where SageMaker is available).
36 |
37 | 3. To create a new notebook instance, click the **Notebook instances** link on the left side, and click the **Create notebook instance** button in the upper right corner of the browser window.
38 |
39 | 
40 |
41 | 4. Type `smworkshop-[First Name]-[Last Name]` into the **Notebook instance name** text box, and select `ml.m5.xlarge` for the **Notebook instance type**.
42 |
43 | 
44 |
45 | 5. In the **Permissions and encryption** section, choose **Create a new role** in the **IAM role** drop down menu. Leave the defaults in the pop-up modal, as shown below. Click **Create role**.
46 |
47 | 
48 |
49 | 6. As shown below, go to the **Git repositories** section, click the **Repository** drop down menu, select **Clone a Git repository to this notebook instance only**, and enter the following for **Git repository URL**: `https://github.com/awslabs/amazon-sagemaker-workshop.git`
50 |
51 | 
52 |
53 | 7. Click **Create notebook instance** at the bottom.
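For reference, steps 3–7 can also be performed with the AWS CLI. The sketch below is illustrative only: the instance name and IAM role ARN are placeholders (it assumes a suitable execution role already exists), and the call itself is commented out.

```shell
# Placeholder values; adjust the name, role ARN, and region before running.
NOTEBOOK_NAME="smworkshop-jane-doe"
REPO_URL="https://github.com/awslabs/amazon-sagemaker-workshop.git"
echo "Would create: $NOTEBOOK_NAME (cloning $REPO_URL)"
# aws sagemaker create-notebook-instance \
#   --notebook-instance-name "$NOTEBOOK_NAME" \
#   --instance-type ml.m5.xlarge \
#   --role-arn arn:aws:iam::111122223333:role/service-role/YourSageMakerExecutionRole \
#   --default-code-repository "$REPO_URL"
```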
54 |
55 | ## 3. Accessing the Notebook Instance
56 |
57 | 1. Wait for the notebook instance status to change to **InService**. This can take several minutes (possibly up to ten, but usually less).
58 |
59 | 
60 |
61 | 2. Click **Open Jupyter**. You will now see the Jupyter homepage for your notebook instance.
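If you created the instance from the CLI, you can also poll its status there rather than refreshing the console. A sketch, with the instance name as a placeholder and the AWS calls commented out since they require credentials:

```shell
NOTEBOOK_NAME="smworkshop-jane-doe"   # placeholder instance name
# Block until the instance is ready, then print its status:
# aws sagemaker wait notebook-instance-in-service --notebook-instance-name "$NOTEBOOK_NAME"
# aws sagemaker describe-notebook-instance --notebook-instance-name "$NOTEBOOK_NAME" \
#   --query 'NotebookInstanceStatus' --output text   # expect "InService"
echo "$NOTEBOOK_NAME"
```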
62 |
63 |
64 |
--------------------------------------------------------------------------------
/NotebookCreation/images/console-services.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awslabs/amazon-sagemaker-workshop/0cf3d661a4065becde6b974ee6fff99c9640b850/NotebookCreation/images/console-services.png
--------------------------------------------------------------------------------
/NotebookCreation/images/git-info.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awslabs/amazon-sagemaker-workshop/0cf3d661a4065becde6b974ee6fff99c9640b850/NotebookCreation/images/git-info.png
--------------------------------------------------------------------------------
/NotebookCreation/images/notebook-instances.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awslabs/amazon-sagemaker-workshop/0cf3d661a4065becde6b974ee6fff99c9640b850/NotebookCreation/images/notebook-instances.png
--------------------------------------------------------------------------------
/NotebookCreation/images/notebook-settings.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awslabs/amazon-sagemaker-workshop/0cf3d661a4065becde6b974ee6fff99c9640b850/NotebookCreation/images/notebook-settings.png
--------------------------------------------------------------------------------
/NotebookCreation/images/open-notebook.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awslabs/amazon-sagemaker-workshop/0cf3d661a4065becde6b974ee6fff99c9640b850/NotebookCreation/images/open-notebook.png
--------------------------------------------------------------------------------
/NotebookCreation/images/role-popup.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awslabs/amazon-sagemaker-workshop/0cf3d661a4065becde6b974ee6fff99c9640b850/NotebookCreation/images/role-popup.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Amazon SageMaker Workshops
2 |
3 | Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale. This repository contains a collection of 2-hour workshops covering many features of Amazon SageMaker. They are suitable for self-service or live, guided events.
4 |
5 | 
6 |
7 | **BEFORE ATTEMPTING ANY WORKSHOP: please review the Prerequisites below and complete any actions that are required, especially those in the Permissions section.**
8 |
9 |
10 | # Workshops
11 |
12 | - [**Introduction to Amazon SageMaker**](Introduction) - This 100-200 level workshop demonstrates some of the key features of Amazon SageMaker. It does so via a set of straightforward examples for common use cases including: working with structured (tabular) data, natural language processing (sentiment analysis), and computer vision (image classification). Content includes how to (1) do exploratory data analysis in Amazon SageMaker notebook environments such as SageMaker Studio or SageMaker Notebook Instances; (2) run Amazon SageMaker training jobs with your own custom models or built-in algorithms; and (3) get predictions using hosted model endpoints and batch transform jobs.
13 |
14 | - [**TensorFlow in Amazon SageMaker**](TensorFlow) - In this 400 level workshop for experienced TensorFlow users, various aspects of TensorFlow usage in Amazon SageMaker will be demonstrated. In particular, TensorFlow will be applied to a natural language processing use case, a structured data use case, and a computer vision use case. Relevant SageMaker features that will be demonstrated include: prototyping training and inference code with SageMaker Local Mode; SageMaker Pipelines for workflow orchestration; hosted training jobs for full-scale training; distributed training on a single multi-GPU instance or multiple instances; Automatic Model Tuning; batch and real time inference options.
15 |
16 | - [**Simplify Workflows with Scripts, the CLI and Console**](Simplify-Workflows) - (**NOTE**: for CI/CD in Amazon SageMaker and workflow orchestration, first consider [Amazon SageMaker Pipelines](https://aws.amazon.com/sagemaker/pipelines); an example is the Structured Data module of [**TensorFlow in Amazon SageMaker**](TensorFlow)). The focus of this 200+ level workshop is on simplifying Amazon SageMaker workflows, or doing ad hoc jobs, with a roll-your-own solution using scripts, the AWS CLI, and the Amazon SageMaker console. All of these are alternatives to using Jupyter notebooks as an interface to Amazon SageMaker. You'll apply Amazon SageMaker built-in algorithms to a structured data example and a distributed training example showing different ways to set up nodes in a training cluster.
17 |
18 |
19 | # Prerequisites
20 |
21 | ## AWS Account
22 |
23 | **Permissions**: In order to complete this workshop you'll need an AWS Account, and an AWS IAM user in that account with at least full permissions to the following AWS services:
24 |
25 | - AWS IAM
26 | - Amazon S3
27 | - Amazon SageMaker
28 | - AWS CloudShell or AWS Cloud9
29 | - Amazon EC2: including P3, C5, and M5 instance types; to check your limits, see [Viewing Your Current Limits](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-resource-limits.html). If you do not have at least the default limits specified in [the Amazon SageMaker Limits table](https://docs.aws.amazon.com/general/latest/gr/sagemaker.html), please file a limit increase request via the AWS console.
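To check SageMaker-related quotas from the CLI instead of the console, something like the following can be used. This is a sketch: the instance-type filter is a placeholder, and the call is commented out because it requires configured AWS credentials.

```shell
# List SageMaker quotas whose names mention a given instance type:
INSTANCE_TYPE="ml.m5.xlarge"
echo "Checking quotas for ${INSTANCE_TYPE}"
# aws service-quotas list-service-quotas --service-code sagemaker \
#   --query "Quotas[?contains(QuotaName, '${INSTANCE_TYPE}')].[QuotaName,Value]" \
#   --output table
```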
30 |
31 | **Use Your Own Account**: The code and instructions in this workshop assume only one student is using a given AWS account at a time. If you try sharing an account with another student, you'll run into naming conflicts for certain resources. You can work around these by appending a unique suffix to the resources that fail to create due to conflicts, but the instructions do not provide details on the changes required to make this work. Use a personal account or create a new AWS account for this workshop rather than using an organization’s account to ensure you have full access to the necessary services and to ensure you do not leave behind any resources from the workshop.
32 |
33 | **Costs**: Some, but NOT all, of the resources you will launch as part of this workshop are eligible for the AWS free tier if your account is less than 12 months old. See the [AWS Free Tier page](https://aws.amazon.com/free/) for more details. An example of a resource that is **not** covered by the free tier is the Amazon SageMaker notebook instance type used in some workshops. To avoid charges for endpoints and other resources you might not need after you've finished a workshop, please refer to the [**Cleanup Guide**](./CleanupGuide).
34 |
35 |
36 | ## AWS Region
37 |
38 | Amazon SageMaker is not available in all AWS Regions at this time. Accordingly, we recommend running this workshop in one of the following supported AWS Regions: N. Virginia, Oregon, Ohio, Ireland, or Sydney.
39 |
40 | Once you've chosen a region, you should create all of the resources for this workshop there, including a new Amazon S3 bucket and a new SageMaker notebook instance. Make sure you select your region from the dropdown in the upper right corner of the AWS Console before getting started.
41 |
42 | 
43 |
44 |
45 | ## Browser
46 |
47 | We recommend you use the latest version of Chrome or Firefox to complete this workshop.
48 |
49 |
50 | ## AWS Command Line Interface
51 |
52 | To complete certain workshop modules, you'll need the AWS Command Line Interface (CLI) and a Bash environment. You'll use the AWS CLI to interface with Amazon SageMaker and other AWS services. Do NOT attempt to use a locally installed AWS CLI during a live workshop; there is not enough time in a live session to resolve laptop-specific setup issues.
53 |
54 | To avoid problems that can arise configuring the CLI on your machine during a live workshop, either [**AWS CloudShell**](https://aws.amazon.com/cloudshell/) or [**AWS Cloud9**](https://aws.amazon.com/cloud9/) can be used. AWS CloudShell is a browser-based shell that makes it easy to securely manage, explore, and interact with your AWS resources. To run Bash scripts for workshops using CloudShell, simply create raw text script files on your local computer, and then follow the steps for [uploading and running script files](https://docs.aws.amazon.com/cloudshell/latest/userguide/getting-started.html).
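As a minimal example of the kind of plain-text Bash script you might upload and run in CloudShell (the bucket name is a placeholder, and the AWS call is commented out because it requires credentials):

```shell
#!/bin/bash
# example.sh -- a minimal workshop-style script; save as plain text and upload to CloudShell.
set -euo pipefail
BUCKET="smworkshop-jane-doe"       # placeholder bucket name
echo "Using bucket: ${BUCKET}"
# aws s3 ls "s3://${BUCKET}"       # example AWS CLI call
```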
55 |
56 | AWS Cloud9 is a cloud-based integrated development environment (IDE) that lets you write, run, and debug your code with just a browser. It has the AWS CLI pre-installed so you don’t need to install files or configure your laptop to use the AWS CLI. For Cloud9 setup directions, see [**Cloud9 Setup**](Cloud9).
57 |
58 |
59 | ## Text Editor
60 |
61 | For any workshop module that requires use of the AWS Command Line Interface (see above), you also will need a **plain text** editor for writing Bash scripts. Any editor that inserts Windows or other special characters potentially will cause scripts to fail. AWS Cloud9 includes a text editor, while for AWS CloudShell you'll need to use your own separate text editor of your choice to create script files (or enter commands one at a time).
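One common failure mode is a script saved with Windows (CRLF) line endings. A quick way to reproduce and fix the problem (a sketch; the file names here are throwaways created for the demonstration):

```shell
# Simulate a script saved with Windows line endings:
printf 'echo hello\r\n' > crlf-script.sh
# Running it directly can fail or print stray characters; strip the carriage returns:
tr -d '\r' < crlf-script.sh > clean-script.sh
bash clean-script.sh   # prints: hello
```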
62 |
63 |
64 | # License & Contributing
65 |
66 | The contents of this workshop are licensed under the [Apache 2.0 License](./LICENSE).
67 | If you are interested in contributing to this project, please see the [Contributing Guidelines](./contributing/CONTRIBUTING.md). In connection with contributing, also review the [Code of Conduct](./contributing/CODE_OF_CONDUCT.md).
68 |
69 |
70 |
--------------------------------------------------------------------------------
/Simplify-Workflows/README.md:
--------------------------------------------------------------------------------
1 | # Simplify Workflows with Scripts, the CLI and Console
2 |
3 | ## Modules
4 |
5 | This workshop is divided into multiple modules. After completing **Preliminaries**, complete the module **Creating a Notebook Instance** next. You can complete the remaining modules in any order.
6 |
7 | - Preliminaries
8 |
9 | - Creating a Notebook Instance
10 |
11 | - Videogame Sales with the CLI and Console
12 |
13 | - Distributed Training with Built-in Algorithms, the CLI and Console
14 |
15 |
16 | ## Preliminaries
17 |
18 | - Be sure you have completed all of the Prerequisites listed in the [**main README**](../README.md). This workshop makes use of the AWS CLI and requires the use of a Bash environment for scripting. AWS CloudShell or AWS Cloud9 can be used to run the CLI and scripts; if you haven't done so already, please complete the [**CloudShell (or Cloud9) Setup**](../Cloud9).
19 |
20 | - If you are new to using Jupyter notebooks, read the next section; otherwise, you may skip ahead to the section that follows it.
21 |
22 |
23 | ### Jupyter Notebooks: A Brief Overview
24 |
25 | Jupyter is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more. With respect to code, it can be thought of as a web-based IDE that executes code on the server it is running on instead of locally.
26 |
27 | There are two main types of "cells" in a notebook: code cells, and "markdown" cells with explanatory text. You will be running the code cells. These are distinguished by having "In" next to them in the left margin next to the cell, and a greyish background. Markdown cells lack "In" and have a white background. In the screenshot below, the upper cell is a markdown cell, while the lower cell is a code cell:
28 |
29 | 
30 |
31 | To run a code cell, simply click in it, then either click the **Run Cell** button in the notebook's toolbar, or use Control+Enter from your computer's keyboard. It may take a few seconds to a few minutes for a code cell to run. You can determine whether a cell is running by examining the `In[]:` indicator in the left margin next to each cell: a cell will show `In [*]:` when running, and `In [a number]:` when complete.
32 |
33 | Please run each code cell in order, and **only once**, to avoid repeated operations. For example, running the same training job cell twice might create two training jobs, possibly exceeding your service limits.
34 |
35 |
36 | ## Creating a Notebook Environment
37 |
38 | SageMaker provides hosted Jupyter notebooks that require no setup, so you can begin processing your training data sets immediately. With a few clicks in the SageMaker console, you can create a fully managed notebook environment, pre-loaded with useful libraries for machine learning. You need only add your data. You have two different options for this workshop. Follow the choice specified by your workshop instructor if you're in a live workshop, or make your own choice otherwise:
39 |
40 | - **SageMaker Studio**: An IDE for machine learning. To create a SageMaker Studio domain for this workshop, follow the instructions at [**Creating an Amazon SageMaker Studio domain**](../StudioCreation), then return here to continue with the next module of the workshop.
41 |
42 | - **SageMaker Notebook Instance**: A managed instance with preinstalled data science tools (though not as fully managed as SageMaker Studio). To create a SageMaker notebook instance for this workshop, follow the instructions at [**Creating a Notebook Instance**](../NotebookCreation), then return here to continue with the next module of the workshop.
43 |
44 |
45 | ## Videogame Sales with the CLI and Console
46 |
47 | Please go to the following link for this module: [**Videogame Sales with the CLI and Console**](../modules/Video_Game_Sales_CLI_Console.md). Be sure to use the **downloaded** version of the applicable Jupyter notebook from this workshop repository.
48 |
49 | When you're finished, return here to move on to the next module.
50 |
51 |
52 | ## Distributed Training with Built-in Algorithms, the CLI and Console
53 |
54 | Please go to the following link for this module: [**Distributed Training with Built-in Algorithms, the CLI and Console**](../modules/Distributed_Training_CLI_Console.md). Be sure to use the **downloaded** version of the applicable Jupyter notebook from this workshop repository.
55 |
56 | When you're finished, return here and go on to the Cleanup Guide.
57 |
58 |
59 | ## Cleanup
60 |
61 | To avoid charges for endpoints and other resources you might not need after the workshop, please refer to the [**Cleanup Guide**](../CleanupGuide).
62 |
63 |
64 |
65 |
66 |
67 |
68 |
69 |
70 |
--------------------------------------------------------------------------------
/StudioCreation/README.md:
--------------------------------------------------------------------------------
1 | # Creating an Amazon SageMaker Studio domain
2 |
3 | SageMaker Studio is an IDE for machine learning. To create a SageMaker Studio domain for this workshop, follow these steps:
4 |
5 | - Go to https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-quick-start.html
6 | - Follow the directions for the first several steps. When you reach the "For **Execution role**" step, be sure to read the bullet point below first.
7 | - For **Execution role**, choose **Create a new role**, leave the defaults on the pop up, and click **Create role**.
8 | - When creation is complete (this may take about five minutes), click **Open Studio** in the line for your default user.
9 | - Open a terminal within Studio (File menu -> New -> Terminal).
10 | - Clone the official samples repository with this command:
11 |
12 | ```
13 | git clone https://github.com/awslabs/amazon-sagemaker-workshop.git
14 | ```
15 |
16 | - Return to the main workshop instructions page and continue with the rest of the workshop.
17 |
--------------------------------------------------------------------------------
/TensorFlow/README.md:
--------------------------------------------------------------------------------
1 | # TensorFlow in Amazon SageMaker
2 |
3 | This workshop demonstrates various aspects of how to work with custom TensorFlow models in Amazon SageMaker. We'll examine how TensorFlow can be applied in SageMaker to a natural language processing use case, a structured data use case, and a computer vision use case.
4 |
5 | Here are some of the key features of SageMaker demonstrated in this workshop:
6 |
7 | - **Data Processing**
8 | - **SageMaker Processing** for data preprocessing tasks within SageMaker.
9 |
10 | - **Prototyping and Working with Code**
11 | - **Script Mode**, which enables you to use your own custom model definitions and training scripts similar to those outside Amazon SageMaker, with prebuilt TensorFlow containers.
12 | - **Git integration** for Script Mode, which allows you to specify a training script in a Git repository so your code is version controlled and you don't have to download code locally.
13 | - **Local Mode Training** for rapid prototyping and to confirm your code is working before moving on to full scale model training.
14 | - **Local Mode Endpoints** to test your models and inference code before deploying with TensorFlow Serving in SageMaker hosted endpoints for production.
15 |
16 | - **Training and Tuning Models**
17 | - **Hosted Training** for large scale model training.
18 | - **Automatic Model Tuning** to find the best model hyperparameters using automation.
19 | - **Distributed Training with TensorFlow's native MirroredStrategy** to perform training with multiple GPUs on a *single* instance.
20 | - **Distributed Training on a SageMaker-managed cluster** of *multiple* instances using either the SageMaker Distributed Training feature, or parameter servers and Horovod.
21 |
22 | - **Inference**
23 | - **Hosted Endpoints** for real time predictions with TensorFlow Serving.
24 | - **Batch Transform Jobs** for asynchronous, large scale batch inference.
25 | - **Model evaluation or batch inference** with SageMaker Processing.
26 |
27 | - **Workflow Automation**
28 | - **SageMaker Pipelines** for creating an end-to-end automated pipeline from data preprocessing to model training to hosted model deployment.
29 |
30 |
31 | ## Modules
32 |
33 | This workshop is divided into multiple modules that should be completed in order. After **Preliminaries**, complete the module **Creating a Notebook Environment**. Complete the NLP and Structured Data modules next, in that order; together they show how to build a workflow from relatively simple to more complex.
34 |
35 | - Preliminaries
36 |
37 | - Creating a Notebook Environment
38 |
39 | - Natural Language Processing (NLP) Use Case: Sentiment Analysis
40 |
41 | - Structured Data Use Case: End-to-End Workflow for Boston Housing Price Predictions
42 |
43 | - Computer Vision Use Case: Image Classification
44 |
45 |
46 | ## Preliminaries
47 |
48 | - Be sure you have completed all of the Prerequisites listed in the [**main README**](../README.md).
49 |
50 | - If you are new to using Jupyter notebooks, read the next section; otherwise, you may now skip ahead to the next module.
51 |
52 |
53 | ### Jupyter Notebooks: A Brief Overview
54 |
55 | Jupyter is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more. With respect to code, it can be thought of as a web-based IDE that executes code on the server it is running on instead of locally.
56 |
57 | There are two main types of "cells" in a notebook: code cells, and "markdown" cells with explanatory text. You will be running the code cells. These are distinguished by having "In" next to them in the left margin next to the cell, and a greyish background. Markdown cells lack "In" and have a white background. In the screenshot below, the upper cell is a markdown cell, while the lower cell is a code cell:
58 |
59 | 
60 |
61 | To run a code cell, simply click in it, then either click the **Run Cell** button in the notebook's toolbar, or use Control+Enter from your computer's keyboard. It may take a few seconds to a few minutes for a code cell to run. You can determine whether a cell is running by examining the `In[]:` indicator in the left margin next to each cell: a cell will show `In [*]:` when running, and `In [a number]:` when complete.
62 |
63 | Please run each code cell in order, and **only once**, to avoid repeated operations. For example, running the same training job cell twice might create two training jobs, possibly exceeding your service limits.
64 |
65 |
66 | ## Creating a Notebook Environment
67 |
68 | SageMaker provides hosted Jupyter notebooks that require no setup, so you can begin processing your training data sets immediately. With a few clicks in the SageMaker console, you can create a fully managed notebook environment, pre-loaded with useful libraries for machine learning. You need only add your data. You have two different options for this workshop. Follow the choice specified by your workshop instructor if you're in a live workshop, or make your own choice otherwise:
69 |
70 | - **SageMaker Studio**: An IDE for machine learning. To create a SageMaker Studio domain for this workshop, follow the instructions at [**Creating an Amazon SageMaker Studio domain**](../StudioCreation), then return here to continue with the next module of the workshop.
71 |
72 | - **SageMaker Notebook Instance**: A managed instance with preinstalled data science tools (though not as fully managed as SageMaker Studio). To create a SageMaker notebook instance for this workshop, follow the instructions at [**Creating a Notebook Instance**](../NotebookCreation), then return here to continue with the next module of the workshop.
73 |
74 |
75 | ## Natural Language Processing Use Case: Sentiment Analysis
76 |
77 | In this module, we'll train a custom sentiment analysis model by providing our own Python training script for use with Amazon SageMaker's prebuilt TensorFlow 2 container. Assuming you have cloned this repository into your notebook environment (which you should do if you haven't), open the `notebooks` directory of the repository and click on the `sentiment-analysis.ipynb` notebook to open it.
78 |
79 | When you're finished, return here to move on to the next module.
80 |
81 |
82 | ## Structured Data Use Case: End-to-End Workflow for Boston Housing Price Predictions
83 |
84 | We'll focus on a relatively complete TensorFlow 2 workflow in this module to predict prices based on the Boston Housing dataset. In particular, we'll preprocess data with SageMaker Processing, prototype training and inference code with Local Mode, use Automatic Model Tuning, deploy the tuned model to a real time endpoint, and examine how SageMaker Pipelines can automate setting up this workflow for a production environment. Assuming you have cloned this repository into your notebook environment (which you should do if you haven't), open the `notebooks` directory of the repository and click on the `tf-2-workflow-smpipelines.ipynb` notebook to open it.
85 |
86 | When you're finished, return here to move on to the next module.
87 |
88 |
89 | ## Computer Vision Use Case: Image Classification
90 |
91 | This module applies TensorFlow within Amazon SageMaker to an image classification use case. Currently we recommend using the example [TensorFlow2 and SMDataParallel](https://github.com/aws/amazon-sagemaker-examples/tree/master/training/distributed_training/tensorflow/data_parallel/mnist). This example applies the SageMaker Distributed Training feature with data parallelism to train a model on multiple instances. (Model parallelism is another possibility.)
92 |
93 | Alternatively, there is a TensorFlow 1.x example for the parameter server method and Horovod. This example also uses the pre/post-processing script feature of the SageMaker TensorFlow Serving container to transform data for inference, without having to build separate containers and infrastructure to do this job. Assuming you have cloned this repository into your notebook environment (which you should do if you haven't), open the `notebooks` directory of the repository and click on the `tf-distributed-training.ipynb` notebook to open it.

94 | When you're finished, return here and go on to the Cleanup Guide.
95 |
96 |
97 | ## Cleanup
98 |
99 | To avoid charges for endpoints and other resources you might not need after the workshop, please refer to the [**Cleanup Guide**](../CleanupGuide).
100 |
101 |
102 |
--------------------------------------------------------------------------------
/contributing/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | ## Code of Conduct
2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
4 | opensource-codeofconduct@amazon.com with any additional questions or comments.
5 |
--------------------------------------------------------------------------------
/contributing/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing Guidelines
2 |
3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional
4 | documentation, we greatly value feedback and contributions from our community.
5 |
6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary
7 | information to effectively respond to your bug report or contribution.
8 |
9 |
10 | ## Reporting Bugs/Feature Requests
11 |
12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features.
13 |
14 | When filing an issue, please check [existing open](https://github.com/awslabs/amazon-sagemaker-workshop/issues), or [recently closed](https://github.com/awslabs/amazon-sagemaker-workshop/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aclosed%20), issues to make sure somebody else hasn't already
15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful:
16 |
17 | * A reproducible test case or series of steps
18 | * The version of our code being used
19 | * Any modifications you've made relevant to the bug
20 | * Anything unusual about your environment or deployment
21 |
22 |
23 | ## Contributing via Pull Requests
24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that:
25 |
26 | 1. You are working against the latest source on the *master* branch.
27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already.
28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted.
29 |
30 | To send us a pull request, please:
31 |
32 | 1. Fork the repository.
33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change.
34 | 3. Ensure local tests pass.
35 | 4. Commit to your fork using clear commit messages.
36 | 5. Send us a pull request, answering any default questions in the pull request interface.
37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.
38 |
39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and
40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/).
41 |
42 |
43 | ## Finding contributions to work on
44 | Looking at the existing issues is a great way to find something to contribute to. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any ['help wanted'](https://github.com/awslabs/amazon-sagemaker-workshop/labels/help%20wanted) issues is a great place to start.
45 |
46 |
47 | ## Code of Conduct
48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
50 | opensource-codeofconduct@amazon.com with any additional questions or comments.
51 |
52 |
53 | ## Security issue notifications
54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue.
55 |
56 |
57 | ## Licensing
58 |
59 | See the [LICENSE](https://github.com/awslabs/amazon-sagemaker-workshop/blob/master/LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
60 |
61 | We may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes.
62 |
--------------------------------------------------------------------------------
/images/cells.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awslabs/amazon-sagemaker-workshop/0cf3d661a4065becde6b974ee6fff99c9640b850/images/cells.png
--------------------------------------------------------------------------------
/images/clawfoot_bathtub.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awslabs/amazon-sagemaker-workshop/0cf3d661a4065becde6b974ee6fff99c9640b850/images/clawfoot_bathtub.jpg
--------------------------------------------------------------------------------
/images/overview.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awslabs/amazon-sagemaker-workshop/0cf3d661a4065becde6b974ee6fff99c9640b850/images/overview.png
--------------------------------------------------------------------------------
/images/region-selection.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awslabs/amazon-sagemaker-workshop/0cf3d661a4065becde6b974ee6fff99c9640b850/images/region-selection.png
--------------------------------------------------------------------------------
/modules/Distributed_Training_CLI_Console.md:
--------------------------------------------------------------------------------
1 | # Distributed Training with SageMaker's Built-in Algorithms
2 |
3 | ## Introduction
4 |
5 | Amazon SageMaker provides high-performance, scalable machine learning algorithms optimized for speed, scale, and accuracy, and designed to run on extremely large training datasets. Based on the type of learning that you are undertaking, you can choose from either supervised algorithms (e.g. linear/logistic regression or classification), or unsupervised algorithms (e.g. k-means clustering). Besides general purpose algorithms, Amazon SageMaker's set of built-in algorithms also includes specific-purpose algorithms suited for tasks in domains such as natural language processing and computer vision.
6 |
7 | Amazon SageMaker's built-in algorithms are re-envisioned from the ground up, specifically for large training data sets. Most algorithms available elsewhere rely on loading files or the entire data set into memory, which doesn't work for very large datasets. Even algorithms that avoid this typically need all of the data downloaded before training starts, instead of streaming the data in and processing it as it arrives. And lastly, large training data sets can cause some algorithms available elsewhere to fail outright, in some cases with training data sets as small as a few gigabytes.
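The contrast between loading everything into memory and streaming records as they arrive can be sketched in a few lines of generic Python (the CSV data here is illustrative and has nothing to do with SageMaker's internal implementation):

```python
import csv
import io

def load_all(text):
    """In-memory approach: parse the entire dataset at once."""
    return list(csv.reader(io.StringIO(text)))

def stream_records(lines):
    """Streaming approach: yield one parsed record at a time,
    so memory use stays roughly constant regardless of dataset size."""
    for row in csv.reader(lines):
        yield row

data = "1,2\n3,4\n5,6\n"
total = 0
for row in stream_records(io.StringIO(data)):
    total += int(row[0])
print(total)  # 9
```

With streaming, processing can begin as soon as the first record is available; with the in-memory approach, nothing happens until the whole dataset has been read.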
8 |
9 | To summarize, there are many reasons to use Amazon SageMaker’s built-in algorithms instead of "bringing your own" (BYO) algorithm if the built-in algorithms fit your use case:
10 |
11 | - With BYO, time/cost must be spent on design (e.g. of a neural net).
12 | - With BYO, you must solve the problems of scalability and reliability for large data sets.
13 | - Built-in algorithms take care of these concerns.
14 | - Built-in algorithms also provide many conveniences such as reduced need for external hyperparameter optimization, efficient data loading, etc.
15 | - Built-in algorithms offer faster training, and some produce smaller models, which enables faster inference.
16 | - Even if you’re doing BYO, built-in algorithms may be helpful at some point in the machine learning pipeline, such as the PCA built-in algorithm for dimensionality reduction.
17 |
18 | ## Parallelized Data Distribution
19 |
20 | Amazon SageMaker makes it easy to train machine learning models across a cluster containing a large number of machines. This is a non-trivial process, but Amazon SageMaker's built-in algorithms and pre-built machine learning containers (for TensorFlow, PyTorch, XGBoost, Scikit-learn, and MXNet) hide most of the complexity from you. Nevertheless, there are decisions about how to structure data that have implications for how the distributed training is carried out.
21 |
22 | In this module, we will learn how to take full advantage of distributed training clusters when using one of Amazon SageMaker's built-in algorithms. This module also shows how to use SageMaker's built-in algorithms via hosted Jupyter notebooks, the AWS CLI, and the Amazon SageMaker console.
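SageMaker lets you choose between two S3 data distribution types for a training channel, `FullyReplicated` and `ShardedByS3Key`. The pure-Python sketch below (a simplified illustration, not SageMaker's actual assignment logic) shows the essential difference in how S3 objects end up on training instances:

```python
def assign_objects(objects, num_instances, distribution="ShardedByS3Key"):
    """Simplified sketch of SageMaker's S3 data distribution types:
    return the list of objects each training instance would receive."""
    if distribution == "FullyReplicated":
        # Every instance receives a copy of every object.
        return [list(objects) for _ in range(num_instances)]
    if distribution == "ShardedByS3Key":
        # Objects are split across instances, one disjoint subset each.
        return [objects[i::num_instances] for i in range(num_instances)]
    raise ValueError(f"unknown distribution type: {distribution}")

files = [f"part-{i:02d}" for i in range(6)]
print(assign_objects(files, 2, "ShardedByS3Key"))
# [['part-00', 'part-02', 'part-04'], ['part-01', 'part-03', 'part-05']]
```

Sharding reduces per-instance data volume and speeds up training, but each instance then sees only a fraction of the dataset, which matters for how the algorithm aggregates results across the cluster.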
23 |
24 | 1. **Exploratory Data Analysis**: For this part of the module, we'll be using an Amazon SageMaker notebook instance to explore and visualize a data set. Be sure you have downloaded this GitHub repository as specified in **Preliminaries** before you start. Next, in your notebook instance, click the **New** button on the right and select **Folder**.
25 |
26 | 2. Click the checkbox next to your new folder, click the **Rename** button above in the menu bar, and give the folder a name such as 'distributed-data'.
27 |
28 | 3. Click the folder to enter it.
29 |
30 | 4. To upload the notebook for this module, click the **Upload** button on the right. Then in the file selection popup, select the file 'data_distribution_types.ipynb' from the notebooks subdirectory in the folder on your computer where you downloaded this GitHub repository. Click the blue **Upload** button that appears to the right of the notebook's file name.
31 |
32 | 5. You are now ready to begin the notebook: click the notebook's file name to open it.
33 |
34 | 6. In the ```bucket = '