├── .github └── PULL_REQUEST_TEMPLATE.md ├── .gitignore ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── Image-classification-transfer-learning.ipynb ├── Instructor.html ├── Instructor.md ├── LICENSE ├── NOTICE ├── README.md ├── TensorFlow_Distributed_MNIST.ipynb ├── build.sh ├── images ├── Picture1.png ├── Picture2.png ├── Picture3.png ├── Picture4.png ├── Picture5.png ├── Picture6.png ├── Picture7.png ├── Picture8.png ├── create-iam-role.png ├── create-instance.png ├── overview.png └── region-selection.png ├── index.html ├── sagemaker-lab.zip └── video-game-sales-xgboost.ipynb /.github/PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | *Issue #, if available:* 2 | 3 | *Description of changes:* 4 | 5 | 6 | By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. 7 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | 2 | # Eclipse 3 | .classpath 4 | .project 5 | .settings/ 6 | 7 | # Intellij 8 | .idea/ 9 | *.iml 10 | *.iws 11 | 12 | # Maven 13 | log/ 14 | target/ 15 | 16 | # VIM 17 | *.swp 18 | 19 | # Mac 20 | .DS_Store 21 | .DS_Store? 22 | 23 | # Windows 24 | Desktop.ini 25 | *.lnk 26 | *.cab 27 | *.msi 28 | *.msm 29 | *.msp 30 | $RECYCLE.BIN/ 31 | Thumbs.db 32 | ehthumbs.db 33 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 
5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check [existing open](https://github.com/awslabs/amazon-sagemaker-workshop/issues), or [recently closed](https://github.com/awslabs/amazon-sagemaker-workshop/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aclosed%20), issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *master* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. 
Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute to. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any ['help wanted'](https://github.com/awslabs/amazon-sagemaker-workshop/labels/help%20wanted) issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](https://github.com/awslabs/amazon-sagemaker-workshop/blob/master/LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 
60 | 61 | We may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes. 62 | -------------------------------------------------------------------------------- /Image-classification-transfer-learning.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Image classification transfer learning demo\n", 8 | "\n", 9 | "1. [Introduction](#Introduction)\n", 10 | "2. [Prerequisites and Preprocessing](#Prequisites-and-Preprocessing)\n", 11 | "3. [Fine-tuning the Image classification model](#Fine-tuning-the-Image-classification-model)\n", 12 | "4. [Set up hosting for the model](#Set-up-hosting-for-the-model)\n", 13 | " 1. [Import model into hosting](#Import-model-into-hosting)\n", 14 | " 2. [Create endpoint configuration](#Create-endpoint-configuration)\n", 15 | " 3. [Create endpoint](#Create-endpoint)\n", 16 | "5. [Perform Inference](#Perform-Inference)\n" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "## Introduction\n", 24 | "\n", 25 | "Welcome to our end-to-end example of distributed image classification algorithm in transfer learning mode. In this demo, we will use the Amazon sagemaker image classification algorithm in transfer learning mode to fine-tune a pre-trained model (trained on imagenet data) to learn to classify a new dataset. In particular, the pre-trained model will be fine-tuned using [caltech-256 dataset](http://www.vision.caltech.edu/Image_Datasets/Caltech256/). \n", 26 | "\n", 27 | "To get started, we need to set up the environment with a few prerequisite steps, for permissions, configurations, and so on." 
28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "## Prequisites and Preprocessing\n", 35 | "\n", 36 | "### Permissions and environment variables\n", 37 | "\n", 38 | "Here we set up the linkage and authentication to AWS services. There are three parts to this:\n", 39 | "\n", 40 | "* The roles used to give learning and hosting access to your data. This will automatically be obtained from the role used to start the notebook\n", 41 | "* The S3 bucket that you want to use for training and model data\n", 42 | "* The Amazon sagemaker image classification docker image which need not be changed" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": null, 48 | "metadata": { 49 | "collapsed": true 50 | }, 51 | "outputs": [], 52 | "source": [ 53 | "%%time\n", 54 | "import boto3\n", 55 | "import re\n", 56 | "from sagemaker import get_execution_role\n", 57 | "\n", 58 | "role = get_execution_role()\n", 59 | "\n", 60 | "bucket='<>' # customize to your bucket\n", 61 | "\n", 62 | "containers = {'us-west-2': '433757028032.dkr.ecr.us-west-2.amazonaws.com/image-classification:latest',\n", 63 | " 'us-east-1': '811284229777.dkr.ecr.us-east-1.amazonaws.com/image-classification:latest',\n", 64 | " 'us-east-2': '825641698319.dkr.ecr.us-east-2.amazonaws.com/image-classification:latest',\n", 65 | " 'eu-west-1': '685385470294.dkr.ecr.eu-west-1.amazonaws.com/image-classification:latest'}\n", 66 | "training_image = containers[boto3.Session().region_name]\n", 67 | "print(training_image)" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "## Fine-tuning the Image classification model\n", 75 | "\n", 76 | "The caltech 256 dataset consist of images from 257 categories (the last one being a clutter category) and has 30k images with a minimum of 80 images and a maximum of about 800 images per category. \n", 77 | "\n", 78 | "The image classification algorithm can take two types of input formats. 
The first is a [recordio format](https://mxnet.incubator.apache.org/tutorials/basic/record_io.html) and the other is a [lst format](https://mxnet.incubator.apache.org/how_to/recordio.html?highlight=im2rec). Files for both these formats are available at http://data.dmlc.ml/mxnet/data/caltech-256/. In this example, we will use the recordio format for training and use the training/validation split [specified here](http://data.dmlc.ml/mxnet/data/caltech-256/)." 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": null, 84 | "metadata": { 85 | "collapsed": true 86 | }, 87 | "outputs": [], 88 | "source": [ 89 | "import os\n", 90 | "import urllib.request\n", 91 | "import boto3\n", 92 | "\n", 93 | "def download(url):\n", 94 | " filename = url.split(\"/\")[-1]\n", 95 | " if not os.path.exists(filename):\n", 96 | " urllib.request.urlretrieve(url, filename)\n", 97 | "\n", 98 | " \n", 99 | "def upload_to_s3(channel, file):\n", 100 | " s3 = boto3.resource('s3')\n", 101 | " data = open(file, \"rb\")\n", 102 | " key = channel + '/' + file\n", 103 | " s3.Bucket(bucket).put_object(Key=key, Body=data)\n", 104 | "\n", 105 | "\n", 106 | "# # caltech-256\n", 107 | "download('http://data.mxnet.io/data/caltech-256/caltech-256-60-train.rec')\n", 108 | "download('http://data.mxnet.io/data/caltech-256/caltech-256-60-val.rec')\n", 109 | "upload_to_s3('validation', 'caltech-256-60-val.rec')\n", 110 | "upload_to_s3('train', 'caltech-256-60-train.rec')" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "metadata": {}, 116 | "source": [ 117 | "Once we have the data available in the correct format for training, the next step is to actually train the model using the data. Before training the model, we need to setup the training parameters. The next section will explain the parameters in detail." 
118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "## Training parameters\n", 125 | "There are two kinds of parameters that need to be set for training. The first kind is the set of parameters for the training job. These include:\n", 126 | "\n", 127 | "* **Input specification**: These are the training and validation channels that specify the path where training data is present. These are specified in the \"InputDataConfig\" section. The main parameters that need to be set are the \"ContentType\", which can be set to \"application/x-recordio\" or \"application/x-image\" based on the input data format, and the S3Uri, which specifies the bucket and the folder where the data is present. \n", 128 | "* **Output specification**: This is specified in the \"OutputDataConfig\" section. We just need to specify the path where the output can be stored after training\n", 129 | "* **Resource config**: This section specifies the type of instance on which to run the training and the number of hosts used for training. If \"InstanceCount\" is more than 1, then training can be run in a distributed manner. \n", 130 | "\n", 131 | "Apart from the above set of parameters, there are hyperparameters that are specific to the algorithm. These are:\n", 132 | "\n", 133 | "* **num_layers**: The number of layers (depth) for the network. We use 18 in this sample but other values such as 50, 152 can be used.\n", 134 | "* **num_training_samples**: This is the total number of training samples. It is set to 15420 for the caltech dataset with the current split\n", 135 | "* **num_classes**: This is the number of output classes for the new dataset. Imagenet was trained with 1000 output classes but the number of output classes can be changed for fine-tuning. 
For caltech, we use 257 because it has 256 object categories + 1 clutter class\n", 136 | "* **epochs**: Number of training epochs\n", 137 | "* **learning_rate**: Learning rate for training\n", 138 | "* **mini_batch_size**: The number of training samples used for each mini batch. In distributed training, the number of training samples used per batch will be N * mini_batch_size where N is the number of hosts on which training is run" 139 | ] 140 | }, 141 | { 142 | "cell_type": "markdown", 143 | "metadata": {}, 144 | "source": [ 145 | "After setting the training parameters, we kick off training and poll for status until training is completed, which in this example takes between 10 and 12 minutes per epoch on a p2.xlarge machine. The network typically converges after 10 epochs. " 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": null, 151 | "metadata": { 152 | "collapsed": true, 153 | "isConfigCell": true 154 | }, 155 | "outputs": [], 156 | "source": [ 157 | "# The algorithm supports multiple network depths (number of layers). 
They are 18, 34, 50, 101, 152 and 200\n", 158 | "# For this training, we will use 18 layers\n", 159 | "num_layers = 18\n", 160 | "# we need to specify the input image shape for the training data\n", 161 | "image_shape = \"3,224,224\"\n", 162 | "# we also need to specify the number of training samples in the training set\n", 163 | "# for caltech it is 15420\n", 164 | "num_training_samples = 15420\n", 165 | "# specify the number of output classes\n", 166 | "num_classes = 257\n", 167 | "# batch size for training\n", 168 | "mini_batch_size = 128\n", 169 | "# number of epochs\n", 170 | "epochs = 2\n", 171 | "# learning rate\n", 172 | "learning_rate = 0.01\n", 173 | "top_k=2\n", 174 | "# Since we are using transfer learning, we set use_pretrained_model to 1 so that weights can be \n", 175 | "# initialized with pre-trained weights\n", 176 | "use_pretrained_model = 1" 177 | ] 178 | }, 179 | { 180 | "cell_type": "markdown", 181 | "metadata": {}, 182 | "source": [ 183 | "# Training\n", 184 | "Run the training using Amazon sagemaker CreateTrainingJob API" 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": null, 190 | "metadata": { 191 | "collapsed": true 192 | }, 193 | "outputs": [], 194 | "source": [ 195 | "%%time\n", 196 | "import time\n", 197 | "import boto3\n", 198 | "from time import gmtime, strftime\n", 199 | "\n", 200 | "\n", 201 | "s3 = boto3.client('s3')\n", 202 | "# create unique job name \n", 203 | "job_name_prefix = 'sagemaker-imageclassification-notebook'\n", 204 | "timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())\n", 205 | "job_name = job_name_prefix + timestamp\n", 206 | "training_params = \\\n", 207 | "{\n", 208 | " # specify the training docker image\n", 209 | " \"AlgorithmSpecification\": {\n", 210 | " \"TrainingImage\": training_image,\n", 211 | " \"TrainingInputMode\": \"File\"\n", 212 | " },\n", 213 | " \"RoleArn\": role,\n", 214 | " \"OutputDataConfig\": {\n", 215 | " \"S3OutputPath\": 
's3://{}/{}/output'.format(bucket, job_name_prefix)\n", 216 | " },\n", 217 | " \"ResourceConfig\": {\n", 218 | " \"InstanceCount\": 1,\n", 219 | " \"InstanceType\": \"ml.p2.8xlarge\",\n", 220 | " \"VolumeSizeInGB\": 50\n", 221 | " },\n", 222 | " \"TrainingJobName\": job_name,\n", 223 | " \"HyperParameters\": {\n", 224 | " \"image_shape\": image_shape,\n", 225 | " \"num_layers\": str(num_layers),\n", 226 | " \"num_training_samples\": str(num_training_samples),\n", 227 | " \"num_classes\": str(num_classes),\n", 228 | " \"mini_batch_size\": str(mini_batch_size),\n", 229 | " \"epochs\": str(epochs),\n", 230 | " \"learning_rate\": str(learning_rate),\n", 231 | " \"use_pretrained_model\": str(use_pretrained_model)\n", 232 | " },\n", 233 | " \"StoppingCondition\": {\n", 234 | " \"MaxRuntimeInSeconds\": 360000\n", 235 | " },\n", 236 | "#Training data should be inside a subdirectory called \"train\"\n", 237 | "#Validation data should be inside a subdirectory called \"validation\"\n", 238 | "#The algorithm currently only supports fullyreplicated model (where data is copied onto each machine)\n", 239 | " \"InputDataConfig\": [\n", 240 | " {\n", 241 | " \"ChannelName\": \"train\",\n", 242 | " \"DataSource\": {\n", 243 | " \"S3DataSource\": {\n", 244 | " \"S3DataType\": \"S3Prefix\",\n", 245 | " \"S3Uri\": 's3://{}/train/'.format(bucket),\n", 246 | " \"S3DataDistributionType\": \"FullyReplicated\"\n", 247 | " }\n", 248 | " },\n", 249 | " \"ContentType\": \"application/x-recordio\",\n", 250 | " \"CompressionType\": \"None\"\n", 251 | " },\n", 252 | " {\n", 253 | " \"ChannelName\": \"validation\",\n", 254 | " \"DataSource\": {\n", 255 | " \"S3DataSource\": {\n", 256 | " \"S3DataType\": \"S3Prefix\",\n", 257 | " \"S3Uri\": 's3://{}/validation/'.format(bucket),\n", 258 | " \"S3DataDistributionType\": \"FullyReplicated\"\n", 259 | " }\n", 260 | " },\n", 261 | " \"ContentType\": \"application/x-recordio\",\n", 262 | " \"CompressionType\": \"None\"\n", 263 | " }\n", 264 | " ]\n", 265 
| "}\n", 266 | "print('Training job name: {}'.format(job_name))\n", 267 | "print('\\nInput Data Location: {}'.format(training_params['InputDataConfig'][0]['DataSource']['S3DataSource']))" 268 | ] 269 | }, 270 | { 271 | "cell_type": "code", 272 | "execution_count": null, 273 | "metadata": { 274 | "collapsed": true 275 | }, 276 | "outputs": [], 277 | "source": [ 278 | "# create the Amazon SageMaker training job\n", 279 | "sagemaker = boto3.client(service_name='sagemaker')\n", 280 | "sagemaker.create_training_job(**training_params)\n", 281 | "\n", 282 | "# confirm that the training job has started\n", 283 | "status = sagemaker.describe_training_job(TrainingJobName=job_name)['TrainingJobStatus']\n", 284 | "print('Training job current status: {}'.format(status))\n", 285 | "\n", 286 | "try:\n", 287 | " # wait for the job to finish and report the ending status\n", 288 | " sagemaker.get_waiter('training_job_completed_or_stopped').wait(TrainingJobName=job_name)\n", 289 | " training_info = sagemaker.describe_training_job(TrainingJobName=job_name)\n", 290 | " status = training_info['TrainingJobStatus']\n", 291 | " print(\"Training job ended with status: \" + status)\n", 292 | "except:\n", 293 | " print('Training failed to start')\n", 294 | " # if exception is raised, that means it has failed\n", 295 | " message = sagemaker.describe_training_job(TrainingJobName=job_name)['FailureReason']\n", 296 | " print('Training failed with the following error: {}'.format(message))" 297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": null, 302 | "metadata": { 303 | "collapsed": true 304 | }, 305 | "outputs": [], 306 | "source": [ 307 | "training_info = sagemaker.describe_training_job(TrainingJobName=job_name)\n", 308 | "status = training_info['TrainingJobStatus']\n", 309 | "print(\"Training job ended with status: \" + status)" 310 | ] 311 | }, 312 | { 313 | "cell_type": "markdown", 314 | "metadata": {}, 315 | "source": [ 316 | "If you see the message,\n", 317 | 
"\n", 318 | "> `Training job ended with status: Completed`\n", 319 | "\n", 320 | "then that means training successfully completed and the output model was stored in the output path specified by `training_params['OutputDataConfig']`.\n", 321 | "\n", 322 | "You can also view information about and the status of a training job using the AWS SageMaker console. Just click on the \"Jobs\" tab." 323 | ] 324 | }, 325 | { 326 | "cell_type": "markdown", 327 | "metadata": {}, 328 | "source": [ 329 | "# Inference\n", 330 | "\n", 331 | "***\n", 332 | "\n", 333 | "A trained model does nothing on its own. We now want to use the model to perform inference. For this example, that means predicting the category of a given image.\n", 334 | "\n", 335 | "This section involves several steps,\n", 336 | "\n", 337 | "1. [Create Model](#Create-Model) - Create model for the training output\n", 338 | "1. [Create Endpoint Configuration](#Create-Endpoint-Configuration) - Create a configuration defining an endpoint.\n", 339 | "1. [Create Endpoint](#Create-Endpoint) - Use the configuration to create an inference endpoint.\n", 340 | "1. [Perform Inference](#Perform-Inference) - Perform inference on some input data using the endpoint." 341 | ] 342 | }, 343 | { 344 | "cell_type": "markdown", 345 | "metadata": {}, 346 | "source": [ 347 | "## Create Model\n", 348 | "\n", 349 | "We now create a SageMaker Model from the training output. Using the model we can create an Endpoint Configuration." 
350 | ] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "execution_count": null, 355 | "metadata": { 356 | "collapsed": true 357 | }, 358 | "outputs": [], 359 | "source": [ 360 | "%%time\n", 361 | "import boto3\n", 362 | "from time import gmtime, strftime\n", 363 | "\n", 364 | "sage = boto3.Session().client(service_name='sagemaker') \n", 365 | "\n", 366 | "model_name=\"test-image-classification-model\"\n", 367 | "print(model_name)\n", 368 | "info = sage.describe_training_job(TrainingJobName=job_name)\n", 369 | "model_data = info['ModelArtifacts']['S3ModelArtifacts']\n", 370 | "print(model_data)\n", 371 | "containers = {'us-west-2': '433757028032.dkr.ecr.us-west-2.amazonaws.com/image-classification:latest',\n", 372 | " 'us-east-1': '811284229777.dkr.ecr.us-east-1.amazonaws.com/image-classification:latest',\n", 373 | " 'us-east-2': '825641698319.dkr.ecr.us-east-2.amazonaws.com/image-classification:latest',\n", 374 | " 'eu-west-1': '685385470294.dkr.ecr.eu-west-1.amazonaws.com/image-classification:latest'}\n", 375 | "hosting_image = containers[boto3.Session().region_name]\n", 376 | "primary_container = {\n", 377 | " 'Image': hosting_image,\n", 378 | " 'ModelDataUrl': model_data,\n", 379 | "}\n", 380 | "\n", 381 | "create_model_response = sage.create_model(\n", 382 | " ModelName = model_name,\n", 383 | " ExecutionRoleArn = role,\n", 384 | " PrimaryContainer = primary_container)\n", 385 | "\n", 386 | "print(create_model_response['ModelArn'])" 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": {}, 392 | "source": [ 393 | "### Create Endpoint Configuration\n", 394 | "At launch, we will support configuring REST endpoints in hosting with multiple models, e.g. for A/B testing purposes. 
In order to support this, customers create an endpoint configuration, that describes the distribution of traffic across the models, whether split, shadowed, or sampled in some way.\n", 395 | "\n", 396 | "In addition, the endpoint configuration describes the instance type required for model deployment, and at launch will describe the autoscaling configuration." 397 | ] 398 | }, 399 | { 400 | "cell_type": "code", 401 | "execution_count": null, 402 | "metadata": { 403 | "collapsed": true 404 | }, 405 | "outputs": [], 406 | "source": [ 407 | "from time import gmtime, strftime\n", 408 | "\n", 409 | "timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())\n", 410 | "endpoint_config_name = job_name_prefix + '-epc-' + timestamp\n", 411 | "endpoint_config_response = sage.create_endpoint_config(\n", 412 | " EndpointConfigName = endpoint_config_name,\n", 413 | " ProductionVariants=[{\n", 414 | " 'InstanceType':'ml.m4.xlarge',\n", 415 | " 'InitialInstanceCount':1,\n", 416 | " 'ModelName':model_name,\n", 417 | " 'VariantName':'AllTraffic'}])\n", 418 | "\n", 419 | "print('Endpoint configuration name: {}'.format(endpoint_config_name))\n", 420 | "print('Endpoint configuration arn: {}'.format(endpoint_config_response['EndpointConfigArn']))" 421 | ] 422 | }, 423 | { 424 | "cell_type": "markdown", 425 | "metadata": {}, 426 | "source": [ 427 | "### Create Endpoint\n", 428 | "Lastly, the customer creates the endpoint that serves up the model, through specifying the name and configuration defined above. The end result is an endpoint that can be validated and incorporated into production applications. This takes 9-11 minutes to complete." 
429 | ] 430 | }, 431 | { 432 | "cell_type": "code", 433 | "execution_count": null, 434 | "metadata": { 435 | "collapsed": true 436 | }, 437 | "outputs": [], 438 | "source": [ 439 | "%%time\n", 440 | "import time\n", 441 | "\n", 442 | "timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())\n", 443 | "endpoint_name = job_name_prefix + '-ep-' + timestamp\n", 444 | "print('Endpoint name: {}'.format(endpoint_name))\n", 445 | "\n", 446 | "endpoint_params = {\n", 447 | " 'EndpointName': endpoint_name,\n", 448 | " 'EndpointConfigName': endpoint_config_name,\n", 449 | "}\n", 450 | "endpoint_response = sagemaker.create_endpoint(**endpoint_params)\n", 451 | "print('EndpointArn = {}'.format(endpoint_response['EndpointArn']))" 452 | ] 453 | }, 454 | { 455 | "cell_type": "markdown", 456 | "metadata": {}, 457 | "source": [ 458 | "The endpoint is now being created. This may take some time; the next cell waits until the endpoint is in service." 459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "execution_count": null, 464 | "metadata": { 465 | "collapsed": true 466 | }, 467 | "outputs": [], 468 | "source": [ 469 | "# get the status of the endpoint\n", 470 | "response = sagemaker.describe_endpoint(EndpointName=endpoint_name)\n", 471 | "status = response['EndpointStatus']\n", 472 | "print('EndpointStatus = {}'.format(status))\n", 473 | "\n", 474 | "\n", 475 | "# wait until the status has changed\n", 476 | "sagemaker.get_waiter('endpoint_in_service').wait(EndpointName=endpoint_name)\n", 477 | "\n", 478 | "\n", 479 | "# print the status of the endpoint\n", 480 | "endpoint_response = sagemaker.describe_endpoint(EndpointName=endpoint_name)\n", 481 | "status = endpoint_response['EndpointStatus']\n", 482 | "print('Endpoint creation ended with EndpointStatus = {}'.format(status))\n", 483 | "\n", 484 | "if status != 'InService':\n", 485 | " raise Exception('Endpoint creation failed.')" 486 | ] 487 | }, 488 | { 489 | "cell_type": "markdown", 490 | "metadata": {}, 491 | "source": [ 492 | "If you see 
the message,\n", 493 | "\n", 494 | "> `Endpoint creation ended with EndpointStatus = InService`\n", 495 | "\n", 496 | "then congratulations! You now have a functioning inference endpoint. You can confirm the endpoint configuration and status by navigating to the \"Endpoints\" tab in the AWS SageMaker console.\n", 497 | "\n", 498 | "We will finally create a runtime object from which we can invoke the endpoint." 499 | ] 500 | }, 501 | { 502 | "cell_type": "markdown", 503 | "metadata": {}, 504 | "source": [ 505 | "## Perform Inference\n", 506 | "Finally, the customer can now validate the model for use. They can obtain the endpoint from the client library using the result from previous operations, and generate classifications from the trained model using that endpoint.\n" 507 | ] 508 | }, 509 | { 510 | "cell_type": "code", 511 | "execution_count": null, 512 | "metadata": { 513 | "collapsed": true 514 | }, 515 | "outputs": [], 516 | "source": [ 517 | "import boto3\n", 518 | "runtime = boto3.Session().client(service_name='runtime.sagemaker') " 519 | ] 520 | }, 521 | { 522 | "cell_type": "markdown", 523 | "metadata": {}, 524 | "source": [ 525 | "### Download test image" 526 | ] 527 | }, 528 | { 529 | "cell_type": "code", 530 | "execution_count": null, 531 | "metadata": { 532 | "collapsed": true 533 | }, 534 | "outputs": [], 535 | "source": [ 536 | "!wget -O /tmp/test.jpg http://www.vision.caltech.edu/Image_Datasets/Caltech256/images/008.bathtub/008_0007.jpg\n", 537 | "file_name = '/tmp/test.jpg'\n", 538 | "# test image\n", 539 | "from IPython.display import Image\n", 540 | "Image(file_name) " 541 | ] 542 | }, 543 | { 544 | "cell_type": "code", 545 | "execution_count": null, 546 | "metadata": { 547 | "collapsed": true 548 | }, 549 | "outputs": [], 550 | "source": [ 551 | "import json\n", 552 | "import numpy as np\n", 553 | "with open(file_name, 'rb') as f:\n", 554 | " payload = f.read()\n", 555 | " payload = bytearray(payload)\n", 556 | "response = 
runtime.invoke_endpoint(EndpointName=endpoint_name, \n", 557 | " ContentType='application/x-image', \n", 558 | " Body=payload)\n", 559 | "result = response['Body'].read()\n", 560 | "# result will be in json format and convert it to ndarray\n", 561 | "result = json.loads(result)\n", 562 | "# the result will output the probabilities for all classes\n", 563 | "# find the class with maximum probability and print the class index\n", 564 | "index = np.argmax(result)\n", 565 | "object_categories = ['ak47', 'american-flag', 'backpack', 'baseball-bat', 'baseball-glove', 'basketball-hoop', 'bat', 'bathtub', 'bear', 'beer-mug', 'billiards', 'binoculars', 'birdbath', 'blimp', 'bonsai-101', 'boom-box', 'bowling-ball', 'bowling-pin', 'boxing-glove', 'brain-101', 'breadmaker', 'buddha-101', 'bulldozer', 'butterfly', 'cactus', 'cake', 'calculator', 'camel', 'cannon', 'canoe', 'car-tire', 'cartman', 'cd', 'centipede', 'cereal-box', 'chandelier-101', 'chess-board', 'chimp', 'chopsticks', 'cockroach', 'coffee-mug', 'coffin', 'coin', 'comet', 'computer-keyboard', 'computer-monitor', 'computer-mouse', 'conch', 'cormorant', 'covered-wagon', 'cowboy-hat', 'crab-101', 'desk-globe', 'diamond-ring', 'dice', 'dog', 'dolphin-101', 'doorknob', 'drinking-straw', 'duck', 'dumb-bell', 'eiffel-tower', 'electric-guitar-101', 'elephant-101', 'elk', 'ewer-101', 'eyeglasses', 'fern', 'fighter-jet', 'fire-extinguisher', 'fire-hydrant', 'fire-truck', 'fireworks', 'flashlight', 'floppy-disk', 'football-helmet', 'french-horn', 'fried-egg', 'frisbee', 'frog', 'frying-pan', 'galaxy', 'gas-pump', 'giraffe', 'goat', 'golden-gate-bridge', 'goldfish', 'golf-ball', 'goose', 'gorilla', 'grand-piano-101', 'grapes', 'grasshopper', 'guitar-pick', 'hamburger', 'hammock', 'harmonica', 'harp', 'harpsichord', 'hawksbill-101', 'head-phones', 'helicopter-101', 'hibiscus', 'homer-simpson', 'horse', 'horseshoe-crab', 'hot-air-balloon', 'hot-dog', 'hot-tub', 'hourglass', 'house-fly', 'human-skeleton', 'hummingbird', 
'ibis-101', 'ice-cream-cone', 'iguana', 'ipod', 'iris', 'jesus-christ', 'joy-stick', 'kangaroo-101', 'kayak', 'ketch-101', 'killer-whale', 'knife', 'ladder', 'laptop-101', 'lathe', 'leopards-101', 'license-plate', 'lightbulb', 'light-house', 'lightning', 'llama-101', 'mailbox', 'mandolin', 'mars', 'mattress', 'megaphone', 'menorah-101', 'microscope', 'microwave', 'minaret', 'minotaur', 'motorbikes-101', 'mountain-bike', 'mushroom', 'mussels', 'necktie', 'octopus', 'ostrich', 'owl', 'palm-pilot', 'palm-tree', 'paperclip', 'paper-shredder', 'pci-card', 'penguin', 'people', 'pez-dispenser', 'photocopier', 'picnic-table', 'playing-card', 'porcupine', 'pram', 'praying-mantis', 'pyramid', 'raccoon', 'radio-telescope', 'rainbow', 'refrigerator', 'revolver-101', 'rifle', 'rotary-phone', 'roulette-wheel', 'saddle', 'saturn', 'school-bus', 'scorpion-101', 'screwdriver', 'segway', 'self-propelled-lawn-mower', 'sextant', 'sheet-music', 'skateboard', 'skunk', 'skyscraper', 'smokestack', 'snail', 'snake', 'sneaker', 'snowmobile', 'soccer-ball', 'socks', 'soda-can', 'spaghetti', 'speed-boat', 'spider', 'spoon', 'stained-glass', 'starfish-101', 'steering-wheel', 'stirrups', 'sunflower-101', 'superman', 'sushi', 'swan', 'swiss-army-knife', 'sword', 'syringe', 'tambourine', 'teapot', 'teddy-bear', 'teepee', 'telephone-box', 'tennis-ball', 'tennis-court', 'tennis-racket', 'theodolite', 'toaster', 'tomato', 'tombstone', 'top-hat', 'touring-bike', 'tower-pisa', 'traffic-light', 'treadmill', 'triceratops', 'tricycle', 'trilobite-101', 'tripod', 't-shirt', 'tuning-fork', 'tweezer', 'umbrella-101', 'unicorn', 'vcr', 'video-projector', 'washing-machine', 'watch-101', 'waterfall', 'watermelon', 'welding-mask', 'wheelbarrow', 'windmill', 'wine-bottle', 'xylophone', 'yarmulke', 'yo-yo', 'zebra', 'airplanes-101', 'car-side-101', 'faces-easy-101', 'greyhound', 'tennis-shoes', 'toad', 'clutter']\n", 566 | "print(\"Result: label - \" + object_categories[index] + \", probability - \" + 
str(result[index]))" 567 | ] 568 | }, 569 | { 570 | "cell_type": "markdown", 571 | "metadata": {}, 572 | "source": [ 573 | "### Clean up\n", 574 | "\n", 575 | "When we're done with the endpoint, we can just delete it and the backing instances will be released. Run the following cell to delete the endpoint." 576 | ] 577 | }, 578 | { 579 | "cell_type": "code", 580 | "execution_count": null, 581 | "metadata": { 582 | "collapsed": true 583 | }, 584 | "outputs": [], 585 | "source": [ 586 | "sage.delete_endpoint(EndpointName=endpoint_name)" 587 | ] 588 | }, 589 | { 590 | "cell_type": "code", 591 | "execution_count": null, 592 | "metadata": { 593 | "collapsed": true 594 | }, 595 | "outputs": [], 596 | "source": [] 597 | } 598 | ], 599 | "metadata": { 600 | "kernelspec": { 601 | "display_name": "conda_mxnet_p36", 602 | "language": "python", 603 | "name": "conda_mxnet_p36" 604 | }, 605 | "language_info": { 606 | "codemirror_mode": { 607 | "name": "ipython", 608 | "version": 3 609 | }, 610 | "file_extension": ".py", 611 | "mimetype": "text/x-python", 612 | "name": "python", 613 | "nbconvert_exporter": "python", 614 | "pygments_lexer": "ipython3", 615 | "version": "3.6.2" 616 | } 617 | }, 618 | "nbformat": 4, 619 | "nbformat_minor": 2 620 | } 621 | -------------------------------------------------------------------------------- /Instructor.html: -------------------------------------------------------------------------------- 1 |

2 | Deep Learning with SageMaker and TensorFlow

3 |

4 | Required IAM Roles and Permissions

5 |

6 | SageMaker Role

7 |

This CloudFormation template creates the IAM role that students must use when creating their SageMaker notebook instance. It outputs an ARN that is used during the workshop.

8 |

SageMaker Service Role (CloudFormation Template)

9 |
{
	"Resources": {
		"SageMakerLab": {
			"Type": "AWS::IAM::Role",
			"Properties": {
				"AssumeRolePolicyDocument": {
					"Version": "2012-10-17",
					"Statement": [{
						"Effect": "Allow",
						"Principal": {
							"Service": [
								"sagemaker.amazonaws.com"
							]
						},
						"Action": [
							"sts:AssumeRole"
						]
					}]
				},
				"ManagedPolicyArns": [
					"arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"
				]
			}
		}
	},
	"Outputs": {
		"qwikLAB": {
			"Description": "Outputs to be used by qwikLAB",
			"Value": {
				"Fn::Join": [
					"", [
						"{",
						"\"Resource-ARN\": \"",
						{
							"Fn::GetAtt": [
								"SageMakerLab",
								"Arn"
							]
						},
						"\"}"
					]
				]
			}
		}
	}
}
55 |
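As a rough illustration of how an instructor might deploy this template and read back the role ARN, here is a minimal Python sketch using boto3. The stack name `sagemaker-lab-role` and the helper names are assumptions made for this example and are not part of the workshop materials. The `qwikLAB` output is a small JSON string, so the ARN can be extracted with `json.loads`.

```python
import json

# The role template from this section, expressed as a Python dict.
ROLE_TEMPLATE = {
    "Resources": {
        "SageMakerLab": {
            "Type": "AWS::IAM::Role",
            "Properties": {
                "AssumeRolePolicyDocument": {
                    "Version": "2012-10-17",
                    "Statement": [{
                        "Effect": "Allow",
                        "Principal": {"Service": ["sagemaker.amazonaws.com"]},
                        "Action": ["sts:AssumeRole"],
                    }],
                },
                "ManagedPolicyArns": [
                    "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess"
                ],
            },
        }
    },
    "Outputs": {
        "qwikLAB": {
            "Description": "Outputs to be used by qwikLAB",
            "Value": {
                "Fn::Join": ["", [
                    "{",
                    "\"Resource-ARN\": \"",
                    {"Fn::GetAtt": ["SageMakerLab", "Arn"]},
                    "\"}",
                ]]
            },
        }
    },
}


def parse_role_arn(output_value):
    """Extract the role ARN from the qwikLAB output, which is a JSON string."""
    return json.loads(output_value)["Resource-ARN"]


def create_role_stack(stack_name="sagemaker-lab-role"):  # hypothetical stack name
    """Create the stack, wait for completion, and return the role ARN."""
    import boto3  # imported here so the helpers above work without the AWS SDK

    cfn = boto3.client("cloudformation")
    cfn.create_stack(
        StackName=stack_name,
        TemplateBody=json.dumps(ROLE_TEMPLATE),
        Capabilities=["CAPABILITY_IAM"],  # required because the stack creates an IAM role
    )
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
    outputs = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]["Outputs"]
    value = next(o["OutputValue"] for o in outputs if o["OutputKey"] == "qwikLAB")
    return parse_role_arn(value)
```

Calling `create_role_stack()` requires AWS credentials with permission to create IAM roles; `parse_role_arn` can be tried on its own with any string shaped like the template's output.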

56 | SageMaker User Policy:

57 |

These are the IAM permissions that should be set up for every user.

58 |
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:*",
                "ecr:GetAuthorizationToken",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "ecr:BatchCheckLayerAvailability",
                "cloudwatch:PutMetricData",
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:DescribeLogStreams",
                "logs:PutLogEvents",
                "logs:GetLogEvents",
                "s3:CreateBucket",
                "s3:ListBucket",
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListAllMyBuckets",
                "iam:ListRoles"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "iam:PassedToService": "sagemaker.amazonaws.com"
                }
            }
        }
    ]
}

101 | Resources Created

102 |

The resources below will be created in the AWS account. All are per student unless noted otherwise.

103 | 136 | 137 | -------------------------------------------------------------------------------- /Instructor.md: -------------------------------------------------------------------------------- 1 | # Deep Learning with Sagemaker and Tensorflow 2 | 3 | ## Required IAM Roles and Permissions 4 | 5 | ### Sagemaker Role 6 | 7 | This is a CloudFormation template for the IAM role that needs to be used by the students when creating their Sagemaker notebook. It will output an ARN that will be used during the workshop. 8 | 9 | Sagemaker Service Role (Cloudformation Template) 10 | 11 | ```{ 12 | "Resources": { 13 | "SageMakerLab": { 14 | "Type": "AWS::IAM::Role", 15 | "Properties": { 16 | "AssumeRolePolicyDocument": { 17 | "Version": "2012-10-17", 18 | "Statement": [{ 19 | "Effect": "Allow", 20 | "Principal": { 21 | "Service": [ 22 | "sagemaker.amazonaws.com" 23 | ] 24 | }, 25 | "Action": [ 26 | "sts:AssumeRole" 27 | ] 28 | }] 29 | }, 30 | "ManagedPolicyArns": [ 31 | "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess" 32 | ] 33 | } 34 | } 35 | }, 36 | "Outputs": { 37 | "qwikLAB": { 38 | "Description": "Outputs to be used by qwikLAB", 39 | "Value": { 40 | "Fn::Join": [ 41 | "", [ 42 | "{", 43 | "\"Resource-ARN\": \"", 44 | { 45 | "Fn::GetAtt": [ 46 | "SageMakerLab", 47 | "Arn" 48 | ] 49 | }, 50 | "\"}" 51 | ] 52 | ] 53 | } 54 | } 55 | } 56 | } 57 | ``` 58 | 59 | ### Sagemaker User Policy: 60 | These are the IAM permissions that should be setup for every user. 
61 | 62 | ```{ 63 | "Version": "2012-10-17", 64 | "Statement": [ 65 | { 66 | "Effect": "Allow", 67 | "Action": [ 68 | "sagemaker:*", 69 | "ecr:GetAuthorizationToken", 70 | "ecr:GetDownloadUrlForLayer", 71 | "ecr:BatchGetImage", 72 | "ecr:BatchCheckLayerAvailability", 73 | "cloudwatch:PutMetricData", 74 | "logs:CreateLogGroup", 75 | "logs:CreateLogStream", 76 | "logs:DescribeLogStreams", 77 | "logs:PutLogEvents", 78 | "logs:GetLogEvents", 79 | "s3:CreateBucket", 80 | "s3:ListBucket", 81 | "s3:GetBucketLocation", 82 | "s3:GetObject", 83 | "s3:PutObject", 84 | "s3:DeleteObject", 85 | "s3:ListAllMyBuckets", 86 | "iam:ListRoles" 87 | ], 88 | "Resource": "*" 89 | }, 90 | { 91 | "Effect": "Allow", 92 | "Action": [ 93 | "iam:PassRole" 94 | ], 95 | "Resource": "*", 96 | "Condition": { 97 | "StringEquals": { 98 | "iam:PassedToService": "sagemaker.amazonaws.com" 99 | } 100 | } 101 | } 102 | ] 103 | } 104 | ``` 105 | 106 | ## Resources Created 107 | Below are the resources required in the AWS account that will be created. All are per student unless noted otherwise 108 | 109 | - 1 S3 Bucket 110 | - 1 Sagemaker Role (can be shared by all students) 111 | - Sagemaker 112 | - General 113 | - 1 Sagemaker Notebook instance (ml.m4.xlarge) 114 | - Module 2 (Gaming) 115 | - 1 Training instance (ml.c4.xlarge) 116 | - 1 Endpoint (ml.t2.medium) 117 | - Module 3 (Distributed TensorFlow) 118 | - 1 Training instance (2x ml.c4.8xlarge) 119 | - 1 Endpoint (ml.m4.xlarge) 120 | - Module 4 (Image Classification) 121 | - 1 S3 Bucket 122 | - 1 Training instance (ml.p2.8xlarge) 123 | - 1 Training instance (2x ml.c4.8xlarge) 124 | - 1 Endpoint (ml.m4.xlarge) 125 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2 | Apache License 3 | Version 2.0, January 2004 4 | http://www.apache.org/licenses/ 5 | 6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 7 | 8 | 1. 
Definitions. 9 | 10 | "License" shall mean the terms and conditions for use, reproduction, 11 | and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by 14 | the copyright owner that is granting the License. 15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. 
For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. 
Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. 
You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 
123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. 
In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | 177 | END OF TERMS AND CONDITIONS 178 | 179 | APPENDIX: How to apply the Apache License to your work. 180 | 181 | To apply the Apache License to your work, attach the following 182 | boilerplate notice, with the fields enclosed by brackets "[]" 183 | replaced with your own identifying information. (Don't include 184 | the brackets!) The text should be enclosed in the appropriate 185 | comment syntax for the file format. 
We also recommend that a 186 | file or class name and description of purpose be included on the 187 | same "printed page" as the copyright notice for easier 188 | identification within third-party archives. 189 | 190 | Copyright [yyyy] [name of copyright owner] 191 | 192 | Licensed under the Apache License, Version 2.0 (the "License"); 193 | you may not use this file except in compliance with the License. 194 | You may obtain a copy of the License at 195 | 196 | http://www.apache.org/licenses/LICENSE-2.0 197 | 198 | Unless required by applicable law or agreed to in writing, software 199 | distributed under the License is distributed on an "AS IS" BASIS, 200 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 201 | See the License for the specific language governing permissions and 202 | limitations under the License. 203 | -------------------------------------------------------------------------------- /NOTICE: -------------------------------------------------------------------------------- 1 | Amazon Sagemaker Workshop 2 | Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. 3 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Amazon SageMaker Workshop 2 | 3 | Amazon SageMaker is a fully-managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale. In this workshop, you'll create a SageMaker notebook instance and work through sample Jupyter notebooks that demonstrate some of the many features of SageMaker. For example, you'll create model training jobs using SageMaker's hosted training feature, and create endpoints to serve predictions from your models using SageMaker's hosted endpoint feature. Along the way you'll see how machine learning can be applied to both structured data (e.g. 
from CSV flat files) and unstructured data (e.g. images). 4 | 5 | ![Overview](./images/overview.png) 6 | 7 | ## Prerequisites 8 | 9 | ### AWS Account 10 | 11 | In order to complete this workshop you'll need an AWS Account with access to create AWS IAM, S3 and SageMaker resources. The code and instructions in this workshop assume only one student is using a given AWS account at a time. If you try sharing an account with another student, you'll run into naming conflicts for certain resources. You can work around these by appending a unique suffix to the resources that fail to create due to conflicts, but the instructions do not provide details on the changes required to make this work. 12 | 13 | Some of the resources you will launch as part of this workshop are eligible for the AWS free tier if your account is less than 12 months old. See the [AWS Free Tier page](https://aws.amazon.com/free/) for more details. 14 | 15 | ### AWS Region 16 | 17 | SageMaker is not available in all AWS Regions at this time. Accordingly, we recommend running this workshop in one of the following supported AWS Regions: N. Virginia, Oregon, Ohio, or Ireland. 18 | 19 | Once you've chosen a region, you should create all of the resources for this workshop there, including a new Amazon S3 bucket and a new SageMaker notebook instance. Make sure you select your region from the dropdown in the upper right corner of the AWS Console before getting started. 20 | 21 | ![Region selection screenshot](./images/region-selection.png) 22 | 23 | ### Browser 24 | 25 | We recommend you use the latest version of Chrome or Firefox to complete this workshop. 26 | 27 | ## Modules 28 | 29 | This workshop is divided into multiple modules. Module 1 must be completed first, followed by Module 2. You can complete the other modules (Modules 3 and 4) in any order. 30 | 31 | 1. Creating a Notebook Instance 32 | 2. Video Game Sales Notebook 33 | 3. Distributed Training with TensorFlow Notebook 34 | 4. 
Image Classification Notebook 35 | 36 | Be patient as you work your way through the notebook-based modules. After you run a cell in a notebook, it may take several seconds for the code to show results. For the cells that start training jobs, it may take several minutes. In particular, the last two modules have training jobs that may last up to 10 minutes. 37 | 38 | After you have completed the workshop, you can delete all of the resources that were created by following the Cleanup Guide provided with this lab guide. 39 | 40 | ## Module 1: Creating a Notebook Instance 41 | 42 | In this module we'll start by creating an Amazon S3 bucket that will be used throughout the workshop. We'll then create a SageMaker notebook instance, which we will use to run the other workshop modules. 43 | 44 | ### 1. Create a S3 Bucket 45 | 46 | SageMaker typically uses S3 as storage for data and model artifacts. In this step you'll create a S3 bucket for this purpose. To begin, sign into the AWS Management Console, https://console.aws.amazon.com/. 47 | 48 | #### High-Level Instructions 49 | 50 | Use the console or AWS CLI to create an Amazon S3 bucket. Keep in mind that your bucket's name must be globally unique across all regions and customers. We recommend using a name like `smworkshop-firstname-lastname`. If you get an error that your bucket name already exists, try adding additional numbers or characters until you find an unused name. 51 | 52 |
53 | Step-by-step instructions (expand for details)

54 | 55 | 1. In the AWS Management Console, choose **Services** then select **S3** under Storage. 56 | 57 | 1. Choose **+Create Bucket** 58 | 59 | 1. Provide a globally unique name for your bucket such as `smworkshop-firstname-lastname`. 60 | 61 | 1. Select the Region you've chosen to use for this workshop from the dropdown. 62 | 63 | 1. Choose **Create** in the lower left of the dialog without selecting a bucket to copy settings from. 64 | 65 |
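If you prefer to script the bucket creation instead of using the console, the steps above can be sketched in Python with boto3. The function names are illustrative assumptions, not part of the workshop. One detail worth knowing: the S3 API expects no `LocationConstraint` for N. Virginia (`us-east-1`), while every other region must be named explicitly.

```python
def create_bucket_kwargs(bucket_name, region):
    """Build the arguments for S3 create_bucket for the given region."""
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        # Regions other than us-east-1 must be passed as a location constraint.
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_workshop_bucket(bucket_name, region):
    """Create the workshop bucket in the chosen region."""
    import boto3  # imported here so create_bucket_kwargs works without the AWS SDK

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_kwargs(bucket_name, region))
```

For example, `create_workshop_bucket("smworkshop-firstname-lastname", "us-west-2")` would create the bucket in Oregon, provided the name is still unused.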

66 | 67 | ### 2. Launching the Notebook Instance 68 | 69 | 1. In the upper-right corner of the AWS Management Console, confirm you are in the desired AWS region. Select N. Virginia, Oregon, Ohio, or Ireland. 70 | 71 | 2. Click on Amazon SageMaker from the list of all services. This will bring you to the Amazon SageMaker console homepage. 72 | 73 | ![Services in Console](./images/Picture1.png) 74 | 75 | 3. To create a new notebook instance, go to **Notebook instances**, and click the **Create notebook instance** button at the top of the browser window. 76 | 77 | ![Notebook Instances](./images/Picture2.png) 78 | 79 | 4. Type [First Name]-[Last Name]-workshop into the **Notebook instance name** text box, and select ml.m4.xlarge for the **Notebook instance type**. 80 | 81 | 5. For IAM role, choose **Select an existing role** and choose one named "AmazonSageMaker-ExecutionRole-XXXX". 82 | 83 | ![Create Notebook Instance](./images/create-instance.png) 84 | 85 | 6. You can expand the "Tags" section and add tags here if required. 86 | 87 | 7. You will be taken back to the Create Notebook instance page. Click **Create notebook instance**. This will take several minutes to complete. 88 | 89 | ### 3. Accessing the Notebook Instance 90 | 91 | 1. Wait for the server status to change to **InService**. This will take a few minutes. 92 | 93 | ![Access Notebook](./images/Picture4.png) 94 | 95 | 2. Click **Open**. You will now see the Jupyter homepage for your notebook instance. 96 | 97 | ![Open Notebook](./images/Picture5.png) 98 | 99 | 100 | ## Module 2: Video Game Sales Notebook 101 | 102 | In this module, we'll work our way through an example Jupyter notebook that demonstrates how to use an Amazon-provided algorithm in SageMaker. More specifically, we'll use SageMaker's version of XGBoost, a popular and efficient open-source implementation of the gradient boosted trees algorithm. 
Gradient boosting is a supervised learning algorithm that attempts to predict a target variable by combining the estimates of a set of simpler, weaker models. XGBoost has done remarkably well in machine learning competitions because it robustly handles a wide variety of data types, relationships, and distributions. It often is a useful, go-to algorithm in working with structured data, such as data that might be found in relational databases and flat files. 103 | 104 | To begin, follow these steps: 105 | 106 | 1. Download this repository to your computer by clicking the green **Clone or download** button from the upper right of this page, then **Download ZIP**. 107 | - If you aren't accessing this on Github, you can download this here: [sagemaker-lab.zip](./sagemaker-lab.zip) 108 | 2. In your notebook instance, click the **New** button on the right and select **Folder**. 109 | 3. Click the checkbox next to your new folder, click the **Rename** button above in the menu bar, and give the folder a name such as 'video-game-sales'. 110 | 4. Click the folder to enter it. 111 | 5. To upload the notebook, click the **Upload** button on the right, then in the file selection popup, select the file 'video-game-sales.ipynb' from the folder on your computer where you downloaded this GitHub repository. Then click the blue **Upload** button that appears in the notebook next to the file name. 112 | 6. You are now ready to begin the notebook: click the notebook's file name to open it. 113 | 7. In the ```bucket = ''``` code line, paste the name of the S3 bucket you created in Module 1 to replace ``````. The code line should now read similar to ```bucket = 'smworkshop-john-smith'```. Do NOT paste the entire path (s3://.......), just the bucket name. 114 | 115 | 116 | Jupyter notebooks tell a story by combining explanatory text and code. There are two types of "cells" in a notebook: code cells, and "markdown" cells with explanatory text. 117 | - You will be running the code cells. 
These are distinguished by having "In" next to them in the left margin next to the cell, and a greyish background. Markdown cells lack "In" and have a white background. 118 | - To run a code cell, simply click in it, then either click the **Run Cell** button in the notebook's toolbar, or use Control+Enter from your computer's keyboard. 119 | - It may take a few seconds to a few minutes for a code cell to run. Please run each code cell in order, and only once, to avoid repeated operations. For example, running the same training job cell twice might create two training jobs, possibly exceeding your service limits. 120 | - Run through each cell in the video-game-sales notebook to complete this module 121 | 122 |
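A common mistake in step 7 is pasting a full S3 path instead of the bare bucket name. As a small safeguard, a helper like the one below could normalize the value before it is assigned to `bucket`. The helper is a hypothetical addition for illustration and does not appear in the notebook.

```python
def bucket_name_only(value):
    """Return just the bucket name from a bucket name or an s3:// path."""
    value = value.strip()
    if value.startswith("s3://"):
        value = value[len("s3://"):]
    # Drop any object key or trailing slash that follows the bucket name.
    return value.split("/", 1)[0]


bucket = bucket_name_only("s3://smworkshop-john-smith/some/prefix")
```

A plain bucket name passes through unchanged, so the helper is safe to apply in either case.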

NOTE: training the model for this example typically takes about 5 minutes.

123 | 124 | 125 | ## Module 3: Distributed Training with TensorFlow Notebook 126 | 127 | In this module we will be using images of handwritten digits from the [MNIST Database](http://yann.lecun.com/exdb/mnist/) to demonstrate how to perform distributed training using SageMaker. Using a convolutional neural network model based on the [TensorFlow MNIST Example](https://github.com/tensorflow/models/tree/master/official/mnist), we will demonstrate how to use a Jupyter notebook and the [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk) to create your own script to pre-process data, train a model, create a SageMaker hosted endpoint, and make predictions against this endpoint. The model will predict what the handwritten digit is in the image presented for prediction. Besides demonstrating a "bring your own script" for TensorFlow use case, the example also showcases how easy it is to set up a cluster of multiple instances for model training in SageMaker. 128 | 129 | 1. In your notebook instance, click the **New** button on the right and select **Folder**. 130 | 2. Click the checkbox next to your new folder, click the **Rename** button above in the menu bar, and give the folder a name such as 'tensorflow-distributed'. 131 | 3. Click the folder to enter it. 132 | 4. To upload the notebook, click the **Upload** button on the right, then in the file selection popup, select the file 'TensorFlow_Distributed_MNIST.ipynb' from the folder on your computer where you downloaded this GitHub repository. Then click the blue **Upload** button that appears in the notebook next to the file name. 133 | 5. You are now ready to begin the notebook: click the notebook's file name to open it, then follow the directions in the notebook. 134 | 135 |

NOTE: training the model for this example typically takes about 8 minutes.

136 | 137 | ## Module 4: Image Classification Notebook 138 | 139 | For this module, we'll work with an image classification example notebook. In particular, we'll use the Amazon-provided image classification algorithm, which is a supervised learning algorithm that takes an image as input and classifies it into one of multiple output categories. It uses a convolutional neural network (ResNet) that can be trained from scratch, or trained using transfer learning when a large number of training images are not available. Even if you don't have experience with neural networks or image classification, SageMaker's image classification algorithm makes the technology easy to use, with no need to design and set up your own neural network. 140 | 141 | Follow these steps: 142 | 143 | 1. In your notebook instance, click the **New** button on the right and select **Folder**. 144 | 2. Click the checkbox next to your new folder, click the **Rename** button above in the menu bar, and give the folder a name such as 'image-classification'. 145 | 3. Click the folder to enter it. 146 | 4. To upload the notebook, click the **Upload** button on the right, then in the file selection popup, select the file 'Image-classification-transfer-learning.ipynb' from the folder on your computer where you downloaded this GitHub repository. Then click the blue **Upload** button that appears in the notebook next to the file name. 147 | 5. You are now ready to begin the notebook: click the notebook's file name to open it, then follow the directions in the notebook. 148 | 149 |

NOTE: training the model for this example typically takes about 10 minutes. However, keep in mind that this is relatively short because transfer learning is used rather than training from scratch, which could take many hours.

150 | 151 | ## Cleanup Guide 152 | 153 | To avoid charges for resources you no longer need when you're done with this workshop, you can delete them or, in the case of your notebook instance, stop it. Here are the resources you should consider: 154 | 155 | - Endpoints: these are the clusters of one or more instances serving inferences from your models. If you did not delete them from within the notebooks, you can delete them via the SageMaker console. To do so, click the **Endpoints** link in the left panel. Then, for each endpoint, click the radio button next to it, then select **Delete** from the **Actions** drop-down menu. You can follow a similar procedure to delete the related Models and Endpoint configurations. 156 | 157 | - Notebook instance: you have two options if you do not want to keep the notebook instance running. If you would like to save it for later, you can stop it rather than delete it. To delete it, click the **Notebook instances** link in the left panel. Next, click the radio button next to the notebook instance created for this workshop, then select **Delete** from the **Actions** drop-down menu. To simply stop it instead, just click the **Stop** link. After it is stopped, you can start it again by clicking the **Start** link. Keep in mind that if you stop rather than delete it, you will still be charged for the storage associated with it. 158 | 159 | ## License 160 | 161 | The contents of this workshop are licensed under the Apache 2.0 License. 162 | -------------------------------------------------------------------------------- /TensorFlow_Distributed_MNIST.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# MNIST distributed training with TensorFlow \n", 8 | "\n", 9 | "## Contents\n", 10 | "\n", 11 | "1. [Background](#Background)\n", 12 | "1. [Setup](#Setup)\n", 13 | "1. [Data](#Data)\n", 14 | "1. 
[Train](#Train)\n", 15 | "1. [Host](#Host)\n", 16 | "1. [Predict](#Predict)\n", 17 | "\n", 18 | "\n", 19 | "## Background\n", 20 | "\n", 21 | "The **SageMaker Python SDK** helps you deploy your models for training and hosting in optimized, production-ready containers in SageMaker. The SageMaker Python SDK is easy to use, modular, extensible and compatible with TensorFlow and MXNet. This tutorial focuses on how to create a convolutional neural network model and train it on the [MNIST dataset](http://yann.lecun.com/exdb/mnist/) using **TensorFlow distributed training**.\n", 22 | "\n" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "\n", 30 | "## Setup\n", 31 | "\n", 32 | "Here we will start by importing the necessary libraries for this notebook." 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": null, 38 | "metadata": {}, 39 | "outputs": [], 40 | "source": [ 41 | "import os\n", 42 | "os.system(\"aws s3 cp s3://sagemaker-workshop-pdx/mnist/utils.py utils.py\")\n", 43 | "os.system(\"aws s3 cp s3://sagemaker-workshop-pdx/mnist/mnist.py mnist.py\")\n", 44 | "import sagemaker\n", 45 | "import utils\n", 46 | "import numpy as np\n", 47 | "import matplotlib.pyplot as plt\n", 48 | "from tensorflow.contrib.learn.python.learn.datasets import mnist\n", 49 | "import tensorflow as tf\n", 50 | "import boto3\n" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "Next we specify the IAM role ARN used to give training and hosting access to your data. See the documentation for how to create these. Note, if more than one role is required for notebook instances, training, and/or hosting, please replace the `get_execution_role()` call with the appropriate full IAM role ARN string(s)." 
58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": null, 63 | "metadata": { 64 | "collapsed": true 65 | }, 66 | "outputs": [], 67 | "source": [ 68 | "role = sagemaker.get_execution_role()\n", 69 | "sagemaker_session = sagemaker.Session()" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "\n", 77 | "## Data\n", 78 | "\n", 79 | "\n", 80 | "### Download the MNIST dataset\n", 81 | "\n", 82 | "First we will download the data from the workshop's S3 bucket, then we will extract the images from the compressed files." 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": null, 88 | "metadata": { 89 | "scrolled": false 90 | }, 91 | "outputs": [], 92 | "source": [ 93 | "os.system(\"aws s3 cp --recursive s3://sagemaker-workshop-pdx/mnist/data data\")\n", 94 | "\n", 95 | "data_sets = mnist.read_data_sets('mnist/data', dtype=tf.uint8, reshape=False, validation_size=5000)\n", 96 | "\n", 97 | "utils.convert_to(data_sets.train, 'train', 'mnist/data')\n", 98 | "utils.convert_to(data_sets.validation, 'validation', 'mnist/data')\n", 99 | "utils.convert_to(data_sets.test, 'test', 'mnist/data')" 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "metadata": {}, 105 | "source": [ 106 | "### Some sample images from the MNIST data set\n", 107 | "\n", 108 | "Here are some images from the MNIST training data set. Feel free to change the batch number and re-run the cell to see other images from the collection, or change the data set it pulls from to the test data set to see images from that collection." 
109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": null, 114 | "metadata": {}, 115 | "outputs": [], 116 | "source": [ 117 | "!cat utils.py" 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": null, 123 | "metadata": {}, 124 | "outputs": [], 125 | "source": [ 126 | "batch_xs, batch_ys = data_sets.train.next_batch(5) # Change \"train\" to \"test\" or select a different batch.\n", 127 | "utils.gen_image(batch_xs[0]).show()\n", 128 | "utils.gen_image(batch_xs[1]).show()\n", 129 | "utils.gen_image(batch_xs[2]).show()" 130 | ] 131 | }, 132 | { 133 | "cell_type": "markdown", 134 | "metadata": {}, 135 | "source": [ 136 | "### Upload the data\n", 137 | "We use the ```sagemaker.Session.upload_data``` function to upload our datasets to an S3 bucket. The return value of inputs identifies the location -- we will use this later when we start the training job." 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": null, 143 | "metadata": {}, 144 | "outputs": [], 145 | "source": [ 146 | "inputs = sagemaker_session.upload_data(path='mnist/data', key_prefix='data/mnist')" 147 | ] 148 | }, 149 | { 150 | "cell_type": "markdown", 151 | "metadata": {}, 152 | "source": [ 153 | "\n", 154 | "## Train\n", 155 | "\n", 156 | "Here is the full code for the network model:" 157 | ] 158 | }, 159 | { 160 | "cell_type": "code", 161 | "execution_count": null, 162 | "metadata": { 163 | "scrolled": false 164 | }, 165 | "outputs": [], 166 | "source": [ 167 | "!cat 'mnist.py'" 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "metadata": {}, 173 | "source": [ 174 | "The script here is an adaptation of the [TensorFlow MNIST example](https://github.com/tensorflow/models/tree/master/official/mnist). We have defined ```model_fn(features, labels, mode)```, which includes all the logic to support training, evaluation and inference. 
\n", 175 | "\n", 176 | "### A regular ```model_fn```\n", 177 | "\n", 178 | "A regular **```model_fn```** follows the pattern:\n", 179 | "1. [defines a neural network](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L96)\n", 180 | "- [applies the ```features``` in the neural network](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L178)\n", 181 | "- [if the ```mode``` is ```PREDICT```, returns the output from the neural network](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L186)\n", 182 | "- [calculates the loss function comparing the output with the ```labels```](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L188)\n", 183 | "- [creates an optimizer and minimizes the loss function to improve the neural network](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L193)\n", 184 | "- [returns the output, optimizer and loss function](https://github.com/tensorflow/models/blob/master/official/mnist/mnist.py#L205)\n", 185 | "\n", 186 | "### Writing a ```model_fn``` for distributed training\n", 187 | "When distributed training happens, the same neural network will be sent to multiple training instances. Each instance will train with a batch of the dataset, calculate the loss, and run the optimizer to minimize it. One entire loop of this process is called a **training step**.\n", 188 | "\n", 189 | "### Synchronizing training steps\n", 190 | "A [global step](https://www.tensorflow.org/api_docs/python/tf/train/global_step) is a global counter shared between the instances. This counter is used by the optimizer to keep track of the number of **training steps** across instances and is necessary for distributed training: \n", 191 | "\n", 192 | "```python\n", 193 | "train_op = optimizer.minimize(loss, tf.train.get_or_create_global_step())\n", 194 | "```\n", 195 | "\n", 196 | "That is also the **only** required change for distributed training!" 
197 | ] 198 | }, 199 | { 200 | "cell_type": "markdown", 201 | "metadata": {}, 202 | "source": [ 203 | "### Create a training job using the sagemaker.TensorFlow estimator" 204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "execution_count": null, 209 | "metadata": { 210 | "scrolled": true 211 | }, 212 | "outputs": [], 213 | "source": [ 214 | "from sagemaker.tensorflow import TensorFlow\n", 215 | "\n", 216 | "mnist_estimator = TensorFlow(entry_point='mnist.py',\n", 217 | " role=role,\n", 218 | " training_steps=1000, \n", 219 | " evaluation_steps=100,\n", 220 | " train_instance_count=2,\n", 221 | " train_instance_type='ml.c4.8xlarge')\n", 222 | "\n", 223 | "mnist_estimator.fit(inputs)" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "The **```fit```** method will create a training job using two **ml.c4.8xlarge** instances. The output above will show the status of the training jobs on each instance during training and evaluation.\n", 231 | "\n", 232 | "When training is complete, the training job will generate a saved model for serving using a SageMaker endpoint." 233 | ] 234 | }, 235 | { 236 | "cell_type": "markdown", 237 | "metadata": { 238 | "collapsed": true 239 | }, 240 | "source": [ 241 | "\n", 242 | "## Host\n", 243 | "\n", 244 | "\n", 245 | "### Deploy the trained model to prepare for predictions\n", 246 | "\n", 247 | "The deploy() method creates a SageMaker endpoint which serves prediction requests in real-time." 
248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": null, 253 | "metadata": {}, 254 | "outputs": [], 255 | "source": [ 256 | "mnist_predictor = mnist_estimator.deploy(initial_instance_count=1,\n", 257 | " instance_type='ml.m4.xlarge')" 258 | ] 259 | }, 260 | { 261 | "cell_type": "markdown", 262 | "metadata": {}, 263 | "source": [ 264 | "\n", 265 | "## Predict\n", 266 | "\n", 267 | "### Invoking the endpoint\n", 268 | "\n", 269 | "Now we will pass some of the test images to the model endpoint for inference." 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": null, 275 | "metadata": {}, 276 | "outputs": [], 277 | "source": [ 278 | "from tensorflow.examples.tutorials.mnist import input_data\n", 279 | "\n", 280 | "mnist = input_data.read_data_sets(\"/tmp/data/\", one_hot=True)\n", 281 | "\n", 282 | "for i in range(10):\n", 283 | " data = mnist.test.images[i].tolist()\n", 284 | " tensor_proto = tf.make_tensor_proto(values=np.asarray(data), shape=[1, len(data)], dtype=tf.float32)\n", 285 | " predict_response = mnist_predictor.predict(tensor_proto)\n", 286 | " \n", 287 | " image = mnist.test.images[i]\n", 288 | " image = np.array(image, dtype='float')\n", 289 | " plt.imshow(image.reshape(28, 28))\n", 290 | " plt.show()\n", 291 | " label = np.argmax(mnist.test.labels[i])\n", 292 | " print(\"Label is: {}\".format(label))\n", 293 | " prediction = predict_response['outputs']['classes']['int64Val'][0]\n", 294 | " print(\"Prediction is: {}\".format(prediction))\n", 295 | " print(\"_________________________________\")" 296 | ] 297 | }, 298 | { 299 | "cell_type": "markdown", 300 | "metadata": {}, 301 | "source": [ 302 | "# Deleting the endpoint\n", 303 | "When you are done with the notebook, run the following cell to delete the endpoint and avoid unnecessary charges." 
304 | ] 305 | }, 306 | { 307 | "cell_type": "code", 308 | "execution_count": null, 309 | "metadata": {}, 310 | "outputs": [], 311 | "source": [ 312 | "sagemaker.Session().delete_endpoint(mnist_predictor.endpoint)" 313 | ] 314 | } 315 | ], 316 | "metadata": { 317 | "kernelspec": { 318 | "display_name": "conda_tensorflow_p36", 319 | "language": "python", 320 | "name": "conda_tensorflow_p36" 321 | }, 322 | "language_info": { 323 | "codemirror_mode": { 324 | "name": "ipython", 325 | "version": 3 326 | }, 327 | "file_extension": ".py", 328 | "mimetype": "text/x-python", 329 | "name": "python", 330 | "nbconvert_exporter": "python", 331 | "pygments_lexer": "ipython3", 332 | "version": "3.6.2" 333 | }, 334 | "notice": "Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License." 
335 | }, 336 | "nbformat": 4, 337 | "nbformat_minor": 2 338 | } 339 | -------------------------------------------------------------------------------- /build.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | github-markdown README.md > index.html 4 | github-markdown Instructor.md > Instructor.html 5 | rm -f sagemaker-lab.zip 6 | zip -r sagemaker-lab.zip * 7 | -------------------------------------------------------------------------------- /images/Picture1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adamrb/amazon-sagemaker-workshop/1078fb4e8b22cd98c55e90e826e99d85e10982d8/images/Picture1.png -------------------------------------------------------------------------------- /images/Picture2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adamrb/amazon-sagemaker-workshop/1078fb4e8b22cd98c55e90e826e99d85e10982d8/images/Picture2.png -------------------------------------------------------------------------------- /images/Picture3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adamrb/amazon-sagemaker-workshop/1078fb4e8b22cd98c55e90e826e99d85e10982d8/images/Picture3.png -------------------------------------------------------------------------------- /images/Picture4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adamrb/amazon-sagemaker-workshop/1078fb4e8b22cd98c55e90e826e99d85e10982d8/images/Picture4.png -------------------------------------------------------------------------------- /images/Picture5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adamrb/amazon-sagemaker-workshop/1078fb4e8b22cd98c55e90e826e99d85e10982d8/images/Picture5.png 
-------------------------------------------------------------------------------- /images/Picture6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adamrb/amazon-sagemaker-workshop/1078fb4e8b22cd98c55e90e826e99d85e10982d8/images/Picture6.png -------------------------------------------------------------------------------- /images/Picture7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adamrb/amazon-sagemaker-workshop/1078fb4e8b22cd98c55e90e826e99d85e10982d8/images/Picture7.png -------------------------------------------------------------------------------- /images/Picture8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adamrb/amazon-sagemaker-workshop/1078fb4e8b22cd98c55e90e826e99d85e10982d8/images/Picture8.png -------------------------------------------------------------------------------- /images/create-iam-role.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adamrb/amazon-sagemaker-workshop/1078fb4e8b22cd98c55e90e826e99d85e10982d8/images/create-iam-role.png -------------------------------------------------------------------------------- /images/create-instance.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adamrb/amazon-sagemaker-workshop/1078fb4e8b22cd98c55e90e826e99d85e10982d8/images/create-instance.png -------------------------------------------------------------------------------- /images/overview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adamrb/amazon-sagemaker-workshop/1078fb4e8b22cd98c55e90e826e99d85e10982d8/images/overview.png -------------------------------------------------------------------------------- 
/images/region-selection.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adamrb/amazon-sagemaker-workshop/1078fb4e8b22cd98c55e90e826e99d85e10982d8/images/region-selection.png -------------------------------------------------------------------------------- /index.html: -------------------------------------------------------------------------------- 1 |

2 | Amazon SageMaker Workshop

3 |

Amazon SageMaker is a fully-managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale. In this workshop, you'll create a SageMaker notebook instance and work through sample Jupyter notebooks that demonstrate some of the many features of SageMaker. For example, you'll create model training jobs using SageMaker's hosted training feature, and create endpoints to serve predictions from your models using SageMaker's hosted endpoint feature. Along the way you'll see how machine learning can be applied to both structured data (e.g. from CSV flat files) and unstructured data (e.g. images).

4 |

Overview

5 |

6 | Prerequisites

7 |

8 | AWS Account

9 |

In order to complete this workshop you'll need an AWS Account with access to create AWS IAM, S3 and SageMaker resources. The code and instructions in this workshop assume only one student is using a given AWS account at a time. If you try sharing an account with another student, you'll run into naming conflicts for certain resources. You can work around these by appending a unique suffix to the resources that fail to create due to conflicts, but the instructions do not provide details on the changes required to make this work.

10 |

Some of the resources you will launch as part of this workshop are eligible for the AWS free tier if your account is less than 12 months old. See the AWS Free Tier page for more details.

11 |

12 | AWS Region

13 |

SageMaker is not available in all AWS Regions at this time. Accordingly, we recommend running this workshop in one of the following supported AWS Regions: N. Virginia, Oregon, Ohio, or Ireland.

14 |

Once you've chosen a region, you should create all of the resources for this workshop there, including a new Amazon S3 bucket and a new SageMaker notebook instance. Make sure you select your region from the dropdown in the upper right corner of the AWS Console before getting started.
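If you prefer to check your region from code rather than from the console dropdown, the following minimal sketch maps the four recommended region names to their AWS region codes. The dictionary and the helper `check_workshop_region` are illustrative names of our own, not part of the workshop materials.

```python
# Recommended workshop regions, keyed by AWS region code.
WORKSHOP_REGIONS = {
    "us-east-1": "N. Virginia",
    "us-west-2": "Oregon",
    "us-east-2": "Ohio",
    "eu-west-1": "Ireland",
}


def check_workshop_region(region_code):
    """Return the display name of a recommended region, or raise ValueError."""
    try:
        return WORKSHOP_REGIONS[region_code]
    except KeyError:
        raise ValueError(
            "{!r} is not one of the recommended regions: {}".format(
                region_code, ", ".join(sorted(WORKSHOP_REGIONS))
            )
        )


if __name__ == "__main__":
    print(check_workshop_region("us-west-2"))
```

If boto3 is installed, `boto3.session.Session().region_name` returns your configured default region, which you could pass to this helper.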

15 |

Region selection screenshot

16 |

17 | Browser

18 |

We recommend you use the latest version of Chrome or Firefox to complete this workshop.

19 |

20 | Modules

21 |

This workshop is divided into multiple modules. Module 1 must be completed first, followed by Module 2. You can complete the other modules (Modules 3 and 4) in any order.

22 |
1. Creating a Notebook Instance
2. Video Game Sales Notebook
3. Distributed Training with TensorFlow Notebook
4. Image Classification Notebook

Be patient as you work your way through the notebook-based modules. After you run a cell in a notebook, it may take several seconds for the code to show results. For the cells that start training jobs, it may take several minutes. In particular, the last two modules have training jobs that may last up to 10 minutes.

29 |

After you have completed the workshop, you can delete all of the resources that were created by following the Cleanup Guide provided with this lab guide.

30 |

31 | Module 1: Creating a Notebook Instance

32 |

In this module we'll start by creating an Amazon S3 bucket that will be used throughout the workshop. We'll then create a SageMaker notebook instance, which we will use to run the other workshop modules.

33 |

34 | 1. Create an S3 Bucket

35 |

SageMaker typically uses S3 as storage for data and model artifacts. In this step you'll create an S3 bucket for this purpose. To begin, sign in to the AWS Management Console at https://console.aws.amazon.com/.

36 |

37 | High-Level Instructions

38 |

Use the console or AWS CLI to create an Amazon S3 bucket. Keep in mind that your bucket's name must be globally unique across all regions and customers. We recommend using a name like smworkshop-firstname-lastname. If you get an error that your bucket name already exists, try adding additional numbers or characters until you find an unused name.
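As an alternative to the console or CLI, the naming advice above can be sketched in Python. This is only a sketch under stated assumptions: the helper names are hypothetical, the validation covers a simplified subset of the S3 naming rules (lowercase letters, digits, and hyphens, 3 to 63 characters), and `s3_client` is assumed to be a boto3 S3 client that you create yourself, for example with `boto3.client("s3")`.

```python
import re

# Simplified subset of the S3 bucket naming rules (an assumption of this
# sketch): 3-63 characters, lowercase letters, digits, and hyphens, starting
# and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")


def is_valid_bucket_name(name):
    """Return True if the name passes the simplified check above."""
    return bool(_BUCKET_NAME.match(name))


def candidate_bucket_names(first_name, last_name, attempts=5):
    """Yield smworkshop-firstname-lastname, then numbered variants of it."""
    base = "smworkshop-{}-{}".format(first_name, last_name).lower()
    base = re.sub(r"[^a-z0-9-]", "-", base)  # replace characters S3 rejects
    yield base
    for suffix in range(2, attempts + 1):
        yield "{}-{}".format(base, suffix)


def create_workshop_bucket(s3_client, name, region):
    """Create the bucket in the chosen region using a boto3 S3 client."""
    if not is_valid_bucket_name(name):
        raise ValueError("invalid bucket name: {}".format(name))
    if region == "us-east-1":
        # us-east-1 is the default location and must not be passed
        # as a LocationConstraint.
        s3_client.create_bucket(Bucket=name)
    else:
        s3_client.create_bucket(
            Bucket=name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    return name
```

If a name is already taken, the create call raises an error; you could then move on to the next candidate, which mirrors the advice to keep adding numbers or characters until you find an unused name.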

39 |
40 | Step-by-step instructions (expand for details)

41 |

42 |
1. In the AWS Management Console, choose Services, then select S3 under Storage.
2. Choose +Create Bucket.
3. Provide a globally unique name for your bucket such as smworkshop-firstname-lastname.
4. Select the Region you've chosen to use for this workshop from the dropdown.
5. Choose Create in the lower left of the dialog without selecting a bucket to copy settings from.
60 |

61 | 2. Launching the Notebook Instance

62 |
1. In the upper-right corner of the AWS Management Console, confirm you are in the desired AWS region. Select N. Virginia, Oregon, Ohio, or Ireland.
2. Click on Amazon SageMaker from the list of all services. This will bring you to the Amazon SageMaker console homepage.

Services in Console

71 |
1. To create a new notebook instance, go to Notebook instances, and click the Create notebook instance button at the top of the browser window.

Notebook Instances

75 |
1. Type [First Name]-[Last Name]-workshop into the Notebook instance name text box, and select ml.m4.xlarge for the Notebook instance type.
2. For IAM role, choose Select an existing role and choose one named "AmazonSageMaker-ExecutionRole-XXXX".

Create Notebook Instance

84 |
1. You can expand the "Tags" section and add tags here if required.
2. You will be taken back to the Create Notebook instance page. Click Create notebook instance. This will take several minutes to complete.

93 | 3. Accessing the Notebook Instance

94 |
1. Wait for the server status to change to InService. This will take a few minutes.
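Instead of refreshing the console, you could poll the status from code. The sketch below assumes `sm_client` is a boto3 SageMaker client (for example `boto3.client("sagemaker")`) and uses its `describe_notebook_instance` call; the function name, polling interval, and retry count are our own illustrative choices.

```python
import time


def wait_for_notebook(sm_client, name, poll_seconds=15, max_polls=40,
                      sleep=time.sleep):
    """Poll until the notebook instance reports InService.

    `sleep` is injectable so the loop can be exercised without real delays.
    """
    for _ in range(max_polls):
        response = sm_client.describe_notebook_instance(
            NotebookInstanceName=name
        )
        status = response["NotebookInstanceStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError("notebook instance {} failed to start".format(name))
        sleep(poll_seconds)  # still Pending (or another transitional state)
    raise TimeoutError("notebook instance {} is not InService yet".format(name))
```

With the defaults this waits up to about ten minutes, which is an arbitrary upper bound rather than a documented start-up time.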

Access Notebook

98 |
1. Click Open. You will now see the Jupyter homepage for your notebook instance.

Open Notebook

102 |

103 | Module 2: Video Game Sales Notebook

104 |

In this module, we'll work our way through an example Jupyter notebook that demonstrates how to use an Amazon-provided algorithm in SageMaker. More specifically, we'll use SageMaker's version of XGBoost, a popular and efficient open-source implementation of the gradient boosted trees algorithm. Gradient boosting is a supervised learning algorithm that attempts to predict a target variable by combining the estimates of a set of simpler, weaker models. XGBoost has done remarkably well in machine learning competitions because it robustly handles a wide variety of data types, relationships, and distributions. It often is a useful, go-to algorithm in working with structured data, such as data that might be found in relational databases and flat files.
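To make the idea of "combining the estimates of a set of simpler, weaker models" concrete, here is a toy gradient boosting sketch in plain Python. It is not XGBoost and not the code used in the notebook: it fits one-split regression stumps to the residuals of a one-dimensional data set under squared error, and every name in it is illustrative.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals (least squares)."""
    best = None
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # every x is identical, so no split is possible
        mean = sum(residuals) / len(residuals)
        return xs[0], mean, mean
    return best[1], best[2], best[3]


def predict(model, x):
    """Sum the base value and the scaled output of every stump."""
    base, learning_rate, stumps = model
    total = base
    for threshold, left_value, right_value in stumps:
        total += learning_rate * (left_value if x <= threshold else right_value)
    return total


def fit_boosted(xs, ys, rounds=20, learning_rate=0.5):
    """Each round fits a new weak model to what the ensemble still gets wrong."""
    base = sum(ys) / len(ys)
    stumps = []
    for _ in range(rounds):
        model = (base, learning_rate, stumps)
        residuals = [y - predict(model, x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return base, learning_rate, stumps


def mean_squared_error(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Each stump alone is a poor predictor, but because every new stump is fitted to the remaining error, the training error of the ensemble shrinks as rounds are added. XGBoost builds on the same principle with deeper trees, regularization, and many engineering optimizations.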

105 |

To begin, follow these steps:

106 |
1. Download this repository to your computer by clicking the green Clone or download button from the upper right of this page, then Download ZIP.
    • If you aren't accessing this on GitHub, you can download it here: sagemaker-lab.zip
2. In your notebook instance, click the New button on the right and select Folder.
3. Click the checkbox next to your new folder, click the Rename button above in the menu bar, and give the folder a name such as 'video-game-sales'.
4. Click the folder to enter it.
5. To upload the notebook, click the Upload button on the right, then in the file selection popup, select the file 'video-game-sales-xgboost.ipynb' from the folder on your computer where you downloaded this GitHub repository. Then click the blue Upload button that appears in the notebook next to the file name.
6. You are now ready to begin the notebook: click the notebook's file name to open it.
7. In the bucket = '<your_s3_bucket_name_here>' code line, paste the name of the S3 bucket you created in Module 1 to replace <your_s3_bucket_name_here>. The code line should now read similar to bucket = 'smworkshop-john-smith'. Do NOT paste the entire path (s3://.......), just the bucket name.
8. If you are familiar with Jupyter notebooks, you can skip this step. Otherwise, please expand the instructions below.
122 | Jupyter notebook instructions (expand for details)

123 |

124 |
1. Jupyter notebooks tell a story by combining explanatory text and code. There are two types of "cells" in a notebook: code cells, and "markdown" cells with explanatory text.
2. You will be running the code cells. These are distinguished by having "In" next to them in the left margin next to the cell, and a greyish background. Markdown cells lack "In" and have a white background.
3. To run a code cell, simply click in it, then either click the Run Cell button in the notebook's toolbar, or use Control+Enter from your computer's keyboard.
4. It may take a few seconds to a few minutes for a code cell to run. Please run each code cell in order, and only once, to avoid repeated operations. For example, running the same training job cell twice might create two training jobs, possibly exceeding your service limits.
139 |

NOTE: training the model for this example typically takes about 5 minutes.

140 |

141 | Module 3: Distributed Training with TensorFlow Notebook

142 |

In this module we will be using images of handwritten digits from the MNIST Database to demonstrate how to perform distributed training using SageMaker. Using a convolutional neural network model based on the TensorFlow MNIST Example, we will demonstrate how to use a Jupyter notebook and the SageMaker Python SDK to create your own script to pre-process data, train a model, create a SageMaker hosted endpoint, and make predictions against this endpoint. The model predicts which handwritten digit appears in the image it is given. Besides demonstrating a "bring your own script" use case for TensorFlow, the example also showcases how easy it is to set up a cluster of multiple instances for model training in SageMaker.

143 |
1. In your notebook instance, click the New button on the right and select Folder.
2. Click the checkbox next to your new folder, click the Rename button above in the menu bar, and give the folder a name such as 'tensorflow-distributed'.
3. Click the folder to enter it.
4. To upload the notebook, click the Upload button on the right, then in the file selection popup, select the file 'TensorFlow_Distributed_MNIST.ipynb' from the folder on your computer where you downloaded this GitHub repository. Then click the blue Upload button that appears in the notebook next to the file name.
5. You are now ready to begin the notebook: click the notebook's file name to open it, then follow the directions in the notebook.

NOTE: training the model for this example typically takes about 8 minutes.

151 |

152 | Module 4: Image Classification Notebook

153 |

For this module, we'll work with an image classification example notebook. In particular, we'll use the Amazon-provided image classification algorithm, which is a supervised learning algorithm that takes an image as input and classifies it into one of multiple output categories. It uses a convolutional neural network (ResNet) that can be trained from scratch, or trained using transfer learning when a large number of training images are not available. Even if you don't have experience with neural networks or image classification, SageMaker's image classification algorithm makes the technology easy to use, with no need to design and set up your own neural network.
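The choice between training from scratch and transfer learning is made through the algorithm's hyperparameters. The sketch below builds such a set as a plain dictionary. The hyperparameter names are the ones we recall from the built-in image classification algorithm's documentation and should be verified there; the helper name and all default values are illustrative assumptions, not the values used in the workshop notebook.

```python
def transfer_learning_hyperparameters(num_classes, num_training_samples,
                                      epochs=2, use_pretrained_model=True):
    """Build a hyperparameter dict; all defaults here are illustrative only."""
    hyperparameters = {
        "num_layers": 18,                  # depth of the ResNet (assumed value)
        # 1 starts from pretrained weights (transfer learning), 0 from scratch.
        "use_pretrained_model": 1 if use_pretrained_model else 0,
        "image_shape": "3,224,224",        # channels, height, width
        "num_classes": num_classes,
        "num_training_samples": num_training_samples,
        "mini_batch_size": 128,
        "epochs": epochs,
        "learning_rate": 0.01,
    }
    # Hyperparameters are sent to the training job as strings.
    return {name: str(value) for name, value in hyperparameters.items()}


if __name__ == "__main__":
    for name, value in sorted(transfer_learning_hyperparameters(10, 1000).items()):
        print(name, "=", value)
```

With transfer learning only a few epochs are typically needed, which is why the training time in this module stays short.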

154 |

Follow these steps:

155 |
1. In your notebook instance, click the New button on the right and select Folder.
2. Click the checkbox next to your new folder, click the Rename button above in the menu bar, and give the folder a name such as 'image-classification'.
3. Click the folder to enter it.
4. To upload the notebook, click the Upload button on the right, then in the file selection popup, select the file 'Image-classification-transfer-learning.ipynb' from the folder on your computer where you downloaded this GitHub repository. Then click the blue Upload button that appears in the notebook next to the file name.
5. You are now ready to begin the notebook: click the notebook's file name to open it, then follow the directions in the notebook.

NOTE: training the model for this example typically takes about 10 minutes. However, keep in mind that this is relatively short because transfer learning is used rather than training from scratch, which could take many hours.

163 |

164 | Cleanup Guide

165 |

To avoid charges for resources you no longer need when you're done with this workshop, you can delete them or, in the case of your notebook instance, stop it. Here are the resources you should consider:

  • Endpoints: these are the clusters of one or more instances serving inferences from your models. If you did not delete them from within the notebooks, you can delete them via the SageMaker console. To do so, click the Endpoints link in the left panel. Then, for each endpoint, click the radio button next to it, then select Delete from the Actions drop-down menu. You can follow a similar procedure to delete the related Models and Endpoint configurations.

  • Notebook instance: you have two options if you do not want to keep the notebook instance running. If you would like to save it for later, you can stop it rather than delete it. To delete it, click the Notebook instances link in the left panel. Next, click the radio button next to the notebook instance created for this workshop, then select Delete from the Actions drop-down menu. To simply stop it instead, just click the Stop link. After it is stopped, you can start it again by clicking the Start link. Keep in mind that if you stop rather than delete it, you will be charged for the storage associated with it.
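The same cleanup can also be scripted. The following is a minimal sketch, assuming a boto3 SageMaker client is passed in as `sm_client`; the helper names `delete_endpoint_resources` and `stop_notebook_instance` are hypothetical, and the sketch does not handle errors such as an endpoint that no longer exists.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint, then its endpoint configuration and models.

    sm_client is assumed to be a boto3 SageMaker client.
    """
    # Look up the related resources before deleting anything.
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [v["ModelName"] for v in config["ProductionVariants"]]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for name in model_names:
        sm_client.delete_model(ModelName=name)
    return config_name, model_names


def stop_notebook_instance(sm_client, notebook_name):
    """Stop a running notebook instance; return True if a stop was requested."""
    info = sm_client.describe_notebook_instance(
        NotebookInstanceName=notebook_name)
    if info["NotebookInstanceStatus"] == "InService":
        sm_client.stop_notebook_instance(NotebookInstanceName=notebook_name)
        return True
    return False
```

To use it, create the client with `boto3.client('sagemaker')` and pass in the names of your own endpoint and notebook instance. A stopped instance still incurs storage charges, as noted above, so delete it once you no longer need it.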

License


The contents of this workshop are licensed under the Apache 2.0 License.

177 | 178 | -------------------------------------------------------------------------------- /sagemaker-lab.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/adamrb/amazon-sagemaker-workshop/1078fb4e8b22cd98c55e90e826e99d85e10982d8/sagemaker-lab.zip -------------------------------------------------------------------------------- /video-game-sales-xgboost.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Predicting Product Success When Review Data Is Available\n", 8 | "_**Using XGBoost to Predict Whether Sales will Exceed the \"Hit\" Threshold**_\n", 9 | "\n", 10 | "---\n", 11 | "\n", 12 | "---\n", 13 | "\n", 14 | "## Contents\n", 15 | "\n", 16 | "1. [Background](#Background)\n", 17 | "1. [Setup](#Setup)\n", 18 | "1. [Data](#Data)\n", 19 | "1. [Train](#Train)\n", 20 | "1. [Host](#Host)\n", 21 | "1. [Evaluation](#Evaluation)\n", 22 | "1. [Extensions](#Extensions)\n", 23 | "\n", 24 | "\n", 25 | "## Background\n", 26 | "\n", 27 | "Word of mouth in the form of user reviews, critic reviews, social media comments, etc. often can provide insights about whether a product ultimately will be a success. In the video game industry in particular, reviews and ratings can have a large impact on a game's success. However, not all games with bad reviews fail, and not all games with good reviews turn out to be hits. To predict hit games, machine learning algorithms potentially can take advantage of various relevant data attributes in addition to reviews. \n", 28 | "\n", 29 | "For this notebook, we will work with the data set Video Game Sales with Ratings. This [Metacritic](http://www.metacritic.com/browse/games/release-date/available) data includes attributes for user reviews as well as critic reviews, sales, ESRB ratings, among others. 
Both user reviews and critic reviews are in the form of ratings scores, on a scale of 0 to 10 or 0 to 100. Although this is convenient, a significant issue with the data set is that it is relatively small. \n", 30 | "\n", 31 | "Dealing with a small data set such as this one is a common problem in machine learning. This problem often is compounded by imbalances between the classes in the small data set. In such situations, using an ensemble learner can be a good choice. This notebook will focus on using XGBoost, a popular ensemble learner, to build a classifier to determine whether a game will be a hit. \n", 32 | "\n", 33 | "## Setup\n", 34 | "\n", 35 | "Let's start by specifying:\n", 36 | "\n", 37 | "- The S3 bucket and prefix that you want to use for training and model data. This should be within the same region as the Notebook Instance, training, and hosting.\n", 38 | "- The IAM role arn used to give training and hosting access to your data. See the documentation for how to create these. Note, if more than one role is required for notebook instances, training, and/or hosting, please replace the `get_execution_role()` call with the appropriate full IAM role arn string(s)." 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": null, 44 | "metadata": { 45 | "collapsed": true, 46 | "isConfigCell": true 47 | }, 48 | "outputs": [], 49 | "source": [ 50 | "bucket = ''\n", 51 | "prefix = 'sagemaker/videogames_xgboost'\n", 52 | " \n", 53 | "import sagemaker\n", 54 | "\n", 55 | "role = sagemaker.get_execution_role()" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "Next we'll import the Python libraries we'll need." 
63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": null, 68 | "metadata": { 69 | "collapsed": true 70 | }, 71 | "outputs": [], 72 | "source": [ 73 | "import numpy as np \n", 74 | "import pandas as pd \n", 75 | "import matplotlib.pyplot as plt \n", 76 | "from IPython.display import Image \n", 77 | "from IPython.display import display \n", 78 | "from sklearn.datasets import dump_svmlight_file \n", 79 | "from time import gmtime, strftime \n", 80 | "import sys \n", 81 | "import math \n", 82 | "import json\n", 83 | "import boto3" 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "---\n", 91 | "## Data\n", 92 | "\n", 93 | "Before proceeding further, let's download the data set from a public S3 bucket to your notebook instance. It will then appear in the same directory as this notebook. Then we'll take an initial look at the data." 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": null, 99 | "metadata": {}, 100 | "outputs": [], 101 | "source": [ 102 | "raw_data_filename = 'Video_Games_Sales_as_at_22_Dec_2016.csv'\n", 103 | "data_bucket = 'sagemaker-workshop-pdx'\n", 104 | "\n", 105 | "s3 = boto3.resource('s3')\n", 106 | "s3.Bucket(data_bucket).download_file(raw_data_filename, 'raw_data.csv')\n", 107 | "\n", 108 | "data = pd.read_csv('./raw_data.csv')\n", 109 | "pd.set_option('display.max_rows', 20) \n", 110 | "data" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "metadata": {}, 116 | "source": [ 117 | "Before proceeding further, we need to decide upon a target to predict. Video game development budgets can run into the tens of millions of dollars, so it is critical for game publishers to publish \"hit\" games to recoup their costs and make a profit. As a proxy for what constitutes a \"hit\" game, we will set a target of greater than 1 million units in global sales." 
118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": null, 123 | "metadata": { 124 | "collapsed": true 125 | }, 126 | "outputs": [], 127 | "source": [ 128 | "data['y'] = (data['Global_Sales'] > 1)" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "metadata": {}, 134 | "source": [ 135 | "With our target now defined, let's take a look at the imbalance between the \"hit\" and \"not a hit\" classes:" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": null, 141 | "metadata": { 142 | "collapsed": true 143 | }, 144 | "outputs": [], 145 | "source": [ 146 | "plt.bar(['not a hit', 'hit'], data['y'].value_counts())\n", 147 | "plt.show()" 148 | ] 149 | }, 150 | { 151 | "cell_type": "markdown", 152 | "metadata": {}, 153 | "source": [ 154 | "Not surprisingly, only a small fraction of games can be considered \"hits\" under our metric. Next, we'll choose features that have predictive power for our target. We'll begin by plotting review scores versus global sales to check our hunch that such scores have an impact on sales. Logarithmic scale is used for clarity." 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": null, 160 | "metadata": { 161 | "collapsed": true 162 | }, 163 | "outputs": [], 164 | "source": [ 165 | "viz = data.filter(['User_Score','Critic_Score', 'Global_Sales'], axis=1)\n", 166 | "viz['User_Score'] = pd.Series(viz['User_Score'].apply(pd.to_numeric, errors='coerce'))\n", 167 | "viz['User_Score'] = viz['User_Score'].mask(np.isnan(viz[\"User_Score\"]), viz['Critic_Score'] / 10.0)\n", 168 | "viz.plot(kind='scatter', logx=True, logy=True, x='Critic_Score', y='Global_Sales')\n", 169 | "viz.plot(kind='scatter', logx=True, logy=True, x='User_Score', y='Global_Sales')\n", 170 | "plt.show()" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": {}, 176 | "source": [ 177 | "Our intuition about the relationship between review scores and sales seems justified. 
We also note in passing that other relevant features can be extracted from the data set. For example, the ESRB rating has an impact since games with an \"E\" for everyone rating typically reach a wider audience than games with an age-restricted \"M\" for mature rating, though depending on another feature, the genre (such as shooter or action), M-rated games also can be huge hits. Our model hopefully will learn these relationships and others. \n", 178 | "\n", 179 | "Next, looking at the columns of features of this data set, we can identify several that should be excluded. For example, there are five columns that specify sales numbers: these numbers are directly related to the target we're trying to predict, so these columns should be dropped. Other features may be irrelevant, such as the name of the game." 180 | ] 181 | }, 182 | { 183 | "cell_type": "code", 184 | "execution_count": null, 185 | "metadata": { 186 | "collapsed": true 187 | }, 188 | "outputs": [], 189 | "source": [ 190 | "data = data.drop(['Name', 'Year_of_Release', 'NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales', 'Global_Sales', 'Critic_Count', 'User_Count', 'Developer'], axis=1)" 191 | ] 192 | }, 193 | { 194 | "cell_type": "markdown", 195 | "metadata": {}, 196 | "source": [ 197 | "With the number of columns reduced, now is a good time to check how many columns are missing data:" 198 | ] 199 | }, 200 | { 201 | "cell_type": "code", 202 | "execution_count": null, 203 | "metadata": { 204 | "collapsed": true 205 | }, 206 | "outputs": [], 207 | "source": [ 208 | "data.isnull().sum()" 209 | ] 210 | }, 211 | { 212 | "cell_type": "markdown", 213 | "metadata": {}, 214 | "source": [ 215 | "As noted in Kaggle's overview of this data set, many review ratings are missing. Unfortunately, since those are crucial features that we are relying on for our predictions, and there is no reliable way of imputing so many of them, we'll need to drop rows missing those features." 
216 | ] 217 | }, 218 | { 219 | "cell_type": "code", 220 | "execution_count": null, 221 | "metadata": { 222 | "collapsed": true 223 | }, 224 | "outputs": [], 225 | "source": [ 226 | "data = data.dropna()" 227 | ] 228 | }, 229 | { 230 | "cell_type": "markdown", 231 | "metadata": {}, 232 | "source": [ 233 | "Now we need to resolve a problem we see in the User_Score column: it contains some 'tbd' string values, so it obviously is not numeric. User_Score is more properly a numeric rather than categorical feature, so we'll need to convert it from string type to numeric, and temporarily fill in NaNs for the tbds. Next, we must decide what to do with these new NaNs in the User_Score column. We've already thrown out a large number of rows, so if we can salvage these rows, we should. As a first approximation, we'll take the value in the Critic_Score column and divide by 10 since the user scores tend to track the critic scores (though on a scale of 0 to 10 instead of 0 to 100). " 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": null, 239 | "metadata": { 240 | "collapsed": true 241 | }, 242 | "outputs": [], 243 | "source": [ 244 | "data['User_Score'] = data['User_Score'].apply(pd.to_numeric, errors='coerce')\n", 245 | "data['User_Score'] = data['User_Score'].mask(np.isnan(data[\"User_Score\"]), data['Critic_Score'] / 10.0)" 246 | ] 247 | }, 248 | { 249 | "cell_type": "markdown", 250 | "metadata": {}, 251 | "source": [ 252 | "Let's do some final preprocessing of the data, including converting the categorical features into numeric using the one-hot encoding method." 
253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": null, 258 | "metadata": { 259 | "collapsed": true 260 | }, 261 | "outputs": [], 262 | "source": [ 263 | "data['y'] = data['y'].apply(lambda y: 'yes' if y == True else 'no')\n", 264 | "model_data = pd.get_dummies(data)" 265 | ] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "metadata": {}, 270 | "source": [ 271 | "To help prevent overfitting the model, we'll randomly split the data into three groups. Specifically, the model will be trained on 70% of the data. It will then be evaluated on 20% of the data to give us an estimate of the accuracy we hope to have on \"new\" data. As a final testing dataset, the remaining 10% will be held out until the end." 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "execution_count": null, 277 | "metadata": { 278 | "collapsed": true 279 | }, 280 | "outputs": [], 281 | "source": [ 282 | "train_data, validation_data, test_data = np.split(model_data.sample(frac=1, random_state=1729), [int(0.7 * len(model_data)), int(0.9 * len(model_data))]) " 283 | ] 284 | }, 285 | { 286 | "cell_type": "markdown", 287 | "metadata": {}, 288 | "source": [ 289 | "XGBoost operates on data in the libSVM data format, with features and the target variable provided as separate arguments. To avoid any misalignment issues due to random reordering, this split is done after the previous split in the above cell. As a last step before training, we'll copy the resulting files to S3 as input for SageMaker's managed training." 
290 | ] 291 | }, 292 | { 293 | "cell_type": "code", 294 | "execution_count": null, 295 | "metadata": { 296 | "collapsed": true 297 | }, 298 | "outputs": [], 299 | "source": [ 300 | "dump_svmlight_file(X=train_data.drop(['y_no', 'y_yes'], axis=1), y=train_data['y_yes'], f='train.libsvm')\n", 301 | "dump_svmlight_file(X=validation_data.drop(['y_no', 'y_yes'], axis=1), y=validation_data['y_yes'], f='validation.libsvm')\n", 302 | "dump_svmlight_file(X=test_data.drop(['y_no', 'y_yes'], axis=1), y=test_data['y_yes'], f='test.libsvm')\n", 303 | "\n", 304 | "boto3.Session().resource('s3').Bucket(bucket).Object(prefix + '/train/train.libsvm').upload_file('train.libsvm')\n", 305 | "boto3.Session().resource('s3').Bucket(bucket).Object(prefix + '/validation/validation.libsvm').upload_file('validation.libsvm')" 306 | ] 307 | }, 308 | { 309 | "cell_type": "markdown", 310 | "metadata": {}, 311 | "source": [ 312 | "---\n", 313 | "## Train\n", 314 | "\n", 315 | "Our data is now ready to be used to train a XGBoost model. The XGBoost algorithm has many tunable hyperparameters. Some of these hyperparameters are listed below; initially we'll only use a few of them. \n", 316 | "\n", 317 | "- `max_depth`: Maximum depth of a tree. As a cautionary note, a value too small could underfit the data, while increasing it will make the model more complex and thus more likely to overfit the data (in other words, the classic bias-variance tradeoff).\n", 318 | "- `eta`: Step size shrinkage used in updates to prevent overfitting. \n", 319 | "- `eval_metric`: Evaluation metric(s) for validation data. For data sets such as this one with imbalanced classes, we'll use the AUC metric.\n", 320 | "- `scale_pos_weight`: Controls the balance of positive and negative weights, again useful for data sets having imbalanced classes.\n", 321 | "\n", 322 | "First we'll setup the parameters for a training job, then create a training job with those parameters and run it. 
" 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": null, 328 | "metadata": { 329 | "collapsed": true 330 | }, 331 | "outputs": [], 332 | "source": [ 333 | "job_name = 'videogames-xgboost-' + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n", 334 | "print(\"Training job\", job_name)\n", 335 | "\n", 336 | "containers = {\n", 337 | " 'us-west-2': '433757028032.dkr.ecr.us-west-2.amazonaws.com/xgboost:latest',\n", 338 | " 'us-east-1': '811284229777.dkr.ecr.us-east-1.amazonaws.com/xgboost:latest',\n", 339 | " 'us-east-2': '825641698319.dkr.ecr.us-east-2.amazonaws.com/xgboost:latest',\n", 340 | " 'eu-west-1': '685385470294.dkr.ecr.eu-west-1.amazonaws.com/xgboost:latest'\n", 341 | " }\n", 342 | "\n", 343 | "create_training_params = \\\n", 344 | "{\n", 345 | " \"RoleArn\": role,\n", 346 | " \"TrainingJobName\": job_name,\n", 347 | " \"AlgorithmSpecification\": {\n", 348 | " \"TrainingImage\": containers[boto3.Session().region_name],\n", 349 | " \"TrainingInputMode\": \"File\"\n", 350 | " },\n", 351 | " \"ResourceConfig\": {\n", 352 | " \"InstanceCount\": 1,\n", 353 | " \"InstanceType\": \"ml.c4.xlarge\",\n", 354 | " \"VolumeSizeInGB\": 10\n", 355 | " },\n", 356 | " \"InputDataConfig\": [\n", 357 | " {\n", 358 | " \"ChannelName\": \"train\",\n", 359 | " \"DataSource\": {\n", 360 | " \"S3DataSource\": {\n", 361 | " \"S3DataType\": \"S3Prefix\",\n", 362 | " \"S3Uri\": \"s3://{}/{}/train\".format(bucket, prefix),\n", 363 | " \"S3DataDistributionType\": \"FullyReplicated\"\n", 364 | " }\n", 365 | " },\n", 366 | " \"ContentType\": \"libsvm\",\n", 367 | " \"CompressionType\": \"None\"\n", 368 | " },\n", 369 | " {\n", 370 | " \"ChannelName\": \"validation\",\n", 371 | " \"DataSource\": {\n", 372 | " \"S3DataSource\": {\n", 373 | " \"S3DataType\": \"S3Prefix\",\n", 374 | " \"S3Uri\": \"s3://{}/{}/validation\".format(bucket, prefix),\n", 375 | " \"S3DataDistributionType\": \"FullyReplicated\"\n", 376 | " }\n", 377 | " },\n", 378 | " \"ContentType\": 
\"libsvm\",\n", 379 | " \"CompressionType\": \"None\"\n", 380 | " }\n", 381 | " ],\n", 382 | " \"OutputDataConfig\": {\n", 383 | " \"S3OutputPath\": \"s3://{}/{}/xgboost-video-games/output\".format(bucket, prefix)\n", 384 | " },\n", 385 | " \"HyperParameters\": {\n", 386 | " \"max_depth\":\"3\",\n", 387 | " \"eta\":\"0.1\",\n", 388 | " \"eval_metric\":\"auc\",\n", 389 | " \"scale_pos_weight\":\"2.0\",\n", 390 | " \"subsample\":\"0.5\",\n", 391 | " \"objective\":\"binary:logistic\",\n", 392 | " \"num_round\":\"100\"\n", 393 | " },\n", 394 | " \"StoppingCondition\": {\n", 395 | " \"MaxRuntimeInSeconds\": 60 * 60\n", 396 | " }\n", 397 | "}" 398 | ] 399 | }, 400 | { 401 | "cell_type": "code", 402 | "execution_count": null, 403 | "metadata": { 404 | "collapsed": true 405 | }, 406 | "outputs": [], 407 | "source": [ 408 | "%%time\n", 409 | "\n", 410 | "sm = boto3.client('sagemaker')\n", 411 | "sm.create_training_job(**create_training_params)\n", 412 | "\n", 413 | "status = sm.describe_training_job(TrainingJobName=job_name)['TrainingJobStatus']\n", 414 | "print(status)\n", 415 | "\n", 416 | "try:\n", 417 | " sm.get_waiter('training_job_completed_or_stopped').wait(TrainingJobName=job_name)\n", 418 | "finally:\n", 419 | " status = sm.describe_training_job(TrainingJobName=job_name)['TrainingJobStatus']\n", 420 | " print(\"Training job ended with status: \" + status)\n", 421 | " if status == 'Failed':\n", 422 | " message = sm.describe_training_job(TrainingJobName=job_name)['FailureReason']\n", 423 | " print('Training failed with the following error: {}'.format(message))\n", 424 | " raise Exception('Training job failed')" 425 | ] 426 | }, 427 | { 428 | "cell_type": "markdown", 429 | "metadata": {}, 430 | "source": [ 431 | "---\n", 432 | "## Host\n", 433 | "\n", 434 | "Now that we've trained the XGBoost algorithm on our data, let's prepare the model for hosting on a SageMaker serverless endpoint. We will:\n", 435 | "\n", 436 | "1. Point to the scoring container\n", 437 | "1. 
Point to the model.tar.gz that came from training\n", 438 | "1. Create the hosting model" 439 | ] 440 | }, 441 | { 442 | "cell_type": "code", 443 | "execution_count": null, 444 | "metadata": { 445 | "collapsed": true 446 | }, 447 | "outputs": [], 448 | "source": [ 449 | "create_model_response = sm.create_model(\n", 450 | " ModelName=job_name,\n", 451 | " ExecutionRoleArn=role,\n", 452 | " PrimaryContainer={\n", 453 | " 'Image': containers[boto3.Session().region_name],\n", 454 | " 'ModelDataUrl': sm.describe_training_job(TrainingJobName=job_name)['ModelArtifacts']['S3ModelArtifacts']})\n", 455 | "\n", 456 | "print(create_model_response['ModelArn'])" 457 | ] 458 | }, 459 | { 460 | "cell_type": "markdown", 461 | "metadata": {}, 462 | "source": [ 463 | "Next, we'll configure our hosting endpoint. Here we specify:\n", 464 | "\n", 465 | "1. EC2 instance type to use for hosting\n", 466 | "1. The initial number of instances\n", 467 | "1. Our hosting model name\n", 468 | "\n", 469 | "After the endpoint has been configured, we'll create the endpoint itself." 
470 | ] 471 | }, 472 | { 473 | "cell_type": "code", 474 | "execution_count": null, 475 | "metadata": { 476 | "collapsed": true 477 | }, 478 | "outputs": [], 479 | "source": [ 480 | "xgboost_endpoint_config = 'videogames-xgboost-endpoint-config-' + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n", 481 | "print(xgboost_endpoint_config)\n", 482 | "create_endpoint_config_response = sm.create_endpoint_config(\n", 483 | " EndpointConfigName=xgboost_endpoint_config,\n", 484 | " ProductionVariants=[{\n", 485 | " 'InstanceType': 'ml.t2.medium',\n", 486 | " 'InitialInstanceCount': 1,\n", 487 | " 'ModelName': job_name,\n", 488 | " 'VariantName': 'AllTraffic'}])\n", 489 | "\n", 490 | "print(\"Endpoint Config Arn: \" + create_endpoint_config_response['EndpointConfigArn'])" 491 | ] 492 | }, 493 | { 494 | "cell_type": "code", 495 | "execution_count": null, 496 | "metadata": { 497 | "collapsed": true 498 | }, 499 | "outputs": [], 500 | "source": [ 501 | "%%time\n", 502 | "\n", 503 | "xgboost_endpoint = 'EXAMPLE-videogames-xgb-endpoint-' + strftime(\"%Y%m%d%H%M\", gmtime())\n", 504 | "print(xgboost_endpoint)\n", 505 | "create_endpoint_response = sm.create_endpoint(\n", 506 | " EndpointName=xgboost_endpoint,\n", 507 | " EndpointConfigName=xgboost_endpoint_config)\n", 508 | "print(create_endpoint_response['EndpointArn'])\n", 509 | "\n", 510 | "resp = sm.describe_endpoint(EndpointName=xgboost_endpoint)\n", 511 | "status = resp['EndpointStatus']\n", 512 | "print(\"Status: \" + status)\n", 513 | "\n", 514 | "try:\n", 515 | " sm.get_waiter('endpoint_in_service').wait(EndpointName=xgboost_endpoint)\n", 516 | "finally:\n", 517 | " resp = sm.describe_endpoint(EndpointName=xgboost_endpoint)\n", 518 | " status = resp['EndpointStatus']\n", 519 | " print(\"Arn: \" + resp['EndpointArn'])\n", 520 | " print(\"Status: \" + status)\n", 521 | "\n", 522 | " if status != 'InService':\n", 523 | " message = sm.describe_endpoint(EndpointName=xgboost_endpoint)['FailureReason']\n", 524 | " print('Endpoint 
creation failed with the following error: {}'.format(message))\n", 525 | " raise Exception('Endpoint creation did not succeed')" 526 | ] 527 | }, 528 | { 529 | "cell_type": "markdown", 530 | "metadata": {}, 531 | "source": [ 532 | "---\n", 533 | "\n", 534 | "## Evaluation\n", 535 | "\n", 536 | "Now that we have our hosted endpoint, we can generate predictions from it. More specifically, let's generate predictions from our test data set to understand how well our model generalizes to data it has not seen yet.\n", 537 | "\n", 538 | "There are many ways to compare the performance of a machine learning model. We'll start simply by comparing actual to predicted values of whether the game was a \"hit\" (`1`) or not (`0`). Then we'll produce a confusion matrix, which shows how many test data points were predicted by the model in each category versus how many test data points actually belonged in each category." 539 | ] 540 | }, 541 | { 542 | "cell_type": "code", 543 | "execution_count": null, 544 | "metadata": { 545 | "collapsed": true 546 | }, 547 | "outputs": [], 548 | "source": [ 549 | "runtime = boto3.client('runtime.sagemaker')" 550 | ] 551 | }, 552 | { 553 | "cell_type": "code", 554 | "execution_count": null, 555 | "metadata": { 556 | "collapsed": true 557 | }, 558 | "outputs": [], 559 | "source": [ 560 | "def do_predict(data, endpoint_name, content_type):\n", 561 | " payload = '\\n'.join(data)\n", 562 | " response = runtime.invoke_endpoint(EndpointName=endpoint_name, \n", 563 | " ContentType=content_type, \n", 564 | " Body=payload)\n", 565 | " result = response['Body'].read()\n", 566 | " result = result.decode(\"utf-8\")\n", 567 | " result = result.split(',')\n", 568 | " preds = [float((num)) for num in result]\n", 569 | " preds = [round(num) for num in preds]\n", 570 | " return preds\n", 571 | "\n", 572 | "def batch_predict(data, batch_size, endpoint_name, content_type):\n", 573 | " items = len(data)\n", 574 | " arrs = []\n", 575 | " \n", 576 | " for offset in 
range(0, items, batch_size):\n", 577 | " if offset+batch_size < items:\n", 578 | " results = do_predict(data[offset:(offset+batch_size)], endpoint_name, content_type)\n", 579 | " arrs.extend(results)\n", 580 | " else:\n", 581 | " arrs.extend(do_predict(data[offset:items], endpoint_name, content_type))\n", 582 | " sys.stdout.write('.')\n", 583 | " return(arrs)" 584 | ] 585 | }, 586 | { 587 | "cell_type": "code", 588 | "execution_count": null, 589 | "metadata": { 590 | "collapsed": true 591 | }, 592 | "outputs": [], 593 | "source": [ 594 | "%%time\n", 595 | "import json\n", 596 | "\n", 597 | "with open('test.libsvm', 'r') as f:\n", 598 | " payload = f.read().strip()\n", 599 | "\n", 600 | "labels = [int(line.split(' ')[0]) for line in payload.split('\\n')]\n", 601 | "test_data = [line for line in payload.split('\\n')]\n", 602 | "preds = batch_predict(test_data, 100, xgboost_endpoint, 'text/x-libsvm')\n", 603 | "\n", 604 | "print ('\\nerror rate=%f' % ( sum(1 for i in range(len(preds)) if preds[i]!=labels[i]) /float(len(preds))))" 605 | ] 606 | }, 607 | { 608 | "cell_type": "code", 609 | "execution_count": null, 610 | "metadata": { 611 | "collapsed": true 612 | }, 613 | "outputs": [], 614 | "source": [ 615 | "pd.crosstab(index=np.array(labels), columns=np.array(preds))" 616 | ] 617 | }, 618 | { 619 | "cell_type": "markdown", 620 | "metadata": {}, 621 | "source": [ 622 | "Of the 132 games in the test set that actually are \"hits\" by our metric, the model correctly identified 73, while the overall error rate is 13%. The amount of false negatives versus true positives can be shifted substantially in favor of true positives by increasing the hyperparameter scale_pos_weight. Of course, this increase comes at the expense of reduced accuracy/increased error rate and more false positives. How to make this trade-off ultimately is a business decision based on the relative costs of false positives, false negatives, etc." 
623 | ] 624 | }, 625 | { 626 | "cell_type": "markdown", 627 | "metadata": {}, 628 | "source": [ 629 | "---\n", 630 | "## Extensions\n", 631 | "\n", 632 | "This XGBoost model is just the starting point for predicting whether a game will be a hit based on reviews and other features. There are several possible avenues for improving the model's performance. First, of course, would be to collect more data and, if possible, fill in the existing missing fields with actual information. Another possibility is further hyperparameter tuning, with Amazon SageMaker's Hyperparameter Optimization service. And, although ensemble learners often do well with imbalanced data sets, it could be worth exploring techniques for mitigating imbalances such as downsampling, synthetic data augmentation, and other approaches. " 633 | ] 634 | }, 635 | { 636 | "cell_type": "code", 637 | "execution_count": null, 638 | "metadata": { 639 | "collapsed": true 640 | }, 641 | "outputs": [], 642 | "source": [ 643 | "sm.delete_endpoint(EndpointName=xgboost_endpoint)" 644 | ] 645 | } 646 | ], 647 | "metadata": { 648 | "kernelspec": { 649 | "display_name": "conda_python3", 650 | "language": "python", 651 | "name": "conda_python3" 652 | }, 653 | "language_info": { 654 | "codemirror_mode": { 655 | "name": "ipython", 656 | "version": 3 657 | }, 658 | "file_extension": ".py", 659 | "mimetype": "text/x-python", 660 | "name": "python", 661 | "nbconvert_exporter": "python", 662 | "pygments_lexer": "ipython3", 663 | "version": "3.6.2" 664 | }, 665 | "notice": "Copyright 2017 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
See the License for the specific language governing permissions and limitations under the License." 666 | }, 667 | "nbformat": 4, 668 | "nbformat_minor": 2 669 | } 670 | --------------------------------------------------------------------------------