├── .gitattributes ├── images ├── 3.43.45PM.png ├── 3.44.08PM.png ├── 3.44.14PM.png ├── 3.44.19PM.png ├── 3.44.35PM.png ├── 3.49.20PM.png ├── 3.56.54PM.png ├── 3.57.19PM.png ├── 3.59.37PM.png ├── 5.05.06PM.png ├── 10.45.32AM.png ├── 10.46.14AM.png ├── 10.47.32AM.png └── 10.48.15AM.png ├── slides └── reinvent cmp 423 Inf1 Lab.pdf ├── README.md ├── 3. benchmark run.md ├── 1. Setup Dev Env.md ├── 2. Deploy and model serve on Inf1 Instance.md └── 4. Profiling and Debugging.md /.gitattributes: -------------------------------------------------------------------------------- 1 | # Auto detect text files and perform LF normalization 2 | * text=auto 3 | -------------------------------------------------------------------------------- /images/3.43.45PM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/3.43.45PM.png -------------------------------------------------------------------------------- /images/3.44.08PM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/3.44.08PM.png -------------------------------------------------------------------------------- /images/3.44.14PM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/3.44.14PM.png -------------------------------------------------------------------------------- /images/3.44.19PM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/3.44.19PM.png -------------------------------------------------------------------------------- /images/3.44.35PM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/3.44.35PM.png -------------------------------------------------------------------------------- /images/3.49.20PM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/3.49.20PM.png -------------------------------------------------------------------------------- /images/3.56.54PM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/3.56.54PM.png -------------------------------------------------------------------------------- /images/3.57.19PM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/3.57.19PM.png -------------------------------------------------------------------------------- /images/3.59.37PM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/3.59.37PM.png -------------------------------------------------------------------------------- /images/5.05.06PM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/5.05.06PM.png -------------------------------------------------------------------------------- /images/10.45.32AM.png: 
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/10.45.32AM.png
--------------------------------------------------------------------------------
/images/10.46.14AM.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/10.46.14AM.png
--------------------------------------------------------------------------------
/images/10.47.32AM.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/10.47.32AM.png
--------------------------------------------------------------------------------
/images/10.48.15AM.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/10.48.15AM.png
--------------------------------------------------------------------------------
/slides/reinvent cmp 423 Inf1 Lab.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/slides/reinvent cmp 423 Inf1 Lab.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Reinvent Inf1 Lab: Hands-on Deep Learning Inference with Amazon EC2 Inf1 Instance
2 | 
3 | >Note: We simplified this lab into a new repository. https://github.com/awshlabs/Jul2020-Inf1Lab
4 | 
5 | ## Abstract:
6 | 
7 | In this workshop, you gain hands-on experience with Amazon EC2 Inf1 instances, powered by custom AWS Inferentia chips. Amazon EC2 Inf1 instances offer low-latency, high-throughput, and cost-effective machine learning inference in the cloud. This workshop walks you through taking a trained deep learning model to deployment on Amazon EC2 Inf1 instances by using AWS Neuron, an SDK for optimizing inference using AWS Inferentia processors.
8 | 
9 | ## Overview:
10 | 
11 | Please follow the labs in sequence.
12 | 
13 | Lab 1. **Launch** a C5 Instance, **install** the Neuron development environment, and **custom compile** a pre-trained model to target the Inferentia Neuron Processor.
14 | Lab 2. **Launch** an Inf1 Instance, **install** the Neuron runtime and development environment, **test** and **model serve** the compiled ResNet package.
15 | Lab 3. **Compile** on C5 and **launch** a load test run on an Inf1 Instance.
16 | Lab 4. **Debug and profile** your model on an Inf1 Instance.
17 | 
18 | ## Slides:
19 | 
20 | Reinvent workshop slides are at: [slides Directory](./slides)
21 | 
--------------------------------------------------------------------------------
/3. benchmark run.md:
--------------------------------------------------------------------------------
1 | # Lab 3: Compile on C5 and launch a load test run on Inf1 Instance.
2 | 
3 | **Please complete Lab 2 and clean up by following Lab 2's last step. If using the DLAMI Conda environment, please update to the [latest Neuron software](https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/dlami-release-notes.md) for this lab.**
4 | 
5 | This lab shows an example of load testing using an FP16 model derived from the Keras ResNet50 model and compiled for Inferentia with experimental performance flags. For this lab, please use the C5 instance from Lab 1 and the inf1.2xlarge instance from Lab 2.
6 | 
7 | ## Lab 3 Section 1: Compile on C5
8 | 
9 | **3.1.1** Download and unpack the ResNet50 performance package on the C5 instance:
10 | 
11 | ```bash
12 | wget https://reinventinf1.s3.amazonaws.com/keras_fp16_benchmarking_db.tgz
13 | ```
14 | ```bash
15 | tar -xzf keras_fp16_benchmarking_db.tgz
16 | ```
17 | ```bash
18 | cd keras_fp16_benchmarking_db
19 | ```
20 | 
21 | **3.1.2** Activate the virtual environment and install the Neuron Compiler if you have not already done so. Also install the pillow module for the test scripts.
22 | 
23 | ```bash
24 | source test_env_p36/bin/activate
25 | pip install neuron-cc
26 | pip install pillow
27 | ```
28 | 
29 | **3.1.3** Extract Keras ResNet50 FP32, optimize for inference, and convert to FP16.
30 | 
31 | Extract Keras ResNet50 FP32 (resnet50_fp32_keras.pb will be generated):
32 | 
33 | ```bash
34 | python gen_resnet50_keras.py
35 | ```
36 | Optimize the extracted Keras ResNet50 FP32 graph for inference before casting (resnet50_fp32_keras_opt.pb will be generated):
37 | 
38 | ```bash
39 | python optimize_for_inference.py --graph resnet50_fp32_keras.pb --out_graph resnet50_fp32_keras_opt.pb
40 | ```
41 | 
42 | Convert the full graph to FP16 (resnet50_fp16_keras_opt.pb will be generated):
43 | ```bash
44 | python fp32tofp16.py --graph resnet50_fp32_keras_opt.pb --out_graph resnet50_fp16_keras_opt.pb
45 | ```
46 | 
47 | **3.1.4** Compile the ResNet50 frozen graph using the provided pb2sm_compile.py script on the C5 instance. NOTE: please ensure that the Neuron Compiler is up-to-date by following the setup steps in Lab 1 Section 1.
48 | 
49 | >We optimized this model with a compile-time batch size of 5. To optimize throughput, the load test uses a runtime batch size that is a multiple of 5 (50 in this case). This step takes about 6 minutes.
50 | 
51 | ```bash
52 | time python pb2sm_compile.py
53 | ```
54 | 
55 | At the end of this step, you will see a zipped saved model `rn50_fp16_compiled_batch5.zip`, which you will need to copy to your Inf1 instance (the PEM key was set up during Lab 2 Section 3):
56 | 
57 | ```bash
58 | scp -i ~/ee-default-keypair.pem ./rn50_fp16_compiled_batch5.zip ubuntu@:~/ # Ubuntu Image default.
59 | #scp -i ~/ee-default-keypair.pem ./rn50_fp16_compiled_batch5.zip ec2-user@:~/ # if you are on an Amazon Linux 2 image.
60 | ```
61 | 
62 | ## Lab 3 Section 2: Launch a load test run on Inf1
63 | 
64 | **3.2.1** Download and unpack the ResNet50 performance package again, this time on the Inf1 instance:
65 | 
66 | ```bash
67 | wget https://reinventinf1.s3.amazonaws.com/keras_fp16_benchmarking_db.tgz
68 | ```
69 | ```bash
70 | tar -xzf keras_fp16_benchmarking_db.tgz
71 | ```
72 | ```bash
73 | cd keras_fp16_benchmarking_db
74 | ```
75 | 
76 | Unzip the saved model that was transferred from the C5 instance into the current directory:
77 | 
78 | ```bash
79 | unzip ~/rn50_fp16_compiled_batch5.zip
80 | ```
81 | 
82 | **3.2.2** Run the load test using the provided infer_resnet50_keras_loadtest.py script on the Inf1 instance (please make sure this is inf1.2xlarge):
83 | 
84 | > There are a total of 4 Neuron Cores on inf1.2xlarge. The load test runs 4 ResNet50 sessions, each bound to one Neuron Core, with 4 threads per session.
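The provided infer_resnet50_keras_loadtest.py script implements this pattern. As a rough illustration of the structure only (not the actual script), the sketch below opens one predictor per Neuron Core and drives each with several threads; the `SAVED_MODEL_DIR` path, the `input` tensor name, and the float16 input dtype are assumptions based on the earlier steps, so adjust them to match the unpacked package.

```python
# Illustrative load-test sketch -- NOT the provided infer_resnet50_keras_loadtest.py.
# Assumptions: SAVED_MODEL_DIR is the compiled SavedModel unpacked from
# rn50_fp16_compiled_batch5.zip, and it exposes an 'input' tensor that accepts
# (batch, 224, 224, 3) images.
import time
import threading
import numpy as np
import tensorflow as tf

SAVED_MODEL_DIR = 'rn50_fp16_compiled_batch5'  # hypothetical path; use the directory the unzip step created
NUM_SESSIONS = 4            # one session per Neuron Core on inf1.2xlarge
THREADS_PER_SESSION = 4     # 16 worker threads in total
NUM_LOOPS_PER_THREAD = 100
USER_BATCH_SIZE = 50        # a multiple of the compile-time batch size of 5

# Each predictor owns its own TF session; the Neuron runtime can place each one on its own Neuron Core.
predictors = [tf.contrib.predictor.from_saved_model(SAVED_MODEL_DIR)
              for _ in range(NUM_SESSIONS)]
batch = np.random.rand(USER_BATCH_SIZE, 224, 224, 3).astype('float16')  # assumed input dtype

completed_images = []
lock = threading.Lock()

def worker(predictor):
    # Each thread repeatedly pushes batches through its session.
    for _ in range(NUM_LOOPS_PER_THREAD):
        predictor({'input': batch})
        with lock:
            completed_images.append(USER_BATCH_SIZE)

threads = [threading.Thread(target=worker, args=(p,))
           for p in predictors for _ in range(THREADS_PER_SESSION)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
print('overall throughput: %.0f images/sec' % (sum(completed_images) / elapsed))
```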
85 | 86 | ```bash 87 | time python infer_resnet50_keras_loadtest.py 88 | ``` 89 | Output: 90 | 91 | ``` 92 | NUM THREADS: 16 93 | NUM_LOOPS_PER_THREAD: 100 94 | USER_BATCH_SIZE: 50 95 | current throughput: 0 images/sec 96 | current throughput: 0 images/sec 97 | current throughput: 700 images/sec 98 | current throughput: 800 images/sec 99 | current throughput: 1700 images/sec 100 | current throughput: 1800 images/sec 101 | current throughput: 1850 images/sec 102 | current throughput: 1800 images/sec 103 | current throughput: 1850 images/sec 104 | current throughput: 1700 images/sec 105 | current throughput: 1850 images/sec 106 | current throughput: 1800 images/sec 107 | current throughput: 1800 images/sec 108 | current throughput: 1800 images/sec 109 | current throughput: 1800 images/sec 110 | current throughput: 1750 images/sec 111 | current throughput: 1950 images/sec 112 | current throughput: 1750 images/sec 113 | current throughput: 1850 images/sec 114 | current throughput: 1800 images/sec 115 | current throughput: 1750 images/sec 116 | current throughput: 1800 images/sec 117 | current throughput: 1800 images/sec 118 | current throughput: 1750 images/sec 119 | current throughput: 1800 images/sec 120 | current throughput: 1800 images/sec 121 | current throughput: 1750 images/sec 122 | current throughput: 1850 images/sec 123 | current throughput: 1750 images/sec 124 | current throughput: 1800 images/sec 125 | current throughput: 1800 images/sec 126 | current throughput: 1800 images/sec 127 | current throughput: 1850 images/sec 128 | current throughput: 1800 images/sec 129 | current throughput: 1850 images/sec 130 | current throughput: 1800 images/sec 131 | current throughput: 1750 images/sec 132 | current throughput: 1800 images/sec 133 | current throughput: 1750 images/sec 134 | current throughput: 1800 images/sec 135 | current throughput: 1800 images/sec 136 | current throughput: 1750 images/sec 137 | current throughput: 1850 images/sec 138 | current throughput: 1800 images/sec 139 | current throughput: 1800 images/sec 140 | current throughput: 1900 images/sec 141 | current throughput: 1800 images/sec 142 | current throughput: 850 images/sec 143 | current throughput: 250 images/sec 144 | 145 | real 0m54.746s 146 | user 1m39.552s 147 | sys 0m7.787s 148 | 149 | ``` 150 | 151 | NOTE: If you see lower throughput, please make sure that the Inf1 instance is inf1.2xlarge. 152 | 153 | **3.2.3** While this is running you can see utilization using neuron-top tool in a separate terminal (it takes about a minute to load; also running neuron-top will lower the throughput to around 1200 images/sec): 154 | ```bash 155 | /opt/aws/neuron/bin/neuron-top 156 | ``` 157 | 158 | **Note: Please go back to home directory /home/ubuntu** 159 | 160 | ```bash 161 | cd ~/ 162 | ``` 163 | 164 | [Go To Lab 4](4.%20Profiling%20and%20Debugging.md) 165 | -------------------------------------------------------------------------------- /1. Setup Dev Env.md: -------------------------------------------------------------------------------- 1 | # Reinvent Inf1 Lab: Hands-on Deep Learning Inference with Amazon EC2 Inf1 Instance (20 min) 2 | 3 | ## Abstract: 4 | 5 | In this workshop, you gain hands-on experience with Amazon EC2 Inf1 instances, powered by custom AWS Inferentia chips. Amazon EC2 Inf1 instances offer low-latency, high-throughput, and cost-effective machine learning inference in the cloud. 
This workshop walks you through taking a trained deep learning model to deployment on Amazon EC2 Inf1 instances by using AWS Neuron, an SDK for optimizing inference using AWS Inferentia processors.
6 | 
7 | 
8 | ## Overview:
9 | 
10 | **Note: Please follow the labs in sequence.**
11 | 
12 | Lab 1. **Launch** a C5 Instance, **install** the Neuron development environment, and **custom compile** a pre-trained model to target the Inferentia Neuron Processor.
13 | 
14 | Lab 2. **Launch** an Inf1 Instance, **install** the Neuron runtime and development environment, **test** and **model serve** the compiled ResNet package.
15 | 
16 | Lab 3. **Compile and Launch** a load test on an Inferentia Instance.
17 | 
18 | Lab 4. **Debug and Profile** your model.
19 | 
20 | 
21 | ----------
22 | 
23 | # Lab 1. **Launch** a C5 Instance, **install** the Neuron development environment, and **custom compile** a pre-trained model to target the Inferentia Neuron Processor.
24 | 
25 | ## Lab 1 Section 1: Launch an EC2 instance to set up the Neuron SDK Dev Environment
26 | 
27 | A typical workflow with the Neuron SDK is to compile a trained ML model on a compute-optimized compilation server and then distribute the artifacts to Inf1 instances for execution. Select an AMI of your choice, which may be Ubuntu 16.x, Ubuntu 18.x, or Amazon Linux 2 based. To use a pre-built Deep Learning AMI with Neuron software, see the [DLAMI AWS Neuron guide](https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-inferentia.html).
28 | 
29 | [Launching a DLAMI Instance with AWS Neuron](https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-inferentia-launching.html).
30 | 
31 | **1.1.0** Set up your SSH environment. For Windows, we recommend installing MobaXterm. https://mobaxterm.mobatek.net/
32 | 
33 | If you are using the AWS Event Engine onsite at re:Invent, download your SSH key from the Event Engine dashboard: https://dashboard.eventengine.run/ Enter your unique event hash, and download the SSH key to your local laptop/machine.
34 | 
35 | 
36 | **1.1.1** Go to the EC2 console and select **Deep Learning AMI (Ubuntu 16.04) Version 26.** We will run through the installation and update process as an exercise, as we are actively updating the software tools and packages.
37 | >Please update your system frequently to ensure you are using the latest Neuron packages.
38 | 
39 | **1.1.2** Select and start an EC2 instance as your development environment.
40 | It is recommended to use a c5.4xlarge or larger. For this example we will use a **c5d.4xlarge**, which has a fast local SSD drive, 16 vCPUs, and 32 GB of RAM.
41 | 
42 | >In the future, if you would like to compile and run inference on the same machine, please select inf1.6xlarge or larger.
43 | 
44 | 
45 | **1.1.3** Choose an existing SSH key.
46 | 
47 | 
48 | **1.1.4** Install virtualenv:
49 | > It is a best practice to use a virtual environment so that you have full flexibility for package management when working with deep learning frameworks.
50 | ```bash
51 | sudo apt-get update
52 | sudo apt-get -y install virtualenv
53 | ```
54 | 
55 | Note: If you see the following errors during apt-get install, please wait a minute or so for background updates to finish and retry apt-get install:
56 | 
57 | ```bash
58 | E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
59 | E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?
60 | ```
61 | 
62 | **1.1.5** Set up a new Python 3.6 virtual environment.
63 | 64 | ```bash 65 | virtualenv --python=python3.6 test_env_p36 66 | source test_env_p36/bin/activate 67 | pip install --no-deps tensorflow_serving_api==1.15 68 | ``` 69 | > If you are ever disconnected from the instance, make sure you run this command again to get back to the correct virtual environment where you have installed all the correct packages. 70 | 71 | 72 | ```bash 73 | source test_env_p36/bin/activate 74 | ``` 75 | 76 | **1.1.6** Modify Pip configurations to point to the Neuron repository. 77 | ```bash 78 | tee $VIRTUAL_ENV/pip.conf > /dev/null <Note that Model size for Resnet is about 20+ million parameters, in FP32 format, each parameter is 4 bytes. The compiler process automatically converts them into BF16 (2 bytes per parameter), which is a much more efficient dataformat that Neuron Cores support in hardware. 101 | 102 | **1.2.1** Create a python script named `compile_resnet50.py` with the following content. You can use either nano or vi editors. 103 | 104 | >Inspect the code below to understand the steps: export savedModel, compile model. 105 | 106 | ```python 107 | import os 108 | import time 109 | import shutil 110 | import tensorflow as tf 111 | import tensorflow.neuron as tfn 112 | import tensorflow.compat.v1.keras as keras 113 | from tensorflow.keras.applications.resnet50 import ResNet50 114 | from tensorflow.keras.applications.resnet50 import preprocess_input 115 | 116 | # Create a workspace 117 | WORKSPACE = './ws_resnet50' 118 | os.makedirs(WORKSPACE, exist_ok=True) 119 | 120 | # Prepare export directory (old one removed) 121 | model_dir = os.path.join(WORKSPACE, 'resnet50') 122 | compiled_model_dir = os.path.join(WORKSPACE, 'resnet50_neuron') 123 | shutil.rmtree(model_dir, ignore_errors=True) 124 | shutil.rmtree(compiled_model_dir, ignore_errors=True) 125 | 126 | # Instantiate Keras ResNet50 model 127 | keras.backend.set_learning_phase(0) 128 | tf.keras.backend.set_image_data_format('channels_last') 129 | model = ResNet50(weights='imagenet') 130 | 131 | # Export SavedModel 132 | tf.saved_model.simple_save( 133 | session = keras.backend.get_session(), 134 | export_dir = model_dir, 135 | inputs = {'input': model.inputs[0]}, 136 | outputs = {'output': model.outputs[0]}) 137 | 138 | # Compile using Neuron 139 | #tfn.saved_model.compile(model_dir, compiled_model_dir) #default compiles to 1 neuron core. 140 | tfn.saved_model.compile(model_dir, compiled_model_dir, compiler_args =['--num-neuroncores', '4']) # compile to 4 neuron cores. 141 | 142 | # Prepare SavedModel for uploading to Inf1 instance 143 | shutil.make_archive('./resnet50_neuron', 'zip', WORKSPACE, 'resnet50_neuron') 144 | ``` 145 | 146 | > Note `compiler_args =['--num-neuroncores', '4']`, targets and optimizes the model to run on 4 Neuron cores; default is 1. Be sure to use the correct number of cores on the target Inf1 instance. 147 | 148 | 149 | **1.2.2** Run the compilation script, which will take a few minutes on c5d.4xlarge. At the end of script execution, the compiled SavedModel is zipped as resnet50_neuron.zip in local directory. This zip file will be needed in the next lab where we deploy the compiled model on an Inf1 instance. 150 | 151 | ```bash 152 | time python compile_resnet50.py 153 | ``` 154 | ``` 155 | ... 
156 | INFO:tensorflow:fusing subgraph neuron_op_d6f098c01c780733 with neuron-cc
157 | INFO:tensorflow:Number of operations in TensorFlow session: 4638
158 | INFO:tensorflow:Number of operations after tf.neuron optimizations: 556
159 | INFO:tensorflow:Number of operations placed on Neuron runtime: 554
160 | INFO:tensorflow:Successfully converted ./ws_resnet50/resnet50 to ./ws_resnet50/resnet50_neuron
161 | 
162 | real 1m54.833s
163 | user 1m46.876s
164 | sys 0m6.736s
165 | ```
166 | 
167 | >Note: The Neuron compiler performs ahead-of-time compilation. Compared to JIT (compile-on-first-inference) systems, this saves time when you deploy the model onto multiple servers. The Neuron compiler automatically handles operator fusion, scheduling, and memory management.
168 | 
169 | **1.2.3** Please keep resnet50_neuron.zip in the local directory. You will need it in the next lab. Please click the link below to go to the next document, launch an Inf1 Instance, and deploy your compiled model.
170 | 
171 | [Go To Lab 2](2.%20Deploy%20and%20model%20serve%20on%20Inf1%20Instance.md)
172 | 
--------------------------------------------------------------------------------
/2. Deploy and model serve on Inf1 Instance.md:
--------------------------------------------------------------------------------
1 | # Lab 2. Deploy and Serve Compiled Model On Inferentia Instance (30 Minutes)
2 | 
3 | **NOTE: Please complete Lab 1 before starting Lab 2 to obtain the compiled model.**
4 | 
5 | ## Lab 2 Section 1: Select Deep Learning AMI Version 26 for Ubuntu, and start an inf1.2xlarge Inf1 instance.
6 | 
7 | **2.1.1** Select Deep Learning AMI Version 26 for Ubuntu, and start an inf1.2xlarge Inf1 instance.
8 | 
9 | ## Lab 2 Section 2: Update Repo List for Pip and Apt-get (Debian)
10 | 
11 | **2.2.1** Add the Neuron Apt-get repo for Debian-based Ubuntu 16.
12 | >For Ubuntu 18, use bionic instead of xenial; we are using Ubuntu 16 for this lab.
13 | 
14 | Add the repo URL to the list.
15 | ```bash
16 | sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null < /dev/null <:~/ # Ubuntu Image default.
106 | #scp -i ee-default-keypair.pem ./resnet50_neuron.zip ec2-user@:~/ # if you are on an Amazon Linux 2 image.
107 | ```
108 | **2.3.2** On the Inf1 instance, create an inference Python script named `infer_resnet50.py` with the following content:
109 | 
110 | ```python
111 | import os
112 | import time
113 | import numpy as np
114 | import tensorflow as tf
115 | from tensorflow.keras.preprocessing import image
116 | from tensorflow.keras.applications import resnet50
117 | 
118 | tf.keras.backend.set_image_data_format('channels_last')
119 | 
120 | # Create input from image
121 | img_sgl = image.load_img('kitten_small.jpg', target_size=(224, 224))
122 | img_arr = image.img_to_array(img_sgl)
123 | img_arr2 = np.expand_dims(img_arr, axis=0)
124 | img_arr3 = resnet50.preprocess_input(img_arr2)
125 | 
126 | # Load model
127 | COMPILED_MODEL_DIR = './resnet50_neuron/'
128 | predictor_inferentia = tf.contrib.predictor.from_saved_model(COMPILED_MODEL_DIR)
129 | 
130 | # Run Inference and Display results
131 | model_feed_dict={'input': img_arr3}
132 | infa_rslts = predictor_inferentia(model_feed_dict)
133 | print(resnet50.decode_predictions(infa_rslts["output"], top=5)[0])
134 | ```
135 | 
136 | **2.3.3** Unzip the model on the Inferentia instance, download the example image, and run the inference:
137 | ```bash
138 | unzip resnet50_neuron.zip
139 | ```
140 | **2.3.4** Get a sample image to test the model inference on Neuron Cores.
141 | ```bash 142 | curl -O https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg 143 | ``` 144 | **2.3.5** Run Inference Script 145 | ```bash 146 | python infer_resnet50.py 147 | ``` 148 | You should get: 149 | ```bash 150 | [('n02123045', 'tabby', 0.69945353), ('n02127052', 'lynx', 0.1215847), ('n02123159', 'tiger_cat', 0.08367486), ('n02124075', 'Egyptian_cat', 0.064890705), ('n02128757', 'snow_leopard', 0.009392076)] 151 | ``` 152 | 153 | ## Lab 2 Section 4. Neuron TensorFlow Serving 154 | 155 | TensorFlow Serving is a serving system that allows customers to scale-up inference across a network. Neuron TensorFlow Serving uses the same API as normal TensorFlow Serving. The only differences are that the saved model must be compiled for Inferentia and the entry point is a different binary named `tensorflow_model_server_neuron`. The binary is found at `/usr/local/bin/tensorflow_model_server_neuron` and is pre-installed in the DLAMI or installed with APT/YUM tensorflow-model-server-neuron package. 156 | 157 | You have installed these two packages in the previous step. 158 | 159 | The following example shows how to prepare saved model for serving and a sample inference via served model. 160 | 161 | **2.4.1** Prepare Compiled Saved Model 162 | 163 | Prepare a directory structure for TensorFlow model serving, with the previously compiled ResNet50 saved model in directory "1": 164 | 165 | ```bash 166 | mkdir -p resnet50_inf1_serve 167 | cp -rf resnet50_neuron resnet50_inf1_serve/1 168 | ``` 169 | 170 | **2.4.2** Serving Saved Model 171 | 172 | User can now serve the saved model with the tensorflow_model_server_neuron binary: 173 | 174 | ```bash 175 | tensorflow_model_server_neuron --model_name=resnet50_inf1_serve --model_base_path=$(pwd)/resnet50_inf1_serve/ --port=8500 176 | ``` 177 | 178 | The compiled model is staged in Inferentia DRAM by the server to prepare for inference. 
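Before moving on to the client step, you can optionally confirm that the server is accepting connections. The short check below is illustrative only (it is not part of the lab package) and assumes the server from step 2.4.2 is listening on localhost:8500.

```python
# Optional, illustrative readiness check -- not part of the lab package.
# Assumes tensorflow_model_server_neuron (step 2.4.2) is listening on localhost:8500
# and that the grpcio package is available in the virtualenv.
import grpc

channel = grpc.insecure_channel('localhost:8500')
try:
    # Block until the gRPC channel is ready, or give up after 15 seconds.
    grpc.channel_ready_future(channel).result(timeout=15)
    print('Model server is accepting connections on port 8500')
except grpc.FutureTimeoutError:
    print('Model server is not reachable yet -- check the serving terminal for errors')
```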
179 | 
180 | **2.4.3** Generate inference requests to the model server
181 | 
182 | In another terminal, enter the created virtualenv:
183 | 
184 | ```bash
185 | source test_env_p36/bin/activate
186 | ```
187 | 
188 | Run inferences via gRPC using the following sample client code (save it as `tfs_client.py` and run it as `python tfs_client.py`):
189 | 
190 | ```python
191 | import numpy as np
192 | import grpc
193 | import tensorflow as tf
194 | from tensorflow.keras.preprocessing import image
195 | from tensorflow.keras.applications.resnet50 import preprocess_input
196 | from tensorflow.keras.applications.resnet50 import decode_predictions
197 | from tensorflow_serving.apis import predict_pb2
198 | from tensorflow_serving.apis import prediction_service_pb2_grpc
199 | 
200 | tf.keras.backend.set_image_data_format('channels_last')
201 | 
202 | if __name__ == '__main__':
203 |     channel = grpc.insecure_channel('localhost:8500')
204 |     stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
205 |     img_file = tf.keras.utils.get_file(
206 |         "./kitten_small.jpg",
207 |         "https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg")
208 |     img = image.load_img(img_file, target_size=(224, 224))
209 |     img_array = preprocess_input(image.img_to_array(img)[None, ...])
210 |     request = predict_pb2.PredictRequest()
211 |     request.model_spec.name = 'resnet50_inf1_serve'
212 |     request.inputs['input'].CopyFrom(
213 |         tf.contrib.util.make_tensor_proto(img_array, shape=img_array.shape))
214 |     result = stub.Predict(request)
215 |     prediction = tf.make_ndarray(result.outputs['output'])
216 |     print(decode_predictions(prediction))
217 | ```
218 | 
219 | Expected output:
220 | ```bash
221 | [[('n02123045', 'tabby', 0.69945353), ('n02127052', 'lynx', 0.1215847), ('n02123159', 'tiger_cat', 0.08367486), ('n02124075', 'Egyptian_cat', 0.064890705), ('n02128757', 'snow_leopard', 0.009392076)]]
222 | ```
223 | 
224 | **2.4.4** Cleanup
225 | 
226 | Please terminate the TensorFlow Serving process in the first terminal by pressing Ctrl-C, then run:
227 | 
228 | ```bash
229 | /opt/aws/neuron/bin/neuron-cli reset
230 | ```
231 | Expected output:
232 | ```bash
233 | (test_env_p36) ubuntu@ip-10-1-2-11:~$ /opt/aws/neuron/bin/neuron-cli reset
234 | No NCG Found
235 | ```
236 | [Go To Lab 3](3.%20benchmark%20run.md)
237 | 
--------------------------------------------------------------------------------
/4. Profiling and Debugging.md:
--------------------------------------------------------------------------------
1 | # Lab 4. Debugging and Profiling with TensorBoard-Neuron
2 | 
3 | In this lab you will practice debugging and profiling the Lab 2 inference run using TensorBoard-Neuron.
4 | 
5 | **Note: After Lab 3, please go back to the home directory /home/ubuntu**
6 | 
7 | ```bash
8 | cd ~/
9 | ```
10 | 
11 | ## Lab 4 Section 1: Installation on the Inf1 instance
12 | 
13 | **4.1.1** Install TensorFlow-Neuron on the Inf1 instance if you have not already done so. By default, TensorBoard-Neuron is installed when you install TensorFlow-Neuron. If you have already installed TensorFlow-Neuron, skip this step.
14 | 
15 | ```
16 | $ pip install tensorflow-neuron
17 | ```
18 | 
19 | **4.1.2** TensorBoard-Neuron can also be installed separately.
20 | 
21 | ```
22 | $ pip install tensorboard-neuron
23 | ```
24 | 
25 | **4.1.3** Additionally, if you would like to profile your model (see below), you will also need to have Neuron tools installed.
26 | 
27 | ```
28 | $ sudo apt install aws-neuron-tools
29 | ```
30 | 
31 | ## Lab 4 Section 2: Profile the network and collect inference traces on the Inf1 instance
32 | 
33 | When using TensorFlow-Neuron, MXNet-Neuron, or PyTorch-Neuron, raw profile data will be collected if the NEURON_PROFILE environment variable is set. The raw profile is dumped into the directory that NEURON_PROFILE points to.
34 | 
35 | **4.2.1** Set the environment variable
36 | ```
37 | mkdir -p profile
38 | export NEURON_PROFILE=profile
39 | ```
40 | 
41 | NOTE: the directory must exist before you move on to the next step. Otherwise, profile data will not be emitted.
42 | 
43 | **4.2.2** Run the inference from Lab 2 through the framework
44 | ```
45 | python infer_resnet50.py
46 | ```
47 | 
48 | ## Lab 4 Section 3: Visualizing data with TensorBoard-Neuron
49 | 
50 | In this section please use Chrome. Firefox will work except for Step 4.6.4 "Chrome trace". Don't use Safari, as it will hang during certain steps (a known problem with TensorBoard).
51 | 
52 | **4.3.1** To view data in TensorBoard-Neuron, run the command below, where “logdir” is the directory where TensorFlow logs are generated. (Note that this "logdir" is *not* the same as the NEURON_PROFILE directory that you set during inference. The inference script above does not actually produce any TensorFlow logs, so you can set this to any value. The most important thing is that for this step, the NEURON_PROFILE environment variable should still be set to the same directory you used during your inference run. `tensorboard_neuron` will process the Neuron profile data from this directory at startup.)
53 | 
54 | ```
55 | $ tensorboard_neuron --logdir ~/logs --run_neuron_profile
56 | ```
57 | 
58 | **4.3.2** By default, TensorBoard-Neuron is launched at “localhost:6006”; the URL can be changed by specifying the "--host" and "--port" options.
59 | 
60 | Now, in a browser visit [localhost:6006](http://localhost:6006/) to view the visualization, or enter the host and port if you specified them above.
61 | 
62 | For this step, you may need to set up port forwarding to your local machine:
63 | ```
64 | ssh -i ubuntu@ -L 6006:localhost:6006
65 | ```
66 | 
67 | ## Lab 4 Section 4: How to check Neuron compatibility
68 | 
69 | TensorBoard-Neuron can visualize which operators are supported on Neuron devices. All Neuron-compatible operators will run on Neuron Cores, and other operators will run on the CPU.
70 | 
71 | **4.4.1** Navigate to the "Graphs" plugin.
72 | 
73 | **4.4.2** Select “Neuron Compatibility”
74 | 
75 | In the navigation pane on the left, under the “Color” section, select “Neuron Compatibility.”
76 | 
77 | ![image](./images/3.44.08PM.png)
78 | 
79 | **4.4.3** View compatible operators
80 | 
81 | Now, the graph should be colored red and/or green. Green indicates an operator that is compatible with Neuron devices, while red indicates an operator that is currently not supported. If there are unsupported operators, all of these operators’ names will be listed under the “Incompatible Operations” section.
82 | ![image](./images/3.44.14PM.png)
83 | 
84 | ## Lab 4 Section 5: How to visualize graphs run on a Neuron device
85 | 
86 | After successfully analyzing the profiled run on a Neuron device, you can launch TensorBoard-Neuron to view the graph and see how much time each operator is taking.
87 | 
88 | **4.5.1** Navigate to the "Graphs" plugin
89 | 
90 | **4.5.2** Select the “Neuron_profile” tag
91 | 
92 | The “neuron_profile” tag contains timing information regarding the inference you profiled.
93 | ![image](./images/3.44.19PM.png)
94 | 
95 | **4.5.3** Select “Compute Time”
96 | 
97 | In the navigation pane on the left, under the “Color” section, select “Compute time.”
98 | ![image](./images/3.44.35PM.png)
99 | 
100 | **4.5.4** View time taken by various layers
101 | 
102 | This view shows the time taken by each layer, colored according to how much relative time the layer took to compute. A lighter shade of red means that a relatively small portion of compute time was spent in this layer, while a darker red shows that more compute time was used. Some layers may also be blank, which indicates that these layers may have been optimized out to improve inference performance. Clicking on a node will show the compute time, if available.
103 | ![image](./images/3.49.20PM.png)
104 | 
105 | ## Lab 4 Section 6: How to view a detailed profile using the Neuron Profile plugin
106 | 
107 | To get a better understanding of the profile, you can check out the Neuron Profile plugin. Here, you will find more information on the inference, including an overview, a list of the most time-consuming operators (op profile tool), and an execution timeline view (Chrome trace).
108 | 
109 | **4.6.1** Select the “Neuron Profile” plugin
110 | 
111 | On the navigation bar at the top of the page, there will be a list of active plugins. In this case, you will need to use the “Neuron Profile” plugin. The plugin may take a while to register on first load. If this tab does not show initially, please refresh the page.
112 | 
113 | ![image](./images/3.43.45PM.png)
114 | 
115 | **4.6.2** The Profile Overview
116 | 
117 | The first page you will land on in the Neuron Profile plugin is the overview page. It contains a variety of information regarding the inference.
118 | ![image](./images/3.56.54PM.png)
119 | In the “Performance Summary” section, you will see execution stats, such as the total execution time, the average layer execution time, and the utilization of NeuronMatrix Units.
120 | 
121 | The “Neuron Time Graph” shows how long a portion of the graph (a NeuronOp) took to execute.
122 | 
123 | The “Top TensorFlow operations executed on Neuron Cores” section gives a quick summary of the most time-consuming operators that were executed on the device.
124 | 
125 | “Run Environment” shows information on the devices used during this inference.
126 | 
127 | Finally, the “Recommendation for Next Steps” section gives helpful pointers to places where you can learn more about what to do next.
128 | 
129 | **4.6.3** The Operator Profile
130 | 
131 | In the “Tools” dropdown menu, select “op_profile.”
132 | 
133 | The “op profile” tool displays the percentage of overall time taken for each operator, sorted with the most expensive operators at the top. It gives a better understanding of where the bottlenecks in a model may be.
134 | ![image](./images/3.57.19PM.png)
135 | 
136 | **4.6.4** Chrome trace
137 | 
138 | In the “Tools” dropdown menu, select “trace_viewer.”
139 | 
140 | For developers wanting to better understand the timeline of the inference, the Chrome trace view is the tool for you. It shows the history of execution organized by the operator names.
141 | 
142 | Please note that this tool can only be used in Chrome browsers.
143 | ![image](./images/3.59.37PM.png) 144 | 145 | ## Lab 4 Section 7: How to debug an inference 146 | 147 | **4.7.1** Launch TensorBoard-Neuron and navigate to the webpage 148 | 149 | To use the Debugger plugin, you will need to launch with an extra flag: 150 | 151 | ``` 152 | $ tensorboard_neuron --logdir ~/logs --debugger_port 7000 153 | ``` 154 | 155 | where 7000 can be any desired port number. 156 | 157 | **4.7.2** Modify and run your inference script 158 | 159 | In order to run the inference in “debug mode,” you must use TensorFlow’s debug wrapper. The following lines will need to be added to your infer_resnet50.py script from Lab 2. 160 | ```python 161 | ... 162 | DEBUG_SERVER_ADDRESS = 'localhost:7000' 163 | ... 164 | predictor_inferentia._session = tf_debug.TensorBoardDebugWrapperSession( 165 | predictor_inferentia._session, DEBUG_SERVER_ADDRESS) 166 | ``` 167 | 168 | The full modified infer_resnet50.py script is: 169 | 170 | ```python 171 | import os 172 | import time 173 | import numpy as np 174 | import tensorflow as tf 175 | from tensorflow.keras.preprocessing import image 176 | from tensorflow.keras.applications import resnet50 177 | from tensorflow.python import debug as tf_debug 178 | 179 | tf.keras.backend.set_image_data_format('channels_last') 180 | 181 | # The port must be the same as the one used for --debugger_port above 182 | # in this example, PORT is 7000 183 | DEBUG_SERVER_ADDRESS = 'localhost:7000' 184 | 185 | # Create input from image 186 | img_sgl = image.load_img('kitten_small.jpg', target_size=(224, 224)) 187 | img_arr = image.img_to_array(img_sgl) 188 | img_arr2 = np.expand_dims(img_arr, axis=0) 189 | img_arr3 = resnet50.preprocess_input(img_arr2) 190 | 191 | # Load model 192 | COMPILED_MODEL_DIR = './resnet50_neuron/' 193 | predictor_inferentia = tf.contrib.predictor.from_saved_model(COMPILED_MODEL_DIR) 194 | 195 | # Use the debug wrapper for TensorBoard 196 | predictor_inferentia._session = tf_debug.TensorBoardDebugWrapperSession( 197 | predictor_inferentia._session, DEBUG_SERVER_ADDRESS) 198 | 199 | # Run Inference and Display results 200 | model_feed_dict={'input': img_arr3} 201 | infa_rslts = predictor_inferentia(model_feed_dict) 202 | print(resnet50.decode_predictions(infa_rslts["output"], top=5)[0]) 203 | ``` 204 | 205 | After adding these modifications, run the script to begin inference. The execution will be paused before any calculation starts. 206 | 207 | **4.7.3** Select the “debugger” plugin 208 | 209 | On the navigation bar at the top of the page, there will be a list of active plugins. In this case, you will need to use the “Debugger” plugin. 210 | ![image](./images/5.05.06PM.png) 211 | 212 | **4.7.4** Enable watchpoints 213 | 214 | In the “Runtime Node List” on the left, there will be a list of operators and a checkbox next to each. Select all of the operators that you would like the view the tensor output of. 215 | ![image](./images/10.45.32AM.png) 216 | 217 | **4.7.5** execute inference 218 | 219 | On the bottom left of the page, there will be a “Continue...” button that will resume the inference execution. As the graph is executed, output tensors will be saved for later viewing. 220 | ![image](./images/10.46.14AM.png) 221 | 222 | **4.7.6** View tensors 223 | 224 | At the bottom of the page, there will be a“Tensor Value Overview” section that shows a summary of all the output tensors that were selected as watchpoints in Step 4.7.4. 
225 | ![image](./images/10.47.32AM.png) To view more specific information on a tensor, you can click on a tensor’s value. You may also hover over the bar in the “Health Pill” column for a more detailed summary of values.
226 | ![image](./images/10.48.15AM.png)
227 | 
--------------------------------------------------------------------------------