├── .gitattributes ├── images ├── 3.43.45PM.png ├── 3.44.08PM.png ├── 3.44.14PM.png ├── 3.44.19PM.png ├── 3.44.35PM.png ├── 3.49.20PM.png ├── 3.56.54PM.png ├── 3.57.19PM.png ├── 3.59.37PM.png ├── 5.05.06PM.png ├── 10.45.32AM.png ├── 10.46.14AM.png ├── 10.47.32AM.png └── 10.48.15AM.png ├── slides └── reinvent cmp 423 Inf1 Lab.pdf ├── README.md ├── 3. benchmark run.md ├── 1. Setup Dev Env.md ├── 2. Deploy and model serve on Inf1 Instance.md └── 4. Profiling and Debugging.md /.gitattributes: -------------------------------------------------------------------------------- 1 | # Auto detect text files and perform LF normalization 2 | * text=auto 3 | -------------------------------------------------------------------------------- /images/3.43.45PM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/3.43.45PM.png -------------------------------------------------------------------------------- /images/3.44.08PM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/3.44.08PM.png -------------------------------------------------------------------------------- /images/3.44.14PM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/3.44.14PM.png -------------------------------------------------------------------------------- /images/3.44.19PM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/3.44.19PM.png -------------------------------------------------------------------------------- /images/3.44.35PM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/3.44.35PM.png -------------------------------------------------------------------------------- /images/3.49.20PM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/3.49.20PM.png -------------------------------------------------------------------------------- /images/3.56.54PM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/3.56.54PM.png -------------------------------------------------------------------------------- /images/3.57.19PM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/3.57.19PM.png -------------------------------------------------------------------------------- /images/3.59.37PM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/3.59.37PM.png -------------------------------------------------------------------------------- /images/5.05.06PM.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/5.05.06PM.png -------------------------------------------------------------------------------- /images/10.45.32AM.png: 
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/10.45.32AM.png
--------------------------------------------------------------------------------
/images/10.46.14AM.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/10.46.14AM.png
--------------------------------------------------------------------------------
/images/10.47.32AM.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/10.47.32AM.png
--------------------------------------------------------------------------------
/images/10.48.15AM.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/images/10.48.15AM.png
--------------------------------------------------------------------------------
/slides/reinvent cmp 423 Inf1 Lab.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awshlabs/reinvent19Inf1Lab/HEAD/slides/reinvent cmp 423 Inf1 Lab.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Reinvent Inf1 Lab: Hands-on Deep Learning Inference with Amazon EC2 Inf1 Instance
2 | 
3 | >Note: We simplified this lab into a new repository. https://github.com/awshlabs/Jul2020-Inf1Lab
4 | 
5 | ## Abstract:
6 | 
7 | In this workshop, you gain hands-on experience with Amazon EC2 Inf1 instances, powered by custom AWS Inferentia chips. Amazon EC2 Inf1 instances offer low-latency, high-throughput, and cost-effective machine learning inference in the cloud. This workshop walks you through taking a trained deep learning model to deployment on Amazon EC2 Inf1 instances by using AWS Neuron, an SDK for optimizing inference using AWS Inferentia processors.
8 | 
9 | ## Overview:
10 | 
11 | Please follow the labs in sequence.
12 | 
13 | Lab 1. **Launch** a C5 Instance, **install** the Neuron development environment, and **custom compile** a pre-trained model to target the Inferentia Neuron Processor.
14 | Lab 2. **Launch** an Inf1 Instance, **install** the Neuron runtime and development environment, **test** and **model serve** the compiled ResNet package.
15 | Lab 3. **Compile** on C5 and **launch** a load test run on an Inf1 Instance.
16 | Lab 4. **Debug and profile** your model on an Inf1 Instance.
17 | 
18 | ## Slides:
19 | 
20 | Reinvent workshop slides are at: [slides Directory](./slides)
21 | 
--------------------------------------------------------------------------------
/3. benchmark run.md:
--------------------------------------------------------------------------------
1 | # Lab 3: Compile on C5 and launch a load test run on Inf1 Instance.
2 | 
3 | **Please complete Lab 2 and clean up by following Lab 2's last step. If using the DLAMI Conda environment, please update to the [latest Neuron software](https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/dlami-release-notes.md) for this lab.**
4 | 
5 | This lab shows an example of load testing using an FP16 model derived from the Keras ResNet50 model and compiled for Inferentia with experimental performance flags. For this lab, please use the C5 instance from Lab 1 and the inf1.2xlarge instance from Lab 2.
6 | 
7 | ## Lab 3 Section 1: Compile on C5
8 | 
9 | **3.1.1** Download and unpack the ResNet50 performance package on the C5 instance:
10 | 
11 | ```bash
12 | wget https://reinventinf1.s3.amazonaws.com/keras_fp16_benchmarking_db.tgz
13 | ```
14 | ```bash
15 | tar -xzf keras_fp16_benchmarking_db.tgz
16 | ```
17 | ```bash
18 | cd keras_fp16_benchmarking_db
19 | ```
20 | 
21 | **3.1.2** Activate the virtual environment and install the Neuron Compiler if you have not already done so. Also install the pillow module for the test scripts.
22 | 
23 | ```bash
24 | source test_env_p36/bin/activate
25 | pip install neuron-cc
26 | pip install pillow
27 | ```
28 | 
29 | **3.1.3** Extract Keras ResNet50 FP32, optimize for inference, and convert to FP16.
30 | 
31 | Extract Keras ResNet50 FP32 (resnet50_fp32_keras.pb will be generated):
32 | 
33 | ```bash
34 | python gen_resnet50_keras.py
35 | ```
36 | Optimize the extracted Keras ResNet50 FP32 graph for inference before casting (resnet50_fp32_keras_opt.pb will be generated):
37 | 
38 | ```bash
39 | python optimize_for_inference.py --graph resnet50_fp32_keras.pb --out_graph resnet50_fp32_keras_opt.pb
40 | ```
41 | 
42 | Convert the full graph to FP16 (resnet50_fp16_keras_opt.pb will be generated):
43 | ```bash
44 | python fp32tofp16.py --graph resnet50_fp32_keras_opt.pb --out_graph resnet50_fp16_keras_opt.pb
45 | ```
46 | 
47 | **3.1.4** Compile the ResNet50 frozen graph using the provided pb2sm_compile.py script on the C5 instance. NOTE: please ensure that the Neuron Compiler is up-to-date by following the setup steps in Lab 1 Section 1.
48 | 
49 | >We optimized this model with a compile-time batch size of 5. To optimize throughput, the load test uses a runtime batch size that is a multiple of 5 (50 in this case). This step takes about 6 minutes.
50 | 
51 | ```bash
52 | time python pb2sm_compile.py
53 | ```
54 | 
55 | At the end of this step, you will see a zipped saved model `rn50_fp16_compiled_batch5.zip`, which you will need to copy to your Inf1 instance (the PEM key was set up during Lab 2 Section 3):
56 | 
57 | ```bash
58 | scp -i ~/ee-default-keypair.pem ./rn50_fp16_compiled_batch5.zip ubuntu@:~/ # Ubuntu Image default.
59 | #scp -i ~/ee-default-keypair.pem ./rn50_fp16_compiled_batch5.zip ec2-user@:~/ # if you are on an Amazon Linux 2 image.
60 | ```
61 | 
62 | ## Lab 3 Section 2: Launch a load test run on Inf1
63 | 
64 | **3.2.1** Download and unpack the ResNet50 performance package again, this time on the Inf1 instance:
65 | 
66 | ```bash
67 | wget https://reinventinf1.s3.amazonaws.com/keras_fp16_benchmarking_db.tgz
68 | ```
69 | ```bash
70 | tar -xzf keras_fp16_benchmarking_db.tgz
71 | ```
72 | ```bash
73 | cd keras_fp16_benchmarking_db
74 | ```
75 | 
76 | Unzip the saved model that was transferred from the C5 instance into the current directory:
77 | 
78 | ```bash
79 | unzip ~/rn50_fp16_compiled_batch5.zip
80 | ```
81 | 
82 | **3.2.2** Run the load test using the provided infer_resnet50_keras_loadtest.py script on the Inf1 instance (please make sure this is inf1.2xlarge):
83 | 
84 | > There are a total of 4 Neuron Cores on inf1.2xlarge. The load test runs 4 ResNet50 sessions, each bound to one Neuron Core, with 4 threads per session.
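The provided infer_resnet50_keras_loadtest.py script implements this pattern. As a rough illustration of the structure only (not the actual script), the sketch below opens one predictor per Neuron Core and drives each with several threads; the `SAVED_MODEL_DIR` path, the `input` tensor name, and the float16 input dtype are assumptions based on the earlier steps, so adjust them to match the unpacked package.

```python
# Illustrative load-test sketch -- NOT the provided infer_resnet50_keras_loadtest.py.
# Assumptions: SAVED_MODEL_DIR is the compiled SavedModel unpacked from
# rn50_fp16_compiled_batch5.zip, and it exposes an 'input' tensor that accepts
# (batch, 224, 224, 3) images.
import time
import threading
import numpy as np
import tensorflow as tf

SAVED_MODEL_DIR = 'rn50_fp16_compiled_batch5'  # hypothetical path; use the directory the unzip step created
NUM_SESSIONS = 4            # one session per Neuron Core on inf1.2xlarge
THREADS_PER_SESSION = 4     # 16 worker threads in total
NUM_LOOPS_PER_THREAD = 100
USER_BATCH_SIZE = 50        # a multiple of the compile-time batch size of 5

# Each predictor owns its own TF session; the Neuron runtime can place each one on its own Neuron Core.
predictors = [tf.contrib.predictor.from_saved_model(SAVED_MODEL_DIR)
              for _ in range(NUM_SESSIONS)]
batch = np.random.rand(USER_BATCH_SIZE, 224, 224, 3).astype('float16')  # assumed input dtype

completed_images = []
lock = threading.Lock()

def worker(predictor):
    # Each thread repeatedly pushes batches through its session.
    for _ in range(NUM_LOOPS_PER_THREAD):
        predictor({'input': batch})
        with lock:
            completed_images.append(USER_BATCH_SIZE)

threads = [threading.Thread(target=worker, args=(p,))
           for p in predictors for _ in range(THREADS_PER_SESSION)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
print('overall throughput: %.0f images/sec' % (sum(completed_images) / elapsed))
```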
85 | 86 | ```bash 87 | time python infer_resnet50_keras_loadtest.py 88 | ``` 89 | Output: 90 | 91 | ``` 92 | NUM THREADS: 16 93 | NUM_LOOPS_PER_THREAD: 100 94 | USER_BATCH_SIZE: 50 95 | current throughput: 0 images/sec 96 | current throughput: 0 images/sec 97 | current throughput: 700 images/sec 98 | current throughput: 800 images/sec 99 | current throughput: 1700 images/sec 100 | current throughput: 1800 images/sec 101 | current throughput: 1850 images/sec 102 | current throughput: 1800 images/sec 103 | current throughput: 1850 images/sec 104 | current throughput: 1700 images/sec 105 | current throughput: 1850 images/sec 106 | current throughput: 1800 images/sec 107 | current throughput: 1800 images/sec 108 | current throughput: 1800 images/sec 109 | current throughput: 1800 images/sec 110 | current throughput: 1750 images/sec 111 | current throughput: 1950 images/sec 112 | current throughput: 1750 images/sec 113 | current throughput: 1850 images/sec 114 | current throughput: 1800 images/sec 115 | current throughput: 1750 images/sec 116 | current throughput: 1800 images/sec 117 | current throughput: 1800 images/sec 118 | current throughput: 1750 images/sec 119 | current throughput: 1800 images/sec 120 | current throughput: 1800 images/sec 121 | current throughput: 1750 images/sec 122 | current throughput: 1850 images/sec 123 | current throughput: 1750 images/sec 124 | current throughput: 1800 images/sec 125 | current throughput: 1800 images/sec 126 | current throughput: 1800 images/sec 127 | current throughput: 1850 images/sec 128 | current throughput: 1800 images/sec 129 | current throughput: 1850 images/sec 130 | current throughput: 1800 images/sec 131 | current throughput: 1750 images/sec 132 | current throughput: 1800 images/sec 133 | current throughput: 1750 images/sec 134 | current throughput: 1800 images/sec 135 | current throughput: 1800 images/sec 136 | current throughput: 1750 images/sec 137 | current throughput: 1850 images/sec 138 | current throughput: 1800 images/sec 139 | current throughput: 1800 images/sec 140 | current throughput: 1900 images/sec 141 | current throughput: 1800 images/sec 142 | current throughput: 850 images/sec 143 | current throughput: 250 images/sec 144 | 145 | real 0m54.746s 146 | user 1m39.552s 147 | sys 0m7.787s 148 | 149 | ``` 150 | 151 | NOTE: If you see lower throughput, please make sure that the Inf1 instance is inf1.2xlarge. 152 | 153 | **3.2.3** While this is running you can see utilization using neuron-top tool in a separate terminal (it takes about a minute to load; also running neuron-top will lower the throughput to around 1200 images/sec): 154 | ```bash 155 | /opt/aws/neuron/bin/neuron-top 156 | ``` 157 | 158 | **Note: Please go back to home directory /home/ubuntu** 159 | 160 | ```bash 161 | cd ~/ 162 | ``` 163 | 164 | [Go To Lab 4](4.%20Profiling%20and%20Debugging.md) 165 | -------------------------------------------------------------------------------- /1. Setup Dev Env.md: -------------------------------------------------------------------------------- 1 | # Reinvent Inf1 Lab: Hands-on Deep Learning Inference with Amazon EC2 Inf1 Instance (20 min) 2 | 3 | ## Abstract: 4 | 5 | In this workshop, you gain hands-on experience with Amazon EC2 Inf1 instances, powered by custom AWS Inferentia chips. Amazon EC2 Inf1 instances offer low-latency, high-throughput, and cost-effective machine learning inference in the cloud. 
This workshop walks you through taking a trained deep learning model to deployment on Amazon EC2 Inf1 instances by using AWS Neuron, an SDK for optimizing inference using AWS Inferentia processors.
6 | 
7 | 
8 | ## Overview:
9 | 
10 | **Note: Please follow the labs in sequence.**
11 | 
12 | Lab 1. **Launch** a C5 Instance, **install** the Neuron development environment, and **custom compile** a pre-trained model to target the Inferentia Neuron Processor.
13 | 
14 | Lab 2. **Launch** an Inf1 Instance, **install** the Neuron runtime and development environment, **test** and **model serve** the compiled ResNet package.
15 | 
16 | Lab 3. **Compile and Launch** a load test on an Inferentia Instance.
17 | 
18 | Lab 4. **Debug and Profile** your model.
19 | 
20 | 
21 | ----------
22 | 
23 | # Lab 1. **Launch** a C5 Instance, **install** the Neuron development environment, and **custom compile** a pre-trained model to target the Inferentia Neuron Processor.
24 | 
25 | ## Lab 1 Section 1: Launch an EC2 instance to set up the Neuron SDK Dev Environment
26 | 
27 | A typical workflow with the Neuron SDK is to compile a trained ML model on a compute-optimized compilation server and then distribute the artifacts to Inf1 instances for execution. Select an AMI of your choice, which may be Ubuntu 16.x, Ubuntu 18.x, or Amazon Linux 2 based. To use a pre-built Deep Learning AMI with Neuron software, see the [DLAMI AWS Neuron guide](https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-inferentia.html).
28 | 
29 | [Launching a DLAMI Instance with AWS Neuron](https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-inferentia-launching.html).
30 | 
31 | **1.1.0** Set up your SSH environment. For Windows, we recommend installing MobaXterm. https://mobaxterm.mobatek.net/
32 | 
33 | If you are using the AWS Event Engine onsite at re:Invent, download your SSH key from the Event Engine dashboard: https://dashboard.eventengine.run/ Enter your unique event hash, and download the SSH key to your local laptop/machine.
34 | 
35 | 
36 | **1.1.1** Go to the EC2 console and select **Deep Learning AMI (Ubuntu 16.04) Version 26.** We will run through the installation and update process as an exercise, as we are actively updating the software tools and packages.
37 | >Please update your system frequently to ensure you are using the latest Neuron packages.
38 | 
39 | **1.1.2** Select and start an EC2 instance as your development environment.
40 | It is recommended to use a c5.4xlarge or larger. For this example we will use a **c5d.4xlarge**, which has a fast local SSD drive, 16 vCPUs, and 32 GB of RAM.
41 | 
42 | >In the future, if you would like to compile and run inference on the same machine, please select inf1.6xlarge or larger.
43 | 
44 | 
45 | **1.1.3** Choose an existing SSH key.
46 | 
47 | 
48 | **1.1.4** Install virtualenv:
49 | > It is a best practice to use a virtual environment so that you have full flexibility for package management when working with deep learning frameworks.
50 | ```bash
51 | sudo apt-get update
52 | sudo apt-get -y install virtualenv
53 | ```
54 | 
55 | Note: If you see the following errors during apt-get install, please wait a minute or so for background updates to finish and retry apt-get install:
56 | 
57 | ```bash
58 | E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
59 | E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?
60 | ```
61 | 
62 | **1.1.5** Set up a new Python 3.6 virtual environment.
63 | 64 | ```bash 65 | virtualenv --python=python3.6 test_env_p36 66 | source test_env_p36/bin/activate 67 | pip install --no-deps tensorflow_serving_api==1.15 68 | ``` 69 | > If you are ever disconnected from the instance, make sure you run this command again to get back to the correct virtual environment where you have installed all the correct packages. 70 | 71 | 72 | ```bash 73 | source test_env_p36/bin/activate 74 | ``` 75 | 76 | **1.1.6** Modify Pip configurations to point to the Neuron repository. 77 | ```bash 78 | tee $VIRTUAL_ENV/pip.conf > /dev/null <Note that Model size for Resnet is about 20+ million parameters, in FP32 format, each parameter is 4 bytes. The compiler process automatically converts them into BF16 (2 bytes per parameter), which is a much more efficient dataformat that Neuron Cores support in hardware. 101 | 102 | **1.2.1** Create a python script named `compile_resnet50.py` with the following content. You can use either nano or vi editors. 103 | 104 | >Inspect the code below to understand the steps: export savedModel, compile model. 105 | 106 | ```python 107 | import os 108 | import time 109 | import shutil 110 | import tensorflow as tf 111 | import tensorflow.neuron as tfn 112 | import tensorflow.compat.v1.keras as keras 113 | from tensorflow.keras.applications.resnet50 import ResNet50 114 | from tensorflow.keras.applications.resnet50 import preprocess_input 115 | 116 | # Create a workspace 117 | WORKSPACE = './ws_resnet50' 118 | os.makedirs(WORKSPACE, exist_ok=True) 119 | 120 | # Prepare export directory (old one removed) 121 | model_dir = os.path.join(WORKSPACE, 'resnet50') 122 | compiled_model_dir = os.path.join(WORKSPACE, 'resnet50_neuron') 123 | shutil.rmtree(model_dir, ignore_errors=True) 124 | shutil.rmtree(compiled_model_dir, ignore_errors=True) 125 | 126 | # Instantiate Keras ResNet50 model 127 | keras.backend.set_learning_phase(0) 128 | tf.keras.backend.set_image_data_format('channels_last') 129 | model = ResNet50(weights='imagenet') 130 | 131 | # Export SavedModel 132 | tf.saved_model.simple_save( 133 | session = keras.backend.get_session(), 134 | export_dir = model_dir, 135 | inputs = {'input': model.inputs[0]}, 136 | outputs = {'output': model.outputs[0]}) 137 | 138 | # Compile using Neuron 139 | #tfn.saved_model.compile(model_dir, compiled_model_dir) #default compiles to 1 neuron core. 140 | tfn.saved_model.compile(model_dir, compiled_model_dir, compiler_args =['--num-neuroncores', '4']) # compile to 4 neuron cores. 141 | 142 | # Prepare SavedModel for uploading to Inf1 instance 143 | shutil.make_archive('./resnet50_neuron', 'zip', WORKSPACE, 'resnet50_neuron') 144 | ``` 145 | 146 | > Note `compiler_args =['--num-neuroncores', '4']`, targets and optimizes the model to run on 4 Neuron cores; default is 1. Be sure to use the correct number of cores on the target Inf1 instance. 147 | 148 | 149 | **1.2.2** Run the compilation script, which will take a few minutes on c5d.4xlarge. At the end of script execution, the compiled SavedModel is zipped as resnet50_neuron.zip in local directory. This zip file will be needed in the next lab where we deploy the compiled model on an Inf1 instance. 150 | 151 | ```bash 152 | time python compile_resnet50.py 153 | ``` 154 | ``` 155 | ... 
156 | INFO:tensorflow:fusing subgraph neuron_op_d6f098c01c780733 with neuron-cc
157 | INFO:tensorflow:Number of operations in TensorFlow session: 4638
158 | INFO:tensorflow:Number of operations after tf.neuron optimizations: 556
159 | INFO:tensorflow:Number of operations placed on Neuron runtime: 554
160 | INFO:tensorflow:Successfully converted ./ws_resnet50/resnet50 to ./ws_resnet50/resnet50_neuron
161 | 
162 | real 1m54.833s
163 | user 1m46.876s
164 | sys 0m6.736s
165 | ```
166 | 
167 | >Note: The Neuron compiler performs ahead-of-time compilation. Compared to JIT (compile-on-first-inference) systems, this saves time when you deploy the model onto multiple servers. The Neuron compiler automatically handles operator fusion, scheduling, and memory management.
168 | 
169 | **1.2.3** Please keep resnet50_neuron.zip in the local directory. You will need it in the next lab. Please click the link below to go to the next document, launch an Inf1 Instance, and deploy your compiled model.
170 | 
171 | [Go To Lab 2](2.%20Deploy%20and%20model%20serve%20on%20Inf1%20Instance.md)
172 | 
--------------------------------------------------------------------------------
/2. Deploy and model serve on Inf1 Instance.md:
--------------------------------------------------------------------------------
1 | # Lab 2. Deploy and Serve Compiled Model On Inferentia Instance (30 Minutes)
2 | 
3 | **NOTE: Please complete Lab 1 before starting Lab 2 to obtain the compiled model.**
4 | 
5 | ## Lab 2 Section 1: Select Deep Learning AMI Version 26 for Ubuntu, and start an inf1.2xlarge Inf1 instance.
6 | 
7 | **2.1.1** Select Deep Learning AMI Version 26 for Ubuntu, and start an inf1.2xlarge Inf1 instance.
8 | 
9 | ## Lab 2 Section 2: Update Repo List for Pip and Apt-get (Debian)
10 | 
11 | **2.2.1** Add the Neuron Apt-get repo for Debian-based Ubuntu 16.
12 | >For Ubuntu 18, use bionic instead of xenial; we are using Ubuntu 16 for this lab.
13 | 
14 | Add the repo URL to the list.
15 | ```bash
16 | sudo tee /etc/apt/sources.list.d/neuron.list > /dev/null < /dev/null <:~/ # Ubuntu Image default.
106 | #scp -i ee-default-keypair.pem ./resnet50_neuron.zip ec2-user@:~/ # if you are on an Amazon Linux 2 image.
107 | ```
108 | **2.3.2** On the Inf1 instance, create an inference Python script named `infer_resnet50.py` with the following content:
109 | 
110 | ```python
111 | import os
112 | import time
113 | import numpy as np
114 | import tensorflow as tf
115 | from tensorflow.keras.preprocessing import image
116 | from tensorflow.keras.applications import resnet50
117 | 
118 | tf.keras.backend.set_image_data_format('channels_last')
119 | 
120 | # Create input from image
121 | img_sgl = image.load_img('kitten_small.jpg', target_size=(224, 224))
122 | img_arr = image.img_to_array(img_sgl)
123 | img_arr2 = np.expand_dims(img_arr, axis=0)
124 | img_arr3 = resnet50.preprocess_input(img_arr2)
125 | 
126 | # Load model
127 | COMPILED_MODEL_DIR = './resnet50_neuron/'
128 | predictor_inferentia = tf.contrib.predictor.from_saved_model(COMPILED_MODEL_DIR)
129 | 
130 | # Run Inference and Display results
131 | model_feed_dict={'input': img_arr3}
132 | infa_rslts = predictor_inferentia(model_feed_dict)
133 | print(resnet50.decode_predictions(infa_rslts["output"], top=5)[0])
134 | ```
135 | 
136 | **2.3.3** Unzip the model on the Inferentia instance, download the example image, and run the inference:
137 | ```bash
138 | unzip resnet50_neuron.zip
139 | ```
140 | **2.3.4** Get a sample image to test the model inference on Neuron Cores.
141 | ```bash 142 | curl -O https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg 143 | ``` 144 | **2.3.5** Run Inference Script 145 | ```bash 146 | python infer_resnet50.py 147 | ``` 148 | You should get: 149 | ```bash 150 | [('n02123045', 'tabby', 0.69945353), ('n02127052', 'lynx', 0.1215847), ('n02123159', 'tiger_cat', 0.08367486), ('n02124075', 'Egyptian_cat', 0.064890705), ('n02128757', 'snow_leopard', 0.009392076)] 151 | ``` 152 | 153 | ## Lab 2 Section 4. Neuron TensorFlow Serving 154 | 155 | TensorFlow Serving is a serving system that allows customers to scale-up inference across a network. Neuron TensorFlow Serving uses the same API as normal TensorFlow Serving. The only differences are that the saved model must be compiled for Inferentia and the entry point is a different binary named `tensorflow_model_server_neuron`. The binary is found at `/usr/local/bin/tensorflow_model_server_neuron` and is pre-installed in the DLAMI or installed with APT/YUM tensorflow-model-server-neuron package. 156 | 157 | You have installed these two packages in the previous step. 158 | 159 | The following example shows how to prepare saved model for serving and a sample inference via served model. 160 | 161 | **2.4.1** Prepare Compiled Saved Model 162 | 163 | Prepare a directory structure for TensorFlow model serving, with the previously compiled ResNet50 saved model in directory "1": 164 | 165 | ```bash 166 | mkdir -p resnet50_inf1_serve 167 | cp -rf resnet50_neuron resnet50_inf1_serve/1 168 | ``` 169 | 170 | **2.4.2** Serving Saved Model 171 | 172 | User can now serve the saved model with the tensorflow_model_server_neuron binary: 173 | 174 | ```bash 175 | tensorflow_model_server_neuron --model_name=resnet50_inf1_serve --model_base_path=$(pwd)/resnet50_inf1_serve/ --port=8500 176 | ``` 177 | 178 | The compiled model is staged in Inferentia DRAM by the server to prepare for inference. 
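Before moving on to the client step, you can optionally confirm that the server is accepting connections. The short check below is illustrative only (it is not part of the lab package) and assumes the server from step 2.4.2 is listening on localhost:8500.

```python
# Optional, illustrative readiness check -- not part of the lab package.
# Assumes tensorflow_model_server_neuron (step 2.4.2) is listening on localhost:8500
# and that the grpcio package is available in the virtualenv.
import grpc

channel = grpc.insecure_channel('localhost:8500')
try:
    # Block until the gRPC channel is ready, or give up after 15 seconds.
    grpc.channel_ready_future(channel).result(timeout=15)
    print('Model server is accepting connections on port 8500')
except grpc.FutureTimeoutError:
    print('Model server is not reachable yet -- check the serving terminal for errors')
```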
179 | 
180 | **2.4.3** Generate inference requests to the model server
181 | 
182 | In another terminal, enter the created virtualenv:
183 | 
184 | ```bash
185 | source test_env_p36/bin/activate
186 | ```
187 | 
188 | Run inferences via gRPC using the following sample client code (save it as `tfs_client.py` and run it as `python tfs_client.py`):
189 | 
190 | ```python
191 | import numpy as np
192 | import grpc
193 | import tensorflow as tf
194 | from tensorflow.keras.preprocessing import image
195 | from tensorflow.keras.applications.resnet50 import preprocess_input
196 | from tensorflow.keras.applications.resnet50 import decode_predictions
197 | from tensorflow_serving.apis import predict_pb2
198 | from tensorflow_serving.apis import prediction_service_pb2_grpc
199 | 
200 | tf.keras.backend.set_image_data_format('channels_last')
201 | 
202 | if __name__ == '__main__':
203 |     channel = grpc.insecure_channel('localhost:8500')
204 |     stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
205 |     img_file = tf.keras.utils.get_file(
206 |         "./kitten_small.jpg",
207 |         "https://raw.githubusercontent.com/awslabs/mxnet-model-server/master/docs/images/kitten_small.jpg")
208 |     img = image.load_img(img_file, target_size=(224, 224))
209 |     img_array = preprocess_input(image.img_to_array(img)[None, ...])
210 |     request = predict_pb2.PredictRequest()
211 |     request.model_spec.name = 'resnet50_inf1_serve'
212 |     request.inputs['input'].CopyFrom(
213 |         tf.contrib.util.make_tensor_proto(img_array, shape=img_array.shape))
214 |     result = stub.Predict(request)
215 |     prediction = tf.make_ndarray(result.outputs['output'])
216 |     print(decode_predictions(prediction))
217 | ```
218 | 
219 | Expected output:
220 | ```bash
221 | [[('n02123045', 'tabby', 0.69945353), ('n02127052', 'lynx', 0.1215847), ('n02123159', 'tiger_cat', 0.08367486), ('n02124075', 'Egyptian_cat', 0.064890705), ('n02128757', 'snow_leopard', 0.009392076)]]
222 | ```
223 | 
224 | **2.4.4** Cleanup
225 | 
226 | Please terminate the TensorFlow Serving process in the first terminal by pressing Ctrl-C, then run:
227 | 
228 | ```bash
229 | /opt/aws/neuron/bin/neuron-cli reset
230 | ```
231 | Expected output:
232 | ```bash
233 | (test_env_p36) ubuntu@ip-10-1-2-11:~$ /opt/aws/neuron/bin/neuron-cli reset
234 | No NCG Found
235 | ```
236 | [Go To Lab 3](3.%20benchmark%20run.md)
237 | 
--------------------------------------------------------------------------------
/4. Profiling and Debugging.md:
--------------------------------------------------------------------------------
1 | # Lab 4. Debugging and Profiling with TensorBoard-Neuron
2 | 
3 | In this lab you will practice debugging and profiling the Lab 2 inference run using TensorBoard-Neuron.
4 | 
5 | **Note: After Lab 3, please go back to the home directory /home/ubuntu**
6 | 
7 | ```bash
8 | cd ~/
9 | ```
10 | 
11 | ## Lab 4 Section 1: Installation on the Inf1 instance
12 | 
13 | **4.1.1** Install TensorFlow-Neuron on the Inf1 instance if you have not already done so. By default, TensorBoard-Neuron is installed when you install TensorFlow-Neuron. If you have already installed TensorFlow-Neuron, skip this step.
14 | 
15 | ```
16 | $ pip install tensorflow-neuron
17 | ```
18 | 
19 | **4.1.2** TensorBoard-Neuron can also be installed separately.
20 | 
21 | ```
22 | $ pip install tensorboard-neuron
23 | ```
24 | 
25 | **4.1.3** Additionally, if you would like to profile your model (see below), you will also need to have Neuron tools installed.
26 | 
27 | ```
28 | $ sudo apt install aws-neuron-tools
29 | ```
30 | 
31 | ## Lab 4 Section 2: Profile the network and collect inference traces on the Inf1 instance
32 | 
33 | When using TensorFlow-Neuron, MXNet-Neuron, or PyTorch-Neuron, raw profile data will be collected if the NEURON_PROFILE environment variable is set. The raw profile is dumped into the directory that NEURON_PROFILE points to.
34 | 
35 | **4.2.1** Set the environment variable
36 | ```
37 | mkdir -p profile
38 | export NEURON_PROFILE=profile
39 | ```
40 | 
41 | NOTE: the directory must exist before you move on to the next step. Otherwise, profile data will not be emitted.
42 | 
43 | **4.2.2** Run the inference from Lab 2 through the framework
44 | ```
45 | python infer_resnet50.py
46 | ```
47 | 
48 | ## Lab 4 Section 3: Visualizing data with TensorBoard-Neuron
49 | 
50 | In this section please use Chrome. Firefox will work except for Step 4.6.4 "Chrome trace". Don't use Safari, as it will hang during certain steps (a known problem with TensorBoard).
51 | 
52 | **4.3.1** To view data in TensorBoard-Neuron, run the command below, where “logdir” is the directory where TensorFlow logs are generated. (Note that this "logdir" is *not* the same as the NEURON_PROFILE directory that you set during inference. The inference script above does not actually produce any TensorFlow logs, so you can set this to any value. The most important thing is that for this step, the NEURON_PROFILE environment variable should still be set to the same directory you used during your inference run. `tensorboard_neuron` will process the Neuron profile data from this directory at startup.)
53 | 
54 | ```
55 | $ tensorboard_neuron --logdir ~/logs --run_neuron_profile
56 | ```
57 | 
58 | **4.3.2** By default, TensorBoard-Neuron is launched at “localhost:6006”; the URL can be changed by specifying the "--host" and "--port" options.
59 | 
60 | Now, in a browser visit [localhost:6006](http://localhost:6006/) to view the visualization, or enter the host and port if you specified them above.
61 | 
62 | For this step, you may need to set up port forwarding to your local machine:
63 | ```
64 | ssh -i ubuntu@ -L 6006:localhost:6006
65 | ```
66 | 
67 | ## Lab 4 Section 4: How to check Neuron compatibility
68 | 
69 | TensorBoard-Neuron can visualize which operators are supported on Neuron devices. All Neuron-compatible operators will run on Neuron Cores, and other operators will run on the CPU.
70 | 
71 | **4.4.1** Navigate to the "Graphs" plugin.
72 | 
73 | **4.4.2** Select “Neuron Compatibility”
74 | 
75 | In the navigation pane on the left, under the “Color” section, select “Neuron Compatibility.”
76 | 
77 | ![image](./images/3.44.08PM.png)
78 | 
79 | **4.4.3** View compatible operators
80 | 
81 | Now, the graph should be colored red and/or green. Green indicates an operator that is compatible with Neuron devices, while red indicates an operator that is currently not supported. If there are unsupported operators, all of these operators’ names will be listed under the “Incompatible Operations” section.
82 | ![image](./images/3.44.14PM.png)
83 | 
84 | ## Lab 4 Section 5: How to visualize graphs run on a Neuron device
85 | 
86 | After successfully analyzing the profiled run on a Neuron device, you can launch TensorBoard-Neuron to view the graph and see how much time each operator is taking.
87 | 
88 | **4.5.1** Navigate to the "Graphs" plugin
89 | 
90 | **4.5.2** Select the “Neuron_profile” tag
91 | 
92 | The “neuron_profile” tag contains timing information regarding the inference you profiled.
93 | ![image](./images/3.44.19PM.png)
94 | 
95 | **4.5.3** Select “Compute Time”
96 | 
97 | In the navigation pane on the left, under the “Color” section, select “Compute time.”
98 | ![image](./images/3.44.35PM.png)
99 | 
100 | **4.5.4** View time taken by various layers
101 | 
102 | This view shows the time taken by each layer, colored according to how much relative time the layer took to compute. A lighter shade of red means that a relatively small portion of compute time was spent in this layer, while a darker red shows that more compute time was used. Some layers may also be blank, which indicates that these layers may have been optimized out to improve inference performance. Clicking on a node will show the compute time, if available.
103 | ![image](./images/3.49.20PM.png)
104 | 
105 | ## Lab 4 Section 6: How to view a detailed profile using the Neuron Profile plugin
106 | 
107 | To get a better understanding of the profile, you can check out the Neuron Profile plugin. Here, you will find more information on the inference, including an overview, a list of the most time-consuming operators (op profile tool), and an execution timeline view (Chrome trace).
108 | 
109 | **4.6.1** Select the “Neuron Profile” plugin
110 | 
111 | On the navigation bar at the top of the page, there will be a list of active plugins. In this case, you will need to use the “Neuron Profile” plugin. The plugin may take a while to register on first load. If this tab does not show initially, please refresh the page.
112 | 
113 | ![image](./images/3.43.45PM.png)
114 | 
115 | **4.6.2** The Profile Overview
116 | 
117 | The first page you will land on in the Neuron Profile plugin is the overview page. It contains a variety of information regarding the inference.
118 | ![image](./images/3.56.54PM.png)
119 | In the “Performance Summary” section, you will see execution stats, such as the total execution time, the average layer execution time, and the utilization of NeuronMatrix Units.
120 | 
121 | The “Neuron Time Graph” shows how long a portion of the graph (a NeuronOp) took to execute.
122 | 
123 | The “Top TensorFlow operations executed on Neuron Cores” section gives a quick summary of the most time-consuming operators that were executed on the device.
124 | 
125 | “Run Environment” shows information on the devices used during this inference.
126 | 
127 | Finally, the “Recommendation for Next Steps” section gives helpful pointers to places where you can learn more about what to do next.
128 | 
129 | **4.6.3** The Operator Profile
130 | 
131 | In the “Tools” dropdown menu, select “op_profile.”
132 | 
133 | The “op profile” tool displays the percentage of overall time taken for each operator, sorted with the most expensive operators at the top. It gives a better understanding of where the bottlenecks in a model may be.
134 | ![image](./images/3.57.19PM.png)
135 | 
136 | **4.6.4** Chrome trace
137 | 
138 | In the “Tools” dropdown menu, select “trace_viewer.”
139 | 
140 | For developers wanting to better understand the timeline of the inference, the Chrome trace view is the tool for you. It shows the history of execution organized by the operator names.
141 | 
142 | Please note that this tool can only be used in Chrome browsers.
143 | ![image](./images/3.59.37PM.png) 144 | 145 | ## Lab 4 Section 7: How to debug an inference 146 | 147 | **4.7.1** Launch TensorBoard-Neuron and navigate to the webpage 148 | 149 | To use the Debugger plugin, you will need to launch with an extra flag: 150 | 151 | ``` 152 | $ tensorboard_neuron --logdir ~/logs --debugger_port 7000 153 | ``` 154 | 155 | where 7000 can be any desired port number. 156 | 157 | **4.7.2** Modify and run your inference script 158 | 159 | In order to run the inference in “debug mode,” you must use TensorFlow’s debug wrapper. The following lines will need to be added to your infer_resnet50.py script from Lab 2. 160 | ```python 161 | ... 162 | DEBUG_SERVER_ADDRESS = 'localhost:7000' 163 | ... 164 | predictor_inferentia._session = tf_debug.TensorBoardDebugWrapperSession( 165 | predictor_inferentia._session, DEBUG_SERVER_ADDRESS) 166 | ``` 167 | 168 | The full modified infer_resnet50.py script is: 169 | 170 | ```python 171 | import os 172 | import time 173 | import numpy as np 174 | import tensorflow as tf 175 | from tensorflow.keras.preprocessing import image 176 | from tensorflow.keras.applications import resnet50 177 | from tensorflow.python import debug as tf_debug 178 | 179 | tf.keras.backend.set_image_data_format('channels_last') 180 | 181 | # The port must be the same as the one used for --debugger_port above 182 | # in this example, PORT is 7000 183 | DEBUG_SERVER_ADDRESS = 'localhost:7000' 184 | 185 | # Create input from image 186 | img_sgl = image.load_img('kitten_small.jpg', target_size=(224, 224)) 187 | img_arr = image.img_to_array(img_sgl) 188 | img_arr2 = np.expand_dims(img_arr, axis=0) 189 | img_arr3 = resnet50.preprocess_input(img_arr2) 190 | 191 | # Load model 192 | COMPILED_MODEL_DIR = './resnet50_neuron/' 193 | predictor_inferentia = tf.contrib.predictor.from_saved_model(COMPILED_MODEL_DIR) 194 | 195 | # Use the debug wrapper for TensorBoard 196 | predictor_inferentia._session = tf_debug.TensorBoardDebugWrapperSession( 197 | predictor_inferentia._session, DEBUG_SERVER_ADDRESS) 198 | 199 | # Run Inference and Display results 200 | model_feed_dict={'input': img_arr3} 201 | infa_rslts = predictor_inferentia(model_feed_dict) 202 | print(resnet50.decode_predictions(infa_rslts["output"], top=5)[0]) 203 | ``` 204 | 205 | After adding these modifications, run the script to begin inference. The execution will be paused before any calculation starts. 206 | 207 | **4.7.3** Select the “debugger” plugin 208 | 209 | On the navigation bar at the top of the page, there will be a list of active plugins. In this case, you will need to use the “Debugger” plugin. 210 | ![image](./images/5.05.06PM.png) 211 | 212 | **4.7.4** Enable watchpoints 213 | 214 | In the “Runtime Node List” on the left, there will be a list of operators and a checkbox next to each. Select all of the operators that you would like the view the tensor output of. 215 | ![image](./images/10.45.32AM.png) 216 | 217 | **4.7.5** execute inference 218 | 219 | On the bottom left of the page, there will be a “Continue...” button that will resume the inference execution. As the graph is executed, output tensors will be saved for later viewing. 220 | ![image](./images/10.46.14AM.png) 221 | 222 | **4.7.6** View tensors 223 | 224 | At the bottom of the page, there will be a“Tensor Value Overview” section that shows a summary of all the output tensors that were selected as watchpoints in Step 4.7.4. 
225 | ![image](./images/10.47.32AM.png) To view more specific information on a tensor, you can click on a tensor’s value. You may also hover over the bar in the “Health Pill” column for a more detailed summary of values.
226 | ![image](./images/10.48.15AM.png)
227 | 
--------------------------------------------------------------------------------