├── LICENSE
├── README.md
├── imgs
    ├── 2022_01.jpg
    ├── _108240741_beatles-abbeyroad-square-reuters-applecorps.jpg
    ├── add256.jpg
    ├── add512.jpg
    ├── add768.jpg
    ├── addnormal.jpg
    ├── addtensorrt_FP16.jpg
    ├── addtensorrt_FP32.jpg
    ├── addtensorrt_INT8.jpg
    ├── mask256.jpg
    ├── mask512.jpg
    ├── mask768.jpg
    ├── masknormal.jpg
    ├── masktensorrt_FP16.jpg
    ├── masktensorrt_FP32.jpg
    └── masktensorrt_INT8.jpg
├── inference_FP32_vs_FP16.ipynb
├── inference_batch_vs_imsize.ipynb
├── inference_classification.ipynb
├── inference_dev.ipynb
├── inference_segmentation.ipynb
├── inference_segmentation_demo.ipynb
├── inference_tensorrt.py
├── results
    ├── batch.csv
    ├── fp16.csv
    ├── imsize.csv
    ├── jetsonano.txt
    ├── xavier.csv
    └── xavier_segmentation.csv
└── utils
    ├── __pycache__
        └── fp16.cpython-36.pyc
    └── fp16.py

/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2020 Kentaro Yoshioka
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Benchmark-FP32-FP16-INT8-with-TensorRT
2 | Benchmark the inference speed of CNNs with various quantization methods using TensorRT!
3 | 
4 | :star: if it helps you.
5 | 
6 | # Image classification
7 | 
8 | Run:
9 | `inference_tensorrt.py`
10 | 
11 | ## Hardware: Jetson Nano
12 | TRT denotes a model compiled with TensorRT at the given precision.
13 | 
14 | Latency of image inference (1,3,256,256) [ms]
15 | 
16 | | | TRT FP32 | TRT FP16 | TRT INT8 |
17 | |:--------:|------|:----:|------|
18 | | resnet18 | 26 | 18 | |
19 | | resnet34 | 48 | 30 | |
20 | | resnet50 | 79 | 42 | |
21 | 
22 | Jetson Nano does not support INT8.
23 | 
24 | ## Hardware: Jetson Xavier
25 | 
26 | TRT denotes a model compiled with TensorRT at the given precision.
27 | 
28 | Latency of image inference (1,3,256,256) [ms]
29 | 
30 | | | resnet18 | resnet34 | resnet50 |
31 | |------|----------|----------|----------|
32 | | PytorchRaw | 11 | 12 | 16 |
33 | | TRT FP32 | 3.8 | 5.6 | 9.9 |
34 | | TRT FP16 | 2.1 | 3.3 | 4.4 |
35 | | TRT INT8 | 1.7 | 2.7 | 3.0 |
36 | 
37 | # Image segmentation
38 | ![beatles](imgs/addtensorrt_FP32.jpg)
39 | ## Hardware: Jetson Xavier
40 | 
41 | TRT denotes a model compiled with TensorRT at the given precision.
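The notebooks in this repository time inference by running the model repeatedly, discarding the first iteration, and synchronizing the GPU before stopping the clock. The following is a minimal sketch of that measurement pattern, reporting milliseconds like the tables in this README. It is an illustration only and not the repository's own code: the names `mean_latency_ms`, `fn`, `n_warmup` and `sync` are hypothetical, and the stand-in workload is plain Python so the sketch runs without a GPU.

```python
import time
from statistics import mean


def mean_latency_ms(fn, n_iters=200, n_warmup=1, sync=None):
    """Return the mean wall-clock latency of fn() in milliseconds.

    fn:       zero-argument callable that runs one inference (hypothetical)
    n_warmup: leading iterations that are executed but not timed
    sync:     optional callable that blocks until the device is idle,
              for example torch.cuda.synchronize when timing GPU work
    """
    samples = []
    for i in range(n_warmup + n_iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # GPU work is asynchronous, so wait before reading the clock
        elapsed = time.perf_counter() - start
        if i >= n_warmup:
            samples.append(elapsed * 1000.0)  # seconds to milliseconds
    return mean(samples)


if __name__ == "__main__":
    # Stand-in workload (an assumption for illustration, not a real model).
    latency = mean_latency_ms(lambda: sum(range(10000)), n_iters=50)
    print("mean latency: {:.3f} ms".format(latency))
```

With a PyTorch model on the GPU, one would presumably pass something like `fn=lambda: model(inputs)` inside `torch.no_grad()` and `sync=torch.cuda.synchronize`; without the synchronization the timer may stop before the GPU has finished.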
42 | 
43 | Latency of image inference (1,3,512,512) [ms]
44 | 
45 | | | fcn_resnet50 | fcn_resnet101 | deeplabv3_resnet50 | deeplabv3_resnet101 |
46 | |------|--------------|---------------|--------------------|---------------------|
47 | | PytorchRaw | 200 | 344 | 281 | 426 |
48 | | TRT FP32 | 173 | 290 | 252 | 366 |
49 | | TRT FP16 | 36 | 57 | 130 | 151 |
50 | | TRT INT8 | 21 | 32 | 97 | 108 |
51 | 
52 | ## Hardware: Jetson Nano
53 | 
54 | Latency of image inference (1,3,256,256) [ms]
55 | 
56 | | | fcn_resnet50 |
57 | |------|--------------|
58 | | PytorchRaw | 6800 |
59 | | TRT FP32 | 767 |
60 | | TRT FP16 | 40 |
61 | | TRT INT8 | NA |
62 | 
63 | # Hardware setup
64 | Setting up the hardware can be tricky.
65 | 
66 | * Install PyTorch
67 | 
68 | https://forums.developer.nvidia.com/t/pytorch-for-jetson-nano-version-1-4-0-now-available/72048
69 | 
70 | **The stable version for Jetson Nano seems to be torch==1.1**
71 | 
72 | **For Xavier, torch==1.3 worked fine for me.**
73 | 
74 | * Install torchvision
75 | 
76 | I followed these instructions and installed torchvision==0.3.0:
77 | 
78 | https://medium.com/hackers-terminal/installing-pytorch-torchvision-on-nvidias-jetson-tx2-81591d03ce32
79 | 
80 | ```bash
81 | sudo apt-get install libjpeg-dev zlib1g-dev
82 | git clone -b v0.3.0 https://github.com/pytorch/vision torchvision
83 | cd torchvision
84 | sudo python3 setup.py install
85 | ```
86 | 
87 | * Install torch2trt
88 | 
89 | I followed the torch2trt README:
90 | 91 | https://github.com/NVIDIA-AI-IOT/torch2trt 92 | 93 | ```bash 94 | sudo apt-get install libprotobuf* protobuf-compiler ninja-build 95 | git clone https://github.com/NVIDIA-AI-IOT/torch2trt 96 | cd torch2trt 97 | sudo python3 setup.py install --plugins 98 | ``` 99 | -------------------------------------------------------------------------------- /imgs/2022_01.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/2022_01.jpg -------------------------------------------------------------------------------- /imgs/_108240741_beatles-abbeyroad-square-reuters-applecorps.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/_108240741_beatles-abbeyroad-square-reuters-applecorps.jpg -------------------------------------------------------------------------------- /imgs/add256.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/add256.jpg -------------------------------------------------------------------------------- /imgs/add512.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/add512.jpg -------------------------------------------------------------------------------- /imgs/add768.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/add768.jpg 
-------------------------------------------------------------------------------- /imgs/addnormal.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/addnormal.jpg -------------------------------------------------------------------------------- /imgs/addtensorrt_FP16.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/addtensorrt_FP16.jpg -------------------------------------------------------------------------------- /imgs/addtensorrt_FP32.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/addtensorrt_FP32.jpg -------------------------------------------------------------------------------- /imgs/addtensorrt_INT8.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/addtensorrt_INT8.jpg -------------------------------------------------------------------------------- /imgs/mask256.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/mask256.jpg -------------------------------------------------------------------------------- /imgs/mask512.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/mask512.jpg 
--------------------------------------------------------------------------------
/imgs/mask768.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/mask768.jpg

--------------------------------------------------------------------------------
/imgs/masknormal.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/masknormal.jpg

--------------------------------------------------------------------------------
/imgs/masktensorrt_FP16.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/masktensorrt_FP16.jpg

--------------------------------------------------------------------------------
/imgs/masktensorrt_FP32.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/masktensorrt_FP32.jpg

--------------------------------------------------------------------------------
/imgs/masktensorrt_INT8.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/masktensorrt_INT8.jpg

--------------------------------------------------------------------------------
/inference_FP32_vs_FP16.ipynb:
--------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 |    "cell_type": "code",
5 |    "execution_count": 8,
6 |    "metadata": {},
7 |    "outputs": [],
8 |    "source": [
9 |     "import numpy as np\n",
10 |     "import torch\n",
11 |     "import time\n",
12 |     "from torchvision.models import *\n",
13 |     "import pandas as pd\n",
14 |     "import os\n",
15 |     "from apex import amp"
16 |    ]
17 |   },
18 |   {
19 |    "cell_type": "code",
20 |    "execution_count": 9,
21 |    "metadata": {},
22 |    "outputs": [],
23 |    "source": [
24 |     "# create the results directory\n",
25 |     "os.makedirs(\"results\", exist_ok=True)"
26 |    ]
27 |   },
28 |   {
29 |    "cell_type": "code",
30 |    "execution_count": 10,
31 |    "metadata": {},
32 |    "outputs": [],
33 |    "source": [
34 |     "def computeTime(model, input_size=[1, 3, 224, 224], device='cuda', FP16=False):\n",
35 |     "    inputs = torch.randn(input_size)\n",
36 |     "    if device == 'cuda':\n",
37 |     "        model = model.cuda()\n",
38 |     "        inputs = inputs.cuda()\n",
39 |     "    if FP16:\n",
40 |     "        model = model.half()\n",
41 |     "        inputs = inputs.half()\n",
42 |     "\n",
43 |     "    model.eval()\n",
44 |     "\n",
45 |     "    i = 0\n",
46 |     "    time_spent = []\n",
47 |     "    while i < 200:\n",
48 |     "        start_time = time.time()\n",
49 |     "        with torch.no_grad():\n",
50 |     "            _ = model(inputs)\n",
51 |     "\n",
52 |     "        if device == 'cuda':\n",
53 |     "            torch.cuda.synchronize() # wait for cuda to finish (cuda is asynchronous!)\n",
54 |     "        if i != 0:\n",
55 |     "            time_spent.append(time.time() - start_time)\n",
56 |     "        i += 1\n",
57 |     "    print('Avg execution time (s): {:.3f}'.format(np.mean(time_spent)))\n",
58 |     "    return np.mean(time_spent)"
59 |    ]
60 |   },
61 |   {
62 |    "cell_type": "code",
63 |    "execution_count": 11,
64 |    "metadata": {},
65 |    "outputs": [],
66 |    "source": [
67 |     "modellist = [\"resnet18\", \"resnet34\", \"resnet50\", \"resnet101\", \"resnet152\", \\\n",
68 |     "             \"resnext50_32x4d\", \"resnext101_32x8d\", \"mnasnet1_0\", \"squeezenet1_0\", \"densenet121\", \"densenet169\", \"inception_v3\"]\n",
69 |     "\n",
70 |     "# resnet is enough for now\n",
71 |     "modellist = [\"resnet18\", \"resnet34\", \"resnet50\"]"
72 |    ]
73 |   },
74 |   {
75 |    "cell_type": "code",
76 |    "execution_count": 17,
77 |    "metadata": {},
78
| "outputs": [ 79 | { 80 | "name": "stdout", 81 | "output_type": "stream", 82 | "text": [ 83 | "model: resnet18\n", 84 | "Looks ok!\n" 85 | ] 86 | } 87 | ], 88 | "source": [ 89 | "# test amp\n", 90 | "model_name = \"resnet18\"\n", 91 | "print(\"model: {}\".format(model_name))\n", 92 | "mdl = globals()[model_name]\n", 93 | "model = mdl().to(\"cuda\")\n", 94 | "model = amp.initialize(model, opt_level=opt_level)\n", 95 | "print(\"Looks ok!\")" 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": 20, 101 | "metadata": { 102 | "scrolled": false 103 | }, 104 | "outputs": [ 105 | { 106 | "name": "stdout", 107 | "output_type": "stream", 108 | "text": [ 109 | "model: resnet18\n", 110 | "Avg execution time (ms): 0.010\n", 111 | "Avg execution time (ms): 0.009\n", 112 | "Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.\n", 113 | "\n", 114 | "Defaults for this optimization level are:\n", 115 | "enabled : True\n", 116 | "opt_level : O1\n", 117 | "cast_model_type : None\n", 118 | "patch_torch_functions : True\n", 119 | "keep_batchnorm_fp32 : None\n", 120 | "master_weights : None\n", 121 | "loss_scale : dynamic\n", 122 | "Processing user overrides (additional kwargs that are not None)...\n", 123 | "After processing overrides, optimization options are:\n", 124 | "enabled : True\n", 125 | "opt_level : O1\n", 126 | "cast_model_type : None\n", 127 | "patch_torch_functions : True\n", 128 | "keep_batchnorm_fp32 : None\n", 129 | "master_weights : None\n", 130 | "loss_scale : dynamic\n", 131 | "Avg execution time (ms): 0.009\n", 132 | "model: resnet34\n", 133 | "Avg execution time (ms): 0.017\n", 134 | "Avg execution time (ms): 0.017\n", 135 | "Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.\n", 136 | "\n", 137 | "Defaults for this optimization level are:\n", 138 | "enabled : True\n", 139 | "opt_level : O1\n", 140 | "cast_model_type : None\n", 141 | 
"patch_torch_functions : True\n", 142 | "keep_batchnorm_fp32 : None\n", 143 | "master_weights : None\n", 144 | "loss_scale : dynamic\n", 145 | "Processing user overrides (additional kwargs that are not None)...\n", 146 | "After processing overrides, optimization options are:\n", 147 | "enabled : True\n", 148 | "opt_level : O1\n", 149 | "cast_model_type : None\n", 150 | "patch_torch_functions : True\n", 151 | "keep_batchnorm_fp32 : None\n", 152 | "master_weights : None\n", 153 | "loss_scale : dynamic\n", 154 | "Avg execution time (ms): 0.017\n", 155 | "model: resnet50\n", 156 | "Avg execution time (ms): 0.019\n", 157 | "Avg execution time (ms): 0.019\n", 158 | "Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.\n", 159 | "\n", 160 | "Defaults for this optimization level are:\n", 161 | "enabled : True\n", 162 | "opt_level : O1\n", 163 | "cast_model_type : None\n", 164 | "patch_torch_functions : True\n", 165 | "keep_batchnorm_fp32 : None\n", 166 | "master_weights : None\n", 167 | "loss_scale : dynamic\n", 168 | "Processing user overrides (additional kwargs that are not None)...\n", 169 | "After processing overrides, optimization options are:\n", 170 | "enabled : True\n", 171 | "opt_level : O1\n", 172 | "cast_model_type : None\n", 173 | "patch_torch_functions : True\n", 174 | "keep_batchnorm_fp32 : None\n", 175 | "master_weights : None\n", 176 | "loss_scale : dynamic\n", 177 | "Avg execution time (ms): 0.020\n" 178 | ] 179 | } 180 | ], 181 | "source": [ 182 | "for i, model_name in enumerate(modellist):\n", 183 | "\n", 184 | " runtimes = []\n", 185 | " \n", 186 | " # define model\n", 187 | " print(\"model: {}\".format(model_name))\n", 188 | " mdl = globals()[model_name]\n", 189 | " model = mdl()\n", 190 | " \n", 191 | " # Run FP32\n", 192 | " runtimes.append(computeTime(model, input_size=[1, 3, 256, 256], device=\"cuda\", FP16=False))\n", 193 | " # Run FP16\n", 194 | " runtimes.append(computeTime(model, input_size=[1, 3, 
256, 256], device=\"cuda\", FP16=True))\n",
195 |     "    \n",
196 |     "    # Amp Initialization\n",
197 |     "    opt_level = 'O1' # O1: mixed precision (Apex inserts automatic casts)\n",
198 |     "    mdl = globals()[model_name]\n",
199 |     "    model = mdl().to(\"cuda\")\n",
200 |     "    model = amp.initialize(model, opt_level=opt_level)\n",
201 |     "    \n",
202 |     "    # Run FP16 via Apex AMP (inputs stay FP32; Apex handles the casts)\n",
203 |     "    runtimes.append(computeTime(model, input_size=[1, 3, 256, 256], device=\"cuda\", FP16=False))\n",
204 |     "    \n",
205 |     "    if i == 0:\n",
206 |     "        df = pd.DataFrame({model_name: runtimes},\n",
207 |     "                         index = [\"FP32\", \"FP16_torch\", \"FP16_apex\"])\n",
208 |     "    else:\n",
209 |     "        df[model_name] = runtimes\n",
210 |     "    "
211 |    ]
212 |   },
213 |   {
214 |    "cell_type": "code",
215 |    "execution_count": 21,
216 |    "metadata": {},
217 |    "outputs": [
218 |     {
219 |      "data": {
220 |       "text/html": [
221 |        "
\n", 222 | "\n", 235 | "\n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | "
            resnet18  resnet34  resnet50
FP32        0.009891  0.016940  0.019452
FP16_torch  0.009034  0.017017  0.019448
FP16_apex   0.009057  0.016978  0.020046
\n", 265 | "
" 266 | ], 267 | "text/plain": [ 268 | " resnet18 resnet34 resnet50\n", 269 | "FP32 0.009891 0.016940 0.019452\n", 270 | "FP16_torch 0.009034 0.017017 0.019448\n", 271 | "FP16_apex 0.009057 0.016978 0.020046" 272 | ] 273 | }, 274 | "execution_count": 21, 275 | "metadata": {}, 276 | "output_type": "execute_result" 277 | } 278 | ], 279 | "source": [ 280 | "df.to_csv(\"results/fp16.csv\")\n", 281 | "df" 282 | ] 283 | }, 284 | { 285 | "cell_type": "code", 286 | "execution_count": null, 287 | "metadata": {}, 288 | "outputs": [], 289 | "source": [] 290 | } 291 | ], 292 | "metadata": { 293 | "kernelspec": { 294 | "display_name": "Python 3", 295 | "language": "python", 296 | "name": "python3" 297 | }, 298 | "language_info": { 299 | "codemirror_mode": { 300 | "name": "ipython", 301 | "version": 3 302 | }, 303 | "file_extension": ".py", 304 | "mimetype": "text/x-python", 305 | "name": "python", 306 | "nbconvert_exporter": "python", 307 | "pygments_lexer": "ipython3", 308 | "version": "3.6.5" 309 | } 310 | }, 311 | "nbformat": 4, 312 | "nbformat_minor": 2 313 | } 314 | -------------------------------------------------------------------------------- /inference_batch_vs_imsize.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 8, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import numpy as np\n", 10 | "import torch\n", 11 | "import time\n", 12 | "from torchvision.models import *\n", 13 | "import pandas as pd\n", 14 | "import os" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 9, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "# make models from str\n", 24 | "model_name = \"resnet18\"\n", 25 | "# make results\n", 26 | "os.makedirs(\"results\", exist_ok=True)" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 3, 32 | "metadata": {}, 33 | "outputs": [], 34 | "source": [ 35 | "def 
computeTime(model, input_size=[1, 3, 224, 224], device='cuda', FP16=False):\n",
36 |     "    inputs = torch.randn(input_size)\n",
37 |     "    if device == 'cuda':\n",
38 |     "        model = model.cuda()\n",
39 |     "        inputs = inputs.cuda()\n",
40 |     "    if FP16:\n",
41 |     "        model = model.half()\n",
42 |     "        inputs = inputs.half()\n",
43 |     "\n",
44 |     "    model.eval()\n",
45 |     "\n",
46 |     "    i = 0\n",
47 |     "    time_spent = []\n",
48 |     "    while i < 200:\n",
49 |     "        start_time = time.time()\n",
50 |     "        with torch.no_grad():\n",
51 |     "            _ = model(inputs)\n",
52 |     "\n",
53 |     "        if device == 'cuda':\n",
54 |     "            torch.cuda.synchronize() # wait for cuda to finish (cuda is asynchronous!)\n",
55 |     "        if i != 0:\n",
56 |     "            time_spent.append(time.time() - start_time)\n",
57 |     "        i += 1\n",
58 |     "    print('Avg execution time (s): {:.3f}'.format(np.mean(time_spent)))\n",
59 |     "    return np.mean(time_spent)"
60 |    ]
61 |   },
62 |   {
63 |    "cell_type": "code",
64 |    "execution_count": 4,
65 |    "metadata": {},
66 |    "outputs": [],
67 |    "source": [
68 |     "modellist = [\"resnet18\", \"resnet34\", \"resnet50\", \"resnet101\", \"resnet152\", \\\n",
69 |     "             \"resnext50_32x4d\", \"resnext101_32x8d\", \"mnasnet1_0\", \"squeezenet1_0\", \"densenet121\", \"densenet169\", \"inception_v3\"]\n",
70 |     "\n",
71 |     "# resnet is enough for now\n",
72 |     "modellist = [\"resnet18\", \"resnet34\", \"resnet50\"]"
73 |    ]
74 |   },
75 |   {
76 |    "cell_type": "code",
77 |    "execution_count": 5,
78 |    "metadata": {
79 |     "scrolled": false
80 |    },
81 |    "outputs": [
82 |     {
83 |      "name": "stdout",
84 |      "output_type": "stream",
85 |      "text": [
86 |       "model: resnet18\n",
87 |       "Avg execution time (ms): 0.009\n",
88 |       "Avg execution time (ms): 0.019\n",
89 |       "Avg execution time (ms): 0.047\n",
90 |       "Avg execution time (ms): 0.040\n",
91 |       "Avg execution time (ms): 0.091\n",
92 |       "Avg execution time (ms): 0.008\n",
93 |       "Avg execution time (ms): 0.008\n",
94 |       "Avg execution time (ms): 0.020\n",
95 |       "Avg execution time (ms): 0.072\n",
96 |       "model: resnet34\n",
97 |       "Avg
execution time (ms): 0.015\n", 98 | "Avg execution time (ms): 0.037\n", 99 | "Avg execution time (ms): 0.079\n", 100 | "Avg execution time (ms): 0.066\n", 101 | "Avg execution time (ms): 0.146\n", 102 | "Avg execution time (ms): 0.014\n", 103 | "Avg execution time (ms): 0.015\n", 104 | "Avg execution time (ms): 0.038\n", 105 | "Avg execution time (ms): 0.135\n", 106 | "model: resnet50\n", 107 | "Avg execution time (ms): 0.018\n", 108 | "Avg execution time (ms): 0.045\n", 109 | "Avg execution time (ms): 0.093\n", 110 | "Avg execution time (ms): 0.137\n", 111 | "Avg execution time (ms): 0.279\n", 112 | "Avg execution time (ms): 0.014\n", 113 | "Avg execution time (ms): 0.018\n", 114 | "Avg execution time (ms): 0.048\n", 115 | "Avg execution time (ms): 0.179\n" 116 | ] 117 | } 118 | ], 119 | "source": [ 120 | "batchlist = [1, 4, 8, 16, 32]\n", 121 | "imsize = [128, 256, 512, 1024]\n", 122 | "\n", 123 | "for i, model_name in enumerate(modellist):\n", 124 | "\n", 125 | " runtimes = []\n", 126 | " \n", 127 | " # define model\n", 128 | " print(\"model: {}\".format(model_name))\n", 129 | " mdl = globals()[model_name]\n", 130 | " model = mdl()\n", 131 | " \n", 132 | " for batch in batchlist: \n", 133 | " runtimes.append(computeTime(model, input_size=[batch, 3, 256, 256], device=\"cuda\", FP16=False)/batch)\n", 134 | "\n", 135 | " if i == 0:\n", 136 | " dfbatch = pd.DataFrame({model_name: runtimes},\n", 137 | " index = batchlist)\n", 138 | " else:\n", 139 | " dfbatch[model_name] = runtimes\n", 140 | " \n", 141 | " runtimes = []\n", 142 | " for isize in imsize:\n", 143 | " runtimes.append(computeTime(model, input_size=[1, 3, isize, isize], device=\"cuda\", FP16=False))\n", 144 | "\n", 145 | " if i == 0:\n", 146 | " dfimsize = pd.DataFrame({model_name: runtimes},\n", 147 | " index = imsize)\n", 148 | " else:\n", 149 | " dfimsize[model_name] = runtimes" 150 | ] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "execution_count": 13, 155 | "metadata": {}, 156 | "outputs": [ 157 
| { 158 | "data": { 159 | "text/html": [ 160 | "
\n", 161 | "\n", 174 | "\n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | "
    resnet18  resnet34  resnet50
1   0.008542  0.014813  0.017847
4   0.004840  0.009199  0.011361
8   0.005878  0.009894  0.011598
16  0.002495  0.004145  0.008592
32  0.002836  0.004564  0.008728
\n", 216 | "
" 217 | ], 218 | "text/plain": [ 219 | " resnet18 resnet34 resnet50\n", 220 | "1 0.008542 0.014813 0.017847\n", 221 | "4 0.004840 0.009199 0.011361\n", 222 | "8 0.005878 0.009894 0.011598\n", 223 | "16 0.002495 0.004145 0.008592\n", 224 | "32 0.002836 0.004564 0.008728" 225 | ] 226 | }, 227 | "execution_count": 13, 228 | "metadata": {}, 229 | "output_type": "execute_result" 230 | } 231 | ], 232 | "source": [ 233 | "dfbatch.to_csv(\"results/batch.csv\")\n", 234 | "dfimsize.to_csv(\"results/imsize.csv\")\n", 235 | "dfbatch" 236 | ] 237 | }, 238 | { 239 | "cell_type": "code", 240 | "execution_count": 14, 241 | "metadata": {}, 242 | "outputs": [ 243 | { 244 | "data": { 245 | "text/html": [ 246 | "
\n", 247 | "\n", 260 | "\n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | " \n", 294 | " \n", 295 | "
      resnet18  resnet34  resnet50
128   0.007869  0.014414  0.014149
256   0.007717  0.014684  0.017656
512   0.020139  0.037699  0.047882
1024  0.071687  0.134825  0.179115
\n", 296 | "
" 297 | ], 298 | "text/plain": [ 299 | " resnet18 resnet34 resnet50\n", 300 | "128 0.007869 0.014414 0.014149\n", 301 | "256 0.007717 0.014684 0.017656\n", 302 | "512 0.020139 0.037699 0.047882\n", 303 | "1024 0.071687 0.134825 0.179115" 304 | ] 305 | }, 306 | "execution_count": 14, 307 | "metadata": {}, 308 | "output_type": "execute_result" 309 | } 310 | ], 311 | "source": [ 312 | "dfimsize" 313 | ] 314 | }, 315 | { 316 | "cell_type": "code", 317 | "execution_count": 15, 318 | "metadata": {}, 319 | "outputs": [ 320 | { 321 | "ename": "ModuleNotFoundError", 322 | "evalue": "No module named 'apex'", 323 | "output_type": "error", 324 | "traceback": [ 325 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 326 | "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)", 327 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32mimport\u001b[0m \u001b[0mapex\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 328 | "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'apex'" 329 | ] 330 | } 331 | ], 332 | "source": [ 333 | "import " 334 | ] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": null, 339 | "metadata": {}, 340 | "outputs": [], 341 | "source": [] 342 | } 343 | ], 344 | "metadata": { 345 | "kernelspec": { 346 | "display_name": "Python 3", 347 | "language": "python", 348 | "name": "python3" 349 | }, 350 | "language_info": { 351 | "codemirror_mode": { 352 | "name": "ipython", 353 | "version": 3 354 | }, 355 | "file_extension": ".py", 356 | "mimetype": "text/x-python", 357 | "name": "python", 358 | "nbconvert_exporter": "python", 359 | "pygments_lexer": "ipython3", 360 | "version": "3.6.5" 361 | } 362 | }, 363 | "nbformat": 4, 364 | "nbformat_minor": 2 365 | } 366 | -------------------------------------------------------------------------------- /inference_classification.ipynb: 
--------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 |    "cell_type": "code",
5 |    "execution_count": 9,
6 |    "metadata": {},
7 |    "outputs": [],
8 |    "source": [
9 |     "import numpy as np\n",
10 |     "import torch\n",
11 |     "import time\n",
12 |     "from torchvision.models import *\n",
13 |     "import pandas as pd"
14 |    ]
15 |   },
16 |   {
17 |    "cell_type": "code",
18 |    "execution_count": 2,
19 |    "metadata": {},
20 |    "outputs": [],
21 |    "source": [
22 |     "# make models from str\n",
23 |     "model_name = \"resnet18\""
24 |    ]
25 |   },
26 |   {
27 |    "cell_type": "code",
28 |    "execution_count": 7,
29 |    "metadata": {},
30 |    "outputs": [],
31 |    "source": [
32 |     "def computeTime(model, input_size=[1, 3, 224, 224], device='cuda', FP16=False):\n",
33 |     "    inputs = torch.randn(input_size)\n",
34 |     "    if device == 'cuda':\n",
35 |     "        model = model.cuda()\n",
36 |     "        inputs = inputs.cuda()\n",
37 |     "    if FP16:\n",
38 |     "        model = model.half()\n",
39 |     "        inputs = inputs.half()\n",
40 |     "\n",
41 |     "    model.eval()\n",
42 |     "\n",
43 |     "    i = 0\n",
44 |     "    time_spent = []\n",
45 |     "    while i < 200:\n",
46 |     "        start_time = time.time()\n",
47 |     "        with torch.no_grad():\n",
48 |     "            _ = model(inputs)\n",
49 |     "\n",
50 |     "        if device == 'cuda':\n",
51 |     "            torch.cuda.synchronize() # wait for cuda to finish (cuda is asynchronous!)\n",
52 |     "        if i != 0:\n",
53 |     "            time_spent.append(time.time() - start_time)\n",
54 |     "        i += 1\n",
55 |     "    print('Avg execution time (s): {:.3f}'.format(np.mean(time_spent)))\n",
56 |     "    return np.mean(time_spent)"
57 |    ]
58 |   },
59 |   {
60 |    "cell_type": "code",
61 |    "execution_count": 38,
62 |    "metadata": {},
63 |    "outputs": [],
64 |    "source": [
65 |     "modellist = [\"resnet18\", \"resnet34\", \"resnet50\", \"resnet101\", \"resnet152\", \\\n",
66 |     "             \"resnext50_32x4d\", \"resnext101_32x8d\", \"mnasnet1_0\", \"squeezenet1_0\", \"densenet121\", \"densenet169\", \"inception_v3\"]\n",
67 |     "\n",
68 |     "modellist = [\"resnet18\", \"resnet34\",
\"resnet50\", \"resnet101\", \"resnet152\"]" 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": null, 74 | "metadata": { 75 | "scrolled": false 76 | }, 77 | "outputs": [ 78 | { 79 | "name": "stdout", 80 | "output_type": "stream", 81 | "text": [ 82 | "model: resnet18\n", 83 | "Avg execution time (ms): 0.008\n", 84 | "Avg execution time (ms): 0.019\n", 85 | "Avg execution time (ms): 0.046\n", 86 | "Avg execution time (ms): 0.040\n" 87 | ] 88 | } 89 | ], 90 | "source": [ 91 | "for i, model_name in enumerate(modellist):\n", 92 | " batchlist = [1, 4, 8, 16, 32]\n", 93 | " imsize = [128, 256, 512]\n", 94 | " runtimes = []\n", 95 | " \n", 96 | " # define model\n", 97 | " print(\"model: {}\".format(model_name))\n", 98 | " mdl = globals()[model_name]\n", 99 | " model = mdl()\n", 100 | " \n", 101 | " for batch in batchlist: \n", 102 | " runtimes.append(computeTime(model, input_size=[batch, 3, 256, 256], device=\"cuda\", FP16=False)/batch)\n", 103 | "\n", 104 | " if i == 0:\n", 105 | " dfbatch = pd.DataFrame({model_name: runtimes},\n", 106 | " index = batchlist)\n", 107 | " else:\n", 108 | " dfbatch[model_name] = runtimes\n", 109 | " \n", 110 | " runtimes = []\n", 111 | " for isize in imsize:\n", 112 | " print(\"model: {}\".format(model_name))\n", 113 | " mdl = globals()[model_name]\n", 114 | " model = mdl()\n", 115 | " runtimes.append(computeTime(model, input_size=[1, 3, isize, isize], device=\"cuda\", FP16=False))\n", 116 | "\n", 117 | " if i == 0:\n", 118 | " dfimsize = pd.DataFrame({model_name: runtimes},\n", 119 | " index = imsize)\n", 120 | " else:\n", 121 | " dfimsize[model_name] = runtimes" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": 37, 127 | "metadata": {}, 128 | "outputs": [ 129 | { 130 | "data": { 131 | "text/html": [ 132 | "
\n", 133 | "\n", 146 | "\n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | "
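| | resnet18 | resnet34 | resnet50 | resnet101 | resnet152 | resnext50_32x4d | resnext101_32x8d | mnasnet1_0 | squeezenet1_0 | densenet121 | densenet169 | inception_v3 |
|------|------|------|------|------|------|------|------|------|------|------|------|------|
| cuda FP32 | 0.006895 | 0.013512 | 0.016632 | 0.032939 | 0.048400 | 0.033309 | 0.118709 | 0.007704 | 0.004120 | 0.025061 | 0.037639 | 0.027673 |
| cuda FP16 | 0.007969 | 0.015316 | 0.017940 | 0.035898 | 0.052364 | 0.033756 | 0.114106 | 0.007777 | 0.003966 | 0.022630 | 0.034838 | 0.030210 |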
\n", 197 | "
" 198 | ], 199 | "text/plain": [ 200 | " resnet18 resnet34 resnet50 resnet101 resnet152 \\\n", 201 | "cuda FP32 0.006895 0.013512 0.016632 0.032939 0.048400 \n", 202 | "cuda FP16 0.007969 0.015316 0.017940 0.035898 0.052364 \n", 203 | "\n", 204 | " resnext50_32x4d resnext101_32x8d mnasnet1_0 squeezenet1_0 \\\n", 205 | "cuda FP32 0.033309 0.118709 0.007704 0.004120 \n", 206 | "cuda FP16 0.033756 0.114106 0.007777 0.003966 \n", 207 | "\n", 208 | " densenet121 densenet169 inception_v3 \n", 209 | "cuda FP32 0.025061 0.037639 0.027673 \n", 210 | "cuda FP16 0.022630 0.034838 0.030210 " 211 | ] 212 | }, 213 | "execution_count": 37, 214 | "metadata": {}, 215 | "output_type": "execute_result" 216 | } 217 | ], 218 | "source": [ 219 | "df" 220 | ] 221 | }, 222 | { 223 | "cell_type": "code", 224 | "execution_count": null, 225 | "metadata": {}, 226 | "outputs": [], 227 | "source": [] 228 | } 229 | ], 230 | "metadata": { 231 | "kernelspec": { 232 | "display_name": "Python 3", 233 | "language": "python", 234 | "name": "python3" 235 | }, 236 | "language_info": { 237 | "codemirror_mode": { 238 | "name": "ipython", 239 | "version": 3 240 | }, 241 | "file_extension": ".py", 242 | "mimetype": "text/x-python", 243 | "name": "python", 244 | "nbconvert_exporter": "python", 245 | "pygments_lexer": "ipython3", 246 | "version": "3.6.5" 247 | } 248 | }, 249 | "nbformat": 4, 250 | "nbformat_minor": 2 251 | } 252 | -------------------------------------------------------------------------------- /inference_dev.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 9, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import numpy as np\n", 10 | "import torch\n", 11 | "import time\n", 12 | "from torchvision.models import *\n", 13 | "import pandas as pd" 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": 2, 19 | "metadata": {}, 20 | "outputs": [], 21 | 
"source": [ 22 | "# make models from str\n", 23 | "model_name = \"resnet18\"" 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": 7, 29 | "metadata": {}, 30 | "outputs": [], 31 | "source": [ 32 | "def computeTime(model, input_size=[1, 3, 224, 224], device='cuda', FP16=False):\n", 33 | " inputs = torch.randn(input_size)\n", 34 | " if device == 'cuda':\n", 35 | " model = model.cuda()\n", 36 | " inputs = inputs.cuda()\n", 37 | " if FP16:\n", 38 | " model = model.half()\n", 39 | " inputs = inputs.half()\n", 40 | "\n", 41 | " model.eval()\n", 42 | "\n", 43 | " i = 0\n", 44 | " time_spent = []\n", 45 | " while i < 200:\n", 46 | " start_time = time.time()\n", 47 | " with torch.no_grad():\n", 48 | " _ = model(inputs)\n", 49 | "\n", 50 | " if device == 'cuda':\n", 51 | " torch.cuda.synchronize() # wait for cuda to finish (cuda is asynchronous!)\n", 52 | " if i != 0:\n", 53 | " time_spent.append(time.time() - start_time)\n", 54 | " i += 1\n", 55 | " print('Avg execution time (ms): {:.3f}'.format(np.mean(time_spent)))\n", 56 | " return np.mean(time_spent)" 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": 38, 62 | "metadata": {}, 63 | "outputs": [], 64 | "source": [ 65 | "modellist = [\"resnet18\", \"resnet34\", \"resnet50\", \"resnet101\", \"resnet152\", \\\n", 66 | " \"resnext50_32x4d\", \"resnext101_32x8d\", \"mnasnet1_0\", \"squeezenet1_0\", \"densenet121\", \"densenet169\", \"inception_v3\"]\n", 67 | "\n", 68 | "modellist = [\"resnet18\", \"resnet34\", \"resnet50\", \"resnet101\", \"resnet152\"]" 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": null, 74 | "metadata": { 75 | "scrolled": false 76 | }, 77 | "outputs": [ 78 | { 79 | "name": "stdout", 80 | "output_type": "stream", 81 | "text": [ 82 | "model: resnet18\n", 83 | "Avg execution time (ms): 0.008\n", 84 | "Avg execution time (ms): 0.019\n", 85 | "Avg execution time (ms): 0.046\n", 86 | "Avg execution time (ms): 0.040\n" 87 | ] 88 | } 89 | ], 90 | 
"source": [ 91 | "for i, model_name in enumerate(modellist):\n", 92 | " batchlist = [1, 4, 8, 16, 32]\n", 93 | " imsize = [128, 256, 512]\n", 94 | " runtimes = []\n", 95 | " \n", 96 | " # define model\n", 97 | " print(\"model: {}\".format(model_name))\n", 98 | " mdl = globals()[model_name]\n", 99 | " model = mdl()\n", 100 | " \n", 101 | " for batch in batchlist: \n", 102 | " runtimes.append(computeTime(model, input_size=[batch, 3, 256, 256], device=\"cuda\", FP16=False)/batch)\n", 103 | "\n", 104 | " if i == 0:\n", 105 | " dfbatch = pd.DataFrame({model_name: runtimes},\n", 106 | " index = batchlist)\n", 107 | " else:\n", 108 | " dfbatch[model_name] = runtimes\n", 109 | " \n", 110 | " runtimes = []\n", 111 | " for isize in imsize:\n", 112 | " print(\"model: {}\".format(model_name))\n", 113 | " mdl = globals()[model_name]\n", 114 | " model = mdl()\n", 115 | " runtimes.append(computeTime(model, input_size=[1, 3, isize, isize], device=\"cuda\", FP16=False))\n", 116 | "\n", 117 | " if i == 0:\n", 118 | " dfimsize = pd.DataFrame({model_name: runtimes},\n", 119 | " index = imsize)\n", 120 | " else:\n", 121 | " dfimsize[model_name] = runtimes" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": 37, 127 | "metadata": {}, 128 | "outputs": [ 129 | { 130 | "data": { 131 | "text/html": [ 132 | "
\n", 133 | "\n", 146 | "\n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | "
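| | resnet18 | resnet34 | resnet50 | resnet101 | resnet152 | resnext50_32x4d | resnext101_32x8d | mnasnet1_0 | squeezenet1_0 | densenet121 | densenet169 | inception_v3 |
|------|------|------|------|------|------|------|------|------|------|------|------|------|
| cuda FP32 | 0.006895 | 0.013512 | 0.016632 | 0.032939 | 0.048400 | 0.033309 | 0.118709 | 0.007704 | 0.004120 | 0.025061 | 0.037639 | 0.027673 |
| cuda FP16 | 0.007969 | 0.015316 | 0.017940 | 0.035898 | 0.052364 | 0.033756 | 0.114106 | 0.007777 | 0.003966 | 0.022630 | 0.034838 | 0.030210 |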
\n", 197 | "
" 198 | ], 199 | "text/plain": [ 200 | " resnet18 resnet34 resnet50 resnet101 resnet152 \\\n", 201 | "cuda FP32 0.006895 0.013512 0.016632 0.032939 0.048400 \n", 202 | "cuda FP16 0.007969 0.015316 0.017940 0.035898 0.052364 \n", 203 | "\n", 204 | " resnext50_32x4d resnext101_32x8d mnasnet1_0 squeezenet1_0 \\\n", 205 | "cuda FP32 0.033309 0.118709 0.007704 0.004120 \n", 206 | "cuda FP16 0.033756 0.114106 0.007777 0.003966 \n", 207 | "\n", 208 | " densenet121 densenet169 inception_v3 \n", 209 | "cuda FP32 0.025061 0.037639 0.027673 \n", 210 | "cuda FP16 0.022630 0.034838 0.030210 " 211 | ] 212 | }, 213 | "execution_count": 37, 214 | "metadata": {}, 215 | "output_type": "execute_result" 216 | } 217 | ], 218 | "source": [ 219 | "df" 220 | ] 221 | }, 222 | { 223 | "cell_type": "code", 224 | "execution_count": null, 225 | "metadata": {}, 226 | "outputs": [], 227 | "source": [] 228 | } 229 | ], 230 | "metadata": { 231 | "kernelspec": { 232 | "display_name": "Python 3", 233 | "language": "python", 234 | "name": "python3" 235 | }, 236 | "language_info": { 237 | "codemirror_mode": { 238 | "name": "ipython", 239 | "version": 3 240 | }, 241 | "file_extension": ".py", 242 | "mimetype": "text/x-python", 243 | "name": "python", 244 | "nbconvert_exporter": "python", 245 | "pygments_lexer": "ipython3", 246 | "version": "3.6.5" 247 | } 248 | }, 249 | "nbformat": 4, 250 | "nbformat_minor": 2 251 | } 252 | -------------------------------------------------------------------------------- /inference_segmentation.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import numpy as np\n", 10 | "import torch\n", 11 | "import time\n", 12 | "from torchvision.models import *\n", 13 | "import pandas as pd\n", 14 | "import os\n", 15 | "import torchvision\n", 16 | "from torch2trt import torch2trt" 17 | ] 18 | }, 19 | { 20 | 
"cell_type": "code", 21 | "execution_count": 2, 22 | "metadata": {}, 23 | "outputs": [], 24 | "source": [ 25 | "from torchvision.models.segmentation import *" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 3, 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "FP32 = True\n", 35 | "FP16 = True\n", 36 | "INT8 = True" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 4, 42 | "metadata": {}, 43 | "outputs": [], 44 | "source": [ 45 | "# make results\n", 46 | "os.makedirs(\"results\", exist_ok=True)" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": 5, 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "class ModelWrapper(torch.nn.Module):\n", 56 | " def __init__(self, model):\n", 57 | " super(ModelWrapper, self).__init__()\n", 58 | " self.model = model\n", 59 | " def forward(self, x):\n", 60 | " return self.model(x)['out']" 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": 6, 66 | "metadata": {}, 67 | "outputs": [], 68 | "source": [ 69 | "def computeTime(model, input_size=[1, 3, 224, 224], device='cuda', FP16=False):\n", 70 | " inputs = torch.randn(input_size)\n", 71 | " if device == 'cuda':\n", 72 | " model = model.cuda()\n", 73 | " inputs = inputs.cuda()\n", 74 | " if FP16:\n", 75 | " model = model.half()\n", 76 | " inputs = inputs.half()\n", 77 | "\n", 78 | " model.eval()\n", 79 | "\n", 80 | " i = 0\n", 81 | " time_spent = []\n", 82 | " while i < 200:\n", 83 | " start_time = time.time()\n", 84 | " with torch.no_grad():\n", 85 | " _ = model(inputs)\n", 86 | "\n", 87 | " if device == 'cuda':\n", 88 | " torch.cuda.synchronize() # wait for cuda to finish (cuda is asynchronous!)\n", 89 | " if i != 0:\n", 90 | " time_spent.append(time.time() - start_time)\n", 91 | " i += 1\n", 92 | " print('Avg execution time (ms): {:.3f}'.format(np.mean(time_spent)))\n", 93 | " return np.mean(time_spent)" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | 
"execution_count": 7, 99 | "metadata": {}, 100 | "outputs": [], 101 | "source": [ 102 | "# resnet is enought for now\n", 103 | "modellist = [\"fcn_resnet50\", \"fcn_resnet101\", \"deeplabv3_resnet50\", \"deeplabv3_resnet101\"]" 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": 9, 109 | "metadata": { 110 | "scrolled": true 111 | }, 112 | "outputs": [ 113 | { 114 | "name": "stdout", 115 | "output_type": "stream", 116 | "text": [ 117 | "model: fcn_resnet50\n", 118 | "Avg execution time (ms): 0.205\n", 119 | "Avg execution time (ms): 0.174\n", 120 | "running fp16 models..\n", 121 | "Avg execution time (ms): 0.037\n", 122 | "running int8 models..\n", 123 | "Avg execution time (ms): 0.022\n", 124 | "model: fcn_resnet101\n" 125 | ] 126 | }, 127 | { 128 | "name": "stderr", 129 | "output_type": "stream", 130 | "text": [ 131 | "Downloading: \"https://download.pytorch.org/models/resnet101-5d3b4d8f.pth\" to /home/ken/.cache/torch/checkpoints/resnet101-5d3b4d8f.pth\n", 132 | "100.0%\n" 133 | ] 134 | }, 135 | { 136 | "name": "stdout", 137 | "output_type": "stream", 138 | "text": [ 139 | "Avg execution time (ms): 0.344\n", 140 | "Avg execution time (ms): 0.290\n", 141 | "running fp16 models..\n", 142 | "Avg execution time (ms): 0.057\n", 143 | "running int8 models..\n", 144 | "Avg execution time (ms): 0.032\n", 145 | "model: deeplabv3_resnet50\n", 146 | "Avg execution time (ms): 0.281\n", 147 | "Avg execution time (ms): 0.252\n", 148 | "running fp16 models..\n", 149 | "Avg execution time (ms): 0.130\n", 150 | "running int8 models..\n", 151 | "Avg execution time (ms): 0.097\n", 152 | "model: deeplabv3_resnet101\n", 153 | "Avg execution time (ms): 0.426\n", 154 | "Avg execution time (ms): 0.367\n", 155 | "running fp16 models..\n", 156 | "Avg execution time (ms): 0.151\n", 157 | "running int8 models..\n", 158 | "Avg execution time (ms): 0.108\n" 159 | ] 160 | } 161 | ], 162 | "source": [ 163 | "results = []\n", 164 | "for i, model_name in 
enumerate(modellist):\n", 165 | " runtimes = []\n", 166 | "\n", 167 | " # define model\n", 168 | " print(\"model: {}\".format(model_name))\n", 169 | " input_size = [1, 3, 512, 512]\n", 170 | " mdl = globals()[model_name]\n", 171 | " model = mdl().cuda().eval()\n", 172 | " # Run raw models\n", 173 | " runtimes.append(computeTime(model, input_size=input_size, device=\"cuda\", FP16=False))\n", 174 | "\n", 175 | " if FP32: \n", 176 | " mdl = globals()[model_name]\n", 177 | " model = mdl().cuda().eval()\n", 178 | " model_w = ModelWrapper(model)\n", 179 | " x = torch.zeros(input_size).cuda()\n", 180 | "\n", 181 | " # convert to tensorrt models\n", 182 | " model_trt = torch2trt(model_w, [x])\n", 183 | "\n", 184 | " # Run TensorRT models\n", 185 | " runtimes.append(computeTime(model_trt, input_size=input_size, device=\"cuda\", FP16=False))\n", 186 | " if FP16:\n", 187 | " print(\"running fp16 models..\")\n", 188 | " # Make FP16 tensorRT models\n", 189 | " mdl = globals()[model_name]\n", 190 | " model = mdl().eval().half().cuda()\n", 191 | " model_w = ModelWrapper(model).half()\n", 192 | " x = torch.zeros(input_size).half().cuda()\n", 193 | " # convert to tensorrt models\n", 194 | " model_trt = torch2trt(model_w, [x], fp16_mode=True)\n", 195 | " # Run TensorRT models\n", 196 | " runtimes.append(computeTime(model_trt, input_size=input_size, device=\"cuda\", FP16=True))\n", 197 | "\n", 198 | " if INT8:\n", 199 | " print(\"running int8 models..\")\n", 200 | " # Make INT8 tensorRT models\n", 201 | " mdl = globals()[model_name]\n", 202 | " model = mdl().eval().half().cuda()\n", 203 | " model_w = ModelWrapper(model).half()\n", 204 | " x = torch.randn(input_size).half().cuda()\n", 205 | " # convert to tensorrt models\n", 206 | " model_trt = torch2trt(model_w, [x], fp16_mode=True, int8_mode=True, max_batch_size=1)\n", 207 | "\n", 208 | " runtimes.append(computeTime(model_trt, input_size=input_size, device=\"cuda\", FP16=True))\n", 209 | "\n", 210 | " if i == 0:\n", 211 | " df = 
pd.DataFrame({model_name: runtimes},\n", 212 | " index = [\"Raw\", \"FP32\", \"FP16\", \"INT8\"])\n", 213 | " else:\n", 214 | " df[model_name] = runtimes" 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": 10, 220 | "metadata": {}, 221 | "outputs": [ 222 | { 223 | "data": { 224 | "text/html": [ 225 | "
\n", 226 | "\n", 239 | "\n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | "
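| | fcn_resnet50 | fcn_resnet101 | deeplabv3_resnet50 | deeplabv3_resnet101 |
|------|------|------|------|------|
| Raw | 0.205359 | 0.344307 | 0.281023 | 0.425960 |
| FP32 | 0.173818 | 0.290180 | 0.252314 | 0.366532 |
| FP16 | 0.036635 | 0.056922 | 0.129868 | 0.151195 |
| INT8 | 0.021869 | 0.032292 | 0.097351 | 0.108282 |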
\n", 280 | "
" 281 | ], 282 | "text/plain": [ 283 | " fcn_resnet50 fcn_resnet101 deeplabv3_resnet50 deeplabv3_resnet101\n", 284 | "Raw 0.205359 0.344307 0.281023 0.425960\n", 285 | "FP32 0.173818 0.290180 0.252314 0.366532\n", 286 | "FP16 0.036635 0.056922 0.129868 0.151195\n", 287 | "INT8 0.021869 0.032292 0.097351 0.108282" 288 | ] 289 | }, 290 | "execution_count": 10, 291 | "metadata": {}, 292 | "output_type": "execute_result" 293 | } 294 | ], 295 | "source": [ 296 | "df.to_csv(\"results/xavier_segmentation.csv\")\n", 297 | "df" 298 | ] 299 | }, 300 | { 301 | "cell_type": "code", 302 | "execution_count": null, 303 | "metadata": {}, 304 | "outputs": [], 305 | "source": [] 306 | } 307 | ], 308 | "metadata": { 309 | "kernelspec": { 310 | "display_name": "Python 3", 311 | "language": "python", 312 | "name": "python3" 313 | }, 314 | "language_info": { 315 | "codemirror_mode": { 316 | "name": "ipython", 317 | "version": 3 318 | }, 319 | "file_extension": ".py", 320 | "mimetype": "text/x-python", 321 | "name": "python", 322 | "nbconvert_exporter": "python", 323 | "pygments_lexer": "ipython3", 324 | "version": "3.6.9" 325 | } 326 | }, 327 | "nbformat": 4, 328 | "nbformat_minor": 2 329 | } 330 | -------------------------------------------------------------------------------- /inference_tensorrt.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # coding: utf-8 3 | 4 | import numpy as np 5 | import torch 6 | import time 7 | from torchvision.models import * 8 | from utils.fp16 import network_to_half 9 | import os 10 | from torch2trt import torch2trt 11 | import pandas as pd 12 | 13 | FP32 = True 14 | FP16 = True 15 | INT8 = True 16 | 17 | # make results 18 | os.makedirs("results", exist_ok=True) 19 | 20 | def computeTime(model, input_size=[1, 3, 224, 224], device='cuda', FP16=False): 21 | inputs = torch.randn(input_size) 22 | if device == 'cuda': 23 | inputs = inputs.cuda() 24 | if FP16: 25 | model = network_to_half(model) 
26 | 27 | i = 0 28 | time_spent = [] 29 | while i < 200: 30 | start_time = time.time() 31 | with torch.no_grad(): 32 | _ = model(inputs) 33 | 34 | if device == 'cuda': 35 | torch.cuda.synchronize() # wait for cuda to finish (cuda is asynchronous!) 36 | if i != 0: 37 | time_spent.append(time.time() - start_time) 38 | i += 1 39 | print('Avg execution time (ms): {:.3f}'.format(np.mean(time_spent) * 1000)) # time.time() deltas are in seconds 40 | return np.mean(time_spent) 41 | 42 | 43 | modellist = ["resnet18", "resnet34", "resnet50", "resnet101", "resnet152", "resnext50_32x4d", "resnext101_32x8d", "mnasnet1_0", "squeezenet1_0", "densenet121", "densenet169", "inception_v3"] 44 | 45 | # resnet is enough for now 46 | modellist = ["resnet18", "resnet34", "resnet50"] 47 | results = [] 48 | 49 | for i, model_name in enumerate(modellist): 50 | runtimes = [] 51 | 52 | input_size = [1, 3, 256, 256] 53 | mdl = globals()[model_name] 54 | model = mdl().cuda().eval() 55 | # Run raw models 56 | runtimes.append(computeTime(model, input_size=input_size, device="cuda", FP16=False)) 57 | 58 | if FP32: 59 | # define model 60 | print("model: {}".format(model_name)) 61 | mdl = globals()[model_name] 62 | model = mdl().cuda().eval() 63 | # define input 64 | input_size = [1, 3, 256, 256] 65 | x = torch.zeros(input_size).cuda() 66 | 67 | # convert to tensorrt models 68 | model_trt = torch2trt(model, [x]) 69 | 70 | # Run TensorRT models 71 | runtimes.append(computeTime(model_trt, input_size=input_size, device="cuda", FP16=False)) 72 | if FP16: 73 | print("running fp16 models..") 74 | # Make FP16 tensorRT models 75 | mdl = globals()[model_name] 76 | model = mdl().eval().half().cuda() 77 | # define input 78 | input_size = [1, 3, 256, 256] 79 | x = torch.zeros(input_size).half().cuda() 80 | # convert to tensorrt models 81 | model_trt = torch2trt(model, [x], fp16_mode=True) 82 | # Run TensorRT models 83 | runtimes.append(computeTime(model_trt, input_size=input_size, device="cuda", FP16=True)) 84 | 85 | results.append({model_name: 
runtimes}) 86 | if INT8: 87 | print("running int8 models..") 88 | # Make INT8 tensorRT models 89 | mdl = globals()[model_name] 90 | model = mdl().eval().half().cuda() 91 | # define input 92 | input_size = [1, 3, 256, 256] 93 | x = torch.randn(input_size).half().cuda() 94 | # convert to tensorrt models 95 | model_trt = torch2trt(model, [x], fp16_mode=True, int8_mode=True, max_batch_size=1) 96 | # Run TensorRT models 97 | input_size = [1, 3, 256, 256] 98 | runtimes.append(computeTime(model_trt, input_size=input_size, device="cuda", FP16=True)) 99 | results.append({model_name: runtimes}) 100 | 101 | if i == 0: 102 | df = pd.DataFrame({model_name: runtimes}, 103 | index = ["Raw", "FP32", "FP16", "INT8"]) 104 | else: 105 | df[model_name] = runtimes 106 | 107 | df.to_csv("results/xavier.csv") 108 | df 109 | 110 | -------------------------------------------------------------------------------- /results/batch.csv: -------------------------------------------------------------------------------- 1 | ,resnet18,resnet34,resnet50 2 | 1,0.008542266922380456,0.01481275103200021,0.0178466082817346 3 | 4,0.004840066684550376,0.0091991511421587,0.011360873229539574 4 | 8,0.005877630195425983,0.009893905727108519,0.011597911616665634 5 | 16,0.0024946156009357776,0.004145186525493411,0.008591583475994704 6 | 32,0.002836233557169162,0.004563913123691502,0.008728343338223558 7 | -------------------------------------------------------------------------------- /results/fp16.csv: -------------------------------------------------------------------------------- 1 | ,resnet18,resnet34,resnet50 2 | FP32,0.00989055154311597,0.01693966640299888,0.019452313082901077 3 | FP16_torch,0.009034480281810664,0.017017085348541412,0.019448458848886154 4 | FP16_apex,0.009056915589912453,0.016978000276651813,0.020045562006121304 5 | -------------------------------------------------------------------------------- /results/imsize.csv: 
-------------------------------------------------------------------------------- 1 | ,resnet18,resnet34,resnet50 2 | 128,0.007868697295835869,0.014414251749239975,0.01414928244585967 3 | 256,0.007717028335111225,0.014684499807693251,0.017655759600538706 4 | 512,0.02013937672178949,0.03769871218120632,0.04788235084495353 5 | 1024,0.07168652424261199,0.13482510863836086,0.1791151432535756 6 | -------------------------------------------------------------------------------- /results/jetsonano.txt: -------------------------------------------------------------------------------- 1 | Without tensorRT FP32/FP16 2 | [{'resnet18': [0.038558399257947455, 0.037869752951003796]}, {'resnet34': [0.06386453542278041, 0.06943798424610541]}, {'resnet50': [0.0906546499261904, 0.08942561532983828]}] 3 | 4 | With TensorRT FP32/FP16 5 | [{'resnet18': [0.02693739967729578, 0.030103209030688107]}, {'resnet34': [0.047432633500602374, 0.046157992664893066]}, {'resnet50': [0.07816180631743004, 0.07399397758982289]}] 6 | 7 | 8 | -------------------------------------------------------------------------------- /results/xavier.csv: -------------------------------------------------------------------------------- 1 | ,resnet18,resnet34,resnet50 2 | Raw,0.007432765098073375,0.011262918836507365,0.015097524652529002 3 | FP32,0.003625017913741682,0.005848102234116751,0.010952574523849104 4 | FP16,0.0017815654601284008,0.004464829986418911,0.0041645591582485176 5 | INT8,0.0018363609984891499,0.004203463319557995,0.0030628951949689853 6 | -------------------------------------------------------------------------------- /results/xavier_segmentation.csv: -------------------------------------------------------------------------------- 1 | ,fcn_resnet50,fcn_resnet101,deeplabv3_resnet50,deeplabv3_resnet101 2 | Raw,0.20535858312443872,0.34430714108836113,0.2810230279088619,0.425959756026915 3 | FP32,0.17381846126000486,0.29017985765658433,0.2523143626936716,0.3665324635242098 4 | 
FP16,0.03663515685191705,0.05692227521733423,0.12986798621901316,0.1511950037587228 5 | INT8,0.021869432986082144,0.032291673535677655,0.09735102749350083,0.10828216591073041 6 | -------------------------------------------------------------------------------- /utils/__pycache__/fp16.cpython-36.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/utils/__pycache__/fp16.cpython-36.pyc -------------------------------------------------------------------------------- /utils/fp16.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | # codes from https://github.com/fastai/imagenet-fast/tree/master/cifar10 4 | 5 | class tofp16(nn.Module): 6 | def __init__(self): 7 | super(tofp16, self).__init__() 8 | 9 | def forward(self, input): 10 | return input.half() 11 | 12 | 13 | def copy_in_params(net, params): 14 | net_params = list(net.parameters()) 15 | for i in range(len(params)): 16 | net_params[i].data.copy_(params[i].data) 17 | 18 | 19 | def set_grad(params, params_with_grad): 20 | 21 | for param, param_w_grad in zip(params, params_with_grad): 22 | if param.grad is None: 23 | param.grad = torch.nn.Parameter(param.data.new().resize_(*param.data.size())) 24 | param.grad.data.copy_(param_w_grad.grad.data) 25 | 26 | 27 | def BN_convert_float(module): 28 | ''' 29 | BatchNorm layers to have parameters in single precision. 30 | Find all layers and convert them back to float. This can't 31 | be done with built in .apply as that function will apply 32 | fn to all modules, parameters, and buffers. Thus we wouldn't 33 | be able to guard the float conversion based on the module type. 
34 | ''' 35 | if isinstance(module, torch.nn.modules.batchnorm._BatchNorm): 36 | module.float() 37 | for child in module.children(): 38 | BN_convert_float(child) 39 | return module 40 | 41 | 42 | def network_to_half(network): 43 | return nn.Sequential(tofp16(), BN_convert_float(network.half())) 44 | --------------------------------------------------------------------------------
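The `computeTime` helper in `inference_tensorrt.py` times each forward pass with `time.time()` and returns the mean in seconds, while the README tables report milliseconds. As a minimal sketch (not part of the repository), the same loop can be written against any zero-argument callable and return milliseconds directly. The names `time_callable`, `n_warmup`, and `sync` are hypothetical and introduced only for illustration; `sync` stands in for a call such as `torch.cuda.synchronize`, and the example uses only the standard library so it runs without a GPU.

```python
import time
from statistics import mean


def time_callable(fn, n_iters=200, n_warmup=1, sync=None):
    """Return the mean latency of fn() in milliseconds.

    fn       -- zero-argument callable to time (hypothetical name; for a
                model this could be ``lambda: model(inputs)``)
    n_iters  -- number of timed iterations
    n_warmup -- untimed iterations run first (the original loop skips i == 0)
    sync     -- optional zero-argument callable invoked after fn() and before
                the clock is read, so asynchronous work is included
    """
    if n_iters <= 0:
        raise ValueError("n_iters must be positive")

    samples = []
    for i in range(n_warmup + n_iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # assumption: blocks until queued device work has finished
        elapsed = time.perf_counter() - start
        if i >= n_warmup:
            samples.append(elapsed)

    # perf_counter() differences are in seconds; convert to milliseconds.
    return mean(samples) * 1000.0


if __name__ == "__main__":
    latency_ms = time_callable(lambda: sum(range(10_000)), n_iters=50)
    print("Avg execution time (ms): {:.3f}".format(latency_ms))
```

With PyTorch installed, a call along the lines of `time_callable(lambda: model(inputs), sync=torch.cuda.synchronize)` would presumably mirror the original loop; that usage is an assumption and is not exercised here.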