├── LICENSE
├── README.md
├── imgs
│   ├── 2022_01.jpg
│   ├── _108240741_beatles-abbeyroad-square-reuters-applecorps.jpg
│   ├── add256.jpg
│   ├── add512.jpg
│   ├── add768.jpg
│   ├── addnormal.jpg
│   ├── addtensorrt_FP16.jpg
│   ├── addtensorrt_FP32.jpg
│   ├── addtensorrt_INT8.jpg
│   ├── mask256.jpg
│   ├── mask512.jpg
│   ├── mask768.jpg
│   ├── masknormal.jpg
│   ├── masktensorrt_FP16.jpg
│   ├── masktensorrt_FP32.jpg
│   └── masktensorrt_INT8.jpg
├── inference_FP32_vs_FP16.ipynb
├── inference_batch_vs_imsize.ipynb
├── inference_classification.ipynb
├── inference_dev.ipynb
├── inference_segmentation.ipynb
├── inference_segmentation_demo.ipynb
├── inference_tensorrt.py
├── results
│   ├── batch.csv
│   ├── fp16.csv
│   ├── imsize.csv
│   ├── jetsonano.txt
│   ├── xavier.csv
│   └── xavier_segmentation.csv
└── utils
    ├── __pycache__
    │   └── fp16.cpython-36.pyc
    └── fp16.py
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2020 Kentaro Yoshioka
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Benchmark-FP32-FP16-INT8-with-TensorRT
2 | Benchmark the inference speed of CNNs at different precisions (FP32, FP16, INT8) with TensorRT.
3 |
4 | :star: if it helps you.
5 |
6 | # Image classification
7 |
8 | Run:
9 | `inference_tensorrt.py`
10 |
11 | ## Hardware: Jetson Nano
12 | "TRT" denotes a TensorRT-compiled model at the stated precision.
13 |
14 | Latency of image inference (1,3,256,256) [ms]
15 |
16 | | | TRT FP32 | TRT FP16 | TRT INT8 |
17 | |:--------:|------|:----:|------|
18 | | resnet18 | 26 | 18 | |
19 | | resnet34 | 48 | 30 | |
20 | | resnet50 | 79 | 42 | |
21 |
22 | Jetson Nano does not support INT8.
23 |
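As a quick reading aid, the FP16 speedup over FP32 can be computed directly from the Jetson Nano table above. This is only an illustrative sketch: the dictionary and the function name `speedup` are hypothetical and not part of the repository.

```python
# Latencies in ms copied from the Jetson Nano table (input shape 1x3x256x256).
nano_latency_ms = {
    "resnet18": {"fp32": 26, "fp16": 18},
    "resnet34": {"fp32": 48, "fp16": 30},
    "resnet50": {"fp32": 79, "fp16": 42},
}


def speedup(baseline_ms, optimized_ms):
    """Return how many times faster the optimized run is than the baseline."""
    return baseline_ms / optimized_ms


for name, row in nano_latency_ms.items():
    ratio = speedup(row["fp32"], row["fp16"])
    print("{}: FP16 is {:.2f}x faster than FP32".format(name, ratio))
```

For these numbers the gain grows with model depth, from about 1.4x for resnet18 to about 1.9x for resnet50.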
24 | ## Hardware: Jetson Xavier
25 |
26 | "TRT" denotes a TensorRT-compiled model at the stated precision.
27 |
28 | Latency of image inference (1,3,256,256) [ms]
29 |
30 | | | resnet18 | resnet34 | resnet50 |
31 | |------|----------|----------|----------|
32 | | PytorchRaw | 11 | 12 | 16 |
33 | | TRT FP32 | 3.8 | 5.6 | 9.9 |
34 | | TRT FP16 | 2.1 | 3.3 | 4.4 |
35 | | TRT INT8 | 1.7 | 2.7 | 3.0 |
36 |
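Latency numbers like the ones above depend on how the timing loop is written. The following is a minimal, framework-agnostic sketch, not the repository's script: `measure_latency_ms` and its parameters are hypothetical names, and `sync` stands in for a call such as `torch.cuda.synchronize` when timing GPU work.

```python
import time


def measure_latency_ms(fn, n_iters=200, n_warmup=1, sync=None):
    """Average wall-clock latency of fn() in milliseconds.

    fn       -- zero-argument callable that runs one inference
    n_warmup -- untimed calls made first (the first GPU call is usually slow)
    sync     -- optional callable that blocks until asynchronous work is done
    """
    for _ in range(n_warmup):
        fn()
    if sync is not None:
        sync()

    total = 0.0
    for _ in range(n_iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the work to finish before stopping the clock
        total += time.perf_counter() - start
    return total / n_iters * 1000.0  # seconds -> milliseconds


# Stand-in workload so the sketch runs without a GPU.
latency = measure_latency_ms(lambda: sum(range(10000)), n_iters=50)
print("Avg latency: {:.3f} ms".format(latency))
```

Converting to milliseconds in one place keeps the printed label and the returned value in the same unit.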
37 | # Image segmentation
38 | 
39 | ## Hardware: Jetson Xavier
40 |
41 | "TRT" denotes a TensorRT-compiled model at the stated precision.
42 |
43 | Latency of image inference (1,3,512,512) [ms]
44 |
45 | | | fcn_resnet50 | fcn_resnet101 | deeplabv3_resnet50 | deeplabv3_resnet101 |
46 | |------|--------------|---------------|--------------------|---------------------|
47 | | PytorchRaw | 200 | 344 | 281 | 426 |
48 | | TRT FP32 | 173 | 290 | 252 | 366 |
49 | | TRT FP16 | 36 | 57 | 130 | 151 |
50 | | TRT INT8 | 21 | 32 | 97 | 108 |
51 |
52 | ## Hardware: Jetson Nano
53 |
54 | Latency of image inference (1,3,256,256) [ms]
55 |
56 | | | fcn_resnet50 |
57 | |------|--------------|
58 | | PytorchRaw | 6800 |
59 | | TRT FP32 | 767 |
60 | | TRT FP16 | 40 |
61 | | TRT INT8 | NA |
62 |
63 | # Hardware setup
64 | Setting up the hardware can be tricky; the steps below are what worked for me.
65 |
66 | * Install pytorch
67 |
68 | https://forums.developer.nvidia.com/t/pytorch-for-jetson-nano-version-1-4-0-now-available/72048
69 |
70 | **The stable version for Jetson Nano seems to be torch==1.1**
71 |
72 | **For Xavier, torch==1.3 worked fine for me.**
73 |
74 | * Install torchvision
75 |
76 | I followed these instructions and installed torchvision==0.3.0:
77 |
78 | https://medium.com/hackers-terminal/installing-pytorch-torchvision-on-nvidias-jetson-tx2-81591d03ce32
79 |
80 | ```bash
81 | sudo apt-get install libjpeg-dev zlib1g-dev
82 | git clone -b v0.3.0 https://github.com/pytorch/vision torchvision
83 | cd torchvision
84 | sudo python3 setup.py install
85 | ```
86 |
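Because the working versions noted above differ per board, it can help to check what is installed before benchmarking. A minimal sketch follows; `version_matches` and `report` are hypothetical helpers, and the expected prefixes are simply the versions reported above, not requirements stated by PyTorch.

```python
def version_matches(installed, expected_prefix):
    """True if installed equals the prefix or extends it with a '.' or '+' part."""
    if installed == expected_prefix:
        return True
    return (installed.startswith(expected_prefix + ".")
            or installed.startswith(expected_prefix + "+"))


def report(package_name, expected_prefix):
    """Import a package by name and compare its __version__ with the prefix."""
    try:
        module = __import__(package_name)
    except ImportError:
        return "{}: not installed".format(package_name)
    installed = getattr(module, "__version__", "unknown")
    if version_matches(installed, expected_prefix):
        status = "ok"
    else:
        status = "differs from " + expected_prefix
    return "{} {}: {}".format(package_name, installed, status)


# Versions reported above: torch 1.1 (Nano) or 1.3 (Xavier), torchvision 0.3.0.
print(report("torch", "1.3"))
print(report("torchvision", "0.3.0"))
```

The prefix check treats `1.3.0` as matching `1.3` but rejects `1.30.0`, which a plain `startswith` would accept.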
87 | * Install torch2trt
88 |
89 | I followed the torch2trt README:
90 |
91 | https://github.com/NVIDIA-AI-IOT/torch2trt
92 |
93 | ```bash
94 | sudo apt-get install libprotobuf* protobuf-compiler ninja-build
95 | git clone https://github.com/NVIDIA-AI-IOT/torch2trt
96 | cd torch2trt
97 | sudo python3 setup.py install --plugins
98 | ```
99 |
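Once torch2trt is installed, converting a torchvision model looks roughly like the sketch below. It is an illustration based on the torch2trt README, not a copy of `inference_tensorrt.py`: the helper names `precision_kwargs` and `convert` are hypothetical, and the heavy imports sit inside `convert` so the file can be loaded on a machine without a GPU. `fp16_mode` and `int8_mode` are torch2trt's conversion flags; check their behavior against the installed version.

```python
def precision_kwargs(precision):
    """Map a precision label to torch2trt keyword arguments."""
    table = {
        "fp32": {},
        "fp16": {"fp16_mode": True},
        "int8": {"int8_mode": True},
    }
    if precision not in table:
        raise ValueError("unknown precision: {}".format(precision))
    return table[precision]


def convert(model_name="resnet18", precision="fp16", input_shape=(1, 3, 256, 256)):
    """Build a torchvision model and compile it with TensorRT via torch2trt."""
    # Imported lazily: these need a CUDA device plus torch, torchvision and torch2trt.
    import torch
    import torchvision
    from torch2trt import torch2trt

    model = getattr(torchvision.models, model_name)().eval().cuda()
    x = torch.randn(input_shape).cuda()  # example input used for the conversion
    model_trt = torch2trt(model, [x], **precision_kwargs(precision))
    return model_trt, x
```

On a Jetson, `model_trt, x = convert("resnet50", "fp16")` followed by `model_trt(x)` would run the compiled model. On the Nano only `fp32` and `fp16` apply, since INT8 is not supported there.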
--------------------------------------------------------------------------------
/imgs/2022_01.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/2022_01.jpg
--------------------------------------------------------------------------------
/imgs/_108240741_beatles-abbeyroad-square-reuters-applecorps.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/_108240741_beatles-abbeyroad-square-reuters-applecorps.jpg
--------------------------------------------------------------------------------
/imgs/add256.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/add256.jpg
--------------------------------------------------------------------------------
/imgs/add512.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/add512.jpg
--------------------------------------------------------------------------------
/imgs/add768.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/add768.jpg
--------------------------------------------------------------------------------
/imgs/addnormal.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/addnormal.jpg
--------------------------------------------------------------------------------
/imgs/addtensorrt_FP16.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/addtensorrt_FP16.jpg
--------------------------------------------------------------------------------
/imgs/addtensorrt_FP32.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/addtensorrt_FP32.jpg
--------------------------------------------------------------------------------
/imgs/addtensorrt_INT8.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/addtensorrt_INT8.jpg
--------------------------------------------------------------------------------
/imgs/mask256.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/mask256.jpg
--------------------------------------------------------------------------------
/imgs/mask512.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/mask512.jpg
--------------------------------------------------------------------------------
/imgs/mask768.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/mask768.jpg
--------------------------------------------------------------------------------
/imgs/masknormal.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/masknormal.jpg
--------------------------------------------------------------------------------
/imgs/masktensorrt_FP16.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/masktensorrt_FP16.jpg
--------------------------------------------------------------------------------
/imgs/masktensorrt_FP32.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/masktensorrt_FP32.jpg
--------------------------------------------------------------------------------
/imgs/masktensorrt_INT8.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/masktensorrt_INT8.jpg
--------------------------------------------------------------------------------
/inference_FP32_vs_FP16.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 8,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import numpy as np\n",
10 | "import torch\n",
11 | "import time\n",
12 | "from torchvision.models import *\n",
13 | "import pandas as pd\n",
14 | "import os\n",
15 | "from apex import amp"
16 | ]
17 | },
18 | {
19 | "cell_type": "code",
20 | "execution_count": 9,
21 | "metadata": {},
22 | "outputs": [],
23 | "source": [
24 | "# make results\n",
25 | "os.makedirs(\"results\", exist_ok=True)"
26 | ]
27 | },
28 | {
29 | "cell_type": "code",
30 | "execution_count": 10,
31 | "metadata": {},
32 | "outputs": [],
33 | "source": [
34 | "def computeTime(model, input_size=[1, 3, 224, 224], device='cuda', FP16=False):\n",
35 | "    inputs = torch.randn(input_size)\n",
36 | "    if device == 'cuda':\n",
37 | "        model = model.cuda()\n",
38 | "        inputs = inputs.cuda()\n",
39 | "    if FP16:\n",
40 | "        model = model.half()\n",
41 | "        inputs = inputs.half()\n",
42 | "\n",
43 | "    model.eval()\n",
44 | "\n",
45 | "    i = 0\n",
46 | "    time_spent = []\n",
47 | "    while i < 200:\n",
48 | "        start_time = time.time()\n",
49 | "        with torch.no_grad():\n",
50 | "            _ = model(inputs)\n",
51 | "\n",
52 | "        if device == 'cuda':\n",
53 | "            torch.cuda.synchronize()  # wait for cuda to finish (cuda is asynchronous!)\n",
54 | "        if i != 0:  # skip the first (warm-up) iteration\n",
55 | "            time_spent.append(time.time() - start_time)\n",
56 | "        i += 1\n",
57 | "    print('Avg execution time (s): {:.3f}'.format(np.mean(time_spent)))  # time.time() deltas are in seconds\n",
58 | "    return np.mean(time_spent)"
59 | ]
60 | },
61 | {
62 | "cell_type": "code",
63 | "execution_count": 11,
64 | "metadata": {},
65 | "outputs": [],
66 | "source": [
67 | "modellist = [\"resnet18\", \"resnet34\", \"resnet50\", \"resnet101\", \"resnet152\", \\\n",
68 | " \"resnext50_32x4d\", \"resnext101_32x8d\", \"mnasnet1_0\", \"squeezenet1_0\", \"densenet121\", \"densenet169\", \"inception_v3\"]\n",
69 | "\n",
70 | "# resnet is enough for now\n",
71 | "modellist = [\"resnet18\", \"resnet34\", \"resnet50\"]"
72 | ]
73 | },
74 | {
75 | "cell_type": "code",
76 | "execution_count": 17,
77 | "metadata": {},
78 | "outputs": [
79 | {
80 | "name": "stdout",
81 | "output_type": "stream",
82 | "text": [
83 | "model: resnet18\n",
84 | "Looks ok!\n"
85 | ]
86 | }
87 | ],
88 | "source": [
89 | "# test amp\n",
90 | "model_name = \"resnet18\"\n",
91 | "print(\"model: {}\".format(model_name))\n",
92 | "mdl = globals()[model_name]\n",
93 | "model = mdl().to(\"cuda\")\n",
94 | "model = amp.initialize(model, opt_level=\"O1\")  # opt_level was not defined before this cell\n",
95 | "print(\"Looks ok!\")"
96 | ]
97 | },
98 | {
99 | "cell_type": "code",
100 | "execution_count": 20,
101 | "metadata": {
102 | "scrolled": false
103 | },
104 | "outputs": [
105 | {
106 | "name": "stdout",
107 | "output_type": "stream",
108 | "text": [
109 | "model: resnet18\n",
110 | "Avg execution time (ms): 0.010\n",
111 | "Avg execution time (ms): 0.009\n",
112 | "Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.\n",
113 | "\n",
114 | "Defaults for this optimization level are:\n",
115 | "enabled : True\n",
116 | "opt_level : O1\n",
117 | "cast_model_type : None\n",
118 | "patch_torch_functions : True\n",
119 | "keep_batchnorm_fp32 : None\n",
120 | "master_weights : None\n",
121 | "loss_scale : dynamic\n",
122 | "Processing user overrides (additional kwargs that are not None)...\n",
123 | "After processing overrides, optimization options are:\n",
124 | "enabled : True\n",
125 | "opt_level : O1\n",
126 | "cast_model_type : None\n",
127 | "patch_torch_functions : True\n",
128 | "keep_batchnorm_fp32 : None\n",
129 | "master_weights : None\n",
130 | "loss_scale : dynamic\n",
131 | "Avg execution time (ms): 0.009\n",
132 | "model: resnet34\n",
133 | "Avg execution time (ms): 0.017\n",
134 | "Avg execution time (ms): 0.017\n",
135 | "Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.\n",
136 | "\n",
137 | "Defaults for this optimization level are:\n",
138 | "enabled : True\n",
139 | "opt_level : O1\n",
140 | "cast_model_type : None\n",
141 | "patch_torch_functions : True\n",
142 | "keep_batchnorm_fp32 : None\n",
143 | "master_weights : None\n",
144 | "loss_scale : dynamic\n",
145 | "Processing user overrides (additional kwargs that are not None)...\n",
146 | "After processing overrides, optimization options are:\n",
147 | "enabled : True\n",
148 | "opt_level : O1\n",
149 | "cast_model_type : None\n",
150 | "patch_torch_functions : True\n",
151 | "keep_batchnorm_fp32 : None\n",
152 | "master_weights : None\n",
153 | "loss_scale : dynamic\n",
154 | "Avg execution time (ms): 0.017\n",
155 | "model: resnet50\n",
156 | "Avg execution time (ms): 0.019\n",
157 | "Avg execution time (ms): 0.019\n",
158 | "Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.\n",
159 | "\n",
160 | "Defaults for this optimization level are:\n",
161 | "enabled : True\n",
162 | "opt_level : O1\n",
163 | "cast_model_type : None\n",
164 | "patch_torch_functions : True\n",
165 | "keep_batchnorm_fp32 : None\n",
166 | "master_weights : None\n",
167 | "loss_scale : dynamic\n",
168 | "Processing user overrides (additional kwargs that are not None)...\n",
169 | "After processing overrides, optimization options are:\n",
170 | "enabled : True\n",
171 | "opt_level : O1\n",
172 | "cast_model_type : None\n",
173 | "patch_torch_functions : True\n",
174 | "keep_batchnorm_fp32 : None\n",
175 | "master_weights : None\n",
176 | "loss_scale : dynamic\n",
177 | "Avg execution time (ms): 0.020\n"
178 | ]
179 | }
180 | ],
181 | "source": [
182 | "for i, model_name in enumerate(modellist):\n",
183 | "\n",
184 | "    runtimes = []\n",
185 | "    \n",
186 | "    # define model\n",
187 | "    print(\"model: {}\".format(model_name))\n",
188 | "    mdl = globals()[model_name]\n",
189 | "    model = mdl()\n",
190 | "    \n",
191 | "    # Run FP32\n",
192 | "    runtimes.append(computeTime(model, input_size=[1, 3, 256, 256], device=\"cuda\", FP16=False))\n",
193 | "    # Run FP16\n",
194 | "    runtimes.append(computeTime(model, input_size=[1, 3, 256, 256], device=\"cuda\", FP16=True))\n",
195 | "    \n",
196 | "    # Amp Initialization\n",
197 | "    opt_level = 'O1'  # O1: mixed precision via automatic casts\n",
198 | "    mdl = globals()[model_name]\n",
199 | "    model = mdl().to(\"cuda\")\n",
200 | "    model = amp.initialize(model, opt_level=opt_level)\n",
201 | "    \n",
202 | "    # Run Apex AMP (FP16=False because AMP inserts the casts itself)\n",
203 | "    runtimes.append(computeTime(model, input_size=[1, 3, 256, 256], device=\"cuda\", FP16=False))\n",
204 | "    \n",
205 | "    if i == 0:\n",
206 | "        df = pd.DataFrame({model_name: runtimes},\n",
207 | "                          index = [\"FP32\", \"FP16_torch\", \"FP16_apex\"])\n",
208 | "    else:\n",
209 | "        df[model_name] = runtimes\n",
210 | "    "
211 | ]
212 | },
213 | {
214 | "cell_type": "code",
215 | "execution_count": 21,
216 | "metadata": {},
217 | "outputs": [
218 | {
219 | "data": {
220 | "text/html": [
221 | "<div>\n",
235 | "<table border=\"1\" class=\"dataframe\">\n",
236 | "  <thead>\n",
237 | "    <tr style=\"text-align: right;\">\n",
238 | "      <th></th>\n",
239 | "      <th>resnet18</th>\n",
240 | "      <th>resnet34</th>\n",
241 | "      <th>resnet50</th>\n",
242 | "    </tr>\n",
243 | "  </thead>\n",
244 | "  <tbody>\n",
245 | "    <tr>\n",
246 | "      <th>FP32</th>\n",
247 | "      <td>0.009891</td>\n",
248 | "      <td>0.016940</td>\n",
249 | "      <td>0.019452</td>\n",
250 | "    </tr>\n",
251 | "    <tr>\n",
252 | "      <th>FP16_torch</th>\n",
253 | "      <td>0.009034</td>\n",
254 | "      <td>0.017017</td>\n",
255 | "      <td>0.019448</td>\n",
256 | "    </tr>\n",
257 | "    <tr>\n",
258 | "      <th>FP16_apex</th>\n",
259 | "      <td>0.009057</td>\n",
260 | "      <td>0.016978</td>\n",
261 | "      <td>0.020046</td>\n",
262 | "    </tr>\n",
263 | "  </tbody>\n",
264 | "</table>\n",
265 | "</div>"
266 | ],
267 | "text/plain": [
268 | " resnet18 resnet34 resnet50\n",
269 | "FP32 0.009891 0.016940 0.019452\n",
270 | "FP16_torch 0.009034 0.017017 0.019448\n",
271 | "FP16_apex 0.009057 0.016978 0.020046"
272 | ]
273 | },
274 | "execution_count": 21,
275 | "metadata": {},
276 | "output_type": "execute_result"
277 | }
278 | ],
279 | "source": [
280 | "df.to_csv(\"results/fp16.csv\")\n",
281 | "df"
282 | ]
283 | },
284 | {
285 | "cell_type": "code",
286 | "execution_count": null,
287 | "metadata": {},
288 | "outputs": [],
289 | "source": []
290 | }
291 | ],
292 | "metadata": {
293 | "kernelspec": {
294 | "display_name": "Python 3",
295 | "language": "python",
296 | "name": "python3"
297 | },
298 | "language_info": {
299 | "codemirror_mode": {
300 | "name": "ipython",
301 | "version": 3
302 | },
303 | "file_extension": ".py",
304 | "mimetype": "text/x-python",
305 | "name": "python",
306 | "nbconvert_exporter": "python",
307 | "pygments_lexer": "ipython3",
308 | "version": "3.6.5"
309 | }
310 | },
311 | "nbformat": 4,
312 | "nbformat_minor": 2
313 | }
314 |
--------------------------------------------------------------------------------
/inference_batch_vs_imsize.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 8,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import numpy as np\n",
10 | "import torch\n",
11 | "import time\n",
12 | "from torchvision.models import *\n",
13 | "import pandas as pd\n",
14 | "import os"
15 | ]
16 | },
17 | {
18 | "cell_type": "code",
19 | "execution_count": 9,
20 | "metadata": {},
21 | "outputs": [],
22 | "source": [
23 | "# make models from str\n",
24 | "model_name = \"resnet18\"\n",
25 | "# make results\n",
26 | "os.makedirs(\"results\", exist_ok=True)"
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": 3,
32 | "metadata": {},
33 | "outputs": [],
34 | "source": [
35 | "def computeTime(model, input_size=[1, 3, 224, 224], device='cuda', FP16=False):\n",
36 | "    inputs = torch.randn(input_size)\n",
37 | "    if device == 'cuda':\n",
38 | "        model = model.cuda()\n",
39 | "        inputs = inputs.cuda()\n",
40 | "    if FP16:\n",
41 | "        model = model.half()\n",
42 | "        inputs = inputs.half()\n",
43 | "\n",
44 | "    model.eval()\n",
45 | "\n",
46 | "    i = 0\n",
47 | "    time_spent = []\n",
48 | "    while i < 200:\n",
49 | "        start_time = time.time()\n",
50 | "        with torch.no_grad():\n",
51 | "            _ = model(inputs)\n",
52 | "\n",
53 | "        if device == 'cuda':\n",
54 | "            torch.cuda.synchronize()  # wait for cuda to finish (cuda is asynchronous!)\n",
55 | "        if i != 0:  # skip the first (warm-up) iteration\n",
56 | "            time_spent.append(time.time() - start_time)\n",
57 | "        i += 1\n",
58 | "    print('Avg execution time (s): {:.3f}'.format(np.mean(time_spent)))  # time.time() deltas are in seconds\n",
59 | "    return np.mean(time_spent)"
60 | ]
61 | },
62 | {
63 | "cell_type": "code",
64 | "execution_count": 4,
65 | "metadata": {},
66 | "outputs": [],
67 | "source": [
68 | "modellist = [\"resnet18\", \"resnet34\", \"resnet50\", \"resnet101\", \"resnet152\", \\\n",
69 | " \"resnext50_32x4d\", \"resnext101_32x8d\", \"mnasnet1_0\", \"squeezenet1_0\", \"densenet121\", \"densenet169\", \"inception_v3\"]\n",
70 | "\n",
71 | "# resnet is enough for now\n",
72 | "modellist = [\"resnet18\", \"resnet34\", \"resnet50\"]"
73 | ]
74 | },
75 | {
76 | "cell_type": "code",
77 | "execution_count": 5,
78 | "metadata": {
79 | "scrolled": false
80 | },
81 | "outputs": [
82 | {
83 | "name": "stdout",
84 | "output_type": "stream",
85 | "text": [
86 | "model: resnet18\n",
87 | "Avg execution time (ms): 0.009\n",
88 | "Avg execution time (ms): 0.019\n",
89 | "Avg execution time (ms): 0.047\n",
90 | "Avg execution time (ms): 0.040\n",
91 | "Avg execution time (ms): 0.091\n",
92 | "Avg execution time (ms): 0.008\n",
93 | "Avg execution time (ms): 0.008\n",
94 | "Avg execution time (ms): 0.020\n",
95 | "Avg execution time (ms): 0.072\n",
96 | "model: resnet34\n",
97 | "Avg execution time (ms): 0.015\n",
98 | "Avg execution time (ms): 0.037\n",
99 | "Avg execution time (ms): 0.079\n",
100 | "Avg execution time (ms): 0.066\n",
101 | "Avg execution time (ms): 0.146\n",
102 | "Avg execution time (ms): 0.014\n",
103 | "Avg execution time (ms): 0.015\n",
104 | "Avg execution time (ms): 0.038\n",
105 | "Avg execution time (ms): 0.135\n",
106 | "model: resnet50\n",
107 | "Avg execution time (ms): 0.018\n",
108 | "Avg execution time (ms): 0.045\n",
109 | "Avg execution time (ms): 0.093\n",
110 | "Avg execution time (ms): 0.137\n",
111 | "Avg execution time (ms): 0.279\n",
112 | "Avg execution time (ms): 0.014\n",
113 | "Avg execution time (ms): 0.018\n",
114 | "Avg execution time (ms): 0.048\n",
115 | "Avg execution time (ms): 0.179\n"
116 | ]
117 | }
118 | ],
119 | "source": [
120 | "batchlist = [1, 4, 8, 16, 32]\n",
121 | "imsize = [128, 256, 512, 1024]\n",
122 | "\n",
123 | "for i, model_name in enumerate(modellist):\n",
124 | "\n",
125 | "    runtimes = []\n",
126 | "    \n",
127 | "    # define model\n",
128 | "    print(\"model: {}\".format(model_name))\n",
129 | "    mdl = globals()[model_name]\n",
130 | "    model = mdl()\n",
131 | "    \n",
132 | "    for batch in batchlist:\n",
133 | "        runtimes.append(computeTime(model, input_size=[batch, 3, 256, 256], device=\"cuda\", FP16=False)/batch)\n",
134 | "\n",
135 | "    if i == 0:\n",
136 | "        dfbatch = pd.DataFrame({model_name: runtimes},\n",
137 | "                               index = batchlist)\n",
138 | "    else:\n",
139 | "        dfbatch[model_name] = runtimes\n",
140 | "    \n",
141 | "    runtimes = []\n",
142 | "    for isize in imsize:\n",
143 | "        runtimes.append(computeTime(model, input_size=[1, 3, isize, isize], device=\"cuda\", FP16=False))\n",
144 | "\n",
145 | "    if i == 0:\n",
146 | "        dfimsize = pd.DataFrame({model_name: runtimes},\n",
147 | "                                index = imsize)\n",
148 | "    else:\n",
149 | "        dfimsize[model_name] = runtimes"
150 | ]
151 | },
152 | {
153 | "cell_type": "code",
154 | "execution_count": 13,
155 | "metadata": {},
156 | "outputs": [
157 | {
158 | "data": {
159 | "text/html": [
160 | "<div>\n",
174 | "<table border=\"1\" class=\"dataframe\">\n",
175 | "  <thead>\n",
176 | "    <tr style=\"text-align: right;\">\n",
177 | "      <th></th>\n",
178 | "      <th>resnet18</th>\n",
179 | "      <th>resnet34</th>\n",
180 | "      <th>resnet50</th>\n",
181 | "    </tr>\n",
182 | "  </thead>\n",
183 | "  <tbody>\n",
184 | "    <tr>\n",
185 | "      <th>1</th>\n",
186 | "      <td>0.008542</td>\n",
187 | "      <td>0.014813</td>\n",
188 | "      <td>0.017847</td>\n",
189 | "    </tr>\n",
190 | "    <tr>\n",
191 | "      <th>4</th>\n",
192 | "      <td>0.004840</td>\n",
193 | "      <td>0.009199</td>\n",
194 | "      <td>0.011361</td>\n",
195 | "    </tr>\n",
196 | "    <tr>\n",
197 | "      <th>8</th>\n",
198 | "      <td>0.005878</td>\n",
199 | "      <td>0.009894</td>\n",
200 | "      <td>0.011598</td>\n",
201 | "    </tr>\n",
202 | "    <tr>\n",
203 | "      <th>16</th>\n",
204 | "      <td>0.002495</td>\n",
205 | "      <td>0.004145</td>\n",
206 | "      <td>0.008592</td>\n",
207 | "    </tr>\n",
208 | "    <tr>\n",
209 | "      <th>32</th>\n",
210 | "      <td>0.002836</td>\n",
211 | "      <td>0.004564</td>\n",
212 | "      <td>0.008728</td>\n",
213 | "    </tr>\n",
214 | "  </tbody>\n",
215 | "</table>\n",
216 | "</div>"
217 | ],
218 | "text/plain": [
219 | " resnet18 resnet34 resnet50\n",
220 | "1 0.008542 0.014813 0.017847\n",
221 | "4 0.004840 0.009199 0.011361\n",
222 | "8 0.005878 0.009894 0.011598\n",
223 | "16 0.002495 0.004145 0.008592\n",
224 | "32 0.002836 0.004564 0.008728"
225 | ]
226 | },
227 | "execution_count": 13,
228 | "metadata": {},
229 | "output_type": "execute_result"
230 | }
231 | ],
232 | "source": [
233 | "dfbatch.to_csv(\"results/batch.csv\")\n",
234 | "dfimsize.to_csv(\"results/imsize.csv\")\n",
235 | "dfbatch"
236 | ]
237 | },
238 | {
239 | "cell_type": "code",
240 | "execution_count": 14,
241 | "metadata": {},
242 | "outputs": [
243 | {
244 | "data": {
245 | "text/html": [
246 | "<div>\n",
260 | "<table border=\"1\" class=\"dataframe\">\n",
261 | "  <thead>\n",
262 | "    <tr style=\"text-align: right;\">\n",
263 | "      <th></th>\n",
264 | "      <th>resnet18</th>\n",
265 | "      <th>resnet34</th>\n",
266 | "      <th>resnet50</th>\n",
267 | "    </tr>\n",
268 | "  </thead>\n",
269 | "  <tbody>\n",
270 | "    <tr>\n",
271 | "      <th>128</th>\n",
272 | "      <td>0.007869</td>\n",
273 | "      <td>0.014414</td>\n",
274 | "      <td>0.014149</td>\n",
275 | "    </tr>\n",
276 | "    <tr>\n",
277 | "      <th>256</th>\n",
278 | "      <td>0.007717</td>\n",
279 | "      <td>0.014684</td>\n",
280 | "      <td>0.017656</td>\n",
281 | "    </tr>\n",
282 | "    <tr>\n",
283 | "      <th>512</th>\n",
284 | "      <td>0.020139</td>\n",
285 | "      <td>0.037699</td>\n",
286 | "      <td>0.047882</td>\n",
287 | "    </tr>\n",
288 | "    <tr>\n",
289 | "      <th>1024</th>\n",
290 | "      <td>0.071687</td>\n",
291 | "      <td>0.134825</td>\n",
292 | "      <td>0.179115</td>\n",
293 | "    </tr>\n",
294 | "  </tbody>\n",
295 | "</table>\n",
296 | "</div>"
297 | ],
298 | "text/plain": [
299 | " resnet18 resnet34 resnet50\n",
300 | "128 0.007869 0.014414 0.014149\n",
301 | "256 0.007717 0.014684 0.017656\n",
302 | "512 0.020139 0.037699 0.047882\n",
303 | "1024 0.071687 0.134825 0.179115"
304 | ]
305 | },
306 | "execution_count": 14,
307 | "metadata": {},
308 | "output_type": "execute_result"
309 | }
310 | ],
311 | "source": [
312 | "dfimsize"
313 | ]
314 | },
315 | {
316 | "cell_type": "code",
317 | "execution_count": 15,
318 | "metadata": {},
319 | "outputs": [
320 | {
321 | "ename": "ModuleNotFoundError",
322 | "evalue": "No module named 'apex'",
323 | "output_type": "error",
324 | "traceback": [
325 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
326 | "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)",
327 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32mimport\u001b[0m \u001b[0mapex\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
328 | "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'apex'"
329 | ]
330 | }
331 | ],
332 | "source": [
333 | "import apex"
334 | ]
335 | },
336 | {
337 | "cell_type": "code",
338 | "execution_count": null,
339 | "metadata": {},
340 | "outputs": [],
341 | "source": []
342 | }
343 | ],
344 | "metadata": {
345 | "kernelspec": {
346 | "display_name": "Python 3",
347 | "language": "python",
348 | "name": "python3"
349 | },
350 | "language_info": {
351 | "codemirror_mode": {
352 | "name": "ipython",
353 | "version": 3
354 | },
355 | "file_extension": ".py",
356 | "mimetype": "text/x-python",
357 | "name": "python",
358 | "nbconvert_exporter": "python",
359 | "pygments_lexer": "ipython3",
360 | "version": "3.6.5"
361 | }
362 | },
363 | "nbformat": 4,
364 | "nbformat_minor": 2
365 | }
366 |
--------------------------------------------------------------------------------
/inference_classification.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 9,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import numpy as np\n",
10 | "import torch\n",
11 | "import time\n",
12 | "from torchvision.models import *\n",
13 | "import pandas as pd"
14 | ]
15 | },
16 | {
17 | "cell_type": "code",
18 | "execution_count": 2,
19 | "metadata": {},
20 | "outputs": [],
21 | "source": [
22 | "# make models from str\n",
23 | "model_name = \"resnet18\""
24 | ]
25 | },
26 | {
27 | "cell_type": "code",
28 | "execution_count": 7,
29 | "metadata": {},
30 | "outputs": [],
31 | "source": [
32 | "def computeTime(model, input_size=[1, 3, 224, 224], device='cuda', FP16=False):\n",
33 | "    inputs = torch.randn(input_size)\n",
34 | "    if device == 'cuda':\n",
35 | "        model = model.cuda()\n",
36 | "        inputs = inputs.cuda()\n",
37 | "    if FP16:\n",
38 | "        model = model.half()\n",
39 | "        inputs = inputs.half()\n",
40 | "\n",
41 | "    model.eval()\n",
42 | "\n",
43 | "    i = 0\n",
44 | "    time_spent = []\n",
45 | "    while i < 200:\n",
46 | "        start_time = time.time()\n",
47 | "        with torch.no_grad():\n",
48 | "            _ = model(inputs)\n",
49 | "\n",
50 | "        if device == 'cuda':\n",
51 | "            torch.cuda.synchronize()  # wait for cuda to finish (cuda is asynchronous!)\n",
52 | "        if i != 0:  # skip the first (warm-up) iteration\n",
53 | "            time_spent.append(time.time() - start_time)\n",
54 | "        i += 1\n",
55 | "    print('Avg execution time (s): {:.3f}'.format(np.mean(time_spent)))  # time.time() deltas are in seconds\n",
56 | "    return np.mean(time_spent)"
57 | ]
58 | },
59 | {
60 | "cell_type": "code",
61 | "execution_count": 38,
62 | "metadata": {},
63 | "outputs": [],
64 | "source": [
65 | "modellist = [\"resnet18\", \"resnet34\", \"resnet50\", \"resnet101\", \"resnet152\", \\\n",
66 | " \"resnext50_32x4d\", \"resnext101_32x8d\", \"mnasnet1_0\", \"squeezenet1_0\", \"densenet121\", \"densenet169\", \"inception_v3\"]\n",
67 | "\n",
68 | "modellist = [\"resnet18\", \"resnet34\", \"resnet50\", \"resnet101\", \"resnet152\"]"
69 | ]
70 | },
71 | {
72 | "cell_type": "code",
73 | "execution_count": null,
74 | "metadata": {
75 | "scrolled": false
76 | },
77 | "outputs": [
78 | {
79 | "name": "stdout",
80 | "output_type": "stream",
81 | "text": [
82 | "model: resnet18\n",
83 | "Avg execution time (ms): 0.008\n",
84 | "Avg execution time (ms): 0.019\n",
85 | "Avg execution time (ms): 0.046\n",
86 | "Avg execution time (ms): 0.040\n"
87 | ]
88 | }
89 | ],
90 | "source": [
91 | "for i, model_name in enumerate(modellist):\n",
92 | "    batchlist = [1, 4, 8, 16, 32]\n",
93 | "    imsize = [128, 256, 512]\n",
94 | "    runtimes = []\n",
95 | "    \n",
96 | "    # define model\n",
97 | "    print(\"model: {}\".format(model_name))\n",
98 | "    mdl = globals()[model_name]\n",
99 | "    model = mdl()\n",
100 | "    \n",
101 | "    for batch in batchlist:\n",
102 | "        runtimes.append(computeTime(model, input_size=[batch, 3, 256, 256], device=\"cuda\", FP16=False)/batch)\n",
103 | "\n",
104 | "    if i == 0:\n",
105 | "        dfbatch = pd.DataFrame({model_name: runtimes},\n",
106 | "                               index = batchlist)\n",
107 | "    else:\n",
108 | "        dfbatch[model_name] = runtimes\n",
109 | "    \n",
110 | "    runtimes = []\n",
111 | "    for isize in imsize:\n",
112 | "        print(\"model: {}\".format(model_name))\n",
113 | "        mdl = globals()[model_name]\n",
114 | "        model = mdl()\n",
115 | "        runtimes.append(computeTime(model, input_size=[1, 3, isize, isize], device=\"cuda\", FP16=False))\n",
116 | "\n",
117 | "    if i == 0:\n",
118 | "        dfimsize = pd.DataFrame({model_name: runtimes},\n",
119 | "                                index = imsize)\n",
120 | "    else:\n",
121 | "        dfimsize[model_name] = runtimes"
122 | ]
123 | },
124 | {
125 | "cell_type": "code",
126 | "execution_count": 37,
127 | "metadata": {},
128 | "outputs": [
129 | {
130 | "data": {
199 | "text/plain": [
200 | " resnet18 resnet34 resnet50 resnet101 resnet152 \\\n",
201 | "cuda FP32 0.006895 0.013512 0.016632 0.032939 0.048400 \n",
202 | "cuda FP16 0.007969 0.015316 0.017940 0.035898 0.052364 \n",
203 | "\n",
204 | " resnext50_32x4d resnext101_32x8d mnasnet1_0 squeezenet1_0 \\\n",
205 | "cuda FP32 0.033309 0.118709 0.007704 0.004120 \n",
206 | "cuda FP16 0.033756 0.114106 0.007777 0.003966 \n",
207 | "\n",
208 | " densenet121 densenet169 inception_v3 \n",
209 | "cuda FP32 0.025061 0.037639 0.027673 \n",
210 | "cuda FP16 0.022630 0.034838 0.030210 "
211 | ]
212 | },
213 | "execution_count": 37,
214 | "metadata": {},
215 | "output_type": "execute_result"
216 | }
217 | ],
218 | "source": [
219 | "df"
220 | ]
221 | },
222 | {
223 | "cell_type": "code",
224 | "execution_count": null,
225 | "metadata": {},
226 | "outputs": [],
227 | "source": []
228 | }
229 | ],
230 | "metadata": {
231 | "kernelspec": {
232 | "display_name": "Python 3",
233 | "language": "python",
234 | "name": "python3"
235 | },
236 | "language_info": {
237 | "codemirror_mode": {
238 | "name": "ipython",
239 | "version": 3
240 | },
241 | "file_extension": ".py",
242 | "mimetype": "text/x-python",
243 | "name": "python",
244 | "nbconvert_exporter": "python",
245 | "pygments_lexer": "ipython3",
246 | "version": "3.6.5"
247 | }
248 | },
249 | "nbformat": 4,
250 | "nbformat_minor": 2
251 | }
252 |
--------------------------------------------------------------------------------
/inference_dev.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 9,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import numpy as np\n",
10 | "import torch\n",
11 | "import time\n",
12 | "from torchvision.models import *\n",
13 | "import pandas as pd"
14 | ]
15 | },
16 | {
17 | "cell_type": "code",
18 | "execution_count": 2,
19 | "metadata": {},
20 | "outputs": [],
21 | "source": [
22 | "# make models from str\n",
23 | "model_name = \"resnet18\""
24 | ]
25 | },
26 | {
27 | "cell_type": "code",
28 | "execution_count": 7,
29 | "metadata": {},
30 | "outputs": [],
31 | "source": [
32 | "def computeTime(model, input_size=[1, 3, 224, 224], device='cuda', FP16=False):\n",
33 | "    inputs = torch.randn(input_size)\n",
34 | "    if device == 'cuda':\n",
35 | "        model = model.cuda()\n",
36 | "        inputs = inputs.cuda()\n",
37 | "    if FP16:\n",
38 | "        model = model.half()\n",
39 | "        inputs = inputs.half()\n",
40 | "\n",
41 | "    model.eval()\n",
42 | "\n",
43 | "    i = 0\n",
44 | "    time_spent = []\n",
45 | "    while i < 200:\n",
46 | "        start_time = time.time()\n",
47 | "        with torch.no_grad():\n",
48 | "            _ = model(inputs)\n",
49 | "\n",
50 | "        if device == 'cuda':\n",
51 | "            torch.cuda.synchronize() # wait for cuda to finish (cuda is asynchronous!)\n",
52 | "        if i != 0:\n",
53 | "            time_spent.append(time.time() - start_time)\n",
54 | "        i += 1\n",
55 | "    print('Avg execution time (s): {:.3f}'.format(np.mean(time_spent)))\n",
56 | "    return np.mean(time_spent)"
57 | ]
58 | },
59 | {
60 | "cell_type": "code",
61 | "execution_count": 38,
62 | "metadata": {},
63 | "outputs": [],
64 | "source": [
65 | "modellist = [\"resnet18\", \"resnet34\", \"resnet50\", \"resnet101\", \"resnet152\", \\\n",
66 | " \"resnext50_32x4d\", \"resnext101_32x8d\", \"mnasnet1_0\", \"squeezenet1_0\", \"densenet121\", \"densenet169\", \"inception_v3\"]\n",
67 | "\n",
68 | "modellist = [\"resnet18\", \"resnet34\", \"resnet50\", \"resnet101\", \"resnet152\"]"
69 | ]
70 | },
71 | {
72 | "cell_type": "code",
73 | "execution_count": null,
74 | "metadata": {
75 | "scrolled": false
76 | },
77 | "outputs": [
78 | {
79 | "name": "stdout",
80 | "output_type": "stream",
81 | "text": [
82 | "model: resnet18\n",
83 | "Avg execution time (ms): 0.008\n",
84 | "Avg execution time (ms): 0.019\n",
85 | "Avg execution time (ms): 0.046\n",
86 | "Avg execution time (ms): 0.040\n"
87 | ]
88 | }
89 | ],
90 | "source": [
91 | "for i, model_name in enumerate(modellist):\n",
92 | "    batchlist = [1, 4, 8, 16, 32]\n",
93 | "    imsize = [128, 256, 512]\n",
94 | "    runtimes = []\n",
95 | "    \n",
96 | "    # define model\n",
97 | "    print(\"model: {}\".format(model_name))\n",
98 | "    mdl = globals()[model_name]\n",
99 | "    model = mdl()\n",
100 | "    \n",
101 | "    for batch in batchlist:\n",
102 | "        runtimes.append(computeTime(model, input_size=[batch, 3, 256, 256], device=\"cuda\", FP16=False)/batch)\n",
103 | "\n",
104 | "    if i == 0:\n",
105 | "        dfbatch = pd.DataFrame({model_name: runtimes},\n",
106 | "                               index = batchlist)\n",
107 | "    else:\n",
108 | "        dfbatch[model_name] = runtimes\n",
109 | "    \n",
110 | "    runtimes = []\n",
111 | "    for isize in imsize:\n",
112 | "        print(\"model: {}\".format(model_name))\n",
113 | "        mdl = globals()[model_name]\n",
114 | "        model = mdl()\n",
115 | "        runtimes.append(computeTime(model, input_size=[1, 3, isize, isize], device=\"cuda\", FP16=False))\n",
116 | "\n",
117 | "    if i == 0:\n",
118 | "        dfimsize = pd.DataFrame({model_name: runtimes},\n",
119 | "                                index = imsize)\n",
120 | "    else:\n",
121 | "        dfimsize[model_name] = runtimes"
122 | ]
123 | },
124 | {
125 | "cell_type": "code",
126 | "execution_count": 37,
127 | "metadata": {},
128 | "outputs": [
129 | {
130 | "data": {
199 | "text/plain": [
200 | " resnet18 resnet34 resnet50 resnet101 resnet152 \\\n",
201 | "cuda FP32 0.006895 0.013512 0.016632 0.032939 0.048400 \n",
202 | "cuda FP16 0.007969 0.015316 0.017940 0.035898 0.052364 \n",
203 | "\n",
204 | " resnext50_32x4d resnext101_32x8d mnasnet1_0 squeezenet1_0 \\\n",
205 | "cuda FP32 0.033309 0.118709 0.007704 0.004120 \n",
206 | "cuda FP16 0.033756 0.114106 0.007777 0.003966 \n",
207 | "\n",
208 | " densenet121 densenet169 inception_v3 \n",
209 | "cuda FP32 0.025061 0.037639 0.027673 \n",
210 | "cuda FP16 0.022630 0.034838 0.030210 "
211 | ]
212 | },
213 | "execution_count": 37,
214 | "metadata": {},
215 | "output_type": "execute_result"
216 | }
217 | ],
218 | "source": [
219 | "df"
220 | ]
221 | },
222 | {
223 | "cell_type": "code",
224 | "execution_count": null,
225 | "metadata": {},
226 | "outputs": [],
227 | "source": []
228 | }
229 | ],
230 | "metadata": {
231 | "kernelspec": {
232 | "display_name": "Python 3",
233 | "language": "python",
234 | "name": "python3"
235 | },
236 | "language_info": {
237 | "codemirror_mode": {
238 | "name": "ipython",
239 | "version": 3
240 | },
241 | "file_extension": ".py",
242 | "mimetype": "text/x-python",
243 | "name": "python",
244 | "nbconvert_exporter": "python",
245 | "pygments_lexer": "ipython3",
246 | "version": "3.6.5"
247 | }
248 | },
249 | "nbformat": 4,
250 | "nbformat_minor": 2
251 | }
252 |
--------------------------------------------------------------------------------
/inference_segmentation.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "import numpy as np\n",
10 | "import torch\n",
11 | "import time\n",
12 | "from torchvision.models import *\n",
13 | "import pandas as pd\n",
14 | "import os\n",
15 | "import torchvision\n",
16 | "from torch2trt import torch2trt"
17 | ]
18 | },
19 | {
20 | "cell_type": "code",
21 | "execution_count": 2,
22 | "metadata": {},
23 | "outputs": [],
24 | "source": [
25 | "from torchvision.models.segmentation import *"
26 | ]
27 | },
28 | {
29 | "cell_type": "code",
30 | "execution_count": 3,
31 | "metadata": {},
32 | "outputs": [],
33 | "source": [
34 | "FP32 = True\n",
35 | "FP16 = True\n",
36 | "INT8 = True"
37 | ]
38 | },
39 | {
40 | "cell_type": "code",
41 | "execution_count": 4,
42 | "metadata": {},
43 | "outputs": [],
44 | "source": [
45 | "# make results\n",
46 | "os.makedirs(\"results\", exist_ok=True)"
47 | ]
48 | },
49 | {
50 | "cell_type": "code",
51 | "execution_count": 5,
52 | "metadata": {},
53 | "outputs": [],
54 | "source": [
55 | "class ModelWrapper(torch.nn.Module):\n",
56 | "    def __init__(self, model):\n",
57 | "        super(ModelWrapper, self).__init__()\n",
58 | "        self.model = model\n",
59 | "    def forward(self, x):\n",
60 | "        return self.model(x)['out']"
61 | ]
62 | },
63 | {
64 | "cell_type": "code",
65 | "execution_count": 6,
66 | "metadata": {},
67 | "outputs": [],
68 | "source": [
69 | "def computeTime(model, input_size=[1, 3, 224, 224], device='cuda', FP16=False):\n",
70 | "    inputs = torch.randn(input_size)\n",
71 | "    if device == 'cuda':\n",
72 | "        model = model.cuda()\n",
73 | "        inputs = inputs.cuda()\n",
74 | "    if FP16:\n",
75 | "        model = model.half()\n",
76 | "        inputs = inputs.half()\n",
77 | "\n",
78 | "    model.eval()\n",
79 | "\n",
80 | "    i = 0\n",
81 | "    time_spent = []\n",
82 | "    while i < 200:\n",
83 | "        start_time = time.time()\n",
84 | "        with torch.no_grad():\n",
85 | "            _ = model(inputs)\n",
86 | "\n",
87 | "        if device == 'cuda':\n",
88 | "            torch.cuda.synchronize() # wait for cuda to finish (cuda is asynchronous!)\n",
89 | "        if i != 0:\n",
90 | "            time_spent.append(time.time() - start_time)\n",
91 | "        i += 1\n",
92 | "    print('Avg execution time (s): {:.3f}'.format(np.mean(time_spent)))\n",
93 | "    return np.mean(time_spent)"
94 | ]
95 | },
96 | {
97 | "cell_type": "code",
98 | "execution_count": 7,
99 | "metadata": {},
100 | "outputs": [],
101 | "source": [
102 | "# resnet is enough for now\n",
103 | "modellist = [\"fcn_resnet50\", \"fcn_resnet101\", \"deeplabv3_resnet50\", \"deeplabv3_resnet101\"]"
104 | ]
105 | },
106 | {
107 | "cell_type": "code",
108 | "execution_count": 9,
109 | "metadata": {
110 | "scrolled": true
111 | },
112 | "outputs": [
113 | {
114 | "name": "stdout",
115 | "output_type": "stream",
116 | "text": [
117 | "model: fcn_resnet50\n",
118 | "Avg execution time (ms): 0.205\n",
119 | "Avg execution time (ms): 0.174\n",
120 | "running fp16 models..\n",
121 | "Avg execution time (ms): 0.037\n",
122 | "running int8 models..\n",
123 | "Avg execution time (ms): 0.022\n",
124 | "model: fcn_resnet101\n"
125 | ]
126 | },
127 | {
128 | "name": "stderr",
129 | "output_type": "stream",
130 | "text": [
131 | "Downloading: \"https://download.pytorch.org/models/resnet101-5d3b4d8f.pth\" to /home/ken/.cache/torch/checkpoints/resnet101-5d3b4d8f.pth\n",
132 | "100.0%\n"
133 | ]
134 | },
135 | {
136 | "name": "stdout",
137 | "output_type": "stream",
138 | "text": [
139 | "Avg execution time (ms): 0.344\n",
140 | "Avg execution time (ms): 0.290\n",
141 | "running fp16 models..\n",
142 | "Avg execution time (ms): 0.057\n",
143 | "running int8 models..\n",
144 | "Avg execution time (ms): 0.032\n",
145 | "model: deeplabv3_resnet50\n",
146 | "Avg execution time (ms): 0.281\n",
147 | "Avg execution time (ms): 0.252\n",
148 | "running fp16 models..\n",
149 | "Avg execution time (ms): 0.130\n",
150 | "running int8 models..\n",
151 | "Avg execution time (ms): 0.097\n",
152 | "model: deeplabv3_resnet101\n",
153 | "Avg execution time (ms): 0.426\n",
154 | "Avg execution time (ms): 0.367\n",
155 | "running fp16 models..\n",
156 | "Avg execution time (ms): 0.151\n",
157 | "running int8 models..\n",
158 | "Avg execution time (ms): 0.108\n"
159 | ]
160 | }
161 | ],
162 | "source": [
163 | "results = []\n",
164 | "for i, model_name in enumerate(modellist):\n",
165 | "    runtimes = []\n",
166 | "\n",
167 | "    # define model\n",
168 | "    print(\"model: {}\".format(model_name))\n",
169 | "    input_size = [1, 3, 512, 512]\n",
170 | "    mdl = globals()[model_name]\n",
171 | "    model = mdl().cuda().eval()\n",
172 | "    # Run raw models\n",
173 | "    runtimes.append(computeTime(model, input_size=input_size, device=\"cuda\", FP16=False))\n",
174 | "\n",
175 | "    if FP32:\n",
176 | "        mdl = globals()[model_name]\n",
177 | "        model = mdl().cuda().eval()\n",
178 | "        model_w = ModelWrapper(model)\n",
179 | "        x = torch.zeros(input_size).cuda()\n",
180 | "\n",
181 | "        # convert to tensorrt models\n",
182 | "        model_trt = torch2trt(model_w, [x])\n",
183 | "\n",
184 | "        # Run TensorRT models\n",
185 | "        runtimes.append(computeTime(model_trt, input_size=input_size, device=\"cuda\", FP16=False))\n",
186 | "    if FP16:\n",
187 | "        print(\"running fp16 models..\")\n",
188 | "        # Make FP16 tensorRT models\n",
189 | "        mdl = globals()[model_name]\n",
190 | "        model = mdl().eval().half().cuda()\n",
191 | "        model_w = ModelWrapper(model).half()\n",
192 | "        x = torch.zeros(input_size).half().cuda()\n",
193 | "        # convert to tensorrt models\n",
194 | "        model_trt = torch2trt(model_w, [x], fp16_mode=True)\n",
195 | "        # Run TensorRT models\n",
196 | "        runtimes.append(computeTime(model_trt, input_size=input_size, device=\"cuda\", FP16=True))\n",
197 | "\n",
198 | "    if INT8:\n",
199 | "        print(\"running int8 models..\")\n",
200 | "        # Make INT8 tensorRT models\n",
201 | "        mdl = globals()[model_name]\n",
202 | "        model = mdl().eval().half().cuda()\n",
203 | "        model_w = ModelWrapper(model).half()\n",
204 | "        x = torch.randn(input_size).half().cuda()\n",
205 | "        # convert to tensorrt models\n",
206 | "        model_trt = torch2trt(model_w, [x], fp16_mode=True, int8_mode=True, max_batch_size=1)\n",
207 | "\n",
208 | "        runtimes.append(computeTime(model_trt, input_size=input_size, device=\"cuda\", FP16=True))\n",
209 | "\n",
210 | "    if i == 0:\n",
211 | "        df = pd.DataFrame({model_name: runtimes},\n",
212 | "                          index = [\"Raw\", \"FP32\", \"FP16\", \"INT8\"])\n",
213 | "    else:\n",
214 | "        df[model_name] = runtimes"
215 | ]
216 | },
217 | {
218 | "cell_type": "code",
219 | "execution_count": 10,
220 | "metadata": {},
221 | "outputs": [
222 | {
223 | "data": {
282 | "text/plain": [
283 | " fcn_resnet50 fcn_resnet101 deeplabv3_resnet50 deeplabv3_resnet101\n",
284 | "Raw 0.205359 0.344307 0.281023 0.425960\n",
285 | "FP32 0.173818 0.290180 0.252314 0.366532\n",
286 | "FP16 0.036635 0.056922 0.129868 0.151195\n",
287 | "INT8 0.021869 0.032292 0.097351 0.108282"
288 | ]
289 | },
290 | "execution_count": 10,
291 | "metadata": {},
292 | "output_type": "execute_result"
293 | }
294 | ],
295 | "source": [
296 | "df.to_csv(\"results/xavier_segmentation.csv\")\n",
297 | "df"
298 | ]
299 | },
300 | {
301 | "cell_type": "code",
302 | "execution_count": null,
303 | "metadata": {},
304 | "outputs": [],
305 | "source": []
306 | }
307 | ],
308 | "metadata": {
309 | "kernelspec": {
310 | "display_name": "Python 3",
311 | "language": "python",
312 | "name": "python3"
313 | },
314 | "language_info": {
315 | "codemirror_mode": {
316 | "name": "ipython",
317 | "version": 3
318 | },
319 | "file_extension": ".py",
320 | "mimetype": "text/x-python",
321 | "name": "python",
322 | "nbconvert_exporter": "python",
323 | "pygments_lexer": "ipython3",
324 | "version": "3.6.9"
325 | }
326 | },
327 | "nbformat": 4,
328 | "nbformat_minor": 2
329 | }
330 |
--------------------------------------------------------------------------------
/inference_tensorrt.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# coding: utf-8

import numpy as np
import torch
import time
from torchvision.models import *
from utils.fp16 import network_to_half
import os
from torch2trt import torch2trt
import pandas as pd

# Precision modes to benchmark. The DataFrame index built below has four
# labels, so it only matches `runtimes` when all three flags are True.
FP32 = True
FP16 = True
INT8 = True

# make results
os.makedirs("results", exist_ok=True)


def computeTime(model, input_size=[1, 3, 224, 224], device='cuda', FP16=False):
    """Run 200 forward passes and return the mean time per pass in seconds."""
    inputs = torch.randn(input_size)
    if device == 'cuda':
        inputs = inputs.cuda()
    if FP16:
        model = network_to_half(model)

    i = 0
    time_spent = []
    while i < 200:
        start_time = time.time()
        with torch.no_grad():
            _ = model(inputs)

        if device == 'cuda':
            torch.cuda.synchronize()  # wait for cuda to finish (cuda is asynchronous!)
        if i != 0:
            # the first iteration is discarded as warm-up
            time_spent.append(time.time() - start_time)
        i += 1
    # time.time() differences are in seconds, so report seconds
    avg_time = np.mean(time_spent)
    print('Avg execution time (s): {:.3f}'.format(avg_time))
    return avg_time


modellist = ["resnet18", "resnet34", "resnet50", "resnet101", "resnet152", "resnext50_32x4d", "resnext101_32x8d", "mnasnet1_0", "squeezenet1_0", "densenet121", "densenet169", "inception_v3"]

# resnet is enough for now
modellist = ["resnet18", "resnet34", "resnet50"]
results = []

for i, model_name in enumerate(modellist):
    runtimes = []

    # define model and input
    print("model: {}".format(model_name))
    input_size = [1, 3, 256, 256]
    mdl = globals()[model_name]
    model = mdl().cuda().eval()
    # Run raw models
    runtimes.append(computeTime(model, input_size=input_size, device="cuda", FP16=False))

    if FP32:
        mdl = globals()[model_name]
        model = mdl().cuda().eval()
        x = torch.zeros(input_size).cuda()

        # convert to tensorrt models
        model_trt = torch2trt(model, [x])

        # Run TensorRT models
        runtimes.append(computeTime(model_trt, input_size=input_size, device="cuda", FP16=False))
    if FP16:
        print("running fp16 models..")
        # Make FP16 tensorRT models
        mdl = globals()[model_name]
        model = mdl().eval().half().cuda()
        x = torch.zeros(input_size).half().cuda()
        # convert to tensorrt models
        model_trt = torch2trt(model, [x], fp16_mode=True)
        # Run TensorRT models
        runtimes.append(computeTime(model_trt, input_size=input_size, device="cuda", FP16=True))

    if INT8:
        print("running int8 models..")
        # Make INT8 tensorRT models
        mdl = globals()[model_name]
        model = mdl().eval().half().cuda()
        x = torch.randn(input_size).half().cuda()
        # convert to tensorrt models
        model_trt = torch2trt(model, [x], fp16_mode=True, int8_mode=True, max_batch_size=1)
        # Run TensorRT models
        runtimes.append(computeTime(model_trt, input_size=input_size, device="cuda", FP16=True))

    # record each model once, after all precision modes have run
    results.append({model_name: runtimes})

    if i == 0:
        df = pd.DataFrame({model_name: runtimes},
                          index=["Raw", "FP32", "FP16", "INT8"])
    else:
        df[model_name] = runtimes

df.to_csv("results/xavier.csv")
print(df)
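The timing loop in `computeTime` can be factored into a small reusable helper. The sketch below is a hypothetical `time_callable` that is not part of this repository: its name, parameters and defaults are assumptions made for illustration. It uses `time.perf_counter`, discards warm-up iterations, and calls an optional `sync` hook before reading the clock. It depends only on the standard library, so it can be tried without a GPU.

```python
import time
from statistics import mean


def time_callable(fn, n_iters=200, n_warmup=1, sync=None):
    """Return the mean wall-clock time of fn() in seconds.

    Hypothetical helper for illustration; not part of the repository.
    """
    if n_iters <= 0:
        raise ValueError("n_iters must be positive")
    samples = []
    for i in range(n_warmup + n_iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # e.g. a device synchronize call, so queued work is counted
        elapsed = time.perf_counter() - start
        if i >= n_warmup:  # skip warm-up runs, as computeTime skips iteration 0
            samples.append(elapsed)
    return mean(samples)


if __name__ == "__main__":
    avg_s = time_callable(lambda: sum(range(10_000)), n_iters=50)
    # Convert seconds to milliseconds explicitly before labelling the value.
    print("Avg execution time (ms): {:.3f}".format(avg_s * 1000))
```

For a CUDA model, one would presumably pass something like `fn=lambda: model(inputs)` and `sync=torch.cuda.synchronize`, with the call wrapped in `torch.no_grad()` as in the script.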
--------------------------------------------------------------------------------
/results/batch.csv:
--------------------------------------------------------------------------------
1 | ,resnet18,resnet34,resnet50
2 | 1,0.008542266922380456,0.01481275103200021,0.0178466082817346
3 | 4,0.004840066684550376,0.0091991511421587,0.011360873229539574
4 | 8,0.005877630195425983,0.009893905727108519,0.011597911616665634
5 | 16,0.0024946156009357776,0.004145186525493411,0.008591583475994704
6 | 32,0.002836233557169162,0.004563913123691502,0.008728343338223558
7 |
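The values in this file look like per-image times in seconds: the notebook loop divides each `computeTime` result by the batch size before storing it in `dfbatch`. Under that assumption, throughput in images per second is the reciprocal of each entry. The helper name `images_per_second` below is hypothetical, and the rows are copied from the CSV above.

```python
import csv
import io

# Rows copied from results/batch.csv (the index column is the batch size).
BATCH_CSV = """,resnet18,resnet34,resnet50
1,0.008542266922380456,0.01481275103200021,0.0178466082817346
4,0.004840066684550376,0.0091991511421587,0.011360873229539574
8,0.005877630195425983,0.009893905727108519,0.011597911616665634
16,0.0024946156009357776,0.004145186525493411,0.008591583475994704
32,0.002836233557169162,0.004563913123691502,0.008728343338223558
"""


def images_per_second(csv_text):
    """Map batch size -> {model: images/s}, assuming entries are per-image seconds."""
    reader = csv.reader(io.StringIO(csv_text))
    models = next(reader)[1:]  # first header cell is the empty index label
    throughput = {}
    for row in reader:
        if not row:
            continue  # tolerate blank trailing lines
        batch = int(row[0])
        throughput[batch] = {m: 1.0 / float(v) for m, v in zip(models, row[1:])}
    return throughput


if __name__ == "__main__":
    for batch, per_model in images_per_second(BATCH_CSV).items():
        print(batch, {m: round(v, 1) for m, v in per_model.items()})
```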
--------------------------------------------------------------------------------
/results/fp16.csv:
--------------------------------------------------------------------------------
1 | ,resnet18,resnet34,resnet50
2 | FP32,0.00989055154311597,0.01693966640299888,0.019452313082901077
3 | FP16_torch,0.009034480281810664,0.017017085348541412,0.019448458848886154
4 | FP16_apex,0.009056915589912453,0.016978000276651813,0.020045562006121304
5 |
--------------------------------------------------------------------------------
/results/imsize.csv:
--------------------------------------------------------------------------------
1 | ,resnet18,resnet34,resnet50
2 | 128,0.007868697295835869,0.014414251749239975,0.01414928244585967
3 | 256,0.007717028335111225,0.014684499807693251,0.017655759600538706
4 | 512,0.02013937672178949,0.03769871218120632,0.04788235084495353
5 | 1024,0.07168652424261199,0.13482510863836086,0.1791151432535756
6 |
--------------------------------------------------------------------------------
/results/jetsonano.txt:
--------------------------------------------------------------------------------
1 | Without TensorRT FP32/FP16
2 | [{'resnet18': [0.038558399257947455, 0.037869752951003796]}, {'resnet34': [0.06386453542278041, 0.06943798424610541]}, {'resnet50': [0.0906546499261904, 0.08942561532983828]}]
3 |
4 | With TensorRT FP32/FP16
5 | [{'resnet18': [0.02693739967729578, 0.030103209030688107]}, {'resnet34': [0.047432633500602374, 0.046157992664893066]}, {'resnet50': [0.07816180631743004, 0.07399397758982289]}]
6 |
7 |
8 |
--------------------------------------------------------------------------------
/results/xavier.csv:
--------------------------------------------------------------------------------
1 | ,resnet18,resnet34,resnet50
2 | Raw,0.007432765098073375,0.011262918836507365,0.015097524652529002
3 | FP32,0.003625017913741682,0.005848102234116751,0.010952574523849104
4 | FP16,0.0017815654601284008,0.004464829986418911,0.0041645591582485176
5 | INT8,0.0018363609984891499,0.004203463319557995,0.0030628951949689853
6 |
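To compare precision modes, each row can be expressed as a speedup over the `Raw` (plain PyTorch) row by dividing the raw time by the time for that mode. The sketch below does this with the standard library only. The function name `speedup_vs_baseline` is hypothetical, and the rows are copied from the CSV above.

```python
import csv
import io

# Rows copied from results/xavier.csv (mean seconds per forward pass).
XAVIER_CSV = """,resnet18,resnet34,resnet50
Raw,0.007432765098073375,0.011262918836507365,0.015097524652529002
FP32,0.003625017913741682,0.005848102234116751,0.010952574523849104
FP16,0.0017815654601284008,0.004464829986418911,0.0041645591582485176
INT8,0.0018363609984891499,0.004203463319557995,0.0030628951949689853
"""


def speedup_vs_baseline(csv_text, baseline="Raw"):
    """Return {mode: {model: baseline_time / mode_time}}."""
    reader = csv.reader(io.StringIO(csv_text))
    models = next(reader)[1:]  # first header cell is the empty index label
    times = {row[0]: [float(v) for v in row[1:]] for row in reader if row}
    base = times[baseline]
    return {
        mode: {m: b / t for m, b, t in zip(models, base, vals)}
        for mode, vals in times.items()
    }


if __name__ == "__main__":
    for mode, per_model in speedup_vs_baseline(XAVIER_CSV).items():
        print(mode, {m: round(s, 2) for m, s in per_model.items()})
```

Values above 1 mean the mode is faster than the raw model. The same function should work on `results/xavier_segmentation.csv`, since it uses the same row labels.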
--------------------------------------------------------------------------------
/results/xavier_segmentation.csv:
--------------------------------------------------------------------------------
1 | ,fcn_resnet50,fcn_resnet101,deeplabv3_resnet50,deeplabv3_resnet101
2 | Raw,0.20535858312443872,0.34430714108836113,0.2810230279088619,0.425959756026915
3 | FP32,0.17381846126000486,0.29017985765658433,0.2523143626936716,0.3665324635242098
4 | FP16,0.03663515685191705,0.05692227521733423,0.12986798621901316,0.1511950037587228
5 | INT8,0.021869432986082144,0.032291673535677655,0.09735102749350083,0.10828216591073041
6 |
--------------------------------------------------------------------------------
/utils/__pycache__/fp16.cpython-36.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/utils/__pycache__/fp16.cpython-36.pyc
--------------------------------------------------------------------------------
/utils/fp16.py:
--------------------------------------------------------------------------------
import torch
import torch.nn as nn
# codes from https://github.com/fastai/imagenet-fast/tree/master/cifar10

class tofp16(nn.Module):
    def __init__(self):
        super(tofp16, self).__init__()

    def forward(self, input):
        return input.half()


def copy_in_params(net, params):
    net_params = list(net.parameters())
    for i in range(len(params)):
        net_params[i].data.copy_(params[i].data)


def set_grad(params, params_with_grad):

    for param, param_w_grad in zip(params, params_with_grad):
        if param.grad is None:
            param.grad = torch.nn.Parameter(param.data.new().resize_(*param.data.size()))
        param.grad.data.copy_(param_w_grad.grad.data)


def BN_convert_float(module):
    '''
    BatchNorm layers to have parameters in single precision.
    Find all layers and convert them back to float. This can't
    be done with built in .apply as that function will apply
    fn to all modules, parameters, and buffers. Thus we wouldn't
    be able to guard the float conversion based on the module type.
    '''
    if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
        module.float()
    for child in module.children():
        BN_convert_float(child)
    return module


def network_to_half(network):
    return nn.Sequential(tofp16(), BN_convert_float(network.half()))
--------------------------------------------------------------------------------