├── LICENSE
├── README.md
├── imgs
    ├── 2022_01.jpg
    ├── _108240741_beatles-abbeyroad-square-reuters-applecorps.jpg
    ├── add256.jpg
    ├── add512.jpg
    ├── add768.jpg
    ├── addnormal.jpg
    ├── addtensorrt_FP16.jpg
    ├── addtensorrt_FP32.jpg
    ├── addtensorrt_INT8.jpg
    ├── mask256.jpg
    ├── mask512.jpg
    ├── mask768.jpg
    ├── masknormal.jpg
    ├── masktensorrt_FP16.jpg
    ├── masktensorrt_FP32.jpg
    └── masktensorrt_INT8.jpg
├── inference_FP32_vs_FP16.ipynb
├── inference_batch_vs_imsize.ipynb
├── inference_classification.ipynb
├── inference_dev.ipynb
├── inference_segmentation.ipynb
├── inference_segmentation_demo.ipynb
├── inference_tensorrt.py
├── results
    ├── batch.csv
    ├── fp16.csv
    ├── imsize.csv
    ├── jetsonano.txt
    ├── xavier.csv
    └── xavier_segmentation.csv
└── utils
    ├── __pycache__
        └── fp16.cpython-36.pyc
    └── fp16.py


/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2020 Kentaro Yoshioka
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Benchmark-FP32-FP16-INT8-with-TensorRT
 2 | Benchmark the inference speed of CNNs at FP32, FP16, and INT8 precision with TensorRT.
 3 | 
 4 | Please :star: the repo if it helps you.
 5 | 
 6 | # Image classification
 7 | 
 8 | Run:
 9 | `inference_tensorrt.py`
10 | 
11 | ## Hardware: Jetson Nano
12 | TRT denotes a model compiled with TensorRT at the stated precision.
13 | 
14 | Latency of image inference (1,3,256,256) [ms]
15 | 
16 | |          | TRT FP32 | TRT FP16 | TRT INT8 |
17 | |:--------:|:--------:|:--------:|:--------:|
18 | | resnet18 |    26    |    18    |    NA    |
19 | | resnet34 |    48    |    30    |    NA    |
20 | | resnet50 |    79    |    42    |    NA    |
21 | 
22 | Jetson Nano does not support INT8, so no INT8 results are reported.
23 | 
24 | ## Hardware: Jetson Xavier
25 | 
26 | TRT denotes a model compiled with TensorRT at the stated precision.
27 | 
28 | Latency of image inference (1,3,256,256) [ms]
29 | 
30 | |      | resnet18 | resnet34 | resnet50 |
31 | |------|----------|----------|----------|
32 | | PytorchRaw  | 11       | 12       | 16       |
33 | | TRT FP32 | 3.8      | 5.6      | 9.9      |
34 | | TRT FP16 | 2.1      | 3.3      | 4.4      |
35 | | TRT INT8 | 1.7      | 2.7      | 3.0     |
36 | 
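To read the Jetson Xavier classification table as speedups rather than raw latencies, the sketch below divides the PytorchRaw latency by the latency of each mode. The numbers are copied from the table above; the names `latency_ms` and `speedups` are illustrative and are not part of this repository.

```python
# Latencies in ms, copied from the Jetson Xavier classification table.
latency_ms = {
    "PytorchRaw": {"resnet18": 11, "resnet34": 12, "resnet50": 16},
    "TRT FP32": {"resnet18": 3.8, "resnet34": 5.6, "resnet50": 9.9},
    "TRT FP16": {"resnet18": 2.1, "resnet34": 3.3, "resnet50": 4.4},
    "TRT INT8": {"resnet18": 1.7, "resnet34": 2.7, "resnet50": 3.0},
}


def speedups(table, baseline="PytorchRaw"):
    """Return baseline latency divided by each mode's latency, per model."""
    base = table[baseline]
    return {
        mode: {model: round(base[model] / t, 2) for model, t in row.items()}
        for mode, row in table.items()
    }


result = speedups(latency_ms)
for mode, row in result.items():
    print(mode, row)
```

A value above 1 means the mode is faster than raw PyTorch on that model; the baseline row is 1.0 by construction.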
37 | # Image segmentation
38 | ![beatles](imgs/addtensorrt_FP32.jpg)
39 | ## Hardware: Jetson Xavier
40 | 
41 | TRT denotes a model compiled with TensorRT at the stated precision.
42 | 
43 | Latency of image inference (1,3,512,512) [ms]
44 | 
45 | |      | fcn_resnet50 | fcn_resnet101 | deeplabv3_resnet50 | deeplabv3_resnet101 |
46 | |------|--------------|---------------|--------------------|---------------------|
47 | | PytorchRaw  | 200          | 344           | 281                | 426                 |
48 | | TRT FP32 | 173          | 290           | 252                | 366                 |
49 | | TRT FP16 | 36           | 57            | 130                | 151                 |
50 | | TRT INT8 | 21           | 32            | 97                 | 108                 |
51 | 
52 | ## Hardware: Jetson Nano
53 | 
54 | Latency of image inference (1,3,256,256) [ms]
55 | 
56 | |      | fcn_resnet50 | 
57 | |------|--------------|
58 | | PytorchRaw  | 6800          | 
59 | | TRT FP32 | 767          | 
60 | | TRT FP16 | 40           | 
61 | | TRT INT8 | NA           | 
62 | 
63 | # Hardware setup
64 | Setting up PyTorch, torchvision, and torch2trt on Jetson boards is tricky; the steps below worked for me.
65 | 
66 | * Install PyTorch
67 | 
68 | https://forums.developer.nvidia.com/t/pytorch-for-jetson-nano-version-1-4-0-now-available/72048
69 | 
70 | **The stable version for Jetson Nano seems to be torch==1.1**
71 | 
72 | **For Xavier, torch==1.3 worked fine for me.**
73 | 
74 | * Install torchvision
75 | 
76 | I followed the instructions below and installed torchvision==0.3.0:
77 | 
78 | https://medium.com/hackers-terminal/installing-pytorch-torchvision-on-nvidias-jetson-tx2-81591d03ce32
79 | 
80 | ```bash
81 | sudo apt-get install libjpeg-dev zlib1g-dev
82 | git clone -b v0.3.0 https://github.com/pytorch/vision torchvision
83 | cd torchvision
84 | sudo python3 setup.py install
85 | ```
86 | 
87 | * Install torch2trt
88 | 
89 | I followed the torch2trt README:
90 | 
91 | https://github.com/NVIDIA-AI-IOT/torch2trt
92 | 
93 | ```bash
94 | sudo apt-get install libprotobuf* protobuf-compiler ninja-build
95 | git clone https://github.com/NVIDIA-AI-IOT/torch2trt
96 | cd torch2trt
97 | sudo python3 setup.py install --plugins 
98 | ```
99 | 
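Once torch2trt is installed, converting a classification model at a given precision might look like the minimal sketch below. It assumes the `torch2trt(model, [example_input], fp16_mode=..., int8_mode=...)` call documented in the torch2trt README and a CUDA-capable Jetson device. The helper names `precision_kwargs` and `convert` are hypothetical, and this is not necessarily how `inference_tensorrt.py` is written.

```python
def precision_kwargs(precision):
    """Map a precision label to torch2trt keyword arguments (hypothetical helper)."""
    table = {
        "fp32": {},                   # torch2trt builds an FP32 engine by default
        "fp16": {"fp16_mode": True},
        "int8": {"int8_mode": True},
    }
    if precision not in table:
        raise ValueError("unknown precision: {}".format(precision))
    return dict(table[precision])


def convert(model_name="resnet18", precision="fp16", size=256):
    """Build a randomly initialised torchvision model and compile it with TensorRT."""
    # Imports are local so that precision_kwargs() works without a GPU stack.
    import torch
    import torchvision.models as models
    from torch2trt import torch2trt

    model = getattr(models, model_name)().eval().cuda()
    example = torch.randn(1, 3, size, size).cuda()
    return torch2trt(model, [example], **precision_kwargs(precision))


# Example (needs a GPU, torch, torchvision and torch2trt):
#   model_trt = convert("resnet50", "fp16")
```

Keeping the precision mapping in its own function makes it easy to loop over `"fp32"`, `"fp16"`, and `"int8"` when benchmarking; on Jetson Nano the `"int8"` entry would be skipped.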


--------------------------------------------------------------------------------
/imgs/2022_01.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/2022_01.jpg


--------------------------------------------------------------------------------
/imgs/_108240741_beatles-abbeyroad-square-reuters-applecorps.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/_108240741_beatles-abbeyroad-square-reuters-applecorps.jpg


--------------------------------------------------------------------------------
/imgs/add256.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/add256.jpg


--------------------------------------------------------------------------------
/imgs/add512.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/add512.jpg


--------------------------------------------------------------------------------
/imgs/add768.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/add768.jpg


--------------------------------------------------------------------------------
/imgs/addnormal.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/addnormal.jpg


--------------------------------------------------------------------------------
/imgs/addtensorrt_FP16.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/addtensorrt_FP16.jpg


--------------------------------------------------------------------------------
/imgs/addtensorrt_FP32.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/addtensorrt_FP32.jpg


--------------------------------------------------------------------------------
/imgs/addtensorrt_INT8.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/addtensorrt_INT8.jpg


--------------------------------------------------------------------------------
/imgs/mask256.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/mask256.jpg


--------------------------------------------------------------------------------
/imgs/mask512.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/mask512.jpg


--------------------------------------------------------------------------------
/imgs/mask768.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/mask768.jpg


--------------------------------------------------------------------------------
/imgs/masknormal.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/masknormal.jpg


--------------------------------------------------------------------------------
/imgs/masktensorrt_FP16.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/masktensorrt_FP16.jpg


--------------------------------------------------------------------------------
/imgs/masktensorrt_FP32.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/masktensorrt_FP32.jpg


--------------------------------------------------------------------------------
/imgs/masktensorrt_INT8.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kentaroy47/benchmark-FP32-FP16-INT8-with-TensorRT/501e747dcbe31ff09520b2b18124c1c67ed9da9c/imgs/masktensorrt_INT8.jpg


--------------------------------------------------------------------------------
/inference_FP32_vs_FP16.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "code",
  5 |    "execution_count": 8,
  6 |    "metadata": {},
  7 |    "outputs": [],
  8 |    "source": [
  9 |     "import numpy as np\n",
 10 |     "import torch\n",
 11 |     "import time\n",
 12 |     "from torchvision.models import *\n",
 13 |     "import pandas as pd\n",
 14 |     "import os\n",
 15 |     "from apex import amp"
 16 |    ]
 17 |   },
 18 |   {
 19 |    "cell_type": "code",
 20 |    "execution_count": 9,
 21 |    "metadata": {},
 22 |    "outputs": [],
 23 |    "source": [
 24 |     "# make results\n",
 25 |     "os.makedirs(\"results\", exist_ok=True)"
 26 |    ]
 27 |   },
 28 |   {
 29 |    "cell_type": "code",
 30 |    "execution_count": 10,
 31 |    "metadata": {},
 32 |    "outputs": [],
 33 |    "source": [
 34 |     "def computeTime(model, input_size=[1, 3, 224, 224], device='cuda', FP16=False):\n",
 35 |     "    inputs = torch.randn(input_size)\n",
 36 |     "    if device == 'cuda':\n",
 37 |     "        model = model.cuda()\n",
 38 |     "        inputs = inputs.cuda()\n",
 39 |     "    if FP16:\n",
 40 |     "        model = model.half()\n",
 41 |     "        inputs = inputs.half()\n",
 42 |     "\n",
 43 |     "    model.eval()\n",
 44 |     "\n",
 45 |     "    i = 0\n",
 46 |     "    time_spent = []\n",
 47 |     "    while i < 200:\n",
 48 |     "        start_time = time.time()\n",
 49 |     "        with torch.no_grad():\n",
 50 |     "            _ = model(inputs)\n",
 51 |     "\n",
 52 |     "        if device == 'cuda':\n",
 53 |     "            torch.cuda.synchronize() # wait for cuda to finish (cuda is asynchronous!)\n",
 54 |     "        if i != 0:\n",
 55 |     "            time_spent.append(time.time() - start_time)\n",
 56 |     "        i += 1\n",
 57 |     "    print('Avg execution time (s): {:.3f}'.format(np.mean(time_spent)))\n",
 58 |     "    return np.mean(time_spent)"
 59 |    ]
 60 |   },
 61 |   {
 62 |    "cell_type": "code",
 63 |    "execution_count": 11,
 64 |    "metadata": {},
 65 |    "outputs": [],
 66 |    "source": [
 67 |     "modellist = [\"resnet18\", \"resnet34\", \"resnet50\", \"resnet101\", \"resnet152\", \\\n",
 68 |     " \"resnext50_32x4d\", \"resnext101_32x8d\", \"mnasnet1_0\", \"squeezenet1_0\", \"densenet121\", \"densenet169\", \"inception_v3\"]\n",
 69 |     "\n",
 70 |     "# ResNets are enough for now\n",
 71 |     "modellist = [\"resnet18\", \"resnet34\", \"resnet50\"]"
 72 |    ]
 73 |   },
 74 |   {
 75 |    "cell_type": "code",
 76 |    "execution_count": 17,
 77 |    "metadata": {},
 78 |    "outputs": [
 79 |     {
 80 |      "name": "stdout",
 81 |      "output_type": "stream",
 82 |      "text": [
 83 |       "model: resnet18\n",
 84 |       "Looks ok!\n"
 85 |      ]
 86 |     }
 87 |    ],
 88 |    "source": [
 89 |     "# test amp\n",
 90 |     "model_name = \"resnet18\"\n",
 91 |     "print(\"model: {}\".format(model_name))\n",
 92 |     "mdl = globals()[model_name]\n",
 93 |     "model = mdl().to(\"cuda\")\n",
 94 |     "model = amp.initialize(model, opt_level=\"O1\")\n",
 95 |     "print(\"Looks ok!\")"
 96 |    ]
 97 |   },
 98 |   {
 99 |    "cell_type": "code",
100 |    "execution_count": 20,
101 |    "metadata": {
102 |     "scrolled": false
103 |    },
104 |    "outputs": [
105 |     {
106 |      "name": "stdout",
107 |      "output_type": "stream",
108 |      "text": [
109 |       "model: resnet18\n",
110 |       "Avg execution time (ms): 0.010\n",
111 |       "Avg execution time (ms): 0.009\n",
112 |       "Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.\n",
113 |       "\n",
114 |       "Defaults for this optimization level are:\n",
115 |       "enabled                : True\n",
116 |       "opt_level              : O1\n",
117 |       "cast_model_type        : None\n",
118 |       "patch_torch_functions  : True\n",
119 |       "keep_batchnorm_fp32    : None\n",
120 |       "master_weights         : None\n",
121 |       "loss_scale             : dynamic\n",
122 |       "Processing user overrides (additional kwargs that are not None)...\n",
123 |       "After processing overrides, optimization options are:\n",
124 |       "enabled                : True\n",
125 |       "opt_level              : O1\n",
126 |       "cast_model_type        : None\n",
127 |       "patch_torch_functions  : True\n",
128 |       "keep_batchnorm_fp32    : None\n",
129 |       "master_weights         : None\n",
130 |       "loss_scale             : dynamic\n",
131 |       "Avg execution time (ms): 0.009\n",
132 |       "model: resnet34\n",
133 |       "Avg execution time (ms): 0.017\n",
134 |       "Avg execution time (ms): 0.017\n",
135 |       "Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.\n",
136 |       "\n",
137 |       "Defaults for this optimization level are:\n",
138 |       "enabled                : True\n",
139 |       "opt_level              : O1\n",
140 |       "cast_model_type        : None\n",
141 |       "patch_torch_functions  : True\n",
142 |       "keep_batchnorm_fp32    : None\n",
143 |       "master_weights         : None\n",
144 |       "loss_scale             : dynamic\n",
145 |       "Processing user overrides (additional kwargs that are not None)...\n",
146 |       "After processing overrides, optimization options are:\n",
147 |       "enabled                : True\n",
148 |       "opt_level              : O1\n",
149 |       "cast_model_type        : None\n",
150 |       "patch_torch_functions  : True\n",
151 |       "keep_batchnorm_fp32    : None\n",
152 |       "master_weights         : None\n",
153 |       "loss_scale             : dynamic\n",
154 |       "Avg execution time (ms): 0.017\n",
155 |       "model: resnet50\n",
156 |       "Avg execution time (ms): 0.019\n",
157 |       "Avg execution time (ms): 0.019\n",
158 |       "Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.\n",
159 |       "\n",
160 |       "Defaults for this optimization level are:\n",
161 |       "enabled                : True\n",
162 |       "opt_level              : O1\n",
163 |       "cast_model_type        : None\n",
164 |       "patch_torch_functions  : True\n",
165 |       "keep_batchnorm_fp32    : None\n",
166 |       "master_weights         : None\n",
167 |       "loss_scale             : dynamic\n",
168 |       "Processing user overrides (additional kwargs that are not None)...\n",
169 |       "After processing overrides, optimization options are:\n",
170 |       "enabled                : True\n",
171 |       "opt_level              : O1\n",
172 |       "cast_model_type        : None\n",
173 |       "patch_torch_functions  : True\n",
174 |       "keep_batchnorm_fp32    : None\n",
175 |       "master_weights         : None\n",
176 |       "loss_scale             : dynamic\n",
177 |       "Avg execution time (ms): 0.020\n"
178 |      ]
179 |     }
180 |    ],
181 |    "source": [
182 |     "for i, model_name in enumerate(modellist):\n",
183 |     "\n",
184 |     "    runtimes = []\n",
185 |     "    \n",
186 |     "    # define model\n",
187 |     "    print(\"model: {}\".format(model_name))\n",
188 |     "    mdl = globals()[model_name]\n",
189 |     "    model = mdl()\n",
190 |     "    \n",
191 |     "    # Run FP32\n",
192 |     "    runtimes.append(computeTime(model, input_size=[1, 3, 256, 256], device=\"cuda\", FP16=False))\n",
193 |     "    # Run FP16\n",
194 |     "    runtimes.append(computeTime(model, input_size=[1, 3, 256, 256], device=\"cuda\", FP16=True))\n",
195 |     "    \n",
196 |     "    # Amp Initialization\n",
197 |     "    opt_level = 'O1'  # O1: mixed precision (Apex inserts automatic FP16 casts)\n",
198 |     "    mdl = globals()[model_name]\n",
199 |     "    model = mdl().to(\"cuda\")\n",
200 |     "    model = amp.initialize(model, opt_level=opt_level)\n",
201 |     "    \n",
202 |     "    # Run with Apex AMP (inputs stay FP32; AMP handles the casts)\n",
203 |     "    runtimes.append(computeTime(model, input_size=[1, 3, 256, 256], device=\"cuda\", FP16=False))\n",
204 |     "    \n",
205 |     "    if i == 0:\n",
206 |     "        df = pd.DataFrame({model_name: runtimes},\n",
207 |     "                         index = [\"FP32\", \"FP16_torch\", \"FP16_apex\"])\n",
208 |     "    else:\n",
209 |     "        df[model_name] = runtimes\n",
210 |     "        "
211 |    ]
212 |   },
213 |   {
214 |    "cell_type": "code",
215 |    "execution_count": 21,
216 |    "metadata": {},
217 |    "outputs": [
218 |     {
219 |      "data": {
220 |       "text/html": [
221 |        "<div>\n",
222 |        "<style scoped>\n",
223 |        "    .dataframe tbody tr th:only-of-type {\n",
224 |        "        vertical-align: middle;\n",
225 |        "    }\n",
226 |        "\n",
227 |        "    .dataframe tbody tr th {\n",
228 |        "        vertical-align: top;\n",
229 |        "    }\n",
230 |        "\n",
231 |        "    .dataframe thead th {\n",
232 |        "        text-align: right;\n",
233 |        "    }\n",
234 |        "</style>\n",
235 |        "<table border=\"1\" class=\"dataframe\">\n",
236 |        "  <thead>\n",
237 |        "    <tr style=\"text-align: right;\">\n",
238 |        "      <th></th>\n",
239 |        "      <th>resnet18</th>\n",
240 |        "      <th>resnet34</th>\n",
241 |        "      <th>resnet50</th>\n",
242 |        "    </tr>\n",
243 |        "  </thead>\n",
244 |        "  <tbody>\n",
245 |        "    <tr>\n",
246 |        "      <th>FP32</th>\n",
247 |        "      <td>0.009891</td>\n",
248 |        "      <td>0.016940</td>\n",
249 |        "      <td>0.019452</td>\n",
250 |        "    </tr>\n",
251 |        "    <tr>\n",
252 |        "      <th>FP16_torch</th>\n",
253 |        "      <td>0.009034</td>\n",
254 |        "      <td>0.017017</td>\n",
255 |        "      <td>0.019448</td>\n",
256 |        "    </tr>\n",
257 |        "    <tr>\n",
258 |        "      <th>FP16_apex</th>\n",
259 |        "      <td>0.009057</td>\n",
260 |        "      <td>0.016978</td>\n",
261 |        "      <td>0.020046</td>\n",
262 |        "    </tr>\n",
263 |        "  </tbody>\n",
264 |        "</table>\n",
265 |        "</div>"
266 |       ],
267 |       "text/plain": [
268 |        "            resnet18  resnet34  resnet50\n",
269 |        "FP32        0.009891  0.016940  0.019452\n",
270 |        "FP16_torch  0.009034  0.017017  0.019448\n",
271 |        "FP16_apex   0.009057  0.016978  0.020046"
272 |       ]
273 |      },
274 |      "execution_count": 21,
275 |      "metadata": {},
276 |      "output_type": "execute_result"
277 |     }
278 |    ],
279 |    "source": [
280 |     "df.to_csv(\"results/fp16.csv\")\n",
281 |     "df"
282 |    ]
283 |   },
284 |   {
285 |    "cell_type": "code",
286 |    "execution_count": null,
287 |    "metadata": {},
288 |    "outputs": [],
289 |    "source": []
290 |   }
291 |  ],
292 |  "metadata": {
293 |   "kernelspec": {
294 |    "display_name": "Python 3",
295 |    "language": "python",
296 |    "name": "python3"
297 |   },
298 |   "language_info": {
299 |    "codemirror_mode": {
300 |     "name": "ipython",
301 |     "version": 3
302 |    },
303 |    "file_extension": ".py",
304 |    "mimetype": "text/x-python",
305 |    "name": "python",
306 |    "nbconvert_exporter": "python",
307 |    "pygments_lexer": "ipython3",
308 |    "version": "3.6.5"
309 |   }
310 |  },
311 |  "nbformat": 4,
312 |  "nbformat_minor": 2
313 | }
314 | 


--------------------------------------------------------------------------------
/inference_batch_vs_imsize.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "code",
  5 |    "execution_count": 8,
  6 |    "metadata": {},
  7 |    "outputs": [],
  8 |    "source": [
  9 |     "import numpy as np\n",
 10 |     "import torch\n",
 11 |     "import time\n",
 12 |     "from torchvision.models import *\n",
 13 |     "import pandas as pd\n",
 14 |     "import os"
 15 |    ]
 16 |   },
 17 |   {
 18 |    "cell_type": "code",
 19 |    "execution_count": 9,
 20 |    "metadata": {},
 21 |    "outputs": [],
 22 |    "source": [
 23 |     "# make models from str\n",
 24 |     "model_name = \"resnet18\"\n",
 25 |     "# make results\n",
 26 |     "os.makedirs(\"results\", exist_ok=True)"
 27 |    ]
 28 |   },
 29 |   {
 30 |    "cell_type": "code",
 31 |    "execution_count": 3,
 32 |    "metadata": {},
 33 |    "outputs": [],
 34 |    "source": [
 35 |     "def computeTime(model, input_size=[1, 3, 224, 224], device='cuda', FP16=False):\n",
 36 |     "    inputs = torch.randn(input_size)\n",
 37 |     "    if device == 'cuda':\n",
 38 |     "        model = model.cuda()\n",
 39 |     "        inputs = inputs.cuda()\n",
 40 |     "    if FP16:\n",
 41 |     "        model = model.half()\n",
 42 |     "        inputs = inputs.half()\n",
 43 |     "\n",
 44 |     "    model.eval()\n",
 45 |     "\n",
 46 |     "    i = 0\n",
 47 |     "    time_spent = []\n",
 48 |     "    while i < 200:\n",
 49 |     "        start_time = time.time()\n",
 50 |     "        with torch.no_grad():\n",
 51 |     "            _ = model(inputs)\n",
 52 |     "\n",
 53 |     "        if device == 'cuda':\n",
 54 |     "            torch.cuda.synchronize() # wait for cuda to finish (cuda is asynchronous!)\n",
 55 |     "        if i != 0:\n",
 56 |     "            time_spent.append(time.time() - start_time)\n",
 57 |     "        i += 1\n",
 58 |     "    print('Avg execution time (s): {:.3f}'.format(np.mean(time_spent)))\n",
 59 |     "    return np.mean(time_spent)"
 60 |    ]
 61 |   },
 62 |   {
 63 |    "cell_type": "code",
 64 |    "execution_count": 4,
 65 |    "metadata": {},
 66 |    "outputs": [],
 67 |    "source": [
 68 |     "modellist = [\"resnet18\", \"resnet34\", \"resnet50\", \"resnet101\", \"resnet152\", \\\n",
 69 |     " \"resnext50_32x4d\", \"resnext101_32x8d\", \"mnasnet1_0\", \"squeezenet1_0\", \"densenet121\", \"densenet169\", \"inception_v3\"]\n",
 70 |     "\n",
 71 |     "# ResNets are enough for now\n",
 72 |     "modellist = [\"resnet18\", \"resnet34\", \"resnet50\"]"
 73 |    ]
 74 |   },
 75 |   {
 76 |    "cell_type": "code",
 77 |    "execution_count": 5,
 78 |    "metadata": {
 79 |     "scrolled": false
 80 |    },
 81 |    "outputs": [
 82 |     {
 83 |      "name": "stdout",
 84 |      "output_type": "stream",
 85 |      "text": [
 86 |       "model: resnet18\n",
 87 |       "Avg execution time (ms): 0.009\n",
 88 |       "Avg execution time (ms): 0.019\n",
 89 |       "Avg execution time (ms): 0.047\n",
 90 |       "Avg execution time (ms): 0.040\n",
 91 |       "Avg execution time (ms): 0.091\n",
 92 |       "Avg execution time (ms): 0.008\n",
 93 |       "Avg execution time (ms): 0.008\n",
 94 |       "Avg execution time (ms): 0.020\n",
 95 |       "Avg execution time (ms): 0.072\n",
 96 |       "model: resnet34\n",
 97 |       "Avg execution time (ms): 0.015\n",
 98 |       "Avg execution time (ms): 0.037\n",
 99 |       "Avg execution time (ms): 0.079\n",
100 |       "Avg execution time (ms): 0.066\n",
101 |       "Avg execution time (ms): 0.146\n",
102 |       "Avg execution time (ms): 0.014\n",
103 |       "Avg execution time (ms): 0.015\n",
104 |       "Avg execution time (ms): 0.038\n",
105 |       "Avg execution time (ms): 0.135\n",
106 |       "model: resnet50\n",
107 |       "Avg execution time (ms): 0.018\n",
108 |       "Avg execution time (ms): 0.045\n",
109 |       "Avg execution time (ms): 0.093\n",
110 |       "Avg execution time (ms): 0.137\n",
111 |       "Avg execution time (ms): 0.279\n",
112 |       "Avg execution time (ms): 0.014\n",
113 |       "Avg execution time (ms): 0.018\n",
114 |       "Avg execution time (ms): 0.048\n",
115 |       "Avg execution time (ms): 0.179\n"
116 |      ]
117 |     }
118 |    ],
119 |    "source": [
120 |     "batchlist = [1, 4, 8, 16, 32]\n",
121 |     "imsize = [128, 256, 512, 1024]\n",
122 |     "\n",
123 |     "for i, model_name in enumerate(modellist):\n",
124 |     "\n",
125 |     "    runtimes = []\n",
126 |     "    \n",
127 |     "    # define model\n",
128 |     "    print(\"model: {}\".format(model_name))\n",
129 |     "    mdl = globals()[model_name]\n",
130 |     "    model = mdl()\n",
131 |     "    \n",
132 |     "    for batch in batchlist:        \n",
133 |     "        runtimes.append(computeTime(model, input_size=[batch, 3, 256, 256], device=\"cuda\", FP16=False)/batch)\n",
134 |     "\n",
135 |     "    if i == 0:\n",
136 |     "        dfbatch = pd.DataFrame({model_name: runtimes},\n",
137 |     "                         index = batchlist)\n",
138 |     "    else:\n",
139 |     "        dfbatch[model_name] = runtimes\n",
140 |     "        \n",
141 |     "    runtimes = []\n",
142 |     "    for isize in imsize:\n",
143 |     "        runtimes.append(computeTime(model, input_size=[1, 3, isize, isize], device=\"cuda\", FP16=False))\n",
144 |     "\n",
145 |     "    if i == 0:\n",
146 |     "        dfimsize = pd.DataFrame({model_name: runtimes},\n",
147 |     "                         index = imsize)\n",
148 |     "    else:\n",
149 |     "        dfimsize[model_name] = runtimes"
150 |    ]
151 |   },
152 |   {
153 |    "cell_type": "code",
154 |    "execution_count": 13,
155 |    "metadata": {},
156 |    "outputs": [
157 |     {
158 |      "data": {
159 |       "text/html": [
160 |        "<div>\n",
161 |        "<style scoped>\n",
162 |        "    .dataframe tbody tr th:only-of-type {\n",
163 |        "        vertical-align: middle;\n",
164 |        "    }\n",
165 |        "\n",
166 |        "    .dataframe tbody tr th {\n",
167 |        "        vertical-align: top;\n",
168 |        "    }\n",
169 |        "\n",
170 |        "    .dataframe thead th {\n",
171 |        "        text-align: right;\n",
172 |        "    }\n",
173 |        "</style>\n",
174 |        "<table border=\"1\" class=\"dataframe\">\n",
175 |        "  <thead>\n",
176 |        "    <tr style=\"text-align: right;\">\n",
177 |        "      <th></th>\n",
178 |        "      <th>resnet18</th>\n",
179 |        "      <th>resnet34</th>\n",
180 |        "      <th>resnet50</th>\n",
181 |        "    </tr>\n",
182 |        "  </thead>\n",
183 |        "  <tbody>\n",
184 |        "    <tr>\n",
185 |        "      <th>1</th>\n",
186 |        "      <td>0.008542</td>\n",
187 |        "      <td>0.014813</td>\n",
188 |        "      <td>0.017847</td>\n",
189 |        "    </tr>\n",
190 |        "    <tr>\n",
191 |        "      <th>4</th>\n",
192 |        "      <td>0.004840</td>\n",
193 |        "      <td>0.009199</td>\n",
194 |        "      <td>0.011361</td>\n",
195 |        "    </tr>\n",
196 |        "    <tr>\n",
197 |        "      <th>8</th>\n",
198 |        "      <td>0.005878</td>\n",
199 |        "      <td>0.009894</td>\n",
200 |        "      <td>0.011598</td>\n",
201 |        "    </tr>\n",
202 |        "    <tr>\n",
203 |        "      <th>16</th>\n",
204 |        "      <td>0.002495</td>\n",
205 |        "      <td>0.004145</td>\n",
206 |        "      <td>0.008592</td>\n",
207 |        "    </tr>\n",
208 |        "    <tr>\n",
209 |        "      <th>32</th>\n",
210 |        "      <td>0.002836</td>\n",
211 |        "      <td>0.004564</td>\n",
212 |        "      <td>0.008728</td>\n",
213 |        "    </tr>\n",
214 |        "  </tbody>\n",
215 |        "</table>\n",
216 |        "</div>"
217 |       ],
218 |       "text/plain": [
219 |        "    resnet18  resnet34  resnet50\n",
220 |        "1   0.008542  0.014813  0.017847\n",
221 |        "4   0.004840  0.009199  0.011361\n",
222 |        "8   0.005878  0.009894  0.011598\n",
223 |        "16  0.002495  0.004145  0.008592\n",
224 |        "32  0.002836  0.004564  0.008728"
225 |       ]
226 |      },
227 |      "execution_count": 13,
228 |      "metadata": {},
229 |      "output_type": "execute_result"
230 |     }
231 |    ],
232 |    "source": [
233 |     "dfbatch.to_csv(\"results/batch.csv\")\n",
234 |     "dfimsize.to_csv(\"results/imsize.csv\")\n",
235 |     "dfbatch"
236 |    ]
237 |   },
238 |   {
239 |    "cell_type": "code",
240 |    "execution_count": 14,
241 |    "metadata": {},
242 |    "outputs": [
243 |     {
244 |      "data": {
245 |       "text/html": [
246 |        "<div>\n",
247 |        "<style scoped>\n",
248 |        "    .dataframe tbody tr th:only-of-type {\n",
249 |        "        vertical-align: middle;\n",
250 |        "    }\n",
251 |        "\n",
252 |        "    .dataframe tbody tr th {\n",
253 |        "        vertical-align: top;\n",
254 |        "    }\n",
255 |        "\n",
256 |        "    .dataframe thead th {\n",
257 |        "        text-align: right;\n",
258 |        "    }\n",
259 |        "</style>\n",
260 |        "<table border=\"1\" class=\"dataframe\">\n",
261 |        "  <thead>\n",
262 |        "    <tr style=\"text-align: right;\">\n",
263 |        "      <th></th>\n",
264 |        "      <th>resnet18</th>\n",
265 |        "      <th>resnet34</th>\n",
266 |        "      <th>resnet50</th>\n",
267 |        "    </tr>\n",
268 |        "  </thead>\n",
269 |        "  <tbody>\n",
270 |        "    <tr>\n",
271 |        "      <th>128</th>\n",
272 |        "      <td>0.007869</td>\n",
273 |        "      <td>0.014414</td>\n",
274 |        "      <td>0.014149</td>\n",
275 |        "    </tr>\n",
276 |        "    <tr>\n",
277 |        "      <th>256</th>\n",
278 |        "      <td>0.007717</td>\n",
279 |        "      <td>0.014684</td>\n",
280 |        "      <td>0.017656</td>\n",
281 |        "    </tr>\n",
282 |        "    <tr>\n",
283 |        "      <th>512</th>\n",
284 |        "      <td>0.020139</td>\n",
285 |        "      <td>0.037699</td>\n",
286 |        "      <td>0.047882</td>\n",
287 |        "    </tr>\n",
288 |        "    <tr>\n",
289 |        "      <th>1024</th>\n",
290 |        "      <td>0.071687</td>\n",
291 |        "      <td>0.134825</td>\n",
292 |        "      <td>0.179115</td>\n",
293 |        "    </tr>\n",
294 |        "  </tbody>\n",
295 |        "</table>\n",
296 |        "</div>"
297 |       ],
298 |       "text/plain": [
299 |        "      resnet18  resnet34  resnet50\n",
300 |        "128   0.007869  0.014414  0.014149\n",
301 |        "256   0.007717  0.014684  0.017656\n",
302 |        "512   0.020139  0.037699  0.047882\n",
303 |        "1024  0.071687  0.134825  0.179115"
304 |       ]
305 |      },
306 |      "execution_count": 14,
307 |      "metadata": {},
308 |      "output_type": "execute_result"
309 |     }
310 |    ],
311 |    "source": [
312 |     "dfimsize"
313 |    ]
314 |   },
315 |   {
316 |    "cell_type": "code",
317 |    "execution_count": 15,
318 |    "metadata": {},
319 |    "outputs": [
320 |     {
321 |      "ename": "ModuleNotFoundError",
322 |      "evalue": "No module named 'apex'",
323 |      "output_type": "error",
324 |      "traceback": [
325 |       "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
326 |       "\u001b[0;31mModuleNotFoundError\u001b[0m                       Traceback (most recent call last)",
327 |       "\u001b[0;32m<ipython-input-15-3d354972c283>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32mimport\u001b[0m \u001b[0mapex\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
328 |       "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'apex'"
329 |      ]
330 |     }
331 |    ],
332 |    "source": [
333 |     "import apex"
334 |    ]
335 |   },
336 |   {
337 |    "cell_type": "code",
338 |    "execution_count": null,
339 |    "metadata": {},
340 |    "outputs": [],
341 |    "source": []
342 |   }
343 |  ],
344 |  "metadata": {
345 |   "kernelspec": {
346 |    "display_name": "Python 3",
347 |    "language": "python",
348 |    "name": "python3"
349 |   },
350 |   "language_info": {
351 |    "codemirror_mode": {
352 |     "name": "ipython",
353 |     "version": 3
354 |    },
355 |    "file_extension": ".py",
356 |    "mimetype": "text/x-python",
357 |    "name": "python",
358 |    "nbconvert_exporter": "python",
359 |    "pygments_lexer": "ipython3",
360 |    "version": "3.6.5"
361 |   }
362 |  },
363 |  "nbformat": 4,
364 |  "nbformat_minor": 2
365 | }
366 | 


--------------------------------------------------------------------------------
/inference_classification.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "code",
  5 |    "execution_count": 9,
  6 |    "metadata": {},
  7 |    "outputs": [],
  8 |    "source": [
  9 |     "import numpy as np\n",
 10 |     "import torch\n",
 11 |     "import time\n",
 12 |     "from torchvision.models import *\n",
 13 |     "import pandas as pd"
 14 |    ]
 15 |   },
 16 |   {
 17 |    "cell_type": "code",
 18 |    "execution_count": 2,
 19 |    "metadata": {},
 20 |    "outputs": [],
 21 |    "source": [
 22 |     "# make models from str\n",
 23 |     "model_name = \"resnet18\""
 24 |    ]
 25 |   },
 26 |   {
 27 |    "cell_type": "code",
 28 |    "execution_count": 7,
 29 |    "metadata": {},
 30 |    "outputs": [],
 31 |    "source": [
 32 |     "def computeTime(model, input_size=[1, 3, 224, 224], device='cuda', FP16=False):\n",
 33 |     "    inputs = torch.randn(input_size)\n",
 34 |     "    if device == 'cuda':\n",
 35 |     "        model = model.cuda()\n",
 36 |     "        inputs = inputs.cuda()\n",
 37 |     "    if FP16:\n",
 38 |     "        model = model.half()\n",
 39 |     "        inputs = inputs.half()\n",
 40 |     "\n",
 41 |     "    model.eval()\n",
 42 |     "\n",
 43 |     "    i = 0\n",
 44 |     "    time_spent = []\n",
 45 |     "    while i < 200:\n",
 46 |     "        start_time = time.time()\n",
 47 |     "        with torch.no_grad():\n",
 48 |     "            _ = model(inputs)\n",
 49 |     "\n",
 50 |     "        if device == 'cuda':\n",
 51 |     "            torch.cuda.synchronize() # wait for cuda to finish (cuda is asynchronous!)\n",
 52 |     "        if i != 0:\n",
 53 |     "            time_spent.append(time.time() - start_time)\n",
 54 |     "        i += 1\n",
 55 |     "    print('Avg execution time (ms): {:.3f}'.format(np.mean(time_spent) * 1000))  # time.time() deltas are in seconds\n",
 56 |     "    return np.mean(time_spent)"
 57 |    ]
 58 |   },
 59 |   {
 60 |    "cell_type": "code",
 61 |    "execution_count": 38,
 62 |    "metadata": {},
 63 |    "outputs": [],
 64 |    "source": [
 65 |     "modellist = [\"resnet18\", \"resnet34\", \"resnet50\", \"resnet101\", \"resnet152\",\n",
 66 |     "             \"resnext50_32x4d\", \"resnext101_32x8d\", \"mnasnet1_0\", \"squeezenet1_0\", \"densenet121\", \"densenet169\", \"inception_v3\"]\n",
 67 |     "# Override: benchmark only the ResNet family in this run\n",
 68 |     "modellist = [\"resnet18\", \"resnet34\", \"resnet50\", \"resnet101\", \"resnet152\"]"
 69 |    ]
 70 |   },
 71 |   {
 72 |    "cell_type": "code",
 73 |    "execution_count": null,
 74 |    "metadata": {
 75 |     "scrolled": false
 76 |    },
 77 |    "outputs": [
 78 |     {
 79 |      "name": "stdout",
 80 |      "output_type": "stream",
 81 |      "text": [
 82 |       "model: resnet18\n",
 83 |       "Avg execution time (ms): 0.008\n",
 84 |       "Avg execution time (ms): 0.019\n",
 85 |       "Avg execution time (ms): 0.046\n",
 86 |       "Avg execution time (ms): 0.040\n"
 87 |      ]
 88 |     }
 89 |    ],
 90 |    "source": [
 91 |     "for i, model_name in enumerate(modellist):\n",
 92 |     "    batchlist = [1, 4, 8, 16, 32]\n",
 93 |     "    imsize = [128, 256, 512]\n",
 94 |     "    runtimes = []\n",
 95 |     "    \n",
 96 |     "    # define model\n",
 97 |     "    print(\"model: {}\".format(model_name))\n",
 98 |     "    mdl = globals()[model_name]\n",
 99 |     "    model = mdl()\n",
100 |     "    \n",
101 |     "    for batch in batchlist:        \n",
102 |     "        runtimes.append(computeTime(model, input_size=[batch, 3, 256, 256], device=\"cuda\", FP16=False)/batch)\n",
103 |     "\n",
104 |     "    if i == 0:\n",
105 |     "        dfbatch = pd.DataFrame({model_name: runtimes},\n",
106 |     "                         index = batchlist)\n",
107 |     "    else:\n",
108 |     "        dfbatch[model_name] = runtimes\n",
109 |     "        \n",
110 |     "    runtimes = []\n",
111 |     "    for isize in imsize:\n",
112 |     "        print(\"model: {}\".format(model_name))\n",
113 |     "        mdl = globals()[model_name]\n",
114 |     "        model = mdl()\n",
115 |     "        runtimes.append(computeTime(model, input_size=[1, 3, isize, isize], device=\"cuda\", FP16=False))\n",
116 |     "\n",
117 |     "    if i == 0:\n",
118 |     "        dfimsize = pd.DataFrame({model_name: runtimes},\n",
119 |     "                         index = imsize)\n",
120 |     "    else:\n",
121 |     "        dfimsize[model_name] = runtimes"
122 |    ]
123 |   },
124 |   {
125 |    "cell_type": "code",
126 |    "execution_count": 37,
127 |    "metadata": {},
128 |    "outputs": [
129 |     {
130 |      "data": {
131 |       "text/html": [
132 |        "<div>\n",
133 |        "<style scoped>\n",
134 |        "    .dataframe tbody tr th:only-of-type {\n",
135 |        "        vertical-align: middle;\n",
136 |        "    }\n",
137 |        "\n",
138 |        "    .dataframe tbody tr th {\n",
139 |        "        vertical-align: top;\n",
140 |        "    }\n",
141 |        "\n",
142 |        "    .dataframe thead th {\n",
143 |        "        text-align: right;\n",
144 |        "    }\n",
145 |        "</style>\n",
146 |        "<table border=\"1\" class=\"dataframe\">\n",
147 |        "  <thead>\n",
148 |        "    <tr style=\"text-align: right;\">\n",
149 |        "      <th></th>\n",
150 |        "      <th>resnet18</th>\n",
151 |        "      <th>resnet34</th>\n",
152 |        "      <th>resnet50</th>\n",
153 |        "      <th>resnet101</th>\n",
154 |        "      <th>resnet152</th>\n",
155 |        "      <th>resnext50_32x4d</th>\n",
156 |        "      <th>resnext101_32x8d</th>\n",
157 |        "      <th>mnasnet1_0</th>\n",
158 |        "      <th>squeezenet1_0</th>\n",
159 |        "      <th>densenet121</th>\n",
160 |        "      <th>densenet169</th>\n",
161 |        "      <th>inception_v3</th>\n",
162 |        "    </tr>\n",
163 |        "  </thead>\n",
164 |        "  <tbody>\n",
165 |        "    <tr>\n",
166 |        "      <th>cuda FP32</th>\n",
167 |        "      <td>0.006895</td>\n",
168 |        "      <td>0.013512</td>\n",
169 |        "      <td>0.016632</td>\n",
170 |        "      <td>0.032939</td>\n",
171 |        "      <td>0.048400</td>\n",
172 |        "      <td>0.033309</td>\n",
173 |        "      <td>0.118709</td>\n",
174 |        "      <td>0.007704</td>\n",
175 |        "      <td>0.004120</td>\n",
176 |        "      <td>0.025061</td>\n",
177 |        "      <td>0.037639</td>\n",
178 |        "      <td>0.027673</td>\n",
179 |        "    </tr>\n",
180 |        "    <tr>\n",
181 |        "      <th>cuda FP16</th>\n",
182 |        "      <td>0.007969</td>\n",
183 |        "      <td>0.015316</td>\n",
184 |        "      <td>0.017940</td>\n",
185 |        "      <td>0.035898</td>\n",
186 |        "      <td>0.052364</td>\n",
187 |        "      <td>0.033756</td>\n",
188 |        "      <td>0.114106</td>\n",
189 |        "      <td>0.007777</td>\n",
190 |        "      <td>0.003966</td>\n",
191 |        "      <td>0.022630</td>\n",
192 |        "      <td>0.034838</td>\n",
193 |        "      <td>0.030210</td>\n",
194 |        "    </tr>\n",
195 |        "  </tbody>\n",
196 |        "</table>\n",
197 |        "</div>"
198 |       ],
199 |       "text/plain": [
200 |        "           resnet18  resnet34  resnet50  resnet101  resnet152  \\\n",
201 |        "cuda FP32  0.006895  0.013512  0.016632   0.032939   0.048400   \n",
202 |        "cuda FP16  0.007969  0.015316  0.017940   0.035898   0.052364   \n",
203 |        "\n",
204 |        "           resnext50_32x4d  resnext101_32x8d  mnasnet1_0  squeezenet1_0  \\\n",
205 |        "cuda FP32         0.033309          0.118709    0.007704       0.004120   \n",
206 |        "cuda FP16         0.033756          0.114106    0.007777       0.003966   \n",
207 |        "\n",
208 |        "           densenet121  densenet169  inception_v3  \n",
209 |        "cuda FP32     0.025061     0.037639      0.027673  \n",
210 |        "cuda FP16     0.022630     0.034838      0.030210  "
211 |       ]
212 |      },
213 |      "execution_count": 37,
214 |      "metadata": {},
215 |      "output_type": "execute_result"
216 |     }
217 |    ],
218 |    "source": [
219 |     "df"
220 |    ]
221 |   },
222 |   {
223 |    "cell_type": "code",
224 |    "execution_count": null,
225 |    "metadata": {},
226 |    "outputs": [],
227 |    "source": []
228 |   }
229 |  ],
230 |  "metadata": {
231 |   "kernelspec": {
232 |    "display_name": "Python 3",
233 |    "language": "python",
234 |    "name": "python3"
235 |   },
236 |   "language_info": {
237 |    "codemirror_mode": {
238 |     "name": "ipython",
239 |     "version": 3
240 |    },
241 |    "file_extension": ".py",
242 |    "mimetype": "text/x-python",
243 |    "name": "python",
244 |    "nbconvert_exporter": "python",
245 |    "pygments_lexer": "ipython3",
246 |    "version": "3.6.5"
247 |   }
248 |  },
249 |  "nbformat": 4,
250 |  "nbformat_minor": 2
251 | }
252 | 
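
The timing loop in `computeTime` above follows a common pattern: run the workload repeatedly, wait for any asynchronous work to finish, discard the first (warm-up) iteration, and average the rest. A minimal framework-free sketch of that pattern is given below. The names `benchmark`, `to_ms`, and the `sync` hook are hypothetical and not part of this repository; for GPU work one would presumably pass `torch.cuda.synchronize` as `sync`. The sketch uses `time.perf_counter()` instead of `time.time()` because it is a monotonic, high-resolution clock, and it converts to milliseconds only for display.

```python
import time
from statistics import mean


def benchmark(fn, iters=200, warmup=1, sync=None):
    """Return the mean wall-clock time of fn() in seconds.

    sync is an optional, hypothetical hook (e.g. torch.cuda.synchronize)
    that blocks until queued asynchronous work has finished.
    """
    samples = []
    for i in range(warmup + iters):
        if sync is not None:
            sync()  # make sure earlier work does not leak into this sample
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for asynchronous work before stopping the clock
        elapsed = time.perf_counter() - start
        if i >= warmup:  # discard warm-up iterations
            samples.append(elapsed)
    return mean(samples)


def to_ms(seconds):
    """Convert seconds to milliseconds for display."""
    return seconds * 1000.0


if __name__ == "__main__":
    # Stand-in CPU workload; a real run would call the model here.
    avg = benchmark(lambda: sum(range(10000)), iters=50, warmup=5)
    print("Avg execution time (ms): {:.3f}".format(to_ms(avg)))
```

Keeping the return value in seconds and converting only when printing keeps stored results in one unit while the log label stays accurate.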


--------------------------------------------------------------------------------
/inference_dev.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "code",
  5 |    "execution_count": 9,
  6 |    "metadata": {},
  7 |    "outputs": [],
  8 |    "source": [
  9 |     "import numpy as np\n",
 10 |     "import torch\n",
 11 |     "import time\n",
 12 |     "from torchvision.models import *\n",
 13 |     "import pandas as pd"
 14 |    ]
 15 |   },
 16 |   {
 17 |    "cell_type": "code",
 18 |    "execution_count": 2,
 19 |    "metadata": {},
 20 |    "outputs": [],
 21 |    "source": [
 22 |     "# make models from str\n",
 23 |     "model_name = \"resnet18\""
 24 |    ]
 25 |   },
 26 |   {
 27 |    "cell_type": "code",
 28 |    "execution_count": 7,
 29 |    "metadata": {},
 30 |    "outputs": [],
 31 |    "source": [
 32 |     "def computeTime(model, input_size=[1, 3, 224, 224], device='cuda', FP16=False):\n",
 33 |     "    inputs = torch.randn(input_size)\n",
 34 |     "    if device == 'cuda':\n",
 35 |     "        model = model.cuda()\n",
 36 |     "        inputs = inputs.cuda()\n",
 37 |     "    if FP16:\n",
 38 |     "        model = model.half()\n",
 39 |     "        inputs = inputs.half()\n",
 40 |     "\n",
 41 |     "    model.eval()\n",
 42 |     "\n",
 43 |     "    i = 0\n",
 44 |     "    time_spent = []\n",
 45 |     "    while i < 200:\n",
 46 |     "        start_time = time.time()\n",
 47 |     "        with torch.no_grad():\n",
 48 |     "            _ = model(inputs)\n",
 49 |     "\n",
 50 |     "        if device == 'cuda':\n",
 51 |     "            torch.cuda.synchronize() # wait for cuda to finish (cuda is asynchronous!)\n",
 52 |     "        if i != 0:\n",
 53 |     "            time_spent.append(time.time() - start_time)\n",
 54 |     "        i += 1\n",
 55 |     "    print('Avg execution time (ms): {:.3f}'.format(np.mean(time_spent) * 1000))  # time.time() deltas are in seconds\n",
 56 |     "    return np.mean(time_spent)"
 57 |    ]
 58 |   },
 59 |   {
 60 |    "cell_type": "code",
 61 |    "execution_count": 38,
 62 |    "metadata": {},
 63 |    "outputs": [],
 64 |    "source": [
 65 |     "modellist = [\"resnet18\", \"resnet34\", \"resnet50\", \"resnet101\", \"resnet152\",\n",
 66 |     "             \"resnext50_32x4d\", \"resnext101_32x8d\", \"mnasnet1_0\", \"squeezenet1_0\", \"densenet121\", \"densenet169\", \"inception_v3\"]\n",
 67 |     "# Override: benchmark only the ResNet family in this run\n",
 68 |     "modellist = [\"resnet18\", \"resnet34\", \"resnet50\", \"resnet101\", \"resnet152\"]"
 69 |    ]
 70 |   },
 71 |   {
 72 |    "cell_type": "code",
 73 |    "execution_count": null,
 74 |    "metadata": {
 75 |     "scrolled": false
 76 |    },
 77 |    "outputs": [
 78 |     {
 79 |      "name": "stdout",
 80 |      "output_type": "stream",
 81 |      "text": [
 82 |       "model: resnet18\n",
 83 |       "Avg execution time (ms): 0.008\n",
 84 |       "Avg execution time (ms): 0.019\n",
 85 |       "Avg execution time (ms): 0.046\n",
 86 |       "Avg execution time (ms): 0.040\n"
 87 |      ]
 88 |     }
 89 |    ],
 90 |    "source": [
 91 |     "for i, model_name in enumerate(modellist):\n",
 92 |     "    batchlist = [1, 4, 8, 16, 32]\n",
 93 |     "    imsize = [128, 256, 512]\n",
 94 |     "    runtimes = []\n",
 95 |     "    \n",
 96 |     "    # define model\n",
 97 |     "    print(\"model: {}\".format(model_name))\n",
 98 |     "    mdl = globals()[model_name]\n",
 99 |     "    model = mdl()\n",
100 |     "    \n",
101 |     "    for batch in batchlist:        \n",
102 |     "        runtimes.append(computeTime(model, input_size=[batch, 3, 256, 256], device=\"cuda\", FP16=False)/batch)\n",
103 |     "\n",
104 |     "    if i == 0:\n",
105 |     "        dfbatch = pd.DataFrame({model_name: runtimes},\n",
106 |     "                         index = batchlist)\n",
107 |     "    else:\n",
108 |     "        dfbatch[model_name] = runtimes\n",
109 |     "        \n",
110 |     "    runtimes = []\n",
111 |     "    for isize in imsize:\n",
112 |     "        print(\"model: {}\".format(model_name))\n",
113 |     "        mdl = globals()[model_name]\n",
114 |     "        model = mdl()\n",
115 |     "        runtimes.append(computeTime(model, input_size=[1, 3, isize, isize], device=\"cuda\", FP16=False))\n",
116 |     "\n",
117 |     "    if i == 0:\n",
118 |     "        dfimsize = pd.DataFrame({model_name: runtimes},\n",
119 |     "                         index = imsize)\n",
120 |     "    else:\n",
121 |     "        dfimsize[model_name] = runtimes"
122 |    ]
123 |   },
124 |   {
125 |    "cell_type": "code",
126 |    "execution_count": 37,
127 |    "metadata": {},
128 |    "outputs": [
129 |     {
130 |      "data": {
131 |       "text/html": [
132 |        "<div>\n",
133 |        "<style scoped>\n",
134 |        "    .dataframe tbody tr th:only-of-type {\n",
135 |        "        vertical-align: middle;\n",
136 |        "    }\n",
137 |        "\n",
138 |        "    .dataframe tbody tr th {\n",
139 |        "        vertical-align: top;\n",
140 |        "    }\n",
141 |        "\n",
142 |        "    .dataframe thead th {\n",
143 |        "        text-align: right;\n",
144 |        "    }\n",
145 |        "</style>\n",
146 |        "<table border=\"1\" class=\"dataframe\">\n",
147 |        "  <thead>\n",
148 |        "    <tr style=\"text-align: right;\">\n",
149 |        "      <th></th>\n",
150 |        "      <th>resnet18</th>\n",
151 |        "      <th>resnet34</th>\n",
152 |        "      <th>resnet50</th>\n",
153 |        "      <th>resnet101</th>\n",
154 |        "      <th>resnet152</th>\n",
155 |        "      <th>resnext50_32x4d</th>\n",
156 |        "      <th>resnext101_32x8d</th>\n",
157 |        "      <th>mnasnet1_0</th>\n",
158 |        "      <th>squeezenet1_0</th>\n",
159 |        "      <th>densenet121</th>\n",
160 |        "      <th>densenet169</th>\n",
161 |        "      <th>inception_v3</th>\n",
162 |        "    </tr>\n",
163 |        "  </thead>\n",
164 |        "  <tbody>\n",
165 |        "    <tr>\n",
166 |        "      <th>cuda FP32</th>\n",
167 |        "      <td>0.006895</td>\n",
168 |        "      <td>0.013512</td>\n",
169 |        "      <td>0.016632</td>\n",
170 |        "      <td>0.032939</td>\n",
171 |        "      <td>0.048400</td>\n",
172 |        "      <td>0.033309</td>\n",
173 |        "      <td>0.118709</td>\n",
174 |        "      <td>0.007704</td>\n",
175 |        "      <td>0.004120</td>\n",
176 |        "      <td>0.025061</td>\n",
177 |        "      <td>0.037639</td>\n",
178 |        "      <td>0.027673</td>\n",
179 |        "    </tr>\n",
180 |        "    <tr>\n",
181 |        "      <th>cuda FP16</th>\n",
182 |        "      <td>0.007969</td>\n",
183 |        "      <td>0.015316</td>\n",
184 |        "      <td>0.017940</td>\n",
185 |        "      <td>0.035898</td>\n",
186 |        "      <td>0.052364</td>\n",
187 |        "      <td>0.033756</td>\n",
188 |        "      <td>0.114106</td>\n",
189 |        "      <td>0.007777</td>\n",
190 |        "      <td>0.003966</td>\n",
191 |        "      <td>0.022630</td>\n",
192 |        "      <td>0.034838</td>\n",
193 |        "      <td>0.030210</td>\n",
194 |        "    </tr>\n",
195 |        "  </tbody>\n",
196 |        "</table>\n",
197 |        "</div>"
198 |       ],
199 |       "text/plain": [
200 |        "           resnet18  resnet34  resnet50  resnet101  resnet152  \\\n",
201 |        "cuda FP32  0.006895  0.013512  0.016632   0.032939   0.048400   \n",
202 |        "cuda FP16  0.007969  0.015316  0.017940   0.035898   0.052364   \n",
203 |        "\n",
204 |        "           resnext50_32x4d  resnext101_32x8d  mnasnet1_0  squeezenet1_0  \\\n",
205 |        "cuda FP32         0.033309          0.118709    0.007704       0.004120   \n",
206 |        "cuda FP16         0.033756          0.114106    0.007777       0.003966   \n",
207 |        "\n",
208 |        "           densenet121  densenet169  inception_v3  \n",
209 |        "cuda FP32     0.025061     0.037639      0.027673  \n",
210 |        "cuda FP16     0.022630     0.034838      0.030210  "
211 |       ]
212 |      },
213 |      "execution_count": 37,
214 |      "metadata": {},
215 |      "output_type": "execute_result"
216 |     }
217 |    ],
218 |    "source": [
219 |     "df"
220 |    ]
221 |   },
222 |   {
223 |    "cell_type": "code",
224 |    "execution_count": null,
225 |    "metadata": {},
226 |    "outputs": [],
227 |    "source": []
228 |   }
229 |  ],
230 |  "metadata": {
231 |   "kernelspec": {
232 |    "display_name": "Python 3",
233 |    "language": "python",
234 |    "name": "python3"
235 |   },
236 |   "language_info": {
237 |    "codemirror_mode": {
238 |     "name": "ipython",
239 |     "version": 3
240 |    },
241 |    "file_extension": ".py",
242 |    "mimetype": "text/x-python",
243 |    "name": "python",
244 |    "nbconvert_exporter": "python",
245 |    "pygments_lexer": "ipython3",
246 |    "version": "3.6.5"
247 |   }
248 |  },
249 |  "nbformat": 4,
250 |  "nbformat_minor": 2
251 | }
252 | 


--------------------------------------------------------------------------------
/inference_segmentation.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "code",
  5 |    "execution_count": 1,
  6 |    "metadata": {},
  7 |    "outputs": [],
  8 |    "source": [
  9 |     "import numpy as np\n",
 10 |     "import torch\n",
 11 |     "import time\n",
 12 |     "from torchvision.models import *\n",
 13 |     "import pandas as pd\n",
 14 |     "import os\n",
 15 |     "import torchvision\n",
 16 |     "from torch2trt import torch2trt"
 17 |    ]
 18 |   },
 19 |   {
 20 |    "cell_type": "code",
 21 |    "execution_count": 2,
 22 |    "metadata": {},
 23 |    "outputs": [],
 24 |    "source": [
 25 |     "from torchvision.models.segmentation import *"
 26 |    ]
 27 |   },
 28 |   {
 29 |    "cell_type": "code",
 30 |    "execution_count": 3,
 31 |    "metadata": {},
 32 |    "outputs": [],
 33 |    "source": [
 34 |     "FP32 = True\n",
 35 |     "FP16 = True\n",
 36 |     "INT8 = True"
 37 |    ]
 38 |   },
 39 |   {
 40 |    "cell_type": "code",
 41 |    "execution_count": 4,
 42 |    "metadata": {},
 43 |    "outputs": [],
 44 |    "source": [
 45 |     "# make results\n",
 46 |     "os.makedirs(\"results\", exist_ok=True)"
 47 |    ]
 48 |   },
 49 |   {
 50 |    "cell_type": "code",
 51 |    "execution_count": 5,
 52 |    "metadata": {},
 53 |    "outputs": [],
 54 |    "source": [
 55 |     "class ModelWrapper(torch.nn.Module):\n",
 56 |     "    def __init__(self, model):\n",
 57 |     "        super(ModelWrapper, self).__init__()\n",
 58 |     "        self.model = model\n",
 59 |     "    def forward(self, x):\n",
 60 |     "        return self.model(x)['out']"
 61 |    ]
 62 |   },
 63 |   {
 64 |    "cell_type": "code",
 65 |    "execution_count": 6,
 66 |    "metadata": {},
 67 |    "outputs": [],
 68 |    "source": [
 69 |     "def computeTime(model, input_size=[1, 3, 224, 224], device='cuda', FP16=False):\n",
 70 |     "    inputs = torch.randn(input_size)\n",
 71 |     "    if device == 'cuda':\n",
 72 |     "        model = model.cuda()\n",
 73 |     "        inputs = inputs.cuda()\n",
 74 |     "    if FP16:\n",
 75 |     "        model = model.half()\n",
 76 |     "        inputs = inputs.half()\n",
 77 |     "\n",
 78 |     "    model.eval()\n",
 79 |     "\n",
 80 |     "    i = 0\n",
 81 |     "    time_spent = []\n",
 82 |     "    while i < 200:\n",
 83 |     "        start_time = time.time()\n",
 84 |     "        with torch.no_grad():\n",
 85 |     "            _ = model(inputs)\n",
 86 |     "\n",
 87 |     "        if device == 'cuda':\n",
 88 |     "            torch.cuda.synchronize() # wait for cuda to finish (cuda is asynchronous!)\n",
 89 |     "        if i != 0:\n",
 90 |     "            time_spent.append(time.time() - start_time)\n",
 91 |     "        i += 1\n",
 92 |     "    print('Avg execution time (ms): {:.3f}'.format(np.mean(time_spent) * 1000))  # time.time() deltas are in seconds\n",
 93 |     "    return np.mean(time_spent)"
 94 |    ]
 95 |   },
 96 |   {
 97 |    "cell_type": "code",
 98 |    "execution_count": 7,
 99 |    "metadata": {},
100 |    "outputs": [],
101 |    "source": [
102 |     "# ResNet backbones are enough for now\n",
103 |     "modellist = [\"fcn_resnet50\", \"fcn_resnet101\", \"deeplabv3_resnet50\", \"deeplabv3_resnet101\"]"
104 |    ]
105 |   },
106 |   {
107 |    "cell_type": "code",
108 |    "execution_count": 9,
109 |    "metadata": {
110 |     "scrolled": true
111 |    },
112 |    "outputs": [
113 |     {
114 |      "name": "stdout",
115 |      "output_type": "stream",
116 |      "text": [
117 |       "model: fcn_resnet50\n",
118 |       "Avg execution time (ms): 0.205\n",
119 |       "Avg execution time (ms): 0.174\n",
120 |       "running fp16 models..\n",
121 |       "Avg execution time (ms): 0.037\n",
122 |       "running int8 models..\n",
123 |       "Avg execution time (ms): 0.022\n",
124 |       "model: fcn_resnet101\n"
125 |      ]
126 |     },
127 |     {
128 |      "name": "stderr",
129 |      "output_type": "stream",
130 |      "text": [
131 |       "Downloading: \"https://download.pytorch.org/models/resnet101-5d3b4d8f.pth\" to /home/ken/.cache/torch/checkpoints/resnet101-5d3b4d8f.pth\n",
132 |       "100.0%\n"
133 |      ]
134 |     },
135 |     {
136 |      "name": "stdout",
137 |      "output_type": "stream",
138 |      "text": [
139 |       "Avg execution time (ms): 0.344\n",
140 |       "Avg execution time (ms): 0.290\n",
141 |       "running fp16 models..\n",
142 |       "Avg execution time (ms): 0.057\n",
143 |       "running int8 models..\n",
144 |       "Avg execution time (ms): 0.032\n",
145 |       "model: deeplabv3_resnet50\n",
146 |       "Avg execution time (ms): 0.281\n",
147 |       "Avg execution time (ms): 0.252\n",
148 |       "running fp16 models..\n",
149 |       "Avg execution time (ms): 0.130\n",
150 |       "running int8 models..\n",
151 |       "Avg execution time (ms): 0.097\n",
152 |       "model: deeplabv3_resnet101\n",
153 |       "Avg execution time (ms): 0.426\n",
154 |       "Avg execution time (ms): 0.367\n",
155 |       "running fp16 models..\n",
156 |       "Avg execution time (ms): 0.151\n",
157 |       "running int8 models..\n",
158 |       "Avg execution time (ms): 0.108\n"
159 |      ]
160 |     }
161 |    ],
162 |    "source": [
163 |     "results = []\n",
164 |     "for i, model_name in enumerate(modellist):\n",
165 |     "    runtimes = []\n",
166 |     "\n",
167 |     "    # define model\n",
168 |     "    print(\"model: {}\".format(model_name))\n",
169 |     "    input_size = [1, 3, 512, 512]\n",
170 |     "    mdl = globals()[model_name]\n",
171 |     "    model = mdl().cuda().eval()\n",
172 |     "    # Run raw models\n",
173 |     "    runtimes.append(computeTime(model, input_size=input_size, device=\"cuda\", FP16=False))\n",
174 |     "\n",
175 |     "    if FP32:    \n",
176 |     "        mdl = globals()[model_name]\n",
177 |     "        model = mdl().cuda().eval()\n",
178 |     "        model_w = ModelWrapper(model)\n",
179 |     "        x = torch.zeros(input_size).cuda()\n",
180 |     "\n",
181 |     "        # convert to tensorrt models\n",
182 |     "        model_trt = torch2trt(model_w, [x])\n",
183 |     "\n",
184 |     "        # Run TensorRT models\n",
185 |     "        runtimes.append(computeTime(model_trt, input_size=input_size, device=\"cuda\", FP16=False))\n",
186 |     "    if FP16:\n",
187 |     "        print(\"running fp16 models..\")\n",
188 |     "        # Make FP16 tensorRT models\n",
189 |     "        mdl = globals()[model_name]\n",
190 |     "        model = mdl().eval().half().cuda()\n",
191 |     "        model_w = ModelWrapper(model).half()\n",
192 |     "        x = torch.zeros(input_size).half().cuda()\n",
193 |     "        # convert to tensorrt models\n",
194 |     "        model_trt = torch2trt(model_w, [x], fp16_mode=True)\n",
195 |     "        # Run TensorRT models\n",
196 |     "        runtimes.append(computeTime(model_trt, input_size=input_size, device=\"cuda\", FP16=True))\n",
197 |     "\n",
198 |     "    if INT8:\n",
199 |     "        print(\"running int8 models..\")\n",
200 |     "        # Make INT8 tensorRT models\n",
201 |     "        mdl = globals()[model_name]\n",
202 |     "        model = mdl().eval().half().cuda()\n",
203 |     "        model_w = ModelWrapper(model).half()\n",
204 |     "        x = torch.randn(input_size).half().cuda()\n",
205 |     "        # convert to tensorrt models\n",
206 |     "        model_trt = torch2trt(model_w, [x], fp16_mode=True, int8_mode=True, max_batch_size=1)\n",
207 |     "\n",
208 |     "        runtimes.append(computeTime(model_trt, input_size=input_size, device=\"cuda\", FP16=True))\n",
209 |     "\n",
210 |     "    if i == 0:\n",
211 |     "        df = pd.DataFrame({model_name: runtimes},\n",
212 |     "                         index = [\"Raw\", \"FP32\", \"FP16\", \"INT8\"])\n",
213 |     "    else:\n",
214 |     "        df[model_name] = runtimes"
215 |    ]
216 |   },
217 |   {
218 |    "cell_type": "code",
219 |    "execution_count": 10,
220 |    "metadata": {},
221 |    "outputs": [
222 |     {
223 |      "data": {
224 |       "text/html": [
225 |        "<div>\n",
226 |        "<style scoped>\n",
227 |        "    .dataframe tbody tr th:only-of-type {\n",
228 |        "        vertical-align: middle;\n",
229 |        "    }\n",
230 |        "\n",
231 |        "    .dataframe tbody tr th {\n",
232 |        "        vertical-align: top;\n",
233 |        "    }\n",
234 |        "\n",
235 |        "    .dataframe thead th {\n",
236 |        "        text-align: right;\n",
237 |        "    }\n",
238 |        "</style>\n",
239 |        "<table border=\"1\" class=\"dataframe\">\n",
240 |        "  <thead>\n",
241 |        "    <tr style=\"text-align: right;\">\n",
242 |        "      <th></th>\n",
243 |        "      <th>fcn_resnet50</th>\n",
244 |        "      <th>fcn_resnet101</th>\n",
245 |        "      <th>deeplabv3_resnet50</th>\n",
246 |        "      <th>deeplabv3_resnet101</th>\n",
247 |        "    </tr>\n",
248 |        "  </thead>\n",
249 |        "  <tbody>\n",
250 |        "    <tr>\n",
251 |        "      <th>Raw</th>\n",
252 |        "      <td>0.205359</td>\n",
253 |        "      <td>0.344307</td>\n",
254 |        "      <td>0.281023</td>\n",
255 |        "      <td>0.425960</td>\n",
256 |        "    </tr>\n",
257 |        "    <tr>\n",
258 |        "      <th>FP32</th>\n",
259 |        "      <td>0.173818</td>\n",
260 |        "      <td>0.290180</td>\n",
261 |        "      <td>0.252314</td>\n",
262 |        "      <td>0.366532</td>\n",
263 |        "    </tr>\n",
264 |        "    <tr>\n",
265 |        "      <th>FP16</th>\n",
266 |        "      <td>0.036635</td>\n",
267 |        "      <td>0.056922</td>\n",
268 |        "      <td>0.129868</td>\n",
269 |        "      <td>0.151195</td>\n",
270 |        "    </tr>\n",
271 |        "    <tr>\n",
272 |        "      <th>INT8</th>\n",
273 |        "      <td>0.021869</td>\n",
274 |        "      <td>0.032292</td>\n",
275 |        "      <td>0.097351</td>\n",
276 |        "      <td>0.108282</td>\n",
277 |        "    </tr>\n",
278 |        "  </tbody>\n",
279 |        "</table>\n",
280 |        "</div>"
281 |       ],
282 |       "text/plain": [
283 |        "      fcn_resnet50  fcn_resnet101  deeplabv3_resnet50  deeplabv3_resnet101\n",
284 |        "Raw       0.205359       0.344307            0.281023             0.425960\n",
285 |        "FP32      0.173818       0.290180            0.252314             0.366532\n",
286 |        "FP16      0.036635       0.056922            0.129868             0.151195\n",
287 |        "INT8      0.021869       0.032292            0.097351             0.108282"
288 |       ]
289 |      },
290 |      "execution_count": 10,
291 |      "metadata": {},
292 |      "output_type": "execute_result"
293 |     }
294 |    ],
295 |    "source": [
296 |     "df.to_csv(\"results/xavier_segmentation.csv\")\n",
297 |     "df"
298 |    ]
299 |   },
300 |   {
301 |    "cell_type": "code",
302 |    "execution_count": null,
303 |    "metadata": {},
304 |    "outputs": [],
305 |    "source": []
306 |   }
307 |  ],
308 |  "metadata": {
309 |   "kernelspec": {
310 |    "display_name": "Python 3",
311 |    "language": "python",
312 |    "name": "python3"
313 |   },
314 |   "language_info": {
315 |    "codemirror_mode": {
316 |     "name": "ipython",
317 |     "version": 3
318 |    },
319 |    "file_extension": ".py",
320 |    "mimetype": "text/x-python",
321 |    "name": "python",
322 |    "nbconvert_exporter": "python",
323 |    "pygments_lexer": "ipython3",
324 |    "version": "3.6.9"
325 |   }
326 |  },
327 |  "nbformat": 4,
328 |  "nbformat_minor": 2
329 | }
330 | 


--------------------------------------------------------------------------------
/inference_tensorrt.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python
  2 | # coding: utf-8
  3 | 
  4 | import numpy as np
  5 | import torch
  6 | import time
  7 | from torchvision.models import *
  8 | from utils.fp16 import network_to_half
  9 | import os
 10 | from torch2trt import torch2trt
 11 | import pandas as pd
 12 | 
 13 | FP32 = True
 14 | FP16 = True
 15 | INT8 = True
 16 | 
 17 | # make the results directory
 18 | os.makedirs("results", exist_ok=True)
 19 | 
 20 | def computeTime(model, input_size=[1, 3, 224, 224], device='cuda', FP16=False):
 21 |     inputs = torch.randn(input_size)
 22 |     if device == 'cuda':
 23 |         inputs = inputs.cuda()
 24 |     if FP16:
 25 |         model = network_to_half(model)
 26 | 
 27 |     i = 0
 28 |     time_spent = []
 29 |     while i < 200:
 30 |         start_time = time.time()
 31 |         with torch.no_grad():
 32 |             _ = model(inputs)
 33 | 
 34 |         if device == 'cuda':
 35 |             torch.cuda.synchronize() # wait for cuda to finish (cuda is asynchronous!)
 36 |         if i != 0:
 37 |             time_spent.append(time.time() - start_time)
 38 |         i += 1
 39 |     print('Avg execution time (ms): {:.3f}'.format(np.mean(time_spent) * 1000))
 40 |     return np.mean(time_spent)
 41 | 
 42 | 
 43 | modellist = ["resnet18", "resnet34", "resnet50", "resnet101", "resnet152",  "resnext50_32x4d", "resnext101_32x8d", "mnasnet1_0", "squeezenet1_0", "densenet121", "densenet169", "inception_v3"]
 44 | 
 45 | # ResNet is enough for now
 46 | modellist = ["resnet18", "resnet34", "resnet50"]
 47 | results = []
 48 | 
 49 | for i, model_name in enumerate(modellist):
 50 |     runtimes = []
 51 | 
 52 |     input_size = [1, 3, 256, 256]
 53 |     mdl = globals()[model_name]
 54 |     model = mdl().cuda().eval()
 55 |     # Run raw models
 56 |     runtimes.append(computeTime(model, input_size=input_size, device="cuda", FP16=False))
 57 | 
 58 |     if FP32:
 59 |         # define model
 60 |         print("model: {}".format(model_name))
 61 |         mdl = globals()[model_name]
 62 |         model = mdl().cuda().eval()
 63 |         # define input
 64 |         input_size = [1, 3, 256, 256]
 65 |         x = torch.zeros(input_size).cuda()
 66 | 
 67 |         # convert to tensorrt models
 68 |         model_trt = torch2trt(model, [x])
 69 | 
 70 |         # Run TensorRT models
 71 |         runtimes.append(computeTime(model_trt, input_size=input_size, device="cuda", FP16=False))
 72 |     if FP16:
 73 |         print("running fp16 models..")
 74 |         # Make FP16 tensorRT models
 75 |         mdl = globals()[model_name]
 76 |         model = mdl().eval().half().cuda()
 77 |         # define input
 78 |         input_size = [1, 3, 256, 256]
 79 |         x = torch.zeros(input_size).half().cuda()
 80 |         # convert to tensorrt models
 81 |         model_trt = torch2trt(model, [x], fp16_mode=True)
 82 |         # Run TensorRT models
 83 |         runtimes.append(computeTime(model_trt, input_size=input_size, device="cuda", FP16=True))
 84 | 
 86 |     if INT8:
 87 |         print("running int8 models..")
 88 |         # Make INT8 tensorRT models
 89 |         mdl = globals()[model_name]
 90 |         model = mdl().eval().half().cuda()
 91 |         # define input
 92 |         input_size = [1, 3, 256, 256]
 93 |         x = torch.randn(input_size).half().cuda()
 94 |         # convert to tensorrt models
 95 |         model_trt = torch2trt(model, [x], fp16_mode=True, int8_mode=True, max_batch_size=1)
 96 |         # Run TensorRT models
 97 |         input_size = [1, 3, 256, 256]
 98 |         runtimes.append(computeTime(model_trt, input_size=input_size, device="cuda", FP16=True))
 99 |         results.append({model_name: runtimes})
100 | 
101 |     if i == 0:
102 |         df = pd.DataFrame({model_name: runtimes},
103 |                          index = ["Raw", "FP32", "FP16", "INT8"])
104 |     else:
105 |         df[model_name] = runtimes
106 | 
107 | df.to_csv("results/xavier.csv")
108 | print(df)
109 | 
110 | 
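
The timing loop in `computeTime` above discards the first iteration and averages wall-clock time, which `time.time()` reports in seconds. A minimal, framework-free sketch of the same pattern follows. The `benchmark` and `to_ms` helpers, their parameters, and the `sync` hook are hypothetical names used for illustration and are not part of this repository. On a GPU, one would presumably pass `torch.cuda.synchronize` as `sync` so that asynchronous kernels finish before the clock stops.

```python
import time
from statistics import mean


def benchmark(fn, n_iters=200, n_warmup=1, sync=None):
    """Return the mean wall-clock time of fn() in seconds.

    Hypothetical helper: runs n_warmup untimed calls first, then
    times n_iters calls. sync, if given, is called after each fn()
    (e.g. torch.cuda.synchronize for asynchronous CUDA work).
    """
    def run_once():
        fn()
        if sync is not None:
            sync()

    # Warm-up calls are executed but never timed.
    for _ in range(n_warmup):
        run_once()

    timings = []
    for _ in range(n_iters):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    return mean(timings)


def to_ms(seconds):
    """Convert seconds to milliseconds for reporting."""
    return seconds * 1000.0


if __name__ == "__main__":
    avg = benchmark(lambda: sum(range(10_000)), n_iters=50)
    print("Avg execution time (ms): {:.3f}".format(to_ms(avg)))
```

Keeping the returned value in seconds and converting only at print time avoids mixing units between the console output and the values written to the CSV files.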


--------------------------------------------------------------------------------
/results/batch.csv:
--------------------------------------------------------------------------------
1 | ,resnet18,resnet34,resnet50
2 | 1,0.008542266922380456,0.01481275103200021,0.0178466082817346
3 | 4,0.004840066684550376,0.0091991511421587,0.011360873229539574
4 | 8,0.005877630195425983,0.009893905727108519,0.011597911616665634
5 | 16,0.0024946156009357776,0.004145186525493411,0.008591583475994704
6 | 32,0.002836233557169162,0.004563913123691502,0.008728343338223558
7 | 


--------------------------------------------------------------------------------
/results/fp16.csv:
--------------------------------------------------------------------------------
1 | ,resnet18,resnet34,resnet50
2 | FP32,0.00989055154311597,0.01693966640299888,0.019452313082901077
3 | FP16_torch,0.009034480281810664,0.017017085348541412,0.019448458848886154
4 | FP16_apex,0.009056915589912453,0.016978000276651813,0.020045562006121304
5 | 


--------------------------------------------------------------------------------
/results/imsize.csv:
--------------------------------------------------------------------------------
1 | ,resnet18,resnet34,resnet50
2 | 128,0.007868697295835869,0.014414251749239975,0.01414928244585967
3 | 256,0.007717028335111225,0.014684499807693251,0.017655759600538706
4 | 512,0.02013937672178949,0.03769871218120632,0.04788235084495353
5 | 1024,0.07168652424261199,0.13482510863836086,0.1791151432535756
6 | 


--------------------------------------------------------------------------------
/results/jetsonano.txt:
--------------------------------------------------------------------------------
1 | Without TensorRT FP32/FP16
2 | [{'resnet18': [0.038558399257947455, 0.037869752951003796]}, {'resnet34': [0.06386453542278041, 0.06943798424610541]}, {'resnet50': [0.0906546499261904, 0.08942561532983828]}]
3 | 
4 | With TensorRT FP32/FP16
5 | [{'resnet18': [0.02693739967729578, 0.030103209030688107]}, {'resnet34': [0.047432633500602374, 0.046157992664893066]}, {'resnet50': [0.07816180631743004, 0.07399397758982289]}]
6 | 
7 | 
8 | 


--------------------------------------------------------------------------------
/results/xavier.csv:
--------------------------------------------------------------------------------
1 | ,resnet18,resnet34,resnet50
2 | Raw,0.007432765098073375,0.011262918836507365,0.015097524652529002
3 | FP32,0.003625017913741682,0.005848102234116751,0.010952574523849104
4 | FP16,0.0017815654601284008,0.004464829986418911,0.0041645591582485176
5 | INT8,0.0018363609984891499,0.004203463319557995,0.0030628951949689853
6 | 
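
To read the table above as speedups rather than raw seconds, each row's time can be divided into the `Raw` row's time. The sketch below does that with the standard library only. The CSV text is copied from `results/xavier.csv`, and the `speedups` function is a hypothetical helper, not something the repository provides.

```python
import csv
import io

# Values copied from results/xavier.csv (mean seconds per forward pass).
XAVIER_CSV = """,resnet18,resnet34,resnet50
Raw,0.007432765098073375,0.011262918836507365,0.015097524652529002
FP32,0.003625017913741682,0.005848102234116751,0.010952574523849104
FP16,0.0017815654601284008,0.004464829986418911,0.0041645591582485176
INT8,0.0018363609984891499,0.004203463319557995,0.0030628951949689853
"""


def speedups(csv_text, baseline="Raw"):
    """Return {row: {model: baseline_time / row_time}} (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)[1:]  # model names; the first cell is the empty index label
    rows = {r[0]: [float(v) for v in r[1:]] for r in reader if r}
    base = rows[baseline]
    return {
        name: {m: b / t for m, b, t in zip(header, base, times)}
        for name, times in rows.items()
    }


if __name__ == "__main__":
    for name, per_model in speedups(XAVIER_CSV).items():
        print(name, {m: round(s, 2) for m, s in per_model.items()})
```

With these values, the TensorRT FP32 engine roughly halves resnet18 latency relative to the raw PyTorch model, and the lower-precision rows show larger gains for resnet50.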


--------------------------------------------------------------------------------
/results/xavier_segmentation.csv:
--------------------------------------------------------------------------------
1 | ,fcn_resnet50,fcn_resnet101,deeplabv3_resnet50,deeplabv3_resnet101
2 | Raw,0.20535858312443872,0.34430714108836113,0.2810230279088619,0.425959756026915
3 | FP32,0.17381846126000486,0.29017985765658433,0.2523143626936716,0.3665324635242098
4 | FP16,0.03663515685191705,0.05692227521733423,0.12986798621901316,0.1511950037587228
5 | INT8,0.021869432986082144,0.032291673535677655,0.09735102749350083,0.10828216591073041
6 | 


--------------------------------------------------------------------------------
/utils/fp16.py:
--------------------------------------------------------------------------------
 1 | import torch
 2 | import torch.nn as nn
 3 | # Code adapted from https://github.com/fastai/imagenet-fast/tree/master/cifar10
 4 | 
 5 | class tofp16(nn.Module):
 6 |     def __init__(self):
 7 |         super(tofp16, self).__init__()
 8 | 
 9 |     def forward(self, input):
10 |         return input.half()
11 | 
12 | 
13 | def copy_in_params(net, params):
14 |     net_params = list(net.parameters())
15 |     for i in range(len(params)):
16 |         net_params[i].data.copy_(params[i].data)
17 | 
18 | 
19 | def set_grad(params, params_with_grad):
20 | 
21 |     for param, param_w_grad in zip(params, params_with_grad):
22 |         if param.grad is None:
23 |             param.grad = torch.nn.Parameter(param.data.new().resize_(*param.data.size()))
24 |         param.grad.data.copy_(param_w_grad.grad.data)
25 | 
26 | 
27 | def BN_convert_float(module):
28 |     '''
29 |     Keep the parameters of BatchNorm layers in single precision.
30 |     Find all layers and convert them back to float. This can't
31 |     be done with built in .apply as that function will apply
32 |     fn to all modules, parameters, and buffers. Thus we wouldn't
33 |     be able to guard the float conversion based on the module type.
34 |     '''
35 |     if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
36 |         module.float()
37 |     for child in module.children():
38 |         BN_convert_float(child)
39 |     return module
40 | 
41 | 
42 | def network_to_half(network):
43 |     return nn.Sequential(tofp16(), BN_convert_float(network.half()))
44 | 


--------------------------------------------------------------------------------