├── .gitignore
├── .gitmodules
├── LICENSE
├── README.md
├── datasets
│   └── .gitignore
├── demo.ipynb
├── demo
│   ├── 15673749081_767a7fa63a_k.jpg
│   ├── 16004479832_a748d55f21_k.jpg
│   ├── 17790319373_bd19b24cfc_k.jpg
│   ├── 18124840932_e42b3e377c_k.jpg
│   ├── 19064748793_bb942deea1_k.jpg
│   ├── 24274813513_0cfd2ce6d0_k.jpg
│   ├── 33823288584_1d21cf0a26_k.jpg
│   ├── 33887522274_eebd074106_k.jpg
│   ├── 34501842524_3c858b3080_k.jpg
│   ├── NOTICE
│   └── output
│       └── sample.jpg
├── demo_FPN.ipynb
├── eval_fast.ipynb
├── eval_fast_FPN.ipynb
├── eval_faster.ipynb
├── eval_faster_FPN.ipynb
├── eval_mask.ipynb
├── eval_mask_FPN.ipynb
├── files
│   ├── pretrained_base_cnn
│   │   └── .gitignore
│   ├── proposal_files
│   │   └── .gitignore
│   ├── results
│   │   └── .gitignore
│   └── trained_models
│       └── .gitignore
├── lib
│   ├── cppcuda
│   │   ├── build
│   │   │   └── .gitignore
│   │   ├── roi_align_backward_cpu.cpp
│   │   ├── roi_align_backward_cuda.cu
│   │   ├── roi_align_binding.cpp
│   │   ├── roi_align_cpu.cpp
│   │   ├── roi_align_cuda.h
│   │   └── roi_align_forward_cuda.cu
│   ├── cppcuda_cffi
│   │   ├── __init__.py
│   │   ├── bind.py
│   │   ├── get_lib_path.py
│   │   ├── make.sh
│   │   └── src
│   │       ├── cpp
│   │       │   ├── roi_align_cpu_loop.cpp
│   │       │   └── roi_align_cpu_loop.h
│   │       ├── cuda
│   │       │   ├── roi_align_backward_cuda_kernel.cu
│   │       │   ├── roi_align_backward_cuda_kernel.h
│   │       │   ├── roi_align_forward_cuda_kernel.cu
│   │       │   └── roi_align_forward_cuda_kernel.h
│   │       ├── roi_align_backward_cuda.c
│   │       ├── roi_align_backward_cuda.h
│   │       ├── roi_align_forward_cpu.c
│   │       ├── roi_align_forward_cpu.h
│   │       ├── roi_align_forward_cuda.c
│   │       └── roi_align_forward_cuda.h
│   ├── data
│   │   ├── coco_dataset.py
│   │   ├── json_dataset.py
│   │   └── roidb.py
│   ├── model
│   │   ├── collect_and_distribute_fpn_rpn_proposals.py
│   │   ├── detector.py
│   │   ├── generate_proposals.py
│   │   ├── loss.py
│   │   └── roi_align.py
│   ├── utils
│   │   ├── blob.py
│   │   ├── boxes.py
│   │   ├── collate_custom.py
│   │   ├── collections.py
│   │   ├── colormap.py
│   │   ├── data_parallel.py
│   │   ├── dummy_datasets.py
│   │   ├── fast_rcnn_sample_rois.py
│   │   ├── generate_anchors.py
│   │   ├── io.py
│   │   ├── json_dataset_evaluator.py
│   │   ├── logging.py
│   │   ├── multilevel_rois.py
│   │   ├── preprocess_sample.py
│   │   ├── result_utils.py
│   │   ├── segms.py
│   │   ├── selective_search.py
│   │   ├── solver.py
│   │   ├── timer.py
│   │   ├── training_stats.py
│   │   ├── utils.py
│   │   └── vis.py
│   └── utils_cython
│       ├── build_cython.py
│       ├── cython_bbox.pyx
│       └── cython_nms.pyx
└── train_fast.py
/.gitignore:
--------------------------------------------------------------------------------
1 | **/*.o
2 | **/*.so
3 | **/__pycache__/
4 | .ipynb_checkpoints/
5 | debug/
6 | lib/utils_cython/*.c
7 | demo/output/*.pdf
8 | lib/cppcuda_cffi/roialign/
9 |
10 |
--------------------------------------------------------------------------------
/.gitmodules:
--------------------------------------------------------------------------------
1 | [submodule "lib/cocoapi"]
2 | path = lib/cocoapi
3 | url = https://github.com/cocodataset/cocoapi
4 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright 2018 Ignacio Rocco
2 |
3 | Licensed under the Apache License, Version 2.0 (the "License");
4 | you may not use this file except in compliance with the License.
5 | You may obtain a copy of the License at
6 |
7 | http://www.apache.org/licenses/LICENSE-2.0
8 |
9 | Unless required by applicable law or agreed to in writing, software
10 | distributed under the License is distributed on an "AS IS" BASIS,
11 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | See the License for the specific language governing permissions and
13 | limitations under the License.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Detectorch - detectron for PyTorch
2 |
3 | (Disclaimer: this is work in progress and does not feature all the functionalities of Detectron. Currently only inference and evaluation are supported -- no training.)
4 | (News: Now supporting FPN and ResNet-101!)
5 |
6 | This code allows you to use some of the [Detectron models for object detection from Facebook AI Research](https://github.com/facebookresearch/Detectron/) with PyTorch.
7 |
8 | It currently supports:
9 |
10 | - Fast R-CNN
11 | - Faster R-CNN
12 | - Mask R-CNN
13 |
14 | It supports ResNet-50/101 models with or without FPN. The pre-trained Caffe2 models can be imported and used in PyTorch.
15 |
16 |
17 |
18 | *Example Mask R-CNN with ResNet-101 and FPN (sample output: demo/output/sample.jpg).*
19 |
20 |
21 | ## Evaluation
22 | Both bounding box evaluation and instance segmentation evaluation were tested, yielding the same results as the Detectron Caffe2 models. The results below were computed using this PyTorch code:
23 |
24 | | Model | box AP | mask AP | model id |
25 | | --- | --- | --- | --- |
26 | | [fast_rcnn_R-50-C4_2x](https://s3-us-west-2.amazonaws.com/detectron/36224046/12_2017_baselines/fast_rcnn_R-50-C4_2x.yaml.08_22_57.XFxNqEnL/output/train/coco_2014_train%3Acoco_2014_valminusminival/generalized_rcnn/model_final.pkl) | 35.6 | | 36224046 |
27 | | [fast_rcnn_R-50-FPN_2x](https://s3-us-west-2.amazonaws.com/detectron/36225249/12_2017_baselines/fast_rcnn_R-50-FPN_2x.yaml.08_40_18.zoChak1f/output/train/coco_2014_train%3Acoco_2014_valminusminival/generalized_rcnn/model_final.pkl) | 36.8 | | 36225249 |
28 | | [e2e_faster_rcnn_R-50-C4_2x](https://s3-us-west-2.amazonaws.com/detectron/35857281/12_2017_baselines/e2e_faster_rcnn_R-50-C4_2x.yaml.01_34_56.ScPH0Z4r/output/train/coco_2014_train%3Acoco_2014_valminusminival/generalized_rcnn/model_final.pkl) | 36.5 | | 35857281 |
29 | | [e2e_faster_rcnn_R-50-FPN_2x](https://s3-us-west-2.amazonaws.com/detectron/35857389/12_2017_baselines/e2e_faster_rcnn_R-50-FPN_2x.yaml.01_37_22.KSeq0b5q/output/train/coco_2014_train%3Acoco_2014_valminusminival/generalized_rcnn/model_final.pkl) | 37.9 | | 35857389 |
30 | | [e2e_mask_rcnn_R-50-C4_2x](https://s3-us-west-2.amazonaws.com/detectron/35858828/12_2017_baselines/e2e_mask_rcnn_R-50-C4_2x.yaml.01_46_47.HBThTerB/output/train/coco_2014_train%3Acoco_2014_valminusminival/generalized_rcnn/model_final.pkl) | 37.8 | 32.8 | 35858828 |
31 | | [e2e_mask_rcnn_R-50-FPN_2x](https://s3-us-west-2.amazonaws.com/detectron/35859007/12_2017_baselines/e2e_mask_rcnn_R-50-FPN_2x.yaml.01_49_07.By8nQcCH/output/train/coco_2014_train%3Acoco_2014_valminusminival/generalized_rcnn/model_final.pkl)| 38.6 | 34.5 | 35859007 |
32 | | [e2e_mask_rcnn_R-101-FPN_2x](https://s3-us-west-2.amazonaws.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl) | 40.9 | 36.4 | 35861858 |
33 |
34 |
35 | ## Training
36 | Training code is experimental. See `train_fast.py` for training Fast R-CNN. It seems to work, but it is slow.
37 |
38 | ## Installation
39 | First, clone the repo with `git clone --recursive https://github.com/ignacio-rocco/detectorch` so that you also clone the Coco API.
40 |
41 | The code can be used with PyTorch 0.3.1 or PyTorch 0.4 (master) under Python 3. Anaconda is recommended. Other required packages:
42 |
43 | - torchvision (`conda install torchvision -c soumith`)
44 | - opencv (`conda install -c conda-forge opencv `)
45 | - cython (`conda install cython`)
46 | - matplotlib (`conda install matplotlib`)
47 | - scikit-image (`conda install scikit-image`)
48 | - ninja (`conda install ninja`) *(required for PyTorch 0.4 only)*
49 |
50 | Additionally, you need to build the Coco API and RoIAlign layer. See below.
51 |
52 | #### Compiling the Coco API
53 | If you cloned this repo with `git clone --recursive` you should have also cloned the cocoapi in `lib/cocoapi`. Compile this with:
54 | ```
55 | cd lib/cocoapi/PythonAPI
56 | make install
57 | ```
58 |
59 |
60 | #### Compiling RoIAlign
61 | The RoIAlign layer was converted from the Caffe2 version. There is a separate implementation for each supported PyTorch version:
62 |
63 | - PyTorch 0.4: RoIAlign using the ATen library (`lib/cppcuda`), compiled just-in-time (JIT) when the module is loaded.
64 | - PyTorch 0.3.1: RoIAlign using TH/THC and cffi (`lib/cppcuda_cffi`), which needs to be compiled with:
65 |
66 | ```
67 | cd lib/cppcuda_cffi
68 | ./make.sh
69 | ```
70 |
71 | ## Quick Start
72 | Check the demo notebook.
73 |
--------------------------------------------------------------------------------
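The Quick Start above points to the demo notebook. As a condensed companion, here is a minimal inference sketch assembled from the `eval_fast.ipynb` cells reproduced further below; the paths are the same placeholders used in that notebook and must point to the model weights, proposal file and COCO annotations you have downloaded.

```python
import sys
sys.path.insert(0, "lib/")

import torch
from torch.utils.data import DataLoader

from data.coco_dataset import CocoDataset
from utils.preprocess_sample import preprocess_sample
from utils.collate_custom import collate_custom
from utils.utils import to_cuda_variable
import utils.result_utils as result_utils
from model.detector import detector

# Placeholder paths from eval_fast.ipynb; adjust to your setup.
pretrained_model_file = 'files/trained_models/fast/model_final.pkl'
proposal_file = 'files/proposal_files/coco_2014_minival/rpn_proposals.pkl'
coco_ann_file = 'datasets/data/coco/annotations/instances_minival2014.json'
img_dir = 'datasets/data/coco/val2014'

# Dataset with pre-computed proposals (Fast R-CNN needs external proposals).
dataset = CocoDataset(ann_file=coco_ann_file, img_dir=img_dir, proposal_file=proposal_file,
                      sample_transform=preprocess_sample(target_sizes=[800]))
dataloader = DataLoader(dataset, batch_size=1,  # only batch_size=1 is supported for now
                        shuffle=False, num_workers=0, collate_fn=collate_custom)

# Load the converted Detectron weights and run one image through the detector.
model = detector(arch='resnet50', detector_pkl_file=pretrained_model_file).cuda()

batch = to_cuda_variable(next(iter(dataloader)))
with torch.no_grad():  # PyTorch 0.4 idiom; on 0.3.1 use volatile Variables instead
    class_scores, bbox_deltas, _, _ = model(batch['image'], batch['rois'])

# Rescale boxes to the original image size, threshold the scores and run NMS.
scores, boxes, boxes_per_class = result_utils.postprocess_output(
    batch['rois'], batch['scaling_factors'], batch['original_im_size'],
    class_scores, bbox_deltas)
```

The Faster R-CNN notebook (`eval_faster.ipynb`) follows the same pattern, but constructs the detector with `use_rpn_head=True` and does not need a pre-computed proposal file.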
/datasets/.gitignore:
--------------------------------------------------------------------------------
1 | # Ignore everything in this directory
2 | *
3 | # Except this file
4 | !.gitignore
5 |
--------------------------------------------------------------------------------
/demo/15673749081_767a7fa63a_k.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ignacio-rocco/detectorch/bc2bc84781dfe3cb85aa4639ffd21d71989c6183/demo/15673749081_767a7fa63a_k.jpg
--------------------------------------------------------------------------------
/demo/16004479832_a748d55f21_k.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ignacio-rocco/detectorch/bc2bc84781dfe3cb85aa4639ffd21d71989c6183/demo/16004479832_a748d55f21_k.jpg
--------------------------------------------------------------------------------
/demo/17790319373_bd19b24cfc_k.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ignacio-rocco/detectorch/bc2bc84781dfe3cb85aa4639ffd21d71989c6183/demo/17790319373_bd19b24cfc_k.jpg
--------------------------------------------------------------------------------
/demo/18124840932_e42b3e377c_k.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ignacio-rocco/detectorch/bc2bc84781dfe3cb85aa4639ffd21d71989c6183/demo/18124840932_e42b3e377c_k.jpg
--------------------------------------------------------------------------------
/demo/19064748793_bb942deea1_k.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ignacio-rocco/detectorch/bc2bc84781dfe3cb85aa4639ffd21d71989c6183/demo/19064748793_bb942deea1_k.jpg
--------------------------------------------------------------------------------
/demo/24274813513_0cfd2ce6d0_k.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ignacio-rocco/detectorch/bc2bc84781dfe3cb85aa4639ffd21d71989c6183/demo/24274813513_0cfd2ce6d0_k.jpg
--------------------------------------------------------------------------------
/demo/33823288584_1d21cf0a26_k.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ignacio-rocco/detectorch/bc2bc84781dfe3cb85aa4639ffd21d71989c6183/demo/33823288584_1d21cf0a26_k.jpg
--------------------------------------------------------------------------------
/demo/33887522274_eebd074106_k.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ignacio-rocco/detectorch/bc2bc84781dfe3cb85aa4639ffd21d71989c6183/demo/33887522274_eebd074106_k.jpg
--------------------------------------------------------------------------------
/demo/34501842524_3c858b3080_k.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ignacio-rocco/detectorch/bc2bc84781dfe3cb85aa4639ffd21d71989c6183/demo/34501842524_3c858b3080_k.jpg
--------------------------------------------------------------------------------
/demo/NOTICE:
--------------------------------------------------------------------------------
1 | The demo images are licensed as United States government work:
2 | https://www.usa.gov/government-works
3 |
4 | The image files were obtained on Jan 13, 2018 from the following
5 | URLs.
6 |
7 | 16004479832_a748d55f21_k.jpg
8 | https://www.flickr.com/photos/archivesnews/16004479832
9 |
10 | 18124840932_e42b3e377c_k.jpg
11 | https://www.flickr.com/photos/usnavy/18124840932
12 |
13 | 33887522274_eebd074106_k.jpg
14 | https://www.flickr.com/photos/usaid_pakistan/33887522274
15 |
16 | 15673749081_767a7fa63a_k.jpg
17 | https://www.flickr.com/photos/usnavy/15673749081
18 |
19 | 34501842524_3c858b3080_k.jpg
20 | https://www.flickr.com/photos/departmentofenergy/34501842524
21 |
22 | 24274813513_0cfd2ce6d0_k.jpg
23 | https://www.flickr.com/photos/dhsgov/24274813513
24 |
25 | 19064748793_bb942deea1_k.jpg
26 | https://www.flickr.com/photos/statephotos/19064748793
27 |
28 | 33823288584_1d21cf0a26_k.jpg
29 | https://www.flickr.com/photos/cbpphotos/33823288584
30 |
31 | 17790319373_bd19b24cfc_k.jpg
32 | https://www.flickr.com/photos/secdef/17790319373
33 |
--------------------------------------------------------------------------------
/demo/output/sample.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ignacio-rocco/detectorch/bc2bc84781dfe3cb85aa4639ffd21d71989c6183/demo/output/sample.jpg
--------------------------------------------------------------------------------
/eval_fast.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Imports"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {
14 | "collapsed": true
15 | },
16 | "outputs": [],
17 | "source": [
18 | "import torch\n",
19 | "from torch.autograd import Variable\n",
20 | "from torch.utils.data import DataLoader\n",
21 | "\n",
22 | "import matplotlib.pyplot as plt\n",
23 | "import numpy as np\n",
24 | "\n",
25 | "import sys\n",
26 | "sys.path.insert(0, \"lib/\")\n",
27 | "from data.coco_dataset import CocoDataset\n",
28 | "from utils.preprocess_sample import preprocess_sample\n",
29 | "from utils.collate_custom import collate_custom\n",
30 | "from utils.utils import to_cuda_variable\n",
31 | "import utils.result_utils as result_utils\n",
32 | "from utils.json_dataset_evaluator import evaluate_boxes\n",
33 | "from model.detector import detector\n",
34 | "\n",
35 | "torch_ver = torch.__version__[:3]"
36 | ]
37 | },
38 | {
39 | "cell_type": "markdown",
40 | "metadata": {},
41 | "source": [
42 | "# Parameters"
43 | ]
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": 2,
48 | "metadata": {
49 | "collapsed": true
50 | },
51 | "outputs": [],
52 | "source": [
53 | "# Pretrained model\n",
54 | "# https://s3-us-west-2.amazonaws.com/detectron/36224046/12_2017_baselines/fast_rcnn_R-50-C4_2x.yaml.08_22_57.XFxNqEnL/output/train/coco_2014_train%3Acoco_2014_valminusminival/generalized_rcnn/model_final.pkl\n",
55 | "arch='resnet50'\n",
56 | "pretrained_model_file = 'files/trained_models/fast/model_final.pkl'\n",
57 | "\n",
58 | "# Pre-computed COCO minival2014 proposals\n",
59 | "# https://s3-us-west-2.amazonaws.com/detectron/35998355/12_2017_baselines/rpn_R-50-C4_1x.yaml.08_00_43.njH5oD9L/output/test/coco_2014_minival/rpn/rpn_proposals.pkl\n",
60 | "proposal_file='files/proposal_files/coco_2014_minival/rpn_proposals.pkl'\n",
61 | "\n",
62 | "# COCO minival2014 dataset path\n",
63 | "coco_ann_file='datasets/data/coco/annotations/instances_minival2014.json'\n",
64 | "img_dir='datasets/data/coco/val2014'"
65 | ]
66 | },
67 | {
68 | "cell_type": "markdown",
69 | "metadata": {},
70 | "source": [
71 | "# Create dataset"
72 | ]
73 | },
74 | {
75 | "cell_type": "code",
76 | "execution_count": 3,
77 | "metadata": {},
78 | "outputs": [
79 | {
80 | "name": "stdout",
81 | "output_type": "stream",
82 | "text": [
83 | "loading annotations into memory...\n",
84 | "Done (t=1.31s)\n",
85 | "creating index...\n",
86 | "index created!\n",
87 | "Loading proposals from: files/proposal_files/coco_2014_minival/rpn_proposals.pkl\n",
88 | " 1/5000\n",
89 | " 2501/5000\n"
90 | ]
91 | }
92 | ],
93 | "source": [
94 | "dataset = CocoDataset(ann_file=coco_ann_file,img_dir=img_dir,proposal_file=proposal_file,\n",
95 | " sample_transform=preprocess_sample(target_sizes=[800]))\n",
96 | "dataloader = DataLoader(dataset, batch_size=1, # only batch_size=1 is supported for now\n",
97 | " shuffle=False, num_workers=0, collate_fn=collate_custom)"
98 | ]
99 | },
100 | {
101 | "cell_type": "markdown",
102 | "metadata": {},
103 | "source": [
104 | "# Create detector model"
105 | ]
106 | },
107 | {
108 | "cell_type": "code",
109 | "execution_count": 4,
110 | "metadata": {},
111 | "outputs": [
112 | {
113 | "name": "stdout",
114 | "output_type": "stream",
115 | "text": [
116 | "loading pretrained weights\n"
117 | ]
118 | }
119 | ],
120 | "source": [
121 | "model = detector(arch=arch,\n",
122 | " detector_pkl_file=pretrained_model_file)\n",
123 | "model = model.cuda()"
124 | ]
125 | },
126 | {
127 | "cell_type": "markdown",
128 | "metadata": {},
129 | "source": [
130 | "# Evaluate"
131 | ]
132 | },
133 | {
134 | "cell_type": "code",
135 | "execution_count": 5,
136 | "metadata": {
137 | "collapsed": true
138 | },
139 | "outputs": [],
140 | "source": [
141 | "# Create data structure to store results\n",
142 | "all_boxes, all_segms, all_keyps = result_utils.empty_results(dataset.num_classes, len(dataset)) \n",
143 | "# (only all_boxes will be used for fast RCNN)"
144 | ]
145 | },
146 | {
147 | "cell_type": "code",
148 | "execution_count": 6,
149 | "metadata": {
150 | "collapsed": true
151 | },
152 | "outputs": [],
153 | "source": [
154 | "batch = next(iter(dataloader))"
155 | ]
156 | },
157 | {
158 | "cell_type": "code",
159 | "execution_count": 7,
160 | "metadata": {},
161 | "outputs": [
162 | {
163 | "name": "stdout",
164 | "output_type": "stream",
165 | "text": [
166 | "1/5000\n",
167 | "101/5000\n",
168 | "201/5000\n",
169 | "301/5000\n",
170 | "401/5000\n",
171 | "501/5000\n",
172 | "601/5000\n",
173 | "701/5000\n",
174 | "801/5000\n",
175 | "901/5000\n",
176 | "1001/5000\n",
177 | "1101/5000\n",
178 | "1201/5000\n",
179 | "1301/5000\n",
180 | "1401/5000\n",
181 | "1501/5000\n",
182 | "1601/5000\n",
183 | "1701/5000\n",
184 | "1801/5000\n",
185 | "1901/5000\n",
186 | "2001/5000\n",
187 | "2101/5000\n",
188 | "2201/5000\n",
189 | "2301/5000\n",
190 | "2401/5000\n",
191 | "2501/5000\n",
192 | "2601/5000\n",
193 | "2701/5000\n",
194 | "2801/5000\n",
195 | "2901/5000\n",
196 | "3001/5000\n",
197 | "3101/5000\n",
198 | "3201/5000\n",
199 | "3301/5000\n",
200 | "3401/5000\n",
201 | "3501/5000\n",
202 | "3601/5000\n",
203 | "3701/5000\n",
204 | "3801/5000\n",
205 | "3901/5000\n",
206 | "4001/5000\n",
207 | "4101/5000\n",
208 | "4201/5000\n",
209 | "4301/5000\n",
210 | "4401/5000\n",
211 | "4501/5000\n",
212 | "4601/5000\n",
213 | "4701/5000\n",
214 | "4801/5000\n",
215 | "4901/5000\n",
216 | "Done!\n"
217 | ]
218 | }
219 | ],
220 | "source": [
221 | "# Compute detections for whole dataset\n",
222 | "for i, batch in enumerate(dataloader):\n",
223 | " batch = to_cuda_variable(batch)\n",
224 | " # forward pass\n",
225 | " if torch_ver==\"0.4\": # handle change in \"volatile\"\n",
226 | " with torch.no_grad(): \n",
227 | " class_scores,bbox_deltas,_,_ =model(batch['image'],batch['rois'])\n",
228 | " else:\n",
229 | " class_scores,bbox_deltas,_,_ =model(batch['image'],batch['rois'])\n",
230 | " # postprocess output:\n",
231 | " # - convert coordinates back to original image size, \n",
232 | " # - threshold proposals based on score,\n",
233 | " # - do NMS.\n",
234 | " scores_final, boxes_final, boxes_per_class = result_utils.postprocess_output(batch['rois'],\n",
235 | " batch['scaling_factors'],\n",
236 | " batch['original_im_size'],\n",
237 | " class_scores,\n",
238 | " bbox_deltas)\n",
239 | " # store results\n",
240 | " result_utils.extend_results(i, all_boxes, boxes_per_class)\n",
241 | " \n",
242 | " if i%100==0:\n",
243 | " print(\"{}/{}\".format(i+1,len(dataset)))\n",
244 | "\n",
245 | "print('Done!')"
246 | ]
247 | },
248 | {
249 | "cell_type": "code",
250 | "execution_count": 8,
251 | "metadata": {
252 | "collapsed": true
253 | },
254 | "outputs": [],
255 | "source": [
256 | "# Save detection results\n",
257 | "np.save('files/results/all_boxes.npy',all_boxes)"
258 | ]
259 | },
260 | {
261 | "cell_type": "code",
262 | "execution_count": 9,
263 | "metadata": {},
264 | "outputs": [
265 | {
266 | "name": "stdout",
267 | "output_type": "stream",
268 | "text": [
269 | "Loading and preparing results...\n",
270 | "DONE (t=1.64s)\n",
271 | "creating index...\n",
272 | "index created!\n",
273 | "Running per image evaluation...\n",
274 | "Evaluate annotation type *bbox*\n",
275 | "DONE (t=42.18s).\n",
276 | "Accumulating evaluation results...\n",
277 | "DONE (t=6.47s).\n",
278 | " Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.356\n",
279 | " Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.567\n",
280 | " Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.382\n",
281 | " Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.181\n",
282 | " Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.403\n",
283 | " Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.494\n",
284 | " Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.302\n",
285 | " Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.466\n",
286 | " Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.486\n",
287 | " Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.269\n",
288 | " Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.545\n",
289 | " Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.651\n"
290 | ]
291 | }
292 | ],
293 | "source": [
294 | "# Compute evaluation metrics\n",
295 | "coco_eval = evaluate_boxes(json_dataset=dataset.coco, \n",
296 | " all_boxes=all_boxes, \n",
297 | " output_dir='files/results/',\n",
298 | " use_salt=False, cleanup=False)"
299 | ]
300 | }
301 | ],
302 | "metadata": {
303 | "kernelspec": {
304 | "display_name": "Python (detectorch0.3)",
305 | "language": "python",
306 | "name": "detectorch03"
307 | },
308 | "language_info": {
309 | "codemirror_mode": {
310 | "name": "ipython",
311 | "version": 3
312 | },
313 | "file_extension": ".py",
314 | "mimetype": "text/x-python",
315 | "name": "python",
316 | "nbconvert_exporter": "python",
317 | "pygments_lexer": "ipython3",
318 | "version": "3.6.2"
319 | }
320 | },
321 | "nbformat": 4,
322 | "nbformat_minor": 2
323 | }
324 |
--------------------------------------------------------------------------------
/eval_faster.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Imports"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {
14 | "collapsed": true
15 | },
16 | "outputs": [],
17 | "source": [
18 | "import torch\n",
19 | "from torch.autograd import Variable\n",
20 | "from torch.utils.data import DataLoader\n",
21 | "\n",
22 | "import matplotlib.pyplot as plt\n",
23 | "import numpy as np\n",
24 | "\n",
25 | "import sys\n",
26 | "sys.path.insert(0, \"lib/\")\n",
27 | "from data.coco_dataset import CocoDataset\n",
28 | "from utils.preprocess_sample import preprocess_sample\n",
29 | "from utils.collate_custom import collate_custom\n",
30 | "from utils.utils import to_cuda_variable\n",
31 | "import utils.result_utils as result_utils\n",
32 | "from utils.json_dataset_evaluator import evaluate_boxes\n",
33 | "from model.detector import detector\n",
34 | "\n",
35 | "torch_ver = torch.__version__[:3]"
36 | ]
37 | },
38 | {
39 | "cell_type": "markdown",
40 | "metadata": {},
41 | "source": [
42 | "# Parameters"
43 | ]
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": 2,
48 | "metadata": {
49 | "collapsed": true
50 | },
51 | "outputs": [],
52 | "source": [
53 | "# Pretrained model\n",
54 | "# https://s3-us-west-2.amazonaws.com/detectron/35857281/12_2017_baselines/e2e_faster_rcnn_R-50-C4_2x.yaml.01_34_56.ScPH0Z4r/output/train/coco_2014_train%3Acoco_2014_valminusminival/generalized_rcnn/model_final.pkl\n",
55 | "arch='resnet50'\n",
56 | "pretrained_model_file = 'files/trained_models/faster/model_final.pkl'\n",
57 | "\n",
58 | "# COCO minival2014 dataset path\n",
59 | "coco_ann_file='datasets/data/coco/annotations/instances_minival2014.json'\n",
60 | "img_dir='datasets/data/coco/val2014'"
61 | ]
62 | },
63 | {
64 | "cell_type": "markdown",
65 | "metadata": {},
66 | "source": [
67 | "# Create dataset"
68 | ]
69 | },
70 | {
71 | "cell_type": "code",
72 | "execution_count": 3,
73 | "metadata": {},
74 | "outputs": [
75 | {
76 | "name": "stdout",
77 | "output_type": "stream",
78 | "text": [
79 | "loading annotations into memory...\n",
80 | "Done (t=1.38s)\n",
81 | "creating index...\n",
82 | "index created!\n"
83 | ]
84 | }
85 | ],
86 | "source": [
87 | "dataset = CocoDataset(ann_file=coco_ann_file,img_dir=img_dir,\n",
88 | " sample_transform=preprocess_sample(target_sizes=[800]))\n",
89 | "dataloader = DataLoader(dataset, batch_size=1, # only batch_size=1 is supported for now\n",
90 | " shuffle=False, num_workers=0, collate_fn=collate_custom)"
91 | ]
92 | },
93 | {
94 | "cell_type": "markdown",
95 | "metadata": {},
96 | "source": [
97 | "# Create detector model"
98 | ]
99 | },
100 | {
101 | "cell_type": "code",
102 | "execution_count": 4,
103 | "metadata": {},
104 | "outputs": [
105 | {
106 | "name": "stdout",
107 | "output_type": "stream",
108 | "text": [
109 | "loading pretrained weights\n"
110 | ]
111 | }
112 | ],
113 | "source": [
114 | "model = detector(arch=arch,\n",
115 | " detector_pkl_file=pretrained_model_file,\n",
116 | " use_rpn_head = True)\n",
117 | "model = model.cuda()"
118 | ]
119 | },
120 | {
121 | "cell_type": "markdown",
122 | "metadata": {},
123 | "source": [
124 | "# Evaluate"
125 | ]
126 | },
127 | {
128 | "cell_type": "code",
129 | "execution_count": 5,
130 | "metadata": {
131 | "collapsed": true
132 | },
133 | "outputs": [],
134 | "source": [
135 | "# Create data structure to store results\n",
136 | "all_boxes, all_segms, all_keyps = result_utils.empty_results(dataset.num_classes, len(dataset)) \n",
137 | "# (only all_boxes will be used for fast RCNN)"
138 | ]
139 | },
140 | {
141 | "cell_type": "code",
142 | "execution_count": null,
143 | "metadata": {},
144 | "outputs": [
145 | {
146 | "name": "stdout",
147 | "output_type": "stream",
148 | "text": [
149 | "1/5000\n",
150 | "101/5000\n",
151 | "201/5000\n",
152 | "301/5000\n",
153 | "401/5000\n",
154 | "501/5000\n",
155 | "601/5000\n",
156 | "701/5000\n",
157 | "801/5000\n",
158 | "901/5000\n",
159 | "1001/5000\n",
160 | "1101/5000\n",
161 | "1201/5000\n",
162 | "1301/5000\n",
163 | "1401/5000\n",
164 | "1501/5000\n",
165 | "1601/5000\n"
166 | ]
167 | }
168 | ],
169 | "source": [
170 | "# Compute detections for whole dataset\n",
171 | "for i, batch in enumerate(dataloader):\n",
172 | " batch = to_cuda_variable(batch)\n",
173 | " # forward pass\n",
174 | " if torch_ver==\"0.4\": # handle change in \"volatile\"\n",
175 | " with torch.no_grad():\n",
176 | " class_scores,bbox_deltas,rois,_=model(batch['image'],\n",
177 | " scaling_factor=batch['scaling_factors']) \n",
178 | " else:\n",
179 | " class_scores,bbox_deltas,rois,_=model(batch['image'],\n",
180 | " scaling_factor=batch['scaling_factors']) \n",
181 | " # postprocess output:\n",
182 | " # - convert coordinates back to original image size, \n",
183 | " # - threshold proposals based on score,\n",
184 | " # - do NMS.\n",
185 | " scores_final, boxes_final, boxes_per_class = result_utils.postprocess_output(rois,\n",
186 | " batch['scaling_factors'],\n",
187 | " batch['original_im_size'],\n",
188 | " class_scores,\n",
189 | " bbox_deltas)\n",
190 | " # store results\n",
191 | " result_utils.extend_results(i, all_boxes, boxes_per_class)\n",
192 | " \n",
193 | " if i%100==0:\n",
194 | " print(\"{}/{}\".format(i+1,len(dataset)))\n",
195 | " \n",
196 | "print('Done!')"
197 | ]
198 | },
199 | {
200 | "cell_type": "code",
201 | "execution_count": 11,
202 | "metadata": {
203 | "collapsed": true
204 | },
205 | "outputs": [],
206 | "source": [
207 | "# Save detection results\n",
208 | "np.save('files/results/all_boxes_faster.npy',all_boxes)"
209 | ]
210 | },
211 | {
212 | "cell_type": "code",
213 | "execution_count": 12,
214 | "metadata": {},
215 | "outputs": [
216 | {
217 | "name": "stdout",
218 | "output_type": "stream",
219 | "text": [
220 | "Loading and preparing results...\n",
221 | "DONE (t=1.97s)\n",
222 | "creating index...\n",
223 | "index created!\n",
224 | "Running per image evaluation...\n",
225 | "Evaluate annotation type *bbox*\n",
226 | "DONE (t=40.78s).\n",
227 | "Accumulating evaluation results...\n",
228 | "DONE (t=6.83s).\n",
229 | " Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.365\n",
230 | " Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.573\n",
231 | " Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.393\n",
232 | " Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.184\n",
233 | " Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.406\n",
234 | " Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.506\n",
235 | " Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.308\n",
236 | " Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.474\n",
237 | " Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.492\n",
238 | " Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.279\n",
239 | " Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.540\n",
240 | " Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.657\n"
241 | ]
242 | }
243 | ],
244 | "source": [
245 | "# Compute evaluation metrics\n",
246 | "coco_eval = evaluate_boxes(json_dataset=dataset.coco, \n",
247 | " all_boxes=all_boxes, \n",
248 | " output_dir='files/results/',\n",
249 | " use_salt=False, cleanup=False)"
250 | ]
251 | }
252 | ],
253 | "metadata": {
254 | "kernelspec": {
255 | "display_name": "Python (detectorch0.3)",
256 | "language": "python",
257 | "name": "detectorch03"
258 | },
259 | "language_info": {
260 | "codemirror_mode": {
261 | "name": "ipython",
262 | "version": 3
263 | },
264 | "file_extension": ".py",
265 | "mimetype": "text/x-python",
266 | "name": "python",
267 | "nbconvert_exporter": "python",
268 | "pygments_lexer": "ipython3",
269 | "version": "3.6.2"
270 | }
271 | },
272 | "nbformat": 4,
273 | "nbformat_minor": 2
274 | }
275 |
--------------------------------------------------------------------------------
/files/pretrained_base_cnn/.gitignore:
--------------------------------------------------------------------------------
1 | # Ignore everything in this directory
2 | *
3 | # Except this file
4 | !.gitignore
5 |
--------------------------------------------------------------------------------
/files/proposal_files/.gitignore:
--------------------------------------------------------------------------------
1 | # Ignore everything in this directory
2 | *
3 | # Except this file
4 | !.gitignore
5 |
--------------------------------------------------------------------------------
/files/results/.gitignore:
--------------------------------------------------------------------------------
1 | # Ignore everything in this directory
2 | *
3 | # Except this file
4 | !.gitignore
5 |
--------------------------------------------------------------------------------
/files/trained_models/.gitignore:
--------------------------------------------------------------------------------
1 | # Ignore everything in this directory
2 | *
3 | # Except this file
4 | !.gitignore
5 |
--------------------------------------------------------------------------------
/lib/cppcuda/build/.gitignore:
--------------------------------------------------------------------------------
1 | # Ignore everything in this directory
2 | *
3 | # Except this file
4 | !.gitignore
5 |
--------------------------------------------------------------------------------
/lib/cppcuda/roi_align_backward_cpu.cpp:
--------------------------------------------------------------------------------
1 | // Adapted from https://github.com/caffe2/caffe2/blob/master/caffe2/operators/roi_align_gradient_op.cc
2 | // (Ignacio Rocco)
3 |
4 | #include "ATen/NativeFunctions.h"
5 | #include
6 |
7 | namespace at {
8 | namespace contrib {
9 |
10 | template <typename T>
11 | void bilinear_interpolate_gradient(
12 | const int height,
13 | const int width,
14 | T y,
15 | T x,
16 | T& w1,
17 | T& w2,
18 | T& w3,
19 | T& w4,
20 | int& x_low,
21 | int& x_high,
22 | int& y_low,
23 | int& y_high,
24 | const int /*index*/ /* index for debug only*/) {
25 | // deal with cases that inverse elements are out of feature map boundary
26 | if (y < -1.0 || y > height || x < -1.0 || x > width) {
27 | // empty
28 | w1 = w2 = w3 = w4 = 0.;
29 | x_low = x_high = y_low = y_high = -1;
30 | return;
31 | }
32 |
33 | if (y <= 0) {
34 | y = 0;
35 | }
36 | if (x <= 0) {
37 | x = 0;
38 | }
39 |
40 | y_low = (int)y;
41 | x_low = (int)x;
42 |
43 | if (y_low >= height - 1) {
44 | y_high = y_low = height - 1;
45 | y = (T)y_low;
46 | } else {
47 | y_high = y_low + 1;
48 | }
49 |
50 | if (x_low >= width - 1) {
51 | x_high = x_low = width - 1;
52 | x = (T)x_low;
53 | } else {
54 | x_high = x_low + 1;
55 | }
56 |
57 | T ly = y - y_low;
58 | T lx = x - x_low;
59 | T hy = 1. - ly, hx = 1. - lx;
60 |
61 | // reference in forward
62 | // T v1 = bottom_data[y_low * width + x_low];
63 | // T v2 = bottom_data[y_low * width + x_high];
64 | // T v3 = bottom_data[y_high * width + x_low];
65 | // T v4 = bottom_data[y_high * width + x_high];
66 | // T val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4);
67 |
68 | w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx;
69 |
70 | return;
71 | }
72 |
73 | template <typename T>
74 | inline void add(const T& val, T* address) {
75 | *address += val;
76 | }
77 |
78 | template <typename T>
79 | void roi_align_backward_loop(
80 | const int nthreads,
81 | const T* top_diff, // input gradient
82 | const int /*num_rois*/, // unused
83 | const T& spatial_scale,
84 | const int channels,
85 | const int height,
86 | const int width,
87 | const int pooled_height,
88 | const int pooled_width,
89 | const int sampling_ratio,
90 | T* bottom_diff, // output gradient
91 | const T* bottom_rois, // input rois
92 | int rois_cols) {
93 |
94 | // DCHECK(rois_cols == 4 || rois_cols == 5); check this before calling loop
95 |
96 |
97 | for (int index = 0; index < nthreads; index++) {
98 | // (n, c, ph, pw) is an element in the pooled output
99 | int pw = index % pooled_width;
100 | int ph = (index / pooled_width) % pooled_height;
101 | int c = (index / pooled_width / pooled_height) % channels;
102 | int n = index / pooled_width / pooled_height / channels;
103 |
104 | const T* offset_bottom_rois = bottom_rois + n * rois_cols;
105 | int roi_batch_ind = 0;
106 | if (rois_cols == 5) {
107 | roi_batch_ind = offset_bottom_rois[0];
108 | offset_bottom_rois++;
109 | }
110 |
111 | // Do not use rounding; this implementation detail is critical
112 | T roi_start_w = offset_bottom_rois[0] * spatial_scale;
113 | T roi_start_h = offset_bottom_rois[1] * spatial_scale;
114 | T roi_end_w = offset_bottom_rois[2] * spatial_scale;
115 | T roi_end_h = offset_bottom_rois[3] * spatial_scale;
116 | // T roi_start_w = round(offset_bottom_rois[0] * spatial_scale);
117 | // T roi_start_h = round(offset_bottom_rois[1] * spatial_scale);
118 | // T roi_end_w = round(offset_bottom_rois[2] * spatial_scale);
119 | // T roi_end_h = round(offset_bottom_rois[3] * spatial_scale);
120 |
121 | // Force malformed ROIs to be 1x1
122 | T roi_width = std::max(roi_end_w - roi_start_w, (T)1.);
123 | T roi_height = std::max(roi_end_h - roi_start_h, (T)1.);
124 | T bin_size_h = static_cast<T>(roi_height) / static_cast<T>(pooled_height);
125 | T bin_size_w = static_cast<T>(roi_width) / static_cast<T>(pooled_width);
126 |
127 | T* offset_bottom_diff =
128 | bottom_diff + (roi_batch_ind * channels + c) * height * width;
129 |
130 | int top_offset = (n * channels + c) * pooled_height * pooled_width;
131 | const T* offset_top_diff = top_diff + top_offset;
132 | const T top_diff_this_bin = offset_top_diff[ph * pooled_width + pw];
133 |
134 | // We use roi_bin_grid to sample the grid and mimic integral
135 | int roi_bin_grid_h = (sampling_ratio > 0)
136 | ? sampling_ratio
137 | : std::ceil(roi_height / pooled_height); // e.g., = 2
138 | int roi_bin_grid_w =
139 | (sampling_ratio > 0) ? sampling_ratio : std::ceil(roi_width / pooled_width);
140 |
141 | // We do average (integral) pooling inside a bin
142 | const T count = roi_bin_grid_h * roi_bin_grid_w; // e.g. = 4
143 |
144 | for (int iy = 0; iy < roi_bin_grid_h; iy++) {
145 | const T y = roi_start_h + ph * bin_size_h +
146 | static_cast<T>(iy + .5f) * bin_size_h /
147 | static_cast<T>(roi_bin_grid_h); // e.g., 0.5, 1.5
148 | for (int ix = 0; ix < roi_bin_grid_w; ix++) {
149 | const T x = roi_start_w + pw * bin_size_w +
150 | static_cast<T>(ix + .5f) * bin_size_w /
151 | static_cast<T>(roi_bin_grid_w);
152 |
153 | T w1, w2, w3, w4;
154 | int x_low, x_high, y_low, y_high;
155 |
156 | bilinear_interpolate_gradient(
157 | height,
158 | width,
159 | y,
160 | x,
161 | w1,
162 | w2,
163 | w3,
164 | w4,
165 | x_low,
166 | x_high,
167 | y_low,
168 | y_high,
169 | index);
170 |
171 | T g1 = top_diff_this_bin * w1 / count;
172 | T g2 = top_diff_this_bin * w2 / count;
173 | T g3 = top_diff_this_bin * w3 / count;
174 | T g4 = top_diff_this_bin * w4 / count;
175 |
176 | if (x_low >= 0 && x_high >= 0 && y_low >= 0 && y_high >= 0) {
177 | // atomic add is not needed for now since it is single threaded
178 | add(static_cast<T>(g1), offset_bottom_diff + y_low * width + x_low);
179 | add(static_cast<T>(g2), offset_bottom_diff + y_low * width + x_high);
180 | add(static_cast<T>(g3), offset_bottom_diff + y_high * width + x_low);
181 | add(static_cast<T>(g4), offset_bottom_diff + y_high * width + x_high);
182 | } // if
183 | } // ix
184 | } // iy
185 | } // for
186 | } // ROIAlignBackward
187 |
188 |
189 | Tensor roi_align_backward_cpu(
190 | const Tensor& bottom_rois,
191 | const Tensor& grad_output, // gradient of the output of the layer
192 | int64_t b_size,
193 | int64_t channels,
194 | int64_t height,
195 | int64_t width,
196 | int64_t pooled_height,
197 | int64_t pooled_width,
198 | double spatial_scale,
199 | int64_t sampling_ratio)
200 | {
201 |
202 | // ROIs is the set of region proposals to process. It is a 2D Tensor where the first
203 | // dim is the # of proposals, and the second dim is the proposal itself in the form
204 | // [batch_index startW startH endW endH]
205 | AT_CHECK(bottom_rois.ndimension() == 2, "RoI Proposals should be a 2D Tensor, (batch_sz x proposals)");
206 | AT_CHECK(bottom_rois.size(1) == 5, "Proposals should be of the form [batch_index startW startH endW endH]");
207 |
208 | auto num_rois = bottom_rois.size(0);
209 | auto roi_cols = bottom_rois.size(1);
210 |
211 | AT_CHECK(roi_cols == 4 || roi_cols == 5, "RoI Proposals should have 4 or 5 columns");
212 |
213 | // Output Tensor is the gradient w.r.t. the input features: (b_size, channels, height, width)
214 | auto output = bottom_rois.type().tensor({b_size, channels, height, width}).zero_(); // gradient wrt input features
215 |
216 | AT_CHECK(bottom_rois.is_contiguous(), "bottom_rois must be contiguous");
217 |
218 | roi_align_backward_loop<float>(
219 | grad_output.numel(),
220 | grad_output.data<float>(),
221 | num_rois,
222 | static_cast<float>(spatial_scale),
223 | channels,
224 | height,
225 | width,
226 | pooled_height,
227 | pooled_width,
228 | sampling_ratio,
229 | output.data<float>(),
230 | bottom_rois.data<float>(),
231 | roi_cols);
232 |
233 | return output;
234 | }
235 |
236 |
237 | } // namespace
238 | } // namespace
--------------------------------------------------------------------------------
/lib/cppcuda/roi_align_backward_cuda.cu:
--------------------------------------------------------------------------------
1 | // Adapted from https://github.com/caffe2/caffe2/blob/master/caffe2/operators/roi_align_gradient_op.cu
2 | // (Ignacio Rocco)
3 |
4 | #include "ATen/NativeFunctions.h"
5 | #include
6 |
7 | namespace at {
8 | namespace contrib {
9 |
10 | // Use 1024 threads per block, which requires cuda sm_2x or above
11 | const int CUDA_NUM_THREADS = 1024;
12 | const int CUDA_MAX_BLOCKS = 65535;
13 |
14 | inline int GET_BLOCKS(const int N)
15 | {
16 | return (N + CUDA_NUM_THREADS - 1) / CUDA_NUM_THREADS;
17 | }
18 |
19 | __host__ __device__ __forceinline__ float fmin(float a, float b) {
20 | return a > b ? b : a;
21 | }
22 |
23 | __host__ __device__ __forceinline__ float fmax(float a, float b) {
24 | return a > b ? a : b;
25 | }
26 |
27 |
28 | template <typename T>
29 | inline __device__ T gpu_atomic_add(const T val, T* address);
30 |
31 | template <>
32 | inline __device__ float gpu_atomic_add(const float val, float* address) {
33 | return atomicAdd(address, val);
34 | }
35 |
36 | template <typename T>
37 | __device__ void bilinear_interpolate_gradient(
38 | const int height,
39 | const int width,
40 | T y,
41 | T x,
42 | T& w1,
43 | T& w2,
44 | T& w3,
45 | T& w4,
46 | int& x_low,
47 | int& x_high,
48 | int& y_low,
49 | int& y_high,
50 | const int index /* index for debug only*/) {
51 | // deal with cases that inverse elements are out of feature map boundary
52 | if (y < -1.0 || y > height || x < -1.0 || x > width) {
53 | // empty
54 | w1 = w2 = w3 = w4 = 0.;
55 | x_low = x_high = y_low = y_high = -1;
56 | return;
57 | }
58 |
59 | if (y <= 0) {
60 | y = 0;
61 | }
62 | if (x <= 0) {
63 | x = 0;
64 | }
65 |
66 | y_low = (int)y;
67 | x_low = (int)x;
68 |
69 | if (y_low >= height - 1) {
70 | y_high = y_low = height - 1;
71 | y = (T)y_low;
72 | } else {
73 | y_high = y_low + 1;
74 | }
75 |
76 | if (x_low >= width - 1) {
77 | x_high = x_low = width - 1;
78 | x = (T)x_low;
79 | } else {
80 | x_high = x_low + 1;
81 | }
82 |
83 | T ly = y - y_low;
84 | T lx = x - x_low;
85 | T hy = 1. - ly, hx = 1. - lx;
86 |
87 | // reference in forward
88 | // T v1 = bottom_data[y_low * width + x_low];
89 | // T v2 = bottom_data[y_low * width + x_high];
90 | // T v3 = bottom_data[y_high * width + x_low];
91 | // T v4 = bottom_data[y_high * width + x_high];
92 | // T val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4);
93 |
94 | w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx;
95 |
96 | return;
97 | }
98 |
99 | template <typename T>
100 | __global__ void roi_align_backward_kernel(
101 | const int nthreads,
102 | const T* top_diff,
103 | const int num_rois,
104 | const T spatial_scale,
105 | const int channels,
106 | const int height,
107 | const int width,
108 | const int pooled_height,
109 | const int pooled_width,
110 | const int sampling_ratio,
111 | T* bottom_diff,
112 | const T* bottom_rois,
113 | int rois_cols) {
114 | //CUDA_1D_KERNEL_LOOP(index, nthreads) {
115 | for (int index = blockIdx.x * blockDim.x + threadIdx.x;
116 | index < nthreads;
117 | index += blockDim.x * gridDim.x)
118 | {
119 | // (n, c, ph, pw) is an element in the pooled output
120 | int pw = index % pooled_width;
121 | int ph = (index / pooled_width) % pooled_height;
122 | int c = (index / pooled_width / pooled_height) % channels;
123 | int n = index / pooled_width / pooled_height / channels;
124 |
125 | const T* offset_bottom_rois = bottom_rois + n * 5;
126 | int roi_batch_ind = offset_bottom_rois[0];
127 |
128 | // Do not use rounding; this implementation detail is critical
129 | T roi_start_w = offset_bottom_rois[1] * spatial_scale;
130 | T roi_start_h = offset_bottom_rois[2] * spatial_scale;
131 | T roi_end_w = offset_bottom_rois[3] * spatial_scale;
132 | T roi_end_h = offset_bottom_rois[4] * spatial_scale;
133 | // T roi_start_w = round(offset_bottom_rois[1] * spatial_scale);
134 | // T roi_start_h = round(offset_bottom_rois[2] * spatial_scale);
135 | // T roi_end_w = round(offset_bottom_rois[3] * spatial_scale);
136 | // T roi_end_h = round(offset_bottom_rois[4] * spatial_scale);
137 |
138 | // Force malformed ROIs to be 1x1
139 | T roi_width = fmax(roi_end_w - roi_start_w, (T)1.);
140 | T roi_height = fmax(roi_end_h - roi_start_h, (T)1.);
141 | T bin_size_h = static_cast<T>(roi_height) / static_cast<T>(pooled_height);
142 | T bin_size_w = static_cast<T>(roi_width) / static_cast<T>(pooled_width);
143 |
144 | T* offset_bottom_diff =
145 | bottom_diff + (roi_batch_ind * channels + c) * height * width;
146 |
147 | int top_offset = (n * channels + c) * pooled_height * pooled_width;
148 | const T* offset_top_diff = top_diff + top_offset;
149 | const T top_diff_this_bin = offset_top_diff[ph * pooled_width + pw];
150 |
151 | // We use roi_bin_grid to sample the grid and mimic integral
152 | int roi_bin_grid_h = (sampling_ratio > 0)
153 | ? sampling_ratio
154 | : ceilf(roi_height / pooled_height); // e.g., = 2
155 | int roi_bin_grid_w =
156 | (sampling_ratio > 0) ? sampling_ratio : ceilf(roi_width / pooled_width);
157 |
158 | // We do average (integral) pooling inside a bin
159 | const T count = roi_bin_grid_h * roi_bin_grid_w; // e.g. = 4
160 |
161 | for (int iy = 0; iy < roi_bin_grid_h; iy++) // e.g., iy = 0, 1
162 | {
163 | const T y = roi_start_h + ph * bin_size_h +
164 | static_cast<T>(iy + .5f) * bin_size_h /
165 | static_cast<T>(roi_bin_grid_h); // e.g., 0.5, 1.5
166 | for (int ix = 0; ix < roi_bin_grid_w; ix++) {
167 | const T x = roi_start_w + pw * bin_size_w +
168 | static_cast<T>(ix + .5f) * bin_size_w /
169 | static_cast<T>(roi_bin_grid_w);
170 |
171 | T w1, w2, w3, w4;
172 | int x_low, x_high, y_low, y_high;
173 |
174 | bilinear_interpolate_gradient(
175 | height,
176 | width,
177 | y,
178 | x,
179 | w1,
180 | w2,
181 | w3,
182 | w4,
183 | x_low,
184 | x_high,
185 | y_low,
186 | y_high,
187 | index);
188 |
189 | T g1 = top_diff_this_bin * w1 / count;
190 | T g2 = top_diff_this_bin * w2 / count;
191 | T g3 = top_diff_this_bin * w3 / count;
192 | T g4 = top_diff_this_bin * w4 / count;
193 |
194 | if (x_low >= 0 && x_high >= 0 && y_low >= 0 && y_high >= 0) {
195 | gpu_atomic_add(
196 | static_cast<T>(g1), offset_bottom_diff + y_low * width + x_low);
197 | gpu_atomic_add(
198 | static_cast<T>(g2), offset_bottom_diff + y_low * width + x_high);
199 | gpu_atomic_add(
200 | static_cast<T>(g3), offset_bottom_diff + y_high * width + x_low);
201 | gpu_atomic_add(
202 | static_cast<T>(g4), offset_bottom_diff + y_high * width + x_high);
203 | } // if
204 | } // ix
205 | } // iy
206 | } // CUDA_1D_KERNEL_LOOP
207 | } // RoIAlignBackward
208 |
209 | Tensor roi_align_backward_cuda(
210 | const Tensor& bottom_rois,
211 | const Tensor& grad_output, // gradient of the output of the layer
212 | int64_t b_size,
213 | int64_t channels,
214 | int64_t height,
215 | int64_t width,
216 | int64_t pooled_height,
217 | int64_t pooled_width,
218 | double spatial_scale,
219 | int64_t sampling_ratio)
220 | {
221 |
222 | // ROIs is the set of region proposals to process. It is a 2D Tensor where the first
223 | // dim is the # of proposals, and the second dim is the proposal itself in the form
224 | // [batch_index startW startH endW endH]
225 | AT_CHECK(bottom_rois.ndimension() == 2, "RoI Proposals should be a 2D Tensor, (batch_sz x proposals)");
226 | AT_CHECK(bottom_rois.size(1) == 5, "Proposals should be of the form [batch_index startW startH endW endH]");
227 |
228 | auto num_rois = bottom_rois.size(0);
229 | auto roi_cols = bottom_rois.size(1);
230 |
231 | AT_CHECK(roi_cols == 4 || roi_cols == 5, "RoI Proposals should have 4 or 5 columns");
232 |
233 | // Output Tensor is the gradient w.r.t. the input features: (b_size, channels, height, width)
234 | auto output = bottom_rois.type().tensor({b_size, channels, height, width}).zero_(); // gradient wrt input features
235 |
236 | AT_CHECK(bottom_rois.is_contiguous(), "bottom_rois must be contiguous");
237 |
238 | int64_t total_threads = output.numel();
239 | int64_t blocks = fmin(GET_BLOCKS(total_threads),CUDA_MAX_BLOCKS);
240 |
241 | roi_align_backward_kernel<float><<<blocks, CUDA_NUM_THREADS>>>(
242 | grad_output.numel(),
243 | grad_output.data<float>(),
244 | num_rois,
245 | static_cast<float>(spatial_scale),
246 | channels,
247 | height,
248 | width,
249 | pooled_height,
250 | pooled_width,
251 | sampling_ratio,
252 | output.data<float>(),
253 | bottom_rois.data<float>(),
254 | roi_cols);
255 |
256 | AT_CHECK(cudaGetLastError() == cudaSuccess, "roi_align_backward_kernel failed");
257 |
258 | return output;
259 | }
260 |
261 |
262 | }
263 | } // namespace caffe2
--------------------------------------------------------------------------------
/lib/cppcuda/roi_align_binding.cpp:
--------------------------------------------------------------------------------
1 | #include
2 | #include "roi_align_cpu.cpp"
3 | #include "roi_align_backward_cpu.cpp"
4 | #include "roi_align_cuda.h"
5 |
6 |
7 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
8 | m.def("roi_align_forward_cpu", &at::contrib::roi_align_forward_cpu, "roi_align_forward_cpu");
9 | m.def("roi_align_backward_cpu", &at::contrib::roi_align_backward_cpu, "roi_align_backward_cpu");
10 | m.def("roi_align_forward_cuda", &at::contrib::roi_align_forward_cuda, "roi_align_forward_cuda");
11 | m.def("roi_align_backward_cuda", &at::contrib::roi_align_backward_cuda, "roi_align_backward_cuda");
12 | }
13 |
--------------------------------------------------------------------------------
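The pybind11 module above exposes the four RoIAlign entry points to Python. Below is a minimal sketch of how such a binding could be JIT-compiled and called with PyTorch 0.4's `torch.utils.cpp_extension.load`; the source list, the module name and the call parameters (7x7 pooling, 1/16 spatial scale, sampling ratio 2) are illustrative assumptions, not the repository's own loading code (which is not shown in this dump). Building the CUDA sources requires the CUDA toolkit and `ninja`, as noted in the README.

```python
# Hypothetical JIT build of the extension defined by roi_align_binding.cpp
# (the CPU .cpp files are pulled in by #include, so only the binding and the
# two .cu files are listed as sources here).
import torch
from torch.utils.cpp_extension import load

roi_align_ext = load(
    name='roi_align_ext',
    sources=['lib/cppcuda/roi_align_binding.cpp',
             'lib/cppcuda/roi_align_forward_cuda.cu',
             'lib/cppcuda/roi_align_backward_cuda.cu'],
    build_directory='lib/cppcuda/build',
    verbose=True)

# Feature map in NCHW layout and RoIs as rows of [batch_index, x1, y1, x2, y2],
# matching the checks in roi_align_cpu.cpp / the CUDA sources.
features = torch.randn(1, 256, 50, 50)
rois = torch.tensor([[0., 10., 10., 200., 150.]])

# Arguments: pooled_height, pooled_width, spatial_scale, sampling_ratio (illustrative values).
pooled = roi_align_ext.roi_align_forward_cpu(features, rois, 7, 7, 1.0 / 16.0, 2)
print(pooled.shape)  # -> (num_rois, 256, 7, 7)
```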
/lib/cppcuda/roi_align_cpu.cpp:
--------------------------------------------------------------------------------
1 | // Adapted from https://github.com/caffe2/caffe2/blob/master/caffe2/operators/roi_align_op.cc
2 | // (Ignacio Rocco)
3 |
4 | #include "ATen/NativeFunctions.h"
5 | #include
6 |
7 | namespace at {
8 | namespace contrib {
9 |
10 | template <typename T>
11 | struct PreCalc {
12 | int pos1;
13 | int pos2;
14 | int pos3;
15 | int pos4;
16 | T w1;
17 | T w2;
18 | T w3;
19 | T w4;
20 | };
21 |
22 | template <typename T>
23 | void pre_calc_for_bilinear_interpolate(
24 | const int height,
25 | const int width,
26 | const int pooled_height,
27 | const int pooled_width,
28 | const int iy_upper,
29 | const int ix_upper,
30 | T roi_start_h,
31 | T roi_start_w,
32 | T bin_size_h,
33 | T bin_size_w,
34 | int roi_bin_grid_h,
35 | int roi_bin_grid_w,
36 | std::vector<PreCalc<T>>& pre_calc) {
37 | int pre_calc_index = 0;
38 | for (int ph = 0; ph < pooled_height; ph++) {
39 | for (int pw = 0; pw < pooled_width; pw++) {
40 | for (int iy = 0; iy < iy_upper; iy++) {
41 | const T yy = roi_start_h + ph * bin_size_h +
42 | static_cast<T>(iy + .5f) * bin_size_h /
43 | static_cast<T>(roi_bin_grid_h); // e.g., 0.5, 1.5
44 | for (int ix = 0; ix < ix_upper; ix++) {
45 | const T xx = roi_start_w + pw * bin_size_w +
46 | static_cast<T>(ix + .5f) * bin_size_w /
47 | static_cast<T>(roi_bin_grid_w);
48 |
49 | T x = xx;
50 | T y = yy;
51 | // deal with: inverse elements are out of feature map boundary
52 | if (y < -1.0 || y > height || x < -1.0 || x > width) {
53 | // empty
54 | PreCalc<T> pc;
55 | pc.pos1 = 0;
56 | pc.pos2 = 0;
57 | pc.pos3 = 0;
58 | pc.pos4 = 0;
59 | pc.w1 = 0;
60 | pc.w2 = 0;
61 | pc.w3 = 0;
62 | pc.w4 = 0;
63 | pre_calc[pre_calc_index] = pc;
64 | pre_calc_index += 1;
65 | continue;
66 | }
67 |
68 | if (y <= 0) {
69 | y = 0;
70 | }
71 | if (x <= 0) {
72 | x = 0;
73 | }
74 |
75 | int y_low = (int)y;
76 | int x_low = (int)x;
77 | int y_high;
78 | int x_high;
79 |
80 | if (y_low >= height - 1) {
81 | y_high = y_low = height - 1;
82 | y = (T)y_low;
83 | } else {
84 | y_high = y_low + 1;
85 | }
86 |
87 | if (x_low >= width - 1) {
88 | x_high = x_low = width - 1;
89 | x = (T)x_low;
90 | } else {
91 | x_high = x_low + 1;
92 | }
93 |
94 | T ly = y - y_low;
95 | T lx = x - x_low;
96 | T hy = 1. - ly, hx = 1. - lx;
97 | T w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx;
98 |
99 | // save weights and indices
100 | PreCalc<T> pc;
101 | pc.pos1 = y_low * width + x_low;
102 | pc.pos2 = y_low * width + x_high;
103 | pc.pos3 = y_high * width + x_low;
104 | pc.pos4 = y_high * width + x_high;
105 | pc.w1 = w1;
106 | pc.w2 = w2;
107 | pc.w3 = w3;
108 | pc.w4 = w4;
109 | pre_calc[pre_calc_index] = pc;
110 |
111 | pre_calc_index += 1;
112 | }
113 | }
114 | }
115 | }
116 | }
117 |
118 |
119 | template <typename T>
120 | void roi_align_forward_loop(
121 | const int outputElements,
122 | const T* bottom_data, // input tensor
123 | const T* bottom_rois, // input rois
124 | const T& spatial_scale,
125 | const int channels,
126 | const int height,
127 | const int width,
128 | const int pooled_height,
129 | const int pooled_width,
130 | const int sampling_ratio,
131 | const int roi_cols, // rois can have 4 or 5 columns
132 | T* top_data) // output
133 | {
134 | int n_rois = outputElements / channels / pooled_width / pooled_height;
135 | // (n, c, ph, pw) is an element in the pooled output
136 | // can be parallelized using omp
137 | // #pragma omp parallel for num_threads(32)
138 | for (int n = 0; n < n_rois; n++) {
139 | int index_n = n * channels * pooled_width * pooled_height;
140 |
141 | // roi could have 4 or 5 columns
142 | const T* offset_bottom_rois = bottom_rois + n * roi_cols;
143 | int roi_batch_ind = 0;
144 | if (roi_cols == 5) {
145 | roi_batch_ind = offset_bottom_rois[0];
146 | offset_bottom_rois++;
147 | }
148 |
149 | // Do not use rounding; this implementation detail is critical
150 | T roi_start_w = offset_bottom_rois[0] * spatial_scale;
151 | T roi_start_h = offset_bottom_rois[1] * spatial_scale;
152 | T roi_end_w = offset_bottom_rois[2] * spatial_scale;
153 | T roi_end_h = offset_bottom_rois[3] * spatial_scale;
154 | // T roi_start_w = round(offset_bottom_rois[0] * spatial_scale);
155 | // T roi_start_h = round(offset_bottom_rois[1] * spatial_scale);
156 | // T roi_end_w = round(offset_bottom_rois[2] * spatial_scale);
157 | // T roi_end_h = round(offset_bottom_rois[3] * spatial_scale);
158 |
159 | // Force malformed ROIs to be 1x1
160 | T roi_width = std::max(roi_end_w - roi_start_w, (T)1.);
161 | T roi_height = std::max(roi_end_h - roi_start_h, (T)1.);
162 | T bin_size_h = static_cast<T>(roi_height) / static_cast<T>(pooled_height);
163 | T bin_size_w = static_cast<T>(roi_width) / static_cast<T>(pooled_width);
164 |
165 | // We use roi_bin_grid to sample the grid and mimic integral
166 | int roi_bin_grid_h = (sampling_ratio > 0)
167 | ? sampling_ratio
168 | : std::ceil(roi_height / pooled_height); // e.g., = 2
169 | int roi_bin_grid_w =
170 | (sampling_ratio > 0) ? sampling_ratio : std::ceil(roi_width / pooled_width);
171 |
172 | // We do average (integral) pooling inside a bin
173 | const T count = roi_bin_grid_h * roi_bin_grid_w; // e.g. = 4
174 |
175 | // we want to precalculate indices and weights shared by all channels;
176 | // this is the key point of the optimization
177 | std::vector<PreCalc<T>> pre_calc(
178 | roi_bin_grid_h * roi_bin_grid_w * pooled_width * pooled_height);
179 | pre_calc_for_bilinear_interpolate(
180 | height,
181 | width,
182 | pooled_height,
183 | pooled_width,
184 | roi_bin_grid_h,
185 | roi_bin_grid_w,
186 | roi_start_h,
187 | roi_start_w,
188 | bin_size_h,
189 | bin_size_w,
190 | roi_bin_grid_h,
191 | roi_bin_grid_w,
192 | pre_calc);
193 |
194 |
195 | for (int c = 0; c < channels; c++) {
196 | int index_n_c = index_n + c * pooled_width * pooled_height;
197 | const T* offset_bottom_data =
198 | bottom_data + (roi_batch_ind * channels + c) * height * width;
199 | int pre_calc_index = 0;
200 |
201 | for (int ph = 0; ph < pooled_height; ph++) {
202 | for (int pw = 0; pw < pooled_width; pw++) {
203 | int index = index_n_c + ph * pooled_width + pw;
204 |
205 | T output_val = 0.;
206 | for (int iy = 0; iy < roi_bin_grid_h; iy++) {
207 | for (int ix = 0; ix < roi_bin_grid_w; ix++) {
208 | PreCalc<T> pc = pre_calc[pre_calc_index];
209 | output_val += pc.w1 * offset_bottom_data[pc.pos1] +
210 | pc.w2 * offset_bottom_data[pc.pos2] +
211 | pc.w3 * offset_bottom_data[pc.pos3] +
212 | pc.w4 * offset_bottom_data[pc.pos4];
213 |
214 | pre_calc_index += 1;
215 | }
216 | }
217 | output_val /= count;
218 |
219 | top_data[index] = output_val;
220 | } // for pw
221 | } // for ph
222 | } // for c
223 | } // for n
224 | }
225 |
226 |
227 | Tensor roi_align_forward_cpu(
228 | const Tensor& input,
229 | const Tensor& bottom_rois,
230 | int64_t pooled_height,
231 | int64_t pooled_width,
232 | double spatial_scale,
233 | int64_t sampling_ratio)
234 | {
235 | // Input is the output of the last convolutional layer in the Backbone network, so
236 | // it should be in the format of NCHW
237 | AT_CHECK(input.ndimension() == 4, "Input to RoI Pooling should be a NCHW Tensor");
238 |
239 | // ROIs is the set of region proposals to process. It is a 2D Tensor where the first
240 | // dim is the # of proposals, and the second dim is the proposal itself in the form
241 | // [batch_index startW startH endW endH]
242 | AT_CHECK(bottom_rois.ndimension() == 2, "RoI Proposals should be a 2D Tensor, (batch_sz x proposals)");
243 | AT_CHECK(bottom_rois.size(1) == 5, "Proposals should be of the form [batch_index startW startH endW endH]");
244 |
245 | auto num_rois = bottom_rois.size(0);
246 | auto roi_cols = bottom_rois.size(1);
247 | auto channels = input.size(1);
248 | auto height = input.size(2);
249 | auto width = input.size(3);
250 |
251 | AT_CHECK(roi_cols == 4 || roi_cols == 5, "RoI Proposals should have 4 or 5 columns");
252 |
253 |
254 | // Output Tensor is (num_rois, C, pooled_height, pooled_width)
255 | auto output = input.type().tensor({num_rois, channels, pooled_height, pooled_width});
256 |
257 | AT_CHECK(input.is_contiguous(), "input must be contiguous");
258 | AT_CHECK(bottom_rois.is_contiguous(), "bottom_rois must be contiguous");
259 |
260 |
261 | roi_align_forward_loop(
262 | output.numel(),
263 |     input.data<float>(),
264 |     bottom_rois.data<float>(),
265 |     static_cast<float>(spatial_scale),
266 | channels,
267 | height,
268 | width,
269 | pooled_height,
270 | pooled_width,
271 | sampling_ratio,
272 | roi_cols,
273 |     output.data<float>());
274 |
275 | return output;
276 | }
277 |
278 |
279 |
280 | }
281 | }
--------------------------------------------------------------------------------
/lib/cppcuda/roi_align_cuda.h:
--------------------------------------------------------------------------------
1 | namespace at {
2 | namespace contrib {
3 |
4 | Tensor roi_align_forward_cuda(
5 | const Tensor& input,
6 | const Tensor& bottom_rois,
7 | int64_t pooled_height,
8 | int64_t pooled_width,
9 | double spatial_scale,
10 | int64_t sampling_ratio);
11 |
12 | Tensor roi_align_backward_cuda(
13 | const Tensor& bottom_rois,
14 | const Tensor& grad_output, // gradient of the output of the layer
15 | int64_t b_size,
16 | int64_t channels,
17 | int64_t height,
18 | int64_t width,
19 | int64_t pooled_height,
20 | int64_t pooled_width,
21 | double spatial_scale,
22 | int64_t sampling_ratio);
23 |
24 |
25 | }
26 | }
--------------------------------------------------------------------------------
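The two entry points declared above are what a Python-side autograd wrapper needs: the forward returns the pooled (num_rois, C, pooled_height, pooled_width) tensor, and the backward takes the RoIs, the output gradient and the input geometry and returns the gradient w.r.t. the feature map. A minimal sketch of that wiring, assuming a hypothetical compiled module `_backend` that exposes these two functions (the repo's actual wrapper presumably lives in lib/model/roi_align.py):

from torch.autograd import Function

class RoIAlignFunction(Function):
    @staticmethod
    def forward(ctx, features, rois, pooled_height, pooled_width,
                spatial_scale, sampling_ratio):
        ctx.save_for_backward(rois)
        ctx.input_size = features.size()            # (N, C, H, W)
        ctx.params = (pooled_height, pooled_width, spatial_scale, sampling_ratio)
        # placeholder binding: roi_align_forward_cuda with the signature declared above
        return _backend.roi_align_forward_cuda(
            features, rois, pooled_height, pooled_width,
            spatial_scale, sampling_ratio)

    @staticmethod
    def backward(ctx, grad_output):
        rois, = ctx.saved_tensors
        n, c, h, w = ctx.input_size
        ph, pw, scale, ratio = ctx.params
        # placeholder binding: roi_align_backward_cuda with the signature declared above
        grad_input = _backend.roi_align_backward_cuda(
            rois, grad_output.contiguous(), n, c, h, w, ph, pw, scale, ratio)
        # only the feature map receives a gradient; rois and hyper-parameters do not
        return grad_input, None, None, None, None, None
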
/lib/cppcuda/roi_align_forward_cuda.cu:
--------------------------------------------------------------------------------
1 | // Adapted from https://github.com/caffe2/caffe2/blob/master/caffe2/operators/roi_align_op.cu
2 | // (Ignacio Rocco)
3 |
4 | #include "ATen/NativeFunctions.h"
5 | #include <cuda.h>
6 |
7 | namespace at {
8 | namespace contrib {
9 |
10 | // Use 1024 threads per block, which requires cuda sm_2x or above
11 | const int CUDA_NUM_THREADS = 1024;
12 | const int CUDA_MAX_BLOCKS = 65535;
13 |
14 | inline int GET_BLOCKS(const int N)
15 | {
16 | return (N + CUDA_NUM_THREADS - 1) / CUDA_NUM_THREADS;
17 | }
18 |
19 | __host__ __device__ __forceinline__ float fmin(float a, float b) {
20 | return a > b ? b : a;
21 | }
22 |
23 | __host__ __device__ __forceinline__ float fmax(float a, float b) {
24 | return a > b ? a : b;
25 | }
26 |
27 | template <typename T>
28 | __device__ T bilinear_interpolate(
29 | const T* bottom_data,
30 | const int height,
31 | const int width,
32 | T y,
33 | T x,
34 | const int index /* index for debug only*/) {
35 | // deal with cases that inverse elements are out of feature map boundary
36 | if (y < -1.0 || y > height || x < -1.0 || x > width) {
37 | // empty
38 | return 0;
39 | }
40 |
41 | if (y <= 0) {
42 | y = 0;
43 | }
44 | if (x <= 0) {
45 | x = 0;
46 | }
47 |
48 | int y_low = (int)y;
49 | int x_low = (int)x;
50 | int y_high;
51 | int x_high;
52 |
53 | if (y_low >= height - 1) {
54 | y_high = y_low = height - 1;
55 | y = (T)y_low;
56 | } else {
57 | y_high = y_low + 1;
58 | }
59 |
60 | if (x_low >= width - 1) {
61 | x_high = x_low = width - 1;
62 | x = (T)x_low;
63 | } else {
64 | x_high = x_low + 1;
65 | }
66 |
67 | T ly = y - y_low;
68 | T lx = x - x_low;
69 | T hy = 1. - ly, hx = 1. - lx;
70 | // do bilinear interpolation
71 | T v1 = bottom_data[y_low * width + x_low];
72 | T v2 = bottom_data[y_low * width + x_high];
73 | T v3 = bottom_data[y_high * width + x_low];
74 | T v4 = bottom_data[y_high * width + x_high];
75 | T w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx;
76 |
77 | T val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4);
78 |
79 | return val;
80 | }
81 |
82 | template <typename T>
83 | __global__ void roi_align_forward_kernel(
84 | const int outputElements,
85 | const T* bottom_data, // input tensor
86 | const T* bottom_rois, // input rois
87 | const T spatial_scale,
88 | const int channels,
89 | const int height,
90 | const int width,
91 | const int pooled_height,
92 | const int pooled_width,
93 | const int sampling_ratio,
94 | T* top_data) // output
95 | {
96 | // CUDA_1D_KERNEL_LOOP(index, nthreads) {
97 | for (int index = blockIdx.x * blockDim.x + threadIdx.x;
98 | index < outputElements;
99 | index += blockDim.x * gridDim.x)
100 | {
101 | // (n, c, ph, pw) is an element in the pooled output
102 | int pw = index % pooled_width;
103 | int ph = (index / pooled_width) % pooled_height;
104 | int c = (index / pooled_width / pooled_height) % channels;
105 | int n = index / pooled_width / pooled_height / channels;
106 |
107 | const T* offset_bottom_rois = bottom_rois + n * 5;
108 | int roi_batch_ind = offset_bottom_rois[0];
109 |
110 |     // Do not use rounding; this implementation detail is critical
111 | T roi_start_w = offset_bottom_rois[1] * spatial_scale;
112 | T roi_start_h = offset_bottom_rois[2] * spatial_scale;
113 | T roi_end_w = offset_bottom_rois[3] * spatial_scale;
114 | T roi_end_h = offset_bottom_rois[4] * spatial_scale;
115 | // T roi_start_w = round(offset_bottom_rois[1] * spatial_scale);
116 | // T roi_start_h = round(offset_bottom_rois[2] * spatial_scale);
117 | // T roi_end_w = round(offset_bottom_rois[3] * spatial_scale);
118 | // T roi_end_h = round(offset_bottom_rois[4] * spatial_scale);
119 |
120 | // Force malformed ROIs to be 1x1
121 | T roi_width = fmax(roi_end_w - roi_start_w, (T)1.);
122 | T roi_height = fmax(roi_end_h - roi_start_h, (T)1.);
123 |     T bin_size_h = static_cast<T>(roi_height) / static_cast<T>(pooled_height);
124 |     T bin_size_w = static_cast<T>(roi_width) / static_cast<T>(pooled_width);
125 |
126 | const T* offset_bottom_data =
127 | bottom_data + (roi_batch_ind * channels + c) * height * width;
128 |
129 | // We use roi_bin_grid to sample the grid and mimic integral
130 | int roi_bin_grid_h = (sampling_ratio > 0)
131 | ? sampling_ratio
132 | : ceilf(roi_height / pooled_height); // e.g., = 2
133 | int roi_bin_grid_w =
134 | (sampling_ratio > 0) ? sampling_ratio : ceilf(roi_width / pooled_width);
135 |
136 | // We do average (integral) pooling inside a bin
137 | const T count = roi_bin_grid_h * roi_bin_grid_w; // e.g. = 4
138 |
139 | T output_val = 0.;
140 | for (int iy = 0; iy < roi_bin_grid_h; iy++) // e.g., iy = 0, 1
141 | {
142 | const T y = roi_start_h + ph * bin_size_h +
143 |           static_cast<T>(iy + .5f) * bin_size_h /
144 |           static_cast<T>(roi_bin_grid_h); // e.g., 0.5, 1.5
145 | for (int ix = 0; ix < roi_bin_grid_w; ix++) {
146 | const T x = roi_start_w + pw * bin_size_w +
147 |             static_cast<T>(ix + .5f) * bin_size_w /
148 |             static_cast<T>(roi_bin_grid_w);
149 |
150 | T val = bilinear_interpolate(
151 | offset_bottom_data, height, width, y, x, index);
152 | output_val += val;
153 | }
154 | }
155 | output_val /= count;
156 |
157 | top_data[index] = output_val;
158 | }
159 | }
160 |
161 |
162 | Tensor roi_align_forward_cuda(
163 | const Tensor& input,
164 | const Tensor& bottom_rois,
165 | int64_t pooled_height,
166 | int64_t pooled_width,
167 | double spatial_scale,
168 | int64_t sampling_ratio)
169 | {
170 |
171 | // Input is the output of the last convolutional layer in the Backbone network, so
172 | // it should be in the format of NCHW
173 | AT_CHECK(input.ndimension() == 4, "Input to RoI Align should be a NCHW Tensor");
174 |
175 | // ROIs is the set of region proposals to process. It is a 2D Tensor where the first
176 |   // dim is the # of proposals, and the second dim is the proposal itself in the form
177 | // [batch_index startW startH endW endH]
178 | AT_CHECK(bottom_rois.ndimension() == 2, "RoI Proposals should be a 2D Tensor, (batch_sz x proposals)");
179 |   AT_CHECK(bottom_rois.size(1) == 5, "Proposals should be of the form [batch_index startW startH endW endH]");
180 |
181 | auto proposals = bottom_rois.size(0);
182 | auto channels = input.size(1);
183 | auto height = input.size(2);
184 | auto width = input.size(3);
185 |
186 | // Output Tensor is (num_rois, C, pooled_height, pooled_width)
187 | auto output = input.type().tensor({proposals, channels, pooled_height, pooled_width});
188 |
189 | AT_CHECK(input.is_contiguous(), "input must be contiguous");
190 | AT_CHECK(bottom_rois.is_contiguous(), "bottom_rois must be contiguous");
191 |
192 | // dim3 block(512);
193 | // dim3 grid((output.numel() + 512 - 1) / 512);
194 | int64_t total_threads = output.numel();
195 | int64_t blocks = fmin(GET_BLOCKS(total_threads),CUDA_MAX_BLOCKS);
196 |
197 |   roi_align_forward_kernel<float><<<blocks, CUDA_NUM_THREADS>>>(
198 | output.numel(),
199 |     input.data<float>(),
200 |     bottom_rois.data<float>(),
201 |     static_cast<float>(spatial_scale),
202 | channels,
203 | height,
204 | width,
205 | pooled_height,
206 | pooled_width,
207 | sampling_ratio,
208 |     output.data<float>());
209 | AT_CHECK(cudaGetLastError() == cudaSuccess, "roi_align_forward_kernel failed");
210 |
211 | return output;
212 | }
213 |
214 |
215 | } // at::contrib
216 | } // at
--------------------------------------------------------------------------------
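As a reference for the sampling math in the kernel above: each output bin averages roi_bin_grid_h x roi_bin_grid_w bilinearly interpolated samples taken at evenly spaced points inside the bin. A small NumPy sketch of the same scheme (reference math only, not part of the repo's API):

import numpy as np

def bilinear(feat, y, x):
    """Bilinearly sample feat (H, W) at (y, x) with the kernel's boundary handling."""
    h, w = feat.shape
    if y < -1.0 or y > h or x < -1.0 or x > w:
        return 0.0
    y, x = max(y, 0.0), max(x, 0.0)
    y0, x0 = int(y), int(x)
    if y0 >= h - 1:
        y0 = y1 = h - 1
        y = float(y0)
    else:
        y1 = y0 + 1
    if x0 >= w - 1:
        x0 = x1 = w - 1
        x = float(x0)
    else:
        x1 = x0 + 1
    ly, lx = y - y0, x - x0
    return ((1 - ly) * (1 - lx) * feat[y0, x0] + (1 - ly) * lx * feat[y0, x1] +
            ly * (1 - lx) * feat[y1, x0] + ly * lx * feat[y1, x1])

def roi_align_forward_ref(feat, roi, pooled_h, pooled_w, spatial_scale, sampling_ratio):
    """feat: (C, H, W); roi: [x0, y0, x1, y1] in input-image coordinates."""
    x0, y0, x1, y1 = [c * spatial_scale for c in roi]
    roi_w, roi_h = max(x1 - x0, 1.0), max(y1 - y0, 1.0)
    bin_h, bin_w = roi_h / pooled_h, roi_w / pooled_w
    grid_h = sampling_ratio if sampling_ratio > 0 else int(np.ceil(roi_h / pooled_h))
    grid_w = sampling_ratio if sampling_ratio > 0 else int(np.ceil(roi_w / pooled_w))
    out = np.zeros((feat.shape[0], pooled_h, pooled_w), dtype=np.float32)
    for c in range(feat.shape[0]):
        for ph in range(pooled_h):
            for pw in range(pooled_w):
                acc = 0.0
                for iy in range(grid_h):
                    y = y0 + ph * bin_h + (iy + 0.5) * bin_h / grid_h
                    for ix in range(grid_w):
                        x = x0 + pw * bin_w + (ix + 0.5) * bin_w / grid_w
                        acc += bilinear(feat[c], y, x)
                out[c, ph, pw] = acc / (grid_h * grid_w)
    return out

# e.g. a 7x7 pool over a 1/16-scale feature map with 2x2 samples per bin
pooled = roi_align_forward_ref(np.random.rand(8, 50, 50).astype(np.float32),
                               [32., 32., 96., 96.], 7, 7, 1.0 / 16, 2)
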
/lib/cppcuda_cffi/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ignacio-rocco/detectorch/bc2bc84781dfe3cb85aa4639ffd21d71989c6183/lib/cppcuda_cffi/__init__.py
--------------------------------------------------------------------------------
/lib/cppcuda_cffi/bind.py:
--------------------------------------------------------------------------------
1 | import os
2 | import torch
3 | from torch.utils.ffi import create_extension
4 |
5 |
6 | sources = ['src/roi_align_forward_cpu.c']
7 | headers = ['src/roi_align_forward_cpu.h']
8 | defines = []
9 | with_cuda = False
10 |
11 | if torch.cuda.is_available():
12 | print('Including CUDA code.')
13 | sources += ['src/roi_align_forward_cuda.c','src/roi_align_backward_cuda.c']
14 | headers += ['src/roi_align_forward_cuda.h','src/roi_align_backward_cuda.h']
15 | defines += [('WITH_CUDA', None)]
16 | with_cuda = True
17 |
18 | this_file = os.path.dirname(os.path.realpath(__file__))
19 | print(this_file)
20 | extra_objects = ['src/cpp/roi_align_cpu_loop.o',
21 | 'src/cuda/roi_align_forward_cuda_kernel.cu.o',
22 | 'src/cuda/roi_align_backward_cuda_kernel.cu.o']
23 |
24 | extra_objects = [os.path.join(this_file, fname) for fname in extra_objects]
25 |
26 | ffi = create_extension(
27 | 'roialign',
28 | headers=headers,
29 | sources=sources,
30 | define_macros=defines,
31 | relative_to=__file__,
32 | with_cuda=with_cuda,
33 | extra_objects=extra_objects,
34 | extra_compile_args=['-std=c11']
35 | )
36 |
37 | if __name__ == '__main__':
38 | ffi.build()
39 |
--------------------------------------------------------------------------------
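Once make.sh has compiled the object files and bind.py has generated the extension, it should be callable roughly as below. The import path and the in-place calling convention are inferred from the sources here (the C entry points fill a preallocated output tensor and return an int status), so treat this as an illustrative sketch rather than documented usage:

import torch
from cppcuda_cffi import roialign   # assumption: lib/ is on sys.path and the built package lives here

features = torch.randn(1, 256, 50, 50)            # NCHW feature map (contiguous float CPU tensor)
rois = torch.Tensor([[0., 32., 32., 96., 96.]])   # [batch_index, x0, y0, x1, y1]
pooled_h, pooled_w, spatial_scale, sampling_ratio = 7, 7, 1.0 / 16, 2

# the cffi entry point writes into a preallocated output tensor and returns a status int
output = torch.zeros(rois.size(0), features.size(1), pooled_h, pooled_w)
roialign.roi_align_forward_cpu(features, rois, output,
                               pooled_h, pooled_w, spatial_scale, sampling_ratio)
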
/lib/cppcuda_cffi/get_lib_path.py:
--------------------------------------------------------------------------------
1 | import os
2 | import torch
3 |
4 | def main():
5 | libpath=os.path.join(os.path.dirname(torch.__file__),'lib','include')
6 | print(libpath)
7 |
8 |
9 | if __name__ == "__main__":
10 | main()
11 |
12 |
--------------------------------------------------------------------------------
/lib/cppcuda_cffi/make.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | CUDA_PATH=/usr/local/cuda/bin/
4 | PATH=$CUDA_PATH:$PATH
5 |
6 | TORCHLIBPATH=$(python get_lib_path.py 2>&1)
7 | echo $TORCHLIBPATH
8 |
9 | cd src/cpp/
10 |
11 | echo "Compiling roi_align_cpu_loop.cpp with g++..."
12 | g++ -I $TORCHLIBPATH -o roi_align_cpu_loop.o roi_align_cpu_loop.cpp -fPIC -shared -std=c++0x
13 |
14 | echo "Compiling roi_align_forward_cuda_kernel.cu with nvcc..."
15 | cd ../cuda/
16 | nvcc -c -o roi_align_forward_cuda_kernel.cu.o roi_align_forward_cuda_kernel.cu -x cu -Xcompiler -fPIC -arch=sm_52
17 | nvcc -c -o roi_align_backward_cuda_kernel.cu.o roi_align_backward_cuda_kernel.cu -x cu -Xcompiler -fPIC -arch=sm_52
18 | cd ../../
19 |
20 | python bind.py
--------------------------------------------------------------------------------
/lib/cppcuda_cffi/src/cpp/roi_align_cpu_loop.cpp:
--------------------------------------------------------------------------------
1 | // Adapted from https://github.com/caffe2/caffe2/blob/master/caffe2/operators/roi_align_op.cc
2 | // (Ignacio Rocco)
3 | #ifdef __cplusplus
4 |
5 | #include <vector>
6 | #include <cmath>
7 | #include <algorithm>
8 |
9 | struct PreCalc {
10 | int pos1;
11 | int pos2;
12 | int pos3;
13 | int pos4;
14 | float w1;
15 | float w2;
16 | float w3;
17 | float w4;
18 | };
19 |
20 | void pre_calc_for_bilinear_interpolate(
21 | const int height,
22 | const int width,
23 | const int pooled_height,
24 | const int pooled_width,
25 | const int iy_upper,
26 | const int ix_upper,
27 | float roi_start_h,
28 | float roi_start_w,
29 | float bin_size_h,
30 | float bin_size_w,
31 | int roi_bin_grid_h,
32 | int roi_bin_grid_w,
33 |     std::vector<PreCalc>& pre_calc) {
34 | int pre_calc_index = 0;
35 | for (int ph = 0; ph < pooled_height; ph++) {
36 | for (int pw = 0; pw < pooled_width; pw++) {
37 | for (int iy = 0; iy < iy_upper; iy++) {
38 | const float yy = roi_start_h + ph * bin_size_h +
39 |           static_cast<float>(iy + .5f) * bin_size_h /
40 |           static_cast<float>(roi_bin_grid_h); // e.g., 0.5, 1.5
41 | for (int ix = 0; ix < ix_upper; ix++) {
42 | const float xx = roi_start_w + pw * bin_size_w +
43 |             static_cast<float>(ix + .5f) * bin_size_w /
44 |             static_cast<float>(roi_bin_grid_w);
45 |
46 | float x = xx;
47 | float y = yy;
48 | // deal with: inverse elements are out of feature map boundary
49 | if (y < -1.0 || y > height || x < -1.0 || x > width) {
50 | // empty
51 | PreCalc pc;
52 | pc.pos1 = 0;
53 | pc.pos2 = 0;
54 | pc.pos3 = 0;
55 | pc.pos4 = 0;
56 | pc.w1 = 0;
57 | pc.w2 = 0;
58 | pc.w3 = 0;
59 | pc.w4 = 0;
60 | pre_calc[pre_calc_index] = pc;
61 | pre_calc_index += 1;
62 | continue;
63 | }
64 |
65 | if (y <= 0) {
66 | y = 0;
67 | }
68 | if (x <= 0) {
69 | x = 0;
70 | }
71 |
72 | int y_low = (int)y;
73 | int x_low = (int)x;
74 | int y_high;
75 | int x_high;
76 |
77 | if (y_low >= height - 1) {
78 | y_high = y_low = height - 1;
79 | y = (float)y_low;
80 | } else {
81 | y_high = y_low + 1;
82 | }
83 |
84 | if (x_low >= width - 1) {
85 | x_high = x_low = width - 1;
86 | x = (float)x_low;
87 | } else {
88 | x_high = x_low + 1;
89 | }
90 |
91 | float ly = y - y_low;
92 | float lx = x - x_low;
93 | float hy = 1. - ly, hx = 1. - lx;
94 | float w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx;
95 |
96 |         // save weights and indices
97 | PreCalc pc;
98 | pc.pos1 = y_low * width + x_low;
99 | pc.pos2 = y_low * width + x_high;
100 | pc.pos3 = y_high * width + x_low;
101 | pc.pos4 = y_high * width + x_high;
102 | pc.w1 = w1;
103 | pc.w2 = w2;
104 | pc.w3 = w3;
105 | pc.w4 = w4;
106 | pre_calc[pre_calc_index] = pc;
107 |
108 | pre_calc_index += 1;
109 | }
110 | }
111 | }
112 | }
113 | }
114 |
115 | extern "C" {
116 | #endif
117 |
118 | void roi_align_forward_loop(
119 | const int outputElements,
120 | const float* bottom_data, // input tensor
121 | const float* bottom_rois, // input rois
122 | const float spatial_scale,
123 | const int channels,
124 | const int height,
125 | const int width,
126 | const int pooled_height,
127 | const int pooled_width,
128 | const int sampling_ratio,
129 | const int roi_cols, // rois can have 4 or 5 columns
130 | float* top_data) // output
131 | {
132 | int n_rois = outputElements / channels / pooled_width / pooled_height;
133 | // (n, c, ph, pw) is an element in the pooled output
134 | // can be parallelized using omp
135 | // #pragma omp parallel for num_threads(32)
136 | for (int n = 0; n < n_rois; n++) {
137 | int index_n = n * channels * pooled_width * pooled_height;
138 |
139 | // roi could have 4 or 5 columns
140 | const float* offset_bottom_rois = bottom_rois + n * roi_cols;
141 | int roi_batch_ind = 0;
142 | if (roi_cols == 5) {
143 | roi_batch_ind = offset_bottom_rois[0];
144 | offset_bottom_rois++;
145 | }
146 |
147 |     // Do not use rounding; this implementation detail is critical
148 | float roi_start_w = offset_bottom_rois[0] * spatial_scale;
149 | float roi_start_h = offset_bottom_rois[1] * spatial_scale;
150 | float roi_end_w = offset_bottom_rois[2] * spatial_scale;
151 | float roi_end_h = offset_bottom_rois[3] * spatial_scale;
152 | // float roi_start_w = round(offset_bottom_rois[0] * spatial_scale);
153 | // float roi_start_h = round(offset_bottom_rois[1] * spatial_scale);
154 | // float roi_end_w = round(offset_bottom_rois[2] * spatial_scale);
155 | // float roi_end_h = round(offset_bottom_rois[3] * spatial_scale);
156 |
157 | // Force malformed ROIs to be 1x1
158 | float roi_width = std::max(roi_end_w - roi_start_w, (float)1.);
159 | float roi_height = std::max(roi_end_h - roi_start_h, (float)1.);
160 |     float bin_size_h = static_cast<float>(roi_height) / static_cast<float>(pooled_height);
161 |     float bin_size_w = static_cast<float>(roi_width) / static_cast<float>(pooled_width);
162 |
163 | // We use roi_bin_grid to sample the grid and mimic integral
164 | int roi_bin_grid_h = (sampling_ratio > 0)
165 | ? sampling_ratio
166 | : std::ceil(roi_height / pooled_height); // e.g., = 2
167 | int roi_bin_grid_w =
168 | (sampling_ratio > 0) ? sampling_ratio : std::ceil(roi_width / pooled_width);
169 |
170 | // We do average (integral) pooling inside a bin
171 | const float count = roi_bin_grid_h * roi_bin_grid_w; // e.g. = 4
172 |
173 |     // we want to precalculate indices and weights shared by all channels,
174 |     // this is the key point of optimization
175 |     std::vector<PreCalc> pre_calc(
176 | roi_bin_grid_h * roi_bin_grid_w * pooled_width * pooled_height);
177 | pre_calc_for_bilinear_interpolate(
178 | height,
179 | width,
180 | pooled_height,
181 | pooled_width,
182 | roi_bin_grid_h,
183 | roi_bin_grid_w,
184 | roi_start_h,
185 | roi_start_w,
186 | bin_size_h,
187 | bin_size_w,
188 | roi_bin_grid_h,
189 | roi_bin_grid_w,
190 | pre_calc);
191 |
192 |
193 | for (int c = 0; c < channels; c++) {
194 | int index_n_c = index_n + c * pooled_width * pooled_height;
195 | const float* offset_bottom_data =
196 | bottom_data + (roi_batch_ind * channels + c) * height * width;
197 | int pre_calc_index = 0;
198 |
199 | for (int ph = 0; ph < pooled_height; ph++) {
200 | for (int pw = 0; pw < pooled_width; pw++) {
201 | int index = index_n_c + ph * pooled_width + pw;
202 |
203 | float output_val = 0.;
204 | for (int iy = 0; iy < roi_bin_grid_h; iy++) {
205 | for (int ix = 0; ix < roi_bin_grid_w; ix++) {
206 | PreCalc pc = pre_calc[pre_calc_index];
207 | output_val += pc.w1 * offset_bottom_data[pc.pos1] +
208 | pc.w2 * offset_bottom_data[pc.pos2] +
209 | pc.w3 * offset_bottom_data[pc.pos3] +
210 | pc.w4 * offset_bottom_data[pc.pos4];
211 |
212 | pre_calc_index += 1;
213 | }
214 | }
215 | output_val /= count;
216 |
217 | top_data[index] = output_val;
218 | } // for pw
219 | } // for ph
220 | } // for c
221 | } // for n
222 | }
223 |
224 |
225 |
226 |
227 | #ifdef __cplusplus
228 | }
229 | #endif
230 |
--------------------------------------------------------------------------------
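The reason the loop above first calls pre_calc_for_bilinear_interpolate is that the four neighbour positions and weights depend only on the sampling coordinates, not on the channel, so they are computed once per ROI and then reused for every channel. A rough NumPy sketch of that idea, with hypothetical helper names (not the repo's API):

import numpy as np

def precalc_bilinear(h, w, ys, xs):
    """For each sampling point, return the 4 flat neighbour indices and their weights."""
    pos, wts = [], []
    for y, x in zip(ys, xs):
        y = min(max(float(y), 0.0), h - 1.0)
        x = min(max(float(x), 0.0), w - 1.0)
        y0, x0 = int(y), int(x)
        y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
        ly, lx = y - y0, x - x0
        pos.append((y0 * w + x0, y0 * w + x1, y1 * w + x0, y1 * w + x1))
        wts.append(((1 - ly) * (1 - lx), (1 - ly) * lx, ly * (1 - lx), ly * lx))
    return np.array(pos), np.array(wts, dtype=np.float32)

def pool_all_channels(feat, pos, wts):
    """Reuse the precomputed indices/weights for every channel of a (C, H, W) map."""
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)
    return (flat[:, pos] * wts).sum(axis=2)        # (C, num_samples)

feat = np.random.rand(4, 10, 10).astype(np.float32)
pos, wts = precalc_bilinear(10, 10, ys=[2.3, 7.9], xs=[1.1, 5.5])
samples = pool_all_channels(feat, pos, wts)        # shape (4, 2)
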
/lib/cppcuda_cffi/src/cpp/roi_align_cpu_loop.h:
--------------------------------------------------------------------------------
1 | #ifdef __cplusplus
2 | extern "C" {
3 | #endif
4 |
5 | void roi_align_forward_loop(
6 | const int outputElements,
7 | const float* bottom_data, // input tensor
8 | const float* bottom_rois, // input rois
9 | const float spatial_scale,
10 | const int channels,
11 | const int height,
12 | const int width,
13 | const int pooled_height,
14 | const int pooled_width,
15 | const int sampling_ratio,
16 | const int roi_cols, // rois can have 4 or 5 columns
17 | float* top_data);
18 |
19 | #ifdef __cplusplus
20 | }
21 | #endif
22 |
23 |
--------------------------------------------------------------------------------
/lib/cppcuda_cffi/src/cuda/roi_align_backward_cuda_kernel.cu:
--------------------------------------------------------------------------------
1 | // Adapted from https://github.com/caffe2/caffe2/blob/master/caffe2/operators/roi_align_gradient_op.cu
2 | // (Ignacio Rocco)
3 | #ifdef __cplusplus
4 | extern "C" {
5 | #endif
6 |
7 | #include <stdio.h>
8 | #include <math.h>
9 | #include <cuda.h>
10 |
11 | // Use 1024 threads per block, which requires cuda sm_2x or above
12 | const int CUDA_NUM_THREADS = 1024;
13 | const int CUDA_MAX_BLOCKS = 65535;
14 |
15 | inline int GET_BLOCKS(const int N)
16 | {
17 | return (N + CUDA_NUM_THREADS - 1) / CUDA_NUM_THREADS;
18 | }
19 |
20 | __host__ __device__ __forceinline__ float myfmin(float a, float b) {
21 | return a > b ? b : a;
22 | }
23 |
24 | __host__ __device__ __forceinline__ float myfmax(float a, float b) {
25 | return a > b ? a : b;
26 | }
27 |
28 |
29 | inline __device__ float gpu_atomic_add(const float val, float* address) {
30 | return atomicAdd(address, val);
31 | }
32 |
33 | __device__ void bilinear_interpolate_gradient(
34 | const int height,
35 | const int width,
36 | float y,
37 | float x,
38 | float& w1,
39 | float& w2,
40 | float& w3,
41 | float& w4,
42 | int& x_low,
43 | int& x_high,
44 | int& y_low,
45 | int& y_high,
46 | const int index /* index for debug only*/) {
47 | // deal with cases that inverse elements are out of feature map boundary
48 | if (y < -1.0 || y > height || x < -1.0 || x > width) {
49 | // empty
50 | w1 = w2 = w3 = w4 = 0.;
51 | x_low = x_high = y_low = y_high = -1;
52 | return;
53 | }
54 |
55 | if (y <= 0) {
56 | y = 0;
57 | }
58 | if (x <= 0) {
59 | x = 0;
60 | }
61 |
62 | y_low = (int)y;
63 | x_low = (int)x;
64 |
65 | if (y_low >= height - 1) {
66 | y_high = y_low = height - 1;
67 | y = (float)y_low;
68 | } else {
69 | y_high = y_low + 1;
70 | }
71 |
72 | if (x_low >= width - 1) {
73 | x_high = x_low = width - 1;
74 | x = (float)x_low;
75 | } else {
76 | x_high = x_low + 1;
77 | }
78 |
79 | float ly = y - y_low;
80 | float lx = x - x_low;
81 | float hy = 1. - ly, hx = 1. - lx;
82 |
83 | // reference in forward
84 | // float v1 = bottom_data[y_low * width + x_low];
85 | // float v2 = bottom_data[y_low * width + x_high];
86 | // float v3 = bottom_data[y_high * width + x_low];
87 | // float v4 = bottom_data[y_high * width + x_high];
88 | // float val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4);
89 |
90 | w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx;
91 |
92 | return;
93 | }
94 |
95 | __global__ void roi_align_backward_kernel(
96 | const int nthreads,
97 | const float* top_diff,
98 | const int num_rois,
99 | const float spatial_scale,
100 | const int channels,
101 | const int height,
102 | const int width,
103 | const int pooled_height,
104 | const int pooled_width,
105 | const int sampling_ratio,
106 | float* bottom_diff,
107 | const float* bottom_rois,
108 | int rois_cols) {
109 | //CUDA_1D_KERNEL_LOOP(index, nthreads) {
110 | for (int index = blockIdx.x * blockDim.x + threadIdx.x;
111 | index < nthreads;
112 | index += blockDim.x * gridDim.x)
113 | {
114 | // (n, c, ph, pw) is an element in the pooled output
115 | int pw = index % pooled_width;
116 | int ph = (index / pooled_width) % pooled_height;
117 | int c = (index / pooled_width / pooled_height) % channels;
118 | int n = index / pooled_width / pooled_height / channels;
119 |
120 | const float* offset_bottom_rois = bottom_rois + n * 5;
121 | int roi_batch_ind = offset_bottom_rois[0];
122 |
123 |     // Do not use rounding; this implementation detail is critical
124 | float roi_start_w = offset_bottom_rois[1] * spatial_scale;
125 | float roi_start_h = offset_bottom_rois[2] * spatial_scale;
126 | float roi_end_w = offset_bottom_rois[3] * spatial_scale;
127 | float roi_end_h = offset_bottom_rois[4] * spatial_scale;
128 | // float roi_start_w = round(offset_bottom_rois[1] * spatial_scale);
129 | // float roi_start_h = round(offset_bottom_rois[2] * spatial_scale);
130 | // float roi_end_w = round(offset_bottom_rois[3] * spatial_scale);
131 | // float roi_end_h = round(offset_bottom_rois[4] * spatial_scale);
132 |
133 | // Force malformed ROIs to be 1x1
134 | float roi_width = myfmax(roi_end_w - roi_start_w, (float)1.);
135 | float roi_height = myfmax(roi_end_h - roi_start_h, (float)1.);
136 |     float bin_size_h = static_cast<float>(roi_height) / static_cast<float>(pooled_height);
137 |     float bin_size_w = static_cast<float>(roi_width) / static_cast<float>(pooled_width);
138 |
139 | float* offset_bottom_diff =
140 | bottom_diff + (roi_batch_ind * channels + c) * height * width;
141 |
142 | int top_offset = (n * channels + c) * pooled_height * pooled_width;
143 | const float* offset_top_diff = top_diff + top_offset;
144 | const float top_diff_this_bin = offset_top_diff[ph * pooled_width + pw];
145 |
146 | // We use roi_bin_grid to sample the grid and mimic integral
147 | int roi_bin_grid_h = (sampling_ratio > 0)
148 | ? sampling_ratio
149 | : ceilf(roi_height / pooled_height); // e.g., = 2
150 | int roi_bin_grid_w =
151 | (sampling_ratio > 0) ? sampling_ratio : ceilf(roi_width / pooled_width);
152 |
153 | // We do average (integral) pooling inside a bin
154 | const float count = roi_bin_grid_h * roi_bin_grid_w; // e.g. = 4
155 |
156 | for (int iy = 0; iy < roi_bin_grid_h; iy++) // e.g., iy = 0, 1
157 | {
158 | const float y = roi_start_h + ph * bin_size_h +
159 |           static_cast<float>(iy + .5f) * bin_size_h /
160 |           static_cast<float>(roi_bin_grid_h); // e.g., 0.5, 1.5
161 | for (int ix = 0; ix < roi_bin_grid_w; ix++) {
162 | const float x = roi_start_w + pw * bin_size_w +
163 |             static_cast<float>(ix + .5f) * bin_size_w /
164 |             static_cast<float>(roi_bin_grid_w);
165 |
166 | float w1, w2, w3, w4;
167 | int x_low, x_high, y_low, y_high;
168 |
169 | bilinear_interpolate_gradient(
170 | height,
171 | width,
172 | y,
173 | x,
174 | w1,
175 | w2,
176 | w3,
177 | w4,
178 | x_low,
179 | x_high,
180 | y_low,
181 | y_high,
182 | index);
183 |
184 | float g1 = top_diff_this_bin * w1 / count;
185 | float g2 = top_diff_this_bin * w2 / count;
186 | float g3 = top_diff_this_bin * w3 / count;
187 | float g4 = top_diff_this_bin * w4 / count;
188 |
189 | if (x_low >= 0 && x_high >= 0 && y_low >= 0 && y_high >= 0) {
190 | gpu_atomic_add(
191 | static_cast(g1), offset_bottom_diff + y_low * width + x_low);
192 | gpu_atomic_add(
193 | static_cast(g2), offset_bottom_diff + y_low * width + x_high);
194 | gpu_atomic_add(
195 | static_cast(g3), offset_bottom_diff + y_high * width + x_low);
196 | gpu_atomic_add(
197 | static_cast(g4), offset_bottom_diff + y_high * width + x_high);
198 | } // if
199 | } // ix
200 | } // iy
201 | } // CUDA_1D_KERNEL_LOOP
202 | } // RoIAlignBackward
203 |
204 | int launch_roi_align_backward_cuda(
205 | const int nthreads,
206 | const float* top_diff,
207 | const int num_rois,
208 | const float spatial_scale,
209 | const int channels,
210 | const int height,
211 | const int width,
212 | const int pooled_height,
213 | const int pooled_width,
214 | const int sampling_ratio,
215 | float* bottom_diff,
216 | const float* bottom_rois,
217 | int roi_cols,
218 | cudaStream_t stream)
219 | {
220 |
221 | int64_t blocks = myfmin(GET_BLOCKS(nthreads),CUDA_MAX_BLOCKS);
222 |
223 |   roi_align_backward_kernel<<<blocks, CUDA_NUM_THREADS, 0, stream>>>(
224 | nthreads,
225 | top_diff,
226 | num_rois,
227 | spatial_scale,
228 | channels,
229 | height,
230 | width,
231 | pooled_height,
232 | pooled_width,
233 | sampling_ratio,
234 | bottom_diff,
235 | bottom_rois,
236 | roi_cols);
237 |
238 | // check for errors
239 | cudaError_t err = cudaGetLastError();
240 | if (err != cudaSuccess) {
241 |     printf("error in roi_align_backward_kernel: %s\n", cudaGetErrorString(err));
242 | //THError("aborting");
243 | return 0;
244 | }
245 | return 1;
246 |
247 | }
248 |
249 |
250 | #ifdef __cplusplus
251 | }
252 | #endif
--------------------------------------------------------------------------------
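The backward kernel above redistributes each pooled cell's incoming gradient to the four neighbours of every sampling point, weighted by the same bilinear weights and divided by count (the number of samples per bin); the atomic add is needed because different RoIs can touch the same feature-map cell. A minimal NumPy sketch of that scatter (reference math only, not part of the repo's API):

import numpy as np

def scatter_bilinear_grad(bottom_diff, grad_bin, y, x, count):
    """Accumulate one sampling point's share of grad_bin into bottom_diff (H, W)."""
    h, w = bottom_diff.shape
    if y < -1.0 or y > h or x < -1.0 or x > w:
        return                                    # sample fell outside: no gradient
    y, x = max(y, 0.0), max(x, 0.0)
    y0, x0 = min(int(y), h - 1), min(int(x), w - 1)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0
    for yy, xx, wgt in ((y0, x0, (1 - ly) * (1 - lx)), (y0, x1, (1 - ly) * lx),
                        (y1, x0, ly * (1 - lx)), (y1, x1, ly * lx)):
        bottom_diff[yy, xx] += grad_bin * wgt / count   # the gpu_atomic_add in the kernel

grad_input = np.zeros((10, 10), dtype=np.float32)
scatter_bilinear_grad(grad_input, grad_bin=1.0, y=2.3, x=1.1, count=4)
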
/lib/cppcuda_cffi/src/cuda/roi_align_backward_cuda_kernel.h:
--------------------------------------------------------------------------------
1 | // Adapted from https://github.com/caffe2/caffe2/blob/master/caffe2/operators/roi_align_gradient_op.cu
2 | // (Ignacio Rocco)
3 | #ifdef __cplusplus
4 | extern "C" {
5 | #endif
6 |
7 | int launch_roi_align_backward_cuda(
8 | const int nthreads,
9 | const float* top_diff,
10 | const int num_rois,
11 | const float spatial_scale,
12 | const int channels,
13 | const int height,
14 | const int width,
15 | const int pooled_height,
16 | const int pooled_width,
17 | const int sampling_ratio,
18 | float* bottom_diff,
19 | const float* bottom_rois,
20 | int roi_cols,
21 | cudaStream_t stream);
22 |
23 | #ifdef __cplusplus
24 | }
25 | #endif
--------------------------------------------------------------------------------
/lib/cppcuda_cffi/src/cuda/roi_align_forward_cuda_kernel.cu:
--------------------------------------------------------------------------------
1 | // Adapted from https://github.com/caffe2/caffe2/blob/master/caffe2/operators/roi_align_op.cu
2 | // (Ignacio Rocco)
3 | #ifdef __cplusplus
4 | extern "C" {
5 | #endif
6 |
7 | #include <stdio.h>
8 | #include <math.h>
9 | #include <cuda.h>
10 |
11 |
12 | // Use 1024 threads per block, which requires cuda sm_2x or above
13 | const int CUDA_NUM_THREADS = 1024;
14 | const int CUDA_MAX_BLOCKS = 65535;
15 |
16 | inline int GET_BLOCKS(const int N)
17 | {
18 | return (N + CUDA_NUM_THREADS - 1) / CUDA_NUM_THREADS;
19 | }
20 |
21 | __host__ __device__ __forceinline__ float myfmin(float a, float b) {
22 | return a > b ? b : a;
23 | }
24 |
25 | __host__ __device__ __forceinline__ float myfmax(float a, float b) {
26 | return a > b ? a : b;
27 | }
28 |
29 | __device__ float bilinear_interpolate(
30 | const float* bottom_data,
31 | const int height,
32 | const int width,
33 | float y,
34 | float x,
35 | const int index /* index for debug only*/) {
36 | // deal with cases that inverse elements are out of feature map boundary
37 | if (y < -1.0 || y > height || x < -1.0 || x > width) {
38 | // empty
39 | return 0;
40 | }
41 |
42 | if (y <= 0) {
43 | y = 0;
44 | }
45 | if (x <= 0) {
46 | x = 0;
47 | }
48 |
49 | int y_low = (int)y;
50 | int x_low = (int)x;
51 | int y_high;
52 | int x_high;
53 |
54 | if (y_low >= height - 1) {
55 | y_high = y_low = height - 1;
56 | y = (float)y_low;
57 | } else {
58 | y_high = y_low + 1;
59 | }
60 |
61 | if (x_low >= width - 1) {
62 | x_high = x_low = width - 1;
63 | x = (float)x_low;
64 | } else {
65 | x_high = x_low + 1;
66 | }
67 |
68 | float ly = y - y_low;
69 | float lx = x - x_low;
70 | float hy = 1. - ly, hx = 1. - lx;
71 | // do bilinear interpolation
72 | float v1 = bottom_data[y_low * width + x_low];
73 | float v2 = bottom_data[y_low * width + x_high];
74 | float v3 = bottom_data[y_high * width + x_low];
75 | float v4 = bottom_data[y_high * width + x_high];
76 | float w1 = hy * hx, w2 = hy * lx, w3 = ly * hx, w4 = ly * lx;
77 |
78 | float val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4);
79 |
80 | return val;
81 | }
82 |
83 | __global__ void roi_align_forward_kernel(
84 | const int outputElements,
85 | const float* bottom_data, // input tensor
86 | const float* bottom_rois, // input rois
87 | const float spatial_scale,
88 | const int channels,
89 | const int height,
90 | const int width,
91 | const int pooled_height,
92 | const int pooled_width,
93 | const int sampling_ratio,
94 | float* top_data) // output
95 | {
96 | // CUDA_1D_KERNEL_LOOP(index, nthreads) {
97 | for (int index = blockIdx.x * blockDim.x + threadIdx.x;
98 | index < outputElements;
99 | index += blockDim.x * gridDim.x)
100 | {
101 | // (n, c, ph, pw) is an element in the pooled output
102 | int pw = index % pooled_width;
103 | int ph = (index / pooled_width) % pooled_height;
104 | int c = (index / pooled_width / pooled_height) % channels;
105 | int n = index / pooled_width / pooled_height / channels;
106 |
107 | const float* offset_bottom_rois = bottom_rois + n * 5;
108 | int roi_batch_ind = offset_bottom_rois[0];
109 |
110 |     // Do not use rounding; this implementation detail is critical
111 | float roi_start_w = offset_bottom_rois[1] * spatial_scale;
112 | float roi_start_h = offset_bottom_rois[2] * spatial_scale;
113 | float roi_end_w = offset_bottom_rois[3] * spatial_scale;
114 | float roi_end_h = offset_bottom_rois[4] * spatial_scale;
115 | // T roi_start_w = round(offset_bottom_rois[1] * spatial_scale);
116 | // T roi_start_h = round(offset_bottom_rois[2] * spatial_scale);
117 | // T roi_end_w = round(offset_bottom_rois[3] * spatial_scale);
118 | // T roi_end_h = round(offset_bottom_rois[4] * spatial_scale);
119 |
120 | // Force malformed ROIs to be 1x1
121 | float roi_width = myfmax(roi_end_w - roi_start_w, (float)1.);
122 | float roi_height = myfmax(roi_end_h - roi_start_h, (float)1.);
123 |     float bin_size_h = static_cast<float>(roi_height) / static_cast<float>(pooled_height);
124 |     float bin_size_w = static_cast<float>(roi_width) / static_cast<float>(pooled_width);
125 |
126 | const float* offset_bottom_data =
127 | bottom_data + (roi_batch_ind * channels + c) * height * width;
128 |
129 | // We use roi_bin_grid to sample the grid and mimic integral
130 | int roi_bin_grid_h = (sampling_ratio > 0)
131 | ? sampling_ratio
132 | : ceilf(roi_height / pooled_height); // e.g., = 2
133 | int roi_bin_grid_w =
134 | (sampling_ratio > 0) ? sampling_ratio : ceilf(roi_width / pooled_width);
135 |
136 | // We do average (integral) pooling inside a bin
137 | const float count = roi_bin_grid_h * roi_bin_grid_w; // e.g. = 4
138 |
139 | float output_val = 0.;
140 | for (int iy = 0; iy < roi_bin_grid_h; iy++) // e.g., iy = 0, 1
141 | {
142 | const float y = roi_start_h + ph * bin_size_h +
143 |           static_cast<float>(iy + .5f) * bin_size_h /
144 |           static_cast<float>(roi_bin_grid_h); // e.g., 0.5, 1.5
145 | for (int ix = 0; ix < roi_bin_grid_w; ix++) {
146 | const float x = roi_start_w + pw * bin_size_w +
147 |             static_cast<float>(ix + .5f) * bin_size_w /
148 |             static_cast<float>(roi_bin_grid_w);
149 |
150 | float val = bilinear_interpolate(
151 | offset_bottom_data, height, width, y, x, index);
152 | output_val += val;
153 | }
154 | }
155 | output_val /= count;
156 |
157 | top_data[index] = output_val;
158 | }
159 | }
160 |
161 | int launch_roi_align_forward_cuda(
162 | const int outputElements,
163 | const float* bottom_data, // input tensor
164 | const float* bottom_rois, // input rois
165 | const float spatial_scale,
166 | const int channels,
167 | const int height,
168 | const int width,
169 | const int pooled_height,
170 | const int pooled_width,
171 | const int sampling_ratio,
172 | float* top_data,
173 | cudaStream_t stream)
174 | {
175 |
176 | int64_t blocks = myfmin(GET_BLOCKS(outputElements),CUDA_MAX_BLOCKS);
177 |
178 |   roi_align_forward_kernel<<<blocks, CUDA_NUM_THREADS, 0, stream>>>(
179 | outputElements,
180 | bottom_data, // input tensor
181 | bottom_rois, // input rois
182 | spatial_scale,
183 | channels,
184 | height,
185 | width,
186 | pooled_height,
187 | pooled_width,
188 | sampling_ratio,
189 | top_data);
190 |
191 | // check for errors
192 | cudaError_t err = cudaGetLastError();
193 | if (err != cudaSuccess) {
194 |     printf("error in roi_align_forward_kernel: %s\n", cudaGetErrorString(err));
195 | //THError("aborting");
196 | return 0;
197 | }
198 | return 1;
199 |
200 | }
201 |
202 |
203 | #ifdef __cplusplus
204 | }
205 | #endif
--------------------------------------------------------------------------------
/lib/cppcuda_cffi/src/cuda/roi_align_forward_cuda_kernel.h:
--------------------------------------------------------------------------------
1 | // Adapted from https://github.com/caffe2/caffe2/blob/master/caffe2/operators/roi_align_op.cu
2 | // (Ignacio Rocco)
3 | #ifdef __cplusplus
4 | extern "C" {
5 | #endif
6 |
7 | int launch_roi_align_forward_cuda(
8 | const int outputElements,
9 | const float* bottom_data, // input tensor
10 | const float* bottom_rois, // input rois
11 | const float spatial_scale,
12 | const int channels,
13 | const int height,
14 | const int width,
15 | const int pooled_height,
16 | const int pooled_width,
17 | const int sampling_ratio,
18 | float* top_data,
19 | cudaStream_t stream);
20 |
21 |
22 | #ifdef __cplusplus
23 | }
24 | #endif
--------------------------------------------------------------------------------
/lib/cppcuda_cffi/src/roi_align_backward_cuda.c:
--------------------------------------------------------------------------------
1 | // Adapted from https://github.com/caffe2/caffe2/blob/master/caffe2/operators/roi_align_gradient_op.cu
2 | // (Ignacio Rocco)
3 | #include <TH/TH.h>
4 | #include <THC/THC.h>
5 | #include <math.h>
6 | #include "cuda/roi_align_backward_cuda_kernel.h"
7 |
8 | #define real float
9 |
10 | // this symbol will be resolved automatically from PyTorch libs
11 | extern THCState *state;
12 |
13 | int roi_align_backward_cuda(
14 | THCudaTensor *bottom_rois,
15 | THCudaTensor *grad_output, // gradient of the output of the layer
16 | THCudaTensor *output,
17 | int64_t pooled_height,
18 | int64_t pooled_width,
19 | double spatial_scale,
20 | int64_t sampling_ratio)
21 | {
22 |
23 | // ROIs is the set of region proposals to process. It is a 2D Tensor where the first
24 | // dim is the # of proposals, and the second dim is the proposal itself in the form
25 | // [batch_index startW startH endW endH]
26 | int num_rois = THCudaTensor_size(state, bottom_rois, 0);
27 | int roi_cols = THCudaTensor_size(state, bottom_rois, 1);
28 | int channels = THCudaTensor_size(state, output, 1);
29 | int height = THCudaTensor_size(state, output, 2);
30 | int width = THCudaTensor_size(state, output, 3);
31 |
32 |
33 | int64_t total_threads = num_rois*channels*pooled_height*pooled_width;
34 |
35 | cudaStream_t stream = THCState_getCurrentStream(state);
36 |
37 | launch_roi_align_backward_cuda(
38 | total_threads,
39 | THCudaTensor_data(state, grad_output),
40 | num_rois,
41 | spatial_scale,
42 | channels,
43 | height,
44 | width,
45 | pooled_height,
46 | pooled_width,
47 | sampling_ratio,
48 | THCudaTensor_data(state, output),
49 | THCudaTensor_data(state, bottom_rois),
50 | roi_cols,
51 | stream);
52 |
53 | return 1;
54 | }
55 |
--------------------------------------------------------------------------------
/lib/cppcuda_cffi/src/roi_align_backward_cuda.h:
--------------------------------------------------------------------------------
1 | int roi_align_backward_cuda(
2 | THCudaTensor *bottom_rois,
3 | THCudaTensor *grad_output, // gradient of the output of the layer
4 | THCudaTensor *output,
5 | int64_t pooled_height,
6 | int64_t pooled_width,
7 | double spatial_scale,
8 | int64_t sampling_ratio);
--------------------------------------------------------------------------------
/lib/cppcuda_cffi/src/roi_align_forward_cpu.c:
--------------------------------------------------------------------------------
1 | // Adapted from https://github.com/caffe2/caffe2/blob/master/caffe2/operators/roi_align_op.cc
2 | // (Ignacio Rocco)
3 |
4 | #include <TH/TH.h>
5 | #include <math.h>
6 | #include <stdio.h>
7 | #include "cpp/roi_align_cpu_loop.h"
8 |
9 | #define real float
10 |
11 | int roi_align_forward_cpu(
12 | THFloatTensor *input,
13 | THFloatTensor *bottom_rois,
14 | THFloatTensor *output,
15 | int64_t pooled_height,
16 | int64_t pooled_width,
17 | double spatial_scale,
18 | int64_t sampling_ratio)
19 | {
20 |
21 | int proposals = THFloatTensor_size(bottom_rois, 0);
22 | int roi_cols = THFloatTensor_size(bottom_rois, 1);
23 | int channels = THFloatTensor_size(input, 1);
24 | int height = THFloatTensor_size(input, 2);
25 | int width = THFloatTensor_size(input, 3);
26 |
27 |
28 | int64_t total_threads = proposals*channels*pooled_height*pooled_width;
29 |
30 | roi_align_forward_loop(
31 | total_threads,
32 | THFloatTensor_data(input),
33 | THFloatTensor_data(bottom_rois),
34 | (float)(spatial_scale),
35 | channels,
36 | height,
37 | width,
38 | pooled_height,
39 | pooled_width,
40 | sampling_ratio,
41 | roi_cols,
42 | THFloatTensor_data(output));
43 |
44 | return 1;
45 | }
46 |
47 |
48 |
49 |
50 |
--------------------------------------------------------------------------------
/lib/cppcuda_cffi/src/roi_align_forward_cpu.h:
--------------------------------------------------------------------------------
1 |
2 | int roi_align_forward_cpu(
3 | THFloatTensor *input,
4 | THFloatTensor *bottom_rois,
5 | THFloatTensor *output,
6 | int64_t pooled_height,
7 | int64_t pooled_width,
8 | double spatial_scale,
9 | int64_t sampling_ratio);
--------------------------------------------------------------------------------
/lib/cppcuda_cffi/src/roi_align_forward_cuda.c:
--------------------------------------------------------------------------------
1 | // Adapted from https://github.com/caffe2/caffe2/blob/master/caffe2/operators/roi_align_op.cu
2 | // (Ignacio Rocco)
3 | #include <TH/TH.h>
4 | #include <THC/THC.h>
5 | #include <math.h>
6 | #include "cuda/roi_align_forward_cuda_kernel.h"
7 |
8 |
9 | #define real float
10 |
11 | // this symbol will be resolved automatically from PyTorch libs
12 | extern THCState *state;
13 |
14 |
15 | int roi_align_forward_cuda(
16 | THCudaTensor *input,
17 | THCudaTensor *bottom_rois,
18 | THCudaTensor *output,
19 | int64_t pooled_height,
20 | int64_t pooled_width,
21 | double spatial_scale,
22 | int64_t sampling_ratio)
23 | {
24 |
25 | int proposals = THCudaTensor_size(state, bottom_rois, 0);
26 | int channels = THCudaTensor_size(state, input, 1);
27 | int height = THCudaTensor_size(state, input, 2);
28 | int width = THCudaTensor_size(state, input, 3);
29 |
30 |
31 | int64_t total_threads = proposals*channels*pooled_height*pooled_width;
32 |
33 | cudaStream_t stream = THCState_getCurrentStream(state);
34 |
35 | launch_roi_align_forward_cuda(
36 | total_threads,
37 | THCudaTensor_data(state, input),
38 | THCudaTensor_data(state, bottom_rois),
39 | (float)(spatial_scale),
40 | channels,
41 | height,
42 | width,
43 | pooled_height,
44 | pooled_width,
45 | sampling_ratio,
46 | THCudaTensor_data(state, output),
47 | stream);
48 |
49 | return 1;
50 | }
51 |
52 |
--------------------------------------------------------------------------------
/lib/cppcuda_cffi/src/roi_align_forward_cuda.h:
--------------------------------------------------------------------------------
1 | int roi_align_forward_cuda(
2 | THCudaTensor *input,
3 | THCudaTensor *bottom_rois,
4 | THCudaTensor *output,
5 | int64_t pooled_height,
6 | int64_t pooled_width,
7 | double spatial_scale,
8 | int64_t sampling_ratio);
--------------------------------------------------------------------------------
/lib/data/coco_dataset.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | import torch
4 | from torch.utils.data import Dataset
5 | import numpy as np
6 | import skimage.io as io
7 |
8 | from data.json_dataset import JsonDataset
9 | from data.roidb import roidb_for_training
10 |
11 | class CocoDataset(Dataset):
12 |
13 | def __init__(self,
14 | ann_file,
15 | img_dir,
16 | sample_transform=None,
17 | proposal_file=None,
18 | num_classes=81,
19 | proposal_limit=1000,
20 | mode='test'):
21 | self.img_dir = img_dir
22 | if mode=='test':
23 | self.coco = JsonDataset(annotation_file=ann_file,image_directory=img_dir) ## needed for evaluation
24 | #self.img_ids = sorted(list(self.coco.COCO.imgs.keys()))
25 | #self.classes = self.coco.classes
26 | self.num_classes=num_classes
27 | self.sample_transform = sample_transform
28 | # load proposals
29 | self.proposals=None
30 | if mode=='test':
31 | self.roidb = self.coco.get_roidb(proposal_file=proposal_file,proposal_limit=proposal_limit)
32 | #self.proposals = [entry['boxes'][entry['gt_classes'] == 0] for entry in roidb] # remove gt boxes
33 | elif mode=='train':
34 | print('creating roidb for training')
35 | self.roidb = roidb_for_training(annotation_files=ann_file,
36 | image_directories=img_dir,
37 | proposal_files=proposal_file)
38 |
39 | def __len__(self):
40 | return len(self.roidb)
41 |
42 | def __getitem__(self, idx):
43 | # get db entry
44 | dbentry = self.roidb[idx]
45 | # load image
46 | image_fn = dbentry['image']
47 | image = io.imread(image_fn)
48 | # convert grayscale to RGB
49 | if len(image.shape) == 2:
50 | image = np.repeat(np.expand_dims(image,2), 3, axis=2)
51 | # flip if needed (in these cases proposal coords are already flipped in roidb)
52 | if dbentry['flipped']:
53 | image = image[:, ::-1, :]
54 |
55 | # # get proposals
56 | # proposal_coords = torch.FloatTensor([-1])
57 | # if self.proposals is not None:
58 | # sample['proposal_coords']=torch.FloatTensor(self.roidb[idx]['boxes'])
59 |
60 | # initially the sample is just composed of the loaded image and the dbentry
61 | sample = {'image': image, 'dbentry': dbentry}
62 |
63 | # the sample transform will do the preprocessing and convert to the inputs required by the network
64 | if self.sample_transform is not None:
65 | sample = self.sample_transform(sample)
66 |
67 | return sample
68 |
--------------------------------------------------------------------------------
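A short usage sketch for the dataset class above. The annotation and image paths are placeholders for a local COCO copy, and the sample is indexed directly rather than through a DataLoader because the raw dbentry contains objects the default collate function cannot batch (the repo ships utils/collate_custom.py, presumably for that purpose):

from data.coco_dataset import CocoDataset   # assumes lib/ is on sys.path

# placeholder paths: point these at a local COCO download
dataset = CocoDataset(
    ann_file='datasets/coco/annotations/instances_val2017.json',
    img_dir='datasets/coco/images/val2017',
    proposal_file=None,      # or a precomputed proposal file for Fast R-CNN evaluation
    mode='test')

sample = dataset[0]
image, dbentry = sample['image'], sample['dbentry']
print(image.shape)               # (H, W, 3) image as loaded by skimage.io
print(dbentry['boxes'].shape)    # boxes attached to this entry (empty if no proposals/gt loaded)
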
/lib/data/roidb.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2017-present, Facebook, Inc.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | ##############################################################################
15 |
16 | """Functions for common roidb manipulations."""
17 |
18 | from __future__ import absolute_import
19 | from __future__ import division
20 | from __future__ import print_function
21 | from __future__ import unicode_literals
22 |
23 | # from past.builtins import basestring # in python 3: pip install future
24 | #import logging
25 | import numpy as np
26 |
27 | from data.json_dataset import JsonDataset
28 | import utils.boxes as box_utils
29 | #import utils.keypoints as keypoint_utils
30 | import utils.segms as segm_utils
31 |
32 |
33 | class logging(): # overwrite logger with dummy class which prints
34 | def info(self,s):
35 | print(s)
36 | def debug(self,s):
37 | # print('debug: '+s)
38 | return
39 |
40 |
41 | #logger = logging.getLogger(__name__)
42 | logger = logging()
43 |
44 | def roidb_for_training(annotation_files,
45 | image_directories,
46 | proposal_files,
47 | train_crowd_filter_thresh=0.7,
48 | use_flipped=True,
49 | train_fg_thresh=0.5,
50 | train_bg_thresh_hi=0.5,
51 | train_bg_thresh_lo=0,
52 | keypoints_on=False,
53 | bbox_thresh=0.5,
54 | cls_agnostic_bbox_reg=False,
55 | bbox_reg_weights=(10.0, 10.0, 5.0, 5.0)):
56 | """Load and concatenate roidbs for one or more datasets, along with optional
57 | object proposals. The roidb entries are then prepared for use in training,
58 | which involves caching certain types of metadata for each roidb entry.
59 | """
60 | def get_roidb(annotation_file, image_directory, proposal_file):
61 | ds = JsonDataset(annotation_file,image_directory)
62 | roidb = ds.get_roidb(
63 | gt=True,
64 | proposal_file=proposal_file,
65 | crowd_filter_thresh=train_crowd_filter_thresh
66 | )
67 | if use_flipped:
68 | logger.info('Appending horizontally-flipped training examples...')
69 | extend_with_flipped_entries(roidb, ds)
70 | logger.info('Loaded dataset: {:s}'.format(ds.name))
71 | return roidb
72 |
73 | if isinstance(annotation_files, str):
74 | annotation_files = (annotation_files, )
75 | if isinstance(image_directories, str):
76 | image_directories = (image_directories, )
77 | if isinstance(proposal_files, str):
78 | proposal_files = (proposal_files, )
79 | if len(proposal_files) == 0:
80 | proposal_files = (None, ) * len(annotation_files)
81 | assert len(annotation_files) == len(image_directories) and len(annotation_files) == len(proposal_files)
82 |
83 | # if isinstance(annotation_files,(list,tuple)) and isinstance(image_directories,(list,tuple)) and isinstance(proposal_files,(list,tuple)):
84 | roidbs = [get_roidb(*args) for args in zip(annotation_files, image_directories, proposal_files)]
85 | roidb = roidbs[0]
86 | if len(annotation_files)>1:
87 | for r in roidbs[1:]:
88 | roidb.extend(r)
89 | # elif isinstance(annotation_files,str) and isinstance(image_directories,str) and isinstance(proposal_files,str):
90 | # roidb = get_roidb(annotation_files,image_directories,proposal_files)
91 |
92 | roidb = filter_for_training(roidb,train_fg_thresh,train_bg_thresh_hi,train_bg_thresh_lo,keypoints_on)
93 |
94 | logger.info('Computing bounding-box regression targets...')
95 | add_bbox_regression_targets(roidb,bbox_thresh,cls_agnostic_bbox_reg,bbox_reg_weights)
96 | logger.info('done')
97 |
98 | _compute_and_log_stats(roidb)
99 |
100 | return roidb
101 |
102 |
103 | def extend_with_flipped_entries(roidb, dataset):
104 | """Flip each entry in the given roidb and return a new roidb that is the
105 | concatenation of the original roidb and the flipped entries.
106 |
107 | "Flipping" an entry means that that image and associated metadata (e.g.,
108 | ground truth boxes and object proposals) are horizontally flipped.
109 | """
110 | flipped_roidb = []
111 | for entry in roidb:
112 | width = entry['width']
113 | boxes = entry['boxes'].copy()
114 | oldx1 = boxes[:, 0].copy()
115 | oldx2 = boxes[:, 2].copy()
116 | boxes[:, 0] = width - oldx2 - 1
117 | boxes[:, 2] = width - oldx1 - 1
118 | assert (boxes[:, 2] >= boxes[:, 0]).all()
119 | flipped_entry = {}
120 | dont_copy = ('boxes', 'segms', 'gt_keypoints', 'flipped')
121 | for k, v in entry.items():
122 | if k not in dont_copy:
123 | flipped_entry[k] = v
124 | flipped_entry['boxes'] = boxes
125 | flipped_entry['segms'] = segm_utils.flip_segms(
126 | entry['segms'], entry['height'], entry['width']
127 | )
128 | # if dataset.keypoints is not None:
129 | # flipped_entry['gt_keypoints'] = keypoint_utils.flip_keypoints(
130 | # dataset.keypoints, dataset.keypoint_flip_map,
131 | # entry['gt_keypoints'], entry['width']
132 | # )
133 | flipped_entry['flipped'] = True
134 | flipped_roidb.append(flipped_entry)
135 | roidb.extend(flipped_roidb)
136 |
137 |
138 | def filter_for_training(roidb,
139 | train_fg_thresh,
140 | train_bg_thresh_hi,
141 | train_bg_thresh_lo,
142 | keypoints_on):
143 | """Remove roidb entries that have no usable RoIs based on config settings.
144 | """
145 | def is_valid(entry):
146 | # Valid images have:
147 | # (1) At least one foreground RoI OR
148 | # (2) At least one background RoI
149 | overlaps = entry['max_overlaps']
150 | # find boxes with sufficient overlap
151 | fg_inds = np.where(overlaps >= train_fg_thresh)[0]
152 | # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
153 | bg_inds = np.where((overlaps < train_bg_thresh_hi) &
154 | (overlaps >= train_bg_thresh_lo))[0]
155 | # image is only valid if such boxes exist
156 | valid = len(fg_inds) > 0 or len(bg_inds) > 0
157 | if keypoints_on:
158 | # If we're training for keypoints, exclude images with no keypoints
159 | valid = valid and entry['has_visible_keypoints']
160 | return valid
161 |
162 | num = len(roidb)
163 | filtered_roidb = [entry for entry in roidb if is_valid(entry)]
164 | num_after = len(filtered_roidb)
165 | logger.info('Filtered {} roidb entries: {} -> {}'.
166 | format(num - num_after, num, num_after))
167 | return filtered_roidb
168 |
169 |
170 | def add_bbox_regression_targets(roidb,bbox_thresh,cls_agnostic_bbox_reg,bbox_reg_weights):
171 | """Add information needed to train bounding-box regressors."""
172 | for entry in roidb:
173 | entry['bbox_targets'] = _compute_targets(entry,bbox_thresh,cls_agnostic_bbox_reg,bbox_reg_weights)
174 |
175 |
176 | def _compute_targets(entry,bbox_thresh,cls_agnostic_bbox_reg,bbox_reg_weights):
177 | """Compute bounding-box regression targets for an image."""
178 | # Indices of ground-truth ROIs
179 | rois = entry['boxes']
180 | overlaps = entry['max_overlaps']
181 | labels = entry['max_classes']
182 | gt_inds = np.where((entry['gt_classes'] > 0) & (entry['is_crowd'] == 0))[0]
183 | # Targets has format (class, tx, ty, tw, th)
184 | targets = np.zeros((rois.shape[0], 5), dtype=np.float32)
185 | if len(gt_inds) == 0:
186 | # Bail if the image has no ground-truth ROIs
187 | return targets
188 |
189 | # Indices of examples for which we try to make predictions
190 | ex_inds = np.where(overlaps >= bbox_thresh)[0]
191 |
192 | # Get IoU overlap between each ex ROI and gt ROI
193 | ex_gt_overlaps = box_utils.bbox_overlaps(
194 | rois[ex_inds, :].astype(dtype=np.float32, copy=False),
195 | rois[gt_inds, :].astype(dtype=np.float32, copy=False))
196 |
197 | # Find which gt ROI each ex ROI has max overlap with:
198 | # this will be the ex ROI's gt target
199 | gt_assignment = ex_gt_overlaps.argmax(axis=1)
200 | gt_rois = rois[gt_inds[gt_assignment], :]
201 | ex_rois = rois[ex_inds, :]
202 | # Use class "1" for all boxes if using class_agnostic_bbox_reg
203 | targets[ex_inds, 0] = (
204 | 1 if cls_agnostic_bbox_reg else labels[ex_inds])
205 | targets[ex_inds, 1:] = box_utils.bbox_transform_inv(ex_rois, gt_rois, bbox_reg_weights)
206 | return targets
207 |
208 |
209 | def _compute_and_log_stats(roidb):
210 | classes = roidb[0]['dataset'].classes
211 | char_len = np.max([len(c) for c in classes])
212 | hist_bins = np.arange(len(classes) + 1)
213 |
214 | # Histogram of ground-truth objects
215 | gt_hist = np.zeros((len(classes)), dtype=np.int)
216 | for entry in roidb:
217 | gt_inds = np.where(
218 | (entry['gt_classes'] > 0) & (entry['is_crowd'] == 0))[0]
219 | gt_classes = entry['gt_classes'][gt_inds]
220 | gt_hist += np.histogram(gt_classes, bins=hist_bins)[0]
221 | logger.debug('Ground-truth class histogram:')
222 | for i, v in enumerate(gt_hist):
223 | logger.debug(
224 | '{:d}{:s}: {:d}'.format(
225 | i, classes[i].rjust(char_len), v))
226 | logger.debug('-' * char_len)
227 | logger.debug(
228 | '{:s}: {:d}'.format(
229 | 'total'.rjust(char_len), np.sum(gt_hist)))
230 |
--------------------------------------------------------------------------------
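The only geometric step in extend_with_flipped_entries is mirroring the [x0, y0, x1, y1] boxes inside the image width: the new left edge is width minus the old right edge minus one, and vice versa. A tiny NumPy check of that transform:

import numpy as np

def flip_boxes(boxes, width):
    """Horizontally mirror [x0, y0, x1, y1] boxes inside an image of the given width."""
    flipped = boxes.copy()
    flipped[:, 0] = width - boxes[:, 2] - 1
    flipped[:, 2] = width - boxes[:, 0] - 1
    return flipped

boxes = np.array([[10., 20., 50., 80.]])
flipped = flip_boxes(boxes, width=100)
assert (flipped[:, 2] >= flipped[:, 0]).all()   # same invariant the roidb code asserts
print(flipped)                                  # [[49. 20. 89. 80.]]
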
/lib/model/collect_and_distribute_fpn_rpn_proposals.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2017-present, Facebook, Inc.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | ##############################################################################
15 |
16 | from __future__ import absolute_import
17 | from __future__ import division
18 | from __future__ import print_function
19 | from __future__ import unicode_literals
20 |
21 | import torch
22 | from torch.autograd import Variable
23 |
24 | import numpy as np
25 | from utils.multilevel_rois import map_rois_to_fpn_levels
26 |
27 | from math import log2
28 | # from core.config import cfg
29 | # from datasets import json_dataset
30 | # import modeling.FPN as fpn
31 | # import roi_data.fast_rcnn
32 | # import utils.blob as blob_utils
33 |
34 |
35 | class CollectAndDistributeFpnRpnProposals(torch.nn.Module):
36 | def __init__(self, spatial_scales, train=False):
37 | super(CollectAndDistributeFpnRpnProposals, self).__init__()
38 | self._train = train
39 | self.rpn_levels = [int(log2(1/s)) for s in spatial_scales]
40 | self.rpn_min_level = self.rpn_levels[0]
41 | self.rpn_max_level = self.rpn_levels[-1]
42 |
43 | def forward(self, roi_list, roi_score_list):
44 | """See modeling.detector.CollectAndDistributeFpnRpnProposals for
45 | inputs/outputs documentation.
46 | """
47 | # inputs is
48 | # [rpn_rois_fpn2, ..., rpn_rois_fpn6,
49 | # rpn_roi_probs_fpn2, ..., rpn_roi_probs_fpn6]
50 | # If training with Faster R-CNN, then inputs will additionally include
51 | # + [roidb, im_info]
52 | rois = collect(roi_list, roi_score_list, self._train)
53 |
54 | # ************** WARNING ***************
55 | # TRAINING CODE BELOW NOT CONVERTED TO PYTORCH
56 | # ************** WARNING ***************
57 |
58 | # if self._train:
59 | # # During training we reuse the data loader code. We populate roidb
60 | # # entries on the fly using the rois generated by RPN.
61 | # # im_info: [[im_height, im_width, im_scale], ...]
62 | # im_info = inputs[-1].data
63 | # im_scales = im_info[:, 2]
64 | # roidb = blob_utils.deserialize(inputs[-2].data)
65 | # # For historical consistency with the original Faster R-CNN
66 | # # implementation we are *not* filtering crowd proposals.
67 | # # This choice should be investigated in the future (it likely does
68 | # # not matter).
69 | # json_dataset.add_proposals(roidb, rois, im_scales, crowd_thresh=0)
70 | # # Compute training labels for the RPN proposals; also handles
71 | # # distributing the proposals over FPN levels
72 | # output_blob_names = roi_data.fast_rcnn.get_fast_rcnn_blob_names()
73 | # blobs = {k: [] for k in output_blob_names}
74 | # roi_data.fast_rcnn.add_fast_rcnn_blobs(blobs, im_scales, roidb)
75 | # for i, k in enumerate(output_blob_names):
76 | # blob_utils.py_op_copy_blob(blobs[k], outputs[i])
77 | # else:
78 | # # For inference we have a special code path that avoids some data
79 | # # loader overhead
80 | # distribute(rois, None, outputs, self._train)
81 | return distribute(rois, self.rpn_min_level, self.rpn_max_level) #, None, outputs, self._train)
82 |
83 |
84 | def collect(roi_inputs, score_inputs, train):
85 | #cfg_key = 'TRAIN' if is_training else 'TEST'
86 | post_nms_topN = 2000 if train else 1000 # cfg[cfg_key].RPN_POST_NMS_TOP_N
87 | # k_max = 6 #cfg.FPN.RPN_MAX_LEVEL
88 | # k_min = 2 #cfg.FPN.RPN_MIN_LEVEL
89 | # num_lvls = k_max - k_min + 1
90 | # roi_inputs = inputs[:num_lvls]
91 | # score_inputs = inputs[num_lvls:]
92 | # if is_training:
93 | # score_inputs = score_inputs[:-2]
94 |
95 |     # rois are in [[batch_idx, x0, y0, x1, y1], ...] format
96 | # Combine predictions across all levels and retain the top scoring
97 | #rois = np.concatenate([blob.data for blob in roi_inputs])
98 | rois = torch.cat(tuple(roi_inputs),0)
99 | #scores = np.concatenate([blob.data for blob in score_inputs]).squeeze()
100 | scores = torch.cat(tuple(score_inputs),0).squeeze()
101 | #inds = np.argsort(-scores)[:post_nms_topN]
102 | vals, inds = torch.sort(-scores)
103 | #rois = rois[inds, :]
104 | rois = rois[inds[:post_nms_topN], :]
105 | return rois
106 |
107 |
108 | def distribute(rois, lvl_min, lvl_max): #, label_blobs, outputs, train):
109 | """To understand the output blob order see return value of
110 | roi_data.fast_rcnn.get_fast_rcnn_blob_names(is_training=False)
111 | """
112 | # lvl_min = 2 #cfg.FPN.ROI_MIN_LEVEL
113 | # lvl_max = 5 #cfg.FPN.ROI_MAX_LEVEL
114 | lvls = map_rois_to_fpn_levels(rois.data.cpu().numpy(), lvl_min, lvl_max)
115 |
116 | # outputs[0].reshape(rois.shape)
117 | # outputs[0].data[...] = rois
118 |
119 | # Create new roi blobs for each FPN level
120 | # (See: modeling.FPN.add_multilevel_roi_blobs which is similar but annoying
121 | # to generalize to support this particular case.)
122 | rois_idx_order = np.empty((0, ))
123 | distr_rois=[]
124 | for output_idx, lvl in enumerate(range(lvl_min, lvl_max + 1)):
125 | idx_lvl = np.where(lvls == lvl)[0]
126 | distr_rois.append(rois[idx_lvl, :])
127 | rois_idx_order = np.concatenate((rois_idx_order, idx_lvl))
128 | rois_idx_restore = np.argsort(rois_idx_order)
129 | return distr_rois, rois_idx_restore
--------------------------------------------------------------------------------
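
For reference, a self-contained sketch of what collect() above does, with hypothetical shapes and plain tensors; distribute() then assigns each kept RoI to an FPN level via map_rois_to_fpn_levels and returns a per-level list plus an index that restores the original ordering:

    import torch

    # five RPN levels, each contributing proposals [batch_idx, x1, y1, x2, y2] with a score
    roi_list = [torch.rand(100, 5) for _ in range(5)]
    roi_score_list = [torch.rand(100, 1) for _ in range(5)]

    rois = torch.cat(tuple(roi_list), 0)
    scores = torch.cat(tuple(roi_score_list), 0).squeeze()
    _, inds = torch.sort(-scores)        # descending score order
    rois = rois[inds[:1000], :]          # keep the top post_nms_topN (1000 at test time)
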
/lib/model/loss.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import pickle
3 | import numpy as np
4 | #import copy
5 | import torchvision.models as models
6 | from model.roi_align import RoIAlign
7 | from model.generate_proposals import GenerateProposals
8 | from utils.utils import isnan,infbreak,printmax
9 |
10 | from torch.autograd import Variable
11 | from torch.nn.functional import cross_entropy
12 |
13 | def smooth_L1(pred,targets,alpha_in,alpha_out,beta=1.0):
14 | x=(pred-targets)*alpha_in
15 | xabs=torch.abs(x)
16 | y1=0.5*x**2/beta
17 | y2=xabs-0.5*beta
18 | case1=torch.le(xabs,beta).float()
19 | case2=1-case1
20 | return torch.sum((y1*case1+y2*case2)*alpha_out)/pred.size(0)
21 |
22 | def accuracy(cls_score,cls_labels):
23 | class_dim = cls_score.dim()-1
24 | argmax=torch.max(torch.nn.functional.softmax(cls_score,dim=class_dim),class_dim)[1]
25 | accuracy = torch.mean(torch.eq(argmax,cls_labels.long()).float())
26 | return accuracy
27 |
28 | # class detector_loss(torch.nn.Module):
29 | # def __init__(self, do_loss_cls=True, do_loss_bbox=True, do_accuracy_cls=True):
30 | # super(detector_loss, self).__init__()
31 | # # Flags
32 | # self.do_loss_cls = do_loss_cls
33 | # self.do_loss_bbox = do_loss_bbox
34 | # self.do_accuracy_cls = do_accuracy_cls
35 | # # Dicts for losses
36 | # # self.losses={}
37 | # # if do_loss_cls:
38 | # # self.losses['loss_cls']=0
39 | # # if do_loss_bbox:
40 | # # self.losses['loss_bbox']=0
41 | # # # Dicts for metrics
42 | # # self.metrics={}
43 | # # if do_accuracy_cls:
44 | # # self.metrics['accuracy_cls']=0
45 |
46 | # def forward(self,
47 | # cls_score,
48 | # cls_labels,
49 | # bbox_pred,
50 | # bbox_targets,
51 | # bbox_inside_weights,
52 | # bbox_outside_weights):
53 |
54 | # # compute losses
55 | # losses=[]
56 | # if self.do_loss_cls:
57 | # loss_cls = cross_entropy(cls_score,cls_labels.long())
58 | # losses.append(loss_cls)
59 | # if self.do_loss_bbox:
60 | # loss_bbox = smooth_L1(bbox_pred,bbox_targets,bbox_inside_weights,bbox_outside_weights)
61 | # losses.append(loss_bbox)
62 |
63 | # # # compute metrics
64 | # # if self.do_accuracy_cls:
65 | # # self.metrics['accuracy_cls'] = accuracy(cls_score,cls_labels.long())
66 |
67 | # # sum total loss
68 | # #loss = torch.sum(torch.cat(tuple([v.unsqueeze(0) for v in losses]),0))
69 |
70 | # # loss.register_hook(printmax)
71 |
72 | # return tuple(losses)
73 |
--------------------------------------------------------------------------------
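
A quick numeric check of smooth_L1 above, as a minimal sketch with beta=1 and unit inside/outside weights (values chosen to exercise both branches of the loss):

    import torch

    pred    = torch.tensor([[0.5, 2.0, -1.0, 0.0]])
    targets = torch.zeros_like(pred)
    w       = torch.ones_like(pred)                 # alpha_in = alpha_out = 1

    x = (pred - targets) * w
    xabs = torch.abs(x)
    y = torch.where(xabs <= 1.0, 0.5 * x ** 2, xabs - 0.5)
    loss = torch.sum(y * w) / pred.size(0)
    # elementwise terms: 0.125, 1.5, 0.5, 0.0  ->  loss = 2.125,
    # matching smooth_L1(pred, targets, w, w) with the default beta
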
/lib/model/roi_align.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from torch.autograd import Function
3 | from torch.nn.modules.module import Module
4 | from torch.autograd import Variable
5 | import os
6 | from torch.autograd.function import once_differentiable
7 |
8 | torch_ver = torch.__version__[:3]
9 |
10 | if torch_ver=="0.4":
11 | from torch.utils.cpp_extension import load
12 | build_path = os.path.realpath(os.path.join(os.path.dirname(os.path.realpath(__file__)),'../cppcuda/build/'))
13 |
14 | print('compiling/loading roi_align')
15 | roialign = load(name='roialign',sources=['lib/cppcuda/roi_align_binding.cpp',
16 | 'lib/cppcuda/roi_align_forward_cuda.cu',
17 | 'lib/cppcuda/roi_align_backward_cuda.cu'],
18 | build_directory=build_path,verbose=True)
19 | else:
20 | import cppcuda_cffi.roialign as roialign
21 |
22 |
23 | class RoIAlignFunction(Function):
24 | # def __init__(ctx, pooled_height, pooled_width, spatial_scale, sampling_ratio):
25 | # ctx.pooled_width = int(pooled_width)
26 | # ctx.pooled_height = int(pooled_height)
27 | # ctx.spatial_scale = float(spatial_scale)
28 | # ctx.sampling_ratio = int(sampling_ratio)
29 | # ctx.features_size = None
30 | # ctx.rois=None
31 |
32 | @staticmethod
33 | def forward(ctx, features, rois, pooled_height, pooled_width, spatial_scale, sampling_ratio):
34 | #ctx.save_for_backward(rois)
35 | ctx.rois=rois
36 | ctx.features_size=features.size()
37 | ctx.pooled_height=pooled_height
38 | ctx.pooled_width=pooled_width
39 | ctx.spatial_scale=spatial_scale
40 | ctx.sampling_ratio=sampling_ratio
41 |
42 | # compute
43 | if features.is_cuda != rois.is_cuda:
44 | raise TypeError('features and rois should be on same device (CPU or GPU)')
45 | elif features.is_cuda and rois.is_cuda :
46 | if torch_ver=="0.4":
47 | output = roialign.roi_align_forward_cuda(features,
48 | rois,
49 | pooled_height,
50 | pooled_width,
51 | spatial_scale,
52 | sampling_ratio)
53 | else:
54 | num_channels = features.size(1)
55 | num_rois = rois.size(0)
56 | output = torch.zeros(num_rois, num_channels, pooled_height, pooled_width).cuda()
57 | roialign.roi_align_forward_cuda(features,
58 | rois,
59 | output,
60 | pooled_height,
61 | pooled_width,
62 | spatial_scale,
63 | sampling_ratio)
64 |
65 | elif features.is_cuda==False and rois.is_cuda==False:
66 | if torch_ver=="0.4":
67 | output = roialign.roi_align_forward_cpu(features,
68 | rois,
69 | pooled_height,
70 | pooled_width,
71 | spatial_scale,
72 | sampling_ratio)
73 | else:
74 | num_channels = features.size(1)
75 | num_rois = rois.size(0)
76 | output = torch.zeros(num_rois, num_channels, pooled_height, pooled_width)
77 | roialign.roi_align_forward_cpu(features,
78 | rois,
79 | output,
80 | pooled_height,
81 | pooled_width,
82 | spatial_scale,
83 | sampling_ratio)
84 |
85 |
86 | if torch_ver=="0.4":
87 | return Variable(output,requires_grad=True)
88 | else:
89 | return output
90 |
91 | @staticmethod
92 | @once_differentiable
93 | def backward(ctx, grad_output):
94 | #rois, = ctx.saved_variables
95 | rois = ctx.rois
96 | features_size=ctx.features_size
97 | pooled_height=ctx.pooled_height
98 | pooled_width=ctx.pooled_width
99 | spatial_scale=ctx.spatial_scale
100 | sampling_ratio=ctx.sampling_ratio
101 |
102 | #rois = ctx.rois
103 | if rois.is_cuda:
104 | if torch_ver=="0.4":
105 | grad_input = roialign.roi_align_backward_cuda(rois,
106 | grad_output,
107 | features_size[0],
108 | features_size[1],
109 | features_size[2],
110 | features_size[3],
111 | pooled_height,
112 | pooled_width,
113 | spatial_scale,
114 | sampling_ratio)
115 | else:
116 | #import pdb; pdb.set_trace()
117 | grad_input = torch.zeros(features_size).cuda(rois.get_device()) # <- the problem!
118 | roialign.roi_align_backward_cuda(rois,
119 | grad_output,
120 | grad_input,
121 | pooled_height,
122 | pooled_width,
123 | spatial_scale,
124 | sampling_ratio)
125 |
126 | else:
127 | if torch_ver=="0.4":
128 | grad_input = roialign.roi_align_backward_cpu(rois,
129 | grad_output,
130 | features_size[0],
131 | features_size[1],
132 | features_size[2],
133 | features_size[3],
134 | pooled_height,
135 | pooled_width,
136 | spatial_scale,
137 | sampling_ratio)
138 | else:
139 | raise("backward pass not implemented on cpu in cffi extension")
140 |
141 | # import pdb; pdb.set_trace()
142 | if torch_ver=="0.4":
143 | return Variable(grad_input), None, None, None, None, None
144 | else:
145 | return grad_input, None, None, None, None, None
146 |
147 |
148 |
149 |
150 | class RoIAlign(Module):
151 | def __init__(self, pooled_height, pooled_width, spatial_scale, sampling_ratio=0):
152 | super(RoIAlign, self).__init__()
153 |
154 | self.pooled_height=int(pooled_height)
155 | self.pooled_width=int(pooled_width)
156 | self.spatial_scale=float(spatial_scale)
157 | self.sampling_ratio=int(sampling_ratio)
158 |
159 | def forward(self, features, rois):
160 | # features is a Variable/FloatTensor of size BxCxHxW
161 |         # rois is an (optional list of) Variable/FloatTensor with rows IDX,Xmin,Ymin,Xmax,Ymax (in scaled input-image coordinates)
162 | rois = preprocess_rois(rois)
163 | output = RoIAlignFunction.apply(features,
164 | rois,
165 | self.pooled_height,
166 | self.pooled_width,
167 | self.spatial_scale,
168 | self.sampling_ratio)
169 | return output
170 |
171 |
172 | def preprocess_rois(rois):
173 | # do some verifications on what has been passed as rois
174 | if isinstance(rois,list): # if list, convert to single tensor (used for multiscale)
175 | rois = torch.cat(tuple(rois),0)
176 | if isinstance(rois,Variable):
177 | if rois.dim()==3:
178 | if rois.size(0)==1:
179 | rois = rois.squeeze(0)
180 | else:
181 | raise("rois has wrong size")
182 | if rois.size(1)==4:
183 | # add zeros
184 | zeros = Variable(torch.zeros((rois.size(0),1)))
185 | if rois.is_cuda:
186 | zeros = zeros.cuda()
187 | rois = torch.cat((zeros,rois),1).contiguous()
188 | return rois
--------------------------------------------------------------------------------
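
A minimal usage sketch of the RoIAlign module above. Shapes and coordinates are hypothetical, and it assumes a CUDA device plus a successful build of the cppcuda (or cffi) extension:

    import torch
    from model.roi_align import RoIAlign

    features = torch.randn(1, 256, 64, 64).cuda()             # B x C x H x W feature map
    rois = torch.tensor([[0., 32., 32., 256., 192.]]).cuda()  # rows of [batch_idx, x1, y1, x2, y2]
    pool = RoIAlign(pooled_height=7, pooled_width=7, spatial_scale=1/16., sampling_ratio=2)
    out = pool(features, rois)                                # -> 1 x 256 x 7 x 7
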
/lib/utils/blob.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2017-present, Facebook, Inc.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | ##############################################################################
15 | #
16 | # Based on:
17 | # --------------------------------------------------------
18 | # Fast R-CNN
19 | # Copyright (c) 2015 Microsoft
20 | # Licensed under The MIT License [see LICENSE for details]
21 | # Written by Ross Girshick
22 | # --------------------------------------------------------
23 |
24 | import cv2
25 | import numpy as np
26 |
27 | def im_list_to_blob(ims,fpn_on=False,fpn_coarsest_stride=32):
28 | """Convert a list of images into a network input. Assumes images were
29 | prepared using prep_im_for_blob or equivalent: i.e.
30 | - BGR channel order
31 | - pixel means subtracted
32 | - resized to the desired input size
33 | - float32 numpy ndarray format
34 |     Output is a 4D NCHW tensor of the images concatenated along axis 0, with
35 |     shape (num_images, 3, max_height, max_width).
36 | """
37 | max_shape = np.array([im.shape for im in ims]).max(axis=0)
38 |     # Pad the images so their spatial dimensions are divisible by the stride
39 | if fpn_on:
40 | stride = float(fpn_coarsest_stride)
41 | max_shape[0] = int(np.ceil(max_shape[0] / stride) * stride)
42 | max_shape[1] = int(np.ceil(max_shape[1] / stride) * stride)
43 |
44 | num_images = len(ims)
45 | blob = np.zeros((num_images, max_shape[0], max_shape[1], 3),
46 | dtype=np.float32)
47 | for i in range(num_images):
48 | im = ims[i]
49 | blob[i, 0:im.shape[0], 0:im.shape[1], :] = im
50 | # Move channels (axis 3) to axis 1
51 | # Axis order will become: (batch elem, channel, height, width)
52 | channel_swap = (0, 3, 1, 2)
53 | blob = blob.transpose(channel_swap)
54 | return blob
55 |
56 |
57 | def prep_im_for_blob(im, pixel_means=[122.7717, 115.9465, 102.9801], target_sizes=[800], max_size=1333):
58 | """Prepare an image for use as a network input blob. Specially:
59 | - Subtract per-channel pixel mean
60 | - Convert to float32
61 |     - Rescale to each of the specified target sizes (capped at max_size)
62 | Returns a list of transformed images, one for each target size. Also returns
63 | the scale factors that were used to compute each returned image.
64 | """
65 | im = im.astype(np.float32, copy=False)
66 | im -= pixel_means
67 | im_shape = im.shape
68 | im_size_min = np.min(im_shape[0:2])
69 | im_size_max = np.max(im_shape[0:2])
70 |
71 | ims = []
72 | im_scales = []
73 | for target_size in target_sizes:
74 | im_scale = float(target_size) / float(im_size_min)
75 | # Prevent the biggest axis from being more than max_size
76 | if np.round(im_scale * im_size_max) > max_size:
77 | im_scale = float(max_size) / float(im_size_max)
78 | # BUGGY im is replaced by scaled im
79 | # im = cv2.resize(im, None, None, fx=im_scale, fy=im_scale,
80 | # interpolation=cv2.INTER_LINEAR)
81 | # ims.append(im)
82 | im_prime = cv2.resize(im, None, None, fx=im_scale, fy=im_scale,
83 | interpolation=cv2.INTER_LINEAR)
84 | ims.append(im_prime)
85 | im_scales.append(im_scale)
86 |
87 | return ims, im_scales
88 |
89 | def get_rois_blob(im_rois, im_scale):
90 | """Converts RoIs into network inputs.
91 | Arguments:
92 | im_rois (ndarray): R x 4 matrix of RoIs in original image coordinates
93 | im_scale_factors (list): scale factors as returned by _get_image_blob
94 | Returns:
95 | blob (ndarray): R x 5 matrix of RoIs in the image pyramid with columns
96 | [level, x1, y1, x2, y2]
97 | """
98 | rois, levels = project_im_rois(im_rois, im_scale)
99 | rois_blob = np.hstack((levels, rois))
100 | return rois_blob.astype(np.float32, copy=False)
101 |
102 |
103 | def project_im_rois(im_rois, scales):
104 | """Project image RoIs into the image pyramid built by _get_image_blob.
105 | Arguments:
106 | im_rois (ndarray): R x 4 matrix of RoIs in original image coordinates
107 | scales (list): scale factors as returned by _get_image_blob
108 | Returns:
109 | rois (ndarray): R x 4 matrix of projected RoI coordinates
110 | levels (ndarray): image pyramid levels used by each projected RoI
111 | """
112 |     rois = im_rois.astype(np.float64, copy=False) * scales
113 |     levels = np.zeros((im_rois.shape[0], 1), dtype=np.int64)
114 | return rois, levels
--------------------------------------------------------------------------------
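
A short usage sketch of the two functions above (hypothetical image size; the exact padded shape depends on the scale and stride):

    import numpy as np
    from utils.blob import prep_im_for_blob, im_list_to_blob

    im = np.random.randint(0, 255, (480, 640, 3)).astype(np.float32)   # H x W x 3, BGR
    ims, scales = prep_im_for_blob(im, target_sizes=[800], max_size=1333)
    print(scales[0])        # 800/480, unless capped so that the long side stays <= 1333
    blob = im_list_to_blob(ims, fpn_on=True, fpn_coarsest_stride=32)
    print(blob.shape)       # (1, 3, H', W') with H', W' padded up to multiples of 32
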
/lib/utils/collate_custom.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import collections
3 | #from torch.utils.data.dataloader import default_collate
4 | import itertools
5 |
6 | def collate_custom(batch,key=None):
7 | """ Custom collate function for the Dataset class
8 | * It doesn't convert numpy arrays to stacked-tensors, but rather combines them in a list
9 | * This is useful for processing annotations of different sizes
10 | """
11 |
12 | # this case will occur in first pass, and will convert a
13 | # list of dictionaries (returned by the threads by sampling dataset[idx])
14 | # to a unified dictionary of collated values
15 |     if isinstance(batch[0], collections.abc.Mapping):
16 | return {key: collate_custom([d[key] for d in batch],key) for key in batch[0]}
17 | # these cases will occur in recursion
18 |     #elif torch.is_tensor(batch[0]): # for tensors, use standard collating function
19 | #return default_collate(batch)
20 | elif isinstance(batch,list) and isinstance(batch[0],list): # flatten lists of lists
21 | flattened_list = list(itertools.chain(*batch))
22 | return flattened_list
23 | elif isinstance(batch,list) and len(batch)==1: # lists of length 1, remove list wrap
24 | return batch[0]
25 | else: # for other types (i.e. lists of len!=1), return as is
26 | return batch
27 |
28 |
--------------------------------------------------------------------------------
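
A small illustration of collate_custom's behavior on a hypothetical batch of one sample dict (batch_size=1, as used throughout this repo):

    import numpy as np
    from utils.collate_custom import collate_custom

    batch = [{'image': np.zeros((3, 100, 100), dtype=np.float32),
              'boxes': [[0, 0, 10, 10]]}]
    out = collate_custom(batch)
    # out['image'] is the ndarray itself (length-1 lists are unwrapped),
    # out['boxes'] is the flattened list of boxes: [[0, 0, 10, 10]]
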
/lib/utils/collections.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2017-present, Facebook, Inc.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | ##############################################################################
15 |
16 | """A simple attribute dictionary used for representing configuration options."""
17 |
18 | from __future__ import absolute_import
19 | from __future__ import division
20 | from __future__ import print_function
21 | from __future__ import unicode_literals
22 |
23 |
24 | class AttrDict(dict):
25 |
26 | def __getattr__(self, name):
27 | if name in self.__dict__:
28 | return self.__dict__[name]
29 | elif name in self:
30 | return self[name]
31 | else:
32 | raise AttributeError(name)
33 |
34 | def __setattr__(self, name, value):
35 | if name in self.__dict__:
36 | self.__dict__[name] = value
37 | else:
38 | self[name] = value
39 |
--------------------------------------------------------------------------------
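
A quick illustration of AttrDict above; attribute access and key access are interchangeable (hypothetical config keys):

    from utils.collections import AttrDict

    cfg = AttrDict()
    cfg.TRAIN = AttrDict()
    cfg.TRAIN.BATCH_SIZE = 512
    assert cfg['TRAIN']['BATCH_SIZE'] == cfg.TRAIN.BATCH_SIZE == 512
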
/lib/utils/colormap.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2017-present, Facebook, Inc.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | ##############################################################################
15 |
16 | """An awesome colormap for really neat visualizations."""
17 |
18 | from __future__ import absolute_import
19 | from __future__ import division
20 | from __future__ import print_function
21 | from __future__ import unicode_literals
22 |
23 | import numpy as np
24 |
25 |
26 | def colormap(rgb=False):
27 | color_list = np.array(
28 | [
29 | 0.000, 0.447, 0.741,
30 | 0.850, 0.325, 0.098,
31 | 0.929, 0.694, 0.125,
32 | 0.494, 0.184, 0.556,
33 | 0.466, 0.674, 0.188,
34 | 0.301, 0.745, 0.933,
35 | 0.635, 0.078, 0.184,
36 | 0.300, 0.300, 0.300,
37 | 0.600, 0.600, 0.600,
38 | 1.000, 0.000, 0.000,
39 | 1.000, 0.500, 0.000,
40 | 0.749, 0.749, 0.000,
41 | 0.000, 1.000, 0.000,
42 | 0.000, 0.000, 1.000,
43 | 0.667, 0.000, 1.000,
44 | 0.333, 0.333, 0.000,
45 | 0.333, 0.667, 0.000,
46 | 0.333, 1.000, 0.000,
47 | 0.667, 0.333, 0.000,
48 | 0.667, 0.667, 0.000,
49 | 0.667, 1.000, 0.000,
50 | 1.000, 0.333, 0.000,
51 | 1.000, 0.667, 0.000,
52 | 1.000, 1.000, 0.000,
53 | 0.000, 0.333, 0.500,
54 | 0.000, 0.667, 0.500,
55 | 0.000, 1.000, 0.500,
56 | 0.333, 0.000, 0.500,
57 | 0.333, 0.333, 0.500,
58 | 0.333, 0.667, 0.500,
59 | 0.333, 1.000, 0.500,
60 | 0.667, 0.000, 0.500,
61 | 0.667, 0.333, 0.500,
62 | 0.667, 0.667, 0.500,
63 | 0.667, 1.000, 0.500,
64 | 1.000, 0.000, 0.500,
65 | 1.000, 0.333, 0.500,
66 | 1.000, 0.667, 0.500,
67 | 1.000, 1.000, 0.500,
68 | 0.000, 0.333, 1.000,
69 | 0.000, 0.667, 1.000,
70 | 0.000, 1.000, 1.000,
71 | 0.333, 0.000, 1.000,
72 | 0.333, 0.333, 1.000,
73 | 0.333, 0.667, 1.000,
74 | 0.333, 1.000, 1.000,
75 | 0.667, 0.000, 1.000,
76 | 0.667, 0.333, 1.000,
77 | 0.667, 0.667, 1.000,
78 | 0.667, 1.000, 1.000,
79 | 1.000, 0.000, 1.000,
80 | 1.000, 0.333, 1.000,
81 | 1.000, 0.667, 1.000,
82 | 0.167, 0.000, 0.000,
83 | 0.333, 0.000, 0.000,
84 | 0.500, 0.000, 0.000,
85 | 0.667, 0.000, 0.000,
86 | 0.833, 0.000, 0.000,
87 | 1.000, 0.000, 0.000,
88 | 0.000, 0.167, 0.000,
89 | 0.000, 0.333, 0.000,
90 | 0.000, 0.500, 0.000,
91 | 0.000, 0.667, 0.000,
92 | 0.000, 0.833, 0.000,
93 | 0.000, 1.000, 0.000,
94 | 0.000, 0.000, 0.167,
95 | 0.000, 0.000, 0.333,
96 | 0.000, 0.000, 0.500,
97 | 0.000, 0.000, 0.667,
98 | 0.000, 0.000, 0.833,
99 | 0.000, 0.000, 1.000,
100 | 0.000, 0.000, 0.000,
101 | 0.143, 0.143, 0.143,
102 | 0.286, 0.286, 0.286,
103 | 0.429, 0.429, 0.429,
104 | 0.571, 0.571, 0.571,
105 | 0.714, 0.714, 0.714,
106 | 0.857, 0.857, 0.857,
107 | 1.000, 1.000, 1.000
108 | ]
109 | ).astype(np.float32)
110 | color_list = color_list.reshape((-1, 3)) * 255
111 | if not rgb:
112 | color_list = color_list[:, ::-1]
113 | return color_list
114 |
--------------------------------------------------------------------------------
/lib/utils/data_parallel.py:
--------------------------------------------------------------------------------
1 | import operator
2 | import torch
3 | import warnings
4 | from torch.nn import Module
5 | from torch.nn.parallel.scatter_gather import scatter_kwargs, gather
6 | from torch.nn.parallel.replicate import replicate
7 | from torch.nn.parallel.parallel_apply import parallel_apply
8 |
9 | class DataParallel(torch.nn.DataParallel):
10 | def __init__(self, *args, **kwargs):
11 |         super(DataParallel, self).__init__(*args, **kwargs)
12 |
13 | def scatter(self, inputs, kwargs, device_ids): # scatter a list of len N into N gpus
14 | return scatter_lists(inputs, kwargs, device_ids)
15 |
16 | def scatter_lists(inputs, kwargs,device_ids):
17 | n_inputs = len(inputs)
18 | n_devices = len(device_ids)
19 | for i in range(n_inputs):
20 | assert(len(inputs[i])==n_devices)
21 | inputs=tuple([tuple([inputs[i][j].cuda(device_ids[j]) for i in range(n_inputs)]) for j in range(n_devices)])
22 | return inputs,kwargs
23 |
24 |
25 | def data_parallel(module, inputs, device_ids=None, output_device=None, dim=0, module_kwargs=None, dont_scatter=False, dont_gather=False):
26 | r"""Evaluates module(input) in parallel across the GPUs given in device_ids.
27 |
28 | This is the functional version of the DataParallel module.
29 |
30 | Args:
31 | module: the module to evaluate in parallel
32 | inputs: inputs to the module
33 | device_ids: GPU ids on which to replicate module
34 |         output_device: GPU location of the output. Use -1 to indicate the CPU.
35 | (default: device_ids[0])
36 | Returns:
37 | a Variable containing the result of module(input) located on
38 | output_device
39 | """
40 | if not isinstance(inputs, tuple):
41 | inputs = (inputs,)
42 | #print('getting device_ids')
43 | if device_ids is None:
44 | device_ids = list(range(torch.cuda.device_count()))
45 | #print(device_ids)
46 | if output_device is None:
47 | output_device = device_ids[0]
48 |
49 | if dont_scatter==False:
50 | do_scatter_lists=isinstance(inputs[0],list)
51 | if do_scatter_lists:
52 | inputs, module_kwargs = scatter_lists(inputs, module_kwargs, device_ids)
53 | else:
54 | inputs, module_kwargs = scatter_kwargs(inputs, module_kwargs, device_ids, dim)
55 |
56 | if len(device_ids) == 1:
57 | return module(*inputs[0], **module_kwargs[0])
58 | #print('getting used device_ids')
59 | used_device_ids = device_ids[:len(inputs)]
60 | #print(used_device_ids)
61 | #print('making model replicas')
62 | replicas = replicate(module, used_device_ids)
63 | #print('applying model')
64 | outputs = parallel_apply(replicas, inputs, module_kwargs, used_device_ids)
65 | if dont_gather:
66 | return tuple([[out[i] for out in outputs] for i in range(len(outputs[0]))])
67 | #print('gathering result')
68 | return gather(outputs, output_device, dim)
--------------------------------------------------------------------------------
/lib/utils/dummy_datasets.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2017-present, Facebook, Inc.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | ##############################################################################
15 | """Provide stub objects that can act as stand-in "dummy" datasets for simple use
16 | cases, like getting all classes in a dataset. This exists so that demos can be
17 | run without requiring users to download/install datasets first.
18 | """
19 |
20 | from __future__ import absolute_import
21 | from __future__ import division
22 | from __future__ import print_function
23 | from __future__ import unicode_literals
24 |
25 | from utils.collections import AttrDict
26 |
27 |
28 | def get_coco_dataset():
29 | """A dummy COCO dataset that includes only the 'classes' field."""
30 | ds = AttrDict()
31 | classes = [
32 | '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
33 | 'bus', 'train', 'truck', 'boat', 'traffic light', 'fire hydrant',
34 | 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse',
35 | 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack',
36 | 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis',
37 | 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove',
38 | 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'wine glass',
39 | 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich',
40 | 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake',
41 | 'chair', 'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv',
42 | 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave',
43 | 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase',
44 | 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
45 | ]
46 | ds.classes = {i: name for i, name in enumerate(classes)}
47 | return ds
48 |
--------------------------------------------------------------------------------
/lib/utils/fast_rcnn_sample_rois.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2017-present, Facebook, Inc.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | ##############################################################################
15 | #
16 | # Based on:
17 | # --------------------------------------------------------
18 | # Fast R-CNN
19 | # Copyright (c) 2015 Microsoft
20 | # Licensed under The MIT License [see LICENSE for details]
21 | # Written by Ross Girshick
22 | # --------------------------------------------------------
23 |
24 | import numpy as np
25 | import numpy.random as npr
26 |
27 |
28 |
29 | def ones(shape, int32=False):
30 | """Return a blob of all ones of the given shape with the correct float or
31 | int data type.
32 | """
33 | return np.ones(shape, dtype=np.int32 if int32 else np.float32)
34 |
35 | def zeros(shape, int32=False):
36 | """Return a blob of all zeros of the given shape with the correct float or
37 | int data type.
38 | """
39 | return np.zeros(shape, dtype=np.int32 if int32 else np.float32)
40 |
41 | def fast_rcnn_sample_rois(roidb,
42 | im_scale,
43 | batch_idx,
44 | train_batch_size_per_image=512, # rois per im
45 | train_fg_roi_fraction=0.25,
46 | train_fg_thresh=0.5,
47 | train_bg_thresh_hi=0.5,
48 | train_bg_thresh_lo=0,
49 | mask_on=False,
50 | keypoints_on=False
51 | ):
52 |     """Generate a random sample of RoIs comprising foreground and background
53 |     examples.
54 |     """
55 |     #print('debug: setting random seed 1234 in fast_rcnn.py: _sample_rois()')
56 |     # npr.seed(1234) # DEBUG
57 | rois_per_image = int(train_batch_size_per_image)
58 | fg_rois_per_image = int(np.round(train_fg_roi_fraction * rois_per_image))
59 | max_overlaps = roidb['max_overlaps']
60 |
61 | # Select foreground RoIs as those with >= FG_THRESH overlap
62 | fg_inds = np.where(max_overlaps >= train_fg_thresh)[0]
63 | # Guard against the case when an image has fewer than fg_rois_per_image
64 | # foreground RoIs
65 | fg_rois_per_this_image = np.minimum(fg_rois_per_image, fg_inds.size)
66 | # Sample foreground regions without replacement
67 | if fg_inds.size > 0:
68 | fg_inds = npr.choice(
69 | fg_inds, size=fg_rois_per_this_image, replace=False
70 | )
71 |
72 | # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
73 | bg_inds = np.where(
74 | (max_overlaps < train_bg_thresh_hi) &
75 | (max_overlaps >= train_bg_thresh_lo)
76 | )[0]
77 | # Compute number of background RoIs to take from this image (guarding
78 | # against there being fewer than desired)
79 | bg_rois_per_this_image = rois_per_image - fg_rois_per_this_image
80 | bg_rois_per_this_image = np.minimum(bg_rois_per_this_image, bg_inds.size)
81 |     # Sample background regions without replacement
82 | if bg_inds.size > 0:
83 | bg_inds = npr.choice(
84 | bg_inds, size=bg_rois_per_this_image, replace=False
85 | )
86 |
87 | # The indices that we're selecting (both fg and bg)
88 | keep_inds = np.append(fg_inds, bg_inds)
89 | # Label is the class each RoI has max overlap with
90 | sampled_labels = roidb['max_classes'][keep_inds]
91 | sampled_labels[fg_rois_per_this_image:] = 0 # Label bg RoIs with class 0
92 | sampled_boxes = roidb['boxes'][keep_inds]
93 |
94 | if 'bbox_targets' not in roidb:
95 | gt_inds = np.where(roidb['gt_classes'] > 0)[0]
96 | gt_boxes = roidb['boxes'][gt_inds, :]
97 | gt_assignments = gt_inds[roidb['box_to_gt_ind_map'][keep_inds]]
98 | bbox_targets = _compute_targets(
99 | sampled_boxes, gt_boxes[gt_assignments, :], sampled_labels
100 | )
101 | bbox_targets, bbox_inside_weights = _expand_bbox_targets(bbox_targets)
102 | else:
103 | bbox_targets, bbox_inside_weights = _expand_bbox_targets(
104 | roidb['bbox_targets'][keep_inds, :]
105 | )
106 |
107 | bbox_outside_weights = np.array(
108 | bbox_inside_weights > 0, dtype=bbox_inside_weights.dtype
109 | )
110 |
111 | # Scale rois and format as (batch_idx, x1, y1, x2, y2)
112 | sampled_rois = sampled_boxes * im_scale
113 | repeated_batch_idx = batch_idx * ones((sampled_rois.shape[0], 1))
114 | sampled_rois = np.hstack((repeated_batch_idx, sampled_rois))
115 |
116 | # Base Fast R-CNN blobs
117 | blob_dict = dict(
118 | labels_int32=sampled_labels.astype(np.int32, copy=False),
119 | rois=sampled_rois,
120 | bbox_targets=bbox_targets,
121 | bbox_inside_weights=bbox_inside_weights,
122 | bbox_outside_weights=bbox_outside_weights
123 | )
124 |
125 | # # Optionally add Mask R-CNN blobs
126 | # if mask_on:
127 | # roi_data.mask_rcnn.add_mask_rcnn_blobs(
128 | # blob_dict, sampled_boxes, roidb, im_scale, batch_idx
129 | # )
130 |
131 | # # Optionally add Keypoint R-CNN blobs
132 | # if keypoints_on:
133 | # roi_data.keypoint_rcnn.add_keypoint_rcnn_blobs(
134 | # blob_dict, roidb, fg_rois_per_image, fg_inds, im_scale, batch_idx
135 | # )
136 |
137 | return blob_dict
138 |
139 | def _expand_bbox_targets(bbox_target_data, num_classes=81, cls_agnostic_bbox_reg=False):
140 | """Bounding-box regression targets are stored in a compact form in the
141 | roidb.
142 | This function expands those targets into the 4-of-4*K representation used
143 | by the network (i.e. only one class has non-zero targets). The loss weights
144 | are similarly expanded.
145 | Returns:
146 | bbox_target_data (ndarray): N x 4K blob of regression targets
147 | bbox_inside_weights (ndarray): N x 4K blob of loss weights
148 | """
149 | num_bbox_reg_classes = num_classes
150 | if cls_agnostic_bbox_reg:
151 | num_bbox_reg_classes = 2 # bg and fg
152 |
153 | clss = bbox_target_data[:, 0]
154 | bbox_targets = zeros((clss.size, 4 * num_bbox_reg_classes))
155 | bbox_inside_weights = zeros(bbox_targets.shape)
156 | inds = np.where(clss > 0)[0]
157 | for ind in inds:
158 | cls = int(clss[ind])
159 | start = 4 * cls
160 | end = start + 4
161 | bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
162 | bbox_inside_weights[ind, start:end] = (1.0, 1.0, 1.0, 1.0)
163 | return bbox_targets, bbox_inside_weights
--------------------------------------------------------------------------------
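
A tiny example of the compact-to-expanded representation handled by _expand_bbox_targets above (hypothetical values; with the default num_classes=81 the expanded blob has 4*81 columns):

    import numpy as np
    from utils.fast_rcnn_sample_rois import _expand_bbox_targets

    # compact form: one row per sampled RoI, columns are (class, dx, dy, dw, dh)
    compact = np.array([[17., 0.1, -0.2, 0.05, 0.3],   # a foreground RoI of class 17
                        [ 0., 0.0,  0.0, 0.0,  0.0]])  # a background RoI
    targets, inside_w = _expand_bbox_targets(compact)
    print(targets.shape)               # (2, 324) = (N, 4 * 81)
    print(targets[0, 17*4:17*4+4])     # [ 0.1  -0.2  0.05  0.3]; every other entry is zero
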
/lib/utils/generate_anchors.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2017-present, Facebook, Inc.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | ##############################################################################
15 | #
16 | # Based on:
17 | # --------------------------------------------------------
18 | # Faster R-CNN
19 | # Copyright (c) 2015 Microsoft
20 | # Licensed under The MIT License [see LICENSE for details]
21 | # Written by Ross Girshick and Sean Bell
22 | # --------------------------------------------------------
23 |
24 | import numpy as np
25 |
26 | # Verify that we compute the same anchors as Shaoqing's matlab implementation:
27 | #
28 | # >> load output/rpn_cachedir/faster_rcnn_VOC2007_ZF_stage1_rpn/anchors.mat
29 | # >> anchors
30 | #
31 | # anchors =
32 | #
33 | # -83 -39 100 56
34 | # -175 -87 192 104
35 | # -359 -183 376 200
36 | # -55 -55 72 72
37 | # -119 -119 136 136
38 | # -247 -247 264 264
39 | # -35 -79 52 96
40 | # -79 -167 96 184
41 | # -167 -343 184 360
42 |
43 | # array([[ -83., -39., 100., 56.],
44 | # [-175., -87., 192., 104.],
45 | # [-359., -183., 376., 200.],
46 | # [ -55., -55., 72., 72.],
47 | # [-119., -119., 136., 136.],
48 | # [-247., -247., 264., 264.],
49 | # [ -35., -79., 52., 96.],
50 | # [ -79., -167., 96., 184.],
51 | # [-167., -343., 184., 360.]])
52 |
53 |
54 | def generate_anchors(
55 | stride=16, sizes=(32, 64, 128, 256, 512), aspect_ratios=(0.5, 1, 2)
56 | ):
57 | """Generates a matrix of anchor boxes in (x1, y1, x2, y2) format. Anchors
58 | are centered on stride / 2, have (approximate) sqrt areas of the specified
59 | sizes, and aspect ratios as given.
60 | """
61 | return _generate_anchors(
62 | stride,
63 |         np.array(sizes, dtype=np.float64) / stride,
64 |         np.array(aspect_ratios, dtype=np.float64)
65 | )
66 |
67 |
68 | def _generate_anchors(base_size, scales, aspect_ratios):
69 | """Generate anchor (reference) windows by enumerating aspect ratios X
70 | scales wrt a reference (0, 0, base_size - 1, base_size - 1) window.
71 | """
72 |     anchor = np.array([1, 1, base_size, base_size], dtype=np.float64) - 1
73 | anchors = _ratio_enum(anchor, aspect_ratios)
74 | anchors = np.vstack(
75 | [_scale_enum(anchors[i, :], scales) for i in range(anchors.shape[0])]
76 | )
77 | return anchors
78 |
79 |
80 | def _whctrs(anchor):
81 | """Return width, height, x center, and y center for an anchor (window)."""
82 | w = anchor[2] - anchor[0] + 1
83 | h = anchor[3] - anchor[1] + 1
84 | x_ctr = anchor[0] + 0.5 * (w - 1)
85 | y_ctr = anchor[1] + 0.5 * (h - 1)
86 | return w, h, x_ctr, y_ctr
87 |
88 |
89 | def _mkanchors(ws, hs, x_ctr, y_ctr):
90 | """Given a vector of widths (ws) and heights (hs) around a center
91 | (x_ctr, y_ctr), output a set of anchors (windows).
92 | """
93 | ws = ws[:, np.newaxis]
94 | hs = hs[:, np.newaxis]
95 | anchors = np.hstack(
96 | (
97 | x_ctr - 0.5 * (ws - 1),
98 | y_ctr - 0.5 * (hs - 1),
99 | x_ctr + 0.5 * (ws - 1),
100 | y_ctr + 0.5 * (hs - 1)
101 | )
102 | )
103 | return anchors
104 |
105 |
106 | def _ratio_enum(anchor, ratios):
107 | """Enumerate a set of anchors for each aspect ratio wrt an anchor."""
108 | w, h, x_ctr, y_ctr = _whctrs(anchor)
109 | size = w * h
110 | size_ratios = size / ratios
111 | ws = np.round(np.sqrt(size_ratios))
112 | hs = np.round(ws * ratios)
113 | anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
114 | return anchors
115 |
116 |
117 | def _scale_enum(anchor, scales):
118 | """Enumerate a set of anchors for each scale wrt an anchor."""
119 | w, h, x_ctr, y_ctr = _whctrs(anchor)
120 | ws = w * scales
121 | hs = h * scales
122 | anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
123 | return anchors
--------------------------------------------------------------------------------
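
A quick check of generate_anchors above (the default call and the classic 9-anchor RPN configuration from the comment block):

    from utils.generate_anchors import generate_anchors

    anchors = generate_anchors()          # stride 16, 5 sizes x 3 aspect ratios
    print(anchors.shape)                  # (15, 4), rows of (x1, y1, x2, y2)

    rpn_anchors = generate_anchors(stride=16, sizes=(128, 256, 512))
    print(rpn_anchors.shape)              # (9, 4): 3 scales x 3 aspect ratios, as listed above
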
/lib/utils/io.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2017-present, Facebook, Inc.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | ##############################################################################
15 |
16 | """IO utilities."""
17 |
18 | import pickle
19 | import os
20 |
21 | def save_object(obj, file_name):
22 | """Save a Python object by pickling it."""
23 | file_name = os.path.abspath(file_name)
24 | with open(file_name, 'wb') as f:
25 | pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
26 |
--------------------------------------------------------------------------------
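
Usage sketch of save_object, paired with a plain pickle.load to read the file back (hypothetical path and contents):

    import pickle
    from utils.io import save_object

    save_object({'AP': 0.359, 'AP50': 0.581}, '/tmp/results.pkl')
    with open('/tmp/results.pkl', 'rb') as f:
        results = pickle.load(f)
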
/lib/utils/logging.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2017-present, Facebook, Inc.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | ##############################################################################
15 |
16 | """Utilities for logging."""
17 |
18 | from __future__ import absolute_import
19 | from __future__ import division
20 | from __future__ import print_function
21 | from __future__ import unicode_literals
22 |
23 | from collections import deque
24 | from email.mime.text import MIMEText
25 | import json
26 | import logging
27 | import numpy as np
28 | import smtplib
29 | import sys
30 |
31 | # Print lower precision floating point values than default FLOAT_REPR
32 | json.encoder.FLOAT_REPR = lambda o: format(o, '.6f')
33 |
34 |
35 | def log_json_stats(stats, sort_keys=True):
36 | print('json_stats: {:s}'.format(json.dumps(stats, sort_keys=sort_keys)))
37 |
38 |
39 | class SmoothedValue(object):
40 | """Track a series of values and provide access to smoothed values over a
41 | window or the global series average.
42 | """
43 |
44 | def __init__(self, window_size):
45 | self.deque = deque(maxlen=window_size)
46 | self.series = []
47 | self.total = 0.0
48 | self.count = 0
49 |
50 | def AddValue(self, value):
51 | self.deque.append(value)
52 | self.series.append(value)
53 | self.count += 1
54 | self.total += value
55 |
56 | def GetMedianValue(self):
57 | return np.median(self.deque)
58 |
59 | def GetAverageValue(self):
60 | return np.mean(self.deque)
61 |
62 | def GetGlobalAverageValue(self):
63 | return self.total / self.count
64 |
65 |
66 | def send_email(subject, body, to):
67 | s = smtplib.SMTP('localhost')
68 | mime = MIMEText(body)
69 | mime['Subject'] = subject
70 | mime['To'] = to
71 | s.sendmail('detectron', to, mime.as_string())
72 |
73 |
74 | def setup_logging(name):
75 | FORMAT = '%(levelname)s %(filename)s:%(lineno)4d: %(message)s'
76 | # Manually clear root loggers to prevent any module that may have called
77 | # logging.basicConfig() from blocking our logging setup
78 | logging.root.handlers = []
79 | logging.basicConfig(level=logging.INFO, format=FORMAT, stream=sys.stdout)
80 | logger = logging.getLogger(name)
81 | return logger
--------------------------------------------------------------------------------
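
A short usage sketch of SmoothedValue above (hypothetical loss values):

    from utils.logging import SmoothedValue

    loss_tracker = SmoothedValue(window_size=20)
    for v in [0.9, 0.7, 0.65, 0.6]:
        loss_tracker.AddValue(v)
    print(loss_tracker.GetMedianValue())         # median over the last window_size values
    print(loss_tracker.GetGlobalAverageValue())  # average over everything seen so far
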
/lib/utils/multilevel_rois.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2017-present, Facebook, Inc.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | ##############################################################################
15 |
16 | import utils.boxes as box_utils
17 | import numpy as np
18 |
19 | def add_multilevel_rois_for_test(blobs, name, roi_min_level=2,roi_max_level=5):
20 | """Distributes a set of RoIs across FPN pyramid levels by creating new level
21 | specific RoI blobs.
22 |
23 | Arguments:
24 | blobs (dict): dictionary of blobs
25 | name (str): a key in 'blobs' identifying the source RoI blob
26 |
27 | Returns:
28 | [by ref] blobs (dict): new keys named by `name + 'fpn' + level`
29 | are added to dict each with a value that's an R_level x 5 ndarray of
30 | RoIs (see _get_rois_blob for format)
31 | """
32 | lvl_min = roi_min_level
33 | lvl_max = roi_max_level
34 | #lvls = map_rois_to_fpn_levels(blobs[name][:, 1:5], lvl_min, lvl_max)
35 | lvls = map_rois_to_fpn_levels(blobs[name], lvl_min, lvl_max)
36 | blobs = add_multilevel_roi_blobs(
37 | blobs, name, blobs[name], lvls, lvl_min, lvl_max
38 | )
39 | return blobs
40 |
41 | def map_rois_to_fpn_levels(rois, k_min, k_max, roi_canonical_scale=224, roi_canonical_level=4):
42 | """Determine which FPN level each RoI in a set of RoIs should map to based
43 | on the heuristic in the FPN paper.
44 | """
45 | # Compute level ids
46 | s = np.sqrt(box_utils.boxes_area(rois))
47 | s0 = roi_canonical_scale # default: 224
48 | lvl0 = roi_canonical_level # default: 4
49 |
50 | # Eqn.(1) in FPN paper
51 | target_lvls = np.floor(lvl0 + np.log2(s / s0 + 1e-6))
52 | target_lvls = np.clip(target_lvls, k_min, k_max)
53 | return target_lvls
54 |
55 |
56 | def add_multilevel_roi_blobs(
57 | #rois,target_lvls, lvl_min, lvl_max):
58 | blobs, blob_prefix, rois, target_lvls, lvl_min, lvl_max):
59 | """Add RoI blobs for multiple FPN levels to the blobs dict.
60 |
61 | blobs: a dict mapping from blob name to numpy ndarray
62 | blob_prefix: name prefix to use for the FPN blobs
63 | rois: the source rois as a 2D numpy array of shape (N, 5) where each row is
64 | an roi and the columns encode (batch_idx, x1, y1, x2, y2)
65 | target_lvls: numpy array of shape (N, ) indicating which FPN level each roi
66 | in rois should be assigned to
67 | lvl_min: the finest (highest resolution) FPN level (e.g., 2)
68 |     lvl_max: the coarsest (lowest resolution) FPN level (e.g., 6)
69 | """
70 | rois_idx_order = np.empty((0, ))
71 | rois_stacked = np.zeros((0, 4), dtype=np.float32) # for assert
72 | for lvl in range(lvl_min, lvl_max + 1):
73 | idx_lvl = np.where(target_lvls == lvl)[0]
74 | blobs[blob_prefix + '_fpn' + str(lvl)] = rois[idx_lvl, :]
75 | rois_idx_order = np.concatenate((rois_idx_order, idx_lvl))
76 | rois_stacked = np.vstack(
77 | [rois_stacked, blobs[blob_prefix + '_fpn' + str(lvl)]]
78 | )
79 | rois_idx_restore = np.argsort(rois_idx_order).astype(np.int32, copy=False)
80 | blobs[blob_prefix + '_idx_restore_int32'] = rois_idx_restore
81 | # Sanity check that restore order is correct
82 | assert (rois_stacked[rois_idx_restore] == rois).all()
83 | return blobs
--------------------------------------------------------------------------------
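
A worked example of the FPN level heuristic in map_rois_to_fpn_levels above, Eqn. (1): lvl = floor(lvl0 + log2(sqrt(area) / 224)), clipped to [k_min, k_max] (hypothetical boxes):

    import numpy as np
    from utils.multilevel_rois import map_rois_to_fpn_levels

    # boxes whose sqrt(area) is roughly 56, 224 and 896 pixels
    rois = np.array([[0., 0.,  55.,  55.],
                     [0., 0., 223., 223.],
                     [0., 0., 895., 895.]])
    print(map_rois_to_fpn_levels(rois, k_min=2, k_max=5))
    # -> [2. 4. 5.]: the small box maps to P2, the canonical 224 box to P4,
    #    and the large box would map to P6 but is clipped to P5
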
/lib/utils/preprocess_sample.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch
3 | from utils.blob import prep_im_for_blob,im_list_to_blob
4 | from utils.fast_rcnn_sample_rois import fast_rcnn_sample_rois
5 | from utils.multilevel_rois import add_multilevel_rois_for_test
6 |
7 | class preprocess_sample(object):
8 | # performs the preprocessing (including building image pyramids and scaling the coordinates)
9 | def __init__(self,
10 | target_sizes=800,
11 | max_size=1333,
12 | mean=[122.7717, 115.9465, 102.9801],
13 | remove_dup_proposals=True,
14 | fpn_on=False,
15 | spatial_scale=0.0625,
16 | sample_proposals_for_training=False):
17 | self.mean=mean
18 | self.target_sizes=target_sizes if isinstance(target_sizes,list) else [target_sizes]
19 | self.max_size=max_size
20 | self.remove_dup_proposals=remove_dup_proposals
21 | self.fpn_on=fpn_on
22 | self.spatial_scale=spatial_scale
23 | self.sample_proposals_for_training = sample_proposals_for_training
24 |
25 | def __call__(self, sample):
26 | # resizes image and returns scale factors
27 | original_im_size=sample['image'].shape
28 | im_list,im_scales = prep_im_for_blob(sample['image'],
29 | pixel_means=self.mean,
30 | target_sizes=self.target_sizes,
31 | max_size=self.max_size)
32 |         sample['image'] = torch.FloatTensor(im_list_to_blob(im_list,self.fpn_on)) # im_list_to_blob transposes to NCHW and pads to the coarsest FPN stride when fpn_on
33 | sample['scaling_factors'] = im_scales[0]
34 | sample['original_im_size'] = torch.FloatTensor(original_im_size)
35 | if len(sample['dbentry']['boxes'])!=0 and not self.sample_proposals_for_training: # Fast RCNN test
36 | proposals = sample['dbentry']['boxes']*im_scales[0]
37 | if self.remove_dup_proposals:
38 | proposals,_ = self.remove_dup_prop(proposals)
39 |
40 | if self.fpn_on==False:
41 | sample['rois'] = torch.FloatTensor(proposals)
42 | else:
43 | multiscale_proposals = add_multilevel_rois_for_test({'rois': proposals},'rois')
44 | for k in multiscale_proposals.keys():
45 | sample[k] = torch.FloatTensor(multiscale_proposals[k])
46 |
47 | elif self.sample_proposals_for_training: # Fast RCNN training
48 | sampled_rois_labels_and_targets = fast_rcnn_sample_rois(roidb=sample['dbentry'],
49 | im_scale=im_scales[0],
50 | batch_idx=0) # ok as long as we keep batch_size=1
51 | sampled_rois_labels_and_targets = {key: torch.FloatTensor(value) for key,value in sampled_rois_labels_and_targets.items()}
52 | # add to sample
53 | sample = {**sample, **sampled_rois_labels_and_targets}
54 | # remove dbentry from sample
55 | del sample['dbentry']
56 | return sample
57 |
58 | # from Detectron test.py
59 | # When mapping from image ROIs to feature map ROIs, there's some aliasing
60 | # (some distinct image ROIs get mapped to the same feature ROI).
61 | # Here, we identify duplicate feature ROIs, so we only compute features
62 | # on the unique subset.
63 | def remove_dup_prop(self,proposals):
64 | v = np.array([1e3, 1e6, 1e9, 1e12])
65 |
66 | hashes = np.round(proposals * self.spatial_scale).dot(v)
67 | _, index, inv_index = np.unique(hashes, return_index=True, return_inverse=True)
68 | proposals = proposals[index, :]
69 |
70 | return (proposals,inv_index)
--------------------------------------------------------------------------------
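
A short illustration of the duplicate-proposal hashing used by remove_dup_prop above: two distinct image-space boxes that collapse onto the same feature-map cell at spatial_scale=1/16 (hypothetical coordinates):

    import numpy as np

    spatial_scale = 0.0625
    proposals = np.array([[16., 16., 160., 160.],
                          [18., 17., 162., 158.]])   # nearly identical after scaling and rounding
    v = np.array([1e3, 1e6, 1e9, 1e12])
    hashes = np.round(proposals * spatial_scale).dot(v)
    _, index, inv_index = np.unique(hashes, return_index=True, return_inverse=True)
    print(proposals[index, :])   # only the first box survives
    print(inv_index)             # [0 0]: both original boxes point at the kept one
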
/lib/utils/result_utils.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2017-present, Facebook, Inc.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | ##############################################################################
15 |
16 | # some functions are from Detectron
17 |
18 | import numpy as np
19 | from torch.autograd import Variable
20 | import utils.boxes as box_utils
21 | import cv2
22 | import pycocotools.mask as mask_util
23 |
24 |
25 | def to_np(x):
26 | if isinstance(x,np.ndarray):
27 | return x
28 | if isinstance(x,Variable):
29 | x=x.data
30 | return x.cpu().numpy()
31 |
32 | def empty_results(num_classes, num_images):
33 | """Return empty results lists for boxes, masks, and keypoints.
34 | Box detections are collected into:
35 | all_boxes[cls][image] = N x 5 array with columns (x1, y1, x2, y2, score)
36 | Instance mask predictions are collected into:
37 | all_segms[cls][image] = [...] list of COCO RLE encoded masks that are in
38 | 1:1 correspondence with the boxes in all_boxes[cls][image]
39 | Keypoint predictions are collected into:
40 | all_keyps[cls][image] = [...] list of keypoints results, each encoded as
41 | a 3D array (#rois, 4, #keypoints) with the 4 rows corresponding to
42 | [x, y, logit, prob] (See: utils.keypoints.heatmaps_to_keypoints).
43 | Keypoints are recorded for person (cls = 1); they are in 1:1
44 | correspondence with the boxes in all_boxes[cls][image].
45 | """
46 | # Note: do not be tempted to use [[] * N], which gives N references to the
47 | # *same* empty list.
48 | all_boxes = [[[] for _ in range(num_images)] for _ in range(num_classes)]
49 | all_segms = [[[] for _ in range(num_images)] for _ in range(num_classes)]
50 | all_keyps = [[[] for _ in range(num_images)] for _ in range(num_classes)]
51 | return all_boxes, all_segms, all_keyps
52 |
53 |
54 | def extend_results(index, all_res, im_res):
55 | """Add results for an image to the set of all results at the specified
56 | index.
57 | """
58 | # Skip cls_idx 0 (__background__)
59 | for cls_idx in range(1, len(im_res)):
60 | all_res[cls_idx][index] = im_res[cls_idx]
61 |
62 | # When mapping from image ROIs to feature map ROIs, there's some aliasing
63 | # (some distinct image ROIs get mapped to the same feature ROI).
64 | # Here, we identify duplicate feature ROIs, so we only compute features
65 | # on the unique subset.
66 | def remove_dup_prop(proposals, spatial_scale):
67 |     proposals = to_np(proposals)
68 |     v = np.array([1e3, 1e6, 1e9, 1e12])
69 |
70 |     hashes = np.round(proposals * spatial_scale).dot(v)
71 |     _, index, inv_index = np.unique(hashes, return_index=True, return_inverse=True)
72 |     proposals = proposals[index, :]
73 |     return proposals, inv_index
74 |
75 |
76 | def postprocess_output(rois,scaling_factor,im_size,class_scores,bbox_deltas,bbox_reg_weights = (10.0,10.0,5.0,5.0)):
77 | boxes = to_np(rois.div(scaling_factor).squeeze(0))
78 | bbox_deltas = to_np(bbox_deltas)
79 | orig_im_size = to_np(im_size).squeeze()
80 | # apply deltas
81 | pred_boxes = box_utils.bbox_transform(boxes, bbox_deltas, bbox_reg_weights)
82 | # clip on boundaries
83 | pred_boxes = box_utils.clip_tiled_boxes(pred_boxes,orig_im_size)
84 | scores = to_np(class_scores)
85 | # Map scores and predictions back to the original set of boxes
86 | # This re-duplicates the previously removed boxes
87 | # Is there any use for this?
88 | # inv_index = to_np(batch['proposal_inv_index']).squeeze().astype(np.int64)
89 | # scores = scores[inv_index, :]
90 | # pred_boxes = pred_boxes[inv_index, :]
91 | # threshold on score and run nms to remove duplicates
92 | scores_final, boxes_final, boxes_per_class = box_results_with_nms_and_limit(scores, pred_boxes)
93 |
94 | return (scores_final, boxes_final, boxes_per_class)
95 |
96 | def box_results_with_nms_and_limit(scores, boxes,
97 | num_classes=81,
98 | score_thresh=0.05,
99 | overlap_thresh=0.5,
100 | do_soft_nms=False,
101 | soft_nms_sigma=0.5,
102 | soft_nms_method='linear',
103 | do_bbox_vote=False,
104 | bbox_vote_thresh=0.8,
105 | bbox_vote_method='ID',
106 | max_detections_per_img=100, ### over all classes ###
107 | ):
108 | """Returns bounding-box detection results by thresholding on scores and
109 | applying non-maximum suppression (NMS).
110 |
111 |     At most max_detections_per_img detections (over all classes) persist after this and are returned, sorted by class.
112 |
113 | `boxes` has shape (#detections, 4 * #classes), where each row represents
114 | a list of predicted bounding boxes for each of the object classes in the
115 | dataset (including the background class). The detections in each row
116 | originate from the same object proposal.
117 |
118 | `scores` has shape (#detection, #classes), where each row represents a list
119 | of object detection confidence scores for each of the object classes in the
120 |     dataset (including the background class). `scores[i, j]` corresponds to the
121 | box at `boxes[i, j * 4:(j + 1) * 4]`.
122 | """
123 | cls_boxes = [[] for _ in range(num_classes)]
124 | # Apply threshold on detection probabilities and apply NMS
125 | # Skip j = 0, because it's the background class
126 | for j in range(1, num_classes):
127 | inds = np.where(scores[:, j] > score_thresh)[0]
128 | scores_j = scores[inds, j]
129 | boxes_j = boxes[inds, j * 4:(j + 1) * 4]
130 | dets_j = np.hstack((boxes_j, scores_j[:, np.newaxis])).astype(
131 | np.float32, copy=False
132 | )
133 | if do_soft_nms:
134 | nms_dets, _ = box_utils.soft_nms(
135 | dets_j,
136 | sigma=soft_nms_sigma,
137 | overlap_thresh=overlap_thresh,
138 | score_thresh=0.0001,
139 | method=soft_nms_method
140 | )
141 | else:
142 | keep = box_utils.nms(dets_j, overlap_thresh)
143 | nms_dets = dets_j[keep, :]
144 | # Refine the post-NMS boxes using bounding-box voting
145 | if do_bbox_vote:
146 | nms_dets = box_utils.box_voting(
147 | nms_dets,
148 | dets_j,
149 | bbox_vote_thresh,
150 | scoring_method=bbox_vote_method
151 | )
152 | cls_boxes[j] = nms_dets
153 |
154 | # Limit to max_per_image detections **over all classes**
155 | if max_detections_per_img > 0:
156 | image_scores = np.hstack(
157 | [cls_boxes[j][:, -1] for j in range(1, num_classes)]
158 | )
159 | if len(image_scores) > max_detections_per_img:
160 | image_thresh = np.sort(image_scores)[-max_detections_per_img]
161 | for j in range(1, num_classes):
162 | keep = np.where(cls_boxes[j][:, -1] >= image_thresh)[0]
163 | cls_boxes[j] = cls_boxes[j][keep, :]
164 |
165 | im_results = np.vstack([cls_boxes[j] for j in range(1, num_classes)])
166 | boxes = im_results[:, :-1]
167 | scores = im_results[:, -1]
168 | return scores, boxes, cls_boxes
169 |
170 | def segm_results(cls_boxes, masks, ref_boxes, im_h, im_w,
171 | num_classes=81,
172 | M=14, # cfg.MRCNN.RESOLUTION
173 | cls_specific_mask=True,
174 | thresh_binarize=0.5):
175 | cls_segms = [[] for _ in range(num_classes)]
176 | mask_ind = 0
177 | # To work around an issue with cv2.resize (it seems to automatically pad
178 | # with repeated border values), we manually zero-pad the masks by 1 pixel
179 | # prior to resizing back to the original image resolution. This prevents
180 | # "top hat" artifacts. We therefore need to expand the reference boxes by an
181 | # appropriate factor.
182 | scale = (M + 2.0) / M
183 | ref_boxes = box_utils.expand_boxes(ref_boxes, scale)
184 | ref_boxes = ref_boxes.astype(np.int32)
185 | padded_mask = np.zeros((M + 2, M + 2), dtype=np.float32)
186 |
187 | # skip j = 0, because it's the background class
188 | for j in range(1, num_classes):
189 | segms = []
190 | for _ in range(cls_boxes[j].shape[0]):
191 | if cls_specific_mask:
192 | padded_mask[1:-1, 1:-1] = masks[mask_ind, j, :, :]
193 | else:
194 | padded_mask[1:-1, 1:-1] = masks[mask_ind, 0, :, :]
195 |
196 | ref_box = ref_boxes[mask_ind, :]
197 | w = ref_box[2] - ref_box[0] + 1
198 | h = ref_box[3] - ref_box[1] + 1
199 | w = np.maximum(w, 1)
200 | h = np.maximum(h, 1)
201 |
202 | mask = cv2.resize(padded_mask, (w, h))
203 | mask = np.array(mask > thresh_binarize, dtype=np.uint8)
204 | im_mask = np.zeros((im_h, im_w), dtype=np.uint8)
205 |
206 | x_0 = max(ref_box[0], 0)
207 | x_1 = min(ref_box[2] + 1, im_w)
208 | y_0 = max(ref_box[1], 0)
209 | y_1 = min(ref_box[3] + 1, im_h)
210 |
211 | im_mask[y_0:y_1, x_0:x_1] = mask[
212 | (y_0 - ref_box[1]):(y_1 - ref_box[1]),
213 | (x_0 - ref_box[0]):(x_1 - ref_box[0])
214 | ]
215 |
216 | # Get RLE encoding used by the COCO evaluation API
217 | rle = mask_util.encode(
218 | np.array(im_mask[:, :, np.newaxis], order='F')
219 | )[0]
220 | rle['counts'] = rle['counts'].decode() # convert back to str so that it can be later saved to json
221 | segms.append(rle)
222 |
223 | mask_ind += 1
224 |
225 | cls_segms[j] = segms
226 |
227 | assert mask_ind == masks.shape[0]
228 | return cls_segms
--------------------------------------------------------------------------------
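
The layout assumed by box_results_with_nms_and_limit above (boxes of shape (#detections, 4 * #classes), scores of shape (#detections, #classes)) can be illustrated with a small NumPy sketch; the class count, boxes and threshold below are toy values, not taken from the repository's configuration.

    import numpy as np

    # Toy inputs: 3 proposals, 3 classes (background + 2 object classes).
    num_classes = 3
    scores = np.array([[0.70, 0.20, 0.10],
                       [0.05, 0.85, 0.10],
                       [0.10, 0.15, 0.75]], dtype=np.float32)
    # One box per proposal, repeated for every class -> shape (3, 4 * num_classes).
    boxes = np.tile(np.array([[0., 0., 10., 10.],
                              [5., 5., 20., 20.],
                              [8., 8., 30., 30.]], dtype=np.float32), (1, num_classes))

    score_thresh = 0.5
    for j in range(1, num_classes):                 # skip j = 0, the background class
        inds = np.where(scores[:, j] > score_thresh)[0]
        boxes_j = boxes[inds, j * 4:(j + 1) * 4]    # the 4 columns belonging to class j
        dets_j = np.hstack((boxes_j, scores[inds, j][:, np.newaxis]))
        print(j, dets_j)                            # one surviving detection per class here
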
/lib/utils/segms.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2017-present, Facebook, Inc.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | ##############################################################################
15 |
16 | """Functions for interacting with segmentation masks in the COCO format.
17 |
18 | The following terms are used in this module:
19 | mask: a binary mask encoded as a 2D numpy array
20 | segm: a segmentation mask in one of the two COCO formats (polygon or RLE)
21 | polygon: COCO's polygon format
22 | RLE: COCO's run length encoding format
23 | """
24 |
25 | from __future__ import absolute_import
26 | from __future__ import division
27 | from __future__ import print_function
28 | from __future__ import unicode_literals
29 |
30 | import numpy as np
31 |
32 | import pycocotools.mask as mask_util
33 |
34 |
35 | def flip_segms(segms, height, width):
36 | """Left/right flip each mask in a list of masks."""
37 | def _flip_poly(poly, width):
38 | flipped_poly = np.array(poly)
39 | flipped_poly[0::2] = width - np.array(poly[0::2]) - 1
40 | return flipped_poly.tolist()
41 |
42 | def _flip_rle(rle, height, width):
43 | if 'counts' in rle and type(rle['counts']) == list:
44 | # Magic RLE format handling painfully discovered by looking at the
45 | # COCO API showAnns function.
46 | rle = mask_util.frPyObjects([rle], height, width)
47 | mask = mask_util.decode(rle)
48 | mask = mask[:, ::-1, :]
49 | rle = mask_util.encode(np.array(mask, order='F', dtype=np.uint8))
50 | return rle
51 |
52 | flipped_segms = []
53 | for segm in segms:
54 | if type(segm) == list:
55 | # Polygon format
56 | flipped_segms.append([_flip_poly(poly, width) for poly in segm])
57 | else:
58 | # RLE format
59 | assert type(segm) == dict
60 | flipped_segms.append(_flip_rle(segm, height, width))
61 | return flipped_segms
62 |
63 |
64 | def polys_to_mask(polygons, height, width):
65 | """Convert from the COCO polygon segmentation format to a binary mask
66 | encoded as a 2D array of data type numpy.float32. The polygon segmentation
67 | is understood to be enclosed inside a height x width image. The resulting
68 | mask is therefore of shape (height, width).
69 | """
70 | rle = mask_util.frPyObjects(polygons, height, width)
71 | mask = np.array(mask_util.decode(rle), dtype=np.float32)
72 | # Flatten in case polygons was a list
73 | mask = np.sum(mask, axis=2)
74 | mask = np.array(mask > 0, dtype=np.float32)
75 | return mask
76 |
77 |
78 | def mask_to_bbox(mask):
79 | """Compute the tight bounding box of a binary mask."""
80 | xs = np.where(np.sum(mask, axis=0) > 0)[0]
81 | ys = np.where(np.sum(mask, axis=1) > 0)[0]
82 |
83 | if len(xs) == 0 or len(ys) == 0:
84 | return None
85 |
86 | x0 = xs[0]
87 | x1 = xs[-1]
88 | y0 = ys[0]
89 | y1 = ys[-1]
90 | return np.array((x0, y0, x1, y1), dtype=np.float32)
91 |
92 |
93 | def polys_to_mask_wrt_box(polygons, box, M):
94 | """Convert from the COCO polygon segmentation format to a binary mask
95 | encoded as a 2D array of data type numpy.float32. The polygon segmentation
96 | is understood to be enclosed in the given box and rasterized to an M x M
97 | mask. The resulting mask is therefore of shape (M, M).
98 | """
99 | w = box[2] - box[0]
100 | h = box[3] - box[1]
101 |
102 | w = np.maximum(w, 1)
103 | h = np.maximum(h, 1)
104 |
105 | polygons_norm = []
106 | for poly in polygons:
107 | p = np.array(poly, dtype=np.float32)
108 | p[0::2] = (p[0::2] - box[0]) * M / w
109 | p[1::2] = (p[1::2] - box[1]) * M / h
110 | polygons_norm.append(p)
111 |
112 | rle = mask_util.frPyObjects(polygons_norm, M, M)
113 | mask = np.array(mask_util.decode(rle), dtype=np.float32)
114 | # Flatten in case polygons was a list
115 | mask = np.sum(mask, axis=2)
116 | mask = np.array(mask > 0, dtype=np.float32)
117 | return mask
118 |
119 |
120 | def polys_to_boxes(polys):
121 | """Convert a list of polygons into an array of tight bounding boxes."""
122 | boxes_from_polys = np.zeros((len(polys), 4), dtype=np.float32)
123 | for i in range(len(polys)):
124 | poly = polys[i]
125 | x0 = min(min(p[::2]) for p in poly)
126 | x1 = max(max(p[::2]) for p in poly)
127 | y0 = min(min(p[1::2]) for p in poly)
128 | y1 = max(max(p[1::2]) for p in poly)
129 | boxes_from_polys[i, :] = [x0, y0, x1, y1]
130 |
131 | return boxes_from_polys
132 |
133 |
134 | def rle_mask_voting(
135 | top_masks, all_masks, all_dets, iou_thresh, binarize_thresh, method='AVG'
136 | ):
137 | """Returns new masks (in correspondence with `top_masks`) by combining
138 | multiple overlapping masks coming from the pool of `all_masks`. Two methods
139 | for combining masks are supported: 'AVG' uses a weighted average of
140 | overlapping mask pixels; 'UNION' takes the union of all mask pixels.
141 | """
142 | if len(top_masks) == 0:
143 | return
144 |
145 | all_not_crowd = [False] * len(all_masks)
146 | top_to_all_overlaps = mask_util.iou(top_masks, all_masks, all_not_crowd)
147 | decoded_all_masks = [
148 | np.array(mask_util.decode(rle), dtype=np.float32) for rle in all_masks
149 | ]
150 | decoded_top_masks = [
151 | np.array(mask_util.decode(rle), dtype=np.float32) for rle in top_masks
152 | ]
153 | all_boxes = all_dets[:, :4].astype(np.int32)
154 | all_scores = all_dets[:, 4]
155 |
156 | # Fill box support with weights
157 | mask_shape = decoded_all_masks[0].shape
158 | mask_weights = np.zeros((len(all_masks), mask_shape[0], mask_shape[1]))
159 | for k in range(len(all_masks)):
160 | ref_box = all_boxes[k]
161 | x_0 = max(ref_box[0], 0)
162 | x_1 = min(ref_box[2] + 1, mask_shape[1])
163 | y_0 = max(ref_box[1], 0)
164 | y_1 = min(ref_box[3] + 1, mask_shape[0])
165 | mask_weights[k, y_0:y_1, x_0:x_1] = all_scores[k]
166 | mask_weights = np.maximum(mask_weights, 1e-5)
167 |
168 | top_segms_out = []
169 | for k in range(len(top_masks)):
170 | # Corner case of empty mask
171 | if decoded_top_masks[k].sum() == 0:
172 | top_segms_out.append(top_masks[k])
173 | continue
174 |
175 | inds_to_vote = np.where(top_to_all_overlaps[k] >= iou_thresh)[0]
176 | # Only matches itself
177 | if len(inds_to_vote) == 1:
178 | top_segms_out.append(top_masks[k])
179 | continue
180 |
181 | masks_to_vote = [decoded_all_masks[i] for i in inds_to_vote]
182 | if method == 'AVG':
183 | ws = mask_weights[inds_to_vote]
184 | soft_mask = np.average(masks_to_vote, axis=0, weights=ws)
185 | mask = np.array(soft_mask > binarize_thresh, dtype=np.uint8)
186 | elif method == 'UNION':
187 | # Any pixel that's on joins the mask
188 | soft_mask = np.sum(masks_to_vote, axis=0)
189 | mask = np.array(soft_mask > 1e-5, dtype=np.uint8)
190 | else:
191 | raise NotImplementedError('Method {} is unknown'.format(method))
192 | rle = mask_util.encode(np.array(mask[:, :, np.newaxis], order='F'))[0]
193 | top_segms_out.append(rle)
194 |
195 | return top_segms_out
196 |
197 |
198 | def rle_mask_nms(masks, dets, thresh, mode='IOU'):
199 | """Performs greedy non-maximum suppression based on an overlap measurement
200 | between masks. The type of measurement is determined by `mode` and can be
201 | either 'IOU' (standard intersection over union) or 'IOMA' (intersection over
202 | minimum area).
203 | """
204 | if len(masks) == 0:
205 | return []
206 | if len(masks) == 1:
207 | return [0]
208 |
209 | if mode == 'IOU':
210 | # Computes ious[m1, m2] = area(intersect(m1, m2)) / area(union(m1, m2))
211 | all_not_crowds = [False] * len(masks)
212 | ious = mask_util.iou(masks, masks, all_not_crowds)
213 | elif mode == 'IOMA':
214 | # Computes ious[m1, m2] = area(intersect(m1, m2)) / min(area(m1), area(m2))
215 | all_crowds = [True] * len(masks)
216 | # ious[m1, m2] = area(intersect(m1, m2)) / area(m2)
217 | ious = mask_util.iou(masks, masks, all_crowds)
218 | # ... = max(area(intersect(m1, m2)) / area(m2),
219 | # area(intersect(m2, m1)) / area(m1))
220 | ious = np.maximum(ious, ious.transpose())
221 | elif mode == 'CONTAINMENT':
222 | # Computes ious[m1, m2] = area(intersect(m1, m2)) / area(m2)
223 | # Which measures how much m2 is contained inside m1
224 | all_crowds = [True] * len(masks)
225 | ious = mask_util.iou(masks, masks, all_crowds)
226 | else:
227 | raise NotImplementedError('Mode {} is unknown'.format(mode))
228 |
229 | scores = dets[:, 4]
230 | order = np.argsort(-scores)
231 |
232 | keep = []
233 | while order.size > 0:
234 | i = order[0]
235 | keep.append(i)
236 | ovr = ious[i, order[1:]]
237 | inds_to_keep = np.where(ovr <= thresh)[0]
238 | order = order[inds_to_keep + 1]
239 |
240 | return keep
241 |
242 |
243 | def rle_masks_to_boxes(masks):
244 | """Computes the bounding box of each mask in a list of RLE encoded masks."""
245 | if len(masks) == 0:
246 | return []
247 |
248 | decoded_masks = [
249 | np.array(mask_util.decode(rle), dtype=np.float32) for rle in masks
250 | ]
251 |
252 | def get_bounds(flat_mask):
253 | inds = np.where(flat_mask > 0)[0]
254 | return inds.min(), inds.max()
255 |
256 | boxes = np.zeros((len(decoded_masks), 4))
257 | keep = [True] * len(decoded_masks)
258 | for i, mask in enumerate(decoded_masks):
259 | if mask.sum() == 0:
260 | keep[i] = False
261 | continue
262 | flat_mask = mask.sum(axis=0)
263 | x0, x1 = get_bounds(flat_mask)
264 | flat_mask = mask.sum(axis=1)
265 | y0, y1 = get_bounds(flat_mask)
266 | boxes[i, :] = (x0, y0, x1, y1)
267 |
268 | return boxes, np.where(keep)[0]
269 |
--------------------------------------------------------------------------------
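
As a small illustration of the mask / RLE terminology used in segms.py, a round trip through pycocotools (toy mask size chosen arbitrarily):

    import numpy as np
    import pycocotools.mask as mask_util

    # A tiny 6x6 binary mask with a 3x3 square of foreground pixels.
    mask = np.zeros((6, 6), dtype=np.uint8)
    mask[1:4, 2:5] = 1

    # encode() expects a Fortran-ordered uint8 array.
    rle = mask_util.encode(np.asfortranarray(mask))
    print(rle['size'])            # [6, 6]

    # decode() recovers the binary mask; area() and toBbox() work on the RLE directly.
    recovered = mask_util.decode(rle)
    assert (recovered == mask).all()
    print(mask_util.area(rle))    # 9
    print(mask_util.toBbox(rle))  # [2. 1. 3. 3.] in x, y, w, h
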
/lib/utils/selective_search.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import cv2
3 |
4 | def selective_search(pil_image=None,quality='f',size=800):
5 | # speed up by enabling optimized code paths and multiple threads
6 | cv2.setUseOptimized(True)
7 | cv2.setNumThreads(4)
8 |
9 | # resize image to limit number of proposals and to bypass a bug in OpenCV with non-square images
10 | w,h = pil_image.size
11 | h_factor,w_factor=h/size,w/size
12 | pil_image=pil_image.resize((size,size))
13 |
14 | im = cv2.cvtColor(np.array(pil_image), cv2.COLOR_RGB2BGR)
15 |
16 | # create Selective Search Segmentation Object using default parameters
17 | ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
18 |
19 | # set input image on which we will run segmentation
20 | ss.setBaseImage(im)
21 |
22 | # Switch to fast but low recall Selective Search method
23 | if (quality == 'f'):
24 | ss.switchToSelectiveSearchFast()
25 | # Switch to high recall but slow Selective Search method
26 | elif (quality == 'q'):
27 | ss.switchToSelectiveSearchQuality()
28 |
29 | # run selective search segmentation on input image
30 | rects = ss.process()
31 |
32 | # rect is in x,y,w,h format
33 | # convert to xmin,ymin,xmax,ymax format
34 | rects = np.vstack((rects[:,0]*w_factor, rects[:,1]*h_factor, (rects[:,0]+rects[:,2])*w_factor, (rects[:,1]+rects[:,3])*h_factor)).transpose()
35 |
36 | return rects
--------------------------------------------------------------------------------
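
A hypothetical usage sketch for selective_search above; it assumes lib/ is on sys.path (as train_fast.py does) and that opencv-contrib-python is installed so that cv2.ximgproc is available.

    import sys
    sys.path.insert(0, "lib/")

    from PIL import Image
    from utils.selective_search import selective_search

    img = Image.open('demo/15673749081_767a7fa63a_k.jpg').convert('RGB')
    rects = selective_search(pil_image=img, quality='f', size=800)
    # rects has shape (num_proposals, 4) in xmin, ymin, xmax, ymax,
    # already rescaled back to the original image size.
    print(rects.shape)
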
/lib/utils/solver.py:
--------------------------------------------------------------------------------
1 | def adjust_learning_rate(optimizer, lr):
2 | for param_group in optimizer.param_groups:
3 | param_group['lr'] = lr
4 |
5 |
6 | def get_step_index(cur_iter,lr_steps=[0, 240000, 320000],max_iter=360000):
7 | """Given an iteration, find which learning rate step we're at."""
8 | assert lr_steps[0] == 0, 'The first step should always start at 0.'
9 | steps = lr_steps + [max_iter]
10 | for ind, step in enumerate(steps): # NoQA
11 | if cur_iter < step:
12 | break
13 | return ind - 1
14 |
15 |
16 | def lr_func_steps_with_decay(cur_iter,base_lr=0.01,gamma=0.1):
17 | """For cfg.SOLVER.LR_POLICY = 'steps_with_decay'
18 | Change the learning rate at specified iterations based on the formula
19 | lr = base_lr * gamma ** lr_step_count.
20 | Example:
21 | cfg.SOLVER.MAX_ITER: 90
22 | cfg.SOLVER.STEPS: [0, 60, 80]
23 | cfg.SOLVER.BASE_LR: 0.02
24 | cfg.SOLVER.GAMMA: 0.1
25 | for cur_iter in [0, 59] use 0.02 = 0.02 * 0.1 ** 0
26 | in [60, 79] use 0.002 = 0.02 * 0.1 ** 1
27 | in [80, inf] use 0.0002 = 0.02 * 0.1 ** 2
28 | """
29 | ind = get_step_index(cur_iter)
30 | return base_lr * gamma ** ind
31 |
32 | def get_lr_at_iter(it,warm_up_iters=500,warm_up_factor=0.3333333333333333,warm_up_method='linear'):
33 | """Get the learning rate at iteration it according to the cfg.SOLVER
34 | settings.
35 | """
36 | lr = lr_func_steps_with_decay(it)
37 | if it < warm_up_iters:
38 | if warm_up_method == 'linear':
39 | alpha = it / warm_up_iters
40 | warm_up_factor = warm_up_factor * (1 - alpha) + alpha
41 | elif warm_up_method != 'constant':
42 | raise KeyError('Unknown WARM_UP_METHOD: {}'.format(warm_up_method))
43 | lr *= warm_up_factor
44 | return lr
--------------------------------------------------------------------------------
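
With the defaults hard-coded above (base_lr = 0.01, steps [0, 240000, 320000], gamma = 0.1, 500 linear warm-up iterations with factor 1/3), the resulting schedule can be probed directly; a small sketch, assuming lib/ is on sys.path:

    import sys
    sys.path.insert(0, "lib/")
    from utils.solver import get_lr_at_iter

    for it in [0, 250, 500, 100000, 250000, 330000]:
        print(it, round(get_lr_at_iter(it), 6))
    # 0       0.003333  (start of warm-up: 0.01 * 1/3)
    # 250     0.006667  (half-way through the linear warm-up)
    # 500     0.01      (warm-up finished, first LR step)
    # 100000  0.01
    # 250000  0.001     (0.01 * 0.1 after iteration 240000)
    # 330000  0.0001    (0.01 * 0.01 after iteration 320000)
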
/lib/utils/timer.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2017-present, Facebook, Inc.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | ##############################################################################
15 | #
16 | # Based on:
17 | # --------------------------------------------------------
18 | # Fast R-CNN
19 | # Copyright (c) 2015 Microsoft
20 | # Licensed under The MIT License [see LICENSE for details]
21 | # Written by Ross Girshick
22 | # --------------------------------------------------------
23 |
24 | """Timing related functions."""
25 |
26 | from __future__ import absolute_import
27 | from __future__ import division
28 | from __future__ import print_function
29 | from __future__ import unicode_literals
30 |
31 | import time
32 |
33 |
34 | class Timer(object):
35 | """A simple timer."""
36 |
37 | def __init__(self):
38 | self.reset()
39 |
40 | def tic(self):
41 | # using time.time instead of time.clock because time.clock
42 | # does not normalize for multithreading
43 | self.start_time = time.time()
44 |
45 | def toc(self, average=True):
46 | self.diff = time.time() - self.start_time
47 | self.total_time += self.diff
48 | self.calls += 1
49 | self.average_time = self.total_time / self.calls
50 | if average:
51 | return self.average_time
52 | else:
53 | return self.diff
54 |
55 | def reset(self):
56 | self.total_time = 0.
57 | self.calls = 0
58 | self.start_time = 0.
59 | self.diff = 0.
60 | self.average_time = 0.
61 |
--------------------------------------------------------------------------------
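
Usage follows the usual tic/toc pattern; a brief sketch:

    import time
    from utils.timer import Timer   # assuming lib/ is on sys.path

    t = Timer()
    for _ in range(3):
        t.tic()
        time.sleep(0.01)            # stand-in for the work being timed
        t.toc(average=False)        # returns the duration of this call only
    print(t.average_time, t.calls)  # running average over the 3 calls, and the call count
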
/lib/utils/training_stats.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python2
2 |
3 | # Copyright (c) 2017-present, Facebook, Inc.
4 | #
5 | # Licensed under the Apache License, Version 2.0 (the "License");
6 | # you may not use this file except in compliance with the License.
7 | # You may obtain a copy of the License at
8 | #
9 | # http://www.apache.org/licenses/LICENSE-2.0
10 | #
11 | # Unless required by applicable law or agreed to in writing, software
12 | # distributed under the License is distributed on an "AS IS" BASIS,
13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14 | # See the License for the specific language governing permissions and
15 | # limitations under the License.
16 | ##############################################################################
17 |
18 | """Utilities for training."""
19 |
20 | from __future__ import absolute_import
21 | from __future__ import division
22 | from __future__ import print_function
23 | from __future__ import unicode_literals
24 |
25 | import datetime
26 | import numpy as np
27 |
28 | #from caffe2.python import utils as c2_py_utils
29 | #from core.config import cfg
30 | from utils.logging import log_json_stats
31 | from utils.logging import SmoothedValue
32 | from utils.timer import Timer
33 |
34 |
35 | class TrainingStats(object):
36 | """Track vital training statistics."""
37 |
38 | def __init__(self, metrics, losses,
39 | solver_max_iters):
40 | self.solver_max_iters = solver_max_iters
41 | # Window size for smoothing tracked values (with median filtering)
42 | self.win_sz = 20
43 | # Output logging period in SGD iterations
44 | self.log_period = 20
45 | self.smoothed_losses_and_metrics = {
46 | key: SmoothedValue(self.win_sz)
47 | for key in losses + metrics
48 | }
49 | self.losses_and_metrics = {
50 | key: 0
51 | for key in losses + metrics
52 | }
53 | self.smoothed_total_loss = SmoothedValue(self.win_sz)
54 | self.smoothed_mb_qsize = SmoothedValue(self.win_sz)
55 | self.iter_total_loss = np.nan
56 | self.iter_timer = Timer()
57 | self.metrics = metrics
58 | self.losses = losses
59 |
60 | def IterTic(self):
61 | self.iter_timer.tic()
62 |
63 | def IterToc(self):
64 | return self.iter_timer.toc(average=False)
65 |
66 | def ResetIterTimer(self):
67 | self.iter_timer.reset()
68 |
69 | def UpdateIterStats(self,losses_dict, metrics_dict):
70 | """Update tracked iteration statistics."""
71 | for k in self.losses_and_metrics.keys():
72 | if k in self.losses: # if loss
73 | self.losses_and_metrics[k] = losses_dict[k]
74 | else: # if metric
75 | self.losses_and_metrics[k] = metrics_dict[k]
76 |
77 | for k, v in self.smoothed_losses_and_metrics.items():
78 | v.AddValue(self.losses_and_metrics[k])
79 | #import pdb; pdb.set_trace()
80 | self.iter_total_loss = np.sum(
81 | np.array([self.losses_and_metrics[k] for k in self.losses])
82 | )
83 | self.smoothed_total_loss.AddValue(self.iter_total_loss)
84 | self.smoothed_mb_qsize.AddValue(
85 | #self.model.roi_data_loader._minibatch_queue.qsize()
86 | 64
87 | )
88 |
89 | def LogIterStats(self, cur_iter, lr):
90 | """Log the tracked statistics."""
91 | if (cur_iter % self.log_period == 0 or
92 | cur_iter == self.solver_max_iters - 1):
93 | stats = self.GetStats(cur_iter, lr)
94 | log_json_stats(stats)
95 |
96 | def GetStats(self, cur_iter, lr):
97 | eta_seconds = self.iter_timer.average_time * (
98 | self.solver_max_iters - cur_iter
99 | )
100 | eta = str(datetime.timedelta(seconds=int(eta_seconds)))
101 | #mem_stats = c2_py_utils.GetGPUMemoryUsageStats()
102 | #mem_usage = np.max(mem_stats['max_by_gpu'][:cfg.NUM_GPUS])
103 | stats = dict(
104 | iter=cur_iter,
105 | lr="{:.6f}".format(float(lr)),
106 | time="{:.6f}".format(self.iter_timer.average_time),
107 | loss="{:.6f}".format(self.smoothed_total_loss.GetMedianValue()),
108 | eta=eta,
109 | #mb_qsize=int(np.round(self.smoothed_mb_qsize.GetMedianValue())),
110 | #mem=int(np.ceil(mem_usage / 1024 / 1024))
111 | )
112 | for k, v in self.smoothed_losses_and_metrics.items():
113 | stats[k] = "{:.6f}".format(v.GetMedianValue())
114 | return stats
--------------------------------------------------------------------------------
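
A schematic usage sketch of TrainingStats with placeholder loss/metric values; the keys match the ones train_fast.py passes in, everything else is illustrative:

    from utils.training_stats import TrainingStats   # assuming lib/ is on sys.path

    stats = TrainingStats(metrics=['accuracy_cls'],
                          losses=['loss_cls', 'loss_bbox'],
                          solver_max_iters=360000)
    for it in range(100):
        stats.IterTic()
        # ... forward/backward pass would go here ...
        losses = {'loss_cls': 0.5, 'loss_bbox': 0.2}   # placeholder scalars
        metrics = {'accuracy_cls': 0.9}                # placeholder scalar
        stats.IterToc()
        stats.UpdateIterStats(losses, metrics)
        stats.LogIterStats(it, lr=0.01)   # emits a JSON stats line every 20 iterations
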
/lib/utils/utils.py:
--------------------------------------------------------------------------------
1 | import os, errno  # needed by create_file_path below
2 | import torch
3 | from torch.autograd import Variable
4 | def normalize_axis(x,L):
5 | return (x-1-(L-1)/2)*2/(L-1)
6 |
7 | def unnormalize_axis(x,L):
8 | return x*(L-1)/2+1+(L-1)/2
9 |
10 | def expand_dim(tensor,dim,desired_dim_len):
11 | sz = list(tensor.size())
12 | sz[dim]=desired_dim_len
13 | return tensor.expand(tuple(sz))
14 |
15 | def create_file_path(filename):
16 | if not os.path.exists(os.path.dirname(filename)):
17 | try:
18 | os.makedirs(os.path.dirname(filename))
19 | except OSError as exc: # Guard against race condition
20 | if exc.errno != errno.EEXIST:
21 | raise
22 |
23 | def to_cuda(x):
24 | if isinstance(x,dict):
25 | return {key: to_cuda(x[key]) for key in x.keys()}
26 | if isinstance(x,list):
27 | return [y.cuda() for y in x]
28 | return x.cuda()
29 |
30 | def to_cuda_variable(x,volatile=True):
31 | if isinstance(x,dict):
32 | return {key: to_cuda_variable(x[key],volatile=volatile) for key in x.keys()}
33 | if isinstance(x,list):
34 | return [to_cuda_variable(y) for y in x]
35 | if isinstance(x, (int, float)):
36 | return x
37 | if isinstance(x, torch.Tensor):
38 | if torch.__version__[:3]=="0.4" or volatile==False:
39 | return Variable(x.cuda())
40 | else:
41 | return Variable(x.cuda(),volatile=True)
42 |
43 |
44 | def parse_th_to_caffe2(terms,i=0,parsed=''):
45 | # Convert PyTorch ResNet weight names to caffe2 weight names
46 | if i==0:
47 | if terms[i]=='conv1':
48 | parsed='conv1'
49 | elif terms[i]=='bn1':
50 | parsed='res_conv1'
51 | elif terms[i].startswith('layer'):
52 | parsed='res'+str(int(terms[i][-1])+1)
53 | else:
54 | if terms[i]=='weight' and (terms[i-1].startswith('conv') or terms[i-1]=='0'):
55 | parsed+='_w'
56 | elif terms[i]=='weight' and (terms[i-1].startswith('bn') or terms[i-1]=='1'):
57 | parsed+='_bn_s'
58 | elif terms[i]=='bias' and (terms[i-1].startswith('bn') or terms[i-1]=='1'):
59 | parsed+='_bn_b'
60 | elif terms[i-1].startswith('layer'):
61 | parsed+='_'+terms[i]
62 | elif terms[i].startswith('conv') or terms[i].startswith('bn'):
63 | parsed+='_branch2'+chr(96+int(terms[i][-1]))
64 | elif terms[i]=='downsample':
65 | parsed+='_branch1'
66 | # increase counter
67 | i+=1
68 | # do recursion
69 | if i==len(terms):
70 | return parsed
71 | return parse_th_to_caffe2(terms,i,parsed)
72 |
--------------------------------------------------------------------------------
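
parse_th_to_caffe2 above maps PyTorch ResNet parameter names to the corresponding caffe2 blob names; a few concrete mappings, for illustration:

    from utils.utils import parse_th_to_caffe2   # assuming lib/ is on sys.path

    print(parse_th_to_caffe2('conv1.weight'.split('.')))                  # conv1_w
    print(parse_th_to_caffe2('bn1.bias'.split('.')))                      # res_conv1_bn_b
    print(parse_th_to_caffe2('layer1.0.conv1.weight'.split('.')))         # res2_0_branch2a_w
    print(parse_th_to_caffe2('layer3.1.downsample.0.weight'.split('.')))  # res4_1_branch1_w
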
/lib/utils_cython/build_cython.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2017-present, Facebook, Inc.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | ##############################################################################
15 |
16 | from __future__ import absolute_import
17 | from __future__ import division
18 | from __future__ import print_function
19 |
20 | from Cython.Build import cythonize
21 | from setuptools import Extension
22 | from setuptools import setup
23 |
24 | import numpy as np
25 |
26 | _NP_INCLUDE_DIRS = np.get_include()
27 |
28 |
29 | # Extension modules
30 | ext_modules = [
31 | Extension(
32 | name='cython_bbox',
33 | sources=[
34 | 'cython_bbox.pyx'
35 | ],
36 | extra_compile_args=[
37 | '-Wno-cpp'
38 | ],
39 | include_dirs=[
40 | _NP_INCLUDE_DIRS
41 | ]
42 | ),
43 | Extension(
44 | name='cython_nms',
45 | sources=[
46 | 'cython_nms.pyx'
47 | ],
48 | extra_compile_args=[
49 | '-Wno-cpp'
50 | ],
51 | include_dirs=[
52 | _NP_INCLUDE_DIRS
53 | ]
54 | )
55 | ]
56 |
57 | setup(
58 | name='Detectron',
59 | ext_modules=cythonize(ext_modules)
60 | )
--------------------------------------------------------------------------------
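
The two Cython extensions above are presumably built in place with the standard setuptools command, e.g. `python build_cython.py build_ext --inplace` run from inside lib/utils_cython/ (an assumption; follow the repository's own build instructions if they differ). Once built, cython_bbox and cython_nms can be imported as shown in the sketches below.
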
/lib/utils_cython/cython_bbox.pyx:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2017-present, Facebook, Inc.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | ##############################################################################
15 | #
16 | # Based on:
17 | # --------------------------------------------------------
18 | # Fast R-CNN
19 | # Copyright (c) 2015 Microsoft
20 | # Licensed under The MIT License [see LICENSE for details]
21 | # Written by Sergey Karayev
22 | # --------------------------------------------------------
23 |
24 | cimport cython
25 | import numpy as np
26 | cimport numpy as np
27 |
28 | DTYPE = np.float32
29 | ctypedef np.float32_t DTYPE_t
30 |
31 | @cython.boundscheck(False)
32 | def bbox_overlaps(
33 | np.ndarray[DTYPE_t, ndim=2] boxes,
34 | np.ndarray[DTYPE_t, ndim=2] query_boxes):
35 | """
36 | Parameters
37 | ----------
38 | boxes: (N, 4) ndarray of float
39 | query_boxes: (K, 4) ndarray of float
40 | Returns
41 | -------
42 | overlaps: (N, K) ndarray of overlap between boxes and query_boxes
43 | """
44 | cdef unsigned int N = boxes.shape[0]
45 | cdef unsigned int K = query_boxes.shape[0]
46 | cdef np.ndarray[DTYPE_t, ndim=2] overlaps = np.zeros((N, K), dtype=DTYPE)
47 | cdef DTYPE_t iw, ih, box_area
48 | cdef DTYPE_t ua
49 | cdef unsigned int k, n
50 | with nogil:
51 | for k in range(K):
52 | box_area = (
53 | (query_boxes[k, 2] - query_boxes[k, 0] + 1) *
54 | (query_boxes[k, 3] - query_boxes[k, 1] + 1)
55 | )
56 | for n in range(N):
57 | iw = (
58 | min(boxes[n, 2], query_boxes[k, 2]) -
59 | max(boxes[n, 0], query_boxes[k, 0]) + 1
60 | )
61 | if iw > 0:
62 | ih = (
63 | min(boxes[n, 3], query_boxes[k, 3]) -
64 | max(boxes[n, 1], query_boxes[k, 1]) + 1
65 | )
66 | if ih > 0:
67 | ua = float(
68 | (boxes[n, 2] - boxes[n, 0] + 1) *
69 | (boxes[n, 3] - boxes[n, 1] + 1) +
70 | box_area - iw * ih
71 | )
72 | overlaps[n, k] = iw * ih / ua
73 | return overlaps
--------------------------------------------------------------------------------
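
Once the extension is built, bbox_overlaps returns the pairwise IoU matrix using the inclusive '+1' box convention seen above; a small sketch with made-up boxes:

    import numpy as np
    from cython_bbox import bbox_overlaps   # assumes the extension was built in lib/utils_cython

    boxes = np.array([[0., 0., 9., 9.]], dtype=np.float32)       # a 10x10 box (inclusive coords)
    query = np.array([[0., 0., 9., 9.],
                      [5., 5., 14., 14.]], dtype=np.float32)
    print(bbox_overlaps(boxes, query))
    # [[1.0, ~0.1429]] -> intersection 5*5 = 25, union 100 + 100 - 25 = 175
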
/lib/utils_cython/cython_nms.pyx:
--------------------------------------------------------------------------------
1 | # Copyright (c) 2017-present, Facebook, Inc.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | ##############################################################################
15 | #
16 | # Based on:
17 | # --------------------------------------------------------
18 | # Fast R-CNN
19 | # Copyright (c) 2015 Microsoft
20 | # Licensed under The MIT License [see LICENSE for details]
21 | # Written by Ross Girshick
22 | # --------------------------------------------------------
23 |
24 | cimport cython
25 | import numpy as np
26 | cimport numpy as np
27 |
28 | cdef inline np.float32_t max(np.float32_t a, np.float32_t b) nogil:
29 | return a if a >= b else b
30 |
31 | cdef inline np.float32_t min(np.float32_t a, np.float32_t b) nogil:
32 | return a if a <= b else b
33 |
34 | @cython.boundscheck(False)
35 | @cython.cdivision(True)
36 | @cython.wraparound(False)
37 | def nms(np.ndarray[np.float32_t, ndim=2] dets, np.float32_t thresh):
38 | cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:, 0]
39 | cdef np.ndarray[np.float32_t, ndim=1] y1 = dets[:, 1]
40 | cdef np.ndarray[np.float32_t, ndim=1] x2 = dets[:, 2]
41 | cdef np.ndarray[np.float32_t, ndim=1] y2 = dets[:, 3]
42 | cdef np.ndarray[np.float32_t, ndim=1] scores = dets[:, 4]
43 |
44 | cdef np.ndarray[np.float32_t, ndim=1] areas = (x2 - x1 + 1) * (y2 - y1 + 1)
45 | cdef np.ndarray[np.int_t, ndim=1] order = scores.argsort()[::-1]
46 |
47 | cdef int ndets = dets.shape[0]
48 | cdef np.ndarray[np.int_t, ndim=1] suppressed = \
49 | np.zeros((ndets), dtype=np.int)
50 |
51 | # nominal indices
52 | cdef int _i, _j
53 | # sorted indices
54 | cdef int i, j
55 | # temp variables for box i's (the box currently under consideration)
56 | cdef np.float32_t ix1, iy1, ix2, iy2, iarea
57 | # variables for computing overlap with box j (lower scoring box)
58 | cdef np.float32_t xx1, yy1, xx2, yy2
59 | cdef np.float32_t w, h
60 | cdef np.float32_t inter, ovr
61 |
62 | with nogil:
63 | for _i in range(ndets):
64 | i = order[_i]
65 | if suppressed[i] == 1:
66 | continue
67 | ix1 = x1[i]
68 | iy1 = y1[i]
69 | ix2 = x2[i]
70 | iy2 = y2[i]
71 | iarea = areas[i]
72 | for _j in range(_i + 1, ndets):
73 | j = order[_j]
74 | if suppressed[j] == 1:
75 | continue
76 | xx1 = max(ix1, x1[j])
77 | yy1 = max(iy1, y1[j])
78 | xx2 = min(ix2, x2[j])
79 | yy2 = min(iy2, y2[j])
80 | w = max(0.0, xx2 - xx1 + 1)
81 | h = max(0.0, yy2 - yy1 + 1)
82 | inter = w * h
83 | ovr = inter / (iarea + areas[j] - inter)
84 | if ovr >= thresh:
85 | suppressed[j] = 1
86 |
87 | return np.where(suppressed == 0)[0]
88 |
89 | # ----------------------------------------------------------
90 | # Soft-NMS: Improving Object Detection With One Line of Code
91 | # Copyright (c) University of Maryland, College Park
92 | # Licensed under The MIT License [see LICENSE for details]
93 | # Written by Navaneeth Bodla and Bharat Singh
94 | # ----------------------------------------------------------
95 | @cython.boundscheck(False)
96 | @cython.cdivision(True)
97 | @cython.wraparound(False)
98 | def soft_nms(
99 | np.ndarray[float, ndim=2] boxes_in,
100 | float sigma=0.5,
101 | float Nt=0.3,
102 | float threshold=0.001,
103 | unsigned int method=0
104 | ):
105 | boxes = boxes_in.copy()
106 | cdef unsigned int N = boxes.shape[0]
107 | cdef float iw, ih, box_area
108 | cdef float ua
109 | cdef int pos = 0
110 | cdef float maxscore = 0
111 | cdef int maxpos = 0
112 | cdef float x1, x2, y1, y2, tx1, tx2, ty1, ty2, ts, area, weight, ov
113 | inds = np.arange(N)
114 |
115 | for i in range(N):
116 | maxscore = boxes[i, 4]
117 | maxpos = i
118 |
119 | tx1 = boxes[i,0]
120 | ty1 = boxes[i,1]
121 | tx2 = boxes[i,2]
122 | ty2 = boxes[i,3]
123 | ts = boxes[i,4]
124 | ti = inds[i]
125 |
126 | pos = i + 1
127 | # get max box
128 | while pos < N:
129 | if maxscore < boxes[pos, 4]:
130 | maxscore = boxes[pos, 4]
131 | maxpos = pos
132 | pos = pos + 1
133 |
134 | # add max box as a detection
135 | boxes[i,0] = boxes[maxpos,0]
136 | boxes[i,1] = boxes[maxpos,1]
137 | boxes[i,2] = boxes[maxpos,2]
138 | boxes[i,3] = boxes[maxpos,3]
139 | boxes[i,4] = boxes[maxpos,4]
140 | inds[i] = inds[maxpos]
141 |
142 | # swap ith box with position of max box
143 | boxes[maxpos,0] = tx1
144 | boxes[maxpos,1] = ty1
145 | boxes[maxpos,2] = tx2
146 | boxes[maxpos,3] = ty2
147 | boxes[maxpos,4] = ts
148 | inds[maxpos] = ti
149 |
150 | tx1 = boxes[i,0]
151 | ty1 = boxes[i,1]
152 | tx2 = boxes[i,2]
153 | ty2 = boxes[i,3]
154 | ts = boxes[i,4]
155 |
156 | pos = i + 1
157 | # NMS iterations, note that N changes if detection boxes fall below
158 | # threshold
159 | while pos < N:
160 | x1 = boxes[pos, 0]
161 | y1 = boxes[pos, 1]
162 | x2 = boxes[pos, 2]
163 | y2 = boxes[pos, 3]
164 | s = boxes[pos, 4]
165 |
166 | area = (x2 - x1 + 1) * (y2 - y1 + 1)
167 | iw = (min(tx2, x2) - max(tx1, x1) + 1)
168 | if iw > 0:
169 | ih = (min(ty2, y2) - max(ty1, y1) + 1)
170 | if ih > 0:
171 | ua = float((tx2 - tx1 + 1) * (ty2 - ty1 + 1) + area - iw * ih)
172 | ov = iw * ih / ua #iou between max box and detection box
173 |
174 | if method == 1: # linear
175 | if ov > Nt:
176 | weight = 1 - ov
177 | else:
178 | weight = 1
179 | elif method == 2: # gaussian
180 | weight = np.exp(-(ov * ov)/sigma)
181 | else: # original NMS
182 | if ov > Nt:
183 | weight = 0
184 | else:
185 | weight = 1
186 |
187 | boxes[pos, 4] = weight*boxes[pos, 4]
188 |
189 | # if the box score falls below the threshold, discard the box by
190 | # swapping it with the last box and updating N
191 | if boxes[pos, 4] < threshold:
192 | boxes[pos,0] = boxes[N-1, 0]
193 | boxes[pos,1] = boxes[N-1, 1]
194 | boxes[pos,2] = boxes[N-1, 2]
195 | boxes[pos,3] = boxes[N-1, 3]
196 | boxes[pos,4] = boxes[N-1, 4]
197 | inds[pos] = inds[N-1]
198 | N = N - 1
199 | pos = pos - 1
200 |
201 | pos = pos + 1
202 |
203 | return boxes[:N], inds[:N]
--------------------------------------------------------------------------------
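
A worked example of the greedy NMS above, with made-up detections (last column is the score):

    import numpy as np
    from cython_nms import nms   # assumes the extension was built in lib/utils_cython

    dets = np.array([[ 0.,  0., 10., 10., 0.9],   # kept: highest score
                     [ 1.,  1., 11., 11., 0.8],   # IoU with the first box ~0.70 >= 0.5 -> suppressed
                     [20., 20., 30., 30., 0.7]],  # no overlap -> kept
                    dtype=np.float32)
    keep = nms(dets, 0.5)
    print(keep)   # [0 2]
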
/train_fast.py:
--------------------------------------------------------------------------------
1 | import argparse
2 |
3 | import torch
4 | from torch.autograd import Variable
5 | from torch.utils.data import DataLoader
6 |
7 | import numpy as np
8 |
9 | import sys
10 | sys.path.insert(0, "lib/")
11 | from data.coco_dataset import CocoDataset
12 | from utils.preprocess_sample import preprocess_sample
13 | from utils.collate_custom import collate_custom
14 | from utils.utils import to_cuda, to_variable, to_cuda_variable
15 | from model.detector import detector
16 | from model.loss import accuracy, smooth_L1
17 | from utils.solver import adjust_learning_rate,get_lr_at_iter
18 | from utils.training_stats import TrainingStats
19 | from torch.nn.utils.clip_grad import clip_grad_norm
20 | import torch.nn as nn
21 | from utils.data_parallel import data_parallel
22 | from torch.nn.functional import cross_entropy
23 |
24 |
25 | parser = argparse.ArgumentParser(description='PyTorch Fast RCNN Training')
26 | # MODEL
27 | parser.add_argument('--cnn-arch', default='resnet50')
28 | parser.add_argument('--cnn-pkl', default='files/pretrained_base_cnn/R-50.pkl')
29 | parser.add_argument('--cnn-mapping', default='files/mapping_files/resnet50_mapping.npy')
30 | # DATASET
31 | # parser.add_argument('--dset-path', default=('datasets/data/coco/coco_train2014',
32 | # 'datasets/data/coco/coco_val2014/'))
33 | # parser.add_argument('--dset-rois', default=('files/proposal_files/coco_2014_train/rpn_proposals.pkl',
34 | # 'files/proposal_files/coco_2014_valminusminival/rpn_proposals.pkl'))
35 | # parser.add_argument('--dset-ann', default=('datasets/data/coco/annotations/instances_train2014.json',
36 | # 'datasets/data/coco/annotations/instances_valminusminival2014.json'))
37 | # parser.add_argument('--dset-path', default=('datasets/data/coco/coco_train2014',
38 | # ))
39 | # parser.add_argument('--dset-rois', default=('files/proposal_files/coco_2014_train/rpn_proposals.pkl',
40 | # ))
41 | # parser.add_argument('--dset-ann', default=('datasets/data/coco/annotations/instances_train2014.json',
42 | # ))
43 |
44 | # use MINIVAL for debugging as it loads fast
45 | parser.add_argument('--dset-path', default=('datasets/data/coco/coco_val2014',
46 | ))
47 | parser.add_argument('--dset-rois', default=('files/proposal_files/coco_2014_minival/rpn_proposals.pkl',
48 | ))
49 | parser.add_argument('--dset-ann', default=('datasets/data/coco/annotations/instances_minival2014.json',
50 | ))
51 | # DATALOADER
52 |
53 | parser.add_argument('-j', '--workers', default=4, type=int, metavar='N',
54 | help='number of data loading workers (default: 4)')
55 | # SOLVER
56 | parser.add_argument('--base-lr', default=0.01, type=float)
57 | parser.add_argument('--lr-steps', default=[0, 240000, 320000])
58 | parser.add_argument('--momentum', default=0.9, type=float)
59 | parser.add_argument('--wd', default=1e-4, type=float, help='weight decay (default: 1e-4)')
60 | # TRAINING
61 | parser.add_argument('--max-iter', default=360000, type=int)
62 | parser.add_argument('--batch-size', default=1, type=int)
63 | parser.add_argument('--start-iter', default=0, type=int, metavar='N',
64 | help='manual iter number (useful on restarts)')
65 | parser.add_argument('--resume', default='', type=str, metavar='PATH',
66 | help='path to latest checkpoint (default: none)')
67 | parser.add_argument('--checkpoint-period', default=20000, type=int)
68 | parser.add_argument('--checkpoint-fn', default='files/results/fast.pth.tar')
69 |
70 |
71 | def main():
72 | args = parser.parse_args()
73 | print(args)
74 | # for now, batch_size should match number of gpus
75 | assert(args.batch_size==torch.cuda.device_count())
76 |
77 | # create model
78 | model = detector(arch=args.cnn_arch,
79 | base_cnn_pkl_file=args.cnn_pkl,
80 | mapping_file=args.cnn_mapping,
81 | output_prob=False,
82 | return_rois=False,
83 | return_img_features=False)
84 | model = model.cuda()
85 |
86 | # freeze part of the net
87 | stop_grad=['conv1','bn1','relu','maxpool','layer1']
88 | model_no_grad=torch.nn.Sequential(*[getattr(model.model,l) for l in stop_grad])
89 | for param in model_no_grad.parameters():
90 | param.requires_grad = False
91 |
92 | # define optimizer
93 | optimizer = torch.optim.SGD(filter(lambda p: p.requires_grad, model.parameters()),
94 | lr=args.base_lr,
95 | momentum=args.momentum,
96 | weight_decay=args.wd)
97 |
98 | # create dataset
99 | train_dataset = CocoDataset(ann_file=args.dset_ann,
100 | img_dir=args.dset_path,
101 | proposal_file=args.dset_rois,
102 | mode='train',
103 | sample_transform=preprocess_sample(target_sizes=[800],
104 | sample_proposals_for_training=True))
105 | train_loader = DataLoader(train_dataset, batch_size=args.batch_size,shuffle=False, num_workers=args.workers, collate_fn=collate_custom)
106 |
107 | training_stats = TrainingStats(losses=['loss_cls','loss_bbox'],
108 | metrics=['accuracy_cls'],
109 | solver_max_iters=args.max_iter)
110 |
111 | iter = args.start_iter
112 |
113 | print('starting training')
114 |
115 | while iter < args.max_iter:
192 | break
193 | # advance iteration
194 | iter+=1
195 | #import pdb; pdb.set_trace()
196 |
197 | def save_checkpoint(state, filename='checkpoint.pth.tar'):
198 | torch.save(state, filename)
199 |
200 | if __name__ == '__main__':
201 | main()
202 |
--------------------------------------------------------------------------------