├── .gitignore ├── README.md ├── downoad_model.py ├── forward.py ├── result1.jpg ├── result2.jpg ├── roi_pooling_2d.py ├── sample.jpg ├── sample2.jpg └── vgg.py /.gitignore: -------------------------------------------------------------------------------- 1 | 2 | *.pyc 3 | 4 | fast_rcnn_vgg_voc.model 5 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Fast R-CNN without Caffe or a GPU! 2 | This repo implements a simple Fast R-CNN. You can use it to detect the 20 object classes defined in the PASCAL VOC dataset. Only detection is supported for now; training is not. 3 | 4 | The idea is to understand how R-CNN works by reading actual code. I just wanted a simple implementation. However, I found that the currently available implementations are too complicated, outdated, hard to set up (e.g. installing Caffe), or require GPUs to try. So I made one myself. 5 | 6 | Most of the code is copied from Dr. Saito's implementation: https://github.com/mitmul/chainer-fast-rcnn. 7 | I just removed the Caffe dependency, removed the GPU limitation, updated it to be compatible with the latest Chainer, and made the converted model available. Many thanks to Dr. Saito! He is the professor who taught me deep learning. 8 | I also copied the non-maximum suppression from the Fast R-CNN repo: https://github.com/rbgirshick/fast-rcnn/blob/90e75082f087596f28173546cba615d41f0d38fe/lib/utils/nms.py#L10-L37 9 | 10 | Update: Dr. Saito published a Faster R-CNN implementation after I opened this repo. You should check it out: https://github.com/mitmul/chainer-faster-rcnn 11 | 12 | ## Requirements and environmental setup 13 | - [OpenCV 3 with python bindings](http://opencv.org) 14 | - [Chainer 1.9](http://chainer.org) 15 | - [dlib v18.18](https://github.com/davisking/dlib) 16 | 17 | Some commands and hints that might help: 18 | ``` 19 | # Get and install Anaconda. You might want to check for the latest link. 
20 | wget https://3230d63b5fc54e62148e-c95ac804525aac4b6dba79b00b39d1d3.ssl.cf1.rackcdn.com/Anaconda2-2.4.1-Linux-x86_64.sh 21 | bash Anaconda2-2.4.1-Linux-x86_64.sh -b 22 | echo 'export PATH=$HOME/anaconda/bin:$PATH' >> .bashrc 23 | echo 'export PYTHONPATH=$HOME/anaconda/lib/python2.7/site-packages:$PYTHONPATH' >> .bashrc 24 | source .bashrc 25 | conda update conda -y 26 | # install chainer 27 | pip install chainer==1.9 28 | # install dlib 29 | conda install -c menpo dlib=18.18 30 | # install opencv3 31 | conda uninstall -c menpo opencv # in case you have OpenCV 2 32 | conda install -c menpo opencv3 33 | ``` 34 | If you get the following error, you are using OpenCV 2. Upgrade to version 3. 35 | ``` 36 | Traceback (most recent call last): 37 | File "forward.py", line 176, in <module> 38 | result = draw_result(orig_image, im_scale, clss, bbox, orig_rects,args.nms_thresh, args.conf) 39 | File "forward.py", line 122, in draw_result 40 | (0, 0, 255), 2, cv.LINE_AA) 41 | AttributeError: 'module' object has no attribute 'LINE_AA' 42 | ``` 43 | 44 | ## Download model 45 | Download the model pretrained on the PASCAL VOC dataset. 46 | The Chainer model is converted from the official Fast R-CNN repository () using a Chainer replication (). 47 | You need to download it manually: https://drive.google.com/open?id=0B046sNk0DhCDNW5oMnVGaFdnWkU 48 | You will get a file: fast_rcnn_vgg_voc.model 49 | 50 | ## How to use 51 | First prepare a sample image, then run: 52 | ``` 53 | python forward.py --img_fn sample.jpg --out_fn result.jpg 54 | # if you want to use a GPU 55 | python forward.py --img_fn sample.jpg --out_fn result.jpg --gpu_id 0 56 | ``` 57 | ### Samples 58 | ![](result1.jpg) 59 | Source: 'Overstekend wild' St. Janskerkhof Den Bosch © FaceMePLS (https://www.flickr.com/photos/faceme/5891724192) 60 | ![](result2.jpg) 61 | Source: My personal photo. My living room. 
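## How the boxes are decoded

For readers following `forward.py`, the step in `draw_result` that turns a region proposal plus the network's four regression outputs into a final box can be sketched roughly as below. This is a minimal sketch, not the repo's code: the function name `decode_box` and its argument layout are hypothetical, and it keeps the deltas as floats, whereas `forward.py` passes them through `int` first.

```python
import numpy as np


def decode_box(roi, deltas):
    """Apply (dx, dy, dw, dh) regression deltas to a proposal box.

    `roi` is (x1, y1, x2, y2); `deltas` is (dx, dy, dw, dh).
    Hypothetical helper mirroring the arithmetic in draw_result;
    the deltas are kept as floats here, which is an assumption.
    """
    x1, y1, x2, y2 = roi
    width = x2 - x1
    height = y2 - y1
    center_x = x1 + 0.5 * width
    center_y = y1 + 0.5 * height

    dx, dy, dw, dh = deltas
    # Shift the centre by a fraction of the proposal size.
    new_center_x = dx * width + center_x
    new_center_y = dy * height + center_y
    # Scale width and height exponentially.
    new_width = np.exp(dw) * width
    new_height = np.exp(dh) * height

    return (new_center_x - 0.5 * new_width,
            new_center_y - 0.5 * new_height,
            new_center_x + 0.5 * new_width,
            new_center_y + 0.5 * new_height)
```

With all four deltas equal to zero the proposal comes back unchanged. A positive `dx` moves the box to the right by that fraction of its width.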
62 | -------------------------------------------------------------------------------- /downoad_model.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | import urllib 4 | url="https://googledrive.com/host/0B046sNk0DhCDUk9YeklwOFczc0E/fast_rcnn_vgg_voc.model" 5 | filename="fast_rcnn_vgg_voc.model" 6 | urllib.urlretrieve(url,filename) -------------------------------------------------------------------------------- /forward.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | import sys 5 | import time 6 | import dlib 7 | import argparse 8 | import cv2 as cv 9 | import numpy as np 10 | import vgg 11 | import chainer 12 | from chainer import cuda, Function, gradient_check, Variable, optimizers, serializers, utils 13 | from chainer import Link, Chain, ChainList 14 | import chainer.functions as F 15 | import chainer.links as L 16 | 17 | CLASSES = ('__background__', 18 | 'aeroplane', 'bicycle', 'bird', 'boat', 19 | 'bottle', 'bus', 'car', 'cat', 'chair', 20 | 'cow', 'diningtable', 'dog', 'horse', 21 | 'motorbike', 'person', 'pottedplant', 22 | 'sheep', 'sofa', 'train', 'tvmonitor') 23 | PIXEL_MEANS = np.array([102.9801, 115.9465, 122.7717], dtype=np.float32) 24 | 25 | 26 | def img_preprocessing(orig_img, pixel_means, max_size=1000, scale=600): 27 | img = orig_img.astype(np.float32, copy=True) 28 | img -= pixel_means 29 | im_size_min = np.min(img.shape[0:2]) 30 | im_size_max = np.max(img.shape[0:2]) 31 | im_scale = float(scale) / float(im_size_min) 32 | if np.rint(im_scale * im_size_max) > max_size: 33 | im_scale = float(max_size) / float(im_size_max) 34 | img = cv.resize(img, None, None, fx=im_scale, fy=im_scale, 35 | interpolation=cv.INTER_LINEAR) 36 | 37 | return img.transpose([2, 0, 1]).astype(np.float32), im_scale 38 | 39 | 40 | def get_bboxes(orig_img, im_scale, min_size, 
dedup_boxes=1. / 16): 41 | rects = [] 42 | dlib.find_candidate_object_locations(orig_img, rects, min_size=min_size) 43 | rects = [[0, d.left(), d.top(), d.right(), d.bottom()] for d in rects] 44 | rects = np.asarray(rects, dtype=np.float32) 45 | 46 | # bbox pre-processing 47 | rects *= im_scale 48 | v = np.array([1, 1e3, 1e6, 1e9, 1e12]) 49 | hashes = np.round(rects * dedup_boxes).dot(v) 50 | _, index, inv_index = np.unique(hashes, return_index=True, 51 | return_inverse=True) 52 | rects = rects[index, :] 53 | 54 | return rects 55 | 56 | 57 | def nms(dets, thresh): 58 | """ 59 | Copied from the Python Fast R-CNN repository. 60 | Source: https://github.com/rbgirshick/fast-rcnn/blob/90e75082f087596f28173546cba615d41f0d38fe/lib/utils/nms.py#L10-L37 61 | """ 62 | x1 = dets[:, 0] 63 | y1 = dets[:, 1] 64 | x2 = dets[:, 2] 65 | y2 = dets[:, 3] 66 | scores = dets[:, 4] 67 | 68 | areas = (x2 - x1 + 1) * (y2 - y1 + 1) 69 | order = scores.argsort()[::-1] 70 | 71 | keep = [] 72 | while order.size > 0: 73 | i = order[0] 74 | keep.append(i) 75 | xx1 = np.maximum(x1[i], x1[order[1:]]) 76 | yy1 = np.maximum(y1[i], y1[order[1:]]) 77 | xx2 = np.minimum(x2[i], x2[order[1:]]) 78 | yy2 = np.minimum(y2[i], y2[order[1:]]) 79 | 80 | w = np.maximum(0.0, xx2 - xx1 + 1) 81 | h = np.maximum(0.0, yy2 - yy1 + 1) 82 | inter = w * h 83 | ovr = inter / (areas[i] + areas[order[1:]] - inter) 84 | 85 | inds = np.where(ovr <= thresh)[0] 86 | order = order[inds + 1] 87 | 88 | return keep 89 | 90 | def draw_result(out, im_scale, clss, bbox, rects, nms_thresh, conf): 91 | out = cv.resize(out, None, None, fx=im_scale, fy=im_scale, 92 | interpolation=cv.INTER_LINEAR) 93 | for cls_id in range(1, 21): 94 | _cls = clss[:, cls_id][:, np.newaxis] 95 | _bbx = bbox[:, cls_id * 4: (cls_id + 1) * 4] 96 | dets = np.hstack((_bbx, _cls)) 97 | keep = nms(dets, nms_thresh) 98 | dets = dets[keep, :] 99 | orig_rects = rects[keep, 1:] 100 | 101 | inds = np.where(dets[:, -1] >= conf)[0] 102 | for i in inds: 103 | _bbox = 
dets[i, :4] 104 | x1, y1, x2, y2 = orig_rects[i] 105 | width = x2 - x1 106 | height = y2 - y1 107 | center_x = x1 + 0.5 * width 108 | center_y = y1 + 0.5 * height 109 | 110 | dx, dy, dw, dh = map(int, _bbox) 111 | _center_x = dx * width + center_x 112 | _center_y = dy * height + center_y 113 | _width = np.exp(dw) * width 114 | _height = np.exp(dh) * height 115 | 116 | x1 = _center_x - 0.5 * _width 117 | y1 = _center_y - 0.5 * _height 118 | x2 = _center_x + 0.5 * _width 119 | y2 = _center_y + 0.5 * _height 120 | 121 | cv.rectangle(out, (int(x1), int(y1)), (int(x2), int(y2)), 122 | (0, 0, 255), 2, cv.LINE_AA) 123 | ret, baseline = cv.getTextSize(CLASSES[cls_id], 124 | cv.FONT_HERSHEY_SIMPLEX, 1.0, 1) 125 | cv.rectangle(out, (int(x1), int(y2) - ret[1] - baseline), 126 | (int(x1) + ret[0], int(y2)), (0, 0, 255), -1) 127 | cv.putText(out, CLASSES[cls_id], (int(x1), int(y2) - baseline), 128 | cv.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 1, 129 | cv.LINE_AA) 130 | 131 | print CLASSES[cls_id], dets[i, 4] 132 | 133 | return out 134 | 135 | 136 | if __name__ == '__main__': 137 | parser = argparse.ArgumentParser() 138 | parser.add_argument('--gpu_id', type=int, default=-1) 139 | parser.add_argument('--img_fn', type=str, default='sample.jpg') 140 | parser.add_argument('--out_fn', type=str, default='result.jpg') 141 | parser.add_argument('--min_size', type=int, default=500) 142 | parser.add_argument('--nms_thresh', type=float, default=0.2) 143 | parser.add_argument('--conf', type=float, default=0.75) 144 | args = parser.parse_args() 145 | 146 | print 147 | vgg_model=vgg.VGG() 148 | serializers.load_npz('fast_rcnn_vgg_voc.model', vgg_model) 149 | 150 | #Gpu Setting 151 | if args.gpu_id >= 0: 152 | xp = cuda.cupy 153 | cuda.get_device(args.gpu_id).use() 154 | vgg_model.to_gpu() 155 | else: 156 | xp=np 157 | 158 | orig_image = cv.imread(args.img_fn) 159 | img, im_scale = img_preprocessing(orig_image, PIXEL_MEANS) 160 | orig_rects = get_bboxes(orig_image, im_scale, 
min_size=args.min_size) 161 | 162 | img = xp.asarray(img) 163 | rects = xp.asarray(orig_rects) 164 | 165 | x = chainer.Variable(img[xp.newaxis, :, :, :]) 166 | rois = chainer.Variable(rects) 167 | 168 | cls_score, bbox_pred = vgg_model(x,rois) 169 | 170 | clss = cls_score.data 171 | bbox = bbox_pred.data 172 | if args.gpu_id >= 0: 173 | clss = cuda.cupy.asnumpy(cls_score.data) 174 | bbox = cuda.cupy.asnumpy(bbox_pred.data) 175 | 176 | result = draw_result(orig_image, im_scale, clss, bbox, orig_rects,args.nms_thresh, args.conf) 177 | cv.imwrite(args.out_fn, result) 178 | -------------------------------------------------------------------------------- /result1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/apple2373/chainer-simple-fast-rnn/606162d3a8176847bf24b621108339a3bb4bc306/result1.jpg -------------------------------------------------------------------------------- /result2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/apple2373/chainer-simple-fast-rnn/606162d3a8176847bf24b621108339a3bb4bc306/result2.jpg -------------------------------------------------------------------------------- /roi_pooling_2d.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | ''' 4 | ROI pooling. 5 | Basically copied from chainer's officials, but I modified a few lines 6 | because chainer (1.9.0)'s implementation did not work on cpu. 
7 | Source: https://github.com/pfnet/chainer/blob/65030ac4ce0050685212d746ef8545ee0817d42e/chainer/functions/pooling/roi_pooling_2d.py 8 | ''' 9 | import numpy 10 | import six 11 | 12 | from chainer import cuda 13 | from chainer import function 14 | from chainer.utils import type_check 15 | 16 | 17 | def _roi_pooling_slice(size, stride, max_size, roi_offset): 18 | start = int(numpy.floor(size * stride)) 19 | end = int(numpy.ceil((size + 1) * stride)) 20 | 21 | start = min(max(start + roi_offset, 0), max_size) 22 | end = min(max(end + roi_offset, 0), max_size) 23 | 24 | return slice(start, end), end - start 25 | 26 | 27 | class ROIPooling2D(function.Function): 28 | 29 | """RoI pooling over a set of 2d planes.""" 30 | 31 | def __init__(self, outh, outw, spatial_scale): 32 | self.outh, self.outw = outh, outw 33 | self.spatial_scale = spatial_scale 34 | 35 | def check_type_forward(self, in_types): 36 | type_check.expect(in_types.size() == 2) 37 | 38 | x_type, roi_type = in_types 39 | type_check.expect( 40 | x_type.dtype == numpy.float32, 41 | x_type.ndim == 4, 42 | roi_type.dtype == numpy.float32, 43 | roi_type.ndim == 2, 44 | roi_type.shape[1] == 5, 45 | ) 46 | 47 | def forward_cpu(self, inputs): 48 | bottom_data, bottom_rois = inputs 49 | # 50 | channels, height, width = bottom_data.shape[1:] 51 | n_rois = bottom_rois.shape[0] 52 | # 53 | top_data = numpy.empty((n_rois, channels, self.outh, self.outw), 54 | dtype=numpy.float32) 55 | self.argmax_data = numpy.empty_like(top_data).astype(numpy.int32) 56 | for i_roi in six.moves.range(n_rois): 57 | idx, xmin, ymin, xmax, ymax = bottom_rois[i_roi] 58 | xmin = int(round(xmin * self.spatial_scale)) 59 | xmax = int(round(xmax * self.spatial_scale)) 60 | ymin = int(round(ymin * self.spatial_scale)) 61 | ymax = int(round(ymax * self.spatial_scale)) 62 | roi_width = max(xmax - xmin + 1, 1) 63 | roi_height = max(ymax - ymin + 1, 1) 64 | strideh = 1. * roi_height / self.outh 65 | stridew = 1. 
* roi_width / self.outw 66 | 67 | for outh in six.moves.range(self.outh): 68 | sliceh, lenh = _roi_pooling_slice( 69 | outh, strideh, height, ymin) 70 | if sliceh.stop <= sliceh.start: 71 | continue 72 | for outw in six.moves.range(self.outw): 73 | slicew, lenw = _roi_pooling_slice( 74 | outw, stridew, width, xmin) 75 | if slicew.stop <= slicew.start: 76 | continue 77 | roi_data = bottom_data[int(idx), :, sliceh, slicew]\ 78 | .reshape(channels, -1) 79 | top_data[i_roi, :, outh, outw] =\ 80 | numpy.max(roi_data, axis=1) 81 | 82 | # get the max idx respect to feature_maps coordinates 83 | max_idx_slice = numpy.unravel_index( 84 | numpy.argmax(roi_data, axis=1), (lenh, lenw)) 85 | max_idx_slice_h = max_idx_slice[0] + sliceh.start 86 | max_idx_slice_w = max_idx_slice[1] + slicew.start 87 | max_idx_slice = max_idx_slice_h * width + max_idx_slice_w 88 | self.argmax_data[i_roi, :, outh, outw] = max_idx_slice 89 | return top_data, 90 | 91 | def forward_gpu(self, inputs): 92 | bottom_data, bottom_rois = inputs 93 | channels, height, width = bottom_data.shape[1:] 94 | n_rois = bottom_rois.shape[0] 95 | top_data = cuda.cupy.empty((n_rois, channels, self.outh, 96 | self.outw), dtype=numpy.float32) 97 | self.argmax_data = cuda.cupy.empty_like(top_data).astype(numpy.int32) 98 | cuda.cupy.ElementwiseKernel( 99 | ''' 100 | raw float32 bottom_data, float32 spatial_scale, int32 channels, 101 | int32 height, int32 width, int32 pooled_height, int32 pooled_width, 102 | raw float32 bottom_rois 103 | ''', 104 | 'float32 top_data, int32 argmax_data', 105 | ''' 106 | // pos in output filter 107 | int pw = i % pooled_width; 108 | int ph = (i / pooled_width) % pooled_height; 109 | int c = (i / pooled_width / pooled_height) % channels; 110 | int num = i / pooled_width / pooled_height / channels; 111 | 112 | int roi_batch_ind = bottom_rois[num * 5 + 0]; 113 | int roi_start_w = round(bottom_rois[num * 5 + 1] * spatial_scale); 114 | int roi_start_h = round(bottom_rois[num * 5 + 2] * 
spatial_scale); 115 | int roi_end_w = round(bottom_rois[num * 5 + 3] * spatial_scale); 116 | int roi_end_h = round(bottom_rois[num * 5 + 4] * spatial_scale); 117 | 118 | // Force malformed ROIs to be 1x1 119 | int roi_width = max(roi_end_w - roi_start_w + 1, 1); 120 | int roi_height = max(roi_end_h - roi_start_h + 1, 1); 121 | float bin_size_h = static_cast<float>(roi_height) 122 | / static_cast<float>(pooled_height); 123 | float bin_size_w = static_cast<float>(roi_width) 124 | / static_cast<float>(pooled_width); 125 | 126 | int hstart = static_cast<int>(floor(static_cast<float>(ph) 127 | * bin_size_h)); 128 | int wstart = static_cast<int>(floor(static_cast<float>(pw) 129 | * bin_size_w)); 130 | int hend = static_cast<int>(ceil(static_cast<float>(ph + 1) 131 | * bin_size_h)); 132 | int wend = static_cast<int>(ceil(static_cast<float>(pw + 1) 133 | * bin_size_w)); 134 | 135 | // Add roi offsets and clip to input boundaries 136 | hstart = min(max(hstart + roi_start_h, 0), height); 137 | hend = min(max(hend + roi_start_h, 0), height); 138 | wstart = min(max(wstart + roi_start_w, 0), width); 139 | wend = min(max(wend + roi_start_w, 0), width); 140 | bool is_empty = (hend <= hstart) || (wend <= wstart); 141 | 142 | // Define an empty pooling region to be zero 143 | float maxval = is_empty ? 
0 : -1E+37; 144 | // If nothing is pooled, argmax=-1 causes nothing to be backprop'd 145 | int maxidx = -1; 146 | int data_offset = (roi_batch_ind * channels + c) * height * width; 147 | for (int h = hstart; h < hend; ++h) { 148 | for (int w = wstart; w < wend; ++w) { 149 | int bottom_index = h * width + w; 150 | if (bottom_data[data_offset + bottom_index] > maxval) { 151 | maxval = bottom_data[data_offset + bottom_index]; 152 | maxidx = bottom_index; 153 | } 154 | } 155 | } 156 | top_data = maxval; 157 | argmax_data = maxidx; 158 | ''', 'roi_poolig_2d_fwd' 159 | )(bottom_data, self.spatial_scale, channels, height, width, 160 | self.outh, self.outw, bottom_rois, top_data, 161 | self.argmax_data) 162 | 163 | return top_data, 164 | 165 | def backward_cpu(self, inputs, gy): 166 | bottom_data, bottom_rois = inputs 167 | n_rois, channels, height, width = bottom_data.shape 168 | bottom_delta = numpy.zeros_like(bottom_data, dtype=numpy.float32) 169 | 170 | for i_roi in six.moves.range(n_rois): 171 | idx, xmin, ymin, xmax, ymax = bottom_rois[i_roi] 172 | idx = int(idx) 173 | xmin = int(round(xmin * self.spatial_scale)) 174 | xmax = int(round(xmax * self.spatial_scale)) 175 | ymin = int(round(ymin * self.spatial_scale)) 176 | ymax = int(round(ymax * self.spatial_scale)) 177 | roi_width = max(xmax - xmin + 1, 1) 178 | roi_height = max(ymax - ymin + 1, 1) 179 | 180 | strideh = float(roi_height) / float(self.outh) 181 | stridew = float(roi_width) / float(self.outw) 182 | 183 | # iterate all the w, h (from feature map) that fall into this ROIs 184 | for w in six.moves.range(xmin, xmax + 1): 185 | for h in six.moves.range(ymin, ymax + 1): 186 | phstart = int(numpy.floor(float(h - ymin) / strideh)) 187 | phend = int(numpy.ceil(float(h - ymin + 1) / strideh)) 188 | pwstart = int(numpy.floor(float(w - xmin) / stridew)) 189 | pwend = int(numpy.ceil(float(w - xmin + 1) / stridew)) 190 | 191 | phstart = min(max(phstart, 0), self.outh) 192 | phend = min(max(phend, 0), self.outh) 193 | 
pwstart = min(max(pwstart, 0), self.outw) 194 | pwend = min(max(pwend, 0), self.outw) 195 | 196 | for ph in six.moves.range(phstart, phend): 197 | for pw in six.moves.range(pwstart, pwend): 198 | max_idx_tmp = self.argmax_data[i_roi, :, ph, pw] 199 | for c in six.moves.range(channels): 200 | if max_idx_tmp[c] == (h * width + w): 201 | bottom_delta[idx, c, h, w] += \ 202 | gy[0][i_roi, c, ph, pw] 203 | return bottom_delta, None 204 | 205 | def backward_gpu(self, inputs, gy): 206 | bottom_data, bottom_rois = inputs 207 | channels, height, width = bottom_data.shape[1:] 208 | bottom_diff = cuda.cupy.zeros_like(bottom_data, dtype=numpy.float32) 209 | cuda.cupy.ElementwiseKernel( 210 | ''' 211 | raw float32 top_diff, raw int32 argmax_data, int32 num_rois, 212 | float32 spatial_scale, int32 channels, int32 height, int32 width, 213 | int32 pooled_height, int32 pooled_width, raw float32 bottom_rois 214 | ''', 215 | 'float32 bottom_diff', 216 | ''' 217 | int w = i % width; 218 | int h = (i / width) % height; 219 | int c = (i / (width * height)) % channels; 220 | int num = i / (width * height * channels); 221 | 222 | float gradient = 0; 223 | // Accumulate gradient over all ROIs that pooled this element 224 | for (int roi_n = 0; roi_n < num_rois; ++roi_n) { 225 | // Skip if ROI's batch index doesn't match num 226 | if (num != static_cast<int>(bottom_rois[roi_n * 5])) { 227 | continue; 228 | } 229 | 230 | int roi_start_w = round(bottom_rois[roi_n * 5 + 1] 231 | * spatial_scale); 232 | int roi_start_h = round(bottom_rois[roi_n * 5 + 2] 233 | * spatial_scale); 234 | int roi_end_w = round(bottom_rois[roi_n * 5 + 3] 235 | * spatial_scale); 236 | int roi_end_h = round(bottom_rois[roi_n * 5 + 4] 237 | * spatial_scale); 238 | 239 | // Skip if ROI doesn't include (h, w) 240 | const bool in_roi = (w >= roi_start_w && w <= roi_end_w && 241 | h >= roi_start_h && h <= roi_end_h); 242 | if (!in_roi) { 243 | continue; 244 | } 245 | 246 | int offset = (roi_n * channels + c) * pooled_height 247 | 
* pooled_width; 248 | 249 | // Compute feasible set of pooled units that could have pooled 250 | // this bottom unit 251 | 252 | // Force malformed ROIs to be 1x1 253 | int roi_width = max(roi_end_w - roi_start_w + 1, 1); 254 | int roi_height = max(roi_end_h - roi_start_h + 1, 1); 255 | 256 | float bin_size_h = static_cast<float>(roi_height) 257 | / static_cast<float>(pooled_height); 258 | float bin_size_w = static_cast<float>(roi_width) 259 | / static_cast<float>(pooled_width); 260 | 261 | int phstart = floor(static_cast<float>(h - roi_start_h) 262 | / bin_size_h); 263 | int phend = ceil(static_cast<float>(h - roi_start_h + 1) 264 | / bin_size_h); 265 | int pwstart = floor(static_cast<float>(w - roi_start_w) 266 | / bin_size_w); 267 | int pwend = ceil(static_cast<float>(w - roi_start_w + 1) 268 | / bin_size_w); 269 | 270 | phstart = min(max(phstart, 0), pooled_height); 271 | phend = min(max(phend, 0), pooled_height); 272 | pwstart = min(max(pwstart, 0), pooled_width); 273 | pwend = min(max(pwend, 0), pooled_width); 274 | 275 | for (int ph = phstart; ph < phend; ++ph) { 276 | for (int pw = pwstart; pw < pwend; ++pw) { 277 | int index_ = ph * pooled_width + pw + offset; 278 | if (argmax_data[index_] == (h * width + w)) { 279 | gradient += top_diff[index_]; 280 | } 281 | } 282 | } 283 | } 284 | bottom_diff = gradient; 285 | ''', 'roi_pooling_2d_bwd' 286 | )(gy[0], self.argmax_data, bottom_rois.shape[0], self.spatial_scale, 287 | channels, height, width, self.outh, self.outw, 288 | bottom_rois, bottom_diff) 289 | 290 | return bottom_diff, None 291 | 292 | 293 | def roi_pooling_2d(x, rois, outh, outw, spatial_scale): 294 | """Spatial Region of Interest (ROI) pooling function. 295 | 296 | This function acts similarly to :class:`~functions.MaxPooling2D`, but 297 | it computes the maximum of input spatial patch for each channel 298 | with the region of interest. 299 | 300 | Args: 301 | x (~chainer.Variable): Input variable. The shape is expected to be 302 | 4 dimensional: (n: batch, c: channel, h: height, w: width). 
303 | rois (~chainer.Variable): Input roi variable. The shape is expected to 304 | be (n: data size, 5), and each datum is set as below: 305 | (batch_index, x_min, y_min, x_max, y_max). 306 | outh (int): Height of output image after pooling. 307 | outw (int): Width of output image after pooling. 308 | spatial_scale (float): Scale by which the roi is resized. 309 | 310 | Returns: 311 | ~chainer.Variable: Output variable. 312 | 313 | See the original paper proposing ROIPooling: 314 | `Fast R-CNN `_. 315 | 316 | """ 317 | return ROIPooling2D(outh, outw, spatial_scale)(x, rois) 318 | -------------------------------------------------------------------------------- /sample.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/apple2373/chainer-simple-fast-rnn/606162d3a8176847bf24b621108339a3bb4bc306/sample.jpg -------------------------------------------------------------------------------- /sample2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/apple2373/chainer-simple-fast-rnn/606162d3a8176847bf24b621108339a3bb4bc306/sample2.jpg -------------------------------------------------------------------------------- /vgg.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | import chainer 4 | import chainer.functions as F 5 | import chainer.links as L 6 | from roi_pooling_2d import roi_pooling_2d 7 | 8 | class VGG(chainer.Chain): 9 | 10 | def __init__(self): 11 | super(VGG, self).__init__( 12 | conv1_1=L.Convolution2D(3, 64, 3, stride=1, pad=1), 13 | conv1_2=L.Convolution2D(64, 64, 3, stride=1, pad=1), 14 | 15 | conv2_1=L.Convolution2D(64, 128, 3, stride=1, pad=1), 16 | conv2_2=L.Convolution2D(128, 128, 3, stride=1, pad=1), 17 | 18 | conv3_1=L.Convolution2D(128, 256, 3, stride=1, pad=1), 19 | conv3_2=L.Convolution2D(256, 256, 3, stride=1, pad=1), 20 | 
conv3_3=L.Convolution2D(256, 256, 3, stride=1, pad=1), 21 | 22 | conv4_1=L.Convolution2D(256, 512, 3, stride=1, pad=1), 23 | conv4_2=L.Convolution2D(512, 512, 3, stride=1, pad=1), 24 | conv4_3=L.Convolution2D(512, 512, 3, stride=1, pad=1), 25 | 26 | conv5_1=L.Convolution2D(512, 512, 3, stride=1, pad=1), 27 | conv5_2=L.Convolution2D(512, 512, 3, stride=1, pad=1), 28 | conv5_3=L.Convolution2D(512, 512, 3, stride=1, pad=1), 29 | 30 | fc6=L.Linear(25088, 4096), 31 | fc7=L.Linear(4096, 4096), 32 | cls_score=L.Linear(4096, 21), 33 | bbox_pred=L.Linear(4096, 84) 34 | ) 35 | 36 | def __call__(self, x, rois, train=False): 37 | 38 | h = F.relu(self.conv1_1(x)) 39 | h = F.relu(self.conv1_2(h)) 40 | h = F.max_pooling_2d(h, 2, stride=2) 41 | 42 | h = F.relu(self.conv2_1(h)) 43 | h = F.relu(self.conv2_2(h)) 44 | h = F.max_pooling_2d(h, 2, stride=2) 45 | 46 | h = F.relu(self.conv3_1(h)) 47 | h = F.relu(self.conv3_2(h)) 48 | h = F.relu(self.conv3_3(h)) 49 | h = F.max_pooling_2d(h, 2, stride=2) 50 | 51 | h = F.relu(self.conv4_1(h)) 52 | h = F.relu(self.conv4_2(h)) 53 | h = F.relu(self.conv4_3(h)) 54 | h = F.max_pooling_2d(h, 2, stride=2) 55 | 56 | h = F.relu(self.conv5_1(h)) 57 | h = F.relu(self.conv5_2(h)) 58 | h = F.relu(self.conv5_3(h)) 59 | h = roi_pooling_2d(h, rois,7,7,0.0625) 60 | 61 | 62 | h = F.dropout(F.relu(self.fc6(h)), train=train, ratio=0.5) 63 | h = F.dropout(F.relu(self.fc7(h)), train=train, ratio=0.5) 64 | cls_score = F.softmax(self.cls_score(h)) 65 | bbox_pred = self.bbox_pred(h) 66 | 67 | return cls_score, bbox_pred 68 | --------------------------------------------------------------------------------