├── lib ├── nms │ ├── __init__.py │ ├── .gitignore │ ├── gpu_nms.hpp │ ├── py_cpu_nms.py │ ├── gpu_nms.pyx │ ├── cpu_nms.pyx │ └── nms_kernel.cu ├── transform │ ├── __init__.py │ └── torch_image_transform_layer.py ├── utils │ ├── .gitignore │ ├── __init__.py │ ├── timer.py │ ├── blob.py │ └── bbox.pyx ├── pycocotools │ ├── __init__.py │ ├── UPSTREAM_REV │ ├── license.txt │ ├── maskApi.h │ ├── mask.py │ └── maskApi.c ├── Makefile ├── datasets │ ├── __init__.py │ ├── VOCdevkit-matlab-wrapper │ │ ├── get_voc_opts.m │ │ ├── xVOCap.m │ │ └── voc_eval.m │ ├── ds_utils.py │ ├── factory.py │ ├── tools │ │ └── mcg_munge.py │ ├── voc_eval.py │ └── imdb.py ├── fast_rcnn │ ├── __init__.py │ ├── nms_wrapper.py │ ├── bbox_transform.py │ ├── train.py │ └── config.py ├── roi_data_layer │ ├── __init__.py │ └── roidb.py ├── rpn │ ├── __init__.py │ ├── README.md │ ├── generate_anchors.py │ ├── generate.py │ ├── proposal_layer.py │ └── proposal_target_layer.py └── setup.py ├── experiments ├── logs │ └── .gitignore ├── README.md ├── cfgs │ └── fast_rcnn_ohem.yml └── scripts │ ├── fast_rcnn.sh │ ├── fast_rcnn_ohem.sh │ └── fast_rcnn_ohem_07tv12tv.sh ├── data ├── pylintrc ├── demo │ ├── 000456.jpg │ ├── 000542.jpg │ ├── 001150.jpg │ ├── 001763.jpg │ └── 004545.jpg ├── .gitignore ├── scripts │ ├── fetch_imagenet_models.sh │ ├── fetch_selective_search_data.sh │ └── fetch_fast_rcnn_ohem_models.sh └── README.md ├── tools ├── README.md ├── _init_paths.py ├── reval.py ├── eval_recall.py ├── rpn_generate.py ├── test_net.py ├── train_net.py ├── compress_net.py └── demo.py ├── .gitignore ├── .gitmodules ├── models └── pascal_voc │ ├── VGG16 │ ├── fast_rcnn │ │ ├── solver.prototxt │ │ ├── train.prototxt │ │ └── test.prototxt │ └── fast_rcnn_ohem │ │ └── solver.prototxt │ └── VGG_CNN_M_1024 │ ├── fast_rcnn │ ├── solver.prototxt │ ├── train.prototxt │ └── test.prototxt │ └── fast_rcnn_ohem │ ├── solver.prototxt │ └── train.prototxt ├── LICENSE └── README.md /lib/nms/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /lib/transform/__init__.py: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /experiments/logs/.gitignore: -------------------------------------------------------------------------------- 1 | *.txt* 2 | -------------------------------------------------------------------------------- /lib/utils/.gitignore: -------------------------------------------------------------------------------- 1 | *.c 2 | *.so 3 | -------------------------------------------------------------------------------- /lib/nms/.gitignore: -------------------------------------------------------------------------------- 1 | *.c 2 | *.cpp 3 | *.so 4 | -------------------------------------------------------------------------------- /lib/pycocotools/__init__.py: -------------------------------------------------------------------------------- 1 | __author__ = 'tylin' 2 | -------------------------------------------------------------------------------- /data/pylintrc: -------------------------------------------------------------------------------- 1 | [TYPECHECK] 2 | 3 | ignored-modules = numpy, numpy.random, cv2 4 | -------------------------------------------------------------------------------- /lib/Makefile: -------------------------------------------------------------------------------- 1 | all: 2 | python 
setup.py build_ext --inplace 3 | rm -rf build 4 | -------------------------------------------------------------------------------- /tools/README.md: -------------------------------------------------------------------------------- 1 | Tools for training, testing, and compressing Fast R-CNN networks. 2 | -------------------------------------------------------------------------------- /data/demo/000456.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/abhi2610/ohem/HEAD/data/demo/000456.jpg -------------------------------------------------------------------------------- /data/demo/000542.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/abhi2610/ohem/HEAD/data/demo/000542.jpg -------------------------------------------------------------------------------- /data/demo/001150.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/abhi2610/ohem/HEAD/data/demo/001150.jpg -------------------------------------------------------------------------------- /data/demo/001763.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/abhi2610/ohem/HEAD/data/demo/001763.jpg -------------------------------------------------------------------------------- /data/demo/004545.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/abhi2610/ohem/HEAD/data/demo/004545.jpg -------------------------------------------------------------------------------- /data/.gitignore: -------------------------------------------------------------------------------- 1 | selective_search* 2 | imagenet_models* 3 | fast_rcnn_models* 4 | VOCdevkit* 5 | cache 6 | -------------------------------------------------------------------------------- /lib/pycocotools/UPSTREAM_REV: -------------------------------------------------------------------------------- 1 | https://github.com/pdollar/coco/commit/3ac47c77ebd5a1ed4254a98b7fbf2ef4765a3574 2 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | .ipynb_checkpoints 3 | lib/build 4 | lib/pycocotools/_mask.c 5 | lib/pycocotools/_mask.so 6 | data/coco 7 | data/cocoAPI 8 | -------------------------------------------------------------------------------- /.gitmodules: -------------------------------------------------------------------------------- 1 | [submodule "caffe-fast-rcnn"] 2 | path = caffe-fast-rcnn 3 | url = https://github.com/rbgirshick/caffe-fast-rcnn.git 4 | branch = fast-rcnn 5 | -------------------------------------------------------------------------------- /lib/nms/gpu_nms.hpp: -------------------------------------------------------------------------------- 1 | void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num, 2 | int boxes_dim, float nms_overlap_thresh, int device_id); 3 | -------------------------------------------------------------------------------- /experiments/README.md: -------------------------------------------------------------------------------- 1 | Scripts are under `experiments/scripts`. 2 | 3 | Each script saves a log file under `experiments/logs`. 4 | 5 | Configuration override files used in the experiments are stored in `experiments/cfgs`. 
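For example, mirroring the usage documented in each script's header, a full train-then-test run of Fast R-CNN with OHEM on PASCAL VOC 2007 (GPU 0, VGG16) is:

```
./experiments/scripts/fast_rcnn_ohem.sh 0 VGG16 pascal_voc
```

Arguments after the dataset name are passed through to `train_net.py`/`test_net.py`, e.g. `--set EXP_DIR foobar RNG_SEED 42`.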
6 | -------------------------------------------------------------------------------- /lib/utils/__init__.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | -------------------------------------------------------------------------------- /lib/datasets/__init__.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | -------------------------------------------------------------------------------- /lib/fast_rcnn/__init__.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | -------------------------------------------------------------------------------- /lib/roi_data_layer/__init__.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | -------------------------------------------------------------------------------- /lib/rpn/__init__.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick and Sean Bell 6 | # -------------------------------------------------------- 7 | -------------------------------------------------------------------------------- /lib/datasets/VOCdevkit-matlab-wrapper/get_voc_opts.m: -------------------------------------------------------------------------------- 1 | function VOCopts = get_voc_opts(path) 2 | 3 | tmp = pwd; 4 | cd(path); 5 | try 6 | addpath('VOCcode'); 7 | VOCinit; 8 | catch 9 | rmpath('VOCcode'); 10 | cd(tmp); 11 | error(sprintf('VOCcode directory not found under %s', path)); 12 | end 13 | rmpath('VOCcode'); 14 | cd(tmp); 15 | -------------------------------------------------------------------------------- /lib/datasets/VOCdevkit-matlab-wrapper/xVOCap.m: -------------------------------------------------------------------------------- 1 | function ap = xVOCap(rec,prec) 2 | % From the PASCAL VOC 2011 devkit 3 | 4 | mrec=[0 ; rec ; 1]; 5 | mpre=[0 ; prec ; 0]; 6 | for i=numel(mpre)-1:-1:1 7 | mpre(i)=max(mpre(i),mpre(i+1)); 8 | end 9 | i=find(mrec(2:end)~=mrec(1:end-1))+1; 10 | ap=sum((mrec(i)-mrec(i-1)).*mpre(i)); 11 | -------------------------------------------------------------------------------- /experiments/cfgs/fast_rcnn_ohem.yml: -------------------------------------------------------------------------------- 1 | 
EXP_DIR: fast_rcnn_ohem 2 | MATLAB: /opt/matlab/8.1/bin/matlab 3 | TRAIN: 4 | BG_THRESH_LO: 0.0 5 | # we use gradient accumulation, 6 | # see solver.prototxt (iter_size: 2) 7 | IMS_PER_BATCH: 1 8 | # adjust batch_size for iter_size 9 | BATCH_SIZE: 64 10 | USE_OHEM: True 11 | # Wasn't used in the paper (impact unknown). 12 | ASPECT_GROUPING: False -------------------------------------------------------------------------------- /models/pascal_voc/VGG16/fast_rcnn/solver.prototxt: -------------------------------------------------------------------------------- 1 | train_net: "models/pascal_voc/VGG16/fast_rcnn/train.prototxt" 2 | base_lr: 0.001 3 | lr_policy: "step" 4 | gamma: 0.1 5 | stepsize: 30000 6 | display: 20 7 | average_loss: 100 8 | # iter_size: 1 9 | momentum: 0.9 10 | weight_decay: 0.0005 11 | # We disable standard caffe solver snapshotting and implement our own snapshot 12 | # function 13 | snapshot: 0 14 | # We still use the snapshot prefix, though 15 | snapshot_prefix: "vgg16_fast_rcnn" 16 | #debug_info: true 17 | -------------------------------------------------------------------------------- /models/pascal_voc/VGG16/fast_rcnn_ohem/solver.prototxt: -------------------------------------------------------------------------------- 1 | train_net: "models/pascal_voc/VGG16/fast_rcnn_ohem/train.prototxt" 2 | base_lr: 0.001 3 | lr_policy: "step" 4 | gamma: 0.1 5 | stepsize: 30000 6 | display: 20 7 | average_loss: 100 8 | iter_size: 2 9 | momentum: 0.9 10 | weight_decay: 0.0005 11 | # We disable standard caffe solver snapshotting and implement our own snapshot 12 | # function 13 | snapshot: 0 14 | # We still use the snapshot prefix, though 15 | snapshot_prefix: "vgg16_fast_rcnn" 16 | #debug_info: true 17 | -------------------------------------------------------------------------------- /models/pascal_voc/VGG_CNN_M_1024/fast_rcnn/solver.prototxt: -------------------------------------------------------------------------------- 1 | train_net: "models/pascal_voc/VGG_CNN_M_1024/fast_rcnn/train.prototxt" 2 | base_lr: 0.001 3 | lr_policy: "step" 4 | gamma: 0.1 5 | stepsize: 30000 6 | display: 20 7 | average_loss: 100 8 | momentum: 0.9 9 | weight_decay: 0.0005 10 | # We disable standard caffe solver snapshotting and implement our own snapshot 11 | # function 12 | snapshot: 0 13 | # We still use the snapshot prefix, though 14 | snapshot_prefix: "vgg_cnn_m_1024_fast_rcnn" 15 | #debug_info: true 16 | -------------------------------------------------------------------------------- /models/pascal_voc/VGG_CNN_M_1024/fast_rcnn_ohem/solver.prototxt: -------------------------------------------------------------------------------- 1 | train_net: "models/pascal_voc/VGG_CNN_M_1024/fast_rcnn_ohem/train.prototxt" 2 | base_lr: 0.001 3 | lr_policy: "step" 4 | gamma: 0.1 5 | stepsize: 30000 6 | display: 20 7 | average_loss: 100 8 | momentum: 0.9 9 | iter_size: 2 10 | weight_decay: 0.0005 11 | # We disable standard caffe solver snapshotting and implement our own snapshot 12 | # function 13 | snapshot: 0 14 | # We still use the snapshot prefix, though 15 | snapshot_prefix: "vgg_cnn_m_1024_fast_rcnn" 16 | #debug_info: true 17 | -------------------------------------------------------------------------------- /lib/fast_rcnn/nms_wrapper.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross 
Girshick 6 | # -------------------------------------------------------- 7 | 8 | from fast_rcnn.config import cfg 9 | from nms.gpu_nms import gpu_nms 10 | from nms.cpu_nms import cpu_nms 11 | 12 | def nms(dets, thresh, force_cpu=False): 13 | """Dispatch to either CPU or GPU NMS implementations.""" 14 | 15 | if dets.shape[0] == 0: 16 | return [] 17 | if cfg.USE_GPU_NMS and not force_cpu: 18 | return gpu_nms(dets, thresh, device_id=cfg.GPU_ID) 19 | else: 20 | return cpu_nms(dets, thresh) 21 | -------------------------------------------------------------------------------- /tools/_init_paths.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | """Set up paths for Fast R-CNN.""" 9 | 10 | import os.path as osp 11 | import sys 12 | 13 | def add_path(path): 14 | if path not in sys.path: 15 | sys.path.insert(0, path) 16 | 17 | this_dir = osp.dirname(__file__) 18 | 19 | # Add caffe to PYTHONPATH 20 | caffe_path = osp.join(this_dir, '..', 'caffe-fast-rcnn', 'python') 21 | add_path(caffe_path) 22 | 23 | # Add lib to PYTHONPATH 24 | lib_path = osp.join(this_dir, '..', 'lib') 25 | add_path(lib_path) 26 | -------------------------------------------------------------------------------- /lib/rpn/README.md: -------------------------------------------------------------------------------- 1 | ### `rpn` module overview 2 | 3 | ##### `generate_anchors.py` 4 | 5 | Generates a regular grid of multi-scale, multi-aspect anchor boxes. 6 | 7 | ##### `proposal_layer.py` 8 | 9 | Converts RPN outputs (per-anchor scores and bbox regression estimates) into object proposals. 10 | 11 | ##### `anchor_target_layer.py` 12 | 13 | Generates training targets/labels for each anchor. Classification labels are 1 (object), 0 (not object) or -1 (ignore). 14 | Bbox regression targets are specified when the classification label is > 0. 15 | 16 | ##### `proposal_target_layer.py` 17 | 18 | Generates training targets/labels for each object proposal: classification labels 0 - K (bg or object class 1, ... , K) 19 | and bbox regression targets in the case that the label is > 0. 20 | 21 | ##### `generate.py` 22 | 23 | Generates object detection proposals from an imdb using an RPN. 24 | -------------------------------------------------------------------------------- /data/scripts/fetch_imagenet_models.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )/../" && pwd )" 4 | cd $DIR 5 | 6 | FILE=imagenet_models.tgz 7 | URL=http://www.cs.berkeley.edu/~rbg/faster-rcnn-data/$FILE 8 | CHECKSUM=ed34ca912d6782edfb673a8c3a0bda6d 9 | 10 | if [ -f $FILE ]; then 11 | echo "File already exists. Checking md5..." 12 | os=`uname -s` 13 | if [ "$os" = "Linux" ]; then 14 | checksum=`md5sum $FILE | awk '{ print $1 }'` 15 | elif [ "$os" = "Darwin" ]; then 16 | checksum=`cat $FILE | md5` 17 | fi 18 | if [ "$checksum" = "$CHECKSUM" ]; then 19 | echo "Checksum is correct. No need to download." 20 | exit 0 21 | else 22 | echo "Checksum is incorrect. Need to download again." 23 | fi 24 | fi 25 | 26 | echo "Downloading pretrained ImageNet models (1G)..." 27 | 28 | wget $URL -O $FILE 29 | 30 | echo "Unzipping..." 31 | 32 | tar zxvf $FILE 33 | 34 | echo "Done. 
Please run this command again to verify that checksum = $CHECKSUM." 35 | -------------------------------------------------------------------------------- /data/scripts/fetch_selective_search_data.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )/../" && pwd )" 4 | cd $DIR 5 | 6 | FILE=selective_search_data.tgz 7 | URL=http://www.cs.berkeley.edu/~rbg/fast-rcnn-data/$FILE 8 | CHECKSUM=7078c1db87a7851b31966b96774cd9b9 9 | 10 | if [ -f $FILE ]; then 11 | echo "File already exists. Checking md5..." 12 | os=`uname -s` 13 | if [ "$os" = "Linux" ]; then 14 | checksum=`md5sum $FILE | awk '{ print $1 }'` 15 | elif [ "$os" = "Darwin" ]; then 16 | checksum=`cat $FILE | md5` 17 | fi 18 | if [ "$checksum" = "$CHECKSUM" ]; then 19 | echo "Checksum is correct. No need to download." 20 | exit 0 21 | else 22 | echo "Checksum is incorrect. Need to download again." 23 | fi 24 | fi 25 | 26 | echo "Downloading precomputed selective search boxes (0.5G)..." 27 | 28 | wget $URL -O $FILE 29 | 30 | echo "Unzipping..." 31 | 32 | tar zxvf $FILE 33 | 34 | echo "Done. Please run this command again to verify that checksum = $CHECKSUM." 35 | -------------------------------------------------------------------------------- /data/scripts/fetch_fast_rcnn_ohem_models.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )/../" && pwd )" 4 | cd $DIR 5 | 6 | FILE=fast_rcnn_ohem_models.tgz 7 | URL=http://graphics.cs.cmu.edu/projects/ohem/data/$FILE 8 | CHECKSUM=cbfd5b7ed5ec4d5cb838701cbf1f3ccb 9 | 10 | if [ -f $FILE ]; then 11 | echo "File already exists. Checking md5..." 12 | os=`uname -s` 13 | if [ "$os" = "Linux" ]; then 14 | checksum=`md5sum $FILE | awk '{ print $1 }'` 15 | elif [ "$os" = "Darwin" ]; then 16 | checksum=`cat $FILE | md5` 17 | fi 18 | if [ "$checksum" = "$CHECKSUM" ]; then 19 | echo "Checksum is correct. No need to download." 20 | exit 0 21 | else 22 | echo "Checksum is incorrect. Need to download again." 23 | fi 24 | fi 25 | 26 | echo "Downloading Fast R-CNN OHEM models (VGG16 and VGG_CNN_M_1024) (1.5G)..." 27 | 28 | wget $URL -O $FILE 29 | 30 | echo "Unzipping..." 31 | 32 | tar zxvf $FILE 33 | 34 | echo "Done. Please run this command again to verify that checksum = $CHECKSUM." 35 | -------------------------------------------------------------------------------- /lib/utils/timer.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import time 9 | 10 | class Timer(object): 11 | """A simple timer.""" 12 | def __init__(self): 13 | self.total_time = 0. 14 | self.calls = 0 15 | self.start_time = 0. 16 | self.diff = 0. 17 | self.average_time = 0. 
18 | 19 | def tic(self): 20 | # using time.time instead of time.clock because time.clock 21 | # does not normalize for multithreading 22 | self.start_time = time.time() 23 | 24 | def toc(self, average=True): 25 | self.diff = time.time() - self.start_time 26 | self.total_time += self.diff 27 | self.calls += 1 28 | self.average_time = self.total_time / self.calls 29 | if average: 30 | return self.average_time 31 | else: 32 | return self.diff 33 | -------------------------------------------------------------------------------- /lib/nms/py_cpu_nms.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import numpy as np 9 | 10 | def py_cpu_nms(dets, thresh): 11 | """Pure Python NMS baseline.""" 12 | x1 = dets[:, 0] 13 | y1 = dets[:, 1] 14 | x2 = dets[:, 2] 15 | y2 = dets[:, 3] 16 | scores = dets[:, 4] 17 | 18 | areas = (x2 - x1 + 1) * (y2 - y1 + 1) 19 | order = scores.argsort()[::-1] 20 | 21 | keep = [] 22 | while order.size > 0: 23 | i = order[0] 24 | keep.append(i) 25 | xx1 = np.maximum(x1[i], x1[order[1:]]) 26 | yy1 = np.maximum(y1[i], y1[order[1:]]) 27 | xx2 = np.minimum(x2[i], x2[order[1:]]) 28 | yy2 = np.minimum(y2[i], y2[order[1:]]) 29 | 30 | w = np.maximum(0.0, xx2 - xx1 + 1) 31 | h = np.maximum(0.0, yy2 - yy1 + 1) 32 | inter = w * h 33 | ovr = inter / (areas[i] + areas[order[1:]] - inter) 34 | 35 | inds = np.where(ovr <= thresh)[0] 36 | order = order[inds + 1] 37 | 38 | return keep 39 | -------------------------------------------------------------------------------- /lib/nms/gpu_nms.pyx: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Faster R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import numpy as np 9 | cimport numpy as np 10 | 11 | assert sizeof(int) == sizeof(np.int32_t) 12 | 13 | cdef extern from "gpu_nms.hpp": 14 | void _nms(np.int32_t*, int*, np.float32_t*, int, int, float, int) 15 | 16 | def gpu_nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh, 17 | np.int32_t device_id=0): 18 | cdef int boxes_num = dets.shape[0] 19 | cdef int boxes_dim = dets.shape[1] 20 | cdef int num_out 21 | cdef np.ndarray[np.int32_t, ndim=1] \ 22 | keep = np.zeros(boxes_num, dtype=np.int32) 23 | cdef np.ndarray[np.float32_t, ndim=1] \ 24 | scores = dets[:, 4] 25 | cdef np.ndarray[np.int_t, ndim=1] \ 26 | order = scores.argsort()[::-1] 27 | cdef np.ndarray[np.float32_t, ndim=2] \ 28 | sorted_dets = dets[order, :] 29 | _nms(&keep[0], &num_out, &sorted_dets[0, 0], boxes_num, boxes_dim, thresh, device_id) 30 | keep = keep[:num_out] 31 | return list(order[keep]) 32 | -------------------------------------------------------------------------------- /lib/datasets/ds_utils.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast/er R-CNN 3 | # Licensed under The MIT License [see LICENSE for details] 4 | # Written by Ross Girshick 5 | # -------------------------------------------------------- 6 | 7 | import numpy as np 8 | 9 | def 
unique_boxes(boxes, scale=1.0): 10 | """Return indices of unique boxes.""" 11 | v = np.array([1, 1e3, 1e6, 1e9]) 12 | hashes = np.round(boxes * scale).dot(v) 13 | _, index = np.unique(hashes, return_index=True) 14 | return np.sort(index) 15 | 16 | def xywh_to_xyxy(boxes): 17 | """Convert [x y w h] box format to [x1 y1 x2 y2] format.""" 18 | return np.hstack((boxes[:, 0:2], boxes[:, 0:2] + boxes[:, 2:4] - 1)) 19 | 20 | def xyxy_to_xywh(boxes): 21 | """Convert [x1 y1 x2 y2] box format to [x y w h] format.""" 22 | return np.hstack((boxes[:, 0:2], boxes[:, 2:4] - boxes[:, 0:2] + 1)) 23 | 24 | def validate_boxes(boxes, width=0, height=0): 25 | """Check that a set of boxes are valid.""" 26 | x1 = boxes[:, 0] 27 | y1 = boxes[:, 1] 28 | x2 = boxes[:, 2] 29 | y2 = boxes[:, 3] 30 | assert (x1 >= 0).all() 31 | assert (y1 >= 0).all() 32 | assert (x2 >= x1).all() 33 | assert (y2 >= y1).all() 34 | assert (x2 < width).all() 35 | assert (y2 < height).all() 36 | 37 | def filter_small_boxes(boxes, min_size): 38 | w = boxes[:, 2] - boxes[:, 0] 39 | h = boxes[:, 3] - boxes[:, 1] 40 | keep = np.where((w >= min_size) & (h >= min_size))[0] 41 | return keep 42 | -------------------------------------------------------------------------------- /lib/datasets/factory.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | """Factory method for easily getting imdbs by name.""" 9 | 10 | __sets = {} 11 | 12 | from datasets.pascal_voc import pascal_voc 13 | from datasets.coco import coco 14 | import numpy as np 15 | 16 | # Set up voc_<year>_<split> using selective search "fast" mode 17 | for year in ['2007', '2012']: 18 | for split in ['train', 'val', 'trainval', 'test']: 19 | name = 'voc_{}_{}'.format(year, split) 20 | __sets[name] = (lambda split=split, year=year: pascal_voc(split, year)) 21 | 22 | # Set up coco_2014_<split> 23 | for year in ['2014']: 24 | for split in ['train', 'val', 'minival', 'valminusminival']: 25 | name = 'coco_{}_{}'.format(year, split) 26 | __sets[name] = (lambda split=split, year=year: coco(split, year)) 27 | 28 | # Set up coco_2015_<split> 29 | for year in ['2015']: 30 | for split in ['test', 'test-dev']: 31 | name = 'coco_{}_{}'.format(year, split) 32 | __sets[name] = (lambda split=split, year=year: coco(split, year)) 33 | 34 | def get_imdb(name): 35 | """Get an imdb (image database) by name.""" 36 | if not __sets.has_key(name): 37 | raise KeyError('Unknown dataset: {}'.format(name)) 38 | return __sets[name]() 39 | 40 | def list_imdbs(): 41 | """List all registered imdbs.""" 42 | return __sets.keys() 43 | -------------------------------------------------------------------------------- /lib/datasets/VOCdevkit-matlab-wrapper/voc_eval.m: -------------------------------------------------------------------------------- 1 | function res = voc_eval(path, comp_id, test_set, output_dir) 2 | 3 | VOCopts = get_voc_opts(path); 4 | VOCopts.testset = test_set; 5 | 6 | for i = 1:length(VOCopts.classes) 7 | cls = VOCopts.classes{i}; 8 | res(i) = voc_eval_cls(cls, VOCopts, comp_id, output_dir); 9 | end 10 | 11 | fprintf('\n~~~~~~~~~~~~~~~~~~~~\n'); 12 | fprintf('Results:\n'); 13 | aps = [res(:).ap]'; 14 | fprintf('%.1f\n', aps * 100); 15 | fprintf('%.1f\n', mean(aps) * 100); 16 | fprintf('~~~~~~~~~~~~~~~~~~~~\n'); 17 | 18 | function 
res = voc_eval_cls(cls, VOCopts, comp_id, output_dir) 19 | 20 | test_set = VOCopts.testset; 21 | year = VOCopts.dataset(4:end); 22 | 23 | addpath(fullfile(VOCopts.datadir, 'VOCcode')); 24 | 25 | res_fn = sprintf(VOCopts.detrespath, comp_id, cls); 26 | 27 | recall = []; 28 | prec = []; 29 | ap = 0; 30 | ap_auc = 0; 31 | 32 | do_eval = (str2num(year) <= 2007) | ~strcmp(test_set, 'test'); 33 | if do_eval 34 | % Bug in VOCevaldet requires that tic has been called first 35 | tic; 36 | [recall, prec, ap] = VOCevaldet(VOCopts, comp_id, cls, true); 37 | ap_auc = xVOCap(recall, prec); 38 | 39 | % force plot limits 40 | ylim([0 1]); 41 | xlim([0 1]); 42 | 43 | print(gcf, '-djpeg', '-r0', ... 44 | [output_dir '/' cls '_pr.jpg']); 45 | end 46 | fprintf('!!! %s : %.4f %.4f\n', cls, ap, ap_auc); 47 | 48 | res.recall = recall; 49 | res.prec = prec; 50 | res.ap = ap; 51 | res.ap_auc = ap_auc; 52 | 53 | save([output_dir '/' cls '_pr.mat'], ... 54 | 'res', 'recall', 'prec', 'ap', 'ap_auc'); 55 | 56 | rmpath(fullfile(VOCopts.datadir, 'VOCcode')); 57 | -------------------------------------------------------------------------------- /lib/datasets/tools/mcg_munge.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | 4 | """Hacky tool to convert file system layout of MCG boxes downloaded from 5 | http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/mcg/ 6 | so that it's consistent with those computed by Jan Hosang (see: 7 | http://www.mpi-inf.mpg.de/departments/computer-vision-and-multimodal- 8 | computing/research/object-recognition-and-scene-understanding/how- 9 | good-are-detection-proposals-really/) 10 | 11 | NB: Boxes from the MCG website are in (y1, x1, y2, x2) order. 12 | Boxes from Hosang et al. are in (x1, y1, x2, y2) order. 13 | """ 14 | 15 | def munge(src_dir): 16 | # stored as: ./MCG-COCO-val2014-boxes/COCO_val2014_000000193401.mat 17 | # want: ./MCG/mat/COCO_val2014_0/COCO_val2014_000000141/COCO_val2014_000000141334.mat 18 | 19 | files = os.listdir(src_dir) 20 | for fn in files: 21 | base, ext = os.path.splitext(fn) 22 | # first 14 chars / first 22 chars / all chars + .mat 23 | # COCO_val2014_0/COCO_val2014_000000447/COCO_val2014_000000447991.mat 24 | first = base[:14] 25 | second = base[:22] 26 | dst_dir = os.path.join('MCG', 'mat', first, second) 27 | if not os.path.exists(dst_dir): 28 | os.makedirs(dst_dir) 29 | src = os.path.join(src_dir, fn) 30 | dst = os.path.join(dst_dir, fn) 31 | print 'MV: {} -> {}'.format(src, dst) 32 | os.rename(src, dst) 33 | 34 | if __name__ == '__main__': 35 | # src_dir should look something like: 36 | # src_dir = 'MCG-COCO-val2014-boxes' 37 | src_dir = sys.argv[1] 38 | munge(src_dir) 39 | -------------------------------------------------------------------------------- /lib/pycocotools/license.txt: -------------------------------------------------------------------------------- 1 | Copyright (c) 2014, Piotr Dollar and Tsung-Yi Lin 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | 1. Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 2. Redistributions in binary form must reproduce the above copyright notice, 10 | this list of conditions and the following disclaimer in the documentation 11 | and/or other materials provided with the distribution. 
12 | 13 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND 14 | ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED 15 | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 16 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR 17 | ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES 18 | (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; 19 | LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND 20 | ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 21 | (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 22 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 23 | 24 | The views and conclusions contained in the software and documentation are those 25 | of the authors and should not be interpreted as representing official policies, 26 | either expressed or implied, of the FreeBSD Project. 27 | -------------------------------------------------------------------------------- /experiments/scripts/fast_rcnn.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Usage: 3 | # ./experiments/scripts/fast_rcnn.sh GPU NET DATASET [options args to {train,test}_net.py] 4 | # DATASET is either pascal_voc or coco. 5 | # 6 | # Example: 7 | # ./experiments/scripts/fast_rcnn.sh 0 VGG_CNN_M_1024 pascal_voc \ 8 | # --set EXP_DIR foobar RNG_SEED 42 TRAIN.SCALES "[400, 500, 600, 700]" 9 | 10 | set -x 11 | set -e 12 | 13 | export PYTHONUNBUFFERED="True" 14 | 15 | GPU_ID=$1 16 | NET=$2 17 | NET_lc=${NET,,} 18 | DATASET=$3 19 | 20 | array=( $@ ) 21 | len=${#array[@]} 22 | EXTRA_ARGS=${array[@]:3:$len} 23 | EXTRA_ARGS_SLUG=${EXTRA_ARGS// /_} 24 | 25 | case $DATASET in 26 | pascal_voc) 27 | TRAIN_IMDB="voc_2007_trainval" 28 | TEST_IMDB="voc_2007_test" 29 | PT_DIR="pascal_voc" 30 | ITERS=40000 31 | ;; 32 | coco) 33 | TRAIN_IMDB="coco_2014_train" 34 | TEST_IMDB="coco_2014_minival" 35 | PT_DIR="coco" 36 | ITERS=280000 37 | ;; 38 | *) 39 | echo "No dataset given" 40 | exit 41 | ;; 42 | esac 43 | 44 | LOG="experiments/logs/fast_rcnn_${NET}_${EXTRA_ARGS_SLUG}.txt.`date +'%Y-%m-%d_%H-%M-%S'`" 45 | exec &> >(tee -a "$LOG") 46 | echo Logging output to "$LOG" 47 | 48 | time ./tools/train_net.py --gpu ${GPU_ID} \ 49 | --solver models/${PT_DIR}/${NET}/fast_rcnn/solver.prototxt \ 50 | --weights data/imagenet_models/${NET}.v2.caffemodel \ 51 | --imdb ${TRAIN_IMDB} \ 52 | --iters ${ITERS} \ 53 | ${EXTRA_ARGS} 54 | 55 | set +x 56 | NET_FINAL=`grep -B 1 "done solving" ${LOG} | grep "Wrote snapshot" | awk '{print $4}'` 57 | set -x 58 | 59 | time ./tools/test_net.py --gpu ${GPU_ID} \ 60 | --def models/${PT_DIR}/${NET}/fast_rcnn/test.prototxt \ 61 | --net ${NET_FINAL} \ 62 | --imdb ${TEST_IMDB} \ 63 | ${EXTRA_ARGS} 64 | -------------------------------------------------------------------------------- /lib/utils/blob.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | """Blob helper functions.""" 9 | 10 | import numpy as np 11 | import cv2 12 | 13 | def im_list_to_blob(ims): 14 | """Convert a list of images into a network input. 
15 | 16 | Assumes images are already prepared (means subtracted, BGR order, ...). 17 | """ 18 | max_shape = np.array([im.shape for im in ims]).max(axis=0) 19 | num_images = len(ims) 20 | blob = np.zeros((num_images, max_shape[0], max_shape[1], 3), 21 | dtype=np.float32) 22 | for i in xrange(num_images): 23 | im = ims[i] 24 | blob[i, 0:im.shape[0], 0:im.shape[1], :] = im 25 | # Move channels (axis 3) to axis 1 26 | # Axis order will become: (batch elem, channel, height, width) 27 | channel_swap = (0, 3, 1, 2) 28 | blob = blob.transpose(channel_swap) 29 | return blob 30 | 31 | def prep_im_for_blob(im, pixel_means, target_size, max_size): 32 | """Mean subtract and scale an image for use in a blob.""" 33 | im = im.astype(np.float32, copy=False) 34 | im -= pixel_means 35 | im_shape = im.shape 36 | im_size_min = np.min(im_shape[0:2]) 37 | im_size_max = np.max(im_shape[0:2]) 38 | im_scale = float(target_size) / float(im_size_min) 39 | # Prevent the biggest axis from being more than MAX_SIZE 40 | if np.round(im_scale * im_size_max) > max_size: 41 | im_scale = float(max_size) / float(im_size_max) 42 | im = cv2.resize(im, None, None, fx=im_scale, fy=im_scale, 43 | interpolation=cv2.INTER_LINEAR) 44 | 45 | return im, im_scale 46 | -------------------------------------------------------------------------------- /experiments/scripts/fast_rcnn_ohem.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Usage: 3 | # ./experiments/scripts/fast_rcnn_ohem.sh GPU NET DATASET [options args to {train,test}_net.py] 4 | # DATASET is either pascal_voc or coco. 5 | # 6 | # Example: 7 | # ./experiments/scripts/fast_rcnn_ohem.sh 0 VGG16 pascal_voc \ 8 | # --set EXP_DIR foobar RNG_SEED 42 TRAIN.SCALES "[400, 500, 600, 700]" 9 | 10 | set -x 11 | set -e 12 | 13 | export PYTHONUNBUFFERED="True" 14 | 15 | GPU_ID=$1 16 | NET=$2 17 | NET_lc=${NET,,} 18 | DATASET=$3 19 | 20 | array=( $@ ) 21 | len=${#array[@]} 22 | EXTRA_ARGS=${array[@]:3:$len} 23 | EXTRA_ARGS_SLUG=${EXTRA_ARGS// /_} 24 | 25 | case $DATASET in 26 | pascal_voc) 27 | TRAIN_IMDB="voc_2007_trainval" 28 | TEST_IMDB="voc_2007_test" 29 | PT_DIR="pascal_voc" 30 | ITERS=40000 31 | ;; 32 | coco) 33 | echo "Support coming soon. Stay tuned!" 
34 | exit 35 | # TRAIN_IMDB="coco_2014_train" 36 | # TEST_IMDB="coco_2014_minival" 37 | # PT_DIR="coco" 38 | # ITERS=280000 39 | ;; 40 | *) 41 | echo "No dataset given" 42 | exit 43 | ;; 44 | esac 45 | 46 | LOG="experiments/logs/fast_rcnn_ohem_${NET}_${EXTRA_ARGS_SLUG}.txt.`date +'%Y-%m-%d_%H-%M-%S'`" 47 | exec &> >(tee -a "$LOG") 48 | echo Logging output to "$LOG" 49 | 50 | time ./tools/train_net.py --gpu ${GPU_ID} \ 51 | --solver models/${PT_DIR}/${NET}/fast_rcnn_ohem/solver.prototxt \ 52 | --weights data/imagenet_models/${NET}.v2.caffemodel \ 53 | --imdb ${TRAIN_IMDB} \ 54 | --iters ${ITERS} \ 55 | --cfg experiments/cfgs/fast_rcnn_ohem.yml \ 56 | ${EXTRA_ARGS} 57 | 58 | set +x 59 | NET_FINAL=`grep -B 1 "done solving" ${LOG} | grep "Wrote snapshot" | awk '{print $4}'` 60 | set -x 61 | 62 | time ./tools/test_net.py --gpu ${GPU_ID} \ 63 | --def models/${PT_DIR}/${NET}/fast_rcnn/test.prototxt \ 64 | --net ${NET_FINAL} \ 65 | --imdb ${TEST_IMDB} \ 66 | --cfg experiments/cfgs/fast_rcnn_ohem.yml \ 67 | --num_dets 2000 \ 68 | --det_thresh 0.00001 \ 69 | ${EXTRA_ARGS} -------------------------------------------------------------------------------- /lib/utils/bbox.pyx: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Sergey Karayev 6 | # -------------------------------------------------------- 7 | 8 | cimport cython 9 | import numpy as np 10 | cimport numpy as np 11 | 12 | DTYPE = np.float 13 | ctypedef np.float_t DTYPE_t 14 | 15 | def bbox_overlaps( 16 | np.ndarray[DTYPE_t, ndim=2] boxes, 17 | np.ndarray[DTYPE_t, ndim=2] query_boxes): 18 | """ 19 | Parameters 20 | ---------- 21 | boxes: (N, 4) ndarray of float 22 | query_boxes: (K, 4) ndarray of float 23 | Returns 24 | ------- 25 | overlaps: (N, K) ndarray of overlap between boxes and query_boxes 26 | """ 27 | cdef unsigned int N = boxes.shape[0] 28 | cdef unsigned int K = query_boxes.shape[0] 29 | cdef np.ndarray[DTYPE_t, ndim=2] overlaps = np.zeros((N, K), dtype=DTYPE) 30 | cdef DTYPE_t iw, ih, box_area 31 | cdef DTYPE_t ua 32 | cdef unsigned int k, n 33 | for k in range(K): 34 | box_area = ( 35 | (query_boxes[k, 2] - query_boxes[k, 0] + 1) * 36 | (query_boxes[k, 3] - query_boxes[k, 1] + 1) 37 | ) 38 | for n in range(N): 39 | iw = ( 40 | min(boxes[n, 2], query_boxes[k, 2]) - 41 | max(boxes[n, 0], query_boxes[k, 0]) + 1 42 | ) 43 | if iw > 0: 44 | ih = ( 45 | min(boxes[n, 3], query_boxes[k, 3]) - 46 | max(boxes[n, 1], query_boxes[k, 1]) + 1 47 | ) 48 | if ih > 0: 49 | ua = float( 50 | (boxes[n, 2] - boxes[n, 0] + 1) * 51 | (boxes[n, 3] - boxes[n, 1] + 1) + 52 | box_area - iw * ih 53 | ) 54 | overlaps[n, k] = iw * ih / ua 55 | return overlaps 56 | -------------------------------------------------------------------------------- /experiments/scripts/fast_rcnn_ohem_07tv12tv.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # Usage: 3 | # ./experiments/scripts/fast_rcnn_ohem_07tv12tv.sh GPU NET DATASET [options args to {train,test}_net.py] 4 | # DATASET is either pascal_voc or coco. 
5 | # 6 | # Example: 7 | # ./experiments/scripts/fast_rcnn_ohem_07tv12tv.sh 0 VGG16 pascal_voc \ 8 | # --set EXP_DIR foobar RNG_SEED 42 TRAIN.SCALES "[400, 500, 600, 700]" 9 | 10 | set -x 11 | set -e 12 | 13 | export PYTHONUNBUFFERED="True" 14 | 15 | GPU_ID=$1 16 | NET=$2 17 | NET_lc=${NET,,} 18 | DATASET=$3 19 | 20 | array=( $@ ) 21 | len=${#array[@]} 22 | EXTRA_ARGS=${array[@]:3:$len} 23 | EXTRA_ARGS_SLUG=${EXTRA_ARGS// /_} 24 | 25 | case $DATASET in 26 | pascal_voc) 27 | TRAIN_IMDB="voc_2007_trainval+voc_2012_trainval" 28 | TEST_IMDB="voc_2007_test" 29 | PT_DIR="pascal_voc" 30 | ITERS=80000 31 | ;; 32 | coco) 33 | echo "Support coming soon. Stay tuned!" 34 | exit 35 | # TRAIN_IMDB="coco_2014_train" 36 | # TEST_IMDB="coco_2014_minival" 37 | # PT_DIR="coco" 38 | # ITERS=280000 39 | ;; 40 | *) 41 | echo "No dataset given" 42 | exit 43 | ;; 44 | esac 45 | 46 | LOG="experiments/logs/fast_rcnn_ohem_${NET}_${EXTRA_ARGS_SLUG}.txt.`date +'%Y-%m-%d_%H-%M-%S'`" 47 | exec &> >(tee -a "$LOG") 48 | echo Logging output to "$LOG" 49 | 50 | time ./tools/train_net.py --gpu ${GPU_ID} \ 51 | --solver models/${PT_DIR}/${NET}/fast_rcnn_ohem/solver.prototxt \ 52 | --weights data/imagenet_models/${NET}.v2.caffemodel \ 53 | --imdb ${TRAIN_IMDB} \ 54 | --iters ${ITERS} \ 55 | --cfg experiments/cfgs/fast_rcnn_ohem.yml \ 56 | ${EXTRA_ARGS} 57 | 58 | set +x 59 | NET_FINAL=`grep -B 1 "done solving" ${LOG} | grep "Wrote snapshot" | awk '{print $4}'` 60 | set -x 61 | 62 | time ./tools/test_net.py --gpu ${GPU_ID} \ 63 | --def models/${PT_DIR}/${NET}/fast_rcnn/test.prototxt \ 64 | --net ${NET_FINAL} \ 65 | --imdb ${TEST_IMDB} \ 66 | --cfg experiments/cfgs/fast_rcnn_ohem.yml \ 67 | --num_dets 2000 \ 68 | --det_thresh 0.00001 \ 69 | ${EXTRA_ARGS} 70 | -------------------------------------------------------------------------------- /lib/pycocotools/maskApi.h: -------------------------------------------------------------------------------- 1 | /************************************************************************** 2 | * Microsoft COCO Toolbox. version 2.0 3 | * Data, paper, and tutorials available at: http://mscoco.org/ 4 | * Code written by Piotr Dollar and Tsung-Yi Lin, 2015. 5 | * Licensed under the Simplified BSD License [see coco/license.txt] 6 | **************************************************************************/ 7 | #pragma once 8 | #include <stdbool.h> 9 | 10 | typedef unsigned int uint; 11 | typedef unsigned long siz; 12 | typedef unsigned char byte; 13 | typedef double* BB; 14 | typedef struct { siz h, w, m; uint *cnts; } RLE; 15 | 16 | // Initialize/destroy RLE. 17 | void rleInit( RLE *R, siz h, siz w, siz m, uint *cnts ); 18 | void rleFree( RLE *R ); 19 | 20 | // Initialize/destroy RLE array. 21 | void rlesInit( RLE **R, siz n ); 22 | void rlesFree( RLE **R, siz n ); 23 | 24 | // Encode binary masks using RLE. 25 | void rleEncode( RLE *R, const byte *mask, siz h, siz w, siz n ); 26 | 27 | // Decode binary masks encoded via RLE. 28 | void rleDecode( const RLE *R, byte *mask, siz n ); 29 | 30 | // Compute union or intersection of encoded masks. 31 | void rleMerge( const RLE *R, RLE *M, siz n, bool intersect ); 32 | 33 | // Compute area of encoded masks. 34 | void rleArea( const RLE *R, siz n, uint *a ); 35 | 36 | // Compute intersection over union between masks. 37 | void rleIou( RLE *dt, RLE *gt, siz m, siz n, byte *iscrowd, double *o ); 38 | 39 | // Compute intersection over union between bounding boxes. 
40 | void bbIou( BB dt, BB gt, siz m, siz n, byte *iscrowd, double *o ); 41 | 42 | // Get bounding boxes surrounding encoded masks. 43 | void rleToBbox( const RLE *R, BB bb, siz n ); 44 | 45 | // Convert bounding boxes to encoded masks. 46 | void rleFrBbox( RLE *R, const BB bb, siz h, siz w, siz n ); 47 | 48 | // Convert polygon to encoded mask. 49 | void rleFrPoly( RLE *R, const double *xy, siz k, siz h, siz w ); 50 | 51 | // Get compressed string representation of encoded mask. 52 | char* rleToString( const RLE *R ); 53 | 54 | // Convert from compressed string representation of encoded mask. 55 | void rleFrString( RLE *R, char *s, siz h, siz w ); 56 | -------------------------------------------------------------------------------- /lib/transform/torch_image_transform_layer.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast/er R-CNN 3 | # Licensed under The MIT License [see LICENSE for details] 4 | # -------------------------------------------------------- 5 | 6 | """ Transform images for compatibility with models trained with 7 | https://github.com/facebook/fb.resnet.torch. 8 | 9 | Usage in model prototxt: 10 | 11 | layer { 12 | name: 'data_xform' 13 | type: 'Python' 14 | bottom: 'data_caffe' 15 | top: 'data' 16 | python_param { 17 | module: 'transform.torch_image_transform_layer' 18 | layer: 'TorchImageTransformLayer' 19 | } 20 | } 21 | """ 22 | 23 | import caffe 24 | from fast_rcnn.config import cfg 25 | import numpy as np 26 | 27 | class TorchImageTransformLayer(caffe.Layer): 28 | def setup(self, bottom, top): 29 | # (1, 3, 1, 1) shaped arrays 30 | self.PIXEL_MEANS = \ 31 | np.array([[[[0.48462227599918]], 32 | [[0.45624044862054]], 33 | [[0.40588363755159]]]]) 34 | self.PIXEL_STDS = \ 35 | np.array([[[[0.22889466674951]], 36 | [[0.22446679341259]], 37 | [[0.22495548344775]]]]) 38 | # The default ("old") pixel means that were already subtracted 39 | channel_swap = (0, 3, 1, 2) 40 | self.OLD_PIXEL_MEANS = \ 41 | cfg.PIXEL_MEANS[np.newaxis, :, :, :].transpose(channel_swap) 42 | 43 | top[0].reshape(*(bottom[0].shape)) 44 | 45 | def forward(self, bottom, top): 46 | ims = bottom[0].data 47 | # Invert the channel means that were already subtracted 48 | ims += self.OLD_PIXEL_MEANS 49 | # 1. Permute BGR to RGB and normalize to [0, 1] 50 | ims = ims[:, [2, 1, 0], :, :] / 255.0 51 | # 2. Remove channel means 52 | ims -= self.PIXEL_MEANS 53 | # 3. Standardize channels 54 | ims /= self.PIXEL_STDS 55 | top[0].reshape(*(ims.shape)) 56 | top[0].data[...] = ims 57 | 58 | def backward(self, top, propagate_down, bottom): 59 | """This layer does not propagate gradients.""" 60 | pass 61 | 62 | def reshape(self, bottom, top): 63 | """Reshaping happens during the call to forward.""" 64 | pass 65 | -------------------------------------------------------------------------------- /tools/reval.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # -------------------------------------------------------- 4 | # Fast R-CNN 5 | # Copyright (c) 2015 Microsoft 6 | # Licensed under The MIT License [see LICENSE for details] 7 | # Written by Ross Girshick 8 | # -------------------------------------------------------- 9 | 10 | """Reval = re-eval. 
Re-evaluate saved detections.""" 11 | 12 | import _init_paths 13 | from fast_rcnn.test import apply_nms 14 | from fast_rcnn.config import cfg 15 | from datasets.factory import get_imdb 16 | import cPickle 17 | import os, sys, argparse 18 | import numpy as np 19 | 20 | def parse_args(): 21 | """ 22 | Parse input arguments 23 | """ 24 | parser = argparse.ArgumentParser(description='Re-evaluate results') 25 | parser.add_argument('output_dir', nargs=1, help='results directory', 26 | type=str) 27 | parser.add_argument('--imdb', dest='imdb_name', 28 | help='dataset to re-evaluate', 29 | default='voc_2007_test', type=str) 30 | parser.add_argument('--matlab', dest='matlab_eval', 31 | help='use matlab for evaluation', 32 | action='store_true') 33 | parser.add_argument('--comp', dest='comp_mode', help='competition mode', 34 | action='store_true') 35 | parser.add_argument('--nms', dest='apply_nms', help='apply nms', 36 | action='store_true') 37 | 38 | if len(sys.argv) == 1: 39 | parser.print_help() 40 | sys.exit(1) 41 | 42 | args = parser.parse_args() 43 | return args 44 | 45 | def from_dets(imdb_name, output_dir, args): 46 | imdb = get_imdb(imdb_name) 47 | imdb.competition_mode(args.comp_mode) 48 | imdb.config['matlab_eval'] = args.matlab_eval 49 | with open(os.path.join(output_dir, 'detections.pkl'), 'rb') as f: 50 | dets = cPickle.load(f) 51 | 52 | if args.apply_nms: 53 | print 'Applying NMS to all detections' 54 | nms_dets = apply_nms(dets, cfg.TEST.NMS) 55 | else: 56 | nms_dets = dets 57 | 58 | print 'Evaluating detections' 59 | imdb.evaluate_detections(nms_dets, output_dir) 60 | 61 | if __name__ == '__main__': 62 | args = parse_args() 63 | 64 | output_dir = os.path.abspath(args.output_dir[0]) 65 | imdb_name = args.imdb_name 66 | from_dets(imdb_name, output_dir, args) 67 | -------------------------------------------------------------------------------- /lib/nms/cpu_nms.pyx: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import numpy as np 9 | cimport numpy as np 10 | 11 | cdef inline np.float32_t max(np.float32_t a, np.float32_t b): 12 | return a if a >= b else b 13 | 14 | cdef inline np.float32_t min(np.float32_t a, np.float32_t b): 15 | return a if a <= b else b 16 | 17 | def cpu_nms(np.ndarray[np.float32_t, ndim=2] dets, np.float thresh): 18 | cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:, 0] 19 | cdef np.ndarray[np.float32_t, ndim=1] y1 = dets[:, 1] 20 | cdef np.ndarray[np.float32_t, ndim=1] x2 = dets[:, 2] 21 | cdef np.ndarray[np.float32_t, ndim=1] y2 = dets[:, 3] 22 | cdef np.ndarray[np.float32_t, ndim=1] scores = dets[:, 4] 23 | 24 | cdef np.ndarray[np.float32_t, ndim=1] areas = (x2 - x1 + 1) * (y2 - y1 + 1) 25 | cdef np.ndarray[np.int_t, ndim=1] order = scores.argsort()[::-1] 26 | 27 | cdef int ndets = dets.shape[0] 28 | cdef np.ndarray[np.int_t, ndim=1] suppressed = \ 29 | np.zeros((ndets), dtype=np.int) 30 | 31 | # nominal indices 32 | cdef int _i, _j 33 | # sorted indices 34 | cdef int i, j 35 | # temp variables for box i's (the box currently under consideration) 36 | cdef np.float32_t ix1, iy1, ix2, iy2, iarea 37 | # variables for computing overlap with box j (lower scoring box) 38 | cdef np.float32_t xx1, yy1, xx2, yy2 39 | cdef np.float32_t w, h 40 | cdef np.float32_t 
inter, ovr 41 | 42 | keep = [] 43 | for _i in range(ndets): 44 | i = order[_i] 45 | if suppressed[i] == 1: 46 | continue 47 | keep.append(i) 48 | ix1 = x1[i] 49 | iy1 = y1[i] 50 | ix2 = x2[i] 51 | iy2 = y2[i] 52 | iarea = areas[i] 53 | for _j in range(_i + 1, ndets): 54 | j = order[_j] 55 | if suppressed[j] == 1: 56 | continue 57 | xx1 = max(ix1, x1[j]) 58 | yy1 = max(iy1, y1[j]) 59 | xx2 = min(ix2, x2[j]) 60 | yy2 = min(iy2, y2[j]) 61 | w = max(0.0, xx2 - xx1 + 1) 62 | h = max(0.0, yy2 - yy1 + 1) 63 | inter = w * h 64 | ovr = inter / (iarea + areas[j] - inter) 65 | if ovr >= thresh: 66 | suppressed[j] = 1 67 | 68 | return keep 69 | -------------------------------------------------------------------------------- /tools/eval_recall.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import _init_paths 4 | from fast_rcnn.config import cfg, cfg_from_file, cfg_from_list 5 | from datasets.factory import get_imdb 6 | import argparse 7 | import time, os, sys 8 | import numpy as np 9 | 10 | def parse_args(): 11 | """ 12 | Parse input arguments 13 | """ 14 | parser = argparse.ArgumentParser(description='Test a Fast R-CNN network') 15 | parser.add_argument('--imdb', dest='imdb_name', 16 | help='dataset to test', 17 | default='voc_2007_test', type=str) 18 | parser.add_argument('--method', dest='method', 19 | help='proposal method', 20 | default='selective_search', type=str) 21 | parser.add_argument('--rpn-file', dest='rpn_file', 22 | default=None, type=str) 23 | 24 | if len(sys.argv) == 1: 25 | parser.print_help() 26 | sys.exit(1) 27 | 28 | args = parser.parse_args() 29 | return args 30 | 31 | if __name__ == '__main__': 32 | args = parse_args() 33 | 34 | print('Called with args:') 35 | print(args) 36 | 37 | imdb = get_imdb(args.imdb_name) 38 | imdb.set_proposal_method(args.method) 39 | if args.rpn_file is not None: 40 | imdb.config['rpn_file'] = args.rpn_file 41 | 42 | candidate_boxes = None 43 | if 0: 44 | import scipy.io as sio 45 | filename = 'debug/stage1_rpn_voc_2007_test.mat' 46 | raw_data = sio.loadmat(filename)['aboxes'].ravel() 47 | candidate_boxes = raw_data 48 | 49 | ar, gt_overlaps, recalls, thresholds = \ 50 | imdb.evaluate_recall(candidate_boxes=candidate_boxes) 51 | print 'Method: {}'.format(args.method) 52 | print 'AverageRec: {:.3f}'.format(ar) 53 | 54 | def recall_at(t): 55 | ind = np.where(thresholds > t - 1e-5)[0][0] 56 | assert np.isclose(thresholds[ind], t) 57 | return recalls[ind] 58 | 59 | print 'Recall@0.5: {:.3f}'.format(recall_at(0.5)) 60 | print 'Recall@0.6: {:.3f}'.format(recall_at(0.6)) 61 | print 'Recall@0.7: {:.3f}'.format(recall_at(0.7)) 62 | print 'Recall@0.8: {:.3f}'.format(recall_at(0.8)) 63 | print 'Recall@0.9: {:.3f}'.format(recall_at(0.9)) 64 | # print again for easy spreadsheet copying 65 | print '{:.3f}'.format(ar) 66 | print '{:.3f}'.format(recall_at(0.5)) 67 | print '{:.3f}'.format(recall_at(0.6)) 68 | print '{:.3f}'.format(recall_at(0.7)) 69 | print '{:.3f}'.format(recall_at(0.8)) 70 | print '{:.3f}'.format(recall_at(0.9)) 71 | -------------------------------------------------------------------------------- /lib/fast_rcnn/bbox_transform.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 
8 | import numpy as np 9 | 10 | def bbox_transform(ex_rois, gt_rois): 11 | ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + 1.0 12 | ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.0 13 | ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths 14 | ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights 15 | 16 | gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.0 17 | gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.0 18 | gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths 19 | gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights 20 | 21 | targets_dx = (gt_ctr_x - ex_ctr_x) / ex_widths 22 | targets_dy = (gt_ctr_y - ex_ctr_y) / ex_heights 23 | targets_dw = np.log(gt_widths / ex_widths) 24 | targets_dh = np.log(gt_heights / ex_heights) 25 | 26 | targets = np.vstack( 27 | (targets_dx, targets_dy, targets_dw, targets_dh)).transpose() 28 | return targets 29 | 30 | def bbox_transform_inv(boxes, deltas): 31 | if boxes.shape[0] == 0: 32 | return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype) 33 | 34 | boxes = boxes.astype(deltas.dtype, copy=False) 35 | 36 | widths = boxes[:, 2] - boxes[:, 0] + 1.0 37 | heights = boxes[:, 3] - boxes[:, 1] + 1.0 38 | ctr_x = boxes[:, 0] + 0.5 * widths 39 | ctr_y = boxes[:, 1] + 0.5 * heights 40 | 41 | dx = deltas[:, 0::4] 42 | dy = deltas[:, 1::4] 43 | dw = deltas[:, 2::4] 44 | dh = deltas[:, 3::4] 45 | 46 | pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis] 47 | pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis] 48 | pred_w = np.exp(dw) * widths[:, np.newaxis] 49 | pred_h = np.exp(dh) * heights[:, np.newaxis] 50 | 51 | pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype) 52 | # x1 53 | pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w 54 | # y1 55 | pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h 56 | # x2 57 | pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w 58 | # y2 59 | pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h 60 | 61 | return pred_boxes 62 | 63 | def clip_boxes(boxes, im_shape): 64 | """ 65 | Clip boxes to image boundaries. 66 | """ 67 | 68 | # x1 >= 0 69 | boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0) 70 | # y1 >= 0 71 | boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0) 72 | # x2 < im_shape[1] 73 | boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0) 74 | # y2 < im_shape[0] 75 | boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0) 76 | return boxes 77 | -------------------------------------------------------------------------------- /data/README.md: -------------------------------------------------------------------------------- 1 | This directory holds (*after you download them*): 2 | - Fast R-CNN models trained with OHEM on VOC 2007 trainval 3 | - Caffe models pre-trained on ImageNet 4 | - Symlinks to datasets 5 | 6 | To download Fast R-CNN models (VGG_CNN_M_1024, VGG16) trained with OHEM on VOC 2007 trainval, run: 7 | 8 | ``` 9 | ./data/scripts/fetch_fast_rcnn_ohem_models.sh 10 | ``` 11 | 12 | This script will populate `data/fast_rcnn_ohem_models` with VGG16 and VGG_CNN_M_1024 models (Fast R-CNN detectors trained with OHEM). 13 | 14 | 15 | To download Caffe models (ZF, VGG16) pre-trained on ImageNet, run: 16 | 17 | ``` 18 | ./data/scripts/fetch_imagenet_models.sh 19 | ``` 20 | 21 | This script will populate `data/imagenet_models`. 22 | 23 | In order to train and test with PASCAL VOC, you will need to establish symlinks. 
24 | From the `data` directory (`cd data`): 25 | 26 | ``` 27 | # For VOC 2007 28 | ln -s /your/path/to/VOC2007/VOCdevkit VOCdevkit2007 29 | 30 | # For VOC 2012 31 | ln -s /your/path/to/VOC2012/VOCdevkit VOCdevkit2012 32 | ``` 33 | 34 | Install the MS COCO dataset at /path/to/coco 35 | 36 | ``` 37 | ln -s /path/to/coco coco 38 | ``` 39 | 40 | For COCO with Fast R-CNN, place object proposals under `coco_proposals` (inside 41 | the `data` directory). You can obtain proposals on COCO from Jan Hosang at 42 | https://www.mpi-inf.mpg.de/departments/computer-vision-and-multimodal-computing/research/object-recognition-and-scene-understanding/how-good-are-detection-proposals-really/. 43 | For COCO, using MCG is recommended over selective search. MCG boxes can be downloaded 44 | from http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/mcg/. 45 | Use the tool `lib/datasets/tools/mcg_munge.py` to convert the downloaded MCG data 46 | into the same file layout as those from Jan Hosang. 47 | 48 | Since you'll likely be experimenting with multiple installs of Fast/er R-CNN in 49 | parallel, you'll probably want to keep all of this data in a shared place and 50 | use symlinks. On my system I create the following symlinks inside `data`: 51 | 52 | Annotations for the 5k image 'minival' subset of COCO val2014 that I like to use 53 | can be found at http://www.cs.berkeley.edu/~rbg/faster-rcnn-data/instances_minival2014.json.zip. 54 | Annotations for COCO val2014 (set) minus minival (~35k images) can be found at 55 | http://www.cs.berkeley.edu/~rbg/faster-rcnn-data/instances_valminusminival2014.json.zip. 56 | 57 | ``` 58 | # data/cache holds various outputs created by the datasets package 59 | ln -s /data/fast_rcnn_shared/cache 60 | 61 | # move the imagenet_models to shared location and symlink to them 62 | ln -s /data/fast_rcnn_shared/imagenet_models 63 | 64 | # move the selective search data to a shared location and symlink to them 65 | # (only applicable to Fast R-CNN training) 66 | ln -s /data/fast_rcnn_shared/selective_search_data 67 | 68 | ln -s /data/VOC2007/VOCdevkit VOCdevkit2007 69 | ln -s /data/VOC2012/VOCdevkit VOCdevkit2012 70 | ``` 71 | -------------------------------------------------------------------------------- /tools/rpn_generate.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # -------------------------------------------------------- 4 | # Fast/er/ R-CNN 5 | # Copyright (c) 2015 Microsoft 6 | # Licensed under The MIT License [see LICENSE for details] 7 | # Written by Ross Girshick 8 | # -------------------------------------------------------- 9 | 10 | """Generate RPN proposals.""" 11 | 12 | import _init_paths 13 | import numpy as np 14 | from fast_rcnn.config import cfg, cfg_from_file, cfg_from_list, get_output_dir 15 | from datasets.factory import get_imdb 16 | from rpn.generate import imdb_proposals 17 | import cPickle 18 | import caffe 19 | import argparse 20 | import pprint 21 | import time, os, sys 22 | 23 | def parse_args(): 24 | """ 25 | Parse input arguments 26 | """ 27 | parser = argparse.ArgumentParser(description='Test a Fast R-CNN network') 28 | parser.add_argument('--gpu', dest='gpu_id', help='GPU id to use', 29 | default=0, type=int) 30 | parser.add_argument('--def', dest='prototxt', 31 | help='prototxt file defining the network', 32 | default=None, type=str) 33 | parser.add_argument('--net', dest='caffemodel', 34 | help='model to test', 35 | default=None, type=str) 36 | 
parser.add_argument('--cfg', dest='cfg_file', 37 | help='optional config file', default=None, type=str) 38 | parser.add_argument('--wait', dest='wait', 39 | help='wait until net file exists', 40 | default=True, type=bool) 41 | parser.add_argument('--imdb', dest='imdb_name', 42 | help='dataset to test', 43 | default='voc_2007_test', type=str) 44 | parser.add_argument('--set', dest='set_cfgs', 45 | help='set config keys', default=None, 46 | nargs=argparse.REMAINDER) 47 | 48 | if len(sys.argv) == 1: 49 | parser.print_help() 50 | sys.exit(1) 51 | 52 | args = parser.parse_args() 53 | return args 54 | 55 | if __name__ == '__main__': 56 | args = parse_args() 57 | 58 | print('Called with args:') 59 | print(args) 60 | 61 | if args.cfg_file is not None: 62 | cfg_from_file(args.cfg_file) 63 | if args.set_cfgs is not None: 64 | cfg_from_list(args.set_cfgs) 65 | 66 | cfg.GPU_ID = args.gpu_id 67 | 68 | # RPN test settings 69 | cfg.TEST.RPN_PRE_NMS_TOP_N = -1 70 | cfg.TEST.RPN_POST_NMS_TOP_N = 2000 71 | 72 | print('Using config:') 73 | pprint.pprint(cfg) 74 | 75 | while not os.path.exists(args.caffemodel) and args.wait: 76 | print('Waiting for {} to exist...'.format(args.caffemodel)) 77 | time.sleep(10) 78 | 79 | caffe.set_mode_gpu() 80 | caffe.set_device(args.gpu_id) 81 | net = caffe.Net(args.prototxt, args.caffemodel, caffe.TEST) 82 | net.name = os.path.splitext(os.path.basename(args.caffemodel))[0] 83 | 84 | imdb = get_imdb(args.imdb_name) 85 | imdb_boxes = imdb_proposals(net, imdb) 86 | 87 | output_dir = get_output_dir(imdb, net) 88 | rpn_file = os.path.join(output_dir, net.name + '_rpn_proposals.pkl') 89 | with open(rpn_file, 'wb') as f: 90 | cPickle.dump(imdb_boxes, f, cPickle.HIGHEST_PROTOCOL) 91 | print 'Wrote RPN proposals to {}'.format(rpn_file) 92 | -------------------------------------------------------------------------------- /lib/rpn/generate_anchors.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Faster R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick and Sean Bell 6 | # -------------------------------------------------------- 7 | 8 | import numpy as np 9 | 10 | # Verify that we compute the same anchors as Shaoqing's matlab implementation: 11 | # 12 | # >> load output/rpn_cachedir/faster_rcnn_VOC2007_ZF_stage1_rpn/anchors.mat 13 | # >> anchors 14 | # 15 | # anchors = 16 | # 17 | # -83 -39 100 56 18 | # -175 -87 192 104 19 | # -359 -183 376 200 20 | # -55 -55 72 72 21 | # -119 -119 136 136 22 | # -247 -247 264 264 23 | # -35 -79 52 96 24 | # -79 -167 96 184 25 | # -167 -343 184 360 26 | 27 | #array([[ -83., -39., 100., 56.], 28 | # [-175., -87., 192., 104.], 29 | # [-359., -183., 376., 200.], 30 | # [ -55., -55., 72., 72.], 31 | # [-119., -119., 136., 136.], 32 | # [-247., -247., 264., 264.], 33 | # [ -35., -79., 52., 96.], 34 | # [ -79., -167., 96., 184.], 35 | # [-167., -343., 184., 360.]]) 36 | 37 | def generate_anchors(base_size=16, ratios=[0.5, 1, 2], 38 | scales=2**np.arange(3, 6)): 39 | """ 40 | Generate anchor (reference) windows by enumerating aspect ratios X 41 | scales wrt a reference (0, 0, 15, 15) window. 
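    With the defaults (base_size=16, ratios [0.5, 1, 2], scales [8, 16, 32]),
    this yields the nine anchors tabulated in the comment above.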
42 | """ 43 | 44 | base_anchor = np.array([1, 1, base_size, base_size]) - 1 45 | ratio_anchors = _ratio_enum(base_anchor, ratios) 46 | anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales) 47 | for i in xrange(ratio_anchors.shape[0])]) 48 | return anchors 49 | 50 | def _whctrs(anchor): 51 | """ 52 | Return width, height, x center, and y center for an anchor (window). 53 | """ 54 | 55 | w = anchor[2] - anchor[0] + 1 56 | h = anchor[3] - anchor[1] + 1 57 | x_ctr = anchor[0] + 0.5 * (w - 1) 58 | y_ctr = anchor[1] + 0.5 * (h - 1) 59 | return w, h, x_ctr, y_ctr 60 | 61 | def _mkanchors(ws, hs, x_ctr, y_ctr): 62 | """ 63 | Given a vector of widths (ws) and heights (hs) around a center 64 | (x_ctr, y_ctr), output a set of anchors (windows). 65 | """ 66 | 67 | ws = ws[:, np.newaxis] 68 | hs = hs[:, np.newaxis] 69 | anchors = np.hstack((x_ctr - 0.5 * (ws - 1), 70 | y_ctr - 0.5 * (hs - 1), 71 | x_ctr + 0.5 * (ws - 1), 72 | y_ctr + 0.5 * (hs - 1))) 73 | return anchors 74 | 75 | def _ratio_enum(anchor, ratios): 76 | """ 77 | Enumerate a set of anchors for each aspect ratio wrt an anchor. 78 | """ 79 | 80 | w, h, x_ctr, y_ctr = _whctrs(anchor) 81 | size = w * h 82 | size_ratios = size / ratios 83 | ws = np.round(np.sqrt(size_ratios)) 84 | hs = np.round(ws * ratios) 85 | anchors = _mkanchors(ws, hs, x_ctr, y_ctr) 86 | return anchors 87 | 88 | def _scale_enum(anchor, scales): 89 | """ 90 | Enumerate a set of anchors for each scale wrt an anchor. 91 | """ 92 | 93 | w, h, x_ctr, y_ctr = _whctrs(anchor) 94 | ws = w * scales 95 | hs = h * scales 96 | anchors = _mkanchors(ws, hs, x_ctr, y_ctr) 97 | return anchors 98 | 99 | if __name__ == '__main__': 100 | import time 101 | t = time.time() 102 | a = generate_anchors() 103 | print time.time() - t 104 | print a 105 | from IPython import embed; embed() 106 | -------------------------------------------------------------------------------- /tools/test_net.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # -------------------------------------------------------- 4 | # Fast R-CNN with OHEM 5 | # Licensed under The MIT License [see LICENSE for details] 6 | # Written by Ross Girshick and Abhinav Shrivastava 7 | # -------------------------------------------------------- 8 | 9 | """Test a Fast R-CNN network on an image database.""" 10 | 11 | import _init_paths 12 | from fast_rcnn.test import test_net 13 | from fast_rcnn.config import cfg, cfg_from_file, cfg_from_list 14 | from datasets.factory import get_imdb 15 | import caffe 16 | import argparse 17 | import pprint 18 | import time, os, sys 19 | 20 | def parse_args(): 21 | """ 22 | Parse input arguments 23 | """ 24 | parser = argparse.ArgumentParser(description='Test a Fast R-CNN network') 25 | parser.add_argument('--gpu', dest='gpu_id', help='GPU id to use', 26 | default=0, type=int) 27 | parser.add_argument('--def', dest='prototxt', 28 | help='prototxt file defining the network', 29 | default=None, type=str) 30 | parser.add_argument('--net', dest='caffemodel', 31 | help='model to test', 32 | default=None, type=str) 33 | parser.add_argument('--cfg', dest='cfg_file', 34 | help='optional config file', default=None, type=str) 35 | parser.add_argument('--wait', dest='wait', 36 | help='wait until net file exists', 37 | default=True, type=bool) 38 | parser.add_argument('--imdb', dest='imdb_name', 39 | help='dataset to test', 40 | default='voc_2007_test', type=str) 41 | parser.add_argument('--comp', dest='comp_mode', help='competition mode', 42 | 
action='store_true') 43 | parser.add_argument('--set', dest='set_cfgs', 44 | help='set config keys', default=None, 45 | nargs=argparse.REMAINDER) 46 | parser.add_argument('--vis', dest='vis', help='visualize detections', 47 | action='store_true') 48 | parser.add_argument('--num_dets', dest='max_per_image', 49 | help='max number of detections per image', 50 | default=100, type=int) 51 | parser.add_argument('--det_thresh', dest='det_thresh', 52 | help='detection score threshold', 53 | default=0.05, type=float) 54 | 55 | if len(sys.argv) == 1: 56 | parser.print_help() 57 | sys.exit(1) 58 | 59 | args = parser.parse_args() 60 | return args 61 | 62 | if __name__ == '__main__': 63 | args = parse_args() 64 | 65 | print('Called with args:') 66 | print(args) 67 | 68 | if args.cfg_file is not None: 69 | cfg_from_file(args.cfg_file) 70 | if args.set_cfgs is not None: 71 | cfg_from_list(args.set_cfgs) 72 | 73 | cfg.GPU_ID = args.gpu_id 74 | 75 | print('Using config:') 76 | pprint.pprint(cfg) 77 | 78 | while not os.path.exists(args.caffemodel) and args.wait: 79 | print('Waiting for {} to exist...'.format(args.caffemodel)) 80 | time.sleep(10) 81 | 82 | caffe.set_mode_gpu() 83 | caffe.set_device(args.gpu_id) 84 | net = caffe.Net(args.prototxt, args.caffemodel, caffe.TEST) 85 | net.name = os.path.splitext(os.path.basename(args.caffemodel))[0] 86 | 87 | imdb = get_imdb(args.imdb_name) 88 | imdb.competition_mode(args.comp_mode) 89 | if not cfg.TEST.HAS_RPN: 90 | imdb.set_proposal_method(cfg.TEST.PROPOSAL_METHOD) 91 | 92 | test_net(net, imdb, max_per_image=args.max_per_image, vis=args.vis, thresh=args.det_thresh) 93 | -------------------------------------------------------------------------------- /tools/train_net.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # -------------------------------------------------------- 4 | # Fast R-CNN 5 | # Copyright (c) 2015 Microsoft 6 | # Licensed under The MIT License [see LICENSE for details] 7 | # Written by Ross Girshick 8 | # -------------------------------------------------------- 9 | 10 | """Train a Fast R-CNN network on a region of interest database.""" 11 | 12 | import _init_paths 13 | from fast_rcnn.train import get_training_roidb, train_net 14 | from fast_rcnn.config import cfg, cfg_from_file, cfg_from_list, get_output_dir 15 | from datasets.factory import get_imdb 16 | import datasets.imdb 17 | import caffe 18 | import argparse 19 | import pprint 20 | import numpy as np 21 | import sys 22 | 23 | def parse_args(): 24 | """ 25 | Parse input arguments 26 | """ 27 | parser = argparse.ArgumentParser(description='Train a Fast R-CNN network') 28 | parser.add_argument('--gpu', dest='gpu_id', 29 | help='GPU device id to use [0]', 30 | default=0, type=int) 31 | parser.add_argument('--solver', dest='solver', 32 | help='solver prototxt', 33 | default=None, type=str) 34 | parser.add_argument('--iters', dest='max_iters', 35 | help='number of iterations to train', 36 | default=40000, type=int) 37 | parser.add_argument('--weights', dest='pretrained_model', 38 | help='initialize with pretrained model weights', 39 | default=None, type=str) 40 | parser.add_argument('--cfg', dest='cfg_file', 41 | help='optional config file', 42 | default=None, type=str) 43 | parser.add_argument('--imdb', dest='imdb_name', 44 | help='dataset to train on', 45 | default='voc_2007_trainval', type=str) 46 | parser.add_argument('--rand', dest='randomize', 47 | help='randomize (do not use a fixed seed)', 48 | action='store_true') 49 
| parser.add_argument('--set', dest='set_cfgs', 50 | help='set config keys', default=None, 51 | nargs=argparse.REMAINDER) 52 | 53 | if len(sys.argv) == 1: 54 | parser.print_help() 55 | sys.exit(1) 56 | 57 | args = parser.parse_args() 58 | return args 59 | 60 | def combined_roidb(imdb_names): 61 | def get_roidb(imdb_name): 62 | imdb = get_imdb(imdb_name) 63 | print 'Loaded dataset `{:s}` for training'.format(imdb.name) 64 | imdb.set_proposal_method(cfg.TRAIN.PROPOSAL_METHOD) 65 | print 'Set proposal method: {:s}'.format(cfg.TRAIN.PROPOSAL_METHOD) 66 | roidb = get_training_roidb(imdb) 67 | return roidb 68 | 69 | roidbs = [get_roidb(s) for s in imdb_names.split('+')] 70 | roidb = roidbs[0] 71 | if len(roidbs) > 1: 72 | for r in roidbs[1:]: 73 | roidb.extend(r) 74 | imdb = datasets.imdb.imdb(imdb_names) 75 | else: 76 | imdb = get_imdb(imdb_names) 77 | return imdb, roidb 78 | 79 | if __name__ == '__main__': 80 | args = parse_args() 81 | 82 | print('Called with args:') 83 | print(args) 84 | 85 | if args.cfg_file is not None: 86 | cfg_from_file(args.cfg_file) 87 | if args.set_cfgs is not None: 88 | cfg_from_list(args.set_cfgs) 89 | 90 | cfg.GPU_ID = args.gpu_id 91 | 92 | print('Using config:') 93 | pprint.pprint(cfg) 94 | 95 | if not args.randomize: 96 | # fix the random seeds (numpy and caffe) for reproducibility 97 | np.random.seed(cfg.RNG_SEED) 98 | caffe.set_random_seed(cfg.RNG_SEED) 99 | 100 | # set up caffe 101 | caffe.set_mode_gpu() 102 | caffe.set_device(args.gpu_id) 103 | 104 | imdb, roidb = combined_roidb(args.imdb_name) 105 | print '{:d} roidb entries'.format(len(roidb)) 106 | 107 | output_dir = get_output_dir(imdb) 108 | print 'Output will be saved to `{:s}`'.format(output_dir) 109 | 110 | train_net(args.solver, roidb, output_dir, 111 | pretrained_model=args.pretrained_model, 112 | max_iters=args.max_iters) 113 | -------------------------------------------------------------------------------- /lib/pycocotools/mask.py: -------------------------------------------------------------------------------- 1 | __author__ = 'tsungyi' 2 | 3 | import pycocotools._mask as _mask 4 | 5 | # Interface for manipulating masks stored in RLE format. 6 | # 7 | # RLE is a simple yet efficient format for storing binary masks. RLE 8 | # first divides a vector (or vectorized image) into a series of piecewise 9 | # constant regions and then for each piece simply stores the length of 10 | # that piece. For example, given M=[0 0 1 1 1 0 1] the RLE counts would 11 | # be [2 3 1 1], or for M=[1 1 1 1 1 1 0] the counts would be [0 6 1] 12 | # (note that the odd counts are always the numbers of zeros). Instead of 13 | # storing the counts directly, additional compression is achieved with a 14 | # variable bitrate representation based on a common scheme called LEB128. 15 | # 16 | # Compression is greatest given large piecewise constant regions. 17 | # Specifically, the size of the RLE is proportional to the number of 18 | # *boundaries* in M (or for an image the number of boundaries in the y 19 | # direction). Assuming fairly simple shapes, the RLE representation is 20 | # O(sqrt(n)) where n is number of pixels in the object. Hence space usage 21 | # is substantially lower, especially for large simple objects (large n). 22 | # 23 | # Many common operations on masks can be computed directly using the RLE 24 | # (without need for decoding). This includes computations such as area, 25 | # union, intersection, etc. 
All of these operations are linear in the 26 | # size of the RLE, in other words they are O(sqrt(n)) where n is the area 27 | # of the object. Computing these operations on the original mask is O(n). 28 | # Thus, using the RLE can result in substantial computational savings. 29 | # 30 | # The following API functions are defined: 31 | # encode - Encode binary masks using RLE. 32 | # decode - Decode binary masks encoded via RLE. 33 | # merge - Compute union or intersection of encoded masks. 34 | # iou - Compute intersection over union between masks. 35 | # area - Compute area of encoded masks. 36 | # toBbox - Get bounding boxes surrounding encoded masks. 37 | # frPyObjects - Convert polygon, bbox, and uncompressed RLE to encoded RLE mask. 38 | # 39 | # Usage: 40 | # Rs = encode( masks ) 41 | # masks = decode( Rs ) 42 | # R = merge( Rs, intersect=false ) 43 | # o = iou( dt, gt, iscrowd ) 44 | # a = area( Rs ) 45 | # bbs = toBbox( Rs ) 46 | # Rs = frPyObjects( [pyObjects], h, w ) 47 | # 48 | # In the API the following formats are used: 49 | # Rs - [dict] Run-length encoding of binary masks 50 | # R - dict Run-length encoding of binary mask 51 | # masks - [hxwxn] Binary mask(s) (must have type np.ndarray(dtype=uint8) in column-major order) 52 | # iscrowd - [nx1] list of np.ndarray. 1 indicates corresponding gt image has crowd region to ignore 53 | # bbs - [nx4] Bounding box(es) stored as [x y w h] 54 | # poly - Polygon stored as [[x1 y1 x2 y2...],[x1 y1 ...],...] (2D list) 55 | # dt,gt - May be either bounding boxes or encoded masks 56 | # Both poly and bbs are 0-indexed (bbox=[0 0 1 1] encloses first pixel). 57 | # 58 | # Finally, a note about the intersection over union (iou) computation. 59 | # The standard iou of a ground truth (gt) and detected (dt) object is 60 | # iou(gt,dt) = area(intersect(gt,dt)) / area(union(gt,dt)) 61 | # For "crowd" regions, we use a modified criteria. If a gt object is 62 | # marked as "iscrowd", we allow a dt to match any subregion of the gt. 63 | # Choosing gt' in the crowd gt that best matches the dt can be done using 64 | # gt'=intersect(dt,gt). Since by definition union(gt',dt)=dt, computing 65 | # iou(gt,dt,iscrowd) = iou(gt',dt) = area(intersect(gt,dt)) / area(dt) 66 | # For crowd gt regions we use this modified criteria above for the iou. 67 | # 68 | # To compile run "python setup.py build_ext --inplace" 69 | # Please do not contact us for help with compiling. 70 | # 71 | # Microsoft COCO Toolbox. version 2.0 72 | # Data, paper, and tutorials available at: http://mscoco.org/ 73 | # Code written by Piotr Dollar and Tsung-Yi Lin, 2015. 
74 | # Licensed under the Simplified BSD License [see coco/license.txt]
75 |
76 | encode = _mask.encode
77 | decode = _mask.decode
78 | iou = _mask.iou
79 | merge = _mask.merge
80 | area = _mask.area
81 | toBbox = _mask.toBbox
82 | frPyObjects = _mask.frPyObjects
--------------------------------------------------------------------------------
/lib/rpn/generate.py:
--------------------------------------------------------------------------------
1 | # --------------------------------------------------------
2 | # Faster R-CNN
3 | # Copyright (c) 2015 Microsoft
4 | # Licensed under The MIT License [see LICENSE for details]
5 | # Written by Ross Girshick
6 | # --------------------------------------------------------
7 |
8 | from fast_rcnn.config import cfg
9 | from utils.blob import im_list_to_blob
10 | from utils.timer import Timer
11 | import numpy as np
12 | import cv2
13 | import matplotlib.pyplot as plt  # needed by _vis_proposals below
14 | def _vis_proposals(im, dets, thresh=0.5):
15 | """Draw detected bounding boxes."""
16 | inds = np.where(dets[:, -1] >= thresh)[0]
17 | if len(inds) == 0:
18 | return
19 |
20 | class_name = 'obj'
21 | im = im[:, :, (2, 1, 0)]
22 | fig, ax = plt.subplots(figsize=(12, 12))
23 | ax.imshow(im, aspect='equal')
24 | for i in inds:
25 | bbox = dets[i, :4]
26 | score = dets[i, -1]
27 |
28 | ax.add_patch(
29 | plt.Rectangle((bbox[0], bbox[1]),
30 | bbox[2] - bbox[0],
31 | bbox[3] - bbox[1], fill=False,
32 | edgecolor='red', linewidth=3.5)
33 | )
34 | ax.text(bbox[0], bbox[1] - 2,
35 | '{:s} {:.3f}'.format(class_name, score),
36 | bbox=dict(facecolor='blue', alpha=0.5),
37 | fontsize=14, color='white')
38 |
39 | ax.set_title(('{} detections with '
40 | 'p({} | box) >= {:.1f}').format(class_name, class_name,
41 | thresh),
42 | fontsize=14)
43 | plt.axis('off')
44 | plt.tight_layout()
45 | plt.draw()
46 |
47 | def _get_image_blob(im):
48 | """Converts an image into a network input.
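    The image is rescaled so that its shorter side equals cfg.TEST.SCALES[0],
    with the longer side capped at cfg.TEST.MAX_SIZE (see the code below).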
49 | 50 | Arguments: 51 | im (ndarray): a color image in BGR order 52 | 53 | Returns: 54 | blob (ndarray): a data blob holding an image pyramid 55 | im_scale_factors (list): list of image scales (relative to im) used 56 | in the image pyramid 57 | """ 58 | im_orig = im.astype(np.float32, copy=True) 59 | im_orig -= cfg.PIXEL_MEANS 60 | 61 | im_shape = im_orig.shape 62 | im_size_min = np.min(im_shape[0:2]) 63 | im_size_max = np.max(im_shape[0:2]) 64 | 65 | processed_ims = [] 66 | 67 | assert len(cfg.TEST.SCALES) == 1 68 | target_size = cfg.TEST.SCALES[0] 69 | 70 | im_scale = float(target_size) / float(im_size_min) 71 | # Prevent the biggest axis from being more than MAX_SIZE 72 | if np.round(im_scale * im_size_max) > cfg.TEST.MAX_SIZE: 73 | im_scale = float(cfg.TEST.MAX_SIZE) / float(im_size_max) 74 | im = cv2.resize(im_orig, None, None, fx=im_scale, fy=im_scale, 75 | interpolation=cv2.INTER_LINEAR) 76 | im_info = np.hstack((im.shape[:2], im_scale))[np.newaxis, :] 77 | processed_ims.append(im) 78 | 79 | # Create a blob to hold the input images 80 | blob = im_list_to_blob(processed_ims) 81 | 82 | return blob, im_info 83 | 84 | def im_proposals(net, im): 85 | """Generate RPN proposals on a single image.""" 86 | blobs = {} 87 | blobs['data'], blobs['im_info'] = _get_image_blob(im) 88 | net.blobs['data'].reshape(*(blobs['data'].shape)) 89 | net.blobs['im_info'].reshape(*(blobs['im_info'].shape)) 90 | blobs_out = net.forward( 91 | data=blobs['data'].astype(np.float32, copy=False), 92 | im_info=blobs['im_info'].astype(np.float32, copy=False)) 93 | 94 | scale = blobs['im_info'][0, 2] 95 | boxes = blobs_out['rois'][:, 1:].copy() / scale 96 | scores = blobs_out['scores'].copy() 97 | return boxes, scores 98 | 99 | def imdb_proposals(net, imdb): 100 | """Generate RPN proposals on all images in an imdb.""" 101 | 102 | _t = Timer() 103 | imdb_boxes = [[] for _ in xrange(imdb.num_images)] 104 | for i in xrange(imdb.num_images): 105 | im = cv2.imread(imdb.image_path_at(i)) 106 | _t.tic() 107 | imdb_boxes[i], scores = im_proposals(net, im) 108 | _t.toc() 109 | print 'im_proposals: {:d}/{:d} {:.3f}s' \ 110 | .format(i + 1, imdb.num_images, _t.average_time) 111 | if 0: 112 | dets = np.hstack((imdb_boxes[i], scores)) 113 | # from IPython import embed; embed() 114 | _vis_proposals(im, dets[:3, :], thresh=0.9) 115 | plt.show() 116 | 117 | return imdb_boxes 118 | -------------------------------------------------------------------------------- /tools/compress_net.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | # -------------------------------------------------------- 4 | # Fast R-CNN 5 | # Copyright (c) 2015 Microsoft 6 | # Licensed under The MIT License [see LICENSE for details] 7 | # Written by Ross Girshick 8 | # -------------------------------------------------------- 9 | 10 | """Compress a Fast R-CNN network using truncated SVD.""" 11 | 12 | import _init_paths 13 | import caffe 14 | import argparse 15 | import numpy as np 16 | import os, sys 17 | 18 | def parse_args(): 19 | """Parse input arguments.""" 20 | parser = argparse.ArgumentParser(description='Compress a Fast R-CNN network') 21 | parser.add_argument('--def', dest='prototxt', 22 | help='prototxt file defining the uncompressed network', 23 | default=None, type=str) 24 | parser.add_argument('--def-svd', dest='prototxt_svd', 25 | help='prototxt file defining the SVD compressed network', 26 | default=None, type=str) 27 | parser.add_argument('--net', dest='caffemodel', 28 | help='model 
to compress', 29 | default=None, type=str) 30 | 31 | if len(sys.argv) == 1: 32 | parser.print_help() 33 | sys.exit(1) 34 | 35 | args = parser.parse_args() 36 | return args 37 | 38 | def compress_weights(W, l): 39 | """Compress the weight matrix W of an inner product (fully connected) layer 40 | using truncated SVD. 41 | 42 | Parameters: 43 | W: N x M weights matrix 44 | l: number of singular values to retain 45 | 46 | Returns: 47 | Ul, L: matrices such that W \approx Ul*L 48 | """ 49 | 50 | # numpy doesn't seem to have a fast truncated SVD algorithm... 51 | # this could be faster 52 | U, s, V = np.linalg.svd(W, full_matrices=False) 53 | 54 | Ul = U[:, :l] 55 | sl = s[:l] 56 | Vl = V[:l, :] 57 | 58 | L = np.dot(np.diag(sl), Vl) 59 | return Ul, L 60 | 61 | def main(): 62 | args = parse_args() 63 | 64 | # prototxt = 'models/VGG16/test.prototxt' 65 | # caffemodel = 'snapshots/vgg16_fast_rcnn_iter_40000.caffemodel' 66 | net = caffe.Net(args.prototxt, args.caffemodel, caffe.TEST) 67 | 68 | # prototxt_svd = 'models/VGG16/svd/test_fc6_fc7.prototxt' 69 | # caffemodel = 'snapshots/vgg16_fast_rcnn_iter_40000.caffemodel' 70 | net_svd = caffe.Net(args.prototxt_svd, args.caffemodel, caffe.TEST) 71 | 72 | print('Uncompressed network {} : {}'.format(args.prototxt, args.caffemodel)) 73 | print('Compressed network prototxt {}'.format(args.prototxt_svd)) 74 | 75 | out = os.path.splitext(os.path.basename(args.caffemodel))[0] + '_svd' 76 | out_dir = os.path.dirname(args.caffemodel) 77 | 78 | # Compress fc6 79 | if net_svd.params.has_key('fc6_L'): 80 | l_fc6 = net_svd.params['fc6_L'][0].data.shape[0] 81 | print(' fc6_L bottleneck size: {}'.format(l_fc6)) 82 | 83 | # uncompressed weights and biases 84 | W_fc6 = net.params['fc6'][0].data 85 | B_fc6 = net.params['fc6'][1].data 86 | 87 | print(' compressing fc6...') 88 | Ul_fc6, L_fc6 = compress_weights(W_fc6, l_fc6) 89 | 90 | assert(len(net_svd.params['fc6_L']) == 1) 91 | 92 | # install compressed matrix factors (and original biases) 93 | net_svd.params['fc6_L'][0].data[...] = L_fc6 94 | 95 | net_svd.params['fc6_U'][0].data[...] = Ul_fc6 96 | net_svd.params['fc6_U'][1].data[...] = B_fc6 97 | 98 | out += '_fc6_{}'.format(l_fc6) 99 | 100 | # Compress fc7 101 | if net_svd.params.has_key('fc7_L'): 102 | l_fc7 = net_svd.params['fc7_L'][0].data.shape[0] 103 | print ' fc7_L bottleneck size: {}'.format(l_fc7) 104 | 105 | W_fc7 = net.params['fc7'][0].data 106 | B_fc7 = net.params['fc7'][1].data 107 | 108 | print(' compressing fc7...') 109 | Ul_fc7, L_fc7 = compress_weights(W_fc7, l_fc7) 110 | 111 | assert(len(net_svd.params['fc7_L']) == 1) 112 | 113 | net_svd.params['fc7_L'][0].data[...] = L_fc7 114 | 115 | net_svd.params['fc7_U'][0].data[...] = Ul_fc7 116 | net_svd.params['fc7_U'][1].data[...] 
= B_fc7
117 |
118 | out += '_fc7_{}'.format(l_fc7)
119 |
120 | filename = '{}/{}.caffemodel'.format(out_dir, out)
121 | net_svd.save(filename)
122 | print 'Wrote svd model to: {:s}'.format(filename)
123 |
124 | if __name__ == '__main__':
125 | main()
126 |
--------------------------------------------------------------------------------
/lib/nms/nms_kernel.cu:
--------------------------------------------------------------------------------
1 | // ------------------------------------------------------------------
2 | // Faster R-CNN
3 | // Copyright (c) 2015 Microsoft
4 | // Licensed under The MIT License [see fast-rcnn/LICENSE for details]
5 | // Written by Shaoqing Ren
6 | // ------------------------------------------------------------------
7 |
8 | #include "gpu_nms.hpp"
9 | #include <vector>
10 | #include <iostream>
11 |
12 | #define CUDA_CHECK(condition) \
13 | /* Code block avoids redefinition of cudaError_t error */ \
14 | do { \
15 | cudaError_t error = condition; \
16 | if (error != cudaSuccess) { \
17 | std::cout << cudaGetErrorString(error) << std::endl; \
18 | } \
19 | } while (0)
20 |
21 | #define DIVUP(m,n) ((m) / (n) + ((m) % (n) > 0))
22 | int const threadsPerBlock = sizeof(unsigned long long) * 8;
23 |
24 | __device__ inline float devIoU(float const * const a, float const * const b) {
25 | float left = max(a[0], b[0]), right = min(a[2], b[2]);
26 | float top = max(a[1], b[1]), bottom = min(a[3], b[3]);
27 | float width = max(right - left + 1, 0.f), height = max(bottom - top + 1, 0.f);
28 | float interS = width * height;
29 | float Sa = (a[2] - a[0] + 1) * (a[3] - a[1] + 1);
30 | float Sb = (b[2] - b[0] + 1) * (b[3] - b[1] + 1);
31 | return interS / (Sa + Sb - interS);
32 | }
33 |
34 | __global__ void nms_kernel(const int n_boxes, const float nms_overlap_thresh,
35 | const float *dev_boxes, unsigned long long *dev_mask) {
36 | const int row_start = blockIdx.y;
37 | const int col_start = blockIdx.x;
38 |
39 | // if (row_start > col_start) return;
40 |
41 | const int row_size =
42 | min(n_boxes - row_start * threadsPerBlock, threadsPerBlock);
43 | const int col_size =
44 | min(n_boxes - col_start * threadsPerBlock, threadsPerBlock);
45 |
46 | __shared__ float block_boxes[threadsPerBlock * 5];
47 | if (threadIdx.x < col_size) {
48 | block_boxes[threadIdx.x * 5 + 0] =
49 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 0];
50 | block_boxes[threadIdx.x * 5 + 1] =
51 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 1];
52 | block_boxes[threadIdx.x * 5 + 2] =
53 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 2];
54 | block_boxes[threadIdx.x * 5 + 3] =
55 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 3];
56 | block_boxes[threadIdx.x * 5 + 4] =
57 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 4];
58 | }
59 | __syncthreads();
60 |
61 | if (threadIdx.x < row_size) {
62 | const int cur_box_idx = threadsPerBlock * row_start + threadIdx.x;
63 | const float *cur_box = dev_boxes + cur_box_idx * 5;
64 | int i = 0;
65 | unsigned long long t = 0;
66 | int start = 0;
67 | if (row_start == col_start) {
68 | start = threadIdx.x + 1;
69 | }
70 | for (i = start; i < col_size; i++) {
71 | if (devIoU(cur_box, block_boxes + i * 5) > nms_overlap_thresh) {
72 | t |= 1ULL << i;
73 | }
74 | }
75 | const int col_blocks = DIVUP(n_boxes, threadsPerBlock);
76 | dev_mask[cur_box_idx * col_blocks + col_start] = t;
77 | }
78 | }
79 |
80 | void _set_device(int device_id) {
81 | int current_device;
82 | CUDA_CHECK(cudaGetDevice(&current_device));
83 | if (current_device == device_id) {
84 | return;
85 | }
86 | // The call to cudaSetDevice must come before any calls to Get, which
87 | // may perform initialization using the GPU.
88 | CUDA_CHECK(cudaSetDevice(device_id));
89 | }
90 |
91 | void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num,
92 | int boxes_dim, float nms_overlap_thresh, int device_id) {
93 | _set_device(device_id);
94 |
95 | float* boxes_dev = NULL;
96 | unsigned long long* mask_dev = NULL;
97 |
98 | const int col_blocks = DIVUP(boxes_num, threadsPerBlock);
99 |
100 | CUDA_CHECK(cudaMalloc(&boxes_dev,
101 | boxes_num * boxes_dim * sizeof(float)));
102 | CUDA_CHECK(cudaMemcpy(boxes_dev,
103 | boxes_host,
104 | boxes_num * boxes_dim * sizeof(float),
105 | cudaMemcpyHostToDevice));
106 |
107 | CUDA_CHECK(cudaMalloc(&mask_dev,
108 | boxes_num * col_blocks * sizeof(unsigned long long)));
109 |
110 | dim3 blocks(DIVUP(boxes_num, threadsPerBlock),
111 | DIVUP(boxes_num, threadsPerBlock));
112 | dim3 threads(threadsPerBlock);
113 | nms_kernel<<<blocks, threads>>>(boxes_num,
114 | nms_overlap_thresh,
115 | boxes_dev,
116 | mask_dev);
117 |
118 | std::vector<unsigned long long> mask_host(boxes_num * col_blocks);
119 | CUDA_CHECK(cudaMemcpy(&mask_host[0],
120 | mask_dev,
121 | sizeof(unsigned long long) * boxes_num * col_blocks,
122 | cudaMemcpyDeviceToHost));
123 |
124 | std::vector<unsigned long long> remv(col_blocks);
125 | memset(&remv[0], 0, sizeof(unsigned long long) * col_blocks);
126 |
127 | int num_to_keep = 0;
128 | for (int i = 0; i < boxes_num; i++) {
129 | int nblock = i / threadsPerBlock;
130 | int inblock = i % threadsPerBlock;
131 |
132 | if (!(remv[nblock] & (1ULL << inblock))) {
133 | keep_out[num_to_keep++] = i;
134 | unsigned long long *p = &mask_host[0] + i * col_blocks;
135 | for (int j = nblock; j < col_blocks; j++) {
136 | remv[j] |= p[j];
137 | }
138 | }
139 | }
140 | *num_out = num_to_keep;
141 |
142 | CUDA_CHECK(cudaFree(boxes_dev));
143 | CUDA_CHECK(cudaFree(mask_dev));
144 | }
145 |
--------------------------------------------------------------------------------
/tools/demo.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 |
3 | # --------------------------------------------------------
4 | # Faster R-CNN
5 | # Copyright (c) 2015 Microsoft
6 | # Licensed under The MIT License [see LICENSE for details]
7 | # Written by Ross Girshick
8 | # --------------------------------------------------------
9 |
10 | """
11 | Demo script showing detections in sample images.
12 |
13 | See README.md for installation instructions before running.
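Example usage (flags as defined in parse_args below):
    ./tools/demo.py --gpu 0 --net vgg16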
14 | """ 15 | 16 | import _init_paths 17 | from fast_rcnn.config import cfg 18 | from fast_rcnn.test import im_detect 19 | from fast_rcnn.nms_wrapper import nms 20 | from utils.timer import Timer 21 | import matplotlib.pyplot as plt 22 | import numpy as np 23 | import scipy.io as sio 24 | import caffe, os, sys, cv2 25 | import argparse 26 | 27 | CLASSES = ('__background__', 28 | 'aeroplane', 'bicycle', 'bird', 'boat', 29 | 'bottle', 'bus', 'car', 'cat', 'chair', 30 | 'cow', 'diningtable', 'dog', 'horse', 31 | 'motorbike', 'person', 'pottedplant', 32 | 'sheep', 'sofa', 'train', 'tvmonitor') 33 | 34 | NETS = {'vgg16': ('VGG16', 35 | 'VGG16_faster_rcnn_final.caffemodel'), 36 | 'zf': ('ZF', 37 | 'ZF_faster_rcnn_final.caffemodel')} 38 | 39 | 40 | def vis_detections(im, class_name, dets, thresh=0.5): 41 | """Draw detected bounding boxes.""" 42 | inds = np.where(dets[:, -1] >= thresh)[0] 43 | if len(inds) == 0: 44 | return 45 | 46 | im = im[:, :, (2, 1, 0)] 47 | fig, ax = plt.subplots(figsize=(12, 12)) 48 | ax.imshow(im, aspect='equal') 49 | for i in inds: 50 | bbox = dets[i, :4] 51 | score = dets[i, -1] 52 | 53 | ax.add_patch( 54 | plt.Rectangle((bbox[0], bbox[1]), 55 | bbox[2] - bbox[0], 56 | bbox[3] - bbox[1], fill=False, 57 | edgecolor='red', linewidth=3.5) 58 | ) 59 | ax.text(bbox[0], bbox[1] - 2, 60 | '{:s} {:.3f}'.format(class_name, score), 61 | bbox=dict(facecolor='blue', alpha=0.5), 62 | fontsize=14, color='white') 63 | 64 | ax.set_title(('{} detections with ' 65 | 'p({} | box) >= {:.1f}').format(class_name, class_name, 66 | thresh), 67 | fontsize=14) 68 | plt.axis('off') 69 | plt.tight_layout() 70 | plt.draw() 71 | 72 | def demo(net, image_name): 73 | """Detect object classes in an image using pre-computed object proposals.""" 74 | 75 | # Load the demo image 76 | im_file = os.path.join(cfg.DATA_DIR, 'demo', image_name) 77 | im = cv2.imread(im_file) 78 | 79 | # Detect all object classes and regress object bounds 80 | timer = Timer() 81 | timer.tic() 82 | scores, boxes = im_detect(net, im) 83 | timer.toc() 84 | print ('Detection took {:.3f}s for ' 85 | '{:d} object proposals').format(timer.total_time, boxes.shape[0]) 86 | 87 | # Visualize detections for each class 88 | CONF_THRESH = 0.8 89 | NMS_THRESH = 0.3 90 | for cls_ind, cls in enumerate(CLASSES[1:]): 91 | cls_ind += 1 # because we skipped background 92 | cls_boxes = boxes[:, 4*cls_ind:4*(cls_ind + 1)] 93 | cls_scores = scores[:, cls_ind] 94 | dets = np.hstack((cls_boxes, 95 | cls_scores[:, np.newaxis])).astype(np.float32) 96 | keep = nms(dets, NMS_THRESH) 97 | dets = dets[keep, :] 98 | vis_detections(im, cls, dets, thresh=CONF_THRESH) 99 | 100 | def parse_args(): 101 | """Parse input arguments.""" 102 | parser = argparse.ArgumentParser(description='Faster R-CNN demo') 103 | parser.add_argument('--gpu', dest='gpu_id', help='GPU device id to use [0]', 104 | default=0, type=int) 105 | parser.add_argument('--cpu', dest='cpu_mode', 106 | help='Use CPU mode (overrides --gpu)', 107 | action='store_true') 108 | parser.add_argument('--net', dest='demo_net', help='Network to use [vgg16]', 109 | choices=NETS.keys(), default='vgg16') 110 | 111 | args = parser.parse_args() 112 | 113 | return args 114 | 115 | if __name__ == '__main__': 116 | cfg.TEST.HAS_RPN = True # Use RPN for proposals 117 | 118 | args = parse_args() 119 | 120 | prototxt = os.path.join(cfg.MODELS_DIR, NETS[args.demo_net][0], 121 | 'faster_rcnn_alt_opt', 'faster_rcnn_test.pt') 122 | caffemodel = os.path.join(cfg.DATA_DIR, 'faster_rcnn_models', 123 | NETS[args.demo_net][1]) 124 
| 125 | if not os.path.isfile(caffemodel): 126 | raise IOError(('{:s} not found.\nDid you run ./data/script/' 127 | 'fetch_faster_rcnn_models.sh?').format(caffemodel)) 128 | 129 | if args.cpu_mode: 130 | caffe.set_mode_cpu() 131 | else: 132 | caffe.set_mode_gpu() 133 | caffe.set_device(args.gpu_id) 134 | cfg.GPU_ID = args.gpu_id 135 | net = caffe.Net(prototxt, caffemodel, caffe.TEST) 136 | 137 | print '\n\nLoaded network {:s}'.format(caffemodel) 138 | 139 | # Warmup on a dummy image 140 | im = 128 * np.ones((300, 500, 3), dtype=np.uint8) 141 | for i in xrange(2): 142 | _, _= im_detect(net, im) 143 | 144 | im_names = ['000456.jpg', '000542.jpg', '001150.jpg', 145 | '001763.jpg', '004545.jpg'] 146 | for im_name in im_names: 147 | print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~' 148 | print 'Demo for data/demo/{}'.format(im_name) 149 | demo(net, im_name) 150 | 151 | plt.show() 152 | -------------------------------------------------------------------------------- /models/pascal_voc/VGG_CNN_M_1024/fast_rcnn/train.prototxt: -------------------------------------------------------------------------------- 1 | name: "VGG_CNN_M_1024" 2 | layer { 3 | name: 'data' 4 | type: 'Python' 5 | top: 'data' 6 | top: 'rois' 7 | top: 'labels' 8 | top: 'bbox_targets' 9 | top: 'bbox_inside_weights' 10 | top: 'bbox_outside_weights' 11 | python_param { 12 | module: 'roi_data_layer.layer' 13 | layer: 'RoIDataLayer' 14 | param_str: "'num_classes': 21" 15 | } 16 | } 17 | layer { 18 | name: "conv1" 19 | type: "Convolution" 20 | bottom: "data" 21 | top: "conv1" 22 | param { lr_mult: 0 decay_mult: 0 } 23 | param { lr_mult: 0 decay_mult: 0 } 24 | convolution_param { 25 | num_output: 96 26 | kernel_size: 7 27 | stride: 2 28 | } 29 | } 30 | layer { 31 | name: "relu1" 32 | type: "ReLU" 33 | bottom: "conv1" 34 | top: "conv1" 35 | } 36 | layer { 37 | name: "norm1" 38 | type: "LRN" 39 | bottom: "conv1" 40 | top: "norm1" 41 | lrn_param { 42 | local_size: 5 43 | alpha: 0.0005 44 | beta: 0.75 45 | k: 2 46 | } 47 | } 48 | layer { 49 | name: "pool1" 50 | type: "Pooling" 51 | bottom: "norm1" 52 | top: "pool1" 53 | pooling_param { 54 | pool: MAX 55 | kernel_size: 3 56 | stride: 2 57 | } 58 | } 59 | layer { 60 | name: "conv2" 61 | type: "Convolution" 62 | bottom: "pool1" 63 | top: "conv2" 64 | param { 65 | lr_mult: 1 66 | } 67 | param { 68 | lr_mult: 2 69 | } 70 | convolution_param { 71 | num_output: 256 72 | pad: 1 73 | kernel_size: 5 74 | stride: 2 75 | } 76 | } 77 | layer { 78 | name: "relu2" 79 | type: "ReLU" 80 | bottom: "conv2" 81 | top: "conv2" 82 | } 83 | layer { 84 | name: "norm2" 85 | type: "LRN" 86 | bottom: "conv2" 87 | top: "norm2" 88 | lrn_param { 89 | local_size: 5 90 | alpha: 0.0005 91 | beta: 0.75 92 | k: 2 93 | } 94 | } 95 | layer { 96 | name: "pool2" 97 | type: "Pooling" 98 | bottom: "norm2" 99 | top: "pool2" 100 | pooling_param { 101 | pool: MAX 102 | kernel_size: 3 103 | stride: 2 104 | } 105 | } 106 | layer { 107 | name: "conv3" 108 | type: "Convolution" 109 | bottom: "pool2" 110 | top: "conv3" 111 | param { 112 | lr_mult: 1 113 | } 114 | param { 115 | lr_mult: 2 116 | } 117 | convolution_param { 118 | num_output: 512 119 | pad: 1 120 | kernel_size: 3 121 | } 122 | } 123 | layer { 124 | name: "relu3" 125 | type: "ReLU" 126 | bottom: "conv3" 127 | top: "conv3" 128 | } 129 | layer { 130 | name: "conv4" 131 | type: "Convolution" 132 | bottom: "conv3" 133 | top: "conv4" 134 | param { 135 | lr_mult: 1 136 | } 137 | param { 138 | lr_mult: 2 139 | } 140 | convolution_param { 141 | num_output: 512 142 | pad: 1 143 | 
kernel_size: 3 144 | } 145 | } 146 | layer { 147 | name: "relu4" 148 | type: "ReLU" 149 | bottom: "conv4" 150 | top: "conv4" 151 | } 152 | layer { 153 | name: "conv5" 154 | type: "Convolution" 155 | bottom: "conv4" 156 | top: "conv5" 157 | param { 158 | lr_mult: 1 159 | } 160 | param { 161 | lr_mult: 2 162 | } 163 | convolution_param { 164 | num_output: 512 165 | pad: 1 166 | kernel_size: 3 167 | } 168 | } 169 | layer { 170 | name: "relu5" 171 | type: "ReLU" 172 | bottom: "conv5" 173 | top: "conv5" 174 | } 175 | layer { 176 | name: "roi_pool5" 177 | type: "ROIPooling" 178 | bottom: "conv5" 179 | bottom: "rois" 180 | top: "pool5" 181 | roi_pooling_param { 182 | pooled_w: 6 183 | pooled_h: 6 184 | spatial_scale: 0.0625 # 1/16 185 | } 186 | } 187 | layer { 188 | name: "fc6" 189 | type: "InnerProduct" 190 | bottom: "pool5" 191 | top: "fc6" 192 | param { 193 | lr_mult: 1 194 | } 195 | param { 196 | lr_mult: 2 197 | } 198 | inner_product_param { 199 | num_output: 4096 200 | } 201 | } 202 | layer { 203 | name: "relu6" 204 | type: "ReLU" 205 | bottom: "fc6" 206 | top: "fc6" 207 | } 208 | layer { 209 | name: "drop6" 210 | type: "Dropout" 211 | bottom: "fc6" 212 | top: "fc6" 213 | dropout_param { 214 | dropout_ratio: 0.5 215 | } 216 | } 217 | layer { 218 | name: "fc7" 219 | type: "InnerProduct" 220 | bottom: "fc6" 221 | top: "fc7" 222 | param { 223 | lr_mult: 1 224 | } 225 | param { 226 | lr_mult: 2 227 | } 228 | inner_product_param { 229 | num_output: 1024 230 | } 231 | } 232 | layer { 233 | name: "relu7" 234 | type: "ReLU" 235 | bottom: "fc7" 236 | top: "fc7" 237 | } 238 | layer { 239 | name: "drop7" 240 | type: "Dropout" 241 | bottom: "fc7" 242 | top: "fc7" 243 | dropout_param { 244 | dropout_ratio: 0.5 245 | } 246 | } 247 | layer { 248 | name: "cls_score" 249 | type: "InnerProduct" 250 | bottom: "fc7" 251 | top: "cls_score" 252 | param { 253 | lr_mult: 1 254 | } 255 | param { 256 | lr_mult: 2 257 | } 258 | inner_product_param { 259 | num_output: 21 260 | weight_filler { 261 | type: "gaussian" 262 | std: 0.01 263 | } 264 | bias_filler { 265 | type: "constant" 266 | value: 0 267 | } 268 | } 269 | } 270 | layer { 271 | name: "bbox_pred" 272 | type: "InnerProduct" 273 | bottom: "fc7" 274 | top: "bbox_pred" 275 | param { 276 | lr_mult: 1 277 | } 278 | param { 279 | lr_mult: 2 280 | } 281 | inner_product_param { 282 | num_output: 84 283 | weight_filler { 284 | type: "gaussian" 285 | std: 0.001 286 | } 287 | bias_filler { 288 | type: "constant" 289 | value: 0 290 | } 291 | } 292 | } 293 | layer { 294 | name: "loss_cls" 295 | type: "SoftmaxWithLoss" 296 | bottom: "cls_score" 297 | bottom: "labels" 298 | top: "loss_cls" 299 | loss_weight: 1 300 | } 301 | layer { 302 | name: "loss_bbox" 303 | type: "SmoothL1Loss" 304 | bottom: "bbox_pred" 305 | bottom: "bbox_targets" 306 | bottom: "bbox_inside_weights" 307 | bottom: "bbox_outside_weights" 308 | top: "loss_bbox" 309 | loss_weight: 1 310 | } 311 | -------------------------------------------------------------------------------- /models/pascal_voc/VGG_CNN_M_1024/fast_rcnn/test.prototxt: -------------------------------------------------------------------------------- 1 | name: "VGG_CNN_M_1024" 2 | input: "data" 3 | input_shape { 4 | dim: 1 5 | dim: 3 6 | dim: 224 7 | dim: 224 8 | } 9 | input: "rois" 10 | input_shape { 11 | dim: 1 # to be changed on-the-fly to num ROIs 12 | dim: 5 # [batch ind, x1, y1, x2, y2] zero-based indexing 13 | } 14 | layer { 15 | name: "conv1" 16 | type: "Convolution" 17 | bottom: "data" 18 | top: "conv1" 19 | param { 20 | lr_mult: 0 21 
| decay_mult: 0 22 | } 23 | param { 24 | lr_mult: 0 25 | decay_mult: 0 26 | } 27 | convolution_param { 28 | num_output: 96 29 | kernel_size: 7 30 | stride: 2 31 | } 32 | } 33 | layer { 34 | name: "relu1" 35 | type: "ReLU" 36 | bottom: "conv1" 37 | top: "conv1" 38 | } 39 | layer { 40 | name: "norm1" 41 | type: "LRN" 42 | bottom: "conv1" 43 | top: "norm1" 44 | lrn_param { 45 | local_size: 5 46 | alpha: 0.0005 47 | beta: 0.75 48 | k: 2 49 | } 50 | } 51 | layer { 52 | name: "pool1" 53 | type: "Pooling" 54 | bottom: "norm1" 55 | top: "pool1" 56 | pooling_param { 57 | pool: MAX 58 | kernel_size: 3 59 | stride: 2 60 | } 61 | } 62 | layer { 63 | name: "conv2" 64 | type: "Convolution" 65 | bottom: "pool1" 66 | top: "conv2" 67 | param { 68 | lr_mult: 1 69 | decay_mult: 1 70 | } 71 | param { 72 | lr_mult: 2 73 | decay_mult: 0 74 | } 75 | convolution_param { 76 | num_output: 256 77 | pad: 1 78 | kernel_size: 5 79 | stride: 2 80 | } 81 | } 82 | layer { 83 | name: "relu2" 84 | type: "ReLU" 85 | bottom: "conv2" 86 | top: "conv2" 87 | } 88 | layer { 89 | name: "norm2" 90 | type: "LRN" 91 | bottom: "conv2" 92 | top: "norm2" 93 | lrn_param { 94 | local_size: 5 95 | alpha: 0.0005 96 | beta: 0.75 97 | k: 2 98 | } 99 | } 100 | layer { 101 | name: "pool2" 102 | type: "Pooling" 103 | bottom: "norm2" 104 | top: "pool2" 105 | pooling_param { 106 | pool: MAX 107 | kernel_size: 3 108 | stride: 2 109 | } 110 | } 111 | layer { 112 | name: "conv3" 113 | type: "Convolution" 114 | bottom: "pool2" 115 | top: "conv3" 116 | param { 117 | lr_mult: 1 118 | decay_mult: 1 119 | } 120 | param { 121 | lr_mult: 2 122 | decay_mult: 0 123 | } 124 | convolution_param { 125 | num_output: 512 126 | pad: 1 127 | kernel_size: 3 128 | } 129 | } 130 | layer { 131 | name: "relu3" 132 | type: "ReLU" 133 | bottom: "conv3" 134 | top: "conv3" 135 | } 136 | layer { 137 | name: "conv4" 138 | type: "Convolution" 139 | bottom: "conv3" 140 | top: "conv4" 141 | param { 142 | lr_mult: 1 143 | decay_mult: 1 144 | } 145 | param { 146 | lr_mult: 2 147 | decay_mult: 0 148 | } 149 | convolution_param { 150 | num_output: 512 151 | pad: 1 152 | kernel_size: 3 153 | } 154 | } 155 | layer { 156 | name: "relu4" 157 | type: "ReLU" 158 | bottom: "conv4" 159 | top: "conv4" 160 | } 161 | layer { 162 | name: "conv5" 163 | type: "Convolution" 164 | bottom: "conv4" 165 | top: "conv5" 166 | param { 167 | lr_mult: 1 168 | decay_mult: 1 169 | } 170 | param { 171 | lr_mult: 2 172 | decay_mult: 0 173 | } 174 | convolution_param { 175 | num_output: 512 176 | pad: 1 177 | kernel_size: 3 178 | } 179 | } 180 | layer { 181 | name: "relu5" 182 | type: "ReLU" 183 | bottom: "conv5" 184 | top: "conv5" 185 | } 186 | layer { 187 | name: "roi_pool5" 188 | type: "ROIPooling" 189 | bottom: "conv5" 190 | bottom: "rois" 191 | top: "pool5" 192 | roi_pooling_param { 193 | pooled_w: 6 194 | pooled_h: 6 195 | spatial_scale: 0.0625 # 1/16 196 | } 197 | } 198 | layer { 199 | name: "fc6" 200 | type: "InnerProduct" 201 | bottom: "pool5" 202 | top: "fc6" 203 | param { 204 | lr_mult: 1 205 | decay_mult: 1 206 | } 207 | param { 208 | lr_mult: 2 209 | decay_mult: 0 210 | } 211 | inner_product_param { 212 | num_output: 4096 213 | } 214 | } 215 | layer { 216 | name: "relu6" 217 | type: "ReLU" 218 | bottom: "fc6" 219 | top: "fc6" 220 | } 221 | layer { 222 | name: "drop6" 223 | type: "Dropout" 224 | bottom: "fc6" 225 | top: "fc6" 226 | dropout_param { 227 | dropout_ratio: 0.5 228 | } 229 | } 230 | layer { 231 | name: "fc7" 232 | type: "InnerProduct" 233 | bottom: "fc6" 234 | top: "fc7" 235 | param { 236 
| lr_mult: 1 237 | decay_mult: 1 238 | } 239 | param { 240 | lr_mult: 2 241 | decay_mult: 0 242 | } 243 | inner_product_param { 244 | num_output: 1024 245 | } 246 | } 247 | layer { 248 | name: "relu7" 249 | type: "ReLU" 250 | bottom: "fc7" 251 | top: "fc7" 252 | } 253 | layer { 254 | name: "drop7" 255 | type: "Dropout" 256 | bottom: "fc7" 257 | top: "fc7" 258 | dropout_param { 259 | dropout_ratio: 0.5 260 | } 261 | } 262 | layer { 263 | name: "cls_score" 264 | type: "InnerProduct" 265 | bottom: "fc7" 266 | top: "cls_score" 267 | param { 268 | lr_mult: 1 269 | decay_mult: 1 270 | } 271 | param { 272 | lr_mult: 2 273 | decay_mult: 0 274 | } 275 | inner_product_param { 276 | num_output: 21 277 | weight_filler { 278 | type: "gaussian" 279 | std: 0.01 280 | } 281 | bias_filler { 282 | type: "constant" 283 | value: 0 284 | } 285 | } 286 | } 287 | layer { 288 | name: "bbox_pred" 289 | type: "InnerProduct" 290 | bottom: "fc7" 291 | top: "bbox_pred" 292 | param { 293 | lr_mult: 1 294 | decay_mult: 1 295 | } 296 | param { 297 | lr_mult: 2 298 | decay_mult: 0 299 | } 300 | inner_product_param { 301 | num_output: 84 302 | weight_filler { 303 | type: "gaussian" 304 | std: 0.001 305 | } 306 | bias_filler { 307 | type: "constant" 308 | value: 0 309 | } 310 | } 311 | } 312 | layer { 313 | name: "cls_prob" 314 | type: "Softmax" 315 | bottom: "cls_score" 316 | top: "cls_prob" 317 | } 318 | -------------------------------------------------------------------------------- /lib/roi_data_layer/roidb.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | """Transform a roidb into a trainable roidb by adding a bunch of metadata.""" 9 | 10 | import numpy as np 11 | from fast_rcnn.config import cfg 12 | from fast_rcnn.bbox_transform import bbox_transform 13 | from utils.cython_bbox import bbox_overlaps 14 | import PIL 15 | 16 | def prepare_roidb(imdb): 17 | """Enrich the imdb's roidb by adding some derived quantities that 18 | are useful for training. This function precomputes the maximum 19 | overlap, taken over ground-truth boxes, between each ROI and 20 | each ground-truth box. The class with maximum overlap is also 21 | recorded. 
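    A max overlap of 0 marks an ROI as background (class 0); the sanity
    checks below assert this invariant in both directions.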
22 | """ 23 | sizes = [PIL.Image.open(imdb.image_path_at(i)).size 24 | for i in xrange(imdb.num_images)] 25 | roidb = imdb.roidb 26 | for i in xrange(len(imdb.image_index)): 27 | roidb[i]['image'] = imdb.image_path_at(i) 28 | roidb[i]['width'] = sizes[i][0] 29 | roidb[i]['height'] = sizes[i][1] 30 | # need gt_overlaps as a dense array for argmax 31 | gt_overlaps = roidb[i]['gt_overlaps'].toarray() 32 | # max overlap with gt over classes (columns) 33 | max_overlaps = gt_overlaps.max(axis=1) 34 | # gt class that had the max overlap 35 | max_classes = gt_overlaps.argmax(axis=1) 36 | roidb[i]['max_classes'] = max_classes 37 | roidb[i]['max_overlaps'] = max_overlaps 38 | # sanity checks 39 | # max overlap of 0 => class should be zero (background) 40 | zero_inds = np.where(max_overlaps == 0)[0] 41 | assert all(max_classes[zero_inds] == 0) 42 | # max overlap > 0 => class should not be zero (must be a fg class) 43 | nonzero_inds = np.where(max_overlaps > 0)[0] 44 | assert all(max_classes[nonzero_inds] != 0) 45 | 46 | def add_bbox_regression_targets(roidb): 47 | """Add information needed to train bounding-box regressors.""" 48 | assert len(roidb) > 0 49 | assert 'max_classes' in roidb[0], 'Did you call prepare_roidb first?' 50 | 51 | num_images = len(roidb) 52 | # Infer number of classes from the number of columns in gt_overlaps 53 | num_classes = roidb[0]['gt_overlaps'].shape[1] 54 | for im_i in xrange(num_images): 55 | rois = roidb[im_i]['boxes'] 56 | max_overlaps = roidb[im_i]['max_overlaps'] 57 | max_classes = roidb[im_i]['max_classes'] 58 | roidb[im_i]['bbox_targets'] = \ 59 | _compute_targets(rois, max_overlaps, max_classes) 60 | 61 | if cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED: 62 | # Use fixed / precomputed "means" and "stds" instead of empirical values 63 | means = np.tile( 64 | np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS), (num_classes, 1)) 65 | stds = np.tile( 66 | np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS), (num_classes, 1)) 67 | else: 68 | # Compute values needed for means and stds 69 | # var(x) = E(x^2) - E(x)^2 70 | class_counts = np.zeros((num_classes, 1)) + cfg.EPS 71 | sums = np.zeros((num_classes, 4)) 72 | squared_sums = np.zeros((num_classes, 4)) 73 | for im_i in xrange(num_images): 74 | targets = roidb[im_i]['bbox_targets'] 75 | for cls in xrange(1, num_classes): 76 | cls_inds = np.where(targets[:, 0] == cls)[0] 77 | if cls_inds.size > 0: 78 | class_counts[cls] += cls_inds.size 79 | sums[cls, :] += targets[cls_inds, 1:].sum(axis=0) 80 | squared_sums[cls, :] += \ 81 | (targets[cls_inds, 1:] ** 2).sum(axis=0) 82 | 83 | means = sums / class_counts 84 | stds = np.sqrt(squared_sums / class_counts - means ** 2) 85 | 86 | print 'bbox target means:' 87 | print means 88 | print means[1:, :].mean(axis=0) # ignore bg class 89 | print 'bbox target stdevs:' 90 | print stds 91 | print stds[1:, :].mean(axis=0) # ignore bg class 92 | 93 | # Normalize targets 94 | if cfg.TRAIN.BBOX_NORMALIZE_TARGETS: 95 | print "Normalizing targets" 96 | for im_i in xrange(num_images): 97 | targets = roidb[im_i]['bbox_targets'] 98 | for cls in xrange(1, num_classes): 99 | cls_inds = np.where(targets[:, 0] == cls)[0] 100 | roidb[im_i]['bbox_targets'][cls_inds, 1:] -= means[cls, :] 101 | roidb[im_i]['bbox_targets'][cls_inds, 1:] /= stds[cls, :] 102 | else: 103 | print "NOT normalizing targets" 104 | 105 | # These values will be needed for making predictions 106 | # (the predicts will need to be unnormalized and uncentered) 107 | return means.ravel(), stds.ravel() 108 | 109 | def _compute_targets(rois, overlaps, 
110 | """Compute bounding-box regression targets for an image."""
111 | # Indices of ground-truth ROIs
112 | gt_inds = np.where(overlaps == 1)[0]
113 | if len(gt_inds) == 0:
114 | # Bail if the image has no ground-truth ROIs
115 | return np.zeros((rois.shape[0], 5), dtype=np.float32)
116 | # Indices of examples for which we try to make predictions
117 | ex_inds = np.where(overlaps >= cfg.TRAIN.BBOX_THRESH)[0]
118 |
119 | # Get IoU overlap between each ex ROI and gt ROI
120 | ex_gt_overlaps = bbox_overlaps(
121 | np.ascontiguousarray(rois[ex_inds, :], dtype=np.float),
122 | np.ascontiguousarray(rois[gt_inds, :], dtype=np.float))
123 |
124 | # Find which gt ROI each ex ROI has max overlap with:
125 | # this will be the ex ROI's gt target
126 | gt_assignment = ex_gt_overlaps.argmax(axis=1)
127 | gt_rois = rois[gt_inds[gt_assignment], :]
128 | ex_rois = rois[ex_inds, :]
129 |
130 | targets = np.zeros((rois.shape[0], 5), dtype=np.float32)
131 | targets[ex_inds, 0] = labels[ex_inds]
132 | targets[ex_inds, 1:] = bbox_transform(ex_rois, gt_rois)
133 | return targets
134 |
--------------------------------------------------------------------------------
/lib/setup.py:
--------------------------------------------------------------------------------
1 | # --------------------------------------------------------
2 | # Fast R-CNN
3 | # Copyright (c) 2015 Microsoft
4 | # Licensed under The MIT License [see LICENSE for details]
5 | # Written by Ross Girshick
6 | # --------------------------------------------------------
7 |
8 | import os
9 | from os.path import join as pjoin
10 | from setuptools import setup
11 | from distutils.extension import Extension
12 | from Cython.Distutils import build_ext
13 | import subprocess
14 | import numpy as np
15 |
16 | def find_in_path(name, path):
17 | "Find a file in a search path"
18 | # Adapted from
19 | # http://code.activestate.com/recipes/52224-find-a-file-given-a-search-path/
20 | for dir in path.split(os.pathsep):
21 | binpath = pjoin(dir, name)
22 | if os.path.exists(binpath):
23 | return os.path.abspath(binpath)
24 | return None
25 |
26 |
27 | def locate_cuda():
28 | """Locate the CUDA environment on the system
29 |
30 | Returns a dict with keys 'home', 'nvcc', 'include', and 'lib64'
31 | and values giving the absolute path to each directory.
32 |
33 | Starts by looking for the CUDAHOME env variable. If not found, everything
34 | is based on finding 'nvcc' in the PATH.
35 | """
36 |
37 | # first check if the CUDAHOME env variable is in use
38 | if 'CUDAHOME' in os.environ:
39 | home = os.environ['CUDAHOME']
40 | nvcc = pjoin(home, 'bin', 'nvcc')
41 | else:
42 | # otherwise, search the PATH for NVCC
43 | default_path = pjoin(os.sep, 'usr', 'local', 'cuda', 'bin')
44 | nvcc = find_in_path('nvcc', os.environ['PATH'] + os.pathsep + default_path)
45 | if nvcc is None:
46 | raise EnvironmentError('The nvcc binary could not be '
47 | 'located in your $PATH. Either add it to your path, or set $CUDAHOME')
48 | home = os.path.dirname(os.path.dirname(nvcc))
49 |
50 | cudaconfig = {'home':home, 'nvcc':nvcc,
51 | 'include': pjoin(home, 'include'),
52 | 'lib64': pjoin(home, 'lib64')}
53 | for k, v in cudaconfig.iteritems():
54 | if not os.path.exists(v):
55 | raise EnvironmentError('The CUDA %s path could not be located in %s' % (k, v))
56 |
57 | return cudaconfig
58 | CUDA = locate_cuda()
59 |
60 |
61 | # Obtain the numpy include directory. This logic works across numpy versions.
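# np.get_include() is the current accessor; very old numpy releases only
# provided np.get_numpy_include(), hence the AttributeError fallback below.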
61 | # Obtain the numpy include directory. This logic works across numpy versions. 62 | try: 63 | numpy_include = np.get_include() 64 | except AttributeError: 65 | numpy_include = np.get_numpy_include() 66 | 67 | def customize_compiler_for_nvcc(self): 68 | """inject deep into distutils to customize how the dispatch 69 | to gcc/nvcc works. 70 | 71 | If you subclass UnixCCompiler, it's not trivial to get your subclass 72 | injected in, and still have the right customizations (i.e. 73 | distutils.sysconfig.customize_compiler) run on it. So instead of going 74 | the OO route, I have this. Note, it's kind of like a weird functional 75 | subclassing going on.""" 76 | 77 | # tell the compiler it can process .cu files 78 | self.src_extensions.append('.cu') 79 | 80 | # save references to the default compiler_so and _compile methods 81 | default_compiler_so = self.compiler_so 82 | super = self._compile 83 | 84 | # now redefine the _compile method. This gets executed for each 85 | # object but distutils doesn't have the ability to change compilers 86 | # based on source extension: we add it. 87 | def _compile(obj, src, ext, cc_args, extra_postargs, pp_opts): 88 | if os.path.splitext(src)[1] == '.cu': 89 | # use the cuda compiler (nvcc) for .cu files 90 | self.set_executable('compiler_so', CUDA['nvcc']) 91 | # use only a subset of the extra_postargs, which are 1-1 translated 92 | # from the extra_compile_args in the Extension class 93 | postargs = extra_postargs['nvcc'] 94 | else: 95 | postargs = extra_postargs['gcc'] 96 | 97 | super(obj, src, ext, cc_args, postargs, pp_opts) 98 | # reset the default compiler_so, which we might have changed for cuda 99 | self.compiler_so = default_compiler_so 100 | 101 | # inject our redefined _compile method into the class 102 | self._compile = _compile 103 | 104 | 105 | # run the customize_compiler 106 | class custom_build_ext(build_ext): 107 | def build_extensions(self): 108 | customize_compiler_for_nvcc(self.compiler) 109 | build_ext.build_extensions(self) 110 | 111 | 112 | ext_modules = [ 113 | Extension( 114 | "utils.cython_bbox", 115 | ["utils/bbox.pyx"], 116 | extra_compile_args={'gcc': ["-Wno-cpp", "-Wno-unused-function"]}, 117 | include_dirs = [numpy_include] 118 | ), 119 | Extension( 120 | "nms.cpu_nms", 121 | ["nms/cpu_nms.pyx"], 122 | extra_compile_args={'gcc': ["-Wno-cpp", "-Wno-unused-function"]}, 123 | include_dirs = [numpy_include] 124 | ), 125 | Extension('nms.gpu_nms', 126 | ['nms/nms_kernel.cu', 'nms/gpu_nms.pyx'], 127 | library_dirs=[CUDA['lib64']], 128 | libraries=['cudart'], 129 | language='c++', 130 | runtime_library_dirs=[CUDA['lib64']], 131 | # this syntax is specific to this build system 132 | # we're only going to use certain compiler args with nvcc and not with 133 | # gcc; the implementation of this trick is in customize_compiler_for_nvcc() above 134 | extra_compile_args={'gcc': ["-Wno-unused-function"], 135 | 'nvcc': ['-arch=sm_35', 136 | '--ptxas-options=-v', 137 | '-c', 138 | '--compiler-options', 139 | "'-fPIC'"]}, 140 | include_dirs = [numpy_include, CUDA['include']] 141 | ), 142 | Extension( 143 | 'pycocotools._mask', 144 | sources=['pycocotools/maskApi.c', 'pycocotools/_mask.pyx'], 145 | include_dirs = [numpy_include, 'pycocotools'], 146 | extra_compile_args={ 147 | 'gcc': ['-Wno-cpp', '-Wno-unused-function', '-std=c99']}, 148 | ), 149 | ] 150 | 151 | setup( 152 | name='fast_rcnn', 153 | ext_modules=ext_modules, 154 | # inject our custom trigger 155 | cmdclass={'build_ext': custom_build_ext}, 156 | ) 157 | -------------------------------------------------------------------------------- /LICENSE:
-------------------------------------------------------------------------------- 1 | Online Hard Example Mining (OHEM) 2 | 3 | Copyright (c) 2016, Abhinav Shrivastava 4 | 5 | The MIT License (MIT) 6 | 7 | Permission is hereby granted, free of charge, to any person obtaining a copy 8 | of this software and associated documentation files (the "Software"), to deal 9 | in the Software without restriction, including without limitation the rights 10 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 11 | copies of the Software, and to permit persons to whom the Software is 12 | furnished to do so, subject to the following conditions: 13 | 14 | The above copyright notice and this permission notice shall be included in 15 | all copies or substantial portions of the Software. 16 | 17 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 18 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 19 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 20 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 21 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 22 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 23 | THE SOFTWARE. 24 | 25 | ************************************************************************ 26 | 27 | THIRD-PARTY SOFTWARE NOTICES AND INFORMATION 28 | 29 | This project, OHEM, incorporates material from the project(s) 30 | listed below (collectively, "Third Party Code"). The original copyright 31 | notice and license of such Third Party Code are set out below. This 32 | Third Party Code is licensed to you under their original license terms 33 | set forth below. 34 | 35 | 1. Fast R-CNN (https://github.com/rbgirshick/fast-rcnn) 36 | 37 | Copyright (c) Microsoft Corporation 38 | 39 | All rights reserved. 40 | 41 | MIT License 42 | 43 | Permission is hereby granted, free of charge, to any person obtaining a 44 | copy of this software and associated documentation files (the "Software"), 45 | to deal in the Software without restriction, including without limitation 46 | the rights to use, copy, modify, merge, publish, distribute, sublicense, 47 | and/or sell copies of the Software, and to permit persons to whom the 48 | Software is furnished to do so, subject to the following conditions: 49 | 50 | The above copyright notice and this permission notice shall be included 51 | in all copies or substantial portions of the Software. 52 | 53 | THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 54 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 55 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL 56 | THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 57 | OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 58 | ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 59 | OTHER DEALINGS IN THE SOFTWARE. 60 | 61 | 62 | 2. 
Faster R-CNN (https://github.com/rbgirshick/py-faster-rcnn) 63 | 64 | The MIT License (MIT) 65 | 66 | Copyright (c) 2015 Microsoft Corporation 67 | 68 | Permission is hereby granted, free of charge, to any person obtaining a copy 69 | of this software and associated documentation files (the "Software"), to deal 70 | in the Software without restriction, including without limitation the rights 71 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 72 | copies of the Software, and to permit persons to whom the Software is 73 | furnished to do so, subject to the following conditions: 74 | 75 | The above copyright notice and this permission notice shall be included in 76 | all copies or substantial portions of the Software. 77 | 78 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 79 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 80 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 81 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 82 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 83 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 84 | THE SOFTWARE. 85 | 86 | 3. Caffe, (https://github.com/BVLC/caffe/) 87 | 88 | COPYRIGHT 89 | 90 | All contributions by the University of California: 91 | Copyright (c) 2014, 2015, The Regents of the University of California (Regents) 92 | All rights reserved. 93 | 94 | All other contributions: 95 | Copyright (c) 2014, 2015, the respective contributors 96 | All rights reserved. 97 | 98 | Caffe uses a shared copyright model: each contributor holds copyright 99 | over their contributions to Caffe. The project versioning records all 100 | such contribution and copyright details. If a contributor wants to 101 | further mark their specific copyright on a particular contribution, 102 | they should indicate their copyright solely in the commit message of 103 | the change when it is committed. 104 | 105 | The BSD 2-Clause License 106 | 107 | Redistribution and use in source and binary forms, with or without 108 | modification, are permitted provided that the following conditions 109 | are met: 110 | 111 | 1. Redistributions of source code must retain the above copyright notice, 112 | this list of conditions and the following disclaimer. 113 | 114 | 2. Redistributions in binary form must reproduce the above copyright 115 | notice, this list of conditions and the following disclaimer in the 116 | documentation and/or other materials provided with the distribution. 117 | 118 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 119 | "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 120 | LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 121 | A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT 122 | HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 123 | SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED 124 | TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR 125 | PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF 126 | LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 127 | NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS 128 | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
129 | 130 | ************END OF THIRD-PARTY SOFTWARE NOTICES AND INFORMATION********** 131 | -------------------------------------------------------------------------------- /lib/fast_rcnn/train.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | """Train a Fast R-CNN network.""" 9 | 10 | import caffe 11 | from fast_rcnn.config import cfg 12 | import roi_data_layer.roidb as rdl_roidb 13 | from utils.timer import Timer 14 | import numpy as np 15 | import os 16 | 17 | from caffe.proto import caffe_pb2 18 | import google.protobuf as pb2 19 | 20 | class SolverWrapper(object): 21 | """A simple wrapper around Caffe's solver. 22 | This wrapper gives us control over the snapshotting process, which we 23 | use to unnormalize the learned bounding-box regression weights. 24 | """ 25 | 26 | def __init__(self, solver_prototxt, roidb, output_dir, 27 | pretrained_model=None): 28 | """Initialize the SolverWrapper.""" 29 | self.output_dir = output_dir 30 | 31 | if (cfg.TRAIN.HAS_RPN and cfg.TRAIN.BBOX_REG and 32 | cfg.TRAIN.BBOX_NORMALIZE_TARGETS): 33 | # RPN can only use precomputed normalization because there are no 34 | # fixed statistics to compute a priori 35 | assert cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED 36 | 37 | if cfg.TRAIN.BBOX_REG: 38 | print 'Computing bounding-box regression targets...' 39 | self.bbox_means, self.bbox_stds = \ 40 | rdl_roidb.add_bbox_regression_targets(roidb) 41 | print 'done' 42 | 43 | self.solver = caffe.SGDSolver(solver_prototxt) 44 | if pretrained_model is not None: 45 | print ('Loading pretrained model ' 46 | 'weights from {:s}').format(pretrained_model) 47 | self.solver.net.copy_from(pretrained_model) 48 | 49 | self.solver_param = caffe_pb2.SolverParameter() 50 | with open(solver_prototxt, 'rt') as f: 51 | pb2.text_format.Merge(f.read(), self.solver_param) 52 | 53 | self.solver.net.layers[0].set_roidb(roidb) 54 | 55 | def snapshot(self): 56 | """Take a snapshot of the network after unnormalizing the learned 57 | bounding-box regression weights. This enables easy use at test-time. 58 | """ 59 | net = self.solver.net 60 | 61 | scale_bbox_params = (cfg.TRAIN.BBOX_REG and 62 | cfg.TRAIN.BBOX_NORMALIZE_TARGETS and 63 | net.params.has_key('bbox_pred')) 64 | 65 | if scale_bbox_params: 66 | # save original values 67 | orig_0 = net.params['bbox_pred'][0].data.copy() 68 | orig_1 = net.params['bbox_pred'][1].data.copy() 69 | 70 | # scale and shift with bbox reg unnormalization; then save snapshot 71 | net.params['bbox_pred'][0].data[...] = \ 72 | (net.params['bbox_pred'][0].data * 73 | self.bbox_stds[:, np.newaxis]) 74 | net.params['bbox_pred'][1].data[...] = \ 75 | (net.params['bbox_pred'][1].data * 76 | self.bbox_stds + self.bbox_means) 77 | 78 | infix = ('_' + cfg.TRAIN.SNAPSHOT_INFIX 79 | if cfg.TRAIN.SNAPSHOT_INFIX != '' else '') 80 | filename = (self.solver_param.snapshot_prefix + infix + 81 | '_iter_{:d}'.format(self.solver.iter) + '.caffemodel') 82 | filename = os.path.join(self.output_dir, filename) 83 | 84 | net.save(str(filename)) 85 | print 'Wrote snapshot to: {:s}'.format(filename) 86 | 87 | if scale_bbox_params: 88 | # restore net to original state 89 | net.params['bbox_pred'][0].data[...] = orig_0 90 | net.params['bbox_pred'][1].data[...]
= orig_1 91 | return filename 92 | 93 | def train_model(self, max_iters): 94 | """Network training loop.""" 95 | last_snapshot_iter = -1 96 | timer = Timer() 97 | model_paths = [] 98 | while self.solver.iter < max_iters: 99 | # Make one SGD update 100 | timer.tic() 101 | self.solver.step(1) 102 | timer.toc() 103 | if self.solver.iter % (10 * self.solver_param.display) == 0: 104 | print 'speed: {:.3f}s / iter'.format(timer.average_time) 105 | 106 | if self.solver.iter % cfg.TRAIN.SNAPSHOT_ITERS == 0: 107 | last_snapshot_iter = self.solver.iter 108 | model_paths.append(self.snapshot()) 109 | 110 | if last_snapshot_iter != self.solver.iter: 111 | model_paths.append(self.snapshot()) 112 | return model_paths 113 | 114 | def get_training_roidb(imdb): 115 | """Returns a roidb (Region of Interest database) for use in training.""" 116 | if cfg.TRAIN.USE_FLIPPED: 117 | print 'Appending horizontally-flipped training examples...' 118 | imdb.append_flipped_images() 119 | print 'done' 120 | 121 | print 'Preparing training data...' 122 | rdl_roidb.prepare_roidb(imdb) 123 | print 'done' 124 | 125 | return imdb.roidb 126 | 127 | def filter_roidb(roidb): 128 | """Remove roidb entries that have no usable RoIs.""" 129 | 130 | def is_valid(entry): 131 | # Valid images have: 132 | # (1) At least one foreground RoI OR 133 | # (2) At least one background RoI 134 | overlaps = entry['max_overlaps'] 135 | # find boxes with sufficient overlap 136 | fg_inds = np.where(overlaps >= cfg.TRAIN.FG_THRESH)[0] 137 | # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI) 138 | bg_inds = np.where((overlaps < cfg.TRAIN.BG_THRESH_HI) & 139 | (overlaps >= cfg.TRAIN.BG_THRESH_LO))[0] 140 | # image is only valid if such boxes exist 141 | valid = len(fg_inds) > 0 or len(bg_inds) > 0 142 | return valid 143 | 144 | num = len(roidb) 145 | filtered_roidb = [entry for entry in roidb if is_valid(entry)] 146 | num_after = len(filtered_roidb) 147 | print 'Filtered {} roidb entries: {} -> {}'.format(num - num_after, 148 | num, num_after) 149 | return filtered_roidb 150 | 151 | def train_net(solver_prototxt, roidb, output_dir, 152 | pretrained_model=None, max_iters=40000): 153 | """Train a Fast R-CNN network.""" 154 | 155 | roidb = filter_roidb(roidb) 156 | sw = SolverWrapper(solver_prototxt, roidb, output_dir, 157 | pretrained_model=pretrained_model) 158 | 159 | print 'Solving...' 160 | model_paths = sw.train_model(max_iters) 161 | print 'done solving' 162 | return model_paths 163 | -------------------------------------------------------------------------------- /lib/rpn/proposal_layer.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Faster R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick and Sean Bell 6 | # -------------------------------------------------------- 7 | 8 | import caffe 9 | import numpy as np 10 | import yaml 11 | from fast_rcnn.config import cfg 12 | from generate_anchors import generate_anchors 13 | from fast_rcnn.bbox_transform import bbox_transform_inv, clip_boxes 14 | from fast_rcnn.nms_wrapper import nms 15 | 16 | DEBUG = False 17 | 18 | class ProposalLayer(caffe.Layer): 19 | """ 20 | Outputs object detection proposals by applying estimated bounding-box 21 | transformations to a set of regular boxes (called "anchors"). 
22 | """ 23 | 24 | def setup(self, bottom, top): 25 | # parse the layer parameter string, which must be valid YAML 26 | layer_params = yaml.load(self.param_str_) 27 | 28 | self._feat_stride = layer_params['feat_stride'] 29 | anchor_scales = layer_params.get('scales', (8, 16, 32)) 30 | self._anchors = generate_anchors(scales=np.array(anchor_scales)) 31 | self._num_anchors = self._anchors.shape[0] 32 | 33 | if DEBUG: 34 | print 'feat_stride: {}'.format(self._feat_stride) 35 | print 'anchors:' 36 | print self._anchors 37 | 38 | # rois blob: holds R regions of interest, each is a 5-tuple 39 | # (n, x1, y1, x2, y2) specifying an image batch index n and a 40 | # rectangle (x1, y1, x2, y2) 41 | top[0].reshape(1, 5) 42 | 43 | # scores blob: holds scores for R regions of interest 44 | if len(top) > 1: 45 | top[1].reshape(1, 1, 1, 1) 46 | 47 | def forward(self, bottom, top): 48 | # Algorithm: 49 | # 50 | # for each (H, W) location i 51 | # generate A anchor boxes centered on cell i 52 | # apply predicted bbox deltas at cell i to each of the A anchors 53 | # clip predicted boxes to image 54 | # remove predicted boxes with either height or width < threshold 55 | # sort all (proposal, score) pairs by score from highest to lowest 56 | # take top pre_nms_topN proposals before NMS 57 | # apply NMS with threshold 0.7 to remaining proposals 58 | # take after_nms_topN proposals after NMS 59 | # return the top proposals (-> RoIs top, scores top) 60 | 61 | assert bottom[0].data.shape[0] == 1, \ 62 | 'Only single item batches are supported' 63 | 64 | cfg_key = str(self.phase) # either 'TRAIN' or 'TEST' 65 | pre_nms_topN = cfg[cfg_key].RPN_PRE_NMS_TOP_N 66 | post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N 67 | nms_thresh = cfg[cfg_key].RPN_NMS_THRESH 68 | min_size = cfg[cfg_key].RPN_MIN_SIZE 69 | 70 | # the first set of _num_anchors channels are bg probs 71 | # the second set are the fg probs, which we want 72 | scores = bottom[0].data[:, self._num_anchors:, :, :] 73 | bbox_deltas = bottom[1].data 74 | im_info = bottom[2].data[0, :] 75 | 76 | if DEBUG: 77 | print 'im_size: ({}, {})'.format(im_info[0], im_info[1]) 78 | print 'scale: {}'.format(im_info[2]) 79 | 80 | # 1. 
Generate proposals from bbox deltas and shifted anchors 81 | height, width = scores.shape[-2:] 82 | 83 | if DEBUG: 84 | print 'score map size: {}'.format(scores.shape) 85 | 86 | # Enumerate all shifts 87 | shift_x = np.arange(0, width) * self._feat_stride 88 | shift_y = np.arange(0, height) * self._feat_stride 89 | shift_x, shift_y = np.meshgrid(shift_x, shift_y) 90 | shifts = np.vstack((shift_x.ravel(), shift_y.ravel(), 91 | shift_x.ravel(), shift_y.ravel())).transpose() 92 | 93 | # Enumerate all shifted anchors: 94 | # 95 | # add A anchors (1, A, 4) to 96 | # cell K shifts (K, 1, 4) to get 97 | # shift anchors (K, A, 4) 98 | # reshape to (K*A, 4) shifted anchors 99 | A = self._num_anchors 100 | K = shifts.shape[0] 101 | anchors = self._anchors.reshape((1, A, 4)) + \ 102 | shifts.reshape((1, K, 4)).transpose((1, 0, 2)) 103 | anchors = anchors.reshape((K * A, 4)) 104 | 105 | # Transpose and reshape predicted bbox transformations to get them 106 | # into the same order as the anchors: 107 | # 108 | # bbox deltas will be (1, 4 * A, H, W) format 109 | # transpose to (1, H, W, 4 * A) 110 | # reshape to (1 * H * W * A, 4) where rows are ordered by (h, w, a) 111 | # in slowest to fastest order 112 | bbox_deltas = bbox_deltas.transpose((0, 2, 3, 1)).reshape((-1, 4)) 113 | 114 | # Same story for the scores: 115 | # 116 | # scores are (1, A, H, W) format 117 | # transpose to (1, H, W, A) 118 | # reshape to (1 * H * W * A, 1) where rows are ordered by (h, w, a) 119 | scores = scores.transpose((0, 2, 3, 1)).reshape((-1, 1)) 120 | 121 | # Convert anchors into proposals via bbox transformations 122 | proposals = bbox_transform_inv(anchors, bbox_deltas) 123 | 124 | # 2. clip predicted boxes to image 125 | proposals = clip_boxes(proposals, im_info[:2]) 126 | 127 | # 3. remove predicted boxes with either height or width < threshold 128 | # (NOTE: convert min_size to input image scale stored in im_info[2]) 129 | keep = _filter_boxes(proposals, min_size * im_info[2]) 130 | proposals = proposals[keep, :] 131 | scores = scores[keep] 132 | 133 | # 4. sort all (proposal, score) pairs by score from highest to lowest 134 | # 5. take top pre_nms_topN (e.g. 6000) 135 | order = scores.ravel().argsort()[::-1] 136 | if pre_nms_topN > 0: 137 | order = order[:pre_nms_topN] 138 | proposals = proposals[order, :] 139 | scores = scores[order] 140 | 141 | # 6. apply nms (e.g. threshold = 0.7) 142 | # 7. take after_nms_topN (e.g. 300) 143 | # 8. return the top proposals (-> RoIs top) 144 | keep = nms(np.hstack((proposals, scores)), nms_thresh) 145 | if post_nms_topN > 0: 146 | keep = keep[:post_nms_topN] 147 | proposals = proposals[keep, :] 148 | scores = scores[keep] 149 | 150 | # Output rois blob 151 | # Our RPN implementation only supports a single input image, so all 152 | # batch inds are 0 153 | batch_inds = np.zeros((proposals.shape[0], 1), dtype=np.float32) 154 | blob = np.hstack((batch_inds, proposals.astype(np.float32, copy=False))) 155 | top[0].reshape(*(blob.shape)) 156 | top[0].data[...] = blob 157 | 158 | # [Optional] output scores blob 159 | if len(top) > 1: 160 | top[1].reshape(*(scores.shape)) 161 | top[1].data[...] 
= scores 162 | 163 | def backward(self, top, propagate_down, bottom): 164 | """This layer does not propagate gradients.""" 165 | pass 166 | 167 | def reshape(self, bottom, top): 168 | """Reshaping happens during the call to forward.""" 169 | pass 170 | 171 | def _filter_boxes(boxes, min_size): 172 | """Remove all boxes with any side smaller than min_size.""" 173 | ws = boxes[:, 2] - boxes[:, 0] + 1 174 | hs = boxes[:, 3] - boxes[:, 1] + 1 175 | keep = np.where((ws >= min_size) & (hs >= min_size))[0] 176 | return keep 177 | -------------------------------------------------------------------------------- /lib/datasets/voc_eval.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast/er R-CNN 3 | # Licensed under The MIT License [see LICENSE for details] 4 | # Written by Bharath Hariharan 5 | # -------------------------------------------------------- 6 | 7 | import xml.etree.ElementTree as ET 8 | import os 9 | import cPickle 10 | import numpy as np 11 | 12 | def parse_rec(filename): 13 | """ Parse a PASCAL VOC xml file """ 14 | tree = ET.parse(filename) 15 | objects = [] 16 | for obj in tree.findall('object'): 17 | obj_struct = {} 18 | obj_struct['name'] = obj.find('name').text 19 | obj_struct['pose'] = obj.find('pose').text 20 | obj_struct['truncated'] = int(obj.find('truncated').text) 21 | obj_struct['difficult'] = int(obj.find('difficult').text) 22 | bbox = obj.find('bndbox') 23 | obj_struct['bbox'] = [int(bbox.find('xmin').text), 24 | int(bbox.find('ymin').text), 25 | int(bbox.find('xmax').text), 26 | int(bbox.find('ymax').text)] 27 | objects.append(obj_struct) 28 | 29 | return objects 30 | 31 | def voc_ap(rec, prec, use_07_metric=False): 32 | """ ap = voc_ap(rec, prec, [use_07_metric]) 33 | Compute VOC AP given precision and recall. 34 | If use_07_metric is true, uses the 35 | VOC 07 11 point method (default:False). 36 | """ 37 | if use_07_metric: 38 | # 11 point metric 39 | ap = 0. 40 | for t in np.arange(0., 1.1, 0.1): 41 | if np.sum(rec >= t) == 0: 42 | p = 0 43 | else: 44 | p = np.max(prec[rec >= t]) 45 | ap = ap + p / 11. 46 | else: 47 | # correct AP calculation 48 | # first append sentinel values at the end 49 | mrec = np.concatenate(([0.], rec, [1.])) 50 | mpre = np.concatenate(([0.], prec, [0.])) 51 | 52 | # compute the precision envelope 53 | for i in range(mpre.size - 1, 0, -1): 54 | mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i]) 55 | 56 | # to calculate area under PR curve, look for points 57 | # where X axis (recall) changes value 58 | i = np.where(mrec[1:] != mrec[:-1])[0] 59 | 60 | # and sum (\Delta recall) * prec 61 | ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]) 62 | return ap 63 | 64 | def voc_eval(detpath, 65 | annopath, 66 | imagesetfile, 67 | classname, 68 | cachedir, 69 | ovthresh=0.5, 70 | use_07_metric=False): 71 | """rec, prec, ap = voc_eval(detpath, 72 | annopath, 73 | imagesetfile, 74 | classname, 75 | [ovthresh], 76 | [use_07_metric]) 77 | 78 | Top level function that does the PASCAL VOC evaluation. 79 | 80 | detpath: Path to detections 81 | detpath.format(classname) should produce the detection results file. 82 | annopath: Path to annotations 83 | annopath.format(imagename) should be the xml annotations file. 84 | imagesetfile: Text file containing the list of images, one image per line. 
85 | classname: Category name (duh) 86 | cachedir: Directory for caching the annotations 87 | [ovthresh]: Overlap threshold (default = 0.5) 88 | [use_07_metric]: Whether to use VOC07's 11 point AP computation 89 | (default False) 90 | """ 91 | # assumes detections are in detpath.format(classname) 92 | # assumes annotations are in annopath.format(imagename) 93 | # assumes imagesetfile is a text file with each line an image name 94 | # cachedir caches the annotations in a pickle file 95 | 96 | # first load gt 97 | if not os.path.isdir(cachedir): 98 | os.mkdir(cachedir) 99 | cachefile = os.path.join(cachedir, 'annots.pkl') 100 | # read list of images 101 | with open(imagesetfile, 'r') as f: 102 | lines = f.readlines() 103 | imagenames = [x.strip() for x in lines] 104 | 105 | if not os.path.isfile(cachefile): 106 | # load annots 107 | recs = {} 108 | for i, imagename in enumerate(imagenames): 109 | recs[imagename] = parse_rec(annopath.format(imagename)) 110 | if i % 100 == 0: 111 | print 'Reading annotation for {:d}/{:d}'.format( 112 | i + 1, len(imagenames)) 113 | # save 114 | print 'Saving cached annotations to {:s}'.format(cachefile) 115 | with open(cachefile, 'w') as f: 116 | cPickle.dump(recs, f) 117 | else: 118 | # load 119 | with open(cachefile, 'r') as f: 120 | recs = cPickle.load(f) 121 | 122 | # extract gt objects for this class 123 | class_recs = {} 124 | npos = 0 125 | for imagename in imagenames: 126 | R = [obj for obj in recs[imagename] if obj['name'] == classname] 127 | bbox = np.array([x['bbox'] for x in R]) 128 | difficult = np.array([x['difficult'] for x in R]).astype(np.bool) 129 | det = [False] * len(R) 130 | npos = npos + sum(~difficult) 131 | class_recs[imagename] = {'bbox': bbox, 132 | 'difficult': difficult, 133 | 'det': det} 134 | 135 | # read dets 136 | detfile = detpath.format(classname) 137 | with open(detfile, 'r') as f: 138 | lines = f.readlines() 139 | 140 | splitlines = [x.strip().split(' ') for x in lines] 141 | image_ids = [x[0] for x in splitlines] 142 | confidence = np.array([float(x[1]) for x in splitlines]) 143 | BB = np.array([[float(z) for z in x[2:]] for x in splitlines]) 144 | 145 | # sort by confidence 146 | sorted_ind = np.argsort(-confidence) 147 | sorted_scores = np.sort(-confidence) 148 | BB = BB[sorted_ind, :] 149 | image_ids = [image_ids[x] for x in sorted_ind] 150 | 151 | # go down dets and mark TPs and FPs 152 | nd = len(image_ids) 153 | tp = np.zeros(nd) 154 | fp = np.zeros(nd) 155 | for d in range(nd): 156 | R = class_recs[image_ids[d]] 157 | bb = BB[d, :].astype(float) 158 | ovmax = -np.inf 159 | BBGT = R['bbox'].astype(float) 160 | 161 | if BBGT.size > 0: 162 | # compute overlaps 163 | # intersection 164 | ixmin = np.maximum(BBGT[:, 0], bb[0]) 165 | iymin = np.maximum(BBGT[:, 1], bb[1]) 166 | ixmax = np.minimum(BBGT[:, 2], bb[2]) 167 | iymax = np.minimum(BBGT[:, 3], bb[3]) 168 | iw = np.maximum(ixmax - ixmin + 1., 0.) 169 | ih = np.maximum(iymax - iymin + 1., 0.) 170 | inters = iw * ih 171 | 172 | # union 173 | uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) + 174 | (BBGT[:, 2] - BBGT[:, 0] + 1.) * 175 | (BBGT[:, 3] - BBGT[:, 1] + 1.) - inters) 176 | 177 | overlaps = inters / uni 178 | ovmax = np.max(overlaps) 179 | jmax = np.argmax(overlaps) 180 | 181 | if ovmax > ovthresh: 182 | if not R['difficult'][jmax]: 183 | if not R['det'][jmax]: 184 | tp[d] = 1. 185 | R['det'][jmax] = 1 186 | else: 187 | fp[d] = 1. 188 | else: 189 | fp[d] = 1. 
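# [Editorial sketch -- standalone toy example, not part of the original file]
# The cumulative sums below turn the per-detection tp/fp flags set above into a
# running precision/recall curve. With three detections ranked by confidence
# (TP, FP, TP) and npos = 2 ground-truth boxes, the arithmetic works out to:
#
#     tp = np.cumsum([1., 0., 1.])    # -> [1., 1., 2.]
#     fp = np.cumsum([0., 1., 0.])    # -> [0., 1., 1.]
#     rec = tp / 2.                   # -> [0.5, 0.5, 1.0]
#     prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
#                                     # -> [1.0, 0.5, 0.667]
#     voc_ap(rec, prec)               # -> 0.833 (area under the PR envelope)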
190 | 191 | # compute precision recall 192 | fp = np.cumsum(fp) 193 | tp = np.cumsum(tp) 194 | rec = tp / float(npos) 195 | # avoid divide by zero in case the first detection matches a difficult 196 | # ground truth 197 | prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps) 198 | ap = voc_ap(rec, prec, use_07_metric) 199 | 200 | return rec, prec, ap 201 | -------------------------------------------------------------------------------- /lib/rpn/proposal_target_layer.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Faster R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick and Sean Bell 6 | # -------------------------------------------------------- 7 | 8 | import caffe 9 | import yaml 10 | import numpy as np 11 | import numpy.random as npr 12 | from fast_rcnn.config import cfg 13 | from fast_rcnn.bbox_transform import bbox_transform 14 | from utils.cython_bbox import bbox_overlaps 15 | 16 | DEBUG = False 17 | 18 | class ProposalTargetLayer(caffe.Layer): 19 | """ 20 | Assign object detection proposals to ground-truth targets. Produces proposal 21 | classification labels and bounding-box regression targets. 22 | """ 23 | 24 | def setup(self, bottom, top): 25 | layer_params = yaml.load(self.param_str_) 26 | self._num_classes = layer_params['num_classes'] 27 | 28 | # sampled rois (0, x1, y1, x2, y2) 29 | top[0].reshape(1, 5) 30 | # labels 31 | top[1].reshape(1, 1) 32 | # bbox_targets 33 | top[2].reshape(1, self._num_classes * 4) 34 | # bbox_inside_weights 35 | top[3].reshape(1, self._num_classes * 4) 36 | # bbox_outside_weights 37 | top[4].reshape(1, self._num_classes * 4) 38 | 39 | def forward(self, bottom, top): 40 | # Proposal ROIs (0, x1, y1, x2, y2) coming from RPN 41 | # (i.e., rpn.proposal_layer.ProposalLayer), or any other source 42 | all_rois = bottom[0].data 43 | # GT boxes (x1, y1, x2, y2, label) 44 | # TODO(rbg): it's annoying that sometimes I have extra info before 45 | # and other times after box coordinates -- normalize to one format 46 | gt_boxes = bottom[1].data 47 | 48 | # Include ground-truth boxes in the set of candidate rois 49 | zeros = np.zeros((gt_boxes.shape[0], 1), dtype=gt_boxes.dtype) 50 | all_rois = np.vstack( 51 | (all_rois, np.hstack((zeros, gt_boxes[:, :-1]))) 52 | ) 53 | 54 | # Sanity check: single batch only 55 | assert np.all(all_rois[:, 0] == 0), \ 56 | 'Only single item batches are supported' 57 | 58 | num_images = 1 59 | rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images 60 | fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image) 61 | 62 | # Sample rois with classification labels and bounding box regression 63 | # targets 64 | labels, rois, bbox_targets, bbox_inside_weights = _sample_rois( 65 | all_rois, gt_boxes, fg_rois_per_image, 66 | rois_per_image, self._num_classes) 67 | 68 | if DEBUG: 69 | print 'num fg: {}'.format((labels > 0).sum()) 70 | print 'num bg: {}'.format((labels == 0).sum()) 71 | self._count += 1 72 | self._fg_num += (labels > 0).sum() 73 | self._bg_num += (labels == 0).sum() 74 | print 'num fg avg: {}'.format(self._fg_num / self._count) 75 | print 'num bg avg: {}'.format(self._bg_num / self._count) 76 | print 'ratio: {:.3f}'.format(float(self._fg_num) / float(self._bg_num)) 77 | 78 | # sampled rois 79 | top[0].reshape(*rois.shape) 80 | top[0].data[...] 
= rois 81 | 82 | # classification labels 83 | top[1].reshape(*labels.shape) 84 | top[1].data[...] = labels 85 | 86 | # bbox_targets 87 | top[2].reshape(*bbox_targets.shape) 88 | top[2].data[...] = bbox_targets 89 | 90 | # bbox_inside_weights 91 | top[3].reshape(*bbox_inside_weights.shape) 92 | top[3].data[...] = bbox_inside_weights 93 | 94 | # bbox_outside_weights 95 | top[4].reshape(*bbox_inside_weights.shape) 96 | top[4].data[...] = np.array(bbox_inside_weights > 0).astype(np.float32) 97 | 98 | def backward(self, top, propagate_down, bottom): 99 | """This layer does not propagate gradients.""" 100 | pass 101 | 102 | def reshape(self, bottom, top): 103 | """Reshaping happens during the call to forward.""" 104 | pass 105 | 106 | 107 | def _get_bbox_regression_labels(bbox_target_data, num_classes): 108 | """Bounding-box regression targets (bbox_target_data) are stored in a 109 | compact form N x (class, tx, ty, tw, th) 110 | 111 | This function expands those targets into the 4-of-4*K representation used 112 | by the network (i.e. only one class has non-zero targets). 113 | 114 | Returns: 115 | bbox_target (ndarray): N x 4K blob of regression targets 116 | bbox_inside_weights (ndarray): N x 4K blob of loss weights 117 | """ 118 | 119 | clss = bbox_target_data[:, 0] 120 | bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32) 121 | bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32) 122 | inds = np.where(clss > 0)[0] 123 | for ind in inds: 124 | cls = clss[ind] 125 | start = 4 * cls 126 | end = start + 4 127 | bbox_targets[ind, start:end] = bbox_target_data[ind, 1:] 128 | bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS 129 | return bbox_targets, bbox_inside_weights 130 | 131 | 132 | def _compute_targets(ex_rois, gt_rois, labels): 133 | """Compute bounding-box regression targets for an image.""" 134 | 135 | assert ex_rois.shape[0] == gt_rois.shape[0] 136 | assert ex_rois.shape[1] == 4 137 | assert gt_rois.shape[1] == 4 138 | 139 | targets = bbox_transform(ex_rois, gt_rois) 140 | if cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED: 141 | # Optionally normalize targets by a precomputed mean and stdev 142 | targets = ((targets - np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS)) 143 | / np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS)) 144 | return np.hstack( 145 | (labels[:, np.newaxis], targets)).astype(np.float32, copy=False) 146 | 147 | def _sample_rois(all_rois, gt_boxes, fg_rois_per_image, rois_per_image, num_classes): 148 | """Generate a random sample of RoIs comprising foreground and background 149 | examples. 
150 | """ 151 | # overlaps: (rois x gt_boxes) 152 | overlaps = bbox_overlaps( 153 | np.ascontiguousarray(all_rois[:, 1:5], dtype=np.float), 154 | np.ascontiguousarray(gt_boxes[:, :4], dtype=np.float)) 155 | gt_assignment = overlaps.argmax(axis=1) 156 | max_overlaps = overlaps.max(axis=1) 157 | labels = gt_boxes[gt_assignment, 4] 158 | 159 | # Select foreground RoIs as those with >= FG_THRESH overlap 160 | fg_inds = np.where(max_overlaps >= cfg.TRAIN.FG_THRESH)[0] 161 | # Guard against the case when an image has fewer than fg_rois_per_image 162 | # foreground RoIs 163 | fg_rois_per_this_image = min(fg_rois_per_image, fg_inds.size) 164 | # Sample foreground regions without replacement 165 | if fg_inds.size > 0: 166 | fg_inds = npr.choice(fg_inds, size=fg_rois_per_this_image, replace=False) 167 | 168 | # Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI) 169 | bg_inds = np.where((max_overlaps < cfg.TRAIN.BG_THRESH_HI) & 170 | (max_overlaps >= cfg.TRAIN.BG_THRESH_LO))[0] 171 | # Compute number of background RoIs to take from this image (guarding 172 | # against there being fewer than desired) 173 | bg_rois_per_this_image = rois_per_image - fg_rois_per_this_image 174 | bg_rois_per_this_image = min(bg_rois_per_this_image, bg_inds.size) 175 | # Sample background regions without replacement 176 | if bg_inds.size > 0: 177 | bg_inds = npr.choice(bg_inds, size=bg_rois_per_this_image, replace=False) 178 | 179 | # The indices that we're selecting (both fg and bg) 180 | keep_inds = np.append(fg_inds, bg_inds) 181 | # Select sampled values from various arrays: 182 | labels = labels[keep_inds] 183 | # Clamp labels for the background RoIs to 0 184 | labels[fg_rois_per_this_image:] = 0 185 | rois = all_rois[keep_inds] 186 | 187 | bbox_target_data = _compute_targets( 188 | rois[:, 1:5], gt_boxes[gt_assignment[keep_inds], :4], labels) 189 | 190 | bbox_targets, bbox_inside_weights = \ 191 | _get_bbox_regression_labels(bbox_target_data, num_classes) 192 | 193 | return labels, rois, bbox_targets, bbox_inside_weights 194 | -------------------------------------------------------------------------------- /lib/pycocotools/maskApi.c: -------------------------------------------------------------------------------- 1 | /************************************************************************** 2 | * Microsoft COCO Toolbox. version 2.0 3 | * Data, paper, and tutorials available at: http://mscoco.org/ 4 | * Code written by Piotr Dollar and Tsung-Yi Lin, 2015. 5 | * Licensed under the Simplified BSD License [see coco/license.txt] 6 | **************************************************************************/ 7 | #include "maskApi.h" 8 | #include 9 | #include 10 | 11 | uint umin( uint a, uint b ) { return (ab) ? 
a : b; } 13 | 14 | void rleInit( RLE *R, siz h, siz w, siz m, uint *cnts ) { 15 | R->h=h; R->w=w; R->m=m; R->cnts=(m==0)?0:malloc(sizeof(uint)*m); 16 | if(cnts) for(siz j=0; jcnts[j]=cnts[j]; 17 | } 18 | 19 | void rleFree( RLE *R ) { 20 | free(R->cnts); R->cnts=0; 21 | } 22 | 23 | void rlesInit( RLE **R, siz n ) { 24 | *R = (RLE*) malloc(sizeof(RLE)*n); 25 | for(siz i=0; i0 ) { 61 | c=umin(ca,cb); cc+=c; ct=0; 62 | ca-=c; if(!ca && a0) { 83 | crowd=iscrowd!=NULL && iscrowd[g]; 84 | if(dt[d].h!=gt[g].h || dt[d].w!=gt[g].w) { o[g*m+d]=-1; continue; } 85 | siz ka, kb, a, b; uint c, ca, cb, ct, i, u; bool va, vb; 86 | ca=dt[d].cnts[0]; ka=dt[d].m; va=vb=0; 87 | cb=gt[g].cnts[0]; kb=gt[g].m; a=b=1; i=u=0; ct=1; 88 | while( ct>0 ) { 89 | c=umin(ca,cb); if(va||vb) { u+=c; if(va&&vb) i+=c; } ct=0; 90 | ca-=c; if(!ca && ad?1:c=dy && xs>xe) || (dxye); 151 | if(flip) { t=xs; xs=xe; xe=t; t=ys; ys=ye; ye=t; } 152 | s = dx>=dy ? (double)(ye-ys)/dx : (double)(xe-xs)/dy; 153 | if(dx>=dy) for( int d=0; d<=dx; d++ ) { 154 | t=flip?dx-d:d; u[m]=t+xs; v[m]=(int)(ys+s*t+.5); m++; 155 | } else for( int d=0; d<=dy; d++ ) { 156 | t=flip?dy-d:d; v[m]=t+ys; u[m]=(int)(xs+s*t+.5); m++; 157 | } 158 | } 159 | // get points along y-boundary and downsample 160 | free(x); free(y); k=m; m=0; double xd, yd; 161 | x=malloc(sizeof(int)*k); y=malloc(sizeof(int)*k); 162 | for( j=1; jw-1 ) continue; 165 | yd=(double)(v[j]h) yd=h; yd=ceil(yd); 167 | x[m]=(int) xd; y[m]=(int) yd; m++; 168 | } 169 | // compute rle encoding given y-boundary points 170 | k=m; a=malloc(sizeof(uint)*(k+1)); 171 | for( j=0; j0) b[m++]=a[j++]; else { 177 | j++; if(jm, p=0; long x; bool more; 184 | char *s=malloc(sizeof(char)*m*6); 185 | for( i=0; icnts[i]; if(i>2) x-=(long) R->cnts[i-2]; more=1; 187 | while( more ) { 188 | char c=x & 0x1f; x >>= 5; more=(c & 0x10) ? 
x!=-1 : x!=0; 189 | if(more) c |= 0x20; c+=48; s[p++]=c; 190 | } 191 | } 192 | s[p]=0; return s; 193 | } 194 | 195 | void rleFrString( RLE *R, char *s, siz h, siz w ) { 196 | siz m=0, p=0, k; long x; bool more; uint *cnts; 197 | while( s[m] ) m++; cnts=malloc(sizeof(uint)*m); m=0; 198 | while( s[p] ) { 199 | x=0; k=0; more=1; 200 | while( more ) { 201 | char c=s[p]-48; x |= (c & 0x1f) << 5*k; 202 | more = c & 0x20; p++; k++; 203 | if(!more && (c & 0x10)) x |= -1 << 5*k; 204 | } 205 | if(m>2) x+=(long) cnts[m-2]; cnts[m++]=(uint) x; 206 | } 207 | rleInit(R,h,w,m,cnts); free(cnts); 208 | } 209 | -------------------------------------------------------------------------------- /models/pascal_voc/VGG16/fast_rcnn/train.prototxt: -------------------------------------------------------------------------------- 1 | name: "VGG_ILSVRC_16_layers" 2 | layer { 3 | name: 'data' 4 | type: 'Python' 5 | top: 'data' 6 | top: 'rois' 7 | top: 'labels' 8 | top: 'bbox_targets' 9 | top: 'bbox_inside_weights' 10 | top: 'bbox_outside_weights' 11 | python_param { 12 | module: 'roi_data_layer.layer' 13 | layer: 'RoIDataLayer' 14 | param_str: "'num_classes': 21" 15 | } 16 | } 17 | layer { 18 | name: "conv1_1" 19 | type: "Convolution" 20 | bottom: "data" 21 | top: "conv1_1" 22 | param { 23 | lr_mult: 0 24 | decay_mult: 0 25 | } 26 | param { 27 | lr_mult: 0 28 | decay_mult: 0 29 | } 30 | convolution_param { 31 | num_output: 64 32 | pad: 1 33 | kernel_size: 3 34 | } 35 | } 36 | layer { 37 | name: "relu1_1" 38 | type: "ReLU" 39 | bottom: "conv1_1" 40 | top: "conv1_1" 41 | } 42 | layer { 43 | name: "conv1_2" 44 | type: "Convolution" 45 | bottom: "conv1_1" 46 | top: "conv1_2" 47 | param { 48 | lr_mult: 0 49 | decay_mult: 0 50 | } 51 | param { 52 | lr_mult: 0 53 | decay_mult: 0 54 | } 55 | convolution_param { 56 | num_output: 64 57 | pad: 1 58 | kernel_size: 3 59 | } 60 | } 61 | layer { 62 | name: "relu1_2" 63 | type: "ReLU" 64 | bottom: "conv1_2" 65 | top: "conv1_2" 66 | } 67 | layer { 68 | name: "pool1" 69 | type: "Pooling" 70 | bottom: "conv1_2" 71 | top: "pool1" 72 | pooling_param { 73 | pool: MAX 74 | kernel_size: 2 75 | stride: 2 76 | } 77 | } 78 | layer { 79 | name: "conv2_1" 80 | type: "Convolution" 81 | bottom: "pool1" 82 | top: "conv2_1" 83 | param { 84 | lr_mult: 0 85 | decay_mult: 0 86 | } 87 | param { 88 | lr_mult: 0 89 | decay_mult: 0 90 | } 91 | convolution_param { 92 | num_output: 128 93 | pad: 1 94 | kernel_size: 3 95 | } 96 | } 97 | layer { 98 | name: "relu2_1" 99 | type: "ReLU" 100 | bottom: "conv2_1" 101 | top: "conv2_1" 102 | } 103 | layer { 104 | name: "conv2_2" 105 | type: "Convolution" 106 | bottom: "conv2_1" 107 | top: "conv2_2" 108 | param { 109 | lr_mult: 0 110 | decay_mult: 0 111 | } 112 | param { 113 | lr_mult: 0 114 | decay_mult: 0 115 | } 116 | convolution_param { 117 | num_output: 128 118 | pad: 1 119 | kernel_size: 3 120 | } 121 | } 122 | layer { 123 | name: "relu2_2" 124 | type: "ReLU" 125 | bottom: "conv2_2" 126 | top: "conv2_2" 127 | } 128 | layer { 129 | name: "pool2" 130 | type: "Pooling" 131 | bottom: "conv2_2" 132 | top: "pool2" 133 | pooling_param { 134 | pool: MAX 135 | kernel_size: 2 136 | stride: 2 137 | } 138 | } 139 | layer { 140 | name: "conv3_1" 141 | type: "Convolution" 142 | bottom: "pool2" 143 | top: "conv3_1" 144 | param { 145 | lr_mult: 1 146 | } 147 | param { 148 | lr_mult: 2 149 | } 150 | convolution_param { 151 | num_output: 256 152 | pad: 1 153 | kernel_size: 3 154 | } 155 | } 156 | layer { 157 | name: "relu3_1" 158 | type: "ReLU" 159 | bottom: "conv3_1" 160 | top: 
"conv3_1" 161 | } 162 | layer { 163 | name: "conv3_2" 164 | type: "Convolution" 165 | bottom: "conv3_1" 166 | top: "conv3_2" 167 | param { 168 | lr_mult: 1 169 | } 170 | param { 171 | lr_mult: 2 172 | } 173 | convolution_param { 174 | num_output: 256 175 | pad: 1 176 | kernel_size: 3 177 | } 178 | } 179 | layer { 180 | name: "relu3_2" 181 | type: "ReLU" 182 | bottom: "conv3_2" 183 | top: "conv3_2" 184 | } 185 | layer { 186 | name: "conv3_3" 187 | type: "Convolution" 188 | bottom: "conv3_2" 189 | top: "conv3_3" 190 | param { 191 | lr_mult: 1 192 | } 193 | param { 194 | lr_mult: 2 195 | } 196 | convolution_param { 197 | num_output: 256 198 | pad: 1 199 | kernel_size: 3 200 | } 201 | } 202 | layer { 203 | name: "relu3_3" 204 | type: "ReLU" 205 | bottom: "conv3_3" 206 | top: "conv3_3" 207 | } 208 | layer { 209 | name: "pool3" 210 | type: "Pooling" 211 | bottom: "conv3_3" 212 | top: "pool3" 213 | pooling_param { 214 | pool: MAX 215 | kernel_size: 2 216 | stride: 2 217 | } 218 | } 219 | layer { 220 | name: "conv4_1" 221 | type: "Convolution" 222 | bottom: "pool3" 223 | top: "conv4_1" 224 | param { 225 | lr_mult: 1 226 | } 227 | param { 228 | lr_mult: 2 229 | } 230 | convolution_param { 231 | num_output: 512 232 | pad: 1 233 | kernel_size: 3 234 | } 235 | } 236 | layer { 237 | name: "relu4_1" 238 | type: "ReLU" 239 | bottom: "conv4_1" 240 | top: "conv4_1" 241 | } 242 | layer { 243 | name: "conv4_2" 244 | type: "Convolution" 245 | bottom: "conv4_1" 246 | top: "conv4_2" 247 | param { 248 | lr_mult: 1 249 | } 250 | param { 251 | lr_mult: 2 252 | } 253 | convolution_param { 254 | num_output: 512 255 | pad: 1 256 | kernel_size: 3 257 | } 258 | } 259 | layer { 260 | name: "relu4_2" 261 | type: "ReLU" 262 | bottom: "conv4_2" 263 | top: "conv4_2" 264 | } 265 | layer { 266 | name: "conv4_3" 267 | type: "Convolution" 268 | bottom: "conv4_2" 269 | top: "conv4_3" 270 | param { 271 | lr_mult: 1 272 | } 273 | param { 274 | lr_mult: 2 275 | } 276 | convolution_param { 277 | num_output: 512 278 | pad: 1 279 | kernel_size: 3 280 | } 281 | } 282 | layer { 283 | name: "relu4_3" 284 | type: "ReLU" 285 | bottom: "conv4_3" 286 | top: "conv4_3" 287 | } 288 | layer { 289 | name: "pool4" 290 | type: "Pooling" 291 | bottom: "conv4_3" 292 | top: "pool4" 293 | pooling_param { 294 | pool: MAX 295 | kernel_size: 2 296 | stride: 2 297 | } 298 | } 299 | layer { 300 | name: "conv5_1" 301 | type: "Convolution" 302 | bottom: "pool4" 303 | top: "conv5_1" 304 | param { 305 | lr_mult: 1 306 | } 307 | param { 308 | lr_mult: 2 309 | } 310 | convolution_param { 311 | num_output: 512 312 | pad: 1 313 | kernel_size: 3 314 | } 315 | } 316 | layer { 317 | name: "relu5_1" 318 | type: "ReLU" 319 | bottom: "conv5_1" 320 | top: "conv5_1" 321 | } 322 | layer { 323 | name: "conv5_2" 324 | type: "Convolution" 325 | bottom: "conv5_1" 326 | top: "conv5_2" 327 | param { 328 | lr_mult: 1 329 | } 330 | param { 331 | lr_mult: 2 332 | } 333 | convolution_param { 334 | num_output: 512 335 | pad: 1 336 | kernel_size: 3 337 | } 338 | } 339 | layer { 340 | name: "relu5_2" 341 | type: "ReLU" 342 | bottom: "conv5_2" 343 | top: "conv5_2" 344 | } 345 | layer { 346 | name: "conv5_3" 347 | type: "Convolution" 348 | bottom: "conv5_2" 349 | top: "conv5_3" 350 | param { 351 | lr_mult: 1 352 | } 353 | param { 354 | lr_mult: 2 355 | } 356 | convolution_param { 357 | num_output: 512 358 | pad: 1 359 | kernel_size: 3 360 | } 361 | } 362 | layer { 363 | name: "relu5_3" 364 | type: "ReLU" 365 | bottom: "conv5_3" 366 | top: "conv5_3" 367 | } 368 | layer { 369 | name: 
"roi_pool5" 370 | type: "ROIPooling" 371 | bottom: "conv5_3" 372 | bottom: "rois" 373 | top: "pool5" 374 | roi_pooling_param { 375 | pooled_w: 7 376 | pooled_h: 7 377 | spatial_scale: 0.0625 # 1/16 378 | } 379 | } 380 | layer { 381 | name: "fc6" 382 | type: "InnerProduct" 383 | bottom: "pool5" 384 | top: "fc6" 385 | param { 386 | lr_mult: 1 387 | } 388 | param { 389 | lr_mult: 2 390 | } 391 | inner_product_param { 392 | num_output: 4096 393 | } 394 | } 395 | layer { 396 | name: "relu6" 397 | type: "ReLU" 398 | bottom: "fc6" 399 | top: "fc6" 400 | } 401 | layer { 402 | name: "drop6" 403 | type: "Dropout" 404 | bottom: "fc6" 405 | top: "fc6" 406 | dropout_param { 407 | dropout_ratio: 0.5 408 | } 409 | } 410 | layer { 411 | name: "fc7" 412 | type: "InnerProduct" 413 | bottom: "fc6" 414 | top: "fc7" 415 | param { 416 | lr_mult: 1 417 | } 418 | param { 419 | lr_mult: 2 420 | } 421 | inner_product_param { 422 | num_output: 4096 423 | } 424 | } 425 | layer { 426 | name: "relu7" 427 | type: "ReLU" 428 | bottom: "fc7" 429 | top: "fc7" 430 | } 431 | layer { 432 | name: "drop7" 433 | type: "Dropout" 434 | bottom: "fc7" 435 | top: "fc7" 436 | dropout_param { 437 | dropout_ratio: 0.5 438 | } 439 | } 440 | layer { 441 | name: "cls_score" 442 | type: "InnerProduct" 443 | bottom: "fc7" 444 | top: "cls_score" 445 | param { 446 | lr_mult: 1 447 | } 448 | param { 449 | lr_mult: 2 450 | } 451 | inner_product_param { 452 | num_output: 21 453 | weight_filler { 454 | type: "gaussian" 455 | std: 0.01 456 | } 457 | bias_filler { 458 | type: "constant" 459 | value: 0 460 | } 461 | } 462 | } 463 | layer { 464 | name: "bbox_pred" 465 | type: "InnerProduct" 466 | bottom: "fc7" 467 | top: "bbox_pred" 468 | param { 469 | lr_mult: 1 470 | } 471 | param { 472 | lr_mult: 2 473 | } 474 | inner_product_param { 475 | num_output: 84 476 | weight_filler { 477 | type: "gaussian" 478 | std: 0.001 479 | } 480 | bias_filler { 481 | type: "constant" 482 | value: 0 483 | } 484 | } 485 | } 486 | layer { 487 | name: "loss_cls" 488 | type: "SoftmaxWithLoss" 489 | bottom: "cls_score" 490 | bottom: "labels" 491 | top: "loss_cls" 492 | loss_weight: 1 493 | } 494 | layer { 495 | name: "loss_bbox" 496 | type: "SmoothL1Loss" 497 | bottom: "bbox_pred" 498 | bottom: "bbox_targets" 499 | bottom: "bbox_inside_weights" 500 | bottom: "bbox_outside_weights" 501 | top: "loss_bbox" 502 | loss_weight: 1 503 | } 504 | -------------------------------------------------------------------------------- /models/pascal_voc/VGG16/fast_rcnn/test.prototxt: -------------------------------------------------------------------------------- 1 | name: "VGG_ILSVRC_16_layers" 2 | 3 | input: "data" 4 | input_shape { 5 | dim: 1 6 | dim: 3 7 | dim: 224 8 | dim: 224 9 | } 10 | 11 | input: "rois" 12 | input_shape { 13 | dim: 1 # to be changed on-the-fly to num ROIs 14 | dim: 5 # [batch ind, x1, y1, x2, y2] zero-based indexing 15 | } 16 | 17 | layer { 18 | name: "conv1_1" 19 | type: "Convolution" 20 | bottom: "data" 21 | top: "conv1_1" 22 | param { 23 | lr_mult: 0 24 | decay_mult: 0 25 | } 26 | param { 27 | lr_mult: 0 28 | decay_mult: 0 29 | } 30 | convolution_param { 31 | num_output: 64 32 | pad: 1 33 | kernel_size: 3 34 | } 35 | } 36 | layer { 37 | name: "relu1_1" 38 | type: "ReLU" 39 | bottom: "conv1_1" 40 | top: "conv1_1" 41 | } 42 | layer { 43 | name: "conv1_2" 44 | type: "Convolution" 45 | bottom: "conv1_1" 46 | top: "conv1_2" 47 | param { 48 | lr_mult: 0 49 | decay_mult: 0 50 | } 51 | param { 52 | lr_mult: 0 53 | decay_mult: 0 54 | } 55 | convolution_param { 56 | 
num_output: 64 57 | pad: 1 58 | kernel_size: 3 59 | } 60 | } 61 | layer { 62 | name: "relu1_2" 63 | type: "ReLU" 64 | bottom: "conv1_2" 65 | top: "conv1_2" 66 | } 67 | layer { 68 | name: "pool1" 69 | type: "Pooling" 70 | bottom: "conv1_2" 71 | top: "pool1" 72 | pooling_param { 73 | pool: MAX 74 | kernel_size: 2 75 | stride: 2 76 | } 77 | } 78 | layer { 79 | name: "conv2_1" 80 | type: "Convolution" 81 | bottom: "pool1" 82 | top: "conv2_1" 83 | param { 84 | lr_mult: 0 85 | decay_mult: 0 86 | } 87 | param { 88 | lr_mult: 0 89 | decay_mult: 0 90 | } 91 | convolution_param { 92 | num_output: 128 93 | pad: 1 94 | kernel_size: 3 95 | } 96 | } 97 | layer { 98 | name: "relu2_1" 99 | type: "ReLU" 100 | bottom: "conv2_1" 101 | top: "conv2_1" 102 | } 103 | layer { 104 | name: "conv2_2" 105 | type: "Convolution" 106 | bottom: "conv2_1" 107 | top: "conv2_2" 108 | param { 109 | lr_mult: 0 110 | decay_mult: 0 111 | } 112 | param { 113 | lr_mult: 0 114 | decay_mult: 0 115 | } 116 | convolution_param { 117 | num_output: 128 118 | pad: 1 119 | kernel_size: 3 120 | } 121 | } 122 | layer { 123 | name: "relu2_2" 124 | type: "ReLU" 125 | bottom: "conv2_2" 126 | top: "conv2_2" 127 | } 128 | layer { 129 | name: "pool2" 130 | type: "Pooling" 131 | bottom: "conv2_2" 132 | top: "pool2" 133 | pooling_param { 134 | pool: MAX 135 | kernel_size: 2 136 | stride: 2 137 | } 138 | } 139 | layer { 140 | name: "conv3_1" 141 | type: "Convolution" 142 | bottom: "pool2" 143 | top: "conv3_1" 144 | param { 145 | lr_mult: 1 146 | decay_mult: 1 147 | } 148 | param { 149 | lr_mult: 2 150 | decay_mult: 0 151 | } 152 | convolution_param { 153 | num_output: 256 154 | pad: 1 155 | kernel_size: 3 156 | } 157 | } 158 | layer { 159 | name: "relu3_1" 160 | type: "ReLU" 161 | bottom: "conv3_1" 162 | top: "conv3_1" 163 | } 164 | layer { 165 | name: "conv3_2" 166 | type: "Convolution" 167 | bottom: "conv3_1" 168 | top: "conv3_2" 169 | param { 170 | lr_mult: 1 171 | decay_mult: 1 172 | } 173 | param { 174 | lr_mult: 2 175 | decay_mult: 0 176 | } 177 | convolution_param { 178 | num_output: 256 179 | pad: 1 180 | kernel_size: 3 181 | } 182 | } 183 | layer { 184 | name: "relu3_2" 185 | type: "ReLU" 186 | bottom: "conv3_2" 187 | top: "conv3_2" 188 | } 189 | layer { 190 | name: "conv3_3" 191 | type: "Convolution" 192 | bottom: "conv3_2" 193 | top: "conv3_3" 194 | param { 195 | lr_mult: 1 196 | decay_mult: 1 197 | } 198 | param { 199 | lr_mult: 2 200 | decay_mult: 0 201 | } 202 | convolution_param { 203 | num_output: 256 204 | pad: 1 205 | kernel_size: 3 206 | } 207 | } 208 | layer { 209 | name: "relu3_3" 210 | type: "ReLU" 211 | bottom: "conv3_3" 212 | top: "conv3_3" 213 | } 214 | layer { 215 | name: "pool3" 216 | type: "Pooling" 217 | bottom: "conv3_3" 218 | top: "pool3" 219 | pooling_param { 220 | pool: MAX 221 | kernel_size: 2 222 | stride: 2 223 | } 224 | } 225 | layer { 226 | name: "conv4_1" 227 | type: "Convolution" 228 | bottom: "pool3" 229 | top: "conv4_1" 230 | param { 231 | lr_mult: 1 232 | decay_mult: 1 233 | } 234 | param { 235 | lr_mult: 2 236 | decay_mult: 0 237 | } 238 | convolution_param { 239 | num_output: 512 240 | pad: 1 241 | kernel_size: 3 242 | } 243 | } 244 | layer { 245 | name: "relu4_1" 246 | type: "ReLU" 247 | bottom: "conv4_1" 248 | top: "conv4_1" 249 | } 250 | layer { 251 | name: "conv4_2" 252 | type: "Convolution" 253 | bottom: "conv4_1" 254 | top: "conv4_2" 255 | param { 256 | lr_mult: 1 257 | decay_mult: 1 258 | } 259 | param { 260 | lr_mult: 2 261 | decay_mult: 0 262 | } 263 | convolution_param { 264 | num_output: 512 
265 | pad: 1 266 | kernel_size: 3 267 | } 268 | } 269 | layer { 270 | name: "relu4_2" 271 | type: "ReLU" 272 | bottom: "conv4_2" 273 | top: "conv4_2" 274 | } 275 | layer { 276 | name: "conv4_3" 277 | type: "Convolution" 278 | bottom: "conv4_2" 279 | top: "conv4_3" 280 | param { 281 | lr_mult: 1 282 | decay_mult: 1 283 | } 284 | param { 285 | lr_mult: 2 286 | decay_mult: 0 287 | } 288 | convolution_param { 289 | num_output: 512 290 | pad: 1 291 | kernel_size: 3 292 | } 293 | } 294 | layer { 295 | name: "relu4_3" 296 | type: "ReLU" 297 | bottom: "conv4_3" 298 | top: "conv4_3" 299 | } 300 | layer { 301 | name: "pool4" 302 | type: "Pooling" 303 | bottom: "conv4_3" 304 | top: "pool4" 305 | pooling_param { 306 | pool: MAX 307 | kernel_size: 2 308 | stride: 2 309 | } 310 | } 311 | layer { 312 | name: "conv5_1" 313 | type: "Convolution" 314 | bottom: "pool4" 315 | top: "conv5_1" 316 | param { 317 | lr_mult: 1 318 | decay_mult: 1 319 | } 320 | param { 321 | lr_mult: 2 322 | decay_mult: 0 323 | } 324 | convolution_param { 325 | num_output: 512 326 | pad: 1 327 | kernel_size: 3 328 | } 329 | } 330 | layer { 331 | name: "relu5_1" 332 | type: "ReLU" 333 | bottom: "conv5_1" 334 | top: "conv5_1" 335 | } 336 | layer { 337 | name: "conv5_2" 338 | type: "Convolution" 339 | bottom: "conv5_1" 340 | top: "conv5_2" 341 | param { 342 | lr_mult: 1 343 | decay_mult: 1 344 | } 345 | param { 346 | lr_mult: 2 347 | decay_mult: 0 348 | } 349 | convolution_param { 350 | num_output: 512 351 | pad: 1 352 | kernel_size: 3 353 | } 354 | } 355 | layer { 356 | name: "relu5_2" 357 | type: "ReLU" 358 | bottom: "conv5_2" 359 | top: "conv5_2" 360 | } 361 | layer { 362 | name: "conv5_3" 363 | type: "Convolution" 364 | bottom: "conv5_2" 365 | top: "conv5_3" 366 | param { 367 | lr_mult: 1 368 | decay_mult: 1 369 | } 370 | param { 371 | lr_mult: 2 372 | decay_mult: 0 373 | } 374 | convolution_param { 375 | num_output: 512 376 | pad: 1 377 | kernel_size: 3 378 | } 379 | } 380 | layer { 381 | name: "relu5_3" 382 | type: "ReLU" 383 | bottom: "conv5_3" 384 | top: "conv5_3" 385 | } 386 | layer { 387 | name: "roi_pool5" 388 | type: "ROIPooling" 389 | bottom: "conv5_3" 390 | bottom: "rois" 391 | top: "pool5" 392 | roi_pooling_param { 393 | pooled_w: 7 394 | pooled_h: 7 395 | spatial_scale: 0.0625 # 1/16 396 | } 397 | } 398 | layer { 399 | name: "fc6" 400 | type: "InnerProduct" 401 | bottom: "pool5" 402 | top: "fc6" 403 | param { 404 | lr_mult: 1 405 | decay_mult: 1 406 | } 407 | param { 408 | lr_mult: 2 409 | decay_mult: 0 410 | } 411 | inner_product_param { 412 | num_output: 4096 413 | } 414 | } 415 | layer { 416 | name: "relu6" 417 | type: "ReLU" 418 | bottom: "fc6" 419 | top: "fc6" 420 | } 421 | layer { 422 | name: "drop6" 423 | type: "Dropout" 424 | bottom: "fc6" 425 | top: "fc6" 426 | dropout_param { 427 | dropout_ratio: 0.5 428 | } 429 | } 430 | layer { 431 | name: "fc7" 432 | type: "InnerProduct" 433 | bottom: "fc6" 434 | top: "fc7" 435 | param { 436 | lr_mult: 1 437 | decay_mult: 1 438 | } 439 | param { 440 | lr_mult: 2 441 | decay_mult: 0 442 | } 443 | inner_product_param { 444 | num_output: 4096 445 | } 446 | } 447 | layer { 448 | name: "relu7" 449 | type: "ReLU" 450 | bottom: "fc7" 451 | top: "fc7" 452 | } 453 | layer { 454 | name: "drop7" 455 | type: "Dropout" 456 | bottom: "fc7" 457 | top: "fc7" 458 | dropout_param { 459 | dropout_ratio: 0.5 460 | } 461 | } 462 | layer { 463 | name: "cls_score" 464 | type: "InnerProduct" 465 | bottom: "fc7" 466 | top: "cls_score" 467 | param { 468 | lr_mult: 1 469 | decay_mult: 1 470 | } 471 | 
param { 472 | lr_mult: 2 473 | decay_mult: 0 474 | } 475 | inner_product_param { 476 | num_output: 21 477 | weight_filler { 478 | type: "gaussian" 479 | std: 0.01 480 | } 481 | bias_filler { 482 | type: "constant" 483 | value: 0 484 | } 485 | } 486 | } 487 | layer { 488 | name: "bbox_pred" 489 | type: "InnerProduct" 490 | bottom: "fc7" 491 | top: "bbox_pred" 492 | param { 493 | lr_mult: 1 494 | decay_mult: 1 495 | } 496 | param { 497 | lr_mult: 2 498 | decay_mult: 0 499 | } 500 | inner_product_param { 501 | num_output: 84 502 | weight_filler { 503 | type: "gaussian" 504 | std: 0.001 505 | } 506 | bias_filler { 507 | type: "constant" 508 | value: 0 509 | } 510 | } 511 | } 512 | layer { 513 | name: "cls_prob" 514 | type: "Softmax" 515 | bottom: "cls_score" 516 | top: "cls_prob" 517 | } 518 | -------------------------------------------------------------------------------- /models/pascal_voc/VGG_CNN_M_1024/fast_rcnn_ohem/train.prototxt: -------------------------------------------------------------------------------- 1 | name: "VGG_CNN_M_1024" 2 | layer { 3 | name: 'data' 4 | type: 'Python' 5 | top: 'data' 6 | top: 'rois' 7 | top: 'labels' 8 | top: 'bbox_targets' 9 | top: 'bbox_inside_weights' 10 | top: 'bbox_outside_weights' 11 | python_param { 12 | module: 'roi_data_layer.layer' 13 | layer: 'RoIDataLayer' 14 | param_str: "'num_classes': 21" 15 | } 16 | } 17 | layer { 18 | name: "conv1" 19 | type: "Convolution" 20 | bottom: "data" 21 | top: "conv1" 22 | param { lr_mult: 0 decay_mult: 0 } 23 | param { lr_mult: 0 decay_mult: 0 } 24 | convolution_param { 25 | num_output: 96 26 | kernel_size: 7 27 | stride: 2 28 | } 29 | } 30 | layer { 31 | name: "relu1" 32 | type: "ReLU" 33 | bottom: "conv1" 34 | top: "conv1" 35 | } 36 | layer { 37 | name: "norm1" 38 | type: "LRN" 39 | bottom: "conv1" 40 | top: "norm1" 41 | lrn_param { 42 | local_size: 5 43 | alpha: 0.0005 44 | beta: 0.75 45 | k: 2 46 | } 47 | } 48 | layer { 49 | name: "pool1" 50 | type: "Pooling" 51 | bottom: "norm1" 52 | top: "pool1" 53 | pooling_param { 54 | pool: MAX 55 | kernel_size: 3 56 | stride: 2 57 | } 58 | } 59 | layer { 60 | name: "conv2" 61 | type: "Convolution" 62 | bottom: "pool1" 63 | top: "conv2" 64 | param { 65 | lr_mult: 1 66 | } 67 | param { 68 | lr_mult: 2 69 | } 70 | convolution_param { 71 | num_output: 256 72 | pad: 1 73 | kernel_size: 5 74 | stride: 2 75 | } 76 | } 77 | layer { 78 | name: "relu2" 79 | type: "ReLU" 80 | bottom: "conv2" 81 | top: "conv2" 82 | } 83 | layer { 84 | name: "norm2" 85 | type: "LRN" 86 | bottom: "conv2" 87 | top: "norm2" 88 | lrn_param { 89 | local_size: 5 90 | alpha: 0.0005 91 | beta: 0.75 92 | k: 2 93 | } 94 | } 95 | layer { 96 | name: "pool2" 97 | type: "Pooling" 98 | bottom: "norm2" 99 | top: "pool2" 100 | pooling_param { 101 | pool: MAX 102 | kernel_size: 3 103 | stride: 2 104 | } 105 | } 106 | layer { 107 | name: "conv3" 108 | type: "Convolution" 109 | bottom: "pool2" 110 | top: "conv3" 111 | param { 112 | lr_mult: 1 113 | } 114 | param { 115 | lr_mult: 2 116 | } 117 | convolution_param { 118 | num_output: 512 119 | pad: 1 120 | kernel_size: 3 121 | } 122 | } 123 | layer { 124 | name: "relu3" 125 | type: "ReLU" 126 | bottom: "conv3" 127 | top: "conv3" 128 | } 129 | layer { 130 | name: "conv4" 131 | type: "Convolution" 132 | bottom: "conv3" 133 | top: "conv4" 134 | param { 135 | lr_mult: 1 136 | } 137 | param { 138 | lr_mult: 2 139 | } 140 | convolution_param { 141 | num_output: 512 142 | pad: 1 143 | kernel_size: 3 144 | } 145 | } 146 | layer { 147 | name: "relu4" 148 | type: "ReLU" 149 | 
bottom: "conv4" 150 | top: "conv4" 151 | } 152 | layer { 153 | name: "conv5" 154 | type: "Convolution" 155 | bottom: "conv4" 156 | top: "conv5" 157 | param { 158 | lr_mult: 1 159 | } 160 | param { 161 | lr_mult: 2 162 | } 163 | convolution_param { 164 | num_output: 512 165 | pad: 1 166 | kernel_size: 3 167 | } 168 | } 169 | layer { 170 | name: "relu5" 171 | type: "ReLU" 172 | bottom: "conv5" 173 | top: "conv5" 174 | } 175 | ########################## 176 | ## Readonly RoI Network ## 177 | ######### Start ########## 178 | layer { 179 | name: "roi_pool5_readonly" 180 | type: "ROIPooling" 181 | bottom: "conv5" 182 | bottom: "rois" 183 | top: "pool5_readonly" 184 | propagate_down: false 185 | propagate_down: false 186 | roi_pooling_param { 187 | pooled_w: 6 188 | pooled_h: 6 189 | spatial_scale: 0.0625 # 1/16 190 | } 191 | } 192 | layer { 193 | name: "fc6_readonly" 194 | type: "InnerProduct" 195 | bottom: "pool5_readonly" 196 | top: "fc6_readonly" 197 | propagate_down: false 198 | param { 199 | name: "fc6_w" 200 | } 201 | param { 202 | name: "fc6_b" 203 | } 204 | inner_product_param { 205 | num_output: 4096 206 | } 207 | } 208 | layer { 209 | name: "relu6_readonly" 210 | type: "ReLU" 211 | bottom: "fc6_readonly" 212 | top: "fc6_readonly" 213 | propagate_down: false 214 | } 215 | layer { 216 | name: "drop6_readonly" 217 | type: "Dropout" 218 | bottom: "fc6_readonly" 219 | top: "fc6_readonly" 220 | propagate_down: false 221 | dropout_param { 222 | dropout_ratio: 0.5 223 | } 224 | } 225 | layer { 226 | name: "fc7_readonly" 227 | type: "InnerProduct" 228 | bottom: "fc6_readonly" 229 | top: "fc7_readonly" 230 | propagate_down: false 231 | param { 232 | name: "fc7_w" 233 | } 234 | param { 235 | name: "fc7_b" 236 | } 237 | inner_product_param { 238 | num_output: 1024 239 | } 240 | } 241 | layer { 242 | name: "relu7_readonly" 243 | type: "ReLU" 244 | bottom: "fc7_readonly" 245 | top: "fc7_readonly" 246 | propagate_down: false 247 | } 248 | layer { 249 | name: "drop7_readonly" 250 | type: "Dropout" 251 | bottom: "fc7_readonly" 252 | top: "fc7_readonly" 253 | propagate_down: false 254 | dropout_param { 255 | dropout_ratio: 0.5 256 | } 257 | } 258 | layer { 259 | name: "cls_score_readonly" 260 | type: "InnerProduct" 261 | bottom: "fc7_readonly" 262 | top: "cls_score_readonly" 263 | propagate_down: false 264 | param { 265 | name: "cls_score_w" 266 | } 267 | param { 268 | name: "cls_score_b" 269 | } 270 | inner_product_param { 271 | num_output: 21 272 | weight_filler { 273 | type: "gaussian" 274 | std: 0.01 275 | } 276 | bias_filler { 277 | type: "constant" 278 | value: 0 279 | } 280 | } 281 | } 282 | layer { 283 | name: "bbox_pred_readonly" 284 | type: "InnerProduct" 285 | bottom: "fc7_readonly" 286 | top: "bbox_pred_readonly" 287 | propagate_down: false 288 | param { 289 | name: "bbox_pred_w" 290 | } 291 | param { 292 | name: "bbox_pred_b" 293 | } 294 | inner_product_param { 295 | num_output: 84 296 | weight_filler { 297 | type: "gaussian" 298 | std: 0.001 299 | } 300 | bias_filler { 301 | type: "constant" 302 | value: 0 303 | } 304 | } 305 | } 306 | layer { 307 | name: "cls_prob_readonly" 308 | type: "Softmax" 309 | bottom: "cls_score_readonly" 310 | top: "cls_prob_readonly" 311 | propagate_down: false 312 | } 313 | layer { 314 | name: "hard_roi_mining" 315 | type: "Python" 316 | bottom: "cls_prob_readonly" 317 | bottom: "bbox_pred_readonly" 318 | bottom: "rois" 319 | bottom: "labels" 320 | bottom: "bbox_targets" 321 | bottom: "bbox_inside_weights" 322 | bottom: "bbox_outside_weights" 323 | top: 
"rois_hard" 324 | top: "labels_hard" 325 | top: "bbox_targets_hard" 326 | top: "bbox_inside_weights_hard" 327 | top: "bbox_outside_weights_hard" 328 | propagate_down: false 329 | propagate_down: false 330 | propagate_down: false 331 | propagate_down: false 332 | propagate_down: false 333 | propagate_down: false 334 | propagate_down: false 335 | python_param { 336 | module: "roi_data_layer.layer" 337 | layer: "OHEMDataLayer" 338 | param_str: "'num_classes': 21" 339 | } 340 | } 341 | ########## End ########### 342 | ## Readonly RoI Network ## 343 | ########################## 344 | layer { 345 | name: "roi_pool5" 346 | type: "ROIPooling" 347 | bottom: "conv5" 348 | bottom: "rois_hard" 349 | top: "pool5" 350 | propagate_down: true 351 | propagate_down: false 352 | roi_pooling_param { 353 | pooled_w: 6 354 | pooled_h: 6 355 | spatial_scale: 0.0625 # 1/16 356 | } 357 | } 358 | layer { 359 | name: "fc6" 360 | type: "InnerProduct" 361 | bottom: "pool5" 362 | top: "fc6" 363 | param { 364 | name: "fc6_w" 365 | lr_mult: 1 366 | decay_mult: 1 367 | } 368 | param { 369 | name: "fc6_b" 370 | lr_mult: 2 371 | decay_mult: 0 372 | } 373 | inner_product_param { 374 | num_output: 4096 375 | } 376 | } 377 | layer { 378 | name: "relu6" 379 | type: "ReLU" 380 | bottom: "fc6" 381 | top: "fc6" 382 | } 383 | layer { 384 | name: "drop6" 385 | type: "Dropout" 386 | bottom: "fc6" 387 | top: "fc6" 388 | dropout_param { 389 | dropout_ratio: 0.5 390 | } 391 | } 392 | layer { 393 | name: "fc7" 394 | type: "InnerProduct" 395 | bottom: "fc6" 396 | top: "fc7" 397 | param { 398 | name: "fc7_w" 399 | lr_mult: 1 400 | decay_mult: 1 401 | } 402 | param { 403 | name: "fc7_b" 404 | lr_mult: 2 405 | decay_mult: 0 406 | } 407 | inner_product_param { 408 | num_output: 1024 409 | } 410 | } 411 | layer { 412 | name: "relu7" 413 | type: "ReLU" 414 | bottom: "fc7" 415 | top: "fc7" 416 | } 417 | layer { 418 | name: "drop7" 419 | type: "Dropout" 420 | bottom: "fc7" 421 | top: "fc7" 422 | dropout_param { 423 | dropout_ratio: 0.5 424 | } 425 | } 426 | layer { 427 | name: "cls_score" 428 | type: "InnerProduct" 429 | bottom: "fc7" 430 | top: "cls_score" 431 | param { 432 | name: "cls_score_w" 433 | lr_mult: 1 434 | decay_mult: 1 435 | } 436 | param { 437 | name: "cls_score_b" 438 | lr_mult: 2 439 | decay_mult: 0 440 | } 441 | inner_product_param { 442 | num_output: 21 443 | weight_filler { 444 | type: "gaussian" 445 | std: 0.01 446 | } 447 | bias_filler { 448 | type: "constant" 449 | value: 0 450 | } 451 | } 452 | } 453 | layer { 454 | name: "bbox_pred" 455 | type: "InnerProduct" 456 | bottom: "fc7" 457 | top: "bbox_pred" 458 | param { 459 | name: "bbox_pred_w" 460 | lr_mult: 1 461 | decay_mult: 1 462 | } 463 | param { 464 | name: "bbox_pred_b" 465 | lr_mult: 2 466 | decay_mult: 0 467 | } 468 | inner_product_param { 469 | num_output: 84 470 | weight_filler { 471 | type: "gaussian" 472 | std: 0.001 473 | } 474 | bias_filler { 475 | type: "constant" 476 | value: 0 477 | } 478 | } 479 | } 480 | layer { 481 | name: "loss_cls" 482 | type: "SoftmaxWithLoss" 483 | bottom: "cls_score" 484 | bottom: "labels_hard" 485 | top: "loss_cls" 486 | propagate_down: true 487 | propagate_down: false 488 | loss_weight: 1 489 | } 490 | layer { 491 | name: "loss_bbox" 492 | type: "SmoothL1Loss" 493 | bottom: "bbox_pred" 494 | bottom: "bbox_targets_hard" 495 | bottom: "bbox_inside_weights_hard" 496 | bottom: "bbox_outside_weights_hard" 497 | top: "loss_bbox" 498 | propagate_down: true 499 | propagate_down: false 500 | propagate_down: false 501 | propagate_down: 
false 502 | loss_weight: 1 503 | } -------------------------------------------------------------------------------- /lib/fast_rcnn/config.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN with OHEM 3 | # Licensed under The MIT License [see LICENSE for details] 4 | # Written by Ross Girshick and Abhinav Shrivastava 5 | # -------------------------------------------------------- 6 | 7 | """Fast R-CNN config system. 8 | 9 | This file specifies default config options for Fast R-CNN. You should not 10 | change values in this file. Instead, you should write a config file (in yaml) 11 | and use cfg_from_file(yaml_file) to load it and override the default options. 12 | 13 | Most tools in $ROOT/tools take a --cfg option to specify an override file. 14 | - See tools/{train,test}_net.py for example code that uses cfg_from_file() 15 | - See experiments/cfgs/*.yml for example YAML config override files 16 | """ 17 | 18 | import os 19 | import os.path as osp 20 | import numpy as np 21 | # `pip install easydict` if you don't have it 22 | from easydict import EasyDict as edict 23 | 24 | __C = edict() 25 | # Consumers can get config by: 26 | # from fast_rcnn_config import cfg 27 | cfg = __C 28 | 29 | # 30 | # Training options 31 | # 32 | 33 | __C.TRAIN = edict() 34 | 35 | # Scales to use during training (can list multiple scales) 36 | # Each scale is the pixel size of an image's shortest side 37 | __C.TRAIN.SCALES = (600,) 38 | 39 | # Max pixel size of the longest side of a scaled input image 40 | __C.TRAIN.MAX_SIZE = 1000 41 | 42 | # Images to use per minibatch 43 | __C.TRAIN.IMS_PER_BATCH = 2 44 | 45 | # Minibatch size (number of regions of interest [ROIs]) 46 | __C.TRAIN.BATCH_SIZE = 128 47 | 48 | # Fraction of minibatch that is labeled foreground (i.e. class > 0) 49 | __C.TRAIN.FG_FRACTION = 0.25 50 | 51 | # Overlap threshold for a ROI to be considered foreground (if >= FG_THRESH) 52 | __C.TRAIN.FG_THRESH = 0.5 53 | 54 | # Overlap threshold for a ROI to be considered background (class = 0 if 55 | # overlap in [LO, HI)) 56 | __C.TRAIN.BG_THRESH_HI = 0.5 57 | __C.TRAIN.BG_THRESH_LO = 0.1 58 | 59 | # Use horizontally-flipped images during training? 
60 | __C.TRAIN.USE_FLIPPED = True 61 | 62 | # Train bounding-box regressors 63 | __C.TRAIN.BBOX_REG = True 64 | 65 | # Overlap required between a ROI and ground-truth box in order for that ROI to 66 | # be used as a bounding-box regression training example 67 | __C.TRAIN.BBOX_THRESH = 0.5 68 | 69 | # Iterations between snapshots 70 | __C.TRAIN.SNAPSHOT_ITERS = 10000 71 | 72 | # solver.prototxt specifies the snapshot path prefix; this adds an optional 73 | # infix to yield the path: <prefix>[_<infix>]_iters_XYZ.caffemodel 74 | __C.TRAIN.SNAPSHOT_INFIX = '' 75 | 76 | # Use a prefetch thread in roi_data_layer.layer 77 | # So far I haven't found this useful; likely more engineering work is required 78 | __C.TRAIN.USE_PREFETCH = False 79 | 80 | # Normalize the targets (subtract empirical mean, divide by empirical stddev) 81 | __C.TRAIN.BBOX_NORMALIZE_TARGETS = True 82 | # Deprecated (inside weights) 83 | __C.TRAIN.BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0) 84 | # Normalize the targets using "precomputed" (or made up) means and stdevs 85 | # (BBOX_NORMALIZE_TARGETS must also be True) 86 | __C.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED = False 87 | __C.TRAIN.BBOX_NORMALIZE_MEANS = (0.0, 0.0, 0.0, 0.0) 88 | __C.TRAIN.BBOX_NORMALIZE_STDS = (0.1, 0.1, 0.2, 0.2) 89 | 90 | # Train using these proposals 91 | __C.TRAIN.PROPOSAL_METHOD = 'selective_search' 92 | 93 | # Make minibatches from images that have similar aspect ratios (i.e. both 94 | # tall and thin or both short and wide) in order to avoid wasting computation 95 | # on zero-padding. 96 | __C.TRAIN.ASPECT_GROUPING = True 97 | 98 | # Use RPN to detect objects 99 | __C.TRAIN.HAS_RPN = False 100 | # IoU >= thresh: positive example 101 | __C.TRAIN.RPN_POSITIVE_OVERLAP = 0.7 102 | # IoU < thresh: negative example 103 | __C.TRAIN.RPN_NEGATIVE_OVERLAP = 0.3 104 | # If an anchor satisfies both the positive and negative conditions, set it to negative 105 | __C.TRAIN.RPN_CLOBBER_POSITIVES = False 106 | # Max number of foreground examples 107 | __C.TRAIN.RPN_FG_FRACTION = 0.5 108 | # Total number of examples 109 | __C.TRAIN.RPN_BATCHSIZE = 256 110 | # NMS threshold used on RPN proposals 111 | __C.TRAIN.RPN_NMS_THRESH = 0.7 112 | # Number of top scoring boxes to keep before applying NMS to RPN proposals 113 | __C.TRAIN.RPN_PRE_NMS_TOP_N = 12000 114 | # Number of top scoring boxes to keep after applying NMS to RPN proposals 115 | __C.TRAIN.RPN_POST_NMS_TOP_N = 2000 116 | # Proposal height and width both need to be greater than RPN_MIN_SIZE (at orig image scale) 117 | __C.TRAIN.RPN_MIN_SIZE = 16 118 | # Deprecated (inside weights) 119 | __C.TRAIN.RPN_BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0) 120 | # Give the positive RPN examples weight of p * 1 / {num positives} 121 | # and give negatives a weight of (1 - p) 122 | # Set to -1.0 to use uniform example weighting 123 | __C.TRAIN.RPN_POSITIVE_WEIGHT = -1.0 124 | 125 | # Parameters for the "Online Hard Example Mining" (OHEM) algorithm 126 | __C.TRAIN.USE_OHEM = False 127 | # For diversity and de-duplication 128 | __C.TRAIN.OHEM_USE_NMS = True 129 | __C.TRAIN.OHEM_NMS_THRESH = 0.7 130 | 131 | # 132 | # Testing options 133 | # 134 | 135 | __C.TEST = edict() 136 | 137 | # Scales to use during testing (can list multiple scales) 138 | # Each scale is the pixel size of an image's shortest side 139 | __C.TEST.SCALES = (600,) 140 | 141 | # Max pixel size of the longest side of a scaled input image 142 | __C.TEST.MAX_SIZE = 1000 143 | 144 | # Overlap threshold used for non-maximum suppression (suppress boxes with 145 | # IoU >= this threshold) 146 |
__C.TEST.NMS = 0.3 147 | 148 | # Experimental: treat the (K+1) units in the cls_score layer as linear 149 | # predictors (trained, e.g., with one-vs-rest SVMs). 150 | __C.TEST.SVM = False 151 | 152 | # Test using bounding-box regressors 153 | __C.TEST.BBOX_REG = True 154 | 155 | # Propose boxes 156 | __C.TEST.HAS_RPN = False 157 | 158 | # Test using these proposals 159 | __C.TEST.PROPOSAL_METHOD = 'selective_search' 160 | 161 | # NMS threshold used on RPN proposals 162 | __C.TEST.RPN_NMS_THRESH = 0.7 163 | # Number of top scoring boxes to keep before applying NMS to RPN proposals 164 | __C.TEST.RPN_PRE_NMS_TOP_N = 6000 165 | # Number of top scoring boxes to keep after applying NMS to RPN proposals 166 | __C.TEST.RPN_POST_NMS_TOP_N = 300 167 | # Proposal height and width both need to be greater than RPN_MIN_SIZE (at orig image scale) 168 | __C.TEST.RPN_MIN_SIZE = 16 169 | 170 | 171 | # 172 | # MISC 173 | # 174 | 175 | # The mapping from image coordinates to feature map coordinates might cause 176 | # some boxes that are distinct in image space to become identical in feature 177 | # coordinates. If DEDUP_BOXES > 0, then DEDUP_BOXES is used as the scale factor 178 | # for identifying duplicate boxes. 179 | # 1/16 is correct for {Alex,Caffe}Net, VGG_CNN_M_1024, and VGG16 180 | __C.DEDUP_BOXES = 1./16. 181 | 182 | # Pixel mean values (BGR order) as a (1, 1, 3) array 183 | # We use the same pixel mean for all networks even though it's not exactly what 184 | # they were trained with 185 | __C.PIXEL_MEANS = np.array([[[102.9801, 115.9465, 122.7717]]]) 186 | 187 | # For reproducibility 188 | __C.RNG_SEED = 3 189 | 190 | # A small number that's used many times 191 | __C.EPS = 1e-14 192 | 193 | # Root directory of project 194 | __C.ROOT_DIR = osp.abspath(osp.join(osp.dirname(__file__), '..', '..')) 195 | 196 | # Data directory 197 | __C.DATA_DIR = osp.abspath(osp.join(__C.ROOT_DIR, 'data')) 198 | 199 | # Model directory 200 | __C.MODELS_DIR = osp.abspath(osp.join(__C.ROOT_DIR, 'models', 'pascal_voc')) 201 | 202 | # Name of (or path to) the MATLAB executable 203 | __C.MATLAB = 'matlab' 204 | 205 | # Place outputs under an experiments directory 206 | __C.EXP_DIR = 'default' 207 | 208 | # Use GPU implementation of non-maximum suppression 209 | __C.USE_GPU_NMS = True 210 | 211 | # Default GPU device id 212 | __C.GPU_ID = 0 213 | 214 | 215 | def get_output_dir(imdb, net=None): 216 | """Return the directory where experimental artifacts are placed. 217 | If the directory does not exist, it is created. 218 | 219 | A canonical path is built using the name from an imdb and a network 220 | (if not None). 221 | """ 222 | outdir = osp.abspath(osp.join(__C.ROOT_DIR, 'output', __C.EXP_DIR, imdb.name)) 223 | if net is not None: 224 | outdir = osp.join(outdir, net.name) 225 | if not os.path.exists(outdir): 226 | os.makedirs(outdir) 227 | return outdir 228 | 229 | def _merge_a_into_b(a, b): 230 | """Merge config dictionary a into config dictionary b, clobbering the 231 | options in b whenever they are also specified in a. 232 | """ 233 | if type(a) is not edict: 234 | return 235 | 236 | for k, v in a.iteritems(): 237 | # a must specify keys that are in b 238 | if not b.has_key(k): 239 | raise KeyError('{} is not a valid config key'.format(k)) 240 | 241 | # the types must match, too 242 | old_type = type(b[k]) 243 | if old_type is not type(v): 244 | if isinstance(b[k], np.ndarray): 245 | v = np.array(v, dtype=b[k].dtype) 246 | else: 247 | raise ValueError(('Type mismatch ({} vs.
{}) ' 248 | 'for config key: {}').format(type(b[k]), 249 | type(v), k)) 250 | 251 | # recursively merge dicts 252 | if type(v) is edict: 253 | try: 254 | _merge_a_into_b(a[k], b[k]) 255 | except: 256 | print('Error under config key: {}'.format(k)) 257 | raise 258 | else: 259 | b[k] = v 260 | 261 | def cfg_from_file(filename): 262 | """Load a config file and merge it into the default options.""" 263 | import yaml 264 | with open(filename, 'r') as f: 265 | yaml_cfg = edict(yaml.load(f)) 266 | 267 | _merge_a_into_b(yaml_cfg, __C) 268 | 269 | def cfg_from_list(cfg_list): 270 | """Set config keys via list (e.g., from command line).""" 271 | from ast import literal_eval 272 | assert len(cfg_list) % 2 == 0 273 | for k, v in zip(cfg_list[0::2], cfg_list[1::2]): 274 | key_list = k.split('.') 275 | d = __C 276 | for subkey in key_list[:-1]: 277 | assert d.has_key(subkey) 278 | d = d[subkey] 279 | subkey = key_list[-1] 280 | assert d.has_key(subkey) 281 | try: 282 | value = literal_eval(v) 283 | except: 284 | # handle the case when v is a string literal 285 | value = v 286 | assert type(value) == type(d[subkey]), \ 287 | 'type {} does not match original type {}'.format( 288 | type(value), type(d[subkey])) 289 | d[subkey] = value 290 | -------------------------------------------------------------------------------- /lib/datasets/imdb.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------- 2 | # Fast R-CNN 3 | # Copyright (c) 2015 Microsoft 4 | # Licensed under The MIT License [see LICENSE for details] 5 | # Written by Ross Girshick 6 | # -------------------------------------------------------- 7 | 8 | import os 9 | import os.path as osp 10 | import PIL 11 | from utils.cython_bbox import bbox_overlaps 12 | import numpy as np 13 | import scipy.sparse 14 | from fast_rcnn.config import cfg 15 | 16 | class imdb(object): 17 | """Image database.""" 18 | 19 | def __init__(self, name): 20 | self._name = name 21 | self._num_classes = 0 22 | self._classes = [] 23 | self._image_index = [] 24 | self._obj_proposer = 'selective_search' 25 | self._roidb = None 26 | self._roidb_handler = self.default_roidb 27 | # Use this dict for storing dataset specific config options 28 | self.config = {} 29 | 30 | @property 31 | def name(self): 32 | return self._name 33 | 34 | @property 35 | def num_classes(self): 36 | return len(self._classes) 37 | 38 | @property 39 | def classes(self): 40 | return self._classes 41 | 42 | @property 43 | def image_index(self): 44 | return self._image_index 45 | 46 | @property 47 | def roidb_handler(self): 48 | return self._roidb_handler 49 | 50 | @roidb_handler.setter 51 | def roidb_handler(self, val): 52 | self._roidb_handler = val 53 | 54 | def set_proposal_method(self, method): 55 | method = eval('self.' 
+ method + '_roidb') 56 | self.roidb_handler = method 57 | 58 | @property 59 | def roidb(self): 60 | # A roidb is a list of dictionaries, each with the following keys: 61 | # boxes 62 | # gt_overlaps 63 | # gt_classes 64 | # flipped 65 | if self._roidb is not None: 66 | return self._roidb 67 | self._roidb = self.roidb_handler() 68 | return self._roidb 69 | 70 | @property 71 | def cache_path(self): 72 | cache_path = osp.abspath(osp.join(cfg.DATA_DIR, 'cache')) 73 | if not os.path.exists(cache_path): 74 | os.makedirs(cache_path) 75 | return cache_path 76 | 77 | @property 78 | def num_images(self): 79 | return len(self.image_index) 80 | 81 | def image_path_at(self, i): 82 | raise NotImplementedError 83 | 84 | def default_roidb(self): 85 | raise NotImplementedError 86 | 87 | def evaluate_detections(self, all_boxes, output_dir=None): 88 | """ 89 | all_boxes is a list of length number-of-classes. 90 | Each list element is a list of length number-of-images. 91 | Each of those list elements is either an empty list [] 92 | or a numpy array of detections. 93 | 94 | all_boxes[class][image] = [] or np.array of shape #dets x 5 95 | """ 96 | raise NotImplementedError 97 | 98 | def _get_widths(self): 99 | return [PIL.Image.open(self.image_path_at(i)).size[0] 100 | for i in xrange(self.num_images)] 101 | 102 | def append_flipped_images(self): 103 | num_images = self.num_images 104 | widths = self._get_widths() 105 | for i in xrange(num_images): 106 | boxes = self.roidb[i]['boxes'].copy() 107 | oldx1 = boxes[:, 0].copy() 108 | oldx2 = boxes[:, 2].copy() 109 | boxes[:, 0] = widths[i] - oldx2 - 1 110 | boxes[:, 2] = widths[i] - oldx1 - 1 111 | assert (boxes[:, 2] >= boxes[:, 0]).all() 112 | entry = {'boxes' : boxes, 113 | 'gt_overlaps' : self.roidb[i]['gt_overlaps'], 114 | 'gt_classes' : self.roidb[i]['gt_classes'], 115 | 'flipped' : True} 116 | self.roidb.append(entry) 117 | self._image_index = self._image_index * 2 118 | 119 | def evaluate_recall(self, candidate_boxes=None, thresholds=None, 120 | area='all', limit=None): 121 | """Evaluate detection proposal recall metrics.
122 | 123 | Returns: 124 | results: dictionary of results with keys 125 | 'ar': average recall 126 | 'recalls': vector of recalls at each IoU overlap threshold 127 | 'thresholds': vector of IoU overlap thresholds 128 | 'gt_overlaps': vector of all ground-truth overlaps 129 | """ 130 | # Record max overlap value for each gt box 131 | # Return vector of overlap values 132 | areas = { 'all': 0, 'small': 1, 'medium': 2, 'large': 3, 133 | '96-128': 4, '128-256': 5, '256-512': 6, '512-inf': 7} 134 | area_ranges = [ [0**2, 1e5**2], # all 135 | [0**2, 32**2], # small 136 | [32**2, 96**2], # medium 137 | [96**2, 1e5**2], # large 138 | [96**2, 128**2], # 96-128 139 | [128**2, 256**2], # 128-256 140 | [256**2, 512**2], # 256-512 141 | [512**2, 1e5**2], # 512-inf 142 | ] 143 | assert areas.has_key(area), 'unknown area range: {}'.format(area) 144 | area_range = area_ranges[areas[area]] 145 | gt_overlaps = np.zeros(0) 146 | num_pos = 0 147 | for i in xrange(self.num_images): 148 | # Checking for max_overlaps == 1 avoids including crowd annotations 149 | # (...pretty hacky :/) 150 | max_gt_overlaps = self.roidb[i]['gt_overlaps'].toarray().max(axis=1) 151 | gt_inds = np.where((self.roidb[i]['gt_classes'] > 0) & 152 | (max_gt_overlaps == 1))[0] 153 | gt_boxes = self.roidb[i]['boxes'][gt_inds, :] 154 | gt_areas = self.roidb[i]['seg_areas'][gt_inds] 155 | valid_gt_inds = np.where((gt_areas >= area_range[0]) & 156 | (gt_areas <= area_range[1]))[0] 157 | gt_boxes = gt_boxes[valid_gt_inds, :] 158 | num_pos += len(valid_gt_inds) 159 | 160 | if candidate_boxes is None: 161 | # If candidate_boxes is not supplied, the default is to use the 162 | # non-ground-truth boxes from this roidb 163 | non_gt_inds = np.where(self.roidb[i]['gt_classes'] == 0)[0] 164 | boxes = self.roidb[i]['boxes'][non_gt_inds, :] 165 | else: 166 | boxes = candidate_boxes[i] 167 | if boxes.shape[0] == 0: 168 | continue 169 | if limit is not None and boxes.shape[0] > limit: 170 | boxes = boxes[:limit, :] 171 | 172 | overlaps = bbox_overlaps(boxes.astype(np.float), 173 | gt_boxes.astype(np.float)) 174 | 175 | _gt_overlaps = np.zeros((gt_boxes.shape[0])) 176 | for j in xrange(gt_boxes.shape[0]): 177 | # find which proposal box maximally covers each gt box 178 | argmax_overlaps = overlaps.argmax(axis=0) 179 | # and get the iou amount of coverage for each gt box 180 | max_overlaps = overlaps.max(axis=0) 181 | # find which gt box is 'best' covered (i.e.
'best' = most iou) 182 | gt_ind = max_overlaps.argmax() 183 | gt_ovr = max_overlaps.max() 184 | assert(gt_ovr >= 0) 185 | # find the proposal box that covers the best covered gt box 186 | box_ind = argmax_overlaps[gt_ind] 187 | # record the iou coverage of this gt box 188 | _gt_overlaps[j] = overlaps[box_ind, gt_ind] 189 | assert(_gt_overlaps[j] == gt_ovr) 190 | # mark the proposal box and the gt box as used 191 | overlaps[box_ind, :] = -1 192 | overlaps[:, gt_ind] = -1 193 | # append recorded iou coverage level 194 | gt_overlaps = np.hstack((gt_overlaps, _gt_overlaps)) 195 | 196 | gt_overlaps = np.sort(gt_overlaps) 197 | if thresholds is None: 198 | step = 0.05 199 | thresholds = np.arange(0.5, 0.95 + 1e-5, step) 200 | recalls = np.zeros_like(thresholds) 201 | # compute recall for each iou threshold 202 | for i, t in enumerate(thresholds): 203 | recalls[i] = (gt_overlaps >= t).sum() / float(num_pos) 204 | # ar = 2 * np.trapz(recalls, thresholds) 205 | ar = recalls.mean() 206 | return {'ar': ar, 'recalls': recalls, 'thresholds': thresholds, 207 | 'gt_overlaps': gt_overlaps} 208 | 209 | def create_roidb_from_box_list(self, box_list, gt_roidb): 210 | assert len(box_list) == self.num_images, \ 211 | 'Number of boxes must match number of ground-truth images' 212 | roidb = [] 213 | for i in xrange(self.num_images): 214 | boxes = box_list[i] 215 | num_boxes = boxes.shape[0] 216 | overlaps = np.zeros((num_boxes, self.num_classes), dtype=np.float32) 217 | 218 | if gt_roidb is not None and gt_roidb[i]['boxes'].size > 0: 219 | gt_boxes = gt_roidb[i]['boxes'] 220 | gt_classes = gt_roidb[i]['gt_classes'] 221 | gt_overlaps = bbox_overlaps(boxes.astype(np.float), 222 | gt_boxes.astype(np.float)) 223 | argmaxes = gt_overlaps.argmax(axis=1) 224 | maxes = gt_overlaps.max(axis=1) 225 | I = np.where(maxes > 0)[0] 226 | overlaps[I, gt_classes[argmaxes[I]]] = maxes[I] 227 | 228 | overlaps = scipy.sparse.csr_matrix(overlaps) 229 | roidb.append({ 230 | 'boxes' : boxes, 231 | 'gt_classes' : np.zeros((num_boxes,), dtype=np.int32), 232 | 'gt_overlaps' : overlaps, 233 | 'flipped' : False, 234 | 'seg_areas' : np.zeros((num_boxes,), dtype=np.float32), 235 | }) 236 | return roidb 237 | 238 | @staticmethod 239 | def merge_roidbs(a, b): 240 | assert len(a) == len(b) 241 | for i in xrange(len(a)): 242 | a[i]['boxes'] = np.vstack((a[i]['boxes'], b[i]['boxes'])) 243 | a[i]['gt_classes'] = np.hstack((a[i]['gt_classes'], 244 | b[i]['gt_classes'])) 245 | a[i]['gt_overlaps'] = scipy.sparse.vstack([a[i]['gt_overlaps'], 246 | b[i]['gt_overlaps']]) 247 | a[i]['seg_areas'] = np.hstack((a[i]['seg_areas'], 248 | b[i]['seg_areas'])) 249 | return a 250 | 251 | def competition_mode(self, on): 252 | """Turn competition mode on or off.""" 253 | pass 254 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Training Region-based Object Detectors with Online Hard Example Mining 2 | By Abhinav Shrivastava, Abhinav Gupta, Ross Girshick 3 | 4 | ### Introduction 5 | Online Hard Example Mining (OHEM) is an online bootstrapping algorithm for training region-based ConvNet object detectors like [Fast R-CNN](https://github.com/rbgirshick/fast-rcnn). 
OHEM 6 | - works nicely in the Stochastic Gradient Descent (SGD) paradigm, 7 | - simplifies training by removing some heuristics and hyperparameters, 8 | - leads to better convergence (lower training set loss), 9 | - consistently gives significantly higher mAP on PASCAL VOC and MS COCO. 10 | 11 | OHEM was initially presented at CVPR 2016 as an oral presentation. For more details, see the [arXiv tech report](http://arxiv.org/abs/1604.03540). A toy Python sketch of the hard-example selection step appears in the appendix at the end of this README. 12 | 13 | ### License 14 | 15 | This code is released under the MIT License (refer to the LICENSE file for details). 16 | 17 | ### Citing 18 | 19 | If you find this useful in your research, please consider citing: 20 | 21 | @inproceedings{shrivastavaCVPR16ohem, 22 | Author = {Abhinav Shrivastava and Abhinav Gupta and Ross Girshick}, 23 | Title = {Training Region-based Object Detectors with Online Hard Example Mining}, 24 | Booktitle = {Conference on Computer Vision and Pattern Recognition ({CVPR})}, 25 | Year = {2016} 26 | } 27 | 28 | ### Disclaimer 29 | 30 | This implementation is built on a *fork* of Faster R-CNN Python code ([here](https://github.com/rbgirshick/py-faster-rcnn)), which in turn builds on Fast R-CNN ([here](https://github.com/rbgirshick/fast-rcnn)). Please cite the appropriate papers depending on which part of the code and/or models you are using. 31 | 32 | ### Results 33 | 34 | | | training data | test data | mAP (paper) | mAP (this repo) | 35 | |:--- | :--- | :--- | :--- | :--- | 36 | |Fast R-CNN (FRCN) | VOC 07 trainval | VOC 07 test | 66.9 | 67.6 | 37 | |FRCN with OHEM | VOC 07 trainval | VOC 07 test | 69.9 | 71.5 | 38 | |FRCN, +M, +B | VOC 07 trainval | VOC 07 test | 72.4 | | 39 | |FRCN with OHEM, +M, +B | VOC 07 trainval | VOC 07 test | 75.1 | | 40 | |FRCN | VOC 07 trainval + 12 trainval | VOC 07 test | 70.0 | | 41 | |FRCN with OHEM | VOC 07 trainval + 12 trainval | VOC 07 test | 74.6 | 75.5 | 42 | |FRCN with OHEM, +M, +B | VOC 07 trainval + 12 trainval | VOC 07 test | 78.9 | | 43 | |FRCN | VOC 12 trainval | VOC 12 test | 65.7 | | 44 | |FRCN with OHEM | VOC 12 trainval | VOC 12 test | 69.8 | | 45 | |FRCN with OHEM, +M, +B | VOC 12 trainval | VOC 12 test | 72.9 | | 46 | |FRCN | VOC 07 trainval&test + 12 trainval | VOC 12 test | 68.4 | | 47 | |FRCN with OHEM | VOC 07 trainval&test + 12 trainval | VOC 12 test | 71.9 | | 48 | |FRCN with OHEM, +M, +B | VOC 07 trainval&test + 12 trainval | VOC 12 test | 76.3 | | 49 | |FRCN with OHEM, +M, +B | *above* + COCO 14 trainval | VOC 12 test | 80.1 | | 50 | 51 | **Note**: All methods above use the VGG16 network. `mAP (paper)` is the mAP reported in the paper. `mAP (this repo)` is the mAP reproduced by this codebase. 52 | 53 | **Legend**: `+M`: using multi-scale for training and testing, `+B`: multi-stage bounding box regression. See the paper for details. 54 | 55 | ### Released 56 | - [x] Initial OHEM release 57 | 58 | ### Sometime in the future 59 | - [ ] Support for multi-scale training and testing 60 | - [ ] Support for multi-stage bounding box regression 61 | - [ ] Scripts/models for results in [this table](#results) 62 | - [ ] Support for Faster R-CNN (see [below](#faq-regarding-faster-r-cnn-support)) 63 | 64 | ### Contents 65 | 1. [Requirements: software](#requirements-software) 66 | 2. [Requirements: hardware](#requirements-hardware) 67 | 3. [Basic installation](#installation-similar-to-faster-r-cnn) 68 | 4. [Demo](#demo) 69 | 5. [Beyond the demo: training and testing](#installation-for-training-and-testing-models) 70 | 6. [Usage](#usage) 71 | 7.
[FAQ regarding Faster R-CNN support](#faq-regarding-faster-r-cnn-support) 72 | 73 | ### Requirements: software 74 | 75 | 1. Requirements for `Caffe` and `pycaffe` (see: [Caffe installation instructions](http://caffe.berkeleyvision.org/installation.html)) 76 | 77 | **Note:** Caffe *must* be built with support for Python layers! 78 | 79 | ```make 80 | # In your Makefile.config, make sure to have this line uncommented 81 | WITH_PYTHON_LAYER := 1 82 | # Unrelatedly, it's also recommended that you use cuDNN 83 | USE_CUDNN := 1 84 | ``` 85 | 86 | You can download Ross's [Makefile.config](http://www.cs.berkeley.edu/~rbg/fast-rcnn-data/Makefile.config) for reference. 87 | 2. Python packages you might not have: `cython`, `python-opencv`, `easydict`, `yaml` 88 | 3. [Optional] MATLAB is required for **official** PASCAL VOC evaluation only. The code now includes unofficial Python evaluation code. 89 | 90 | ### Requirements: hardware 91 | 92 | 1. For training smaller networks (VGG_CNN_M_1024) a good GPU (e.g., Titan, K20, K40, ...) with at least 4GB of memory suffices 93 | 2. For training VGG16, you'll need a K40 or Titan X (or better). 94 | 95 | ### Installation (similar to Fast(er) R-CNN) 96 | 97 | 1. Clone the OHEM repository 98 | ```Shell 99 | # Make sure to clone with --recursive 100 | git clone --recursive https://github.com/abhi2610/ohem.git 101 | ``` 102 | 103 | 2. We'll call the directory that you cloned OHEM into `OHEM_ROOT` 104 | 105 | *Ignore notes 1 and 2 if you followed step 1 above.* 106 | 107 | **Note 1:** If you didn't clone OHEM with the `--recursive` flag, then you'll need to manually clone the `caffe-fast-rcnn` submodule: 108 | ```Shell 109 | git submodule update --init --recursive 110 | ``` 111 | **Note 2:** The `caffe-fast-rcnn` submodule needs to be on the `faster-rcnn` branch (or equivalent detached state). This will happen automatically *if you followed step 1 instructions*. 112 | 113 | 3. Build the Cython modules 114 | ```Shell 115 | cd $OHEM_ROOT/lib 116 | make 117 | ``` 118 | 119 | 4. Build Caffe and pycaffe 120 | ```Shell 121 | cd $OHEM_ROOT/caffe-fast-rcnn 122 | # Now follow the Caffe installation instructions here: 123 | # http://caffe.berkeleyvision.org/installation.html 124 | 125 | # If you're experienced with Caffe and have all of the requirements installed 126 | # and your Makefile.config in place, then simply do: 127 | make -j8 && make pycaffe 128 | ``` 129 | 130 | 5. Download pre-computed Fast R-CNN detectors trained with OHEM using the VGG16 and VGG_CNN_M_1024 networks. 131 | ```Shell 132 | cd $OHEM_ROOT 133 | ./data/scripts/fetch_fast_rcnn_ohem_models.sh 134 | ``` 135 | This will populate the `$OHEM_ROOT/data` folder with a `fast_rcnn_ohem_models` folder which contains VGG16 and VGG_CNN_M_1024 models (Fast R-CNN detectors trained with OHEM). 136 | The format will be `fast_rcnn_ohem_models/TRAINING_SET/MODEL_FILE`. 137 | 138 | *These models were re-trained using this codebase and achieve slightly better performance (see [this table](#results)). In particular, on the standard split, the VGG_CNN_M_1024 model gets 62.8 mAP (compared to 62.0 mAP reported in the paper) and the VGG16 model gets 71.5 mAP (compared to 69.9 mAP). All models from the paper will be released soon.* 139 | 140 | ### Installation for training and testing models 141 | 1.
Download the training, validation, test data and VOCdevkit 142 | 143 | ```Shell 144 | wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar 145 | wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar 146 | wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar 147 | ``` 148 | 149 | 2. Extract all of these tars into one directory named `VOCdevkit` 150 | 151 | ```Shell 152 | tar xvf VOCtrainval_06-Nov-2007.tar 153 | tar xvf VOCtest_06-Nov-2007.tar 154 | tar xvf VOCdevkit_08-Jun-2007.tar 155 | ``` 156 | 157 | 3. It should have this basic structure 158 | 159 | ```Shell 160 | $VOCdevkit/ # development kit 161 | $VOCdevkit/VOCcode/ # VOC utility code 162 | $VOCdevkit/VOC2007 # image sets, annotations, etc. 163 | # ... and several other directories ... 164 | ``` 165 | 166 | 4. Create symlinks for the PASCAL VOC dataset 167 | 168 | ```Shell 169 | cd $OHEM_ROOT/data 170 | ln -s $VOCdevkit VOCdevkit2007 171 | ``` 172 | Using symlinks is a good idea because you will likely want to share the same PASCAL dataset installation between multiple projects. 173 | 5. [Optional] follow similar steps to get PASCAL VOC 2010 and 2012 174 | 6. Follow the next sections to download pre-trained ImageNet models 175 | 176 | *COCO instructions and models will be released soon.* 177 | 178 | ### Download pre-trained ImageNet models 179 | 180 | Pre-trained ImageNet models can be downloaded for the two networks described in the paper: VGG_CNN_M_1024 and VGG16. 181 | 182 | ```Shell 183 | cd $OHEM_ROOT 184 | ./data/scripts/fetch_imagenet_models.sh 185 | ``` 186 | Models come from the [Caffe Model Zoo](https://github.com/BVLC/caffe/wiki/Model-Zoo), but are provided here for your convenience. 187 | 188 | ### Usage 189 | 190 | To train a Fast R-CNN detector using the **OHEM** algorithm on voc_2007_trainval, use `experiments/scripts/fast_rcnn_ohem.sh`. See the `experiments/scripts/` directory for other scripts. Output is written underneath `$OHEM_ROOT/output`. 191 | 192 | ```Shell 193 | cd $OHEM_ROOT 194 | ./experiments/scripts/fast_rcnn_ohem.sh [GPU_ID] [NET] [--set ...] 195 | # GPU_ID is the GPU you want to train on 196 | # NET in {VGG16, VGG_CNN_M_1024} is the network arch to use 197 | # --set ... allows you to specify fast_rcnn.config options, e.g. 198 | # --set EXP_DIR seed_rng1701 RNG_SEED 1701 199 | ``` 200 | 201 | Artifacts generated by the scripts in `tools` are written under this `output` directory. (A Python sketch of applying these config overrides programmatically appears in the appendix at the end of this README.) 202 | 203 | Trained Fast R-CNN networks with OHEM are saved under: 204 | 205 | ``` 206 | output/<experiment directory>/<dataset name>/<network snapshot name> 207 | ``` 208 | 209 | Test outputs are saved under: 210 | 211 | ``` 212 | output/<experiment directory>/<dataset name>/<network snapshot name>/<test set name> 213 | ``` 214 | 215 | *The VGG_CNN_M_1024 model should get ~62.8 mAP and the VGG16 model should get ~71.5 mAP. For reference, you can download my logs from [here](http://graphics.cs.cmu.edu/projects/ohem/data/logs.tgz).* 216 | 217 | ### FAQ regarding Faster R-CNN support 218 | 219 | I have received a lot of queries regarding using OHEM with Faster R-CNN. I have not spent too much time combining OHEM with Faster R-CNN yet. Some researchers have informed me that OHEM works well in the 'alternating optimization' setup, but not so much with the 'end-to-end learning' setup. I hope to release support for Faster R-CNN in the coming months. If you would like an update when I release it, send me an email. 220 | 221 | Also, the authors of [R-FCN](https://github.com/daijifeng001/R-FCN) successfully used OHEM with R-FCN and Faster R-CNN; you might find their codebase helpful.
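
### Appendix: illustrative Python sketches

The snippets below are editorial sketches, not part of the codebase. They only call functions that exist in this repo; the exact paths, dataset names, and values shown are assumptions chosen for illustration.

First, the overrides that `--cfg` and `--set` apply (see `lib/fast_rcnn/config.py`) can also be applied from Python. A minimal sketch, assuming it is run the way the `tools/` scripts are, so that `_init_paths` can put `lib/` on `sys.path`:

```python
# Sketch: applying config overrides programmatically (assumes a tools/-style setup).
import _init_paths  # adds $OHEM_ROOT/lib to sys.path, as in tools/train_net.py
from fast_rcnn.config import cfg, cfg_from_file, cfg_from_list

# Same effect as --cfg experiments/cfgs/fast_rcnn_ohem.yml
cfg_from_file('experiments/cfgs/fast_rcnn_ohem.yml')

# Same effect as --set EXP_DIR seed_rng1701 RNG_SEED 1701
# (cfg_from_list takes a flat list of alternating keys and values)
cfg_from_list(['EXP_DIR', 'seed_rng1701', 'RNG_SEED', '1701'])

print(cfg.TRAIN.USE_OHEM)  # check that the OHEM options took effect
```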
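Second, the hard-example selection performed by the `hard_roi_mining` Python layer (`OHEMDataLayer` in `roi_data_layer.layer`, used in the OHEM train.prototxt files above). The toy function below only illustrates the core idea under simplifying assumptions (classification loss only, plain top-B selection, no NMS de-duplication); the name `select_hard_rois` and its signature are invented here and do not appear in the repo:

```python
import numpy as np

def select_hard_rois(cls_prob, labels, rois, batch_size=128):
    """Toy OHEM selection: rank all RoIs by current loss, keep the B hardest.

    The real OHEMDataLayer also folds in the bounding-box loss and can first
    run NMS on the RoIs (cfg.TRAIN.OHEM_USE_NMS / OHEM_NMS_THRESH) so that
    near-duplicate RoIs do not dominate the selected minibatch.
    """
    # Per-RoI classification loss: -log p(true class)
    p_true = cls_prob[np.arange(labels.size), labels]
    loss = -np.log(np.maximum(p_true, 1e-14))
    hard_inds = loss.argsort()[::-1][:batch_size]  # hardest first
    return rois[hard_inds], labels[hard_inds]
```

In the train.prototxt files, this corresponds to the read-only head scoring every RoI and `hard_roi_mining` forwarding only `rois_hard` (with the matching `labels_hard` and bbox targets) to the trainable head, which is why only the second `roi_pool5` back-propagates into `conv5`.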
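Finally, the proposal-recall metric in `lib/datasets/imdb.py` can be driven directly (this is essentially what `tools/eval_recall.py` does). A hypothetical driver, assuming the VOC 2007 setup from the installation steps above and cached selective search proposals:

```python
import _init_paths
from datasets.factory import get_imdb

imdb = get_imdb('voc_2007_test')              # needs the data/VOCdevkit2007 symlink
imdb.set_proposal_method('selective_search')  # use the selective search roidb
res = imdb.evaluate_recall(candidate_boxes=None, area='all', limit=1000)
print('average recall: {:.3f}'.format(res['ar']))
```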
222 | --------------------------------------------------------------------------------