## Object Detection

|Method|VOC2007|VOC2010|VOC2012|ILSVRC 2013|MSCOCO 2015|Speed|
|--- |--- |--- |--- |--- |--- |--- |
|OverFeat|-|-|-|24.3%|-|-|
|R-CNN (AlexNet)|58.5%|53.7%|53.3%|31.4%|-|-|
|R-CNN (VGG16)|66.0%|-|-|-|-|-|
|SPP_net(ZF-5)|54.2%(1-model), 60.9%(2-model)|-|-|31.84%(1-model), 35.11%(6-model)|-|-|
|DeepID-Net|64.1%|-|-|50.3%|-|-|
|NoC|73.3%|-|68.8%|-|-|-|
|Fast-RCNN (VGG16)|70.0%|68.8%|68.4%|-|19.7%(@[0.5-0.95]), 35.9%(@0.5)|-|
|MR-CNN|78.2%|-|73.9%|-|-|-|
|Faster-RCNN (VGG16)|78.8%|-|75.9%|-|21.9%(@[0.5-0.95]), 42.7%(@0.5)|198ms|
|Faster-RCNN (ResNet-101)|85.6%|-|83.8%|-|37.4%(@[0.5-0.95]), 59.0%(@0.5)|-|
|SSD300 (VGG16)|72.1%|-|-|-|-|58 fps|
|SSD500 (VGG16)|75.1%|-|-|-|-|23 fps|
|ION|79.2%|-|76.4%|-|-|-|
|CRAFT|75.7%|-|71.3%|48.5%|-|-|
|OHEM|78.9%|-|76.3%|-|25.5%(@[0.5-0.95]), 45.9%(@0.5)|-|
|R-FCN (ResNet-50)|77.4%|-|-|-|-|0.12sec(K40), 0.09sec(Titan X)|
|R-FCN (ResNet-101)|79.5%|-|-|-|-|0.17sec(K40), 0.12sec(Titan X)|
|R-FCN (ResNet-101), multi-scale training|83.6%|-|82.0%|-|31.5%(@[0.5-0.95]), 53.2%(@0.5)|-|
|PVANet 9.0|81.8%|-|82.5%|-|-|750ms(CPU), 46ms(Titan X)|


# Leaderboard

**Detection Results: VOC2012**

* intro: Competition “comp4” (train on additional data)
* homepage:
[http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=4](http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=4)

# Papers

**Deep Neural Networks for Object Detection**

* paper: [http://papers.nips.cc/paper/5207-deep-neural-networks-for-object-detection.pdf](http://papers.nips.cc/paper/5207-deep-neural-networks-for-object-detection.pdf)

**OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks**

* arxiv: [http://arxiv.org/abs/1312.6229](http://arxiv.org/abs/1312.6229)
* github: [https://github.com/sermanet/OverFeat](https://github.com/sermanet/OverFeat)
* code: [http://cilvr.nyu.edu/doku.php?id=software:overfeat:start](http://cilvr.nyu.edu/doku.php?id=software:overfeat:start)

## R-CNN

**Rich feature hierarchies for accurate object detection and semantic segmentation**

* intro: R-CNN
* arxiv: [http://arxiv.org/abs/1311.2524](http://arxiv.org/abs/1311.2524)
* supp: [http://people.eecs.berkeley.edu/~rbg/papers/r-cnn-cvpr-supp.pdf](http://people.eecs.berkeley.edu/~rbg/papers/r-cnn-cvpr-supp.pdf)
* slides: [http://www.image-net.org/challenges/LSVRC/2013/slides/r-cnn-ilsvrc2013-workshop.pdf](http://www.image-net.org/challenges/LSVRC/2013/slides/r-cnn-ilsvrc2013-workshop.pdf)
* slides: [http://www.cs.berkeley.edu/~rbg/slides/rcnn-cvpr14-slides.pdf](http://www.cs.berkeley.edu/~rbg/slides/rcnn-cvpr14-slides.pdf)
* github: [https://github.com/rbgirshick/rcnn](https://github.com/rbgirshick/rcnn)
* notes: [http://zhangliliang.com/2014/07/23/paper-note-rcnn/](http://zhangliliang.com/2014/07/23/paper-note-rcnn/)
* caffe-pr(“Make R-CNN the Caffe detection example”): [https://github.com/BVLC/caffe/pull/482](https://github.com/BVLC/caffe/pull/482)

## MultiBox

**Scalable Object Detection using Deep Neural Networks**

* intro: first MultiBox.
Trains a CNN to predict regions of interest.
* arxiv: [http://arxiv.org/abs/1312.2249](http://arxiv.org/abs/1312.2249)
* github: [https://github.com/google/multibox](https://github.com/google/multibox)
* blog: [https://research.googleblog.com/2014/12/high-quality-object-detection-at-scale.html](https://research.googleblog.com/2014/12/high-quality-object-detection-at-scale.html)

**Scalable, High-Quality Object Detection**

* intro: second MultiBox
* arxiv: [http://arxiv.org/abs/1412.1441](http://arxiv.org/abs/1412.1441)
* github: [https://github.com/google/multibox](https://github.com/google/multibox)

## SPP-Net

**Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition**

* intro: ECCV 2014 / TPAMI 2015
* arxiv: [http://arxiv.org/abs/1406.4729](http://arxiv.org/abs/1406.4729)
* github: [https://github.com/ShaoqingRen/SPP_net](https://github.com/ShaoqingRen/SPP_net)
* notes: [http://zhangliliang.com/2014/09/13/paper-note-sppnet/](http://zhangliliang.com/2014/09/13/paper-note-sppnet/)

## DeepID-Net

**DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection**

* intro: PAMI 2016
* intro: an extension of R-CNN.
box pre-training, cascade on region proposals, deformation layers and context representations
* project page: [http://www.ee.cuhk.edu.hk/~wlouyang/projects/imagenetDeepId/index.html](http://www.ee.cuhk.edu.hk/~wlouyang/projects/imagenetDeepId/index.html)
* arxiv: [http://arxiv.org/abs/1412.5661](http://arxiv.org/abs/1412.5661)

**Object Detectors Emerge in Deep Scene CNNs**

* arxiv: [http://arxiv.org/abs/1412.6856](http://arxiv.org/abs/1412.6856)
* paper: [https://www.robots.ox.ac.uk/~vgg/rg/papers/zhou_iclr15.pdf](https://www.robots.ox.ac.uk/~vgg/rg/papers/zhou_iclr15.pdf)
* paper: [https://people.csail.mit.edu/khosla/papers/iclr2015_zhou.pdf](https://people.csail.mit.edu/khosla/papers/iclr2015_zhou.pdf)
* slides: [http://places.csail.mit.edu/slide_iclr2015.pdf](http://places.csail.mit.edu/slide_iclr2015.pdf)

**segDeepM: Exploiting Segmentation and Context in Deep Neural Networks for Object Detection**

* intro: CVPR 2015
* project(code+data): [https://www.cs.toronto.edu/~yukun/segdeepm.html](https://www.cs.toronto.edu/~yukun/segdeepm.html)
* arxiv: [https://arxiv.org/abs/1502.04275](https://arxiv.org/abs/1502.04275)
* github: [https://github.com/YknZhu/segDeepM](https://github.com/YknZhu/segDeepM)

## NoC

**Object Detection Networks on Convolutional Feature Maps**

* intro: TPAMI 2015
* arxiv: [http://arxiv.org/abs/1504.06066](http://arxiv.org/abs/1504.06066)

**Improving Object Detection with Deep Convolutional Networks via Bayesian Optimization and Structured Prediction**

* arxiv: [http://arxiv.org/abs/1504.03293](http://arxiv.org/abs/1504.03293)
* slides: [http://www.ytzhang.net/files/publications/2015-cvpr-det-slides.pdf](http://www.ytzhang.net/files/publications/2015-cvpr-det-slides.pdf)
* github: [https://github.com/YutingZhang/fgs-obj](https://github.com/YutingZhang/fgs-obj)

## Fast R-CNN

**Fast R-CNN**

* arxiv: [http://arxiv.org/abs/1504.08083](http://arxiv.org/abs/1504.08083)
* slides: [http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf](http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf)
* github: [https://github.com/rbgirshick/fast-rcnn](https://github.com/rbgirshick/fast-rcnn)
* webcam demo: [https://github.com/rbgirshick/fast-rcnn/pull/29](https://github.com/rbgirshick/fast-rcnn/pull/29)
* notes: [http://zhangliliang.com/2015/05/17/paper-note-fast-rcnn/](http://zhangliliang.com/2015/05/17/paper-note-fast-rcnn/)
* notes: [http://blog.csdn.net/linj_m/article/details/48930179](http://blog.csdn.net/linj_m/article/details/48930179)
* github(“Fast R-CNN in MXNet”): [https://github.com/precedenceguo/mx-rcnn](https://github.com/precedenceguo/mx-rcnn)
* github: [https://github.com/mahyarnajibi/fast-rcnn-torch](https://github.com/mahyarnajibi/fast-rcnn-torch)
* github: [https://github.com/apple2373/chainer-simple-fast-rnn](https://github.com/apple2373/chainer-simple-fast-rnn)
* github(Tensorflow): [https://github.com/zplizzi/tensorflow-fast-rcnn](https://github.com/zplizzi/tensorflow-fast-rcnn)

## DeepBox

**DeepBox: Learning Objectness with Convolutional Networks**

* arxiv: [http://arxiv.org/abs/1505.02146](http://arxiv.org/abs/1505.02146)
* github: [https://github.com/weichengkuo/DeepBox](https://github.com/weichengkuo/DeepBox)

## MR-CNN

**Object detection via a multi-region & semantic segmentation-aware CNN model**

* intro: ICCV 2015.
MR-CNN
* arxiv: [http://arxiv.org/abs/1505.01749](http://arxiv.org/abs/1505.01749)
* github: [https://github.com/gidariss/mrcnn-object-detection](https://github.com/gidariss/mrcnn-object-detection)
* notes: [http://zhangliliang.com/2015/05/17/paper-note-ms-cnn/](http://zhangliliang.com/2015/05/17/paper-note-ms-cnn/)
* notes: [http://blog.cvmarcher.com/posts/2015/05/17/multi-region-semantic-segmentation-aware-cnn/](http://blog.cvmarcher.com/posts/2015/05/17/multi-region-semantic-segmentation-aware-cnn/)
* my notes: Who can tell me why there are a bunch of duplicated sentences in section 7.2 “Detection error analysis”? :-D

## Faster R-CNN

**Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks**

* intro: NIPS 2015
* arxiv: [http://arxiv.org/abs/1506.01497](http://arxiv.org/abs/1506.01497)
* gitxiv: [http://www.gitxiv.com/posts/8pfpcvefDYn2gSgXk/faster-r-cnn-towards-real-time-object-detection-with-region](http://www.gitxiv.com/posts/8pfpcvefDYn2gSgXk/faster-r-cnn-towards-real-time-object-detection-with-region)
* slides: [http://web.cs.hacettepe.edu.tr/~aykut/classes/spring2016/bil722/slides/w05-FasterR-CNN.pdf](http://web.cs.hacettepe.edu.tr/~aykut/classes/spring2016/bil722/slides/w05-FasterR-CNN.pdf)
* github: [https://github.com/ShaoqingRen/faster_rcnn](https://github.com/ShaoqingRen/faster_rcnn)
* github: [https://github.com/rbgirshick/py-faster-rcnn](https://github.com/rbgirshick/py-faster-rcnn)
* github: [https://github.com/mitmul/chainer-faster-rcnn](https://github.com/mitmul/chainer-faster-rcnn)
* github(Torch): [https://github.com/andreaskoepf/faster-rcnn.torch](https://github.com/andreaskoepf/faster-rcnn.torch)
* github(Torch): [https://github.com/ruotianluo/Faster-RCNN-Densecap-torch](https://github.com/ruotianluo/Faster-RCNN-Densecap-torch)
* github(Tensorflow):
[https://github.com/smallcorgi/Faster-RCNN_TF](https://github.com/smallcorgi/Faster-RCNN_TF)
* github(Tensorflow): [https://github.com/CharlesShang/TFFRCNN](https://github.com/CharlesShang/TFFRCNN)

**Faster R-CNN in MXNet with distributed implementation and data parallelization**

* github: [https://github.com/dmlc/mxnet/tree/master/example/rcnn](https://github.com/dmlc/mxnet/tree/master/example/rcnn)

**Contextual Priming and Feedback for Faster R-CNN**

* intro: ECCV 2016. Carnegie Mellon University
* paper: [http://abhinavsh.info/context_priming_feedback.pdf](http://abhinavsh.info/context_priming_feedback.pdf)
* poster: [http://www.eccv2016.org/files/posters/P-1A-20.pdf](http://www.eccv2016.org/files/posters/P-1A-20.pdf)

**An Implementation of Faster RCNN with Study for Region Sampling**

* intro: Technical Report, 3 pages. CMU
* arxiv: [https://arxiv.org/abs/1702.02138](https://arxiv.org/abs/1702.02138)
* github: [https://github.com/endernewton/tf-faster-rcnn](https://github.com/endernewton/tf-faster-rcnn)

## YOLO

**You Only Look Once: Unified, Real-Time Object Detection**

![](https://camo.githubusercontent.com/e69d4118b20a42de4e23b9549f9a6ec6dbbb0814/687474703a2f2f706a7265646469652e636f6d2f6d656469612f66696c65732f6461726b6e65742d626c61636b2d736d616c6c2e706e67)

* arxiv: [http://arxiv.org/abs/1506.02640](http://arxiv.org/abs/1506.02640)
* code: [http://pjreddie.com/darknet/yolo/](http://pjreddie.com/darknet/yolo/)
* github: [https://github.com/pjreddie/darknet](https://github.com/pjreddie/darknet)
* reddit: [https://www.reddit.com/r/MachineLearning/comments/3a3m0o/realtime_object_detection_with_yolo/](https://www.reddit.com/r/MachineLearning/comments/3a3m0o/realtime_object_detection_with_yolo/)
* github:
[https://github.com/gliese581gg/YOLO_tensorflow](https://github.com/gliese581gg/YOLO_tensorflow)
* github: [https://github.com/xingwangsfu/caffe-yolo](https://github.com/xingwangsfu/caffe-yolo)
* github: [https://github.com/frankzhangrui/Darknet-Yolo](https://github.com/frankzhangrui/Darknet-Yolo)
* github: [https://github.com/BriSkyHekun/py-darknet-yolo](https://github.com/BriSkyHekun/py-darknet-yolo)
* github: [https://github.com/tommy-qichang/yolo.torch](https://github.com/tommy-qichang/yolo.torch)
* github: [https://github.com/frischzenger/yolo-windows](https://github.com/frischzenger/yolo-windows)
* github: [https://github.com/AlexeyAB/yolo-windows](https://github.com/AlexeyAB/yolo-windows)

**darkflow - translate darknet to tensorflow. Load trained weights, retrain/fine-tune them using tensorflow, export constant graph def to C++**

* blog: [https://thtrieu.github.io/notes/yolo-tensorflow-graph-buffer-cpp](https://thtrieu.github.io/notes/yolo-tensorflow-graph-buffer-cpp)
* github: [https://github.com/thtrieu/darkflow](https://github.com/thtrieu/darkflow)

**Start Training YOLO with Our Own Data**

![](http://guanghan.info/blog/en/wp-content/uploads/2015/12/images-40.jpg)

* intro: train with customized data and class numbers/labels. Linux / Windows version for darknet.
* blog: [http://guanghan.info/blog/en/my-works/train-yolo/](http://guanghan.info/blog/en/my-works/train-yolo/)
* github: [https://github.com/Guanghan/darknet](https://github.com/Guanghan/darknet)

**R-CNN minus R**

* arxiv: [http://arxiv.org/abs/1506.06981](http://arxiv.org/abs/1506.06981)

## AttentionNet

**AttentionNet: Aggregating Weak Directions for Accurate Object Detection**

* intro: ICCV 2015
* intro: state-of-the-art performance of 65% (AP) on PASCAL VOC 2007/2012 human detection task
* arxiv: [http://arxiv.org/abs/1506.07704](http://arxiv.org/abs/1506.07704)
* slides: [https://www.robots.ox.ac.uk/~vgg/rg/slides/AttentionNet.pdf](https://www.robots.ox.ac.uk/~vgg/rg/slides/AttentionNet.pdf)
* slides: [http://image-net.org/challenges/talks/lunit-kaist-slide.pdf](http://image-net.org/challenges/talks/lunit-kaist-slide.pdf)

## DenseBox

**DenseBox: Unifying Landmark Localization with End to End Object Detection**

* arxiv: [http://arxiv.org/abs/1509.04874](http://arxiv.org/abs/1509.04874)
* demo: [http://pan.baidu.com/s/1mgoWWsS](http://pan.baidu.com/s/1mgoWWsS)
* KITTI result: [http://www.cvlibs.net/datasets/kitti/eval_object.php](http://www.cvlibs.net/datasets/kitti/eval_object.php)

## SSD

**SSD: Single Shot MultiBox Detector**

![](https://camo.githubusercontent.com/ad9b147ed3a5f48ffb7c3540711c15aa04ce49c6/687474703a2f2f7777772e63732e756e632e6564752f7e776c69752f7061706572732f7373642e706e67)

* intro: ECCV 2016 Oral
* arxiv: [http://arxiv.org/abs/1512.02325](http://arxiv.org/abs/1512.02325)
* paper: [http://www.cs.unc.edu/~wliu/papers/ssd.pdf](http://www.cs.unc.edu/~wliu/papers/ssd.pdf)
* slides: [http://www.cs.unc.edu/%7Ewliu/papers/ssd_eccv2016_slide.pdf](http://www.cs.unc.edu/%7Ewliu/papers/ssd_eccv2016_slide.pdf)
* github:
[https://github.com/weiliu89/caffe/tree/ssd](https://github.com/weiliu89/caffe/tree/ssd)
* video: [http://weibo.com/p/2304447a2326da963254c963c97fb05dd3a973](http://weibo.com/p/2304447a2326da963254c963c97fb05dd3a973)
* github(MXNet): [https://github.com/zhreshold/mxnet-ssd](https://github.com/zhreshold/mxnet-ssd)
* github: [https://github.com/zhreshold/mxnet-ssd.cpp](https://github.com/zhreshold/mxnet-ssd.cpp)
* github(Keras): [https://github.com/rykov8/ssd_keras](https://github.com/rykov8/ssd_keras)

## Inside-Outside Net (ION)

**Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks**

* intro: “0.8s per image on a Titan X GPU (excluding proposal generation) without two-stage bounding-box regression and 1.15s per image with it”.
* arxiv: [http://arxiv.org/abs/1512.04143](http://arxiv.org/abs/1512.04143)
* slides: [http://www.seanbell.ca/tmp/ion-coco-talk-bell2015.pdf](http://www.seanbell.ca/tmp/ion-coco-talk-bell2015.pdf)
* coco-leaderboard: [http://mscoco.org/dataset/#detections-leaderboard](http://mscoco.org/dataset/#detections-leaderboard)

**Adaptive Object Detection Using Adjacency and Zoom Prediction**

* intro: CVPR 2016.
AZ-Net
* arxiv: [http://arxiv.org/abs/1512.07711](http://arxiv.org/abs/1512.07711)
* github: [https://github.com/luyongxi/az-net](https://github.com/luyongxi/az-net)
* youtube: [https://www.youtube.com/watch?v=YmFtuNwxaNM](https://www.youtube.com/watch?v=YmFtuNwxaNM)

## G-CNN

**G-CNN: an Iterative Grid Based Object Detector**

* arxiv: [http://arxiv.org/abs/1512.07729](http://arxiv.org/abs/1512.07729)

**Factors in Finetuning Deep Model for object detection**

**Factors in Finetuning Deep Model for Object Detection with Long-tail Distribution**

* intro: CVPR 2016. Ranked 3rd with provided data and 2nd with external data in ILSVRC 2015 object detection
* project page: [http://www.ee.cuhk.edu.hk/~wlouyang/projects/ImageNetFactors/CVPR16.html](http://www.ee.cuhk.edu.hk/~wlouyang/projects/ImageNetFactors/CVPR16.html)
* arxiv: [http://arxiv.org/abs/1601.05150](http://arxiv.org/abs/1601.05150)

**We don’t need no bounding-boxes: Training object class detectors using only human verification**

* arxiv: [http://arxiv.org/abs/1602.08405](http://arxiv.org/abs/1602.08405)

## HyperNet

**HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection**

* arxiv: [http://arxiv.org/abs/1604.00600](http://arxiv.org/abs/1604.00600)

## MultiPathNet

**A MultiPath Network for Object Detection**

* intro: BMVC 2016. Facebook AI Research (FAIR)
* arxiv: [http://arxiv.org/abs/1604.02135](http://arxiv.org/abs/1604.02135)
* github: [https://github.com/facebookresearch/multipathnet](https://github.com/facebookresearch/multipathnet)

## CRAFT

**CRAFT Objects from Images**

* intro: CVPR 2016. Cascade Region-proposal-network And FasT-rcnn.
an extension of Faster R-CNN
* project page: [http://byangderek.github.io/projects/craft.html](http://byangderek.github.io/projects/craft.html)
* arxiv: [https://arxiv.org/abs/1604.03239](https://arxiv.org/abs/1604.03239)
* paper: [http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Yang_CRAFT_Objects_From_CVPR_2016_paper.pdf](http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Yang_CRAFT_Objects_From_CVPR_2016_paper.pdf)
* github: [https://github.com/byangderek/CRAFT](https://github.com/byangderek/CRAFT)

## OHEM

**Training Region-based Object Detectors with Online Hard Example Mining**

* intro: CVPR 2016 Oral. Online hard example mining (OHEM)
* arxiv: [http://arxiv.org/abs/1604.03540](http://arxiv.org/abs/1604.03540)
* paper: [http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Shrivastava_Training_Region-Based_Object_CVPR_2016_paper.pdf](http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Shrivastava_Training_Region-Based_Object_CVPR_2016_paper.pdf)
* github(Official): [https://github.com/abhi2610/ohem](https://github.com/abhi2610/ohem)
* author page: [http://abhinav-shrivastava.info/](http://abhinav-shrivastava.info/)

**Track and Transfer: Watching Videos to Simulate Strong Human Supervision for Weakly-Supervised Object Detection**

* intro: CVPR 2016
* arxiv: [http://arxiv.org/abs/1604.05766](http://arxiv.org/abs/1604.05766)

**Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers**

* intro: scale-dependent pooling (SDP), cascaded rejection classifiers (CRC)
* paper: [http://www-personal.umich.edu/~wgchoi/SDP-CRC_camready.pdf](http://www-personal.umich.edu/~wgchoi/SDP-CRC_camready.pdf)

## R-FCN

**R-FCN: Object Detection via Region-based Fully Convolutional Networks**

* arxiv: [http://arxiv.org/abs/1605.06409](http://arxiv.org/abs/1605.06409)
* github: [https://github.com/daijifeng001/R-FCN](https://github.com/daijifeng001/R-FCN)
* github: [https://github.com/Orpine/py-R-FCN](https://github.com/Orpine/py-R-FCN)

**Weakly supervised object detection using pseudo-strong labels**

* arxiv: [http://arxiv.org/abs/1607.04731](http://arxiv.org/abs/1607.04731)

**Recycle deep features for better object detection**

* arxiv: [http://arxiv.org/abs/1607.05066](http://arxiv.org/abs/1607.05066)

## MS-CNN

**A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection**

* intro: ECCV 2016
* intro: 640×480: 15 fps, 960×720: 8 fps
* arxiv: [http://arxiv.org/abs/1607.07155](http://arxiv.org/abs/1607.07155)
* github: [https://github.com/zhaoweicai/mscnn](https://github.com/zhaoweicai/mscnn)
* poster: [http://www.eccv2016.org/files/posters/P-2B-38.pdf](http://www.eccv2016.org/files/posters/P-2B-38.pdf)

**Multi-stage Object Detection with Group Recursive Learning**

* intro: VOC2007: 78.6%, VOC2012: 74.9%
* arxiv: [http://arxiv.org/abs/1608.05159](http://arxiv.org/abs/1608.05159)

**Subcategory-aware Convolutional Neural Networks for Object Proposals and Detection**

* intro: WACV 2017.
SubCNN
* arxiv: [http://arxiv.org/abs/1604.04693](http://arxiv.org/abs/1604.04693)
* github: [https://github.com/yuxng/SubCNN](https://github.com/yuxng/SubCNN)

## PVANET

**PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection**

* intro: “less channels with more layers”, concatenated ReLU, Inception, and HyperNet, batch normalization, residual connections
* arxiv: [http://arxiv.org/abs/1608.08021](http://arxiv.org/abs/1608.08021)
* github: [https://github.com/sanghoon/pva-faster-rcnn](https://github.com/sanghoon/pva-faster-rcnn)
* leaderboard(PVANet 9.0): [http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=4](http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=4)

**PVANet: Lightweight Deep Neural Networks for Real-time Object Detection**

* intro: Presented at NIPS 2016 Workshop on Efficient Methods for Deep Neural Networks (EMDNN). Continuation of [arXiv:1608.08021](https://arxiv.org/abs/1608.08021)
* arxiv: [https://arxiv.org/abs/1611.08588](https://arxiv.org/abs/1611.08588)

## GBD-Net

**Gated Bi-directional CNN for Object Detection**

* intro: The Chinese University of Hong Kong & Sensetime Group Limited
* paper: [http://link.springer.com/chapter/10.1007/978-3-319-46478-7_22](http://link.springer.com/chapter/10.1007/978-3-319-46478-7_22)
* mirror: [https://pan.baidu.com/s/1dFohO7v](https://pan.baidu.com/s/1dFohO7v)

**Crafting GBD-Net for Object Detection**

* intro: winner of the ImageNet object detection challenge of 2016.
CUImage and CUVideo
* intro: gated bi-directional CNN (GBD-Net)
* arxiv: [https://arxiv.org/abs/1610.02579](https://arxiv.org/abs/1610.02579)
* github: [https://github.com/craftGBD/craftGBD](https://github.com/craftGBD/craftGBD)

## StuffNet

**StuffNet: Using ‘Stuff’ to Improve Object Detection**

* arxiv: [https://arxiv.org/abs/1610.05861](https://arxiv.org/abs/1610.05861)

**Generalized Haar Filter based Deep Networks for Real-Time Object Detection in Traffic Scene**

* arxiv: [https://arxiv.org/abs/1610.09609](https://arxiv.org/abs/1610.09609)

**Hierarchical Object Detection with Deep Reinforcement Learning**

* intro: Deep Reinforcement Learning Workshop (NIPS 2016)
* project page: [https://imatge-upc.github.io/detection-2016-nipsws/](https://imatge-upc.github.io/detection-2016-nipsws/)
* arxiv: [https://arxiv.org/abs/1611.03718](https://arxiv.org/abs/1611.03718)
* slides: [http://www.slideshare.net/xavigiro/hierarchical-object-detection-with-deep-reinforcement-learning](http://www.slideshare.net/xavigiro/hierarchical-object-detection-with-deep-reinforcement-learning)
* github: [https://github.com/imatge-upc/detection-2016-nipsws](https://github.com/imatge-upc/detection-2016-nipsws)
* blog: [http://jorditorres.org/nips/](http://jorditorres.org/nips/)

**Learning to detect and localize many objects from few examples**

* arxiv: [https://arxiv.org/abs/1611.05664](https://arxiv.org/abs/1611.05664)

**Speed/accuracy trade-offs for modern convolutional object detectors**

* intro: Google Research
* arxiv: [https://arxiv.org/abs/1611.10012](https://arxiv.org/abs/1611.10012)

**SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving**

* arxiv:
[https://arxiv.org/abs/1612.01051](https://arxiv.org/abs/1612.01051)
* github: [https://github.com/BichenWuUCB/squeezeDet](https://github.com/BichenWuUCB/squeezeDet)

## Feature Pyramid Network (FPN)

**Feature Pyramid Networks for Object Detection**

* intro: Facebook AI Research
* arxiv: [https://arxiv.org/abs/1612.03144](https://arxiv.org/abs/1612.03144)

**Action-Driven Object Detection with Top-Down Visual Attentions**

* arxiv: [https://arxiv.org/abs/1612.06704](https://arxiv.org/abs/1612.06704)

**Beyond Skip Connections: Top-Down Modulation for Object Detection**

* intro: CMU & UC Berkeley & Google Research
* arxiv: [https://arxiv.org/abs/1612.06851](https://arxiv.org/abs/1612.06851)

## YOLOv2

**YOLO9000: Better, Faster, Stronger**

* arxiv: [https://arxiv.org/abs/1612.08242](https://arxiv.org/abs/1612.08242)
* code: [http://pjreddie.com/yolo9000/](http://pjreddie.com/yolo9000/)
* github(Chainer): [https://github.com/leetenki/YOLOv2](https://github.com/leetenki/YOLOv2)

## DSSD

**DSSD : Deconvolutional Single Shot Detector**

* intro: UNC Chapel Hill & Amazon Inc
* arxiv: [https://arxiv.org/abs/1701.06659](https://arxiv.org/abs/1701.06659)

**Wide-Residual-Inception Networks for Real-time Object Detection**

* intro: Inha University
* arxiv: [https://arxiv.org/abs/1702.01243](https://arxiv.org/abs/1702.01243)

**Attentional Network for Visual Object Detection**

* intro: University of Maryland & Mitsubishi Electric Research Laboratories
* arxiv: [https://arxiv.org/abs/1702.01478](https://arxiv.org/abs/1702.01478)

# Detection From Video

**Learning Object Class Detectors from Weakly Annotated Video**

* intro: CVPR 2012
* paper:
[https://www.vision.ee.ethz.ch/publications/papers/proceedings/eth_biwi_00905.pdf](https://www.vision.ee.ethz.ch/publications/papers/proceedings/eth_biwi_00905.pdf)

**Analysing domain shift factors between videos and images for object detection**

* arxiv: [https://arxiv.org/abs/1501.01186](https://arxiv.org/abs/1501.01186)

**Video Object Recognition**

* slides: [http://vision.princeton.edu/courses/COS598/2015sp/slides/VideoRecog/Video%20Object%20Recognition.pptx](http://vision.princeton.edu/courses/COS598/2015sp/slides/VideoRecog/Video%20Object%20Recognition.pptx)

**Deep Learning for Saliency Prediction in Natural Video**

* intro: Submitted on 12 Jan 2016
* keywords: Deep learning, saliency map, optical flow, convolution network, contrast features
* paper: [https://hal.archives-ouvertes.fr/hal-01251614/document](https://hal.archives-ouvertes.fr/hal-01251614/document)

## T-CNN

**T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos**

* intro: Winning solution in ILSVRC2015 Object Detection from Video (VID) Task
* arxiv: [http://arxiv.org/abs/1604.02532](http://arxiv.org/abs/1604.02532)
* github: [https://github.com/myfavouritekk/T-CNN](https://github.com/myfavouritekk/T-CNN)

**Object Detection from Video Tubelets with Convolutional Neural Networks**

* intro: CVPR 2016 Spotlight paper
* arxiv: [https://arxiv.org/abs/1604.04053](https://arxiv.org/abs/1604.04053)
* paper: [http://www.ee.cuhk.edu.hk/~wlouyang/Papers/KangVideoDet_CVPR16.pdf](http://www.ee.cuhk.edu.hk/~wlouyang/Papers/KangVideoDet_CVPR16.pdf)
* github: [https://github.com/myfavouritekk/vdetlib](https://github.com/myfavouritekk/vdetlib)

**Object Detection in Videos with Tubelets and Multi-context Cues**

* intro: SenseTime Group
* slides:
[http://www.ee.cuhk.edu.hk/~xgwang/CUvideo.pdf](http://www.ee.cuhk.edu.hk/~xgwang/CUvideo.pdf)
* slides: [http://image-net.org/challenges/talks/Object%20Detection%20in%20Videos%20with%20Tubelets%20and%20Multi-context%20Cues%20-%20Final.pdf](http://image-net.org/challenges/talks/Object%20Detection%20in%20Videos%20with%20Tubelets%20and%20Multi-context%20Cues%20-%20Final.pdf)

**Context Matters: Refining Object Detection in Video with Recurrent Neural Networks**

* intro: BMVC 2016
* keywords: pseudo-labeler
* arxiv: [http://arxiv.org/abs/1607.04648](http://arxiv.org/abs/1607.04648)
* paper: [http://vision.cornell.edu/se3/wp-content/uploads/2016/07/video_object_detection_BMVC.pdf](http://vision.cornell.edu/se3/wp-content/uploads/2016/07/video_object_detection_BMVC.pdf)

**CNN Based Object Detection in Large Video Images**

* intro: Wang Tao @ iQIYI
* keywords: object retrieval, object detection, scene classification
* slides: [http://on-demand.gputechconf.com/gtc/2016/presentation/s6362-wang-tao-cnn-based-object-detection-large-video-images.pdf](http://on-demand.gputechconf.com/gtc/2016/presentation/s6362-wang-tao-cnn-based-object-detection-large-video-images.pdf)

## Datasets

**YouTube-Objects dataset v2.2**

* homepage: [http://calvin.inf.ed.ac.uk/datasets/youtube-objects-dataset/](http://calvin.inf.ed.ac.uk/datasets/youtube-objects-dataset/)

**ILSVRC2015: Object detection from video (VID)**

* homepage: [http://vision.cs.unc.edu/ilsvrc2015/download-videos-3j16.php#vid](http://vision.cs.unc.edu/ilsvrc2015/download-videos-3j16.php#vid)

# Object Detection in 3D

**Vote3Deep: Fast Object Detection in 3D Point Clouds Using Efficient Convolutional Neural Networks**

* arxiv: [https://arxiv.org/abs/1609.06666](https://arxiv.org/abs/1609.06666)

# Object Detection on RGB-D

**Learning
Rich Features from RGB-D Images for Object Detection and Segmentation** 544 | 545 | * arxiv: [http://arxiv.org/abs/1407.5736](http://arxiv.org/abs/1407.5736) 546 | 547 | **Differential Geometry Boosts Convolutional Neural Networks for Object Detection** 548 | 549 | * intro: CVPR 2016 550 | * paper: [http://www.cv-foundation.org/openaccess/content_cvpr_2016_workshops/w23/html/Wang_Differential_Geometry_Boosts_CVPR_2016_paper.html](http://www.cv-foundation.org/openaccess/content_cvpr_2016_workshops/w23/html/Wang_Differential_Geometry_Boosts_CVPR_2016_paper.html) 551 | 552 | # Salient Object Detection 553 | 554 | This task involves predicting the salient regions of an image given by human eye fixations. 555 | 556 | **Best Deep Saliency Detection Models (CVPR 2016 & 2015)** 557 | 558 | [http://i.cs.hku.hk/~yzyu/vision.html](http://i.cs.hku.hk/~yzyu/vision.html) 559 | 560 | **Large-scale optimization of hierarchical features for saliency prediction in natural images** 561 | 562 | * paper: [http://coxlab.org/pdfs/cvpr2014_vig_saliency.pdf](http://coxlab.org/pdfs/cvpr2014_vig_saliency.pdf) 563 | 564 | **Predicting Eye Fixations using Convolutional Neural Networks** 565 | 566 | * paper: [http://www.escience.cn/system/file?fileId=72648](http://www.escience.cn/system/file?fileId=72648) 567 | 568 | **Saliency Detection by Multi-Context Deep Learning** 569 | 570 | * paper: [http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Zhao_Saliency_Detection_by_2015_CVPR_paper.pdf](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Zhao_Saliency_Detection_by_2015_CVPR_paper.pdf) 571 | 572 | **DeepSaliency: Multi-Task Deep Neural Network Model for Salient Object Detection** 573 | 574 | * arxiv: [http://arxiv.org/abs/1510.05484](http://arxiv.org/abs/1510.05484) 575 | 576 | **SuperCNN: A Superpixelwise Convolutional Neural Network for Salient Object Detection** 577 | 578 | * paper: 
[http://www.shengfenghe.com/supercnn-a-superpixelwise-convolutional-neural-network-for-salient-object-detection.html](http://www.shengfenghe.com/supercnn-a-superpixelwise-convolutional-neural-network-for-salient-object-detection.html) 579 | 580 | **Shallow and Deep Convolutional Networks for Saliency Prediction** 581 | 582 | * arxiv: [http://arxiv.org/abs/1603.00845](http://arxiv.org/abs/1603.00845) 583 | * github: [https://github.com/imatge-upc/saliency-2016-cvpr](https://github.com/imatge-upc/saliency-2016-cvpr) 584 | 585 | **Recurrent Attentional Networks for Saliency Detection** 586 | 587 | * intro: CVPR 2016\. recurrent attentional convolutional-deconvolution network (RACDNN) 588 | * arxiv: [http://arxiv.org/abs/1604.03227](http://arxiv.org/abs/1604.03227) 589 | 590 | **Two-Stream Convolutional Networks for Dynamic Saliency Prediction** 591 | 592 | * arxiv: [http://arxiv.org/abs/1607.04730](http://arxiv.org/abs/1607.04730) 593 | 594 | **Unconstrained Salient Object Detection** 595 | 596 | **Unconstrained Salient Object Detection via Proposal Subset Optimization** 597 | 598 | ![](http://cs-people.bu.edu/jmzhang/images/pasted%20image%201465x373.jpg) 599 | 600 | * intro: CVPR 2016 601 | * project page: [http://cs-people.bu.edu/jmzhang/sod.html](http://cs-people.bu.edu/jmzhang/sod.html) 602 | * paper: [http://cs-people.bu.edu/jmzhang/SOD/CVPR16SOD_camera_ready.pdf](http://cs-people.bu.edu/jmzhang/SOD/CVPR16SOD_camera_ready.pdf) 603 | * github: [https://github.com/jimmie33/SOD](https://github.com/jimmie33/SOD) 604 | * caffe model zoo: [https://github.com/BVLC/caffe/wiki/Model-Zoo#cnn-object-proposal-models-for-salient-object-detection](https://github.com/BVLC/caffe/wiki/Model-Zoo#cnn-object-proposal-models-for-salient-object-detection) 605 | 606 | **DHSNet: Deep Hierarchical Saliency Network for Salient Object Detection** 607 | 608 | * paper: 
[http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Liu_DHSNet_Deep_Hierarchical_CVPR_2016_paper.pdf](http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Liu_DHSNet_Deep_Hierarchical_CVPR_2016_paper.pdf) 609 | 610 | **Salient Object Subitizing** 611 | 612 | ![](http://cs-people.bu.edu/jmzhang/images/frontpage.png?crc=123070793) 613 | 614 | * intro: CVPR 2015 615 | * intro: predicting the existence and the number of salient objects in an image using holistic cues 616 | * project page: [http://cs-people.bu.edu/jmzhang/sos.html](http://cs-people.bu.edu/jmzhang/sos.html) 617 | * arxiv: [http://arxiv.org/abs/1607.07525](http://arxiv.org/abs/1607.07525) 618 | * paper: [http://cs-people.bu.edu/jmzhang/SOS/SOS_preprint.pdf](http://cs-people.bu.edu/jmzhang/SOS/SOS_preprint.pdf) 619 | * caffe model zoo: [https://github.com/BVLC/caffe/wiki/Model-Zoo#cnn-models-for-salient-object-subitizing](https://github.com/BVLC/caffe/wiki/Model-Zoo#cnn-models-for-salient-object-subitizing) 620 | 621 | **Deeply-Supervised Recurrent Convolutional Neural Network for Saliency Detection** 622 | 623 | * intro: ACMMM 2016\. 
deeply-supervised recurrent convolutional neural network (DSRCNN) 624 | * arxiv: [http://arxiv.org/abs/1608.05177](http://arxiv.org/abs/1608.05177) 625 | 626 | **Saliency Detection via Combining Region-Level and Pixel-Level Predictions with CNNs** 627 | 628 | * intro: ECCV 2016 629 | * arxiv: [http://arxiv.org/abs/1608.05186](http://arxiv.org/abs/1608.05186) 630 | 631 | **Edge Preserving and Multi-Scale Contextual Neural Network for Salient Object Detection** 632 | 633 | * arxiv: [http://arxiv.org/abs/1608.08029](http://arxiv.org/abs/1608.08029) 634 | 635 | **A Deep Multi-Level Network for Saliency Prediction** 636 | 637 | * arxiv: [http://arxiv.org/abs/1609.01064](http://arxiv.org/abs/1609.01064) 638 | 639 | **Visual Saliency Detection Based on Multiscale Deep CNN Features** 640 | 641 | * intro: IEEE Transactions on Image Processing 642 | * arxiv: [http://arxiv.org/abs/1609.02077](http://arxiv.org/abs/1609.02077) 643 | 644 | **A Deep Spatial Contextual Long-term Recurrent Convolutional Network for Saliency Detection** 645 | 646 | * intro: DSCLRCN 647 | * arxiv: [https://arxiv.org/abs/1610.01708](https://arxiv.org/abs/1610.01708) 648 | 649 | **Deeply supervised salient object detection with short connections** 650 | 651 | * arxiv: [https://arxiv.org/abs/1611.04849](https://arxiv.org/abs/1611.04849) 652 | 653 | **Weakly Supervised Top-down Salient Object Detection** 654 | 655 | * intro: Nanyang Technological University 656 | * arxiv: [https://arxiv.org/abs/1611.05345](https://arxiv.org/abs/1611.05345) 657 | 658 | **SalGAN: Visual Saliency Prediction with Generative Adversarial Networks** 659 | 660 | * project page: [https://imatge-upc.github.io/saliency-salgan-2017/](https://imatge-upc.github.io/saliency-salgan-2017/) 661 | * arxiv: [https://arxiv.org/abs/1701.01081](https://arxiv.org/abs/1701.01081) 662 | 663 | **Visual Saliency Prediction Using a Mixture of Deep Neural Networks** 664 | 665 | * arxiv: 
[https://arxiv.org/abs/1702.00372](https://arxiv.org/abs/1702.00372) 666 | 667 | **A Fast and Compact Salient Score Regression Network Based on Fully Convolutional Network** 668 | 669 | * arxiv: [https://arxiv.org/abs/1702.00615](https://arxiv.org/abs/1702.00615) 670 | 671 | ## Saliency Detection in Video 672 | 673 | **Deep Learning For Video Saliency Detection** 674 | 675 | * arxiv: [https://arxiv.org/abs/1702.00871](https://arxiv.org/abs/1702.00871) 676 | 677 | ## Datasets 678 | 679 | **MSRA10K Salient Object Database** 680 | 681 | [http://mmcheng.net/msra10k/](http://mmcheng.net/msra10k/) 682 | 683 | # Specific Object Detection 684 | 685 | ## Face Detection 686 | 687 | **Multi-view Face Detection Using Deep Convolutional Neural Networks** 688 | 689 | * intro: Yahoo 690 | * arxiv: [http://arxiv.org/abs/1502.02766](http://arxiv.org/abs/1502.02766) 691 | 692 | **From Facial Parts Responses to Face Detection: A Deep Learning Approach** 693 | 694 | ![](http://personal.ie.cuhk.edu.hk/~ys014/projects/Faceness/support/index.png) 695 | 696 | * project page: [http://personal.ie.cuhk.edu.hk/~ys014/projects/Faceness/Faceness.html](http://personal.ie.cuhk.edu.hk/~ys014/projects/Faceness/Faceness.html) 697 | 698 | **Compact Convolutional Neural Network Cascade for Face Detection** 699 | 700 | * arxiv: [http://arxiv.org/abs/1508.01292](http://arxiv.org/abs/1508.01292) 701 | * github: [https://github.com/Bkmz21/FD-Evaluation](https://github.com/Bkmz21/FD-Evaluation) 702 | 703 | **Face Detection with End-to-End Integration of a ConvNet and a 3D Model** 704 | 705 | * intro: ECCV 2016 706 | * arxiv: [https://arxiv.org/abs/1606.00850](https://arxiv.org/abs/1606.00850) 707 | * github(MXNet): [https://github.com/tfwu/FaceDetection-ConvNet-3D](https://github.com/tfwu/FaceDetection-ConvNet-3D) 708 | 709 | **CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection** 710 | 711 | * intro: CMU 712 | * arxiv: 
[https://arxiv.org/abs/1606.05413](https://arxiv.org/abs/1606.05413) 713 | 714 | **Finding Tiny Faces** 715 | 716 | * intro: CMU 717 | * arxiv: [https://arxiv.org/abs/1612.04402](https://arxiv.org/abs/1612.04402) 718 | 719 | **Towards a Deep Learning Framework for Unconstrained Face Detection** 720 | 721 | * intro: overlap with CMS-RCNN 722 | * arxiv: [https://arxiv.org/abs/1612.05322](https://arxiv.org/abs/1612.05322) 723 | 724 | **Supervised Transformer Network for Efficient Face Detection** 725 | 726 | * arxiv: [http://arxiv.org/abs/1607.05477](http://arxiv.org/abs/1607.05477) 727 | 728 | ### UnitBox 729 | 730 | **UnitBox: An Advanced Object Detection Network** 731 | 732 | * intro: ACM MM 2016 733 | * arxiv: [http://arxiv.org/abs/1608.01471](http://arxiv.org/abs/1608.01471) 734 | 735 | **Bootstrapping Face Detection with Hard Negative Examples** 736 | 737 | * author: 万韶华 @ 小米. 738 | * intro: Faster R-CNN, hard negative mining. state-of-the-art on the FDDB dataset 739 | * arxiv: [http://arxiv.org/abs/1608.02236](http://arxiv.org/abs/1608.02236) 740 | 741 | **Grid Loss: Detecting Occluded Faces** 742 | 743 | * intro: ECCV 2016 744 | * arxiv: [https://arxiv.org/abs/1609.00129](https://arxiv.org/abs/1609.00129) 745 | * paper: [http://lrs.icg.tugraz.at/pubs/opitz_eccv_16.pdf](http://lrs.icg.tugraz.at/pubs/opitz_eccv_16.pdf) 746 | * poster: [http://www.eccv2016.org/files/posters/P-2A-34.pdf](http://www.eccv2016.org/files/posters/P-2A-34.pdf) 747 | 748 | **A Multi-Scale Cascade Fully Convolutional Network Face Detector** 749 | 750 | * intro: ICPR 2016 751 | * arxiv: [http://arxiv.org/abs/1609.03536](http://arxiv.org/abs/1609.03536) 752 | 753 | ### MTCNN 754 | 755 | **Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks** 756 | 757 | **Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Neural Networks** 758 | 759 | ![](https://kpzhang93.github.io/MTCNN_face_detection_alignment/support/index.png) 760 | 761 | * 
project page: [https://kpzhang93.github.io/MTCNN_face_detection_alignment/index.html](https://kpzhang93.github.io/MTCNN_face_detection_alignment/index.html) 762 | * arxiv: [https://arxiv.org/abs/1604.02878](https://arxiv.org/abs/1604.02878) 763 | * github(Matlab): [https://github.com/kpzhang93/MTCNN_face_detection_alignment](https://github.com/kpzhang93/MTCNN_face_detection_alignment) 764 | * github(MXNet): [https://github.com/pangyupo/mxnet_mtcnn_face_detection](https://github.com/pangyupo/mxnet_mtcnn_face_detection) 765 | * github: [https://github.com/DaFuCoding/MTCNN_Caffe](https://github.com/DaFuCoding/MTCNN_Caffe) 766 | * github(MXNet): [https://github.com/Seanlinx/mtcnn](https://github.com/Seanlinx/mtcnn) 767 | 768 | **Face Detection using Deep Learning: An Improved Faster RCNN Approach** 769 | 770 | * intro: DeepIR Inc 771 | * arxiv: [https://arxiv.org/abs/1701.08289](https://arxiv.org/abs/1701.08289) 772 | 773 | **Faceness-Net: Face Detection through Deep Facial Part Responses** 774 | 775 | * intro: An extended version of ICCV 2015 paper 776 | * arxiv: [https://arxiv.org/abs/1701.08393](https://arxiv.org/abs/1701.08393) 777 | 778 | ### Datasets / Benchmarks 779 | 780 | **FDDB: Face Detection Data Set and Benchmark** 781 | 782 | * homepage: [http://vis-www.cs.umass.edu/fddb/index.html](http://vis-www.cs.umass.edu/fddb/index.html) 783 | * results: [http://vis-www.cs.umass.edu/fddb/results.html](http://vis-www.cs.umass.edu/fddb/results.html) 784 | 785 | **WIDER FACE: A Face Detection Benchmark** 786 | 787 | ![](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/support/intro.jpg) 788 | 789 | * homepage: [http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/](http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/) 790 | * arxiv: [http://arxiv.org/abs/1511.06523](http://arxiv.org/abs/1511.06523) 791 | 792 | ## Facial Point / Landmark Detection 793 | 794 | **Deep Convolutional Network Cascade for Facial Point Detection** 795 | 796 | 
![](http://mmlab.ie.cuhk.edu.hk/archive/CNN/data/Picture1.png) 797 | 798 | * homepage: [http://mmlab.ie.cuhk.edu.hk/archive/CNN_FacePoint.htm](http://mmlab.ie.cuhk.edu.hk/archive/CNN_FacePoint.htm) 799 | * paper: [http://www.ee.cuhk.edu.hk/~xgwang/papers/sunWTcvpr13.pdf](http://www.ee.cuhk.edu.hk/~xgwang/papers/sunWTcvpr13.pdf) 800 | * github: [https://github.com/luoyetx/deep-landmark](https://github.com/luoyetx/deep-landmark) 801 | 802 | **Facial Landmark Detection by Deep Multi-task Learning** 803 | 804 | * intro: ECCV 2014 805 | * project page: [http://mmlab.ie.cuhk.edu.hk/projects/TCDCN.html](http://mmlab.ie.cuhk.edu.hk/projects/TCDCN.html) 806 | * paper: [http://personal.ie.cuhk.edu.hk/~ccloy/files/eccv_2014_deepfacealign.pdf](http://personal.ie.cuhk.edu.hk/~ccloy/files/eccv_2014_deepfacealign.pdf) 807 | * github(Matlab): [https://github.com/zhzhanp/TCDCN-face-alignment](https://github.com/zhzhanp/TCDCN-face-alignment) 808 | 809 | **A Recurrent Encoder-Decoder Network for Sequential Face Alignment** 810 | 811 | * intro: ECCV 2016 812 | * arxiv: [https://arxiv.org/abs/1608.05477](https://arxiv.org/abs/1608.05477) 813 | 814 | **Detecting facial landmarks in the video based on a hybrid framework** 815 | 816 | * arxiv: [http://arxiv.org/abs/1609.06441](http://arxiv.org/abs/1609.06441) 817 | 818 | **Deep Constrained Local Models for Facial Landmark Detection** 819 | 820 | * arxiv: [https://arxiv.org/abs/1611.08657](https://arxiv.org/abs/1611.08657) 821 | 822 | **Effective face landmark localization via single deep network** 823 | 824 | * arxiv: [https://arxiv.org/abs/1702.02719](https://arxiv.org/abs/1702.02719) 825 | 826 | ## People Detection 827 | 828 | **End-to-end people detection in crowded scenes** 829 | 830 | ![](end_to_end_people_detection_in_crowded_scenes.jpg) 831 | 832 | * arxiv: [http://arxiv.org/abs/1506.04878](http://arxiv.org/abs/1506.04878) 833 | * github: [https://github.com/Russell91/reinspect](https://github.com/Russell91/reinspect) 834 | * ipn: 
[http://nbviewer.ipython.org/github/Russell91/ReInspect/blob/master/evaluation_reinspect.ipynb](http://nbviewer.ipython.org/github/Russell91/ReInspect/blob/master/evaluation_reinspect.ipynb) 835 | 836 | **Detecting People in Artwork with CNNs** 837 | 838 | * intro: ECCV 2016 Workshops 839 | * arxiv: [https://arxiv.org/abs/1610.08871](https://arxiv.org/abs/1610.08871) 840 | 841 | ## Person Head Detection 842 | 843 | **Context-aware CNNs for person head detection** 844 | 845 | * arxiv: [http://arxiv.org/abs/1511.07917](http://arxiv.org/abs/1511.07917) 846 | * github: [https://github.com/aosokin/cnn_head_detection](https://github.com/aosokin/cnn_head_detection) 847 | 848 | ## Pedestrian Detection 849 | 850 | **Pedestrian Detection aided by Deep Learning Semantic Tasks** 851 | 852 | * intro: CVPR 2015 853 | * project page: [http://mmlab.ie.cuhk.edu.hk/projects/TA-CNN/](http://mmlab.ie.cuhk.edu.hk/projects/TA-CNN/) 854 | * paper: [http://arxiv.org/abs/1412.0069](http://arxiv.org/abs/1412.0069) 855 | 856 | **Deep Learning Strong Parts for Pedestrian Detection** 857 | 858 | * intro: ICCV 2015\. CUHK. 
DeepParts 859 | * intro: Achieving 11.89% average miss rate on Caltech Pedestrian Dataset 860 | * paper: [http://personal.ie.cuhk.edu.hk/~pluo/pdf/tianLWTiccv15.pdf](http://personal.ie.cuhk.edu.hk/~pluo/pdf/tianLWTiccv15.pdf) 861 | 862 | **Deep convolutional neural networks for pedestrian detection** 863 | 864 | * arxiv: [http://arxiv.org/abs/1510.03608](http://arxiv.org/abs/1510.03608) 865 | * github: [https://github.com/DenisTome/DeepPed](https://github.com/DenisTome/DeepPed) 866 | 867 | **Scale-aware Fast R-CNN for Pedestrian Detection** 868 | 869 | * arxiv: [https://arxiv.org/abs/1510.08160](https://arxiv.org/abs/1510.08160) 870 | 871 | **New algorithm improves speed and accuracy of pedestrian detection** 872 | 873 | * blog: [http://www.eurekalert.org/pub_releases/2016-02/uoc--nai020516.php](http://www.eurekalert.org/pub_releases/2016-02/uoc--nai020516.php) 874 | 875 | **Pushing the Limits of Deep CNNs for Pedestrian Detection** 876 | 877 | * intro: “set a new record on the Caltech pedestrian dataset, lowering the log-average miss rate from 11.7% to 8.9%” 878 | * arxiv: [http://arxiv.org/abs/1603.04525](http://arxiv.org/abs/1603.04525) 879 | 880 | **A Real-Time Deep Learning Pedestrian Detector for Robot Navigation** 881 | 882 | * arxiv: [http://arxiv.org/abs/1607.04436](http://arxiv.org/abs/1607.04436) 883 | 884 | **A Real-Time Pedestrian Detector using Deep Learning for Human-Aware Navigation** 885 | 886 | * arxiv: [http://arxiv.org/abs/1607.04441](http://arxiv.org/abs/1607.04441) 887 | 888 | **Is Faster R-CNN Doing Well for Pedestrian Detection?** 889 | 890 | * intro: ECCV 2016 891 | * arxiv: [http://arxiv.org/abs/1607.07032](http://arxiv.org/abs/1607.07032) 892 | * github: [https://github.com/zhangliliang/RPN_BF/tree/RPN-pedestrian](https://github.com/zhangliliang/RPN_BF/tree/RPN-pedestrian) 893 | 894 | **Reduced Memory Region Based Deep Convolutional Neural Network Detection** 895 | 896 | * intro: IEEE 2016 ICCE-Berlin 897 | * arxiv: 
[http://arxiv.org/abs/1609.02500](http://arxiv.org/abs/1609.02500) 898 | 899 | **Fused DNN: A deep neural network fusion approach to fast and robust pedestrian detection** 900 | 901 | * arxiv: [https://arxiv.org/abs/1610.03466](https://arxiv.org/abs/1610.03466) 902 | 903 | **Multispectral Deep Neural Networks for Pedestrian Detection** 904 | 905 | * intro: BMVC 2016 oral 906 | * arxiv: [https://arxiv.org/abs/1611.02644](https://arxiv.org/abs/1611.02644) 907 | 908 | ## Vehicle Detection 909 | 910 | **DAVE: A Unified Framework for Fast Vehicle Detection and Annotation** 911 | 912 | * intro: ECCV 2016 913 | * arxiv: [http://arxiv.org/abs/1607.04564](http://arxiv.org/abs/1607.04564) 914 | 915 | **Evolving Boxes for fast Vehicle Detection** 916 | 917 | * arxiv: [https://arxiv.org/abs/1702.00254](https://arxiv.org/abs/1702.00254) 918 | 919 | ## Traffic-Sign Detection 920 | 921 | **Traffic-Sign Detection and Classification in the Wild** 922 | 923 | * project page(code+dataset): [http://cg.cs.tsinghua.edu.cn/traffic-sign/](http://cg.cs.tsinghua.edu.cn/traffic-sign/) 924 | * paper: [http://120.52.73.11/www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Zhu_Traffic-Sign_Detection_and_CVPR_2016_paper.pdf](http://120.52.73.11/www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Zhu_Traffic-Sign_Detection_and_CVPR_2016_paper.pdf) 925 | * code & model: [http://cg.cs.tsinghua.edu.cn/traffic-sign/data_model_code/newdata0411.zip](http://cg.cs.tsinghua.edu.cn/traffic-sign/data_model_code/newdata0411.zip) 926 | 927 | ## Boundary / Edge / Contour Detection 928 | 929 | **Holistically-Nested Edge Detection** 930 | 931 | ![](https://camo.githubusercontent.com/da32e7e3275c2a9693dd2a6925b03a1151e2b098/687474703a2f2f70616765732e756373642e6564752f7e7a74752f6865642e6a7067) 932 | 933 | * intro: ICCV 2015, Marr Prize 934 | * paper: 
[http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Xie_Holistically-Nested_Edge_Detection_ICCV_2015_paper.pdf](http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Xie_Holistically-Nested_Edge_Detection_ICCV_2015_paper.pdf) 935 | * arxiv: [http://arxiv.org/abs/1504.06375](http://arxiv.org/abs/1504.06375) 936 | * github: [https://github.com/s9xie/hed](https://github.com/s9xie/hed) 937 | 938 | **Unsupervised Learning of Edges** 939 | 940 | * intro: CVPR 2016\. Facebook AI Research 941 | * arxiv: [http://arxiv.org/abs/1511.04166](http://arxiv.org/abs/1511.04166) 942 | * zn-blog: [http://www.leiphone.com/news/201607/b1trsg9j6GSMnjOP.html](http://www.leiphone.com/news/201607/b1trsg9j6GSMnjOP.html) 943 | 944 | **Pushing the Boundaries of Boundary Detection using Deep Learning** 945 | 946 | * arxiv: [http://arxiv.org/abs/1511.07386](http://arxiv.org/abs/1511.07386) 947 | 948 | **Convolutional Oriented Boundaries** 949 | 950 | * intro: ECCV 2016 951 | * arxiv: [http://arxiv.org/abs/1608.02755](http://arxiv.org/abs/1608.02755) 952 | 953 | **Convolutional Oriented Boundaries: From Image Segmentation to High-Level Tasks** 954 | 955 | * project page: [http://www.vision.ee.ethz.ch/~cvlsegmentation/](http://www.vision.ee.ethz.ch/~cvlsegmentation/) 956 | * arxiv: [https://arxiv.org/abs/1701.04658](https://arxiv.org/abs/1701.04658) 957 | 958 | **Richer Convolutional Features for Edge Detection** 959 | 960 | * intro: richer convolutional features (RCF) 961 | * arxiv: [https://arxiv.org/abs/1612.02103](https://arxiv.org/abs/1612.02103) 962 | 963 | ## Skeleton Detection 964 | 965 | **Object Skeleton Extraction in Natural Images by Fusing Scale-associated Deep Side Outputs** 966 | 967 | ![](https://camo.githubusercontent.com/88a65f132aa4ae4b0477e3ad02c13cdc498377d9/687474703a2f2f37786e37777a2e636f6d312e7a302e676c622e636c6f7564646e2e636f6d2f44656570536b656c65746f6e2e706e673f696d61676556696577322f322f772f353030) 968 | 969 | * arxiv: 
[http://arxiv.org/abs/1603.09446](http://arxiv.org/abs/1603.09446) 970 | * github: [https://github.com/zeakey/DeepSkeleton](https://github.com/zeakey/DeepSkeleton) 971 | 972 | **DeepSkeleton: Learning Multi-task Scale-associated Deep Side Outputs for Object Skeleton Extraction in Natural Images** 973 | 974 | * arxiv: [http://arxiv.org/abs/1609.03659](http://arxiv.org/abs/1609.03659) 975 | 976 | ## Fruit Detection 977 | 978 | **Deep Fruit Detection in Orchards** 979 | 980 | * arxiv: [https://arxiv.org/abs/1610.03677](https://arxiv.org/abs/1610.03677) 981 | 982 | **Image Segmentation for Fruit Detection and Yield Estimation in Apple Orchards** 983 | 984 | * intro: The Journal of Field Robotics in May 2016 985 | * project page: [http://confluence.acfr.usyd.edu.au/display/AGPub/](http://confluence.acfr.usyd.edu.au/display/AGPub/) 986 | * arxiv: [https://arxiv.org/abs/1610.08120](https://arxiv.org/abs/1610.08120) 987 | 988 | ## Others 989 | 990 | **Deep Deformation Network for Object Landmark Localization** 991 | 992 | * arxiv: [http://arxiv.org/abs/1605.01014](http://arxiv.org/abs/1605.01014) 993 | 994 | **Fashion Landmark Detection in the Wild** 995 | 996 | * arxiv: [http://arxiv.org/abs/1608.03049](http://arxiv.org/abs/1608.03049) 997 | 998 | **Deep Learning for Fast and Accurate Fashion Item Detection** 999 | 1000 | * intro: Kuznech Inc. 
1001 | * intro: MultiBox and Fast R-CNN 1002 | * paper: [https://kddfashion2016.mybluemix.net/kddfashion_finalSubmissions/Deep%20Learning%20for%20Fast%20and%20Accurate%20Fashion%20Item%20Detection.pdf](https://kddfashion2016.mybluemix.net/kddfashion_finalSubmissions/Deep%20Learning%20for%20Fast%20and%20Accurate%20Fashion%20Item%20Detection.pdf) 1003 | 1004 | **Visual Relationship Detection with Language Priors** 1005 | 1006 | * intro: ECCV 2016 oral 1007 | * paper: [https://cs.stanford.edu/people/ranjaykrishna/vrd/vrd.pdf](https://cs.stanford.edu/people/ranjaykrishna/vrd/vrd.pdf) 1008 | * github: [https://github.com/Prof-Lu-Cewu/Visual-Relationship-Detection](https://github.com/Prof-Lu-Cewu/Visual-Relationship-Detection) 1009 | 1010 | **OSMDeepOD - OSM and Deep Learning based Object Detection from Aerial Imagery (formerly known as “OSM-Crosswalk-Detection”)** 1011 | 1012 | ![](https://raw.githubusercontent.com/geometalab/OSMDeepOD/master/imgs/process.png) 1013 | 1014 | * github: [https://github.com/geometalab/OSMDeepOD](https://github.com/geometalab/OSMDeepOD) 1015 | 1016 | **Selfie Detection by Synergy-Constraint Based Convolutional Neural Network** 1017 | 1018 | * intro: IEEE SITIS 2016 1019 | * arxiv: [https://arxiv.org/abs/1611.04357](https://arxiv.org/abs/1611.04357) 1020 | 1021 | **Associative Embedding:End-to-End Learning for Joint Detection and Grouping** 1022 | 1023 | * arxiv: [https://arxiv.org/abs/1611.05424](https://arxiv.org/abs/1611.05424) 1024 | 1025 | **Deep Cuboid Detection: Beyond 2D Bounding Boxes** 1026 | 1027 | * intro: CMU & Magic Leap 1028 | * arxiv: [https://arxiv.org/abs/1611.10010](https://arxiv.org/abs/1611.10010) 1029 | 1030 | **Automatic Model Based Dataset Generation for Fast and Accurate Crop and Weeds Detection** 1031 | 1032 | * arxiv: [https://arxiv.org/abs/1612.03019](https://arxiv.org/abs/1612.03019) 1033 | 1034 | **Deep Learning Logo Detection with Data Expansion by Synthesising Context** 1035 | 1036 | * arxiv: 
[https://arxiv.org/abs/1612.09322](https://arxiv.org/abs/1612.09322) 1037 | 1038 | **Pixel-wise Ear Detection with Convolutional Encoder-Decoder Networks** 1039 | 1040 | * arxiv: [https://arxiv.org/abs/1702.00307](https://arxiv.org/abs/1702.00307) 1041 | 1042 | # Object Proposal 1043 | 1044 | **DeepProposal: Hunting Objects by Cascading Deep Convolutional Layers** 1045 | 1046 | * arxiv: [http://arxiv.org/abs/1510.04445](http://arxiv.org/abs/1510.04445) 1047 | * github: [https://github.com/aghodrati/deepproposal](https://github.com/aghodrati/deepproposal) 1048 | 1049 | **Scale-aware Pixel-wise Object Proposal Networks** 1050 | 1051 | * intro: IEEE Transactions on Image Processing 1052 | * arxiv: [http://arxiv.org/abs/1601.04798](http://arxiv.org/abs/1601.04798) 1053 | 1054 | **Attend Refine Repeat: Active Box Proposal Generation via In-Out Localization** 1055 | 1056 | * intro: BMVC 2016\. AttractioNet 1057 | * arxiv: [https://arxiv.org/abs/1606.04446](https://arxiv.org/abs/1606.04446) 1058 | * github: [https://github.com/gidariss/AttractioNet](https://github.com/gidariss/AttractioNet) 1059 | 1060 | **Learning to Segment Object Proposals via Recursive Neural Networks** 1061 | 1062 | * arxiv: [https://arxiv.org/abs/1612.01057](https://arxiv.org/abs/1612.01057) 1063 | 1064 | # Localization 1065 | 1066 | **Beyond Bounding Boxes: Precise Localization of Objects in Images** 1067 | 1068 | * intro: PhD Thesis 1069 | * homepage: [http://www.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-193.html](http://www.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-193.html) 1070 | * phd-thesis: [http://www.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-193.pdf](http://www.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-193.pdf) 1071 | * github(“SDS using hypercolumns”): [https://github.com/bharath272/sds](https://github.com/bharath272/sds) 1072 | 1073 | **Weakly Supervised Object Localization with Multi-fold Multiple Instance Learning** 1074 | 1075 | * arxiv: 
[http://arxiv.org/abs/1503.00949](http://arxiv.org/abs/1503.00949) 1076 | 1077 | **Weakly Supervised Object Localization Using Size Estimates** 1078 | 1079 | * arxiv: [http://arxiv.org/abs/1608.04314](http://arxiv.org/abs/1608.04314) 1080 | 1081 | **Active Object Localization with Deep Reinforcement Learning** 1082 | 1083 | * intro: ICCV 2015 1084 | * keywords: Markov Decision Process 1085 | * arxiv: [https://arxiv.org/abs/1511.06015](https://arxiv.org/abs/1511.06015) 1086 | 1087 | **Localizing objects using referring expressions** 1088 | 1089 | * intro: ECCV 2016 1090 | * keywords: LSTM, multiple instance learning (MIL) 1091 | * paper: [http://www.umiacs.umd.edu/~varun/files/refexp-ECCV16.pdf](http://www.umiacs.umd.edu/~varun/files/refexp-ECCV16.pdf) 1092 | * github: [https://github.com/varun-nagaraja/referring-expressions](https://github.com/varun-nagaraja/referring-expressions) 1093 | 1094 | **LocNet: Improving Localization Accuracy for Object Detection** 1095 | 1096 | * arxiv: [http://arxiv.org/abs/1511.07763](http://arxiv.org/abs/1511.07763) 1097 | * github: [https://github.com/gidariss/LocNet](https://github.com/gidariss/LocNet) 1098 | 1099 | **Learning Deep Features for Discriminative Localization** 1100 | 1101 | ![](http://cnnlocalization.csail.mit.edu/framework.jpg) 1102 | 1103 | * homepage: [http://cnnlocalization.csail.mit.edu/](http://cnnlocalization.csail.mit.edu/) 1104 | * arxiv: [http://arxiv.org/abs/1512.04150](http://arxiv.org/abs/1512.04150) 1105 | * github(Tensorflow): [https://github.com/jazzsaxmafia/Weakly_detector](https://github.com/jazzsaxmafia/Weakly_detector) 1106 | * github: [https://github.com/metalbubble/CAM](https://github.com/metalbubble/CAM) 1107 | * github: [https://github.com/tdeboissiere/VGG16CAM-keras](https://github.com/tdeboissiere/VGG16CAM-keras) 1108 | 1109 | **ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization** 1110 | 1111 | ![](http://www.di.ens.fr/willow/research/contextlocnet/model.png) 
1112 | 1113 | * intro: ECCV 2016 1114 | * project page: [http://www.di.ens.fr/willow/research/contextlocnet/](http://www.di.ens.fr/willow/research/contextlocnet/) 1115 | * arxiv: [http://arxiv.org/abs/1609.04331](http://arxiv.org/abs/1609.04331) 1116 | * github: [https://github.com/vadimkantorov/contextlocnet](https://github.com/vadimkantorov/contextlocnet) 1117 | 1118 | # Tutorials / Talks 1119 | 1120 | **Convolutional Feature Maps: Elements of efficient (and accurate) CNN-based object detection** 1121 | 1122 | * slides: [http://research.microsoft.com/en-us/um/people/kahe/iccv15tutorial/iccv2015_tutorial_convolutional_feature_maps_kaiminghe.pdf](http://research.microsoft.com/en-us/um/people/kahe/iccv15tutorial/iccv2015_tutorial_convolutional_feature_maps_kaiminghe.pdf) 1123 | 1124 | **Towards Good Practices for Recognition & Detection** 1125 | 1126 | * intro: Hikvision Research Institute. Supervised Data Augmentation (SDA) 1127 | * slides: [http://image-net.org/challenges/talks/2016/Hikvision_at_ImageNet_2016.pdf](http://image-net.org/challenges/talks/2016/Hikvision_at_ImageNet_2016.pdf) 1128 | 1129 | # Projects 1130 | 1131 | **TensorBox: a simple framework for training neural networks to detect objects in images** 1132 | 1133 | * intro: “The basic model implements the simple and robust GoogLeNet-OverFeat algorithm. 
We additionally provide an implementation of the [ReInspect](https://github.com/Russell91/ReInspect/) algorithm” 1134 | * github: [https://github.com/Russell91/TensorBox](https://github.com/Russell91/TensorBox) 1135 | 1136 | **Object detection in torch: Implementation of some object detection frameworks in torch** 1137 | 1138 | * github: [https://github.com/fmassa/object-detection.torch](https://github.com/fmassa/object-detection.torch) 1139 | 1140 | **Using DIGITS to train an Object Detection network** 1141 | 1142 | * github: [https://github.com/NVIDIA/DIGITS/blob/master/examples/object-detection/README.md](https://github.com/NVIDIA/DIGITS/blob/master/examples/object-detection/README.md) 1143 | 1144 | **FCN-MultiBox Detector** 1145 | 1146 | * intro: Fully convolutional MultiBox Detector (like SSD) implemented in Torch. 1147 | * github: [https://github.com/teaonly/FMD.torch](https://github.com/teaonly/FMD.torch) 1148 | 1149 | **KittiBox: A car detection model implemented in Tensorflow.** 1150 | 1151 | * keywords: MultiNet 1152 | * intro: KittiBox is a collection of scripts to train our model FastBox on the Kitti Object Detection Dataset 1153 | * github: [https://github.com/MarvinTeichmann/KittiBox](https://github.com/MarvinTeichmann/KittiBox) 1154 | 1155 | # Blogs 1156 | 1157 | **Convolutional Neural Networks for Object Detection** 1158 | 1159 | [http://rnd.azoft.com/convolutional-neural-networks-object-detection/](http://rnd.azoft.com/convolutional-neural-networks-object-detection/) 1160 | 1161 | **Introducing automatic object detection to visual search (Pinterest)** 1162 | 1163 | * keywords: Faster R-CNN 1164 | * blog: [https://engineering.pinterest.com/blog/introducing-automatic-object-detection-visual-search](https://engineering.pinterest.com/blog/introducing-automatic-object-detection-visual-search) 1165 | * demo: 
[https://engineering.pinterest.com/sites/engineering/files/Visual%20Search%20V1%20-%20Video.mp4](https://engineering.pinterest.com/sites/engineering/files/Visual%20Search%20V1%20-%20Video.mp4) 1166 | * review: [https://news.developer.nvidia.com/pinterest-introduces-the-future-of-visual-search/?mkt_tok=eyJpIjoiTnpaa01UWXpPRE0xTURFMiIsInQiOiJJRjcybjkwTmtmallORUhLOFFFODBDclFqUlB3SWlRVXJXb1MrQ013TDRIMGxLQWlBczFIeWg0TFRUdnN2UHY2ZWFiXC9QQVwvQzBHM3B0UzBZblpOSmUyU1FcLzNPWXI4cml2VERwTTJsOFwvOEk9In0%3D](https://news.developer.nvidia.com/pinterest-introduces-the-future-of-visual-search/?mkt_tok=eyJpIjoiTnpaa01UWXpPRE0xTURFMiIsInQiOiJJRjcybjkwTmtmallORUhLOFFFODBDclFqUlB3SWlRVXJXb1MrQ013TDRIMGxLQWlBczFIeWg0TFRUdnN2UHY2ZWFiXC9QQVwvQzBHM3B0UzBZblpOSmUyU1FcLzNPWXI4cml2VERwTTJsOFwvOEk9In0%3D) 1167 | 1168 | **Deep Learning for Object Detection with DIGITS** 1169 | 1170 | * blog: [https://devblogs.nvidia.com/parallelforall/deep-learning-object-detection-digits/](https://devblogs.nvidia.com/parallelforall/deep-learning-object-detection-digits/) 1171 | 1172 | **Analyzing The Papers Behind Facebook’s Computer Vision Approach** 1173 | 1174 | * keywords: DeepMask, SharpMask, MultiPathNet 1175 | * blog: [https://adeshpande3.github.io/adeshpande3.github.io/Analyzing-the-Papers-Behind-Facebook’s-Computer-Vision-Approach/](https://adeshpande3.github.io/adeshpande3.github.io/Analyzing-the-Papers-Behind-Facebook's-Computer-Vision-Approach/) 1176 | 1177 | **Easily Create High Quality Object Detectors with Deep Learning** 1178 | 1179 | * intro: dlib v19.2 1180 | * blog: [http://blog.dlib.net/2016/10/easily-create-high-quality-object.html](http://blog.dlib.net/2016/10/easily-create-high-quality-object.html) 1181 | 1182 | **How to Train a Deep-Learned Object Detection Model in the Microsoft Cognitive Toolkit** 1183 | 1184 | * blog: 
[https://blogs.technet.microsoft.com/machinelearning/2016/10/25/how-to-train-a-deep-learned-object-detection-model-in-cntk/](https://blogs.technet.microsoft.com/machinelearning/2016/10/25/how-to-train-a-deep-learned-object-detection-model-in-cntk/) 1185 | * github: [https://github.com/Microsoft/CNTK/tree/master/Examples/Image/Detection/FastRCNN](https://github.com/Microsoft/CNTK/tree/master/Examples/Image/Detection/FastRCNN) 1186 | 1187 | **Object Detection in Satellite Imagery, a Low Overhead Approach** 1188 | 1189 | * part 1: [https://medium.com/the-downlinq/object-detection-in-satellite-imagery-a-low-overhead-approach-part-i-cbd96154a1b7#.2csh4iwx9](https://medium.com/the-downlinq/object-detection-in-satellite-imagery-a-low-overhead-approach-part-i-cbd96154a1b7#.2csh4iwx9) 1190 | * part 2: [https://medium.com/the-downlinq/object-detection-in-satellite-imagery-a-low-overhead-approach-part-ii-893f40122f92#.f9b7dgf64](https://medium.com/the-downlinq/object-detection-in-satellite-imagery-a-low-overhead-approach-part-ii-893f40122f92#.f9b7dgf64) 1191 | 1192 | **You Only Look Twice — Multi-Scale Object Detection in Satellite Imagery With Convolutional Neural Networks** 1193 | 1194 | * part 1: [https://medium.com/the-downlinq/you-only-look-twice-multi-scale-object-detection-in-satellite-imagery-with-convolutional-neural-38dad1cf7571#.fmmi2o3of](https://medium.com/the-downlinq/you-only-look-twice-multi-scale-object-detection-in-satellite-imagery-with-convolutional-neural-38dad1cf7571#.fmmi2o3of) 1195 | * part 2: [https://medium.com/the-downlinq/you-only-look-twice-multi-scale-object-detection-in-satellite-imagery-with-convolutional-neural-34f72f659588#.nwzarsz1t](https://medium.com/the-downlinq/you-only-look-twice-multi-scale-object-detection-in-satellite-imagery-with-convolutional-neural-34f72f659588#.nwzarsz1t) 1196 | 1197 | **Faster R-CNN Pedestrian and Car Detection** 1198 | 1199 | * blog: 
[https://bigsnarf.wordpress.com/2016/11/07/faster-r-cnn-pedestrian-and-car-detection/](https://bigsnarf.wordpress.com/2016/11/07/faster-r-cnn-pedestrian-and-car-detection/) 1200 | * ipn: [https://gist.github.com/bigsnarfdude/2f7b2144065f6056892a98495644d3e0#file-demo_faster_rcnn_notebook-ipynb](https://gist.github.com/bigsnarfdude/2f7b2144065f6056892a98495644d3e0#file-demo_faster_rcnn_notebook-ipynb) 1201 | * github: [https://github.com/bigsnarfdude/Faster-RCNN_TF](https://github.com/bigsnarfdude/Faster-RCNN_TF) 1202 | 1203 | **Small U-Net for vehicle detection** 1204 | 1205 | * blog: [https://medium.com/@vivek.yadav/small-u-net-for-vehicle-detection-9eec216f9fd6#.md4u80kad](https://medium.com/@vivek.yadav/small-u-net-for-vehicle-detection-9eec216f9fd6#.md4u80kad) 1206 | --------------------------------------------------------------------------------