# Object Detection Summary
## 1. SPP-Net
**Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition** [[Paper](https://arxiv.org/pdf/1406.4729.pdf)] [[Code](https://github.com/ShaoqingRen/SPP_net)]: Kaiming He et al.

#### Net structure
Main structure: ZF + Spatial Pyramid Pooling layer

Spatial Pyramid Pooling layer: SPP extends the BoW (Bag of Words) model by **partitioning the image from coarse to fine granularity and then aggregating the local features (a common practice before CNNs became popular).**

SPP-net replaces the pooling layer after the last conv layer with an SPP layer, which outputs a kM-dimensional vector at each granularity (k: number of filters; M: number of bins at that granularity). The coarsest granularity outputs a single bin, i.e. global pooling; global average pooling can be used to shrink the model, reduce overfitting, and improve accuracy.

**Key property**: it accepts input images of arbitrary size and aspect ratio while producing a fixed-length output vector.<br>

![image1](pic/Selection_048.png)

#### Model Training
**Single-size training**<br>
Input: fixed 224 x 224 crops<br>
conv5 feature map output: a x a (e.g. 13 x 13)<br>
SPP: n x n bins via sliding-window pooling with win = ceil(a/n) and stride = floor(a/n)<br>
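
A minimal NumPy sketch of this window arithmetic, assuming a 13x13 conv5 output with 256 filters and the 4-level pyramid used later for detection (shapes here are illustrative, not the paper's code):

```python
import math
import numpy as np

def spp_level(feature_map, n):
    """Max-pool an (a, a, k) feature map into n x n bins (one pyramid level)."""
    a = feature_map.shape[0]
    win, stride = math.ceil(a / n), math.floor(a / n)
    bins = [feature_map[i*stride:i*stride+win, j*stride:j*stride+win].max(axis=(0, 1))
            for i in range(n) for j in range(n)]
    return np.concatenate(bins)          # n*n*k values for this level

# 4-level pyramid {6x6, 3x3, 2x2, 1x1} on a 13x13 conv5 map with 256 filters
conv5 = np.random.rand(13, 13, 256)
vec = np.concatenate([spp_level(conv5, n) for n in (6, 3, 2, 1)])
print(vec.shape)                         # (12800,) -- fixed length for any input size
```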

**Multi-size training**<br>
Input: 224 x 224 and 180 x 180 (the 180 x 180 images are produced by resizing the 224 x 224 ones)<br>
Since the SPP output length depends only on the pyramid configuration and not on the input size, the networks for 224 and 180 inputs can share all their weights. Training therefore alternates: train one network for a full epoch, keep the weights, train the other network for an epoch, and iterate.

#### SPP-Net for object detection
Unlike R-CNN, which crops proposals from the input image, SPP-Net operates on the feature map, so convolutions are not recomputed for every proposal; this makes it over 100x faster (the mechanism is similar to Fast R-CNN).<br>

![image2](pic/Selection_049.png)

Selective Search extracts 2000 proposals. The image is resized so that min(w, h) = s ∈ S = {480, 576, 688, 864, 1200} before features are extracted. Each candidate proposal is pooled with a 4-level spatial pyramid (1x1, 2x2, 3x3, 6x6, 50 bins in total), and the resulting 256*50 = 12800-d vector is fed to the fc layers, with SVMs as the final classifiers. Positives are the ground-truth bboxes; negatives are boxes with IoU < 0.3 against a positive (negatives with IoU > 0.7 against another sample are removed).

![image3](pic/Selection_050.png)


---
## 2. DeepID-Net
**DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection** [[Paper](https://arxiv.org/pdf/1412.5661.pdf)]: Xiaoou Tang et al.<br>
An extension of R-CNN: box pre-training, cascade on region proposals, deformation layers and context representations.
#### Network structure
**Method:** Selective Search extracts proposals; an R-CNN rejects proposals that are likely background; the cropped bbox image goes into DeepID-Net, which outputs per-class confidences (200 classes); classification scores over the full image (1000 classes) serve as contextual information to refine the bbox classification scores; the outputs of multiple deep models are averaged to raise detection accuracy; R-CNN-style bbox regression is applied at the end.

![image4](pic/Selection_043.png)

**def-pooling layer**: a conv layer outputs C part detection maps of size W x H. Let Mc denote the c-th part detection map and mc(i,j) its (i,j)-th element. The def-pooling layer takes a (2R+1) x (2R+1) region of Mc centered at (sx·x, sy·y) and produces an output of size (W/sx, H/sy) as follows:

![image5](pic/Selection_045.png)

Here m_c,z(δx,δy) is the visual score of placing the c-th part at the deformed position z(δx,δy); a_c,n and d_c,n(δx,δy) are learned by backpropagation; Σ_{n=1..N} a_c,n · d_c,n(δx,δy) is the penalty for moving the part from its assumed anchor (sx·x, sy·y) to z(δx,δy).<br>
When N = 1, a_1 = 1, d_1(δx,δy) = 0 for |δx|, |δy| ≤ k and d_1(δx,δy) = ∞ for |δx|, |δy| > k, the def-pooling layer is equivalent to max-pooling with kernel size k.<br>
![image6](pic/Selection_044.png)
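
The following toy NumPy sketch (hypothetical shapes and penalty maps, not the paper's implementation) evaluates the def-pooling response at a single anchor and checks the max-pooling special case above:

```python
import numpy as np

def def_pool(M, center, R, a, d):
    """Response at one anchor: max over the (2R+1)^2 window of
    visual score minus the summed deformation penalty sum_n a[n]*d[n]."""
    cy, cx = center
    scores = [M[cy + dy, cx + dx] - sum(an * dn[dy + R, dx + R] for an, dn in zip(a, d))
              for dy in range(-R, R + 1) for dx in range(-R, R + 1)]
    return max(scores)

# N = 1, a_1 = 1, penalty 0 within radius k and infinity outside:
R, k = 2, 1
dist = np.abs(np.mgrid[-R:R + 1, -R:R + 1]).max(axis=0)     # Chebyshev distance
d1 = np.where(dist <= k, 0.0, np.inf)
M = np.random.rand(9, 9)
assert def_pool(M, (4, 4), R, [1.0], [d1]) == M[3:6, 3:6].max()  # = max-pooling
```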

**Advantages**: 1. It can replace a conv layer and learn deformation parts of different sizes and semantic meanings: e.g. large parts (upper body) at high levels, mid-level parts (head) in the middle, and small parts (mouth) at low levels. 2. A def-pooling layer can detect several deformable parts that share the same visual cue at once (as shown above), and the visual patterns it learns can be shared across classes.

**Contextual modeling**: concatenate the 1000-class scores as contextual scores with the 200-class scores to form a 1200-d feature vector.

**Multiple deep models**: models trained with different structures, pretraining schemes, loss functions (hinge loss), with or without def-pooling layers, and with or without bbox rejection.



---
## 3. YOLO
**You Only Look Once: Unified, Real-Time Object Detection** [[Paper](https://arxiv.org/pdf/1506.02640.pdf)] [[Code](https://github.com/pjreddie/darknet)]: RGB (Ross Girshick) et al.<br>
Other references: [TensorFlow YOLO object detection on Android](https://github.com/natanielruiz/android-yolo)

**Key idea**: extract features from the whole image to predict all bboxes of all classes at once, which allows end-to-end training and real-time detection.
#### Network structure
**Principle**: the image is divided into an S x S grid; if an object's center falls inside a cell, that cell is responsible for detecting it. Each cell predicts B bboxes plus a confidence score for each, Pr(Object) * IOU(pred, truth), which reflects both how likely the cell contains an object and how accurate the coordinates are. Each bbox prediction has 5 values: (x, y, w, h, confidence). Each cell also predicts C conditional class probabilities Pr(Class_i | Object). These are computed only when the cell contains an object, and **the class probabilities are predicted once per cell, independent of the number of bboxes (B)**. At test time, multiplying the conditional class probabilities by the per-bbox confidences gives a per-class detection score for every bbox (this score encodes both the predicted class and the coordinate accuracy, as below).<br>
![image7](pic/Selection_052.png)
On Pascal VOC, S = 7, B = 2, C = 20, so the final prediction is a 7 x 7 x (5 x 2 + 20) tensor.<br>
![image8](pic/Selection_051.png)
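
A small sketch of how such a tensor decodes at test time (random values stand in for real network output):

```python
import numpy as np

S, B, C = 7, 2, 20
pred = np.random.rand(S, S, B * 5 + C)                 # the 7x7x30 output tensor

boxes = pred[..., :B * 5].reshape(S, S, B, 5)          # (x, y, w, h, confidence) per box
class_prob = pred[..., B * 5:]                         # Pr(class_i | object), once per cell

# class-specific confidence = Pr(class_i | object) * Pr(object) * IOU
scores = class_prob[:, :, None, :] * boxes[..., 4:5]   # (7, 7, 2, 20)
print(scores.reshape(-1, C).shape)                     # 98 boxes -> thresholding + NMS
```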

**Network architecture**:
24 conv + 2 fc; the fast version uses 9 conv + 2 fc.<br>
![image9](pic/Selection_053.png)

#### Model Training
The first 20 conv layers above + average pooling + fc are first pretrained for **classification** on the 1000-class ImageNet (top-5: 88%) with 224x224 inputs. Then 4 conv + 2 fc layers (with randomly initialized weights) are added and the input resolution is raised to 448x448 to train for **detection**. As **preprocessing**, bbox width and height are normalized to 0-1 by the image size, while x and y are parameterized to 0-1 relative to their grid cell.

The last layer uses a linear activation; all other layers use Leaky ReLU:<br>
![image10](pic/Selection_056.png)

Each cell predicts multiple bboxes, but during training we want only one bbox predictor per object. The predictor whose prediction has the highest IOU with the object's ground truth becomes responsible for predicting that object.

**Loss function**<br>
![image11](pic/Selection_054.png)

Here λcoord = 5 and λnoobj = 0.5, which increase the loss on bbox coordinates and decrease the confidence loss of cells that contain no object (most cells contain none). To make coordinate errors on small bboxes matter more than on large ones, the square roots of w and h are used. 1_i^obj indicates whether cell i contains an object (1 if so); 1_ij^obj indicates that the j-th bbox predictor in cell i is responsible for the object (classification error is counted only when the cell contains an object; coordinate error only when the predictor is responsible for the object).

Dataset: VOC2007 + VOC2012

#### Inference
98 (7x7x2) bboxes + NMS

#### Limitation
- Each cell has only 2 bboxes and a single class, so clusters of small objects are hard to detect.
- Generalization to bboxes of new sizes or unusual aspect ratios is poor; localization error is larger than Faster R-CNN's.
- Errors of small and large bboxes are treated the same way, although an error on a small bbox hurts IoU much more.



---
## 4. YOLOv2 (YOLO9000)
**YOLO9000: Better, Faster, Stronger** [[Paper](https://arxiv.org/pdf/1612.08242.pdf)] [[Code](https://github.com/allanzelener/YAD2K)]

#### Improvement
1). Add **BN** (Batch Normalization) after every conv layer;

2). Train the **classification** model with 448x448 inputs as well (raising the training resolution), then fine-tune it for detection;

3). **Anchor boxes**<br>
Anchor boxes replace the fully connected layers after the conv layers, and each anchor box predicts its own class and objectness (in YOLO each grid cell did this): 98 bboxes in YOLO --> >1000 (13x13x9) bboxes in YOLOv2.

Instead of hand-picking anchor box sizes, k-means clustering is used, with the distance function d(box, centroid) = 1 − IOU(box, centroid) rather than Euclidean distance.<br>
![image12](pic/Selection_065.png)
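
A compact sketch of this clustering; `wh` is random stand-in data for the normalized ground-truth (w, h) pairs:

```python
import numpy as np

def iou_wh(wh, centroids):
    """IoU between (w, h) shapes, as if aligned at a common corner."""
    inter = np.minimum(wh[:, None, 0], centroids[None, :, 0]) * \
            np.minimum(wh[:, None, 1], centroids[None, :, 1])
    union = wh.prod(1)[:, None] + centroids.prod(1)[None, :] - inter
    return inter / union

def anchor_kmeans(wh, k, iters=100):
    centroids = wh[np.random.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(1 - iou_wh(wh, centroids), axis=1)   # d = 1 - IOU
        centroids = np.stack([wh[assign == i].mean(0) if (assign == i).any()
                              else centroids[i] for i in range(k)])  # keep empty clusters
    return centroids

wh = np.random.rand(1000, 2)          # ground-truth box shapes, normalized
print(anchor_kmeans(wh, k=5))         # five anchor shapes, as chosen in the paper
```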

4). Coordinates are predicted relative to the grid cell: ground-truth bboxes are bounded to 0-1, and the predicted values are also constrained to 0-1 by a logistic activation. pw and ph are the width and height of the preset anchor box.

![image13](pic/Selection_063.png)

![image14](pic/Selection_057.png)

5). Finer-grained features to capture small objects (the **passthrough layer**): the layer before the final 13x13 feature map (26x26x512) is reshaped to 13x13x2048 (similar to identity mappings in ResNet) and concatenated with the final layer.

6). Since YOLOv2 contains only conv and pooling layers and no fc layers, it accepts inputs of **arbitrary size**. Every 10 batches a new input size is sampled from {320, 352, ..., 608}.

7). **Darknet** replaces VGG as the classification backbone of YOLOv2, cutting computation and raising speed. For detection, the last conv layer of Darknet is replaced with a 3x3 conv layer (1024 filters) + a 1x1 conv layer, and a passthrough layer connects the last 3x3x512 layer (feature map 14x14) with the final 3x3x1024 layer (feature map 7x7).

![image15](pic/Selection_061.png)

8). Hierarchical classification
A WordTree is used to predict the conditional probability at every node: e.g. an object labeled terrier is also labeled dog and mammal. During training/prediction all intermediate WordTree nodes are added to the label set (1000 -> 1369), and the joint probability is computed along the path.

![image16](pic/Selection_060.png)

![image17](pic/Selection_058.png)

![image18](pic/Selection_059.png)


---
## 5. AttentionNet
**AttentionNet: Aggregating Weak Directions for Accurate Object Detection** [[Paper](https://arxiv.org/pdf/1506.07704.pdf)]

![image19](pic/Selection_066.png)

---
## 6. DenseBox
**DenseBox: Unifying Landmark Localization with End to End Object Detection** [[Paper](https://arxiv.org/pdf/1509.04874.pdf)]: Baidu

#### Model structure
The first 12 conv layers of VGG19 are used; an up-sampling layer is added after conv4-4 and combined with conv3-4, after which the network forks into two branches that predict, respectively, the top-left and bottom-right corners of the bbox and the probability that an object is present: t_i = {s, dx^t = x_i − x_t, dy^t = y_i − y_t, dx^b = x_i − x_b, dy^b = y_i − y_b} (the first element is the confidence score, the rest are distances between the output coordinates and the ground-truth bbox corners).

![image20](pic/Selection_067.png)

**Loss function**<br>
![image21](pic/Selection_072.png)

![image22](pic/Selection_073.png) (this formula appears to be wrong)

![image23](pic/Selection_069.png)

![image24](pic/Selection_068.png)

#### Landmark localization
![image25](pic/Selection_070.png)

![image26](pic/Selection_071.png)<br>
L_lm: L2 loss between predicted values and labels; L_rf is the same as L_cls.


---
## 7. SSD
**SSD: Single Shot MultiBox Detector** [[Paper](https://arxiv.org/pdf/1512.02325.pdf)] [[Code](https://github.com/weiliu89/caffe/tree/ssd)]

**Key idea**: a set of preset default boxes of fixed sizes and aspect ratios predicts per-class scores and box offsets (similar to Faster R-CNN's anchor boxes); **feature maps of different resolutions handle objects of different sizes**; there is **no proposal generation** and no subsequent feature resampling at all, which makes training easy. It is fast: 59 fps on a Titan X with 74.3% mAP on 300x300 images on VOC2007.

#### Model structure
VGG16 is the base, followed by several conv layers of decreasing size that serve as feature maps at different scales for detection (**lower layers have small receptive fields suited to small objects; higher layers have large receptive fields suited to large objects**). On each m x n x p feature map, 3 x 3 x p kernels produce either a class score or a coordinate offset relative to a default box.

![image27](pic/Selection_079.png)

**Default box sizes and aspect ratios**<br>
Each feature-map location has (c+4)*k filters, where c is the number of classes, k the number of default boxes per location, and 4 the box shape offsets, so each feature map has (c+4)kmn outputs in total.

![image28](pic/Selection_074.png)

#### Model training
1). Like YOLO and Faster R-CNN, SSD first has to project the ground-truth bboxes onto **every** feature map.
It must be decided which default boxes are responsible for which ground truth: each ground-truth bbox is first matched to the default box with the highest IoU (jaccard overlap), and then any default box whose IoU with some ground truth exceeds 0.5 is also matched. The network can therefore score multiple overlapping default boxes highly instead of only the single best one, as in the sketch below.
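
A sketch of this matching step, taking a precomputed prior-vs-ground-truth IoU matrix as input (the 8732 priors match SSD300's count, but the values here are random):

```python
import numpy as np

def match_priors(iou, threshold=0.5):
    """iou: (num_priors, num_gt). Returns positive mask and matched gt index."""
    matched = iou.argmax(axis=1)                 # best ground truth per prior
    positive = iou.max(axis=1) > threshold       # rule 2: any IoU > 0.5 counts
    best_prior = iou.argmax(axis=0)              # rule 1: best prior per ground truth
    positive[best_prior] = True
    matched[best_prior] = np.arange(iou.shape[1])
    return positive, matched

iou = np.random.rand(8732, 3)                    # 8732 priors vs 3 ground-truth boxes
positive, matched = match_priors(iou)
print(positive.sum(), "positive priors")
```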

**Loss**:<br>
![image29](pic/Selection_076.png)

![image30](pic/Selection_077.png)

![image31](pic/Selection_078.png)

Here N is the number of matched default boxes. The weight α is set to 1, and x_ijk ∈ {1, 0} indicates whether the i-th default box is matched to the j-th ground-truth bbox of class k.

Default boxes do not have to correspond to the actual receptive field of each layer; **the set of default boxes is designed so that particular feature maps correspond to objects of particular sizes.**<br>
The default box scale for each feature map is computed as:<br>
![image32](pic/Selection_080.png)

Here Smin = 0.2 and Smax = 0.9, and m is the number of feature maps, i.e. the lowest layer has scale 0.2 and the highest 0.9.

The aspect ratios are ar ∈ {1, 2, 3, 1/2, 1/3}, giving box width wk^a = Sk*sqrt(ar) and height hk^a = Sk/sqrt(ar).
For ar = 1 an extra box with scale Sk' = sqrt(Sk*Sk+1) is added; each box is centered at ((i+0.5)/|fk|, (j+0.5)/|fk|), where fk is the size of the k-th feature map.

With this design, the 4x4 feature map in Fig. 1 has matching default boxes while the 8x8 one does not.

2). Hard negative mining

3). Data augmentation
- the whole image
- patches with IoU ∈ {0.1, 0.3, 0.5, 0.7, 0.9} against an object box
- random patches

Patch sizes range from 0.1 to 1 of the original image with aspect ratios from 0.5 to 2. Patches are then resized to a fixed size, horizontally flipped with probability 0.5, and subjected to some photo-metric distortions.
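
Putting the scale, aspect-ratio, and center rules above together, a sketch that generates all default boxes; for simplicity it uses all six shapes at every location, whereas the released SSD300 drops two of them on some layers (giving its 8732 priors):

```python
import numpy as np

def default_boxes(fmap_sizes, s_min=0.2, s_max=0.9, ratios=(1, 2, 3, 1/2, 1/3)):
    """All default boxes as (cx, cy, w, h), normalized to [0, 1]."""
    m = len(fmap_sizes)
    scales = [s_min + (s_max - s_min) * k / (m - 1) for k in range(m)] + [1.0]
    out = []
    for k, fk in enumerate(fmap_sizes):
        shapes = [(scales[k] * np.sqrt(r), scales[k] / np.sqrt(r)) for r in ratios]
        shapes.append((np.sqrt(scales[k] * scales[k + 1]),) * 2)  # extra ar = 1 box
        for i in range(fk):
            for j in range(fk):
                cx, cy = (j + 0.5) / fk, (i + 0.5) / fk
                out += [(cx, cy, w, h) for w, h in shapes]
    return np.asarray(out)

print(default_boxes([38, 19, 10, 5, 3, 1]).shape)   # ~11.6k boxes at 6 per location
```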


---
## 8. DSSD
**DSSD: Deconvolutional Single Shot Detector** [[Paper](https://arxiv.org/pdf/1701.06659.pdf)]

**Key idea**: SSD + ResNet-101 with deconvolutional layers (**to capture high-level context**).

#### Model structure
Merely swapping VGG16 for ResNet-101 actually drops mAP from 77.5% to 76.4%, but with the adjustments below the accuracy improves significantly.<br>
![image33](pic/Selection_081.png)

**Prediction module**: a ResNet block is inserted after each feature map before prediction, instead of SSD's direct prediction from the conv layer through an added L2-normalization layer; with this change the accuracy clearly surpasses the VGG16 version.<br>
![image34](pic/Selection_082.png)

**Deconvolutional SSD**: deconv layers are added on top of SSD to capture high-level context. (**Drawbacks: slower training and inference, and no pre-trained model is available for the added layers.**)<br>
![image35](pic/Selection_083.png)

Here "elw product" means element-wise product, used to combine each original conv layer with its corresponding deconv layer.

#### Model training
**Loss**: joint localization loss (smooth L1) + confidence loss (softmax).

Since there is no R-CNN-style resampling step, data augmentation is needed.

K-means clustering is used to choose the default box aspect ratios (**as in YOLOv2**); 7 clusters work best, and the ratio 1.6 (and 1/1.6) is added.

#### FPS
![image36](pic/Selection_084.png)

---
## 9. Inside-Outside Net (ION)
**Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks** [[Paper](https://arxiv.org/pdf/1512.04143.pdf)]: Microsoft, RGB (Ross Girshick)

**Key idea**: contextual information is integrated at prediction time via spatial RNNs (the ION structure); features at several scales are extracted from each ROI with skip pooling; the two are concatenated as the network input.

![image37](pic/Selection_086.png)
#### Model Structure
![image38](pic/Selection_085.png)

2k ROIs from the raw image (as in R-CNN) --> conv3, conv4, conv5 + context features --> L2 normalization, concatenation, re-scaling, dimension reduction (1x1 conv) --> 512x7x7 (**fixed by VGG16**)<br>
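
A shape-level sketch of this skip-pooling path; the channel counts follow VGG16's conv3/4/5, while the re-scale value and the 512-channel context stack are assumptions:

```python
import numpy as np

def l2norm(x, scale):
    """Channel-wise L2 normalization at each spatial position, then re-scale."""
    return scale * x / np.sqrt((x ** 2).sum(axis=0, keepdims=True) + 1e-12)

# ROI-pooled 7x7 features: conv3 (256 ch), conv4 (512), conv5 (512), context (512)
feats = [np.random.rand(c, 7, 7) for c in (256, 512, 512, 512)]
stacked = np.concatenate([l2norm(f, scale=10.0) for f in feats], axis=0)

w = np.random.randn(512, stacked.shape[0]) * 0.01   # 1x1 conv = per-position matmul
reduced = np.einsum('oc,chw->ohw', w, stacked)      # back to 512x7x7 for VGG16's fc6
print(reduced.shape)
```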
Because features from several layers are combined, the initial weights of the 1x1 conv layer must be kept small; **Xavier initialization** is used.

**Context features with IRNNs**<br>
![image39](pic/Selection_087.png)

The RNN reads the input sequence in 4 directions: up, down, left, right. The RNN used here is the **ReLU RNN** (with the recurrent weight matrix initialized to the **identity matrix**, so gradients pass through backpropagation intact; it can work as well as an LSTM while needing far less memory and compute). (Common RNN variants: GRU, LSTM, plain tanh recurrent nets.)

A **1x1 conv** serves as the input-to-hidden transition because it can be shared across all directions; the bias can be shared the same way and folded into the 1x1 conv layer. The IRNN output then concatenates the hidden states of the 4 directions.

The left-to-right IRNN update is as follows (the other directions are analogous):<br>
![image40](pic/Selection_089.png)

When the hidden-transition matrix is fixed to the identity, the update simplifies to:
![image41](pic/Selection_090.png)

In each direction all independent rows/columns are computed in parallel rather than one RNN cell at a time. Semantic labels can be used to regularize the IRNN output, which requires an extra deconv layer (a 32x32 kernel for 16x upsampling) plus cropping. Experiments show that **no dropout** is needed when concatenating the layer outputs, and no bias is needed when training the RNN.

The feature map output by the first 4-direction IRNN summarizes the features around each cell; the following 1x1 conv layers mix this information and reduce its dimension. After the second IRNN, each cell's output depends on both its local input and the global summary, so the context feature carries **global and local** information and varies with position.<br>
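
A sketch of one directional sweep with the simplified identity-recurrence update (input shapes are arbitrary):

```python
import numpy as np

def irnn_sweep(x):
    """Left-to-right IRNN with the recurrent matrix fixed to identity:
    h[:, j] = relu(h[:, j-1] + x[:, j]). Rows are independent, so each
    step updates a whole column of the (H, W, C) map in parallel."""
    h = np.zeros_like(x)
    h[:, 0] = np.maximum(x[:, 0], 0)
    for j in range(1, x.shape[1]):
        h[:, j] = np.maximum(h[:, j - 1] + x[:, j], 0)
    return h

x = np.random.randn(32, 32, 128)      # output of the shared 1x1 input-to-hidden conv
context = irnn_sweep(x)               # one of four directions; concatenate all four
print(context.shape)
```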
![image42](pic/Selection_088.png)


---
# 10. R-FCN
**R-FCN: Object Detection via Region-based Fully Convolutional Networks** [[Paper](https://arxiv.org/pdf/1605.06409.pdf)] [[Code](https://github.com/daijifeng001/R-FCN)]: Kaiming He

**Key idea**: fully convolutional, computation shared over the entire image, position-sensitive score maps; may ignore global information.

![image43](pic/Selection_092.png)

#### Model structure
Region proposal (RPN) + region classification. The last feature map generates k^2 **position-sensitive** score maps (top-left, ..., bottom-right) for each class plus background, i.e. k^2(C+1) channels in total. The final layer of R-FCN is a position-sensitive ROI pooling layer that aggregates the conv outputs into per-ROI scores; unlike Fast R-CNN it uses selective pooling: each of the k x k bins pools from exactly one of the k x k groups of score maps.

![image44](pic/Selection_091.png)

![image45](pic/Selection_096.png)

**Base net**: ResNet-101 + 1x1 conv (2048-d -> 1024-d) + a k^2(C+1)-channel conv layer

**Position-sensitive score map/pooling**<br>
The ROI is divided into k x k bins of size w x h each; the score map of bin (i,j) is then pooled (**position-sensitive ROI pooling**).<br>
![image46](pic/Selection_094.png)

Here r is the pooled response of bin (i,j) for class c, z is one of the k^2(C+1) score maps, and n is the number of pixels in the bin (so this is average pooling).

The k^2 position-sensitive scores then vote for the ROI by averaging, yielding a (C+1)-d vector on which a softmax gives the classification score.<br>
![image47](pic/Selection_098.png)

![image48](pic/Selection_099.png)
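
A NumPy sketch of the selective pooling plus vote described above (integer bin boundaries are a simplification of the actual bin rounding):

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k, C):
    """score_maps: (k*k*(C+1), H, W). Bin (i, j) average-pools only its own
    group of C+1 maps; the k*k bin responses are then averaged into one
    (C+1)-d vote for the softmax."""
    x0, y0, x1, y1 = roi
    xs = np.linspace(x0, x1 + 1, k + 1).astype(int)
    ys = np.linspace(y0, y1 + 1, k + 1).astype(int)
    vote = np.zeros(C + 1)
    for i in range(k):
        for j in range(k):
            g = (i * k + j) * (C + 1)
            vote += score_maps[g:g + C + 1, ys[i]:ys[i+1], xs[j]:xs[j+1]].mean(axis=(1, 2))
    return vote / (k * k)

maps = np.random.rand(3 * 3 * 21, 40, 60)            # k = 3, C = 20 (VOC)
print(ps_roi_pool(maps, roi=(10, 5, 39, 29), k=3, C=20))
```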

Similarly, a **4k^2-d** conv layer is added next to the k^2(C+1)-d conv layer for bbox coordinate regression; it outputs a 4k^2 vector that average voting collapses into a 4-d vector t = (tx, ty, tw, th). **Note that for simplicity the box regression here is class-agnostic; a class-specific regressor would need a 4k^2C conv layer.** (Nothing after the ROI layer has weights to learn, so its training cost is negligible.)

![image50](pic/Selection_093.png)
#### Training
**Loss**: IoU > 0.5 counts as positive<br>
![image49](pic/Selection_097.png)

**OHEM**

Input image: 600 (short side)

**Algorithme à trous**: improves mAP by 2.6%

#### Inference
300 ROIs, NMS with IoU = 0.3

# 11. RetinaNet (Focal loss)
**Focal Loss for Dense Object Detection**: [[Paper](https://arxiv.org/pdf/1708.02002.pdf)] [[Code](https://github.com/unsky/focal-loss)]: Kaiming He, Ross Girshick et al.

**Key idea:** a focal loss replaces the standard cross-entropy loss, preventing the accumulated loss of easily detected samples (easy negatives), caused by the extreme foreground-background class imbalance (e.g. 1:1000), from overwhelming the hard samples (hard negatives) and thereby dominating training.
The idea resembles OHEM, but it works directly on the loss instead of manipulating the training samples the way OHEM does.

#### Focal Loss
Focal loss is applied here to one-stage detection frameworks (YOLO, SSD, FPN, etc.); in two-stage frameworks (R-CNN + RPN) it should bring even better results.

Starting from the original cross-entropy ![image51](pic/Selection_137.png), and defining ![image52](pic/Selection_138.png), we get **CE(p, y) = CE(pt) = -log(pt)**. A weight α ∈ [0, 1] for class 1 and 1-α for class -1 accounts for sample imbalance in CE: **CE(pt) = -αt log(pt)**.

But α only balances the importance of positive vs. negative samples (e.g. their counts); it cannot distinguish **easy** from **hard** samples. So a modulating factor **(1-pt)^γ** with γ >= 0 is introduced, and the focal loss becomes ![image53](pic/Selection_139.png).
With γ = 0, FL = CE. Intuitively, **as pt approaches 1 the factor shrinks, and the loss of that easily classified sample shrinks with it**. With γ = 2 (in practice **γ = 2, α = 0.25 works best**) and pt = 0.9, the FL value is 100x smaller than the CE value; at pt = 0.968 it drops 1000x. Adding α on top of the modulating factor works slightly better than the factor alone: ![image54](pic/Selection_140.png). The gradient is ![image55](pic/Selection_144.png).

![image56](pic/Selection_136.png)

#### Class imbalance
At **model initialization**, a binary classifier normally assigns equal probability to the outputs y = -1/1; under severe sample imbalance, lowering the prior probability to p (e.g. 0.01) turns out to improve training stability.

Two-stage detectors usually do without α-balancing; they rely on the **two-stage cascade** and **biased minibatch sampling** (whose sampling ratio plays the same role as α-balancing). FL replaces these mechanisms in one-stage detection.

#### Retina detector
ResNet + FPN backbone with FCN heads for detection and classification, using levels P3-P7 with anchor sizes from 32x32 to 512x512.

![image57](pic/Selection_141.png)

#### Effect
![image58](pic/Selection_145.png)
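
The loss itself is tiny to write down; a binary-case sketch with the paper's γ = 2, α = 0.25:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """FL(pt) = -alpha_t * (1 - pt)^gamma * log(pt), labels y in {0, 1}."""
    pt = np.where(y == 1, p, 1 - p)
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - pt) ** gamma * np.log(pt)

p = np.array([0.9, 0.968, 0.5])
print(-np.log(p))                            # plain CE
print(focal_loss(p, np.ones(3, dtype=int)))  # easy positives down-weighted ~100-1000x
```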

# 12. CoupleNet
**CoupleNet: Coupling Global Structure with Local Parts for Object Detection**: [[Paper](https://arxiv.org/pdf/1708.02863.pdf)] [[Code](https://github.com/tshizys/CoupleNet)]: Hanqing Lu et al.

**Key idea:** on the proposals generated by an RPN, position-sensitive ROI pooling extracts local information and ordinary ROI pooling extracts global information; combining the two gives better detection. (Slightly slower than R-FCN, with roughly 3 points higher accuracy.)

#### Net Architecture
![image59](pic/Selection_161.png)

ResNet-101 is the backbone; the RPN generates candidate proposals, which then go through two branches: 1) a local part-sensitive FCN and 2) a global region-sensitive FCN. The outputs of the two branches are finally coupled into the object score.

**Local FCN**<br>
Same as R-FCN: a 1x1 conv on the candidate proposal produces K^2(C+1) channels, and voting over the K^2 bins gives the final per-class score.

**Global FCN**<br>
As in Faster R-CNN, a 1024-d 1x1 conv layer is added after conv4 for dimension reduction, followed by an ROI pooling layer; a k x k kernel and a 1x1 conv layer finally output a (C+1)-d vector. To capture global structure better, features are also extracted from a context region twice the proposal's area and concatenated directly with the proposal features before entering the ROI-wise sub-network.

**Coupling structure**<br>
The features from the global and local branches must first be **normalized** to comparable magnitudes (via an L2-Norm layer or a 1x1 conv layer) and then coupled: element-wise sum, element-wise product, or element-wise maximum. The best tested combination is **1x1 conv + element-wise sum**.

![image60](pic/Selection_162.png)
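
A toy sketch of that coupling step; the 1x1 conv weights here are identity placeholders, whereas in the network they are learned:

```python
import numpy as np

def couple(local_vote, global_vote, w_local, w_global):
    """Normalize each branch with its own 1x1 conv, then element-wise sum."""
    return w_local @ local_vote + w_global @ global_vote

C = 21                                        # C+1 scores for Pascal VOC
local_vote, global_vote = np.random.rand(C), np.random.rand(C)
w_l = w_g = np.eye(C)                         # learned in the real network
print(couple(local_vote, global_vote, w_l, w_g).argmax())
```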