├── object_detection_summary.md
├── object_detection_summary.pdf
├── pic
│   ├── Selection_043.png
│   ├── Selection_044.png
│   ├── Selection_045.png
│   ├── Selection_048.png
│   ├── Selection_049.png
│   ├── Selection_050.png
│   ├── Selection_051.png
│   ├── Selection_052.png
│   ├── Selection_053.png
│   ├── Selection_054.png
│   ├── Selection_056.png
│   ├── Selection_057.png
│   ├── Selection_058.png
│   ├── Selection_059.png
│   ├── Selection_060.png
│   ├── Selection_061.png
│   ├── Selection_062.png
│   ├── Selection_063.png
│   ├── Selection_064.png
│   ├── Selection_065.png
│   ├── Selection_066.png
│   ├── Selection_067.png
│   ├── Selection_068.png
│   ├── Selection_069.png
│   ├── Selection_070.png
│   ├── Selection_071.png
│   ├── Selection_072.png
│   ├── Selection_073.png
│   ├── Selection_074.png
│   ├── Selection_076.png
│   ├── Selection_077.png
│   ├── Selection_078.png
│   ├── Selection_079.png
│   ├── Selection_080.png
│   ├── Selection_081.png
│   ├── Selection_082.png
│   ├── Selection_083.png
│   ├── Selection_084.png
│   ├── Selection_085.png
│   ├── Selection_086.png
│   ├── Selection_087.png
│   ├── Selection_088.png
│   ├── Selection_089.png
│   ├── Selection_090.png
│   ├── Selection_091.png
│   ├── Selection_092.png
│   ├── Selection_093.png
│   ├── Selection_094.png
│   ├── Selection_095.png
│   ├── Selection_096.png
│   ├── Selection_097.png
│   ├── Selection_098.png
│   ├── Selection_099.png
│   ├── Selection_136.png
│   ├── Selection_137.png
│   ├── Selection_138.png
│   ├── Selection_139.png
│   ├── Selection_140.png
│   ├── Selection_141.png
│   ├── Selection_144.png
│   ├── Selection_145.png
│   ├── Selection_160.png
│   ├── Selection_161.png
│   └── Selection_162.png
└── rcnn_seires_paper.docx
/object_detection_summary.md:
--------------------------------------------------------------------------------
# Object Detection Summary
## 1. SPP-Net
**Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition** [[Paper](https://arxiv.org/pdf/1406.4729.pdf)] [[Code](https://github.com/ShaoqingRen/SPP_net)]: Kaiming He et al.

#### Net structure
Main structure: ZF + Spatial Pyramid Pooling layer

Spatial Pyramid Pooling layer: SPP is an extension of the BoW (Bag of Words) model; **it partitions the image from coarse to fine granularity and then aggregates the local features (widely used before CNNs became popular).**

SPP-net replaces the pooling layer after the last conv layer with an SPP layer, which outputs a kM-dimensional vector at each granularity (k: # filters; M: number of bins at that granularity). The coarsest granularity outputs a single bin, i.e. global pooling; global average pooling can be used to reduce model size, curb overfitting, and improve accuracy.

**Key property**: it accepts input images of any size and aspect ratio, and outputs a fixed-length vector.

![selection_043](pic/Selection_043.png)

#### Model Training
**Single-size training**
Input: fixed 224 x 224
conv5 feature map output: a x a (e.g. 13 x 13)
SPP: n x n bins, pooled with a sliding window of win=ceiling(a/n) and stride=floor(a/n)
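
A minimal numpy sketch of this pooling rule, using the 4-level pyramid (1x1, 2x2, 3x3, 6x6) from the detection setup below; function and variable names are illustrative:

```python
import numpy as np
from math import ceil, floor

def spp_layer(feature_map, levels=(1, 2, 3, 6)):
    """SPP sketch: feature_map is (channels, a, a); returns a fixed-length
    vector of k * sum(n*n for n in levels) max-pooled values."""
    k, a, _ = feature_map.shape
    pooled = []
    for n in levels:
        win, stride = ceil(a / n), floor(a / n)   # win = ceiling(a/n), stride = floor(a/n)
        for i in range(n):
            for j in range(n):
                # window (i, j) of the n x n grid, clipped at the map border
                patch = feature_map[:, i*stride:min(i*stride + win, a),
                                       j*stride:min(j*stride + win, a)]
                pooled.append(patch.reshape(k, -1).max(axis=1))
    return np.concatenate(pooled)   # length k * (1 + 4 + 9 + 36) = 50k

# e.g. a 13x13 conv5 map with 256 filters -> 256 * 50 = 12800-d vector
print(spp_layer(np.random.rand(256, 13, 13)).shape)   # (12800,)
```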

**Multi-size training**
Input: 224 x 224 and 180 x 180 (produced by resizing the 224 images)
Since the SPP output size depends only on the pyramid configuration, not on the input size, the 224 and 180 networks can share all layer weights. Training therefore alternates: train one network for a full epoch, carry the weights over, train the other network for an epoch, and iterate.

#### SPP Net for object detection
Unlike R-CNN, where proposals are cropped from the input image, SPP-Net operates on the feature map, so the convolutions do not have to be recomputed for every proposal; this gives a 100x+ speedup (the same idea as in Fast R-CNN).

![selection_044](pic/Selection_044.png)

Selective Search extracts 2000 proposals. The image is resized so that min(w, h) = s ∈ S = {480, 576, 688, 864, 1200} before feature extraction; each candidate proposal is pooled with a 4-level spatial pyramid (1x1, 2x2, 3x3, 6x6, 50 bins in total), and the resulting 256*50=12800-d vector is fed to the fc layers, with SVMs doing the final classification. Positives are the ground-truth bboxes; negatives are boxes with IoU < 0.3 against the positives (negatives with IoU > 0.7 against another negative are removed).

![selection_045](pic/Selection_045.png)


---
## 2. DeepID-Net
**DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection** [[Paper](https://arxiv.org/pdf/1412.5661.pdf)]: Xiaoou Tang et al.
An extension of R-CNN: box pre-training, cascade on region proposals, deformation layers and context representations
#### Network structure
**Method:** SS extracts proposals; an R-CNN-style rejection step discards proposals that are likely to be background; the cropped bbox images are fed into DeepID-Net, which outputs per-class confidences (200 classes); the whole-image classification scores (1000 classes) serve as contextual information to refine the bbox classification scores; the outputs of multiple deep models are averaged to improve detection accuracy; finally, R-CNN-style bbox regression is applied.

![selection_048](pic/Selection_048.png)

**def-pooling layer**: the conv layer outputs C part detection maps of size W x H. Let Mc denote the c-th part detection map and mc(i,j) its (i,j)-th element. The def-pooling layer takes from Mc a (2R+1) x (2R+1) region centred at (sx·x, sy·y) and produces the output below, of size (W/sx, H/sy):

![selection_049](pic/Selection_049.png)

where mc^z(δx,δy) is the visual score of placing the c-th part at the deformed position z(δx,δy); ac,n and dc,n^(δx,δy) are learned by backpropagation; and Σn=1..N ac,n·dc,n^(δx,δy) is the penalty for moving the part from its assumed anchor (sx·x, sy·y) to z(δx,δy).
When N=1, a1=1, d1^(δx,δy)=0 for |δx|, |δy| ≤ k and d1^(δx,δy) = ∞ for |δx|, |δy| > k, the def-pooling layer reduces to max-pooling with kernel size k.
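
My reconstruction of the formula in the figure from the symbol definitions above (in LaTeX, for readability; not a verbatim copy of the paper):

$$
b_c^{(i,j)}=\max_{\delta x,\delta y}\Big\{m_c^{z_{\delta x,\delta y}}-\sum_{n=1}^{N}a_{c,n}\,d_{c,n}^{\delta x,\delta y}\Big\},\qquad
z_{\delta x,\delta y}=(s_x\cdot x+\delta x,\;s_y\cdot y+\delta y),\quad |\delta x|,|\delta y|\le R
$$
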
51 | 
52 |
53 | **优势**:1. 可以替换conv层,学习不同size和semantic meaning的deformation parts;比如高级用大的parts(上肢),中级的用中间级别的part(人头),低级的用小的part(嘴)。2. def-pooling layer可以同时检测多个有相同visual cur的deformable parts(如上图所示);通过def-pooling layer学习的visual pattern可以在多个class共享。
54 |
55 | **contextual modeling**: concatenate 1000-class score as contextual score with 200-class score to form a 1200d feature vector.
56 |
57 | **multiple deep models**: models trained in different structures, pretraining schemes, loss functions (hinge loss), adding def-pooling layer or not, doing bbox rejection or not.
58 |
59 |
60 |
---
## 3. YOLO
**You Only Look Once: Unified, Real-Time Object Detection** [[Paper](https://arxiv.org/pdf/1506.02640.pdf)] [[Code](https://github.com/pjreddie/darknet)]: Joseph Redmon, Ross Girshick, et al.
Other references: [TensorFlow YOLO object detection on Android](https://github.com/natanielruiz/android-yolo)

**Key ideas**: extract features from the whole image to predict all bboxes of all classes simultaneously; allows end-to-end training; detection runs in real time.
#### Network structure
**How it works**: the image is first divided into an S x S grid; if an object's centre falls inside a cell, that cell is responsible for detecting that object. Each cell predicts B bboxes and a confidence score for each of them, Pr(Object)·IOU(pred, truth), which reflects both how likely the cell is to contain an object and how accurate the predicted coordinates are. Each bbox has 5 predicted values: (x, y, w, h, confidence). Each cell additionally predicts C conditional class probabilities Pr(Classi|Object); these only apply when the cell contains an object, and **the class probabilities are computed once per cell, independent of the number of bboxes (B)**. At test time, multiplying the conditional class probabilities by the per-box confidences gives a per-class detection score for every bbox (a value that captures both the predicted class and the accuracy of the coordinates, as below).
![selection_142](pic/Selection_056.png)
On the PASCAL dataset, S=7, B=2, C=20, so the final prediction is a 7x7x(5x2+20) tensor.
![selection_057](pic/Selection_057.png)

**Network structure**:
24 conv + 2 fc; the fast version uses 9 conv + 2 fc.
![selection_058](pic/Selection_058.png)

#### Model Training
First pretrain a **classification** network on ImageNet's 1000 classes, using the first 20 conv layers above + average pooling + fc (top-5: 88%), with 224x224 input images. Then add 4 conv + 2 fc layers (weights randomly initialized), raise the input size to 448x448, and train for **detection**. As **preprocessing**, the bbox width and height are normalized to 0-1 by the input image size, while x, y are parameterized to 0-1 relative to the responsible grid cell.

The activation of the last layer is linear; all other layers use Leaky ReLU:
![selection_059](pic/Selection_059.png)

Every cell predicts multiple bboxes, but during training we want only one bbox predictor per object. The predictor whose prediction has the highest IOU with the ground truth is made responsible for predicting that object.

**Loss function**
![selection_060](pic/Selection_060.png)

Here λcoord=5 and λnoobj=0.5 increase the loss on bbox coordinate predictions and decrease the confidence loss from cells that contain no object (since most cells contain none); and to make coordinate errors on small bboxes matter more than the same errors on large bboxes, w and h enter under a square root. 1_i^obj indicates whether cell i contains an object (1 if it does); 1_ij^obj indicates that the j-th bbox predictor in cell i is responsible for that object (classification error is only counted when the cell contains an object; coordinate error only when the predictor is responsible for the object).
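
For reference, the loss in the figure transcribed into LaTeX:

$$
\begin{aligned}
L ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right]
+\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(\sqrt{w_i}-\sqrt{\hat{w}_i})^2+(\sqrt{h_i}-\sqrt{\hat{h}_i})^2\right]\\
&+\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}(C_i-\hat{C}_i)^2
+\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}(C_i-\hat{C}_i)^2
+\sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c\in\text{classes}}(p_i(c)-\hat{p}_i(c))^2
\end{aligned}
$$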

Dataset: VOC2007 + VOC2012

#### Inference
98 (7x7x2) bboxes + NMS

#### Limitations
- Each cell has only 2 bboxes and can belong to only 1 class, so clusters of small objects are hard to detect.
- Weak at predicting bbox coordinates for objects of new sizes or unusual aspect ratios; localization error is larger than Faster R-CNN's.
- Errors of bboxes of different sizes are treated the same way (an error on a small bbox hurts IoU much more).



---
## 4. YOLOv2 (YOLO9000)
**YOLO9000: Better, Faster, Stronger** [[Paper](https://arxiv.org/pdf/1612.08242.pdf)] [[Code](https://github.com/allanzelener/YAD2K)]

#### Improvements
1). Add **BN** (Batch Normalization) after every conv layer;

2). Train the **classification** model with 448x448 input images as well (raising the training input resolution), then fine-tune it for the detection model;

3). **Anchor boxes**
Replace the bboxes predicted by the fully connected layer after the conv layers with anchor boxes, and let the anchor boxes predict class (in YOLO this was done per cell) and objectness (98 bboxes in YOLO --> >1000 (13x13x9) bboxes in YOLOv2).

Replace hand-picked anchor box sizes with k-means clustering, using d(box, centroid) = 1 − IOU(box, centroid) as the clustering distance instead of Euclidean distance.
![selection_061](pic/Selection_061.png)

4). Predict coordinates relative to the grid cell: the ground-truth bbox is bounded to 0-1, and the predicted values are likewise constrained to 0-1 with a logistic activation. Here pw and ph are the width and height of the preset anchor box.

![selection_062](pic/Selection_062.png)

![selection_063](pic/Selection_063.png)

5). Finer-grained features to capture small objects (the **passthrough layer**): take the layer before the final 13x13 feature map (26x26x512), reshape it to 13x13x2048 (similar to the identity mappings in ResNet), and concatenate it with the final layer.
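
A minimal numpy sketch of that reshape (a space-to-depth rearrangement; the exact element ordering of Darknet's reorg layer may differ):

```python
import numpy as np

def passthrough(x, stride=2):
    """Space-to-depth sketch: (H, W, C) -> (H/s, W/s, C*s*s).
    Turns the 26x26x512 map into 13x13x2048 so it can be
    concatenated with the 13x13 detection feature map."""
    h, w, c = x.shape
    x = x.reshape(h // stride, stride, w // stride, stride, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(h // stride, w // stride, -1)

print(passthrough(np.zeros((26, 26, 512))).shape)   # (13, 13, 2048)
```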

6). Since YOLOv2 contains only conv and pooling layers and no fc layers, it accepts input images of **any size**. Every 10 batches a new input size is chosen from {320, 352, ..., 608}.

7). Use **Darknet** instead of VGG as the classification backbone of YOLOv2 to reduce computation and increase speed. For detection, Darknet's last conv layer is replaced with 3x3 conv layers (1024 filters) followed by a 1x1 conv layer, and a passthrough layer connects the last 3x3x512 layer (feature map: 14x14) with the last 3x3x1024 layer (feature map: 7x7).

![selection_064](pic/Selection_064.png)

8). Hierarchical classification
Use a WordTree to predict conditional probabilities at every node: for example, an object labelled terrier is simultaneously labelled dog and mammal. During training/prediction, all intermediate WordTree nodes are added to the predicted labels (ImageNet: 1000 -> 1369), and joint probabilities are computed along the path.

![selection_065](pic/Selection_065.png)

![selection_066](pic/Selection_066.png)

![selection_067](pic/Selection_067.png)


---
## 5. AttentionNet
**AttentionNet: Aggregating Weak Directions for Accurate Object Detection** [[Paper](https://arxiv.org/pdf/1506.07704.pdf)]

![selection_068](pic/Selection_068.png)

---
## 6. DenseBox
**DenseBox: Unifying Landmark Localization with End to End Object Detection** [[Paper](https://arxiv.org/pdf/1509.04874.pdf)]: Baidu

#### Model structure
The first 12 conv layers of VGG19 are used; an up-sampling layer is added after conv4-4 and combined with conv3-4, after which the network forks into two branches predicting, for each output pixel i, the bbox's top-left and bottom-right corners and the probability of containing an object: ti = {s, dxt=xi−xt, dyt=yi−yt, dxb=xi−xb, dyb=yi−yb} (the first element is the confidence score; the rest are the distances from the output location to the ground-truth bbox corners).

![selection_069](pic/Selection_069.png)

**Loss function**
![selection_070](pic/Selection_070.png)

![selection_071](pic/Selection_071.png) (this formula appears to be incorrect)

![selection_072](pic/Selection_072.png)

![selection_073](pic/Selection_073.png)

#### Landmark localization
![selection_144](pic/Selection_144.png)

![selection_145](pic/Selection_145.png)
Llm: L2 loss between predicted values and labels; Lrf is the same as Lcls


---
## 7. SSD
**SSD: Single Shot MultiBox Detector** [[Paper](https://arxiv.org/pdf/1512.02325.pdf)] [[Code](https://github.com/weiliu89/caffe/tree/ssd)]

**Key ideas**: use a set of preset default boxes of different sizes and aspect ratios to predict per-class scores and box offsets (similar to Faster R-CNN's anchor boxes); **combine feature maps of different resolutions to handle objects of different sizes**; have **no proposal generation** at all and no subsequent feature resampling, which makes training easy. Fast: 59 fps on a Titan X with 74.3% mAP on 300x300 images on VOC2007.

#### Model structure
VGG16 serves as the base network, followed by several progressively smaller conv layers that provide feature maps at different scales for detection (**lower layers with small receptive fields suit small objects; higher layers with large receptive fields suit large objects**). On each m x n x p feature map, 3 x 3 x p kernels produce either class scores or coordinate offsets relative to the preset default boxes.

![selection_074](pic/Selection_074.png)

**Default box scales and aspect ratios**
Each feature-map location has (c+4)\*k filters, where c is the number of classes, k the number of default boxes per location, and 4 the box shape offsets, so each feature map produces (c+4)kmn outputs in total.

![selection_076](pic/Selection_076.png)

#### Model training
1). Like YOLO and Faster R-CNN, SSD first needs to project the ground-truth bboxes onto **every** feature map during training.
The first step is deciding which default boxes are responsible for which ground truth: each ground-truth bbox is matched to the default box with the highest IoU (jaccard overlap), and then any default box whose IoU with a ground-truth bbox exceeds 0.5 is also matched. This way the network can predict high scores for several overlapping default boxes instead of only the single best-matching one.

**Loss**:
![selection_077](pic/Selection_077.png)

![selection_078](pic/Selection_078.png)

![selection_079](pic/Selection_079.png)

where N is the number of matched default boxes. The weight α is set to 1, and xij^k ∈ {1,0} indicates whether the i-th default box is matched to the j-th ground-truth bbox of category k.

The default boxes do not have to correspond to the actual receptive field of each layer; **instead, the set of default boxes is designed so that specific feature maps learn to respond to objects of specific scales.**
The default box scale for each feature map is computed as:
![selection_080](pic/Selection_080.png)

where Smin=0.2, Smax=0.9, and m is the number of feature maps; i.e. the lowest layer has scale 0.2 and the highest 0.9.

The aspect ratios are ar ∈ {1, 2, 3, 1/2, 1/3}; the box width is wka=Sk*sqrt(ar) and the height hka=Sk/sqrt(ar).
For ar=1, an extra box with Sk'=sqrt(Sk*Sk+1) is added; each box is centred at ((i+0.5)/|fk|, (j+0.5)/|fk|), where |fk| is the size of the k-th feature map.
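
A small Python sketch of the scale and aspect-ratio rules above, assuming m = 6 feature maps (the layer count is illustrative):

```python
from math import sqrt

s_min, s_max, m = 0.2, 0.9, 6
# s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1), k = 1..m
scales = [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]
# -> [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]

def default_boxes_for_layer(k, ratios=(1, 2, 3, 1/2, 1/3)):
    s_k = scales[k - 1]
    boxes = [(s_k * sqrt(ar), s_k / sqrt(ar)) for ar in ratios]
    if k < m:  # extra box for ar = 1: s'_k = sqrt(s_k * s_{k+1})
        s_prime = sqrt(s_k * scales[k])
        boxes.append((s_prime, s_prime))
    return boxes   # (w, h) pairs, relative to the input image size
```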

With this design, the 4x4 feature map in Fig. 1 has a default box matching the dog, while the 8x8 feature map does not.

2). Hard negative mining

3). Data augmentation
- use the whole image
- sample a patch with IoU ∈ {0.1, 0.3, 0.5, 0.7, 0.9} to an object box
- randomly sample a patch

Sampled patches cover 0.1-1 of the original image area with aspect ratios between 0.5 and 2. They are then resized to the fixed input size, horizontally flipped with probability 0.5, and subjected to some additional photo-metric distortions.


---
## 8. DSSD
**DSSD: Deconvolutional Single Shot Detector** [[Paper](https://arxiv.org/pdf/1701.06659.pdf)]

**Key idea**: SSD + ResNet101 with deconvolutional layers (**to capture high-level context**)

#### Model structure
Merely swapping VGG16 for ResNet101 actually drops mAP from 77.5% to 76.4%, but with the adjustments below the accuracy improves significantly.
![selection_081](pic/Selection_081.png)

**Prediction module**: run each feature map through a ResNet block before prediction, rather than predicting directly after the conv layer with an L2 normalization layer as SSD does; with this change the detection accuracy clearly surpasses the VGG16 version.
![selection_082](pic/Selection_082.png)

**Deconvolutional SSD**: deconv layers are added on top of SSD to capture high-level context. (**Drawbacks: slower training and inference, and no pre-trained model is available.**)
![selection_083](pic/Selection_083.png)

Here "elw product" means element-wise product, used to combine each original conv layer with the corresponding deconv layer.

#### Model training
**Loss**: joint localization loss (smooth L1) + confidence loss (softmax).

Since there is no R-CNN-style resampling step, data augmentation is required.

K-means clustering is used to choose the default-box aspect ratios (**as in YOLOv2**); seven clusters were found to fit best, and the ratio 1.6 (and 1/1.6) was added.

#### FPS
![selection_084](pic/Selection_084.png)
---
## 9. Inside-Outside Net (ION)
**Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks** [[Paper](https://arxiv.org/pdf/1512.04143.pdf)]: Microsoft, Ross Girshick et al.

**Key ideas**: integrate contextual information at prediction time with spatial RNNs (the ION architecture); extract features at multiple scales over the ROI with skip pooling; concatenate the two as the network input.

![selection_085](pic/Selection_085.png)
#### Model Structure
![selection_086](pic/Selection_086.png)

2k ROIs from the raw image (as in R-CNN) --> conv3,4,5 + context features --> L2 normalization, concatenation, re-scaling, dimensionality reduction (1x1 conv) --> 512x7x7 (**determined by VGG16**)
Moreover, because features from multiple layers are combined, the initial weights of the 1x1 conv layer have to be scaled down, which is done with **Xavier initialization**.

**Context features with IRNNs**
![selection_087](pic/Selection_087.png)

The RNN reads the input in 4 directions: up, down, left and right. The RNN used here is a **ReLU RNN** (the recurrent weight matrix is initialized to the **identity matrix**, so gradients pass through backpropagation fully intact; it can work as well as an LSTM while using far less memory and computation). (Common RNN structures: GRU, LSTM, plain tanh recurrent NN.)

A **1x1 conv** serves as the input-to-hidden transition because it can be shared across all directions; the bias can be shared the same way and folded into the 1x1 conv layer. The IRNN outputs of the 4 directions are then concatenated.

The left-to-right IRNN update is as follows; the other directions are analogous:
![selection_088](pic/Selection_088.png)

When the hidden-transition matrix is fixed to the identity, the update above simplifies to:
![selection_089](pic/Selection_089.png)

In each direction, all independent rows/columns are computed in parallel rather than one RNN cell at a time. Semantic labels can be used to regularize the IRNN output, which requires adding a deconv layer (32x32 kernel, 16x upsampling) plus cropping. Experiments show that **no dropout** is needed when concatenating the layer outputs, and no bias is needed when training the RNN.

The feature map output by the first 4-direction IRNN already summarizes the features around each cell; the following 1x1 conv layer mixes this information and reduces its dimensionality. After the second IRNN, the output at each cell depends on the entire input, so the context features at this point carry both **global and local** information and vary with location.
![selection_090](pic/Selection_090.png)

---
## 10. R-FCN
**R-FCN: Object Detection via Region-based Fully Convolutional Networks** [[Paper](https://arxiv.org/pdf/1605.06409.pdf)] [[Code](https://github.com/daijifeng001/R-FCN)]: Jifeng Dai, Kaiming He, et al.

**Key ideas**: fully convolutional, computation shared on the entire image, position-sensitive score maps; may ignore global info

![selection_091](pic/Selection_091.png)

#### Model structure
Region proposals (RPN) + region classification. The last feature map generates k2 **position-sensitive** score maps (top-left, ..., bottom-right) for each class plus background, k2(C+1) channels in total. R-FCN's last layer is a position-sensitive ROI pooling layer that aggregates the conv outputs into per-ROI scores, but it uses selective pooling (unlike Fast R-CNN): each of the k x k bins pools from only one of the k2 score maps.

![selection_092](pic/Selection_092.png)

![selection_093](pic/Selection_093.png)

**Base net**: ResNet101 + 1x1 conv (2048-d -> 1024-d) + a k2(C+1)-channel conv layer

**Position-sensitive score maps/pooling**
Split the ROI into k x k bins, each about (w/k) x (h/k) in size; then pool bin (i,j) from its own score map (**position-sensitive ROI pooling**).
![selection_094](pic/Selection_094.png)

where r is the pooled response of bin (i,j) for class c, z is one of the k2(C+1) score maps, and n is the number of pixels in the bin (so this is average pooling).
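
A minimal numpy sketch of position-sensitive ROI pooling as described above (integer bin boundaries assumed for brevity; names are illustrative):

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k, C):
    """score_maps: (k*k*(C+1), H, W); roi = (x0, y0, x1, y1) in pixel
    coordinates, assumed to split evenly into k x k bins."""
    x0, y0, x1, y1 = roi
    bw, bh = (x1 - x0) // k, (y1 - y0) // k
    maps = score_maps.reshape(k, k, C + 1, *score_maps.shape[1:])
    out = np.zeros((k, k, C + 1))
    for i in range(k):              # bin (i, j) pools ONLY its own score maps
        for j in range(k):
            ys = slice(y0 + i * bh, y0 + (i + 1) * bh)
            xs = slice(x0 + j * bw, x0 + (j + 1) * bw)
            out[i, j] = maps[i, j, :, ys, xs].mean(axis=(1, 2))  # average pooling
    return out.mean(axis=(0, 1))    # average voting -> (C+1)-d vector for softmax
```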

The k2 position-sensitive scores then vote over the ROI by averaging, producing a (C+1)-d vector on which softmax gives the classification score.
![selection_095](pic/Selection_095.png)

![selection_096](pic/Selection_096.png)

Similarly, a **4k2-d** conv layer is appended to the k2(C+1)-d conv layer above for bbox coordinate regression; its output is a 4k2 vector, aggregated by average voting into a 4-d vector t=(tx,ty,tw,th). **Note that for simplicity the box regression here is class-agnostic; for class-specific regression the added conv layer would have 4k2C channels.** (Nothing after the ROI layer needs learning, so the per-ROI computation at training time is negligible.)

![selection_097](pic/Selection_097.png)
#### Training
**Loss**: IoU > 0.5 counts as positive
![selection_098](pic/Selection_098.png)

**OHEM**

Input image: 600 px (shorter side)

**Algorithme à trous**: improves mAP by 2.6%

#### Inference
300 ROIs, NMS with IoU = 0.3

---
## 11. RetinaNet (Focal loss)
**Focal Loss for Dense Object Detection**: [[Paper](https://arxiv.org/pdf/1708.02002.pdf)] [[Code](https://github.com/unsky/focal-loss)]: Kaiming He, Ross Girshick et al.

**Key idea:** replace the standard cross-entropy loss with a focal loss, preventing the loss accumulated from easily classified examples (easy negatives) from overwhelming that of hard examples (hard negatives) and dominating training when the foreground-background class imbalance is extreme (e.g. 1:1000).
The idea is similar to OHEM, but it acts directly on the loss instead of manipulating the training samples the way OHEM does.

#### Focal Loss
Focal loss is applied here to one-stage detection frameworks (YOLO, SSD, FPN, etc.); it should give even better results in two-stage frameworks (R-CNN + RPN).

For the original cross-entropy, CE(p, y) = −log(p) if y=1 and −log(1−p) otherwise; defining Pt = p if y=1 and Pt = 1−p otherwise gives **CE(p, y) = CE(Pt) = -log(Pt)**. Introducing a weight α ∈ [0, 1] for class 1 and 1-α for class -1 accounts for the sample imbalance in CE: **CE(Pt) = -αt·log(Pt)**.

But such an α only balances the importance of positive and negative examples (e.g. by their counts); it cannot separate **easy** from **hard** examples. So a modulating factor **(1-Pt)^γ** with γ >= 0 is introduced, giving the focal loss **FL(Pt) = -(1-Pt)^γ·log(Pt)**.
When γ=0, FL reduces to CE. Intuitively, **as Pt approaches 1 the factor shrinks, so the loss of this easily classified example shrinks too**. With γ=2 (in practice, **γ=2, α=0.25 work best**) and Pt=0.9, the FL value is 100x smaller than the plain CE value; at Pt=0.968 it drops by 1000x. Adding α on top of the modulating factor works slightly better still: **FL(Pt) = -αt·(1-Pt)^γ·log(Pt)**. The gradient is:

![selection_136](pic/Selection_136.png)

#### Class imbalance
At **model initialization** for binary classification, the usual setup gives the outputs y=-1 and y=1 the same probability; under severe class imbalance, however, lowering the prior probability of the rare class to a small p (e.g. 0.01) was found to improve training stability.
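
Concretely, the paper does this by initializing the bias of the final classification layer so that the predicted probability of the rare (foreground) class equals the prior π:

$$
\sigma(b)=\pi \;\Rightarrow\; b=-\log\frac{1-\pi}{\pi},\qquad \pi=0.01
$$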

In two-stage detection, α-balancing is generally not used; imbalance is instead handled with a **two-stage cascade** and **biased minibatch sampling** (whose sampling ratio plays the same role as α-balancing). Here FL replaces those mechanisms for one-stage detection.

#### Retina detector
A ResNet + FPN backbone with FCN heads for detection and classification; detection is performed on levels P3-P7, with anchor sizes from 32x32 to 512x512.

![selection_137](pic/Selection_137.png)

#### Effect
![selection_138](pic/Selection_138.png)

---
## 12. CoupleNet
**CoupleNet: Coupling Global Structure with Local Parts for Object Detection**: [[Paper](https://arxiv.org/pdf/1708.02863.pdf)] [[Code](https://github.com/tshizys/CoupleNet)]: Hanqing Lu et al.

**Key idea:** on the proposals generated by an RPN, use position-sensitive ROI pooling to extract local information and regular ROI pooling to extract global information, then combine the two for better detection. (Slightly slower than R-FCN, with roughly 3 points higher accuracy.)

#### Net Architecture
![selection_139](pic/Selection_139.png)

ResNet101 is used as the backbone; an RPN generates candidate proposals, which then pass through two branches: 1) a local part-sensitive FCN and 2) a global region-sensitive FCN. The outputs of the two branches are finally coupled into the object score.

**Local FCN**
As in R-FCN, a 1x1 convolution over each candidate proposal produces K2(C+1) channels, and voting over the K2 bins gives the final per-class score.

**Global FCN**
As in Faster R-CNN, a 1024-d 1x1 conv layer is first added after conv4 for dimensionality reduction, followed by an ROI pooling layer; a k x k kernel and a 1x1 conv layer then output a (C+1)-d vector. To capture global context better, features are also extracted from a context region with twice the proposal's area and concatenated directly with the proposal's features before the ROI-wise sub-network.

**Coupling structure**
The features extracted by the global and local branches must first be **normalized** to ensure comparable magnitudes (L2-Norm layer or 1x1 conv layer), then coupled: element-wise sum, element-wise product, or element-wise maximum. The best tested combination is **1x1 conv + element-wise sum**.

![selection_140](pic/Selection_140.png)

--------------------------------------------------------------------------------
/object_detection_summary.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/object_detection_summary.pdf
--------------------------------------------------------------------------------
/pic/Selection_043.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_043.png
--------------------------------------------------------------------------------
/pic/Selection_044.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_044.png
--------------------------------------------------------------------------------
/pic/Selection_045.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_045.png
--------------------------------------------------------------------------------
/pic/Selection_048.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_048.png
--------------------------------------------------------------------------------
/pic/Selection_049.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_049.png
--------------------------------------------------------------------------------
/pic/Selection_050.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_050.png
--------------------------------------------------------------------------------
/pic/Selection_051.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_051.png
--------------------------------------------------------------------------------
/pic/Selection_052.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_052.png
--------------------------------------------------------------------------------
/pic/Selection_053.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_053.png
--------------------------------------------------------------------------------
/pic/Selection_054.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_054.png
--------------------------------------------------------------------------------
/pic/Selection_056.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_056.png
--------------------------------------------------------------------------------
/pic/Selection_057.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_057.png
--------------------------------------------------------------------------------
/pic/Selection_058.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_058.png
--------------------------------------------------------------------------------
/pic/Selection_059.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_059.png
--------------------------------------------------------------------------------
/pic/Selection_060.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_060.png
--------------------------------------------------------------------------------
/pic/Selection_061.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_061.png
--------------------------------------------------------------------------------
/pic/Selection_062.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_062.png
--------------------------------------------------------------------------------
/pic/Selection_063.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_063.png
--------------------------------------------------------------------------------
/pic/Selection_064.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_064.png
--------------------------------------------------------------------------------
/pic/Selection_065.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_065.png
--------------------------------------------------------------------------------
/pic/Selection_066.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_066.png
--------------------------------------------------------------------------------
/pic/Selection_067.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_067.png
--------------------------------------------------------------------------------
/pic/Selection_068.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_068.png
--------------------------------------------------------------------------------
/pic/Selection_069.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_069.png
--------------------------------------------------------------------------------
/pic/Selection_070.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_070.png
--------------------------------------------------------------------------------
/pic/Selection_071.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_071.png
--------------------------------------------------------------------------------
/pic/Selection_072.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_072.png
--------------------------------------------------------------------------------
/pic/Selection_073.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_073.png
--------------------------------------------------------------------------------
/pic/Selection_074.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_074.png
--------------------------------------------------------------------------------
/pic/Selection_076.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_076.png
--------------------------------------------------------------------------------
/pic/Selection_077.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_077.png
--------------------------------------------------------------------------------
/pic/Selection_078.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_078.png
--------------------------------------------------------------------------------
/pic/Selection_079.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_079.png
--------------------------------------------------------------------------------
/pic/Selection_080.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_080.png
--------------------------------------------------------------------------------
/pic/Selection_081.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_081.png
--------------------------------------------------------------------------------
/pic/Selection_082.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_082.png
--------------------------------------------------------------------------------
/pic/Selection_083.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_083.png
--------------------------------------------------------------------------------
/pic/Selection_084.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_084.png
--------------------------------------------------------------------------------
/pic/Selection_085.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_085.png
--------------------------------------------------------------------------------
/pic/Selection_086.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_086.png
--------------------------------------------------------------------------------
/pic/Selection_087.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_087.png
--------------------------------------------------------------------------------
/pic/Selection_088.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_088.png
--------------------------------------------------------------------------------
/pic/Selection_089.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_089.png
--------------------------------------------------------------------------------
/pic/Selection_090.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_090.png
--------------------------------------------------------------------------------
/pic/Selection_091.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_091.png
--------------------------------------------------------------------------------
/pic/Selection_092.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_092.png
--------------------------------------------------------------------------------
/pic/Selection_093.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_093.png
--------------------------------------------------------------------------------
/pic/Selection_094.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_094.png
--------------------------------------------------------------------------------
/pic/Selection_095.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_095.png
--------------------------------------------------------------------------------
/pic/Selection_096.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_096.png
--------------------------------------------------------------------------------
/pic/Selection_097.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_097.png
--------------------------------------------------------------------------------
/pic/Selection_098.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_098.png
--------------------------------------------------------------------------------
/pic/Selection_099.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_099.png
--------------------------------------------------------------------------------
/pic/Selection_136.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_136.png
--------------------------------------------------------------------------------
/pic/Selection_137.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_137.png
--------------------------------------------------------------------------------
/pic/Selection_138.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_138.png
--------------------------------------------------------------------------------
/pic/Selection_139.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_139.png
--------------------------------------------------------------------------------
/pic/Selection_140.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_140.png
--------------------------------------------------------------------------------
/pic/Selection_141.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_141.png
--------------------------------------------------------------------------------
/pic/Selection_144.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_144.png
--------------------------------------------------------------------------------
/pic/Selection_145.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_145.png
--------------------------------------------------------------------------------
/pic/Selection_160.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_160.png
--------------------------------------------------------------------------------
/pic/Selection_161.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_161.png
--------------------------------------------------------------------------------
/pic/Selection_162.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/pic/Selection_162.png
--------------------------------------------------------------------------------
/rcnn_seires_paper.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CasiaFan/object_detection_papers_summary/03b716d3c8e945b4076414c4784a2535e00e2840/rcnn_seires_paper.docx
--------------------------------------------------------------------------------