1 | # ICCV2021-Papers-with-Code
2 |
A collection of [ICCV 2021](http://iccv2021.thecvf.com/) papers and open-source projects (papers with code)!

1617 papers were accepted, for a 25.9% acceptance rate.

IDs of the accepted ICCV 2021 papers: https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRfaTmsNweuaA0Gjyu58H_Cx56pGwFhcTYII0u1pg0U7MbhlgY0R6Y-BbK3xFhAiwGZ26u3TAtN5MnS/pubhtml

> Note 1: Everyone is welcome to open an issue and share ICCV 2021 papers and open-source projects!
>
> Note 2: For papers from previous top CV conferences, as well as other high-quality CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision

## Table of Contents (ICCV 2021 Papers with Code)
14 |
- [Backbone](#Backbone)
- [Visual Transformer](#Visual-Transformer)
- [Performance-Boosting Tricks](#Performance-Boosting-Tricks)
- [GAN](#GAN)
- [NAS](#NAS)
- [NeRF](#NeRF)
- [Loss](#Loss)
- [Zero-Shot Learning](#Zero-Shot-Learning)
- [Few-Shot Learning](#Few-Shot-Learning)
- [Long-tailed](#Long-tailed)
- [Vision and Language](#Vision-and-Language)
- [Un-/Self-Supervised Learning](#Un-Self-Supervised-Learning)
- [Multi-Label Image Recognition](#Multi-Label-Image-Recognition)
- [2D Object Detection](#2D-Object-Detection)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Medical Image Segmentation](#Medical-Image-Segmentation)
- [Video Object Segmentation](#Video-Object-Segmentation)
- [Few-shot Segmentation](#Few-shot-Segmentation)
- [Human Motion Segmentation](#Human-Motion-Segmentation)
- [Object Tracking](#Object-Tracking)
- [3D Point Cloud](#3D-Point-Cloud)
- [3D Object Detection](#3D-Object-Detection)
- [3D Semantic Segmentation](#3D-Semantic-Segmentation)
- [3D Instance Segmentation](#3D-Instance-Segmentation)
- [3D Multi-Object Tracking](#3D-Multi-Object-Tracking)
- [Point Cloud Denoising](#Point-Cloud-Denoising)
- [Point Cloud Registration](#Point-Cloud-Registration)
- [Point Cloud Completion](#Point-Cloud-Completion)
- [Radar Semantic Segmentation](#Radar-Semantic-Segmentation)
- [Image Restoration](#Image-Restoration)
- [Super-Resolution](#Super-Resolution)
- [Denoising](#Denoising)
- [Medical Image Denoising](#Medical-Image-Denoising)
- [Deblurring](#Deblurring)
- [Shadow Removal](#Shadow-Removal)
- [Video Frame Interpolation](#Video-Frame-Interpolation)
- [Video Inpainting](#Video-Inpainting)
- [Person Re-identification](#Person-Re-identification)
- [Person Search](#Person-Search)
- [2D/3D Human Pose Estimation](#2D3D-Human-Pose-Estimation)
- [6D Object Pose Estimation](#6D-Object-Pose-Estimation)
- [3D Head Reconstruction](#3D-Head-Reconstruction)
- [Face Recognition](#Face-Recognition)
- [Facial Expression Recognition](#Facial-Expression-Recognition)
- [Action Recognition](#Action-Recognition)
- [Temporal Action Localization](#Temporal-Action-Localization)
- [Action Detection](#Action-Detection)
- [Group Activity Recognition](#Group-Activity-Recognition)
- [Sign Language Recognition](#Sign-Language-Recognition)
- [Text Detection](#Text-Detection)
- [Text Recognition](#Text-Recognition)
- [Text Replacement](#Text-Replacement)
- [Visual Question Answering (VQA)](#Visual-Question-Answering-VQA)
- [Adversarial Attack](#Adversarial-Attack)
- [Depth Estimation](#Depth-Estimation)
- [Gaze Estimation](#Gaze-Estimation)
- [Crowd Counting](#Crowd-Counting)
- [Lane Detection](#Lane-Detection)
- [Trajectory Prediction](#Trajectory-Prediction)
- [Anomaly Detection](#Anomaly-Detection)
- [Scene Graph Generation](#Scene-Graph-Generation)
- [Image Editing](#Image-Editing)
- [Image Synthesis](#Image-Synthesis)
- [Image Retrieval](#Image-Retrieval)
- [3D Reconstruction](#3D-Reconstruction)
- [Video Stabilization](#Video-Stabilization)
- [Fine-Grained Recognition](#Fine-Grained-Recognition)
- [Style Transfer](#Style-Transfer)
- [Neural Painting](#Neural-Painting)
- [Feature Matching](#Feature-Matching)
- [Semantic Correspondence](#Semantic-Correspondence)
- [Edge Detection](#Edge-Detection)
- [Camera Calibration](#Camera-Calibration)
- [Image Quality Assessment](#Image-Quality-Assessment)
- [Metric Learning](#Metric-Learning)
- [Unsupervised Domain Adaptation](#Unsupervised-Domain-Adaptation)
- [Video Rescaling](#Video-Rescaling)
- [Hand-Object Interaction](#Hand-Object-Interaction)
- [Vision-and-Language Navigation](#Vision-and-Language-Navigation)
- [Datasets](#Datasets)
- [Others](#Others)
97 |
98 |
99 |
100 | # Backbone
101 |
102 | **Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions**
103 |
104 | - Paper(Oral): https://arxiv.org/abs/2102.12122
105 | - Code: https://github.com/whai362/PVT
106 |
107 | **AutoFormer: Searching Transformers for Visual Recognition**
108 |
109 | - Paper: https://arxiv.org/abs/2107.00651
110 | - Code: https://github.com/microsoft/AutoML
111 |
112 | **Bias Loss for Mobile Neural Networks**
113 |
114 | - Paper: https://arxiv.org/abs/2107.11170
115 | - Code: None
116 |
117 | **Vision Transformer with Progressive Sampling**
118 |
119 | - Paper: https://arxiv.org/abs/2108.01684
120 | - Code: https://github.com/yuexy/PS-ViT
121 |
122 | **Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet**
123 |
124 | - Paper: https://arxiv.org/abs/2101.11986
125 | - Code: https://github.com/yitu-opensource/T2T-ViT
126 |
127 | **Rethinking Spatial Dimensions of Vision Transformers**
128 |
- Paper: https://arxiv.org/abs/2103.16302
- Code: https://github.com/naver-ai/pit
132 |
133 | **Swin Transformer: Hierarchical Vision Transformer using Shifted Windows**
134 |
135 | - Paper: https://arxiv.org/abs/2103.14030
136 | - Code: https://github.com/microsoft/Swin-Transformer
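
Many of the backbones above are also packaged in community libraries. As a minimal sketch (assuming the third-party `timm` package and its `swin_base_patch4_window7_224` registry name, neither of which comes from this list), a pretrained Swin-B can be loaded for inference like this:

```python
# Minimal sketch: load a pretrained Swin-B backbone via the third-party
# `timm` library (assumed installed; model name from timm's registry).
import timm
import torch

model = timm.create_model("swin_base_patch4_window7_224", pretrained=True)
model.eval()

x = torch.randn(1, 3, 224, 224)  # dummy ImageNet-sized input
with torch.no_grad():
    logits = model(x)            # ImageNet class logits, shape (1, 1000)
print(logits.shape)
```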
137 |
138 | **Conformer: Local Features Coupling Global Representations for Visual Recognition**
139 |
- Paper: https://arxiv.org/abs/2105.03889
- Code: https://github.com/pengzhiliang/Conformer
143 |
144 | **MicroNet: Improving Image Recognition with Extremely Low FLOPs**
145 |
146 | - Paper: https://arxiv.org/abs/2108.05894
147 | - Code: https://github.com/liyunsheng13/micronet
148 |
149 | **Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition**
150 |
151 | - Paper: https://arxiv.org/abs/2102.01063
152 | - Code: https://github.com/idstcv/ZenNAS
153 |
154 |
155 |
156 | # Visual Transformer
157 |
158 | **Swin Transformer: Hierarchical Vision Transformer using Shifted Windows**
159 |
160 | - Paper: https://arxiv.org/abs/2103.14030
161 | - Code: https://github.com/microsoft/Swin-Transformer
162 |
163 | **An Empirical Study of Training Self-Supervised Vision Transformers**
164 |
165 | - Paper(Oral): https://arxiv.org/abs/2104.02057
166 | - MoCo v3 Code: None
167 |
168 | **Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions**
169 |
170 | - Paper(Oral): https://arxiv.org/abs/2102.12122
171 | - Code: https://github.com/whai362/PVT
172 |
173 | **Group-Free 3D Object Detection via Transformers**
174 |
175 | - Paper: https://arxiv.org/abs/2104.00678
176 | - Code: None
177 |
178 | **Spatial-Temporal Transformer for Dynamic Scene Graph Generation**
179 |
180 | - Paper: https://arxiv.org/abs/2107.12309
181 | - Code: None
182 |
183 | **Rethinking and Improving Relative Position Encoding for Vision Transformer**
184 |
185 | - Paper: https://arxiv.org/abs/2107.14222
186 | - Code: https://github.com/microsoft/AutoML/tree/main/iRPE
187 |
188 | **Emerging Properties in Self-Supervised Vision Transformers**
189 |
190 | - Paper: https://arxiv.org/abs/2104.14294
191 | - Code: https://github.com/facebookresearch/dino
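
The DINO repository exposes its pretrained backbones through `torch.hub`. A minimal sketch follows (the hub entry-point name is taken from the facebookresearch/dino README; treat it as an assumption here):

```python
# Minimal sketch: load the self-supervised DINO ViT-S/16 backbone via
# torch.hub (hub entry point as documented in facebookresearch/dino).
import torch

model = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
model.eval()

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    feats = model(x)  # [CLS] embedding, shape (1, 384) for ViT-S/16
print(feats.shape)
```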
192 |
193 | **Learning Spatio-Temporal Transformer for Visual Tracking**
194 |
195 | - Paper: https://arxiv.org/abs/2103.17154
196 | - Code: https://github.com/researchmm/Stark
197 |
198 | **Fast Convergence of DETR with Spatially Modulated Co-Attention**
199 |
200 | - Paper: https://arxiv.org/abs/2101.07448
201 | - Code: https://github.com/abc403/SMCA-replication
202 |
203 | **Vision Transformer with Progressive Sampling**
204 |
205 | - Paper: https://arxiv.org/abs/2108.01684
206 | - Code: https://github.com/yuexy/PS-ViT
207 |
208 | **Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet**
209 |
210 | - Paper: https://arxiv.org/abs/2101.11986
211 | - Code: https://github.com/yitu-opensource/T2T-ViT
212 |
213 | **Rethinking Spatial Dimensions of Vision Transformers**
214 |
215 | - Paper: https://arxiv.org/abs/2103.16302
216 | - Code: https://github.com/naver-ai/pit
217 |
218 | **The Right to Talk: An Audio-Visual Transformer Approach**
219 |
220 | - Paper: https://arxiv.org/abs/2108.03256
221 | - Code: None
222 |
223 | **Joint Inductive and Transductive Learning for Video Object Segmentation**
224 |
225 | - Paper: https://arxiv.org/abs/2108.03679
226 | - Code: https://github.com/maoyunyao/JOINT
227 |
228 | **Conformer: Local Features Coupling Global Representations for Visual Recognition**
229 |
230 | - Paper: https://arxiv.org/abs/2105.03889
231 | - Code: https://github.com/pengzhiliang/Conformer
232 |
233 | **Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer**
234 |
235 | - Paper: https://arxiv.org/abs/2108.03032
236 | - Code: https://github.com/zhiheLu/CWT-for-FSS
237 |
238 | **Paint Transformer: Feed Forward Neural Painting with Stroke Prediction**
239 |
240 | - Paper: https://arxiv.org/abs/2108.03798
241 | - Code: https://github.com/wzmsltw/PaintTransformer
242 |
243 | **Conditional DETR for Fast Training Convergence**
244 |
245 | - Paper: https://arxiv.org/abs/2108.06152
246 | - Code: https://github.com/Atten4Vis/ConditionalDETR
247 |
248 | **MUSIQ: Multi-scale Image Quality Transformer**
249 |
250 | - Paper: https://arxiv.org/abs/2108.05997
251 | - Code: https://github.com/google-research/google-research/tree/master/musiq
252 |
253 | **SOTR: Segmenting Objects with Transformers**
254 |
255 | - Paper: https://arxiv.org/abs/2108.06747
256 | - Code: https://github.com/easton-cau/SOTR
257 |
258 | **PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers**
259 |
260 | - Paper(Oral): https://arxiv.org/abs/2108.08839
261 | - Code: https://github.com/yuxumin/PoinTr
262 |
263 | **SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer**
264 |
265 | - Paper: https://arxiv.org/abs/2108.04444
266 | - Code: https://github.com/AllenXiangX/SnowflakeNet
267 |
268 | **Improving 3D Object Detection with Channel-wise Transformer**
269 |
270 | - Paper: https://arxiv.org/abs/2108.10723
271 | - Code: https://github.com/hlsheng1/CT3D
272 |
273 | **TransFER: Learning Relation-aware Facial Expression Representations with Transformers**
274 |
275 | - Paper: https://arxiv.org/abs/2108.11116
276 | - Code: None
277 |
278 | **GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer**
279 |
280 | - Paper: https://arxiv.org/abs/2108.12630
281 | - Code: https://github.com/xueyee/GroupFormer
282 |
283 | **Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction**
284 |
285 | - Paper: https://arxiv.org/abs/2109.00512
286 | - Code: https://github.com/facebookresearch/co3d
287 | - Dataset: https://github.com/facebookresearch/co3d
288 |
289 | **Voxel Transformer for 3D Object Detection**
290 |
291 | - Paper: https://arxiv.org/abs/2109.02497
292 | - Code: None
293 |
294 | **3D Human Texture Estimation from a Single Image with Transformers**
295 |
296 | - Homepage: https://www.mmlab-ntu.com/project/texformer/
297 | - Paper(Oral): https://arxiv.org/abs/2109.02563
298 | - Code: None
299 |
300 | **FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting**
301 |
302 | - Paper: https://arxiv.org/abs/2109.02974
303 | - Code: https://github.com/ruiliu-ai/FuseFormer
304 |
305 | **CTRL-C: Camera calibration TRansformer with Line-Classification**
306 |
307 | - Paper: https://arxiv.org/abs/2109.02259
308 | - Code: https://github.com/jwlee-vcl/CTRL-C
309 |
310 | **An End-to-End Transformer Model for 3D Object Detection**
311 |
312 | - Homepage: https://facebookresearch.github.io/3detr/
313 | - Paper: https://arxiv.org/abs/2109.08141
314 | - Code: https://github.com/facebookresearch/3detr
315 |
316 | **Eformer: Edge Enhancement based Transformer for Medical Image Denoising**
317 |
318 | - Paper: https://arxiv.org/abs/2109.08044
319 | - Code: None
320 |
321 | **PnP-DETR: Towards Efficient Visual Analysis with Transformers**
322 |
323 | - Paper: https://arxiv.org/abs/2109.07036
324 | - Code: https://github.com/twangnh/pnp-detr
325 |
326 | **Transformer-based Dual Relation Graph for Multi-label Image Recognition**
327 |
328 | - Paper: https://arxiv.org/abs/2110.04722
329 | - Code: None
330 |
331 |
332 |
# Performance-Boosting Tricks
334 |
335 | **FaPN: Feature-aligned Pyramid Network for Dense Image Prediction**
336 |
- Paper: https://arxiv.org/abs/2108.07058
- Code: https://github.com/EMI-Group/FaPN
339 |
340 | **Unifying Nonlocal Blocks for Neural Networks**
341 |
342 | - Paper: https://arxiv.org/abs/2108.02451
343 | - Code: https://github.com/zh460045050/SNL_ICCV2021
344 |
345 | **Towards Learning Spatially Discriminative Feature Representations**
346 |
347 | - Paper: https://arxiv.org/abs/2109.01359
348 | - Code: None
349 |
350 |
351 |
352 | # GAN
353 |
354 | **Labels4Free: Unsupervised Segmentation using StyleGAN**
355 |
356 | - Homepage: https://rameenabdal.github.io/Labels4Free/
357 | - Paper: https://arxiv.org/abs/2103.14968
358 |
359 | **GNeRF: GAN-based Neural Radiance Field without Posed Camera**
360 |
- Paper(Oral): https://arxiv.org/abs/2103.15606
- Code: https://github.com/MQ66/gnerf
364 |
365 | **EigenGAN: Layer-Wise Eigen-Learning for GANs**
366 |
367 | - Paper: https://arxiv.org/abs/2104.12476
368 | - Code: https://github.com/LynnHo/EigenGAN-Tensorflow
369 |
370 | **From Continuity to Editability: Inverting GANs with Consecutive Images**
371 |
372 | - Paper: https://arxiv.org/abs/2107.13812
373 | - Code: https://github.com/Qingyang-Xu/InvertingGANs_with_ConsecutiveImgs
374 |
375 | **Sketch Your Own GAN**
376 |
377 | - Homepage: https://peterwang512.github.io/GANSketching/
378 | - Paper: https://arxiv.org/abs/2108.02774
- Code: https://github.com/peterwang512/GANSketching
380 |
381 | **Manifold Matching via Deep Metric Learning for Generative Modeling**
382 |
383 | - Paper: https://arxiv.org/abs/2106.10777
384 | - Code: https://github.com/dzld00/pytorch-manifold-matching
385 |
386 | **Dual Projection Generative Adversarial Networks for Conditional Image Generation**
387 |
388 | - Paper: https://arxiv.org/abs/2108.09016
389 | - Code: None
390 |
391 | **GAN Inversion for Out-of-Range Images with Geometric Transformations**
392 |
393 | - Paper: https://arxiv.org/abs/2108.08998
394 | - Code: None
395 |
396 | **ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement**
397 |
398 | - Homepage: https://yuval-alaluf.github.io/restyle-encoder/
399 | - Paper: https://arxiv.org/abs/2104.02699
400 | - Code: https://github.com/yuval-alaluf/restyle-encoder
401 |
402 | **StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery**
403 |
404 | - Paper(Oral): https://arxiv.org/abs/2103.17249
405 | - Code: https://github.com/orpatashnik/StyleCLIP
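
StyleCLIP drives StyleGAN edits with a CLIP similarity loss between the generated image and a text prompt. Below is a hedged sketch of just that scoring step using OpenAI's `clip` package (the package, model name, and `face.png` input are illustrative assumptions, not the authors' code):

```python
# Hedged sketch: score an image against a text prompt with OpenAI's CLIP,
# the similarity signal StyleCLIP optimizes. "face.png" is a placeholder.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("face.png")).unsqueeze(0).to(device)
text = clip.tokenize(["a face with blond hair"]).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)  # image-to-text similarity
print(logits_per_image)
```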
406 |
407 | **Image Synthesis via Semantic Composition**
408 |
409 | - Homepage: https://shepnerd.github.io/scg/
410 | - Paper: https://arxiv.org/abs/2109.07053
411 | - Code: https://github.com/dvlab-research/SCGAN
412 |
413 |
414 |
415 | # NAS
416 |
417 | **AutoFormer: Searching Transformers for Visual Recognition**
418 |
419 | - Paper: https://arxiv.org/abs/2107.00651
420 | - Code: https://github.com/microsoft/AutoML
421 |
422 | **BN-NAS: Neural Architecture Search with Batch Normalization**
423 |
424 | - Paper: https://arxiv.org/abs/2108.07375
425 | - Code: https://github.com/bychen515/BNNAS
426 |
427 | **Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition**
428 |
429 | - Paper: https://arxiv.org/abs/2102.01063
430 | - Code: https://github.com/idstcv/ZenNAS
431 |
432 |
433 |
434 | # NeRF
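
Common to the NeRF papers below is a coordinate MLP fed with Fourier-feature (positional) encodings of 3D points. A minimal generic sketch of that encoding, not taken from any listed repo:

```python
# Minimal generic sketch of NeRF-style positional encoding: map 3D points
# to sin/cos Fourier features so an MLP can fit high-frequency detail.
import math
import torch

def positional_encoding(x: torch.Tensor, num_freqs: int = 10) -> torch.Tensor:
    """Map (..., D) coordinates to (..., 2 * num_freqs * D) features."""
    freqs = (2.0 ** torch.arange(num_freqs)) * math.pi  # 2^k * pi
    angles = x[..., None] * freqs                       # (..., D, num_freqs)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1).flatten(-2)

pts = torch.rand(4, 3)                 # four random 3D points
print(positional_encoding(pts).shape)  # torch.Size([4, 60])
```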
435 |
436 | **GNeRF: GAN-based Neural Radiance Field without Posed Camera**
437 |
- Paper(Oral): https://arxiv.org/abs/2103.15606
- Code: https://github.com/MQ66/gnerf
441 |
442 | **KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs**
443 |
- Paper: https://arxiv.org/abs/2103.13744
- Code: https://github.com/creiser/kilonerf
447 |
448 | **In-Place Scene Labelling and Understanding with Implicit Scene Representation**
449 |
450 | - Homepage: https://shuaifengzhi.com/Semantic-NeRF/
451 | - Paper(Oral): https://arxiv.org/abs/2103.15875
452 |
453 | **Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis**
454 |
455 | - Homepage: https://ajayj.com/dietnerf
456 | - Paper(DietNeRF): https://arxiv.org/abs/2104.00677
457 |
458 | **BARF: Bundle-Adjusting Neural Radiance Fields**
459 |
460 | - Homepage: https://chenhsuanlin.bitbucket.io/bundle-adjusting-NeRF/
461 | - Paper(Oral): https://arxiv.org/abs/2104.06405
462 | - Code: https://github.com/chenhsuanlin/bundle-adjusting-NeRF
463 |
464 | **Self-Calibrating Neural Radiance Fields**
465 |
466 | - Paper: https://arxiv.org/abs/2108.13826
467 | - Code: https://github.com/POSTECH-CVLab/SCNeRF
468 |
469 | **Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction**
470 |
471 | - Paper: https://arxiv.org/abs/2109.00512
472 | - Code: https://github.com/facebookresearch/co3d
473 | - Dataset: https://github.com/facebookresearch/co3d
474 |
475 | **Neural Articulated Radiance Field**
476 |
477 | - Paper: https://arxiv.org/abs/2104.03110
478 | - Code: https://github.com/nogu-atsu/NARF
479 |
480 | **NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo**
481 |
482 | - Paper(Oral): https://arxiv.org/abs/2109.01129
483 | - Code: https://github.com/weiyithu/NerfingMVS
484 |
485 | **SNARF: Differentiable Forward Skinning for Animating Non-rigid Neural Implicit Shapes**
486 |
487 | - Homepage: https://xuchen-ethz.github.io/snarf
488 | - Paper: https://arxiv.org/abs/2104.03953
489 | - Code: https://github.com/xuchen-ethz/snarf
490 |
491 | **CodeNeRF: Disentangled Neural Radiance Fields for Object Categories**
492 |
493 | - Paper: https://arxiv.org/abs/2109.01750
494 | - Code: https://github.com/wayne1123/code-nerf
495 |
496 | **PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering**
497 |
498 | - Paper: https://openaccess.thecvf.com/content/ICCV2021/html/Ren_PIRenderer_Controllable_Portrait_Image_Generation_via_Semantic_Neural_Rendering_ICCV_2021_paper.html
499 | - Code: https://github.com/RenYurui/PIRender
500 |
501 |
502 |
503 | # Loss
504 |
505 | **Rank & Sort Loss for Object Detection and Instance Segmentation**
506 |
507 | - Paper(Oral): https://arxiv.org/abs/2107.11669
508 | - Code: https://github.com/kemaloksuz/RankSortLoss
509 |
510 | **Bias Loss for Mobile Neural Networks**
511 |
512 | - Paper: https://arxiv.org/abs/2107.11170
513 | - Code: None
514 |
515 | **A Robust Loss for Point Cloud Registration**
516 |
517 | - Paper: https://arxiv.org/abs/2108.11682
518 | - Code: None
519 |
520 | **Reconcile Prediction Consistency for Balanced Object Detection**
521 |
522 | - Paper: https://arxiv.org/abs/2108.10809
523 | - Code: None
524 |
525 | **Influence-Balanced Loss for Imbalanced Visual Classification**
526 |
527 | - Paper: https://arxiv.org/abs/2110.02444
528 | - Code: https://github.com/pseulki/IB-Loss
529 |
530 |
531 |
532 | # Zero-Shot Learning
533 |
534 | **FREE: Feature Refinement for Generalized Zero-Shot Learning**
535 |
536 | - Paper: https://arxiv.org/abs/2107.13807
537 | - Code: https://github.com/shiming-chen/FREE
538 |
539 | **Discriminative Region-based Multi-Label Zero-Shot Learning**
540 |
541 | - Paper: https://arxiv.org/abs/2108.09301
- Code: None
543 |
544 | **Semantics Disentangling for Generalized Zero-Shot Learning**
545 |
- Paper: https://arxiv.org/abs/2101.07978
547 | - Code: https://github.com/uqzhichen/SDGZSL
548 |
549 |
550 |
551 | # Few-Shot Learning
552 |
553 | **Relational Embedding for Few-Shot Classification**
554 |
- Paper: https://arxiv.org/abs/2108.09666
556 | - Code: https://github.com/dahyun-kang/renet
557 |
558 | **Few-Shot and Continual Learning with Attentive Independent Mechanisms**
559 |
560 | - Paper: https://arxiv.org/abs/2107.14053
561 | - Code: https://github.com/huang50213/AIM-Fewshot-Continual
562 |
563 | **Few Shot Visual Relationship Co-Localization**
564 |
- Homepage: https://vl2g.github.io/projects/vrc/
- Paper: https://arxiv.org/abs/2108.11618
568 |
569 |
570 |
# Long-tailed
572 |
573 | **Parametric Contrastive Learning**
574 |
575 | - Paper: https://arxiv.org/abs/2107.12028
576 | - Code: https://github.com/jiequancui/Parametric-Contrastive-Learning
577 |
578 | **Influence-Balanced Loss for Imbalanced Visual Classification**
579 |
580 | - Paper: https://arxiv.org/abs/2110.02444
581 | - Code: https://github.com/pseulki/IB-Loss
582 |
583 |
584 |
585 | # Vision and Language
586 |
587 | **VLGrammar: Grounded Grammar Induction of Vision and Language**
588 |
589 | - Paper: https://arxiv.org/abs/2103.12975
590 | - Code: https://github.com/evelinehong/VLGrammar
591 |
592 |
593 |
# Un-/Self-Supervised Learning
595 |
596 | **An Empirical Study of Training Self-Supervised Vision Transformers**
597 |
598 | - Paper(Oral): https://arxiv.org/abs/2104.02057
599 | - MoCo v3 Code: None
600 |
601 | **DetCo: Unsupervised Contrastive Learning for Object Detection**
602 |
603 | - Paper: https://arxiv.org/abs/2102.04803
604 | - Code: https://github.com/xieenze/DetCo
605 |
606 | **Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization**
607 |
608 | - Paper: https://arxiv.org/abs/2108.02183
609 | - Code: None
610 |
611 | **Improving Contrastive Learning by Visualizing Feature Transformation**
612 |
613 | - Paper(Oral): https://arxiv.org/abs/2108.02982
614 | - Code: https://github.com/DTennant/CL-Visualizing-Feature-Transformation
615 |
616 | **Self-Supervised Visual Representations Learning by Contrastive Mask Prediction**
617 |
618 | - Paper: https://arxiv.org/abs/2108.08012
619 | - Code: None
620 |
621 | **Temporal Knowledge Consistency for Unsupervised Visual Representation Learning**
622 |
623 | - Paper: https://arxiv.org/abs/2108.10668
624 | - Code: None
625 |
626 | **MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving**
627 |
628 | - Paper: https://arxiv.org/abs/2108.12178
629 | - Code: https://github.com/KaiChen1998/MultiSiam
630 |
631 | **Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds**
632 |
633 | - Homepage: https://siyuanhuang.com/STRL/
634 | - Paper: https://arxiv.org/abs/2109.00179
635 | - Code: https://github.com/yichen928/STRL
636 |
637 | **Self-supervised Product Quantization for Deep Unsupervised Image Retrieval**
638 |
639 | - Paper: https://arxiv.org/abs/2109.02244
640 | - Code: https://github.com/youngkyunJang/SPQ
641 |
642 | **Self-Supervised Representation Learning from Flow Equivariance**
643 |
644 | - Paper: https://arxiv.org/abs/2101.06553
645 | - Code: None
646 |
647 |
648 |
# Multi-Label Image Recognition
650 |
651 | **Residual Attention: A Simple but Effective Method for Multi-Label Recognition**
652 |
653 | - Paper: https://arxiv.org/abs/2108.02456
654 | - Code: https://github.com/Kevinz-code/CSRA
655 |
656 |
657 |
# 2D Object Detection
659 |
660 | **DetCo: Unsupervised Contrastive Learning for Object Detection**
661 |
662 | - Paper: https://arxiv.org/abs/2102.04803
663 | - Code: https://github.com/xieenze/DetCo
664 |
665 | **Detecting Invisible People**
666 |
667 | - Homepage: http://www.cs.cmu.edu/~tkhurana/invisible.htm
- Paper: https://arxiv.org/abs/2012.08419
669 |
670 | **Active Learning for Deep Object Detection via Probabilistic Modeling**
671 |
672 | - Paper: https://arxiv.org/abs/2103.16130
673 | - Code: None
674 |
675 | **Conditional Variational Capsule Network for Open Set Recognition**
676 |
677 | - Paper: https://arxiv.org/abs/2104.09159
678 | - Code: https://github.com/guglielmocamporese/cvaecaposr
679 |
**MDETR: Modulated Detection for End-to-End Multi-Modal Understanding**
681 |
682 | - Homepage: https://ashkamath.github.io/mdetr_page/
683 | - Paper(Oral): https://arxiv.org/abs/2104.12763
684 | - Code: https://github.com/ashkamath/mdetr
685 |
686 | **Rank & Sort Loss for Object Detection and Instance Segmentation**
687 |
688 | - Paper(Oral): https://arxiv.org/abs/2107.11669
689 | - Code: https://github.com/kemaloksuz/RankSortLoss
690 |
691 | **SimROD: A Simple Adaptation Method for Robust Object Detection**
692 |
693 | - Paper(Oral): https://arxiv.org/abs/2107.13389
694 | - Code: None
695 |
696 | **GraphFPN: Graph Feature Pyramid Network for Object Detection**
697 |
698 | - Paper: https://arxiv.org/abs/2108.00580
699 | - Code: None
700 |
701 | **Fast Convergence of DETR with Spatially Modulated Co-Attention**
702 |
703 | - Paper: https://arxiv.org/abs/2101.07448
704 | - Code: https://github.com/abc403/SMCA-replication
705 |
706 | **Conditional DETR for Fast Training Convergence**
707 |
708 | - Paper: https://arxiv.org/abs/2108.06152
709 | - Code: https://github.com/Atten4Vis/ConditionalDETR
710 |
711 | **TOOD: Task-aligned One-stage Object Detection**
712 |
713 | - Paper(Oral): https://arxiv.org/abs/2108.07755
714 | - Code: https://github.com/fcjian/TOOD
715 |
716 | **Reconcile Prediction Consistency for Balanced Object Detection**
717 |
- Paper: https://arxiv.org/abs/2108.10809
- Code: None
721 |
722 | **Mutual Supervision for Dense Object Detection**
723 |
724 | - Paper: https://arxiv.org/abs/2109.05986
725 | - Code: https://github.com/MCG-NJU/MuSu-Detection
726 |
727 | **PnP-DETR: Towards Efficient Visual Analysis with Transformers**
728 |
729 | - Paper: https://arxiv.org/abs/2109.07036
730 | - Code: https://github.com/twangnh/pnp-detr
731 |
732 | **Deep Structured Instance Graph for Distilling Object Detectors**
733 |
- Paper: https://arxiv.org/abs/2109.12862
- Code: https://github.com/dvlab-research/Dsig
737 |
## Semi-Supervised Object Detection
739 |
740 | **End-to-End Semi-Supervised Object Detection with Soft Teacher**
741 |
742 | - Paper: https://arxiv.org/abs/2106.09018
743 | - Code: None
744 |
## Oriented Object Detection
746 |
747 | **Oriented R-CNN for Object Detection**
748 |
749 | - Paper: https://arxiv.org/abs/2108.05699
750 | - Code: https://github.com/jbwang1997/OBBDetection
751 |
## Few-Shot Object Detection
753 |
754 | **DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection**
755 |
756 | - Paper: https://arxiv.org/abs/2108.09017
757 | - Code: https://github.com/er-muyue/DeFRCN
758 |
759 |
760 |
# Semantic Segmentation
762 |
763 | **Personalized Image Semantic Segmentation**
764 |
765 | - Paper: https://arxiv.org/abs/2107.13978
766 | - Code: https://github.com/zhangyuygss/PIS
767 | - Dataset: https://github.com/zhangyuygss/PIS
768 |
769 | **Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation**
770 |
771 | - Paper(Oral): https://arxiv.org/abs/2107.11264
- Code: https://github.com/shjung13/Standardized-max-logits
773 |
774 | **Enhanced Boundary Learning for Glass-like Object Segmentation**
775 |
776 | - Paper: https://arxiv.org/abs/2103.15734
777 | - Code: https://github.com/hehao13/EBLNet
778 |
779 | **Self-Regulation for Semantic Segmentation**
780 |
781 | - Paper: https://arxiv.org/abs/2108.09702
782 | - Code: https://github.com/dongzhang89/SR-SS
783 |
784 | **Mining Contextual Information Beyond Image for Semantic Segmentation**
785 |
786 | - Paper: https://arxiv.org/abs/2108.11819
787 | - Code: https://github.com/CharlesPikachu/mcibi
788 |
794 | **ISNet: Integrate Image-Level and Semantic-Level Context for Semantic Segmentation**
795 |
796 | - Paper: https://arxiv.org/abs/2108.12382
797 | - Code: https://github.com/SegmentationBLWX/sssegmentation
798 |
799 | **Scaling up instance annotation via label propagation**
800 |
801 | - Homepage: http://scaling-anno.csail.mit.edu/
802 | - Paper: https://arxiv.org/abs/2110.02277
803 | - Code: None
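
The papers in this section are compared primarily by mean IoU. A minimal generic sketch of the metric, not taken from any listed repo:

```python
# Minimal generic sketch: mean intersection-over-union (mIoU), the standard
# semantic segmentation metric, averaged over classes present in the union.
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:              # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.random.randint(0, 19, (512, 512))  # dummy label maps, 19 classes
gt = np.random.randint(0, 19, (512, 512))
print(mean_iou(pred, gt, num_classes=19))
```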
804 |
## Unsupervised Domain Adaptation for Semantic Segmentation
806 |
807 | **Multi-Anchor Active Domain Adaptation for Semantic Segmentation**
808 |
809 | - Paper(Oral): https://arxiv.org/abs/2108.08012
810 | - Code: https://github.com/munanning/MADA
811 |
812 | **Generalize then Adapt: Source-Free Domain Adaptive Semantic Segmentation**
813 |
814 | - Homepage: https://sites.google.com/view/sfdaseg
815 | - Paper: https://arxiv.org/abs/2108.11249
816 |
## Few-Shot Semantic Segmentation
818 |
819 | **Learning Meta-class Memory for Few-Shot Semantic Segmentation**
820 |
- Paper: https://arxiv.org/abs/2108.02958
822 | - Code: None
823 |
824 | **Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer**
825 |
826 | - Paper: https://arxiv.org/abs/2108.03032
827 | - Code: https://github.com/zhiheLu/CWT-for-FSS
828 |
## Semi-Supervised Semantic Segmentation
830 |
831 | **Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation**
832 |
833 | - Paper: https://arxiv.org/abs/2107.11787
834 | - Code: None
835 |
836 | **Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation**
837 |
838 | - Paper(Oral): https://arxiv.org/abs/2107.11279
839 | - Code: https://github.com/CVMI-Lab/DARS
840 |
841 | **Pixel Contrastive-Consistent Semi-Supervised Semantic Segmentation**
842 |
843 | - Paper: https://arxiv.org/abs/2108.09025
844 | - Code: None
845 |
## Weakly Supervised Semantic Segmentation
847 |
848 | **Complementary Patch for Weakly Supervised Semantic Segmentation**
849 |
850 | - Paper: https://arxiv.org/abs/2108.03852
851 | - Code: None
852 |
## Unsupervised Segmentation
854 |
855 | **Labels4Free: Unsupervised Segmentation using StyleGAN**
856 |
857 | - Homepage: https://rameenabdal.github.io/Labels4Free/
858 | - Paper: https://arxiv.org/abs/2103.14968
859 |
860 |
861 |
# Instance Segmentation
863 |
864 | **Instances as Queries**
865 |
866 | - Paper: https://arxiv.org/abs/2105.01928
867 | - Code: https://github.com/hustvl/QueryInst
868 |
869 | **Crossover Learning for Fast Online Video Instance Segmentation**
870 |
871 | - Paper: https://arxiv.org/abs/2104.05970
872 | - Code: https://github.com/hustvl/CrossVIS
873 |
874 | **Rank & Sort Loss for Object Detection and Instance Segmentation**
875 |
876 | - Paper(Oral): https://arxiv.org/abs/2107.11669
877 | - Code: https://github.com/kemaloksuz/RankSortLoss
878 |
879 | **SOTR: Segmenting Objects with Transformers**
880 |
881 | - Paper: https://arxiv.org/abs/2108.06747
882 | - Code: https://github.com/easton-cau/SOTR
883 |
884 | **Scaling up instance annotation via label propagation**
885 |
886 | - Homepage: http://scaling-anno.csail.mit.edu/
887 | - Paper: https://arxiv.org/abs/2110.02277
888 | - Code: None
889 |
890 |
891 |
# Medical Image Segmentation
893 |
894 | **Recurrent Mask Refinement for Few-Shot Medical Image Segmentation**
895 |
896 | - Paper: https://arxiv.org/abs/2108.00622
897 | - Code: https://github.com/uci-cbcl/RP-Net
898 |
899 |
900 |
# Video Object Segmentation
902 |
903 | **Hierarchical Memory Matching Network for Video Object Segmentation**
904 |
905 | - Paper: https://arxiv.org/abs/2109.11404
906 | - Code: https://github.com/Hongje/HMMN
907 |
908 | **Full-Duplex Strategy for Video Object Segmentation**
909 |
910 | - Homepage: http://dpfan.net/FSNet/
911 | - Paper: https://arxiv.org/abs/2108.03151
912 | - Code: https://github.com/GewelsJI/FSNet
913 |
914 | **Joint Inductive and Transductive Learning for Video Object Segmentation**
915 |
916 | - Paper: https://arxiv.org/abs/2108.03679
917 | - Code: https://github.com/maoyunyao/JOINT
918 |
919 |
920 |
921 | # Few-shot Segmentation
922 |
923 | **Mining Latent Classes for Few-shot Segmentation**
924 |
925 | - Paper(Oral): https://arxiv.org/abs/2103.15402
926 | - Code: https://github.com/LiheYoung/MiningFSS
927 |
928 |
929 |
# Human Motion Segmentation
931 |
932 | **Graph Constrained Data Representation Learning for Human Motion Segmentation**
933 |
934 | - Paper: https://arxiv.org/abs/2107.13362
935 | - Code: None
936 |
937 |
938 |
# Object Tracking
940 |
941 | **Learning to Track Objects from Unlabeled Videos**
942 |
943 | - Paper: https://arxiv.org/abs/2108.12711
944 | - Code: https://github.com/VISION-SJTU/USOT
945 |
946 | **Learning Spatio-Temporal Transformer for Visual Tracking**
947 |
948 | - Paper: https://arxiv.org/abs/2103.17154
949 | - Code: https://github.com/researchmm/Stark
950 |
951 | **Learning to Adversarially Blur Visual Object Tracking**
952 |
953 | - Paper: https://arxiv.org/abs/2107.12085
954 | - Code: https://github.com/tsingqguo/ABA
955 |
956 | **HiFT: Hierarchical Feature Transformer for Aerial Tracking**
957 |
958 | - Paper: https://arxiv.org/abs/2108.00202
959 | - Code: https://github.com/vision4robotics/HiFT
960 |
961 | **Learn to Match: Automatic Matching Network Design for Visual Tracking**
962 |
963 | - Paper: https://arxiv.org/abs/2108.00803
964 | - Code: https://github.com/JudasDie/SOTS
965 |
966 | **Saliency-Associated Object Tracking**
967 |
968 | - Paper: https://arxiv.org/abs/2108.03637
- Code: https://github.com/ZikunZhou/SAOT
970 |
## RGBD Object Tracking
972 |
973 | **DepthTrack: Unveiling the Power of RGBD Tracking**
974 |
975 | - Paper: https://arxiv.org/abs/2108.13962
976 | - Code: https://github.com/xiaozai/DeT
977 | - Dataset: https://github.com/xiaozai/DeT
978 |
979 |
980 |
981 | # 3D Point Cloud
982 |
983 | **Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds**
984 |
985 | - Homepage: https://siyuanhuang.com/STRL/
- Paper: https://arxiv.org/abs/2109.00179
- Code: https://github.com/yichen928/STRL
989 |
990 | **Unsupervised Point Cloud Pre-Training via View-Point Occlusion, Completion**
991 |
992 | - Homepage: https://hansen7.github.io/OcCo/
993 | - Paper: https://arxiv.org/abs/2010.01089
994 | - Code: https://github.com/hansen7/OcCo
995 |
996 | **DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation**
997 |
998 | - Paper: https://arxiv.org/abs/2108.04023
999 | - Code: None
1000 |
1001 | **Adaptive Graph Convolution for Point Cloud Analysis**
1002 |
1003 | - Paper: https://arxiv.org/abs/2108.08035
- Code: https://github.com/hrzhou2/AdaptConv-master
1010 |
1011 |
1012 |
# 3D Object Detection
1014 |
1015 | **Group-Free 3D Object Detection via Transformers**
1016 |
1017 | - Paper: https://arxiv.org/abs/2104.00678
1018 | - Code: None
1019 |
1020 | **Improving 3D Object Detection with Channel-wise Transformer**
1021 |
1022 | - Paper: https://arxiv.org/abs/2108.10723
1023 | - Code: https://github.com/hlsheng1/CT3D
1024 |
1025 | **AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection**
1026 |
1027 | - Paper: https://arxiv.org/abs/2108.11127
1028 | - Code: https://github.com/zongdai/AutoShape
1029 |
1030 | **4D-Net for Learned Multi-Modal Alignment**
1031 |
1032 | - Paper: https://arxiv.org/abs/2109.01066
1033 | - Code: None
1034 |
1035 | **Voxel Transformer for 3D Object Detection**
1036 |
1037 | - Paper: https://arxiv.org/abs/2109.02497
1038 | - Code: None
1039 |
1040 | **Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection**
1041 |
1042 | - Paper: https://arxiv.org/abs/2109.02499
1043 | - Code: None
1044 |
1045 | **An End-to-End Transformer Model for 3D Object Detection**
1046 |
1047 | - Homepage: https://facebookresearch.github.io/3detr/
1048 | - Paper: https://arxiv.org/abs/2109.08141
1049 | - Code: https://github.com/facebookresearch/3detr
1050 |
1051 | **RangeDet: In Defense of Range View for LiDAR-based 3D Object Detection**
1052 |
1053 | - Paper: https://arxiv.org/abs/2103.10039
1054 | - Code: https://github.com/TuSimple/RangeDet
1055 |
1056 | **Geometry-based Distance Decomposition for Monocular 3D Object Detection**
1057 |
1058 | - Paper: https://arxiv.org/abs/2104.03775
1059 | - Code: https://github.com/Rock-100/MonoDet
1060 |
1061 |
1062 |
# 3D Semantic Segmentation
1064 |
1065 | **ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation**
1066 |
1067 | - Paper: https://arxiv.org/abs/2107.11769
1068 | - Code: None
1069 |
1070 | **Learning with Noisy Labels for Robust Point Cloud Segmentation**
1071 |
1072 | - Homepage: https://shuquanye.com/PNAL_website/
1073 | - Paper(Oral): https://arxiv.org/abs/2107.14230
1074 |
1075 | **VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation**
1076 |
1077 | - Paper(Oral): https://arxiv.org/abs/2107.13824
1078 | - Code: https://github.com/hzykent/VMNet
1079 |
1080 | **Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal Learning in Domain Adaptation for 3D Semantic Segmentation**
1081 |
1082 | - Paper: https://arxiv.org/abs/2107.14724
1083 | - Code: https://github.com/leolyj/DsCML
1084 |
1085 | **DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation**
1086 |
1087 | - Paper: https://arxiv.org/abs/2108.04023
1088 | - Code: None
1089 |
1090 | **Adaptive Graph Convolution for Point Cloud Analysis**
1091 |
1092 | - Paper: https://arxiv.org/abs/2108.08035
1093 | - Code: https://github.com/hrzhou2/AdaptConv-master
1094 |
1095 | **Perception-Aware Multi-Sensor Fusion for 3D LiDAR Semantic Segmentation**
1096 |
- Paper: https://arxiv.org/abs/2106.15277
- Code: https://github.com/ICEORY/PMF
1100 |
1101 |
1102 |
# 3D Instance Segmentation
1104 |
1105 | **Hierarchical Aggregation for 3D Instance Segmentation**
1106 |
1107 | - Paper: https://arxiv.org/abs/2108.02350
1108 | - Code: https://github.com/hustvl/HAIS
1109 |
1110 | **Instance Segmentation in 3D Scenes Using Semantic Superpoint Tree Networks**
1111 |
- Paper: https://openaccess.thecvf.com/content/ICCV2021/html/Liang_Instance_Segmentation_in_3D_Scenes_Using_Semantic_Superpoint_Tree_Networks_ICCV_2021_paper.html
- Code: https://github.com/Gorilla-Lab-SCUT/SSTNet
1115 |
1116 |
1117 |
# 3D Multi-Object Tracking
1119 |
1120 | **Exploring Simple 3D Multi-Object Tracking for Autonomous Driving**
1121 |
1122 | - Paper: https://arxiv.org/abs/2108.10312
1123 | - Code: https://github.com/qcraftai/simtrack
1124 |
1125 |
1126 |
# Point Cloud Denoising
1128 |
1129 | **Score-Based Point Cloud Denoising**
1130 |
1131 | - Paper: https://arxiv.org/abs/2107.10981
1132 | - Code: None
1133 |
1134 |
1135 |
# Point Cloud Registration
1137 |
1138 | **HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration**
1139 |
1140 | - Homepage: https://ispc-group.github.io/hregnet
1141 | - Paper: https://arxiv.org/abs/2107.11992
1142 | - Code: https://github.com/ispc-lab/HRegNet
1143 |
1144 | **A Robust Loss for Point Cloud Registration**
1145 |
1146 | - Paper: https://arxiv.org/abs/2108.11682
1147 | - Code: None
1148 |
1149 |
1150 |
# Point Cloud Completion
1152 |
1153 | **PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers**
1154 |
1155 | - Paper(Oral): https://arxiv.org/abs/2108.08839
1156 | - Code: https://github.com/yuxumin/PoinTr
1157 |
1158 | **SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer**
1159 |
1160 | - Paper: https://arxiv.org/abs/2108.04444
1161 | - Code: https://github.com/AllenXiangX/SnowflakeNet
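
Completion methods such as PoinTr and SnowflakeNet are typically trained and evaluated with the Chamfer distance between predicted and ground-truth clouds. A minimal generic sketch of one common (L2, non-squared) variant, not taken from the listed repos:

```python
# Minimal generic sketch: symmetric Chamfer distance between two point
# clouds, a standard loss/metric for point cloud completion.
import torch

def chamfer_distance(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """a: (N, 3), b: (M, 3) -> scalar symmetric Chamfer distance."""
    d = torch.cdist(a, b)  # (N, M) pairwise Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

pred = torch.rand(2048, 3)
gt = torch.rand(2048, 3)
print(chamfer_distance(pred, gt))
```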
1162 |
1163 |
1164 |
# Radar Semantic Segmentation
1166 |
1167 | **Multi-View Radar Semantic Segmentation**
1168 |
1169 | - Paper: https://arxiv.org/abs/2103.16214
1170 | - Code: https://github.com/valeoai/MVRSS
1171 |
1172 |
1173 |
# Image Restoration
1175 |
1176 | **Dynamic Attentive Graph Learning for Image Restoration**
1177 |
1178 | - Paper: https://arxiv.org/abs/2109.06620
1179 | - Code: https://github.com/jianzhangcs/DAGL
1180 |
1181 |
1182 |
# Super-Resolution
1184 |
1185 | **Learning for Scale-Arbitrary Super-Resolution from Scale-Specific Networks**
1186 |
1187 | - Paper: https://arxiv.org/abs/2004.03791
1188 | - Code: https://github.com/LongguangWang/ArbSR
1189 |
1190 | **Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution**
1191 |
1192 | - Paper: https://arxiv.org/abs/2108.05302
1193 | - Code: https://github.com/JingyunLiang/MANet
1194 |
1195 | **Deep Reparametrization of Multi-Frame Super-Resolution and Denoising**
1196 |
1197 | - Paper(Oral): https://arxiv.org/abs/2108.08286
1198 | - Code: None
1199 |
1200 | **Dual-Camera Super-Resolution with Aligned Attention Modules**
1201 |
1202 | - Homepage: https://tengfei-wang.github.io/Dual-Camera-SR/index.html
1203 | - Paper: https://arxiv.org/abs/2109.01349
1204 | - Code: https://github.com/Tengfei-Wang/DualCameraSR
1205 | - Dataset: https://tengfei-wang.github.io/Dual-Camera-SR/index.html
1206 |
1207 | **Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme**
1208 |
1209 | - Paper: https://www4.comp.polyu.edu.hk/~cslzhang/paper/ICCV21_RealVSR.pdf
1210 | - Code: https://github.com/IanYeung/RealVSR
1211 | - Dataset: https://github.com/IanYeung/RealVSR
1212 |
1213 |
1214 |
# Denoising
1216 |
1217 | **Deep Reparametrization of Multi-Frame Super-Resolution and Denoising**
1218 |
1219 | - Paper(Oral): https://arxiv.org/abs/2108.08286
1220 | - Code: None
1221 |
1222 | **Rethinking Deep Image Prior for Denoising**
1223 |
1224 | - Paper: https://arxiv.org/abs/2108.12841
1225 | - Code: https://github.com/gistvision/DIP-denosing
1226 |
1227 |
1228 |
# Medical Image Denoising
1230 |
1231 | **Eformer: Edge Enhancement based Transformer for Medical Image Denoising**
1232 |
1233 | - Paper: https://arxiv.org/abs/2109.08044
1234 | - Code: None
1235 |
1236 |
1237 |
# Deblurring
1239 |
1240 | **Rethinking Coarse-to-Fine Approach in Single Image Deblurring**
1241 |
1242 | - Paper: https://arxiv.org/abs/2108.05054
1243 | - Code: https://github.com/chosj95/MIMO-UNet
1244 |
1245 | **Single Image Defocus Deblurring Using Kernel-Sharing Parallel Atrous Convolutions**
1246 |
1247 | - Paper: https://arxiv.org/abs/2108.09108
1248 | - Code: None
1249 |
1250 |
1251 |
# Shadow Removal
1253 |
1254 | **CANet: A Context-Aware Network for Shadow Removal**
1255 |
1256 | - Paper: https://arxiv.org/abs/2108.09894
1257 | - Code: https://github.com/Zipei-Chen/CANet
1258 |
1259 |
1260 |
# Video Frame Interpolation
1262 |
1263 | **XVFI: eXtreme Video Frame Interpolation**
1264 |
1265 | - Paper(Oral): https://arxiv.org/abs/2103.16206
1266 | - Code: https://github.com/JihyongOh/XVFI
1267 | - Dataset: https://github.com/JihyongOh/XVFI
1268 |
1269 | **Asymmetric Bilateral Motion Estimation for Video Frame Interpolation**
1270 |
1271 | - Paper: https://arxiv.org/abs/2108.06815
1272 | - Code: https://github.com/JunHeum/ABME
1273 |
1274 |
1275 |
# Video Inpainting
1277 |
1278 | **FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting**
1279 |
1280 | - Paper: https://arxiv.org/abs/2109.02974
1281 | - Code: https://github.com/ruiliu-ai/FuseFormer
1282 |
1283 |
1284 |
# Person Re-identification
1286 |
1287 | **TransReID: Transformer-based Object Re-Identification**
1288 |
- Paper: https://arxiv.org/abs/2102.04378
- Code: https://github.com/heshuting555/TransReID
1292 |
1293 | **IDM: An Intermediate Domain Module for Domain Adaptive Person Re-ID**
1294 |
1295 | - Paper(Oral): https://arxiv.org/abs/2108.02413
1296 | - Code: https://github.com/SikaStar/IDM
1297 |
1298 |
1299 |
# Person Search
1301 |
1302 | **Weakly Supervised Person Search with Region Siamese Networks**
1303 |
1304 | - Paper: https://arxiv.org/abs/2109.06109
1305 | - Code: None
1306 |
1307 |
1308 |
# 2D/3D Human Pose Estimation
1310 |
## 2D Human Pose Estimation
1312 |
1313 | **Human Pose Regression with Residual Log-likelihood Estimation**
1314 |
1315 | - Paper(Oral): https://arxiv.org/abs/2107.11291
1316 | - Code(RLE): https://github.com/Jeff-sjtu/res-loglikelihood-regression
1317 |
1318 | **Online Knowledge Distillation for Efficient Pose Estimation**
1319 |
1320 | - Paper: https://arxiv.org/abs/2108.02092
1321 | - Code: None
1322 |
## 3D Human Pose Estimation
1324 |
1325 | **Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows**
1326 |
1327 | - Paper: https://arxiv.org/abs/2107.13788
1328 | - Code: https://github.com/twehrbein/Probabilistic-Monocular-3D-Human-Pose-Estimation-with-Normalizing-Flows
1329 |
1330 | **Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images**
1331 |
1332 | - Paper: https://arxiv.org/abs/2109.05885
1333 | - Code: None
1334 |
1335 |
1336 |
# 6D Object Pose Estimation
1338 |
1339 | **StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation**
1340 |
1341 | - Paper: https://arxiv.org/abs/2109.10115
1342 | - Code: None
1343 | - Dataset: None
1344 |
1345 |
1346 |
# 3D Head Reconstruction
1348 |
1349 | **H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction**
1350 |
- Homepage: https://crisalixsa.github.io/h3d-net/
- Paper: https://arxiv.org/abs/2107.12512
1354 |
1355 |
1356 |
# Face Recognition
1358 |
1359 | **SynFace: Face Recognition with Synthetic Data**
1360 |
1361 | - Paper: https://arxiv.org/abs/2108.07960
1362 | - Code: None
1363 |
1364 |
1365 |
# Facial Expression Recognition
1367 |
1368 | **TransFER: Learning Relation-aware Facial Expression Representations with Transformers**
1369 |
1370 | - Paper: https://arxiv.org/abs/2108.11116
1371 | - Code: None
1372 |
1373 |
1374 |
# Action Recognition
1376 |
1377 | **MGSampler: An Explainable Sampling Strategy for Video Action Recognition**
1378 |
1379 | - Paper: https://arxiv.org/abs/2104.09952
1380 | - Code: None
1381 |
1382 | **Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition**
1383 |
1384 | - Paper: https://arxiv.org/abs/2107.12213
1385 | - Code: https://github.com/Uason-Chen/CTR-GCN
1386 |
1387 | **Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization**
1388 |
1389 | - Paper: https://arxiv.org/abs/2108.02183
1390 | - Code: None
1391 |
1392 | **Dynamic Network Quantization for Efficient Video Inference**
1393 |
1394 | - Homepage: https://cs-people.bu.edu/sunxm/VideoIQ/project.html
1395 | - Paper: https://arxiv.org/abs/2108.10394
1396 | - Code: https://github.com/sunxm2357/VideoIQ
1397 |
1398 |
1399 |
# Temporal Action Localization
1401 |
1402 | **Enriching Local and Global Contexts for Temporal Action Localization**
1403 |
1404 | - Paper: https://arxiv.org/abs/2107.12960
1405 | - Code: None
1406 |
1407 |
1408 |
# Action Detection
1410 |
1411 | **Class Semantics-based Attention for Action Detection**
1412 |
1413 | - Paper: https://arxiv.org/abs/2109.02613
1414 | - Code: None
1415 |
1416 |
1417 |
# Group Activity Recognition
1419 |
1420 | **GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer**
1421 |
1422 | - Paper: https://arxiv.org/abs/2108.12630
1423 | - Code: https://github.com/xueyee/GroupFormer
1424 |
1425 |
1426 |
# Sign Language Recognition
1428 |
1429 | **Visual Alignment Constraint for Continuous Sign Language Recognition**
1430 |
1431 | - Paper: https://arxiv.org/abs/2104.02330
1432 | - Code: https://github.com/ycmin95/VAC_CSLR
1433 |
1434 |
1435 |
# Text Detection
1437 |
1438 | **Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection**
1439 |
1440 | - Paper: https://arxiv.org/abs/2107.12664
1441 | - Code: https://github.com/GXYM/TextBPN
1442 |
1443 |
1444 |
# Text Recognition
1446 |
1447 | **Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition**
1448 |
1449 | - Paper: https://arxiv.org/abs/2107.12090
1450 | - Code: None
1451 |
1452 |
1453 |
# Text Replacement
1455 |
1456 | **STRIVE: Scene Text Replacement In Videos**
1457 |
1458 | - Homepage: https://striveiccv2021.github.io/STRIVE-ICCV2021/
1459 | - Paper: https://arxiv.org/abs/2109.02762
- Code: https://github.com/striveiccv2021/STRIVE-ICCV2021/
- Dataset: https://github.com/striveiccv2021/STRIVE-ICCV2021/
1463 |
1464 |
1465 |
# Visual Question Answering (VQA)
1467 |
1468 | **Greedy Gradient Ensemble for Robust Visual Question Answering**
1469 |
- Paper: https://arxiv.org/abs/2107.12651
- Code: https://github.com/GeraldHan/GGE
1473 |
1474 |
1475 |
# Adversarial Attack
1477 |
1478 | **Feature Importance-aware Transferable Adversarial Attacks**
1479 |
1480 | - Paper: https://arxiv.org/abs/2107.14185
1481 | - Code: https://github.com/hcguoO0/FIA
1482 |
1483 | **AdvDrop: Adversarial Attack to DNNs by Dropping Information**
1484 |
1485 | - Paper: https://arxiv.org/abs/2108.09034
1486 | - Code: https://github.com/RjDuan/AdvDrop
1487 |
1488 |
1489 |
# Depth Estimation
1491 |
1492 | **Augmenting Depth Estimation with Geospatial Context**
1493 |
1494 | - Paper: https://arxiv.org/abs/2109.09879
1495 | - Code: None
1496 |
1497 | **NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo**
1498 |
1499 | - Paper(Oral): https://arxiv.org/abs/2109.01129
1500 | - Code: https://github.com/weiyithu/NerfingMVS
1501 |
## Monocular Depth Estimation
1503 |
1504 | **MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments**
1505 |
1506 | - Paper: https://arxiv.org/abs/2107.12429
1507 | - Code: None
1508 |
1509 | **Towards Interpretable Deep Networks for Monocular Depth Estimation**
1510 |
1511 | - Paper: https://arxiv.org/abs/2108.05312
1512 | - Code: https://github.com/youzunzhi/InterpretableMDE
1513 |
1514 | **Regularizing Nighttime Weirdness: Efficient Self-supervised Monocular Depth Estimation in the Dark**
1515 |
1516 | - Paper: https://arxiv.org/abs/2108.03830
1517 | - Code: https://github.com/w2kun/RNW
1518 |
1519 | **Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation**
1520 |
1521 | - Paper: https://arxiv.org/abs/2108.07628
1522 | - Code: https://github.com/LINA-lln/ADDS-DepthNet
1523 |
1524 | **StructDepth: Leveraging the structural regularities for self-supervised indoor depth estimation**
1525 |
1526 | - Paper: https://arxiv.org/abs/2108.08574
1527 | - Code: https://github.com/SJTU-ViSYS/StructDepth
1528 |
1529 |
1530 |
# Gaze Estimation
1532 |
1533 | **Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation**
1534 |
1535 | - Paper: https://arxiv.org/abs/2107.13780
1536 | - Code: https://github.com/DreamtaleCore/PnP-GA
1537 |
1538 |
1539 |
# Crowd Counting
1541 |
**Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework**
1543 |
1544 | - Paper(Oral): https://arxiv.org/abs/2107.12746
1545 | - Code(P2PNet): https://github.com/TencentYoutuResearch/CrowdCounting-P2PNet
1546 |
**Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting**
1548 |
1549 | - Paper: https://arxiv.org/abs/2107.12619
1550 | - Code: https://github.com/TencentYoutuResearch/CrowdCounting-UEPNet
1551 |
1552 |
1553 |
# Lane Detection
1555 |
1556 | **VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection**
1557 |
1558 | - Paper: https://arxiv.org/abs/2108.08482
- Code: https://github.com/yujun0-0/MMA-Net
- Dataset: https://github.com/yujun0-0/MMA-Net
1562 |
1563 |
1564 |
# Trajectory Prediction
1566 |
1567 | **Human Trajectory Prediction via Counterfactual Analysis**
1568 |
1569 | - Paper: https://arxiv.org/abs/2107.14202
1570 | - Code: https://github.com/CHENGY12/CausalHTP
1571 |
1572 | **Personalized Trajectory Prediction via Distribution Discrimination**
1573 |
1574 | - Paper: https://arxiv.org/abs/2107.14204
1575 | - Code: https://github.com/CHENGY12/DisDis
1576 |
1577 | **MG-GAN: A Multi-Generator Model Preventing Out-of-Distribution Samples in Pedestrian Trajectory Prediction**
1578 |
1579 | - Paper: https://arxiv.org/abs/2108.09274
1580 | - Code: https://github.com/selflein/MG-GAN
1581 |
1582 | **Social NCE: Contrastive Learning of Socially-aware Motion Representations**
1583 |
1584 | - Paper: https://arxiv.org/abs/2012.11717
1585 | - Code: https://github.com/vita-epfl/social-nce
1586 |
1587 | **Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving**
1588 |
1589 | - Paper: https://arxiv.org/abs/2109.01510
1590 | - Code: https://github.com/xrenaa/Safety-Aware-Motion-Prediction
1591 |
1592 | **Where are you heading? Dynamic Trajectory Prediction with Expert Goal Examples**
1593 |
1594 | - Paper: https://openaccess.thecvf.com/content/ICCV2021/papers/Zhao_Where_Are_You_Heading_Dynamic_Trajectory_Prediction_With_Expert_Goal_ICCV_2021_paper.pdf
1595 | - Code: https://github.com/JoeHEZHAO/expert_traj
1596 |
1597 |
1598 |
# Anomaly Detection
1600 |
1601 | **Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning**
1602 |
1603 | - Paper: https://arxiv.org/abs/2101.10030
1604 | - Code: https://github.com/tianyu0207/RTFM
1605 |
1606 |
1607 |
# Scene Graph Generation
1609 |
1610 | **Spatial-Temporal Transformer for Dynamic Scene Graph Generation**
1611 |
1612 | - Paper: https://arxiv.org/abs/2107.12309
1613 | - Code: None
1614 |
1615 |
1616 |
# Image Editing
1618 |
1619 | **Sketch Your Own GAN**
1620 |
1621 | - Homepage: https://peterwang512.github.io/GANSketching/
1622 | - Paper: https://arxiv.org/abs/2108.02774
- Code: https://github.com/peterwang512/GANSketching
1624 |
1625 |
1626 |
# Image Synthesis
1628 |
1629 | **Image Synthesis via Semantic Composition**
1630 |
1631 | - Homepage: https://shepnerd.github.io/scg/
1632 | - Paper: https://arxiv.org/abs/2109.07053
1633 | - Code: https://github.com/dvlab-research/SCGAN
1634 |
1635 |
1636 |
# Image Retrieval
1638 |
1639 | **Self-supervised Product Quantization for Deep Unsupervised Image Retrieval**
1640 |
1641 | - Paper: https://arxiv.org/abs/2109.02244
1642 | - Code: https://github.com/youngkyunJang/SPQ
1643 |
1644 |
1645 |
# 3D Reconstruction
1647 |
1648 | **Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction**
1649 |
- Paper: https://arxiv.org/abs/2109.00512
- Code: https://github.com/facebookresearch/co3d
1653 | - Dataset: https://github.com/facebookresearch/co3d
1654 |
1655 |
1656 |
# Video Stabilization
1658 |
1659 | **Out-of-boundary View Synthesis Towards Full-Frame Video Stabilization**
1660 |
- Paper: https://arxiv.org/abs/2108.09041
- Code: https://github.com/Annbless/OVS_Stabilization
1664 |
1665 |
1666 |
# Fine-Grained Recognition
1668 |
1669 | **Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach**
1670 |
1671 | - Paper: https://arxiv.org/abs/2108.02399
1672 | - Code: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
1673 | - Dataset: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
1674 |
1675 |
1676 |
# Style Transfer
1678 |
1679 | **AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer**
1680 |
- Paper: https://arxiv.org/abs/2108.03647
- Paddle Code: https://github.com/PaddlePaddle/PaddleGAN
- PyTorch Code: https://github.com/Huage001/AdaAttN
1686 |
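The core of AdaAttN is attentive normalization: attention between normalized content and style features produces a per-position mean and standard deviation that re-modulate the content. A minimal PyTorch sketch follows; the 1x1 projections, epsilons, and single-scale usage are simplifications, so see the linked repos for the exact multi-layer design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mean_std_norm(x, eps=1e-5):
    """Instance-normalize each feature map (zero mean, unit std per channel)."""
    mean = x.mean(dim=(2, 3), keepdim=True)
    std = x.std(dim=(2, 3), keepdim=True) + eps
    return (x - mean) / std

class AdaAttNSketch(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)   # query from normalized content
        self.k = nn.Conv2d(channels, channels, 1)   # key from normalized style

    def forward(self, content, style):
        B, C, H, W = content.shape
        Q = self.q(mean_std_norm(content)).flatten(2).transpose(1, 2)  # (B, HW, C)
        K = self.k(mean_std_norm(style)).flatten(2)                    # (B, C, H'W')
        V = style.flatten(2).transpose(1, 2)                           # (B, H'W', C)
        attn = F.softmax(torch.bmm(Q, K), dim=-1)                      # (B, HW, H'W')
        mean = torch.bmm(attn, V)                                      # attention-weighted mean
        var = torch.bmm(attn, V * V) - mean ** 2                       # and variance
        std = (var.clamp(min=0.0) + 1e-6).sqrt()
        mean = mean.transpose(1, 2).view(B, C, H, W)
        std = std.transpose(1, 2).view(B, C, H, W)
        return std * mean_std_norm(content) + mean                     # re-modulate content
```
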
1687 |
1688 |
1689 | # 神经绘画(Neural Painting)
1690 |
1691 | **Paint Transformer: Feed Forward Neural Painting with Stroke Prediction**
1692 |
1693 | - Paper: https://arxiv.org/abs/2108.03798
1694 | - Code: https://github.com/wzmsltw/PaintTransformer
1695 |
1696 |
1697 |
1698 | # 特征匹配(Feature Matching)
1699 |
1700 | **Learning to Match Features with Seeded Graph Matching Network**
1701 |
1702 | - Paper: https://arxiv.org/abs/2108.08771
1703 | - Code: https://github.com/vdvchen/SGMNet
1705 |
1706 |
1707 |
1708 | # 语义对应(Semantic Correspondence)
1709 |
1710 | **Multi-scale Matching Networks for Semantic Correspondence**
1711 |
1712 | - Paper: https://arxiv.org/abs/2108.00211
1713 | - Code: https://github.com/wintersun661/MMNet
1714 |
1715 |
1716 |
1717 | # 边缘检测(Edge Detection)
1718 |
1719 | **Pixel Difference Networks for Efficient Edge Detection**
1720 |
1721 | - Paper: https://arxiv.org/abs/2108.07009
1722 | - Code: https://github.com/zhuoinoulu/pidinet
1723 |
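Pixel difference convolution convolves pixel differences instead of raw intensities. The sketch below covers the central variant via the standard two-convolution reparameterization, which keeps it as cheap as a vanilla convolution; PiDiNet itself mixes several difference variants across the network, so treat this as illustrative.

```python
import torch
import torch.nn.functional as F

def central_pdc(x, weight, bias=None):
    """Central pixel-difference convolution: sum_i w_i * (x_i - x_center).
    Equivalent to conv(x, w) minus a 1x1 conv with kernel-summed weights.
    weight: (C_out, C_in, 3, 3)."""
    out = F.conv2d(x, weight, bias=bias, padding=1)
    center = F.conv2d(x, weight.sum(dim=(2, 3), keepdim=True))
    return out - center
```
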
1724 | **RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth**
1725 |
1726 | - Paper: https://arxiv.org/abs/2108.00616
1727 | - Code: https://github.com/MengyangPu/RINDNet
1728 | - Dataset: https://github.com/MengyangPu/RINDNet
1729 |
1730 |
1731 |
1732 | # 相机标定(Camera calibration)
1733 |
1734 | **CTRL-C: Camera calibration TRansformer with Line-Classification**
1735 |
1736 | - Paper: https://arxiv.org/abs/2109.02259
1737 | - Code: https://github.com/jwlee-vcl/CTRL-C
1738 |
1739 |
1740 |
1741 | # 图像质量评估(Image Quality Assessment)
1742 |
1743 | **MUSIQ: Multi-scale Image Quality Transformer**
1744 |
1745 | - Paper: https://arxiv.org/abs/2108.05997
1746 | - Code: https://github.com/google-research/google-research/tree/master/musiq
1747 |
1748 | **Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment**
1749 |
1750 | - Paper: https://arxiv.org/abs/2108.07948
1751 | - Code: https://github.com/researchmm/CKDN
1752 |
1753 |
1754 |
1755 | # 度量学习(Metric Learning)
1756 |
1757 | **Deep Relational Metric Learning**
1758 |
1759 | - Paper: https://arxiv.org/abs/2108.10026
1760 | - Code: https://github.com/zbr17/DRML
1761 |
1762 | **Towards Interpretable Deep Metric Learning with Structural Matching**
1763 |
1764 | - Paper: https://arxiv.org/abs/2108.05889
1765 | - Code: https://github.com/wl-zhao/DIML
1766 |
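DIML makes metric learning interpretable by structurally matching two images' spatial features instead of comparing one global embedding. The sketch below is a generic entropic optimal-transport (Sinkhorn) matcher of the kind used for such structural matching; the uniform marginals, `eps`, and iteration count are illustrative assumptions.

```python
import torch

def sinkhorn_match(cost, eps=0.05, num_iters=50):
    """cost: (n, m) pairwise distances between two sets of local features.
    Returns the transport-weighted matching cost (lower = more similar)."""
    K = torch.exp(-cost / eps)                              # similarity kernel
    u = torch.full((cost.size(0),), 1.0 / cost.size(0), device=cost.device)
    v = torch.full((cost.size(1),), 1.0 / cost.size(1), device=cost.device)
    a, b = u.clone(), v.clone()
    for _ in range(num_iters):                              # Sinkhorn scaling
        a = u / (K @ b)
        b = v / (K.t() @ a)
    P = a[:, None] * K * b[None, :]                         # transport plan
    return (P * cost).sum()
```
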
1767 |
1768 |
1769 | # Unsupervised Domain Adaptation
1770 |
1771 | **Recursively Conditional Gaussian for Ordinal Unsupervised Domain Adaptation**
1772 |
1773 | - Paper(Oral): https://arxiv.org/abs/2107.13467
1774 | - Code: None
1775 |
1776 |
1777 |
1778 | # Video Rescaling
1779 |
1780 | **Self-Conditioned Probabilistic Learning of Video Rescaling**
1781 |
1782 | - Paper: https://arxiv.org/abs/2107.11639
1783 | - Code: None
1785 |
1786 |
1787 |
1788 | # Hand-Object Interaction
1789 |
1790 | **Learning a Contact Potential Field to Model the Hand-Object Interaction**
1791 |
1792 | - Paper: https://arxiv.org/abs/2012.00924
1793 | - Code: https://lixiny.github.io/CPF
1794 |
1795 |
1796 |
1797 | # Vision-and-Language Navigation
1798 |
1799 | **Airbert: In-domain Pretraining for Vision-and-Language Navigation**
1800 |
1801 | - Paper: https://arxiv.org/abs/2108.09105
1802 | - Code: https://airbert-vln.github.io/
1803 | - Dataset: https://airbert-vln.github.io/
1804 |
1805 |
1806 |
1807 | # 数据集(Datasets)
1808 |
1809 | **Beyond Road Extraction: A Dataset for Map Update using Aerial Images**
1810 |
1811 | - Homepage: https://favyen.com/muno21/
1812 | - Paper: https://arxiv.org/abs/2110.04690
1813 | - Code: https://github.com/favyen/muno21
1815 |
1816 | **StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation**
1817 |
1818 | - Paper: https://arxiv.org/abs/2109.10115
1819 | - Code: None
1820 | - Dataset: None
1821 |
1822 | **RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth**
1823 |
1824 | - Paper: https://arxiv.org/abs/2108.00616
1825 | - Code: https://github.com/MengyangPu/RINDNet
1826 | - Dataset: https://github.com/MengyangPu/RINDNet
1827 |
1828 | **Panoptic Narrative Grounding**
1829 |
1830 | - Homepage: https://bcv-uniandes.github.io/panoptic-narrative-grounding/
1831 | - Paper(Oral): https://arxiv.org/abs/2109.04988
1832 | - Code: https://github.com/BCV-Uniandes/PNG
1833 | - Dataset: https://github.com/BCV-Uniandes/PNG
1834 |
1835 | **STRIVE: Scene Text Replacement In Videos**
1836 |
1837 | - Homepage: https://striveiccv2021.github.io/STRIVE-ICCV2021/
1838 | - Paper: https://arxiv.org/abs/2109.02762
1839 | - Code: https://github.com/striveiccv2021/STRIVE-ICCV2021/
1840 | - Dataset: https://github.com/striveiccv2021/STRIVE-ICCV2021/
1842 |
1843 | **Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme**
1844 |
1845 | - Paper: https://www4.comp.polyu.edu.hk/~cslzhang/paper/ICCV21_RealVSR.pdf
1846 | - Code: https://github.com/IanYeung/RealVSR
1847 | - Dataset: https://github.com/IanYeung/RealVSR
1848 |
1849 | **Matching in the Dark: A Dataset for Matching Image Pairs of Low-light Scenes**
1850 |
1851 | - Paper: https://arxiv.org/abs/2109.03585
1852 | - Code: None
1854 |
1855 | **Dual-Camera Super-Resolution with Aligned Attention Modules**
1856 |
1857 | - Homepage: https://tengfei-wang.github.io/Dual-Camera-SR/index.html
1858 | - Paper: https://arxiv.org/abs/2109.01349
1859 | - Code: https://github.com/Tengfei-Wang/DualCameraSR
1860 | - Dataset: https://tengfei-wang.github.io/Dual-Camera-SR/index.html
1861 |
1862 | **DepthTrack: Unveiling the Power of RGBD Tracking**
1863 |
1864 | - Paper: https://arxiv.org/abs/2108.13962
1865 | - Code: https://github.com/xiaozai/DeT
1866 | - Dataset: https://github.com/xiaozai/DeT
1867 |
1868 | **Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction**
1869 |
1870 | - Paper: https://arxiv.org/abs/2109.00512
1871 | - Code: https://github.com/facebookresearch/co3d
1873 | - Dataset: https://github.com/facebookresearch/co3d
1874 |
1875 | **BioFors: A Large Biomedical Image Forensics Dataset**
1876 |
1877 | - Paper: https://arxiv.org/abs/2108.12961
1878 | - Code: None
1879 | - Dataset: None
1880 |
1881 | **Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach**
1882 |
1883 | - Paper: https://arxiv.org/abs/2108.02399
1884 | - Code: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
1885 | - Dataset: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
1886 |
1887 | **Airbert: In-domain Pretraining for Vision-and-Language Navigation**
1888 |
1889 | - Paper: https://arxiv.org/abs/2108.09105
1890 | - Code: https://airbert-vln.github.io/
1891 | - Dataset: https://airbert-vln.github.io/
1892 |
1893 | **Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation**
1894 |
1895 | - Paper: https://arxiv.org/abs/2108.08202
1896 | - Code: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
1897 | - Dataset: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
1898 |
1899 | **VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection**
1900 |
1901 | - Paper: https://arxiv.org/abs/2108.08482
1902 | - Code: https://github.com/yujun0-0/MMA-Net
1903 | - Dataset: https://github.com/yujun0-0/MMA-Net
1905 |
1906 | **XVFI: eXtreme Video Frame Interpolation**
1907 |
1908 | - Paper(Oral): https://arxiv.org/abs/2103.16206
1909 | - Code: https://github.com/JihyongOh/XVFI
1910 | - Dataset: https://github.com/JihyongOh/XVFI
1911 |
1912 | **Personalized Image Semantic Segmentation**
1913 |
1914 | - Paper: https://arxiv.org/abs/2107.13978
1915 | - Code: https://github.com/zhangyuygss/PIS
1916 | - Dataset: https://github.com/zhangyuygss/PIS
1917 |
1918 | **H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction**
1919 |
1920 | - Homepage: https://crisalixsa.github.io/h3d-net/
1921 | - Paper: https://arxiv.org/abs/2107.12512
1923 |
1924 |
1925 |
1926 | # 其他(Others)
1927 |
1928 | **Photon-Starved Scene Inference using Single Photon Cameras**
1929 |
1930 | - Paper: https://openaccess.thecvf.com/content/ICCV2021/papers/Goyal_Photon-Starved_Scene_Inference_Using_Single_Photon_Cameras_ICCV_2021_paper.pdf
1931 | - Code: https://github.com/bhavyagoyal/spclowlight
1933 |
1934 | **Towards Flexible Blind JPEG Artifacts Removal**
1935 |
1936 | - Paper: https://arxiv.org/abs/2109.14573
1937 | - Code: https://github.com/jiaxi-jiang/FBCNN
1939 |
1940 | **Generating Attribution Maps with Disentangled Masked Backpropagation**
1941 |
1942 | - Paper: https://arxiv.org/abs/2101.06773
1943 | - Code: https://gitlab.com/adriaruizo/dmbp_iccv21
1944 |
1945 | **CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations**
1946 |
1947 | - Paper: https://arxiv.org/abs/2109.14910
1948 | - Code: None
1949 |
1950 | **ReconfigISP: Reconfigurable Camera Image Processing Pipeline**
1951 |
1952 | - Paper: https://arxiv.org/abs/2109.04760
1953 | - Code: None
1954 |
1955 | **Panoptic Narrative Grounding**
1956 |
1957 | - Homepage: https://bcv-uniandes.github.io/panoptic-narrative-grounding/
1958 | - Paper(Oral): https://arxiv.org/abs/2109.04988
1959 | - Code: https://github.com/BCV-Uniandes/PNG
1960 | - Dataset: https://github.com/BCV-Uniandes/PNG
1961 |
1962 | **NEAT: Neural Attention Fields for End-to-End Autonomous Driving**
1963 |
1964 | - Paper: https://arxiv.org/abs/2109.04456
1965 | - Code: https://github.com/autonomousvision/neat
1966 |
1967 | **Keep CALM and Improve Visual Feature Attribution**
1968 |
1969 | - Paper: https://arxiv.org/abs/2106.07861
1970 | - Code: https://github.com/naver-ai/calm
1971 |
1972 | **YouRefIt: Embodied Reference Understanding with Language and Gesture**
1973 |
1974 | - Paper: https://arxiv.org/abs/2109.03413
1975 | - Code: None
1976 |
1977 | **Pri3D: Can 3D Priors Help 2D Representation Learning?**
1978 |
1979 | - Paper: https://arxiv.org/abs/2104.11225
1980 | - Code: https://github.com/Sekunde/Pri3D
1981 |
1982 | **Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain**
1983 |
1984 | - Paper: https://arxiv.org/abs/2108.08487
1985 | - Code: https://github.com/iCGY96/APR
1986 |
1987 | **Continual Learning for Image-Based Camera Localization**
1988 |
1989 | - Paper: https://arxiv.org/abs/2108.09112
1990 | - Code: None
1991 |
1992 | **Multi-Task Self-Training for Learning General Representations**
1993 |
1994 | - Paper: https://arxiv.org/abs/2108.11353
1995 | - Code: None
1996 |
1997 | **A Unified Objective for Novel Class Discovery**
1998 |
1999 | - Homepage: https://ncd-uno.github.io/
2000 | - Paper(Oral): https://arxiv.org/abs/2108.08536
2001 | - Code: https://github.com/DonkeyShot21/UNO
2002 |
2003 | **Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs**
2004 |
2005 | - Paper: https://arxiv.org/abs/2108.07884
2006 | - Code: https://github.com/islamamirul/PermuteNet
2007 |
2008 | **Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation**
2009 |
2010 | - Paper: https://arxiv.org/abs/2108.08202
2011 | - Code: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
2012 | - Dataset: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
2013 |
2014 | **Impact of Aliasing on Generalization in Deep Convolutional Networks**
2015 |
2016 | - Paper: https://arxiv.org/abs/2108.03489
2017 | - Code: None
2018 |
2019 | **Out-of-Core Surface Reconstruction via Global TGV Minimization**
2020 |
2021 | - Paper: https://arxiv.org/abs/2107.14790
2022 | - Code: None
2023 |
2024 | **Progressive Correspondence Pruning by Consensus Learning**
2025 |
2026 | - Homepage: https://sailor-z.github.io/projects/CLNet.html
2027 | - Paper: https://arxiv.org/abs/2101.00591
2028 | - Code: https://github.com/sailor-z/CLNet
2029 |
2030 | **Energy-Based Open-World Uncertainty Modeling for Confidence Calibration**
2031 |
2032 | - Paper: https://arxiv.org/abs/2107.12628
2033 | - Code: None
2034 |
2035 | **Generalized Shuffled Linear Regression**
2036 |
2037 | - Paper: https://drive.google.com/file/d/1Qu21VK5qhCW8WVjiRnnBjehrYVmQrDNh/view?usp=sharing
2038 | - Code: https://github.com/SILI1994/Generalized-Shuffled-Linear-Regression
2039 |
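Shuffled linear regression fits a linear model when the correspondence between inputs and responses is unknown. The sketch below is one standard alternating scheme and not necessarily this paper's exact algorithm: fit under the current matching, then re-estimate the matching as a linear assignment problem.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def shuffled_linear_regression(X, y, num_iters=20):
    """Alternate between least squares and permutation estimation.
    X: (n, d) inputs, y: (n,) responses in unknown order."""
    perm = np.arange(len(y))                              # start from identity matching
    for _ in range(num_iters):
        w, *_ = np.linalg.lstsq(X, y[perm], rcond=None)   # fit under current matching
        pred = X @ w                                      # (n,) predictions
        cost = (pred[:, None] - y[None, :]) ** 2          # cost of assigning y_j to row i
        _, perm = linear_sum_assignment(cost)             # re-estimate the permutation
    return w, perm
```
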
2040 | **Discovering 3D Parts from Image Collections**
2041 |
2042 | - Homepage: https://chhankyao.github.io/lpd/
2043 | - Paper: https://arxiv.org/abs/2107.13629
2045 |
2046 | **Semi-Supervised Active Learning with Temporal Output Discrepancy**
2047 |
2048 | - Paper: https://arxiv.org/abs/2107.14153
2049 | - Code: https://github.com/siyuhuang/TOD
2050 |
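Temporal Output Discrepancy selects samples whose predictions change most between nearby training snapshots, which the paper connects to a lower bound on the sample loss. A hedged sketch of that acquisition score is below; the choice of checkpoints and the L2 readout are assumptions.

```python
import torch

@torch.no_grad()
def temporal_output_discrepancy(model_t, model_t_plus, unlabeled_loader, device="cuda"):
    """model_t / model_t_plus: checkpoints from nearby optimization steps.
    Returns one discrepancy score per unlabeled sample; query the largest."""
    model_t.eval()
    model_t_plus.eval()
    scores = []
    for x, _ in unlabeled_loader:
        x = x.to(device)
        diff = model_t_plus(x) - model_t(x)                # output change between steps
        scores.append(diff.flatten(1).norm(dim=1).cpu())   # per-sample L2 discrepancy
    return torch.cat(scores)
```
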
2051 | **Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?**
2052 |
2053 | - Paper: https://arxiv.org/abs/2105.02498
2054 | - Code: https://github.com/KingJamesSong/DifferentiableSVD
2056 |
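The approximate square root referenced in this title is typically computed with the Newton-Schulz iteration (as in iSQRT-COV style global covariance pooling), which uses only matrix multiplies and so runs fast on GPU with well-behaved gradients, unlike SVD. The sketch below shows that standard iteration for background, not this paper's specific analysis or remedy.

```python
import torch

def newton_schulz_sqrt(A, num_iters=5):
    """Approximate the square root of an SPD matrix A (e.g. a covariance
    from global covariance pooling) by coupled Newton-Schulz iterations."""
    n = A.size(-1)
    norm = A.norm()                              # Frobenius norm for pre-scaling
    Y = A / norm                                 # iteration converges on the scaled matrix
    Z = torch.eye(n, dtype=A.dtype, device=A.device)
    I = torch.eye(n, dtype=A.dtype, device=A.device)
    for _ in range(num_iters):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y = Y @ T                                # Y -> sqrt(A / norm)
        Z = T @ Z                                # Z -> inverse sqrt(A / norm)
    return Y * norm.sqrt()                       # undo the pre-scaling
```
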
2057 | **Hand-Object Contact Consistency Reasoning for Human Grasps Generation**
2058 |
2059 | - Homepage: https://hwjiang1510.github.io/GraspTTA/
2060 | - Paper(Oral): https://arxiv.org/abs/2104.03304
2061 | - Code: None
2062 |
2063 | **Equivariant Imaging: Learning Beyond the Range Space**
2064 |
2065 | - Paper(Oral): https://arxiv.org/abs/2103.14756
2066 | - Code: https://github.com/edongdongchen/EI
2067 |
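Equivariant Imaging learns reconstruction from measurements alone by combining measurement consistency with a group-equivariance constraint that lets the network see beyond the operator's range space. The schematic loss below is a hedged sketch; `f`, `A`, and `transform` stand for the reconstruction network, the known forward operator, and a random group action, and the equal weighting of the two terms is an assumption.

```python
import torch.nn.functional as F

def equivariant_imaging_loss(f, A, y, transform):
    """f: reconstruction network, A: known forward operator (callable),
    y: measurements, transform: random group action (e.g. rotation/shift)."""
    x_hat = f(y)                           # reconstruct from measurements
    mc = F.mse_loss(A(x_hat), y)           # measurement consistency: A(x_hat) ~ y
    x_t = transform(x_hat)                 # act on the estimate with T
    ei = F.mse_loss(f(A(x_t)), x_t)        # equivariance: f(A(T x_hat)) ~ T x_hat
    return mc + ei
```
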
2068 | **Just Ask: Learning to Answer Questions from Millions of Narrated Videos**
2069 |
2070 | - Paper(Oral): https://arxiv.org/abs/2012.00451
2071 | - Code: https://github.com/antoyang/just-ask
2072 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # ICCV2023-Papers-with-Code
2 |
3 | A collection of [ICCV 2023](http://iccv2023.thecvf.com/) papers and open-source projects (papers with code)!
4 |
5 | 2160 papers accepted!
6 |
7 | ICCV 2023 accepted paper IDs: https://t.co/A0mCH8gbOi
8 |
9 | > Note 1: Everyone is welcome to open an issue and share ICCV 2023 papers and open-source projects!
10 | >
11 | > Note 2: For top CV conference papers from previous years, as well as other high-quality CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision
12 | >
13 | > [ICCV 2021](ICCV2021-Papers-with-Code.md)
14 |
15 | If you want to keep up with the latest and best CV papers, open-source projects, and learning resources, scan the QR code to join the [CVer academic group](https://t.zsxq.com/10OGjThDw)! Learn from each other and improve together~
16 |
17 | 
18 |
19 | # 【ICCV 2023 Papers and Open-Source Directory】
20 |
21 | - [Backbone](#Backbone)
22 | - [CLIP](#CLIP)
23 | - [MAE](#MAE)
24 | - [GAN](#GAN)
25 | - [GNN](#GNN)
26 | - [MLP](#MLP)
27 | - [NAS](#NAS)
28 | - [OCR](#OCR)
29 | - [NeRF](#NeRF)
30 | - [DETR](#DETR)
31 | - [Prompt](#Prompt)
32 | - [Diffusion Models(扩散模型)](#Diffusion)
34 | - [Avatars](#Avatars)
35 | - [ReID(重识别)](#ReID)
36 | - [长尾分布(Long-Tail)](#Long-Tail)
37 | - [Vision Transformer](#Vision-Transformer)
38 | - [视觉和语言(Vision-Language)](#VL)
39 | - [自监督学习(Self-supervised Learning)](#SSL)
40 | - [数据增强(Data Augmentation)](#DA)
41 | - [目标检测(Object Detection)](#Object-Detection)
42 | - [目标跟踪(Visual Tracking)](#VT)
43 | - [语义分割(Semantic Segmentation)](#Semantic-Segmentation)
44 | - [实例分割(Instance Segmentation)](#Instance-Segmentation)
45 | - [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation)
46 | - [医学图像分类(Medical Image Classification)](#MIC)
47 | - [医学图像分割(Medical Image Segmentation)](#MIS)
48 | - [视频目标分割(Video Object Segmentation)](#VOS)
49 | - [视频实例分割(Video Instance Segmentation)](#VIS)
50 | - [参考图像分割(Referring Image Segmentation)](#RIS)
51 | - [图像抠图(Image Matting)](#Matting)
52 | - [Low-level Vision](#LLV)
53 | - [超分辨率(Super-Resolution)](#SR)
54 | - [去噪(Denoising)](#Denoising)
55 | - [去模糊(Deblur)](#Deblur)
56 | - [3D点云(3D Point Cloud)](#3D-Point-Cloud)
57 | - [3D目标检测(3D Object Detection)](#3DOD)
58 | - [3D语义分割(3D Semantic Segmentation)](#3DSS)
59 | - [3D目标跟踪(3D Object Tracking)](#3D-Object-Tracking)
60 | - [3D语义场景补全(3D Semantic Scene Completion)](#3DSSC)
61 | - [3D配准(3D Registration)](#3D-Registration)
62 | - [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation)
63 | - [3D人体Mesh估计(3D Human Mesh Estimation)](#3D-Human-Pose-Estimation)
64 | - [医学图像(Medical Image)](#Medical-Image)
65 | - [图像生成(Image Generation)](#Image-Generation)
66 | - [视频生成(Video Generation)](#Video-Generation)
67 | - [图像编辑(Image Editing)](#Image-Editing)
68 | - [视频编辑(Video Editing)](#Video-Editing)
69 | - [视频理解(Video Understanding)](#Video-Understanding)
70 | - [人体运动生成(Human Motion Generation)](#Human-Motion-Generation)
71 | - [低光照图像增强(Low-light Image Enhancement)](#Low-light-Image-Enhancement)
72 | - [场景文本识别(Scene Text Recognition)](#STR)
73 | - [图像检索(Image Retrieval)](#Image-Retrieval)
74 | - [图像融合(Image Fusion)](#Image-Fusion)
75 | - [轨迹预测(Trajectory Prediction) ](#Trajectory-Prediction)
76 | - [人群计数(Crowd Counting)](#Crowd-Counting)
77 | - [Video Quality Assessment(视频质量评价)](#Video-Quality-Assessment)
78 | - [其它(Others)](#Others)
79 |
80 |
81 |
82 | # Avatars
83 |
84 | **Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control**
85 |
86 | - Paper: https://arxiv.org/abs/2303.17606
87 | - Code: https://github.com/songrise/AvatarCraft
89 |
90 |
91 |
92 | # Backbone
93 |
94 | **Rethinking Mobile Block for Efficient Attention-based Models**
95 |
96 | - Paper: https://arxiv.org/abs/2301.01146
97 | - Code: https://github.com/zhangzjn/EMO
98 |
99 |
100 |
101 | # CLIP
102 |
103 | **PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization**
104 |
105 | - Paper: https://arxiv.org/abs/2307.15199
106 | - Code: https://promptstyler.github.io/
107 |
108 | **CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation**
109 |
110 | - Paper: https://arxiv.org/abs/2308.15226
111 | - Code: https://github.com/devaansh100/CLIPTrans
112 |
113 |
114 |
115 | # NeRF
116 |
117 | **IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis**
118 |
119 | - Homepage: https://zju3dv.github.io/intrinsic_nerf/
120 | - Paper: https://arxiv.org/abs/2210.00647
121 | - Code: https://github.com/zju3dv/IntrinsicNeRF
122 |
123 | **Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control**
124 |
125 | - Paper: https://arxiv.org/abs/2303.17606
126 | - Code: https://github.com/songrise/AvatarCraft
128 |
129 | **FlipNeRF: Flipped Reflection Rays for Few-shot Novel View Synthesis**
130 |
131 | - Homepage: https://shawn615.github.io/flipnerf/
132 | - Code: https://github.com/shawn615/FlipNeRF
133 | - Paper: https://arxiv.org/abs/2306.17723
134 |
135 | **Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields**
136 |
137 | - Homepage: https://wbhu.github.io/projects/Tri-MipRF
138 | - Paper: https://arxiv.org/abs/2307.11335
140 | - Code: https://github.com/wbhu/Tri-MipRF
141 |
142 |
143 |
144 | # Diffusion Models(扩散模型)
145 |
146 | **PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment**
147 |
148 | - Paper: https://arxiv.org/abs/2306.15667
149 | - Code: https://github.com/facebookresearch/PoseDiffusion
150 |
151 | **FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model**
152 |
153 | - Paper: https://arxiv.org/abs/2303.09833
154 | - Code: https://github.com/vvictoryuki/FreeDoM
155 |
156 | **BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion**
157 |
158 | - Paper: https://arxiv.org/abs/2307.10816
159 | - Code: https://github.com/Sierkinhane/BoxDiff
160 |
161 | **BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction**
162 |
163 | - Paper: https://arxiv.org/abs/2211.14304
164 | - Code: https://github.com/BarqueroGerman/BeLFusion
165 |
166 | **DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion**
167 |
168 | - Paper: https://arxiv.org/abs/2303.06840
169 | - Code: https://github.com/Zhaozixiang1228/MMIF-DDFM
170 |
171 | **DIRE for Diffusion-Generated Image Detection**
172 |
173 | - Paper: https://arxiv.org/abs/2303.09295
174 | - Code: https://github.com/ZhendongWang6/DIRE
175 |
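The diffusion entries above all customize the same reverse process, so a generic DDPM sampling step (after Ho et al., 2020) may help as background; it is not any one paper's method, and the noise-schedule tensors and the `model(x_t, t)` signature are assumed to be precomputed/defined by the reader.

```python
import torch

@torch.no_grad()
def ddpm_step(model, x_t, t, betas, alphas, alphas_bar):
    """One reverse step x_t -> x_{t-1}. model(x_t, t) predicts the noise;
    betas/alphas/alphas_bar are precomputed 1-D schedule tensors."""
    eps = model(x_t, t)
    mean = (x_t - betas[t] / torch.sqrt(1.0 - alphas_bar[t]) * eps) / torch.sqrt(alphas[t])
    if t == 0:
        return mean                                   # no noise at the final step
    return mean + torch.sqrt(betas[t]) * torch.randn_like(x_t)
```
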
176 |
177 |
178 | # Prompt
179 |
180 | **Read-only Prompt Optimization for Vision-Language Few-shot Learning**
181 |
182 | - Paper: https://arxiv.org/abs/2308.14960
183 | - Code: https://github.com/mlvlab/RPO
184 |
185 | **Introducing Language Guidance in Prompt-based Continual Learning**
186 |
187 | - Paper: https://arxiv.org/abs/2308.15827
188 | - Code: None
189 |
190 |
191 |
192 | # 视觉和语言(Vision-Language)
193 |
194 | **Read-only Prompt Optimization for Vision-Language Few-shot Learning**
195 |
196 | - Paper: https://arxiv.org/abs/2308.14960
197 | - Code: https://github.com/mlvlab/RPO
198 |
199 |
200 |
201 | # 目标检测(Object Detection)
202 |
203 | **FemtoDet: An Object Detection Baseline for Energy Versus Performance Tradeoffs**
204 |
205 | - Paper: https://arxiv.org/abs/2301.06719
206 | - Code: https://github.com/yh-pengtu/FemtoDet
207 |
208 | **Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment**
209 |
210 | - Paper: https://arxiv.org/abs/2207.13085
211 | - Code: https://github.com/Atten4Vis/GroupDETR
212 |
213 | **Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection**
214 |
215 | - Paper: https://arxiv.org/abs/2205.09613
216 | - Code: https://github.com/LiewFeng/imTED
217 |
218 | **ASAG: Building Strong One-Decoder-Layer Sparse Detectors via Adaptive Sparse Anchor Generation**
219 |
220 | - Paper: https://arxiv.org/abs/2308.09242
221 | - Code: https://github.com/iSEE-Laboratory/ASAG
222 |
223 |
224 |
225 | # 目标跟踪(Visual Tracking)
226 |
227 | **Cross-modal Orthogonal High-rank Augmentation for RGB-Event Transformer-trackers**
228 |
229 | - Paper: https://arxiv.org/abs/2307.04129
230 | - Code: https://github.com/ZHU-Zhiyu/High-Rank_RGB-Event_Tracker
231 |
232 |
233 |
234 | # 语义分割(Semantic Segmentation)
235 |
236 | **Segment Anything**
237 |
238 | - Homepage: https://segment-anything.com/
239 | - Paper: https://arxiv.org/abs/2304.02643
240 | - Code: https://github.com/facebookresearch/segment-anything
241 |
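Since Segment Anything is promptable, a quick usage sketch may help; this follows the interface documented in the linked repo at the time of writing, with the checkpoint path and the example point prompt as placeholders.

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# load a downloaded SAM checkpoint (file name is a placeholder)
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# prompt with one foreground point; SAM returns candidate masks with scores
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
```
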
242 | **MARS: Model-agnostic Biased Object Removal without Additional Supervision for Weakly-Supervised Semantic Segmentation**
243 |
244 | - Paper: https://arxiv.org/abs/2304.09913
245 | - Code: https://github.com/shjo-april/MARS
246 |
247 | **FreeCOS: Self-Supervised Learning from Fractals and Unlabeled Images for Curvilinear Object Segmentation**
248 |
249 | - Paper: https://arxiv.org/abs/2307.07245
250 | - Code: https://github.com/TY-Shi/FreeCOS
251 |
252 | **Residual Pattern Learning for Pixel-wise Out-of-Distribution Detection in Semantic Segmentation**
253 |
254 | - Paper: https://arxiv.org/abs/2211.14512
255 | - Code: https://github.com/yyliu01
256 |
257 | **Disentangle then Parse: Night-time Semantic Segmentation with Illumination Disentanglement**
258 |
259 | - Paper: https://arxiv.org/abs/2307.09362
260 | - Code: https://github.com/w1oves/DTP
261 |
262 |
263 |
264 | # 视频目标分割(Video Object Segmentation)
265 |
266 | **Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus**
267 |
268 | - Paper: https://arxiv.org/abs/2207.01203
269 | - Code: https://github.com/lxa9867/R2VOS
271 |
272 |
273 |
274 | # 视频实例分割(Video Instance Segmentation)
275 |
276 | **DVIS: Decoupled Video Instance Segmentation Framework**
277 |
278 | - Paper: https://arxiv.org/abs/2306.03413
279 | - Code: https://github.com/zhang-tao-whu/DVIS
280 |
281 |
282 |
283 | # 医学图像分类(Medical Image Classification)
284 |
285 | **BoMD: Bag of Multi-label Descriptors for Noisy Chest X-ray Classification**
286 |
287 | - Paper: https://arxiv.org/abs/2203.01937
288 | - Code: https://github.com/cyh-0/BoMD
290 |
291 |
292 |
293 | # 医学图像分割(Medical Image Segmentation)
294 |
295 | **CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection**
296 |
297 | - Paper: https://arxiv.org/abs/2301.00785
298 | - Code: https://github.com/ljwztc/CLIP-Driven-Universal-Model
299 |
300 |
301 |
302 | # Low-level Vision
303 |
304 | **Self-supervised Learning to Bring Dual Reversed Rolling Shutter Images Alive**
305 |
306 | - Paper: https://arxiv.org/abs/2305.19862
307 | - Code: https://github.com/shangwei5/SelfDRSC
308 |
309 |
310 |
311 | # 超分辨率(Super-Resolution)
312 |
313 | **Spherical Space Feature Decomposition for Guided Depth Map Super-Resolution**
314 |
315 | - Paper: https://arxiv.org/abs/2303.08942
316 | - Code: https://github.com/Zhaozixiang1228/GDSR-SSDNet
317 |
318 |
319 |
320 | # 3D点云(3D Point Cloud)
321 |
322 | **Robo3D: Towards Robust and Reliable 3D Perception against Corruptions**
323 |
324 | - Homepage: https://ldkong.com/Robo3D
325 | - Paper: https://arxiv.org/abs/2303.17597
326 | - Code: https://github.com/ldkong1205/Robo3D
327 |
328 | **Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models**
329 |
330 | - Paper: https://arxiv.org/abs/2304.07221
331 | - Code: https://github.com/zyh16143998882/ICCV23-IDPT
332 |
333 | **Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos**
334 |
335 | - Paper: https://arxiv.org/abs/2308.09247
336 | - Code: None
337 |
338 |
339 |
340 | # 3D目标检测(3D Object Detection)
341 |
342 | **PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images**
343 |
344 | - Paper: https://arxiv.org/abs/2206.01256
345 | - Code: https://github.com/megvii-research/PETR
346 |
347 | **DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection**
348 |
349 | - Paper: https://arxiv.org/abs/2304.13031
350 | - Code: https://github.com/AIR-DISCOVER/DQS3D
351 |
352 | **SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection**
353 |
354 | - Paper: https://arxiv.org/abs/2304.14340
355 | - Code: https://github.com/yichen928/SparseFusion
356 |
357 | **StreamPETR: Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection**
358 |
359 | - Paper: https://arxiv.org/abs/2303.11926
360 | - Code: https://github.com/exiawsh/StreamPETR.git
361 |
362 | **Cross Modal Transformer: Towards Fast and Robust 3D Object Detection**
363 |
364 | - Paper: https://arxiv.org/abs/2301.01283
365 | - Code: https://github.com/junjie18/CMT.git
366 |
367 | **MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation**
368 |
369 | - Paper: https://arxiv.org/abs/2304.09801
370 | - Project: https://chongjiange.github.io/metabev.html
371 | - Code: https://github.com/ChongjianGE/MetaBEV
372 |
373 | **Revisiting Domain-Adaptive 3D Object Detection by Reliable, Diverse and Class-balanced Pseudo-Labeling**
374 |
375 | - Paper: https://arxiv.org/abs/2307.07944
376 | - Code: https://github.com/zhuoxiao-chen/ReDB-DA-3Ddet
377 |
378 | **SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection**
379 |
380 | - Paper: https://arxiv.org/abs/2307.11477
381 | - Code: https://github.com/mengtan00/SA-BEV
382 |
383 |
384 |
385 | # 3D语义分割(3D Semantic Segmentation)
386 |
387 | **Rethinking Range View Representation for LiDAR Segmentation**
388 |
389 | - Homepage: https://ldkong.com/RangeFormer
390 | - Paper: https://arxiv.org/abs/2303.05367
391 | - Code: None
392 |
393 |
394 |
395 | # 3D目标跟踪(3D Object Tracking)
396 |
397 | **MBPTrack: Improving 3D Point Cloud Tracking with Memory Networks and Box Priors**
398 |
399 | - Paper: https://arxiv.org/abs/2303.05071
400 | - Code : https://github.com/slothfulxtx/MBPTrack3D
401 |
402 |
403 |
404 | # 视频理解(Video Understanding)
405 |
406 | **Unmasked Teacher: Towards Training-Efficient Video Foundation Models**
407 |
408 | - Paper: https://arxiv.org/abs/2303.16058
409 | - Code: https://github.com/OpenGVLab/unmasked_teacher
411 |
412 |
413 |
414 | # 图像生成(Image Generation)
415 |
416 | **FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model**
417 |
418 | - Paper: https://arxiv.org/abs/2303.09833
419 | - Code: https://github.com/vvictoryuki/FreeDoM
420 |
421 | **BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion**
422 |
423 | - Paper: https://arxiv.org/abs/2307.10816
424 | - Code: https://github.com/Sierkinhane/BoxDiff
425 |
426 |
427 |
428 | # 视频生成(Video Generation)
429 |
430 | **Simulating Fluids in Real-World Still Images**
431 |
432 | - Homepage: https://slr-sfs.github.io/
433 | - Paper: https://arxiv.org/abs/2204.11335
434 | - Code: https://github.com/simon3dv/SLR-SFS
435 |
436 |
437 |
438 | # 图像编辑(Image Editing)
439 |
440 | **Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing**
441 |
442 | - Paper: https://arxiv.org/abs/2304.02051
443 | - Code: https://github.com/aimagelab/multimodal-garment-designer
444 |
445 |
446 |
447 | # 视频编辑(Video Editing)
448 |
449 | **FateZero: Fusing Attentions for Zero-shot Text-based Video Editing**
450 |
451 | - Project: https://fate-zero-edit.github.io/
452 | - Paper: https://arxiv.org/abs/2303.09535
453 | - Code: https://github.com/ChenyangQiQi/FateZero
454 |
455 |
456 |
457 | # 人体运动生成(Human Motion Generation)
458 |
459 | **BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction**
460 |
461 | - Paper: https://arxiv.org/abs/2211.14304
462 | - Code: https://github.com/BarqueroGerman/BeLFusion
463 |
464 |
465 |
466 | # 低光照图像增强(Low-light Image Enhancement)
467 |
468 | **Implicit Neural Representation for Cooperative Low-light Image Enhancement**
469 |
470 | - Paper: https://arxiv.org/abs/2303.11722
471 | - Code: https://github.com/Ysz2022/NeRCo
472 |
473 |
474 |
481 | # 场景文本识别(Scene Text Recognition)
482 |
483 | **Self-supervised Character-to-Character Distillation for Text Recognition**
484 |
485 | - Paper: https://arxiv.org/abs/2211.00288
486 | - Code: https://github.com/TongkunGuan/CCD
487 |
488 | **MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition**
489 |
490 | - Paper: https://arxiv.org/abs/2305.14758
491 | - Code: https://github.com/simplify23/MRN
492 | - Chinese explainer: https://zhuanlan.zhihu.com/p/643948935
493 |
494 |
495 |
496 | # 图像检索(Image Retrieval)
497 |
498 | **Zero-Shot Composed Image Retrieval with Textual Inversion**
499 |
500 | - Paper: https://arxiv.org/abs/2303.15247
501 | - Code: https://github.com/miccunifi/SEARLE
502 |
503 |
504 |
505 | # 图像融合(Image Fusion)
506 |
507 | **DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion**
508 |
509 | - Paper: https://arxiv.org/abs/2303.06840
510 | - Code: https://github.com/Zhaozixiang1228/MMIF-DDFM
511 |
512 |
513 |
514 | # 轨迹预测(Trajectory Prediction)
515 |
516 | **EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting**
517 |
518 | - Homepage: https://inhwanbae.github.io/publication/eigentrajectory/
519 | - Paper: https://arxiv.org/abs/2307.09306
521 | - Code: https://github.com/InhwanBae/EigenTrajectory
522 |
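EigenTrajectory represents each trajectory by its coefficients in a small basis of "eigen-trajectories". The sketch below shows the basic SVD construction of such a basis; the rank and the flattening of (time, xy) are illustrative assumptions, and the paper builds anchoring and forecasting on top.

```python
import torch

def trajectory_basis(trajs, rank=4):
    """trajs: (N, T, 2) observed trajectories.
    Returns a (rank, 2T) basis and (N, rank) low-rank descriptors."""
    X = trajs.flatten(1)                                # (N, 2T)
    U, S, Vh = torch.linalg.svd(X, full_matrices=False)
    basis = Vh[:rank]                                   # top singular vectors
    coeffs = X @ basis.t()                              # project onto the basis
    return basis, coeffs
```
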
523 |
524 |
525 | # 人群计数(Crowd Counting)
526 |
527 | **Point-Query Quadtree for Crowd Counting, Localization, and More**
528 |
529 | - Paper: https://arxiv.org/abs/2308.13814
530 | - Code: https://github.com/cxliu0/PET
531 |
532 |
533 |
534 | # Video Quality Assessment(视频质量评价)
535 |
536 | **Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives**
537 |
538 | - Paper: https://arxiv.org/abs/2211.04894
539 | - Code: https://github.com/VQAssessment/DOVER
540 |
541 |
542 |
543 | # 其它(Others)
544 |
545 | **MotionBERT: A Unified Perspective on Learning Human Motion Representations**
546 |
547 | - Homepage: https://motionbert.github.io/
548 | - Paper: https://arxiv.org/abs/2210.06551
549 | - Code: https://github.com/Walter0807/MotionBERT
550 |
551 | **Graph Matching with Bi-level Noisy Correspondence**
552 |
553 | - Paper: https://arxiv.org/abs/2212.04085
554 | - Code: https://github.com/Lin-Yijie/Graph-Matching-Networks/tree/main/COMMON
555 |
556 | **LDL: Line Distance Functions for Panoramic Localization**
557 |
558 | - Paper: https://arxiv.org/abs/2308.13989
559 | - Code: https://github.com/82magnolia/panoramic-localization
560 |
561 | **Active Neural Mapping**
562 |
563 | - Homepage: https://zikeyan.github.io/active-INR/index.html
564 | - Paper: https://arxiv.org/abs/2308.16246
565 | - Code: https://zikeyan.github.io/active-INR/index.html
566 |
567 | **Reconstructing Groups of People with Hypergraph Relational Reasoning**
568 |
569 | - Paper: https://arxiv.org/abs/2308.15844
570 | - Code: https://github.com/boycehbz/GroupRec
--------------------------------------------------------------------------------