├── CVPR2019-Papers-with-Code.md
├── CVPR2020-Papers-with-Code.md
├── CVPR2021-Papers-with-Code.md
├── CVPR2022-Papers-with-Code.md
├── CVPR2023-Papers-with-Code.md
├── CVer学术交流群.png
├── README.md
└── master
/CVPR2019-Papers-with-Code.md:
--------------------------------------------------------------------------------
1 | # CVPR2019-Code
2 |
3 | CVPR 2019 论文开源项目合集
4 |
5 | 传送门:[CVPR 2020 论文开源项目合集](https://github.com/amusi/CVPR2020-Code)
6 |
7 | 附:[530 篇 CVPR 2019 论文代码链接](./CVPR2019_CodeLink.csv)
8 |
9 | - [目标检测](#Object-Detection)
10 | - [目标跟踪](#Object-Tracking)
11 | - [语义分割](#Semantic-Segmentation)
12 | - [实例分割](#Instance-Segmentation)
13 | - [GAN](#GAN)
14 | - [人脸检测](#Face-Detection)
15 | - [人体姿态估计](#Human-Pose-Estimation)
16 | - [6DoF 姿态估计](#6DoF-Pose-Estimation)
17 | - [头部姿态估计](#Head-Pose-Estimation)
18 | - [人群密度估计](#Crowd-Counting)
19 |
20 | **更新记录:**
21 |
22 | - 20200226:添加 [CVPR 2020 论文开源项目合集](https://github.com/amusi/CVPR2020-Code)
23 |
24 | - 20191026:添加 [530 篇论文代码链接](./CVPR2019_CodeLink.csv)
25 | - 20190405:添加 8 篇论文(目标检测、语义分割等方向)
26 | - 20190408:添加 6 篇论文(目标跟踪、GAN、6DoF姿态估计等方向)
27 |
28 |
29 |
30 | # 目标检测
31 |
32 | **Bounding Box Regression with Uncertainty for Accurate Object Detection**
33 |
34 | - arXiv:
35 |
36 | - github:
37 |
38 |
39 |
40 | # 目标跟踪
41 |
42 | **Fast Online Object Tracking and Segmentation: A Unifying Approach**
43 |
44 | - arXiv:
45 |
46 | - github:
47 |
48 | - homepage:
49 |
50 | **Unsupervised Deep Tracking**
51 |
52 | - arXiv:
53 |
54 | - github:
55 |
56 | - github(PyTorch):
57 |
58 | **Target-Aware Deep Tracking**
59 |
60 | - arXiv:
61 |
62 | - homepage:
63 |
64 |
65 |
66 | # 语义分割
67 |
68 | **Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation**
69 |
70 | - arXiv:
71 |
72 | - github:[https://github.com/LinZhuoChen/DUpsampling(非官方)](https://github.com/LinZhuoChen/DUpsampling%EF%BC%88%E9%9D%9E%E5%AE%98%E6%96%B9%EF%BC%89)
73 |
74 | **Dual Attention Network for Scene Segmentation**
75 |
76 | - arXiv:
77 |
78 | - github:
79 |
80 | **Collaborative Global-Local Networks for Memory-Efficient Segmentation of Ultra-High Resolution Images**
81 |
82 | - arXiv:None
83 |
84 | - github:
85 |
86 |
87 |
88 | # 实例分割
89 |
90 | **Mask Scoring R-CNN**
91 |
92 | - arXiv:
93 |
94 | - github:
95 |
96 |
97 |
98 | # GAN
99 |
100 | **Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis**
101 |
102 | - arXiv:
103 | - github:
104 |
105 |
106 |
107 | # 人脸检测
108 |
109 | **DSFD: Dual Shot Face Detector**
110 |
111 | - arXiv:
112 |
113 | - github:
114 |
115 |
116 |
117 | # 人体姿态估计
118 |
119 | **Deep High-Resolution Representation Learning for Human Pose Estimation**
120 |
121 | - arXiv:
122 |
123 | - github:
124 |
125 |
126 |
127 | # 6DoF姿态估计
128 |
129 | **PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation**
130 |
131 | - arXiv:
132 | - github:
133 |
134 |
135 |
136 | # 头部姿态估计
137 |
138 | **PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation**
139 |
140 | - paper:
141 | - github:
142 |
143 |
144 |
145 | # 人群密度估计
146 |
147 | **Learning from Synthetic Data for Crowd Counting in the Wild**
148 |
149 | - arXiv:
150 | - github:
151 | - homepage:
--------------------------------------------------------------------------------
/CVPR2020-Papers-with-Code.md:
--------------------------------------------------------------------------------
1 | # CVPR2020-Code
2 |
3 | [CVPR 2020](https://openaccess.thecvf.com/CVPR2020) 论文开源项目合集,同时欢迎各位大佬提交issue,分享CVPR 2020开源项目
4 |
5 | **【推荐阅读】**
6 |
7 | - [CVPR 2020 virtual](http://cvpr20.com/)
8 | - ECCV 2020 论文开源项目合集来了:https://github.com/amusi/ECCV2020-Code
9 |
10 | - 关于往年CV顶会论文(如ECCV 2020、CVPR 2019、ICCV 2019)以及其他优质CV论文和大盘点,详见: https://github.com/amusi/daily-paper-computer-vision
11 |
12 | **【CVPR 2020 论文开源目录】**
13 |
14 | - [CNN](#CNN)
15 | - [图像分类](#Image-Classification)
16 | - [视频分类](#Video-Classification)
17 | - [目标检测](#Object-Detection)
18 | - [3D目标检测](#3D-Object-Detection)
19 | - [视频目标检测](#Video-Object-Detection)
20 | - [目标跟踪](#Object-Tracking)
21 | - [语义分割](#Semantic-Segmentation)
22 | - [实例分割](#Instance-Segmentation)
23 | - [全景分割](#Panoptic-Segmentation)
24 | - [视频目标分割](#VOS)
25 | - [超像素分割](#Superpixel)
26 | - [交互式图像分割](#IIS)
27 | - [NAS](#NAS)
28 | - [GAN](#GAN)
29 | - [Re-ID](#Re-ID)
30 | - [3D点云(分类/分割/配准/跟踪等)](#3D-PointCloud)
31 | - [人脸(识别/检测/重建等)](#Face)
32 | - [人体姿态估计(2D/3D)](#Human-Pose-Estimation)
33 | - [人体解析](#Human-Parsing)
34 | - [场景文本检测](#Scene-Text-Detection)
35 | - [场景文本识别](#Scene-Text-Recognition)
36 | - [特征(点)检测和描述](#Feature)
37 | - [超分辨率](#Super-Resolution)
38 | - [模型压缩/剪枝](#Model-Compression)
39 | - [视频理解/行为识别](#Action-Recognition)
40 | - [人群计数](#Crowd-Counting)
41 | - [深度估计](#Depth-Estimation)
42 | - [6D目标姿态估计](#6DOF)
43 | - [手势估计](#Hand-Pose)
44 | - [显著性检测](#Saliency)
45 | - [去噪](#Denoising)
46 | - [去雨](#Deraining)
47 | - [去模糊](#Deblurring)
48 | - [去雾](#Dehazing)
49 | - [特征点检测与描述](#Feature)
50 | - [视觉问答(VQA)](#VQA)
51 | - [视频问答(VideoQA)](#VideoQA)
52 | - [视觉语言导航](#VLN)
53 | - [视频压缩](#Video-Compression)
54 | - [视频插帧](#Video-Frame-Interpolation)
55 | - [风格迁移](#Style-Transfer)
56 | - [车道线检测](#Lane-Detection)
57 | - ["人-物"交互(HOI)检测](#HOI)
58 | - [轨迹预测](#TP)
59 | - [运动预测](#Motion-Predication)
60 | - [光流估计](#OF)
61 | - [图像检索](#IR)
62 | - [虚拟试衣](#Virtual-Try-On)
63 | - [HDR](#HDR)
64 | - [对抗样本](#AE)
65 | - [三维重建](#3D-Reconstructing)
66 | - [深度补全](#DC)
67 | - [语义场景补全](#SSC)
68 | - [图像/视频描述](#Captioning)
69 | - [线框解析](#WP)
70 | - [数据集](#Datasets)
71 | - [其他](#Others)
72 | - [不确定中没中](#Not-Sure)
73 |
74 |
75 |
76 | # CNN
77 |
78 | **Exploring Self-attention for Image Recognition**
79 |
80 | - 论文:https://hszhao.github.io/papers/cvpr20_san.pdf
81 |
82 | - 代码:https://github.com/hszhao/SAN
83 |
84 | **Improving Convolutional Networks with Self-Calibrated Convolutions**
85 |
86 | - 主页:https://mmcheng.net/scconv/
87 |
88 | - 论文:http://mftp.mmcheng.net/Papers/20cvprSCNet.pdf
89 | - 代码:https://github.com/backseason/SCNet
90 |
91 | **Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets**
92 |
93 | - 论文:https://arxiv.org/abs/2003.13549
94 | - 代码:https://github.com/zeiss-microscopy/BSConv
95 |
96 |
97 |
98 | # 图像分类
99 |
100 | **Interpretable and Accurate Fine-grained Recognition via Region Grouping**
101 |
102 | - 论文:https://arxiv.org/abs/2005.10411
103 |
104 | - 代码:https://github.com/zxhuang1698/interpretability-by-parts
105 |
106 | **Compositional Convolutional Neural Networks: A Deep Architecture with Innate Robustness to Partial Occlusion**
107 |
108 | - 论文:https://arxiv.org/abs/2003.04490
109 |
110 | - 代码:https://github.com/AdamKortylewski/CompositionalNets
111 |
112 | **Spatially Attentive Output Layer for Image Classification**
113 |
114 | - 论文:https://arxiv.org/abs/2004.07570
115 | - 代码(好像被原作者删除了):https://github.com/ildoonet/spatially-attentive-output-layer
116 |
117 |
118 |
119 | # 视频分类
120 |
121 | **SmallBigNet: Integrating Core and Contextual Views for Video Classification**
122 |
123 | - 论文:https://arxiv.org/abs/2006.14582
124 | - 代码:https://github.com/xhl-video/SmallBigNet
125 |
126 |
127 |
128 | # 目标检测
129 |
130 | **Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax**
131 |
132 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Li_Overcoming_Classifier_Imbalance_for_Long-Tail_Object_Detection_With_Balanced_Group_CVPR_2020_paper.pdf
133 | - 代码:https://github.com/FishYuLi/BalancedGroupSoftmax
134 |
135 | **AugFPN: Improving Multi-scale Feature Learning for Object Detection**
136 |
137 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Guo_AugFPN_Improving_Multi-Scale_Feature_Learning_for_Object_Detection_CVPR_2020_paper.pdf
138 | - 代码:https://github.com/Gus-Guo/AugFPN
139 |
140 | **Noise-Aware Fully Webly Supervised Object Detection**
141 |
142 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Shen_Noise-Aware_Fully_Webly_Supervised_Object_Detection_CVPR_2020_paper.html
143 | - 代码:https://github.com/shenyunhang/NA-fWebSOD/
144 |
145 | **Learning a Unified Sample Weighting Network for Object Detection**
146 |
147 | - 论文:https://arxiv.org/abs/2006.06568
148 | - 代码:https://github.com/caiqi/sample-weighting-network
149 |
150 | **D2Det: Towards High Quality Object Detection and Instance Segmentation**
151 |
152 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Cao_D2Det_Towards_High_Quality_Object_Detection_and_Instance_Segmentation_CVPR_2020_paper.pdf
153 |
154 | - 代码:https://github.com/JialeCao001/D2Det
155 |
156 | **Dynamic Refinement Network for Oriented and Densely Packed Object Detection**
157 |
158 | - 论文下载链接:https://arxiv.org/abs/2005.09973
159 |
160 | - 代码和数据集:https://github.com/Anymake/DRN_CVPR2020
161 |
162 | **Scale-Equalizing Pyramid Convolution for Object Detection**
163 |
164 | 论文:https://arxiv.org/abs/2005.03101
165 |
166 | 代码:https://github.com/jshilong/SEPC
167 |
168 | **Revisiting the Sibling Head in Object Detector**
169 |
170 | - 论文:https://arxiv.org/abs/2003.07540
171 |
172 | - 代码:https://github.com/Sense-X/TSD
173 |
174 | **Scale-equalizing Pyramid Convolution for Object Detection**
175 |
176 | - 论文:暂无
177 | - 代码:https://github.com/jshilong/SEPC
178 |
179 | **Detection in Crowded Scenes: One Proposal, Multiple Predictions**
180 |
181 | - 论文:https://arxiv.org/abs/2003.09163
182 | - 代码:https://github.com/megvii-model/CrowdDetection
183 |
184 | **Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection**
185 |
186 | - 论文:https://arxiv.org/abs/2004.04725
187 | - 代码:https://github.com/NVlabs/wetectron
188 |
189 | **Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection**
190 |
191 | - 论文:https://arxiv.org/abs/1912.02424
192 | - 代码:https://github.com/sfzhang15/ATSS
193 |
194 | **BiDet: An Efficient Binarized Object Detector**
195 |
196 | - 论文:https://arxiv.org/abs/2003.03961
197 | - 代码:https://github.com/ZiweiWangTHU/BiDet
198 |
199 | **Harmonizing Transferability and Discriminability for Adapting Object Detectors**
200 |
201 | - 论文:https://arxiv.org/abs/2003.06297
202 | - 代码:https://github.com/chaoqichen/HTCN
203 |
204 | **CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection**
205 |
206 | - 论文:https://arxiv.org/abs/2003.09119
207 | - 代码:https://github.com/KiveeDong/CentripetalNet
208 |
209 | **Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection**
210 |
211 | - 论文:https://arxiv.org/abs/2003.11818
212 | - 代码:https://github.com/ggjy/HitDet.pytorch
213 |
214 | **EfficientDet: Scalable and Efficient Object Detection**
215 |
216 | - 论文:https://arxiv.org/abs/1911.09070
217 | - 代码:https://github.com/google/automl/tree/master/efficientdet
218 |
219 |
220 |
221 | # 3D目标检测
222 |
223 | **SESS: Self-Ensembling Semi-Supervised 3D Object Detection**
224 |
225 | - 论文: https://arxiv.org/abs/1912.11803
226 |
227 | - 代码:https://github.com/Na-Z/sess
228 |
229 | **Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection**
230 |
231 | - 论文: https://arxiv.org/abs/2006.04356
232 |
233 | - 代码:https://github.com/dleam/Associate-3Ddet
234 |
235 | **What You See is What You Get: Exploiting Visibility for 3D Object Detection**
236 |
237 | - 主页:https://www.cs.cmu.edu/~peiyunh/wysiwyg/
238 |
239 | - 论文:https://arxiv.org/abs/1912.04986
240 | - 代码:https://github.com/peiyunh/wysiwyg
241 |
242 | **Learning Depth-Guided Convolutions for Monocular 3D Object Detection**
243 |
244 | - 论文:https://arxiv.org/abs/1912.04799
245 | - 代码:https://github.com/dingmyu/D4LCN
246 |
247 | **Structure Aware Single-stage 3D Object Detection from Point Cloud**
248 |
249 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/He_Structure_Aware_Single-Stage_3D_Object_Detection_From_Point_Cloud_CVPR_2020_paper.html
250 |
251 | - 代码:https://github.com/skyhehe123/SA-SSD
252 |
253 | **IDA-3D: Instance-Depth-Aware 3D Object Detection from Stereo Vision for Autonomous Driving**
254 |
255 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Peng_IDA-3D_Instance-Depth-Aware_3D_Object_Detection_From_Stereo_Vision_for_Autonomous_CVPR_2020_paper.pdf
256 |
257 | - 代码:https://github.com/swords123/IDA-3D
258 |
259 | **Train in Germany, Test in The USA: Making 3D Object Detectors Generalize**
260 |
261 | - 论文:https://arxiv.org/abs/2005.08139
262 |
263 | - 代码:https://github.com/cxy1997/3D_adapt_auto_driving
264 |
265 | **MLCVNet: Multi-Level Context VoteNet for 3D Object Detection**
266 |
267 | - 论文:https://arxiv.org/abs/2004.05679
268 | - 代码:https://github.com/NUAAXQ/MLCVNet
269 |
270 | **3DSSD: Point-based 3D Single Stage Object Detector**
271 |
272 | - CVPR 2020 Oral
273 |
274 | - 论文:https://arxiv.org/abs/2002.10187
275 |
276 | - 代码:https://github.com/tomztyang/3DSSD
277 |
278 | **Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation**
279 |
280 | - 论文:https://arxiv.org/abs/2004.03572
281 |
282 | - 代码:https://github.com/zju3dv/disprcn
283 |
284 | **End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection**
285 |
286 | - 论文:https://arxiv.org/abs/2004.03080
287 |
288 | - 代码:https://github.com/mileyan/pseudo-LiDAR_e2e
289 |
290 | **DSGN: Deep Stereo Geometry Network for 3D Object Detection**
291 |
292 | - 论文:https://arxiv.org/abs/2001.03398
293 | - 代码:https://github.com/chenyilun95/DSGN
294 |
295 | **LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention**
296 |
297 | - 论文:https://arxiv.org/abs/2004.01389
298 | - 代码:https://github.com/yinjunbo/3DVID
299 |
300 | **PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection**
301 |
302 | - 论文:https://arxiv.org/abs/1912.13192
303 |
304 | - 代码:https://github.com/sshaoshuai/PV-RCNN
305 |
306 | **Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud**
307 |
308 | - 论文:https://arxiv.org/abs/2003.01251
309 | - 代码:https://github.com/WeijingShi/Point-GNN
310 |
311 |
312 |
313 | # 视频目标检测
314 |
315 | **Memory Enhanced Global-Local Aggregation for Video Object Detection**
316 |
317 | 论文:https://arxiv.org/abs/2003.12063
318 |
319 | 代码:https://github.com/Scalsol/mega.pytorch
320 |
321 |
322 |
323 | # 目标跟踪
324 |
325 | **SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking**
326 |
327 | - 论文:https://arxiv.org/abs/1911.07241
328 | - 代码:https://github.com/ohhhyeahhh/SiamCAR
329 |
330 | **D3S -- A Discriminative Single Shot Segmentation Tracker**
331 |
332 | - 论文:https://arxiv.org/abs/1911.08862
333 | - 代码:https://github.com/alanlukezic/d3s
334 |
335 | **ROAM: Recurrently Optimizing Tracking Model**
336 |
337 | - 论文:https://arxiv.org/abs/1907.12006
338 |
339 | - 代码:https://github.com/skyoung/ROAM
340 |
341 | **Siam R-CNN: Visual Tracking by Re-Detection**
342 |
343 | - 主页:https://www.vision.rwth-aachen.de/page/siamrcnn
344 | - 论文:https://arxiv.org/abs/1911.12836
345 | - 论文2:https://www.vision.rwth-aachen.de/media/papers/192/siamrcnn.pdf
346 | - 代码:https://github.com/VisualComputingInstitute/SiamR-CNN
347 |
348 | **Cooling-Shrinking Attack: Blinding the Tracker with Imperceptible Noises**
349 |
350 | - 论文:https://arxiv.org/abs/2003.09595
351 | - 代码:https://github.com/MasterBin-IIAU/CSA
352 |
353 | **High-Performance Long-Term Tracking with Meta-Updater**
354 |
355 | - 论文:https://arxiv.org/abs/2004.00305
356 |
357 | - 代码:https://github.com/Daikenan/LTMU
358 |
359 | **AutoTrack: Towards High-Performance Visual Tracking for UAV with Automatic Spatio-Temporal Regularization**
360 |
361 | - 论文:https://arxiv.org/abs/2003.12949
362 |
363 | - 代码:https://github.com/vision4robotics/AutoTrack
364 |
365 | **Probabilistic Regression for Visual Tracking**
366 |
367 | - 论文:https://arxiv.org/abs/2003.12565
368 | - 代码:https://github.com/visionml/pytracking
369 |
370 | **MAST: A Memory-Augmented Self-supervised Tracker**
371 |
372 | - 论文:https://arxiv.org/abs/2002.07793
373 | - 代码:https://github.com/zlai0/MAST
374 |
375 | **Siamese Box Adaptive Network for Visual Tracking**
376 |
377 | - 论文:https://arxiv.org/abs/2003.06761
378 | - 代码:https://github.com/hqucv/siamban
379 |
380 | ## 多目标跟踪
381 |
382 | **3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset**
383 |
384 | - 主页:https://vap.aau.dk/3d-zef/
385 | - 论文:https://arxiv.org/abs/2006.08466
386 | - 代码:https://bitbucket.org/aauvap/3d-zef/src/master/
387 | - 数据集:https://motchallenge.net/data/3D-ZeF20
388 |
389 |
390 |
391 | # 语义分割
392 |
393 | **FDA: Fourier Domain Adaptation for Semantic Segmentation**
394 |
395 | - 论文:https://arxiv.org/abs/2004.05498
396 |
397 | - 代码:https://github.com/YanchaoYang/FDA
398 |
399 | **Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation**
400 |
401 | - 论文:暂无
402 |
403 | - 代码:https://github.com/JianqiangWan/Super-BPD
404 |
405 | **Single-Stage Semantic Segmentation from Image Labels**
406 |
407 | - 论文:https://arxiv.org/abs/2005.08104
408 |
409 | - 代码:https://github.com/visinf/1-stage-wseg
410 |
411 | **Learning Texture Invariant Representation for Domain Adaptation of Semantic Segmentation**
412 |
413 | - 论文:https://arxiv.org/abs/2003.00867
414 | - 代码:https://github.com/MyeongJin-Kim/Learning-Texture-Invariant-Representation
415 |
416 | **MSeg: A Composite Dataset for Multi-domain Semantic Segmentation**
417 |
418 | - 论文:http://vladlen.info/papers/MSeg.pdf
419 | - 代码:https://github.com/mseg-dataset/mseg-api
420 |
421 | **CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement**
422 |
423 | - 论文:https://arxiv.org/abs/2005.02551
424 | - 代码:https://github.com/hkchengrex/CascadePSP
425 |
426 | **Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision**
427 |
428 | - Oral
429 | - 论文:https://arxiv.org/abs/2004.07703
430 | - 代码:https://github.com/feipan664/IntraDA
431 |
432 | **Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation**
433 |
434 | - 论文:https://arxiv.org/abs/2004.04581
435 | - 代码:https://github.com/YudeWang/SEAM
436 |
437 | **Temporally Distributed Networks for Fast Video Segmentation**
438 |
439 | - 论文:https://arxiv.org/abs/2004.01800
440 |
441 | - 代码:https://github.com/feinanshan/TDNet
442 |
443 | **Context Prior for Scene Segmentation**
444 |
445 | - 论文:https://arxiv.org/abs/2004.01547
446 |
447 | - 代码:https://git.io/ContextPrior
448 |
449 | **Strip Pooling: Rethinking Spatial Pooling for Scene Parsing**
450 |
451 | - 论文:https://arxiv.org/abs/2003.13328
452 |
453 | - 代码:https://github.com/Andrew-Qibin/SPNet
454 |
455 | **Cars Can't Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks**
456 |
457 | - 论文:https://arxiv.org/abs/2003.05128
458 | - 代码:https://github.com/shachoi/HANet
459 |
460 | **Learning Dynamic Routing for Semantic Segmentation**
461 |
462 | - 论文:https://arxiv.org/abs/2003.10401
463 |
464 | - 代码:https://github.com/yanwei-li/DynamicRouting
465 |
466 |
467 |
468 | # 实例分割
469 |
470 | **D2Det: Towards High Quality Object Detection and Instance Segmentation**
471 |
472 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Cao_D2Det_Towards_High_Quality_Object_Detection_and_Instance_Segmentation_CVPR_2020_paper.pdf
473 |
474 | - 代码:https://github.com/JialeCao001/D2Det
475 |
476 | **PolarMask: Single Shot Instance Segmentation with Polar Representation**
477 |
478 | - 论文:https://arxiv.org/abs/1909.13226
479 | - 代码:https://github.com/xieenze/PolarMask
480 | - 解读:https://zhuanlan.zhihu.com/p/84890413
481 |
482 | **CenterMask : Real-Time Anchor-Free Instance Segmentation**
483 |
484 | - 论文:https://arxiv.org/abs/1911.06667
485 | - 代码:https://github.com/youngwanLEE/CenterMask
486 |
487 | **BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation**
488 |
489 | - 论文:https://arxiv.org/abs/2001.00309
490 | - 代码:https://github.com/aim-uofa/AdelaiDet
491 |
492 | **Deep Snake for Real-Time Instance Segmentation**
493 |
494 | - 论文:https://arxiv.org/abs/2001.01629
495 | - 代码:https://github.com/zju3dv/snake
496 |
497 | **Mask Encoding for Single Shot Instance Segmentation**
498 |
499 | - 论文:https://arxiv.org/abs/2003.11712
500 |
501 | - 代码:https://github.com/aim-uofa/AdelaiDet
502 |
503 |
504 |
505 | # 全景分割
506 |
507 | **Video Panoptic Segmentation**
508 |
509 | - 论文:https://arxiv.org/abs/2006.11339
510 | - 代码:https://github.com/mcahny/vps
511 | - 数据集:https://www.dropbox.com/s/ecem4kq0fdkver4/cityscapes-vps-dataset-1.0.zip?dl=0
512 |
513 | **Pixel Consensus Voting for Panoptic Segmentation**
514 |
515 | - 论文:https://arxiv.org/abs/2004.01849
516 | - 代码:还未公布
517 |
518 | **BANet: Bidirectional Aggregation Network with Occlusion Handling for Panoptic Segmentation**
519 |
520 | 论文:https://arxiv.org/abs/2003.14031
521 |
522 | 代码:https://github.com/Mooonside/BANet
523 |
524 |
525 |
526 | # 视频目标分割
527 |
528 | **A Transductive Approach for Video Object Segmentation**
529 |
530 | - 论文:https://arxiv.org/abs/2004.07193
531 |
532 | - 代码:https://github.com/microsoft/transductive-vos.pytorch
533 |
534 | **State-Aware Tracker for Real-Time Video Object Segmentation**
535 |
536 | - 论文:https://arxiv.org/abs/2003.00482
537 |
538 | - 代码:https://github.com/MegviiDetection/video_analyst
539 |
540 | **Learning Fast and Robust Target Models for Video Object Segmentation**
541 |
542 | - 论文:https://arxiv.org/abs/2003.00908
543 | - 代码:https://github.com/andr345/frtm-vos
544 |
545 | **Learning Video Object Segmentation from Unlabeled Videos**
546 |
547 | - 论文:https://arxiv.org/abs/2003.05020
548 | - 代码:https://github.com/carrierlxk/MuG
549 |
550 |
551 |
552 | # 超像素分割
553 |
554 | **Superpixel Segmentation with Fully Convolutional Networks**
555 |
556 | - 论文:https://arxiv.org/abs/2003.12929
557 | - 代码:https://github.com/fuy34/superpixel_fcn
558 |
559 |
560 |
561 | # 交互式图像分割
562 |
563 | **Interactive Object Segmentation with Inside-Outside Guidance**
564 |
565 | - 论文下载链接:http://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Interactive_Object_Segmentation_With_Inside-Outside_Guidance_CVPR_2020_paper.pdf
566 | - 代码:https://github.com/shiyinzhang/Inside-Outside-Guidance
567 | - 数据集:https://github.com/shiyinzhang/Pixel-ImageNet
568 |
569 |
570 |
571 | # NAS
572 |
573 | **AOWS: Adaptive and optimal network width search with latency constraints**
574 |
575 | - 论文:https://arxiv.org/abs/2005.10481
576 | - 代码:https://github.com/bermanmaxim/AOWS
577 |
578 | **Densely Connected Search Space for More Flexible Neural Architecture Search**
579 |
580 | - 论文:https://arxiv.org/abs/1906.09607
581 |
582 | - 代码:https://github.com/JaminFong/DenseNAS
583 |
584 | **MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning**
585 |
586 | - 论文:https://arxiv.org/abs/2003.14058
587 |
588 | - 代码:https://github.com/bhpfelix/MTLNAS
589 |
590 | **FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions**
591 |
592 | - 论文下载链接:https://arxiv.org/abs/2004.05565
593 |
594 | - 代码:https://github.com/facebookresearch/mobile-vision
595 |
596 | **Neural Architecture Search for Lightweight Non-Local Networks**
597 |
598 | - 论文:https://arxiv.org/abs/2004.01961
599 | - 代码:https://github.com/LiYingwei/AutoNL
600 |
601 | **Rethinking Performance Estimation in Neural Architecture Search**
602 |
603 | - 论文:https://arxiv.org/abs/2005.09917
604 | - 代码:https://github.com/zhengxiawu/rethinking_performance_estimation_in_NAS
605 | - 解读1:https://www.zhihu.com/question/372070853/answer/1035234510
606 | - 解读2:https://zhuanlan.zhihu.com/p/111167409
607 |
608 | **CARS: Continuous Evolution for Efficient Neural Architecture Search**
609 |
610 | - 论文:https://arxiv.org/abs/1909.04977
611 | - 代码(即将开源):https://github.com/huawei-noah/CARS
612 |
613 |
614 |
615 | # GAN
616 |
617 | **SEAN: Image Synthesis with Semantic Region-Adaptive Normalization**
618 |
619 | - 论文:https://arxiv.org/abs/1911.12861
620 | - 代码:https://github.com/ZPdesu/SEAN
621 |
622 | **Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation**
623 |
624 | - 论文地址:http://openaccess.thecvf.com/content_CVPR_2020/html/Chen_Reusing_Discriminators_for_Encoding_Towards_Unsupervised_Image-to-Image_Translation_CVPR_2020_paper.html
625 | - 代码地址:https://github.com/alpc91/NICE-GAN-pytorch
626 |
627 | **Distribution-induced Bidirectional Generative Adversarial Network for Graph Representation Learning**
628 |
629 | - 论文:https://arxiv.org/abs/1912.01899
630 | - 代码:https://github.com/SsGood/DBGAN
631 |
632 | **PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer**
633 |
634 | - 论文:https://arxiv.org/abs/1909.06956
635 | - 代码:https://github.com/wtjiang98/PSGAN
636 |
637 | **Semantically Mutil-modal Image Synthesis**
638 |
639 | - 主页:http://seanseattle.github.io/SMIS
640 | - 论文:https://arxiv.org/abs/2003.12697
641 | - 代码:https://github.com/Seanseattle/SMIS
642 |
643 | **Unpaired Portrait Drawing Generation via Asymmetric Cycle Mapping**
644 |
645 | - 论文:https://yiranran.github.io/files/CVPR2020_Unpaired%20Portrait%20Drawing%20Generation%20via%20Asymmetric%20Cycle%20Mapping.pdf
646 | - 代码:https://github.com/yiranran/Unpaired-Portrait-Drawing
647 |
648 | **Learning to Cartoonize Using White-box Cartoon Representations**
649 |
650 | - 论文:https://github.com/SystemErrorWang/White-box-Cartoonization/blob/master/paper/06791.pdf
651 |
652 | - 主页:https://systemerrorwang.github.io/White-box-Cartoonization/
653 | - 代码:https://github.com/SystemErrorWang/White-box-Cartoonization
654 | - 解读:https://zhuanlan.zhihu.com/p/117422157
655 | - Demo视频:https://www.bilibili.com/video/av56708333
656 |
657 | **GAN Compression: Efficient Architectures for Interactive Conditional GANs**
658 |
659 | - 论文:https://arxiv.org/abs/2003.08936
660 |
661 | - 代码:https://github.com/mit-han-lab/gan-compression
662 |
663 | **Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions**
664 |
665 | - 论文:https://arxiv.org/abs/2003.01826
666 | - 代码:https://github.com/cc-hpc-itwm/UpConv
667 |
668 |
669 |
670 | # Re-ID
671 |
672 | **High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification**
673 |
674 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Wang_High-Order_Information_Matters_Learning_Relation_and_Topology_for_Occluded_Person_CVPR_2020_paper.html
675 | - 代码:https://github.com/wangguanan/HOReID
676 |
677 | **COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification**
678 |
679 | - 论文:https://arxiv.org/abs/2005.07862
680 |
681 | - 数据集:暂无
682 |
683 | **Transferable, Controllable, and Inconspicuous Adversarial Attacks on Person Re-identification With Deep Mis-Ranking**
684 |
685 | - 论文:https://arxiv.org/abs/2004.04199
686 |
687 | - 代码:https://github.com/whj363636/Adversarial-attack-on-Person-ReID-With-Deep-Mis-Ranking
688 |
689 | **Pose-guided Visible Part Matching for Occluded Person ReID**
690 |
691 | - 论文:https://arxiv.org/abs/2004.00230
692 | - 代码:https://github.com/hh23333/PVPM
693 |
694 | **Weakly supervised discriminative feature learning with state information for person identification**
695 |
696 | - 论文:https://arxiv.org/abs/2002.11939
697 | - 代码:https://github.com/KovenYu/state-information
698 |
699 |
700 |
701 | # 3D点云(分类/分割/配准等)
702 |
703 | ## 3D点云卷积
704 |
705 | **PointASNL: Robust Point Clouds Processing using Nonlocal Neural Networks with Adaptive Sampling**
706 |
707 | - 论文:https://arxiv.org/abs/2003.00492
708 | - 代码:https://github.com/yanx27/PointASNL
709 |
710 | **Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds**
711 |
712 | - 论文下载链接:https://arxiv.org/abs/2003.12971
713 |
714 | - 代码:https://github.com/raoyongming/PointGLR
715 |
716 | **Grid-GCN for Fast and Scalable Point Cloud Learning**
717 |
718 | - 论文:https://arxiv.org/abs/1912.02984
719 |
720 | - 代码:https://github.com/Xharlie/Grid-GCN
721 |
722 | **FPConv: Learning Local Flattening for Point Convolution**
723 |
724 | - 论文:https://arxiv.org/abs/2002.10701
725 | - 代码:https://github.com/lyqun/FPConv
726 |
727 | ## 3D点云分类
728 |
729 | **PointAugment: an Auto-Augmentation Framework for Point Cloud Classification**
730 |
731 | - 论文:https://arxiv.org/abs/2002.10876
732 | - 代码(即将开源): https://github.com/liruihui/PointAugment/
733 |
734 | ## 3D点云语义分割
735 |
736 | **RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds**
737 |
738 | - 论文:https://arxiv.org/abs/1911.11236
739 | - 代码:https://github.com/QingyongHu/RandLA-Net
740 |
741 | - 解读:https://zhuanlan.zhihu.com/p/105433460
742 |
743 | **Weakly Supervised Semantic Point Cloud Segmentation:Towards 10X Fewer Labels**
744 |
745 | - 论文:https://arxiv.org/abs/2004.04091
746 |
747 | - 代码:https://github.com/alex-xun-xu/WeakSupPointCloudSeg
748 |
749 | **PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation**
750 |
751 | - 论文:https://arxiv.org/abs/2003.14032
752 | - 代码:https://github.com/edwardzhou130/PolarSeg
753 |
754 | **Learning to Segment 3D Point Clouds in 2D Image Space**
755 |
756 | - 论文:https://arxiv.org/abs/2003.05593
757 |
758 | - 代码:https://github.com/WPI-VISLab/Learning-to-Segment-3D-Point-Clouds-in-2D-Image-Space
759 |
760 | ## 3D点云实例分割
761 |
762 | PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation
763 |
764 | - 论文:https://arxiv.org/abs/2004.01658
765 | - 代码:https://github.com/Jia-Research-Lab/PointGroup
766 |
767 | ## 3D点云配准
768 |
769 | **Feature-metric Registration: A Fast Semi-supervised Approach for Robust Point Cloud Registration without Correspondences**
770 |
771 | - 论文:https://arxiv.org/abs/2005.01014
772 | - 代码:https://github.com/XiaoshuiHuang/fmr
773 |
774 | **D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features**
775 |
776 | - 论文:https://arxiv.org/abs/2003.03164
777 | - 代码:https://github.com/XuyangBai/D3Feat
778 |
779 | **RPM-Net: Robust Point Matching using Learned Features**
780 |
781 | - 论文:https://arxiv.org/abs/2003.13479
782 | - 代码:https://github.com/yewzijian/RPMNet
783 |
784 | ## 3D点云补全
785 |
786 | **Cascaded Refinement Network for Point Cloud Completion**
787 |
788 | - 论文:https://arxiv.org/abs/2004.03327
789 | - 代码:https://github.com/xiaogangw/cascaded-point-completion
790 |
791 | ## 3D点云目标跟踪
792 |
793 | **P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds**
794 |
795 | - 论文:https://arxiv.org/abs/2005.13888
796 | - 代码:https://github.com/HaozheQi/P2B
797 |
798 | ## 其他
799 |
800 | **An Efficient PointLSTM for Point Clouds Based Gesture Recognition**
801 |
802 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Min_An_Efficient_PointLSTM_for_Point_Clouds_Based_Gesture_Recognition_CVPR_2020_paper.html
803 | - 代码:https://github.com/Blueprintf/pointlstm-gesture-recognition-pytorch
804 |
805 |
806 |
807 | # 人脸
808 |
809 | ## 人脸识别
810 |
811 | **CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition**
812 |
813 | - 论文:https://arxiv.org/abs/2004.00288
814 |
815 | - 代码:https://github.com/HuangYG123/CurricularFace
816 |
817 | **Learning Meta Face Recognition in Unseen Domains**
818 |
819 | - 论文:https://arxiv.org/abs/2003.07733
820 | - 代码:https://github.com/cleardusk/MFR
821 | - 解读:https://mp.weixin.qq.com/s/YZoEnjpnlvb90qSI3xdJqQ
822 |
823 | ## 人脸检测
824 |
825 | ## 人脸活体检测
826 |
827 | **Searching Central Difference Convolutional Networks for Face Anti-Spoofing**
828 |
829 | - 论文:https://arxiv.org/abs/2003.04092
830 |
831 | - 代码:https://github.com/ZitongYu/CDCN
832 |
833 | ## 人脸表情识别
834 |
835 | **Suppressing Uncertainties for Large-Scale Facial Expression Recognition**
836 |
837 | - 论文:https://arxiv.org/abs/2002.10392
838 |
839 | - 代码(即将开源):https://github.com/kaiwang960112/Self-Cure-Network
840 |
841 | ## 人脸转正
842 |
843 | **Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images**
844 |
845 | - 论文:https://arxiv.org/abs/2003.08124
846 | - 代码:https://github.com/Hangz-nju-cuhk/Rotate-and-Render
847 |
848 | ## 人脸3D重建
849 |
850 | **AvatarMe: Realistically Renderable 3D Facial Reconstruction "in-the-wild"**
851 |
852 | - 论文:https://arxiv.org/abs/2003.13845
853 | - 数据集:https://github.com/lattas/AvatarMe
854 |
855 | **FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction**
856 |
857 | - 论文:https://arxiv.org/abs/2003.13989
858 | - 代码:https://github.com/zhuhao-nju/facescape
859 |
860 |
861 |
862 | # 人体姿态估计(2D/3D)
863 |
864 | ## 2D人体姿态估计
865 |
866 | **TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting**
867 |
868 | - 主页:https://yzhq97.github.io/transmomo/
869 |
870 | - 论文:https://arxiv.org/abs/2003.14401
871 | - 代码:https://github.com/yzhq97/transmomo.pytorch
872 |
873 | **HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation**
874 |
875 | - 论文:https://arxiv.org/abs/1908.10357
876 | - 代码:https://github.com/HRNet/HigherHRNet-Human-Pose-Estimation
877 |
878 | **The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation**
879 |
880 | - 论文:https://arxiv.org/abs/1911.07524
881 | - 代码:https://github.com/HuangJunJie2017/UDP-Pose
882 | - 解读:https://zhuanlan.zhihu.com/p/92525039
883 |
884 | **Distribution-Aware Coordinate Representation for Human Pose Estimation**
885 |
886 | - 主页:https://ilovepose.github.io/coco/
887 |
888 | - 论文:https://arxiv.org/abs/1910.06278
889 |
890 | - 代码:https://github.com/ilovepose/DarkPose
891 |
892 | ## 3D人体姿态估计
893 |
894 | **Cascaded Deep Monocular 3D Human Pose Estimation With Evolutionary Training Data**
895 |
896 | - 论文:https://arxiv.org/abs/2006.07778
897 | - 代码:https://github.com/Nicholasli1995/EvoSkeleton
898 |
899 | **Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach**
900 |
901 | - 主页:https://www.zhe-zhang.com/cvpr2020
902 | - 论文:https://arxiv.org/abs/2003.11163
903 |
904 | - 代码:https://github.com/CHUNYUWANG/imu-human-pose-pytorch
905 |
906 | **Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data**
907 |
908 | - 论文下载链接:https://arxiv.org/abs/2004.01166
909 |
910 | - 代码:https://github.com/Healthcare-Robotics/bodies-at-rest
911 | - 数据集:https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/KOA4ML
912 |
913 | **Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis**
914 |
915 | - 主页:http://val.cds.iisc.ac.in/pgp-human/
916 | - 论文:https://arxiv.org/abs/2004.04400
917 |
918 | **Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation**
919 |
920 | - 论文:https://arxiv.org/abs/2004.00329
921 | - 代码:https://github.com/fabbrimatteo/LoCO
922 |
923 | **VIBE: Video Inference for Human Body Pose and Shape Estimation**
924 |
925 | - 论文:https://arxiv.org/abs/1912.05656
926 | - 代码:https://github.com/mkocabas/VIBE
927 |
928 | **Back to the Future: Joint Aware Temporal Deep Learning 3D Human Pose Estimation**
929 |
930 | - 论文:https://arxiv.org/abs/2002.11251
931 | - 代码:https://github.com/vnmr/JointVideoPose3D
932 |
933 | **Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS**
934 |
935 | - 论文:https://arxiv.org/abs/2003.03972
936 | - 数据集:暂无
937 |
938 |
939 |
940 | # 人体解析
941 |
942 | **Correlating Edge, Pose with Parsing**
943 |
944 | - 论文:https://arxiv.org/abs/2005.01431
945 |
946 | - 代码:https://github.com/ziwei-zh/CorrPM
947 |
948 |
949 |
950 | # 场景文本检测
951 |
952 | **STEFANN: Scene Text Editor using Font Adaptive Neural Network**
953 |
954 | - 主页:https://prasunroy.github.io/stefann/
955 |
956 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Roy_STEFANN_Scene_Text_Editor_Using_Font_Adaptive_Neural_Network_CVPR_2020_paper.html
957 | - 代码:https://github.com/prasunroy/stefann
958 | - 数据集:https://drive.google.com/open?id=1sEDiX_jORh2X-HSzUnjIyZr-G9LJIw1k
959 |
960 | **ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection**
961 |
962 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_ContourNet_Taking_a_Further_Step_Toward_Accurate_Arbitrary-Shaped_Scene_Text_CVPR_2020_paper.pdf
963 | - 代码:https://github.com/wangyuxin87/ContourNet
964 |
965 | **UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World**
966 |
967 | - 论文:https://arxiv.org/abs/2003.10608
968 | - 代码和数据集:https://github.com/Jyouhou/UnrealText/
969 |
970 | **ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network**
971 |
972 | - 论文:https://arxiv.org/abs/2002.10200
973 | - 代码(即将开源):https://github.com/Yuliang-Liu/bezier_curve_text_spotting
974 | - 代码(即将开源):https://github.com/aim-uofa/adet
975 |
976 | **Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection**
977 |
978 | - 论文:https://arxiv.org/abs/2003.07493
979 |
980 | - 代码:https://github.com/GXYM/DRRG
981 |
982 |
983 |
984 | # 场景文本识别
985 |
986 | **SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition**
987 |
988 | - 论文:https://arxiv.org/abs/2005.10977
989 | - 代码:https://github.com/Pay20Y/SEED
990 |
991 | **UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World**
992 |
993 | - 论文:https://arxiv.org/abs/2003.10608
994 | - 代码和数据集:https://github.com/Jyouhou/UnrealText/
995 |
996 | **ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network**
997 |
998 | - 论文:https://arxiv.org/abs/2002.10200
999 | - 代码(即将开源):https://github.com/aim-uofa/adet
1000 |
1001 | **Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition**
1002 |
1003 | - 论文:https://arxiv.org/abs/2003.06606
1004 |
1005 | - 代码:https://github.com/Canjie-Luo/Text-Image-Augmentation
1006 |
1007 |
1008 |
1009 | # 特征(点)检测和描述
1010 |
1011 | **SuperGlue: Learning Feature Matching with Graph Neural Networks**
1012 |
1013 | - 论文:https://arxiv.org/abs/1911.11763
1014 | - 代码:https://github.com/magicleap/SuperGluePretrainedNetwork
1015 |
1016 |
1017 |
1018 | # 超分辨率
1019 |
1020 | ## 图像超分辨率
1021 |
1022 | **Closed-Loop Matters: Dual Regression Networks for Single Image Super-Resolution**
1023 |
1024 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Guo_Closed-Loop_Matters_Dual_Regression_Networks_for_Single_Image_Super-Resolution_CVPR_2020_paper.html
1025 | - 代码:https://github.com/guoyongcs/DRN
1026 |
1027 | **Learning Texture Transformer Network for Image Super-Resolution**
1028 |
1029 | - 论文:https://arxiv.org/abs/2006.04139
1030 |
1031 | - 代码:https://github.com/FuzhiYang/TTSR
1032 |
1033 | **Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining**
1034 |
1035 | - 论文:https://arxiv.org/abs/2006.01424
1036 | - 代码:https://github.com/SHI-Labs/Cross-Scale-Non-Local-Attention
1037 |
1038 | **Structure-Preserving Super Resolution with Gradient Guidance**
1039 |
1040 | - 论文:https://arxiv.org/abs/2003.13081
1041 |
1042 | - 代码:https://github.com/Maclory/SPSR
1043 |
1044 | **Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy**
1045 |
1046 | 论文:https://arxiv.org/abs/2004.00448
1047 |
1048 | 代码:https://github.com/clovaai/cutblur
1049 |
1050 | ## 视频超分辨率
1051 |
1052 | **TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution**
1053 |
1054 | - 论文:https://arxiv.org/abs/1812.02898
1055 | - 代码:https://github.com/YapengTian/TDAN-VSR-CVPR-2020
1056 |
1057 | **Space-Time-Aware Multi-Resolution Video Enhancement**
1058 |
1059 | - 主页:https://alterzero.github.io/projects/STAR.html
1060 | - 论文:http://arxiv.org/abs/2003.13170
1061 | - 代码:https://github.com/alterzero/STARnet
1062 |
1063 | **Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution**
1064 |
1065 | - 论文:https://arxiv.org/abs/2002.11616
1066 | - 代码:https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020
1067 |
1068 |
1069 |
1070 | # 模型压缩/剪枝
1071 |
1072 | **DMCP: Differentiable Markov Channel Pruning for Neural Networks**
1073 |
1074 | - 论文:https://arxiv.org/abs/2005.03354
1075 | - 代码:https://github.com/zx55/dmcp
1076 |
1077 | **Forward and Backward Information Retention for Accurate Binary Neural Networks**
1078 |
1079 | - 论文:https://arxiv.org/abs/1909.10788
1080 |
1081 | - 代码:https://github.com/htqin/IR-Net
1082 |
1083 | **Towards Efficient Model Compression via Learned Global Ranking**
1084 |
1085 | - 论文:https://arxiv.org/abs/1904.12368
1086 | - 代码:https://github.com/cmu-enyac/LeGR
1087 |
1088 | **HRank: Filter Pruning using High-Rank Feature Map**
1089 |
1090 | - 论文:http://arxiv.org/abs/2002.10179
1091 | - 代码:https://github.com/lmbxmu/HRank
1092 |
1093 | **GAN Compression: Efficient Architectures for Interactive Conditional GANs**
1094 |
1095 | - 论文:https://arxiv.org/abs/2003.08936
1096 |
1097 | - 代码:https://github.com/mit-han-lab/gan-compression
1098 |
1099 | **Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression**
1100 |
1101 | - 论文:https://arxiv.org/abs/2003.08935
1102 |
1103 | - 代码:https://github.com/ofsoundof/group_sparsity
1104 |
1105 |
1106 |
1107 | # 视频理解/行为识别
1108 |
1109 | **Oops! Predicting Unintentional Action in Video**
1110 |
1111 | - 主页:https://oops.cs.columbia.edu/
1112 |
1113 | - 论文:https://arxiv.org/abs/1911.11206
1114 | - 代码:https://github.com/cvlab-columbia/oops
1115 | - 数据集:https://oops.cs.columbia.edu/data
1116 |
1117 | **PREDICT & CLUSTER: Unsupervised Skeleton Based Action Recognition**
1118 |
1119 | - 论文:https://arxiv.org/abs/1911.12409
1120 | - 代码:https://github.com/shlizee/Predict-Cluster
1121 |
1122 | **Intra- and Inter-Action Understanding via Temporal Action Parsing**
1123 |
1124 | - 论文:https://arxiv.org/abs/2005.10229
1125 | - 主页和数据集:https://sdolivia.github.io/TAPOS/
1126 |
1127 | **3DV: 3D Dynamic Voxel for Action Recognition in Depth Video**
1128 |
1129 | - 论文:https://arxiv.org/abs/2005.05501
1130 | - 代码:https://github.com/3huo/3DV-Action
1131 |
1132 | **FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding**
1133 |
1134 | - 主页:https://sdolivia.github.io/FineGym/
1135 | - 论文:https://arxiv.org/abs/2004.06704
1136 |
1137 | **TEA: Temporal Excitation and Aggregation for Action Recognition**
1138 |
1139 | - 论文:https://arxiv.org/abs/2004.01398
1140 |
1141 | - 代码:https://github.com/Phoenix1327/tea-action-recognition
1142 |
1143 | **X3D: Expanding Architectures for Efficient Video Recognition**
1144 |
1145 | - 论文:https://arxiv.org/abs/2004.04730
1146 |
1147 | - 代码:https://github.com/facebookresearch/SlowFast
1148 |
1149 | **Temporal Pyramid Network for Action Recognition**
1150 |
1151 | - 主页:https://decisionforce.github.io/TPN
1152 |
1153 | - 论文:https://arxiv.org/abs/2004.03548
1154 | - 代码:https://github.com/decisionforce/TPN
1155 |
1156 | ## 基于骨架的动作识别
1157 |
1158 | **Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition**
1159 |
1160 | - 论文:https://arxiv.org/abs/2003.14111
1161 | - 代码:https://github.com/kenziyuliu/ms-g3d
1162 |
1163 |
1164 |
1165 | # 人群计数
1166 |
1167 |
1168 |
1169 | # 深度估计
1170 |
1171 | **BiFuse: Monocular 360◦ Depth Estimation via Bi-Projection Fusion**
1172 |
1173 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_BiFuse_Monocular_360_Depth_Estimation_via_Bi-Projection_Fusion_CVPR_2020_paper.pdf
1174 | - 代码:https://github.com/Yeh-yu-hsuan/BiFuse
1175 |
1176 | **Focus on defocus: bridging the synthetic to real domain gap for depth estimation**
1177 |
1178 | - 论文:https://arxiv.org/abs/2005.09623
1179 | - 代码:https://github.com/dvl-tum/defocus-net
1180 |
1181 | **Bi3D: Stereo Depth Estimation via Binary Classifications**
1182 |
1183 | - 论文:https://arxiv.org/abs/2005.07274
1184 |
1185 | - 代码:https://github.com/NVlabs/Bi3D
1186 |
1187 | **AANet: Adaptive Aggregation Network for Efficient Stereo Matching**
1188 |
1189 | - 论文:https://arxiv.org/abs/2004.09548
1190 | - 代码:https://github.com/haofeixu/aanet
1191 |
1192 | **Towards Better Generalization: Joint Depth-Pose Learning without PoseNet**
1193 |
1194 | - 论文:https://github.com/B1ueber2y/TrianFlow
1195 |
1196 | - 代码:https://github.com/B1ueber2y/TrianFlow
1197 |
1198 | ## 单目深度估计
1199 |
1200 | **On the uncertainty of self-supervised monocular depth estimation**
1201 |
1202 | - 论文:https://arxiv.org/abs/2005.06209
1203 | - 代码:https://github.com/mattpoggi/mono-uncertainty
1204 |
1205 | **3D Packing for Self-Supervised Monocular Depth Estimation**
1206 |
1207 | - 论文:https://arxiv.org/abs/1905.02693
1208 | - 代码:https://github.com/TRI-ML/packnet-sfm
1209 | - Demo视频:https://www.bilibili.com/video/av70562892/
1210 |
1211 | **Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation**
1212 |
1213 | - 论文:https://arxiv.org/abs/2002.12114
1214 | - 代码:https://github.com/yzhao520/ARC
1215 |
1216 |
1217 |
1218 | # 6D目标姿态估计
1219 |
1220 | **PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation**
1221 |
1222 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/He_PVN3D_A_Deep_Point-Wise_3D_Keypoints_Voting_Network_for_6DoF_CVPR_2020_paper.pdf
1223 | - 代码:https://github.com/ethnhe/PVN3D
1224 |
1225 | **MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion**
1226 |
1227 | - 论文:https://arxiv.org/abs/2004.04336
1228 | - 代码:https://github.com/wkentaro/morefusion
1229 |
1230 | **EPOS: Estimating 6D Pose of Objects with Symmetries**
1231 |
1232 | 主页:http://cmp.felk.cvut.cz/epos
1233 |
1234 | 论文:https://arxiv.org/abs/2004.00605
1235 |
1236 | **G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features**
1237 |
1238 | - 论文:https://arxiv.org/abs/2003.11089
1239 |
1240 | - 代码:https://github.com/DC1991/G2L_Net
1241 |
1242 |
1243 |
1244 | # 手势估计
1245 |
1246 | **HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation**
1247 |
1248 | - 论文:https://arxiv.org/abs/2004.00060
1249 |
1250 | - 主页:http://vision.sice.indiana.edu/projects/hopenet
1251 |
1252 | **Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data**
1253 |
1254 | - 论文:https://arxiv.org/abs/2003.09572
1255 |
1256 | - 代码:https://github.com/CalciferZh/minimal-hand
1257 |
1258 |
1259 |
1260 | # 显著性检测
1261 |
1262 | **JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection**
1263 |
1264 | - 论文:https://arxiv.org/abs/2004.08515
1265 |
1266 | - 代码:https://github.com/kerenfu/JLDCF/
1267 |
1268 | **UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders**
1269 |
1270 | - 主页:http://dpfan.net/d3netbenchmark/
1271 |
1272 | - 论文:https://arxiv.org/abs/2004.05763
1273 | - 代码:https://github.com/JingZhang617/UCNet
1274 |
1275 |
1276 |
1277 | # 去噪
1278 |
1279 | **A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising**
1280 |
1281 | - 论文:https://arxiv.org/abs/2003.12751
1282 |
1283 | - 代码:https://github.com/Vandermode/NoiseModel
1284 |
1285 | **CycleISP: Real Image Restoration via Improved Data Synthesis**
1286 |
1287 | - 论文:https://arxiv.org/abs/2003.07761
1288 |
1289 | - 代码:https://github.com/swz30/CycleISP
1290 |
1291 |
1292 |
1293 | # 去雨
1294 |
1295 | **Multi-Scale Progressive Fusion Network for Single Image Deraining**
1296 |
1297 | - 论文:https://arxiv.org/abs/2003.10985
1298 | - 代码:https://github.com/kuihua/MSPFN
1299 |
1300 | **Detail-recovery Image Deraining via Context Aggregation Networks**
1301 |
1302 | - 论文:https://openaccess.thecvf.com/content_CVPR_2020/html/Deng_Detail-recovery_Image_Deraining_via_Context_Aggregation_Networks_CVPR_2020_paper.html
1303 | - 代码:https://github.com/Dengsgithub/DRD-Net
1304 |
1305 |
1306 |
1307 | # 去模糊
1308 |
1309 | ## 视频去模糊
1310 |
1311 | **Cascaded Deep Video Deblurring Using Temporal Sharpness Prior**
1312 |
1313 | - 主页:https://csbhr.github.io/projects/cdvd-tsp/index.html
1314 | - 论文:https://arxiv.org/abs/2004.02501
1315 | - 代码:https://github.com/csbhr/CDVD-TSP
1316 |
1317 |
1318 |
1319 | # 去雾
1320 |
1321 | **Domain Adaptation for Image Dehazing**
1322 |
1323 | - 论文:https://arxiv.org/abs/2005.04668
1324 |
1325 | - 代码:https://github.com/HUSTSYJ/DA_dahazing
1326 |
1327 | **Multi-Scale Boosted Dehazing Network with Dense Feature Fusion**
1328 |
1329 | - 论文:https://arxiv.org/abs/2004.13388
1330 |
1331 | - 代码:https://github.com/BookerDeWitt/MSBDN-DFF
1332 |
1333 |
1334 |
1335 | # 特征点检测与描述
1336 |
1337 | **ASLFeat: Learning Local Features of Accurate Shape and Localization**
1338 |
1339 | - 论文:https://arxiv.org/abs/2003.10071
1340 |
1341 | - 代码:https://github.com/lzx551402/aslfeat
1342 |
1343 |
1344 |
1345 | # 视觉问答(VQA)
1346 |
1347 | **VC R-CNN:Visual Commonsense R-CNN**
1348 |
1349 | - 论文:https://arxiv.org/abs/2002.12204
1350 | - 代码:https://github.com/Wangt-CN/VC-R-CNN
1351 |
1352 |
1353 |
1354 | # 视频问答(VideoQA)
1355 |
1356 | **Hierarchical Conditional Relation Networks for Video Question Answering**
1357 |
1358 | - 论文:https://arxiv.org/abs/2002.10698
1359 | - 代码:https://github.com/thaolmk54/hcrn-videoqa
1360 |
1361 |
1362 |
1363 | # 视觉语言导航
1364 |
1365 | **Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training**
1366 |
1367 | - 论文:https://arxiv.org/abs/2002.10638
1368 | - 代码(即将开源):https://github.com/weituo12321/PREVALENT
1369 |
1370 |
1371 |
1372 | # 视频压缩
1373 |
1374 | **Learning for Video Compression with Hierarchical Quality and Recurrent Enhancement**
1375 |
1376 | - 论文:https://arxiv.org/abs/2003.01966
1377 | - 代码:https://github.com/RenYang-home/HLVC
1378 |
1379 |
1380 |
1381 | # 视频插帧
1382 |
1383 | **AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation**
1384 |
1385 | - 论文:https://arxiv.org/abs/1907.10244
1386 | - 代码:https://github.com/HyeongminLEE/AdaCoF-pytorch
1387 |
1388 | **FeatureFlow: Robust Video Interpolation via Structure-to-Texture Generation**
1389 |
1390 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Gui_FeatureFlow_Robust_Video_Interpolation_via_Structure-to-Texture_Generation_CVPR_2020_paper.html
1391 |
1392 | - 代码:https://github.com/CM-BF/FeatureFlow
1393 |
1394 | **Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution**
1395 |
1396 | - 论文:https://arxiv.org/abs/2002.11616
1397 | - 代码:https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020
1398 |
1399 | **Space-Time-Aware Multi-Resolution Video Enhancement**
1400 |
1401 | - 主页:https://alterzero.github.io/projects/STAR.html
1402 | - 论文:http://arxiv.org/abs/2003.13170
1403 | - 代码:https://github.com/alterzero/STARnet
1404 |
1405 | **Scene-Adaptive Video Frame Interpolation via Meta-Learning**
1406 |
1407 | - 论文:https://arxiv.org/abs/2004.00779
1408 | - 代码:https://github.com/myungsub/meta-interpolation
1409 |
1410 | **Softmax Splatting for Video Frame Interpolation**
1411 |
1412 | - 主页:http://sniklaus.com/papers/softsplat
1413 | - 论文:https://arxiv.org/abs/2003.05534
1414 | - 代码:https://github.com/sniklaus/softmax-splatting
1415 |
1416 |
1417 |
1418 | # 风格迁移
1419 |
1420 | **Diversified Arbitrary Style Transfer via Deep Feature Perturbation**
1421 |
1422 | - 论文:https://arxiv.org/abs/1909.08223
1423 | - 代码:https://github.com/EndyWon/Deep-Feature-Perturbation
1424 |
1425 | **Collaborative Distillation for Ultra-Resolution Universal Style Transfer**
1426 |
1427 | - 论文:https://arxiv.org/abs/2003.08436
1428 |
1429 | - 代码:https://github.com/mingsun-tse/collaborative-distillation
1430 |
1431 |
1432 |
1433 | # 车道线检测
1434 |
1435 | **Inter-Region Affinity Distillation for Road Marking Segmentation**
1436 |
1437 | - 论文:https://arxiv.org/abs/2004.05304
1438 | - 代码:https://github.com/cardwing/Codes-for-IntRA-KD
1439 |
1440 |
1441 |
1442 | # "人-物"交互(HOT)检测
1443 |
1444 | **PPDM: Parallel Point Detection and Matching for Real-time Human-Object Interaction Detection**
1445 |
1446 | - 论文:https://arxiv.org/abs/1912.12898
1447 | - 代码:https://github.com/YueLiao/PPDM
1448 |
1449 | **Detailed 2D-3D Joint Representation for Human-Object Interaction**
1450 |
1451 | - 论文:https://arxiv.org/abs/2004.08154
1452 |
1453 | - 代码:https://github.com/DirtyHarryLYL/DJ-RN
1454 |
1455 | **Cascaded Human-Object Interaction Recognition**
1456 |
1457 | - 论文:https://arxiv.org/abs/2003.04262
1458 |
1459 | - 代码:https://github.com/tfzhou/C-HOI
1460 |
1461 | **VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions**
1462 |
1463 | - 论文:https://arxiv.org/abs/2003.05541
1464 | - 代码:https://github.com/ASMIftekhar/VSGNet
1465 |
1466 |
1467 |
1468 | # 轨迹预测
1469 |
1470 | **The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction**
1471 |
1472 | - 论文:https://arxiv.org/abs/1912.06445
1473 | - 代码:https://github.com/JunweiLiang/Multiverse
1474 | - 数据集:https://next.cs.cmu.edu/multiverse/
1475 |
1476 | **Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction**
1477 |
1478 | - 论文:https://arxiv.org/abs/2002.11927
1479 | - 代码:https://github.com/abduallahmohamed/Social-STGCNN
1480 |
1481 |
1482 |
1483 | # 运动预测
1484 |
1485 | **Collaborative Motion Prediction via Neural Motion Message Passing**
1486 |
1487 | - 论文:https://arxiv.org/abs/2003.06594
1488 | - 代码:https://github.com/PhyllisH/NMMP
1489 |
1490 | **MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps**
1491 |
1492 | - 论文:https://arxiv.org/abs/2003.06754
1493 |
1494 | - 代码:https://github.com/pxiangwu/MotionNet
1495 |
1496 |
1497 |
1498 | # 光流估计
1499 |
1500 | **Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation**
1501 |
1502 | - 论文:https://arxiv.org/abs/2003.13045
1503 | - 代码:https://github.com/lliuz/ARFlow
1504 |
1505 |
1506 |
1507 | # 图像检索
1508 |
1509 | **Evade Deep Image Retrieval by Stashing Private Images in the Hash Space**
1510 |
1511 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Xiao_Evade_Deep_Image_Retrieval_by_Stashing_Private_Images_in_the_CVPR_2020_paper.html
1512 | - 代码:https://github.com/sugarruy/hashstash
1513 |
1514 |
1515 |
1516 | # 虚拟试衣
1517 |
1518 | **Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content**
1519 |
1520 | - 论文:https://arxiv.org/abs/2003.05863
1521 | - 代码:https://github.com/switchablenorms/DeepFashion_Try_On
1522 |
1523 |
1524 |
1525 | # HDR
1526 |
1527 | **Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline**
1528 |
1529 | - 主页:https://www.cmlab.csie.ntu.edu.tw/~yulunliu/SingleHDR
1530 |
1531 | - 论文下载链接:https://www.cmlab.csie.ntu.edu.tw/~yulunliu/SingleHDR_/00942.pdf
1532 |
1533 | - 代码:https://github.com/alex04072000/SingleHDR
1534 |
1535 |
1536 |
1537 | # 对抗样本
1538 |
1539 | **Enhancing Cross-Task Black-Box Transferability of Adversarial Examples With Dispersion Reduction**
1540 |
1541 | - 论文:https://openaccess.thecvf.com/content_CVPR_2020/papers/Lu_Enhancing_Cross-Task_Black-Box_Transferability_of_Adversarial_Examples_With_Dispersion_Reduction_CVPR_2020_paper.pdf
1542 | - 代码:https://github.com/erbloo/dr_cvpr20
1543 |
1544 | **Towards Large yet Imperceptible Adversarial Image Perturbations with Perceptual Color Distance**
1545 |
1546 | - 论文:https://arxiv.org/abs/1911.02466
1547 | - 代码:https://github.com/ZhengyuZhao/PerC-Adversarial
1548 |
1549 |
1550 |
1551 | # 三维重建
1552 |
1553 | **Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild**
1554 |
1555 | - **CVPR 2020 Best Paper**
1556 | - 主页:https://elliottwu.com/projects/unsup3d/
1557 | - 论文:https://arxiv.org/abs/1911.11130
1558 | - 代码:https://github.com/elliottwu/unsup3d
1559 |
1560 | **Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization**
1561 |
1562 | - 主页:https://shunsukesaito.github.io/PIFuHD/
1563 | - 论文:https://arxiv.org/abs/2004.00452
1564 | - 代码:https://github.com/facebookresearch/pifuhd
1565 |
1566 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Patel_TailorNet_Predicting_Clothing_in_3D_as_a_Function_of_Human_CVPR_2020_paper.pdf
1567 | - 代码:https://github.com/chaitanya100100/TailorNet
1568 | - 数据集:https://github.com/zycliao/TailorNet_dataset
1569 |
1570 | **Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion**
1571 |
1572 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Chibane_Implicit_Functions_in_Feature_Space_for_3D_Shape_Reconstruction_and_CVPR_2020_paper.pdf
1573 | - 代码:https://github.com/jchibane/if-net
1574 |
1575 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Mir_Learning_to_Transfer_Texture_From_Clothing_Images_to_3D_Humans_CVPR_2020_paper.pdf
1576 | - 代码:https://github.com/aymenmir1/pix2surf
1577 |
1578 |
1579 |
1580 | # 深度补全
1581 |
1582 | **Uncertainty-Aware CNNs for Depth Completion: Uncertainty from Beginning to End**
1583 |
1584 | 论文:https://arxiv.org/abs/2006.03349
1585 |
1586 | 代码:https://github.com/abdo-eldesokey/pncnn
1587 |
1588 |
1589 |
1590 | # 语义场景补全
1591 |
1592 | **3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior**
1593 |
1594 | - 论文:https://arxiv.org/abs/2003.14052
1595 | - 代码:https://github.com/charlesCXK/TorchSSC
1596 |
1597 |
1598 |
1599 | # Image/Video Captioning
1600 |
1601 | **Syntax-Aware Action Targeting for Video Captioning**
1602 |
1603 | - Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Zheng_Syntax-Aware_Action_Targeting_for_Video_Captioning_CVPR_2020_paper.pdf
1604 | - Code: https://github.com/SydCaption/SAAT
1605 |
1606 |
1607 |
1608 | # Wireframe Parsing
1609 |
1610 | **Holistically-Attracted Wireframe Parsing**
1611 |
1612 | - Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Xue_Holistically-Attracted_Wireframe_Parsing_CVPR_2020_paper.html
1613 |
1614 | - Code: https://github.com/cherubicXN/hawp
1615 |
1616 |
1617 |
1618 | # Datasets
1619 |
1620 | **OASIS: A Large-Scale Dataset for Single Image 3D in the Wild**
1621 |
1622 | - Paper: https://arxiv.org/abs/2007.13215
1623 | - Dataset: https://oasis.cs.princeton.edu/
1624 |
1625 | **STEFANN: Scene Text Editor using Font Adaptive Neural Network**
1626 |
1627 | - Homepage: https://prasunroy.github.io/stefann/
1628 |
1629 | - Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Roy_STEFANN_Scene_Text_Editor_Using_Font_Adaptive_Neural_Network_CVPR_2020_paper.html
1630 | - Code: https://github.com/prasunroy/stefann
1631 | - Dataset: https://drive.google.com/open?id=1sEDiX_jORh2X-HSzUnjIyZr-G9LJIw1k
1632 |
1633 | **Interactive Object Segmentation with Inside-Outside Guidance**
1634 |
1635 | - Paper download link: http://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Interactive_Object_Segmentation_With_Inside-Outside_Guidance_CVPR_2020_paper.pdf
1636 | - Code: https://github.com/shiyinzhang/Inside-Outside-Guidance
1637 | - Dataset: https://github.com/shiyinzhang/Pixel-ImageNet
1638 |
1639 | **Video Panoptic Segmentation**
1640 |
1641 | - Paper: https://arxiv.org/abs/2006.11339
1642 | - Code: https://github.com/mcahny/vps
1643 | - Dataset: https://www.dropbox.com/s/ecem4kq0fdkver4/cityscapes-vps-dataset-1.0.zip?dl=0
1644 |
1645 | **FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation**
1646 |
1647 | - Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Li_FSS-1000_A_1000-Class_Dataset_for_Few-Shot_Segmentation_CVPR_2020_paper.html
1648 |
1649 | - Code: https://github.com/HKUSTCV/FSS-1000
1650 |
1651 | - Dataset: https://github.com/HKUSTCV/FSS-1000
1652 |
1653 | **3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset**
1654 |
1655 | - Homepage: https://vap.aau.dk/3d-zef/
1656 | - Paper: https://arxiv.org/abs/2006.08466
1657 | - Code: https://bitbucket.org/aauvap/3d-zef/src/master/
1658 | - Dataset: https://motchallenge.net/data/3D-ZeF20
1659 |
1660 | **TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style**
1661 |
1662 | - Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Patel_TailorNet_Predicting_Clothing_in_3D_as_a_Function_of_Human_CVPR_2020_paper.pdf
1663 | - Code: https://github.com/chaitanya100100/TailorNet
1664 | - Dataset: https://github.com/zycliao/TailorNet_dataset
1665 |
1666 | **Oops! Predicting Unintentional Action in Video**
1667 |
1668 | - Homepage: https://oops.cs.columbia.edu/
1669 |
1670 | - Paper: https://arxiv.org/abs/1911.11206
1671 | - Code: https://github.com/cvlab-columbia/oops
1672 | - Dataset: https://oops.cs.columbia.edu/data
1673 |
1674 | **The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction**
1675 |
1676 | - Paper: https://arxiv.org/abs/1912.06445
1677 | - Code: https://github.com/JunweiLiang/Multiverse
1678 | - Dataset: https://next.cs.cmu.edu/multiverse/
1679 |
1680 | **Open Compound Domain Adaptation**
1681 |
1682 | - Homepage: https://liuziwei7.github.io/projects/CompoundDomain.html
1683 | - Dataset: https://drive.google.com/drive/folders/1_uNTF8RdvhS_sqVTnYx17hEOQpefmE2r?usp=sharing
1684 | - Paper: https://arxiv.org/abs/1909.03403
1685 | - Code: https://github.com/zhmiao/OpenCompoundDomainAdaptation-OCDA
1686 |
1687 | **Intra- and Inter-Action Understanding via Temporal Action Parsing**
1688 |
1689 | - Paper: https://arxiv.org/abs/2005.10229
1690 | - Homepage & Dataset: https://sdolivia.github.io/TAPOS/
1691 |
1692 | **Dynamic Refinement Network for Oriented and Densely Packed Object Detection**
1693 |
1694 | - Paper download link: https://arxiv.org/abs/2005.09973
1695 |
1696 | - Code & Dataset: https://github.com/Anymake/DRN_CVPR2020
1697 |
1698 | **COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification**
1699 |
1700 | - Paper: https://arxiv.org/abs/2005.07862
1701 |
1702 | - Dataset: None yet
1703 |
1704 | **KeypointNet: A Large-scale 3D Keypoint Dataset Aggregated from Numerous Human Annotations**
1705 |
1706 | - Paper: https://arxiv.org/abs/2002.12687
1707 |
1708 | - Dataset: https://github.com/qq456cvb/KeypointNet
1709 |
1710 | **MSeg: A Composite Dataset for Multi-domain Semantic Segmentation**
1711 |
1712 | - Paper: http://vladlen.info/papers/MSeg.pdf
1713 | - Code: https://github.com/mseg-dataset/mseg-api
1714 | - Dataset: https://github.com/mseg-dataset/mseg-semantic
1715 |
1716 | **AvatarMe: Realistically Renderable 3D Facial Reconstruction "in-the-wild"**
1717 |
1718 | - Paper: https://arxiv.org/abs/2003.13845
1719 | - Dataset: https://github.com/lattas/AvatarMe
1720 |
1721 | **Learning to Autofocus**
1722 |
1723 | - Paper: https://arxiv.org/abs/2004.12260
1724 | - Dataset: None yet
1725 |
1726 | **FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction**
1727 |
1728 | - Paper: https://arxiv.org/abs/2003.13989
1729 | - Code: https://github.com/zhuhao-nju/facescape
1730 |
1731 | **Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data**
1732 |
1733 | - Paper download link: https://arxiv.org/abs/2004.01166
1734 |
1735 | - Code: https://github.com/Healthcare-Robotics/bodies-at-rest
1736 | - Dataset: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/KOA4ML
1737 |
1738 | **FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding**
1739 |
1740 | - Homepage: https://sdolivia.github.io/FineGym/
1741 | - Paper: https://arxiv.org/abs/2004.06704
1742 |
1743 | **A Local-to-Global Approach to Multi-modal Movie Scene Segmentation**
1744 |
1745 | - Homepage: https://anyirao.com/projects/SceneSeg.html
1746 |
1747 | - Paper download link: https://arxiv.org/abs/2004.02678
1748 |
1749 | - Code: https://github.com/AnyiRao/SceneSeg
1750 |
1751 | **Deep Homography Estimation for Dynamic Scenes**
1752 |
1753 | - Paper: https://arxiv.org/abs/2004.02132
1754 |
1755 | - Dataset: https://github.com/lcmhoang/hmg-dynamics
1756 |
1757 | **Assessing Image Quality Issues for Real-World Problems**
1758 |
1759 | - Homepage: https://vizwiz.org/tasks-and-datasets/image-quality-issues/
1760 | - Paper: https://arxiv.org/abs/2003.12511
1761 |
1762 | **UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World**
1763 |
1764 | - Paper: https://arxiv.org/abs/2003.10608
1765 | - Code & Dataset: https://github.com/Jyouhou/UnrealText/
1766 |
1767 | **PANDA: A Gigapixel-level Human-centric Video Dataset**
1768 |
1769 | - Paper: https://arxiv.org/abs/2003.04852
1770 |
1771 | - Dataset: http://www.panda-dataset.com/
1772 |
1773 | **IntrA: 3D Intracranial Aneurysm Dataset for Deep Learning**
1774 |
1775 | - Paper: https://arxiv.org/abs/2003.02920
1776 | - Dataset: https://github.com/intra3d2019/IntrA
1777 |
1778 | **Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS**
1779 |
1780 | - Paper: https://arxiv.org/abs/2003.03972
1781 | - Dataset: None yet
1782 |
1783 |
1784 |
1785 | # Others
1786 |
1787 | **CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus**
1788 |
1789 | - Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Kluger_CONSAC_Robust_Multi-Model_Fitting_by_Conditional_Sample_Consensus_CVPR_2020_paper.html
1790 | - Code: https://github.com/fkluger/consac
1791 |
1792 | **Learning to Learn Single Domain Generalization**
1793 |
1794 | - Paper: https://arxiv.org/abs/2003.13216
1795 | - Code: https://github.com/joffery/M-ADA
1796 |
1797 | **Open Compound Domain Adaptation**
1798 |
1799 | - Homepage: https://liuziwei7.github.io/projects/CompoundDomain.html
1800 | - Dataset: https://drive.google.com/drive/folders/1_uNTF8RdvhS_sqVTnYx17hEOQpefmE2r?usp=sharing
1801 | - Paper: https://arxiv.org/abs/1909.03403
1802 | - Code: https://github.com/zhmiao/OpenCompoundDomainAdaptation-OCDA
1803 |
1804 | **Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision**
1805 |
1806 | - Paper: http://www.cvlibs.net/publications/Niemeyer2020CVPR.pdf
1807 |
1808 | - Code: https://github.com/autonomousvision/differentiable_volumetric_rendering
1809 |
1810 | **QEBA: Query-Efficient Boundary-Based Blackbox Attack**
1811 |
1812 | - Paper: https://arxiv.org/abs/2005.14137
1813 | - Code: https://github.com/AI-secure/QEBA
1814 |
1815 | **Equalization Loss for Long-Tailed Object Recognition**
1816 |
1817 | - Paper: https://arxiv.org/abs/2003.05176
1818 | - Code: https://github.com/tztztztztz/eql.detectron2
1819 |
1820 | **Instance-aware Image Colorization**
1821 |
1822 | - Homepage: https://ericsujw.github.io/InstColorization/
1823 | - Paper: https://arxiv.org/abs/2005.10825
1824 | - Code: https://github.com/ericsujw/InstColorization
1825 |
1826 | **Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting**
1827 |
1828 | - Paper: https://arxiv.org/abs/2005.09704
1829 |
1830 | - Code: https://github.com/Atlas200dk/sample-imageinpainting-HiFill
1831 |
1832 | **Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching**
1833 |
1834 | - Paper: https://arxiv.org/abs/2005.03860
1835 | - Code: https://github.com/shiyujiao/cross_view_localization_DSM
1836 |
1837 | **Epipolar Transformers**
1838 |
1839 | - Paper: https://arxiv.org/abs/2005.04551
1840 |
1841 | - Code: https://github.com/yihui-he/epipolar-transformers
1842 |
1843 | **Bringing Old Photos Back to Life**
1844 |
1845 | - Homepage: http://raywzy.com/Old_Photo/
1846 | - Paper: https://arxiv.org/abs/2004.09484
1847 |
1848 | **MaskFlownet: Asymmetric Feature Matching with Learnable Occlusion Mask**
1849 |
1850 | - Paper: https://arxiv.org/abs/2003.10955
1851 |
1852 | - Code: https://github.com/microsoft/MaskFlownet
1853 |
1854 | **Self-Supervised Viewpoint Learning from Image Collections**
1855 |
1856 | - Paper: https://arxiv.org/abs/2004.01793
1857 | - Paper 2: https://research.nvidia.com/sites/default/files/pubs/2020-03_Self-Supervised-Viewpoint-Learning/SSV-CVPR2020.pdf
1858 | - Code: https://github.com/NVlabs/SSV
1859 |
1860 | **Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations**
1861 |
1862 | - Oral
1863 |
1864 | - Paper: https://arxiv.org/abs/2003.12237
1865 | - Code: https://github.com/cuishuhao/BNM
1866 |
1867 | **Towards Learning Structure via Consensus for Face Segmentation and Parsing**
1868 |
1869 | - Paper: https://arxiv.org/abs/1911.00957
1870 | - Code: https://github.com/isi-vista/structure_via_consensus
1871 |
1872 | **Plug-and-Play Algorithms for Large-scale Snapshot Compressive Imaging**
1873 |
1874 | - Oral
1875 | - Paper: https://arxiv.org/abs/2003.13654
1876 |
1877 | - Code: https://github.com/liuyang12/PnP-SCI
1878 |
1879 | **Lightweight Photometric Stereo for Facial Details Recovery**
1880 |
1881 | - Paper: https://arxiv.org/abs/2003.12307
1882 | - Code: https://github.com/Juyong/FacePSNet
1883 |
1884 | **Footprints and Free Space from a Single Color Image**
1885 |
1886 | - Paper: https://arxiv.org/abs/2004.06376
1887 |
1888 | - Code: https://github.com/nianticlabs/footprints
1889 |
1890 | **Self-Supervised Monocular Scene Flow Estimation**
1891 |
1892 | - Paper: https://arxiv.org/abs/2004.04143
1893 | - Code: https://github.com/visinf/self-mono-sf
1894 |
1895 | **Quasi-Newton Solver for Robust Non-Rigid Registration**
1896 |
1897 | - Paper: https://arxiv.org/abs/2004.04322
1898 | - Code: https://github.com/Juyong/Fast_RNRR
1899 |
1900 | **A Local-to-Global Approach to Multi-modal Movie Scene Segmentation**
1901 |
1902 | - Homepage: https://anyirao.com/projects/SceneSeg.html
1903 |
1904 | - Paper download link: https://arxiv.org/abs/2004.02678
1905 |
1906 | - Code: https://github.com/AnyiRao/SceneSeg
1907 |
1908 | **DeepFLASH: An Efficient Network for Learning-based Medical Image Registration**
1909 |
1910 | - Paper: https://arxiv.org/abs/2004.02097
1911 |
1912 | - Code: https://github.com/jw4hv/deepflash
1913 |
1914 | **Self-Supervised Scene De-occlusion**
1915 |
1916 | - Homepage: https://xiaohangzhan.github.io/projects/deocclusion/
1917 | - Paper: https://arxiv.org/abs/2004.02788
1918 | - Code: https://github.com/XiaohangZhan/deocclusion
1919 |
1920 | **Polarized Reflection Removal with Perfect Alignment in the Wild**
1921 |
1922 | - Homepage: https://leichenyang.weebly.com/project-polarized.html
1923 | - Code: https://github.com/ChenyangLEI/CVPR2020-Polarized-Reflection-Removal-with-Perfect-Alignment
1924 |
1925 | **Background Matting: The World is Your Green Screen**
1926 |
1927 | - Paper: https://arxiv.org/abs/2004.00626
1928 | - Code: http://github.com/senguptaumd/Background-Matting
1929 |
1930 | **What Deep CNNs Benefit from Global Covariance Pooling: An Optimization Perspective**
1931 |
1932 | - Paper: https://arxiv.org/abs/2003.11241
1933 |
1934 | - Code: https://github.com/ZhangLi-CS/GCP_Optimization
1935 |
1936 | **Look-into-Object: Self-supervised Structure Modeling for Object Recognition**
1937 |
1938 | - Paper: None yet
1939 | - Code: https://github.com/JDAI-CV/LIO
1940 |
1941 | **Video Object Grounding using Semantic Roles in Language Description**
1942 |
1943 | - Paper: https://arxiv.org/abs/2003.10606
1944 | - Code: https://github.com/TheShadow29/vognet-pytorch
1945 |
1946 | **Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives**
1947 |
1948 | - Paper: https://arxiv.org/abs/2003.10739
1949 | - Code: https://github.com/d-li14/DHM
1950 |
1951 | **SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization**
1952 |
1953 | - Paper: http://www.cs.umd.edu/~yuejiang/papers/SDFDiff.pdf
1954 | - Code: https://github.com/YueJiang-nj/CVPR2020-SDFDiff
1955 |
1956 | **On Translation Invariance in CNNs: Convolutional Layers can Exploit Absolute Spatial Location**
1957 |
1958 | - Paper: https://arxiv.org/abs/2003.07064
1959 |
1960 | - Code: https://github.com/oskyhn/CNNs-Without-Borders
1961 |
1962 | **GhostNet: More Features from Cheap Operations**
1963 |
1964 | - Paper: https://arxiv.org/abs/1911.11907
1965 |
1966 | - Code: https://github.com/iamhankai/ghostnet
1967 |
1968 | **AdderNet: Do We Really Need Multiplications in Deep Learning?**
1969 |
1970 | - Paper: https://arxiv.org/abs/1912.13200
1971 | - Code: https://github.com/huawei-noah/AdderNet
1972 |
1973 | **Deep Image Harmonization via Domain Verification**
1974 |
1975 | - Paper: https://arxiv.org/abs/1911.13239
1976 | - Code: https://github.com/bcmi/Image_Harmonization_Datasets
1977 |
1978 | **Blurry Video Frame Interpolation**
1979 |
1980 | - Paper: https://arxiv.org/abs/2002.12259
1981 | - Code: https://github.com/laomao0/BIN
1982 |
1983 | **Extremely Dense Point Correspondences using a Learned Feature Descriptor**
1984 |
1985 | - Paper: https://arxiv.org/abs/2003.00619
1986 | - Code: https://github.com/lppllppl920/DenseDescriptorLearning-Pytorch
1987 |
1988 | **Filter Grafting for Deep Neural Networks**
1989 |
1990 | - Paper: https://arxiv.org/abs/2001.05868
1991 | - Code: https://github.com/fxmeng/filter-grafting
1992 | - Paper explanation: https://www.zhihu.com/question/372070853/answer/1041569335
1993 |
1994 | **Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation**
1995 |
1996 | - Paper: https://arxiv.org/abs/2003.02824
1997 | - Code: https://github.com/cmhungsteve/SSTDA
1998 |
1999 | **Detecting Attended Visual Targets in Video**
2000 |
2001 | - Paper: https://arxiv.org/abs/2003.02501
2002 |
2003 | - Code: https://github.com/ejcgt/attention-target-detection
2004 |
2005 | **Deep Image Spatial Transformation for Person Image Generation**
2006 |
2007 | - Paper: https://arxiv.org/abs/2003.00696
2008 | - Code: https://github.com/RenYurui/Global-Flow-Local-Attention
2009 |
2010 | **Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications**
2011 |
2012 | - Paper: https://arxiv.org/abs/2003.01455
2013 | - Code: https://github.com/bbrattoli/ZeroShotVideoClassification
2014 |
2015 | https://github.com/charlesCXK/3D-SketchAware-SSC
2016 |
2017 | https://github.com/Anonymous20192020/Anonymous_CVPR5767
2018 |
2019 | https://github.com/avirambh/ScopeFlow
2020 |
2021 | https://github.com/csbhr/CDVD-TSP
2022 |
2023 | https://github.com/ymcidence/TBH
2024 |
2025 | https://github.com/yaoyao-liu/mnemonics
2026 |
2027 | https://github.com/meder411/Tangent-Images
2028 |
2029 | https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch
2030 |
2031 | https://github.com/sjmoran/deep_local_parametric_filters
2032 |
2035 | https://github.com/bermanmaxim/AOWS
2036 |
2037 | https://github.com/dc3ea9f/look-into-object
2038 |
2039 |
2040 |
2041 | # Uncertain Whether Accepted
2042 |
2043 | **FADNet: A Fast and Accurate Network for Disparity Estimation**
2044 |
2045 | - Paper: not released yet
2046 | - Code: https://github.com/HKBU-HPML/FADNet
2047 |
2048 | https://github.com/rFID-submit/RandomFID: uncertain whether accepted
2049 |
2050 | https://github.com/JackSyu/AE-MSR: uncertain whether accepted
2051 |
2052 | https://github.com/fastconvnets/cvpr2020: uncertain whether accepted
2053 |
2054 | https://github.com/aimagelab/meshed-memory-transformer: uncertain whether accepted
2055 |
2056 | https://github.com/TWSFar/CRGNet: uncertain whether accepted
2057 |
2058 | https://github.com/CVPR-2020/CDARTS: uncertain whether accepted
2059 |
2060 | https://github.com/anucvml/ddn-cvprw2020: uncertain whether accepted
2061 |
2062 | https://github.com/dl-model-recommend/model-trust: uncertain whether accepted
2063 |
2064 | https://github.com/apratimbhattacharyya18/CVPR-2020-Corr-Prior: uncertain whether accepted
2065 |
2066 | https://github.com/onetcvpr/O-Net: uncertain whether accepted
2067 |
2068 | https://github.com/502463708/Microcalcification_Detection: uncertain whether accepted
2069 |
2070 | https://github.com/anonymous-for-review/cvpr-2020-deep-smoke-machine: uncertain whether accepted
2071 |
2072 | https://github.com/anonymous-for-review/cvpr-2020-smoke-recognition-dataset: uncertain whether accepted
2073 |
2074 | https://github.com/cvpr-nonrigid/dataset: uncertain whether accepted
2075 |
2076 | https://github.com/theFool32/PPBA: uncertain whether accepted
2077 |
2078 | https://github.com/Realtime-Action-Recognition/Realtime-Action-Recognition
--------------------------------------------------------------------------------
/CVPR2022-Papers-with-Code.md:
--------------------------------------------------------------------------------
1 | # CVPR 2022 Papers and Open-Source Projects (Papers with Code)
2 |
3 | A collection of [CVPR 2022](https://cvpr2022.thecvf.com/) papers and open-source projects (papers with code)!
4 |
5 | CVPR 2022 accepted-paper ID list: https://drive.google.com/file/d/15JFhfPboKdUcIH9LdbCMUFmGq_JhaxhC/view
6 |
7 | > Note 1: Issues sharing CVPR 2022 papers and open-source projects are welcome!
8 | >
9 | > Note 2: For papers from previous top CV conferences, along with other high-quality CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision
10 | >
11 | > - [CVPR 2019](CVPR2019-Papers-with-Code.md)
12 | > - [CVPR 2020](CVPR2020-Papers-with-Code.md)
13 | > - [CVPR 2021](CVPR2021-Papers-with-Code.md)
14 |
15 | If you want to keep up with the latest and best CV papers, open-source projects, and learning resources, you are welcome to scan the QR code below to join the CVer academic exchange group! Let's learn from each other and make progress together~
16 |
17 | 
18 |
19 | ## CVPR 2022 Open-Source Paper Directory
20 |
21 | - [Backbone](#Backbone)
22 | - [CLIP](#CLIP)
23 | - [GAN](#GAN)
24 | - [GNN](#GNN)
25 | - [MLP](#MLP)
26 | - [NAS](#NAS)
27 | - [OCR](#OCR)
28 | - [NeRF](#NeRF)
29 | - [3D Face](#3D-Face)
30 | - [Long-Tail](#Long-Tail)
31 | - [Visual Transformer](#Visual-Transformer)
32 | - [Vision-Language](#VL)
33 | - [Self-supervised Learning](#SSL)
34 | - [Data Augmentation](#DA)
35 | - [Knowledge Distillation](#KD)
36 | - [Object Detection](#Object-Detection)
37 | - [Visual Tracking](#VT)
38 | - [Semantic Segmentation](#Semantic-Segmentation)
39 | - [Instance Segmentation](#Instance-Segmentation)
40 | - [Panoptic Segmentation](#Panoptic-Segmentation)
41 | - [Few-Shot Classification](#FFC)
42 | - [Few-Shot Segmentation](#FFS)
43 | - [Image Matting](#Matting)
44 | - [Video Understanding](#VU)
45 | - [Image Editing](#Image-Editing)
46 | - [Low-level Vision](#LLV)
47 | - [Super-Resolution](#Super-Resolution)
48 | - [Deblur](#Deblur)
49 | - [3D Point Cloud](#3D-Point-Cloud)
50 | - [3D Object Detection](#3D-Object-Detection)
51 | - [3D Semantic Segmentation](#3DSS)
52 | - [3D Object Tracking](#3D-Object-Tracking)
53 | - [3D Human Pose Estimation](#3D-Human-Pose-Estimation)
54 | - [3D Semantic Scene Completion](#3DSSC)
55 | - [3D Reconstruction](#3D-R)
56 | - [Person Re-identification](#ReID)
57 | - [Camouflaged Object Detection](#COD)
58 | - [Depth Estimation](#Depth-Estimation)
59 | - [Stereo Matching](#Stereo-Matching)
60 | - [Feature Matching](#FM)
61 | - [Lane Detection](#Lane-Detection)
62 | - [Optical Flow Estimation](#Optical-Flow-Estimation)
63 | - [Image Inpainting](#Image-Inpainting)
64 | - [Image Retrieval](#Image-Retrieval)
65 | - [Face Recognition](#Face-Recognition)
66 | - [Crowd Counting](#Crowd-Counting)
67 | - [Medical Image](#Medical-Image)
68 | - [Video Generation](#Video-Generation)
69 | - [Scene Graph Generation](#Scene-Graph-Generation)
70 | - [Referring Video Object Segmentation](#R-VOS)
71 | - [Gait Recognition](#GR)
72 | - [Style Transfer](#ST)
73 | - [Anomaly Detection](#AD)
74 | - [Adversarial Examples](#AE)
75 | - [Weakly Supervised Object Localization](#WSOL)
76 | - [Radar Object Detection](#ROD)
77 | - [Hyperspectral Image Reconstruction](#HSI)
78 | - [Image Stitching](#Image-Stitching)
79 | - [Watermarking](#Watermarking)
80 | - [Action Counting](#AC)
81 | - [Grounded Situation Recognition](#GSR)
82 | - [Zero-shot Learning](#ZSL)
83 | - [DeepFakes](#DeepFakes)
84 | - [Datasets](#Datasets)
85 | - [New Tasks](#New-Tasks)
86 | - [Others](#Others)
87 |
88 |
89 |
90 | # Backbone
91 |
92 | **A ConvNet for the 2020s**
93 |
94 | - Paper: https://arxiv.org/abs/2201.03545
95 | - Code: https://github.com/facebookresearch/ConvNeXt
96 | - Chinese explanation: https://mp.weixin.qq.com/s/Xg5wPYExnvTqRo6s5-2cAw
97 |
98 | **Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs**
99 |
100 | - Paper: https://arxiv.org/abs/2203.06717
101 |
102 | - Code: https://github.com/megvii-research/RepLKNet
103 | - Code2: https://github.com/DingXiaoH/RepLKNet-pytorch
104 |
105 | - Chinese explanation: https://mp.weixin.qq.com/s/_qXyIQut-JRW6VvsjaQlFg
106 |
107 | **MPViT: Multi-Path Vision Transformer for Dense Prediction**
108 |
109 | - Paper: https://arxiv.org/abs/2112.11010
110 | - Code: https://github.com/youngwanLEE/MPViT
111 | - Chinese explanation: https://mp.weixin.qq.com/s/Q9-crEOz5IYzZaNoq8oXfg
112 |
113 | **Mobile-Former: Bridging MobileNet and Transformer**
114 |
115 | - Paper: https://arxiv.org/abs/2108.05895
116 | - Code: None
117 | - Chinese explanation: https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ
118 |
119 | **MetaFormer is Actually What You Need for Vision**
120 |
121 | - Paper: https://arxiv.org/abs/2111.11418
122 | - Code: https://github.com/sail-sg/poolformer
123 |
124 | **Shunted Self-Attention via Multi-Scale Token Aggregation**
125 |
126 | - Paper(Oral): https://arxiv.org/abs/2111.15193
127 | - Code: https://github.com/OliverRensu/Shunted-Transformer
128 |
129 | **TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing**
130 |
131 | - Paper: http://arxiv.org/abs/2203.10489
132 | - Code: https://github.com/JierunChen/TVConv
133 |
134 | **Learned Queries for Efficient Local Attention**
135 |
136 | - Paper(Oral): https://arxiv.org/abs/2112.11435
137 | - Code: https://github.com/moabarar/qna
138 |
139 | **RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality**
140 |
141 | - Paper: https://arxiv.org/abs/2112.11081
142 | - Code: https://github.com/DingXiaoH/RepMLP
143 |
144 |
145 |
146 | # CLIP
147 |
148 | **HairCLIP: Design Your Hair by Text and Reference Image**
149 |
150 | - Paper: https://arxiv.org/abs/2112.05142
151 |
152 | - Code: https://github.com/wty-ustc/HairCLIP
153 |
154 | **PointCLIP: Point Cloud Understanding by CLIP**
155 |
156 | - Paper: https://arxiv.org/abs/2112.02413
157 | - Code: https://github.com/ZrrSkywalker/PointCLIP
158 |
159 | **Blended Diffusion for Text-driven Editing of Natural Images**
160 |
161 | - Paper: https://arxiv.org/abs/2111.14818
162 |
163 | - Code: https://github.com/omriav/blended-diffusion
164 |
165 |
166 |
167 | # GAN
168 |
169 | **SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing**
170 |
171 | - Homepage: https://semanticstylegan.github.io/
172 |
173 | - Paper: https://arxiv.org/abs/2112.02236
174 | - Demo: https://semanticstylegan.github.io/videos/demo.mp4
175 |
176 | **Style Transformer for Image Inversion and Editing**
177 |
178 | - Paper: https://arxiv.org/abs/2203.07932
179 | - Code: https://github.com/sapphire497/style-transformer
180 |
181 | **Unsupervised Image-to-Image Translation with Generative Prior**
182 |
183 | - Homepage: https://www.mmlab-ntu.com/project/gpunit/
184 | - Paper: https://arxiv.org/abs/2204.03641
185 | - Code: https://github.com/williamyang1991/GP-UNIT
186 |
187 | **StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2**
188 |
189 | - Homepage: https://universome.github.io/stylegan-v
190 | - Paper: https://arxiv.org/abs/2112.14683
191 | - Code: https://github.com/universome/stylegan-v
192 |
193 | **OSSGAN: Open-set Semi-supervised Image Generation**
194 |
195 | - Paper: https://arxiv.org/abs/2204.14249
196 | - Code: https://github.com/raven38/OSSGAN
197 |
198 | **Neural Texture Extraction and Distribution for Controllable Person Image Synthesis**
199 |
200 | - Paper: https://arxiv.org/abs/2204.06160
201 | - Code: https://github.com/RenYurui/Neural-Texture-Extraction-Distribution
202 |
203 |
204 |
205 | # GNN
206 |
207 | **OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks**
208 |
209 | - Paper: https://wanyu-lin.github.io/assets/publications/wanyu-cvpr2022.pdf
210 | - Code: https://github.com/WanyuGroup/CVPR2022-OrphicX
211 |
212 |
213 |
214 | # MLP
215 |
216 | **RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality**
217 |
218 | - Paper: https://arxiv.org/abs/2112.11081
219 | - Code: https://github.com/DingXiaoH/RepMLP
220 |
221 |
222 |
223 | # NAS
224 |
225 | **β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search**
226 |
227 | - Paper: https://arxiv.org/abs/2203.01665
228 | - Code: https://github.com/Sunshine-Ye/Beta-DARTS
229 |
230 | **ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior**
231 |
232 | - Paper: https://arxiv.org/abs/2111.15362
233 | - Code: None
234 |
235 |
236 |
237 | # OCR
238 |
239 | **SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition**
240 |
241 | - Paper: https://arxiv.org/abs/2203.10209
242 |
243 | - Code: https://github.com/mxin262/SwinTextSpotter
244 |
245 |
246 |
247 | # NeRF
248 |
249 | **Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields**
250 |
251 | - Homepage: https://jonbarron.info/mipnerf360/
252 | - Paper: https://arxiv.org/abs/2111.12077
253 |
254 | - Demo: https://youtu.be/YStDS2-Ln1s
255 |
256 | **Point-NeRF: Point-based Neural Radiance Fields**
257 |
258 | - Homepage: https://xharlie.github.io/projects/project_sites/pointnerf/
259 | - Paper: https://arxiv.org/abs/2201.08845
260 | - Code: https://github.com/Xharlie/point-nerf
261 |
262 | **NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images**
263 |
264 | - Paper: https://arxiv.org/abs/2111.13679
265 | - Homepage: https://bmild.github.io/rawnerf/
266 | - Demo: https://www.youtube.com/watch?v=JtBS4KBcKVc
267 |
268 | **Urban Radiance Fields**
269 |
270 | - Homepage: https://urban-radiance-fields.github.io/
271 |
272 | - Paper: https://arxiv.org/abs/2111.14643
273 | - Demo: https://youtu.be/qGlq5DZT6uc
274 |
275 | **Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation**
276 |
277 | - Paper: https://arxiv.org/abs/2202.13162
278 | - Code: https://github.com/HexagonPrime/Pix2NeRF
279 |
280 | **HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video**
281 |
282 | - Homepage: https://grail.cs.washington.edu/projects/humannerf/
283 | - Paper: https://arxiv.org/abs/2201.04127
284 |
285 | - Demo: https://youtu.be/GM-RoZEymmw
286 |
287 |
288 |
289 | # 3D Face
290 |
291 | **ImFace: A Nonlinear 3D Morphable Face Model with Implicit Neural Representations**
292 |
293 | - Paper: https://arxiv.org/abs/2203.14510
294 | - Code: https://github.com/MingwuZheng/ImFace
295 |
296 |
297 |
298 | # Long-Tail
299 |
300 | **Retrieval Augmented Classification for Long-Tail Visual Recognition**
301 |
302 | - Paper: https://arxiv.org/abs/2202.11233
303 | - Code: None
304 |
305 |
306 |
307 | # Visual Transformer
308 |
309 | ## Backbone
310 |
311 | **MPViT: Multi-Path Vision Transformer for Dense Prediction**
312 |
313 | - Paper: https://arxiv.org/abs/2112.11010
314 | - Code: https://github.com/youngwanLEE/MPViT
315 |
316 | **MetaFormer is Actually What You Need for Vision**
317 |
318 | - Paper: https://arxiv.org/abs/2111.11418
319 | - Code: https://github.com/sail-sg/poolformer
320 |
321 | **Mobile-Former: Bridging MobileNet and Transformer**
322 |
323 | - Paper: https://arxiv.org/abs/2108.05895
324 | - Code: None
325 | - Chinese explanation: https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ
326 |
327 | **Shunted Self-Attention via Multi-Scale Token Aggregation**
328 |
329 | - Paper(Oral): https://arxiv.org/abs/2111.15193
330 | - Code: https://github.com/OliverRensu/Shunted-Transformer
331 |
332 | **Learned Queries for Efficient Local Attention**
333 |
334 | - Paper(Oral): https://arxiv.org/abs/2112.11435
335 | - Code: https://github.com/moabarar/qna
336 |
337 | ## Application
338 |
339 | **Language-based Video Editing via Multi-Modal Multi-Level Transformer**
340 |
341 | - Paper: https://arxiv.org/abs/2104.01122
342 | - Code: None
343 |
344 | **MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video**
345 |
346 | - Paper: https://arxiv.org/abs/2203.00859
347 | - Code: None
348 |
349 | **Embracing Single Stride 3D Object Detector with Sparse Transformer**
350 |
351 | - Paper: https://arxiv.org/abs/2112.06375
352 | - Code: https://github.com/TuSimple/SST
353 | - Chinese explanation: https://zhuanlan.zhihu.com/p/476056546
354 |
355 | **Multi-class Token Transformer for Weakly Supervised Semantic Segmentation**
356 |
357 | - Paper: https://arxiv.org/abs/2203.02891
358 | - Code: https://github.com/xulianuwa/MCTformer
359 |
360 | **Spatio-temporal Relation Modeling for Few-shot Action Recognition**
361 |
362 | - Paper: https://arxiv.org/abs/2112.05132
363 | - Code: https://github.com/Anirudh257/strm
364 |
365 | **Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction**
366 |
367 | - Paper: https://arxiv.org/abs/2111.07910
368 | - Code: https://github.com/caiyuanhao1998/MST
369 |
370 | **Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling**
371 |
372 | - Homepage: https://point-bert.ivg-research.xyz/
373 | - Paper: https://arxiv.org/abs/2111.14819
374 | - Code: https://github.com/lulutang0608/Point-BERT
375 |
376 | **GroupViT: Semantic Segmentation Emerges from Text Supervision**
377 |
378 | - Homepage: https://jerryxu.net/GroupViT/
379 |
380 | - Paper: https://arxiv.org/abs/2202.11094
381 | - Demo: https://youtu.be/DtJsWIUTW-Y
382 |
383 | **Restormer: Efficient Transformer for High-Resolution Image Restoration**
384 |
385 | - Paper: https://arxiv.org/abs/2111.09881
386 | - Code: https://github.com/swz30/Restormer
387 |
388 | **Splicing ViT Features for Semantic Appearance Transfer**
389 |
390 | - Homepage: https://splice-vit.github.io/
391 | - Paper: https://arxiv.org/abs/2201.00424
392 | - Code: https://github.com/omerbt/Splice
393 |
394 | **Self-supervised Video Transformer**
395 |
396 | - Homepage: https://kahnchana.github.io/svt/
397 | - Paper: https://arxiv.org/abs/2112.01514
398 |
399 | - Code: https://github.com/kahnchana/svt
400 |
401 | **Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers**
402 |
403 | - Paper: https://arxiv.org/abs/2203.02664
404 | - Code: https://github.com/rulixiang/afa
405 |
406 | **Accelerating DETR Convergence via Semantic-Aligned Matching**
407 |
408 | - Paper: https://arxiv.org/abs/2203.06883
409 | - Code: https://github.com/ZhangGongjie/SAM-DETR
410 |
411 | **DN-DETR: Accelerate DETR Training by Introducing Query DeNoising**
412 |
413 | - Paper: https://arxiv.org/abs/2203.01305
414 | - Code: https://github.com/FengLi-ust/DN-DETR
415 | - Chinese explanation: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w
416 |
417 | **Style Transformer for Image Inversion and Editing**
418 |
419 | - Paper: https://arxiv.org/abs/2203.07932
420 | - Code: https://github.com/sapphire497/style-transformer
421 |
422 | **MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer**
423 |
424 | - Paper: https://arxiv.org/abs/2203.10981
425 |
426 | - Code: https://github.com/kuanchihhuang/MonoDTR
427 |
428 | **Mask Transfiner for High-Quality Instance Segmentation**
429 |
430 | - Paper: https://arxiv.org/abs/2111.13673
431 | - Code: https://github.com/SysCV/transfiner
432 |
433 | **Language as Queries for Referring Video Object Segmentation**
434 |
435 | - Paper: https://arxiv.org/abs/2201.00487
436 | - Code: https://github.com/wjn922/ReferFormer
437 | - Chinese explanation: https://mp.weixin.qq.com/s/MkQT8QWSYoYVhJ1RSF6oPQ
438 |
439 | **X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning**
440 |
441 | - Paper: https://arxiv.org/abs/2203.00843
442 | - Code: https://github.com/CurryYuan/X-Trans2Cap
443 |
444 | **AdaMixer: A Fast-Converging Query-Based Object Detector**
445 |
446 | - Paper(Oral): https://arxiv.org/abs/2203.16507
447 | - Code: https://github.com/MCG-NJU/AdaMixer
448 |
449 | **Omni-DETR: Omni-Supervised Object Detection with Transformers**
450 |
451 | - Paper: https://arxiv.org/abs/2203.16089
452 | - Code: https://github.com/amazon-research/omni-detr
453 |
454 | **SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition**
455 |
456 | - Paper: https://arxiv.org/abs/2203.10209
457 |
458 | - Code: https://github.com/mxin262/SwinTextSpotter
459 |
460 | **TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting**
461 |
462 | - Paper(Oral): https://arxiv.org/abs/2204.01018
463 | - Code: https://github.com/SvipRepetitionCounting/TransRAC
464 |
465 | **Collaborative Transformers for Grounded Situation Recognition**
466 |
467 | - Paper: https://arxiv.org/abs/2203.16518
468 | - Code: https://github.com/jhcho99/CoFormer
469 |
470 | **NFormer: Robust Person Re-identification with Neighbor Transformer**
471 |
472 | - Paper: https://arxiv.org/abs/2204.09331
473 | - Code: https://github.com/haochenheheda/NFormer
474 |
475 | **Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation**
476 |
477 | - Paper: https://arxiv.org/abs/2201.06889
478 | - Code: None
479 |
480 | **Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer**
481 |
482 | - Paper(Oral): https://arxiv.org/abs/2204.08680
483 | - Code: https://github.com/zengwang430521/TCFormer
484 |
485 | **A New Dataset and Transformer for Stereoscopic Video Super-Resolution**
486 |
487 | - Paper: https://arxiv.org/abs/2204.10039
488 | - Code: https://github.com/H-deep/Trans-SVSR/
489 | - Dataset: http://shorturl.at/mpwGX
490 |
491 | **Safe Self-Refinement for Transformer-based Domain Adaptation**
492 |
493 | - Paper: https://arxiv.org/abs/2204.07683
494 | - Code: https://github.com/tsun/SSRT
495 |
496 | **Fast Point Transformer**
497 |
498 | - Homepage: http://cvlab.postech.ac.kr/research/FPT/
499 | - Paper: https://arxiv.org/abs/2112.04702
500 | - Code: https://github.com/POSTECH-CVLab/FastPointTransformer
501 |
502 | **Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval**
503 |
504 | - Paper: https://arxiv.org/abs/2204.09730
505 | - Code: https://github.com/mshukor/TFood
506 |
507 | **DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation**
508 |
509 | - Paper: https://arxiv.org/abs/2111.14887
510 | - Code: https://github.com/lhoyer/DAFormer
511 |
512 | **Stratified Transformer for 3D Point Cloud Segmentation**
513 |
514 | - Paper: https://arxiv.org/pdf/2203.14508.pdf
515 | - Code: https://github.com/dvlab-research/Stratified-Transformer
516 |
517 |
518 |
519 | # Vision-Language
520 |
521 | **Conditional Prompt Learning for Vision-Language Models**
522 |
523 | - Paper: https://arxiv.org/abs/2203.05557
524 | - Code: https://github.com/KaiyangZhou/CoOp
525 |
526 | **Bridging Video-text Retrieval with Multiple Choice Question**
527 |
528 | - Paper: https://arxiv.org/abs/2201.04850
529 | - Code: https://github.com/TencentARC/MCQ
530 |
531 | **Visual Abductive Reasoning**
532 |
533 | - Paper: https://arxiv.org/abs/2203.14040
534 | - Code: https://github.com/leonnnop/VAR
535 |
536 |
537 |
538 | # Self-supervised Learning
539 |
540 | **UniVIP: A Unified Framework for Self-Supervised Visual Pre-training**
541 |
542 | - Paper: https://arxiv.org/abs/2203.06965
543 | - Code: None
544 |
545 | **Crafting Better Contrastive Views for Siamese Representation Learning**
546 |
547 | - Paper: https://arxiv.org/abs/2202.03278
548 | - Code: https://github.com/xyupeng/ContrastiveCrop
549 | - Chinese explanation: https://mp.weixin.qq.com/s/VTP9D5f7KG9vg30U9kVI2A
550 |
551 | **HCSC: Hierarchical Contrastive Selective Coding**
552 |
553 | - Homepage: https://github.com/gyfastas/HCSC
554 | - Paper: https://arxiv.org/abs/2202.00455
555 | - Chinese explanation: https://mp.weixin.qq.com/s/jkYR8mYp-e645qk8kfPNKQ
556 |
557 | **DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis**
558 |
559 | - Paper: https://arxiv.org/abs/2204.10437
560 |
561 | - Code: https://github.com/JLiangLab/DiRA
562 |
563 |
564 |
565 | # Data Augmentation
566 |
567 | **TeachAugment: Data Augmentation Optimization Using Teacher Knowledge**
568 |
569 | - Paper: https://arxiv.org/abs/2202.12513
570 | - Code: https://github.com/DensoITLab/TeachAugment
571 |
572 | **AlignMixup: Improving Representations By Interpolating Aligned Features**
573 |
574 | - Paper: https://arxiv.org/abs/2103.15375
575 | - Code: https://github.com/shashankvkt/AlignMixup_CVPR22
576 |
577 |
578 |
579 | # Knowledge Distillation
580 |
581 | **Decoupled Knowledge Distillation**
582 |
583 | - Paper: https://arxiv.org/abs/2203.08679
584 | - Code: https://github.com/megvii-research/mdistiller
585 | - Chinese explanation: https://mp.weixin.qq.com/s/-4AA0zKIXh9Ei9-vc5jOhw
586 |
587 |
588 |
589 | # Object Detection
590 |
591 | **BoxeR: Box-Attention for 2D and 3D Transformers**
592 | - Paper: https://arxiv.org/abs/2111.13087
593 | - Code: https://github.com/kienduynguyen/BoxeR
594 | - Chinese explanation: https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w
595 |
596 | **DN-DETR: Accelerate DETR Training by Introducing Query DeNoising**
597 |
598 | - Paper: https://arxiv.org/abs/2203.01305
599 | - Code: https://github.com/FengLi-ust/DN-DETR
600 | - Chinese explanation: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w
601 |
602 | **Accelerating DETR Convergence via Semantic-Aligned Matching**
603 |
604 | - Paper: https://arxiv.org/abs/2203.06883
605 | - Code: https://github.com/ZhangGongjie/SAM-DETR
606 |
607 | **Localization Distillation for Dense Object Detection**
608 |
609 | - Paper: https://arxiv.org/abs/2102.12252
610 | - Code: https://github.com/HikariTJU/LD
612 | - Chinese explanation: https://mp.weixin.qq.com/s/dxss8RjJH283h6IbPCT9vg
613 |
614 | **Focal and Global Knowledge Distillation for Detectors**
615 |
616 | - Paper: https://arxiv.org/abs/2111.11837
617 | - Code: https://github.com/yzd-v/FGD
618 | - Chinese explanation: https://mp.weixin.qq.com/s/yDkreTudC8JL2V2ETsADwQ
619 |
620 | **A Dual Weighting Label Assignment Scheme for Object Detection**
621 |
622 | - Paper: https://arxiv.org/abs/2203.09730
623 | - Code: https://github.com/strongwolf/DW
624 |
625 | **AdaMixer: A Fast-Converging Query-Based Object Detector**
626 |
627 | - Paper(Oral): https://arxiv.org/abs/2203.16507
628 | - Code: https://github.com/MCG-NJU/AdaMixer
629 |
630 | **Omni-DETR: Omni-Supervised Object Detection with Transformers**
631 |
632 | - Paper: https://arxiv.org/abs/2203.16089
633 | - Code: https://github.com/amazon-research/omni-detr
634 |
635 | **SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection**
636 |
637 | - Paper(Oral): https://arxiv.org/abs/2203.06398
638 | - Code: https://github.com/CityU-AIM-Group/SIGMA
639 |
640 | ## Semi-Supervised Object Detection
641 |
642 | **Dense Learning based Semi-Supervised Object Detection**
643 |
644 | - Paper: https://arxiv.org/abs/2204.07300
645 |
646 | - Code: https://github.com/chenbinghui1/DSL
647 |
648 | # Visual Tracking
649 |
650 | **Correlation-Aware Deep Tracking**
651 |
652 | - Paper: https://arxiv.org/abs/2203.01666
653 | - Code: None
654 |
655 | **TCTrack: Temporal Contexts for Aerial Tracking**
656 |
657 | - Paper: https://arxiv.org/abs/2203.01885
658 | - Code: https://github.com/vision4robotics/TCTrack
659 |
660 | ## Multi-Modal Object Tracking
661 |
662 | **Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline**
663 |
664 | - Homepage: https://zhang-pengyu.github.io/DUT-VTUAV/
665 |
666 | - Paper: https://arxiv.org/abs/2204.04120
667 |
668 | ## Multi-Object Tracking
669 |
670 | **Learning of Global Objective for Network Flow in Multi-Object Tracking**
671 |
672 | - Paper: https://arxiv.org/abs/2203.16210
673 | - Code: None
674 |
675 | **DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion**
676 |
677 | - Homepage: https://dancetrack.github.io
678 | - Paper: https://arxiv.org/abs/2111.14690
679 | - Dataset: https://github.com/DanceTrack/DanceTrack
680 |
681 |
682 |
683 | # Semantic Segmentation
684 |
685 | **Novel Class Discovery in Semantic Segmentation**
686 |
687 | - Homepage: https://ncdss.github.io/
688 | - Paper: https://arxiv.org/abs/2112.01900
689 | - Code: https://github.com/HeliosZhao/NCDSS
690 |
691 | **Deep Hierarchical Semantic Segmentation**
692 |
693 | - Paper: https://arxiv.org/abs/2203.14335
694 | - Code: https://github.com/0liliulei/HieraSeg
695 |
696 | **Rethinking Semantic Segmentation: A Prototype View**
697 |
698 | - Paper(Oral): https://arxiv.org/abs/2203.15102
699 | - Code: https://github.com/tfzhou/ProtoSeg
700 |
701 | ## Weakly-Supervised Semantic Segmentation
702 |
703 | **Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation**
704 |
705 | - Paper: https://arxiv.org/abs/2203.00962
706 | - Code: https://github.com/zhaozhengChen/ReCAM
707 |
708 | **Multi-class Token Transformer for Weakly Supervised Semantic Segmentation**
709 |
710 | - Paper: https://arxiv.org/abs/2203.02891
711 | - Code: https://github.com/xulianuwa/MCTformer
712 |
713 | **Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers**
714 |
715 | - Paper: https://arxiv.org/abs/2203.02664
716 | - Code: https://github.com/rulixiang/afa
717 |
718 | **CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation**
719 |
720 | - Paper: https://arxiv.org/abs/2203.02668
721 | - Code: https://github.com/CVI-SZU/CLIMS
722 |
723 | **CCAM: Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation**
724 |
725 | - Paper: https://arxiv.org/abs/2203.13505
726 | - Code: https://github.com/CVI-SZU/CCAM
727 |
728 | **FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation**
729 |
730 | - Homepage: http://cvlab.postech.ac.kr/research/FIFO/
731 | - Paper(Oral): https://arxiv.org/abs/2204.01587
732 | - Code: https://github.com/sohyun-l/FIFO
733 |
734 | **Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation**
735 |
736 | - Paper: https://arxiv.org/abs/2203.09653
737 | - Code: https://github.com/maeve07/RCA.git
738 |
739 | ## Semi-Supervised Semantic Segmentation
740 |
741 | **ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation**
742 |
743 | - Paper: https://arxiv.org/abs/2106.05095
744 | - Code: https://github.com/LiheYoung/ST-PlusPlus
745 | - Chinese explanation: https://mp.weixin.qq.com/s/knSnlebdtEnmrkChGM_0CA
746 |
747 | **Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels**
748 |
749 | - Homepage: https://haochen-wang409.github.io/U2PL/
750 | - Paper: https://arxiv.org/abs/2203.03884
751 | - Code: https://github.com/Haochen-Wang409/U2PL
752 | - Chinese explanation: https://mp.weixin.qq.com/s/-08olqE7np8A1XQzt6HAgQ
753 |
754 | **Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation**
755 |
756 | - Paper: https://arxiv.org/pdf/2111.12903.pdf
757 | - Code: https://github.com/yyliu01/PS-MT
758 |
759 | ## Domain-Adaptive Semantic Segmentation
760 |
761 | **Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation**
762 |
763 | - Paper: https://arxiv.org/abs/2111.12940
764 | - Code: https://github.com/BIT-DA/RIPU
765 |
766 | **DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation**
767 |
768 | - Paper: https://arxiv.org/abs/2111.14887
769 | - Code: https://github.com/lhoyer/DAFormer
770 |
771 | ## Unsupervised Semantic Segmentation
772 |
773 | **GroupViT: Semantic Segmentation Emerges from Text Supervision**
774 |
775 | - Homepage: https://jerryxu.net/GroupViT/
776 | - Paper: https://arxiv.org/abs/2202.11094
777 | - Demo: https://youtu.be/DtJsWIUTW-Y
778 |
779 | ## Few-Shot Semantic Segmentation
780 |
781 | **Generalized Few-shot Semantic Segmentation**
782 |
783 | - Paper: https://jiaya.me/papers/cvpr22_zhuotao.pdf
784 | - Code: https://github.com/dvlab-research/GFS-Seg
785 |
786 |
787 |
788 | # Instance Segmentation
789 |
790 | **BoxeR: Box-Attention for 2D and 3D Transformers**
791 | - Paper: https://arxiv.org/abs/2111.13087
792 | - Code: https://github.com/kienduynguyen/BoxeR
793 | - Chinese explanation: https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w
794 |
795 | **E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation**
796 |
797 | - Paper: https://arxiv.org/abs/2203.04074
798 | - Code: https://github.com/zhang-tao-whu/e2ec
799 |
800 | **Mask Transfiner for High-Quality Instance Segmentation**
801 |
802 | - Paper: https://arxiv.org/abs/2111.13673
803 | - Code: https://github.com/SysCV/transfiner
804 |
805 | **Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity**
806 |
807 | - Homepage: https://sites.google.com/view/generic-grouping/
808 |
809 | - Paper: https://arxiv.org/abs/2204.06107
810 | - Code: https://github.com/facebookresearch/Generic-Grouping
811 |
812 | ## Self-Supervised Instance Segmentation
813 |
814 | **FreeSOLO: Learning to Segment Objects without Annotations**
815 |
816 | - Paper: https://arxiv.org/abs/2202.12181
817 | - Code: https://github.com/NVlabs/FreeSOLO
818 |
819 | ## Video Instance Segmentation
820 |
821 | **Efficient Video Instance Segmentation via Tracklet Query and Proposal**
822 |
823 | - Homepage: https://jialianwu.com/projects/EfficientVIS.html
824 | - Paper: https://arxiv.org/abs/2203.01853
825 | - Demo: https://youtu.be/sSPMzgtMKCE
826 |
827 | **Temporally Efficient Vision Transformer for Video Instance Segmentation**
828 |
829 | - Paper: https://arxiv.org/abs/2204.08412
830 | - Code: https://github.com/hustvl/TeViT
831 |
832 |
833 |
834 | # Panoptic Segmentation
835 |
836 | **Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers**
837 |
838 | - Paper: https://arxiv.org/abs/2109.03814
839 | - Code: https://github.com/zhiqi-li/Panoptic-SegFormer
840 |
841 | **Large-scale Video Panoptic Segmentation in the Wild: A Benchmark**
842 |
843 | - Paper: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset/blob/main/VIPSeg2022.pdf
844 | - Code: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset
845 | - Dataset: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset
846 |
847 |
848 |
849 | # Few-Shot Classification
850 |
851 | **Integrative Few-Shot Learning for Classification and Segmentation**
852 |
853 | - Paper: https://arxiv.org/abs/2203.15712
854 | - Code: https://github.com/dahyun-kang/ifsl
855 |
856 | **Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification**
857 |
858 | - Paper: https://arxiv.org/abs/2106.05517
859 | - Code: https://github.com/LouieYang/MCL
860 |
861 |
862 |
863 | # Few-Shot Segmentation
864 |
865 | **Learning What Not to Segment: A New Perspective on Few-Shot Segmentation**
866 |
867 | - Paper: https://arxiv.org/abs/2203.07615
868 | - Code: https://github.com/chunbolang/BAM
869 |
870 | **Integrative Few-Shot Learning for Classification and Segmentation**
871 |
872 | - Paper: https://arxiv.org/abs/2203.15712
873 | - Code: https://github.com/dahyun-kang/ifsl
874 |
875 | **Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation**
876 |
877 | - Paper: https://arxiv.org/abs/2204.10638
878 | - Code: None
879 |
880 |
881 |
882 | # Image Matting
883 |
884 | **Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation**
885 |
886 | - Paper: https://arxiv.org/abs/2201.06889
887 | - Code: None
888 |
889 |
890 |
891 | # Video Understanding
892 |
893 | **Self-supervised Video Transformer**
894 |
895 | - Homepage: https://kahnchana.github.io/svt/
896 | - Paper: https://arxiv.org/abs/2112.01514
897 | - Code: https://github.com/kahnchana/svt
898 |
899 | **TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting**
900 |
901 | - Paper(Oral): https://arxiv.org/abs/2204.01018
902 | - Code: https://github.com/SvipRepetitionCounting/TransRAC
903 |
904 | **FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment**
905 |
906 | - Paper(Oral): https://arxiv.org/abs/2204.03646
907 |
908 | - Dataset: https://github.com/xujinglin/FineDiving
909 | - Code: https://github.com/xujinglin/FineDiving
910 | - Chinese explanation: https://mp.weixin.qq.com/s/8t12Y34eMNwvJr8PeryWXg
911 |
912 | **Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition**
913 |
914 | - Paper(Oral): https://arxiv.org/abs/2204.02148
915 | - Code: None
916 |
917 | ## Action Recognition
918 |
919 | **Spatio-temporal Relation Modeling for Few-shot Action Recognition**
920 |
921 | - Paper: https://arxiv.org/abs/2112.05132
922 | - Code: https://github.com/Anirudh257/strm
923 |
924 | ## Action Detection
925 |
926 | **End-to-End Semi-Supervised Learning for Video Action Detection**
927 |
928 | - Paper: https://arxiv.org/abs/2203.04251
929 | - Code: None
930 |
931 |
932 |
933 | # Image Editing
934 |
935 | **Style Transformer for Image Inversion and Editing**
936 |
937 | - Paper: https://arxiv.org/abs/2203.07932
938 | - Code: https://github.com/sapphire497/style-transformer
939 |
940 | **Blended Diffusion for Text-driven Editing of Natural Images**
941 |
942 | - Paper: https://arxiv.org/abs/2111.14818
943 | - Code: https://github.com/omriav/blended-diffusion
944 |
945 | **SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing**
946 |
947 | - Homepage: https://semanticstylegan.github.io/
948 |
949 | - Paper: https://arxiv.org/abs/2112.02236
950 | - Demo: https://semanticstylegan.github.io/videos/demo.mp4
951 |
952 |
953 |
954 | # Low-level Vision
955 |
956 | **ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior**
957 |
958 | - Paper: https://arxiv.org/abs/2111.15362
959 | - Code: None
960 |
961 | **Restormer: Efficient Transformer for High-Resolution Image Restoration**
962 |
963 | - Paper: https://arxiv.org/abs/2111.09881
964 | - Code: https://github.com/swz30/Restormer
965 |
966 | **Robust Equivariant Imaging: a fully unsupervised framework for learning to image from noisy and partial measurements**
967 |
968 | - Paper(Oral): https://arxiv.org/abs/2111.12855
969 | - Code: https://github.com/edongdongchen/REI
970 |
971 |
972 |
973 | # Super-Resolution
974 |
975 | ## Image Super-Resolution
976 |
977 | **Learning the Degradation Distribution for Blind Image Super-Resolution**
978 |
979 | - Paper: https://arxiv.org/abs/2203.04962
980 | - Code: https://github.com/greatlog/UnpairedSR
981 |
982 | ## Video Super-Resolution
983 |
984 | **BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment**
985 |
986 | - Paper: https://arxiv.org/abs/2104.13371
987 | - Code: https://github.com/open-mmlab/mmediting
988 | - Code: https://github.com/ckkelvinchan/BasicVSR_PlusPlus
989 | - Chinese explanation: https://mp.weixin.qq.com/s/HZTwYfphixyLHxlbCAxx4g
990 |
991 | **Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling**
992 |
993 | - Paper: https://arxiv.org/abs/2204.07114
994 | - Code: None
995 |
996 | **A New Dataset and Transformer for Stereoscopic Video Super-Resolution**
997 |
998 | - Paper: https://arxiv.org/abs/2204.10039
999 | - Code: https://github.com/H-deep/Trans-SVSR/
1000 | - Dataset: http://shorturl.at/mpwGX
1001 |
1002 |
1003 |
1004 | # Deblur
1005 |
1006 | ## Image Deblur
1007 |
1008 | **Learning to Deblur using Light Field Generated and Real Defocus Images**
1009 |
1010 | - Homepage: http://lyruan.com/Projects/DRBNet/
1011 | - Paper(Oral): https://arxiv.org/abs/2204.00442
1012 |
1013 | - Code: https://github.com/lingyanruan/DRBNet
1014 |
1015 |
1016 |
1017 | # 3D Point Cloud
1018 |
1019 | **Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling**
1020 |
1021 | - Homepage: https://point-bert.ivg-research.xyz/
1022 |
1023 | - Paper: https://arxiv.org/abs/2111.14819
1024 | - Code: https://github.com/lulutang0608/Point-BERT
1025 |
1026 | **A Unified Query-based Paradigm for Point Cloud Understanding**
1027 |
1028 | - Paper: https://arxiv.org/abs/2203.01252
1029 | - Code: None
1030 |
1031 | **CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding**
1032 |
1033 | - Paper: https://arxiv.org/abs/2203.00680
1034 | - Code: https://github.com/MohamedAfham/CrossPoint
1035 |
1036 | **PointCLIP: Point Cloud Understanding by CLIP**
1037 |
1038 | - Paper: https://arxiv.org/abs/2112.02413
1039 | - Code: https://github.com/ZrrSkywalker/PointCLIP
1040 |
1041 | **Fast Point Transformer**
1042 |
1043 | - Homepage: http://cvlab.postech.ac.kr/research/FPT/
1044 | - Paper: https://arxiv.org/abs/2112.04702
1045 | - Code: https://github.com/POSTECH-CVLab/FastPointTransformer
1046 |
1047 | **RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds**
1048 |
1049 | - Paper: https://arxiv.org/abs/2205.11028
1050 | - Code: https://github.com/gxd1994/RCP
1051 |
1052 | **The Devil is in the Pose: Ambiguity-free 3D Rotation-invariant Learning via Pose-aware Convolution**
1053 |
1054 | - Paper: https://arxiv.org/abs/2205.15210
1055 | - Code: https://github.com/GostInShell/PaRI-Conv
1056 |
1057 |
1058 |
1059 | # 3D Object Detection
1060 |
1061 | **Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds**
1062 |
1063 | - Paper(Oral): https://arxiv.org/abs/2203.11139
1064 |
1065 | - Code: https://github.com/yifanzhang713/IA-SSD
1066 |
1067 | - Demo: https://www.youtube.com/watch?v=3jP2o9KXunA
1068 |
1069 | **BoxeR: Box-Attention for 2D and 3D Transformers**
1070 | - Paper: https://arxiv.org/abs/2111.13087
1071 | - Code: https://github.com/kienduynguyen/BoxeR
1072 | - Chinese explanation: https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w
1073 |
1074 | **Embracing Single Stride 3D Object Detector with Sparse Transformer**
1075 |
1076 | - Paper: https://arxiv.org/abs/2112.06375
1077 |
1078 | - Code: https://github.com/TuSimple/SST
1079 |
1080 | **Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes**
1081 |
1082 | - Paper: https://arxiv.org/abs/2011.12001
1083 | - Code: https://github.com/qq456cvb/CanonicalVoting
1084 |
1085 | **MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer**
1086 |
1087 | - Paper: https://arxiv.org/abs/2203.10981
1088 | - Code: https://github.com/kuanchihhuang/MonoDTR
1089 |
1090 | **HyperDet3D: Learning a Scene-conditioned 3D Object Detector**
1091 |
1092 | - Paper: https://arxiv.org/abs/2204.05599
1093 | - Code: None
1094 |
1095 | **OccAM's Laser: Occlusion-based Attribution Maps for 3D Object Detectors on LiDAR Data**
1096 |
1097 | - Paper: https://arxiv.org/abs/2204.06577
1098 | - Code: https://github.com/dschinagl/occam
1099 |
1100 | **DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection**
1101 |
1102 | - Homepage: https://thudair.baai.ac.cn/index
1103 | - Paper: https://arxiv.org/abs/2204.05575
1104 | - Code: https://github.com/AIR-THU/DAIR-V2X
1105 |
1106 | **Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions**
1107 |
1108 | - Homepage: https://ithaca365.mae.cornell.edu/
1109 |
1110 | - Paper: https://arxiv.org/abs/2208.01166
1111 |
1112 |
1113 |
1114 | # 3D Semantic Segmentation
1115 |
1116 | **Scribble-Supervised LiDAR Semantic Segmentation**
1117 |
1118 | - Paper: https://arxiv.org/abs/2203.08537
1119 | - Dataset: https://github.com/ouenal/scribblekitti
1120 |
1121 | **Stratified Transformer for 3D Point Cloud Segmentation**
1122 |
1123 | - Paper: https://arxiv.org/pdf/2203.14508.pdf
1124 | - Code: https://github.com/dvlab-research/Stratified-Transformer
1125 |
1126 | # 3D实例分割(3D Instance Segmentation)
1127 |
1128 | **Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions**
1129 |
1130 | - Homepage: https://ithaca365.mae.cornell.edu/
1131 |
1132 | - Paper: https://arxiv.org/abs/2208.01166
1133 |
1134 |
1135 |
1136 | # 3D目标跟踪(3D Object Tracking)
1137 |
1138 | **Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds**
1139 |
1140 | - Paper: https://arxiv.org/abs/2203.01730
1141 | - Code: https://github.com/Ghostish/Open3DSOT
1142 |
1143 | **PTTR: Relational 3D Point Cloud Object Tracking with Transformer**
1144 |
1145 | - Paper: https://arxiv.org/abs/2112.02857
1146 | - Code: https://github.com/Jasonkks/PTTR
1147 |
1148 |
1149 |
1150 | # 3D人体姿态估计(3D Human Pose Estimation)
1151 |
1152 | **MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation**
1153 |
1154 | - Paper: https://arxiv.org/abs/2111.12707
1155 |
1156 | - Code: https://github.com/Vegetebird/MHFormer
1157 |
1158 | - Chinese explainer: https://zhuanlan.zhihu.com/p/439459426
1159 |
1160 | **MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video**
1161 |
1162 | - Paper: https://arxiv.org/abs/2203.00859
1163 | - Code: None
1164 |
1165 | **Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation**
1166 |
1167 | - Paper: https://arxiv.org/abs/2203.07697
1168 | - Code: None
1169 | - Chinese explainer: https://mp.weixin.qq.com/s/L_F28IFLXvs5R4V9TTUpRw
1170 |
1171 | **BEV: Putting People in their Place: Monocular Regression of 3D People in Depth**
1172 |
1173 | - Homepage: https://arthur151.github.io/BEV/BEV.html
1174 | - Paper: https://arxiv.org/abs/2112.08274
1175 | - Code: https://github.com/Arthur151/ROMP
1176 | - Dataset: https://github.com/Arthur151/Relative_Human
1177 | - Demo: https://www.youtube.com/watch?v=Q62fj_6AxRI
1178 |
1179 |
1180 |
1181 | # 3D语义场景补全(3D Semantic Scene Completion)
1182 |
1183 | **MonoScene: Monocular 3D Semantic Scene Completion**
1184 |
1185 | - Paper: https://arxiv.org/abs/2112.00726
1186 | - Code: https://github.com/cv-rits/MonoScene
1187 |
1188 |
1189 |
1190 | # 3D重建(3D Reconstruction)
1191 |
1192 | **BANMo: Building Animatable 3D Neural Models from Many Casual Videos**
1193 |
1194 | - Homepage: https://banmo-www.github.io/
1195 | - Paper: https://arxiv.org/abs/2112.12761
1196 | - Code: https://github.com/facebookresearch/banmo
1197 | - Chinese explainer: https://mp.weixin.qq.com/s/NMHP8-xWwrX40vpGx55Qew
1198 |
1199 |
1200 |
1201 | # 行人重识别(Person Re-identification)
1202 |
1203 | **NFormer: Robust Person Re-identification with Neighbor Transformer**
1204 |
1205 | - Paper: https://arxiv.org/abs/2204.09331
1206 | - Code: https://github.com/haochenheheda/NFormer
1207 |
1208 |
1209 |
1210 | # 伪装物体检测(Camouflaged Object Detection)
1211 |
1212 | **Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection**
1213 |
1214 | - Paper: https://arxiv.org/abs/2203.02688
1215 | - Code: https://github.com/lartpang/ZoomNet
1216 |
1217 |
1218 |
1219 | # 深度估计(Depth Estimation)
1220 |
1221 | ## 单目深度估计
1222 |
1223 | **NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation**
1224 |
1225 | - Paper: https://arxiv.org/abs/2203.01502
1226 | - Code: None
1227 |
1228 | **OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion**
1229 |
1230 | - Paper: https://arxiv.org/abs/2203.00838
1231 | - Code: None
1232 |
1233 | **Toward Practical Self-Supervised Monocular Indoor Depth Estimation**
1234 |
1235 | - Paper: https://arxiv.org/abs/2112.02306
1236 | - Code: None
1237 |
1238 | **P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior**
1239 |
1240 | - Paper: https://arxiv.org/abs/2204.02091
1241 | - Code: https://github.com/SysCV/P3Depth
1242 |
1243 | **Multi-Frame Self-Supervised Depth with Transformers**
1244 |
1245 | - Homepage: https://sites.google.com/tri.global/depthformer
1246 |
1247 | - Paper: https://arxiv.org/abs/2204.07616
1248 | - Code: None
1249 |
1250 |
1251 |
1252 | # 立体匹配(Stereo Matching)
1253 |
1254 | **ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching**
1255 |
1256 | - Paper: https://arxiv.org/abs/2203.02146
1257 | - Code: https://github.com/gangweiX/ACVNet
1258 |
1259 |
1260 |
1261 | # 特征匹配(Feature Matching)
1262 |
1263 | **ClusterGNN: Cluster-based Coarse-to-Fine Graph Neural Network for Efficient Feature Matching**
1264 |
1265 | - Paper: https://arxiv.org/abs/2204.11700
1266 | - Code: None
1267 |
1268 |
1269 |
1270 | # 车道线检测(Lane Detection)
1271 |
1272 | **Rethinking Efficient Lane Detection via Curve Modeling**
1273 |
1274 | - Paper: https://arxiv.org/abs/2203.02431
1275 | - Code: https://github.com/voldemortX/pytorch-auto-drive
1276 | - Demo: https://user-images.githubusercontent.com/32259501/148680744-a18793cd-f437-461f-8c3a-b909c9931709.mp4
1277 |
1278 | **A Keypoint-based Global Association Network for Lane Detection**
1279 |
1280 | - Paper: https://arxiv.org/abs/2204.07335
1281 | - Code: https://github.com/Wolfwjs/GANet
1282 |
1283 |
1284 |
1285 | # 光流估计(Optical Flow Estimation)
1286 |
1287 | **Imposing Consistency for Optical Flow Estimation**
1288 |
1289 | - Paper: https://arxiv.org/abs/2204.07262
1290 | - Code: None
1291 |
1292 | **Deep Equilibrium Optical Flow Estimation**
1293 |
1294 | - Paper: https://arxiv.org/abs/2204.08442
1295 | - Code: https://github.com/locuslab/deq-flow
1296 |
1297 | **GMFlow: Learning Optical Flow via Global Matching**
1298 |
1299 | - Paper(Oral): https://arxiv.org/abs/2111.13680
1300 | - Code: https://github.com/haofeixu/gmflow
1301 |
1302 |
1303 |
1304 | # 图像修复(Image Inpainting)
1305 |
1306 | **Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding**
1307 |
1308 | - Paper: https://arxiv.org/abs/2203.00867
1309 |
1310 | - Code: https://github.com/DQiaole/ZITS_inpainting
1311 |
1312 |
1313 |
1314 | # 图像检索(Image Retrieval)
1315 |
1316 | **Correlation Verification for Image Retrieval**
1317 |
1318 | - Paper(Oral): https://arxiv.org/abs/2204.01458
1319 | - Code: https://github.com/sungonce/CVNet
1320 |
1321 |
1322 |
1323 | # 人脸识别(Face Recognition)
1324 |
1325 | **AdaFace: Quality Adaptive Margin for Face Recognition**
1326 |
1327 | - Paper(Oral): https://arxiv.org/abs/2204.00964
1328 | - Code: https://github.com/mk-minchul/AdaFace
1329 |
1330 |
1331 |
1332 | # 人群计数(Crowd Counting)
1333 |
1334 | **Leveraging Self-Supervision for Cross-Domain Crowd Counting**
1335 |
1336 | - Paper: https://arxiv.org/abs/2103.16291
1337 | - Code: None
1338 |
1339 |
1340 |
1341 | # 医学图像(Medical Image)
1342 |
1343 | **BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation**
1344 |
1345 | - Paper: https://arxiv.org/abs/2203.02533
1346 | - Code: None
1347 |
1348 | **Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification**
1349 |
1350 | - Paper: https://arxiv.org/abs/2111.12918
1351 | - Code: https://github.com/FBLADL/ACPL
1352 |
1353 | **DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis**
1354 |
1355 | - Paper: https://arxiv.org/abs/2204.10437
1356 |
1357 | - Code: https://github.com/JLiangLab/DiRA
1358 |
1359 |
1360 |
1361 | # 视频生成(Video Generation)
1362 |
1363 | **StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2**
1364 |
1365 | - Homepage: https://universome.github.io/stylegan-v
1366 | - Paper: https://arxiv.org/abs/2112.14683
1367 |
1368 | - Code: https://github.com/universome/stylegan-v
1369 |
1370 | - Demo: https://kaust-cair.s3.amazonaws.com/stylegan-v/stylegan-v.mp4
1371 |
1372 |
1373 |
1374 | # 场景图生成(Scene Graph Generation)
1375 |
1376 | **SGTR: End-to-end Scene Graph Generation with Transformer**
1377 |
1378 | - Paper: https://arxiv.org/abs/2112.12970
1379 | - Code: None
1380 |
1381 |
1382 |
1383 | # 参考视频目标分割(Referring Video Object Segmentation)
1384 |
1385 | **Language as Queries for Referring Video Object Segmentation**
1386 |
1387 | - Paper: https://arxiv.org/abs/2201.00487
1388 | - Code: https://github.com/wjn922/ReferFormer
1389 |
1390 | **ReSTR: Convolution-free Referring Image Segmentation Using Transformers**
1391 |
1392 | - Paper: https://arxiv.org/abs/2203.16768
1393 | - Code: None
1394 |
1395 |
1396 |
1397 | # 步态识别(Gait Recognition)
1398 |
1399 | **Gait Recognition in the Wild with Dense 3D Representations and A Benchmark**
1400 |
1401 | - Homepage: https://gait3d.github.io/
1402 | - Paper: https://arxiv.org/abs/2204.02569
1403 | - Code: https://github.com/Gait3D/Gait3D-Benchmark
1404 |
1405 |
1406 |
1407 | # 风格迁移(Style Transfer)
1408 |
1409 | **StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions**
1410 |
1411 | - Homepage: https://lukashoel.github.io/stylemesh/
1412 | - Paper: https://arxiv.org/abs/2112.01530
1413 |
1414 | - Code: https://github.com/lukasHoel/stylemesh
1415 | - Demo: https://www.youtube.com/watch?v=ZqgiTLcNcks
1416 |
1417 |
1418 |
1419 | # 异常检测(Anomaly Detection)
1420 |
1421 | **UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection**
1422 |
1423 | - Paper: https://arxiv.org/abs/2111.08644
1424 |
1425 | - Dataset: https://github.com/lilygeorgescu/UBnormal
1426 |
1427 | **Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection**
1428 |
1429 | - Paper(Oral): https://arxiv.org/abs/2111.09099
1430 | - Code: https://github.com/ristea/sspcab
1431 |
1432 | 
1433 |
1434 | # 对抗样本(Adversarial Examples)
1435 |
1436 | **Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon**
1437 |
1438 | - Paper: https://arxiv.org/abs/2203.03818
1439 | - Code: https://github.com/hncszyq/ShadowAttack
1440 |
1441 | **LAS-AT: Adversarial Training with Learnable Attack Strategy**
1442 |
1443 | - Paper(Oral): https://arxiv.org/abs/2203.06616
1444 | - Code: https://github.com/jiaxiaojunQAQ/LAS-AT
1445 |
1446 | **Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection**
1447 |
1448 | - Paper: https://arxiv.org/abs/2112.04532
1449 | - Code: https://github.com/joellliu/SegmentAndComplete
1450 |
1451 |
1452 |
1453 | # 弱监督物体检测(Weakly Supervised Object Localization)
1454 |
1455 | **Weakly Supervised Object Localization as Domain Adaption**
1456 |
1457 | - Paper: https://arxiv.org/abs/2203.01714
1458 | - Code: https://github.com/zh460045050/DA-WSOL_CVPR2022
1459 |
1460 |
1461 |
1462 | # 雷达目标检测(Radar Object Detection)
1463 |
1464 | **Exploiting Temporal Relations on Radar Perception for Autonomous Driving**
1465 |
1466 | - Paper: https://arxiv.org/abs/2204.01184
1467 | - Code: None
1468 |
1469 |
1470 |
1471 | # 高光谱图像重建(Hyperspectral Image Reconstruction)
1472 |
1473 | **Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction**
1474 |
1475 | - Paper: https://arxiv.org/abs/2111.07910
1476 | - Code: https://github.com/caiyuanhao1998/MST
1477 |
1478 |
1479 |
1480 | # 图像拼接(Image Stitching)
1481 |
1482 | **Deep Rectangling for Image Stitching: A Learning Baseline**
1483 |
1484 | - Paper(Oral): https://arxiv.org/abs/2203.03831
1485 |
1486 | - Code: https://github.com/nie-lang/DeepRectangling
1487 | - Dataset: https://github.com/nie-lang/DeepRectangling
1488 | - Chinese explainer: https://mp.weixin.qq.com/s/lp5AnrtO_9urp-Fv6Z0l2Q
1489 |
1490 |
1491 |
1492 | # 水印(Watermarking)
1493 |
1494 | **Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings**
1495 |
1496 | - Paper: https://arxiv.org/abs/2104.13450
1497 | - Code: None
1498 |
1499 |
1500 |
1501 | # Action Counting
1502 |
1503 | **TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting**
1504 |
1505 | - Paper(Oral): https://arxiv.org/abs/2204.01018
1506 | - Dataset: https://svip-lab.github.io/dataset/RepCount_dataset.html
1507 | - Code: https://github.com/SvipRepetitionCounting/TransRAC
1508 |
1509 |
1510 |
1511 | # Grounded Situation Recognition
1512 |
1513 | **Collaborative Transformers for Grounded Situation Recognition**
1514 |
1515 | - Paper: https://arxiv.org/abs/2203.16518
1516 | - Code: https://github.com/jhcho99/CoFormer
1517 |
1518 |
1519 |
1520 | # Zero-shot Learning
1521 |
1522 | **Unseen Classes at a Later Time? No Problem**
1523 |
1524 | - Paper: https://arxiv.org/abs/2203.16517
1525 | - Code: https://github.com/sumitramalagi/Unseen-classes-at-a-later-time
1526 |
1527 |
1528 |
1529 | # DeepFakes
1530 |
1531 | **Detecting Deepfakes with Self-Blended Images**
1532 |
1533 | - Paper(Oral): https://arxiv.org/abs/2204.08376
1534 |
1535 | - Code: https://github.com/mapooon/SelfBlendedImages
1536 |
1537 |
1538 |
1539 | # 数据集(Datasets)
1540 |
1541 | **It's About Time: Analog Clock Reading in the Wild**
1542 |
1543 | - Homepage: https://charigyang.github.io/abouttime/
1544 | - Paper: https://arxiv.org/abs/2111.09162
1545 | - Code: https://github.com/charigyang/itsabouttime
1546 | - Demo: https://youtu.be/cbiMACA6dRc
1547 |
1548 | **Toward Practical Self-Supervised Monocular Indoor Depth Estimation**
1549 |
1550 | - Paper: https://arxiv.org/abs/2112.02306
1551 | - Code: None
1552 |
1553 | **Kubric: A scalable dataset generator**
1554 |
1555 | - Paper: https://arxiv.org/abs/2203.03570
1556 | - Code: https://github.com/google-research/kubric
1557 | - Chinese explainer: https://mp.weixin.qq.com/s/mJ8HzY6C0GifxsErJIS3Mg
1558 |
1559 | **Scribble-Supervised LiDAR Semantic Segmentation**
1560 |
1561 | - Paper: https://arxiv.org/abs/2203.08537
1562 | - Dataset: https://github.com/ouenal/scribblekitti
1563 |
1564 | **Deep Rectangling for Image Stitching: A Learning Baseline**
1565 |
1566 | - Paper(Oral): https://arxiv.org/abs/2203.03831
1567 | - Code: https://github.com/nie-lang/DeepRectangling
1568 | - Dataset: https://github.com/nie-lang/DeepRectangling
1569 | - Chinese explainer: https://mp.weixin.qq.com/s/lp5AnrtO_9urp-Fv6Z0l2Q
1570 |
1571 | **ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer**
1572 |
1573 | - Homepage: https://ai.stanford.edu/~rhgao/objectfolder2.0/
1574 | - Paper: https://arxiv.org/abs/2204.02389
1575 | - Dataset: https://github.com/rhgao/ObjectFolder
1576 | - Demo:https://youtu.be/e5aToT3LkRA
1577 |
1578 | **Shape from Polarization for Complex Scenes in the Wild**
1579 |
1580 | - Homepage: https://chenyanglei.github.io/sfpwild/index.html
1581 | - Paper: https://arxiv.org/abs/2112.11377
1582 | - Code: https://github.com/ChenyangLEI/sfp-wild
1583 |
1584 | **Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline**
1585 |
1586 | - Homepage: https://zhang-pengyu.github.io/DUT-VTUAV/
1587 | - Paper: https://arxiv.org/abs/2204.04120
1588 |
1589 | **TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting**
1590 |
1591 | - Paper(Oral): https://arxiv.org/abs/2204.01018
1592 | - Dataset: https://svip-lab.github.io/dataset/RepCount_dataset.html
1593 | - Code: https://github.com/SvipRepetitionCounting/TransRAC
1594 |
1595 | **FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment**
1596 |
1597 | - Paper(Oral): https://arxiv.org/abs/2204.03646
1598 | - Dataset: https://github.com/xujinglin/FineDiving
1599 | - Code: https://github.com/xujinglin/FineDiving
1600 | - Chinese explainer: https://mp.weixin.qq.com/s/8t12Y34eMNwvJr8PeryWXg
1601 |
1602 | **Aesthetic Text Logo Synthesis via Content-aware Layout Inferring**
1603 |
1604 | - Paper: https://arxiv.org/abs/2204.02701
1605 | - Dataset: https://github.com/yizhiwang96/TextLogoLayout
1606 | - Code: https://github.com/yizhiwang96/TextLogoLayout
1607 |
1608 | **DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection**
1609 |
1610 | - Homepage: https://thudair.baai.ac.cn/index
1611 | - Paper: https://arxiv.org/abs/2204.05575
1612 | - Code: https://github.com/AIR-THU/DAIR-V2X
1613 |
1614 | **A New Dataset and Transformer for Stereoscopic Video Super-Resolution**
1615 |
1616 | - Paper: https://arxiv.org/abs/2204.10039
1617 | - Code: https://github.com/H-deep/Trans-SVSR/
1618 | - Dataset: http://shorturl.at/mpwGX
1619 |
1620 | **Putting People in their Place: Monocular Regression of 3D People in Depth**
1621 |
1622 | - Homepage: https://arthur151.github.io/BEV/BEV.html
1623 | - Paper: https://arxiv.org/abs/2112.08274
1624 |
1625 | - Code: https://github.com/Arthur151/ROMP
1626 | - Dataset: https://github.com/Arthur151/Relative_Human
1627 |
1628 | **UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection**
1629 |
1630 | - Paper: https://arxiv.org/abs/2111.08644
1631 | - Dataset: https://github.com/lilygeorgescu/UBnormal
1632 |
1633 | **DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion**
1634 |
1635 | - Homepage: https://dancetrack.github.io
1636 | - Paper: https://arxiv.org/abs/2111.14690
1637 | - Dataset: https://github.com/DanceTrack/DanceTrack
1638 |
1639 | **Visual Abductive Reasoning**
1640 |
1641 | - Paper: https://arxiv.org/abs/2203.14040
1642 | - Code: https://github.com/leonnnop/VAR
1643 |
1644 | **Large-scale Video Panoptic Segmentation in the Wild: A Benchmark**
1645 |
1646 | - Paper: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset/blob/main/VIPSeg2022.pdf
1647 | - Code: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset
1648 | - Dataset: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset
1649 |
1650 | **Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions**
1651 |
1652 | - Homepage: https://ithaca365.mae.cornell.edu/
1653 |
1654 | - Paper: https://arxiv.org/abs/2208.01166
1655 |
1656 |
1657 |
1658 | # 新任务(New Task)
1659 |
1660 | **Language-based Video Editing via Multi-Modal Multi-Level Transformer**
1661 |
1662 | - Paper: https://arxiv.org/abs/2104.01122
1663 | - Code: None
1664 |
1665 | **It's About Time: Analog Clock Reading in the Wild**
1666 |
1667 | - Homepage: https://charigyang.github.io/abouttime/
1668 | - Paper: https://arxiv.org/abs/2111.09162
1669 | - Code: https://github.com/charigyang/itsabouttime
1670 | - Demo: https://youtu.be/cbiMACA6dRc
1671 |
1672 | **Splicing ViT Features for Semantic Appearance Transfer**
1673 |
1674 | - Homepage: https://splice-vit.github.io/
1675 | - Paper: https://arxiv.org/abs/2201.00424
1676 | - Code: https://github.com/omerbt/Splice
1677 |
1678 | **Visual Abductive Reasoning**
1679 |
1680 | - Paper: https://arxiv.org/abs/2203.14040
1681 | - Code: https://github.com/leonnnop/VAR
1682 |
1683 |
1684 |
1685 | # 其他(Others)
1686 |
1687 | **Kubric: A scalable dataset generator**
1688 |
1689 | - Paper: https://arxiv.org/abs/2203.03570
1690 | - Code: https://github.com/google-research/kubric
1691 | - Chinese explainer: https://mp.weixin.qq.com/s/mJ8HzY6C0GifxsErJIS3Mg
1692 |
1693 | **X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning**
1694 |
1695 | - Paper: https://arxiv.org/abs/2203.00843
1696 | - Code: https://github.com/CurryYuan/X-Trans2Cap
1697 |
1698 | **Balanced MSE for Imbalanced Visual Regression**
1699 |
1700 | - Paper(Oral): https://arxiv.org/abs/2203.16427
1701 | - Code: https://github.com/jiawei-ren/BalancedMSE
1702 |
1703 | **SNUG: Self-Supervised Neural Dynamic Garments**
1704 |
1705 | - Homepage: http://mslab.es/projects/SNUG/
1706 | - Paper(Oral): https://arxiv.org/abs/2204.02219
1707 | - Code: https://github.com/isantesteban/snug
1708 |
1709 | **Shape from Polarization for Complex Scenes in the Wild**
1710 |
1711 | - Homepage: https://chenyanglei.github.io/sfpwild/index.html
1712 | - Paper: https://arxiv.org/abs/2112.11377
1713 | - Code: https://github.com/ChenyangLEI/sfp-wild
1714 |
1715 | **LASER: LAtent SpacE Rendering for 2D Visual Localization**
1716 |
1717 | - Paper(Oral): https://arxiv.org/abs/2204.00157
1718 | - Code: None
1719 |
1720 | **Single-Photon Structured Light**
1721 |
1722 | - Paper(Oral): https://arxiv.org/abs/2204.05300
1723 | - Code: None
1724 |
1725 | **3DeformRS: Certifying Spatial Deformations on Point Clouds**
1726 |
1727 | - Paper: https://arxiv.org/abs/2204.05687
1728 | - Code: None
1729 |
1730 | **Aesthetic Text Logo Synthesis via Content-aware Layout Inferring**
1731 |
1732 | - Paper: https://arxiv.org/abs/2204.02701
1733 | - Dataset: https://github.com/yizhiwang96/TextLogoLayout
1734 | - Code: https://github.com/yizhiwang96/TextLogoLayout
1735 |
1736 | **Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes**
1737 |
1738 | - Paper: https://arxiv.org/abs/2203.13412
1739 | - Code: https://github.com/zjsong/SSPL
1740 |
1741 | **Robust and Accurate Superquadric Recovery: a Probabilistic Approach**
1742 |
1743 | - Paper(Oral): https://arxiv.org/abs/2111.14517
1744 | - Code: https://github.com/bmlklwx/EMS-superquadric_fitting
1745 |
1746 | **Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence**
1747 |
1748 | - Paper: https://arxiv.org/abs/2203.00911
1749 | - Code: None
1750 |
1751 | **Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer**
1752 |
1753 | - Paper(Oral): https://arxiv.org/abs/2204.08680
1754 | - Code: https://github.com/zengwang430521/TCFormer
1755 |
1756 | **DeepDPM: Deep Clustering With an Unknown Number of Clusters**
1757 |
1758 | - Paper: https://arxiv.org/abs/2203.14309
1759 | - Code: https://github.com/BGU-CS-VIL/DeepDPM
1760 |
1761 | **ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic**
1762 |
1763 | - Paper: https://arxiv.org/abs/2111.14447
1764 | - Code: https://github.com/YoadTew/zero-shot-image-to-text
1765 |
1766 | **Proto2Proto: Can you recognize the car, the way I do?**
1767 |
1768 | - Paper: https://arxiv.org/abs/2204.11830
1769 | - Code: https://github.com/archmaester/proto2proto
1770 |
1771 | **Putting People in their Place: Monocular Regression of 3D People in Depth**
1772 |
1773 | - Homepage: https://arthur151.github.io/BEV/BEV.html
1774 | - Paper: https://arxiv.org/abs/2112.08274
1775 | - Code: https://github.com/Arthur151/ROMP
1776 | - Dataset: https://github.com/Arthur151/Relative_Human
1777 |
1778 | **Light Field Neural Rendering**
1779 |
1780 | - Homepage: https://light-field-neural-rendering.github.io/
1781 | - Paper(Oral): https://arxiv.org/abs/2112.09687
1782 | - Code: https://github.com/google-research/google-research/tree/master/light_field_neural_rendering
1783 |
1784 | **Neural Texture Extraction and Distribution for Controllable Person Image Synthesis**
1785 |
1786 | - Paper: https://arxiv.org/abs/2204.06160
1787 | - Code: https://github.com/RenYurui/Neural-Texture-Extraction-Distribution
1788 |
1789 | **Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning**
1790 |
1791 | - Paper: https://arxiv.org/abs/2203.14333
1792 | - Code: https://github.com/0liliulei/LIIR
--------------------------------------------------------------------------------
/CVPR2023-Papers-with-Code.md:
--------------------------------------------------------------------------------
1 | # CVPR 2023 论文和开源项目合集(Papers with Code)
2 |
3 | A collection of [CVPR 2023](https://openaccess.thecvf.com/CVPR2023?day=all) papers and open-source projects (papers with code)!
4 |
5 | **25.78% = 2360 / 9155**
6 |
7 | CVPR 2023 decisions are now available on OpenReview! This year, we received a record number of **9155** submissions (a 12% increase over CVPR 2022), and accepted **2360** papers, for a 25.78% acceptance rate.
8 |
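A quick sanity check of the headline numbers above (a minimal Python sketch; both counts come straight from the announcement):

```python
# Acceptance-rate check for CVPR 2023, using the announced counts.
accepted = 2360
submitted = 9155
print(f"{accepted / submitted:.2%}")  # -> 25.78%
```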
9 |
10 | > Note 1: Everyone is welcome to open an issue and share CVPR 2023 papers and open-source projects!
11 | >
12 | > Note 2: For papers from previous top CV conferences, plus other high-quality CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision
13 | >
14 | > - [CVPR 2019](CVPR2019-Papers-with-Code.md)
15 | > - [CVPR 2020](CVPR2020-Papers-with-Code.md)
16 | > - [CVPR 2021](CVPR2021-Papers-with-Code.md)
17 | > - [CVPR 2022](CVPR2022-Papers-with-Code.md)
18 |
19 | If you want to keep up with the latest and best CV papers, open-source projects, and learning resources, scan the QR code below to join the CVer academic exchange group (CVer学术交流群)! Let's learn from each other and improve together~
20 |
21 | 
22 |
23 | # 【CVPR 2023 Papers with Code Directory】
24 |
25 | - [Backbone](#Backbone)
26 | - [CLIP](#CLIP)
27 | - [MAE](#MAE)
28 | - [GAN](#GAN)
29 | - [GNN](#GNN)
30 | - [MLP](#MLP)
31 | - [NAS](#NAS)
32 | - [OCR](#OCR)
33 | - [NeRF](#NeRF)
34 | - [DETR](#DETR)
35 | - [Prompt](#Prompt)
36 | - [Diffusion Models(扩散模型)](#Diffusion)
37 | - [Avatars](#Avatars)
38 | - [ReID(重识别)](#ReID)
39 | - [长尾分布(Long-Tail)](#Long-Tail)
40 | - [Vision Transformer](#Vision-Transformer)
41 | - [视觉和语言(Vision-Language)](#VL)
42 | - [自监督学习(Self-supervised Learning)](#SSL)
43 | - [数据增强(Data Augmentation)](#DA)
44 | - [目标检测(Object Detection)](#Object-Detection)
45 | - [目标跟踪(Visual Tracking)](#VT)
46 | - [语义分割(Semantic Segmentation)](#Semantic-Segmentation)
47 | - [实例分割(Instance Segmentation)](#Instance-Segmentation)
48 | - [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation)
49 | - [医学图像分割(Medical Image Segmentation)](#MIS)
50 | - [视频目标分割(Video Object Segmentation)](#VOS)
51 | - [视频实例分割(Video Instance Segmentation)](#VIS)
52 | - [参考图像分割(Referring Image Segmentation)](#RIS)
53 | - [图像抠图(Image Matting)](#Matting)
54 | - [图像编辑(Image Editing)](#Image-Editing)
55 | - [Low-level Vision](#LLV)
56 | - [超分辨率(Super-Resolution)](#SR)
57 | - [去噪(Denoising)](#Denoising)
58 | - [去模糊(Deblur)](#Deblur)
59 | - [3D点云(3D Point Cloud)](#3D-Point-Cloud)
60 | - [3D目标检测(3D Object Detection)](#3DOD)
61 | - [3D语义分割(3D Semantic Segmentation)](#3DSS)
62 | - [3D目标跟踪(3D Object Tracking)](#3D-Object-Tracking)
63 | - [3D语义场景补全(3D Semantic Scene Completion)](#3DSSC)
64 | - [3D配准(3D Registration)](#3D-Registration)
65 | - [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation)
66 | - [3D人体Mesh估计(3D Human Mesh Estimation)](#3D-Human-Pose-Estimation)
67 | - [医学图像(Medical Image)](#Medical-Image)
68 | - [图像生成(Image Generation)](#Image-Generation)
69 | - [视频生成(Video Generation)](#Video-Generation)
70 | - [视频理解(Video Understanding)](#Video-Understanding)
71 | - [行为检测(Action Detection)](#Action-Detection)
72 | - [文本检测(Text Detection)](#Text-Detection)
73 | - [知识蒸馏(Knowledge Distillation)](#KD)
74 | - [模型剪枝(Model Pruning)](#Pruning)
75 | - [图像压缩(Image Compression)](#IC)
76 | - [异常检测(Anomaly Detection)](#AD)
77 | - [三维重建(3D Reconstruction)](#3D-Reconstruction)
78 | - [深度估计(Depth Estimation)](#Depth-Estimation)
79 | - [轨迹预测(Trajectory Prediction)](#TP)
80 | - [车道线检测(Lane Detection)](#Lane-Detection)
81 | - [图像描述(Image Captioning)](#Image-Captioning)
82 | - [视觉问答(Visual Question Answering)](#VQA)
83 | - [手语识别(Sign Language Recognition)](#SLR)
84 | - [视频预测(Video Prediction)](#Video-Prediction)
85 | - [新视点合成(Novel View Synthesis)](#NVS)
86 | - [Zero-Shot Learning(零样本学习)](#ZSL)
87 | - [立体匹配(Stereo Matching)](#Stereo-Matching)
88 | - [特征匹配(Feature Matching)](#Feature-Matching)
89 | - [场景图生成(Scene Graph Generation)](#SGG)
90 | - [隐式神经表示(Implicit Neural Representations)](#INR)
91 | - [图像质量评价(Image Quality Assessment)](#IQA)
92 | - [数据集(Datasets)](#Datasets)
93 | - [新任务(New Tasks)](#New-Tasks)
94 | - [其他(Others)](#Others)
95 |
96 |
97 |
98 | # Backbone
99 |
100 | **Integrally Pre-Trained Transformer Pyramid Networks**
101 |
102 | - Paper: https://arxiv.org/abs/2211.12735
103 | - Code: https://github.com/sunsmarterjie/iTPN
104 |
105 | **Stitchable Neural Networks**
106 |
107 | - Homepage: https://snnet.github.io/
108 | - Paper: https://arxiv.org/abs/2302.06586
109 | - Code: https://github.com/ziplab/SN-Net
110 |
111 | **Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks**
112 |
113 | - Paper: https://arxiv.org/abs/2303.03667
114 | - Code: https://github.com/JierunChen/FasterNet
115 |
116 | **BiFormer: Vision Transformer with Bi-Level Routing Attention**
117 |
118 | - Paper: https://arxiv.org/abs/2303.08810
119 | - Code: https://github.com/rayleizhu/BiFormer
120 |
121 | **DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network**
122 |
123 | - Paper: https://arxiv.org/abs/2303.02165
124 | - Code: https://github.com/alibaba/lightweight-neural-architecture-search
125 |
126 | **Vision Transformer with Super Token Sampling**
127 |
128 | - Paper: https://arxiv.org/abs/2211.11167
129 | - Code: https://github.com/hhb072/SViT
130 |
131 | **Hard Patches Mining for Masked Image Modeling**
132 |
133 | - Paper: None
134 | - Code: None
135 |
136 | **SMPConv: Self-moving Point Representations for Continuous Convolution**
137 |
138 | - Paper: https://arxiv.org/abs/2304.02330
139 | - Code: https://github.com/sangnekim/SMPConv
140 |
141 | **Making Vision Transformers Efficient from A Token Sparsification View**
142 |
143 | - Paper: https://arxiv.org/abs/2303.08685
144 | - Code: https://github.com/changsn/STViT-R
145 |
146 |
147 |
148 | # CLIP
149 |
150 | **GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis**
151 |
152 | - Paper: https://arxiv.org/abs/2301.12959
153 | - Code: https://github.com/tobran/GALIP
154 |
155 | **DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation**
156 |
157 | - Paper: https://arxiv.org/abs/2303.06285
158 | - Code: https://github.com/Yueming6568/DeltaEdit
159 |
160 |
161 |
162 | # MAE
163 |
164 | **Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders**
165 |
166 | - Paper: https://arxiv.org/abs/2212.06785
167 | - Code: https://github.com/ZrrSkywalker/I2P-MAE
168 |
169 | **Generic-to-Specific Distillation of Masked Autoencoders**
170 |
171 | - Paper: https://arxiv.org/abs/2302.14771
172 | - Code: https://github.com/pengzhiliang/G2SD
173 |
174 |
175 |
176 | # GAN
177 |
178 | **DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation**
179 |
180 | - Paper: https://arxiv.org/abs/2303.06285
181 | - Code: https://github.com/Yueming6568/DeltaEdit
182 |
183 |
184 |
185 | # NeRF
186 |
187 | **NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior**
188 |
189 | - Home: https://nope-nerf.active.vision/
190 | - Paper: https://arxiv.org/abs/2212.07388
191 | - Code: None
192 |
193 | **Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures**
194 |
195 | - Paper: https://arxiv.org/abs/2211.07600
196 | - Code: https://github.com/eladrich/latent-nerf
197 |
198 | **NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis**
199 |
200 | - Paper: https://arxiv.org/abs/2301.08556
201 | - Code: None
202 |
203 | **Panoptic Lifting for 3D Scene Understanding with Neural Fields**
204 |
205 | - Homepage: https://nihalsid.github.io/panoptic-lifting/
206 | - Paper: https://arxiv.org/abs/2212.09802
207 | - Code: None
208 |
209 | **NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer**
210 |
211 | - Homepage: https://redrock303.github.io/nerflix/
212 | - Paper: https://arxiv.org/abs/2303.06919
213 | - Code: None
214 |
215 | **HNeRV: A Hybrid Neural Representation for Videos**
216 |
217 | - Homepage: https://haochen-rye.github.io/HNeRV
218 | - Paper: https://arxiv.org/abs/2304.02633
219 | - Code: https://github.com/haochen-rye/HNeRV
220 |
221 |
222 |
223 | # DETR
224 |
225 | **DETRs with Hybrid Matching**
226 |
227 | - Paper: https://arxiv.org/abs/2207.13080
228 | - Code: https://github.com/HDETR
229 |
230 |
231 |
232 | # Prompt
233 |
234 | **Diversity-Aware Meta Visual Prompting**
235 |
236 | - Paper: https://arxiv.org/abs/2303.08138
237 | - Code: https://github.com/shikiw/DAM-VP
238 |
239 |
240 |
241 | # NAS
242 |
243 | **PA&DA: Jointly Sampling PAth and DAta for Consistent NAS**
244 |
245 | - Paper: https://arxiv.org/abs/2302.14772
246 | - Code: https://github.com/ShunLu91/PA-DA
247 |
248 |
249 |
250 | # Avatars
251 |
252 | **Structured 3D Features for Reconstructing Relightable and Animatable Avatars**
253 |
254 | - Homepage: https://enriccorona.github.io/s3f/
255 | - Paper: https://arxiv.org/abs/2212.06820
256 | - Code: None
257 | - Demo: https://www.youtube.com/watch?v=mcZGcQ6L-2s
258 |
259 | **Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos**
260 |
261 | - Homepage: https://augmentedperception.github.io/monoavatar/
262 | - Paper: https://arxiv.org/abs/2304.01436
263 |
264 |
265 |
266 | # ReID(重识别)
267 |
268 | **Clothing-Change Feature Augmentation for Person Re-Identification**
269 |
270 | - Paper: None
271 | - Code: None
272 |
273 | **MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID**
274 |
275 | - Paper: https://arxiv.org/abs/2303.07065
276 | - Code: https://github.com/vimar-gu/MSINet
277 |
278 | **Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification**
279 |
280 | - Paper: https://arxiv.org/abs/2304.04205
281 | - Code: None
282 |
283 | **Large-scale Training Data Search for Object Re-identification**
284 |
285 | - Paper: https://arxiv.org/abs/2303.16186
286 | - Code: https://github.com/yorkeyao/SnP
287 |
288 |
289 |
290 | # Diffusion Models(扩散模型)
291 |
292 | **Video Probabilistic Diffusion Models in Projected Latent Space**
293 |
294 | - Homepage: https://sihyun.me/PVDM/
295 | - Paper: https://arxiv.org/abs/2302.07685
296 | - Code: https://github.com/sihyun-yu/PVDM
297 |
298 | **Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models**
299 |
300 | - Paper: https://arxiv.org/abs/2211.10655
301 | - Code: None
302 |
303 | **Imagic: Text-Based Real Image Editing with Diffusion Models**
304 |
305 | - Homepage: https://imagic-editing.github.io/
306 | - Paper: https://arxiv.org/abs/2210.09276
307 | - Code: None
308 |
309 | **Parallel Diffusion Models of Operator and Image for Blind Inverse Problems**
310 |
311 | - Paper: https://arxiv.org/abs/2211.10656
312 | - Code: None
313 |
314 | **DiffRF: Rendering-guided 3D Radiance Field Diffusion**
315 |
316 | - Homepage: https://sirwyver.github.io/DiffRF/
317 | - Paper: https://arxiv.org/abs/2212.01206
318 | - Code: None
319 |
320 | **MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation**
321 |
322 | - Paper: https://arxiv.org/abs/2212.09478
323 | - Code: https://github.com/researchmm/MM-Diffusion
324 |
325 | **HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising**
326 |
327 | - Homepage: https://aminshabani.github.io/housediffusion/
328 | - Paper: https://arxiv.org/abs/2211.13287
329 | - Code: https://github.com/aminshabani/house_diffusion
330 |
331 | **TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets**
332 |
333 | - Paper: https://arxiv.org/abs/2303.05762
334 | - Code: https://github.com/chenweixin107/TrojDiff
335 |
336 | **Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption**
337 |
338 | - Paper: https://arxiv.org/abs/2207.03442
339 | - Code: https://github.com/shiyegao/DDA
340 |
341 | **DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration**
342 |
343 | - Paper: https://arxiv.org/abs/2303.06885
344 | - Code: None
345 |
346 | **Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion**
347 |
348 | - Homepage: https://nv-tlabs.github.io/trace-pace/
349 | - Paper: https://arxiv.org/abs/2304.01893
350 | - Code: None
351 |
352 | **Generative Diffusion Prior for Unified Image Restoration and Enhancement**
353 |
354 | - Paper: https://arxiv.org/abs/2304.01247
355 | - Code: None
356 |
357 | **Conditional Image-to-Video Generation with Latent Flow Diffusion Models**
358 |
359 | - Paper: https://arxiv.org/abs/2303.13744
360 | - Code: https://github.com/nihaomiao/CVPR23_LFDM
361 |
362 |
363 |
364 | # 长尾分布(Long-Tail)
365 |
366 | **Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation**
367 |
368 | - Paper: https://arxiv.org/abs/2304.01279
369 | - Code: None
370 |
371 |
372 |
373 | # Vision Transformer
374 |
375 | **Integrally Pre-Trained Transformer Pyramid Networks**
376 |
377 | - Paper: https://arxiv.org/abs/2211.12735
378 | - Code: https://github.com/sunsmarterjie/iTPN
379 |
380 | **Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors**
381 |
382 | - Homepage: https://niessnerlab.org/projects/hou2023mask3d.html
383 | - Paper: https://arxiv.org/abs/2302.14746
384 | - Code: None
385 |
386 | **Learning Trajectory-Aware Transformer for Video Super-Resolution**
387 |
388 | - Paper: https://arxiv.org/abs/2204.04216
389 | - Code: https://github.com/researchmm/TTVSR
390 |
391 | **Vision Transformers are Parameter-Efficient Audio-Visual Learners**
392 |
393 | - Homepage: https://yanbo.ml/project_page/LAVISH/
394 | - Code: https://github.com/GenjiB/LAVISH
395 |
396 | **Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes**
397 |
398 | - Paper: https://arxiv.org/abs/2303.04249
399 | - Code: None
400 |
401 | **DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets**
402 |
403 | - Paper: https://arxiv.org/abs/2301.06051
404 | - Code: https://github.com/Haiyang-W/DSVT
405 |
406 | **DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting**
407 |
408 | - Paper: https://arxiv.org/abs/2211.10772
409 | - Code: https://github.com/ViTAE-Transformer/DeepSolo
410 |
411 | **BiFormer: Vision Transformer with Bi-Level Routing Attention**
412 |
413 | - Paper: https://arxiv.org/abs/2303.08810
414 | - Code: https://github.com/rayleizhu/BiFormer
415 |
416 | **Vision Transformer with Super Token Sampling**
417 |
418 | - Paper: https://arxiv.org/abs/2211.11167
419 | - Code: https://github.com/hhb072/SViT
420 |
421 | **BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision**
422 |
423 | - Paper: https://arxiv.org/abs/2211.10439
424 | - Code: None
425 |
426 | **BAEFormer: Bi-directional and Early Interaction Transformers for Bird’s Eye View Semantic Segmentation**
427 |
428 | - Paper: None
429 | - Code: None
430 |
431 | **Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention**
432 |
433 | - Paper: https://arxiv.org/abs/2304.03282
434 | - Code: None
435 |
436 | **Making Vision Transformers Efficient from A Token Sparsification View**
437 |
438 | - Paper: https://arxiv.org/abs/2303.08685
439 | - Code: https://github.com/changsn/STViT-R
440 |
441 |
442 |
443 | # 视觉和语言(Vision-Language)
444 |
445 | **GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods**
446 |
447 | - Paper: https://arxiv.org/abs/2301.01893
448 | - Code: None
449 |
450 | **Teaching Structured Vision&Language Concepts to Vision&Language Models**
451 |
452 | - Paper: https://arxiv.org/abs/2211.11733
453 | - Code: None
454 |
455 | **Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks**
456 |
457 | - Paper: https://arxiv.org/abs/2211.09808
458 | - Code: https://github.com/fundamentalvision/Uni-Perceiver
459 |
460 | **Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training**
461 |
462 | - Paper: https://arxiv.org/abs/2303.00040
463 | - Code: None
464 |
465 | **CapDet: Unifying Dense Captioning and Open-World Detection Pretraining**
466 |
467 | - Paper: https://arxiv.org/abs/2303.02489
468 | - Code: None
469 |
470 | **FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks**
471 |
472 | - Paper: https://arxiv.org/abs/2303.02483
473 | - Code: None
474 |
475 | **Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding**
476 |
477 | - Homepage: https://rllab-snu.github.io/projects/Meta-Explore/doc.html
478 | - Paper: https://arxiv.org/abs/2303.04077
479 | - Code: None
480 |
481 | **All in One: Exploring Unified Video-Language Pre-training**
482 |
483 | - Paper: https://arxiv.org/abs/2203.07303
484 | - Code: https://github.com/showlab/all-in-one
485 |
486 | **Position-guided Text Prompt for Vision Language Pre-training**
487 |
488 | - Paper: https://arxiv.org/abs/2212.09737
489 | - Code: https://github.com/sail-sg/ptp
490 |
491 | **EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding**
492 |
493 | - Paper: https://arxiv.org/abs/2209.14941
494 | - Code: https://github.com/yanmin-wu/EDA
495 |
496 | **CapDet: Unifying Dense Captioning and Open-World Detection Pretraining**
497 |
498 | - Paper: https://arxiv.org/abs/2303.02489
499 | - Code: None
500 |
501 | **FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks**
502 |
503 | - Paper: https://arxiv.org/abs/2303.02483
504 | - Code: https://github.com/BrandonHanx/FAME-ViL
505 |
506 | **Align and Attend: Multimodal Summarization with Dual Contrastive Losses**
507 |
508 | - Homepage: https://boheumd.github.io/A2Summ/
509 | - Paper: https://arxiv.org/abs/2303.07284
510 | - Code: https://github.com/boheumd/A2Summ
511 |
512 | **Multi-Modal Representation Learning with Text-Driven Soft Masks**
513 |
514 | - Paper: https://arxiv.org/abs/2304.00719
515 | - Code: None
516 |
517 | **Learning to Name Classes for Vision and Language Models**
518 |
519 | - Paper: https://arxiv.org/abs/2304.01830
520 | - Code: None
521 |
522 |
523 |
524 | # 目标检测(Object Detection)
525 |
526 | **YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors**
527 |
528 | - Paper: https://arxiv.org/abs/2207.02696
529 | - Code: https://github.com/WongKinYiu/yolov7
530 |
531 | **DETRs with Hybrid Matching**
532 |
533 | - Paper: https://arxiv.org/abs/2207.13080
534 | - Code: https://github.com/HDETR
535 |
536 | **Enhanced Training of Query-Based Object Detection via Selective Query Recollection**
537 |
538 | - Paper: https://arxiv.org/abs/2212.07593
539 | - Code: https://github.com/Fangyi-Chen/SQR
540 |
541 | **Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection**
542 |
543 | - Paper: https://arxiv.org/abs/2303.05892
544 | - Code: https://github.com/LutingWang/OADP
545 |
546 |
547 |
548 | # 目标跟踪(Object Tracking)
549 |
550 | **Simple Cues Lead to a Strong Multi-Object Tracker**
551 |
552 | - Paper: https://arxiv.org/abs/2206.04656
553 | - Code: None
554 |
555 | **Joint Visual Grounding and Tracking with Natural Language Specification**
556 |
557 | - Paper: https://arxiv.org/abs/2303.12027
558 | - Code: https://github.com/lizhou-cs/JointNLT
559 |
560 |
561 |
562 | # 语义分割(Semantic Segmentation)
563 |
564 | **Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos**
565 |
566 | - Paper: https://arxiv.org/abs/2303.07224
567 | - Code: https://github.com/THU-LYJ-Lab/AR-Seg
568 |
569 | **FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding**
570 |
571 | - Paper: https://arxiv.org/abs/2304.02135
572 | - Code: https://github.com/uark-cviu/FREDOM
573 |
574 |
575 |
576 | # 医学图像分割(Medical Image Segmentation)
577 |
578 | **Label-Free Liver Tumor Segmentation**
579 |
580 | - Paper: https://arxiv.org/abs/2303.14869
581 | - Code: https://github.com/MrGiovanni/SyntheticTumors
582 |
583 | **Directional Connectivity-based Segmentation of Medical Images**
584 |
585 | - Paper: https://arxiv.org/abs/2304.00145
586 | - Code: https://github.com/Zyun-Y/DconnNet
587 |
588 | **Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation**
589 |
590 | - Paper: https://arxiv.org/abs/2305.00673
591 | - Code: https://github.com/DeepMed-Lab-ECNU/BCP
592 |
593 | **Devil is in the Queries: Advancing Mask Transformers for Real-world Medical Image Segmentation and Out-of-Distribution Localization**
594 |
595 | - Paper: https://arxiv.org/abs/2304.00212
596 | - Code: None
597 |
598 | **Fair Federated Medical Image Segmentation via Client Contribution Estimation**
599 |
600 | - Paper: https://arxiv.org/abs/2303.16520
601 | - Code: https://github.com/NVIDIA/NVFlare/tree/dev/research/fed-ce
602 |
603 | **Ambiguous Medical Image Segmentation using Diffusion Models**
604 |
605 | - Homepage: https://aimansnigdha.github.io/cimd/
606 | - Paper: https://arxiv.org/abs/2304.04745
607 | - Code: https://github.com/aimansnigdha/Ambiguous-Medical-Image-Segmentation-using-Diffusion-Models
608 |
609 | **Orthogonal Annotation Benefits Barely-supervised Medical Image Segmentation**
610 |
611 | - Paper: https://arxiv.org/abs/2303.13090
612 | - Code: https://github.com/HengCai-NJU/DeSCO
613 |
614 | **MagicNet: Semi-Supervised Multi-Organ Segmentation via Magic-Cube Partition and Recovery**
615 |
616 | - Paper: https://arxiv.org/abs/2301.01767
617 | - Code: https://github.com/DeepMed-Lab-ECNU/MagicNet
618 |
619 | **MCF: Mutual Correction Framework for Semi-Supervised Medical Image Segmentation**
620 |
621 | - Paper: https://openaccess.thecvf.com/content/CVPR2023/html/Wang_MCF_Mutual_Correction_Framework_for_Semi-Supervised_Medical_Image_Segmentation_CVPR_2023_paper.html
622 | - Code: https://github.com/WYC-321/MCF
623 |
624 | **Rethinking Few-Shot Medical Segmentation: A Vector Quantization View**
625 |
626 | - Paper: https://openaccess.thecvf.com/content/CVPR2023/html/Huang_Rethinking_Few-Shot_Medical_Segmentation_A_Vector_Quantization_View_CVPR_2023_paper.html
627 | - Code: None
628 |
629 | **Pseudo-label Guided Contrastive Learning for Semi-supervised Medical Image Segmentation**
630 |
631 | - Paper: https://openaccess.thecvf.com/content/CVPR2023/html/Basak_Pseudo-Label_Guided_Contrastive_Learning_for_Semi-Supervised_Medical_Image_Segmentation_CVPR_2023_paper.html
632 | - Code: https://github.com/hritam-98/PatchCL-MedSeg
633 |
634 | **SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation**
635 |
636 | - Paper: https://arxiv.org/abs/2305.11012
637 | - Code: None
638 |
639 | **DoNet: Deep De-overlapping Network for Cytology Instance Segmentation**
640 |
641 | - Paper: https://arxiv.org/abs/2303.14373
642 | - Code: https://github.com/DeepDoNet/DoNet
643 |
644 |
645 |
646 | # 视频目标分割(Video Object Segmentation)
647 |
648 | **Two-shot Video Object Segmentation**
649 |
650 | - Paper: https://arxiv.org/abs/2303.12078
651 | - Code: https://github.com/yk-pku/Two-shot-Video-Object-Segmentation
652 |
653 | **Under Video Object Segmentation Section**
654 |
655 | - Paper: https://arxiv.org/abs/2303.07815
656 | - Code: None
657 |
658 |
659 |
660 | # 视频实例分割(Video Instance Segmentation)
661 |
662 | **Mask-Free Video Instance Segmentation**
663 |
664 | - Paper: https://arxiv.org/abs/2303.15904
665 | - Code: https://github.com/SysCV/MaskFreeVis
666 |
667 |
668 |
669 | # 参考图像分割(Referring Image Segmentation)
670 |
671 | **PolyFormer: Referring Image Segmentation as Sequential Polygon Generation**
672 |
673 | - Paper: https://arxiv.org/abs/2302.07387
674 |
675 | - Code: None
676 |
677 |
678 |
679 | # 3D点云(3D Point Cloud)
680 |
681 | **Physical-World Optical Adversarial Attacks on 3D Face Recognition**
682 |
683 | - Paper: https://arxiv.org/abs/2205.13412
684 | - Code: https://github.com/PolyLiYJ/SLAttack.git
685 |
686 | **IterativePFN: True Iterative Point Cloud Filtering**
687 |
688 | - Paper: https://arxiv.org/abs/2304.01529
689 | - Code: https://github.com/ddsediri/IterativePFN
690 |
691 | **Attention-based Point Cloud Edge Sampling**
692 |
693 | - Homepage: https://junweizheng93.github.io/publications/APES/APES.html
694 | - Paper: https://arxiv.org/abs/2302.14673
695 | - Code: https://github.com/JunweiZheng93/APES
696 |
697 |
698 |
699 | # 3D目标检测(3D Object Detection)
700 |
701 | **DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets**
702 |
703 | - Paper: https://arxiv.org/abs/2301.06051
704 | - Code: https://github.com/Haiyang-W/DSVT
705 |
706 | **FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection**
707 |
708 | - Paper: https://arxiv.org/abs/2301.04467
709 | - Code: None
710 |
711 | **3D Video Object Detection with Learnable Object-Centric Global Optimization**
712 |
713 | - Paper: None
714 | - Code: None
715 |
716 | **Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection**
717 |
718 | - Paper: https://arxiv.org/abs/2304.01464
719 | - Code: https://github.com/azhuantou/HSSDA
720 |
721 |
722 |
723 | # 3D语义分割(3D Semantic Segmentation)
724 |
725 | **Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation**
726 |
727 | - Paper: https://arxiv.org/abs/2303.11203
728 | - Code: https://github.com/l1997i/lim3d
729 |
730 |
731 |
732 | # 3D语义场景补全(3D Semantic Scene Completion)
733 | **VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion**
734 | - Paper: https://arxiv.org/abs/2302.12251
735 | - Code: https://github.com/NVlabs/VoxFormer
736 |
737 |
738 |
739 | # 3D配准(3D Registration)
740 |
741 | **Robust Outlier Rejection for 3D Registration with Variational Bayes**
742 |
743 | - Paper: https://arxiv.org/abs/2304.01514
744 | - Code: https://github.com/Jiang-HB/VBReg
745 |
746 |
747 |
748 | # 3D人体姿态估计(3D Human Pose Estimation)
749 |
750 |
751 |
752 | # 3D人体Mesh估计(3D Human Mesh Estimation)
753 |
754 | **3D Human Mesh Estimation from Virtual Markers**
755 |
756 | - Paper: https://arxiv.org/abs/2303.11726
757 | - Code: https://github.com/ShirleyMaxx/VirtualMarker
758 |
759 |
760 |
761 | # Low-level Vision
762 |
763 | **Causal-IR: Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective**
764 |
765 | - Paper: https://arxiv.org/abs/2303.06859
766 | - Code: https://github.com/lixinustc/Casual-IR-DIL
767 |
768 | **Burstormer: Burst Image Restoration and Enhancement Transformer**
769 |
770 | - Paper: https://arxiv.org/abs/2304.01194
771 | - Code: http://github.com/akshaydudhane16/Burstormer
772 |
773 |
774 |
775 | # 超分辨率(Super-Resolution)
776 |
777 | **Super-Resolution Neural Operator**
778 |
779 | - Paper: https://arxiv.org/abs/2303.02584
780 | - Code: https://github.com/2y7c3/Super-Resolution-Neural-Operator
781 |
782 | ## 视频超分辨率(Video Super-Resolution)
783 |
784 | **Learning Trajectory-Aware Transformer for Video Super-Resolution**
785 |
786 | - Paper: https://arxiv.org/abs/2204.04216
787 |
788 | - Code: https://github.com/researchmm/TTVSR
789 |
790 | 
791 |
792 | # 去噪(Denoising)
793 |
794 | ## 图像去噪(Image Denoising)
795 |
796 | **Masked Image Training for Generalizable Deep Image Denoising**
797 |
798 | - Paper: https://arxiv.org/abs/2303.13132
799 | - Code: https://github.com/haoyuc/MaskedDenoising
800 |
801 |
802 |
803 | # 图像生成(Image Generation)
804 |
805 | **GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis**
806 |
807 | - Paper: https://arxiv.org/abs/2301.12959
808 | - Code: https://github.com/tobran/GALIP
809 |
810 | **MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis**
811 |
812 | - Paper: https://arxiv.org/abs/2211.09117
813 | - Code: https://github.com/LTH14/mage
814 |
815 | **Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation**
816 |
817 | - Paper: https://arxiv.org/abs/2304.01816
818 | - Code: None
819 |
820 | **Few-shot Semantic Image Synthesis with Class Affinity Transfer**
821 |
822 | - Paper: https://arxiv.org/abs/2304.02321
823 | - Code: None
824 |
825 | **TopNet: Transformer-based Object Placement Network for Image Compositing**
826 |
827 | - Paper: https://arxiv.org/abs/2304.03372
828 | - Code: None
829 |
830 |
831 |
832 | # 视频生成(Video Generation)
833 |
834 | **MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation**
835 |
836 | - Paper: https://arxiv.org/abs/2212.09478
837 | - Code: https://github.com/researchmm/MM-Diffusion
838 |
839 | **Conditional Image-to-Video Generation with Latent Flow Diffusion Models**
840 |
841 | - Paper: https://arxiv.org/abs/2303.13744
842 | - Code: https://github.com/nihaomiao/CVPR23_LFDM
843 |
844 |
845 |
846 | # 视频理解(Video Understanding)
847 |
848 | **Learning Transferable Spatiotemporal Representations from Natural Script Knowledge**
849 |
850 | - Paper: https://arxiv.org/abs/2209.15280
851 | - Code: https://github.com/TencentARC/TVTS
852 |
853 | **Frame Flexible Network**
854 |
855 | - Paper: https://arxiv.org/abs/2303.14817
856 | - Code: https://github.com/BeSpontaneous/FFN
857 |
858 | **Masked Motion Encoding for Self-Supervised Video Representation Learning**
859 |
860 | - Paper: https://arxiv.org/abs/2210.06096
861 | - Code: https://github.com/XinyuSun/MME
862 |
863 | **MARLIN: Masked Autoencoder for facial video Representation LearnING**
864 |
865 | - Paper: https://arxiv.org/abs/2211.06627
866 | - Code: https://github.com/ControlNet/MARLIN
867 |
868 |
869 |
870 | # 行为检测(Action Detection)
871 |
872 | **TriDet: Temporal Action Detection with Relative Boundary Modeling**
873 |
874 | - Paper: https://arxiv.org/abs/2303.07347
875 | - Code: https://github.com/dingfengshi/TriDet
876 |
877 |
878 |
879 | # 文本检测(Text Detection)
880 |
881 | **DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting**
882 |
883 | - Paper: https://arxiv.org/abs/2211.10772
884 | - Code: https://github.com/ViTAE-Transformer/DeepSolo
885 |
886 |
887 |
888 | # 知识蒸馏(Knowledge Distillation)
889 |
890 | **Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation**
891 |
892 | - Paper: https://arxiv.org/abs/2302.14290
893 | - Code: None
894 |
895 | **Generic-to-Specific Distillation of Masked Autoencoders**
896 |
897 | - Paper: https://arxiv.org/abs/2302.14771
898 | - Code: https://github.com/pengzhiliang/G2SD
899 |
900 |
901 |
902 | # 模型剪枝(Model Pruning)
903 |
904 | **DepGraph: Towards Any Structural Pruning**
905 |
906 | - Paper: https://arxiv.org/abs/2301.12900
907 | - Code: https://github.com/VainF/Torch-Pruning
908 |
909 |
910 |
911 | # 图像压缩(Image Compression)
912 |
913 | **Context-Based Trit-Plane Coding for Progressive Image Compression**
914 |
915 | - Paper: https://arxiv.org/abs/2303.05715
916 | - Code: https://github.com/seungminjeon-github/CTC
917 |
918 |
919 |
920 | # 异常检测(Anomaly Detection)
921 |
922 | **Deep Feature In-painting for Unsupervised Anomaly Detection in X-ray Images**
923 |
924 | - Paper: https://arxiv.org/abs/2111.13495
925 | - Code: https://github.com/tiangexiang/SQUID
926 |
927 |
928 |
929 | # 三维重建(3D Reconstruction)
930 |
931 | **OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields**
932 |
933 | - Paper: https://arxiv.org/abs/2211.12886
934 | - Code: None
935 |
936 | **SparsePose: Sparse-View Camera Pose Regression and Refinement**
937 |
938 | - Paper: https://arxiv.org/abs/2211.16991
939 | - Code: None
940 |
941 | **NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction**
942 |
943 | - Paper: https://arxiv.org/abs/2303.02375
944 | - Code: None
945 |
946 | **Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition**
947 |
948 | - Homepage: https://moygcc.github.io/vid2avatar/
949 | - Paper: https://arxiv.org/abs/2302.11566
950 | - Code: https://github.com/MoyGcc/vid2avatar
951 | - Demo: https://youtu.be/EGi47YeIeGQ
952 |
953 | **To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision**
954 |
955 | - Paper: https://arxiv.org/abs/2106.09614
956 | - Code: https://github.com/unibas-gravis/Occlusion-Robust-MoFA
957 |
958 | **Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction**
959 |
960 | - Paper: https://arxiv.org/abs/2303.05937
961 | - Code: None
962 |
963 | **3D Cinemagraphy from a Single Image**
964 |
965 | - Homepage: https://xingyi-li.github.io/3d-cinemagraphy/
966 | - Paper: https://arxiv.org/abs/2303.05724
967 | - Code: https://github.com/xingyi-li/3d-cinemagraphy
968 |
969 | **Revisiting Rotation Averaging: Uncertainties and Robust Losses**
970 |
971 | - Paper: https://arxiv.org/abs/2303.05195
972 | - Code: https://github.com/zhangganlin/GlobalSfMpy
973 |
974 | **FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction**
975 |
976 | - Paper: https://arxiv.org/abs/2211.13874
977 | - Code: https://github.com/csbhr/FFHQ-UV
978 |
979 | **A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images**
980 |
981 | - Homepage: https://younglbw.github.io/HRN-homepage/
982 |
983 | - Paper: https://arxiv.org/abs/2302.14434
984 | - Code: https://github.com/youngLBW/HRN
985 |
986 |
987 |
988 | # 深度估计(Depth Estimation)
989 |
990 | **Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation**
991 |
992 | - Paper: https://arxiv.org/abs/2211.13202
993 | - Code: https://github.com/noahzn/Lite-Mono
994 |
995 |
996 |
997 | # 轨迹预测(Trajectory Prediction)
998 |
999 | **IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction**
1000 |
1001 | - Paper: https://arxiv.org/abs/2303.00575
1002 | - Code: None
1003 |
1004 | **EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning**
1005 |
1006 | - Paper: https://arxiv.org/abs/2303.10876
1007 | - Code: https://github.com/MediaBrain-SJTU/EqMotion
1008 |
1009 |
1010 |
1011 | # 车道线检测(Lane Detection)
1012 |
1013 | **Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection**
1014 |
1015 | - Paper: https://arxiv.org/abs/2301.02371
1016 | - Code: https://github.com/tusen-ai/Anchor3DLane
1017 |
1018 | **BEV-LaneDet: An Efficient 3D Lane Detection Based on Virtual Camera via Key-Points**
1019 |
1020 | - Paper: https://arxiv.org/abs/2210.06006v3
1021 | - Code: https://github.com/gigo-team/bev_lane_det
1022 |
1023 |
1024 |
1025 | # 图像描述(Image Captioning)
1026 |
1027 | **ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing**
1028 |
1029 | - Paper: https://arxiv.org/abs/2303.02437
1030 | - Code: None
1031 |
1032 | **Cross-Domain Image Captioning with Discriminative Finetuning**
1033 |
1034 | - Paper: https://arxiv.org/abs/2304.01662
1035 | - Code: None
1036 |
1037 | **Model-Agnostic Gender Debiased Image Captioning**
1038 |
1039 | - Paper: https://arxiv.org/abs/2304.03693
1040 | - Code: None
1041 |
1042 |
1043 |
1044 | # 视觉问答(Visual Question Answering)
1045 |
1046 | **MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering**
1047 |
1048 | - Paper: https://arxiv.org/abs/2303.01239
1049 | - Code: https://github.com/jingjing12110/MixPHM
1050 |
1051 |
1052 |
1053 | # 手语识别(Sign Language Recognition)
1054 |
1055 | **Continuous Sign Language Recognition with Correlation Network**
1056 |
1057 | - Paper: https://arxiv.org/abs/2303.03202
1058 |
1059 | - Code: https://github.com/hulianyuyy/CorrNet
1060 |
1061 |
1062 |
1063 | # 视频预测(Video Prediction)
1064 |
1065 | **MOSO: Decomposing MOtion, Scene and Object for Video Prediction**
1066 |
1067 | - Paper: https://arxiv.org/abs/2303.03684
1068 | - Code: https://github.com/anonymous202203/MOSO
1069 |
1070 |
1071 |
1072 | # 新视点合成(Novel View Synthesis)
1073 |
1074 | **3D Video Loops from Asynchronous Input**
1075 |
1076 | - Homepage: https://limacv.github.io/VideoLoop3D_web/
1077 | - Paper: https://arxiv.org/abs/2303.05312
1078 | - Code: https://github.com/limacv/VideoLoop3D
1079 |
1080 |
1081 |
1082 | # Zero-Shot Learning(零样本学习)
1083 |
1084 | **Bi-directional Distribution Alignment for Transductive Zero-Shot Learning**
1085 |
1086 | - Paper: https://arxiv.org/abs/2303.08698
1087 | - Code: https://github.com/Zhicaiwww/Bi-VAEGAN
1088 |
1089 | **Semantic Prompt for Few-Shot Learning**
1090 |
1091 | - Paper: None
1092 | - Code: None
1093 |
1094 |
1095 |
1096 | # 立体匹配(Stereo Matching)
1097 |
1098 | **Iterative Geometry Encoding Volume for Stereo Matching**
1099 |
1100 | - Paper: https://arxiv.org/abs/2303.06615
1101 | - Code: https://github.com/gangweiX/IGEV
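
Both stereo entries in this section operate on a cost volume built by comparing left-image features against disparity-shifted right-image features. A minimal PyTorch sketch of that standard construction (generic, not IGEV's geometry encoding volume):

```python
import torch

def correlation_cost_volume(feat_l, feat_r, max_disp):
    """feat_l, feat_r: (B, C, H, W) feature maps; returns (B, max_disp, H, W)."""
    B, C, H, W = feat_l.shape
    cost = feat_l.new_zeros(B, max_disp, H, W)
    for d in range(max_disp):
        if d == 0:
            cost[:, d] = (feat_l * feat_r).mean(dim=1)
        else:
            # Compare each left pixel with the right pixel d columns to its left.
            cost[:, d, :, d:] = (feat_l[..., d:] * feat_r[..., :-d]).mean(dim=1)
    return cost
```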
1102 |
1103 | **Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation**
1104 |
1105 | - Paper: https://arxiv.org/abs/2304.00152
1106 | - Code: None
1107 |
1108 |
1109 |
1110 | # 特征匹配(Feature Matching)
1111 |
1112 | **Adaptive Spot-Guided Transformer for Consistent Local Feature Matching**
1113 |
1114 | - Homepage: https://astr2023.github.io/
1115 | - Paper: https://arxiv.org/abs/2303.16624
1116 | - Code: https://github.com/ASTR2023/ASTR
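
As background for the entry above: the classical baseline that learned feature matchers improve on is mutual nearest-neighbor matching of local descriptors. A short sketch:

```python
import torch

def mutual_nn_match(desc_a, desc_b):
    """desc_a: (Na, D), desc_b: (Nb, D), both L2-normalized descriptors."""
    sim = desc_a @ desc_b.t()      # cosine similarity matrix, (Na, Nb)
    nn_ab = sim.argmax(dim=1)      # best B index for each A descriptor
    nn_ba = sim.argmax(dim=0)      # best A index for each B descriptor
    ids = torch.arange(desc_a.size(0))
    keep = nn_ba[nn_ab] == ids     # keep only cross-consistent pairs
    return ids[keep], nn_ab[keep]
```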
1117 |
1118 |
1119 |
1120 | # 场景图生成(Scene Graph Generation)
1121 |
1122 | **Prototype-based Embedding Network for Scene Graph Generation**
1123 |
1124 | - Paper: https://arxiv.org/abs/2303.07096
1125 | - Code: None
1126 |
1127 |
1128 |
1129 | # 隐式神经表示(Implicit Neural Representations)
1130 |
1131 | **Polynomial Implicit Neural Representations For Large Diverse Datasets**
1132 |
1133 | - Paper: https://arxiv.org/abs/2303.11424
1134 | - Code: https://github.com/Rajhans0/Poly_INR
1135 |
1136 |
1137 |
1138 | # 图像质量评价(Image Quality Assessment)
1139 |
1140 | **Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild**
1141 |
1142 | - Paper: https://arxiv.org/abs/2304.00451
1143 | - Code: None
1144 |
1145 |
1146 |
1147 | # 数据集(Datasets)
1148 |
1149 | **Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes**
1150 |
1151 | - Paper: https://arxiv.org/abs/2303.02760
1152 | - Code: None
1153 |
1154 | **Align and Attend: Multimodal Summarization with Dual Contrastive Losses**
1155 |
1156 | - Homepage: https://boheumd.github.io/A2Summ/
1157 | - Paper: https://arxiv.org/abs/2303.07284
1158 | - Code: https://github.com/boheumd/A2Summ
1159 |
1160 | **GeoNet: Benchmarking Unsupervised Adaptation across Geographies**
1161 |
1162 | - Homepage: https://tarun005.github.io/GeoNet/
1163 | - Paper: https://arxiv.org/abs/2303.15443
1164 |
1165 | **CelebV-Text: A Large-Scale Facial Text-Video Dataset**
1166 |
1167 | - Homepage: https://celebv-text.github.io/
1168 | - Paper: https://arxiv.org/abs/2303.14717
1169 |
1170 |
1171 |
1172 | # 其他(Others)
1173 |
1174 | **Interactive Segmentation as Gaussian Process Classification**
1175 |
1176 | - Paper: https://arxiv.org/abs/2302.14578
1177 | - Code: None
1178 |
1179 | **Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger**
1180 |
1181 | - Paper: https://arxiv.org/abs/2302.14677
1182 | - Code: None
1183 |
1184 | **SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries**
1185 |
1186 | - Homepage: http://bit.ly/splinecam
1187 | - Paper: https://arxiv.org/abs/2302.12828
1188 | - Code: None
1189 |
1190 | **SCOTCH and SODA: A Transformer Video Shadow Detection Framework**
1191 |
1192 | - Paper: https://arxiv.org/abs/2211.06885
1193 | - Code: None
1194 |
1195 | **DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization**
1196 |
1197 | - Homepage: https://ai4ce.github.io/DeepMapping2/
1198 | - Paper: https://arxiv.org/abs/2212.06331
1199 | - Code: https://github.com/ai4ce/DeepMapping2
1200 |
1207 | **Token Turing Machines**
1208 |
1209 | - Paper: https://arxiv.org/abs/2211.09119
1210 | - Code: None
1211 |
1212 | **Single Image Backdoor Inversion via Robust Smoothed Classifiers**
1213 |
1214 | - Paper: https://arxiv.org/abs/2303.00215
1215 | - Code: https://github.com/locuslab/smoothinv
1216 |
1222 | **HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics**
1223 |
1224 | - Homepage: https://dolorousrtur.github.io/hood/
1225 | - Paper: https://arxiv.org/abs/2212.07242
1226 | - Code: https://github.com/dolorousrtur/hood
1227 | - Demo: https://www.youtube.com/watch?v=cBttMDPrUYY
1228 |
1229 | **A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others**
1230 |
1231 | - Paper: https://arxiv.org/abs/2212.04825
1232 | - Code: https://github.com/facebookresearch/Whac-A-Mole.git
1233 |
1234 | **RelightableHands: Efficient Neural Relighting of Articulated Hand Models**
1235 |
1236 | - Homepage: https://sh8.io/#/relightable_hands
1237 | - Paper: https://arxiv.org/abs/2302.04866
1238 | - Code: None
1239 | - Demo: https://sh8.io/static/media/teacher_video.923d87957fe0610730c2.mp4
1240 |
1241 | **Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation**
1242 |
1243 | - Paper: https://arxiv.org/abs/2303.00914
1244 | - Code: None
1245 |
1246 | **Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression**
1247 |
1248 | - Paper: https://arxiv.org/abs/2303.01052
1249 | - Code: None
1250 |
1251 | **UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy**
1252 |
1253 | - Paper: https://arxiv.org/abs/2303.00938
1254 | - Code: None
1255 |
1256 | **Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness**
1257 |
1258 | - Paper: https://arxiv.org/abs/2303.00971
1259 | - Code: https://github.com/zhijieshen-bjtu/DOPNet
1260 |
1261 | **Learning Neural Parametric Head Models**
1262 |
1263 | - Homepage: https://simongiebenhain.github.io/NPHM
1264 | - Paper: https://arxiv.org/abs/2212.02761
1265 | - Code: None
1266 |
1267 | **A Meta-Learning Approach to Predicting Performance and Data Requirements**
1268 |
1269 | - Paper: https://arxiv.org/abs/2303.01598
1270 | - Code: None
1271 |
1272 | **MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision**
1273 |
1274 | - Homepage: https://imagine.enpc.fr/~guedona/MACARONS/
1275 | - Paper: https://arxiv.org/abs/2303.03315
1276 | - Code: None
1277 |
1278 | **Masked Images Are Counterfactual Samples for Robust Fine-tuning**
1279 |
1280 | - Paper: https://arxiv.org/abs/2303.03052
1281 | - Code: None
1282 |
1283 | **HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling**
1284 |
1285 | - Paper: https://arxiv.org/abs/2303.02700
1286 | - Code: None
1287 |
1288 | **Decompose, Adjust, Compose: Effective Normalization by Playing with Frequency for Domain Generalization**
1289 |
1290 | - Paper: https://arxiv.org/abs/2303.02328
1291 | - Code: None
1292 |
1293 | **Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization**
1294 |
1295 | - Paper: https://arxiv.org/abs/2303.03108
1296 | - Code: None
1297 |
1298 | **Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples**
1299 |
1300 | - Paper: https://arxiv.org/abs/2301.01217
1301 | - Code: https://github.com/jiamingzhang94/Unlearnable-Clusters
1302 |
1303 | **Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes**
1304 |
1305 | - Paper: https://arxiv.org/abs/2303.04249
1306 | - Code: None
1307 |
1308 | **UniHCP: A Unified Model for Human-Centric Perceptions**
1309 |
1310 | - Paper: https://arxiv.org/abs/2303.02936
1311 | - Code: https://github.com/OpenGVLab/UniHCP
1312 |
1313 | **CUDA: Convolution-based Unlearnable Datasets**
1314 |
1315 | - Paper: https://arxiv.org/abs/2303.04278
1316 | - Code: https://github.com/vinusankars/Convolution-based-Unlearnability
1317 |
1323 | **AdaptiveMix: Robust Feature Representation via Shrinking Feature Space**
1324 |
1325 | - Paper: https://arxiv.org/abs/2303.01559
1326 | - Code: https://github.com/WentianZhang-ML/AdaptiveMix
1327 |
1328 | **Physical-World Optical Adversarial Attacks on 3D Face Recognition**
1329 |
1330 | - Paper: https://arxiv.org/abs/2205.13412
1331 | - Code: https://github.com/PolyLiYJ/SLAttack.git
1332 |
1333 | **DPE: Disentanglement of Pose and Expression for General Video Portrait Editing**
1334 |
1335 | - Paper: https://arxiv.org/abs/2301.06281
1336 | - Code: https://carlyx.github.io/DPE/
1337 |
1338 | **SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation**
1339 |
1340 | - Paper: https://arxiv.org/abs/2211.12194
1341 | - Code: https://github.com/Winfredy/SadTalker
1342 |
1343 | **Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models**
1344 |
1345 | - Paper: None
1346 | - Code: None
1347 |
1348 | **Sharpness-Aware Gradient Matching for Domain Generalization**
1349 |
1350 | - Paper: None
1351 | - Code: https://github.com/Wang-pengfei/SAGM
1352 |
1353 | **Mind the Label-shift for Augmentation-based Graph Out-of-distribution Generalization**
1354 |
1355 | - Paper: None
1356 | - Code: None
1357 |
1358 | **Blind Video Deflickering by Neural Filtering with a Flawed Atlas**
1359 |
1360 | - Homepage: https://chenyanglei.github.io/deflicker
1361 | - Paper: None
1362 | - Code: None
1363 |
1364 | **RiDDLE: Reversible and Diversified De-identification with Latent Encryptor**
1365 |
1366 | - Paper: None
1367 | - Code: https://github.com/ldz666666/RiDDLE
1368 |
1369 | **PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation**
1370 |
1371 | - Paper: https://arxiv.org/abs/2303.07337
1372 | - Code: None
1373 |
1374 | **Upcycling Models under Domain and Category Shift**
1375 |
1376 | - Paper: https://arxiv.org/abs/2303.07110
1377 | - Code: https://github.com/ispc-lab/GLC
1378 |
1379 | **Modality-Agnostic Debiasing for Single Domain Generalization**
1380 |
1381 | - Paper: https://arxiv.org/abs/2303.07123
1382 | - Code: None
1383 |
1384 | **Progressive Open Space Expansion for Open-Set Model Attribution**
1385 |
1386 | - Paper: https://arxiv.org/abs/2303.06877
1387 | - Code: None
1388 |
1389 | **Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies**
1390 |
1391 | - Paper: https://arxiv.org/abs/2303.06856
1392 | - Code: None
1393 |
1394 | **GFPose: Learning 3D Human Pose Prior with Gradient Fields**
1395 |
1396 | - Paper: https://arxiv.org/abs/2212.08641
1397 | - Code: https://github.com/Embracing/GFPose
1398 |
1399 | **PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment**
1400 |
1401 | - Paper: https://arxiv.org/abs/2303.11526
1402 | - Code: https://github.com/Zhang-VISLab
1403 |
1404 | **Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings**
1405 |
1406 | - Paper: https://arxiv.org/abs/2303.11502
1407 | - Code: None
1408 |
1409 | **Boundary Unlearning**
1410 |
1411 | - Paper: https://arxiv.org/abs/2303.11570
1412 | - Code: None
1413 |
1414 | **ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing**
1415 |
1416 | - Paper: https://arxiv.org/abs/2303.17096
1417 | - Code: https://github.com/alibaba/easyrobust
1418 |
1419 | **Zero-shot Model Diagnosis**
1420 |
1421 | - Paper: https://arxiv.org/abs/2303.15441
1422 | - Code: None
1423 |
1429 | **Quantum Multi-Model Fitting**
1430 |
1431 | - Paper: https://arxiv.org/abs/2303.15444
1432 | - Code: https://github.com/FarinaMatteo/qmmf
1433 |
1434 | **DivClust: Controlling Diversity in Deep Clustering**
1435 |
1436 | - Paper: https://arxiv.org/abs/2304.01042
1437 | - Code: None
1438 |
1439 | **Neural Volumetric Memory for Visual Locomotion Control**
1440 |
1441 | - Homepage: https://rchalyang.github.io/NVM
1442 | - Paper: https://arxiv.org/abs/2304.01201
1443 | - Code: https://rchalyang.github.io/NVM
1444 |
1445 | **MonoHuman: Animatable Human Neural Field from Monocular Video**
1446 |
1447 | - Homepage: https://yzmblog.github.io/projects/MonoHuman/
1448 | - Paper: https://arxiv.org/abs/2304.02001
1449 | - Code: https://github.com/Yzmblog/MonoHuman
1450 |
1451 | **Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion**
1452 |
1453 | - Homepage: https://nv-tlabs.github.io/trace-pace/
1454 | - Paper: https://arxiv.org/abs/2304.01893
1455 | - Code: None
1456 |
1457 | **Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification**
1458 |
1459 | - Paper: https://arxiv.org/abs/2304.01804
1460 | - Code: None
1461 |
1462 | **HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering**
1463 |
1464 | - Paper: https://arxiv.org/abs/2304.01686
1465 | - Code: None
1466 |
1467 | **On the Stability-Plasticity Dilemma of Class-Incremental Learning**
1468 |
1469 | - Paper: https://arxiv.org/abs/2304.01663
1470 | - Code: None
1471 |
1472 | **Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning**
1473 |
1474 | - Paper: https://arxiv.org/abs/2304.01482
1475 | - Code: None
1476 |
1477 | **VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution**
1478 |
1479 | - Paper: https://arxiv.org/abs/2304.01434
1480 | - Code: https://github.com/jaeill/CVPR23-VNE
1481 |
1482 | **Detecting and Grounding Multi-Modal Media Manipulation**
1483 |
1484 | - Homepage: https://rshaojimmy.github.io/Projects/MultiModal-DeepFake
1485 | - Paper: https://arxiv.org/abs/2304.02556
1486 | - Code: https://github.com/rshaojimmy/MultiModal-DeepFake
1487 |
1488 | **Meta-causal Learning for Single Domain Generalization**
1489 |
1490 | - Paper: https://arxiv.org/abs/2304.03709
1491 | - Code: None
1492 |
1493 | **Disentangling Writer and Character Styles for Handwriting Generation**
1494 |
1495 | - Paper: https://arxiv.org/abs/2303.14736
1496 | - Code: https://github.com/dailenson/SDT
1497 |
1498 | **DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects**
1499 |
1500 | - Homepage: https://www.chenbao.tech/dexart/
1501 |
1502 | - Code: https://github.com/Kami-code/dexart-release
1503 |
1504 | **Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision**
1505 |
1506 | - Homepage: https://toytiny.github.io/publication/23-cmflow-cvpr/index.html
1507 | - Paper: https://arxiv.org/abs/2303.00462
1508 | - Code: https://github.com/Toytiny/CMFlow
1509 |
1510 | **Marching-Primitives: Shape Abstraction from Signed Distance Function**
1511 |
1512 | - Paper: https://arxiv.org/abs/2303.13190
1513 | - Code: https://github.com/ChirikjianLab/Marching-Primitives
1514 |
1515 | **Towards Trustable Skin Cancer Diagnosis via Rewriting Model's Decision**
1516 |
1517 | - Paper: https://arxiv.org/abs/2303.00885
1518 | - Code: None
--------------------------------------------------------------------------------
/CVer学术交流群.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wyf3/CVPR2024-Papers-with-Code/7a12b2155e596a79ba6dcc7a17a5ae27f0fc50a8/CVer学术交流群.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # CVPR 2024 论文和开源项目合集(Papers with Code)
2 |
3 | CVPR 2024 decisions are now available on OpenReview!
4 |
5 |
6 | > Note 1: Contributions are welcome! Feel free to open an issue to share CVPR 2024 papers and open-source projects.
7 | >
8 | > Note 2: For papers from previous top CV conferences and other curated collections of high-quality CV papers, see: https://github.com/amusi/daily-paper-computer-vision
9 | >
10 | > - [CVPR 2019](CVPR2019-Papers-with-Code.md)
11 | > - [CVPR 2020](CVPR2020-Papers-with-Code.md)
12 | > - [CVPR 2021](CVPR2021-Papers-with-Code.md)
13 | > - [CVPR 2022](CVPR2022-Papers-with-Code.md)
14 | > - [CVPR 2023](CVPR2023-Papers-with-Code.md)
15 |
16 | Scan the QR code below to join the CVer academic group, the largest computer vision AI knowledge community! It is updated daily, sharing the latest learning resources on computer vision, AI art, image processing, deep learning, autonomous driving, medical imaging, AIGC, and more.
17 |
18 | 
19 |
20 | # 【CVPR 2024 Open-Source Paper Directory】
21 |
22 | - [3DGS(Gaussian Splatting)](#3DGS)
23 | - [Avatars](#Avatars)
24 | - [Backbone](#Backbone)
25 | - [CLIP](#CLIP)
26 | - [MAE](#MAE)
27 | - [Embodied AI](#Embodied-AI)
28 | - [GAN](#GAN)
29 | - [GNN](#GNN)
30 | - [多模态大语言模型(MLLM)](#MLLM)
31 | - [大语言模型(LLM)](#LLM)
32 | - [NAS](#NAS)
33 | - [OCR](#OCR)
34 | - [NeRF](#NeRF)
35 | - [DETR](#DETR)
36 | - [Prompt](#Prompt)
37 | - [扩散模型(Diffusion Models)](#Diffusion)
38 | - [ReID(重识别)](#ReID)
39 | - [长尾分布(Long-Tail)](#Long-Tail)
40 | - [Vision Transformer](#Vision-Transformer)
41 | - [视觉和语言(Vision-Language)](#VL)
42 | - [自监督学习(Self-supervised Learning)](#SSL)
43 | - [数据增强(Data Augmentation)](#DA)
44 | - [目标检测(Object Detection)](#Object-Detection)
45 | - [异常检测(Anomaly Detection)](#Anomaly-Detection)
46 | - [目标跟踪(Visual Tracking)](#VT)
47 | - [语义分割(Semantic Segmentation)](#Semantic-Segmentation)
48 | - [实例分割(Instance Segmentation)](#Instance-Segmentation)
49 | - [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation)
50 | - [医学图像(Medical Image)](#MI)
51 | - [医学图像分割(Medical Image Segmentation)](#MIS)
52 | - [视频目标分割(Video Object Segmentation)](#VOS)
53 | - [视频实例分割(Video Instance Segmentation)](#VIS)
54 | - [参考图像分割(Referring Image Segmentation)](#RIS)
55 | - [图像抠图(Image Matting)](#Matting)
56 | - [图像编辑(Image Editing)](#Image-Editing)
- [视频编辑(Video Editing)](#Video-Editing)
57 | - [Low-level Vision](#LLV)
58 | - [超分辨率(Super-Resolution)](#SR)
59 | - [去噪(Denoising)](#Denoising)
60 | - [去模糊(Deblur)](#Deblur)
61 | - [自动驾驶(Autonomous Driving)](#Autonomous-Driving)
62 | - [3D点云(3D Point Cloud)](#3D-Point-Cloud)
63 | - [3D目标检测(3D Object Detection)](#3DOD)
64 | - [3D语义分割(3D Semantic Segmentation)](#3DSS)
65 | - [3D目标跟踪(3D Object Tracking)](#3D-Object-Tracking)
66 | - [3D语义场景补全(3D Semantic Scene Completion)](#3DSSC)
67 | - [3D配准(3D Registration)](#3D-Registration)
68 | - [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation)
69 | - [3D人体Mesh估计(3D Human Mesh Estimation)](#3D-Human-Pose-Estimation)
71 | - [图像生成(Image Generation)](#Image-Generation)
72 | - [视频生成(Video Generation)](#Video-Generation)
73 | - [3D生成(3D Generation)](#3D-Generation)
74 | - [视频理解(Video Understanding)](#Video-Understanding)
75 | - [行为检测(Action Detection)](#Action-Detection)
76 | - [文本检测(Text Detection)](#Text-Detection)
77 | - [知识蒸馏(Knowledge Distillation)](#KD)
78 | - [模型剪枝(Model Pruning)](#Pruning)
79 | - [图像压缩(Image Compression)](#IC)
80 | - [三维重建(3D Reconstruction)](#3D-Reconstruction)
81 | - [深度估计(Depth Estimation)](#Depth-Estimation)
82 | - [轨迹预测(Trajectory Prediction)](#TP)
83 | - [车道线检测(Lane Detection)](#Lane-Detection)
84 | - [图像描述(Image Captioning)](#Image-Captioning)
85 | - [视觉问答(Visual Question Answering)](#VQA)
86 | - [手语识别(Sign Language Recognition)](#SLR)
87 | - [视频预测(Video Prediction)](#Video-Prediction)
88 | - [新视点合成(Novel View Synthesis)](#NVS)
89 | - [Zero-Shot Learning(零样本学习)](#ZSL)
90 | - [立体匹配(Stereo Matching)](#Stereo-Matching)
91 | - [特征匹配(Feature Matching)](#Feature-Matching)
92 | - [场景图生成(Scene Graph Generation)](#SGG)
93 | - [隐式神经表示(Implicit Neural Representations)](#INR)
94 | - [图像质量评价(Image Quality Assessment)](#IQA)
95 | - [视频质量评价(Video Quality Assessment)](#Video-Quality-Assessment)
96 | - [数据集(Datasets)](#Datasets)
97 | - [新任务(New Tasks)](#New-Tasks)
98 | - [其他(Others)](#Others)
99 |
100 |
101 |
102 | # 3DGS(Gaussian Splatting)
103 |
104 | **Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering**
105 |
106 | - Homepage: https://city-super.github.io/scaffold-gs/
107 | - Paper: https://arxiv.org/abs/2312.00109
108 | - Code: https://github.com/city-super/Scaffold-GS
109 |
110 | **GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis**
111 |
112 | - Homepage: https://shunyuanzheng.github.io/GPS-Gaussian
113 | - Paper: https://arxiv.org/abs/2312.02155
114 | - Code: https://github.com/ShunyuanZheng/GPS-Gaussian
115 |
116 | **GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians**
117 |
118 | - Paper: https://arxiv.org/abs/2312.02134
119 | - Code: https://github.com/huliangxiao/GaussianAvatar
120 |
121 | **GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting**
122 |
123 | - Paper: https://arxiv.org/abs/2311.14521
124 | - Code: https://github.com/buaacyw/GaussianEditor
125 |
126 | **Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction**
127 |
128 | - Homepage: https://ingra14m.github.io/Deformable-Gaussians/
129 | - Paper: https://arxiv.org/abs/2309.13101
130 | - Code: https://github.com/ingra14m/Deformable-3D-Gaussians
131 |
132 | **SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes**
133 |
134 | - Homepage: https://yihua7.github.io/SC-GS-web/
135 | - Paper: https://arxiv.org/abs/2312.14937
136 | - Code: https://github.com/yihua7/SC-GS
137 |
138 | **Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis**
139 |
140 | - Homepage: https://oppo-us-research.github.io/SpacetimeGaussians-website/
141 | - Paper: https://arxiv.org/abs/2312.16812
142 | - Code: https://github.com/oppo-us-research/SpacetimeGaussians
143 |
144 | **DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization**
145 |
146 | - Homepage: https://fictionarry.github.io/DNGaussian/
147 | - Paper: https://arxiv.org/abs/2403.06912
148 | - Code: https://github.com/Fictionarry/DNGaussian
149 |
150 | **4D Gaussian Splatting for Real-Time Dynamic Scene Rendering**
151 |
152 | - Paper: https://arxiv.org/abs/2310.08528
153 | - Code: https://github.com/hustvl/4DGaussians
154 |
155 | **GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models**
156 |
157 | - Paper: https://arxiv.org/abs/2310.08529
158 | - Code: https://github.com/hustvl/GaussianDreamer
159 |
160 |
161 |
162 | # Avatars
163 |
164 | **GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians**
165 |
166 | - Paper: https://arxiv.org/abs/2312.02134
167 | - Code: https://github.com/huliangxiao/GaussianAvatar
168 |
169 | **Real-Time Simulated Avatar from Head-Mounted Sensors**
170 |
171 | - Homepage: https://www.zhengyiluo.com/SimXR/
172 | - Paper: https://arxiv.org/abs/2403.06862
173 |
174 |
175 |
176 | # Backbone
177 |
178 | **RepViT: Revisiting Mobile CNN From ViT Perspective**
179 |
180 | - Paper: https://arxiv.org/abs/2307.09283
181 | - Code: https://github.com/THU-MIG/RepViT
182 |
183 | **TransNeXt: Robust Foveal Visual Perception for Vision Transformers**
184 |
185 | - Paper: https://arxiv.org/abs/2311.17132
186 | - Code: https://github.com/DaiShiResearch/TransNeXt
187 |
188 |
189 |
190 | # CLIP
191 |
192 | **Alpha-CLIP: A CLIP Model Focusing on Wherever You Want**
193 |
194 | - Paper: https://arxiv.org/abs/2312.03818
195 | - Code: https://github.com/SunzeY/AlphaCLIP
196 |
197 | **FairCLIP: Harnessing Fairness in Vision-Language Learning**
198 |
199 | - Paper: https://arxiv.org/abs/2403.19949
200 | - Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP
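
Both entries above build on CLIP's shared image-text embedding space. The standard zero-shot classification recipe with the openai/CLIP package looks like this; `cat.jpg` and the prompt list are placeholders:

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("cat.jpg")).unsqueeze(0).to(device)
texts = clip.tokenize(["a photo of a cat", "a photo of a dog"]).to(device)

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(texts)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

print("label probabilities:", probs.cpu().numpy())
```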
201 |
202 |
203 |
204 | # MAE
205 |
206 |
207 |
208 | # Embodied AI
209 |
210 | **EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI**
211 |
212 | - Homepage: https://tai-wang.github.io/embodiedscan/
213 | - Paper: https://arxiv.org/abs/2312.16170
214 | - Code: https://github.com/OpenRobotLab/EmbodiedScan
215 |
216 | **MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception**
217 |
218 | - Homepage: https://iranqin.github.io/MP5.github.io/
219 | - Paper: https://arxiv.org/abs/2312.07472
220 | - Code: https://github.com/IranQin/MP5
221 |
222 | **LEMON: Learning 3D Human-Object Interaction Relation from 2D Images**
223 |
224 | - Paper: https://arxiv.org/abs/2312.08963
225 | - Code: https://github.com/yyvhang/lemon_3d
226 |
227 |
228 |
229 | # GAN
230 |
231 |
232 |
233 | # OCR
234 |
235 | **An Empirical Study of Scaling Law for OCR**
236 |
237 | - Paper: https://arxiv.org/abs/2401.00028
238 | - Code: https://github.com/large-ocr-model/large-ocr-model.github.io
239 |
240 | **ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting**
241 |
242 | - Paper: https://arxiv.org/abs/2403.00303
243 | - Code: https://github.com/PriNing/ODM
244 |
245 |
246 |
247 | # NeRF
248 |
249 | **PIE-NeRF🍕: Physics-based Interactive Elastodynamics with NeRF**
250 |
251 | - Paper: https://arxiv.org/abs/2311.13099
252 | - Code: https://github.com/FYTalon/pienerf/
253 |
254 |
255 |
256 | # DETR
257 |
258 | **DETRs Beat YOLOs on Real-time Object Detection**
259 |
260 | - Paper: https://arxiv.org/abs/2304.08069
261 | - Code: https://github.com/lyuwenyu/RT-DETR
262 |
263 | **Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement**
264 |
265 | - Paper: https://arxiv.org/abs/2403.16131
266 | - Code: https://github.com/xiuqhou/Salience-DETR
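
DETR-style detectors such as the two above assign predictions to ground truth via one-to-one Hungarian matching over a set-based cost. A simplified sketch with a class-probability term plus an L1 box term (real matchers typically add a GIoU term as well):

```python
import torch
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_logits, pred_boxes, gt_labels, gt_boxes, box_weight=5.0):
    """pred_logits: (Nq, C); pred_boxes: (Nq, 4); gt_boxes: (Ng, 4), cxcywh in [0, 1]."""
    prob = pred_logits.softmax(dim=-1)
    cost_cls = -prob[:, gt_labels]                     # (Nq, Ng) class cost
    cost_box = torch.cdist(pred_boxes, gt_boxes, p=1)  # (Nq, Ng) L1 box cost
    cost = cost_cls + box_weight * cost_box
    q_idx, g_idx = linear_sum_assignment(cost.detach().cpu().numpy())
    return q_idx, g_idx  # matched (query, ground-truth) index pairs
```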
267 |
268 |
269 |
270 | # Prompt
271 |
272 |
273 |
274 | # 多模态大语言模型(MLLM)
275 |
276 | **mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration**
277 |
278 | - Paper: https://arxiv.org/abs/2311.04257
279 | - Code: https://github.com/X-PLUG/mPLUG-Owl/tree/main/mPLUG-Owl2
280 |
281 | **Link-Context Learning for Multimodal LLMs**
282 |
283 | - Paper: https://arxiv.org/abs/2308.07891
284 | - Code: https://github.com/isekai-portal/Link-Context-Learning/tree/main
285 |
286 | **OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation**
287 |
288 | - Paper: https://arxiv.org/abs/2311.17911
289 | - Code: https://github.com/shikiw/OPERA
290 |
291 | **Making Large Multimodal Models Understand Arbitrary Visual Prompts**
292 |
293 | - Homepage: https://vip-llava.github.io/
294 | - Paper: https://arxiv.org/abs/2312.00784
295 |
296 | **Pink: Unveiling the power of referential comprehension for multi-modal llms**
297 |
298 | - Paper: https://arxiv.org/abs/2310.00582
299 | - Code: https://github.com/SY-Xuan/Pink
300 |
301 | **Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding**
302 |
303 | - Paper: https://arxiv.org/abs/2311.08046
304 | - Code: https://github.com/PKU-YuanGroup/Chat-UniVi
305 |
306 | **OneLLM: One Framework to Align All Modalities with Language**
307 |
308 | - Paper: https://arxiv.org/abs/2312.03700
309 | - Code: https://github.com/csuhan/OneLLM
310 |
311 |
312 |
313 | # 大语言模型(LLM)
314 |
315 | **VTimeLLM: Empower LLM to Grasp Video Moments**
316 |
317 | - Paper: https://arxiv.org/abs/2311.18445
318 | - Code: https://github.com/huangb23/VTimeLLM
319 |
320 |
321 |
322 | # NAS
323 |
324 |
325 |
326 | # ReID(重识别)
327 |
328 | **Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification**
329 |
330 | - Paper: https://arxiv.org/abs/2403.10254
331 | - Code: https://github.com/924973292/EDITOR
332 |
333 | **Noisy-Correspondence Learning for Text-to-Image Person Re-identification**
334 |
335 | - Paper: https://arxiv.org/abs/2308.09911
336 |
337 | - Code: https://github.com/QinYang79/RDE
338 |
339 |
340 |
341 | # 扩散模型(Diffusion Models)
342 |
343 | **InstanceDiffusion: Instance-level Control for Image Generation**
344 |
345 | - Homepage: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/
346 |
347 | - Paper: https://arxiv.org/abs/2402.03290
348 | - Code: https://github.com/frank-xwang/InstanceDiffusion
349 |
350 | **Residual Denoising Diffusion Models**
351 |
352 | - Paper: https://arxiv.org/abs/2308.13712
353 | - Code: https://github.com/nachifur/RDDM
354 |
355 | **DeepCache: Accelerating Diffusion Models for Free**
356 |
357 | - Paper: https://arxiv.org/abs/2312.00858
358 | - Code: https://github.com/horseee/DeepCache
359 |
360 | **DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations**
361 |
362 | - Homepage: https://tianhao-qi.github.io/DEADiff/
363 |
364 | - Paper: https://arxiv.org/abs/2403.06951
365 | - Code: https://github.com/Tianhao-Qi/DEADiff_code
366 |
367 | **SVGDreamer: Text Guided SVG Generation with Diffusion Model**
368 |
369 | - Paper: https://arxiv.org/abs/2312.16476
370 | - Code: https://ximinng.github.io/SVGDreamer-project/
371 |
372 | **InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model**
373 |
374 | - Paper: https://arxiv.org/abs/2312.05849
375 | - Code: https://github.com/jiuntian/interactdiffusion
376 |
377 | **MMA-Diffusion: MultiModal Attack on Diffusion Models**
378 |
379 | - Paper: https://arxiv.org/abs/2311.17516
380 | - Code: https://github.com/yangyijune/MMA-Diffusion
381 |
382 | **VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models**
383 |
384 | - Homepage: https://video-motion-customization.github.io/
385 | - Paper: https://arxiv.org/abs/2312.00845
386 | - Code: https://github.com/HyeonHo99/Video-Motion-Customization
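
Most entries in this section build on the standard denoising diffusion sampling loop. One DDPM reverse step in its textbook form, assuming precomputed schedule tensors `alphas`, `alphas_cumprod`, and `betas` (generic math, not any single paper's sampler):

```python
import torch

@torch.no_grad()
def ddpm_reverse_step(eps_model, x_t, t, alphas, alphas_cumprod, betas):
    """Sample x_{t-1} from x_t given a noise-prediction network eps_model."""
    eps = eps_model(x_t, t)
    a_t, ab_t, b_t = alphas[t], alphas_cumprod[t], betas[t]
    mean = (x_t - b_t / torch.sqrt(1.0 - ab_t) * eps) / torch.sqrt(a_t)
    if t == 0:
        return mean  # no noise is added on the final step
    return mean + torch.sqrt(b_t) * torch.randn_like(x_t)  # sigma_t^2 = beta_t
```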
387 |
388 |
389 |
390 | # Vision Transformer
391 |
392 | **TransNeXt: Robust Foveal Visual Perception for Vision Transformers**
393 |
394 | - Paper: https://arxiv.org/abs/2311.17132
395 | - Code: https://github.com/DaiShiResearch/TransNeXt
396 |
397 | **RepViT: Revisiting Mobile CNN From ViT Perspective**
398 |
399 | - Paper: https://arxiv.org/abs/2307.09283
400 | - Code: https://github.com/THU-MIG/RepViT
401 |
402 | **A General and Efficient Training for Transformer via Token Expansion**
403 |
404 | - Paper: https://arxiv.org/abs/2404.00672
405 | - Code: https://github.com/Osilly/TokenExpansion
406 |
407 |
408 |
409 | # 视觉和语言(Vision-Language)
410 |
411 | **PromptKD: Unsupervised Prompt Distillation for Vision-Language Models**
412 |
413 | - Paper: https://arxiv.org/abs/2403.02781
414 | - Code: https://github.com/zhengli97/PromptKD
415 |
416 | **FairCLIP: Harnessing Fairness in Vision-Language Learning**
417 |
418 | - Paper: https://arxiv.org/abs/2403.19949
419 | - Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP
420 |
421 |
422 |
423 | # 目标检测(Object Detection)
424 |
425 | **DETRs Beat YOLOs on Real-time Object Detection**
426 |
427 | - Paper: https://arxiv.org/abs/2304.08069
428 | - Code: https://github.com/lyuwenyu/RT-DETR
429 |
430 | **Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation**
431 |
432 | - Paper: https://arxiv.org/abs/2312.01220
433 | - Code: https://github.com/ZPDu/Boosting-Object-Detection-with-Zero-Shot-Day-Night-Domain-Adaptation
434 |
435 | **YOLO-World: Real-Time Open-Vocabulary Object Detection**
436 |
437 | - Paper: https://arxiv.org/abs/2401.17270
438 | - Code: https://github.com/AILab-CVC/YOLO-World
439 |
440 | **Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement**
441 |
442 | - Paper: https://arxiv.org/abs/2403.16131
443 | - Code: https://github.com/xiuqhou/Salience-DETR
444 |
445 |
446 |
447 | # 异常检测(Anomaly Detection)
448 |
449 | **Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection**
450 |
451 | - Paper: https://arxiv.org/abs/2310.12790
452 | - Code: https://github.com/mala-lab/AHL
453 |
454 |
455 |
456 | # 目标跟踪(Object Tracking)
457 |
458 | **Delving into the Trajectory Long-tail Distribution for Muti-object Tracking**
459 |
460 | - Paper: https://arxiv.org/abs/2403.04700
461 | - Code: https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT
462 |
463 |
464 |
465 | # 语义分割(Semantic Segmentation)
466 |
467 | **Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation**
468 |
469 | - Paper: https://arxiv.org/abs/2312.04265
470 | - Code: https://github.com/w1oves/Rein
471 |
472 | **SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation**
473 |
474 | - Paper: https://arxiv.org/abs/2311.15537
475 | - Code: https://github.com/xb534/SED
476 |
477 |
478 |
479 | # 医学图像(Medical Image)
480 |
481 | **Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology**
482 |
483 | - Paper: https://arxiv.org/abs/2402.17228
484 | - Code: https://github.com/DearCaat/RRT-MIL
485 |
486 | **VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis**
487 |
488 | - Paper: https://arxiv.org/abs/2402.17300
489 | - Code: https://github.com/Luffy03/VoCo
490 |
491 | **ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images**
492 |
493 | - Paper: https://arxiv.org/abs/2311.15264
494 | - Code: https://github.com/nicoboou/chada_vit
495 |
496 |
497 |
498 | # 医学图像分割(Medical Image Segmentation)
499 |
500 |
501 |
502 |
503 |
504 | # 自动驾驶(Autonomous Driving)
505 |
506 | **UniPAD: A Universal Pre-training Paradigm for Autonomous Driving**
507 |
508 | - Paper: https://arxiv.org/abs/2310.08370
509 | - Code: https://github.com/Nightmare-n/UniPAD
510 |
511 | **Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications**
512 |
513 | - Paper: https://arxiv.org/abs/2311.17663
514 | - Code: https://github.com/haomo-ai/Cam4DOcc
515 |
516 | **Memory-based Adapters for Online 3D Scene Perception**
517 |
518 | - Paper: https://arxiv.org/abs/2403.06974
519 | - Code: https://github.com/xuxw98/Online3D
520 |
521 | **Symphonize 3D Semantic Scene Completion with Contextual Instance Queries**
522 |
523 | - Paper: https://arxiv.org/abs/2306.15670
524 | - Code: https://github.com/hustvl/Symphonies
525 |
526 | **A Real-world Large-scale Dataset for Roadside Cooperative Perception**
527 |
528 | - Paper: https://arxiv.org/abs/2403.10145
529 | - Code: https://github.com/AIR-THU/DAIR-RCooper
530 |
531 | **Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving**
532 |
533 | - Paper: https://arxiv.org/abs/2403.07535
534 | - Code: https://github.com/Junda24/AFNet
535 |
536 | **Traffic Scene Parsing through the TSP6K Dataset**
537 |
538 | - Paper: https://arxiv.org/pdf/2303.02835.pdf
539 | - Code: https://github.com/PengtaoJiang/TSP6K
540 |
541 |
542 |
543 | # 3D点云(3D-Point-Cloud)
544 |
545 |
546 |
547 |
548 |
549 | # 3D目标检测(3D Object Detection)
550 |
551 | **PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection**
552 |
553 | - Paper: https://arxiv.org/abs/2312.08371
554 | - Code: https://github.com/kuanchihhuang/PTT
555 |
556 | **UniMODE: Unified Monocular 3D Object Detection**
557 |
558 | - Paper: https://arxiv.org/abs/2402.18573
559 |
560 |
561 |
562 | # 3D语义分割(3D Semantic Segmentation)
563 |
564 |
565 |
566 | # 图像编辑(Image Editing)
567 |
568 | **Edit One for All: Interactive Batch Image Editing**
569 |
570 | - Homepage: https://thaoshibe.github.io/edit-one-for-all
571 | - Paper: https://arxiv.org/abs/2401.10219
572 | - Code: https://github.com/thaoshibe/edit-one-for-all
573 |
574 |
575 |
576 | # 视频编辑(Video Editing)
577 |
578 | **MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers**
579 |
580 | - Homepage: https://maskint.github.io/
581 |
582 | - Paper: https://arxiv.org/abs/2312.12468
583 |
584 |
585 |
586 | # Low-level Vision
587 |
588 | **Residual Denoising Diffusion Models**
589 |
590 | - Paper: https://arxiv.org/abs/2308.13712
591 | - Code: https://github.com/nachifur/RDDM
592 |
593 | **Boosting Image Restoration via Priors from Pre-trained Models**
594 |
595 | - Paper: https://arxiv.org/abs/2403.06793
596 |
597 |
598 |
599 | # 超分辨率(Super-Resolution)
600 |
601 | **SeD: Semantic-Aware Discriminator for Image Super-Resolution**
602 |
603 | - Paper: https://arxiv.org/abs/2402.19387
604 | - Code: https://github.com/lbc12345/SeD
605 |
606 | **APISR: Anime Production Inspired Real-World Anime Super-Resolution**
607 |
608 | - Paper: https://arxiv.org/abs/2403.01598
609 | - Code: https://github.com/Kiteretsu77/APISR
610 |
611 |
612 |
613 | # 去噪(Denoising)
614 |
615 | ## 图像去噪(Image Denoising)
616 |
617 |
618 |
619 | # 3D人体姿态估计(3D Human Pose Estimation)
620 |
621 | **Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation**
622 |
623 | - Paper: https://arxiv.org/abs/2311.12028
624 | - Code: https://github.com/NationalGAILab/HoT
625 |
626 |
627 |
628 | # 图像生成(Image Generation)
629 |
630 | **InstanceDiffusion: Instance-level Control for Image Generation**
631 |
632 | - Homepage: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/
633 |
634 | - Paper: https://arxiv.org/abs/2402.03290
635 | - Code: https://github.com/frank-xwang/InstanceDiffusion
636 |
637 | **ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations**
638 |
639 | - Homepage: https://eclipse-t2i.vercel.app/
640 | - Paper: https://arxiv.org/abs/2312.04655
641 |
642 | - Code: https://github.com/eclipse-t2i/eclipse-inference
643 |
644 | **Instruct-Imagen: Image Generation with Multi-modal Instruction**
645 |
646 | - Paper: https://arxiv.org/abs/2401.01952
647 |
648 | **Residual Denoising Diffusion Models**
649 |
650 | - Paper: https://arxiv.org/abs/2308.13712
651 | - Code: https://github.com/nachifur/RDDM
652 |
653 | **UniGS: Unified Representation for Image Generation and Segmentation**
654 |
655 | - Paper: https://arxiv.org/abs/2312.01985
656 |
657 | **Multi-Instance Generation Controller for Text-to-Image Synthesis**
658 |
659 | - Paper: https://arxiv.org/abs/2402.05408
660 | - Code: https://github.com/limuloo/migc
661 |
662 | **SVGDreamer: Text Guided SVG Generation with Diffusion Model**
663 |
664 | - Paper: https://arxiv.org/abs/2312.16476
665 | - Code: https://ximinng.github.io/SVGDreamer-project/
666 |
667 | **InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model**
668 |
669 | - Paper: https://arxiv.org/abs/2312.05849
670 | - Code: https://github.com/jiuntian/interactdiffusion
671 |
672 | **Ranni: Taming Text-to-Image Diffusion for Accurate Prompt Following**
673 |
674 | - Paper: https://arxiv.org/abs/2311.17002
675 | - Code: https://github.com/ali-vilab/Ranni
676 |
677 |
678 |
679 | # 视频生成(Video Generation)
680 |
681 | **Vlogger: Make Your Dream A Vlog**
682 |
683 | - Paper: https://arxiv.org/abs/2401.09414
684 | - Code: https://github.com/Vchitect/Vlogger
685 |
686 | **VBench: Comprehensive Benchmark Suite for Video Generative Models**
687 |
688 | - Homepage: https://vchitect.github.io/VBench-project/
689 | - Paper: https://arxiv.org/abs/2311.17982
690 | - Code: https://github.com/Vchitect/VBench
691 |
692 | **VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models**
693 |
694 | - Homepage: https://video-motion-customization.github.io/
695 | - Paper: https://arxiv.org/abs/2312.00845
696 | - Code: https://github.com/HyeonHo99/Video-Motion-Customization
697 |
698 |
699 |
700 | # 3D生成(3D Generation)
701 |
702 | **CityDreamer: Compositional Generative Model of Unbounded 3D Cities**
703 |
704 | - Homepage: https://haozhexie.com/project/city-dreamer/
705 | - Paper: https://arxiv.org/abs/2309.00610
706 | - Code: https://github.com/hzxie/city-dreamer
707 |
708 | **LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching**
709 |
710 | - Paper: https://arxiv.org/abs/2311.11284
711 | - Code: https://github.com/EnVision-Research/LucidDreamer
712 |
713 |
714 |
715 | # 视频理解(Video Understanding)
716 |
717 | **MVBench: A Comprehensive Multi-modal Video Understanding Benchmark**
718 |
719 | - Paper: https://arxiv.org/abs/2311.17005
720 | - Code: https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat2
721 |
722 |
723 |
724 | # 知识蒸馏(Knowledge Distillation)
725 |
726 | **Logit Standardization in Knowledge Distillation**
727 |
728 | - Paper: https://arxiv.org/abs/2403.01427
729 | - Code: https://github.com/sunshangquan/logit-standardization-KD
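
As a rough illustration of the idea in the title above: z-score each sample's logits before the usual temperature-scaled KD loss, so the student matches the shape of the teacher's logit distribution rather than its scale. This is a simplified reading of the title, not the paper's exact formulation:

```python
import torch.nn.functional as F

def logit_std_kd_loss(student_logits, teacher_logits, T=2.0, eps=1e-7):
    def standardize(z):  # per-sample z-score of the logits
        return (z - z.mean(-1, keepdim=True)) / (z.std(-1, keepdim=True) + eps)
    log_p_s = F.log_softmax(standardize(student_logits) / T, dim=-1)
    p_t = F.softmax(standardize(teacher_logits) / T, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)
```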
730 |
731 | **Efficient Dataset Distillation via Minimax Diffusion**
732 |
733 | - Paper: https://arxiv.org/abs/2311.15529
734 | - Code: https://github.com/vimar-gu/MinimaxDiffusion
735 |
736 |
737 |
738 | # 立体匹配(Stereo Matching)
739 |
740 | **Neural Markov Random Field for Stereo Matching**
741 |
742 | - Paper: https://arxiv.org/abs/2403.11193
743 | - Code: https://github.com/aeolusguan/NMRF
744 |
745 |
746 |
747 | # 场景图生成(Scene Graph Generation)
748 |
749 | **HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation**
750 |
751 | - Homepage: https://zhangce01.github.io/HiKER-SGG/
752 | - Paper: https://arxiv.org/abs/2403.12033
753 | - Code: https://github.com/zhangce01/HiKER-SGG
754 |
755 |
756 |
757 | # 视频质量评价(Video Quality Assessment)
758 |
759 | **KVQ: Kaleidoscope Video Quality Assessment for Short-form Videos**
760 |
761 | - Homepage: https://lixinustc.github.io/projects/KVQ/
762 |
763 | - Paper: https://arxiv.org/abs/2402.07220
764 | - Code: https://github.com/lixinustc/KVQ-Challenge-CVPR-NTIRE2024
765 |
766 |
767 |
768 | # 数据集(Datasets)
769 |
770 | **A Real-world Large-scale Dataset for Roadside Cooperative Perception**
771 |
772 | - Paper: https://arxiv.org/abs/2403.10145
773 | - Code: https://github.com/AIR-THU/DAIR-RCooper
774 |
775 | **Traffic Scene Parsing through the TSP6K Dataset**
776 |
777 | - Paper: https://arxiv.org/pdf/2303.02835.pdf
778 | - Code: https://github.com/PengtaoJiang/TSP6K
779 |
780 |
781 |
782 | # 其他(Others)
783 |
784 | **Object Recognition as Next Token Prediction**
785 |
786 | - Paper: https://arxiv.org/abs/2312.02142
787 | - Code: https://github.com/kaiyuyue/nxtp
788 |
789 | **ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks**
790 |
791 | - Paper: https://arxiv.org/abs/2306.14525
792 | - Code: https://parameternet.github.io/
793 |
794 | **Seamless Human Motion Composition with Blended Positional Encodings**
795 |
796 | - Paper: https://arxiv.org/abs/2402.15509
797 | - Code: https://github.com/BarqueroGerman/FlowMDM
798 |
799 | **LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning**
800 |
801 | - Homepage: https://ll3da.github.io/
802 |
803 | - Paper: https://arxiv.org/abs/2311.18651
804 | - Code: https://github.com/Open3DA/LL3DA
805 |
806 | **CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update**
807 |
808 | - Homepage: https://clova-tool.github.io/
809 | - Paper: https://arxiv.org/abs/2312.10908
810 |
811 | **MoMask: Generative Masked Modeling of 3D Human Motions**
812 |
813 | - Paper: https://arxiv.org/abs/2312.00063
814 | - Code: https://github.com/EricGuo5513/momask-codes
815 |
816 | **Amodal Ground Truth and Completion in the Wild**
817 |
818 | - Homepage: https://www.robots.ox.ac.uk/~vgg/research/amodal/
819 | - Paper: https://arxiv.org/abs/2312.17247
820 | - Code: https://github.com/Championchess/Amodal-Completion-in-the-Wild
821 |
822 | **Improved Visual Grounding through Self-Consistent Explanations**
823 |
824 | - Paper: https://arxiv.org/abs/2312.04554
825 | - Code: https://github.com/uvavision/SelfEQ
826 |
827 | **ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object**
828 |
829 | - Homepage: https://chenshuang-zhang.github.io/imagenet_d/
830 | - Paper: https://arxiv.org/abs/2403.18775
831 | - Code: https://github.com/chenshuang-zhang/imagenet_d
832 |
833 | **Learning from Synthetic Human Group Activities**
834 |
835 | - Homepage: https://cjerry1243.github.io/M3Act/
836 | - Paper: https://arxiv.org/abs/2306.16772
837 | - Code: https://github.com/cjerry1243/M3Act
838 |
839 | **MindBridge: A Cross-Subject Brain Decoding Framework**
840 |
841 | - Homepage: https://littlepure2333.github.io/MindBridge/
842 | - Paper: https://arxiv.org/abs/2404.07850
843 | - Code: https://github.com/littlepure2333/MindBridge
844 |
845 | **Multi-Task Dense Prediction via Mixture of Low-Rank Experts**
846 |
847 | - Paper: https://arxiv.org/abs/2403.17749
848 | - Code: https://github.com/YuqiYang213/MLoRE
849 |
850 | **Contrastive Mean-Shift Learning for Generalized Category Discovery**
851 |
852 | - Homepage: https://postech-cvlab.github.io/cms/
853 | - Paper: https://arxiv.org/abs/2404.09451
854 | - Code: https://github.com/sua-choi/CMS
855 |
--------------------------------------------------------------------------------
/master:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wyf3/CVPR2024-Papers-with-Code/7a12b2155e596a79ba6dcc7a17a5ae27f0fc50a8/master
--------------------------------------------------------------------------------