├── CVPR2019-Papers-with-Code.md
├── CVPR2020-Papers-with-Code.md
├── CVPR2021-Papers-with-Code.md
├── CVPR2022-Papers-with-Code.md
├── CVPR2023-Papers-with-Code.md
├── CVer学术交流群.png
├── README.md
└── master

-------------------------------------------------------------------------------- /CVPR2019-Papers-with-Code.md: --------------------------------------------------------------------------------

# CVPR2019-Code

A collection of CVPR 2019 papers with open-source code.

See also: [CVPR 2020 papers with open-source code](https://github.com/amusi/CVPR2020-Code)

Bonus: [code links for 530 CVPR 2019 papers](./CVPR2019_CodeLink.csv)

- [Object Detection](#Object-Detection)
- [Object Tracking](#Object-Tracking)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [GAN](#GAN)
- [Face Detection](#Face-Detection)
- [Human Pose Estimation](#Human-Pose-Estimation)
- [6DoF Pose Estimation](#6DoF-Pose-Estimation)
- [Head Pose Estimation](#Head-Pose-Estimation)
- [Crowd Counting](#Crowd-Counting)

**Changelog:**

- 20200226: added [CVPR 2020 papers with open-source code](https://github.com/amusi/CVPR2020-Code)
- 20191026: added [code links for 530 papers](./CVPR2019_CodeLink.csv)
- 20190408: added 6 papers (object tracking, GAN, 6DoF pose estimation, etc.)
- 20190405: added 8 papers (object detection, semantic segmentation, etc.)

# Object Detection

**Bounding Box Regression with Uncertainty for Accurate Object Detection**

- arXiv:
- github:

# Object Tracking

**Fast Online Object Tracking and Segmentation: A Unifying Approach**

- arXiv:
- github:
- homepage:

**Unsupervised Deep Tracking**

- arXiv:
- github:
- github (PyTorch):

**Target-Aware Deep Tracking**

- arXiv:
- homepage:

# Semantic Segmentation

**Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation**

- arXiv:
- github: https://github.com/LinZhuoChen/DUpsampling (unofficial)

**Dual Attention Network for Scene Segmentation**

- arXiv:
- github:

**Collaborative Global-Local Networks for Memory-Efficient Segmentation of Ultra-High Resolution Images**

- arXiv: None
- github:

# Instance Segmentation

**Mask Scoring R-CNN**

- arXiv:
- github:

# GAN

**Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis**

- arXiv:
- github:

# Face Detection

**DSFD: Dual Shot Face Detector**

- arXiv:
- github:

# Human Pose Estimation

**Deep High-Resolution Representation Learning for Human Pose Estimation**

- arXiv:
- github:

# 6DoF Pose Estimation

**PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation**

- arXiv:
- github:

# Head Pose Estimation

**PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation**

- paper:
- github:

# Crowd Counting

**Learning from Synthetic Data for Crowd Counting in the Wild**

- arXiv:
- github:
- homepage:

-------------------------------------------------------------------------------- /CVPR2020-Papers-with-Code.md: --------------------------------------------------------------------------------

# CVPR2020-Code

A collection of [CVPR 2020](https://openaccess.thecvf.com/CVPR2020) papers with open-source code. Issues sharing CVPR 2020 open-source projects are welcome.

**[Recommended Reading]**

- [CVPR 2020 virtual](http://cvpr20.com/)
- ECCV 2020 papers with open-source code: https://github.com/amusi/ECCV2020-Code
- For papers from previous top CV conferences (e.g. ECCV 2020, CVPR 2019, ICCV 2019) and other curated CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision

**[CVPR 2020 Open-Source Papers Directory]**

- [CNN](#CNN)
- [Image Classification](#Image-Classification)
- [Video Classification](#Video-Classification)
- [Object Detection](#Object-Detection)
- [3D Object Detection](#3D-Object-Detection)
- [Video Object Detection](#Video-Object-Detection)
- [Object Tracking](#Object-Tracking)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Panoptic Segmentation](#Panoptic-Segmentation)
- [Video Object Segmentation](#VOS)
- [Superpixel Segmentation](#Superpixel)
- [Interactive Image Segmentation](#IIS)
- [NAS](#NAS)
- [GAN](#GAN)
- [Re-ID](#Re-ID)
- [3D Point Cloud (Classification/Segmentation/Registration/Tracking, etc.)](#3D-PointCloud)
- [Face (Recognition/Detection/Reconstruction, etc.)](#Face)
- [Human Pose Estimation (2D/3D)](#Human-Pose-Estimation)
- [Human Parsing](#Human-Parsing)
- [Scene Text Detection](#Scene-Text-Detection)
- [Scene Text Recognition](#Scene-Text-Recognition)
- [Feature (Point) Detection and Description](#Feature)
- [Super-Resolution](#Super-Resolution)
- [Model Compression/Pruning](#Model-Compression)
- [Video Understanding/Action Recognition](#Action-Recognition)
- [Crowd Counting](#Crowd-Counting)
- [Depth Estimation](#Depth-Estimation)
- [6D Object Pose Estimation](#6DOF)
- [Hand Pose Estimation](#Hand-Pose)
- [Saliency Detection](#Saliency)
- [Denoising](#Denoising)
- [Deraining](#Deraining)
- [Deblurring](#Deblurring)
- [Dehazing](#Dehazing)
- [Visual Question Answering (VQA)](#VQA)
- [Video Question Answering (VideoQA)](#VideoQA)
- [Vision-Language Navigation](#VLN)
- [Video Compression](#Video-Compression)
- [Video Frame Interpolation](#Video-Frame-Interpolation)
- [Style Transfer](#Style-Transfer)
- [Lane Detection](#Lane-Detection)
- [Human-Object Interaction (HOI) Detection](#HOI)
- [Trajectory Prediction](#TP)
- [Motion Prediction](#Motion-Predication)
- [Optical Flow Estimation](#OF)
- [Image Retrieval](#IR)
- [Virtual Try-On](#Virtual-Try-On)
- [HDR](#HDR)
- [Adversarial Examples](#AE)
- [3D Reconstruction](#3D-Reconstructing)
- [Depth Completion](#DC)
- [Semantic Scene Completion](#SSC)
- [Image/Video Captioning](#Captioning)
- [Wireframe Parsing](#WP)
- [Datasets](#Datasets)
- [Others](#Others)
- [Not Sure If Accepted](#Not-Sure)

# CNN

**Exploring Self-attention for Image Recognition**

- Paper: https://hszhao.github.io/papers/cvpr20_san.pdf
- Code: https://github.com/hszhao/SAN

**Improving Convolutional Networks with Self-Calibrated Convolutions**

- Homepage: https://mmcheng.net/scconv/
- Paper: http://mftp.mmcheng.net/Papers/20cvprSCNet.pdf
- Code: https://github.com/backseason/SCNet

**Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets**

- Paper: https://arxiv.org/abs/2003.13549
- Code: https://github.com/zeiss-microscopy/BSConv

# Image Classification

**Interpretable and Accurate Fine-grained Recognition via Region Grouping**

- Paper: https://arxiv.org/abs/2005.10411
- Code: https://github.com/zxhuang1698/interpretability-by-parts

**Compositional Convolutional Neural Networks: A Deep Architecture with Innate Robustness to Partial Occlusion**

- Paper: https://arxiv.org/abs/2003.04490
- Code: https://github.com/AdamKortylewski/CompositionalNets

**Spatially Attentive Output Layer for Image Classification**

- Paper: https://arxiv.org/abs/2004.07570
- Code (appears to have been deleted by the authors): https://github.com/ildoonet/spatially-attentive-output-layer

# Video Classification

**SmallBigNet: Integrating Core and Contextual Views for Video Classification**

- Paper: https://arxiv.org/abs/2006.14582
- Code: https://github.com/xhl-video/SmallBigNet

# Object Detection

**Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Li_Overcoming_Classifier_Imbalance_for_Long-Tail_Object_Detection_With_Balanced_Group_CVPR_2020_paper.pdf
- Code: https://github.com/FishYuLi/BalancedGroupSoftmax

**AugFPN: Improving Multi-scale Feature Learning for Object Detection**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Guo_AugFPN_Improving_Multi-Scale_Feature_Learning_for_Object_Detection_CVPR_2020_paper.pdf
- Code: https://github.com/Gus-Guo/AugFPN

**Noise-Aware Fully Webly Supervised Object Detection**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Shen_Noise-Aware_Fully_Webly_Supervised_Object_Detection_CVPR_2020_paper.html
- Code: https://github.com/shenyunhang/NA-fWebSOD/

**Learning a Unified Sample Weighting Network for Object Detection**

- Paper: https://arxiv.org/abs/2006.06568
- Code: https://github.com/caiqi/sample-weighting-network

**D2Det: Towards High Quality Object Detection and Instance Segmentation**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Cao_D2Det_Towards_High_Quality_Object_Detection_and_Instance_Segmentation_CVPR_2020_paper.pdf
- Code: https://github.com/JialeCao001/D2Det

**Dynamic Refinement Network for Oriented and Densely Packed Object Detection**

- Paper: https://arxiv.org/abs/2005.09973
- Code & Dataset: https://github.com/Anymake/DRN_CVPR2020

**Scale-Equalizing Pyramid Convolution for Object Detection**

- Paper: https://arxiv.org/abs/2005.03101
- Code: https://github.com/jshilong/SEPC

**Revisiting the Sibling Head in Object Detector**

- Paper: https://arxiv.org/abs/2003.07540
- Code: https://github.com/Sense-X/TSD

**Detection in Crowded Scenes: One Proposal, Multiple Predictions**

- Paper: https://arxiv.org/abs/2003.09163
- Code: https://github.com/megvii-model/CrowdDetection

**Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection**

- Paper: https://arxiv.org/abs/2004.04725
- Code: https://github.com/NVlabs/wetectron

**Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection**

- Paper: https://arxiv.org/abs/1912.02424
- Code: https://github.com/sfzhang15/ATSS

**BiDet: An Efficient Binarized Object Detector**

- Paper: https://arxiv.org/abs/2003.03961
- Code: https://github.com/ZiweiWangTHU/BiDet
**Harmonizing Transferability and Discriminability for Adapting Object Detectors**

- Paper: https://arxiv.org/abs/2003.06297
- Code: https://github.com/chaoqichen/HTCN

**CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection**

- Paper: https://arxiv.org/abs/2003.09119
- Code: https://github.com/KiveeDong/CentripetalNet

**Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection**

- Paper: https://arxiv.org/abs/2003.11818
- Code: https://github.com/ggjy/HitDet.pytorch

**EfficientDet: Scalable and Efficient Object Detection**

- Paper: https://arxiv.org/abs/1911.09070
- Code: https://github.com/google/automl/tree/master/efficientdet

# 3D Object Detection

**SESS: Self-Ensembling Semi-Supervised 3D Object Detection**

- Paper: https://arxiv.org/abs/1912.11803
- Code: https://github.com/Na-Z/sess

**Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection**

- Paper: https://arxiv.org/abs/2006.04356
- Code: https://github.com/dleam/Associate-3Ddet

**What You See is What You Get: Exploiting Visibility for 3D Object Detection**

- Homepage: https://www.cs.cmu.edu/~peiyunh/wysiwyg/
- Paper: https://arxiv.org/abs/1912.04986
- Code: https://github.com/peiyunh/wysiwyg

**Learning Depth-Guided Convolutions for Monocular 3D Object Detection**

- Paper: https://arxiv.org/abs/1912.04799
- Code: https://github.com/dingmyu/D4LCN

**Structure Aware Single-stage 3D Object Detection from Point Cloud**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/He_Structure_Aware_Single-Stage_3D_Object_Detection_From_Point_Cloud_CVPR_2020_paper.html
- Code: https://github.com/skyhehe123/SA-SSD

**IDA-3D: Instance-Depth-Aware 3D Object Detection from Stereo Vision for Autonomous Driving**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Peng_IDA-3D_Instance-Depth-Aware_3D_Object_Detection_From_Stereo_Vision_for_Autonomous_CVPR_2020_paper.pdf
- Code: https://github.com/swords123/IDA-3D

**Train in Germany, Test in The USA: Making 3D Object Detectors Generalize**

- Paper: https://arxiv.org/abs/2005.08139
- Code: https://github.com/cxy1997/3D_adapt_auto_driving

**MLCVNet: Multi-Level Context VoteNet for 3D Object Detection**

- Paper: https://arxiv.org/abs/2004.05679
- Code: https://github.com/NUAAXQ/MLCVNet

**3DSSD: Point-based 3D Single Stage Object Detector**

- CVPR 2020 Oral
- Paper: https://arxiv.org/abs/2002.10187
- Code: https://github.com/tomztyang/3DSSD

**Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation**

- Paper: https://arxiv.org/abs/2004.03572
- Code: https://github.com/zju3dv/disprcn

**End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection**

- Paper: https://arxiv.org/abs/2004.03080
- Code: https://github.com/mileyan/pseudo-LiDAR_e2e

**DSGN: Deep Stereo Geometry Network for 3D Object Detection**

- Paper: https://arxiv.org/abs/2001.03398
- Code: https://github.com/chenyilun95/DSGN

**LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention**

- Paper: https://arxiv.org/abs/2004.01389
- Code: https://github.com/yinjunbo/3DVID

**PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection**

- Paper: https://arxiv.org/abs/1912.13192
- Code: https://github.com/sshaoshuai/PV-RCNN

**Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud**

- Paper: https://arxiv.org/abs/2003.01251
- Code: https://github.com/WeijingShi/Point-GNN

# Video Object Detection

**Memory Enhanced Global-Local Aggregation for Video Object Detection**

- Paper: https://arxiv.org/abs/2003.12063
- Code: https://github.com/Scalsol/mega.pytorch

# Object Tracking

**SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking**

- Paper: https://arxiv.org/abs/1911.07241
- Code: https://github.com/ohhhyeahhh/SiamCAR

**D3S -- A Discriminative Single Shot Segmentation Tracker**

- Paper: https://arxiv.org/abs/1911.08862
- Code: https://github.com/alanlukezic/d3s

**ROAM: Recurrently Optimizing Tracking Model**

- Paper: https://arxiv.org/abs/1907.12006
- Code: https://github.com/skyoung/ROAM

**Siam R-CNN: Visual Tracking by Re-Detection**

- Homepage: https://www.vision.rwth-aachen.de/page/siamrcnn
- Paper: https://arxiv.org/abs/1911.12836
- Paper 2: https://www.vision.rwth-aachen.de/media/papers/192/siamrcnn.pdf
- Code: https://github.com/VisualComputingInstitute/SiamR-CNN

**Cooling-Shrinking Attack: Blinding the Tracker with Imperceptible Noises**

- Paper: https://arxiv.org/abs/2003.09595
- Code: https://github.com/MasterBin-IIAU/CSA

**High-Performance Long-Term Tracking with Meta-Updater**

- Paper: https://arxiv.org/abs/2004.00305
- Code: https://github.com/Daikenan/LTMU

**AutoTrack: Towards High-Performance Visual Tracking for UAV with Automatic Spatio-Temporal Regularization**

- Paper: https://arxiv.org/abs/2003.12949
- Code: https://github.com/vision4robotics/AutoTrack

**Probabilistic Regression for Visual Tracking**

- Paper: https://arxiv.org/abs/2003.12565
- Code: https://github.com/visionml/pytracking

**MAST: A Memory-Augmented Self-supervised Tracker**

- Paper: https://arxiv.org/abs/2002.07793
- Code: https://github.com/zlai0/MAST
**Siamese Box Adaptive Network for Visual Tracking**

- Paper: https://arxiv.org/abs/2003.06761
- Code: https://github.com/hqucv/siamban

## Multi-Object Tracking

**3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset**

- Homepage: https://vap.aau.dk/3d-zef/
- Paper: https://arxiv.org/abs/2006.08466
- Code: https://bitbucket.org/aauvap/3d-zef/src/master/
- Dataset: https://motchallenge.net/data/3D-ZeF20

# Semantic Segmentation

**FDA: Fourier Domain Adaptation for Semantic Segmentation**

- Paper: https://arxiv.org/abs/2004.05498
- Code: https://github.com/YanchaoYang/FDA

**Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation**

- Paper: not available yet
- Code: https://github.com/JianqiangWan/Super-BPD

**Single-Stage Semantic Segmentation from Image Labels**

- Paper: https://arxiv.org/abs/2005.08104
- Code: https://github.com/visinf/1-stage-wseg

**Learning Texture Invariant Representation for Domain Adaptation of Semantic Segmentation**

- Paper: https://arxiv.org/abs/2003.00867
- Code: https://github.com/MyeongJin-Kim/Learning-Texture-Invariant-Representation

**MSeg: A Composite Dataset for Multi-domain Semantic Segmentation**

- Paper: http://vladlen.info/papers/MSeg.pdf
- Code: https://github.com/mseg-dataset/mseg-api

**CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement**

- Paper: https://arxiv.org/abs/2005.02551
- Code: https://github.com/hkchengrex/CascadePSP

**Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision**

- Oral
- Paper: https://arxiv.org/abs/2004.07703
- Code: https://github.com/feipan664/IntraDA

**Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation**

- Paper: https://arxiv.org/abs/2004.04581
- Code: https://github.com/YudeWang/SEAM

**Temporally Distributed Networks for Fast Video Segmentation**

- Paper: https://arxiv.org/abs/2004.01800
- Code: https://github.com/feinanshan/TDNet

**Context Prior for Scene Segmentation**

- Paper: https://arxiv.org/abs/2004.01547
- Code: https://git.io/ContextPrior

**Strip Pooling: Rethinking Spatial Pooling for Scene Parsing**

- Paper: https://arxiv.org/abs/2003.13328
- Code: https://github.com/Andrew-Qibin/SPNet

**Cars Can't Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks**

- Paper: https://arxiv.org/abs/2003.05128
- Code: https://github.com/shachoi/HANet

**Learning Dynamic Routing for Semantic Segmentation**

- Paper: https://arxiv.org/abs/2003.10401
- Code: https://github.com/yanwei-li/DynamicRouting

# Instance Segmentation

**D2Det: Towards High Quality Object Detection and Instance Segmentation**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Cao_D2Det_Towards_High_Quality_Object_Detection_and_Instance_Segmentation_CVPR_2020_paper.pdf
- Code: https://github.com/JialeCao001/D2Det

**PolarMask: Single Shot Instance Segmentation with Polar Representation**

- Paper: https://arxiv.org/abs/1909.13226
- Code: https://github.com/xieenze/PolarMask
- Explainer: https://zhuanlan.zhihu.com/p/84890413

**CenterMask: Real-Time Anchor-Free Instance Segmentation**

- Paper: https://arxiv.org/abs/1911.06667
- Code: https://github.com/youngwanLEE/CenterMask

**BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation**

- Paper: https://arxiv.org/abs/2001.00309
- Code: https://github.com/aim-uofa/AdelaiDet

**Deep Snake for Real-Time Instance Segmentation**

- Paper: https://arxiv.org/abs/2001.01629
- Code: https://github.com/zju3dv/snake

**Mask Encoding for Single Shot Instance Segmentation**

- Paper: https://arxiv.org/abs/2003.11712
- Code: https://github.com/aim-uofa/AdelaiDet

# Panoptic Segmentation

**Video Panoptic Segmentation**

- Paper: https://arxiv.org/abs/2006.11339
- Code: https://github.com/mcahny/vps
- Dataset: https://www.dropbox.com/s/ecem4kq0fdkver4/cityscapes-vps-dataset-1.0.zip?dl=0

**Pixel Consensus Voting for Panoptic Segmentation**

- Paper: https://arxiv.org/abs/2004.01849
- Code: not released yet

**BANet: Bidirectional Aggregation Network with Occlusion Handling for Panoptic Segmentation**

- Paper: https://arxiv.org/abs/2003.14031
- Code: https://github.com/Mooonside/BANet

# Video Object Segmentation

**A Transductive Approach for Video Object Segmentation**

- Paper: https://arxiv.org/abs/2004.07193
- Code: https://github.com/microsoft/transductive-vos.pytorch

**State-Aware Tracker for Real-Time Video Object Segmentation**

- Paper: https://arxiv.org/abs/2003.00482
- Code: https://github.com/MegviiDetection/video_analyst

**Learning Fast and Robust Target Models for Video Object Segmentation**

- Paper: https://arxiv.org/abs/2003.00908
- Code: https://github.com/andr345/frtm-vos

**Learning Video Object Segmentation from Unlabeled Videos**

- Paper: https://arxiv.org/abs/2003.05020
- Code: https://github.com/carrierlxk/MuG

# Superpixel Segmentation

**Superpixel Segmentation with Fully Convolutional Networks**

- Paper: https://arxiv.org/abs/2003.12929
- Code: https://github.com/fuy34/superpixel_fcn

# Interactive Image Segmentation

**Interactive Object Segmentation with Inside-Outside Guidance**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Interactive_Object_Segmentation_With_Inside-Outside_Guidance_CVPR_2020_paper.pdf
- Code: https://github.com/shiyinzhang/Inside-Outside-Guidance
- Dataset: https://github.com/shiyinzhang/Pixel-ImageNet

# NAS

**AOWS: Adaptive and optimal network width search with latency constraints**

- Paper: https://arxiv.org/abs/2005.10481
- Code: https://github.com/bermanmaxim/AOWS

**Densely Connected Search Space for More Flexible Neural Architecture Search**

- Paper: https://arxiv.org/abs/1906.09607
- Code: https://github.com/JaminFong/DenseNAS

**MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning**

- Paper: https://arxiv.org/abs/2003.14058
- Code: https://github.com/bhpfelix/MTLNAS

**FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions**

- Paper: https://arxiv.org/abs/2004.05565
- Code: https://github.com/facebookresearch/mobile-vision

**Neural Architecture Search for Lightweight Non-Local Networks**

- Paper: https://arxiv.org/abs/2004.01961
- Code: https://github.com/LiYingwei/AutoNL

**Rethinking Performance Estimation in Neural Architecture Search**

- Paper: https://arxiv.org/abs/2005.09917
- Code: https://github.com/zhengxiawu/rethinking_performance_estimation_in_NAS
- Explainer 1: https://www.zhihu.com/question/372070853/answer/1035234510
- Explainer 2: https://zhuanlan.zhihu.com/p/111167409

**CARS: Continuous Evolution for Efficient Neural Architecture Search**

- Paper: https://arxiv.org/abs/1909.04977
- Code (to be released): https://github.com/huawei-noah/CARS

# GAN

**SEAN: Image Synthesis with Semantic Region-Adaptive Normalization**

- Paper: https://arxiv.org/abs/1911.12861
- Code: https://github.com/ZPdesu/SEAN

**Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Chen_Reusing_Discriminators_for_Encoding_Towards_Unsupervised_Image-to-Image_Translation_CVPR_2020_paper.html
- Code: https://github.com/alpc91/NICE-GAN-pytorch

**Distribution-induced Bidirectional Generative Adversarial Network for Graph Representation Learning**

- Paper: https://arxiv.org/abs/1912.01899
- Code: https://github.com/SsGood/DBGAN

**PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer**

- Paper: https://arxiv.org/abs/1909.06956
- Code: https://github.com/wtjiang98/PSGAN

**Semantically Multi-modal Image Synthesis**

- Homepage: http://seanseattle.github.io/SMIS
- Paper: https://arxiv.org/abs/2003.12697
- Code: https://github.com/Seanseattle/SMIS

**Unpaired Portrait Drawing Generation via Asymmetric Cycle Mapping**

- Paper: https://yiranran.github.io/files/CVPR2020_Unpaired%20Portrait%20Drawing%20Generation%20via%20Asymmetric%20Cycle%20Mapping.pdf
- Code: https://github.com/yiranran/Unpaired-Portrait-Drawing

**Learning to Cartoonize Using White-box Cartoon Representations**

- Paper: https://github.com/SystemErrorWang/White-box-Cartoonization/blob/master/paper/06791.pdf
- Homepage: https://systemerrorwang.github.io/White-box-Cartoonization/
- Code: https://github.com/SystemErrorWang/White-box-Cartoonization
- Explainer: https://zhuanlan.zhihu.com/p/117422157
- Demo video: https://www.bilibili.com/video/av56708333

**GAN Compression: Efficient Architectures for Interactive Conditional GANs**

- Paper: https://arxiv.org/abs/2003.08936
- Code: https://github.com/mit-han-lab/gan-compression

**Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions**

- Paper: https://arxiv.org/abs/2003.01826
- Code: https://github.com/cc-hpc-itwm/UpConv

# Re-ID

**High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Wang_High-Order_Information_Matters_Learning_Relation_and_Topology_for_Occluded_Person_CVPR_2020_paper.html
- Code: https://github.com/wangguanan/HOReID

**COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification**

- Paper: https://arxiv.org/abs/2005.07862
- Dataset: not available yet

**Transferable, Controllable, and Inconspicuous Adversarial Attacks on Person Re-identification With Deep Mis-Ranking**

- Paper: https://arxiv.org/abs/2004.04199
- Code: https://github.com/whj363636/Adversarial-attack-on-Person-ReID-With-Deep-Mis-Ranking

**Pose-guided Visible Part Matching for Occluded Person ReID**

- Paper: https://arxiv.org/abs/2004.00230
- Code: https://github.com/hh23333/PVPM

**Weakly supervised discriminative feature learning with state information for person identification**

- Paper: https://arxiv.org/abs/2002.11939
- Code: https://github.com/KovenYu/state-information

# 3D Point Cloud (Classification/Segmentation/Registration, etc.)

## 3D Point Cloud Convolution

**PointASNL: Robust Point Clouds Processing using Nonlocal Neural Networks with Adaptive Sampling**

- Paper: https://arxiv.org/abs/2003.00492
- Code: https://github.com/yanx27/PointASNL

**Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds**

- Paper: https://arxiv.org/abs/2003.12971
- Code: https://github.com/raoyongming/PointGLR

**Grid-GCN for Fast and Scalable Point Cloud Learning**

- Paper: https://arxiv.org/abs/1912.02984
- Code: https://github.com/Xharlie/Grid-GCN

**FPConv: Learning Local Flattening for Point Convolution**

- Paper: https://arxiv.org/abs/2002.10701
- Code: https://github.com/lyqun/FPConv

## 3D Point Cloud Classification

**PointAugment: an Auto-Augmentation Framework for Point Cloud Classification**

- Paper: https://arxiv.org/abs/2002.10876
- Code (to be released): https://github.com/liruihui/PointAugment/

## 3D Point Cloud Semantic Segmentation

**RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds**

- Paper: https://arxiv.org/abs/1911.11236
- Code: https://github.com/QingyongHu/RandLA-Net
- Explainer: https://zhuanlan.zhihu.com/p/105433460

**Weakly Supervised Semantic Point Cloud Segmentation: Towards 10X Fewer Labels**

- Paper: https://arxiv.org/abs/2004.04091
- Code: https://github.com/alex-xun-xu/WeakSupPointCloudSeg

**PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation**

- Paper: https://arxiv.org/abs/2003.14032
- Code: https://github.com/edwardzhou130/PolarSeg

**Learning to Segment 3D Point Clouds in 2D Image Space**

- Paper: https://arxiv.org/abs/2003.05593
- Code: https://github.com/WPI-VISLab/Learning-to-Segment-3D-Point-Clouds-in-2D-Image-Space

## 3D Point Cloud Instance Segmentation

**PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation**

- Paper: https://arxiv.org/abs/2004.01658
- Code: https://github.com/Jia-Research-Lab/PointGroup

## 3D Point Cloud Registration

**Feature-metric Registration: A Fast Semi-supervised Approach for Robust Point Cloud Registration without Correspondences**

- Paper: https://arxiv.org/abs/2005.01014
- Code: https://github.com/XiaoshuiHuang/fmr

**D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features**

- Paper: https://arxiv.org/abs/2003.03164
- Code: https://github.com/XuyangBai/D3Feat

**RPM-Net: Robust Point Matching using Learned Features**

- Paper: https://arxiv.org/abs/2003.13479
- Code: https://github.com/yewzijian/RPMNet

## 3D Point Cloud Completion

**Cascaded Refinement Network for Point Cloud Completion**

- Paper: https://arxiv.org/abs/2004.03327
- Code: https://github.com/xiaogangw/cascaded-point-completion

## 3D Point Cloud Object Tracking

**P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds**

- Paper: https://arxiv.org/abs/2005.13888
- Code: https://github.com/HaozheQi/P2B

## Others

**An Efficient PointLSTM for Point Clouds Based Gesture Recognition**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Min_An_Efficient_PointLSTM_for_Point_Clouds_Based_Gesture_Recognition_CVPR_2020_paper.html
- Code: https://github.com/Blueprintf/pointlstm-gesture-recognition-pytorch

# Face

## Face Recognition

**CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition**

- Paper: https://arxiv.org/abs/2004.00288
- Code: https://github.com/HuangYG123/CurricularFace

**Learning Meta Face Recognition in Unseen Domains**

- Paper: https://arxiv.org/abs/2003.07733
- Code: https://github.com/cleardusk/MFR
- Explainer: https://mp.weixin.qq.com/s/YZoEnjpnlvb90qSI3xdJqQ

## Face Detection

## Face Anti-Spoofing

**Searching Central Difference Convolutional Networks for Face Anti-Spoofing**

- Paper: https://arxiv.org/abs/2003.04092
- Code: https://github.com/ZitongYu/CDCN

## Facial Expression Recognition

**Suppressing Uncertainties for Large-Scale Facial Expression Recognition**

- Paper: https://arxiv.org/abs/2002.10392
- Code (to be released): https://github.com/kaiwang960112/Self-Cure-Network

## Face Frontalization

**Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images**

- Paper: https://arxiv.org/abs/2003.08124
- Code: https://github.com/Hangz-nju-cuhk/Rotate-and-Render

## 3D Face Reconstruction

**AvatarMe: Realistically Renderable 3D Facial Reconstruction "in-the-wild"**

- Paper: https://arxiv.org/abs/2003.13845
- Dataset: https://github.com/lattas/AvatarMe

**FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction**

- Paper: https://arxiv.org/abs/2003.13989
- Code: https://github.com/zhuhao-nju/facescape

# Human Pose Estimation (2D/3D)

## 2D Human Pose Estimation

**TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting**

- Homepage: https://yzhq97.github.io/transmomo/
- Paper: https://arxiv.org/abs/2003.14401
- Code: https://github.com/yzhq97/transmomo.pytorch

**HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation**

- Paper: https://arxiv.org/abs/1908.10357
- Code: https://github.com/HRNet/HigherHRNet-Human-Pose-Estimation

**The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation**

- Paper: https://arxiv.org/abs/1911.07524
- Code: https://github.com/HuangJunJie2017/UDP-Pose
- Explainer: https://zhuanlan.zhihu.com/p/92525039

**Distribution-Aware Coordinate Representation for Human Pose Estimation**

- Homepage: https://ilovepose.github.io/coco/
- Paper: https://arxiv.org/abs/1910.06278
- Code: https://github.com/ilovepose/DarkPose

## 3D Human Pose Estimation

**Cascaded Deep Monocular 3D Human Pose Estimation With Evolutionary Training Data**

- Paper: https://arxiv.org/abs/2006.07778
- Code: https://github.com/Nicholasli1995/EvoSkeleton

**Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach**

- Homepage: https://www.zhe-zhang.com/cvpr2020
- Paper: https://arxiv.org/abs/2003.11163
- Code: https://github.com/CHUNYUWANG/imu-human-pose-pytorch

**Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data**

- Paper: https://arxiv.org/abs/2004.01166
- Code: https://github.com/Healthcare-Robotics/bodies-at-rest
- Dataset: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/KOA4ML

**Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis**

- Homepage: http://val.cds.iisc.ac.in/pgp-human/
- Paper: https://arxiv.org/abs/2004.04400

**Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation**

- Paper: https://arxiv.org/abs/2004.00329
- Code: https://github.com/fabbrimatteo/LoCO

**VIBE: Video Inference for Human Body Pose and Shape Estimation**

- Paper: https://arxiv.org/abs/1912.05656
- Code: https://github.com/mkocabas/VIBE

**Back to the Future: Joint Aware Temporal Deep Learning 3D Human Pose Estimation**

- Paper: https://arxiv.org/abs/2002.11251
- Code: https://github.com/vnmr/JointVideoPose3D

**Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS**

- Paper: https://arxiv.org/abs/2003.03972
- Dataset: not available yet

# Human Parsing

**Correlating Edge, Pose with Parsing**

- Paper: https://arxiv.org/abs/2005.01431
- Code: https://github.com/ziwei-zh/CorrPM

# Scene Text Detection

**STEFANN: Scene Text Editor using Font Adaptive Neural Network**

- Homepage: https://prasunroy.github.io/stefann/
- Paper: http://openaccess.thecvf.com/content_CVPR_2020/html/Roy_STEFANN_Scene_Text_Editor_Using_Font_Adaptive_Neural_Network_CVPR_2020_paper.html
- Code: https://github.com/prasunroy/stefann
- Dataset: https://drive.google.com/open?id=1sEDiX_jORh2X-HSzUnjIyZr-G9LJIw1k

**ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection**

- Paper: http://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_ContourNet_Taking_a_Further_Step_Toward_Accurate_Arbitrary-Shaped_Scene_Text_CVPR_2020_paper.pdf
- Code: https://github.com/wangyuxin87/ContourNet

**UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World**

- Paper: https://arxiv.org/abs/2003.10608
- Code & Dataset: https://github.com/Jyouhou/UnrealText/

**ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network**

- Paper: https://arxiv.org/abs/2002.10200
- Code (to be released): https://github.com/Yuliang-Liu/bezier_curve_text_spotting
- Code (to be released): https://github.com/aim-uofa/adet

**Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection**

- Paper: https://arxiv.org/abs/2003.07493
- Code: https://github.com/GXYM/DRRG

# Scene Text Recognition

**SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition**

- Paper: https://arxiv.org/abs/2005.10977
- Code: https://github.com/Pay20Y/SEED

**UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World**

- Paper: https://arxiv.org/abs/2003.10608
- Code & Dataset: https://github.com/Jyouhou/UnrealText/

**ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network**

- Paper: https://arxiv.org/abs/2002.10200
- Code (to be released): https://github.com/aim-uofa/adet

**Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition**

- Paper: https://arxiv.org/abs/2003.06606
- Code: https://github.com/Canjie-Luo/Text-Image-Augmentation

# Feature (Point) Detection and Description

**SuperGlue: Learning Feature Matching with Graph Neural Networks**

- Paper: https://arxiv.org/abs/1911.11763
- 
代码:https://github.com/magicleap/SuperGluePretrainedNetwork 1015 | 1016 | 1017 | 1018 | # 超分辨率 1019 | 1020 | ## 图像超分辨率 1021 | 1022 | **Closed-Loop Matters: Dual Regression Networks for Single Image Super-Resolution** 1023 | 1024 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Guo_Closed-Loop_Matters_Dual_Regression_Networks_for_Single_Image_Super-Resolution_CVPR_2020_paper.html 1025 | - 代码:https://github.com/guoyongcs/DRN 1026 | 1027 | **Learning Texture Transformer Network for Image Super-Resolution** 1028 | 1029 | - 论文:https://arxiv.org/abs/2006.04139 1030 | 1031 | - 代码:https://github.com/FuzhiYang/TTSR 1032 | 1033 | **Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining** 1034 | 1035 | - 论文:https://arxiv.org/abs/2006.01424 1036 | - 代码:https://github.com/SHI-Labs/Cross-Scale-Non-Local-Attention 1037 | 1038 | **Structure-Preserving Super Resolution with Gradient Guidance** 1039 | 1040 | - 论文:https://arxiv.org/abs/2003.13081 1041 | 1042 | - 代码:https://github.com/Maclory/SPSR 1043 | 1044 | **Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy** 1045 | 1046 | - 论文:https://arxiv.org/abs/2004.00448 1047 | 1048 | - 代码:https://github.com/clovaai/cutblur 1049 | 1050 | ## 视频超分辨率 1051 | 1052 | **TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution** 1053 | 1054 | - 论文:https://arxiv.org/abs/1812.02898 1055 | - 代码:https://github.com/YapengTian/TDAN-VSR-CVPR-2020 1056 | 1057 | **Space-Time-Aware Multi-Resolution Video Enhancement** 1058 | 1059 | - 主页:https://alterzero.github.io/projects/STAR.html 1060 | - 论文:http://arxiv.org/abs/2003.13170 1061 | - 代码:https://github.com/alterzero/STARnet 1062 | 1063 | **Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution** 1064 | 1065 | - 论文:https://arxiv.org/abs/2002.11616 1066 | - 代码:https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020 1067 | 1068 | 1069 | 1070 | # 模型压缩/剪枝 1071 | 1072 | **DMCP: 
Differentiable Markov Channel Pruning for Neural Networks** 1073 | 1074 | - 论文:https://arxiv.org/abs/2005.03354 1075 | - 代码:https://github.com/zx55/dmcp 1076 | 1077 | **Forward and Backward Information Retention for Accurate Binary Neural Networks** 1078 | 1079 | - 论文:https://arxiv.org/abs/1909.10788 1080 | 1081 | - 代码:https://github.com/htqin/IR-Net 1082 | 1083 | **Towards Efficient Model Compression via Learned Global Ranking** 1084 | 1085 | - 论文:https://arxiv.org/abs/1904.12368 1086 | - 代码:https://github.com/cmu-enyac/LeGR 1087 | 1088 | **HRank: Filter Pruning using High-Rank Feature Map** 1089 | 1090 | - 论文:http://arxiv.org/abs/2002.10179 1091 | - 代码:https://github.com/lmbxmu/HRank 1092 | 1093 | **GAN Compression: Efficient Architectures for Interactive Conditional GANs** 1094 | 1095 | - 论文:https://arxiv.org/abs/2003.08936 1096 | 1097 | - 代码:https://github.com/mit-han-lab/gan-compression 1098 | 1099 | **Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression** 1100 | 1101 | - 论文:https://arxiv.org/abs/2003.08935 1102 | 1103 | - 代码:https://github.com/ofsoundof/group_sparsity 1104 | 1105 | 1106 | 1107 | # 视频理解/行为识别 1108 | 1109 | **Oops! 
Predicting Unintentional Action in Video** 1110 | 1111 | - 主页:https://oops.cs.columbia.edu/ 1112 | 1113 | - 论文:https://arxiv.org/abs/1911.11206 1114 | - 代码:https://github.com/cvlab-columbia/oops 1115 | - 数据集:https://oops.cs.columbia.edu/data 1116 | 1117 | **PREDICT & CLUSTER: Unsupervised Skeleton Based Action Recognition** 1118 | 1119 | - 论文:https://arxiv.org/abs/1911.12409 1120 | - 代码:https://github.com/shlizee/Predict-Cluster 1121 | 1122 | **Intra- and Inter-Action Understanding via Temporal Action Parsing** 1123 | 1124 | - 论文:https://arxiv.org/abs/2005.10229 1125 | - 主页和数据集:https://sdolivia.github.io/TAPOS/ 1126 | 1127 | **3DV: 3D Dynamic Voxel for Action Recognition in Depth Video** 1128 | 1129 | - 论文:https://arxiv.org/abs/2005.05501 1130 | - 代码:https://github.com/3huo/3DV-Action 1131 | 1132 | **FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding** 1133 | 1134 | - 主页:https://sdolivia.github.io/FineGym/ 1135 | - 论文:https://arxiv.org/abs/2004.06704 1136 | 1137 | **TEA: Temporal Excitation and Aggregation for Action Recognition** 1138 | 1139 | - 论文:https://arxiv.org/abs/2004.01398 1140 | 1141 | - 代码:https://github.com/Phoenix1327/tea-action-recognition 1142 | 1143 | **X3D: Expanding Architectures for Efficient Video Recognition** 1144 | 1145 | - 论文:https://arxiv.org/abs/2004.04730 1146 | 1147 | - 代码:https://github.com/facebookresearch/SlowFast 1148 | 1149 | **Temporal Pyramid Network for Action Recognition** 1150 | 1151 | - 主页:https://decisionforce.github.io/TPN 1152 | 1153 | - 论文:https://arxiv.org/abs/2004.03548 1154 | - 代码:https://github.com/decisionforce/TPN 1155 | 1156 | ## 基于骨架的动作识别 1157 | 1158 | **Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition** 1159 | 1160 | - 论文:https://arxiv.org/abs/2003.14111 1161 | - 代码:https://github.com/kenziyuliu/ms-g3d 1162 | 1163 | 1164 | 1165 | # 人群计数 1166 | 1167 | 1168 | 1169 | # 深度估计 1170 | 1171 | **BiFuse: Monocular 360◦ Depth Estimation via Bi-Projection Fusion** 
1172 | 1173 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_BiFuse_Monocular_360_Depth_Estimation_via_Bi-Projection_Fusion_CVPR_2020_paper.pdf 1174 | - 代码:https://github.com/Yeh-yu-hsuan/BiFuse 1175 | 1176 | **Focus on defocus: bridging the synthetic to real domain gap for depth estimation** 1177 | 1178 | - 论文:https://arxiv.org/abs/2005.09623 1179 | - 代码:https://github.com/dvl-tum/defocus-net 1180 | 1181 | **Bi3D: Stereo Depth Estimation via Binary Classifications** 1182 | 1183 | - 论文:https://arxiv.org/abs/2005.07274 1184 | 1185 | - 代码:https://github.com/NVlabs/Bi3D 1186 | 1187 | **AANet: Adaptive Aggregation Network for Efficient Stereo Matching** 1188 | 1189 | - 论文:https://arxiv.org/abs/2004.09548 1190 | - 代码:https://github.com/haofeixu/aanet 1191 | 1192 | **Towards Better Generalization: Joint Depth-Pose Learning without PoseNet** 1193 | 1194 | - 论文:https://arxiv.org/abs/2004.01314 1195 | 1196 | - 代码:https://github.com/B1ueber2y/TrianFlow 1197 | 1198 | ## 单目深度估计 1199 | 1200 | **On the uncertainty of self-supervised monocular depth estimation** 1201 | 1202 | - 论文:https://arxiv.org/abs/2005.06209 1203 | - 代码:https://github.com/mattpoggi/mono-uncertainty 1204 | 1205 | **3D Packing for Self-Supervised Monocular Depth Estimation** 1206 | 1207 | - 论文:https://arxiv.org/abs/1905.02693 1208 | - 代码:https://github.com/TRI-ML/packnet-sfm 1209 | - Demo视频:https://www.bilibili.com/video/av70562892/ 1210 | 1211 | **Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation** 1212 | 1213 | - 论文:https://arxiv.org/abs/2002.12114 1214 | - 代码:https://github.com/yzhao520/ARC 1215 | 1216 | 1217 | 1218 | # 6D目标姿态估计 1219 | 1220 | **PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation** 1221 | 1222 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/He_PVN3D_A_Deep_Point-Wise_3D_Keypoints_Voting_Network_for_6DoF_CVPR_2020_paper.pdf 1223 | - 代码:https://github.com/ethnhe/PVN3D 1224 | 
1225 | **MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion** 1226 | 1227 | - 论文:https://arxiv.org/abs/2004.04336 1228 | - 代码:https://github.com/wkentaro/morefusion 1229 | 1230 | **EPOS: Estimating 6D Pose of Objects with Symmetries** 1231 | 1232 | - 主页:http://cmp.felk.cvut.cz/epos 1233 | 1234 | - 论文:https://arxiv.org/abs/2004.00605 1235 | 1236 | **G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features** 1237 | 1238 | - 论文:https://arxiv.org/abs/2003.11089 1239 | 1240 | - 代码:https://github.com/DC1991/G2L_Net 1241 | 1242 | 1243 | 1244 | # 手势估计 1245 | 1246 | **HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation** 1247 | 1248 | - 论文:https://arxiv.org/abs/2004.00060 1249 | 1250 | - 主页:http://vision.sice.indiana.edu/projects/hopenet 1251 | 1252 | **Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data** 1253 | 1254 | - 论文:https://arxiv.org/abs/2003.09572 1255 | 1256 | - 代码:https://github.com/CalciferZh/minimal-hand 1257 | 1258 | 1259 | 1260 | # 显著性检测 1261 | 1262 | **JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection** 1263 | 1264 | - 论文:https://arxiv.org/abs/2004.08515 1265 | 1266 | - 代码:https://github.com/kerenfu/JLDCF/ 1267 | 1268 | **UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders** 1269 | 1270 | - 主页:http://dpfan.net/d3netbenchmark/ 1271 | 1272 | - 论文:https://arxiv.org/abs/2004.05763 1273 | - 代码:https://github.com/JingZhang617/UCNet 1274 | 1275 | 1276 | 1277 | # 去噪 1278 | 1279 | **A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising** 1280 | 1281 | - 论文:https://arxiv.org/abs/2003.12751 1282 | 1283 | - 代码:https://github.com/Vandermode/NoiseModel 1284 | 1285 | **CycleISP: Real Image Restoration via Improved Data Synthesis** 1286 | 1287 | - 论文:https://arxiv.org/abs/2003.07761 1288 | 1289 | - 代码:https://github.com/swz30/CycleISP 1290 | 1291 | 1292 | 1293 | # 去雨 
1294 | 1295 | **Multi-Scale Progressive Fusion Network for Single Image Deraining** 1296 | 1297 | - 论文:https://arxiv.org/abs/2003.10985 1298 | - 代码:https://github.com/kuihua/MSPFN 1299 | 1300 | **Detail-recovery Image Deraining via Context Aggregation Networks** 1301 | 1302 | - 论文:https://openaccess.thecvf.com/content_CVPR_2020/html/Deng_Detail-recovery_Image_Deraining_via_Context_Aggregation_Networks_CVPR_2020_paper.html 1303 | - 代码:https://github.com/Dengsgithub/DRD-Net 1304 | 1305 | 1306 | 1307 | # 去模糊 1308 | 1309 | ## 视频去模糊 1310 | 1311 | **Cascaded Deep Video Deblurring Using Temporal Sharpness Prior** 1312 | 1313 | - 主页:https://csbhr.github.io/projects/cdvd-tsp/index.html 1314 | - 论文:https://arxiv.org/abs/2004.02501 1315 | - 代码:https://github.com/csbhr/CDVD-TSP 1316 | 1317 | 1318 | 1319 | # 去雾 1320 | 1321 | **Domain Adaptation for Image Dehazing** 1322 | 1323 | - 论文:https://arxiv.org/abs/2005.04668 1324 | 1325 | - 代码:https://github.com/HUSTSYJ/DA_dahazing 1326 | 1327 | **Multi-Scale Boosted Dehazing Network with Dense Feature Fusion** 1328 | 1329 | - 论文:https://arxiv.org/abs/2004.13388 1330 | 1331 | - 代码:https://github.com/BookerDeWitt/MSBDN-DFF 1332 | 1333 | 1334 | 1335 | # 特征点检测与描述 1336 | 1337 | **ASLFeat: Learning Local Features of Accurate Shape and Localization** 1338 | 1339 | - 论文:https://arxiv.org/abs/2003.10071 1340 | 1341 | - 代码:https://github.com/lzx551402/aslfeat 1342 | 1343 | 1344 | 1345 | # 视觉问答(VQA) 1346 | 1347 | **VC R-CNN:Visual Commonsense R-CNN** 1348 | 1349 | - 论文:https://arxiv.org/abs/2002.12204 1350 | - 代码:https://github.com/Wangt-CN/VC-R-CNN 1351 | 1352 | 1353 | 1354 | # 视频问答(VideoQA) 1355 | 1356 | **Hierarchical Conditional Relation Networks for Video Question Answering** 1357 | 1358 | - 论文:https://arxiv.org/abs/2002.10698 1359 | - 代码:https://github.com/thaolmk54/hcrn-videoqa 1360 | 1361 | 1362 | 1363 | # 视觉语言导航 1364 | 1365 | **Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training** 1366 | 1367 | - 
论文:https://arxiv.org/abs/2002.10638 1368 | - 代码(即将开源):https://github.com/weituo12321/PREVALENT 1369 | 1370 | 1371 | 1372 | # 视频压缩 1373 | 1374 | **Learning for Video Compression with Hierarchical Quality and Recurrent Enhancement** 1375 | 1376 | - 论文:https://arxiv.org/abs/2003.01966 1377 | - 代码:https://github.com/RenYang-home/HLVC 1378 | 1379 | 1380 | 1381 | # 视频插帧 1382 | 1383 | **AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation** 1384 | 1385 | - 论文:https://arxiv.org/abs/1907.10244 1386 | - 代码:https://github.com/HyeongminLEE/AdaCoF-pytorch 1387 | 1388 | **FeatureFlow: Robust Video Interpolation via Structure-to-Texture Generation** 1389 | 1390 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Gui_FeatureFlow_Robust_Video_Interpolation_via_Structure-to-Texture_Generation_CVPR_2020_paper.html 1391 | 1392 | - 代码:https://github.com/CM-BF/FeatureFlow 1393 | 1394 | **Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution** 1395 | 1396 | - 论文:https://arxiv.org/abs/2002.11616 1397 | - 代码:https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020 1398 | 1399 | **Space-Time-Aware Multi-Resolution Video Enhancement** 1400 | 1401 | - 主页:https://alterzero.github.io/projects/STAR.html 1402 | - 论文:http://arxiv.org/abs/2003.13170 1403 | - 代码:https://github.com/alterzero/STARnet 1404 | 1405 | **Scene-Adaptive Video Frame Interpolation via Meta-Learning** 1406 | 1407 | - 论文:https://arxiv.org/abs/2004.00779 1408 | - 代码:https://github.com/myungsub/meta-interpolation 1409 | 1410 | **Softmax Splatting for Video Frame Interpolation** 1411 | 1412 | - 主页:http://sniklaus.com/papers/softsplat 1413 | - 论文:https://arxiv.org/abs/2003.05534 1414 | - 代码:https://github.com/sniklaus/softmax-splatting 1415 | 1416 | 1417 | 1418 | # 风格迁移 1419 | 1420 | **Diversified Arbitrary Style Transfer via Deep Feature Perturbation** 1421 | 1422 | - 论文:https://arxiv.org/abs/1909.08223 1423 | - 代码:https://github.com/EndyWon/Deep-Feature-Perturbation 1424 | 1425 | 
**Collaborative Distillation for Ultra-Resolution Universal Style Transfer** 1426 | 1427 | - 论文:https://arxiv.org/abs/2003.08436 1428 | 1429 | - 代码:https://github.com/mingsun-tse/collaborative-distillation 1430 | 1431 | 1432 | 1433 | # 车道线检测 1434 | 1435 | **Inter-Region Affinity Distillation for Road Marking Segmentation** 1436 | 1437 | - 论文:https://arxiv.org/abs/2004.05304 1438 | - 代码:https://github.com/cardwing/Codes-for-IntRA-KD 1439 | 1440 | 1441 | 1442 | # "人-物"交互(HOI)检测 1443 | 1444 | **PPDM: Parallel Point Detection and Matching for Real-time Human-Object Interaction Detection** 1445 | 1446 | - 论文:https://arxiv.org/abs/1912.12898 1447 | - 代码:https://github.com/YueLiao/PPDM 1448 | 1449 | **Detailed 2D-3D Joint Representation for Human-Object Interaction** 1450 | 1451 | - 论文:https://arxiv.org/abs/2004.08154 1452 | 1453 | - 代码:https://github.com/DirtyHarryLYL/DJ-RN 1454 | 1455 | **Cascaded Human-Object Interaction Recognition** 1456 | 1457 | - 论文:https://arxiv.org/abs/2003.04262 1458 | 1459 | - 代码:https://github.com/tfzhou/C-HOI 1460 | 1461 | **VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions** 1462 | 1463 | - 论文:https://arxiv.org/abs/2003.05541 1464 | - 代码:https://github.com/ASMIftekhar/VSGNet 1465 | 1466 | 1467 | 1468 | # 轨迹预测 1469 | 1470 | **The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction** 1471 | 1472 | - 论文:https://arxiv.org/abs/1912.06445 1473 | - 代码:https://github.com/JunweiLiang/Multiverse 1474 | - 数据集:https://next.cs.cmu.edu/multiverse/ 1475 | 1476 | **Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction** 1477 | 1478 | - 论文:https://arxiv.org/abs/2002.11927 1479 | - 代码:https://github.com/abduallahmohamed/Social-STGCNN 1480 | 1481 | 1482 | 1483 | # 运动预测 1484 | 1485 | **Collaborative Motion Prediction via Neural Motion Message Passing** 1486 | 1487 | - 论文:https://arxiv.org/abs/2003.06594 1488 | - 
代码:https://github.com/PhyllisH/NMMP 1489 | 1490 | **MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps** 1491 | 1492 | - 论文:https://arxiv.org/abs/2003.06754 1493 | 1494 | - 代码:https://github.com/pxiangwu/MotionNet 1495 | 1496 | 1497 | 1498 | # 光流估计 1499 | 1500 | **Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation** 1501 | 1502 | - 论文:https://arxiv.org/abs/2003.13045 1503 | - 代码:https://github.com/lliuz/ARFlow 1504 | 1505 | 1506 | 1507 | # 图像检索 1508 | 1509 | **Evade Deep Image Retrieval by Stashing Private Images in the Hash Space** 1510 | 1511 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Xiao_Evade_Deep_Image_Retrieval_by_Stashing_Private_Images_in_the_CVPR_2020_paper.html 1512 | - 代码:https://github.com/sugarruy/hashstash 1513 | 1514 | 1515 | 1516 | # 虚拟试衣 1517 | 1518 | **Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content** 1519 | 1520 | - 论文:https://arxiv.org/abs/2003.05863 1521 | - 代码:https://github.com/switchablenorms/DeepFashion_Try_On 1522 | 1523 | 1524 | 1525 | # HDR 1526 | 1527 | **Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline** 1528 | 1529 | - 主页:https://www.cmlab.csie.ntu.edu.tw/~yulunliu/SingleHDR 1530 | 1531 | - 论文下载链接:https://www.cmlab.csie.ntu.edu.tw/~yulunliu/SingleHDR_/00942.pdf 1532 | 1533 | - 代码:https://github.com/alex04072000/SingleHDR 1534 | 1535 | 1536 | 1537 | # 对抗样本 1538 | 1539 | **Enhancing Cross-Task Black-Box Transferability of Adversarial Examples With Dispersion Reduction** 1540 | 1541 | - 论文:https://openaccess.thecvf.com/content_CVPR_2020/papers/Lu_Enhancing_Cross-Task_Black-Box_Transferability_of_Adversarial_Examples_With_Dispersion_Reduction_CVPR_2020_paper.pdf 1542 | - 代码:https://github.com/erbloo/dr_cvpr20 1543 | 1544 | **Towards Large yet Imperceptible Adversarial Image Perturbations with Perceptual Color Distance** 1545 | 1546 | - 
论文:https://arxiv.org/abs/1911.02466 1547 | - 代码:https://github.com/ZhengyuZhao/PerC-Adversarial 1548 | 1549 | 1550 | 1551 | # 三维重建 1552 | 1553 | **Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild** 1554 | 1555 | - **CVPR 2020 Best Paper** 1556 | - 主页:https://elliottwu.com/projects/unsup3d/ 1557 | - 论文:https://arxiv.org/abs/1911.11130 1558 | - 代码:https://github.com/elliottwu/unsup3d 1559 | 1560 | **PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization** 1561 | 1562 | - 主页:https://shunsukesaito.github.io/PIFuHD/ 1563 | - 论文:https://arxiv.org/abs/2004.00452 1564 | - 代码:https://github.com/facebookresearch/pifuhd 1565 | **TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style** 1566 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Patel_TailorNet_Predicting_Clothing_in_3D_as_a_Function_of_Human_CVPR_2020_paper.pdf 1567 | - 代码:https://github.com/chaitanya100100/TailorNet 1568 | - 数据集:https://github.com/zycliao/TailorNet_dataset 1569 | 1570 | **Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion** 1571 | 1572 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Chibane_Implicit_Functions_in_Feature_Space_for_3D_Shape_Reconstruction_and_CVPR_2020_paper.pdf 1573 | - 代码:https://github.com/jchibane/if-net 1574 | **Learning to Transfer Texture from Clothing Images to 3D Humans** 1575 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Mir_Learning_to_Transfer_Texture_From_Clothing_Images_to_3D_Humans_CVPR_2020_paper.pdf 1576 | - 代码:https://github.com/aymenmir1/pix2surf 1577 | 1578 | 1579 | 1580 | # 深度补全 1581 | 1582 | **Uncertainty-Aware CNNs for Depth Completion: Uncertainty from Beginning to End** 1583 | 1584 | - 论文:https://arxiv.org/abs/2006.03349 1585 | 1586 | - 代码:https://github.com/abdo-eldesokey/pncnn 1587 | 1588 | 1589 | 1590 | # 语义场景补全 1591 | 1592 | **3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior** 1593 | 1594 | - 论文:https://arxiv.org/abs/2003.14052 1595 | - 代码:https://github.com/charlesCXK/TorchSSC 1596 | 1597 | 1598 | 1599 | # 图像/视频描述 
1600 | 1601 | **Syntax-Aware Action Targeting for Video Captioning** 1602 | 1603 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Zheng_Syntax-Aware_Action_Targeting_for_Video_Captioning_CVPR_2020_paper.pdf 1604 | - 代码:https://github.com/SydCaption/SAAT 1605 | 1606 | 1607 | 1608 | # 线框解析 1609 | 1610 | **Holistically-Attracted Wireframe Parsing** 1611 | 1612 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Xue_Holistically-Attracted_Wireframe_Parsing_CVPR_2020_paper.html 1613 | 1614 | - 代码:https://github.com/cherubicXN/hawp 1615 | 1616 | 1617 | 1618 | # 数据集 1619 | 1620 | **OASIS: A Large-Scale Dataset for Single Image 3D in the Wild** 1621 | 1622 | - 论文:https://arxiv.org/abs/2007.13215 1623 | - 数据集:https://oasis.cs.princeton.edu/ 1624 | 1625 | **STEFANN: Scene Text Editor using Font Adaptive Neural Network** 1626 | 1627 | - 主页:https://prasunroy.github.io/stefann/ 1628 | 1629 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Roy_STEFANN_Scene_Text_Editor_Using_Font_Adaptive_Neural_Network_CVPR_2020_paper.html 1630 | - 代码:https://github.com/prasunroy/stefann 1631 | - 数据集:https://drive.google.com/open?id=1sEDiX_jORh2X-HSzUnjIyZr-G9LJIw1k 1632 | 1633 | **Interactive Object Segmentation with Inside-Outside Guidance** 1634 | 1635 | - 论文下载链接:http://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Interactive_Object_Segmentation_With_Inside-Outside_Guidance_CVPR_2020_paper.pdf 1636 | - 代码:https://github.com/shiyinzhang/Inside-Outside-Guidance 1637 | - 数据集:https://github.com/shiyinzhang/Pixel-ImageNet 1638 | 1639 | **Video Panoptic Segmentation** 1640 | 1641 | - 论文:https://arxiv.org/abs/2006.11339 1642 | - 代码:https://github.com/mcahny/vps 1643 | - 数据集:https://www.dropbox.com/s/ecem4kq0fdkver4/cityscapes-vps-dataset-1.0.zip?dl=0 1644 | 1645 | **FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation** 1646 | 1647 | - 
论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Li_FSS-1000_A_1000-Class_Dataset_for_Few-Shot_Segmentation_CVPR_2020_paper.html 1648 | 1649 | - 代码:https://github.com/HKUSTCV/FSS-1000 1650 | 1651 | - 数据集:https://github.com/HKUSTCV/FSS-1000 1652 | 1653 | **3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset** 1654 | 1655 | - 主页:https://vap.aau.dk/3d-zef/ 1656 | - 论文:https://arxiv.org/abs/2006.08466 1657 | - 代码:https://bitbucket.org/aauvap/3d-zef/src/master/ 1658 | - 数据集:https://motchallenge.net/data/3D-ZeF20 1659 | 1660 | **TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style** 1661 | 1662 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/papers/Patel_TailorNet_Predicting_Clothing_in_3D_as_a_Function_of_Human_CVPR_2020_paper.pdf 1663 | - 代码:https://github.com/chaitanya100100/TailorNet 1664 | - 数据集:https://github.com/zycliao/TailorNet_dataset 1665 | 1666 | **Oops! Predicting Unintentional Action in Video** 1667 | 1668 | - 主页:https://oops.cs.columbia.edu/ 1669 | 1670 | - 论文:https://arxiv.org/abs/1911.11206 1671 | - 代码:https://github.com/cvlab-columbia/oops 1672 | - 数据集:https://oops.cs.columbia.edu/data 1673 | 1674 | **The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction** 1675 | 1676 | - 论文:https://arxiv.org/abs/1912.06445 1677 | - 代码:https://github.com/JunweiLiang/Multiverse 1678 | - 数据集:https://next.cs.cmu.edu/multiverse/ 1679 | 1680 | **Open Compound Domain Adaptation** 1681 | 1682 | - 主页:https://liuziwei7.github.io/projects/CompoundDomain.html 1683 | - 数据集:https://drive.google.com/drive/folders/1_uNTF8RdvhS_sqVTnYx17hEOQpefmE2r?usp=sharing 1684 | - 论文:https://arxiv.org/abs/1909.03403 1685 | - 代码:https://github.com/zhmiao/OpenCompoundDomainAdaptation-OCDA 1686 | 1687 | **Intra- and Inter-Action Understanding via Temporal Action Parsing** 1688 | 1689 | - 论文:https://arxiv.org/abs/2005.10229 1690 | - 主页和数据集:https://sdolivia.github.io/TAPOS/ 1691 | 1692 | **Dynamic Refinement Network for Oriented and 
Densely Packed Object Detection** 1693 | 1694 | - 论文下载链接:https://arxiv.org/abs/2005.09973 1695 | 1696 | - 代码和数据集:https://github.com/Anymake/DRN_CVPR2020 1697 | 1698 | **COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification** 1699 | 1700 | - 论文:https://arxiv.org/abs/2005.07862 1701 | 1702 | - 数据集:暂无 1703 | 1704 | **KeypointNet: A Large-scale 3D Keypoint Dataset Aggregated from Numerous Human Annotations** 1705 | 1706 | - 论文:https://arxiv.org/abs/2002.12687 1707 | 1708 | - 数据集:https://github.com/qq456cvb/KeypointNet 1709 | 1710 | **MSeg: A Composite Dataset for Multi-domain Semantic Segmentation** 1711 | 1712 | - 论文:http://vladlen.info/papers/MSeg.pdf 1713 | - 代码:https://github.com/mseg-dataset/mseg-api 1714 | - 数据集:https://github.com/mseg-dataset/mseg-semantic 1715 | 1716 | **AvatarMe: Realistically Renderable 3D Facial Reconstruction "in-the-wild"** 1717 | 1718 | - 论文:https://arxiv.org/abs/2003.13845 1719 | - 数据集:https://github.com/lattas/AvatarMe 1720 | 1721 | **Learning to Autofocus** 1722 | 1723 | - 论文:https://arxiv.org/abs/2004.12260 1724 | - 数据集:暂无 1725 | 1726 | **FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction** 1727 | 1728 | - 论文:https://arxiv.org/abs/2003.13989 1729 | - 代码:https://github.com/zhuhao-nju/facescape 1730 | 1731 | **Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data** 1732 | 1733 | - 论文下载链接:https://arxiv.org/abs/2004.01166 1734 | 1735 | - 代码:https://github.com/Healthcare-Robotics/bodies-at-rest 1736 | - 数据集:https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/KOA4ML 1737 | 1738 | **FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding** 1739 | 1740 | - 主页:https://sdolivia.github.io/FineGym/ 1741 | - 论文:https://arxiv.org/abs/2004.06704 1742 | 1743 | **A Local-to-Global Approach to Multi-modal Movie Scene Segmentation** 1744 | 1745 | - 主页:https://anyirao.com/projects/SceneSeg.html 1746 | 1747 | - 
论文下载链接:https://arxiv.org/abs/2004.02678 1748 | 1749 | - 代码:https://github.com/AnyiRao/SceneSeg 1750 | 1751 | **Deep Homography Estimation for Dynamic Scenes** 1752 | 1753 | - 论文:https://arxiv.org/abs/2004.02132 1754 | 1755 | - 数据集:https://github.com/lcmhoang/hmg-dynamics 1756 | 1757 | **Assessing Image Quality Issues for Real-World Problems** 1758 | 1759 | - 主页:https://vizwiz.org/tasks-and-datasets/image-quality-issues/ 1760 | - 论文:https://arxiv.org/abs/2003.12511 1761 | 1762 | **UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World** 1763 | 1764 | - 论文:https://arxiv.org/abs/2003.10608 1765 | - 代码和数据集:https://github.com/Jyouhou/UnrealText/ 1766 | 1767 | **PANDA: A Gigapixel-level Human-centric Video Dataset** 1768 | 1769 | - 论文:https://arxiv.org/abs/2003.04852 1770 | 1771 | - 数据集:http://www.panda-dataset.com/ 1772 | 1773 | **IntrA: 3D Intracranial Aneurysm Dataset for Deep Learning** 1774 | 1775 | - 论文:https://arxiv.org/abs/2003.02920 1776 | - 数据集:https://github.com/intra3d2019/IntrA 1777 | 1778 | **Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS** 1779 | 1780 | - 论文:https://arxiv.org/abs/2003.03972 1781 | - 数据集:暂无 1782 | 1783 | 1784 | 1785 | # 其他 1786 | 1787 | **CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus** 1788 | 1789 | - 论文:http://openaccess.thecvf.com/content_CVPR_2020/html/Kluger_CONSAC_Robust_Multi-Model_Fitting_by_Conditional_Sample_Consensus_CVPR_2020_paper.html 1790 | - 代码:https://github.com/fkluger/consac 1791 | 1792 | **Learning to Learn Single Domain Generalization** 1793 | 1794 | - 论文:https://arxiv.org/abs/2003.13216 1795 | - 代码:https://github.com/joffery/M-ADA 1796 | 1797 | **Open Compound Domain Adaptation** 1798 | 1799 | - 主页:https://liuziwei7.github.io/projects/CompoundDomain.html 1800 | - 数据集:https://drive.google.com/drive/folders/1_uNTF8RdvhS_sqVTnYx17hEOQpefmE2r?usp=sharing 1801 | - 论文:https://arxiv.org/abs/1909.03403 1802 | - 
代码:https://github.com/zhmiao/OpenCompoundDomainAdaptation-OCDA 1803 | 1804 | **Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision** 1805 | 1806 | - 论文:http://www.cvlibs.net/publications/Niemeyer2020CVPR.pdf 1807 | 1808 | - 代码:https://github.com/autonomousvision/differentiable_volumetric_rendering 1809 | 1810 | **QEBA: Query-Efficient Boundary-Based Blackbox Attack** 1811 | 1812 | - 论文:https://arxiv.org/abs/2005.14137 1813 | - 代码:https://github.com/AI-secure/QEBA 1814 | 1815 | **Equalization Loss for Long-Tailed Object Recognition** 1816 | 1817 | - 论文:https://arxiv.org/abs/2003.05176 1818 | - 代码:https://github.com/tztztztztz/eql.detectron2 1819 | 1820 | **Instance-aware Image Colorization** 1821 | 1822 | - 主页:https://ericsujw.github.io/InstColorization/ 1823 | - 论文:https://arxiv.org/abs/2005.10825 1824 | - 代码:https://github.com/ericsujw/InstColorization 1825 | 1826 | **Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting** 1827 | 1828 | - 论文:https://arxiv.org/abs/2005.09704 1829 | 1830 | - 代码:https://github.com/Atlas200dk/sample-imageinpainting-HiFill 1831 | 1832 | **Where am I looking at? 
Joint Location and Orientation Estimation by Cross-View Matching**

- Paper: https://arxiv.org/abs/2005.03860
- Code: https://github.com/shiyujiao/cross_view_localization_DSM

**Epipolar Transformers**

- Paper: https://arxiv.org/abs/2005.04551
- Code: https://github.com/yihui-he/epipolar-transformers

**Bringing Old Photos Back to Life**

- Homepage: http://raywzy.com/Old_Photo/
- Paper: https://arxiv.org/abs/2004.09484

**MaskFlownet: Asymmetric Feature Matching with Learnable Occlusion Mask**

- Paper: https://arxiv.org/abs/2003.10955
- Code: https://github.com/microsoft/MaskFlownet

**Self-Supervised Viewpoint Learning from Image Collections**

- Paper: https://arxiv.org/abs/2004.01793
- Paper 2: https://research.nvidia.com/sites/default/files/pubs/2020-03_Self-Supervised-Viewpoint-Learning/SSV-CVPR2020.pdf
- Code: https://github.com/NVlabs/SSV

**Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations**

- Oral
- Paper: https://arxiv.org/abs/2003.12237
- Code: https://github.com/cuishuhao/BNM

**Towards Learning Structure via Consensus for Face Segmentation and Parsing**

- Paper: https://arxiv.org/abs/1911.00957
- Code: https://github.com/isi-vista/structure_via_consensus

**Plug-and-Play Algorithms for Large-scale Snapshot Compressive Imaging**

- Oral
- Paper: https://arxiv.org/abs/2003.13654
- Code: https://github.com/liuyang12/PnP-SCI

**Lightweight Photometric Stereo for Facial Details Recovery**

- Paper: https://arxiv.org/abs/2003.12307
- Code: https://github.com/Juyong/FacePSNet

**Footprints and Free Space from a Single Color Image**

- Paper: https://arxiv.org/abs/2004.06376
- Code: https://github.com/nianticlabs/footprints

**Self-Supervised Monocular Scene Flow Estimation**

- Paper: https://arxiv.org/abs/2004.04143
- Code: https://github.com/visinf/self-mono-sf

**Quasi-Newton Solver for Robust Non-Rigid Registration**

- Paper: https://arxiv.org/abs/2004.04322
- Code: https://github.com/Juyong/Fast_RNRR

**A Local-to-Global Approach to Multi-modal Movie Scene Segmentation**

- Homepage: https://anyirao.com/projects/SceneSeg.html
- Paper: https://arxiv.org/abs/2004.02678
- Code: https://github.com/AnyiRao/SceneSeg

**DeepFLASH: An Efficient Network for Learning-based Medical Image Registration**

- Paper: https://arxiv.org/abs/2004.02097
- Code: https://github.com/jw4hv/deepflash

**Self-Supervised Scene De-occlusion**

- Homepage: https://xiaohangzhan.github.io/projects/deocclusion/
- Paper: https://arxiv.org/abs/2004.02788
- Code: https://github.com/XiaohangZhan/deocclusion

**Polarized Reflection Removal with Perfect Alignment in the Wild**

- Homepage: https://leichenyang.weebly.com/project-polarized.html
- Code: https://github.com/ChenyangLEI/CVPR2020-Polarized-Reflection-Removal-with-Perfect-Alignment

**Background Matting: The World is Your Green Screen**

- Paper: https://arxiv.org/abs/2004.00626
- Code: http://github.com/senguptaumd/Background-Matting

**What Deep CNNs Benefit from Global Covariance Pooling: An Optimization Perspective**

- Paper: https://arxiv.org/abs/2003.11241
- Code: https://github.com/ZhangLi-CS/GCP_Optimization

**Look-into-Object: Self-supervised Structure Modeling for Object Recognition**

- Paper: None
- Code: https://github.com/JDAI-CV/LIO

**Video Object Grounding using Semantic Roles in Language Description**
- Paper: https://arxiv.org/abs/2003.10606
- Code: https://github.com/TheShadow29/vognet-pytorch

**Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives**

- Paper: https://arxiv.org/abs/2003.10739
- Code: https://github.com/d-li14/DHM

**SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization**

- Paper: http://www.cs.umd.edu/~yuejiang/papers/SDFDiff.pdf
- Code: https://github.com/YueJiang-nj/CVPR2020-SDFDiff

**On Translation Invariance in CNNs: Convolutional Layers can Exploit Absolute Spatial Location**

- Paper: https://arxiv.org/abs/2003.07064
- Code: https://github.com/oskyhn/CNNs-Without-Borders

**GhostNet: More Features from Cheap Operations**

- Paper: https://arxiv.org/abs/1911.11907
- Code: https://github.com/iamhankai/ghostnet

**AdderNet: Do We Really Need Multiplications in Deep Learning?**

- Paper: https://arxiv.org/abs/1912.13200
- Code: https://github.com/huawei-noah/AdderNet

**Deep Image Harmonization via Domain Verification**

- Paper: https://arxiv.org/abs/1911.13239
- Code: https://github.com/bcmi/Image_Harmonization_Datasets

**Blurry Video Frame Interpolation**

- Paper: https://arxiv.org/abs/2002.12259
- Code: https://github.com/laomao0/BIN

**Extremely Dense Point Correspondences using a Learned Feature Descriptor**

- Paper: https://arxiv.org/abs/2003.00619
- Code: https://github.com/lppllppl920/DenseDescriptorLearning-Pytorch

**Filter Grafting for Deep Neural Networks**

- Paper: https://arxiv.org/abs/2001.05868
- Code: https://github.com/fxmeng/filter-grafting
- Paper interpretation (Chinese): https://www.zhihu.com/question/372070853/answer/1041569335

**Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation**
- Paper: https://arxiv.org/abs/2003.02824
- Code: https://github.com/cmhungsteve/SSTDA

**Detecting Attended Visual Targets in Video**

- Paper: https://arxiv.org/abs/2003.02501
- Code: https://github.com/ejcgt/attention-target-detection

**Deep Image Spatial Transformation for Person Image Generation**

- Paper: https://arxiv.org/abs/2003.00696
- Code: https://github.com/RenYurui/Global-Flow-Local-Attention

**Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications**

- Paper: https://arxiv.org/abs/2003.01455
- Code: https://github.com/bbrattoli/ZeroShotVideoClassification

https://github.com/charlesCXK/3D-SketchAware-SSC

https://github.com/Anonymous20192020/Anonymous_CVPR5767

https://github.com/avirambh/ScopeFlow

https://github.com/csbhr/CDVD-TSP

https://github.com/ymcidence/TBH

https://github.com/yaoyao-liu/mnemonics

https://github.com/meder411/Tangent-Images

https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch

https://github.com/sjmoran/deep_local_parametric_filters

https://github.com/bermanmaxim/AOWS

https://github.com/dc3ea9f/look-into-object

# 不确定中没中(Acceptance Unconfirmed)

**FADNet: A Fast and Accurate Network for Disparity Estimation**

- Paper: Not released yet
- Code: https://github.com/HKBU-HPML/FADNet

https://github.com/rFID-submit/RandomFID: acceptance unconfirmed

https://github.com/JackSyu/AE-MSR: acceptance unconfirmed

https://github.com/fastconvnets/cvpr2020: acceptance unconfirmed

https://github.com/aimagelab/meshed-memory-transformer: acceptance unconfirmed

https://github.com/TWSFar/CRGNet: acceptance unconfirmed

https://github.com/CVPR-2020/CDARTS: acceptance unconfirmed
https://github.com/anucvml/ddn-cvprw2020: acceptance unconfirmed

https://github.com/dl-model-recommend/model-trust: acceptance unconfirmed

https://github.com/apratimbhattacharyya18/CVPR-2020-Corr-Prior: acceptance unconfirmed

https://github.com/onetcvpr/O-Net: acceptance unconfirmed

https://github.com/502463708/Microcalcification_Detection: acceptance unconfirmed

https://github.com/anonymous-for-review/cvpr-2020-deep-smoke-machine: acceptance unconfirmed

https://github.com/anonymous-for-review/cvpr-2020-smoke-recognition-dataset: acceptance unconfirmed

https://github.com/cvpr-nonrigid/dataset: acceptance unconfirmed

https://github.com/theFool32/PPBA: acceptance unconfirmed

https://github.com/Realtime-Action-Recognition/Realtime-Action-Recognition

--------------------------------------------------------------------------------
/CVPR2022-Papers-with-Code.md:
--------------------------------------------------------------------------------

# CVPR 2022 论文和开源项目合集(Papers with Code)

[CVPR 2022](https://cvpr2022.thecvf.com/) papers and open-source projects (papers with code)!

CVPR 2022 accepted paper list (IDs): https://drive.google.com/file/d/15JFhfPboKdUcIH9LdbCMUFmGq_JhaxhC/view

> Note 1: You are welcome to open issues sharing CVPR 2022 papers and open-source projects!
>
> Note 2: For papers from previous top CV conferences, as well as other high-quality CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision
>
> - [CVPR 2019](CVPR2019-Papers-with-Code.md)
> - [CVPR 2020](CVPR2020-Papers-with-Code.md)
> - [CVPR 2021](CVPR2021-Papers-with-Code.md)

If you want to keep up with the latest and best CV papers, open-source projects, and learning resources, you are welcome to scan the QR code below and join the CVer academic group (CVer学术交流群), to learn and improve together!

![](CVer学术交流群.png)

## 【CVPR 2022 论文开源目录】

- [Backbone](#Backbone)
- [CLIP](#CLIP)
- [GAN](#GAN)
- [GNN](#GNN)
- [MLP](#MLP)
- [NAS](#NAS)
- [OCR](#OCR)
- [NeRF](#NeRF)
- [3D Face](#3D-Face)
- [长尾分布(Long-Tail)](#Long-Tail)
- [Visual Transformer](#Visual-Transformer)
- [视觉和语言(Vision-Language)](#VL)
- [自监督学习(Self-supervised Learning)](#SSL)
- [数据增强(Data Augmentation)](#DA)
- [知识蒸馏(Knowledge Distillation)](#KD)
- [目标检测(Object Detection)](#Object-Detection)
- [目标跟踪(Visual Tracking)](#VT)
- [语义分割(Semantic Segmentation)](#Semantic-Segmentation)
- [实例分割(Instance Segmentation)](#Instance-Segmentation)
- [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation)
- [小样本分类(Few-Shot Classification)](#FFC)
- [小样本分割(Few-Shot Segmentation)](#FFS)
- [图像抠图(Image Matting)](#Matting)
- [视频理解(Video Understanding)](#VU)
- [图像编辑(Image Editing)](#Image-Editing)
- [Low-level Vision](#LLV)
- [超分辨率(Super-Resolution)](#Super-Resolution)
- [去模糊(Deblur)](#Deblur)
- [3D点云(3D Point Cloud)](#3D-Point-Cloud)
- [3D目标检测(3D Object Detection)](#3D-Object-Detection)
- [3D语义分割(3D Semantic Segmentation)](#3DSS)
- [3D目标跟踪(3D Object Tracking)](#3D-Object-Tracking)
- [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation)
- [3D语义场景补全(3D Semantic Scene Completion)](#3DSSC)
- [3D重建(3D Reconstruction)](#3D-R)
- [行人重识别(Person Re-identification)](#ReID)
- [伪装物体检测(Camouflaged Object Detection)](#COD)
- [深度估计(Depth Estimation)](#Depth-Estimation)
- [立体匹配(Stereo 
Matching)](#Stereo-Matching)
- [特征匹配(Feature Matching)](#FM)
- [车道线检测(Lane Detection)](#Lane-Detection)
- [光流估计(Optical Flow Estimation)](#Optical-Flow-Estimation)
- [图像修复(Image Inpainting)](#Image-Inpainting)
- [图像检索(Image Retrieval)](#Image-Retrieval)
- [人脸识别(Face Recognition)](#Face-Recognition)
- [人群计数(Crowd Counting)](#Crowd-Counting)
- [医学图像(Medical Image)](#Medical-Image)
- [视频生成(Video Generation)](#Video-Generation)
- [场景图生成(Scene Graph Generation)](#Scene-Graph-Generation)
- [参考视频目标分割(Referring Video Object Segmentation)](#R-VOS)
- [步态识别(Gait Recognition)](#GR)
- [风格迁移(Style Transfer)](#ST)
- [异常检测(Anomaly Detection)](#AD)
- [对抗样本(Adversarial Examples)](#AE)
- [弱监督物体检测(Weakly Supervised Object Localization)](#WSOL)
- [雷达目标检测(Radar Object Detection)](#ROD)
- [高光谱图像重建(Hyperspectral Image Reconstruction)](#HSI)
- [图像拼接(Image Stitching)](#Image-Stitching)
- [水印(Watermarking)](#Watermarking)
- [Action Counting](#AC)
- [Grounded Situation Recognition](#GSR)
- [Zero-shot Learning](#ZSL)
- [DeepFakes](#DeepFakes)
- [数据集(Datasets)](#Datasets)
- [新任务(New Tasks)](#New-Tasks)
- [其他(Others)](#Others)

# Backbone

**A ConvNet for the 2020s**

- Paper: https://arxiv.org/abs/2201.03545
- Code: https://github.com/facebookresearch/ConvNeXt
- Chinese interpretation: https://mp.weixin.qq.com/s/Xg5wPYExnvTqRo6s5-2cAw

**Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs**

- Paper: https://arxiv.org/abs/2203.06717
- Code: https://github.com/megvii-research/RepLKNet
- Code2: https://github.com/DingXiaoH/RepLKNet-pytorch
- Chinese interpretation: https://mp.weixin.qq.com/s/_qXyIQut-JRW6VvsjaQlFg

**MPViT: Multi-Path Vision Transformer for Dense Prediction**

- Paper: https://arxiv.org/abs/2112.11010
- Code: 
https://github.com/youngwanLEE/MPViT
- Chinese interpretation: https://mp.weixin.qq.com/s/Q9-crEOz5IYzZaNoq8oXfg

**Mobile-Former: Bridging MobileNet and Transformer**

- Paper: https://arxiv.org/abs/2108.05895
- Code: None
- Chinese interpretation: https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ

**MetaFormer is Actually What You Need for Vision**

- Paper: https://arxiv.org/abs/2111.11418
- Code: https://github.com/sail-sg/poolformer

**Shunted Self-Attention via Multi-Scale Token Aggregation**

- Paper(Oral): https://arxiv.org/abs/2111.15193
- Code: https://github.com/OliverRensu/Shunted-Transformer

**TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing**

- Paper: http://arxiv.org/abs/2203.10489
- Code: https://github.com/JierunChen/TVConv

**Learned Queries for Efficient Local Attention**

- Paper(Oral): https://arxiv.org/abs/2112.11435
- Code: https://github.com/moabarar/qna

**RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality**

- Paper: https://arxiv.org/abs/2112.11081
- Code: https://github.com/DingXiaoH/RepMLP

# CLIP

**HairCLIP: Design Your Hair by Text and Reference Image**

- Paper: https://arxiv.org/abs/2112.05142
- Code: https://github.com/wty-ustc/HairCLIP

**PointCLIP: Point Cloud Understanding by CLIP**

- Paper: https://arxiv.org/abs/2112.02413
- Code: https://github.com/ZrrSkywalker/PointCLIP

**Blended Diffusion for Text-driven Editing of Natural Images**

- Paper: https://arxiv.org/abs/2111.14818
- Code: https://github.com/omriav/blended-diffusion

# GAN

**SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing**

- Homepage: 
https://semanticstylegan.github.io/
- Paper: https://arxiv.org/abs/2112.02236
- Demo: https://semanticstylegan.github.io/videos/demo.mp4

**Style Transformer for Image Inversion and Editing**

- Paper: https://arxiv.org/abs/2203.07932
- Code: https://github.com/sapphire497/style-transformer

**Unsupervised Image-to-Image Translation with Generative Prior**

- Homepage: https://www.mmlab-ntu.com/project/gpunit/
- Paper: https://arxiv.org/abs/2204.03641
- Code: https://github.com/williamyang1991/GP-UNIT

**StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2**

- Homepage: https://universome.github.io/stylegan-v
- Paper: https://arxiv.org/abs/2112.14683
- Code: https://github.com/universome/stylegan-v

**OSSGAN: Open-set Semi-supervised Image Generation**

- Paper: https://arxiv.org/abs/2204.14249
- Code: https://github.com/raven38/OSSGAN

**Neural Texture Extraction and Distribution for Controllable Person Image Synthesis**

- Paper: https://arxiv.org/abs/2204.06160
- Code: https://github.com/RenYurui/Neural-Texture-Extraction-Distribution

# GNN

**OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks**

- Paper: https://wanyu-lin.github.io/assets/publications/wanyu-cvpr2022.pdf
- Code: https://github.com/WanyuGroup/CVPR2022-OrphicX

# MLP

**RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality**

- Paper: https://arxiv.org/abs/2112.11081
- Code: https://github.com/DingXiaoH/RepMLP

# NAS

**β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search**

- Paper: https://arxiv.org/abs/2203.01665
- Code: https://github.com/Sunshine-Ye/Beta-DARTS
**ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior**

- Paper: https://arxiv.org/abs/2111.15362
- Code: None

# OCR

**SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition**

- Paper: https://arxiv.org/abs/2203.10209
- Code: https://github.com/mxin262/SwinTextSpotter

# NeRF

**Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields**

- Homepage: https://jonbarron.info/mipnerf360/
- Paper: https://arxiv.org/abs/2111.12077
- Demo: https://youtu.be/YStDS2-Ln1s

**Point-NeRF: Point-based Neural Radiance Fields**

- Homepage: https://xharlie.github.io/projects/project_sites/pointnerf/
- Paper: https://arxiv.org/abs/2201.08845
- Code: https://github.com/Xharlie/point-nerf

**NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images**

- Paper: https://arxiv.org/abs/2111.13679
- Homepage: https://bmild.github.io/rawnerf/
- Demo: https://www.youtube.com/watch?v=JtBS4KBcKVc

**Urban Radiance Fields**

- Homepage: https://urban-radiance-fields.github.io/
- Paper: https://arxiv.org/abs/2111.14643
- Demo: https://youtu.be/qGlq5DZT6uc

**Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation**

- Paper: https://arxiv.org/abs/2202.13162
- Code: https://github.com/HexagonPrime/Pix2NeRF

**HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video**

- Homepage: https://grail.cs.washington.edu/projects/humannerf/
- Paper: https://arxiv.org/abs/2201.04127
- Demo: https://youtu.be/GM-RoZEymmw

# 3D Face

**ImFace: A Nonlinear 3D Morphable Face Model with Implicit Neural 
Representations**

- Paper: https://arxiv.org/abs/2203.14510
- Code: https://github.com/MingwuZheng/ImFace

# 长尾分布(Long-Tail)

**Retrieval Augmented Classification for Long-Tail Visual Recognition**

- Paper: https://arxiv.org/abs/2202.11233
- Code: None

# Visual Transformer

## Backbone

**MPViT: Multi-Path Vision Transformer for Dense Prediction**

- Paper: https://arxiv.org/abs/2112.11010
- Code: https://github.com/youngwanLEE/MPViT

**MetaFormer is Actually What You Need for Vision**

- Paper: https://arxiv.org/abs/2111.11418
- Code: https://github.com/sail-sg/poolformer

**Mobile-Former: Bridging MobileNet and Transformer**

- Paper: https://arxiv.org/abs/2108.05895
- Code: None
- Chinese interpretation: https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ

**Shunted Self-Attention via Multi-Scale Token Aggregation**

- Paper(Oral): https://arxiv.org/abs/2111.15193
- Code: https://github.com/OliverRensu/Shunted-Transformer

**Learned Queries for Efficient Local Attention**

- Paper(Oral): https://arxiv.org/abs/2112.11435
- Code: https://github.com/moabarar/qna

## 应用(Application)

**Language-based Video Editing via Multi-Modal Multi-Level Transformer**

- Paper: https://arxiv.org/abs/2104.01122
- Code: None

**MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video**

- Paper: https://arxiv.org/abs/2203.00859
- Code: None

**Embracing Single Stride 3D Object Detector with Sparse Transformer**

- Paper: https://arxiv.org/abs/2112.06375
- Code: https://github.com/TuSimple/SST
- Chinese interpretation: https://zhuanlan.zhihu.com/p/476056546

**Multi-class Token Transformer for Weakly Supervised Semantic Segmentation**
- Paper: https://arxiv.org/abs/2203.02891
- Code: https://github.com/xulianuwa/MCTformer

**Spatio-temporal Relation Modeling for Few-shot Action Recognition**

- Paper: https://arxiv.org/abs/2112.05132
- Code: https://github.com/Anirudh257/strm

**Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction**

- Paper: https://arxiv.org/abs/2111.07910
- Code: https://github.com/caiyuanhao1998/MST

**Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling**

- Homepage: https://point-bert.ivg-research.xyz/
- Paper: https://arxiv.org/abs/2111.14819
- Code: https://github.com/lulutang0608/Point-BERT

**GroupViT: Semantic Segmentation Emerges from Text Supervision**

- Homepage: https://jerryxu.net/GroupViT/
- Paper: https://arxiv.org/abs/2202.11094
- Demo: https://youtu.be/DtJsWIUTW-Y

**Restormer: Efficient Transformer for High-Resolution Image Restoration**

- Paper: https://arxiv.org/abs/2111.09881
- Code: https://github.com/swz30/Restormer

**Splicing ViT Features for Semantic Appearance Transfer**

- Homepage: https://splice-vit.github.io/
- Paper: https://arxiv.org/abs/2201.00424
- Code: https://github.com/omerbt/Splice

**Self-supervised Video Transformer**

- Homepage: https://kahnchana.github.io/svt/
- Paper: https://arxiv.org/abs/2112.01514
- Code: https://github.com/kahnchana/svt

**Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers**

- Paper: https://arxiv.org/abs/2203.02664
- Code: https://github.com/rulixiang/afa

**Accelerating DETR Convergence via Semantic-Aligned Matching**

- Paper: https://arxiv.org/abs/2203.06883
- Code: 
https://github.com/ZhangGongjie/SAM-DETR

**DN-DETR: Accelerate DETR Training by Introducing Query DeNoising**

- Paper: https://arxiv.org/abs/2203.01305
- Code: https://github.com/FengLi-ust/DN-DETR
- Chinese interpretation: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w

**Style Transformer for Image Inversion and Editing**

- Paper: https://arxiv.org/abs/2203.07932
- Code: https://github.com/sapphire497/style-transformer

**MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer**

- Paper: https://arxiv.org/abs/2203.10981
- Code: https://github.com/kuanchihhuang/MonoDTR

**Mask Transfiner for High-Quality Instance Segmentation**

- Paper: https://arxiv.org/abs/2111.13673
- Code: https://github.com/SysCV/transfiner

**Language as Queries for Referring Video Object Segmentation**

- Paper: https://arxiv.org/abs/2201.00487
- Code: https://github.com/wjn922/ReferFormer
- Chinese interpretation: https://mp.weixin.qq.com/s/MkQT8QWSYoYVhJ1RSF6oPQ

**X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning**

- Paper: https://arxiv.org/abs/2203.00843
- Code: https://github.com/CurryYuan/X-Trans2Cap

**AdaMixer: A Fast-Converging Query-Based Object Detector**

- Paper(Oral): https://arxiv.org/abs/2203.16507
- Code: https://github.com/MCG-NJU/AdaMixer

**Omni-DETR: Omni-Supervised Object Detection with Transformers**

- Paper: https://arxiv.org/abs/2203.16089
- Code: https://github.com/amazon-research/omni-detr

**SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition**

- Paper: https://arxiv.org/abs/2203.10209
- Code: https://github.com/mxin262/SwinTextSpotter

**TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for 
Repetitive Action Counting**

- Paper(Oral): https://arxiv.org/abs/2204.01018
- Code: https://github.com/SvipRepetitionCounting/TransRAC

**Collaborative Transformers for Grounded Situation Recognition**

- Paper: https://arxiv.org/abs/2203.16518
- Code: https://github.com/jhcho99/CoFormer

**NFormer: Robust Person Re-identification with Neighbor Transformer**

- Paper: https://arxiv.org/abs/2204.09331
- Code: https://github.com/haochenheheda/NFormer

**Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation**

- Paper: https://arxiv.org/abs/2201.06889
- Code: None

**Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer**

- Paper(Oral): https://arxiv.org/abs/2204.08680
- Code: https://github.com/zengwang430521/TCFormer

**A New Dataset and Transformer for Stereoscopic Video Super-Resolution**

- Paper: https://arxiv.org/abs/2204.10039
- Code: https://github.com/H-deep/Trans-SVSR/
- Dataset: http://shorturl.at/mpwGX

**Safe Self-Refinement for Transformer-based Domain Adaptation**

- Paper: https://arxiv.org/abs/2204.07683
- Code: https://github.com/tsun/SSRT

**Fast Point Transformer**

- Homepage: http://cvlab.postech.ac.kr/research/FPT/
- Paper: https://arxiv.org/abs/2112.04702
- Code: https://github.com/POSTECH-CVLab/FastPointTransformer

**Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval**

- Paper: https://arxiv.org/abs/2204.09730
- Code: https://github.com/mshukor/TFood

**DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation**

- Paper: https://arxiv.org/abs/2111.14887
- Code: https://github.com/lhoyer/DAFormer
**Stratified Transformer for 3D Point Cloud Segmentation**

- Paper: https://arxiv.org/pdf/2203.14508.pdf
- Code: https://github.com/dvlab-research/Stratified-Transformer

# 视觉和语言(Vision-Language)

**Conditional Prompt Learning for Vision-Language Models**

- Paper: https://arxiv.org/abs/2203.05557
- Code: https://github.com/KaiyangZhou/CoOp

**Bridging Video-text Retrieval with Multiple Choice Questions**

- Paper: https://arxiv.org/abs/2201.04850
- Code: https://github.com/TencentARC/MCQ

**Visual Abductive Reasoning**

- Paper: https://arxiv.org/abs/2203.14040
- Code: https://github.com/leonnnop/VAR

# 自监督学习(Self-supervised Learning)

**UniVIP: A Unified Framework for Self-Supervised Visual Pre-training**

- Paper: https://arxiv.org/abs/2203.06965
- Code: None

**Crafting Better Contrastive Views for Siamese Representation Learning**

- Paper: https://arxiv.org/abs/2202.03278
- Code: https://github.com/xyupeng/ContrastiveCrop
- Chinese interpretation: https://mp.weixin.qq.com/s/VTP9D5f7KG9vg30U9kVI2A

**HCSC: Hierarchical Contrastive Selective Coding**

- Homepage: https://github.com/gyfastas/HCSC
- Paper: https://arxiv.org/abs/2202.00455
- Chinese interpretation: https://mp.weixin.qq.com/s/jkYR8mYp-e645qk8kfPNKQ

**DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis**

- Paper: https://arxiv.org/abs/2204.10437
- Code: https://github.com/JLiangLab/DiRA

# 数据增强(Data Augmentation)

**TeachAugment: Data Augmentation Optimization Using Teacher Knowledge**

- Paper: https://arxiv.org/abs/2202.12513
- Code: https://github.com/DensoITLab/TeachAugment

**AlignMixup: Improving Representations By Interpolating Aligned 
Features**

- Paper: https://arxiv.org/abs/2103.15375
- Code: https://github.com/shashankvkt/AlignMixup_CVPR22

# 知识蒸馏(Knowledge Distillation)

**Decoupled Knowledge Distillation**

- Paper: https://arxiv.org/abs/2203.08679
- Code: https://github.com/megvii-research/mdistiller
- Chinese interpretation: https://mp.weixin.qq.com/s/-4AA0zKIXh9Ei9-vc5jOhw

# 目标检测(Object Detection)

**BoxeR: Box-Attention for 2D and 3D Transformers**

- Paper: https://arxiv.org/abs/2111.13087
- Code: https://github.com/kienduynguyen/BoxeR
- Chinese interpretation: https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w

**DN-DETR: Accelerate DETR Training by Introducing Query DeNoising**

- Paper: https://arxiv.org/abs/2203.01305
- Code: https://github.com/FengLi-ust/DN-DETR
- Chinese interpretation: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w

**Accelerating DETR Convergence via Semantic-Aligned Matching**

- Paper: https://arxiv.org/abs/2203.06883
- Code: https://github.com/ZhangGongjie/SAM-DETR

**Localization Distillation for Dense Object Detection**

- Paper: https://arxiv.org/abs/2102.12252
- Code: https://github.com/HikariTJU/LD
- Chinese interpretation: https://mp.weixin.qq.com/s/dxss8RjJH283h6IbPCT9vg

**Focal and Global Knowledge Distillation for Detectors**

- Paper: https://arxiv.org/abs/2111.11837
- Code: https://github.com/yzd-v/FGD
- Chinese interpretation: https://mp.weixin.qq.com/s/yDkreTudC8JL2V2ETsADwQ

**A Dual Weighting Label Assignment Scheme for Object Detection**

- Paper: https://arxiv.org/abs/2203.09730
- Code: https://github.com/strongwolf/DW

**AdaMixer: A Fast-Converging Query-Based Object Detector**

- Paper(Oral): https://arxiv.org/abs/2203.16507
- Code: https://github.com/MCG-NJU/AdaMixer
**Omni-DETR: Omni-Supervised Object Detection with Transformers**

- Paper: https://arxiv.org/abs/2203.16089
- Code: https://github.com/amazon-research/omni-detr

**SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection**

- Paper(Oral): https://arxiv.org/abs/2203.06398
- Code: https://github.com/CityU-AIM-Group/SIGMA

## 半监督目标检测(Semi-Supervised Object Detection)

**Dense Learning based Semi-Supervised Object Detection**

- Paper: https://arxiv.org/abs/2204.07300
- Code: https://github.com/chenbinghui1/DSL

# 目标跟踪(Visual Tracking)

**Correlation-Aware Deep Tracking**

- Paper: https://arxiv.org/abs/2203.01666
- Code: None

**TCTrack: Temporal Contexts for Aerial Tracking**

- Paper: https://arxiv.org/abs/2203.01885
- Code: https://github.com/vision4robotics/TCTrack

## 多模态目标跟踪(Multi-Modal Object Tracking)

**Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline**

- Homepage: https://zhang-pengyu.github.io/DUT-VTUAV/
- Paper: https://arxiv.org/abs/2204.04120

## 多目标跟踪(Multi-Object Tracking)

**Learning of Global Objective for Network Flow in Multi-Object Tracking**

- Paper: https://arxiv.org/abs/2203.16210
- Code: None

**DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion**

- Homepage: https://dancetrack.github.io
- Paper: https://arxiv.org/abs/2111.14690
- Dataset: https://github.com/DanceTrack/DanceTrack

# 语义分割(Semantic Segmentation)

**Novel Class Discovery in Semantic Segmentation**

- Homepage: https://ncdss.github.io/
- Paper: https://arxiv.org/abs/2112.01900
- Code: https://github.com/HeliosZhao/NCDSS

**Deep Hierarchical Semantic Segmentation**

- Paper: https://arxiv.org/abs/2203.14335
- Code: 
https://github.com/0liliulei/HieraSeg 695 | 696 | **Rethinking Semantic Segmentation: A Prototype View** 697 | 698 | - Paper(Oral): https://arxiv.org/abs/2203.15102 699 | - Code: https://github.com/tfzhou/ProtoSeg 700 | 701 | ## 弱监督语义分割(Weakly-Supervised Semantic Segmentation) 702 | 703 | **Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation** 704 | 705 | - Paper: https://arxiv.org/abs/2203.00962 706 | - Code: https://github.com/zhaozhengChen/ReCAM 707 | 708 | **Multi-class Token Transformer for Weakly Supervised Semantic Segmentation** 709 | 710 | - Paper: https://arxiv.org/abs/2203.02891 711 | - Code: https://github.com/xulianuwa/MCTformer 712 | 713 | **Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers** 714 | 715 | - Paper: https://arxiv.org/abs/2203.02664 716 | - Code: https://github.com/rulixiang/afa 717 | 718 | **CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation** 719 | 720 | - Paper: https://arxiv.org/abs/2203.02668 721 | - Code: https://github.com/CVI-SZU/CLIMS 722 | 723 | **CCAM: Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation** 724 | 725 | - Paper: https://arxiv.org/abs/2203.13505 726 | - Code: https://github.com/CVI-SZU/CCAM 727 | 728 | **FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation** 729 | 730 | - Homepage: http://cvlab.postech.ac.kr/research/FIFO/ 731 | - Paper(Oral): https://arxiv.org/abs/2204.01587 732 | - Code: https://github.com/sohyun-l/FIFO 733 | 734 | **Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation** 735 | 736 | - Paper: https://arxiv.org/abs/2203.09653 737 | - Code: https://github.com/maeve07/RCA.git 738 | 739 | ## 半监督语义分割(Semi-Supervised Semantic Segmentation) 740 | 741 | **ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation** 742 | 743 | - Paper: https://arxiv.org/abs/2106.05095 744 | - Code: https://github.com/LiheYoung/ST-PlusPlus 745 | - 
中文解读:https://mp.weixin.qq.com/s/knSnlebdtEnmrkChGM_0CA 746 | 747 | **Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels** 748 | 749 | - Homepage: https://haochen-wang409.github.io/U2PL/ 750 | - Paper: https://arxiv.org/abs/2203.03884 751 | - Code: https://github.com/Haochen-Wang409/U2PL 752 | - 中文解读: https://mp.weixin.qq.com/s/-08olqE7np8A1XQzt6HAgQ 753 | 754 | **Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation** 755 | 756 | - Paper: https://arxiv.org/pdf/2111.12903.pdf 757 | - Code: https://github.com/yyliu01/PS-MT 758 | 759 | ## 域自适应语义分割(Domain Adaptive Semantic Segmentation) 760 | 761 | **Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation** 762 | 763 | - Paper: https://arxiv.org/abs/2111.12940 764 | - Code: https://github.com/BIT-DA/RIPU 765 | 766 | **DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation** 767 | 768 | - Paper: https://arxiv.org/abs/2111.14887 769 | - Code: https://github.com/lhoyer/DAFormer 770 | 771 | ## 无监督语义分割(Unsupervised Semantic Segmentation) 772 | 773 | **GroupViT: Semantic Segmentation Emerges from Text Supervision** 774 | 775 | - Homepage: https://jerryxu.net/GroupViT/ 776 | - Paper: https://arxiv.org/abs/2202.11094 777 | - Demo: https://youtu.be/DtJsWIUTW-Y 778 | 779 | ## 少样本语义分割(Few-Shot Semantic Segmentation) 780 | 781 | **Generalized Few-shot Semantic Segmentation** 782 | 783 | - Paper: https://jiaya.me/papers/cvpr22_zhuotao.pdf 784 | - Code: https://github.com/dvlab-research/GFS-Seg 785 | 786 | 787 | 788 | # 实例分割(Instance Segmentation) 789 | 790 | **BoxeR: Box-Attention for 2D and 3D Transformers** 791 | - Paper: https://arxiv.org/abs/2111.13087 792 | - Code: https://github.com/kienduynguyen/BoxeR 793 | - 中文解读:https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w 794 | 795 | **E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation** 796 | 797 | - Paper: https://arxiv.org/abs/2203.04074 798 | - Code: 
https://github.com/zhang-tao-whu/e2ec 799 | 800 | **Mask Transfiner for High-Quality Instance Segmentation** 801 | 802 | - Paper: https://arxiv.org/abs/2111.13673 803 | - Code: https://github.com/SysCV/transfiner 804 | 805 | **Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity** 806 | 807 | - Homepage: https://sites.google.com/view/generic-grouping/ 808 | 809 | - Paper: https://arxiv.org/abs/2204.06107 810 | - Code: https://github.com/facebookresearch/Generic-Grouping 811 | 812 | ## 自监督实例分割(Self-Supervised Instance Segmentation) 813 | 814 | **FreeSOLO: Learning to Segment Objects without Annotations** 815 | 816 | - Paper: https://arxiv.org/abs/2202.12181 817 | - Code: https://github.com/NVlabs/FreeSOLO 818 | 819 | ## 视频实例分割(Video Instance Segmentation) 820 | 821 | **Efficient Video Instance Segmentation via Tracklet Query and Proposal** 822 | 823 | - Homepage: https://jialianwu.com/projects/EfficientVIS.html 824 | - Paper: https://arxiv.org/abs/2203.01853 825 | - Demo: https://youtu.be/sSPMzgtMKCE 826 | 827 | **Temporally Efficient Vision Transformer for Video Instance Segmentation** 828 | 829 | - Paper: https://arxiv.org/abs/2204.08412 830 | - Code: https://github.com/hustvl/TeViT 831 | 832 | 833 | 834 | # 全景分割(Panoptic Segmentation) 835 | 836 | **Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers** 837 | 838 | - Paper: https://arxiv.org/abs/2109.03814 839 | - Code: https://github.com/zhiqi-li/Panoptic-SegFormer 840 | 841 | **Large-scale Video Panoptic Segmentation in the Wild: A Benchmark** 842 | 843 | - Paper: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset/blob/main/VIPSeg2022.pdf 844 | - Code: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset 845 | - Dataset: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset 846 | 847 | 848 | 849 | # 小样本分类(Few-Shot Classification) 850 | 851 | **Integrative Few-Shot Learning for Classification and Segmentation** 852 | 853 | - Paper: https://arxiv.org/abs/2203.15712 854 | - Code: https://github.com/dahyun-kang/ifsl 855 | 
856 | **Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification** 857 | 858 | - Paper: https://arxiv.org/abs/2106.05517 859 | - Code: https://github.com/LouieYang/MCL 860 | 861 | 862 | 863 | # 小样本分割(Few-Shot Segmentation) 864 | 865 | **Learning What Not to Segment: A New Perspective on Few-Shot Segmentation** 866 | 867 | - Paper: https://arxiv.org/abs/2203.07615 868 | - Code: https://github.com/chunbolang/BAM 869 | 870 | **Integrative Few-Shot Learning for Classification and Segmentation** 871 | 872 | - Paper: https://arxiv.org/abs/2203.15712 873 | - Code: https://github.com/dahyun-kang/ifsl 874 | 875 | **Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation** 876 | 877 | - Paper: https://arxiv.org/abs/2204.10638 878 | - Code: None 879 | 880 | 881 | 882 | # 图像抠图(Image Matting) 883 | 884 | **Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation** 885 | 886 | - Paper: https://arxiv.org/abs/2201.06889 887 | - Code: None 888 | 889 | 890 | 891 | # 视频理解(Video Understanding) 892 | 893 | **Self-supervised Video Transformer** 894 | 895 | - Homepage: https://kahnchana.github.io/svt/ 896 | - Paper: https://arxiv.org/abs/2112.01514 897 | - Code: https://github.com/kahnchana/svt 898 | 899 | **TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting** 900 | 901 | - Paper(Oral): https://arxiv.org/abs/2204.01018 902 | - Code: https://github.com/SvipRepetitionCounting/TransRAC 903 | 904 | **FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment** 905 | 906 | - Paper(Oral): https://arxiv.org/abs/2204.03646 907 | 908 | - Dataset: https://github.com/xujinglin/FineDiving 909 | - Code: https://github.com/xujinglin/FineDiving 910 | - 中文解读:https://mp.weixin.qq.com/s/8t12Y34eMNwvJr8PeryWXg 911 | 912 | **Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition** 913 | 914 | - Paper(Oral): https://arxiv.org/abs/2204.02148 915 
| - Code: None 916 | 917 | ## 行为识别(Action Recognition) 918 | 919 | **Spatio-temporal Relation Modeling for Few-shot Action Recognition** 920 | 921 | - Paper: https://arxiv.org/abs/2112.05132 922 | - Code: https://github.com/Anirudh257/strm 923 | 924 | ## 动作检测(Action Detection) 925 | 926 | **End-to-End Semi-Supervised Learning for Video Action Detection** 927 | 928 | - Paper: https://arxiv.org/abs/2203.04251 929 | - Code: None 930 | 931 | 932 | 933 | # 图像编辑(Image Editing) 934 | 935 | **Style Transformer for Image Inversion and Editing** 936 | 937 | - Paper: https://arxiv.org/abs/2203.07932 938 | - Code: https://github.com/sapphire497/style-transformer 939 | 940 | **Blended Diffusion for Text-driven Editing of Natural Images** 941 | 942 | - Paper: https://arxiv.org/abs/2111.14818 943 | - Code: https://github.com/omriav/blended-diffusion 944 | 945 | **SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing** 946 | 947 | - Homepage: https://semanticstylegan.github.io/ 948 | 949 | - Paper: https://arxiv.org/abs/2112.02236 950 | - Demo: https://semanticstylegan.github.io/videos/demo.mp4 951 | 952 | 953 | 954 | # Low-level Vision 955 | 956 | **ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior** 957 | 958 | - Paper: https://arxiv.org/abs/2111.15362 959 | - Code: None 960 | 961 | **Restormer: Efficient Transformer for High-Resolution Image Restoration** 962 | 963 | - Paper: https://arxiv.org/abs/2111.09881 964 | - Code: https://github.com/swz30/Restormer 965 | 966 | **Robust Equivariant Imaging: a fully unsupervised framework for learning to image from noisy and partial measurements** 967 | 968 | - Paper(Oral): https://arxiv.org/abs/2111.12855 969 | - Code: https://github.com/edongdongchen/REI 970 | 971 | 972 | 973 | # 超分辨率(Super-Resolution) 974 | 975 | ## 图像超分辨率(Image Super-Resolution) 976 | 977 | **Learning the Degradation Distribution for Blind Image Super-Resolution** 978 | 979 | - Paper: 
https://arxiv.org/abs/2203.04962 980 | - Code: https://github.com/greatlog/UnpairedSR 981 | 982 | ## 视频超分辨率(Video Super-Resolution) 983 | 984 | **BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment** 985 | 986 | - Paper: https://arxiv.org/abs/2104.13371 987 | - Code: https://github.com/open-mmlab/mmediting 988 | - Code: https://github.com/ckkelvinchan/BasicVSR_PlusPlus 989 | - 中文解读:https://mp.weixin.qq.com/s/HZTwYfphixyLHxlbCAxx4g 990 | 991 | **Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling** 992 | 993 | - Paper: https://arxiv.org/abs/2204.07114 994 | - Code: None 995 | 996 | **A New Dataset and Transformer for Stereoscopic Video Super-Resolution** 997 | 998 | - Paper: https://arxiv.org/abs/2204.10039 999 | - Code: https://github.com/H-deep/Trans-SVSR/ 1000 | - Dataset: http://shorturl.at/mpwGX 1001 | 1002 | 1003 | 1004 | # 去模糊(Deblur) 1005 | 1006 | ## 图像去模糊(Image Deblur) 1007 | 1008 | **Learning to Deblur using Light Field Generated and Real Defocus Images** 1009 | 1010 | - Homepage: http://lyruan.com/Projects/DRBNet/ 1011 | - Paper(Oral): https://arxiv.org/abs/2204.00442 1012 | 1013 | - Code: https://github.com/lingyanruan/DRBNet 1014 | 1015 | 1016 | 1017 | # 3D点云(3D Point Cloud) 1018 | 1019 | **Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling** 1020 | 1021 | - Homepage: https://point-bert.ivg-research.xyz/ 1022 | 1023 | - Paper: https://arxiv.org/abs/2111.14819 1024 | - Code: https://github.com/lulutang0608/Point-BERT 1025 | 1026 | **A Unified Query-based Paradigm for Point Cloud Understanding** 1027 | 1028 | - Paper: https://arxiv.org/abs/2203.01252 1029 | - Code: None 1030 | 1031 | **CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding** 1032 | 1033 | - Paper: https://arxiv.org/abs/2203.00680 1034 | - Code: https://github.com/MohamedAfham/CrossPoint 1035 | 1036 | **PointCLIP: Point Cloud Understanding by CLIP** 1037 | 
1038 | - Paper: https://arxiv.org/abs/2112.02413 1039 | - Code: https://github.com/ZrrSkywalker/PointCLIP 1040 | 1041 | **Fast Point Transformer** 1042 | 1043 | - Homepage: http://cvlab.postech.ac.kr/research/FPT/ 1044 | - Paper: https://arxiv.org/abs/2112.04702 1045 | - Code: https://github.com/POSTECH-CVLab/FastPointTransformer 1046 | 1047 | **RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds** 1048 | 1049 | - Paper: https://arxiv.org/abs/2205.11028 1050 | - Code: https://github.com/gxd1994/RCP 1051 | 1052 | **The Devil is in the Pose: Ambiguity-free 3D Rotation-invariant Learning via Pose-aware Convolution** 1053 | 1054 | - Paper: https://arxiv.org/abs/2205.15210 1055 | - Code: https://github.com/GostInShell/PaRI-Conv 1056 | 1057 | 1058 | 1059 | # 3D目标检测(3D Object Detection) 1060 | 1061 | **Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds** 1062 | 1063 | - Paper(Oral): https://arxiv.org/abs/2203.11139 1064 | 1065 | - Code: https://github.com/yifanzhang713/IA-SSD 1066 | 1067 | - Demo: https://www.youtube.com/watch?v=3jP2o9KXunA 1068 | 1069 | **BoxeR: Box-Attention for 2D and 3D Transformers** 1070 | - Paper: https://arxiv.org/abs/2111.13087 1071 | - Code: https://github.com/kienduynguyen/BoxeR 1072 | - 中文解读:https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w 1073 | 1074 | **Embracing Single Stride 3D Object Detector with Sparse Transformer** 1075 | 1076 | - Paper: https://arxiv.org/abs/2112.06375 1077 | 1078 | - Code: https://github.com/TuSimple/SST 1079 | 1080 | **Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes** 1081 | 1082 | - Paper: https://arxiv.org/abs/2011.12001 1083 | - Code: https://github.com/qq456cvb/CanonicalVoting 1084 | 1085 | **MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer** 1086 | 1087 | - Paper: https://arxiv.org/abs/2203.10981 1088 | - Code: https://github.com/kuanchihhuang/MonoDTR 1089 | 1090 | **HyperDet3D: Learning a 
Scene-conditioned 3D Object Detector** 1091 | 1092 | - Paper: https://arxiv.org/abs/2204.05599 1093 | - Code: None 1094 | 1095 | **OccAM's Laser: Occlusion-based Attribution Maps for 3D Object Detectors on LiDAR Data** 1096 | 1097 | - Paper: https://arxiv.org/abs/2204.06577 1098 | - Code: https://github.com/dschinagl/occam 1099 | 1100 | **DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection** 1101 | 1102 | - Homepage: https://thudair.baai.ac.cn/index 1103 | - Paper: https://arxiv.org/abs/2204.05575 1104 | - Code: https://github.com/AIR-THU/DAIR-V2X 1105 | 1106 | **Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions** 1107 | 1108 | - Homepage: https://ithaca365.mae.cornell.edu/ 1109 | 1110 | - Paper: https://arxiv.org/abs/2208.01166 1111 | 1112 | 1113 | 1114 | # 3D语义分割(3D Semantic Segmentation) 1115 | 1116 | **Scribble-Supervised LiDAR Semantic Segmentation** 1117 | 1118 | - Paper: https://arxiv.org/abs/2203.08537 1119 | - Dataset: https://github.com/ouenal/scribblekitti 1120 | 1121 | **Stratified Transformer for 3D Point Cloud Segmentation** 1122 | 1123 | - Paper: https://arxiv.org/pdf/2203.14508.pdf 1124 | - Code: https://github.com/dvlab-research/Stratified-Transformer 1125 | 1126 | # 3D实例分割(3D Instance Segmentation) 1127 | 1128 | **Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions** 1129 | 1130 | - Homepage: https://ithaca365.mae.cornell.edu/ 1131 | 1132 | - Paper: https://arxiv.org/abs/2208.01166 1133 | 1134 | 1135 | 1136 | # 3D目标跟踪(3D Object Tracking) 1137 | 1138 | **Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds** 1139 | 1140 | - Paper: https://arxiv.org/abs/2203.01730 1141 | - Code: https://github.com/Ghostish/Open3DSOT 1142 | 1143 | **PTTR: Relational 3D Point Cloud Object Tracking with Transformer** 1144 | 1145 | - Paper: https://arxiv.org/abs/2112.02857 1146 | - Code: 
https://github.com/Jasonkks/PTTR 1147 | 1148 | 1149 | 1150 | # 3D人体姿态估计(3D Human Pose Estimation) 1151 | 1152 | **MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation** 1153 | 1154 | - Paper: https://arxiv.org/abs/2111.12707 1155 | 1156 | - Code: https://github.com/Vegetebird/MHFormer 1157 | 1158 | - 中文解读: https://zhuanlan.zhihu.com/p/439459426 1159 | 1160 | **MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video** 1161 | 1162 | - Paper: https://arxiv.org/abs/2203.00859 1163 | - Code: None 1164 | 1165 | **Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation** 1166 | 1167 | - Paper: https://arxiv.org/abs/2203.07697 1168 | - Code: None 1169 | - 中文解读:https://mp.weixin.qq.com/s/L_F28IFLXvs5R4V9TTUpRw 1170 | 1171 | **BEV: Putting People in their Place: Monocular Regression of 3D People in Depth** 1172 | 1173 | - Homepage: https://arthur151.github.io/BEV/BEV.html 1174 | - Paper: https://arxiv.org/abs/2112.08274 1175 | - Code: https://github.com/Arthur151/ROMP 1176 | - Dataset: https://github.com/Arthur151/Relative_Human 1177 | - Demo: https://www.youtube.com/watch?v=Q62fj_6AxRI 1178 | 1179 | 1180 | 1181 | # 3D语义场景补全(3D Semantic Scene Completion) 1182 | 1183 | **MonoScene: Monocular 3D Semantic Scene Completion** 1184 | 1185 | - Paper: https://arxiv.org/abs/2112.00726 1186 | - Code: https://github.com/cv-rits/MonoScene 1187 | 1188 | 1189 | 1190 | # 3D重建(3D Reconstruction) 1191 | 1192 | **BANMo: Building Animatable 3D Neural Models from Many Casual Videos** 1193 | 1194 | - Homepage: https://banmo-www.github.io/ 1195 | - Paper: https://arxiv.org/abs/2112.12761 1196 | - Code: https://github.com/facebookresearch/banmo 1197 | - 中文解读:https://mp.weixin.qq.com/s/NMHP8-xWwrX40vpGx55Qew 1198 | 1199 | 1200 | 1201 | # 行人重识别(Person Re-identification) 1202 | 1203 | **NFormer: Robust Person Re-identification with Neighbor Transformer** 1204 | 1205 | - Paper: https://arxiv.org/abs/2204.09331 1206 | - Code: 
https://github.com/haochenheheda/NFormer 1207 | 1208 | 1209 | 1210 | # 伪装物体检测(Camouflaged Object Detection) 1211 | 1212 | **Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection** 1213 | 1214 | - Paper: https://arxiv.org/abs/2203.02688 1215 | - Code: https://github.com/lartpang/ZoomNet 1216 | 1217 | 1218 | 1219 | # 深度估计(Depth Estimation) 1220 | 1221 | ## 单目深度估计(Monocular Depth Estimation) 1222 | 1223 | **NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation** 1224 | 1225 | - Paper: https://arxiv.org/abs/2203.01502 1226 | - Code: None 1227 | 1228 | **OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion** 1229 | 1230 | - Paper: https://arxiv.org/abs/2203.00838 1231 | - Code: None 1232 | 1233 | **Toward Practical Self-Supervised Monocular Indoor Depth Estimation** 1234 | 1235 | - Paper: https://arxiv.org/abs/2112.02306 1236 | - Code: None 1237 | 1238 | **P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior** 1239 | 1240 | - Paper: https://arxiv.org/abs/2204.02091 1241 | - Code: https://github.com/SysCV/P3Depth 1242 | 1243 | **Multi-Frame Self-Supervised Depth with Transformers** 1244 | 1245 | - Homepage: https://sites.google.com/tri.global/depthformer 1246 | 1247 | - Paper: https://arxiv.org/abs/2204.07616 1248 | - Code: None 1249 | 1250 | 1251 | 1252 | # 立体匹配(Stereo Matching) 1253 | 1254 | **ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching** 1255 | 1256 | - Paper: https://arxiv.org/abs/2203.02146 1257 | - Code: https://github.com/gangweiX/ACVNet 1258 | 1259 | 1260 | 1261 | # 特征匹配(Feature Matching) 1262 | 1263 | **ClusterGNN: Cluster-based Coarse-to-Fine Graph Neural Network for Efficient Feature Matching** 1264 | 1265 | - Paper: https://arxiv.org/abs/2204.11700 1266 | - Code: None 1267 | 1268 | 1269 | 1270 | # 车道线检测(Lane Detection) 1271 | 1272 | **Rethinking Efficient Lane Detection via Curve Modeling** 1273 | 1274 | - Paper: https://arxiv.org/abs/2203.02431 1275 | - Code: 
https://github.com/voldemortX/pytorch-auto-drive 1276 | - Demo: https://user-images.githubusercontent.com/32259501/148680744-a18793cd-f437-461f-8c3a-b909c9931709.mp4 1277 | 1278 | **A Keypoint-based Global Association Network for Lane Detection** 1279 | 1280 | - Paper: https://arxiv.org/abs/2204.07335 1281 | - Code: https://github.com/Wolfwjs/GANet 1282 | 1283 | 1284 | 1285 | # 光流估计(Optical Flow Estimation) 1286 | 1287 | **Imposing Consistency for Optical Flow Estimation** 1288 | 1289 | - Paper: https://arxiv.org/abs/2204.07262 1290 | - Code: None 1291 | 1292 | **Deep Equilibrium Optical Flow Estimation** 1293 | 1294 | - Paper: https://arxiv.org/abs/2204.08442 1295 | - Code: https://github.com/locuslab/deq-flow 1296 | 1297 | **GMFlow: Learning Optical Flow via Global Matching** 1298 | 1299 | - Paper(Oral): https://arxiv.org/abs/2111.13680 1300 | - Code: https://github.com/haofeixu/gmflow 1301 | 1302 | 1303 | 1304 | # 图像修复(Image Inpainting) 1305 | 1306 | **Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding** 1307 | 1308 | - Paper: https://arxiv.org/abs/2203.00867 1309 | 1310 | - Code: https://github.com/DQiaole/ZITS_inpainting 1311 | 1312 | 1313 | 1314 | # 图像检索(Image Retrieval) 1315 | 1316 | **Correlation Verification for Image Retrieval** 1317 | 1318 | - Paper(Oral): https://arxiv.org/abs/2204.01458 1319 | - Code: https://github.com/sungonce/CVNet 1320 | 1321 | 1322 | 1323 | # 人脸识别(Face Recognition) 1324 | 1325 | **AdaFace: Quality Adaptive Margin for Face Recognition** 1326 | 1327 | - Paper(Oral): https://arxiv.org/abs/2204.00964 1328 | - Code: https://github.com/mk-minchul/AdaFace 1329 | 1330 | 1331 | 1332 | # 人群计数(Crowd Counting) 1333 | 1334 | **Leveraging Self-Supervision for Cross-Domain Crowd Counting** 1335 | 1336 | - Paper: https://arxiv.org/abs/2103.16291 1337 | - Code: None 1338 | 1339 | 1340 | 1341 | # 医学图像(Medical Image) 1342 | 1343 | **BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive 
Pseudo Labeling and Informative Active Annotation** 1344 | 1345 | - Paper: https://arxiv.org/abs/2203.02533 1346 | - Code: None 1347 | 1348 | **Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification** 1349 | 1350 | - Paper: https://arxiv.org/abs/2111.12918 1351 | - Code: https://github.com/FBLADL/ACPL 1352 | 1353 | **DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis** 1354 | 1355 | - Paper: https://arxiv.org/abs/2204.10437 1356 | 1357 | - Code: https://github.com/JLiangLab/DiRA 1358 | 1359 | 1360 | 1361 | # 视频生成(Video Generation) 1362 | 1363 | **StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2** 1364 | 1365 | - Homepage: https://universome.github.io/stylegan-v 1366 | - Paper: https://arxiv.org/abs/2112.14683 1367 | 1368 | - Code: https://github.com/universome/stylegan-v 1369 | 1370 | - Demo: https://kaust-cair.s3.amazonaws.com/stylegan-v/stylegan-v.mp4 1371 | 1372 | 1373 | 1374 | # 场景图生成(Scene Graph Generation) 1375 | 1376 | **SGTR: End-to-end Scene Graph Generation with Transformer** 1377 | 1378 | - Paper: https://arxiv.org/abs/2112.12970 1379 | - Code: None 1380 | 1381 | 1382 | 1383 | # 参考视频目标分割(Referring Video Object Segmentation) 1384 | 1385 | **Language as Queries for Referring Video Object Segmentation** 1386 | 1387 | - Paper: https://arxiv.org/abs/2201.00487 1388 | - Code: https://github.com/wjn922/ReferFormer 1389 | 1390 | **ReSTR: Convolution-free Referring Image Segmentation Using Transformers** 1391 | 1392 | - Paper: https://arxiv.org/abs/2203.16768 1393 | - Code: None 1394 | 1395 | 1396 | 1397 | # 步态识别(Gait Recognition) 1398 | 1399 | **Gait Recognition in the Wild with Dense 3D Representations and A Benchmark** 1400 | 1401 | - Homepage: https://gait3d.github.io/ 1402 | - Paper: https://arxiv.org/abs/2204.02569 1403 | - Code: https://github.com/Gait3D/Gait3D-Benchmark 1404 | 1405 | 1406 | 1407 | # 风格迁移(Style Transfer) 1408 | 1409 | 
**StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions** 1410 | 1411 | - Homepage: https://lukashoel.github.io/stylemesh/ 1412 | - Paper: https://arxiv.org/abs/2112.01530 1413 | 1414 | - Code: https://github.com/lukasHoel/stylemesh 1415 | - Demo: https://www.youtube.com/watch?v=ZqgiTLcNcks 1416 | 1417 | 1418 | 1419 | # 异常检测(Anomaly Detection) 1420 | 1421 | **UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection** 1422 | 1423 | - Paper: https://arxiv.org/abs/2111.08644 1424 | 1425 | - Dataset: https://github.com/lilygeorgescu/UBnormal 1426 | 1427 | **Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection** 1428 | 1429 | - Paper(Oral): https://arxiv.org/abs/2111.09099 1430 | - Code: https://github.com/ristea/sspcab 1431 | 1432 | 1433 | 1434 | # 对抗样本(Adversarial Examples) 1435 | 1436 | **Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon** 1437 | 1438 | - Paper: https://arxiv.org/abs/2203.03818 1439 | - Code: https://github.com/hncszyq/ShadowAttack 1440 | 1441 | **LAS-AT: Adversarial Training with Learnable Attack Strategy** 1442 | 1443 | - Paper(Oral): https://arxiv.org/abs/2203.06616 1444 | - Code: https://github.com/jiaxiaojunQAQ/LAS-AT 1445 | 1446 | **Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection** 1447 | 1448 | - Paper: https://arxiv.org/abs/2112.04532 1449 | - Code: https://github.com/joellliu/SegmentAndComplete 1450 | 1451 | 1452 | 1453 | # 弱监督目标定位(Weakly Supervised Object Localization) 1454 | 1455 | **Weakly Supervised Object Localization as Domain Adaption** 1456 | 1457 | - Paper: https://arxiv.org/abs/2203.01714 1458 | - Code: https://github.com/zh460045050/DA-WSOL_CVPR2022 1459 | 1460 | 1461 | 1462 | # 雷达目标检测(Radar Object Detection) 1463 | 1464 | **Exploiting Temporal Relations on Radar Perception for Autonomous Driving** 1465 | 1466 | - Paper: https://arxiv.org/abs/2204.01184 1467 | - 
Code: None 1468 | 1469 | 1470 | 1471 | # 高光谱图像重建(Hyperspectral Image Reconstruction) 1472 | 1473 | **Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction** 1474 | 1475 | - Paper: https://arxiv.org/abs/2111.07910 1476 | - Code: https://github.com/caiyuanhao1998/MST 1477 | 1478 | 1479 | 1480 | # 图像拼接(Image Stitching) 1481 | 1482 | **Deep Rectangling for Image Stitching: A Learning Baseline** 1483 | 1484 | - Paper(Oral): https://arxiv.org/abs/2203.03831 1485 | 1486 | - Code: https://github.com/nie-lang/DeepRectangling 1487 | - Dataset: https://github.com/nie-lang/DeepRectangling 1488 | - 中文解读:https://mp.weixin.qq.com/s/lp5AnrtO_9urp-Fv6Z0l2Q 1489 | 1490 | 1491 | 1492 | # 水印(Watermarking) 1493 | 1494 | **Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings** 1495 | 1496 | - Paper: https://arxiv.org/abs/2104.13450 1497 | - Code: None 1498 | 1499 | 1500 | 1501 | # Action Counting 1502 | 1503 | **TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting** 1504 | 1505 | - Paper(Oral): https://arxiv.org/abs/2204.01018 1506 | - Dataset: https://svip-lab.github.io/dataset/RepCount_dataset.html 1507 | - Code: https://github.com/SvipRepetitionCounting/TransRAC 1508 | 1509 | 1510 | 1511 | # Grounded Situation Recognition 1512 | 1513 | **Collaborative Transformers for Grounded Situation Recognition** 1514 | 1515 | - Paper: https://arxiv.org/abs/2203.16518 1516 | - Code: https://github.com/jhcho99/CoFormer 1517 | 1518 | 1519 | 1520 | # Zero-shot Learning 1521 | 1522 | **Unseen Classes at a Later Time? 
No Problem** 1523 | 1524 | - Paper: https://arxiv.org/abs/2203.16517 1525 | - Code: https://github.com/sumitramalagi/Unseen-classes-at-a-later-time 1526 | 1527 | 1528 | 1529 | # DeepFakes 1530 | 1531 | **Detecting Deepfakes with Self-Blended Images** 1532 | 1533 | - Paper(Oral): https://arxiv.org/abs/2204.08376 1534 | 1535 | - Code: https://github.com/mapooon/SelfBlendedImages 1536 | 1537 | 1538 | 1539 | # 数据集(Datasets) 1540 | 1541 | **It's About Time: Analog Clock Reading in the Wild** 1542 | 1543 | - Homepage: https://charigyang.github.io/abouttime/ 1544 | - Paper: https://arxiv.org/abs/2111.09162 1545 | - Code: https://github.com/charigyang/itsabouttime 1546 | - Demo: https://youtu.be/cbiMACA6dRc 1547 | 1548 | **Toward Practical Self-Supervised Monocular Indoor Depth Estimation** 1549 | 1550 | - Paper: https://arxiv.org/abs/2112.02306 1551 | - Code: None 1552 | 1553 | **Kubric: A scalable dataset generator** 1554 | 1555 | - Paper: https://arxiv.org/abs/2203.03570 1556 | - Code: https://github.com/google-research/kubric 1557 | - 中文解读:https://mp.weixin.qq.com/s/mJ8HzY6C0GifxsErJIS3Mg 1558 | 1559 | **Scribble-Supervised LiDAR Semantic Segmentation** 1560 | 1561 | - Paper: https://arxiv.org/abs/2203.08537 1562 | - Dataset: https://github.com/ouenal/scribblekitti 1563 | 1564 | **Deep Rectangling for Image Stitching: A Learning Baseline** 1565 | 1566 | - Paper(Oral): https://arxiv.org/abs/2203.03831 1567 | - Code: https://github.com/nie-lang/DeepRectangling 1568 | - Dataset: https://github.com/nie-lang/DeepRectangling 1569 | - 中文解读:https://mp.weixin.qq.com/s/lp5AnrtO_9urp-Fv6Z0l2Q 1570 | 1571 | **ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer** 1572 | 1573 | - Homepage: https://ai.stanford.edu/~rhgao/objectfolder2.0/ 1574 | - Paper: https://arxiv.org/abs/2204.02389 1575 | - Dataset: https://github.com/rhgao/ObjectFolder 1576 | - Demo: https://youtu.be/e5aToT3LkRA 1577 | 1578 | **Shape from Polarization for Complex Scenes in the Wild** 1579 | 1580 
| - Homepage: https://chenyanglei.github.io/sfpwild/index.html 1581 | - Paper: https://arxiv.org/abs/2112.11377 1582 | - Code: https://github.com/ChenyangLEI/sfp-wild 1583 | 1584 | **Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline** 1585 | 1586 | - Homepage: https://zhang-pengyu.github.io/DUT-VTUAV/ 1587 | - Paper: https://arxiv.org/abs/2204.04120 1588 | 1589 | **TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting** 1590 | 1591 | - Paper(Oral): https://arxiv.org/abs/2204.01018 1592 | - Dataset: https://svip-lab.github.io/dataset/RepCount_dataset.html 1593 | - Code: https://github.com/SvipRepetitionCounting/TransRAC 1594 | 1595 | **FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment** 1596 | 1597 | - Paper(Oral): https://arxiv.org/abs/2204.03646 1598 | - Dataset: https://github.com/xujinglin/FineDiving 1599 | - Code: https://github.com/xujinglin/FineDiving 1600 | - 中文解读:https://mp.weixin.qq.com/s/8t12Y34eMNwvJr8PeryWXg 1601 | 1602 | **Aesthetic Text Logo Synthesis via Content-aware Layout Inferring** 1603 | 1604 | - Paper: https://arxiv.org/abs/2204.02701 1605 | - Dataset: https://github.com/yizhiwang96/TextLogoLayout 1606 | - Code: https://github.com/yizhiwang96/TextLogoLayout 1607 | 1608 | **DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection** 1609 | 1610 | - Homepage: https://thudair.baai.ac.cn/index 1611 | - Paper: https://arxiv.org/abs/2204.05575 1612 | - Code: https://github.com/AIR-THU/DAIR-V2X 1613 | 1614 | **A New Dataset and Transformer for Stereoscopic Video Super-Resolution** 1615 | 1616 | - Paper: https://arxiv.org/abs/2204.10039 1617 | - Code: https://github.com/H-deep/Trans-SVSR/ 1618 | - Dataset: http://shorturl.at/mpwGX 1619 | 1620 | **Putting People in their Place: Monocular Regression of 3D People in Depth** 1621 | 1622 | - Homepage: https://arthur151.github.io/BEV/BEV.html 1623 | - Paper: 
https://arxiv.org/abs/2112.08274 1624 | 1625 | - Code: https://github.com/Arthur151/ROMP 1626 | - Dataset: https://github.com/Arthur151/Relative_Human 1627 | 1628 | **UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection** 1629 | 1630 | - Paper: https://arxiv.org/abs/2111.08644 1631 | - Dataset: https://github.com/lilygeorgescu/UBnormal 1632 | 1633 | **DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion** 1634 | 1635 | - Homepage: https://dancetrack.github.io 1636 | - Paper: https://arxiv.org/abs/2111.14690 1637 | - Dataset: https://github.com/DanceTrack/DanceTrack 1638 | 1639 | **Visual Abductive Reasoning** 1640 | 1641 | - Paper: https://arxiv.org/abs/2203.14040 1642 | - Code: https://github.com/leonnnop/VAR 1643 | 1644 | **Large-scale Video Panoptic Segmentation in the Wild: A Benchmark** 1645 | 1646 | - Paper: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset/blob/main/VIPSeg2022.pdf 1647 | - Code: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset 1648 | - Dataset: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset 1649 | 1650 | **Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions** 1651 | 1652 | - Homepage: https://ithaca365.mae.cornell.edu/ 1653 | 1654 | - Paper: https://arxiv.org/abs/2208.01166 1655 | 1656 | 1657 | 1658 | # 新任务(New Task) 1659 | 1660 | **Language-based Video Editing via Multi-Modal Multi-Level Transformer** 1661 | 1662 | - Paper: https://arxiv.org/abs/2104.01122 1663 | - Code: None 1664 | 1665 | **It's About Time: Analog Clock Reading in the Wild** 1666 | 1667 | - Homepage: https://charigyang.github.io/abouttime/ 1668 | - Paper: https://arxiv.org/abs/2111.09162 1669 | - Code: https://github.com/charigyang/itsabouttime 1670 | - Demo: https://youtu.be/cbiMACA6dRc 1671 | 1672 | **Splicing ViT Features for Semantic Appearance Transfer** 1673 | 1674 | - Homepage: https://splice-vit.github.io/ 1675 | - Paper: https://arxiv.org/abs/2201.00424 1676 | - Code: 
https://github.com/omerbt/Splice 1677 | 1678 | **Visual Abductive Reasoning** 1679 | 1680 | - Paper: https://arxiv.org/abs/2203.14040 1681 | - Code: https://github.com/leonnnop/VAR 1682 | 1683 | 1684 | 1685 | # 其他(Others) 1686 | 1687 | **Kubric: A scalable dataset generator** 1688 | 1689 | - Paper: https://arxiv.org/abs/2203.03570 1690 | - Code: https://github.com/google-research/kubric 1691 | - 中文解读:https://mp.weixin.qq.com/s/mJ8HzY6C0GifxsErJIS3Mg 1692 | 1693 | **X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning** 1694 | 1695 | - Paper: https://arxiv.org/abs/2203.00843 1696 | - Code: https://github.com/CurryYuan/X-Trans2Cap 1697 | 1698 | **Balanced MSE for Imbalanced Visual Regression** 1699 | 1700 | - Paper(Oral): https://arxiv.org/abs/2203.16427 1701 | - Code: https://github.com/jiawei-ren/BalancedMSE 1702 | 1703 | **SNUG: Self-Supervised Neural Dynamic Garments** 1704 | 1705 | - Homepage: http://mslab.es/projects/SNUG/ 1706 | - Paper(Oral): https://arxiv.org/abs/2204.02219 1707 | - Code: https://github.com/isantesteban/snug 1708 | 1709 | **Shape from Polarization for Complex Scenes in the Wild** 1710 | 1711 | - Homepage: https://chenyanglei.github.io/sfpwild/index.html 1712 | - Paper: https://arxiv.org/abs/2112.11377 1713 | - Code: https://github.com/ChenyangLEI/sfp-wild 1714 | 1715 | **LASER: LAtent SpacE Rendering for 2D Visual Localization** 1716 | 1717 | - Paper(Oral): https://arxiv.org/abs/2204.00157 1718 | - Code: None 1719 | 1720 | **Single-Photon Structured Light** 1721 | 1722 | - Paper(Oral): https://arxiv.org/abs/2204.05300 1723 | - Code: None 1724 | 1725 | **3DeformRS: Certifying Spatial Deformations on Point Clouds** 1726 | 1727 | - Paper: https://arxiv.org/abs/2204.05687 1728 | - Code: None 1729 | 1730 | **Aesthetic Text Logo Synthesis via Content-aware Layout Inferring** 1731 | 1732 | - Paper: https://arxiv.org/abs/2204.02701 1733 | - Dataset: https://github.com/yizhiwang96/TextLogoLayout 1734 | - Code: 
https://github.com/yizhiwang96/TextLogoLayout 1735 | 1736 | **Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes** 1737 | 1738 | - Paper: https://arxiv.org/abs/2203.13412 1739 | - Code: https://github.com/zjsong/SSPL 1740 | 1741 | **Robust and Accurate Superquadric Recovery: a Probabilistic Approach** 1742 | 1743 | - Paper(Oral): https://arxiv.org/abs/2111.14517 1744 | - Code: https://github.com/bmlklwx/EMS-superquadric_fitting 1745 | 1746 | **Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence** 1747 | 1748 | - Paper: https://arxiv.org/abs/2203.00911 1749 | - Code: None 1750 | 1751 | **Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer** 1752 | 1753 | - Paper(Oral): https://arxiv.org/abs/2204.08680 1754 | - Code: https://github.com/zengwang430521/TCFormer 1755 | 1756 | **DeepDPM: Deep Clustering With an Unknown Number of Clusters** 1757 | 1758 | - Paper: https://arxiv.org/abs/2203.14309 1759 | - Code: https://github.com/BGU-CS-VIL/DeepDPM 1760 | 1761 | **ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic** 1762 | 1763 | - Paper: https://arxiv.org/abs/2111.14447 1764 | - Code: https://github.com/YoadTew/zero-shot-image-to-text 1765 | 1766 | **Proto2Proto: Can you recognize the car, the way I do?** 1767 | 1768 | - Paper: https://arxiv.org/abs/2204.11830 1769 | - Code: https://github.com/archmaester/proto2proto 1770 | 1771 | **Putting People in their Place: Monocular Regression of 3D People in Depth** 1772 | 1773 | - Homepage: https://arthur151.github.io/BEV/BEV.html 1774 | - Paper: https://arxiv.org/abs/2112.08274 1775 | - Code: https://github.com/Arthur151/ROMP 1776 | - Dataset: https://github.com/Arthur151/Relative_Human 1777 | 1778 | **Light Field Neural Rendering** 1779 | 1780 | - Homepage: https://light-field-neural-rendering.github.io/ 1781 | - Paper(Oral): https://arxiv.org/abs/2112.09687 1782 | - Code: 
https://github.com/google-research/google-research/tree/master/light_field_neural_rendering 1783 | 1784 | **Neural Texture Extraction and Distribution for Controllable Person Image Synthesis** 1785 | 1786 | - Paper: https://arxiv.org/abs/2204.06160 1787 | - Code: https://github.com/RenYurui/Neural-Texture-Extraction-Distribution 1788 | 1789 | **Locality-Aware Inter- and Intra-Video Reconstruction for Self-Supervised Correspondence Learning** 1790 | 1791 | - Paper: https://arxiv.org/abs/2203.14333 1792 | - Code: https://github.com/0liliulei/LIIR -------------------------------------------------------------------------------- /CVPR2023-Papers-with-Code.md: -------------------------------------------------------------------------------- 1 | # CVPR 2023 论文和开源项目合集(Papers with Code) 2 | 3 | [CVPR 2023](https://openaccess.thecvf.com/CVPR2023?day=all) 论文和开源项目合集(papers with code)! 4 | 5 | **25.78% = 2360 / 9155** 6 | 7 | CVPR 2023 decisions are now available on OpenReview! This year, we received a record number of **9155** submissions (a 12% increase over CVPR 2022), and accepted **2360** papers, for a 25.78% acceptance rate. 8 | 9 | 10 | > 注1:欢迎各位大佬提交issue,分享CVPR 2023论文和开源项目! 
11 | > 12 | > 注2:关于往年CV顶会论文以及其他优质CV论文和大盘点,详见: https://github.com/amusi/daily-paper-computer-vision 13 | > 14 | > - [CVPR 2019](CVPR2019-Papers-with-Code.md) 15 | > - [CVPR 2020](CVPR2020-Papers-with-Code.md) 16 | > - [CVPR 2021](CVPR2021-Papers-with-Code.md) 17 | > - [CVPR 2022](CVPR2022-Papers-with-Code.md) 18 | 19 | 如果你想了解最新最优质的CV论文、开源项目和学习资料,欢迎扫码加入【CVer学术交流群】!互相学习,一起进步~ 20 | 21 | ![](CVer学术交流群.png) 22 | 23 | # 【CVPR 2023 论文开源目录】 24 | 25 | - [Backbone](#Backbone) 26 | - [CLIP](#CLIP) 27 | - [MAE](#MAE) 28 | - [GAN](#GAN) 29 | - [GNN](#GNN) 30 | - [MLP](#MLP) 31 | - [NAS](#NAS) 32 | - [OCR](#OCR) 33 | - [NeRF](#NeRF) 34 | - [DETR](#DETR) 35 | - [Prompt](#Prompt) 36 | - [Diffusion Models(扩散模型)](#Diffusion) 37 | - [Avatars](#Avatars) 38 | - [ReID(重识别)](#ReID) 39 | - [长尾分布(Long-Tail)](#Long-Tail) 40 | - [Vision Transformer](#Vision-Transformer) 41 | - [视觉和语言(Vision-Language)](#VL) 42 | - [自监督学习(Self-supervised Learning)](#SSL) 43 | - [数据增强(Data Augmentation)](#DA) 44 | - [目标检测(Object Detection)](#Object-Detection) 45 | - [目标跟踪(Visual Tracking)](#VT) 46 | - [语义分割(Semantic Segmentation)](#Semantic-Segmentation) 47 | - [实例分割(Instance Segmentation)](#Instance-Segmentation) 48 | - [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation) 49 | - [医学图像分割(Medical Image Segmentation)](#MIS) 50 | - [视频目标分割(Video Object Segmentation)](#VOS) 51 | - [视频实例分割(Video Instance Segmentation)](#VIS) 52 | - [参考图像分割(Referring Image Segmentation)](#RIS) 53 | - [图像抠图(Image Matting)](#Matting) 54 | - [图像编辑(Image Editing)](#Image-Editing) 55 | - [Low-level Vision](#LLV) 56 | - [超分辨率(Super-Resolution)](#SR) 57 | - [去噪(Denoising)](#Denoising) 58 | - [去模糊(Deblur)](#Deblur) 59 | - [3D点云(3D Point Cloud)](#3D-Point-Cloud) 60 | - [3D目标检测(3D Object Detection)](#3DOD) 61 | - [3D语义分割(3D Semantic Segmentation)](#3DSS) 62 | - [3D目标跟踪(3D Object Tracking)](#3D-Object-Tracking) 63 | - [3D语义场景补全(3D Semantic Scene Completion)](#3DSSC) 64 | - [3D配准(3D Registration)](#3D-Registration) 65 | - [3D人体姿态估计(3D Human 
Pose Estimation)](#3D-Human-Pose-Estimation) 66 | - [3D人体Mesh估计(3D Human Mesh Estimation)](#3D-Human-Pose-Estimation) 67 | - [医学图像(Medical Image)](#Medical-Image) 68 | - [图像生成(Image Generation)](#Image-Generation) 69 | - [视频生成(Video Generation)](#Video-Generation) 70 | - [视频理解(Video Understanding)](#Video-Understanding) 71 | - [行为检测(Action Detection)](#Action-Detection) 72 | - [文本检测(Text Detection)](#Text-Detection) 73 | - [知识蒸馏(Knowledge Distillation)](#KD) 74 | - [模型剪枝(Model Pruning)](#Pruning) 75 | - [图像压缩(Image Compression)](#IC) 76 | - [异常检测(Anomaly Detection)](#AD) 77 | - [三维重建(3D Reconstruction)](#3D-Reconstruction) 78 | - [深度估计(Depth Estimation)](#Depth-Estimation) 79 | - [轨迹预测(Trajectory Prediction)](#TP) 80 | - [车道线检测(Lane Detection)](#Lane-Detection) 81 | - [图像描述(Image Captioning)](#Image-Captioning) 82 | - [视觉问答(Visual Question Answering)](#VQA) 83 | - [手语识别(Sign Language Recognition)](#SLR) 84 | - [视频预测(Video Prediction)](#Video-Prediction) 85 | - [新视点合成(Novel View Synthesis)](#NVS) 86 | - [Zero-Shot Learning(零样本学习)](#ZSL) 87 | - [立体匹配(Stereo Matching)](#Stereo-Matching) 88 | - [特征匹配(Feature Matching)](#Feature-Matching) 89 | - [场景图生成(Scene Graph Generation)](#SGG) 90 | - [隐式神经表示(Implicit Neural Representations)](#INR) 91 | - [图像质量评价(Image Quality Assessment)](#IQA) 92 | - [数据集(Datasets)](#Datasets) 93 | - [新任务(New Tasks)](#New-Tasks) 94 | - [其他(Others)](#Others) 95 | 96 | 97 | 98 | # Backbone 99 | 100 | **Integrally Pre-Trained Transformer Pyramid Networks** 101 | 102 | - Paper: https://arxiv.org/abs/2211.12735 103 | - Code: https://github.com/sunsmarterjie/iTPN 104 | 105 | **Stitchable Neural Networks** 106 | 107 | - Homepage: https://snnet.github.io/ 108 | - Paper: https://arxiv.org/abs/2302.06586 109 | - Code: https://github.com/ziplab/SN-Net 110 | 111 | **Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks** 112 | 113 | - Paper: https://arxiv.org/abs/2303.03667 114 | - Code: https://github.com/JierunChen/FasterNet 115 | 116 | 
**BiFormer: Vision Transformer with Bi-Level Routing Attention** 117 | 118 | - Paper: https://arxiv.org/abs/2303.08810 119 | - Code: https://github.com/rayleizhu/BiFormer 120 | 121 | **DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network** 122 | 123 | - Paper: https://arxiv.org/abs/2303.02165 124 | - Code: https://github.com/alibaba/lightweight-neural-architecture-search 125 | 126 | **Vision Transformer with Super Token Sampling** 127 | 128 | - Paper: https://arxiv.org/abs/2211.11167 129 | - Code: https://github.com/hhb072/SViT 130 | 131 | **Hard Patches Mining for Masked Image Modeling** 132 | 133 | - Paper: None 134 | - Code: None 135 | 136 | **SMPConv: Self-moving Point Representations for Continuous Convolution** 137 | 138 | - Paper: https://arxiv.org/abs/2304.02330 139 | - Code: https://github.com/sangnekim/SMPConv 140 | 141 | **Making Vision Transformers Efficient from A Token Sparsification View** 142 | 143 | - Paper: https://arxiv.org/abs/2303.08685 144 | - Code: https://github.com/changsn/STViT-R 145 | 146 | 147 | 148 | # CLIP 149 | 150 | **GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis** 151 | 152 | - Paper: https://arxiv.org/abs/2301.12959 153 | - Code: https://github.com/tobran/GALIP 154 | 155 | **DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation** 156 | 157 | - Paper: https://arxiv.org/abs/2303.06285 158 | - Code: https://github.com/Yueming6568/DeltaEdit 159 | 160 | 161 | 162 | # MAE 163 | 164 | **Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders** 165 | 166 | - Paper: https://arxiv.org/abs/2212.06785 167 | - Code: https://github.com/ZrrSkywalker/I2P-MAE 168 | 169 | **Generic-to-Specific Distillation of Masked Autoencoders** 170 | 171 | - Paper: https://arxiv.org/abs/2302.14771 172 | - Code: https://github.com/pengzhiliang/G2SD 173 | 174 | 175 | 176 | # GAN 177 | 178 | **DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation** 179 | 180 | - Paper: 
https://arxiv.org/abs/2303.06285 181 | - Code: https://github.com/Yueming6568/DeltaEdit 182 | 183 | 184 | 185 | # NeRF 186 | 187 | **NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior** 188 | 189 | - Homepage: https://nope-nerf.active.vision/ 190 | - Paper: https://arxiv.org/abs/2212.07388 191 | - Code: None 192 | 193 | **Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures** 194 | 195 | - Paper: https://arxiv.org/abs/2211.07600 196 | - Code: https://github.com/eladrich/latent-nerf 197 | 198 | **NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis** 199 | 200 | - Paper: https://arxiv.org/abs/2301.08556 201 | - Code: None 202 | 203 | **Panoptic Lifting for 3D Scene Understanding with Neural Fields** 204 | 205 | - Homepage: https://nihalsid.github.io/panoptic-lifting/ 206 | - Paper: https://arxiv.org/abs/2212.09802 207 | - Code: None 208 | 209 | **NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer** 210 | 211 | - Homepage: https://redrock303.github.io/nerflix/ 212 | - Paper: https://arxiv.org/abs/2303.06919 213 | - Code: None 214 | 215 | **HNeRV: A Hybrid Neural Representation for Videos** 216 | 217 | - Homepage: https://haochen-rye.github.io/HNeRV 218 | - Paper: https://arxiv.org/abs/2304.02633 219 | - Code: https://github.com/haochen-rye/HNeRV 220 | 221 | 222 | 223 | # DETR 224 | 225 | **DETRs with Hybrid Matching** 226 | 227 | - Paper: https://arxiv.org/abs/2207.13080 228 | - Code: https://github.com/HDETR 229 | 230 | 231 | 232 | # Prompt 233 | 234 | **Diversity-Aware Meta Visual Prompting** 235 | 236 | - Paper: https://arxiv.org/abs/2303.08138 237 | - Code: https://github.com/shikiw/DAM-VP 238 | 239 | 240 | 241 | # NAS 242 | 243 | **PA&DA: Jointly Sampling PAth and DAta for Consistent NAS** 244 | 245 | - Paper: https://arxiv.org/abs/2302.14772 246 | - Code: https://github.com/ShunLu91/PA-DA 247 | 248 | 249 | 250 | # Avatars 251 | 252 | **Structured 3D 
Features for Reconstructing Relightable and Animatable Avatars** 253 | 254 | - Homepage: https://enriccorona.github.io/s3f/ 255 | - Paper: https://arxiv.org/abs/2212.06820 256 | - Code: None 257 | - Demo: https://www.youtube.com/watch?v=mcZGcQ6L-2s 258 | 259 | **Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos** 260 | 261 | - Homepage: https://augmentedperception.github.io/monoavatar/ 262 | - Paper: https://arxiv.org/abs/2304.01436 263 | 264 | 265 | 266 | # ReID(重识别) 267 | 268 | **Clothing-Change Feature Augmentation for Person Re-Identification** 269 | 270 | - Paper: None 271 | - Code: None 272 | 273 | **MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID** 274 | 275 | - Paper: https://arxiv.org/abs/2303.07065 276 | - Code: https://github.com/vimar-gu/MSINet 277 | 278 | **Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification** 279 | 280 | - Paper: https://arxiv.org/abs/2304.04205 281 | - Code: None 282 | 283 | **Large-scale Training Data Search for Object Re-identification** 284 | 285 | - Paper: https://arxiv.org/abs/2303.16186 286 | - Code: https://github.com/yorkeyao/SnP 287 | 288 | 289 | 290 | # Diffusion Models(扩散模型) 291 | 292 | **Video Probabilistic Diffusion Models in Projected Latent Space** 293 | 294 | - Homepage: https://sihyun.me/PVDM/ 295 | - Paper: https://arxiv.org/abs/2302.07685 296 | - Code: https://github.com/sihyun-yu/PVDM 297 | 298 | **Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models** 299 | 300 | - Paper: https://arxiv.org/abs/2211.10655 301 | - Code: None 302 | 303 | **Imagic: Text-Based Real Image Editing with Diffusion Models** 304 | 305 | - Homepage: https://imagic-editing.github.io/ 306 | - Paper: https://arxiv.org/abs/2210.09276 307 | - Code: None 308 | 309 | **Parallel Diffusion Models of Operator and Image for Blind Inverse Problems** 310 | 311 | - Paper: https://arxiv.org/abs/2211.10656 312 | - Code: None 313 | 314 | **DiffRF: 
Rendering-guided 3D Radiance Field Diffusion** 315 | 316 | - Homepage: https://sirwyver.github.io/DiffRF/ 317 | - Paper: https://arxiv.org/abs/2212.01206 318 | - Code: None 319 | 320 | **MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation** 321 | 322 | - Paper: https://arxiv.org/abs/2212.09478 323 | - Code: https://github.com/researchmm/MM-Diffusion 324 | 325 | **HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising** 326 | 327 | - Homepage: https://aminshabani.github.io/housediffusion/ 328 | - Paper: https://arxiv.org/abs/2211.13287 329 | - Code: https://github.com/aminshabani/house_diffusion 330 | 331 | **TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets** 332 | 333 | - Paper: https://arxiv.org/abs/2303.05762 334 | - Code: https://github.com/chenweixin107/TrojDiff 335 | 336 | **Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption** 337 | 338 | - Paper: https://arxiv.org/abs/2207.03442 339 | - Code: https://github.com/shiyegao/DDA 340 | 341 | **DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration** 342 | 343 | - Paper: https://arxiv.org/abs/2303.06885 344 | - Code: None 345 | 346 | **Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion** 347 | 348 | - Homepage: https://nv-tlabs.github.io/trace-pace/ 349 | - Paper: https://arxiv.org/abs/2304.01893 350 | - Code: None 351 | 352 | **Generative Diffusion Prior for Unified Image Restoration and Enhancement** 353 | 354 | - Paper: https://arxiv.org/abs/2304.01247 355 | - Code: None 356 | 357 | **Conditional Image-to-Video Generation with Latent Flow Diffusion Models** 358 | 359 | - Paper: https://arxiv.org/abs/2303.13744 360 | - Code: https://github.com/nihaomiao/CVPR23_LFDM 361 | 362 | 363 | 364 | # 长尾分布(Long-Tail) 365 | 366 | **Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation** 367 | 368 | - Paper: 
https://arxiv.org/abs/2304.01279 369 | - Code: None 370 | 371 | 372 | 373 | # Vision Transformer 374 | 375 | **Integrally Pre-Trained Transformer Pyramid Networks** 376 | 377 | - Paper: https://arxiv.org/abs/2211.12735 378 | - Code: https://github.com/sunsmarterjie/iTPN 379 | 380 | **Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors** 381 | 382 | - Homepage: https://niessnerlab.org/projects/hou2023mask3d.html 383 | - Paper: https://arxiv.org/abs/2302.14746 384 | - Code: None 385 | 386 | **Learning Trajectory-Aware Transformer for Video Super-Resolution** 387 | 388 | - Paper: https://arxiv.org/abs/2204.04216 389 | - Code: https://github.com/researchmm/TTVSR 390 | 391 | **Vision Transformers are Parameter-Efficient Audio-Visual Learners** 392 | 393 | - Homepage: https://yanbo.ml/project_page/LAVISH/ 394 | - Code: https://github.com/GenjiB/LAVISH 395 | 396 | **Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes** 397 | 398 | - Paper: https://arxiv.org/abs/2303.04249 399 | - Code: None 400 | 401 | **DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets** 402 | 403 | - Paper: https://arxiv.org/abs/2301.06051 404 | - Code: https://github.com/Haiyang-W/DSVT 405 | 406 | **DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting** 407 | 408 | - Paper: https://arxiv.org/abs/2211.10772 409 | - Code: https://github.com/ViTAE-Transformer/DeepSolo 410 | 411 | **BiFormer: Vision Transformer with Bi-Level Routing Attention** 412 | 413 | - Paper: https://arxiv.org/abs/2303.08810 414 | - Code: https://github.com/rayleizhu/BiFormer 415 | 416 | **Vision Transformer with Super Token Sampling** 417 | 418 | - Paper: https://arxiv.org/abs/2211.11167 419 | - Code: https://github.com/hhb072/SViT 420 | 421 | **BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision** 422 | 423 | - Paper: https://arxiv.org/abs/2211.10439 424 | - 
Code: None 425 | 426 | **BAEFormer: Bi-directional and Early Interaction Transformers for Bird’s Eye View Semantic Segmentation** 427 | 428 | - Paper: None 429 | - Code: None 430 | 431 | **Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention** 432 | 433 | - Paper: https://arxiv.org/abs/2304.03282 434 | - Code: None 435 | 436 | **Making Vision Transformers Efficient from A Token Sparsification View** 437 | 438 | - Paper: https://arxiv.org/abs/2303.08685 439 | - Code: https://github.com/changsn/STViT-R 440 | 441 | 442 | 443 | # 视觉和语言(Vision-Language) 444 | 445 | **GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods** 446 | 447 | - Paper: https://arxiv.org/abs/2301.01893 448 | - Code: None 449 | 450 | **Teaching Structured Vision&Language Concepts to Vision&Language Models** 451 | 452 | - Paper: https://arxiv.org/abs/2211.11733 453 | - Code: None 454 | 455 | **Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks** 456 | 457 | - Paper: https://arxiv.org/abs/2211.09808 458 | - Code: https://github.com/fundamentalvision/Uni-Perceiver 459 | 460 | **Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training** 461 | 462 | - Paper: https://arxiv.org/abs/2303.00040 463 | - Code: None 464 | 465 | **CapDet: Unifying Dense Captioning and Open-World Detection Pretraining** 466 | 467 | - Paper: https://arxiv.org/abs/2303.02489 468 | - Code: None 469 | 470 | 471 | 472 | 473 | 474 | 475 | **Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding** 476 | 477 | - Homepage: https://rllab-snu.github.io/projects/Meta-Explore/doc.html 478 | - Paper: https://arxiv.org/abs/2303.04077 479 | - Code: None 480 | 481 | **All in One: Exploring Unified Video-Language 
Pre-training** 482 | 483 | - Paper: https://arxiv.org/abs/2203.07303 484 | - Code: https://github.com/showlab/all-in-one 485 | 486 | **Position-guided Text Prompt for Vision Language Pre-training** 487 | 488 | - Paper: https://arxiv.org/abs/2212.09737 489 | - Code: https://github.com/sail-sg/ptp 490 | 491 | **EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding** 492 | 493 | - Paper: https://arxiv.org/abs/2209.14941 494 | - Code: https://github.com/yanmin-wu/EDA 495 | 496 | 497 | 498 | 499 | 500 | 501 | **FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks** 502 | 503 | - Paper: https://arxiv.org/abs/2303.02483 504 | - Code: https://github.com/BrandonHanx/FAME-ViL 505 | 506 | **Align and Attend: Multimodal Summarization with Dual Contrastive Losses** 507 | 508 | - Homepage: https://boheumd.github.io/A2Summ/ 509 | - Paper: https://arxiv.org/abs/2303.07284 510 | - Code: https://github.com/boheumd/A2Summ 511 | 512 | **Multi-Modal Representation Learning with Text-Driven Soft Masks** 513 | 514 | - Paper: https://arxiv.org/abs/2304.00719 515 | - Code: None 516 | 517 | **Learning to Name Classes for Vision and Language Models** 518 | 519 | - Paper: https://arxiv.org/abs/2304.01830 520 | - Code: None 521 | 522 | 523 | 524 | # 目标检测(Object Detection) 525 | 526 | **YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors** 527 | 528 | - Paper: https://arxiv.org/abs/2207.02696 529 | - Code: https://github.com/WongKinYiu/yolov7 530 | 531 | **DETRs with Hybrid Matching** 532 | 533 | - Paper: https://arxiv.org/abs/2207.13080 534 | - Code: https://github.com/HDETR 535 | 536 | **Enhanced Training of Query-Based Object Detection via Selective Query Recollection** 537 | 538 | - Paper: https://arxiv.org/abs/2212.07593 539 | - Code: https://github.com/Fangyi-Chen/SQR 540 | 541 
| **Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection** 542 | 543 | - Paper: https://arxiv.org/abs/2303.05892 544 | - Code: https://github.com/LutingWang/OADP 545 | 546 | 547 | 548 | # 目标跟踪(Object Tracking) 549 | 550 | **Simple Cues Lead to a Strong Multi-Object Tracker** 551 | 552 | - Paper: https://arxiv.org/abs/2206.04656 553 | - Code: None 554 | 555 | **Joint Visual Grounding and Tracking with Natural Language Specification** 556 | 557 | - Paper: https://arxiv.org/abs/2303.12027 558 | - Code: https://github.com/lizhou-cs/JointNLT 559 | 560 | 561 | 562 | # 语义分割(Semantic Segmentation) 563 | 564 | **Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos** 565 | 566 | - Paper: https://arxiv.org/abs/2303.07224 567 | - Code: https://github.com/THU-LYJ-Lab/AR-Seg 568 | 569 | **FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding** 570 | 571 | - Paper: https://arxiv.org/abs/2304.02135 572 | - Code: https://github.com/uark-cviu/FREDOM 573 | 574 | 575 | 576 | # 医学图像分割(Medical Image Segmentation) 577 | 578 | **Label-Free Liver Tumor Segmentation** 579 | 580 | - Paper: https://arxiv.org/abs/2303.14869 581 | - Code: https://github.com/MrGiovanni/SyntheticTumors 582 | 583 | **Directional Connectivity-based Segmentation of Medical Images** 584 | 585 | - Paper: https://arxiv.org/abs/2304.00145 586 | - Code: https://github.com/Zyun-Y/DconnNet 587 | 588 | **Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation** 589 | 590 | - Paper: https://arxiv.org/abs/2305.00673 591 | - Code: https://github.com/DeepMed-Lab-ECNU/BCP 592 | 593 | **Devil is in the Queries: Advancing Mask Transformers for Real-world Medical Image Segmentation and Out-of-Distribution Localization** 594 | 595 | - Paper: https://arxiv.org/abs/2304.00212 596 | - Code: None 597 | 598 | **Fair Federated Medical Image Segmentation via Client Contribution Estimation** 599 | 600 | - Paper: https://arxiv.org/abs/2303.16520 601 | - Code: 
https://github.com/NVIDIA/NVFlare/tree/dev/research/fed-ce 602 | 603 | **Ambiguous Medical Image Segmentation using Diffusion Models** 604 | 605 | - Homepage: https://aimansnigdha.github.io/cimd/ 606 | - Paper: https://arxiv.org/abs/2304.04745 607 | - Code: https://github.com/aimansnigdha/Ambiguous-Medical-Image-Segmentation-using-Diffusion-Models 608 | 609 | **Orthogonal Annotation Benefits Barely-supervised Medical Image Segmentation** 610 | 611 | - Paper: https://arxiv.org/abs/2303.13090 612 | - Code: https://github.com/HengCai-NJU/DeSCO 613 | 614 | **MagicNet: Semi-Supervised Multi-Organ Segmentation via Magic-Cube Partition and Recovery** 615 | 616 | - Paper: https://arxiv.org/abs/2301.01767 617 | - Code: https://github.com/DeepMed-Lab-ECNU/MagicNet 618 | 619 | **MCF: Mutual Correction Framework for Semi-Supervised Medical Image Segmentation** 620 | 621 | - Paper: https://openaccess.thecvf.com/content/CVPR2023/html/Wang_MCF_Mutual_Correction_Framework_for_Semi-Supervised_Medical_Image_Segmentation_CVPR_2023_paper.html 622 | - Code: https://github.com/WYC-321/MCF 623 | 624 | **Rethinking Few-Shot Medical Segmentation: A Vector Quantization View** 625 | 626 | - Paper: https://openaccess.thecvf.com/content/CVPR2023/html/Huang_Rethinking_Few-Shot_Medical_Segmentation_A_Vector_Quantization_View_CVPR_2023_paper.html 627 | - Code: None 628 | 629 | **Pseudo-label Guided Contrastive Learning for Semi-supervised Medical Image Segmentation** 630 | 631 | - Paper: https://openaccess.thecvf.com/content/CVPR2023/html/Basak_Pseudo-Label_Guided_Contrastive_Learning_for_Semi-Supervised_Medical_Image_Segmentation_CVPR_2023_paper.html 632 | - Code: https://github.com/hritam-98/PatchCL-MedSeg 633 | 634 | **SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation** 635 | 636 | - Paper: https://arxiv.org/abs/2305.11012 637 | - Code: None 638 | 639 | **DoNet: Deep De-overlapping Network for Cytology Instance 
Segmentation** 640 | 641 | - Paper: https://arxiv.org/abs/2303.14373 642 | - Code: https://github.com/DeepDoNet/DoNet 643 | 644 | 645 | 646 | # 视频目标分割(Video Object Segmentation) 647 | 648 | **Two-shot Video Object Segmentation** 649 | 650 | - Paper: https://arxiv.org/abs/2303.12078 651 | - Code: https://github.com/yk-pku/Two-shot-Video-Object-Segmentation 652 | 653 | **Under Video Object Segmentation Section** 654 | 655 | - Paper: https://arxiv.org/abs/2303.07815 656 | - Code: None 657 | 658 | 659 | 660 | # 视频实例分割(Video Instance Segmentation) 661 | 662 | **Mask-Free Video Instance Segmentation** 663 | 664 | - Paper: https://arxiv.org/abs/2303.15904 665 | - Code: https://github.com/SysCV/MaskFreeVis 666 | 667 | 668 | 669 | # 参考图像分割(Referring Image Segmentation) 670 | 671 | **PolyFormer: Referring Image Segmentation as Sequential Polygon Generation** 672 | 673 | - Paper: https://arxiv.org/abs/2302.07387 674 | 675 | - Code: None 676 | 677 | 678 | 679 | # 3D点云(3D Point Cloud) 680 | 681 | **Physical-World Optical Adversarial Attacks on 3D Face Recognition** 682 | 683 | - Paper: https://arxiv.org/abs/2205.13412 684 | - Code: https://github.com/PolyLiYJ/SLAttack.git 685 | 686 | **IterativePFN: True Iterative Point Cloud Filtering** 687 | 688 | - Paper: https://arxiv.org/abs/2304.01529 689 | - Code: https://github.com/ddsediri/IterativePFN 690 | 691 | **Attention-based Point Cloud Edge Sampling** 692 | 693 | - Homepage: https://junweizheng93.github.io/publications/APES/APES.html 694 | - Paper: https://arxiv.org/abs/2302.14673 695 | - Code: https://github.com/JunweiZheng93/APES 696 | 697 | 698 | 699 | # 3D目标检测(3D Object Detection) 700 | 701 | **DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets** 702 | 703 | - Paper: https://arxiv.org/abs/2301.06051 704 | - Code: https://github.com/Haiyang-W/DSVT 705 | 706 | **FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection** 707 | 708 | - Paper: https://arxiv.org/abs/2301.04467 709 | - Code: None 710 | 
711 | **3D Video Object Detection with Learnable Object-Centric Global Optimization** 712 | 713 | - Paper: None 714 | - Code: None 715 | 716 | **Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection** 717 | 718 | - Paper: https://arxiv.org/abs/2304.01464 719 | - Code: https://github.com/azhuantou/HSSDA 720 | 721 | 722 | 723 | # 3D语义分割(3D Semantic Segmentation) 724 | 725 | **Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation** 726 | 727 | - Paper: https://arxiv.org/abs/2303.11203 728 | - Code: https://github.com/l1997i/lim3d 729 | 730 | 731 | 732 | # 3D语义场景补全(3D Semantic Scene Completion) 733 | **VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion** 734 | - Paper: https://arxiv.org/abs/2302.12251 735 | - Code: https://github.com/NVlabs/VoxFormer 736 | 737 | 738 | 739 | # 3D配准(3D Registration) 740 | 741 | **Robust Outlier Rejection for 3D Registration with Variational Bayes** 742 | 743 | - Paper: https://arxiv.org/abs/2304.01514 744 | - Code: https://github.com/Jiang-HB/VBReg 745 | 746 | 747 | 748 | # 3D人体姿态估计(3D Human Pose Estimation) 749 | 750 | 751 | 752 | # 3D人体Mesh估计(3D Human Mesh Estimation) 753 | 754 | **3D Human Mesh Estimation from Virtual Markers** 755 | 756 | - Paper: https://arxiv.org/abs/2303.11726 757 | - Code: https://github.com/ShirleyMaxx/VirtualMarker 758 | 759 | 760 | 761 | # Low-level Vision 762 | 763 | **Causal-IR: Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective** 764 | 765 | - Paper: https://arxiv.org/abs/2303.06859 766 | - Code: https://github.com/lixinustc/Casual-IR-DIL 767 | 768 | **Burstormer: Burst Image Restoration and Enhancement Transformer** 769 | 770 | - Paper: https://arxiv.org/abs/2304.01194 771 | - Code: https://github.com/akshaydudhane16/Burstormer 772 | 773 | 774 | 775 | # 超分辨率(Super-Resolution) 776 | 777 | **Super-Resolution Neural Operator** 778 | 779 | - Paper: https://arxiv.org/abs/2303.02584 780 | - Code: 
https://github.com/2y7c3/Super-Resolution-Neural-Operator 781 | 782 | ## 视频超分辨率(Video Super-Resolution) 783 | 784 | **Learning Trajectory-Aware Transformer for Video Super-Resolution** 785 | 786 | - Paper: https://arxiv.org/abs/2204.04216 787 | 788 | - Code: https://github.com/researchmm/TTVSR 789 | 790 | 791 | 792 | # 去噪(Denoising) 793 | 794 | ## 图像去噪(Image Denoising) 795 | 796 | **Masked Image Training for Generalizable Deep Image Denoising** 797 | 798 | - Paper: https://arxiv.org/abs/2303.13132 799 | - Code: https://github.com/haoyuc/MaskedDenoising 800 | 801 | 802 | 803 | # 图像生成(Image Generation) 804 | 805 | **GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis** 806 | 807 | - Paper: https://arxiv.org/abs/2301.12959 808 | - Code: https://github.com/tobran/GALIP 809 | 810 | **MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis** 811 | 812 | - Paper: https://arxiv.org/abs/2211.09117 813 | - Code: https://github.com/LTH14/mage 814 | 815 | **Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation** 816 | 817 | - Paper: https://arxiv.org/abs/2304.01816 818 | - Code: None 819 | 820 | **Few-shot Semantic Image Synthesis with Class Affinity Transfer** 821 | 822 | - Paper: https://arxiv.org/abs/2304.02321 823 | - Code: None 824 | 825 | **TopNet: Transformer-based Object Placement Network for Image Compositing** 826 | 827 | - Paper: https://arxiv.org/abs/2304.03372 828 | - Code: None 829 | 830 | 831 | 832 | # 视频生成(Video Generation) 833 | 834 | **MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation** 835 | 836 | - Paper: https://arxiv.org/abs/2212.09478 837 | - Code: https://github.com/researchmm/MM-Diffusion 838 | 839 | **Conditional Image-to-Video Generation with Latent Flow Diffusion Models** 840 | 841 | - Paper: https://arxiv.org/abs/2303.13744 842 | - Code: https://github.com/nihaomiao/CVPR23_LFDM 843 | 844 | 845 | 846 | # 视频理解(Video Understanding) 847 | 848 | **Learning 
Transferable Spatiotemporal Representations from Natural Script Knowledge** 849 | 850 | - Paper: https://arxiv.org/abs/2209.15280 851 | - Code: https://github.com/TencentARC/TVTS 852 | 853 | **Frame Flexible Network** 854 | 855 | - Paper: https://arxiv.org/abs/2303.14817 856 | - Code: https://github.com/BeSpontaneous/FFN 857 | 858 | **Masked Motion Encoding for Self-Supervised Video Representation Learning** 859 | 860 | - Paper: https://arxiv.org/abs/2210.06096 861 | - Code: https://github.com/XinyuSun/MME 862 | 863 | **MARLIN: Masked Autoencoder for facial video Representation LearnING** 864 | 865 | - Paper: https://arxiv.org/abs/2211.06627 866 | - Code: https://github.com/ControlNet/MARLIN 867 | 868 | 869 | 870 | # 行为检测(Action Detection) 871 | 872 | **TriDet: Temporal Action Detection with Relative Boundary Modeling** 873 | 874 | - Paper: https://arxiv.org/abs/2303.07347 875 | - Code: https://github.com/dingfengshi/TriDet 876 | 877 | 878 | 879 | # 文本检测(Text Detection) 880 | 881 | **DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting** 882 | 883 | - Paper: https://arxiv.org/abs/2211.10772 884 | - Code: https://github.com/ViTAE-Transformer/DeepSolo 885 | 886 | 887 | 888 | # 知识蒸馏(Knowledge Distillation) 889 | 890 | **Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation** 891 | 892 | - Paper: https://arxiv.org/abs/2302.14290 893 | - Code: None 894 | 895 | **Generic-to-Specific Distillation of Masked Autoencoders** 896 | 897 | - Paper: https://arxiv.org/abs/2302.14771 898 | - Code: https://github.com/pengzhiliang/G2SD 899 | 900 | 901 | 902 | # 模型剪枝(Model Pruning) 903 | 904 | **DepGraph: Towards Any Structural Pruning** 905 | 906 | - Paper: https://arxiv.org/abs/2301.12900 907 | - Code: https://github.com/VainF/Torch-Pruning 908 | 909 | 910 | 911 | # 图像压缩(Image Compression) 912 | 913 | **Context-Based Trit-Plane Coding for Progressive Image Compression** 914 | 915 | - Paper: 
https://arxiv.org/abs/2303.05715 916 | - Code: https://github.com/seungminjeon-github/CTC 917 | 918 | 919 | 920 | # 异常检测(Anomaly Detection) 921 | 922 | **Deep Feature In-painting for Unsupervised Anomaly Detection in X-ray Images** 923 | 924 | - Paper: https://arxiv.org/abs/2111.13495 925 | - Code: https://github.com/tiangexiang/SQUID 926 | 927 | 928 | 929 | # 三维重建(3D Reconstruction) 930 | 931 | **OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields** 932 | 933 | - Paper: https://arxiv.org/abs/2211.12886 934 | - Code: None 935 | 936 | **SparsePose: Sparse-View Camera Pose Regression and Refinement** 937 | 938 | - Paper: https://arxiv.org/abs/2211.16991 939 | - Code: None 940 | 941 | **NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction** 942 | 943 | - Paper: https://arxiv.org/abs/2303.02375 944 | - Code: None 945 | 946 | **Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition** 947 | 948 | - Homepage: https://moygcc.github.io/vid2avatar/ 949 | - Paper: https://arxiv.org/abs/2302.11566 950 | - Code: https://github.com/MoyGcc/vid2avatar 951 | - Demo: https://youtu.be/EGi47YeIeGQ 952 | 953 | **To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision** 954 | 955 | - Paper: https://arxiv.org/abs/2106.09614 956 | - Code: https://github.com/unibas-gravis/Occlusion-Robust-MoFA 957 | 958 | **Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction** 959 | 960 | - Paper: https://arxiv.org/abs/2303.05937 961 | - Code: None 962 | 963 | **3D Cinemagraphy from a Single Image** 964 | 965 | - Homepage: https://xingyi-li.github.io/3d-cinemagraphy/ 966 | - Paper: https://arxiv.org/abs/2303.05724 967 | - Code: https://github.com/xingyi-li/3d-cinemagraphy 968 | 969 | **Revisiting Rotation Averaging: Uncertainties and Robust Losses** 970 | 971 | - Paper: https://arxiv.org/abs/2303.05195 972 | - Code: 
https://github.com/zhangganlin/GlobalSfMpy 973 | 974 | **FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction** 975 | 976 | - Paper: https://arxiv.org/abs/2211.13874 977 | - Code: https://github.com/csbhr/FFHQ-UV 978 | 979 | **A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images** 980 | 981 | - Homepage: https://younglbw.github.io/HRN-homepage/ 982 | 983 | - Paper: https://arxiv.org/abs/2302.14434 984 | - Code: https://github.com/youngLBW/HRN 985 | 986 | 987 | 988 | # 深度估计(Depth Estimation) 989 | 990 | **Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation** 991 | 992 | - Paper: https://arxiv.org/abs/2211.13202 993 | - Code: https://github.com/noahzn/Lite-Mono 994 | 995 | 996 | 997 | # 轨迹预测(Trajectory Prediction) 998 | 999 | **IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction** 1000 | 1001 | - Paper: https://arxiv.org/abs/2303.00575 1002 | - Code: None 1003 | 1004 | **EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning** 1005 | 1006 | - Paper: https://arxiv.org/abs/2303.10876 1007 | - Code: https://github.com/MediaBrain-SJTU/EqMotion 1008 | 1009 | 1010 | 1011 | # 车道线检测(Lane Detection) 1012 | 1013 | **Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection** 1014 | 1015 | - Paper: https://arxiv.org/abs/2301.02371 1016 | - Code: https://github.com/tusen-ai/Anchor3DLane 1017 | 1018 | **BEV-LaneDet: An Efficient 3D Lane Detection Based on Virtual Camera via Key-Points** 1019 | 1020 | - Paper: https://arxiv.org/abs/2210.06006v3 1021 | - Code: https://github.com/gigo-team/bev_lane_det 1022 | 1023 | 1024 | 1025 | # 图像描述(Image Captioning) 1026 | 1027 | **ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing** 1028 | 1029 | - Paper: https://arxiv.org/abs/2303.02437 1030 | - Code: None 1031 | 1032 | **Cross-Domain 
Image Captioning with Discriminative Finetuning** 1033 | 1034 | - Paper: https://arxiv.org/abs/2304.01662 1035 | - Code: None 1036 | 1037 | **Model-Agnostic Gender Debiased Image Captioning** 1038 | 1039 | - Paper: https://arxiv.org/abs/2304.03693 1040 | - Code: None 1041 | 1042 | 1043 | 1044 | # 视觉问答(Visual Question Answering) 1045 | 1046 | **MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering** 1047 | 1048 | - Paper: https://arxiv.org/abs/2303.01239 1049 | - Code: https://github.com/jingjing12110/MixPHM 1050 | 1051 | 1052 | 1053 | # 手语识别(Sign Language Recognition) 1054 | 1055 | **Continuous Sign Language Recognition with Correlation Network** 1056 | 1057 | - Paper: https://arxiv.org/abs/2303.03202 1058 | 1059 | - Code: https://github.com/hulianyuyy/CorrNet 1060 | 1061 | 1062 | 1063 | # 视频预测(Video Prediction) 1064 | 1065 | **MOSO: Decomposing MOtion, Scene and Object for Video Prediction** 1066 | 1067 | - Paper: https://arxiv.org/abs/2303.03684 1068 | - Code: https://github.com/anonymous202203/MOSO 1069 | 1070 | 1071 | 1072 | # 新视点合成(Novel View Synthesis) 1073 | 1074 | **3D Video Loops from Asynchronous Input** 1075 | 1076 | - Homepage: https://limacv.github.io/VideoLoop3D_web/ 1077 | - Paper: https://arxiv.org/abs/2303.05312 1078 | - Code: https://github.com/limacv/VideoLoop3D 1079 | 1080 | 1081 | 1082 | # Zero-Shot Learning(零样本学习) 1083 | 1084 | **Bi-directional Distribution Alignment for Transductive Zero-Shot Learning** 1085 | 1086 | - Paper: https://arxiv.org/abs/2303.08698 1087 | - Code: https://github.com/Zhicaiwww/Bi-VAEGAN 1088 | 1089 | **Semantic Prompt for Few-Shot Learning** 1090 | 1091 | - Paper: None 1092 | - Code: None 1093 | 1094 | 1095 | 1096 | # 立体匹配(Stereo Matching) 1097 | 1098 | **Iterative Geometry Encoding Volume for Stereo Matching** 1099 | 1100 | - Paper: https://arxiv.org/abs/2303.06615 1101 | - Code: https://github.com/gangweiX/IGEV 1102 | 1103 | **Learning the Distribution of Errors in Stereo 
Matching for Joint Disparity and Uncertainty Estimation** 1104 | 1105 | - Paper: https://arxiv.org/abs/2304.00152 1106 | - Code: None 1107 | 1108 | 1109 | 1110 | # 特征匹配(Feature Matching) 1111 | 1112 | **Adaptive Spot-Guided Transformer for Consistent Local Feature Matching** 1113 | 1114 | - Homepage: [https://astr2023.github.io](https://astr2023.github.io/) 1115 | - Paper: https://arxiv.org/abs/2303.16624 1116 | - Code: https://github.com/ASTR2023/ASTR 1117 | 1118 | 1119 | 1120 | # 场景图生成(Scene Graph Generation) 1121 | 1122 | **Prototype-based Embedding Network for Scene Graph Generation** 1123 | 1124 | - Paper: https://arxiv.org/abs/2303.07096 1125 | - Code: None 1126 | 1127 | 1128 | 1129 | # 隐式神经表示(Implicit Neural Representations) 1130 | 1131 | **Polynomial Implicit Neural Representations For Large Diverse Datasets** 1132 | 1133 | - Paper: https://arxiv.org/abs/2303.11424 1134 | - Code: https://github.com/Rajhans0/Poly_INR 1135 | 1136 | 1137 | 1138 | # 图像质量评价(Image Quality Assessment) 1139 | 1140 | **Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild** 1141 | 1142 | - Paper: https://arxiv.org/abs/2304.00451 1143 | - Code: None 1144 | 1145 | 1146 | 1147 | # 数据集(Datasets) 1148 | 1149 | **Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes** 1150 | 1151 | - Paper: https://arxiv.org/abs/2303.02760 1152 | - Code: None 1153 | 1154 | **Align and Attend: Multimodal Summarization with Dual Contrastive Losses** 1155 | 1156 | - Homepage: https://boheumd.github.io/A2Summ/ 1157 | - Paper: https://arxiv.org/abs/2303.07284 1158 | - Code: https://github.com/boheumd/A2Summ 1159 | 1160 | **GeoNet: Benchmarking Unsupervised Adaptation across Geographies** 1161 | 1162 | - Homepage: https://tarun005.github.io/GeoNet/ 1163 | - Paper: https://arxiv.org/abs/2303.15443 1164 | 1165 | **CelebV-Text: A Large-Scale Facial Text-Video Dataset** 1166 | 1167 | - Homepage: https://celebv-text.github.io/ 1168 | - Paper: 
https://arxiv.org/abs/2303.14717 1169 | 1170 | 1171 | 1172 | # 其他(Others) 1173 | 1174 | **Interactive Segmentation as Gaussian Process Classification** 1175 | 1176 | - Paper: https://arxiv.org/abs/2302.14578 1177 | - Code: None 1178 | 1179 | **Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger** 1180 | 1181 | - Paper: https://arxiv.org/abs/2302.14677 1182 | - Code: None 1183 | 1184 | **SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries** 1185 | 1186 | - Homepage: http://bit.ly/splinecam 1187 | - Paper: https://arxiv.org/abs/2302.12828 1188 | - Code: None 1189 | 1190 | **SCOTCH and SODA: A Transformer Video Shadow Detection Framework** 1191 | 1192 | - Paper: https://arxiv.org/abs/2211.06885 1193 | - Code: None 1194 | 1195 | **DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization** 1196 | 1197 | - Homepage: https://ai4ce.github.io/DeepMapping2/ 1198 | - Paper: https://arxiv.org/abs/2212.06331 1199 | - Code: https://github.com/ai4ce/DeepMapping2 1200 | 1207 | **Token Turing Machines** 1208 | 1209 | - Paper: https://arxiv.org/abs/2211.09119 1210 | - Code: None 1211 | 1212 | **Single Image Backdoor Inversion via Robust Smoothed Classifiers** 1213 | 1214 | - Paper: https://arxiv.org/abs/2303.00215 1215 | - Code: https://github.com/locuslab/smoothinv 1216 | 1222 | **HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics** 1223 | 1224 | - Homepage: https://dolorousrtur.github.io/hood/ 1225 | - Paper: 
https://arxiv.org/abs/2212.07242 1226 | - Code: https://github.com/dolorousrtur/hood 1227 | - Demo: https://www.youtube.com/watch?v=cBttMDPrUYY 1228 | 1229 | **A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others** 1230 | 1231 | - Paper: https://arxiv.org/abs/2212.04825 1232 | - Code: https://github.com/facebookresearch/Whac-A-Mole.git 1233 | 1234 | **RelightableHands: Efficient Neural Relighting of Articulated Hand Models** 1235 | 1236 | - Homepage: https://sh8.io/#/relightable_hands 1237 | - Paper: https://arxiv.org/abs/2302.04866 1238 | - Code: None 1239 | - Demo: https://sh8.io/static/media/teacher_video.923d87957fe0610730c2.mp4 1240 | 1241 | **Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation** 1242 | 1243 | - Paper: https://arxiv.org/abs/2303.00914 1244 | - Code: None 1245 | 1246 | **Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression** 1247 | 1248 | - Paper: https://arxiv.org/abs/2303.01052 1249 | - Code: None 1250 | 1251 | **UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy** 1252 | 1253 | - Paper: https://arxiv.org/abs/2303.00938 1254 | - Code: None 1255 | 1256 | **Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness** 1257 | 1258 | - Paper: https://arxiv.org/abs/2303.00971 1259 | - Code: https://github.com/zhijieshen-bjtu/DOPNet 1260 | 1261 | **Learning Neural Parametric Head Models** 1262 | 1263 | - Homepage: https://simongiebenhain.github.io/NPHM 1264 | - Paper: https://arxiv.org/abs/2212.02761 1265 | - Code: None 1266 | 1267 | **A Meta-Learning Approach to Predicting Performance and Data Requirements** 1268 | 1269 | - Paper: https://arxiv.org/abs/2303.01598 1270 | - Code: None 1271 | 1272 | **MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision** 1273 | 1274 | - 
Homepage: https://imagine.enpc.fr/~guedona/MACARONS/ 1275 | - Paper: https://arxiv.org/abs/2303.03315 1276 | - Code: None 1277 | 1278 | **Masked Images Are Counterfactual Samples for Robust Fine-tuning** 1279 | 1280 | - Paper: https://arxiv.org/abs/2303.03052 1281 | - Code: None 1282 | 1283 | **HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling** 1284 | 1285 | - Paper: https://arxiv.org/abs/2303.02700 1286 | - Code: None 1287 | 1288 | **Decompose, Adjust, Compose: Effective Normalization by Playing with Frequency for Domain Generalization** 1289 | 1290 | - Paper: https://arxiv.org/abs/2303.02328 1291 | - Code: None 1292 | 1293 | **Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization** 1294 | 1295 | - Paper: https://arxiv.org/abs/2303.03108 1296 | - Code: None 1297 | 1298 | **Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples** 1299 | 1300 | - Paper: https://arxiv.org/abs/2301.01217 1301 | - Code: https://github.com/jiamingzhang94/Unlearnable-Clusters 1302 | 1303 | **Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes** 1304 | 1305 | - Paper: https://arxiv.org/abs/2303.04249 1306 | - Code: None 1307 | 1308 | **UniHCP: A Unified Model for Human-Centric Perceptions** 1309 | 1310 | - Paper: https://arxiv.org/abs/2303.02936 1311 | - Code: https://github.com/OpenGVLab/UniHCP 1312 | 1313 | **CUDA: Convolution-based Unlearnable Datasets** 1314 | 1315 | - Paper: https://arxiv.org/abs/2303.04278 1316 | - Code: https://github.com/vinusankars/Convolution-based-Unlearnability 1317 | 1323 | **AdaptiveMix: Robust Feature Representation via Shrinking Feature Space** 1324 | 1325 | - Paper: https://arxiv.org/abs/2303.01559 1326 | - Code: 
https://github.com/WentianZhang-ML/AdaptiveMix 1327 | 1328 | **Physical-World Optical Adversarial Attacks on 3D Face Recognition** 1329 | 1330 | - Paper: https://arxiv.org/abs/2205.13412 1331 | - Code: https://github.com/PolyLiYJ/SLAttack.git 1332 | 1333 | **DPE: Disentanglement of Pose and Expression for General Video Portrait Editing** 1334 | 1335 | - Paper: https://arxiv.org/abs/2301.06281 1336 | - Code: https://carlyx.github.io/DPE/ 1337 | 1338 | **SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation** 1339 | 1340 | - Paper: https://arxiv.org/abs/2211.12194 1341 | - Code: https://github.com/Winfredy/SadTalker 1342 | 1343 | **Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models** 1344 | 1345 | - Paper: None 1346 | - Code: None 1347 | 1348 | **Sharpness-Aware Gradient Matching for Domain Generalization** 1349 | 1350 | - Paper: None 1351 | - Code: https://github.com/Wang-pengfei/SAGM 1352 | 1353 | **Mind the Label-shift for Augmentation-based Graph Out-of-distribution Generalization** 1354 | 1355 | - Paper: None 1356 | - Code: None 1357 | 1358 | **Blind Video Deflickering by Neural Filtering with a Flawed Atlas** 1359 | 1360 | - Homepage: https://chenyanglei.github.io/deflicker 1361 | - Paper: None 1362 | - Code: None 1363 | 1364 | **RiDDLE: Reversible and Diversified De-identification with Latent Encryptor** 1365 | 1366 | - Paper: None 1367 | - Code: https://github.com/ldz666666/RiDDLE 1368 | 1369 | **PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation** 1370 | 1371 | - Paper: https://arxiv.org/abs/2303.07337 1372 | - Code: None 1373 | 1374 | **Upcycling Models under Domain and Category Shift** 1375 | 1376 | - Paper: https://arxiv.org/abs/2303.07110 1377 | - Code: https://github.com/ispc-lab/GLC 1378 | 1379 | **Modality-Agnostic Debiasing for Single Domain Generalization** 1380 | 1381 | - Paper: https://arxiv.org/abs/2303.07123 
1382 | - Code: None 1383 | 1384 | **Progressive Open Space Expansion for Open-Set Model Attribution** 1385 | 1386 | - Paper: https://arxiv.org/abs/2303.06877 1387 | - Code: None 1388 | 1389 | **Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies** 1390 | 1391 | - Paper: https://arxiv.org/abs/2303.06856 1392 | - Code: None 1393 | 1394 | **GFPose: Learning 3D Human Pose Prior with Gradient Fields** 1395 | 1396 | - Paper: https://arxiv.org/abs/2212.08641 1397 | - Code: https://github.com/Embracing/GFPose 1398 | 1399 | **PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment** 1400 | 1401 | - Paper: https://arxiv.org/abs/2303.11526 1402 | - Code: https://github.com/Zhang-VISLab 1403 | 1404 | **Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings** 1405 | 1406 | - Paper: https://arxiv.org/abs/2303.11502 1407 | - Code: None 1408 | 1409 | **Boundary Unlearning** 1410 | 1411 | - Paper: https://arxiv.org/abs/2303.11570 1412 | - Code: None 1413 | 1414 | **ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing** 1415 | 1416 | - Paper: https://arxiv.org/abs/2303.17096 1417 | - Code: https://github.com/alibaba/easyrobust 1418 | 1419 | **Zero-shot Model Diagnosis** 1420 | 1421 | - Paper: https://arxiv.org/abs/2303.15441 1422 | - Code: None 1423 | 1429 | **Quantum Multi-Model Fitting** 1430 | 1431 | - Paper: https://arxiv.org/abs/2303.15444 1432 | - Code: https://github.com/FarinaMatteo/qmmf 1433 | 1434 | **DivClust: Controlling Diversity in Deep Clustering** 1435 | 1436 | - Paper: https://arxiv.org/abs/2304.01042 1437 | - Code: None 1438 | 1439 | **Neural Volumetric Memory for Visual Locomotion Control** 1440 | 1441 | - Homepage: https://rchalyang.github.io/NVM 
1442 | - Paper: https://arxiv.org/abs/2304.01201 1443 | - Code: https://rchalyang.github.io/NVM 1444 | 1445 | **MonoHuman: Animatable Human Neural Field from Monocular Video** 1446 | 1447 | - Homepage: https://yzmblog.github.io/projects/MonoHuman/ 1448 | - Paper: https://arxiv.org/abs/2304.02001 1449 | - Code: https://github.com/Yzmblog/MonoHuman 1450 | 1451 | **Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion** 1452 | 1453 | - Homepage: https://nv-tlabs.github.io/trace-pace/ 1454 | - Paper: https://arxiv.org/abs/2304.01893 1455 | - Code: None 1456 | 1457 | **Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification** 1458 | 1459 | - Paper: https://arxiv.org/abs/2304.01804 1460 | - Code: None 1461 | 1462 | **HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering** 1463 | 1464 | - Paper: https://arxiv.org/abs/2304.01686 1465 | - Code: None 1466 | 1467 | **On the Stability-Plasticity Dilemma of Class-Incremental Learning** 1468 | 1469 | - Paper: https://arxiv.org/abs/2304.01663 1470 | - Code: None 1471 | 1472 | **Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning** 1473 | 1474 | - Paper: https://arxiv.org/abs/2304.01482 1475 | - Code: None 1476 | 1477 | **VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution** 1478 | 1479 | - Paper: https://arxiv.org/abs/2304.01434 1480 | - Code: https://github.com/jaeill/CVPR23-VNE 1481 | 1482 | **Detecting and Grounding Multi-Modal Media Manipulation** 1483 | 1484 | - Homepage: https://rshaojimmy.github.io/Projects/MultiModal-DeepFake 1485 | - Paper: https://arxiv.org/abs/2304.02556 1486 | - Code: https://github.com/rshaojimmy/MultiModal-DeepFake 1487 | 1488 | **Meta-causal Learning for Single Domain Generalization** 1489 | 1490 | - Paper: https://arxiv.org/abs/2304.03709 1491 | - Code: None 1492 | 1493 | **Disentangling Writer and Character Styles for Handwriting 
Generation** 1494 | 1495 | - Paper: https://arxiv.org/abs/2303.14736 1496 | - Code: https://github.com/dailenson/SDT 1497 | 1498 | **DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects** 1499 | 1500 | - Homepage: https://www.chenbao.tech/dexart/ 1501 | 1502 | - Code: https://github.com/Kami-code/dexart-release 1503 | 1504 | **Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision** 1505 | 1506 | - Homepage: https://toytiny.github.io/publication/23-cmflow-cvpr/index.html 1507 | - Paper: https://arxiv.org/abs/2303.00462 1508 | - Code: https://github.com/Toytiny/CMFlow 1509 | 1510 | **Marching-Primitives: Shape Abstraction from Signed Distance Function** 1511 | 1512 | - Paper: https://arxiv.org/abs/2303.13190 1513 | - Code: https://github.com/ChirikjianLab/Marching-Primitives 1514 | 1515 | **Towards Trustable Skin Cancer Diagnosis via Rewriting Model's Decision** 1516 | 1517 | - Paper: https://arxiv.org/abs/2303.00885 1518 | - Code: None -------------------------------------------------------------------------------- /CVer学术交流群.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wyf3/CVPR2024-Papers-with-Code/7a12b2155e596a79ba6dcc7a17a5ae27f0fc50a8/CVer学术交流群.png -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # CVPR 2024 论文和开源项目合集(Papers with Code) 2 | 3 | CVPR 2024 decisions are now available on OpenReview! 4 | 5 | 6 | > 注1:欢迎各位大佬提交issue,分享CVPR 2024论文和开源项目! 
7 | > 8 | > 注2:关于往年CV顶会论文以及其他优质CV论文和大盘点,详见: https://github.com/amusi/daily-paper-computer-vision 9 | > 10 | > - [CVPR 2019](CVPR2019-Papers-with-Code.md) 11 | > - [CVPR 2020](CVPR2020-Papers-with-Code.md) 12 | > - [CVPR 2021](CVPR2021-Papers-with-Code.md) 13 | > - [CVPR 2022](CVPR2022-Papers-with-Code.md) 14 | > - [CVPR 2023](CVPR2023-Papers-with-Code.md) 15 | 16 | 欢迎扫码加入【CVer学术交流群】,这是最大的计算机视觉AI知识星球!每日更新,第一时间分享最新最前沿的计算机视觉、AI绘画、图像处理、深度学习、自动驾驶、医疗影像和AIGC等方向的学习资料,学起来! 17 | 18 | ![](CVer学术交流群.png) 19 | 20 | # 【CVPR 2024 论文开源目录】 21 | 22 | - [3DGS(Gaussian Splatting)](#3DGS) 23 | - [Avatars](#Avatars) 24 | - [Backbone](#Backbone) 25 | - [CLIP](#CLIP) 26 | - [MAE](#MAE) 27 | - [Embodied AI](#Embodied-AI) 28 | - [GAN](#GAN) 29 | - [GNN](#GNN) 30 | - [多模态大语言模型(MLLM)](#MLLM) 31 | - [大语言模型(LLM)](#LLM) 32 | - [NAS](#NAS) 33 | - [OCR](#OCR) 34 | - [NeRF](#NeRF) 35 | - [DETR](#DETR) 36 | - [Prompt](#Prompt) 37 | - [扩散模型(Diffusion Models)](#Diffusion) 38 | - [ReID(重识别)](#ReID) 39 | - [长尾分布(Long-Tail)](#Long-Tail) 40 | - [Vision Transformer](#Vision-Transformer) 41 | - [视觉和语言(Vision-Language)](#VL) 42 | - [自监督学习(Self-supervised Learning)](#SSL) 43 | - [数据增强(Data Augmentation)](#DA) 44 | - [目标检测(Object Detection)](#Object-Detection) 45 | - [异常检测(Anomaly Detection)](#Anomaly-Detection) 46 | - [目标跟踪(Visual Tracking)](#VT) 47 | - [语义分割(Semantic Segmentation)](#Semantic-Segmentation) 48 | - [实例分割(Instance Segmentation)](#Instance-Segmentation) 49 | - [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation) 50 | - [医学图像(Medical Image)](#MI) 51 | - [医学图像分割(Medical Image Segmentation)](#MIS) 52 | - [视频目标分割(Video Object Segmentation)](#VOS) 53 | - [视频实例分割(Video Instance Segmentation)](#VIS) 54 | - [参考图像分割(Referring Image Segmentation)](#RIS) 55 | - [图像抠图(Image Matting)](#Matting) 56 | - [图像编辑(Image Editing)](#Image-Editing) 57 | - [Low-level Vision](#LLV) 58 | - [超分辨率(Super-Resolution)](#SR) 59 | - [去噪(Denoising)](#Denoising) 60 | - [去模糊(Deblur)](#Deblur) 61 | - [自动驾驶(Autonomous 
Driving)](#Autonomous-Driving) 62 | - [3D点云(3D Point Cloud)](#3D-Point-Cloud) 63 | - [3D目标检测(3D Object Detection)](#3DOD) 64 | - [3D语义分割(3D Semantic Segmentation)](#3DSS) 65 | - [3D目标跟踪(3D Object Tracking)](#3D-Object-Tracking) 66 | - [3D语义场景补全(3D Semantic Scene Completion)](#3DSSC) 67 | - [3D配准(3D Registration)](#3D-Registration) 68 | - [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation) 69 | - [3D人体Mesh估计(3D Human Mesh Estimation)](#3D-Human-Mesh-Estimation) 71 | - [图像生成(Image Generation)](#Image-Generation) 72 | - [视频生成(Video Generation)](#Video-Generation) 73 | - [3D生成(3D Generation)](#3D-Generation) 74 | - [视频理解(Video Understanding)](#Video-Understanding) 75 | - [行为检测(Action Detection)](#Action-Detection) 76 | - [文本检测(Text Detection)](#Text-Detection) 77 | - [知识蒸馏(Knowledge Distillation)](#KD) 78 | - [模型剪枝(Model Pruning)](#Pruning) 79 | - [图像压缩(Image Compression)](#IC) 80 | - [三维重建(3D Reconstruction)](#3D-Reconstruction) 81 | - [深度估计(Depth Estimation)](#Depth-Estimation) 82 | - [轨迹预测(Trajectory Prediction)](#TP) 83 | - [车道线检测(Lane Detection)](#Lane-Detection) 84 | - [图像描述(Image Captioning)](#Image-Captioning) 85 | - [视觉问答(Visual Question Answering)](#VQA) 86 | - [手语识别(Sign Language Recognition)](#SLR) 87 | - [视频预测(Video Prediction)](#Video-Prediction) 88 | - [新视点合成(Novel View Synthesis)](#NVS) 89 | - [Zero-Shot Learning(零样本学习)](#ZSL) 90 | - [立体匹配(Stereo Matching)](#Stereo-Matching) 91 | - [特征匹配(Feature Matching)](#Feature-Matching) 92 | - [场景图生成(Scene Graph Generation)](#SGG) 93 | - [隐式神经表示(Implicit Neural Representations)](#INR) 94 | - [图像质量评价(Image Quality Assessment)](#IQA) 95 | - [视频质量评价(Video Quality Assessment)](#Video-Quality-Assessment) 96 | - [数据集(Datasets)](#Datasets) 97 | - [新任务(New Tasks)](#New-Tasks) 98 | - [其他(Others)](#Others) 99 | 100 | 101 | 102 | # 3DGS(Gaussian Splatting) 103 | 104 | **Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering** 105 | 106 | - Homepage: 
https://city-super.github.io/scaffold-gs/ 107 | - Paper: https://arxiv.org/abs/2312.00109 108 | - Code: https://github.com/city-super/Scaffold-GS 109 | 110 | **GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis** 111 | 112 | - Homepage: https://shunyuanzheng.github.io/GPS-Gaussian 113 | - Paper: https://arxiv.org/abs/2312.02155 114 | - Code: https://github.com/ShunyuanZheng/GPS-Gaussian 115 | 116 | **GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians** 117 | 118 | - Paper: https://arxiv.org/abs/2312.02134 119 | - Code: https://github.com/huliangxiao/GaussianAvatar 120 | 121 | **GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting** 122 | 123 | - Paper: https://arxiv.org/abs/2311.14521 124 | - Code: https://github.com/buaacyw/GaussianEditor 125 | 126 | **Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction** 127 | 128 | - Homepage: https://ingra14m.github.io/Deformable-Gaussians/ 129 | - Paper: https://arxiv.org/abs/2309.13101 130 | - Code: https://github.com/ingra14m/Deformable-3D-Gaussians 131 | 132 | **SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes** 133 | 134 | - Homepage: https://yihua7.github.io/SC-GS-web/ 135 | - Paper: https://arxiv.org/abs/2312.14937 136 | - Code: https://github.com/yihua7/SC-GS 137 | 138 | **Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis** 139 | 140 | - Homepage: https://oppo-us-research.github.io/SpacetimeGaussians-website/ 141 | - Paper: https://arxiv.org/abs/2312.16812 142 | - Code: https://github.com/oppo-us-research/SpacetimeGaussians 143 | 144 | **DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization** 145 | 146 | - Homepage: https://fictionarry.github.io/DNGaussian/ 147 | - Paper: https://arxiv.org/abs/2403.06912 148 | - Code: https://github.com/Fictionarry/DNGaussian 149 | 150 | **4D 
**4D Gaussian Splatting for Real-Time Dynamic Scene Rendering**

- Paper: https://arxiv.org/abs/2310.08528
- Code: https://github.com/hustvl/4DGaussians

**GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models**

- Paper: https://arxiv.org/abs/2310.08529
- Code: https://github.com/hustvl/GaussianDreamer

# Avatars

**GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians**

- Paper: https://arxiv.org/abs/2312.02134
- Code: https://github.com/huliangxiao/GaussianAvatar

**Real-Time Simulated Avatar from Head-Mounted Sensors**

- Homepage: https://www.zhengyiluo.com/SimXR/
- Paper: https://arxiv.org/abs/2403.06862

# Backbone

**RepViT: Revisiting Mobile CNN From ViT Perspective**

- Paper: https://arxiv.org/abs/2307.09283
- Code: https://github.com/THU-MIG/RepViT

**TransNeXt: Robust Foveal Visual Perception for Vision Transformers**

- Paper: https://arxiv.org/abs/2311.17132
- Code: https://github.com/DaiShiResearch/TransNeXt

# CLIP

**Alpha-CLIP: A CLIP Model Focusing on Wherever You Want**

- Paper: https://arxiv.org/abs/2312.03818
- Code: https://github.com/SunzeY/AlphaCLIP

**FairCLIP: Harnessing Fairness in Vision-Language Learning**

- Paper: https://arxiv.org/abs/2403.19949
- Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP

# MAE

# Embodied AI

**EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI**

- Homepage: https://tai-wang.github.io/embodiedscan/
- Paper: https://arxiv.org/abs/2312.16170
- Code: https://github.com/OpenRobotLab/EmbodiedScan

**MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception**

- Homepage: https://iranqin.github.io/MP5.github.io/
- Paper: https://arxiv.org/abs/2312.07472
- Code: https://github.com/IranQin/MP5

**LEMON: Learning 3D Human-Object Interaction Relation from 2D Images**

- Paper: https://arxiv.org/abs/2312.08963
- Code: https://github.com/yyvhang/lemon_3d

# GAN

# OCR

**An Empirical Study of Scaling Law for OCR**

- Paper: https://arxiv.org/abs/2401.00028
- Code: https://github.com/large-ocr-model/large-ocr-model.github.io

**ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting**

- Paper: https://arxiv.org/abs/2403.00303
- Code: https://github.com/PriNing/ODM

# NeRF

**PIE-NeRF🍕: Physics-based Interactive Elastodynamics with NeRF**

- Paper: https://arxiv.org/abs/2311.13099
- Code: https://github.com/FYTalon/pienerf/

# DETR

**DETRs Beat YOLOs on Real-time Object Detection**

- Paper: https://arxiv.org/abs/2304.08069
- Code: https://github.com/lyuwenyu/RT-DETR

**Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement**

- Paper: https://arxiv.org/abs/2403.16131
- Code: https://github.com/xiuqhou/Salience-DETR

# Prompt

# Multimodal Large Language Models (MLLM)

**mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration**

- Paper: https://arxiv.org/abs/2311.04257
- Code: https://github.com/X-PLUG/mPLUG-Owl/tree/main/mPLUG-Owl2

**Link-Context Learning for Multimodal LLMs**

- Paper: https://arxiv.org/abs/2308.07891
- Code: https://github.com/isekai-portal/Link-Context-Learning/tree/main

**OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation**

- Paper: https://arxiv.org/abs/2311.17911
- Code: https://github.com/shikiw/OPERA

**Making Large Multimodal Models Understand Arbitrary Visual Prompts**

- Homepage: https://vip-llava.github.io/
- Paper: https://arxiv.org/abs/2312.00784

**Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs**

- Paper: https://arxiv.org/abs/2310.00582
- Code: https://github.com/SY-Xuan/Pink

**Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding**

- Paper: https://arxiv.org/abs/2311.08046
- Code: https://github.com/PKU-YuanGroup/Chat-UniVi

**OneLLM: One Framework to Align All Modalities with Language**

- Paper: https://arxiv.org/abs/2312.03700
- Code: https://github.com/csuhan/OneLLM

# Large Language Models (LLM)

**VTimeLLM: Empower LLM to Grasp Video Moments**

- Paper: https://arxiv.org/abs/2311.18445
- Code: https://github.com/huangb23/VTimeLLM

# NAS

# ReID (Re-identification)

**Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification**

- Paper: https://arxiv.org/abs/2403.10254
- Code: https://github.com/924973292/EDITOR

**Noisy-Correspondence Learning for Text-to-Image Person Re-identification**

- Paper: https://arxiv.org/abs/2308.09911
- Code: https://github.com/QinYang79/RDE

# Diffusion Models

**InstanceDiffusion: Instance-level Control for Image Generation**

- Homepage: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/
- Paper: https://arxiv.org/abs/2402.03290
- Code: https://github.com/frank-xwang/InstanceDiffusion

**Residual Denoising Diffusion Models**

- Paper: https://arxiv.org/abs/2308.13712
- Code: https://github.com/nachifur/RDDM

**DeepCache: Accelerating Diffusion Models for Free**

- Paper: https://arxiv.org/abs/2312.00858
- Code: https://github.com/horseee/DeepCache

**DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations**

- Homepage: https://tianhao-qi.github.io/DEADiff/
- Paper: https://arxiv.org/abs/2403.06951
- Code: https://github.com/Tianhao-Qi/DEADiff_code

**SVGDreamer: Text Guided SVG Generation with Diffusion Model**

- Paper: https://arxiv.org/abs/2312.16476
- Code: https://ximinng.github.io/SVGDreamer-project/

**InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model**

- Paper: https://arxiv.org/abs/2312.05849
- Code: https://github.com/jiuntian/interactdiffusion

**MMA-Diffusion: MultiModal Attack on Diffusion Models**

- Paper: https://arxiv.org/abs/2311.17516
- Code: https://github.com/yangyijune/MMA-Diffusion

**VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models**

- Homepage: https://video-motion-customization.github.io/
- Paper: https://arxiv.org/abs/2312.00845
- Code: https://github.com/HyeonHo99/Video-Motion-Customization

# Vision Transformer

**TransNeXt: Robust Foveal Visual Perception for Vision Transformers**

- Paper: https://arxiv.org/abs/2311.17132
- Code: https://github.com/DaiShiResearch/TransNeXt

**RepViT: Revisiting Mobile CNN From ViT Perspective**

- Paper: https://arxiv.org/abs/2307.09283
- Code: https://github.com/THU-MIG/RepViT

**A General and Efficient Training for Transformer via Token Expansion**

- Paper: https://arxiv.org/abs/2404.00672
- Code: https://github.com/Osilly/TokenExpansion

# Vision-Language

**PromptKD: Unsupervised Prompt Distillation for Vision-Language Models**

- Paper: https://arxiv.org/abs/2403.02781
- Code: https://github.com/zhengli97/PromptKD

**FairCLIP: Harnessing Fairness in Vision-Language Learning**

- Paper: https://arxiv.org/abs/2403.19949
- Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP

# Object Detection

**DETRs Beat YOLOs on Real-time Object Detection**

- Paper: https://arxiv.org/abs/2304.08069
- Code: https://github.com/lyuwenyu/RT-DETR

**Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation**

- Paper: https://arxiv.org/abs/2312.01220
- Code: https://github.com/ZPDu/Boosting-Object-Detection-with-Zero-Shot-Day-Night-Domain-Adaptation

**YOLO-World: Real-Time Open-Vocabulary Object Detection**

- Paper: https://arxiv.org/abs/2401.17270
- Code: https://github.com/AILab-CVC/YOLO-World

**Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement**

- Paper: https://arxiv.org/abs/2403.16131
- Code: https://github.com/xiuqhou/Salience-DETR

# Anomaly Detection

**Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection**

- Paper: https://arxiv.org/abs/2310.12790
- Code: https://github.com/mala-lab/AHL

# Object Tracking

**Delving into the Trajectory Long-tail Distribution for Muti-object Tracking**

- Paper: https://arxiv.org/abs/2403.04700
- Code: https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT
# Semantic Segmentation

**Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation**

- Paper: https://arxiv.org/abs/2312.04265
- Code: https://github.com/w1oves/Rein

**SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation**

- Paper: https://arxiv.org/abs/2311.15537
- Code: https://github.com/xb534/SED

# Medical Image

**Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology**

- Paper: https://arxiv.org/abs/2402.17228
- Code: https://github.com/DearCaat/RRT-MIL

**VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis**

- Paper: https://arxiv.org/abs/2402.17300
- Code: https://github.com/Luffy03/VoCo

**ChAda-ViT: Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images**

- Paper: https://arxiv.org/abs/2311.15264
- Code: https://github.com/nicoboou/chada_vit

# Medical Image Segmentation

# Autonomous Driving

**UniPAD: A Universal Pre-training Paradigm for Autonomous Driving**

- Paper: https://arxiv.org/abs/2310.08370
- Code: https://github.com/Nightmare-n/UniPAD

**Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications**

- Paper: https://arxiv.org/abs/2311.17663
- Code: https://github.com/haomo-ai/Cam4DOcc

**Memory-based Adapters for Online 3D Scene Perception**

- Paper: https://arxiv.org/abs/2403.06974
- Code: https://github.com/xuxw98/Online3D

**Symphonize 3D Semantic Scene Completion with Contextual Instance Queries**

- Paper: https://arxiv.org/abs/2306.15670
- Code: https://github.com/hustvl/Symphonies

**A Real-world Large-scale Dataset for Roadside Cooperative Perception**

- Paper: https://arxiv.org/abs/2403.10145
- Code: https://github.com/AIR-THU/DAIR-RCooper

**Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving**

- Paper: https://arxiv.org/abs/2403.07535
- Code: https://github.com/Junda24/AFNet

**Traffic Scene Parsing through the TSP6K Dataset**

- Paper: https://arxiv.org/pdf/2303.02835.pdf
- Code: https://github.com/PengtaoJiang/TSP6K

# 3D Point Cloud

# 3D Object Detection

**PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection**

- Paper: https://arxiv.org/abs/2312.08371
- Code: https://github.com/kuanchihhuang/PTT

**UniMODE: Unified Monocular 3D Object Detection**

- Paper: https://arxiv.org/abs/2402.18573

# 3D Semantic Segmentation

# Image Editing

**Edit One for All: Interactive Batch Image Editing**

- Homepage: https://thaoshibe.github.io/edit-one-for-all
- Paper: https://arxiv.org/abs/2401.10219
- Code: https://github.com/thaoshibe/edit-one-for-all

# Video Editing

**MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers**

- Homepage: https://maskint.github.io
- Paper: https://arxiv.org/abs/2312.12468

# Low-level Vision

**Residual Denoising Diffusion Models**

- Paper: https://arxiv.org/abs/2308.13712
- Code: https://github.com/nachifur/RDDM

**Boosting Image Restoration via Priors from Pre-trained Models**

- Paper: https://arxiv.org/abs/2403.06793

# Super-Resolution

**SeD: Semantic-Aware Discriminator for Image Super-Resolution**

- Paper: https://arxiv.org/abs/2402.19387
- Code: https://github.com/lbc12345/SeD

**APISR: Anime Production Inspired Real-World Anime Super-Resolution**

- Paper: https://arxiv.org/abs/2403.01598
- Code: https://github.com/Kiteretsu77/APISR

# Denoising

## Image Denoising

# 3D Human Pose Estimation

**Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation**

- Paper: https://arxiv.org/abs/2311.12028
- Code: https://github.com/NationalGAILab/HoT

# Image Generation

**InstanceDiffusion: Instance-level Control for Image Generation**

- Homepage: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/
- Paper: https://arxiv.org/abs/2402.03290
- Code: https://github.com/frank-xwang/InstanceDiffusion

**ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations**

- Homepage: https://eclipse-t2i.vercel.app/
- Paper: https://arxiv.org/abs/2312.04655
- Code: https://github.com/eclipse-t2i/eclipse-inference

**Instruct-Imagen: Image Generation with Multi-modal Instruction**

- Paper: https://arxiv.org/abs/2401.01952

**Residual Denoising Diffusion Models**

- Paper: https://arxiv.org/abs/2308.13712
- Code: https://github.com/nachifur/RDDM

**UniGS: Unified Representation for Image Generation and Segmentation**

- Paper: https://arxiv.org/abs/2312.01985

**Multi-Instance Generation Controller for Text-to-Image Synthesis**

- Paper: https://arxiv.org/abs/2402.05408
- Code: https://github.com/limuloo/migc
**SVGDreamer: Text Guided SVG Generation with Diffusion Model**

- Paper: https://arxiv.org/abs/2312.16476
- Code: https://ximinng.github.io/SVGDreamer-project/

**InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model**

- Paper: https://arxiv.org/abs/2312.05849
- Code: https://github.com/jiuntian/interactdiffusion

**Ranni: Taming Text-to-Image Diffusion for Accurate Prompt Following**

- Paper: https://arxiv.org/abs/2311.17002
- Code: https://github.com/ali-vilab/Ranni

# Video Generation

**Vlogger: Make Your Dream A Vlog**

- Paper: https://arxiv.org/abs/2401.09414
- Code: https://github.com/Vchitect/Vlogger

**VBench: Comprehensive Benchmark Suite for Video Generative Models**

- Homepage: https://vchitect.github.io/VBench-project/
- Paper: https://arxiv.org/abs/2311.17982
- Code: https://github.com/Vchitect/VBench

**VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models**

- Homepage: https://video-motion-customization.github.io/
- Paper: https://arxiv.org/abs/2312.00845
- Code: https://github.com/HyeonHo99/Video-Motion-Customization

# 3D Generation

**CityDreamer: Compositional Generative Model of Unbounded 3D Cities**

- Homepage: https://haozhexie.com/project/city-dreamer/
- Paper: https://arxiv.org/abs/2309.00610
- Code: https://github.com/hzxie/city-dreamer

**LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching**

- Paper: https://arxiv.org/abs/2311.11284
- Code: https://github.com/EnVision-Research/LucidDreamer

# Video Understanding

**MVBench: A Comprehensive Multi-modal Video Understanding Benchmark**

- Paper: https://arxiv.org/abs/2311.17005
- Code: https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat2

# Knowledge Distillation

**Logit Standardization in Knowledge Distillation**

- Paper: https://arxiv.org/abs/2403.01427
- Code: https://github.com/sunshangquan/logit-standardization-KD

**Efficient Dataset Distillation via Minimax Diffusion**

- Paper: https://arxiv.org/abs/2311.15529
- Code: https://github.com/vimar-gu/MinimaxDiffusion

# Stereo Matching

**Neural Markov Random Field for Stereo Matching**

- Paper: https://arxiv.org/abs/2403.11193
- Code: https://github.com/aeolusguan/NMRF

# Scene Graph Generation

**HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation**

- Homepage: https://zhangce01.github.io/HiKER-SGG/
- Paper: https://arxiv.org/abs/2403.12033
- Code: https://github.com/zhangce01/HiKER-SGG

# Video Quality Assessment

**KVQ: Kaleidoscope Video Quality Assessment for Short-form Videos**

- Homepage: https://lixinustc.github.io/projects/KVQ/
- Paper: https://arxiv.org/abs/2402.07220
- Code: https://github.com/lixinustc/KVQ-Challenge-CVPR-NTIRE2024

# Datasets

**A Real-world Large-scale Dataset for Roadside Cooperative Perception**

- Paper: https://arxiv.org/abs/2403.10145
- Code: https://github.com/AIR-THU/DAIR-RCooper

**Traffic Scene Parsing through the TSP6K Dataset**

- Paper: https://arxiv.org/pdf/2303.02835.pdf
- Code: https://github.com/PengtaoJiang/TSP6K

# Others

**Object Recognition as Next Token Prediction**

- Paper: https://arxiv.org/abs/2312.02142
- Code: https://github.com/kaiyuyue/nxtp
**ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks**

- Paper: https://arxiv.org/abs/2306.14525
- Code: https://parameternet.github.io/

**Seamless Human Motion Composition with Blended Positional Encodings**

- Paper: https://arxiv.org/abs/2402.15509
- Code: https://github.com/BarqueroGerman/FlowMDM

**LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning**

- Homepage: https://ll3da.github.io/
- Paper: https://arxiv.org/abs/2311.18651
- Code: https://github.com/Open3DA/LL3DA

**CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update**

- Homepage: https://clova-tool.github.io/
- Paper: https://arxiv.org/abs/2312.10908

**MoMask: Generative Masked Modeling of 3D Human Motions**

- Paper: https://arxiv.org/abs/2312.00063
- Code: https://github.com/EricGuo5513/momask-codes

**Amodal Ground Truth and Completion in the Wild**

- Homepage: https://www.robots.ox.ac.uk/~vgg/research/amodal/
- Paper: https://arxiv.org/abs/2312.17247
- Code: https://github.com/Championchess/Amodal-Completion-in-the-Wild

**Improved Visual Grounding through Self-Consistent Explanations**

- Paper: https://arxiv.org/abs/2312.04554
- Code: https://github.com/uvavision/SelfEQ

**ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object**

- Homepage: https://chenshuang-zhang.github.io/imagenet_d/
- Paper: https://arxiv.org/abs/2403.18775
- Code: https://github.com/chenshuang-zhang/imagenet_d

**Learning from Synthetic Human Group Activities**

- Homepage: https://cjerry1243.github.io/M3Act/
- Paper: https://arxiv.org/abs/2306.16772
- Code: https://github.com/cjerry1243/M3Act

**MindBridge: A Cross-Subject Brain Decoding Framework**

- Homepage: https://littlepure2333.github.io/MindBridge/
- Paper: https://arxiv.org/abs/2404.07850
- Code: https://github.com/littlepure2333/MindBridge

**Multi-Task Dense Prediction via Mixture of Low-Rank Experts**

- Paper: https://arxiv.org/abs/2403.17749
- Code: https://github.com/YuqiYang213/MLoRE

**Contrastive Mean-Shift Learning for Generalized Category Discovery**

- Homepage: https://postech-cvlab.github.io/cms/
- Paper: https://arxiv.org/abs/2404.09451
- Code: https://github.com/sua-choi/CMS