├── CVPR2019-Papers-with-Code.md
├── CVPR2020-Papers-with-Code.md
├── CVPR2021-Papers-with-Code.md
├── CVPR2022-Papers-with-Code.md
├── CVPR2023-Papers-with-Code.md
├── CVer学术交流群.png
├── README.md
└── master


/CVPR2019-Papers-with-Code.md:
--------------------------------------------------------------------------------
  1 | # CVPR2019-Code
  2 | 
  3 | CVPR 2019 论文开源项目合集
  4 | 
  5 | 传送门：[CVPR 2020 论文开源项目合集](https://github.com/amusi/CVPR2020-Code)
  6 | 
  7 | 附：[530 篇 CVPR 2019 论文代码链接](./CVPR2019_CodeLink.csv)
  8 | 
  9 | - [目标检测](#Object-Detection)
 10 | - [目标跟踪](#Object-Tracking)
 11 | - [语义分割](#Semantic-Segmentation)
 12 | - [实例分割](#Instance-Segmentation)
 13 | - [GAN](#GAN)
 14 | - [人脸检测](#Face-Detection)
 15 | - [人体姿态估计](#Human-Pose-Estimation)
 16 | - [6DoF 姿态估计](#6DoF-Pose-Estimation)
 17 | - [头部姿态估计](#Head-Pose-Estimation)
 18 | - [人群密度估计](#Crowd-Counting)
 19 | 
 20 | **更新记录：**
 21 | 
 22 | - 20200226：添加 [CVPR 2020 论文开源项目合集](https://github.com/amusi/CVPR2020-Code)
 23 | 
 24 | - 20191026：添加 [530 篇论文代码链接](./CVPR2019_CodeLink.csv)
 25 | - 20190405：添加 8 篇论文（目标检测、语义分割等方向）
 26 | - 20190408：添加 6 篇论文（目标跟踪、GAN、6DoF姿态估计等方向）
 27 | 
 28 | <a name="Object-Detection"></a>
 29 | 
 30 | # 目标检测
 31 | 
 32 | **Bounding Box Regression with Uncertainty for Accurate Object Detection**
 33 | 
 34 | - arXiv：<https://arxiv.org/abs/1809.08545>
 35 | 
 36 | - github：<https://github.com/yihui-he/KL-Loss>
 37 | 
 38 | <a name="Object-Tracking"></a>
 39 | 
 40 | # 目标跟踪
 41 | 
 42 | **Fast Online Object Tracking and Segmentation: A Unifying Approach**
 43 | 
 44 | - arXiv：<https://arxiv.org/abs/1812.05050>
 45 | 
 46 | - github：<https://github.com/foolwood/SiamMask>
 47 | 
 48 | - homepage：<http://www.robots.ox.ac.uk/~qwang/SiamMask>
 49 | 
 50 | **Unsupervised Deep Tracking**
 51 | 
 52 | - arXiv：<https://arxiv.org/abs/1904.01828>
 53 | 
 54 | - github：<https://github.com/594422814/UDT>
 55 | 
 56 | - github(PyTorch)：<https://github.com/594422814/UDT_pytorch>
 57 | 
 58 | **Target-Aware Deep Tracking**
 59 | 
 60 | - arXiv：<https://arxiv.org/abs/1904.01772>
 61 | 
 62 | - homepage：<https://xinli-zn.github.io/TADT-project-page/>
 63 | 
 64 | <a name="Semantic-Segmentation"></a>
 65 | 
 66 | # 语义分割
 67 | 
 68 | **Decoders Matter for Semantic Segmentation: Data-Dependent Decoding Enables Flexible Feature Aggregation**
 69 | 
 70 | - arXiv：<https://arxiv.org/abs/1903.02120>
 71 | 
 72 | - github：[https://github.com/LinZhuoChen/DUpsampling（非官方）](https://github.com/LinZhuoChen/DUpsampling%EF%BC%88%E9%9D%9E%E5%AE%98%E6%96%B9%EF%BC%89)
 73 | 
 74 | **Dual Attention Network for Scene Segmentation**
 75 | 
 76 | - arXiv：<https://arxiv.org/abs/1809.02983>
 77 | 
 78 | - github：<https://github.com/junfu1115/DANet>
 79 | 
 80 | **Collaborative Global-Local Networks for Memory-Efﬁcient Segmentation of Ultra-High Resolution Images**
 81 | 
 82 | - arXiv：None
 83 | 
 84 | - github：<https://github.com/chenwydj/ultra_high_resolution_segmentation>
 85 | 
 86 | <a name="Instance-Segmentation"></a>
 87 | 
 88 | # 实例分割
 89 | 
 90 | **Mask Scoring R-CNN**
 91 | 
 92 | - arXiv：<https://arxiv.org/abs/1903.00241>
 93 | 
 94 | - github：<https://github.com/zjhuang22/maskscoring_rcnn>
 95 | 
 96 | <a name="GAN"></a>
 97 | 
 98 | # GAN
 99 | 
100 | **Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis**
101 | 
102 | - arXiv：<https://arxiv.org/abs/1903.05628>
103 | - github：<https://github.com/HelenMao/MSGAN>
104 | 
105 | <a name="Face-Detection"></a>
106 | 
107 | # 人脸检测
108 | 
109 | **DSFD: Dual Shot Face Detector**
110 | 
111 | - arXiv：<https://arxiv.org/abs/1810.10220>
112 | 
113 | - github：<https://github.com/TencentYoutuResearch/FaceDetection-DSFD>
114 | 
115 | <a name="Human-Pose-Estimation"></a>
116 | 
117 | # 人体姿态估计
118 | 
119 | **Deep High-Resolution Representation Learning for Human Pose Estimation**
120 | 
121 | - arXiv：<https://arxiv.org/abs/1902.09212>
122 | 
123 | - github：<https://github.com/leoxiaobin/deep-high-resolution-net.pytorch>
124 | 
125 | <a name="6DoF-Pose-Estimation"></a>
126 | 
127 | # 6DoF姿态估计
128 | 
129 | **PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation**
130 | 
131 | - arXiv：<https://arxiv.org/abs/1812.11788>
132 | - github：<https://github.com/zju3dv/pvnet>
133 | 
134 | <a name="Head-Pose-Estimation"></a>
135 | 
136 | # 头部姿态估计
137 | 
138 | **PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation**
139 | 
140 | - paper：<https://github.com/shamangary/FSA-Net/blob/master/0191.pdf>
141 | - github：<https://github.com/shamangary/FSA-Net>
142 | 
143 | <a name="Crowd-Counting"></a>
144 | 
145 | # 人群密度估计
146 | 
147 | **Learning from Synthetic Data for Crowd Counting in the Wild**
148 | 
149 | - arXiv：<https://arxiv.org/abs/1903.03303>
150 | - github：<https://github.com/gjy3035/GCC-SFCN>
151 | - homepage：<https://gjy3035.github.io/GCC-CL/>


--------------------------------------------------------------------------------
/CVPR2020-Papers-with-Code.md:
--------------------------------------------------------------------------------
   1 | # CVPR2020-Code
   2 | 
   3 | [CVPR 2020](https://openaccess.thecvf.com/CVPR2020) 论文开源项目合集，同时欢迎各位大佬提交issue，分享CVPR 2020开源项目
   4 | 
   5 | **【推荐阅读】**
   6 | 
   7 | - [CVPR 2020 virtual](http://cvpr20.com/)
   8 | - ECCV 2020 论文开源项目合集来了：https://github.com/amusi/ECCV2020-Code
   9 | 
  10 | - 关于往年CV顶会论文（如ECCV 2020、CVPR 2019、ICCV 2019）以及其他优质CV论文和大盘点，详见： https://github.com/amusi/daily-paper-computer-vision
  11 | 
  12 | **【CVPR 2020 论文开源目录】**
  13 | 
  14 | - [CNN](#CNN)
  15 | - [图像分类](#Image-Classification)
  16 | - [视频分类](#Video-Classification)
  17 | - [目标检测](#Object-Detection)
  18 | - [3D目标检测](#3D-Object-Detection)
  19 | - [视频目标检测](#Video-Object-Detection)
  20 | - [目标跟踪](#Object-Tracking)
  21 | - [语义分割](#Semantic-Segmentation)
  22 | - [实例分割](#Instance-Segmentation)
  23 | - [全景分割](#Panoptic-Segmentation)
  24 | - [视频目标分割](#VOS)
  25 | - [超像素分割](#Superpixel)
  26 | - [交互式图像分割](#IIS)
  27 | - [NAS](#NAS)
  28 | - [GAN](#GAN)
  29 | - [Re-ID](#Re-ID)
  30 | - [3D点云（分类/分割/配准/跟踪等）](#3D-PointCloud)
  31 | - [人脸（识别/检测/重建等）](#Face)
  32 | - [人体姿态估计(2D/3D)](#Human-Pose-Estimation)
  33 | - [人体解析](#Human-Parsing)
  34 | - [场景文本检测](#Scene-Text-Detection)
  35 | - [场景文本识别](#Scene-Text-Recognition)
  36 | - [特征(点)检测和描述](#Feature)
  37 | - [超分辨率](#Super-Resolution)
  38 | - [模型压缩/剪枝](#Model-Compression)
  39 | - [视频理解/行为识别](#Action-Recognition)
  40 | - [人群计数](#Crowd-Counting)
  41 | - [深度估计](#Depth-Estimation)
  42 | - [6D目标姿态估计](#6DOF)
  43 | - [手势估计](#Hand-Pose)
  44 | - [显著性检测](#Saliency)
  45 | - [去噪](#Denoising)
  46 | - [去雨](#Deraining)
  47 | - [去模糊](#Deblurring)
  48 | - [去雾](#Dehazing)
  49 | - [特征点检测与描述](#Feature)
  50 | - [视觉问答(VQA)](#VQA)
  51 | - [视频问答(VideoQA)](#VideoQA)
  52 | - [视觉语言导航](#VLN)
  53 | - [视频压缩](#Video-Compression)
  54 | - [视频插帧](#Video-Frame-Interpolation)
  55 | - [风格迁移](#Style-Transfer)
  56 | - [车道线检测](#Lane-Detection)
  57 | - ["人-物"交互(HOI)检测](#HOI)
  58 | - [轨迹预测](#TP)
  59 | - [运动预测](#Motion-Predication)
  60 | - [光流估计](#OF)
  61 | - [图像检索](#IR)
  62 | - [虚拟试衣](#Virtual-Try-On)
  63 | - [HDR](#HDR)
  64 | - [对抗样本](#AE)
  65 | - [三维重建](#3D-Reconstructing)
  66 | - [深度补全](#DC)
  67 | - [语义场景补全](#SSC)
  68 | - [图像/视频描述](#Captioning)
  69 | - [线框解析](#WP)
  70 | - [数据集](#Datasets)
  71 | - [其他](#Others)
  72 | - [不确定中没中](#Not-Sure)
  73 | 
  74 | <a name="CNN"></a>
  75 | 
  76 | # CNN
  77 | 
  78 | **Exploring Self-attention for Image Recognition**
  79 | 
  80 | - 论文：https://hszhao.github.io/papers/cvpr20_san.pdf
  81 | 
  82 | - 代码：https://github.com/hszhao/SAN
  83 | 
  84 | **Improving Convolutional Networks with Self-Calibrated Convolutions**
  85 | 
  86 | - 主页：https://mmcheng.net/scconv/
  87 | 
  88 | - 论文：http://mftp.mmcheng.net/Papers/20cvprSCNet.pdf
  89 | - 代码：https://github.com/backseason/SCNet
  90 | 
  91 | **Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets**
  92 | 
  93 | - 论文：https://arxiv.org/abs/2003.13549
  94 | - 代码：https://github.com/zeiss-microscopy/BSConv
  95 | 
  96 | <a name="Image-Classification"></a>
  97 | 
  98 | # 图像分类
  99 | 
 100 | **Interpretable and Accurate Fine-grained Recognition via Region Grouping**
 101 | 
 102 | - 论文：https://arxiv.org/abs/2005.10411
 103 | 
 104 | - 代码：https://github.com/zxhuang1698/interpretability-by-parts
 105 | 
 106 | **Compositional Convolutional Neural Networks: A Deep Architecture with Innate Robustness to Partial Occlusion**
 107 | 
 108 | - 论文：https://arxiv.org/abs/2003.04490
 109 | 
 110 | - 代码：https://github.com/AdamKortylewski/CompositionalNets
 111 | 
 112 | **Spatially Attentive Output Layer for Image Classification**
 113 | 
 114 | - 论文：https://arxiv.org/abs/2004.07570 
 115 | - 代码（好像被原作者删除了）：https://github.com/ildoonet/spatially-attentive-output-layer 
 116 | 
 117 | <a name="Video-Classification"></a>
 118 | 
 119 | # 视频分类
 120 | 
 121 | **SmallBigNet: Integrating Core and Contextual Views for Video Classification**
 122 | 
 123 | - 论文：https://arxiv.org/abs/2006.14582
 124 | - 代码：https://github.com/xhl-video/SmallBigNet
 125 | 
 126 | <a name="Object-Detection"></a>
 127 | 
 128 | # 目标检测
 129 | 
 130 | **Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax**
 131 | 
 132 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/papers/Li_Overcoming_Classifier_Imbalance_for_Long-Tail_Object_Detection_With_Balanced_Group_CVPR_2020_paper.pdf
 133 | - 代码：https://github.com/FishYuLi/BalancedGroupSoftmax
 134 | 
 135 | **AugFPN: Improving Multi-scale Feature Learning for Object Detection**
 136 | 
 137 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/papers/Guo_AugFPN_Improving_Multi-Scale_Feature_Learning_for_Object_Detection_CVPR_2020_paper.pdf 
 138 | - 代码：https://github.com/Gus-Guo/AugFPN
 139 | 
 140 | **Noise-Aware Fully Webly Supervised Object Detection**
 141 | 
 142 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/html/Shen_Noise-Aware_Fully_Webly_Supervised_Object_Detection_CVPR_2020_paper.html
 143 | - 代码：https://github.com/shenyunhang/NA-fWebSOD/
 144 | 
 145 | **Learning a Unified Sample Weighting Network for Object Detection**
 146 | 
 147 | - 论文：https://arxiv.org/abs/2006.06568
 148 | - 代码：https://github.com/caiqi/sample-weighting-network
 149 | 
 150 | **D2Det: Towards High Quality Object Detection and Instance Segmentation**
 151 | 
 152 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/papers/Cao_D2Det_Towards_High_Quality_Object_Detection_and_Instance_Segmentation_CVPR_2020_paper.pdf
 153 | 
 154 | - 代码：https://github.com/JialeCao001/D2Det
 155 | 
 156 | **Dynamic Refinement Network for Oriented and Densely Packed Object Detection**
 157 | 
 158 | - 论文下载链接：https://arxiv.org/abs/2005.09973
 159 | 
 160 | - 代码和数据集：https://github.com/Anymake/DRN_CVPR2020
 161 | 
 162 | **Scale-Equalizing Pyramid Convolution for Object Detection**
 163 | 
 164 | 论文：https://arxiv.org/abs/2005.03101
 165 | 
 166 | 代码：https://github.com/jshilong/SEPC
 167 | 
 168 | **Revisiting the Sibling Head in Object Detector**
 169 | 
 170 | - 论文：https://arxiv.org/abs/2003.07540
 171 | 
 172 | - 代码：https://github.com/Sense-X/TSD 
 173 | 
 174 | **Scale-equalizing Pyramid Convolution for Object Detection**
 175 | 
 176 | - 论文：暂无
 177 | - 代码：https://github.com/jshilong/SEPC 
 178 | 
 179 | **Detection in Crowded Scenes: One Proposal, Multiple Predictions**
 180 | 
 181 | - 论文：https://arxiv.org/abs/2003.09163
 182 | - 代码：https://github.com/megvii-model/CrowdDetection
 183 | 
 184 | **Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection**
 185 | 
 186 | - 论文：https://arxiv.org/abs/2004.04725
 187 | - 代码：https://github.com/NVlabs/wetectron
 188 | 
 189 | **Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection**
 190 | 
 191 | - 论文：https://arxiv.org/abs/1912.02424 
 192 | - 代码：https://github.com/sfzhang15/ATSS
 193 | 
 194 | **BiDet: An Efficient Binarized Object Detector**
 195 | 
 196 | - 论文：https://arxiv.org/abs/2003.03961 
 197 | - 代码：https://github.com/ZiweiWangTHU/BiDet
 198 | 
 199 | **Harmonizing Transferability and Discriminability for Adapting Object Detectors**
 200 | 
 201 | - 论文：https://arxiv.org/abs/2003.06297
 202 | - 代码：https://github.com/chaoqichen/HTCN
 203 | 
 204 | **CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection**
 205 | 
 206 | - 论文：https://arxiv.org/abs/2003.09119
 207 | - 代码：https://github.com/KiveeDong/CentripetalNet
 208 | 
 209 | **Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection**
 210 | 
 211 | - 论文：https://arxiv.org/abs/2003.11818
 212 | - 代码：https://github.com/ggjy/HitDet.pytorch
 213 | 
 214 | **EfficientDet: Scalable and Efficient Object Detection**
 215 | 
 216 | - 论文：https://arxiv.org/abs/1911.09070
 217 | - 代码：https://github.com/google/automl/tree/master/efficientdet 
 218 | 
 219 | <a name="3D-Object-Detection"></a>
 220 | 
 221 | # 3D目标检测
 222 | 
 223 | **SESS: Self-Ensembling Semi-Supervised 3D Object Detection**
 224 | 
 225 | - 论文： https://arxiv.org/abs/1912.11803
 226 | 
 227 | - 代码：https://github.com/Na-Z/sess
 228 | 
 229 | **Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection**
 230 | 
 231 | - 论文： https://arxiv.org/abs/2006.04356
 232 | 
 233 | - 代码：https://github.com/dleam/Associate-3Ddet
 234 | 
 235 | **What You See is What You Get: Exploiting Visibility for 3D Object Detection**
 236 | 
 237 | - 主页：https://www.cs.cmu.edu/~peiyunh/wysiwyg/
 238 | 
 239 | - 论文：https://arxiv.org/abs/1912.04986
 240 | - 代码：https://github.com/peiyunh/wysiwyg
 241 | 
 242 | **Learning Depth-Guided Convolutions for Monocular 3D Object Detection**
 243 | 
 244 | - 论文：https://arxiv.org/abs/1912.04799
 245 | - 代码：https://github.com/dingmyu/D4LCN
 246 | 
 247 | **Structure Aware Single-stage 3D Object Detection from Point Cloud**
 248 | 
 249 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/html/He_Structure_Aware_Single-Stage_3D_Object_Detection_From_Point_Cloud_CVPR_2020_paper.html
 250 | 
 251 | - 代码：https://github.com/skyhehe123/SA-SSD
 252 | 
 253 | **IDA-3D: Instance-Depth-Aware 3D Object Detection from Stereo Vision for Autonomous Driving**
 254 | 
 255 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/papers/Peng_IDA-3D_Instance-Depth-Aware_3D_Object_Detection_From_Stereo_Vision_for_Autonomous_CVPR_2020_paper.pdf
 256 | 
 257 | - 代码：https://github.com/swords123/IDA-3D
 258 | 
 259 | **Train in Germany, Test in The USA: Making 3D Object Detectors Generalize**
 260 | 
 261 | - 论文：https://arxiv.org/abs/2005.08139
 262 | 
 263 | - 代码：https://github.com/cxy1997/3D_adapt_auto_driving
 264 | 
 265 | **MLCVNet: Multi-Level Context VoteNet for 3D Object Detection**
 266 | 
 267 | - 论文：https://arxiv.org/abs/2004.05679
 268 | - 代码：https://github.com/NUAAXQ/MLCVNet
 269 | 
 270 | **3DSSD: Point-based 3D Single Stage Object Detector**
 271 | 
 272 | - CVPR 2020 Oral
 273 | 
 274 | - 论文：https://arxiv.org/abs/2002.10187
 275 | 
 276 | - 代码：https://github.com/tomztyang/3DSSD
 277 | 
 278 | **Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation**
 279 | 
 280 | - 论文：https://arxiv.org/abs/2004.03572
 281 | 
 282 | - 代码：https://github.com/zju3dv/disprcn
 283 | 
 284 | **End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection**
 285 | 
 286 | - 论文：https://arxiv.org/abs/2004.03080
 287 | 
 288 | - 代码：https://github.com/mileyan/pseudo-LiDAR_e2e
 289 | 
 290 | **DSGN: Deep Stereo Geometry Network for 3D Object Detection**
 291 | 
 292 | - 论文：https://arxiv.org/abs/2001.03398
 293 | - 代码：https://github.com/chenyilun95/DSGN
 294 | 
 295 | **LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention**
 296 | 
 297 | - 论文：https://arxiv.org/abs/2004.01389
 298 | - 代码：https://github.com/yinjunbo/3DVID
 299 | 
 300 | **PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection**
 301 | 
 302 | - 论文：https://arxiv.org/abs/1912.13192
 303 | 
 304 | - 代码：https://github.com/sshaoshuai/PV-RCNN
 305 | 
 306 | **Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud**
 307 | 
 308 | - 论文：https://arxiv.org/abs/2003.01251 
 309 | - 代码：https://github.com/WeijingShi/Point-GNN 
 310 | 
 311 | <a name="Video-Object-Detection"></a>
 312 | 
 313 | # 视频目标检测
 314 | 
 315 | **Memory Enhanced Global-Local Aggregation for Video Object Detection**
 316 | 
 317 | 论文：https://arxiv.org/abs/2003.12063
 318 | 
 319 | 代码：https://github.com/Scalsol/mega.pytorch
 320 | 
 321 | <a name="Object-Tracking"></a>
 322 | 
 323 | # 目标跟踪
 324 | 
 325 | **SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking**
 326 | 
 327 | - 论文：https://arxiv.org/abs/1911.07241
 328 | - 代码：https://github.com/ohhhyeahhh/SiamCAR
 329 | 
 330 | **D3S -- A Discriminative Single Shot Segmentation Tracker**
 331 | 
 332 | - 论文：https://arxiv.org/abs/1911.08862
 333 | - 代码：https://github.com/alanlukezic/d3s
 334 | 
 335 | **ROAM: Recurrently Optimizing Tracking Model**
 336 | 
 337 | - 论文：https://arxiv.org/abs/1907.12006
 338 | 
 339 | - 代码：https://github.com/skyoung/ROAM
 340 | 
 341 | **Siam R-CNN: Visual Tracking by Re-Detection**
 342 | 
 343 | - 主页：https://www.vision.rwth-aachen.de/page/siamrcnn
 344 | - 论文：https://arxiv.org/abs/1911.12836
 345 | - 论文2：https://www.vision.rwth-aachen.de/media/papers/192/siamrcnn.pdf
 346 | - 代码：https://github.com/VisualComputingInstitute/SiamR-CNN
 347 | 
 348 | **Cooling-Shrinking Attack: Blinding the Tracker with Imperceptible Noises**
 349 | 
 350 | - 论文：https://arxiv.org/abs/2003.09595 
 351 | - 代码：https://github.com/MasterBin-IIAU/CSA 
 352 | 
 353 | **High-Performance Long-Term Tracking with Meta-Updater**
 354 | 
 355 | - 论文：https://arxiv.org/abs/2004.00305
 356 | 
 357 | - 代码：https://github.com/Daikenan/LTMU
 358 | 
 359 | **AutoTrack: Towards High-Performance Visual Tracking for UAV with Automatic Spatio-Temporal Regularization**
 360 | 
 361 | - 论文：https://arxiv.org/abs/2003.12949
 362 | 
 363 | - 代码：https://github.com/vision4robotics/AutoTrack
 364 | 
 365 | **Probabilistic Regression for Visual Tracking**
 366 | 
 367 | - 论文：https://arxiv.org/abs/2003.12565
 368 | - 代码：https://github.com/visionml/pytracking
 369 | 
 370 | **MAST: A Memory-Augmented Self-supervised Tracker**
 371 | 
 372 | - 论文：https://arxiv.org/abs/2002.07793
 373 | - 代码：https://github.com/zlai0/MAST
 374 | 
 375 | **Siamese Box Adaptive Network for Visual Tracking**
 376 | 
 377 | - 论文：https://arxiv.org/abs/2003.06761
 378 | - 代码：https://github.com/hqucv/siamban
 379 | 
 380 | ## 多目标跟踪
 381 | 
 382 | **3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset**
 383 | 
 384 | - 主页：https://vap.aau.dk/3d-zef/
 385 | - 论文：https://arxiv.org/abs/2006.08466
 386 | - 代码：https://bitbucket.org/aauvap/3d-zef/src/master/
 387 | - 数据集：https://motchallenge.net/data/3D-ZeF20
 388 | 
 389 | <a name="Semantic-Segmentation"></a>
 390 | 
 391 | # 语义分割
 392 | 
 393 | **FDA: Fourier Domain Adaptation for Semantic Segmentation**
 394 | 
 395 | - 论文：https://arxiv.org/abs/2004.05498
 396 | 
 397 | - 代码：https://github.com/YanchaoYang/FDA
 398 | 
 399 | **Super-BPD: Super Boundary-to-Pixel Direction for Fast Image Segmentation**
 400 | 
 401 | - 论文：暂无
 402 | 
 403 | - 代码：https://github.com/JianqiangWan/Super-BPD
 404 | 
 405 | **Single-Stage Semantic Segmentation from Image Labels**
 406 | 
 407 | - 论文：https://arxiv.org/abs/2005.08104
 408 | 
 409 | - 代码：https://github.com/visinf/1-stage-wseg
 410 | 
 411 | **Learning Texture Invariant Representation for Domain Adaptation of Semantic Segmentation**
 412 | 
 413 | - 论文：https://arxiv.org/abs/2003.00867
 414 | - 代码：https://github.com/MyeongJin-Kim/Learning-Texture-Invariant-Representation
 415 | 
 416 | **MSeg: A Composite Dataset for Multi-domain Semantic Segmentation**
 417 | 
 418 | - 论文：http://vladlen.info/papers/MSeg.pdf
 419 | - 代码：https://github.com/mseg-dataset/mseg-api
 420 | 
 421 | **CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement**
 422 | 
 423 | - 论文：https://arxiv.org/abs/2005.02551
 424 | - 代码：https://github.com/hkchengrex/CascadePSP
 425 | 
 426 | **Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision**
 427 | 
 428 | - Oral
 429 | - 论文：https://arxiv.org/abs/2004.07703
 430 | - 代码：https://github.com/feipan664/IntraDA
 431 | 
 432 | **Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation**
 433 | 
 434 | - 论文：https://arxiv.org/abs/2004.04581
 435 | - 代码：https://github.com/YudeWang/SEAM
 436 | 
 437 | **Temporally Distributed Networks for Fast Video Segmentation**
 438 | 
 439 | - 论文：https://arxiv.org/abs/2004.01800
 440 | 
 441 | - 代码：https://github.com/feinanshan/TDNet
 442 | 
 443 | **Context Prior for Scene Segmentation**
 444 | 
 445 | - 论文：https://arxiv.org/abs/2004.01547
 446 | 
 447 | - 代码：https://git.io/ContextPrior
 448 | 
 449 | **Strip Pooling: Rethinking Spatial Pooling for Scene Parsing**
 450 | 
 451 | - 论文：https://arxiv.org/abs/2003.13328
 452 | 
 453 | - 代码：https://github.com/Andrew-Qibin/SPNet
 454 | 
 455 | **Cars Can't Fly up in the Sky: Improving Urban-Scene Segmentation via Height-driven Attention Networks**
 456 | 
 457 | - 论文：https://arxiv.org/abs/2003.05128
 458 | - 代码：https://github.com/shachoi/HANet
 459 | 
 460 | **Learning Dynamic Routing for Semantic Segmentation**
 461 | 
 462 | - 论文：https://arxiv.org/abs/2003.10401
 463 | 
 464 | - 代码：https://github.com/yanwei-li/DynamicRouting
 465 | 
 466 | <a name="Instance-Segmentation"></a>
 467 | 
 468 | # 实例分割
 469 | 
 470 | **D2Det: Towards High Quality Object Detection and Instance Segmentation**
 471 | 
 472 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/papers/Cao_D2Det_Towards_High_Quality_Object_Detection_and_Instance_Segmentation_CVPR_2020_paper.pdf
 473 | 
 474 | - 代码：https://github.com/JialeCao001/D2Det
 475 | 
 476 | **PolarMask: Single Shot Instance Segmentation with Polar Representation**
 477 | 
 478 | - 论文：https://arxiv.org/abs/1909.13226 
 479 | - 代码：https://github.com/xieenze/PolarMask 
 480 | - 解读：https://zhuanlan.zhihu.com/p/84890413 
 481 | 
 482 | **CenterMask : Real-Time Anchor-Free Instance Segmentation**
 483 | 
 484 | - 论文：https://arxiv.org/abs/1911.06667 
 485 | - 代码：https://github.com/youngwanLEE/CenterMask 
 486 | 
 487 | **BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation**
 488 | 
 489 | - 论文：https://arxiv.org/abs/2001.00309
 490 | - 代码：https://github.com/aim-uofa/AdelaiDet
 491 | 
 492 | **Deep Snake for Real-Time Instance Segmentation**
 493 | 
 494 | - 论文：https://arxiv.org/abs/2001.01629
 495 | - 代码：https://github.com/zju3dv/snake
 496 | 
 497 | **Mask Encoding for Single Shot Instance Segmentation**
 498 | 
 499 | - 论文：https://arxiv.org/abs/2003.11712
 500 | 
 501 | - 代码：https://github.com/aim-uofa/AdelaiDet
 502 | 
 503 | <a name="Panoptic-Segmentation"></a>
 504 | 
 505 | # 全景分割
 506 | 
 507 | **Video Panoptic Segmentation**
 508 | 
 509 | - 论文：https://arxiv.org/abs/2006.11339
 510 | - 代码：https://github.com/mcahny/vps
 511 | - 数据集：https://www.dropbox.com/s/ecem4kq0fdkver4/cityscapes-vps-dataset-1.0.zip?dl=0
 512 | 
 513 | **Pixel Consensus Voting for Panoptic Segmentation**
 514 | 
 515 | - 论文：https://arxiv.org/abs/2004.01849
 516 | - 代码：还未公布
 517 | 
 518 | **BANet: Bidirectional Aggregation Network with Occlusion Handling for Panoptic Segmentation**
 519 | 
 520 | 论文：https://arxiv.org/abs/2003.14031
 521 | 
 522 | 代码：https://github.com/Mooonside/BANet
 523 | 
 524 | <a name="VOS"></a>
 525 | 
 526 | # 视频目标分割
 527 | 
 528 | **A Transductive Approach for Video Object Segmentation**
 529 | 
 530 | - 论文：https://arxiv.org/abs/2004.07193
 531 | 
 532 | - 代码：https://github.com/microsoft/transductive-vos.pytorch
 533 | 
 534 | **State-Aware Tracker for Real-Time Video Object Segmentation**
 535 | 
 536 | - 论文：https://arxiv.org/abs/2003.00482
 537 | 
 538 | - 代码：https://github.com/MegviiDetection/video_analyst
 539 | 
 540 | **Learning Fast and Robust Target Models for Video Object Segmentation**
 541 | 
 542 | - 论文：https://arxiv.org/abs/2003.00908 
 543 | - 代码：https://github.com/andr345/frtm-vos
 544 | 
 545 | **Learning Video Object Segmentation from Unlabeled Videos**
 546 | 
 547 | - 论文：https://arxiv.org/abs/2003.05020
 548 | - 代码：https://github.com/carrierlxk/MuG
 549 | 
 550 | <a name="Superpixel"></a>
 551 | 
 552 | # 超像素分割
 553 | 
 554 | **Superpixel Segmentation with Fully Convolutional Networks**
 555 | 
 556 | - 论文：https://arxiv.org/abs/2003.12929
 557 | - 代码：https://github.com/fuy34/superpixel_fcn
 558 | 
 559 | <a name="IIS"></a>
 560 | 
 561 | # 交互式图像分割
 562 | 
 563 | **Interactive Object Segmentation with Inside-Outside Guidance**
 564 | 
 565 | - 论文下载链接：http://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Interactive_Object_Segmentation_With_Inside-Outside_Guidance_CVPR_2020_paper.pdf
 566 | - 代码：https://github.com/shiyinzhang/Inside-Outside-Guidance
 567 | - 数据集：https://github.com/shiyinzhang/Pixel-ImageNet
 568 | 
 569 | <a name="NAS"></a>
 570 | 
 571 | # NAS
 572 | 
 573 | **AOWS: Adaptive and optimal network width search with latency constraints**
 574 | 
 575 | - 论文：https://arxiv.org/abs/2005.10481
 576 | - 代码：https://github.com/bermanmaxim/AOWS
 577 | 
 578 | **Densely Connected Search Space for More Flexible Neural Architecture Search**
 579 | 
 580 | - 论文：https://arxiv.org/abs/1906.09607
 581 | 
 582 | - 代码：https://github.com/JaminFong/DenseNAS
 583 | 
 584 | **MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning**
 585 | 
 586 | - 论文：https://arxiv.org/abs/2003.14058
 587 | 
 588 | - 代码：https://github.com/bhpfelix/MTLNAS
 589 | 
 590 | **FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions**
 591 | 
 592 | - 论文下载链接：https://arxiv.org/abs/2004.05565
 593 | 
 594 | - 代码：https://github.com/facebookresearch/mobile-vision
 595 | 
 596 | **Neural Architecture Search for Lightweight Non-Local Networks**
 597 | 
 598 | - 论文：https://arxiv.org/abs/2004.01961
 599 | - 代码：https://github.com/LiYingwei/AutoNL
 600 | 
 601 | **Rethinking Performance Estimation in Neural Architecture Search**
 602 | 
 603 | - 论文：https://arxiv.org/abs/2005.09917
 604 | - 代码：https://github.com/zhengxiawu/rethinking_performance_estimation_in_NAS
 605 | - 解读1：https://www.zhihu.com/question/372070853/answer/1035234510
 606 | - 解读2：https://zhuanlan.zhihu.com/p/111167409
 607 | 
 608 | **CARS: Continuous Evolution for Efficient Neural Architecture Search**
 609 | 
 610 | - 论文：https://arxiv.org/abs/1909.04977 
 611 | - 代码（即将开源）：https://github.com/huawei-noah/CARS 
 612 | 
 613 | <a name="GAN"></a>
 614 | 
 615 | # GAN
 616 | 
 617 | **SEAN: Image Synthesis with Semantic Region-Adaptive Normalization**
 618 | 
 619 | - 论文：https://arxiv.org/abs/1911.12861
 620 | - 代码：https://github.com/ZPdesu/SEAN
 621 | 
 622 | **Reusing Discriminators for Encoding: Towards Unsupervised Image-to-Image Translation**
 623 | 
 624 | - 论文地址：http://openaccess.thecvf.com/content_CVPR_2020/html/Chen_Reusing_Discriminators_for_Encoding_Towards_Unsupervised_Image-to-Image_Translation_CVPR_2020_paper.html
 625 | - 代码地址：https://github.com/alpc91/NICE-GAN-pytorch 
 626 | 
 627 | **Distribution-induced Bidirectional Generative Adversarial Network for Graph Representation Learning**
 628 | 
 629 | - 论文：https://arxiv.org/abs/1912.01899
 630 | - 代码：https://github.com/SsGood/DBGAN 
 631 | 
 632 | **PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer**
 633 | 
 634 | - 论文：https://arxiv.org/abs/1909.06956
 635 | - 代码：https://github.com/wtjiang98/PSGAN
 636 | 
 637 | **Semantically Mutil-modal Image Synthesis**
 638 | 
 639 | - 主页：http://seanseattle.github.io/SMIS
 640 | - 论文：https://arxiv.org/abs/2003.12697
 641 | - 代码：https://github.com/Seanseattle/SMIS
 642 | 
 643 | **Unpaired Portrait Drawing Generation via Asymmetric Cycle Mapping**
 644 | 
 645 | - 论文：https://yiranran.github.io/files/CVPR2020_Unpaired%20Portrait%20Drawing%20Generation%20via%20Asymmetric%20Cycle%20Mapping.pdf
 646 | - 代码：https://github.com/yiranran/Unpaired-Portrait-Drawing
 647 | 
 648 | **Learning to Cartoonize Using White-box Cartoon Representations**
 649 | 
 650 | - 论文：https://github.com/SystemErrorWang/White-box-Cartoonization/blob/master/paper/06791.pdf
 651 | 
 652 | - 主页：https://systemerrorwang.github.io/White-box-Cartoonization/
 653 | - 代码：https://github.com/SystemErrorWang/White-box-Cartoonization
 654 | - 解读：https://zhuanlan.zhihu.com/p/117422157
 655 | - Demo视频：https://www.bilibili.com/video/av56708333
 656 | 
 657 | **GAN Compression: Efficient Architectures for Interactive Conditional GANs**
 658 | 
 659 | - 论文：https://arxiv.org/abs/2003.08936
 660 | 
 661 | - 代码：https://github.com/mit-han-lab/gan-compression
 662 | 
 663 | **Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions**
 664 | 
 665 | - 论文：https://arxiv.org/abs/2003.01826 
 666 | - 代码：https://github.com/cc-hpc-itwm/UpConv 
 667 | 
 668 | <a name="Re-ID"></a>
 669 | 
 670 | # Re-ID
 671 | 
 672 |  **High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification**
 673 | 
 674 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/html/Wang_High-Order_Information_Matters_Learning_Relation_and_Topology_for_Occluded_Person_CVPR_2020_paper.html
 675 | - 代码：https://github.com/wangguanan/HOReID 
 676 | 
 677 | **COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification**
 678 | 
 679 | - 论文：https://arxiv.org/abs/2005.07862
 680 | 
 681 | - 数据集：暂无
 682 | 
 683 | **Transferable, Controllable, and Inconspicuous Adversarial Attacks on Person Re-identification With Deep Mis-Ranking**
 684 | 
 685 | - 论文：https://arxiv.org/abs/2004.04199
 686 | 
 687 | - 代码：https://github.com/whj363636/Adversarial-attack-on-Person-ReID-With-Deep-Mis-Ranking
 688 | 
 689 | **Pose-guided Visible Part Matching for Occluded Person ReID**
 690 | 
 691 | - 论文：https://arxiv.org/abs/2004.00230
 692 | - 代码：https://github.com/hh23333/PVPM
 693 | 
 694 | **Weakly supervised discriminative feature learning with state information for person identification**
 695 | 
 696 | - 论文：https://arxiv.org/abs/2002.11939 
 697 | - 代码：https://github.com/KovenYu/state-information 
 698 | 
 699 | <a name="3D-PointCloud"></a>
 700 | 
 701 | # 3D点云（分类/分割/配准等）
 702 | 
 703 | ## 3D点云卷积
 704 | 
 705 | **PointASNL: Robust Point Clouds Processing using Nonlocal Neural Networks with Adaptive Sampling**
 706 | 
 707 | - 论文：https://arxiv.org/abs/2003.00492
 708 | - 代码：https://github.com/yanx27/PointASNL 
 709 | 
 710 | **Global-Local Bidirectional Reasoning for Unsupervised Representation Learning of 3D Point Clouds**
 711 | 
 712 | - 论文下载链接：https://arxiv.org/abs/2003.12971
 713 | 
 714 | - 代码：https://github.com/raoyongming/PointGLR
 715 | 
 716 | **Grid-GCN for Fast and Scalable Point Cloud Learning**
 717 | 
 718 | - 论文：https://arxiv.org/abs/1912.02984
 719 | 
 720 | - 代码：https://github.com/Xharlie/Grid-GCN
 721 | 
 722 | **FPConv: Learning Local Flattening for Point Convolution**
 723 | 
 724 | - 论文：https://arxiv.org/abs/2002.10701
 725 | - 代码：https://github.com/lyqun/FPConv
 726 | 
 727 | ## 3D点云分类
 728 | 
 729 | **PointAugment: an Auto-Augmentation Framework for Point Cloud Classification**
 730 | 
 731 | - 论文：https://arxiv.org/abs/2002.10876 
 732 | - 代码（即将开源）： https://github.com/liruihui/PointAugment/ 
 733 | 
 734 | ## 3D点云语义分割
 735 | 
 736 | **RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds**
 737 | 
 738 | - 论文：https://arxiv.org/abs/1911.11236
 739 | - 代码：https://github.com/QingyongHu/RandLA-Net
 740 | 
 741 | - 解读：https://zhuanlan.zhihu.com/p/105433460
 742 | 
 743 | **Weakly Supervised Semantic Point Cloud Segmentation:Towards 10X Fewer Labels**
 744 | 
 745 | - 论文：https://arxiv.org/abs/2004.04091
 746 | 
 747 | - 代码：https://github.com/alex-xun-xu/WeakSupPointCloudSeg
 748 | 
 749 | **PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation**
 750 | 
 751 | - 论文：https://arxiv.org/abs/2003.14032
 752 | - 代码：https://github.com/edwardzhou130/PolarSeg
 753 | 
 754 | **Learning to Segment 3D Point Clouds in 2D Image Space**
 755 | 
 756 | - 论文：https://arxiv.org/abs/2003.05593
 757 | 
 758 | - 代码：https://github.com/WPI-VISLab/Learning-to-Segment-3D-Point-Clouds-in-2D-Image-Space
 759 | 
 760 | ## 3D点云实例分割
 761 | 
 762 | PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation
 763 | 
 764 | - 论文：https://arxiv.org/abs/2004.01658
 765 | - 代码：https://github.com/Jia-Research-Lab/PointGroup
 766 | 
 767 | ## 3D点云配准
 768 | 
 769 | **Feature-metric Registration: A Fast Semi-supervised Approach for Robust Point Cloud Registration without Correspondences**
 770 | 
 771 | - 论文：https://arxiv.org/abs/2005.01014
 772 | - 代码：https://github.com/XiaoshuiHuang/fmr 
 773 | 
 774 | **D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features**
 775 | 
 776 | - 论文：https://arxiv.org/abs/2003.03164
 777 | - 代码：https://github.com/XuyangBai/D3Feat
 778 | 
 779 | **RPM-Net: Robust Point Matching using Learned Features**
 780 | 
 781 | - 论文：https://arxiv.org/abs/2003.13479
 782 | - 代码：https://github.com/yewzijian/RPMNet 
 783 | 
 784 | ## 3D点云补全
 785 | 
 786 | **Cascaded Refinement Network for Point Cloud Completion**
 787 | 
 788 | - 论文：https://arxiv.org/abs/2004.03327
 789 | - 代码：https://github.com/xiaogangw/cascaded-point-completion
 790 | 
 791 | ## 3D点云目标跟踪
 792 | 
 793 | **P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds**
 794 | 
 795 | - 论文：https://arxiv.org/abs/2005.13888
 796 | - 代码：https://github.com/HaozheQi/P2B
 797 | 
 798 | ## 其他
 799 | 
 800 | **An Efficient PointLSTM for Point Clouds Based Gesture Recognition**
 801 | 
 802 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/html/Min_An_Efficient_PointLSTM_for_Point_Clouds_Based_Gesture_Recognition_CVPR_2020_paper.html
 803 | - 代码：https://github.com/Blueprintf/pointlstm-gesture-recognition-pytorch
 804 | 
 805 | <a name="Face"></a>
 806 | 
 807 | # 人脸
 808 | 
 809 | ## 人脸识别
 810 | 
 811 | **CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition**
 812 | 
 813 | - 论文：https://arxiv.org/abs/2004.00288
 814 | 
 815 | - 代码：https://github.com/HuangYG123/CurricularFace
 816 | 
 817 | **Learning Meta Face Recognition in Unseen Domains**
 818 | 
 819 | - 论文：https://arxiv.org/abs/2003.07733
 820 | - 代码：https://github.com/cleardusk/MFR
 821 | - 解读：https://mp.weixin.qq.com/s/YZoEnjpnlvb90qSI3xdJqQ 
 822 | 
 823 | ## 人脸检测
 824 | 
 825 | ## 人脸活体检测
 826 | 
 827 | **Searching Central Difference Convolutional Networks for Face Anti-Spoofing**
 828 | 
 829 | - 论文：https://arxiv.org/abs/2003.04092
 830 | 
 831 | - 代码：https://github.com/ZitongYu/CDCN
 832 | 
 833 | ## 人脸表情识别
 834 | 
 835 | **Suppressing Uncertainties for Large-Scale Facial Expression Recognition**
 836 | 
 837 | - 论文：https://arxiv.org/abs/2002.10392 
 838 | 
 839 | - 代码（即将开源）：https://github.com/kaiwang960112/Self-Cure-Network 
 840 | 
 841 | ## 人脸转正
 842 | 
 843 | **Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images**
 844 | 
 845 | - 论文：https://arxiv.org/abs/2003.08124
 846 | - 代码：https://github.com/Hangz-nju-cuhk/Rotate-and-Render
 847 | 
 848 | ## 人脸3D重建
 849 | 
 850 | **AvatarMe: Realistically Renderable 3D Facial Reconstruction "in-the-wild"**
 851 | 
 852 | - 论文：https://arxiv.org/abs/2003.13845
 853 | - 数据集：https://github.com/lattas/AvatarMe
 854 | 
 855 | **FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction**
 856 | 
 857 | - 论文：https://arxiv.org/abs/2003.13989
 858 | - 代码：https://github.com/zhuhao-nju/facescape
 859 | 
 860 | <a name="Human-Pose-Estimation"></a>
 861 | 
 862 | # 人体姿态估计(2D/3D)
 863 | 
 864 | ## 2D人体姿态估计
 865 | 
 866 | **TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting**
 867 | 
 868 | - 主页：https://yzhq97.github.io/transmomo/
 869 | 
 870 | - 论文：https://arxiv.org/abs/2003.14401
 871 | - 代码：https://github.com/yzhq97/transmomo.pytorch
 872 | 
 873 | **HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation**
 874 | 
 875 | - 论文：https://arxiv.org/abs/1908.10357
 876 | - 代码：https://github.com/HRNet/HigherHRNet-Human-Pose-Estimation
 877 | 
 878 | **The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation**
 879 | 
 880 | - 论文：https://arxiv.org/abs/1911.07524 
 881 | - 代码：https://github.com/HuangJunJie2017/UDP-Pose
 882 | - 解读：https://zhuanlan.zhihu.com/p/92525039
 883 | 
 884 | **Distribution-Aware Coordinate Representation for Human Pose Estimation**
 885 | 
 886 | - 主页：https://ilovepose.github.io/coco/ 
 887 | 
 888 | - 论文：https://arxiv.org/abs/1910.06278 
 889 | 
 890 | - 代码：https://github.com/ilovepose/DarkPose 
 891 | 
 892 | ## 3D人体姿态估计
 893 | 
 894 |  **Cascaded Deep Monocular 3D Human Pose Estimation With Evolutionary Training Data**
 895 | 
 896 | - 论文：https://arxiv.org/abs/2006.07778
 897 | - 代码：https://github.com/Nicholasli1995/EvoSkeleton 
 898 | 
 899 | **Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach**
 900 | 
 901 | - 主页：https://www.zhe-zhang.com/cvpr2020
 902 | - 论文：https://arxiv.org/abs/2003.11163
 903 | 
 904 | - 代码：https://github.com/CHUNYUWANG/imu-human-pose-pytorch
 905 | 
 906 | **Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data**
 907 | 
 908 | - 论文下载链接：https://arxiv.org/abs/2004.01166
 909 | 
 910 | - 代码：https://github.com/Healthcare-Robotics/bodies-at-rest
 911 | - 数据集：https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/KOA4ML
 912 | 
 913 | **Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis**
 914 | 
 915 | - 主页：http://val.cds.iisc.ac.in/pgp-human/
 916 | - 论文：https://arxiv.org/abs/2004.04400
 917 | 
 918 | **Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation**
 919 | 
 920 | - 论文：https://arxiv.org/abs/2004.00329
 921 | - 代码：https://github.com/fabbrimatteo/LoCO
 922 | 
 923 | **VIBE: Video Inference for Human Body Pose and Shape Estimation**
 924 | 
 925 | - 论文：https://arxiv.org/abs/1912.05656 
 926 | - 代码：https://github.com/mkocabas/VIBE
 927 | 
 928 | **Back to the Future: Joint Aware Temporal Deep Learning 3D Human Pose Estimation**
 929 | 
 930 | - 论文：https://arxiv.org/abs/2002.11251 
 931 | - 代码：https://github.com/vnmr/JointVideoPose3D
 932 | 
 933 | **Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS**
 934 | 
 935 | - 论文：https://arxiv.org/abs/2003.03972
 936 | - 数据集：暂无
 937 | 
 938 | <a name="Human-Parsing"></a>
 939 | 
 940 | # 人体解析
 941 | 
 942 | **Correlating Edge, Pose with Parsing**
 943 | 
 944 | - 论文：https://arxiv.org/abs/2005.01431
 945 | 
 946 | - 代码：https://github.com/ziwei-zh/CorrPM
 947 | 
 948 | <a name="Scene-Text-Detection"></a>
 949 | 
 950 | # 场景文本检测
 951 | 
 952 | **STEFANN: Scene Text Editor using Font Adaptive Neural Network**
 953 | 
 954 | - 主页：https://prasunroy.github.io/stefann/
 955 | 
 956 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/html/Roy_STEFANN_Scene_Text_Editor_Using_Font_Adaptive_Neural_Network_CVPR_2020_paper.html
 957 | - 代码：https://github.com/prasunroy/stefann
 958 | - 数据集：https://drive.google.com/open?id=1sEDiX_jORh2X-HSzUnjIyZr-G9LJIw1k
 959 | 
 960 | **ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection**
 961 | 
 962 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_ContourNet_Taking_a_Further_Step_Toward_Accurate_Arbitrary-Shaped_Scene_Text_CVPR_2020_paper.pdf
 963 | - 代码：https://github.com/wangyuxin87/ContourNet 
 964 | 
 965 | **UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World**
 966 | 
 967 | - 论文：https://arxiv.org/abs/2003.10608
 968 | - 代码和数据集：https://github.com/Jyouhou/UnrealText/
 969 | 
 970 | **ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network**
 971 | 
 972 | - 论文：https://arxiv.org/abs/2002.10200 
 973 | - 代码（即将开源）：https://github.com/Yuliang-Liu/bezier_curve_text_spotting
 974 | - 代码（即将开源）：https://github.com/aim-uofa/adet
 975 | 
 976 | **Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection**
 977 | 
 978 | - 论文：https://arxiv.org/abs/2003.07493
 979 | 
 980 | - 代码：https://github.com/GXYM/DRRG
 981 | 
 982 | <a name="Scene-Text-Recognition"></a>
 983 | 
 984 | # 场景文本识别
 985 | 
 986 | **SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition**
 987 | 
 988 | - 论文：https://arxiv.org/abs/2005.10977
 989 | - 代码：https://github.com/Pay20Y/SEED
 990 | 
 991 | **UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World**
 992 | 
 993 | - 论文：https://arxiv.org/abs/2003.10608
 994 | - 代码和数据集：https://github.com/Jyouhou/UnrealText/
 995 | 
 996 | **ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network**
 997 | 
 998 | - 论文：https://arxiv.org/abs/2002.10200 
 999 | - 代码（即将开源）：https://github.com/aim-uofa/adet
1000 | 
1001 | **Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition**
1002 | 
1003 | - 论文：https://arxiv.org/abs/2003.06606
1004 | 
1005 | - 代码：https://github.com/Canjie-Luo/Text-Image-Augmentation
1006 | 
1007 | <a name="Feature"></a>
1008 | 
1009 | # 特征(点)检测和描述
1010 | 
1011 | **SuperGlue: Learning Feature Matching with Graph Neural Networks**
1012 | 
1013 | - 论文：https://arxiv.org/abs/1911.11763
1014 | - 代码：https://github.com/magicleap/SuperGluePretrainedNetwork
1015 | 
1016 | <a name="Super-Resolution"></a>
1017 | 
1018 | # 超分辨率
1019 | 
1020 | ## 图像超分辨率
1021 | 
1022 | **Closed-Loop Matters: Dual Regression Networks for Single Image Super-Resolution**
1023 | 
1024 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/html/Guo_Closed-Loop_Matters_Dual_Regression_Networks_for_Single_Image_Super-Resolution_CVPR_2020_paper.html
1025 | - 代码：https://github.com/guoyongcs/DRN
1026 | 
1027 | **Learning Texture Transformer Network for Image Super-Resolution**
1028 | 
1029 | - 论文：https://arxiv.org/abs/2006.04139
1030 | 
1031 | - 代码：https://github.com/FuzhiYang/TTSR
1032 | 
1033 | **Image Super-Resolution with Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining**
1034 | 
1035 | - 论文：https://arxiv.org/abs/2006.01424
1036 | - 代码：https://github.com/SHI-Labs/Cross-Scale-Non-Local-Attention
1037 | 
1038 | **Structure-Preserving Super Resolution with Gradient Guidance**
1039 | 
1040 | - 论文：https://arxiv.org/abs/2003.13081
1041 | 
1042 | - 代码：https://github.com/Maclory/SPSR
1043 | 
1044 | **Rethinking Data Augmentation for Image Super-resolution: A Comprehensive Analysis and a New Strategy**
1045 | 
1046 | 论文：https://arxiv.org/abs/2004.00448
1047 | 
1048 | 代码：https://github.com/clovaai/cutblur
1049 | 
1050 | ## 视频超分辨率
1051 | 
1052 | **TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution**
1053 | 
1054 | - 论文：https://arxiv.org/abs/1812.02898
1055 | - 代码：https://github.com/YapengTian/TDAN-VSR-CVPR-2020
1056 | 
1057 | **Space-Time-Aware Multi-Resolution Video Enhancement**
1058 | 
1059 | - 主页：https://alterzero.github.io/projects/STAR.html
1060 | - 论文：http://arxiv.org/abs/2003.13170
1061 | - 代码：https://github.com/alterzero/STARnet
1062 | 
1063 | **Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution**
1064 | 
1065 | - 论文：https://arxiv.org/abs/2002.11616 
1066 | - 代码：https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020 
1067 | 
1068 | <a name="Model-Compression"></a>
1069 | 
1070 | # 模型压缩/剪枝
1071 | 
1072 | **DMCP: Differentiable Markov Channel Pruning for Neural Networks**
1073 | 
1074 | - 论文：https://arxiv.org/abs/2005.03354
1075 | - 代码：https://github.com/zx55/dmcp
1076 | 
1077 | **Forward and Backward Information Retention for Accurate Binary Neural Networks**
1078 | 
1079 | - 论文：https://arxiv.org/abs/1909.10788
1080 | 
1081 | - 代码：https://github.com/htqin/IR-Net
1082 | 
1083 | **Towards Efficient Model Compression via Learned Global Ranking**
1084 | 
1085 | - 论文：https://arxiv.org/abs/1904.12368
1086 | - 代码：https://github.com/cmu-enyac/LeGR
1087 | 
1088 | **HRank: Filter Pruning using High-Rank Feature Map**
1089 | 
1090 | - 论文：http://arxiv.org/abs/2002.10179
1091 | - 代码：https://github.com/lmbxmu/HRank 
1092 | 
1093 | **GAN Compression: Efficient Architectures for Interactive Conditional GANs**
1094 | 
1095 | - 论文：https://arxiv.org/abs/2003.08936
1096 | 
1097 | - 代码：https://github.com/mit-han-lab/gan-compression
1098 | 
1099 | **Group Sparsity: The Hinge Between Filter Pruning and Decomposition for Network Compression**
1100 | 
1101 | - 论文：https://arxiv.org/abs/2003.08935
1102 | 
1103 | - 代码：https://github.com/ofsoundof/group_sparsity
1104 | 
1105 | <a name="Action-Recognition"></a>
1106 | 
1107 | # 视频理解/行为识别
1108 | 
1109 | **Oops! Predicting Unintentional Action in Video**
1110 | 
1111 | - 主页：https://oops.cs.columbia.edu/
1112 | 
1113 | - 论文：https://arxiv.org/abs/1911.11206
1114 | - 代码：https://github.com/cvlab-columbia/oops
1115 | - 数据集：https://oops.cs.columbia.edu/data
1116 | 
1117 | **PREDICT & CLUSTER: Unsupervised Skeleton Based Action Recognition**
1118 | 
1119 | - 论文：https://arxiv.org/abs/1911.12409
1120 | - 代码：https://github.com/shlizee/Predict-Cluster 
1121 | 
1122 | **Intra- and Inter-Action Understanding via Temporal Action Parsing**
1123 | 
1124 | - 论文：https://arxiv.org/abs/2005.10229
1125 | - 主页和数据集：https://sdolivia.github.io/TAPOS/
1126 | 
1127 | **3DV: 3D Dynamic Voxel for Action Recognition in Depth Video**
1128 | 
1129 | - 论文：https://arxiv.org/abs/2005.05501
1130 | - 代码：https://github.com/3huo/3DV-Action
1131 | 
1132 | **FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding**
1133 | 
1134 | - 主页：https://sdolivia.github.io/FineGym/
1135 | - 论文：https://arxiv.org/abs/2004.06704
1136 | 
1137 | **TEA: Temporal Excitation and Aggregation for Action Recognition**
1138 | 
1139 | - 论文：https://arxiv.org/abs/2004.01398
1140 | 
1141 | - 代码：https://github.com/Phoenix1327/tea-action-recognition
1142 | 
1143 | **X3D: Expanding Architectures for Efficient Video Recognition**
1144 | 
1145 | - 论文：https://arxiv.org/abs/2004.04730
1146 | 
1147 | - 代码：https://github.com/facebookresearch/SlowFast
1148 | 
1149 | **Temporal Pyramid Network for Action Recognition**
1150 | 
1151 | - 主页：https://decisionforce.github.io/TPN
1152 | 
1153 | - 论文：https://arxiv.org/abs/2004.03548 
1154 | - 代码：https://github.com/decisionforce/TPN 
1155 | 
1156 | ## 基于骨架的动作识别
1157 | 
1158 | **Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition**
1159 | 
1160 | - 论文：https://arxiv.org/abs/2003.14111
1161 | - 代码：https://github.com/kenziyuliu/ms-g3d
1162 | 
1163 | <a name="Crowd-Counting"></a>
1164 | 
1165 | # 人群计数
1166 | 
1167 | <a name="Depth-Estimation"></a>
1168 | 
1169 | # 深度估计
1170 | 
1171 | **BiFuse: Monocular 360◦ Depth Estimation via Bi-Projection Fusion**
1172 | 
1173 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_BiFuse_Monocular_360_Depth_Estimation_via_Bi-Projection_Fusion_CVPR_2020_paper.pdf
1174 | - 代码：https://github.com/Yeh-yu-hsuan/BiFuse
1175 | 
1176 | **Focus on defocus: bridging the synthetic to real domain gap for depth estimation**
1177 | 
1178 | - 论文：https://arxiv.org/abs/2005.09623
1179 | - 代码：https://github.com/dvl-tum/defocus-net
1180 | 
1181 | **Bi3D: Stereo Depth Estimation via Binary Classifications**
1182 | 
1183 | - 论文：https://arxiv.org/abs/2005.07274
1184 | 
1185 | - 代码：https://github.com/NVlabs/Bi3D
1186 | 
1187 | **AANet: Adaptive Aggregation Network for Efficient Stereo Matching**
1188 | 
1189 | - 论文：https://arxiv.org/abs/2004.09548
1190 | - 代码：https://github.com/haofeixu/aanet
1191 | 
1192 | **Towards Better Generalization: Joint Depth-Pose Learning without PoseNet**
1193 | 
1194 | - 论文：https://github.com/B1ueber2y/TrianFlow
1195 | 
1196 | - 代码：https://github.com/B1ueber2y/TrianFlow
1197 | 
1198 | ## 单目深度估计
1199 | 
1200 | **On the uncertainty of self-supervised monocular depth estimation**
1201 | 
1202 | - 论文：https://arxiv.org/abs/2005.06209
1203 | - 代码：https://github.com/mattpoggi/mono-uncertainty
1204 | 
1205 | **3D Packing for Self-Supervised Monocular Depth Estimation**
1206 | 
1207 | - 论文：https://arxiv.org/abs/1905.02693
1208 | - 代码：https://github.com/TRI-ML/packnet-sfm
1209 | - Demo视频：https://www.bilibili.com/video/av70562892/
1210 | 
1211 | **Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation**
1212 | 
1213 | - 论文：https://arxiv.org/abs/2002.12114
1214 | - 代码：https://github.com/yzhao520/ARC
1215 | 
1216 | <a name="6DOF"></a>
1217 | 
1218 | # 6D目标姿态估计
1219 | 
1220 |  **PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation**
1221 | 
1222 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/papers/He_PVN3D_A_Deep_Point-Wise_3D_Keypoints_Voting_Network_for_6DoF_CVPR_2020_paper.pdf
1223 | - 代码：https://github.com/ethnhe/PVN3D
1224 | 
1225 | **MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion**
1226 | 
1227 | - 论文：https://arxiv.org/abs/2004.04336
1228 | - 代码：https://github.com/wkentaro/morefusion
1229 | 
1230 | **EPOS: Estimating 6D Pose of Objects with Symmetries**
1231 | 
1232 | 主页：http://cmp.felk.cvut.cz/epos
1233 | 
1234 | 论文：https://arxiv.org/abs/2004.00605
1235 | 
1236 | **G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features**
1237 | 
1238 | - 论文：https://arxiv.org/abs/2003.11089
1239 | 
1240 | - 代码：https://github.com/DC1991/G2L_Net
1241 | 
1242 | <a name="Hand-Pose"></a>
1243 | 
1244 | # 手势估计
1245 | 
1246 | **HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation**
1247 | 
1248 | - 论文：https://arxiv.org/abs/2004.00060
1249 | 
1250 | - 主页：http://vision.sice.indiana.edu/projects/hopenet
1251 | 
1252 | **Monocular Real-time Hand Shape and Motion Capture using Multi-modal Data**
1253 | 
1254 | - 论文：https://arxiv.org/abs/2003.09572
1255 | 
1256 | - 代码：https://github.com/CalciferZh/minimal-hand
1257 | 
1258 | <a name="Saliency"></a>
1259 | 
1260 | # 显著性检测
1261 | 
1262 | **JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection**
1263 | 
1264 | - 论文：https://arxiv.org/abs/2004.08515
1265 | 
1266 | - 代码：https://github.com/kerenfu/JLDCF/
1267 | 
1268 | **UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders**
1269 | 
1270 | - 主页：http://dpfan.net/d3netbenchmark/
1271 | 
1272 | - 论文：https://arxiv.org/abs/2004.05763
1273 | - 代码：https://github.com/JingZhang617/UCNet
1274 | 
1275 | <a name="Denoising"></a>
1276 | 
1277 | # 去噪
1278 | 
1279 | **A Physics-based Noise Formation Model for Extreme Low-light Raw Denoising**
1280 | 
1281 | - 论文：https://arxiv.org/abs/2003.12751
1282 | 
1283 | - 代码：https://github.com/Vandermode/NoiseModel
1284 | 
1285 | **CycleISP: Real Image Restoration via Improved Data Synthesis**
1286 | 
1287 | - 论文：https://arxiv.org/abs/2003.07761
1288 | 
1289 | - 代码：https://github.com/swz30/CycleISP
1290 | 
1291 | <a name="Deraining"></a>
1292 | 
1293 | # 去雨
1294 | 
1295 | **Multi-Scale Progressive Fusion Network for Single Image Deraining**
1296 | 
1297 | - 论文：https://arxiv.org/abs/2003.10985
1298 | - 代码：https://github.com/kuihua/MSPFN
1299 | 
1300 | **Detail-recovery Image Deraining via Context Aggregation Networks**
1301 | 
1302 | - 论文：https://openaccess.thecvf.com/content_CVPR_2020/html/Deng_Detail-recovery_Image_Deraining_via_Context_Aggregation_Networks_CVPR_2020_paper.html
1303 | - 代码：https://github.com/Dengsgithub/DRD-Net
1304 | 
1305 | <a name="Deblurring"></a>
1306 | 
1307 | # 去模糊
1308 | 
1309 | ## 视频去模糊
1310 | 
1311 | **Cascaded Deep Video Deblurring Using Temporal Sharpness Prior**
1312 | 
1313 | - 主页：https://csbhr.github.io/projects/cdvd-tsp/index.html 
1314 | - 论文：https://arxiv.org/abs/2004.02501 
1315 | - 代码：https://github.com/csbhr/CDVD-TSP
1316 | 
1317 | <a name="Dehazing"></a>
1318 | 
1319 | # 去雾
1320 | 
1321 | **Domain Adaptation for Image Dehazing**
1322 | 
1323 | - 论文：https://arxiv.org/abs/2005.04668
1324 | 
1325 | - 代码：https://github.com/HUSTSYJ/DA_dahazing
1326 | 
1327 | **Multi-Scale Boosted Dehazing Network with Dense Feature Fusion**
1328 | 
1329 | - 论文：https://arxiv.org/abs/2004.13388
1330 | 
1331 | - 代码：https://github.com/BookerDeWitt/MSBDN-DFF
1332 | 
1333 | <a name="Feature"></a>
1334 | 
1335 | # 特征点检测与描述
1336 | 
1337 | **ASLFeat: Learning Local Features of Accurate Shape and Localization**
1338 | 
1339 | - 论文：https://arxiv.org/abs/2003.10071
1340 | 
1341 | - 代码：https://github.com/lzx551402/aslfeat
1342 | 
1343 | <a name="VQA"></a>
1344 | 
1345 | # 视觉问答(VQA)
1346 | 
1347 | **VC R-CNN：Visual Commonsense R-CNN** 
1348 | 
1349 | - 论文：https://arxiv.org/abs/2002.12204
1350 | - 代码：https://github.com/Wangt-CN/VC-R-CNN
1351 | 
1352 | <a name="VideoQA"></a>
1353 | 
1354 | # 视频问答(VideoQA)
1355 | 
1356 | **Hierarchical Conditional Relation Networks for Video Question Answering**
1357 | 
1358 | - 论文：https://arxiv.org/abs/2002.10698
1359 | - 代码：https://github.com/thaolmk54/hcrn-videoqa
1360 | 
1361 | <a name="VLN"></a>
1362 | 
1363 | # 视觉语言导航
1364 | 
1365 | **Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training**
1366 | 
1367 | - 论文：https://arxiv.org/abs/2002.10638
1368 | - 代码（即将开源）：https://github.com/weituo12321/PREVALENT
1369 | 
1370 | <a name="Video-Compression"></a>
1371 | 
1372 | # 视频压缩
1373 | 
1374 | **Learning for Video Compression with Hierarchical Quality and Recurrent Enhancement**
1375 | 
1376 | - 论文：https://arxiv.org/abs/2003.01966 
1377 | - 代码：https://github.com/RenYang-home/HLVC
1378 | 
1379 | <a name="Video-Frame-Interpolation"></a>
1380 | 
1381 | # 视频插帧
1382 | 
1383 | **AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation**
1384 | 
1385 | - 论文：https://arxiv.org/abs/1907.10244
1386 | - 代码：https://github.com/HyeongminLEE/AdaCoF-pytorch
1387 | 
1388 | **FeatureFlow: Robust Video Interpolation via Structure-to-Texture Generation**
1389 | 
1390 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/html/Gui_FeatureFlow_Robust_Video_Interpolation_via_Structure-to-Texture_Generation_CVPR_2020_paper.html
1391 | 
1392 | - 代码：https://github.com/CM-BF/FeatureFlow
1393 | 
1394 | **Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution**
1395 | 
1396 | - 论文：https://arxiv.org/abs/2002.11616
1397 | - 代码：https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020
1398 | 
1399 | **Space-Time-Aware Multi-Resolution Video Enhancement**
1400 | 
1401 | - 主页：https://alterzero.github.io/projects/STAR.html
1402 | - 论文：http://arxiv.org/abs/2003.13170
1403 | - 代码：https://github.com/alterzero/STARnet
1404 | 
1405 | **Scene-Adaptive Video Frame Interpolation via Meta-Learning**
1406 | 
1407 | - 论文：https://arxiv.org/abs/2004.00779
1408 | - 代码：https://github.com/myungsub/meta-interpolation
1409 | 
1410 | **Softmax Splatting for Video Frame Interpolation**
1411 | 
1412 | - 主页：http://sniklaus.com/papers/softsplat
1413 | - 论文：https://arxiv.org/abs/2003.05534
1414 | - 代码：https://github.com/sniklaus/softmax-splatting
1415 | 
1416 | <a name="Style-Transfer"></a>
1417 | 
1418 | # 风格迁移
1419 | 
1420 | **Diversified Arbitrary Style Transfer via Deep Feature Perturbation**
1421 | 
1422 | - 论文：https://arxiv.org/abs/1909.08223
1423 | - 代码：https://github.com/EndyWon/Deep-Feature-Perturbation
1424 | 
1425 | **Collaborative Distillation for Ultra-Resolution Universal Style Transfer**
1426 | 
1427 | - 论文：https://arxiv.org/abs/2003.08436
1428 | 
1429 | - 代码：https://github.com/mingsun-tse/collaborative-distillation
1430 | 
1431 | <a name="Lane-Detection"></a>
1432 | 
1433 | # 车道线检测
1434 | 
1435 | **Inter-Region Affinity Distillation for Road Marking Segmentation**
1436 | 
1437 | - 论文：https://arxiv.org/abs/2004.05304
1438 | - 代码：https://github.com/cardwing/Codes-for-IntRA-KD
1439 | 
1440 | <a name="HOI"></a>
1441 | 
1442 | # "人-物"交互(HOT)检测
1443 | 
1444 | **PPDM: Parallel Point Detection and Matching for Real-time Human-Object Interaction Detection**
1445 | 
1446 | - 论文：https://arxiv.org/abs/1912.12898
1447 | - 代码：https://github.com/YueLiao/PPDM
1448 | 
1449 | **Detailed 2D-3D Joint Representation for Human-Object Interaction**
1450 | 
1451 | - 论文：https://arxiv.org/abs/2004.08154
1452 | 
1453 | - 代码：https://github.com/DirtyHarryLYL/DJ-RN
1454 | 
1455 | **Cascaded Human-Object Interaction Recognition**
1456 | 
1457 | - 论文：https://arxiv.org/abs/2003.04262
1458 | 
1459 | - 代码：https://github.com/tfzhou/C-HOI
1460 | 
1461 | **VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions**
1462 | 
1463 | - 论文：https://arxiv.org/abs/2003.05541
1464 | - 代码：https://github.com/ASMIftekhar/VSGNet
1465 | 
1466 | <a name="TP"></a>
1467 | 
1468 | # 轨迹预测
1469 | 
1470 | **The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction**
1471 | 
1472 | - 论文：https://arxiv.org/abs/1912.06445
1473 | - 代码：https://github.com/JunweiLiang/Multiverse
1474 | - 数据集：https://next.cs.cmu.edu/multiverse/
1475 | 
1476 | **Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction**
1477 | 
1478 | - 论文：https://arxiv.org/abs/2002.11927 
1479 | - 代码：https://github.com/abduallahmohamed/Social-STGCNN 
1480 | 
1481 | <a name="Motion-Predication"></a>
1482 | 
1483 | # 运动预测
1484 | 
1485 | **Collaborative Motion Prediction via Neural Motion Message Passing**
1486 | 
1487 | - 论文：https://arxiv.org/abs/2003.06594
1488 | - 代码：https://github.com/PhyllisH/NMMP
1489 | 
1490 | **MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps**
1491 | 
1492 | - 论文：https://arxiv.org/abs/2003.06754
1493 | 
1494 | - 代码：https://github.com/pxiangwu/MotionNet
1495 | 
1496 | <a name="OF"></a>
1497 | 
1498 | # 光流估计
1499 | 
1500 | **Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation**
1501 | 
1502 | - 论文：https://arxiv.org/abs/2003.13045
1503 | - 代码：https://github.com/lliuz/ARFlow 
1504 | 
1505 | <a name="IR"></a>
1506 | 
1507 | # 图像检索
1508 | 
1509 | **Evade Deep Image Retrieval by Stashing Private Images in the Hash Space**
1510 | 
1511 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/html/Xiao_Evade_Deep_Image_Retrieval_by_Stashing_Private_Images_in_the_CVPR_2020_paper.html
1512 | - 代码：https://github.com/sugarruy/hashstash
1513 | 
1514 | <a name="Virtual-Try-On"></a>
1515 | 
1516 | # 虚拟试衣
1517 | 
1518 | **Towards Photo-Realistic Virtual Try-On by Adaptively Generating↔Preserving Image Content**
1519 | 
1520 | - 论文：https://arxiv.org/abs/2003.05863
1521 | - 代码：https://github.com/switchablenorms/DeepFashion_Try_On
1522 | 
1523 | <a name="HDR"></a>
1524 | 
1525 | # HDR
1526 | 
1527 | **Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline**
1528 | 
1529 | - 主页：https://www.cmlab.csie.ntu.edu.tw/~yulunliu/SingleHDR
1530 | 
1531 | - 论文下载链接：https://www.cmlab.csie.ntu.edu.tw/~yulunliu/SingleHDR_/00942.pdf
1532 | 
1533 | - 代码：https://github.com/alex04072000/SingleHDR
1534 | 
1535 | <a name="AE"></a>
1536 | 
1537 | # 对抗样本
1538 | 
1539 | **Enhancing Cross-Task Black-Box Transferability of Adversarial Examples With Dispersion Reduction**
1540 | 
1541 | - 论文：https://openaccess.thecvf.com/content_CVPR_2020/papers/Lu_Enhancing_Cross-Task_Black-Box_Transferability_of_Adversarial_Examples_With_Dispersion_Reduction_CVPR_2020_paper.pdf
1542 | - 代码：https://github.com/erbloo/dr_cvpr20 
1543 | 
1544 | **Towards Large yet Imperceptible Adversarial Image Perturbations with Perceptual Color Distance**
1545 | 
1546 | - 论文：https://arxiv.org/abs/1911.02466
1547 | - 代码：https://github.com/ZhengyuZhao/PerC-Adversarial 
1548 | 
1549 | <a name="3D-Reconstructing"></a>
1550 | 
1551 | # 三维重建
1552 | 
1553 | **Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild**
1554 | 
1555 | - **CVPR 2020 Best Paper**
1556 | - 主页：https://elliottwu.com/projects/unsup3d/
1557 | - 论文：https://arxiv.org/abs/1911.11130
1558 | - 代码：https://github.com/elliottwu/unsup3d
1559 | 
1560 | **Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization**
1561 | 
1562 | - 主页：https://shunsukesaito.github.io/PIFuHD/
1563 | - 论文：https://arxiv.org/abs/2004.00452
1564 | - 代码：https://github.com/facebookresearch/pifuhd
1565 | 
1566 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/papers/Patel_TailorNet_Predicting_Clothing_in_3D_as_a_Function_of_Human_CVPR_2020_paper.pdf
1567 | - 代码：https://github.com/chaitanya100100/TailorNet
1568 | - 数据集：https://github.com/zycliao/TailorNet_dataset
1569 | 
1570 | **Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion**
1571 | 
1572 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/papers/Chibane_Implicit_Functions_in_Feature_Space_for_3D_Shape_Reconstruction_and_CVPR_2020_paper.pdf
1573 | - 代码：https://github.com/jchibane/if-net
1574 | 
1575 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/papers/Mir_Learning_to_Transfer_Texture_From_Clothing_Images_to_3D_Humans_CVPR_2020_paper.pdf
1576 | - 代码：https://github.com/aymenmir1/pix2surf
1577 | 
1578 | <a name="DC"></a>
1579 | 
1580 | # 深度补全
1581 | 
1582 | **Uncertainty-Aware CNNs for Depth Completion: Uncertainty from Beginning to End**
1583 | 
1584 | 论文：https://arxiv.org/abs/2006.03349
1585 | 
1586 | 代码：https://github.com/abdo-eldesokey/pncnn
1587 | 
1588 | <a name="SSC"></a>
1589 | 
1590 | # 语义场景补全
1591 | 
1592 | **3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior**
1593 | 
1594 | - 论文：https://arxiv.org/abs/2003.14052
1595 | - 代码：https://github.com/charlesCXK/TorchSSC
1596 | 
1597 | <a name="Captioning"></a>
1598 | 
1599 | # 图像/视频描述
1600 | 
1601 | **Syntax-Aware Action Targeting for Video Captioning**
1602 | 
1603 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/papers/Zheng_Syntax-Aware_Action_Targeting_for_Video_Captioning_CVPR_2020_paper.pdf
1604 | - 代码：https://github.com/SydCaption/SAAT 
1605 | 
1606 | <a name="WP"></a>
1607 | 
1608 | # 线框解析
1609 | 
1610 | **Holistically-Attracted Wireframe Parser**
1611 | 
1612 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/html/Xue_Holistically-Attracted_Wireframe_Parsing_CVPR_2020_paper.html
1613 | 
1614 | - 代码：https://github.com/cherubicXN/hawp
1615 | 
1616 | <a name="Datasets"></a>
1617 | 
1618 | # 数据集
1619 | 
1620 | **OASIS: A Large-Scale Dataset for Single Image 3D in the Wild**
1621 | 
1622 | - 论文：https://arxiv.org/abs/2007.13215
1623 | - 数据集：https://oasis.cs.princeton.edu/
1624 | 
1625 | **STEFANN: Scene Text Editor using Font Adaptive Neural Network**
1626 | 
1627 | - 主页：https://prasunroy.github.io/stefann/
1628 | 
1629 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/html/Roy_STEFANN_Scene_Text_Editor_Using_Font_Adaptive_Neural_Network_CVPR_2020_paper.html
1630 | - 代码：https://github.com/prasunroy/stefann
1631 | - 数据集：https://drive.google.com/open?id=1sEDiX_jORh2X-HSzUnjIyZr-G9LJIw1k
1632 | 
1633 | **Interactive Object Segmentation with Inside-Outside Guidance**
1634 | 
1635 | - 论文下载链接：http://openaccess.thecvf.com/content_CVPR_2020/papers/Zhang_Interactive_Object_Segmentation_With_Inside-Outside_Guidance_CVPR_2020_paper.pdf
1636 | - 代码：https://github.com/shiyinzhang/Inside-Outside-Guidance
1637 | - 数据集：https://github.com/shiyinzhang/Pixel-ImageNet
1638 | 
1639 | **Video Panoptic Segmentation**
1640 | 
1641 | - 论文：https://arxiv.org/abs/2006.11339
1642 | - 代码：https://github.com/mcahny/vps
1643 | - 数据集：https://www.dropbox.com/s/ecem4kq0fdkver4/cityscapes-vps-dataset-1.0.zip?dl=0
1644 | 
1645 | **FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation**
1646 | 
1647 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/html/Li_FSS-1000_A_1000-Class_Dataset_for_Few-Shot_Segmentation_CVPR_2020_paper.html
1648 | 
1649 | - 代码：https://github.com/HKUSTCV/FSS-1000
1650 | 
1651 | - 数据集：https://github.com/HKUSTCV/FSS-1000
1652 | 
1653 | **3D-ZeF: A 3D Zebrafish Tracking Benchmark Dataset**
1654 | 
1655 | - 主页：https://vap.aau.dk/3d-zef/
1656 | - 论文：https://arxiv.org/abs/2006.08466
1657 | - 代码：https://bitbucket.org/aauvap/3d-zef/src/master/
1658 | - 数据集：https://motchallenge.net/data/3D-ZeF20
1659 | 
1660 | **TailorNet: Predicting Clothing in 3D as a Function of Human Pose, Shape and Garment Style**
1661 | 
1662 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/papers/Patel_TailorNet_Predicting_Clothing_in_3D_as_a_Function_of_Human_CVPR_2020_paper.pdf
1663 | - 代码：https://github.com/chaitanya100100/TailorNet
1664 | - 数据集：https://github.com/zycliao/TailorNet_dataset
1665 | 
1666 | **Oops! Predicting Unintentional Action in Video**
1667 | 
1668 | - 主页：https://oops.cs.columbia.edu/
1669 | 
1670 | - 论文：https://arxiv.org/abs/1911.11206
1671 | - 代码：https://github.com/cvlab-columbia/oops
1672 | - 数据集：https://oops.cs.columbia.edu/data
1673 | 
1674 | **The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction**
1675 | 
1676 | - 论文：https://arxiv.org/abs/1912.06445
1677 | - 代码：https://github.com/JunweiLiang/Multiverse
1678 | - 数据集：https://next.cs.cmu.edu/multiverse/
1679 | 
1680 | **Open Compound Domain Adaptation**
1681 | 
1682 | - 主页：https://liuziwei7.github.io/projects/CompoundDomain.html
1683 | - 数据集：https://drive.google.com/drive/folders/1_uNTF8RdvhS_sqVTnYx17hEOQpefmE2r?usp=sharing
1684 | - 论文：https://arxiv.org/abs/1909.03403
1685 | - 代码：https://github.com/zhmiao/OpenCompoundDomainAdaptation-OCDA
1686 | 
1687 | **Intra- and Inter-Action Understanding via Temporal Action Parsing**
1688 | 
1689 | - 论文：https://arxiv.org/abs/2005.10229
1690 | - 主页和数据集：https://sdolivia.github.io/TAPOS/
1691 | 
1692 | **Dynamic Refinement Network for Oriented and Densely Packed Object Detection**
1693 | 
1694 | - 论文下载链接：https://arxiv.org/abs/2005.09973
1695 | 
1696 | - 代码和数据集：https://github.com/Anymake/DRN_CVPR2020
1697 | 
1698 | **COCAS: A Large-Scale Clothes Changing Person Dataset for Re-identification**
1699 | 
1700 | - 论文：https://arxiv.org/abs/2005.07862
1701 | 
1702 | - 数据集：暂无
1703 | 
1704 | **KeypointNet: A Large-scale 3D Keypoint Dataset Aggregated from Numerous Human Annotations**
1705 | 
1706 | - 论文：https://arxiv.org/abs/2002.12687
1707 | 
1708 | - 数据集：https://github.com/qq456cvb/KeypointNet
1709 | 
1710 | **MSeg: A Composite Dataset for Multi-domain Semantic Segmentation**
1711 | 
1712 | - 论文：http://vladlen.info/papers/MSeg.pdf
1713 | - 代码：https://github.com/mseg-dataset/mseg-api
1714 | - 数据集：https://github.com/mseg-dataset/mseg-semantic
1715 | 
1716 | **AvatarMe: Realistically Renderable 3D Facial Reconstruction "in-the-wild"**
1717 | 
1718 | - 论文：https://arxiv.org/abs/2003.13845
1719 | - 数据集：https://github.com/lattas/AvatarMe
1720 | 
1721 | **Learning to Autofocus**
1722 | 
1723 | - 论文：https://arxiv.org/abs/2004.12260
1724 | - 数据集：暂无
1725 | 
1726 | **FaceScape: a Large-scale High Quality 3D Face Dataset and Detailed Riggable 3D Face Prediction**
1727 | 
1728 | - 论文：https://arxiv.org/abs/2003.13989
1729 | - 代码：https://github.com/zhuhao-nju/facescape
1730 | 
1731 | **Bodies at Rest: 3D Human Pose and Shape Estimation from a Pressure Image using Synthetic Data**
1732 | 
1733 | - 论文下载链接：https://arxiv.org/abs/2004.01166
1734 | 
1735 | - 代码：https://github.com/Healthcare-Robotics/bodies-at-rest
1736 | - 数据集：https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/KOA4ML
1737 | 
1738 | **FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding**
1739 | 
1740 | - 主页：https://sdolivia.github.io/FineGym/
1741 | - 论文：https://arxiv.org/abs/2004.06704
1742 | 
1743 | **A Local-to-Global Approach to Multi-modal Movie Scene Segmentation**
1744 | 
1745 | - 主页：https://anyirao.com/projects/SceneSeg.html
1746 | 
1747 | - 论文下载链接：https://arxiv.org/abs/2004.02678
1748 | 
1749 | - 代码：https://github.com/AnyiRao/SceneSeg
1750 | 
1751 | **Deep Homography Estimation for Dynamic Scenes**
1752 | 
1753 | - 论文：https://arxiv.org/abs/2004.02132
1754 | 
1755 | - 数据集：https://github.com/lcmhoang/hmg-dynamics
1756 | 
1757 | **Assessing Image Quality Issues for Real-World Problems**
1758 | 
1759 | - 主页：https://vizwiz.org/tasks-and-datasets/image-quality-issues/
1760 | - 论文：https://arxiv.org/abs/2003.12511
1761 | 
1762 | **UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World**
1763 | 
1764 | - 论文：https://arxiv.org/abs/2003.10608
1765 | - 代码和数据集：https://github.com/Jyouhou/UnrealText/
1766 | 
1767 | **PANDA: A Gigapixel-level Human-centric Video Dataset**
1768 | 
1769 | - 论文：https://arxiv.org/abs/2003.04852
1770 | 
1771 | - 数据集：http://www.panda-dataset.com/
1772 | 
1773 | **IntrA: 3D Intracranial Aneurysm Dataset for Deep Learning**
1774 | 
1775 | - 论文：https://arxiv.org/abs/2003.02920
1776 | - 数据集：https://github.com/intra3d2019/IntrA
1777 | 
1778 | **Cross-View Tracking for Multi-Human 3D Pose Estimation at over 100 FPS**
1779 | 
1780 | - 论文：https://arxiv.org/abs/2003.03972
1781 | - 数据集：暂无
1782 | 
1783 | <a name="Others"></a>
1784 | 
1785 | # 其他
1786 | 
1787 | **CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus**
1788 | 
1789 | - 论文：http://openaccess.thecvf.com/content_CVPR_2020/html/Kluger_CONSAC_Robust_Multi-Model_Fitting_by_Conditional_Sample_Consensus_CVPR_2020_paper.html
1790 | - 代码：https://github.com/fkluger/consac
1791 | 
1792 | **Learning to Learn Single Domain Generalization**
1793 | 
1794 | - 论文：https://arxiv.org/abs/2003.13216
1795 | - 代码：https://github.com/joffery/M-ADA
1796 | 
1797 | **Open Compound Domain Adaptation**
1798 | 
1799 | - 主页：https://liuziwei7.github.io/projects/CompoundDomain.html
1800 | - 数据集：https://drive.google.com/drive/folders/1_uNTF8RdvhS_sqVTnYx17hEOQpefmE2r?usp=sharing
1801 | - 论文：https://arxiv.org/abs/1909.03403
1802 | - 代码：https://github.com/zhmiao/OpenCompoundDomainAdaptation-OCDA
1803 | 
1804 | **Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision**
1805 | 
1806 | - 论文：http://www.cvlibs.net/publications/Niemeyer2020CVPR.pdf
1807 | 
1808 | - 代码：https://github.com/autonomousvision/differentiable_volumetric_rendering
1809 | 
1810 | **QEBA: Query-Efficient Boundary-Based Blackbox Attack**
1811 | 
1812 | - 论文：https://arxiv.org/abs/2005.14137
1813 | - 代码：https://github.com/AI-secure/QEBA
1814 | 
1815 | **Equalization Loss for Long-Tailed Object Recognition**
1816 | 
1817 | - 论文：https://arxiv.org/abs/2003.05176
1818 | - 代码：https://github.com/tztztztztz/eql.detectron2
1819 | 
1820 | **Instance-aware Image Colorization**
1821 | 
1822 | - 主页：https://ericsujw.github.io/InstColorization/
1823 | - 论文：https://arxiv.org/abs/2005.10825
1824 | - 代码：https://github.com/ericsujw/InstColorization
1825 | 
1826 | **Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting**
1827 | 
1828 | - 论文：https://arxiv.org/abs/2005.09704
1829 | 
1830 | - 代码：https://github.com/Atlas200dk/sample-imageinpainting-HiFill
1831 | 
1832 | **Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching**
1833 | 
1834 | - 论文：https://arxiv.org/abs/2005.03860
1835 | - 代码：https://github.com/shiyujiao/cross_view_localization_DSM
1836 | 
1837 | **Epipolar Transformers**
1838 | 
1839 | - 论文：https://arxiv.org/abs/2005.04551
1840 | 
1841 | - 代码：https://github.com/yihui-he/epipolar-transformers 
1842 | 
1843 | **Bringing Old Photos Back to Life**
1844 | 
1845 | - 主页：http://raywzy.com/Old_Photo/
1846 | - 论文：https://arxiv.org/abs/2004.09484
1847 | 
1848 | **MaskFlownet: Asymmetric Feature Matching with Learnable Occlusion Mask**
1849 | 
1850 | - 论文：https://arxiv.org/abs/2003.10955 
1851 | 
1852 | - 代码：https://github.com/microsoft/MaskFlownet 
1853 | 
1854 | **Self-Supervised Viewpoint Learning from Image Collections**
1855 | 
1856 | - 论文：https://arxiv.org/abs/2004.01793
1857 | - 论文2：https://research.nvidia.com/sites/default/files/pubs/2020-03_Self-Supervised-Viewpoint-Learning/SSV-CVPR2020.pdf 
1858 | - 代码：https://github.com/NVlabs/SSV 
1859 | 
1860 | **Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations**
1861 | 
1862 | - Oral
1863 | 
1864 | - 论文：https://arxiv.org/abs/2003.12237 
1865 | - 代码：https://github.com/cuishuhao/BNM 
1866 | 
1867 | **Towards Learning Structure via Consensus for Face Segmentation and Parsing**
1868 | 
1869 | - 论文：https://arxiv.org/abs/1911.00957
1870 | - 代码：https://github.com/isi-vista/structure_via_consensus
1871 | 
1872 | **Plug-and-Play Algorithms for Large-scale Snapshot Compressive Imaging**
1873 | 
1874 | - Oral
1875 | - 论文：https://arxiv.org/abs/2003.13654
1876 | 
1877 | - 代码：https://github.com/liuyang12/PnP-SCI
1878 | 
1879 | **Lightweight Photometric Stereo for Facial Details Recovery**
1880 | 
1881 | - 论文：https://arxiv.org/abs/2003.12307
1882 | - 代码：https://github.com/Juyong/FacePSNet
1883 | 
1884 | **Footprints and Free Space from a Single Color Image**
1885 | 
1886 | - 论文：https://arxiv.org/abs/2004.06376
1887 | 
1888 | - 代码：https://github.com/nianticlabs/footprints
1889 | 
1890 | **Self-Supervised Monocular Scene Flow Estimation**
1891 | 
1892 | - 论文：https://arxiv.org/abs/2004.04143
1893 | - 代码：https://github.com/visinf/self-mono-sf
1894 | 
1895 | **Quasi-Newton Solver for Robust Non-Rigid Registration**
1896 | 
1897 | - 论文：https://arxiv.org/abs/2004.04322
1898 | - 代码：https://github.com/Juyong/Fast_RNRR
1899 | 
1900 | **A Local-to-Global Approach to Multi-modal Movie Scene Segmentation**
1901 | 
1902 | - 主页：https://anyirao.com/projects/SceneSeg.html
1903 | 
1904 | - 论文下载链接：https://arxiv.org/abs/2004.02678
1905 | 
1906 | - 代码：https://github.com/AnyiRao/SceneSeg
1907 | 
1908 | **DeepFLASH: An Efficient Network for Learning-based Medical Image Registration**
1909 | 
1910 | - 论文：https://arxiv.org/abs/2004.02097
1911 | 
1912 | - 代码：https://github.com/jw4hv/deepflash
1913 | 
1914 | **Self-Supervised Scene De-occlusion**
1915 | 
1916 | - 主页：https://xiaohangzhan.github.io/projects/deocclusion/
1917 | - 论文：https://arxiv.org/abs/2004.02788
1918 | - 代码：https://github.com/XiaohangZhan/deocclusion
1919 | 
1920 | **Polarized Reflection Removal with Perfect Alignment in the Wild** 
1921 | 
1922 | - 主页：https://leichenyang.weebly.com/project-polarized.html
1923 | - 代码：https://github.com/ChenyangLEI/CVPR2020-Polarized-Reflection-Removal-with-Perfect-Alignment 
1924 | 
1925 | **Background Matting: The World is Your Green Screen**
1926 | 
1927 | - 论文：https://arxiv.org/abs/2004.00626
1928 | - 代码：http://github.com/senguptaumd/Background-Matting
1929 | 
1930 | **What Deep CNNs Benefit from Global Covariance Pooling: An Optimization Perspective**
1931 | 
1932 | - 论文：https://arxiv.org/abs/2003.11241
1933 | 
1934 | - 代码：https://github.com/ZhangLi-CS/GCP_Optimization
1935 | 
1936 | **Look-into-Object: Self-supervised Structure Modeling for Object Recognition**
1937 | 
1938 | - 论文：暂无
1939 | - 代码：https://github.com/JDAI-CV/LIO 
1940 | 
1941 |  **Video Object Grounding using Semantic Roles in Language Description**
1942 | 
1943 | - 论文：https://arxiv.org/abs/2003.10606
1944 | - 代码：https://github.com/TheShadow29/vognet-pytorch 
1945 | 
1946 | **Dynamic Hierarchical Mimicking Towards Consistent Optimization Objectives**
1947 | 
1948 | - 论文：https://arxiv.org/abs/2003.10739
1949 | - 代码：https://github.com/d-li14/DHM 
1950 | 
1951 | **SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization**
1952 | 
1953 | - 论文：http://www.cs.umd.edu/~yuejiang/papers/SDFDiff.pdf
1954 | - 代码：https://github.com/YueJiang-nj/CVPR2020-SDFDiff 
1955 | 
1956 | **On Translation Invariance in CNNs: Convolutional Layers can Exploit Absolute Spatial Location**
1957 | 
1958 | - 论文：https://arxiv.org/abs/2003.07064
1959 | 
1960 | - 代码：https://github.com/oskyhn/CNNs-Without-Borders
1961 | 
1962 | **GhostNet: More Features from Cheap Operations**
1963 | 
1964 | - 论文：https://arxiv.org/abs/1911.11907
1965 | 
1966 | - 代码：https://github.com/iamhankai/ghostnet
1967 | 
1968 | **AdderNet: Do We Really Need Multiplications in Deep Learning?** 
1969 | 
1970 | - 论文：https://arxiv.org/abs/1912.13200 
1971 | - 代码：https://github.com/huawei-noah/AdderNet
1972 | 
1973 | **Deep Image Harmonization via Domain Verification** 
1974 | 
1975 | - 论文：https://arxiv.org/abs/1911.13239 
1976 | - 代码：https://github.com/bcmi/Image_Harmonization_Datasets
1977 | 
1978 | **Blurry Video Frame Interpolation**
1979 | 
1980 | - 论文：https://arxiv.org/abs/2002.12259 
1981 | - 代码：https://github.com/laomao0/BIN
1982 | 
1983 | **Extremely Dense Point Correspondences using a Learned Feature Descriptor**
1984 | 
1985 | - 论文：https://arxiv.org/abs/2003.00619 
1986 | - 代码：https://github.com/lppllppl920/DenseDescriptorLearning-Pytorch
1987 | 
1988 | **Filter Grafting for Deep Neural Networks**
1989 | 
1990 | - 论文：https://arxiv.org/abs/2001.05868
1991 | - 代码：https://github.com/fxmeng/filter-grafting
1992 | - 论文解读：https://www.zhihu.com/question/372070853/answer/1041569335
1993 | 
1994 | **Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation**
1995 | 
1996 | - 论文：https://arxiv.org/abs/2003.02824 
1997 | - 代码：https://github.com/cmhungsteve/SSTDA
1998 | 
1999 | **Detecting Attended Visual Targets in Video**
2000 | 
2001 | - 论文：https://arxiv.org/abs/2003.02501 
2002 | 
2003 | - 代码：https://github.com/ejcgt/attention-target-detection
2004 | 
2005 | **Deep Image Spatial Transformation for Person Image Generation**
2006 | 
2007 | - 论文：https://arxiv.org/abs/2003.00696 
2008 | - 代码：https://github.com/RenYurui/Global-Flow-Local-Attention
2009 | 
2010 |  **Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications** 
2011 | 
2012 | - 论文：https://arxiv.org/abs/2003.01455
2013 | - 代码：https://github.com/bbrattoli/ZeroShotVideoClassification
2014 | 
2015 | https://github.com/charlesCXK/3D-SketchAware-SSC
2016 | 
2017 | https://github.com/Anonymous20192020/Anonymous_CVPR5767
2018 | 
2019 | https://github.com/avirambh/ScopeFlow
2020 | 
2021 | https://github.com/csbhr/CDVD-TSP
2022 | 
2023 | https://github.com/ymcidence/TBH
2024 | 
2025 | https://github.com/yaoyao-liu/mnemonics
2026 | 
2027 | https://github.com/meder411/Tangent-Images
2028 | 
2029 | https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch
2030 | 
2031 | https://github.com/sjmoran/deep_local_parametric_filters
2032 | 
2033 | https://github.com/charlesCXK/3D-SketchAware-SSC
2034 | 
2035 | https://github.com/bermanmaxim/AOWS
2036 | 
2037 | https://github.com/dc3ea9f/look-into-object 
2038 | 
2039 | <a name="Not-Sure"></a>
2040 | 
2041 | # 不确定中没中
2042 | 
2043 | **FADNet: A Fast and Accurate Network for Disparity Estimation**
2044 | 
2045 | - 论文：还没出来
2046 | - 代码：https://github.com/HKBU-HPML/FADNet
2047 | 
2048 | https://github.com/rFID-submit/RandomFID：不确定中没中
2049 | 
2050 | https://github.com/JackSyu/AE-MSR：不确定中没中
2051 | 
2052 | https://github.com/fastconvnets/cvpr2020：不确定中没中
2053 | 
2054 | https://github.com/aimagelab/meshed-memory-transformer：不确定中没中
2055 | 
2056 | https://github.com/TWSFar/CRGNet：不确定中没中
2057 | 
2058 | https://github.com/CVPR-2020/CDARTS：不确定中没中
2059 | 
2060 | https://github.com/anucvml/ddn-cvprw2020：不确定中没中
2061 | 
2062 | https://github.com/dl-model-recommend/model-trust：不确定中没中
2063 | 
2064 | https://github.com/apratimbhattacharyya18/CVPR-2020-Corr-Prior：不确定中没中
2065 | 
2066 | https://github.com/onetcvpr/O-Net：不确定中没中
2067 | 
2068 | https://github.com/502463708/Microcalcification_Detection：不确定中没中
2069 | 
2070 | https://github.com/anonymous-for-review/cvpr-2020-deep-smoke-machine：不确定中没中
2071 | 
2072 | https://github.com/anonymous-for-review/cvpr-2020-smoke-recognition-dataset：不确定中没中
2073 | 
2074 | https://github.com/cvpr-nonrigid/dataset：不确定中没中
2075 | 
2076 | https://github.com/theFool32/PPBA：不确定中没中
2077 | 
2078 | https://github.com/Realtime-Action-Recognition/Realtime-Action-Recognition


--------------------------------------------------------------------------------
/CVPR2022-Papers-with-Code.md:
--------------------------------------------------------------------------------
   1 | # CVPR 2022 论文和开源项目合集(Papers with Code)
   2 | 
   3 | [CVPR 2022](https://cvpr2022.thecvf.com/) 论文和开源项目合集(papers with code)！
   4 | 
   5 | CVPR 2022 收录列表ID：https://drive.google.com/file/d/15JFhfPboKdUcIH9LdbCMUFmGq_JhaxhC/view
   6 | 
   7 | > 注1：欢迎各位大佬提交issue，分享CVPR 2022论文和开源项目！
   8 | >
   9 | > 注2：关于往年CV顶会论文以及其他优质CV论文和大盘点，详见： https://github.com/amusi/daily-paper-computer-vision
  10 | >
  11 | > - [CVPR 2019](CVPR2019-Papers-with-Code.md)
  12 | > - [CVPR 2020](CVPR2020-Papers-with-Code.md)
  13 | > - [CVPR 2021](CVPR2021-Papers-with-Code.md)
  14 | 
  15 | 如果你想了解最新最优质的的CV论文、开源项目和学习资料，欢迎扫码加入【CVer学术交流群】！互相学习，一起进步~ 
  16 | 
  17 | ![](CVer学术交流群.png)
  18 | 
  19 | ## 【CVPR 2022 论文开源目录】
  20 | 
  21 | - [Backbone](#Backbone)
  22 | - [CLIP](#CLIP)
  23 | - [GAN](#GAN)
  24 | - [GNN](#GNN)
  25 | - [MLP](#MLP)
  26 | - [NAS](#NAS)
  27 | - [OCR](#OCR)
  28 | - [NeRF](#NeRF)
  29 | - [3D Face](#3D Face)
  30 | - [长尾分布(Long-Tail)](#Long-Tail)
  31 | - [Visual Transformer](#Visual-Transformer)
  32 | - [视觉和语言(Vision-Language)](#VL)
  33 | - [自监督学习(Self-supervised Learning)](#SSL)
  34 | - [数据增强(Data Augmentation)](#DA)
  35 | - [知识蒸馏(Knowledge Distillation)](#KD)
  36 | - [目标检测(Object Detection)](#Object-Detection)
  37 | - [目标跟踪(Visual Tracking)](#VT)
  38 | - [语义分割(Semantic Segmentation)](#Semantic-Segmentation)
  39 | - [实例分割(Instance Segmentation)](#Instance-Segmentation)
  40 | - [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation)
  41 | - [小样本分类(Few-Shot Classification)](#FFC)
  42 | - [小样本分割(Few-Shot Segmentation)](#FFS)
  43 | - [图像抠图(Image Matting)](#Matting)
  44 | - [视频理解(Video Understanding)](#VU)
  45 | - [图像编辑(Image Editing)](#Image-Editing)
  46 | - [Low-level Vision](#LLV)
  47 | - [超分辨率(Super-Resolution)](#Super-Resolution)
  48 | - [去模糊(Deblur)](#Deblur)
  49 | - [3D点云(3D Point Cloud)](#3D-Point-Cloud)
  50 | - [3D目标检测(3D Object Detection)](#3D-Object-Detection)
  51 | - [3D语义分割(3D Semantic Segmentation)](#3DSS)
  52 | - [3D目标跟踪(3D Object Tracking)](#3D-Object-Tracking)
  53 | - [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation)
  54 | - [3D语义场景补全(3D Semantic Scene Completion)](#3DSSC)
  55 | - [3D重建(3D Reconstruction)](#3D-R)
  56 | - [行人重识别(Person Re-identification)](#ReID)
  57 | - [伪装物体检测(Camouflaged Object Detection)](#COD)
  58 | - [深度估计(Depth Estimation)](#Depth-Estimation)
  59 | - [立体匹配(Stereo Matching)](#Stereo-Matching)
  60 | - [特征匹配(Feature Matching)](#FM)
  61 | - [车道线检测(Lane Detection)](#Lane-Detection)
  62 | - [光流估计(Optical Flow Estimation)](#Optical-Flow-Estimation)
  63 | - [图像修复(Image Inpainting)](#Image-Inpainting)
  64 | - [图像检索(Image Retrieval)](#Image-Retrieval)
  65 | - [人脸识别(Face Recognition)](#Face-Recognition)
  66 | - [人群计数(Crowd Counting)](#Crowd-Counting)
  67 | - [医学图像(Medical Image)](#Medical-Image)
  68 | - [视频生成(Video Generation)](#Video Generation)
  69 | - [场景图生成(Scene Graph Generation)](#Scene-Graph-Generation)
  70 | - [参考视频目标分割(Referring Video Object Segmentation)](#R-VOS)
  71 | - [步态识别(Gait Recognition)](#GR)
  72 | - [风格迁移(Style Transfer)](#ST)
  73 | - [异常检测(Anomaly Detection](#AD)
  74 | - [对抗样本(Adversarial Examples)](#AE)
  75 | - [弱监督物体检测(Weakly Supervised Object Localization)](#WSOL)
  76 | - [雷达目标检测(Radar Object Detection)](#ROD)
  77 | - [高光谱图像重建(Hyperspectral Image Reconstruction)](#HSI)
  78 | - [图像拼接(Image Stitching)](#Image-Stitching)
  79 | - [水印(Watermarking)](#Watermarking)
  80 | - [Action Counting](#AC)
  81 | - [Grounded Situation Recognition](#GSR)
  82 | - [Zero-shot Learning](#ZSL)
  83 | - [DeepFakes](#DeepFakes)
  84 | - [数据集(Datasets)](#Datasets)
  85 | - [新任务(New Tasks)](#New-Tasks)
  86 | - [其他(Others)](#Others)
  87 | 
  88 | <a name="Backbone"></a>
  89 | 
  90 | # Backbone
  91 | 
  92 | **A ConvNet for the 2020s**
  93 | 
  94 | - Paper: https://arxiv.org/abs/2201.03545
  95 | - Code: https://github.com/facebookresearch/ConvNeXt
  96 | - 中文解读：https://mp.weixin.qq.com/s/Xg5wPYExnvTqRo6s5-2cAw
  97 | 
  98 | **Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs**
  99 | 
 100 | - Paper: https://arxiv.org/abs/2203.06717
 101 | 
 102 | - Code: https://github.com/megvii-research/RepLKNet
 103 | - Code2: https://github.com/DingXiaoH/RepLKNet-pytorch
 104 | 
 105 | - 中文解读：https://mp.weixin.qq.com/s/_qXyIQut-JRW6VvsjaQlFg
 106 | 
 107 | **MPViT : Multi-Path Vision Transformer for Dense Prediction**
 108 | 
 109 | - Paper: https://arxiv.org/abs/2112.11010
 110 | - Code: https://github.com/youngwanLEE/MPViT
 111 | - 中文解读: https://mp.weixin.qq.com/s/Q9-crEOz5IYzZaNoq8oXfg
 112 | 
 113 | **Mobile-Former: Bridging MobileNet and Transformer**
 114 | 
 115 | - Paper: https://arxiv.org/abs/2108.05895
 116 | - Code: None
 117 | - 中文解读：https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ
 118 | 
 119 | **MetaFormer is Actually What You Need for Vision**
 120 | 
 121 | - Paper: https://arxiv.org/abs/2111.11418
 122 | - Code: https://github.com/sail-sg/poolformer
 123 | 
 124 | **Shunted Self-Attention via Multi-Scale Token Aggregation**
 125 | 
 126 | -  Paper(Oral): https://arxiv.org/abs/2111.15193
 127 | - Code: https://github.com/OliverRensu/Shunted-Transformer
 128 | 
 129 | **TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing**
 130 | 
 131 | - Paper: http://arxiv.org/abs/2203.10489
 132 | - Code: https://github.com/JierunChen/TVConv
 133 | 
 134 | **Learned Queries for Efficient Local Attention**
 135 | 
 136 | - Paper(Oral): https://arxiv.org/abs/2112.11435
 137 | - Code: https://github.com/moabarar/qna
 138 | 
 139 | **RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality**
 140 | 
 141 | - Paper: https://arxiv.org/abs/2112.11081
 142 | - Code: https://github.com/DingXiaoH/RepMLP
 143 | 
 144 | <a name="CLIP"></a>
 145 | 
 146 | # CLIP
 147 | 
 148 | **HairCLIP: Design Your Hair by Text and Reference Image**
 149 | 
 150 | - Paper: https://arxiv.org/abs/2112.05142
 151 | 
 152 | - Code: https://github.com/wty-ustc/HairCLIP
 153 | 
 154 | **PointCLIP: Point Cloud Understanding by CLIP**
 155 | 
 156 | - Paper: https://arxiv.org/abs/2112.02413
 157 | - Code: https://github.com/ZrrSkywalker/PointCLIP
 158 | 
 159 | **Blended Diffusion for Text-driven Editing of Natural Images**
 160 | 
 161 | - Paper: https://arxiv.org/abs/2111.14818
 162 | 
 163 | - Code: https://github.com/omriav/blended-diffusion
 164 | 
 165 | <a name="GAN"></a>
 166 | 
 167 | # GAN
 168 | 
 169 | **SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing**
 170 | 
 171 | - Homepage: https://semanticstylegan.github.io/
 172 | 
 173 | - Paper: https://arxiv.org/abs/2112.02236
 174 | - Demo: https://semanticstylegan.github.io/videos/demo.mp4
 175 | 
 176 | **Style Transformer for Image Inversion and Editing**
 177 | 
 178 | - Paper: https://arxiv.org/abs/2203.07932
 179 | - Code: https://github.com/sapphire497/style-transformer
 180 | 
 181 | **Unsupervised Image-to-Image Translation with Generative Prior**
 182 | 
 183 | - Homepage: https://www.mmlab-ntu.com/project/gpunit/
 184 | - Paper: https://arxiv.org/abs/2204.03641
 185 | - Code: https://github.com/williamyang1991/GP-UNIT
 186 | 
 187 | **StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2**
 188 | 
 189 | - Homepage: https://universome.github.io/stylegan-v
 190 | - Paper: https://arxiv.org/abs/2112.14683
 191 | - Code: https://github.com/universome/stylegan-v
 192 | 
 193 | **OSSGAN: Open-set Semi-supervised Image Generation**
 194 | 
 195 | - Paper: https://arxiv.org/abs/2204.14249
 196 | - Code: https://github.com/raven38/OSSGAN
 197 | 
 198 | **Neural Texture Extraction and Distribution for Controllable Person Image Synthesis**
 199 | 
 200 | - Paper: https://arxiv.org/abs/2204.06160
 201 | - Code: https://github.com/RenYurui/Neural-Texture-Extraction-Distribution
 202 | 
 203 | <a name="GNN"></a>
 204 | 
 205 | # GNN
 206 | 
 207 | **OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks**
 208 | 
 209 | - Paper: https://wanyu-lin.github.io/assets/publications/wanyu-cvpr2022.pdf 
 210 | - Code: https://github.com/WanyuGroup/CVPR2022-OrphicX
 211 | 
 212 | <a name="MLP"></a>
 213 | 
 214 | # MLP
 215 | 
 216 | **RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality**
 217 | 
 218 | - Paper: https://arxiv.org/abs/2112.11081
 219 | - Code: https://github.com/DingXiaoH/RepMLP
 220 | 
 221 | <a name="NAS"></a>
 222 | 
 223 | # NAS
 224 | 
 225 | **β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search**
 226 | 
 227 | - Paper: https://arxiv.org/abs/2203.01665
 228 | - Code: https://github.com/Sunshine-Ye/Beta-DARTS
 229 | 
 230 | **ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior**
 231 | 
 232 | - Paper: https://arxiv.org/abs/2111.15362
 233 | - Code: None
 234 | 
 235 | <a name="OCR"></a>
 236 | 
 237 | # OCR
 238 | 
 239 | **SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition**
 240 | 
 241 | - Paper: https://arxiv.org/abs/2203.10209
 242 | 
 243 | - Code: https://github.com/mxin262/SwinTextSpotter
 244 | 
 245 | <a name="NeRF"></a>
 246 | 
 247 | # NeRF
 248 | 
 249 | **Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields**
 250 | 
 251 | - Homepage: https://jonbarron.info/mipnerf360/
 252 | - Paper: https://arxiv.org/abs/2111.12077
 253 | 
 254 | - Demo: https://youtu.be/YStDS2-Ln1s
 255 | 
 256 | **Point-NeRF: Point-based Neural Radiance Fields**
 257 | 
 258 | - Homepage: https://xharlie.github.io/projects/project_sites/pointnerf/
 259 | - Paper: https://arxiv.org/abs/2201.08845
 260 | - Code: https://github.com/Xharlie/point-nerf
 261 | 
 262 | **NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images**
 263 | 
 264 | - Paper: https://arxiv.org/abs/2111.13679
 265 | - Homepage: https://bmild.github.io/rawnerf/
 266 | - Demo: https://www.youtube.com/watch?v=JtBS4KBcKVc
 267 | 
 268 | **Urban Radiance Fields**
 269 | 
 270 | - Homepage: https://urban-radiance-fields.github.io/
 271 | 
 272 | - Paper: https://arxiv.org/abs/2111.14643
 273 | - Demo: https://youtu.be/qGlq5DZT6uc
 274 | 
 275 | **Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation**
 276 | 
 277 | - Paper: https://arxiv.org/abs/2202.13162
 278 | - Code: https://github.com/HexagonPrime/Pix2NeRF
 279 | 
 280 | **HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video**
 281 | 
 282 | - Homepage: https://grail.cs.washington.edu/projects/humannerf/
 283 | - Paper: https://arxiv.org/abs/2201.04127
 284 | 
 285 | - Demo: https://youtu.be/GM-RoZEymmw
 286 | 
 287 | <a name="3D Face"></a>
 288 | 
 289 | # 3D Face
 290 | 
 291 | **ImFace: A Nonlinear 3D Morphable Face Model with Implicit Neural Representations**
 292 | 
 293 | - Paper: https://arxiv.org/abs/2203.14510
 294 | - Code: https://github.com/MingwuZheng/ImFace 
 295 | 
 296 | <a name="Long-Tail"></a>
 297 | 
 298 | # 长尾分布(Long-Tail)
 299 | 
 300 | **Retrieval Augmented Classification for Long-Tail Visual Recognition**
 301 | 
 302 | - Paper: https://arxiv.org/abs/2202.11233
 303 | - Code: None
 304 | 
 305 | <a name="Visual-Transformer"></a>
 306 | 
 307 | # Visual Transformer
 308 | 
 309 | ## Backbone
 310 | 
 311 | **MPViT : Multi-Path Vision Transformer for Dense Prediction**
 312 | 
 313 | - Paper: https://arxiv.org/abs/2112.11010
 314 | - Code: https://github.com/youngwanLEE/MPViT
 315 | 
 316 | **MetaFormer is Actually What You Need for Vision**
 317 | 
 318 | - Paper: https://arxiv.org/abs/2111.11418
 319 | - Code: https://github.com/sail-sg/poolformer
 320 | 
 321 | **Mobile-Former: Bridging MobileNet and Transformer**
 322 | 
 323 | - Paper: https://arxiv.org/abs/2108.05895
 324 | - Code: None
 325 | - 中文解读：https://mp.weixin.qq.com/s/yo5KmB2Y7t2R4jiOKI87HQ
 326 | 
 327 | **Shunted Self-Attention via Multi-Scale Token Aggregation**
 328 | 
 329 | -  Paper(Oral): https://arxiv.org/abs/2111.15193
 330 | - Code: https://github.com/OliverRensu/Shunted-Transformer
 331 | 
 332 | **Learned Queries for Efficient Local Attention**
 333 | 
 334 | - Paper(Oral): https://arxiv.org/abs/2112.11435
 335 | - Code: https://github.com/moabarar/qna
 336 | 
 337 | ## 应用(Application)
 338 | 
 339 | **Language-based Video Editing via Multi-Modal Multi-Level Transformer**
 340 | 
 341 | - Paper: https://arxiv.org/abs/2104.01122
 342 | - Code: None
 343 | 
 344 | **MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video**
 345 | 
 346 | - Paper: https://arxiv.org/abs/2203.00859
 347 | - Code: None
 348 | 
 349 | **Embracing Single Stride 3D Object Detector with Sparse Transformer**
 350 | 
 351 | - Paper: https://arxiv.org/abs/2112.06375
 352 | - Code: https://github.com/TuSimple/SST
 353 | - 中文解读：https://zhuanlan.zhihu.com/p/476056546
 354 | 
 355 | **Multi-class Token Transformer for Weakly Supervised Semantic Segmentation**
 356 | 
 357 | - Paper: https://arxiv.org/abs/2203.02891
 358 | - Code: https://github.com/xulianuwa/MCTformer
 359 | 
 360 | **Spatio-temporal Relation Modeling for Few-shot Action Recognition**
 361 | 
 362 | - Paper: https://arxiv.org/abs/2112.05132
 363 | - Code: https://github.com/Anirudh257/strm
 364 | 
 365 | **Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction**
 366 | 
 367 | - Paper: https://arxiv.org/abs/2111.07910
 368 | - Code: https://github.com/caiyuanhao1998/MST
 369 | 
 370 | **Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling**
 371 | 
 372 | - Homepage: https://point-bert.ivg-research.xyz/
 373 | - Paper: https://arxiv.org/abs/2111.14819
 374 | - Code: https://github.com/lulutang0608/Point-BERT
 375 | 
 376 | **GroupViT: Semantic Segmentation Emerges from Text Supervision**
 377 | 
 378 | - Homepage: https://jerryxu.net/GroupViT/
 379 | 
 380 | - Paper: https://arxiv.org/abs/2202.11094
 381 | - Demo: https://youtu.be/DtJsWIUTW-Y
 382 | 
 383 | **Restormer: Efficient Transformer for High-Resolution Image Restoration**
 384 | 
 385 | - Paper: https://arxiv.org/abs/2111.09881
 386 | - Code: https://github.com/swz30/Restormer
 387 | 
 388 | **Splicing ViT Features for Semantic Appearance Transfer**
 389 | 
 390 | - Homepage: https://splice-vit.github.io/
 391 | - Paper: https://arxiv.org/abs/2201.00424
 392 | - Code: https://github.com/omerbt/Splice
 393 | 
 394 | **Self-supervised Video Transformer**
 395 | 
 396 | - Homepage: https://kahnchana.github.io/svt/
 397 | - Paper: https://arxiv.org/abs/2112.01514
 398 | 
 399 | - Code: https://github.com/kahnchana/svt
 400 | 
 401 | **Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers**
 402 | 
 403 | - Paper: https://arxiv.org/abs/2203.02664
 404 | - Code: https://github.com/rulixiang/afa
 405 | 
 406 | **Accelerating DETR Convergence via Semantic-Aligned Matching**
 407 | 
 408 | - Paper: https://arxiv.org/abs/2203.06883
 409 | - Code: https://github.com/ZhangGongjie/SAM-DETR
 410 | 
 411 | **DN-DETR: Accelerate DETR Training by Introducing Query DeNoising**
 412 | 
 413 | - Paper: https://arxiv.org/abs/2203.01305
 414 | - Code: https://github.com/FengLi-ust/DN-DETR
 415 | - 中文解读: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w
 416 | 
 417 | **Style Transformer for Image Inversion and Editing**
 418 | 
 419 | - Paper: https://arxiv.org/abs/2203.07932
 420 | - Code: https://github.com/sapphire497/style-transformer
 421 | 
 422 | **MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer**
 423 | 
 424 | - Paper: https://arxiv.org/abs/2203.10981
 425 | 
 426 | - Code: https://github.com/kuanchihhuang/MonoDTR
 427 | 
 428 | **Mask Transfiner for High-Quality Instance Segmentation**
 429 | 
 430 | - Paper: https://arxiv.org/abs/2111.13673
 431 | - Code: https://github.com/SysCV/transfiner
 432 | 
 433 | **Language as Queries for Referring Video Object Segmentation**
 434 | 
 435 | - Paper: https://arxiv.org/abs/2201.00487
 436 | - Code:  https://github.com/wjn922/ReferFormer
 437 | - 中文解读：https://mp.weixin.qq.com/s/MkQT8QWSYoYVhJ1RSF6oPQ
 438 | 
 439 | **X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning**
 440 | 
 441 | - Paper: https://arxiv.org/abs/2203.00843
 442 | - Code: https://github.com/CurryYuan/X-Trans2Cap
 443 | 
 444 | **AdaMixer: A Fast-Converging Query-Based Object Detector**
 445 | 
 446 | - Paper(Oral): https://arxiv.org/abs/2203.16507
 447 | - Code: https://github.com/MCG-NJU/AdaMixer
 448 | 
 449 | **Omni-DETR: Omni-Supervised Object Detection with Transformers**
 450 | 
 451 | - Paper: https://arxiv.org/abs/2203.16089
 452 | - Code: https://github.com/amazon-research/omni-detr
 453 | 
 454 | **SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition**
 455 | 
 456 | - Paper: https://arxiv.org/abs/2203.10209
 457 | 
 458 | - Code: https://github.com/mxin262/SwinTextSpotter
 459 | 
 460 | **TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting**
 461 | 
 462 | - Paper(Oral): https://arxiv.org/abs/2204.01018
 463 | - Code: https://github.com/SvipRepetitionCounting/TransRAC
 464 | 
 465 | **Collaborative Transformers for Grounded Situation Recognition**
 466 | 
 467 | - Paper: https://arxiv.org/abs/2203.16518
 468 | - Code: https://github.com/jhcho99/CoFormer
 469 | 
 470 | **NFormer: Robust Person Re-identification with Neighbor Transformer**
 471 | 
 472 | - Paper: https://arxiv.org/abs/2204.09331
 473 | - Code: https://github.com/haochenheheda/NFormer
 474 | 
 475 | **Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation**
 476 | 
 477 | - Paper: https://arxiv.org/abs/2201.06889
 478 | - Code: None
 479 | 
 480 | **Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer**
 481 | 
 482 | - Paper(Oral): https://arxiv.org/abs/2204.08680
 483 | - Code: https://github.com/zengwang430521/TCFormer
 484 | 
 485 | **A New Dataset and Transformer for Stereoscopic Video Super-Resolution**
 486 | 
 487 | - Paper: https://arxiv.org/abs/2204.10039
 488 | - Code: https://github.com/H-deep/Trans-SVSR/
 489 | - Dataset: http://shorturl.at/mpwGX
 490 | 
 491 | **Safe Self-Refinement for Transformer-based Domain Adaptation**
 492 | 
 493 | - Paper: https://arxiv.org/abs/2204.07683
 494 | - Code: https://github.com/tsun/SSRT
 495 | 
 496 | **Fast Point Transformer**
 497 | 
 498 | - Homepage: http://cvlab.postech.ac.kr/research/FPT/
 499 | - Paper: https://arxiv.org/abs/2112.04702
 500 | - Code: https://github.com/POSTECH-CVLab/FastPointTransformer
 501 | 
 502 | **Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval**
 503 | 
 504 | - Paper: https://arxiv.org/abs/2204.09730
 505 | - Code: https://github.com/mshukor/TFood
 506 | 
 507 | **DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation**
 508 | 
 509 | - Paper: https://arxiv.org/abs/2111.14887
 510 | - Code: https://github.com/lhoyer/DAFormer
 511 | 
 512 | **Stratified Transformer for 3D Point Cloud Segmentation**
 513 | 
 514 | - Paper: https://arxiv.org/pdf/2203.14508.pdf
 515 | - Code: https://github.com/dvlab-research/Stratified-Transformer 
 516 | 
 517 | <a name="VL"></a>
 518 | 
 519 | # 视觉和语言(Vision-Language)
 520 | 
 521 | **Conditional Prompt Learning for Vision-Language Models**
 522 | 
 523 | - Paper: https://arxiv.org/abs/2203.05557
 524 | - Code: https://github.com/KaiyangZhou/CoOp
 525 | 
 526 | **Bridging Video-text Retrieval with Multiple Choice Question**
 527 | 
 528 | - Paper: https://arxiv.org/abs/2201.04850
 529 | - Code: https://github.com/TencentARC/MCQ
 530 | 
 531 | **Visual Abductive Reasoning**
 532 | 
 533 | - Paper: https://arxiv.org/abs/2203.14040
 534 | - Code: https://github.com/leonnnop/VAR
 535 | 
 536 | <a name="SSL"></a>
 537 | 
 538 | # 自监督学习(Self-supervised Learning)
 539 | 
 540 | **UniVIP: A Unified Framework for Self-Supervised Visual Pre-training**
 541 | 
 542 | - Paper: https://arxiv.org/abs/2203.06965
 543 | - Code: None
 544 | 
 545 | **Crafting Better Contrastive Views for Siamese Representation Learning**
 546 | 
 547 | - Paper: https://arxiv.org/abs/2202.03278
 548 | - Code: https://github.com/xyupeng/ContrastiveCrop
 549 | - 中文解读：https://mp.weixin.qq.com/s/VTP9D5f7KG9vg30U9kVI2A
 550 | 
 551 | **HCSC: Hierarchical Contrastive Selective Coding**
 552 | 
 553 | - Homepage: https://github.com/gyfastas/HCSC
 554 | - Paper: https://arxiv.org/abs/2202.00455
 555 | - 中文解读: https://mp.weixin.qq.com/s/jkYR8mYp-e645qk8kfPNKQ
 556 | 
 557 | **DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis**
 558 | 
 559 | - Paper: https://arxiv.org/abs/2204.10437
 560 | 
 561 | - Code: https://github.com/JLiangLab/DiRA
 562 | 
 563 | <a name="DA"></a>
 564 | 
 565 | # 数据增强(Data Augmentation)
 566 | 
 567 | **TeachAugment: Data Augmentation Optimization Using Teacher Knowledge**
 568 | 
 569 | - Paper: https://arxiv.org/abs/2202.12513
 570 | - Code: https://github.com/DensoITLab/TeachAugment
 571 | 
 572 | **AlignMixup: Improving Representations By Interpolating Aligned Features**
 573 | 
 574 | - Paper: https://arxiv.org/abs/2103.15375
 575 | - Code: https://github.com/shashankvkt/AlignMixup_CVPR22 
 576 | 
 577 | <a name="KD"></a>
 578 | 
 579 | # 知识蒸馏(Knowledge Distillation)
 580 | 
 581 | **Decoupled Knowledge Distillation**
 582 | 
 583 | - Paper: https://arxiv.org/abs/2203.08679
 584 | - Code: https://github.com/megvii-research/mdistiller
 585 | - 中文解读：https://mp.weixin.qq.com/s/-4AA0zKIXh9Ei9-vc5jOhw
 586 | 
 587 | <a name="Object-Detection"></a>
 588 | 
 589 | # 目标检测(Object Detection)
 590 | 
 591 | **BoxeR: Box-Attention for 2D and 3D Transformers**
 592 | - Paper: https://arxiv.org/abs/2111.13087
 593 | - Code: https://github.com/kienduynguyen/BoxeR
 594 | - 中文解读：https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w
 595 | 
 596 | **DN-DETR: Accelerate DETR Training by Introducing Query DeNoising**
 597 | 
 598 | - Paper: https://arxiv.org/abs/2203.01305
 599 | - Code: https://github.com/FengLi-ust/DN-DETR
 600 | - 中文解读: https://mp.weixin.qq.com/s/xdMfZ_L628Ru1d1iaMny0w
 601 | 
 602 | **Accelerating DETR Convergence via Semantic-Aligned Matching**
 603 | 
 604 | - Paper: https://arxiv.org/abs/2203.06883
 605 | - Code: https://github.com/ZhangGongjie/SAM-DETR
 606 | 
 607 | **Localization Distillation for Dense Object Detection**
 608 | 
 609 | - Paper: https://arxiv.org/abs/2102.12252
 610 | - Code: https://github.com/HikariTJU/LD
 611 | - Code2: https://github.com/HikariTJU/LD
 612 | - 中文解读：https://mp.weixin.qq.com/s/dxss8RjJH283h6IbPCT9vg
 613 | 
 614 | **Focal and Global Knowledge Distillation for Detectors**
 615 | 
 616 | - Paper: https://arxiv.org/abs/2111.11837
 617 | - Code: https://github.com/yzd-v/FGD
 618 | - 中文解读：https://mp.weixin.qq.com/s/yDkreTudC8JL2V2ETsADwQ
 619 | 
 620 | **A Dual Weighting Label Assignment Scheme for Object Detection**
 621 | 
 622 | - Paper: https://arxiv.org/abs/2203.09730
 623 | - Code: https://github.com/strongwolf/DW
 624 | 
 625 | **AdaMixer: A Fast-Converging Query-Based Object Detector**
 626 | 
 627 | - Paper(Oral): https://arxiv.org/abs/2203.16507
 628 | - Code: https://github.com/MCG-NJU/AdaMixer
 629 | 
 630 | **Omni-DETR: Omni-Supervised Object Detection with Transformers**
 631 | 
 632 | - Paper: https://arxiv.org/abs/2203.16089
 633 | - Code: https://github.com/amazon-research/omni-detr
 634 | 
 635 | **SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection**
 636 | 
 637 | - Paper(Oral): https://arxiv.org/abs/2203.06398
 638 | - Code: https://github.com/CityU-AIM-Group/SIGMA
 639 | 
 640 | ## 半监督目标检测
 641 | 
 642 | **Dense Learning based Semi-Supervised Object Detection**
 643 | 
 644 | - Paper: https://arxiv.org/abs/2204.07300
 645 | 
 646 | - Code: https://github.com/chenbinghui1/DSL
 647 | 
 648 | # 目标跟踪(Visual Tracking)
 649 | 
 650 | **Correlation-Aware Deep Tracking**
 651 | 
 652 | - Paper: https://arxiv.org/abs/2203.01666
 653 | - Code: None
 654 | 
 655 | **TCTrack: Temporal Contexts for Aerial Tracking**
 656 | 
 657 | - Paper: https://arxiv.org/abs/2203.01885
 658 | - Code: https://github.com/vision4robotics/TCTrack
 659 | 
 660 | ## 多模态目标跟踪
 661 | 
 662 | **Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline**
 663 | 
 664 | - Homepage: https://zhang-pengyu.github.io/DUT-VTUAV/
 665 | 
 666 | - Paper: https://arxiv.org/abs/2204.04120
 667 | 
 668 | ## 多目标跟踪(Multi-Object Tracking)
 669 | 
 670 | **Learning of Global Objective for Network Flow in Multi-Object Tracking**
 671 | 
 672 | - Paper: https://arxiv.org/abs/2203.16210
 673 | - Code: None
 674 | 
 675 | **DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion**
 676 | 
 677 | - Homepage: https://dancetrack.github.io
 678 | - Paper: https://arxiv.org/abs/2111.14690
 679 | - Dataset: https://github.com/DanceTrack/DanceTrack
 680 | 
 681 | <a name="Semantic-Segmentation"></a>
 682 | 
 683 | # 语义分割(Semantic Segmentation)
 684 | 
 685 | **Novel Class Discovery in Semantic Segmentation**
 686 | 
 687 | - Homepage: https://ncdss.github.io/
 688 | - Paper: https://arxiv.org/abs/2112.01900
 689 | - Code: https://github.com/HeliosZhao/NCDSS
 690 | 
 691 | **Deep Hierarchical Semantic Segmentation**
 692 | 
 693 | - Paper: https://arxiv.org/abs/2203.14335
 694 | - Code: https://github.com/0liliulei/HieraSeg 
 695 | 
 696 | **Rethinking Semantic Segmentation: A Prototype View**
 697 | 
 698 | - Paper(Oral): https://arxiv.org/abs/2203.15102
 699 | - Code: https://github.com/tfzhou/ProtoSeg
 700 | 
 701 | ## 弱监督语义分割
 702 | 
 703 | **Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation**
 704 | 
 705 | - Paper: https://arxiv.org/abs/2203.00962
 706 | - Code: https://github.com/zhaozhengChen/ReCAM
 707 | 
 708 | **Multi-class Token Transformer for Weakly Supervised Semantic Segmentation**
 709 | 
 710 | - Paper: https://arxiv.org/abs/2203.02891
 711 | - Code: https://github.com/xulianuwa/MCTformer
 712 | 
 713 | **Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers**
 714 | 
 715 | - Paper: https://arxiv.org/abs/2203.02664
 716 | - Code: https://github.com/rulixiang/afa
 717 | 
 718 | **CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation**
 719 | 
 720 | - Paper: https://arxiv.org/abs/2203.02668
 721 | - Code: https://github.com/CVI-SZU/CLIMS 
 722 | 
 723 | **CCAM: Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation**
 724 | 
 725 | - Paper: https://arxiv.org/abs/2203.13505
 726 | - Code: https://github.com/CVI-SZU/CCAM 
 727 | 
 728 | **FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation**
 729 | 
 730 | - Homeapage: http://cvlab.postech.ac.kr/research/FIFO/
 731 | - Paper(Oral): https://arxiv.org/abs/2204.01587
 732 | - Code: https://github.com/sohyun-l/FIFO 
 733 | 
 734 | **Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation**
 735 | 
 736 | - Paper: https://arxiv.org/abs/2203.09653
 737 | - Code: https://github.com/maeve07/RCA.git
 738 | 
 739 | ## 半监督语义分割
 740 | 
 741 | **ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation**
 742 | 
 743 | - Paper: https://arxiv.org/abs/2106.05095
 744 | - Code: https://github.com/LiheYoung/ST-PlusPlus
 745 | - 中文解读：https://mp.weixin.qq.com/s/knSnlebdtEnmrkChGM_0CA
 746 | 
 747 | **Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels**
 748 | 
 749 | - Homepage: https://haochen-wang409.github.io/U2PL/
 750 | - Paper: https://arxiv.org/abs/2203.03884
 751 | - Code: https://github.com/Haochen-Wang409/U2PL
 752 | - 中文解读: https://mp.weixin.qq.com/s/-08olqE7np8A1XQzt6HAgQ
 753 | 
 754 | **Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation**
 755 | 
 756 | - Paper: https://arxiv.org/pdf/2111.12903.pdf
 757 | - Code: https://github.com/yyliu01/PS-MT
 758 | 
 759 | ## 域自适应语义分割
 760 | 
 761 | **Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation**
 762 | 
 763 | - Paper: https://arxiv.org/abs/2111.12940
 764 | - Code: https://github.com/BIT-DA/RIPU
 765 | 
 766 | **DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation**
 767 | 
 768 | - Paper: https://arxiv.org/abs/2111.14887
 769 | - Code: https://github.com/lhoyer/DAFormer
 770 | 
 771 | ## 无监督语义分割
 772 | 
 773 | **GroupViT: Semantic Segmentation Emerges from Text Supervision**
 774 | 
 775 | - Homepage: https://jerryxu.net/GroupViT/
 776 | - Paper: https://arxiv.org/abs/2202.11094
 777 | - Demo: https://youtu.be/DtJsWIUTW-Y
 778 | 
 779 | ## 少样本语义分割
 780 | 
 781 | **Generalized Few-shot Semantic Segmentation**
 782 | 
 783 | - Paper: https://jiaya.me/papers/cvpr22_zhuotao.pdf
 784 | - Code: https://github.com/dvlab-research/GFS-Seg 
 785 | 
 786 | <a name="Instance-Segmentation"></a>
 787 | 
 788 | # 实例分割(Instance Segmentation)
 789 | 
 790 | **BoxeR: Box-Attention for 2D and 3D Transformers**
 791 | - Paper: https://arxiv.org/abs/2111.13087
 792 | - Code: https://github.com/kienduynguyen/BoxeR
 793 | - 中文解读：https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w
 794 | 
 795 | **E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation**
 796 | 
 797 | - Paper: https://arxiv.org/abs/2203.04074
 798 | - Code: https://github.com/zhang-tao-whu/e2ec
 799 | 
 800 | **Mask Transfiner for High-Quality Instance Segmentation**
 801 | 
 802 | - Paper: https://arxiv.org/abs/2111.13673
 803 | - Code: https://github.com/SysCV/transfiner
 804 | 
 805 | **Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity**
 806 | 
 807 | - Homepage: https://sites.google.com/view/generic-grouping/
 808 | 
 809 | - Paper: https://arxiv.org/abs/2204.06107
 810 | - Code: https://github.com/facebookresearch/Generic-Grouping
 811 | 
 812 | ## 自监督实例分割
 813 | 
 814 | **FreeSOLO: Learning to Segment Objects without Annotations**
 815 | 
 816 | - Paper: https://arxiv.org/abs/2202.12181
 817 | - Code: https://github.com/NVlabs/FreeSOLO
 818 | 
 819 | ## 视频实例分割
 820 | 
 821 | **Efficient Video Instance Segmentation via Tracklet Query and Proposal**
 822 | 
 823 | - Homepage: https://jialianwu.com/projects/EfficientVIS.html
 824 | - Paper: https://arxiv.org/abs/2203.01853
 825 | - Demo: https://youtu.be/sSPMzgtMKCE
 826 | 
 827 | **Temporally Efficient Vision Transformer for Video Instance Segmentation**
 828 | 
 829 | - Paper: https://arxiv.org/abs/2204.08412
 830 | - Code: https://github.com/hustvl/TeViT
 831 | 
 832 | <a name="Panoptic-Segmentation"></a>
 833 | 
 834 | # 全景分割(Panoptic Segmentation)
 835 | 
 836 | **Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers**
 837 | 
 838 | - Paper: https://arxiv.org/abs/2109.03814
 839 | - Code: https://github.com/zhiqi-li/Panoptic-SegFormer
 840 | 
 841 | **Large-scale Video Panoptic Segmentation in the Wild: A Benchmark**
 842 | 
 843 | - Paper: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset/blob/main/VIPSeg2022.pdf
 844 | - Code: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset
 845 | - Dataset: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset 
 846 | 
 847 | <a name="FFC"></a>
 848 | 
 849 | # 小样本分类(Few-Shot Classification)
 850 | 
 851 | **Integrative Few-Shot Learning for Classification and Segmentation**
 852 | 
 853 | - Paper: https://arxiv.org/abs/2203.15712
 854 | - Code: https://github.com/dahyun-kang/ifsl
 855 | 
 856 | **Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification**
 857 | 
 858 | - Paper: https://arxiv.org/abs/2106.05517
 859 | - Code: https://github.com/LouieYang/MCL
 860 | 
 861 | <a name="FFS"></a>
 862 | 
 863 | # 小样本分割(Few-Shot Segmentation)
 864 | 
 865 | **Learning What Not to Segment: A New Perspective on Few-Shot Segmentation**
 866 | 
 867 | - Paper: https://arxiv.org/abs/2203.07615
 868 | - Code: https://github.com/chunbolang/BAM
 869 | 
 870 | **Integrative Few-Shot Learning for Classification and Segmentation**
 871 | 
 872 | - Paper: https://arxiv.org/abs/2203.15712
 873 | - Code: https://github.com/dahyun-kang/ifsl
 874 | 
 875 | **Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation**
 876 | 
 877 | - Paper: https://arxiv.org/abs/2204.10638
 878 | - Code: None
 879 | 
 880 | <a name="Matting"></a>
 881 | 
 882 | # 图像抠图(Image Matting)
 883 | 
 884 | **Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation**
 885 | 
 886 | - Paper: https://arxiv.org/abs/2201.06889
 887 | - Code: None
 888 | 
 889 | <a name="VU"></a>
 890 | 
 891 | # 视频理解(Video Understanding)
 892 | 
 893 | **Self-supervised Video Transformer**
 894 | 
 895 | - Homepage: https://kahnchana.github.io/svt/
 896 | - Paper: https://arxiv.org/abs/2112.01514
 897 | - Code: https://github.com/kahnchana/svt
 898 | 
 899 | **TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting**
 900 | 
 901 | - Paper(Oral): https://arxiv.org/abs/2204.01018
 902 | - Code: https://github.com/SvipRepetitionCounting/TransRAC
 903 | 
 904 | **FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment**
 905 | 
 906 | - Paper(Oral): https://arxiv.org/abs/2204.03646
 907 | 
 908 | - Dataset: https://github.com/xujinglin/FineDiving
 909 | - Code: https://github.com/xujinglin/FineDiving
 910 | - 中文解读：https://mp.weixin.qq.com/s/8t12Y34eMNwvJr8PeryWXg
 911 | 
 912 | **Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition**
 913 | 
 914 | - Paper(Oral): https://arxiv.org/abs/2204.02148
 915 | - Code: None
 916 | 
 917 | ## 行为识别(Action Recognition)
 918 | 
 919 | **Spatio-temporal Relation Modeling for Few-shot Action Recognition**
 920 | 
 921 | - Paper: https://arxiv.org/abs/2112.05132
 922 | - Code: https://github.com/Anirudh257/strm
 923 | 
 924 | ## 动作检测(Action Detection)
 925 | 
 926 | **End-to-End Semi-Supervised Learning for Video Action Detection**
 927 | 
 928 | - Paper: https://arxiv.org/abs/2203.04251
 929 | - Code: None
 930 | 
 931 | <a name="Image-Editing"></a>
 932 | 
 933 | # 图像编辑(Image Editing)
 934 | 
 935 | **Style Transformer for Image Inversion and Editing**
 936 | 
 937 | - Paper: https://arxiv.org/abs/2203.07932
 938 | - Code: https://github.com/sapphire497/style-transformer
 939 | 
 940 | **Blended Diffusion for Text-driven Editing of Natural Images**
 941 | 
 942 | - Paper: https://arxiv.org/abs/2111.14818
 943 | - Code: https://github.com/omriav/blended-diffusion
 944 | 
 945 | **SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing**
 946 | 
 947 | - Homepage: https://semanticstylegan.github.io/
 948 | 
 949 | - Paper: https://arxiv.org/abs/2112.02236
 950 | - Demo: https://semanticstylegan.github.io/videos/demo.mp4
 951 | 
 952 | <a name="LLV"></a>
 953 | 
 954 | # Low-level Vision
 955 | 
 956 | **ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior**
 957 | 
 958 | - Paper: https://arxiv.org/abs/2111.15362
 959 | - Code: None
 960 | 
 961 | **Restormer: Efficient Transformer for High-Resolution Image Restoration**
 962 | 
 963 | - Paper: https://arxiv.org/abs/2111.09881
 964 | - Code: https://github.com/swz30/Restormer
 965 | 
 966 | **Robust Equivariant Imaging: a fully unsupervised framework for learning to image from noisy and partial measurements**
 967 | 
 968 | - Paper(Oral): https://arxiv.org/abs/2111.12855
 969 | - Code: https://github.com/edongdongchen/REI
 970 | 
 971 | <a name="Super-Resolution"></a>
 972 | 
 973 | # 超分辨率(Super-Resolution)
 974 | 
 975 | ## 图像超分辨率(Image Super-Resolution)
 976 | 
 977 | **Learning the Degradation Distribution for Blind Image Super-Resolution**
 978 | 
 979 | - Paper: https://arxiv.org/abs/2203.04962
 980 | - Code: https://github.com/greatlog/UnpairedSR
 981 | 
 982 | ## 视频超分辨率(Video Super-Resolution)
 983 | 
 984 | **BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment**
 985 | 
 986 | - Paper: https://arxiv.org/abs/2104.13371
 987 | - Code: https://github.com/open-mmlab/mmediting
 988 | - Code: https://github.com/ckkelvinchan/BasicVSR_PlusPlus
 989 | - 中文解读：https://mp.weixin.qq.com/s/HZTwYfphixyLHxlbCAxx4g
 990 | 
 991 | **Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling**
 992 | 
 993 | - Paper: https://arxiv.org/abs/2204.07114
 994 | - Code: None
 995 | 
 996 | **A New Dataset and Transformer for Stereoscopic Video Super-Resolution**
 997 | 
 998 | - Paper: https://arxiv.org/abs/2204.10039
 999 | - Code: https://github.com/H-deep/Trans-SVSR/
1000 | - Dataset: http://shorturl.at/mpwGX
1001 | 
1002 | <a name="Deblur"></a>
1003 | 
1004 | # 去模糊(Deblur)
1005 | 
1006 | ## 图像去模糊(Image Deblur)
1007 | 
1008 | **Learning to Deblur using Light Field Generated and Real Defocus Images**
1009 | 
1010 | - Homepage: http://lyruan.com/Projects/DRBNet/
1011 | - Paper(Oral): https://arxiv.org/abs/2204.00442
1012 | 
1013 | - Code: https://github.com/lingyanruan/DRBNet
1014 | 
1015 | <a name="3D-Point-Cloud"></a>
1016 | 
1017 | # 3D点云(3D Point Cloud)
1018 | 
1019 | **Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling**
1020 | 
1021 | - Homepage: https://point-bert.ivg-research.xyz/
1022 | 
1023 | - Paper: https://arxiv.org/abs/2111.14819
1024 | - Code: https://github.com/lulutang0608/Point-BERT
1025 | 
1026 | **A Unified Query-based Paradigm for Point Cloud Understanding**
1027 | 
1028 | - Paper: https://arxiv.org/abs/2203.01252
1029 | - Code: None 
1030 | 
1031 | **CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding**
1032 | 
1033 | - Paper: https://arxiv.org/abs/2203.00680
1034 | - Code: https://github.com/MohamedAfham/CrossPoint
1035 | 
1036 | **PointCLIP: Point Cloud Understanding by CLIP**
1037 | 
1038 | - Paper: https://arxiv.org/abs/2112.02413
1039 | - Code: https://github.com/ZrrSkywalker/PointCLIP
1040 | 
1041 | **Fast Point Transformer**
1042 | 
1043 | - Homepage: http://cvlab.postech.ac.kr/research/FPT/
1044 | - Paper: https://arxiv.org/abs/2112.04702
1045 | - Code: https://github.com/POSTECH-CVLab/FastPointTransformer
1046 | 
1047 | **RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds**
1048 | 
1049 | - Paper: https://arxiv.org/abs/2205.11028
1050 | - Code: https://github.com/gxd1994/RCP
1051 | 
1052 | **The Devil is in the Pose: Ambiguity-free 3D Rotation-invariant Learning via Pose-aware Convolution**
1053 | 
1054 | - Paper: https://arxiv.org/abs/2205.15210
1055 | - Code: https://github.com/GostInShell/PaRI-Conv 
1056 | 
1057 | <a name="3D-Object-Detection"></a>
1058 | 
1059 | # 3D目标检测(3D Object Detection)
1060 | 
1061 | **Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds**
1062 | 
1063 | - Paper(Oral): https://arxiv.org/abs/2203.11139
1064 | 
1065 | - Code: https://github.com/yifanzhang713/IA-SSD
1066 | 
1067 | - Demo: https://www.youtube.com/watch?v=3jP2o9KXunA
1068 | 
1069 | **BoxeR: Box-Attention for 2D and 3D Transformers**
1070 | - Paper: https://arxiv.org/abs/2111.13087
1071 | - Code: https://github.com/kienduynguyen/BoxeR
1072 | - 中文解读：https://mp.weixin.qq.com/s/UnUJJBwcAsRgz6TnQf_b7w
1073 | 
1074 | **Embracing Single Stride 3D Object Detector with Sparse Transformer**
1075 | 
1076 | - Paper: https://arxiv.org/abs/2112.06375
1077 | 
1078 | - Code: https://github.com/TuSimple/SST
1079 | 
1080 | **Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes** 
1081 | 
1082 | - Paper: https://arxiv.org/abs/2011.12001
1083 | - Code: https://github.com/qq456cvb/CanonicalVoting
1084 | 
1085 | **MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer**
1086 | 
1087 | - Paper: https://arxiv.org/abs/2203.10981
1088 | - Code: https://github.com/kuanchihhuang/MonoDTR
1089 | 
1090 | **HyperDet3D: Learning a Scene-conditioned 3D Object Detector**
1091 | 
1092 | - Paper: https://arxiv.org/abs/2204.05599
1093 | - Code: None
1094 | 
1095 | **OccAM's Laser: Occlusion-based Attribution Maps for 3D Object Detectors on LiDAR Data**
1096 | 
1097 | - Paper: https://arxiv.org/abs/2204.06577
1098 | - Code: https://github.com/dschinagl/occam
1099 | 
1100 | **DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection**
1101 | 
1102 | - Homepage: https://thudair.baai.ac.cn/index
1103 | - Paper: https://arxiv.org/abs/2204.05575
1104 | - Code: https://github.com/AIR-THU/DAIR-V2X
1105 | 
1106 | **Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions**
1107 | 
1108 | - Homepage: https://ithaca365.mae.cornell.edu/
1109 | 
1110 | - Paper: https://arxiv.org/abs/2208.01166
1111 | 
1112 | <a name="3DSS"></a>
1113 | 
1114 | # 3D语义分割(3D Semantic Segmentation)
1115 | 
1116 | **Scribble-Supervised LiDAR Semantic Segmentation**
1117 | 
1118 | - Paper: https://arxiv.org/abs/2203.08537
1119 | - Dataset: https://github.com/ouenal/scribblekitti
1120 | 
1121 | **Stratified Transformer for 3D Point Cloud Segmentation**
1122 | 
1123 | - Paper: https://arxiv.org/pdf/2203.14508.pdf
1124 | - Code: https://github.com/dvlab-research/Stratified-Transformer
1125 | 
1126 | # 3D实例分割(3D Instance Segmentation)
1127 | 
1128 | **Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions**
1129 | 
1130 | - Homepage: https://ithaca365.mae.cornell.edu/
1131 | 
1132 | - Paper: https://arxiv.org/abs/2208.01166
1133 | 
1134 | <a name="3D-Object-Tracking"></a>
1135 | 
1136 | # 3D目标跟踪(3D Object Tracking)
1137 | 
1138 | **Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds**
1139 | 
1140 | - Paper: https://arxiv.org/abs/2203.01730
1141 | - Code: https://github.com/Ghostish/Open3DSOT
1142 | 
1143 | **PTTR: Relational 3D Point Cloud Object Tracking with Transformer**
1144 | 
1145 | - Paper: https://arxiv.org/abs/2112.02857
1146 | - Code: https://github.com/Jasonkks/PTTR 
1147 | 
1148 | <a name="3D-Human-Pose-Estimation"></a>
1149 | 
1150 | # 3D人体姿态估计(3D Human Pose Estimation)
1151 | 
1152 | **MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation**
1153 | 
1154 | - Paper: https://arxiv.org/abs/2111.12707
1155 | 
1156 | - Code: https://github.com/Vegetebird/MHFormer
1157 | 
1158 | - 中文解读: https://zhuanlan.zhihu.com/p/439459426
1159 | 
1160 | **MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video**
1161 | 
1162 | - Paper: https://arxiv.org/abs/2203.00859
1163 | - Code: None
1164 | 
1165 | **Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation**
1166 | 
1167 | - Paper: https://arxiv.org/abs/2203.07697
1168 | - Code: None
1169 | - 中文解读：https://mp.weixin.qq.com/s/L_F28IFLXvs5R4V9TTUpRw
1170 | 
1171 | **BEV: Putting People in their Place: Monocular Regression of 3D People in Depth**
1172 | 
1173 | - Homepage: https://arthur151.github.io/BEV/BEV.html
1174 | - Paper: https://arxiv.org/abs/2112.08274
1175 | - Code: https://github.com/Arthur151/ROMP
1176 | - Dataset: https://github.com/Arthur151/Relative_Human
1177 | - Demo: https://www.youtube.com/watch?v=Q62fj_6AxRI
1178 | 
1179 | <a name="3DSSC"></a>
1180 | 
1181 | # 3D语义场景补全(3D Semantic Scene Completion)
1182 | 
1183 | **MonoScene: Monocular 3D Semantic Scene Completion**
1184 | 
1185 | - Paper: https://arxiv.org/abs/2112.00726
1186 | - Code: https://github.com/cv-rits/MonoScene
1187 | 
1188 | <a name="3D-R"></a>
1189 | 
1190 | # 3D重建(3D Reconstruction)
1191 | 
1192 | **BANMo: Building Animatable 3D Neural Models from Many Casual Videos**
1193 | 
1194 | - Homepage: https://banmo-www.github.io/
1195 | - Paper: https://arxiv.org/abs/2112.12761
1196 | - Code: https://github.com/facebookresearch/banmo
1197 | - 中文解读：https://mp.weixin.qq.com/s/NMHP8-xWwrX40vpGx55Qew
1198 | 
1199 | <a name="ReID"></a>
1200 | 
1201 | # 行人重识别(Person Re-identification)
1202 | 
1203 | **NFormer: Robust Person Re-identification with Neighbor Transformer**
1204 | 
1205 | - Paper: https://arxiv.org/abs/2204.09331
1206 | - Code: https://github.com/haochenheheda/NFormer
1207 | 
1208 | <a name="COD"></a>
1209 | 
1210 | # 伪装物体检测(Camouflaged Object Detection)
1211 | 
1212 | **Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection**
1213 | 
1214 | - Paper: https://arxiv.org/abs/2203.02688
1215 | - Code: https://github.com/lartpang/ZoomNet
1216 | 
1217 | <a name="Depth-Estimation"></a>
1218 | 
1219 | # 深度估计(Depth Estimation)
1220 | 
1221 | ## 单目深度估计
1222 | 
1223 | **NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation**
1224 | 
1225 | - Paper: https://arxiv.org/abs/2203.01502
1226 | - Code: None
1227 | 
1228 | **OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion**
1229 | 
1230 | - Paper: https://arxiv.org/abs/2203.00838
1231 | - Code: None
1232 | 
1233 | **Toward Practical Self-Supervised Monocular Indoor Depth Estimation**
1234 | 
1235 | - Paper: https://arxiv.org/abs/2112.02306
1236 | - Code: None
1237 | 
1238 | **P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior**
1239 | 
1240 | - Paper: https://arxiv.org/abs/2204.02091
1241 | - Code: https://github.com/SysCV/P3Depth
1242 | 
1243 | **Multi-Frame Self-Supervised Depth with Transformers**
1244 | 
1245 | - Homepage: https://sites.google.com/tri.global/depthformer
1246 | 
1247 | - Paper: https://arxiv.org/abs/2204.07616
1248 | - Code: None
1249 | 
1250 | <a name="Stereo-Matching"></a>
1251 | 
1252 | # 立体匹配(Stereo Matching)
1253 | 
1254 | **ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching**
1255 | 
1256 | - Paper: https://arxiv.org/abs/2203.02146
1257 | - Code: https://github.com/gangweiX/ACVNet
1258 | 
1259 | <a name="FM"></a>
1260 | 
1261 | # 特征匹配(Feature Matching)
1262 | 
1263 | **ClusterGNN: Cluster-based Coarse-to-Fine Graph Neural Network for Efficient Feature Matching**
1264 | 
1265 | - Paper: https://arxiv.org/abs/2204.11700
1266 | - Code: None
1267 | 
1268 | <a name="Lane-Detection"></a>
1269 | 
1270 | # 车道线检测(Lane Detection)
1271 | 
1272 | **Rethinking Efficient Lane Detection via Curve Modeling**
1273 | 
1274 | - Paper: https://arxiv.org/abs/2203.02431
1275 | - Code: https://github.com/voldemortX/pytorch-auto-drive
1276 | - Demo：https://user-images.githubusercontent.com/32259501/148680744-a18793cd-f437-461f-8c3a-b909c9931709.mp4
1277 | 
1278 | **A Keypoint-based Global Association Network for Lane Detection**
1279 | 
1280 | - Paper: https://arxiv.org/abs/2204.07335
1281 | - Code: https://github.com/Wolfwjs/GANet
1282 | 
1283 | <a name="Optical-Flow-Estimation"></a>
1284 | 
1285 | # 光流估计(Optical Flow Estimation)
1286 | 
1287 | **Imposing Consistency for Optical Flow Estimation**
1288 | 
1289 | - Paper: https://arxiv.org/abs/2204.07262
1290 | - Code: None
1291 | 
1292 | **Deep Equilibrium Optical Flow Estimation**
1293 | 
1294 | - Paper: https://arxiv.org/abs/2204.08442
1295 | - Code: https://github.com/locuslab/deq-flow
1296 | 
1297 | **GMFlow: Learning Optical Flow via Global Matching**
1298 | 
1299 | - Paper(Oral): https://arxiv.org/abs/2111.13680
1300 | - Code: https://github.com/haofeixu/gmflow
1301 | 
1302 | <a name="Image-Inpainting"></a>
1303 | 
1304 | # 图像修复(Image Inpainting)
1305 | 
1306 | **Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding**
1307 | 
1308 | - Paper: https://arxiv.org/abs/2203.00867
1309 | 
1310 | - Code: https://github.com/DQiaole/ZITS_inpainting
1311 | 
1312 | <a name="Image-Retrieval"></a>
1313 | 
1314 | # 图像检索(Image Retrieval)
1315 | 
1316 | **Correlation Verification for Image Retrieval**
1317 | 
1318 | - Paper(Oral): https://arxiv.org/abs/2204.01458
1319 | - Code: https://github.com/sungonce/CVNet
1320 | 
1321 | <a name="Face-Recognition"></a>
1322 | 
1323 | # 人脸识别(Face Recognition)
1324 | 
1325 | **AdaFace: Quality Adaptive Margin for Face Recognition**
1326 | 
1327 | - Paper(Oral): https://arxiv.org/abs/2204.00964 
1328 | - Code: https://github.com/mk-minchul/AdaFace
1329 | 
1330 | <a name="Crowd-Counting"></a>
1331 | 
1332 | # 人群计数(Crowd Counting)
1333 | 
1334 | **Leveraging Self-Supervision for Cross-Domain Crowd Counting**
1335 | 
1336 | - Paper: https://arxiv.org/abs/2103.16291
1337 | - Code: None
1338 | 
1339 | <a name="Medical-Image"></a>
1340 | 
1341 | # 医学图像(Medical Image)
1342 | 
1343 | **BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation**
1344 | 
1345 | - Paper: https://arxiv.org/abs/2203.02533
1346 | - Code: None
1347 | 
1348 | **Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification**
1349 | 
1350 | - Paper: https://arxiv.org/abs/2111.12918
1351 | - Code: https://github.com/FBLADL/ACPL
1352 | 
1353 | **DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis**
1354 | 
1355 | - Paper: https://arxiv.org/abs/2204.10437
1356 | 
1357 | - Code: https://github.com/JLiangLab/DiRA
1358 | 
1359 | <a name="Video Generation"></a>
1360 | 
1361 | # 视频生成(Video Generation)
1362 | 
1363 | **StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2**
1364 | 
1365 | - Homepage: https://universome.github.io/stylegan-v
1366 | - Paper: https://arxiv.org/abs/2112.14683
1367 | 
1368 | - Code: https://github.com/universome/stylegan-v
1369 | 
1370 | - Demo: https://kaust-cair.s3.amazonaws.com/stylegan-v/stylegan-v.mp4
1371 | 
1372 | <a name="Scene-Graph-Generation"></a>
1373 | 
1374 | # 场景图生成(Scene Graph Generation)
1375 | 
1376 |  **SGTR: End-to-end Scene Graph Generation with Transformer**
1377 | 
1378 | - Paper: https://arxiv.org/abs/2112.12970
1379 | - Code: None
1380 | 
1381 | <a name="R-VOS"></a>
1382 | 
1383 | # 参考视频目标分割(Referring Video Object Segmentation)
1384 | 
1385 | **Language as Queries for Referring Video Object Segmentation**
1386 | 
1387 | - Paper: https://arxiv.org/abs/2201.00487
1388 | - Code:  https://github.com/wjn922/ReferFormer
1389 | 
1390 | **ReSTR: Convolution-free Referring Image Segmentation Using Transformers**
1391 | 
1392 | - Paper: https://arxiv.org/abs/2203.16768
1393 | - Code: None
1394 | 
1395 | <a name="GR"></a>
1396 | 
1397 | # 步态识别(Gait Recognition)
1398 | 
1399 | **Gait Recognition in the Wild with Dense 3D Representations and A Benchmark**
1400 | 
1401 | - Homepage: https://gait3d.github.io/
1402 | - Paper: https://arxiv.org/abs/2204.02569
1403 | - Code: https://github.com/Gait3D/Gait3D-Benchmark
1404 | 
1405 | <a name="ST"></a>
1406 | 
1407 | # 风格迁移(Style Transfer)
1408 | 
1409 | **StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions**
1410 | 
1411 | - Homepage: https://lukashoel.github.io/stylemesh/
1412 | - Paper: https://arxiv.org/abs/2112.01530
1413 | 
1414 | - Code: https://github.com/lukasHoel/stylemesh
1415 | - Demo：https://www.youtube.com/watch?v=ZqgiTLcNcks
1416 | 
1417 | <a name="AD"></a>
1418 | 
1419 | # 异常检测(Anomaly Detection)
1420 | 
1421 | **UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection**
1422 | 
1423 | - Paper: https://arxiv.org/abs/2111.08644
1424 | 
1425 | - Dataset: https://github.com/lilygeorgescu/UBnormal
1426 | 
1427 | **Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection**
1428 | 
1429 | - Paper(Oral): https://arxiv.org/abs/2111.09099
1430 | - Code: https://github.com/ristea/sspcab
1431 | 
1432 | 对抗样本)<a name="AE"></a>
1433 | 
1434 | # 对抗样本(Adversarial Examples)
1435 | 
1436 | **Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon**
1437 | 
1438 | - Paper: https://arxiv.org/abs/2203.03818
1439 | - Code: https://github.com/hncszyq/ShadowAttack
1440 | 
1441 | **LAS-AT: Adversarial Training with Learnable Attack Strategy**
1442 | 
1443 | - Paper(Oral): https://arxiv.org/abs/2203.06616
1444 | - Code: https://github.com/jiaxiaojunQAQ/LAS-AT
1445 | 
1446 | **Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection**
1447 | 
1448 | - Paper: https://arxiv.org/abs/2112.04532
1449 | - Code: https://github.com/joellliu/SegmentAndComplete
1450 | 
1451 | <a name="WSOL"></a>
1452 | 
1453 | # 弱监督物体检测(Weakly Supervised Object Localization)
1454 | 
1455 | **Weakly Supervised Object Localization as Domain Adaption**
1456 | 
1457 | - Paper: https://arxiv.org/abs/2203.01714
1458 | - Code: https://github.com/zh460045050/DA-WSOL_CVPR2022
1459 | 
1460 | <a name="ROD"></a>
1461 | 
1462 | # 雷达目标检测(Radar Object Detection)
1463 | 
1464 | **Exploiting Temporal Relations on Radar Perception for Autonomous Driving**
1465 | 
1466 | - Paper: https://arxiv.org/abs/2204.01184
1467 | - Code: None
1468 | 
1469 | <a name="HSI"></a>
1470 | 
1471 | # 高光谱图像重建(Hyperspectral Image Reconstruction)
1472 | 
1473 | **Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction**
1474 | 
1475 | - Paper: https://arxiv.org/abs/2111.07910
1476 | - Code: https://github.com/caiyuanhao1998/MST
1477 | 
1478 | <a name="Image-Stitching"></a>
1479 | 
1480 | # 图像拼接(Image Stitching)
1481 | 
1482 | **Deep Rectangling for Image Stitching: A Learning Baseline**
1483 | 
1484 | - Paper(Oral): https://arxiv.org/abs/2203.03831
1485 | 
1486 | - Code: https://github.com/nie-lang/DeepRectangling
1487 | - Dataset: https://github.com/nie-lang/DeepRectangling
1488 | - 中文解读：https://mp.weixin.qq.com/s/lp5AnrtO_9urp-Fv6Z0l2Q
1489 | 
1490 | <a name="Watermarking"></a>
1491 | 
1492 | # 水印(Watermarking)
1493 | 
1494 | **Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings**
1495 | 
1496 | - Paper: https://arxiv.org/abs/2104.13450
1497 | - Code: None
1498 | 
1499 | <a name="AC"></a>
1500 | 
1501 | # Action Counting
1502 | 
1503 | **TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting**
1504 | 
1505 | - Paper(Oral): https://arxiv.org/abs/2204.01018
1506 | - Dataset: https://svip-lab.github.io/dataset/RepCount_dataset.html
1507 | - Code: https://github.com/SvipRepetitionCounting/TransRAC
1508 | 
1509 | <a name="GSR"></a>
1510 | 
1511 | # Grounded Situation Recognition
1512 | 
1513 | **Collaborative Transformers for Grounded Situation Recognition**
1514 | 
1515 | - Paper: https://arxiv.org/abs/2203.16518
1516 | - Code: https://github.com/jhcho99/CoFormer
1517 | 
1518 | <a name="ZSL"></a>
1519 | 
1520 | # Zero-shot Learning
1521 | 
1522 | **Unseen Classes at a Later Time? No Problem**
1523 | 
1524 | - Paper: https://arxiv.org/abs/2203.16517
1525 | - Code: https://github.com/sumitramalagi/Unseen-classes-at-a-later-time
1526 | 
1527 | <a name="DeepFakes"></a>
1528 | 
1529 | # DeepFakes
1530 | 
1531 | **Detecting Deepfakes with Self-Blended Images**
1532 | 
1533 | - Paper(Oral): https://arxiv.org/abs/2204.08376
1534 | 
1535 | - Code: https://github.com/mapooon/SelfBlendedImages
1536 | 
1537 | <a name="Datasets"></a>
1538 | 
1539 | # 数据集(Datasets)
1540 | 
1541 | **It's About Time: Analog Clock Reading in the Wild**
1542 | 
1543 | - Homepage: https://charigyang.github.io/abouttime/
1544 | - Paper: https://arxiv.org/abs/2111.09162
1545 | - Code: https://github.com/charigyang/itsabouttime
1546 | - Demo: https://youtu.be/cbiMACA6dRc
1547 | 
1548 | **Toward Practical Self-Supervised Monocular Indoor Depth Estimation**
1549 | 
1550 | - Paper: https://arxiv.org/abs/2112.02306
1551 | - Code: None
1552 | 
1553 | **Kubric: A scalable dataset generator**
1554 | 
1555 | - Paper: https://arxiv.org/abs/2203.03570
1556 | - Code: https://github.com/google-research/kubric
1557 | - 中文解读：https://mp.weixin.qq.com/s/mJ8HzY6C0GifxsErJIS3Mg
1558 | 
1559 | **Scribble-Supervised LiDAR Semantic Segmentation**
1560 | 
1561 | - Paper: https://arxiv.org/abs/2203.08537
1562 | - Dataset: https://github.com/ouenal/scribblekitti
1563 | 
1564 | **Deep Rectangling for Image Stitching: A Learning Baseline**
1565 | 
1566 | - Paper(Oral): https://arxiv.org/abs/2203.03831
1567 | - Code: https://github.com/nie-lang/DeepRectangling
1568 | - Dataset: https://github.com/nie-lang/DeepRectangling
1569 | - 中文解读：https://mp.weixin.qq.com/s/lp5AnrtO_9urp-Fv6Z0l2Q
1570 | 
1571 | **ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer**
1572 | 
1573 | - Homepage: https://ai.stanford.edu/~rhgao/objectfolder2.0/
1574 | - Paper: https://arxiv.org/abs/2204.02389
1575 | - Dataset: https://github.com/rhgao/ObjectFolder
1576 | - Demo：https://youtu.be/e5aToT3LkRA
1577 | 
1578 | **Shape from Polarization for Complex Scenes in the Wild**
1579 | 
1580 | - Homepage: https://chenyanglei.github.io/sfpwild/index.html
1581 | - Paper: https://arxiv.org/abs/2112.11377
1582 | - Code: https://github.com/ChenyangLEI/sfp-wild
1583 | 
1584 | **Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline**
1585 | 
1586 | - Homepage: https://zhang-pengyu.github.io/DUT-VTUAV/
1587 | - Paper: https://arxiv.org/abs/2204.04120
1588 | 
1589 | **TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting**
1590 | 
1591 | - Paper(Oral): https://arxiv.org/abs/2204.01018
1592 | - Dataset: https://svip-lab.github.io/dataset/RepCount_dataset.html
1593 | - Code: https://github.com/SvipRepetitionCounting/TransRAC
1594 | 
1595 | **FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment**
1596 | 
1597 | - Paper(Oral): https://arxiv.org/abs/2204.03646
1598 | - Dataset: https://github.com/xujinglin/FineDiving
1599 | - Code: https://github.com/xujinglin/FineDiving
1600 | - 中文解读：https://mp.weixin.qq.com/s/8t12Y34eMNwvJr8PeryWXg
1601 | 
1602 | **Aesthetic Text Logo Synthesis via Content-aware Layout Inferring**
1603 | 
1604 | - Paper: https://arxiv.org/abs/2204.02701
1605 | - Dataset: https://github.com/yizhiwang96/TextLogoLayout
1606 | - Code: https://github.com/yizhiwang96/TextLogoLayout
1607 | 
1608 | **DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection**
1609 | 
1610 | - Homepage: https://thudair.baai.ac.cn/index
1611 | - Paper: https://arxiv.org/abs/2204.05575
1612 | - Code: https://github.com/AIR-THU/DAIR-V2X
1613 | 
1614 | **A New Dataset and Transformer for Stereoscopic Video Super-Resolution**
1615 | 
1616 | - Paper: https://arxiv.org/abs/2204.10039
1617 | - Code: https://github.com/H-deep/Trans-SVSR/
1618 | - Dataset: http://shorturl.at/mpwGX
1619 | 
1620 | **Putting People in their Place: Monocular Regression of 3D People in Depth**
1621 | 
1622 | - Homepage: https://arthur151.github.io/BEV/BEV.html
1623 | - Paper: https://arxiv.org/abs/2112.08274
1624 | 
1625 | - Code:https://github.com/Arthur151/ROMP
1626 | - Dataset: https://github.com/Arthur151/Relative_Human
1627 | 
1628 | **UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection**
1629 | 
1630 | - Paper: https://arxiv.org/abs/2111.08644
1631 | - Dataset: https://github.com/lilygeorgescu/UBnormal
1632 | 
1633 | **DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion**
1634 | 
1635 | - Homepage: https://dancetrack.github.io
1636 | - Paper: https://arxiv.org/abs/2111.14690
1637 | - Dataset: https://github.com/DanceTrack/DanceTrack
1638 | 
1639 | **Visual Abductive Reasoning**
1640 | 
1641 | - Paper: https://arxiv.org/abs/2203.14040
1642 | - Code: https://github.com/leonnnop/VAR
1643 | 
1644 | **Large-scale Video Panoptic Segmentation in the Wild: A Benchmark**
1645 | 
1646 | - Paper: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset/blob/main/VIPSeg2022.pdf
1647 | - Code: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset
1648 | - Dataset: https://github.com/VIPSeg-Dataset/VIPSeg-Dataset
1649 | 
1650 | **Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions**
1651 | 
1652 | - Homepage: https://ithaca365.mae.cornell.edu/
1653 | 
1654 | - Paper: https://arxiv.org/abs/2208.01166
1655 | 
1656 | <a name="New-Tasks"></a>
1657 | 
1658 | # 新任务(New Task)
1659 | 
1660 | **Language-based Video Editing via Multi-Modal Multi-Level Transformer**
1661 | 
1662 | - Paper: https://arxiv.org/abs/2104.01122
1663 | - Code: None
1664 | 
1665 | **It's About Time: Analog Clock Reading in the Wild**
1666 | 
1667 | - Homepage: https://charigyang.github.io/abouttime/
1668 | - Paper: https://arxiv.org/abs/2111.09162
1669 | - Code: https://github.com/charigyang/itsabouttime
1670 | - Demo: https://youtu.be/cbiMACA6dRc
1671 | 
1672 | **Splicing ViT Features for Semantic Appearance Transfer**
1673 | 
1674 | - Homepage: https://splice-vit.github.io/
1675 | - Paper: https://arxiv.org/abs/2201.00424
1676 | - Code: https://github.com/omerbt/Splice
1677 | 
1678 | **Visual Abductive Reasoning**
1679 | 
1680 | - Paper: https://arxiv.org/abs/2203.14040
1681 | - Code: https://github.com/leonnnop/VAR
1682 | 
1683 | <a name="Others"></a>
1684 | 
1685 | # 其他(Others)
1686 | 
1687 | **Kubric: A scalable dataset generator**
1688 | 
1689 | - Paper: https://arxiv.org/abs/2203.03570
1690 | - Code: https://github.com/google-research/kubric
1691 | - 中文解读：https://mp.weixin.qq.com/s/mJ8HzY6C0GifxsErJIS3Mg
1692 | 
1693 | **X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning**
1694 | 
1695 | - Paper: https://arxiv.org/abs/2203.00843
1696 | - Code: https://github.com/CurryYuan/X-Trans2Cap
1697 | 
1698 | **Balanced MSE for Imbalanced Visual Regression**
1699 | 
1700 | - Paper(Oral): https://arxiv.org/abs/2203.16427
1701 | - Code: https://github.com/jiawei-ren/BalancedMSE
1702 | 
1703 | **SNUG: Self-Supervised Neural Dynamic Garments**
1704 | 
1705 | - Homepage: http://mslab.es/projects/SNUG/
1706 | - Paper(Oral): https://arxiv.org/abs/2204.02219
1707 | - Code: https://github.com/isantesteban/snug
1708 | 
1709 | **Shape from Polarization for Complex Scenes in the Wild**
1710 | 
1711 | - Homepage: https://chenyanglei.github.io/sfpwild/index.html
1712 | - Paper: https://arxiv.org/abs/2112.11377
1713 | - Code: https://github.com/ChenyangLEI/sfp-wild
1714 | 
1715 | **LASER: LAtent SpacE Rendering for 2D Visual Localization**
1716 | 
1717 | - Paper(Oral): https://arxiv.org/abs/2204.00157
1718 | - Code: None
1719 | 
1720 | **Single-Photon Structured Light**
1721 | 
1722 | - Paper(Oral): https://arxiv.org/abs/2204.05300
1723 | - Code: None
1724 | 
1725 | **3DeformRS: Certifying Spatial Deformations on Point Clouds**
1726 | 
1727 | - Paper: https://arxiv.org/abs/2204.05687
1728 | - Code: None
1729 | 
1730 | **Aesthetic Text Logo Synthesis via Content-aware Layout Inferring**
1731 | 
1732 | - Paper: https://arxiv.org/abs/2204.02701
1733 | - Dataset: https://github.com/yizhiwang96/TextLogoLayout
1734 | - Code: https://github.com/yizhiwang96/TextLogoLayout
1735 | 
1736 | **Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes**
1737 | 
1738 | - Paper: https://arxiv.org/abs/2203.13412
1739 | - Code: https://github.com/zjsong/SSPL
1740 | 
1741 | **Robust and Accurate Superquadric Recovery: a Probabilistic Approach**
1742 | 
1743 | - Paper(Oral): https://arxiv.org/abs/2111.14517
1744 | - Code: https://github.com/bmlklwx/EMS-superquadric_fitting
1745 | 
1746 | **Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence**
1747 | 
1748 | - Paper: https://arxiv.org/abs/2203.00911
1749 | - Code: None
1750 | 
1751 | **Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer**
1752 | 
1753 | - Paper(Oral): https://arxiv.org/abs/2204.08680
1754 | - Code: https://github.com/zengwang430521/TCFormer
1755 | 
1756 | **DeepDPM: Deep Clustering With an Unknown Number of Clusters**
1757 | 
1758 | - Paper: https://arxiv.org/abs/2203.14309
1759 | - Code: https://github.com/BGU-CS-VIL/DeepDPM
1760 | 
1761 | **ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic**
1762 | 
1763 | - Paper: https://arxiv.org/abs/2111.14447
1764 | - Code: https://github.com/YoadTew/zero-shot-image-to-text
1765 | 
1766 | **Proto2Proto: Can you recognize the car, the way I do?**
1767 | 
1768 | - Paper: https://arxiv.org/abs/2204.11830
1769 | - Code: https://github.com/archmaester/proto2proto
1770 | 
1771 | **Putting People in their Place: Monocular Regression of 3D People in Depth**
1772 | 
1773 | - Homepage: https://arthur151.github.io/BEV/BEV.html
1774 | - Paper: https://arxiv.org/abs/2112.08274
1775 | - Code:https://github.com/Arthur151/ROMP
1776 | - Dataset: https://github.com/Arthur151/Relative_Human
1777 | 
1778 | **Light Field Neural Rendering**
1779 | 
1780 | - Homepage: https://light-field-neural-rendering.github.io/
1781 | - Paper(Oral): https://arxiv.org/abs/2112.09687
1782 | - Code: https://github.com/google-research/google-research/tree/master/light_field_neural_rendering
1783 | 
1784 | **Neural Texture Extraction and Distribution for Controllable Person Image Synthesis**
1785 | 
1786 | - Paper: https://arxiv.org/abs/2204.06160
1787 | - Code: https://github.com/RenYurui/Neural-Texture-Extraction-Distribution
1788 | 
1789 | **Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning**
1790 | 
1791 | - Paper: https://arxiv.org/abs/2203.14333
1792 | - Code: https://github.com/0liliulei/LIIR  


--------------------------------------------------------------------------------
/CVPR2023-Papers-with-Code.md:
--------------------------------------------------------------------------------
   1 | # CVPR 2023 论文和开源项目合集(Papers with Code)
   2 | 
   3 | [CVPR 2023](https://openaccess.thecvf.com/CVPR2023?day=all) 论文和开源项目合集(papers with code)！
   4 | 
   5 | **25.78% = 2360 / 9155**
   6 | 
   7 | CVPR 2023 decisions are now available on OpenReview! This year, wereceived a record number of **9155** submissions (a 12% increase over CVPR 2022), and accepted **2360** papers, for a 25.78% acceptance rate.
   8 | 
   9 | 
  10 | > 注1：欢迎各位大佬提交issue，分享CVPR 2023论文和开源项目！
  11 | >
  12 | > 注2：关于往年CV顶会论文以及其他优质CV论文和大盘点，详见： https://github.com/amusi/daily-paper-computer-vision
  13 | >
  14 | > - [CVPR 2019](CVPR2019-Papers-with-Code.md)
  15 | > - [CVPR 2020](CVPR2020-Papers-with-Code.md)
  16 | > - [CVPR 2021](CVPR2021-Papers-with-Code.md)
  17 | > - [CVPR 2022](CVPR2022-Papers-with-Code.md)
  18 | 
  19 | 如果你想了解最新最优质的的CV论文、开源项目和学习资料，欢迎扫码加入【CVer学术交流群】！互相学习，一起进步~ 
  20 | 
  21 | ![](CVer学术交流群.png)
  22 | 
  23 | # 【CVPR 2023 论文开源目录】
  24 | 
  25 | - [Backbone](#Backbone)
  26 | - [CLIP](#CLIP)
  27 | - [MAE](#MAE)
  28 | - [GAN](#GAN)
  29 | - [GNN](#GNN)
  30 | - [MLP](#MLP)
  31 | - [NAS](#NAS)
  32 | - [OCR](#OCR)
  33 | - [NeRF](#NeRF)
  34 | - [DETR](#DETR)
  35 | - [Prompt](#Prompt)
  36 | - [Diffusion Models(扩散模型)](#Diffusion)
  37 | - [Avatars](#Avatars)
  38 | - [ReID(重识别)](#ReID)
  39 | - [长尾分布(Long-Tail)](#Long-Tail)
  40 | - [Vision Transformer](#Vision-Transformer)
  41 | - [视觉和语言(Vision-Language)](#VL)
  42 | - [自监督学习(Self-supervised Learning)](#SSL)
  43 | - [数据增强(Data Augmentation)](#DA)
  44 | - [目标检测(Object Detection)](#Object-Detection)
  45 | - [目标跟踪(Visual Tracking)](#VT)
  46 | - [语义分割(Semantic Segmentation)](#Semantic-Segmentation)
  47 | - [实例分割(Instance Segmentation)](#Instance-Segmentation)
  48 | - [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation)
  49 | - [医学图像分割(Medical Image Segmentation)](#MIS)
  50 | - [视频目标分割(Video Object Segmentation)](#VOS)
  51 | - [视频实例分割(Video Instance Segmentation)](#VIS)
  52 | - [参考图像分割(Referring Image Segmentation)](#RIS)
  53 | - [图像抠图(Image Matting)](#Matting)
  54 | - [图像编辑(Image Editing)](#Image-Editing)
  55 | - [Low-level Vision](#LLV)
  56 | - [超分辨率(Super-Resolution)](#SR)
  57 | - [去噪(Denoising)](#Denoising)
  58 | - [去模糊(Deblur)](#Deblur)
  59 | - [3D点云(3D Point Cloud)](#3D-Point-Cloud)
  60 | - [3D目标检测(3D Object Detection)](#3DOD)
  61 | - [3D语义分割(3D Semantic Segmentation)](#3DSS)
  62 | - [3D目标跟踪(3D Object Tracking)](#3D-Object-Tracking)
  63 | - [3D语义场景补全(3D Semantic Scene Completion)](#3DSSC)
  64 | - [3D配准(3D Registration)](#3D-Registration)
  65 | - [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation)
  66 | - [3D人体Mesh估计(3D Human Mesh Estimation)](#3D-Human-Pose-Estimation)
  67 | - [医学图像(Medical Image)](#Medical-Image)
  68 | - [图像生成(Image Generation)](#Image-Generation)
  69 | - [视频生成(Video Generation)](#Video-Generation)
  70 | - [视频理解(Video Understanding)](#Video-Understanding)
  71 | - [行为检测(Action Detection)](#Action-Detection)
  72 | - [文本检测(Text Detection)](#Text-Detection)
  73 | - [知识蒸馏(Knowledge Distillation)](#KD)
  74 | - [模型剪枝(Model Pruning)](#Pruning)
  75 | - [图像压缩(Image Compression)](#IC)
  76 | - [异常检测(Anomaly Detection)](#AD)
  77 | - [三维重建(3D Reconstruction)](#3D-Reconstruction)
  78 | - [深度估计(Depth Estimation)](#Depth-Estimation)
  79 | - [轨迹预测(Trajectory Prediction)](#TP)
  80 | - [车道线检测(Lane Detection)](#Lane-Detection)
  81 | - [图像描述(Image Captioning)](#Image-Captioning)
  82 | - [视觉问答(Visual Question Answering)](#VQA)
  83 | - [手语识别(Sign Language Recognition)](#SLR)
  84 | - [视频预测(Video Prediction)](#Video-Prediction)
  85 | - [新视点合成(Novel View Synthesis)](#NVS)
  86 | - [Zero-Shot Learning(零样本学习)](#ZSL)
  87 | - [立体匹配(Stereo Matching)](#Stereo-Matching)
  88 | - [特征匹配(Feature Matching)](#Feature-Matching)
  89 | - [场景图生成(Scene Graph Generation)](#SGG)
  90 | - [隐式神经表示(Implicit Neural Representations)](#INR)
  91 | - [图像质量评价(Image Quality Assessment)](#IQA)
  92 | - [数据集(Datasets)](#Datasets)
  93 | - [新任务(New Tasks)](#New-Tasks)
  94 | - [其他(Others)](#Others)
  95 | 
  96 | <a name="Backbone"></a>
  97 | 
  98 | # Backbone
  99 | 
 100 | **Integrally Pre-Trained Transformer Pyramid Networks** 
 101 | 
 102 | - Paper: https://arxiv.org/abs/2211.12735
 103 | - Code: https://github.com/sunsmarterjie/iTPN
 104 | 
 105 | **Stitchable Neural Networks**
 106 | 
 107 | - Homepage: https://snnet.github.io/
 108 | - Paper: https://arxiv.org/abs/2302.06586
 109 | - Code: https://github.com/ziplab/SN-Net
 110 | 
 111 | **Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks**
 112 | 
 113 | - Paper: https://arxiv.org/abs/2303.03667
 114 | - Code: https://github.com/JierunChen/FasterNet 
 115 | 
 116 | **BiFormer: Vision Transformer with Bi-Level Routing Attention**
 117 | 
 118 | - Paper: None
 119 | - Code: https://github.com/rayleizhu/BiFormer 
 120 | 
 121 | **DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network**
 122 | 
 123 | - Paper: https://arxiv.org/abs/2303.02165
 124 | - Code: https://github.com/alibaba/lightweight-neural-architecture-search 
 125 | 
 126 | **Vision Transformer with Super Token Sampling**
 127 | 
 128 | - Paper: https://arxiv.org/abs/2211.11167
 129 | - Code: https://github.com/hhb072/SViT
 130 | 
 131 | **Hard Patches Mining for Masked Image Modeling**
 132 | 
 133 | - Paper: None
 134 | - Code: None
 135 | 
 136 | **SMPConv: Self-moving Point Representations for Continuous Convolution**
 137 | 
 138 | - Paper: https://arxiv.org/abs/2304.02330
 139 | - Code: https://github.com/sangnekim/SMPConv
 140 | 
 141 | **Making Vision Transformers Efficient from A Token Sparsification View**
 142 | 
 143 | - Paper: https://arxiv.org/abs/2303.08685
 144 | - Code: https://github.com/changsn/STViT-R 
 145 | 
 146 | <a name="CLIP"></a>
 147 | 
 148 | # CLIP
 149 | 
 150 | **GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis**
 151 | 
 152 | - Paper: https://arxiv.org/abs/2301.12959
 153 | - Code: https://github.com/tobran/GALIP
 154 | 
 155 | **DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation**
 156 | 
 157 | - Paper: https://arxiv.org/abs/2303.06285
 158 | - Code: https://github.com/Yueming6568/DeltaEdit 
 159 | 
 160 | <a name="MAE"></a>
 161 | 
 162 | # MAE
 163 | 
 164 | **Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders** 
 165 | 
 166 | - Paper: https://arxiv.org/abs/2212.06785
 167 | - Code: https://github.com/ZrrSkywalker/I2P-MAE
 168 | 
 169 | **Generic-to-Specific Distillation of Masked Autoencoders**
 170 | 
 171 | - Paper: https://arxiv.org/abs/2302.14771
 172 | - Code: https://github.com/pengzhiliang/G2SD
 173 | 
 174 | <a name="GAN"></a>
 175 | 
 176 | # GAN
 177 | 
 178 | **DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation**
 179 | 
 180 | - Paper: https://arxiv.org/abs/2303.06285
 181 | - Code: https://github.com/Yueming6568/DeltaEdit 
 182 | 
 183 | <a name="NeRF"></a>
 184 | 
 185 | # NeRF
 186 | 
 187 | **NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior**
 188 | 
 189 | - Home: https://nope-nerf.active.vision/
 190 | - Paper: https://arxiv.org/abs/2212.07388
 191 | - Code: None
 192 | 
 193 | **Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures**
 194 | 
 195 | - Paper: https://arxiv.org/abs/2211.07600
 196 | - Code: https://github.com/eladrich/latent-nerf
 197 | 
 198 | **NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis**
 199 | 
 200 | - Paper: https://arxiv.org/abs/2301.08556
 201 | - Code: None
 202 | 
 203 | **Panoptic Lifting for 3D Scene Understanding with Neural Fields**
 204 | 
 205 | - Homepage: https://nihalsid.github.io/panoptic-lifting/
 206 | - Paper: https://arxiv.org/abs/2212.09802
 207 | - Code: None
 208 | 
 209 | **NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer**
 210 | 
 211 | - Homepage: https://redrock303.github.io/nerflix/
 212 | - Paper: https://arxiv.org/abs/2303.06919 
 213 | - Code: None
 214 | 
 215 | **HNeRV: A Hybrid Neural Representation for Videos**
 216 | 
 217 | - Homepage: https://haochen-rye.github.io/HNeRV
 218 | - Paper: https://arxiv.org/abs/2304.02633
 219 | - Code: https://github.com/haochen-rye/HNeRV
 220 | 
 221 | <a name="DETR"></a>
 222 | 
 223 | # DETR
 224 | 
 225 | **DETRs with Hybrid Matching**
 226 | 
 227 | - Paper: https://arxiv.org/abs/2207.13080
 228 | - Code: https://github.com/HDETR
 229 | 
 230 | <a name="Prompt"></a>
 231 | 
 232 | # Prompt
 233 | 
 234 | **Diversity-Aware Meta Visual Prompting**
 235 | 
 236 | - Paper: https://arxiv.org/abs/2303.08138
 237 | - Code: https://github.com/shikiw/DAM-VP 
 238 | 
 239 | <a name="NAS"></a>
 240 | 
 241 | # NAS
 242 | 
 243 | **PA&DA: Jointly Sampling PAth and DAta for Consistent NAS**
 244 | 
 245 | - Paper: https://arxiv.org/abs/2302.14772
 246 | - Code: https://github.com/ShunLu91/PA-DA
 247 | 
 248 | <a name="Avatars"></a>
 249 | 
 250 | # Avatars
 251 | 
 252 | **Structured 3D Features for Reconstructing Relightable and Animatable Avatars**
 253 | 
 254 | - Homepage: https://enriccorona.github.io/s3f/
 255 | - Paper: https://arxiv.org/abs/2212.06820
 256 | - Code: None
 257 | - Demo: https://www.youtube.com/watch?v=mcZGcQ6L-2s
 258 | 
 259 | **Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos**
 260 | 
 261 | - Homepage: https://augmentedperception.github.io/monoavatar/
 262 | - Paper: https://arxiv.org/abs/2304.01436
 263 | 
 264 | <a name="ReID"></a>
 265 | 
 266 | # ReID(重识别)
 267 | 
 268 | **Clothing-Change Feature Augmentation for Person Re-Identification**
 269 | 
 270 | - Paper: None
 271 | - Code: None
 272 | 
 273 | **MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID**
 274 | 
 275 | - Paper: https://arxiv.org/abs/2303.07065
 276 | - Code: https://github.com/vimar-gu/MSINet
 277 | 
 278 | **Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification**
 279 | 
 280 | - Paper: https://arxiv.org/abs/2304.04205
 281 | - Code: None
 282 | 
 283 | **Large-scale Training Data Search for Object Re-identification**
 284 | 
 285 | - Paper: https://arxiv.org/abs/2303.16186
 286 | - Code: https://github.com/yorkeyao/SnP 
 287 | 
 288 | <a name="Diffusion"></a>
 289 | 
 290 | # Diffusion Models(扩散模型)
 291 | 
 292 | **Video Probabilistic Diffusion Models in Projected Latent Space** 
 293 | 
 294 | - Homepage: https://sihyun.me/PVDM/
 295 | - Paper: https://arxiv.org/abs/2302.07685
 296 | - Code: https://github.com/sihyun-yu/PVDM
 297 | 
 298 | **Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models**
 299 | 
 300 | - Paper: https://arxiv.org/abs/2211.10655
 301 | - Code: None
 302 | 
 303 | **Imagic: Text-Based Real Image Editing with Diffusion Models**
 304 | 
 305 | - Homepage: https://imagic-editing.github.io/
 306 | - Paper: https://arxiv.org/abs/2210.09276
 307 | - Code: None
 308 | 
 309 | **Parallel Diffusion Models of Operator and Image for Blind Inverse Problems**
 310 | 
 311 | - Paper: https://arxiv.org/abs/2211.10656
 312 | - Code: None
 313 | 
 314 | **DiffRF: Rendering-guided 3D Radiance Field Diffusion**
 315 | 
 316 | - Homepage: https://sirwyver.github.io/DiffRF/
 317 | - Paper: https://arxiv.org/abs/2212.01206
 318 | - Code: None
 319 | 
 320 | **MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation**
 321 | 
 322 | - Paper: https://arxiv.org/abs/2212.09478
 323 | - Code: https://github.com/researchmm/MM-Diffusion
 324 | 
 325 | **HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising**
 326 | 
 327 | - Homepage: https://aminshabani.github.io/housediffusion/
 328 | - Paper: https://arxiv.org/abs/2211.13287
 329 | - Code: https://github.com/aminshabani/house_diffusion 
 330 | 
 331 | **TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets**
 332 | 
 333 | - Paper: https://arxiv.org/abs/2303.05762
 334 | - Code: https://github.com/chenweixin107/TrojDiff
 335 | 
 336 | **Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption**
 337 | 
 338 | - Paper: https://arxiv.org/abs/2207.03442
 339 | - Code: https://github.com/shiyegao/DDA 
 340 | 
 341 | **DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration**
 342 | 
 343 | - Paper: https://arxiv.org/abs/2303.06885
 344 | - Code: None
 345 | 
 346 | **Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion**
 347 | 
 348 | - Homepage: https://nv-tlabs.github.io/trace-pace/
 349 | - Paper: https://arxiv.org/abs/2304.01893
 350 | - Code: None
 351 | 
 352 | **Generative Diffusion Prior for Unified Image Restoration and Enhancement**
 353 | 
 354 | - Paper: https://arxiv.org/abs/2304.01247
 355 | - Code: None
 356 | 
 357 | **Conditional Image-to-Video Generation with Latent Flow Diffusion Models**
 358 | 
 359 | - Paper: https://arxiv.org/abs/2303.13744
 360 | - Code: https://github.com/nihaomiao/CVPR23_LFDM 
 361 | 
 362 | <a name="Long-Tail"></a>
 363 | 
 364 | # 长尾分布(Long-Tail)
 365 | 
 366 | **Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation**
 367 | 
 368 | - Paper: https://arxiv.org/abs/2304.01279
 369 | - Code: None
 370 | 
 371 | <a name="Vision-Transformer"></a>
 372 | 
 373 | # Vision Transformer
 374 | 
 375 | **Integrally Pre-Trained Transformer Pyramid Networks** 
 376 | 
 377 | - Paper: https://arxiv.org/abs/2211.12735
 378 | - Code: https://github.com/sunsmarterjie/iTPN
 379 | 
 380 | **Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors**
 381 | 
 382 | - Homepage: https://niessnerlab.org/projects/hou2023mask3d.html
 383 | - Paper: https://arxiv.org/abs/2302.14746
 384 | - Code: None
 385 | 
 386 | **Learning Trajectory-Aware Transformer for Video Super-Resolution**
 387 | 
 388 | - Paper: https://arxiv.org/abs/2204.04216
 389 | - Code: https://github.com/researchmm/TTVSR
 390 | 
 391 | **Vision Transformers are Parameter-Efficient Audio-Visual Learners**
 392 | 
 393 | - Homepage: https://yanbo.ml/project_page/LAVISH/
 394 | - Code: https://github.com/GenjiB/LAVISH
 395 | 
 396 | **Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes**
 397 | 
 398 | - Paper: https://arxiv.org/abs/2303.04249
 399 | - Code: None
 400 | 
 401 | **DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets**
 402 | 
 403 | - Paper: https://arxiv.org/abs/2301.06051
 404 | - Code: https://github.com/Haiyang-W/DSVT
 405 | 
 406 | **DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting**
 407 | 
 408 | - Paper: https://arxiv.org/abs/2211.10772
 409 | - Code link: https://github.com/ViTAE-Transformer/DeepSolo
 410 | 
 411 | **BiFormer: Vision Transformer with Bi-Level Routing Attention**
 412 | 
 413 | - Paper: https://arxiv.org/abs/2303.08810
 414 | - Code: https://github.com/rayleizhu/BiFormer
 415 | 
 416 | **Vision Transformer with Super Token Sampling**
 417 | 
 418 | - Paper: https://arxiv.org/abs/2211.11167
 419 | - Code: https://github.com/hhb072/SViT
 420 | 
 421 | **BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision**
 422 | 
 423 | - Paper: https://arxiv.org/abs/2211.10439
 424 | - Code: None
 425 | 
 426 | **BAEFormer: Bi-directional and Early Interaction Transformers for Bird’s Eye View Semantic Segmentation**
 427 | 
 428 | - Paper: None
 429 | - Code: None
 430 | 
 431 | **Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention**
 432 | 
 433 | - Paper: https://arxiv.org/abs/2304.03282
 434 | - Code: None
 435 | 
 436 | **Making Vision Transformers Efficient from A Token Sparsification View**
 437 | 
 438 | - Paper: https://arxiv.org/abs/2303.08685
 439 | - Code: https://github.com/changsn/STViT-R 
 440 | 
 441 | <a name="VL"></a>
 442 | 
 443 | # 视觉和语言(Vision-Language)
 444 | 
 445 | **GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods**
 446 | 
 447 | - Paper: https://arxiv.org/abs/2301.01893
 448 | - Code: None
 449 | 
 450 | **Teaching Structured Vision&Language Concepts to Vision&Language Models**
 451 | 
 452 | - Paper: https://arxiv.org/abs/2211.11733
 453 | - Code: None
 454 | 
 455 | **Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks**
 456 | 
 457 | - Paper: https://arxiv.org/abs/2211.09808
 458 | - Code: https://github.com/fundamentalvision/Uni-Perceiver
 459 | 
 460 | **Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training**
 461 | 
 462 | - Paper: https://arxiv.org/abs/2303.00040
 463 | - Code: None
 464 | 
 465 | **CapDet: Unifying Dense Captioning and Open-World Detection Pretraining**
 466 | 
 467 | - Paper: https://arxiv.org/abs/2303.02489
 468 | - Code: None
 469 | 
 470 | **FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks**
 471 | 
 472 | - Paper: https://arxiv.org/abs/2303.02483
 473 | - Code: None
 474 | 
 475 | **Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding**
 476 | 
 477 | - Homepage: https://rllab-snu.github.io/projects/Meta-Explore/doc.html
 478 | - Paper: https://arxiv.org/abs/2303.04077
 479 | - Code: None
 480 | 
 481 | **All in One: Exploring Unified Video-Language Pre-training**
 482 | 
 483 | - Paper: https://arxiv.org/abs/2203.07303
 484 | - Code: https://github.com/showlab/all-in-one
 485 | 
 486 | **Position-guided Text Prompt for Vision Language Pre-training**
 487 | 
 488 | - Paper: https://arxiv.org/abs/2212.09737
 489 | - Code: https://github.com/sail-sg/ptp
 490 | 
 491 | **EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding**
 492 | 
 493 | - Paper: https://arxiv.org/abs/2209.14941
 494 | - Code: https://github.com/yanmin-wu/EDA
 495 | 
 496 | **CapDet: Unifying Dense Captioning and Open-World Detection Pretraining**
 497 | 
 498 | - Paper: https://arxiv.org/abs/2303.02489
 499 | - Code: None
 500 | 
 501 | **FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks**
 502 | 
 503 | - Paper: https://arxiv.org/abs/2303.02483
 504 | - Code: https://github.com/BrandonHanx/FAME-ViL
 505 | 
 506 | **Align and Attend: Multimodal Summarization with Dual Contrastive Losses**
 507 | 
 508 | - Homepage: https://boheumd.github.io/A2Summ/
 509 | - Paper: https://arxiv.org/abs/2303.07284
 510 | - Code: https://github.com/boheumd/A2Summ
 511 | 
 512 | **Multi-Modal Representation Learning with Text-Driven Soft Masks**
 513 | 
 514 | - Paper: https://arxiv.org/abs/2304.00719
 515 | - Code: None
 516 | 
 517 | **Learning to Name Classes for Vision and Language Models**
 518 | 
 519 | - Paper: https://arxiv.org/abs/2304.01830
 520 | - Code: None
 521 | 
 522 | <a name="Object-Detection"></a>
 523 | 
 524 | # 目标检测(Object Detection)
 525 | 
 526 | **YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors**
 527 | 
 528 | - Paper: https://arxiv.org/abs/2207.02696
 529 | - Code: https://github.com/WongKinYiu/yolov7
 530 | 
 531 | **DETRs with Hybrid Matching**
 532 | 
 533 | - Paper: https://arxiv.org/abs/2207.13080
 534 | - Code: https://github.com/HDETR
 535 | 
 536 | **Enhanced Training of Query-Based Object Detection via Selective Query Recollection**
 537 | 
 538 | - Paper: https://arxiv.org/abs/2212.07593
 539 | - Code: https://github.com/Fangyi-Chen/SQR
 540 | 
 541 | **Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection**
 542 | 
 543 | - Paper: https://arxiv.org/abs/2303.05892
 544 | - Code: https://github.com/LutingWang/OADP
 545 | 
 546 | <a name="VT"></a>
 547 | 
 548 | # 目标跟踪(Object Tracking)
 549 | 
 550 | **Simple Cues Lead to a Strong Multi-Object Tracker**
 551 | 
 552 | - Paper: https://arxiv.org/abs/2206.04656
 553 | - Code: None
 554 | 
 555 | **Joint Visual Grounding and Tracking with Natural Language Specification**
 556 | 
 557 | - Paper: https://arxiv.org/abs/2303.12027
 558 | - Code: https://github.com/lizhou-cs/JointNLT 
 559 | 
 560 | <a name="Semantic-Segmentation"></a>
 561 | 
 562 | # 语义分割(Semantic Segmentation)
 563 | 
 564 | **Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos**
 565 | 
 566 | - Paper: https://arxiv.org/abs/2303.07224
 567 | - Code: https://github.com/THU-LYJ-Lab/AR-Seg
 568 | 
 569 | **FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding**
 570 | 
 571 | - Paper: https://arxiv.org/abs/2304.02135
 572 | - Code: https://github.com/uark-cviu/FREDOM
 573 | 
 574 | <a name="MIS"></a>
 575 | 
 576 | # 医学图像分割(Medical Image Segmentation)
 577 | 
 578 | **Label-Free Liver Tumor Segmentation**
 579 | 
 580 | - Paper: https://arxiv.org/abs/2303.14869
 581 | - Code: https://github.com/MrGiovanni/SyntheticTumors
 582 | 
 583 | **Directional Connectivity-based Segmentation of Medical Images**
 584 | 
 585 | - Paper: https://arxiv.org/abs/2304.00145
 586 | - Code: https://github.com/Zyun-Y/DconnNet
 587 | 
 588 | **Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation**
 589 | 
 590 | - Paper: https://arxiv.org/abs/2305.00673
 591 | - Code: https://github.com/DeepMed-Lab-ECNU/BCP
 592 | 
 593 | **Devil is in the Queries: Advancing Mask Transformers for Real-world Medical Image Segmentation and Out-of-Distribution Localization**
 594 | 
 595 | - Paper: https://arxiv.org/abs/2304.00212
 596 | - Code: None
 597 | 
 598 | **Fair Federated Medical Image Segmentation via Client Contribution Estimation**
 599 | 
 600 | - Paper: https://arxiv.org/abs/2303.16520
 601 | - Code: https://github.com/NVIDIA/NVFlare/tree/dev/research/fed-ce
 602 | 
 603 | **Ambiguous Medical Image Segmentation using Diffusion Models**
 604 | 
 605 | - Homepage: https://aimansnigdha.github.io/cimd/
 606 | - Paper: https://arxiv.org/abs/2304.04745
 607 | - Code: https://github.com/aimansnigdha/Ambiguous-Medical-Image-Segmentation-using-Diffusion-Models
 608 | 
 609 | **Orthogonal Annotation Benefits Barely-supervised Medical Image Segmentation**
 610 | 
 611 | - Paper: https://arxiv.org/abs/2303.13090
 612 | - Code: https://github.com/HengCai-NJU/DeSCO
 613 | 
 614 | **MagicNet: Semi-Supervised Multi-Organ Segmentation via Magic-Cube Partition and Recovery**
 615 | 
 616 | - Paper: https://arxiv.org/abs/2301.01767
 617 | - Code: https://github.com/DeepMed-Lab-ECNU/MagicNet
 618 | 
 619 | **MCF: Mutual Correction Framework for Semi-Supervised Medical Image Segmentation**
 620 | 
 621 | - Paper: https://openaccess.thecvf.com/content/CVPR2023/html/Wang_MCF_Mutual_Correction_Framework_for_Semi-Supervised_Medical_Image_Segmentation_CVPR_2023_paper.html
 622 | - Code: https://github.com/WYC-321/MCF
 623 | 
 624 | **Rethinking Few-Shot Medical Segmentation: A Vector Quantization View**
 625 | 
 626 | - Paper: https://openaccess.thecvf.com/content/CVPR2023/html/Huang_Rethinking_Few-Shot_Medical_Segmentation_A_Vector_Quantization_View_CVPR_2023_paper.html
 627 | - Code: None
 628 | 
 629 | **Pseudo-label Guided Contrastive Learning for Semi-supervised Medical Image Segmentation**
 630 | 
 631 | - Paper: https://openaccess.thecvf.com/content/CVPR2023/html/Basak_Pseudo-Label_Guided_Contrastive_Learning_for_Semi-Supervised_Medical_Image_Segmentation_CVPR_2023_paper.html
 632 | - Code: https://github.com/hritam-98/PatchCL-MedSeg
 633 | 
 634 | **SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation**
 635 | 
 636 | - Paper: https://arxiv.org/abs/2305.11012
 637 | - Code: None
 638 | 
 639 | **DoNet: Deep De-overlapping Network for Cytology Instance Segmentation**
 640 | 
 641 | - Paper: https://arxiv.org/abs/2303.14373
 642 | - Code: https://github.com/DeepDoNet/DoNet
 643 | 
 644 | <a name="VOS"></a>
 645 | 
 646 | # 视频目标分割（Video Object Segmentation）
 647 | 
 648 | **Two-shot Video Object Segmentation**
 649 | 
 650 | - Paper: https://arxiv.org/abs/2303.12078
 651 | - Code: https://github.com/yk-pku/Two-shot-Video-Object-Segmentation
 652 | 
 653 |  **Under Video Object Segmentation Section**
 654 | 
 655 | - Paper: https://arxiv.org/abs/2303.07815
 656 | - Code: None
 657 | 
 658 | <a name="VIS"></a>
 659 | 
 660 | # 视频实例分割(Video Instance Segmentation)
 661 | 
 662 | **Mask-Free Video Instance Segmentation**
 663 | 
 664 | - Paper: https://arxiv.org/abs/2303.15904
 665 | - Code: https://github.com/SysCV/MaskFreeVis 
 666 | 
 667 | <a name="RIS"></a>
 668 | 
 669 | # 参考图像分割(Referring Image Segmentation )
 670 | 
 671 | **PolyFormer: Referring Image Segmentation as Sequential Polygon Generation**
 672 | 
 673 | - Paper: https://arxiv.org/abs/2302.07387 
 674 | 
 675 | - Code: None
 676 | 
 677 | <a name="3D-Point-Cloud"></a>
 678 | 
 679 | # 3D点云(3D-Point-Cloud)
 680 | 
 681 | **Physical-World Optical Adversarial Attacks on 3D Face Recognition**
 682 | 
 683 | - Paper: https://arxiv.org/abs/2205.13412
 684 | - Code: https://github.com/PolyLiYJ/SLAttack.git
 685 | 
 686 | **IterativePFN: True Iterative Point Cloud Filtering**
 687 | 
 688 | - Paper: https://arxiv.org/abs/2304.01529
 689 | - Code: https://github.com/ddsediri/IterativePFN
 690 | 
 691 | **Attention-based Point Cloud Edge Sampling**
 692 | 
 693 | - Homepage: https://junweizheng93.github.io/publications/APES/APES.html 
 694 | - Paper: https://arxiv.org/abs/2302.14673
 695 | - Code: https://github.com/JunweiZheng93/APES
 696 | 
 697 | <a name="3DOD"></a>
 698 | 
 699 | # 3D目标检测(3D Object Detection)
 700 | 
 701 | **DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets**
 702 | 
 703 | - Paper: https://arxiv.org/abs/2301.06051
 704 | - Code: https://github.com/Haiyang-W/DSVT 
 705 | 
 706 | **FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection**
 707 | 
 708 | - Paper:  https://arxiv.org/abs/2301.04467
 709 | - Code: None
 710 | 
 711 | **3D Video Object Detection with Learnable Object-Centric Global Optimization**
 712 | 
 713 | - Paper: None
 714 | - Code: None
 715 | 
 716 | **Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection**
 717 | 
 718 | - Paper: https://arxiv.org/abs/2304.01464
 719 | - Code: https://github.com/azhuantou/HSSDA
 720 | 
 721 | <a name="3DOD"></a>
 722 | 
 723 | # 3D语义分割(3D Semantic Segmentation)
 724 | 
 725 | **Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation**
 726 | 
 727 | - Paper: https://arxiv.org/abs/2303.11203
 728 | - Code: https://github.com/l1997i/lim3d 
 729 | 
 730 | <a name="3DSSC"></a>
 731 | 
 732 | # 3D语义场景补全(3D Semantic Scene Completion)
 733 | 
 734 | - Paper: https://arxiv.org/abs/2302.12251
 735 | - Code: https://github.com/NVlabs/VoxFormer 
 736 | 
 737 | <a name="3D-Registration"></a>
 738 | 
 739 | # 3D配准(3D Registration)
 740 | 
 741 | **Robust Outlier Rejection for 3D Registration with Variational Bayes**
 742 | 
 743 | - Paper: https://arxiv.org/abs/2304.01514
 744 | - Code: https://github.com/Jiang-HB/VBReg
 745 | 
 746 | <a name="3D-Human-Pose-Estimation"></a>
 747 | 
 748 | # 3D人体姿态估计(3D Human Pose Estimation)
 749 | 
 750 | <a name="3D-Human-Mesh-Estimation"></a>
 751 | 
 752 | # 3D人体Mesh估计(3D Human Mesh Estimation)
 753 | 
 754 | **3D Human Mesh Estimation from Virtual Markers**
 755 | 
 756 | - Paper: https://arxiv.org/abs/2303.11726
 757 | - Code: https://github.com/ShirleyMaxx/VirtualMarker 
 758 | 
 759 | <a name="LLV"></a>
 760 | 
 761 | # Low-level Vision
 762 | 
 763 | **Causal-IR: Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective**
 764 | 
 765 | - Paper: https://arxiv.org/abs/2303.06859
 766 | - Code: https://github.com/lixinustc/Casual-IR-DIL 
 767 | 
 768 | **Burstormer: Burst Image Restoration and Enhancement Transformer**
 769 | 
 770 | - Paper: https://arxiv.org/abs/2304.01194
 771 | - Code: http://github.com/akshaydudhane16/Burstormer
 772 | 
 773 | <a name="SR"></a>
 774 | 
 775 | # 超分辨率(Video Super-Resolution)
 776 | 
 777 | **Super-Resolution Neural Operator**
 778 | 
 779 | - Paper: https://arxiv.org/abs/2303.02584
 780 | - Code: https://github.com/2y7c3/Super-Resolution-Neural-Operator 
 781 | 
 782 | ## 视频超分辨率
 783 | 
 784 | **Learning Trajectory-Aware Transformer for Video Super-Resolution**
 785 | 
 786 | - Paper: https://arxiv.org/abs/2204.04216
 787 | 
 788 | - Code: https://github.com/researchmm/TTVSR
 789 | 
 790 | Denoising<a name="Denoising"></a>
 791 | 
 792 | # 去噪(Denoising)
 793 | 
 794 | ## 图像去噪(Image Denoising)
 795 | 
 796 | **Masked Image Training for Generalizable Deep Image Denoising**
 797 | 
 798 | - Paper- : https://arxiv.org/abs/2303.13132
 799 | - Code: https://github.com/haoyuc/MaskedDenoising 
 800 | 
 801 | <a name="Image-Generation"></a>
 802 | 
 803 | # 图像生成(Image Generation)
 804 | 
 805 | **GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis**
 806 | 
 807 | - Paper: https://arxiv.org/abs/2301.12959
 808 | - Code: https://github.com/tobran/GALIP 
 809 | 
 810 | **MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis**
 811 | 
 812 | - Paper: https://arxiv.org/abs/2211.09117
 813 | - Code: https://github.com/LTH14/mage
 814 | 
 815 | **Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation**
 816 | 
 817 | - Paper: https://arxiv.org/abs/2304.01816
 818 | - Code: None
 819 | 
 820 | **Few-shot Semantic Image Synthesis with Class Affinity Transfer**
 821 | 
 822 | - Paper: https://arxiv.org/abs/2304.02321
 823 | - Code: None
 824 | 
 825 | **TopNet: Transformer-based Object Placement Network for Image Compositing**
 826 | 
 827 | - Paper: https://arxiv.org/abs/2304.03372
 828 | - Code: None
 829 | 
 830 | <a name="Video-Generation"></a>
 831 | 
 832 | # 视频生成(Video Generation)
 833 | 
 834 | **MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation**
 835 | 
 836 | - Paper: https://arxiv.org/abs/2212.09478
 837 | - Code: https://github.com/researchmm/MM-Diffusion
 838 | 
 839 | **Conditional Image-to-Video Generation with Latent Flow Diffusion Models**
 840 | 
 841 | - Paper: https://arxiv.org/abs/2303.13744
 842 | - Code: https://github.com/nihaomiao/CVPR23_LFDM 
 843 | 
 844 | <a name="Video-Understanding"></a>
 845 | 
 846 | # 视频理解(Video Understanding)
 847 | 
 848 | **Learning Transferable Spatiotemporal Representations from Natural Script Knowledge**
 849 | 
 850 | - Paper: https://arxiv.org/abs/2209.15280
 851 | - Code: https://github.com/TencentARC/TVTS
 852 | 
 853 | **Frame Flexible Network**
 854 | 
 855 | - Paper: https://arxiv.org/abs/2303.14817
 856 | - Code: https://github.com/BeSpontaneous/FFN
 857 | 
 858 | **Masked Motion Encoding for Self-Supervised Video Representation Learning**
 859 | 
 860 | - Paper: https://arxiv.org/abs/2210.06096
 861 | - Code: https://github.com/XinyuSun/MME
 862 | 
 863 | **MARLIN: Masked Autoencoder for facial video Representation LearnING**
 864 | 
 865 | - Paper: https://arxiv.org/abs/2211.06627
 866 | - Code: https://github.com/ControlNet/MARLIN 
 867 | 
 868 | <a name="Action-Detection"></a>
 869 | 
 870 | # 行为检测(Action Detection)
 871 | 
 872 | **TriDet: Temporal Action Detection with Relative Boundary Modeling**
 873 | 
 874 | - Paper: https://arxiv.org/abs/2303.07347
 875 | - Code: https://github.com/dingfengshi/TriDet 
 876 | 
 877 | <a name="Text-Detection"></a>
 878 | 
 879 | # 文本检测(Text Detection)
 880 | 
 881 | **DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting**
 882 | 
 883 | - Paper: https://arxiv.org/abs/2211.10772
 884 | - Code link: https://github.com/ViTAE-Transformer/DeepSolo
 885 | 
 886 | <a name="KD"></a>
 887 | 
 888 | # 知识蒸馏(Knowledge Distillation)
 889 | 
 890 | **Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation**
 891 | 
 892 | - Paper: https://arxiv.org/abs/2302.14290
 893 | - Code: None
 894 | 
 895 | **Generic-to-Specific Distillation of Masked Autoencoders**
 896 | 
 897 | - Paper: https://arxiv.org/abs/2302.14771
 898 | - Code: https://github.com/pengzhiliang/G2SD
 899 | 
 900 | <a name="Pruning"></a>
 901 | 
 902 | # 模型剪枝(Model Pruning)
 903 | 
 904 | **DepGraph: Towards Any Structural Pruning**
 905 | 
 906 | - Paper: https://arxiv.org/abs/2301.12900
 907 | - Code: https://github.com/VainF/Torch-Pruning 
 908 | 
 909 | <a name="IC"></a>
 910 | 
 911 | # 图像压缩(Image Compression)
 912 | 
 913 | **Context-Based Trit-Plane Coding for Progressive Image Compression**
 914 | 
 915 | - Paper: https://arxiv.org/abs/2303.05715
 916 | - Code: https://github.com/seungminjeon-github/CTC
 917 | 
 918 | <a name="AD"></a>
 919 | 
 920 | # 异常检测(Anomaly Detection)
 921 | 
 922 | **Deep Feature In-painting for Unsupervised Anomaly Detection in X-ray Images**
 923 | 
 924 | - Paper: https://arxiv.org/abs/2111.13495
 925 | - Code: https://github.com/tiangexiang/SQUID 
 926 | 
 927 | <a name="3D-Reconstruction"></a>
 928 | 
 929 | # 三维重建(3D Reconstruction)
 930 | 
 931 | **OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields**
 932 | 
 933 | - Paper: https://arxiv.org/abs/2211.12886
 934 | - Code: None
 935 | 
 936 | **SparsePose: Sparse-View Camera Pose Regression and Refinement**
 937 | 
 938 | - Paper: https://arxiv.org/abs/2211.16991
 939 | - Code: None
 940 | 
 941 | **NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction**
 942 | 
 943 | - Paper: https://arxiv.org/abs/2303.02375
 944 | - Code: None
 945 | 
 946 | **Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition**
 947 | 
 948 | - Homepage: https://moygcc.github.io/vid2avatar/
 949 | - Paper: https://arxiv.org/abs/2302.11566
 950 | - Code: https://github.com/MoyGcc/vid2avatar
 951 | - Demo: https://youtu.be/EGi47YeIeGQ
 952 | 
 953 | **To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision**
 954 | 
 955 | - Paper: https://arxiv.org/abs/2106.09614
 956 | - Code: https://github.com/unibas-gravis/Occlusion-Robust-MoFA
 957 | 
 958 | **Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction**
 959 | 
 960 | - Paper: https://arxiv.org/abs/2303.05937
 961 | - Code: None
 962 | 
 963 | **3D Cinemagraphy from a Single Image**
 964 | 
 965 | - Homepage: https://xingyi-li.github.io/3d-cinemagraphy/
 966 | - Paper: https://arxiv.org/abs/2303.05724
 967 | - Code: https://github.com/xingyi-li/3d-cinemagraphy
 968 | 
 969 | **Revisiting Rotation Averaging: Uncertainties and Robust Losses**
 970 | 
 971 | - Paper: https://arxiv.org/abs/2303.05195
 972 | - Code https://github.com/zhangganlin/GlobalSfMpy 
 973 | 
 974 | **FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction**
 975 | 
 976 | - Paper: https://arxiv.org/abs/2211.13874
 977 | - Code: https://github.com/csbhr/FFHQ-UV 
 978 | 
 979 | **A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images**
 980 | 
 981 | - Homepage: https://younglbw.github.io/HRN-homepage/ 
 982 | 
 983 | - Paper: https://arxiv.org/abs/2302.14434
 984 | - Code: https://github.com/youngLBW/HRN
 985 | 
 986 | <a name="Depth-Estimation"></a>
 987 | 
 988 | # 深度估计(Depth Estimation)
 989 | 
 990 | **Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation**
 991 | 
 992 | - Paper: https://arxiv.org/abs/2211.13202
 993 | - Code: https://github.com/noahzn/Lite-Mono 
 994 | 
 995 | <a name="TP"></a>
 996 | 
 997 | # 轨迹预测(Trajectory Prediction)
 998 | 
 999 | **IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction**
1000 | 
1001 | - Paper:  https://arxiv.org/abs/2303.00575
1002 | - Code: None
1003 | 
1004 | **EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning**
1005 | 
1006 | - Paper: https://arxiv.org/abs/2303.10876
1007 | - Code: https://github.com/MediaBrain-SJTU/EqMotion 
1008 | 
1009 | <a name="Lane-Detection"></a>
1010 | 
1011 | # 车道线检测(Lane Detection)
1012 | 
1013 | **Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection**
1014 | 
1015 | - Paper: https://arxiv.org/abs/2301.02371
1016 | - Code: https://github.com/tusen-ai/Anchor3DLane
1017 | 
1018 | **BEV-LaneDet: An Efficient 3D Lane Detection Based on Virtual Camera via Key-Points**
1019 | 
1020 | - Paper:  https://arxiv.org/abs/2210.06006v3 
1021 | - Code:  https://github.com/gigo-team/bev_lane_det 
1022 | 
1023 | <a name="Image-Captioning"></a>
1024 | 
1025 | # 图像描述(Image Captioning)
1026 | 
1027 | **ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing**
1028 | 
1029 | - Paper: https://arxiv.org/abs/2303.02437
1030 | - Code: Node
1031 | 
1032 | **Cross-Domain Image Captioning with Discriminative Finetuning**
1033 | 
1034 | - Paper: https://arxiv.org/abs/2304.01662
1035 | - Code: None
1036 | 
1037 | **Model-Agnostic Gender Debiased Image Captioning**
1038 | 
1039 | - Paper: https://arxiv.org/abs/2304.03693
1040 | - Code: None
1041 | 
1042 | <a name="VQA"></a>
1043 | 
1044 | # 视觉问答(Visual Question Answering)
1045 | 
1046 | **MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering**
1047 | 
1048 | - Paper:  https://arxiv.org/abs/2303.01239
1049 | - Code: https://github.com/jingjing12110/MixPHM
1050 | 
1051 | <a name="SLR"></a>
1052 | 
1053 | # 手语识别(Sign Language Recognition)
1054 | 
1055 | **Continuous Sign Language Recognition with Correlation Network**
1056 | 
1057 | Paper: https://arxiv.org/abs/2303.03202
1058 | 
1059 | Code: https://github.com/hulianyuyy/CorrNet
1060 | 
1061 | <a name="Video-Prediction"></a>
1062 | 
1063 | # 视频预测(Video Prediction)
1064 | 
1065 | **MOSO: Decomposing MOtion, Scene and Object for Video Prediction**
1066 | 
1067 | - Paper: https://arxiv.org/abs/2303.03684
1068 | - Code: https://github.com/anonymous202203/MOSO
1069 | 
1070 | <a name="NVS"></a>
1071 | 
1072 | # 新视点合成(Novel View Synthesis)
1073 | 
1074 |  **3D Video Loops from Asynchronous Input**
1075 | 
1076 | - Homepage: https://limacv.github.io/VideoLoop3D_web/
1077 | - Paper: https://arxiv.org/abs/2303.05312
1078 | - Code: https://github.com/limacv/VideoLoop3D 
1079 | 
1080 | <a name="ZSL"></a>
1081 | 
1082 | # Zero-Shot Learning(零样本学习)
1083 | 
1084 | **Bi-directional Distribution Alignment for Transductive Zero-Shot Learning**
1085 | 
1086 | - Paper: https://arxiv.org/abs/2303.08698
1087 | - Code: https://github.com/Zhicaiwww/Bi-VAEGAN
1088 | 
1089 | **Semantic Prompt for Few-Shot Learning**
1090 | 
1091 | - Paper: None
1092 | - Code: None
1093 | 
1094 | <a name="Stereo-Matching"></a>
1095 | 
1096 | # 立体匹配(Stereo Matching)
1097 | 
1098 | **Iterative Geometry Encoding Volume for Stereo Matching**
1099 | 
1100 | - Paper: https://arxiv.org/abs/2303.06615
1101 | - Code: https://github.com/gangweiX/IGEV
1102 | 
1103 | **Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation**
1104 | 
1105 | - Paper: https://arxiv.org/abs/2304.00152
1106 | - Code: None
1107 | 
1108 | <a name="Feature-Matching"></a>
1109 | 
1110 | # 特征匹配(Feature Matching)
1111 | 
1112 | **Adaptive Spot-Guided Transformer for Consistent Local Feature Matching**
1113 | 
1114 | - Homepage: [https://astr2023.github.io](https://astr2023.github.io/) 
1115 | - Paper: https://arxiv.org/abs/2303.16624
1116 | - Code: https://github.com/ASTR2023/ASTR
1117 | 
1118 | <a name="SGG"></a>
1119 | 
1120 | # 场景图生成(Scene Graph Generation)
1121 | 
1122 | **Prototype-based Embedding Network for Scene Graph Generation**
1123 | 
1124 | - Paper: https://arxiv.org/abs/2303.07096
1125 | - Code: None
1126 | 
1127 | <a name="INR"></a>
1128 | 
1129 | # 隐式神经表示(Implicit Neural Representations)
1130 | 
1131 | **Polynomial Implicit Neural Representations For Large Diverse Datasets**
1132 | 
1133 | - Paper: https://arxiv.org/abs/2303.11424
1134 | - Code: https://github.com/Rajhans0/Poly_INR
1135 | 
1136 | <a name="IQA"></a>
1137 | 
1138 | # 图像质量评价(Image Quality Assessment)
1139 | 
1140 | **Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild**
1141 | 
1142 | - Paper: https://arxiv.org/abs/2304.00451
1143 | - Code: None
1144 | 
1145 | <a name="Datasets"></a>
1146 | 
1147 | # 数据集(Datasets)
1148 | 
1149 | **Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes**
1150 | 
1151 | - Paper: https://arxiv.org/abs/2303.02760
1152 | - Code: None
1153 | 
1154 | **Align and Attend: Multimodal Summarization with Dual Contrastive Losses**
1155 | 
1156 | - Homepage: https://boheumd.github.io/A2Summ/
1157 | - Paper: https://arxiv.org/abs/2303.07284
1158 | - Code: https://github.com/boheumd/A2Summ
1159 | 
1160 | **GeoNet: Benchmarking Unsupervised Adaptation across Geographies**
1161 | 
1162 | - Homepage: https://tarun005.github.io/GeoNet/
1163 | - Paper: https://arxiv.org/abs/2303.15443
1164 | 
1165 | **CelebV-Text: A Large-Scale Facial Text-Video Dataset**
1166 | 
1167 | - Homepage: https://celebv-text.github.io/
1168 | - Paper: https://arxiv.org/abs/2303.14717
1169 | 
1170 | <a name="Others"></a>
1171 | 
1172 | # 其他(Others)
1173 | 
1174 | **Interactive Segmentation as Gaussian Process Classification**
1175 | 
1176 | - Paper: https://arxiv.org/abs/2302.14578
1177 | - Code: None
1178 | 
1179 | **Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger**
1180 | 
1181 | - Paper: https://arxiv.org/abs/2302.14677
1182 | - Code: None
1183 | 
1184 | **SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries**
1185 | 
1186 | - Homepage: http://bit.ly/splinecam
1187 | - Paper: https://arxiv.org/abs/2302.12828
1188 | - Code: None
1189 | 
1190 | **SCOTCH and SODA: A Transformer Video Shadow Detection Framework**
1191 | 
1192 | - Paper: https://arxiv.org/abs/2211.06885
1193 | - Code: None
1194 | 
1195 | **DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization**
1196 | 
1197 | - Homepage: https://ai4ce.github.io/DeepMapping2/
1198 | - Paper: https://arxiv.org/abs/2212.06331
1199 | - None: https://github.com/ai4ce/DeepMapping2
1200 | 
1201 | **RelightableHands: Efficient Neural Relighting of Articulated Hand Models**
1202 | 
1203 | - Homepage: https://sh8.io/#/relightable_hands
1204 | - Paper: https://arxiv.org/abs/2302.04866
1205 | - Code: None
1206 | 
1207 | **Token Turing Machines**
1208 | 
1209 | - Paper: https://arxiv.org/abs/2211.09119
1210 | - Code: None
1211 | 
1212 | **Single Image Backdoor Inversion via Robust Smoothed Classifiers**
1213 | 
1214 | - Paper: https://arxiv.org/abs/2303.00215
1215 | - Code: https://github.com/locuslab/smoothinv
1216 | 
1217 | **To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision**
1218 | 
1219 | - Paper: https://arxiv.org/abs/2106.09614
1220 | - Code: https://github.com/unibas-gravis/Occlusion-Robust-MoFA
1221 | 
1222 | **HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics**
1223 | 
1224 | - Homepage: https://dolorousrtur.github.io/hood/
1225 | - Paper: https://arxiv.org/abs/2212.07242
1226 | - Code: https://github.com/dolorousrtur/hood
1227 | - Demo: https://www.youtube.com/watch?v=cBttMDPrUYY
1228 | 
1229 | **A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others**
1230 | 
1231 | - Paper: https://arxiv.org/abs/2212.04825
1232 | - Code: https://github.com/facebookresearch/Whac-A-Mole.git
1233 | 
1234 | **RelightableHands: Efficient Neural Relighting of Articulated Hand Models**
1235 | 
1236 | - Homepage: https://sh8.io/#/relightable_hands
1237 | - Paper: https://arxiv.org/abs/2302.04866
1238 | - Code: None
1239 | - Demo: https://sh8.io/static/media/teacher_video.923d87957fe0610730c2.mp4
1240 | 
1241 | **Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation**
1242 | 
1243 | - Paper: https://arxiv.org/abs/2303.00914
1244 | - Code: None
1245 | 
1246 | **Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression**
1247 | 
1248 | - Paper: https://arxiv.org/abs/2303.01052
1249 | - Code: None
1250 | 
1251 | **UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy**
1252 | 
1253 | - Paper: https://arxiv.org/abs/2303.00938
1254 | - Code: None
1255 | 
1256 | **Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness**
1257 | 
1258 | - Paper: https://arxiv.org/abs/2303.00971
1259 | - Code: https://github.com/zhijieshen-bjtu/DOPNet
1260 | 
1261 | **Learning Neural Parametric Head Models**
1262 | 
1263 | - Homepage: https://simongiebenhain.github.io/NPHM)
1264 | - Paper: https://arxiv.org/abs/2212.02761
1265 | - Code: None
1266 | 
1267 | **A Meta-Learning Approach to Predicting Performance and Data Requirements**
1268 | 
1269 | - Paper: https://arxiv.org/abs/2303.01598
1270 | - Code: None
1271 | 
1272 | **MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision**
1273 | 
1274 | - Homepage: https://imagine.enpc.fr/~guedona/MACARONS/
1275 | - Paper: https://arxiv.org/abs/2303.03315
1276 | - Code: None
1277 | 
1278 | **Masked Images Are Counterfactual Samples for Robust Fine-tuning**
1279 | 
1280 | - Paper: https://arxiv.org/abs/2303.03052
1281 | - Code: None
1282 | 
1283 | **HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling**
1284 | 
1285 | - Paper: https://arxiv.org/abs/2303.02700
1286 | - Code: None
1287 | 
1288 | **Decompose, Adjust, Compose: Effective Normalization by Playing with Frequency for Domain Generalization**
1289 | 
1290 | - Paper: https://arxiv.org/abs/2303.02328
1291 | - Code: None
1292 | 
1293 | **Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization**
1294 | 
1295 | - Paper: https://arxiv.org/abs/2303.03108
1296 | - Code: None
1297 | 
1298 | **Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples**
1299 | 
1300 | - Paper: https://arxiv.org/abs/2301.01217
1301 | - Code: https://github.com/jiamingzhang94/Unlearnable-Clusters 
1302 | 
1303 | **Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes**
1304 | 
1305 | - Paper: https://arxiv.org/abs/2303.04249
1306 | - Code: None
1307 | 
1308 | **UniHCP: A Unified Model for Human-Centric Perceptions**
1309 | 
1310 | - Paper: https://arxiv.org/abs/2303.02936
1311 | - Code: https://github.com/OpenGVLab/UniHCP
1312 | 
1313 | **CUDA: Convolution-based Unlearnable Datasets**
1314 | 
1315 | - Paper: https://arxiv.org/abs/2303.04278
1316 | - Code: https://github.com/vinusankars/Convolution-based-Unlearnability
1317 | 
1318 | **Masked Images Are Counterfactual Samples for Robust Fine-tuning**
1319 | 
1320 | - Paper: https://arxiv.org/abs/2303.03052
1321 | - Code: None
1322 | 
1323 | **AdaptiveMix: Robust Feature Representation via Shrinking Feature Space**
1324 | 
1325 | - Paper: https://arxiv.org/abs/2303.01559
1326 | - Code: https://github.com/WentianZhang-ML/AdaptiveMix 
1327 | 
1328 | **Physical-World Optical Adversarial Attacks on 3D Face Recognition**
1329 | 
1330 | - Paper: https://arxiv.org/abs/2205.13412
1331 | - Code: https://github.com/PolyLiYJ/SLAttack.git
1332 | 
1333 | **DPE: Disentanglement of Pose and Expression for General Video Portrait Editing**
1334 | 
1335 | - Paper: https://arxiv.org/abs/2301.06281
1336 | - Code: https://carlyx.github.io/DPE/ 
1337 | 
1338 | **SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation**
1339 | 
1340 | - Paper: https://arxiv.org/abs/2211.12194
1341 | - Code: https://github.com/Winfredy/SadTalker
1342 | 
1343 | **Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models**
1344 | 
1345 | - Paper: None
1346 | - Code: None
1347 | 
1348 | **Sharpness-Aware Gradient Matching for Domain Generalization**
1349 | 
1350 | - Paper: None
1351 | - Code: https://github.com/Wang-pengfei/SAGM
1352 | 
1353 | **Mind the Label-shift for Augmentation-based Graph Out-of-distribution Generalization**
1354 | 
1355 | - Paper: None
1356 | - Code: None
1357 | 
1358 | **Blind Video Deflickering by Neural Filtering with a Flawed Atlas**
1359 | 
1360 | - Homepage:  https://chenyanglei.github.io/deflicker 
1361 | - Paper: None
1362 | - Code: None
1363 | 
1364 | **RiDDLE: Reversible and Diversified De-identification with Latent Encryptor**
1365 | 
1366 | - Paper: None
1367 | - Code:  https://github.com/ldz666666/RiDDLE 
1368 | 
1369 | **PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation**
1370 | 
1371 | - Paper: https://arxiv.org/abs/2303.07337
1372 | - Code: None
1373 | 
1374 | **Upcycling Models under Domain and Category Shift**
1375 | 
1376 | - Paper: https://arxiv.org/abs/2303.07110
1377 | - Code: https://github.com/ispc-lab/GLC
1378 | 
1379 | **Modality-Agnostic Debiasing for Single Domain Generalization**
1380 | 
1381 | - Paper: https://arxiv.org/abs/2303.07123
1382 | - Code: None
1383 | 
1384 | **Progressive Open Space Expansion for Open-Set Model Attribution**
1385 | 
1386 | - Paper: https://arxiv.org/abs/2303.06877
1387 | - Code: None
1388 | 
1389 | **Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies**
1390 | 
1391 | - Paper: https://arxiv.org/abs/2303.06856
1392 | - Code: None
1393 | 
1394 | **GFPose: Learning 3D Human Pose Prior with Gradient Fields**
1395 | 
1396 | - Paper: https://arxiv.org/abs/2212.08641
1397 | - Code: https://github.com/Embracing/GFPose 
1398 | 
1399 | **PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment**
1400 | 
1401 | - Paper: https://arxiv.org/abs/2303.11526
1402 | - Code: https://github.com/Zhang-VISLab
1403 | 
1404 | **Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings**
1405 | 
1406 | - Paper: https://arxiv.org/abs/2303.11502
1407 | - Code: None
1408 | 
1409 | **Boundary Unlearning**
1410 | 
1411 | - Paper: https://arxiv.org/abs/2303.11570
1412 | - Code: None
1413 | 
1414 | **ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing**
1415 | 
1416 | - Paper: https://arxiv.org/abs/2303.17096
1417 | - Code: https://github.com/alibaba/easyrobust
1418 | 
1419 | **Zero-shot Model Diagnosis**
1420 | 
1421 | - Paper: https://arxiv.org/abs/2303.15441
1422 | - Code: None
1423 | 
1424 | **GeoNet: Benchmarking Unsupervised Adaptation across Geographies**
1425 | 
1426 | - Homepage: https://tarun005.github.io/GeoNet/
1427 | - Paper: https://arxiv.org/abs/2303.15443
1428 | 
1429 | **Quantum Multi-Model Fitting**
1430 | 
1431 | - Paper: https://arxiv.org/abs/2303.15444
1432 | - Code: https://github.com/FarinaMatteo/qmmf
1433 | 
1434 | **DivClust: Controlling Diversity in Deep Clustering**
1435 | 
1436 | - Paper: https://arxiv.org/abs/2304.01042
1437 | - Code: None
1438 | 
1439 | **Neural Volumetric Memory for Visual Locomotion Control**
1440 | 
1441 | - Homepage: https://rchalyang.github.io/NVM
1442 | - Paper: https://arxiv.org/abs/2304.01201
1443 | - Code: https://rchalyang.github.io/NVM
1444 | 
1445 | **MonoHuman: Animatable Human Neural Field from Monocular Video**
1446 | 
1447 | - Homepage: https://yzmblog.github.io/projects/MonoHuman/
1448 | - Paper: https://arxiv.org/abs/2304.02001
1449 | - Code: https://github.com/Yzmblog/MonoHuman
1450 | 
1451 | **Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion**
1452 | 
1453 | - Homepage: https://nv-tlabs.github.io/trace-pace/
1454 | - Paper: https://arxiv.org/abs/2304.01893
1455 | - Code: None
1456 | 
1457 | **Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification**
1458 | 
1459 | - Paper: https://arxiv.org/abs/2304.01804
1460 | - Code: None
1461 | 
1462 | **HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering**
1463 | 
1464 | - Paper: https://arxiv.org/abs/2304.01686
1465 | - Code: None
1466 | 
1467 | **On the Stability-Plasticity Dilemma of Class-Incremental Learning**
1468 | 
1469 | - Paper: https://arxiv.org/abs/2304.01663
1470 | - Code: None
1471 | 
1472 | **Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning**
1473 | 
1474 | - Paper: https://arxiv.org/abs/2304.01482
1475 | - Code: None
1476 | 
1477 | **VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution**
1478 | 
1479 | - Paper: https://arxiv.org/abs/2304.01434
1480 | - Code: https://github.com/jaeill/CVPR23-VNE
1481 | 
1482 | **Detecting and Grounding Multi-Modal Media Manipulation**
1483 | 
1484 | - Homepage: https://rshaojimmy.github.io/Projects/MultiModal-DeepFake
1485 | - Paper: https://arxiv.org/abs/2304.02556
1486 | - Code: https://github.com/rshaojimmy/MultiModal-DeepFake
1487 | 
1488 | **Meta-causal Learning for Single Domain Generalization**
1489 | 
1490 | - Paper: https://arxiv.org/abs/2304.03709
1491 | - Code: None
1492 | 
1493 | **Disentangling Writer and Character Styles for Handwriting Generation**
1494 | 
1495 | - Paper: https://arxiv.org/abs/2303.14736
1496 | - Code: https://github.com/dailenson/SDT
1497 | 
1498 | **DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects**
1499 | 
1500 | - Homepage: https://www.chenbao.tech/dexart/
1501 | 
1502 | - Code: https://github.com/Kami-code/dexart-release
1503 | 
1504 | **Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision**
1505 | 
1506 | - Homepage: https://toytiny.github.io/publication/23-cmflow-cvpr/index.html 
1507 | - Paper: https://arxiv.org/abs/2303.00462
1508 | - Code: https://github.com/Toytiny/CMFlow
1509 | 
1510 | **Marching-Primitives: Shape Abstraction from Signed Distance Function**
1511 | 
1512 | - Paper: https://arxiv.org/abs/2303.13190
1513 | - Code: https://github.com/ChirikjianLab/Marching-Primitives
1514 | 
1515 | **Towards Trustable Skin Cancer Diagnosis via Rewriting Model's Decision**
1516 | 
1517 | - Paper: https://arxiv.org/abs/2303.00885
1518 | - Code: None


--------------------------------------------------------------------------------
/CVer学术交流群.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wyf3/CVPR2024-Papers-with-Code/7a12b2155e596a79ba6dcc7a17a5ae27f0fc50a8/CVer学术交流群.png


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # CVPR 2024 论文和开源项目合集(Papers with Code)
  2 | 
  3 | CVPR 2024 decisions are now available on OpenReview！
  4 | 
  5 | 
  6 | > 注1：欢迎各位大佬提交issue，分享CVPR 2024论文和开源项目！
  7 | >
  8 | > 注2：关于往年CV顶会论文以及其他优质CV论文和大盘点，详见： https://github.com/amusi/daily-paper-computer-vision
  9 | >
 10 | > - [CVPR 2019](CVPR2019-Papers-with-Code.md)
 11 | > - [CVPR 2020](CVPR2020-Papers-with-Code.md)
 12 | > - [CVPR 2021](CVPR2021-Papers-with-Code.md)
 13 | > - [CVPR 2022](CVPR2022-Papers-with-Code.md)
 14 | > - [CVPR 2023](CVPR2022-Papers-with-Code.md)
 15 | 
 16 | 欢迎扫码加入【CVer学术交流群】，这是最大的计算机视觉AI知识星球！每日更新，第一时间分享最新最前沿的计算机视觉、AI绘画、图像处理、深度学习、自动驾驶、医疗影像和AIGC等方向的学习资料，学起来！
 17 | 
 18 | ![](CVer学术交流群.png)
 19 | 
 20 | # 【CVPR 2024 论文开源目录】
 21 | 
 22 | - [3DGS(Gaussian Splatting)](#3DGS)
 23 | - [Avatars](#Avatars)
 24 | - [Backbone](#Backbone)
 25 | - [CLIP](#CLIP)
 26 | - [MAE](#MAE)
 27 | - [Embodied AI](#Embodied-AI)
 28 | - [GAN](#GAN)
 29 | - [GNN](#GNN)
 30 | - [多模态大语言模型(MLLM)](#MLLM)
 31 | - [大语言模型(LLM)](#LLM)
 32 | - [NAS](#NAS)
 33 | - [OCR](#OCR)
 34 | - [NeRF](#NeRF)
 35 | - [DETR](#DETR)
 36 | - [Prompt](#Prompt)
 37 | - [扩散模型(Diffusion Models)](#Diffusion)
 38 | - [ReID(重识别)](#ReID)
 39 | - [长尾分布(Long-Tail)](#Long-Tail)
 40 | - [Vision Transformer](#Vision-Transformer)
 41 | - [视觉和语言(Vision-Language)](#VL)
 42 | - [自监督学习(Self-supervised Learning)](#SSL)
 43 | - [数据增强(Data Augmentation)](#DA)
 44 | - [目标检测(Object Detection)](#Object-Detection)
 45 | - [异常检测(Anomaly Detection)](#Anomaly-Detection)
 46 | - [目标跟踪(Visual Tracking)](#VT)
 47 | - [语义分割(Semantic Segmentation)](#Semantic-Segmentation)
 48 | - [实例分割(Instance Segmentation)](#Instance-Segmentation)
 49 | - [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation)
 50 | - [医学图像(Medical Image)](#MI)
 51 | - [医学图像分割(Medical Image Segmentation)](#MIS)
 52 | - [视频目标分割(Video Object Segmentation)](#VOS)
 53 | - [视频实例分割(Video Instance Segmentation)](#VIS)
 54 | - [参考图像分割(Referring Image Segmentation)](#RIS)
 55 | - [图像抠图(Image Matting)](#Matting)
 56 | - [图像编辑(Image Editing)](#Image-Editing)
 57 | - [Low-level Vision](#LLV)
 58 | - [超分辨率(Super-Resolution)](#SR)
 59 | - [去噪(Denoising)](#Denoising)
 60 | - [去模糊(Deblur)](#Deblur)
 61 | - [自动驾驶(Autonomous Driving)](#Autonomous-Driving)
 62 | - [3D点云(3D Point Cloud)](#3D-Point-Cloud)
 63 | - [3D目标检测(3D Object Detection)](#3DOD)
 64 | - [3D语义分割(3D Semantic Segmentation)](#3DSS)
 65 | - [3D目标跟踪(3D Object Tracking)](#3D-Object-Tracking)
 66 | - [3D语义场景补全(3D Semantic Scene Completion)](#3DSSC)
 67 | - [3D配准(3D Registration)](#3D-Registration)
 68 | - [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation)
 69 | - [3D人体Mesh估计(3D Human Mesh Estimation)](#3D-Human-Pose-Estimation)
 70 | - [医学图像(Medical Image)](#Medical-Image)
 71 | - [图像生成(Image Generation)](#Image-Generation)
 72 | - [视频生成(Video Generation)](#Video-Generation)
 73 | - [3D生成(3D Generation)](#3D-Generation)
 74 | - [视频理解(Video Understanding)](#Video-Understanding)
 75 | - [行为检测(Action Detection)](#Action-Detection)
 76 | - [文本检测(Text Detection)](#Text-Detection)
 77 | - [知识蒸馏(Knowledge Distillation)](#KD)
 78 | - [模型剪枝(Model Pruning)](#Pruning)
 79 | - [图像压缩(Image Compression)](#IC)
 80 | - [三维重建(3D Reconstruction)](#3D-Reconstruction)
 81 | - [深度估计(Depth Estimation)](#Depth-Estimation)
 82 | - [轨迹预测(Trajectory Prediction)](#TP)
 83 | - [车道线检测(Lane Detection)](#Lane-Detection)
 84 | - [图像描述(Image Captioning)](#Image-Captioning)
 85 | - [视觉问答(Visual Question Answering)](#VQA)
 86 | - [手语识别(Sign Language Recognition)](#SLR)
 87 | - [视频预测(Video Prediction)](#Video-Prediction)
 88 | - [新视点合成(Novel View Synthesis)](#NVS)
 89 | - [Zero-Shot Learning(零样本学习)](#ZSL)
 90 | - [立体匹配(Stereo Matching)](#Stereo-Matching)
 91 | - [特征匹配(Feature Matching)](#Feature-Matching)
 92 | - [场景图生成(Scene Graph Generation)](#SGG)
 93 | - [隐式神经表示(Implicit Neural Representations)](#INR)
 94 | - [图像质量评价(Image Quality Assessment)](#IQA)
 95 | - [视频质量评价(Video Quality Assessment)](#Video-Quality-Assessment)
 96 | - [数据集(Datasets)](#Datasets)
 97 | - [新任务(New Tasks)](#New-Tasks)
 98 | - [其他(Others)](#Others)
 99 | 
100 | <a name="3DGS"></a>
101 | 
102 | # 3DGS(Gaussian Splatting)
103 | 
104 | **Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering**
105 | 
106 | - Homepage: https://city-super.github.io/scaffold-gs/
107 | - Paper: https://arxiv.org/abs/2312.00109
108 | - Code: https://github.com/city-super/Scaffold-GS
109 | 
110 | **GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis**
111 | 
112 | - Homepage: https://shunyuanzheng.github.io/GPS-Gaussian 
113 | - Paper: https://arxiv.org/abs/2312.02155
114 | - Code: https://github.com/ShunyuanZheng/GPS-Gaussian
115 | 
116 | **GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians**
117 | 
118 | - Paper: https://arxiv.org/abs/2312.02134
119 | - Code: https://github.com/huliangxiao/GaussianAvatar
120 | 
121 | **GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting**
122 | 
123 | - Paper: https://arxiv.org/abs/2311.14521
124 | - Code: https://github.com/buaacyw/GaussianEditor 
125 | 
126 | **Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction**
127 | 
128 | - Homepage: https://ingra14m.github.io/Deformable-Gaussians/ 
129 | - Paper: https://arxiv.org/abs/2309.13101
130 | - Code: https://github.com/ingra14m/Deformable-3D-Gaussians
131 | 
132 | **SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes**
133 | 
134 | - Homepage: https://yihua7.github.io/SC-GS-web/ 
135 | - Paper: https://arxiv.org/abs/2312.14937
136 | - Code: https://github.com/yihua7/SC-GS
137 | 
138 | **Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis**
139 | 
140 | - Homepage: https://oppo-us-research.github.io/SpacetimeGaussians-website/ 
141 | - Paper: https://arxiv.org/abs/2312.16812
142 | - Code: https://github.com/oppo-us-research/SpacetimeGaussians
143 | 
144 | **DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization**
145 | 
146 | - Homepage: https://fictionarry.github.io/DNGaussian/
147 | - Paper: https://arxiv.org/abs/2403.06912
148 | - Code: https://github.com/Fictionarry/DNGaussian
149 | 
150 | **4D Gaussian Splatting for Real-Time Dynamic Scene Rendering**
151 | 
152 | - Paper: https://arxiv.org/abs/2310.08528
153 | - Code: https://github.com/hustvl/4DGaussians
154 | 
155 | **GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models**
156 | 
157 | - Paper: https://arxiv.org/abs/2310.08529
158 | - Code: https://github.com/hustvl/GaussianDreamer
159 | 
160 | <a name="Avatars"></a>
161 | 
162 | # Avatars
163 | 
164 | **GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians**
165 | 
166 | - Paper: https://arxiv.org/abs/2312.02134
167 | - Code: https://github.com/huliangxiao/GaussianAvatar
168 | 
169 | **Real-Time Simulated Avatar from Head-Mounted Sensors**
170 | 
171 | - Homepage: https://www.zhengyiluo.com/SimXR/
172 | - Paper: https://arxiv.org/abs/2403.06862
173 | 
174 | <a name="Backbone"></a>
175 | 
176 | # Backbone
177 | 
178 | **RepViT: Revisiting Mobile CNN From ViT Perspective**
179 | 
180 | - Paper: https://arxiv.org/abs/2307.09283
181 | - Code: https://github.com/THU-MIG/RepViT
182 | 
183 | **TransNeXt: Robust Foveal Visual Perception for Vision Transformers**
184 | 
185 | - Paper: https://arxiv.org/abs/2311.17132
186 | - Code: https://github.com/DaiShiResearch/TransNeXt
187 | 
188 | <a name="CLIP"></a>
189 | 
190 | # CLIP
191 | 
192 | **Alpha-CLIP: A CLIP Model Focusing on Wherever You Want**
193 | 
194 | - Paper: https://arxiv.org/abs/2312.03818
195 | - Code: https://github.com/SunzeY/AlphaCLIP
196 | 
197 | **FairCLIP: Harnessing Fairness in Vision-Language Learning**
198 | 
199 | - Paper: https://arxiv.org/abs/2403.19949
200 | - Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP
201 | 
202 | <a name="MAE"></a>
203 | 
204 | # MAE
205 | 
206 | <a name="Embodied-AI"></a>
207 | 
208 | # Embodied AI
209 | 
210 | **EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI**
211 | 
212 | - Homepage: https://tai-wang.github.io/embodiedscan/
213 | - Paper: https://arxiv.org/abs/2312.16170
214 | - Code: https://github.com/OpenRobotLab/EmbodiedScan
215 | 
216 | **MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception**
217 | 
218 | - Homepage: https://iranqin.github.io/MP5.github.io/ 
219 | - Paper: https://arxiv.org/abs/2312.07472
220 | - Code: https://github.com/IranQin/MP5
221 | 
222 | **LEMON: Learning 3D Human-Object Interaction Relation from 2D Images**
223 | 
224 | - Paper: https://arxiv.org/abs/2312.08963
225 | - Code: https://github.com/yyvhang/lemon_3d 
226 | 
227 | <a name="GAN"></a>
228 | 
229 | # GAN
230 | 
231 | <a name="OCR"></a>
232 | 
233 | # OCR
234 | 
235 | **An Empirical Study of Scaling Law for OCR**
236 | 
237 | - Paper: https://arxiv.org/abs/2401.00028
238 | - Code: https://github.com/large-ocr-model/large-ocr-model.github.io
239 | 
240 | **ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting**
241 | 
242 | - Paper: https://arxiv.org/abs/2403.00303
243 | - Code: https://github.com/PriNing/ODM 
244 | 
245 | <a name="NeRF"></a>
246 | 
247 | # NeRF
248 | 
249 | **PIE-NeRF🍕: Physics-based Interactive Elastodynamics with NeRF**
250 | 
251 | - Paper: https://arxiv.org/abs/2311.13099
252 | - Code: https://github.com/FYTalon/pienerf/ 
253 | 
254 | <a name="DETR"></a>
255 | 
256 | # DETR
257 | 
258 | **DETRs Beat YOLOs on Real-time Object Detection**
259 | 
260 | - Paper: https://arxiv.org/abs/2304.08069
261 | - Code: https://github.com/lyuwenyu/RT-DETR
262 | 
263 | **Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement**
264 | 
265 | - Paper: https://arxiv.org/abs/2403.16131
266 | - Code: https://github.com/xiuqhou/Salience-DETR
267 | 
268 | <a name="Prompt"></a>
269 | 
270 | # Prompt
271 | 
272 | <a name="MLLM"></a>
273 | 
274 | # 多模态大语言模型(MLLM)
275 | 
276 | **mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration**
277 | 
278 | - Paper: https://arxiv.org/abs/2311.04257
279 | - Code: https://github.com/X-PLUG/mPLUG-Owl/tree/main/mPLUG-Owl2
280 | 
281 | **Link-Context Learning for Multimodal LLMs**
282 | 
283 | - Paper: https://arxiv.org/abs/2308.07891
284 | - Code: https://github.com/isekai-portal/Link-Context-Learning/tree/main 
285 | 
286 | **OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation**
287 | 
288 | - Paper: https://arxiv.org/abs/2311.17911
289 | - Code: https://github.com/shikiw/OPERA
290 | 
291 | **Making Large Multimodal Models Understand Arbitrary Visual Prompts**
292 | 
293 | - Homepage: https://vip-llava.github.io/ 
294 | - Paper: https://arxiv.org/abs/2312.00784
295 | 
296 | **Pink: Unveiling the power of referential comprehension for multi-modal llms**
297 | 
298 | - Paper: https://arxiv.org/abs/2310.00582
299 | - Code: https://github.com/SY-Xuan/Pink
300 | 
301 | **Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding**
302 | 
303 | - Paper: https://arxiv.org/abs/2311.08046
304 | - Code: https://github.com/PKU-YuanGroup/Chat-UniVi
305 | 
306 | **OneLLM: One Framework to Align All Modalities with Language**
307 | 
308 | - Paper: https://arxiv.org/abs/2312.03700
309 | - Code: https://github.com/csuhan/OneLLM
310 | 
311 | <a name="LLM"></a>
312 | 
313 | # 大语言模型(LLM)
314 | 
315 | **VTimeLLM: Empower LLM to Grasp Video Moments**
316 | 
317 | - Paper: https://arxiv.org/abs/2311.18445
318 | - Code: https://github.com/huangb23/VTimeLLM 
319 | 
320 | <a name="NAS"></a>
321 | 
322 | # NAS
323 | 
324 | <a name="ReID"></a>
325 | 
326 | # ReID(重识别)
327 | 
328 | **Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification**
329 | 
330 | - Paper: https://arxiv.org/abs/2403.10254
331 | - Code: https://github.com/924973292/EDITOR 
332 | 
333 | **Noisy-Correspondence Learning for Text-to-Image Person Re-identification**
334 | 
335 | - Paper: https://arxiv.org/abs/2308.09911
336 | 
337 | - Code : https://github.com/QinYang79/RDE 
338 | 
339 | <a name="Diffusion"></a>
340 | 
341 | # 扩散模型(Diffusion Models)
342 | 
343 | **InstanceDiffusion: Instance-level Control for Image Generation**
344 | 
345 | - Homepage: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/
346 | 
347 | - Paper: https://arxiv.org/abs/2402.03290
348 | - Code: https://github.com/frank-xwang/InstanceDiffusion
349 | 
350 | **Residual Denoising Diffusion Models**
351 | 
352 | - Paper: https://arxiv.org/abs/2308.13712
353 | - Code: https://github.com/nachifur/RDDM
354 | 
355 | **DeepCache: Accelerating Diffusion Models for Free**
356 | 
357 | - Paper: https://arxiv.org/abs/2312.00858
358 | - Code: https://github.com/horseee/DeepCache
359 | 
360 | **DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations**
361 | 
362 | - Homepage: https://tianhao-qi.github.io/DEADiff/ 
363 | 
364 | - Paper: https://arxiv.org/abs/2403.06951
365 | - Code: https://github.com/Tianhao-Qi/DEADiff_code
366 | 
367 | **SVGDreamer: Text Guided SVG Generation with Diffusion Model**
368 | 
369 | - Paper: https://arxiv.org/abs/2312.16476
370 | - Code: https://ximinng.github.io/SVGDreamer-project/
371 | 
372 | **InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model**
373 | 
374 | - Paper: https://arxiv.org/abs/2312.05849
375 | - Code: https://github.com/jiuntian/interactdiffusion
376 | 
377 | **MMA-Diffusion: MultiModal Attack on Diffusion Models**
378 | 
379 | - Paper: https://arxiv.org/abs/2311.17516
380 | - Code: https://github.com/yangyijune/MMA-Diffusion
381 | 
382 | **VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models**
383 | 
384 | - Homeoage: https://video-motion-customization.github.io/ 
385 | - Paper: https://arxiv.org/abs/2312.00845
386 | - Code: https://github.com/HyeonHo99/Video-Motion-Customization
387 | 
388 | <a name="Vision-Transformer"></a>
389 | 
390 | # Vision Transformer
391 | 
392 | **TransNeXt: Robust Foveal Visual Perception for Vision Transformers**
393 | 
394 | - Paper: https://arxiv.org/abs/2311.17132
395 | - Code: https://github.com/DaiShiResearch/TransNeXt
396 | 
397 | **RepViT: Revisiting Mobile CNN From ViT Perspective**
398 | 
399 | - Paper: https://arxiv.org/abs/2307.09283
400 | - Code: https://github.com/THU-MIG/RepViT
401 | 
402 | **A General and Efficient Training for Transformer via Token Expansion**
403 | 
404 | - Paper: https://arxiv.org/abs/2404.00672
405 | - Code: https://github.com/Osilly/TokenExpansion 
406 | 
407 | <a name="VL"></a>
408 | 
409 | # 视觉和语言(Vision-Language)
410 | 
411 | **PromptKD: Unsupervised Prompt Distillation for Vision-Language Models**
412 | 
413 | - Paper: https://arxiv.org/abs/2403.02781
414 | - Code: https://github.com/zhengli97/PromptKD
415 | 
416 | **FairCLIP: Harnessing Fairness in Vision-Language Learning**
417 | 
418 | - Paper: https://arxiv.org/abs/2403.19949
419 | - Code: https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP
420 | 
421 | <a name="Object-Detection"></a>
422 | 
423 | # 目标检测(Object Detection)
424 | 
425 | **DETRs Beat YOLOs on Real-time Object Detection**
426 | 
427 | - Paper: https://arxiv.org/abs/2304.08069
428 | - Code: https://github.com/lyuwenyu/RT-DETR
429 | 
430 | **Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation**
431 | 
432 | - Paper: https://arxiv.org/abs/2312.01220
433 | - Code: https://github.com/ZPDu/Boosting-Object-Detection-with-Zero-Shot-Day-Night-Domain-Adaptation 
434 | 
435 | **YOLO-World: Real-Time Open-Vocabulary Object Detection**
436 | 
437 | - Paper: https://arxiv.org/abs/2401.17270
438 | - Code: https://github.com/AILab-CVC/YOLO-World
439 | 
440 | **Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement**
441 | 
442 | - Paper: https://arxiv.org/abs/2403.16131
443 | - Code: https://github.com/xiuqhou/Salience-DETR
444 | 
445 | <a name="Anomaly-Detection"></a>
446 | 
447 | # 异常检测(Anomaly Detection)
448 | 
449 | **Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection**
450 | 
451 | - Paper: https://arxiv.org/abs/2310.12790
452 | - Code: https://github.com/mala-lab/AHL
453 | 
454 | <a name="VT"></a>
455 | 
456 | # 目标跟踪(Object Tracking)
457 | 
458 | **Delving into the Trajectory Long-tail Distribution for Muti-object Tracking**
459 | 
460 | - Paper: https://arxiv.org/abs/2403.04700
461 | - Code: https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT 
462 | 
463 | <a name="Semantic-Segmentation"></a>
464 | 
465 | # 语义分割(Semantic Segmentation)
466 | 
467 | **Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation**
468 | 
469 | - Paper: https://arxiv.org/abs/2312.04265
470 | - Code: https://github.com/w1oves/Rein
471 | 
472 | **SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation**
473 | 
474 | - Paper: https://arxiv.org/abs/2311.15537
475 | - Code: https://github.com/xb534/SED 
476 | 
477 | <a name="MI"></a>
478 | 
479 | # 医学图像(Medical Image)
480 | 
481 | **Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology**
482 | 
483 | - Paper: https://arxiv.org/abs/2402.17228
484 | - Code: https://github.com/DearCaat/RRT-MIL
485 | 
486 | **VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis**
487 | 
488 | - Paper: https://arxiv.org/abs/2402.17300
489 | - Code: https://github.com/Luffy03/VoCo
490 | 
491 | **ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images**
492 | 
493 | - Paper: https://arxiv.org/abs/2311.15264
494 | - Code: https://github.com/nicoboou/chada_vit 
495 | 
496 | <a name="MIS"></a>
497 | 
498 | # 医学图像分割(Medical Image Segmentation)
499 | 
500 | 
501 | 
502 | <a name="Autonomous-Driving"></a>
503 | 
504 | # 自动驾驶(Autonomous Driving)
505 | 
506 | **UniPAD: A Universal Pre-training Paradigm for Autonomous Driving**
507 | 
508 | - Paper: https://arxiv.org/abs/2310.08370
509 | - Code: https://github.com/Nightmare-n/UniPAD
510 | 
511 | **Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications**
512 | 
513 | - Paper: https://arxiv.org/abs/2311.17663
514 | - Code: https://github.com/haomo-ai/Cam4DOcc
515 | 
516 | **Memory-based Adapters for Online 3D Scene Perception**
517 | 
518 | - Paper: https://arxiv.org/abs/2403.06974
519 | - Code: https://github.com/xuxw98/Online3D
520 | 
521 | **Symphonize 3D Semantic Scene Completion with Contextual Instance Queries**
522 | 
523 | - Paper: https://arxiv.org/abs/2306.15670
524 | - Code: https://github.com/hustvl/Symphonies
525 | 
526 | **A Real-world Large-scale Dataset for Roadside Cooperative Perception**
527 | 
528 | - Paper: https://arxiv.org/abs/2403.10145
529 | - Code: https://github.com/AIR-THU/DAIR-RCooper
530 | 
531 | **Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving**
532 | 
533 | - Paper: https://arxiv.org/abs/2403.07535
534 | - Code: https://github.com/Junda24/AFNet
535 | 
536 | **Traffic Scene Parsing through the TSP6K Dataset**
537 | 
538 | - Paper: https://arxiv.org/pdf/2303.02835.pdf
539 | - Code: https://github.com/PengtaoJiang/TSP6K 
540 | 
541 | <a name="3D-Point-Cloud"></a>
542 | 
543 | # 3D点云(3D-Point-Cloud)
544 | 
545 | 
546 | 
547 | <a name="3DOD"></a>
548 | 
549 | # 3D目标检测(3D Object Detection)
550 | 
551 | **PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection**
552 | 
553 | - Paper: https://arxiv.org/abs/2312.08371
554 | - Code: https://github.com/kuanchihhuang/PTT
555 | 
556 | **UniMODE: Unified Monocular 3D Object Detection**
557 | 
558 | - Paper: https://arxiv.org/abs/2402.18573
559 | 
560 | <a name="3DOD"></a>
561 | 
562 | # 3D语义分割(3D Semantic Segmentation)
563 | 
564 | <a name="Image-Editing"></a>
565 | 
566 | # 图像编辑(Image Editing)
567 | 
568 | **Edit One for All: Interactive Batch Image Editing**
569 | 
570 | - Homepage: https://thaoshibe.github.io/edit-one-for-all 
571 | - Paper: https://arxiv.org/abs/2401.10219
572 | - Code: https://github.com/thaoshibe/edit-one-for-all
573 | 
574 | <a name="Video-Editing"></a>
575 | 
576 | # 视频编辑(Video Editing)
577 | 
578 | **MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers**
579 | 
580 | - Homepage:  [https://maskint.github.io](https://maskint.github.io/) 
581 | 
582 | - Paper: https://arxiv.org/abs/2312.12468
583 | 
584 | <a name="LLV"></a>
585 | 
586 | # Low-level Vision
587 | 
588 | **Residual Denoising Diffusion Models**
589 | 
590 | - Paper: https://arxiv.org/abs/2308.13712
591 | - Code: https://github.com/nachifur/RDDM
592 | 
593 | **Boosting Image Restoration via Priors from Pre-trained Models**
594 | 
595 | - Paper: https://arxiv.org/abs/2403.06793
596 | 
597 | <a name="SR"></a>
598 | 
599 | # 超分辨率(Super-Resolution)
600 | 
601 | **SeD: Semantic-Aware Discriminator for Image Super-Resolution**
602 | 
603 | - Paper: https://arxiv.org/abs/2402.19387
604 | - Code: https://github.com/lbc12345/SeD
605 | 
606 | **APISR: Anime Production Inspired Real-World Anime Super-Resolution**
607 | 
608 | - Paper: https://arxiv.org/abs/2403.01598
609 | - Code: https://github.com/Kiteretsu77/APISR 
610 | 
611 | <a name="Denoising"></a>
612 | 
613 | # 去噪(Denoising)
614 | 
615 | ## 图像去噪(Image Denoising)
616 | 
617 | <a name="3D-Human-Pose-Estimation"></a>
618 | 
619 | # 3D人体姿态估计(3D Human Pose Estimation)
620 | 
621 | **Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation**
622 | 
623 | - Paper: https://arxiv.org/abs/2311.12028
624 | - Code: https://github.com/NationalGAILab/HoT 
625 | 
626 | <a name="Image-Generation"></a>
627 | 
628 | # 图像生成(Image Generation)
629 | 
630 | **InstanceDiffusion: Instance-level Control for Image Generation**
631 | 
632 | - Homepage: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/
633 | 
634 | - Paper: https://arxiv.org/abs/2402.03290
635 | - Code: https://github.com/frank-xwang/InstanceDiffusion
636 | 
637 | **ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations**
638 | 
639 | - Homepage: https://eclipse-t2i.vercel.app/
640 | - Paper: https://arxiv.org/abs/2312.04655
641 | 
642 | - Code: https://github.com/eclipse-t2i/eclipse-inference
643 | 
644 | **Instruct-Imagen: Image Generation with Multi-modal Instruction**
645 | 
646 | - Paper: https://arxiv.org/abs/2401.01952
647 | 
648 | **Residual Denoising Diffusion Models**
649 | 
650 | - Paper: https://arxiv.org/abs/2308.13712
651 | - Code: https://github.com/nachifur/RDDM
652 | 
653 | **UniGS: Unified Representation for Image Generation and Segmentation**
654 | 
655 | - Paper: https://arxiv.org/abs/2312.01985
656 | 
657 | **Multi-Instance Generation Controller for Text-to-Image Synthesis**
658 | 
659 | - Paper: https://arxiv.org/abs/2402.05408
660 | - Code: https://github.com/limuloo/migc
661 | 
662 | **SVGDreamer: Text Guided SVG Generation with Diffusion Model**
663 | 
664 | - Paper: https://arxiv.org/abs/2312.16476
665 | - Code: https://ximinng.github.io/SVGDreamer-project/
666 | 
667 | **InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model**
668 | 
669 | - Paper: https://arxiv.org/abs/2312.05849
670 | - Code: https://github.com/jiuntian/interactdiffusion
671 | 
672 | **Ranni: Taming Text-to-Image Diffusion for Accurate Prompt Following**
673 | 
674 | - Paper: https://arxiv.org/abs/2311.17002
675 | - Code: https://github.com/ali-vilab/Ranni
676 | 
677 | <a name="Video-Generation"></a>
678 | 
679 | # 视频生成(Video Generation)
680 | 
681 | **Vlogger: Make Your Dream A Vlog**
682 | 
683 | - Paper: https://arxiv.org/abs/2401.09414
684 | - Code: https://github.com/Vchitect/Vlogger
685 | 
686 | **VBench: Comprehensive Benchmark Suite for Video Generative Models**
687 | 
688 | - Homepage: https://vchitect.github.io/VBench-project/ 
689 | - Paper: https://arxiv.org/abs/2311.17982
690 | - Code: https://github.com/Vchitect/VBench
691 | 
692 | **VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models**
693 | 
694 | - Homeoage: https://video-motion-customization.github.io/ 
695 | - Paper: https://arxiv.org/abs/2312.00845
696 | - Code: https://github.com/HyeonHo99/Video-Motion-Customization
697 | 
698 | <a name="3D-Generation"></a>
699 | 
700 | # 3D生成
701 | 
702 | **CityDreamer: Compositional Generative Model of Unbounded 3D Cities**
703 | 
704 | - Homepage: https://haozhexie.com/project/city-dreamer/ 
705 | - Paper: https://arxiv.org/abs/2309.00610
706 | - Code: https://github.com/hzxie/city-dreamer
707 | 
708 | **LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching**
709 | 
710 | - Paper: https://arxiv.org/abs/2311.11284
711 | - Code: https://github.com/EnVision-Research/LucidDreamer 
712 | 
713 | <a name="Video-Understanding"></a>
714 | 
715 | # 视频理解(Video Understanding)
716 | 
717 | **MVBench: A Comprehensive Multi-modal Video Understanding Benchmark**
718 | 
719 | - Paper: https://arxiv.org/abs/2311.17005
720 | - Code: https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat2 
721 | 
722 | <a name="KD"></a>
723 | 
724 | # 知识蒸馏(Knowledge Distillation)
725 | 
726 | **Logit Standardization in Knowledge Distillation**
727 | 
728 | - Paper: https://arxiv.org/abs/2403.01427
729 | - Code: https://github.com/sunshangquan/logit-standardization-KD
730 | 
731 | **Efficient Dataset Distillation via Minimax Diffusion**
732 | 
733 | - Paper: https://arxiv.org/abs/2311.15529
734 | - Code: https://github.com/vimar-gu/MinimaxDiffusion
735 | 
736 | <a name="Stereo-Matching"></a>
737 | 
738 | # 立体匹配(Stereo Matching)
739 | 
740 | **Neural Markov Random Field for Stereo Matching**
741 | 
742 | - Paper: https://arxiv.org/abs/2403.11193
743 | - Code: https://github.com/aeolusguan/NMRF 
744 | 
745 | <a name="SGG"></a>
746 | 
747 | # 场景图生成(Scene Graph Generation)
748 | 
749 | **HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation**
750 | 
751 | - Homepage: https://zhangce01.github.io/HiKER-SGG/ 
752 | - Paper : https://arxiv.org/abs/2403.12033
753 | - Code: https://github.com/zhangce01/HiKER-SGG
754 | 
755 | <a name="Video-Quality-Assessment"></a>
756 | 
757 | # 视频质量评价(Video Quality Assessment)
758 | 
759 | **KVQ: Kaleidoscope Video Quality Assessment for Short-form Videos**
760 | 
761 | - Homepage: https://lixinustc.github.io/projects/KVQ/ 
762 | 
763 | - Paper: https://arxiv.org/abs/2402.07220
764 | - Code: https://github.com/lixinustc/KVQ-Challenge-CVPR-NTIRE2024
765 | 
766 | <a name="Datasets"></a>
767 | 
768 | # 数据集(Datasets)
769 | 
770 | **A Real-world Large-scale Dataset for Roadside Cooperative Perception**
771 | 
772 | - Paper: https://arxiv.org/abs/2403.10145
773 | - Code: https://github.com/AIR-THU/DAIR-RCooper
774 | 
775 | **Traffic Scene Parsing through the TSP6K Dataset**
776 | 
777 | - Paper: https://arxiv.org/pdf/2303.02835.pdf
778 | - Code: https://github.com/PengtaoJiang/TSP6K 
779 | 
780 | <a name="Others"></a>
781 | 
782 | # 其他(Others)
783 | 
784 | **Object Recognition as Next Token Prediction**
785 | 
786 | - Paper: https://arxiv.org/abs/2312.02142
787 | - Code: https://github.com/kaiyuyue/nxtp
788 | 
789 | **ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks**
790 | 
791 | - Paper: https://arxiv.org/abs/2306.14525
792 | - Code: https://parameternet.github.io/ 
793 | 
794 | **Seamless Human Motion Composition with Blended Positional Encodings**
795 | 
796 | - Paper: https://arxiv.org/abs/2402.15509
797 | - Code: https://github.com/BarqueroGerman/FlowMDM 
798 | 
799 | **LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning**
800 | 
801 | - Homepage:  https://ll3da.github.io/ 
802 | 
803 | - Paper: https://arxiv.org/abs/2311.18651
804 | - Code: https://github.com/Open3DA/LL3DA
805 | 
806 |  **CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update**
807 | 
808 | - Homepage: https://clova-tool.github.io/ 
809 | - Paper: https://arxiv.org/abs/2312.10908
810 | 
811 | **MoMask: Generative Masked Modeling of 3D Human Motions**
812 | 
813 | - Paper: https://arxiv.org/abs/2312.00063
814 | - Code: https://github.com/EricGuo5513/momask-codes
815 | 
816 |  **Amodal Ground Truth and Completion in the Wild**
817 | 
818 | - Homepage: https://www.robots.ox.ac.uk/~vgg/research/amodal/ 
819 | - Paper: https://arxiv.org/abs/2312.17247
820 | - Code: https://github.com/Championchess/Amodal-Completion-in-the-Wild
821 | 
822 | **Improved Visual Grounding through Self-Consistent Explanations**
823 | 
824 | - Paper: https://arxiv.org/abs/2312.04554
825 | - Code: https://github.com/uvavision/SelfEQ
826 | 
827 | **ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object**
828 | 
829 | - Homepage: https://chenshuang-zhang.github.io/imagenet_d/
830 | - Paper: https://arxiv.org/abs/2403.18775
831 | - Code: https://github.com/chenshuang-zhang/imagenet_d
832 | 
833 | **Learning from Synthetic Human Group Activities**
834 | 
835 | - Homepage: https://cjerry1243.github.io/M3Act/ 
836 | - Paper  https://arxiv.org/abs/2306.16772
837 | - Code: https://github.com/cjerry1243/M3Act
838 | 
839 | **A Cross-Subject Brain Decoding Framework**
840 | 
841 | - Homepage: https://littlepure2333.github.io/MindBridge/
842 | - Paper: https://arxiv.org/abs/2404.07850
843 | - Code: https://github.com/littlepure2333/MindBridge
844 | 
845 | **Multi-Task Dense Prediction via Mixture of Low-Rank Experts**
846 | 
847 | - Paper : https://arxiv.org/abs/2403.17749
848 | - Code: https://github.com/YuqiYang213/MLoRE
849 | 
850 | **Contrastive Mean-Shift Learning for Generalized Category Discovery**
851 | 
852 | - Homepage: https://postech-cvlab.github.io/cms/ 
853 | - Paper: https://arxiv.org/abs/2404.09451
854 | - Code: https://github.com/sua-choi/CMS
855 |   


--------------------------------------------------------------------------------
/master:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wyf3/CVPR2024-Papers-with-Code/7a12b2155e596a79ba6dcc7a17a5ae27f0fc50a8/master


--------------------------------------------------------------------------------