# ICCV2021-Papers-with-Code

A collection of [ICCV 2021](http://iccv2021.thecvf.com/) papers and open-source projects (papers with code)!

1617 papers accepted - 25.9% acceptance rate

IDs of the accepted ICCV 2021 papers: https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRfaTmsNweuaA0Gjyu58H_Cx56pGwFhcTYII0u1pg0U7MbhlgY0R6Y-BbK3xFhAiwGZ26u3TAtN5MnS/pubhtml

> Note 1: Issues sharing ICCV 2021 papers and open-source projects are welcome!
>
> Note 2: For papers from past top CV conferences and other curated collections of high-quality CV papers, see: https://github.com/amusi/daily-paper-computer-vision

## ICCV 2021 Papers and Open-Source Projects

- [Backbone](#Backbone)
- [Visual Transformer](#Transformer)
- [Performance-Boosting Tricks](#Cool)
- [GAN](#GAN)
- [NAS](#NAS)
- [NeRF](#NeRF)
- [Loss](#Loss)
- [Zero-Shot Learning](#Zero-Shot-Learning)
- [Few-Shot Learning](#Few-Shot-Learning)
- [Long-tailed](#Long-tailed)
- [Vision and Language](#VL)
- [Unsupervised/Self-Supervised Learning](#Un/Self-Supervised)
- [Multi-Label Image Recognition](#MLIR)
- [2D Object Detection](#Object-Detection)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Medical Image Segmentation](#Medical-Image-Segmentation)
- [Video Object Segmentation](#VOS)
- [Few-shot Segmentation](#Few-shot-Segmentation)
- [Human Motion Segmentation](#HMS)
- [Object Tracking](#Object-Tracking)
- [3D Point Cloud](#3D-Point-Cloud)
- [3D Object Detection](#Point-Cloud-Object-Detection)
- [3D Semantic Segmentation](#Point-Cloud-Semantic-Segmentation)
- [3D Instance Segmentation](#Point-Cloud-Instance-Segmentation)
- [3D Multi-Object Tracking](#Point-Cloud-Multi-Object-Tracking)
- [Point Cloud Denoising](#Point-Cloud-Denoising)
- [Point Cloud Registration](#Point-Cloud-Registration)
- [Point Cloud Completion](#PCC)
- [Radar Semantic Segmentation](#RSS)
- [Image Restoration](#Image-Restoration)
- [Super-Resolution](#Super-Resolution)
- [Denoising](#Denoising)
- [Medical Image Denoising](#Medical-Image-Denoising)
- [Deblurring](#Deblurring)
- [Shadow Removal](#Shadow-Removal)
- [Video Frame Interpolation](#VFI)
- [Video Inpainting](#Video-Inpainting)
- [Person Re-identification](#Re-ID)
- [Person Search](#Person-Search)
- [2D/3D Human Pose Estimation](#Human-Pose-Estimation)
- [6D Object Pose Estimation](#6D-Object)
- [3D Head Reconstruction](#3D-Head-Reconstruction)
- [Face Recognition](#FR)
- [Facial Expression Recognition](#FER)
- [Action Recognition](#Action-Recognition)
- [Temporal Action Localization](#Temporal-Action-Localization)
- [Action Detection](#Action-Detection)
- [Group Activity Recognition](#GAR)
- [Sign Language Recognition](#SLR)
- [Text Detection](#Text-Detection)
- [Text Recognition](#Text-Recognition)
- [Text Replacement](#TR)
- [Visual Question Answering (VQA)](#Visual-Question-Answering)
- [Adversarial Attack](#Adversarial-Attack)
- [Depth Estimation](#Depth-Estimation)
- [Gaze Estimation](#Gaze-Estimation)
- [Crowd Counting](#Crowd-Counting)
- [Lane Detection](#Lane-Detection)
- [Trajectory Prediction](#Trajectory-Prediction)
- [Anomaly Detection](#Anomaly-Detection)
- [Scene Graph Generation](#Scene-Graph-Generation)
- [Image Editing](#Image-Editing)
- [Image Synthesis](#Image-Synthesis)
- [Image Retrieval](#Image-Retrieval)
- [3D Reconstruction](#3D-R)
- [Video Stabilization](#Video-Stabilization)
- [Fine-Grained Recognition](#FGR)
- [Style Transfer](#Style-Transfer)
- [Neural Painting](#Neural-Painting)
- [Feature Matching](#FM)
- [Semantic Correspondence](#Semantic-Correspondence)
- [Edge Detection](#Edge-Detection)
- [Camera Calibration](#Camera-Calibration)
- [Image Quality Assessment](#IQA)
- [Metric Learning](#ML)
- [Unsupervised Domain Adaptation](#UDA)
- [Video Rescaling](#Video-Rescaling)
- [Hand-Object Interaction](#Hand-Object-Interaction)
- [Vision-and-Language Navigation](#VLN)
- [Datasets](#Datasets)
- [Others](#Others)

# Backbone

**Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions**

- Paper(Oral): https://arxiv.org/abs/2102.12122
- Code: https://github.com/whai362/PVT

**AutoFormer: Searching Transformers for Visual Recognition**

- Paper: https://arxiv.org/abs/2107.00651
- Code: https://github.com/microsoft/AutoML

**Bias Loss for Mobile Neural Networks**

- Paper: https://arxiv.org/abs/2107.11170
- Code: None

**Vision Transformer with Progressive Sampling**

- Paper: https://arxiv.org/abs/2108.01684
- Code: https://github.com/yuexy/PS-ViT

**Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet**

- Paper: https://arxiv.org/abs/2101.11986
- Code: https://github.com/yitu-opensource/T2T-ViT

**Rethinking Spatial Dimensions of Vision Transformers**

- Paper: https://arxiv.org/abs/2103.16302
- Code: https://github.com/naver-ai/pit

**Swin Transformer: Hierarchical Vision Transformer using Shifted Windows**

- Paper: https://arxiv.org/abs/2103.14030
- Code: https://github.com/microsoft/Swin-Transformer

**Conformer: Local Features Coupling Global Representations for Visual Recognition**

- Paper: https://arxiv.org/abs/2105.03889
- Code: https://github.com/pengzhiliang/Conformer

**MicroNet: Improving Image Recognition with Extremely Low FLOPs**

- Paper: https://arxiv.org/abs/2108.05894
- Code: https://github.com/liyunsheng13/micronet

**Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition**

- Paper: https://arxiv.org/abs/2102.01063
- Code: https://github.com/idstcv/ZenNAS
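Several of the backbones above (Swin Transformer, PiT, PVT, T2T-ViT) are also packaged in third-party model zoos such as `timm`, which is often the quickest way to try them. A minimal sketch using Swin, assuming `timm` and `torch` are installed and that this model name is available in your `timm` version:

```python
import timm
import torch

# Load a pretrained Swin Transformer classifier (assumes this model
# name exists in the installed timm version).
model = timm.create_model("swin_base_patch4_window7_224", pretrained=True)
model.eval()

# Run a dummy 224x224 RGB image through the network.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # torch.Size([1, 1000]) for ImageNet-1k weights
```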
# Visual Transformer

**Swin Transformer: Hierarchical Vision Transformer using Shifted Windows**

- Paper: https://arxiv.org/abs/2103.14030
- Code: https://github.com/microsoft/Swin-Transformer

**An Empirical Study of Training Self-Supervised Vision Transformers**

- Paper(Oral): https://arxiv.org/abs/2104.02057
- MoCo v3 Code: None

**Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions**

- Paper(Oral): https://arxiv.org/abs/2102.12122
- Code: https://github.com/whai362/PVT

**Group-Free 3D Object Detection via Transformers**

- Paper: https://arxiv.org/abs/2104.00678
- Code: None

**Spatial-Temporal Transformer for Dynamic Scene Graph Generation**

- Paper: https://arxiv.org/abs/2107.12309
- Code: None

**Rethinking and Improving Relative Position Encoding for Vision Transformer**

- Paper: https://arxiv.org/abs/2107.14222
- Code: https://github.com/microsoft/AutoML/tree/main/iRPE

**Emerging Properties in Self-Supervised Vision Transformers**

- Paper: https://arxiv.org/abs/2104.14294
- Code: https://github.com/facebookresearch/dino

**Learning Spatio-Temporal Transformer for Visual Tracking**

- Paper: https://arxiv.org/abs/2103.17154
- Code: https://github.com/researchmm/Stark

**Fast Convergence of DETR with Spatially Modulated Co-Attention**

- Paper: https://arxiv.org/abs/2101.07448
- Code: https://github.com/abc403/SMCA-replication

**Vision Transformer with Progressive Sampling**

- Paper: https://arxiv.org/abs/2108.01684
- Code: https://github.com/yuexy/PS-ViT

**Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet**

- Paper: https://arxiv.org/abs/2101.11986
- Code: https://github.com/yitu-opensource/T2T-ViT

**Rethinking Spatial Dimensions of Vision Transformers**

- Paper: https://arxiv.org/abs/2103.16302
- Code: https://github.com/naver-ai/pit

**The Right to Talk: An Audio-Visual Transformer Approach**

- Paper: https://arxiv.org/abs/2108.03256
- Code: None

**Joint Inductive and Transductive Learning for Video Object Segmentation**

- Paper: https://arxiv.org/abs/2108.03679
- Code: https://github.com/maoyunyao/JOINT

**Conformer: Local Features Coupling Global Representations for Visual Recognition**

- Paper: https://arxiv.org/abs/2105.03889
- Code: https://github.com/pengzhiliang/Conformer

**Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer**

- Paper: https://arxiv.org/abs/2108.03032
- Code: https://github.com/zhiheLu/CWT-for-FSS

**Paint Transformer: Feed Forward Neural Painting with Stroke Prediction**

- Paper: https://arxiv.org/abs/2108.03798
- Code: https://github.com/wzmsltw/PaintTransformer

**Conditional DETR for Fast Training Convergence**

- Paper: https://arxiv.org/abs/2108.06152
- Code: https://github.com/Atten4Vis/ConditionalDETR

**MUSIQ: Multi-scale Image Quality Transformer**

- Paper: https://arxiv.org/abs/2108.05997
- Code: https://github.com/google-research/google-research/tree/master/musiq

**SOTR: Segmenting Objects with Transformers**

- Paper: https://arxiv.org/abs/2108.06747
- Code: https://github.com/easton-cau/SOTR

**PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers**

- Paper(Oral): https://arxiv.org/abs/2108.08839
- Code: https://github.com/yuxumin/PoinTr
**SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer**

- Paper: https://arxiv.org/abs/2108.04444
- Code: https://github.com/AllenXiangX/SnowflakeNet

**Improving 3D Object Detection with Channel-wise Transformer**

- Paper: https://arxiv.org/abs/2108.10723
- Code: https://github.com/hlsheng1/CT3D

**TransFER: Learning Relation-aware Facial Expression Representations with Transformers**

- Paper: https://arxiv.org/abs/2108.11116
- Code: None

**GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer**

- Paper: https://arxiv.org/abs/2108.12630
- Code: https://github.com/xueyee/GroupFormer

**Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction**

- Paper: https://arxiv.org/abs/2109.00512
- Code: https://github.com/facebookresearch/co3d
- Dataset: https://github.com/facebookresearch/co3d

**Voxel Transformer for 3D Object Detection**

- Paper: https://arxiv.org/abs/2109.02497
- Code: None

**3D Human Texture Estimation from a Single Image with Transformers**

- Homepage: https://www.mmlab-ntu.com/project/texformer/
- Paper(Oral): https://arxiv.org/abs/2109.02563
- Code: None

**FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting**

- Paper: https://arxiv.org/abs/2109.02974
- Code: https://github.com/ruiliu-ai/FuseFormer

**CTRL-C: Camera calibration TRansformer with Line-Classification**

- Paper: https://arxiv.org/abs/2109.02259
- Code: https://github.com/jwlee-vcl/CTRL-C

**An End-to-End Transformer Model for 3D Object Detection**

- Homepage: https://facebookresearch.github.io/3detr/
- Paper: https://arxiv.org/abs/2109.08141
- Code: https://github.com/facebookresearch/3detr

**Eformer: Edge Enhancement based Transformer for Medical Image Denoising**

- Paper: https://arxiv.org/abs/2109.08044
- Code: None

**PnP-DETR: Towards Efficient Visual Analysis with Transformers**

- Paper: https://arxiv.org/abs/2109.07036
- Code: https://github.com/twangnh/pnp-detr

**Transformer-based Dual Relation Graph for Multi-label Image Recognition**

- Paper: https://arxiv.org/abs/2110.04722
- Code: None

# Performance-Boosting Tricks

**FaPN: Feature-aligned Pyramid Network for Dense Image Prediction**

- Paper: https://arxiv.org/abs/2108.07058
- Code: https://github.com/EMI-Group/FaPN

**Unifying Nonlocal Blocks for Neural Networks**

- Paper: https://arxiv.org/abs/2108.02451
- Code: https://github.com/zh460045050/SNL_ICCV2021

**Towards Learning Spatially Discriminative Feature Representations**

- Paper: https://arxiv.org/abs/2109.01359
- Code: None

# GAN

**Labels4Free: Unsupervised Segmentation using StyleGAN**

- Homepage: https://rameenabdal.github.io/Labels4Free/
- Paper: https://arxiv.org/abs/2103.14968

**GNeRF: GAN-based Neural Radiance Field without Posed Camera**

- Paper(Oral): https://arxiv.org/abs/2103.15606
- Code: https://github.com/MQ66/gnerf

**EigenGAN: Layer-Wise Eigen-Learning for GANs**

- Paper: https://arxiv.org/abs/2104.12476
- Code: https://github.com/LynnHo/EigenGAN-Tensorflow
**From Continuity to Editability: Inverting GANs with Consecutive Images**

- Paper: https://arxiv.org/abs/2107.13812
- Code: https://github.com/Qingyang-Xu/InvertingGANs_with_ConsecutiveImgs

**Sketch Your Own GAN**

- Homepage: https://peterwang512.github.io/GANSketching/
- Paper: https://arxiv.org/abs/2108.02774
- Code: https://github.com/peterwang512/GANSketching

**Manifold Matching via Deep Metric Learning for Generative Modeling**

- Paper: https://arxiv.org/abs/2106.10777
- Code: https://github.com/dzld00/pytorch-manifold-matching

**Dual Projection Generative Adversarial Networks for Conditional Image Generation**

- Paper: https://arxiv.org/abs/2108.09016
- Code: None

**GAN Inversion for Out-of-Range Images with Geometric Transformations**

- Paper: https://arxiv.org/abs/2108.08998
- Code: None

**ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement**

- Homepage: https://yuval-alaluf.github.io/restyle-encoder/
- Paper: https://arxiv.org/abs/2104.02699
- Code: https://github.com/yuval-alaluf/restyle-encoder

**StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery**

- Paper(Oral): https://arxiv.org/abs/2103.17249
- Code: https://github.com/orpatashnik/StyleCLIP

**Image Synthesis via Semantic Composition**

- Homepage: https://shepnerd.github.io/scg/
- Paper: https://arxiv.org/abs/2109.07053
- Code: https://github.com/dvlab-research/SCGAN

# NAS

**AutoFormer: Searching Transformers for Visual Recognition**

- Paper: https://arxiv.org/abs/2107.00651
- Code: https://github.com/microsoft/AutoML

**BN-NAS: Neural Architecture Search with Batch Normalization**

- Paper: https://arxiv.org/abs/2108.07375
- Code: https://github.com/bychen515/BNNAS

**Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition**

- Paper: https://arxiv.org/abs/2102.01063
- Code: https://github.com/idstcv/ZenNAS

# NeRF

**GNeRF: GAN-based Neural Radiance Field without Posed Camera**

- Paper(Oral): https://arxiv.org/abs/2103.15606
- Code: https://github.com/MQ66/gnerf

**KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs**

- Paper: https://arxiv.org/abs/2103.13744
- Code: https://github.com/creiser/kilonerf

**In-Place Scene Labelling and Understanding with Implicit Scene Representation**

- Homepage: https://shuaifengzhi.com/Semantic-NeRF/
- Paper(Oral): https://arxiv.org/abs/2103.15875

**Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis**

- Homepage: https://ajayj.com/dietnerf
- Paper(DietNeRF): https://arxiv.org/abs/2104.00677

**BARF: Bundle-Adjusting Neural Radiance Fields**

- Homepage: https://chenhsuanlin.bitbucket.io/bundle-adjusting-NeRF/
- Paper(Oral): https://arxiv.org/abs/2104.06405
- Code: https://github.com/chenhsuanlin/bundle-adjusting-NeRF

**Self-Calibrating Neural Radiance Fields**

- Paper: https://arxiv.org/abs/2108.13826
- Code: https://github.com/POSTECH-CVLab/SCNeRF

**Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction**

- Paper: https://arxiv.org/abs/2109.00512
- Code: https://github.com/facebookresearch/co3d
- Dataset: https://github.com/facebookresearch/co3d

**Neural Articulated Radiance Field**

- Paper: https://arxiv.org/abs/2104.03110
- Code: https://github.com/nogu-atsu/NARF

**NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo**

- Paper(Oral): https://arxiv.org/abs/2109.01129
- Code: https://github.com/weiyithu/NerfingMVS

**SNARF: Differentiable Forward Skinning for Animating Non-rigid Neural Implicit Shapes**

- Homepage: https://xuchen-ethz.github.io/snarf
- Paper: https://arxiv.org/abs/2104.03953
- Code: https://github.com/xuchen-ethz/snarf

**CodeNeRF: Disentangled Neural Radiance Fields for Object Categories**

- Paper: https://arxiv.org/abs/2109.01750
- Code: https://github.com/wayne1123/code-nerf

**PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering**

- Paper: https://openaccess.thecvf.com/content/ICCV2021/html/Ren_PIRenderer_Controllable_Portrait_Image_Generation_via_Semantic_Neural_Rendering_ICCV_2021_paper.html
- Code: https://github.com/RenYurui/PIRender
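Most of the NeRF variants above inherit the original NeRF recipe: an MLP maps frequency-encoded 5D coordinates (position plus view direction) to color and density. A minimal sketch of that frequency (positional) encoding in PyTorch; the function name and the `num_freqs` default are illustrative, not taken from any of the repos above:

```python
import math
import torch

def frequency_encode(x: torch.Tensor, num_freqs: int = 10) -> torch.Tensor:
    """NeRF-style encoding: x -> [sin(2^k * pi * x), cos(2^k * pi * x)], k = 0..num_freqs-1."""
    freqs = (2.0 ** torch.arange(num_freqs)) * math.pi  # (num_freqs,)
    angles = x.unsqueeze(-1) * freqs                    # (..., D, num_freqs)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return enc.flatten(-2)                              # (..., D * 2 * num_freqs)

xyz = torch.rand(1024, 3)           # sampled 3D points along camera rays
print(frequency_encode(xyz).shape)  # torch.Size([1024, 60])
```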
# Loss

**Rank & Sort Loss for Object Detection and Instance Segmentation**

- Paper(Oral): https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss

**Bias Loss for Mobile Neural Networks**

- Paper: https://arxiv.org/abs/2107.11170
- Code: None

**A Robust Loss for Point Cloud Registration**

- Paper: https://arxiv.org/abs/2108.11682
- Code: None

**Reconcile Prediction Consistency for Balanced Object Detection**

- Paper: https://arxiv.org/abs/2108.10809
- Code: None

**Influence-Balanced Loss for Imbalanced Visual Classification**

- Paper: https://arxiv.org/abs/2110.02444
- Code: https://github.com/pseulki/IB-Loss

# Zero-Shot Learning

**FREE: Feature Refinement for Generalized Zero-Shot Learning**

- Paper: https://arxiv.org/abs/2107.13807
- Code: https://github.com/shiming-chen/FREE

**Discriminative Region-based Multi-Label Zero-Shot Learning**

- Paper: https://arxiv.org/abs/2108.09301
- Code: None

**Semantics Disentangling for Generalized Zero-Shot Learning**

- Paper: https://arxiv.org/abs/2101.07978
- Code: https://github.com/uqzhichen/SDGZSL

# Few-Shot Learning

**Relational Embedding for Few-Shot Classification**

- Paper: https://arxiv.org/abs/2108.09666
- Code: https://github.com/dahyun-kang/renet

**Few-Shot and Continual Learning with Attentive Independent Mechanisms**

- Paper: https://arxiv.org/abs/2107.14053
- Code: https://github.com/huang50213/AIM-Fewshot-Continual

**Few Shot Visual Relationship Co-Localization**

- Homepage: https://vl2g.github.io/projects/vrc/
- Paper: https://arxiv.org/abs/2108.11618

# Long-tailed

**Parametric Contrastive Learning**

- Paper: https://arxiv.org/abs/2107.12028
- Code: https://github.com/jiequancui/Parametric-Contrastive-Learning

**Influence-Balanced Loss for Imbalanced Visual Classification**

- Paper: https://arxiv.org/abs/2110.02444
- Code: https://github.com/pseulki/IB-Loss
# Vision and Language

**VLGrammar: Grounded Grammar Induction of Vision and Language**

- Paper: https://arxiv.org/abs/2103.12975
- Code: https://github.com/evelinehong/VLGrammar

# Unsupervised/Self-Supervised Learning

**An Empirical Study of Training Self-Supervised Vision Transformers**

- Paper(Oral): https://arxiv.org/abs/2104.02057
- MoCo v3 Code: None

**DetCo: Unsupervised Contrastive Learning for Object Detection**

- Paper: https://arxiv.org/abs/2102.04803
- Code: https://github.com/xieenze/DetCo

**Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization**

- Paper: https://arxiv.org/abs/2108.02183
- Code: None

**Improving Contrastive Learning by Visualizing Feature Transformation**

- Paper(Oral): https://arxiv.org/abs/2108.02982
- Code: https://github.com/DTennant/CL-Visualizing-Feature-Transformation

**Self-Supervised Visual Representations Learning by Contrastive Mask Prediction**

- Paper: https://arxiv.org/abs/2108.08012
- Code: None

**Temporal Knowledge Consistency for Unsupervised Visual Representation Learning**

- Paper: https://arxiv.org/abs/2108.10668
- Code: None

**MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving**

- Paper: https://arxiv.org/abs/2108.12178
- Code: https://github.com/KaiChen1998/MultiSiam

**Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds**

- Homepage: https://siyuanhuang.com/STRL/
- Paper: https://arxiv.org/abs/2109.00179
- Code: https://github.com/yichen928/STRL

**Self-supervised Product Quantization for Deep Unsupervised Image Retrieval**

- Paper: https://arxiv.org/abs/2109.02244
- Code: https://github.com/youngkyunJang/SPQ

**Self-Supervised Representation Learning from Flow Equivariance**

- Paper: https://arxiv.org/abs/2101.06553
- Code: None
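One self-supervised ViT from this collection, DINO ("Emerging Properties in Self-Supervised Vision Transformers", listed under Visual Transformer above), publishes pretrained weights through `torch.hub`, which makes its representations easy to reuse. A minimal sketch, assuming the official `facebookresearch/dino` hub entry is reachable:

```python
import torch

# Load a ViT-S/16 pretrained with DINO self-distillation
# (hub entry from the official facebookresearch/dino repo).
model = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
model.eval()

# Extract a 384-dim global (CLS-token) feature for one image.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    feat = model(x)
print(feat.shape)  # torch.Size([1, 384])
```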
# Multi-Label Image Recognition

**Residual Attention: A Simple but Effective Method for Multi-Label Recognition**

- Paper: https://arxiv.org/abs/2108.02456
- Code: https://github.com/Kevinz-code/CSRA

# 2D Object Detection

**DetCo: Unsupervised Contrastive Learning for Object Detection**

- Paper: https://arxiv.org/abs/2102.04803
- Code: https://github.com/xieenze/DetCo

**Detecting Invisible People**

- Homepage: http://www.cs.cmu.edu/~tkhurana/invisible.htm
- Paper: https://arxiv.org/abs/2012.08419

**Active Learning for Deep Object Detection via Probabilistic Modeling**

- Paper: https://arxiv.org/abs/2103.16130
- Code: None

**Conditional Variational Capsule Network for Open Set Recognition**

- Paper: https://arxiv.org/abs/2104.09159
- Code: https://github.com/guglielmocamporese/cvaecaposr

**MDETR: Modulated Detection for End-to-End Multi-Modal Understanding**

- Homepage: https://ashkamath.github.io/mdetr_page/
- Paper(Oral): https://arxiv.org/abs/2104.12763
- Code: https://github.com/ashkamath/mdetr

**Rank & Sort Loss for Object Detection and Instance Segmentation**

- Paper(Oral): https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss

**SimROD: A Simple Adaptation Method for Robust Object Detection**

- Paper(Oral): https://arxiv.org/abs/2107.13389
- Code: None

**GraphFPN: Graph Feature Pyramid Network for Object Detection**

- Paper: https://arxiv.org/abs/2108.00580
- Code: None

**Fast Convergence of DETR with Spatially Modulated Co-Attention**

- Paper: https://arxiv.org/abs/2101.07448
- Code: https://github.com/abc403/SMCA-replication

**Conditional DETR for Fast Training Convergence**

- Paper: https://arxiv.org/abs/2108.06152
- Code: https://github.com/Atten4Vis/ConditionalDETR

**TOOD: Task-aligned One-stage Object Detection**

- Paper(Oral): https://arxiv.org/abs/2108.07755
- Code: https://github.com/fcjian/TOOD

**Reconcile Prediction Consistency for Balanced Object Detection**

- Paper: https://arxiv.org/abs/2108.10809
- Code: None

**Mutual Supervision for Dense Object Detection**

- Paper: https://arxiv.org/abs/2109.05986
- Code: https://github.com/MCG-NJU/MuSu-Detection

**PnP-DETR: Towards Efficient Visual Analysis with Transformers**

- Paper: https://arxiv.org/abs/2109.07036
- Code: https://github.com/twangnh/pnp-detr

**Deep Structured Instance Graph for Distilling Object Detectors**

- Paper: https://arxiv.org/abs/2109.12862
- Code: https://github.com/dvlab-research/Dsig

## Semi-Supervised Object Detection

**End-to-End Semi-Supervised Object Detection with Soft Teacher**

- Paper: https://arxiv.org/abs/2106.09018
- Code: None

## Rotated Object Detection

**Oriented R-CNN for Object Detection**

- Paper: https://arxiv.org/abs/2108.05699
- Code: https://github.com/jbwang1997/OBBDetection

## Few-Shot Object Detection

**DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection**

- Paper: https://arxiv.org/abs/2108.09017
- Code: https://github.com/er-muyue/DeFRCN
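Several detectors in this section (SMCA, Conditional DETR, PnP-DETR) are faster-converging or more efficient takes on DETR. For context, the original DETR baseline they modify can be pulled from `torch.hub`; a minimal sketch, assuming the `facebookresearch/detr` hub entry is reachable:

```python
import torch

# DETR-R50 baseline that the DETR variants above build on.
model = torch.hub.load("facebookresearch/detr", "detr_resnet50", pretrained=True)
model.eval()

x = torch.randn(1, 3, 800, 1066)  # one resized input image
with torch.no_grad():
    out = model(x)
# 100 object queries: class logits and normalized (cx, cy, w, h) boxes,
# shaped (1, 100, num_classes + 1) and (1, 100, 4).
print(out["pred_logits"].shape, out["pred_boxes"].shape)
```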
# Semantic Segmentation

**Personalized Image Semantic Segmentation**

- Paper: https://arxiv.org/abs/2107.13978
- Code: https://github.com/zhangyuygss/PIS
- Dataset: https://github.com/zhangyuygss/PIS

**Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation**

- Paper(Oral): https://arxiv.org/abs/2107.11264
- Code: https://github.com/shjung13/Standardized-max-logits

**Enhanced Boundary Learning for Glass-like Object Segmentation**

- Paper: https://arxiv.org/abs/2103.15734
- Code: https://github.com/hehao13/EBLNet

**Self-Regulation for Semantic Segmentation**

- Paper: https://arxiv.org/abs/2108.09702
- Code: https://github.com/dongzhang89/SR-SS

**Mining Contextual Information Beyond Image for Semantic Segmentation**

- Paper: https://arxiv.org/abs/2108.11819
- Code: https://github.com/CharlesPikachu/mcibi

**ISNet: Integrate Image-Level and Semantic-Level Context for Semantic Segmentation**

- Paper: https://arxiv.org/abs/2108.12382
- Code: https://github.com/SegmentationBLWX/sssegmentation

**Scaling up instance annotation via label propagation**

- Homepage: http://scaling-anno.csail.mit.edu/
- Paper: https://arxiv.org/abs/2110.02277
- Code: None

## Unsupervised Domain Adaptation Semantic Segmentation

**Multi-Anchor Active Domain Adaptation for Semantic Segmentation**

- Paper(Oral): https://arxiv.org/abs/2108.08012
- Code: https://github.com/munanning/MADA

**Generalize then Adapt: Source-Free Domain Adaptive Semantic Segmentation**

- Homepage: https://sites.google.com/view/sfdaseg
- Paper: https://arxiv.org/abs/2108.11249

## Few-Shot Semantic Segmentation

**Learning Meta-class Memory for Few-Shot Semantic Segmentation**

- Paper: https://arxiv.org/abs/2108.02958
- Code: None

**Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer**

- Paper: https://arxiv.org/abs/2108.03032
- Code: https://github.com/zhiheLu/CWT-for-FSS

## Semi-Supervised Semantic Segmentation

**Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation**

- Paper(Oral): https://arxiv.org/abs/2107.11279
- Code: https://github.com/CVMI-Lab/DARS

**Pixel Contrastive-Consistent Semi-Supervised Semantic Segmentation**

- Paper: https://arxiv.org/abs/2108.09025
- Code: None

## Weakly Supervised Semantic Segmentation

**Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation**

- Paper: https://arxiv.org/abs/2107.11787
- Code: None

**Complementary Patch for Weakly Supervised Semantic Segmentation**

- Paper: https://arxiv.org/abs/2108.03852
- Code: None

## Unsupervised Segmentation

**Labels4Free: Unsupervised Segmentation using StyleGAN**

- Homepage: https://rameenabdal.github.io/Labels4Free/
- Paper: https://arxiv.org/abs/2103.14968

# Instance Segmentation

**Instances as Queries**

- Paper: https://arxiv.org/abs/2105.01928
- Code: https://github.com/hustvl/QueryInst

**Crossover Learning for Fast Online Video Instance Segmentation**

- Paper: https://arxiv.org/abs/2104.05970
- Code: https://github.com/hustvl/CrossVIS

**Rank & Sort Loss for Object Detection and Instance Segmentation**

- Paper(Oral): https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss

**SOTR: Segmenting Objects with Transformers**

- Paper: https://arxiv.org/abs/2108.06747
- Code: https://github.com/easton-cau/SOTR

**Scaling up instance annotation via label propagation**

- Homepage: http://scaling-anno.csail.mit.edu/
- Paper: https://arxiv.org/abs/2110.02277
- Code: None
# Medical Image Segmentation

**Recurrent Mask Refinement for Few-Shot Medical Image Segmentation**

- Paper: https://arxiv.org/abs/2108.00622
- Code: https://github.com/uci-cbcl/RP-Net

# Video Object Segmentation

**Hierarchical Memory Matching Network for Video Object Segmentation**

- Paper: https://arxiv.org/abs/2109.11404
- Code: https://github.com/Hongje/HMMN

**Full-Duplex Strategy for Video Object Segmentation**

- Homepage: http://dpfan.net/FSNet/
- Paper: https://arxiv.org/abs/2108.03151
- Code: https://github.com/GewelsJI/FSNet

**Joint Inductive and Transductive Learning for Video Object Segmentation**

- Paper: https://arxiv.org/abs/2108.03679
- Code: https://github.com/maoyunyao/JOINT

# Few-shot Segmentation

**Mining Latent Classes for Few-shot Segmentation**

- Paper(Oral): https://arxiv.org/abs/2103.15402
- Code: https://github.com/LiheYoung/MiningFSS

# Human Motion Segmentation

**Graph Constrained Data Representation Learning for Human Motion Segmentation**

- Paper: https://arxiv.org/abs/2107.13362
- Code: None

# Object Tracking

**Learning to Track Objects from Unlabeled Videos**

- Paper: https://arxiv.org/abs/2108.12711
- Code: https://github.com/VISION-SJTU/USOT

**Learning Spatio-Temporal Transformer for Visual Tracking**

- Paper: https://arxiv.org/abs/2103.17154
- Code: https://github.com/researchmm/Stark

**Learning to Adversarially Blur Visual Object Tracking**

- Paper: https://arxiv.org/abs/2107.12085
- Code: https://github.com/tsingqguo/ABA

**HiFT: Hierarchical Feature Transformer for Aerial Tracking**

- Paper: https://arxiv.org/abs/2108.00202
- Code: https://github.com/vision4robotics/HiFT

**Learn to Match: Automatic Matching Network Design for Visual Tracking**

- Paper: https://arxiv.org/abs/2108.00803
- Code: https://github.com/JudasDie/SOTS

**Saliency-Associated Object Tracking**

- Paper: https://arxiv.org/abs/2108.03637
- Code: https://github.com/ZikunZhou/SAOT

## RGBD Object Tracking

**DepthTrack: Unveiling the Power of RGBD Tracking**

- Paper: https://arxiv.org/abs/2108.13962
- Code: https://github.com/xiaozai/DeT
- Dataset: https://github.com/xiaozai/DeT

# 3D Point Cloud

**Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds**

- Homepage: https://siyuanhuang.com/STRL/
- Paper: https://arxiv.org/abs/2109.00179
- Code: https://github.com/yichen928/STRL

**Unsupervised Point Cloud Pre-training via Occlusion Completion**

- Homepage: https://hansen7.github.io/OcCo/
- Paper: https://arxiv.org/abs/2010.01089
- Code: https://github.com/hansen7/OcCo

**DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation**

- Paper: https://arxiv.org/abs/2108.04023
- Code: None

**Adaptive Graph Convolution for Point Cloud Analysis**

- Paper: https://arxiv.org/abs/2108.08035
- Code: https://github.com/hrzhou2/AdaptConv-master
## 3D Object Detection

**Group-Free 3D Object Detection via Transformers**

- Paper: https://arxiv.org/abs/2104.00678
- Code: None

**Improving 3D Object Detection with Channel-wise Transformer**

- Paper: https://arxiv.org/abs/2108.10723
- Code: https://github.com/hlsheng1/CT3D

**AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection**

- Paper: https://arxiv.org/abs/2108.11127
- Code: https://github.com/zongdai/AutoShape

**4D-Net for Learned Multi-Modal Alignment**

- Paper: https://arxiv.org/abs/2109.01066
- Code: None

**Voxel Transformer for 3D Object Detection**

- Paper: https://arxiv.org/abs/2109.02497
- Code: None

**Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection**

- Paper: https://arxiv.org/abs/2109.02499
- Code: None

**An End-to-End Transformer Model for 3D Object Detection**

- Homepage: https://facebookresearch.github.io/3detr/
- Paper: https://arxiv.org/abs/2109.08141
- Code: https://github.com/facebookresearch/3detr

**RangeDet: In Defense of Range View for LiDAR-based 3D Object Detection**

- Paper: https://arxiv.org/abs/2103.10039
- Code: https://github.com/TuSimple/RangeDet

**Geometry-based Distance Decomposition for Monocular 3D Object Detection**

- Paper: https://arxiv.org/abs/2104.03775
- Code: https://github.com/Rock-100/MonoDet

## 3D Semantic Segmentation

**ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation**

- Paper: https://arxiv.org/abs/2107.11769
- Code: None

**Learning with Noisy Labels for Robust Point Cloud Segmentation**

- Homepage: https://shuquanye.com/PNAL_website/
- Paper(Oral): https://arxiv.org/abs/2107.14230

**VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation**

- Paper(Oral): https://arxiv.org/abs/2107.13824
- Code: https://github.com/hzykent/VMNet

**Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal Learning in Domain Adaptation for 3D Semantic Segmentation**

- Paper: https://arxiv.org/abs/2107.14724
- Code: https://github.com/leolyj/DsCML

**DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation**

- Paper: https://arxiv.org/abs/2108.04023
- Code: None

**Adaptive Graph Convolution for Point Cloud Analysis**

- Paper: https://arxiv.org/abs/2108.08035
- Code: https://github.com/hrzhou2/AdaptConv-master

**Perception-Aware Multi-Sensor Fusion for 3D LiDAR Semantic Segmentation**

- Paper: https://arxiv.org/abs/2106.15277
- Code: https://github.com/ICEORY/PMF

## 3D Instance Segmentation

**Hierarchical Aggregation for 3D Instance Segmentation**

- Paper: https://arxiv.org/abs/2108.02350
- Code: https://github.com/hustvl/HAIS

**Instance Segmentation in 3D Scenes Using Semantic Superpoint Tree Networks**

- Paper: https://openaccess.thecvf.com/content/ICCV2021/html/Liang_Instance_Segmentation_in_3D_Scenes_Using_Semantic_Superpoint_Tree_Networks_ICCV_2021_paper.html
- Code: https://github.com/Gorilla-Lab-SCUT/SSTNet
## 3D Multi-Object Tracking

**Exploring Simple 3D Multi-Object Tracking for Autonomous Driving**

- Paper: https://arxiv.org/abs/2108.10312
- Code: https://github.com/qcraftai/simtrack

## Point Cloud Denoising

**Score-Based Point Cloud Denoising**

- Paper: https://arxiv.org/abs/2107.10981
- Code: None

## Point Cloud Registration

**HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration**

- Homepage: https://ispc-group.github.io/hregnet
- Paper: https://arxiv.org/abs/2107.11992
- Code: https://github.com/ispc-lab/HRegNet

**A Robust Loss for Point Cloud Registration**

- Paper: https://arxiv.org/abs/2108.11682
- Code: None

## Point Cloud Completion

**PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers**

- Paper(Oral): https://arxiv.org/abs/2108.08839
- Code: https://github.com/yuxumin/PoinTr

**SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer**

- Paper: https://arxiv.org/abs/2108.04444
- Code: https://github.com/AllenXiangX/SnowflakeNet
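Completion methods such as PoinTr and SnowflakeNet are commonly trained and evaluated with the Chamfer distance between the completed and ground-truth clouds (papers differ on L1 vs. squared-L2 variants). A minimal, brute-force sketch of the symmetric L2 variant:

```python
import torch

def chamfer_distance(p1: torch.Tensor, p2: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance between clouds p1 (B, N, 3) and p2 (B, M, 3)."""
    d = torch.cdist(p1, p2)  # (B, N, M) pairwise Euclidean distances
    # For each point, the distance to its nearest neighbor in the other cloud.
    return d.min(dim=2).values.mean(dim=1) + d.min(dim=1).values.mean(dim=1)

pred, gt = torch.rand(2, 1024, 3), torch.rand(2, 2048, 3)
print(chamfer_distance(pred, gt).shape)  # torch.Size([2]), one value per batch item
```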
# Radar Semantic Segmentation

**Multi-View Radar Semantic Segmentation**

- Paper: https://arxiv.org/abs/2103.16214
- Code: https://github.com/valeoai/MVRSS

# Image Restoration

**Dynamic Attentive Graph Learning for Image Restoration**

- Paper: https://arxiv.org/abs/2109.06620
- Code: https://github.com/jianzhangcs/DAGL

# Super-Resolution

**Learning for Scale-Arbitrary Super-Resolution from Scale-Specific Networks**

- Paper: https://arxiv.org/abs/2004.03791
- Code: https://github.com/LongguangWang/ArbSR

**Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution**

- Paper: https://arxiv.org/abs/2108.05302
- Code: https://github.com/JingyunLiang/MANet

**Deep Reparametrization of Multi-Frame Super-Resolution and Denoising**

- Paper(Oral): https://arxiv.org/abs/2108.08286
- Code: None

**Dual-Camera Super-Resolution with Aligned Attention Modules**

- Homepage: https://tengfei-wang.github.io/Dual-Camera-SR/index.html
- Paper: https://arxiv.org/abs/2109.01349
- Code: https://github.com/Tengfei-Wang/DualCameraSR
- Dataset: https://tengfei-wang.github.io/Dual-Camera-SR/index.html

**Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme**

- Paper: https://www4.comp.polyu.edu.hk/~cslzhang/paper/ICCV21_RealVSR.pdf
- Code: https://github.com/IanYeung/RealVSR
- Dataset: https://github.com/IanYeung/RealVSR

# Denoising

**Deep Reparametrization of Multi-Frame Super-Resolution and Denoising**

- Paper(Oral): https://arxiv.org/abs/2108.08286
- Code: None

**Rethinking Deep Image Prior for Denoising**

- Paper: https://arxiv.org/abs/2108.12841
- Code: https://github.com/gistvision/DIP-denosing

# Medical Image Denoising

**Eformer: Edge Enhancement based Transformer for Medical Image Denoising**

- Paper: https://arxiv.org/abs/2109.08044
- Code: None

# Deblurring

**Rethinking Coarse-to-Fine Approach in Single Image Deblurring**

- Paper: https://arxiv.org/abs/2108.05054
- Code: https://github.com/chosj95/MIMO-UNet

**Single Image Defocus Deblurring Using Kernel-Sharing Parallel Atrous Convolutions**

- Paper: https://arxiv.org/abs/2108.09108
- Code: None

# Shadow Removal

**CANet: A Context-Aware Network for Shadow Removal**

- Paper: https://arxiv.org/abs/2108.09894
- Code: https://github.com/Zipei-Chen/CANet

# Video Frame Interpolation

**XVFI: eXtreme Video Frame Interpolation**

- Paper(Oral): https://arxiv.org/abs/2103.16206
- Code: https://github.com/JihyongOh/XVFI
- Dataset: https://github.com/JihyongOh/XVFI

**Asymmetric Bilateral Motion Estimation for Video Frame Interpolation**

- Paper: https://arxiv.org/abs/2108.06815
- Code: https://github.com/JunHeum/ABME

# Video Inpainting

**FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting**

- Paper: https://arxiv.org/abs/2109.02974
- Code: https://github.com/ruiliu-ai/FuseFormer

# Person Re-identification

**TransReID: Transformer-based Object Re-Identification**

- Paper: https://arxiv.org/abs/2102.04378
- Code: https://github.com/heshuting555/TransReID

**IDM: An Intermediate Domain Module for Domain Adaptive Person Re-ID**

- Paper(Oral): https://arxiv.org/abs/2108.02413
- Code: https://github.com/SikaStar/IDM

# Person Search

**Weakly Supervised Person Search with Region Siamese Networks**

- Paper: https://arxiv.org/abs/2109.06109
- Code: None

# 2D/3D Human Pose Estimation

## 2D Human Pose Estimation

**Human Pose Regression with Residual Log-likelihood Estimation**

- Paper(Oral): https://arxiv.org/abs/2107.11291
- Code(RLE): https://github.com/Jeff-sjtu/res-loglikelihood-regression

**Online Knowledge Distillation for Efficient Pose Estimation**

- Paper: https://arxiv.org/abs/2108.02092
- Code: None

## 3D Human Pose Estimation

**Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows**

- Paper: https://arxiv.org/abs/2107.13788
- Code: https://github.com/twehrbein/Probabilistic-Monocular-3D-Human-Pose-Estimation-with-Normalizing-Flows

**Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images**

- Paper: https://arxiv.org/abs/2109.05885
- Code: None
# 6D Object Pose Estimation

**StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation**

- Paper: https://arxiv.org/abs/2109.10115
- Code: None
- Dataset: None

# 3D Head Reconstruction

**H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction**

- Homepage: https://crisalixsa.github.io/h3d-net/
- Paper: https://arxiv.org/abs/2107.12512

# Face Recognition

**SynFace: Face Recognition with Synthetic Data**

- Paper: https://arxiv.org/abs/2108.07960
- Code: None

# Facial Expression Recognition

**TransFER: Learning Relation-aware Facial Expression Representations with Transformers**

- Paper: https://arxiv.org/abs/2108.11116
- Code: None

# Action Recognition

**MGSampler: An Explainable Sampling Strategy for Video Action Recognition**

- Paper: https://arxiv.org/abs/2104.09952
- Code: None

**Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition**

- Paper: https://arxiv.org/abs/2107.12213
- Code: https://github.com/Uason-Chen/CTR-GCN

**Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization**

- Paper: https://arxiv.org/abs/2108.02183
- Code: None

**Dynamic Network Quantization for Efficient Video Inference**

- Homepage: https://cs-people.bu.edu/sunxm/VideoIQ/project.html
- Paper: https://arxiv.org/abs/2108.10394
- Code: https://github.com/sunxm2357/VideoIQ

# Temporal Action Localization

**Enriching Local and Global Contexts for Temporal Action Localization**

- Paper: https://arxiv.org/abs/2107.12960
- Code: None

# Action Detection

**Class Semantics-based Attention for Action Detection**

- Paper: https://arxiv.org/abs/2109.02613
- Code: None

# Group Activity Recognition

**GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer**

- Paper: https://arxiv.org/abs/2108.12630
- Code: https://github.com/xueyee/GroupFormer

# Sign Language Recognition

**Visual Alignment Constraint for Continuous Sign Language Recognition**

- Paper: https://arxiv.org/abs/2104.02330
- Code: https://github.com/ycmin95/VAC_CSLR

# Text Detection

**Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection**

- Paper: https://arxiv.org/abs/2107.12664
- Code: https://github.com/GXYM/TextBPN

# Text Recognition

**Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition**

- Paper: https://arxiv.org/abs/2107.12090
- Code: None

# Text Replacement

**STRIVE: Scene Text Replacement In Videos**

- Homepage: https://striveiccv2021.github.io/STRIVE-ICCV2021/
- Paper: https://arxiv.org/abs/2109.02762
- Code: https://github.com/striveiccv2021/STRIVE-ICCV2021/
- Dataset: https://github.com/striveiccv2021/STRIVE-ICCV2021/
# Visual Question Answering (VQA)

**Greedy Gradient Ensemble for Robust Visual Question Answering**

- Paper: https://arxiv.org/abs/2107.12651
- Code: https://github.com/GeraldHan/GGE

# Adversarial Attack

**Feature Importance-aware Transferable Adversarial Attacks**

- Paper: https://arxiv.org/abs/2107.14185
- Code: https://github.com/hcguoO0/FIA

**AdvDrop: Adversarial Attack to DNNs by Dropping Information**

- Paper: https://arxiv.org/abs/2108.09034
- Code: https://github.com/RjDuan/AdvDrop
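Both attacks above are gradient-based; the simplest member of that family is FGSM, which perturbs the input one step along the sign of the input gradient. A minimal sketch of classic FGSM (for context only, not the FIA or AdvDrop methods themselves; the stand-in classifier is illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    """One-step L-infinity attack: x_adv = clip(x + eps * sign(grad_x loss), 0, 1)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

# Toy usage with a stand-in classifier.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
x_adv = fgsm(model, x, y)
```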
# Depth Estimation

**Augmenting Depth Estimation with Geospatial Context**

- Paper: https://arxiv.org/abs/2109.09879
- Code: None

**NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo**

- Paper(Oral): https://arxiv.org/abs/2109.01129
- Code: https://github.com/weiyithu/NerfingMVS

## Monocular Depth Estimation

**MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments**

- Paper: https://arxiv.org/abs/2107.12429
- Code: None

**Towards Interpretable Deep Networks for Monocular Depth Estimation**

- Paper: https://arxiv.org/abs/2108.05312
- Code: https://github.com/youzunzhi/InterpretableMDE

**Regularizing Nighttime Weirdness: Efficient Self-supervised Monocular Depth Estimation in the Dark**

- Paper: https://arxiv.org/abs/2108.03830
- Code: https://github.com/w2kun/RNW

**Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation**

- Paper: https://arxiv.org/abs/2108.07628
- Code: https://github.com/LINA-lln/ADDS-DepthNet

**StructDepth: Leveraging the structural regularities for self-supervised indoor depth estimation**

- Paper: https://arxiv.org/abs/2108.08574
- Code: https://github.com/SJTU-ViSYS/StructDepth

# Gaze Estimation

**Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation**

- Paper: https://arxiv.org/abs/2107.13780
- Code: https://github.com/DreamtaleCore/PnP-GA

# Crowd Counting

**Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework**

- Paper(Oral): https://arxiv.org/abs/2107.12746
- Code(P2PNet): https://github.com/TencentYoutuResearch/CrowdCounting-P2PNet

**Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting**

- Paper: https://arxiv.org/abs/2107.12619
- Code: https://github.com/TencentYoutuResearch/CrowdCounting-UEPNet

# Lane Detection

**VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection**

- Paper: https://arxiv.org/abs/2108.08482
- Code: https://github.com/yujun0-0/MMA-Net
- Dataset: https://github.com/yujun0-0/MMA-Net

# Trajectory Prediction

**Human Trajectory Prediction via Counterfactual Analysis**

- Paper: https://arxiv.org/abs/2107.14202
- Code: https://github.com/CHENGY12/CausalHTP

**Personalized Trajectory Prediction via Distribution Discrimination**

- Paper: https://arxiv.org/abs/2107.14204
- Code: https://github.com/CHENGY12/DisDis

**MG-GAN: A Multi-Generator Model Preventing Out-of-Distribution Samples in Pedestrian Trajectory Prediction**

- Paper: https://arxiv.org/abs/2108.09274
- Code: https://github.com/selflein/MG-GAN

**Social NCE: Contrastive Learning of Socially-aware Motion Representations**

- Paper: https://arxiv.org/abs/2012.11717
- Code: https://github.com/vita-epfl/social-nce

**Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving**

- Paper: https://arxiv.org/abs/2109.01510
- Code: https://github.com/xrenaa/Safety-Aware-Motion-Prediction

**Where are you heading? Dynamic Trajectory Prediction with Expert Goal Examples**

- Paper: https://openaccess.thecvf.com/content/ICCV2021/papers/Zhao_Where_Are_You_Heading_Dynamic_Trajectory_Prediction_With_Expert_Goal_ICCV_2021_paper.pdf
- Code: https://github.com/JoeHEZHAO/expert_traj
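The trajectory-prediction methods above are typically compared on ADE/FDE, the average and final displacement errors between predicted and ground-truth tracks. A minimal sketch of those two metrics (the function name is illustrative):

```python
import torch

def ade_fde(pred, gt):
    """pred, gt: (B, T, 2) trajectories in world coordinates."""
    dist = (pred - gt).norm(dim=-1)  # (B, T) per-timestep displacement errors
    # ADE averages over all timesteps; FDE looks only at the final one.
    return dist.mean().item(), dist[:, -1].mean().item()

pred, gt = torch.rand(8, 12, 2), torch.rand(8, 12, 2)
ade, fde = ade_fde(pred, gt)
```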
# Anomaly Detection

**Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning**

- Paper: https://arxiv.org/abs/2101.10030
- Code: https://github.com/tianyu0207/RTFM

# Scene Graph Generation

**Spatial-Temporal Transformer for Dynamic Scene Graph Generation**

- Paper: https://arxiv.org/abs/2107.12309
- Code: None

# Image Editing

**Sketch Your Own GAN**

- Homepage: https://peterwang512.github.io/GANSketching/
- Paper: https://arxiv.org/abs/2108.02774
- Code: https://github.com/peterwang512/GANSketching

# Image Synthesis

**Image Synthesis via Semantic Composition**

- Homepage: https://shepnerd.github.io/scg/
- Paper: https://arxiv.org/abs/2109.07053
- Code: https://github.com/dvlab-research/SCGAN

# Image Retrieval

**Self-supervised Product Quantization for Deep Unsupervised Image Retrieval**

- Paper: https://arxiv.org/abs/2109.02244
- Code: https://github.com/youngkyunJang/SPQ

# 3D Reconstruction

**Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction**

- Paper: https://arxiv.org/abs/2109.00512
- Code: https://github.com/facebookresearch/co3d
- Dataset: https://github.com/facebookresearch/co3d

# Video Stabilization

**Out-of-boundary View Synthesis Towards Full-Frame Video Stabilization**

- Paper: https://arxiv.org/abs/2108.09041
- Code: https://github.com/Annbless/OVS_Stabilization

# Fine-Grained Recognition

**Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach**

- Paper: https://arxiv.org/abs/2108.02399
- Code: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
- Dataset: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset

# Style Transfer

**AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer**

- Paper: https://arxiv.org/abs/2108.03647
- Paddle Code: https://github.com/PaddlePaddle/PaddleGAN
- PyTorch Code: https://github.com/Huage001/AdaAttN

# Neural Painting

**Paint Transformer: Feed Forward Neural Painting with Stroke Prediction**

- Paper: https://arxiv.org/abs/2108.03798
- Code: https://github.com/wzmsltw/PaintTransformer

# Feature Matching

**Learning to Match Features with Seeded Graph Matching Network**

- Paper: https://arxiv.org/abs/2108.08771
- Code: https://github.com/vdvchen/SGMNet

# Semantic Correspondence

**Multi-scale Matching Networks for Semantic Correspondence**

- Paper: https://arxiv.org/abs/2108.00211
- Code: https://github.com/wintersun661/MMNet

# Edge Detection

**Pixel Difference Networks for Efficient Edge Detection**

- Paper: https://arxiv.org/abs/2108.07009
- Code: https://github.com/zhuoinoulu/pidinet

**RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth**

- Paper: https://arxiv.org/abs/2108.00616
- Code: https://github.com/MengyangPu/RINDNet
- Dataset: https://github.com/MengyangPu/RINDNet

# Camera Calibration

**CTRL-C: Camera calibration TRansformer with Line-Classification**

- Paper: https://arxiv.org/abs/2109.02259
- Code: https://github.com/jwlee-vcl/CTRL-C

# Image Quality Assessment

**MUSIQ: Multi-scale Image Quality Transformer**

- Paper: https://arxiv.org/abs/2108.05997
- Code: https://github.com/google-research/google-research/tree/master/musiq

**Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment**

- Paper: https://arxiv.org/abs/2108.07948
- Code: https://github.com/researchmm/CKDN

# Metric Learning

**Deep Relational Metric Learning**

- Paper: https://arxiv.org/abs/2108.10026
- Code: https://github.com/zbr17/DRML

**Towards Interpretable Deep Metric Learning with Structural Matching**

- Paper: https://arxiv.org/abs/2108.05889
- Code: https://github.com/wl-zhao/DIML
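Deep metric learning methods like DRML and DIML learn an embedding where same-class images sit close together; a classic training signal for such embeddings is a triplet margin loss over (anchor, positive, negative) examples. A generic sketch with `torch.nn.TripletMarginLoss` (illustrative only, not the DRML/DIML objectives; the toy embedding network stands in for a real backbone):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy embedding network; real methods use a CNN/ViT backbone.
embed = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
loss_fn = nn.TripletMarginLoss(margin=0.2)

anchor, positive, negative = (torch.randn(16, 3, 32, 32) for _ in range(3))
# L2-normalize so distances live on the unit hypersphere.
za, zp, zn = (F.normalize(embed(t), dim=1) for t in (anchor, positive, negative))
loss = loss_fn(za, zp, zn)
loss.backward()
```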

# Unsupervised Domain Adaptation

**Recursively Conditional Gaussian for Ordinal Unsupervised Domain Adaptation**

- Paper(Oral): https://arxiv.org/abs/2107.13467
- Code: None

# Video Rescaling

**Self-Conditioned Probabilistic Learning of Video Rescaling**

- Paper: https://arxiv.org/abs/2107.11639
- Code: None

# Hand-Object Interaction

**Learning a Contact Potential Field to Model the Hand-Object Interaction**

- Paper: https://arxiv.org/abs/2012.00924
- Code: https://lixiny.github.io/CPF

# Vision-and-Language Navigation

**Airbert: In-domain Pretraining for Vision-and-Language Navigation**

- Paper: https://arxiv.org/abs/2108.09105
- Code: https://airbert-vln.github.io/
- Dataset: https://airbert-vln.github.io/

# 数据集(Datasets)

**Beyond Road Extraction: A Dataset for Map Update using Aerial Images**

- Homepage: https://favyen.com/muno21/
- Paper: https://arxiv.org/abs/2110.04690
- Code: https://github.com/favyen/muno21

**StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation**

- Paper: https://arxiv.org/abs/2109.10115
- Code: None
- Dataset: None

**RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth**

- Paper: https://arxiv.org/abs/2108.00616
- Code: https://github.com/MengyangPu/RINDNet
- Dataset: https://github.com/MengyangPu/RINDNet

**Panoptic Narrative Grounding**

- Homepage: https://bcv-uniandes.github.io/panoptic-narrative-grounding/
- Paper(Oral): https://arxiv.org/abs/2109.04988
- Code: https://github.com/BCV-Uniandes/PNG
- Dataset: https://github.com/BCV-Uniandes/PNG

**STRIVE: Scene Text Replacement In Videos**

- Homepage: https://striveiccv2021.github.io/STRIVE-ICCV2021/
- Paper: https://arxiv.org/abs/2109.02762
- Code: https://github.com/striveiccv2021/STRIVE-ICCV2021/
- Dataset: https://github.com/striveiccv2021/STRIVE-ICCV2021/

**Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme**

- Paper: https://www4.comp.polyu.edu.hk/~cslzhang/paper/ICCV21_RealVSR.pdf
- Code: https://github.com/IanYeung/RealVSR
- Dataset: https://github.com/IanYeung/RealVSR

**Matching in the Dark: A Dataset for Matching Image Pairs of Low-light Scenes**

- Paper: https://arxiv.org/abs/2109.03585
- Code: None

**Dual-Camera Super-Resolution with Aligned Attention Modules**

- Homepage: https://tengfei-wang.github.io/Dual-Camera-SR/index.html
- Paper: https://arxiv.org/abs/2109.01349
- Code: https://github.com/Tengfei-Wang/DualCameraSR
- Dataset: https://tengfei-wang.github.io/Dual-Camera-SR/index.html

**DepthTrack: Unveiling the Power of RGBD Tracking**

- Paper: https://arxiv.org/abs/2108.13962
- Code: https://github.com/xiaozai/DeT
- Dataset: https://github.com/xiaozai/DeT

**Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction**

- Paper: https://arxiv.org/abs/2109.00512
- Code: https://github.com/facebookresearch/co3d
- Dataset: https://github.com/facebookresearch/co3d

**BioFors: A Large Biomedical Image Forensics Dataset**

- Paper: https://arxiv.org/abs/2108.12961
- Code: None
- Dataset: None

**Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach**

- Paper: https://arxiv.org/abs/2108.02399
- Code: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
- Dataset: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset

**Airbert: In-domain Pretraining for Vision-and-Language Navigation**

- Paper: https://arxiv.org/abs/2108.09105
- Code: https://airbert-vln.github.io/
- Dataset: https://airbert-vln.github.io/

**Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation**

- Paper: http://arxiv.org/abs/2108.08202
- Code: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
- Dataset: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021

**VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection**

- Paper: https://arxiv.org/abs/2108.08482
- Code: https://github.com/yujun0-0/MMA-Net
- Dataset: https://github.com/yujun0-0/MMA-Net

**XVFI: eXtreme Video Frame Interpolation**

- Paper(Oral): https://arxiv.org/abs/2103.16206
- Code: https://github.com/JihyongOh/XVFI
- Dataset: https://github.com/JihyongOh/XVFI

**Personalized Image Semantic Segmentation**

- Paper: https://arxiv.org/abs/2107.13978
- Code: https://github.com/zhangyuygss/PIS
- Dataset: https://github.com/zhangyuygss/PIS

**H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction**

- Homepage: https://crisalixsa.github.io/h3d-net/
- Paper: https://arxiv.org/abs/2107.12512

# 其他(Others)

**Photon-Starved Scene Inference using Single Photon Cameras**

- Paper: https://openaccess.thecvf.com/content/ICCV2021/papers/Goyal_Photon-Starved_Scene_Inference_Using_Single_Photon_Cameras_ICCV_2021_paper.pdf
- Code: https://github.com/bhavyagoyal/spclowlight

**Towards Flexible Blind JPEG Artifacts Removal**

- Paper: https://arxiv.org/abs/2109.14573
- Code: https://github.com/jiaxi-jiang/FBCNN

**Generating Attribution Maps with Disentangled Masked Backpropagation**

- Paper: https://arxiv.org/abs/2101.06773
- Code: https://gitlab.com/adriaruizo/dmbp_iccv21

**CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations**

- Paper: https://arxiv.org/abs/2109.14910
- Code: None

**ReconfigISP: Reconfigurable Camera Image Processing Pipeline**

- Paper: https://arxiv.org/abs/2109.04760
- Code: None

**Panoptic Narrative Grounding**

- Homepage: https://bcv-uniandes.github.io/panoptic-narrative-grounding/
- Paper(Oral): https://arxiv.org/abs/2109.04988
- Code: https://github.com/BCV-Uniandes/PNG
- Dataset: https://github.com/BCV-Uniandes/PNG

**NEAT: Neural Attention Fields for End-to-End Autonomous Driving**

- Paper: https://arxiv.org/abs/2109.04456
- Code: https://github.com/autonomousvision/neat

**Keep CALM and Improve Visual Feature Attribution**

- Paper: https://arxiv.org/abs/2106.07861
- Code: https://github.com/naver-ai/calm

**YouRefIt: Embodied Reference Understanding with Language and Gesture**

- Paper: https://arxiv.org/abs/2109.03413
- Code: None

**Pri3D: Can 3D Priors Help 2D Representation Learning?**

- Paper: https://arxiv.org/abs/2104.11225
- Code: https://github.com/Sekunde/Pri3D

**Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain**

- Paper: https://arxiv.org/abs/2108.08487
- Code: https://github.com/iCGY96/APR
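
The amplitude-phase idea in the entry above is easy to see in isolation: an image's Fourier amplitude and phase can be split apart and recombined across images (the paper's starting observation, roughly, is that CNNs lean on the amplitude component much more than humans do). A minimal NumPy sketch of just that recombination step, with an illustrative function name and shapes, not the paper's augmentation or training pipeline:

```python
import numpy as np

def recombine_amplitude_phase(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
    """Combine the Fourier amplitude of img_a with the phase of img_b.

    Both inputs are float arrays of identical shape, (H, W) or (H, W, C);
    the FFT is taken over the two spatial axes.
    """
    fft_a = np.fft.fft2(img_a, axes=(0, 1))
    fft_b = np.fft.fft2(img_b, axes=(0, 1))
    amplitude = np.abs(fft_a)   # which frequencies/textures are present
    phase = np.angle(fft_b)     # where structures are located
    recombined = amplitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(recombined, axes=(0, 1)))
```
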

**Continual Learning for Image-Based Camera Localization**

- Paper: https://arxiv.org/abs/2108.09112
- Code: None

**Multi-Task Self-Training for Learning General Representations**

- Paper: https://arxiv.org/abs/2108.11353
- Code: None

**A Unified Objective for Novel Class Discovery**

- Homepage: https://ncd-uno.github.io/
- Paper(Oral): https://arxiv.org/abs/2108.08536
- Code: https://github.com/DonkeyShot21/UNO

**Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs**

- Paper: https://arxiv.org/abs/2108.07884
- Code: https://github.com/islamamirul/PermuteNet

**Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation**

- Paper: http://arxiv.org/abs/2108.08202
- Code: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
- Dataset: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021

**Impact of Aliasing on Generalization in Deep Convolutional Networks**

- Paper: https://arxiv.org/abs/2108.03489
- Code: None

**Out-of-Core Surface Reconstruction via Global TGV Minimization**

- Paper: https://arxiv.org/abs/2107.14790
- Code: None

**Progressive Correspondence Pruning by Consensus Learning**

- Homepage: https://sailor-z.github.io/projects/CLNet.html
- Paper: https://arxiv.org/abs/2101.00591
- Code: https://github.com/sailor-z/CLNet

**Energy-Based Open-World Uncertainty Modeling for Confidence Calibration**

- Paper: https://arxiv.org/abs/2107.12628
- Code: None

**Generalized Shuffled Linear Regression**

- Paper: https://drive.google.com/file/d/1Qu21VK5qhCW8WVjiRnnBjehrYVmQrDNh/view?usp=sharing
- Code: https://github.com/SILI1994/Generalized-Shuffled-Linear-Regression

**Discovering 3D Parts from Image Collections**

- Homepage: https://chhankyao.github.io/lpd/
- Paper: https://arxiv.org/abs/2107.13629

**Semi-Supervised Active Learning with Temporal Output Discrepancy**

- Paper: https://arxiv.org/abs/2107.14153
- Code: https://github.com/siyuhuang/TOD

**Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?**

- Paper: https://arxiv.org/abs/2105.02498
- Code: https://github.com/KingJamesSong/DifferentiableSVD

**Hand-Object Contact Consistency Reasoning for Human Grasps Generation**

- Homepage: https://hwjiang1510.github.io/GraspTTA/
- Paper(Oral): https://arxiv.org/abs/2104.03304
- Code: None

**Equivariant Imaging: Learning Beyond the Range Space**

- Paper(Oral): https://arxiv.org/abs/2103.14756
- Code: https://github.com/edongdongchen/EI

**Just Ask: Learning to Answer Questions from Millions of Narrated Videos**

- Paper(Oral): https://arxiv.org/abs/2012.00451
- Code: https://github.com/antoyang/just-ask

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

# ICCV2023-Papers-with-Code

[ICCV 2023](http://iccv2023.thecvf.com/) collection of papers and open-source projects (papers with code)!

2160 papers accepted!

ICCV 2023 accepted paper IDs: https://t.co/A0mCH8gbOi

> Note 1: Issues are welcome — feel free to share ICCV 2023 papers and open-source projects!
>
> Note 2: For papers from past top CV conferences and other curated CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision
>
> [ICCV 2021](ICCV2021-Papers-with-Code.md)

If you want to keep up with the latest and highest-quality CV papers, open-source projects, and learning materials, you are welcome to scan the code and join the [CVer Academic Exchange Group](https://t.zsxq.com/10OGjThDw)! Learn from each other and improve together~

![](https://github.com/amusi/CVPR2023-Papers-with-Code/raw/master/CVer%E5%AD%A6%E6%9C%AF%E4%BA%A4%E6%B5%81%E7%BE%A4.png)

# 【ICCV 2023 Papers with Code Directory】

- [Backbone](#Backbone)
- [CLIP](#CLIP)
- [MAE](#MAE)
- [GAN](#GAN)
- [GNN](#GNN)
- [MLP](#MLP)
- [NAS](#NAS)
- [OCR](#OCR)
- [NeRF](#NeRF)
- [DETR](#DETR)
- [Prompt](#Prompt)
- [Diffusion Models(扩散模型)](#Diffusion)
- [Avatars](#Avatars)
- [ReID(重识别)](#ReID)
- [长尾分布(Long-Tail)](#Long-Tail)
- [Vision Transformer](#Vision-Transformer)
- [视觉和语言(Vision-Language)](#VL)
- [自监督学习(Self-supervised Learning)](#SSL)
- [数据增强(Data Augmentation)](#DA)
- [目标检测(Object Detection)](#Object-Detection)
- [目标跟踪(Visual Tracking)](#VT)
- [语义分割(Semantic Segmentation)](#Semantic-Segmentation)
- [实例分割(Instance Segmentation)](#Instance-Segmentation)
- [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation)
- [医学图像分类(Medical Image Classification)](#MIC)
- [医学图像分割(Medical Image Segmentation)](#MIS)
- [视频目标分割(Video Object Segmentation)](#VOS)
- [视频实例分割(Video Instance Segmentation)](#VIS)
- [参考图像分割(Referring Image Segmentation)](#RIS)
- [图像抠图(Image Matting)](#Matting)
- [Low-level Vision](#LLV)
- [超分辨率(Super-Resolution)](#SR)
- [去噪(Denoising)](#Denoising)
- [去模糊(Deblur)](#Deblur)
- [3D点云(3D Point Cloud)](#3D-Point-Cloud)
- [3D目标检测(3D Object Detection)](#3DOD)
- [3D语义分割(3D Semantic Segmentation)](#3DSS)
- [3D目标跟踪(3D Object Tracking)](#3D-Object-Tracking)
- [3D语义场景补全(3D Semantic Scene Completion)](#3DSSC)
- [3D配准(3D Registration)](#3D-Registration)
- [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation)
- [3D人体Mesh估计(3D Human Mesh Estimation)](#3D-Human-Pose-Estimation)
- [医学图像(Medical Image)](#Medical-Image)
- [图像生成(Image Generation)](#Image-Generation)
- [视频生成(Video Generation)](#Video-Generation)
- [图像编辑(Image Editing)](#Image-Editing)
- [视频编辑(Video Editing)](#Video-Editing)
- [视频理解(Video Understanding)](#Video-Understanding)
- [人体运动生成(Human Motion Generation)](#Human-Motion-Generation)
- [低光照图像增强(Low-light Image Enhancement)](#Low-light-Image-Enhancement)
- [场景文本识别(Scene Text Recognition)](#STR)
- [图像检索(Image Retrieval)](#Image-Retrieval)
- [图像融合(Image Fusion)](#Image-Fusion)
- [轨迹预测(Trajectory Prediction)](#Trajectory-Prediction)
- [人群计数(Crowd Counting)](#Crowd-Counting)
- [Video Quality Assessment(视频质量评价)](#Video-Quality-Assessment)
- [其它(Others)](#Others)

# Avatars

**Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control**

- Paper: https://arxiv.org/abs/2303.17606
- Code: https://github.com/songrise/AvatarCraft

# Backbone

**Rethinking Mobile Block for Efficient Attention-based Models**

- Paper: https://arxiv.org/abs/2301.01146
- Code: https://github.com/zhangzjn/EMO

# CLIP

**PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization**

- Paper: https://arxiv.org/abs/2307.15199
- Code: [https://PromptStyler.github.io/](https://promptstyler.github.io/)

**CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation**

- Paper: https://arxiv.org/abs/2308.15226
- Code: http://www.github.com/devaansh100/CLIPTrans
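
Both entries in this section build on a frozen CLIP backbone. For orientation only, here is standard zero-shot classification with the openai/CLIP package (the image path and label prompts are placeholders; this is generic CLIP usage, not the method of either paper):

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder image and candidate label prompts.
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a photo of a dog", "a photo of a cat"]).to(device)

with torch.no_grad():
    # CLIP returns image-text similarity logits; softmax over the
    # candidate prompts gives zero-shot classification scores.
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

print(probs)  # zero-shot probabilities over the candidate labels
```
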

# NeRF

**IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis**

- Homepage: https://zju3dv.github.io/intrinsic_nerf/
- Paper: https://arxiv.org/abs/2210.00647
- Code: https://github.com/zju3dv/IntrinsicNeRF

**Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control**

- Paper: https://arxiv.org/abs/2303.17606
- Code: https://github.com/songrise/AvatarCraft

**FlipNeRF: Flipped Reflection Rays for Few-shot Novel View Synthesis**

- Homepage: https://shawn615.github.io/flipnerf/
- Paper: https://arxiv.org/abs/2306.17723
- Code: https://github.com/shawn615/FlipNeRF

**Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields**

- Homepage: https://wbhu.github.io/projects/Tri-MipRF
- Paper: https://arxiv.org/abs/2307.11335
- Code: https://github.com/wbhu/Tri-MipRF

# Diffusion Models(扩散模型)

**PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment**

- Paper: https://arxiv.org/abs/2306.15667
- Code: https://github.com/facebookresearch/PoseDiffusion

**FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model**

- Paper: https://arxiv.org/abs/2303.09833
- Code: https://github.com/vvictoryuki/FreeDoM

**BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion**

- Paper: https://arxiv.org/abs/2307.10816
- Code: https://github.com/Sierkinhane/BoxDiff

**BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction**

- Paper: https://arxiv.org/abs/2211.14304
- Code: https://github.com/BarqueroGerman/BeLFusion

**DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion**

- Paper: https://arxiv.org/abs/2303.06840
- Code: https://github.com/Zhaozixiang1228/MMIF-DDFM

**DIRE for Diffusion-Generated Image Detection**

- Paper: https://arxiv.org/abs/2303.09295
- Code: https://github.com/ZhendongWang6/DIRE

# Prompt

**Read-only Prompt Optimization for Vision-Language Few-shot Learning**

- Paper: https://arxiv.org/abs/2308.14960
- Code: https://github.com/mlvlab/RPO

**Introducing Language Guidance in Prompt-based Continual Learning**

- Paper: https://arxiv.org/abs/2308.15827
- Code: None

# 视觉和语言(Vision-Language)

**Read-only Prompt Optimization for Vision-Language Few-shot Learning**

- Paper: https://arxiv.org/abs/2308.14960
- Code: https://github.com/mlvlab/RPO

# 目标检测(Object Detection)

**FemtoDet: An Object Detection Baseline for Energy Versus Performance Tradeoffs**

- Paper: https://arxiv.org/abs/2301.06719
- Code: https://github.com/yh-pengtu/FemtoDet

**Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment**

- Paper: https://arxiv.org/abs/2207.13085
- Code: https://github.com/Atten4Vis/GroupDETR

**Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection**

- Paper: https://arxiv.org/abs/2205.09613
- Code: https://github.com/LiewFeng/imTED

**ASAG: Building Strong One-Decoder-Layer Sparse Detectors via Adaptive Sparse Anchor Generation**

- Paper: https://arxiv.org/abs/2308.09242
- Code: https://github.com/iSEE-Laboratory/ASAG

# 目标跟踪(Visual Tracking)

**Cross-modal Orthogonal High-rank Augmentation for RGB-Event Transformer-trackers**

- Paper: https://arxiv.org/abs/2307.04129
- Code: https://github.com/ZHU-Zhiyu/High-Rank_RGB-Event_Tracker

# 语义分割(Semantic Segmentation)

**Segment Anything**

- Homepage: https://segment-anything.com/
- Paper: https://arxiv.org/abs/2304.02643
- Code: https://github.com/facebookresearch/segment-anything
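
Since SAM is promptable out of the box, a minimal point-prompt sketch with the official `segment_anything` package may be useful (the checkpoint and image filenames are assumptions; checkpoints are linked from the repo above):

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load a pretrained SAM model (checkpoint assumed to be downloaded
# from the facebookresearch/segment-anything repo linked above).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# SamPredictor expects an HWC uint8 RGB image.
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Prompt with a single foreground point; SAM returns candidate masks.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),  # 1 = foreground point, 0 = background
    multimask_output=True,
)
print(masks.shape, scores)  # boolean masks of shape (3, H, W) with confidences
```
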

**MARS: Model-agnostic Biased Object Removal without Additional Supervision for Weakly-Supervised Semantic Segmentation**

- Paper: https://arxiv.org/abs/2304.09913
- Code: https://github.com/shjo-april/MARS

**FreeCOS: Self-Supervised Learning from Fractals and Unlabeled Images for Curvilinear Object Segmentation**

- Paper: https://arxiv.org/abs/2307.07245
- Code: https://github.com/TY-Shi/FreeCOS

**Residual Pattern Learning for Pixel-wise Out-of-Distribution Detection in Semantic Segmentation**

- Paper: https://arxiv.org/abs/2211.14512
- Code: https://github.com/yyliu01

**Disentangle then Parse: Night-time Semantic Segmentation with Illumination Disentanglement**

- Paper: https://arxiv.org/abs/2307.09362
- Code: https://github.com/w1oves/DTP

# 视频目标分割(Video Object Segmentation)

**Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus**

- Paper: https://arxiv.org/abs/2207.01203
- Code: https://github.com/lxa9867/R2VOS

# 视频实例分割(Video Instance Segmentation)

**DVIS: Decoupled Video Instance Segmentation Framework**

- Paper: https://arxiv.org/abs/2306.03413
- Code: https://github.com/zhang-tao-whu/DVIS

# 医学图像分类(Medical Image Classification)

**BoMD: Bag of Multi-label Descriptors for Noisy Chest X-ray Classification**

- Paper: https://arxiv.org/abs/2203.01937
- Code: https://github.com/cyh-0/BoMD

# 医学图像分割(Medical Image Segmentation)

**CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection**

- Paper: https://arxiv.org/abs/2301.00785
- Code: https://github.com/ljwztc/CLIP-Driven-Universal-Model

# Low-level Vision

**Self-supervised Learning to Bring Dual Reversed Rolling Shutter Images Alive**

- Paper: https://arxiv.org/abs/2305.19862
- Code: https://github.com/shangwei5/SelfDRSC

# 超分辨率(Super-Resolution)

**Spherical Space Feature Decomposition for Guided Depth Map Super-Resolution**

- Paper: https://arxiv.org/abs/2303.08942
- Code: https://github.com/Zhaozixiang1228/GDSR-SSDNet

# 3D点云(3D Point Cloud)

**Robo3D: Towards Robust and Reliable 3D Perception against Corruptions**

- Homepage: https://ldkong.com/Robo3D
- Paper: https://arxiv.org/abs/2303.17597
- Code: https://github.com/ldkong1205/Robo3D

**Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models**

- Paper: https://arxiv.org/abs/2304.07221
- Code: https://github.com/zyh16143998882/ICCV23-IDPT

**Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos**

- Paper: https://arxiv.org/abs/2308.09247
- Code: None

# 3D目标检测(3D Object Detection)

**PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images**

- Paper: https://arxiv.org/abs/2206.01256
- Code: https://github.com/megvii-research/PETR

**DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection**

- Paper: https://arxiv.org/abs/2304.13031
- Code: https://github.com/AIR-DISCOVER/DQS3D

**SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection**

- Paper: https://arxiv.org/abs/2304.14340
- Code: https://github.com/yichen928/SparseFusion

**StreamPETR: Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection**

- Paper: https://arxiv.org/abs/2303.11926
- Code: https://github.com/exiawsh/StreamPETR.git

**Cross Modal Transformer: Towards Fast and Robust 3D Object Detection**

- Paper: https://arxiv.org/abs/2301.01283
- Code: https://github.com/junjie18/CMT.git

**MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation**

- Paper: https://arxiv.org/abs/2304.09801
- Project: https://chongjiange.github.io/metabev.html
- Code: https://github.com/ChongjianGE/MetaBEV

**Revisiting Domain-Adaptive 3D Object Detection by Reliable, Diverse and Class-balanced Pseudo-Labeling**

- Paper: https://arxiv.org/abs/2307.07944
- Code: https://github.com/zhuoxiao-chen/ReDB-DA-3Ddet

**SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection**

- Paper: https://arxiv.org/abs/2307.11477
- Code: https://github.com/mengtan00/SA-BEV

# 3D语义分割(3D Semantic Segmentation)

**Rethinking Range View Representation for LiDAR Segmentation**

- Homepage: https://ldkong.com/RangeFormer
- Paper: https://arxiv.org/abs/2303.05367
- Code: None

# 3D目标跟踪(3D Object Tracking)

**MBPTrack: Improving 3D Point Cloud Tracking with Memory Networks and Box Priors**

- Paper: https://arxiv.org/abs/2303.05071
- Code: https://github.com/slothfulxtx/MBPTrack3D

# 视频理解(Video Understanding)

**Unmasked Teacher: Towards Training-Efficient Video Foundation Models**

- Paper: https://arxiv.org/abs/2303.16058
- Code: https://github.com/OpenGVLab/unmasked_teacher

# 图像生成(Image Generation)

**FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model**

- Paper: https://arxiv.org/abs/2303.09833
- Code: https://github.com/vvictoryuki/FreeDoM

**BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion**

- Paper: https://arxiv.org/abs/2307.10816
- Code: https://github.com/Sierkinhane/BoxDiff
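
FreeDoM and BoxDiff are training-free: they steer a pretrained text-to-image diffusion model at sampling time. As a baseline reference point, plain unconstrained sampling with Hugging Face `diffusers` looks like this (the model id and prompt are illustrative; neither paper's guidance is applied here):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained latent diffusion pipeline (illustrative model id).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Plain text-conditioned sampling; training-free methods such as
# FreeDoM or BoxDiff modify this denoising loop with extra guidance
# terms or box constraints instead of retraining the model.
image = pipe("a photo of a red car parked by the sea").images[0]
image.save("sample.png")
```
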

# 视频生成(Video Generation)

**Simulating Fluids in Real-World Still Images**

- Homepage: https://slr-sfs.github.io/
- Paper: https://arxiv.org/abs/2204.11335
- Code: https://github.com/simon3dv/SLR-SFS

# 图像编辑(Image Editing)

**Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing**

- Paper: https://arxiv.org/abs/2304.02051
- Code: https://github.com/aimagelab/multimodal-garment-designer

# 视频编辑(Video Editing)

**FateZero: Fusing Attentions for Zero-shot Text-based Video Editing**

- Project: https://fate-zero-edit.github.io/
- Paper: https://arxiv.org/abs/2303.09535
- Code: https://github.com/ChenyangQiQi/FateZero

# 人体运动生成(Human Motion Generation)

**BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction**

- Paper: https://arxiv.org/abs/2211.14304
- Code: https://github.com/BarqueroGerman/BeLFusion

# 低光照图像增强(Low-light Image Enhancement)

**Implicit Neural Representation for Cooperative Low-light Image Enhancement**

- Paper: https://arxiv.org/abs/2303.11722
- Code: https://github.com/Ysz2022/NeRCo

# 场景文本识别(Scene Text Recognition)

**Self-supervised Character-to-Character Distillation for Text Recognition**

- Paper: https://arxiv.org/abs/2211.00288
- Code: https://github.com/TongkunGuan/CCD

**MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition**

- Paper: https://arxiv.org/abs/2305.14758
- Code: https://github.com/simplify23/MRN
- Chinese write-up: https://zhuanlan.zhihu.com/p/643948935

# 图像检索(Image Retrieval)

**Zero-Shot Composed Image Retrieval with Textual Inversion**

- Paper: https://arxiv.org/abs/2303.15247
- Code: https://github.com/miccunifi/SEARLE

# 图像融合(Image Fusion)

**DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion**

- Paper: https://arxiv.org/abs/2303.06840
- Code: https://github.com/Zhaozixiang1228/MMIF-DDFM

# 轨迹预测(Trajectory Prediction)

**EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting**

- Homepage: https://inhwanbae.github.io/publication/eigentrajectory/
- Paper: https://arxiv.org/abs/2307.09306
- Code: https://github.com/InhwanBae/EigenTrajectory

# 人群计数(Crowd Counting)

**Point-Query Quadtree for Crowd Counting, Localization, and More**

- Paper: https://arxiv.org/abs/2308.13814
- Code: https://github.com/cxliu0/PET

# Video Quality Assessment(视频质量评价)

**Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives**

- Paper: https://arxiv.org/abs/2211.04894
- Code: https://github.com/VQAssessment/DOVER

# 其它(Others)

**MotionBERT: A Unified Perspective on Learning Human Motion Representations**

- Homepage: https://motionbert.github.io/
- Paper: https://arxiv.org/abs/2210.06551
- Code: https://github.com/Walter0807/MotionBERT

**Graph Matching with Bi-level Noisy Correspondence**

- Paper: https://arxiv.org/pdf/2212.04085.pdf
- Code: https://github.com/Lin-Yijie/Graph-Matching-Networks/tree/main/COMMON

**LDL: Line Distance Functions for Panoramic Localization**

- Paper: https://arxiv.org/abs/2308.13989
- Code: https://github.com/82magnolia/panoramic-localization

**Active Neural Mapping**

- Homepage: https://zikeyan.github.io/active-INR/index.html
- Paper: https://arxiv.org/abs/2308.16246
- Code: https://zikeyan.github.io/active-INR/index.html#

**Reconstructing Groups of People with Hypergraph Relational Reasoning**

- Paper: https://arxiv.org/abs/2308.15844
- Code: https://github.com/boycehbz/GroupRec

--------------------------------------------------------------------------------