# ICCV2021-Papers-with-Code

A collection of [ICCV 2021](http://iccv2021.thecvf.com/) papers and open-source projects (papers with code)!

1617 papers accepted - 25.9% acceptance rate

IDs of the accepted ICCV 2021 papers: https://docs.google.com/spreadsheets/u/1/d/e/2PACX-1vRfaTmsNweuaA0Gjyu58H_Cx56pGwFhcTYII0u1pg0U7MbhlgY0R6Y-BbK3xFhAiwGZ26u3TAtN5MnS/pubhtml

> Note 1: Issues sharing ICCV 2021 papers and open-source projects are welcome!
>
> Note 2: For papers from past top CV conferences and other curated collections of high-quality CV papers, see: https://github.com/amusi/daily-paper-computer-vision

## ICCV 2021 Papers and Open-Source Projects

- [Backbone](#Backbone)
- [Visual Transformer](#Transformer)
- [Performance-Boosting Tricks](#Cool)
- [GAN](#GAN)
- [NAS](#NAS)
- [NeRF](#NeRF)
- [Loss](#Loss)
- [Zero-Shot Learning](#Zero-Shot-Learning)
- [Few-Shot Learning](#Few-Shot-Learning)
- [Long-tailed](#Long-tailed)
- [Vision and Language](#VL)
- [Unsupervised/Self-Supervised Learning](#Un/Self-Supervised)
- [Multi-Label Image Recognition](#MLIR)
- [2D Object Detection](#Object-Detection)
- [Semantic Segmentation](#Semantic-Segmentation)
- [Instance Segmentation](#Instance-Segmentation)
- [Medical Image Segmentation](#Medical-Image-Segmentation)
- [Video Object Segmentation](#VOS)
- [Few-shot Segmentation](#Few-shot-Segmentation)
- [Human Motion Segmentation](#HMS)
- [Object Tracking](#Object-Tracking)
- [3D Point Cloud](#3D-Point-Cloud)
- [3D Object Detection](#Point-Cloud-Object-Detection)
- [3D Semantic Segmentation](#Point-Cloud-Semantic-Segmentation)
- [3D Instance Segmentation](#Point-Cloud-Instance-Segmentation)
- [3D Multi-Object Tracking](#Point-Cloud-Multi-Object-Tracking)
- [Point Cloud Denoising](#Point-Cloud-Denoising)
- [Point Cloud Registration](#Point-Cloud-Registration)
- [Point Cloud Completion](#PCC)
- [Radar Semantic Segmentation](#RSS)
- [Image Restoration](#Image-Restoration)
- [Super-Resolution](#Super-Resolution)
- [Denoising](#Denoising)
- [Medical Image Denoising](#Medical-Image-Denoising)
- [Deblurring](#Deblurring)
- [Shadow Removal](#Shadow-Removal)
- [Video Frame Interpolation](#VFI)
- [Video Inpainting](#Video-Inpainting)
- [Person Re-identification](#Re-ID)
- [Person Search](#Person-Search)
- [2D/3D Human Pose Estimation](#Human-Pose-Estimation)
- [6D Object Pose Estimation](#6D-Object)
- [3D Head Reconstruction](#3D-Head-Reconstruction)
- [Face Recognition](#FR)
- [Facial Expression Recognition](#FER)
- [Action Recognition](#Action-Recognition)
- [Temporal Action Localization](#Temporal-Action-Localization)
- [Action Detection](#Action-Detection)
- [Group Activity Recognition](#GAR)
- [Sign Language Recognition](#SLR)
- [Text Detection](#Text-Detection)
- [Text Recognition](#Text-Recognition)
- [Text Replacement](#TR)
- [Visual Question Answering (VQA)](#Visual-Question-Answering)
- [Adversarial Attack](#Adversarial-Attack)
- [Depth Estimation](#Depth-Estimation)
- [Gaze Estimation](#Gaze-Estimation)
- [Crowd Counting](#Crowd-Counting)
- [Lane Detection](#Lane-Detection)
- [Trajectory Prediction](#Trajectory-Prediction)
- [Anomaly Detection](#Anomaly-Detection)
- [Scene Graph Generation](#Scene-Graph-Generation)
- [Image Editing](#Image-Editing)
- [Image Synthesis](#Image-Synthesis)
- [Image Retrieval](#Image-Retrieval)
- [3D Reconstruction](#3D-R)
- [Video Stabilization](#Video-Stabilization)
- [Fine-Grained Recognition](#FGR)
- [Style Transfer](#Style-Transfer)
- [Neural Painting](#Neural-Painting)
- [Feature Matching](#FM)
- [Semantic Correspondence](#Semantic-Correspondence)
- [Edge Detection](#Edge-Detection)
- [Camera Calibration](#Camera-Calibration)
- [Image Quality Assessment](#IQA)
- [Metric Learning](#ML)
- [Unsupervised Domain Adaptation](#UDA)
- [Video Rescaling](#Video-Rescaling)
- [Hand-Object Interaction](#Hand-Object-Interaction)
- [Vision-and-Language Navigation](#VLN)
- [Datasets](#Datasets)
- [Others](#Others)

# Backbone

**Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions**

- Paper(Oral): https://arxiv.org/abs/2102.12122
- Code: https://github.com/whai362/PVT

**AutoFormer: Searching Transformers for Visual Recognition**

- Paper: https://arxiv.org/abs/2107.00651
- Code: https://github.com/microsoft/AutoML

**Bias Loss for Mobile Neural Networks**

- Paper: https://arxiv.org/abs/2107.11170
- Code: None

**Vision Transformer with Progressive Sampling**

- Paper: https://arxiv.org/abs/2108.01684
- Code: https://github.com/yuexy/PS-ViT

**Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet**

- Paper: https://arxiv.org/abs/2101.11986
- Code: https://github.com/yitu-opensource/T2T-ViT

**Rethinking Spatial Dimensions of Vision Transformers**

- Paper: https://arxiv.org/abs/2103.16302
- Code: https://github.com/naver-ai/pit

**Swin Transformer: Hierarchical Vision Transformer using Shifted Windows**

- Paper: https://arxiv.org/abs/2103.14030
- Code: https://github.com/microsoft/Swin-Transformer

**Conformer: Local Features Coupling Global Representations for Visual Recognition**

- Paper: https://arxiv.org/abs/2105.03889
- Code: https://github.com/pengzhiliang/Conformer

**MicroNet: Improving Image Recognition with Extremely Low FLOPs**

- Paper: https://arxiv.org/abs/2108.05894
- Code: https://github.com/liyunsheng13/micronet

**Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition**

- Paper: https://arxiv.org/abs/2102.01063
- Code: https://github.com/idstcv/ZenNAS
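Several of the backbones above (Swin Transformer, PiT, PVT, T2T-ViT) are also packaged in third-party model zoos such as `timm`, which is often the quickest way to try them. A minimal sketch using Swin, assuming `timm` and `torch` are installed and that this model name is available in your `timm` version:

```python
import timm
import torch

# Load a pretrained Swin Transformer classifier (assumes this model
# name exists in the installed timm version).
model = timm.create_model("swin_base_patch4_window7_224", pretrained=True)
model.eval()

# Run a dummy 224x224 RGB image through the network.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # torch.Size([1, 1000]) for ImageNet-1k weights
```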
# Visual Transformer

**Swin Transformer: Hierarchical Vision Transformer using Shifted Windows**

- Paper: https://arxiv.org/abs/2103.14030
- Code: https://github.com/microsoft/Swin-Transformer

**An Empirical Study of Training Self-Supervised Vision Transformers**

- Paper(Oral): https://arxiv.org/abs/2104.02057
- MoCo v3 Code: None

**Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions**

- Paper(Oral): https://arxiv.org/abs/2102.12122
- Code: https://github.com/whai362/PVT

**Group-Free 3D Object Detection via Transformers**

- Paper: https://arxiv.org/abs/2104.00678
- Code: None

**Spatial-Temporal Transformer for Dynamic Scene Graph Generation**

- Paper: https://arxiv.org/abs/2107.12309
- Code: None

**Rethinking and Improving Relative Position Encoding for Vision Transformer**

- Paper: https://arxiv.org/abs/2107.14222
- Code: https://github.com/microsoft/AutoML/tree/main/iRPE

**Emerging Properties in Self-Supervised Vision Transformers**

- Paper: https://arxiv.org/abs/2104.14294
- Code: https://github.com/facebookresearch/dino

**Learning Spatio-Temporal Transformer for Visual Tracking**

- Paper: https://arxiv.org/abs/2103.17154
- Code: https://github.com/researchmm/Stark

**Fast Convergence of DETR with Spatially Modulated Co-Attention**

- Paper: https://arxiv.org/abs/2101.07448
- Code: https://github.com/abc403/SMCA-replication

**Vision Transformer with Progressive Sampling**

- Paper: https://arxiv.org/abs/2108.01684
- Code: https://github.com/yuexy/PS-ViT

**Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet**

- Paper: https://arxiv.org/abs/2101.11986
- Code: https://github.com/yitu-opensource/T2T-ViT

**Rethinking Spatial Dimensions of Vision Transformers**

- Paper: https://arxiv.org/abs/2103.16302
- Code: https://github.com/naver-ai/pit

**The Right to Talk: An Audio-Visual Transformer Approach**

- Paper: https://arxiv.org/abs/2108.03256
- Code: None

**Joint Inductive and Transductive Learning for Video Object Segmentation**

- Paper: https://arxiv.org/abs/2108.03679
- Code: https://github.com/maoyunyao/JOINT

**Conformer: Local Features Coupling Global Representations for Visual Recognition**

- Paper: https://arxiv.org/abs/2105.03889
- Code: https://github.com/pengzhiliang/Conformer

**Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer**

- Paper: https://arxiv.org/abs/2108.03032
- Code: https://github.com/zhiheLu/CWT-for-FSS

**Paint Transformer: Feed Forward Neural Painting with Stroke Prediction**

- Paper: https://arxiv.org/abs/2108.03798
- Code: https://github.com/wzmsltw/PaintTransformer

**Conditional DETR for Fast Training Convergence**

- Paper: https://arxiv.org/abs/2108.06152
- Code: https://github.com/Atten4Vis/ConditionalDETR

**MUSIQ: Multi-scale Image Quality Transformer**

- Paper: https://arxiv.org/abs/2108.05997
- Code: https://github.com/google-research/google-research/tree/master/musiq

**SOTR: Segmenting Objects with Transformers**

- Paper: https://arxiv.org/abs/2108.06747
- Code: https://github.com/easton-cau/SOTR

**PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers**

- Paper(Oral): https://arxiv.org/abs/2108.08839
- Code: https://github.com/yuxumin/PoinTr
**SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer**

- Paper: https://arxiv.org/abs/2108.04444
- Code: https://github.com/AllenXiangX/SnowflakeNet

**Improving 3D Object Detection with Channel-wise Transformer**

- Paper: https://arxiv.org/abs/2108.10723
- Code: https://github.com/hlsheng1/CT3D

**TransFER: Learning Relation-aware Facial Expression Representations with Transformers**

- Paper: https://arxiv.org/abs/2108.11116
- Code: None

**GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer**

- Paper: https://arxiv.org/abs/2108.12630
- Code: https://github.com/xueyee/GroupFormer

**Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction**

- Paper: https://arxiv.org/abs/2109.00512
- Code: https://github.com/facebookresearch/co3d
- Dataset: https://github.com/facebookresearch/co3d

**Voxel Transformer for 3D Object Detection**

- Paper: https://arxiv.org/abs/2109.02497
- Code: None

**3D Human Texture Estimation from a Single Image with Transformers**

- Homepage: https://www.mmlab-ntu.com/project/texformer/
- Paper(Oral): https://arxiv.org/abs/2109.02563
- Code: None

**FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting**

- Paper: https://arxiv.org/abs/2109.02974
- Code: https://github.com/ruiliu-ai/FuseFormer

**CTRL-C: Camera calibration TRansformer with Line-Classification**

- Paper: https://arxiv.org/abs/2109.02259
- Code: https://github.com/jwlee-vcl/CTRL-C

**An End-to-End Transformer Model for 3D Object Detection**

- Homepage: https://facebookresearch.github.io/3detr/
- Paper: https://arxiv.org/abs/2109.08141
- Code: https://github.com/facebookresearch/3detr

**Eformer: Edge Enhancement based Transformer for Medical Image Denoising**

- Paper: https://arxiv.org/abs/2109.08044
- Code: None

**PnP-DETR: Towards Efficient Visual Analysis with Transformers**

- Paper: https://arxiv.org/abs/2109.07036
- Code: https://github.com/twangnh/pnp-detr

**Transformer-based Dual Relation Graph for Multi-label Image Recognition**

- Paper: https://arxiv.org/abs/2110.04722
- Code: None

# Performance-Boosting Tricks

**FaPN: Feature-aligned Pyramid Network for Dense Image Prediction**

- Paper: https://arxiv.org/abs/2108.07058
- Code: https://github.com/EMI-Group/FaPN

**Unifying Nonlocal Blocks for Neural Networks**

- Paper: https://arxiv.org/abs/2108.02451
- Code: https://github.com/zh460045050/SNL_ICCV2021

**Towards Learning Spatially Discriminative Feature Representations**

- Paper: https://arxiv.org/abs/2109.01359
- Code: None

# GAN

**Labels4Free: Unsupervised Segmentation using StyleGAN**

- Homepage: https://rameenabdal.github.io/Labels4Free/
- Paper: https://arxiv.org/abs/2103.14968

**GNeRF: GAN-based Neural Radiance Field without Posed Camera**

- Paper(Oral): https://arxiv.org/abs/2103.15606
- Code: https://github.com/MQ66/gnerf

**EigenGAN: Layer-Wise Eigen-Learning for GANs**

- Paper: https://arxiv.org/abs/2104.12476
- Code: https://github.com/LynnHo/EigenGAN-Tensorflow
**From Continuity to Editability: Inverting GANs with Consecutive Images**

- Paper: https://arxiv.org/abs/2107.13812
- Code: https://github.com/Qingyang-Xu/InvertingGANs_with_ConsecutiveImgs

**Sketch Your Own GAN**

- Homepage: https://peterwang512.github.io/GANSketching/
- Paper: https://arxiv.org/abs/2108.02774
- Code: https://github.com/peterwang512/GANSketching

**Manifold Matching via Deep Metric Learning for Generative Modeling**

- Paper: https://arxiv.org/abs/2106.10777
- Code: https://github.com/dzld00/pytorch-manifold-matching

**Dual Projection Generative Adversarial Networks for Conditional Image Generation**

- Paper: https://arxiv.org/abs/2108.09016
- Code: None

**GAN Inversion for Out-of-Range Images with Geometric Transformations**

- Paper: https://arxiv.org/abs/2108.08998
- Code: None

**ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement**

- Homepage: https://yuval-alaluf.github.io/restyle-encoder/
- Paper: https://arxiv.org/abs/2104.02699
- Code: https://github.com/yuval-alaluf/restyle-encoder

**StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery**

- Paper(Oral): https://arxiv.org/abs/2103.17249
- Code: https://github.com/orpatashnik/StyleCLIP

**Image Synthesis via Semantic Composition**

- Homepage: https://shepnerd.github.io/scg/
- Paper: https://arxiv.org/abs/2109.07053
- Code: https://github.com/dvlab-research/SCGAN

# NAS

**AutoFormer: Searching Transformers for Visual Recognition**

- Paper: https://arxiv.org/abs/2107.00651
- Code: https://github.com/microsoft/AutoML

**BN-NAS: Neural Architecture Search with Batch Normalization**

- Paper: https://arxiv.org/abs/2108.07375
- Code: https://github.com/bychen515/BNNAS

**Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition**

- Paper: https://arxiv.org/abs/2102.01063
- Code: https://github.com/idstcv/ZenNAS

# NeRF

**GNeRF: GAN-based Neural Radiance Field without Posed Camera**

- Paper(Oral): https://arxiv.org/abs/2103.15606
- Code: https://github.com/MQ66/gnerf

**KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs**

- Paper: https://arxiv.org/abs/2103.13744
- Code: https://github.com/creiser/kilonerf

**In-Place Scene Labelling and Understanding with Implicit Scene Representation**

- Homepage: https://shuaifengzhi.com/Semantic-NeRF/
- Paper(Oral): https://arxiv.org/abs/2103.15875

**Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis**

- Homepage: https://ajayj.com/dietnerf
- Paper(DietNeRF): https://arxiv.org/abs/2104.00677

**BARF: Bundle-Adjusting Neural Radiance Fields**

- Homepage: https://chenhsuanlin.bitbucket.io/bundle-adjusting-NeRF/
- Paper(Oral): https://arxiv.org/abs/2104.06405
- Code: https://github.com/chenhsuanlin/bundle-adjusting-NeRF

**Self-Calibrating Neural Radiance Fields**

- Paper: https://arxiv.org/abs/2108.13826
- Code: https://github.com/POSTECH-CVLab/SCNeRF

**Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction**

- Paper: https://arxiv.org/abs/2109.00512
- Code: https://github.com/facebookresearch/co3d
- Dataset: https://github.com/facebookresearch/co3d

**Neural Articulated Radiance Field**

- Paper: https://arxiv.org/abs/2104.03110
- Code: https://github.com/nogu-atsu/NARF

**NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo**

- Paper(Oral): https://arxiv.org/abs/2109.01129
- Code: https://github.com/weiyithu/NerfingMVS

**SNARF: Differentiable Forward Skinning for Animating Non-rigid Neural Implicit Shapes**

- Homepage: https://xuchen-ethz.github.io/snarf
- Paper: https://arxiv.org/abs/2104.03953
- Code: https://github.com/xuchen-ethz/snarf

**CodeNeRF: Disentangled Neural Radiance Fields for Object Categories**

- Paper: https://arxiv.org/abs/2109.01750
- Code: https://github.com/wayne1123/code-nerf

**PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering**

- Paper: https://openaccess.thecvf.com/content/ICCV2021/html/Ren_PIRenderer_Controllable_Portrait_Image_Generation_via_Semantic_Neural_Rendering_ICCV_2021_paper.html
- Code: https://github.com/RenYurui/PIRender
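Most of the NeRF variants above inherit the original NeRF recipe: an MLP maps frequency-encoded 5D coordinates (position plus view direction) to color and density. A minimal sketch of that frequency (positional) encoding in PyTorch; the function name and the `num_freqs` default are illustrative, not taken from any of the repos above:

```python
import math
import torch

def frequency_encode(x: torch.Tensor, num_freqs: int = 10) -> torch.Tensor:
    """NeRF-style encoding: x -> [sin(2^k * pi * x), cos(2^k * pi * x)], k = 0..num_freqs-1."""
    freqs = (2.0 ** torch.arange(num_freqs)) * math.pi  # (num_freqs,)
    angles = x.unsqueeze(-1) * freqs                    # (..., D, num_freqs)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return enc.flatten(-2)                              # (..., D * 2 * num_freqs)

xyz = torch.rand(1024, 3)           # sampled 3D points along camera rays
print(frequency_encode(xyz).shape)  # torch.Size([1024, 60])
```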
# Loss

**Rank & Sort Loss for Object Detection and Instance Segmentation**

- Paper(Oral): https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss

**Bias Loss for Mobile Neural Networks**

- Paper: https://arxiv.org/abs/2107.11170
- Code: None

**A Robust Loss for Point Cloud Registration**

- Paper: https://arxiv.org/abs/2108.11682
- Code: None

**Reconcile Prediction Consistency for Balanced Object Detection**

- Paper: https://arxiv.org/abs/2108.10809
- Code: None

**Influence-Balanced Loss for Imbalanced Visual Classification**

- Paper: https://arxiv.org/abs/2110.02444
- Code: https://github.com/pseulki/IB-Loss

# Zero-Shot Learning

**FREE: Feature Refinement for Generalized Zero-Shot Learning**

- Paper: https://arxiv.org/abs/2107.13807
- Code: https://github.com/shiming-chen/FREE

**Discriminative Region-based Multi-Label Zero-Shot Learning**

- Paper: https://arxiv.org/abs/2108.09301
- Code: None

**Semantics Disentangling for Generalized Zero-Shot Learning**

- Paper: https://arxiv.org/abs/2101.07978
- Code: https://github.com/uqzhichen/SDGZSL

# Few-Shot Learning

**Relational Embedding for Few-Shot Classification**

- Paper: https://arxiv.org/abs/2108.09666
- Code: https://github.com/dahyun-kang/renet

**Few-Shot and Continual Learning with Attentive Independent Mechanisms**

- Paper: https://arxiv.org/abs/2107.14053
- Code: https://github.com/huang50213/AIM-Fewshot-Continual

**Few Shot Visual Relationship Co-Localization**

- Homepage: https://vl2g.github.io/projects/vrc/
- Paper: https://arxiv.org/abs/2108.11618

# Long-tailed

**Parametric Contrastive Learning**

- Paper: https://arxiv.org/abs/2107.12028
- Code: https://github.com/jiequancui/Parametric-Contrastive-Learning

**Influence-Balanced Loss for Imbalanced Visual Classification**

- Paper: https://arxiv.org/abs/2110.02444
- Code: https://github.com/pseulki/IB-Loss
# Vision and Language

**VLGrammar: Grounded Grammar Induction of Vision and Language**

- Paper: https://arxiv.org/abs/2103.12975
- Code: https://github.com/evelinehong/VLGrammar

# Unsupervised/Self-Supervised Learning

**An Empirical Study of Training Self-Supervised Vision Transformers**

- Paper(Oral): https://arxiv.org/abs/2104.02057
- MoCo v3 Code: None

**DetCo: Unsupervised Contrastive Learning for Object Detection**

- Paper: https://arxiv.org/abs/2102.04803
- Code: https://github.com/xieenze/DetCo

**Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization**

- Paper: https://arxiv.org/abs/2108.02183
- Code: None

**Improving Contrastive Learning by Visualizing Feature Transformation**

- Paper(Oral): https://arxiv.org/abs/2108.02982
- Code: https://github.com/DTennant/CL-Visualizing-Feature-Transformation

**Self-Supervised Visual Representations Learning by Contrastive Mask Prediction**

- Paper: https://arxiv.org/abs/2108.08012
- Code: None

**Temporal Knowledge Consistency for Unsupervised Visual Representation Learning**

- Paper: https://arxiv.org/abs/2108.10668
- Code: None

**MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving**

- Paper: https://arxiv.org/abs/2108.12178
- Code: https://github.com/KaiChen1998/MultiSiam

**Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds**

- Homepage: https://siyuanhuang.com/STRL/
- Paper: https://arxiv.org/abs/2109.00179
- Code: https://github.com/yichen928/STRL

**Self-supervised Product Quantization for Deep Unsupervised Image Retrieval**

- Paper: https://arxiv.org/abs/2109.02244
- Code: https://github.com/youngkyunJang/SPQ

**Self-Supervised Representation Learning from Flow Equivariance**

- Paper: https://arxiv.org/abs/2101.06553
- Code: None
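One self-supervised ViT from this collection, DINO ("Emerging Properties in Self-Supervised Vision Transformers", listed under Visual Transformer above), publishes pretrained weights through `torch.hub`, which makes its representations easy to reuse. A minimal sketch, assuming the official `facebookresearch/dino` hub entry is reachable:

```python
import torch

# Load a ViT-S/16 pretrained with DINO self-distillation
# (hub entry from the official facebookresearch/dino repo).
model = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
model.eval()

# Extract a 384-dim global (CLS-token) feature for one image.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    feat = model(x)
print(feat.shape)  # torch.Size([1, 384])
```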
# Multi-Label Image Recognition

**Residual Attention: A Simple but Effective Method for Multi-Label Recognition**

- Paper: https://arxiv.org/abs/2108.02456
- Code: https://github.com/Kevinz-code/CSRA

# 2D Object Detection

**DetCo: Unsupervised Contrastive Learning for Object Detection**

- Paper: https://arxiv.org/abs/2102.04803
- Code: https://github.com/xieenze/DetCo

**Detecting Invisible People**

- Homepage: http://www.cs.cmu.edu/~tkhurana/invisible.htm
- Paper: https://arxiv.org/abs/2012.08419

**Active Learning for Deep Object Detection via Probabilistic Modeling**

- Paper: https://arxiv.org/abs/2103.16130
- Code: None

**Conditional Variational Capsule Network for Open Set Recognition**

- Paper: https://arxiv.org/abs/2104.09159
- Code: https://github.com/guglielmocamporese/cvaecaposr

**MDETR: Modulated Detection for End-to-End Multi-Modal Understanding**

- Homepage: https://ashkamath.github.io/mdetr_page/
- Paper(Oral): https://arxiv.org/abs/2104.12763
- Code: https://github.com/ashkamath/mdetr

**Rank & Sort Loss for Object Detection and Instance Segmentation**

- Paper(Oral): https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss

**SimROD: A Simple Adaptation Method for Robust Object Detection**

- Paper(Oral): https://arxiv.org/abs/2107.13389
- Code: None

**GraphFPN: Graph Feature Pyramid Network for Object Detection**

- Paper: https://arxiv.org/abs/2108.00580
- Code: None

**Fast Convergence of DETR with Spatially Modulated Co-Attention**

- Paper: https://arxiv.org/abs/2101.07448
- Code: https://github.com/abc403/SMCA-replication

**Conditional DETR for Fast Training Convergence**

- Paper: https://arxiv.org/abs/2108.06152
- Code: https://github.com/Atten4Vis/ConditionalDETR

**TOOD: Task-aligned One-stage Object Detection**

- Paper(Oral): https://arxiv.org/abs/2108.07755
- Code: https://github.com/fcjian/TOOD

**Reconcile Prediction Consistency for Balanced Object Detection**

- Paper: https://arxiv.org/abs/2108.10809
- Code: None

**Mutual Supervision for Dense Object Detection**

- Paper: https://arxiv.org/abs/2109.05986
- Code: https://github.com/MCG-NJU/MuSu-Detection

**PnP-DETR: Towards Efficient Visual Analysis with Transformers**

- Paper: https://arxiv.org/abs/2109.07036
- Code: https://github.com/twangnh/pnp-detr

**Deep Structured Instance Graph for Distilling Object Detectors**

- Paper: https://arxiv.org/abs/2109.12862
- Code: https://github.com/dvlab-research/Dsig

## Semi-Supervised Object Detection

**End-to-End Semi-Supervised Object Detection with Soft Teacher**

- Paper: https://arxiv.org/abs/2106.09018
- Code: None

## Rotated Object Detection

**Oriented R-CNN for Object Detection**

- Paper: https://arxiv.org/abs/2108.05699
- Code: https://github.com/jbwang1997/OBBDetection

## Few-Shot Object Detection

**DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection**

- Paper: https://arxiv.org/abs/2108.09017
- Code: https://github.com/er-muyue/DeFRCN
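Several detectors in this section (SMCA, Conditional DETR, PnP-DETR) are faster-converging or more efficient takes on DETR. For context, the original DETR baseline they modify can be pulled from `torch.hub`; a minimal sketch, assuming the `facebookresearch/detr` hub entry is reachable:

```python
import torch

# DETR-R50 baseline that the DETR variants above build on.
model = torch.hub.load("facebookresearch/detr", "detr_resnet50", pretrained=True)
model.eval()

x = torch.randn(1, 3, 800, 1066)  # one resized input image
with torch.no_grad():
    out = model(x)
# 100 object queries: class logits and normalized (cx, cy, w, h) boxes,
# shaped (1, 100, num_classes + 1) and (1, 100, 4).
print(out["pred_logits"].shape, out["pred_boxes"].shape)
```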
# Semantic Segmentation

**Personalized Image Semantic Segmentation**

- Paper: https://arxiv.org/abs/2107.13978
- Code: https://github.com/zhangyuygss/PIS
- Dataset: https://github.com/zhangyuygss/PIS

**Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation**

- Paper(Oral): https://arxiv.org/abs/2107.11264
- Code: https://github.com/shjung13/Standardized-max-logits

**Enhanced Boundary Learning for Glass-like Object Segmentation**

- Paper: https://arxiv.org/abs/2103.15734
- Code: https://github.com/hehao13/EBLNet

**Self-Regulation for Semantic Segmentation**

- Paper: https://arxiv.org/abs/2108.09702
- Code: https://github.com/dongzhang89/SR-SS

**Mining Contextual Information Beyond Image for Semantic Segmentation**

- Paper: https://arxiv.org/abs/2108.11819
- Code: https://github.com/CharlesPikachu/mcibi

**ISNet: Integrate Image-Level and Semantic-Level Context for Semantic Segmentation**

- Paper: https://arxiv.org/abs/2108.12382
- Code: https://github.com/SegmentationBLWX/sssegmentation

**Scaling up instance annotation via label propagation**

- Homepage: http://scaling-anno.csail.mit.edu/
- Paper: https://arxiv.org/abs/2110.02277
- Code: None

## Unsupervised Domain Adaptation Semantic Segmentation

**Multi-Anchor Active Domain Adaptation for Semantic Segmentation**

- Paper(Oral): https://arxiv.org/abs/2108.08012
- Code: https://github.com/munanning/MADA

**Generalize then Adapt: Source-Free Domain Adaptive Semantic Segmentation**

- Homepage: https://sites.google.com/view/sfdaseg
- Paper: https://arxiv.org/abs/2108.11249

## Few-Shot Semantic Segmentation

**Learning Meta-class Memory for Few-Shot Semantic Segmentation**

- Paper: https://arxiv.org/abs/2108.02958
- Code: None

**Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer**

- Paper: https://arxiv.org/abs/2108.03032
- Code: https://github.com/zhiheLu/CWT-for-FSS

## Semi-Supervised Semantic Segmentation

**Re-distributing Biased Pseudo Labels for Semi-supervised Semantic Segmentation: A Baseline Investigation**

- Paper(Oral): https://arxiv.org/abs/2107.11279
- Code: https://github.com/CVMI-Lab/DARS

**Pixel Contrastive-Consistent Semi-Supervised Semantic Segmentation**

- Paper: https://arxiv.org/abs/2108.09025
- Code: None

## Weakly Supervised Semantic Segmentation

**Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation**

- Paper: https://arxiv.org/abs/2107.11787
- Code: None

**Complementary Patch for Weakly Supervised Semantic Segmentation**

- Paper: https://arxiv.org/abs/2108.03852
- Code: None

## Unsupervised Segmentation

**Labels4Free: Unsupervised Segmentation using StyleGAN**

- Homepage: https://rameenabdal.github.io/Labels4Free/
- Paper: https://arxiv.org/abs/2103.14968

# Instance Segmentation

**Instances as Queries**

- Paper: https://arxiv.org/abs/2105.01928
- Code: https://github.com/hustvl/QueryInst

**Crossover Learning for Fast Online Video Instance Segmentation**

- Paper: https://arxiv.org/abs/2104.05970
- Code: https://github.com/hustvl/CrossVIS

**Rank & Sort Loss for Object Detection and Instance Segmentation**

- Paper(Oral): https://arxiv.org/abs/2107.11669
- Code: https://github.com/kemaloksuz/RankSortLoss

**SOTR: Segmenting Objects with Transformers**

- Paper: https://arxiv.org/abs/2108.06747
- Code: https://github.com/easton-cau/SOTR

**Scaling up instance annotation via label propagation**

- Homepage: http://scaling-anno.csail.mit.edu/
- Paper: https://arxiv.org/abs/2110.02277
- Code: None
# Medical Image Segmentation

**Recurrent Mask Refinement for Few-Shot Medical Image Segmentation**

- Paper: https://arxiv.org/abs/2108.00622
- Code: https://github.com/uci-cbcl/RP-Net

# Video Object Segmentation

**Hierarchical Memory Matching Network for Video Object Segmentation**

- Paper: https://arxiv.org/abs/2109.11404
- Code: https://github.com/Hongje/HMMN

**Full-Duplex Strategy for Video Object Segmentation**

- Homepage: http://dpfan.net/FSNet/
- Paper: https://arxiv.org/abs/2108.03151
- Code: https://github.com/GewelsJI/FSNet

**Joint Inductive and Transductive Learning for Video Object Segmentation**

- Paper: https://arxiv.org/abs/2108.03679
- Code: https://github.com/maoyunyao/JOINT

# Few-shot Segmentation

**Mining Latent Classes for Few-shot Segmentation**

- Paper(Oral): https://arxiv.org/abs/2103.15402
- Code: https://github.com/LiheYoung/MiningFSS

# Human Motion Segmentation

**Graph Constrained Data Representation Learning for Human Motion Segmentation**

- Paper: https://arxiv.org/abs/2107.13362
- Code: None

# Object Tracking

**Learning to Track Objects from Unlabeled Videos**

- Paper: https://arxiv.org/abs/2108.12711
- Code: https://github.com/VISION-SJTU/USOT

**Learning Spatio-Temporal Transformer for Visual Tracking**

- Paper: https://arxiv.org/abs/2103.17154
- Code: https://github.com/researchmm/Stark

**Learning to Adversarially Blur Visual Object Tracking**

- Paper: https://arxiv.org/abs/2107.12085
- Code: https://github.com/tsingqguo/ABA

**HiFT: Hierarchical Feature Transformer for Aerial Tracking**

- Paper: https://arxiv.org/abs/2108.00202
- Code: https://github.com/vision4robotics/HiFT

**Learn to Match: Automatic Matching Network Design for Visual Tracking**

- Paper: https://arxiv.org/abs/2108.00803
- Code: https://github.com/JudasDie/SOTS

**Saliency-Associated Object Tracking**

- Paper: https://arxiv.org/abs/2108.03637
- Code: https://github.com/ZikunZhou/SAOT

## RGBD Object Tracking

**DepthTrack: Unveiling the Power of RGBD Tracking**

- Paper: https://arxiv.org/abs/2108.13962
- Code: https://github.com/xiaozai/DeT
- Dataset: https://github.com/xiaozai/DeT

# 3D Point Cloud

**Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds**

- Homepage: https://siyuanhuang.com/STRL/
- Paper: https://arxiv.org/abs/2109.00179
- Code: https://github.com/yichen928/STRL

**Unsupervised Point Cloud Pre-training via Occlusion Completion**

- Homepage: https://hansen7.github.io/OcCo/
- Paper: https://arxiv.org/abs/2010.01089
- Code: https://github.com/hansen7/OcCo

**DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation**

- Paper: https://arxiv.org/abs/2108.04023
- Code: None

**Adaptive Graph Convolution for Point Cloud Analysis**

- Paper: https://arxiv.org/abs/2108.08035
- Code: https://github.com/hrzhou2/AdaptConv-master
## 3D Object Detection

**Group-Free 3D Object Detection via Transformers**

- Paper: https://arxiv.org/abs/2104.00678
- Code: None

**Improving 3D Object Detection with Channel-wise Transformer**

- Paper: https://arxiv.org/abs/2108.10723
- Code: https://github.com/hlsheng1/CT3D

**AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection**

- Paper: https://arxiv.org/abs/2108.11127
- Code: https://github.com/zongdai/AutoShape

**4D-Net for Learned Multi-Modal Alignment**

- Paper: https://arxiv.org/abs/2109.01066
- Code: None

**Voxel Transformer for 3D Object Detection**

- Paper: https://arxiv.org/abs/2109.02497
- Code: None

**Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection**

- Paper: https://arxiv.org/abs/2109.02499
- Code: None

**An End-to-End Transformer Model for 3D Object Detection**

- Homepage: https://facebookresearch.github.io/3detr/
- Paper: https://arxiv.org/abs/2109.08141
- Code: https://github.com/facebookresearch/3detr

**RangeDet: In Defense of Range View for LiDAR-based 3D Object Detection**

- Paper: https://arxiv.org/abs/2103.10039
- Code: https://github.com/TuSimple/RangeDet

**Geometry-based Distance Decomposition for Monocular 3D Object Detection**

- Paper: https://arxiv.org/abs/2104.03775
- Code: https://github.com/Rock-100/MonoDet

## 3D Semantic Segmentation

**ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation**

- Paper: https://arxiv.org/abs/2107.11769
- Code: None

**Learning with Noisy Labels for Robust Point Cloud Segmentation**

- Homepage: https://shuquanye.com/PNAL_website/
- Paper(Oral): https://arxiv.org/abs/2107.14230

**VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation**

- Paper(Oral): https://arxiv.org/abs/2107.13824
- Code: https://github.com/hzykent/VMNet

**Sparse-to-dense Feature Matching: Intra and Inter domain Cross-modal Learning in Domain Adaptation for 3D Semantic Segmentation**

- Paper: https://arxiv.org/abs/2107.14724
- Code: https://github.com/leolyj/DsCML

**DRINet: A Dual-Representation Iterative Learning Network for Point Cloud Segmentation**

- Paper: https://arxiv.org/abs/2108.04023
- Code: None

**Adaptive Graph Convolution for Point Cloud Analysis**

- Paper: https://arxiv.org/abs/2108.08035
- Code: https://github.com/hrzhou2/AdaptConv-master

**Perception-Aware Multi-Sensor Fusion for 3D LiDAR Semantic Segmentation**

- Paper: https://arxiv.org/abs/2106.15277
- Code: https://github.com/ICEORY/PMF

## 3D Instance Segmentation

**Hierarchical Aggregation for 3D Instance Segmentation**

- Paper: https://arxiv.org/abs/2108.02350
- Code: https://github.com/hustvl/HAIS

**Instance Segmentation in 3D Scenes Using Semantic Superpoint Tree Networks**

- Paper: https://openaccess.thecvf.com/content/ICCV2021/html/Liang_Instance_Segmentation_in_3D_Scenes_Using_Semantic_Superpoint_Tree_Networks_ICCV_2021_paper.html
- Code: https://github.com/Gorilla-Lab-SCUT/SSTNet
## 3D Multi-Object Tracking

**Exploring Simple 3D Multi-Object Tracking for Autonomous Driving**

- Paper: https://arxiv.org/abs/2108.10312
- Code: https://github.com/qcraftai/simtrack

## Point Cloud Denoising

**Score-Based Point Cloud Denoising**

- Paper: https://arxiv.org/abs/2107.10981
- Code: None

## Point Cloud Registration

**HRegNet: A Hierarchical Network for Large-scale Outdoor LiDAR Point Cloud Registration**

- Homepage: https://ispc-group.github.io/hregnet
- Paper: https://arxiv.org/abs/2107.11992
- Code: https://github.com/ispc-lab/HRegNet

**A Robust Loss for Point Cloud Registration**

- Paper: https://arxiv.org/abs/2108.11682
- Code: None

## Point Cloud Completion

**PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers**

- Paper(Oral): https://arxiv.org/abs/2108.08839
- Code: https://github.com/yuxumin/PoinTr

**SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer**

- Paper: https://arxiv.org/abs/2108.04444
- Code: https://github.com/AllenXiangX/SnowflakeNet
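Completion methods such as PoinTr and SnowflakeNet are commonly trained and evaluated with the Chamfer distance between the completed and ground-truth clouds (papers differ on L1 vs. squared-L2 variants). A minimal, brute-force sketch of the symmetric L2 variant:

```python
import torch

def chamfer_distance(p1: torch.Tensor, p2: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance between clouds p1 (B, N, 3) and p2 (B, M, 3)."""
    d = torch.cdist(p1, p2)  # (B, N, M) pairwise Euclidean distances
    # For each point, the distance to its nearest neighbor in the other cloud.
    return d.min(dim=2).values.mean(dim=1) + d.min(dim=1).values.mean(dim=1)

pred, gt = torch.rand(2, 1024, 3), torch.rand(2, 2048, 3)
print(chamfer_distance(pred, gt).shape)  # torch.Size([2]), one value per batch item
```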
# Radar Semantic Segmentation

**Multi-View Radar Semantic Segmentation**

- Paper: https://arxiv.org/abs/2103.16214
- Code: https://github.com/valeoai/MVRSS

# Image Restoration

**Dynamic Attentive Graph Learning for Image Restoration**

- Paper: https://arxiv.org/abs/2109.06620
- Code: https://github.com/jianzhangcs/DAGL

# Super-Resolution

**Learning for Scale-Arbitrary Super-Resolution from Scale-Specific Networks**

- Paper: https://arxiv.org/abs/2004.03791
- Code: https://github.com/LongguangWang/ArbSR

**Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution**

- Paper: https://arxiv.org/abs/2108.05302
- Code: https://github.com/JingyunLiang/MANet

**Deep Reparametrization of Multi-Frame Super-Resolution and Denoising**

- Paper(Oral): https://arxiv.org/abs/2108.08286
- Code: None

**Dual-Camera Super-Resolution with Aligned Attention Modules**

- Homepage: https://tengfei-wang.github.io/Dual-Camera-SR/index.html
- Paper: https://arxiv.org/abs/2109.01349
- Code: https://github.com/Tengfei-Wang/DualCameraSR
- Dataset: https://tengfei-wang.github.io/Dual-Camera-SR/index.html

**Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme**

- Paper: https://www4.comp.polyu.edu.hk/~cslzhang/paper/ICCV21_RealVSR.pdf
- Code: https://github.com/IanYeung/RealVSR
- Dataset: https://github.com/IanYeung/RealVSR

# Denoising

**Deep Reparametrization of Multi-Frame Super-Resolution and Denoising**

- Paper(Oral): https://arxiv.org/abs/2108.08286
- Code: None

**Rethinking Deep Image Prior for Denoising**

- Paper: https://arxiv.org/abs/2108.12841
- Code: https://github.com/gistvision/DIP-denosing

# Medical Image Denoising

**Eformer: Edge Enhancement based Transformer for Medical Image Denoising**

- Paper: https://arxiv.org/abs/2109.08044
- Code: None

# Deblurring

**Rethinking Coarse-to-Fine Approach in Single Image Deblurring**

- Paper: https://arxiv.org/abs/2108.05054
- Code: https://github.com/chosj95/MIMO-UNet

**Single Image Defocus Deblurring Using Kernel-Sharing Parallel Atrous Convolutions**

- Paper: https://arxiv.org/abs/2108.09108
- Code: None

# Shadow Removal

**CANet: A Context-Aware Network for Shadow Removal**

- Paper: https://arxiv.org/abs/2108.09894
- Code: https://github.com/Zipei-Chen/CANet

# Video Frame Interpolation

**XVFI: eXtreme Video Frame Interpolation**

- Paper(Oral): https://arxiv.org/abs/2103.16206
- Code: https://github.com/JihyongOh/XVFI
- Dataset: https://github.com/JihyongOh/XVFI

**Asymmetric Bilateral Motion Estimation for Video Frame Interpolation**

- Paper: https://arxiv.org/abs/2108.06815
- Code: https://github.com/JunHeum/ABME

# Video Inpainting

**FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting**

- Paper: https://arxiv.org/abs/2109.02974
- Code: https://github.com/ruiliu-ai/FuseFormer

# Person Re-identification

**TransReID: Transformer-based Object Re-Identification**

- Paper: https://arxiv.org/abs/2102.04378
- Code: https://github.com/heshuting555/TransReID

**IDM: An Intermediate Domain Module for Domain Adaptive Person Re-ID**

- Paper(Oral): https://arxiv.org/abs/2108.02413
- Code: https://github.com/SikaStar/IDM

# Person Search

**Weakly Supervised Person Search with Region Siamese Networks**

- Paper: https://arxiv.org/abs/2109.06109
- Code: None

# 2D/3D Human Pose Estimation

## 2D Human Pose Estimation

**Human Pose Regression with Residual Log-likelihood Estimation**

- Paper(Oral): https://arxiv.org/abs/2107.11291
- Code(RLE): https://github.com/Jeff-sjtu/res-loglikelihood-regression

**Online Knowledge Distillation for Efficient Pose Estimation**

- Paper: https://arxiv.org/abs/2108.02092
- Code: None

## 3D Human Pose Estimation

**Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows**

- Paper: https://arxiv.org/abs/2107.13788
- Code: https://github.com/twehrbein/Probabilistic-Monocular-3D-Human-Pose-Estimation-with-Normalizing-Flows

**Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images**

- Paper: https://arxiv.org/abs/2109.05885
- Code: None
# 6D Object Pose Estimation

**StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation**

- Paper: https://arxiv.org/abs/2109.10115
- Code: None
- Dataset: None

# 3D Head Reconstruction

**H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction**

- Homepage: https://crisalixsa.github.io/h3d-net/
- Paper: https://arxiv.org/abs/2107.12512

# Face Recognition

**SynFace: Face Recognition with Synthetic Data**

- Paper: https://arxiv.org/abs/2108.07960
- Code: None

# Facial Expression Recognition

**TransFER: Learning Relation-aware Facial Expression Representations with Transformers**

- Paper: https://arxiv.org/abs/2108.11116
- Code: None

# Action Recognition

**MGSampler: An Explainable Sampling Strategy for Video Action Recognition**

- Paper: https://arxiv.org/abs/2104.09952
- Code: None

**Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition**

- Paper: https://arxiv.org/abs/2107.12213
- Code: https://github.com/Uason-Chen/CTR-GCN

**Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization**

- Paper: https://arxiv.org/abs/2108.02183
- Code: None

**Dynamic Network Quantization for Efficient Video Inference**

- Homepage: https://cs-people.bu.edu/sunxm/VideoIQ/project.html
- Paper: https://arxiv.org/abs/2108.10394
- Code: https://github.com/sunxm2357/VideoIQ

# Temporal Action Localization

**Enriching Local and Global Contexts for Temporal Action Localization**

- Paper: https://arxiv.org/abs/2107.12960
- Code: None

# Action Detection

**Class Semantics-based Attention for Action Detection**

- Paper: https://arxiv.org/abs/2109.02613
- Code: None

# Group Activity Recognition

**GroupFormer: Group Activity Recognition with Clustered Spatial-Temporal Transformer**

- Paper: https://arxiv.org/abs/2108.12630
- Code: https://github.com/xueyee/GroupFormer

# Sign Language Recognition

**Visual Alignment Constraint for Continuous Sign Language Recognition**

- Paper: https://arxiv.org/abs/2104.02330
- Code: https://github.com/ycmin95/VAC_CSLR

# Text Detection

**Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection**

- Paper: https://arxiv.org/abs/2107.12664
- Code: https://github.com/GXYM/TextBPN

# Text Recognition

**Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition**

- Paper: https://arxiv.org/abs/2107.12090
- Code: None

# Text Replacement

**STRIVE: Scene Text Replacement In Videos**

- Homepage: https://striveiccv2021.github.io/STRIVE-ICCV2021/
- Paper: https://arxiv.org/abs/2109.02762
- Code: https://github.com/striveiccv2021/STRIVE-ICCV2021/
- Dataset: https://github.com/striveiccv2021/STRIVE-ICCV2021/
# Visual Question Answering (VQA)

**Greedy Gradient Ensemble for Robust Visual Question Answering**

- Paper: https://arxiv.org/abs/2107.12651
- Code: https://github.com/GeraldHan/GGE

# Adversarial Attack

**Feature Importance-aware Transferable Adversarial Attacks**

- Paper: https://arxiv.org/abs/2107.14185
- Code: https://github.com/hcguoO0/FIA

**AdvDrop: Adversarial Attack to DNNs by Dropping Information**

- Paper: https://arxiv.org/abs/2108.09034
- Code: https://github.com/RjDuan/AdvDrop
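Both attacks above are gradient-based; the simplest member of that family is FGSM, which perturbs the input one step along the sign of the input gradient. A minimal sketch of classic FGSM (for context only, not the FIA or AdvDrop methods themselves; the stand-in classifier is illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    """One-step L-infinity attack: x_adv = clip(x + eps * sign(grad_x loss), 0, 1)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

# Toy usage with a stand-in classifier.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
x_adv = fgsm(model, x, y)
```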
# Depth Estimation

**Augmenting Depth Estimation with Geospatial Context**

- Paper: https://arxiv.org/abs/2109.09879
- Code: None

**NerfingMVS: Guided Optimization of Neural Radiance Fields for Indoor Multi-view Stereo**

- Paper(Oral): https://arxiv.org/abs/2109.01129
- Code: https://github.com/weiyithu/NerfingMVS

## Monocular Depth Estimation

**MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments**

- Paper: https://arxiv.org/abs/2107.12429
- Code: None

**Towards Interpretable Deep Networks for Monocular Depth Estimation**

- Paper: https://arxiv.org/abs/2108.05312
- Code: https://github.com/youzunzhi/InterpretableMDE

**Regularizing Nighttime Weirdness: Efficient Self-supervised Monocular Depth Estimation in the Dark**

- Paper: https://arxiv.org/abs/2108.03830
- Code: https://github.com/w2kun/RNW

**Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation**

- Paper: https://arxiv.org/abs/2108.07628
- Code: https://github.com/LINA-lln/ADDS-DepthNet

**StructDepth: Leveraging the structural regularities for self-supervised indoor depth estimation**

- Paper: https://arxiv.org/abs/2108.08574
- Code: https://github.com/SJTU-ViSYS/StructDepth

# Gaze Estimation

**Generalizing Gaze Estimation with Outlier-guided Collaborative Adaptation**

- Paper: https://arxiv.org/abs/2107.13780
- Code: https://github.com/DreamtaleCore/PnP-GA

# Crowd Counting

**Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework**

- Paper(Oral): https://arxiv.org/abs/2107.12746
- Code(P2PNet): https://github.com/TencentYoutuResearch/CrowdCounting-P2PNet

**Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting**

- Paper: https://arxiv.org/abs/2107.12619
- Code: https://github.com/TencentYoutuResearch/CrowdCounting-UEPNet

# Lane Detection

**VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection**

- Paper: https://arxiv.org/abs/2108.08482
- Code: https://github.com/yujun0-0/MMA-Net
- Dataset: https://github.com/yujun0-0/MMA-Net

# Trajectory Prediction

**Human Trajectory Prediction via Counterfactual Analysis**

- Paper: https://arxiv.org/abs/2107.14202
- Code: https://github.com/CHENGY12/CausalHTP

**Personalized Trajectory Prediction via Distribution Discrimination**

- Paper: https://arxiv.org/abs/2107.14204
- Code: https://github.com/CHENGY12/DisDis

**MG-GAN: A Multi-Generator Model Preventing Out-of-Distribution Samples in Pedestrian Trajectory Prediction**

- Paper: https://arxiv.org/abs/2108.09274
- Code: https://github.com/selflein/MG-GAN

**Social NCE: Contrastive Learning of Socially-aware Motion Representations**

- Paper: https://arxiv.org/abs/2012.11717
- Code: https://github.com/vita-epfl/social-nce

**Safety-aware Motion Prediction with Unseen Vehicles for Autonomous Driving**

- Paper: https://arxiv.org/abs/2109.01510
- Code: https://github.com/xrenaa/Safety-Aware-Motion-Prediction

**Where are you heading? Dynamic Trajectory Prediction with Expert Goal Examples**

- Paper: https://openaccess.thecvf.com/content/ICCV2021/papers/Zhao_Where_Are_You_Heading_Dynamic_Trajectory_Prediction_With_Expert_Goal_ICCV_2021_paper.pdf
- Code: https://github.com/JoeHEZHAO/expert_traj
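The trajectory-prediction methods above are typically compared on ADE/FDE, the average and final displacement errors between predicted and ground-truth tracks. A minimal sketch of those two metrics (the function name is illustrative):

```python
import torch

def ade_fde(pred, gt):
    """pred, gt: (B, T, 2) trajectories in world coordinates."""
    dist = (pred - gt).norm(dim=-1)  # (B, T) per-timestep displacement errors
    # ADE averages over all timesteps; FDE looks only at the final one.
    return dist.mean().item(), dist[:, -1].mean().item()

pred, gt = torch.rand(8, 12, 2), torch.rand(8, 12, 2)
ade, fde = ade_fde(pred, gt)
```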
# Anomaly Detection

**Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning**

- Paper: https://arxiv.org/abs/2101.10030
- Code: https://github.com/tianyu0207/RTFM

# Scene Graph Generation

**Spatial-Temporal Transformer for Dynamic Scene Graph Generation**

- Paper: https://arxiv.org/abs/2107.12309
- Code: None

# Image Editing

**Sketch Your Own GAN**

- Homepage: https://peterwang512.github.io/GANSketching/
- Paper: https://arxiv.org/abs/2108.02774
- Code: https://github.com/peterwang512/GANSketching

# Image Synthesis

**Image Synthesis via Semantic Composition**

- Homepage: https://shepnerd.github.io/scg/
- Paper: https://arxiv.org/abs/2109.07053
- Code: https://github.com/dvlab-research/SCGAN

# Image Retrieval

**Self-supervised Product Quantization for Deep Unsupervised Image Retrieval**

- Paper: https://arxiv.org/abs/2109.02244
- Code: https://github.com/youngkyunJang/SPQ

# 3D Reconstruction

**Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction**

- Paper: https://arxiv.org/abs/2109.00512
- Code: https://github.com/facebookresearch/co3d
- Dataset: https://github.com/facebookresearch/co3d

# Video Stabilization

**Out-of-boundary View Synthesis Towards Full-Frame Video Stabilization**

- Paper: https://arxiv.org/abs/2108.09041
- Code: https://github.com/Annbless/OVS_Stabilization

# Fine-Grained Recognition

**Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach**

- Paper: https://arxiv.org/abs/2108.02399
- Code: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
- Dataset: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset

# Style Transfer

**AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer**

- Paper: https://arxiv.org/abs/2108.03647
- Paddle Code: https://github.com/PaddlePaddle/PaddleGAN
- PyTorch Code: https://github.com/Huage001/AdaAttN

# Neural Painting

**Paint Transformer: Feed Forward Neural Painting with Stroke Prediction**

- Paper: https://arxiv.org/abs/2108.03798
- Code: https://github.com/wzmsltw/PaintTransformer

# Feature Matching

**Learning to Match Features with Seeded Graph Matching Network**

- Paper: https://arxiv.org/abs/2108.08771
- Code: https://github.com/vdvchen/SGMNet

# Semantic Correspondence

**Multi-scale Matching Networks for Semantic Correspondence**

- Paper: https://arxiv.org/abs/2108.00211
- Code: https://github.com/wintersun661/MMNet

# Edge Detection

**Pixel Difference Networks for Efficient Edge Detection**

- Paper: https://arxiv.org/abs/2108.07009
- Code: https://github.com/zhuoinoulu/pidinet

**RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth**

- Paper: https://arxiv.org/abs/2108.00616
- Code: https://github.com/MengyangPu/RINDNet
- Dataset: https://github.com/MengyangPu/RINDNet

# Camera Calibration

**CTRL-C: Camera calibration TRansformer with Line-Classification**

- Paper: https://arxiv.org/abs/2109.02259
- Code: https://github.com/jwlee-vcl/CTRL-C

# Image Quality Assessment

**MUSIQ: Multi-scale Image Quality Transformer**

- Paper: https://arxiv.org/abs/2108.05997
- Code: https://github.com/google-research/google-research/tree/master/musiq

**Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment**

- Paper: https://arxiv.org/abs/2108.07948
- Code: https://github.com/researchmm/CKDN

# Metric Learning

**Deep Relational Metric Learning**

- Paper: https://arxiv.org/abs/2108.10026
- Code: https://github.com/zbr17/DRML

**Towards Interpretable Deep Metric Learning with Structural Matching**

- Paper: https://arxiv.org/abs/2108.05889
- Code: https://github.com/wl-zhao/DIML
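Deep metric learning methods like DRML and DIML learn an embedding where same-class images sit close together; a classic training signal for such embeddings is a triplet margin loss over (anchor, positive, negative) examples. A generic sketch with `torch.nn.TripletMarginLoss` (illustrative only, not the DRML/DIML objectives; the toy embedding network stands in for a real backbone):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy embedding network; real methods use a CNN/ViT backbone.
embed = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
loss_fn = nn.TripletMarginLoss(margin=0.2)

anchor, positive, negative = (torch.randn(16, 3, 32, 32) for _ in range(3))
# L2-normalize so distances live on the unit hypersphere.
za, zp, zn = (F.normalize(embed(t), dim=1) for t in (anchor, positive, negative))
loss = loss_fn(za, zp, zn)
loss.backward()
```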

# Unsupervised Domain Adaptation

**Recursively Conditional Gaussian for Ordinal Unsupervised Domain Adaptation**

- Paper(Oral): https://arxiv.org/abs/2107.13467
- Code: None

# Video Rescaling

**Self-Conditioned Probabilistic Learning of Video Rescaling**

- Paper: https://arxiv.org/abs/2107.11639
- Code: None

# Hand-Object Interaction

**Learning a Contact Potential Field to Model the Hand-Object Interaction**

- Paper: https://arxiv.org/abs/2012.00924
- Code: https://lixiny.github.io/CPF

# Vision-and-Language Navigation

**Airbert: In-domain Pretraining for Vision-and-Language Navigation**

- Paper: https://arxiv.org/abs/2108.09105
- Code: https://airbert-vln.github.io/
- Dataset: https://airbert-vln.github.io/

# 数据集(Datasets)

**Beyond Road Extraction: A Dataset for Map Update using Aerial Images**

- Homepage: https://favyen.com/muno21/
- Paper: https://arxiv.org/abs/2110.04690
- Code: https://github.com/favyen/muno21

**StereOBJ-1M: Large-scale Stereo Image Dataset for 6D Object Pose Estimation**

- Paper: https://arxiv.org/abs/2109.10115
- Code: None
- Dataset: None

**RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth**

- Paper: https://arxiv.org/abs/2108.00616
- Code: https://github.com/MengyangPu/RINDNet
- Dataset: https://github.com/MengyangPu/RINDNet

**Panoptic Narrative Grounding**

- Homepage: https://bcv-uniandes.github.io/panoptic-narrative-grounding/
- Paper(Oral): https://arxiv.org/abs/2109.04988
- Code: https://github.com/BCV-Uniandes/PNG
- Dataset: https://github.com/BCV-Uniandes/PNG

**STRIVE: Scene Text Replacement In Videos**

- Homepage: https://striveiccv2021.github.io/STRIVE-ICCV2021/
- Paper: https://arxiv.org/abs/2109.02762
- Code: https://github.com/striveiccv2021/STRIVE-ICCV2021/
- Dataset: https://github.com/striveiccv2021/STRIVE-ICCV2021/

**Real-world Video Super-resolution: A Benchmark Dataset and A Decomposition based Learning Scheme**

- Paper: https://www4.comp.polyu.edu.hk/~cslzhang/paper/ICCV21_RealVSR.pdf
- Code: https://github.com/IanYeung/RealVSR
- Dataset: https://github.com/IanYeung/RealVSR

**Matching in the Dark: A Dataset for Matching Image Pairs of Low-light Scenes**

- Paper: https://arxiv.org/abs/2109.03585
- Code: None

**Dual-Camera Super-Resolution with Aligned Attention Modules**

- Homepage: https://tengfei-wang.github.io/Dual-Camera-SR/index.html
- Paper: https://arxiv.org/abs/2109.01349
- Code: https://github.com/Tengfei-Wang/DualCameraSR
- Dataset: https://tengfei-wang.github.io/Dual-Camera-SR/index.html

**DepthTrack: Unveiling the Power of RGBD Tracking**

- Paper: https://arxiv.org/abs/2108.13962
- Code: https://github.com/xiaozai/DeT
- Dataset: https://github.com/xiaozai/DeT

**Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction**

- Paper: https://arxiv.org/abs/2109.00512
- Code: https://github.com/facebookresearch/co3d
- Dataset: https://github.com/facebookresearch/co3d

**BioFors: A Large Biomedical Image Forensics Dataset**

- Paper: https://arxiv.org/abs/2108.12961
- Code: None
- Dataset: None

**Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An Approach**

- Paper: https://arxiv.org/abs/2108.02399
- Code: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset
- Dataset: https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset

**Airbert: In-domain Pretraining for Vision-and-Language Navigation**

- Paper: https://arxiv.org/abs/2108.09105
- Code: https://airbert-vln.github.io/
- Dataset: https://airbert-vln.github.io/

**Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation**

- Paper: http://arxiv.org/abs/2108.08202
- Code: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
- Dataset: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021

**VIL-100: A New Dataset and A Baseline Model for Video Instance Lane Detection**

- Paper: https://arxiv.org/abs/2108.08482
- Code: https://github.com/yujun0-0/MMA-Net
- Dataset: https://github.com/yujun0-0/MMA-Net

**XVFI: eXtreme Video Frame Interpolation**

- Paper(Oral): https://arxiv.org/abs/2103.16206
- Code: https://github.com/JihyongOh/XVFI
- Dataset: https://github.com/JihyongOh/XVFI

**Personalized Image Semantic Segmentation**

- Paper: https://arxiv.org/abs/2107.13978
- Code: https://github.com/zhangyuygss/PIS
- Dataset: https://github.com/zhangyuygss/PIS

**H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction**

- Homepage: https://crisalixsa.github.io/h3d-net/
- Paper: https://arxiv.org/abs/2107.12512

# 其他(Others)

**Photon-Starved Scene Inference using Single Photon Cameras**

- Paper: https://openaccess.thecvf.com/content/ICCV2021/papers/Goyal_Photon-Starved_Scene_Inference_Using_Single_Photon_Cameras_ICCV_2021_paper.pdf
- Code: https://github.com/bhavyagoyal/spclowlight

**Towards Flexible Blind JPEG Artifacts Removal**

- Paper: https://arxiv.org/abs/2109.14573
- Code: https://github.com/jiaxi-jiang/FBCNN

**Generating Attribution Maps with Disentangled Masked Backpropagation**

- Paper: https://arxiv.org/abs/2101.06773
- Code: https://gitlab.com/adriaruizo/dmbp_iccv21

**CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations**

- Paper: https://arxiv.org/abs/2109.14910
- Code: None

**ReconfigISP: Reconfigurable Camera Image Processing Pipeline**

- Paper: https://arxiv.org/abs/2109.04760
- Code: None

**Panoptic Narrative Grounding**

- Homepage: https://bcv-uniandes.github.io/panoptic-narrative-grounding/
- Paper(Oral): https://arxiv.org/abs/2109.04988
- Code: https://github.com/BCV-Uniandes/PNG
- Dataset: https://github.com/BCV-Uniandes/PNG

**NEAT: Neural Attention Fields for End-to-End Autonomous Driving**

- Paper: https://arxiv.org/abs/2109.04456
- Code: https://github.com/autonomousvision/neat

**Keep CALM and Improve Visual Feature Attribution**

- Paper: https://arxiv.org/abs/2106.07861
- Code: https://github.com/naver-ai/calm

**YouRefIt: Embodied Reference Understanding with Language and Gesture**

- Paper: https://arxiv.org/abs/2109.03413
- Code: None

**Pri3D: Can 3D Priors Help 2D Representation Learning?**

- Paper: https://arxiv.org/abs/2104.11225
- Code: https://github.com/Sekunde/Pri3D

**Amplitude-Phase Recombination: Rethinking Robustness of Convolutional Neural Networks in Frequency Domain**

- Paper: https://arxiv.org/abs/2108.08487
- Code: https://github.com/iCGY96/APR
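
The amplitude-phase idea in the entry above is easy to see in isolation: an image's Fourier amplitude and phase can be split apart and recombined across images (the paper's starting observation, roughly, is that CNNs lean on the amplitude component much more than humans do). A minimal NumPy sketch of just that recombination step, with an illustrative function name and shapes, not the paper's augmentation or training pipeline:

```python
import numpy as np

def recombine_amplitude_phase(img_a: np.ndarray, img_b: np.ndarray) -> np.ndarray:
    """Combine the Fourier amplitude of img_a with the phase of img_b.

    Both inputs are float arrays of identical shape, (H, W) or (H, W, C);
    the FFT is taken over the two spatial axes.
    """
    fft_a = np.fft.fft2(img_a, axes=(0, 1))
    fft_b = np.fft.fft2(img_b, axes=(0, 1))
    amplitude = np.abs(fft_a)   # which frequencies/textures are present
    phase = np.angle(fft_b)     # where structures are located
    recombined = amplitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(recombined, axes=(0, 1)))
```
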

**Continual Learning for Image-Based Camera Localization**

- Paper: https://arxiv.org/abs/2108.09112
- Code: None

**Multi-Task Self-Training for Learning General Representations**

- Paper: https://arxiv.org/abs/2108.11353
- Code: None

**A Unified Objective for Novel Class Discovery**

- Homepage: https://ncd-uno.github.io/
- Paper(Oral): https://arxiv.org/abs/2108.08536
- Code: https://github.com/DonkeyShot21/UNO

**Global Pooling, More than Meets the Eye: Position Information is Encoded Channel-Wise in CNNs**

- Paper: https://arxiv.org/abs/2108.07884
- Code: https://github.com/islamamirul/PermuteNet

**Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation**

- Paper: http://arxiv.org/abs/2108.08202
- Code: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
- Dataset: https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021

**Impact of Aliasing on Generalization in Deep Convolutional Networks**

- Paper: https://arxiv.org/abs/2108.03489
- Code: None

**Out-of-Core Surface Reconstruction via Global TGV Minimization**

- Paper: https://arxiv.org/abs/2107.14790
- Code: None

**Progressive Correspondence Pruning by Consensus Learning**

- Homepage: https://sailor-z.github.io/projects/CLNet.html
- Paper: https://arxiv.org/abs/2101.00591
- Code: https://github.com/sailor-z/CLNet

**Energy-Based Open-World Uncertainty Modeling for Confidence Calibration**

- Paper: https://arxiv.org/abs/2107.12628
- Code: None

**Generalized Shuffled Linear Regression**

- Paper: https://drive.google.com/file/d/1Qu21VK5qhCW8WVjiRnnBjehrYVmQrDNh/view?usp=sharing
- Code: https://github.com/SILI1994/Generalized-Shuffled-Linear-Regression

**Discovering 3D Parts from Image Collections**

- Homepage: https://chhankyao.github.io/lpd/
- Paper: https://arxiv.org/abs/2107.13629

**Semi-Supervised Active Learning with Temporal Output Discrepancy**

- Paper: https://arxiv.org/abs/2107.14153
- Code: https://github.com/siyuhuang/TOD

**Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?**

- Paper: https://arxiv.org/abs/2105.02498
- Code: https://github.com/KingJamesSong/DifferentiableSVD

**Hand-Object Contact Consistency Reasoning for Human Grasps Generation**

- Homepage: https://hwjiang1510.github.io/GraspTTA/
- Paper(Oral): https://arxiv.org/abs/2104.03304
- Code: None

**Equivariant Imaging: Learning Beyond the Range Space**

- Paper(Oral): https://arxiv.org/abs/2103.14756
- Code: https://github.com/edongdongchen/EI

**Just Ask: Learning to Answer Questions from Millions of Narrated Videos**

- Paper(Oral): https://arxiv.org/abs/2012.00451
- Code: https://github.com/antoyang/just-ask

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

# ICCV2023-Papers-with-Code

[ICCV 2023](http://iccv2023.thecvf.com/) collection of papers and open-source projects (papers with code)!

2160 papers accepted!

ICCV 2023 accepted paper IDs: https://t.co/A0mCH8gbOi

> Note 1: Issues are welcome — feel free to share ICCV 2023 papers and open-source projects!
>
> Note 2: For papers from past top CV conferences and other curated CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision
>
> [ICCV 2021](ICCV2021-Papers-with-Code.md)

If you want to keep up with the latest and highest-quality CV papers, open-source projects, and learning materials, you are welcome to scan the code and join the [CVer Academic Exchange Group](https://t.zsxq.com/10OGjThDw)! Learn from each other and improve together~

![](https://github.com/amusi/CVPR2023-Papers-with-Code/raw/master/CVer%E5%AD%A6%E6%9C%AF%E4%BA%A4%E6%B5%81%E7%BE%A4.png)

# 【ICCV 2023 Papers with Code Directory】

- [Backbone](#Backbone)
- [CLIP](#CLIP)
- [MAE](#MAE)
- [GAN](#GAN)
- [GNN](#GNN)
- [MLP](#MLP)
- [NAS](#NAS)
- [OCR](#OCR)
- [NeRF](#NeRF)
- [DETR](#DETR)
- [Prompt](#Prompt)
- [Diffusion Models(扩散模型)](#Diffusion)
- [Avatars](#Avatars)
- [ReID(重识别)](#ReID)
- [长尾分布(Long-Tail)](#Long-Tail)
- [Vision Transformer](#Vision-Transformer)
- [视觉和语言(Vision-Language)](#VL)
- [自监督学习(Self-supervised Learning)](#SSL)
- [数据增强(Data Augmentation)](#DA)
- [目标检测(Object Detection)](#Object-Detection)
- [目标跟踪(Visual Tracking)](#VT)
- [语义分割(Semantic Segmentation)](#Semantic-Segmentation)
- [实例分割(Instance Segmentation)](#Instance-Segmentation)
- [全景分割(Panoptic Segmentation)](#Panoptic-Segmentation)
- [医学图像分类(Medical Image Classification)](#MIC)
- [医学图像分割(Medical Image Segmentation)](#MIS)
- [视频目标分割(Video Object Segmentation)](#VOS)
- [视频实例分割(Video Instance Segmentation)](#VIS)
- [参考图像分割(Referring Image Segmentation)](#RIS)
- [图像抠图(Image Matting)](#Matting)
- [Low-level Vision](#LLV)
- [超分辨率(Super-Resolution)](#SR)
- [去噪(Denoising)](#Denoising)
- [去模糊(Deblur)](#Deblur)
- [3D点云(3D Point Cloud)](#3D-Point-Cloud)
- [3D目标检测(3D Object Detection)](#3DOD)
- [3D语义分割(3D Semantic Segmentation)](#3DSS)
- [3D目标跟踪(3D Object Tracking)](#3D-Object-Tracking)
- [3D语义场景补全(3D Semantic Scene Completion)](#3DSSC)
- [3D配准(3D Registration)](#3D-Registration)
- [3D人体姿态估计(3D Human Pose Estimation)](#3D-Human-Pose-Estimation)
- [3D人体Mesh估计(3D Human Mesh Estimation)](#3D-Human-Pose-Estimation)
- [医学图像(Medical Image)](#Medical-Image)
- [图像生成(Image Generation)](#Image-Generation)
- [视频生成(Video Generation)](#Video-Generation)
- [图像编辑(Image Editing)](#Image-Editing)
- [视频编辑(Video Editing)](#Video-Editing)
- [视频理解(Video Understanding)](#Video-Understanding)
- [人体运动生成(Human Motion Generation)](#Human-Motion-Generation)
- [低光照图像增强(Low-light Image Enhancement)](#Low-light-Image-Enhancement)
- [场景文本识别(Scene Text Recognition)](#STR)
- [图像检索(Image Retrieval)](#Image-Retrieval)
- [图像融合(Image Fusion)](#Image-Fusion)
- [轨迹预测(Trajectory Prediction)](#Trajectory-Prediction)
- [人群计数(Crowd Counting)](#Crowd-Counting)
- [Video Quality Assessment(视频质量评价)](#Video-Quality-Assessment)
- [其它(Others)](#Others)

# Avatars

**Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control**

- Paper: https://arxiv.org/abs/2303.17606
- Code: https://github.com/songrise/AvatarCraft

# Backbone

**Rethinking Mobile Block for Efficient Attention-based Models**

- Paper: https://arxiv.org/abs/2301.01146
- Code: https://github.com/zhangzjn/EMO

# CLIP

**PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization**

- Paper: https://arxiv.org/abs/2307.15199
- Code: [https://PromptStyler.github.io/](https://promptstyler.github.io/)

**CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation**

- Paper: https://arxiv.org/abs/2308.15226
- Code: http://www.github.com/devaansh100/CLIPTrans
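
Both entries in this section build on a frozen CLIP backbone. For orientation only, here is standard zero-shot classification with the openai/CLIP package (the image path and label prompts are placeholders; this is generic CLIP usage, not the method of either paper):

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder image and candidate label prompts.
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a photo of a dog", "a photo of a cat"]).to(device)

with torch.no_grad():
    # CLIP returns image-text similarity logits; softmax over the
    # candidate prompts gives zero-shot classification scores.
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

print(probs)  # zero-shot probabilities over the candidate labels
```
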

# NeRF

**IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis**

- Homepage: https://zju3dv.github.io/intrinsic_nerf/
- Paper: https://arxiv.org/abs/2210.00647
- Code: https://github.com/zju3dv/IntrinsicNeRF

**Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control**

- Paper: https://arxiv.org/abs/2303.17606
- Code: https://github.com/songrise/AvatarCraft

**FlipNeRF: Flipped Reflection Rays for Few-shot Novel View Synthesis**

- Homepage: https://shawn615.github.io/flipnerf/
- Paper: https://arxiv.org/abs/2306.17723
- Code: https://github.com/shawn615/FlipNeRF

**Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields**

- Homepage: https://wbhu.github.io/projects/Tri-MipRF
- Paper: https://arxiv.org/abs/2307.11335
- Code: https://github.com/wbhu/Tri-MipRF

# Diffusion Models(扩散模型)

**PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment**

- Paper: https://arxiv.org/abs/2306.15667
- Code: https://github.com/facebookresearch/PoseDiffusion

**FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model**

- Paper: https://arxiv.org/abs/2303.09833
- Code: https://github.com/vvictoryuki/FreeDoM

**BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion**

- Paper: https://arxiv.org/abs/2307.10816
- Code: https://github.com/Sierkinhane/BoxDiff

**BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction**

- Paper: https://arxiv.org/abs/2211.14304
- Code: https://github.com/BarqueroGerman/BeLFusion

**DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion**

- Paper: https://arxiv.org/abs/2303.06840
- Code: https://github.com/Zhaozixiang1228/MMIF-DDFM

**DIRE for Diffusion-Generated Image Detection**

- Paper: https://arxiv.org/abs/2303.09295
- Code: https://github.com/ZhendongWang6/DIRE

# Prompt

**Read-only Prompt Optimization for Vision-Language Few-shot Learning**

- Paper: https://arxiv.org/abs/2308.14960
- Code: https://github.com/mlvlab/RPO

**Introducing Language Guidance in Prompt-based Continual Learning**

- Paper: https://arxiv.org/abs/2308.15827
- Code: None

# 视觉和语言(Vision-Language)

**Read-only Prompt Optimization for Vision-Language Few-shot Learning**

- Paper: https://arxiv.org/abs/2308.14960
- Code: https://github.com/mlvlab/RPO

# 目标检测(Object Detection)

**FemtoDet: An Object Detection Baseline for Energy Versus Performance Tradeoffs**

- Paper: https://arxiv.org/abs/2301.06719
- Code: https://github.com/yh-pengtu/FemtoDet

**Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment**

- Paper: https://arxiv.org/abs/2207.13085
- Code: https://github.com/Atten4Vis/GroupDETR

**Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection**

- Paper: https://arxiv.org/abs/2205.09613
- Code: https://github.com/LiewFeng/imTED

**ASAG: Building Strong One-Decoder-Layer Sparse Detectors via Adaptive Sparse Anchor Generation**

- Paper: https://arxiv.org/abs/2308.09242
- Code: https://github.com/iSEE-Laboratory/ASAG

# 目标跟踪(Visual Tracking)

**Cross-modal Orthogonal High-rank Augmentation for RGB-Event Transformer-trackers**

- Paper: https://arxiv.org/abs/2307.04129
- Code: https://github.com/ZHU-Zhiyu/High-Rank_RGB-Event_Tracker

# 语义分割(Semantic Segmentation)

**Segment Anything**

- Homepage: https://segment-anything.com/
- Paper: https://arxiv.org/abs/2304.02643
- Code: https://github.com/facebookresearch/segment-anything
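
Since SAM is promptable out of the box, a minimal point-prompt sketch with the official `segment_anything` package may be useful (the checkpoint and image filenames are assumptions; checkpoints are linked from the repo above):

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load a pretrained SAM model (checkpoint assumed to be downloaded
# from the facebookresearch/segment-anything repo linked above).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# SamPredictor expects an HWC uint8 RGB image.
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Prompt with a single foreground point; SAM returns candidate masks.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),  # 1 = foreground point, 0 = background
    multimask_output=True,
)
print(masks.shape, scores)  # boolean masks of shape (3, H, W) with confidences
```
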

**MARS: Model-agnostic Biased Object Removal without Additional Supervision for Weakly-Supervised Semantic Segmentation**

- Paper: https://arxiv.org/abs/2304.09913
- Code: https://github.com/shjo-april/MARS

**FreeCOS: Self-Supervised Learning from Fractals and Unlabeled Images for Curvilinear Object Segmentation**

- Paper: https://arxiv.org/abs/2307.07245
- Code: https://github.com/TY-Shi/FreeCOS

**Residual Pattern Learning for Pixel-wise Out-of-Distribution Detection in Semantic Segmentation**

- Paper: https://arxiv.org/abs/2211.14512
- Code: https://github.com/yyliu01

**Disentangle then Parse: Night-time Semantic Segmentation with Illumination Disentanglement**

- Paper: https://arxiv.org/abs/2307.09362
- Code: https://github.com/w1oves/DTP

# 视频目标分割(Video Object Segmentation)

**Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus**

- Paper: https://arxiv.org/abs/2207.01203
- Code: https://github.com/lxa9867/R2VOS

# 视频实例分割(Video Instance Segmentation)

**DVIS: Decoupled Video Instance Segmentation Framework**

- Paper: https://arxiv.org/abs/2306.03413
- Code: https://github.com/zhang-tao-whu/DVIS

# 医学图像分类(Medical Image Classification)

**BoMD: Bag of Multi-label Descriptors for Noisy Chest X-ray Classification**

- Paper: https://arxiv.org/abs/2203.01937
- Code: https://github.com/cyh-0/BoMD

# 医学图像分割(Medical Image Segmentation)

**CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection**

- Paper: https://arxiv.org/abs/2301.00785
- Code: https://github.com/ljwztc/CLIP-Driven-Universal-Model

# Low-level Vision

**Self-supervised Learning to Bring Dual Reversed Rolling Shutter Images Alive**

- Paper: https://arxiv.org/abs/2305.19862
- Code: https://github.com/shangwei5/SelfDRSC

# 超分辨率(Super-Resolution)

**Spherical Space Feature Decomposition for Guided Depth Map Super-Resolution**

- Paper: https://arxiv.org/abs/2303.08942
- Code: https://github.com/Zhaozixiang1228/GDSR-SSDNet

# 3D点云(3D Point Cloud)

**Robo3D: Towards Robust and Reliable 3D Perception against Corruptions**

- Homepage: https://ldkong.com/Robo3D
- Paper: https://arxiv.org/abs/2303.17597
- Code: https://github.com/ldkong1205/Robo3D

**Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models**

- Paper: https://arxiv.org/abs/2304.07221
- Code: https://github.com/zyh16143998882/ICCV23-IDPT

**Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos**

- Paper: https://arxiv.org/abs/2308.09247
- Code: None

# 3D目标检测(3D Object Detection)

**PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images**

- Paper: https://arxiv.org/abs/2206.01256
- Code: https://github.com/megvii-research/PETR

**DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection**

- Paper: https://arxiv.org/abs/2304.13031
- Code: https://github.com/AIR-DISCOVER/DQS3D

**SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection**

- Paper: https://arxiv.org/abs/2304.14340
- Code: https://github.com/yichen928/SparseFusion

**StreamPETR: Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection**

- Paper: https://arxiv.org/abs/2303.11926
- Code: https://github.com/exiawsh/StreamPETR.git

**Cross Modal Transformer: Towards Fast and Robust 3D Object Detection**

- Paper: https://arxiv.org/abs/2301.01283
- Code: https://github.com/junjie18/CMT.git

**MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation**

- Paper: https://arxiv.org/abs/2304.09801
- Project: https://chongjiange.github.io/metabev.html
- Code: https://github.com/ChongjianGE/MetaBEV

**Revisiting Domain-Adaptive 3D Object Detection by Reliable, Diverse and Class-balanced Pseudo-Labeling**

- Paper: https://arxiv.org/abs/2307.07944
- Code: https://github.com/zhuoxiao-chen/ReDB-DA-3Ddet

**SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection**

- Paper: https://arxiv.org/abs/2307.11477
- Code: https://github.com/mengtan00/SA-BEV

# 3D语义分割(3D Semantic Segmentation)

**Rethinking Range View Representation for LiDAR Segmentation**

- Homepage: https://ldkong.com/RangeFormer
- Paper: https://arxiv.org/abs/2303.05367
- Code: None

# 3D目标跟踪(3D Object Tracking)

**MBPTrack: Improving 3D Point Cloud Tracking with Memory Networks and Box Priors**

- Paper: https://arxiv.org/abs/2303.05071
- Code: https://github.com/slothfulxtx/MBPTrack3D

# 视频理解(Video Understanding)

**Unmasked Teacher: Towards Training-Efficient Video Foundation Models**

- Paper: https://arxiv.org/abs/2303.16058
- Code: https://github.com/OpenGVLab/unmasked_teacher

# 图像生成(Image Generation)

**FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model**

- Paper: https://arxiv.org/abs/2303.09833
- Code: https://github.com/vvictoryuki/FreeDoM

**BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion**

- Paper: https://arxiv.org/abs/2307.10816
- Code: https://github.com/Sierkinhane/BoxDiff
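
FreeDoM and BoxDiff are training-free: they steer a pretrained text-to-image diffusion model at sampling time. As a baseline reference point, plain unconstrained sampling with Hugging Face `diffusers` looks like this (the model id and prompt are illustrative; neither paper's guidance is applied here):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained latent diffusion pipeline (illustrative model id).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Plain text-conditioned sampling; training-free methods such as
# FreeDoM or BoxDiff modify this denoising loop with extra guidance
# terms or box constraints instead of retraining the model.
image = pipe("a photo of a red car parked by the sea").images[0]
image.save("sample.png")
```
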

# 视频生成(Video Generation)

**Simulating Fluids in Real-World Still Images**

- Homepage: https://slr-sfs.github.io/
- Paper: https://arxiv.org/abs/2204.11335
- Code: https://github.com/simon3dv/SLR-SFS

# 图像编辑(Image Editing)

**Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing**

- Paper: https://arxiv.org/abs/2304.02051
- Code: https://github.com/aimagelab/multimodal-garment-designer

# 视频编辑(Video Editing)

**FateZero: Fusing Attentions for Zero-shot Text-based Video Editing**

- Project: https://fate-zero-edit.github.io/
- Paper: https://arxiv.org/abs/2303.09535
- Code: https://github.com/ChenyangQiQi/FateZero

# 人体运动生成(Human Motion Generation)

**BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction**

- Paper: https://arxiv.org/abs/2211.14304
- Code: https://github.com/BarqueroGerman/BeLFusion

# 低光照图像增强(Low-light Image Enhancement)

**Implicit Neural Representation for Cooperative Low-light Image Enhancement**

- Paper: https://arxiv.org/abs/2303.11722
- Code: https://github.com/Ysz2022/NeRCo

# 场景文本识别(Scene Text Recognition)

**Self-supervised Character-to-Character Distillation for Text Recognition**

- Paper: https://arxiv.org/abs/2211.00288
- Code: https://github.com/TongkunGuan/CCD

**MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition**

- Paper: https://arxiv.org/abs/2305.14758
- Code: https://github.com/simplify23/MRN
- Chinese write-up: https://zhuanlan.zhihu.com/p/643948935

# 图像检索(Image Retrieval)

**Zero-Shot Composed Image Retrieval with Textual Inversion**

- Paper: https://arxiv.org/abs/2303.15247
- Code: https://github.com/miccunifi/SEARLE

# 图像融合(Image Fusion)

**DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion**

- Paper: https://arxiv.org/abs/2303.06840
- Code: https://github.com/Zhaozixiang1228/MMIF-DDFM

# 轨迹预测(Trajectory Prediction)

**EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting**

- Homepage: https://inhwanbae.github.io/publication/eigentrajectory/
- Paper: https://arxiv.org/abs/2307.09306
- Code: https://github.com/InhwanBae/EigenTrajectory

# 人群计数(Crowd Counting)

**Point-Query Quadtree for Crowd Counting, Localization, and More**

- Paper: https://arxiv.org/abs/2308.13814
- Code: https://github.com/cxliu0/PET

# Video Quality Assessment(视频质量评价)

**Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives**

- Paper: https://arxiv.org/abs/2211.04894
- Code: https://github.com/VQAssessment/DOVER

# 其它(Others)

**MotionBERT: A Unified Perspective on Learning Human Motion Representations**

- Homepage: https://motionbert.github.io/
- Paper: https://arxiv.org/abs/2210.06551
- Code: https://github.com/Walter0807/MotionBERT

**Graph Matching with Bi-level Noisy Correspondence**

- Paper: https://arxiv.org/pdf/2212.04085.pdf
- Code: https://github.com/Lin-Yijie/Graph-Matching-Networks/tree/main/COMMON

**LDL: Line Distance Functions for Panoramic Localization**

- Paper: https://arxiv.org/abs/2308.13989
- Code: https://github.com/82magnolia/panoramic-localization

**Active Neural Mapping**

- Homepage: https://zikeyan.github.io/active-INR/index.html
- Paper: https://arxiv.org/abs/2308.16246
- Code: https://zikeyan.github.io/active-INR/index.html#

**Reconstructing Groups of People with Hypergraph Relational Reasoning**

- Paper: https://arxiv.org/abs/2308.15844
- Code: https://github.com/boycehbz/GroupRec

--------------------------------------------------------------------------------