# CVPR 2024

![CVPR 2024 logo](https://camo.githubusercontent.com/e98004b4a9a1fdbad3c3fe1be700c0f0546286942108c54fa7f009eb786df0d0/68747470733a2f2f6869726f6b617473756b6174616f6b6131362e6769746875622e696f2f435650522d323032342d4c494d49542f696d672f435650525f4c6f676f53656174746c655f323032345f5072696d6172792e6a7067)

### Research Papers with Code

![Mind map](mindmap/fd94ed3530b7015a458d81055f24fc026f33f7d1c45cde6cdc34f1a689509916.png)

---
## Table of Contents
- [3DGS (Gaussian Splatting)](#3dgs-gaussian-splatting)
- [Avatars](#avatars)
- [Backbone](#backbone)
- [CLIP](#clip)
- [Embodied AI](#embodied-ai)
- [OCR](#ocr)
- [NeRF](#nerf)
- [DETR](#detr)
- [ReID](#reid)
- [Long-Tail](#long-tail)
- [Vision Transformer](#vision-transformer)
- [Vision-Language](#vision-language)
- [Self-supervised Learning](#self-supervised-learning)
- [Data Augmentation](#data-augmentation)
- [Object Detection](#object-detection)
- [Anomaly Detection](#anomaly-detection)
- [Visual Tracking](#visual-tracking)
- [Semantic Segmentation](#semantic-segmentation)
- [Instance Segmentation](#instance-segmentation)
- [Panoptic Segmentation](#panoptic-segmentation)
- [Medical Image](#medical-image)
- [Medical Image Segmentation](#medical-image-segmentation)
- [Video Object Segmentation](#video-object-segmentation)
- [Video Instance Segmentation](#video-instance-segmentation)
- [Referring Image Segmentation](#referring-image-segmentation)
- [Image Matting](#image-matting)
- [Image Editing](#image-editing)
- [Low-level Vision](#low-level-vision)
- [Super-Resolution](#super-resolution)
- [Denoising](#denoising)
- [Deblur](#deblur)
- [Autonomous Driving](#autonomous-driving)
- [3D Point Cloud](#3d-point-cloud)
- [3D Object Detection](#3d-object-detection)
- [3D Semantic Segmentation](#3d-semantic-segmentation)
- [3D Object Tracking](#3d-object-tracking)
- [3D Semantic Scene Completion](#3d-semantic-scene-completion)
- [3D Registration](#3d-registration)
- [3D Human Pose Estimation](#3d-human-pose-estimation)
- [3D Human Mesh Estimation](#3d-human-mesh-estimation)
- [Image Generation](#image-generation)
- [Video Generation](#video-generation)
- [Video Understanding](#video-understanding)
- [Knowledge Distillation](#knowledge-distillation)
- [Stereo Matching](#stereo-matching)
- [Scene Graph Generation](#scene-graph-generation)
- [Video Quality Assessment](#video-quality-assessment)
- [Datasets](#datasets)
- [Others](#others)

### Domain-wise Table

#### 3DGS (Gaussian Splatting)

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 1 | Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering | [Paper](https://arxiv.org/abs/2312.00109) | [Code](https://github.com/city-super/Scaffold-GS) | [Homepage](https://city-super.github.io/scaffold-gs/) |
| 2 | GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis | [Paper](https://arxiv.org/abs/2312.02155) | [Code](https://github.com/ShunyuanZheng/GPS-Gaussian) | [Homepage](https://shunyuanzheng.github.io/GPS-Gaussian) |
| 3 | GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians | [Paper](https://arxiv.org/abs/2312.02134) | [Code](https://github.com/huliangxiao/GaussianAvatar) | N/A |
| 4 | GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting | [Paper](https://arxiv.org/abs/2311.14521) | [Code](https://github.com/buaacyw/GaussianEditor) | N/A |
| 5 | Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction | [Paper](https://arxiv.org/abs/2309.13101) | [Code](https://github.com/ingra14m/Deformable-3D-Gaussians) | [Homepage](https://ingra14m.github.io/Deformable-Gaussians/) |

#### Avatars

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 6 | GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians | [Paper](https://arxiv.org/abs/2312.02134) | [Code](https://github.com/huliangxiao/GaussianAvatar) | N/A |
| 7 | Real-Time Simulated Avatar from Head-Mounted Sensors | [Paper](https://arxiv.org/abs/2403.06862) | N/A | [Homepage](https://www.zhengyiluo.com/SimXR/) |

#### Backbone

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 8 | RepViT: Revisiting Mobile CNN From ViT Perspective | [Paper](https://arxiv.org/abs/2307.09283) | [Code](https://github.com/THU-MIG/RepViT) | N/A |
| 9 | TransNeXt: Robust Foveal Visual Perception for Vision Transformers | [Paper](https://arxiv.org/abs/2311.17132) | [Code](https://github.com/DaiShiResearch/TransNeXt) | N/A |

#### CLIP

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 10 | Alpha-CLIP: A CLIP Model Focusing on Wherever You Want | [Paper](https://arxiv.org/abs/2312.03818) | [Code](https://github.com/SunzeY/AlphaCLIP) | N/A |
| 11 | FairCLIP: Harnessing Fairness in Vision-Language Learning | [Paper](https://arxiv.org/abs/2403.19949) | [Code](https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP) | N/A |

#### Embodied AI

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 12 | EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI | [Paper](https://arxiv.org/abs/2312.16170) | [Code](https://github.com/OpenRobotLab/EmbodiedScan) | [Homepage](https://tai-wang.github.io/embodiedscan/) |
| 13 | MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception | [Paper](https://arxiv.org/abs/2312.07472) | [Code](https://github.com/IranQin/MP5) | [Homepage](https://iranqin.github.io/MP5.github.io/) |

#### OCR

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 14 | An Empirical Study of Scaling Law for OCR | [Paper](https://arxiv.org/abs/2401.00028) | [Code](https://github.com/large-ocr-model/large-ocr-model.github.io) | N/A |
| 15 | ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting | [Paper](https://arxiv.org/abs/2403.00303) | [Code](https://github.com/PriNing/ODM) | N/A |

#### NeRF

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 16 | PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF | [Paper](https://arxiv.org/abs/2311.13099) | [Code](https://github.com/FYTalon/pienerf/) | N/A |

#### DETR

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 17 | DETRs Beat YOLOs on Real-time Object Detection | [Paper](https://arxiv.org/abs/2304.08069) | [Code](https://github.com/lyuwenyu/RT-DETR) | N/A |
| 18 | Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement | [Paper](https://arxiv.org/abs/2403.16131) | [Code](https://github.com/xiuqhou/Salience-DETR) | N/A |

#### ReID

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 19 | Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification | [Paper](https://arxiv.org/abs/2403.10254) | [Code](https://github.com/924973292/EDITOR) | N/A |
| 20 | Noisy-Correspondence Learning for Text-to-Image Person Re-identification | [Paper](https://arxiv.org/abs/2308.09911) | [Code](https://github.com/QinYang79/RDE) | N/A |

#### Long-Tail

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 1 | Delving into the Trajectory Long-tail Distribution for Multi-object Tracking | [Paper](https://arxiv.org/abs/2403.04700) | [Code](https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT) | N/A |

#### Vision Transformer

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 2 | TransNeXt: Robust Foveal Visual Perception for Vision Transformers | [Paper](https://arxiv.org/abs/2311.17132) | [Code](https://github.com/DaiShiResearch/TransNeXt) | N/A |
| 3 | RepViT: Revisiting Mobile CNN From ViT Perspective | [Paper](https://arxiv.org/abs/2307.09283) | [Code](https://github.com/THU-MIG/RepViT) | N/A |

#### Vision-Language

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 4 | PromptKD: Unsupervised Prompt Distillation for Vision-Language Models | [Paper](https://arxiv.org/abs/2403.02781) | [Code](https://github.com/zhengli97/PromptKD) | N/A |
| 5 | FairCLIP: Harnessing Fairness in Vision-Language Learning | [Paper](https://arxiv.org/abs/2403.19949) | [Code](https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP) | N/A |

#### Self-supervised Learning

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 6 | N/A | N/A | N/A | N/A |

#### Data Augmentation

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 7 | N/A | N/A | N/A | N/A |

#### Object Detection

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 8 | DETRs Beat YOLOs on Real-time Object Detection | [Paper](https://arxiv.org/abs/2304.08069) | [Code](https://github.com/lyuwenyu/RT-DETR) | N/A |
| 9 | Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation | [Paper](https://arxiv.org/abs/2312.01220) | [Code](https://github.com/ZPDu/Boosting-Object-Detection-with-Zero-Shot-Day-Night-Domain-Adaptation) | N/A |
| 10 | YOLO-World: Real-Time Open-Vocabulary Object Detection | [Paper](https://arxiv.org/abs/2401.17270) | [Code](https://github.com/AILab-CVC/YOLO-World) | N/A |
| 11 | Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement | [Paper](https://arxiv.org/abs/2403.16131) | [Code](https://github.com/xiuqhou/Salience-DETR) | N/A |

#### Anomaly Detection

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 12 | Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection | [Paper](https://arxiv.org/abs/2310.12790) | [Code](https://github.com/mala-lab/AHL) | N/A |

#### Visual Tracking

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 13 | N/A | N/A | N/A | N/A |

#### Semantic Segmentation

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 14 | Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation | [Paper](https://arxiv.org/abs/2312.04265) | [Code](https://github.com/w1oves/Rein) | N/A |
| 15 | SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation | [Paper](https://arxiv.org/abs/2311.15537) | [Code](https://github.com/xb534/SED) | N/A |

#### Instance Segmentation

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 16 | N/A | N/A | N/A | N/A |

#### Panoptic Segmentation

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 17 | N/A | N/A | N/A | N/A |

#### Medical Image

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 18 | Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology | [Paper](https://arxiv.org/abs/2402.17228) | [Code](https://github.com/DearCaat/RRT-MIL) | N/A |
| 19 | VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis | [Paper](https://arxiv.org/abs/2402.17300) | [Code](https://github.com/Luffy03/VoCo) | N/A |
| 20 | ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images | [Paper](https://arxiv.org/abs/2311.15264) | [Code](https://github.com/nicoboou/chada_vit) | N/A |

#### Medical Image Segmentation

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 21 | N/A | N/A | N/A | N/A |

#### Video Object Segmentation

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 22 | N/A | N/A | N/A | N/A |

#### Video Instance Segmentation

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 23 | N/A | N/A | N/A | N/A |

#### Referring Image Segmentation

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 24 | N/A | N/A | N/A | N/A |

#### Image Matting

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 25 | N/A | N/A | N/A | N/A |

#### Image Editing

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 26 | Edit One for All: Interactive Batch Image Editing | [Paper](https://arxiv.org/abs/2401.10219) | [Code](https://github.com/thaoshibe/edit-one-for-all) | [Homepage](https://thaoshibe.github.io/edit-one-for-all) |

#### Low-level Vision

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 27 | Residual Denoising Diffusion Models | [Paper](https://arxiv.org/abs/2308.13712) | [Code](https://github.com/nachifur/RDDM) | N/A |
| 28 | Boosting Image Restoration via Priors from Pre-trained Models | [Paper](https://arxiv.org/abs/2403.06793) | N/A | N/A |

#### Super-Resolution

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 29 | SeD: Semantic-Aware Discriminator for Image Super-Resolution | [Paper](https://arxiv.org/abs/2402.19387) | [Code](https://github.com/lbc12345/SeD) | N/A |
| 30 | APISR: Anime Production Inspired Real-World Anime Super-Resolution | [Paper](https://arxiv.org/abs/2403.01598) | [Code](https://github.com/Kiteretsu77/APISR) | N/A |

#### Denoising

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 31 | Residual Denoising Diffusion Models | [Paper](https://arxiv.org/abs/2308.13712) | [Code](https://github.com/nachifur/RDDM) | N/A |

#### Deblur

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 32 | N/A | N/A | N/A | N/A |

#### Autonomous Driving

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 33 | UniPAD: A Universal Pre-training Paradigm for Autonomous Driving | [Paper](https://arxiv.org/abs/2310.08370) | [Code](https://github.com/Nightmare-n/UniPAD) | N/A |
| 34 | Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications | [Paper](https://arxiv.org/abs/2311.17663) | [Code](https://github.com/haomo-ai/Cam4DOcc) | N/A |
| 35 | Memory-based Adapters for Online 3D Scene Perception | [Paper](https://arxiv.org/abs/2403.06974) | [Code](https://github.com/xuxw98/Online3D) | N/A |
| 36 | Symphonize 3D Semantic Scene Completion with Contextual Instance Queries | [Paper](https://arxiv.org/abs/2306.15670) | [Code](https://github.com/hustvl/Symphonies) | N/A |
| 37 | A Real-world Large-scale Dataset for Roadside Cooperative Perception | [Paper](https://arxiv.org/abs/2403.10145) | [Code](https://github.com/AIR-THU/DAIR-RCooper) | N/A |
| 38 | Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving | [Paper](https://arxiv.org/abs/2403.07535) | [Code](https://github.com/Junda24/AFNet) | N/A |

#### 3D Point Cloud

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 40 | N/A | N/A | N/A | N/A |

#### 3D Object Detection

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 41 | PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection | [Paper](https://arxiv.org/abs/2312.08371) | [Code](https://github.com/kuanchihhuang/PTT) | N/A |
| 42 | UniMODE: Unified Monocular 3D Object Detection | [Paper](https://arxiv.org/abs/2402.18573) | N/A | N/A |

#### 3D Semantic Segmentation

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 43 | N/A | N/A | N/A | N/A |

#### 3D Object Tracking

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 44 | N/A | N/A | N/A | N/A |

#### 3D Semantic Scene Completion

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 45 | Symphonize 3D Semantic Scene Completion with Contextual Instance Queries | [Paper](https://arxiv.org/abs/2306.15670) | [Code](https://github.com/hustvl/Symphonies) | N/A |

#### 3D Registration

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 46 | N/A | N/A | N/A | N/A |

#### 3D Human Pose Estimation

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 47 | Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation | [Paper](https://arxiv.org/abs/2311.12028) | [Code](https://github.com/NationalGAILab/HoT) | N/A |

#### 3D Human Mesh Estimation

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 48 | N/A | N/A | N/A | N/A |

#### Medical Image

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 49 | Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology | [Paper](https://arxiv.org/abs/2402.17228) | [Code](https://github.com/DearCaat/RRT-MIL) | N/A |
| 50 | VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis | [Paper](https://arxiv.org/abs/2402.17300) | [Code](https://github.com/Luffy03/VoCo) | N/A |
| 51 | ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images | [Paper](https://arxiv.org/abs/2311.15264) | [Code](https://github.com/nicoboou/chada_vit) | N/A |

#### Image Generation

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 52 | InstanceDiffusion: Instance-level Control for Image Generation | [Paper](https://arxiv.org/abs/2402.03290) | [Code](https://github.com/frank-xwang/InstanceDiffusion) | [Homepage](https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/) |
| 53 | ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations | [Paper](https://arxiv.org/abs/2312.04655) | [Code](https://github.com/eclipse-t2i/eclipse-inference) | [Homepage](https://eclipse-t2i.vercel.app/) |
| 54 | Instruct-Imagen: Image Generation with Multi-modal Instruction | [Paper](https://arxiv.org/abs/2401.01952) | N/A | N/A |
| 55 | UniGS: Unified Representation for Image Generation and Segmentation | [Paper](https://arxiv.org/abs/2312.01985) | N/A | N/A |
| 56 | Multi-Instance Generation Controller for Text-to-Image Synthesis | [Paper](https://arxiv.org/abs/2402.05408) | [Code](https://github.com/limuloo/migc) | N/A |
| 57 | SVGDreamer: Text Guided SVG Generation with Diffusion Model | [Paper](https://arxiv.org/abs/2312.16476) | [Code](https://ximinng.github.io/SVGDreamer-project/) | N/A |
| 58 | InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model | [Paper](https://arxiv.org/abs/2312.05849) | [Code](https://github.com/jiuntian/interactdiffusion) | N/A |
| 59 | Ranni: Taming Text-to-Image Diffusion for Accurate Prompt Following | [Paper](https://arxiv.org/abs/2311.17002) | [Code](https://github.com/ali-vilab/Ranni) | N/A |

#### Video Generation

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 60 | Vlogger: Make Your Dream A Vlog | [Paper](https://arxiv.org/abs/2401.09414) | [Code](https://github.com/Vchitect/Vlogger) | N/A |
| 61 | VBench: Comprehensive Benchmark Suite for Video Generative Models | [Paper](https://arxiv.org/abs/2311.17982) | [Code](https://github.com/Vchitect/VBench) | [Homepage](https://vchitect.github.io/VBench-project/) |
| 62 | VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models | [Paper](https://arxiv.org/abs/2312.00845) | [Code](https://github.com/HyeonHo99/Video-Motion-Customization) | [Homepage](https://github.com/HyeonHo99/Video-Motion-Customization) |

#### Vision Transformer

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 63 | TransNeXt: Robust Foveal Visual Perception for Vision Transformers | [Paper](https://arxiv.org/abs/2311.17132) | [Code](https://github.com/DaiShiResearch/TransNeXt) | N/A |
| 64 | RepViT: Revisiting Mobile CNN From ViT Perspective | [Paper](https://arxiv.org/abs/2307.09283) | [Code](https://github.com/THU-MIG/RepViT) | N/A |
| 65 | A General and Efficient Training for Transformer via Token Expansion | [Paper](https://arxiv.org/abs/2404.00672) | [Code](https://github.com/Osilly/TokenExpansion) | N/A |

#### Vision-Language

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 66 | PromptKD: Unsupervised Prompt Distillation for Vision-Language Models | [Paper](https://arxiv.org/abs/2403.02781) | [Code](https://github.com/zhengli97/PromptKD) | N/A |
| 67 | FairCLIP: Harnessing Fairness in Vision-Language Learning | [Paper](https://arxiv.org/abs/2403.19949) | [Code](https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP) | N/A |

#### Object Detection

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 68 | DETRs Beat YOLOs on Real-time Object Detection | [Paper](https://arxiv.org/abs/2304.08069) | [Code](https://github.com/lyuwenyu/RT-DETR) | N/A |
| 69 | Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation | [Paper](https://arxiv.org/abs/2312.01220) | [Code](https://github.com/ZPDu/Boosting-Object-Detection-with-Zero-Shot-Day-Night-Domain-Adaptation) | N/A |
| 70 | YOLO-World: Real-Time Open-Vocabulary Object Detection | [Paper](https://arxiv.org/abs/2401.17270) | [Code](https://github.com/AILab-CVC/YOLO-World) | N/A |
| 71 | Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement | [Paper](https://arxiv.org/abs/2403.16131) | [Code](https://github.com/xiuqhou/Salience-DETR) | N/A |

#### Anomaly Detection

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 72 | Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection | [Paper](https://arxiv.org/abs/2310.12790) | [Code](https://github.com/mala-lab/AHL) | N/A |

#### Object Tracking

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 73 | Delving into the Trajectory Long-tail Distribution for Multi-object Tracking | [Paper](https://arxiv.org/abs/2403.04700) | [Code](https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT) | N/A |

#### Semantic Segmentation

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 74 | Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation | [Paper](https://arxiv.org/abs/2312.04265) | [Code](https://github.com/w1oves/Rein) | N/A |
| 75 | SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation | [Paper](https://arxiv.org/abs/2311.15537) | [Code](https://github.com/xb534/SED) | N/A |

#### Medical Image

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 76 | Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology | [Paper](https://arxiv.org/abs/2402.17228) | [Code](https://github.com/DearCaat/RRT-MIL) | N/A |
| 77 | VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis | [Paper](https://arxiv.org/abs/2402.17300) | [Code](https://github.com/Luffy03/VoCo) | N/A |
| 78 | ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images | [Paper](https://arxiv.org/abs/2311.15264) | [Code](https://github.com/nicoboou/chada_vit) | N/A |

#### Medical Image Segmentation

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 76 | N/A | N/A | N/A | N/A |

#### Autonomous Driving

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 77 | UniPAD: A Universal Pre-training Paradigm for Autonomous Driving | [Paper](https://arxiv.org/abs/2310.08370) | [Code](https://github.com/Nightmare-n/UniPAD) | N/A |
| 78 | Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications | [Paper](https://arxiv.org/abs/2311.17663) | [Code](https://github.com/haomo-ai/Cam4DOcc) | N/A |
| 79 | Memory-based Adapters for Online 3D Scene Perception | [Paper](https://arxiv.org/abs/2403.06974) | [Code](https://github.com/xuxw98/Online3D) | N/A |
| 80 | Symphonize 3D Semantic Scene Completion with Contextual Instance Queries | [Paper](https://arxiv.org/abs/2306.15670) | [Code](https://github.com/hustvl/Symphonies) | N/A |
| 81 | A Real-world Large-scale Dataset for Roadside Cooperative Perception | [Paper](https://arxiv.org/abs/2403.10145) | [Code](https://github.com/AIR-THU/DAIR-RCooper) | N/A |
| 82 | Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving | [Paper](https://arxiv.org/abs/2403.07535) | [Code](https://github.com/Junda24/AFNet) | N/A |
| 83 | Traffic Scene Parsing through the TSP6K Dataset | [Paper](https://arxiv.org/pdf/2303.02835.pdf) | [Code](https://github.com/PengtaoJiang/TSP6K) | N/A |

#### 3D Point Cloud

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 84 | N/A | N/A | N/A | N/A |

#### 3D Object Detection

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 85 | PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection | [Paper](https://arxiv.org/abs/2312.08371) | [Code](https://github.com/kuanchihhuang/PTT) | N/A |
| 86 | UniMODE: Unified Monocular 3D Object Detection | [Paper](https://arxiv.org/abs/2402.18573) | N/A | N/A |

#### 3D Semantic Segmentation

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 87 | N/A | N/A | N/A | N/A |

#### Image Editing

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 88 | Edit One for All: Interactive Batch Image Editing | [Paper](https://arxiv.org/abs/2401.10219) | [Code](https://github.com/thaoshibe/edit-one-for-all) | [Homepage](https://thaoshibe.github.io/edit-one-for-all) |

#### Video Editing

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 89 | MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers | [Paper](https://arxiv.org/abs/2312.12468) | N/A | [Homepage](https://maskint.github.io) |

#### Low-level Vision

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 90 | Residual Denoising Diffusion Models | [Paper](https://arxiv.org/abs/2308.13712) | [Code](https://github.com/nachifur/RDDM) | N/A |
| 91 | Boosting Image Restoration via Priors from Pre-trained Models | [Paper](https://arxiv.org/abs/2403.06793) | N/A | N/A |

#### Super-Resolution

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 92 | SeD: Semantic-Aware Discriminator for Image Super-Resolution | [Paper](https://arxiv.org/abs/2402.19387) | [Code](https://github.com/lbc12345/SeD) | N/A |
| 93 | APISR: Anime Production Inspired Real-World Anime Super-Resolution | [Paper](https://arxiv.org/abs/2403.01598) | [Code](https://github.com/Kiteretsu77/APISR) | N/A |

#### Denoising

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 94 | N/A | N/A | N/A | N/A |

#### 3D Human Pose Estimation

| Index | Paper Title | Paper Link | Code | Official Repo |
| ----- | ----------- | ---------- | ---- | ------------- |
| 95 | Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation | [Paper](https://arxiv.org/abs/2311.12028) | 
[Code](https://github.com/NationalGAILab/HoT) | N/A | 484 | 485 | #### Image Generation 486 | 487 | | Index | Paper Title | Paper Link | Code | Official Repo | 488 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | -------------------------------------------------------- | ------------------------------------------------------------ | 489 | | 96 | InstanceDiffusion: Instance-level Control for Image Generation | [Paper](https://arxiv.org/abs/2402.03290) | [Code](https://github.com/frank-xwang/InstanceDiffusion) | [Homepage](https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/) | 490 | | 97 | ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations | [Paper](https://arxiv.org/abs/2312.04655) | [Code](https://github.com/eclipse-t2i/eclipse-inference) | [Homepage](https://eclipse-t2i.vercel.app/) | 491 | | 98 | Instruct-Imagen: Image Generation with Multi-modal Instruction | [Paper](https://arxiv.org/abs/2401.01952) | N/A | N/A | 492 | | 99 | Residual Denoising Diffusion Models | [Paper](https://arxiv.org/abs/2308.13712) | [Code](https://github.com/nachifur/RDDM) | N/A | 493 | | 100 | UniGS: Unified Representation for Image Generation and Segmentation | [Paper](https://arxiv.org/abs/2312.01985) | N/A | N/A | 494 | | 101 | Multi-Instance Generation Controller for Text-to-Image Synthesis | [Paper](https://arxiv.org/abs/2402.05408) | [Code](https://github.com/limuloo/migc) | N/A | 495 | | 102 | SVGDreamer: Text Guided SVG Generation with Diffusion Model | [Paper](https://arxiv.org/abs/2312.16476) | [Code](https://ximinng.github.io/SVGDreamer-project/) | N/A | 496 | | 103 | InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model | [Paper](https://arxiv.org/abs/2312.05849) | [Code](https://github.com/jiuntian/interactdiffusion) | N/A | 497 | | 104 | Ranni: Taming Text-to-Image Diffusion for Accurate Prompt Following | [Paper](https://arxiv.org/abs/2311.17002) | 
[Code](https://github.com/ali-vilab/Ranni) | N/A | 498 | 499 | #### Video Generation 500 | 501 | | Index | Paper Title | Paper Link | Code | Official Repo | 502 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | --------------------------------------------------------- | 503 | | 105 | Vlogger: Make Your Dream A Vlog | [Paper](https://arxiv.org/abs/2401.09414) | [Code](https://github.com/Vchitect/Vlogger) | N/A | 504 | | 106 | VBench: Comprehensive Benchmark Suite for Video Generative Models | [Paper](https://arxiv.org/abs/2311.17982) | [Code](https://github.com/Vchitect/VBench) | [Homepage](https://vchitect.github.io/VBench-project/) | 505 | | 107 | VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models | [Paper](https://arxiv.org/abs/2312.00845) | [Code](https://github.com/HyeonHo99/Video-Motion-Customization) | [Homepage](https://video-motion-customization.github.io/) | 506 | 507 | #### 3D Generation 508 | 509 | | Index | Paper Title | Paper Link | Code | Official Repo | 510 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | --------------------------------------------------------- | ------------------------------------------------------- | 511 | | 108 | CityDreamer: Compositional Generative Model of Unbounded 3D Cities | [Paper](https://arxiv.org/abs/2309.00610) | [Code](https://github.com/hzxie/city-dreamer) | [Homepage](https://haozhexie.com/project/city-dreamer/) | 512 | | 109 | LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching | [Paper](https://arxiv.org/abs/2311.11284) | [Code](https://github.com/EnVision-Research/LucidDreamer) | N/A | 513 | 514 | #### Video Understanding 515 | 516 | | Index | Paper Title | Paper Link | Code | Official Repo | 517 | | ----- | 
------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ------------- | 518 | | 110 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | [Paper](https://arxiv.org/abs/2311.17005) | [Code](https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat2) | N/A | 519 | 520 | #### Knowledge Distillation 521 | 522 | | Index | Paper Title | Paper Link | Code | Official Repo | 523 | | ----- | ---------------------------------------------------- | ----------------------------------------- | ------------------------------------------------------------ | ------------- | 524 | | 111 | Logit Standardization in Knowledge Distillation | [Paper](https://arxiv.org/abs/2403.01427) | [Code](https://github.com/sunshangquan/logit-standardization-KD) | N/A | 525 | | 112 | Efficient Dataset Distillation via Minimax Diffusion | [Paper](https://arxiv.org/abs/2311.15529) | [Code](https://github.com/vimar-gu/MinimaxDiffusion) | N/A | 526 | 527 | #### Stereo Matching 528 | 529 | | Index | Paper Title | Paper Link | Code | Official Repo | 530 | | ----- | ---------------------------------------------- | ----------------------------------------- | ------------------------------------------ | ------------- | 531 | | 113 | Neural Markov Random Field for Stereo Matching | [Paper](https://arxiv.org/abs/2403.11193) | [Code](https://github.com/aeolusguan/NMRF) | N/A | 532 | 533 | #### Scene Graph Generation 534 | 535 | | Index | Paper Title | Paper Link | Code | Official Repo | 536 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ---------------------------------------------- | -------------------------------------------------- | 537 | | 114 | HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation | [Paper](https://arxiv.org/abs/2403.12033) | 
[Code](https://github.com/zhangce01/HiKER-SGG) | [Homepage](https://zhangce01.github.io/HiKER-SGG/) | 538 | 539 | #### Video Quality Assessment 540 | 541 | | Index | Paper Title | Paper Link | Code | Official Repo | 542 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ----------------------------------------------------- | 543 | | 115 | KVQ: Kaleidoscope Video Quality Assessment for Short-form Videos | [Paper](https://arxiv.org/abs/2402.07220) | [Code](https://github.com/lixinustc/KVQ-Challenge-CVPR-NTIRE2024) | [Homepage](https://lixinustc.github.io/projects/KVQ/) | 544 | 545 | #### Datasets 546 | 547 | | Index | Paper Title | Paper Link | Code | Official Repo | 548 | | ----- | ------------------------------------------------------------ | --------------------------------------------- | ----------------------------------------------- | ------------- | 549 | | 116 | A Real-world Large-scale Dataset for Roadside Cooperative Perception | [Paper](https://arxiv.org/abs/2403.10145) | [Code](https://github.com/AIR-THU/DAIR-RCooper) | N/A | 550 | | 117 | Traffic Scene Parsing through the TSP6K Dataset | [Paper](https://arxiv.org/pdf/2303.02835.pdf) | [Code](https://github.com/PengtaoJiang/TSP6K) | N/A | 551 | 552 | #### Others 553 | 554 | | Index | Paper Title | Paper Link | Code | Official Repo | 555 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | 556 | | 118 | Object Recognition as Next Token Prediction | [Paper](https://arxiv.org/abs/2312.02142) | [Code](https://github.com/kaiyuyue/nxtp) | N/A | 557 | | 119 | ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks | 
[Paper](https://arxiv.org/abs/2306.14525) | [Code](https://parameternet.github.io/) | N/A | 558 | | 120 | Seamless Human Motion Composition with Blended Positional Encodings | [Paper](https://arxiv.org/abs/2402.15509) | [Code](https://github.com/BarqueroGerman/FlowMDM) | N/A | 559 | | 121 | LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning | [Paper](https://arxiv.org/abs/2311.18651) | [Code](https://github.com/Open3DA/LL3DA) | [Homepage](https://ll3da.github.io/) | 560 | | 122 | CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update | [Paper](https://arxiv.org/abs/2312.10908) | N/A | [Homepage](https://clova-tool.github.io/) | 561 | | 123 | MoMask: Generative Masked Modeling of 3D Human Motions | [Paper](https://arxiv.org/abs/2312.00063) | [Code](https://github.com/EricGuo5513/momask-codes) | N/A | 562 | | 124 | Amodal Ground Truth and Completion in the Wild | [Paper](https://arxiv.org/abs/2312.17247) | [Code](https://github.com/Championchess/Amodal-Completion-in-the-Wild) | [Homepage](https://www.robots.ox.ac.uk/~vgg/research/amodal/) | 563 | | 125 | Improved Visual Grounding through Self-Consistent Explanations | [Paper](https://arxiv.org/abs/2312.04554) | [Code](https://github.com/uvavision/SelfEQ) | N/A | 564 | | 126 | ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object | [Paper](https://arxiv.org/abs/2403.18775) | [Code](https://github.com/chenshuang-zhang/imagenet_d) | [Homepage](https://chenshuang-zhang.github.io/imagenet_d/) | 565 | | 127 | Learning from Synthetic Human Group Activities | [Paper](https://arxiv.org/abs/2306.16772) | [Code](https://github.com/cjerry1243/M3Act) | [Homepage](https://cjerry1243.github.io/M3Act/) | 566 | | 128 | A Cross-Subject Brain Decoding Framework | [Paper](https://arxiv.org/abs/2404.07850) | [Code](https://github.com/littlepure2333/MindBridge) | [Homepage](https://littlepure2333.github.io/MindBridge/) | 567 | | 129 | Multi-Task Dense 
Prediction via Mixture of Low-Rank Experts | [Paper](https://arxiv.org/abs/2403.17749) | [Code](https://github.com/YuqiYang213/MLoRE) | N/A | 568 | | 130 | Contrastive Mean-Shift Learning for Generalized Category Discovery | [Paper](https://arxiv.org/abs/2404.09451) | [Code](https://github.com/sua-choi/CMS) | [Homepage](https://postech-cvlab.github.io/cms/) | 569 | 570 | #### Thank you for Reading -------------------------------------------------------------------------------- /mindmap/fd94ed3530b7015a458d81055f24fc026f33f7d1c45cde6cdc34f1a689509916.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ashishpatel26/CVPR2024/0339d4af644ac2ed204f0857d1959992bfe34427/mindmap/fd94ed3530b7015a458d81055f24fc026f33f7d1c45cde6cdc34f1a689509916.png --------------------------------------------------------------------------------