├── README.md
└── mindmap
    └── fd94ed3530b7015a458d81055f24fc026f33f7d1c45cde6cdc34f1a689509916.png


/README.md:
--------------------------------------------------------------------------------
  1 | # CVPR 2024 
  2 | 
  3 | ![](https://camo.githubusercontent.com/e98004b4a9a1fdbad3c3fe1be700c0f0546286942108c54fa7f009eb786df0d0/68747470733a2f2f6869726f6b617473756b6174616f6b6131362e6769746875622e696f2f435650522d323032342d4c494d49542f696d672f435650525f4c6f676f53656174746c655f323032345f5072696d6172792e6a7067)
  4 | 
  5 | ### Research Papers with Code
  6 | 
  7 | ![](mindmap/fd94ed3530b7015a458d81055f24fc026f33f7d1c45cde6cdc34f1a689509916.png)
  8 | 
  9 | ---
 10 | ## Table of Contents
 11 | - [3DGS (Gaussian Splatting)](#3dgs-gaussian-splatting)
 12 | - [Avatars](#avatars)
 13 | - [Backbone](#backbone)
 14 | - [CLIP](#clip)
 15 | - [Embodied AI](#embodied-ai)
 16 | - [OCR](#ocr)
 17 | - [NeRF](#nerf)
 18 | - [DETR](#detr)
 19 | - [ReID](#reid)
 20 | - [Long-Tail](#long-tail)
 21 | - [Vision Transformer](#vision-transformer)
 22 | - [Vision-Language](#vision-language)
 23 | - [Self-supervised Learning](#self-supervised-learning)
 24 | - [Data Augmentation](#data-augmentation)
 25 | - [Object Detection](#object-detection)
 26 | - [Anomaly Detection](#anomaly-detection)
 27 | - [Visual Tracking](#visual-tracking)
 28 | - [Semantic Segmentation](#semantic-segmentation)
 29 | - [Instance Segmentation](#instance-segmentation)
 30 | - [Panoptic Segmentation](#panoptic-segmentation)
 31 | - [Medical Image](#medical-image)
 32 | - [Medical Image Segmentation](#medical-image-segmentation)
 33 | - [Video Object Segmentation](#video-object-segmentation)
 34 | - [Video Instance Segmentation](#video-instance-segmentation)
 35 | - [Referring Image Segmentation](#referring-image-segmentation)
 36 | - [Image Matting](#image-matting)
 37 | - [Image Editing](#image-editing)
 38 | - [Low-level Vision](#low-level-vision)
 39 | - [Super-Resolution](#super-resolution)
 40 | - [Denoising](#denoising)
 41 | - [Deblur](#deblur)
 42 | - [Autonomous Driving](#autonomous-driving)
 43 | - [3D Point Cloud](#3d-point-cloud)
 44 | - [3D Object Detection](#3d-object-detection)
 45 | - [3D Semantic Segmentation](#3d-semantic-segmentation)
 46 | - [3D Object Tracking](#3d-object-tracking)
 47 | - [3D Semantic Scene Completion](#3d-semantic-scene-completion)
 48 | - [3D Registration](#3d-registration)
 49 | - [3D Human Pose Estimation](#3d-human-pose-estimation)
 50 | - [3D Human Mesh Estimation](#3d-human-mesh-estimation)
 51 | - [Image Generation](#image-generation)
 52 | - [Video Generation](#video-generation)
 53 | - [Video Understanding](#video-understanding)
 54 | - [Knowledge Distillation](#knowledge-distillation)
 55 | - [Stereo Matching](#stereo-matching)
 56 | - [Scene Graph Generation](#scene-graph-generation)
 57 | - [Video Quality Assessment](#video-quality-assessment)
 58 | - [Datasets](#datasets)
 59 | - [Others](#others)
 60 | 
 61 | ### Domain-wise Table
 62 | 
 63 | #### 3DGS (Gaussian Splatting)
 64 | 
 65 | | Index | Paper Title                                                  | Paper Link                                | Code                                                        | Official Repo                                                |
 66 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ----------------------------------------------------------- | ------------------------------------------------------------ |
 67 | | 1     | Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering | [Paper](https://arxiv.org/abs/2312.00109) | [Code](https://github.com/city-super/Scaffold-GS)           | [Homepage](https://city-super.github.io/scaffold-gs/)        |
 68 | | 2     | GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis | [Paper](https://arxiv.org/abs/2312.02155) | [Code](https://github.com/ShunyuanZheng/GPS-Gaussian)       | [Homepage](https://shunyuanzheng.github.io/GPS-Gaussian)     |
 69 | | 3     | GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians | [Paper](https://arxiv.org/abs/2312.02134) | [Code](https://github.com/huliangxiao/GaussianAvatar)       | N/A                                                          |
 70 | | 4     | GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting | [Paper](https://arxiv.org/abs/2311.14521) | [Code](https://github.com/buaacyw/GaussianEditor)           | N/A                                                          |
 71 | | 5     | Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction | [Paper](https://arxiv.org/abs/2309.13101) | [Code](https://github.com/ingra14m/Deformable-3D-Gaussians) | [Homepage](https://ingra14m.github.io/Deformable-Gaussians/) |
 72 | 
 73 | #### Avatars
 74 | 
 75 | | Index | Paper Title                                                  | Paper Link                                | Code                                                  | Official Repo                                 |
 76 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ----------------------------------------------------- | --------------------------------------------- |
 77 | | 6     | GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians | [Paper](https://arxiv.org/abs/2312.02134) | [Code](https://github.com/huliangxiao/GaussianAvatar) | N/A                                           |
 78 | | 7     | Real-Time Simulated Avatar from Head-Mounted Sensors         | [Paper](https://arxiv.org/abs/2403.06862) | N/A                                                   | [Homepage](https://www.zhengyiluo.com/SimXR/) |
 79 | 
 80 | #### Backbone
 81 | 
 82 | | Index | Paper Title                                                  | Paper Link                                | Code                                                | Official Repo |
 83 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | --------------------------------------------------- | ------------- |
 84 | | 8     | RepViT: Revisiting Mobile CNN From ViT Perspective           | [Paper](https://arxiv.org/abs/2307.09283) | [Code](https://github.com/THU-MIG/RepViT)           | N/A           |
 85 | | 9     | TransNeXt: Robust Foveal Visual Perception for Vision Transformers | [Paper](https://arxiv.org/abs/2311.17132) | [Code](https://github.com/DaiShiResearch/TransNeXt) | N/A           |
 86 | 
 87 | #### CLIP
 88 | 
 89 | | Index | Paper Title                                               | Paper Link                                | Code                                                         | Official Repo |
 90 | | ----- | --------------------------------------------------------- | ----------------------------------------- | ------------------------------------------------------------ | ------------- |
 91 | | 10    | Alpha-CLIP: A CLIP Model Focusing on Wherever You Want    | [Paper](https://arxiv.org/abs/2312.03818) | [Code](https://github.com/SunzeY/AlphaCLIP)                  | N/A           |
 92 | | 11    | FairCLIP: Harnessing Fairness in Vision-Language Learning | [Paper](https://arxiv.org/abs/2403.19949) | [Code](https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP) | N/A           |
 93 | 
 94 | #### Embodied AI
 95 | 
 96 | | Index | Paper Title                                                  | Paper Link                                | Code                                                 | Official Repo                                        |
 97 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ---------------------------------------------------- | ---------------------------------------------------- |
 98 | | 12    | EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI | [Paper](https://arxiv.org/abs/2312.16170) | [Code](https://github.com/OpenRobotLab/EmbodiedScan) | [Homepage](https://tai-wang.github.io/embodiedscan/) |
 99 | | 13    | MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception | [Paper](https://arxiv.org/abs/2312.07472) | [Code](https://github.com/IranQin/MP5)               | [Homepage](https://iranqin.github.io/MP5.github.io/) |
100 | 
101 | #### OCR
102 | 
103 | | Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo |
104 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ------------- |
105 | | 14    | An Empirical Study of Scaling Law for OCR                    | [Paper](https://arxiv.org/abs/2401.00028) | [Code](https://github.com/large-ocr-model/large-ocr-model.github.io) | N/A           |
106 | | 15    | ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting | [Paper](https://arxiv.org/abs/2403.00303) | [Code](https://github.com/PriNing/ODM)                       | N/A           |
107 | 
108 | #### NeRF
109 | 
110 | | Index | Paper Title                                                  | Paper Link                                | Code                                        | Official Repo |
111 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------- | ------------- |
112 | | 16    | PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF | [Paper](https://arxiv.org/abs/2311.13099) | [Code](https://github.com/FYTalon/pienerf/) | N/A           |
113 | 
114 | #### DETR
115 | 
116 | | Index | Paper Title                                                  | Paper Link                                | Code                                             | Official Repo |
117 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------ | ------------- |
118 | | 17    | DETRs Beat YOLOs on Real-time Object Detection               | [Paper](https://arxiv.org/abs/2304.08069) | [Code](https://github.com/lyuwenyu/RT-DETR)      | N/A           |
119 | | 18    | Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement | [Paper](https://arxiv.org/abs/2403.16131) | [Code](https://github.com/xiuqhou/Salience-DETR) | N/A           |
120 | 
121 | #### ReID
122 | 
123 | | Index | Paper Title                                                  | Paper Link                                | Code                                        | Official Repo |
124 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------- | ------------- |
125 | | 19    | Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification | [Paper](https://arxiv.org/abs/2403.10254) | [Code](https://github.com/924973292/EDITOR) | N/A           |
126 | | 20    | Noisy-Correspondence Learning for Text-to-Image Person Re-identification | [Paper](https://arxiv.org/abs/2308.09911) | [Code](https://github.com/QinYang79/RDE)    | N/A           |
127 | 
128 | #### Long-Tail
129 | 
130 | | Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo |
131 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ------------- |
132 | | 1     | Delving into the Trajectory Long-tail Distribution for Multi-object Tracking | [Paper](https://arxiv.org/abs/2403.04700) | [Code](https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT) | N/A           |
133 | 
134 | #### Vision Transformer
135 | 
136 | | Index | Paper Title                                                  | Paper Link                                | Code                                                | Official Repo |
137 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | --------------------------------------------------- | ------------- |
138 | | 2     | TransNeXt: Robust Foveal Visual Perception for Vision Transformers | [Paper](https://arxiv.org/abs/2311.17132) | [Code](https://github.com/DaiShiResearch/TransNeXt) | N/A           |
139 | | 3     | RepViT: Revisiting Mobile CNN From ViT Perspective           | [Paper](https://arxiv.org/abs/2307.09283) | [Code](https://github.com/THU-MIG/RepViT)           | N/A           |
140 | 
141 | #### Vision-Language
142 | 
143 | | Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo |
144 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ------------- |
145 | | 4     | PromptKD: Unsupervised Prompt Distillation for Vision-Language Models | [Paper](https://arxiv.org/abs/2403.02781) | [Code](https://github.com/zhengli97/PromptKD)                | N/A           |
146 | | 5     | FairCLIP: Harnessing Fairness in Vision-Language Learning    | [Paper](https://arxiv.org/abs/2403.19949) | [Code](https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP) | N/A           |
147 | 
148 | #### Self-supervised Learning
149 | 
150 | | Index | Paper Title | Paper Link | Code | Official Repo |
151 | | ----- | ----------- | ---------- | ---- | ------------- |
152 | | 6     | N/A         | N/A        | N/A  | N/A           |
153 | 
154 | #### Data Augmentation
155 | 
156 | | Index | Paper Title | Paper Link | Code | Official Repo |
157 | | ----- | ----------- | ---------- | ---- | ------------- |
158 | | 7     | N/A         | N/A        | N/A  | N/A           |
159 | 
160 | #### Object Detection
161 | 
162 | | Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo |
163 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ------------- |
164 | | 8     | DETRs Beat YOLOs on Real-time Object Detection               | [Paper](https://arxiv.org/abs/2304.08069) | [Code](https://github.com/lyuwenyu/RT-DETR)                  | N/A           |
165 | | 9     | Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation | [Paper](https://arxiv.org/abs/2312.01220) | [Code](https://github.com/ZPDu/Boosting-Object-Detection-with-Zero-Shot-Day-Night-Domain-Adaptation) | N/A           |
166 | | 10    | YOLO-World: Real-Time Open-Vocabulary Object Detection       | [Paper](https://arxiv.org/abs/2401.17270) | [Code](https://github.com/AILab-CVC/YOLO-World)              | N/A           |
167 | | 11    | Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement | [Paper](https://arxiv.org/abs/2403.16131) | [Code](https://github.com/xiuqhou/Salience-DETR)             | N/A           |
168 | 
169 | #### Anomaly Detection
170 | 
171 | | Index | Paper Title                                                  | Paper Link                                | Code                                    | Official Repo |
172 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | --------------------------------------- | ------------- |
173 | | 12    | Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection | [Paper](https://arxiv.org/abs/2310.12790) | [Code](https://github.com/mala-lab/AHL) | N/A           |
174 | 
175 | #### Visual Tracking
176 | 
177 | | Index | Paper Title | Paper Link | Code | Official Repo |
178 | | ----- | ----------- | ---------- | ---- | ------------- |
179 | | 13    | N/A         | N/A        | N/A  | N/A           |
180 | 
181 | #### Semantic Segmentation
182 | 
183 | | Index | Paper Title                                                  | Paper Link                                | Code                                   | Official Repo |
184 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | -------------------------------------- | ------------- |
185 | | 14    | Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation | [Paper](https://arxiv.org/abs/2312.04265) | [Code](https://github.com/w1oves/Rein) | N/A           |
186 | | 15    | SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation | [Paper](https://arxiv.org/abs/2311.15537) | [Code](https://github.com/xb534/SED)   | N/A           |
187 | 
188 | #### Instance Segmentation
189 | 
190 | | Index | Paper Title | Paper Link | Code | Official Repo |
191 | | ----- | ----------- | ---------- | ---- | ------------- |
192 | | 16    | N/A         | N/A        | N/A  | N/A           |
193 | 
194 | #### Panoptic Segmentation
195 | 
196 | | Index | Paper Title | Paper Link | Code | Official Repo |
197 | | ----- | ----------- | ---------- | ---- | ------------- |
198 | | 17    | N/A         | N/A        | N/A  | N/A           |
199 | 
200 | #### Medical Image
201 | 
202 | | Index | Paper Title                                                  | Paper Link                                | Code                                          | Official Repo |
203 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | --------------------------------------------- | ------------- |
204 | | 18    | Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology | [Paper](https://arxiv.org/abs/2402.17228) | [Code](https://github.com/DearCaat/RRT-MIL)   | N/A           |
205 | | 19    | VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis | [Paper](https://arxiv.org/abs/2402.17300) | [Code](https://github.com/Luffy03/VoCo)       | N/A           |
206 | | 20    | ChAda-ViT: Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images | [Paper](https://arxiv.org/abs/2311.15264) | [Code](https://github.com/nicoboou/chada_vit) | N/A           |
207 | 
208 | #### Medical Image Segmentation
209 | 
210 | | Index | Paper Title | Paper Link | Code | Official Repo |
211 | | ----- | ----------- | ---------- | ---- | ------------- |
212 | | 21    | N/A         | N/A        | N/A  | N/A           |
213 | 
214 | #### Video Object Segmentation
215 | 
216 | | Index | Paper Title | Paper Link | Code | Official Repo |
217 | | ----- | ----------- | ---------- | ---- | ------------- |
218 | | 22    | N/A         | N/A        | N/A  | N/A           |
219 | 
220 | #### Video Instance Segmentation
221 | 
222 | | Index | Paper Title | Paper Link | Code | Official Repo |
223 | | ----- | ----------- | ---------- | ---- | ------------- |
224 | | 23    | N/A         | N/A        | N/A  | N/A           |
225 | 
226 | #### Referring Image Segmentation
227 | 
228 | | Index | Paper Title | Paper Link | Code | Official Repo |
229 | | ----- | ----------- | ---------- | ---- | ------------- |
230 | | 24    | N/A         | N/A        | N/A  | N/A           |
231 | 
232 | #### Image Matting
233 | 
234 | | Index | Paper Title | Paper Link | Code | Official Repo |
235 | | ----- | ----------- | ---------- | ---- | ------------- |
236 | | 25    | N/A         | N/A        | N/A  | N/A           |
237 | 
238 | #### Image Editing
239 | 
240 | | Index | Paper Title                                       | Paper Link                                | Code                                                  | Official Repo                                            |
241 | | ----- | ------------------------------------------------- | ----------------------------------------- | ----------------------------------------------------- | -------------------------------------------------------- |
242 | | 26    | Edit One for All: Interactive Batch Image Editing | [Paper](https://arxiv.org/abs/2401.10219) | [Code](https://github.com/thaoshibe/edit-one-for-all) | [Homepage](https://thaoshibe.github.io/edit-one-for-all) |
243 | 
244 | #### Low-level Vision
245 | 
246 | | Index | Paper Title                                                  | Paper Link                                | Code                                     | Official Repo |
247 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ---------------------------------------- | ------------- |
248 | | 27    | Residual Denoising Diffusion Models                          | [Paper](https://arxiv.org/abs/2308.13712) | [Code](https://github.com/nachifur/RDDM) | N/A           |
249 | | 28    | Boosting Image Restoration via Priors from Pre-trained Models | [Paper](https://arxiv.org/abs/2403.06793) | N/A                                      | N/A           |
250 | 
251 | #### Super-Resolution
252 | 
253 | | Index | Paper Title                                                  | Paper Link                                | Code                                                 | Official Repo |
254 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ---------------------------------------------------- | ------------- |
255 | | 29    | SeD: Semantic-Aware Discriminator for Image Super-Resolution | [Paper](https://arxiv.org/abs/2402.19387) | [Code](https://github.com/lbc12345/SeD)              | N/A           |
256 | | 30    | APISR: Anime Production Inspired Real-World Anime Super-Resolution | [Paper](https://arxiv.org/abs/2403.01598) | https://github.com/Kiter…                            |               |
257 | 
258 | #### Denoising
259 | 
260 | | Index | Paper Title                         | Paper Link                                | Code                                     | Official Repo |
261 | | ----- | ----------------------------------- | ----------------------------------------- | ---------------------------------------- | ------------- |
262 | | 31    | Residual Denoising Diffusion Models | [Paper](https://arxiv.org/abs/2308.13712) | [Code](https://github.com/nachifur/RDDM) | N/A           |
263 | 
264 | #### Deblur
265 | 
266 | | Index | Paper Title | Paper Link | Code | Official Repo |
267 | | ----- | ----------- | ---------- | ---- | ------------- |
268 | | 32    | N/A         | N/A        | N/A  | N/A           |
269 | 
270 | #### Autonomous Driving
271 | 
272 | | Index | Paper Title                                                  | Paper Link                                | Code                                            | Official Repo |
273 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ----------------------------------------------- | ------------- |
274 | | 33    | UniPAD: A Universal Pre-training Paradigm for Autonomous Driving | [Paper](https://arxiv.org/abs/2310.08370) | [Code](https://github.com/Nightmare-n/UniPAD)   | N/A           |
275 | | 34    | Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications | [Paper](https://arxiv.org/abs/2311.17663) | [Code](https://github.com/haomo-ai/Cam4DOcc)    | N/A           |
276 | | 35    | Memory-based Adapters for Online 3D Scene Perception         | [Paper](https://arxiv.org/abs/2403.06974) | [Code](https://github.com/xuxw98/Online3D)      | N/A           |
277 | | 36    | Symphonize 3D Semantic Scene Completion with Contextual Instance Queries | [Paper](https://arxiv.org/abs/2306.15670) | [Code](https://github.com/hustvl/Symphonies)    | N/A           |
278 | | 37    | A Real-world Large-scale Dataset for Roadside Cooperative Perception | [Paper](https://arxiv.org/abs/2403.10145) | [Code](https://github.com/AIR-THU/DAIR-RCooper) | N/A           |
279 | | 38    | Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving | [Paper](https://arxiv.org/abs/2403.07535) | [Code](https://github.com/Junda24/AFNet)        | N/A           |
280 | 
281 | #### 3D Point Cloud
282 | 
283 | | Index | Paper Title | Paper Link | Code | Official Repo |
284 | | ----- | ----------- | ---------- | ---- | ------------- |
285 | | 40    | N/A         | N/A        | N/A  | N/A           |
286 | 
287 | #### 3D Object Detection
288 | 
289 | | Index | Paper Title                                                  | Paper Link                                | Code                                         | Official Repo |
290 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | -------------------------------------------- | ------------- |
291 | | 41    | PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection | [Paper](https://arxiv.org/abs/2312.08371) | [Code](https://github.com/kuanchihhuang/PTT) | N/A           |
292 | | 42    | UniMODE: Unified Monocular 3D Object Detection               | [Paper](https://arxiv.org/abs/2402.18573) | N/A                                          | N/A           |
293 | 
294 | #### 3D Semantic Segmentation
295 | 
296 | | Index | Paper Title | Paper Link | Code | Official Repo |
297 | | ----- | ----------- | ---------- | ---- | ------------- |
298 | | 43    | N/A         | N/A        | N/A  | N/A           |
299 | 
300 | #### 3D Object Tracking
301 | 
302 | | Index | Paper Title | Paper Link | Code | Official Repo |
303 | | ----- | ----------- | ---------- | ---- | ------------- |
304 | | 44    | N/A         | N/A        | N/A  | N/A           |
305 | 
306 | #### 3D Semantic Scene Completion
307 | 
308 | | Index | Paper Title                                                  | Paper Link                                | Code                                         | Official Repo |
309 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | -------------------------------------------- | ------------- |
310 | | 45    | Symphonize 3D Semantic Scene Completion with Contextual Instance Queries | [Paper](https://arxiv.org/abs/2306.15670) | [Code](https://github.com/hustvl/Symphonies) | N/A           |
311 | 
312 | #### 3D Registration
313 | 
314 | | Index | Paper Title | Paper Link | Code | Official Repo |
315 | | ----- | ----------- | ---------- | ---- | ------------- |
316 | | 46    | N/A         | N/A        | N/A  | N/A           |
317 | 
318 | #### 3D Human Pose Estimation
319 | 
320 | | Index | Paper Title                                                  | Paper Link                                | Code                                          | Official Repo |
321 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | --------------------------------------------- | ------------- |
322 | | 47    | Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation | [Paper](https://arxiv.org/abs/2311.12028) | [Code](https://github.com/NationalGAILab/HoT) | N/A           |
323 | 
324 | #### 3D Human Mesh Estimation
325 | 
326 | | Index | Paper Title | Paper Link | Code | Official Repo |
327 | | ----- | ----------- | ---------- | ---- | ------------- |
328 | | 48    | N/A         | N/A        | N/A  | N/A           |
329 | 
338 | #### Image Generation
339 | 
340 | | Index | Paper Title                                                  | Paper Link                                | Code                                                     | Official Repo                                                |
341 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | -------------------------------------------------------- | ------------------------------------------------------------ |
342 | | 52    | InstanceDiffusion: Instance-level Control for Image Generation | [Paper](https://arxiv.org/abs/2402.03290) | [Code](https://github.com/frank-xwang/InstanceDiffusion) | [Homepage](https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/) |
343 | | 53    | ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations | [Paper](https://arxiv.org/abs/2312.04655) | [Code](https://github.com/eclipse-t2i/eclipse-inference) | [Homepage](https://eclipse-t2i.vercel.app/)                  |
344 | | 54    | Instruct-Imagen: Image Generation with Multi-modal Instruction | [Paper](https://arxiv.org/abs/2401.01952) | N/A                                                      | N/A                                                          |
345 | | 55    | UniGS: Unified Representation for Image Generation and Segmentation | [Paper](https://arxiv.org/abs/2312.01985) | N/A                                                      | N/A                                                          |
346 | | 56    | Multi-Instance Generation Controller for Text-to-Image Synthesis | [Paper](https://arxiv.org/abs/2402.05408) | [Code](https://github.com/limuloo/migc)                  | N/A                                                          |
347 | | 57    | SVGDreamer: Text Guided SVG Generation with Diffusion Model  | [Paper](https://arxiv.org/abs/2312.16476) | [Code](https://ximinng.github.io/SVGDreamer-project/)    | N/A                                                          |
348 | | 58    | InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model | [Paper](https://arxiv.org/abs/2312.05849) | [Code](https://github.com/jiuntian/interactdiffusion)    | N/A                                                          |
349 | | 59    | Ranni: Taming Text-to-Image Diffusion for Accurate Prompt Following | [Paper](https://arxiv.org/abs/2311.17002) | [Code](https://github.com/ali-vilab/Ranni)               | N/A                                                          |
350 | 
351 | #### Video Generation
352 | 
353 | | Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo                                                |
354 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
355 | | 60    | Vlogger: Make Your Dream A Vlog                              | [Paper](https://arxiv.org/abs/2401.09414) | [Code](https://github.com/Vchitect/Vlogger)                  | N/A                                                          |
356 | | 61    | VBench: Comprehensive Benchmark Suite for Video Generative Models | [Paper](https://arxiv.org/abs/2311.17982) | [Code](https://github.com/Vchitect/VBench)                   | [Homepage](https://vchitect.github.io/VBench-project/)       |
357 | | 62    | VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models | [Paper](https://arxiv.org/abs/2312.00845) | [Code](https://github.com/HyeonHo99/Video-Motion-Customization) | [Homepage](https://video-motion-customization.github.io/) |
358 | 
359 | #### Vision Transformer
360 | 
361 | | Index | Paper Title                                                  | Paper Link                                | Code                                                | Official Repo |
362 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | --------------------------------------------------- | ------------- |
363 | | 63    | TransNeXt: Robust Foveal Visual Perception for Vision Transformers | [Paper](https://arxiv.org/abs/2311.17132) | [Code](https://github.com/DaiShiResearch/TransNeXt) | N/A           |
364 | | 64    | RepViT: Revisiting Mobile CNN From ViT Perspective           | [Paper](https://arxiv.org/abs/2307.09283) | [Code](https://github.com/THU-MIG/RepViT)           | N/A           |
365 | | 65    | A General and Efficient Training for Transformer via Token Expansion | [Paper](https://arxiv.org/abs/2404.00672) | [Code](https://github.com/Osilly/TokenExpansion)    | N/A           |
366 | 
367 | #### Vision-Language
368 | 
369 | | Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo |
370 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ------------- |
371 | | 66    | PromptKD: Unsupervised Prompt Distillation for Vision-Language Models | [Paper](https://arxiv.org/abs/2403.02781) | [Code](https://github.com/zhengli97/PromptKD)                | N/A           |
372 | | 67    | FairCLIP: Harnessing Fairness in Vision-Language Learning    | [Paper](https://arxiv.org/abs/2403.19949) | [Code](https://github.com/Harvard-Ophthalmology-AI-Lab/FairCLIP) | N/A           |
373 | 
374 | #### Object Detection
375 | 
376 | | Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo |
377 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ------------- |
378 | | 68    | DETRs Beat YOLOs on Real-time Object Detection               | [Paper](https://arxiv.org/abs/2304.08069) | [Code](https://github.com/lyuwenyu/RT-DETR)                  | N/A           |
379 | | 69    | Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation | [Paper](https://arxiv.org/abs/2312.01220) | [Code](https://github.com/ZPDu/Boosting-Object-Detection-with-Zero-Shot-Day-Night-Domain-Adaptation) | N/A           |
380 | | 70    | YOLO-World: Real-Time Open-Vocabulary Object Detection       | [Paper](https://arxiv.org/abs/2401.17270) | [Code](https://github.com/AILab-CVC/YOLO-World)              | N/A           |
381 | | 71    | Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement | [Paper](https://arxiv.org/abs/2403.16131) | [Code](https://github.com/xiuqhou/Salience-DETR)             | N/A           |
382 | 
383 | #### Anomaly Detection
384 | 
385 | | Index | Paper Title                                                  | Paper Link                                | Code                                    | Official Repo |
386 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | --------------------------------------- | ------------- |
387 | | 72    | Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection | [Paper](https://arxiv.org/abs/2310.12790) | [Code](https://github.com/mala-lab/AHL) | N/A           |
388 | 
389 | #### Object Tracking
390 | 
391 | | Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo |
392 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ------------- |
393 | | 73    | Delving into the Trajectory Long-tail Distribution for Multi-object Tracking | [Paper](https://arxiv.org/abs/2403.04700) | [Code](https://github.com/chen-si-jia/Trajectory-Long-tail-Distribution-for-MOT) | N/A           |
394 | 
395 | #### Semantic Segmentation
396 | 
397 | | Index | Paper Title                                                  | Paper Link                                | Code                                   | Official Repo |
398 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | -------------------------------------- | ------------- |
399 | | 74    | Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation | [Paper](https://arxiv.org/abs/2312.04265) | [Code](https://github.com/w1oves/Rein) | N/A           |
400 | | 75    | SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation | [Paper](https://arxiv.org/abs/2311.15537) | [Code](https://github.com/xb534/SED)   | N/A           |
401 | 
402 | #### Medical Image
403 | 
404 | | Index | Paper Title                                                  | Paper Link                                | Code                                          | Official Repo |
405 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | --------------------------------------------- | ------------- |
406 | | 76    | Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology | [Paper](https://arxiv.org/abs/2402.17228) | [Code](https://github.com/DearCaat/RRT-MIL)   | N/A           |
407 | | 77    | VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis | [Paper](https://arxiv.org/abs/2402.17300) | [Code](https://github.com/Luffy03/VoCo)       | N/A           |
408 | | 78    | ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images | [Paper](https://arxiv.org/abs/2311.15264) | [Code](https://github.com/nicoboou/chada_vit) | N/A           |
409 | 
410 | #### Medical Image Segmentation
411 | 
412 | | Index | Paper Title | Paper Link | Code | Official Repo |
413 | | ----- | ----------- | ---------- | ---- | ------------- |
414 | | 76    | N/A         | N/A        | N/A  | N/A           |
415 | 
416 | #### Autonomous Driving
417 | 
418 | | Index | Paper Title                                                  | Paper Link                                    | Code                                            | Official Repo |
419 | | ----- | ------------------------------------------------------------ | --------------------------------------------- | ----------------------------------------------- | ------------- |
420 | | 77    | UniPAD: A Universal Pre-training Paradigm for Autonomous Driving | [Paper](https://arxiv.org/abs/2310.08370)     | [Code](https://github.com/Nightmare-n/UniPAD)   | N/A           |
421 | | 78    | Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications | [Paper](https://arxiv.org/abs/2311.17663)     | [Code](https://github.com/haomo-ai/Cam4DOcc)    | N/A           |
422 | | 79    | Memory-based Adapters for Online 3D Scene Perception         | [Paper](https://arxiv.org/abs/2403.06974)     | [Code](https://github.com/xuxw98/Online3D)      | N/A           |
423 | | 80    | Symphonize 3D Semantic Scene Completion with Contextual Instance Queries | [Paper](https://arxiv.org/abs/2306.15670)     | [Code](https://github.com/hustvl/Symphonies)    | N/A           |
424 | | 81    | A Real-world Large-scale Dataset for Roadside Cooperative Perception | [Paper](https://arxiv.org/abs/2403.10145)     | [Code](https://github.com/AIR-THU/DAIR-RCooper) | N/A           |
425 | | 82    | Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving | [Paper](https://arxiv.org/abs/2403.07535)     | [Code](https://github.com/Junda24/AFNet)        | N/A           |
426 | | 83    | Traffic Scene Parsing through the TSP6K Dataset              | [Paper](https://arxiv.org/abs/2303.02835)     | [Code](https://github.com/PengtaoJiang/TSP6K)   | N/A           |
427 | 
428 | #### 3D Point Cloud
429 | 
430 | | Index | Paper Title | Paper Link | Code | Official Repo |
431 | | ----- | ----------- | ---------- | ---- | ------------- |
432 | | 84    | N/A         | N/A        | N/A  | N/A           |
433 | 
434 | #### 3D Object Detection
435 | 
436 | | Index | Paper Title                                                  | Paper Link                                | Code                                         | Official Repo |
437 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | -------------------------------------------- | ------------- |
438 | | 85    | PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection | [Paper](https://arxiv.org/abs/2312.08371) | [Code](https://github.com/kuanchihhuang/PTT) | N/A           |
439 | | 86    | UniMODE: Unified Monocular 3D Object Detection               | [Paper](https://arxiv.org/abs/2402.18573) | N/A                                          | N/A           |
440 | 
441 | #### 3D Semantic Segmentation
442 | 
443 | | Index | Paper Title | Paper Link | Code | Official Repo |
444 | | ----- | ----------- | ---------- | ---- | ------------- |
445 | | 87    | N/A         | N/A        | N/A  | N/A           |
446 | 
447 | #### Image Editing
448 | 
449 | | Index | Paper Title                                       | Paper Link                                | Code                                                  | Official Repo                                            |
450 | | ----- | ------------------------------------------------- | ----------------------------------------- | ----------------------------------------------------- | -------------------------------------------------------- |
451 | | 88    | Edit One for All: Interactive Batch Image Editing | [Paper](https://arxiv.org/abs/2401.10219) | [Code](https://github.com/thaoshibe/edit-one-for-all) | [Homepage](https://thaoshibe.github.io/edit-one-for-all) |
452 | 
453 | #### Video Editing
454 | 
455 | | Index | Paper Title                                                  | Paper Link                                | Code | Official Repo                         |
456 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ---- | ------------------------------------- |
457 | | 89    | MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers | [Paper](https://arxiv.org/abs/2312.12468) | N/A  | [Homepage](https://maskint.github.io) |
458 | 
459 | #### Low-level Vision
460 | 
461 | | Index | Paper Title                                                  | Paper Link                                | Code                                     | Official Repo |
462 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ---------------------------------------- | ------------- |
463 | | 90    | Residual Denoising Diffusion Models                          | [Paper](https://arxiv.org/abs/2308.13712) | [Code](https://github.com/nachifur/RDDM) | N/A           |
464 | | 91    | Boosting Image Restoration via Priors from Pre-trained Models | [Paper](https://arxiv.org/abs/2403.06793) | N/A                                      | N/A           |
465 | 
466 | #### Super-Resolution
467 | 
468 | | Index | Paper Title                                                  | Paper Link                                | Code                                         | Official Repo |
469 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | -------------------------------------------- | ------------- |
470 | | 92    | SeD: Semantic-Aware Discriminator for Image Super-Resolution | [Paper](https://arxiv.org/abs/2402.19387) | [Code](https://github.com/lbc12345/SeD)      | N/A           |
471 | | 93    | APISR: Anime Production Inspired Real-World Anime Super-Resolution | [Paper](https://arxiv.org/abs/2403.01598) | [Code](https://github.com/Kiteretsu77/APISR) | N/A           |
472 | 
473 | #### Denoising
474 | 
475 | | Index | Paper Title | Paper Link | Code | Official Repo |
476 | | ----- | ----------- | ---------- | ---- | ------------- |
477 | | 94    | N/A         | N/A        | N/A  | N/A           |
478 | 
479 | #### 3D Human Pose Estimation
480 | 
481 | | Index | Paper Title                                                  | Paper Link                                | Code                                          | Official Repo |
482 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | --------------------------------------------- | ------------- |
483 | | 95    | Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation | [Paper](https://arxiv.org/abs/2311.12028) | [Code](https://github.com/NationalGAILab/HoT) | N/A           |
484 | 
485 | #### Image Generation
486 | 
487 | | Index | Paper Title                                                  | Paper Link                                | Code                                                     | Official Repo                                                |
488 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | -------------------------------------------------------- | ------------------------------------------------------------ |
489 | | 96    | InstanceDiffusion: Instance-level Control for Image Generation | [Paper](https://arxiv.org/abs/2402.03290) | [Code](https://github.com/frank-xwang/InstanceDiffusion) | [Homepage](https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/) |
492 | | 99    | Residual Denoising Diffusion Models                          | [Paper](https://arxiv.org/abs/2308.13712) | [Code](https://github.com/nachifur/RDDM)                 | N/A                                                          |
498 | 
507 | #### 3D Generation
508 | 
509 | | Index | Paper Title                                                  | Paper Link                                | Code                                                      | Official Repo                                           |
510 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | --------------------------------------------------------- | ------------------------------------------------------- |
511 | | 108   | CityDreamer: Compositional Generative Model of Unbounded 3D Cities | [Paper](https://arxiv.org/abs/2309.00610) | [Code](https://github.com/hzxie/city-dreamer)             | [Homepage](https://haozhexie.com/project/city-dreamer/) |
512 | | 109   | LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching | [Paper](https://arxiv.org/abs/2311.11284) | [Code](https://github.com/EnVision-Research/LucidDreamer) | N/A                                                     |
513 | 
514 | #### Video Understanding
515 | 
516 | | Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo |
517 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ------------- |
518 | | 110   | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | [Paper](https://arxiv.org/abs/2311.17005) | [Code](https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat2) | N/A           |
519 | 
520 | #### Knowledge Distillation
521 | 
522 | | Index | Paper Title                                          | Paper Link                                | Code                                                         | Official Repo |
523 | | ----- | ---------------------------------------------------- | ----------------------------------------- | ------------------------------------------------------------ | ------------- |
524 | | 111   | Logit Standardization in Knowledge Distillation      | [Paper](https://arxiv.org/abs/2403.01427) | [Code](https://github.com/sunshangquan/logit-standardization-KD) | N/A           |
525 | | 112   | Efficient Dataset Distillation via Minimax Diffusion | [Paper](https://arxiv.org/abs/2311.15529) | [Code](https://github.com/vimar-gu/MinimaxDiffusion)         | N/A           |
526 | 
527 | #### Stereo Matching
528 | 
529 | | Index | Paper Title                                    | Paper Link                                | Code                                       | Official Repo |
530 | | ----- | ---------------------------------------------- | ----------------------------------------- | ------------------------------------------ | ------------- |
531 | | 113   | Neural Markov Random Field for Stereo Matching | [Paper](https://arxiv.org/abs/2403.11193) | [Code](https://github.com/aeolusguan/NMRF) | N/A           |
532 | 
533 | #### Scene Graph Generation
534 | 
535 | | Index | Paper Title                                                  | Paper Link                                | Code                                           | Official Repo                                      |
536 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ---------------------------------------------- | -------------------------------------------------- |
537 | | 114   | HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation | [Paper](https://arxiv.org/abs/2403.12033) | [Code](https://github.com/zhangce01/HiKER-SGG) | [Homepage](https://zhangce01.github.io/HiKER-SGG/) |
538 | 
539 | #### Video Quality Assessment
540 | 
541 | | Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo                                         |
542 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ----------------------------------------------------- |
543 | | 115   | KVQ: Kaleidoscope Video Quality Assessment for Short-form Videos | [Paper](https://arxiv.org/abs/2402.07220) | [Code](https://github.com/lixinustc/KVQ-Challenge-CVPR-NTIRE2024) | [Homepage](https://lixinustc.github.io/projects/KVQ/) |
544 | 
545 | #### Datasets
546 | 
547 | | Index | Paper Title                                                  | Paper Link                                    | Code                                            | Official Repo |
548 | | ----- | ------------------------------------------------------------ | --------------------------------------------- | ----------------------------------------------- | ------------- |
549 | | 116   | A Real-world Large-scale Dataset for Roadside Cooperative Perception | [Paper](https://arxiv.org/abs/2403.10145)     | [Code](https://github.com/AIR-THU/DAIR-RCooper) | N/A           |
550 | | 117   | Traffic Scene Parsing through the TSP6K Dataset              | [Paper](https://arxiv.org/abs/2303.02835)     | [Code](https://github.com/PengtaoJiang/TSP6K)   | N/A           |
551 | 
552 | #### Others
553 | 
554 | | Index | Paper Title                                                  | Paper Link                                | Code                                                         | Official Repo                                                |
555 | | ----- | ------------------------------------------------------------ | ----------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
556 | | 118   | Object Recognition as Next Token Prediction                  | [Paper](https://arxiv.org/abs/2312.02142) | [Code](https://github.com/kaiyuyue/nxtp)                     | N/A                                                          |
557 | | 119   | ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks | [Paper](https://arxiv.org/abs/2306.14525) | [Code](https://parameternet.github.io/)                      | N/A                                                          |
558 | | 120   | Seamless Human Motion Composition with Blended Positional Encodings | [Paper](https://arxiv.org/abs/2402.15509) | [Code](https://github.com/BarqueroGerman/FlowMDM)            | N/A                                                          |
559 | | 121   | LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning | [Paper](https://arxiv.org/abs/2311.18651) | [Code](https://github.com/Open3DA/LL3DA)                     | [Homepage](https://ll3da.github.io/)                         |
560 | | 122   | CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update | [Paper](https://arxiv.org/abs/2312.10908) | N/A                                                          | [Homepage](https://clova-tool.github.io/)                    |
561 | | 123   | MoMask: Generative Masked Modeling of 3D Human Motions       | [Paper](https://arxiv.org/abs/2312.00063) | [Code](https://github.com/EricGuo5513/momask-codes)          | N/A                                                          |
562 | | 124   | Amodal Ground Truth and Completion in the Wild               | [Paper](https://arxiv.org/abs/2312.17247) | [Code](https://github.com/Championchess/Amodal-Completion-in-the-Wild) | [Homepage](https://www.robots.ox.ac.uk/~vgg/research/amodal/) |
563 | | 125   | Improved Visual Grounding through Self-Consistent Explanations | [Paper](https://arxiv.org/abs/2312.04554) | [Code](https://github.com/uvavision/SelfEQ)                  | N/A                                                          |
564 | | 126   | ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object | [Paper](https://arxiv.org/abs/2403.18775) | [Code](https://github.com/chenshuang-zhang/imagenet_d)       | [Homepage](https://chenshuang-zhang.github.io/imagenet_d/)   |
565 | | 127   | Learning from Synthetic Human Group Activities               | [Paper](https://arxiv.org/abs/2306.16772) | [Code](https://github.com/cjerry1243/M3Act)                  | [Homepage](https://cjerry1243.github.io/M3Act/)              |
566 | | 128   | MindBridge: A Cross-Subject Brain Decoding Framework         | [Paper](https://arxiv.org/abs/2404.07850) | [Code](https://github.com/littlepure2333/MindBridge)         | [Homepage](https://littlepure2333.github.io/MindBridge/)     |
567 | | 129   | Multi-Task Dense Prediction via Mixture of Low-Rank Experts  | [Paper](https://arxiv.org/abs/2403.17749) | [Code](https://github.com/YuqiYang213/MLoRE)                 | N/A                                                          |
568 | | 130   | Contrastive Mean-Shift Learning for Generalized Category Discovery | [Paper](https://arxiv.org/abs/2404.09451) | [Code](https://github.com/sua-choi/CMS)                      | [Homepage](https://postech-cvlab.github.io/cms/)             |
569 | 
570 | #### Thank you for Reading


--------------------------------------------------------------------------------
/mindmap/fd94ed3530b7015a458d81055f24fc026f33f7d1c45cde6cdc34f1a689509916.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ashishpatel26/CVPR2024/0339d4af644ac2ed204f0857d1959992bfe34427/mindmap/fd94ed3530b7015a458d81055f24fc026f33f7d1c45cde6cdc34f1a689509916.png


--------------------------------------------------------------------------------