├── .gitignore
├── CONTRIBUTING.md
├── LICENSE
├── README.md
└── automation
    ├── data.csv
    ├── generate.py
    └── requirements.txt


/.gitignore:
--------------------------------------------------------------------------------
1 | .idea/
2 | .vscode/
3 | 
4 | env/
5 | venv/
6 | 
7 | .DS_Store
8 | .ipynb_checkpoints


--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
 1 | ## 🦸 contributor guide
 2 | 
 3 | - fork and clone the repository (command below clones the original repository)
 4 | 
 5 |     ```bash
 6 |     git clone https://github.com/SkalskiP/top-cvpr-2024-papers.git
 7 |     ```
 8 |   
 9 | - navigate to the `automation` directory
10 | 
11 |     ```bash
12 |     cd top-cvpr-2024-papers/automation
13 |     ```
14 |   
15 | - setup and activate python environment (optional, but recommended)
16 | 
17 |     ```bash
18 |     python3 -m venv venv
19 |     source venv/bin/activate
20 |     ```
21 | 
22 | - install dependencies
23 | 
24 |     ```bash
25 |     pip install -r requirements.txt
26 |     ```
27 | 
28 | - update `data.csv` with awesome CVPR 2024 papers
29 | 
30 | - update `README.md` with the following command
31 | 
32 |     ```bash
33 |     python automation/generate.py
34 |     ```


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
  1 | Creative Commons Legal Code
  2 | 
  3 | CC0 1.0 Universal
  4 | 
  5 |     CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
  6 |     LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN
  7 |     ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
  8 |     INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
  9 |     REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS
 10 |     PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM
 11 |     THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED
 12 |     HEREUNDER.
 13 | 
 14 | Statement of Purpose
 15 | 
 16 | The laws of most jurisdictions throughout the world automatically confer
 17 | exclusive Copyright and Related Rights (defined below) upon the creator
 18 | and subsequent owner(s) (each and all, an "owner") of an original work of
 19 | authorship and/or a database (each, a "Work").
 20 | 
 21 | Certain owners wish to permanently relinquish those rights to a Work for
 22 | the purpose of contributing to a commons of creative, cultural and
 23 | scientific works ("Commons") that the public can reliably and without fear
 24 | of later claims of infringement build upon, modify, incorporate in other
 25 | works, reuse and redistribute as freely as possible in any form whatsoever
 26 | and for any purposes, including without limitation commercial purposes.
 27 | These owners may contribute to the Commons to promote the ideal of a free
 28 | culture and the further production of creative, cultural and scientific
 29 | works, or to gain reputation or greater distribution for their Work in
 30 | part through the use and efforts of others.
 31 | 
 32 | For these and/or other purposes and motivations, and without any
 33 | expectation of additional consideration or compensation, the person
 34 | associating CC0 with a Work (the "Affirmer"), to the extent that he or she
 35 | is an owner of Copyright and Related Rights in the Work, voluntarily
 36 | elects to apply CC0 to the Work and publicly distribute the Work under its
 37 | terms, with knowledge of his or her Copyright and Related Rights in the
 38 | Work and the meaning and intended legal effect of CC0 on those rights.
 39 | 
 40 | 1. Copyright and Related Rights. A Work made available under CC0 may be
 41 | protected by copyright and related or neighboring rights ("Copyright and
 42 | Related Rights"). Copyright and Related Rights include, but are not
 43 | limited to, the following:
 44 | 
 45 |   i. the right to reproduce, adapt, distribute, perform, display,
 46 |      communicate, and translate a Work;
 47 |  ii. moral rights retained by the original author(s) and/or performer(s);
 48 | iii. publicity and privacy rights pertaining to a person's image or
 49 |      likeness depicted in a Work;
 50 |  iv. rights protecting against unfair competition in regards to a Work,
 51 |      subject to the limitations in paragraph 4(a), below;
 52 |   v. rights protecting the extraction, dissemination, use and reuse of data
 53 |      in a Work;
 54 |  vi. database rights (such as those arising under Directive 96/9/EC of the
 55 |      European Parliament and of the Council of 11 March 1996 on the legal
 56 |      protection of databases, and under any national implementation
 57 |      thereof, including any amended or successor version of such
 58 |      directive); and
 59 | vii. other similar, equivalent or corresponding rights throughout the
 60 |      world based on applicable law or treaty, and any national
 61 |      implementations thereof.
 62 | 
 63 | 2. Waiver. To the greatest extent permitted by, but not in contravention
 64 | of, applicable law, Affirmer hereby overtly, fully, permanently,
 65 | irrevocably and unconditionally waives, abandons, and surrenders all of
 66 | Affirmer's Copyright and Related Rights and associated claims and causes
 67 | of action, whether now known or unknown (including existing as well as
 68 | future claims and causes of action), in the Work (i) in all territories
 69 | worldwide, (ii) for the maximum duration provided by applicable law or
 70 | treaty (including future time extensions), (iii) in any current or future
 71 | medium and for any number of copies, and (iv) for any purpose whatsoever,
 72 | including without limitation commercial, advertising or promotional
 73 | purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each
 74 | member of the public at large and to the detriment of Affirmer's heirs and
 75 | successors, fully intending that such Waiver shall not be subject to
 76 | revocation, rescission, cancellation, termination, or any other legal or
 77 | equitable action to disrupt the quiet enjoyment of the Work by the public
 78 | as contemplated by Affirmer's express Statement of Purpose.
 79 | 
 80 | 3. Public License Fallback. Should any part of the Waiver for any reason
 81 | be judged legally invalid or ineffective under applicable law, then the
 82 | Waiver shall be preserved to the maximum extent permitted taking into
 83 | account Affirmer's express Statement of Purpose. In addition, to the
 84 | extent the Waiver is so judged Affirmer hereby grants to each affected
 85 | person a royalty-free, non transferable, non sublicensable, non exclusive,
 86 | irrevocable and unconditional license to exercise Affirmer's Copyright and
 87 | Related Rights in the Work (i) in all territories worldwide, (ii) for the
 88 | maximum duration provided by applicable law or treaty (including future
 89 | time extensions), (iii) in any current or future medium and for any number
 90 | of copies, and (iv) for any purpose whatsoever, including without
 91 | limitation commercial, advertising or promotional purposes (the
 92 | "License"). The License shall be deemed effective as of the date CC0 was
 93 | applied by Affirmer to the Work. Should any part of the License for any
 94 | reason be judged legally invalid or ineffective under applicable law, such
 95 | partial invalidity or ineffectiveness shall not invalidate the remainder
 96 | of the License, and in such case Affirmer hereby affirms that he or she
 97 | will not (i) exercise any of his or her remaining Copyright and Related
 98 | Rights in the Work or (ii) assert any associated claims and causes of
 99 | action with respect to the Work, in either case contrary to Affirmer's
100 | express Statement of Purpose.
101 | 
102 | 4. Limitations and Disclaimers.
103 | 
104 |  a. No trademark or patent rights held by Affirmer are waived, abandoned,
105 |     surrendered, licensed or otherwise affected by this document.
106 |  b. Affirmer offers the Work as-is and makes no representations or
107 |     warranties of any kind concerning the Work, express, implied,
108 |     statutory or otherwise, including without limitation warranties of
109 |     title, merchantability, fitness for a particular purpose, non
110 |     infringement, or the absence of latent or other defects, accuracy, or
111 |     the present or absence of errors, whether or not discoverable, all to
112 |     the greatest extent permissible under applicable law.
113 |  c. Affirmer disclaims responsibility for clearing rights of other persons
114 |     that may apply to the Work or any use thereof, including without
115 |     limitation any person's Copyright and Related Rights in the Work.
116 |     Further, Affirmer disclaims responsibility for obtaining any necessary
117 |     consents, permissions or other rights required for any use of the
118 |     Work.
119 |  d. Affirmer understands and acknowledges that Creative Commons is not a
120 |     party to this document and has no duty or obligation with respect to
121 |     this CC0 or use of the Work.
122 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | ![visitor badge](https://visitor-badge.laobi.icu/badge?page_id=SkalskiP.top-cvpr-2024-papers)
  2 | 
  3 | <div align="center">
  4 |   <h1 align="center">top CVPR 2024 papers</h1>
  5 |   <a href="https://github.com/SkalskiP/top-cvpr-2023-papers">2023</a> | <a href="https://github.com/SkalskiP/top-cvpr-2024-papers">2024</a> | <a href="https://github.com/SkalskiP/top-cvpr-2025-papers">2025</a>
  6 | </div>
  7 | 
  8 | <br>
  9 | 
 10 | <div align="center">
 11 |   <img width="600" src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/347853f9-9e93-4ca0-858b-a7c3f6bba073" alt="vancouver">
 12 | </div>
 13 | 
 14 | ## 👋 hello
 15 | 
 16 | Computer Vision and Pattern Recognition is a massive conference. In **2024** alone,
 17 | **11,532** papers were submitted, and **2,719** were accepted. I created this repository
 18 | to help you search for crème de la crème of CVPR publications. If the paper you are
 19 | looking for is not on my short list, take a peek at the full
 20 | [list](https://cvpr.thecvf.com/Conferences/2024/AcceptedPapers) of accepted papers.
 21 | 
 22 | ## 🗞️ papers and posters
 23 | 
 24 | *🔥 - highlighted papers*
 25 | 
 26 | <!--- AUTOGENERATED_PAPERS_LIST -->
 27 | <!---
 28 |    WARNING: DO NOT EDIT THIS LIST MANUALLY. IT IS AUTOMATICALLY GENERATED.
 29 |    HEAD OVER TO https://github.com/SkalskiP/top-cvpr-2024-papers/blob/master/CONTRIBUTING.md FOR MORE DETAILS ON HOW TO MAKE CHANGES PROPERLY.
 30 | -->
 31 | ### 3d from multi-view and sensors
 32 | 
 33 | <p align="left">
 34 |     <a href="https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/31668.png?t=1717417393.7589533" title="SpatialTracker: Tracking Any 2D Pixels in 3D Space">
 35 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/56498f78-2ca0-46ee-9231-6aa1806b6ebc" alt="SpatialTracker: Tracking Any 2D Pixels in 3D Space" width="400px" align="left" />
 36 |     </a>
 37 |     <a href="https://arxiv.org/abs/2404.04319" title="SpatialTracker: Tracking Any 2D Pixels in 3D Space">
 38 |         <strong>🔥 SpatialTracker: Tracking Any 2D Pixels in 3D Space</strong>
 39 |     </a>
 40 |     <br/>
 41 |     Yuxi Xiao, Qianqian Wang, Shangzhan Zhang, Nan Xue, Sida Peng, Yujun Shen, Xiaowei Zhou
 42 |     <br/>
 43 |     [<a href="https://arxiv.org/abs/2404.04319">paper</a>] [<a href="https://github.com/henry123-boy/SpaTracker">code</a>]   
 44 |     <br/>
 45 |     <strong>Topic:</strong> 3D from multi-view and sensors
 46 |     <br/>
 47 |     <strong>Session:</strong> Fri 21 Jun 1:30 p.m. EDT — 3 p.m. EDT #84
 48 | </p>
 49 | <br/>
 50 | <br/>
 51 | 
 52 | 
 53 | <p align="left">
 54 |     <a href="https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/31616.png?t=1716470830.0209699" title="ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models">
 55 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/0453bf88-9d54-4ecf-8a45-01af0f604faf" alt="ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models" width="400px" align="left" />
 56 |     </a>
 57 |     <a href="https://arxiv.org/abs/2403.01807" title="ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models">
 58 |         <strong>ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models</strong>
 59 |     </a>
 60 |     <br/>
 61 |     Lukas Höllein, Aljaž Božič, Norman Müller, David Novotny, Hung-Yu Tseng, Christian Richardt, Michael Zollhöfer, Matthias Nießner
 62 |     <br/>
 63 |     [<a href="https://arxiv.org/abs/2403.01807">paper</a>] [<a href="https://github.com/facebookresearch/ViewDiff">code</a>] [<a href="https://youtu.be/SdjoCqHzMMk">video</a>]  
 64 |     <br/>
 65 |     <strong>Topic:</strong> 3D from multi-view and sensors
 66 |     <br/>
 67 |     <strong>Session:</strong> Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #20
 68 | </p>
 69 | <br/>
 70 | <br/>
 71 | 
 72 | 
 73 | <p align="left">
 74 |     <a href="https://arxiv.org/abs/2405.12979" title="OmniGlue: Generalizable Feature Matching with Foundation Model Guidance">
 75 |         <strong>OmniGlue: Generalizable Feature Matching with Foundation Model Guidance</strong>
 76 |     </a>
 77 |     <br/>
 78 |     Hanwen Jiang, Arjun Karpur, Bingyi Cao, Qixing Huang, Andre Araujo
 79 |     <br/>
 80 |     [<a href="https://arxiv.org/abs/2405.12979">paper</a>] [<a href="https://github.com/google-research/omniglue">code</a>]  [<a href="https://huggingface.co/spaces/qubvel-hf/omniglue">demo</a>] 
 81 |     <br/>
 82 |     <strong>Topic:</strong> 3D from multi-view and sensors
 83 |     <br/>
 84 |     <strong>Session:</strong> Fri 21 Jun 1:30 p.m. EDT — 3 p.m. EDT #32
 85 | </p>
 86 | <br/>
 87 | 
 88 | ### deep learning architectures and techniques
 89 | 
 90 | <p align="left">
 91 |     <a href="https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/30529.png?t=1717455193.7819567" title="Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks">
 92 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/4aaf3f87-cc62-4fa3-af99-c8c1c83c0069" alt="Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks" width="400px" align="left" />
 93 |     </a>
 94 |     <a href="https://arxiv.org/pdf/2311.06242" title="Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks">
 95 |         <strong>🔥 Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks</strong>
 96 |     </a>
 97 |     <br/>
 98 |     Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan
 99 |     <br/>
100 |     [<a href="https://arxiv.org/pdf/2311.06242">paper</a>]  [<a href="https://youtu.be/cOlyA00K1ec">video</a>] [<a href="https://huggingface.co/spaces/gokaygokay/Florence-2">demo</a>] [<a href="https://youtu.be/cOlyA00K1ec">colab</a>]
101 |     <br/>
102 |     <strong>Topic:</strong> Deep learning architectures and techniques
103 |     <br/>
104 |     <strong>Session:</strong> Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #102
105 | </p>
106 | <br/>
107 | <br/>
108 | 
109 | ### document analysis and understanding
110 | 
111 | <p align="left">
112 |     <a href="https://arxiv.org/abs/2405.04408" title="DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks">
113 |         <strong>DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks</strong>
114 |     </a>
115 |     <br/>
116 |     Jiaxin Zhang, Dezhi Peng, Chongyu Liu, Peirong Zhang, Lianwen Jin
117 |     <br/>
118 |     [<a href="https://arxiv.org/abs/2405.04408">paper</a>] [<a href="https://github.com/ZZZHANG-jx/DocRes">code</a>]  [<a href="https://huggingface.co/spaces/qubvel-hf/documents-restoration">demo</a>] 
119 |     <br/>
120 |     <strong>Topic:</strong> Document analysis and understanding
121 |     <br/>
122 |     <strong>Session:</strong> Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #101
123 | </p>
124 | <br/>
125 | 
126 | ### efficient and scalable vision
127 | 
128 | <p align="left">
129 |     <a href="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/e95eac04-5a45-402c-885d-14395879abd3" title="EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything">
130 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/e95eac04-5a45-402c-885d-14395879abd3" alt="EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything" width="400px" align="left" />
131 |     </a>
132 |     <a href="https://arxiv.org/abs/2312.00863" title="EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything">
133 |         <strong>🔥 EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything</strong>
134 |     </a>
135 |     <br/>
136 |     Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xiang, Fanyi Xiao, Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman Krishnamoorthi, Vikas Chandra
137 |     <br/>
138 |     [<a href="https://arxiv.org/abs/2312.00863">paper</a>] [<a href="https://github.com/yformer/EfficientSAM">code</a>]  [<a href="https://huggingface.co/spaces/SkalskiP/EfficientSAM">demo</a>] 
139 |     <br/>
140 |     <strong>Topic:</strong> Efficient and scalable vision
141 |     <br/>
142 |     <strong>Session:</strong> Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #144
143 | </p>
144 | <br/>
145 | <br/>
146 | 
147 | 
148 | <p align="left">
149 |     <a href="https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/30022.png?t=1718402790.003817" title="MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training">
150 |         <img src="https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/30022.png?t=1718402790.003817" alt="MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training" width="400px" align="left" />
151 |     </a>
152 |     <a href="https://arxiv.org/abs/2311.17049" title="MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training">
153 |         <strong>MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training</strong>
154 |     </a>
155 |     <br/>
156 |     Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel
157 |     <br/>
158 |     [<a href="https://arxiv.org/abs/2311.17049">paper</a>] [<a href="https://github.com/apple/ml-mobileclip">code</a>]  [<a href="https://huggingface.co/spaces/Xenova/webgpu-mobileclip">demo</a>] 
159 |     <br/>
160 |     <strong>Topic:</strong> Efficient and scalable vision
161 |     <br/>
162 |     <strong>Session:</strong> Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #130
163 | </p>
164 | <br/>
165 | <br/>
166 | 
167 | ### explainable computer vision
168 | 
169 | <p align="left">
170 |     <a href="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/6d87318b-57c1-40c7-9de6-5cb47145e119" title="Describing Differences in Image Sets with Natural Language">
171 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/6d87318b-57c1-40c7-9de6-5cb47145e119" alt="Describing Differences in Image Sets with Natural Language" width="400px" align="left" />
172 |     </a>
173 |     <a href="https://arxiv.org/abs/2312.02974" title="Describing Differences in Image Sets with Natural Language">
174 |         <strong>🔥 Describing Differences in Image Sets with Natural Language</strong>
175 |     </a>
176 |     <br/>
177 |     Lisa Dunlap, Yuhui Zhang, Xiaohan Wang, Ruiqi Zhong, Trevor Darrell, Jacob Steinhardt, Joseph E. Gonzalez, Serena Yeung-Levy
178 |     <br/>
179 |     [<a href="https://arxiv.org/abs/2312.02974">paper</a>] [<a href="https://github.com/Understanding-Visual-Datasets/VisDiff">code</a>]   
180 |     <br/>
181 |     <strong>Topic:</strong> Explainable computer vision
182 |     <br/>
183 |     <strong>Session:</strong> Fri 21 Jun 8 p.m. EDT — 9:30 p.m. EDT #115
184 | </p>
185 | <br/>
186 | <br/>
187 | 
188 | ### image and video synthesis and generation
189 | 
190 | <p align="left">
191 |     <a href="https://arxiv.org/abs/2311.16973" title="DemoFusion: Democratising High-Resolution Image Generation With No $$$">
192 |         <strong>DemoFusion: Democratising High-Resolution Image Generation With No $$$</strong>
193 |     </a>
194 |     <br/>
195 |     Ruoyi Du, Dongliang Chang, Timothy Hospedales, Yi-Zhe Song, Zhanyu Ma
196 |     <br/>
197 |     [<a href="https://arxiv.org/abs/2311.16973">paper</a>] [<a href="https://github.com/PRIS-CV/DemoFusion">code</a>]  [<a href="https://huggingface.co/spaces/radames/Enhance-This-DemoFusion-SDXL">demo</a>] [<a href="https://colab.research.google.com/github/camenduru/DemoFusion-colab/blob/main/DemoFusion_colab.ipynb">colab</a>]
198 |     <br/>
199 |     <strong>Topic:</strong> Image and video synthesis and generation
200 |     <br/>
201 |     <strong>Session:</strong> Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #132
202 | </p>
203 | <br/>
204 | 
205 | 
206 | <p align="left">
207 |     <a href="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/b0833f6b-6924-4f28-b409-ae85aaaa4dd6" title="DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing">
208 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/2a0219f5-9f1e-47e1-a968-d4d98154feb2" alt="DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing" width="400px" align="left" />
209 |     </a>
210 |     <a href="https://arxiv.org/abs/2306.14435" title="DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing">
211 |         <strong>🔥 DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing</strong>
212 |     </a>
213 |     <br/>
214 |     Yujun Shi, Chuhui Xue, Jun Hao Liew, Jiachun Pan, Hanshu Yan, Wenqing Zhang, Vincent Y. F. Tan, Song Bai
215 |     <br/>
216 |     [<a href="https://arxiv.org/abs/2306.14435">paper</a>] [<a href="https://github.com/Yujun-Shi/DragDiffusion">code</a>] [<a href="https://youtu.be/rysOFTpDBhc">video</a>]  
217 |     <br/>
218 |     <strong>Topic:</strong> Image and video synthesis and generation
219 |     <br/>
220 |     <strong>Session:</strong> Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #392
221 | </p>
222 | <br/>
223 | <br/>
224 | 
225 | 
226 | <p align="left">
227 |     <a href="https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/30657.png?t=1717473392.6694562" title="Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models">
228 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/709e3619-25d9-409e-b6ad-ca082611fe09" alt="Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models" width="400px" align="left" />
229 |     </a>
230 |     <a href="https://arxiv.org/abs/2311.17919" title="Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models">
231 |         <strong>🔥 Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models</strong>
232 |     </a>
233 |     <br/>
234 |     Daniel Geng, Inbum Park, Andrew Owens
235 |     <br/>
236 |     [<a href="https://arxiv.org/abs/2311.17919">paper</a>] [<a href="https://github.com/dangeng/visual_anagrams">code</a>]   [<a href="https://colab.research.google.com/github/dangeng/visual_anagrams/blob/main/notebooks/colab_demo_free_tier.ipynb">colab</a>]
237 |     <br/>
238 |     <strong>Topic:</strong> Image and video synthesis and generation
239 |     <br/>
240 |     <strong>Session:</strong> Fri 21 Jun 8 p.m. EDT — 9:30 p.m. EDT #118
241 | </p>
242 | <br/>
243 | <br/>
244 | 
245 | ### low-level vision
246 | 
247 | <p align="left">
248 |     <a href="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/8eb6b4f0-4ae6-4615-9921-f73fa2aa3766" title="XFeat: Accelerated Features for Lightweight Image Matching">
249 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/50b6d16f-c2d8-49a4-8c15-a31d6f9a3c44" alt="XFeat: Accelerated Features for Lightweight Image Matching" width="400px" align="left" />
250 |     </a>
251 |     <a href="https://arxiv.org/abs/2404.19174" title="XFeat: Accelerated Features for Lightweight Image Matching">
252 |         <strong>XFeat: Accelerated Features for Lightweight Image Matching</strong>
253 |     </a>
254 |     <br/>
255 |     Guilherme Potje, Felipe Cadar, Andre Araujo, Renato Martins, Erickson R. Nascimento
256 |     <br/>
257 |     [<a href="https://arxiv.org/abs/2404.19174">paper</a>] [<a href="https://github.com/verlab/accelerated_features">code</a>] [<a href="https://youtu.be/RamC70IkZuI">video</a>] [<a href="https://huggingface.co/spaces/qubvel-hf/xfeat">demo</a>] [<a href="https://colab.research.google.com/github/verlab/accelerated_features/blob/main/notebooks/xfeat_matching.ipynb">colab</a>]
258 |     <br/>
259 |     <strong>Topic:</strong> Low-level vision
260 |     <br/>
261 |     <strong>Session:</strong> Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #245
262 | </p>
263 | <br/>
264 | <br/>
265 | 
266 | 
267 | <p align="left">
268 |     <a href="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/038bef8f-a6df-440d-9ebc-b58f69beb338" title="Robust Image Denoising through Adversarial Frequency Mixup">
269 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/03cc753c-f875-479e-bca2-e0375e9929a6" alt="Robust Image Denoising through Adversarial Frequency Mixup" width="400px" align="left" />
270 |     </a>
271 |     <a href="https://openaccess.thecvf.com/content/CVPR2024/html/Ryou_Robust_Image_Denoising_through_Adversarial_Frequency_Mixup_CVPR_2024_paper.html" title="Robust Image Denoising through Adversarial Frequency Mixup">
272 |         <strong>Robust Image Denoising through Adversarial Frequency Mixup</strong>
273 |     </a>
274 |     <br/>
275 |     Donghun Ryou, Inju Ha, Hyewon Yoo, Dongwan Kim, Bohyung Han
276 |     <br/>
277 |     [<a href="https://openaccess.thecvf.com/content/CVPR2024/html/Ryou_Robust_Image_Denoising_through_Adversarial_Frequency_Mixup_CVPR_2024_paper.html">paper</a>] [<a href="https://github.com/dhryougit/AFM">code</a>] [<a href="https://youtu.be/zQ0pwFSk7uo">video</a>]  
278 |     <br/>
279 |     <strong>Topic:</strong> Low-level vision
280 |     <br/>
281 |     <strong>Session:</strong> Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #250
282 | </p>
283 | <br/>
284 | <br/>
285 | 
286 | ### multi-modal learning
287 | 
288 | <p align="left">
289 |     <a href="https://arxiv.org/abs/2310.03744" title="Improved Baselines with Visual Instruction Tuning">
290 |         <strong>🔥 Improved Baselines with Visual Instruction Tuning</strong>
291 |     </a>
292 |     <br/>
293 |     Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee
294 |     <br/>
295 |     [<a href="https://arxiv.org/abs/2310.03744">paper</a>] [<a href="https://github.com/LLaVA-VL/LLaVA-NeXT">code</a>]   
296 |     <br/>
297 |     <strong>Topic:</strong> Multi-modal learning
298 |     <br/>
299 |     <strong>Session:</strong> Fri 21 Jun 8 p.m. EDT — 9:30 p.m. EDT #209
300 | </p>
301 | <br/>
302 | 
303 | ### recognition: categorization, detection, retrieval
304 | 
305 | <p align="left">
306 |     <a href="https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/31301.png?t=1717420504.9897285" title="DETRs Beat YOLOs on Real-time Object Detection">
307 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/3732bfdd-4be4-45cd-8353-e056094f9fec" alt="DETRs Beat YOLOs on Real-time Object Detection" width="400px" align="left" />
308 |     </a>
309 |     <a href="https://arxiv.org/abs/2304.08069" title="DETRs Beat YOLOs on Real-time Object Detection">
310 |         <strong>DETRs Beat YOLOs on Real-time Object Detection</strong>
311 |     </a>
312 |     <br/>
313 |     Yian Zhao, Wenyu Lv, Shangliang Xu, Jinman Wei, Guanzhong Wang, Qingqing Dang, Yi Liu, Jie Chen
314 |     <br/>
315 |     [<a href="https://arxiv.org/abs/2304.08069">paper</a>] [<a href="https://github.com/lyuwenyu/RT-DETR">code</a>] [<a href="https://www.youtube.com/watch?v=UOc0qMSX4Ac">video</a>]  
316 |     <br/>
317 |     <strong>Topic:</strong> Recognition: Categorization, detection, retrieval
318 |     <br/>
319 |     <strong>Session:</strong> Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #229
320 | </p>
321 | <br/>
322 | <br/>
323 | 
324 | 
325 | <p align="left">
326 |     <a href="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/f9023a28-aca5-4965-a194-984c62348dc0" title="YOLO-World: Real-Time Open-Vocabulary Object Detection">
327 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/b9f0bb1e-91d4-4ea3-83c6-ee0817afc1bf" alt="YOLO-World: Real-Time Open-Vocabulary Object Detection" width="400px" align="left" />
328 |     </a>
329 |     <a href="https://arxiv.org/abs/2401.17270" title="YOLO-World: Real-Time Open-Vocabulary Object Detection">
330 |         <strong>YOLO-World: Real-Time Open-Vocabulary Object Detection</strong>
331 |     </a>
332 |     <br/>
333 |     Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xinggang Wang, Ying Shan
334 |     <br/>
335 |     [<a href="https://arxiv.org/abs/2401.17270">paper</a>] [<a href="https://github.com/AILab-CVC/YOLO-World">code</a>] [<a href="https://youtu.be/X7gKBGVz4vs">video</a>] [<a href="https://huggingface.co/spaces/SkalskiP/YOLO-World">demo</a>] [<a href="https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/zero-shot-object-detection-with-yolo-world.ipynb">colab</a>]
336 |     <br/>
337 |     <strong>Topic:</strong> Recognition: Categorization, detection, retrieval
338 |     <br/>
339 |     <strong>Session:</strong> Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #223
340 | </p>
341 | <br/>
342 | <br/>
343 | 
344 | 
345 | <p align="left">
346 |     <a href="https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/31732.png?t=1717298372.5822952" title="Object Recognition as Next Token Prediction">
347 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/bcdc1aba-8ecb-4e63-a8a7-d287ca728bbb" alt="Object Recognition as Next Token Prediction" width="400px" align="left" />
348 |     </a>
349 |     <a href="https://arxiv.org/abs/2312.02142" title="Object Recognition as Next Token Prediction">
350 |         <strong>🔥 Object Recognition as Next Token Prediction</strong>
351 |     </a>
352 |     <br/>
353 |     Kaiyu Yue, Bor-Chun Chen, Jonas Geiping, Hengduo Li, Tom Goldstein, Ser-Nam Lim
354 |     <br/>
355 |     [<a href="https://arxiv.org/abs/2312.02142">paper</a>] [<a href="https://github.com/kaiyuyue/nxtp">code</a>] [<a href="https://youtu.be/xeI8dZIpoco">video</a>]  [<a href="https://colab.research.google.com/drive/1pJX37LP5xGLDzD3H7ztTmpq1RrIBeWX3?usp=sharing">colab</a>]
356 |     <br/>
357 |     <strong>Topic:</strong> Recognition: Categorization, detection, retrieval
358 |     <br/>
359 |     <strong>Session:</strong> Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #199
360 | </p>
361 | <br/>
362 | <br/>
363 | 
364 | ### segmentation, grouping and shape analysis
365 | 
366 | <p align="left">
367 |     <a href="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/62d34981-73d6-49b2-8058-46ec99bac94d" title="RobustSAM: Segment Anything Robustly on Degraded Images">
368 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/ee15d3bc-c391-44f9-b35b-24af714ef119" alt="RobustSAM: Segment Anything Robustly on Degraded Images" width="400px" align="left" />
369 |     </a>
370 |     <a href="https://openaccess.thecvf.com/content/CVPR2024/html/Chen_RobustSAM_Segment_Anything_Robustly_on_Degraded_Images_CVPR_2024_paper.html" title="RobustSAM: Segment Anything Robustly on Degraded Images">
371 |         <strong>🔥 RobustSAM: Segment Anything Robustly on Degraded Images</strong>
372 |     </a>
373 |     <br/>
374 |     Wei-Ting Chen, Yu-Jiet Vong, Sy-Yen Kuo, Sizhou Ma, Jian Wang
375 |     <br/>
376 |     [<a href="https://openaccess.thecvf.com/content/CVPR2024/html/Chen_RobustSAM_Segment_Anything_Robustly_on_Degraded_Images_CVPR_2024_paper.html">paper</a>]  [<a href="https://www.youtube.com/watch?v=Awukqkbs6zM">video</a>]  
377 |     <br/>
378 |     <strong>Topic:</strong> Segmentation, grouping and shape analysis
379 |     <br/>
380 |     <strong>Session:</strong> Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #378
381 | </p>
382 | <br/>
383 | <br/>
384 | 
385 | 
386 | <p align="left">
387 |     <a href="https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/30253.png?t=1716781257.513028" title="Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation">
388 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/0c43b789-f2e8-4ff9-ae46-b5a87de1b921" alt="Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation" width="400px" align="left" />
389 |     </a>
390 |     <a href="https://openaccess.thecvf.com/content/CVPR2024/html/Zhang_Frozen_CLIP_A_Strong_Backbone_for_Weakly_Supervised_Semantic_Segmentation_CVPR_2024_paper.html" title="Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation">
391 |         <strong>🔥 Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation</strong>
392 |     </a>
393 |     <br/>
394 |     Bingfeng Zhang, Siyue Yu, Yunchao Wei, Yao Zhao, Jimin Xiao
395 |     <br/>
396 |     [<a href="https://openaccess.thecvf.com/content/CVPR2024/html/Zhang_Frozen_CLIP_A_Strong_Backbone_for_Weakly_Supervised_Semantic_Segmentation_CVPR_2024_paper.html">paper</a>] [<a href="https://github.com/zbf1991/WeCLIP">code</a>] [<a href="https://youtu.be/Lh489nTm_M0">video</a>]  
397 |     <br/>
398 |     <strong>Topic:</strong> Segmentation, grouping and shape analysis
399 |     <br/>
400 |     <strong>Session:</strong> Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #351
401 | </p>
402 | <br/>
403 | <br/>
404 | 
405 | 
406 | <p align="left">
407 |     <a href="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/2f2bf794-3981-48c8-992d-04dd32ee9ced" title="Semantic-aware SAM for Point-Prompted Instance Segmentation">
408 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/f1ed2755-1df1-45fe-810b-5fc98b4b52e1" alt="Semantic-aware SAM for Point-Prompted Instance Segmentation" width="400px" align="left" />
409 |     </a>
410 |     <a href="https://arxiv.org/abs/2312.15895" title="Semantic-aware SAM for Point-Prompted Instance Segmentation">
411 |         <strong>🔥 Semantic-aware SAM for Point-Prompted Instance Segmentation</strong>
412 |     </a>
413 |     <br/>
414 |     Zhaoyang Wei, Pengfei Chen, Xuehui Yu, Guorong Li, Jianbin Jiao, Zhenjun Han
415 |     <br/>
416 |     [<a href="https://arxiv.org/abs/2312.15895">paper</a>] [<a href="https://github.com/zhaoyangwei123/SAPNet">code</a>] [<a href="https://youtu.be/42-tJFmT7Ao">video</a>]  
417 |     <br/>
418 |     <strong>Topic:</strong> Segmentation, grouping and shape analysis
419 |     <br/>
420 |     <strong>Session:</strong> Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #331
421 | </p>
422 | <br/>
423 | <br/>
424 | 
425 | 
426 | <p align="left">
427 |     <a href="https://arxiv.org/abs/2403.15789" title="In-Context Matting">
428 |         <strong>🔥 In-Context Matting</strong>
429 |     </a>
430 |     <br/>
431 |     He Guo, Zixuan Ye, Zhiguo Cao, Hao Lu
432 |     <br/>
433 |     [<a href="https://arxiv.org/abs/2403.15789">paper</a>] [<a href="https://github.com/tiny-smart/in-context-matting">code</a>]   
434 |     <br/>
435 |     <strong>Topic:</strong> Segmentation, grouping and shape analysis
436 |     <br/>
437 |     <strong>Session:</strong> Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #343
438 | </p>
439 | <br/>
440 | 
441 | 
442 | <p align="left">
443 |     <a href="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/bfe79038-706d-491b-ac99-083f421dc5ec" title="General Object Foundation Model for Images and Videos at Scale">
444 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/4f0ed38d-28aa-4766-b290-940cbc6711d6" alt="General Object Foundation Model for Images and Videos at Scale" width="400px" align="left" />
445 |     </a>
446 |     <a href="https://arxiv.org/abs/2312.09158" title="General Object Foundation Model for Images and Videos at Scale">
447 |         <strong>🔥 General Object Foundation Model for Images and Videos at Scale</strong>
448 |     </a>
449 |     <br/>
450 |     Junfeng Wu, Yi Jiang, Qihao Liu, Zehuan Yuan, Xiang Bai, Song Bai
451 |     <br/>
452 |     [<a href="https://arxiv.org/abs/2312.09158">paper</a>] [<a href="https://github.com/FoundationVision/GLEE">code</a>] [<a href="https://www.youtube.com/watch?v=PSVhfTPx0GQ">video</a>]  
453 |     <br/>
454 |     <strong>Topic:</strong> Segmentation, grouping and shape analysis
455 |     <br/>
456 |     <strong>Session:</strong> Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #350
457 | </p>
458 | <br/>
459 | <br/>
460 | 
461 | ### self-supervised or unsupervised representation learning
462 | 
463 | <p align="left">
464 |     <a href="https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/30014.png?t=1717339970.9614518" title="InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks">
465 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/9a03d726-0459-48f1-9f1e-5f12c7382084" alt="InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks" width="400px" align="left" />
466 |     </a>
467 |     <a href="https://arxiv.org/abs/2312.14238" title="InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks">
468 |         <strong>🔥 InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks</strong>
469 |     </a>
470 |     <br/>
471 |     Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai
472 |     <br/>
473 |     [<a href="https://arxiv.org/abs/2312.14238">paper</a>] [<a href="https://github.com/OpenGVLab/InternVL">code</a>]  [<a href="https://huggingface.co/spaces/OpenGVLab/InternVL">demo</a>] 
474 |     <br/>
475 |     <strong>Topic:</strong> Self-supervised or unsupervised representation learning
476 |     <br/>
477 |     <strong>Session:</strong> Fri 21 Jun 8 p.m. EDT — 9:30 p.m. EDT #412
478 | </p>
479 | <br/>
480 | <br/>
481 | 
482 | ### video: low-level analysis, motion, and tracking
483 | 
484 | <p align="left">
485 |     <a href="https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/29590.png?t=1717456006.3308516" title="Matching Anything by Segmenting Anything">
486 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/bb451f47-ba3e-4e34-a7c0-3410b64d9339" alt="Matching Anything by Segmenting Anything" width="400px" align="left" />
487 |     </a>
488 |     <a href="https://arxiv.org/abs/2406.04221" title="Matching Anything by Segmenting Anything">
489 |         <strong>🔥 Matching Anything by Segmenting Anything</strong>
490 |     </a>
491 |     <br/>
492 |     Siyuan Li, Lei Ke, Martin Danelljan, Luigi Piccinelli, Mattia Segu, Luc Van Gool, Fisher Yu
493 |     <br/>
494 |     [<a href="https://arxiv.org/abs/2406.04221">paper</a>] [<a href="https://github.com/siyuanliii/masa">code</a>] [<a href="https://youtu.be/KDQVujKAWFQ">video</a>]  
495 |     <br/>
496 |     <strong>Topic:</strong> Video: Low-level analysis, motion, and tracking
497 |     <br/>
498 |     <strong>Session:</strong> Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #421
499 | </p>
500 | <br/>
501 | <br/>
502 | 
503 | 
504 | <p align="left">
505 |     <a href="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/9711186c-b05b-472d-b095-d98dbe386171" title="DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction">
506 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/18caf2db-5dab-4251-9eeb-e2397c67eb3f" alt="DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction" width="400px" align="left" />
507 |     </a>
508 |     <a href="https://arxiv.org/abs/2403.02075" title="DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction">
509 |         <strong>DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction</strong>
510 |     </a>
511 |     <br/>
512 |     Weiyi Lv, Yuhang Huang, Ning Zhang, Ruei-Sung Lin, Mei Han, Dan Zeng
513 |     <br/>
514 |     [<a href="https://arxiv.org/abs/2403.02075">paper</a>] [<a href="https://github.com/Kroery/DiffMOT">code</a>]   
515 |     <br/>
516 |     <strong>Topic:</strong> Video: Low-level analysis, motion, and tracking
517 |     <br/>
518 |     <strong>Session:</strong> Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #455
519 | </p>
520 | <br/>
521 | <br/>
522 | 
523 | ### vision, language, and reasoning
524 | 
525 | <p align="left">
526 |     <a href="https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/31492.png?t=1717327133.6073072" title="Alpha-CLIP: A CLIP Model Focusing on Wherever You Want">
527 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/4480d88a-7f8f-48c2-bcb0-bde3b694dfd8" alt="Alpha-CLIP: A CLIP Model Focusing on Wherever You Want" width="400px" align="left" />
528 |     </a>
529 |     <a href="https://arxiv.org/abs/2312.03818" title="Alpha-CLIP: A CLIP Model Focusing on Wherever You Want">
530 |         <strong>Alpha-CLIP: A CLIP Model Focusing on Wherever You Want</strong>
531 |     </a>
532 |     <br/>
533 |     Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang
534 |     <br/>
535 |     [<a href="https://arxiv.org/abs/2312.03818">paper</a>] [<a href="https://github.com/SunzeY/AlphaCLIP">code</a>] [<a href="https://youtu.be/QCEIKPZpZz0">video</a>] [<a href="https://huggingface.co/spaces/Zery/Alpha-CLIP_LLaVA-1.5">demo</a>] 
536 |     <br/>
537 |     <strong>Topic:</strong> Vision, language, and reasoning
538 |     <br/>
539 |     <strong>Session:</strong> Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #327
540 | </p>
541 | <br/>
542 | <br/>
543 | 
544 | 
545 | <p align="left">
546 |     <a href="https://arxiv.org/abs/2401.06209" title="Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs">
547 |         <strong>🔥 Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs</strong>
548 |     </a>
549 |     <br/>
550 |     Shengbang Tong, Zhuang Liu, Yuexiang Zhai, Yi Ma, Yann LeCun, Saining Xie
551 |     <br/>
552 |     [<a href="https://arxiv.org/abs/2401.06209">paper</a>] [<a href="https://github.com/tsb0601/MMVP">code</a>]   
553 |     <br/>
554 |     <strong>Topic:</strong> Vision, language, and reasoning
555 |     <br/>
556 |     <strong>Session:</strong> Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #390
557 | </p>
558 | <br/>
559 | 
560 | 
561 | <p align="left">
562 |     <a href="https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/30109.png?t=1717509456.89997" title="LISA: Reasoning Segmentation via Large Language Model">
563 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/fc2699d9-7bd2-4c3a-8e6c-4961505cc802" alt="LISA: Reasoning Segmentation via Large Language Model" width="400px" align="left" />
564 |     </a>
565 |     <a href="https://arxiv.org/abs/2308.00692" title="LISA: Reasoning Segmentation via Large Language Model">
566 |         <strong>🔥 LISA: Reasoning Segmentation via Large Language Model</strong>
567 |     </a>
568 |     <br/>
569 |     Xin Lai, Zhuotao Tian, Yukang Chen, Yanwei Li, Yuhui Yuan, Shu Liu, Jiaya Jia
570 |     <br/>
571 |     [<a href="https://arxiv.org/abs/2308.00692">paper</a>] [<a href="https://github.com/dvlab-research/LISA">code</a>]  [<a href="http://103.170.5.190:7870/">demo</a>] 
572 |     <br/>
573 |     <strong>Topic:</strong> Vision, language, and reasoning
574 |     <br/>
575 |     <strong>Session:</strong> Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #413
576 | </p>
577 | <br/>
578 | <br/>
579 | 
580 | 
581 | <p align="left">
582 |     <a href="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/53e03a08-4dd9-451a-975e-e3654fa5bc71" title="ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts">
583 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/6d1536ae-3f96-49d9-a05f-9648b925cdb5" alt="ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts" width="400px" align="left" />
584 |     </a>
585 |     <a href="https://arxiv.org/abs/2312.00784" title="ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts">
586 |         <strong>ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts</strong>
587 |     </a>
588 |     <br/>
589 |     Mu Cai, Haotian Liu, Dennis Park, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Yong Jae Lee
590 |     <br/>
591 |     [<a href="https://arxiv.org/abs/2312.00784">paper</a>] [<a href="https://github.com/WisconsinAIVision/ViP-LLaVA">code</a>] [<a href="https://youtu.be/j_l1bRQouzc">video</a>] [<a href="https://pages.cs.wisc.edu/~mucai/vip-llava.html">demo</a>] 
592 |     <br/>
593 |     <strong>Topic:</strong> Vision, language, and reasoning
594 |     <br/>
595 |     <strong>Session:</strong> Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #317
596 | </p>
597 | <br/>
598 | <br/>
599 | 
600 | 
601 | <p align="left">
602 |     <a href="https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/31040.png?t=1718300473.5736258" title="MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI">
603 |         <img src="https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/8b9f69b7-3384-40e6-828f-90bf7b43e345" alt="MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI" width="400px" align="left" />
604 |     </a>
605 |     <a href="https://arxiv.org/abs/2311.16502" title="MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI">
606 |         <strong>🔥 MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI</strong>
607 |     </a>
608 |     <br/>
609 |     Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen
610 |     <br/>
611 |     [<a href="https://arxiv.org/abs/2311.16502">paper</a>]    
612 |     <br/>
613 |     <strong>Topic:</strong> Vision, language, and reasoning
614 |     <br/>
615 |     <strong>Session:</strong> Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #382
616 | </p>
617 | <br/>
618 | <br/>
619 | 
620 | <!--- AUTOGENERATED_PAPERS_LIST -->
621 | 
622 | ## 🦸 contribution
623 | 
624 | We would love your help in making this repository even better! If you know of an amazing
625 | paper that isn't listed here, or if you have any suggestions for improvement, feel free
626 | to open an
627 | [issue](https://github.com/SkalskiP/top-cvpr-2024-papers/issues)
628 | or submit a
629 | [pull request](https://github.com/SkalskiP/top-cvpr-2024-papers/pulls).
630 | 


--------------------------------------------------------------------------------
/automation/data.csv:
--------------------------------------------------------------------------------
 1 | "title","authors","paper","code","huggingface","colab","youtube","topic","poster","compressed_poster","session","is_highlighted"
 2 | "DETRs Beat YOLOs on Real-time Object Detection","Yian Zhao, Wenyu Lv, Shangliang Xu, Jinman Wei, Guanzhong Wang, Qingqing Dang, Yi Liu, Jie Chen",https://arxiv.org/abs/2304.08069,https://github.com/lyuwenyu/RT-DETR,,,https://www.youtube.com/watch?v=UOc0qMSX4Ac,"Recognition: Categorization, detection, retrieval",https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/31301.png?t=1717420504.9897285,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/3732bfdd-4be4-45cd-8353-e056094f9fec,"Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #229",False
 3 | "Alpha-CLIP: A CLIP Model Focusing on Wherever You Want","Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang",https://arxiv.org/abs/2312.03818,https://github.com/SunzeY/AlphaCLIP,https://huggingface.co/spaces/Zery/Alpha-CLIP_LLaVA-1.5,,https://youtu.be/QCEIKPZpZz0,"Vision, language, and reasoning",https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/31492.png?t=1717327133.6073072,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/4480d88a-7f8f-48c2-bcb0-bde3b694dfd8,"Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #327",False
 4 | "YOLO-World: Real-Time Open-Vocabulary Object Detection","Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xinggang Wang, Ying Shan",https://arxiv.org/abs/2401.17270,https://github.com/AILab-CVC/YOLO-World,https://huggingface.co/spaces/SkalskiP/YOLO-World,https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/zero-shot-object-detection-with-yolo-world.ipynb,https://youtu.be/X7gKBGVz4vs,"Recognition: Categorization, detection, retrieval",https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/f9023a28-aca5-4965-a194-984c62348dc0,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/b9f0bb1e-91d4-4ea3-83c6-ee0817afc1bf,"Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #223",False
 5 | "SpatialTracker: Tracking Any 2D Pixels in 3D Space","Yuxi Xiao, Qianqian Wang, Shangzhan Zhang, Nan Xue, Sida Peng, Yujun Shen, Xiaowei Zhou",https://arxiv.org/abs/2404.04319,https://github.com/henry123-boy/SpaTracker,,,,"3D from multi-view and sensors",https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/31668.png?t=1717417393.7589533,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/56498f78-2ca0-46ee-9231-6aa1806b6ebc,"Fri 21 Jun 1:30 p.m. EDT — 3 p.m. EDT #84",True
 6 | "EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything","Yunyang Xiong, Bala Varadarajan, Lemeng Wu, Xiaoyu Xiang, Fanyi Xiao, Chenchen Zhu, Xiaoliang Dai, Dilin Wang, Fei Sun, Forrest Iandola, Raghuraman Krishnamoorthi, Vikas Chandra",https://arxiv.org/abs/2312.00863,https://github.com/yformer/EfficientSAM,https://huggingface.co/spaces/SkalskiP/EfficientSAM,,,"Efficient and scalable vision",https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/e95eac04-5a45-402c-885d-14395879abd3,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/e95eac04-5a45-402c-885d-14395879abd3,"Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #144",True
 7 | "DemoFusion: Democratising High-Resolution Image Generation With No $$$","Ruoyi Du, Dongliang Chang, Timothy Hospedales, Yi-Zhe Song, Zhanyu Ma",https://arxiv.org/abs/2311.16973,https://github.com/PRIS-CV/DemoFusion,https://huggingface.co/spaces/radames/Enhance-This-DemoFusion-SDXL,https://colab.research.google.com/github/camenduru/DemoFusion-colab/blob/main/DemoFusion_colab.ipynb,,"Image and video synthesis and generation",,,"Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #132",False
 8 | "Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs","Shengbang Tong, Zhuang Liu, Yuexiang Zhai, Yi Ma, Yann LeCun, Saining Xie",https://arxiv.org/abs/2401.06209,https://github.com/tsb0601/MMVP,,,,"Vision, language, and reasoning",,,"Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #390",True
 9 | "ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models","Lukas Höllein, Aljaž Božič, Norman Müller, David Novotny, Hung-Yu Tseng, Christian Richardt, Michael Zollhöfer, Matthias Nießner",https://arxiv.org/abs/2403.01807,https://github.com/facebookresearch/ViewDiff,,,https://youtu.be/SdjoCqHzMMk,"3D from multi-view and sensors",https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/31616.png?t=1716470830.0209699,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/0453bf88-9d54-4ecf-8a45-01af0f604faf,"Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #20",False
10 | "LISA: Reasoning Segmentation via Large Language Model","Xin Lai, Zhuotao Tian, Yukang Chen, Yanwei Li, Yuhui Yuan, Shu Liu, Jiaya Jia",https://arxiv.org/abs/2308.00692,https://github.com/dvlab-research/LISA,http://103.170.5.190:7870/,,,"Vision, language, and reasoning",https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/30109.png?t=1717509456.89997,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/fc2699d9-7bd2-4c3a-8e6c-4961505cc802,"Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #413",True
11 | "Matching Anything by Segmenting Anything","Siyuan Li, Lei Ke, Martin Danelljan, Luigi Piccinelli, Mattia Segu, Luc Van Gool, Fisher Yu",https://arxiv.org/abs/2406.04221,https://github.com/siyuanliii/masa,,,https://youtu.be/KDQVujKAWFQ,"Video: Low-level analysis, motion, and tracking",https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/29590.png?t=1717456006.3308516,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/bb451f47-ba3e-4e34-a7c0-3410b64d9339,"Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #421",True
12 | "DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction","Weiyi Lv, Yuhang Huang, Ning Zhang, Ruei-Sung Lin, Mei Han, Dan Zeng",https://arxiv.org/abs/2403.02075,https://github.com/Kroery/DiffMOT,,,,"Video: Low-level analysis, motion, and tracking",https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/9711186c-b05b-472d-b095-d98dbe386171,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/18caf2db-5dab-4251-9eeb-e2397c67eb3f,"Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #455",False
13 | "RobustSAM: Segment Anything Robustly on Degraded Images","Wei-Ting Chen, Yu-Jiet Vong, Sy-Yen Kuo, Sizhou Ma, Jian Wang",https://openaccess.thecvf.com/content/CVPR2024/html/Chen_RobustSAM_Segment_Anything_Robustly_on_Degraded_Images_CVPR_2024_paper.html,,,,https://www.youtube.com/watch?v=Awukqkbs6zM,"Segmentation, grouping and shape analysis",https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/62d34981-73d6-49b2-8058-46ec99bac94d,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/ee15d3bc-c391-44f9-b35b-24af714ef119,"Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #378",True
14 | "Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation","Bingfeng Zhang, Siyue Yu, Yunchao Wei, Yao Zhao, Jimin Xiao",https://openaccess.thecvf.com/content/CVPR2024/html/Zhang_Frozen_CLIP_A_Strong_Backbone_for_Weakly_Supervised_Semantic_Segmentation_CVPR_2024_paper.html,https://github.com/zbf1991/WeCLIP,,,https://youtu.be/Lh489nTm_M0,"Segmentation, grouping and shape analysis",https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/30253.png?t=1716781257.513028,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/0c43b789-f2e8-4ff9-ae46-b5a87de1b921,"Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #351",True
15 | "ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts","Mu Cai, Haotian Liu, Dennis Park, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Yong Jae Lee",https://arxiv.org/abs/2312.00784,https://github.com/WisconsinAIVision/ViP-LLaVA,https://pages.cs.wisc.edu/~mucai/vip-llava.html,,https://youtu.be/j_l1bRQouzc,"Vision, language, and reasoning",https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/53e03a08-4dd9-451a-975e-e3654fa5bc71,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/6d1536ae-3f96-49d9-a05f-9648b925cdb5,"Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #317",False
16 | "DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing","Yujun Shi, Chuhui Xue, Jun Hao Liew, Jiachun Pan, Hanshu Yan, Wenqing Zhang, Vincent Y. F. Tan, Song Bai",https://arxiv.org/abs/2306.14435,https://github.com/Yujun-Shi/DragDiffusion,,,https://youtu.be/rysOFTpDBhc,"Image and video synthesis and generation",https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/b0833f6b-6924-4f28-b409-ae85aaaa4dd6,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/2a0219f5-9f1e-47e1-a968-d4d98154feb2,"Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #392",True
17 | "OmniGlue: Generalizable Feature Matching with Foundation Model Guidance","Hanwen Jiang, Arjun Karpur, Bingyi Cao, Qixing Huang, Andre Araujo",https://arxiv.org/abs/2405.12979,https://github.com/google-research/omniglue,https://huggingface.co/spaces/qubvel-hf/omniglue,,,"3D from multi-view and sensors",,,"Fri 21 Jun 1:30 p.m. EDT — 3 p.m. EDT #32",False
18 | "DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks","Jiaxin Zhang, Dezhi Peng, Chongyu Liu, Peirong Zhang, Lianwen Jin",https://arxiv.org/abs/2405.04408,https://github.com/ZZZHANG-jx/DocRes,https://huggingface.co/spaces/qubvel-hf/documents-restoration,,,"Document analysis and understanding",,,"Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #101",False
19 | "XFeat: Accelerated Features for Lightweight Image Matching","Guilherme Potje, Felipe Cadar, Andre Araujo, Renato Martins, Erickson R. Nascimento",https://arxiv.org/abs/2404.19174,https://github.com/verlab/accelerated_features,https://huggingface.co/spaces/qubvel-hf/xfeat,https://colab.research.google.com/github/verlab/accelerated_features/blob/main/notebooks/xfeat_matching.ipynb,https://youtu.be/RamC70IkZuI,"Low-level vision",https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/8eb6b4f0-4ae6-4615-9921-f73fa2aa3766,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/50b6d16f-c2d8-49a4-8c15-a31d6f9a3c44,"Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #245",False
20 | "Improved Baselines with Visual Instruction Tuning","Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee",https://arxiv.org/abs/2310.03744,https://github.com/LLaVA-VL/LLaVA-NeXT,,,,"Multi-modal learning",,,"Fri 21 Jun 8 p.m. EDT — 9:30 p.m. EDT #209",True
21 | "Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks","Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan",https://arxiv.org/pdf/2311.06242,,https://huggingface.co/spaces/gokaygokay/Florence-2,https://youtu.be/cOlyA00K1ec,https://youtu.be/cOlyA00K1ec,"Deep learning architectures and techniques",https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/30529.png?t=1717455193.7819567,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/4aaf3f87-cc62-4fa3-af99-c8c1c83c0069,"Wed 19 Jun 8 p.m. EDT — 9:30 p.m. EDT #102",True
22 | "Semantic-aware SAM for Point-Prompted Instance Segmentation","Zhaoyang Wei, Pengfei Chen, Xuehui Yu, Guorong Li, Jianbin Jiao, Zhenjun Han",https://arxiv.org/abs/2312.15895,https://github.com/zhaoyangwei123/SAPNet,,,https://youtu.be/42-tJFmT7Ao,"Segmentation, grouping and shape analysis",https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/2f2bf794-3981-48c8-992d-04dd32ee9ced,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/f1ed2755-1df1-45fe-810b-5fc98b4b52e1,"Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #331",True
23 | "In-Context Matting","He Guo, Zixuan Ye, Zhiguo Cao, Hao Lu",https://arxiv.org/abs/2403.15789,https://github.com/tiny-smart/in-context-matting,,,,"Segmentation, grouping and shape analysis",,,"Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #343",True
24 | "Robust Image Denoising through Adversarial Frequency Mixup","Donghun Ryou, Inju Ha, Hyewon Yoo, Dongwan Kim, Bohyung Han",https://openaccess.thecvf.com/content/CVPR2024/html/Ryou_Robust_Image_Denoising_through_Adversarial_Frequency_Mixup_CVPR_2024_paper.html,https://github.com/dhryougit/AFM,,,https://youtu.be/zQ0pwFSk7uo,"Low-level vision",https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/038bef8f-a6df-440d-9ebc-b58f69beb338,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/03cc753c-f875-479e-bca2-e0375e9929a6,"Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #250",False
25 | "General Object Foundation Model for Images and Videos at Scale","Junfeng Wu, Yi Jiang, Qihao Liu, Zehuan Yuan, Xiang Bai, Song Bai",https://arxiv.org/abs/2312.09158,https://github.com/FoundationVision/GLEE,,,https://www.youtube.com/watch?v=PSVhfTPx0GQ,"Segmentation, grouping and shape analysis",https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/bfe79038-706d-491b-ac99-083f421dc5ec,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/4f0ed38d-28aa-4766-b290-940cbc6711d6,"Wed 19 Jun 1:30 p.m. EDT — 3 p.m. EDT #350",True
26 | "MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training","Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri, Raviteja Vemulapalli, Oncel Tuzel",https://arxiv.org/abs/2311.17049,https://github.com/apple/ml-mobileclip,https://huggingface.co/spaces/Xenova/webgpu-mobileclip,,,"Efficient and scalable vision",https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/30022.png?t=1718402790.003817,,"Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #130",False
27 | "Object Recognition as Next Token Prediction","Kaiyu Yue, Bor-Chun Chen, Jonas Geiping, Hengduo Li, Tom Goldstein, Ser-Nam Lim",https://arxiv.org/abs/2312.02142,https://github.com/kaiyuyue/nxtp,,https://colab.research.google.com/drive/1pJX37LP5xGLDzD3H7ztTmpq1RrIBeWX3?usp=sharing,https://youtu.be/xeI8dZIpoco,"Recognition: Categorization, detection, retrieval",https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/31732.png?t=1717298372.5822952,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/bcdc1aba-8ecb-4e63-a8a7-d287ca728bbb,"Thu 20 Jun 8 p.m. EDT — 9:30 p.m. EDT #199",True
28 | "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI","Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen",https://arxiv.org/abs/2311.16502,,,,,"Vision, language, and reasoning",https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/31040.png?t=1718300473.5736258,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/8b9f69b7-3384-40e6-828f-90bf7b43e345,"Thu 20 Jun 1:30 p.m. EDT — 3 p.m. EDT #382",True
29 | "InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks","Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai",https://arxiv.org/abs/2312.14238,https://github.com/OpenGVLab/InternVL,https://huggingface.co/spaces/OpenGVLab/InternVL,,,"Self-supervised or unsupervised representation learning",https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/30014.png?t=1717339970.9614518,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/9a03d726-0459-48f1-9f1e-5f12c7382084,"Fri 21 Jun 8 p.m. EDT — 9:30 p.m. EDT #412",True
30 | "Describing Differences in Image Sets with Natural Language","Lisa Dunlap, Yuhui Zhang, Xiaohan Wang, Ruiqi Zhong, Trevor Darrell, Jacob Steinhardt, Joseph E. Gonzalez, Serena Yeung-Levy",https://arxiv.org/abs/2312.02974,https://github.com/Understanding-Visual-Datasets/VisDiff,,,,"Explainable computer vision",https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/6d87318b-57c1-40c7-9de6-5cb47145e119,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/6d87318b-57c1-40c7-9de6-5cb47145e119,"Fri 21 Jun 8 p.m. EDT — 9:30 p.m. EDT #115",True
31 | "Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models","Daniel Geng, Inbum Park, Andrew Owens",https://arxiv.org/abs/2311.17919,https://github.com/dangeng/visual_anagrams,,https://colab.research.google.com/github/dangeng/visual_anagrams/blob/main/notebooks/colab_demo_free_tier.ipynb,,"Image and video synthesis and generation",https://cvpr.thecvf.com/media/PosterPDFs/CVPR%202024/30657.png?t=1717473392.6694562,https://github.com/SkalskiP/top-cvpr-2024-papers/assets/26109316/709e3619-25d9-409e-b6ad-ca082611fe09,"Fri 21 Jun 8 p.m. EDT — 9:30 p.m. EDT #118",True


--------------------------------------------------------------------------------
/automation/generate.py:
--------------------------------------------------------------------------------
  1 | import argparse
  2 | from typing import List
  3 | 
  4 | import pandas as pd
  5 | 
  6 | from pandas.core.series import Series
  7 | 
  8 | TITLE_COLUMN_NAME = "title"
  9 | AUTHORS_COLUMN_NAME = "authors"
 10 | TOPIC_COLUMN_NAME = "topic"
 11 | SESSION_COLUMN_NAME = "session"
 12 | POSTER_COLUMN_NAME = "poster"
 13 | COMPRESSED_POSTER_COLUMN_NAME = "compressed_poster"
 14 | PAPER_COLUMN_NAME = "paper"
 15 | CODE_COLUMN_NAME = "code"
 16 | HUGGINGFACE_SPACE_COLUMN_NAME = "huggingface"
 17 | YOUTUBE_COLUMN_NAME = "youtube"
 18 | COLAB_COLUMN_NAME = "colab"
 19 | IS_HIGHLIGHTED_COLUMN_NAME = "is_highlighted"
 20 | 
 21 | AUTOGENERATED_PAPERS_LIST_TOKEN = "<!--- AUTOGENERATED_PAPERS_LIST -->"
 22 | 
 23 | WARNING_HEADER = [
 24 |     "<!---",
 25 |     "   WARNING: DO NOT EDIT THIS LIST MANUALLY. IT IS AUTOMATICALLY GENERATED.",
 26 |     "   HEAD OVER TO https://github.com/SkalskiP/top-cvpr-2024-papers/blob/master/CONTRIBUTING.md FOR MORE DETAILS ON HOW TO MAKE CHANGES PROPERLY.",
 27 |     "-->"
 28 | ]
 29 | 
 30 | ARXIV_BADGE_PATTERN = '[<a href="{}">paper</a>]'
 31 | GITHUB_BADGE_PATTERN = '[<a href="{}">code</a>]'
 32 | HUGGINGFACE_SPACE_BADGE_PATTERN = '[<a href="{}">demo</a>]'
 33 | COLAB_BADGE_PATTERN = '[<a href="{}">colab</a>]'
 34 | YOUTUBE_BADGE_PATTERN = '[<a href="{}">video</a>]'
 35 | 
 36 | PAPER_WITHOUT_POSTER_PATTERN = """
 37 | <p align="left">
 38 |     <a href="{}" title="{}">
 39 |         <strong>{}{}</strong>
 40 |     </a>
 41 |     <br/>
 42 |     {}
 43 |     <br/>
 44 |     {}
 45 |     <br/>
 46 |     <strong>Topic:</strong> {}
 47 |     <br/>
 48 |     <strong>Session:</strong> {}
 49 | </p>
 50 | <br/>
 51 | """
 52 | 
 53 | PAPER_WITH_POSTER_PATTERN = """
 54 | <p align="left">
 55 |     <a href="{}" title="{}">
 56 |         <img src="{}" alt="{}" width="400px" align="left" />
 57 |     </a>
 58 |     <a href="{}" title="{}">
 59 |         <strong>{}{}</strong>
 60 |     </a>
 61 |     <br/>
 62 |     {}
 63 |     <br/>
 64 |     {}
 65 |     <br/>
 66 |     <strong>Topic:</strong> {}
 67 |     <br/>
 68 |     <strong>Session:</strong> {}
 69 | </p>
 70 | <br/>
 71 | <br/>
 72 | """
 73 | 
 74 | def read_lines_from_file(path: str) -> List[str]:
 75 |     """
 76 |     Reads lines from file and strips trailing whitespaces.
 77 |     """
 78 |     with open(path) as file:
 79 |         return [line.rstrip() for line in file]
 80 | 
 81 | 
 82 | def save_lines_to_file(path: str, lines: List[str]) -> None:
 83 |     """
 84 |     Saves lines to file.
 85 |     """
 86 |     with open(path, "w") as f:
 87 |         for line in lines:
 88 |             f.write("%s\n" % line)
 89 | 
 90 | 
 91 | def format_entry(entry: Series) -> str:
 92 |     """
 93 |     Formats entry into Markdown table row, ensuring dates are formatted correctly.
 94 |     """
 95 |     title = entry.loc[TITLE_COLUMN_NAME]
 96 |     authors = entry.loc[AUTHORS_COLUMN_NAME]
 97 |     topics = entry.loc[TOPIC_COLUMN_NAME]
 98 |     session = entry.loc[SESSION_COLUMN_NAME]
 99 |     poster = entry.loc[POSTER_COLUMN_NAME]
100 |     compressed_poster = entry.loc[COMPRESSED_POSTER_COLUMN_NAME]
101 |     paper_id = entry.loc[PAPER_COLUMN_NAME]
102 |     code_url = entry.loc[CODE_COLUMN_NAME]
103 |     huggingface_url = entry.loc[HUGGINGFACE_SPACE_COLUMN_NAME]
104 |     youtube_url = entry.loc[YOUTUBE_COLUMN_NAME]
105 |     colab_url = entry.loc[COLAB_COLUMN_NAME]
106 |     is_highlight = entry.loc[IS_HIGHLIGHTED_COLUMN_NAME]
107 |     arxiv_badge = ARXIV_BADGE_PATTERN.format(paper_id) if paper_id else ""
108 |     code_badge = GITHUB_BADGE_PATTERN.format(code_url) if code_url else ""
109 |     youtube_badge = YOUTUBE_BADGE_PATTERN.format(youtube_url) if youtube_url else ""
110 |     huggingface_badge = HUGGINGFACE_SPACE_BADGE_PATTERN.format(huggingface_url) if huggingface_url else ""
111 |     colab_badge = COLAB_BADGE_PATTERN.format(colab_url) if colab_url else ""
112 |     highlight_badge = "🔥 " if is_highlight == "True" else ""
113 |     badges = " ".join([arxiv_badge, code_badge, youtube_badge, huggingface_badge, colab_badge])
114 |     compressed_poster = compressed_poster if compressed_poster else poster
115 | 
116 |     if not poster:
117 |         return PAPER_WITHOUT_POSTER_PATTERN.format(
118 |             paper_id, title, highlight_badge, title, authors, badges, topics, session)
119 | 
120 |     return PAPER_WITH_POSTER_PATTERN.format(
121 |         poster, title, compressed_poster, title, paper_id, title, highlight_badge, title, authors, badges, topics, session)
122 | 
123 | 
124 | def load_entries(path: str) -> List[str]:
125 |     """
126 |     Loads table entries from csv file, sorted by date in descending order and formats dates.
127 |     """
128 |     df = pd.read_csv(path, quotechar='"', dtype=str)
129 |     df.columns = df.columns.str.strip()
130 |     df = df.fillna("")
131 | 
132 |     entries = []
133 |     df_dict = {topic: group_df for topic, group_df in df.groupby(TOPIC_COLUMN_NAME)}
134 |     for topic, group_df in df_dict.items():
135 |         entries.append(f"### {topic.lower()}")
136 |         entries += [
137 |             format_entry(row)
138 |             for _, row
139 |             in group_df.iterrows()
140 |         ]
141 |     return entries
142 | 
143 | 
144 | def search_lines_with_token(lines: List[str], token: str) -> List[int]:
145 |     """
146 |     Searches for lines with token.
147 |     """
148 |     result = []
149 |     for line_index, line in enumerate(lines):
150 |         if token in line:
151 |             result.append(line_index)
152 |     return result
153 | 
154 | 
155 | def inject_papers_list_into_readme(
156 |     readme_lines: List[str],
157 |     papers_list_lines: List[str]
158 | ) -> List[str]:
159 |     """
160 |     Injects papers list into README.md.
161 |     """
162 |     lines_with_token_indexes = search_lines_with_token(
163 |         lines=readme_lines, token=AUTOGENERATED_PAPERS_LIST_TOKEN)
164 | 
165 |     if len(lines_with_token_indexes) != 2:
166 |         raise Exception(f"Please inject two {AUTOGENERATED_PAPERS_LIST_TOKEN} "
167 |                         f"tokens to signal start and end of autogenerated table.")
168 | 
169 |     [start_index, end_index] = lines_with_token_indexes
170 |     return readme_lines[:start_index + 1] + papers_list_lines + readme_lines[end_index:]
171 | 
172 | 
173 | if __name__ == "__main__":
174 |     parser = argparse.ArgumentParser()
175 |     parser.add_argument('-d', '--data_path', default='automation/data.csv')
176 |     parser.add_argument('-r', '--readme_path', default='README.md')
177 |     args = parser.parse_args()
178 | 
179 |     table_lines = load_entries(path=args.data_path)
180 |     table_lines = WARNING_HEADER + table_lines
181 |     readme_lines = read_lines_from_file(path=args.readme_path)
182 |     readme_lines = inject_papers_list_into_readme(readme_lines=readme_lines,
183 |                                                   papers_list_lines=table_lines)
184 |     save_lines_to_file(path=args.readme_path, lines=readme_lines)


--------------------------------------------------------------------------------
/automation/requirements.txt:
--------------------------------------------------------------------------------
1 | pandas


--------------------------------------------------------------------------------