├── assets
│   ├── interpolation.jpg
│   ├── t2i-results.jpg
│   ├── imagenet-recon.jpg
│   ├── t2i-efficiency.jpg
│   ├── methodfig_ziptokv5.jpg
│   ├── imagenet-efficiency.jpg
│   └── t2i-efficiency_with_ablations.jpg
├── README.md
└── .gitignore
/assets/interpolation.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CompVis/RepTok/HEAD/assets/interpolation.jpg
--------------------------------------------------------------------------------
/assets/t2i-results.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CompVis/RepTok/HEAD/assets/t2i-results.jpg
--------------------------------------------------------------------------------
/assets/imagenet-recon.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CompVis/RepTok/HEAD/assets/imagenet-recon.jpg
--------------------------------------------------------------------------------
/assets/t2i-efficiency.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CompVis/RepTok/HEAD/assets/t2i-efficiency.jpg
--------------------------------------------------------------------------------
/assets/methodfig_ziptokv5.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CompVis/RepTok/HEAD/assets/methodfig_ziptokv5.jpg
--------------------------------------------------------------------------------
/assets/imagenet-efficiency.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CompVis/RepTok/HEAD/assets/imagenet-efficiency.jpg
--------------------------------------------------------------------------------
/assets/t2i-efficiency_with_ablations.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CompVis/RepTok/HEAD/assets/t2i-efficiency_with_ablations.jpg
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# 🦎 Adapting Self-Supervised Representations as a Latent Space for Efficient Generation

Ming Gui<sup>\*1,2</sup> · Johannes Schusterbauer<sup>\*1,2</sup> · Timy Phan<sup>1,2</sup> · Felix Krause<sup>1,2</sup> · Josh Susskind<sup>3</sup> · Miguel A. Bautista<sup>3</sup> · Björn Ommer<sup>1,2</sup>

<sup>1</sup>CompVis Group @ LMU Munich · <sup>2</sup>MCML · <sup>3</sup>Apple

<sup>\*</sup> equal contribution

# 🔥 TL;DR

We introduce **Rep**resentation **Tok**enizer (RepTok🦎), a generative framework that encodes each image into a single continuous latent token derived from self-supervised learning (SSL) vision transformers. By jointly fine-tuning the semantic `[cls]` token together with a generative decoder, RepTok achieves faithful reconstructions while preserving the smooth, semantically meaningful structure of the SSL latent space. This compact one-token formulation enables highly efficient latent-space generative modeling and delivers competitive results even under severely constrained training budgets.

# 📝 Overview

Our approach builds on a pre-trained SSL encoder that is lightly fine-tuned and trained jointly with a generative decoder. We train the decoder with a standard flow matching objective and complement it with a cosine-similarity loss that regularizes the adapted latent token to stay close to the original pre-trained representation, preserving its smooth, semantically structured space, which is well suited for generation. Without any auxiliary perceptual or adversarial losses, the resulting model faithfully decodes the single-token latent representation into pixel space.
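
The training objective described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the RepTok implementation: it assumes a linear (rectified-flow style) interpolation path with noise at t = 0 and data at t = 1, a decoder that predicts a velocity from `(x_t, t, z)`, and a weighting factor `lam` for the cosine term. The names `decoder`, `lam`, `cosine_regularizer`, `flow_matching_loss`, and `total_loss` are all hypothetical.

```python
import numpy as np


def cosine_regularizer(z_adapted, z_pretrained, eps=1e-8):
    """Mean of (1 - cosine similarity) between adapted and original tokens.

    Both inputs have shape (batch, dim): one latent token per image.
    """
    dot = np.sum(z_adapted * z_pretrained, axis=-1)
    norms = np.linalg.norm(z_adapted, axis=-1) * np.linalg.norm(z_pretrained, axis=-1)
    return float(np.mean(1.0 - dot / np.maximum(norms, eps)))


def flow_matching_loss(decoder, images, z, rng):
    """Conditional flow matching loss on an (assumed) linear noise-to-data path."""
    batch = images.shape[0]
    noise = rng.standard_normal(images.shape)
    # One time value per sample, broadcast over the remaining image dimensions.
    t = rng.uniform(size=(batch,) + (1,) * (images.ndim - 1))
    x_t = (1.0 - t) * noise + t * images
    target_velocity = images - noise  # d x_t / dt along the linear path
    pred_velocity = decoder(x_t, t, z)  # hypothetical decoder signature
    return float(np.mean((pred_velocity - target_velocity) ** 2))


def total_loss(decoder, images, z_adapted, z_pretrained, rng, lam=1.0):
    """Flow matching on the decoder plus the cosine regularizer on the token."""
    fm = flow_matching_loss(decoder, images, z_adapted, rng)
    return fm + lam * cosine_regularizer(z_adapted, z_pretrained)
```

The cosine term is zero when the adapted token points in the same direction as the original one and reaches 2 when they point in opposite directions; `lam` controls how strongly the token is pulled back toward the pre-trained representation.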

This design enables highly efficient image synthesis training, allowing us to use simple, attention-free architectures such as MLP-Mixers for accelerated ImageNet training. Furthermore, we show that the framework naturally extends to text-to-image (T2I) synthesis: by incorporating cross-attention to integrate textual conditioning, our model achieves competitive zero-shot performance on the COCO benchmark under an extremely constrained training budget.
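
To make the text-conditioning mechanism concrete, here is a minimal single-head cross-attention sketch in NumPy in which the image latent token acts as the query and the text-encoder outputs act as keys and values. This is an illustrative assumption about how a single latent token could attend to a caption, not RepTok's actual layer; the projection matrices `w_q`, `w_k`, `w_v` and the function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def cross_attention(latent, text, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    latent: (batch, n_latent, d_model), e.g. n_latent = 1 for a single token
    text:   (batch, n_text, d_text) text-encoder outputs
    w_q:    (d_model, d_head); w_k, w_v: (d_text, d_head)
    Returns an array of shape (batch, n_latent, d_head).
    """
    q = latent @ w_q
    k = text @ w_k
    v = text @ w_v
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # attention distribution over text tokens
    return weights @ v
```

In a full model this output would typically be projected back to `d_model` and added residually to the latent token; those details are omitted here.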

# 📈 Results

## ⏳ Efficiency

Our approach consistently achieves a substantially lower computational footprint while maintaining competitive performance on ImageNet.

The same holds in a general T2I setting: RepTok reaches Stable Diffusion (SD) 1.5 quality at a fraction of the cost of other methods, while delivering better generative performance than other efficiency-focused methods.

## 🌇 Qualitative Reconstructions

Our approach augments the pre-trained SSL representation with the additional information needed to faithfully encode an image as a single continuous token, enabling both high-fidelity image reconstruction and synthesis.
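
Decoding a token back to pixels amounts to integrating the learned velocity field from noise to data. The sketch below uses a plain fixed-step Euler solver and assumes a linear path with noise at t = 0 and data at t = 1; the solver, the default number of steps, and the `velocity_fn(x, t, z)` signature are assumptions for illustration, and the sampler actually used for RepTok may differ.

```python
import numpy as np


def euler_decode(velocity_fn, z, shape, num_steps=50, rng=None):
    """Integrate dx/dt = velocity_fn(x, t, z) from t = 0 (noise) to t = 1 (image).

    velocity_fn: hypothetical decoder that predicts the flow velocity
    z:           the single latent token conditioning the decoder
    shape:       output shape, e.g. (batch, channels, height, width)
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal(shape)  # start from Gaussian noise at t = 0
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        x = x + dt * velocity_fn(x, t, z)  # explicit Euler update
    return x
```

More steps reduce discretization error at proportionally higher cost; higher-order or adaptive ODE solvers are common drop-in alternatives.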

## 🐯 Interpolation Results

When interpolating between the latent tokens of two images, we observe smooth transitions not only in semantic content but also in spatial configuration. This indicates that our method successfully integrates low-level spatial information while preserving the properties of the pre-trained encoder's latent space, which in turn facilitates generation within the learned representation.
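
Interpolations like these are typically produced by blending the latent tokens of two images and decoding each intermediate token. The sketch below uses spherical linear interpolation (slerp), a common choice for high-dimensional latents; whether RepTok uses slerp or plain linear interpolation is an assumption here, and `slerp` and `interpolation_path` are hypothetical helpers.

```python
import numpy as np


def slerp(z0, z1, alpha, eps=1e-7):
    """Spherical linear interpolation between two 1-D latent tokens.

    Falls back to linear interpolation when the vectors are (nearly) collinear.
    """
    z0_unit = z0 / np.linalg.norm(z0)
    z1_unit = z1 / np.linalg.norm(z1)
    cos_omega = np.clip(np.dot(z0_unit, z1_unit), -1.0, 1.0)
    omega = np.arccos(cos_omega)  # angle between the two tokens
    if np.sin(omega) < eps:
        return (1.0 - alpha) * z0 + alpha * z1
    return (np.sin((1.0 - alpha) * omega) * z0 + np.sin(alpha * omega) * z1) / np.sin(omega)


def interpolation_path(z0, z1, num_points=8):
    """Return num_points tokens evenly spaced in alpha from z0 to z1 (inclusive)."""
    return np.stack([slerp(z0, z1, a) for a in np.linspace(0.0, 1.0, num_points)])
```

Each token along the path would then be decoded to an image, for example with the same sampler used for reconstruction.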

## 🔥 T2I Results

Using our approach, we trained a general T2I model that synthesizes coherent and aesthetically pleasing images with a minimal compute budget.

# 🚀 To-Do

We are in the process of preparing the public release of the **RepTok** codebase.
The following items are planned:

- [ ] Release pretrained checkpoints
- [ ] Provide inference demo

Stay tuned: the code and pretrained models will be released soon!

# 🎓 Citation

If you use our work in your research, please cite it using the following BibTeX entry:

```bibtex
@misc{gui2025reptok,
  title={Adapting Self-Supervised Representations as a Latent Space for Efficient Generation},
  author={Ming Gui and Johannes Schusterbauer and Timy Phan and Felix Krause and Josh Susskind and Miguel Angel Bautista and Björn Ommer},
  year={2025},
  eprint={2510.14630},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2510.14630},
}
```
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[codz]
*$py.class

sandbox
checkpoints
results
logs
wandb
outputs

*__pycache__*
.idea
venv

*.DS_Store
*._.DS_Store
testy.ipynb
third_party

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py.cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# UV
# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
#uv.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
#poetry.toml

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
# pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
# https://pdm-project.org/en/latest/usage/project/#working-with-version-control
#pdm.lock
#pdm.toml
.pdm-python
.pdm-build/

# pixi
# Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
#pixi.lock
# Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
# in the .venv directory. It is recommended not to include this directory in version control.
.pixi

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.envrc
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

# Abstra
# Abstra is an AI-powered process automation framework.
# Ignore directories containing user credentials, local state, and settings.
# Learn more at https://abstra.io/docs
.abstra/

# Visual Studio Code
# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
# that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
# and can be added to the global gitignore or merged into this file. However, if you prefer,
# you could uncomment the following to ignore the entire vscode folder
# .vscode/

# Ruff stuff:
.ruff_cache/

# PyPI configuration file
.pypirc

# Cursor
# Cursor is an AI-powered code editor. `.cursorignore` specifies files/directories to
# exclude from AI features like autocomplete and code analysis. Recommended for sensitive data
# refer to https://docs.cursor.com/context/ignore-files
.cursorignore
.cursorindexingignore

# Marimo
marimo/_static/
marimo/_lsp/
__marimo__/

--------------------------------------------------------------------------------