├── assets
    ├── interpolation.jpg
    ├── t2i-results.jpg
    ├── imagenet-recon.jpg
    ├── t2i-efficiency.jpg
    ├── methodfig_ziptokv5.jpg
    ├── imagenet-efficiency.jpg
    └── t2i-efficiency_with_ablations.jpg
├── README.md
└── .gitignore

/assets/interpolation.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CompVis/RepTok/HEAD/assets/interpolation.jpg

--------------------------------------------------------------------------------
/assets/t2i-results.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CompVis/RepTok/HEAD/assets/t2i-results.jpg

--------------------------------------------------------------------------------
/assets/imagenet-recon.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CompVis/RepTok/HEAD/assets/imagenet-recon.jpg

--------------------------------------------------------------------------------
/assets/t2i-efficiency.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CompVis/RepTok/HEAD/assets/t2i-efficiency.jpg

--------------------------------------------------------------------------------
/assets/methodfig_ziptokv5.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CompVis/RepTok/HEAD/assets/methodfig_ziptokv5.jpg

--------------------------------------------------------------------------------
/assets/imagenet-efficiency.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CompVis/RepTok/HEAD/assets/imagenet-efficiency.jpg

--------------------------------------------------------------------------------
/assets/t2i-efficiency_with_ablations.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CompVis/RepTok/HEAD/assets/t2i-efficiency_with_ablations.jpg

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

🦎 Adapting Self-Supervised Representations as a Latent Space for Efficient Generation


Ming Gui*<sup>1,2</sup> · Johannes Schusterbauer*<sup>1,2</sup> · Timy Phan<sup>1,2</sup>
Felix Krause<sup>1,2</sup> · Josh Susskind<sup>3</sup> · Miguel A. Bautista<sup>3</sup> · Björn Ommer<sup>1,2</sup>

<sup>1</sup>CompVis Group @ LMU Munich · <sup>2</sup>MCML · <sup>3</sup>Apple


* equal contribution


[Paper PDF](https://arxiv.org/abs/2510.14630)
# 🔥 TL;DR

We introduce **Rep**resentation **Tok**enizer (RepTok🦎), a generative framework that encodes each image into a single continuous latent token derived from self-supervised vision transformers. By jointly fine-tuning the semantic `[cls]` token with a generative decoder, RepTok achieves faithful reconstructions while preserving the smooth, meaningful structure of the SSL space. This compact one-token formulation enables highly efficient latent-space generative modeling, delivering competitive results even under severely constrained training budgets.

# 📝 Overview

Our approach builds on a pre-trained SSL encoder that is lightly fine-tuned and trained jointly with a generative decoder. We train the decoder with a standard flow matching objective, complemented by a cosine-similarity loss that regularizes the latent representation to remain close to its original smooth and semantically structured space, which is well suited for generation. Without auxiliary perceptual or adversarial losses, the resulting model can faithfully decode the single-token latent representation into pixel space.
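The combined objective described above can be sketched in a few lines. This is a minimal, framework-free illustration and not the released training code: it assumes a linear noise-to-image path for flow matching, assumes the regularizer compares the fine-tuned token against the output of a frozen copy of the encoder, and uses a hypothetical weight `lam`. The function names are made up for this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def flow_matching_pair(x0, x1, t):
    """Assumed linear path: x_t = (1 - t) * x0 + t * x1, target velocity x1 - x0.

    x0 is a noise sample, x1 the (flattened) image, t a time in [0, 1].
    """
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    target_velocity = [b - a for a, b in zip(x0, x1)]
    return x_t, target_velocity


def training_loss(pred_velocity, target_velocity, z_finetuned, z_frozen, lam=1.0):
    """Flow matching MSE plus a cosine-similarity regularizer on the latent token.

    z_frozen (token from a frozen encoder copy) and lam are assumptions of this sketch.
    """
    fm = sum((p - q) ** 2 for p, q in zip(pred_velocity, target_velocity))
    fm /= len(target_velocity)
    reg = 1.0 - cosine_similarity(z_finetuned, z_frozen)
    return fm + lam * reg
```

In an actual training loop, `pred_velocity` would come from the decoder conditioned on `z_finetuned`, and gradients would flow into both the decoder and the lightly fine-tuned encoder.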

[Figure: Pipeline]

This design enables highly efficient image synthesis training, allowing us to use simple, attention-free architectures such as MLP-Mixers for accelerated ImageNet training. Furthermore, we show that the framework naturally extends to text-to-image (T2I) synthesis: by incorporating cross-attention to integrate textual conditioning, our model achieves competitive zero-shot performance on the COCO benchmark under an extremely constrained training budget.

# 📈 Results

## ⏳ Efficiency

Our approach consistently achieves a substantially lower computational footprint while maintaining competitive performance on ImageNet.

[Figure: ImageNet efficiency]

This also extends to a general T2I setting: RepTok reaches SD 1.5 quality at a fraction of the cost of other methods while delivering better generative performance than other efficiency-focused methods.

[Figure: T2I efficiency]

## 🌇 Qualitative Reconstructions

Our approach augments the pre-trained SSL representations with the additional information needed to faithfully encode an image as a single continuous token, which enables both high-fidelity image reconstruction and synthesis.

[Figure: ImageNet Reconstruction]

## 🐯 Interpolation Results

We observe smooth transitions not only in semantic content but also in spatial configuration. This indicates that our method successfully integrates low-level spatial information while preserving the properties of the pretrained encoder's latent space, and facilitates generation within the learned representation.
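Because each image is a single continuous token, an interpolation between two images reduces to a path between two vectors. The sketch below assumes plain linear interpolation; the text above does not state which scheme was used, and the function names are hypothetical.

```python
def lerp(z_a, z_b, alpha):
    """Linear blend of two latent tokens; alpha=0 returns z_a, alpha=1 returns z_b."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(z_a, z_b)]


def interpolation_path(z_a, z_b, steps):
    """Evenly spaced latents from z_a to z_b, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    return [lerp(z_a, z_b, i / (steps - 1)) for i in range(steps)]
```

Each latent on the path would then be passed to the generative decoder to produce one frame of the transition.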

[Figure: Interpolation]

## 🔥 T2I Results

Using our approach, we trained a general T2I model that synthesizes coherent and aesthetically pleasing images with a minimal compute budget.

[Figure: T2I results]

# 🚀 To-Do

We are in the process of preparing the public release of the **RepTok** codebase.
The following items are planned:

- [ ] Release pretrained checkpoints
- [ ] Provide inference demo

Stay tuned: the code and pretrained models will be released soon!

## 🎓 Citation

If you use our work in your research, please use the following BibTeX entry:

```bibtex
@misc{gui2025reptok,
  title={Adapting Self-Supervised Representations as a Latent Space for Efficient Generation},
  author={Ming Gui and Johannes Schusterbauer and Timy Phan and Felix Krause and Josh Susskind and Miguel Angel Bautista and Björn Ommer},
  year={2025},
  eprint={2510.14630},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2510.14630},
}
```

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[codz]
*$py.class

sandbox
checkpoints
results
logs
wandb
outputs

*__pycache__*
.idea
venv

*.DS_Store
*._.DS_Store
testy.ipynb
third_party

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py.cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# UV
# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
#uv.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
#poetry.toml

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
# pdm recommends including project-wide configuration in pdm.toml, but excluding .pdm-python.
# https://pdm-project.org/en/latest/usage/project/#working-with-version-control
#pdm.lock
#pdm.toml
.pdm-python
.pdm-build/

# pixi
# Similar to Pipfile.lock, it is generally recommended to include pixi.lock in version control.
#pixi.lock
# Pixi creates a virtual environment in the .pixi directory, just like venv module creates one
# in the .venv directory. It is recommended not to include this directory in version control.
.pixi

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.envrc
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

# Abstra
# Abstra is an AI-powered process automation framework.
# Ignore directories containing user credentials, local state, and settings.
# Learn more at https://abstra.io/docs
.abstra/

# Visual Studio Code
# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
# that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
# and can be added to the global gitignore or merged into this file. However, if you prefer,
# you could uncomment the following to ignore the entire vscode folder
# .vscode/

# Ruff stuff:
.ruff_cache/

# PyPI configuration file
.pypirc

# Cursor
# Cursor is an AI-powered code editor. `.cursorignore` specifies files/directories to
# exclude from AI features like autocomplete and code analysis. Recommended for sensitive data
# refer to https://docs.cursor.com/context/ignore-files
.cursorignore
.cursorindexingignore

# Marimo
marimo/_static/
marimo/_lsp/
__marimo__/

--------------------------------------------------------------------------------