# 🔥 Awesome Controllable Generative Models

A curated and continuously updated collection of **recent (2023–2025)** research papers on **controllable generative models** (plus a few earlier foundational works), with a special focus on both **UNet-based** and **Transformer-based diffusion architectures**.

This list emphasizes core advances in:

- 🧭 **Control mechanisms** – including condition injection, adapters, and multi-modal control
- 👁️ **Attention interpretation** – revealing what diffusion models focus on
- 🎛️ **Frequency-based control** – using spectral-domain knowledge to guide generation
- 🔁 **Alignment & knowledge transfer** – enabling more coherent, faithful, and data-efficient synthesis
- 🧑‍🎨 **Image-to-image (I2I) editing** – flexible, structure-preserving transformation across domains

> 💡 Our goal is not only to track the state of the art in controllable generation, but also to offer a **well-organized knowledge map** for newcomers and researchers building on top of diffusion models.

---

## 🧭 Control Mechanism

| Paper | Venue | Links |
|-------|-------|-------|
| Adding Conditional Control to Text-to-Image Diffusion Models | ICCV 2023 | [Paper](https://arxiv.org/abs/2302.05543) \| [Code](https://github.com/lllyasviel/ControlNet) |
| Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation | NeurIPS 2023 | [Paper](https://arxiv.org/abs/2306.00964) \| [Code](https://github.com/mhh0318/Cocktail) |
| OminiControl: Minimal and Universal Control for Diffusion Transformer | arXiv 2024 | [Paper](https://arxiv.org/abs/2411.15098) \| [Code](https://github.com/Yuanshi9815/OminiControl) |
| SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions | ECCV 2024 (Poster) | [Paper](https://arxiv.org/abs/2404.06451) \| [Code](https://github.com/liuxiaoyu1104/SmartControl) |
| CTRL-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model | ICLR 2025 (Oral) | [Paper](https://openreview.net/forum?id=ny8T8OuNHe) \| [Code](https://github.com/HL-hanlin/Ctrl-Adapter) |
| FlexControl: Computation-Aware Conditional Control with Differentiable Router | ICML 2025 (Poster) | [Paper](https://arxiv.org/abs/2502.10451) \| [Code](https://github.com/Daryu-Fan/FlexControl) |
| Ctrl-X: Controlling Structure and Appearance Without Guidance | NeurIPS 2024 | [Paper](https://arxiv.org/abs/2406.07540) \| [Code](https://github.com/genforce/ctrl-x) |
| Composer: Creative and Controllable Image Synthesis with Composable Conditions | ICML 2023 | [Paper](https://arxiv.org/abs/2302.09778) \| [Code](https://github.com/ali-vilab/composer) |
| Conceptrol: Concept Control of Zero-shot Personalized Image Generation | arXiv 2025 | [Paper](https://arxiv.org/abs/2503.06568) \| [Code](https://github.com/QY-H00/Conceptrol) |
| UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild | CVPR 2023 | [Paper](https://arxiv.org/abs/2305.11147) \| [Code](https://github.com/salesforce/UniControl) |
| Controllable Generation with Text-to-Image Diffusion Models: A Survey | T-PAMI 2024 | [Paper](https://arxiv.org/abs/2403.04279) \| [Code](#) |
| Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models | NeurIPS 2023 | [Paper](https://arxiv.org/abs/2305.16322) \| [Code](https://github.com/ShihaoZhaoZSH/Uni-ControlNet) |
| Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling | ICLR 2025 | [Paper](https://arxiv.org/abs/2410.11236) \| [Code](https://github.com/grenoble-zhang/Ctrl-U) |
| Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models | arXiv 2024 | [Paper](https://arxiv.org/abs/2411.07126) \| [Code](https://github.com/NVIDIA/Edify-Image) |
| Generative Modeling by Estimating Gradients of the Data Distribution | NeurIPS 2019 | [Paper](https://arxiv.org/abs/1907.05600) \| [Code](#) |
| IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models | arXiv 2023 | [Paper](https://arxiv.org/abs/2308.06721) \| [Code](https://github.com/tencent-ailab/IP-Adapter) |
| T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion | NeurIPS 2023 | [Paper](https://arxiv.org/abs/2302.08453) \| [Code](https://github.com/TencentARC/T2I-Adapter) |
| Is Noise Conditioning Necessary for Denoising Generative Models? | arXiv 2025 | [Paper](https://arxiv.org/abs/2502.13129) \| [Code](#) |
| Layout-to-Image Generation with Localized Descriptions using ControlNet with Cross-Attention Guidance | WACV 2024 | [Paper](https://arxiv.org/abs/2402.13404) \| [Code](#) |
| More Control for Free! Image Synthesis with Semantic Diffusion Guidance | WACV 2023 | [Paper](https://arxiv.org/abs/2112.05744) \| [Code](https://xh-liu.github.io/sdg/) |
| MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation | CVPR 2023 | [Paper](https://arxiv.org/abs/2302.08113) \| [Code](#) |
| Rectified Diffusion Guidance for Conditional Generation | CVPR 2025 | [Paper](https://arxiv.org/abs/2410.18737) \| [Code](#) |
| Sketch-Guided Text-to-Image Diffusion Models | arXiv 2022 | [Paper](https://arxiv.org/abs/2211.13752) \| [Code](https://github.com/ogkalu2/Sketch-Guided-Text-To-Image-Diffusion) |

> 🧠 These papers push the boundary of **how we guide generation**, whether through minimal prompts, learned adapters, or uncertainty-aware mechanisms.
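
For readers who want a concrete entry point, the snippet below is a minimal sketch of ControlNet-style spatial conditioning using the Hugging Face `diffusers` library. The model IDs, the Canny edge condition, the file paths, and the hyperparameters are illustrative assumptions, not settings prescribed by any paper above.

```python
# Minimal ControlNet-style conditioning sketch with Hugging Face diffusers.
# Requires: diffusers, transformers, torch, opencv-python, pillow.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# 1. Build a spatial condition image (here: Canny edges of a reference photo).
reference = np.array(Image.open("reference.png").convert("RGB"))  # placeholder path
edges = cv2.Canny(reference, 100, 200)
condition = Image.fromarray(np.stack([edges] * 3, axis=-1))

# 2. Load a ControlNet branch and attach it to a frozen text-to-image backbone.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")  # assumes a CUDA GPU is available

# 3. The text prompt decides content; the condition image constrains structure.
image = pipe(
    "a watercolor painting of a house by a lake",
    image=condition,
    num_inference_steps=30,
).images[0]
image.save("controlled_output.png")
```

Adapter-style methods (e.g., T2I-Adapter, IP-Adapter) follow the same pattern but inject the condition through lightweight modules rather than a full copy of the UNet encoder.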

---

## 👁️ Attention & Interpretability

| Paper | Venue | Links |
|-------|-------|-------|
| ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features | ICML 2025 (Oral) | [Paper](https://arxiv.org/abs/2502.04320) \| [Code](https://github.com/helblazer811/ConceptAttention) |
| ToMA: Token Merge with Attention for Diffusion Models | ICML 2025 (Poster) | [Paper](https://openreview.net/forum?id=xhtqgW5b93) \| [Code](#) |
| What the DAAM: Interpreting Stable Diffusion Using Cross Attention | ACL 2024 (Oral) | [Paper](https://arxiv.org/abs/2302.12243) \| [Code](https://github.com/zhudilin/DAAM) |
| Attention Distillation: A Unified Approach to Visual Characteristics Transfer | CVPR 2025 | [Paper](https://arxiv.org/abs/2502.20235) \| [Code](https://github.com/xugao97/AttentionDistillation) |

> 🔬 Interpretability is **not just analysis** — it's a step toward **transparent and editable generative pipelines**.
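
To make the intuition behind cross-attention attribution (e.g., DAAM-style heatmaps) concrete, here is a toy, dependency-light sketch that maps one text token's attention weights back onto the spatial grid. The tensor shapes, the 8x upsampling factor, and the function name are assumptions for illustration and do not mirror the interface of any specific codebase.

```python
# Toy sketch: turn cross-attention weights into a per-token spatial heatmap,
# in the spirit of DAAM-style attribution. Shapes and names are illustrative.
import torch
import torch.nn.functional as F

def cross_attention_heatmap(image_feats, text_feats, token_index, spatial_size):
    """image_feats: [N_img, d] flattened spatial features (queries).
    text_feats:  [N_txt, d] text token embeddings (keys).
    Returns an upsampled map of where the chosen text token attends."""
    d = image_feats.shape[-1]
    scores = image_feats @ text_feats.T / d**0.5   # [N_img, N_txt]
    attn = scores.softmax(dim=-1)                  # normalize over text tokens
    heat = attn[:, token_index]                    # attention column for one token
    h, w = spatial_size
    heat = heat.reshape(1, 1, h, w)
    heat = F.interpolate(heat, scale_factor=8, mode="bilinear", align_corners=False)
    return heat.squeeze()

# Example with random features: a 16x16 latent grid, 77 text tokens, dim 64.
img = torch.randn(16 * 16, 64)
txt = torch.randn(77, 64)
heatmap = cross_attention_heatmap(img, txt, token_index=5, spatial_size=(16, 16))
print(heatmap.shape)  # torch.Size([128, 128])
```

In real pipelines the same idea is applied to attention maps captured from every cross-attention layer and timestep, typically aggregated before visualization.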

---

## 🎛️ Frequency Domain Control

| Paper | Venue | Links |
|-------|-------|-------|
| DiffFNO: Diffusion Fourier Neural Operator for Arbitrary-Scale Super-Resolution | CVPR 2025 (Oral) | [Paper](https://arxiv.org/abs/2411.09911) \| [Code](#) |
| Frequency Autoregressive Image Generation with Continuous Tokens | arXiv 2025 | [Paper](https://arxiv.org/abs/2503.05305) \| [Code](https://github.com/yuhuUSTC/FAR) |
| Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation | AAAI 2024 | [Paper](https://arxiv.org/abs/2407.03006) \| [Code](https://github.com/XiangGao1102/FCDiffusion) |
| FreeU: Free Lunch in Diffusion U-Net | CVPR 2024 | [Paper](https://arxiv.org/abs/2309.11497) \| [Code](https://github.com/ChenyangSi/FreeU) |
| FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models | ECCV 2024 (Poster) | [Paper](https://arxiv.org/abs/2404.11895) \| [Code](https://github.com/thermal-dynamics/freediff) |
| PTDiffusion: Free Lunch for Generating Optical Illusion Hidden Pictures with Phase-Transferred Diffusion | CVPR 2025 (Poster) | [Paper](https://arxiv.org/abs/2503.06186) \| [Code](https://github.com/XiangGao1102/PTDiffusion) |
| ResDiff: Combining CNN and Diffusion Model for Image Super-Resolution | AAAI 2023 | [Paper](https://arxiv.org/abs/2303.08714) \| [Code](https://github.com/LYL1015/ResDiff) |
| Diffusion-based Adversarial Purification from the Perspective of the Frequency Domain | ICML 2025 (Spotlight Poster) | [Paper](https://arxiv.org/abs/2505.01267) \| [Code](#) |

> 📡 Spectral and signal-level control provides **low-level but powerful levers** for generative consistency, resolution, and robustness.
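
As a minimal illustration of the spectral manipulation these works build on, the sketch below splits a tensor into low- and high-frequency bands with an FFT and reweights them, loosely in the spirit of FreeU/FreeDiff-style frequency modulation. The radial cutoff and gain values are arbitrary assumptions; the actual papers choose them per layer, per timestep, or adaptively.

```python
# Minimal sketch of frequency-band reweighting on a feature map or latent.
# Cutoff and gains are arbitrary; real methods tune them per layer/timestep.
import torch

def reweight_frequencies(x, cutoff=0.25, low_gain=1.2, high_gain=0.8):
    """x: [C, H, W] real tensor. Scales low and high spatial frequencies separately."""
    _, H, W = x.shape
    spec = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))  # centered spectrum

    # Radial low-pass mask: 1 inside a disc around the DC component, 0 outside.
    yy = torch.arange(H).view(-1, 1) - H / 2
    xx = torch.arange(W).view(1, -1) - W / 2
    radius = torch.sqrt(yy**2 + xx**2)
    low_mask = (radius <= cutoff * min(H, W) / 2).float()

    # Boost the low band, damp the high band, then transform back.
    spec = spec * (low_gain * low_mask + high_gain * (1 - low_mask))
    return torch.fft.ifft2(torch.fft.ifftshift(spec, dim=(-2, -1))).real

x = torch.randn(4, 64, 64)   # e.g. a latent or intermediate UNet feature map
y = reweight_frequencies(x)
print(y.shape)               # torch.Size([4, 64, 64])
```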

---

## 🔁 Alignment & Knowledge Transfer

| Paper | Venue | Links |
|-------|-------|-------|
| When Model Knowledge meets Diffusion: Data-free Synthesis with Domain-Class Alignment | ICML 2025 | [Paper](https://arxiv.org/abs/2506.15381) \| [Code](#) |

> 🧬 These works align **discrete symbolic knowledge** with **continuous generative priors**, aiming for controllability in low-data or zero-shot regimes.

---

Finally, the list also covers some notable **image-to-image (I2I) editing** methods.

## 🧑‍🎨 Image Editing

| Paper | Venue | Links |
|-------|-------|-------|
| AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea | CVPR 2025 (Oral) | [Paper](https://arxiv.org/abs/2411.15738) \| [Code](https://github.com/DCDmllm/AnyEdit) |
| Diffusion Model-Based Image Editing: A Survey | arXiv 2024 | [Paper](https://arxiv.org/abs/2402.17525) \| [Code](https://github.com/SiatMMLab/Awesome-Diffusion-Model-Based-Image-Editing-Methods) |
| Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code | ICLR 2024 | [Paper](https://arxiv.org/abs/2310.01506) \| [Code](https://github.com/cure-lab/PnPInversion) |
| Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation | CVPR 2023 (Poster) | [Paper](https://arxiv.org/abs/2211.12572) \| [Code](https://github.com/MichalGeyer/plug-and-play) |
| Prompt-to-Prompt Image Editing with Cross Attention Control | ICLR 2023 | [Paper](https://arxiv.org/abs/2208.01626) \| [Code](https://github.com/google/prompt-to-prompt) |
| RePaint: Inpainting using Denoising Diffusion Probabilistic Models | CVPR 2022 | [Paper](https://arxiv.org/abs/2201.09865) \| [Code](https://github.com/andreas128/RePaint) |
| Zero-shot Image-to-Image Translation (pix2pix-zero) | SIGGRAPH 2023 | [Paper](https://arxiv.org/abs/2302.03027) \| [Code](https://github.com/pix2pixzero/pix2pix-zero) |

> ✏️ Editing is arguably **where controllability matters most** — precision, structure preservation, and user intent must all align.

---

## 📬 Contribute

💡 Know a paper we missed? Working on a new controllable generation method?
Feel free to **submit a pull request** or **open an issue** — contributions are welcome!