# 🔥 Awesome Controllable Generative Models

A curated and continuously updated collection of **recent (2023–2025)** research papers on **controllable generative models** (plus a few earlier foundational works), with a special focus on both **UNet-based** and **Transformer-based diffusion architectures**.

This list emphasizes core advances in:

- 🧭 **Control mechanisms** – including condition injection, adapters, and multi-modal control
- 👁️ **Attention interpretation** – revealing what diffusion models focus on
- 🎛️ **Frequency-based control** – using spectral-domain knowledge to guide generation
- 🔁 **Alignment & knowledge transfer** – enabling more coherent, faithful, and data-efficient synthesis
- 🧑‍🎨 **Image-to-image (I2I) editing** – flexible, structure-preserving transformation across domains

> 💡 Our goal is not only to track the state of the art in controllable generation, but also to offer a **well-organized knowledge map** for newcomers and researchers building on top of diffusion models.

---

## 🧭 Control Mechanism

| Paper | Venue | Links |
|-------|-------|-------|
| Adding Conditional Control to Text-to-Image Diffusion Models | ICCV 2023 | [Paper](https://arxiv.org/abs/2302.05543) \| [Code](https://github.com/lllyasviel/ControlNet) |
| Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation | NeurIPS 2023 | [Paper](https://arxiv.org/abs/2306.00964) \| [Code](https://github.com/mhh0318/Cocktail) |
| OminiControl: Minimal and Universal Control for Diffusion Transformer | arXiv 2024 | [Paper](https://arxiv.org/abs/2411.15098) \| [Code](https://github.com/Yuanshi9815/OminiControl) |
| SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions | ECCV 2024 (Poster) | [Paper](https://arxiv.org/abs/2404.06451) \| [Code](https://github.com/liuxiaoyu1104/SmartControl) |
| CTRL-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model | ICLR 2025 (Oral) | [Paper](https://openreview.net/forum?id=ny8T8OuNHe) \| [Code](https://github.com/HL-hanlin/Ctrl-Adapter) |
| FlexControl: Computation-Aware Conditional Control with Differentiable Router | ICML 2025 (Poster) | [Paper](https://arxiv.org/abs/2502.10451) \| [Code](https://github.com/Daryu-Fan/FlexControl) |
| Ctrl-X: Controlling Structure and Appearance Without Guidance | NeurIPS 2024 | [Paper](https://arxiv.org/abs/2406.07540) \| [Code](https://github.com/genforce/ctrl-x) |
| Composer: Creative and Controllable Image Synthesis with Composable Conditions | ICML 2023 | [Paper](https://arxiv.org/abs/2302.09778) \| [Code](https://github.com/ali-vilab/composer) |
| Conceptrol: Concept Control of Zero-shot Personalized Image Generation | arXiv 2025 | [Paper](https://arxiv.org/abs/2503.06568) \| [Code](https://github.com/QY-H00/Conceptrol) |
| UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild | CVPR 2023 | [Paper](https://arxiv.org/abs/2305.11147) \| [Code](https://github.com/salesforce/UniControl) |
| Controllable Generation with Text-to-Image Diffusion Models: A Survey | T-PAMI 2024 | [Paper](https://arxiv.org/abs/2403.04279) \| [Code](#) |
| Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models | NeurIPS 2023 | [Paper](https://arxiv.org/abs/2305.16322) \| [Code](https://github.com/ShihaoZhaoZSH/Uni-ControlNet) |
| Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling | ICLR 2025 | [Paper](https://arxiv.org/abs/2410.11236) \| [Code](https://github.com/grenoble-zhang/Ctrl-U) |
| Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models | arXiv 2024 | [Paper](https://arxiv.org/abs/2411.07126) \| [Code](https://github.com/NVIDIA/Edify-Image) |
| Generative Modeling by Estimating Gradients of the Data Distribution | NeurIPS 2019 | [Paper](https://arxiv.org/abs/1907.05600) \| [Code](#) |
| IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models | arXiv 2023 | [Paper](https://arxiv.org/abs/2308.06721) \| [Code](https://github.com/tencent-ailab/IP-Adapter) |
| T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion | NeurIPS 2023 | [Paper](https://arxiv.org/abs/2302.08453) \| [Code](https://github.com/TencentARC/T2I-Adapter) |
| Is Noise Conditioning Necessary for Denoising Generative Models? | arXiv 2025 | [Paper](https://arxiv.org/abs/2502.13129) \| [Code](#) |
| Layout-to-Image Generation with Localized Descriptions using ControlNet with Cross-Attention Guidance | WACV 2024 | [Paper](https://arxiv.org/abs/2402.13404) \| [Code](#) |
| More Control for Free! Image Synthesis with Semantic Diffusion Guidance | WACV 2023 | [Paper](https://arxiv.org/abs/2112.05744) \| [Code](https://xh-liu.github.io/sdg/) |
| MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation | CVPR 2023 | [Paper](https://arxiv.org/abs/2302.08113) \| [Code](#) |
| Rectified Diffusion Guidance for Conditional Generation | CVPR 2025 | [Paper](https://arxiv.org/abs/2410.18737) \| [Code](#) |
| Sketch-Guided Text-to-Image Diffusion Models | arXiv 2022 | [Paper](https://arxiv.org/abs/2211.13752) \| [Code](https://github.com/ogkalu2/Sketch-Guided-Text-To-Image-Diffusion) |

> 🧠 These papers push the boundary of **how we guide generation**, whether through minimal prompts, learned adapters, or uncertainty-aware mechanisms.
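
For readers who want a concrete entry point, the snippet below is a minimal sketch of ControlNet-style spatial conditioning using the Hugging Face `diffusers` library. The model IDs, the Canny edge condition, the file paths, and the hyperparameters are illustrative assumptions, not settings prescribed by any paper above.

```python
# Minimal ControlNet-style conditioning sketch with Hugging Face diffusers.
# Requires: diffusers, transformers, torch, opencv-python, pillow.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# 1. Build a spatial condition image (here: Canny edges of a reference photo).
reference = np.array(Image.open("reference.png").convert("RGB"))  # placeholder path
edges = cv2.Canny(reference, 100, 200)
condition = Image.fromarray(np.stack([edges] * 3, axis=-1))

# 2. Load a ControlNet branch and attach it to a frozen text-to-image backbone.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")  # assumes a CUDA GPU is available

# 3. The text prompt decides content; the condition image constrains structure.
image = pipe(
    "a watercolor painting of a house by a lake",
    image=condition,
    num_inference_steps=30,
).images[0]
image.save("controlled_output.png")
```

Adapter-style methods (e.g., T2I-Adapter, IP-Adapter) follow the same pattern but inject the condition through lightweight modules rather than a full copy of the UNet encoder.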

---

## 👁️ Attention & Interpretability

| Paper | Venue | Links |
|-------|-------|-------|
| ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features | ICML 2025 (Oral) | [Paper](https://arxiv.org/abs/2502.04320) \| [Code](https://github.com/helblazer811/ConceptAttention) |
| ToMA: Token Merge with Attention for Diffusion Models | ICML 2025 (Poster) | [Paper](https://openreview.net/forum?id=xhtqgW5b93) \| [Code](#) |
| What the DAAM: Interpreting Stable Diffusion Using Cross Attention | ACL 2024 (Oral) | [Paper](https://arxiv.org/abs/2302.12243) \| [Code](https://github.com/zhudilin/DAAM) |
| Attention Distillation: A Unified Approach to Visual Characteristics Transfer | CVPR 2025 | [Paper](https://arxiv.org/abs/2502.20235) \| [Code](https://github.com/xugao97/AttentionDistillation) |

> 🔬 Interpretability is **not just analysis** — it's a step toward **transparent and editable generative pipelines**.
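
To make the intuition behind cross-attention attribution (e.g., DAAM-style heatmaps) concrete, here is a toy, dependency-light sketch that maps one text token's attention weights back onto the spatial grid. The tensor shapes, the 8x upsampling factor, and the function name are assumptions for illustration and do not mirror the interface of any specific codebase.

```python
# Toy sketch: turn cross-attention weights into a per-token spatial heatmap,
# in the spirit of DAAM-style attribution. Shapes and names are illustrative.
import torch
import torch.nn.functional as F

def cross_attention_heatmap(image_feats, text_feats, token_index, spatial_size):
    """image_feats: [N_img, d] flattened spatial features (queries).
    text_feats:  [N_txt, d] text token embeddings (keys).
    Returns an upsampled map of where the chosen text token attends."""
    d = image_feats.shape[-1]
    scores = image_feats @ text_feats.T / d**0.5   # [N_img, N_txt]
    attn = scores.softmax(dim=-1)                  # normalize over text tokens
    heat = attn[:, token_index]                    # attention column for one token
    h, w = spatial_size
    heat = heat.reshape(1, 1, h, w)
    heat = F.interpolate(heat, scale_factor=8, mode="bilinear", align_corners=False)
    return heat.squeeze()

# Example with random features: a 16x16 latent grid, 77 text tokens, dim 64.
img = torch.randn(16 * 16, 64)
txt = torch.randn(77, 64)
heatmap = cross_attention_heatmap(img, txt, token_index=5, spatial_size=(16, 16))
print(heatmap.shape)  # torch.Size([128, 128])
```

In real pipelines the same idea is applied to attention maps captured from every cross-attention layer and timestep, typically aggregated before visualization.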

---

## 🎛️ Frequency Domain Control

| Paper | Venue | Links |
|-------|-------|-------|
| DiffFNO: Diffusion Fourier Neural Operator for Arbitrary-Scale Super-Resolution | CVPR 2025 (Oral) | [Paper](https://arxiv.org/abs/2411.09911) \| [Code](#) |
| Frequency Autoregressive Image Generation with Continuous Tokens | arXiv 2025 | [Paper](https://arxiv.org/abs/2503.05305) \| [Code](https://github.com/yuhuUSTC/FAR) |
| Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation | AAAI 2024 | [Paper](https://arxiv.org/abs/2407.03006) \| [Code](https://github.com/XiangGao1102/FCDiffusion) |
| FreeU: Free Lunch in Diffusion U-Net | CVPR 2024 | [Paper](https://arxiv.org/abs/2309.11497) \| [Code](https://github.com/ChenyangSi/FreeU) |
| FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models | ECCV 2024 (Poster) | [Paper](https://arxiv.org/abs/2404.11895) \| [Code](https://github.com/thermal-dynamics/freediff) |
| PTDiffusion: Free Lunch for Generating Optical Illusion Hidden Pictures with Phase-Transferred Diffusion | CVPR 2025 (Poster) | [Paper](https://arxiv.org/abs/2503.06186) \| [Code](https://github.com/XiangGao1102/PTDiffusion) |
| ResDiff: Combining CNN and Diffusion Model for Image Super-Resolution | AAAI 2023 | [Paper](https://arxiv.org/abs/2303.08714) \| [Code](https://github.com/LYL1015/ResDiff) |
| Diffusion-based Adversarial Purification from the Perspective of the Frequency Domain | ICML 2025 (Spotlight Poster) | [Paper](https://arxiv.org/abs/2505.01267) \| [Code](#) |

> 📡 Spectral and signal-level control provides **low-level but powerful levers** for generative consistency, resolution, and robustness.
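
As a minimal illustration of the spectral manipulation these works build on, the sketch below splits a tensor into low- and high-frequency bands with an FFT and reweights them, loosely in the spirit of FreeU/FreeDiff-style frequency modulation. The radial cutoff and gain values are arbitrary assumptions; the actual papers choose them per layer, per timestep, or adaptively.

```python
# Minimal sketch of frequency-band reweighting on a feature map or latent.
# Cutoff and gains are arbitrary; real methods tune them per layer/timestep.
import torch

def reweight_frequencies(x, cutoff=0.25, low_gain=1.2, high_gain=0.8):
    """x: [C, H, W] real tensor. Scales low and high spatial frequencies separately."""
    _, H, W = x.shape
    spec = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))  # centered spectrum

    # Radial low-pass mask: 1 inside a disc around the DC component, 0 outside.
    yy = torch.arange(H).view(-1, 1) - H / 2
    xx = torch.arange(W).view(1, -1) - W / 2
    radius = torch.sqrt(yy**2 + xx**2)
    low_mask = (radius <= cutoff * min(H, W) / 2).float()

    # Boost the low band, damp the high band, then transform back.
    spec = spec * (low_gain * low_mask + high_gain * (1 - low_mask))
    return torch.fft.ifft2(torch.fft.ifftshift(spec, dim=(-2, -1))).real

x = torch.randn(4, 64, 64)   # e.g. a latent or intermediate UNet feature map
y = reweight_frequencies(x)
print(y.shape)               # torch.Size([4, 64, 64])
```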

---

## 🔁 Alignment & Knowledge Transfer

| Paper | Venue | Links |
|-------|-------|-------|
| When Model Knowledge meets Diffusion: Data-free Synthesis with Domain-Class Alignment | ICML 2025 | [Paper](https://arxiv.org/abs/2506.15381) \| [Code](#) |

> 🧬 These works align **discrete symbolic knowledge** with **continuous generative priors**, aiming for controllability in low-data or zero-shot regimes.

---

Finally, the list also covers some notable **image-to-image (I2I) editing** methods.

## 🧑‍🎨 Image Editing

| Paper | Venue | Links |
|-------|-------|-------|
| AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea | CVPR 2025 (Oral) | [Paper](https://arxiv.org/abs/2411.15738) \| [Code](https://github.com/DCDmllm/AnyEdit) |
| Diffusion Model-Based Image Editing: A Survey | arXiv 2024 | [Paper](https://arxiv.org/abs/2402.17525) \| [Code](https://github.com/SiatMMLab/Awesome-Diffusion-Model-Based-Image-Editing-Methods) |
| Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code | ICLR 2024 | [Paper](https://arxiv.org/abs/2310.01506) \| [Code](https://github.com/cure-lab/PnPInversion) |
| Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation | CVPR 2023 (Poster) | [Paper](https://arxiv.org/abs/2211.12572) \| [Code](https://github.com/MichalGeyer/plug-and-play) |
| Prompt-to-Prompt Image Editing with Cross Attention Control | ICLR 2023 | [Paper](https://arxiv.org/abs/2208.01626) \| [Code](https://github.com/google/prompt-to-prompt) |
| RePaint: Inpainting using Denoising Diffusion Probabilistic Models | CVPR 2022 | [Paper](https://arxiv.org/abs/2201.09865) \| [Code](https://github.com/andreas128/RePaint) |
| Zero-shot Image-to-Image Translation (pix2pix-zero) | SIGGRAPH 2023 | [Paper](https://arxiv.org/abs/2302.03027) \| [Code](https://github.com/pix2pixzero/pix2pix-zero) |

> ✏️ Editing is arguably **where controllability matters most** — precision, structure preservation, and user intent must all align.

---

## 📬 Contribute

💡 Know a paper we missed? Working on a new controllable generation method?
Feel free to **submit a pull request** or **open an issue** — contributions are welcome!