├── Images
│   ├── Conditional image synthesis with diffusion model.png
│   ├── Sampling.png
│   ├── U-Net.jpg
│   ├── conditional image synthesis tasks.png
│   ├── workflow.pdf
│   └── workflow.png
├── LICENSE
├── README.md
└── media
    ├── License-MIT-green.svg
    └── badge.svg

/Images/Conditional image synthesis with diffusion model.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zju-pi/Awesome-Conditional-Diffusion-Models/c4f15eece706ad9f2dcf888ad5ff9159305c9945/Images/Conditional image synthesis with diffusion model.png
--------------------------------------------------------------------------------
/Images/Sampling.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zju-pi/Awesome-Conditional-Diffusion-Models/c4f15eece706ad9f2dcf888ad5ff9159305c9945/Images/Sampling.png
--------------------------------------------------------------------------------
/Images/U-Net.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zju-pi/Awesome-Conditional-Diffusion-Models/c4f15eece706ad9f2dcf888ad5ff9159305c9945/Images/U-Net.jpg
--------------------------------------------------------------------------------
/Images/conditional image synthesis tasks.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zju-pi/Awesome-Conditional-Diffusion-Models/c4f15eece706ad9f2dcf888ad5ff9159305c9945/Images/conditional image synthesis tasks.png
--------------------------------------------------------------------------------
/Images/workflow.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zju-pi/Awesome-Conditional-Diffusion-Models/c4f15eece706ad9f2dcf888ad5ff9159305c9945/Images/workflow.pdf
--------------------------------------------------------------------------------
/Images/workflow.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/zju-pi/Awesome-Conditional-Diffusion-Models/c4f15eece706ad9f2dcf888ad5ff9159305c9945/Images/workflow.png
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2024 ZJU Probabilistic Intelligence Lab

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# A Survey on Conditional Image Synthesis with Diffusion Models

[![Awesome](media/badge.svg)](https://github.com/zju-pi/Awesome-Conditional-Diffusion-Models/tree/main)
[![License: MIT](media/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![visitors](https://visitor-badge.laobi.icu/badge?page_id=zju-pi/Awesome-Conditional-Diffusion-Models/tree/main)](https://visitor-badge.laobi.icu/badge?page_id=zju-pi/Awesome-Conditional-Diffusion-Models/tree/main)

This repository accompanies our survey [Conditional Image Synthesis with Diffusion Models: A Survey](https://arxiv.org/pdf/2409.19365).

Zheyuan Zhan, Defang Chen, Jian-Ping Mei, Zhenghe Zhao, Jiawei Chen, Chun Chen, Siwei Lyu, Fellow, IEEE, and Can Wang

## Abstract

Conditional image synthesis based on user-specified requirements is a key component in creating complex visual content. In recent years, diffusion-based generative modeling has become a highly effective approach to conditional image synthesis, leading to exponential growth in the literature. However, the complexity of diffusion-based modeling, the wide range of image synthesis tasks, and the diversity of conditioning mechanisms make it challenging for researchers to keep up with rapid developments and to understand the core concepts of this topic. In this survey, we categorize existing works based on how conditions are integrated into the two fundamental components of diffusion-based modeling, i.e., the denoising network and the sampling process. We specifically highlight the underlying principles, advantages, and potential challenges of various conditioning approaches in the training, re-purposing, and specialization stages of constructing a desired denoising network. We also summarize six mainstream conditioning mechanisms in the essential sampling process. All discussions are centered around popular applications. Finally, we pinpoint some critical yet still open problems and suggest possible solutions.

## News!

📆 2024-10-05: Our comprehensive survey paper, summarizing related methods published before October 1, 2024, is now available.

📆 2025-04-27: Our paper has been accepted by TMLR!!!

## BibTeX

```bibtex
@article{zhan2024conditional,
  title={Conditional Image Synthesis with Diffusion Models: A Survey},
  author={Zhan, Zheyuan and Chen, Defang and Mei, Jian-Ping and Zhao, Zhenghe and Chen, Jiawei and Chen, Chun and Lyu, Siwei and Wang, Can},
  journal={arXiv preprint arXiv:2409.19365},
  year={2024}
}
```

## Contents

- [Overview](#Overview)
  - [Paper Structure](#Paper-Structure)
  - [Conditional image synthesis tasks](#Conditional-image-synthesis-tasks)
- [Papers](#Papers)
  - [Condition Integration in Denoising Networks](#condition-integration-in-denoising-networks)
    - [Condition Integration in the Training Stage](#Condition-Integration-in-the-Training-Stage)
      - [Conditional models for text-to-image (T2I)](#Conditional-models-for-text-to-image-(T2I))
      - [Conditional Models for Image Restoration](#Conditional-Models-for-Image-Restoration)
      - [Conditional Models for Other Synthesis Scenarios](#Conditional-Models-for-Other-Synthesis-Scenarios)
    - [Condition Integration in the Re-purposing Stage](#Condition-Integration-in-the-Re-purposing-Stage)
      - [Re-purposed Conditional Encoders](#Re-purposed-Conditional-Encoders)
      - [Condition Injection](#Condition-Injection)
      - [Backbone Fine-tuning](#Backbone-Fine-tuning)
    - [Condition Integration in the Specialization Stage](#Condition-Integration-in-the-Specialization-Stage)
      - [Conditional Projection](#Conditional-Projection)
      - [Testing-time Model Fine-Tuning](#Testing-time-Model-Fine-Tuning)
  - [Condition Integration in the Sampling Process](#Condition-Integration-in-the-Sampling-Process)
    - [Inversion](#Inversion)
    - [Attention Manipulation](#Attention-Manipulation)
    - [Noise Blending](#Noise-Blending)
    - [Revising Diffusion Process](#Revising-Diffusion-Process)
    - [Guidance](#Guidance)
    - [Conditional Correction](#Conditional-Correction)

# Overview

The two figures below illustrate, respectively, the taxonomy of diffusion-based conditional image synthesis (DCIS) used in this survey and the categorization of conditional image synthesis tasks.

## Paper Structure

![Conditional image synthesis with diffusion model](https://github.com/zju-pi/Awesome-Conditional-Diffusion-Models/blob/main/Images/Conditional%20image%20synthesis%20with%20diffusion%20model.png)

## Conditional image synthesis tasks

![tasks](https://github.com/zju-pi/Awesome-Conditional-Diffusion-Models/blob/main/Images/conditional%20image%20synthesis%20tasks.png)

# Papers

The date in each table is the publication date of the first version of the paper on arXiv.

## DDPM denoising network

![U-Net](https://github.com/zju-pi/Awesome-Conditional-Diffusion-Models/blob/main/Images/U-Net.jpg)

## Condition Integration in Denoising Networks

The figure below shows an exemplar workflow for building a desired denoising network for conditional synthesis tasks, including text-to-image, visual signal to image, and customization, through the three condition integration stages.

![Workflow](https://github.com/zju-pi/Awesome-Conditional-Diffusion-Models/blob/main/Images/workflow.png)
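
For orientation, here is a minimal PyTorch sketch (ours, not taken from the survey) of the training-stage recipe that the tables below build on: the denoising network learns to predict the injected noise while consuming a condition embedding, typically through cross-attention layers. The `denoiser` callable and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def conditional_ddpm_step(denoiser, x0, cond_emb, alphas_cumprod):
    """One conditional DDPM training step: predict the injected noise.

    denoiser(x_t, t, cond_emb) -> predicted noise; in practice a U-Net or
    DiT that attends to cond_emb via cross-attention.
    """
    b = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    # Forward process: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    return F.mse_loss(denoiser(x_t, t, cond_emb), noise)
```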

### Condition Integration in the Training Stage

#### Conditional models for text-to-image (T2I)

| Title | Task | Date | Publication |
| ------------------------------------------------------------ | ------------- | ------- | ----------- |
| [**Vector quantized diffusion model for text-to-image synthesis**](https://arxiv.org/abs/2111.14822) | Text-to-image | 2021.11 | CVPR2022 |
| [**High-resolution image synthesis with latent diffusion models**](https://arxiv.org/abs/2112.10752) | Text-to-image | 2021.12 | CVPR2022 |
| [**GLIDE: towards photorealistic image generation and editing with text-guided diffusion models**](https://arxiv.org/abs/2112.10741) | Text-to-image | 2021.12 | ICML2022 |
| [**Hierarchical text-conditional image generation with CLIP latents**](https://arxiv.org/abs/2204.06125) | Text-to-image | 2022.4 | ARXIV2022 |
| [**Photorealistic text-to-image diffusion models with deep language understanding**](https://arxiv.org/abs/2205.11487) | Text-to-image | 2022.5 | NeurIPS2022 |
| [**eDiff-I: Text-to-image diffusion models with an ensemble of expert denoisers**](https://arxiv.org/abs/2211.01324) | Text-to-image | 2022.11 | ARXIV2022 |
| [**PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis**](https://arxiv.org/abs/2310.00426) | Text-to-image | 2023.10 | ICLR2024 |
| [**Scaling Rectified Flow Transformers for High-Resolution Image Synthesis**](https://arxiv.org/abs/2403.03206) | Text-to-image | 2024.3 | ICML2024 |

#### Conditional Models for Image Restoration

| Title | Task | Date | Publication |
| ------------------------------------------------------------ | ----------------- | ------- | ------------------ |
| [**Srdiff: Single image super-resolution with diffusion probabilistic models**](https://arxiv.org/abs/2104.14951) | Image restoration | 2021.4 | Neurocomputing2022 |
| [**Image super-resolution via iterative refinement**](https://arxiv.org/abs/2104.07636) | Image restoration | 2021.4 | TPAMI2022 |
| [**Cascaded diffusion models for high fidelity image generation**](https://arxiv.org/abs/2106.15282) | Image restoration | 2021.5 | JMLR2022 |
| [**Palette: Image-to-image diffusion models**](https://arxiv.org/abs/2111.05826) | Image restoration | 2021.11 | SIGGRAPH2022 |
| [**Denoising diffusion probabilistic models for robust image super-resolution in the wild**](https://arxiv.org/abs/2302.07864) | Image restoration | 2023.2 | ARXIV2023 |
| [**Resdiff: Combining cnn and diffusion model for image super-resolution**](https://arxiv.org/abs/2303.08714) | Image restoration | 2023.3 | AAAI2024 |
| [**Low-light image enhancement with wavelet-based diffusion models**](https://arxiv.org/abs/2306.00306) | Image restoration | 2023.6 | TOG2023 |
| [**Wavelet-based fourier information interaction with frequency diffusion adjustment for underwater image restoration**](https://arxiv.org/abs/2311.16845) | Image restoration | 2023.11 | CVPR2024 |
| [**Diffusion-based blind text image super-resolution**](https://arxiv.org/abs/2312.08886) | Image restoration | 2023.12 | CVPR2024 |
| [**Low-light image enhancement via clip-fourier guided wavelet diffusion**](https://arxiv.org/abs/2401.03788) | Image restoration | 2024.1 | ARXIV2024 |
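
Many of the restoration models above (SR3, i.e., "Image super-resolution via iterative refinement", and Palette are the clearest examples) integrate the condition simply by channel-concatenating the degraded observation with the noisy image at every denoising step. A toy sketch of that design, with an illustrative module rather than any paper's actual architecture (timestep embedding omitted for brevity):

```python
import torch
import torch.nn as nn

class ConcatConditionedDenoiser(nn.Module):
    """Toy SR3/Palette-style denoiser: the degraded image y is
    channel-concatenated with the noisy image x_t at every step."""
    def __init__(self, channels=3, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, channels, 3, padding=1),  # predicts noise
        )

    def forward(self, x_t, y):
        # Condition enters as extra input channels, not as an embedding
        return self.net(torch.cat([x_t, y], dim=1))
```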

#### Conditional Models for Other Synthesis Scenarios

| Title | Task | Date | Publication |
| ------------------------------------------------------------ | ------------------------- | ------- | ------------------------ |
| [**Diffusion autoencoders: Toward a meaningful and decodable representation**](https://arxiv.org/abs/2111.15640) | Novel conditional control | 2021.11 | CVPR2022 |
| [**Semantic image synthesis via diffusion models**](https://arxiv.org/abs/2207.00050) | Visual feature map | 2022.6 | ARXIV2022 |
| [**A novel unified conditional score-based generative framework for multi-modal medical image completion**](https://arxiv.org/abs/2207.03430) | Medical image synthesis | 2022.7 | ARXIV2022 |
| [**A morphology focused diffusion probabilistic model for synthesis of histopathology images**](https://arxiv.org/abs/2209.13167) | Medical image synthesis | 2022.9 | WACV2023 |
| [**Humandiffusion: a coarse-to-fine alignment diffusion framework for controllable text-driven person image generation**](https://arxiv.org/abs/2211.06235) | Visual signal to image | 2022.11 | ARXIV2022 |
| [**Diffusion-based scene graph to image generation with masked contrastive pre-training**](https://arxiv.org/abs/2211.11138) | Graph to image | 2022.11 | ARXIV2022 |
| [**Dolce: A model-based probabilistic diffusion framework for limited-angle ct reconstruction**](https://arxiv.org/abs/2211.12340) | Medical image synthesis | 2022.11 | ICCV2023 |
| [**Zero-shot medical image translation via frequency-guided diffusion models**](https://arxiv.org/abs/2304.02742) | Image editing | 2023.4 | Trans. Med. Imaging 2023 |
| [**Learned representation-guided diffusion models for large-image generation**](https://arxiv.org/abs/2312.07330) | / | 2023.12 | ARXIV2023 |

### Condition Integration in the Re-purposing Stage

#### Re-purposed Conditional Encoders

| Title | Task | Date | Publication |
| ------------------------------------------------------------ | -------------------------------- | ------- | ------------ |
| [**Pretraining is all you need for image-to-image translation**](https://arxiv.org/abs/2205.12952) | Visual signal to image | 2022.5 | ARXIV2022 |
| [**T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models**](https://arxiv.org/abs/2302.08453) | Visual signal to image | 2023.2 | AAAI2024 |
| [**Adding conditional control to text-to-image diffusion models**](https://arxiv.org/abs/2302.05543) | Visual signal to image | 2023.2 | ICCV2023 |
| [**Encoder-based domain tuning for fast personalization of text-to-image models**](https://arxiv.org/abs/2302.12228) | Customization | 2023.2 | TOG2023 |
| [**Pair-diffusion: Object-level image editing with structure-and-appearance paired diffusion models**](https://arxiv.org/abs/2303.17546v1) | Image editing, Image composition | 2023.3 | ARXIV2023 |
| [**Taming encoder for zero fine-tuning image customization with text-to-image diffusion models**](https://arxiv.org/abs/2304.02642) | Customization | 2023.4 | ARXIV2023 |
| [**Instantbooth: Personalized text-to-image generation without test-time finetuning**](https://arxiv.org/abs/2304.03411) | Customization | 2023.4 | CVPR2024 |
| [**Blip-diffusion: pre-trained subject representation for controllable text-to-image generation and editing**](https://arxiv.org/abs/2305.14720) | Customization | 2023.5 | NeurIPS2023 |
| [**Fastcomposer: Tuning-free multi-subject image generation with localized attention**](https://arxiv.org/abs/2305.10431) | Customization | 2023.5 | ARXIV2023 |
| [**Prompt-free diffusion: Taking "text" out of text-to-image diffusion models**](https://arxiv.org/abs/2305.16223) | Visual signal to image | 2023.5 | CVPR2024 |
| [**Paste, inpaint and harmonize via denoising: Subject-driven image editing with pre-trained diffusion model**](https://arxiv.org/abs/2306.07596) | Image composition | 2023.6 | ARXIV2023 |
| [**Subject-diffusion: Open domain personalized text-to-image generation without test-time fine-tuning**](https://arxiv.org/abs/2307.11410) | Customization, Layout control | 2023.7 | SIGGRAPH2024 |
| [**Imagebrush: Learning visual in-context instructions for exemplar-based image manipulation**](https://arxiv.org/abs/2308.00906) | Image editing | 2023.8 | NeurIPS2023 |
| [**Guiding instruction-based image editing via multimodal large language models**](https://arxiv.org/abs/2309.17102) | Image editing | 2023.9 | ARXIV2023 |
| [**Ranni: Taming text-to-image diffusion for accurate instruction following**](https://arxiv.org/abs/2311.17002) | Image editing | 2023.11 | ARXIV2023 |
| [**Smartedit: Exploring complex instruction-based image editing with multimodal large language models**](https://arxiv.org/abs/2312.06739) | Image editing | 2023.12 | ARXIV2023 |
| [**Instructany2pix: Flexible visual editing via multimodal instruction following**](https://arxiv.org/abs/2312.06738) | Image editing | 2023.12 | ARXIV2023 |
| [**Warpdiffusion: Efficient diffusion model for high-fidelity virtual try-on**](https://arxiv.org/abs/2312.03667) | Image composition | 2023.12 | ARXIV2023 |
| [**Coarse-to-fine latent diffusion for pose-guided person image synthesis**](https://arxiv.org/abs/2402.18078) | Customization | 2024.2 | CVPR2024 |
| [**Lightit: Illumination modeling and control for diffusion models**](https://arxiv.org/abs/2403.10615) | Visual signal to image | 2024.3 | CVPR2024 |
| [**Face2diffusion for fast and editable face personalization**](https://arxiv.org/abs/2403.05094) | Customization | 2024.3 | CVPR2024 |
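
ControlNet ("Adding conditional control to text-to-image diffusion models") and T2I-Adapter above keep the pre-trained backbone frozen and train a new encoder branch on top of it. The sketch below captures the core ControlNet recipe as we read it: a trainable copy of the encoder blocks whose outputs re-enter the frozen U-Net through zero-initialized convolutions, so training starts exactly from the frozen model's behavior. Toy modules throughout; this is not the released implementation.

```python
import copy
import torch.nn as nn

def zero_conv(channels):
    conv = nn.Conv2d(channels, channels, 1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv  # outputs zero at init, so the branch is initially a no-op

class ControlBranch(nn.Module):
    """Trainable copy of the backbone's encoder blocks (the originals stay
    frozen); residuals flow back through zero-initialized 1x1 convs."""
    def __init__(self, encoder_blocks, channels):
        super().__init__()
        self.blocks = copy.deepcopy(encoder_blocks)  # trainable copy
        self.zero_convs = nn.ModuleList(zero_conv(channels) for _ in self.blocks)

    def forward(self, h, control):
        h = h + control  # injected spatial condition (e.g., edge-map features)
        residuals = []
        for block, zc in zip(self.blocks, self.zero_convs):
            h = block(h)
            residuals.append(zc(h))  # to be added to the frozen U-Net's features
        return residuals
```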

#### Condition Injection

| Title | Task | Date | Publication |
| ------------------------------------------------------------ | ------------------------------------- | ------- | ----------- |
| [**GLIGEN: open-set grounded text-to-image generation**](https://arxiv.org/abs/2301.07093) | Layout control | 2023.1 | CVPR2023 |
| [**Elite: Encoding visual concepts into textual embeddings for customized text-to-image generation**](https://arxiv.org/abs/2302.13848) | Customization | 2023.2 | CVPR2023 |
| [**Mix-of-show: Decentralized low-rank adaptation for multi-concept customization of diffusion models**](https://arxiv.org/abs/2305.18292) | Customization | 2023.5 | NeurIPS2023 |
| [**Dragondiffusion: Enabling drag-style manipulation on diffusion models**](https://arxiv.org/abs/2307.02421) | Image editing | 2023.7 | ICLR2024 |
| [**Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models**](https://arxiv.org/abs/2308.06721) | Visual signal to image, Image editing | 2023.8 | ARXIV2023 |
| [**Interactdiffusion: Interaction control in text-to-image diffusion models**](https://arxiv.org/abs/2312.05849) | Layout control | 2023.12 | ARXIV2023 |
| [**Instancediffusion: Instance-level control for image generation**](https://arxiv.org/abs/2402.03290) | Layout control | 2024.2 | CVPR2024 |
| [**Deadiff: An efficient stylization diffusion model with disentangled representations**](https://arxiv.org/abs/2403.06951) | Image editing | 2024.3 | CVPR2024 |
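
A representative condition-injection pattern from this table is IP-Adapter's decoupled cross-attention: a second, trainable key/value projection attends over image-prompt tokens, and its output is added to that of the frozen text cross-attention. A minimal single-head sketch with illustrative shapes (no multi-head reshaping, not the released code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoupledCrossAttention(nn.Module):
    """Frozen text cross-attention plus a new trainable branch for
    condition tokens (IP-Adapter-style); the two outputs are summed."""
    def __init__(self, dim, cond_scale=1.0):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_kv_text = nn.Linear(dim, 2 * dim)  # frozen in practice
        self.to_kv_cond = nn.Linear(dim, 2 * dim)  # newly trained
        self.cond_scale = cond_scale

    @staticmethod
    def attend(q, kv):
        k, v = kv.chunk(2, dim=-1)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v

    def forward(self, hidden, text_tokens, cond_tokens):
        q = self.to_q(hidden)
        out = self.attend(q, self.to_kv_text(text_tokens))
        return out + self.cond_scale * self.attend(q, self.to_kv_cond(cond_tokens))
```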

#### Backbone Fine-tuning

| Title | Task | Date | Publication |
| ------------------------------------------------------------ | ----------------- | ------- | ----------- |
| [**Instructpix2pix: Learning to follow image editing instructions**](https://arxiv.org/abs/2211.09800) | Image editing | 2022.11 | CVPR2023 |
| [**Paint by example: Exemplar-based image editing with diffusion models**](https://arxiv.org/abs/2211.13227) | Image composition | 2022.11 | CVPR2023 |
| [**Objectstitch: Object compositing with diffusion model**](https://arxiv.org/abs/2212.00932v1) | Image composition | 2022.12 | CVPR2023 |
| [**Smartbrush: Text and shape guided object inpainting with diffusion model**](https://arxiv.org/abs/2212.05034) | Image restoration | 2022.12 | CVPR2023 |
| [**Imagen editor and editbench: Advancing and evaluating text-guided image inpainting**](https://arxiv.org/abs/2212.06909) | Image restoration | 2022.12 | CVPR2023 |
| [**Dialogpaint: A dialog-based image editing model**](https://arxiv.org/abs/2303.10073) | Image editing | 2023.3 | ARXIV2023 |
| [**Hive: Harnessing human feedback for instructional visual editing**](https://arxiv.org/abs/2303.09618) | Image editing | 2023.3 | CVPR2024 |
| [**Reference-based image composition with sketch via structure-aware diffusion model**](https://arxiv.org/abs/2304.09748) | Image composition | 2023.4 | ARXIV2023 |
| [**Inst-inpaint: Instructing to remove objects with diffusion models**](https://arxiv.org/abs/2304.03246) | Image editing | 2023.4 | ARXIV2023 |
| [**Text-to-image editing by image information removal**](https://arxiv.org/abs/2305.17489) | Image editing | 2023.5 | WACV2024 |
| [**Magicbrush: A manually annotated dataset for instruction-guided image editing**](https://arxiv.org/abs/2306.10012) | Image editing | 2023.6 | NeurIPS2023 |
| [**Anydoor: Zero-shot object-level image customization**](https://arxiv.org/abs/2307.09481) | Image composition | 2023.7 | CVPR2024 |
| [**Instructdiffusion: A generalist modeling interface for vision tasks**](https://arxiv.org/abs/2309.03895) | Image editing | 2023.9 | ARXIV2023 |
| [**Emu edit: Precise image editing via recognition and generation tasks**](https://arxiv.org/abs/2311.10089) | Image editing | 2023.11 | CVPR2024 |
| [**Dreaminpainter: Text-guided subject-driven image inpainting with diffusion models**](https://arxiv.org/abs/2312.03771) | Image composition | 2023.12 | ARXIV2023 |

### Condition Integration in the Specialization Stage

#### Conditional Projection

| Title | Task | Date | Publication |
| ------------------------------------------------------------ | ------------- | ------- | ----------- |
| [**An image is worth one word: Personalizing text-to-image generation using textual inversion**](https://arxiv.org/abs/2208.01618) | Customization | 2022.8 | ICLR2023 |
| [**Imagic: Text-based real image editing with diffusion models**](https://arxiv.org/abs/2210.09276) | Image editing | 2022.10 | CVPR2023 |
| [**Uncovering the disentanglement capability in text-to-image diffusion models**](https://arxiv.org/abs/2212.08698) | Image editing | 2022.12 | CVPR2023 |
| [**Preditor: Text guided image editing with diffusion prior**](https://arxiv.org/abs/2302.07979) | Image editing | 2023.2 | ARXIV2023 |
| [**iedit: Localised text-guided image editing with weak supervision**](https://arxiv.org/abs/2305.05947) | Image editing | 2023.5 | CVPR2024 |
| [**Forgedit: Text guided image editing via learning and forgetting**](https://arxiv.org/abs/2309.10556) | Image editing | 2023.9 | ARXIV2023 |
| [**Prompting hard or hardly prompting: Prompt inversion for text-to-image diffusion models**](https://arxiv.org/abs/2312.12416) | Image editing | 2023.12 | CVPR2024 |
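
Conditional projection methods such as Textual Inversion (first row above) leave the network untouched and instead optimize a representation of the condition, e.g., a new token embedding, against the ordinary denoising loss on a handful of reference images. A schematic loop; `frozen_diffusion_loss` is an assumed stand-in for the full frozen text-encoder-plus-denoiser pipeline:

```python
import torch

# Illustrative setup: a 768-dim embedding for one new pseudo-token
new_token_emb = torch.zeros(768, requires_grad=True)
optimizer = torch.optim.AdamW([new_token_emb], lr=5e-3)

def projection_step(frozen_diffusion_loss, images):
    """One conditional-projection step: only the token embedding is
    updated; the denoising network and text encoder stay frozen."""
    optimizer.zero_grad()
    loss = frozen_diffusion_loss(images, new_token_emb)  # usual noise-prediction loss
    loss.backward()
    optimizer.step()
    return loss.item()
```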

#### Testing-time Model Fine-Tuning

| Title | Task | Date | Publication |
| ------------------------------------------------------------ | ------------- | ------- | ----------------- |
| [**Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation**](https://arxiv.org/abs/2208.12242) | Customization | 2022.8 | CVPR2023 |
| [**Imagic: Text-based real image editing with diffusion models**](https://arxiv.org/abs/2210.09276) | Image editing | 2022.10 | CVPR2023 |
| [**Unitune: Text-driven image editing by fine tuning a diffusion model on a single image**](https://arxiv.org/abs/2210.09477) | Image editing | 2022.10 | TOG2023 |
| [**Multi-concept customization of text-to-image diffusion**](https://arxiv.org/abs/2212.04488) | Customization | 2022.12 | CVPR2023 |
| [**Sine: Single image editing with text-to-image diffusion models**](https://arxiv.org/abs/2212.04489) | Image editing | 2022.12 | CVPR2023 |
| [**Encoder-based domain tuning for fast personalization of text-to-image models**](https://arxiv.org/abs/2302.12228) | Customization | 2023.2 | TOG2023 |
| [**Svdiff: Compact parameter space for diffusion fine-tuning**](https://arxiv.org/abs/2303.11305) | Customization | 2023.3 | ICCV2023 |
| [**Cones: concept neurons in diffusion models for customized generation**](https://arxiv.org/abs/2303.05125) | Customization | 2023.3 | ICML2023 |
| [**Custom-edit: Text-guided image editing with customized diffusion models**](https://arxiv.org/abs/2305.15779) | Customization | 2023.5 | ARXIV2023 |
| [**Mix-of-show: Decentralized low-rank adaptation for multi-concept customization of diffusion models**](https://arxiv.org/abs/2305.18292) | Customization | 2023.5 | NeurIPS2023 |
| [**Layerdiffusion: Layered controlled image editing with diffusion models**](https://arxiv.org/abs/2305.18676) | Image editing | 2023.5 | SIGGRAPH Asia2023 |
| [**Cones 2: Customizable image synthesis with multiple subjects**](https://arxiv.org/abs/2305.19327) | Customization | 2023.5 | NeurIPS2023 |

## Condition Integration in the Sampling Process

The figure below illustrates the six conditioning mechanisms with an exemplary image editing process.

![Sampling](https://github.com/zju-pi/Awesome-Conditional-Diffusion-Models/blob/main/Images/Sampling.png)
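
As a map for the six tables that follow, the skeleton below (our annotation, not any library's API; `denoiser` and `step_fn` are assumed callables) marks where each mechanism attaches to a plain sampling loop.

```python
import torch

@torch.no_grad()
def conditional_sampling(denoiser, step_fn, x_T, timesteps, cond):
    """Plain sampler annotated with the six conditioning hook points."""
    x = x_T  # Inversion: start instead from the inverted latent of a source image
    for t in timesteps:
        eps = denoiser(x, t, cond)  # Attention manipulation: edit attention maps inside this call
        # Noise blending: combine several predictions here (e.g., conditional/unconditional)
        # Guidance: add a condition-matching gradient to eps (requires enabling grad locally)
        x = step_fn(x, eps, t)      # Revising diffusion process: replace this transition itself
        # Conditional correction: project x onto the constraint (e.g., re-paste known pixels)
    return x
```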

### Inversion

| Title | Task | Date | Publication |
| ------------------------------------------------------------ | ------------------------------------- | ------- | ----------- |
| [**Sdedit: Guided image synthesis and editing with stochastic differential equations**](https://arxiv.org/abs/2108.01073) | Image editing, Visual signal to image | 2021.8 | ICLR2022 |
| [**Dual diffusion implicit bridges for image-to-image translation**](https://arxiv.org/abs/2203.08382) | Image editing, Visual signal to image | 2022.3 | ICLR2023 |
| [**Null-text inversion for editing real images using guided diffusion models**](https://arxiv.org/abs/2211.09794) | Image editing | 2022.11 | CVPR2023 |
| [**Edict: Exact diffusion inversion via coupled transformations**](https://arxiv.org/abs/2211.12446) | Image editing | 2022.11 | CVPR2023 |
| [**A latent space of stochastic diffusion models for zero-shot image editing and guidance**](https://openaccess.thecvf.com/content/ICCV2023/papers/Wu_A_Latent_Space_of_Stochastic_Diffusion_Models_for_Zero-Shot_Image_ICCV_2023_paper.pdf) | Image editing | 2022.11 | ICCV2023 |
| [**Inversion-based style transfer with diffusion models**](https://arxiv.org/abs/2211.13203) | Image editing | 2022.11 | CVPR2023 |
| [**An edit friendly ddpm noise space: Inversion and manipulations**](https://arxiv.org/abs/2304.06140) | Image editing | 2023.4 | ARXIV2023 |
| [**Prompt tuning inversion for text-driven image editing using diffusion models**](https://arxiv.org/abs/2305.04441) | Image editing | 2023.5 | ICCV2023 |
| [**Negative-prompt inversion: Fast image inversion for editing with text-guided diffusion models**](https://arxiv.org/abs/2305.16807) | Image editing | 2023.5 | ARXIV2023 |
| [**Dragdiffusion: Harnessing diffusion models for interactive point-based image editing**](https://arxiv.org/abs/2306.14435) | Image editing | 2023.6 | CVPR2024 |
| [**Tf-icon: Diffusion-based training-free cross-domain image composition**](https://arxiv.org/abs/2307.12493) | Image editing | 2023.7 | ICCV2023 |
| [**Stylediffusion: Controllable disentangled style transfer via diffusion models**](https://arxiv.org/abs/2308.07863) | Image editing | 2023.8 | ICCV2023 |
| [**Kv inversion: Kv embeddings learning for text-conditioned real image action editing**](https://arxiv.org/abs/2309.16608) | Image editing | 2023.9 | PRCV2023 |
| [**Effective real image editing with accelerated iterative diffusion inversion**](https://arxiv.org/abs/2309.04907) | Image editing | 2023.9 | ICCV2023 |
| [**Direct inversion: Boosting diffusion-based editing with 3 lines of code**](https://arxiv.org/abs/2310.01506) | Image editing | 2023.10 | ARXIV2023 |
| [**Ledits++: Limitless image editing using text-to-image models**](https://arxiv.org/abs/2311.16711) | Image editing | 2023.11 | CVPR2024 |
| [**The blessing of randomness: Sde beats ode in general diffusion-based image editing**](https://arxiv.org/abs/2311.01410) | Image editing | 2023.11 | ICLR2024 |
| [**Style injection in diffusion: A training-free approach for adapting large-scale diffusion models for style transfer**](https://arxiv.org/abs/2312.09008) | Image editing | 2023.12 | CVPR2024 |
| [**Fixed-point inversion for text-to-image diffusion models**](https://arxiv.org/abs/2312.12540v1) | Image editing | 2023.12 | ARXIV2023 |
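
Most entries above build on deterministic DDIM inversion: running the DDIM update in reverse to map a real image to a latent that approximately regenerates it. A minimal sketch under the usual epsilon-parameterization, using the common approximation that the noise prediction at step t can be evaluated on the current (less noisy) iterate:

```python
import torch

@torch.no_grad()
def ddim_invert(denoiser, x0, alphas_cumprod, cond):
    """Map an image x0 to a noise latent by running DDIM backwards."""
    x = x0
    for t in range(len(alphas_cumprod)):  # walk up the noise levels
        a_prev = alphas_cumprod[t - 1] if t > 0 else x.new_tensor(1.0)
        a_t = alphas_cumprod[t]
        eps = denoiser(x, t, cond)        # approximation: eps at t, evaluated on x_{t-1}
        x0_pred = (x - (1 - a_prev).sqrt() * eps) / a_prev.sqrt()
        x = a_t.sqrt() * x0_pred + (1 - a_t).sqrt() * eps
    return x
```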

### Attention Manipulation

| Title | Task | Date | Publication |
| ------------------------------------------------------------ | -------------- | ------- | ----------- |
| [**Prompt-to-prompt image editing with cross attention control**](https://arxiv.org/abs/2208.01626) | Image editing | 2022.8 | ICLR2023 |
| [**Plug-and-play diffusion features for text-driven image-to-image translation**](https://arxiv.org/abs/2211.12572) | Image editing | 2022.11 | CVPR2023 |
| [**eDiff-I: Text-to-image diffusion models with an ensemble of expert denoisers**](https://arxiv.org/abs/2211.01324) | Layout control | 2022.11 | ARXIV2022 |
| [**Masactrl: Tuning-free mutual self-attention control for consistent image synthesis and editing**](https://arxiv.org/abs/2304.08465) | Image editing | 2023.4 | ICCV2023 |
| [**Custom-edit: Text-guided image editing with customized diffusion models**](https://arxiv.org/abs/2305.15779) | Customization | 2023.5 | ARXIV2023 |
| [**Cones 2: Customizable image synthesis with multiple subjects**](https://arxiv.org/abs/2305.19327) | Customization | 2023.5 | NeurIPS2023 |
| [**Dragdiffusion: Harnessing diffusion models for interactive point-based image editing**](https://arxiv.org/abs/2306.14435) | Image editing | 2023.6 | CVPR2024 |
| [**Tf-icon: Diffusion-based training-free cross-domain image composition**](https://arxiv.org/abs/2307.12493) | Image editing | 2023.7 | ICCV2023 |
| [**Dragondiffusion: Enabling drag-style manipulation on diffusion models**](https://arxiv.org/abs/2307.02421) | Image editing | 2023.7 | ICLR2024 |
| [**Stylediffusion: Controllable disentangled style transfer via diffusion models**](https://arxiv.org/abs/2308.07863) | Image editing | 2023.8 | ICCV2023 |
| [**Face aging via diffusion-based editing**](https://arxiv.org/abs/2309.11321) | Image editing | 2023.9 | BMVC2023 |
| [**Dynamic prompt learning: Addressing cross-attention leakage for text-based image editing**](https://arxiv.org/abs/2309.15664) | Image editing | 2023.9 | NeurIPS2023 |
| [**Style injection in diffusion: A training-free approach for adapting large-scale diffusion models for style transfer**](https://arxiv.org/abs/2312.09008) | Image editing | 2023.12 | CVPR2024 |
| [**Focus on your instruction: Fine-grained and multi-instruction image editing by attention modulation**](https://arxiv.org/abs/2312.10113) | Image editing | 2023.12 | ARXIV2023 |
| [**Towards understanding cross and self-attention in stable diffusion for text-guided image editing**](https://arxiv.org/abs/2403.03431) | Image editing | 2024.3 | CVPR2024 |
| [**Taming Rectified Flow for Inversion and Editing**](https://arxiv.org/abs/2411.04746) | Image editing | 2024.11 | ARXIV2024 |
| [**Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models**](https://arxiv.org/abs/2411.07232) | Image editing | 2024.11 | ARXIV2024 |

### Noise Blending

| Title | Task | Date | Publication |
| ------------------------------------------------------------ | -------------------------------- | ------- | ----------- |
| [**Compositional visual generation with composable diffusion models**](https://arxiv.org/abs/2206.01714) | General approach | 2022.6 | ECCV2022 |
| [**Classifier-free diffusion guidance**](https://arxiv.org/abs/2207.12598) | / | 2022.7 | ARXIV2022 |
| [**Sine: Single image editing with text-to-image diffusion models**](https://arxiv.org/abs/2212.04489) | Image editing | 2022.12 | CVPR2023 |
| [**Multidiffusion: Fusing diffusion paths for controlled image generation**](https://arxiv.org/abs/2302.08113) | Multiple control | 2023.2 | ICML2023 |
| [**Pair-diffusion: Object-level image editing with structure-and-appearance paired diffusion models**](https://arxiv.org/abs/2303.17546) | Image editing, Image composition | 2023.3 | ARXIV2023 |
| [**Magicfusion: Boosting text-to-image generation performance by fusing diffusion models**](https://arxiv.org/abs/2303.13126) | Image composition | 2023.3 | ICCV2023 |
| [**Effective real image editing with accelerated iterative diffusion inversion**](https://arxiv.org/abs/2309.04907) | Image editing | 2023.9 | ICCV2023 |
| [**Ledits++: Limitless image editing using text-to-image models**](https://arxiv.org/abs/2311.16711) | Image editing | 2023.11 | CVPR2024 |
| [**Noisecollage: A layout-aware text-to-image diffusion model based on noise cropping and merging**](https://arxiv.org/abs/2403.03485) | Image composition | 2024.3 | CVPR2024 |
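
The canonical noise-blending rule is classifier-free guidance (second row above): at every step, the conditional and unconditional noise predictions are linearly combined, with the scale controlling how strongly the condition is enforced. A sketch:

```python
def cfg_noise(denoiser, x_t, t, cond, uncond, guidance_scale=7.5):
    """Classifier-free guidance: blend conditional and unconditional
    noise predictions; scale > 1 strengthens the condition."""
    eps_cond = denoiser(x_t, t, cond)
    eps_uncond = denoiser(x_t, t, uncond)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

The other entries generalize the same idea, blending predictions from multiple prompts, models, or spatial regions instead of just a conditional/unconditional pair.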

### Revising Diffusion Process

| Title | Task | Date | Publication |
| ------------------------------------------------------------ | ----------------- | ------- | ----------- |
| [**Snips: Solving noisy inverse problems stochastically**](https://arxiv.org/abs/2105.14951) | Image restoration | 2021.5 | NeurIPS2021 |
| [**Denoising diffusion restoration models**](https://arxiv.org/abs/2201.11793) | Image restoration | 2022.1 | NeurIPS2022 |
| [**Driftrec: Adapting diffusion models to blind jpeg restoration**](https://arxiv.org/abs/2211.06757) | Image restoration | 2022.11 | TIP2024 |
| [**Zero-shot image restoration using denoising diffusion null-space model**](https://arxiv.org/abs/2212.00490) | Image restoration | 2022.12 | ICLR2023 |
| [**Image restoration with mean-reverting stochastic differential equations**](https://arxiv.org/abs/2301.11699) | Image restoration | 2023.1 | ICML2023 |
| [**Inversion by direct iteration: An alternative to denoising diffusion for image restoration**](https://arxiv.org/abs/2303.11435) | Image restoration | 2023.3 | TMLR2023 |
| [**Resshift: Efficient diffusion model for image super-resolution by residual shifting**](https://arxiv.org/abs/2307.12348) | Image restoration | 2023.7 | NeurIPS2023 |
| [**Sinsr: diffusion-based image super-resolution in a single step**](https://arxiv.org/abs/2311.14760) | Image restoration | 2023.11 | CVPR2024 |
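
These methods replace the standard Gaussian forward process with one anchored at the degraded observation, so sampling starts from the input rather than from pure noise. A toy bridge in the spirit of InDI ("Inversion by direct iteration") and the mean-reverting SDE line of work, not a faithful reproduction of any single paper:

```python
import torch

def bridged_forward(x0, y, t, sigma=0.05):
    """Toy revised forward process: interpolate from the clean image x0
    (t=0) to the degraded observation y (t=1), plus a little noise."""
    return (1 - t) * x0 + t * y + sigma * t * torch.randn_like(x0)

# A restoration network is then trained to step this bridge backwards,
# moving from y toward x0, instead of denoising from pure Gaussian noise.
```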

### Guidance

| Title | Task | Date | Publication |
| ------------------------------------------------------------ | -------------------------- | ------- | ------------ |
| [**Diffusion models beat gans on image synthesis**](https://arxiv.org/abs/2105.05233) | Text-to-image | 2021.5 | NeurIPS2021 |
| [**Blended diffusion for text-driven editing of natural images**](https://arxiv.org/abs/2111.14818) | Image restoration | 2021.11 | CVPR2022 |
| [**More control for free! image synthesis with semantic diffusion guidance**](https://arxiv.org/abs/2112.05744) | Text/Image-to-image | 2021.12 | WACV2023 |
| [**Improving diffusion models for inverse problems using manifold constraints**](https://arxiv.org/abs/2206.00941) | Image restoration | 2022.6 | NeurIPS2022 |
| [**Diffusion posterior sampling for general noisy inverse problems**](https://arxiv.org/abs/2209.14687) | Image restoration | 2022.9 | ICLR2023 |
| [**Diffusion-based image translation using disentangled style and content representation**](https://arxiv.org/abs/2209.15264) | Image editing | 2022.9 | ICLR2023 |
| [**Sketch-guided text-to-image diffusion models**](https://arxiv.org/abs/2211.13752) | Visual signal to image | 2022.11 | SIGGRAPH2023 |
| [**High-fidelity guided image synthesis with latent diffusion models**](https://arxiv.org/abs/2211.17084) | Visual signal to image | 2022.11 | CVPR2023 |
| [**Parallel diffusion models of operator and image for blind inverse problems**](https://arxiv.org/abs/2211.10656) | Image restoration | 2022.11 | CVPR2023 |
| [**Zero-shot image-to-image translation**](https://arxiv.org/abs/2302.03027) | Image editing | 2023.2 | SIGGRAPH2023 |
| [**Universal guidance for diffusion models**](https://arxiv.org/abs/2302.07121) | General guidance framework | 2023.2 | CVPR2023 |
| [**Pseudoinverse-guided diffusion models for inverse problems**](https://openreview.net/pdf?id=9_gsMA8MRKQ) | Image restoration | 2023.2 | ICLR2023 |
| [**Freedom: Training-free energy-guided conditional diffusion model**](https://arxiv.org/abs/2303.09833) | General guidance framework | 2023.3 | ICCV2023 |
| [**Training-free layout control with cross-attention guidance**](https://arxiv.org/abs/2304.03373) | Layout control | 2023.4 | WACV2024 |
| [**Generative diffusion prior for unified image restoration and enhancement**](https://arxiv.org/abs/2304.01247) | Image restoration | 2023.4 | CVPR2023 |
| [**Regeneration learning of diffusion models with rich prompts for zero-shot image translation**](https://arxiv.org/abs/2305.04651) | Image editing | 2023.5 | ARXIV2023 |
| [**Diffusion self-guidance for controllable image generation**](https://arxiv.org/abs/2306.00986) | Image editing | 2023.6 | NeurIPS2023 |
| [**Energy-based cross attention for bayesian context update in text-to-image diffusion models**](https://arxiv.org/abs/2306.09869) | Image editing | 2023.6 | NeurIPS2023 |
| [**Solving linear inverse problems provably via posterior sampling with latent diffusion models**](https://arxiv.org/abs/2307.00619) | Image restoration | 2023.7 | NeurIPS2023 |
| [**Dragondiffusion: Enabling drag-style manipulation on diffusion models**](https://arxiv.org/abs/2307.02421) | Image editing | 2023.7 | ICLR2024 |
| [**Readout guidance: Learning control from diffusion features**](https://arxiv.org/abs/2312.02150) | Visual signal to image | 2023.12 | CVPR2024 |
| [**Freecontrol: Training-free spatial control of any text-to-image diffusion model with any condition**](https://arxiv.org/abs/2312.07536) | Visual signal to image | 2023.12 | CVPR2024 |
| [**Diffeditor: Boosting accuracy and flexibility on diffusion-based image editing**](https://arxiv.org/abs/2402.02583) | Image editing | 2024.2 | CVPR2024 |
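
The common thread in this table: at each step, a differentiable loss measuring agreement between a prediction of the clean image and the condition is backpropagated to the latent, and the resulting gradient steers the update (classifier guidance and its training-free descendants). A schematic sketch; `loss_fn` is an illustrative placeholder:

```python
import torch

def guided_noise(denoiser, x_t, t, cond, loss_fn, alphas_cumprod, scale=1.0):
    """Add the gradient of a condition-matching loss to the predicted noise.

    loss_fn(x0_pred, cond) is any differentiable scalar measurement:
    a classifier log-prob, CLIP similarity, ||A x0 - y||^2 for restoration, ...
    """
    x = x_t.detach().requires_grad_(True)
    a_bar = alphas_cumprod[t]
    eps = denoiser(x, t, cond)
    x0_pred = (x - (1 - a_bar).sqrt() * eps) / a_bar.sqrt()  # Tweedie-style estimate
    grad = torch.autograd.grad(loss_fn(x0_pred, cond), x)[0]
    return eps + scale * (1 - a_bar).sqrt() * grad
```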

### Conditional Correction

| Title | Task | Date | Publication |
| ------------------------------------------------------------ | ----------------- | ------- | ----------- |
| [**Score-based generative modeling through stochastic differential equations**](https://arxiv.org/abs/2011.13456) | Image restoration | 2020.11 | ICLR2021 |
| [**ILVR: conditioning method for denoising diffusion probabilistic models**](https://arxiv.org/abs/2108.02938) | Image restoration | 2021.8 | ICCV2021 |
| [**Come-closer-diffuse-faster: Accelerating conditional diffusion models for inverse problems through stochastic contraction**](https://arxiv.org/abs/2112.05146) | Image restoration | 2021.12 | CVPR2022 |
| [**Repaint: Inpainting using denoising diffusion probabilistic models**](https://arxiv.org/abs/2201.09865) | Image restoration | 2022.1 | CVPR2022 |
| [**Improving diffusion models for inverse problems using manifold constraints**](https://arxiv.org/abs/2206.00941) | Image restoration | 2022.6 | NeurIPS2022 |
| [**Diffedit: Diffusion-based semantic image editing with mask guidance**](https://arxiv.org/abs/2210.11427) | Image editing | 2022.10 | ICLR2023 |
| [**Region-aware diffusion for zero-shot text-driven image editing**](https://arxiv.org/abs/2302.11797) | Image editing | 2023.2 | ARXIV2023 |
| [**Localizing object-level shape variations with text-to-image diffusion models**](https://arxiv.org/abs/2303.11306) | Image editing | 2023.3 | ICCV2023 |
| [**Instructedit: Improving automatic masks for diffusion-based image editing with user instructions**](https://arxiv.org/abs/2305.18047) | Image editing | 2023.5 | ARXIV2023 |
| [**Text-driven image editing via learnable regions**](https://arxiv.org/abs/2311.16432) | Image editing | 2023.11 | CVPR2024 |

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=zju-pi/Awesome-Conditional-Diffusion-Models&type=Date)](https://star-history.com/#zju-pi/Awesome-Conditional-Diffusion-Models&Date)
--------------------------------------------------------------------------------
/media/License-MIT-green.svg:
--------------------------------------------------------------------------------
[SVG badge: "License: MIT"]
--------------------------------------------------------------------------------
/media/badge.svg:
--------------------------------------------------------------------------------
[SVG badge asset]
--------------------------------------------------------------------------------