├── imgs
│   ├── img.png
│   └── logo.png
└── README.md

/imgs/img.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yeungchenwa/Recommendations-Diffusion-Text-Image/HEAD/imgs/img.png
--------------------------------------------------------------------------------

/imgs/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yeungchenwa/Recommendations-Diffusion-Text-Image/HEAD/imgs/logo.png
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
![LOGO](imgs/logo.png)
# Recommendations of Diffusion for Text-Image
This repository collects recent papers that apply diffusion models to text-image tasks, from font and visual text generation to editing, super-resolution, and recognition.

## 📖 Table of Contents 👀
- [Document Restoration](#document-restoration)
- [Font Generation](#font-generation)
- [Text-to-Image (Visual Text Generation)](#text-to-image-visual-text-generation)
- [Artistic Font Generation](#artistic-font-generation)
- [Text-Image Removal](#text-image-removal)
- [Text-Image Super Resolution](#text-image-super-resolution)
- [Text-Image Editing](#text-image-editing)
- [Inpainting](#inpainting)
- [Handwritten Generation](#handwritten-generation)
- [Scene Text Recognition](#scene-text-recognition)
- [Scene Text Detection](#scene-text-detection)

### Document Restoration
+ 🔥🔥🔥 [Predicting the Original Appearance of Damaged Historical Documents](https://arxiv.org/abs/2412.11634) (AAAI 2025)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2412.11634)
[![Project](https://img.shields.io/badge/Project-9cf)](https://yeungchenwa.github.io/hdr-homepage/)
[![Star](https://img.shields.io/github/stars/yeungchenwa/HDR.svg?style=social&label=Star)](https://github.com/yeungchenwa/HDR)

### Font Generation
+ [DiffCJK: Conditional Diffusion Model for High-Quality and Wide-coverage CJK Character Generation](https://arxiv.org/abs/2404.05212) (Apr. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2404.05212)

+ [VecFusion: Vector Font Generation with Diffusion](https://arxiv.org/abs/2312.10540) (CVPR 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2312.10540)

+ [Font Style Interpolation with Diffusion Models](https://arxiv.org/abs/2402.14311) (Feb. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2402.14311)

+ 🔥🔥🔥 [FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning](https://arxiv.org/abs/2312.12142) (AAAI 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2312.12142)
[![Project](https://img.shields.io/badge/Project-9cf)](https://yeungchenwa.github.io/fontdiffuser-homepage/)
[![Star](https://img.shields.io/github/stars/yeungchenwa/FontDiffuser.svg?style=social&label=Star)](https://github.com/yeungchenwa/FontDiffuser)
[![Demo](https://img.shields.io/badge/Demo-8A2BE2)](https://huggingface.co/spaces/yeungchenwa/FontDiffuser-Gradio)

+ [Diff-Font: Diffusion Model for Robust One-Shot Font Generation](https://arxiv.org/abs/2212.05895) (Dec. 2022)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2212.05895)
[![Star](https://img.shields.io/github/stars/Hxyz-123/Font-diff.svg?style=social&label=Star)](https://github.com/Hxyz-123/Font-diff)

### Text-to-Image (Visual Text Generation)
+ [GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models](https://arxiv.org/abs/2407.02252) (Jul. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2407.02252)
[![Star](https://img.shields.io/github/stars/OPPO-Mente-Lab/GlyphDraw2.svg?style=social&label=Star)](https://github.com/OPPO-Mente-Lab/GlyphDraw2)

+ [Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering](https://arxiv.org/abs/2406.10208) (Jun. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2406.10208)
[![Project](https://img.shields.io/badge/Project-9cf)](https://glyph-byt5-v2.github.io/)
[![Star](https://img.shields.io/github/stars/AIGText/Glyph-ByT5.svg?style=social&label=Star)](https://github.com/AIGText/Glyph-ByT5)

+ [High Fidelity Scene Text Synthesis](https://arxiv.org/abs/2405.14701) (May 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2405.14701)
[![Star](https://img.shields.io/github/stars/CodeGoat24/DreamText.svg?style=social&label=Star)](https://github.com/CodeGoat24/DreamText)

+ [Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering](https://arxiv.org/abs/2403.09622) (Mar. 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2403.09622)
[![Project](https://img.shields.io/badge/Project-9cf)](https://glyph-byt5.github.io/)

+ [Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model](https://arxiv.org/abs/2312.12232) (Dec. 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2312.12232)
[![Star](https://img.shields.io/github/stars/ecnuljzhang/brush-your-text.svg?style=social&label=Star)](https://github.com/ecnuljzhang/brush-your-text)

+ [UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models](https://arxiv.org/abs/2312.04884) (Dec. 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2312.04884)
[![Project](https://img.shields.io/badge/Project-9cf)](https://udifftext.github.io/)
[![Star](https://img.shields.io/github/stars/zym-pku/udifftext.svg?style=social&label=Star)](https://github.com/zym-pku/udifftext)
[![Demo](https://img.shields.io/badge/Demo-8A2BE2)](https://huggingface.co/spaces/ZYMPKU/UDiffText)

+ [TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering](https://arxiv.org/abs/2311.16465v1) (Nov. 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2311.16465v1)
[![Project](https://img.shields.io/badge/Project-9cf)](https://jingyechen.github.io/textdiffuser2/)
[![Star](https://img.shields.io/github/stars/microsoft/unilm.svg?style=social&label=Star)](https://github.com/microsoft/unilm/tree/master/textdiffuser-2)
[![Demo](https://img.shields.io/badge/Demo-8A2BE2)](https://huggingface.co/spaces/JingyeChen22/TextDiffuser-2)

+ [AnyText: Multilingual Visual Text Generation and Editing](https://arxiv.org/abs/2311.03054) (Nov. 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2311.03054)
[![Star](https://img.shields.io/github/stars/tyxsspa/AnyText.svg?style=social&label=Star)](https://github.com/tyxsspa/AnyText)

+ [TextDiffuser: Diffusion Models as Text Painters](https://arxiv.org/abs/2305.10855) (May 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2305.10855)
[![Star](https://img.shields.io/github/stars/microsoft/unilm.svg?style=social&label=Star)](https://github.com/microsoft/unilm/tree/master/textdiffuser)
[![Demo](https://img.shields.io/badge/Demo-8A2BE2)](https://huggingface.co/spaces/JingyeChen22/TextDiffuser)

+ [GlyphControl: Glyph Conditional Control for Visual Text Generation](https://arxiv.org/abs/2305.18259) (May 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2305.18259)
[![Star](https://img.shields.io/github/stars/AIGText/GlyphControl-release.svg?style=social&label=Star)](https://github.com/AIGText/GlyphControl-release)
[![Demo](https://img.shields.io/badge/Demo-8A2BE2)](https://huggingface.co/spaces/AIGText/GlyphControl)

+ [IF](https://github.com/deep-floyd/IF) (Apr. 2023)
[![Star](https://img.shields.io/github/stars/deep-floyd/IF.svg?style=social&label=Star)](https://github.com/deep-floyd/IF)
[![Demo](https://img.shields.io/badge/Demo-8A2BE2)](https://huggingface.co/spaces/DeepFloyd/IF)

+ [GlyphDraw: Seamlessly Rendering Text with Intricate Spatial Structures in Text-to-Image Generation](https://arxiv.org/abs/2303.17870) (Mar. 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2303.17870)
[![Project](https://img.shields.io/badge/Project-9cf)](https://1073521013.github.io/glyph-draw.github.io/)
[![Star](https://img.shields.io/github/stars/OPPO-Mente-Lab/GlyphDraw.svg?style=social&label=Star)](https://github.com/OPPO-Mente-Lab/GlyphDraw)

+ [Character-aware models improve visual text rendering](https://arxiv.org/abs/2212.10562) (Dec. 2022)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2212.10562)

+ [eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers](https://arxiv.org/abs/2211.01324) (Nov. 2022)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2211.01324)
[![Project](https://img.shields.io/badge/Project-9cf)](https://deepimagination.cc/eDiff-I/)

+ [Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding](https://arxiv.org/abs/2205.11487) (May 2022)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2205.11487)
[![Project](https://img.shields.io/badge/Project-9cf)](https://imagen.research.google/)

### Artistic Font Generation
+ [Word-As-Image for Semantic Typography](https://arxiv.org/abs/2303.01818) (SIGGRAPH 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2303.01818)
[![Project](https://img.shields.io/badge/Project-9cf)](https://wordasimage.github.io/Word-As-Image-Page/)
[![Star](https://img.shields.io/github/stars/Shiriluz/Word-As-Image.svg?style=social&label=Star)](https://github.com/Shiriluz/Word-As-Image)
[![Demo](https://img.shields.io/badge/Demo-8A2BE2)](https://huggingface.co/spaces/SemanticTypography/Word-As-Image)

+ [ControlNet on Text Effect](https://mp.weixin.qq.com/s/rvpU4XhToldoec_bABeXJw) (Jul. 2023)

+ [DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion](https://arxiv.org/abs/2303.09604) (ICCV 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2303.09604)
[![Project](https://img.shields.io/badge/Project-9cf)](https://ds-fusion.github.io/)
[![Star](https://img.shields.io/github/stars/tmaham/DS-Fusion.svg?style=social&label=Star)](https://github.com/tmaham/DS-Fusion)
[![Demo](https://img.shields.io/badge/Demo-8A2BE2)](https://huggingface.co/spaces/tmaham/DS-Fusion-Express)

### Text-Image Removal
+ [Optical Character Recognition with Segment Anything (OCR-SAM)](https://github.com/yeungchenwa/OCR-SAM) (Apr. 2023)
[![Star](https://img.shields.io/github/stars/yeungchenwa/OCR-SAM.svg?style=social&label=Star)](https://github.com/yeungchenwa/OCR-SAM)

### Text-Image Super Resolution
+ [Diffusion-based Blind Text Image Super-Resolution](https://arxiv.org/abs/2312.08886) (CVPR 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2312.08886)

+ [PEAN: A Diffusion-based Prior-Enhanced Attention Network for Scene Text Image Super-Resolution](https://arxiv.org/abs/2311.17955) (Nov. 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2311.17955)
[![Star](https://img.shields.io/github/stars/jdfxzzy/PEAN.svg?style=social&label=Star)](https://github.com/jdfxzzy/PEAN)

+ [Recognition-Guided Diffusion Model for Scene Text Image Super-Resolution](https://arxiv.org/abs/2311.13317) (Nov. 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2311.13317)

+ [Scene Text Image Super-resolution based on Text-conditional Diffusion Models](https://arxiv.org/abs/2311.09759) (Nov. 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2311.09759)

+ [DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior](https://arxiv.org/abs/2308.15070) (Aug. 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2308.15070)
[![Project](https://img.shields.io/badge/Project-9cf)](https://0x3f3f3f3fun.github.io/projects/diffbir/)
[![Star](https://img.shields.io/github/stars/XPixelGroup/DiffBIR.svg?style=social&label=Star)](https://github.com/XPixelGroup/DiffBIR)

+ [TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution](https://arxiv.org/abs/2308.06743) (Aug. 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2308.06743)
[![Star](https://img.shields.io/github/stars/Lenubolim/TextDiff.svg?style=social&label=Star)](https://github.com/Lenubolim/TextDiff)

+ [DocDiff: Document Enhancement via Residual Diffusion Models](https://arxiv.org/abs/2305.03892) (ACM MM 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2305.03892)
[![Star](https://img.shields.io/github/stars/Royalvice/DocDiff.svg?style=social&label=Star)](https://github.com/Royalvice/DocDiff)

+ [STIRER: A Unified Model for Low-Resolution Scene Text Image Recovery and Recognition](https://github.com/zhaominyiz/STIRER) (ACM MM 2023)
[![Star](https://img.shields.io/github/stars/zhaominyiz/STIRER.svg?style=social&label=Star)](https://github.com/zhaominyiz/STIRER)

### Text-Image Editing
+ [On Manipulating Scene Text in the Wild with Diffusion Models](https://arxiv.org/abs/2311.00734) (WACV 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2311.00734)

+ [DiffUTE: Universal Text Editing Diffusion Model](https://arxiv.org/abs/2305.10825) (May 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2305.10825)
[![Star](https://img.shields.io/github/stars/chenhaoxing/DiffUTE.svg?style=social&label=Star)](https://github.com/chenhaoxing/DiffUTE)

+ [Improving Diffusion Models for Scene Text Editing with Dual Encoders](https://arxiv.org/abs/2304.05568) (Apr. 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2304.05568)
[![Star](https://img.shields.io/github/stars/UCSB-NLP-Chang/DiffSTE.svg?style=social&label=Star)](https://github.com/UCSB-NLP-Chang/DiffSTE)

### Inpainting
+ [Text Image Inpainting via Global Structure-Guided Diffusion Models](https://arxiv.org/abs/2401.14832) (AAAI 2024)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2401.14832)
[![Star](https://img.shields.io/github/stars/blackprotoss/GSDM.svg?style=social&label=Star)](https://github.com/blackprotoss/GSDM)

### Handwritten Generation
+ [Conditional Text Image Generation with Diffusion Models](https://arxiv.org/abs/2306.10804) (CVPR 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2306.10804)

+ [ChiroDiff: Modelling chirographic data with Diffusion Models](https://arxiv.org/abs/2304.03785) (ICLR 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2304.03785)
[![Project](https://img.shields.io/badge/Project-9cf)](https://ayandas.me/chirodiff)
[![Star](https://img.shields.io/github/stars/dasayan05/chirodiff.svg?style=social&label=Star)](https://github.com/dasayan05/chirodiff)

+ [Zero-shot Generation of Training Data with Denoising Diffusion Probabilistic Model for Handwritten Chinese Character Recognition](https://arxiv.org/abs/2305.15660) (May 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2305.15660)

+ [WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models](https://arxiv.org/abs/2303.16576) (ICDAR 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2303.16576)
[![Star](https://img.shields.io/github/stars/koninik/WordStylist.svg?style=social&label=Star)](https://github.com/koninik/WordStylist)

+ [Diffusion models for Handwriting Generation](https://arxiv.org/abs/2011.06704) (Nov. 2020)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2011.06704)
[![Star](https://img.shields.io/github/stars/tcl9876/Diffusion-Handwriting-Generation.svg?style=social&label=Star)](https://github.com/tcl9876/Diffusion-Handwriting-Generation)

### Scene Text Recognition
+ [DiffusionSTR: Diffusion Model for Scene Text Recognition](https://arxiv.org/abs/2306.16707) (ICIP 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2306.16707)

+ [IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition](https://arxiv.org/abs/2312.11923) (TPAMI 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2312.11923)

### Scene Text Detection
+ [Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models](https://arxiv.org/abs/2311.16555) (Nov. 2023)
[![arXiv](https://img.shields.io/badge/arXiv-b31b1b.svg)](https://arxiv.org/abs/2311.16555)
[![Star](https://img.shields.io/github/stars/99Franklin/DiffText.svg?style=social&label=Star)](https://github.com/99Franklin/DiffText)
--------------------------------------------------------------------------------