├── README.md
├── assets
│   ├── Arial_Unicode.ttf
│   ├── adam-jang-8pOTAtyd_Mc-unsplash.jpg
│   ├── comparison.pdf
│   ├── example1.png
│   ├── example2.png
│   ├── example3.png
│   ├── example4.png
│   ├── example5.png
│   ├── infer.png
│   ├── inpaint.png
│   ├── ipa.png
│   ├── train.png
│   └── union.png
├── controlnet_flux.py
├── infer.py
├── infer_inpaint.py
├── pipeline_flux_controlnet.py
├── pipeline_flux_controlnet_inpaint.py
└── results
    ├── result.jpg
    └── result_inpaint.jpg
/README.md:
--------------------------------------------------------------------------------
# RepText: Rendering Visual Text via Replicating

Haofan Wang†, Yujia Xu, Yimeng Li, Junchen Li, Chaowei Zhang, Jing Wang, Kejia Yang, Zhibo Chen

Shakker Labs, Liblib AI

†Corresponding author

[Model on Hugging Face](https://huggingface.co/Shakker-Labs/RepText)

We present RepText, which aims to empower pre-trained monolingual text-to-image generation models with the ability to accurately render, or more precisely, replicate, multilingual visual text in user-specified fonts, without needing to truly understand them. Specifically, we adopt the setting of ControlNet and additionally integrate the language-agnostic glyph and position of the rendered text, enabling the generation of harmonized visual text and allowing users to customize text content, font, and position to their needs. To improve accuracy, a text perceptual loss is employed alongside the diffusion loss. Furthermore, to stabilize the rendering process at inference, we initialize directly from a noisy glyph latent instead of random noise, and adopt region masks to restrict feature injection to the text region, avoiding distortion of the background. We conduct extensive experiments to verify the effectiveness of RepText: our approach outperforms existing open-source methods and achieves results comparable to native multilingual closed-source models.
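The two inference-time tricks mentioned above, noisy glyph latent initialization and region-masked feature injection, can be illustrated with a short conceptual sketch. This is not the repository's actual API: the tensor shapes, the `noisy_glyph_init` and `masked_injection` helpers, and the blending strength are all hypothetical placeholders.

```python
# Conceptual sketch only, not the repo's actual implementation.
# Assumptions (hypothetical): `glyph_latent` is an encoded rendering of the target
# glyphs, and ControlNet feature injection is reduced to a masked addition.
import torch

def noisy_glyph_init(glyph_latent: torch.Tensor, noise_strength: float = 0.98) -> torch.Tensor:
    """Start sampling from a noisy glyph latent rather than pure random noise,
    so the text layout is present from the first denoising step."""
    noise = torch.randn_like(glyph_latent)
    return noise_strength * noise + (1.0 - noise_strength) * glyph_latent

def masked_injection(hidden: torch.Tensor,
                     control_residual: torch.Tensor,
                     region_mask: torch.Tensor) -> torch.Tensor:
    """Add ControlNet residual features only inside the text region,
    leaving background features untouched."""
    return hidden + control_residual * region_mask

# Example with dummy tensors (placeholder latent shape: 16 channels, 64x64).
glyph_latent = torch.randn(1, 16, 64, 64)        # encoded glyph image (placeholder)
latents = noisy_glyph_init(glyph_latent)         # stabilized starting point
hidden = torch.randn(1, 16, 64, 64)              # base model features (placeholder)
control_residual = torch.randn(1, 16, 64, 64)    # ControlNet output (placeholder)
region_mask = torch.zeros(1, 1, 64, 64)
region_mask[..., 16:48, 16:48] = 1.0             # restrict injection to the text box
hidden = masked_injection(hidden, control_residual, region_mask)
```

In the actual pipeline these steps happen inside the FLUX ControlNet denoising loop; see `infer.py` and `pipeline_flux_controlnet.py` for the real implementation.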