├── README.md
├── README_EN.md
├── com.png
├── model.png
└── sota.png


/README.md:
--------------------------------------------------------------------------------
[简体中文](README.md) | [English](README_EN.md) | [Paper](https://arxiv.org/abs/2308.06743)

# TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution

This is the official code repository for the paper [TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution](https://arxiv.org/abs/2308.06743). TextDiff is a scene text image super-resolution model (see the [paper](https://arxiv.org/abs/2308.06743) for details).

# Network Structure

# News

- Pinned: introducing a multi-functional, cross-platform OCR application developed by our lab. It covers common OCR tasks such as PDF-to-Word conversion, PDF-to-Excel conversion, formula recognition, table recognition, and automatic watermark removal. You are welcome to try it!
- See the To-do lists for the latest updates.

# User Guide

## Environment configuration

### Deep Learning Environment

- python >= 3.7
- pytorch >= 1.7.0
- torchvision >= 0.8.0
- lmdb >= 0.98
- pillow >= 7.1.2
- numpy
- six
- tqdm
- python-opencv
- easydict
- yaml

### Dataset

- Download the TextZoom dataset

### Related weight files

- Download the Aster model weight file
- Download the Moran model weight file
- Download the CRNN model weight file

## Training

1. Installation
```
git clone https://github.com/Lenubolim/TextDiff.git
```
2. Parameter configuration

See the config.yaml file.

3. Training
```
python train.py
```

## Inference

```
python test.py
```

# To-do lists

- [ ] Add training code (to be released soon)
- [ ] Add inference code (to be released soon)
- [ ] Use DPM_solver to reduce the number of inference steps

# Results

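# Thanks

- If you find TextDiff helpful, please give it a star. Thank you!
- If you have any questions, feel free to open an issue (issue notifications are sent to my email, so I will reply promptly once I see them).
- If you would like to use TextDiff as a baseline for your project, you are welcome to cite our paper.

# References
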
- [1] Scene Text Telescope: Text-Focused Scene Image Super-Resolution
- [2] Activating More Pixels in Image Super-Resolution Transformer
- [3] SRDiff: Single Image Super-Resolution with Diffusion Probabilistic Models
- [4] DocDiff: Document Enhancement via Residual Diffusion Models
- [5] Improving Scene Text Image Super-Resolution via Dual Prior Modulation Network

# :book: Citation

If you use (part of) my code or find my work helpful, please consider citing:

```
@article{liu2023textdiff,
  title={TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution},
  author={Liu, Baolin and Yang, Zongyuan and Wang, Pengfei and Zhou, Junjie and Liu, Ziqi and Song, Ziyi and Liu, Yan and Xiong, Yongping},
  journal={arXiv preprint arXiv:2308.06743},
  year={2023}
}
```

# Acknowledgement

This code builds on DocDiff and TATT. Thanks to the authors of these great projects. DocDiff is my classmate's main research project, and I took part in some of that research.

--------------------------------------------------------------------------------
/README_EN.md:
--------------------------------------------------------------------------------
[简体中文](README.md) | [English](README_EN.md) | [Paper](https://arxiv.org/abs/2308.06743)

# TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution

This is the official code repository for the paper [TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution](https://arxiv.org/abs/2308.06743). TextDiff is a scene text image super-resolution model (see the [paper](https://arxiv.org/abs/2308.06743) for details).

# Network Structure

# User Guide

## Environment configuration

### Deep Learning Environment

- python >= 3.7
- pytorch >= 1.7.0
- torchvision >= 0.8.0
- lmdb >= 0.98
- pillow >= 7.1.2
- numpy
- six
- tqdm
- python-opencv
- easydict
- yaml

### Dataset

- Download the TextZoom dataset

### Related weight files

- Download the Aster model weight file
- Download the Moran model weight file
- Download the CRNN model weight file

# To-do lists

- [ ] Add training code
- [ ] Add inference code
- [ ] Use DPM_solver to reduce the number of inference steps

# Results

# Thanks

- If you find TextDiff helpful, please give it a star. Thank you!
- If you have any questions, please open an issue and I will reply as soon as possible.
- If you would like to use TextDiff as a baseline for your project, you are welcome to cite our paper.

# References

- [1] Scene Text Telescope: Text-Focused Scene Image Super-Resolution
- [2] Activating More Pixels in Image Super-Resolution Transformer
- [3] SRDiff: Single Image Super-Resolution with Diffusion Probabilistic Models
- [4] DocDiff: Document Enhancement via Residual Diffusion Models
- [5] Improving Scene Text Image Super-Resolution via Dual Prior Modulation Network

# :book: Citation

If you use (part of) my code or find my work helpful, please consider citing:

```
@article{liu2023textdiff,
  title={TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution},
  author={Liu, Baolin and Yang, Zongyuan and Wang, Pengfei and Zhou, Junjie and Liu, Ziqi and Song, Ziyi and Liu, Yan and Xiong, Yongping},
  journal={arXiv preprint arXiv:2308.06743},
  year={2023}
}
```

# Acknowledgement

This code builds on DocDiff and TATT. Thanks to the authors of these great projects. DocDiff is my classmate's main research project, and I took part in some of that research.
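
The minimum versions listed under "Deep Learning Environment" can be checked before training with a small script. The following is only a sketch and not part of the repository: the function names are hypothetical, the import names (`torch`, `torchvision`, `lmdb`, `PIL`) are assumptions about how the listed packages are imported, and it assumes each module exposes a `__version__` attribute.

```python
import importlib
import re

# Minimum versions copied from the "Deep Learning Environment" list.
# The keys are assumed import names (e.g. PIL for pillow, torch for pytorch);
# the README itself only gives the package names.
MINIMUMS = {
    "torch": "1.7.0",
    "torchvision": "0.8.0",
    "lmdb": "0.98",
    "PIL": "7.1.2",
}


def version_tuple(text):
    """Turn '1.7.0+cu110' into (1, 7, 0), ignoring any non-numeric suffix."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, required):
    """Return True when the installed version is at least the required one."""
    a, b = version_tuple(installed), version_tuple(required)
    # Pad the shorter tuple with zeros so (0, 98) compares like (0, 98, 0).
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS):
    """Map each module name to 'ok', 'too old (...)', 'missing', or 'unknown version'."""
    report = {}
    for module_name, required in minimums.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[module_name] = "missing"
            continue
        installed = getattr(module, "__version__", None)
        if installed is None:
            report[module_name] = "unknown version"
        elif meets_minimum(installed, required):
            report[module_name] = "ok"
        else:
            report[module_name] = "too old ({})".format(installed)
    return report
```

Calling `check_environment()` returns a dictionary with one status string per module, which can be printed or inspected before starting a run. Only the packages with a stated minimum version are checked; the others in the list (numpy, six, tqdm, and so on) could be added with a minimum of `"0"` if a simple presence check is wanted.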

--------------------------------------------------------------------------------
/com.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lenubolim/TextDiff/b0264a94a240af2801e7bac1ca27ea77392473e7/com.png

--------------------------------------------------------------------------------
/model.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lenubolim/TextDiff/b0264a94a240af2801e7bac1ca27ea77392473e7/model.png

--------------------------------------------------------------------------------
/sota.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Lenubolim/TextDiff/b0264a94a240af2801e7bac1ca27ea77392473e7/sota.png

--------------------------------------------------------------------------------