├── README.md
├── VTP_Bench
│   ├── DID
│   │   ├── evaluation_results
│   │   ├── get_did_scores.py
│   │   ├── gpt_did.py
│   │   └── select_data
│   │       └── UVDoc              (sample images)
│   ├── STE
│   │   ├── STE_data
│   │   │   └── AnyText            (sample images)
│   │   ├── evaluation_results
│   │   ├── get_ste_score.py
│   │   └── gpt_ste.py
│   ├── STG
│   │   ├── data
│   │   │   └── anytext            (sample images)
│   │   ├── evaluation_results
│   │   ├── get_stg_score.py
│   │   └── gpt_stg.py
│   ├── STR
│   │   ├── STR_data
│   │   │   └── viteraser          (sample images)
│   │   ├── evaluation_results
│   │   ├── get_str_score.py
│   │   └── gpt_str.py
│   ├── STSR
│   │   ├── evaluation_results
│   │   ├── get_stsr_scores.py
│   │   ├── gpt_stsr.py
│   │   └── lemma                  (sample lr/sr/hr triplets)
│   └── TIE
│       ├── data
│       │   └── docdiff            (sample images)
│       ├── get_tie_score.py
│       └── gpt_tie.py
├── logo.png
└── vtpbench_2.png

/README.md:
--------------------------------------------------------------------------------

# Survey-of-Visual-Text-Processing
The official project of the paper "[Visual Text Processing: A Comprehensive Review and Unified Evaluation](https://arxiv.org/abs/2504.21682)"

![LOGO](logo.png)

This repository contains a collection of recent papers on visual text processing tasks.

## 📚 VTPBench
![LOGO](vtpbench_2.png)
We propose VTPBench, a multi-task benchmark comprising 4,305 samples across six visual text processing tasks.

### Dataset
We release the VTPBench dataset at [Data](https://huggingface.co/datasets/sy1998/VTP_Bench).

### Evaluation
VTPBench uses GPT-4 as the evaluator, which makes unified VTP evaluation easy to implement. Please see the evaluation code [here](VTP_Bench).

(1) Run your visual text processing models (e.g., scene text removal models) on the provided dataset. Then concatenate each predicted image with its ground truth (GT), like the [examples](VTP_Bench/STR/STR_data/viteraser) we provide.

(2) Run the following code to get evaluation results.
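The side-by-side concatenation in step (1) can be sketched as follows. This is only an illustrative example using Pillow; the `concat_pred_gt` helper and the paths are placeholders, not part of this repository's scripts.

```python
from PIL import Image


def concat_pred_gt(pred_path, gt_path, out_path):
    """Place a model prediction and its ground truth side by side (illustrative sketch)."""
    pred = Image.open(pred_path).convert("RGB")
    gt = Image.open(gt_path).convert("RGB")
    # Match heights so the pair lines up before concatenation.
    if pred.height != gt.height:
        pred = pred.resize((round(pred.width * gt.height / pred.height), gt.height))
    canvas = Image.new("RGB", (pred.width + gt.width, gt.height))
    canvas.paste(pred, (0, 0))
    canvas.paste(gt, (pred.width, 0))
    canvas.save(out_path)


# e.g. concat_pred_gt("pred/0.png", "gt/0.png", "STR/STR_data/my_model/0.png")
```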
```bash
cd STR
python gpt_str.py
python get_str_score.py
```

## 📖 Table of Contents 👀
- [Text Image Super-resolution](#text-image-super-resolution)
- [Document Image Dewarping](#document-image-dewarping)
- [Text Image Denoising](#text-image-denoising)
- [Scene Text Removal](#scene-text-removal)
- [Scene Text Editing](#scene-text-editing)
- [Scene Text Generation](#scene-text-generation)

### Text Image Super-resolution
+ Boosting Optical Character Recognition: A Super-Resolution Approach (**2015 arxiv**) [paper](https://arxiv.org/pdf/1506.02211.pdf)
+ ICDAR2015 competition on Text Image Super-Resolution (**2015 ICDAR**) [paper](https://ieeexplore.ieee.org/document/7333951)
+ Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network (**2017 CVPR**) [paper](https://openaccess.thecvf.com/content_cvpr_2017/html/Ledig_Photo-Realistic_Single_Image_CVPR_2017_paper.html)
+ TextSR: Content-Aware Text Super-Resolution Guided by Recognition (**2019 arxiv**) [paper](https://arxiv.org/pdf/1909.07113.pdf) [code](https://github.com/xieenze/TextSR)
+ Selective Super-Resolution for Scene Text Images (**2019 ICDAR**) [paper](https://ieeexplore.ieee.org/abstract/document/8977974)
+ Text-Attentional Conditional Generative Adversarial Network for Super-Resolution of Text Images (**2019 ICME**) [paper](https://ieeexplore.ieee.org/abstract/document/8784962)
+ Collaborative Deep Learning for Super-Resolving Blurry Text Images (**2020 TCI**) [paper](https://ieeexplore.ieee.org/abstract/document/9040515)
+ PlugNet: Degradation Aware Scene Text Recognition Supervised by a Pluggable Super-Resolution Unit (**2020 ECCV**) [paper](https://link.springer.com/chapter/10.1007/978-3-030-58555-6_10)
+ Scene Text Image Super-Resolution in the Wild (**2020 ECCV**) [paper](https://link.springer.com/chapter/10.1007/978-3-030-58607-2_38)
[code](https://github.com/WenjiaWang0312/TextZoom)
+ Scene Text Telescope: Text-Focused Scene Image Super-Resolution (**2021 CVPR**) [paper](https://openaccess.thecvf.com/content/CVPR2021/papers/Chen_Scene_Text_Telescope_Text-Focused_Scene_Image_Super-Resolution_CVPR_2021_paper.pdf)
+ Scene Text Image Super-Resolution via Parallelly Contextual Attention Network (**2021 ACM MM**) [paper](https://dl.acm.org/doi/abs/10.1145/3474085.3475469)
+ Text Prior Guided Scene Text Image Super-Resolution (**2021 TIP**) [paper](https://ieeexplore.ieee.org/abstract/document/10042236) [code](https://github.com/mjq11302010044/TPGSR)
+ A text attention network for spatial deformation robust scene text image super-resolution (**2022 CVPR**) [paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Ma_A_Text_Attention_Network_for_Spatial_Deformation_Robust_Scene_Text_CVPR_2022_paper.pdf) [code](https://github.com/mjq11302010044/TATT)
+ C3-STISR: Scene Text Image Super-resolution with Triple Clues (**2022 IJCAI**) [paper]
+ Text gestalt: Stroke-aware scene text image super-resolution (**2022 AAAI**) [paper](https://ojs.aaai.org/index.php/AAAI/article/download/19904/19663) [code](https://github.com/FudanVI/FudanOCR/tree/main/text-gestalt)
+ A Benchmark for Chinese-English Scene Text Image Super-Resolution (**2023 ICCV**) [paper](https://openaccess.thecvf.com/content/ICCV2023/html/Ma_A_Benchmark_for_Chinese-English_Scene_Text_Image_Super-Resolution_ICCV_2023_paper.html) [code](https://github.com/mjq11302010044/Real-CE)
+ Text Image Super-Resolution Guided by Text Structure and Embedding Priors (**2023 ACM MM**) [paper](https://dl.acm.org/doi/abs/10.1145/3595924)
+ Towards robust scene text image super-resolution via explicit location enhancement (**2023 ACM MM**) [paper](https://arxiv.org/pdf/2307.09749.pdf) [code](https://github.com/csguoh/LEMMA)
+ Improving Scene Text Image Super-Resolution via Dual Prior Modulation Network (**2023 AAAI**) [paper](https://arxiv.org/pdf/2302.10414.pdf) [code](https://github.com/jdfxzzy/DPMN)
+ Learning Generative Structure Prior for Blind Text Image Super-Resolution (**2023 CVPR**) [paper](https://openaccess.thecvf.com/content/CVPR2023/html/Li_Learning_Generative_Structure_Prior_for_Blind_Text_Image_Super-Resolution_CVPR_2023_paper.html) [code](https://github.com/csxmli2016/MARCONet)
+ Scene Text Image Super-resolution based on Text-conditional Diffusion Models (**2024 WACV**) [paper](https://openaccess.thecvf.com/content/WACV2024/papers/Noguchi_Scene_Text_Image_Super-Resolution_Based_on_Text-Conditional_Diffusion_Models_WACV_2024_paper.pdf) [code](https://github.com/ToyotaInfoTech/stisr-tcdm)
+ GARDEN: Generative prior guided network for scene text image super-resolution (**2024 ICDAR**) [paper](https://link.springer.com/chapter/10.1007/978-3-031-70549-6_12)
+ Scene text image super-resolution through multi-scale interaction of structural and semantic priors (**2024 TAI**) [paper](https://ieeexplore.ieee.org/document/10473520)
+ PEAN: A Diffusion-Based Prior-Enhanced Attention Network for Scene Text Image Super-Resolution (**2024 ACM MM**) [paper](https://arxiv.org/abs/2311.17955) [code](https://github.com/jdfxzzy/PEAN)
+ Diffusion-based blind text image super-resolution (**2024 CVPR**) [paper](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Diffusion-based_Blind_Text_Image_Super-Resolution_CVPR_2024_paper.pdf) [code](https://github.com/YuzheZhang-1999/DiffTSR)
+ Diffusion-conditioned-diffusion model for scene text image super-resolution (**2024 ECCV**) [paper](https://link.springer.com/chapter/10.1007/978-3-031-72633-0_17)
+ QT-TextSR: Enhancing scene text image super-resolution via efficient interaction with text recognition using a query-aware transformer (**2024 Neurocomputing**) [paper](https://dl.acm.org/doi/10.1016/j.neucom.2024.129241) [code](https://github.com/lcy0604/QT-TextSR)

### Document Image Dewarping
+ A Fast Page Outline Detection and Dewarping Method Based on Iterative Cut and Adaptive Coordinate Transform (**2019 ICDARW**) [paper](https://ieeexplore.ieee.org/abstract/document/8892891)
+ DocUNet: Document Image Unwarping via a Stacked U-Net (**2018 CVPR**) [paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Ma_DocUNet_Document_Image_CVPR_2018_paper.pdf)
+ DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks (**2019 ICCV**) [paper](https://openaccess.thecvf.com/content_ICCV_2019/papers/Das_DewarpNet_Single-Image_Document_Unwarping_With_Stacked_3D_and_2D_Regression_ICCV_2019_paper.pdf) [code](https://www.cs.stonybrook.edu/%E2%88%BCcvl/dewarpnet.html)
+ Document rectification and illumination correction using a patch-based CNN (**2019 TOG**) [paper](https://dl.acm.org/doi/abs/10.1145/3355089.3356563)
+ Dewarping Document Image by Displacement Flow Estimation with Fully Convolutional Network (**2020 IAPR**) [paper](https://arxiv.org/pdf/2104.06815)
+ Geometric rectification of document images using adversarial gated unwarping network (**2020 PR**) [paper](https://www.sciencedirect.com/science/article/abs/pii/S0031320320303794)
+ DocScanner: Robust Document Image Rectification with Progressive Learning (**2021 arxiv**) [paper](https://arxiv.org/pdf/2110.14968.pdf)
+ End-to-End Piece-Wise Unwarping of Document Images (**2021 ICCV**) [paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Das_End-to-End_Piece-Wise_Unwarping_of_Document_Images_ICCV_2021_paper.pdf)
+ Document Dewarping with Control Points (**2021 ICDAR**) [paper](https://link.springer.com/chapter/10.1007/978-3-030-86549-8_30) [code](https://github.com/gwxie/Document-Dewarping-with-Control-Points)
+ DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction (**2021 ACM MM**) [paper](https://arxiv.org/pdf/2110.12942.pdf) [code](https://github.com/fh2019ustc/DocTr)
+ Revisiting Document Image Dewarping by Grid Regularization (**2022 CVPR**) [paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Jiang_Revisiting_Document_Image_Dewarping_by_Grid_Regularization_CVPR_2022_paper.pdf)
+ Fourier Document Restoration for Robust Document Dewarping and Recognition (**2022 CVPR**) [paper](https://openaccess.thecvf.com/content/CVPR2022/html/Xue_Fourier_Document_Restoration_for_Robust_Document_Dewarping_and_Recognition_CVPR_2022_paper.html)
+ Learning an Isometric Surface Parameterization for Texture Unwrapping (**2022 ECCV**) [paper](https://link.springer.com/chapter/10.1007/978-3-031-19836-6_33) [code](https://github.com/cvlab-stonybrook/Iso-UVField)
+ Geometric Representation Learning for Document Image Rectification (**2022 ECCV**) [paper](https://link.springer.com/chapter/10.1007/978-3-031-19836-6_27)
+ Learning From Documents in the Wild to Improve Document Unwarping (**2022 SIGGRAPH**) [paper](https://dl.acm.org/doi/abs/10.1145/3528233.3530756) [code](https://github.com/cvlab-stonybrook/PaperEdge)
+ Marior: Margin Removal and Iterative Content Rectification for Document Dewarping in the Wild (**2023 ACM MM**) [paper](https://arxiv.org/pdf/2207.11515.pdf) [code](https://github.com/ZZZHANG-jx/Marior)
+ DocAligner: Annotating Real-world Photographic Document Images by Simply Taking Pictures (**2023 arxiv**) [paper](https://arxiv.org/pdf/2306.05749.pdf)
+ DocMAE: Document Image Rectification via Self-supervised Representation Learning (**2023 ICME**) [paper](https://arxiv.org/pdf/2304.10341.pdf)
+ Deep Unrestricted Document Image Rectification (**2023 TMM**) [paper](https://arxiv.org/pdf/2304.08796.pdf) [code](https://github.com/fh2019ustc/DocTr-Plus)
+ Layout-aware Single-image Document Flattening (**2023 TOG**) [paper](https://dl.acm.org/doi/abs/10.1145/3627818) [code](https://github.com/BunnySoCrazy/LA-DocFlatten)
+ Polar-Doc: One-Stage Document Dewarping with Multi-Scope Constraints under Polar Representation (**2023 arxiv**) [paper](https://arxiv.org/pdf/2312.07925.pdf)
+ DocReal: Robust Document Dewarping of Real-Life Images via Attention-Enhanced Control Point Prediction (**2024 WACV**) [paper](https://openaccess.thecvf.com/content/WACV2024/papers/Yu_DocReal_Robust_Document_Dewarping_of_Real-Life_Images_via_Attention-Enhanced_Control_WACV_2024_paper.pdf) [code](https://github.com/irisXcoding/DocReal)
+ Rethinking Supervision in Document Unwarping: A Self-consistent Flow-free Approach (**2023 TCSVT**) [paper](https://ieeexplore.ieee.org/abstract/document/10327775)
+ Am I readable? Transfer learning based document image rectification (**2024 IJDAR**) [paper](https://link.springer.com/article/10.1007/s10032-024-00476-9)
+ Efficient Joint Rectification of Photometric and Geometric Distortions in Document Images (**2024 ICASSP**) [paper](https://ieeexplore.ieee.org/abstract/document/10447446)

### Text Image Denoising
+ Shading Removal of Illustrated Documents (**2013 ICDAR**) [paper](https://link.springer.com/chapter/10.1007/978-3-642-39094-4_35)
+ Nonparametric illumination correction for scanned document images via convex hulls (**2013 TPAMI**) [paper](https://ieeexplore.ieee.org/abstract/document/6361405)
+ Removing shadows from images of documents (**2016 ACCV**) [paper](https://link.springer.com/chapter/10.1007/978-3-319-54187-7_12)
+ Document enhancement using visibility detection (**2018 CVPR**) [paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Kligler_Document_Enhancement_Using_CVPR_2018_paper.pdf)
+ Water-Filling: An Efficient Algorithm for Digitized Document Shadow Removal (**2018 ACCV**) [paper](https://link.springer.com/chapter/10.1007/978-3-030-20887-5_25)
+ Learning to Clean: A GAN Perspective (**2018 ACCVW**) [paper](https://link.springer.com/chapter/10.1007/978-3-030-21074-8_14)
+ Deeperase: Weakly supervised ink artifact removal in document text images (**2020 WACV**) [paper](https://openaccess.thecvf.com/content_WACV_2020/papers/Qi_DeepErase_Weakly_Supervised_Ink_Artifact_Removal_in_Document_Text_Images_WACV_2020_paper.pdf)
+ From Shadow Segmentation to Shadow Removal (**2020 ECCV**) [paper](https://link.springer.com/chapter/10.1007/978-3-030-58621-8_16)
+ BEDSR-Net: A Deep Shadow Removal Network From a Single Document Image (**2020 CVPR**) [paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Lin_BEDSR-Net_A_Deep_Shadow_Removal_Network_From_a_Single_Document_CVPR_2020_paper.pdf)
+ Light-Weight Document Image Cleanup Using Perceptual Loss (**2021 ICDAR**) [paper](https://link.springer.com/chapter/10.1007/978-3-030-86334-0_16)
+ RecycleNet: An Overlapped Text Instance Recovery Approach (**2021 ACM MM**) [paper](https://dl.acm.org/doi/abs/10.1145/3474085.3481536)
+ End-to-End Unsupervised Document Image Blind Denoising (**2021 ICCV**) [paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Gangeh_End-to-End_Unsupervised_Document_Image_Blind_Denoising_ICCV_2021_paper.pdf)
+ Bijective mapping network for shadow removal (**2022 CVPR**) [paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Zhu_Bijective_Mapping_Network_for_Shadow_Removal_CVPR_2022_paper.pdf)
+ Style-guided shadow removal (**2022 ECCV**) [paper](https://link.springer.com/chapter/10.1007/978-3-031-19800-7_21) [code](https://github.com/jinwan1994/SG-ShadowNet)
+ UDoc-GAN: Unpaired Document Illumination Correction with Background Light Prior (**2022 ACM MM**) [paper](https://dl.acm.org/doi/abs/10.1145/3503161.3547916) [code](https://github.com/harrytea/UDoc-GAN)
+ LP-IOANet: Efficient High Resolution Document Shadow Removal (**2023 ICASSP**) [paper](https://ieeexplore.ieee.org/abstract/document/10095920)
+ Shadow Removal of Text Document Images Using Background Estimation and Adaptive Text Enhancement (**2023 ICASSP**) [paper](https://ieeexplore.ieee.org/abstract/document/10096115)
+ Mask-Guided Stamp Erasure for Real Document Image (**2023 ICME**) [paper](https://ieeexplore.ieee.org/abstract/document/10219771)
+ Document Image Shadow Removal Guided by Color-Aware Background (**2023 CVPR**) [paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Zhang_Document_Image_Shadow_Removal_Guided_by_Color-Aware_Background_CVPR_2023_paper.pdf)
+ DocDiff: Document Enhancement via Residual Diffusion Models (**2023 ACM MM**) [paper](https://dl.acm.org/doi/abs/10.1145/3581783.3611730) [code](https://github.com/Royalvice/DocDiff)
+ DocNLC: A Document Image Enhancement Framework with Normalized and Latent Contrastive Representation for Multiple Degradations (**2024 AAAI**) [paper](https://ojs.aaai.org/index.php/AAAI/article/view/28366) [code](https://github.com/RylonW/DocNLC)
+ DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks (**2024 CVPR**) [paper](https://arxiv.org/pdf/2405.04408.pdf) [code](https://github.com/ZZZHANG-jx/DocRes)

### Scene Text Removal
+ Image-to-Image Translation with Conditional Adversarial Networks (**2017 CVPR**) [paper](https://openaccess.thecvf.com/content_cvpr_2017/papers/Isola_Image-To-Image_Translation_With_CVPR_2017_paper.pdf)
+ Scene text eraser (**2017 ICDAR**) [paper](https://arxiv.org/pdf/1705.02772.pdf)
+ Automatic Semantic Content Removal by Learning to Neglect (**2018 BMVC**) [paper](https://arxiv.org/pdf/1807.07696.pdf)
+ Ensnet: Ensconce text in the wild (**2019 AAAI**) [paper](https://ojs.aaai.org/index.php/AAAI/article/download/3859/3737) [code](https://github.com/HCIILAB/Scene-Text-Removal)
+ Mtrnet: A generic scene text eraser (**2019 ICDAR**) [paper](https://arxiv.org/pdf/1903.04092)
+ Erasenet: End-to-end text removal in the wild (**2020 TIP**) [paper](https://ieeexplore.ieee.org/abstract/document/9180003) [code](https://github.com/HCIILAB/SCUT-EnsText)
+ Mtrnet++: One-stage mask-based scene text eraser (**2020 CVIU**) [paper](https://arxiv.org/pdf/1912.07183.pdf)
+ Erasing scene text with weak supervision (**2020 WACV**) [paper](https://openaccess.thecvf.com/content_WACV_2020/papers/Zdenek_Erasing_Scene_Text_with_Weak_Supervision_WACV_2020_paper.pdf)
+ Stroke-Based Scene Text Erasing Using Synthetic Data for Training (**2021 TIP**) [paper](https://ieeexplore.ieee.org/abstract/document/9609970)
+ Text region conditional generative adversarial network for text concealment in the wild (**2021 TCSVT**) [paper](https://ieeexplore.ieee.org/abstract/document/9509541)
+ Two-Stage Seamless Text Erasing On Real-World Scene Images (**2021 ICIP**) [paper](https://ieeexplore.ieee.org/abstract/document/9506394)
+ Scene text removal via cascaded text stroke detection and erasing (**2022 CVM**) [paper](https://link.springer.com/content/pdf/10.1007/s41095-021-0242-8.pdf)
+ Self-supervised text erasing with controllable image synthesis (**2022 ACM MM**) [paper](https://arxiv.org/pdf/2204.12743.pdf)
+ Multi-branch network with ensemble learning for text removal in the wild (**2022 ACCV**) [paper](https://openaccess.thecvf.com/content/ACCV2022/papers/Hou_Multi-Branch_Network_with_Ensemble_Learning_for_Text_Removal_in_the_ACCV_2022_paper.pdf)
+ The Surprisingly Straightforward Scene Text Removal Method with Gated Attention and Region of Interest Generation: A Comprehensive Prominent Model Analysis (**2022 ECCV**) [paper](https://arxiv.org/pdf/2210.07489) [code](https://github.com/naver/garnet)
+ Don’t forget me: accurate background recovery for text removal via modeling local-global context (**2022 ECCV**) [paper](https://arxiv.org/pdf/2207.10273) [code](https://github.com/lcy0604/CTRNet)
+ Psstrnet: progressive segmentation-guided scene text removal network (**2022 ICME**) [paper](https://arxiv.org/pdf/2306.07842)
+ Modeling stroke mask for end-to-end text erasing (**2023 WACV**) [paper](https://openaccess.thecvf.com/content/WACV2023/papers/Du_Modeling_Stroke_Mask_for_End-to-End_Text_Erasing_WACV_2023_paper.pdf)
+ Viteraser: Harnessing the power of vision transformers for scene text removal with segmim pretraining (**2023 arxiv**) [paper](https://arxiv.org/pdf/2306.12106) [code](https://github.com/shannanyinxiang/ViTEraser)
+ Progressive scene text erasing with self-supervision (**2023 CVIU**) [paper](https://arxiv.org/pdf/2207.11469.pdf)
+ What is the Real Need for Scene Text Removal? Exploring the Background Integrity and Erasure Exhaustivity Properties (**2023 TIP**) [paper](https://ieeexplore.ieee.org/abstract/document/10214243) [code](https://github.com/wangyuxin87/PERT)
+ PERT: A Progressively Region-based Network for Scene Text Removal (**2023 TIP**) [paper](https://ieeexplore.ieee.org/document/10214243) [code](https://github.com/wangyuxin87/PERT)
+ Selective scene text removal (**2023 BMVC**) [paper](https://arxiv.org/pdf/2309.00410.pdf) [code](https://github.com/mitanihayato/Selective-Scene-Text-Removal)
+ FETNet: Feature Erasing and Transferring Network for Scene Text Removal (**2023 PR**) [paper](https://arxiv.org/abs/2306.09593) [code](https://github.com/GuangtaoLyu/FETNet)

### Scene Text Editing
+ Scene text magnifier (**2019 ICDAR**) [paper](https://arxiv.org/pdf/1907.00693.pdf)
+ Selective style transfer for text (**2019 ICDAR**) [paper](https://arxiv.org/pdf/1906.01466.pdf) [code](https://github.com/furkanbiten/SelectiveTextStyleTransfer)
+ Editing text in the wild (**2019 ACM MM**) [paper](https://arxiv.org/pdf/1908.03047.pdf) [code](https://github.com/youdao-ai/SRNet)
+ Swaptext: Image based texts transfer in scenes (**2020 CVPR**) [paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Yang_SwapText_Image_Based_Texts_Transfer_in_Scenes_CVPR_2020_paper.pdf)
+ Scene text transfer for cross-language (**2021 ICIG**) [paper](https://link.springer.com/chapter/10.1007/978-3-030-87355-4_46)
+ Mask-guided gan for robust text editing in the scene (**2021 Neurocomputing**) [paper](https://www.sciencedirect.com/science/article/abs/pii/S092523122100299X)
+ Stefann: scene text editor using font adaptive neural network (**2020 CVPR**) [paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Roy_STEFANN_Scene_Text_Editor_Using_Font_Adaptive_Neural_Network_CVPR_2020_paper.pdf)
+ Deep learning-based forgery attack on document images (**2021 TIP**) [paper](https://arxiv.org/pdf/2102.00653)
+ Strive: Scene text replacement in videos (**2021 ICCV**) [paper](https://openaccess.thecvf.com/content/ICCV2021/papers/G_STRIVE_Scene_Text_Replacement_in_Videos_ICCV_2021_paper.pdf)
+ RewriteNet: Reliable Scene Text Editing with Implicit Decomposition of Text Contents and Styles (**2022 CVPRW**) [paper](https://arxiv.org/pdf/2107.11041.pdf) [code](https://github.com/clovaai/rewritenet)
+ Fast: Font-agnostic scene text editing (**2023 arxiv**) [paper](https://arxiv.org/abs/2308.02905)
+ Letter Embedding Guidance Diffusion Model for Scene Text Editing (**2023 ICME**) [paper](http://ercdm.sdu.edu.cn/__local/1/D6/7E/7C1DDDEFCDC240906F00E254B02_354743F8_1C042B.pdf)
+ Exploring stroke-level modifications for scene text editing (**2023 AAAI**) [paper](https://ojs.aaai.org/index.php/AAAI/article/view/25305/25077) [code](https://github.com/qqqyd/MOSTEL)
+ Textstylebrush: Transfer of text aesthetics from a single example (**2023 TPAMI**) [paper](https://arxiv.org/pdf/2106.08385.pdf)
+ Self-Supervised Cross-Language Scene Text Editing (**2023 ACM MM**) [paper](https://dl.acm.org/doi/abs/10.1145/3581783.3612174)
+ Scene style text editing (**2023 arxiv**) [paper](https://arxiv.org/pdf/2304.10097.pdf)
+ Improving Diffusion Models for Scene Text Editing with Dual Encoders (**2023 arxiv**) [paper](https://arxiv.org/pdf/2304.05568.pdf) [code](https://github.com/UCSB-NLP-Chang/DiffSTE)
+ Towards scene-text to scene-text translation (**2023 arxiv**) [paper](https://arxiv.org/pdf/2308.03024.pdf)
+ DiffUTE: Universal Text Editing Diffusion Model (**2023 NIPS**) [paper](https://arxiv.org/abs/2305.10825) [code](https://github.com/chenhaoxing/DiffUTE)
+ On manipulating scene text in the wild with diffusion models (**2024 WACV**) [paper](https://arxiv.org/pdf/2311.00734.pdf)
+ AnyText: Multilingual Visual Text Generation And Editing (**2024 ICLR**) [paper](https://arxiv.org/pdf/2311.03054) [code](https://github.com/tyxsspa/AnyText)
+ Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing (**2024 CVPR**) [paper](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhang_Choose_What_You_Need_Disentangled_Representation_Learning_for_Scene_Text_CVPR_2024_paper.pdf)
+ TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control (**2024 NIPS**) [paper](https://arxiv.org/pdf/2410.10133) [code](https://github.com/weichaozeng/TextCtrl)
+ How Control Information Influences Multilingual Text Image Generation and Editing? (**2024 NIPS**) [paper](https://arxiv.org/pdf/2407.11502) [code](https://github.com/CyrilSterling/TextGen)
+ TextMaster: Universal Controllable Text Edit (**2024 arxiv**) [paper](https://arxiv.org/pdf/2410.09879)
+ AnyText2: Visual Text Generation and Editing With Customizable Attributes (**2024 arxiv**) [paper](https://arxiv.org/pdf/2411.15245) [code](https://github.com/tyxsspa/AnyText2)
+ Recognition-Synergistic Scene Text Editing (**2025 CVPR**) [paper](https://arxiv.org/pdf/2503.08387) [code](https://github.com/ZhengyaoFang/RS-STE)
+ GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing (**2025 CVPR**) [paper](https://arxiv.org/pdf/2505.04915)
+ Type-R: Automatically Retouching Typos for Text-to-Image Generation (**2025 CVPR**) [paper](https://arxiv.org/pdf/2411.18159) [code](https://github.com/CyberAgentAILab/Type-R)

### Scene Text Generation
+ Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition (**2014 arxiv**) [paper](https://arxiv.org/pdf/1406.2227.pdf)
+ Synthetic data for text localisation in natural images (**2016 CVPR**) [paper](https://openaccess.thecvf.com/content_cvpr_2016/papers/Gupta_Synthetic_Data_for_CVPR_2016_paper.pdf) [code](https://github.com/ankush-me/SynthText)
+ Text detection in traffic informatory signs using synthetic data (**2017 ICDAR**) [paper](https://ieeexplore.ieee.org/abstract/document/8270075)
+ Verisimilar image synthesis for accurate detection and recognition of texts in scenes (**2018 ECCV**) [paper](https://openaccess.thecvf.com/content_ECCV_2018/papers/Fangneng_Zhan_Verisimilar_Image_Synthesis_ECCV_2018_paper.pdf) [code](https://github.com/GodOfSmallThings/Verisimilar-Image-Synthesis-for-Accurate-Detection-and-Recognition-of-Texts-in-Scenes)
+ Spatial Fusion GAN for Image Synthesis (**2019 CVPR**)
[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Zhan_Spatial_Fusion_GAN_for_Image_Synthesis_CVPR_2019_paper.pdf)
+ Learning to draw text in natural images with conditional adversarial networks (**2019 IJCAI**) [paper](https://www.ijcai.org/Proceedings/2019/0101.pdf)
+ ScrabbleGAN: Semi-Supervised Varying Length Handwritten Text Generation (**2020 CVPR**) [paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Fogel_ScrabbleGAN_Semi-Supervised_Varying_Length_Handwritten_Text_Generation_CVPR_2020_paper.pdf)
+ SynthText3D: synthesizing scene text images from 3D virtual worlds (**2020 Science China Information Sciences**) [paper](https://link.springer.com/article/10.1007/s11432-019-2737-0)
+ UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World (**2020 arxiv**) [paper](https://arxiv.org/pdf/2003.10608.pdf) [code](https://github.com/Jyouhou/UnrealText/)
+ Synthtiger: Synthetic text image generator towards better text recognition models (**2021 ICDAR**) [paper](https://link.springer.com/chapter/10.1007/978-3-030-86337-1_8) [code](https://github.com/clovaai/synthtiger)
+ Vector Quantized Diffusion Model for Text-to-Image Synthesis (**2022 CVPR**) [paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Gu_Vector_Quantized_Diffusion_Model_for_Text-to-Image_Synthesis_CVPR_2022_paper.pdf)
+ Photorealistic text-to-image diffusion models with deep language understanding (**2022 NIPS**) [paper](https://proceedings.neurips.cc/paper_files/paper/2022/file/ec795aeadae0b7d230fa35cbaf04c041-Paper-Conference.pdf)
+ eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers (**2022 arxiv**) [paper](https://arxiv.org/pdf/2211.01324.pdf) [code](https://deepimagination.cc/eDiff-I/)
+ Character-Aware Models Improve Visual Text Rendering (**2023 ACL**) [paper](https://arxiv.org/pdf/2212.10562.pdf)
+ Deepfloyd (**2023**) [code](https://github.com/deep-floyd/if)
+ GlyphDraw: Seamlessly Rendering Text with Intricate Spatial Structures in Text-to-Image Generation (**2023 arxiv**) [paper](https://arxiv.org/pdf/2303.17870.pdf) [code](https://1073521013.github.io/glyph-draw.github.io/)
+ TextDiffuser: Diffusion Models as Text Painters (**2023 NIPS**) [paper](https://arxiv.org/pdf/2305.10855.pdf) [code](https://aka.ms/textdiffuser)
+ Glyphcontrol: Glyph conditional control for visual text generation (**2023 NIPS**) [paper](https://arxiv.org/pdf/2305.10855.pdf) [code](https://aka.ms/textdiffuser)
+ AnyText: Multilingual Visual Text Generation And Editing (**2024 ICLR**) [paper](https://arxiv.org/pdf/2311.03054) [code](https://github.com/tyxsspa/AnyText)
+ How Control Information Influences Multilingual Text Image Generation and Editing? (**2024 NIPS**) [paper](https://arxiv.org/pdf/2407.11502) [code](https://github.com/CyrilSterling/TextGen)
+ First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending (**2024 ECAI**) [paper](https://arxiv.org/pdf/2410.10168)
+ Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering (**2024 ECCV**) [paper](https://arxiv.org/pdf/2403.09622) [code](https://github.com/AIGText/Glyph-ByT5)
+ TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering (**2024 ECCV**) [paper](https://arxiv.org/pdf/2311.16465) [code](https://github.com/microsoft/unilm/tree/master/textdiffuser-2)
+ AnyText2: Visual Text Generation and Editing With Customizable Attributes (**2024 arxiv**) [paper](https://arxiv.org/pdf/2411.15245) [code](https://github.com/tyxsspa/AnyText2)
+ Type-R: Automatically Retouching Typos for Text-to-Image Generation (**2024 arxiv**) [paper](https://arxiv.org/pdf/2411.18159)
+ Geometric-Aware Control in Diffusion Model for Handwritten Chinese Font Generation (**2024 ICDAR**) [paper](https://link.springer.com/chapter/10.1007/978-3-031-70536-6_1)
+ DiffusionPen: Towards Controlling the Style of Handwritten Text Generation (**2024 ECCV**) [paper](https://arxiv.org/pdf/2409.06065.pdf) [code](https://github.com/koninik/DiffusionPen)
+ Visual Text Generation in the Wild (**2024 ECCV**) [paper](https://arxiv.org/pdf/2407.14138.pdf) [code](https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/OCR/SceneVTG)
+ One-Shot Diffusion Mimicker for Handwritten Text Generation (**2024 ECCV**) [paper](https://arxiv.org/pdf/2409.04004.pdf) [code](https://github.com/dailenson/One-DM)
+ Beyond Flat Text: Dual Self-inherited Guidance for Visual Text Generation (**2025 arxiv**) [paper](https://arxiv.org/pdf/2501.05892.pdf)
+ FonTS: Text Rendering with Typography and Style Controls (**2025 arxiv**) [paper](https://arxiv.org/pdf/2412.00136)
+ TextFlux: An OCR-Free DiT Model for High-Fidelity Multilingual Scene Text Synthesis (**2025 arxiv**) [paper](https://arxiv.org/pdf/2505.17778.pdf) [code](https://github.com/yyyyyxie/textflux)
+ Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model (**2025 arxiv**) [paper](https://arxiv.org/pdf/2503.07703.pdf)
+ GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models (**2025 AAAI**) [paper](https://arxiv.org/pdf/2407.02252.pdf) [code](https://github.com/OPPO-Mente-Lab/GlyphDraw2)
+ POSTA: A Go-to Framework for Customized Artistic Poster Generation (**2025 CVPR**) [paper](https://arxiv.org/pdf/2503.14908.pdf) [code](https://haoyuchen.com/POSTA)
+ PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering (**2025 CVPR**) [paper](https://arxiv.org/pdf/2504.06632.pdf) [code](https://poster-maker.github.io)
+ BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation (**2025 CVPR**) [paper](https://arxiv.org/pdf/2503.20672.pdf) [code](https://github.com/1230young/bizgen)
+ TextCrafter: Accurately Rendering
Multiple Texts in Complex Visual Scenes (**2025 arxiv**) [paper](https://arxiv.org/pdf/2503.23461.pdf) [code](https://github.com/NJU-PCALab/TextCrafter) 221 | + Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering (**2024 arxiv**) [paper](https://arxiv.org/pdf/2406.10208.pdf) [code](https://glyph-byt5-v2.github.io/) 222 | + AMO Sampler: Enhancing Text Rendering with Overshooting (**2025 CVPR**) [paper](https://arxiv.org/pdf/2411.19415.pdf) [code](https://github.com/hxixixh/amo-release) 223 | 224 | 225 | ### Cite 226 | If you are interested in it, please star our project! And cite our paper as follows: 227 | ``` 228 | @article{shu2025visual, 229 | title={Visual Text Processing: A Comprehensive Review and Unified Evaluation}, 230 | author={Shu, Yan and Zeng, Weichao and Zhao, Fangmin and Chen, Zeyu and Li, Zhenhang and Yang, Xiaomeng and Zhou, Yu and Rota, Paolo and Bai, Xiang and Jin, Lianwen and Yin Xu-Cheng and Sebe, Nicu}, 231 | journal={arXiv preprint arXiv:2504.21682}, 232 | year={2025} 233 | } 234 | ``` 235 | 236 | 237 | -------------------------------------------------------------------------------- /VTP_Bench/DID/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/DID/.DS_Store -------------------------------------------------------------------------------- /VTP_Bench/DID/__pycache__/select.cpython-310.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/DID/__pycache__/select.cpython-310.pyc -------------------------------------------------------------------------------- /VTP_Bench/DID/evaluation_results/.DS_Store: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/DID/evaluation_results/.DS_Store
--------------------------------------------------------------------------------
/VTP_Bench/DID/get_did_scores.py:
--------------------------------------------------------------------------------
import json
import re
import os


def extract_numbers(input_str):
    # Match every integer that directly follows a colon in the response string
    numbers = re.findall(r":\s*(\d+)", input_str)
    # Convert the captured digit strings to integers
    return [int(num) for num in numbers]


folder = "./DID/evaluation_results/DocTr"
json_list = os.listdir(folder)
all_geo, all_con, all_total = 0, 0, 0
num = 0
for json_name in json_list:
    json_file = os.path.join(folder, json_name)
    with open(json_file) as f:
        data = json.load(f)

    for image_name in data:
        for patch_id in data[image_name]:
            try:
                scores = extract_numbers(data[image_name][patch_id])
            except TypeError:
                continue

            # Skip responses that do not contain all three scores
            if len(scores) < 3:
                continue

            geo, con, total = scores[0], scores[1], scores[2]
            if total == 0:
                continue
            all_geo += geo
            all_con += con
            all_total += total
            num += 1


print("#############evaluation results")
final_geo = all_geo / num
final_con = all_con / num
final_total = all_total / num

print(final_geo)
print(final_con)
print(final_total)
--------------------------------------------------------------------------------
/VTP_Bench/DID/gpt_did.py:
--------------------------------------------------------------------------------
import time
import openai
import json
from tqdm import tqdm
import os
import cv2
import base64
import random


def get_patch(image, x, y, patch_size=112):
    """
    Extracts a patch of the specified size from the given position in the image.
    """
    return image[y:y + patch_size, x:x + patch_size]


def chat(image_path):
    openai.api_base = "https://api.942ai.com/v1"
    openai.api_key = ""

    # Load the image and split it into the original (left) and dewarped (right) halves
    image = cv2.imread(image_path)
    width = image.shape[1] // 2
    left_image = image[:, :width]
    right_image = image[:, width:]

    responses = {}

    height, width = left_image.shape[:2]
    patch_size = 224

    for idx in range(1):  # Sample one pair of corresponding patches
        # Pick a random top-left corner shared by both halves
        x = random.randint(0, width - patch_size)
        y = random.randint(0, height - patch_size)

        # Get patches from both images at the same position
        left_patch = get_patch(left_image, x, y, patch_size)
        right_patch = get_patch(right_image, x, y, patch_size)

        # Concatenate patches horizontally
        combined = cv2.hconcat([left_patch, right_patch])

        # Save the temporary combined image
        temp_path = "/share/shuyan/VTP/DID/temp/" + image_path.split("/")[-1]
        cv2.imwrite(temp_path, combined)
        with open(temp_path, "rb") as img_file:
            img_base64 = base64.b64encode(img_file.read()).decode("utf-8")

        # Prepare the messages
        messages = [
            {
                "role": "system",
                "content": """
The image you will receive is composed of two parts: (1) The left section contains the original image. (2) The right section is generated by a model after image dewarping. Please evaluate the model's image dewarping capabilities based on the following two criteria:

(1) Geometric Accuracy (Score: 0-5): Assign a higher score if the dewarped image (the right section) exhibits accurate geometric correction, such as: Straightened lines: Text lines or table lines that should be straight are properly aligned without residual curvature. Shape preservation: The overall shape of the document is correctly restored (e.g., rectangular or square layout). Margin alignment: The edges of the document are neatly aligned and parallel/perpendicular to the frame of the image.
(2) Content Readability (Score: 0-5): Assign a higher score if the text and visual elements in the dewarped image (the right section) are easy to read and understand, considering: Text clarity: The characters are legible, without distortion, excessive stretching, or compression. Visual continuity: The arrangement of text and other elements (e.g., images, diagrams) appears natural and coherent, with minimal artifacts or noise. Font consistency: The font size and style remain consistent across the document, avoiding irregularities caused by dewarping.

Output the evaluation strictly in the following JSON format without any additional explanation or comments:
{'Geometric Accuracy': score_geo, 'Content Readability': score_con, 'total_score': score_geo + score_con}
"""
            },
            {
                "role": "user",
                "content": [{
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{img_base64}"}
                }]
            }
        ]

        # Call the OpenAI API
        response = openai.ChatCompletion.create(model="gpt-4o", messages=messages, temperature=0)
        answer = response.choices[0].message['content']
        responses[idx] = answer
        print(f"Response for patch {idx + 1}:", answer)

    return responses


if __name__ == '__main__':
    folder = "select_data/DocTr"
    new_folder = "./DID/evaluation_results/DocTr"
    image_list = os.listdir(folder)

    if not os.path.exists(new_folder):
        os.makedirs(new_folder)

    for i in tqdm(image_list):
        # Skip images that already have an evaluation result
        if i.replace(".png", ".json") in os.listdir(new_folder):
            continue

        save_dict = {}
        image_path = os.path.join(folder, i)
        print(f"Processing {image_path}")
        response = chat(image_path)
        print(response)
        save_dict[i] = response

        with open(os.path.join(new_folder, i.replace(".png", ".json")), "w") as f:
            json.dump(save_dict, f)
--------------------------------------------------------------------------------
/VTP_Bench/DID/select_data/UVDoc/dir300_14.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/DID/select_data/UVDoc/dir300_14.png
--------------------------------------------------------------------------------
/VTP_Bench/DID/select_data/UVDoc/dir300_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/DID/select_data/UVDoc/dir300_3.png
--------------------------------------------------------------------------------
/VTP_Bench/STE/STE_data/.DS_Store:
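
The aggregation in `get_did_scores.py` only works if the GPT-4o reply follows the JSON template requested in `gpt_did.py`'s system prompt. A minimal sketch of how the colon-number regex recovers the three scores from such a reply (the `reply` string below is a hypothetical example, not real model output):

```python
import re

def extract_numbers(input_str):
    # Capture every integer that directly follows a colon.
    return [int(num) for num in re.findall(r":\s*(\d+)", input_str)]

# Hypothetical GPT-4o reply that follows the prompt's output template.
reply = "{'Geometric Accuracy': 4, 'Content Readability': 3, 'total_score': 7}"
geo, con, total = extract_numbers(reply)
print(geo, con, total)  # → 4 3 7
```

Because the regex only matches non-negative integers after a colon, a reply that deviates from the template (extra prose, decimal scores) yields fewer than three numbers, and the aggregation loop skips it via the `len(scores) < 3` check.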
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/.DS_Store -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/.DS_Store -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/0.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1000.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1000.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1005.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1005.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1006.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1006.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1011.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1011.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1012.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1012.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1017.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1017.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1018.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1018.png 
-------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1023.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1023.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1024.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1024.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1029.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1029.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1030.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1030.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1035.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1035.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1036.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1036.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1041.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1041.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1042.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1042.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1047.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1047.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1048.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1048.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1053.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1053.png 
-------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1054.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1054.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1059.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1059.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1060.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1060.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1065.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1065.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1066.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1066.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1071.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1071.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1072.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1072.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1077.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1077.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1078.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1078.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1083.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1083.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1084.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1084.png 
-------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1089.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1089.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1090.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1090.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1095.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1095.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1096.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1096.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1101.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1101.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1102.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1102.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1107.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1107.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1108.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1108.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1113.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1113.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1114.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1114.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1119.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1119.png 
-------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1120.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1120.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1125.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1125.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1126.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1126.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1131.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1131.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1132.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1132.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1137.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1137.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1138.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1138.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1143.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1143.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1144.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1144.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1149.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1149.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1150.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1150.png 
-------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1155.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1155.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1156.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1156.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1161.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1161.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1162.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1162.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1167.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1167.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1168.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1168.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1173.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1173.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1174.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1174.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1179.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1179.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1180.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1180.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1185.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1185.png 
-------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1186.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1186.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1191.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1191.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1192.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1192.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1197.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1197.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1198.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1198.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1203.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1203.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1204.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1204.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1209.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1209.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1210.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1210.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1215.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1215.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1216.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1216.png 
-------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1221.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1221.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1222.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1222.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1227.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1227.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1228.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1228.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1233.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1233.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1234.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1234.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1239.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1239.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1240.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1240.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1245.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1245.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1246.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1246.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1251.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1251.png 
-------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1252.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1252.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1257.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1257.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1258.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1258.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1263.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1263.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1264.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1264.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1269.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1269.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1270.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1270.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1277.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1277.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1278.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1278.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1284.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1284.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/1285.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/1285.png 
-------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/6.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/7.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/885.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/885.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/886.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/886.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/891.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/891.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/892.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/892.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/897.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/897.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/898.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/898.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/903.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/903.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/904.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/904.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/909.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/909.png 
-------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/910.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/910.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/915.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/915.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/916.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/916.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/921.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/921.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/922.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/922.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/927.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/927.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/928.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/928.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/933.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/933.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/934.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/934.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/939.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/939.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/940.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/940.png 
-------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/945.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/945.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/946.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/946.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/951.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/951.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/952.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/952.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/957.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/957.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/958.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/958.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/963.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/963.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/964.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/964.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/969.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/969.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/970.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/970.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/975.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/975.png 
-------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/976.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/976.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/981.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/981.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/982.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/982.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/987.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/987.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/988.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/988.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/993.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/993.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/994.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/994.png -------------------------------------------------------------------------------- /VTP_Bench/STE/STE_data/AnyText/999.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/STE_data/AnyText/999.png -------------------------------------------------------------------------------- /VTP_Bench/STE/evaluation_results/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STE/evaluation_results/.DS_Store -------------------------------------------------------------------------------- /VTP_Bench/STE/get_ste_score.py: -------------------------------------------------------------------------------- 1 | import json 2 | import re 3 | import os 4 | 5 | 6 | def extract_numbers(input_str): 7 | # 正则表达式匹配冒号后面的数字 8 | numbers = re.findall(r":\s*(\d+)", input_str) 9 | # 将提取的数字转换为整数 10 | return [int(num) for num in numbers] 11 | 12 | folder="./evaluation_results/DiffSTE" 13 | json_list=os.listdir(folder) 14 | all_vs,all_tf,all_total=0,0,0 15 | num=0 16 | for j in json_list: 17 | json_file=os.path.join(folder,j) 18 | with open(json_file) as f: 19 | data=json.load(f) 20 | 21 | print("#####",data) 22 | 23 | for i in data: 24 | # print(i) 25 | try: 26 | scores = 
extract_numbers(data[i]) 27 | except: 28 | continue 29 | # print(scores) 30 | 31 | if len(scores) <3: 32 | continue 33 | 34 | 35 | vs=scores[0] 36 | tf=scores[1] 37 | total=scores[2] 38 | all_vs+=vs 39 | all_tf+=tf 40 | all_total+=total 41 | num+=1 42 | 43 | 44 | 45 | 46 | 47 | print("#############evaluation results") 48 | final_vs=all_vs/num 49 | final_tf=all_tf/num 50 | final_total=all_total/num 51 | 52 | print(final_vs) 53 | print(final_tf) 54 | print(final_total) 55 | 56 | -------------------------------------------------------------------------------- /VTP_Bench/STE/gpt_ste.py: -------------------------------------------------------------------------------- 1 | import time 2 | import openai 3 | import json 4 | from tqdm import tqdm 5 | import os 6 | import cv2 7 | import base64 8 | from tqdm import tqdm 9 | 10 | 11 | def chat(image_path,word): 12 | openai.api_base = "https://api.942ai.com/v1" 13 | openai.api_key = "" 14 | 15 | 16 | # Convert the image to Base64 17 | with open(image_path, "rb") as img_file: 18 | img_base64 = base64.b64encode(img_file.read()).decode("utf-8") 19 | 20 | 21 | 22 | 23 | messages = [ 24 | { 25 | "role": "system", 26 | "content": """ 27 | The image you will receive is composed of two parts: (1) The top section contains the original image. (2) The bottom section is generated by a model after image editing. Please evaluate the model’s image editing capabilities based on the following two criteria: 28 | 29 | (1) Visual Similarity (Score: 0-5): Assign a higher score if the generated image (the bottom section) closely resembles the original image (the top section) in terms of visual elements such as background color, font style, font color, and texture. 30 | (2) Textual Fidelity (Score: 0-5): Firstly recognize the texts in the generated image (the bottom section) and assign a higher score if the words in the generated image closely match the given target text. 
31 | 32 | Output the evaluation strictly in the following JSON format without any additional explanation or comments: 33 | {'Visual Similarity': score_visual, 'Textual Fidelity': score_text, 'total_score': score_visual + score_text} 34 | """ 35 | }, 36 | { 37 | "role": "user", 38 | "content": [ 39 | { 40 | "type": "text", 41 | "text": f"target text: {word}" 42 | }, 43 | { 44 | "type": "image_url", 45 | "image_url": {"url": f"data:image/jpeg;base64,{img_base64}"} 46 | }, 47 | ], 48 | } 49 | ] 50 | 51 | 52 | 53 | 54 | response = openai.ChatCompletion.create(model="gpt-4o",messages=messages,temperature=0) 55 | return response.choices[0].message['content'] 56 | 57 | 58 | 59 | 60 | 61 | if __name__ == '__main__': 62 | folder="STE_data/DiffSTE" 63 | gt="ScenePair/select.txt" 64 | with open(gt,"r") as f: 65 | texts=f.readlines() 66 | 67 | dic={} 68 | for i in texts: 69 | textline=i.split() 70 | dic[textline[0]]=textline[1] 71 | 72 | 73 | new_folder="./evaluation_results/DiffSTE" 74 | 75 | if not os.path.exists(new_folder): 76 | os.makedirs(new_folder) 77 | 78 | for i in tqdm(dic): 79 | save_dict={} 80 | print(i,dic[i]) 81 | 82 | # if i.replace(".png",".json") in os.listdir(new_folder): 83 | # continue 84 | 85 | 86 | image_path=os.path.join(folder,i) 87 | response=chat(image_path,dic[i]) 88 | print(response) 89 | save_dict[i]=response 90 | with open(os.path.join(new_folder,i.replace(".png",".json")),"w") as f: 91 | json.dump(save_dict,f) -------------------------------------------------------------------------------- /VTP_Bench/STG/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STG/.DS_Store -------------------------------------------------------------------------------- /VTP_Bench/STG/data/.DS_Store: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STG/data/.DS_Store -------------------------------------------------------------------------------- /VTP_Bench/STG/data/anytext/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STG/data/anytext/.DS_Store -------------------------------------------------------------------------------- /VTP_Bench/STG/data/anytext/laion_001253592.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STG/data/anytext/laion_001253592.jpg -------------------------------------------------------------------------------- /VTP_Bench/STG/data/anytext/laion_001253640.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STG/data/anytext/laion_001253640.jpg -------------------------------------------------------------------------------- /VTP_Bench/STG/data/anytext/laion_001253653.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STG/data/anytext/laion_001253653.jpg -------------------------------------------------------------------------------- /VTP_Bench/STG/data/anytext/laion_001253713.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STG/data/anytext/laion_001253713.jpg 
-------------------------------------------------------------------------------- /VTP_Bench/STG/data/anytext/laion_001253727.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STG/data/anytext/laion_001253727.jpg -------------------------------------------------------------------------------- /VTP_Bench/STG/data/anytext/laion_001253775.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STG/data/anytext/laion_001253775.jpg -------------------------------------------------------------------------------- /VTP_Bench/STG/evaluation_results/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STG/evaluation_results/.DS_Store -------------------------------------------------------------------------------- /VTP_Bench/STG/get_stg_score.py: -------------------------------------------------------------------------------- 1 | import json 2 | import re 3 | import os 4 | 5 | 6 | def extract_numbers(input_str): 7 | # Regex: match the digits that follow each colon 8 | numbers = re.findall(r":\s*(\d+)", input_str) 9 | # Convert the extracted digit strings to integers 10 | return [int(num) for num in numbers] 11 | 12 | folder="evaluation_results/controlnet" 13 | json_list=os.listdir(folder) 14 | all_vs,all_tf,all_total=0,0,0 15 | num=0 16 | for j in json_list: 17 | json_file=os.path.join(folder,j) 18 | with open(json_file) as f: 19 | data=json.load(f) 20 | 21 | print("#####",data) 22 | 23 | for i in data: 24 | # print(i) 25 | try: 26 | scores = extract_numbers(data[i]) 27 | except: 28 | continue 29 | # print(scores) 30 | 31 | if len(scores) <3: 32 | continue 33 | 34 | 35 | 
vs=scores[0] 36 | tf=scores[1] 37 | total=scores[2] 38 | all_vs+=vs 39 | all_tf+=tf 40 | all_total+=total 41 | num+=1 42 | 43 | 44 | 45 | 46 | 47 | print("#############evaluation results") 48 | final_vs=all_vs/num 49 | final_tf=all_tf/num 50 | final_total=all_total/num 51 | 52 | print(final_vs) 53 | print(final_tf) 54 | print(final_total) 55 | 56 | -------------------------------------------------------------------------------- /VTP_Bench/STG/gpt_stg.py: -------------------------------------------------------------------------------- 1 | import time 2 | import openai 3 | import json 4 | from tqdm import tqdm 5 | import os 6 | import cv2 7 | import base64 8 | from tqdm import tqdm 9 | 10 | 11 | def chat(image_path): 12 | # openai.api_base = "https://api.942ai.com/v1" 13 | openai.api_key="" 14 | 15 | 16 | # Convert the image to Base64 17 | with open(image_path, "rb") as img_file: 18 | img_base64 = base64.b64encode(img_file.read()).decode("utf-8") 19 | 20 | 21 | 22 | 23 | messages = [ 24 | { 25 | "role": "system", 26 | "content": """ 27 | The image you will receive is composed of two parts: (1) The top section is the original image. (2) The bottom section is generated by a model after image generation. Please evaluate the model’s image generation capabilities based on the following two criteria: 28 | 29 | (1) Visual Quality (Score: 0-5): Assign a higher score if the generated image (the bottom section) has better visual quality including better clarity, lower noise, higher color accuracy and higher Edge Preservation. 30 | 31 | (2) Textual Fidelity (Score: 0-5): Firstly recognize the texts in the generated image (the bottom section) and recognize the texts in the original image (the top section). Then, assign a higher score if the words are closely matched (including text content and text positions). 
32 | 33 | Output the evaluation strictly in the following JSON format without any additional explanation or comments: 34 | {'Visual Quality': score_visual, 'Textual Fidelity': score_text, 'total_score': score_visual + score_text} 35 | """ 36 | }, 37 | { 38 | "role": "user", 39 | "content": [ 40 | { 41 | "type": "image_url", 42 | "image_url": {"url": f"data:image/jpeg;base64,{img_base64}"} 43 | }, 44 | ], 45 | } 46 | ] 47 | 48 | 49 | 50 | 51 | response = openai.ChatCompletion.create(model="gpt-4o",messages=messages,temperature=0) 52 | # response_message = completion["choices"][0]["message"]["content"] 53 | return response.choices[0].message['content'] 54 | 55 | 56 | 57 | 58 | 59 | if __name__ == '__main__': 60 | folder="data/controlnet" 61 | new_folder="evaluation_results/controlnet" 62 | 63 | if not os.path.exists(new_folder): 64 | os.makedirs(new_folder) 65 | 66 | image_list=os.listdir(folder) 67 | file_list=os.listdir(new_folder) 68 | 69 | for i in tqdm(image_list): 70 | if i.replace("jpg","json") in file_list: 71 | continue 72 | save_dict={} 73 | print("########",i) 74 | image_path=os.path.join(folder,i) 75 | response=chat(image_path) 76 | print(response) 77 | save_dict[i]=response 78 | with open(os.path.join(new_folder,i.replace(".jpg",".json")),"w") as f: 79 | json.dump(save_dict,f) -------------------------------------------------------------------------------- /VTP_Bench/STR/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STR/.DS_Store -------------------------------------------------------------------------------- /VTP_Bench/STR/STR_data/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STR/STR_data/.DS_Store
-------------------------------------------------------------------------------- /VTP_Bench/STR/STR_data/viteraser/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STR/STR_data/viteraser/.DS_Store -------------------------------------------------------------------------------- /VTP_Bench/STR/STR_data/viteraser/1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STR/STR_data/viteraser/1.png -------------------------------------------------------------------------------- /VTP_Bench/STR/STR_data/viteraser/10.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STR/STR_data/viteraser/10.png -------------------------------------------------------------------------------- /VTP_Bench/STR/STR_data/viteraser/2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STR/STR_data/viteraser/2.png -------------------------------------------------------------------------------- /VTP_Bench/STR/STR_data/viteraser/5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STR/STR_data/viteraser/5.png -------------------------------------------------------------------------------- /VTP_Bench/STR/STR_data/viteraser/6.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STR/STR_data/viteraser/6.png -------------------------------------------------------------------------------- /VTP_Bench/STR/STR_data/viteraser/8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STR/STR_data/viteraser/8.png -------------------------------------------------------------------------------- /VTP_Bench/STR/evaluation_results/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STR/evaluation_results/.DS_Store -------------------------------------------------------------------------------- /VTP_Bench/STR/get_str_score.py: -------------------------------------------------------------------------------- 1 | import json 2 | import re 3 | import os 4 | 5 | 6 | def extract_numbers(input_str): 7 | # Regex: match the digits that follow each colon 8 | numbers = re.findall(r":\s*(\d+)", input_str) 9 | # Convert the extracted digit strings to integers 10 | return [int(num) for num in numbers] 11 | 12 | folder="./evaluation_results/ctrnet" 13 | json_list=os.listdir(folder) 14 | all_vs,all_tf,all_total=0,0,0 15 | num=0 16 | for j in json_list: 17 | json_file=os.path.join(folder,j) 18 | with open(json_file) as f: 19 | data=json.load(f) 20 | 21 | print("#####",data) 22 | 23 | for i in data: 24 | # print(i) 25 | try: 26 | scores = extract_numbers(data[i]) 27 | except: 28 | continue 29 | # print(scores) 30 | 31 | if len(scores) <3: 32 | continue 33 | 34 | 35 | vs=scores[0] 36 | tf=scores[1] 37 | total=scores[2] 38 | all_vs+=vs 39 | all_tf+=tf 40 | all_total+=total 41 | num+=1 42 | 43 | 44 | 45 | 46 | 47 | print("#############evaluation results") 48 | final_vs=all_vs/num 49 | final_tf=all_tf/num 50 
| final_total=all_total/num 51 | 52 | print(final_vs) 53 | print(final_tf) 54 | print(final_total) 55 | 56 | 57 | -------------------------------------------------------------------------------- /VTP_Bench/STR/gpt_str.py: -------------------------------------------------------------------------------- 1 | import time 2 | import openai 3 | import json 4 | from tqdm import tqdm 5 | import os 6 | import cv2 7 | import base64 8 | from tqdm import tqdm 9 | 10 | 11 | def chat(image_path): 12 | # openai.api_base = "https://api.942ai.com/v1" 13 | openai.api_key="" 14 | 15 | 16 | # Convert the image to Base64 17 | with open(image_path, "rb") as img_file: 18 | img_base64 = base64.b64encode(img_file.read()).decode("utf-8") 19 | 20 | 21 | 22 | 23 | messages = [ 24 | { 25 | "role": "system", 26 | "content": """ 27 | The image you will receive is composed of two parts: (1) The top section is the original image. (2) The bottom section is generated by a model after scene text removal. Please evaluate the model’s text removal capabilities based on the following two criteria: 28 | 29 | (1) Text Erasure Degree (Score: 0-5): Assign a higher score if less visible text remains. 30 | 31 | (2) Visual Quality (Score: 0-5): Assign a higher score if the image is visually restored with minimal distortions. In particular, the erased regions should blend seamlessly with the surrounding content of the original image. 
32 | 33 | Output the evaluation strictly in the following JSON format without any additional explanation or comments: 34 | {'Text Erasure Degree': score_text, 'Visual Quality': score_visual, 'total_score': score_text + score_visual} 35 | """ 36 | }, 37 | { 38 | "role": "user", 39 | "content": [ 40 | { 41 | "type": "image_url", 42 | "image_url": {"url": f"data:image/jpeg;base64,{img_base64}"} 43 | }, 44 | ], 45 | } 46 | ] 47 | 48 | 49 | 50 | 51 | response = openai.ChatCompletion.create(model="gpt-4o",messages=messages,temperature=0) 52 | # response_message = completion["choices"][0]["message"]["content"] 53 | return response.choices[0].message['content'] 54 | 55 | 56 | 57 | 58 | 59 | if __name__ == '__main__': 60 | folder="STR_data/ctrnet" 61 | new_folder="./evaluation_results/ctrnet" 62 | 63 | if not os.path.exists(new_folder): 64 | os.makedirs(new_folder) 65 | 66 | image_list=os.listdir(folder) 67 | print(len(image_list)) 68 | file_list=os.listdir(new_folder) 69 | 70 | for i in tqdm(image_list): 71 | if i.replace("png","json") in file_list: 72 | continue 73 | save_dict={} 74 | print("########",i) 75 | image_path=os.path.join(folder,i) 76 | response=chat(image_path) 77 | print(response) 78 | save_dict[i]=response 79 | with open(os.path.join(new_folder,i.replace(".png",".json")),"w") as f: 80 | json.dump(save_dict,f) -------------------------------------------------------------------------------- /VTP_Bench/STSR/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STSR/.DS_Store -------------------------------------------------------------------------------- /VTP_Bench/STSR/evaluation_results/.DS_Store: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STSR/evaluation_results/.DS_Store -------------------------------------------------------------------------------- /VTP_Bench/STSR/get_stsr_scores.py: -------------------------------------------------------------------------------- 1 | import json 2 | import re 3 | import os 4 | 5 | 6 | def extract_numbers(input_str): 7 | # Regex: match the digits that follow each colon 8 | numbers = re.findall(r":\s*(\d+)", input_str) 9 | # Convert the extracted digit strings to integers 10 | return [int(num) for num in numbers] 11 | 12 | folder="./evaluation_results/tbsrn" 13 | json_list=os.listdir(folder) 14 | all_vs,all_tf,all_total=0,0,0 15 | num=0 16 | for j in json_list: 17 | json_file=os.path.join(folder,j) 18 | with open(json_file) as f: 19 | data=json.load(f) 20 | 21 | print("#####",data) 22 | 23 | for i in data: 24 | # print(i) 25 | try: 26 | scores = extract_numbers(data[i]) 27 | except: 28 | continue 29 | # print(scores) 30 | 31 | if len(scores) <3: 32 | continue 33 | 34 | 35 | vs=scores[0] 36 | tf=scores[1] 37 | total=scores[2] 38 | all_vs+=vs 39 | all_tf+=tf 40 | all_total+=total 41 | num+=1 42 | 43 | 44 | 45 | 46 | 47 | print("#############evaluation results") 48 | final_vs=all_vs/num 49 | final_tf=all_tf/num 50 | final_total=all_total/num 51 | 52 | print(final_vs) 53 | print(final_tf) 54 | print(final_total) 55 | 56 | -------------------------------------------------------------------------------- /VTP_Bench/STSR/gpt_stsr.py: -------------------------------------------------------------------------------- 1 | import time 2 | import openai 3 | import json 4 | from tqdm import tqdm 5 | import os 6 | import cv2 7 | import base64 8 | from tqdm import tqdm 9 | 10 | 11 | def chat(image_path): 12 | # openai.api_base = "https://api.942ai.com/v1" 13 | openai.api_key="" 14 | 15 | 16 | # Convert the image to Base64 17 | with open(image_path, "rb") as img_file: 18 | img_base64 = 
base64.b64encode(img_file.read()).decode("utf-8") 19 | 20 | 21 | 22 | 23 | messages = [ 24 | { 25 | "role": "system", 26 | "content": """ 27 | The image you will receive is composed of two parts: (1) The top section is the original image. (2) The bottom section is generated by a model after image super-resolution. Please evaluate the model’s image super-resolution capabilities based on the following two criteria: 28 | 29 | (1) Visual Quality (Score: 0-5): Assign a higher score if the generated image (the bottom section) has better visual quality including better clarity, lower noise, higher color accuracy and higher Edge Preservation. 30 | 31 | (2) Textual Fidelity (Score: 0-5): Firstly recognize the texts in the generated image (the bottom section) and recognize the texts in the original image (the top section). Then, assign a higher score if the words are closely matched. 32 | 33 | Output the evaluation strictly in the following JSON format without any additional explanation or comments: 34 | {'Visual Quality': score_visual, 'Textual Fidelity': score_text, 'total_score': score_visual + score_text} 35 | """ 36 | }, 37 | { 38 | "role": "user", 39 | "content": [ 40 | { 41 | "type": "image_url", 42 | "image_url": {"url": f"data:image/jpeg;base64,{img_base64}"} 43 | }, 44 | ], 45 | } 46 | ] 47 | 48 | 49 | 50 | 51 | response = openai.ChatCompletion.create(model="gpt-4o",messages=messages,temperature=0) 52 | # response_message = completion["choices"][0]["message"]["content"] 53 | return response.choices[0].message['content'] 54 | 55 | 56 | 57 | 58 | 59 | if __name__ == '__main__': 60 | folder="STSR_data/tbsrn" 61 | new_folder="./evaluation_results/tbsrn" 62 | 63 | if not os.path.exists(new_folder): 64 | os.makedirs(new_folder) 65 | 66 | image_list=os.listdir(folder) 67 | file_list=os.listdir(new_folder) 68 | 69 | for i in tqdm(image_list): 70 | if i.replace("jpg","json") in file_list: 71 | continue 72 | save_dict={} 73 | print("########",i) 74 | 
image_path=os.path.join(folder,i) 75 | response=chat(image_path) 76 | print(response) 77 | save_dict[i]=response 78 | with open(os.path.join(new_folder,i.replace(".jpg",".json")),"w") as f: 79 | json.dump(save_dict,f) -------------------------------------------------------------------------------- /VTP_Bench/STSR/lemma/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STSR/lemma/.DS_Store -------------------------------------------------------------------------------- /VTP_Bench/STSR/lemma/final_merged_0_hr.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STSR/lemma/final_merged_0_hr.jpg -------------------------------------------------------------------------------- /VTP_Bench/STSR/lemma/final_merged_0_lr.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STSR/lemma/final_merged_0_lr.jpg -------------------------------------------------------------------------------- /VTP_Bench/STSR/lemma/final_merged_0_sr.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STSR/lemma/final_merged_0_sr.jpg -------------------------------------------------------------------------------- /VTP_Bench/STSR/lemma/final_merged_1_hr.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STSR/lemma/final_merged_1_hr.jpg -------------------------------------------------------------------------------- /VTP_Bench/STSR/lemma/final_merged_1_lr.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STSR/lemma/final_merged_1_lr.jpg -------------------------------------------------------------------------------- /VTP_Bench/STSR/lemma/final_merged_1_sr.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/STSR/lemma/final_merged_1_sr.jpg -------------------------------------------------------------------------------- /VTP_Bench/TIE/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/TIE/.DS_Store -------------------------------------------------------------------------------- /VTP_Bench/TIE/data/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/TIE/data/.DS_Store -------------------------------------------------------------------------------- /VTP_Bench/TIE/data/docdiff/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/TIE/data/docdiff/.DS_Store -------------------------------------------------------------------------------- 
/VTP_Bench/TIE/data/docdiff/train_image_0001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/TIE/data/docdiff/train_image_0001.png -------------------------------------------------------------------------------- /VTP_Bench/TIE/data/docdiff/train_image_0003.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/TIE/data/docdiff/train_image_0003.png -------------------------------------------------------------------------------- /VTP_Bench/TIE/data/docdiff/train_image_0004.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/TIE/data/docdiff/train_image_0004.png -------------------------------------------------------------------------------- /VTP_Bench/TIE/data/docdiff/train_image_0005.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/TIE/data/docdiff/train_image_0005.png -------------------------------------------------------------------------------- /VTP_Bench/TIE/data/docdiff/train_image_0007.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/TIE/data/docdiff/train_image_0007.png -------------------------------------------------------------------------------- /VTP_Bench/TIE/data/docdiff/train_image_0009.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/VTP_Bench/TIE/data/docdiff/train_image_0009.png -------------------------------------------------------------------------------- /VTP_Bench/TIE/get_tie_score.py: -------------------------------------------------------------------------------- 1 | import json 2 | import re 3 | import os 4 | 5 | 6 | def extract_numbers(input_str): 7 | # Regex: match the digits that follow each colon 8 | numbers = re.findall(r":\s*(\d+)", input_str) 9 | # Convert the extracted digit strings to integers 10 | return [int(num) for num in numbers] 11 | 12 | folder="evaluation_results/docres" 13 | json_list=os.listdir(folder) 14 | all_vs,all_tf,all_total=0,0,0 15 | num=0 16 | for j in json_list: 17 | json_file=os.path.join(folder,j) 18 | with open(json_file) as f: 19 | data=json.load(f) 20 | 21 | print("#####",data) 22 | 23 | for i in data: 24 | # print(i) 25 | try: 26 | scores = extract_numbers(data[i]) 27 | except: 28 | continue 29 | # print(scores) 30 | 31 | if len(scores) <3: 32 | continue 33 | 34 | 35 | vs=scores[0] 36 | tf=scores[1] 37 | total=scores[2] 38 | all_vs+=vs 39 | all_tf+=tf 40 | all_total+=total 41 | num+=1 42 | 43 | 44 | 45 | 46 | 47 | print("#############evaluation results") 48 | final_vs=all_vs/num 49 | final_tf=all_tf/num 50 | final_total=all_total/num 51 | 52 | print(final_vs) 53 | print(final_tf) 54 | print(final_total) 55 | -------------------------------------------------------------------------------- /VTP_Bench/TIE/gpt_tie.py: -------------------------------------------------------------------------------- 1 | 2 | import time 3 | import openai 4 | import json 5 | from tqdm import tqdm 6 | import os 7 | import cv2 8 | import base64 9 | import random 10 | 11 | 12 | 13 | def chat(image_path): 14 | openai.api_key="" 15 | 16 | with open(image_path, "rb") as img_file: 17 | img_base64 = base64.b64encode(img_file.read()).decode("utf-8") 18 | 19 | 
20 | # Prepare the messages 21 | messages = [ 22 | { 23 | "role": "system", 24 | "content": """ 25 | The image you will receive is composed of two parts: (1) The top section contains the original image. (2) The bottom section is generated by a model after image deblurring. Please evaluate the model’s image deblurring capabilities based on the following two criteria: 26 | 27 | (1) Visual Quality (Score: 0-5): Assign a higher score if the deblurred image (the bottom section) exhibits higher visual quality, including better clarity, lower noise, higher color accuracy and higher Edge Preservation. 28 | 29 | (2) Content Readability (Score: 0-5): Assign a higher score if the text and visual elements in the deblurred image (the bottom section) are clear to read and understand. 30 | 31 | Output the evaluation strictly in the following JSON format without any additional explanation or comments: 32 | {'Visual Quality': score_visual, 'Content Readability': score_text, 'total_score': score_visual + score_text} 33 | """ 34 | }, 35 | { 36 | "role": "user", 37 | "content": [{ 38 | "type": "image_url", 39 | "image_url": {"url": f"data:image/jpeg;base64,{img_base64}"} 40 | }] 41 | } 42 | ] 43 | 44 | # Call the OpenAI API 45 | 46 | response = openai.ChatCompletion.create(model="gpt-4o",messages=messages,temperature=0) 47 | # response_message = completion["choices"][0]["message"]["content"] 48 | return response.choices[0].message['content'] 49 | 50 | if __name__ == '__main__': 51 | folder = "data/docres" 52 | new_folder = "./evaluation_results/docres" 53 | image_list = os.listdir(folder) 54 | 55 | if not os.path.exists(new_folder): 56 | os.makedirs(new_folder) 57 | 58 | for i in tqdm(image_list): 59 | 60 | if i.replace(".png", ".json") in os.listdir(new_folder): 61 | continue 62 | 63 | save_dict = {} 64 | image_path = os.path.join(folder, i) 65 | print(f"Processing {image_path}") 66 | response = chat(image_path) 67 | print(response) 68 | save_dict[i] = response 69 | 70 | with 
open(os.path.join(new_folder, i.replace(".png", ".json")), "w") as f: 71 | json.dump(save_dict, f) 72 | -------------------------------------------------------------------------------- /logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/logo.png -------------------------------------------------------------------------------- /vtpbench_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/shuyansy/Visual-Text-Processing-survey/52e991cd8ebe98fdc9b108d2e9c4775127d69bc5/vtpbench_2.png --------------------------------------------------------------------------------
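A note on the scoring scripts above: every `get_*_score.py` repeats the same pattern, extracting the three integers from GPT-4o's reply with a colon-digit regex, skipping replies that yield fewer than three numbers, and averaging the rest. As written, the scripts divide by `num` at the end, so they raise `ZeroDivisionError` if no reply parses. A minimal self-contained sketch of that shared logic with an explicit guard (the `average_scores` helper is illustrative and not part of the repo):

```python
import re

def extract_numbers(input_str):
    # Match every integer that follows a colon, e.g. "'Visual Quality': 4" -> 4.
    return [int(num) for num in re.findall(r":\s*(\d+)", input_str)]

def average_scores(responses):
    # Sum the first three numbers (criterion 1, criterion 2, total) of every
    # response that parses; malformed responses are skipped, as in the
    # repo's scoring scripts.
    sums, count = [0, 0, 0], 0
    for text in responses:
        scores = extract_numbers(text)
        if len(scores) < 3:
            continue  # skip malformed model output
        for k in range(3):
            sums[k] += scores[k]
        count += 1
    if count == 0:
        raise ValueError("no parseable responses to average")
    return [s / count for s in sums]

responses = [
    "{'Visual Quality': 4, 'Textual Fidelity': 3, 'total_score': 7}",
    "{'Visual Quality': 5, 'Textual Fidelity': 5, 'total_score': 10}",
]
print(average_scores(responses))  # [4.5, 4.0, 8.5]
```

One caveat of the regex approach: any other colon-number pair in the reply would also match, which is why the prompts insist on strict JSON output with no extra commentary.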