# Scene Text Recognition Resources
[Author: Chen Xiaoxue (陈晓雪)](https://github.com/CCchenxiaoxue)

The paper "Text Recognition in the Wild: A Survey" (accepted to appear in ACM Computing Surveys) is now available on [arXiv](https://arxiv.org/pdf/2005.03492v3.pdf).

# ❗❗ Newest Version Can Be Found Here ❗❗
## This repository is no longer maintained; please refer to our newest one:
## [Scene Text Recognition Recommendations](https://github.com/HCIILAB/Scene-Text-Recognition-Recommendations)

## Updates

Dec 24, 2019: add 20 papers and update corresponding tables.

Feb 29, 2020: add AAAI-2020 papers and update corresponding tables.

May 8, 2020: add CVPR-2020 papers and update corresponding tables.

Dec 8, 2020: add 11 papers and update corresponding tables. You can download the new [Excel](https://pan.baidu.com/s/1xitxu7R5hw27pVV7eJ1c7w) prepared by us. (Password: sj2t)

***

- [1. Datasets](#1-datasets)
  - [1.1 Regular Latin Datasets](#11-regular-latin-datasets)
  - [1.2 Irregular Latin Datasets](#12-irregular-latin-datasets)
  - [1.3 Multilingual Datasets](#13-multilingual-datasets)
  - [1.4 Synthetic Datasets](#14-synthetic-datasets)
  - [1.5 Comparison of the Benchmark Datasets](#15-comparison-of-the-benchmark-datasets)
- [2. Performance Comparison of Recognition Algorithms](#2-performance-comparison-of-recognition-algorithms)
  - [2.1 Characteristics Comparison of Recognition Approaches](#21-characteristics-comparison-of-recognition-approaches)
  - [2.2 Performance Comparison on Benchmark Datasets](#22-performance-comparison-on-benchmark-datasets)
    - [2.2.1 Performance Comparison of Recognition Algorithms on Regular Latin Datasets](#221-performance-comparison-of-recognition-algorithms-on-regular-latin-datasets)
    - [2.2.2 Performance Comparison of Recognition Algorithms on Irregular Latin Datasets](#222-performance-comparison-of-recognition-algorithms-on-irregular-latin-datasets)
- [3. Survey](#3-survey)
- [4. OCR Service](#4-ocr-service)
- [5. References](#5-references)
- [6.Help](#6help)
- [7.Copyright](#7copyright)

## 1. Datasets

### 1.1 Regular Latin Datasets

- IIIT5K[31]:
  * **Introduction:** The IIIT5K dataset [31] contains 5,000 text instance images: 2,000 for training and 3,000 for testing. It contains words from street scenes and from originally-digital images. Every image is associated with a 50-word lexicon and a 1,000-word lexicon. Specifically, each lexicon consists of the ground-truth word and some randomly picked words.
  * **Link:** [IIIT5K-download](http://cvit.iiit.ac.in/research/projects/cvit-projects/the-iiit-5k-word-dataset)
- SVT[1]:
  * **Introduction:** The SVT dataset [1] contains 350 images: 100 for training and 250 for testing. Some images are severely corrupted by noise, blur, and low resolution. Each image is associated with a 50-word lexicon.
  * **Link:** [SVT-download](http://vision.ucsd.edu/~kai/svt/)
- ICDAR 2003(IC03)[33]:
  * **Introduction:** The IC03 dataset [33] contains 509 images: 258 for training and 251 for testing.
    Specifically, it contains 867 cropped text instances after discarding images that contain non-alphanumeric characters or fewer than three characters. Every image is associated with a 50-word lexicon and a full-word lexicon. Moreover, the full lexicon combines all lexicon words.
  * **Link:** [IC03-download](http://www.iapr-tc11.org/mediawiki/index.php?title=ICDAR_2003_Robust_Reading_Competitions)
- ICDAR 2013(IC13)[34]:
  * **Introduction:** The IC13 dataset [34] contains 561 images: 420 for training and 141 for testing. It inherits data from the IC03 dataset and extends it with new images. Similar to the IC03 dataset, the IC13 dataset contains 1,015 cropped text instance images after removing the words with non-alphanumeric characters. No lexicon is associated with IC13. Notably, 215 duplicate text instance images [65] exist between the IC03 training dataset and the IC13 testing dataset. Therefore, care should be taken regarding the overlapping data when evaluating a model on the IC13 testing data.
  * **Link:** [IC13-download](http://dagdata.cvc.uab.es/icdar2013competition/?ch=2&com=downloads)
- SVHN[45]:
  * **Introduction:** The SVHN dataset [45] contains more than 600,000 digits of house numbers in natural scenes. It is obtained from a large number of street view images using a combination of automated algorithms and the Amazon Mechanical Turk (AMT) framework. The SVHN dataset is typically used for scene digit recognition.
  * **Link:** [SVHN-download](http://ufldl.stanford.edu/housenumbers/)

### 1.2 Irregular Latin Datasets

- SVT-P[35]:
  - **Introduction:** The SVT-P dataset [35] contains 238 images with 639 cropped text instances. It is specifically designed to evaluate perspective-distorted text recognition. It is built on the original SVT dataset by selecting images at the same addresses on Google Street View but with different view angles.
    Therefore, most text instances are heavily distorted by the non-frontal view angle. Moreover, each image is associated with a 50-word lexicon and a full-word lexicon.
  - **Link:** [SVT-P-download](https://pan.baidu.com/s/1rhYUn1mIo8OZQEGUZ9Nmrg) (Password: vnis)
- CUTE80[36]:
  - **Introduction:** The CUTE80 dataset [36] contains 80 high-resolution images with 288 cropped text instances. It focuses on curved text recognition. Most images in CUTE80 have a complex background, perspective distortion, and poor resolution. No lexicon is associated with CUTE80.
  - **Link:** [CUTE80-download](http://cs-chan.com/downloads_CUTE80_dataset.html)
- ICDAR 2015(IC15)[37]:
  - **Introduction:** The IC15 dataset [37] contains 1,500 images: 1,000 for training and 500 for testing. Specifically, it contains 2,077 cropped text instances, including more than 200 irregular text samples. As the images were taken with Google Glass without ensuring image quality, most of the text is very small, blurred, and multi-oriented. No lexicon is provided.
  - **Link:** [IC15-download](http://rrc.cvc.uab.es/?ch=4&com=downloads)
- COCO-Text[38]:
  - **Introduction:** The COCO-Text dataset [38] contains 63,686 images with 145,859 cropped text instances. It is the first large-scale dataset for text in natural images and also the first dataset to annotate scene text with attributes such as legibility and type of text. However, no lexicon is associated with COCO-Text.
  - **Link:** [COCO-Text-download](https://vision.cornell.edu/se3/coco-text-2/)
- Total-Text[39]:
  - **Introduction:** The Total-Text dataset [39] contains 1,555 images with 11,459 cropped text instance images. It focuses on curved scene text recognition. Images in Total-Text have more than three different orientations, including horizontal, multi-oriented, and curved. No lexicon is associated with Total-Text.
  - **Link:** [Total-Text-download](https://github.com/cs-chan/Total-Text-Dataset)

### 1.3 Multilingual Datasets

- RCTW-17(RCTW competition, ICDAR17)[40]:
  - **Introduction:** The RCTW-17 dataset contains 12,514 images: 11,514 for training and 1,000 for testing. Most are natural images collected by cameras or mobile phones, whereas others are digital-born. Text instances are annotated with labels, fonts, languages, etc.
  - **Link:** [RCTW-17-download](http://rctw.vlrlab.net/dataset/)
- MTWI(competition)[41]:
  - **Introduction:** The MTWI dataset contains 20,000 images. This is the first dataset constructed from Chinese and Latin web text. Most images in MTWI have a relatively high resolution and cover diverse types of web text, including multi-oriented text, tightly-stacked text, and complex-shaped text.
  - **Link:** [MTWI-download](https://pan.baidu.com/s/1SUODaOzV7YOPkrun0xSz6A#list/path=%2F) (Password: gox9)
- CTW[42]:
  - **Introduction:** The CTW dataset includes 32,285 high-resolution street view images with 1,018,402 character instances. All images have character-level annotations: the underlying character, the bounding box, and six other attributes.
  - **Link:** [CTW-download](https://ctwdataset.github.io/)
- SCUT-CTW1500[43]:
  - **Introduction:** The SCUT-CTW1500 dataset contains 1,500 images: 1,000 for training and 500 for testing. In particular, it provides 10,751 cropped text instance images, including 3,530 with curved text. The images are manually harvested from the Internet, image libraries such as Google Open-Image, or phone cameras.
    The dataset contains a lot of horizontal and multi-oriented text.
  - **Link:** [SCUT-CTW1500-download](https://github.com/Yuliang-Liu/Curve-Text-Detector)
* LSVT(LSVT competition, ICDAR2019)[57]:
  * **Introduction:** The LSVT dataset contains 20,000 testing samples, 30,000 fully annotated training samples, and 400,000 training samples with weak annotations (i.e., with partial labels). All images are captured from streets and reflect a large variety of complicated real-world scenarios, e.g., store fronts and landmarks.
  * **Link:** [LSVT-download](https://rrc.cvc.uab.es/?ch=16&com=downloads)
* ArT(ArT competition, ICDAR2019)[58]:
  * **Introduction:** The ArT dataset [58] contains 10,166 images: 5,603 for training and 4,563 for testing. ArT is a combination of Total-Text, SCUT-CTW1500, and Baidu Curved Scene Text, and was collected to introduce the arbitrary-shaped text problem. Moreover, all existing text shapes (i.e., horizontal, multi-oriented, and curved) have multiple occurrences in the ArT dataset.
  * **Link:** [ArT-download](https://rrc.cvc.uab.es/?ch=16&com=downloads)
* ReCTS-25k(ReCTS competition, ICDAR2019)[59]:
  * **Introduction:** The ReCTS-25k dataset [59] contains 25,000 images: 20,000 for training and 5,000 for testing. All the images are from the Meituan-Dianping Group, collected by Meituan business merchants using phone cameras under uncontrolled conditions. Specifically, the ReCTS-25k dataset mainly contains images of Chinese text on signboards.
  * **Link:** [ReCTS-download](https://rrc.cvc.uab.es/?ch=16&com=downloads)
* MLT(MLT competition, ICDAR2019)[81]:
  * **Introduction:** The MLT-2019 dataset [81] contains 20,000 images: 10,000 for training (1,000 per language) and 10,000 for testing. The dataset includes ten languages, representing seven different scripts: Arabic, Bangla, Chinese, Devanagari, English, French, German, Italian, Japanese, and Korean.
    The number of images per script is equal.
  * **Link:** [MLT-download](https://rrc.cvc.uab.es/?ch=15&com=downloads)

### 1.4 Synthetic Datasets

* Synth90k[53]:
  * **Introduction:** The Synth90k dataset contains 9 million synthetic text instance images from a set of 90k common English words. Words are rendered onto natural images with random transformations and effects, such as random fonts, colors, blur, and noise. The Synth90k dataset can emulate the distribution of scene text images and can be used instead of real-world data to train data-hungry deep learning algorithms. Besides, every image is annotated with a ground-truth word.
  * **Link:** [Synth90k-download](http://www.robots.ox.ac.uk/~vgg/data/text/)
* SynthText[54]:
  * **Introduction:** The SynthText dataset contains 800,000 images with 6 million synthetic text instances. As in the generation of the Synth90k dataset, the text sample is rendered using a randomly selected font and transformed according to the local surface orientation. Moreover, each image is annotated with a ground-truth word.
  * **Link:** [SynthText-download](https://github.com/ankush-me/SynthText)
* Verisimilar Synthesis[73]:
  * **Introduction:** The Verisimilar Synthesis dataset [73] contains 5 million synthetic text instance images. Given background images and source texts, a semantic map and a saliency map are first determined, which are then combined to identify semantically sensible and apt locations for text embedding. The color, brightness, and orientation of the source texts are further determined adaptively according to the color, brightness, and contextual structures around the embedding locations within the background image.
  * **Link:** [Verisimilar Synthesis](https://github.com/fnzhan/Verisimilar-Image-Synthesis-for-Accurate-Detection-and-Recognition-of-Texts-in-Scenes)
* UnrealText[88]:
  * **Introduction:** The UnrealText dataset [88] contains 600K synthetic images with 12 million cropped text instances. It is built on Unreal Engine 4 and the UnrealCV plugin [89]. Text instances are regarded as planar polygon meshes with text foregrounds loaded as texture. These meshes are placed in suitable positions in the 3D world and rendered together with the scene as a whole. It uses the same font set from [Google Fonts](https://fonts.google.com/) and the same text corpus, i.e., Newsgroup20, as SynthText.
  * **Link:** [Verisimilar Synthesis](https://github.com/fnzhan/Verisimilar-Image-Synthesis-for-Accurate-Detection-and-Recognition-of-Texts-in-Scenes)

### 1.5 Comparison of the Benchmark Datasets

**Comparison of the Benchmark Datasets**

| Datasets | Language | Pictures | Training Pictures | Testing Pictures | Instances | Training Instances | Testing Instances | Type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IIIT5K[31] | English | 1120 | 380 | 740 | 5000 | 2000 | 3000 | Regular |
| SVT[32] | English | 350 | 100 | 250 | 725 | 211 | 514 | Regular |
| IC03[33] | English | 509 | 258 | 251 | 2268 | 1157 | 1111 | Regular |
| IC13[34] | English | 561 | 420 | 141 | 5003 | 3564 | 1439 | Regular |
| SVHN[45] | Digits | 600000 | 573968 | 26032 | 600000 | 573968 | 26032 | Regular |
| SVT-P[35] | English | 238 | 0 | 238 | 639 | 0 | 639 | Irregular |
| CUTE80[36] | English | 80 | 0 | 80 | 288 | 0 | 288 | Irregular |
| IC15[37] | English | 1500 | 1000 | 500 | 6545 | 4468 | 2077 | Irregular |
| COCO-Text[38] | English | 63686 | 43686 | 10000 | 145859 | 118309 | 27550 | Irregular |
| Total-Text[39] | English | 1555 | 1255 | 300 | 11459 | 11166 | 293 | Irregular |
| RCTW-17[40] | Chinese/English | 12514 | 11514 | 1000 | - | - | - | Regular |
| MTWI[41] | Chinese/English | 20000 | 10000 | 10000 | 290206 | 141476 | 148730 | Regular |
| CTW[42] | Chinese/English | 32285 | 25887 | 3269 | 1018402 | 812872 | 103519 | Regular |
| SCUT-CTW1500[43] | Chinese/English | 1500 | 1000 | 500 | 10751 | 7683 | 3068 | Irregular |
| LSVT[57], [63] | Chinese/English | 450000 | 30000 | 20000 | - | - | - | Irregular |
| ArT[58] | Chinese/English | 10166 | 5603 | 4563 | 98455 | 50029 | 48426 | Irregular |
| ReCTS-25k[59] | Chinese/English | 25000 | 20000 | 5000 | 119713 | 108924 | 10789 | Irregular |
| MLT[81] | Multilingual | 20000 | 10000 | 10000 | 191639 | 89177 | 102462 | Irregular |
| Synth90k[53] | English | ~9000000 | - | - | ~9000000 | - | - | Regular |
| SynthText[54] | English | ~6000000 | - | - | ~6000000 | - | - | Regular |
| Verisimilar Synthesis[73] | English | - | - | - | ~5000000 | - | - | Regular |
| UnrealText[88] | English | ~600000 | - | - | ~12000000 | - | - | Regular |
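
Section 1.1 describes IC03 as keeping only cropped instances that are purely alphanumeric and at least three characters long. A minimal Python sketch of such a filter is shown below. The function names and the `(image_path, transcription)` sample format are assumptions made for illustration; this is not the script used by the dataset authors or by any particular benchmark.

```python
import re
from typing import List, Tuple

# ASCII letters and digits only; anything else counts as non-alphanumeric.
_ALNUM = re.compile(r"[A-Za-z0-9]+")


def keep_word(word: str, min_len: int = 3) -> bool:
    """Return True if the transcription is alphanumeric and long enough."""
    return len(word) >= min_len and _ALNUM.fullmatch(word) is not None


def filter_instances(samples: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only (image_path, transcription) pairs that pass keep_word."""
    return [(path, word) for path, word in samples if keep_word(word)]


if __name__ == "__main__":
    # Hypothetical sample list, used only to exercise the filter.
    demo = [("a.png", "Hotel"), ("b.png", "a1"), ("c.png", "co-op")]
    print(filter_instances(demo))
```

Because different papers apply different filters to the same benchmark (for example, the 1,811-image IC15 subset mentioned in Section 2.2), it helps to state the filtering rule explicitly when reporting results.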

***

## 2. Performance Comparison of Recognition Algorithms

### 2.1 Characteristics Comparison of Recognition Approaches

Note that 1) "Reg" stands for regular Latin datasets. 2) "Irreg" stands for irregular Latin datasets. 3) "Seg" denotes segmentation-based methods. 4) "Extra" indicates methods that use extra datasets other than Synth90k and SynthText. 5) "CTC" represents methods that apply a CTC-based algorithm to decode. 6) "Attn" represents methods that apply an attention mechanism to decode.

You can also download the new [Excel](https://pan.baidu.com/s/1xitxu7R5hw27pVV7eJ1c7w) prepared by us. (Password: sj2t)

Characteristics Comparison of Recognition Approaches
                          Method                          CodeRegularIrregularSegmentationExtra dataCTCAttentionSourceTime                                                                        Highlight                                                                        
Wang et al. [1] : ABBYY ××××ICCV2011a state-of-the-art text detector + a leading commercial OCR engine
Wang et al. [1] : SYNTH+PLEX×××××ICCV2011the baseline of scene text recognition
Mishra et al. [2]×××××BMVC20121) incorporating higher order statistical language models to recognize words in an unconstrained manner 2) introducing IIIT5K-word dataset
Wang et al. [3]××××ICPR2012CNNs + Non-maximal suppression + beam search
Goel et al. [4] : wDTW×××××ICDAR2013recognizing text by matching the scene and synthetic image features with wDTW
Bissacco et al. [5] : PhotoOCR×××××ICCV2013applying a network with five hidden layers for character classification
Phan et al. [6]×××××ICCV20131) MSER + SIFT descriptors + SVM 2) introducing the SVT-P datasets
Alsharif et al. [7] : HMM/Maxout×××××ICLR2014convolutional Maxout networks + Hybrid HMM
Almazan et al [8] : KCSR×××××TPAMI2014embedding word images and text strings in a common vectorial subspace and interpreting the task of recognition and retrieval as a nearest neighbor problem
Yao et al. [9] : Strokelets×××××CVPR2014proposing a novel multi-scale representation for scene text recognition: strokelets
R.-Serrano et al.[10] : Label embedding××××××IJCV2015embedding word labels and word images into a common Euclidean space and finding the cloest word label in this space
Jaderberg et al. [11]××××ECCV20141) enabling efficient feature sharing for text detection and classification 2) making technical changes over the traditional CNN architectures 3) proposing a method of automated data mining of Flickr.
Su and Lu [12]×××××ACCV2014HOG + BLSTM + CTC
Gordo[13] : Mid-features×××××CVPR2015proposing local mid-level features for building word image representations
Jaderberg et al. [14]×××××IJCV20151) treating each word as a category and training very large convolutional neural networks to perform word recognition on the whole proposal region 2) generating 9 million images with equal numbers of word samples from a 90k word dictionary
Jaderberg et al. [15]××××××ICLR2015CNN + CRF
Shi, Bai, and Yao [16] : CRNN××××TPAMI2017CNN + BLSTM + CTC
Shi et al. [17] : RARE×××××CVPR2016STN + CNN + attentional BLSTM
Lee and Osindero [18] : R2AM×××××CVPR2016presenting recursive recurrent neural networks with attention modeling
Liu et al. [19] : STAR-Net×××××BMVC2016STN + ResNet + BLSTM + CTC
Liu et al. [78]××××ICPR2016integrating the CNN and WFST classification model
Mishra et al. [77]××××CVIU2016character detection (HOG/CNN + SVM +Sliding window) + CRF, combining bottom-up cues from character detection and top-down cues from lexicon
Su and Lu [76]××××PR2017HOG(different scale) + BLSTM + CTC (ensemble)
*Yang et al. [20]××××IJCAI20171) CNN + 2D attention-based RNN, applying an auxiliary dense character detection task that helps to learn text specific visual patterns 2) developing a large-scale synthetic dataset
Yin et al. [21]×××××ICCV2017CNN + CTC
Wang et al.[66] : GRCNN××××NIPS2017Gated Recurrent Convulution Layer + BLSTM + CTC
*Cheng et al. [22] : FAN××××ICCV20171) proposing the concept of attention drift 2)introducing focusing network to focus deviated attention back on the target areas
Cheng et al. [23] : AON×××××CVPR20181) extracting scene text features in four directions 2) CNN + Attentional BLSTM
Gao et al. [24]××××NC2019attentional ResNet + CNN + CTC
Liu et al. [25] : Char-Net××××AAAI2018CNN + STN (facilitating the rectification of individual characters) + LSTM
*Liu et al. [26] : SqueezedText×××××AAAI2018binary convolutional encoder-decoder network + Bi-RNN
Zhan et al.[73]×××CVPR2018CRNN, achieving verisimilar scene text image synthesis by combining three novel designs, including semantic coherence, visual attention, and adaptive text appearance
*Bai et al. [27] : EP××××CVPR2018proposing edit probability to effectively handle the misalignment between the training text and the output probability distribution sequence
Fang et al.[74]××××MultiMedia2018ResNet + [2D Attentional CNN, CNN-based language module]
Liu et al.[75] : EnEsCTC××××NIPS2018proposing a novel maximum entropy based regularization for CTC (EnCTC) and an entropy-based pruning method (EsCTC) to effectively reduce the space of the feasible set
Liu et al. [28]×××××ECCV2018designing a multi-task network with an encoder-discriminator-generator architecture to guide the feature of the original image toward that of the clean image
Wang et al.[61] : MAAN×××××ICFHR2018ResNet + BLSTM + Memory-Augmented attentional decoder
Gao et al. [29]××××ICIP2018attentional DenseNet + BLSTM + CTC
Shi et al. [30] : ASTER××××TPAMI2018TPS + ResNet + bidirectional attention-based BLSTM
Chen et al. [60] : ASTER + AEG×××××NC2019TPS + ResNet + bidirectional attention-based BLSTM + AEG
Luo et al. [46] : MORAN××××PR2019Multi-object rectification network + CNN + attentional BLSTM
Luo et al. [61] : MORAN-v2××××PR2019Multi-object rectification network + ResNet + attentional BLSTM
Chen et al. [60] : MORAN-v2 + AEG×××××NC2019Multi-object rectification network + ResNet + attentional BLSTM + AEG
Xie et al. [47] : CAN×××××ACM2019ResNet + CNN + GLU
*Liao et al.[48] : CA-FCN×××AAAI2019performing character classification at each pixel location and needing character-level annotations
*Li et al. [49] : SAR×××AAAI2019ResNet + 2D attentional LSTM
Zhan el at. [55]: ESIR×××××CVPR2019Iterative rectification Network + ResNet + attentional BLSTM
Zhang et al. [56]: SSDAN××××CVPR2019attentional CNN + GAS + GRU
Yang et al. [62]: ScRN××××ICCV2019Symmetry-constrained Rectification Network + ResNet + BLSTM + attentional GRU
Wang et al. [64]: GCAM×××××ICME2019Convolutional Block Attention Module (CBAM) + ResNet + BLSTM + the proposed Gated Cascade Attention Module (GCAM)
Jeonghun et al. [65]××××ICCV2019TPS + ResNet + BLSTM + Attention Mechanism
Huang et al. [67] : EPAN×××××NC2019learning to sample features from the text region of 2D feature maps and innovatively introducing a two-stage attention mechanism
Gao et al. [68]×××××NC2019attentional DenseNET + 4-layer CNN + CTC
Qi et al. [69] : CCL××××ICDAR2019ResNet + [CTC, CCL]
Wang et al. [70] : ReELFA××××ICDAR2019VGG + attentional LSTM, utilizing one-hot encoded coordinates to indicate the spatial relationship of pixels and character center masks to help focus attention on the right feature areas
Zhu et al. [71] : HATN××××ICIP2019ResNet50 + Hierarchical Attention Mechanism (Transformer structure)
Zhan et al. [72] : SF-GAN××××CVPR2019ResNet50 + attentional Decoder, synthesising realistic scene text images for training better recognition models
Liao et al. [79] : SAM××××TPAMI2019Spatial attentional module (SAM)
Liao et al. [79] : seg-SAM×××TPAMI2019Character segmentation module + Spatial attention module (SAM)
Wang et al. [80] : DAN××××AAAI2020decoupling the decoder of the traditional attention mechanism into a convolutional alignment module and a decoupled text decoder
Wang et al. [82] : TextSR××××arXiv2019attempting to solve small texts with super-resolution methods
Wan et al. [83] : TextScanner××××AAAI2020an effective segmentation-based dual-branch framework for scene text recognition
Hu et al. [84] : GTC×××AAAI2020attempting to use GCN to learn the local correlations of feature sequence
Luo et al. [85] ×××××IJCV2020separating text content from noisy background styles
*Litman et al. [86]××××CVPR2020training a deep BiLSTM encoder, thus improving the encoding of contextual dependencies
Yu et al. [87]×××××CVPR2020 introducing a global semantic reasoning module (GSRM) to capture global semantic context through multi-way parallel transmission
Qiao et al. [101] : SEED××××CVPR2020 proposing a semantics enhanced encoder-decoder framework to robustly recognize low-quality scene texts
Bleeker et al. [93] : Bi-STET××××ECAI2020a novel bidirectional STR method with a single decoder for bidirectional text decoding
*Bartz et al. [94] : KISS×××arXiv2020a new model for STR that consists of two ResNet based feature extractors, a spatial transformer, and a transformer
Zhang et al. [95] : SPIN×××××arXiv2020a new learnable geometric-unrelated module which allows the color manipulation of source data within the network
Lin et al. [96] : FASDA×××××arXiv2020implementing sequence-level domain adaption for STR
Zhang et al. [98] : AutoSTR××××ECCV2020searching data-dependent backbones
Mou et al. [99] : PlugNet×××××ECCV2020combining the pluggable super-resolution unit to solve the low-quality text recognition from the feature-level
*Yue et al. [100] : RobustScanner××××ECCV2020 mitigating the misrecognition problem of the encoder-decoder with attention framework on contextless text images

### 2.2 Performance Comparison on Benchmark Datasets

In this section, we compare the performance of current advanced algorithms on benchmark datasets, including IIIT5K, SVT, IC03, IC13, SVT-P, CUTE80, IC15, COCO-Text, RCTW-17, MWTI, CTW, SCUT-CTW1500, LSVT, ArT and ReCTS-25k.

Notes on the tables: 1) '*' marks methods that use extra datasets other than Synth90k and SynthText. 2) **Bold** marks the best recognition result. 3) '^' marks the best recognition result among methods that use extra datasets. 4) '@' marks methods evaluated under a different protocol that uses only 1811 test images. 5) 'SK', 'ST', 'ExPu', 'ExPr' and 'Un' indicate that a method uses Synth90K, SynthText, extra public data, extra private data and unknown data, respectively. 6) 'D_A' means data augmentation. 7) IC15-S contains only 1811 cropped text instances.


#### 2.2.1 Performance Comparison of Recognition Algorithms on Regular Latin Datasets

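Performance Comparison of Recognition Algorithms on Regular Latin Datasets

| Method | IIIT5K (50) | IIIT5K (1K) | IIIT5K (None) | SVT (50) | SVT (None) | IC03 (50) | IC03 (Full) | IC03 (50k) | IC03 (None) | IC13 (None) | Data | Source | Time |
| :-- | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| Wang et al. [1] : ABBYY | 24.3 | - | - | 35 | - | 56 | 55 | - | - | - | Un | ICCV | 2011 |
| Wang et al. [1] : SYNTH+PLEX | - | - | - | 57 | - | 76 | 62 | - | - | - | ExPr | ICCV | 2011 |
| Mishra et al. [2] | 64.1 | 57.5 | - | 73.2 | - | 81.8 | 67.8 | - | - | - | ExPu | BMVC | 2012 |
| Wang et al. [3] | - | - | - | 70 | - | 90 | 84 | - | - | - | ExPr | ICPR | 2012 |
| Goel et al. [4] : wDTW | - | - | - | 77.3 | - | 89.7 | - | - | - | - | Un | ICDAR | 2013 |
| Bissacco et al. [5] : PhotoOCR | - | - | - | 90.4 | 78 | - | - | - | - | 87.6 | ExPr | ICCV | 2013 |
| Phan et al. [6] | - | - | - | 73.7 | - | 82.2 | - | - | - | - | ExPu | ICCV | 2013 |
| Alsharif et al. [7] : HMM/Maxout | - | - | - | 74.3 | - | 93.1 | 88.6 | 85.1 | - | - | ExPu | ICLR | 2014 |
| Almazan et al. [8] : KCSR | 88.6 | 75.6 | - | 87 | - | - | - | - | - | - | ExPu | TPAMI | 2014 |
| Yao et al. [9] : Strokelets | 80.2 | 69.3 | - | 75.9 | - | 88.5 | 80.3 | - | - | - | ExPu | CVPR | 2014 |
| R.-Serrano et al. [10] : Label embedding | 76.1 | 57.4 | - | 70 | - | - | - | - | - | - | ExPu | IJCV | 2015 |
| Jaderberg et al. [11] | - | - | - | 86.1 | - | 96.2 | 91.5 | - | - | - | ExPu | ECCV | 2014 |
| Su and Lu [12] | - | - | - | 83 | - | 92 | 82 | - | - | - | ExPu | ACCV | 2014 |
| Gordo [13] : Mid-features | 93.3 | 86.6 | - | 91.8 | - | - | - | - | - | - | ExPu | CVPR | 2015 |
| Jaderberg et al. [14] | 97.1 | 92.7 | - | 95.4 | 80.7 | 98.7 | 98.6 | 93.3 | 93.1 | 90.8 | ExPr | IJCV | 2015 |
| Jaderberg et al. [15] | 95.5 | 89.6 | - | 93.2 | 71.7 | 97.8 | 97 | 93.4 | 89.6 | 81.8 | SK + ExPr | ICLR | 2015 |
| Shi, Bai, and Yao [16] : CRNN | 97.8 | 95 | 81.2 | 97.5 | 82.7 | 98.7 | 98 | 95.7 | 91.9 | 89.6 | SK | TPAMI | 2017 |
| Shi et al. [17] : RARE | 96.2 | 93.8 | 81.9 | 95.5 | 81.9 | 98.3 | 96.2 | 94.8 | 90.1 | 88.6 | SK | CVPR | 2016 |
| Lee and Osindero [18] : R2AM | 96.8 | 94.4 | 78.4 | 96.3 | 80.7 | 97.9 | 97 | - | 88.7 | 90 | SK | CVPR | 2016 |
| Liu et al. [19] : STAR-Net | 97.7 | 94.5 | 83.3 | 95.5 | 83.6 | 96.9 | 95.3 | - | 89.9 | 89.1 | SK + ExPr | BMVC | 2016 |
| *Liu et al. [78] | 94.1 | 84.7 | - | 92.5 | - | 96.8 | 92.2 | - | - | - | ExPu (D_A) | ICPR | 2016 |
| *Mishra et al. [77] | 78.07 | - | 46.73 | 78.2 | - | 88 | - | - | 67.7 | 60.18 | ExPu (D_A) | CVIU | 2016 |
| *Su and Lu [76] | - | - | - | 91 | - | 95 | 89 | - | - | 76 | SK + ExPu | PR | 2017 |
| *Yang et al. [20] | 97.8 | 96.1 | - | 95.2 | - | 97.7 | - | - | - | - | ExPu | IJCAI | 2017 |
| Yin et al. [21] | 98.7 | 96.1 | 78.2 | 95.1 | 72.5 | 97.6 | 96.5 | - | 81.1 | 81.4 | SK | ICCV | 2017 |
| Wang et al. [66] : GRCNN | 98 | 95.6 | 80.8 | 96.3 | 81.5 | 98.8 | 97.8 | - | 91.2 | - | SK | NIPS | 2017 |
| *Cheng et al. [22] : FAN | 99.3 | 97.5 | 87.4 | 97.1 | 85.9 | 99.2 | 97.3 | - | 94.2 | 93.3 | SK + ST (Pixel_wise) | ICCV | 2017 |
| Cheng et al. [23] : AON | 99.6 | 98.1 | 87 | 96 | 82.8 | 98.5 | 97.1 | - | 91.5 | - | SK + ST (D_A) | CVPR | 2018 |
| Gao et al. [24] | 99.1 | 97.9 | 81.8 | 97.4 | 82.7 | 98.7 | 96.7 | - | 89.2 | 88 | SK | NC | 2019 |
| Liu et al. [25] : Char-Net | - | - | 83.6 | - | 84.4 | - | 93.3 | - | 91.5 | 90.8 | SK (D_A) | AAAI | 2018 |
| *Liu et al. [26] : SqueezedText | 97 | 94.1 | 87 | 95.2 | - | 98.8 | 97.9 | 93.8 | 93.1 | 92.9 | ExPr | AAAI | 2018 |
| *Zhan et al. [73] | 98.1 | 95.3 | 79.3 | 96.7 | 81.5 | - | - | - | - | 87.1 | Pr (5 million) | CVPR | 2018 |
| *Bai et al. [27] : EP | 99.5 | 97.9 | 88.3 | 96.6 | 87.5 | 98.7 | 97.9 | - | 94.6 | 94.4 | SK + ST (Pixel_wise) | CVPR | 2018 |
| Fang et al. [74] | 98.5 | 96.8 | 86.7 | 97.8 | 86.7 | 99.3 | 98.4 | - | 94.8 | 93.5 | SK + ST | MultiMedia | 2018 |
| Liu et al. [75] : EnEsCTC | - | - | 82 | - | 80.6 | - | - | - | 92 | 90.6 | SK | NIPS | 2018 |
| Liu et al. [28] | 97.3 | 96.1 | 89.4 | 96.8 | 87.1 | 98.1 | 97.5 | - | 94.7 | 94 | SK | ECCV | 2018 |
| Wang et al. [61] : MAAN | 98.3 | 96.4 | 84.1 | 96.4 | 83.5 | 97.4 | 96.4 | - | 92.2 | 91.1 | SK | ICFHR | 2018 |
| Gao et al. [29] | 99.1 | 97.2 | 83.6 | 97.7 | 83.9 | 98.6 | 96.6 | - | 91.4 | 89.5 | SK | ICIP | 2018 |
| Shi et al. [30] : ASTER | 99.6 | 98.8 | 93.4 | 97.4 | 89.5 | 98.8 | 98 | - | 94.5 | 91.8 | SK + ST | TPAMI | 2018 |
| Chen et al. [60] : ASTER + AEG | 99.5 | 98.5 | 94.4 | 97.4 | 90.3 | 99 | 98.3 | - | 95.2 | 95 | SK + ST | NC | 2019 |
| Luo et al. [46] : MORAN | 97.9 | 96.2 | 91.2 | 96.6 | 88.3 | 98.7 | 97.8 | - | 95 | 92.4 | SK + ST | PR | 2019 |
| Luo et al. [61] : MORAN-v2 | - | - | 93.4 | - | 88.3 | - | - | - | 94.2 | 93.2 | SK + ST | PR | 2019 |
| Chen et al. [60] : MORAN-v2 + AEG | 99.5 | 98.7 | 94.6 | 97.4 | 90.4 | 98.8 | 98.3 | - | 95.3 | 95.3 | SK + ST | NC | 2019 |
| Xie et al. [47] : CAN | 97 | 94.2 | 80.5 | 96.9 | 83.4 | 98.4 | 97.8 | - | 91 | 90.5 | SK | ACM | 2019 |
| *Liao et al. [48] : CA-FCN | ^99.8 | 98.9 | 92 | 98.8 | 82.1 | - | - | - | - | 91.4 | SK + ST + ExPr | AAAI | 2019 |
| *Li et al. [49] : SAR | 99.4 | 98.2 | 95 | 98.5 | 91.2 | - | - | - | - | 94 | SK + ST + ExPr | AAAI | 2019 |
| Zhan et al. [55] : ESIR | 99.6 | 98.8 | 93.3 | 97.4 | 90.2 | - | - | - | - | 91.3 | SK + ST | CVPR | 2019 |
| Zhang et al. [56] : SSDAN | - | - | 83.8 | - | 84.5 | - | - | - | 92.1 | 91.8 | SK | CVPR | 2019 |
| *Yang et al. [62] : ScRN | 99.5 | 98.8 | 94.4 | 97.2 | 88.9 | 99 | 98.3 | - | 95 | 93.9 | SK + ST (char-level + word-level) | ICCV | 2019 |
| Wang et al. [64] : GCAM | - | - | 93.9 | - | 91.3 | - | - | - | 95.3 | 95.7 | SK + ST | ICME | 2019 |
| Jeonghun et al. [65] | - | - | 87.9 | - | 87.5 | - | - | - | 94.4 | 92.3 | SK + ST | ICCV | 2019 |
| Huang et al. [67] : EPAN | 98.9 | 97.8 | 94 | 96.6 | 88.9 | 98.7 | 98 | - | 95 | 94.5 | SK + ST | NC | 2019 |
| Gao et al. [68] | 99.1 | 97.9 | 81.8 | 97.4 | 82.7 | 98.7 | 96.7 | - | 89.2 | 88 | SK | NC | 2019 |
| *Qi et al. [69] : CCL | 99.6 | 99.1 | 91.1 | 98 | 85.9 | 99.2 | ^98.8 | - | 93.5 | 92.8 | SK + ST (char-level + word-level) | ICDAR | 2019 |
| *Wang et al. [70] : ReELFA | 99.2 | 98.1 | 90.9 | - | 82.7 | - | - | - | - | - | ST (char-level + word-level) | ICDAR | 2019 |
| *Zhu et al. [71] : HATN | - | - | 88.6 | - | 82.2 | - | - | - | 91.3 | 91.1 | SK (D_A) + Pu | ICIP | 2019 |
| *Zhan et al. [72] : SF-GAN | - | - | 63 | - | 69.3 | - | - | - | - | 61.8 | Pr (1 million) | CVPR | 2019 |
| Liao et al. [79] : SAM | 99.4 | 98.6 | 93.9 | 98.6 | 90.6 | 98.8 | 98 | - | 95.2 | 95.3 | SK + ST | TPAMI | 2019 |
| *Liao et al. [79] : seg-SAM | ^99.8 | ^99.3 | 95.3 | 99.1 | 91.8 | 99 | 97.9 | - | 95 | 95.3 | SK + ST (char-level) | TPAMI | 2019 |
| Wang et al. [80] : DAN | - | - | 94.3 | - | 89.2 | - | - | - | 95 | 93.9 | SK + ST | AAAI | 2020 |
| Wang et al. [82] : TextSR | - | - | 92.5 | 98 | 87.2 | - | - | - | 93.2 | 91.3 | SK + ST | arXiv | 2019 |
| *Wan et al. [83] : TextScanner | 99.7 | 99.1 | 93.9 | 98.5 | 90.1 | - | - | - | - | 92.9 | SK + ST (char-level) | AAAI | 2020 |
| *Hu et al. [84] : GTC | - | - | ^95.8 | - | ^92.9 | - | - | - | 95.5 | 94.4 | SK + ST + ExPu | AAAI | 2020 |
| Luo et al. [85] | 99.6 | 98.8 | 95.6 | 99.4 | 92.9 | 99.1 | 98.8 | - | 96.2 | 96 | SK + ST | IJCV | 2020 |
| *Litman et al. [86] | - | - | 93.7 | - | 92.7 | - | - | - | ^96.3 | 93.9 | SK + ST + ExPu | CVPR | 2020 |
| Yu et al. [87] | - | - | 94.8 | - | 91.5 | - | - | - | - | 95.5 | SK + ST | CVPR | 2020 |
| Qiao et al. [101] : SEED | - | - | 93.8 | - | 89.6 | - | - | - | - | 92.8 | SK + ST | CVPR | 2020 |
| Bleeker et al. [93] : Bi-STET | 99.6 | 98.9 | 94.7 | 97.4 | 89 | 99.1 | 98.7 | - | 96 | 93.4 | SK + ST | ECAI | 2020 |
| *Bartz et al. [94] : KISS | - | - | 94.6 | - | 89.2 | - | - | - | - | 93.1 | SK + ST + ExPu (D_A) | arXiv | 2020 |
| Zhang et al. [95] : SPIN | - | - | 94.7 | - | 90.3 | - | - | - | 94.4 | 92.8 | SK + ST | arXiv | 2020 |
| Lin et al. [96] : FASDA | - | - | - | 96.5 | 88.3 | 99.1 | 97.5 | - | 94.8 | 94.4 | SK | arXiv | 2020 |
| Zhang et al. [98] : AutoSTR | - | - | 94.7 | - | 90.9 | - | - | - | 93.3 | 94.2 | SK + ST | ECCV | 2020 |
| Mou et al. [99] : PlugNet | - | - | 94.4 | - | 92.3 | - | - | - | 95.7 | 95 | SK + ST | ECCV | 2020 |
| *Yue et al. [100] : RobustScanner | - | - | 95.4 | - | 89.3 | - | - | - | - | 94.1 | SK + ST + ExPu | ECCV | 2020 |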
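
The marker conventions used in these tables ('*' in front of a method name, '^' or '@' attached to a score, '-' for an unreported result) can be decoded mechanically when the numbers are loaded for analysis. The following is a minimal sketch in Python, not part of the original repository; the names `Score`, `parse_score` and `uses_extra_data` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Score:
    """One table cell, decoded (hypothetical helper type)."""
    value: Optional[float]   # accuracy in percent; None when the cell is "-"
    best_with_extra: bool    # True when the cell carried the '^' marker
    alt_protocol: bool       # True when the cell carried the '@' marker


def parse_score(cell: str) -> Score:
    """Decode a score cell such as '^99.8', '@70.6', '**95.3**' or '-'."""
    text = cell.strip()
    best = "^" in text
    alt = "@" in text
    # Remove the markers, plus Markdown bold asterisks if present.
    number = text.replace("^", "").replace("@", "").replace("*", "").strip()
    if number in ("", "-"):
        return Score(None, best, alt)
    return Score(float(number), best, alt)


def uses_extra_data(method: str) -> bool:
    """A leading '*' on the method name marks extra training data."""
    return method.lstrip().startswith("*")


if __name__ == "__main__":
    print(parse_score("^99.8"))
    print(parse_score("-"))
    print(uses_extra_data("*Li et al. [49] : SAR"))
```

Keeping the flags separate from the numeric value lets a script sort or average accuracies while still filtering out, for example, methods trained with extra data.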


#### 2.2.2 Performance Comparison of Recognition Algorithms on Irregular Latin Datasets

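Performance Comparison of Recognition Algorithms on Irregular Latin Datasets

| Method | SVT-P (50) | SVT-P (Full) | SVT-P (None) | CUTE80 (None) | IC15-S (None) | IC15 (None) | COCO-TEXT (None) | Data | Source | Time |
| :-- | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: | :--: |
| Wang et al. [1] : ABBYY | 40.5 | 26.1 | - | - | - | - | - | Un | ICCV | 2011 |
| Wang et al. [1] : SYNTH+PLEX | - | - | - | - | - | - | - | ExPr | ICCV | 2011 |
| Mishra et al. [2] | 45.7 | 24.7 | - | - | - | - | - | ExPu | BMVC | 2012 |
| Wang et al. [3] | 40.2 | 32.4 | - | - | - | - | - | ExPr | ICPR | 2012 |
| Goel et al. [4] : wDTW | - | - | - | - | - | - | - | Un | ICDAR | 2013 |
| Bissacco et al. [5] : PhotoOCR | - | - | - | - | - | - | - | ExPr | ICCV | 2013 |
| Phan et al. [6] | 62.3 | 42.2 | - | - | - | - | - | ExPu | ICCV | 2013 |
| Alsharif et al. [7] : HMM/Maxout | - | - | - | - | - | - | - | ExPu | ICLR | 2014 |
| Almazan et al. [8] : KCSR | - | - | - | - | - | - | - | ExPu | TPAMI | 2014 |
| Yao et al. [9] : Strokelets | - | - | - | - | - | - | - | ExPu | CVPR | 2014 |
| R.-Serrano et al. [10] : Label embedding | - | - | - | - | - | - | - | ExPu | IJCV | 2015 |
| Jaderberg et al. [11] | - | - | - | - | - | - | - | ExPu | ECCV | 2014 |
| Su and Lu [12] | - | - | - | - | - | - | - | ExPu | ACCV | 2014 |
| Gordo [13] : Mid-features | - | - | - | - | - | - | - | ExPu | CVPR | 2015 |
| Jaderberg et al. [14] | - | - | - | - | - | - | - | ExPr | IJCV | 2015 |
| Jaderberg et al. [15] | - | - | - | - | - | - | - | SK + ExPr | ICLR | 2015 |
| Shi, Bai, and Yao [16] : CRNN | - | - | - | - | - | - | - | SK | TPAMI | 2017 |
| Shi et al. [17] : RARE | 91.2 | 77.4 | 71.8 | 59.2 | - | - | - | SK | CVPR | 2016 |
| Lee and Osindero [18] : R2AM | - | - | - | - | - | - | - | SK | CVPR | 2016 |
| Liu et al. [19] : STAR-Net | 94.3 | 83.6 | 73.5 | - | - | - | - | SK + ExPr | BMVC | 2016 |
| *Liu et al. [78] | - | - | - | - | - | - | - | ExPu (D_A) | ICPR | 2016 |
| *Mishra et al. [77] | - | - | - | - | - | - | - | ExPu (D_A) | CVIU | 2016 |
| *Su and Lu [76] | - | - | - | - | - | - | - | SK + ExPu | PR | 2017 |
| *Yang et al. [20] | 93 | 80.2 | 75.8 | 69.3 | - | - | - | ExPu | IJCAI | 2017 |
| Yin et al. [21] | - | - | - | - | - | - | - | SK | ICCV | 2017 |
| Wang et al. [66] : GRCNN | - | - | - | - | - | - | - | SK | NIPS | 2017 |
| *Cheng et al. [22] : FAN | - | - | - | - | 70.6 | - | - | SK + ST (Pixel_wise) | ICCV | 2017 |
| Cheng et al. [23] : AON | 94 | 83.7 | 73 | 76.8 | - | 68.2 | - | SK + ST (D_A) | CVPR | 2018 |
| Gao et al. [24] | - | - | - | - | - | - | - | SK | NC | 2019 |
| Liu et al. [25] : Char-Net | - | - | 73.5 | - | - | 60 | - | SK (D_A) | AAAI | 2018 |
| *Liu et al. [26] : SqueezedText | - | - | - | - | - | - | - | ExPr | AAAI | 2018 |
| *Zhan et al. [73] | - | - | - | - | - | - | - | Pr (5 million) | CVPR | 2018 |
| *Bai et al. [27] : EP | - | - | - | - | 73.9 | - | - | SK + ST (Pixel_wise) | CVPR | 2018 |
| Fang et al. [74] | - | - | - | - | - | 71.2 | - | SK + ST | MultiMedia | 2018 |
| Liu et al. [75] : EnEsCTC | - | - | - | - | - | - | - | SK | NIPS | 2018 |
| Liu et al. [28] | - | - | 73.9 | 62.5 | - | - | - | SK | ECCV | 2018 |
| Wang et al. [61] : MAAN | - | - | - | - | - | - | - | SK | ICFHR | 2018 |
| Gao et al. [29] | - | - | - | - | - | - | - | SK | ICIP | 2018 |
| Shi et al. [30] : ASTER | - | - | 78.5 | 79.5 | 76.1 | - | - | SK + ST | TPAMI | 2018 |
| Chen et al. [60] : ASTER + AEG | 94.4 | 89.5 | 82 | 80.9 | - | 76.7 | - | SK + ST | NC | 2019 |
| Luo et al. [46] : MORAN | 94.3 | 86.7 | 76.1 | 77.4 | - | 68.8 | - | SK + ST | PR | 2019 |
| Luo et al. [61] : MORAN-v2 | - | - | 79.7 | 81.9 | - | 73.9 | - | SK + ST | PR | 2019 |
| Chen et al. [60] : MORAN-v2 + AEG | 94.7 | 89.6 | 82.8 | 81.3 | - | 77.4 | - | SK + ST | NC | 2019 |
| Xie et al. [47] : CAN | - | - | - | - | - | - | - | SK | ACM | 2019 |
| *Liao et al. [48] : CA-FCN | - | - | - | 78.1 | - | - | - | SK + ST + ExPr | AAAI | 2019 |
| *Li et al. [49] : SAR | ^95.8 | 91.2 | ^86.4 | 89.6 | - | 78.8 | ^66.8 | SK + ST + ExPr | AAAI | 2019 |
| Zhan et al. [55] : ESIR | - | - | 79.6 | 83.3 | - | 76.9 | - | SK + ST | CVPR | 2019 |
| Zhang et al. [56] : SSDAN | - | - | - | - | - | - | - | SK | CVPR | 2019 |
| *Yang et al. [62] : ScRN | - | - | 80.8 | 87.5 | - | 78.7 | - | SK + ST (char-level + word-level) | ICCV | 2019 |
| Wang et al. [64] : GCAM | - | - | 85.7 | 83.3 | 83.5 | - | - | SK + ST | ICME | 2019 |
| Jeonghun et al. [65] | - | - | 79.2 | 74 | - | 71.8 | - | SK + ST | ICCV | 2019 |
| Huang et al. [67] : EPAN | 91.2 | 86.4 | 79.4 | 82.6 | - | 73.9 | - | SK + ST | NC | 2019 |
| Gao et al. [68] | - | - | - | - | - | 62.3 | 40 | SK | NC | 2019 |
| *Qi et al. [69] : CCL | - | - | - | - | 72.9 | - | - | SK + ST (char-level + word-level) | ICDAR | 2019 |
| *Wang et al. [70] : ReELFA | - | - | - | 82.3 | - | 68.5 | - | ST (char-level + word-level) | ICDAR | 2019 |
| *Zhu et al. [71] : HATN | - | - | 73.5 | 75.7 | - | 70.1 | - | SK (D_A) + Pu | ICIP | 2019 |
| *Zhan et al. [72] : SF-GAN | - | - | 48.6 | 40.6 | - | 39 | - | Pr (1 million) | CVPR | 2019 |
| Liao et al. [79] : SAM | - | - | 82.2 | 87.8 | - | 77.3 | - | SK + ST | TPAMI | 2019 |
| *Liao et al. [79] : seg-SAM | - | - | 83.6 | 88.5 | - | 78.2 | - | SK + ST (char-level) | TPAMI | 2019 |
| Wang et al. [80] : DAN | - | - | 80 | 84.4 | - | 74.5 | - | SK + ST | AAAI | 2020 |
| Wang et al. [82] : TextSR | - | - | 77.4 | 78.9 | - | 75.6 | - | SK + ST | arXiv | 2019 |
| *Wan et al. [83] : TextScanner | - | - | 84.3 | 83.3 | - | 79.4 | - | SK + ST (char-level) | AAAI | 2020 |
| *Hu et al. [84] : GTC | - | - | 85.7 | 92.2 | - | 79.5 | - | SK + ST + ExPu | AAAI | 2020 |
| Luo et al. [85] | 95.8 | 91.5 | 85.1 | 91.3 | 83.9 | 81.4 | - | SK + ST | IJCV | 2020 |
| *Litman et al. [86] | - | - | ^86.9 | 87.5 | - | 82.2 | - | SK + ST + ExPu | CVPR | 2020 |
| Yu et al. [87] | - | - | 85.1 | 87.8 | 82.7 | - | - | SK + ST | CVPR | 2020 |
| Qiao et al. [101] : SEED | - | - | 81.4 | 83.6 | 80 | - | - | SK + ST | CVPR | 2020 |
| Bleeker et al. [93] : Bi-STET | - | - | 80.6 | 82.5 | 75.7 | - | - | SK + ST | ECAI | 2020 |
| *Bartz et al. [94] : KISS | - | - | 83.1 | 89.6 | 80.3 | 74.2 | - | SK + ST + ExPu (D_A) | arXiv | 2020 |
| Zhang et al. [95] : SPIN | - | - | 82.8 | 87.5 | 82.2 | 78.5 | - | SK + ST | arXiv | 2020 |
| Lin et al. [96] : FASDA | - | - | - | - | 73.3 | - | - | SK | arXiv | 2020 |
| Zhang et al. [98] : AutoSTR | - | - | 81.7 | - | 81.8 | - | - | SK + ST | ECCV | 2020 |
| Mou et al. [99] : PlugNet | - | - | 84.3 | 85 | - | 82.2 | - | SK + ST | ECCV | 2020 |
| *Yue et al. [100] : RobustScanner | - | - | 82.9 | ^92.4 | - | 79.2 | - | SK + ST + ExPu | ECCV | 2020 |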
3768 | 3769 | *** 3770 | 3771 | 3772 | ## 3. Survey 3773 | 3774 | **[50] \[TPAMI-2015]** Q. Ye and D. Doermann, “Text detection and recognition in imagery: A survey,” IEEE Trans. Pattern Anal. Mach. Intell, vol. 37, no. 7, pp. 1480–1500, 2015. [paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6945320) 3775 | 3776 | **[51] \[Frontiers-Comput. Sci-2016]** Y. Zhu, C. Yao, and X. Bai, “Scene text detection and recognition: Recent advances and future trends,” Frontiers of Computer Science, vol. 10, no. 1, pp. 19–36, 2016. [paper](https://link.springer.com/article/10.1007/s11704-015-4488-0) 3777 | 3778 | **[52] \[IJCV-2020]** Long S, He X, Yao C. Scene text detection and recognition: The deep learning era[J]. International Journal of Computer Vision, 2020: 1-24. [paper](https://link.springer.com/article/10.1007/s11263-020-01369-0) [code](https://github.com/Jyouhou/SceneTextPapers) 3779 | 3780 | **[90]** **\[ACM Computing Surveys-2020]** X. Chen, L. Jin, Y. Zhu, C. Luo, and T. Wang, “Text Recognition in the Wild: A Survey," ACM Computing Surveys (CSUR) 2020. [paper](https://arxiv.org/pdf/2005.03492.pdf) [code](https://github.com/HCIILAB/Scene-Text-Recognition) 3781 | 3782 | *** 3783 | 3784 | 3785 | ## 4. 
OCR Service 3786 | 3787 | | OCR | API | Free | Code | 3788 | | :----------------------------------------------------------: | :--: | :--: | :--: | 3789 | | [Tesseract OCR Engine](https://github.com/tesseract-ocr/tesseract) | × | √ | √ | 3790 | | [Azure](https://azure.microsoft.com/zh-cn/services/cognitive-services/computer-vision/#Analysis) | √ | √ | × | 3791 | | [ABBYY](https://www.abbyy.cn/real-time-recognition-sdk/technical-specifications/) | √ | √ | × | 3792 | | [OCR Space](https://ocr.space/) | √ | √ | × | 3793 | | [SODA PDF OCR](https://www.sodapdf.com/ocr-pdf/) | √ | √ | × | 3794 | | [Free Online OCR](https://www.newocr.com/) | √ | √ | × | 3795 | | [Online OCR](https://www.onlineocr.net/) | √ | √ | × | 3796 | | [Super Tools](https://www.wdku.net/) | √ | √ | × | 3797 | | [Online Chinese Recognition](http://chongdata.com/ocr/) | √ | √ | × | 3798 | | [Calamari OCR](https://github.com/Calamari-OCR/calamari) | × | √ | √ | 3799 | | [ Tencent OCR](https://cloud.tencent.com/product/ocr?lang=cn) | √ | × | × | 3800 | 3801 | *** 3802 | 3803 | 3804 | ## 5. References 3805 | 3806 | **[1] \[ICCV-2011]** K. Wang, B. Babenko, and S. Belongie, “End-to-end scene text recognition,” in Proceedings of ICCV, 2011, pp. 1457–1464. [paper ](https://www.researchgate.net/profile/Serge_Belongie/publication/221110077_End-to-end_scene_text_recognition/links/09e4150b34908d2ebb000000/End-to-end-scene-text-recognition.pdf) 3807 | 3808 | **[2] \[BMVC-2012]** A. Mishra, K. Alahari, and C. Jawahar, “Scene text recognition using higher order language priors,” in Proceedings of BMVC, 2012, pp. 1–11. [paper](https://hal.inria.fr/hal-00818183/document) [dataset](http://cvit.iiit.ac.in/research/projects/cvit-projects/the-iiit-5k-word-dataset) 3809 | 3810 | **[3] \[ICPR-2012]** T. Wang, D. J. Wu, A. Coates, and A. Y. Ng, “End-to-end text recognition with convolutional neural networks,” in Proceedings of ICPR, 2012, pp. 3304–3308. 
[paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6460871&tag=1) 3811 | 3812 | **[4] \[ICDAR-2013]** V. Goel, A. Mishra, K. Alahari, and C. Jawahar, “Whole is greater than sum of parts: Recognizing scene text words,” in Proceedings of ICDAR, 2013, pp. 398–402. [paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6628652) 3813 | 3814 | **[5] \[ICCV-2013]** A. Bissacco, M. Cummins, Y. Netzer, and H. Neven, “Photoocr: Reading text in uncontrolled conditions,” in Proceedings of ICCV, 2013, pp. 785–792. [paper](http://openaccess.thecvf.com/content_iccv_2013/papers/Bissacco_PhotoOCR_Reading_Text_2013_ICCV_paper.pdf) 3815 | 3816 | **[6] \[ICCV-2013]** T. Quy Phan, P. Shivakumara, S. Tian, and C. Lim Tan, “Recognizing text with perspective distortion in natural scenes,” in Proceedings of ICCV, 2013, pp. 569–576. [paper](http://openaccess.thecvf.com/content_iccv_2013/papers/Phan_Recognizing_Text_with_2013_ICCV_paper.pdf) 3817 | 3818 | **[7] \[ICLR-2014]** O. Alsharif and J. Pineau, “End-to-end text recognition with hybrid HMM maxout models,” in Proceedings of ICLR: Workshop, 2014. [paper](https://arxiv.org/pdf/1310.1811.pdf) 3819 | 3820 | **[8] \[TPAMI-2014]** J. Almaz ́an, A. Gordo, A. Forn ́ es, and E. Valveny, "Word spotting and recognition with embedded attributes,” IEEE Trans. Pattern Anal. Mach. Intell, vol. 36, no. 12, pp. 2552–2566, 2014. [paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6857995) [code](https://github.com/almazan/watts) 3821 | 3822 | **[9] \[CVPR-2014]** C. Yao, X. Bai, B. Shi, and W. Liu, “Strokelets: A learned multiscale representation for scene text recognition,” in Proceedings of CVPR, 2014, pp. 4042–4049. [paper](http://openaccess.thecvf.com/content_cvpr_2014/papers/Yao_Strokelets_A_Learned_2014_CVPR_paper.pdf) 3823 | 3824 | **[10] \[IJCV-2015]** J. A. Rodriguez-Serrano, A. Gordo, and F. Perronnin, “Label embedding: A frugal baseline for text recognition,” Int. J. Comput. Vis, vol. 113, no. 3, pp. 
193–207, 2015. [paper](https://link.springer.com/content/pdf/10.1007%2Fs11263-014-0793-6.pdf)

**[11] \[ECCV-2014]** M. Jaderberg, A. Vedaldi, and A. Zisserman, “Deep features for text spotting,” in Proceedings of ECCV, 2014, pp. 512–528. [paper](https://link.springer.com/content/pdf/10.1007%2F978-3-319-10593-2_34.pdf) [code](https://bitbucket.org/jaderberg/eccv2014_textspotting/src/master/)

**[12] \[ACCV-2014]** B. Su and S. Lu, “Accurate scene text recognition based on recurrent neural network,” in Proceedings of ACCV, 2014, pp. 35–48. [paper](https://link.springer.com/content/pdf/10.1007%2F978-3-319-16865-4_3.pdf)

**[13] \[CVPR-2015]** A. Gordo, “Supervised mid-level features for word image representation,” in Proceedings of CVPR, 2015, pp. 2956–2964. [paper](http://openaccess.thecvf.com/content_cvpr_2015/papers/Gordo_Supervised_Mid-Level_Features_2015_CVPR_paper.pdf)

**[14] \[IJCV-2015]** M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman, “Reading text in the wild with convolutional neural networks,” Int. J. Comput. Vis, vol. 116, no. 1, pp. 1–20, 2016. [paper](https://link.springer.com/content/pdf/10.1007%2Fs11263-015-0823-z.pdf) [code](http://www.robots.ox.ac.uk/~vgg/research/text/)

**[15] \[ICLR-2015]** M. Jaderberg, K. Simonyan, and A. Zisserman, “Deep structured output learning for unconstrained text recognition,” in Proceedings of ICLR, 2015. [paper](https://arxiv.org/pdf/1412.5903.pdf)

**[16] \[TPAMI-2017]** B. Shi, X. Bai, and C. Yao, “An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition,” IEEE Trans. Pattern Anal. Mach. Intell, vol. 39, no. 11, pp. 2298–2304, 2017. [paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7801919) [code-Torch7](https://github.com/bgshih/crnn) [code-Pytorch](https://github.com/meijieru/crnn.pytorch)

**[17] \[CVPR-2016]** B. Shi, X. Wang, P.
Lyu, C. Yao, and X. Bai, “Robust scene text recognition with automatic rectification,” in Proceedings of CVPR, 2016, pp. 4168–4176. [paper](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Shi_Robust_Scene_Text_CVPR_2016_paper.pdf)

**[18] \[CVPR-2016]** C.-Y. Lee and S. Osindero, “Recursive recurrent nets with attention modeling for OCR in the wild,” in Proceedings of CVPR, 2016, pp. 2231–2239. [paper](http://openaccess.thecvf.com/content_cvpr_2016/papers/Lee_Recursive_Recurrent_Nets_CVPR_2016_paper.pdf)

**[19] \[BMVC-2016]** W. Liu, C. Chen, K.-Y. K. Wong, Z. Su, and J. Han, “STAR-Net: A spatial attention residue network for scene text recognition,” in Proceedings of BMVC, 2016, p. 7. [paper](https://i.cs.hku.hk/~kykwong/publications/wliu_bmvc16.pdf)

**[20] \[IJCAI-2017]** X. Yang, D. He, Z. Zhou, D. Kifer, and C. L. Giles, “Learning to read irregular text with attention mechanisms,” in Proceedings of IJCAI, 2017, pp. 3280–3286. [paper](https://pdfs.semanticscholar.org/1259/f7533abe2fe85fd9dead92853e2ff07a8792.pdf)

**[21] \[ICCV-2017]** F. Yin, Y.-C. Wu, X.-Y. Zhang, and C.-L. Liu, “Scene text recognition with sliding convolutional character models,” in Proceedings of ICCV, 2017. [paper](https://arxiv.org/pdf/1709.01727.pdf) [code](https://github.com/lsvih/Sliding-Convolution)

**[22] \[ICCV-2017]** Z. Cheng, F. Bai, Y. Xu, G. Zheng, S. Pu, and S. Zhou, “Focusing attention: Towards accurate text recognition in natural images,” in Proceedings of ICCV, 2017, pp. 5086–5094. [paper](http://openaccess.thecvf.com/content_ICCV_2017/papers/Cheng_Focusing_Attention_Towards_ICCV_2017_paper.pdf)

**[23] \[CVPR-2018]** Z. Cheng, Y. Xu, F. Bai, Y. Niu, S. Pu, and S. Zhou, “AON: Towards arbitrarily-oriented text recognition,” in Proceedings of CVPR, 2018, pp. 5571–5579.
[paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Cheng_AON_Towards_Arbitrarily-Oriented_CVPR_2018_paper.pdf) [code](https://github.com/huizhang0110/AON)

**[24] \[NC-2019]** Y. Gao, Y. Chen, J. Wang, M. Tang, and H. Lu, “Reading scene text with fully convolutional sequence modeling,” Neurocomputing, vol. 339, pp. 161–170, 2019. [paper](https://www.sciencedirect.com/science/article/pii/S0925231219301870)

**[25] \[AAAI-2018]** W. Liu, C. Chen, and K.-Y. K. Wong, “Char-Net: A character-aware neural network for distorted scene text recognition,” in Proceedings of AAAI, 2018, pp. 7154–7161. [paper](https://i.cs.hku.hk/~kykwong/publications/wliu_aaai2018.pdf)

**[26] \[AAAI-2018]** Z. Liu, Y. Li, F. Ren, W. L. Goh, and H. Yu, “Squeezedtext: A real-time scene text recognition by binary convolutional encoder-decoder network,” in Proceedings of AAAI, 2018, pp. 7194–7201. [paper](https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16354/16312)

**[27] \[CVPR-2018]** F. Bai, Z. Cheng, Y. Niu, S. Pu, and S. Zhou, “Edit probability for scene text recognition,” in Proceedings of CVPR, 2018, pp. 1508–1516. [paper](http://openaccess.thecvf.com/content_cvpr_2018/papers/Bai_Edit_Probability_for_CVPR_2018_paper.pdf)

**[28] \[ECCV-2018]** Y. Liu, Z. Wang, H. Jin, and I. Wassell, “Synthetically supervised feature learning for scene text recognition,” in Proceedings of ECCV, 2018, pp. 449–465. [paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Yang_Liu_Synthetically_Supervised_Feature_ECCV_2018_paper.pdf)

**[29] \[ICIP-2018]** Y. Gao, Y. Chen, J. Wang, M. Tang, and H. Lu, “Dense chained attention network for scene text recognition,” in Proceedings of ICIP, 2018, pp. 679–683. [paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8451273)

**[30] \[TPAMI-2018]** B. Shi, M. Yang, X. Wang, P. Lyu, C. Yao, and X.
Bai, “ASTER: An attentional scene text recognizer with flexible rectification,” IEEE Trans. Pattern Anal. Mach. Intell, vol. 41, no. 9, pp. 2035–2048, 2019. [paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8395027) [code](https://github.com/bgshih/aster)

**[31] \[CVPR-2012]** A. Mishra, K. Alahari, and C. Jawahar, “Top-down and bottom-up cues for scene text recognition,” in Proceedings of CVPR, 2012, pp. 2687–2694. [paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6247990)

**[32]** https://github.com/Canjie-Luo/MORAN_v2

**[33] \[ICDAR-2003]** S. M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, and R. Young, “ICDAR 2003 robust reading competitions,” in Proceedings of ICDAR, 2003, pp. 682–687. [paper](https://link.springer.com/content/pdf/10.1007%2Fs10032-004-0134-3.pdf)

**[34] \[ICDAR-2013]** D. Karatzas, F. Shafait, S. Uchida, M. Iwamura, L. G. i Bigorda, S. R. Mestre, J. Mas, D. F. Mota, J. A. Almazan, and L. P. De Las Heras, “ICDAR 2013 robust reading competition,” in Proceedings of ICDAR, 2013, pp. 1484–1493. [paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6628859)

**[35] \[ICCV-2013]** T. Quy Phan, P. Shivakumara, S. Tian, and C. Lim Tan, “Recognizing text with perspective distortion in natural scenes,” in Proceedings of ICCV, 2013, pp. 569–576. [paper](http://openaccess.thecvf.com/content_iccv_2013/papers/Phan_Recognizing_Text_with_2013_ICCV_paper.pdf)

**[36] \[Expert Syst.Appl-2014]** A. Risnumawan, P. Shivakumara, C. S. Chan, and C. L. Tan, “A robust arbitrary text detection system for natural scene images,” Expert Systems with Applications, vol. 41, no. 18, pp. 8027–8048, 2014.
[paper](https://www.sciencedirect.com/science/article/pii/S0957417414004060)

**[37] \[ICDAR-2015]** D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. Ghosh, A. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu et al., “ICDAR 2015 competition on robust reading,” in Proceedings of ICDAR, 2015, pp. 1156–1160.
[paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7333942)

**[38] \[arXiv-2016]** A. Veit, T. Matera, L. Neumann, J. Matas, and S. Belongie, “Coco-text: Dataset and benchmark for text detection and recognition in natural images,” CoRR abs/1601.07140, 2016. [paper](https://arxiv.org/pdf/1601.07140.pdf) [code](https://github.com/andreasveit/coco-text)

**[39] \[IJDAR-2019]** C.-K. Ch’ng, C. S. Chan, and C.-L. Liu, “Total-text: toward orientation robustness in scene text detection,” International Journal on Document Analysis and Recognition (IJDAR), pp. 1–22, 2019. [paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8270088) [code](https://github.com/cs-chan/Total-Text-Dataset)

**[40] \[ICDAR-2017]** B. Shi, C. Yao, M. Liao, M. Yang, P. Xu, L. Cui, S. Belongie, S. Lu, and X. Bai, “ICDAR2017 competition on reading chinese text in the wild (rctw-17),” in Proceedings of ICDAR, 2017, pp. 1429–1434. [paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8270164)

**[41] \[ICPR-2018]** M. He, Y. Liu, Z. Yang, S. Zhang, C. Luo, F. Gao, Q. Zheng, Y. Wang, X. Zhang, and L. Jin, “ICPR2018 contest on robust reading for multi-type web images,” in Proceedings of ICPR, 2018, pp. 7–12. [paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8546143)

**[42] \[JCS&T-2019]** Yuan T L, Zhu Z, Xu K, et al. A large chinese text dataset in the wild[J]. Journal of Computer Science and Technology, 2019, 34(3): 509-521. [paper](https://link.springer.com/article/10.1007/s11390-019-1923-y) [code](https://github.com/yuantailing/ctw-baseline)

**[43] \[arXiv-2017]** Y. Liu, L. Jin, S. Zhang, and S. Zhang, “Detecting curve text in the wild: New dataset and new solution,” CoRR abs/1712.02170, 2017. [paper](https://arxiv.org/pdf/1712.02170.pdf) [code](https://github.com/Yuliang-Liu/Curve-Text-Detector)

**[44] \[ECCV-2018]** P. Lyu, M. Liao, C.
Yao, W. Wu, and X. Bai, “Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes,” in Proceedings of ECCV, 2018, pp. 67–83. [paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Pengyuan_Lyu_Mask_TextSpotter_An_ECCV_2018_paper.pdf) [code](https://github.com/lvpengyuan/masktextspotter.caffe2)

**[45] \[NIPS-W-2011]** Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, “Reading digits in natural images with unsupervised feature learning,” in Proceedings of NIPS, 2011. [paper](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/37648.pdf)

**[46] \[PR-2019]** C. Luo, L. Jin, and Z. Sun, “MORAN: A multi-object rectified attention network for scene text recognition,” Pattern Recognition, vol. 90, pp. 109–118, 2019. [paper](https://www.sciencedirect.com/science/article/pii/S0031320319300263) [code](https://github.com/Canjie-Luo/MORAN_v2)

**[47] \[ACM-2019]** Xie H, Fang S, Zha Z J, et al, “Convolutional Attention Networks for Scene Text Recognition,” ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 15, p. 3, 2019. [paper](https://dl.acm.org/citation.cfm?id=3231737)

**[48] \[AAAI-2019]** M. Liao, J. Zhang, Z. Wan, F. Xie, J. Liang, P. Lyu, C. Yao, and X. Bai, “Scene text recognition from two-dimensional perspective,” in Proceedings of AAAI, 2019, pp. 8714–8721. [paper](https://arxiv.org/pdf/1809.06508.pdf)

**[49] \[AAAI-2019]** H. Li, P. Wang, C. Shen, and G. Zhang, “Show, attend and read: A simple and strong baseline for irregular text recognition,” in Proceedings of AAAI, 2019, pp. 8610–8617. [paper](https://arxiv.org/pdf/1811.00751.pdf) [code](https://tinyurl.com/ShowAttendRead)

**[50] \[TPAMI-2015]** Q. Ye and D. Doermann, “Text detection and recognition in imagery: A survey,” IEEE Trans. Pattern Anal. Mach. Intell, vol. 37, no. 7, pp. 1480–1500, 2015. [paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6945320)

**[51] \[Frontiers-Comput. Sci-2016]** Y. Zhu, C. Yao, and X. Bai, “Scene text detection and recognition: Recent advances and future trends,” Frontiers of Computer Science, vol. 10, no. 1, pp. 19–36, 2016. [paper](https://link.springer.com/article/10.1007/s11704-015-4488-0)

**[52] \[IJCV-2020]** Long S, He X, Yao C. Scene text detection and recognition: The deep learning era[J].
International Journal of Computer Vision, 2020: 1-24. [paper](https://link.springer.com/article/10.1007/s11263-020-01369-0) [code](https://github.com/Jyouhou/SceneTextPapers)

**[53] \[NIPS-W-2014]** M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman, “Synthetic data and artificial neural networks for natural scene text recognition,” in Proceedings of NIPS-W, 2014. [paper](https://arxiv.org/pdf/1406.2227.pdf) [code](http://www.robots.ox.ac.uk/~vgg/publications/2014/Jaderberg14c/)

**[54] \[CVPR-2016]** A. Gupta, A. Vedaldi, and A. Zisserman, “Synthetic data for text localisation in natural images,” in Proceedings of CVPR, 2016, pp. 2315–2324. [paper](http://www.robots.ox.ac.uk/~ankush/textloc.pdf) [code](https://github.com/ankush-me/SynthText)

**[55] \[CVPR-2019]** F. Zhan and S. Lu, “ESIR: End-to-end scene text recognition via iterative image rectification,” in Proceedings of CVPR, 2019, pp. 2059–2068. [paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhan_ESIR_End-To-End_Scene_Text_Recognition_via_Iterative_Image_Rectification_CVPR_2019_paper.pdf)

**[56] \[CVPR-2019]** Y. Zhang, S. Nie, W. Liu, X. Xu, D. Zhang, and H. T. Shen, “Sequence-to-sequence domain adaptation network for robust text image recognition,” in Proceedings of CVPR, 2019, pp. 2740–2749. [paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhang_Sequence-To-Sequence_Domain_Adaptation_Network_for_Robust_Text_Image_Recognition_CVPR_2019_paper.pdf) [code](https://github.com/AprilYapingZhang/Seq2SeqAdapt)

**[57] \[ICDAR-2019]** Y. Sun, Z. Ni, C.-K. Chng, Y. Liu, C. Luo, C. C. Ng, J. Han, E. Ding, J. Liu, D. Karatzas et al., “ICDAR 2019 competition on large-scale street view text with partial labeling–RRC-LSVT,” in Proceedings of ICDAR, 2019, pp. 1557–1562. [paper](https://arxiv.org/abs/1909.07741) [Link](https://rrc.cvc.uab.es/?ch=16)

**[58] \[ICDAR-2019]** C.-K. Chng, Y. Liu, Y. Sun, C. C.
Ng, C. Luo, Z. Ni, C. Fang, S. Zhang, J. Han, E. Ding et al., “ICDAR2019 robust reading challenge on arbitrary-shaped text (RRC-ArT),” in Proceedings of ICDAR, 2019, pp. 1571–1576. [paper](https://arxiv.org/pdf/1909.07145.pdf) [Link](https://rrc.cvc.uab.es/?ch=14)

**[59] \[ICDAR-2019]** X. Liu, R. Zhang, Y. Zhou, Q. Jiang, Q. Song, N. Li, K. Zhou, L. Wang, D. Wang, M. Liao et al., “ICDAR 2019 robust reading challenge on reading chinese text on signboard,” in Proceedings of ICDAR, 2019, pp. 1577–1581. [paper](https://ieeexplore.ieee.org/abstract/document/8978135) [Link](https://rrc.cvc.uab.es/?ch=12)

**[60] \[NC-2019]** X. Chen, T. Wang, Y. Zhu, L. Jin, and C. Luo, “Adaptive embedding gate for attention-based scene text recognition,” Neurocomputing, vol. 381, pp. 261–271, 2020. [paper](https://www.sciencedirect.com/science/article/pii/S0925231219316510)

***

**Newly added references (Dec 24, 2019)**

**[61]** **\[ICFHR-2018]** C. Wang, F. Yin, and C.-L. Liu, “Memory-augmented attention model for scene text recognition,” in Proceedings of ICFHR, 2018, pp. 62–67. [paper](https://ieeexplore.ieee.org/abstract/document/8563227)

**[62]** **\[ICCV-2019]** M. Yang, Y. Guan, M. Liao, X. He, K. Bian, S. Bai, C. Yao, and X. Bai, “Symmetry-constrained rectification network for scene text recognition,” in Proceedings of ICCV, 2019, pp. 9147–9156. [paper](http://openaccess.thecvf.com/content_ICCV_2019/papers/Yang_Symmetry-Constrained_Rectification_Network_for_Scene_Text_Recognition_ICCV_2019_paper.pdf)

**[63]** **\[ICCV-2019]** Y. Sun, J. Liu, W. Liu, J. Han, E. Ding, and J. Liu, “Chinese street view text: Large-scale chinese text reading with partially supervised learning,” in Proceedings of ICCV, 2019, pp. 9086–9095.
[paper](http://openaccess.thecvf.com/content_ICCV_2019/papers/Sun_Chinese_Street_View_Text_Large-Scale_Chinese_Text_Reading_With_Partially_ICCV_2019_paper.pdf)

**[64]** **\[ICME-2019]** S. Wang, Y. Wang, X. Qin, Q. Zhao, and Z. Tang, “Scene text recognition via gated cascade attention,” in Proceedings of ICME, 2019, pp. 1018–1023. [paper](https://ieeexplore.ieee.org/abstract/document/8784914)

**[65]** **\[ICCV-2019]** J. Baek, G. Kim, J. Lee, S. Park, D. Han, S. Yun, S. J. Oh, and H. Lee, “What is wrong with scene text recognition model comparisons? dataset and model analysis,” in Proceedings of ICCV, 2019, pp. 4714–4722. [paper](http://openaccess.thecvf.com/content_ICCV_2019/papers/Baek_What_Is_Wrong_With_Scene_Text_Recognition_Model_Comparisons_Dataset_ICCV_2019_paper.pdf) [code](https://github.com/clovaai/deep-text-recognition-benchmark)

**[66]** **\[NIPS-2017]** J. Wang and X. Hu, “Gated recurrent convolution neural network for OCR,” in Proceedings of NIPS, 2017, pp. 335–344. [paper](http://papers.nips.cc/paper/6637-gated-recurrent-convolution-neural-network-for-ocr.pdf) [code](https://github.com/Jianfeng1991/GRCNN-for-OCR)

**[67]** **\[NC-2020]** Y. Huang, Z. Sun, L. Jin, and C. Luo, “EPAN: Effective parts attention network for scene text recognition,” Neurocomputing, vol. 376, pp. 202–213, 2020. [paper](https://www.sciencedirect.com/science/article/pii/S0925231219313839)

**[68]** **\[NC-2019]** Y. Gao, Y. Chen, J. Wang, M. Tang, and H. Lu, “Reading scene text with fully convolutional sequence modeling,” Neurocomputing, vol. 339, pp. 161–170, 2019. [paper](https://www.sciencedirect.com/science/article/pii/S0925231219301870)

**[69]** **\[ICDAR-W-2019]** X. Qi, Y. Chen, R. Xiao, C.-G. Li, Q. Zou, and S. Cui, “A novel joint character categorization and localization approach for character-level scene text recognition,” in Proceedings of ICDAR: Workshops, 2019, pp. 83–90.
[paper](https://ieeexplore.ieee.org/abstract/document/8892915)

**[70]** **\[ICDAR-W-2019]** Q. Wang, W. Jia, X. He, Y. Lu, M. Blumenstein, Y. Huang, and S. Lyu, “ReELFA: A scene text recognizer with encoded location and focused attention,” in Proceedings of ICDAR: Workshops, 2019, pp. 71–76. [paper](https://ieeexplore.ieee.org/abstract/document/8892900)

**[71]** **\[ICIP-2019]** Y. Zhu, S. Wang, Z. Huang, and K. Chen, “Text recognition in images based on transformer with hierarchical attention,” in Proceedings of ICIP, 2019, pp. 1945–1949. [paper](https://ieeexplore.ieee.org/abstract/document/8803203)

**[72]** **\[CVPR-2019]** F. Zhan, H. Zhu, and S. Lu, “Spatial fusion GAN for image synthesis,” in Proceedings of CVPR, 2019, pp. 3653–3662. [paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhan_Spatial_Fusion_GAN_for_Image_Synthesis_CVPR_2019_paper.pdf)

**[73]** **\[ECCV-2018]** F. Zhan, S. Lu, and C. Xue, “Verisimilar image synthesis for accurate detection and recognition of texts in scenes,” in Proceedings of ECCV, 2018, pp. 249–266. [paper](http://openaccess.thecvf.com/content_ECCV_2018/papers/Fangneng_Zhan_Verisimilar_Image_Synthesis_ECCV_2018_paper.pdf) [code](https://github.com/fnzhan/Verisimilar-Image-Synthesis-for-Accurate-Detection-and-Recognition-of-Texts-in-Scenes)

**[74]** **\[MultiMedia-2018]** S. Fang, H. Xie, Z.-J. Zha, N. Sun, J. Tan, and Y. Zhang, “Attention and language ensemble for scene text recognition with convolutional sequence modeling,” in ACM Multimedia Conference on Multimedia Conference, 2018, pp. 248–256. [paper](https://dl.acm.org/citation.cfm?id=3240571) [code](https://github.com/FangShancheng/conv-ensemble-str)

**[75]** **\[NIPS-2018]** H. Liu, S. Jin, and C. Zhang, “Connectionist temporal classification with maximum entropy regularization,” in Proceedings of NIPS, 2018, pp. 831–841.
[paper](http://papers.nips.cc/paper/7363-connectionist-temporal-classification-with-maximum-entropy-regularization) [code](https://github.com/liuhu-bigeye/enctc.crnn)

**[76]** **\[PR-2017]** B. Su and S. Lu, “Accurate recognition of words in scenes without character segmentation using recurrent neural network,” Pattern Recognition, vol. 63, pp. 397–405, 2017. [paper](https://www.sciencedirect.com/science/article/pii/S0031320316303314)

**[77]** **\[CVIU-2016]** A. Mishra, K. Alahari, and C. Jawahar, “Enhancing energy minimization framework for scene text recognition with top-down cues,” Computer Vision and Image Understanding, vol. 145, pp. 30–42, 2016. [paper](https://www.sciencedirect.com/science/article/pii/S107731421600014X)

**[78]** **\[ICPR-2016]** X. Liu, T. Kawanishi, X. Wu, and K. Kashino, “Scene text recognition with CNN classifier and WFST-based word labeling,” in Proceedings of ICPR, 2016, pp. 3999–4004. [paper](https://ieeexplore.ieee.org/abstract/document/7900259)

**[79]** **\[TPAMI-2019]** M. Liao, P. Lyu, M. He, C. Yao, W. Wu, and X. Bai, “Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes,” IEEE Trans. Pattern Anal. Mach. Intell, 2019. [paper](https://ieeexplore.ieee.org/abstract/document/8812908) [code](https://github.com/MhLiao/MaskTextSpotter)

**[80]** **\[AAAI-2020]** T. Wang, Y. Zhu, L. Jin, C. Luo, X. Chen, Y. Wu, Q. Wang, and M. Cai, “Decoupled attention network for text recognition,” in Proceedings of AAAI, 2020. [paper](https://arxiv.org/pdf/1912.10205.pdf) [code](https://github.com/Wang-Tianwei/Decoupled-attention-network)

---

**Newly added references (Feb 29, 2020)**

**[81]** **\[ICDAR-2019]** N. Nayef, Y. Patel, M. Busta, P. N. Chowdhury, D. Karatzas, W. Khlif, J. Matas, U. Pal, J.-C. Burie, C.-l.
Liu et al., “ICDAR2019 robust reading challenge on multi-lingual scene text detection and recognition–RRC-MLT-2019,” in Proceedings of ICDAR, 2019, pp. 1582–1587. [paper](https://arxiv.org/pdf/1907.00945.pdf)

**[82]** **\[arXiv-2019]** W. Wang, E. Xie, P. Sun, W. Wang, L. Tian, C. Shen, and P. Luo, “TextSR: Content-aware text super-resolution guided by recognition,” CoRR abs/1909.07113, 2019. [paper](https://arxiv.org/pdf/1909.07113.pdf) [code](https://github.com/xieenze/TextSR)

**[83]** **\[AAAI-2020]** Z. Wan, M. He, H. Chen, X. Bai, and C. Yao, “Textscanner: Reading characters in order for robust scene text recognition,” in Proceedings of AAAI, 2020. [paper](https://arxiv.org/pdf/1912.12422.pdf)

**[84]** **\[AAAI-2020]** W. Hu, X. Cai, J. Hou, S. Yi, and Z. Lin, “GTC: Guided training of ctc towards efficient and accurate scene text recognition,” in Proceedings of AAAI, 2020. [paper](https://www.aaai.org/Papers/AAAI/2020GB/AAAI-HuW.7838.pdf)

**[85]** **\[IJCV-2020]** C. Luo, Q. Lin, Y. Liu, L. Jin, and C. Shen, “Separating content from style using adversarial learning for recognizing text in the wild,” Int. J. Comput. Vis, 2020. [paper](https://arxiv.org/pdf/2001.04189.pdf)

---

**Newly added references (May 8, 2020)**

**[86]** **\[CVPR-2020]** R. Litman, O. Anschel, S. Tsiper, R. Litman, S. Mazor, and R. Manmatha, “SCATTER: selective context attentional scene text recognizer,” in Proceedings of CVPR, 2020. [paper](https://arxiv.org/abs/2003.11288.pdf)

**[87]** **\[CVPR-2020]** D. Yu, X. Li, C. Zhang, J. Han, J. Liu, and E. Ding, “Towards accurate scene text recognition with semantic reasoning networks,” in Proceedings of CVPR, 2020. [paper](https://arxiv.org/abs/2003.12294.pdf)

**[88]** **\[CVPR-2020]** S. Long and C. Yao, “UnrealText: Synthesizing realistic scene text images from the unreal world,” in Proceedings of CVPR, 2020.
[paper](https://arxiv.org/abs/2003.10608.pdf)

**[89]** **\[ECCV-2016]** W. Qiu and A. L. Yuille, “Unrealcv: Connecting computer vision to unreal engine,” in Proceedings of ECCV, 2016, pp. 909–916. [paper](https://link.springer.com/chapter/10.1007/978-3-319-49409-8_75)

**[90]** **\[ACM Computing Surveys-2020]** X. Chen, L. Jin, Y. Zhu, C. Luo, and T. Wang, “Text Recognition in the Wild: A Survey,” ACM Computing Surveys (CSUR) 2020. [paper](https://arxiv.org/pdf/2005.03492.pdf) [code](https://github.com/HCIILAB/Scene-Text-Recognition)

---

**Newly added references (Dec 8, 2020)**

**[91]** **\[ICVGIP-2018]** Gupta A, Vedaldi A, Zisserman A. "Learning to read by spelling: Towards unsupervised text recognition," in Proceedings of ICVGIP, 2018. [paper](https://dl.acm.org/doi/pdf/10.1145/3293353.3293386)

**[92]** **\[CVPR-2020]** Wan Z, Zhang J, Zhang L, et al, "On Vocabulary Reliance in Scene Text Recognition," in Proceedings of CVPR, 2020. [paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Wan_On_Vocabulary_Reliance_in_Scene_Text_Recognition_CVPR_2020_paper.pdf)

**[93]** **\[ECAI-2020]** Bleeker M, de Rijke M, "Bidirectional Scene Text Recognition with a Single Decoder," in Proceedings of ECAI, 2020. [paper](https://arxiv.org/pdf/1912.03656.pdf) [code](https://github.com/MauritsBleeker/Bi-STET)

**[94]** **\[arXiv-2019]** Bartz C, Bethge J, Yang H, et al, "KISS: Keeping It Simple for Scene Text Recognition," CoRR abs/1911.08400, 2019. [paper](https://arxiv.org/pdf/1911.08400.pdf) [code](https://github.com/Bartzi/kiss)

**[95]** **\[arXiv-2020]** Zhang C, Xu Y, Cheng Z, et al, "SPIN: Structure-Preserving Inner Offset Network for Scene Text Recognition," CoRR abs/2005.13117, 2020.
[paper](https://arxiv.org/pdf/2005.13117.pdf)

**[96]** **\[arXiv-2020]** Lin J, Cheng Z, Bai F, et al, "Text Recognition in Real Scenarios with a Few Labeled Samples," CoRR abs/2006.12209, 2020. [paper](https://arxiv.org/pdf/2006.12209.pdf)

**[97]** **\[ECCV-2020]** Zhang C, Gupta A, Zisserman A. "Adaptive Text Recognition through Visual Matching," in Proceedings of ECCV, 2020. [paper](https://link.springer.com/chapter/10.1007/978-3-030-58517-4_4) [code](http://www.robots.ox.ac.uk/~vgg/research/FontAdaptor20/)

**[98]** **\[ECCV-2020]** Zhang H, Yao Q, Yang M, et al, "AutoSTR: Efficient Backbone Search for Scene Text Recognition," in Proceedings of ECCV, 2020. [paper](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123690732.pdf) [code](https://github.com/AutoML-4Paradigm/AutoSTR.git)

**[99]** **\[ECCV-2020]** Yan R, Huang Y, "PlugNet: Degradation Aware Scene Text Recognition Supervised by a Pluggable Super-Resolution Unit," in Proceedings of ECCV, 2020. [paper](http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123600154.pdf)

**[100]** **\[ECCV-2020]** Yue X, Kuang Z, Lin C, et al, "RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition," in Proceedings of ECCV, 2020. [paper](https://arxiv.org/pdf/2007.07542.pdf)

**[101]** **\[CVPR-2020]** Zhi Qiao, Yu Zhou, Dongbao Yang, Yucan Zhou, and Weiping Wang. 2020. SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition. In Proceedings of CVPR. [paper](https://openaccess.thecvf.com/content_CVPR_2020/papers/Qiao_SEED_Semantics_Enhanced_Encoder-Decoder_Framework_for_Scene_Text_Recognition_CVPR_2020_paper.pdf) [code](https://github.com/Pay20Y/SEED)

## 6.Help

If you find any problems with our resources, or know of a good paper or code release that we missed, please let us know at xxuechen@foxmail.com. Thank you for your contribution.

***

## 7.Copyright

Copyright © 2020 SCUT-DLVC. All Rights Reserved.
