├── .github
    └── workflows
    │   └── greetings.yml
├── .gitignore
├── LICENSE
├── README.md
├── challenge-methods.md
├── image-acquiistion.md
├── ocr-engines.md
├── papers
    ├── Relation Schema Induction using Tensor Factorization with Side Information.pdf
    ├── 1610.01178v1.pdf
    ├── 1610.05567v1.pdf
    ├── A Sequence Learning Approach for Multiple Script Identification.pdf
    ├── Adaptive document image binarization.pdf
    ├── Brian Lott. Survey of Keyword Extraction Techniques.pdf
    ├── DCA-lecture06.pptx
    ├── Document-Image-Analysis-process.png
    ├── Figure 2.1- Illustration of basic template matching..png
    ├── Figure 2.2- Illustration of segmentation process using over-segmentation method..png
    ├── Figure 2.3- Figure showing the basic unit in HMM-based OCR. .png
    ├── Figure 2.4- Illustration of how RNNs are used for the OCR task.png
    ├── Figure 3.3- Example of Fraktur script. Ersch-Gruber is an encyclopedia written in the.png
    ├── Figure 3.4- Shape confusion in Fraktur script. Many characters in Fraktur resemble.png
    ├── Figure 3.5- Document quality degradation caused during preprocessing..png
    ├── Figure 3.6- Word formation in Devanagari script.png
    ├── Figure 3.7- Reading direction in Nastaleeq script.Nastaleeq script is read from right-.png
    ├── Generic Text Recognition using Long Short-Term Memory Networks-PhD_Thesis_Ul-Hasan.pdf
    ├── HOCR specification.pdf
    ├── InfoQ-JHipster-mini-book.pdf
    ├── Instructions.doc
    ├── Long-Short Memory Network(LSTM长短期记忆网络) - Physcal - 博客园.pdf
    ├── Page - Level Web Data Extraction f rom Template Pages-Chang_FiVaTech.pdf
    ├── Representation Learning -A Review and New Perspectives-TPAMISI-2012-04-0260-1.pdf
    ├── Review paper on “Optimized approaches for web data harvesting.pdf
    ├── Schema Extraction for Tabular Data on the Web-p421-adelfio.pdf
    ├── Sequence prediction using recurrent neural networks(LSTM) with TensorFlow — Mourad Mourafiq.pdf
    ├── Statistical Language Modeling for Historical Documents using Weighted Finite-State Transducers and Long Short-Term Memory.pdf
    ├── The Neural Turing Machine.pdf
    ├── Towards a Robust OCR System for Indic Scripts.pdf
    ├── UNSUPERVISED APPROACH TO DEDUCE SCHEMA AND EXTRACT DATA FROM TEMPLATE WEB PAGES.pdf
    ├── Universum Prescription- Regularization Using Unlabeled Data1511.03719v7.pdf
    ├── V3I3-0224.pdf
    ├── WEB SCHEMA DETECTION AND DATA EXTRACTION SYSTEM-17_chapter 7.pdf
    ├── ijsrp-p3021.pdf
    ├── jucs_20_02_0169_0192_grigalis.pdf
    ├── pdf_24.pdf
    ├── synthetic text-line image generation process.png
    ├── y-derivative of a Gaussian kernel (p. 42).pdf
    ├── 全家桶.jpg
    ├── 刘知远. 基于文档主题结构的关键词抽取方法研究.pdf
    ├── 文章结构.png
    ├── 深度神经网络结构以及Pre-Training的理解 - cyq0122的专栏 - 博客频道 - CSDN.NET.pdf
    ├── 自然语言处理的神经网络入门学习笔记.pdf
    └── 语义分析的一些方法(一) | 火光摇曳.pdf
├── post-processing.md
├── preprocessing.md
├── resources
    ├── 3754个常用汉字列表.txt
    ├── 6039.txt
    ├── chinese_5039.txt
    ├── cid2code.txt
    ├── special-character.txt
    ├── 汉语拼音码表-一级汉字拼音对照对照表.txt
    └── 汉语拼音码表-二级汉字拼音对照对照表.txt
├── trainning-data-preparing.md
└── 车牌识别.md

/.github/workflows/greetings.yml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/.github/workflows/greetings.yml
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/README.md
--------------------------------------------------------------------------------
/challenge-methods.md:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/image-acquiistion.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/image-acquiistion.md
--------------------------------------------------------------------------------
/ocr-engines.md:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/papers/ Relation Schema Induction using Tensor Factorization with Side Information.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/ Relation Schema Induction using Tensor Factorization with Side Information.pdf
--------------------------------------------------------------------------------
/papers/1610.01178v1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/1610.01178v1.pdf
--------------------------------------------------------------------------------
/papers/1610.05567v1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/1610.05567v1.pdf
--------------------------------------------------------------------------------
/papers/A Sequence Learning Approach for Multiple Script Identification.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/A Sequence Learning Approach for Multiple Script Identification.pdf
--------------------------------------------------------------------------------
/papers/Adaptive document image binarization.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Adaptive document image binarization.pdf
--------------------------------------------------------------------------------
/papers/Brian Lott. Survey of Keyword Extraction Techniques.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Brian Lott. Survey of Keyword Extraction Techniques.pdf
--------------------------------------------------------------------------------
/papers/DCA-lecture06.pptx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/DCA-lecture06.pptx
--------------------------------------------------------------------------------
/papers/Document-Image-Analysis-process.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Document-Image-Analysis-process.png
--------------------------------------------------------------------------------
/papers/Figure 2.1- Illustration of basic template matching..png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Figure 2.1- Illustration of basic template matching..png
--------------------------------------------------------------------------------
/papers/Figure 2.2- Illustration of segmentation process using over-segmentation method..png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Figure 2.2- Illustration of segmentation process using over-segmentation method..png
--------------------------------------------------------------------------------
/papers/Figure 2.3- Figure showing the basic unit in HMM-based OCR. .png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Figure 2.3- Figure showing the basic unit in HMM-based OCR. .png
--------------------------------------------------------------------------------
/papers/Figure 2.4- Illustration of how RNNs are used for the OCR task.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Figure 2.4- Illustration of how RNNs are used for the OCR task.png
--------------------------------------------------------------------------------
/papers/Figure 3.3- Example of Fraktur script. Ersch-Gruber is an encyclopedia written in the.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Figure 3.3- Example of Fraktur script. Ersch-Gruber is an encyclopedia written in the.png
--------------------------------------------------------------------------------
/papers/Figure 3.4- Shape confusion in Fraktur script. Many characters in Fraktur resemble.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Figure 3.4- Shape confusion in Fraktur script. Many characters in Fraktur resemble.png
--------------------------------------------------------------------------------
/papers/Figure 3.5- Document quality degradation caused during preprocessing..png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Figure 3.5- Document quality degradation caused during preprocessing..png
--------------------------------------------------------------------------------
/papers/Figure 3.6- Word formation in Devanagari script.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Figure 3.6- Word formation in Devanagari script.png
--------------------------------------------------------------------------------
/papers/Figure 3.7- Reading direction in Nastaleeq script.Nastaleeq script is read from right-.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Figure 3.7- Reading direction in Nastaleeq script.Nastaleeq script is read from right-.png
--------------------------------------------------------------------------------
/papers/Generic Text Recognition using Long Short-Term Memory Networks-PhD_Thesis_Ul-Hasan.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Generic Text Recognition using Long Short-Term Memory Networks-PhD_Thesis_Ul-Hasan.pdf
--------------------------------------------------------------------------------
/papers/HOCR specification.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/HOCR specification.pdf
--------------------------------------------------------------------------------
/papers/InfoQ-JHipster-mini-book.pdf:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/papers/Instructions.doc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Instructions.doc
--------------------------------------------------------------------------------
/papers/Long-Short Memory Network(LSTM长短期记忆网络) - Physcal - 博客园.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Long-Short Memory Network(LSTM长短期记忆网络) - Physcal - 博客园.pdf
--------------------------------------------------------------------------------
/papers/Page - Level Web Data Extraction f rom Template Pages-Chang_FiVaTech.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Page - Level Web Data Extraction f rom Template Pages-Chang_FiVaTech.pdf
--------------------------------------------------------------------------------
/papers/Representation Learning -A Review and New Perspectives-TPAMISI-2012-04-0260-1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Representation Learning -A Review and New Perspectives-TPAMISI-2012-04-0260-1.pdf
--------------------------------------------------------------------------------
/papers/Review paper on “Optimized approaches for web data harvesting.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Review paper on “Optimized approaches for web data harvesting.pdf
--------------------------------------------------------------------------------
/papers/Schema Extraction for Tabular Data on the Web-p421-adelfio.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Schema Extraction for Tabular Data on the Web-p421-adelfio.pdf
--------------------------------------------------------------------------------
/papers/Sequence prediction using recurrent neural networks(LSTM) with TensorFlow — Mourad Mourafiq.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Sequence prediction using recurrent neural networks(LSTM) with TensorFlow — Mourad Mourafiq.pdf
--------------------------------------------------------------------------------
/papers/Statistical Language Modeling for Historical Documents using Weighted Finite-State Transducers and Long Short-Term Memory.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Statistical Language Modeling for Historical Documents using Weighted Finite-State Transducers and Long Short-Term Memory.pdf
--------------------------------------------------------------------------------
/papers/The Neural Turing Machine.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/The Neural Turing Machine.pdf
--------------------------------------------------------------------------------
/papers/Towards a Robust OCR System for Indic Scripts.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Towards a Robust OCR System for Indic Scripts.pdf
--------------------------------------------------------------------------------
/papers/UNSUPERVISED APPROACH TO DEDUCE SCHEMA AND EXTRACT DATA FROM TEMPLATE WEB PAGES.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/UNSUPERVISED APPROACH TO DEDUCE SCHEMA AND EXTRACT DATA FROM TEMPLATE WEB PAGES.pdf
--------------------------------------------------------------------------------
/papers/Universum Prescription- Regularization Using Unlabeled Data1511.03719v7.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/Universum Prescription- Regularization Using Unlabeled Data1511.03719v7.pdf
--------------------------------------------------------------------------------
/papers/V3I3-0224.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/V3I3-0224.pdf
--------------------------------------------------------------------------------
/papers/WEB SCHEMA DETECTION AND DATA EXTRACTION SYSTEM-17_chapter 7.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/WEB SCHEMA DETECTION AND DATA EXTRACTION SYSTEM-17_chapter 7.pdf
--------------------------------------------------------------------------------
/papers/ijsrp-p3021.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/ijsrp-p3021.pdf
--------------------------------------------------------------------------------
/papers/jucs_20_02_0169_0192_grigalis.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/jucs_20_02_0169_0192_grigalis.pdf
--------------------------------------------------------------------------------
/papers/pdf_24.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/pdf_24.pdf
--------------------------------------------------------------------------------
/papers/synthetic text-line image generation process.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/synthetic text-line image generation process.png
--------------------------------------------------------------------------------
/papers/y-derivative of a Gaussian kernel (p. 42).pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/y-derivative of a Gaussian kernel (p. 42).pdf
--------------------------------------------------------------------------------
/papers/全家桶.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/全家桶.jpg
--------------------------------------------------------------------------------
/papers/刘知远. 基于文档主题结构的关键词抽取方法研究.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/刘知远. 基于文档主题结构的关键词抽取方法研究.pdf
--------------------------------------------------------------------------------
/papers/文章结构.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/文章结构.png
--------------------------------------------------------------------------------
/papers/深度神经网络结构以及Pre-Training的理解 - cyq0122的专栏 - 博客频道 - CSDN.NET.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/深度神经网络结构以及Pre-Training的理解 - cyq0122的专栏 - 博客频道 - CSDN.NET.pdf
--------------------------------------------------------------------------------
/papers/自然语言处理的神经网络入门学习笔记.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/自然语言处理的神经网络入门学习笔记.pdf
--------------------------------------------------------------------------------
/papers/语义分析的一些方法(一) | 火光摇曳.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/papers/语义分析的一些方法(一) | 火光摇曳.pdf
--------------------------------------------------------------------------------
/post-processing.md:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/preprocessing.md:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/resources/3754个常用汉字列表.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/resources/3754个常用汉字列表.txt
--------------------------------------------------------------------------------
/resources/6039.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/resources/6039.txt
--------------------------------------------------------------------------------
/resources/chinese_5039.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/resources/chinese_5039.txt
--------------------------------------------------------------------------------
/resources/cid2code.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/resources/cid2code.txt
--------------------------------------------------------------------------------
/resources/special-character.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/resources/special-character.txt
--------------------------------------------------------------------------------
/resources/汉语拼音码表-一级汉字拼音对照对照表.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/resources/汉语拼音码表-一级汉字拼音对照对照表.txt
--------------------------------------------------------------------------------
/resources/汉语拼音码表-二级汉字拼音对照对照表.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/resources/汉语拼音码表-二级汉字拼音对照对照表.txt
--------------------------------------------------------------------------------
/trainning-data-preparing.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/trainning-data-preparing.md
--------------------------------------------------------------------------------
/车牌识别.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wanghaisheng/awesome-ocr/HEAD/车牌识别.md
--------------------------------------------------------------------------------