├── .gitattributes
├── doc
    ├── images
    │   ├── EDT-en.png
    │   ├── img_CLEAR.jpg
    │   ├── img_SPOT.gif
    │   ├── img_SPOT2.gif
    │   ├── contrast-en.png
    │   ├── img_HOLLOW.jpg
    │   ├── img_HOLLOW2.jpg
    │   ├── img_NORMAL.jpg
    │   ├── contrast-en2.png
    │   ├── contrast-zh-CN.png
    │   ├── demo_eurotext.png
    │   ├── img_WHITE_CHAR.jpg
    │   ├── contrast-zh-CN2.png
    │   ├── img_WHITE_CHAR2.jpg
    │   ├── plugin_linkbord1.png
    │   ├── plugin_linkbord2.png
    │   ├── plugin_linkbord3.png
    │   └── img_INTERFERENCE_LINE.png
    ├── readme_zh_CN.md
    └── readme_en.md
├── .github
    └── FUNDING.yml
└── readme.md

/.gitattributes:
--------------------------------------------------------------------------------
1 | * linguist-language=Java

--------------------------------------------------------------------------------
/doc/images/EDT-en.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ushelp/EasyOCR/HEAD/doc/images/EDT-en.png

--------------------------------------------------------------------------------
/doc/images/img_CLEAR.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ushelp/EasyOCR/HEAD/doc/images/img_CLEAR.jpg

--------------------------------------------------------------------------------
/doc/images/img_SPOT.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ushelp/EasyOCR/HEAD/doc/images/img_SPOT.gif

--------------------------------------------------------------------------------
/doc/images/img_SPOT2.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ushelp/EasyOCR/HEAD/doc/images/img_SPOT2.gif

--------------------------------------------------------------------------------
/doc/images/contrast-en.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ushelp/EasyOCR/HEAD/doc/images/contrast-en.png

--------------------------------------------------------------------------------
/doc/images/img_HOLLOW.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ushelp/EasyOCR/HEAD/doc/images/img_HOLLOW.jpg

--------------------------------------------------------------------------------
/doc/images/img_HOLLOW2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ushelp/EasyOCR/HEAD/doc/images/img_HOLLOW2.jpg

--------------------------------------------------------------------------------
/doc/images/img_NORMAL.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ushelp/EasyOCR/HEAD/doc/images/img_NORMAL.jpg

--------------------------------------------------------------------------------
/doc/images/contrast-en2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ushelp/EasyOCR/HEAD/doc/images/contrast-en2.png

--------------------------------------------------------------------------------
/doc/images/contrast-zh-CN.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ushelp/EasyOCR/HEAD/doc/images/contrast-zh-CN.png

--------------------------------------------------------------------------------
/doc/images/demo_eurotext.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ushelp/EasyOCR/HEAD/doc/images/demo_eurotext.png

--------------------------------------------------------------------------------
/doc/images/img_WHITE_CHAR.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ushelp/EasyOCR/HEAD/doc/images/img_WHITE_CHAR.jpg

--------------------------------------------------------------------------------
/doc/images/contrast-zh-CN2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ushelp/EasyOCR/HEAD/doc/images/contrast-zh-CN2.png

--------------------------------------------------------------------------------
/doc/images/img_WHITE_CHAR2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ushelp/EasyOCR/HEAD/doc/images/img_WHITE_CHAR2.jpg

--------------------------------------------------------------------------------
/doc/images/plugin_linkbord1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ushelp/EasyOCR/HEAD/doc/images/plugin_linkbord1.png

--------------------------------------------------------------------------------
/doc/images/plugin_linkbord2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ushelp/EasyOCR/HEAD/doc/images/plugin_linkbord2.png

--------------------------------------------------------------------------------
/doc/images/plugin_linkbord3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ushelp/EasyOCR/HEAD/doc/images/plugin_linkbord3.png

--------------------------------------------------------------------------------
/doc/images/img_INTERFERENCE_LINE.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ushelp/EasyOCR/HEAD/doc/images/img_INTERFERENCE_LINE.png

--------------------------------------------------------------------------------
/.github/FUNDING.yml:
--------------------------------------------------------------------------------
1 | # These are supported funding model platforms
2 |
3 | github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
4 | patreon: # Replace with a single Patreon username
5 | open_collective: # Replace with a single Open Collective username
6 | ko_fi: # Replace with a single Ko-fi username
7 | tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel
8 | community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry
9 | liberapay: # Replace with a single Liberapay username
10 | issuehunt: # Replace with a single IssueHunt username
11 | otechie: # Replace with a single Otechie username
12 | custom: ['http://www.easyproject.cn/donation']
13 |

--------------------------------------------------------------------------------
/readme.md:
--------------------------------------------------------------------------------
1 | # EasyOCR
2 | ---------------
3 |
4 | Latest update: The 5.X version will be open sourced for free before October 2021.
5 |
6 | ---------------
7 |
8 | **Note: After 4.X, EasyOCR is no longer open source.**
9 |
10 | We apologize to users who support open source: alongside the open source spirit of selfless contribution, commercial support is also a major driving force for technological progress. Thank you for your understanding. If you need to get in touch, we can still provide assistance. Our vision comes from a love of open source and the joy of giving, and we will continue to provide other open source projects to promote community progress.
11 |
12 | By the way, we may reopen the source code in the future, depending on the factors that drive us. For that reason, we are keeping this repo for now.
13 |
14 | ---------------
15 |
16 | After 4.X, EasyOCR and its CAPTCHA recognition are no longer open source or provided free of charge. We apologize to users who support open source: alongside the open source spirit of selfless contribution, commercial support is also a major driving force for technological progress. Thank you for your understanding. If you need to get in touch, we can still provide assistance. Our vision comes from a love of open source and the joy of giving, and we will continue to provide other open source projects to promote community progress.
17 |
18 | We are still balancing open source against commercial needs and are considering reopening the code in the future. That will depend on the factors that drive us and will take some time.
19 |
20 | Commercial users can obtain the following materials:
21 |
22 | - EasyOCR development and usage manual
23 | - EasyOCR Installation: engine installation guide (Windows, Non-Windows)
24 | - EasyOCR API manual
25 | - EasyOCR Plugin: extension plugin development manual
26 | - EasyOCR Java doc
27 |
28 |
29 | ---------------
30 |
31 | ### 中文 (Chinese)
32 |
33 |
34 | EasyOCR is an OCR engine implemented in Java (based on Tesseract). With a few simple APIs, Java code can recognize the content of images. It also integrates **image cleanup** with recognition of **CAPTCHA images**, **bills** and similar content in a single workflow. The EasyOCR engine supports **plugin programming** and **ETD templates**, and provides a graphical ETD template design tool (**EasyTemplateDesigner GUI**).
35 |
36 | EasyOCR can serve consumers, but it is aimed mainly at developers: it provides a local SDK that integrates natively with C/S, B/S and Android mobile projects.
37 |
38 |
39 | Compared with mainstream commercial OCR engines, EasyOCR offers SDK integration, programming flexibility, a comprehensive feature set, accurate recognition and excellent performance, and it already provides engine support to a number of enterprises worldwide. In Chinese recognition and other areas, comparisons with other commercial engines found EasyOCR to be more flexible and to have a higher recognition rate. Its current commercial uses include banking, web crawlers, payments, big data processing, and graphics data analysis for online games (United Kingdom).
40 |
41 | ### English
42 |
43 | EasyOCR is an OCR engine written in Java (based on Tesseract). With a few simple APIs, Java code can recognize the content of images. It also integrates **image cleanup** with recognition of **CAPTCHA images**, **bills** and similar content in one workflow. The EasyOCR engine supports **plugin programming** and **ETD templates**, and provides a graphical ETD template design tool (**EasyTemplateDesigner GUI**).
44 |
45 | EasyOCR serves consumers, but it is mainly oriented toward development: it provides a local SDK that integrates natively with C/S, B/S and Android mobile projects.
46 |
47 | In comparison with mainstream commercial OCR engines, EasyOCR offers SDK integration, programming flexibility, comprehensive features, accurate recognition and strong performance, and it has provided engine support for enterprises around the world. In Chinese recognition and other fields, comparisons with other commercial engines found that EasyOCR has greater flexibility and a higher recognition rate. Its current commercial fields include banking, web crawler applications, payments, big data processing, and graphics data analysis for online games (United Kingdom).
48 |
49 |
50 |
51 | ## Documentation
52 |
53 | ### 中文 (Chinese)
54 |
55 | [Chinese readme](doc/readme_zh_CN.md)
56 |
57 | [Official home page (Chinese)](http://www.easyproject.cn/easyocr/zh-cn/index.jsp 'Official home page')
58 |
59 | [Comments (Chinese)](http://www.easyproject.cn/easyocr/zh-cn/index.jsp#donation 'Comments')
60 |
61 | If you have comments, suggestions or ideas for improvement, please contact me.
62 |
63 | ### English
64 |
65 | [English Readme](doc/readme_en.md)
66 |
67 | [The official home page](http://www.easyproject.cn/easyocr/en/index.jsp 'The official home page')
68 |
69 | [Comments](http://www.easyproject.cn/easyocr/en/index.jsp#donation 'Comments')
70 |
71 | If you have any comments, suggestions or ideas, please contact me.
72 |
73 |
74 |
75 | ## End
76 |
77 | Email:
78 |
79 | [http://www.easyproject.cn](http://www.easyproject.cn "EasyProject Home")
80 |
81 |
82 | **Donation:**
83 |
84 |
85 |
86 | Alipay/WeChat/QQ/UnionPay QuickPass/PayPal: scan the QR code to pay
87 |
88 |
89 |
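90 |
91 | We believe that everyone's small contributions will be a big step toward producing more and better free, open source products.
92 |
93 | **Thank you for your generous donations, which support server operation and encourage more community members.**
94 |
95 | We believe that every contribution, bit by bit, is a big step toward more and better free and open source products.
96 |
97 | **Thank you for donating to keep the server running and to encourage more community members.**
98 |
99 |

--------------------------------------------------------------------------------
/doc/readme_zh_CN.md:
--------------------------------------------------------------------------------
1 | # EasyOCR
2 |
3 | ---------------
4 |
5 | EasyOCR is an OCR engine implemented in Java (based on Tesseract). With a few simple APIs, Java code can recognize the content of images. It also integrates image cleanup with recognition of CAPTCHA images, bills and similar content in a single workflow.
6 |
7 | EasyOCR can serve consumers, but it is aimed mainly at developers: it provides a local SDK that integrates natively with C/S, B/S and Android mobile projects.
8 |
9 | EasyOCR 5.X, built on a new architecture, has been released; the latest version is 5.1.0.
10 |
11 |
12 |
13 | ## Main features
14 |
15 | - Minimal API: one method, one line of code
16 |
17 | - Pure local SDK with native Java support; can be embedded as an engine in all kinds of projects, including Android mobile apps
18 |
19 | - API-level recognition whitelist to restrict the recognition range
20 |
21 | - Recognizes more than a hundred languages, including mixed-language recognition such as English + Japanese + German
22 |
23 | - Integrated cleanup and recognition designed for common bills and CAPTCHA images, with built-in options for many common CAPTCHA types
24 |
25 | - Custom plugins: you can write image cleanup extensions that plug into EasyOCR's integrated recognition
26 |
27 | - ETD template support, with a graphical ETD template design tool (EasyTemplateDesigner), to raise the recognition rate accurately and controllably
28 |
29 | - EasyOCR Suite, a cross-platform GUI suite that gives developers and consumers tools for design and use
30 |
31 | - Standard input and output, plus input and output over a Socket network interface
32 |
33 | - Recognition training: rule-based correction of results makes recognition accurate and sensible and lets the engine improve over time
34 |
35 | - Excellent performance; by default all processing and data exchange happen purely in memory
36 |
37 | - Can run without depending on environment variables
38 |
39 | - Cross-platform: Windows, Linux, Unix, Android
40 |
41 | ## EasyOCR usage steps
42 |
43 | 1. Install the engine
44 |
45 | 2. Add the jar package
46 |
47 | 3. Call the API
48 |
49 | ## EasyOCR core API
50 |
51 | - `EasyOCR`: The core OCR class for recognizing text in images; it makes the calls to the OCR engine and handles automatic cleanup and recognition internally as one step. It offers methods for plain recognition, recognition with cleanup, and template-based recognition with cleanup.
52 |
53 | - `ImageClean`: Cleanup class for images and CAPTCHAs; it cleans all kinds of CAPTCHAs, bills and ordinary images and outputs the result. It supports image cleanup (several predefined cleanup modes are built in and can be switched freely), deformation and rotation, and these can be applied together to improve the text recognition rate.
54 |
55 | - `Language`: The list of EasyOCR recognition languages; several languages can be recognized together in one pass.
56 |
57 | - `TextMode`: The list of EasyOCR recognition modes, an enumeration offering several kinds of text recognition mode.
58 |
59 | - `CleanType`: Enumeration of filter and cleanup types for CAPTCHAs and ordinary images. It covers NONE (no cleanup), CAPTCHA cleanup, TEXT cleanup, and image cleanup algorithms for bills and similar images.
60 |
61 |
62 | ## EasyOCR Demo
63 |
64 | ### 1. Recognition demo
65 | ![demo_eurotext.png](images/demo_eurotext.png)
66 |
67 | ```JAVA
68 | EasyOCR e=new EasyOCR();
69 | // Recognize the image content directly
70 | System.out.println(e.recognize("images/demo_eurotext.png"));
71 | ```
72 |
73 | ### 2. CAPTCHA recognition demo
74 |
75 | ![img_INTERFERENCE_LINE.png](images/img_INTERFERENCE_LINE.png) ![img_NORMAL.jpg](images/img_NORMAL.jpg)
76 |
77 | ```JAVA
78 | // Recognize the CAPTCHA image content directly
79 | System.out.println(e.recognizeAutoCleanImage("images/img_INTERFERENCE_LINE.png",ImageType.CAPTCHA_INTERFERENCE_LINE));
80 | // Recognize a CAPTCHA image after normal cleanup and deformation are applied automatically in one step
81 | System.out.println(e.recognizeAutoCleanImage("images/img_NORMAL.jpg", ImageType.CAPTCHA_NORMAL, 1.6, 0.7));
82 | ```
83 |
84 |
85 | Tip: Deforming a CAPTCHA image appropriately helps improve the recognition rate. In special cases where the ratio needs tuning, a suitable ratio can be found by running the recognition repeatedly and observing the results.
86 | ```JAVA
87 | for(double imageWidthRatio=0.8;imageWidthRatio<=2;imageWidthRatio+=0.1){
88 |     for (double imageHeightRatio = 0.8;imageHeightRatio<=2.8;imageHeightRatio+=0.1) {
89 |         System.out.println(e.recognizeAndAutoCleanImage("images/d.jpg",ImageType.CAPTCHA_NORMAL,imageWidthRatio,imageHeightRatio));
90 |     }
91 | }
92 | ```
93 |
94 | ### 3. API usage demo
95 | ```Java
96 | EasyOCR ocr = new EasyOCR();
97 |
98 | System.out.println("###### Chinese meeting notice recognition ######");
99 | ocr.setAmendPath("amend_chi.txt"); // Corrections for Chinese recognition
100 | ocr.setLanguage(Language.CHI_SIM); // Chinese
101 | String res=ocr.recognize("images/bank/notice.tif");
102 | System.out.println(res);
103 |
104 | System.out.println("###### Mixed multi-language recognition ######");
105 | ocr.setLanguage(Language.multiLanguage(Language.ENG,Language.CHI_SIM)); // Multi-language recognition
106 | String res2=ocr.recognize("images/bank/bill2.tif");
107 | System.out.println(res2);
108 |
109 | System.out.println("###### Chinese bank bill recognition based on an ETD template ######");
110 | ocr.setLanguage(Language.CHI_SIM); // Chinese recognition
111 | ocr.setTextMode(TextMode.UNIFORM_TEXT); // Uniform text size
112 | List res3=ocr.recognizeByTemplate("images/bank/bill3.jpg", "images/bank/bill.etd", ImageType.BILL_NORMAL);
113 | System.out.println(res3);
114 |
115 | System.out.println("###### Digit recognition with image cleanup ######");
116 | ocr.setLanguage(Language.ENG); // English recognition
117 | ocr.setCharList("0123456789"); // Character whitelist API
118 | ocr.setTextMode(TextMode.SINGLE_LINE_TEXT); // Single-line text recognition
119 | String res4=ocr.recognizeAutoCleanImage("images/bank/example4.jpg",ImageType.TEXT_BOLD_BLAK);
120 | System.out.println(res4);
121 | ```
122 |
123 | - Automatic text region detection
124 |
125 | ```Java
126 | String filePath="./images/idcard.png";
127 | BufferedImage image = ImageIO.read(new File(filePath));
128 |
129 | // Find Text Regions
130 | List<TextRegion> regions = EasyOCR.findTextRegions(image, 10, 20, 70, 200);
131 |
132 | Graphics g = image.getGraphics();
133 | g.setColor(Color.RED);
134 |
135 | for (TextRegion r : regions) {
136 |     // if (r.x - 5 > 0) {
137 |     //     r.x -= 5;
138 |     // }
139 |     if (r.height >= 5) {
140 |         r.y -= 5;
141 |         r.y2 += 5;
142 |         g.drawRect(r.x, r.y, r.x2 - r.x, r.y2 - r.y);
143 |         g.drawRect(r.x+1, r.y+1, (r.x2-r.x)-2, (r.y2-r.y)-2); // inner rectangle, giving a 2px border
144 |     }
145 | }
146 |
147 | ImageIO.write(image,filePath.substring(filePath.lastIndexOf('.')+1), new File("./images/idcard_2.png"));
148 | ```
149 |
150 | ### 4. Currently enumerated CAPTCHA types
151 | - `CAPTCHA_NORMAL` : Ordinary CAPTCHA image
152 | ![Ordinary CAPTCHA](images/img_NORMAL.jpg)
153 |
154 | - `CAPTCHA_INTERFERENCE_LINE` : CAPTCHA image with interference lines
155 | ![CAPTCHA image with interference lines](images/img_INTERFERENCE_LINE.png)
156 |
157 | - `CAPTCHA_SPOT` : Dotted CAPTCHA image
158 | ![Dotted CAPTCHA](images/img_SPOT.gif) ![Dotted CAPTCHA](images/img_SPOT2.gif)
159 |
160 | - `CAPTCHA_WHITE_CHAR` : CAPTCHA image with white text on a solid-color background
161 | ![White-text CAPTCHA](images/img_WHITE_CHAR.jpg) ![White-text CAPTCHA](images/img_WHITE_CHAR2.jpg)
162 |
163 | - `CAPTCHA_HOLLOW_CHAR` : CAPTCHA image with hollow text
164 | ![Hollow-text CAPTCHA](images/img_HOLLOW.jpg) ![Hollow-text CAPTCHA](images/img_HOLLOW2.jpg)
165 |
166 | - `CLEAR` : Sharpens ordinary images that have no special interference, to improve the recognition rate ![Sharpened ordinary image](images/img_CLEAR.jpg)
167 |
168 | - `LINK_BOLD` : Bold characters that touch each other ![linkbord CAPTCHA](images/plugin_linkbord1.png) ![linkbord CAPTCHA](images/plugin_linkbord2.png) ![linkbord CAPTCHA](images/plugin_linkbord3.png)
169 |
170 |
171 | ## EasyOCR Suite cross-platform GUI suite
172 | EasyOCR Suite provides a set of cross-platform graphical tools for scenarios such as template design during development (the ETD tool) and end-user use (EasyOCR UI).
173 |
174 | The interface is designed for a modern UI: it reduces complexity and is optimized for touch interaction. Its functions are clear and easy to use, so developers and end users alike can use it with ease.
175 |
176 | - ETD tool
177 |
178 | ![EasyOCR Template Designer](images/EDT-en.png)
179 |
180 |
181 |
182 | ## Comparison with other commercial engines
183 | Unlike traditional vendors, EasyOCR has local SDK integration capabilities. It can therefore serve not only consumers but also developers, as the built-in engine for all kinds of commercial projects.
184 |
185 | ![EasyOCR Template Designer](images/contrast-zh-CN.png)
186 |
187 |
188 | ## Technical support and services
189 |
190 |
191 | OCR is not a technology with a once-and-for-all solution. Each recognition task should be analyzed and optimized for its scenario, and it often needs image cleanup, feature analysis and extraction, rule-based correction and a certain amount of acquired learning. Banking, games, payments, CAPTCHA cracking: different fields need analysis to arrive at different approaches.
192 |
193 | Since its release, the EasyOCR project has received inquiries from people in many industries at home and abroad, seeking specific solutions for their own lines of business.
194 |
195 | Compared with mainstream commercial OCR engines, EasyOCR offers SDK integration, programming flexibility, a comprehensive feature set, accurate recognition and excellent performance, and it already provides engine support to a number of enterprises worldwide. In Chinese recognition and other areas, comparisons with other commercial engines found EasyOCR to be more flexible and to have a higher recognition rate. Its current commercial uses include banking, web crawlers, payments, big data processing, and graphics data analysis for online games (United Kingdom).
196 |
197 | **To protect the interests of these commercial users, EasyOCR and its CAPTCHA recognition are no longer provided free of charge after 4.X. We apologize to users who support open source: alongside the open source spirit of selfless contribution, commercial support is also a major driving force for technological progress. Thank you for your understanding. If you need to get in touch, we can still provide assistance. Our vision comes from a love of open source and the joy of giving, and we will continue to provide other open source projects to promote community progress.**
198 |
199 | **Commercial users can obtain the following materials:**
200 | - EasyOCR development and usage manual
201 | - EasyOCR Installation: engine installation guide (Windows, Non-Windows)
202 | - EasyOCR API manual
203 | - EasyOCR Plugin: extension plugin development manual
204 | - EasyOCR Java doc
205 |
206 | **Please contact us with any needs. We provide the engine, a local SDK, cloud services, cross-platform graphical design tools, customized requirements and solutions, ongoing support, partnerships and other services.**
207 |
208 |
209 |
210 | > **A brief talk on OCR technology**
211 | >
212 | > OCR technology is relatively mature today and is widely used in some fields, where it brings convenience and breakthroughs to everyday life. On the other hand, it is not as good as many people imagine. The technology has areas it is suited to; in other, broader fields, OCR often serves as a more efficient aid rather than an absolutely reliable solution.
213 | >
214 | > For those scenarios and fields, the recognition result is Schrödinger's cat in the box: until a person confirms it, the state is a superposition, both correct and incorrect. So demanding 100% accurate results from OCR when the scenario cannot be constrained is, first of all, a philosophical and logical problem.
215 | >
216 | > A better solution usually needs more supporting work behind it, such as image cleanup, feature analysis, auxiliary tools and other knowledge and skills. OCR is sometimes like a child: it needs tolerance, and it also needs guidance, correction and training to do better.
217 | >
218 |
219 |
220 |
221 |
222 | ## End
223 |
224 | Email:
225 |
226 | [http://www.easyproject.cn](http://www.easyproject.cn "EasyProject Home")
227 |
228 |
229 | **Donation:**
230 |
231 |
232 |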
233 | Alipay/WeChat/QQ/UnionPay QuickPass/PayPal: scan the QR code to pay
234 |
235 |
236 |
237 |
238 | We believe that everyone's small contributions will be a big step toward producing more and better free, open source products.
239 |
240 | **Thank you for your generous donations, which support server operation and encourage more community members.**
241 |
242 | We believe that every contribution, bit by bit, is a big step toward more and better free and open source products.
243 |
244 | **Thank you for donating to keep the server running and to encourage more community members.**
245 |

--------------------------------------------------------------------------------
/doc/readme_en.md:
--------------------------------------------------------------------------------
1 | # EasyOCR
2 |
3 | ---------------
4 |
5 |
6 | EasyOCR is an OCR engine written in Java (based on Tesseract). With a few simple APIs, Java code can recognize the content of images. It also integrates image cleanup with recognition of CAPTCHA images, bills and similar content in one workflow.
7 |
8 | EasyOCR serves consumers, but it is mainly oriented toward development: it provides a local SDK that integrates natively with C/S, B/S and Android mobile projects.
9 |
10 | EasyOCR 5.X, with its new architecture, is now online; the latest version is 5.1.0.
11 |
12 |
13 | ## Main features
14 |
15 | - Minimalist API: one method, one line of code
16 |
17 | - Pure local SDK with native Java support; can be integrated as an engine into various projects, including Android mobile apps
18 |
19 | - API-level recognition whitelist, which limits the recognition range
20 |
21 | - Supports recognition of hundreds of languages, including mixed-language recognition such as English + Japanese + German
22 |
23 | - Integrated cleanup and recognition built specifically for common bills and CAPTCHA images, with built-in options for multiple common CAPTCHA types
24 |
25 | - Support for custom plug-ins: you can write image cleanup extensions that build on EasyOCR's integrated recognition
26 |
27 | - ETD template support, with a graphical ETD template design tool (EasyTemplateDesigner), for accurate and controllable improvement of the recognition rate
28 |
29 | - EasyOCR Suite, a cross-platform GUI suite that provides design and usage tools for developers and consumers
30 |
31 | - Standard input and output, plus input and output through a Socket network interface
32 |
33 | - Recognition training: rule-based correction of results makes recognition accurate and reasonable and lets the engine improve over time
34 |
35 | - High performance; by default, processing and data exchange happen purely in memory
36 |
37 | - Can run without relying on environment variables
38 |
39 | - Cross-platform support: Windows, Linux, Unix, Android
40 |
41 | ## EasyOCR steps for usage
42 |
43 | 1. Install the engine
44 |
45 | 2. Add the jar package
46 |
47 | 3. Call the API
48 |
49 | ## EasyOCR Core API
50 |
51 | - `EasyOCR`: The core OCR class for recognizing text in images; it makes the calls to the OCR engine and handles automatic cleanup and recognition internally as one step. It supports plain recognition, recognition with cleanup, and template-based recognition with cleanup.
52 |
53 | - `ImageClean`: Cleanup class for images and CAPTCHAs; it cleans various CAPTCHAs, bills and ordinary images and outputs the result. It supports image cleanup (several predefined cleanup modes are built in and can be switched flexibly), deformation and rotation, and these can be applied together to improve the text recognition rate.
54 |
55 | - `Language`: The list of EasyOCR recognition languages; multiple languages can be recognized together in one pass.
56 |
57 | - `TextMode`: The list of EasyOCR recognition modes, an enumeration offering several types of text recognition mode.
58 |
59 | - `CleanType`: Enumeration of filter and cleanup types for CAPTCHAs and ordinary images. It covers NONE (no cleanup), CAPTCHA cleanup, TEXT cleanup, and image cleanup algorithms for bills and similar images.
60 |
61 |
62 | ## EasyOCR Demo
63 |
64 | ### 1. Recognition Demo
65 | ![demo_eurotext.png](images/demo_eurotext.png)
66 |
67 | ```JAVA
68 | EasyOCR e=new EasyOCR();
69 | // Recognize the image content directly
70 | System.out.println(e.recognize("images/demo_eurotext.png"));
71 | ```
72 |
73 | ### 2. CAPTCHA recognition Demo
74 |
75 | ![img_INTERFERENCE_LINE.png](images/img_INTERFERENCE_LINE.png) ![img_NORMAL.jpg](images/img_NORMAL.jpg)
76 |
77 | ```JAVA
78 | // Recognize the CAPTCHA image content directly
79 | System.out.println(e.recognizeAutoCleanImage("images/img_INTERFERENCE_LINE.png",ImageType.CAPTCHA_INTERFERENCE_LINE));
80 | // Recognize a CAPTCHA image after general cleanup and deformation are applied automatically in one step
81 | System.out.println(e.recognizeAutoCleanImage("images/img_NORMAL.jpg", ImageType.CAPTCHA_NORMAL, 1.6, 0.7));
82 | ```
83 |
84 |
85 | Tip: Deforming a CAPTCHA image appropriately helps improve the recognition rate. In special cases where the ratio needs to be adjusted, the right ratio can be found by running the recognition several times and observing the results.
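This readme does not show how EasyOCR applies the width and height ratios internally. As a rough illustration only, the sketch below uses plain JDK classes to stretch an image by a width ratio and a height ratio, which is the kind of deformation the tip refers to. The class name `RatioScaleSketch` and the method `scale` are hypothetical and are not part of the EasyOCR API.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of EasyOCR: stretches an image by independent
// width and height ratios using only JDK classes.
public class RatioScaleSketch {

    public static BufferedImage scale(BufferedImage src, double widthRatio, double heightRatio) {
        // Round to whole pixels and keep each side at least 1 pixel long.
        int w = Math.max(1, (int) Math.round(src.getWidth() * widthRatio));
        int h = Math.max(1, (int) Math.round(src.getHeight() * heightRatio));

        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            // Bilinear interpolation keeps character edges smoother than nearest-neighbor.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return out;
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(100, 50, BufferedImage.TYPE_INT_RGB);
        BufferedImage out = scale(src, 1.6, 0.7);
        System.out.println(out.getWidth() + "x" + out.getHeight()); // prints "160x35"
    }
}
```

With the ratios 1.6 and 0.7 used in the CAPTCHA demo above, a 100x50 image becomes 160x35: wider and flatter, which may separate crowded characters before recognition.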
86 | ```JAVA
87 | for(double imageWidthRatio=0.8;imageWidthRatio<=2;imageWidthRatio+=0.1){
88 |     for (double imageHeightRatio = 0.8;imageHeightRatio<=2.8;imageHeightRatio+=0.1) {
89 |         System.out.println(e.recognizeAndAutoCleanImage("images/d.jpg",ImageType.CAPTCHA_NORMAL,imageWidthRatio,imageHeightRatio));
90 |     }
91 | }
92 | ```
93 |
94 | ### 3. API Use Demo
95 | ```Java
96 | EasyOCR ocr = new EasyOCR();
97 |
98 | System.out.println("###### Chinese meeting notice recognition ######");
99 | ocr.setAmendPath("amend_chi.txt"); // Corrections for Chinese recognition
100 | ocr.setLanguage(Language.CHI_SIM); // Simplified Chinese
101 | String res=ocr.recognize("images/bank/notice.tif");
102 | System.out.println(res);
103 |
104 | System.out.println("###### Mixed multi-language recognition ######");
105 | ocr.setLanguage(Language.multiLanguage(Language.ENG,Language.CHI_SIM)); // Mixed multi-language
106 | String res2=ocr.recognize("images/bank/bill2.tif");
107 | System.out.println(res2);
108 |
109 | System.out.println("###### Chinese bank bill recognition based on an ETD template ######");
110 | ocr.setLanguage(Language.CHI_SIM); // Simplified Chinese
111 | ocr.setTextMode(TextMode.UNIFORM_TEXT); // Uniform text
112 | List res3=ocr.recognizeByTemplate("images/bank/bill3.jpg", "images/bank/bill.etd", ImageType.BILL_NORMAL);
113 | System.out.println(res3);
114 |
115 | System.out.println("###### Digit recognition with image cleanup ######");
116 | ocr.setLanguage(Language.ENG); // English
117 | ocr.setCharList("0123456789"); // Character whitelist
118 | ocr.setTextMode(TextMode.SINGLE_LINE_TEXT); // Single-line text
119 | String res4=ocr.recognizeAutoCleanImage("images/bank/example4.jpg",ImageType.TEXT_BOLD_BLAK);
120 | System.out.println(res4);
121 | ```
122 |
123 | - Automatic text region detection
124 |
125 | ```Java
126 | String filePath="./images/idcard.png";
127 | BufferedImage image = ImageIO.read(new File(filePath));
128 |
129 | // Find Text Regions
130 | List<TextRegion> regions = EasyOCR.findTextRegions(image, 10, 20, 70, 200);
131 |
132 | Graphics g = image.getGraphics();
133 | g.setColor(Color.RED);
134 |
135 | for (TextRegion r : regions) {
136 |     // if (r.x - 5 > 0) {
137 |     //     r.x -= 5;
138 |     // }
139 |     if (r.height >= 5) {
140 |         r.y -= 5;
141 |         r.y2 += 5;
142 |         g.drawRect(r.x, r.y, r.x2 - r.x, r.y2 - r.y);
143 |         g.drawRect(r.x+1, r.y+1, (r.x2-r.x)-2, (r.y2-r.y)-2); // inner rectangle, giving a 2px border
144 |     }
145 | }
146 |
147 | ImageIO.write(image,filePath.substring(filePath.lastIndexOf('.')+1), new File("./images/idcard_2.png"));
148 | ```
149 |
150 | ### 4. Currently enumerated CAPTCHA types
151 | - `CAPTCHA_NORMAL` : Ordinary CAPTCHA image
152 | ![Ordinary CAPTCHA image](images/img_NORMAL.jpg)
153 |
154 | - `CAPTCHA_INTERFERENCE_LINE` : CAPTCHA image with interference lines
155 | ![CAPTCHA image with interference lines](images/img_INTERFERENCE_LINE.png)
156 |
157 | - `CAPTCHA_SPOT` : Dotted CAPTCHA image
158 | ![Dotted CAPTCHA](images/img_SPOT.gif) ![Dotted CAPTCHA](images/img_SPOT2.gif)
159 |
160 | - `CAPTCHA_WHITE_CHAR` : CAPTCHA image with white text on a solid background
161 | ![White text on a solid background](images/img_WHITE_CHAR.jpg) ![White text on a solid background](images/img_WHITE_CHAR2.jpg)
162 |
163 | - `CAPTCHA_HOLLOW_CHAR` : CAPTCHA image with hollow text
164 | ![Hollow text CAPTCHA image](images/img_HOLLOW.jpg) ![Hollow text CAPTCHA image](images/img_HOLLOW2.jpg)
165 |
166 | - `CLEAR` : Sharpens ordinary images that have no specific interference, to improve the recognition rate ![clarity](images/img_CLEAR.jpg)
167 |
168 | - `LINK_BOLD` : Bold characters that touch each other ![linkbord CAPTCHA](images/plugin_linkbord1.png) ![linkbord CAPTCHA](images/plugin_linkbord2.png) ![linkbord CAPTCHA](images/plugin_linkbord3.png)
169 |
170 |
171 | ## EasyOCR Suite cross-platform GUI suite
172 | EasyOCR Suite provides a set of cross-platform graphical tools for scenarios such as template design during development (the ETD tool) and consumer use (EasyOCR UI).
173 |
174 | The interface is designed for a modern UI: it is simplified and optimized for touch interaction. Its functions are clear and easy to use, so both developers and end users can use it with ease.
175 |
176 | - ETD tool
177 |
178 | ![EasyOCR Template Designer](images/EDT-en.png)
179 |
180 |
181 |
182 | ## Comparison with other commercial engines
183 |
184 | Unlike traditional vendors, EasyOCR has local SDK integration capabilities. It can therefore serve not only consumers but also developers, as the built-in engine for a variety of commercial projects.
185 |
186 | ![EasyOCR Template Designer](images/contrast-en.png)
187 |
188 |
189 | ## Technical support and services
190 |
191 |
192 | OCR is not a technology with a once-and-for-all solution. Each recognition task should be analyzed and optimized for its scenario, and it often needs image cleanup, feature analysis and extraction, rule-based correction and some acquired learning. Banking, gaming, payments, CAPTCHA cracking: different fields need to be analyzed in order to provide different approaches.
193 |
194 | Since its release, the EasyOCR project has received inquiries from people in various industries at home and abroad who are looking for specific solutions for their own lines of business.
195 |
196 | In comparison with mainstream commercial OCR engines, EasyOCR offers SDK integration, programming flexibility, comprehensive features, accurate recognition and strong performance, and it has provided engine support for enterprises around the world. In Chinese recognition and other fields, comparisons with other commercial engines found that EasyOCR has greater flexibility and a higher recognition rate. Its current commercial fields include banking, web crawler applications, payments, big data processing, and graphics data analysis for online games (United Kingdom).
197 |
198 | **To protect the interests of these business users, EasyOCR and its CAPTCHA recognition are no longer provided free of charge after 4.X. We apologize to users who support open source: alongside the open source spirit of selfless contribution, commercial support is also a major driving force for technological progress. Thank you for your understanding. If you need to get in touch, we can still provide appropriate help. Our vision comes from a love of open source and the joy of giving, and we will provide more open source projects to promote community progress.**
199 |
200 | **Business users can obtain the following documentation**:
201 | - EasyOCR Development Manual
202 | - EasyOCR Engine installation documentation (Windows, non-Windows)
203 | - EasyOCR API Manual
204 | - EasyOCR Plugin Development Manual
205 | - EasyOCR Java doc
206 |
207 | **Please contact us with any needs. We provide the engine, a local SDK, cloud services, cross-platform graphical design tools, customized requirements and solutions, sustained response, cooperation and other services.**
208 |
209 |
210 | > **OCR Tech Talk**
211 | >
212 | > OCR technology is relatively mature today and is widely used in some areas, where it brings convenience and breakthroughs to everyday life. On the other hand, it is not as good as many people imagine. The technology has areas it is suited to; in other, broader areas, OCR is often a more efficient aid rather than an absolutely reliable solution.
213 | >
214 | > For these scenarios and fields, the recognition result is Schrödinger's cat in the box: until a human confirms it, the state is a superposition, both correct and incorrect. So requiring 100 percent accurate results from OCR when the scenario cannot be constrained is, first of all, a philosophical and logical problem.
215 | >
216 | > A better solution often requires more supporting work behind it, such as image cleanup, feature analysis, auxiliary tools and other knowledge and skills. OCR is sometimes like a child: it needs tolerance, and it also needs guidance, correction and training to do better.
217 | >
218 |
219 |
220 |
221 | ## End
222 |
223 | Email:
224 |
225 | [http://www.easyproject.cn](http://www.easyproject.cn "EasyProject Home")
226 |
227 |
228 | **Donation:**
229 |
230 |
231 |
232 | Alipay/WeChat/QQ/UnionPay QuickPass/PayPal: scan the QR code to pay
233 |
234 |
235 |
236 |
237 | We believe that everyone's small contributions will be a big step toward producing more and better free, open source products.
238 |
239 | **Thank you for your generous donations, which support server operation and encourage more community members.**
240 |
241 | We believe that every contribution, bit by bit, is a big step toward more and better free and open source products.
242 |
243 | **Thank you for donating to keep the server running and to encourage more community members.**
244 |

--------------------------------------------------------------------------------