├── README.md ├── Rakefile ├── bin ├── tesseract-train.rb └── tesseract.rb ├── examples └── nerdz-captcha-breaker │ ├── break.rb │ ├── captchas │ ├── 001.png │ ├── 002.png │ ├── 003.png │ ├── 004.png │ ├── 005.png │ ├── 006.png │ ├── 007.png │ ├── 008.png │ ├── 009.png │ ├── 010.png │ ├── 011.png │ ├── 012.png │ ├── 013.png │ ├── 014.png │ ├── 015.png │ ├── 016.png │ ├── 017.png │ ├── 018.png │ ├── 019.png │ ├── 020.png │ ├── 021.png │ ├── 022.png │ ├── 023.png │ ├── 024.png │ ├── 025.png │ ├── 026.png │ ├── 027.png │ ├── 028.png │ ├── 029.png │ ├── 030.png │ ├── 031.png │ ├── 032.png │ ├── 033.png │ ├── 034.png │ ├── 035.png │ ├── 036.png │ ├── 037.png │ ├── 038.png │ ├── 039.png │ ├── 040.png │ ├── 041.png │ ├── 042.png │ ├── 043.png │ ├── 044.png │ ├── 045.png │ ├── 046.png │ ├── 047.png │ ├── 048.png │ ├── 049.png │ ├── 050.png │ ├── 051.png │ ├── 052.png │ ├── 053.png │ ├── 054.png │ ├── 055.png │ ├── 056.png │ ├── 057.png │ ├── 058.png │ ├── 059.png │ ├── 060.png │ ├── 061.png │ ├── 062.png │ ├── 063.png │ ├── 064.png │ ├── 065.png │ ├── 066.png │ ├── 067.png │ ├── 068.png │ ├── 069.png │ ├── 070.png │ ├── 071.png │ ├── 072.png │ ├── 073.png │ ├── 074.png │ ├── 075.png │ ├── 076.png │ ├── 077.png │ ├── 078.png │ ├── 079.png │ ├── 080.png │ ├── 081.png │ ├── 082.png │ ├── 083.png │ ├── 084.png │ ├── 085.png │ ├── 086.png │ ├── 087.png │ ├── 088.png │ ├── 089.png │ ├── 090.png │ ├── 091.png │ ├── 092.png │ ├── 093.png │ ├── 094.png │ ├── 095.png │ ├── 096.png │ ├── 097.png │ ├── 098.png │ ├── 099.png │ ├── 100.png │ └── captchas.txt │ ├── tessdata │ ├── generate.rb │ ├── lol.box │ ├── lol.tif │ └── lol.traineddata │ └── test.rb ├── lib ├── tesseract-ocr.rb ├── tesseract.rb └── tesseract │ ├── api.rb │ ├── api │ ├── image.rb │ └── iterator.rb │ ├── c.rb │ ├── c │ ├── baseapi.rb │ ├── iterator.rb │ └── leptonica.rb │ ├── engine.rb │ ├── engine │ ├── baseline.rb │ ├── bounding_box.rb │ ├── font_attributes.rb │ ├── iterator.rb │ └── orientation.rb │ ├── 
extensions.rb │ ├── iterator.rb │ └── version.rb ├── tesseract-ocr.gemspec └── test ├── first.png ├── jsmj.png ├── second.png ├── tesseract_bench.rb ├── tesseract_spec.rb ├── test-european.jpg └── test.png /README.md: -------------------------------------------------------------------------------- 1 | ruby-tesseract - Ruby bindings and wrapper 2 | ========================================== 3 | This wrapper binds the TessBaseAPI object through ffi-inline (which means it 4 | will work on JRuby too) and then proceeds to wrap said API in a more ruby-esque 5 | Engine class. 6 | 7 | Making it work 8 | -------------- 9 | To make this library work you need tesseract-ocr and leptonica libraries and 10 | headers and a C++ compiler. 11 | 12 | The gem is called `tesseract-ocr`. 13 | 14 | If you're on a distribution that separates the libraries from headers, remember 15 | to install the *-dev* package. 16 | 17 | On Debian you will need to install `libleptonica-dev` and `libtesseract-dev`. 18 | 19 | Examples 20 | -------- 21 | Following are some examples that show the functionalities provided by 22 | tesseract-ocr. 23 | 24 | ### Basic functionality of tesseract 25 | 26 | ```ruby 27 | require 'tesseract' 28 | 29 | e = Tesseract::Engine.new {|e| 30 | e.language = :eng 31 | e.blacklist = '|' 32 | } 33 | 34 | e.text_for('test/first.png').strip # => 'ABC' 35 | ``` 36 | 37 | You can pass to `#text_for` either a path, an IO object, a string containing 38 | the image or an object that responds to `#to_blob` (for example 39 | Magick::Image), keep in mind that the format has to be supported by leptonica. 40 | 41 | ### Accessing advanced features 42 | 43 | With advanced features you get access to blocks, paragraphs, lines, words and 44 | symbols. 45 | 46 | Replace **level** in method names with either `block`, `paragraph`, `line`, 47 | `word` or `symbol`. 48 | 49 | The following kind of accessors need a block to be passed and they pass to the 50 | block each `Element` object. 
The Element object has various getters to access 51 | certain features, which are described below. 52 | 53 | The methods are: 54 | 55 | * `each_level` 56 | * `each_level_for` 57 | * `each_level_at` 58 | 59 | The following accessors instead return an `Array` of `Element`s with cached 60 | getters. The getters are cached because the values accessible in the `Element` 61 | are linked to the state of the internal API, and that state changes if you 62 | access something else. 63 | 64 | The methods are: 65 | 66 | * `levels` 67 | * `levels_for` 68 | * `levels_at` 69 | 70 | Again, to `*_for` methods you can pass whatever you can pass to `#text_for`. 71 | 72 | Each `Element` object has the following getters: 73 | 74 | * `bounding_box`, this will return the box the element is confined in 75 | * `binary_image`, this will return the bichromatic image of the element 76 | * `image`, this will return the image of the element 77 | * `baseline`, this will return the line where the text is with a pair of 78 | coordinates 79 | * `orientation`, this will return the orientation of the element 80 | * `text`, this will return the text of the element 81 | * `confidence`, this will return the confidence of correctness for the element 82 | 83 | `Block` elements also have `type` accessors that specify the type of the block. 84 | 85 | `Word` elements also have `font_attributes`, `from_dictionary?` and `numeric?` 86 | getters. 87 | 88 | `Symbol` elements also have `superscript?`, `subscript?` and `dropcap?` 89 | getters. 90 | 91 | ### hOCR 92 | 93 | ```ruby 94 | require 'tesseract' 95 | 96 | e = Tesseract::Engine.new {|e| 97 | e.language = :eng 98 | e.blacklist = '|' 99 | } 100 | 101 | puts e.hocr_for('test/first.png') 102 | ``` 103 | 104 | You can pass to `#hocr_for` either a path, an IO object, a string containing 105 | the image or an object that responds to `#to_blob` (for example 106 | Magick::Image); keep in mind that the format has to be supported by leptonica. 
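
The four kinds of input listed above can be told apart by duck typing. The following is only a sketch of that idea, not the gem's actual code: `image_bytes` is a hypothetical helper name, and the way it distinguishes a path from raw image data (a NUL-byte check plus `File.file?`) is an assumption made for illustration.

```ruby
require 'stringio'

# Hypothetical helper (not part of tesseract-ocr): turns any of the input
# kinds listed in the README into a binary String of image data.
def image_bytes(source)
  if source.respond_to?(:to_blob)
    # Objects such as Magick::Image expose the encoded image via #to_blob.
    source.to_blob
  elsif source.respond_to?(:read)
    # IO-like objects are simply read to the end.
    source.read
  elsif source.is_a?(String)
    # A String is either a path or the image data itself. Encoded images
    # usually contain NUL bytes, which file lookups reject, so check first.
    if !source.include?("\0") && File.file?(source)
      File.binread(source)
    else
      source
    end
  else
    raise ArgumentError, "unsupported image source: #{source.class}"
  end
end

# Usage: an IO object and a raw string carrying the same data are equivalent.
data = "\x89PNG\r\n\x1A\n\0\0".b
same = image_bytes(StringIO.new(data)) == image_bytes(data)
```

Checking `#to_blob` and `#read` before the `String` branch keeps the ambiguous case (path or raw data) for last, where the only remaining question is whether the string names an existing file.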
107 | 108 | Please note you have to pass `#hocr_for` the page you want to get the output of 109 | as well. 110 | 111 | Using the binary 112 | ---------------- 113 | You can also use the shipped executable in the following way: 114 | 115 | ```bash 116 | > tesseract.rb -h 117 | Usage: tesseract [options] 118 | --path PATH datapath to set 119 | -l, --language LANGUAGE language to use 120 | -m, --mode MODE mode to use 121 | -p, --psm MODE page segmentation mode to use 122 | -u, --unlv output in UNLV format 123 | -c, --confidence output the mean confidence of the recognition 124 | -C, --config PATH... config files to load 125 | -b, --blacklist LIST blacklist the following chars 126 | -w, --whitelist LIST whitelist the following chars 127 | > tesseract.rb test/first.png 128 | ABC 129 | > tesseract.rb -c test/first.png 130 | 86 131 | ``` 132 | 133 | License 134 | ------- 135 | The license is BSD one clause. 136 | -------------------------------------------------------------------------------- /Rakefile: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env ruby 2 | require 'rake' 3 | 4 | task :default => :test 5 | 6 | task :test do 7 | Dir.chdir 'test' 8 | sh 'rspec tesseract_spec.rb --color --format doc' 9 | end 10 | 11 | task :bench do 12 | Dir.chdir 'test' 13 | sh 'ruby tesseract_bench.rb' 14 | end 15 | -------------------------------------------------------------------------------- /bin/tesseract-train.rb: -------------------------------------------------------------------------------- 1 | #! 
/usr/bin/env ruby 2 | require 'optparse' 3 | require 'tmpdir' 4 | require 'fileutils' 5 | require 'shellwords' 6 | 7 | options = {} 8 | 9 | OptionParser.new do |o| 10 | o.on '-d', '--data DATA...', Array, 'the data to use' do |value| 11 | options[:data] = Hash[value.map { |e| e.split(?:).map { |p| File.realpath(p) } }] 12 | end 13 | 14 | o.on '-o', '--output FILE', 'the path where to output the traineddata' do |value| 15 | options[:output] = File.expand_path(value) 16 | end 17 | end.parse! 18 | 19 | if language = ARGV.shift 20 | options[:box] = File.realpath("#{language}.box") 21 | options[:image] = File.realpath("#{language}.tif") 22 | options[:output] = File.expand_path("#{language}.traineddata") 23 | else 24 | language = options[:output][/^(.*?)\./, 1] 25 | end 26 | 27 | Dir.chdir FileUtils.mkpath(File.join(Dir.tmpdir, rand.to_s)).first 28 | 29 | language = language.shellescape 30 | 31 | options[:data].each_with_index {|(box, image), index| 32 | %x{ 33 | cp #{box.shellescape} #{language}.#{index}.box 34 | cp #{image.shellescape} #{language}.#{index}#{File.extname(image)} 35 | 36 | tesseract #{language}.#{index}#{File.extname(image)} #{language} nobatch box.train.stderr 37 | 38 | unicharset_extractor #{language}.box 39 | 40 | echo #{language}.#{index} 0 0 0 0 0 >> font_properties 41 | mftraining -F font_properties -U unicharset -O #{language}.unicharset #{language}.tr 42 | } 43 | } 44 | 45 | %x{ 46 | cntraining #{language}.tr 47 | 48 | mv Microfeat #{language}.Microfeat 49 | mv normproto #{language}.normproto 50 | mv pffmtable #{language}.pffmtable 51 | mv mfunicharset #{language}.mfunicharset 52 | mv inttemp #{language}.inttemp 53 | 54 | combine_tessdata #{language}. 
55 | 56 | mv #{language}.traineddata #{options[:output].shellescape} 57 | } 58 | 59 | =begin 60 | path = File.realpath(Dir.pwd) 61 | 62 | Dir.chdir '/' 63 | 64 | FileUtils.rm_rf path 65 | =end 66 | -------------------------------------------------------------------------------- /bin/tesseract.rb: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env ruby 2 | require 'optparse' 3 | require 'tesseract' 4 | 5 | options = {} 6 | 7 | OptionParser.new do |o| 8 | options[:path] = ?. 9 | options[:language] = :en 10 | options[:mode] = :DEFAULT 11 | 12 | o.on '--path PATH', 'datapath to set' do |value| 13 | options[:path] = value 14 | end 15 | 16 | o.on '-l', '--language LANGUAGE', 'language to use' do |value| 17 | options[:language] = value 18 | end 19 | 20 | o.on '-m', '--mode MODE', 'mode to use' do |value| 21 | options[:mode] = value.upcase.to_sym 22 | end 23 | 24 | o.on '-p', '--psm MODE', 'page segmentation mode to use' do |value| 25 | options[:psm] = value.to_i 26 | end 27 | 28 | o.on '-u', '--unlv', 'output in UNLV format' do 29 | options[:unlv] = true 30 | end 31 | 32 | o.on '-c', '--confidence', 'output the mean confidence of the recognition' do 33 | options[:confidence] = true 34 | end 35 | 36 | o.on '-C', '--config PATH...', Array, 'config files to load' do |config| 37 | options[:config] = config 38 | end 39 | 40 | o.on '-b', '--blacklist LIST', 'blacklist the following chars' do |value| 41 | options[:blacklist] = value 42 | end 43 | 44 | o.on '-w', '--whitelist LIST', 'whitelist the following chars' do |value| 45 | options[:whitelist] = value 46 | end 47 | 48 | o.on '-s', '--scale VALUE', Float, 'scale the image before analyzing it' do |value| 49 | options[:scale] = value 50 | end 51 | 52 | o.on '-r', '--resize VALUE', Float, 'resize the image before analyzing it' do |value| 53 | options[:resize] = value 54 | end 55 | end.parse! 
56 | 57 | Tesseract::Engine.new(options[:path], options[:language], options[:mode]) {|engine| 58 | engine.blacklist options[:blacklist] if options[:blacklist] 59 | engine.whitelist options[:whitelist] if options[:whitelist] 60 | 61 | engine.page_segmentation_mode = options[:psm] if options[:psm] 62 | engine.load_config options[:config] if options[:config] 63 | }.tap {|engine| 64 | image = if options[:scale] 65 | require 'RMagick'; Magick::Image.read(ARGV.first).first.scale(options[:scale]) 66 | elsif options[:resize] 67 | require 'RMagick'; Magick::Image.read(ARGV.first).first.resize(options[:resize]) 68 | else 69 | ARGV.first 70 | end 71 | 72 | if options[:unlv] 73 | puts engine.text_for(image).unlv.strip 74 | elsif options[:confidence] 75 | puts engine.text_for(image).confidence 76 | else 77 | puts engine.text_for(image).strip 78 | end 79 | } 80 | -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/break.rb: -------------------------------------------------------------------------------- 1 | #! 
/usr/bin/env ruby 2 | require 'tesseract' 3 | require 'RMagick' 4 | 5 | # this function is used to get points near the current pixel to 6 | # cleanup the oblique lines mess, horizontal points seem to output 7 | # better cleanup 8 | def near (x, y) 9 | [ 10 | # [x - 1, y - 1], 11 | # [x, y - 1], 12 | # [x + 1, y - 1], 13 | [x - 1, y ], 14 | [x + 1, y ], 15 | # [x - 1, y + 1], 16 | # [x, y + 1], 17 | # [x + 1, y + 1] 18 | ] 19 | end 20 | 21 | class Magick::Pixel 22 | def =~ (other) 23 | other = Magick::Pixel.from_color(other) if other.is_a?(String) 24 | 25 | red == other.red && green == other.green && blue == other.blue 26 | end 27 | end 28 | 29 | Tesseract.prefix = './' 30 | 31 | Tesseract::Engine.new {|engine| 32 | engine.language = :lol 33 | engine.page_segmentation_mode = 8 34 | engine.whitelist = [*'a'..'z', *'A'..'Z', *0..9].join 35 | }.tap {|engine| 36 | ARGV.each {|path| 37 | image = Magick::Image.read(path).first 38 | pixels = Hash.new { |h, k| h[k] = 0 } 39 | 40 | image.each_pixel {|p| 41 | pixels[p] += 1 42 | } 43 | 44 | pixels.reject! { |p| p =~ 'black' } 45 | 46 | text_color, count = pixels.max { |a, b| a.last <=> b.last } 47 | 48 | image.each_pixel {|p, x, y| 49 | next unless p =~ text_color or p =~ 'black' 50 | 51 | image.pixel_color x, y, p =~ text_color ? 'black' : 'white' 52 | } 53 | 54 | image.each_pixel {|p, x, y| 55 | next if p =~ 'black' || p =~ 'white' 56 | 57 | if near(x, y).map { |(x, y)| image.pixel_color x, y }.any? 
{ |p| p =~ 'black' } 58 | image.pixel_color x, y, 'gray' 59 | else 60 | image.pixel_color x, y, 'white' 61 | end 62 | } 63 | 64 | image.each_pixel {|p, x, y| 65 | next unless p =~ 'gray' 66 | 67 | image.pixel_color x, y, 'black' 68 | } 69 | 70 | image.scale(4).display if ENV['DEBUG'] 71 | 72 | puts "#{path}: #{engine.text_for(image.scale(4)).strip}" 73 | } 74 | } 75 | -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/001.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/002.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/002.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/003.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/003.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/004.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/004.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/005.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/005.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/006.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/006.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/007.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/007.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/008.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/008.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/009.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/009.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/010.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/010.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/011.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/011.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/012.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/012.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/013.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/013.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/014.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/014.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/015.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/015.png 
-------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/016.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/016.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/017.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/017.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/018.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/018.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/019.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/019.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/020.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/020.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/021.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/021.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/022.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/022.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/023.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/023.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/024.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/024.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/025.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/025.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/026.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/026.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/027.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/027.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/028.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/028.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/029.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/029.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/030.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/030.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/031.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/031.png 
-------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/032.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/032.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/033.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/033.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/034.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/034.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/035.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/035.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/036.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/036.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/037.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/037.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/038.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/038.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/039.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/039.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/040.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/040.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/041.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/041.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/042.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/042.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/043.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/043.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/044.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/044.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/045.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/045.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/046.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/046.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/047.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/047.png 
-------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/048.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/048.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/049.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/049.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/050.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/050.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/051.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/051.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/052.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/052.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/053.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/053.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/054.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/054.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/055.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/055.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/056.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/056.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/057.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/057.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/058.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/058.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/059.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/059.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/060.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/060.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/061.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/061.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/062.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/062.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/063.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/063.png 
-------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/064.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/064.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/065.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/065.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/066.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/066.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/067.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/067.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/068.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/068.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/069.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/069.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/070.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/070.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/071.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/071.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/072.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/072.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/073.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/073.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/074.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/074.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/075.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/075.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/076.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/076.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/077.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/077.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/078.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/078.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/079.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/079.png 
-------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/080.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/080.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/081.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/081.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/082.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/082.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/083.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/083.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/084.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/084.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/085.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/085.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/086.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/086.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/087.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/087.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/088.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/088.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/089.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/089.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/090.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/090.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/091.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/091.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/092.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/092.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/093.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/093.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/094.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/094.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/095.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/095.png 
-------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/096.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/096.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/097.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/097.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/098.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/098.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/099.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/099.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/100.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/captchas/100.png -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/captchas/captchas.txt: 
--------------------------------------------------------------------------------
1 | enKzaV
2 | CZU6tf
3 | ZO5mGY
4 | FNNhZv
5 | Dwp1Vy
6 | JYsDAi
7 | dld510
8 | yG615j
9 | WHxAUZ
10 | k9IhZu
11 | qWIPSr
12 | nDSXc5
13 | 9iTYeZ
14 | s44iQ9
15 | VPNXWy
16 | 80zxvW
17 | QA7IYj
18 | D8Ro4U
19 | OiEg1U
20 | pJS7Z8
21 | 6w8eik
22 | s5igED
23 | 7bJe8p
24 | VtYdW3
25 | jNNdcO
26 | neLPNV
27 | KONPnl
28 | Q8aSXJ
29 | kIwSqv
30 | 8LQExn
31 | RwcDU2
32 | LMLg5K
33 | C0YmdD
34 | mqAvES
35 | Ai0Wxi
36 | bopETp
37 | L3yP5u
38 | w4rw3b
39 | oSEUMU
40 | bftqDK
41 | mM7cKE
42 | rYl6x4
43 | 3hVI8X
44 | Tm2PPp
45 | VfmqQ6
46 | 0EZAgC
47 | QW6gBS
48 | UTS137
49 | YXXTqk
50 | a6LU3K
51 | SVzguN
52 | l9G8Y9
53 | ZP9TDM
54 | yj7zmS
55 | sD0Ub9
56 | XeWY3A
57 | w8EKVl
58 | 3gO266
59 | yYN3oJ
60 | NumjLi
61 | EEzwCz
62 | 8bUSjW
63 | GAo6ap
64 | AXcn6K
65 | KquWmp
66 | PM8LYt
67 | uGS7GO
68 | mucVzf
69 | UIZ6cf
70 | mXRdGq
71 | lcn8OP
72 | lKqkw7
73 | CTHnR1
74 | ShvmSD
75 | klCoXI
76 | 8epftU
77 | nA277p
78 | uXfavQ
79 | EhInXB
80 | KJLUZf
81 | nAWswu
82 | savOX1
83 | bJtoCK
84 | cTnmnF
85 | hoYGbS
86 | d3J04A
87 | c18WaN
88 | aO5VVN
89 | DvP3vf
90 | 7cuFOq
91 | mPaSmA
92 | k0rNqQ
93 | tvG40k
94 | 154xFW
95 | x9ioCI
96 | p9rR4J
97 | DNLPJA
98 | wGR2O1
99 | zVr96r
100 | 856sHR
101 |
--------------------------------------------------------------------------------
/examples/nerdz-captcha-breaker/tessdata/generate.rb:
--------------------------------------------------------------------------------
#! /usr/bin/env ruby
require 'rubygems'
require 'gd2'
require 'RMagick'

# First argument: text file with one captcha string per line.
captchas = File.read(ARGV.shift).lines.to_a

# The image is 90 pixels wide, with a 30-pixel-high row per captcha.
image = GD2::Image.new(90, captchas.length * 30)
font  = GD2::Font::Giant

captchas.each_with_index {|captcha, index|
	font.draw(image.image_ptr, 10, index * 30 + 10, 0, captcha.chomp, GD2::Color::WHITE.to_i)
}

# Second argument: path of the image to write.
output = ARGV.shift

image.export(output)

# Reload the exported image with RMagick and overwrite it scaled up 4x.
image = Magick::Image.read(output).first

File.open(output, 'wb') { |f| f.write(image.scale(4).to_blob) }
--------------------------------------------------------------------------------
/examples/nerdz-captcha-breaker/tessdata/lol.box:
--------------------------------------------------------------------------------
1 | e 40 11908 72 11936 0 2 | n 76 11908 108 11936 0 3 | K 112 11908 144 11948 0 4 | z 152 11908 176 11936 0 5 | a 184 11908 216 11936 0 6 | V 220 11908 252 11948 0 7 | C 40 11788 72 11828 0 8 | Z 76 11788 104 11828 0 9 | U 112 11788 144 11828 0 10 | 6 148 11788 180 11828 0 11 | t 184 11788 216 11824 0 12 | f 220 11788 252 11828 0 13 | Z 40 11668 68 11708 0 14 | O 76 11668 108 11708 0 15 | 5 112 11668 144 11708 0 16 | m 148 11668 180 11696 0 17 | G 184 11668 216 11708 0 18 | Y 220 11668 252 11708 0 19 | F 40 11548 72 11588 0 20 | N 76 11548 108 11588 0 21 | N 112 11548 144 11588 0 22 | h 148 11548 180 11588 0 23 | Z 184 11548 212 11588 0 24 | v 220 11548 252 11576 0 25 | D 40 11428 72 11468 0 26 | w 76 11428 108 11456 0 27 | p 112 11420 144 11456 0 28 | 1 152 11428 176 11468 0 29 | V 184 11428 216 11468 0 30 | y 220 11420 252 11456 0 31 | J 44 11308 68 11348 0 32 | Y 76 11308 108 11348 0 33 | s 112 11308 144 11336 0 34 | D 148 11308 180 11348 0 35 | A 184 11308 216 11348 0 36 | i 224 11308 248 11348 0 37 | d 40 11188 72 11228 0 38 | l 84 11188 100 11228 0 39 | d 112 11188 144 11228 0 40 | 5 148 11188 180 11228 0 41 | 1 188 11188 212 11228 0 42 | 0 220 11188 252 11228 0 43 | y 40 11060 72 11096 0 44 | G 
76 11068 108 11108 0 45 | 6 112 11068 144 11108 0 46 | 1 152 11068 176 11108 0 47 | 5 184 11068 216 11108 0 48 | j 220 11060 248 11108 0 49 | W 40 10948 72 10988 0 50 | H 76 10948 108 10988 0 51 | x 112 10948 144 10976 0 52 | A 148 10948 180 10988 0 53 | U 184 10948 216 10988 0 54 | Z 220 10948 248 10988 0 55 | k 44 10828 72 10868 0 56 | 9 76 10828 108 10868 0 57 | I 116 10828 140 10868 0 58 | h 148 10828 180 10868 0 59 | Z 184 10828 212 10868 0 60 | u 220 10828 252 10856 0 61 | q 40 10700 72 10736 0 62 | W 76 10708 108 10748 0 63 | I 116 10708 140 10748 0 64 | P 148 10708 180 10748 0 65 | S 184 10708 216 10748 0 66 | r 220 10708 252 10736 0 67 | n 40 10588 72 10616 0 68 | D 76 10588 108 10628 0 69 | S 112 10588 144 10628 0 70 | X 148 10588 180 10628 0 71 | c 184 10588 216 10616 0 72 | 5 220 10588 252 10628 0 73 | 9 40 10468 72 10508 0 74 | i 80 10468 104 10508 0 75 | T 112 10468 144 10508 0 76 | Y 148 10468 180 10508 0 77 | e 184 10468 216 10496 0 78 | Z 220 10468 248 10508 0 79 | s 40 10348 72 10376 0 80 | 4 76 10348 108 10388 0 81 | 4 112 10348 144 10388 0 82 | i 152 10348 176 10388 0 83 | Q 184 10348 216 10388 0 84 | 9 220 10348 252 10388 0 85 | V 40 10228 72 10268 0 86 | P 76 10228 108 10268 0 87 | N 112 10228 144 10268 0 88 | X 148 10228 180 10268 0 89 | W 184 10228 216 10268 0 90 | y 220 10220 252 10256 0 91 | 8 40 10108 72 10148 0 92 | 0 76 10108 108 10148 0 93 | z 116 10108 140 10136 0 94 | x 148 10108 180 10136 0 95 | v 184 10108 216 10136 0 96 | W 220 10108 252 10148 0 97 | Q 40 9988 72 10028 0 98 | A 76 9988 108 10028 0 99 | 7 112 9988 144 10028 0 100 | I 152 9988 176 10028 0 101 | Y 184 9988 216 10028 0 102 | J 220 9980 248 10028 0 103 | D 40 9868 72 9908 0 104 | 8 76 9868 108 9908 0 105 | R 112 9868 144 9908 0 106 | o 148 9868 180 9896 0 107 | 4 184 9868 216 9908 0 108 | U 220 9868 252 9908 0 109 | O 40 9748 72 9788 0 110 | i 80 9748 104 9788 0 111 | E 112 9748 140 9788 0 112 | g 148 9740 180 9776 0 113 | 1 188 9748 212 9788 0 114 | U 220 9748 252 
9788 0 115 | p 40 9620 72 9656 0 116 | J 80 9628 104 9668 0 117 | S 112 9628 144 9668 0 118 | 7 148 9628 180 9668 0 119 | Z 184 9628 212 9668 0 120 | 8 220 9628 252 9668 0 121 | 6 40 9508 72 9548 0 122 | w 76 9508 108 9536 0 123 | 8 112 9508 144 9548 0 124 | e 148 9508 180 9536 0 125 | i 188 9508 212 9548 0 126 | k 224 9508 252 9548 0 127 | s 40 9388 72 9416 0 128 | 5 76 9388 108 9428 0 129 | i 116 9388 140 9428 0 130 | g 148 9380 180 9416 0 131 | E 184 9388 212 9428 0 132 | D 220 9388 252 9428 0 133 | 7 40 9268 72 9308 0 134 | b 76 9268 108 9308 0 135 | J 116 9268 140 9308 0 136 | e 148 9268 180 9296 0 137 | 8 184 9268 216 9308 0 138 | p 220 9260 252 9296 0 139 | V 40 9148 72 9188 0 140 | t 76 9148 108 9184 0 141 | Y 112 9148 144 9188 0 142 | d 148 9148 180 9188 0 143 | W 184 9148 216 9188 0 144 | 3 220 9148 252 9188 0 145 | j 40 9020 68 9068 0 146 | N 76 9028 108 9068 0 147 | N 112 9028 144 9068 0 148 | d 148 9028 180 9068 0 149 | c 184 9028 216 9056 0 150 | O 220 9028 252 9068 0 151 | n 40 8908 72 8936 0 152 | e 76 8908 108 8936 0 153 | L 112 8908 140 8948 0 154 | P 148 8908 180 8948 0 155 | N 184 8908 216 8948 0 156 | V 220 8908 252 8948 0 157 | K 40 8788 72 8828 0 158 | O 76 8788 108 8828 0 159 | N 112 8788 144 8828 0 160 | P 148 8788 180 8828 0 161 | n 184 8788 216 8816 0 162 | l 228 8788 244 8828 0 163 | Q 40 8668 72 8708 0 164 | 8 76 8668 108 8708 0 165 | a 112 8668 144 8696 0 166 | S 148 8668 180 8708 0 167 | X 184 8668 216 8708 0 168 | J 224 8668 248 8708 0 169 | k 44 8548 72 8588 0 170 | I 80 8548 104 8588 0 171 | w 112 8548 144 8576 0 172 | S 148 8548 180 8588 0 173 | q 184 8540 216 8576 0 174 | v 220 8548 252 8576 0 175 | 8 40 8428 72 8468 0 176 | L 76 8428 104 8468 0 177 | Q 112 8428 144 8468 0 178 | E 148 8428 176 8468 0 179 | x 184 8428 216 8456 0 180 | n 220 8428 252 8456 0 181 | R 40 8308 72 8348 0 182 | w 76 8308 108 8336 0 183 | c 112 8308 144 8336 0 184 | D 148 8308 180 8348 0 185 | U 184 8308 216 8348 0 186 | 2 220 8308 252 8348 0 187 | L 40 
8188 68 8228 0 188 | M 76 8188 108 8228 0 189 | L 112 8188 140 8228 0 190 | g 148 8180 180 8216 0 191 | 5 184 8188 216 8228 0 192 | K 220 8188 252 8228 0 193 | C 40 8068 72 8108 0 194 | 0 76 8068 108 8108 0 195 | Y 112 8068 144 8108 0 196 | m 148 8068 180 8096 0 197 | d 184 8068 216 8108 0 198 | D 220 8068 252 8108 0 199 | m 40 7948 72 7976 0 200 | q 76 7940 108 7976 0 201 | A 112 7948 144 7988 0 202 | v 148 7948 180 7976 0 203 | E 184 7948 212 7988 0 204 | S 220 7948 252 7988 0 205 | A 40 7828 72 7868 0 206 | i 80 7828 104 7868 0 207 | 0 112 7828 144 7868 0 208 | W 148 7828 180 7868 0 209 | x 184 7828 216 7856 0 210 | i 224 7828 248 7868 0 211 | b 40 7708 72 7748 0 212 | o 76 7708 108 7736 0 213 | p 112 7700 144 7736 0 214 | E 148 7708 176 7748 0 215 | T 184 7708 216 7748 0 216 | p 220 7700 252 7736 0 217 | L 40 7588 68 7628 0 218 | 3 76 7588 108 7628 0 219 | y 112 7580 144 7616 0 220 | P 148 7588 180 7628 0 221 | 5 184 7588 216 7628 0 222 | u 220 7588 252 7616 0 223 | w 40 7468 72 7496 0 224 | 4 76 7468 108 7508 0 225 | r 112 7468 144 7496 0 226 | w 148 7468 180 7496 0 227 | 3 184 7468 216 7508 0 228 | b 220 7468 252 7508 0 229 | o 40 7348 72 7376 0 230 | S 76 7348 108 7388 0 231 | E 112 7348 140 7388 0 232 | U 148 7348 180 7388 0 233 | M 184 7348 216 7388 0 234 | U 220 7348 252 7388 0 235 | b 40 7228 72 7268 0 236 | f 76 7228 108 7268 0 237 | t 112 7228 144 7264 0 238 | q 148 7220 180 7256 0 239 | D 184 7228 216 7268 0 240 | K 220 7228 252 7268 0 241 | m 40 7108 72 7136 0 242 | M 76 7108 108 7148 0 243 | 7 112 7108 144 7148 0 244 | c 148 7108 180 7136 0 245 | K 184 7108 216 7148 0 246 | E 220 7108 248 7148 0 247 | r 40 6988 72 7016 0 248 | Y 76 6988 108 7028 0 249 | l 120 6988 136 7028 0 250 | 6 148 6988 180 7028 0 251 | x 184 6988 216 7016 0 252 | 4 220 6988 252 7028 0 253 | 3 40 6868 72 6908 0 254 | h 76 6868 108 6908 0 255 | V 112 6868 144 6908 0 256 | I 152 6868 176 6908 0 257 | 8 184 6868 216 6908 0 258 | X 220 6868 252 6908 0 259 | T 40 6748 72 6788 0 260 
| m 76 6748 108 6776 0 261 | 2 112 6748 144 6788 0 262 | P 148 6748 180 6788 0 263 | P 184 6748 216 6788 0 264 | p 220 6740 252 6776 0 265 | V 40 6628 72 6668 0 266 | f 76 6628 108 6668 0 267 | m 112 6628 144 6656 0 268 | q 148 6620 180 6656 0 269 | Q 184 6628 216 6668 0 270 | 6 220 6628 252 6668 0 271 | 0 40 6508 72 6548 0 272 | E 76 6508 104 6548 0 273 | Z 112 6508 140 6548 0 274 | A 148 6508 180 6548 0 275 | g 184 6500 216 6536 0 276 | C 220 6508 252 6548 0 277 | Q 40 6388 72 6428 0 278 | W 76 6388 108 6428 0 279 | 6 112 6388 144 6428 0 280 | g 148 6380 180 6416 0 281 | B 184 6388 216 6428 0 282 | S 220 6388 252 6428 0 283 | U 40 6268 72 6308 0 284 | T 76 6268 108 6308 0 285 | S 112 6268 144 6308 0 286 | 1 152 6268 176 6308 0 287 | 3 184 6268 216 6308 0 288 | 7 220 6268 252 6308 0 289 | Y 40 6148 72 6188 0 290 | X 76 6148 108 6188 0 291 | X 112 6148 144 6188 0 292 | T 148 6148 180 6188 0 293 | q 184 6140 216 6176 0 294 | k 224 6148 252 6188 0 295 | a 40 6028 72 6056 0 296 | 6 76 6028 108 6068 0 297 | L 112 6028 140 6068 0 298 | U 148 6028 180 6068 0 299 | 3 184 6028 216 6068 0 300 | K 220 6028 252 6068 0 301 | S 40 5908 72 5948 0 302 | V 76 5908 108 5948 0 303 | z 116 5908 140 5936 0 304 | g 148 5900 180 5936 0 305 | u 184 5908 216 5936 0 306 | N 220 5908 252 5948 0 307 | l 48 5788 64 5828 0 308 | 9 76 5788 108 5828 0 309 | G 112 5788 144 5828 0 310 | 8 148 5788 180 5828 0 311 | Y 184 5788 216 5828 0 312 | 9 220 5788 252 5828 0 313 | Z 40 5668 68 5708 0 314 | P 76 5668 108 5708 0 315 | 9 112 5668 144 5708 0 316 | T 148 5668 180 5708 0 317 | D 184 5668 216 5708 0 318 | M 220 5668 252 5708 0 319 | y 40 5540 72 5576 0 320 | j 76 5540 104 5588 0 321 | 7 112 5548 144 5588 0 322 | z 152 5548 176 5576 0 323 | m 184 5548 216 5576 0 324 | S 220 5548 252 5588 0 325 | s 40 5428 72 5456 0 326 | D 76 5428 108 5468 0 327 | 0 112 5428 144 5468 0 328 | U 148 5428 180 5468 0 329 | b 184 5428 216 5468 0 330 | 9 220 5428 252 5468 0 331 | X 40 5308 72 5348 0 332 | e 76 5308 108 
5336 0 333 | W 112 5308 144 5348 0 334 | Y 148 5308 180 5348 0 335 | 3 184 5308 216 5348 0 336 | A 220 5308 252 5348 0 337 | w 40 5188 72 5216 0 338 | 8 76 5188 108 5228 0 339 | E 112 5188 140 5228 0 340 | K 148 5188 180 5228 0 341 | V 184 5188 216 5228 0 342 | l 228 5188 244 5228 0 343 | 3 40 5068 72 5108 0 344 | g 76 5060 108 5096 0 345 | 0 112 5068 144 5108 0 346 | 2 148 5068 180 5108 0 347 | 6 184 5068 216 5108 0 348 | 6 220 5068 252 5108 0 349 | y 40 4940 72 4976 0 350 | Y 76 4948 108 4988 0 351 | N 112 4948 144 4988 0 352 | 3 148 4948 180 4988 0 353 | o 184 4948 216 4976 0 354 | J 224 4948 248 4988 0 355 | N 40 4828 72 4868 0 356 | u 76 4828 108 4856 0 357 | m 112 4828 144 4856 0 358 | j 148 4820 176 4868 0 359 | L 184 4828 212 4868 0 360 | i 224 4828 248 4868 0 361 | E 40 4708 68 4748 0 362 | E 76 4708 104 4748 0 363 | z 116 4708 140 4736 0 364 | w 148 4708 180 4736 0 365 | C 184 4708 216 4748 0 366 | z 224 4708 248 4736 0 367 | 8 40 4588 72 4628 0 368 | b 76 4588 108 4628 0 369 | U 112 4588 144 4628 0 370 | S 148 4588 180 4628 0 371 | j 184 4580 212 4628 0 372 | W 220 4588 252 4628 0 373 | G 40 4468 72 4508 0 374 | A 76 4468 108 4508 0 375 | o 112 4468 144 4496 0 376 | 6 148 4468 180 4508 0 377 | a 184 4468 216 4496 0 378 | p 220 4460 252 4496 0 379 | A 40 4348 72 4388 0 380 | X 76 4348 108 4388 0 381 | c 112 4348 144 4376 0 382 | n 148 4348 180 4376 0 383 | 6 184 4348 216 4388 0 384 | K 220 4348 252 4388 0 385 | K 40 4228 72 4268 0 386 | q 76 4220 108 4256 0 387 | u 112 4228 144 4256 0 388 | W 148 4228 180 4268 0 389 | m 184 4228 216 4256 0 390 | p 220 4220 252 4256 0 391 | P 40 4108 72 4148 0 392 | H 76 4108 108 4148 0 393 | 8 112 4108 144 4148 0 394 | L 148 4108 176 4148 0 395 | Y 184 4108 216 4148 0 396 | t 220 4108 252 4144 0 397 | U 40 3988 72 4016 0 398 | G 76 3988 108 4028 0 399 | E 112 3988 144 4028 0 400 | F 148 3988 180 4028 0 401 | G 184 3988 216 4028 0 402 | D 220 3988 252 4028 0 403 | m 40 3868 72 3896 0 404 | u 76 3868 108 3896 0 405 | c 112 
3868 144 3896 0 406 | V 148 3868 180 3908 0 407 | z 188 3868 212 3896 0 408 | f 220 3868 252 3908 0 409 | U 40 3748 72 3788 0 410 | I 80 3748 104 3788 0 411 | Z 112 3748 140 3788 0 412 | 6 148 3748 180 3788 0 413 | c 184 3748 216 3776 0 414 | f 220 3748 252 3788 0 415 | m 40 3628 72 3656 0 416 | X 76 3628 108 3668 0 417 | R 112 3628 144 3668 0 418 | d 148 3628 180 3668 0 419 | G 184 3628 216 3668 0 420 | q 220 3620 252 3656 0 421 | l 48 3508 64 3548 0 422 | c 76 3508 108 3536 0 423 | n 112 3508 144 3536 0 424 | 8 148 3508 180 3548 0 425 | O 184 3508 216 3548 0 426 | P 220 3508 252 3548 0 427 | l 48 3388 64 3428 0 428 | K 76 3388 108 3428 0 429 | q 112 3380 144 3416 0 430 | k 152 3388 180 3428 0 431 | w 184 3388 216 3416 0 432 | 7 220 3388 252 3428 0 433 | C 40 3268 72 3308 0 434 | T 76 3268 108 3308 0 435 | H 112 3268 144 3308 0 436 | n 148 3268 180 3296 0 437 | R 184 3268 216 3308 0 438 | 1 224 3268 248 3308 0 439 | S 40 3148 72 3188 0 440 | h 76 3148 108 3188 0 441 | v 112 3148 144 3176 0 442 | m 148 3148 180 3176 0 443 | S 184 3148 216 3188 0 444 | D 220 3148 252 3188 0 445 | k 44 3028 72 3068 0 446 | l 84 3028 100 3068 0 447 | C 112 3028 144 3068 0 448 | o 148 3028 180 3056 0 449 | X 184 3028 216 3068 0 450 | I 224 3028 248 3068 0 451 | 8 40 2908 72 2948 0 452 | e 76 2908 108 2936 0 453 | p 112 2900 144 2936 0 454 | f 148 2908 180 2948 0 455 | t 184 2908 216 2944 0 456 | U 220 2908 252 2948 0 457 | n 40 2788 72 2816 0 458 | A 76 2788 108 2828 0 459 | 2 112 2788 144 2828 0 460 | 7 148 2788 180 2828 0 461 | 7 184 2788 216 2828 0 462 | p 220 2780 252 2816 0 463 | u 40 2668 72 2696 0 464 | X 76 2668 108 2708 0 465 | f 112 2668 144 2708 0 466 | a 148 2668 180 2696 0 467 | v 184 2668 216 2696 0 468 | Q 220 2668 252 2708 0 469 | E 40 2548 68 2588 0 470 | h 76 2548 108 2588 0 471 | I 116 2548 140 2588 0 472 | n 148 2548 180 2576 0 473 | X 184 2548 216 2588 0 474 | B 220 2548 252 2588 0 475 | K 40 2428 72 2468 0 476 | J 80 2428 104 2468 0 477 | L 112 2428 140 2468 0 478 
| U 148 2428 180 2468 0 479 | Z 184 2428 212 2468 0 480 | f 220 2428 252 2468 0 481 | n 40 2308 72 2336 0 482 | A 76 2308 108 2348 0 483 | W 112 2308 144 2348 0 484 | s 148 2308 180 2336 0 485 | w 184 2308 216 2336 0 486 | u 220 2308 252 2336 0 487 | s 40 2188 72 2216 0 488 | a 76 2188 108 2216 0 489 | v 112 2188 144 2216 0 490 | O 148 2188 180 2228 0 491 | X 184 2188 216 2228 0 492 | 1 224 2188 248 2228 0 493 | b 40 2068 72 2108 0 494 | J 80 2068 104 2108 0 495 | t 112 2068 144 2104 0 496 | o 148 2068 180 2096 0 497 | C 184 2068 216 2108 0 498 | K 220 2068 252 2108 0 499 | c 40 1948 72 1976 0 500 | T 76 1948 108 1988 0 501 | n 112 1948 144 1976 0 502 | m 148 1948 180 1976 0 503 | n 184 1948 216 1976 0 504 | F 220 1948 252 1988 0 505 | h 40 1828 72 1868 0 506 | o 76 1828 108 1856 0 507 | Y 112 1828 144 1868 0 508 | G 148 1828 180 1868 0 509 | b 184 1828 216 1868 0 510 | S 220 1828 252 1868 0 511 | d 40 1708 72 1748 0 512 | 3 76 1708 108 1748 0 513 | J 116 1708 140 1748 0 514 | 0 148 1708 180 1748 0 515 | 4 184 1708 216 1748 0 516 | A 220 1708 252 1748 0 517 | c 40 1588 72 1616 0 518 | 1 80 1588 104 1628 0 519 | 8 112 1588 144 1628 0 520 | W 148 1588 180 1628 0 521 | a 184 1588 216 1616 0 522 | N 220 1588 252 1628 0 523 | a 40 1468 72 1496 0 524 | O 76 1468 108 1508 0 525 | 5 112 1468 144 1508 0 526 | V 148 1468 180 1508 0 527 | V 184 1468 216 1508 0 528 | N 220 1468 252 1508 0 529 | D 40 1348 72 1388 0 530 | v 76 1348 108 1376 0 531 | P 112 1348 144 1388 0 532 | 3 148 1348 180 1388 0 533 | v 184 1348 216 1376 0 534 | f 220 1348 252 1388 0 535 | 7 40 1228 72 1268 0 536 | c 76 1228 108 1256 0 537 | u 112 1228 144 1256 0 538 | F 148 1228 180 1268 0 539 | O 184 1228 216 1268 0 540 | q 220 1220 252 1256 0 541 | m 40 1108 72 1136 0 542 | P 76 1108 108 1148 0 543 | a 112 1108 144 1136 0 544 | S 148 1108 180 1148 0 545 | m 184 1108 216 1136 0 546 | A 220 1108 252 1148 0 547 | k 44 988 72 1028 0 548 | 0 76 988 108 1028 0 549 | r 112 988 144 1016 0 550 | N 148 988 180 1028 0 
551 | q 184 980 216 1016 0 552 | Q 220 988 252 1028 0 553 | t 40 868 72 904 0 554 | v 76 868 108 896 0 555 | G 112 868 144 908 0 556 | 4 148 868 180 908 0 557 | 0 184 868 216 908 0 558 | k 224 868 252 908 0 559 | 1 44 748 68 788 0 560 | 5 76 748 108 788 0 561 | 4 112 748 144 788 0 562 | x 148 748 180 776 0 563 | F 184 748 216 788 0 564 | W 220 748 252 788 0 565 | x 40 628 72 656 0 566 | 9 76 628 108 668 0 567 | i 116 628 140 668 0 568 | o 148 628 180 656 0 569 | C 184 628 216 668 0 570 | I 224 628 248 668 0 571 | p 40 500 72 536 0 572 | 9 76 508 108 548 0 573 | r 112 508 144 536 0 574 | R 148 508 180 548 0 575 | 4 184 508 216 548 0 576 | J 224 508 248 548 0 577 | D 40 388 72 428 0 578 | N 76 388 108 428 0 579 | L 112 388 140 428 0 580 | P 148 388 180 428 0 581 | J 188 388 212 428 0 582 | A 220 388 252 428 0 583 | w 40 268 72 296 0 584 | G 76 268 108 308 0 585 | R 112 268 144 308 0 586 | 2 148 268 180 308 0 587 | O 184 268 216 308 0 588 | 1 224 268 248 308 0 589 | z 44 148 68 176 0 590 | V 76 148 108 188 0 591 | r 112 148 144 176 0 592 | 9 148 148 180 188 0 593 | 6 184 148 216 188 0 594 | r 220 148 252 176 0 595 | 8 40 28 72 68 0 596 | 5 76 28 108 68 0 597 | 6 112 28 144 68 0 598 | s 148 28 180 56 0 599 | H 184 28 216 68 0 600 | R 220 28 252 68 0 601 | -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/tessdata/lol.tif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/examples/nerdz-captcha-breaker/tessdata/lol.tif -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/tessdata/lol.traineddata: -------------------------------------------------------------------------------- 1 | 4 2 | linear essential -0.250000 0.750000 3 | linear non-essential 0.000000 1.000000 4 | linear essential 0.000000 1.000000 5 | linear essential 
0.000000 1.000000 6 | 7 | e 1 8 | significant elliptical 7 9 | 0.236291 0.367987 0.154189 0.157962 10 | 0.000105 0.000100 0.000100 0.000100 11 | 12 | n 1 13 | significant elliptical 13 14 | 0.266366 0.301231 0.173399 0.204163 15 | 0.000154 0.000103 0.000100 0.000100 16 | 17 | K 1 18 | significant elliptical 12 19 | 0.350812 0.391606 0.233645 0.177517 20 | 0.000100 0.000129 0.000110 0.000100 21 | 22 | z 1 23 | significant elliptical 8 24 | 0.246176 0.276081 0.191569 0.134426 25 | 0.000119 0.000100 0.000100 0.000100 26 | 27 | a 1 28 | significant elliptical 9 29 | 0.233226 0.367708 0.158488 0.167808 30 | 0.000107 0.000105 0.000100 0.000100 31 | 32 | V 1 33 | significant elliptical 13 34 | 0.402965 0.318983 0.207276 0.168549 35 | 0.000100 0.000117 0.000100 0.000100 36 | 37 | C 1 38 | significant elliptical 8 39 | 0.353335 0.365962 0.248305 0.185277 40 | 0.000122 0.000100 0.000136 0.000120 41 | 42 | Z 1 43 | significant elliptical 11 44 | 0.349099 0.361625 0.263984 0.146199 45 | 0.000121 0.000100 0.000100 0.000103 46 | 47 | U 2 48 | significant elliptical 14 49 | 0.335777 0.361192 0.233302 0.200919 50 | 0.000100 0.000100 0.000100 0.000100 51 | significant elliptical 1 52 | 0.209556 0.279760 0.158989 0.184763 53 | 0.000100 0.000100 0.000100 0.000100 54 | 55 | 6 1 56 | significant elliptical 14 57 | 0.348538 0.437296 0.216182 0.165816 58 | 0.000100 0.000100 0.000100 0.000100 59 | 60 | t 1 61 | significant elliptical 7 62 | 0.286107 0.270097 0.209714 0.142435 63 | 0.000100 0.000100 0.000100 0.000100 64 | 65 | f 1 66 | significant elliptical 9 67 | 0.410744 0.295615 0.211375 0.150377 68 | 0.000100 0.000100 0.000100 0.000141 69 | 70 | O 1 71 | significant elliptical 9 72 | 0.350995 0.401897 0.229898 0.183143 73 | 0.000128 0.000100 0.000100 0.000137 74 | 75 | 5 1 76 | significant elliptical 10 77 | 0.359195 0.433510 0.235986 0.168252 78 | 0.000143 0.000201 0.000100 0.000112 79 | 80 | m 1 81 | significant elliptical 15 82 | 0.250459 0.400497 0.172809 0.179596 83 | 0.000100 
0.000100 0.000100 0.000116 84 | 85 | G 1 86 | significant elliptical 10 87 | 0.340774 0.405120 0.243920 0.176654 88 | 0.000100 0.000100 0.000100 0.000111 89 | 90 | Y 1 91 | significant elliptical 13 92 | 0.435619 0.278236 0.221397 0.157281 93 | 0.000100 0.000100 0.000100 0.000122 94 | 95 | F 2 96 | significant elliptical 4 97 | 0.435221 0.309691 0.235570 0.164156 98 | 0.000100 0.000100 0.000100 0.000101 99 | significant elliptical 1 100 | 0.456644 0.295240 0.237164 0.177334 101 | 0.000100 0.000100 0.000100 0.000100 102 | 103 | N 1 104 | significant elliptical 14 105 | 0.352615 0.404162 0.222935 0.203070 106 | 0.000100 0.000100 0.000100 0.000100 107 | 108 | h 1 109 | significant elliptical 6 110 | 0.317966 0.336328 0.210271 0.201396 111 | 0.000128 0.000100 0.000125 0.000128 112 | 113 | v 1 114 | significant elliptical 10 115 | 0.285250 0.252656 0.151592 0.172330 116 | 0.000100 0.000100 0.000100 0.000134 117 | 118 | D 1 119 | significant elliptical 14 120 | 0.353157 0.409283 0.242507 0.193737 121 | 0.000100 0.000107 0.000107 0.000100 122 | 123 | w 1 124 | significant elliptical 11 125 | 0.243306 0.321823 0.156972 0.192868 126 | 0.000108 0.000100 0.000100 0.000105 127 | 128 | p 1 129 | significant elliptical 11 130 | 0.216565 0.373186 0.194673 0.183451 131 | 0.000128 0.000100 0.000100 0.000100 132 | 133 | 1 1 134 | significant elliptical 10 135 | 0.298121 0.243222 0.248124 0.105599 136 | 0.000100 0.000100 0.000123 0.000100 137 | 138 | y 1 139 | significant elliptical 6 140 | 0.135753 0.413064 0.202965 0.182359 141 | 0.000134 0.000106 0.000100 0.000151 142 | 143 | J 2 144 | significant elliptical 10 145 | 0.318799 0.267208 0.245902 0.129312 146 | 0.000128 0.000174 0.000100 0.000101 147 | significant elliptical 1 148 | 0.206095 0.310138 0.275134 0.164492 149 | 0.000100 0.000100 0.000100 0.000100 150 | 151 | s 2 152 | significant elliptical 6 153 | 0.242095 0.385049 0.164741 0.173632 154 | 0.000100 0.000100 0.000107 0.000100 155 | significant elliptical 1 156 | 0.261871 
0.397893 0.174152 0.156976 157 | 0.000100 0.000100 0.000100 0.000100 158 | 159 | A 1 160 | significant elliptical 14 161 | 0.333014 0.387019 0.205204 0.185735 162 | 0.000100 0.000100 0.000142 0.000100 163 | 164 | i 1 165 | significant elliptical 10 166 | 0.299607 0.250327 0.242163 0.101411 167 | 0.000100 0.000122 0.000100 0.000100 168 | 169 | d 1 170 | significant elliptical 7 171 | 0.289428 0.373991 0.214446 0.176243 172 | 0.000132 0.000100 0.000100 0.000100 173 | 174 | l 1 175 | significant elliptical 8 176 | 0.332409 0.209799 0.249071 0.075567 177 | 0.000100 0.000122 0.000123 0.000100 178 | 179 | 0 1 180 | significant elliptical 10 181 | 0.352983 0.358702 0.204173 0.178544 182 | 0.000130 0.000257 0.000201 0.000100 183 | 184 | j 1 185 | significant elliptical 5 186 | 0.204354 0.322719 0.284487 0.154231 187 | 0.000100 0.000100 0.000100 0.000100 188 | 189 | W 1 190 | significant elliptical 13 191 | 0.341548 0.412308 0.215697 0.200492 192 | 0.000114 0.000100 0.000122 0.000121 193 | 194 | H 1 195 | significant elliptical 4 196 | 0.353759 0.382993 0.221391 0.207384 197 | 0.000133 0.000644 0.000128 0.000100 198 | 199 | x 1 200 | significant elliptical 7 201 | 0.247101 0.313565 0.168487 0.183715 202 | 0.000100 0.000100 0.000100 0.000100 203 | 204 | k 1 205 | significant elliptical 8 206 | 0.297643 0.312786 0.217795 0.150275 207 | 0.000100 0.000100 0.000105 0.000100 208 | 209 | 9 1 210 | significant elliptical 10 211 | 0.359326 0.437392 0.220004 0.161633 212 | 0.000100 0.000114 0.000100 0.000100 213 | 214 | I 1 215 | significant elliptical 9 216 | 0.350314 0.279721 0.268156 0.113107 217 | 0.000100 0.000100 0.000100 0.000100 218 | 219 | u 1 220 | significant elliptical 9 221 | 0.222700 0.300615 0.172820 0.199021 222 | 0.000100 0.000100 0.000100 0.000100 223 | 224 | q 1 225 | significant elliptical 11 226 | 0.210757 0.366713 0.200097 0.177179 227 | 0.000100 0.000100 0.000100 0.000100 228 | 229 | P 1 230 | significant elliptical 13 231 | 0.437561 0.340294 0.213110 0.180816 
232 | 0.000100 0.000100 0.000130 0.000134 233 | 234 | S 1 235 | significant elliptical 16 236 | 0.361888 0.434149 0.240850 0.173467 237 | 0.000105 0.000123 0.000204 0.000104 238 | 239 | r 1 240 | significant elliptical 7 241 | 0.330586 0.238037 0.170471 0.159260 242 | 0.000106 0.000100 0.000100 0.000100 243 | 244 | X 1 245 | significant elliptical 13 246 | 0.351848 0.386853 0.234723 0.186127 247 | 0.000100 0.000100 0.000100 0.000100 248 | 249 | c 1 250 | significant elliptical 11 251 | 0.249989 0.308380 0.175803 0.165779 252 | 0.000100 0.000104 0.000100 0.000100 253 | 254 | T 1 255 | significant elliptical 8 256 | 0.452856 0.240727 0.249381 0.132352 257 | 0.000100 0.000100 0.000100 0.000104 258 | 259 | 4 1 260 | significant elliptical 9 261 | 0.351394 0.312148 0.196728 0.156359 262 | 0.000100 0.000100 0.000100 0.000100 263 | 264 | Q 1 265 | significant elliptical 8 266 | 0.327991 0.456745 0.228381 0.177377 267 | 0.000100 0.000120 0.000100 0.000135 268 | 269 | 8 1 270 | significant elliptical 16 271 | 0.357544 0.469162 0.220802 0.163384 272 | 0.000100 0.000154 0.000100 0.000100 273 | 274 | 7 1 275 | significant elliptical 10 276 | 0.432928 0.315285 0.226621 0.165014 277 | 0.000100 0.000100 0.000119 0.000115 278 | 279 | R 1 280 | significant elliptical 7 281 | 0.371887 0.420819 0.233009 0.191376 282 | 0.000128 0.000100 0.000100 0.000102 283 | 284 | o 1 285 | significant elliptical 9 286 | 0.252656 0.306399 0.154325 0.168836 287 | 0.000100 0.000100 0.000100 0.000100 288 | 289 | E 5 290 | significant elliptical 5 291 | 0.372294 0.369338 0.259346 0.162084 292 | 0.000100 0.000100 0.000150 0.000100 293 | significant elliptical 1 294 | 0.368948 0.441411 0.220024 0.176137 295 | 0.000100 0.000100 0.000100 0.000100 296 | significant elliptical 3 297 | 0.356953 0.351630 0.246174 0.146610 298 | 0.000100 0.000134 0.000100 0.000100 299 | significant elliptical 2 300 | 0.340935 0.350573 0.264206 0.141627 301 | 0.000100 0.000161 0.000100 0.000100 302 | significant elliptical 2 303 
| 0.363168 0.362313 0.261559 0.166899 304 | 0.000100 0.000100 0.000100 0.000100 305 | 306 | g 2 307 | significant elliptical 4 308 | 0.158181 0.490328 0.197205 0.159490 309 | 0.000100 0.000100 0.000100 0.000100 310 | significant elliptical 3 311 | 0.175584 0.495272 0.218554 0.164773 312 | 0.000100 0.000100 0.000100 0.000100 313 | 314 | b 1 315 | significant elliptical 8 316 | 0.295254 0.381195 0.215593 0.176828 317 | 0.000100 0.000121 0.000100 0.000113 318 | 319 | 3 1 320 | significant elliptical 11 321 | 0.359293 0.401743 0.242537 0.161788 322 | 0.000100 0.000100 0.000100 0.000100 323 | 324 | L 1 325 | significant elliptical 10 326 | 0.263198 0.237610 0.246968 0.148439 327 | 0.000100 0.000100 0.000123 0.000100 328 | 329 | 2 1 330 | significant elliptical 5 331 | 0.342181 0.409027 0.249696 0.158152 332 | 0.000100 0.000100 0.000121 0.000100 333 | 334 | M 2 335 | significant elliptical 3 336 | 0.371564 0.406706 0.212450 0.190616 337 | 0.000100 0.000145 0.000100 0.000100 338 | significant elliptical 1 339 | 0.351816 0.412768 0.203631 0.210004 340 | 0.000100 0.000100 0.000100 0.000100 341 | 342 | B 2 343 | significant elliptical 1 344 | 0.360845 0.469181 0.238518 0.172173 345 | 0.000100 0.000100 0.000100 0.000100 346 | significant elliptical 1 347 | 0.344476 0.467981 0.235959 0.187345 348 | 0.000100 0.000100 0.000100 0.000100 349 | -------------------------------------------------------------------------------- /examples/nerdz-captcha-breaker/test.rb: -------------------------------------------------------------------------------- 1 | #! 
/usr/bin/env ruby 2 | 3 | expected = File.read('captchas/captchas.txt').lines.to_a 4 | 5 | `./break.rb captchas/*.png`.lines.each_with_index {|line, index| 6 | whole, path, output = line.match(/^(.*?): (.*?)$/).to_a 7 | 8 | unless output == expected[index].chomp 9 | puts "#{path}: expected #{expected[index].chomp} but got #{output}" 10 | end 11 | } 12 | -------------------------------------------------------------------------------- /lib/tesseract-ocr.rb: -------------------------------------------------------------------------------- 1 | #-- 2 | # Copyright 2011 meh. All rights reserved. 3 | # 4 | # Redistribution and use in source and binary forms, with or without modification, are 5 | # permitted provided that the following conditions are met: 6 | # 7 | # 1. Redistributions of source code must retain the above copyright notice, this list of 8 | # conditions and the following disclaimer. 9 | # 10 | # THIS SOFTWARE IS PROVIDED BY meh ''AS IS'' AND ANY EXPRESS OR IMPLIED 11 | # WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND 12 | # FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL meh OR 13 | # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 14 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 15 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 16 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 17 | # NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF 18 | # ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 19 | # 20 | # The views and conclusions contained in the software and documentation are those of the 21 | # authors and should not be interpreted as representing official policies, either expressed 22 | # or implied, of meh. 
23 | #++ 24 | 25 | module Tesseract 26 | def self.prefix 27 | ENV['TESSDATA_PREFIX'] 28 | end 29 | 30 | def self.prefix=(path) 31 | ENV['TESSDATA_PREFIX'] = path 32 | end 33 | end 34 | 35 | require 'tesseract/api' 36 | require 'tesseract/engine' 37 | -------------------------------------------------------------------------------- /lib/tesseract.rb: -------------------------------------------------------------------------------- 1 | #-- 2 | # Copyright 2011 meh. All rights reserved. 3 | # 4 | # Redistribution and use in source and binary forms, with or without modification, are 5 | # permitted provided that the following conditions are met: 6 | # 7 | # 1. Redistributions of source code must retain the above copyright notice, this list of 8 | # conditions and the following disclaimer. 9 | # 10 | # THIS SOFTWARE IS PROVIDED BY meh ''AS IS'' AND ANY EXPRESS OR IMPLIED 11 | # WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND 12 | # FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL meh OR 13 | # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 14 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 15 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 16 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 17 | # NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF 18 | # ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 19 | # 20 | # The views and conclusions contained in the software and documentation are those of the 21 | # authors and should not be interpreted as representing official policies, either expressed 22 | # or implied, of meh. 
23 | #++ 24 | 25 | require 'tesseract-ocr' 26 | -------------------------------------------------------------------------------- /lib/tesseract/api.rb: -------------------------------------------------------------------------------- 1 | #-- 2 | # Copyright 2011 meh. All rights reserved. 3 | # 4 | # Redistribution and use in source and binary forms, with or without modification, are 5 | # permitted provided that the following conditions are met: 6 | # 7 | # 1. Redistributions of source code must retain the above copyright notice, this list of 8 | # conditions and the following disclaimer. 9 | # 10 | # THIS SOFTWARE IS PROVIDED BY meh ''AS IS'' AND ANY EXPRESS OR IMPLIED 11 | # WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND 12 | # FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL meh OR 13 | # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 14 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 15 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 16 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 17 | # NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF 18 | # ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 19 | # 20 | # The views and conclusions contained in the software and documentation are those of the 21 | # authors and should not be interpreted as representing official policies, either expressed 22 | # or implied, of meh. 23 | #++ 24 | 25 | require 'tesseract/extensions' 26 | require 'tesseract/c' 27 | 28 | require 'tesseract/api/image' 29 | require 'tesseract/api/iterator' 30 | 31 | module Tesseract 32 | 33 | class API 34 | ## 35 | # Get a pointer to a tesseract-ocr usable image from a path, a string 36 | # with the data or an IO stream. 
37 | def self.image_for (image) 38 | Image.new(image) 39 | end 40 | 41 | ## 42 | # Transform a language code to tesseract-ocr usable codes 43 | def self.to_language_code (code) 44 | code = code.to_s.downcase 45 | codes = ISO_639.find(code) 46 | 47 | return code unless codes 48 | 49 | term = codes.alpha3_terminologic 50 | bibl = codes.alpha3 51 | 52 | term.empty? ? bibl : term 53 | end 54 | 55 | Types = { 56 | int: [:integer], 57 | bool: [:boolean], 58 | double: [:float], 59 | string: [:str] 60 | } 61 | 62 | def initialize 63 | @internal = FFI::AutoPointer.new(C::BaseAPI.create, self.class.method(:finalize)) 64 | end 65 | 66 | def self.finalize (pointer) # :nodoc: 67 | C::BaseAPI.destroy(pointer) 68 | end 69 | 70 | def version 71 | C::BaseAPI.version(to_ffi) 72 | end 73 | 74 | def set_input_name (name) 75 | C::BaseAPI.set_input_name(to_ffi, name) 76 | end 77 | 78 | def set_output_name (name) 79 | C::BaseAPI.set_output_name(to_ffi, name) 80 | end 81 | 82 | def set_variable (name, value) 83 | C::BaseAPI.set_variable(to_ffi, name, value) 84 | end 85 | 86 | def get_variable (name, type = nil) 87 | if type.nil? 88 | type = Types.keys.find { |candidate| C::BaseAPI.__send__ "has_#{candidate}_variable", to_ffi, name } 89 | 90 | if type 91 | C::BaseAPI.__send__ "get_#{type}_variable", to_ffi, name 92 | end 93 | else 94 | unless Types.has_key?(type) 95 | canonical, _ = Types.find { |_, aliases| aliases.member?(type) } # resolve the alias without overwriting `name` 96 | 97 | raise ArgumentError, "unknown type #{type}" unless canonical 98 | 99 | type = canonical 100 | end 101 | 102 | if C::BaseAPI.__send__ "has_#{type}_variable", to_ffi, name 103 | C::BaseAPI.__send__ "get_#{type}_variable", to_ffi, name 104 | end 105 | end 106 | end 107 | 108 | def init (datapath = nil, language = 'eng', mode = :DEFAULT) 109 | unless C::BaseAPI.init(to_ffi, datapath || Tesseract.prefix || '/usr/share', language.to_s, mode).zero? 
110 | raise 'the API did not Init correctly' 111 | end 112 | end 113 | 114 | def read_config_file (path) 115 | C::BaseAPI.read_config_file(to_ffi, path) 116 | end 117 | 118 | def get_page_seg_mode 119 | C::BaseAPI.get_page_seg_mode(to_ffi) 120 | end 121 | 122 | def set_page_seg_mode (value) 123 | C::BaseAPI.set_page_seg_mode(to_ffi, value) 124 | end 125 | 126 | def set_image (pix) 127 | C::BaseAPI.set_image(to_ffi, pix.is_a?(Image) ? pix.to_ffi : pix) 128 | end 129 | 130 | def set_rectangle (left, top, width, height) 131 | C::BaseAPI.set_rectangle(to_ffi, left, top, width, height) 132 | end 133 | 134 | def process_pages (name) 135 | result = C.create_string 136 | 137 | unless C::BaseAPI.process_pages(to_ffi, name, result) 138 | raise 'process_pages failed' 139 | end 140 | 141 | C.string_content(result).read_string(C.string_length(result)) 142 | ensure 143 | C.destroy_string(result) 144 | end 145 | 146 | def process_page (pix, page = 0, name = "") 147 | result = C.create_string 148 | 149 | unless C::BaseAPI.process_page(to_ffi, pix.is_a?(Image) ? pix.to_ffi : pix, page, name, result) 150 | raise 'process_page failed' 151 | end 152 | 153 | C.string_content(result).read_string(C.string_length(result)) 154 | ensure 155 | C.destroy_string(result) 156 | end 157 | 158 | def get_iterator 159 | Iterator.new(C::BaseAPI.get_iterator(to_ffi)) 160 | end 161 | 162 | def get_text 163 | pointer = C::BaseAPI.get_utf8_text(to_ffi) 164 | 165 | return if pointer.null? 166 | 167 | result = pointer.read_string 168 | result.force_encoding 'UTF-8' 169 | 170 | result 171 | ensure 172 | C.free_array_of_char(pointer) unless pointer.null? 173 | end 174 | 175 | def get_hocr(page = 0) 176 | pointer = C::BaseAPI.get_hocr_text(to_ffi, page) 177 | 178 | return if pointer.null? 179 | 180 | result = pointer.read_string 181 | result.force_encoding 'UTF-8' 182 | 183 | result 184 | ensure 185 | C.free_array_of_char(pointer) unless pointer.null? 
186 | end 187 | 188 | def get_box (page = 0) 189 | pointer = C::BaseAPI.get_box_text(to_ffi, page) 190 | result = pointer.read_string 191 | result.force_encoding 'UTF-8' 192 | 193 | result 194 | ensure 195 | C.free_array_of_char(pointer) 196 | end 197 | 198 | def get_unlv 199 | pointer = C::BaseAPI.get_unlv_text(to_ffi) 200 | result = pointer.read_string 201 | result.force_encoding 'ISO8859-1' 202 | 203 | result 204 | ensure 205 | C.free_array_of_char(pointer) 206 | end 207 | 208 | def mean_text_confidence 209 | C::BaseAPI.mean_text_conf(to_ffi) 210 | end 211 | 212 | def all_word_confidences 213 | C::BaseAPI.all_word_confidences(to_ffi) 214 | end 215 | 216 | def clear 217 | C::BaseAPI.clear(to_ffi) 218 | end 219 | 220 | def end 221 | C::BaseAPI.end(to_ffi) 222 | end 223 | 224 | def to_ffi 225 | @internal 226 | end 227 | end 228 | 229 | end 230 | -------------------------------------------------------------------------------- /lib/tesseract/api/image.rb: -------------------------------------------------------------------------------- 1 | #-- 2 | # Copyright 2011 meh. All rights reserved. 3 | # 4 | # Redistribution and use in source and binary forms, with or without modification, are 5 | # permitted provided that the following conditions are met: 6 | # 7 | # 1. Redistributions of source code must retain the above copyright notice, this list of 8 | # conditions and the following disclaimer. 9 | # 10 | # THIS SOFTWARE IS PROVIDED BY meh ''AS IS'' AND ANY EXPRESS OR IMPLIED 11 | # WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND 12 | # FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL meh OR 13 | # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 14 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 15 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 16 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 17 | # NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF 18 | # ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 19 | # 20 | # The views and conclusions contained in the software and documentation are those of the 21 | # authors and should not be interpreted as representing official policies, either expressed 22 | # or implied, of meh. 23 | #++ 24 | 25 | module Tesseract; class API 26 | 27 | class Image 28 | def self.new (image, x = 0, y = 0) 29 | image = if image.is_a?(String) && (File.exist?(File.expand_path(image)) rescue nil) 30 | C::Leptonica.pix_read(File.expand_path(image)) 31 | elsif image.is_a?(String) 32 | C::Leptonica.pix_read_mem(image, image.bytesize) 33 | elsif image.is_a?(IO) 34 | C::Leptonica.pix_read_stream(image.to_i) 35 | elsif image.respond_to? :to_blob 36 | image = image.to_blob 37 | 38 | C::Leptonica.pix_read_mem(image, image.bytesize) 39 | else 40 | image 41 | end 42 | 43 | raise ArgumentError, 'invalid image' if image.nil? || image.null? 
44 | 45 | super(image, x, y) 46 | end 47 | 48 | attr_accessor :x, :y 49 | 50 | def initialize (pointer, x = 0, y = 0) 51 | @internal = FFI::AutoPointer.new(pointer, self.class.method(:finalize)) 52 | @x = x 53 | @y = y 54 | end 55 | 56 | def self.finalize (pointer) 57 | C::Leptonica.pix_destroy(pointer) 58 | end 59 | 60 | def width 61 | C::Leptonica.pix_get_width(to_ffi) 62 | end 63 | 64 | def height 65 | C::Leptonica.pix_get_height(to_ffi) 66 | end 67 | 68 | def to_blob (format = :default) 69 | data = FFI::MemoryPointer.new(:pointer) 70 | size = FFI::MemoryPointer.new(:size_t) 71 | 72 | C::Leptonica.pix_write_mem(to_ffi, data, size, C.for_enum(format)) 73 | result = data.typecast(:pointer).read_string(size.typecast(:size_t)) 74 | C.free(data.typecast(:pointer)) 75 | 76 | result 77 | end 78 | 79 | def to_ffi 80 | @internal 81 | end 82 | end 83 | 84 | end; end 85 | -------------------------------------------------------------------------------- /lib/tesseract/api/iterator.rb: -------------------------------------------------------------------------------- 1 | #-- 2 | # Copyright 2011 meh. All rights reserved. 3 | # 4 | # Redistribution and use in source and binary forms, with or without modification, are 5 | # permitted provided that the following conditions are met: 6 | # 7 | # 1. Redistributions of source code must retain the above copyright notice, this list of 8 | # conditions and the following disclaimer. 9 | # 10 | # THIS SOFTWARE IS PROVIDED BY meh ''AS IS'' AND ANY EXPRESS OR IMPLIED 11 | # WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND 12 | # FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL meh OR 13 | # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 14 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 15 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 16 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 17 | # NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF 18 | # ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 19 | # 20 | # The views and conclusions contained in the software and documentation are those of the 21 | # authors and should not be interpreted as representing official policies, either expressed 22 | # or implied, of meh. 23 | #++ 24 | 25 | module Tesseract; class API 26 | 27 | class Iterator 28 | def initialize (pointer) 29 | raise ArgumentError, 'the pointer is null' if pointer.nil? || pointer.null? 30 | 31 | @internal = FFI::AutoPointer.new(pointer, self.class.method(:finalize)) 32 | end 33 | 34 | def self.finalize (pointer) # :nodoc: 35 | C::Iterator.destroy(pointer) 36 | end 37 | 38 | def begin 39 | C::Iterator.begin(to_ffi) 40 | end 41 | 42 | def beginning? (level = :word) 43 | C::Iterator.is_at_beginning_of(to_ffi, C.for_enum(level)) 44 | end 45 | 46 | def end? 
(level, element) 47 | C::Iterator.is_at_final_element(to_ffi, C.for_enum(level), C.for_enum(element)) 48 | end 49 | 50 | def next (level = :word) 51 | C::Iterator.next(to_ffi, C.for_enum(level)) 52 | end 53 | 54 | def bounding_box (level = :word) 55 | C::Iterator.bounding_box(to_ffi, C.for_enum(level)) 56 | end 57 | 58 | def get_binary_image (level = :word) 59 | Image.new(C::Iterator.get_binary_image(to_ffi, C.for_enum(level))) 60 | end 61 | 62 | def get_image (level = :word, padding = 0) 63 | image = C::Iterator.get_image(to_ffi, C.for_enum(level), padding) 64 | 65 | Image.new(image[:pix], image[:x], image[:y]) 66 | end 67 | 68 | def baseline (level = :word) 69 | C::Iterator.baseline(to_ffi, C.for_enum(level)) 70 | end 71 | 72 | def orientation 73 | C::Iterator.orientation(to_ffi) 74 | end 75 | 76 | def get_text (level = :word) 77 | pointer = C::Iterator.get_utf8_text(to_ffi, C.for_enum(level)) 78 | 79 | return if pointer.null? 80 | 81 | result = pointer.read_string 82 | result.force_encoding 'UTF-8' 83 | 84 | result 85 | ensure 86 | C.free_array_of_char(pointer) unless pointer.null? 87 | end 88 | 89 | def confidence (level = :word) 90 | C::Iterator.confidence(to_ffi, C.for_enum(level)) 91 | end 92 | 93 | def block_type 94 | C::Iterator.block_type(to_ffi) 95 | end 96 | 97 | def word_font_attributes 98 | C::Iterator.word_font_attributes(to_ffi) 99 | end 100 | 101 | def word_is_from_dictionary? 102 | C::Iterator.word_is_from_dictionary(to_ffi) 103 | end 104 | 105 | def word_is_numeric? 106 | C::Iterator.word_is_numeric(to_ffi) 107 | end 108 | 109 | def symbol_is_superscript? 110 | C::Iterator.symbol_is_superscript(to_ffi) 111 | end 112 | 113 | def symbol_is_subscript? 114 | C::Iterator.symbol_is_subscript(to_ffi) 115 | end 116 | 117 | def symbol_is_dropcap? 
118 | C::Iterator.symbol_is_dropcap(to_ffi) 119 | end 120 | 121 | def to_ffi 122 | @internal 123 | end 124 | end 125 | 126 | end; end 127 | -------------------------------------------------------------------------------- /lib/tesseract/c.rb: -------------------------------------------------------------------------------- 1 | #-- 2 | # Copyright 2011 meh. All rights reserved. 3 | # 4 | # Redistribution and use in source and binary forms, with or without modification, are 5 | # permitted provided that the following conditions are met: 6 | # 7 | # 1. Redistributions of source code must retain the above copyright notice, this list of 8 | # conditions and the following disclaimer. 9 | # 10 | # THIS SOFTWARE IS PROVIDED BY meh ''AS IS'' AND ANY EXPRESS OR IMPLIED 11 | # WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND 12 | # FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL meh OR 13 | # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 14 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 15 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 16 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 17 | # NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF 18 | # ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 19 | # 20 | # The views and conclusions contained in the software and documentation are those of the 21 | # authors and should not be interpreted as representing official policies, either expressed 22 | # or implied, of meh. 
23 | #++ 24 | 25 | require 'ffi' 26 | require 'ffi/extra' 27 | require 'ffi/inline' 28 | 29 | module Tesseract 30 | 31 | module C 32 | extend FFI::Inline 33 | 34 | inline 'C++' do |cpp| 35 | cpp.include 'tesseract/strngs.h' 36 | cpp.libraries 'tesseract' 37 | 38 | cpp.function %{ 39 | void free (void* pointer) { 40 | free(pointer); 41 | } 42 | } 43 | 44 | cpp.function %{ 45 | void free_array_of_char (char* pointer) { 46 | delete [] pointer; 47 | } 48 | } 49 | 50 | cpp.function %{ 51 | void free_array_of_int (int* pointer) { 52 | delete [] pointer; 53 | } 54 | } 55 | 56 | cpp.function %{ 57 | STRING* create_string (void) { 58 | return new STRING(); 59 | } 60 | } 61 | 62 | cpp.function %{ 63 | void destroy_string (STRING* value) { 64 | delete value; 65 | } 66 | } 67 | 68 | cpp.function %{ 69 | int string_length (STRING* value) { 70 | return value->length(); 71 | } 72 | } 73 | 74 | cpp.function %{ 75 | const char* string_content (STRING* value) { 76 | return value->string(); 77 | } 78 | } 79 | end 80 | 81 | def self.for_enum (what) 82 | what.is_a?(Integer) ? what : what.to_s.upcase.to_sym 83 | end 84 | end 85 | 86 | end 87 | 88 | require 'tesseract/c/leptonica' 89 | require 'tesseract/c/baseapi' 90 | require 'tesseract/c/iterator' 91 | -------------------------------------------------------------------------------- /lib/tesseract/c/baseapi.rb: -------------------------------------------------------------------------------- 1 | #-- 2 | # Copyright 2011 meh. All rights reserved. 3 | # 4 | # Redistribution and use in source and binary forms, with or without modification, are 5 | # permitted provided that the following conditions are met: 6 | # 7 | # 1. Redistributions of source code must retain the above copyright notice, this list of 8 | # conditions and the following disclaimer. 
9 | # 10 | # THIS SOFTWARE IS PROVIDED BY meh ''AS IS'' AND ANY EXPRESS OR IMPLIED 11 | # WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND 12 | # FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL meh OR 13 | # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 14 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 15 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 16 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 17 | # NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF 18 | # ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 19 | # 20 | # The views and conclusions contained in the software and documentation are those of the 21 | # authors and should not be interpreted as representing official policies, either expressed 22 | # or implied, of meh. 23 | #++ 24 | 25 | module Tesseract; module C 26 | 27 | module BaseAPI 28 | extend FFI::Inline 29 | 30 | inline 'C++' do |cpp| 31 | cpp.include 'tesseract/baseapi.h' 32 | cpp.libraries 'tesseract' 33 | 34 | cpp.raw 'using namespace tesseract;' 35 | 36 | cpp.eval { 37 | enum :OcrEngineMode, [ 38 | :TESSERACT_ONLY, :CUBE_ONLY, :TESSERACT_CUBE_COMBINED, :DEFAULT 39 | ] 40 | 41 | enum :PageSegMode, [ 42 | :OSD_ONLY, :AUTO_OSD, 43 | :AUTO_ONLY, :AUTO, :SINGLE_COLUMN, :SINGLE_BLOCK_VERT_TEXT, 44 | :SINGLE_BLOCK, :SINGLE_LINE, :SINGLE_WORD, :CIRCLE_WORD, :SINGLE_CHAR, 45 | :COUNT 46 | ] 47 | } 48 | 49 | cpp.function %{ 50 | TessBaseAPI* create (void) { 51 | return new TessBaseAPI(); 52 | } 53 | } 54 | 55 | cpp.function %{ 56 | void destroy (TessBaseAPI* api) { 57 | delete api; 58 | } 59 | } 60 | 61 | cpp.function %{ 62 | const char* version (TessBaseAPI* api) { 63 | return api->Version(); 64 | } 65 | }, return: :string 66 | 67 | cpp.function %{ 68 | void set_input_name (TessBaseAPI* api, 
const char* name) { 69 | api->SetInputName(name); 70 | } 71 | } 72 | 73 | cpp.function %{ 74 | void set_output_name (TessBaseAPI* api, const char* name) { 75 | api->SetOutputName(name); 76 | } 77 | } 78 | 79 | cpp.function %{ 80 | bool set_variable (TessBaseAPI* api, const char* name, const char* value) { 81 | return api->SetVariable(name, value); 82 | } 83 | } 84 | 85 | cpp.function %{ 86 | bool has_int_variable (TessBaseAPI* api, const char* name) { 87 | int tmp; 88 | 89 | return api->GetIntVariable(name, &tmp); 90 | } 91 | } 92 | 93 | cpp.function %{ 94 | bool has_bool_variable (TessBaseAPI* api, const char* name) { 95 | bool tmp; 96 | 97 | return api->GetBoolVariable(name, &tmp); 98 | } 99 | } 100 | 101 | cpp.function %{ 102 | bool has_double_variable (TessBaseAPI* api, const char* name) { 103 | double tmp; 104 | 105 | return api->GetDoubleVariable(name, &tmp); 106 | } 107 | } 108 | 109 | cpp.function %{ 110 | bool has_string_variable (TessBaseAPI* api, const char* name) { 111 | return api->GetStringVariable(name) != NULL; 112 | } 113 | } 114 | 115 | cpp.function %{ 116 | int get_int_variable (TessBaseAPI* api, const char* name) { 117 | int result = 0; 118 | 119 | api->GetIntVariable(name, &result); 120 | 121 | return result; 122 | } 123 | } 124 | 125 | cpp.function %{ 126 | bool get_bool_variable (TessBaseAPI* api, const char* name) { 127 | bool result = false; 128 | 129 | api->GetBoolVariable(name, &result); 130 | 131 | return result; 132 | } 133 | } 134 | 135 | cpp.function %{ 136 | double get_double_variable (TessBaseAPI* api, const char* name) { 137 | double result = 0; 138 | 139 | api->GetDoubleVariable(name, &result); 140 | 141 | return result; 142 | } 143 | } 144 | 145 | cpp.function %{ 146 | const char* get_string_variable (TessBaseAPI* api, const char* name) { 147 | return api->GetStringVariable(name); 148 | } 149 | }, return: :string 150 | 151 | cpp.function %{ 152 | int init (TessBaseAPI* api, const char* datapath, const char* language, 
OcrEngineMode oem) { 153 | return api->Init(datapath, language, oem); 154 | } 155 | } 156 | 157 | cpp.function %{ 158 | void set_page_seg_mode (TessBaseAPI* api, PageSegMode mode) { 159 | api->SetPageSegMode(mode); 160 | } 161 | } 162 | 163 | cpp.function %{ 164 | PageSegMode get_page_seg_mode (TessBaseAPI* api) { 165 | return api->GetPageSegMode(); 166 | } 167 | } 168 | 169 | cpp.function %{ 170 | void set_image (TessBaseAPI* api, const Pix* pix) { 171 | api->SetImage(pix); 172 | } 173 | } 174 | 175 | cpp.function %{ 176 | void set_rectangle (TessBaseAPI* api, int left, int top, int width, int height) { 177 | api->SetRectangle(left, top, width, height); 178 | } 179 | } 180 | 181 | cpp.function %{ 182 | bool process_pages (TessBaseAPI* api, const char* filename, STRING* output) { 183 | return api->ProcessPages(filename, NULL, 0, output); 184 | } 185 | }, blocking: true 186 | 187 | cpp.function %{ 188 | bool process_page (TessBaseAPI* api, Pix* pix, int page_index, const char* filename, STRING* output) { 189 | return api->ProcessPage(pix, page_index, filename, NULL, 0, output); 190 | } 191 | }, blocking: true 192 | 193 | cpp.function %{ 194 | ResultIterator* get_iterator (TessBaseAPI* api) { 195 | return api->GetIterator(); 196 | } 197 | } 198 | 199 | cpp.function %{ 200 | char* get_utf8_text (TessBaseAPI* api) { 201 | return api->GetUTF8Text(); 202 | } 203 | }, blocking: true 204 | 205 | cpp.function %{ 206 | char* get_hocr_text (TessBaseAPI* api, int page_number) { 207 | return api->GetHOCRText(page_number); 208 | } 209 | }, blocking: true 210 | 211 | cpp.function %{ 212 | char* get_box_text (TessBaseAPI* api, int page_number) { 213 | return api->GetBoxText(page_number); 214 | } 215 | }, blocking: true 216 | 217 | cpp.function %{ 218 | char* get_unlv_text (TessBaseAPI* api) { 219 | return api->GetUNLVText(); 220 | } 221 | }, blocking: true 222 | 223 | cpp.function %{ 224 | int mean_text_conf (TessBaseAPI* api) { 225 | return api->MeanTextConf(); 226 | } 227 | }, 
blocking: true 228 | 229 | cpp.function %{ 230 | int* all_word_confidences (TessBaseAPI* api) { 231 | return api->AllWordConfidences(); 232 | } 233 | }, blocking: true 234 | 235 | cpp.function %{ 236 | void clear (TessBaseAPI* api) { 237 | api->Clear(); 238 | } 239 | } 240 | 241 | cpp.function %{ 242 | void end (TessBaseAPI* api) { 243 | api->End(); 244 | } 245 | } 246 | end 247 | 248 | begin 249 | inline 'C++' do |cpp| 250 | cpp.include 'tesseract/baseapi.h' 251 | cpp.libraries 'tesseract' 252 | 253 | cpp.raw 'using namespace tesseract;' 254 | 255 | cpp.function %{ 256 | void read_config_file (TessBaseAPI* api, const char* filename) { 257 | api->ReadConfigFile(filename, false); 258 | } 259 | } 260 | end 261 | rescue CompilationError 262 | inline 'C++' do |cpp| 263 | cpp.include 'tesseract/baseapi.h' 264 | cpp.libraries 'tesseract' 265 | 266 | cpp.raw 'using namespace tesseract;' 267 | 268 | cpp.function %{ 269 | void read_config_file (TessBaseAPI* api, const char* filename) { 270 | api->ReadConfigFile(filename); 271 | } 272 | } 273 | end 274 | end 275 | end 276 | 277 | end; end 278 | -------------------------------------------------------------------------------- /lib/tesseract/c/iterator.rb: -------------------------------------------------------------------------------- 1 | #-- 2 | # Copyright 2011 meh. All rights reserved. 3 | # 4 | # Redistribution and use in source and binary forms, with or without modification, are 5 | # permitted provided that the following conditions are met: 6 | # 7 | # 1. Redistributions of source code must retain the above copyright notice, this list of 8 | # conditions and the following disclaimer. 9 | # 10 | # THIS SOFTWARE IS PROVIDED BY meh ''AS IS'' AND ANY EXPRESS OR IMPLIED 11 | # WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND 12 | # FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL meh OR 13 | # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 14 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 15 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 16 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 17 | # NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF 18 | # ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 19 | # 20 | # The views and conclusions contained in the software and documentation are those of the 21 | # authors and should not be interpreted as representing official policies, either expressed 22 | # or implied, of meh. 23 | #++ 24 | 25 | module Tesseract; module C 26 | 27 | module Iterator 28 | extend FFI::Inline 29 | 30 | inline 'C++' do |cpp| 31 | cpp.include 'tesseract/resultiterator.h' 32 | cpp.libraries 'tesseract' 33 | 34 | cpp.raw 'using namespace tesseract;' 35 | 36 | cpp.eval { 37 | enum :PolyBlockType, [ 38 | :UNKNOWN, 39 | :FLOWING_TEXT, :HEADING_TEXT, :PULLOUT_TEXT, :TABLE, :VERTICAL_TEXT, :CAPTION_TEXT, 40 | :FLOWING_IMAGE, :HEADING_IMAGE, :PULLOUT_IMAGE, 41 | :HORZ_LINE, :VERT_LINE, :NOISE, :COUNT 42 | ] 43 | 44 | enum :PageIteratorLevel, [ 45 | :BLOCK, :PARAGRAPH, :LINE, :WORD, :SYMBOL 46 | ] 47 | 48 | orientation = enum :UP, :RIGHT, :DOWN, :LEFT 49 | direction = enum :LEFT_TO_RIGHT, :RIGHT_TO_LEFT, :TOP_TO_BOTTOM 50 | 51 | BoundingBox = Class.new(FFI::Struct) { 52 | layout \ 53 | :left, :int, 54 | :top, :int, 55 | :right, :int, 56 | :bottom, :int 57 | } 58 | 59 | Image = Class.new(FFI::Struct) { 60 | layout \ 61 | :pix, :pointer, 62 | :x, :int, 63 | :y, :int 64 | } 65 | 66 | Baseline = Class.new(FFI::Struct) { 67 | layout \ 68 | :x1, :int, 69 | :y1, :int, 70 | :x2, :int, 71 | :y2, :int 72 | } 73 | 74 | Orientation = Class.new(FFI::Struct) { 75 | layout \ 76 | :orientation, orientation, 77 | :writing_direction, 
direction, 78 | :textline_order, direction, 79 | :deskew_angle, :float 80 | } 81 | 82 | FontAttributes = Class.new(FFI::Struct) { 83 | layout \ 84 | :id, :int, 85 | :name, :string, 86 | :pointsize, :int, 87 | 88 | :is_bold, :bool, 89 | :is_italic, :bool, 90 | :is_underlined, :bool, 91 | :is_monospace, :bool, 92 | :is_serif, :bool, 93 | :is_smallcaps, :bool 94 | } 95 | 96 | typedef BoundingBox.by_value, :BoundingBox 97 | typedef Image.by_value, :Image 98 | typedef Baseline.by_value, :Baseline 99 | typedef Orientation.by_value, :OrientationResult 100 | typedef FontAttributes.by_value, :FontAttributes 101 | } 102 | 103 | cpp.raw %{ 104 | typedef struct BoundingBox { 105 | int left; 106 | int top; 107 | int right; 108 | int bottom; 109 | } BoundingBox; 110 | 111 | typedef struct Image { 112 | Pix* pix; 113 | int x; 114 | int y; 115 | } Image; 116 | 117 | typedef struct Baseline { 118 | int x1; 119 | int y1; 120 | int x2; 121 | int y2; 122 | } Baseline; 123 | 124 | typedef struct OrientationResult { 125 | Orientation orientation; 126 | WritingDirection writing_direction; 127 | TextlineOrder textline_order; 128 | float deskew_angle; 129 | } OrientationResult; 130 | 131 | typedef struct FontAttributes { 132 | int id; 133 | const char* name; 134 | int pointsize; 135 | 136 | bool is_bold; 137 | bool is_italic; 138 | bool is_underlined; 139 | bool is_monospace; 140 | bool is_serif; 141 | bool is_smallcaps; 142 | } FontAttributes; 143 | } 144 | 145 | cpp.function %{ 146 | void destroy (PageIterator* it) { 147 | delete it; 148 | } 149 | } 150 | 151 | cpp.function %{ 152 | void begin (PageIterator* it) { 153 | it->Begin(); 154 | } 155 | } 156 | 157 | cpp.function %{ 158 | bool next (PageIterator* it, PageIteratorLevel level) { 159 | return it->Next(level); 160 | } 161 | } 162 | 163 | cpp.function %{ 164 | bool is_at_beginning_of (PageIterator* it, PageIteratorLevel level) { 165 | return it->IsAtBeginningOf(level); 166 | } 167 | } 168 | 169 | cpp.function %{ 170 | bool 
is_at_final_element (PageIterator* it, PageIteratorLevel level, PageIteratorLevel element) { 171 | return it->IsAtFinalElement(level, element); 172 | } 173 | } 174 | 175 | cpp.function %{ 176 | BoundingBox bounding_box (PageIterator* it, PageIteratorLevel level) { 177 | BoundingBox result; 178 | 179 | it->BoundingBox(level, &result.left, &result.top, &result.right, &result.bottom); 180 | 181 | return result; 182 | } 183 | } 184 | 185 | cpp.function %{ 186 | PolyBlockType block_type (PageIterator* it) { 187 | return it->BlockType(); 188 | } 189 | } 190 | 191 | cpp.function %{ 192 | Pix* get_binary_image (PageIterator* it, PageIteratorLevel level) { 193 | return it->GetBinaryImage(level); 194 | } 195 | } 196 | 197 | cpp.function %{ 198 | Image get_image (PageIterator* it, PageIteratorLevel level, int padding) { 199 | Image result; 200 | 201 | result.pix = it->GetImage(level, padding, &result.x, &result.y); 202 | 203 | return result; 204 | } 205 | } 206 | 207 | cpp.function %{ 208 | Baseline baseline (PageIterator* it, PageIteratorLevel level) { 209 | Baseline result; 210 | 211 | it->Baseline(level, &result.x1, &result.y1, &result.x2, &result.y2); 212 | 213 | return result; 214 | } 215 | } 216 | 217 | cpp.function %{ 218 | OrientationResult orientation (PageIterator* it) { 219 | OrientationResult result; 220 | 221 | it->Orientation(&result.orientation, &result.writing_direction, &result.textline_order, &result.deskew_angle); 222 | 223 | return result; 224 | } 225 | } 226 | 227 | cpp.function %{ 228 | char* get_utf8_text (ResultIterator* it, PageIteratorLevel level) { 229 | return it->GetUTF8Text(level); 230 | } 231 | }, blocking: true 232 | 233 | cpp.function %{ 234 | float confidence (ResultIterator* it, PageIteratorLevel level) { 235 | return it->Confidence(level); 236 | } 237 | }, blocking: true 238 | 239 | cpp.function %{ 240 | FontAttributes word_font_attributes (ResultIterator* it) { 241 | FontAttributes result; 242 | 243 | result.name = 
it->WordFontAttributes(&result.is_bold, &result.is_italic, &result.is_underlined, 244 | &result.is_monospace, &result.is_serif, &result.is_smallcaps, &result.pointsize, &result.id); 245 | 246 | return result; 247 | } 248 | } 249 | 250 | cpp.function %{ 251 | bool word_is_from_dictionary (ResultIterator* it) { 252 | return it->WordIsFromDictionary(); 253 | } 254 | } 255 | 256 | cpp.function %{ 257 | bool word_is_numeric (ResultIterator* it) { 258 | return it->WordIsNumeric(); 259 | } 260 | } 261 | 262 | cpp.function %{ 263 | bool symbol_is_superscript (ResultIterator* it) { 264 | return it->SymbolIsSuperscript(); 265 | } 266 | } 267 | 268 | cpp.function %{ 269 | bool symbol_is_subscript (ResultIterator* it) { 270 | return it->SymbolIsSubscript(); 271 | } 272 | } 273 | 274 | cpp.function %{ 275 | bool symbol_is_dropcap (ResultIterator* it) { 276 | return it->SymbolIsDropcap(); 277 | } 278 | } 279 | end 280 | end 281 | 282 | end; end 283 | -------------------------------------------------------------------------------- /lib/tesseract/c/leptonica.rb: -------------------------------------------------------------------------------- 1 | #-- 2 | # Copyright 2011 meh. All rights reserved. 3 | # 4 | # Redistribution and use in source and binary forms, with or without modification, are 5 | # permitted provided that the following conditions are met: 6 | # 7 | # 1. Redistributions of source code must retain the above copyright notice, this list of 8 | # conditions and the following disclaimer. 9 | # 10 | # THIS SOFTWARE IS PROVIDED BY meh ''AS IS'' AND ANY EXPRESS OR IMPLIED 11 | # WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND 12 | # FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL meh OR 13 | # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 14 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 15 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 16 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 17 | # NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF 18 | # ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 19 | # 20 | # The views and conclusions contained in the software and documentation are those of the 21 | # authors and should not be interpreted as representing official policies, either expressed 22 | # or implied, of meh. 23 | #++ 24 | 25 | module Tesseract; module C 26 | 27 | module Leptonica 28 | extend FFI::Inline 29 | 30 | inline 'C++' do |cpp| 31 | cpp.include 'leptonica/allheaders.h' 32 | cpp.libraries 'lept' 33 | 34 | cpp.eval { 35 | enum :Format, [ 36 | :UNKNOWN, :BMP, :JFIF_JPEG, :PNG, 37 | :TIFF, :TIFF_PACKBITS, :TIFF_RLE, :TIFF_G3, :TIFF_G4, :TIFF_LZW, :TIFF_ZIP, 38 | :PNM, :PS, :GIF, :JP2, :WEBP, :LPDF, :DEFAULT, :SPIX 39 | ] 40 | } 41 | 42 | cpp.typedef 'int32_t', 'Format' 43 | 44 | cpp.function %{ 45 | Pix* pix_read (const char* path) { 46 | return pixRead(path); 47 | } 48 | }, blocking: true 49 | 50 | cpp.function %{ 51 | Pix* pix_read_stream (int fd) { 52 | return pixReadStream(fdopen(fd, "rb"), 0); 53 | } 54 | }, blocking: true 55 | 56 | cpp.function %{ 57 | Pix* pix_read_mem (const l_uint8* data, size_t size) { 58 | return pixReadMem(data, size); 59 | } 60 | }, blocking: true 61 | 62 | cpp.function %{ 63 | bool pix_write_mem (Pix* pix, uint8_t** data, size_t* size, Format format) { 64 | return pixWriteMem(data, size, pix, format); 65 | } 66 | }, blocking: true 67 | 68 | cpp.function %{ 69 | void pix_destroy (Pix* pix) { 70 | pixDestroy(&pix); 71 | } 72 | } 73 | 74 | cpp.function %{ 75 | int32_t pix_get_width (Pix* 
pix) { 76 | return pixGetWidth(pix); 77 | } 78 | } 79 | 80 | cpp.function %{ 81 | int32_t pix_get_height (Pix* pix) { 82 | return pixGetHeight(pix); 83 | } 84 | } 85 | end 86 | end 87 | 88 | end; end 89 | -------------------------------------------------------------------------------- /lib/tesseract/engine.rb: -------------------------------------------------------------------------------- 1 | #-- 2 | # Copyright 2011 meh. All rights reserved. 3 | # 4 | # Redistribution and use in source and binary forms, with or without modification, are 5 | # permitted provided that the following conditions are met: 6 | # 7 | # 1. Redistributions of source code must retain the above copyright notice, this list of 8 | # conditions and the following disclaimer. 9 | # 10 | # THIS SOFTWARE IS PROVIDED BY meh ''AS IS'' AND ANY EXPRESS OR IMPLIED 11 | # WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND 12 | # FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL meh OR 13 | # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 14 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 15 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 16 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 17 | # NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF 18 | # ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 19 | # 20 | # The views and conclusions contained in the software and documentation are those of the 21 | # authors and should not be interpreted as representing official policies, either expressed 22 | # or implied, of meh. 
23 | #++ 24 | 25 | require 'tesseract/api' 26 | 27 | require 'tesseract/engine/iterator' 28 | 29 | module Tesseract 30 | 31 | class Engine 32 | attr_reader :config 33 | 34 | named :path, :language, :mode, :variables, 35 | :optional => { :path => nil, :language => :eng, :mode => :DEFAULT, :variables => {}, :config => [] }, 36 | :alias => { :data => :path, :lang => :language } 37 | def initialize (path = nil, language = :eng, mode = :DEFAULT, variables = {}, config = [], &block) # :yields: self 38 | @api = API.new 39 | 40 | @initializing = true 41 | 42 | @init = block 43 | @path = path 44 | @language = language 45 | @mode = mode 46 | @variables = variables 47 | @config = config 48 | @rectangle = [] 49 | 50 | yield self if block_given? 51 | 52 | @initializing = false 53 | 54 | _init 55 | end 56 | 57 | def version 58 | @api.version 59 | end 60 | 61 | def load_config (*config) 62 | @config.concat config.flatten.compact.uniq 63 | 64 | unless @initializing 65 | @config.each {|conf| 66 | @api.read_config_file(conf) 67 | } 68 | end 69 | end 70 | 71 | def with (&block) # :yields: self 72 | self.class.new(@path, @language, @mode, @variables.clone, @config.clone) {|e| 73 | @init.call(e) if @init 74 | block.call(e) if block 75 | } 76 | end 77 | 78 | def input= (name) 79 | @api.set_input_name(name) 80 | end 81 | 82 | def output= (name) 83 | @api.set_output_name(name) 84 | end 85 | 86 | def set (name, value) 87 | @variables[name] = value 88 | 89 | @api.set_variable(name.to_s, value.to_s) 90 | end 91 | 92 | def get (name) 93 | @api.get_variable(name.to_s) || @variables[name] 94 | end 95 | 96 | %w(path language mode).each {|name| 97 | define_method name do 98 | instance_variable_get "@#{name}" 99 | end 100 | 101 | define_method "#{name}=" do |value| 102 | instance_variable_set "@#{name}", value 103 | 104 | _init unless @initializing 105 | end 106 | } 107 | 108 | def blacklist 109 | get('tessedit_char_blacklist').chars.to_a 110 | end 111 | 112 | def blacklist= (value) 113 | 
set('tessedit_char_blacklist', value.respond_to?(:to_a) ? value.to_a.join : value.to_s) 114 | end 115 | 116 | def whitelist 117 | get('tessedit_char_whitelist').chars.to_a 118 | end 119 | 120 | def whitelist= (value) 121 | set('tessedit_char_whitelist', value.respond_to?(:to_a) ? value.to_a.join : value.to_s) 122 | end 123 | 124 | def page_segmentation_mode 125 | @api.get_page_seg_mode 126 | end 127 | 128 | def page_segmentation_mode= (value) 129 | @psm = C.for_enum(value) 130 | 131 | @api.set_page_seg_mode @psm 132 | end 133 | 134 | def image= (image) 135 | @image = image 136 | end 137 | 138 | named :x, :y, :width, :height, 139 | :optional => 0 .. -1, 140 | :alias => { :w => :width, :h => :height } 141 | def select (x = nil, y = nil, width = nil, height = nil) 142 | @rectangle = [x, y, width, height] 143 | end 144 | 145 | named :image, :x, :y, :width, :height, 146 | :optional => 0 .. -1, 147 | :alias => { :w => :width, :h => :height } 148 | def text_for (image = nil, x = nil, y = nil, width = nil, height = nil) 149 | _setup(image, x, y, width, height) 150 | 151 | @api.get_text.tap {|text| 152 | text.instance_exec(@api) {|api| 153 | @unlv = api.get_unlv 154 | @confidence = api.mean_text_confidence 155 | 156 | class << self 157 | attr_reader :unlv, :confidence 158 | end 159 | } 160 | } 161 | end 162 | 163 | named :x, :y, :width, :height, 164 | :optional => 0 .. -1, 165 | :alias => { :w => :width, :h => :height } 166 | def text_at (x = nil, y = nil, width = nil, height = nil) 167 | text_for(nil, x, y, width, height) 168 | end 169 | 170 | def text 171 | text_at 172 | end 173 | 174 | named :image, :x, :y, :width, :height, 175 | :optional => 0 .. -1, 176 | :alias => { :w => :width, :h => :height } 177 | def hocr_for (image = nil, x = nil, y = nil, width = nil, height = nil, page = nil) 178 | _setup(image, x, y, width, height) 179 | 180 | @api.get_hocr(page || 0) 181 | end 182 | 183 | named :x, :y, :width, :height, 184 | :optional => 0 .. 
-1, 185 | :alias => { :w => :width, :h => :height } 186 | def hocr_at (x = nil, y = nil, width = nil, height = nil, page = nil) 187 | hocr_for(nil, x, y, width, height, page) 188 | end 189 | 190 | def hocr 191 | hocr_at 192 | end 193 | 194 | %w(block paragraph line word symbol).each {|level| 195 | define_method "each_#{level}" do |&block| 196 | raise ArgumentError, 'you have to pass a block' unless block 197 | 198 | _iterator.__send__ "each_#{level}", &block 199 | end 200 | 201 | named :image, :x, :y, :width, :height, 202 | :optional => 0 .. -1, 203 | :alias => { :w => :width, :h => :height } 204 | define_method "each_#{level}_for" do |image = nil, x = nil, y = nil, width = nil, height = nil, &block| 205 | self.image = image if image 206 | select x, y, width, height 207 | 208 | __send__ "each_#{level}", &block 209 | end 210 | 211 | named :x, :y, :width, :height, 212 | :optional => 0 .. -1, 213 | :alias => { :w => :width, :h => :height } 214 | define_method "each_#{level}_at" do |x = nil, y = nil, width = nil, height = nil, &block| 215 | __send__ "each_#{level}_for", nil, x, y, width, height, &block 216 | end 217 | 218 | define_method "#{level}s" do 219 | _iterator.__send__ "#{level}s" 220 | end 221 | 222 | named :image, :x, :y, :width, :height, 223 | :optional => 0 .. -1, 224 | :alias => { :w => :width, :h => :height } 225 | define_method "#{level}s_for" do |image = nil, x = nil, y = nil, width = nil, height = nil| 226 | self.image = image if image 227 | select x, y, width, height 228 | 229 | __send__ "#{level}s" 230 | end 231 | 232 | named :x, :y, :width, :height, 233 | :optional => 0 .. 
-1, 234 | :alias => { :w => :width, :h => :height } 235 | define_method "#{level}s_at" do |x = nil, y = nil, width = nil, height = nil| 236 | __send__ "#{level}s_for", nil, x, y, width, height 237 | end 238 | } 239 | 240 | def process (image, page = nil) 241 | if page 242 | @api.process_page(API.image_for(image), page) 243 | else 244 | raise ArgumentError, 'the path does not exist' unless File.exist?(image) 245 | 246 | @api.process_pages(image) 247 | end 248 | end 249 | 250 | protected 251 | def _init 252 | @api.end 253 | 254 | @api.init(@path, API.to_language_code(@language), @mode) 255 | 256 | @variables.each {|name, value| 257 | @api.set_variable(name.to_s, value.to_s) 258 | } 259 | 260 | @config.each {|conf| 261 | @api.read_config_file(conf) 262 | } 263 | 264 | @api.set_page_seg_mode @psm if @psm 265 | end 266 | 267 | def _setup (image = nil, x = nil, y = nil, width = nil, height = nil) 268 | image ||= @image or raise ArgumentError, 'you have to set an image first' 269 | image = API.image_for(image) 270 | 271 | if !width && x 272 | width = image.width - x 273 | end 274 | 275 | if !height && y 276 | height = image.height - y 277 | end 278 | 279 | x ||= @rectangle[0] || 0 280 | y ||= @rectangle[1] || 0 281 | width ||= @rectangle[2] || image.width 282 | height ||= @rectangle[3] || image.height 283 | 284 | if (x + width) > image.width || (y + height) > image.height 285 | raise IndexError, 'image access out of boundaries' 286 | end 287 | 288 | @api.set_image(image) 289 | @api.set_rectangle(x, y, width, height) 290 | end 291 | 292 | def _recognize 293 | _setup 294 | 295 | @api.get_text 296 | end 297 | 298 | def _iterator 299 | _recognize 300 | 301 | Iterator.new(@api.get_iterator) 302 | end 303 | end 304 | 305 | end 306 | -------------------------------------------------------------------------------- /lib/tesseract/engine/baseline.rb: -------------------------------------------------------------------------------- 1 | #-- 2 | # Copyright 2011 meh.
All rights reserved. 3 | # 4 | # Redistribution and use in source and binary forms, with or without modification, are 5 | # permitted provided that the following conditions are met: 6 | # 7 | # 1. Redistributions of source code must retain the above copyright notice, this list of 8 | # conditions and the following disclaimer. 9 | # 10 | # THIS SOFTWARE IS PROVIDED BY meh ''AS IS'' AND ANY EXPRESS OR IMPLIED 11 | # WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND 12 | # FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL meh OR 13 | # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 14 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 15 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 16 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 17 | # NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF 18 | # ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 19 | # 20 | # The views and conclusions contained in the software and documentation are those of the 21 | # authors and should not be interpreted as representing official policies, either expressed 22 | # or implied, of meh. 23 | #++ 24 | 25 | module Tesseract; class Engine 26 | 27 | class Baseline 28 | def initialize (struct) 29 | @internal = struct 30 | end 31 | 32 | C::Iterator::Baseline.layout.members.each {|name| 33 | define_method name do 34 | @internal[name] 35 | end 36 | } 37 | 38 | def inspect 39 | "#" 40 | end 41 | end 42 | 43 | end; end 44 | -------------------------------------------------------------------------------- /lib/tesseract/engine/bounding_box.rb: -------------------------------------------------------------------------------- 1 | #-- 2 | # Copyright 2011 meh. All rights reserved. 
3 | # 4 | # Redistribution and use in source and binary forms, with or without modification, are 5 | # permitted provided that the following conditions are met: 6 | # 7 | # 1. Redistributions of source code must retain the above copyright notice, this list of 8 | # conditions and the following disclaimer. 9 | # 10 | # THIS SOFTWARE IS PROVIDED BY meh ''AS IS'' AND ANY EXPRESS OR IMPLIED 11 | # WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND 12 | # FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL meh OR 13 | # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 14 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 15 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 16 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 17 | # NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF 18 | # ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 19 | # 20 | # The views and conclusions contained in the software and documentation are those of the 21 | # authors and should not be interpreted as representing official policies, either expressed 22 | # or implied, of meh. 
23 | #++ 24 | 25 | module Tesseract; class Engine 26 | 27 | class BoundingBox 28 | def initialize (struct) 29 | @internal = struct 30 | end 31 | 32 | C::Iterator::BoundingBox.layout.members.each {|name| 33 | define_method name do 34 | @internal[name] 35 | end 36 | } 37 | 38 | alias x left 39 | alias y top 40 | 41 | def width 42 | right - left 43 | end 44 | 45 | def height 46 | bottom - top 47 | end 48 | 49 | def inspect 50 | "#" 51 | end 52 | end 53 | 54 | end; end 55 | -------------------------------------------------------------------------------- /lib/tesseract/engine/font_attributes.rb: -------------------------------------------------------------------------------- 1 | #-- 2 | # Copyright 2011 meh. All rights reserved. 3 | # 4 | # Redistribution and use in source and binary forms, with or without modification, are 5 | # permitted provided that the following conditions are met: 6 | # 7 | # 1. Redistributions of source code must retain the above copyright notice, this list of 8 | # conditions and the following disclaimer. 9 | # 10 | # THIS SOFTWARE IS PROVIDED BY meh ''AS IS'' AND ANY EXPRESS OR IMPLIED 11 | # WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND 12 | # FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL meh OR 13 | # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 14 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 15 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 16 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 17 | # NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF 18 | # ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
19 | # 20 | # The views and conclusions contained in the software and documentation are those of the 21 | # authors and should not be interpreted as representing official policies, either expressed 22 | # or implied, of meh. 23 | #++ 24 | 25 | module Tesseract; class Engine 26 | 27 | class FontAttributes 28 | def initialize (struct) 29 | @internal = struct 30 | end 31 | 32 | C::Iterator::FontAttributes.layout.members.each {|name| 33 | define_method name do 34 | @internal[name] 35 | end 36 | } 37 | 38 | alias bold? is_bold 39 | alias italic? is_italic 40 | alias underlined? is_underlined 41 | alias monospace? is_monospace 42 | alias serif? is_serif 43 | alias smallcaps? is_smallcaps 44 | 45 | def inspect 46 | "#" 47 | end 48 | end 49 | 50 | end; end 51 | -------------------------------------------------------------------------------- /lib/tesseract/engine/iterator.rb: -------------------------------------------------------------------------------- 1 | #-- 2 | # Copyright 2011 meh. All rights reserved. 3 | # 4 | # Redistribution and use in source and binary forms, with or without modification, are 5 | # permitted provided that the following conditions are met: 6 | # 7 | # 1. Redistributions of source code must retain the above copyright notice, this list of 8 | # conditions and the following disclaimer. 9 | # 10 | # THIS SOFTWARE IS PROVIDED BY meh ''AS IS'' AND ANY EXPRESS OR IMPLIED 11 | # WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND 12 | # FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL meh OR 13 | # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 14 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 15 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 16 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 17 | # NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF 18 | # ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 19 | # 20 | # The views and conclusions contained in the software and documentation are those of the 21 | # authors and should not be interpreted as representing official policies, either expressed 22 | # or implied, of meh. 23 | #++ 24 | 25 | require 'tesseract/engine/bounding_box' 26 | require 'tesseract/engine/baseline' 27 | require 'tesseract/engine/orientation' 28 | require 'tesseract/engine/font_attributes' 29 | 30 | module Tesseract; class Engine 31 | 32 | class Iterator 33 | class Element 34 | def self.for (level) 35 | Iterator.const_get(level.capitalize) 36 | end 37 | 38 | def initialize (level, iterator) 39 | @level = level 40 | @iterator = iterator 41 | end 42 | 43 | memoize 44 | def bounding_box 45 | BoundingBox.new(@iterator.bounding_box(@level)) 46 | end 47 | 48 | memoize 49 | def binary_image 50 | @iterator.get_binary_image(@level) rescue nil 51 | end 52 | 53 | memoize 54 | def image 55 | @iterator.get_image(@level) rescue nil 56 | end 57 | 58 | memoize 59 | def baseline 60 | Baseline.new(@iterator.baseline(@level)) 61 | end 62 | 63 | memoize 64 | def orientation 65 | Orientation.new(@iterator.orientation) 66 | end 67 | 68 | memoize 69 | def text 70 | @iterator.get_text(@level) 71 | end 72 | 73 | memoize 74 | def confidence 75 | @iterator.confidence(@level) 76 | end 77 | 78 | alias to_s text 79 | 80 | def inspect 81 | "#" 82 | end 83 | end 84 | 85 | class Block < Element 86 | memoize 87 | def type 88 | 
@iterator.block_type 89 | end 90 | end 91 | 92 | class Paragraph < Element; end 93 | 94 | class Line < Element; end 95 | 96 | class Word < Element 97 | memoize 98 | def font_attributes 99 | FontAttributes.new(@iterator.word_font_attributes) 100 | end 101 | 102 | memoize 103 | def from_dictionary? 104 | @iterator.word_is_from_dictionary? 105 | end 106 | 107 | memoize 108 | def numeric? 109 | @iterator.word_is_numeric? 110 | end 111 | end 112 | 113 | class Symbol < Element 114 | memoize 115 | def superscript? 116 | @iterator.symbol_is_superscript? 117 | end 118 | 119 | memoize 120 | def subscript? 121 | @iterator.symbol_is_subscript? 122 | end 123 | 124 | memoize 125 | def dropcap? 126 | @iterator.symbol_is_dropcap? 127 | end 128 | end 129 | 130 | def initialize (iterator) 131 | @iterator = iterator 132 | end 133 | 134 | %w(block paragraph line word symbol).each {|level| 135 | define_method "each_#{level}" do |&block| 136 | return enum_for "each_#{level}" unless block 137 | 138 | @iterator.begin 139 | 140 | begin 141 | block.call Element.for(level).new(level, @iterator) 142 | end while @iterator.next level 143 | end 144 | 145 | define_method "#{level}s" do 146 | __send__("each_#{level}").map {|e| 147 | e.methods.each {|name| 148 | if e.is_memoized?(name) 149 | e.__send__ name 150 | end 151 | } 152 | 153 | e.instance_eval { 154 | @iterator = nil 155 | } 156 | 157 | e 158 | } 159 | end 160 | } 161 | end 162 | 163 | end; end 164 | -------------------------------------------------------------------------------- /lib/tesseract/engine/orientation.rb: -------------------------------------------------------------------------------- 1 | #-- 2 | # Copyright 2011 meh. All rights reserved. 3 | # 4 | # Redistribution and use in source and binary forms, with or without modification, are 5 | # permitted provided that the following conditions are met: 6 | # 7 | # 1. 
Redistributions of source code must retain the above copyright notice, this list of 8 | # conditions and the following disclaimer. 9 | # 10 | # THIS SOFTWARE IS PROVIDED BY meh ''AS IS'' AND ANY EXPRESS OR IMPLIED 11 | # WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND 12 | # FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL meh OR 13 | # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 14 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 15 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 16 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 17 | # NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF 18 | # ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 19 | # 20 | # The views and conclusions contained in the software and documentation are those of the 21 | # authors and should not be interpreted as representing official policies, either expressed 22 | # or implied, of meh. 23 | #++ 24 | 25 | module Tesseract; class Engine 26 | 27 | class Orientation 28 | def initialize (struct) 29 | @internal = struct 30 | end 31 | 32 | C::Iterator::Orientation.layout.members.each {|name| 33 | define_method name do 34 | @internal[name] 35 | end 36 | } 37 | 38 | alias direction orientation 39 | 40 | def inspect 41 | "#" 42 | end 43 | end 44 | 45 | end; end 46 | -------------------------------------------------------------------------------- /lib/tesseract/extensions.rb: -------------------------------------------------------------------------------- 1 | #-- 2 | # Copyright 2011 meh. All rights reserved. 3 | # 4 | # Redistribution and use in source and binary forms, with or without modification, are 5 | # permitted provided that the following conditions are met: 6 | # 7 | # 1. 
Redistributions of source code must retain the above copyright notice, this list of 8 | # conditions and the following disclaimer. 9 | # 10 | # THIS SOFTWARE IS PROVIDED BY meh ''AS IS'' AND ANY EXPRESS OR IMPLIED 11 | # WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND 12 | # FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL meh OR 13 | # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 14 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 15 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 16 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 17 | # NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF 18 | # ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 19 | # 20 | # The views and conclusions contained in the software and documentation are those of the 21 | # authors and should not be interpreted as representing official policies, either expressed 22 | # or implied, of meh. 23 | #++ 24 | 25 | require 'call-me/named' 26 | require 'call-me/memoize' 27 | require 'iso-639' 28 | -------------------------------------------------------------------------------- /lib/tesseract/iterator.rb: -------------------------------------------------------------------------------- 1 | #-- 2 | # Copyright 2011 meh. All rights reserved. 3 | # 4 | # Redistribution and use in source and binary forms, with or without modification, are 5 | # permitted provided that the following conditions are met: 6 | # 7 | # 1. Redistributions of source code must retain the above copyright notice, this list of 8 | # conditions and the following disclaimer. 
9 | # 10 | # THIS SOFTWARE IS PROVIDED BY meh ''AS IS'' AND ANY EXPRESS OR IMPLIED 11 | # WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND 12 | # FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL meh OR 13 | # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 14 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 15 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 16 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 17 | # NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF 18 | # ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 19 | # 20 | # The views and conclusions contained in the software and documentation are those of the 21 | # authors and should not be interpreted as representing official policies, either expressed 22 | # or implied, of meh. 23 | #++ 24 | 25 | module Tesseract 26 | 27 | class Iterator 28 | def initialize (api, pointer) 29 | @api = api 30 | @internal = pointer 31 | end 32 | 33 | def to_ffi 34 | @internal 35 | end 36 | end 37 | 38 | end 39 | -------------------------------------------------------------------------------- /lib/tesseract/version.rb: -------------------------------------------------------------------------------- 1 | #-- 2 | # Copyright 2011 meh. All rights reserved. 3 | # 4 | # Redistribution and use in source and binary forms, with or without modification, are 5 | # permitted provided that the following conditions are met: 6 | # 7 | # 1. Redistributions of source code must retain the above copyright notice, this list of 8 | # conditions and the following disclaimer. 
9 | # 10 | # THIS SOFTWARE IS PROVIDED BY meh ''AS IS'' AND ANY EXPRESS OR IMPLIED 11 | # WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND 12 | # FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL meh OR 13 | # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 14 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 15 | # SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON 16 | # ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING 17 | # NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF 18 | # ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 19 | # 20 | # The views and conclusions contained in the software and documentation are those of the 21 | # authors and should not be interpreted as representing official policies, either expressed 22 | # or implied, of meh. 23 | #++ 24 | 25 | module Tesseract 26 | def self.version 27 | '0.1.8' 28 | end 29 | end 30 | -------------------------------------------------------------------------------- /tesseract-ocr.gemspec: -------------------------------------------------------------------------------- 1 | Kernel.load 'lib/tesseract/version.rb' 2 | 3 | Gem::Specification.new {|s| 4 | s.name = 'tesseract-ocr' 5 | s.version = Tesseract.version 6 | s.author = 'meh.' 7 | s.email = 'meh@schizofreni.co' 8 | s.homepage = 'http://github.com/meh/ruby-tesseract-ocr' 9 | s.platform = Gem::Platform::RUBY 10 | s.summary = 'A wrapper library to the tesseract-ocr API.' 
11 | s.license = 'BSD' 12 | 13 | s.files = `git ls-files`.split("\n") 14 | s.executables = `git ls-files -- bin/*`.split("\n").map { |f| File.basename(f) } 15 | s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n") 16 | s.require_paths = ['lib'] 17 | 18 | s.add_dependency 'call-me' 19 | s.add_dependency 'iso-639' 20 | 21 | s.add_dependency 'ffi-extra' 22 | s.add_dependency 'ffi-inline' 23 | } 24 | -------------------------------------------------------------------------------- /test/first.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/test/first.png -------------------------------------------------------------------------------- /test/jsmj.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/test/jsmj.png -------------------------------------------------------------------------------- /test/second.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/test/second.png -------------------------------------------------------------------------------- /test/tesseract_bench.rb: -------------------------------------------------------------------------------- 1 | #! 
/usr/bin/env ruby 2 | require 'tesseract' 3 | require 'benchmark' 4 | 5 | Benchmark.bm do |b| 6 | engine = Tesseract::Engine.new 7 | 8 | b.report 'text_for: ' do 9 | 100.times do 10 | engine.text_for('first.png') 11 | 12 | GC.start 13 | end 14 | end 15 | 16 | b.report 'words_for: ' do 17 | 100.times do 18 | engine.words_for('first.png') 19 | 20 | GC.start 21 | end 22 | end 23 | 24 | b.report 'symbols_for: ' do 25 | 100.times do 26 | engine.symbols_for('first.png') 27 | 28 | GC.start 29 | end 30 | end 31 | end 32 | -------------------------------------------------------------------------------- /test/tesseract_spec.rb: -------------------------------------------------------------------------------- 1 | #! /usr/bin/env ruby 2 | require 'rubygems' 3 | require 'tesseract' 4 | 5 | class Block; end 6 | class Paragraph; end 7 | class Line; end 8 | class Word; end 9 | class Symbol; end 10 | 11 | describe Tesseract::Engine do 12 | let :engine do 13 | Tesseract::Engine.new(language: :eng) 14 | end 15 | 16 | describe '#text_for' do 17 | it 'can read the first test image' do 18 | expect(engine.text_for('first.png').strip).to eq('ABC') 19 | end 20 | 21 | it 'can read the second test image' do 22 | expect(engine.text_for('second.png').strip).to eq("#{Tesseract::API.new.version == '3.01' ? 
?| : ?I}'m 12 and what is this.\nINSTALL GENTOO\nOH HAI 1234") 23 | end 24 | 25 | it 'raises when going out of the image boundaries' do 26 | expect { 27 | engine.text_for('second.png', 0, 0, 1000, 1000) 28 | }.to raise_error IndexError 29 | end 30 | end 31 | 32 | describe '#text_at' do 33 | it 'can read the first test image' do 34 | engine.image = 'first.png' 35 | 36 | expect(engine.text_at(2, 2, 2, 2).strip).to eq('') 37 | end 38 | 39 | it 'can read the second test image' do 40 | engine.image = 'second.png' 41 | 42 | expect(engine.text_at(242, 191, 129, 31).strip).to eq('OH HAI 1234') 43 | end 44 | 45 | it 'raises when going out of the image boundaries' do 46 | expect { 47 | engine.image = 'second.png' 48 | engine.text_at(10, 20, 1000, 1000) 49 | }.to raise_error IndexError 50 | end 51 | end 52 | 53 | describe '#text' do 54 | it 'can read the first test image' do 55 | engine.image = 'first.png' 56 | engine.select 2, 2, 2, 2 57 | 58 | expect(engine.text.strip).to eq('') 59 | end 60 | 61 | it 'can read the second test image' do 62 | engine.image = 'second.png' 63 | engine.select 242, 191, 129, 31 64 | 65 | expect(engine.text.strip).to eq('OH HAI 1234') 66 | end 67 | 68 | it 'raises when going out of the image boundaries' do 69 | expect { 70 | engine.image = 'second.png' 71 | engine.select 10, 20, 1000, 1000 72 | engine.text 73 | }.to raise_error IndexError 74 | end 75 | end 76 | 77 | describe '#hocr' do 78 | it 'can read the first test image' do 79 | engine.image = 'first.png' 80 | engine.select 2, 2, 2, 2 81 | 82 | expect(engine.hocr).to eq("
\n
\n") 83 | end 84 | 85 | it 'can read the second test image' do 86 | engine.image = 'second.png' 87 | engine.select 242, 191, 129, 31 88 | 89 | expect(engine.hocr).to eq("
\n
\n

\n OH HAI 1234 \n \n

\n
\n
\n") 90 | end 91 | 92 | it 'raises when going out of the image boundaries' do 93 | expect { 94 | engine.image = 'second.png' 95 | engine.select 10, 20, 1000, 1000 96 | 97 | engine.hocr 98 | }.to raise_error IndexError 99 | end 100 | end 101 | 102 | describe '#blacklist' do 103 | it 'works with removing weird signs' do 104 | expect(engine.with { |e| e.blacklist = '|' }.text_for('second.png').strip).to eq("I'm 12 and what is this.\nINSTALL GENTOO\nOH HAI 1234") 105 | end 106 | end 107 | 108 | describe '#whitelist' do 109 | it 'makes everything into a number' do 110 | expect(engine.with { |e| e.whitelist = '1234567890' }.text_for('second.png').strip).to match(/^[\d\s]*$/) 111 | end 112 | end 113 | 114 | describe '#page_segmentation_mode' do 115 | it 'sets it correctly' do 116 | expect(engine.with {|e| 117 | e.page_segmentation_mode = :single_line 118 | e.whitelist = [*'a'..'z', *'A'..'Z', *0..9, " ."].join 119 | }.text_for('jsmj.png').strip).to eq('Jsmj') 120 | end 121 | end 122 | 123 | describe '#blocks' do 124 | it 'works properly with first image' do 125 | expect(engine.blocks_for('first.png').first.to_s.strip).to eq('ABC') 126 | end 127 | end 128 | 129 | describe '#paragraphs' do 130 | it 'works properly with first image' do 131 | expect(engine.paragraphs_for('first.png').first.to_s.strip).to eq('ABC') 132 | end 133 | end 134 | 135 | describe '#lines' do 136 | it 'works properly with first image' do 137 | expect(engine.lines_for('first.png').first.to_s.strip).to eq('ABC') 138 | end 139 | end 140 | 141 | describe '#words' do 142 | it 'works properly with first image' do 143 | expect(engine.words_for('first.png').first.to_s).to eq('ABC') 144 | end 145 | end 146 | 147 | describe '#symbols' do 148 | it 'works properly with first image' do 149 | expect(engine.symbols_for('first.png').first.to_s).to eq('A') 150 | end 151 | end 152 | end 153 | -------------------------------------------------------------------------------- /test/test-european.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/test/test-european.jpg -------------------------------------------------------------------------------- /test/test.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/meh/ruby-tesseract-ocr/e0180d793cda4ad487364504df099ecbd5e30b28/test/test.png --------------------------------------------------------------------------------
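The struct wrappers under `lib/tesseract/engine` (`BoundingBox`, `FontAttributes`, `Orientation`) all follow the same pattern: keep the raw struct in `@internal`, generate one reader per struct member, then add aliases and derived values on top. The sketch below reproduces that pattern without FFI so it can be run on its own. `RawBox` and `BoxSketch` are hypothetical names that are not part of the gem, a plain Ruby `Struct` stands in for `C::Iterator::BoundingBox`, and the coordinates are made up for illustration.

```ruby
# Hypothetical stand-in for the FFI struct. The gem itself iterates over
# C::Iterator::BoundingBox.layout.members; a plain Struct exposes the same
# two things this pattern needs: a list of member names and [] access.
RawBox = Struct.new(:left, :top, :right, :bottom)

class BoxSketch
  def initialize(struct)
    @internal = struct
  end

  # Generate one reader per struct member, each delegating to the wrapped struct.
  RawBox.members.each do |name|
    define_method(name) { @internal[name] }
  end

  # Convenience names layered on top of the generated readers.
  alias x left
  alias y top

  # Derived values computed from the raw edges.
  def width
    right - left
  end

  def height
    bottom - top
  end
end

# Illustrative values only: left=242, top=191, right=371, bottom=222.
box = BoxSketch.new(RawBox.new(242, 191, 371, 222))
puts "#{box.x},#{box.y} #{box.width}x#{box.height}" # prints "242,191 129x31"
```

Generating the readers from the member list means a wrapper class does not have to be edited when a field is added to the underlying struct. Only aliases and derived values such as `width` and `height` are written by hand.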