├── .gitignore
├── README
├── clean
├── compose-inline-breaks
├── compose-pnm-files-multi-page
├── copy-characters-to-webspace
├── dice-posters
├── generate-output-multi-page
├── individual-characters
    └── .gitignore
├── make-unicode-poster
├── png-find-grid-revised
    ├── .gitignore
    ├── Makefile
    ├── bottom-line-proportion.c
    ├── crop-images.c
    ├── empty-image.c
    ├── find-grid.c
    ├── png-size.c
    ├── readpng.c
    └── readpng.h
├── test-codepoints
└── tileinfo.rb


/.gitignore:
--------------------------------------------------------------------------------
*.png
*.pdf
*~
poster-inline-breaks-*
poster-complete-*
Blocks.txt
codepoints.yaml
top-sizes.yaml
*.png.empty
*.png.crop

--------------------------------------------------------------------------------
/README:
--------------------------------------------------------------------------------
These scripts require the following binaries:

  jruby (package jruby)
  ruby (package ruby1.8)
  curl (package curl)
  pdftoppm (package poppler-utils)
  make (package make)
  ocrad (package ocrad)
  convert (package imagemagick)

... and many from the netpbm package.  You will also need to have the
following libraries installed:

  libpng12-dev
  libgd-ruby1.8

... and the font DejaVu installed as well:

  ttf-dejavu-core

You can install all of these dependencies with:

  apt-get install jruby ruby1.8 ruby curl poppler-utils make ocrad imagemagick libpng12-dev libgd-ruby1.8 ttf-dejavu-core netpbm

A completed build tree for this poster is about 18 GiB for me so
you need lots of disk space.

Step 1: Extracting the Glyphs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You should be able to run:

  ./make-unicode-poster

After a long time (at least a day on my computer) this will
generate some useful output in the directory
'individual-characters'.
These files include:

  U????-???-????????.png

    The glyph images with Unicode codepoint

  U????-???-????????-top.png

    Just the glyph image, cropped from the image above

  U????-???-????????-bottom.png

    Just the image of the Unicode codepoint number, also cropped
    from the full glyph image.

In the base directory there will also be:

  top-sizes.yaml

    Information about the size of the -top.png image files and
    the block that the characters are from.

  codepoints.yaml

    This file gives the name of each image glyph and the Unicode
    codepoint acquired from doing OCR on the codepoint in the
    image.  These values will have a number of errors in them,
    since the OCR isn't perfect; if you want to be able to map
    each image to a codepoint, see the optional Step 5 below.

Step 2: Drawing the Poster Strips
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The script "compose-inline-breaks" will create poster strips
from these blocks.  You can invoke it like:

  ./compose-inline-breaks "Unicode 5.1.0 (Single Page Test)" 1

... where the first argument is the title of the page and the
second argument is the number of posters to spread across.  This
will generate a series of strips that we can compose into the
final posters, called something like:

  poster-inline-breaks-00-00000030.png

Note that these images are up to 90000 pixels wide, which
creates a number of problems.  You can view them in gimp quite
happily, but many other programs won't cope.  There are even
bugs in some graphical file managers that mean that your file
manager will crash if you open this directory.
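
If you want to check which strips ended up on which page without
opening the directory in a file manager, something like the sketch
below may help.  It is not part of the original scripts: the names
STRIP_NAME, parse_strip_name and strips_by_page are made up for this
example, and it only assumes the naming scheme shown above (a two
digit page number followed by an eight digit strip index).

```ruby
# Hypothetical helper, not part of the original scripts: inspect the
# strip files produced by compose-inline-breaks without opening them.

# Assumed naming scheme: poster-inline-breaks-<page>-<index>.png
STRIP_NAME = /\Aposter-inline-breaks-(\d{2})-(\d{8})\.png\z/

# Returns a hash with the page number and strip index, or nil if the
# filename does not look like a poster strip.
def parse_strip_name(filename)
  match = STRIP_NAME.match(File.basename(filename))
  return nil unless match
  # String#to_i always parses base 10, so the leading zeros are
  # harmless here (Integer("00000030") would be read as octal).
  { :page => match[1].to_i, :index => match[2].to_i }
end

# Groups strip filenames by page number, in sorted order within each
# page; anything that is not a strip is skipped.
def strips_by_page(filenames)
  grouped = Hash.new { |hash, page| hash[page] = [] }
  filenames.sort.each do |name|
    info = parse_strip_name(name)
    next unless info
    grouped[info[:page]] << name
  end
  grouped
end
```

For example, strips_by_page(Dir['poster-inline-breaks-*.png']) should
give one list of strips per poster page.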

Step 3: Composing the Poster Strips
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now, to compose the strips into a single file, just run:

  ./compose-pnm-files-multi-page

That script converts each strip to PGM format and concatenates
them into gigantic PNM files, one per page of poster.

   ****** WARNING *****

   You need a great deal of disk space for this to run to
   completion.  The concatenated version of the poster is about
   12GB.  So, before you consider running this file you probably
   want to make sure that you have 30GB of disk free to be on
   the safe side.

   ********************

The eventual output should be called (for a single page poster):

  poster-complete-00.pnm

... or multiple files for multi-page posters.

In some sense this is the final output (a complete poster!) but
it's 12GB and 90000x130000 pixels, so not very useful for
printing.  To generate output you have a chance of printing see
the next section:

Step 4: Generating Printable Output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[This is by far the worst script, but you'll probably want to
customize it quite a bit anyway, depending on what your
printing situation is - see Step 5.]

First, as a test, you can try generating some 600dpi A4 versions:

  ./generate-output-multi-page A4 600

If the PNG and EPS files generated from that look sensible then
you can try:

  ./generate-output-multi-page A0 600

... to generate A0 versions.

Step 5: Printing Out Your Poster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 1. Find someone with an A0 plotter.

 2. Negotiate with them at great length about whether what you
    want to do is sensible (it is :)) and what file formats they
    can deal with.

Step 1 is relatively easy, and step 2 is often very difficult.
Some printers will want to load the bitmap into Photoshop and
print it out from there.  That will probably work OK, but you
have to be very careful that they don't enable any option in the
"Print" dialog that will cause JPEG compression to be applied to
your image or the output will look horrible.  Even at A0 size
you really need the fine detail for this poster.

Many of the large HP plotters will accept TIFF files if you can
send files directly to them - this sometimes takes some
negotiation.  Before doing this, make sure that the resolution
set in the TIFF file is correct for your poster.

One other option here is to use pamdice to split the A0 image
into many A4 sheets that you can stick together.  It's much
nicer to have a proper A0 version, though.

Good luck with this step - I've found it a bit of a nightmare :/

[Optional] Step 5: Mapping Codepoints to Glyph Images
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For a number of applications you might want to be able to find the
glyph image for a particular codepoint.  In order to do this, you'll
need to correct the generated codepoints.yaml file.  We can detect
most of the errors (all if we're lucky) by looking for the codepoint
values that appear to be out of place.  The script in this directory
called "test-codepoints" is designed for this.  You should do the
following:

  # ./test-codepoints
    Problem with: U0C00-002-00000099.png => 0x0CS3
      Not a Fixnum
    Problem with: U0E00-002-00000016.png => 0x
      Not a Fixnum
    Problem with: U1D400-002-00000165.png => 0x_D4A5
      Not a Fixnum
    Problem with: U1D400-002-00000181.png => 0x_D4B5
      Not a Fixnum
    Problem with: U1D400-002-00000191.png => 46271
      Out of order...
    Problem with: U1D400-002-00000198.png => 0x1_4C6
      Not a Fixnum
    [.. etc ..]

For each of these, do something like:

  # Look for what the codepoint should be from the image:
  xli U0C00-002-00000099.png

  # Correct it in the file:
  emacs individual-characters/non-blank/codepoints.yaml

Run ./test-codepoints again and correct any further errors,
repeating if necessary.  Once that's done you can use the script
./copy-characters-to-webspace to scale them down by 50% and copy
the glyph images to a directory where they'll have sensible
names:

  mkdir ~/renamed-unicode-glyphs/

  ./copy-characters-to-webspace codepoints.yaml ~/renamed-unicode-glyphs/

Those files will be called things like:

  reduced-0x00260E.png

... for U+00260E (BLACK TELEPHONE)

------------------------------------------------------------------------

Mark Longair
http://longair.net/mark/

--------------------------------------------------------------------------------
/clean:
--------------------------------------------------------------------------------
#!/bin/sh

rm -fv *.crop
rm -fv *.empty
find individual-characters -type f -print0|xargs -r -0 rm -v
git checkout individual-characters/.gitignore
rm -v poster-complete-*
rm -v poster-inline-breaks-*
rm -v U-*.png

--------------------------------------------------------------------------------
/compose-inline-breaks:
--------------------------------------------------------------------------------
#!/usr/bin/ruby -w

require 'yaml'
require 'GD'

# Two arguments are required: the poster title and the number of pages.
unless ARGV.length == 2
  puts "Usage: <title> <number-of-pages>"
  exit(-1)
end

$title = ARGV[0]
$number_of_pages = Integer(ARGV[1])

require 'tileinfo.rb'

$info_hash = {}
$pages_hash = Hash.new

input = "top-sizes.yaml"
STDERR.puts "Opening #{input}"
all_tiles = YAML.load( open( input ) { |f| f.read } )
all_tiles.each_pair do |filename,t|
  next unless t.valid_size?
  next unless t.apparently_successful_ocr?
  t.set_block_from_ocr_codepoint
  $info_hash[t.filename] = t
  block_first = t.block.first
  unless $pages_hash.has_key? block_first
    $pages_hash[block_first] = Array.new
  end
  $pages_hash[block_first].push t.filename
end

puts "After filtering, had #{$info_hash.length} valid tiles"

for p in $pages_hash.keys.sort_by { |n| n.to_i(16) }
  puts "Block beginning #{p} has #{$pages_hash[p].length} characters"
end

$font_point_size = 160
$font_filename = '/usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf'
# $font_filename = '/usr/share/fonts/truetype/ttf-bitstream-vera/Vera.ttf'

def text_rectangle( text, point_size = $font_point_size )

  error, r = GD::Image.stringTTF( 0, $font_filename,
                                  point_size, 0, 0, 0,
                                  text )

  if( error )
    raise "Couldn't find text dimensions for \"#{text}\": " + error
  end

  text_width = [ r[4] - r[6], r[2] - r[0] ].max
  text_height = [ r[1] - r[7], r[3] - r[5] ].max

  [ text_width, text_height, r ]

end

def draw_text( im, text, x_left, y_midpoint, point_size = $font_point_size )

  text_width, text_height, r = text_rectangle( text, point_size )

  black = im.colorAllocate( 0, 0, 0 )

  draw_x = x_left - r[6]
  draw_y = y_midpoint - (text_height / 2) - r[7]

  unless text.empty?
    error, ignore = im.stringTTF( black, $font_filename,
                                  point_size, 0,
                                  draw_x,
                                  draw_y,
                                  text )

  end

  # im.rectangle( draw_x,
  #               draw_y,
  #               draw_x + text_width,
  #               draw_y - text_height,
  #               im.colorAllocate( 0, 0, 0 ) )

end

def compare_page_names( a, b )

  re = Regexp.new('U?(\w+)')

  a_page = nil
  b_page = nil

  if( a =~ re )
    a_page = Integer("0x"+$1)
  end

  if( b =~ re )
    b_page = Integer("0x"+$1)
  end

  if( a_page && b_page )
    if( 0 != (a_page <=> b_page) )
      return (a_page <=> b_page)
    end
  end

  return (a <=> b)

end

$pages = $pages_hash.keys.sort { |a,b| compare_page_names(a,b) }

# $pages.each do |p|
#   puts "#{p} (#{name_of_page(p)})"
# end

# The layout will look like this:
#
#  ^
#  |
#  |  Title                              [page_title_height]
#  |
#  v
#  ^
#  |  Basic Latin (U0000 to U007F)       [block_header_height]
#  v
#  ^   _   _
#  |  | | | |
#  |  |_| |_|  ...
#  v
#  ^   _   _
#  |  | | | |
#  |  |_| |_|
#  v
#  ^
#  |  Latin-1 Supplement (U0080 to U00FF)
#  v
#  ^   _   _
#  |  | | | |
#  |  |_| |_|  ...
#  v
#
#  ....
#
#  [ A footer image with a helpful URL ]

# $block_header_height_in_pixels = 160
# $title_height_in_pixels = 300
# $max_tile_height = 163 - 27
# $minimum_space_to_right_of_block = 100

# The above values were for 300dpi; now we're at 800dpi, we need to
# scale up a bit.

$block_header_height_in_pixels = 426
$title_height_in_pixels = 800
$max_tile_height = ((163 - 27) * 8) / 3
$minimum_space_to_right_of_block = 300
# $minimum_space_to_right_of_block = 550

$space_to_right_of_header = 150
# $space_to_right_of_header = -190

def name_of_page(p)
  $page_to_block[p].nice_name
end

def whole_block_width( page_name )

  page_files = $pages_hash[page_name]

  block_width_so_far = 0

  page_files.each do |page_file|
    block_width_so_far += $info_hash[page_file].w
  end

  header_width, header_height, r = text_rectangle( name_of_page(page_name) )

  header_width + $space_to_right_of_header + block_width_so_far

end

def header_width( page_name )
  width, height, r = text_rectangle( name_of_page(page_name) )
  width
end

$pages.each do |p|

  puts sprintf( "%10s: %10d (%70s)", p, whole_block_width(p), $page_to_block[p].name )

end

class HeaderToDraw

  attr_accessor :x

  def initialize( x, text )
    @text = text
    @x = x
  end

  def draw_to( im, color, leave_header_space )
    puts "Drawing '#{@text}' at x = #{@x}"
    draw_text( im, @text, @x, ($block_header_height_in_pixels / 2) - 4, $font_point_size )
  end

end

class ImageToDraw

  attr_accessor :x

  def initialize( x, image_file )
    @x = x
    @image_file = image_file
  end

  def draw_to( im, color, leave_header_space )
    im_tile = GD::Image.new_from_png( @image_file )
    y = 0
    im_tile.copy( im, x, y, 0, 0, im_tile.width, im_tile.height)
    im_tile.destroy
  end

end

def draw_these( everything_to_draw, image_index, filename_base, maximum_row_width, current_page, just_score )

  unless just_score
    print "Drawing page #{current_page} image #{image_index}..."
  end

  includes_headers = everything_to_draw.detect { |e| e.class == HeaderToDraw }

  height = $max_tile_height

  if just_score
    return height
  end

  puts " which is #{maximum_row_width}x#{height}"

  im = GD::Image.new( maximum_row_width, height )

  white = im.colorAllocate( 255, 255, 255 )
  black = im.colorAllocate( 0, 0, 0 )

  im.filledRectangle( 0, 0, maximum_row_width, height, white )

  everything_to_draw.each do |e|
    e.draw_to( im, black, includes_headers )
  end

  open( filename_base + sprintf("-%02d-%08d.png",current_page,image_index), "w" ) do |f|
    im.png f
  end

  im.destroy
  im = nil

  GC.start

  return height

end

def draw_compacted( maximum_row_width, filename_base, just_score )

  # If height_in_pixels_so_far goes over maximum_page_height, we start
  # writing them to a new page.

  maximum_page_height = maximum_row_width * Math.sqrt(2)
  current_page = 0

  height_in_pixels_so_far = 0
  row_width_so_far = 0

  image_index = 0

  # ------------------------------------------------------------------------

  unless just_score
    im = GD::Image.new( maximum_row_width, $title_height_in_pixels )

    white = im.colorAllocate( 255, 255, 255 )
    black = im.colorAllocate( 0, 0, 0 )

    im.filledRectangle( 0, 0, maximum_row_width, $title_height_in_pixels, white )
    draw_text( im, $title, 100, $title_height_in_pixels / 2, 320 )

    open( filename_base + sprintf("-%02d-%08d.png",current_page,image_index), "w" ) do |f|
      im.png f
    end
    im.destroy
    im = nil
    GC.start
  end

  image_index += 1
  height_in_pixels_so_far += $title_height_in_pixels

  # ------------------------------------------------------------------------

  everything_to_draw = Array.new

  continue_this_line = false

  number_of_blocks = $pages.length

  $pages.each_with_index do |p,i|

    if continue_this_line
      name = name_of_page(p)
      h_width = header_width(p)
      row_width_so_far += $minimum_space_to_right_of_block
      everything_to_draw.push HeaderToDraw.new( row_width_so_far, name )
      row_width_so_far += h_width + $space_to_right_of_header
    else
      name = name_of_page(p)
      row_width_so_far = 0
      h_width = header_width(p)
      everything_to_draw.push HeaderToDraw.new( row_width_so_far, name )
      row_width_so_far += (h_width + $space_to_right_of_header)
    end

    page_files = $pages_hash[p]

    page_files.sort!

    page_files.each do |page_file|

      tile_width = $info_hash[page_file].w

      if (row_width_so_far + tile_width) > maximum_row_width

        row_width_so_far = 0

        # Draw all the pending stuff...

        height_in_pixels_so_far += draw_these( everything_to_draw,
                                               image_index,
                                               filename_base,
                                               maximum_row_width,
                                               current_page,
                                               just_score )

        if (height_in_pixels_so_far > ((current_page + 1) * maximum_page_height)) && (current_page != ($number_of_pages - 1))
          current_page += 1
        end
        everything_to_draw = Array.new
        image_index += 1

        # Start the new row with the tile that didn't fit; without
        # this it would be dropped from the poster altogether.
        everything_to_draw.push ImageToDraw.new( row_width_so_far, page_file )
        row_width_so_far += tile_width

      else

        everything_to_draw.push ImageToDraw.new( row_width_so_far, page_file )
        row_width_so_far += tile_width

      end

    end

    # Is there space for the next block too?

    if (i < (number_of_blocks - 1))

      space_needed_for_next_block_header = header_width( $pages[i+1] )

      if ((row_width_so_far + $minimum_space_to_right_of_block + space_needed_for_next_block_header) < (maximum_row_width))

        continue_this_line = true

      else

        # Draw all the pending stuff...

        height_in_pixels_so_far += draw_these( everything_to_draw,
                                               image_index,
                                               filename_base,
                                               maximum_row_width,
                                               current_page,
                                               just_score )

        if (height_in_pixels_so_far > ((current_page + 1) * maximum_page_height)) && (current_page != ($number_of_pages - 1))
          current_page += 1
        end
        everything_to_draw = Array.new

        image_index += 1

        continue_this_line = false
        row_width_so_far = 0

      end

    else

      # Draw all the pending stuff...

      height_in_pixels_so_far += draw_these( everything_to_draw,
                                             image_index,
                                             filename_base,
                                             maximum_row_width,
                                             current_page,
                                             just_score )

      if (height_in_pixels_so_far > ((current_page + 1) * maximum_page_height)) && (current_page != ($number_of_pages - 1))
        current_page += 1
      end
      everything_to_draw = Array.new

      image_index += 1

      continue_this_line = false
      row_width_so_far = 0

    end

  end

  # Draw an empty one at the bottom...

  unless just_score

    im = GD::Image.new( maximum_row_width, $title_height_in_pixels )

    white = im.colorAllocate( 255, 255, 255 )

    im.filledRectangle( 0, 0, maximum_row_width, $title_height_in_pixels, white )

    url_text = "http://mythic-beasts.com/~mark/random/unicode-poster/"
    url_text_width, ignore_height, ignored_bounds = text_rectangle( url_text, 320 )
    draw_text( im, url_text, maximum_row_width - (url_text_width + 200), $title_height_in_pixels / 2, 320 )

    open( filename_base + sprintf("-%02d-%08d.png",current_page,image_index), "w" ) do |f|
      im.png f
    end

    im.destroy
    im = nil

    GC.start

  end

  height_in_pixels_so_far += $title_height_in_pixels

  height_in_pixels_so_far

end

# Basically trying to solve:
#
#   draw_compacted(x) / x = Math.sqrt(2) * $number_of_pages
#
# ... by binary search (draw_compacted is monotonic decreasing.)
# Here x is the strip width in pixels and draw_compacted(x) is the
# total height of all the strips, so each page comes out roughly
# in A-series proportions.

pixel_width_to_use = nil

if true

  x_lower = 500
  x_higher = $info_hash.length * $max_tile_height

  y_lower = nil
  y_higher = nil

  loop do

    unless y_lower
      y_lower = draw_compacted( x_lower, "ignored", true )
    end
    unless y_higher
      y_higher = draw_compacted( x_higher, "ignored", true )
    end

    puts "---"
    puts " lower guess: #{x_lower} => #{y_lower} (#{y_lower/Float(x_lower)})"
    puts " higher guess: #{x_higher} => #{y_higher} (#{y_higher/Float(x_higher)})"

    x_guess = Integer( (x_lower + x_higher) / 2 )
    y_guess = draw_compacted( x_guess, "ignored", true )

    puts " middle guess: #{x_guess} => #{y_guess} (#{y_guess/Float(x_guess)})"

    if (y_guess == y_lower) && (y_guess == y_higher)
      puts "The guess was the same as both previous ones; good enough."
      pixel_width_to_use = x_guess
      break
    end

    y_ideal = Math.sqrt(2) * x_guess * $number_of_pages

    if y_guess > y_ideal
      x_lower = x_guess
      y_lower = y_guess
    else
      x_higher = x_guess
      y_higher = y_guess
    end

    minimum_acceptable = y_ideal
    maximum_acceptable = y_ideal + $block_header_height_in_pixels + $max_tile_height

    if y_guess.between_inclusive?( minimum_acceptable, maximum_acceptable )
      puts "The guess was within the acceptable boundaries, breaking."
      pixel_width_to_use = x_guess
      break
    end

  end

else
  pixel_width_to_use = 100000
end

# Now finally draw the images.

draw_compacted( pixel_width_to_use, "poster-inline-breaks", false )

--------------------------------------------------------------------------------
/compose-pnm-files-multi-page:
--------------------------------------------------------------------------------
#!/usr/bin/ruby -w

pngtopnm = "pngtopnm"
pnmtopng = "pnmtopng"

page_glob_template = 'poster-inline-breaks-%02d-00*.pnm'
complete_page_template_pnm = 'poster-complete-%02d.pnm'
complete_page_template_png = 'poster-complete-%02d.png'

png_files = Dir['poster-inline-breaks-*-00*.png']
png_files.sort!

page_numbers = png_files.collect { |e| e.gsub(/poster-inline-breaks-(\d\d)-.*/,'\1') }.sort.uniq

puts "There seem to be #{page_numbers.length} pages of poster..."

widths = Array.new( page_numbers.length )
heights = Array.new( page_numbers.length, 0 )

# Calculate the final dimensions of each poster:

png_files.each do |png_file|
  png_width, png_height = `png-find-grid-revised/png-size #{png_file}`.chomp.split(/x/).collect { |e| Integer(e) }
  png_file =~ /^poster-inline-breaks-(\d+)-/
  page = Integer($1)
  if widths[page]
    if png_width != widths[page]
      raise "BUG: Not all the same width!"
    end
  else
    widths[page] = png_width
  end
  heights[page] += png_height
end

puts "Poster pages will have dimensions:"
widths.each_index do |i|
  puts sprintf( "%02d: %d x %d", i, widths[i], heights[i] )
end

# Now create the huge PNM files, by stripping the headings from each
# and concatenating:

pages_started = Array.new( page_numbers.length, false )

png_files.each do |png_file|

  png_file =~ /^poster-inline-breaks-(\d+)-/
  page_string = $1
  page = Integer(page_string)

  output_filename = sprintf(complete_page_template_pnm,page);

  unless pages_started[page]
    puts "Starting new page: #{page}"
    # Output the heading and indicate we've started:
    output_filename = sprintf(complete_page_template_pnm,page)
    open( output_filename, "w" ) do |o|
      o.puts("P5")
      o.puts("#{widths[page]} #{heights[page]}")
      o.puts("255")
    end
    pages_started[page] = true
  end

  puts " Adding strip: #{png_file}"
  open( output_filename, "a" ) do |o|
    # Fork: the parent reads the child's stdout, the child execs pngtopnm.
    Kernel.open( "|-", "r" ) do |f|
      if f
        f.gets # Read the P5
        f.gets # Read the width and height
        f.gets # Read the 255
        o.write(f.read)
      else
        begin
          exec( pngtopnm, png_file )
        rescue
          raise "Couldn't exec #{pngtopnm} #{png_file}: #{$!}\n"
        end
      end
    end
  end

end

## This takes ages and is basically pointless since we almost
## always want to work with a reduced size version instead.
#
# page_numbers.each_index do |i|
#   puts "Creating PNG version of page #{i}:"
#   pnm = sprintf( complete_page_template_pnm, i )
#   png = sprintf( complete_page_template_png, i )
#   system "#{pnmtopng} #{pnm} > #{png}"
# end

--------------------------------------------------------------------------------
/copy-characters-to-webspace:
--------------------------------------------------------------------------------
#!/usr/bin/ruby -w

require 'yaml'

if ARGV.length != 2
  puts "Usage: CORRECTED-CODEPOINTS-FILE DESTINATION-DIRECTORY"
  exit(-1)
end

codepoints_filename = ARGV[0]
destination_directory = ARGV[1]

data = open(codepoints_filename,"r") { |f| YAML.load(f.read) }

data.each do |e|

  name = e[0]
  codepoint = e[1]

  new_name = sprintf("0x%06X.png",codepoint)

  puts "Copying #{name} to #{new_name}"

  # File.join means the destination works with or without a trailing slash.
  unless system( "convert",
                 "-scale",
                 "50%",
                 "individual-characters/" + name.gsub(".png","-top.png"),
                 File.join( destination_directory, "reduced-" + new_name ) )
    puts "Failed to convert and copy: #{name}"
    exit(-1)
  end
end

--------------------------------------------------------------------------------
/dice-posters:
--------------------------------------------------------------------------------
#!/usr/bin/ruby -w

subdirectory = "diced-#{Time.now.to_i}"
# subdirectory = "diced-hello"

pamdice = "/home/mark/netpbm-10.34/package/bin/pamdice"
pnmtops = "/home/mark/netpbm-10.34/package/bin/pnmtops"

system("mkdir",subdirectory)

originals = Dir["poster-complete-*-A0-600dpi.pam"]

originals.each do |o|
  # Find the size:
  width = height = nil
  open(o) do |f|
    f.gets
    unless f.gets =~ /^(\d+) (\d+)/
      STDERR.puts "This doesn't look like a pnm file to me."
      exit(-1)
    end
    width = Integer($1)
    height = Integer($2)
  end
  split_width = (width / 4.0).ceil
  split_height = (height / 4.0).ceil
  # Get the page number:
  unless o =~ /poster-complete-(\d{2})/
    puts "Couldn't find page number from filename: #{o}"
    exit(-1)
  end
  page = $1
  system(pamdice,
         "-outstem=#{subdirectory}/diced-#{page}",
         "-width=#{split_width}",
         "-height=#{split_height}",
         o)

end

# Convert each to EPS:
pgm_files = Dir[subdirectory+"/*.pgm"]
pgm_files.each do |p|
  epsfile = p.gsub(/\.pgm/,'.eps')
  # full A4 would be 11.75 (ish)
  command = "#{pnmtops} -height 11 -imageheight 11 -psfilter -rle #{p} > #{epsfile}"
  puts command
  system command
end

--------------------------------------------------------------------------------
/generate-output-multi-page:
--------------------------------------------------------------------------------
#!/usr/bin/ruby -w

def usage_and_exit
  STDERR.puts <<EOUSAGE
Usage: <paper> [ <dpi> ]

... where <paper> is A4 or A0, and <dpi> is probably 300 or 600.  The
value of <dpi> defaults to 600, and the script will generate .pcl and
.pjl output if it\'s 600.

EOUSAGE
  exit(-1)
end

unless (((ARGV.length == 1) || (ARGV.length == 2)) && ((ARGV[0] == "A4") || (ARGV[0] == "A0")))
  usage_and_exit
end

paper = ARGV[0]

if paper == "A0"
  paper_width_in_inches = 33.11
  paper_height_in_inches = 46.81
elsif paper == "A4"
  paper_width_in_inches = 8.27
  paper_height_in_inches = 11.69
end

dpi = ARGV[1]
if dpi
  begin
    dpi = Integer(dpi)
  rescue
    usage_and_exit
  end
else
  dpi = 600
end


puts "paper_*_in_inches #{paper_width_in_inches} x #{paper_height_in_inches}"

pnmtops = "pnmtops"
pamscale = "pnmscale"
pnmtopng = "pnmtopng"
pgmtoppm = "pgmtoppm"

# ------------------------------------------------------------------------
# You probably don't have these:
# ppmtopcl3 = "/home/mark/ppmtopcl3"
# pjlsetpcl = "/home/mark/pjl-setpcl"
# pjlreset = "/home/mark/pjl-reset"
# ------------------------------------------------------------------------

pnm_files = Dir['poster-complete-*.pnm']
pnm_files.sort!

pnm_files.each do |pnmfile|

  # Strip only a literal ".pnm" extension from the end of the name:
  pnmfile_base = pnmfile.sub(/\.pnm\z/,'')
  pnmfile_reduce = pnmfile_base + "-#{paper}-#{dpi}dpi.pam"
  pngfile_reduce = pnmfile_base + "-#{paper}-#{dpi}dpi.png"
  ppmfile_reduce = pnmfile_base + "-#{paper}-#{dpi}dpi.ppm"
  # pclfile_reduce = pnmfile_base + "-#{paper}-#{dpi}dpi.pcl"
  # pjlfile_reduce = pnmfile_base + "-#{paper}-#{dpi}dpi.pjl"
  epsfile = pnmfile_base + "-#{paper}-#{dpi}dpi.eps"

  real_width = nil
  real_height = nil

  open(pnmfile) do |f|
    f.gets
    line = f.gets
    unless line =~ /^(\d+) (\d+)/
      STDERR.puts "This doesn't look like a pnm file to me."
      exit(-1)
    end

    real_width = Integer($1)
    real_height = Integer($2)

  end

  scaling_width = real_width + 200
  scaling_height = real_height

  puts "width with margins for scaling (scaling_width) #{scaling_width}"
  puts "height with margins for scaling (scaling_height) #{scaling_height}"

  # pnmtops likes dimensions in inches

  paper_width_in_inches_smaller = paper_width_in_inches * 0.99
  paper_height_in_inches_smaller = paper_height_in_inches * 0.99

  puts "paper_*_in_inches_smaller #{paper_width_in_inches_smaller} x #{paper_height_in_inches_smaller}"

  # Not exactly sqrt(2) in fact...

  proportions_of_paper = paper_height_in_inches / paper_width_in_inches

  proportions_of_bitmap = scaling_height / Float(scaling_width)

  final_height_in_inches = nil
  final_width_in_inches = nil

  if( proportions_of_bitmap > proportions_of_paper )

    puts("scaling to height...")
    # Then the height in our bitmap is the one we should scale to.

    final_height_in_pixels_with_possible_margin = Integer(paper_height_in_inches_smaller * Float(dpi))
    puts " final_height_in_pixels_with_possible_margin: #{final_height_in_pixels_with_possible_margin}"

    pixel_scaling = scaling_height / Float(final_height_in_pixels_with_possible_margin)
    puts " pixel_scaling: #{pixel_scaling}"

    final_width_in_pixels = Integer(real_width / pixel_scaling)
    final_height_in_pixels = Integer(real_height / pixel_scaling)
    puts " final_*_in_pixels: #{final_width_in_pixels} x #{final_height_in_pixels}"

    final_width_in_inches = final_width_in_pixels / Float(dpi)
    final_height_in_inches = final_height_in_pixels / Float(dpi)
    puts " final_*_in_inches: #{final_width_in_inches} x #{final_height_in_inches}"

    if FileTest.exists?( pnmfile_reduce )
      puts " #{pnmfile_reduce} exists, not regenerating..."
    else
      puts " Generating reduced PNM."
      command = "#{pamscale} -height #{final_height_in_pixels} #{pnmfile} > #{pnmfile_reduce}"
      puts " Running: #{command}"
      system command
    end
    if FileTest.exists?( pngfile_reduce )
      puts " #{pngfile_reduce} exists, not regenerating..."
    else
      puts " Generating reduced PNG from that."
      command = "#{pnmtopng} -compression 3 #{pnmfile_reduce} > #{pngfile_reduce}"
      puts " Running: #{command}"
      system command
    end

    command = "#{pnmtops} -imageheight #{final_height_in_inches} #{pnmfile_reduce} > #{epsfile}"
    puts " Generating EPS file: #{command}"
    system command

  else

    puts("scaling to width...")
    # Then the width in our bitmap is the one we should scale to.
150 | 151 | final_width_in_pixels_with_possible_margin = Integer(paper_width_in_inches_smaller * Float(dpi)) 152 | puts " final_width_in_pixels_with_possible_margin: #{final_width_in_pixels_with_possible_margin}" 153 | 154 | pixel_scaling = scaling_width / Float(final_width_in_pixels_with_possible_margin) 155 | puts " pixel_scaling: #{pixel_scaling}" 156 | 157 | final_width_in_pixels = Integer(real_width / pixel_scaling) 158 | final_height_in_pixels = Integer(real_height / pixel_scaling) 159 | puts " final_*_in_pixels: #{final_width_in_pixels} x #{final_height_in_pixels}" 160 | 161 | final_width_in_inches = final_width_in_pixels / Float(dpi) 162 | final_height_in_inches = final_height_in_pixels / Float(dpi) 163 | puts " final_*_in_inches: #{final_width_in_inches} x #{final_height_in_inches}" 164 | 165 | if FileTest.exist?( pnmfile_reduce ) 166 | puts " #{pnmfile_reduce} exists, not regenerating..." 167 | else 168 | puts " Generating reduced PNM." 169 | command = "#{pamscale} -width #{final_width_in_pixels} #{pnmfile} > #{pnmfile_reduce}" 170 | puts " Running: #{command}" 171 | system command 172 | end 173 | if FileTest.exist?( pngfile_reduce ) 174 | puts " #{pngfile_reduce} exists, not regenerating..." 175 | else 176 | puts " Generating reduced PNG from that." 
177 | command = "#{pnmtopng} -compression=3 #{pnmfile_reduce} > #{pngfile_reduce}" 178 | puts " Running: #{command}" 179 | system command 180 | end 181 | 182 | command = "#{pnmtops} -imageheight #{final_height_in_inches} #{pnmfile_reduce} > #{epsfile}" 183 | puts " Generating EPS file: #{command}" 184 | system command 185 | 186 | end 187 | 188 | # if 600 == dpi 189 | # 190 | # command = "#{pgmtoppm} rgb:ff/ff/ff #{pnmfile_reduce} > #{ppmfile_reduce}" 191 | # puts " Generating PPM file: #{command}" 192 | # system command 193 | # 194 | # command = "#{ppmtopcl3} < #{ppmfile_reduce} > #{pclfile_reduce}" 195 | # puts " Generating PCL file: #{command}" 196 | # system command 197 | # 198 | # command = "cat #{pjlsetpcl} #{pclfile_reduce} #{pjlreset} > #{pjlfile_reduce}" 199 | # puts " Generating PJL file: #{command}" 200 | # system command 201 | # 202 | # end 203 | 204 | end 205 | -------------------------------------------------------------------------------- /individual-characters/.gitignore: -------------------------------------------------------------------------------- 1 | *.png.empty 2 | -------------------------------------------------------------------------------- /make-unicode-poster: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env jruby 2 | 3 | require 'tileinfo.rb' 4 | require 'thread' 5 | require 'tempfile' 6 | 7 | include Java 8 | 9 | def jruby_safe_backticks(command,output_expected=false) 10 | # This is horrible, but for some reason IO.popen (like backticks) 11 | # sometimes still fails with an IOError, so retry up to three times. 
12 | # See my bug report here: 13 | # http://jira.codehaus.org/browse/JRUBY-4909 14 | attempts = 0 15 | while attempts < 3 16 | begin 17 | io = IO.popen(command) 18 | data = io.read 19 | io.close 20 | unless output_expected and (data.length == 0) 21 | return data 22 | end 23 | rescue IOError => e 24 | STDERR.puts "IOError (#{e.to_s}) running #{command}" 25 | end 26 | attempts += 1 27 | end 28 | STDERR.puts "Still failed after #{attempts} attempts: #{command}" 29 | return "" 30 | end 31 | 32 | # The directory with the helper files: 33 | $helpers = "png-find-grid-revised" 34 | 35 | unless system("cd #{$helpers} && make") 36 | puts "Failed to build the helper programs. Is libpng12-dev installed?" 37 | exit(-1) 38 | end 39 | 40 | pdf_name = 'CodeCharts.pdf' 41 | pdf_url = "http://www.unicode.org/Public/6.0.0/charts/CodeCharts.pdf" 42 | 43 | last_png_file = 'U-849.png' 44 | 45 | unless FileTest.exist?(pdf_name) 46 | system("curl","-q","-o",pdf_name,pdf_url) 47 | end 48 | 49 | unless FileTest.exist?(last_png_file) 50 | unless system( "pdftoppm", 51 | "-r", "800", 52 | "-png", 53 | pdf_name, 54 | "U") 55 | puts "Failed to convert #{pdf_name} to a series of PNG files" 56 | exit(-1) 57 | end 58 | end 59 | 60 | puts "Globbing directory..." 
61 | 62 | png_compare = lambda { |a| a =~ /U-([0-9]+)\.png/; $1.to_i(10) } 63 | 64 | png_files = Dir['U-*.png'] 65 | png_files = png_files.sort_by(&png_compare) 66 | 67 | $pngs_empty = {} 68 | 69 | Dir['U-*.png.empty'].each do |already_done| 70 | $pngs_empty[already_done] = true 71 | end 72 | 73 | def find_grid_and_crop(png_file) 74 | 75 | png_output_template = "individual-characters/" + png_file.gsub(/\.png/,"-%08d.png") 76 | 77 | skipping = false 78 | 79 | $mutex.synchronize { 80 | $condition.wait($mutex) while $running_threads >= $max_threads 81 | $running_threads += 1 82 | $condition.signal if $running_threads < $max_threads 83 | 84 | if $pngs_empty[png_file+".empty"] 85 | puts "Looks as if we've previously found #{png_file} to be empty; skipping..." 86 | skipping = true 87 | elsif FileTest.exist? sprintf(png_output_template,0) 88 | puts "At least one output file from "+png_file+"; skipping..." 89 | skipping = true 90 | else 91 | puts "Splitting: "+png_file 92 | end 93 | } 94 | 95 | unless skipping 96 | 97 | last_in_each_cell_y = nil 98 | first_in_each_cell_y = nil 99 | 100 | last_in_each_cell_x = nil 101 | first_in_each_cell_x = nil 102 | 103 | # The find-grid program gets certain code pages wrong, and one of 104 | # the PDF files shouldn't be in there. 105 | 106 | # if png_file =~ /U2580-002\.png/ 107 | # 108 | # # Fix this one... 
109 | # 110 | # last_in_each_cell_x = [ 2247, 2592 ] 111 | # last_in_each_cell_y = [ 1244, 1684, 2124, 2564, 3004, 3444, 3884, 4324, 4764, 5204, 5644, 6084, 6524, 6964, 7404, 7837 ] 112 | # first_in_each_cell_x = [ 1908, 2252 ] 113 | # first_in_each_cell_y = [ 816, 1249, 1689, 2129, 2569, 3009, 3449, 3889, 4329, 4769, 5209, 5649, 6089, 6529, 6969, 7409 ] 114 | # 115 | # elsif png_file =~ /UFFF0-002\.png/ 116 | # 117 | # last_in_each_cell_x = [ 2414 ] 118 | # last_in_each_cell_y = [ 5204, 5644, 6084, 6524, 6964 ] 119 | # first_in_each_cell_x = [ 2085 ] 120 | # first_in_each_cell_y = [ 4769, 5209, 5649, 6089, 6529 ] 121 | # 122 | # elsif png_file =~ /UFB50-004\.png/ 123 | # 124 | # last_in_each_cell_x = [ 1387, 1739, 2090, 2442, 2794, 3146, 3497, 3849, 4200, 4552, 4904, 5952 ] 125 | # last_in_each_cell_y = [ 1244, 1684, 2124, 2564, 3004, 3444, 3884, 4324, 4764, 5204, 5644, 6084, 6524, 6964, 7404, 7837 ] 126 | # first_in_each_cell_x = [ 1041, 1392, 1744, 2095, 2447, 2799, 3151, 3502, 3854, 4205, 4557, 5612 ] 127 | # first_in_each_cell_y = [ 816, 1249, 1689, 2129, 2569, 3009, 3449, 3889, 4329, 4769, 5209, 5649, 6089, 6529, 6969, 7409 ] 128 | 129 | if false 130 | 131 | else 132 | 133 | command = "#{$helpers}/find-grid #{png_file} 333 427" 134 | grid_results_output = jruby_safe_backticks(command,output_expected=true) 135 | if grid_results_output.length > 1 136 | grid_results_lines = grid_results_output.split("\n") 137 | grid_results_lines.each do |line| 138 | line.chomp! 
139 | next if line =~ /image height is/ 140 | values = line.gsub(/^(.*): *()/,'\2').split(/ +/) 141 | name = line.gsub(/^ *(.*):.*$/,'\1') 142 | case name 143 | when 'last_in_each_cell_x' 144 | last_in_each_cell_x = values 145 | when 'last_in_each_cell_y' 146 | last_in_each_cell_y = values 147 | when 'first_in_each_cell_x' 148 | first_in_each_cell_x = values 149 | when 'first_in_each_cell_y' 150 | first_in_each_cell_y = values 151 | else 152 | STDERR.puts "Unparseable line from command '#{command}': #{line}" 153 | exit(-1) 154 | end 155 | end 156 | else 157 | STDERR.puts "'#{command}' failed" 158 | exit(-1) 159 | end 160 | 161 | end 162 | 163 | cells_width = last_in_each_cell_x.length 164 | cells_height = last_in_each_cell_y.length 165 | 166 | if ((cells_width == 0) && (cells_height != 0)) || ((cells_width != 0) && (cells_height == 0)) 167 | message = "Broken: divided #{png_file} into #{cells_width} by #{cells_height}" 168 | STDERR.puts message 169 | end 170 | 171 | if ((cells_width == 0) || (cells_height == 0)) 172 | system "touch #{png_file}.empty" 173 | else 174 | 175 | puts " Cropping from #{png_file}" 176 | 177 | crop_specification_filename = png_file + ".crop" 178 | open(crop_specification_filename,"w") do |f| 179 | 180 | c_number = 0 181 | 0.upto( cells_width - 1 ) do |cell_x| 182 | 0.upto( cells_height - 1 ) do |cell_y| 183 | right_x = Integer(last_in_each_cell_x[cell_x]) 184 | left_x = Integer(first_in_each_cell_x[cell_x]) 185 | top_y = Integer(first_in_each_cell_y[cell_y]) 186 | bottom_y = Integer(last_in_each_cell_y[cell_y]) 187 | output_filename = sprintf( png_output_template, c_number ) 188 | unless FileTest.exist?( output_filename ) 189 | f.puts( "#{output_filename} #{left_x} #{top_y} #{(right_x - left_x) + 1} #{(bottom_y - top_y) + 1}" ) 190 | end 191 | c_number += 1 192 | end 193 | end 194 | end 195 | 196 | if not system("#{$helpers}/crop-images",png_file,crop_specification_filename) 197 | puts "#{$helpers}/crop-image #{png_file} 
#{crop_specification_filename} failed" 198 | exit(-1) 199 | end 200 | 201 | end 202 | end 203 | end 204 | 205 | $max_threads = java.lang.Runtime.getRuntime.availableProcessors 206 | $mutex = Mutex.new 207 | $condition = ConditionVariable.new 208 | $threads = [] 209 | $running_threads = 0 210 | 211 | unless FileTest.exist?( 'individual-characters/U-841-00000239.png' ) 212 | 213 | png_files.each do |png_file| 214 | 215 | puts "Looking at png_file: #{png_file}" 216 | 217 | $threads << Thread.new( png_file ) { |file_to_split| 218 | 219 | find_grid_and_crop(file_to_split) 220 | 221 | $mutex.synchronize { 222 | $running_threads -= 1 223 | $condition.signal if $running_threads < $max_threads 224 | } 225 | } 226 | 227 | end 228 | 229 | $threads.each {|t| t.join} 230 | 231 | end 232 | 233 | puts "Done with threads..." 234 | 235 | 236 | def tile_compare(a) 237 | a.gsub( /^.*U\-(\d+)\-(\d+)/, '\1\2' ).to_i(10) 238 | end 239 | 240 | files = Dir['individual-characters/U-*-*.png'] 241 | 242 | # If this is a repeat run then we may have some leftover -top and 243 | # -bottom files here: 244 | 245 | files.delete_if { |x| x =~ /top/ } 246 | files.delete_if { |x| x =~ /bottom/ } 247 | 248 | files = files.sort_by { |a| tile_compare(a) } 249 | 250 | puts "Got files now..." 251 | 252 | # ------------------------------------------------------------------------ 253 | 254 | def divide_and_ocr(character_filename) 255 | 256 | character_filename =~ /U-([0-9A-F]+)-([0-9A-F]+)/ 257 | 258 | info = jruby_safe_backticks("#{$helpers}/png-size #{character_filename}",output_expected=true) 259 | 260 | width = nil 261 | height = nil 262 | 263 | info.chomp! 
264 | if info =~ /(\d+)x(\d+)/ 265 | 266 | width = Integer($1) 267 | height = Integer($2) 268 | 269 | top_part_fname = character_filename.sub( /.png/, '-top.png' ) 270 | bottom_part_fname = character_filename.sub( /.png/, '-bottom.png' ) 271 | 272 | guessed_text_size = 75 273 | 274 | if height < guessed_text_size 275 | message = "File too short: #{character_filename}" 276 | STDERR.puts message 277 | $mutex.synchronize { 278 | open('splitting.log','a') { |f| f.puts message } 279 | } 280 | else 281 | 282 | text_starts_at = height - guessed_text_size 283 | 284 | t = Tempfile.new('unicode-tile') 285 | t.puts( "#{top_part_fname} 0 0 #{width} #{text_starts_at}" ) 286 | t.puts( "#{bottom_part_fname} 0 #{text_starts_at} #{width} #{guessed_text_size}" ) 287 | t.close() 288 | 289 | if not system("#{$helpers}/crop-images",character_filename,t.path) 290 | puts "#{$helpers}/crop-image #{character_filename} #{t.path} failed" 291 | exit(-1) 292 | end 293 | 294 | top_width = nil 295 | top_height = nil 296 | 297 | output = jruby_safe_backticks("#{$helpers}/empty-image #{top_part_fname}",output_expected=true) 298 | output.chomp! 299 | if output == "Empty" 300 | puts "Empty, skipping: #{character_filename}" 301 | system("rm",top_part_fname) 302 | system("rm",bottom_part_fname) 303 | else 304 | output.chomp! 
305 | if output =~ /^(\d+) (\d+)/ 306 | top_width = Integer($1) 307 | top_height = Integer($2) 308 | end 309 | 310 | # Check that the bottom part isn't hashed out, and delete 311 | # it if so: 312 | 313 | proportion = nil 314 | output = jruby_safe_backticks("#{$helpers}/bottom-line-proportion #{bottom_part_fname}",output_expected=true) 315 | begin 316 | proportion = Float(output) 317 | rescue ArgumentError => e 318 | STDERR.puts "Parsing a proportion out of the output '#{output}' from file #{bottom_part_fname} failed" 319 | end 320 | if proportion and proportion > 0.01 321 | puts "Probably cross-hatched, skipping: #{character_filename}" 322 | system("rm",top_part_fname) 323 | system("rm",bottom_part_fname) 324 | else 325 | 326 | tileinfo = TileInfo.new(top_width,top_height) 327 | tileinfo.filename = top_part_fname 328 | 329 | # Now use OCR to try to parse the codepoint out of the bottom 330 | # part: 331 | 332 | result = jruby_safe_backticks("pngtopnm #{bottom_part_fname} | ocrad -",output_expected=true) 333 | 334 | result.chomp! 335 | result.gsub!( /\s/, '' ) 336 | result.gsub!( /[oO]/, '0' ) 337 | result.gsub!( /a/, '8' ) 338 | result.gsub!( /g/, '9' ) 339 | 340 | result.upcase! 
341 | 342 | tileinfo.ocr_codepoint = "0x" + result 343 | return tileinfo 344 | end 345 | end 346 | end 347 | else 348 | STDERR.puts "'#{$helpers}/png-size #{character_filename}' failed" 349 | end 350 | nil 351 | end 352 | 353 | $threads = [] 354 | $running_threads = 0 355 | 356 | top_sizes = [] 357 | codepoints = [] 358 | 359 | class Codepoint 360 | attr_accessor :filename, :codepoint 361 | def initialize(f,c) 362 | @filename = f 363 | @codepoint = c 364 | end 365 | end 366 | 367 | $max_threads *= 2 368 | 369 | done = 0 370 | 371 | files.each do |fname| 372 | 373 | puts "Considering for OCR #{fname}" 374 | 375 | # break if done > 100 376 | 377 | $mutex.synchronize { 378 | $condition.wait($mutex) while $running_threads >= $max_threads 379 | $running_threads += 1 380 | $condition.signal if $running_threads < $max_threads 381 | } 382 | 383 | thread = Thread.new(fname) { |file_to_process| 384 | 385 | Thread.current[:tileinfo] = divide_and_ocr(file_to_process) 386 | 387 | $mutex.synchronize { 388 | if Thread.current[:tileinfo] 389 | top_sizes << Thread.current[:tileinfo] 390 | end 391 | 392 | $running_threads -= 1 393 | $condition.signal if $running_threads < $max_threads 394 | } 395 | } 396 | 397 | # Go through the list of threads, find any that have finished, join 398 | # them, and remove all the finished threads from the array afterwards: 399 | $mutex.synchronize { 400 | indices_to_remove = [] 401 | $threads.each_index do |i| 402 | if $threads[i] 403 | status = $threads[i].status 404 | if status == false or status == nil 405 | $threads[i].join 406 | indices_to_remove << i 407 | end 408 | end 409 | end 410 | indices_to_remove.reverse.each do |i| 411 | $threads.delete_at i 412 | end 413 | $threads << thread 414 | } 415 | 416 | done += 1 417 | 418 | end 419 | 420 | $threads.each {|t| t.join} 421 | 422 | open( "top-sizes.yaml", "w" ) do |o_top_sizes| 423 | o_top_sizes.puts "---" 424 | top_sizes.each do |t| 425 | o_top_sizes.puts(t.to_yaml_hash_element) 426 | end 427 | 
end 428 | 429 | open( "codepoints.yaml", "w" ) do |o_codepoints| 430 | top_sizes.each do |t| 431 | o_codepoints.puts "-" 432 | o_codepoints.puts ' - "' + t.filename + '"' 433 | o_codepoints.puts " - " + t.ocr_codepoint 434 | end 435 | end 436 | -------------------------------------------------------------------------------- /png-find-grid-revised/.gitignore: -------------------------------------------------------------------------------- 1 | crop-images 2 | empty-image 3 | find-grid 4 | png-size 5 | bottom-line-proportion 6 | -------------------------------------------------------------------------------- /png-find-grid-revised/Makefile: -------------------------------------------------------------------------------- 1 | .PHONY : all clean 2 | 3 | BINARIES = find-grid empty-image png-size crop-images bottom-line-proportion 4 | 5 | all : $(BINARIES) 6 | 7 | clean : 8 | rm -vf $(BINARIES) 9 | 10 | CPPFLAGS=-Wall -g -O3 11 | LDFLAGS=-lpng12 12 | 13 | find-grid : find-grid.c readpng.c 14 | 15 | empty-image : empty-image.c readpng.c 16 | 17 | png-size : png-size.c readpng.c 18 | 19 | crop-images : crop-images.c readpng.c 20 | 21 | bottom-line-proportion : bottom-line-proportion.c readpng.c 22 | 23 | -------------------------------------------------------------------------------- /png-find-grid-revised/bottom-line-proportion.c: -------------------------------------------------------------------------------- 1 | /* 2 | 3 | */ 4 | 5 | #include <stdio.h> 6 | #include <stdlib.h> 7 | 8 | #include "png.h" /* libpng header; includes zlib.h */ 9 | #include "readpng.h" /* typedefs, common macros, public prototypes */ 10 | 11 | #define PROGNAME "bottom-line-proportion" 12 | 13 | int main( int argc, char **argv ) { 14 | 15 | int x, y; 16 | 17 | char * filename; 18 | FILE * infile; 19 | int rc; 20 | unsigned long image_width, image_height, image_rowbytes, image_depth; 21 | int image_channels; 22 | unsigned char *image_data; 23 | 24 | if( argc != 2 ) { 25 | printf("Usage: " PROGNAME 
" <png-file-name>\n"); 26 | return -1; 27 | } 28 | 29 | filename = argv[1]; 30 | 31 | infile = fopen(filename, "rb"); 32 | 33 | if( !infile ) { 34 | fprintf(stderr, PROGNAME ": can't open PNG file [%s]\n", filename); 35 | return -1; 36 | 37 | } 38 | 39 | if ((rc = readpng_init(infile, &image_width, &image_height, &image_depth)) != 0) { 40 | 41 | switch (rc) { 42 | case 1: 43 | fprintf(stderr, PROGNAME 44 | ": [%s] is not a PNG file: incorrect signature\n", 45 | filename); 46 | return -1; 47 | case 2: 48 | fprintf(stderr, PROGNAME 49 | ": [%s] has bad IHDR (libpng longjmp)\n", 50 | filename); 51 | return -1; 52 | case 4: 53 | fprintf(stderr, PROGNAME ": insufficient memory\n"); 54 | return -1; 55 | default: 56 | fprintf(stderr, PROGNAME 57 | ": unknown readpng_init() error\n"); 58 | return -1; 59 | } 60 | } 61 | 62 | /* 63 | printf(" %s is %lu by %lu\n", filename, image_width, image_height ); 64 | printf(" %s has depth: %lu\n", filename, image_depth ); 65 | */ 66 | 67 | image_data = readpng_get_image(1.0, &image_channels, &image_rowbytes); 68 | 69 | /* 70 | printf(" %s has row_bytes: %lu\n", filename, image_rowbytes ); 71 | */ 72 | 73 | y = image_height - 1; 74 | int non_white_in_bottom_line = 0; 75 | for( x = 0; x < image_width; ++x ) { 76 | unsigned char r, g, b; 77 | r = image_data[y*image_rowbytes+(x*3)+0]; 78 | g = image_data[y*image_rowbytes+(x*3)+1]; 79 | b = image_data[y*image_rowbytes+(x*3)+2]; 80 | if( (r<254)||(g<254)||(b<254) ) { 81 | ++ non_white_in_bottom_line; 82 | } 83 | } 84 | float proportion = non_white_in_bottom_line / (float)image_width; 85 | printf("%f\n",proportion); 86 | 87 | readpng_cleanup(TRUE); 88 | fclose(infile); 89 | 90 | return 0; 91 | 92 | } 93 | 94 | -------------------------------------------------------------------------------- /png-find-grid-revised/crop-images.c: -------------------------------------------------------------------------------- 1 | /* 2 | 3 | */ 4 | 5 | #include <stdio.h> 6 | #include <stdlib.h> 7 | 8 | #include 
"png.h" /* libpng header; includes zlib.h */ 9 | #include "readpng.h" /* typedefs, common macros, public prototypes */ 10 | 11 | #define PROGNAME "crop-images" 12 | 13 | typedef struct { 14 | 15 | char * output_filename; 16 | unsigned long x; 17 | unsigned long y; 18 | unsigned long width; 19 | unsigned long height; 20 | 21 | } target; 22 | 23 | 24 | int main( int argc, char **argv ) { 25 | 26 | target targets[512]; 27 | int number_of_targets = 0; 28 | char * filename = NULL; 29 | char * crop_specification_filename = NULL; 30 | FILE * infile; 31 | char line_buffer[512]; 32 | 33 | int rc; 34 | unsigned long image_width, image_height, image_rowbytes, image_depth; 35 | int image_channels; 36 | unsigned char *image_data; 37 | 38 | int i; 39 | 40 | if( argc != 3 ) { 41 | printf("Usage: crop-images <input-png-file-name> <crop-specification>\n"); 42 | return -1; 43 | } 44 | 45 | filename = argv[1]; 46 | crop_specification_filename = argv[2]; 47 | 48 | FILE * specification_file = fopen(crop_specification_filename,"r"); 49 | if( ! specification_file ) { 50 | fprintf(stderr,"Failed to open %s.\n",crop_specification_filename); 51 | return -1; 52 | } 53 | 54 | while( fgets(line_buffer,511,specification_file) ) { 55 | 56 | targets[number_of_targets].output_filename = malloc(512); 57 | if( ! 
targets[number_of_targets].output_filename ) { 58 | fprintf(stderr,"Failed to allocate 512 bytes.\n"); 59 | return -1; 60 | } 61 | 62 | int result = sscanf( line_buffer, 63 | "%s %lu %lu %lu %lu", 64 | targets[number_of_targets].output_filename, 65 | &(targets[number_of_targets].x), 66 | &(targets[number_of_targets].y), 67 | &(targets[number_of_targets].width), 68 | &(targets[number_of_targets].height) ); 69 | 70 | if( result != 5 ) { 71 | fprintf( stderr, PROGNAME ": a line in the crop specification wasn't in the right format.\n" ); 72 | } 73 | 74 | /* 75 | printf( "%d:\n", number_of_targets ); 76 | printf( " filename: %s\n", targets[number_of_targets].output_filename ); 77 | printf( " x: %lu\n", targets[number_of_targets].x ); 78 | printf( " y: %lu\n", targets[number_of_targets].y ); 79 | printf( " width: %lu\n", targets[number_of_targets].width ); 80 | printf( " height: %lu\n", targets[number_of_targets].height ); 81 | */ 82 | 83 | ++ number_of_targets; 84 | } 85 | if(fclose(specification_file)) { 86 | fprintf(stderr, PROGNAME ": failed to close specification file [%s]\n", crop_specification_filename); 87 | return -1; 88 | } 89 | 90 | infile = fopen(filename, "rb"); 91 | 92 | if( !infile ) { 93 | fprintf(stderr, PROGNAME ": can't open PNG file [%s]\n", filename); 94 | return -1; 95 | 96 | } 97 | 98 | if ((rc = readpng_init(infile, &image_width, &image_height, &image_depth)) != 0) { 99 | switch (rc) { 100 | case 1: 101 | fprintf(stderr, PROGNAME 102 | ": [%s] is not a PNG file: incorrect signature\n", 103 | filename); 104 | return -1; 105 | case 2: 106 | fprintf(stderr, PROGNAME 107 | ": [%s] has bad IHDR (libpng longjmp)\n", 108 | filename); 109 | return -1; 110 | case 4: 111 | fprintf(stderr, PROGNAME ": insufficient memory\n"); 112 | return -1; 113 | default: 114 | fprintf(stderr, PROGNAME 115 | ": unknown readpng_init() error\n"); 116 | return -1; 117 | } 118 | } 119 | 120 | // printf(" %s is %lu by %lu\n", filename, image_width, image_height ); 121 | // printf(" %s has depth: %lu\n", 
filename, image_depth ); 122 | 123 | image_data = readpng_get_image(1.0, &image_channels, &image_rowbytes); 124 | 125 | // printf(" %s has row_bytes: %lu\n", filename, image_rowbytes ); 126 | 127 | for( i = 0; i < number_of_targets; ++i ) { 128 | 129 | png_bytepp rows = NULL; 130 | png_infop info_ptr = NULL; 131 | unsigned long width = targets[i].width; 132 | unsigned long height = targets[i].height; 133 | unsigned long x = targets[i].x; 134 | unsigned long y = targets[i].y; 135 | unsigned int pitch; 136 | char * output_filename = targets[i].output_filename; 137 | png_structp png_ptr; 138 | unsigned int j, k; 139 | 140 | FILE * fp = fopen( output_filename, "wb" ); 141 | if( ! fp ) { 142 | fprintf(stderr, PROGNAME ": can't open PNG file [%s] for output\n", 143 | output_filename); 144 | return -1; 145 | } 146 | 147 | rows = malloc( height * sizeof(png_bytep) ); 148 | 149 | png_ptr = png_create_write_struct( PNG_LIBPNG_VER_STRING, 0, 0, 0 ); 150 | if( ! png_ptr ) { 151 | fprintf(stderr, PROGNAME "Failed to allocate a png_structp\n" ); 152 | free(rows); 153 | fclose(fp); 154 | return -1; 155 | } 156 | 157 | info_ptr = png_create_info_struct( png_ptr ); 158 | if( ! info_ptr ) { 159 | fprintf(stderr, PROGNAME "Failed to allocate a png_infop" ); 160 | png_destroy_write_struct( &png_ptr, 0 ); 161 | free(rows); 162 | fclose( fp ); 163 | return -1; 164 | } 165 | 166 | if( setjmp( png_ptr->jmpbuf ) ) { 167 | fprintf(stderr, PROGNAME "Failed to setjmp\n" ); 168 | png_destroy_write_struct( &png_ptr, 0 ); 169 | free(rows); 170 | fclose(fp); 171 | return -1; 172 | } 173 | 174 | png_init_io( png_ptr, fp ); 175 | 176 | png_set_compression_level( png_ptr, 3 ); 177 | 178 | png_set_IHDR( png_ptr, 179 | info_ptr, 180 | width, 181 | height, 182 | 8, 183 | PNG_COLOR_TYPE_GRAY, 184 | PNG_INTERLACE_NONE, 185 | PNG_COMPRESSION_TYPE_DEFAULT, // The only option... 
186 | PNG_FILTER_TYPE_DEFAULT ); 187 | 188 | png_write_info(png_ptr, info_ptr); 189 | 190 | if( setjmp( png_ptr->jmpbuf ) ) { 191 | fprintf(stderr, PROGNAME "Failed to setjmp\n" ); 192 | png_destroy_write_struct( &png_ptr, 0 ); 193 | free(rows); 194 | fclose(fp); 195 | return -1; 196 | } 197 | 198 | pitch = png_get_rowbytes( png_ptr, info_ptr ); 199 | 200 | // printf( "pitch is: %d\n",pitch ); 201 | 202 | for( j = 0; j < height; ++j ) { 203 | unsigned char * row = malloc(pitch); 204 | if( ! row ) { 205 | int l; 206 | fprintf(stderr, PROGNAME "Failed to allocate a row\n" ); 207 | png_destroy_write_struct( &png_ptr, 0 ); 208 | for( l = 0; l < j; ++l ) { 209 | free(rows[l]); 210 | } 211 | free(rows); 212 | fclose(fp); 213 | return -1; 214 | } 215 | for( k = 0; k < width; ++k ) { 216 | row[k] = image_data[(y + j)*image_rowbytes + (3 * (x + k))]; 217 | } 218 | rows[j] = row; 219 | } 220 | 221 | png_write_image( png_ptr, rows ); 222 | 223 | png_write_end( png_ptr, 0 ); 224 | 225 | png_destroy_write_struct( &png_ptr, &info_ptr ); 226 | 227 | for( j = 0; j < height; ++j ) { 228 | free(rows[j]); 229 | } 230 | free( rows ); 231 | fclose( fp ); 232 | free( targets[i].output_filename ); 233 | 234 | } 235 | 236 | return 0; 237 | 238 | } 239 | 240 | -------------------------------------------------------------------------------- /png-find-grid-revised/empty-image.c: -------------------------------------------------------------------------------- 1 | /* 2 | 3 | */ 4 | 5 | #include <stdio.h> 6 | #include <stdlib.h> 7 | 8 | #include "png.h" /* libpng header; includes zlib.h */ 9 | #include "readpng.h" /* typedefs, common macros, public prototypes */ 10 | 11 | #define PROGNAME "empty-image" 12 | 13 | int main( int argc, char **argv ) { 14 | 15 | int x, y; 16 | 17 | char * filename; 18 | FILE * infile; 19 | int rc; 20 | unsigned long image_width, image_height, image_rowbytes, image_depth; 21 | int image_channels; 22 | unsigned char *image_data; 23 | 24 | if( argc != 2 ) { 25 | 
printf("Usage: " PROGNAME " <png-file-name>\n"); 26 | return -1; 27 | } 28 | 29 | filename = argv[1]; 30 | 31 | infile = fopen(filename, "rb"); 32 | 33 | if( !infile ) { 34 | fprintf(stderr, PROGNAME ": can't open PNG file [%s]\n", filename); 35 | return -1; 36 | 37 | } 38 | 39 | if ((rc = readpng_init(infile, &image_width, &image_height, &image_depth)) != 0) { 40 | 41 | switch (rc) { 42 | case 1: 43 | fprintf(stderr, PROGNAME 44 | ": [%s] is not a PNG file: incorrect signature\n", 45 | filename); 46 | return -1; 47 | case 2: 48 | fprintf(stderr, PROGNAME 49 | ": [%s] has bad IHDR (libpng longjmp)\n", 50 | filename); 51 | return -1; 52 | case 4: 53 | fprintf(stderr, PROGNAME ": insufficient memory\n"); 54 | return -1; 55 | default: 56 | fprintf(stderr, PROGNAME 57 | ": unknown readpng_init() error\n"); 58 | return -1; 59 | } 60 | } 61 | 62 | /* 63 | printf(" %s is %lu by %lu\n", filename, image_width, image_height ); 64 | printf(" %s has depth: %lu\n", filename, image_depth ); 65 | */ 66 | 67 | image_data = readpng_get_image(1.0, &image_channels, &image_rowbytes); 68 | 69 | /* 70 | printf(" %s has row_bytes: %lu\n", filename, image_rowbytes ); 71 | */ 72 | 73 | 74 | for( x = 0; x < image_width; ++x ) { 75 | for( y = 0; y < image_height; ++y ) { 76 | unsigned char r, g, b; 77 | r = image_data[y*image_rowbytes+(x*3)+0]; 78 | g = image_data[y*image_rowbytes+(x*3)+1]; 79 | b = image_data[y*image_rowbytes+(x*3)+2]; 80 | if( (r<254)||(g<254)||(b<254) ) { 81 | /* printf("Got a non-white pixel (%d,%d,%d) at (%d,%d)\n", 82 | r, g, b, x, y); */ 83 | printf("%lu %lu\n",image_width,image_height); 84 | return -1; 85 | } 86 | } 87 | } 88 | printf("Empty\n"); 89 | readpng_cleanup(TRUE); 90 | fclose(infile); 91 | 92 | return 0; 93 | 94 | } 95 | 96 | -------------------------------------------------------------------------------- /png-find-grid-revised/find-grid.c: -------------------------------------------------------------------------------- 1 | /* 2 | 3 | */ 4 | 5 | #include 
<stdio.h> 6 | #include <stdlib.h> 7 | 8 | #include "png.h" /* libpng header; includes zlib.h */ 9 | #include "readpng.h" /* typedefs, common macros, public prototypes */ 10 | 11 | #define PROGNAME "find-grid" 12 | 13 | int significant_horizontal_line( const unsigned char * image_data, 14 | int y, 15 | unsigned long image_width, 16 | unsigned long pitch, 17 | unsigned long minimum_width) 18 | { 19 | int run_length = 0, x; 20 | 21 | for( x = 0; x < image_width; ++x ) { 22 | 23 | int non_white = (image_data[y*pitch+(x*3)] < 0xFE); 24 | 25 | if( non_white ) 26 | ++ run_length; 27 | else 28 | run_length = 0; 29 | 30 | if( run_length > minimum_width ) 31 | return 1; 32 | 33 | } 34 | 35 | return 0; 36 | } 37 | 38 | 39 | int significant_vertical_line( const unsigned char * image_data, 40 | int x, 41 | unsigned long image_height, 42 | unsigned long pitch, 43 | unsigned long minimum_height) 44 | { 45 | int run_length = 0, y; 46 | 47 | for( y = 0; y < image_height; ++y ) { 48 | 49 | int non_white = (image_data[y*pitch+(x*3)] < 0xFE); 50 | 51 | if( non_white ) 52 | ++ run_length; 53 | else 54 | run_length = 0; 55 | 56 | if( run_length > minimum_height ) 57 | return 1; 58 | 59 | } 60 | 61 | return 0; 62 | } 63 | 64 | int main( int argc, char **argv ) { 65 | 66 | int last_in_each_cell_x[256]; 67 | int first_in_each_cell_x[256]; 68 | int last_in_each_cell_y[256]; 69 | int first_in_each_cell_y[256]; 70 | 71 | int i_last_x; 72 | int i_first_x; 73 | int i_last_y; 74 | int i_first_y; 75 | 76 | int cell_width, cell_height; 77 | 78 | int crossing_grid_boundary; 79 | 80 | int x, y; 81 | 82 | char * filename, * minimum_box_width_s, * minimum_box_height_s; 83 | unsigned long minimum_box_width, minimum_box_height; 84 | FILE * infile; 85 | int rc; 86 | unsigned long image_width, image_height, image_rowbytes, image_depth; 87 | int image_channels; 88 | unsigned char *image_data; 89 | 90 | if( argc != 4 ) { 91 | printf("Usage: find-grid <png-file-name> <minimum-box-width> 
<minimum-box-height>\n"); 92 | return -1; 93 | } 94 | 95 | filename = argv[1]; 96 | minimum_box_width_s = argv[2]; 97 | minimum_box_height_s = argv[3]; 98 | 99 | // atol does no error checking; unparseable input becomes 0 and is rejected below. 100 | 101 | minimum_box_width = (unsigned long) atol(argv[2]); 102 | minimum_box_height = (unsigned long) atol(argv[3]); 103 | 104 | // printf("%lu x %lu\n",minimum_box_width,minimum_box_height); 105 | 106 | if( minimum_box_width <= 0 ) { 107 | printf( "minimum_box_width must be an integer greater than zero.\n"); 108 | return -1; 109 | } 110 | 111 | if( minimum_box_height <= 0 ) { 112 | printf( "minimum_box_height must be an integer greater than zero.\n"); 113 | return -1; 114 | } 115 | 116 | infile = fopen(filename, "rb"); 117 | 118 | if( !infile ) { 119 | fprintf(stderr, PROGNAME ": can't open PNG file [%s]\n", filename); 120 | return -1; 121 | 122 | } 123 | 124 | if ((rc = readpng_init(infile, &image_width, &image_height, &image_depth)) != 0) { 125 | switch (rc) { 126 | case 1: 127 | fprintf(stderr, PROGNAME 128 | ": [%s] is not a PNG file: incorrect signature\n", 129 | filename); 130 | return -1; 131 | case 2: 132 | fprintf(stderr, PROGNAME 133 | ": [%s] has bad IHDR (libpng longjmp)\n", 134 | filename); 135 | return -1; 136 | case 4: 137 | fprintf(stderr, PROGNAME ": insufficient memory\n"); 138 | return -1; 139 | default: 140 | fprintf(stderr, PROGNAME 141 | ": unknown readpng_init() error\n"); 142 | return -1; 143 | } 144 | } 145 | 146 | /* 147 | printf(" %s is %lu by %lu\n", filename, image_width, image_height ); 148 | printf(" %s has depth: %lu\n", filename, image_depth ); 149 | */ 150 | 151 | image_data = readpng_get_image(1.0, &image_channels, &image_rowbytes); 152 | 153 | /* 154 | printf(" %s has row_bytes: %lu\n", filename, image_rowbytes ); 155 | */ 156 | 157 | i_last_x = 0; 158 | i_first_x = 0; 159 | 160 | crossing_grid_boundary = 0; 161 | 162 | for( x = 0; x < image_width; ++ x ) { 163 | 164 | int significant_line = 
significant_vertical_line( image_data, 165 | x, 166 | image_height, 167 | image_rowbytes, 168 | minimum_box_height ); 169 | 170 | /* 171 | if( significant_line ) 172 | printf("Significant vertical line at x = %d\n",x); 173 | */ 174 | 175 | if( crossing_grid_boundary && ! significant_line ) { 176 | first_in_each_cell_x[i_first_x++] = x; 177 | crossing_grid_boundary = 0; 178 | } else if( (! crossing_grid_boundary) && significant_line ) { 179 | last_in_each_cell_x[i_last_x++] = x - 1; 180 | crossing_grid_boundary = 1; 181 | } 182 | 183 | } 184 | 185 | cell_width = i_last_x - 1; 186 | 187 | crossing_grid_boundary = 0; 188 | 189 | i_last_y = 0; 190 | i_first_y = 0; 191 | 192 | printf( "image height is: %lu\n", image_height ); 193 | 194 | for( y = 0; y < image_height; ++y ) { 195 | 196 | int significant_line = significant_horizontal_line( image_data, 197 | y, 198 | image_width, 199 | image_rowbytes, 200 | minimum_box_width ); 201 | 202 | /* 203 | if( significant_line ) 204 | printf("Significant horizontal line at y = %d\n",y); 205 | */ 206 | 207 | if( crossing_grid_boundary && ! significant_line ) { 208 | first_in_each_cell_y[i_first_y++] = y; 209 | crossing_grid_boundary = 0; 210 | } else if( (! 
crossing_grid_boundary) && significant_line ) { 211 | last_in_each_cell_y[i_last_y++] = y - 1; 212 | crossing_grid_boundary = 1; 213 | } 214 | 215 | } 216 | 217 | cell_height = i_last_y - 1; 218 | 219 | /* 220 | printf( "Cell width: %d\n", cell_width ); 221 | printf( "Cell height: %d\n", cell_height ); 222 | */ 223 | 224 | readpng_cleanup(TRUE); 225 | fclose(infile); 226 | 227 | printf( " last_in_each_cell_x:" ); 228 | for( i_last_x = 1; i_last_x <= cell_width; ++ i_last_x ) 229 | printf( " %6d", last_in_each_cell_x[i_last_x] ); 230 | printf( "\n" ); 231 | 232 | printf( " last_in_each_cell_y:" ); 233 | for( i_last_y = 1; i_last_y <= cell_height; ++ i_last_y ) 234 | printf( " %6d", last_in_each_cell_y[i_last_y] ); 235 | printf( "\n" ); 236 | 237 | printf( "first_in_each_cell_x:" ); 238 | for( i_first_x = 0; i_first_x < cell_width; ++ i_first_x ) 239 | printf( " %6d", first_in_each_cell_x[i_first_x] ); 240 | printf( "\n" ); 241 | 242 | printf( "first_in_each_cell_y:" ); 243 | for( i_first_y = 0; i_first_y < cell_height; ++ i_first_y ) 244 | printf( " %6d", first_in_each_cell_y[i_first_y] ); 245 | printf( "\n" ); 246 | 247 | return 0; 248 | 249 | } 250 | 251 | -------------------------------------------------------------------------------- /png-find-grid-revised/png-size.c: -------------------------------------------------------------------------------- 1 | /* 2 | 3 | */ 4 | 5 | #include <stdio.h> 6 | #include <stdlib.h> 7 | 8 | #include "png.h" /* libpng header; includes zlib.h */ 9 | #include "readpng.h" /* typedefs, common macros, public prototypes */ 10 | 11 | #define PROGNAME "png-size" 12 | 13 | int significant_horizontal_line( const unsigned char * image_data, 14 | int y, 15 | unsigned long image_width, 16 | unsigned long pitch, 17 | unsigned long minimum_width) 18 | { 19 | int run_length = 0, x; 20 | 21 | for( x = 0; x < image_width; ++x ) { 22 | 23 | int non_white = image_data[y*pitch+(x*3)] != 0xFF; 24 | 25 | if( non_white ) 26 | ++ run_length; 27 | else 28 
| run_length = 0; 29 | 30 | if( run_length > minimum_width ) 31 | return 1; 32 | 33 | } 34 | 35 | return 0; 36 | } 37 | 38 | 39 | int significant_vertical_line( const unsigned char * image_data, 40 | int x, 41 | unsigned long image_height, 42 | unsigned long pitch, 43 | unsigned long minimum_height) 44 | { 45 | int run_length = 0, y; 46 | 47 | for( y = 0; y < image_height; ++y ) { 48 | 49 | int non_white = image_data[y*pitch+(x*3)] != 0xFF; 50 | 51 | if( non_white ) 52 | ++ run_length; 53 | else 54 | run_length = 0; 55 | 56 | if( run_length > minimum_height ) 57 | return 1; 58 | 59 | } 60 | 61 | return 0; 62 | } 63 | 64 | int main( int argc, char **argv ) { 65 | 66 | char * filename; 67 | FILE * infile; 68 | int rc; 69 | unsigned long image_width, image_height, image_depth; 70 | 71 | if( argc != 2 ) { 72 | printf("Usage: png-size <png-file-name>\n"); 73 | return -1; 74 | } 75 | 76 | filename = argv[1]; 77 | 78 | infile = fopen(filename, "rb"); 79 | 80 | if( !infile ) { 81 | fprintf(stderr, PROGNAME ": can't open PNG file [%s]\n", filename); 82 | return -1; 83 | 84 | } 85 | 86 | if ((rc = readpng_init(infile, &image_width, &image_height, &image_depth)) != 0) { 87 | switch (rc) { 88 | case 1: 89 | fprintf(stderr, PROGNAME 90 | ": [%s] is not a PNG file: incorrect signature\n", 91 | filename); 92 | return -1; 93 | case 2: 94 | fprintf(stderr, PROGNAME 95 | ": [%s] has bad IHDR (libpng longjmp)\n", 96 | filename); 97 | return -1; 98 | case 4: 99 | fprintf(stderr, PROGNAME ": insufficient memory\n"); 100 | return -1; 101 | default: 102 | fprintf(stderr, PROGNAME 103 | ": unknown readpng_init() error\n"); 104 | return -1; 105 | } 106 | } 107 | 108 | printf("%lux%lu",image_width,image_height); 109 | 110 | readpng_cleanup(TRUE); 111 | fclose(infile); 112 | 113 | return 0; 114 | 115 | } 116 | 117 | -------------------------------------------------------------------------------- /png-find-grid-revised/readpng.c: 
-------------------------------------------------------------------------------- 1 | /*--------------------------------------------------------------------------- 2 | 3 | rpng - simple PNG display program readpng.c 4 | 5 | --------------------------------------------------------------------------- 6 | 7 | Copyright (c) 1998-2000 Greg Roelofs. All rights reserved. 8 | 9 | This software is provided "as is," without warranty of any kind, 10 | express or implied. In no event shall the author or contributors 11 | be held liable for any damages arising in any way from the use of 12 | this software. 13 | 14 | Permission is granted to anyone to use this software for any purpose, 15 | including commercial applications, and to alter it and redistribute 16 | it freely, subject to the following restrictions: 17 | 18 | 1. Redistributions of source code must retain the above copyright 19 | notice, disclaimer, and this list of conditions. 20 | 2. Redistributions in binary form must reproduce the above copyright 21 | notice, disclaimer, and this list of conditions in the documenta- 22 | tion and/or other materials provided with the distribution. 23 | 3. All advertising materials mentioning features or use of this 24 | software must display the following acknowledgment: 25 | 26 | This product includes software developed by Greg Roelofs 27 | and contributors for the book, "PNG: The Definitive Guide," 28 | published by O'Reilly and Associates. 
29 | 30 | ---------------------------------------------------------------------------*/ 31 | 32 | #include <stdio.h> 33 | #include <stdlib.h> 34 | 35 | #include "png.h" /* libpng header; includes zlib.h */ 36 | #include "readpng.h" /* typedefs, common macros, public prototypes */ 37 | 38 | /* future versions of libpng will provide this macro: */ 39 | #ifndef png_jmpbuf 40 | # define png_jmpbuf(png_ptr) ((png_ptr)->jmpbuf) 41 | #endif 42 | 43 | 44 | static png_structp png_ptr = NULL; 45 | static png_infop info_ptr = NULL; 46 | 47 | png_uint_32 width, height; 48 | int bit_depth, color_type; 49 | uch *image_data = NULL; 50 | 51 | 52 | void readpng_version_info(void) 53 | { 54 | fprintf(stderr, " Compiled with libpng %s; using libpng %s.\n", 55 | PNG_LIBPNG_VER_STRING, png_libpng_ver); 56 | fprintf(stderr, " Compiled with zlib %s; using zlib %s.\n", 57 | ZLIB_VERSION, zlib_version); 58 | } 59 | 60 | 61 | /* return value = 0 for success, 1 for bad sig, 2 for bad IHDR, 4 for no mem */ 62 | 63 | int readpng_init(FILE *infile, ulg *pWidth, ulg *pHeight, ulg *pDepth) 64 | { 65 | uch sig[8]; 66 | 67 | 68 | /* first do a quick check that the file really is a PNG image; could 69 | * have used slightly more general png_sig_cmp() function instead */ 70 | 71 | fread(sig, 1, 8, infile); 72 | if (!png_check_sig(sig, 8)) 73 | return 1; /* bad signature */ 74 | 75 | 76 | /* could pass pointers to user-defined error handlers instead of NULLs: */ 77 | 78 | png_ptr = png_create_read_struct(PNG_LIBPNG_VER_STRING, NULL, NULL, NULL); 79 | if (!png_ptr) 80 | return 4; /* out of memory */ 81 | 82 | info_ptr = png_create_info_struct(png_ptr); 83 | if (!info_ptr) { 84 | png_destroy_read_struct(&png_ptr, NULL, NULL); 85 | return 4; /* out of memory */ 86 | } 87 | 88 | 89 | /* we could create a second info struct here (end_info), but it's only 90 | * useful if we want to keep pre- and post-IDAT chunk info separated 91 | * (mainly for PNG-aware image editors and converters) */ 92 | 93 | 94 | /* 
setjmp() must be called in every function that calls a PNG-reading 95 | * libpng function */ 96 | 97 | if (setjmp(png_jmpbuf(png_ptr))) { 98 | png_destroy_read_struct(&png_ptr, &info_ptr, NULL); 99 | return 2; 100 | } 101 | 102 | 103 | png_init_io(png_ptr, infile); 104 | png_set_sig_bytes(png_ptr, 8); /* we already read the 8 signature bytes */ 105 | 106 | png_read_info(png_ptr, info_ptr); /* read all PNG info up to image data */ 107 | 108 | 109 | /* alternatively, could make separate calls to png_get_image_width(), 110 | * etc., but want bit_depth and color_type for later [don't care about 111 | * compression_type and filter_type => NULLs] */ 112 | 113 | png_get_IHDR(png_ptr, info_ptr, &width, &height, &bit_depth, &color_type, 114 | NULL, NULL, NULL); 115 | *pWidth = width; 116 | *pHeight = height; 117 | *pDepth = bit_depth; 118 | 119 | /* OK, that's all we need for now; return happy */ 120 | 121 | return 0; 122 | } 123 | 124 | 125 | 126 | 127 | /* returns 0 if succeeds, 1 if fails due to no bKGD chunk, 2 if libpng error; 128 | * scales values to 8-bit if necessary */ 129 | 130 | int readpng_get_bgcolor(uch *red, uch *green, uch *blue) 131 | { 132 | png_color_16p pBackground; 133 | 134 | 135 | /* setjmp() must be called in every function that calls a PNG-reading 136 | * libpng function */ 137 | 138 | if (setjmp(png_jmpbuf(png_ptr))) { 139 | png_destroy_read_struct(&png_ptr, &info_ptr, NULL); 140 | return 2; 141 | } 142 | 143 | 144 | if (!png_get_valid(png_ptr, info_ptr, PNG_INFO_bKGD)) 145 | return 1; 146 | 147 | /* it is not obvious from the libpng documentation, but this function 148 | * takes a pointer to a pointer, and it always returns valid red, green 149 | * and blue values, regardless of color_type: */ 150 | 151 | png_get_bKGD(png_ptr, info_ptr, &pBackground); 152 | 153 | 154 | /* however, it always returns the raw bKGD data, regardless of any 155 | * bit-depth transformations, so check depth and adjust if necessary */ 156 | 157 | if (bit_depth == 16) { 
158 | *red = pBackground->red >> 8; 159 | *green = pBackground->green >> 8; 160 | *blue = pBackground->blue >> 8; 161 | } else if (color_type == PNG_COLOR_TYPE_GRAY && bit_depth < 8) { 162 | if (bit_depth == 1) 163 | *red = *green = *blue = pBackground->gray? 255 : 0; 164 | else if (bit_depth == 2) 165 | *red = *green = *blue = (255/3) * pBackground->gray; 166 | else /* bit_depth == 4 */ 167 | *red = *green = *blue = (255/15) * pBackground->gray; 168 | } else { 169 | *red = (uch)pBackground->red; 170 | *green = (uch)pBackground->green; 171 | *blue = (uch)pBackground->blue; 172 | } 173 | 174 | return 0; 175 | } 176 | 177 | 178 | 179 | 180 | /* display_exponent == LUT_exponent * CRT_exponent */ 181 | 182 | uch *readpng_get_image(double display_exponent, int *pChannels, ulg *pRowbytes) 183 | { 184 | double gamma; 185 | png_uint_32 i, rowbytes; 186 | png_bytepp row_pointers = NULL; 187 | 188 | 189 | /* setjmp() must be called in every function that calls a PNG-reading 190 | * libpng function */ 191 | 192 | if (setjmp(png_jmpbuf(png_ptr))) { 193 | png_destroy_read_struct(&png_ptr, &info_ptr, NULL); 194 | return NULL; 195 | } 196 | 197 | 198 | /* expand palette images to RGB, low-bit-depth grayscale images to 8 bits, 199 | * transparency chunks to full alpha channel; strip 16-bit-per-sample 200 | * images to 8 bits per sample; and convert grayscale to RGB[A] */ 201 | 202 | if (color_type == PNG_COLOR_TYPE_PALETTE) 203 | png_set_expand(png_ptr); 204 | if (color_type == PNG_COLOR_TYPE_GRAY && bit_depth < 8) 205 | png_set_expand(png_ptr); 206 | if (png_get_valid(png_ptr, info_ptr, PNG_INFO_tRNS)) 207 | png_set_expand(png_ptr); 208 | if (bit_depth == 16) 209 | png_set_strip_16(png_ptr); 210 | if (color_type == PNG_COLOR_TYPE_GRAY || 211 | color_type == PNG_COLOR_TYPE_GRAY_ALPHA) 212 | png_set_gray_to_rgb(png_ptr); 213 | 214 | 215 | /* unlike the example in the libpng documentation, we have *no* idea where 216 | * this file may have come from--so if it doesn't have a file 
gamma, don't 217 | * do any correction ("do no harm") */ 218 | 219 | if (png_get_gAMA(png_ptr, info_ptr, &gamma)) 220 | png_set_gamma(png_ptr, display_exponent, gamma); 221 | 222 | 223 | /* all transformations have been registered; now update info_ptr data, 224 | * get rowbytes and channels, and allocate image memory */ 225 | 226 | png_read_update_info(png_ptr, info_ptr); 227 | 228 | *pRowbytes = rowbytes = png_get_rowbytes(png_ptr, info_ptr); 229 | *pChannels = (int)png_get_channels(png_ptr, info_ptr); 230 | 231 | if ((image_data = (uch *)malloc(rowbytes*height)) == NULL) { 232 | png_destroy_read_struct(&png_ptr, &info_ptr, NULL); 233 | return NULL; 234 | } 235 | if ((row_pointers = (png_bytepp)malloc(height*sizeof(png_bytep))) == NULL) { 236 | png_destroy_read_struct(&png_ptr, &info_ptr, NULL); 237 | free(image_data); 238 | image_data = NULL; 239 | return NULL; 240 | } 241 | 242 | Trace((stderr, "readpng_get_image: channels = %d, rowbytes = %ld, height = %ld\n", *pChannels, rowbytes, height)); 243 | 244 | 245 | /* set the individual row_pointers to point at the correct offsets */ 246 | 247 | for (i = 0; i < height; ++i) 248 | row_pointers[i] = image_data + i*rowbytes; 249 | 250 | 251 | /* now we can go ahead and just read the whole image */ 252 | 253 | png_read_image(png_ptr, row_pointers); 254 | 255 | 256 | /* and we're done! (png_read_end() can be omitted if no processing of 257 | * post-IDAT text/time/etc. 
is desired) */ 258 | 259 | free(row_pointers); 260 | row_pointers = NULL; 261 | 262 | png_read_end(png_ptr, NULL); 263 | 264 | return image_data; 265 | } 266 | 267 | 268 | void readpng_cleanup(int free_image_data) 269 | { 270 | if (free_image_data && image_data) { 271 | free(image_data); 272 | image_data = NULL; 273 | } 274 | 275 | if (png_ptr && info_ptr) { 276 | png_destroy_read_struct(&png_ptr, &info_ptr, NULL); 277 | png_ptr = NULL; 278 | info_ptr = NULL; 279 | } 280 | } 281 | -------------------------------------------------------------------------------- /png-find-grid-revised/readpng.h: -------------------------------------------------------------------------------- 1 | /*--------------------------------------------------------------------------- 2 | 3 | rpng - simple PNG display program readpng.h 4 | 5 | --------------------------------------------------------------------------- 6 | 7 | Copyright (c) 1998-2000 Greg Roelofs. All rights reserved. 8 | 9 | This software is provided "as is," without warranty of any kind, 10 | express or implied. In no event shall the author or contributors 11 | be held liable for any damages arising in any way from the use of 12 | this software. 13 | 14 | Permission is granted to anyone to use this software for any purpose, 15 | including commercial applications, and to alter it and redistribute 16 | it freely, subject to the following restrictions: 17 | 18 | 1. Redistributions of source code must retain the above copyright 19 | notice, disclaimer, and this list of conditions. 20 | 2. Redistributions in binary form must reproduce the above copyright 21 | notice, disclaimer, and this list of conditions in the documenta- 22 | tion and/or other materials provided with the distribution. 23 | 3. 
All advertising materials mentioning features or use of this 24 | software must display the following acknowledgment: 25 | 26 | This product includes software developed by Greg Roelofs 27 | and contributors for the book, "PNG: The Definitive Guide," 28 | published by O'Reilly and Associates. 29 | 30 | ---------------------------------------------------------------------------*/ 31 | 32 | #ifndef TRUE 33 | # define TRUE 1 34 | # define FALSE 0 35 | #endif 36 | 37 | #ifndef MAX 38 | # define MAX(a,b) ((a) > (b)? (a) : (b)) 39 | # define MIN(a,b) ((a) < (b)? (a) : (b)) 40 | #endif 41 | 42 | #ifdef DEBUG 43 | # define Trace(x) {fprintf x ; fflush(stderr); fflush(stdout);} 44 | #else 45 | # define Trace(x) ; 46 | #endif 47 | 48 | typedef unsigned char uch; 49 | typedef unsigned short ush; 50 | typedef unsigned long ulg; 51 | 52 | 53 | /* prototypes for public functions in readpng.c */ 54 | 55 | void readpng_version_info(void); 56 | 57 | int readpng_init(FILE *infile, ulg *pWidth, ulg *pHeight, ulg *pDepth); 58 | 59 | int readpng_get_bgcolor(uch *bg_red, uch *bg_green, uch *bg_blue); 60 | 61 | uch *readpng_get_image(double display_exponent, int *pChannels, 62 | ulg *pRowbytes); 63 | 64 | void readpng_cleanup(int free_image_data); 65 | -------------------------------------------------------------------------------- /test-codepoints: -------------------------------------------------------------------------------- 1 | #!/usr/bin/ruby -w 2 | 3 | # This script will find the codepoints which are out 4 | # of order - this indicates that there was an error 5 | # in the OCR and you should correct these errors by 6 | # hand. 
7 | 8 | require 'yaml' 9 | 10 | filename = nil 11 | 12 | if ARGV.length == 0 13 | filename = "individual-characters/non-blank/codepoints.yaml" 14 | elsif ARGV.length == 1 15 | filename = ARGV[0] 16 | else 17 | puts "Usage: ./test-codepoints [ FILENAME ]" 18 | exit(-1) 19 | end 20 | 21 | a = open(filename,"r") { |f| YAML.load(f.read) } 22 | 23 | last_codepoint = 0 24 | 25 | a.each do |e| 26 | 27 | name = e[0] 28 | codepoint = e[1] 29 | 30 | if codepoint.class == Fixnum 31 | 32 | if codepoint < last_codepoint 33 | puts "Problem with: #{name} => #{codepoint}" 34 | puts " Out of order..." 35 | end 36 | 37 | last_codepoint = codepoint 38 | 39 | else 40 | 41 | puts "Problem with: #{name} => #{codepoint}" 42 | puts " Not a Fixnum" 43 | 44 | end 45 | 46 | end 47 | 48 | -------------------------------------------------------------------------------- /tileinfo.rb: -------------------------------------------------------------------------------- 1 | unless FileTest.exists? 'Blocks.txt' 2 | unless system( "curl", "-s", "ftp://ftp.unicode.org/Public/5.1.0/ucd/Blocks.txt", "-o", "Blocks.txt" ) 3 | STDERR.puts "Couldn't download Blocks.txt." 
4 | exit(-1) 5 | end 6 | end 7 | 8 | class Integer 9 | def between_inclusive?( min, max ) 10 | (self >= min) && (self <= max) 11 | end 12 | end 13 | 14 | # $pdfs_to_skip = Hash.new 15 | # 16 | # $pdfs_to_skip['U1FF80'] = 'Unassigned' 17 | # $pdfs_to_skip['U2FF80'] = 'Unassigned' 18 | # $pdfs_to_skip['U3FF80'] = 'Unassigned' 19 | # $pdfs_to_skip['U4FF80'] = 'Unassigned' 20 | # $pdfs_to_skip['U5FF80'] = 'Unassigned' 21 | # $pdfs_to_skip['U6FF80'] = 'Unassigned' 22 | # $pdfs_to_skip['U7FF80'] = 'Unassigned' 23 | # $pdfs_to_skip['U8FF80'] = 'Unassigned' 24 | # $pdfs_to_skip['U9FF80'] = 'Unassigned' 25 | # $pdfs_to_skip['UAFF80'] = 'Unassigned' 26 | # $pdfs_to_skip['UBFF80'] = 'Unassigned' 27 | # $pdfs_to_skip['UCFF80'] = 'Unassigned' 28 | # $pdfs_to_skip['UDFF80'] = 'Unassigned' 29 | # $pdfs_to_skip['UEFF80'] = 'Unassigned' 30 | # $pdfs_to_skip['UFFF80'] = 'Supplementary Private Use Area-B' 31 | # $pdfs_to_skip['U10FF80'] = 'Supplementary Private Use Area-B' 32 | 33 | class Block 34 | 35 | attr_accessor :first, :last, :name, :skip 36 | 37 | def initialize( first, last, name ) 38 | @first = first 39 | @last = last 40 | @name = name 41 | @skip = false 42 | @first_i = Integer("0x"+first) 43 | @last_i = Integer("0x"+last) 44 | end 45 | 46 | def in_block?( code_point ) 47 | code_point.between_inclusive?(@first_i,@last_i) 48 | end 49 | 50 | def nice_name 51 | "#{@name} (0x#{@first} to 0x#{@last})" 52 | end 53 | 54 | end 55 | 56 | $blocks = Array.new 57 | $page_to_block = {} 58 | 59 | open("Blocks.txt","r") do |f| 60 | f.each do |line| 61 | line.chomp! 62 | if line =~ /^([0-9A-F]+)\.\.([0-9A-F]+); (.*) *$/ 63 | b = Block.new($1,$2,$3) 64 | $blocks.push b 65 | $page_to_block[$1] = b 66 | end 67 | end 68 | end 69 | 70 | def block_from_codepoint( codepoint ) 71 | c = codepoint 72 | if c.class == String 73 | c = c.to_i(16) 74 | end 75 | $blocks.each do |b| 76 | if b.in_block? 
c 77 | return b 78 | end 79 | end 80 | raise "Failed to find block for codepoint #{codepoint}" 81 | end 82 | 83 | # This returns nil if we should skip it or a string with the name 84 | # otherwise: 85 | 86 | # def name_of_page( file_basename ) 87 | # 88 | # if $pdfs_to_skip.has_key? file_basename 89 | # return nil 90 | # end 91 | # 92 | # code_point = Integer("0x"+file_basename.gsub(/^U/,'')) 93 | # 94 | # $blocks.each do |b| 95 | # if b.in_block? code_point 96 | # return b.name 97 | # end 98 | # end 99 | # 100 | # nil 101 | # 102 | # end 103 | 104 | class TileInfo 105 | 106 | attr_accessor :w, :h 107 | attr_accessor :filename 108 | attr_accessor :block 109 | attr_accessor :ocr_codepoint 110 | 111 | def initialize(w,h) 112 | @w = w 113 | @h = h 114 | @filename = nil 115 | @block = nil 116 | @ocr_codepoint = nil 117 | end 118 | 119 | def valid_size? 120 | (@w > 200 and @w < 500 and @h > 250 and @h < 370) 121 | end 122 | 123 | def apparently_successful_ocr? 124 | @ocr_codepoint and not (@ocr_codepoint.class == String) 125 | end 126 | 127 | def set_block_from_ocr_codepoint 128 | @block = block_from_codepoint(@ocr_codepoint) 129 | end 130 | 131 | def TileInfo.from_filename( filename ) 132 | d = nil 133 | s = `png-find-grid/png-size #{filename}`.chomp 134 | unless $?.success? 
135 | puts "png-find-grid/png-size #{filename} failed" 136 | exit(-1) 137 | end 138 | if s =~ /^(\d+)x(\d+)/ 139 | d = TileInfo.new(Integer($1),Integer($2)) 140 | d.filename = filename 141 | d.block = name_of_page filename.gsub(/^(U[0-9A-F]+).*$/,'\1') 142 | end 143 | return d 144 | end 145 | 146 | def to_s 147 | "#{filename}: #{w} x #{h} (in block '#{block}')" 148 | end 149 | 150 | def to_yaml_hash_element 151 | "#{filename}: !ruby/object:TileInfo\n"+ 152 | " filename: #{filename}\n"+ 153 | " h: #{h}\n"+ 154 | " w: #{w}\n"+ 155 | " block: #{block}\n"+ 156 | " ocr_codepoint: #{ocr_codepoint}" 157 | end 158 | 159 | end 160 | --------------------------------------------------------------------------------
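A note on the block lookup in tileinfo.rb: the file reads each `FIRST..LAST; Name` line of Blocks.txt into a Block and then scans the list for the range containing a codepoint. The following is only a minimal, self-contained sketch of that idea, assuming a modern Ruby (2.3 or later, for the `<<~` heredoc) rather than ruby1.8. The names `SAMPLE_BLOCKS`, `BlockRange`, `parse_blocks` and `find_block` are hypothetical and do not appear in the repository, and the sketch returns nil for an unmatched codepoint where `block_from_codepoint` raises.

```ruby
# Hypothetical sketch of the Blocks.txt lookup; not part of tileinfo.rb.
# A three-line excerpt stands in for the downloaded Blocks.txt file.
SAMPLE_BLOCKS = <<~EOS
  # Blocks-5.1.0.txt (excerpt)
  0000..007F; Basic Latin
  0080..00FF; Latin-1 Supplement
  0370..03FF; Greek and Coptic
EOS

# first and last are stored as integers so range checks are cheap.
BlockRange = Struct.new(:first, :last, :name)

# Keep only lines of the form "FIRST..LAST; Name"; comments are skipped.
def parse_blocks(text)
  text.each_line.map do |line|
    m = line.chomp.match(/^([0-9A-F]+)\.\.([0-9A-F]+); (.*?) *$/)
    m && BlockRange.new(m[1].to_i(16), m[2].to_i(16), m[3])
  end.compact
end

# Accepts a hex string ("03A9") or an integer (0x41).
# Returns nil when no block contains the codepoint.
def find_block(blocks, codepoint)
  c = codepoint.is_a?(String) ? codepoint.to_i(16) : codepoint
  blocks.find { |b| c >= b.first && c <= b.last }
end

blocks = parse_blocks(SAMPLE_BLOCKS)
puts find_block(blocks, "03A9").name     # Greek and Coptic
puts find_block(blocks, 0x41).name       # Basic Latin
puts find_block(blocks, 0x0100).inspect  # nil
```

A linear scan is fine here because Blocks.txt has only a few hundred entries; since the ranges are sorted and do not overlap, a binary search would also work if the lookup ever became a bottleneck.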