├── README
├── manuscript
│   └── unmerged
│       ├── api.txt
│       ├── compatibility.txt
│       ├── cultural_barriers.txt
│       ├── dynamic_toolkit.txt
│       ├── fp.txt
│       ├── io.txt
│       ├── project_maintenance.txt
│       ├── ruby_worst_practices.txt
│       ├── stdlib.txt
│       ├── testing.txt
│       └── things_go_wrong.txt
└── oreilly_final
    ├── book.xml
    └── figs
        ├── rubp_0701.png
        ├── rubp_0702.png
        ├── rubp_0703.png
        ├── rubp_0704.png
        ├── rubp_0705.png
        ├── rubp_0706.png
        ├── rubp_0801.png
        ├── rubp_0802.png
        ├── rubp_0803.png
        ├── rubp_0804.png
        └── rubp_ab01.png

/README:
--------------------------------------------------------------------------------
Welcome to the open source home of the "Ruby Best Practices" book.

Here you'll find the original manuscript along with the production files that
were used to generate the print version of the book.

If instead you were looking for a free PDF download of the book, you
can find it here:

http://sandal.github.com/rbp-book/pdfs/rbp_1-0.pdf

Or, if you wanted to kill trees and give me some money:

http://oreilly.com/catalog/9780596523015/
http://www.amazon.com/gp/product/0596523009/

But assuming you are here for the source, check the brief description below.

== Files

manuscript/unmerged contains asciidoc sources that have not been updated to
reflect copyediting. When I get around to it, manuscript/updated will contain
the updated files. Once a file is updated, I will accept patches against it
for fixes and modifications.

oreilly_final/ contains the production files that were used to generate this
book. Right now it's a bit limited, just one giant docbook file and some figs.
We may be able to break it down by chapter later, but we may not necessarily
need it.

If you are wondering about code samples, they are currently at:
http://github.com/sandal/rbp

I plan to merge them here sooner or later, and extract more from the original
manuscript. I sort of got lazy there.

== Contributing / Using Content

Right now, I need to go through the painstaking process of merging copyeditor
changes into my asciidoc manuscript, and then set up the build toolchain again
in a way that's easy enough for contributors to access.

But for those who wish to fork and experiment on their own, all content here is
hereby released under the Creative Commons Attribution-Noncommercial-Share Alike
3.0 license ( http://creativecommons.org/licenses/by-nc-sa/3.0/ ).

If you have any questions about legal usage, contact me, and I'll
talk with O'Reilly.

gregory.t.brown at gmail.com


--------------------------------------------------------------------------------
/manuscript/unmerged/compatibility.txt:
--------------------------------------------------------------------------------
Appendix A: Writing Backwards Compatible Code
---------------------------------------------

Not everyone has the luxury of using the latest and greatest tools available.
Though Ruby 1.9 may be gaining ground among developers, much legacy code still
runs on Ruby 1.8. Many folks have a responsibility to keep their code
running on Ruby 1.8, whether it is in-house, open source, or a commercial
application. This chapter will show you how to maintain backwards
compatibility with Ruby 1.8.6 without preventing your code from running
smoothly on Ruby 1.9.1.

I am assuming here that you are back-porting code to Ruby 1.8, but this may
also serve as a helpful guide to upgrading your projects to 1.9.1.
That task is somewhat more complicated, however, so your mileage may vary.

The earlier you start considering backwards compatibility in your project,
the easier it will be to make things run smoothly. I'll start by showing
you how to keep your compatibility code manageable from the start, and then
go on to describe some of the issues you may run into when supporting Ruby
1.8 and 1.9 side by side.

Please note that when I mention 1.8 and 1.9 without further qualification, I'm
talking about Ruby 1.8.6 and its compatible implementations and Ruby 1.9.1 and
its compatible implementations, respectively. We have skipped Ruby 1.8.7 and
Ruby 1.9.0 because both are transitional bridges between 1.8.6 and 1.9.1 and
aren't truly compatible with either.

Another thing to keep in mind is that this is definitely not intended to be
a comprehensive guide to the differences between the versions of Ruby. Please
consult your favorite reference after reviewing the tips you read here.

But now that you have been sufficiently warned, we can move on to talking
about how to keep things clean.

=== Avoiding a Mess ===

It is very tempting to run your test suite on one version of Ruby, check to
make sure everything passes, then run it on the other version you want to
support and see what breaks. After seeing failures, it might seem
easy enough to just drop in code such as the following to make things go
green again:

----------------------------------------------------------------------------

def my_method(string)
  lines = if RUBY_VERSION < "1.9"
    string.to_a
  else
    string.lines
  end
  do_something_with(lines)
end

----------------------------------------------------------------------------

Resist this temptation! If you aren't careful, this will result in a giant
mess that will be difficult to refactor, and will make your code less
readable.
Instead, we can approach this in a more organized fashion.

==== Selective Backporting ====

Before duplicating any effort, it's important to check whether there is
a reasonable way to write your code so that it runs natively on both
Ruby 1.8 and 1.9. Even if this means writing code that's a little more
verbose, it's generally worth the effort, as it prevents the codebase from
diverging.

If this fails, however, it may make sense to simply back-port the feature you
need to Ruby 1.8. Because of Ruby's open classes, this is easy to do. We can
even loosen up our changes so that they check for particular features rather
than a specific version number, to improve our compatibility with other
applications and Ruby implementations:

----------------------------------------------------------------------------

class String
  unless "".respond_to?(:lines)
    alias_method :lines, :to_a
  end
end

----------------------------------------------------------------------------

Doing this will allow you to rewrite your method so that it looks more
natural:

----------------------------------------------------------------------------

def my_method(string)
  do_something_with(string.lines)
end

----------------------------------------------------------------------------

Although this implementation isn't exact, it is good enough for our needs
and will work as expected in most cases.
However, if we wanted to be pedantic, we'd be sure to return an Enumerator
instead of an Array:

----------------------------------------------------------------------------

class String
  unless "".respond_to?(:lines)
    require "enumerator"

    def lines
      to_a.enum_for(:each)
    end
  end
end

----------------------------------------------------------------------------

If you aren't redistributing your code, passing tests in your application
and code that works as expected are a good enough indication that your
backwards compatibility patches are working. However, in code that you plan
to distribute, open source or otherwise, you need to be prepared to make
things more robust when necessary. Any time you distribute code that
modifies core Ruby, you have an implicit responsibility not to break
third-party libraries or application code, so be sure to keep this in mind
and clearly document exactly what you have changed.

In Prawn, we use a single file, 'prawn/compatibility.rb', to store all the core
extensions used in the library that support backwards compatibility. This
makes it easy for users to track down every change the library makes to core
Ruby, which in turn makes subtle bugs arising from version incompatibilities
easier to spot.

In general, this approach is a fairly solid way to keep your application code
clean while supporting both Ruby 1.8 and 1.9. However, you should only use
it to add functionality that exists in Ruby 1.9.1 but is missing from 1.8.6,
not to modify existing behavior. Adding methods that don't exist in a standard
version of Ruby is a relatively low-risk procedure, whereas changing core
functionality is a far more controversial practice.
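
If you want the "add, don't modify" rule to be enforced rather than just remembered, you could route every backport through a tiny guard. The following is a minimal sketch, not code from Prawn: +backport+ is a hypothetical helper name, and it assumes that "the class already defines this method" (checked with +method_defined?+ and +private_method_defined?+) is a good enough stand-in for "this Ruby already provides it".

```ruby
# Hypothetical helper (not part of Ruby or Prawn): defines an instance
# method on +klass+ only when the class does not already have one, so an
# existing core implementation is never replaced.
# Returns true if the method was added, false if it was left alone.
def backport(klass, name, &body)
  if klass.method_defined?(name) || klass.private_method_defined?(name)
    return false
  end

  # define_method is private on older Rubies, hence the call through send
  klass.send(:define_method, name, &body)
  true
end

# On Ruby 1.9, String#lines already exists, so this line changes nothing;
# on Ruby 1.8.6 it would add the same Array-based approximation shown above.
backport(String, :lines) { to_a }
```

Because the helper reports whether it did anything, it also becomes easy to log or test which backports were actually applied on a given Ruby.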

==== Version-specific Code Blocks ====

If you run into a situation where you really need two different approaches
between the two major versions of Ruby, you can use a trick to make this
a bit more attractive in your code:

----------------------------------------------------------------------------

if RUBY_VERSION < "1.9"
  def ruby_18
    yield
  end

  def ruby_19
    false
  end
else
  def ruby_18
    false
  end

  def ruby_19
    yield
  end
end

----------------------------------------------------------------------------

Here's an example of how you'd make use of these methods:

----------------------------------------------------------------------------

def open_file(file)
  ruby_18 { File.open(file, "r") } ||
  ruby_19 { File.open(file, "r:UTF-8") }
end

----------------------------------------------------------------------------

On Ruby 1.8, the +ruby_18+ block runs and its result short-circuits the +||+;
on Ruby 1.9, +ruby_18+ simply returns false, so evaluation falls through to
the +ruby_19+ block.

Of course, since this approach creates a divergent codebase, it should be
used as sparingly as possible. However, since this looks a little nicer
than a conditional statement and provides a centralized place for changes to
minor version numbers if needed, it is a nice way to go when it is actually
necessary.

==== Compatibility shims for common operations ====

When you need to accomplish the same thing in two different ways, you can
also consider adding a method to both versions of Ruby. Although Ruby 1.9.1
shipped with `File.binread()`, this method did not exist in the earlier
developmental versions of Ruby 1.9.

Although a handful of +ruby_18+ and +ruby_19+ calls here and there aren't that bad,
the need for opening binary files was pervasive, and it got tiring to see
the following code popping up everywhere this feature was needed:

----------------------------------------------------------------------------

ruby_18 { File.open("foo.jpg", "rb") } ||
ruby_19 { File.open("foo.jpg", "rb:BINARY") }

----------------------------------------------------------------------------

To simplify things, we put together a simple `File.read_binary` method that
worked on both Ruby 1.8 and 1.9. As you can see, this is nothing particularly
exciting or surprising:

----------------------------------------------------------------------------

if RUBY_VERSION < "1.9"

  class File
    def self.read_binary(file)
      File.open(file, "rb") { |f| f.read }
    end
  end

else

  class File
    def self.read_binary(file)
      File.open(file, "rb:BINARY") { |f| f.read }
    end
  end

end

----------------------------------------------------------------------------

This cleaned up the rest of our code greatly, and reduced the number of
version checks significantly. Of course, when `File.binread()` came along in
Ruby 1.9.1, we went and used the techniques discussed earlier to back-port it
to 1.8.6, but prior to that, this represented a nice way to attack the
same problem in two different ways.

Now that we've discussed all the relevant techniques, I can show you
what prawn/compatibility.rb looks like.
This file allows Prawn to run on both major versions of Ruby without any
issues, and as you can see, it is quite compact:

----------------------------------------------------------------------------

class String #:nodoc:
  unless "".respond_to?(:lines)
    alias_method :lines, :to_a
  end
end

unless File.respond_to?(:binread)
  def File.binread(file)
    File.open(file, "rb") { |f| f.read }
  end
end

if RUBY_VERSION < "1.9"

  def ruby_18
    yield
  end

  def ruby_19
    false
  end

else

  def ruby_18
    false
  end

  def ruby_19
    yield
  end

end

----------------------------------------------------------------------------

This code leaves Ruby 1.9.1 virtually untouched, and only adds a couple of
simple features to Ruby 1.8.6. These small modifications enable Prawn to have
cross-compatibility between versions of Ruby without polluting its
codebase with copious version checks and workarounds. Of course, there are
a few areas that needed extra attention, and we'll talk about the sorts of
issues to look out for in just a moment, but for the most part, this little
compatibility file gets the job done.

Even if someone produced a Ruby 1.8 / 1.9 compatibility library that you
could include in your projects, it might still be advisable to copy only
what you need from it. The core philosophy here is that we want to do
as much as we can to let each respective version of Ruby be what it is,
to avoid confusing and painful debugging sessions. By taking a minimalist
approach and making it as easy as possible to locate your platform-specific
changes, we can help make things run more smoothly.

Before we move on to some more specific details on particular
incompatibilities and how to work around them, let's recap the key points
of this section:

* Try to support both Ruby 1.8 and 1.9 from the ground up. However, be
  sure to write your code against Ruby 1.9 first and then backport to 1.8
  if you want to prevent yourself from writing too much legacy code.

* Before writing any version-specific code or modifying core Ruby, attempt
  to find a way to write code that runs natively on both Ruby 1.8 and 1.9.
  Even if the solution turns out to be less beautiful than usual, it's
  better to have code that works without introducing redundant
  implementations or modifications to core Ruby.

* For features that don't have a straightforward solution that works on
  both versions, consider back-porting the necessary functionality to
  Ruby 1.8 by adding new methods to existing core classes.

* If a feature is too complicated to backport or involves separate
  procedures across versions, consider adding a helper method that behaves
  the same on both versions.

* If you need to inline version checks, consider using the +ruby_18+ and
  +ruby_19+ blocks shown in this chapter. These centralize your version
  checking logic and provide room for refactoring and future extension.

With these thoughts in mind, let's talk about some incompatibilities you just
can't work around, and how to avoid them.

=== Non-portable features in Ruby 1.9 ===

There are some features in Ruby 1.9 that you simply cannot backport to 1.8
without modifying the interpreter itself. Here we'll talk about just a few
of the more obvious ones, to serve as a reminder of what to avoid if you plan
to have your code run on both versions of Ruby.
In no particular order, here's a fun list of things that'll cause a backport
to grind to a halt if you're not careful.

==== Pseudo-keyword Hash Syntax ====

Ruby 1.9 adds a cool feature that lets you write things like:

...........................................................................

foo(a: 1, b: 2)

...........................................................................

But on Ruby 1.8, we're stuck using the old `key => value` syntax:

...........................................................................

foo(:a => 1, :b => 2)

...........................................................................

==== Multi-splat arguments ====

Ruby 1.9.1 offers a downright insane number of ways to process arguments to
methods. But even the simpler ones, such as multiple splats in an
argument list, are not backwards compatible. Here's an example of something
you can do on Ruby 1.9 that you can't on Ruby 1.8, which is something to be
avoided in backwards compatible code:

...........................................................................

def add(a,b,c,d,e)
  a + b + c + d + e
end

add(*[1,2], 3, *[4,5]) #=> 15

...........................................................................

The closest thing we can get to this on Ruby 1.8 would be something like this:

...........................................................................

add(*[[1,2], 3, [4,5]].flatten) #=> 15

...........................................................................

Of course, this isn't nearly as appealing. It doesn't even handle the same edge
cases that Ruby 1.9 does, as this would not work with any array arguments that
are meant to be kept as an array.
So it's best not to rely on this kind of interface in code that needs to run
on both 1.8 and 1.9.

==== Block-local variables ====

On Ruby 1.9, block variables will shadow outer local variables, resulting
in the following behaviour:

...........................................................................

>> a = 1
=> 1
>> (1..10).each { |a| a }
=> 1..10
>> a
=> 1

...........................................................................

This is not the case on Ruby 1.8, where the outer variable will be modified
even if it is not explicitly set inside the block:

...........................................................................

>> a = 1
=> 1
>> (1..10).each { |a| a }
=> 1..10
>> a
=> 10

...........................................................................

This can be the source of a lot of subtle errors, so if you want to be safe
on Ruby 1.8, be sure to use different names for your block-local variables
so as to avoid accidentally overwriting outer local variables.

==== Block Arguments ====

In Ruby 1.9, blocks can accept block arguments, which is most commonly seen
in `define_method`:

...........................................................................

define_method(:answer) { |&b| b.call(42) }

...........................................................................

However, this won't work on Ruby 1.8 without some very ugly workarounds, so
it might be best to rethink things and see if you can do them in a different
way if you've been relying on this functionality.

==== New Proc Syntax ====

Both the "stabby lambda" syntax (`->`) and the `.()` call are new in 1.9, and
aren't parseable by the Ruby 1.8 interpreter.
This means that calls like this need to go:

...........................................................................

>> ->(a) { a*3 }.(4)
=> 12

...........................................................................

Instead, use the trusty `lambda` method and `Proc#call` or `Proc#[]`:

...........................................................................

>> lambda { |a| a*3 }[4]
=> 12

...........................................................................

==== Oniguruma ====

Although it is possible to build the Oniguruma regular expression engine into
Ruby 1.8, it is not distributed by default, and thus should not be used in
backwards compatible code. This means that if you're using named groups,
you'll need to ditch them. The following code is using named groups:

...........................................................................

>> "Gregory Brown".match(/(?<first_name>\w+) (?<last_name>\w+)/)
=> #<MatchData "Gregory Brown" first_name:"Gregory" last_name:"Brown">

...........................................................................

We'd need to rewrite this as:

...........................................................................

>> "Gregory Brown".match(/(\w+) (\w+)/)
=> #<MatchData "Gregory Brown" 1:"Gregory" 2:"Brown">

...........................................................................

More advanced regular expressions, including those that make use of positive
or negative look-behind, will need to be completely rewritten so that they
work on both Ruby 1.8's regular expression engine and Oniguruma.

==== Most M17N Functionality ====

Though it may go without saying, Ruby 1.8 is not particularly well suited for
working with character encodings.
There are some workarounds for this, but things like magic comments that tell
what encoding a file is in, or String objects that are aware of their current
encoding, are completely missing from Ruby 1.8.

Although we could go on, I'll leave the rest of the incompatibilities for you
to research. Keeping an eye on the issues mentioned in this section will
help you avoid some of the most common problems, and that might be enough to
make things run smoothly for you, depending on your needs.

So far we've focused on the things you can't work around, but there are lots
of other issues that can be handled without too much effort, if you know how
to approach them. We'll take a look at a few of those now.

=== Workarounds for common issues ===

Although we have seen that some functionality is simply not portable between
Ruby 1.8 and 1.9, there are many more issues in which Ruby 1.9 just does
things a little differently or more conveniently. In these cases, we can
develop suitable workarounds that allow our code to run on both versions of
Ruby. Let's take a look at a few of these issues and how we can deal with
them.

==== Using Enumerator ====

In Ruby 1.9, pretty much every method that iterates over a collection returns
an Enumerator when called without a block:

...........................................................................

>> [1,2,3,4].map.with_index { |e,i| e + i }
=> [1, 3, 5, 7]

...........................................................................

In Ruby 1.8, Enumerator is part of the standard library instead of core, and
isn't quite as feature-packed. However, we can still accomplish the same
goals by being a bit more verbose:

...........................................................................

>> require "enumerator"
=> true
>> [1,2,3,4].enum_for(:each_with_index).map { |e,i| e + i }
=> [1, 3, 5, 7]

...........................................................................

Because Ruby 1.9's implementation of Enumerator is mostly backwards
compatible with Ruby 1.8, you can write your code in this legacy style
without fear of breaking anything.

==== String Iterators ====

In Ruby 1.8, Strings are Enumerable, whereas in Ruby 1.9, they are not.
Ruby 1.9 provides `String#lines`, `String#each_line`, `String#each_char`, and
`String#each_byte`; of these, `String#lines` and `String#each_char` are not
present in Ruby 1.8.6.

The best bet here is to back-port the features you need to Ruby 1.8, and
avoid treating a String as an Enumerable sequence of lines. When you need
that functionality, use `String#lines` followed by whatever Enumerable method
you need.

The underlying point here is that it's better to stick with Ruby 1.9's
functionality, because it'll be less likely to confuse others who might
be reading your code.

==== Character Operations ====

In Ruby 1.9, Strings are generally character aware, which means that you can
index into them and get back a single character, regardless of encoding:

...........................................................................

>> "Foo"[0]
=> "F"

...........................................................................

This is not the case in Ruby 1.8.6, where indexing returns the byte value
of the character instead:

...........................................................................

>> "Foo"[0]
=> 70

...........................................................................
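
When the text you're working with is plain ASCII, this particular mismatch can often be sidestepped without any version check: asking for a one-character substring with `str[index, 1]` returns a String on both versions. The helper below is a hypothetical sketch of that idea. On Ruby 1.8 it still counts bytes rather than characters, so it is no substitute for the character-aware techniques described next when multibyte text is involved.

```ruby
# Hypothetical helper: returns the character at +index+ as a String.
# str[index, 1] yields a one-character String on both Ruby 1.8 and 1.9,
# avoiding the Integer (1.8) vs. String (1.9) result of str[index].
# On Ruby 1.8 the index is a byte offset, so only rely on this for
# single-byte (e.g. ASCII) text.
def char_at(str, index)
  str[index, 1]
end

char_at("Foo", 0) # => "F"
char_at("Foo", 9) # => nil (index is past the end of the string)
```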

If you need to do character-aware operations on Ruby 1.8 and 1.9, you'll need
to process things using a regex trick that gets you back an array of
characters. After setting `$KCODE="U"` footnote:[This is necessary to work with
UTF-8 on Ruby 1.8, but has no effect on 1.9], `/./m` matches one whole
character at a time (the `m` flag lets `.` match newlines too), so
`scan(/./m)` splits a string into an array of characters. This means
replacing calls to `String#reverse` with the following:

...........................................................................

>> "résumé".scan(/./m).reverse.join
=> "émusér"

...........................................................................

Or as another example, you'll replace `String#chop` with this:

...........................................................................

>> r = "résumé".scan(/./m); r.pop; r.join
=> "résum"

...........................................................................

Depending on how many of these manipulations you'll need to do, you might
consider breaking out the Ruby 1.8 compatible code from the clearer Ruby 1.9
code using the techniques discussed earlier in this chapter. However, the
thing to remember is that anywhere you've been enjoying Ruby 1.9's M17N
support, you'll need to do some rework. The good news is that many of
the techniques used on Ruby 1.8 still work on Ruby 1.9, but the bad news is
that they can appear quite convoluted to those who have gotten used to the
way things work in newer versions of Ruby.

==== Encoding Conversions ====

Ruby 1.9 has built-in support for transcoding between various character
encodings, whereas Ruby 1.8 is more limited. However, both versions support
Iconv.
If you know exactly what formats you want to translate between, you can load
the library with `require 'iconv'` and simply replace your
`string.encode("ISO-8859-1")` calls with something like this:

...........................................................................

Iconv.conv("ISO-8859-1", "UTF-8", string)

...........................................................................

However, if you want to let Ruby 1.9 stay smart about its transcoding while
still providing backwards compatibility, you will just need to write code
for each version. Here's an example of how this was done in an early version of
Prawn:

----------------------------------------------------------------------------

if "".respond_to?(:encode!)
  def normalize_builtin_encoding(text)
    text.encode!("ISO-8859-1")
  end
else
  require 'iconv'
  def normalize_builtin_encoding(text)
    text.replace Iconv.conv('ISO-8859-1//TRANSLIT', 'utf-8', text)
  end
end

----------------------------------------------------------------------------

Although there is duplication of effort here, the Ruby 1.9-based code does
not assume UTF-8 input, whereas the Ruby 1.8-based code is forced to
make this assumption. In cases where you want to support many encodings on
Ruby 1.9, this may be the right way to go.

Although we've just scratched the surface, this handful of tricks should
cover some of the most common issues you'll encounter. For everything
else, consult your favorite language reference.

=== Conclusions ===

Depending on the nature of your project, getting things running on both Ruby
1.8 and 1.9 can be either trivial or a major undertaking. The more string
processing you are doing, and the greater your need for multilingualization
support, the more complicated a backwards compatible port of your software to
Ruby 1.8 will be.
Additionally, if you've been digging into some of the fancy new features that
ship with Ruby 1.9, you might find yourself doing some serious rewriting when
the time comes to support older versions of Ruby.

In light of all this, it's best to start (if you can afford to) by supporting
both versions from the ground up. By writing your code in a fairly backwards
compatible subset of Ruby 1.9, you'll minimize the amount of duplicated
effort that is needed to support both versions. If you keep your
compatibility hacks well organized and centralized, it'll be easier to spot
any problems that might crop up.

If you find yourself writing the same workaround several times, think about
extending the core with some helpers to make your code clearer. However,
keep in mind that when you redistribute code, you have a responsibility not
to break existing language features and that you should strive to avoid
conflicts with third-party libraries.

Still, don't let all these caveats turn you away. Writing code that runs
on both Ruby 1.8 and 1.9 is about the friendliest thing you can do in terms
of open source Ruby, and will also be beneficial in other scenarios. Start
by reviewing the guidelines in this chapter, then remember to keep testing
your code on both versions of Ruby. So long as you keep things well
organized and try as best as you can to minimize version-specific code, you
should be able to get your project working on both Ruby 1.8 and 1.9 without
conflicts. This gives you a great degree of flexibility, which is often worth
the extra effort.

--------------------------------------------------------------------------------
/manuscript/unmerged/io.txt:
--------------------------------------------------------------------------------
== Text Processing and File Management ==

=== A Job Scripting Languages Are Built For ===

Ruby fills a lot of the same roles that languages such as Perl and Python
do. Because of this, you can expect to find first-rate support for text
processing and file management. Whether it's parsing a text file with
some regular expressions or building some *nix style filter applications,
Ruby can help make life easier.

However, many of Ruby's I/O facilities are tersely documented at best. It is
also relatively hard to find good resources that show you general strategies
for attacking common text processing tasks. This chapter aims to expose
you to some good tricks that you can use to simplify your text processing
needs, as well as sharpen your skills when it comes to interacting with
and managing files on your system.

As in other chapters, we'll start off by looking at some real open source
code: this time, a simple parser for an Adobe Font Metrics file. This example
will expose you to text processing in a real-world setting. We'll then follow
up with a number of detailed sections that look at different practices that
will help you master basic I/O skills. Armed with these techniques, you'll
be able to take on all sorts of text processing and file management tasks with
ease.

=== Line-Based File Processing with State Tracking ===

Processing a text document line by line does not mean that we're limited
to extracting content in a uniform way, treating each line identically.
Some files have more structure than that, but can still benefit from
being processed linearly.
We're now going to look over a small parser that 32 | illustrates this general idea by selecting different ways to extract our 33 | data based on what section of a file we are in. 34 | 35 | The code in this section was written by James Edward Gray II as part of Prawn's 36 | support for Adobe Font Metrics. Though the example itself is domain-specific, 37 | we won't get hung up on the particular details of this parser. Instead, we'll 38 | be taking a look at the general approach for building a state-aware parser 39 | that operates on an efficient line-by-line basis. Along the way, you'll pick 40 | up some basic I/O tips and tricks as well as see the important role regular 41 | expressions often play in this sort of task. 42 | 43 | Before we take a look at the actual parser, let's glance at the sort 44 | of data we're dealing with. Adobe Font Metrics files are essentially font 45 | glyph measurements and specifications, so they tend to look a bit like a 46 | configuration file of sorts. Some of these entries are simple key-value 47 | pairs, such as: 48 | 49 | ............................................................................... 50 | 51 | CapHeight 718 52 | XHeight 523 53 | Ascender 718 54 | Descender -207 55 | 56 | ............................................................................... 57 | 58 | Others are organized sets of values within a section, as in the following 59 | example: 60 | 61 | --------------------------------------------------------------------------------- 62 | 63 | StartCharMetrics 315 64 | C 32 ; WX 278 ; N space ; B 0 0 0 0 ; 65 | C 33 ; WX 278 ; N exclam ; B 90 0 187 718 ; 66 | C 34 ; WX 355 ; N quotedbl ; B 70 463 285 718 ; 67 | C 35 ; WX 556 ; N numbersign ; B 28 0 529 688 ; 68 | C 36 ; WX 556 ; N dollar ; B 32 -115 520 775 ; 69 | ....
70 | EndCharMetrics 71 | 72 | --------------------------------------------------------------------------------- 73 | 74 | Sections can be nested within each other, making things more interesting. 75 | The data across the file does not fit a uniform format, as each section 76 | represents a different sort of thing. However, we can come up with patterns 77 | to parse data in each section we're interested in, because the data is consistent 78 | within each section. We are also only interested in a subset of the 79 | sections, so we can safely ignore some of them. This is the essence of the 80 | task we needed to accomplish, but as you may notice, it's a fairly general 81 | pattern that we can reuse. Many documents with a simple section-based 82 | structure can be handled using the approach we show here. 83 | 84 | The code that follows is essentially a simple finite state machine that keeps 85 | track of what section the current line appears in. It attempts to parse 86 | the opening or closing of a section first, and then uses this information 87 | to determine a parsing strategy for the current line. Sections that 88 | we're not interested in parsing are simply skipped. 89 | 90 | We end up with a very straightforward solution. The whole parser is 91 | reduced to a simple iteration over each line of the file that 92 | manages a stack of nested sections, while determining if and how to 93 | parse the current line. 94 | 95 | We'll look at the parts in more detail in just a moment, but here is the 96 | whole AFM parser that extracts all the information we need to properly render 97 | Adobe fonts in Prawn: 98 | 99 | ...............................................................................
100 | 101 | def parse_afm(file_name) 102 | section = [] 103 | 104 | File.foreach(file_name) do |line| 105 | case line 106 | when /^Start(\w+)/ 107 | section.push $1 108 | next 109 | when /^End(\w+)/ 110 | section.pop 111 | next 112 | end 113 | 114 | case section 115 | when ["FontMetrics", "CharMetrics"] 116 | next unless line =~ /^CH?\s/ 117 | 118 | name = line[/\bN\s+(\.?\w+)\s*;/, 1] 119 | @glyph_widths[name] = line[/\bWX\s+(\d+)\s*;/, 1].to_i 120 | @bounding_boxes[name] = line[/\bB\s+([^;]+);/, 1].to_s.rstrip 121 | when ["FontMetrics", "KernData", "KernPairs"] 122 | next unless line =~ /^KPX\s+(\.?\w+)\s+(\.?\w+)\s+(-?\d+)/ 123 | @kern_pairs[[$1, $2]] = $3.to_i 124 | when ["FontMetrics", "KernData", "TrackKern"], ["FontMetrics", "Composites"] 125 | next 126 | else 127 | parse_generic_afm_attribute(line) 128 | end 129 | end 130 | end 131 | 132 | ............................................................................... 133 | 134 | You could try to understand the particular details if you'd like, but it's 135 | also fine to black-box the expressions used here so that you can get 136 | a sense of the overall structure of the parser. Here's what the code 137 | looks like if we do that for all but the patterns which determine the 138 | section nesting: 139 | 140 | ............................................................................... 
141 | 142 | def parse_afm(file_name) 143 | section = [] 144 | 145 | File.foreach(file_name) do |line| 146 | case line 147 | when /^Start(\w+)/ 148 | section.push $1 149 | next 150 | when /^End(\w+)/ 151 | section.pop 152 | next 153 | end 154 | 155 | case section 156 | when ["FontMetrics", "CharMetrics"] 157 | parse_char_metrics(line) 158 | when ["FontMetrics", "KernData", "KernPairs"] 159 | parse_kern_pairs(line) 160 | when ["FontMetrics", "KernData", "TrackKern"], ["FontMetrics", "Composites"] 161 | next 162 | else 163 | parse_generic_afm_attribute(line) 164 | end 165 | end 166 | end 167 | 168 | ............................................................................... 169 | 170 | With these simplifications, it's very clear that we're looking at an ordinary 171 | finite state machine which is acting upon the lines of the file. It 172 | also makes it easier to notice what's actually going on. 173 | 174 | The first case statement is just a simple way to check for which section 175 | we're currently looking at, updating the stack as necessary as we move 176 | in and out of sections: 177 | 178 | ............................................................................... 179 | 180 | case line 181 | when /^Start(\w+)/ 182 | section.push $1 183 | next 184 | when /^End(\w+)/ 185 | section.pop 186 | next 187 | end 188 | 189 | ............................................................................... 190 | 191 | If we find a section beginning or end, we skip to the next line as we know 192 | there is nothing else to parse. Otherwise, we know that we have to do some 193 | real work, which is done in the second case statement: 194 | 195 | ............................................................................... 
196 | 197 | case section 198 | when ["FontMetrics", "CharMetrics"] 199 | next unless line =~ /^CH?\s/ 200 | 201 | name = line[/\bN\s+(\.?\w+)\s*;/, 1] 202 | @glyph_widths[name] = line[/\bWX\s+(\d+)\s*;/, 1].to_i 203 | @bounding_boxes[name] = line[/\bB\s+([^;]+);/, 1].to_s.rstrip 204 | when ["FontMetrics", "KernData", "KernPairs"] 205 | next unless line =~ /^KPX\s+(\.?\w+)\s+(\.?\w+)\s+(-?\d+)/ 206 | @kern_pairs[[$1, $2]] = $3.to_i 207 | when ["FontMetrics", "KernData", "TrackKern"], ["FontMetrics", "Composites"] 208 | next 209 | else 210 | parse_generic_afm_attribute(line) 211 | end 212 | 213 | ............................................................................... 214 | 215 | Here, we've got four different ways to handle our line of text. In the first 216 | two cases, we process the lines we need to as we walk through the section, 217 | extracting the bits of information we need and ignoring the extraneous 218 | information we're not interested in. 219 | 220 | In the third case, we identify certain sections to skip and simply resume 221 | processing the next line if we are currently within that section. 222 | 223 | Finally, if the other cases fail to match, our fallback is to 224 | assume we're dealing with a simple key-value pair, which is handled by a 225 | private helper method in Prawn. Since it does not provide anything different 226 | to look at than the first two branches of this case statement, we can 227 | safely ignore how it works without missing anything important. 228 | 229 | However, you might have noticed that the first 230 | case and the second case use two different ways of extracting values. The code 231 | that processes +CharMetrics+ uses +String#[]+, whereas the code handling 232 | +KernPairs+ uses Perl-style global match variables. The reason for this is 233 | largely convenience.
The following two lines of code are equivalent: 234 | 235 | ............................................................................... 236 | 237 | name = line[/\bN\s+(\.?\w+)\s*;/, 1] 238 | name = line =~ /\bN\s+(\.?\w+)\s*;/ && $1 239 | 240 | ............................................................................... 241 | 242 | There are still other ways to handle your captured matches 243 | (such as +MatchData+ via +String#match+), but we'll get into those later. For 244 | now, it's simply worth knowing that when you're trying to extract a single 245 | matched capture, +String#[]+ does the job well, but if you need to deal with 246 | more than one, you need to use another approach. We see this clearly in 247 | the second case: 248 | 249 | ............................................................................... 250 | 251 | next unless line =~ /^KPX\s+(\.?\w+)\s+(\.?\w+)\s+(-?\d+)/ 252 | @kern_pairs[[$1, $2]] = $3.to_i 253 | 254 | ............................................................................... 255 | 256 | This code is a bit clever, as the line that assigns the values to +@kern_pairs+ 257 | only gets executed when there is a successful match. When the match fails, 258 | +=~+ returns +nil+, and +next unless+ skips ahead to the next line for 259 | processing. 260 | 261 | We could continue studying this example, but we'd then be delving into the 262 | specifics, and those details aren't important for remembering this simple 263 | general pattern. 264 | 265 | When dealing with a structured document that can be processed by discrete 266 | rules for each section, the general approach is simple and does not typically 267 | require pulling the entire document into memory or doing multiple passes 268 | through the data. 269 | 270 | Instead, you can do the following: 271 | 272 | * Identify the beginning and end markers of sections with a pattern.
273 | 274 | * If sections are nested, maintain a stack that you update before further 275 | processing of each line. 276 | 277 | * Break up your extraction code into different cases and select the right 278 | one based on the current section you are in. 279 | 280 | * When a line cannot be processed, skip to the next one as soon as possible, 281 | using the +next+ keyword. 282 | 283 | * Maintain state as you normally would, processing whatever data you need. 284 | 285 | By following these basic guidelines, you can avoid overthinking your problem, 286 | while still saving clock cycles and keeping your memory footprint low. 287 | Although the code here solves a particular problem, it can easily be adapted 288 | to fit a wide range of basic document processing needs. 289 | 290 | This introduction has hopefully provided a taste of what text processing in 291 | Ruby is all about. The rest of the chapter will provide many more tips and 292 | tricks, each focused more closely on a particular topic. Feel free to jump 293 | around to the things that interest you most, but I'm hoping all of the 294 | sections have something interesting to offer to even seasoned Rubyists. 295 | 296 | === Regular Expressions === 297 | 298 | At the time of writing this chapter, I was spending some time watching the 299 | Dow Jones Industrial Average, as the world was in the middle of a major 300 | financial meltdown. If you're wondering what this has to do with Ruby 301 | or regular expressions, take a quick look at the following code: 302 | 303 | ............................................................................... 304 | 305 | require "open-uri" 306 | loop do 307 | puts( open("http://finance.google.com/finance?cid=983582").read[ 308 | /([+-]?\d+\.\d+)/m, 1] ) 309 | sleep(30) 310 | end 311 | 312 | ...............................................................................
313 | 314 | In just a couple of lines, I was able to throw together a script that would 315 | poll Google Finance and pull down the current value of the Dow. This 316 | sort of "find a needle in the haystack" extraction is what regular expressions 317 | are all about. 318 | 319 | Of course, the art of constructing regular expressions is often veiled in 320 | mystery. Even simple patterns such as this one might make some folks feel a bit uneasy: 321 | 322 | ............................................................................... 323 | 324 | /([+-]?\d+\.\d+)/m 325 | 326 | ............................................................................... 327 | 328 | This expression is simple by comparison to some other examples we can show, 329 | but it still makes use of a number of regular expression concepts. All in 330 | one line, we can see the use of character classes (both general and special), 331 | escapes, quantifiers, groups, and the +m+ switch, which in Ruby lets +.+ match newlines. 332 | 333 | Patterns are dense because they are written in a special syntax that acts 334 | as a sort of domain language for matching and extracting text. This language 335 | can seem daunting precisely because it is built from so 336 | few special characters, each of which carries a lot of meaning: 337 | 338 | ............................................................................... 339 | 340 | \ [ ] . ^ $ ? * + { } | ( ) 341 | 342 | ............................................................................... 343 | 344 | At their heart, regular expressions are nothing more than a facility for 345 | find and replace operations. This concept is so familiar that anyone who 346 | has used a word processor has a strong grasp of it. Using a regex, you 347 | can easily replace all instances of the word "Mitten" with "Kitten", just 348 | like your favorite text editor or word processor can: 349 | 350 | ...............................................................................
351 | 352 | some_string.gsub(/\bMitten\b/,"Kitten") 353 | 354 | ............................................................................... 355 | 356 | Many programmers get this far and stop. They learn to use regex as if it 357 | were a necessary evil rather than an essential technique. We can do 358 | better than that. In this section, we'll look at a few guidelines for 359 | how to write effective patterns that do what they're supposed to without 360 | getting too convoluted. I'm assuming you've done your homework and are 361 | at least familiar with regex basics as well as Ruby's pattern syntax. If 362 | that's not the case, pick up your favorite language reference and take a few 363 | minutes to review the fundamentals. 364 | 365 | So long as you can comfortably read the first example in this section, you're 366 | ready to move on. If you can convince yourself that writing regular 367 | expressions is actually much easier than people tend to think it is, the tips 368 | and tricks to follow shouldn't cause you to break a sweat. 369 | 370 | ==== Don't Work Too Hard ==== 371 | 372 | Although regular expressions are a compact format, it's relatively easy to write bloated 373 | patterns if you don't consciously remember to keep things clean and tight. 374 | We'll now take a look at a couple of sources of extra fat and how to trim them 375 | down. 376 | 377 | Alternation is a very powerful regex tool. It allows you to match one of 378 | a series of potential sequences. For example, if you want to match the name 379 | "James Gray" but also match "James gray", "james Gray", and "james gray", 380 | the following code will do the trick: 381 | 382 | ............................................................................... 383 | 384 | >> ["James Gray", "James gray", "james gray", "james Gray"].all? { |e| 385 | ?> e.match(/(James|james) (Gray|gray)/) } 386 | => true 387 | 388 | ...............................................................................
389 | 390 | However, you don't need to work so hard. You're really only alternating 391 | between two possible first letters in each word, not between full words. You could 392 | write this far more efficiently using a character class: 393 | 394 | ............................................................................... 395 | 396 | >> ["James Gray", "James gray", "james gray", "james Gray"].all? { |e| 397 | ?> e.match(/[Jj]ames [Gg]ray/) } 398 | => true 399 | 400 | ............................................................................... 401 | 402 | This makes your pattern clearer, and it also gives Ruby's regex engine less 403 | alternation and backtracking to work through. So in addition to looking better, 404 | this code is likely to be faster. 405 | 406 | In a similar vein, it is unnecessary to use explicit character classes 407 | when a shortcut will do. To match a four-digit number, we could write: 408 | 409 | ............................................................................... 410 | 411 | /[0-9][0-9][0-9][0-9]/ 412 | 413 | ............................................................................... 414 | 415 | This can of course be cleaned up a bit using a repetition count: 416 | 417 | ............................................................................... 418 | 419 | /[0-9]{4}/ 420 | 421 | ............................................................................... 422 | 423 | However, we can do even better by using the special class built in for this: 424 | 425 | ............................................................................... 426 | 427 | /\d{4}/ 428 | 429 | ............................................................................... 430 | 431 | It pays to learn what shortcuts are available to you. Here's a quick list 432 | for further study, if you're not already familiar with them: 433 | 434 | ............................................................................... 435 | 436 | .
\s \S \w \W \d \D 437 | 438 | ............................................................................... 439 | 440 | Each one of the above corresponds to a literal character class that is more 441 | verbose when written out. Using shortcuts increases clarity and decreases 442 | the chance of bugs creeping in through ill-defined patterns. Though they may seem 443 | a bit terse at first, you'll be able to sight-read them with ease over time. 444 | 445 | ==== Anchors are your friends ==== 446 | 447 | One way to match my name in a string is to write the following simple pattern: 448 | 449 | ............................................................................... 450 | 451 | string =~ /Gregory Brown/ 452 | 453 | ............................................................................... 454 | 455 | However, consider the following: 456 | 457 | ............................................................................... 458 | 459 | >> "matched" if "Mr. Gregory Browne".match(/Gregory Brown/) 460 | => "matched" 461 | 462 | ............................................................................... 463 | 464 | Oftentimes, we mean "match this phrase", but we write "match this sequence 465 | of characters". The solution is to make use of anchors to clarify what we 466 | mean. 467 | 468 | Sometimes we want to match only if a string starts with a phrase: 469 | 470 | ............................................................................... 471 | 472 | >> phrases = ["Mr. Gregory Browne", "Mr. Gregory Brown is cool", 473 | "Gregory Brown is cool", "Gregory Brown"] 474 | 475 | >> phrases.grep /\AGregory Brown\b/ 476 | => ["Gregory Brown is cool", "Gregory Brown"] 477 | 478 | ............................................................................... 479 | 480 | Other times we want to ensure that the string contains the phrase as a whole, not as part of a longer word: 481 | 482 | ...............................................................................
483 | 484 | >> phrases.grep /\bGregory Brown\b/ 485 | => ["Mr. Gregory Brown is cool", "Gregory Brown is cool", "Gregory Brown"] 486 | 487 | ............................................................................... 488 | 489 | And finally, sometimes we want to ensure the string is exactly the phrase: 490 | 491 | ............................................................................... 492 | 493 | >> phrases.grep /\AGregory Brown\z/ 494 | => ["Gregory Brown"] 495 | 496 | ............................................................................... 497 | 498 | Although I am using English names and phrases here for simplicity, this can 499 | of course be generalized to encompass any sort of matching pattern. You could 500 | be verifying that a sequence of numbers fits a certain form, or something 501 | equally abstract. The key thing to take away from this is that when you use 502 | anchors, you're being much more explicit about how you expect your pattern to 503 | match, which in most cases means that you'll have a better chance of catching 504 | problems early, and an easier time remembering what your pattern was supposed 505 | to do. 506 | 507 | An interesting thing to note about anchors is that they don't actually match 508 | characters. Instead, they match positions between characters, allowing you to assert 509 | certain expectations about your strings. So when you use something like +\b+, 510 | you are actually matching the position between a +\w+ and a +\W+ character (in either order), or at +\A+ or +\z+ when they border a word character. In English, 511 | that means that you're transitioning from a word character to a non-word 512 | character, or from a non-word character to a word character, or you're at the 513 | beginning or end of a string that starts or ends with a word character. If you review the use of +\b+ in the examples above, 514 | it should now be very clear how anchors work. 515 | 516 | The most commonly used anchors in Ruby are +\A+, +\Z+, +\z+, +^+, +$+, and 517 | +\b+ (along with its negation, +\B+). Each has its merits, so be sure to read up on them.
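To see why the choice among these anchors matters, here is a small sketch contrasting Ruby's line anchors (+^+, +$+) with its string anchors (+\A+, +\z+, +\Z+). In Ruby, +^+ and +$+ always match at line boundaries, so a validation pattern built with them can accept a multi-line string whose first line happens to look right. The sample strings are made up for illustration.

```ruby
# Ruby's ^ and $ match at the start/end of any line, not just the string.
input = "42\nDROP TABLE users"

line_anchored   = input =~ /^\d+$/    # 0: the first line "42" satisfies it
string_anchored = input =~ /\A\d+\z/  # nil: the whole string is not digits

# \Z also tolerates a single trailing newline; \z demands the true end.
with_newline = "Gregory Brown\n"
loose  = with_newline =~ /\AGregory Brown\Z/  # 0
strict = with_newline =~ /\AGregory Brown\z/  # nil

p [line_anchored, string_anchored, loose, strict]  # prints [0, nil, 0, nil]
```

For validating an entire input, +\A+ and +\z+ are usually what you want; +^+ and +$+ are better suited to scanning line-oriented text.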
518 | 519 | ==== Use caution when working with quantifiers ==== 520 | 521 | One of the most common anti-patterns I picked up when first learning regular 522 | expressions was to make use of +.*+ everywhere. Though this may seem innocent, 523 | it is similar to my bad habit of using +rm -Rf+ on the command line all the 524 | time instead of just +rm+. Both can result in catastrophe when used 525 | incorrectly. 526 | 527 | But maybe you're not as crazy as I am. Instead, maybe you've been writing 528 | innocent things like +/(\d*)Foo/+ to match any number of digits prepended to 529 | the word Foo. 530 | 531 | For some cases, this works great: 532 | 533 | ............................................................................... 534 | 535 | >> "1234Foo"[/(\d*)Foo/,1] 536 | => "1234" 537 | 538 | ............................................................................... 539 | 540 | But does this surprise you? 541 | 542 | ............................................................................... 543 | 544 | >> "xFoo"[/(\d*)Foo/,1] 545 | => "" 546 | 547 | ............................................................................... 548 | 549 | It may not, but then again it may. It's relatively common to forget that +*+ 550 | can match zero characters, so it always succeeds. At first glance, the following code seems fine: 551 | 552 | ............................................................................... 553 | 554 | if num = string[/(\d*)Foo/,1] 555 | Integer(num) 556 | end 557 | 558 | ............................................................................... 559 | 560 | However, when no digits precede +Foo+, the capture is an empty string (which is truthy in Ruby), 561 | so +Integer(num)+ raises an +ArgumentError+. The solution is simple. If you really mean "at least 562 | one", use + instead. 563 | 564 | ...............................................................................
565 | 566 | if num = string[/(\d+)Foo/,1] 567 | Integer(num) 568 | end 569 | 570 | ............................................................................... 571 | 572 | Though more experienced folks might not easily be trapped by something so 573 | simple, there are more subtle variants. For example, if we intend to match 574 | only "Greg" or "Gregory", the following code doesn't quite work: 575 | 576 | ............................................................................... 577 | 578 | >> "Gregory"[/Greg(ory)?/] 579 | => "Gregory" 580 | >> "Greg"[/Greg(ory)?/] 581 | => "Greg" 582 | >> "Gregor"[/Greg(ory)?/] 583 | => "Greg" 584 | 585 | ............................................................................... 586 | 587 | Even if the pattern looks close to what we want, we can see the results 588 | don't fit. The following modifications remedy the issue: 589 | 590 | ............................................................................... 591 | 592 | >> "Gregory"[/\bGreg(ory)?\b/] 593 | => "Gregory" 594 | >> "Greg"[/\bGreg(ory)?\b/] 595 | => "Greg" 596 | >> "Gregor"[/\bGreg(ory)?\b/] 597 | => nil 598 | 599 | ............................................................................... 600 | 601 | Notice that the pattern now properly matches Greg or Gregory, but no other 602 | words. The key thing to take away here is that unbounded zero-matching 603 | quantifiers are tautologies. They can never fail to match, so you need 604 | to be sure to account for that. 605 | 606 | A final gotcha about quantifiers is that they are greedy by default. 607 | This means they'll try to consume as much of the string as possible before 608 | matching. The following is an example of a greedy match: 609 | 610 | ............................................................................... 611 | 612 | >> "# x # y # z #"[/#(.*)#/,1] 613 | => " x # y # z " 614 | 615 | ............................................................................... 
616 | 617 | As you can see, this matches everything between the first and last +#+ character. 618 | But sometimes, we want processing to happen from the left and end as soon 619 | as we have a match. To do this, we append a +?+ to our repetition: 620 | 621 | ............................................................................... 622 | 623 | >> "# x # y # z #"[/#(.*?)#/,1] 624 | => " x " 625 | 626 | ............................................................................... 627 | 628 | All quantifiers can be made non-greedy this way. Remembering this will save a lot of 629 | headaches in the long run. 630 | 631 | Though our treatment of regular expressions has been by no means 632 | comprehensive, these few basic tips will really carry you a long way. 633 | The key things to remember are: 634 | 635 | * Regular Expressions are nothing more than a special language 636 | for find and replace operations, built upon simple logical constructs. 637 | 638 | * There are lots of shortcuts built in for common regular expression 639 | operations, so be sure to make use of special character classes and 640 | other simplifications when you can. 641 | 642 | * Anchors provide a way to set up some expectation about where in a string 643 | you want to look for a match. These help with both optimization and 644 | pattern correctness. 645 | 646 | * Quantifiers such as +*+ and +?+ will always match, so they should not be 647 | used without sufficient boundaries. 648 | 649 | * Quantifiers are greedy by default, and can be made non-greedy via +?+. 650 | 651 | By following these guidelines, you'll write clearer, more accurate, and 652 | faster regular expressions. As a result, it'll be a whole lot easier to 653 | revisit them when you run into them in your own old code a few months down 654 | the line. 655 | 656 | A final note on regular expressions is that sometimes we are seduced by their 657 | power and overlook other solutions that may be more robust for certain needs. 
658 | In both the stock ticker and AFM parsing examples, we were working within 659 | the realm where regular expressions are a quick, easy, and perfectly good way to go. 660 | 661 | However, as documents take on more complex structures, and your needs move 662 | from extracting some values to attempting to fully parse a document, you 663 | will probably need to look to other techniques that involve full-blown 664 | parsers such as Treetop, Ghostwheel, or Racc. These libraries can solve 665 | problems that regular expressions can't, and if you find yourself with 666 | data that's hard to map a regex to, it's worth looking at these alternative 667 | solutions. 668 | 669 | Of course, your mileage will vary based on the problem at hand, so don't be 670 | afraid of trying a regex-based solution first before pulling out the big guns. 671 | 672 | === Working With Files === 673 | 674 | There is a whole slew of options for doing various file management tasks in 675 | Ruby. Because of this, it can be difficult to decide what the best approach 676 | for a given task might be. In this section, we'll cover two key tasks while 677 | looking at three of Ruby's standard libraries. 678 | 679 | We'll start by showing how to use the +Pathname+ and +FileUtils+ libraries to 680 | traverse your file system using a clean cross-platform approach that rivals 681 | the power of popular *nix shells without sacrificing compatibility. We'll 682 | then move on to show how to use +Tempfile+ to automate handling of temporary 683 | file resources within your scripts. These practical tips will help you 684 | write platform-agnostic Ruby code that'll work out of the box on more 685 | systems, while still managing to make your job easier. 686 | 687 | ==== Using Pathname and FileUtils ==== 688 | 689 | If you are using Ruby to write administrative scripts, it's nearly inevitable 690 | that you've needed to do some file management along the way.
It may be quite 691 | tempting to drop down to the shell to do things like move and rename 692 | directories, search for files in a complex directory structure, and do other 693 | common tasks that involve ferrying files around from one place to another. 694 | However, Ruby provides some great tools that make this unnecessary. 695 | 696 | The +Pathname+ and +FileUtils+ standard libraries provide virtually everything 697 | you need for file management. The best way to demonstrate their capabilities 698 | is by example, so we'll now take a look at some code and then break it down 699 | piece by piece. 700 | 701 | To illustrate +Pathname+, we can take a look at a small tool I've built for 702 | doing local installations of libraries found on Github. This script, 703 | called 'mooch', essentially looks up and clones a git repository, puts it 704 | in a convenient place within your project (a 'vendor/' directory), and 705 | optionally sets up a stub file that will add your vendored packages 706 | to the loadpath upon requiring it. Sample usage looks something like 707 | this: 708 | 709 | ............................................................................... 710 | 711 | $ mooch init lib/my_project 712 | $ mooch sandal/prawn 0.2.3 713 | $ mooch ruport/ruport 1.6.1 714 | 715 | ............................................................................... 716 | 717 | Then, we can see the following will work without loading rubygems: 718 | 719 | ............................................................................... 720 | 721 | >> require "lib/my_project/dependencies" 722 | => true 723 | >> require "prawn" 724 | => true 725 | >> require "ruport" 726 | => true 727 | >> Prawn::VERSION 728 | => "0.2.3" 729 | >> Ruport::VERSION 730 | => "1.6.1" 731 | 732 | ............................................................................... 733 | 734 | Although this script is pretty useful, that's not what we're here to talk 735 | about.
Instead, let's focus on how this sort of thing is built, 736 | since it shows a practical example of using +Pathname+ to manipulate files and 737 | folders. I'll start by showing you the whole script, and then we'll walk 738 | through it part by part: 739 | 740 | ............................................................................... 741 | 742 | #!/usr/bin/env ruby 743 | require "pathname" 744 | 745 | WORKING_DIR = Pathname.getwd 746 | LOADER = %Q{ 747 | require "pathname" 748 | 749 | Pathname.glob("#{WORKING_DIR}/vendor/*/*/") do |dir| 750 | lib = dir + "lib" 751 | $LOAD_PATH.push(lib.directory? ? lib : dir) 752 | end 753 | } 754 | 755 | if ARGV[0] == "init" 756 | lib = Pathname.new(ARGV[1]) 757 | lib.mkpath 758 | (lib + 'dependencies.rb').open("w") do |file| 759 | file.write LOADER 760 | end 761 | else 762 | vendor = Pathname.new("vendor") 763 | vendor.mkpath 764 | Dir.chdir(vendor.realpath) 765 | system("git clone git://github.com/#{ARGV[0]}.git #{ARGV[0]}") 766 | if ARGV[1] 767 | Dir.chdir(ARGV[0]) 768 | system("git checkout #{ARGV[1]}") 769 | end 770 | end 771 | 772 | ............................................................................... 773 | 774 | As you can see, it's not a ton of code, even though it does a lot. Let's 775 | shine the spotlight on the interesting `Pathname` bits. 776 | 777 | ............................................................................... 778 | 779 | WORKING_DIR = Pathname.getwd 780 | 781 | ............................................................................... 782 | 783 | Here we are simply assigning the initial working directory to a constant. We 784 | use this to build up the code for the 'dependencies.rb' stub script that can 785 | be generated via +mooch init+. Here we're just doing quick and dirty code 786 | generation, and you can see the full stub as stored in +LOADER+: 787 | 788 | ............................................................................... 

LOADER = %Q{
  require "pathname"

  Pathname.glob("#{WORKING_DIR}/vendor/*/*/") do |dir|
    lib = dir + "lib"
    $LOAD_PATH.push(lib.directory? ? lib : dir)
  end
}

...............................................................................

This stub does something fun. It looks in the directory where +mooch init+
was run for a folder called 'vendor', and then uses a glob to find folders two
levels deep that fit the GitHub convention of 'username/project'. For each
project it finds, the code checks whether there is a 'lib' folder inside (as
is the common Ruby convention) and adds that folder to the load path; if there
is no 'lib' folder, it adds the project folder itself instead.

Here we see a few of +Pathname+'s niceties. We can construct new paths simply
by adding strings to them, as shown here:

...............................................................................

lib = dir + "lib"

...............................................................................

In addition, we can check whether the path we've created actually points to a
directory on the filesystem via a simple +Pathname#directory?+ call. This
makes traversal downright easy, as you can see in the preceding code.

This simple stub may be a bit dense, but once you get the hang of +Pathname+,
you can see that it's quite powerful. Let's look at a couple more tricks,
focusing this time on the code that actually writes this snippet to a file:

...............................................................................

lib = Pathname.new(ARGV[1])
lib.mkpath
(lib + 'dependencies.rb').open("w") do |file|
  file.write LOADER
end

...............................................................................

Earlier, we showed an invocation that looked like this:

...............................................................................

$ mooch init lib/my_project

...............................................................................

Here, +ARGV[1]+ is 'lib/my_project'. So in the preceding code, we're building
a path relative to our current working directory and then creating the folder
structure. A very cool thing about +Pathname#mkpath+ is that it works like
+mkdir -p+ on *nix: it creates any intermediate directories as needed, and
won't complain if the structure already exists. Both behaviors are what we
want here.

Once we've built the directories, we need to create our 'dependencies.rb' file
and populate it with the string in +LOADER+. Here we can see that +Pathname+
provides shortcuts such as +Pathname#open+, which works just like
+File.open()+.

In the code that actually downloads and vendors libraries from GitHub,
we see the same techniques in use yet again, this time mixed in with some
shell commands and +Dir.chdir+. Since this doesn't introduce anything new,
we can skip over the details.

Before we move on to discussing temporary files, we'll take a quick look
at +FileUtils+. The purpose of this module is to provide a UNIX-like interface
to file manipulation tasks, and a quick look at its method list shows
that it does a good job of this:

...............................................................................

cd(dir, options)
cd(dir, options) {|dir| .... }
pwd()
mkdir(dir, options)
mkdir(list, options)
mkdir_p(dir, options)
mkdir_p(list, options)
rmdir(dir, options)
rmdir(list, options)
ln(old, new, options)
ln(list, destdir, options)
ln_s(old, new, options)
ln_s(list, destdir, options)
ln_sf(src, dest, options)
cp(src, dest, options)
cp(list, dir, options)
cp_r(src, dest, options)
cp_r(list, dir, options)
mv(src, dest, options)
mv(list, dir, options)
rm(list, options)
rm_r(list, options)
rm_rf(list, options)
install(src, dest, mode = <src's>, options)
chmod(mode, list, options)
chmod_R(mode, list, options)
chown(user, group, list, options)
chown_R(user, group, list, options)
touch(list, options)

...............................................................................

You'll see a bit more of +FileUtils+ later on in the chapter when we talk about
atomic saves. But before we jump into advanced file management techniques,
let's take a quick look at another important foundational tool, the tempfile
standard library.

=== The tempfile Standard Library ===

Producing temporary files is a common need in many applications. Whether
you need to store some data on disk to keep it out of memory until it
is needed again, or you want to serve up a file but don't need to keep it
lurking around after your process has terminated, odds are you'll run into
this problem sooner or later.

It's quite tempting to roll our own temporary file handling, which might look
something like the following code:

...............................................................................

File.open("/tmp/foo.txt","w") do |file|
  file << some_data
end

# Then in some later code

File.foreach("/tmp/foo.txt") do |line|
  # do something with data
end

# Then finally
require "fileutils"
FileUtils.rm("/tmp/foo.txt")

...............................................................................

This code works, but it has some drawbacks. First, it assumes
that you're on a *nix system with a '/tmp' directory. Second, we
don't do anything to avoid file collisions, so if another application is
using '/tmp/foo.txt', this will overwrite it. Finally, we need to explicitly
remove the file, or risk leaving a bunch of trash around.

Luckily, Ruby has a standard library that helps us get around these issues.
Using it, our example looks like this:

...............................................................................

require "tempfile"
temp = Tempfile.new("foo.txt")
temp << some_data

# Then in some later code
temp.rewind
temp.each do |line|
  # do something with data
end

# Then finally
temp.close

...............................................................................

Let's take a look at what's going on in a little more detail, to really get
a sense of what the +tempfile+ library is doing for us.

==== Automatic Temporary Directory Handling ====

The code looks somewhat similar to our original example, as we're still
essentially working with an IO object. However, the approach is different.
+Tempfile+ opens a file handle for us to a file stored in our system's
temporary directory, which is reported by +Dir.tmpdir+ (this method comes from
the 'tmpdir' library, which 'tempfile' loads for us). We can inspect this
value, and even override it if we need to. Here's what it looks like on two of
my systems:

...............................................................................

>> Dir.tmpdir
=> "/var/folders/yH/yHvUeP-oFYamIyTmRPPoKE+++TI/-Tmp-"

>> Dir.tmpdir
=> "/tmp"

...............................................................................

Usually, it's best to go with whatever this value is, because it is where Ruby
thinks your temp files should go. However, in the cases where we want to
control this ourselves, it is simple to do so by passing a directory as the
second argument to +Tempfile.new+:

...............................................................................

temp = Tempfile.new("foo.txt", "path/to/my/tmpdir")

...............................................................................

==== Collision Avoidance ====

When you create a temporary file with +Tempfile.new+, you aren't actually
specifying an exact filename. Instead, the name you specify is used
as a base name, and a unique identifier is appended to it. This
prevents one temp file from accidentally overwriting another. Here's a
trivial example that shows what's going on under the hood:

...............................................................................

>> a = Tempfile.new("foo.txt")
=> #<File:/tmp/foo.txt.2021.0>
>> b = Tempfile.new("foo.txt")
=> #<File:/tmp/foo.txt.2021.1>
>> a.path
=> "/tmp/foo.txt.2021.0"
>> b.path
=> "/tmp/foo.txt.2021.1"

...............................................................................

The exact naming scheme varies between Ruby versions, but the generated names
are chosen to be unique. Allowing Ruby to handle collision avoidance is
generally a good thing, especially because we don't normally care about the
exact names of our temp files. Of course, we can always rename the file if we
need to store it somewhere permanently, as you saw in the case study.
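To see collision avoidance for yourself, here is a small sketch. It assumes a
reasonably recent Ruby, in which +Tempfile.new+ also accepts a two-element
array of prefix and suffix, so that a file extension stays at the end of the
generated name. Because the naming scheme differs between Ruby versions, the
code prints the paths rather than predicting them.

```ruby
require "tempfile"

# Two temp files created from the same base name
a = Tempfile.new("foo.txt")
b = Tempfile.new("foo.txt")

# An array argument gives a prefix and a suffix; the suffix (here a
# file extension) is kept at the end of the generated name
c = Tempfile.new(["foo", ".txt"])

paths = [a, b, c].map { |t| t.path }
puts paths

# close! closes each handle and deletes its file immediately
[a, b, c].each { |t| t.close! }
```

Keeping the extension can matter when a temp file is handed to a tool that
decides how to treat a file based on its name.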

==== Same Old I/O Operations ====

Because a +Tempfile+ delegates most of its functionality directly to +File+,
we can use normal +File+ methods, as shown in our example. We can write to our
file handle as expected:

...............................................................................

temp << some_data

...............................................................................

And read from it in a similar fashion:

...............................................................................

# Then in some later code
temp.rewind
temp.each do |line|
  # do something with data
end

...............................................................................

Because we leave the file handle open after writing, its position is at the
end of the file, so we need to rewind it to point back at the beginning before
reading. Beyond that, the behavior is exactly the same as +File#each+.

==== Automatic Unlinking ====

+Tempfile+ cleans up after itself. There are a few ways to unlink a temporary
file, and which one to use depends on your needs. Often, simply closing the
file handle is good enough, and that is what we do in our example:

...............................................................................

temp.close

...............................................................................

In this case, Ruby doesn't remove the temporary file right away. Instead,
it keeps the file around until the +Tempfile+ object is garbage collected (or
the process exits). For this reason, if keeping lots of open file handles
around is a problem for you, you can close your handles without fear of losing
your temporary files, so long as you keep references to the +Tempfile+ objects
handy.
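Here is a minimal sketch of that behavior; the file contents are only
illustrative. After +close+, the file is still on disk, and +Tempfile#open+
reopens it (in "r+" mode) so we can read back what we wrote.

```ruby
require "tempfile"

temp = Tempfile.new("report")
temp << "line one\nline two\n"

# Closing releases the file handle, but the file stays on disk because
# the Tempfile object is still referenced by temp
temp.close
still_there = File.exist?(temp.path)

# Tempfile#open reopens the same file (in "r+" mode, from the start)
temp.open
contents = temp.read
temp.close

puts still_there   # prints true
puts contents

# The file itself is removed once temp is garbage collected or the
# process exits
```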

However, in other situations, you may want to purge the file as soon as it
has been closed. The change to make this happen is trivial:

...............................................................................

temp.close!

...............................................................................

Finally, if you need to explicitly delete a file that has already been
closed, you can just use the following:

...............................................................................

temp.unlink

...............................................................................

In most cases, though, you don't need to think about any of this.
+Tempfile+ works as you might expect, keeping your files around while you
need them and cleaning up after itself when it needs to. If you forget to
close a temporary file explicitly, it'll be unlinked when the process exits.
For these reasons, using the 'tempfile' library is often a better choice than
rolling your own solution.

There is more to be said about this very cool library, but what we've already
discussed covers most of what you'll need day to day.

We've gone over some of the tools Ruby provides for working with
your filesystem in a platform-agnostic way, and we're about to get into some
more advanced strategies for managing, processing, and manipulating your
files and their contents. However, before we do that, let's review the
key points about working with your filesystem and with tempfiles:

* There is a whole slew of options for file management in Ruby, including
  `FileUtils`, `Dir`, and `Pathname`, with some overlap between them.

* `Pathname` provides a high-level, modern Ruby interface to managing files
  and traversing your filesystem.

* `FileUtils` provides a *nix-style API to file management tools, but works
  just fine on any system, making it quite useful for porting shell scripts
  to Ruby.

* The tempfile standard library provides a convenient IO-like class for
  dealing with temporary files in a system-independent way.

* The tempfile library also makes life easier through name collision
  avoidance, automatic file unlinking, and other niceties.

We'll see more of the techniques shown in this section later on in the
chapter. But if you're bored with the basics, now is the time to look at
higher-level strategies for common I/O tasks.

=== Text Processing Strategies ===

Ruby makes basic I/O operations dead simple, but that doesn't mean it's a bad
idea to pick up and apply some general approaches to text processing. Here
we'll talk about two techniques that most programmers doing file processing
will want to know about, and show what they look like in Ruby.

==== Advanced Line Processing ====

The case study for this chapter showed the most common use of +File.foreach()+,
but there is more to be said about this approach. This section will
highlight a couple of tricks worth knowing about when doing line-by-line
processing.

===== Using Enumerator =====

In the following example, we will show code that extracts and sums the totals
found in a file with entries similar to the ones below:

...............................................................................

some
lines
of
text
total: 12

other
lines
of
text
total: 16

more
text
total: 3

...............................................................................

The following code shows how to do this without loading the whole file into
memory:

...............................................................................

sum = 0
File.foreach("data.txt") { |line| sum += line[/total: (\d+)/,1].to_f }

...............................................................................

Here, we are using `File.foreach` as a direct iterator, and building up our sum
as we go. Lines without a total yield +nil+ from the regular expression lookup,
and +nil.to_f+ is +0.0+, so those lines don't affect the result. However,
because `foreach()` returns an `Enumerator` when it is called without a block,
we can actually write this in a cleaner way without sacrificing efficiency:

...............................................................................

enum = File.foreach("data.txt")
sum = enum.inject(0) { |s,r| s + r[/total: (\d+)/,1].to_f }

...............................................................................

The primary difference between the two approaches is that when you use
`File.foreach` directly with a block, you are simply iterating over the file
line by line, whereas the `Enumerator` gives you access to the more powerful
processing tools of `Enumerable`.

When we work with arrays, we don't usually write code like this:

...............................................................................

sum = 0
arr.each { |e| sum += e }

...............................................................................

Instead, we typically let Ruby do more of the work for us:

...............................................................................

sum = arr.inject(0) { |s,e| s + e }

...............................................................................

We should do the same thing with files. If there is an `Enumerable` method we
want to use to transform or process a file, we should use the enumerator
provided by +File.foreach()+ rather than try to do our processing within the
block. This allows us to leverage the power behind Ruby's `Enumerable` module
rather than doing the heavy lifting ourselves.

===== Tracking Line Numbers =====

If you're interested in certain line numbers, there is no need to maintain
a manual counter. You simply need to create a file handle to work with,
and then make use of the +File#lineno+ method. To illustrate this, we can
very easily implement a simple version of the UNIX +head+ command:

...............................................................................

def head(file_name, max_lines = 10)
  File.open(file_name) do |file|
    file.each do |line|
      puts line
      break if file.lineno == max_lines
    end
  end
end

...............................................................................

For a more interesting use case, we can consider a file that is formatted
in two-line pairs, where the first line is a key and the second is a value:

...............................................................................

first name
gregory
last name
brown
email
gregory.t.brown@gmail.com

...............................................................................

Using +File#lineno+, this is trivial to process:

...............................................................................

keys = []
values = []

File.open("foo.txt") do |file|
  file.each do |line|
    # Odd-numbered lines are keys, even-numbered lines are values
    (file.lineno.odd? ? keys : values) << line.chomp
  end
end

Hash[*keys.zip(values).flatten]

...............................................................................

The result of this code is a simple hash, as you might expect:

...............................................................................

{ "first name" => "gregory",
  "last name"  => "brown",
  "email"      => "gregory.t.brown@gmail.com" }

...............................................................................

Though there is probably more we can say about iterating over files line
by line, this should get you well on your way. For now, there are other
important I/O strategies to investigate, so we'll keep moving.

==== Atomic Saves ====

Although many file processing scripts can happily read in one file as input
and produce another as output, sometimes we want to be able to do
transformations directly on a single file. This isn't hard in practice,
but it's a little bit less obvious than you might think.

It is technically possible to rewrite parts of a file using the `"r+"` file
mode, but in practice this is unwieldy. An alternative
approach is to load the entire contents of a file into memory, manipulate
the string, and then overwrite the original file. However, this approach
wastes memory on large files and is not the best way to go in most cases.

As it turns out, the simple solution to this problem is to work around it.
Rather than trying to make direct changes to a file, or holding its contents
in memory and then writing them back out to the same file after
manipulation, we can instead make use of a temporary file and do line-by-line
processing as normal. When we finish the job, we can rename our temp
file so that it replaces the original. Using this approach, we can easily
make a backup of the original file if necessary, and also roll back changes
upon error.

Let's take a quick look at an example that demonstrates this general
strategy. We'll build a script that strips comments from Ruby files,
allowing us to take source code such as this:

...............................................................................

# The best class ever
# Anywhere in the world
class Foo

  # A useless comment
  def a
    true
  end

  #Another Useless comment
  def b
    false
  end

end

...............................................................................

And turn it into comment-free code such as this:

...............................................................................

class Foo

  def a
    true
  end

  def b
    false
  end

end

...............................................................................

With the help of Ruby's 'tempfile' and 'fileutils' standard libraries, this task
is trivial:

...............................................................................

require "tempfile"
require "fileutils"

temp = Tempfile.new("working")
File.foreach(ARGV[0]) do |line|
  # Copy every line except those that contain only a comment
  temp << line unless line =~ /^\s*#/
end

temp.close
FileUtils.mv(temp.path,ARGV[0])

...............................................................................

We initialize a new +Tempfile+ object and then iterate over the file
specified on the command line. We append each line to the +Tempfile+, so long
as it is not a comment line. This is the first part of our task:

...............................................................................

temp = Tempfile.new("working")
File.foreach(ARGV[0]) do |line|
  temp << line unless line =~ /^\s*#/
end

temp.close

...............................................................................

Once we've written our +Tempfile+ and closed the file handle, we then use
+FileUtils+ to rename it and replace the original file we were working on:

...............................................................................

FileUtils.mv(temp.path,ARGV[0])

...............................................................................

In two steps, we've efficiently modified a file without loading it entirely
into memory or dealing with the complexities of using the 'r+' file mode.
In many cases, the simple approach shown here will be enough.

Of course, because you are modifying a file in place, a poorly coded script
could destroy your input file. For this reason, you might want
to make a backup of your file. This can be done trivially with +FileUtils.cp+,
as shown in the following reworked version of our example:

...............................................................................

require "tempfile"
require "fileutils"

temp = Tempfile.new("working")
File.foreach(ARGV[0]) do |line|
  temp << line unless line =~ /^\s*#/
end

temp.close
# Back up the original only after the temp file has been fully written
FileUtils.cp(ARGV[0],"#{ARGV[0]}.bak")
FileUtils.mv(temp.path,ARGV[0])

...............................................................................

This code makes a backup of the original file only if the temp file is
successfully populated, which keeps failed runs from littering your directory
with backup files while you're testing.

Sometimes it will make sense to make backups; other times, it won't be
essential. Of course, it's better to be safe than sorry, so if you're in
doubt, just add the extra line of code for a bit more peace of mind.

The two strategies shown in this section will come up again and again in
practice for those doing frequent text processing. They can even be used in
combination when needed.

Before we close our discussion of this topic, it's worth mentioning the
following reminders:

* When doing line-based file processing, +File.foreach+ can be used as an
  +Enumerator+ (by calling it without a block), unlocking the power of
  +Enumerable+. This provides an extremely handy way to search, traverse, and
  manipulate files without sacrificing efficiency.

* If you need to keep track of which line of a file you are on while you are
  iterating over it, you can use +File#lineno+ rather than incrementing your
  own counter.

* When doing atomic saves, the tempfile standard library can be used to
  avoid unnecessary clutter.

* Be sure to test any code that does atomic saves thoroughly, as there is
  real risk of destroying your original source files if backups are not made.
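One caveat about the examples above: +FileUtils.mv+ only gets the atomicity of
a plain rename when the source and destination live on the same filesystem. If
the system temporary directory is on a different filesystem from the file
being edited, +mv+ falls back to copying the data, so a crash partway through
could leave a half-written file behind. The following sketch works around
this, assuming POSIX rename semantics, by creating the temp file in the
target's own directory and finishing with +File.rename+. It also copies the
original file's permission bits, since +Tempfile+ creates its files readable
only by their owner. The method name +strip_comments!+ is hypothetical.

```ruby
require "tempfile"

# Hypothetical helper: removes comment-only lines from the file at path.
# The temp file is created in the same directory as path, so the final
# File.rename never has to cross a filesystem boundary.
def strip_comments!(path)
  dir  = File.dirname(File.expand_path(path))
  temp = Tempfile.new("working", dir)
  begin
    File.foreach(path) do |line|
      temp << line unless line =~ /^\s*#/
    end
    temp.close

    # Tempfile creates files with mode 0600; copy the original's bits
    File.chmod(File.stat(path).mode & 07777, temp.path)

    # Replace the original in a single rename
    File.rename(temp.path, path)
  ensure
    # On failure this discards the partial temp file; after a successful
    # rename there is nothing left at the old temp path to delete
    temp.close unless temp.closed?
    temp.unlink
  end
end
```

With this helper in place, the earlier script reduces to a call such as
+strip_comments!(ARGV[0])+, and a backup can still be made with +FileUtils.cp+
just before the rename if you want one.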


=== Conclusions ===

When dealing with text processing and file management in Ruby, there are
a few things to keep in mind. Most of the pitfalls you can run into while
doing this sort of work have to do with performance, platform dependence, or
code that doesn't clean up after itself.

In this chapter, we talked about a couple of standard libraries that can help
keep things clean and platform-independent. Though Ruby is a fine language
to write shell scripts in, there is often no need to resort to code that
will run only on certain machines when a pure Ruby solution is just as clean.
For this reason, using libraries such as +Tempfile+, +Pathname+, and +FileUtils+
will go a long way toward keeping your code portable and
maintainable down the line.

For issues of performance, you can almost always minimize your memory
footprint, and often squeeze out extra speed, by processing your data line by
line rather than slurping everything into a single string. You can also find
a needle in the haystack much more effectively if you write well-crafted
regular expressions that don't make Ruby work too hard. The techniques
we've shown here serve as reminders about common mistakes even seasoned
Rubyists tend to make, and provide good ways around them.

Text processing and file management can quickly become complex, but with a
solid grasp of the fundamental strategies, you can use Ruby as an extremely
powerful tool that works faster and more effectively than you might imagine.

--------------------------------------------------------------------------------
/manuscript/unmerged/ruby_worst_practices.txt:
--------------------------------------------------------------------------------
== Appendix C: Ruby Worst Practices ==

If you've read through most of this book, you'll have noticed that it doesn't
have much of a "Do this, not that" theme. Ruby as a language doesn't fit well
into that framework, since there are always exceptions to any rule you can
come up with.

However, there are certainly a few things you really shouldn't do, unless you
know exactly why you are doing them. This appendix is meant to cover a handful
of those scenarios and show you some better alternatives. I've done my best to
stick to issues I've been bitten by myself, in the hopes that I can offer some
practical advice for problems you might actually have run into.

A bad practice in programming shouldn't simply be defined by some vague
aesthetic imposed upon folks by the "experts". Instead, we can often trace
anti-patterns in code back to either flaws in the high-level design of an
object-oriented system or failed attempts at cleverness in the underlying
feature implementations. These bits of unsavory code, produced by bad habits or
a misunderstanding of certain Ruby peculiarities, can be a drag on your whole
project, creating substantial technical debt as they accumulate.

We'll start with the high-level design issues and then move on to the common
sticking points when implementing tricky Ruby features. Improving even a
couple of these problem areas can make a major difference, so even if you
already know about most of these pitfalls, you might find one or two tips that
will go a long way.

=== Not-so Intelligent Design ===

Well-designed object-oriented systems can be a dream to work with. When every
component seems to fit together nicely, with clear, simple integration code
between the major subsystems, you get the feeling that the architecture is
working for you, and not against you.

If you're not careful, all of this can come crashing down. Let's look at a few
things to watch out for, and how to get around them.

==== Class Variables Considered Harmful ====

Ruby's class variables are one of the easiest ways to break encapsulation and
create headaches for yourself when designing class hierarchies. To demonstrate
the problem, I'll show an example where class variables were tempting but
ultimately the wrong solution.

In my abstract formatting library +Fatty+, I provide a formatter base class
which users must inherit from to make use of the system. This provides
helpers which build up anonymous classes for certain formats. To get a sense
of what this looks like, take a look at this example:

...............................................................................

class Hello < FattyRBP::Formatter
  format :text do
    def render
      "Hello World"
    end
  end

  format :html do
    def render
      "Hello World"
    end
  end
end

puts Hello.render(:text) #=> "Hello World"
puts Hello.render(:html) #=> "Hello World"

...............................................................................

Though we've omitted most of the actual functionality +Fatty+ provides, a simple
implementation of this system using class variables might look like this:

...............................................................................
74 | 75 | module FattyRBP 76 | class Formatter 77 | @@formats = {} 78 | 79 | def self.format(name, options={}, &block) 80 | @@formats[name] = Class.new(FattyRBP::Format, &block) 81 | end 82 | 83 | def self.render(format, options={}) 84 | @@formats[format].new(options).render 85 | end 86 | end 87 | 88 | class Format 89 | def initialize(options) 90 | # not important 91 | end 92 | 93 | def render 94 | raise NotImplementedError 95 | end 96 | end 97 | 98 | end 99 | 100 | .................................................................................. 101 | 102 | This code will make the example we showed earlier work as advertised. Now let's 103 | see what happens when we add another subclass into the mix. 104 | 105 | .................................................................................. 106 | 107 | class Goodbye < FattyRBP::Formatter 108 | format :text do 109 | def render 110 | "Goodbye Cruel World!" 111 | end 112 | end 113 | end 114 | 115 | puts Goodbye.render(:text) #=> "Goodbye Cruel World!" 116 | 117 | .................................................................................. 118 | 119 | At first glance, things seem to be working. But if we dig deeper, we see two 120 | problems: 121 | 122 | .................................................................................. 123 | 124 | # Should not have changed 125 | puts Hello.render(:text) #=> "Goodbye Cruel World!" 126 | 127 | # Shouldn't exist 128 | puts Goodbye.render(:html) #=> "Hello World" 129 | 130 | .................................................................................. 131 | 132 | Here we see the problem with class variables. It is tempting to think of them as 133 | class-level state, but that would be wrong. They are actually class-hierarchy variables 134 | that can have their state modified by any subclass, whether direct or many 135 | levels down the ancestry chain. This means they're fairly close to 136 | global state in nature, which is usually a bad thing.
So unless you were 137 | actually counting on this behavior, an easy fix is to just dump class variables 138 | and use class instance variables instead. 139 | 140 | .................................................................................. 141 | 142 | module FattyRBP 143 | class Formatter 144 | 145 | def self.formats 146 | @formats ||= {} 147 | end 148 | 149 | def self.format(name, options={}, &block) 150 | formats[name] = Class.new(FattyRBP::Format, &block) 151 | end 152 | 153 | def self.render(format, options={}) 154 | formats[format].new(options).render 155 | end 156 | end 157 | 158 | class Format 159 | def initialize(options) 160 | # not important 161 | end 162 | end 163 | end 164 | 165 | .................................................................................. 166 | 167 | Although this prevents direct access to the variable from instances, it is easy 168 | to define accessors at the class level, as +formats+ does here. The benefit is that each subclass 169 | carries its own instance variable, just like ordinary objects do. With this 170 | new code, everything works as expected: 171 | 172 | .................................................................................. 173 | 174 | puts Hello.render(:text) #=> "Hello World" 175 | puts Hello.render(:html) #=> "Hello World" 176 | puts Goodbye.render(:text) #=> "Goodbye Cruel World!" 177 | 178 | puts Hello.render(:text) #=> "Hello World" 179 | puts Goodbye.render(:html) #=> raises NoMethodError (Goodbye defines no :html format) 180 | 181 | .................................................................................. 182 | 183 | The moral of the story here is that class-level state should be stored in 184 | class instance variables if you want to allow subclassing. Reserve class 185 | variables for data that needs to be shared across an entire class hierarchy.
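To see the difference outside of +Fatty+, here is a minimal, self-contained sketch. The class names (+SharedBase+, +IsolatedBase+, and their children) and the +register+/+registry+ methods are hypothetical and exist only for illustration. The point is that a write made through a subclass lands in the parent's class variable, while the class-instance-variable version keeps a separate hash for each class.

```ruby
# Hypothetical classes, for illustration only.
class SharedBase
  @@registry = {} # one hash shared by SharedBase and every subclass

  def self.register(key, value)
    @@registry[key] = value
  end

  def self.registry
    @@registry
  end
end

class SharedChild < SharedBase; end

class IsolatedBase
  # A class instance variable: each class object gets its own hash,
  # created lazily the first time it is asked for.
  def self.registry
    @registry ||= {}
  end

  def self.register(key, value)
    registry[key] = value
  end
end

class IsolatedChild < IsolatedBase; end

SharedChild.register(:text, "from the child")
IsolatedChild.register(:text, "from the child")

p SharedBase.registry.key?(:text)                  # => true  (the child's write leaked upward)
p SharedChild.registry.equal?(SharedBase.registry) # => true  (literally the same hash)
p IsolatedBase.registry.key?(:text)                # => false (the parent is untouched)
p IsolatedChild.registry[:text]                    # => "from the child"
```

This mirrors the +Fatty+ bug in miniature. +SharedChild.register+ runs a method defined in +SharedBase+, and +@@registry+ there always refers to the one variable owned by the hierarchy.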
186 | 187 | ==== Hardcoding Yourself into a Corner ==== 188 | 189 | One good practice is to provide alternative constructors for your classes when 190 | there are common configurations that might be generally useful. One such 191 | example is in +Prawn+, when a user wants to build up a document via a simplified 192 | interface and then immediately render it to a file: 193 | 194 | .................................................................................. 195 | 196 | Prawn::Document.generate("hello.pdf") do 197 | text "Hello Prawn!" 198 | end 199 | 200 | .................................................................................. 201 | 202 | Implementing this method was straightforward, as it simply wraps the constructor and 203 | calls an extra method to render the file afterwards: 204 | 205 | .................................................................................. 206 | 207 | module Prawn 208 | class Document 209 | 210 | def self.generate(filename,options={},&block) 211 | pdf = Prawn::Document.new(options,&block) 212 | pdf.render_file(filename) 213 | end 214 | 215 | end 216 | end 217 | 218 | .................................................................................. 219 | 220 | However, some months down the line, a bug report made me realize that I had made a 221 | somewhat stupid mistake here. I had accidentally prevented users from being able to 222 | write code like this: 223 | 224 | .................................................................................. 225 | 226 | class MyDocument < Prawn::Document 227 | def say_hello 228 | text "Hello MyDocument" 229 | end 230 | end 231 | 232 | MyDocument.generate("hello.pdf") do 233 | say_hello 234 | end 235 | 236 | .................................................................................. 237 | 238 | The problem, of course, is that +Prawn::Document.generate+ hardcodes the 239 | constructor call, which prevents subclasses from ever being instantiated via 240 | +generate+.
The fix is so easy that it is somewhat embarrassing to share: 241 | 242 | .................................................................................. 243 | 244 | module Prawn 245 | class Document 246 | 247 | def self.generate(filename,options={},&block) 248 | pdf = new(options,&block) 249 | pdf.render_file(filename) 250 | end 251 | 252 | end 253 | end 254 | 255 | .................................................................................. 256 | 257 | By removing the explicit receiver, we now construct an object based on whatever 258 | +self+ is, rather than only building up +Prawn::Document+ objects. This affords 259 | us additional flexibility at virtually no cost. In fact, because hardcoding the 260 | name of the current class in your method definitions is almost always an 261 | accident, avoiding it is a good habit to get into across the board. 262 | 263 | Although the consequences are much less severe, the same goes for class method 264 | definitions. Throughout this book, you will see class methods defined using 265 | +def self.my_method+ rather than +def MyClass.my_method+. The reason for this 266 | is much more about maintainability than it is about style. To illustrate this, 267 | let's do a simple comparison. We start off with two boring class definitions 268 | for the classes +A+ and +B+: 269 | 270 | .................................................................................. 271 | 272 | class A 273 | def self.foo 274 | # .. 275 | end 276 | 277 | def self.bar 278 | # .. 279 | end 280 | end 281 | 282 | class B 283 | def B.foo 284 | # ... 285 | end 286 | 287 | def B.bar 288 | # ... 289 | end 290 | end 291 | 292 | .................................................................................. 293 | 294 | These two are functionally equivalent, each defining the class methods +foo+ and 295 | +bar+ on their respective classes. But now, let's refactor our code a bit, 296 | renaming +A+ to +C+ and +B+ to +D+.
Observe the work involved in doing each: 297 | 298 | .................................................................................. 299 | 300 | class C 301 | def self.foo 302 | # .. 303 | end 304 | 305 | def self.bar 306 | # .. 307 | end 308 | end 309 | 310 | class D 311 | def D.foo 312 | # ... 313 | end 314 | 315 | def D.bar 316 | # ... 317 | end 318 | end 319 | 320 | 321 | .................................................................................. 322 | 323 | To rename +A+ to +C+, we simply change the name of our class, and we don't need 324 | to touch the method definitions. But when we change +B+ to +D+, each and every 325 | method definition needs to be reworked. While this might be okay for an object with one or 326 | two methods at the class level, you can imagine how tedious this could be when 327 | that number gets larger. 328 | 329 | So we've now found two points against hardcoding class names, and could probably 330 | keep growing the list if we wanted. But for now, let's move on to some 331 | higher-level design issues. 332 | 333 | ==== When Inheritance Becomes Restrictive ==== 334 | 335 | Inheritance is very nice when your classes have a clear hierarchical 336 | structure between them. However, it can get in the way when used 337 | inappropriately. Problems begin to crop up when we try to model cross-cutting 338 | concerns using ordinary inheritance. For examples of this, we need look no further 339 | than core Ruby. 340 | 341 | Imagine if +Comparable+ were a class instead of a module. Then, you would be 342 | writing code like this: 343 | 344 | ..............................................................................
345 | 346 | class Person < Comparable 347 | 348 | def initialize(first_name, last_name) 349 | @first_name = first_name 350 | @last_name = last_name 351 | end 352 | 353 | attr_reader :first_name, :last_name 354 | 355 | def <=>(other_person) 356 | [last_name, first_name] <=> [other_person.last_name, other_person.first_name] 357 | end 358 | 359 | end 360 | 361 | .................................................................................. 362 | 363 | Looking at this, though, it becomes clear that it'd be nice to use a +Struct+ 364 | here. If we set aside the features provided by +Comparable+ for a moment, 365 | the benefits of using a struct to represent this simple data structure become 366 | obvious. 367 | 368 | .................................................................................. 369 | 370 | class Person < Struct.new(:first_name, :last_name) 371 | def full_name 372 | "#{first_name} #{last_name}" 373 | end 374 | end 375 | 376 | .................................................................................. 377 | 378 | Because Ruby supports only single inheritance, +Person+ could not inherit from both a +Struct+ class and +Comparable+, and this example clearly 379 | demonstrates the problems we run into when relying too heavily on hierarchical 380 | structure. A +Struct+ is certainly not always +Comparable+. And it is just 381 | plain silly to think of all +Comparable+ objects as being +Struct+ objects. The 382 | key distinction here is that a +Struct+ defines what an object is made up of, 383 | whereas +Comparable+ defines a set of features associated with certain objects. 384 | For this reason, the real Ruby code to accomplish this modeling makes a whole 385 | lot of sense: 386 | 387 | ..................................................................................
388 | 389 | class Person < Struct.new(:first_name, :last_name) 390 | 391 | include Comparable 392 | 393 | def <=>(other_person) 394 | [last_name, first_name] <=> [other_person.last_name, other_person.first_name] 395 | end 396 | 397 | def full_name 398 | "#{first_name} #{last_name}" 399 | end 400 | 401 | end 402 | 403 | .................................................................................. 404 | 405 | Keep in mind that while we are constrained to exactly one superclass, we can 406 | include as many modules as we'd like. For this reason, modules are often used 407 | to implement features that are completely orthogonal to the underlying class 408 | definition they are mixed into. Taking an example from the Ruby API 409 | documentation, we see +Forwardable+ being used to very quickly implement a 410 | simple +Queue+ structure by doing little more than delegating to an underlying 411 | +Array+: 412 | 413 | .................................................................................. 414 | 415 | require "forwardable" 416 | 417 | class Queue 418 | extend Forwardable 419 | 420 | def initialize 421 | @q = [ ] 422 | end 423 | 424 | def_delegator :@q, :push, :enq 425 | def_delegator :@q, :shift, :deq 426 | 427 | def_delegators :@q, :clear, :first, :push, :shift, :size 428 | end 429 | 430 | .................................................................................. 431 | 432 | Although +Forwardable+ would make no sense anywhere in a class hierarchy, it 433 | accomplishes its task beautifully here. If we were constrained to a purely 434 | inheritance-based model, such cleverness would not be so easy to pull off. 435 | 436 | The lesson here is by no means that you should avoid inheritance at all 437 | costs. Instead, you should simply avoid going out of your 438 | way to construct an artificial hierarchical structure to represent cross-cutting 439 | or orthogonal concerns.
It's important to remember that Ruby's core is not 440 | special or magical in its abundant use of mixins, but rather representative 441 | of a very pragmatic and powerful object model. You can and should apply this 442 | technique within your own designs, whenever it makes sense to do so. 443 | 444 | === The Downside of Cleverness === 445 | 446 | Ruby lets you do all sorts of clever, fancy tricks. This cleverness is a big 447 | part of what makes Ruby so elegant, but it also can be downright dangerous in 448 | the wrong hands. To illustrate this, we'll look at the kind of trouble you can 449 | get into if you aren't careful. 450 | 451 | ==== The evils of +eval()+ ==== 452 | 453 | Throughout this book, we've dynamically evaluated code blocks all over the 454 | place. However, what you have not seen much of is the use of +eval()+, 455 | +class_eval()+, or even +instance_eval()+ with a string. Some might wonder why, 456 | since +eval()+ can be so useful! For example, imagine that you are exposing 457 | a way for users to filter through some data. You would like to be able to support an 458 | interface like this: 459 | 460 | .................................................................................. 461 | 462 | user1 = User.new("Gregory Brown", balance: 2500) 463 | user2 = User.new("Arthur Brown", balance: 3300) 464 | user3 = User.new("Steven Brown", balance: 3200) 465 | 466 | f = Filter.new([user1, user2, user3]) 467 | f.search("balance > 3000") #=> [user2, user3] 468 | 469 | .................................................................................. 470 | 471 | Armed with +instance_eval+, this task is so easy that you barely bat an eye as you 472 | type out the following code: 473 | 474 | ................................................................................
475 | 476 | class User 477 | def initialize(name, options) 478 | @name = name 479 | @balance = options[:balance] 480 | end 481 | 482 | attr_reader :name, :balance 483 | end 484 | 485 | class Filter 486 | def initialize(enum) 487 | @collection = enum 488 | end 489 | 490 | def search(query) 491 | @collection.select { |e| e.instance_eval(query) } 492 | end 493 | end 494 | 495 | ................................................................................ 496 | 497 | Running the earlier example, you see that this code works exactly as expected. 498 | Unfortunately, trouble strikes when you see queries like this: 499 | 500 | ................................................................................ 501 | 502 | >> f.search("@balance = 0") 503 | => [#<User:0x... @name="Gregory Brown", @balance=0>, 504 | #<User:0x... @name="Arthur Brown", @balance=0>, 505 | #<User:0x... @name="Steven Brown", @balance=0>] 506 | 507 | ................................................................................ 508 | 509 | Or perhaps even scarier: 510 | 511 | ................................................................................ 512 | 513 | >> f.search("system('touch hacked')") 514 | => [#<User:0x...>, #<User:0x...>, #<User:0x...>] 515 | >> File.exist?('hacked') 516 | => true 517 | 518 | ................................................................................ 519 | 520 | Since letting user-generated strings execute arbitrary system 521 | commands or damage the internals of an object isn't exactly appealing, you code 522 | up a regex filter to protect against this: 523 | 524 | ................................................................................ 525 | 526 | def search(query) 527 | raise "Invalid query" unless query =~ /^(\w+) ([<>]=?|==) (\d+)$/ 528 | @collection.select { |e| e.instance_eval(query) } 529 | end 530 | 531 | ................................................................................ 532 | 533 | With this check in place, both of the earlier attacks are rejected: 534 | 535 | ................................................................................ 536 | 537 | >> f.search("system('touch hacked')") 538 | RuntimeError: Invalid query 539 | from (irb):33:in `search' 540 | from (irb):38 541 | from /Users/sandal/lib/ruby19_1/bin/irb:12:in `<main>' 542 | 543 | >> f.search("@balance = 0") 544 | RuntimeError: Invalid query 545 | from (irb):33:in `search' 546 | from (irb):39 547 | from /Users/sandal/lib/ruby19_1/bin/irb:12:in `<main>' 548 | 549 | ................................................................................ 550 | 551 | But if you weren't paying very close attention, you would have missed that we 552 | got our anchors wrong. That means there's still a hole to be exploited here: 553 | 554 | ................................................................................ 555 | 556 | >> f.search("balance == 0\nsystem('touch hacked_again')") 557 | => [#<User:0x...>, #<User:0x...>, #<User:0x...>] 558 | >> File.exist?('hacked_again') 559 | => true 560 | 561 | ................................................................................ 562 | 563 | In Ruby, +^+ and +$+ match at the start and end of any line, not of the whole string, so our regex checked only the first line and we were able to 564 | sneak by the validation. Arguably, if you're very careful, you could come up 565 | with the right pattern (anchoring it with +\A+ and +\z+, for example) and be reasonably safe. But since you are already 566 | validating the syntax, why play with fire? We can rewrite this code to 567 | accomplish the same goals with none of the associated risks: 568 | 569 | ................................................................................ 570 | 571 | def search(query) 572 | data = query.match(/\A(?<attr>\w+) (?<op>[<>]=?|==) (?<val>\d+)\z/) or raise "Invalid query" 573 | @collection.select do |e| 574 | attr = e.public_send(data[:attr]) 575 | attr.public_send(data[:op], Integer(data[:val])) 576 | end 577 | end 578 | 579 | ................................................................................ 580 | 581 | Here, we don't expose any of the object's internals, preserving encapsulation. 582 | Because we parse out the individual components of the statement and use 583 | +public_send+ to pass the messages on to our objects, we have eliminated the 584 | possibility of arbitrary code execution (though a query can still call any public method that takes no arguments). All in all, this code 585 | is much more secure and easier to debug. As it turns out, this code will 586 | actually perform considerably better as well.
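One way to go a step further is to whitelist both the attributes and the operators a query may use, and to reject anything else with a clear error. What follows is a minimal sketch under that assumption. The +QUERY+ pattern and the +ALLOWED_ATTRIBUTES+ list are hypothetical names chosen for this example, not part of any library, and +User+ is the same simple class used earlier.

```ruby
# Minimal sketch: a whitelisting variant of Filter#search.
# QUERY and ALLOWED_ATTRIBUTES are illustrative names, not a library API.
class User
  attr_reader :name, :balance

  def initialize(name, options)
    @name    = name
    @balance = options[:balance]
  end
end

class Filter
  # \A and \z anchor the whole string, so a second line cannot sneak in.
  QUERY = /\A(?<attr>\w+) (?<op><=|>=|==|<|>) (?<val>\d+)\z/

  # Only these methods may be called on the collection's elements.
  ALLOWED_ATTRIBUTES = %w[balance].freeze

  def initialize(enum)
    @collection = enum
  end

  def search(query)
    data = QUERY.match(query)
    raise ArgumentError, "Invalid query: #{query.inspect}" unless data

    attr = data[:attr]
    unless ALLOWED_ATTRIBUTES.include?(attr)
      raise ArgumentError, "Unknown attribute: #{attr}"
    end

    op    = data[:op]
    value = Integer(data[:val])
    @collection.select { |e| e.public_send(attr).public_send(op, value) }
  end
end

users = [
  User.new("Gregory Brown", balance: 2500),
  User.new("Arthur Brown",  balance: 3300),
  User.new("Steven Brown",  balance: 3200)
]
f = Filter.new(users)

p f.search("balance > 3000").map(&:name) # => ["Arthur Brown", "Steven Brown"]

# Each of these raises ArgumentError instead of running anything:
["@balance = 0",
 "system('touch hacked')",
 "balance == 0\nsystem('touch hacked_again')",
 "name > 0"].each do |q|
  begin
    f.search(q)
  rescue ArgumentError => e
    puts e.message
  end
end
```

The trade-off is that every new searchable attribute must be added to the list by hand. In exchange, the set of methods a user-supplied string can reach is small and explicit.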
587 | 588 | Every time you use +eval(string)+, Ruby needs to fire up its parser and evaluate 589 | the code you've embedded in your string from scratch, and our original +search+ did this once for every element in the collection. This means that in 590 | cases in which you just need to process a few values and then do something with 591 | them, using a targeted regular expression is often a much better option, as it 592 | greatly reduces the amount of work the interpreter needs to do. 593 | 594 | For virtually every task you might turn to a raw string +eval()+ for, Ruby provides 595 | tools that let you avoid it. These include all sorts of 596 | methods for getting at whatever you need, including +instance_variable_get+, 597 | +instance_variable_set+, +const_get+, +const_set+, +public_send+, +send+, 598 | +define_method+, +method()+, and even +Class.new+ / +Module.new+. These tools 599 | allow you to dynamically manipulate Ruby code without evaluating strings directly. 600 | For more details, you'll definitely want to read the "Mastering the Dynamic Toolkit" 601 | chapter. 602 | 603 | ==== Blind +rescue+ missions ==== 604 | 605 | Ruby provides a lot of different ways to handle exceptions. They run the gamut 606 | all the way from capturing the full stack trace to completely ignoring raised 607 | errors. This flexibility means that exceptions aren't necessarily treated with 608 | the same gravity in Ruby as in other languages, since they are very simple to 609 | rescue once they are raised. In certain cases, folks have even used +rescue+ 610 | as a stand-in replacement for conditional statements. The classic example 611 | follows: 612 | 613 | ................................................................................ 614 | 615 | name = @user.first_name.capitalize rescue "Anonymous" 616 | 617 | ................................................................................ 618 | 619 | Usually, this is done with the intention of capturing the +NoMethodError+ raised 620 | by something like +first_name+ being +nil+ here.
It accomplishes this task 621 | well and looks slightly nicer than the alternative: 622 | 623 | ................................................................................ 624 | 625 | name = @user.first_name ? @user.first_name.capitalize : "Anonymous" 626 | 627 | ................................................................................ 628 | 629 | However, the downside of using this trick is that you will most likely end up 630 | seeing this code again at the end of a long, painful debugging session. For 631 | demonstration purposes, let's assume our +User+ is implemented like this: 632 | 633 | ................................................................................ 634 | require "pstore" 635 | 636 | class User 637 | 638 | def self.data 639 | @data ||= PStore.new("users.store") 640 | end 641 | 642 | def self.add(id, user_data) 643 | data.transaction do 644 | data[id] = user_data 645 | end 646 | end 647 | 648 | def self.find(id) 649 | data.transaction do 650 | data[id] or raise "User not found" 651 | end 652 | end 653 | 654 | def initialize(id) 655 | @user_id = id 656 | end 657 | 658 | def attributes 659 | self.class.find(@user_id) 660 | end 661 | 662 | def first_name 663 | attributes[:first_name] 664 | end 665 | 666 | end 667 | 668 | ................................................................................ 669 | 670 | What we have here is basically a +PStore+-backed user database. It's not terribly 671 | important to understand every last detail, but the code should be fairly easy to 672 | understand if you play around with it a bit. 673 | 674 | Firing up irb, we can see that the +rescue+ trick works fine for the case where 675 | +User#first_name+ returns +nil+: 676 | 677 | ................................................................................
678 | 679 | >> require "user" 680 | => true 681 | 682 | >> User.add('sandal', email: 'gregory@majesticseacreature.com') 683 | 684 | => {:email=>"gregory@majesticseacreature.com"} 685 | >> @user = User.new('sandal') 686 | => #<User:0x... @user_id="sandal"> 687 | >> name = @user.first_name.capitalize rescue "Anonymous" 688 | => "Anonymous" 689 | 690 | >> @user.first_name 691 | => nil 692 | >> @user.attributes 693 | => {:email=>"gregory@majesticseacreature.com"} 694 | 695 | ................................................................................ 696 | 697 | Ordinary execution also works fine: 698 | 699 | ................................................................................ 700 | 701 | >> User.add('jia', first_name: "Jia", email: "jia@majesticseacreature.com") 702 | 703 | => {:first_name=>"Jia", :email=>"jia@majesticseacreature.com"} 704 | >> @user = User.new('jia') 705 | => #<User:0x... @user_id="jia"> 706 | >> name = @user.first_name.capitalize rescue "Anonymous" 707 | => "Jia" 708 | >> @user.attributes 709 | => {:first_name=>"Jia", :email=>"jia@majesticseacreature.com"} 710 | >> @user.first_name 711 | => "Jia" 712 | >> @user = User.new('sandal') 713 | 714 | ................................................................................ 715 | 716 | It seems like everything is in order, but you don't need to look far to find trouble. 717 | Notice that this line will succeed even if +@user+ is +nil+, as an unassigned instance variable would be: 718 | 719 | ................................................................................ 720 | 721 | >> @user = nil 722 | => nil 723 | >> name = @user.first_name.capitalize rescue "Anonymous" 724 | => "Anonymous" 725 | 726 | ................................................................................ 727 | 728 | This means you can't count on catching an error when a typo or a renamed 729 | variable creeps into your code. This weakness, of course, propagates down the 730 | chain as well: 731 | 732 | ................................................................................
733 | 734 | >> name = @user.a_fake_method.capitalize rescue "Anonymous" 735 | => "Anonymous" 736 | >> name = @user.a_fake_method.cannot_fail rescue "Anonymous" 737 | => "Anonymous" 738 | 739 | ................................................................................ 740 | 741 | Of course, issues with a one-liner like this should be easy enough to catch even 742 | without an exception. This is most likely the reason why this pattern has 743 | become so common. However, that reasoning overlooks the fact that the problem 744 | exists deeper down the rabbit hole as well. Let's introduce a typo into our user 745 | implementation: 746 | 747 | ................................................................................ 748 | 749 | class User 750 | 751 | def first_name 752 | attribute[:first_name] 753 | end 754 | 755 | end 756 | 757 | ................................................................................ 758 | 759 | Now, we go back and look at one of our previously working examples: 760 | 761 | ................................................................................ 762 | 763 | >> @user = User.new('jia') 764 | => #<User:0x... @user_id="jia"> 765 | >> name = @user.first_name.capitalize rescue "Anonymous" 766 | => "Anonymous" 767 | >> @user.first_name 768 | NameError: undefined local variable or method `attribute' for #<User:0x... @user_id="jia"> 769 | from (irb):23:in `first_name' 770 | from (irb):32 771 | from /Users/sandal/lib/ruby19_1/bin/irb:12:in `<main>' 772 | 773 | ................................................................................ 774 | 775 | Hopefully you're beginning to see the picture. Although good testing and 776 | extensive quality assurance can catch these bugs, using this conditional 777 | modifier +rescue+ hack is like putting blinders on your code. Unfortunately, 778 | the same goes for code of this form: 779 | 780 | ................................................................................ 781 | 782 | def do_something_dangerous 783 | might_raise_an_error 784 | rescue 785 | "default value" 786 | end 787 | 788 | ................................................................................ 789 | 790 | Pretty much any +rescue+ that does not name a specific error class may be a source of 791 | silent failure in your applications, because a bare +rescue+ catches any +StandardError+, including the +NameError+ and +NoMethodError+ that typos produce. The only real case where an unqualified 792 | +rescue+ might make sense is when it is combined with an unqualified +raise+, 793 | which causes the same error to resurface after executing some code: 794 | 795 | ................................................................................ 796 | 797 | begin 798 | # do some stuff 799 | rescue => e 800 | MyLogger.error "Error doing stuff: #{e.message}" 801 | raise 802 | end 803 | 804 | ................................................................................ 805 | 806 | In other situations, be sure you either know the risks involved or avoid 807 | this technique entirely. You'll thank yourself later. 808 | 809 | ==== Doing +method_missing+ wrong ==== 810 | 811 | One thing you really don't want to do is mess up a +method_missing+ hook. 812 | Because +method_missing+ is what handles unknown messages, its default behavior of raising +NoMethodError+ plays a 813 | key role in helping you find bugs in your code. 814 | 815 | In the "Mastering the Dynamic Toolkit" chapter of this book, we covered some 816 | examples of how to use +method_missing+ properly.
Here's an example of how to 817 | do it wrong: 818 | 819 | .................................................................................... 820 | 821 | class Prawn::Document 822 | 823 | # Provides the following shortcuts: 824 | # 825 | # stroke_some_method(*args) #=> some_method(*args); stroke 826 | # fill_some_method(*args) #=> some_method(*args); fill 827 | # fill_and_stroke_some_method(*args) #=> some_method(*args); fill_and_stroke 828 | # 829 | def method_missing(id,*args,&block) 830 | case(id.to_s) 831 | when /^fill_and_stroke_(.*)/ 832 | send($1,*args,&block); fill_and_stroke 833 | when /^stroke_(.*)/ 834 | send($1,*args,&block); stroke 835 | when /^fill_(.*)/ 836 | send($1,*args,&block); fill 837 | end 838 | end 839 | 840 | end 841 | 842 | .................................................................................... 843 | 844 | Although this may look very similar to an earlier example in this book, it has a 845 | critical flaw. Can you see it? If not, this irb session should help: 846 | 847 | .................................................................................... 848 | 849 | >> pdf.fill_and_stroke_cirlce([100,100], :radius => 25) 850 | => "0.000 0.000 0.000 rg\n0.000 0.000 0.000 RG\nq\nb\n" 851 | >> pdf.stroke_the_pretty_kitty([100,100], :radius => 25) 852 | => "0.000 0.000 0.000 rg\n0.000 0.000 0.000 RG\nq\nb\nS\n" 853 | >> pdf.donuts 854 | => nil 855 | 856 | .................................................................................... 857 | 858 | By coding a +method_missing+ hook without delegating to the original +Object+ 859 | definition, we have effectively muted our object's ability to complain about 860 | messages we really didn't want it to handle. To add insult to injury, failure 861 | cases such as +fill_and_stroke_cirlce+ and +stroke_the_pretty_kitty+ are doubly 862 | confusing, since they return a non-nil value even though they do not produce 863 | meaningful results. 
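The flaw does not depend on anything specific to Prawn. The following sketch reproduces it with a hypothetical +Canvas+ class that simply records the drawing calls it receives. +Canvas+, +circle+, and +log+ are illustrative names, not Prawn's API. Because the +case+ expression has no +else+ branch, any unmatched name makes the hook return +nil+ instead of raising +NoMethodError+.

```ruby
# Hypothetical stand-in for Prawn::Document; it just records calls.
class Canvas
  attr_reader :log

  def initialize
    @log = []
  end

  def circle(*args)
    @log << [:circle, args]
  end

  def stroke
    @log << [:stroke]
  end

  # Flawed hook: there is no else branch calling super, so names that
  # match none of the patterns fall through and the hook returns nil.
  def method_missing(id, *args, &block)
    case id.to_s
    when /\Astroke_(.*)/
      send($1, *args, &block)
      stroke
    end
  end
end

canvas = Canvas.new
canvas.stroke_circle([100, 100], 25) # draws the circle, then strokes it
p canvas.donuts                      # prints nil; no NoMethodError is raised
canvas.stroke_cirlce([100, 100], 25) # the typo is swallowed, but stroke still runs
p canvas.log.length                  # => 3
```

Note that the misspelled +stroke_cirlce+ still appends a +:stroke+ entry. This mirrors the confusing non-nil results from the irb session.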
864 | 865 | Luckily, the remedy is simple. We just add a call to +super+ in the 866 | catch-all case: 867 | 868 | .................................................................................... 869 | 870 | def method_missing(id,*args,&block) 871 | case(id.to_s) 872 | when /^fill_and_stroke_(.*)/ 873 | send($1,*args,&block); fill_and_stroke 874 | when /^stroke_(.*)/ 875 | send($1,*args,&block); stroke 876 | when /^fill_(.*)/ 877 | send($1,*args,&block); fill 878 | else 879 | super 880 | end 881 | end 882 | 883 | .................................................................................... 884 | 885 | Now, if we re-run our examples from before, we see much more predictable 886 | behavior, in line with what we'd expect if we had no hook set up in the first 887 | place: 888 | 889 | .................................................................................... 890 | 891 | >> pdf.fill_and_stroke_cirlce([100,100], :radius => 25) 892 | NoMethodError: undefined method `cirlce' for #<Prawn::Document:0x...> 893 | from /Users/sandal/devel/prawn/lib/prawn/graphics/color.rb:68:in `method_missing' 894 | from /Users/sandal/devel/prawn/lib/prawn/graphics/color.rb:62:in `method_missing' 895 | from (irb):4 896 | from /Users/sandal/lib/ruby19_1/bin/irb:12:in `<main>' 897 | 898 | >> pdf.stroke_the_pretty_kitty([100,100], :radius => 25) 899 | NoMethodError: undefined method `the_pretty_kitty' for #<Prawn::Document:0x...> 900 | from /Users/sandal/devel/prawn/lib/prawn/graphics/color.rb:68:in `method_missing' 901 | from /Users/sandal/devel/prawn/lib/prawn/graphics/color.rb:64:in `method_missing' 902 | from (irb):5 903 | from /Users/sandal/lib/ruby19_1/bin/irb:12:in `<main>' 904 | 905 | >> pdf.donuts 906 | NoMethodError: undefined method `donuts' for #<Prawn::Document:0x...> 907 | from /Users/sandal/devel/prawn/lib/prawn/graphics/color.rb:68:in `method_missing' 908 | from (irb):6 909 | from /Users/sandal/lib/ruby19_1/bin/irb:12:in `<main>' 910 | 911 | .................................................................................... 912 | 913 | An important thing to remember is that in addition to ensuring that you call 914 | +super+ from within your +method_missing()+ hooks, you are also responsible for 915 | maintaining the method's signature. Ruby will happily allow you to write a hook 916 | which only captures the name and not the arguments and block a method is called 917 | with: 918 | 919 | .................................................................................... 920 | 921 | def method_missing(id) 922 | # ... 923 | end 924 | 925 | .................................................................................... 926 | 927 | However, if you set things up this way, any missing-method call that passes arguments will fail with an +ArgumentError+ before the body of your hook even runs, and even when you call +super+ you'll be 928 | breaking things farther up the chain, since +Object#method_missing+ expects the 929 | whole signature of the function call to remain intact. So it's not only 930 | delegating to the original that is important, but delegating without information loss. 931 | 932 | If you act responsibly with your +method_missing+ hooks, they won't be 933 | that dangerous in most cases. However, if you get sloppy here, it is virtually 934 | guaranteed to come back to haunt you. Getting into good habits right away 935 | will save you some headaches down the line. 936 | 937 | === Conclusions === 938 | 939 | This appendix doesn't come close to covering all the trouble you can get 940 | yourself into with Ruby. It does, however, cover some of the most common sources of 941 | trouble and confusion, while showing some much less painful alternatives. 942 | 943 | When it comes to design, much can be gained by simply reducing complexity. If 944 | the path you're on seems too difficult, odds are it can be made a lot 945 | easier if you just think about it in a different way.
As for "clever" 946 | implementation tricks and shortcuts, they can be more trouble than they're worth 947 | if they come at the expense of clarity or maintainability of your code. 948 | 949 | Put simply, the worst practices in Ruby are ones that make you work much harder 950 | than you have to. If you introduce code that seems really cool at 951 | first but later turns out to cause complicated faults in corner cases, 952 | it is generally wise to just rip it out and start fresh with something a little 953 | less exciting but more reliable. 954 | 955 | If you maintain a balance between creative approaches to your problems 956 | and ones that work without introducing excess complexity, you'll have a very 957 | happy time writing Ruby code. Because Ruby gives you the power to do both good 958 | and evil, it's ultimately up to you how you want to maintain your projects. 959 | However, code that is maintainable and predictable is much more of a joy to work 960 | with than fragile and sloppy hacks that have simply been duct-taped together. 961 | 962 | Now that we have reached the very end of this book, I trust that you have the 963 | skills necessary to go out and find the Best and Worst of Ruby Practices on your 964 | own. The real challenge is knowing the difference between the two, and that 965 | ability comes only with practical experience gained by working on and 966 | investigating real problems. With luck, this book has included enough real-world 967 | examples to give you a head start in that area, but the heavy lifting needs to 968 | be done by you. 969 | 970 | I hope you have enjoyed this wild ride through Ruby with me, and I really hope 971 | that something or other in this book has challenged or inspired you. Please 972 | go out now and write some good open source Ruby code, and maybe you'll make a 973 | guest appearance in the second edition!
974 | -------------------------------------------------------------------------------- /manuscript/unmerged/things_go_wrong.txt: -------------------------------------------------------------------------------- 1 | == When Things Go Wrong == 2 | 3 | === Resolving Defects Doesn't Have To Be Painful === 4 | 5 | Unfortunately, neither this book nor a lifetime of practice can cause you to 6 | attain Ruby programming perfection. However, a good substitute for never making 7 | a mistake is knowing how to fix your problems as they arise. The purpose 8 | of this chapter is to provide you with the necessary tools and techniques to 9 | prepare you for Ruby search and rescue missions. 10 | 11 | We will start by walking through a simple but real bug hunting session to get a 12 | basic outline of how to investigate issues in your Ruby projects. We'll then 13 | dive into some more specific tools and techniques for helping refine this 14 | process. What may surprise you is that we'll do all of this without ever 15 | talking about using a debugger. This is mainly because most Rubyists can and do 16 | get by without a formal debugging tool, thanks to the lightweight 17 | techniques we'll discuss here. 18 | 19 | One skill you will need in order to make the most of this chapter 20 | is a decent understanding of how Ruby's built-in unit testing 21 | framework works. That means if you haven't read the '"Driving Code Through 22 | Tests"' chapter yet, you may want to go ahead and do that now. 23 | 24 | You will notice that this chapter is much more about the 25 | process of problem solving in the context of Ruby than it is about solving any 26 | particular problem. If you keep this goal in mind while reading through the 27 | examples, you'll get the most out of them. 28 | 29 | Now that you know what to expect, let's start fixing some stuff.
30 | 31 | === A Process For Debugging Ruby Code === 32 | 33 | Part of becoming masterful at anything is learning from your mistakes. Since 34 | Ruby programming is no exception, I want to share one of my embarrassing 35 | moments so that others can benefit from it. If the problems with the code I am 36 | about to show are immediately obvious to you, don't worry about that. Instead, 37 | focus on the problem solving strategies used, as that's what is most important 38 | here. 39 | 40 | We're going to look at a simplified version of a real problem I ran into in my 41 | day-to-day work. One of my Rails gigs involved building a system for processing 42 | scholarship applications online. After users had filled out an application 43 | once, whether it was accepted or rejected, they were presented with a somewhat 44 | different application form upon renewal. Although it deviates a bit from our 45 | real-world application, here's some simple code that illustrates that process: 46 | 47 | ................................................................................. 48 | 49 | if gregory.can_renew? 50 | puts "Start the application renewal process" 51 | else 52 | puts "Edit a pending application or submit a new one" 53 | end 54 | 55 | ................................................................................. 56 | 57 | At first, I thought the logic for this was simple. So long as all of the user's 58 | applications had a status of either accepted or rejected, it was safe to say 59 | they could renew their application. The following code provides a rough model 60 | that implements this requirement: 61 | 62 | ................................................................................. 63 | 64 | Application = Struct.new(:state) 65 | 66 | class User 67 | def initialize 68 | @applications = [] 69 | end 70 | 71 | attr_reader :applications 72 | 73 | def can_renew? 74 | applications.all?
{ |e| [:accepted, :rejected].include?(e.state) } 75 | end 76 | end 77 | 78 | ................................................................................. 79 | 80 | Using this model, we can see that the output of the following code is '"Start the 81 | application renewal process"': 82 | 83 | ................................................................................. 84 | 85 | gregory = User.new 86 | gregory.applications << Application.new(:accepted) 87 | gregory.applications << Application.new(:rejected) 88 | 89 | if gregory.can_renew? 90 | puts "Start the application renewal process" 91 | else 92 | puts "Edit a pending application or submit a new one" 93 | end 94 | 95 | ................................................................................. 96 | 97 | If we add a pending application into the mix, we see that the other case is 98 | triggered, outputting '"Edit a pending application or submit a new one"': 99 | 100 | ................................................................................. 101 | 102 | gregory = User.new 103 | gregory.applications << Application.new(:accepted) 104 | gregory.applications << Application.new(:rejected) 105 | gregory.applications << Application.new(:pending) 106 | 107 | if gregory.can_renew? 108 | puts "Start the application renewal process" 109 | else 110 | puts "Edit a pending application or submit a new one" 111 | end 112 | 113 | ................................................................................. 114 | 115 | So far everything has been going fine, but the next bit of code exposed a nasty 116 | edge case: 117 | 118 | ................................................................................. 119 | 120 | gregory = User.new 121 | 122 | if gregory.can_renew? 123 | puts "Start the application renewal process" 124 | else 125 | puts "Edit a pending application or submit a new one" 126 | end 127 | 128 | ................................................................................. 
129 | 130 | While I fully expected this to print out '"Edit a pending application or submit a 131 | new one"', it managed to print the other message instead! 132 | 133 | Popping open irb, I tracked down the root of the problem: 134 | 135 | ................................................................................. 136 | 137 | >> gregory = User.new 138 | => # 139 | >> gregory.can_renew? 140 | => true 141 | 142 | >> gregory.applications 143 | => [] 144 | >> gregory.applications.all? { false } 145 | => true 146 | 147 | ................................................................................. 148 | 149 | Of course, the trouble here was due to an incorrect use of the +Enumerable#all?+ method. 150 | I had been relying on Ruby to do what I meant rather than what I actually asked it to 151 | do, which is usually a bad idea. For some reason I thought that calling +all?+ 152 | on an empty array would return +nil+ or +false+, but instead, it returned 153 | +true+. That makes sense on reflection: +all?+ returns +false+ only when it finds an element for which the block is false, and an empty array has no such element. To fix the bug, I'd need to rethink +can_renew?+ a little bit. 154 | 155 | I could have fixed the issue immediately by adding a special case 156 | involving +applications.empty?+, but I wanted to be sure this bug wouldn't have 157 | a chance to crop up again. The easiest way to do this was to write some tests, 158 | which I probably should have done in the first place. 159 | 160 | The following simple test case clearly specified the behavior I expected, 161 | splitting it up into three cases as we did before: 162 | 163 | ................................................................................. 164 | 165 | require "test/unit" 166 | 167 | class UserTest < Test::Unit::TestCase 168 | 169 | def setup 170 | @gregory = User.new 171 | end 172 | 173 | def test_a_new_applicant_cannot_renew 174 | assert_block("Expected User#can_renew? to be false for a new applicant") do 175 | not @gregory.can_renew?
176 | end 177 | end 178 | 179 | def test_a_user_with_pending_applications_cannot_renew 180 | @gregory.applications << app(:accepted) << app(:pending) 181 | 182 | msg = "Expected User#can_renew? to be false when user has pending applications" 183 | assert_block(msg) do 184 | not @gregory.can_renew? 185 | end 186 | end 187 | 188 | def test_a_user_with_only_accepted_and_rejected_applications_can_renew 189 | @gregory.applications << app(:accepted) << app(:rejected) << app(:accepted) 190 | msg = "Expected User#can_renew? to be true when all applications are accepted or rejected" 191 | assert_block(msg) { @gregory.can_renew? } 192 | end 193 | 194 | private 195 | 196 | def app(name) 197 | Application.new(name) 198 | end 199 | 200 | end 201 | 202 | ................................................................................. 203 | 204 | When we run the tests, we can clearly see the failure that we investigated 205 | manually a little earlier: 206 | 207 | ................................................................................. 208 | 209 | 1) Failure: 210 | test_a_new_applicant_cannot_renew(UserTest) [foo.rb:24]: 211 | Expected User#can_renew? to be false for a new applicant 212 | 213 | 3 tests, 3 assertions, 1 failures, 0 errors 214 | 215 | ................................................................................. 216 | 217 | Now that we've successfully captured the essence of the bug, we can go about 218 | fixing it. As you may suspect, the solution is simple: 219 | 220 | ................................................................................. 221 | 222 | def can_renew? 223 | return false if applications.empty? 224 | applications.all? { |e| [:accepted, :rejected].include?(e.state) } 225 | end 226 | 227 | ................................................................................. 228 | 229 | Running the tests again, we see that everything passes: 230 | 231 | ................................................................................. 
232 | 233 | 3 tests, 3 assertions, 0 failures, 0 errors 234 | 235 | ................................................................................. 236 | 237 | If we went back and ran our original examples that print some messages to the 238 | screen, we'd see that those now work as expected as well. We could have used 239 | those on their own to test our attempted fix, but by writing automated tests, we 240 | have a safety net against regressions, which is arguably one of the main benefits of 241 | unit tests. 242 | 243 | Though the particular bug we squashed may be a bit boring, what we have shown is 244 | a repeatable procedure for bug hunting, without ever firing up a debugger or 245 | combing through log files. To recap, here's the general plan for how things 246 | should play out: 247 | 248 | * First, identify the different scenarios that apply to a given feature. 249 | 250 | * Go through these scenarios to identify which ones are affected by 251 | defects and which ones work as expected. This can be done in many ways, 252 | ranging from printing debugging messages on the command line, 253 | to logfile analysis, to live application testing. The important thing is to 254 | identify and isolate the cases affected by the bug. 255 | 256 | * Hop into `irb` if possible and take a look at what your objects actually look 257 | like under the hood. Experiment with the failing scenarios in a step-by-step 258 | fashion to dig down and uncover the root cause of problems. 259 | 260 | * Write tests to reproduce the problems you are having, along with what you 261 | expect to happen when the issue is resolved. 262 | 263 | * Implement a fix that passes the tests, and then repeat the process until all 264 | issues are resolved. 265 | 266 | Sometimes, it's possible to condense this process into two steps by simply 267 | writing a test that reproduces the bug and then introducing a fix that passes 268 | the tests.
However, most of the time the extra legwork will pay off, as 269 | understanding the root cause of the problem will allow you to treat your 270 | application's disease all at once rather than addressing its symptoms one by 271 | one. 272 | 273 | Given this basic outline of how to isolate and resolve issues within our code, we 274 | can now focus on some specific tools and techniques that will help improve the 275 | process for us. 276 | 277 | === Capturing the Essence of a Defect === 278 | 279 | Before you can begin to hunt down a bug, you need to be able to reproduce it in 280 | isolation. The main idea is that if you remove all the extraneous code that is 281 | unrelated to the issue, it will be easier to see what is really going on. As you 282 | continue to investigate an issue, you may discover that you can reduce the 283 | example more and more based on what you learn. Since I have a real example 284 | handy from one of my projects, we can look at this process in action to see how 285 | it plays out. 286 | 287 | What follows is some Prawn code that was submitted as a bug report. The problem 288 | it is supposed to show is that every call to +span()+ triggered a page break, 289 | even though it wasn't supposed to. 290 | 291 | ................................................................................. 292 | 293 | 294 | Prawn::Document.generate("span.pdf") do 295 | 296 | span(350, :position => :center) do 297 | text "Here's some centered text in a 350 point column. " * 100 298 | end 299 | 300 | text "Here's my sentence." 301 | 302 | bounding_box([50,300], :width => 400) do 303 | text "Here's some default bounding box text. " * 10 304 | span(bounds.width, 305 | :position => bounds.absolute_left - margin_box.absolute_left) do 306 | text "The rain in spain falls mainly on the plains. " * 300 307 | end 308 | end 309 | 310 | text "Here's my second sentence."
311 | 312 | end 313 | 314 | ................................................................................. 315 | 316 | Without a strong knowledge of Prawn, this example may already seem fairly 317 | reduced. After all, the code reads like an abstract problem definition 318 | rather than something ripped out of an application, and that is a good 319 | start. But upon running this code, I noticed that the defect was present 320 | whenever a +span()+ call was made. This allowed me to reduce the example 321 | substantially: 322 | 323 | ................................................................................. 324 | 325 | Prawn::Document.generate("span.pdf") do 326 | 327 | span(350) do 328 | text "Here's some text in a 350pt wide column. " * 20 329 | end 330 | 331 | text "This text should appear on the same page as the spanning text" 332 | 333 | end 334 | 335 | ................................................................................. 336 | 337 | 338 | Whether or not you have any practical experience with Prawn, the issue stands 339 | out better in this revised example, simply because there is less code to 340 | consider. The code is also a bit more self-documenting, which makes buggy output 341 | harder to miss. Many bug reports can be reduced in a similar fashion. Of course, 342 | not everything compacts so well, but every little bit of simplification helps. 343 | 344 | Most bugs aren't going to show up in the first place you look. Instead, they'll 345 | often be hidden farther down the chain, stashed away in some low-level helper 346 | method or in some other code that your feature depends on. Since this is so 347 | common, I've developed the habit of mentally tracing the execution path that my 348 | example code follows, in hopes of finding some obvious mistake. If 349 | I notice anything suspicious along the way, I start the next iteration of bug 350 | reproduction.
351 | 352 | Using this approach, I found out that the problem with +span()+ wasn't actually 353 | in +span()+ at all. Although the details aren't important, it turns out that 354 | the core problem was in a lower-level method called +canvas()+, which 355 | +span()+ relies on. This method was incorrectly moving the text cursor 356 | to the very bottom of the page after executing its block argument. 357 | I used the following example to confirm this was the case: 358 | 359 | ................................................................................. 360 | 361 | Prawn::Document.generate("canvas_sets_y_to_0.pdf") do 362 | canvas { text "Some text at the absolute top left of the page" } 363 | 364 | text "This text should not be after a pagebreak" 365 | end 366 | 367 | ................................................................................. 368 | 369 | Once I was able to reproduce the problem this way, I went on to formally specify what 370 | was wrong in the form of tests, feeling reasonably confident that this 371 | was the root defect. 372 | 373 | Whenever you are hunting for bugs, the practice of reducing your area of 374 | interest first will help you avoid dead ends and limit the number of 375 | places you'll need to look for problems. Before doing any formal investigation, 376 | it's a good idea to check for obvious problems so that you can get a sense of 377 | where the real source of your defect is. Some bugs are harder to catch on sight 378 | than others, but there is no need to overthink the easy ones. 379 | 380 | If a defect can be reproduced in isolation, you can usually narrow it down to 381 | a specific deviation from what you expected to happen. We'll now take a look at 382 | how to go from an example that reproduces a bug to a failing test that fully 383 | characterizes it. 384 | 385 | The main benefit of an automated test is that it will explode when your code 386 | fails to act as expected.
Keep in mind that even if you 387 | already have a test suite, any bug you run into that doesn't cause a test 388 | failure is a sign that your tests need updating. This helps prevent regressions, 389 | allowing you to fix a bug once and forget about it. 390 | 391 | Continuing with our example, here is a simple but sufficient test to 392 | corner the bug: 393 | 394 | ................................................................................. 395 | 396 | class CanvasTest < Test::Unit::TestCase 397 | 398 | def setup 399 | @pdf = Prawn::Document.new 400 | end 401 | 402 | def test_canvas_should_not_reset_y_to_zero 403 | after_text_position = nil 404 | 405 | @pdf.canvas do 406 | @pdf.text "Hello World" 407 | after_text_position = @pdf.y 408 | end 409 | 410 | assert_equal after_text_position, @pdf.y 411 | end 412 | end 413 | 414 | ................................................................................. 415 | 416 | Here, we expect the y coordinate after the +canvas+ block is executed to be the same 417 | as it was just after the text was rendered to the page. Running this test 418 | reproduces the problem we created an example for earlier: 419 | 420 | ................................................................................. 421 | 422 | 1) Failure:test_canvas_should_not_reset_y_to_zero(CanvasTest) [---] 423 | <778.128> expected but was 424 | <0.0>. 425 | 426 | ................................................................................. 427 | 428 | Here, we have converted our simplified example into something that can become a 429 | part of our automated test suite. The simpler an example is, the easier 430 | this is to do. More complicated examples may need to be broken into several 431 | chunks, but this process is straightforward more often than not. 432 | 433 | Once we write a test that reproduces our problem, the way we fix it is to get 434 | our tests passing again.
If other tests end up breaking in order to get our new 435 | test to pass, we know that something is still wrong. If for some reason our 436 | problem isn't solved when we get all the tests passing again, it means our 437 | reduced example probably didn't cover the entirety of the problem, so we need to 438 | go back to the drawing board. Even so, not all is lost. Each 439 | test serves as a significant reduction of your problem space. Every passing 440 | assertion rules out that particular issue as the 441 | root of your problem. Sooner or later, there won't be any place left for your 442 | bugs to hide. 443 | 444 | For those who need a recap, here are the keys to producing a good reduced 445 | example: 446 | 447 | * Remove as much extraneous code as possible from your example, and the bug 448 | will be clearer to see. 449 | 450 | * Try to make your example self-describing, so that even someone not 451 | familiar with the core issue can see at a glance whether something is wrong. 452 | This helps others report regressions even if they don't fully understand the 453 | internals of your project. 454 | 455 | * Continue to revise your examples until they reach the root cause of the 456 | problem. Don't throw away any of the higher-level examples until you verify 457 | that fixing a general problem solves the specific issue you ran into as 458 | well. 459 | 460 | * When you understand the root cause of your problem, code up a failing test 461 | that demonstrates how the code should work. When it passes, the bug should 462 | be gone. If it fails again, you'll know there has been a regression. 463 | 464 | === Scrutinizing Your Code === 465 | 466 | When things aren't working the way you expect them to, you obviously need to 467 | find out why. There are certain tricks that can make this task a lot easier on 468 | you, and you can utilize them without ever needing to fire up the debugger.
469 | 470 | ==== Utilizing Reflection ==== 471 | 472 | Many bugs come from using an object in a different way than you're supposed to, 473 | or from some internal state deviating from your expectations. To be able to 474 | detect and fix these bugs, you need to be able to get a clear picture of what is 475 | going on under the hood in the objects you're working with. 476 | 477 | I'll assume that you already know that +Kernel#p+ and +Object#inspect+ exist, 478 | and how to use them for basic needs. However, when left to their default 479 | behaviors, using these tools to debug complex objects can be too painful to be 480 | practical. Take the inspect output of an unadorned +Prawn::Document+ as 481 | an example: 482 | 483 | ................................................................................. 484 | 485 | #0}, @gen=0, @identifier=4, @stream="0.000 0.000 0.000 r 487 | g\n0.000 0.000 0.000 RG\nq\n", @compressed=false>, @info= 488 | #"Prawn", :Producer=>"Prawn"}, 489 | @gen=0, @identifier=1, @compressed=false> 490 | , @root=#:Catalog, :Pages=> 491 | #1, :Kids=>[##0}, @gen=0, @identifier=4, @stream="0.000 0.000 0.000 rg\n0.000 494 | 0.000 0.000 RG\nq\n", 495 | 496 | << ABOUT 50 MORE LINES LIKE THIS >> 497 | 498 | #1, :Kids=>[#], :Type=>:Pages}, @gen=0, @identifier=2, @compressed=false>, 500 | :MediaBox=>[0, 0, 612.0, 792.0]}, @gen=0, @identifier=5, @compressed=false>], 501 | @margin_box=#, @height=720.0>, 503 | @fill_color="000000", @current_page=# 504 | #0}, @gen=0, @identifier=4, 505 | @stream="0.000 0.000 0.000 rg\n0.000 0.000 0.000 RG\nq\n", @compressed=false>, 506 | :Type=>:Page, :Parent=>#1, 507 | :Kids=>[#], :Type=>:Pages}, 508 | @gen=0, @identifier=2, @compressed=false>, :MediaBox=>[0, 0, 612.0, 792.0]}, 509 | @gen=0, @identifier=5, @compressed=false>, @skip_encoding=nil, 510 | @bounding_box=#, @height=720.0>, @page_size="LETTER", 512 | @stroke_color="000000" , @text_options={}, @compress=false, @margins={:top=>36, 513 | :left=>36, :bottom=>36,
:right=>36}> 514 | ................................................................................. 515 | 516 | Although this information sure is thorough, it probably won't quickly help us 517 | identify what page layout is being used or what the dimensions of the margins 518 | are. If we aren't familiar with the internals of this object, such verbose 519 | output is borderline useless. Of course, this doesn't mean we're simply out of 520 | luck. In situations like this, we can infer a lot about an object by using 521 | Ruby's reflective capabilities: 522 | 523 | ................................................................................. 524 | 525 | >> pdf.class 526 | => Prawn::Document 527 | 528 | >> pdf.instance_variables 529 | => [:@objects, :@info, :@pages, :@root, :@page_size, :@page_layout, :@compress, 530 | :@skip_encoding, :@background, :@font_size, :@text_options, :@margins, :@margin_box, 531 | :@bounding_box, :@page_content, :@current_page, :@fill_color, :@stroke_color, :@y] 532 | 533 | >> Prawn::Document.instance_methods(inherited_methods=false).sort 534 | => [:bounding_box, :bounds, :bounds=, :canvas, :compression_enabled?, :cursor, 535 | :find_font, :font, :font_families, :font_registry, :font_size, :font_size=, :margin_box, 536 | :margin_box=, :margins, :mask, :move_down, :move_up, :pad, :pad_bottom, :pad_top, :page_count, 537 | :page_layout, :page_size, :render, :render_file, :save_font, :set_font, :span, 538 | :start_new_page, :text_box, :width_of, :y, :y=] 539 | 540 | >> pdf.private_methods(inherited_methods=false) 541 | => [:init_bounding_box, :initialize, :build_new_page_content, :generate_margin_box] 542 | 543 | ................................................................................. 
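If Prawn isn't installed, the same reflective calls can be tried on any Ruby object. What follows is a minimal sketch built around a hypothetical +Page+ class, which is not part of Prawn and exists only for illustration. The reflection methods themselves (+instance_variables+, +Module#instance_methods+, +Module#private_instance_methods+ and +instance_variable_get+) are standard Ruby. Passing +false+ to the two listing methods leaves out inherited methods, which is what keeps their output short enough to scan.

```ruby
# Page is a hypothetical stand-in class (not part of Prawn), used only
# to demonstrate Ruby's built-in reflection methods.
class Page
  attr_reader :size, :layout, :margins

  def initialize(size = "LETTER", layout = :portrait)
    @size    = size
    @layout  = layout
    @margins = { left: 36, right: 36, top: 36, bottom: 36 }
  end

  def landscape?
    @layout == :landscape
  end

  private

  # A private helper: 612 points is the width of a US Letter page.
  def content_width
    612 - @margins[:left] - @margins[:right]
  end
end

page = Page.new

# Which instance variables does this object carry?
p page.instance_variables
# => [:@size, :@layout, :@margins]

# Which public methods does Page itself define? (false skips inherited ones)
p Page.instance_methods(false).sort
# => [:landscape?, :layout, :margins, :size]

# Which private methods does Page itself define?
p Page.private_instance_methods(false).sort
# => [:content_width, :initialize]

# Read a single instance variable directly, without needing a reader method.
p page.instance_variable_get(:@margins)
```

Because +initialize+ is always private, it shows up in the private listing alongside the helper, just as it does in the +Prawn::Document+ session.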
544 | 545 | Now, even if we haven't worked with this particular object before, we have a 546 | sense of what is available, which makes it much easier to answer the questions about page layout and margins raised 547 | earlier: 548 | 549 | ................................................................................. 550 | 551 | >> pdf.margins 552 | => {:left=>36, :right=>36, :top=>36, :bottom=>36} 553 | 554 | >> pdf.page_layout 555 | => :portrait 556 | 557 | ................................................................................. 558 | 559 | If we want to look at some lower-level details, such as the contents of some 560 | instance variables, we can do so via +instance_variable_get+: 561 | 562 | ................................................................................. 563 | 564 | >> pdf.instance_variable_get(:@current_page) 565 | => #:Page, 566 | :Parent=>#:Pages, 567 | :Count=>1, :Kids=>[#]}, @compressed=false, @on_encode=nil>, 568 | :MediaBox=>[0, 0, 612.0, 792.0], :Contents=>#0}, @compressed=false, @on_encode=nil, 570 | @stream="0.000 0.000 0.000 rg\n0.000 0.000 0.000 RG\nq\n">}, @compressed=false, 571 | @on_encode=nil> 572 | 573 | ................................................................................. 574 | 575 | Using these tricks, we can easily determine whether we've accidentally got the 576 | name of a variable or method wrong. We can also see what the underlying 577 | structure of our objects is, and repeat this process to drill down and 578 | investigate potential problems. 579 | 580 | ==== Improving inspect output ==== 581 | 582 | Of course, the whole situation here would be better if we had easier-to-read 583 | inspect output. There is actually a standard library called +pp+ that 584 | improves the formatting of inspect while operating in a very similar 585 | fashion. I wrote a whole section in `Appendix B` about this library, including 586 | some of its advanced capabilities.
You should definitely read up on what +pp+ 587 | offers you when you get the chance, but here I'd like to cover some alternative 588 | approaches that can also come in handy. 589 | 590 | As it turns out, the output of +Kernel#p+ can be improved on an object-by-object 591 | basis: +p+ simply calls +inspect+ on each of its arguments, so overriding +inspect+ in a class changes what both +p+ and +irb+ show for its instances. This may have already been obvious if you have overridden +Object#inspect+ 592 | before, but it is a severely underused feature of Ruby. This feature can be 593 | used to turn the mess we saw in the previous section into beautiful debugging 594 | output: 595 | 596 | ................................................................................. 597 | 598 | >> pdf = Prawn::Document.new 599 | => < Prawn::Document:0x27df8a: 600 | @background: nil 601 | @compress: false 602 | @fill_color: "000000" 603 | @font_size: 12 604 | @margins: {:left=>36, :right=>36, :top=>36, :bottom=>36} 605 | @page_layout: :portrait 606 | @page_size: "LETTER" 607 | @skip_encoding: nil 608 | @stroke_color: "000000" 609 | @text_options: {} 610 | @y: 756.0 611 | 612 | @bounding_box -> Prawn::Document::BoundingBox:0x27dd64 613 | @current_page -> Prawn::Reference:0x27dd1e 614 | @info -> Prawn::Reference:0x27df44 615 | @margin_box -> Prawn::Document::BoundingBox:0x27dd64 616 | @objects -> Array:0x27df6c 617 | @page_content -> Prawn::Reference:0x27dd46 618 | @pages -> Prawn::Reference:0x27df26 619 | @root -> Prawn::Reference:0x27df12 > 620 | 621 | ................................................................................. 622 | 623 | I think you'll agree that this looks substantially easier to follow than the 624 | default inspect output. To accomplish this, I put together a pretty straightforward 625 | template that allows you to pass in a couple of arrays of symbols that point at 626 | instance variables: 627 | 628 | .................................................................................
629 | 630 | module InspectTemplate 631 | 632 | def __inspect_template(objs, refs) 633 | obj_output = objs.sort.each_with_object("") do |v,out| 634 | out << "\n #{v}: #{instance_variable_get(v).inspect}" 635 | end 636 | 637 | ref_output = refs.sort.each_with_object("") do |v,out| 638 | ref = instance_variable_get(v) 639 | out << "\n #{v} -> #{__inspect_object_tag(ref)}" 640 | end 641 | 642 | "< #{__inspect_object_tag(self)}: #{obj_output}\n#{ref_output} >" 643 | end 644 | 645 | def __inspect_object_tag(obj) 646 | "#{obj.class}:0x#{obj.object_id.to_s(16)}" 647 | end 648 | 649 | end 650 | 651 | ................................................................................. 652 | 653 | After mixing this into `Prawn::Document`, I only need to specify which variables I 654 | want to display the entire contents of, and which I want to just show as 655 | references. Then, it is as easy as calling +__inspect_template+ with these 656 | values. 657 | 658 | ................................................................................. 659 | 660 | class Prawn::Document 661 | 662 | include InspectTemplate 663 | 664 | def inspect 665 | objs = [ :@page_size, :@page_layout, :@margins, :@font_size, :@background, 666 | :@stroke_color, :@fill_color, :@text_options, :@y, :@compress, 667 | :@skip_encoding ] 668 | 669 | refs = [ :@objects, :@info, :@pages, :@bounding_box, :@margin_box, :@page_content, 670 | :@current_page, :@root] 671 | 672 | __inspect_template(objs,refs) 673 | end 674 | end 675 | 676 | ................................................................................. 677 | 678 | Once we provide a customized +inspect+ method that returns a string, both 679 | +Kernel#p+ and +irb+ will pick up on it, yielding the nice results we showed 680 | earlier. 681 | 682 | Although my +InspectTemplate+ can easily be reused, it carries the major caveat 683 | that you become 100% responsible for exposing your variables for debugging 684 | output. 
Anything not explicitly passed to +__inspect_template+ will not be 685 | rendered. However, there is a middle-of-the-road solution that is far more 686 | automatic. 687 | 688 | The +YAML+ data serialization standard library has a nice side effect of 689 | producing highly readable representations of Ruby objects. Because of this, it 690 | provides a +Kernel#y+ method that can be used as a stand-in 691 | replacement for +p+. Although this may seem a bit strange, if you look at it in 692 | action, you'll see it has some benefits: 693 | 694 | ................................................................................. 695 | 696 | >> require "yaml" 697 | => true 698 | 699 | >> y Prawn::Document.new 700 | --- &id007 !ruby/object:Prawn::Document 701 | background: 702 | bounding_box: &id002 !ruby/object:Prawn::Document::BoundingBox 703 | height: 720.0 704 | parent: *id007 705 | width: 540.0 706 | x: 36 707 | y: 756.0 708 | compress: false 709 | info: &id003 !ruby/object:Prawn::Reference 710 | compressed: false 711 | data: 712 | :Creator: Prawn 713 | :Producer: Prawn 714 | gen: 0 715 | identifier: 1 716 | on_encode: 717 | margin_box: *id002 718 | margins: 719 | :left: 36 720 | :right: 36 721 | :top: 36 722 | :bottom: 36 723 | page_content: *id005 724 | page_layout: :portrait 725 | page_size: LETTER 726 | pages: *id004 727 | root: *id006 728 | skip_encoding: 729 | stroke_color: "000000" 730 | text_options: {} 731 | 732 | y: 756.0 733 | => nil 734 | 735 | ................................................................................. 736 | 737 | I truncated this output somewhat, but the basic structure shines through. You can 738 | see that YAML nicely shows nested object relations, and generally looks neat and 739 | tidy. Interestingly enough, YAML serializes each object only once: the first occurrence 740 | is labeled with an anchor such as +&id002+, and every repeated reference to it is written as an alias such as +*id002+.
This turns out to be especially good 741 | for tracking down a certain kind of Ruby bug: 742 | 743 | ................................................................................. 744 | 745 | >> a = Array.new(6) 746 | => [nil, nil, nil, nil, nil, nil] 747 | >> a = Array.new(6,[]) 748 | => [[], [], [], [], [], []] 749 | >> a[0] << "foo" 750 | => ["foo"] 751 | >> a 752 | => [["foo"], ["foo"], ["foo"], ["foo"], ["foo"], ["foo"]] 753 | >> y a 754 | --- 755 | - &id001 756 | - foo 757 | - *id001 758 | - *id001 759 | - *id001 760 | - *id001 761 | - *id001 762 | 763 | ................................................................................. 764 | 765 | Here, it's easy to see that the six sub-arrays that make up our main array are 766 | actually just six references to the same object. If that isn't what we intended, 767 | YAML makes it just as clear when we really do have six distinct 768 | objects: 769 | 770 | ................................................................................. 771 | 772 | >> a = Array.new(6) { [] } 773 | => [[], [], [], [], [], []] 774 | >> a[0] << "foo" 775 | => ["foo"] 776 | >> a 777 | => [["foo"], [], [], [], [], []] 778 | >> y a 779 | --- 780 | - - foo 781 | - [] 782 | 783 | - [] 784 | 785 | - [] 786 | 787 | - [] 788 | 789 | - [] 790 | 791 | ................................................................................. 792 | 793 | Although this may not be a problem you run into day to day, it's relatively easy 794 | to forget to deep-copy a structure from time to time, or to accidentally create 795 | many copies of a reference to the same object when you're trying to set default 796 | values. When that happens, a quick call to +y+ makes a long series of 797 | references to the same object easy to spot. 798 | 799 | Of course, the YAML output is most handy when you stumble onto this 800 | problem by accident or when it is buried in some deeply nested structure.
801 | If you already know exactly where to look and can easily get at it, using pure 802 | Ruby works fine as well: 803 | 804 | ................................................................................. 805 | 806 | >> a = Array.new(6) { [] } 807 | => [[], [], [], [], [], []] 808 | >> a.map { |e| e.object_id } 809 | => [3423870, 3423860, 3423850, 3423840, 3423830, 3423820] 810 | >> b = Array.new(6,[]) 811 | => [[], [], [], [], [], []] 812 | >> b.map { |e| e.object_id } 813 | => [3431570, 3431570, 3431570, 3431570, 3431570, 3431570] 814 | 815 | ................................................................................. 816 | 817 | So far, we've been focusing heavily on how to inspect your objects. This 818 | is mostly because many Ruby bugs can be solved simply by 819 | getting a sense of what objects are being passed around and what data 820 | they really contain. But of course, this is not the whole story; 821 | we also need to be able to work with code that has been set in motion. 822 | 823 | ==== Finding Needles In A Haystack ==== 824 | 825 | Sometimes it's not possible to easily pull up a defective object to inspect 826 | directly. Consider, for example, a large dataset that has occasional 827 | anomalies in it. If you're dealing with tens or hundreds of thousands of 828 | records, an error like this won't be very helpful after your script churns for a 829 | while and then goes sailing off the tracks: 830 | 831 | ................................................................................. 832 | 833 | >> @data.map { |e| Integer(e[:amount]) } 834 | ArgumentError: invalid value for Integer: "157,000" 835 | from (irb):10:in `Integer' 836 | from (irb):10 837 | from (irb):10:in `inject' 838 | from (irb):10:in `each' 839 | from (irb):10:in `inject' 840 | from (irb):10 841 | from :0 842 | 843 | .................................................................................
844 | 845 | This error tells you virtually nothing about what has happened, except that 846 | somewhere in your giant dataset, there is an improperly formatted integer. 847 | Let's explore how to deal with situations like this by creating some data and 848 | introducing a few problems into it. 849 | 850 | When it comes to generating fake data for testing, it doesn't get much easier than the 851 | 'faker' gem. Here's how to create an array of hash records containing 852 | 5000 names, phone numbers, and payments: 853 | 854 | ................................................................................. 855 | 856 | >> data = 5000.times.map do 857 | ?> { name: Faker::Name.name, phone_number: Faker::PhoneNumber.phone_number, 858 | ?> payment: rand(10000).to_s } 859 | >> end 860 | 861 | >> data.length 862 | => 5000 863 | >> data[0..2] 864 | => [{:name=>"Joshuah Wyman", :phone_number=>"393-258-6420", :payment=>"6347"}, 865 | {:name=>"Kraig Jacobi", :phone_number=>"779-295-0532", :payment=>"9186"}, 866 | {:name=>"Jevon Harris", :phone_number=>"985.169.0519", :payment=>"213"}] 867 | 868 | ................................................................................. 869 | 870 | Now we can randomly corrupt a handful of records to give us a basis for our 871 | example. Keep in mind that the purpose of this demonstration is to show how to 872 | respond to unanticipated problems, rather than to a known issue with your data. 873 | 874 | ................................................................................. 875 | 876 | 5.times { data[rand(data.length)][:payment] << ".25" } 877 | 878 | ................................................................................. 879 | 880 | Now if we ask a simple question, such as which records have a payment over 1000, 881 | we get our familiar and useless error: 882 | 883 | .................................................................................
884 | 885 | >> data.select { |e| Integer(e[:payment]) > 1000 } 886 | ArgumentError: invalid value for Integer: "1991.25" 887 | 888 | ................................................................................. 889 | 890 | At this point, we'd like to get some more information about where this problem 891 | is actually located in our data, and what the individual record looks like. 892 | Because we presumably have no idea how many of these records there are, we might 893 | start by rescuing a single failure and then re-raising the error after printing 894 | some of this data to the screen. We'll use a +begin .. rescue+ construct here as 895 | well as +Enumerator#with_index+. 896 | 897 | ................................................................................. 898 | 899 | >> data.select.with_index do |e,i| 900 | ?> begin 901 | ?> Integer(e[:payment]) > 1000 902 | >> rescue ArgumentError 903 | >> p [e,i] 904 | >> raise 905 | >> end 906 | >> end 907 | [{:name=>"Mr. Clotilde Baumbach", :phone_number=>"(608)779-7942", :payment=>"1991.25"}, 91] 908 | ArgumentError: invalid value for Integer: "1991.25" 909 | from (irb):67:in `Integer' 910 | from (irb):67:in `block in irb_binding' 911 | from (irb):65:in `select' 912 | from (irb):65:in `with_index' 913 | from (irb):65 914 | from /Users/sandal/lib/ruby19_1/bin/irb:12:in `<main>'
915 | 916 | ................................................................................. 917 | 918 | So now we've pinpointed where the problem is coming from, and we know what the 919 | actual record looks like. Aside from the payment being a string representation 920 | of a `Float` instead of an `Integer`, it's not immediately clear that there is 921 | anything else wrong with this record. If we drop the line that re-raises the 922 | error, we can get a full report of records with this issue: 923 | 924 | ................................................................................. 925 | 926 | >> data.select.with_index do |e,i| 927 | ?> begin 928 | ?> Integer(e[:payment]) > 1000 929 | >> rescue ArgumentError 930 | >> p [e,i] 931 | >> end 932 | >> end; nil 933 | [{:name=>"Mr. Clotilde Baumbach", :phone_number=>"(608)779-7942", :payment=>"1991.25"}, 91] 934 | [{:name=>"Oceane Cormier", :phone_number=>"658.016.1612", :payment=>"7361.25"}, 766] 935 | [{:name=>"Imogene Bergnaum", :phone_number=>"(573)402-6508", :payment=>"1073.25"}, 1368] 936 | [{:name=>"Jeramy Prohaska", :phone_number=>"928.266.5508 x97173", :payment=>"6109.25"}, 2398] 937 | [{:name=>"Betty Gerhold", :phone_number=>"250-149-3161", :payment=>"8668.25"}, 2399] 938 | => nil 939 | 940 | ................................................................................. 941 | 942 | As you can see, this turned up every record with this problem. Based on this 943 | information, we could decide how to fix the 944 | data. But because we're just interested 945 | in the process here, the actual solution doesn't matter that much. Instead, 946 | the real point to remember is that when faced with an opaque error after 947 | iterating across a large dataset, you can go back and temporarily rework your code 948 | to isolate and inspect the problematic records.
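If you want to keep the problem records around for further analysis instead of just printing them, one option is to collect them into an array as you go. The following is a minimal sketch of that variation, not code from the session above: +find_bad_records+ is a hypothetical helper, and the four-record +data+ array is made-up sample data standing in for the Faker-generated set. It also rescues +TypeError+, since +Integer(nil)+ raises that rather than +ArgumentError+.

```ruby
# Hypothetical helper (not part of the original session): scan every
# record and collect the index and contents of each one whose field
# can't be parsed by Integer(), instead of stopping at the first failure.
def find_bad_records(records, field)
  bad = []
  records.each_with_index do |record, i|
    begin
      Integer(record[field])
    rescue ArgumentError, TypeError # TypeError covers nil values
      bad << [i, record]
    end
  end
  bad
end

# Made-up sample data standing in for the Faker-generated records.
data = [
  { name: "A", payment: "6347" },
  { name: "B", payment: "1991.25" },
  { name: "C", payment: "213" },
  { name: "D", payment: "7361.25" }
]

bad_records = find_bad_records(data, :payment)
bad_records.each { |i, record| puts "#{i}: #{record.inspect}" }
```

Because the index/record pairs are returned rather than only printed, you can keep poking at them in +irb+ afterwards, for example by mapping over them to see whether every bad value has the same shape.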
949 | 950 | We'll see variants on this theme later in the chapter, but for now, let's 951 | recap what to remember when you are looking at your code under the microscope: 952 | 953 | * Don't rely on giant, ugly inspect statements if you can avoid it. Instead, 954 | use introspection to narrow your search down to the specific relevant 955 | objects. 956 | 957 | * Writing your own `#inspect` method allows customized output from +Kernel#p+ and 958 | within irb. However, this means you are responsible for adding new state to 959 | the debugging output as your objects evolve. 960 | 961 | * YAML provides a nice +Kernel#y+ method that gives you a structured, easy-to- 962 | read representation of Ruby objects. This is also useful for spotting 963 | accidental reference duplication bugs. 964 | 965 | * Sometimes stack traces aren't enough. You can +rescue+ and then re-raise an 966 | error after printing some debugging output to help you find the root cause of 967 | your problems. 968 | 969 | So far, we've talked about solutions that work well as part of the active 970 | debugging process. However, in many cases it is also important to passively 971 | collect error feedback to deal with later. Ruby's standard logging library is well 972 | suited to this, so let's shift gears a bit and talk about it. 973 | 974 | === Working With Logger === 975 | 976 | I'm not generally a big fan of log files. I much prefer the immediacy of seeing 977 | problems directly reported on the screen as soon as they happen. If possible, I 978 | actually want to be thrown directly into my problematic code so I can take a look 979 | around. However, this isn't always an option, and in certain cases, 980 | having an audit trail in the form of log files is as good as it's going to get. 981 | 982 | Ruby's standard 'logger' library is fairly full-featured, allowing you to log 983 | many different kinds of messages and filter them based on their severity.
The 984 | API is reasonably well documented, so I won't spend much time going 985 | over a feature-by-feature summary of what this library offers here. Instead, 986 | I'll show you how to replicate a bit of functionality that is especially common 987 | in Ruby's web frameworks: comprehensive error logging. 988 | 989 | If I pull up a log from one of my Rails applications, I can easily show what I'm 990 | talking about. The following is just a small section of a log file, in which a 991 | full request and the error it ran into have been recorded: 992 | 993 | ................................................................................. 994 | 995 | Processing ManagerController#call_in_sheet (for 127.0.0.1 at 2009-02-13 16:38:42) [POST] 996 | Session ID: BAh7CCIJdXNlcmkiOg5yZXR1cm5fdG8wIgpmbGFzaElDOidBY3Rpb25Db250%0Acm9sbGVyOj 997 | pGbGFzaDo6Rmxhc2hIYXNoewAGOgpAdXNlZHsA--2f1d03dee418f4c9751925da421ae4730f9b55dd 998 | Parameters: {"period"=>"01/19/2009", "commit"=>"Select", "action"=>"call_in_sheet", 999 | "controller"=>"manager"} 1000 | 1001 | 1002 | NameError (undefined local variable or method `lunch' for #): 1003 | /lib/reports.rb:368:in `employee_record' 1004 | /lib/reports.rb:306:in `to_grouping' 1005 | /lib/reports.rb:305:in `each' 1006 | /lib/reports.rb:305:in `to_grouping' 1007 | /usr/local/lib/ruby/gems/1.8/gems/ruport-1.4.0/lib/ruport/data/table.rb:169:in `initialize' 1008 | /usr/local/lib/ruby/gems/1.8/gems/ruport-1.4.0/lib/ruport/data/table.rb:809:in `new' 1009 | /usr/local/lib/ruby/gems/1.8/gems/ruport-1.4.0/lib/ruport/data/table.rb:809:in `Table' 1010 | /lib/reports.rb:304:in `to_grouping' 1011 | /lib/reports.rb:170:in `CallInAggregator' 1012 | /lib/reports.rb:129:in `setup' 1013 | /usr/local/lib/ruby/gems/1.8/gems/ruport-1.4.0/lib/ruport/renderer.rb:337:in `render' 1014 | /usr/local/lib/ruby/gems/1.8/gems/ruport-1.4.0/lib/ruport/renderer.rb:379:in `build' 1015 | /usr/local/lib/ruby/gems/1.8/gems/ruport-1.4.0/lib/ruport/renderer.rb:335:in
`render' 1016 | /usr/local/lib/ruby/gems/1.8/gems/ruport-1.4.0/lib/ruport/renderer.rb:451:in `method_missing' 1017 | /app/controllers/manager_controller.rb:111:in `call_in_sheet' 1018 | /app/controllers/application.rb:62:in `on' 1019 | /app/controllers/manager_controller.rb:110:in `call_in_sheet' 1020 | /vendor/rails/actionpack/lib/action_controller/base.rb:1104:in `send' 1021 | /vendor/rails/actionpack/lib/action_controller/base.rb:1104:in `perform_action_wit 1022 | 1023 | ................................................................................. 1024 | 1025 | While the production application would display a rather boring "We're sorry, 1026 | something went wrong" message upon hitting an error, our backend logs tell us 1027 | exactly which request triggered the error and when it occurred. They also give us 1028 | information about the actual request, to aid in debugging. Though this 1029 | particular bug is fairly boring, since it looks like a simple typo that 1030 | slipped through the cracks, logging each error that occurs along with its full 1031 | stack trace provides essentially the same information that you'd get if you were 1032 | running a script locally and ran into an error. 1033 | 1034 | While it's nice that some libraries and frameworks have logging built in, 1035 | sometimes we'll need to roll our own. To demonstrate this, we'll walk 1036 | through a small server built on +TCPServer+ that performs basic arithmetic operations in prefix 1037 | notation. We'll start by taking a look at it without any logging or error 1038 | handling support: 1039 | 1040 | .................................................................................
1041 | 1042 | require "socket" 1043 | 1044 | class Server 1045 | 1046 | def initialize 1047 | @server = TCPServer.new('localhost',port=3333) 1048 | end 1049 | 1050 | def *(x, y) 1051 | "#{Float(x) * Float(y)}" 1052 | end 1053 | 1054 | def /(x, y) 1055 | "#{Float(x) / Float(y)}" 1056 | end 1057 | 1058 | def handle_request(session) 1059 | action, *args = session.gets.split(/\s/) 1060 | if ["*", "/"].include?(action) 1061 | session.puts(send(action, *args)) 1062 | else 1063 | session.puts("Invalid command") 1064 | end 1065 | end 1066 | 1067 | def run 1068 | while session = @server.accept 1069 | handle_request(session) 1070 | end 1071 | end 1072 | end 1073 | Server.new.run 1074 | ................................................................................. 1075 | 1076 | We can use the following fairly generic client to interact with the server, 1077 | which is similar to the one we used in the "Designing Beautiful APIs" chapter. 1078 | 1079 | ................................................................................. 1080 | 1081 | require "socket" 1082 | 1083 | class Client 1084 | 1085 | def initialize(ip="localhost",port=3333) 1086 | @ip, @port = ip, port 1087 | end 1088 | 1089 | def send_message(msg) 1090 | socket = TCPSocket.new(@ip,@port) 1091 | socket.puts(msg) 1092 | response = socket.gets 1093 | socket.close 1094 | return response 1095 | end 1096 | 1097 | def receive_message 1098 | socket = TCPSocket.new(@ip,@port) 1099 | response = socket.read 1100 | socket.close 1101 | return response 1102 | end 1103 | 1104 | end 1105 | 1106 | ................................................................................. 1107 | 1108 | Without any error handling, we end up with something like this on the client 1109 | side: 1110 | 1111 | .................................................................................
1112 | 1113 | client = Client.new 1114 | 1115 | response = client.send_message("* 5 10") 1116 | puts response 1117 | 1118 | response = client.send_message("/ 4 3") 1119 | puts response 1120 | 1121 | response = client.send_message("/ 3 foo") 1122 | puts response 1123 | 1124 | response = client.send_message("* 5 7.2") 1125 | puts response 1126 | 1127 | ## OUTPUTS ## 1128 | 1129 | 50.0 1130 | 1.33333333333333 1131 | nil 1132 | client.rb:8:in `initialize': Connection refused - connect(2) (Errno::ECONNREFUSED) 1133 | from client.rb:8:in `new' 1134 | from client.rb:8:in `send_message' 1135 | from client.rb:35 1136 | 1137 | ................................................................................. 1138 | 1139 | When we send the erroneous third message, the server crashes before it can respond, so 1140 | the client's call to +gets+ returns nil. But when we try to send a fourth message, which would 1141 | ordinarily be valid, we see that our connection is refused. If we take a look at the 1142 | server side, we see that a single uncaught exception caused it to crash immediately: 1143 | 1144 | ................................................................................. 1145 | 1146 | server_logging_initial.rb:15:in `Float': invalid value for Float(): "foo" (ArgumentError) 1147 | from server_logging_initial.rb:15:in `/' 1148 | from server_logging_initial.rb:20:in `send' 1149 | from server_logging_initial.rb:20:in `handle_request' 1150 | from server_logging_initial.rb:25:in `run' 1151 | from server_logging_initial.rb:31 1152 | 1153 | ................................................................................. 1154 | 1155 | While this does give us a sense of what happened, it doesn't give us much 1156 | insight into when and why. It also seems a bit fragile to have a whole 1157 | server come crashing down on account of a single bad request. With a little 1158 | more effort, we can add logging and error handling and make things behave much 1159 | better.
1160 | 1161 | ................................................................................. 1162 | 1163 | require "socket" 1164 | require "logger" 1165 | 1166 | class StandardError 1167 | def report 1168 | %{#{self.class}: #{message}\n#{backtrace.join("\n")}} 1169 | end 1170 | end 1171 | 1172 | class Server 1173 | 1174 | def initialize(logger) 1175 | @logger = logger 1176 | @server = TCPServer.new('localhost',port=3333) 1177 | end 1178 | 1179 | def *(x, y) 1180 | "#{Float(x) * Float(y)}" 1181 | end 1182 | 1183 | def /(x, y) 1184 | "#{Float(x) / Float(y)}" 1185 | end 1186 | 1187 | def handle_request(session) 1188 | action, *args = session.gets.split(/\s/) 1189 | if ["*", "/"].include?(action) 1190 | @logger.info "executing: '#{action}' with #{args.inspect}" 1191 | session.puts(send(action, *args)) 1192 | else 1193 | session.puts("Invalid command") 1194 | end 1195 | rescue StandardError => e 1196 | @logger.error(e.report) 1197 | session.puts "Sorry, something went wrong." 1198 | end 1199 | 1200 | def run 1201 | while session = @server.accept 1202 | handle_request(session) 1203 | end 1204 | end 1205 | end 1206 | 1207 | begin 1208 | logger = Logger.new("development.log") 1209 | host = Server.new(logger) 1210 | 1211 | host.run 1212 | rescue StandardError => e 1213 | logger.fatal(e.report) 1214 | puts "Something seriously bad just happened, exiting" 1215 | end 1216 | 1217 | ................................................................................. 1218 | 1219 | We'll go over the details in just a minute, but first, let's take a look at the 1220 | output on the client side running the identical code from earlier: 1221 | 1222 | ................................................................................. 
1223 | 1224 | client = Client.new 1225 | 1226 | response = client.send_message("* 5 10") 1227 | puts response 1228 | 1229 | response = client.send_message("/ 4 3") 1230 | puts response 1231 | 1232 | response = client.send_message("/ 3 foo") 1233 | puts response 1234 | 1235 | response = client.send_message("* 5 7.2") 1236 | puts response 1237 | 1238 | ## OUTPUTS ## 1239 | 1240 | 50.0 1241 | 1.33333333333333 1242 | Sorry, something went wrong. 1243 | 36.0 1244 | 1245 | ................................................................................. 1246 | 1247 | We see that the third message triggers an error and an apology is promptly 1248 | sent to the client. But the interesting bit is that the fourth request 1249 | is handled normally, indicating that the server did not crash this time 1250 | around. 1251 | 1252 | Of course, if we swallowed all errors and just returned "We're sorry" every time 1253 | something went wrong without creating a proper paper trail for debugging, that'd 1254 | be a terrible idea. Upon inspecting the server logs, we can see that we haven't 1255 | forgotten to keep ourselves covered: 1256 | 1257 | .................................................................................
1258 | 1259 | # Logfile created on Sat Feb 21 07:07:49 -0500 2009 by / 1260 | I, [2009-02-21T07:08:54.335294 #39662] INFO -- : executing: '*' with ["5", "10"] 1261 | I, [2009-02-21T07:08:54.335797 #39662] INFO -- : executing: '/' with ["4", "3"] 1262 | I, [2009-02-21T07:08:54.336163 #39662] INFO -- : executing: '/' with ["3", "foo"] 1263 | E, [2009-02-21T07:08:54.336243 #39662] ERROR -- : ArgumentError: invalid value for Float(): "foo" 1264 | server_logging.rb:22:in `Float' 1265 | server_logging.rb:22:in `/' 1266 | server_logging.rb:28:in `send' 1267 | server_logging.rb:28:in `handle_request' 1268 | server_logging.rb:36:in `run' 1269 | server_logging.rb:45 1270 | I, [2009-02-21T07:08:54.336573 #39662] INFO -- : executing: '*' with ["5", "7.2"] 1271 | 1272 | ................................................................................. 1273 | 1274 | Here we see two different levels of logging going on, INFO and ERROR. The 1275 | purpose of our INFO logs is simply to document requests as parsed by our 1276 | server. This is to ensure that the messages and their parameters are being 1277 | processed as we expect them to be. Our ERROR logs document the actual errors we 1278 | run into while processing things, and you can see in this example that the stack 1279 | trace written to the log file is nearly identical to the one that was produced 1280 | when our more fragile version of the server crashed. 1281 | 1282 | Although the format is a little different from the Rails log, this log provides us 1283 | with everything we need for debugging: the date and time of the issue, a record 1284 | of the actual request, and a trace that shows where the error originated. Now 1285 | that we've seen it in action, let's take a look at how it all comes together. 1286 | 1287 | We'll start with the small extension to +StandardError+: 1288 | 1289 | .................................................................................
1290 | 1291 | class StandardError 1292 | def report 1293 | %{#{self.class}: #{message}\n#{backtrace.join("\n")}} 1294 | end 1295 | end 1296 | 1297 | ................................................................................. 1298 | 1299 | This convenience method allows us to produce error reports that look similar to 1300 | the ones you'll find on the command line when an exception is raised. Although 1301 | +StandardError+ objects carry all of this information, they do not have a 1302 | single public method that formats it the way Ruby does, so we 1303 | need to assemble it ourselves. 1304 | 1305 | We can see how this error report is used in the main +handle_request+ method. 1306 | Notice that the server is passed a +Logger+ instance, which is stored as +@logger+ and used 1307 | in the following code: 1308 | 1309 | ................................................................................. 1310 | 1311 | def handle_request(session) 1312 | action, *args = session.gets.split(/\s/) 1313 | if ["*", "/"].include?(action) 1314 | @logger.info "executing: '#{action}' with #{args.inspect}" 1315 | session.puts(send(action, *args)) 1316 | else 1317 | session.puts("Invalid command") 1318 | end 1319 | rescue StandardError => e 1320 | @logger.error(e.report) 1321 | session.puts "Sorry, something went wrong." 1322 | end 1323 | 1324 | ................................................................................. 1325 | 1326 | Here, we see where the messages in our log file came from. Before the 1327 | server attempts to execute a command, it records what it has parsed 1328 | using +@logger.info+. Then, it attempts to send the message along with its 1329 | parameters to the object itself, printing its return value to the client end of 1330 | the socket. If this fails for any reason, the relevant error is captured into 1331 | +e+ through +rescue+.
This catches +StandardError+ and all of its descendants, which 1332 | covers most errors raised by application code (though not exceptions such as +Interrupt+ or +SystemExit+, which live outside that hierarchy). Once it is captured, we 1333 | use the custom +StandardError#report+ extension to generate an error report 1334 | string, which is then logged as an error in the logfile. The apology is sent 1335 | along to the client, thus completing the cycle. 1336 | 1337 | 1338 | While that covers what we've seen in the logfile so far, there is an additional 1339 | error handling measure in this application. We see it in the code that 1340 | actually gets everything up and running: 1341 | 1342 | ................................................................................. 1343 | 1344 | 1345 | begin 1346 | logger = Logger.new("development.log") 1347 | host = Server.new(logger) 1348 | 1349 | host.run 1350 | rescue StandardError => e 1351 | logger.fatal(e.report) 1352 | puts "Something seriously bad just happened, exiting" 1353 | end 1354 | 1355 | ................................................................................. 1356 | 1357 | Although our response handling code is pretty well insulated from errors, we 1358 | still want our logfile to record any server crashes that may happen. Rather 1359 | than using ERROR as our designation, we instead use FATAL, indicating that our 1360 | server has no intention of recovering from errors that bubble up to this level. 1361 | I'll leave it up to the reader to figure out how to crash the server once it is 1362 | running, but this handler would also write to the log things such as 1363 | misspelled variable and method names, or other bugs within the +Server+ class. 1364 | To illustrate this, replace the +run+ method with the following code: 1365 | 1366 | .................................................................................
1367 | 1368 | def run 1369 | while session = @server.accept 1370 | handle_request(sessions) 1371 | end 1372 | end 1373 | 1374 | ................................................................................. 1375 | 1376 | You'll end up crashing the server and producing the following log message: 1377 | 1378 | ................................................................................. 1379 | 1380 | F, [2009-02-21T07:39:40.592569 #39789] FATAL -- : NameError: undefined local 1381 | variable or method `sessions' for # 1382 | server_logging.rb:36:in `run' 1383 | server_logging.rb:45 1384 | 1385 | ................................................................................. 1386 | 1387 | Among other things, this can be helpful if you're deploying code to a remote host and have some code that 1388 | runs locally but not remotely. 1389 | 1390 | Although we have not covered +Logger+ in depth by any means, we've walked 1391 | through an example that can serve as a template for more general needs. 1392 | Logging makes the most sense when you don't have easy, immediate 1393 | access to the running code, and it can be overkill elsewhere. If you're considering 1394 | adding logging code to your applications, there are a few things to keep in 1395 | mind: 1396 | 1397 | * Error logging is essential for long-running server processes, where you may 1398 | not be watching the application moment by moment. 1399 | 1400 | * If you are working in a multiprocess environment, be sure to use a 1401 | separate log file for each process, as otherwise their output may clash. 1402 | 1403 | * `Logger` is powerful and includes many features not covered here, including 1404 | built-in logfile rotation. 1405 | 1406 | * See the template for `StandardError#report` if you want your logs to include error 1407 | reports that look similar to the ones Ruby generates on the 1408 | command line.
1409 | 1410 | * When it comes to logging error messages, FATAL should represent a bug your 1411 | code has no intention of recovering from, whereas ERROR is more open-ended. 1412 | 1413 | Depending on the kind of work you do, you may end up using +Logger+ every day or 1414 | not at all. If it's the former, be sure to check out the API 1415 | documentation for the many features not covered here. 1416 | 1417 | And with that, we've reached the end of another chapter. I'll just wrap up with 1418 | some closing remarks, and then we can move on to more upbeat topics. 1419 | 1420 | === Conclusions === 1421 | 1422 | Dealing with defective code is something we all need to do from time to time. 1423 | If we approach these issues in a relatively disciplined way, we can methodically 1424 | corner and squash pretty much any bug that can be imagined. Debugging Ruby 1425 | code tends to be a fluid process, starting with a good specification of how things 1426 | should actually work, and then exercising various investigative tactics until a fix 1427 | can be found. We don't necessarily need a debugger to track down issues in our code, 1428 | but we do need to use Ruby's introspective features as much as possible, since they 1429 | have the power to reveal exactly what is going on under the hood. 1430 | 1431 | Once you settle into a comfortable workflow for resolving issues in your Ruby 1432 | code, the process becomes more and more straightforward. If you find yourself lost while 1433 | hunting down some bug, take the time to slow down and use the strategies 1434 | we've gone over in this chapter. Once you get the hang of them, the tighter 1435 | feedback loop will kick in and make your job much easier.
--------------------------------------------------------------------------------
/oreilly_final/figs/rubp_0701.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practicingruby/rbp-book/cfb6a56c3787a68e9c4245055c47aa605b31d7ad/oreilly_final/figs/rubp_0701.png
--------------------------------------------------------------------------------
/oreilly_final/figs/rubp_0702.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practicingruby/rbp-book/cfb6a56c3787a68e9c4245055c47aa605b31d7ad/oreilly_final/figs/rubp_0702.png
--------------------------------------------------------------------------------
/oreilly_final/figs/rubp_0703.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practicingruby/rbp-book/cfb6a56c3787a68e9c4245055c47aa605b31d7ad/oreilly_final/figs/rubp_0703.png
--------------------------------------------------------------------------------
/oreilly_final/figs/rubp_0704.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practicingruby/rbp-book/cfb6a56c3787a68e9c4245055c47aa605b31d7ad/oreilly_final/figs/rubp_0704.png
--------------------------------------------------------------------------------
/oreilly_final/figs/rubp_0705.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practicingruby/rbp-book/cfb6a56c3787a68e9c4245055c47aa605b31d7ad/oreilly_final/figs/rubp_0705.png
--------------------------------------------------------------------------------
/oreilly_final/figs/rubp_0706.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practicingruby/rbp-book/cfb6a56c3787a68e9c4245055c47aa605b31d7ad/oreilly_final/figs/rubp_0706.png
--------------------------------------------------------------------------------
/oreilly_final/figs/rubp_0801.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practicingruby/rbp-book/cfb6a56c3787a68e9c4245055c47aa605b31d7ad/oreilly_final/figs/rubp_0801.png
--------------------------------------------------------------------------------
/oreilly_final/figs/rubp_0802.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practicingruby/rbp-book/cfb6a56c3787a68e9c4245055c47aa605b31d7ad/oreilly_final/figs/rubp_0802.png
--------------------------------------------------------------------------------
/oreilly_final/figs/rubp_0803.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practicingruby/rbp-book/cfb6a56c3787a68e9c4245055c47aa605b31d7ad/oreilly_final/figs/rubp_0803.png
--------------------------------------------------------------------------------
/oreilly_final/figs/rubp_0804.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practicingruby/rbp-book/cfb6a56c3787a68e9c4245055c47aa605b31d7ad/oreilly_final/figs/rubp_0804.png
--------------------------------------------------------------------------------
/oreilly_final/figs/rubp_ab01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practicingruby/rbp-book/cfb6a56c3787a68e9c4245055c47aa605b31d7ad/oreilly_final/figs/rubp_ab01.png
--------------------------------------------------------------------------------