├── README
├── manuscript
│   └── unmerged
│       ├── api.txt
│       ├── compatibility.txt
│       ├── cultural_barriers.txt
│       ├── dynamic_toolkit.txt
│       ├── fp.txt
│       ├── io.txt
│       ├── project_maintenance.txt
│       ├── ruby_worst_practices.txt
│       ├── stdlib.txt
│       ├── testing.txt
│       └── things_go_wrong.txt
└── oreilly_final
    ├── book.xml
    └── figs
        ├── rubp_0701.png
        ├── rubp_0702.png
        ├── rubp_0703.png
        ├── rubp_0704.png
        ├── rubp_0705.png
        ├── rubp_0706.png
        ├── rubp_0801.png
        ├── rubp_0802.png
        ├── rubp_0803.png
        ├── rubp_0804.png
        └── rubp_ab01.png


/README:
--------------------------------------------------------------------------------
 1 | Welcome to the open source home of the "Ruby Best Practices" book.
 2 | 
 3 | Here you'll find the original manuscript along with the production files that
 4 | were used to generate the print version of the book.
 5 | 
 6 | If instead you were looking for a free PDF download of the book, you
 7 | can find it here:
 8 | 
 9 | http://sandal.github.com/rbp-book/pdfs/rbp_1-0.pdf
10 | 
11 | Or, if you wanted to kill trees and give me some money:
12 | 
13 | http://oreilly.com/catalog/9780596523015/
14 | http://www.amazon.com/gp/product/0596523009/
15 | 
16 | But assuming you are here for the source, check the brief description below.
17 | 
18 | == Files
19 | 
20 | manuscript/unmerged contains asciidoc sources that have not been updated to
21 | reflect copyediting.  When I get around to it, manuscript/updated will contain
22 | the updated files.   Once a file is updated, I will accept patches against it
23 | for fixes and modifications.
24 | 
25 | oreilly_final/ contains the production files that were used to generate this
26 | book.  Right now it's a bit limited, just one giant docbook file and some figs.
27 | We may be able to break it down by chapter later, but we may not necessarily
28 | need it.
29 | 
30 | If you are wondering about code samples, they are currently at:
31 | http://github.com/sandal/rbp
32 | 
33 | I plan to merge them here sooner or later, and extract more from the original
34 | manuscript.  I sort of got lazy there.
35 | 
36 | == Contributing / Using Content
37 | 
38 | Right now, I need to go through the painstaking process of merging copyeditor
39 | changes into my asciidoc manuscript, and then set up the build toolchain again in
40 | a way that's easy enough for contributors to access.
41 | 
42 | But for those who wish to fork and experiment on their own, all content here is
43 | hereby released under the Creative Commons Attribution-Noncommercial-Share Alike
44 | 3.0 license ( http://creativecommons.org/licenses/by-nc-sa/3.0/ ).  
45 | 
46 | If you have any questions about legal usage, contact me, and I'll
47 | talk with O'Reilly.
48 | 
49 |   gregory.t.brown at gmail.com
50 | 
51 | 
52 | 


--------------------------------------------------------------------------------
/manuscript/unmerged/compatibility.txt:
--------------------------------------------------------------------------------
  1 | Appendix A: Writing Backwards Compatible Code
  2 | ---------------------------------------------
  3 | 
  4 | Not everyone has the luxury of using the latest and greatest tools available.
  5 | Though Ruby 1.9 may be gaining ground among developers, much legacy code still
  6 | runs on Ruby 1.8.  Many folks have a responsibility to keep their code 
  7 | running on Ruby 1.8, whether it is in house, open source, or a commercial
  8 | application.  This chapter will show you how to maintain backwards 
  9 | compatibility with Ruby 1.8.6 without preventing your code from running 
 10 | smoothly on Ruby 1.9.1.
 11 | 
 12 | I am assuming here that you are back-porting code to Ruby 1.8, but this may
 13 | also serve as a helpful guide as to how to upgrade your projects to 1.9.1.
 14 | That task is somewhat more complicated however, so your mileage may vary.
 15 | 
 16 | The earlier you start considering backwards compatibility in your project,
 17 | the easier it will be to make things run smoothly.  I'll start by showing
 18 | you how to keep your compatibility code manageable from the start, and then 
 19 | go on to describe some of the issues you may run into when supporting Ruby 
 20 | 1.8 and 1.9 side by side.
 21 | 
 22 | Please note that when I mention 1.8 and 1.9 without further qualifications, I'm
 23 | talking about Ruby 1.8.6 and its compatible implementations, and respectively,
 24 | Ruby 1.9.1 and its compatible implementations.  We have skipped Ruby 1.8.7 and
 25 | Ruby 1.9.0 because both are transitional bridges between 1.8.6 and 1.9.1 and
 26 | aren't truly compatible with either.
 27 | 
 28 | Another thing to keep in mind is that this is definitely not intended to be 
 29 | a comprehensive guide to the differences between the versions of Ruby.  Please
 30 | consult your favorite reference after reviewing the tips you read here.
 31 | 
 32 | But now that you have been sufficiently warned, we can move on to talking 
 33 | about how to keep things clean.
 34 | 
 35 | === Avoiding a Mess ===
 36 | 
 37 | It is very tempting to run your test suite on one version of Ruby, check to
 38 | make sure everything passes, then run it on the other version you want to
 39 | support and see what breaks.  After seeing failures, it might seem 
 40 | easy enough to just drop in code such as the following to make things go
 41 | green again:
 42 | 
 43 | ----------------------------------------------------------------------------
 44 | 
 45 | def my_method(string)
 46 |   lines = if RUBY_VERSION < "1.9"
 47 |     string.to_a
 48 |   else
 49 |     string.lines
 50 |   end
 51 |   do_something_with(lines)
 52 | end
 53 | 
 54 | ----------------------------------------------------------------------------
 55 |   
 56 | Resist this temptation!  If you aren't careful, this will result in a giant
 57 | mess that will be difficult to refactor, and will make your code less 
 58 | readable.  Instead, we can approach this in a more organized fashion.
 59 | 
 60 | ==== Selective Backporting ====
 61 | 
 62 | Before duplicating any effort, it's important to check and see if there is
 63 | a reasonable way to write your code in another way that will allow it to
 64 | run on both Ruby 1.8 and 1.9 natively.  Even if this means writing code that's
 65 | a little more verbose, it's generally worth the effort as it prevents the
 66 | codebase from diverging.
 67 | 
 68 | If this fails however, it may make sense to simply back-port the feature you
 69 | need to Ruby 1.8.  Because of Ruby's open classes, this is easy to do.  We can
 70 | even loosen up our changes so that they check for particular features rather than a
 71 | specific version number, to improve our compatibility with other applications and
 72 | Ruby implementations:
 73 | 
 74 | ----------------------------------------------------------------------------
 75 | 
 76 | class String
 77 |   unless "".respond_to?(:lines)
 78 |     alias_method :lines, :to_a
 79 |   end
 80 | end
 81 | 
 82 | ----------------------------------------------------------------------------
 83 | 
 84 | Doing this will allow you to re-write your method so that it looks more 
 85 | natural:
 86 | 
 87 | ----------------------------------------------------------------------------
 88 | 
 89 | def my_method(string)
 90 |   do_something_with(string.lines)
 91 | end
 92 | 
 93 | ----------------------------------------------------------------------------
 94 | 
 95 | Although this implementation isn't exact, it is good enough for our needs
 96 | and will work as expected in most cases.  However, if we wanted to be 
 97 | pedantic, we'd be sure to return an Enumerator instead of an Array:
 98 | 
 99 | ----------------------------------------------------------------------------
100 | 
101 | class String
102 |   unless "".respond_to?(:lines)
103 |     require "enumerator"
104 | 
105 |     def lines
106 |       to_a.enum_for(:each)
107 |     end
108 |   end
109 | end
110 | 
111 | ----------------------------------------------------------------------------
112 | 
113 | If you aren't redistributing your code, passing tests in your application
114 | and code that works as expected are a good enough indication that your 
115 | backwards compatibility patches are working.  However, in code that you plan
116 | to distribute, open source or otherwise, you need to be prepared to make
117 | things more robust when necessary.  Any time you distribute code that
118 | modifies core Ruby, you have an implicit responsibility of not breaking third 
119 | party libraries or application code, so be sure to keep this in mind and
120 | clearly document exactly what you have changed.
121 | 
122 | In Prawn, we use a single file, 'prawn/compatibility.rb', to store all the core
123 | extensions used in the library that support backwards compatibility. This
124 | helps make it easier for users to track down all the changes made by the
125 | library, which can help make subtle bugs that can arise from version 
126 | incompatibilities easier to spot.
127 | 
128 | In general, this approach is a fairly solid way to keep your application code
129 | clean while supporting both Ruby 1.8 and 1.9.  However, you should only use
130 | it to add functionality to Ruby 1.8.6 that is present in 1.9.1 but missing
131 | from 1.8.6, and not to modify existing behavior.  Adding methods that don't
132 | exist in a given version of Ruby is a relatively low risk procedure, whereas
133 | changing core functionality is a far more controversial practice.
134 | 
135 | ==== Version-specific Codeblocks ====
136 | 
137 | If you run into a situation where you really need two different approaches
138 | between the two major versions of Ruby, you can use a trick to make this
139 | a bit more attractive in your code.
140 | 
141 | ----------------------------------------------------------------------------
142 | 
143 | if RUBY_VERSION < "1.9" 
144 |   def ruby_18
145 |     yield
146 |   end
147 | 
148 |   def ruby_19 
149 |     false
150 |   end
151 | else
152 |   def ruby_18
153 |     false
154 |   end
155 |   
156 |   def ruby_19
157 |     yield
158 |   end   
159 | end
160 | 
161 | ----------------------------------------------------------------------------
162 | 
163 | Here's an example of how you'd make use of these methods:
164 | 
165 | ----------------------------------------------------------------------------
166 | 
167 | def open_file(file)
168 |   ruby_18 { File.open(file, "r") } ||
169 |     ruby_19 { File.open(file, "r:UTF-8") }
170 | end
171 | 
172 | ----------------------------------------------------------------------------
173 |   
174 | Of course, since this approach creates a divergent codebase, it should be
175 | used as sparingly as possible.  However, since this looks a little nicer
176 | than a conditional statement and provides a centralized place for changes to
177 | minor version numbers if needed, it is a nice way to go when it is actually
178 | necessary.
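
One caveat about the `RUBY_VERSION < "1.9"` check: it compares strings, which works for the 1.8 and 1.9 series but would misorder a version such as "1.10". Below is a minimal sketch of a numeric comparison. The helper names `version_before?` and `ruby_version_before?` are hypothetical and are not part of Ruby or Prawn.

```ruby
# Hypothetical helper: compare version strings segment by segment as
# integers rather than as raw strings.
def version_before?(current, target)
  to_segments = lambda { |v| v.split(".").map { |s| s.to_i } }
  (to_segments.call(current) <=> to_segments.call(target)) < 0
end

# Hypothetical convenience wrapper around the running interpreter's version.
def ruby_version_before?(target)
  version_before?(RUBY_VERSION, target)
end

version_before?("1.8.6", "1.9")   # numeric comparison: true
version_before?("1.10.0", "1.9")  # numeric comparison: false
"1.10.0" < "1.9"                  # string comparison: true, which is misleading
```

If the version check lives only inside the definitions of `ruby_18` and `ruby_19`, a change like this touches a single file.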
179 | 
180 | ==== Compatibility shims for common operations ====
181 | 
182 | When you need to accomplish the same thing in two different ways, you can
183 | also consider adding a method to both versions of Ruby.  Although Ruby 1.9.1
184 | shipped with `File.binread()`, this method did not exist in the earlier 
185 | developmental versions of Ruby 1.9.
186 | 
187 | Although a handful of +ruby_18+ and +ruby_19+ calls here and there aren't that bad,
188 | the need to open binary files was pervasive in Prawn, and it got tiring to see
189 | the following code popping up everywhere this feature was needed:
190 | 
191 | ----------------------------------------------------------------------------
192 | 
193 | ruby_18 { File.open("foo.jpg", "rb") } ||
194 |   ruby_19 { File.open("foo.jpg", "rb:BINARY") }
195 | 
196 | ----------------------------------------------------------------------------
197 | 
198 | To simplify things, we put together a simple `File.read_binary` method that
199 | worked on both Ruby 1.8 and 1.9.  You can see this is nothing particularly
200 | exciting or surprising:
201 | 
202 | ----------------------------------------------------------------------------
203 | 
204 | if RUBY_VERSION < "1.9"
205 | 
206 |   class File  
207 |     def self.read_binary(file) 
208 |       File.open(file,"rb") { |f| f.read } 
209 |     end
210 |   end
211 | 
212 | else
213 | 
214 |   class File  
215 |     def self.read_binary(file) 
216 |       File.open(file,"rb:BINARY") { |f| f.read } 
217 |     end
218 |   end  
219 | 
220 | end 
221 |   
222 | ----------------------------------------------------------------------------
223 | 
224 | This cleaned up the rest of our code greatly, and reduced the number of
225 | version checks significantly.  Of course, when `File.binread()` came along in
226 | Ruby 1.9.1, we went and used the techniques discussed earlier to back-port it
227 | to 1.8.6, but before that, this represented a nice way to attack the 
228 | same problem in two different ways.
229 | 
230 | Now that we've discussed all the relevant techniques, I can show you
231 | what prawn/compatibility.rb looks like.  This file allows Prawn to run on
232 | both major versions of Ruby without any issues, and as you can see, is quite
233 | compact:
234 | 
235 | ----------------------------------------------------------------------------
236 | 
237 | class String  #:nodoc:
238 |   unless "".respond_to?(:lines)
239 |     alias_method :lines, :to_a
240 |   end
241 | end
242 | 
243 | unless File.respond_to?(:binread)
244 |   def File.binread(file) 
245 |     File.open(file,"rb") { |f| f.read } 
246 |   end
247 | end
248 | 
249 | if RUBY_VERSION < "1.9"
250 |   
251 |   def ruby_18 
252 |     yield
253 |   end
254 |   
255 |   def ruby_19
256 |     false
257 |   end
258 |      
259 | else  
260 |  
261 |   def ruby_18 
262 |     false  
263 |   end
264 |   
265 |   def ruby_19 
266 |     yield
267 |   end 
268 |   
269 | end 
270 | 
271 | ----------------------------------------------------------------------------
272 | 
273 | This code leaves Ruby 1.9.1 virtually untouched, and only adds a couple of 
274 | simple features to Ruby 1.8.6.  These small modifications enable Prawn to have 
275 | cross-compatibility between versions of Ruby without polluting its
276 | codebase with copious version checks and workarounds.  Of course, there are
277 | a few areas that needed extra attention, and we'll talk about the sorts of issues to
278 | look out for in just a moment, but for the most part, this little 
279 | compatibility file gets the job done.
280 | 
281 | Even if someone produced a Ruby 1.8 / 1.9 compatibility library that you 
282 | could include into your projects, it might still be advisable to copy only
283 | what you need from it.  The core philosophy here is that we want to do
284 | as much as we can to let each respective version of Ruby be what it is,
285 | to avoid confusing and painful debugging sessions.  By taking a minimalist
286 | approach and making it as easy as possible to locate your platform specific
287 | changes, we can help make things run more smoothly.
288 | 
289 | Before we move on to some more specific details on particular 
290 | incompatibilities and how to work around them, let's recap the key points
291 | of this section:
292 | 
293 |   * Try to support both Ruby 1.8 and 1.9 from the ground up.  However, be
294 |     sure to write your code against Ruby 1.9 first and then backport to 1.8
295 |     if you want to prevent yourself from writing too much legacy code.
296 |     
297 |   * Before writing any version specific code or modifying core Ruby, attempt
298 |     to find a way to write code that runs natively on both Ruby 1.8 and 1.9.
299 |     Even if the solution turns out to be less beautiful than usual, it's 
300 |     better to have code that works without introducing redundant 
301 |     implementations or modifications to core Ruby.
302 |     
303 |   * For features that don't have a straightforward solution that works on
304 |     both versions, consider back-porting the necessary functionality to
305 |     Ruby 1.8 by adding new methods to existing core classes.
306 |     
307 |   * If a feature is too complicated to backport or involves separate 
308 |     procedures across versions, consider adding a helper method that behaves
309 |     the same on both versions.
310 |     
311 |   * If you need to inline version checks, consider using the ruby_18 and
312 |     ruby_19 blocks shown in this chapter.  These centralize your version 
313 |     checking logic and provide room for refactoring and future extension.
314 |     
315 | With these thoughts in mind, let's talk about some incompatibilities you just 
316 | can't work around, and how to avoid them.
317 | 
318 | === Non-portable features in Ruby 1.9 ===
319 | 
320 | There are some features in Ruby 1.9 that you simply cannot backport to 1.8
321 | without modifying the interpreter itself.  Here we'll talk about just a few 
322 | of the more obvious ones, to serve as a reminder of what to avoid if you plan
323 | to have your code run on both versions of Ruby.   In no particular order,
324 | here's a fun list of things that'll cause a backport to grind to a halt if
325 | you're not careful.
326 | 
327 | ==== Pseudo-keyword Hash Syntax ====
328 | 
329 | Ruby 1.9 adds a cool feature that lets you write things like:
330 | 
331 | ...........................................................................
332 | 
333 | foo(a: 1, b: 2)
334 | 
335 | ...........................................................................
336 |   
337 | But on Ruby 1.8, we're stuck using the old key => value syntax:
338 | 
339 | ...........................................................................
340 | 
341 | foo(:a => 1, :b => 2)
342 | 
343 | ...........................................................................
344 |   
345 | ==== Multi-splat arguments ====
346 | 
347 | Ruby 1.9.1 offers a downright insane number of ways to process arguments to
348 | methods.  But even the simpler ones, such as multiple splats in an
349 | argument list, are not backwards compatible.  Here's an example of something
350 | you can do on Ruby 1.9 that you can't on Ruby 1.8, which is something to be
351 | avoided in backwards compatible code:
352 | 
353 | ...........................................................................
354 | 
355 | def add(a,b,c,d,e)
356 |   a + b + c + d + e
357 | end
358 | 
359 | add(*[1,2], 3, *[4,5]) #=> 15
360 | 
361 | ...........................................................................
362 | 
363 | The closest thing we can get to this on Ruby 1.8 would be something like this:
364 |  
365 | ...........................................................................
366 | 
367 | add(*[[1,2], 3, [4,5]].flatten) #=> 15
368 | 
369 | ...........................................................................
370 | 
371 | Of course, this isn't nearly as appealing.  It doesn't even handle the same edge
372 | cases that Ruby 1.9 does, as this would not work with any array arguments that
373 | are meant to be kept as an array.  So it's best to just not rely on this kind of
374 | interface in code that needs to run on both 1.8 and 1.9.
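
When the pieces are already arrays, one portable alternative is to concatenate them before a single splat. This is a sketch, not a drop-in replacement for every multi-splat call. The `describe` method is hypothetical and exists only to show that nested arrays survive.

```ruby
def add(a, b, c, d, e)
  a + b + c + d + e
end

# One splat over a concatenated array parses on both Ruby 1.8 and 1.9.
sum = add(*([1, 2] + [3] + [4, 5]))

# Hypothetical method whose first argument is meant to stay an array.
def describe(point, label)
  "#{label}: #{point.inspect}"
end

# Array#+ joins only the top level, so [1, 2] is passed through intact,
# whereas flatten would have split it into separate arguments.
text = describe(*([[1, 2]] + ["origin"]))
```

The cost is that single values such as `3` have to be wrapped in an array by hand.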
375 |   
376 | ==== Block-local variables ====
377 | 
378 | On Ruby 1.9, block variables will shadow outer local variables, resulting
379 | in the following behaviour:
380 | 
381 | ...........................................................................
382 | 
383 | >> a = 1
384 | => 1
385 | >> (1..10).each { |a| a }
386 | => 1..10
387 | >> a 
388 | => 1
389 | 
390 | ...........................................................................
391 | 
392 | This is not the case on Ruby 1.8, where the variable will be modified even
393 | if not explicitly set:
394 | 
395 | ...........................................................................
396 | 
397 | >> a = 1
398 | => 1
399 | >> (1..10).each { |a| a }
400 | => 1..10
401 | >> a
402 | => 10
403 | 
404 | ...........................................................................
405 | 
406 | This can be the source of a lot of subtle errors, so if you want to be safe
407 | on Ruby 1.8, be sure to use different names for your block-local variables
408 | so as to avoid accidentally overwriting outer local variables.
409 | 
410 | ==== Block Arguments ====
411 | 
412 | In Ruby 1.9, blocks can accept block arguments, which is most commonly seen
413 | in define_method.
414 | 
415 | ...........................................................................
416 | 
417 | define_method(:answer) { |&b| b.call(42) }
418 | 
419 | ...........................................................................
420 | 
421 | However, this won't work on Ruby 1.8 without some very ugly workarounds, so
422 | it might be best to rethink things and see if you can do them in a different
423 | way if you've been relying on this functionality.
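
One way to rethink such an interface is to pass the callable as an ordinary argument instead of as a block. The sketch below assumes callers are willing to write `lambda { ... }` explicitly. The `Oracle` class is hypothetical.

```ruby
class Oracle
  # The block given to define_method takes a plain parameter, which Ruby 1.8
  # can parse, instead of a &block parameter.
  define_method(:answer) { |callback| callback.call(42) }
end

doubled = Oracle.new.answer(lambda { |n| n * 2 })
```

The call site is slightly noisier, but the definition parses on both versions.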
424 | 
425 | ==== New Proc Syntax ====
426 | 
427 | Both the stabby `Proc` and the `.()` call are new in 1.9, and aren't parseable by
428 | the Ruby 1.8 interpreter.  This means that calls like this need to go:
429 | 
430 | ...........................................................................
431 | 
432 | >> ->(a) { a*3 }.(4)
433 | => 12
434 | 
435 | ...........................................................................
436 | 
437 | Instead, use the trusty `lambda` method and `Proc#call` or `Proc#[]`:
438 | 
439 | ...........................................................................
440 | 
441 |   >> lambda { |a| a*3 }[4]
442 |   => 12
443 | 
444 | ...........................................................................
445 | 
446 | ==== Oniguruma ====
447 | 
448 | Although it is possible to build the Oniguruma regular expression engine into 
449 | Ruby 1.8, it is not distributed by default, and thus should not be used in 
450 | backwards compatible code.  This means that if you're using named groups,
451 | you'll need to ditch them.  The following code is using named groups:
452 | 
453 | ...........................................................................
454 | 
455 | >> "Gregory Brown".match(/(?<first_name>\w+) (?<last_name>\w+)/)
456 | => #<MatchData "Gregory Brown" first_name:"Gregory" last_name:"Brown">
457 | 
458 | ...........................................................................
459 | 
460 | We'd need to rewrite this as:
461 | 
462 | ...........................................................................
463 | 
464 | >> "Gregory Brown".match(/(\w+) (\w+)/)
465 | => #<MatchData "Gregory Brown" 1:"Gregory" 2:"Brown">
466 | 
467 | ...........................................................................
468 | 
469 | More advanced regular expressions, including those that make use of positive
470 | or negative look-behind, will need to be completely re-written so that they
471 | work on both Ruby 1.8's regular expression engine and Oniguruma.
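
If the group names mostly served readability, destructuring the captures into local variables recovers much of it. Here is a small sketch using `MatchData#captures`:

```ruby
match = "Gregory Brown".match(/(\w+) (\w+)/)

# Assign the positional groups to descriptive names in one step.
first_name, last_name = match.captures
```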
472 | 
473 | ==== Most M17N Functionality ====
474 | 
475 | Though it may go without saying, Ruby 1.8 is not particularly well suited for
476 | working with character encodings.  There are some workarounds for this, but 
477 | things like magic comments that tell what encoding a file is in, or String 
478 | objects that are aware of their current encoding are completely missing from Ruby 1.8.
479 | 
480 | Although we could go on, I'll leave the rest of the incompatibilities for you
481 | to research.  Keeping an eye on the issues mentioned in this section will
482 | help you avoid some of the most common problems, and that might be enough to
483 | make things run smoothly for you, depending on your needs.
484 | 
485 | So far we've focused on the things you can't work around, but there are lots
486 | of other issues that can be handled without too much effort, if you know how
487 | to approach them.  We'll take a look at a few of those now.
488 | 
489 | === Workarounds for common issues ===
490 | 
491 | Although we have seen that some functionality is simply not portable between
492 | Ruby 1.8 and 1.9, there are many more issues in which Ruby 1.9 just does 
493 | things a little differently or more conveniently.  In these cases, we can
494 | develop suitable workarounds that allow our code to run on both versions of
495 | Ruby.  Let's take a look at a few of these issues and how we can deal with
496 | them.
497 | 
498 | ==== Using Enumerator ====
499 | 
500 | In Ruby 1.9, you can get back an Enumerator for pretty much every method that
501 | iterates over a collection:
502 | 
503 | ...........................................................................
504 | 
505 | >> [1,2,3,4].map.with_index { |e,i| e + i }
506 | => [1, 3, 5, 7]
507 | 
508 | ...........................................................................
509 | 
510 | In Ruby 1.8, Enumerator is part of the standard library instead of core, and
511 | isn't quite as feature packed.  However, we can still accomplish the same
512 | goals by being a bit more verbose:
513 | 
514 | ...........................................................................
515 | 
516 | >> require "enumerator"
517 | => true
518 | >> [1,2,3,4].enum_for(:each_with_index).map { |e,i| e + i }
519 | => [1, 3, 5, 7]
520 | 
521 | ...........................................................................
522 |   
523 | Because Ruby 1.9's implementation of Enumerator is mostly backwards 
524 | compatible with Ruby 1.8, you can write your code in this legacy style
525 | without fear of breaking anything.
526 | 
527 | ==== String Iterators ====
528 | 
529 | In Ruby 1.8, Strings are Enumerable, whereas in Ruby 1.9, they are not.
530 | Alongside `String#each_line` and `String#each_byte`, Ruby 1.9 provides `String#lines`
531 | and `String#each_char`, neither of which is present in core Ruby 1.8.6.
532 | 
533 | The best bet here is to back-port the features you need to Ruby 1.8, and
534 | avoid treating a String as an Enumerable sequence of lines.  When you need
535 | that functionality, use String#lines followed by whatever enumerable method you
536 | need.
537 | 
538 | The underlying point here is that it's better to stick with Ruby 1.9's
539 | functionality, because it'll be less likely to confuse others who might
540 | be reading your code.
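
As a sketch of what such a backport might look like, here is a feature-detected `String#each_char` built on `String#scan`. It is an approximation: it always expects a block, whereas Ruby 1.9 returns an Enumerator when none is given, and on Ruby 1.8 it is only character aware when `$KCODE` is set appropriately.

```ruby
class String
  unless "".respond_to?(:each_char)
    # Approximate backport: yields each character and returns the receiver.
    def each_char
      scan(/./m) { |char| yield char }
      self
    end
  end
end

chars = []
"abc".each_char { |char| chars << char }

# Prefer the explicit String#lines over treating the String as Enumerable.
lengths = "one\nthree\n".lines.map { |line| line.chomp.length }
```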
541 | 
542 | ==== Character Operations ====
543 | 
544 | In Ruby 1.9, Strings are generally character aware, which means that you can
545 | index into them and get back a single character, regardless of encoding:
546 | 
547 | ...........................................................................
548 | 
549 | >> "Foo"[0]
550 | => "F"
551 | 
552 | ...........................................................................
553 | 
554 | This is not the case in Ruby 1.8.6, as you can see:
555 | 
556 | ...........................................................................
557 | 
558 | >> "Foo"[0]
559 | => 70
560 | 
561 | ...........................................................................
562 | 
563 | If you need to do character aware operations on Ruby 1.8 and 1.9, you'll need
564 | to process things using a regex trick that gets you back an array of 
565 | characters.  After setting `$KCODE="U"` footnote:[This is necessary to work with
566 | UTF-8 on Ruby 1.8, but has no effect on 1.9], you'll need to do things like 
567 | substitute calls to `String#reverse` with the following:
568 | 
569 | ...........................................................................
570 | 
571 | >> "résumé".scan(/./m).reverse.join
572 | => "émusér"
573 | 
574 | ...........................................................................
575 |   
576 | Or as another example, you'll replace `String#chop` with this:
577 | 
578 | ...........................................................................
579 | 
580 | >> r = "résumé".scan(/./m); r.pop; r.join
581 | => "résum"
582 | 
583 | ...........................................................................
584 | 
585 | Depending on how many of these manipulations you'll need to do, you might
586 | consider breaking out the Ruby 1.8 compatible code from the clearer Ruby 1.9
587 | code using the techniques discussed earlier in this chapter.  However, the
588 | thing to remember is that anywhere you've been enjoying Ruby 1.9's M17N 
589 | support, you'll need to do some rework.  The good news is that many of
590 | the techniques used on Ruby 1.8 still work on Ruby 1.9, but the bad news is
591 | that they can appear quite convoluted to those who have gotten used to the
592 | way things work in newer versions of Ruby.
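
If these manipulations show up often, they can be gathered into one place. The module below is a hypothetical sketch (the name `CharHelpers` is not from Prawn) that wraps the `scan(/./m)` trick so the same calls work on both versions. It assumes `$KCODE` has been set as described above when running on Ruby 1.8. The `chop` helper is only approximate, because it does not treat a trailing "\r\n" as a single unit the way `String#chop` does.

```ruby
# Hypothetical helper module for character-aware operations.
module CharHelpers
  # Split a string into an array of characters.
  def self.chars(string)
    string.scan(/./m)
  end

  # Character-aware equivalent of String#reverse.
  def self.reverse(string)
    chars(string).reverse.join
  end

  # Character-aware approximation of String#chop.
  def self.chop(string)
    chars(string)[0...-1].join
  end
end

reversed = CharHelpers.reverse("résumé")
chopped  = CharHelpers.chop("résumé")
```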
593 | 
594 | ==== Encoding Conversions ====
595 | 
596 | Ruby 1.9 has built in support for transcoding between various character
597 | encodings, whereas Ruby 1.8 is more limited.  However, both versions support
598 | Iconv.  If you know exactly what formats you want to translate between, you can
599 | simply replace your string.encode("ISO-8859-1") calls with something like
600 | this:
601 | 
602 |  Iconv.conv("ISO-8859-1", "UTF-8", string)
603 |  
604 | However, if you want to let Ruby 1.9 stay smart about its transcoding while
605 | still providing backwards compatibility, you will just need to write code
606 | for each version.  Here's an example of how this was done in an early version of
607 | Prawn:
608 | 
609 | ----------------------------------------------------------------------------
610 | 
611 |   if "".respond_to?(:encode!)
612 |     def normalize_builtin_encoding(text)
613 |       text.encode!("ISO-8859-1")
614 |     end
615 |   else
616 |     require 'iconv'
617 |     def normalize_builtin_encoding(text)
618 |       text.replace Iconv.conv('ISO-8859-1//TRANSLIT', 'utf-8', text)
619 |     end
620 |   end
621 |   
622 | ----------------------------------------------------------------------------
623 | 
624 | Although there is duplication of effort here, the Ruby 1.9 based code does
625 | not assume UTF-8 based input, whereas the Ruby 1.8 based code is forced to
626 | make this assumption.  In cases where you want to support many encodings on
627 | Ruby 1.9, this may be the right way to go.
628 | 
629 | Although we've just scratched the surface, these few tricks should
630 | cover a handful of the most common issues you'll encounter.  For everything
631 | else, consult your favorite language reference.
632 | 
633 | === Conclusions ===
634 | 
635 | Depending on the nature of your project, getting things running on both Ruby
636 | 1.8 and 1.9 can be either trivial or a major undertaking.  The more string
637 | processing you are doing, and the greater your need for multi-lingualization
638 | support, the more complicated a backwards compatible port of your software to
639 | Ruby 1.8 will be.  Additionally, if you've been digging into some of the fancy
640 | new features that ship with Ruby 1.9, you might find yourself doing some
641 | serious rewriting when the time comes to support older versions of Ruby.
642 | 
643 | In light of all this, it's best to start (if you can afford to) by supporting
644 | both versions from the ground up.  By writing your code in a fairly backwards
645 | compatible subset of Ruby 1.9, you'll minimize the amount of duplicated 
646 | effort that is needed to support both versions.  If you keep your 
647 | compatibility hacks well organized and centralized, it'll be easier to spot
648 | any problems that might crop up.
649 | 
650 | If you find yourself writing the same workaround several times, think about
651 | extending the core with some helpers to make your code clearer.  However,
652 | keep in mind that when you redistribute code, you have a responsibility not
653 | to break existing language features and that you should strive to avoid
654 | conflicts with third party libraries.
655 | 
656 | However, don't let all these caveats turn you away.  Writing code that runs
657 | on both Ruby 1.8 and 1.9 is about the most friendly thing you can do in terms
658 | of open source Ruby, and will also be beneficial in other scenarios.  Start
659 | by reviewing the guidelines in this chapter, then remember to keep testing 
660 | your code on both versions of Ruby.  So long as you keep things well 
661 | organized and try as best as you can to minimize version specific code, you
662 | should be able to get your project working on both Ruby 1.8 and 1.9 without
663 | conflicts.  This gives you a great degree of flexibility which is often worth
664 | the extra effort.
665 | 


--------------------------------------------------------------------------------
/manuscript/unmerged/io.txt:
--------------------------------------------------------------------------------
   1 | == Text Processing and File Management ==
   2 | 
   3 | === A Job Scripting Languages Are Built For ===
   4 | 
   5 | Ruby fills a lot of the same roles that languages such as Perl and Python
   6 | do.  Because of this, you can expect to find first rate support for text
   7 | processing and file management.  Whether it's parsing a text file with
   8 | some regular expressions or building some *nix style filter applications,
   9 | Ruby can help make life easier.
  10 | 
  11 | However, many of Ruby's I/O facilities are tersely documented at best.  It is
  12 | also relatively hard to find good resources which show you general strategies
  13 | for attacking common text processing tasks.  This chapter aims to expose
  14 | you to some good tricks that you can use to simplify your text processing
  15 | needs, as well as sharpen your skills when it comes to interacting with
  16 | and managing files on your system.
  17 | 
  18 | As in other chapters, we'll start off by looking at some real open source
  19 | code: this time, a simple parser for an Adobe Font Metrics file.  This example
  20 | will expose you to text processing in a realistic setting.  We'll then follow up
  21 | with a number of detailed sections which look at different practices that
  22 | will help you master basic I/O skills.  Armed with these techniques, you'll
  23 | be able to take on all sorts of text processing and file management tasks with
  24 | ease.   
  25 | 
  26 | === Line Based File Processing with State Tracking ===
  27 | 
  28 | Processing a text document line by line does not mean that we're limited
  29 | to extracting content in a uniform way, treating each line identically.
  30 | Some files have more structure than that, but can still benefit from
  31 | being processed linearly.  We're now going to look over a small parser that
  32 | illustrates this general idea by selecting different ways to extract our
  33 | data based on what section of a file we are in.
  34 | 
  35 | The code in this section was written by James Edward Gray II as part of Prawn's
  36 | support for Adobe Font Metrics.  Though the example itself is domain specific,
  37 | we won't get hung up on the particular details of this parser.  Instead, we'll
  38 | be taking a look at the general approach for building a state-aware parser
  39 | that operates on an efficient line-by-line basis.  Along the way, you'll pick
  40 | up some basic I/O tips and tricks as well as see the important role regular
  41 | expressions often play in this sort of task.
  42 | 
  43 | Before we take a look at the actual parser, we can take a glance at the sort
  44 | of data we're dealing with.  Adobe Font Metrics files are essentially font
  45 | glyph measurements and specifications, so they tend to look a bit like a
  46 | configuration file of sorts.  Some of these things are simply straight key
  47 | value pairs, such as:
  48 | 
  49 | ...............................................................................
  50 | 
  51 | CapHeight 718
  52 | XHeight 523
  53 | Ascender 718
  54 | Descender -207
  55 | 
  56 | ...............................................................................
  57 | 
  58 | Others are organized sets of values within a section, as in the following
  59 | example:
  60 | 
  61 | ---------------------------------------------------------------------------------
  62 | 
  63 | StartCharMetrics 315
  64 | C 32 ; WX 278 ; N space ; B 0 0 0 0 ;
  65 | C 33 ; WX 278 ; N exclam ; B 90 0 187 718 ;
  66 | C 34 ; WX 355 ; N quotedbl ; B 70 463 285 718 ;
  67 | C 35 ; WX 556 ; N numbersign ; B 28 0 529 688 ;
  68 | C 36 ; WX 556 ; N dollar ; B 32 -115 520 775 ;
  69 | ....
  70 | EndCharMetrics
  71 | 
  72 | ---------------------------------------------------------------------------------
  73 | 
  74 | Sections can be nested within each other, making things more interesting.
  75 | The data across the file does not fit a uniform format, as each section
  76 | represents a different sort of thing.  However, we can come up with patterns
  77 | to parse data in each section we're interested in, because they are consistent
  78 | within their sections. We also are only interested in a subset of the 
  79 | sections, so we can safely ignore some of them.  This is the essence of the
  80 | task we needed to accomplish, but if you notice, it's a fairly abstract 
  81 | pattern that we can reuse.  Many documents with a simple section-based 
  82 | structure can be worked with using the approach we show here.
  83 | 
  84 | The code that follows is essentially a simple finite state machine that keeps
  85 | track of what section the current line appears in.  It attempts to parse
  86 | the opening or closing of a section first, and then uses this information
  87 | to determine a parsing strategy for the current line.  The sections that
  88 | we're not interested in parsing, we simply skip.  
  89 | 
  90 | We end up with a very straightforward solution.  The whole parser is 
  91 | reduced to a simple iteration over each line of the file which 
  92 | manages a stack of nested sections, while determining if and how to 
  93 | parse the current line. 
  94 | 
  95 | We'll look at the parts in more detail in just a moment, but here is the
  96 | whole AFM parser that extracts all the information we need to properly render
  97 | Adobe fonts in Prawn:
  98 | 
  99 | ...............................................................................
 100 |  
 101 | def parse_afm(file_name) 
 102 |   section = []
 103 | 
 104 |   File.foreach(file_name) do |line|        
 105 |     case line
 106 |     when /^Start(\w+)/
 107 |       section.push $1
 108 |       next
 109 |     when /^End(\w+)/
 110 |       section.pop
 111 |       next
 112 |     end
 113 | 
 114 |     case section
 115 |     when ["FontMetrics", "CharMetrics"]
 116 |       next unless line =~ /^CH?\s/  
 117 | 
 118 |       name                  = line[/\bN\s+(\.?\w+)\s*;/, 1]
 119 |       @glyph_widths[name]   = line[/\bWX\s+(\d+)\s*;/, 1].to_i
 120 |       @bounding_boxes[name] = line[/\bB\s+([^;]+);/, 1].to_s.rstrip
 121 |     when ["FontMetrics", "KernData", "KernPairs"]
 122 |       next unless line =~ /^KPX\s+(\.?\w+)\s+(\.?\w+)\s+(-?\d+)/
 123 |       @kern_pairs[[$1, $2]] = $3.to_i
 124 |     when ["FontMetrics", "KernData", "TrackKern"], ["FontMetrics", "Composites"]
 125 |       next
 126 |     else
 127 |       parse_generic_afm_attribute(line)
 128 |     end
 129 |   end 
 130 | end
 131 | 
 132 | ...............................................................................
 133 | 
 134 | You could try to understand the particular details if you'd like, but it's
 135 | also fine to black-box the expressions used here so that you can get
 136 | a sense of the overall structure of the parser. Here's what the code
 137 | looks like if we do that for all but the patterns which determine the 
 138 | section nesting:
 139 | 
 140 | ...............................................................................
 141 | 
 142 | def parse_afm(file_name) 
 143 |   section = []
 144 | 
 145 |   File.foreach(file_name) do |line|        
 146 |     case line
 147 |     when /^Start(\w+)/
 148 |       section.push $1
 149 |       next
 150 |     when /^End(\w+)/
 151 |       section.pop
 152 |       next
 153 |     end
 154 | 
 155 |     case section
 156 |     when ["FontMetrics", "CharMetrics"]
 157 |       parse_char_metrics(line)
 158 |     when ["FontMetrics", "KernData", "KernPairs"]
 159 |       parse_kern_pairs(line)
 160 |     when ["FontMetrics", "KernData", "TrackKern"], ["FontMetrics", "Composites"]
 161 |       next
 162 |     else
 163 |       parse_generic_afm_attribute(line)
 164 |     end
 165 |   end 
 166 | end
 167 | 
 168 | ...............................................................................
 169 | 
 170 | With these simplifications, it's very clear that we're looking at an ordinary
 171 | finite state machine which is acting upon the lines of the file.  It
 172 | also makes it easier to notice what's actually going on.
 173 | 
 174 | The first case statement is just a simple way to check for which section
 175 | we're currently looking at, updating the stack as necessary as we move
 176 | in and out of sections:
 177 | 
 178 | ...............................................................................
 179 | 
 180 | case line
 181 | when /^Start(\w+)/
 182 |   section.push $1
 183 |   next
 184 | when /^End(\w+)/
 185 |   section.pop
 186 |   next
 187 | end
 188 | 
 189 | ...............................................................................
 190 | 
 191 | If we find a section beginning or end, we skip to the next line as we know
 192 | there is nothing else to parse.   Otherwise, we know that we have to do some
 193 | real work, which is done in the second case statement:
 194 | 
 195 | ...............................................................................
 196 | 
 197 | case section
 198 | when ["FontMetrics", "CharMetrics"]
 199 |   next unless line =~ /^CH?\s/  
 200 | 
 201 |   name                  = line[/\bN\s+(\.?\w+)\s*;/, 1]
 202 |   @glyph_widths[name]   = line[/\bWX\s+(\d+)\s*;/, 1].to_i
 203 |   @bounding_boxes[name] = line[/\bB\s+([^;]+);/, 1].to_s.rstrip
 204 | when ["FontMetrics", "KernData", "KernPairs"]
 205 |   next unless line =~ /^KPX\s+(\.?\w+)\s+(\.?\w+)\s+(-?\d+)/
 206 |   @kern_pairs[[$1, $2]] = $3.to_i
 207 | when ["FontMetrics", "KernData", "TrackKern"], ["FontMetrics", "Composites"]
 208 |   next
 209 | else
 210 |   parse_generic_afm_attribute(line)
 211 | end
 212 | 
 213 | ...............................................................................
 214 |   
 215 | Here, we've got four different ways to handle our line of text.  In the first
 216 | two cases, we process the lines we need to as we walk through the section,
 217 | extracting the bits of information we need and ignoring the extraneous
 218 | information we're not interested in.    
 219 | 
 220 | In the third case, we identify certain sections to skip and simply resume
 221 | processing the next line if we are currently within that section.
 222 | 
 223 | Finally, if the other cases fail to match, our last case scenario is to
 224 | assume we're dealing with a simple key value pair, which is handled by a 
 225 | private helper method in Prawn.  Since it does not provide anything different
 226 | to look at than the first two sections of this case statement, we can 
 227 | safely ignore how it works without missing anything important.
 228 | 
 229 | However, the interesting thing that you might have noticed is that the first
 230 | case and the second case use two different ways of extracting values.  The code
 231 | which processes +CharMetrics+ is using +String#[]+, whereas the code handling
 232 | +KernPairs+ is using Perl-style global match variables.  The reason for this is 
 233 | largely convenience.  The following two lines of code are equivalent:
 234 | 
 235 | ...............................................................................
 236 | 
 237 | name = line[/\bN\s+(\.?\w+)\s*;/, 1]  
 238 | name = line =~ /\bN\s+(\.?\w+)\s*;/ && $1
 239 | 
 240 | ...............................................................................
 241 |   
 242 | There are still other ways to handle your captured matches
 243 | (such as +MatchData+ via +String#match+), but we'll get into those later.  For
 244 | now, it's simply worth knowing that when you're trying to extract a single
 245 | matched capture, +String#[]+ does the job well, but if you need to deal with
 246 | more than one, you need to use another approach.  We see this clearly in
 247 | the second case:
 248 | 
 249 | ...............................................................................
 250 | 
 251 | next unless line =~ /^KPX\s+(\.?\w+)\s+(\.?\w+)\s+(-?\d+)/
 252 | @kern_pairs[[$1, $2]] = $3.to_i
 253 | 
 254 | ...............................................................................
 255 | 
 256 | This code is a bit clever, as the line that assigns the values to +@kern_pairs+
 257 | only gets executed when there is a successful match.  When the match fails,
 258 | it will return +nil+, causing the parser to skip to the next line for 
 259 | processing.
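
To make the trade-off concrete, here is a small sketch that runs the three
extraction styles mentioned so far against a single line.  The sample line is
made up to fit the +KPX+ pattern from the parser; it is not taken from a real
AFM file:

```ruby
line = "KPX A Y -74"   # made-up line that fits the KPX pattern

# 1. String#[] with a capture index: handy for a single value.
first = line[/^KPX\s+(\.?\w+)/, 1]

# 2. =~ plus the global match variables: handy for a few values.
if line =~ /^KPX\s+(\.?\w+)\s+(\.?\w+)\s+(-?\d+)/
  pair, amount = [$1, $2], $3.to_i
end

# 3. String#match returns a MatchData object, or nil when nothing matches.
if m = line.match(/^KPX\s+(\.?\w+)\s+(\.?\w+)\s+(-?\d+)/)
  captures = m.captures   # all of the groups at once, as strings
end
```

All three produce the same underlying values; the difference is only in how
much ceremony you need for the number of captures you care about.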
 260 | 
 261 | We could continue studying this example, but we'd then be delving into the
 262 | specifics and those details aren't important for remembering this simple
 263 | general pattern.
 264 | 
 265 | When dealing with a structured document that can be processed by discrete
 266 | rules for each section, the general approach is simple and does not typically
 267 | require pulling the entire document into memory or doing multiple passes
 268 | through the data.
 269 | 
 270 | Instead, you can do the following:
 271 | 
 272 |   * Identify the beginning and end markers of sections with a pattern.
 273 |   
 274 |   * If sections are nested, maintain a stack which you update before further
 275 |     processing of each line.
 276 |     
 277 |   * Break up your extraction code into different cases and select the right
 278 |     one based on the current section you are in.
 279 |     
 280 |   * When a line cannot be processed, skip to the next one as soon as possible,
 281 |     using the +next+ keyword.
 282 |     
 283 |   * Maintain state as you normally would, processing whatever data you need.
 284 |   
 285 | By following these basic guidelines, you can avoid overthinking your problem,
 286 | while still saving clock cycles and keeping your memory footprint low.  
 287 | Although the code here solves a particular problem, it can easily be adapted
 288 | to fit a wide range of basic document processing needs.
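
As an illustration of how the pattern carries over, here is a minimal sketch
that applies the same steps to an invented format.  Everything about the input
is an assumption made for the example: the +begin+ and +end+ markers, the
+key = value+ lines, and the section names.  The +parse_sections+ method is
hypothetical as well:

```ruby
require "stringio"

# Hypothetical format: "begin NAME" / "end NAME" markers wrapped around
# "key = value" lines.  Only the server section and its nested limits
# section are of interest; every other section is skipped.
def parse_sections(io)
  section  = []   # stack of the sections we are currently inside
  settings = {}

  io.each_line do |line|
    # Update the section stack before doing anything else.
    case line
    when /^\s*begin\s+(\w+)/
      section.push $1
      next
    when /^\s*end\s+(\w+)/
      section.pop
      next
    end

    # Pick an extraction strategy based on where we are.
    case section
    when ["server"], ["server", "limits"]
      next unless line =~ /^\s*(\w+)\s*=\s*(\S+)/
      settings[(section + [$1]).join(".")] = $2
    else
      next
    end
  end

  settings
end

sample = StringIO.new(<<EOS)
begin server
  host = example.org
  begin limits
    max_clients = 20
  end limits
  begin notes
    owner = nobody
  end notes
end server
EOS

settings = parse_sections(sample)
```

Because the method only relies on +each_line+, it should work equally well
with a +File+ object, which keeps memory use low for large inputs.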
 289 | 
 290 | This introduction has hopefully provided a taste of what text processing in 
 291 | Ruby is all about.  The rest of the chapter will provide many more tips and
 292 | tricks, with a greater focus on the particular topics.  Feel free to jump
 293 | around to the things that interest you most, but I'm hoping all of the
 294 | sections have something interesting to offer to even seasoned Rubyists.
 295 | 
 296 | === Regular Expressions ===
 297 | 
 298 | At the time of writing this chapter, I was spending some time watching the
 299 | Dow Jones Industrial Average, as the world was in the middle of a major
 300 | financial meltdown.   If you're wondering what this has to do with Ruby
 301 | or Regular Expressions, take a quick look at the following code:
 302 | 
 303 | ...............................................................................
 304 | 
 305 |   require "open-uri"
 306 |   loop do
 307 |     puts( open("http://finance.google.com/finance?cid=983582").read[
 308 |     /<span class="\w+" id="ref_983582_c">([+-]?\d+\.\d+)/m, 1] )
 309 |     sleep(30)
 310 |   end
 311 | 
 312 | ...............................................................................
 313 | 
 314 | In just a couple of lines, I was able to throw together a script that would
 315 | poll Google Finance and pull down the current average price of the Dow.  This
 316 | sort of "find a needle in the haystack" extraction is what regular expressions
 317 | are all about.
 318 | 
 319 | Of course, the art of constructing regular expressions is often veiled in 
 320 | mystery.  Even simple patterns such as this one might make some folks feel a bit uneasy:
 321 | 
 322 | ...............................................................................
 323 | 
 324 | /<span class="\w+" id="ref_983582_c">([+-]?\d+\.\d+)/m
 325 | 
 326 | ...............................................................................
 327 |   
 328 | This expression is simple by comparison to some other examples we can show,
 329 | but it still makes use of a number of regular expression concepts. All in
 330 | one line, we can see the use of character classes (both general and special),
 331 | escapes, quantifiers, groups, and a switch that enables multi-line matching.
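
If the one-line form is hard on the eyes, one option is to spell the same
pattern out in extended (+/x+) mode, where whitespace is ignored and comments
are allowed.  The sketch below does that and tries the result on a made-up
snippet of markup; the real page may well look different.  The +/m+ switch is
left off here because in Ruby it only changes what +.+ matches, and this
pattern contains no unescaped dot:

```ruby
# The same pattern in extended mode.  Literal spaces must be escaped,
# since unescaped whitespace is ignored.
TICKER = /
  <span\ class="\w+"      # opening tag with any word-like class name
  \ id="ref_983582_c">    # the literal id attribute we are looking for
  (                       # start of the capture group
    [+-]?                 #   an optional sign
    \d+ \. \d+            #   digits, an escaped literal dot, more digits
  )
/x

# Made-up markup for illustration only.
html  = %{<div><span class="chg" id="ref_983582_c">-128.00</span></div>}
value = html[TICKER, 1]
```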
 332 | 
 333 | Patterns are dense because they are written in a special syntax which acts
 334 | as a sort of domain language for matching and extracting text.  The reason
 335 | why it may be considered daunting is that this language is made up of so
 336 | few special characters:
 337 | 
 338 | ...............................................................................
 339 | 
 340 |  \ [ ] . ^ $ ? * + { } | ( )
 341 | 
 342 | ...............................................................................
 343 |    
 344 | At their heart, regular expressions are nothing more than a facility to do
 345 | find and replace operations.  This concept is so familiar that anyone who
 346 | has used a word processor has a strong grasp on it.  Using a regex, you
 347 | can easily replace all instances of the word "Mitten" with "Kitten", just
 348 | like your favorite text editor or word processor can:
 349 | 
 350 | ...............................................................................
 351 | 
 352 | some_string.gsub(/\bMitten\b/,"Kitten")
 353 | 
 354 | ...............................................................................
 355 |   
 356 | Many programmers get this far and stop.  They learn to use regex as if it
 357 | were a necessary evil rather than an essential technique.  We can do
 358 | better than that.  In this section, we'll look at a few guidelines for
 359 | how to write effective patterns that do what they're supposed to without
 360 | getting too convoluted.  I'm assuming you've done your homework and are
 361 | at least familiar with Regex basics as well as Ruby's pattern syntax.  If
 362 | that's not the case, pick up your favorite language reference and take a few
 363 | minutes to review the fundamentals.
 364 | 
 365 | So long as you can comfortably read the first example in this section, you're
 366 | ready to move on.  If you can convince yourself that writing regular
 367 | expressions is actually much easier than people tend to think it is, the tips
 368 | and tricks to follow shouldn't cause you to break a sweat.
 369 | 
 370 | ==== Don't Work Too Hard ====
 371 | 
 372 | Despite being such a compact format, it's relatively easy to write bloated
 373 | patterns if you don't consciously remember to keep things clean and tight.
 374 | We'll now take a look at a couple sources of extra fat and how to trim them 
 375 | down.
 376 | 
 377 | Alternation is a very powerful regex tool.  It allows you to match one of
 378 | a series of potential sequences. For example, if you want to match the name 
 379 | "James Gray" but also match "James gray", "james Gray", and "james gray",
 380 | the following code will do the trick:
 381 | 
 382 | ...............................................................................
 383 | 
 384 | >> ["James Gray", "James gray", "james gray", "james Gray"].all? { |e|
 385 | ?>   e.match(/(James|james) (Gray|gray)/) }
 386 | => true
 387 | 
 388 | ...............................................................................
 389 | 
 390 | However, you don't need to work so hard.  You're really talking about
 391 | possible alternations of simply two characters, not two full words.  You could
 392 | write this far more efficiently using a character class:
 393 | 
 394 | ...............................................................................
 395 | 
 396 | >> ["James Gray", "James gray", "james gray", "james Gray"].all? { |e|
 397 | ?>   e.match(/[Jj]ames [Gg]ray/) }
 398 | => true
 399 | 
 400 | ...............................................................................
 401 | 
 402 | This makes your pattern clearer and also will result in a much better
 403 | optimization in Ruby's regex engine.  So in addition to looking better,
 404 | this code is actually faster.
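
One more thing to keep in mind when you do reach for alternation: +|+ binds
more loosely than anything else in a pattern, so each alternative runs all the
way to the next +|+ or to the edge of the enclosing group.  The following
sketch (with variable names picked just for this example) shows how easy it is
to match more than intended when the groups are left out:

```ruby
# Without groups there are three alternatives: "James", "james Gray", "gray".
loose = /James|james Gray|gray/

# Non-capturing groups limit each alternation to a single word.
tight = /(?:James|james) (?:Gray|gray)/

candidates = ["James", "gray area"]

loose_hits = candidates.grep(loose)   # both strings slip through
tight_hits = candidates.grep(tight)   # neither one matches
```

A character class sidesteps the problem entirely, which is one more reason to
prefer it when only single characters vary.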
 405 | 
 406 | In a similar vein, it is unnecessary to use explicit character classes
 407 | when a shortcut will do. To match a four digit number, we could write:
 408 | 
 409 | ...............................................................................
 410 | 
 411 | /[0-9][0-9][0-9][0-9]/
 412 | 
 413 | ...............................................................................
 414 |   
 415 | Which can of course be cleaned up a bit using repetitions:
 416 | 
 417 | ...............................................................................
 418 |   
 419 | /[0-9]{4}/
 420 | 
 421 | ...............................................................................
 422 |   
 423 | However, we can do even better by using the special class built in for this:
 424 |   
 425 | ...............................................................................
 426 | 
 427 | /\d{4}/
 428 | 
 429 | ...............................................................................
 430 |   
 431 | It pays to learn what shortcuts are available to you.  Here's a quick list
 432 | for further study, if you're not already familiar with them: 
 433 | 
 434 | ...............................................................................
 435 | 
 436 | . \s \S \w \W \d \D
 437 | 
 438 | ...............................................................................
 439 |   
 440 | Each one of the above corresponds to a literal character class that is more
 441 | verbose when written out.  Using shortcuts increases clarity and decreases
 442 | the chance of bugs creeping in through ill-defined patterns.  Though they may
 443 | seem a bit terse at first, you'll be able to sight-read them with ease over time.
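
As a quick sanity check, the following sketch compares a few shortcuts with
roughly equivalent hand-written classes.  It assumes plain ASCII input; the
exact definitions can differ for other encodings and across Ruby versions, so
treat the long forms as approximations rather than official definitions:

```ruby
sample = "Order 66:\tship_to=Mars\n"

# Each pair should pick out the same pieces of this ASCII sample.
digits_short = sample.scan(/\d/)
digits_long  = sample.scan(/[0-9]/)

words_short  = sample.scan(/\w+/)
words_long   = sample.scan(/[a-zA-Z0-9_]+/)

spaces_short = sample.scan(/\s/)
spaces_long  = sample.scan(/[ \t\r\n\f]/)

others_short = sample.scan(/\D+/)
others_long  = sample.scan(/[^0-9]+/)
```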
 444 | 
 445 | ==== Anchors Are Your Friends ====
 446 | 
 447 | One way to match my name in a string is to write the following simple pattern:
 448 | 
 449 | ...............................................................................
 450 | 
 451 | string =~ /Gregory Brown/
 452 | 
 453 | ...............................................................................
 454 |   
 455 | However, consider the following:
 456 | 
 457 | ...............................................................................
 458 | 
 459 | >> "matched" if "Mr. Gregory Browne".match(/Gregory Brown/)
 460 | => "matched"
 461 | 
 462 | ...............................................................................
 463 | 
 464 | Oftentimes, we mean "match this phrase", but we write "match this sequence 
 465 | of characters".  The solution is to make use of anchors to clarify what we
 466 | mean.
 467 | 
 468 | Sometimes we want to match only if a string starts with a phrase:
 469 | 
 470 | ...............................................................................
 471 | 
 472 | >> phrases = ["Mr. Gregory Browne", "Mr. Gregory Brown is cool",
 473 |               "Gregory Brown is cool", "Gregory Brown"]
 474 |  
 475 | >> phrases.grep /\AGregory Brown\b/
 476 | => ["Gregory Brown is cool", "Gregory Brown"]
 477 | 
 478 | ...............................................................................
 479 |   
 480 | Other times we want to ensure that the string contains the phrase:
 481 | 
 482 | ...............................................................................
 483 | 
 484 | >> phrases.grep /\bGregory Brown\b/
 485 | => ["Mr. Gregory Brown is cool", "Gregory Brown is cool", "Gregory Brown"]
 486 | 
 487 | ...............................................................................
 488 |   
 489 | And finally, sometimes we want to ensure the string contains an exact phrase:
 490 | 
 491 | ...............................................................................
 492 | 
 493 | >> phrases.grep /\AGregory Brown\z/
 494 | => ["Gregory Brown"]
 495 | 
 496 | ...............................................................................
 497 |   
 498 | Although I am using English names and phrases here for simplicity, this can
 499 | of course be generalized to encompass any sort of matching pattern.  You could
 500 | be verifying that a sequence of numbers fit a certain form, or something
 501 | equally abstract.  The key thing to take away from this is that when you use
 502 | anchors, you're being much more explicit about how you expect your pattern to
 503 | match, which in most cases means that you'll have a better chance of catching
 504 | problems faster, and an easier time remembering what your pattern was supposed
 505 | to do.
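
For instance, here is a minimal sketch of checking that a string has the form
of a date written as YYYY-MM-DD.  The +iso_date?+ method is a hypothetical
helper, and it only checks the shape of the string, not whether the month or
day is in range:

```ruby
# Hypothetical validator: true only when the whole string is YYYY-MM-DD.
def iso_date?(string)
  !!(string =~ /\A\d{4}-\d{2}-\d{2}\z/)
end

unanchored = /\d{4}-\d{2}-\d{2}/

exact    = iso_date?("2009-06-15")        # nothing but the date
padded   = iso_date?("x2009-06-15y")      # extra characters on both sides
sloppy   = "x2009-06-15y" =~ unanchored   # the unanchored pattern still matches
```

Without the anchors, the pattern happily finds a date buried in the middle of
unrelated text, which is rarely what a validation routine wants.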
 506 | 
 507 | An interesting thing to note about anchors is that they don't actually match
 508 | characters.  Instead, they match between characters to allow you to assert
 509 | certain expectations about your strings.  So when you use something like +\b+,
 510 | you are actually matching between one of +\w\W+ , +\W\w+ , +\A+ , +\z+.  In English,
 511 | that means that you're transitioning from a non-word character to a word
 512 | character, or a non-word character to a word character, or you're matching the
 513 | beginning or end of the string.  If you review the use of +\b+ in the examples above, 
 514 | it should now be very clear how anchors work.
 515 | 
 516 | The most commonly used anchors in Ruby are +\A+, +\Z+, +\z+, +^+, +$+, and
 517 | +\b+ (+\B+ and +\G+ are also available).  Each has its merits, so be sure to
 517 | read up on them.
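
The difference between the line-oriented and string-oriented anchors is the
one that tends to bite people, so here is a short sketch using a two-line
string made up for the purpose:

```ruby
text = "first line\nsecond line\n"

line_start   = text =~ /^second/    # 11:  ^ matches at the start of any line
string_start = text =~ /\Asecond/   # nil: \A matches only at the start of the string

line_end     = text =~ /line$/      # 6:   $ matches at the end of any line
before_nl    = text =~ /line\Z/     # 18:  \Z tolerates one trailing newline
very_end     = text =~ /line\z/     # nil: \z means the very end of the string
```

If you are validating a whole string rather than searching line by line,
+\A+ and +\z+ are usually the safer choice.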
 518 | 
 519 | ==== Use Caution When Working with Quantifiers ====
 520 | 
 521 | One of the most common anti-patterns I picked up when first learning regular
 522 | expressions was to make use of +.*+ everywhere.  Though this may seem innocent,
 523 | it is similar to my bad habit of using +rm -Rf+ on the command line all the 
 524 | time instead of just +rm+.  Both can result in catastrophe when used 
 525 | incorrectly.
 526 | 
 527 | But maybe you're not as crazy as I am.  Instead, maybe you've been writing
 528 | innocent things like +/(\d*)Foo/+ to match any number of digits prepended to
 529 | the word Foo.
 530 | 
 531 | For some cases, this works great:
 532 | 
 533 | ...............................................................................
 534 | 
 535 | >> "1234Foo"[/(\d*)Foo/,1]
 536 | => "1234"
 537 | 
 538 | ...............................................................................
 539 |   
 540 | But does this surprise you?
 541 | 
 542 | ...............................................................................
 543 | 
 544 | >> "xFoo"[/(\d*)Foo/,1]
 545 | => ""
 546 | 
 547 | ...............................................................................
 548 |   
 549 | It may not, but then again it may.  It's relatively common to forget that +*+
 550 | always matches. At a first glance, the following code seems fine:
 551 | 
 552 | ...............................................................................
 553 | 
 554 | if num = string[/(\d*)Foo/,1]
 555 |   Integer(num)
 556 | end
 557 | 
 558 | ...............................................................................
 559 | 
 560 | However, since the match will capture an empty string in its failure case,
 561 | this code will break.  The solution is simple.  If you really mean "at least
 562 | one", use + instead.
 563 | 
 564 | ...............................................................................
 565 | 
 566 | if num = string[/(\d+)Foo/,1]
 567 |   Integer(num)
 568 | end
 569 | 
 570 | ...............................................................................
 571 | 
 572 | Though more experienced folks might not easily be trapped by something so
 573 | simple, there are more subtle variants. For example, if we intend to match
 574 | only "Greg" or "Gregory", the following code doesn't quite work:
 575 | 
 576 | ...............................................................................
 577 | 
 578 | >> "Gregory"[/Greg(ory)?/]
 579 | => "Gregory"
 580 | >> "Greg"[/Greg(ory)?/]
 581 | => "Greg"
 582 | >> "Gregor"[/Greg(ory)?/]
 583 | => "Greg"
 584 | 
 585 | ...............................................................................
 586 | 
 587 | Even if the pattern looks close to what we want, we can see the results
 588 | don't fit.  The following modifications remedy the issue:
 589 | 
 590 | ...............................................................................
 591 | 
 592 | >> "Gregory"[/\bGreg(ory)?\b/]
 593 | => "Gregory"
 594 | >> "Greg"[/\bGreg(ory)?\b/]
 595 | => "Greg"
 596 | >> "Gregor"[/\bGreg(ory)?\b/]
 597 | => nil
 598 | 
 599 | ...............................................................................
 600 | 
 601 | Notice that the pattern now properly matches Greg or Gregory, but no other
 602 | words.  The key thing to take away here is that unbounded zero-matching
 603 | quantifiers are tautologies.  They can never fail to match, so you need
 604 | to be sure to account for that.
 605 | 
 606 | A final gotcha about quantifiers is that they are greedy by default.  
 607 | This means they'll try to consume as much of the string as possible before
 608 | matching.  The following is an example of a greedy match:
 609 | 
 610 | ...............................................................................
 611 | 
 612 | >> "# x # y # z #"[/#(.*)#/,1]
 613 | => " x # y # z "
 614 | 
 615 | ...............................................................................
 616 | 
 617 | As you can see, this matches everything between the first and last +#+ character.
 618 | But sometimes, we want processing to happen from the left and end as soon
 619 | as we have a match.  To do this, we append a +?+ to our repetition:
 620 | 
 621 | ...............................................................................
 622 | 
 623 | >> "# x # y # z #"[/#(.*?)#/,1]
 624 | => " x "
 625 | 
 626 | ...............................................................................
 627 | 
 628 | All quantifiers can be made non-greedy this way. Remembering this will save a lot of 
 629 | headaches in the long run.
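
Here is a short sketch of what that looks like for the other quantifiers,
along with a common alternative: when the delimiter is a single character, a
negated character class gives the same result without relying on laziness.
The variable names are just for this example:

```ruby
greedy_plus = "aaa"[/a+/]         # "aaa": takes as many as it can
lazy_plus   = "aaa"[/a+?/]        # "a":   takes as few as it can
lazy_range  = "aaa"[/a{2,3}?/]    # "aa":  the minimum the range allows
lazy_star   = "aaa"[/a*?/]        # "":    zero repetitions are enough
lazy_maybe  = "aaa"[/a??/]        # "":    prefers to match nothing

# A negated class stops at the next # by construction.
negated = "# x # y # z #"[/#([^#]*)#/, 1]
```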
 630 | 
 631 | Though our treatment of regular expressions has been by no means 
 632 | comprehensive, these few basic tips will really carry you a long way.
 633 | The key things to remember are:
 634 | 
 635 |   * Regular Expressions are nothing more than a special language
 636 |     for find and replace operations, built upon simple logical constructs.
 637 |     
 638 |   * There are lots of shortcuts built in for common regular expression
 639 |     operations, so be sure to make use of special character classes and
 640 |     other simplifications when you can.
 641 |     
 642 |   * Anchors provide a way to set up some expectation about where in a string
 643 |     you want to look for a match.  These help with both optimization and
 644 |     pattern correctness.
 645 |     
 646 |   * Quantifiers such as +*+ and +?+ will always match, so they should not be
 647 |     used without sufficient boundaries.
 648 |     
 649 |   * Quantifiers are greedy by default, and can be made non-greedy via +?+.
 650 |   
 651 | By following these guidelines, you'll write clearer, more accurate, and 
 652 | faster regular expressions.  As a result, it'll be a whole lot easier to
 653 | revisit them when you run into them in your own old code a few months down
 654 | the line.
 655 | 
 656 | A final note on regular expressions is that sometimes we are seduced by their
 657 | power and overlook other solutions that may be more robust for certain needs.
 658 | In both the stock ticker and AFM parsing examples, we were working within
 659 | the realm where regular expressions are a quick, easy, and fine way to go.
 660 | 
 661 | However, as documents take on more complex structures, and your needs move
 662 | from extracting some values to attempting to fully parse a document, you
 663 | will probably need to look to other techniques that involve full blown
 664 | parsers such as Treetop, Ghostwheel, or Racc.  These libraries can solve
 665 | problems regular expressions can't solve, and if you find yourself with
 666 | data that's hard to map a regex to, it's worth looking at these alternative
 667 | solutions.
 668 | 
 669 | Of course, your mileage will vary based on the problem at hand, so don't be
 670 | afraid of trying a regex based solution first before pulling out the big guns.
 671 | 
 672 | === Working With Files ===
 673 | 
 674 | There are a whole slew of options for doing various file management tasks in
 675 | Ruby.  Because of this, it can be difficult to decide what the best approach
 676 | for a given task might be.  In this section, we'll cover two key tasks while
 677 | looking at three of Ruby's standard libraries. 
 678 | 
 679 | We'll start by showing how to use the +Pathname+ and +FileUtils+ libraries to
 680 | traverse your file system using a clean cross-platform approach that rivals
 681 | the power of popular *nix shells without sacrificing compatibility.  We'll
 682 | then move on to show how to use +Tempfile+ to automate handling of temporary
 683 | file resources within your scripts.  These practical tips will help you
 684 | write platform-agnostic Ruby code that'll work out of the box on more 
 685 | systems, while still managing to make your job easier.
 686 | 
 687 | ==== Using Pathname and FileUtils ====
 688 | 
 689 | If you are using Ruby to write administrative scripts, it's nearly inevitable
 690 | that you've needed to do some file management along the way.  It may be quite
 691 | tempting to drop down to the shell to do things like move and rename 
 692 | directories, search for files in a complex directory structure, and do other
 693 | common tasks that involve ferrying files around from one place to the other.
 694 | However, Ruby provides some great tools to avoid this sort of thing.
 695 | 
 696 | The +Pathname+ and +FileUtils+ standard libraries provide virtually everything
 697 | you need for file management.  The best way to demonstrate their capabilities
 698 | is by example, so we'll now take a look at some code and then break it down
 699 | piece by piece.
 700 | 
 701 | To illustrate +Pathname+, we can take a look at a small tool I've built for
 702 | doing local installations of libraries found on GitHub.  This script,
 703 | called 'mooch', essentially looks up and clones a git repository, puts it
 704 | in a convenient place within your project (a 'vendor/' directory), and 
 705 | optionally sets up a stub file that will include your vendored packages
 706 | into the loadpath upon requiring it.  Sample usage looks something like
 707 | this:
 708 | 
 709 | ...............................................................................
 710 |   
 711 | $ mooch init lib/my_project 
 712 | $ mooch sandal/prawn  0.2.3
 713 | $ mooch ruport/ruport 1.6.1
 714 | 
 715 | ...............................................................................
 716 |   
 717 | Then, we can see the following will work without loading rubygems:
 718 | 
 719 | ...............................................................................
 720 | 
 721 | >> require "lib/my_project/dependencies"
 722 | => true
 723 | >> require "prawn"
 724 | => true
 725 | >> require "ruport"
 726 | => true
 727 | >> Prawn::VERSION
 728 | => "0.2.3"
 729 | >> Ruport::VERSION
 730 | => "1.6.1"
 731 | 
 732 | ...............................................................................
 733 | 
 734 | This script is pretty useful, but that's not what we're here to talk
 735 | about.  Instead, let's focus on how this sort of thing is built,
 736 | since it shows a practical example of using +Pathname+ to manipulate files and
 737 | folders.  I'll start by showing you the whole script, and then we'll walk
 738 | through it part by part:
 739 | 
 740 | ...............................................................................
 741 | 
 742 | #!/usr/bin/env ruby
 743 | require "pathname"
 744 | 
 745 | WORKING_DIR = Pathname.getwd
 746 | LOADER = %Q{
 747 |   require "pathname"
 748 | 
 749 |   Pathname.glob("#{WORKING_DIR}/vendor/*/*/") do |dir|
 750 |     lib = dir + "lib"
 751 |     $LOAD_PATH.push(lib.directory? ? lib : dir)
 752 |   end
 753 | }
 754 | 
 755 | if ARGV[0] == "init"
 756 |   lib = Pathname.new(ARGV[1])
 757 |   lib.mkpath
 758 |   (lib + 'dependencies.rb').open("w") do |file|
 759 |     file.write LOADER
 760 |   end
 761 | else
 762 |   vendor = Pathname.new("vendor")
 763 |   vendor.mkpath
 764 |   Dir.chdir(vendor.realpath)
 765 |   system("git clone git://github.com/#{ARGV[0]}.git #{ARGV[0]}")
 766 |   if ARGV[1]
 767 |     Dir.chdir(ARGV[0])
 768 |     system("git checkout #{ARGV[1]}")
 769 |   end
 770 | end
 771 | 
 772 | ...............................................................................
 773 |   
 774 | As you can see, it's not a ton of code, even though it does a lot.  Let's
 775 | shine the spotlight on the interesting `Pathname` bits.
 776 | 
 777 | ...............................................................................
 778 | 
 779 | WORKING_DIR = Pathname.getwd
 780 | 
 781 | ...............................................................................
 782 | 
 783 | Here we are simply assigning the initial working directory to a constant.  We 
 784 | use this to build up the code for the 'dependencies.rb' stub script that can
 785 | be generated via +mooch init+.  Here we're just doing quick and dirty code 
 786 | generation, and you can see the full stub as stored in +LOADER+:
 787 | 
 788 | ...............................................................................
 789 | 
 790 | LOADER = %Q{
 791 |   require "pathname"
 792 | 
 793 |   Pathname.glob("#{WORKING_DIR}/vendor/*/*/") do |dir|
 794 |     lib = dir + "lib"
 795 |     $LOAD_PATH.push(lib.directory? ? lib : dir)
 796 |   end
 797 | }
 798 | 
 799 | ...............................................................................
 800 | 
 801 | This script does something fun.  It looks in the working directory that
 802 | +mooch init+ was run in for a folder called 'vendor/', and then uses a glob
 803 | to find folders two levels deep, fitting the GitHub convention of
 804 | 'username/project'.  Each of these is a candidate for the loadpath.  The
 805 | code checks whether each project has a 'lib' folder within it (as is the
 806 | common Ruby convention) and adds that to the loadpath if so; otherwise, it
 807 | adds the project folder itself.
 808 | 
 809 | Here we notice a few of `Pathname`'s niceties.  You can see we can construct
 810 | new paths by just adding new strings to them, as shown here:
 811 | 
 812 | ...............................................................................
 813 | 
 814 | lib = dir + "lib"
 815 | 
 816 | ...............................................................................
 817 |      
 818 | In addition to this, we can check to see if the path we've created actually
 819 | points to a directory on the filesystem, via a simple +Pathname#directory?+
 820 | call.  This makes traversal downright easy, as you can see in the preceding
 821 | code.
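
The same decision can be pulled out into a method and tried against a scratch directory.  This is only a sketch: +load_dir_for+ is a name invented here, and the folder names are placeholders.

```ruby
require "pathname"
require "tmpdir"

# Hypothetical helper: returns the directory that should go on the load
# path for a project. That is its "lib" subdirectory if one exists,
# otherwise the project folder itself.
def load_dir_for(project)
  project = Pathname.new(project)
  lib     = project + "lib"
  lib.directory? ? lib : project
end

Dir.mktmpdir do |root|
  root        = Pathname.new(root)
  with_lib    = root + "with_lib"
  without_lib = root + "without_lib"

  (with_lib + "lib").mkpath
  without_lib.mkpath

  puts load_dir_for(with_lib)      # path ends in "with_lib/lib"
  puts load_dir_for(without_lib)   # path ends in "without_lib"
end
```

Because +Pathname#++ returns a new +Pathname+ rather than a string, the result can be passed straight to methods such as +directory?+ without any conversion.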
 822 | 
 823 | This simple stub may be a bit dense, but once you get the hang of +Pathname+,
 824 | you can see that it's quite powerful.  Let's look at a couple more tricks,
 825 | focusing this time on the code that actually writes this snippet to file:
 826 | 
 827 | ...............................................................................
 828 | 
 829 | lib = Pathname.new(ARGV[1])
 830 | lib.mkpath
 831 | (lib + 'dependencies.rb').open("w") do |file|
 832 |   file.write LOADER
 833 | end
 834 | 
 835 | ...............................................................................
 836 | 
 837 | Before, we showed an invocation that looked like this:
 838 | 
 839 | ...............................................................................
 840 | 
 841 | $ mooch init lib/my_project 
 842 | 
 843 | ...............................................................................
 844 | 
 845 | Here, +ARGV[1]+ is 'lib/my_project'.  So, in the preceding code, you can see
 846 | we're building up a path relative to our current working directory and
 847 | then creating a folder structure.  A very cool thing about +Pathname#mkpath+
 848 | is that it works like +mkdir -p+ on *nix: it will create any missing
 849 | parent directories as needed, and it won't complain if the structure
 850 | already exists.  Both behaviors are what we want here.
 851 | 
 852 | Once we build up the directories, we need to create our 'dependencies.rb' file
 853 | and populate it with the string in +LOADER+.  We can see here that Pathname 
 854 | provides shortcuts that work in a similar fashion to +File.open()+.
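
Here is a minimal sketch of both behaviors in isolation.  The +write_stub+ helper and the file contents are invented for illustration; only +mkpath+, +++, +open+, and +read+ come from +Pathname+ itself.

```ruby
require "pathname"
require "tmpdir"

# Hypothetical helper: creates dir (and any missing parents), writes
# contents to file_name inside it, and returns the file's Pathname.
def write_stub(dir, file_name, contents)
  dir = Pathname.new(dir)
  dir.mkpath                          # like mkdir -p; fine if it already exists
  target = dir + file_name
  target.open("w") { |file| file.write(contents) }
  target
end

Dir.mktmpdir do |root|
  stub = write_stub("#{root}/lib/my_project", "dependencies.rb", "# stub\n")
  write_stub("#{root}/lib/my_project", "dependencies.rb", "# stub\n") # safe to repeat
  puts stub.read
end
```

Running the helper twice against the same directory is deliberate: the second +mkpath+ call finds the structure in place and simply moves on.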
 855 | 
 856 | In the code that actually downloads and vendors libraries from GitHub, 
 857 | we see the same techniques in use yet again, this time mixed in with some 
 858 | shell commands and +Dir.chdir+.  Since this doesn't introduce anything new,
 859 | we can skip over the details.
 860 | 
 861 | Before we move on to discussing temporary files, we'll take a quick look
 862 | at +FileUtils+.  The purpose of this module is to provide a UNIX-like interface 
 863 | to file manipulation tasks, and a quick look at its method list will show
 864 | that it does a good job of this:
 865 | 
 866 | ...............................................................................
 867 | 
 868 | cd(dir, options)
 869 | cd(dir, options) {|dir| .... }
 870 | pwd()
 871 | mkdir(dir, options)
 872 | mkdir(list, options)
 873 | mkdir_p(dir, options)
 874 | mkdir_p(list, options)
 875 | rmdir(dir, options)
 876 | rmdir(list, options)
 877 | ln(old, new, options)
 878 | ln(list, destdir, options)
 879 | ln_s(old, new, options)
 880 | ln_s(list, destdir, options)
 881 | ln_sf(src, dest, options)
 882 | cp(src, dest, options)
 883 | cp(list, dir, options)
 884 | cp_r(src, dest, options)
 885 | cp_r(list, dir, options)
 886 | mv(src, dest, options)
 887 | mv(list, dir, options)
 888 | rm(list, options)
 889 | rm_r(list, options)
 890 | rm_rf(list, options)
 891 | install(src, dest, mode = <src's>, options)
 892 | chmod(mode, list, options)
 893 | chmod_R(mode, list, options)
 894 | chown(user, group, list, options)
 895 | chown_R(user, group, list, options)
 896 | touch(list, options)
 897 | 
 898 | ...............................................................................
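
To give a feel for how these methods read in practice, here is a short sketch that strings a few of them together inside a scratch directory.  The +archive+ helper and the file names are made up for this example.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: moves source into archive_dir, creating the
# directory first, and returns the new path.
def archive(source, archive_dir)
  FileUtils.mkdir_p(archive_dir)              # mkdir -p
  FileUtils.mv(source, archive_dir)           # mv source archive_dir/
  File.join(archive_dir, File.basename(source))
end

Dir.mktmpdir do |root|
  FileUtils.cd(root) do                       # cd, undone when the block ends
    FileUtils.touch("report.txt")             # touch
    FileUtils.cp("report.txt", "report.bak")  # cp
    puts archive("report.txt", "old/2008")
    FileUtils.rm_rf("old")                    # rm -rf
  end
end
```

Anyone who has written a shell script should be able to read this without reaching for the documentation, which is the point of the module's naming scheme.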
 899 | 
 900 | You'll see a bit more of +FileUtils+ later on in the chapter when we talk about
 901 | atomic saves.  But before we jump into advanced file management techniques,
 902 | let's take a quick look at another important foundational tool, the tempfile
 903 | standard library.
 904 | 
 905 | === The tempfile Standard Library ===
 906 | 
 907 | Producing temporary files is a common need in many applications.  Whether
 908 | you need to store some data on disk to keep it out of memory until it
 909 | is needed again, or you want to serve up a file but don't need to keep it
 910 | lurking around after your process has terminated, odds are you'll run into
 911 | this problem sooner or later.
 912 | 
 913 | It's quite tempting to roll our own +Tempfile+ support, which might look
 914 | something like the following code:
 915 | 
 916 | ...............................................................................
 917 | 
 918 | File.open("/tmp/foo.txt","w") do |file|
 919 |   file << some_data
 920 | end
 921 | 
 922 | # Then in some later code
 923 | 
 924 | File.foreach("/tmp/foo.txt") do |line|
 925 |   # do something with data
 926 | end
 927 | 
 928 | # Then finally
 929 | require "fileutils"
 930 | FileUtils.rm("/tmp/foo.txt")
 931 | 
 932 | ...............................................................................
 933 |   
 934 | This code works, but it has some drawbacks.  The first is that it assumes
 935 | that you're on a *nix system with a '/tmp' directory.  Secondly, we
 936 | don't do anything to avoid file collisions, so if another application is
 937 | using '/tmp/foo.txt', this will overwrite it.  Finally, we need to explicitly
 938 | remove the file, or risk leaving a bunch of trash around.
 939 | 
 940 | Luckily, Ruby has a standard library that helps us get around these issues.  
 941 | Using it, our example then looks like this:
 942 | 
 943 | ...............................................................................
 944 | 
 945 | require "tempfile"
 946 | temp = Tempfile.new("foo.txt")
 947 | temp << some_data
 948 | 
 949 | # then in some later code
 950 | temp.rewind
 951 | temp.each do |line|
 952 |   # do something with data
 953 | end
 954 | 
 955 | # Then finally
 956 | temp.close
 957 | 
 958 | ...............................................................................
 959 |   
 960 | Let's take a look at what's going on in a little more detail, to really get
 961 | a sense of what the +tempfile+ library is doing for us.
 962 |   
 963 | ==== Automatic Temporary Directory Handling ====
 964 |   
 965 | The code looks somewhat similar to our original example, as we're still 
 966 | essentially working with an IO object.  However, the approach is different. 
 967 | +Tempfile+ opens up a file handle for us to a file that is stored in whatever 
 968 | your system's tempdir is.  We can inspect this value, and even change it if we
 969 | need to.  Here's what it looks like on two of my systems:
 970 | 
 971 | ...............................................................................
 972 | 
 973 | >> Dir.tmpdir
 974 | => "/var/folders/yH/yHvUeP-oFYamIyTmRPPoKE+++TI/-Tmp-"
 975 | 
 976 | >> Dir.tmpdir
 977 | => "/tmp"
 978 | 
 979 | ...............................................................................
 980 | 
 981 | Usually, it's best to go with whatever this value is because it is where Ruby
 982 | thinks your temp files should go.  However, in the cases where we want to
 983 | control this ourselves, it is simple to do so, as shown in the following:
 984 | 
 985 | ...............................................................................
 986 | 
 987 | temp = Tempfile.new("foo.txt", "path/to/my/tmpdir")
 988 | 
 989 | ...............................................................................
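
If you want to confirm where your temp files are landing, a quick check along these lines will do.  This is a sketch only; +tempfile_dir+ is a name invented for the example.

```ruby
require "tempfile"
require "tmpdir"

# Hypothetical helper: reports the directory a new Tempfile lands in,
# optionally overriding the system default.
def tempfile_dir(dir = Dir.tmpdir)
  temp = Tempfile.new("foo.txt", dir)
  File.dirname(temp.path)
ensure
  temp.close! if temp                 # remove the scratch file right away
end

puts tempfile_dir                               # the system default
Dir.mktmpdir { |dir| puts tempfile_dir(dir) }   # a directory we chose ourselves
```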
 990 | 
 991 | ==== Collision Avoidance ====
 992 | 
 993 | When you create a temporary file with +Tempfile.new+, you aren't actually
 994 | specifying an exact filename.  Instead, the filename you specify is used
 995 | as a base name and then gets a unique identifier appended to it.  This
 996 | prevents one temp file from accidentally overwriting another.  Here's a
 997 | trivial example that shows what's going on under the hood:
 998 | 
 999 | ...............................................................................
1000 | 
1001 | >> a = Tempfile.new("foo.txt")
1002 | => #<File:/tmp/foo.txt.2021.0>
1003 | >> b = Tempfile.new("foo.txt")
1004 | => #<File:/tmp/foo.txt.2021.1>
1005 | >> a.path
1006 | => "/tmp/foo.txt.2021.0"
1007 | >> b.path
1008 | => "/tmp/foo.txt.2021.1"
1009 | 
1010 | ...............................................................................
1011 | 
1012 | Allowing Ruby to handle collision avoidance is generally a good thing, 
1013 | especially since we don't normally care about the exact names of our temp files.
1014 | Of course, we can always rename the file if we need to store it somewhere
1015 | permanently, as you saw in the case study.
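
The exact naming scheme differs between Ruby versions, so it is safer to rely on the guarantee than on the format.  The following sketch checks the guarantee directly; +paths_for+ is a helper invented for this example.  It also shows the array form of the base name, which keeps a file extension at the end of the generated name.

```ruby
require "tempfile"

# Hypothetical helper: creates two temp files from the same base name
# and returns their paths.
def paths_for(basename)
  a = Tempfile.new(basename)
  b = Tempfile.new(basename)
  [a.path, b.path]
ensure
  [a, b].each { |temp| temp.close! if temp }
end

first, second = paths_for("foo.txt")
puts first != second                  # the two names do not collide

# Passing [prefix, suffix] keeps the extension at the end of the name.
with_ext, _ = paths_for(["foo", ".txt"])
puts File.extname(with_ext)
```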
1016 | 
1017 | ==== Same Old I/O Operations ====
1018 | 
1019 | Because we're dealing with an object that delegates most of its functionality
1020 | directly to +File+, we can use normal +File+ methods, as shown in our example.
1021 | 
1022 | For this reason, we can write to our file handle as expected:
1023 | 
1024 | ...............................................................................
1025 | 
1026 | temp << some_data
1027 | 
1028 | ...............................................................................
1029 |   
1030 | And read from it in a similar fashion:
1031 | 
1032 | ...............................................................................
1033 | 
1034 | # then in some later code
1035 | temp.rewind
1036 | temp.each do |line|
1037 |   # do something with data
1038 | end
1039 | 
1040 | ...............................................................................
1041 |   
1042 | Because we leave the file handle open, we need to rewind it to point back
1043 | at the beginning of the file rather than the end. Beyond that, the 
1044 | behavior is exactly the same as +File#each+.
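
The effect of the rewind is easy to see in a small experiment.  In this sketch, +read_back+ is a made-up helper that writes some lines, optionally rewinds, and then collects whatever can still be read from the handle.

```ruby
require "tempfile"

# Hypothetical helper: writes lines to a temp file and reads them back
# through the same handle, with or without rewinding first.
def read_back(lines, rewind = true)
  temp = Tempfile.new("read_back")
  lines.each { |line| temp.puts(line) }
  temp.rewind if rewind
  temp.map { |line| line.chomp }      # Enumerable methods work as on File
ensure
  temp.close! if temp
end

p read_back(%w[one two three])           # every line comes back
p read_back(%w[one two three], false)    # nothing left to read at the end
```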
1045 | 
1046 | ==== Automatic Unlinking ====
1047 | 
1048 | +Tempfile+ cleans up after itself.  There are two main ways of unlinking a file,
1049 | and which one to use depends on your needs.  Simply closing the file handle
1050 | is good enough, and it is what we use in our example:
1051 | 
1052 | ...............................................................................
1053 | 
1054 | temp.close
1055 | 
1056 | ...............................................................................
1057 |   
1058 | In this case, Ruby doesn't remove the temporary file right away.  Instead,
1059 | it will keep it around until all references to +temp+ have been garbage 
1060 | collected.  For this reason, if keeping lots of open file handles around is
1061 | a problem for you, you can actually close your handles without fear of losing
1062 | your tempfile so long as you keep a reference to it handy.
1063 | 
1064 | However, in other situations, you may want to purge the file as soon as it
1065 | has been closed.  The change to make this happen is trivial:
1066 | 
1067 | ...............................................................................
1068 | 
1069 | temp.close!
1070 | 
1071 | ...............................................................................
1072 |   
1073 | Finally, if you need to explicitly delete a file that has already been 
1074 | closed, you can just use the following:
1075 | 
1076 | ...............................................................................
1077 |  
1078 | temp.unlink
1079 | 
1080 | ...............................................................................
1081 |   
1082 | In practice, you don't need to think about this in most cases.
1083 | Instead, +Tempfile+ works as you might expect, keeping your files around while you
1084 | need them and cleaning up after itself when it needs to.  If you forget to
1085 | close a temporary file explicitly, it'll be unlinked when the process exits.  For
1086 | these reasons, using the 'tempfile' library is often a better choice than
1087 | rolling your own solution.
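
If you would like to see the difference between closing and unlinking for yourself, the following sketch checks the disk after each step.  The +unlink_demo+ method is invented for this example, and it assumes a system where a closed file can be removed without trouble.

```ruby
require "tempfile"

# Reports whether the file on disk still exists after each step.
def unlink_demo
  temp = Tempfile.new("demo")
  path = temp.path                    # remember it; unlinking clears temp.path

  temp.close                          # closes the handle, keeps the file
  after_close = File.exist?(path)

  temp.unlink                         # removes the file explicitly
  after_unlink = File.exist?(path)

  { :after_close => after_close, :after_unlink => after_unlink }
end

p unlink_demo
```

Calling +close!+ instead would collapse the two steps into one.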
1088 | 
1089 | There is more to be said about this very cool library, but what we've already
1090 | discussed covers most of what you'll need day to day.  Now is a fine time to
1091 | go over what's been said and move on to the next thing.
1092 | 
1093 | We've gone over some of the tools Ruby provides for working with
1094 | your filesystem in a platform-agnostic way, and we're about to get into some 
1095 | more advanced strategies for managing, processing, and manipulating your 
1096 | files and their contents.  However, before we do that, let's review the
1097 | key points about working with your filesystem and with tempfiles:
1098 | 
1099 |   * There are a whole slew of options for file management in Ruby, including
1100 |     `FileUtils`, `Dir`, and `Pathname`, with some overlap between them.
1101 |     
1102 |   * `Pathname` provides a high level, modern Ruby interface to managing files
1103 |     and traversing your file system.
1104 |     
1105 |   * `FileUtils` provides a *nix style API to file management tools, but works
1106 |     just fine on any system, making it quite useful for porting shell scripts
1107 |     to Ruby.
1108 |     
1109 |   * The tempfile standard library provides a convenient IO-like class for
1110 |     dealing with tempfiles in a system independent way. 
1111 |     
1112 |   * The tempfile library also helps make things easier through things like
1113 |     name collision avoidance, automatic file unlinking, and other niceties.
1114 |     
1115 | With these things in mind, we'll see more of the techniques shown in this
1116 | section later on in the chapter.  But if you're bored with the basics, now
1117 | is the time to look at higher level strategies for doing common I/O tasks.
1118 | 
1119 | === Text Processing Strategies ===
1120 | 
1121 | Ruby makes basic I/O operations dead simple, but this doesn't mean it's a bad
1122 | idea to pick up and apply some general approaches to text processing.  Here 
1123 | we'll talk about two techniques that most programmers doing file processing 
1124 | will want to know about, and show what they look like in Ruby.
1125 | 
1126 | ==== Advanced Line Processing ====
1127 | 
1128 | The case study for this chapter showed the most common use of +File.foreach()+,
1129 | but there is more to be said about this approach.  This section will
1130 | highlight a couple of tricks worth knowing about when doing line by line 
1131 | processing.
1132 | 
1133 | ===== Using Enumerator =====
1134 | 
1135 | In the following example, we will show code which extracts and sums the totals
1136 | found in a file that has entries similar to the ones below:
1137 | 
1138 | ...............................................................................
1139 | 
1140 | some 
1141 | lines
1142 | of
1143 | text
1144 | total: 12
1145 | 
1146 | other
1147 | lines
1148 | of
1149 | text
1150 | total: 16
1151 | 
1152 | more
1153 | text
1154 | total: 3
1155 | 
1156 | ...............................................................................
1157 |   
1158 | The following code shows how to do this without loading the whole file into 
1159 | memory:
1160 | 
1161 | ...............................................................................
1162 | 
1163 | sum = 0
1164 | File.foreach("data.txt") { |line| sum += line[/total: (\d+)/,1].to_f }
1165 | 
1166 | ...............................................................................
1167 | 
1168 | Here, we are using `File.foreach` as a direct iterator, building up our sum as
1169 | we go.  However, because `foreach()` returns an `Enumerator` when called without
1170 | a block, we can write this in a cleaner way without sacrificing efficiency:
1171 | 
1172 | ...............................................................................
1173 | 
1174 | enum = File.foreach("data.txt")
1175 | sum = enum.inject(0) { |s,r| s + r[/total: (\d+)/,1].to_f }
1176 | 
1177 | ...............................................................................
1178 | 
1179 | The primary difference between the two approaches is that when you use 
1180 | `File.foreach` directly with a block, you are simply iterating line by line 
1181 | over the file, whereas `Enumerator` gives you some more powerful ways of 
1182 | processing your data.
1183 | 
1184 | When we work with arrays, we don't usually write code like this:
1185 | 
1186 | ...............................................................................
1187 | 
1188 | sum = 0
1189 | arr.each { |e| sum += e }
1190 | 
1191 | ...............................................................................
1192 | 
1193 | Instead, we typically let Ruby do more of the work for us:
1194 | 
1195 | ...............................................................................
1196 |   
1197 | sum = arr.inject(0) { |s,e| s + e }
1198 | 
1199 | ...............................................................................
1200 |   
1201 | For this reason, we should do the same thing with files.  If we have an
1202 | `Enumerable` method we want to use to transform or process a file, we should
1203 | use the enumerator provided by +File.foreach()+ rather than try to do our
1204 | processing within the block.  This will allow us to leverage the power
1205 | behind Ruby's `Enumerable` module rather than doing the heavy lifting ourselves.
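
As a rough sketch of where this leads, here is the same summing task with a filtering step added, followed by an early exit using +Enumerable#first+.  The +sum_totals+ helper and the sample data are made up for illustration, and the helper uses +to_i+ rather than +to_f+ so that the result is an integer.

```ruby
require "tempfile"

# Hypothetical helper: sums every "total: N" line in the file at path.
def sum_totals(path)
  totals = File.foreach(path).grep(/^total: \d+/)   # only the total lines
  totals.inject(0) { |sum, line| sum + line[/\d+/].to_i }
end

data = Tempfile.new("data.txt")
data << "some\nlines\ntotal: 12\n\nother\ntotal: 16\n\nmore\ntotal: 3\n"
data.close

puts sum_totals(data.path)
p File.foreach(data.path).first(2)    # stops reading after two lines
data.unlink
```

Note that +grep+ builds an array, but only of the lines that match, so the whole file still never sits in memory at once.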
1206 | 
1207 | ===== Tracking Line Numbers =====
1208 | 
1209 | If you're interested in certain line numbers, there is no need to maintain
1210 | a manual counter.  You simply need to create a file handle to work with,
1211 | and then make use of the +File#lineno+ method.  To illustrate this, we can
1212 | very easily implement the UNIX command +head+:
1213 | 
1214 | ...............................................................................
1215 | 
1216 | def head(file_name,max_lines = 10)
1217 |   File.open(file_name) do |file|
1218 |     file.each do |line|
1219 |       puts line
1220 |       break if file.lineno == max_lines
1221 |     end
1222 |   end
1223 | end
1224 | 
1225 | ...............................................................................
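
The same method is handy whenever you need to report where something was found.  As a sketch, here is a rough stand-in for +grep -n+; the +matching_lines+ name is invented for this example.

```ruby
# Hypothetical helper: returns [line_number, text] pairs for each line of
# the file at path that matches pattern, much like grep -n.
def matching_lines(path, pattern)
  File.open(path) do |file|
    file.each_with_object([]) do |line, hits|
      # lineno reflects the line that was just read
      hits << [file.lineno, line.chomp] if line =~ pattern
    end
  end
end
```

Calling +matching_lines("foo.rb", /def /)+ on a Ruby source file, for instance, would list each method definition alongside its line number.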
1226 | 
1227 | For a more interesting use case, we can consider a file that is formatted
1228 | in two-line pairs, where the first line is a key and the second is a value:
1229 | 
1230 | ...............................................................................
1231 | 
1232 | first name
1233 | gregory
1234 | last name
1235 | brown
1236 | email
1237 | gregory.t.brown@gmail.com
1238 | 
1239 | ...............................................................................
1240 | 
1241 | Using +File#lineno+, this is trivial to process:
1242 | 
1243 | ...............................................................................
1244 | 
1245 | keys   = []
1246 | values = []
1247 | 
1248 | File.open("foo.txt") do |file|
1249 |   file.each do |line|
1250 |     (file.lineno.odd? ? keys : values) << line.chomp
1251 |   end
1252 | end
1253 | 
1254 | Hash[*keys.zip(values).flatten] 
1255 | 
1256 | ...............................................................................
1257 |  
1258 | The result of this code is a simple hash, as you might expect:
1259 | 
1260 | ...............................................................................
1261 | 
1262 |  { "first name" => "gregory", 
1263 |    "last name"  => "brown", 
1264 |    "email"      => "gregory.t.brown@gmail.com" }
1265 | 
1266 | ...............................................................................
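
For comparison, the enumerator returned by +File.foreach+ offers another route to the same hash.  The following is a sketch of that alternative rather than a replacement for the code above; +pairs_to_hash+ is a made-up name.

```ruby
# Hypothetical helper: builds a hash from a file of alternating key and
# value lines by walking the lines two at a time.
def pairs_to_hash(path)
  pairs = File.foreach(path).each_slice(2).map do |key, value|
    [key.chomp, value.to_s.chomp]     # value is nil if the last key has no value
  end
  Hash[pairs]
end
```

Which version reads better is a matter of taste: +File#lineno+ keeps the logic explicit, while +each_slice+ lets +Enumerable+ do the pairing.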
1267 |    
1268 | Though there is probably more we can say about iterating over files line
1269 | by line, this should get you well on your way.  For now, there are other
1270 | important I/O strategies to investigate, so we'll keep moving.
1271 | 
1272 | ==== Atomic Saves ====
1273 | 
1274 | Although many file processing scripts can happily read in one file as input
1275 | and produce another as output, sometimes we want to be able to do 
1276 | transformations directly on a single file.  This isn't hard in practice,
1277 | but it's a little bit less obvious than you might think.
1278 | 
1279 | It is technically possible to rewrite parts of a file using the `"r+"` file
1280 | mode, but in practice, this can be unwieldy in most cases.  An alternative 
1281 | approach is to load the entire contents of a file into memory, manipulate 
1282 | the string, and then overwrite the original file.  However, this approach
1283 | is wasteful, and is not the best way to go in most cases.
1284 | 
1285 | As it turns out, there is a simple solution to this problem, and that is to simply
1286 | work around it.  Rather than trying to make direct changes to a file, or
1287 | store a string in memory and then write it back out to the same file after
1288 | manipulation, we can instead make use of a temporary file and do line by
1289 | line processing as normal.  When we finish the job, we can rename our temp
1290 | file so as to replace the original.  Using this approach, we can easily
1291 | make a backup of the original file if necessary, and also roll back changes
1292 | upon error.
1293 | 
1294 | Let's take a quick look at an example that demonstrates this general 
1295 | strategy.  We'll build a script that strips comments from Ruby files,
1296 | allowing us to take source code such as this:
1297 | 
1298 | ...............................................................................
1299 | 
1300 | # The best class ever 
1301 | # Anywhere in the world 
1302 | class Foo              
1303 |        
1304 |   # A useless comment 
1305 |   def a 
1306 |      true 
1307 |   end               
1308 | 
1309 |   #Another Useless comment 
1310 |   def b 
1311 |     false 
1312 |   end  
1313 | 
1314 | end
1315 | 
1316 | ...............................................................................
1317 | 
1318 | And turn it into comment-free code such as this:
1319 | 
1320 | ...............................................................................
1321 | 
1322 | class Foo              
1323 |        
1324 |   def a 
1325 |      true 
1326 |   end               
1327 | 
1328 |   def b 
1329 |     false 
1330 |   end  
1331 | 
1332 | end
1333 | 
1334 | ...............................................................................
1335 | 
1336 | With the help of Ruby's 'tempfile' and 'fileutils' standard libraries, this task
1337 | is trivial:
1338 | 
1339 | ...............................................................................
1340 | 
1341 | require "tempfile" 
1342 | require "fileutils" 
1343 | temp = Tempfile.new("working")      
1344 | File.foreach(ARGV[0]) do |line| 
1345 |   temp << line unless line =~ /^\s*#/ 
1346 | end                               
1347 |                                
1348 | temp.close 
1349 | FileUtils.mv(temp.path,ARGV[0])
1350 | 
1351 | ...............................................................................
1352 | 
1353 | We initialize a new +Tempfile+ object and then iterate over the file
1354 | specified on the command line.  We append each line to the +Tempfile+, so long
1355 | as it is not a comment line.  This is the first part of our task:
1356 | 
1357 | ...............................................................................
1358 | 
1359 | temp = Tempfile.new("working")      
1360 | File.foreach(ARGV[0]) do |line| 
1361 |   temp << line unless line =~ /^\s*#/ 
1362 | end                               
1363 |                              
1364 | temp.close
1365 | 
1366 | ...............................................................................
1367 | 
1368 | Once we've written our +Tempfile+ and closed the file handle, we then use
1369 | +FileUtils+ to rename it and replace the original file we were working on:
1370 | 
1371 | ...............................................................................
1372 | 
1373 | FileUtils.mv(temp.path,ARGV[0])
1374 | 
1375 | ...............................................................................
1376 |   
1377 | In two steps, we've efficiently modified a file without loading it entirely
1378 | into memory or dealing with the complexities of using the 'r+' file mode.
1379 | In many cases, the simple approach shown here will be enough. 
1380 | 
1381 | Of course, because you are modifying a file in place, a poorly coded script
1382 | could risk destroying your input file.  For this reason, you might want
1383 | to make a backup of your file.  This can be done trivially with +FileUtils.cp+,
1384 | as shown in the following reworked version of our example:
1385 | 
1386 | ...............................................................................
1387 | 
1388 | require "tempfile" 
1389 | require "fileutils"
1390 | 
1391 | temp = Tempfile.new("working")      
1392 | File.foreach(ARGV[0]) do |line| 
1393 |   temp << line unless line =~ /^\s*#/ 
1394 | end                               
1395 |                              
1396 | temp.close 
1397 | FileUtils.cp(ARGV[0],"#{ARGV[0]}.bak") 
1398 | FileUtils.mv(temp.path,ARGV[0])
1399 | 
1400 | ...............................................................................
1401 | 
1402 | This code only makes a backup of the original file if the temp file is 
1403 | successfully populated, which prevents it from producing garbage during 
1404 | testing.
1405 | 
1406 | Sometimes it will make sense to do backups; other times, it won't be
1407 | essential.  Of course, it's better to be safe than sorry, so if you're in
1408 | doubt, just add the extra line of code for a bit more peace of mind.
1409 | 
1410 | The two strategies shown in this section will come up in practice again and 
1411 | again for those doing frequent text processing.  They can even be used in
1412 | combination when needed.
1413 | 
1414 | We're about to close our discussion on this topic, but before we do that,
1415 | it's worth mentioning the following reminders:
1416 | 
1417 |   * When doing line-based file processing, +File.foreach+ can be used as an
1418 |     +Enumerator+, unlocking the power of +Enumerable+.  This provides an 
1419 |     extremely handy way to search, traverse, and manipulate files without
1420 |     sacrificing efficiency.
1421 |     
1422 |   * If you need to keep track of which line of a file you are on while you are
1423 |     iterating over it, you can use +File#lineno+ rather than incrementing your
1424 |     own counter.
1425 |     
1426 |   * When doing atomic saves, the tempfile standard library can be used to 
1427 |     avoid unnecessary clutter.
1428 |    
1429 |   * Be sure to test any code that does atomic saves thoroughly, as there is
1430 |     real risk of destroying your original source files if backups are not made.
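
As a rough illustration of the first two reminders, the sketch below writes a
small throwaway file (its name and contents are made up for the example),
filters it through the +Enumerator+ that a block-less +File.foreach+ returns,
and then reads it again while consulting +lineno+ instead of a hand-rolled
counter:

```ruby
require "tempfile"

# Throwaway input so the example is self-contained (contents are made up)
sample = Tempfile.new("sample")
sample << "# a comment\nfirst\n  # indented comment\nsecond\n"
sample.close

# Without a block, File.foreach returns an Enumerator, so Enumerable
# methods such as reject can be chained directly onto it
code_lines = File.foreach(sample.path).reject { |line| line =~ /^\s*#/ }

# lineno reports how many lines have been read from the handle so far
numbered = []
File.open(sample.path) do |file|
  file.each_line { |line| numbered << [file.lineno, line.chomp] }
end

p code_lines     # ["first\n", "second\n"]
p numbered.last  # [4, "second"]
```

Both passes still read the file one line at a time, so neither loads the whole
thing into memory.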
1431 |    
1432 | 
1433 | === Conclusions ===
1434 | 
1435 | When dealing with text processing and file management in Ruby, there are
1436 | a few things to keep in mind.  Most of the pitfalls you can run into while
1437 | doing this sort of work tend to have to do with either performance, 
1438 | platform-dependence, or code that doesn't clean up after itself.
1439 | 
1440 | In this chapter, we talked about a few standard libraries that can help
1441 | keep things clean and platform independent.  Though Ruby is a fine language
1442 | to write shell scripts in, there is often no need to resort to code that
1443 | will run only on certain machines when a pure Ruby solution is just as clean.
1444 | For this reason, using libraries such as +Tempfile+, +Pathname+, and +FileUtils+
1445 | will go a seriously long way towards keeping your code portable and 
1446 | maintainable down the line.
1447 | 
1448 | For issues of performance, you can almost always squeeze out extra speed
1449 | and minimize your memory footprint by processing your data line by line
1450 | rather than slurping everything into a single string.  You can also much
1451 | more effectively find a needle in the haystack if you form well-crafted
1452 | regular expressions that don't make Ruby work too hard.  The techniques
1453 | we've shown here serve as reminders about common mistakes even seasoned
1454 | Rubyists tend to make, and provide good ways around them.
1455 | 
1456 | Text processing and file management can quickly become complex, but with a
1457 | solid grasp of the fundamental strategies, you can use Ruby as an extremely
1458 | powerful tool that works faster and more effectively than you might imagine.
1459 | 


--------------------------------------------------------------------------------
/manuscript/unmerged/ruby_worst_practices.txt:
--------------------------------------------------------------------------------
  1 | == Appendix C: Ruby Worst Practices ==
  2 | 
  3 | If you've read through most of this book, you'll notice that it didn't have much
  4 | of a "Do this, not that" theme.  Ruby as a language doesn't fit well into that
  5 | framework, since there are always exceptions to any rule you can come up with.
  6 | 
  7 | However, there are certainly a few things you really shouldn't do, unless you
  8 | know exactly why you are doing them.   This appendix is meant to cover a handful
  9 | of those scenarios and show you some better alternatives.  I've done my best to
 10 | stick to issues I've been bitten by myself, in the hopes that I can offer some
 11 | practical advice for problems you might actually have run into.
 12 | 
 13 | A bad practice in programming shouldn't be defined simply by some vague
 14 | aesthetic imposed upon folks by the "experts".  Instead, we can often track
 15 | anti-patterns in code down to either flaws in the high-level design of an
 16 | object-oriented system, or failed attempts at cleverness in the underlying
 17 | feature implementations.  These bits of unsavory code produced by bad habits or
 18 | the misunderstanding of certain Ruby peculiarities can be a drag on your whole
 19 | project, creating substantial technical debt as they accumulate.
 20 | 
 21 | We'll start with the high level design issues and then move on to the common
 22 | sticking points when implementing tricky Ruby features.  Making an improvement
 23 | to even a couple of these problem areas will make a major difference, so even if
 24 | you already know about most of these pitfalls, you might find one or two tips that
 25 | will go a long way.
 26 | 
 27 | === Not-so Intelligent Design ===
 28 | 
 29 | Well designed object oriented systems can be a dream to work with.  When every
 30 | component seems to fit together nicely, with clear, simple integration code
 31 | between the major subsystems, you get the feeling that the architecture is
 32 | working for you, and not against you.
 33 | 
 34 | If you're not careful, all of this can come crashing down.  Let's look at a few
 35 | things to watch out for, and how to get around them.
 36 | 
 37 | ==== Class Variables Considered Harmful ====
 38 | 
 39 | Ruby's class variables are one of the easiest ways to break encapsulation and
 40 | create headaches for yourself when designing class hierarchies.  To demonstrate the
 41 | problem, I'll show an example where class variables were tempting but ultimately
 42 | the wrong solution.
 43 | 
 44 | In my abstract formatting library +Fatty+, I provide a formatter base class
 45 | which users must inherit from to make use of the system.  This provides
 46 | helpers which build up anonymous classes for certain formats.   To get a sense
 47 | of what this looks like, take a look at this example:
 48 | 
 49 | ..............................................................................
 50 | 
 51 | class Hello < FattyRBP::Formatter 
 52 |   format :text do
 53 |     def render
 54 |       "Hello World"
 55 |     end
 56 |   end
 57 | 
 58 |   format :html do
 59 |     def render
 60 |       "<b>Hello World</b>"
 61 |     end
 62 |   end
 63 | end
 64 | 
 65 | puts Hello.render(:text) #=> "Hello World"
 66 | puts Hello.render(:html) #=> "<b>Hello World</b>"
 67 | 
 68 | ..................................................................................
 69 | 
 70 | Though we've omitted most of the actual functionality +Fatty+ provides, a simple
 71 | implementation of this system using class variables might look like this:
 72 | 
 73 | ..................................................................................
 74 | 
 75 | module FattyRBP
 76 |   class Formatter
 77 |     @@formats = {}
 78 | 
 79 |     def self.format(name, options={}, &block)
 80 |       @@formats[name] = Class.new(FattyRBP::Format, &block)
 81 |     end
 82 | 
 83 |     def self.render(format, options={})
 84 |       @@formats[format].new(options).render
 85 |     end
 86 |   end
 87 | 
 88 |   class Format
 89 |     def initialize(options)
 90 |       # not important
 91 |     end
 92 | 
 93 |     def render
 94 |       raise NotImplementedError
 95 |     end
 96 |   end 
 97 | 
 98 | end
 99 | 
100 | ..................................................................................
101 | 
102 | This code will make the example we showed earlier work as advertised.  Now let's
103 | see what happens when we add another subclass into the mix.
104 | 
105 | ..................................................................................
106 | 
107 | class Goodbye < FattyRBP::Formatter
108 |   format :text do
109 |     def render
110 |       "Goodbye Cruel World!"
111 |     end
112 |   end
113 | end
114 | 
115 | puts Goodbye.render(:text) #=> "Goodbye Cruel World!"
116 | 
117 | ..................................................................................
118 | 
119 | At a first glance, things seem to be working.  But if we dig deeper, we see two
120 | problems:
121 | 
122 | ..................................................................................
123 | 
124 | # Should not have changed
125 | puts Hello.render(:text) #=> "Goodbye Cruel World!" 
126 | 
127 | # Shouldn't exist
128 | puts Goodbye.render(:html) #=> "<b>Hello World</b>"
129 | 
130 | ..................................................................................
131 | 
132 | And here, we see the problem with class variables.  If we think of them as
133 | class-level state, we'd be wrong.  They are actually class-hierarchy variables
134 | that can have their state modified by any subclass, whether direct or many
135 | levels down the ancestry chain.  This means they're fairly close to
136 | global state in nature, which is usually a bad thing.  So unless you were
137 | actually counting on this behavior, an easy fix is to just dump class variables
138 | and use class instance variables instead.
139 | 
140 | ..................................................................................
141 | 
142 | module FattyRBP
143 |   class Formatter
144 | 
145 |     def self.formats
146 |       @formats ||= {}
147 |     end
148 | 
149 |     def self.format(name, options={}, &block)
150 |       formats[name] = Class.new(FattyRBP::Format, &block)
151 |     end
152 | 
153 |     def self.render(format, options={})
154 |       formats[format].new(options).render
155 |     end
156 |   end
157 | 
158 |   class Format
159 |     def initialize(options)
160 |       # not important
161 |     end
162 |   end
163 | end
164 | 
165 | ..................................................................................
166 | 
167 | Although this prevents direct access to the variable from instances, it is easy
168 | to define accessors at the class level.  The benefit is that each subclass
169 | carries its own instance variable, just like ordinary objects do.  With this
170 | new code, everything works as expected:
171 | 
172 | ..................................................................................
173 | 
174 | puts Hello.render(:text)   #=> "Hello World"
175 | puts Hello.render(:html)   #=> "<b>Hello World</b>"
176 | puts Goodbye.render(:text) #=> "Goodbye Cruel World!"
177 | 
178 | puts Hello.render(:text)   #=> "Hello World"
179 | puts Goodbye.render(:html) #=> raises an error
180 | 
181 | ..................................................................................
182 | 
183 | So the moral of the story here is that class-level state should be stored in
184 | class instance variables if you want to allow subclassing.  Reserve class
185 | variables for data that needs to be shared across an entire class hierarchy.
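
If you'd like to see the difference without any of the +Fatty+ machinery in
the way, here is a stripped-down sketch.  The +Base+ and +Child+ classes and
their method names are invented purely for illustration:

```ruby
class Base
  @@shared = []          # one object for the entire hierarchy

  def self.shared
    @@shared
  end

  def self.own
    @own ||= []          # a separate object for each class
  end
end

class Child < Base; end

Child.shared << :from_child
Child.own    << :from_child

p Base.shared  # [:from_child]  (the parent sees the child's change)
p Base.own     # []             (the parent's state is untouched)
p Child.own    # [:from_child]
```

The lazily initialized accessor is what gives each class in the tree its own
storage the first time it is asked for.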
186 | 
187 | ==== Hardcoding Yourself into a Corner ====
188 | 
189 | One good practice is to provide alternative constructors for your classes when
190 | there are common configurations that might be generally useful.   One such
191 | example is in +Prawn+, when a user wants to build up a document via a simplified
192 | interface and then immediately render it to file:
193 | 
194 | ..................................................................................
195 | 
196 | Prawn::Document.generate("hello.pdf") do
197 |   text "Hello Prawn!"
198 | end
199 | 
200 | ..................................................................................
201 | 
202 | Implementing this method was very simple, as it simply wraps the constructor and
203 | calls an extra method to render the file afterwards:
204 | 
205 | ..................................................................................
206 | 
207 | module Prawn
208 |   class Document
209 | 
210 |     def self.generate(filename,options={},&block)
211 |       pdf = Prawn::Document.new(options,&block)          
212 |       pdf.render_file(filename)
213 |     end
214 | 
215 |   end
216 | end
217 | 
218 | ..................................................................................
219 | 
220 | However, some months down the line, a bug report made me realize that I had made
221 | a somewhat stupid mistake here.  I accidentally prevented users from being able to 
222 | write code like this:
223 | 
224 | ..................................................................................
225 | 
226 | class MyDocument < Prawn::Document
227 |   def say_hello
228 |     text "Hello MyDocument"
229 |   end
230 | end
231 | 
232 | MyDocument.generate("hello.pdf") do
233 |   say_hello
234 | end
235 | 
236 | ..................................................................................
237 | 
238 | The problem, of course, is that +Prawn::Document.generate+ hardcodes the
239 | constructor call, which prevents subclasses from ever being instantiated via
240 | +generate+.   The fix is so easy, it is somewhat embarrassing to share:
241 | 
242 | ..................................................................................
243 | 
244 | module Prawn
245 |   class Document
246 | 
247 |     def self.generate(filename,options={},&block)
248 |       pdf = new(options,&block)          
249 |       pdf.render_file(filename)
250 |     end
251 | 
252 |   end
253 | end
254 | 
255 | ..................................................................................
256 | 
257 | By removing the explicit receiver, we now construct an object based on whatever
258 | +self+ is, rather than only building up +Prawn::Document+ objects.  This affords
259 | us additional flexibility at virtually no cost.  In fact, because hardcoding the
260 | name of the current class in your method definitions is almost always an
261 | accident, this applies across the board as a good habit to get into.
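
The same effect can be seen in a few lines without +Prawn+ installed.  The
+Report+ and +SalesReport+ classes below are hypothetical stand-ins, and the
two factory methods exist only to put the hardcoded and implicit receivers
side by side:

```ruby
class Report
  # Hardcodes the class name, so subclasses get the wrong type back
  def self.hardcoded
    Report.new
  end

  # Implicit receiver: new is sent to whatever class received the call
  def self.flexible
    new
  end
end

class SalesReport < Report; end

p SalesReport.hardcoded.class  # Report
p SalesReport.flexible.class   # SalesReport
```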
262 | 
263 | Although much less severe, the same thing goes for class method definitions as
264 | well.  Throughout this book, you will see class methods defined using
265 | +def self.my_method+ rather than +def MyClass.my_method+.  The reason for this
266 | is much more about maintainability than it is about style.  To illustrate this,
267 | let's do a simple comparison.  We start off with two boring class definitions
268 | for the classes +A+ and +B+:
269 | 
270 | ..................................................................................
271 | 
272 |   class A
273 |     def self.foo
274 |       # ..
275 |     end
276 | 
277 |     def self.bar
278 |       # ..
279 |     end
280 |   end
281 | 
282 |   class B
283 |     def B.foo
284 |       # ...
285 |     end
286 | 
287 |     def B.bar
288 |       # ...
289 |     end
290 |   end
291 | 
292 | ..................................................................................
293 | 
294 | These two are functionally equivalent, each defining the class methods +foo+ and
295 | +bar+ on their respective classes.  But now, let's refactor our code a bit, 
296 | renaming +A+ to +C+ and +B+ to +D+. Observe the work involved in doing each:
297 | 
298 | ..................................................................................
299 | 
300 |   class C 
301 |     def self.foo
302 |       # ..
303 |     end
304 | 
305 |     def self.bar
306 |       # ..
307 |     end
308 |   end
309 | 
310 |   class D 
311 |     def D.foo
312 |       # ...
313 |     end
314 | 
315 |     def D.bar
316 |       # ...
317 |     end
318 |   end
319 | 
320 | 
321 | ..................................................................................
322 | 
323 | To rename +A+ to +C+, we simply change the name of our class, and we don't need
324 | to touch the method definitions.  But when we change +B+ to +D+, each and every
325 | method needs to be reworked.  While this might be okay for an object with one or
326 | two methods at the class level, you can imagine how tedious this could be when
327 | that number gets larger. 
328 | 
329 | So we've now found two points against hardcoding class names, and could probably 
330 | keep growing the list if we wanted.  But for now, let's move on to some higher
331 | level design issues.
332 | 
333 | ==== When Inheritance Becomes Restrictive ====
334 | 
335 | Inheritance is very nice when your classes have a clear hierarchical
336 | structure between them.  However, it can get in the way when used
337 | inappropriately.  Problems begin to crop up when we try to model cross-cutting
338 | concerns using ordinary inheritance.   For examples of this, it's easy to look
339 | directly into core Ruby.
340 | 
341 | Imagine if +Comparable+ were a class instead of a module.  Then, you would be
342 | writing code like this:
343 | 
344 | ..................................................................................
345 | 
346 | class Person < Comparable
347 | 
348 |   def initialize(first_name, last_name)
349 |     @first_name = first_name
350 |     @last_name  = last_name
351 |   end
352 | 
353 |   attr_reader :first_name, :last_name
354 | 
355 |   def <=>(other_person)
356 |     [last_name, first_name] <=> [other_person.last_name, other_person.first_name]
357 |   end
358 | 
359 | end
360 | 
361 | ..................................................................................
362 | 
363 | However, after seeing this, it becomes clear that it'd be nice to use a +Struct+
364 | here.   If we ignore the features provided by +Comparable+ for a moment,
365 | the benefits of a struct to represent this simple data structure become
366 | obvious.
367 | 
368 | ..................................................................................
369 | 
370 | class Person < Struct.new(:first_name, :last_name)
371 |   def full_name
372 |     "#{first_name} #{last_name}"
373 |   end
374 | end
375 | 
376 | ..................................................................................
377 | 
378 | Because Ruby supports single inheritance only, this example clearly
379 | demonstrates the problems we run into when relying too heavily on hierarchical
380 | structure.  A +Struct+ is certainly not always +Comparable+.   And it is just
381 | plain silly to think of all +Comparable+ objects being +Struct+ objects.  The
382 | key distinction here is that a +Struct+ defines what an object is made up of,
383 | whereas +Comparable+ defines a set of features associated with certain objects.
384 | For this reason, the real Ruby code to accomplish this modeling makes a whole
385 | lot of sense:
386 | 
387 | ..................................................................................
388 | 
389 | class Person < Struct.new(:first_name, :last_name)
390 | 
391 |   include Comparable
392 | 
393 |   def <=>(other_person)
394 |     [last_name, first_name] <=> [other_person.last_name, other_person.first_name]
395 |   end
396 | 
397 |   def full_name
398 |      "#{first_name} #{last_name}"
399 |   end
400 | 
401 | end
402 | 
403 | ..................................................................................
404 | 
405 | Keep in mind that while we are constrained to exactly one superclass, we can 
406 | include as many modules as we'd like.  For this reason, modules are often used
407 | to implement features that are completely orthogonal to the underlying class
408 | definition they are mixed into.   Taking an example from the Ruby API
409 | documentation, we see +Forwardable+ being used to very quickly implement a
410 | simple +Queue+ structure by doing little more than delegating to an underlying
411 | +Array+:
412 | 
413 | ..................................................................................
414 | 
415 | require "forwardable"
416 | 
417 | class Queue
418 |   extend Forwardable
419 | 
420 |   def initialize
421 |     @q = [ ]    
422 |   end
423 | 
424 |   def_delegator :@q, :push, :enq
425 |   def_delegator :@q, :shift, :deq
426 | 
427 |   def_delegators :@q, :clear, :first, :push, :shift, :size
428 | end
429 | 
430 | ..................................................................................
431 | 
432 | Although +Forwardable+ would make no sense anywhere in a class hierarchy, it
433 | accomplishes its task beautifully here.   If we were constrained to a purely
434 | inheritance based model, such cleverness would not be so easy to pull off.
435 | 
436 | The key thing to remember here is not that you should avoid inheritance at all
437 | cost, by any means.  Instead, you should simply remember not to go out of your
438 | way to construct an artificial hierarchical structure to represent cross-cutting
439 | or orthogonal concerns.  It's important to remember that Ruby's core is not
440 | special or magical in its abundant use of mixins, but instead, representative
441 | of a very pragmatic and powerful object model.  You can and should apply this
442 | technique within your own designs, whenever it makes sense to do so.
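
To make the payoff concrete, here is the +Person+ class from earlier put to
work.  The sample people are made up; everything else is plain +Struct+ and
+Comparable+ behavior, so treat this as a quick sketch rather than anything
definitive:

```ruby
class Person < Struct.new(:first_name, :last_name)
  include Comparable

  def <=>(other_person)
    [last_name, first_name] <=> [other_person.last_name, other_person.first_name]
  end

  def full_name
    "#{first_name} #{last_name}"
  end
end

gregory = Person.new("Gregory", "Brown")
arthur  = Person.new("Arthur", "Brown")
jia     = Person.new("Jia", "Adams")     # made-up sample data

p [gregory, arthur, jia].sort.map(&:full_name)
# ["Jia Adams", "Arthur Brown", "Gregory Brown"]
p arthur < gregory                 # true
p arthur.between?(jia, gregory)    # true
p gregory.to_a                     # ["Gregory", "Brown"], courtesy of Struct
```

The superclass supplies the data layout and the module supplies the ordering
behavior, and neither needs to know about the other.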
443 | 
444 | === The Downside of Cleverness ===
445 | 
446 | Ruby lets you do all sorts of clever, fancy tricks.  This cleverness is a big
447 | part of what makes Ruby so elegant, but it also can be downright dangerous in
448 | the wrong hands. To illustrate this, we'll look at the kind of trouble you can
449 | get in if you aren't careful.
450 | 
451 | ==== The evils of +eval()+ ====
452 | 
453 | Throughout this book, we've dynamically evaluated code blocks all over the
454 | place.  However, what you have not seen much of is the use of +eval()+, 
455 | +class_eval()+, or even +instance_eval()+ with a string.  Some might wonder why 
456 | this is, because +eval()+ can be so useful!  For example, imagine that you are exposing 
457 | a way for users to filter through some data.  You would like to be able to support an
458 | interface like this:
459 | 
460 | ..................................................................................
461 | 
462 | user1 = User.new("Gregory Brown", balance: 2500)
463 | user2 = User.new("Arthur Brown", balance: 3300)
464 | user3 = User.new("Steven Brown", balance: 3200)
465 | 
466 | f = Filter.new([user1, user2, user3])
467 | f.search("balance > 3000") #=> [user2, user3]
468 | 
469 | ..................................................................................
470 | 
471 | Armed with +instance_eval+, this task is so easy that you barely bat an eye as you
472 | type out the following code:
473 | 
474 | ................................................................................
475 | 
476 | class User
477 |   def initialize(name, options)
478 |     @name    = name
479 |     @balance = options[:balance]
480 |   end
481 | 
482 |   attr_reader :name, :balance
483 | end
484 | 
485 | class Filter
486 |   def initialize(enum)
487 |     @collection = enum
488 |   end
489 | 
490 |   def search(query)
491 |     @collection.select { |e| e.instance_eval(query) }
492 |   end
493 | end
494 | 
495 | ................................................................................
496 | 
497 | Running the earlier example, you see this code works great, exactly as expected.
498 | But unfortunately, trouble strikes when you see queries like this:
499 | 
500 | ................................................................................
501 | 
502 | >> f.search("@balance = 0")
503 | => [#<User:0x40caa4 @name="Gregory Brown", @balance=0>, 
504 |     #<User:0x409138 @name="Arthur Brown", @balance=0>, 
505 |     #<User:0x402874 @name="Steven Brown", @balance=0>]
506 | 
507 | ................................................................................
508 | 
509 | Or perhaps even scarier:
510 | 
511 | ................................................................................
512 | 
513 | >> f.search("system('touch hacked')")
514 | => [#<User:0x40caa4 @name="Gregory Brown", ...]
515 | >> File.exist?('hacked')
516 | => true
517 | 
518 | ................................................................................
519 | 
520 | Since the ability for user-generated strings to execute arbitrary system
521 | commands or damage the internals of an object isn't exactly appealing, you code
522 | up a regex filter to protect against this:
523 | 
524 | ................................................................................
525 | 
526 | def search(query)
527 |   raise "Invalid query" unless query =~ /^(\w+) ([><!]=?|==) (\d+)$/
528 |   @collection.select { |e| e.instance_eval(query) }
529 | end
530 | 
531 | ................................................................................
532 | 
533 | This protects against the two issues we saw before, which is great:
534 | 
535 | ................................................................................
536 | 
537 | >> f.search("system('touch hacked')")
538 | RuntimeError: Invalid query
539 | 	from (irb):33:in `search'
540 | 	from (irb):38
541 | 	from /Users/sandal/lib/ruby19_1/bin/irb:12:in `<main>'
542 | 
543 | >> f.search("@balance = 0")
544 | RuntimeError: Invalid query
545 | 	from (irb):33:in `search'
546 | 	from (irb):39
547 | 	from /Users/sandal/lib/ruby19_1/bin/irb:12:in `<main>'
548 | 
549 | ................................................................................
550 | 
551 | But if you weren't paying very close attention, you would have missed that we
552 | got our anchors wrong.  That means there's still a hole to be exploited here:
553 | 
554 | ................................................................................
555 | 
556 | >> f.search("balance == 0\nsystem('touch hacked_again')")
557 | => [#<User:0x40caa4 @name="Gregory Brown", @balance=0  ...]
558 | >> File.exist?('hacked_again')
559 | => true
560 | 
561 | ................................................................................
562 | 
563 | Because +^+ and +$+ match at line boundaries (unlike +\A+ and +\z+), our regex
564 | checked only the first line, and we were able to sneak by the validation.  If
565 | you're very careful, you could come up with the right pattern and be reasonably
566 | safe.  But since you are already validating the syntax, why play with fire?  We
567 | can rewrite this code to accomplish the same goals with none of these risks:
568 | 
569 | ................................................................................
570 | 
571 | def search(query)
572 |   data = query.match(/\A(?<attr>\w+) (?<op>[><!]=?|==) (?<val>\d+)\z/)
573 |   @collection.select do |e| 
574 |     attr = e.public_send(data[:attr])
575 |     attr.public_send(data[:op], Integer(data[:val])) 
576 |   end
577 | end
578 | 
579 | ................................................................................
580 | 
581 | Here, we don't expose any of the object's internals, preserving encapsulation.
582 | Because we parse out the individual components of the statement and use
583 | +public_send+ to pass the messages on to our objects, we have completely
584 | eliminated the possibility of arbitrary code execution.  All in all, this code
585 | is much more secure and easier to debug.  As it turns out, this code will
586 | actually perform considerably better as well.
587 | 
588 | Every time you use +eval(string)+, Ruby needs to fire up its parser and tree
589 | walker to execute the code you've embedded in your string.  This means that in
590 | cases in which you just need to process a few values and then do something with
591 | them, using a targeted regular expression is often a much better option, as it 
592 | greatly reduces the amount of work the interpreter needs to do.  
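
For anyone who wants to poke at the string-free approach in one runnable
piece, here is a self-contained sketch.  It is not the exact code from this
section: the +QUERY+ constant, the +\A+ and +\z+ anchors, the slightly tighter
operator list, and the +ArgumentError+ raised on a bad query are all
assumptions added for the sake of the example.

```ruby
class User
  def initialize(name, options)
    @name    = name
    @balance = options[:balance]
  end

  attr_reader :name, :balance
end

class Filter
  # \A and \z anchor the entire string, so trailing lines cannot sneak through
  QUERY = /\A(?<attr>\w+) (?<op>[><]=?|==|!=) (?<val>\d+)\z/

  def initialize(enum)
    @collection = enum
  end

  def search(query)
    data = QUERY.match(query) or raise ArgumentError, "Invalid query"
    @collection.select do |e|
      # Only the captured pieces are ever used; the string is never evaluated
      e.public_send(data[:attr]).public_send(data[:op], Integer(data[:val], 10))
    end
  end
end

users = [User.new("Gregory Brown", balance: 2500),
         User.new("Arthur Brown",  balance: 3300),
         User.new("Steven Brown",  balance: 3200)]
f = Filter.new(users)

p f.search("balance > 3000").map(&:name)  # ["Arthur Brown", "Steven Brown"]

begin
  f.search("balance == 0\nsystem('touch hacked_again')")
rescue ArgumentError => e
  p e.message                              # "Invalid query"
end
```

Note that the multi-line query that slipped past the earlier validation is
now rejected outright, and even if it were not, nothing after the first
captured expression would ever be executed.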
593 | 
594 | For virtually every need you might turn to a raw string +eval()+ for, you can
595 | work around it using the tools Ruby provides.  These include all sorts of
596 | methods for getting at whatever you need, including +instance_variable_get+, 
597 | +instance_variable_set+, +const_get+, +const_set+, +public_send+, +send+, 
598 | +define_method+, +method()+, and even +Class.new+ / +Module.new+.  These tools
599 | allow you to dynamically manipulate Ruby code without evaluating strings directly. 
600 | For more details, you'll definitely want to read the "Mastering the Dynamic Toolkit" 
601 | chapter.
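
As a small taste of a few of these tools, consider the sketch below.  The
+Account+ class, its +LIMITS+ table, and the generated method names are all
hypothetical; the point is only that each lookup or definition happens by
name, with no source string handed to the parser.

```ruby
class Account
  LIMITS = { "basic" => 1000, "premium" => 5000 }  # made-up plans

  def initialize(balance)
    @balance = balance
  end

  attr_reader :balance

  # define_method builds methods from a block, with no source string involved
  LIMITS.each do |plan, limit|
    define_method("over_#{plan}_limit?") { balance > limit }
  end
end

account = Account.new(2500)

p Object.const_get("Account")                  # Account
p account.instance_variable_get(:@balance)     # 2500
p account.public_send("over_basic_limit?")     # true
p account.public_send("over_premium_limit?")   # false

# public_send respects visibility, unlike send
begin
  account.public_send(:initialize, 0)
rescue NoMethodError
  p :private_methods_stay_private
end
```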
602 | 
603 | ==== Blind +rescue+ missions ====
604 |  
605 | Ruby provides a lot of different ways to handle exceptions.  They run the gamut
606 | all the way from capturing the full stack trace to completely ignoring raised
607 | errors.  This flexibility means that exceptions aren't necessarily treated with
608 | the same gravity in Ruby as in other languages, since they are very simple to
609 | rescue once they are raised.   In certain cases, folks have even used +rescue+
610 | as a stand-in replacement for conditional statements.  The classic example
611 | follows: 
612 | 
613 | ................................................................................
614 | 
615 |   name = @user.first_name.capitalize rescue "Anonymous"
616 | 
617 | ................................................................................
618 | 
619 | Usually, this is done with the intention of capturing the +NoMethodError+ raised
620 | by something like +first_name+ being nil here.  It accomplishes this task
621 | well, and looks slightly nicer than the alternative:
622 | 
623 | ................................................................................
624 | 
625 |   name = @user.first_name ? @user.first_name.capitalize : "Anonymous"
626 | 
627 | ................................................................................
628 | 
629 | However, the downside of using this trick is that you will most likely end up
630 | seeing this code again, at the long end of a painful debugging session.  For
631 | demonstration purposes, let's assume our +User+ is implemented like this:
632 | 
633 | ................................................................................
634 | require "pstore"
635 | 
636 | class User
637 | 
638 |   def self.data
639 |     @data ||= PStore.new("users.store")
640 |   end
641 | 
642 |   def self.add(id, user_data)
643 |     data.transaction do
644 |       data[id] = user_data
645 |     end
646 |   end
647 | 
648 |   def self.find(id)
649 |     data.transaction do
650 |       data[id] or raise "User not found"
651 |     end
652 |   end
653 |  
654 |   def initialize(id)
655 |     @user_id = id
656 |   end
657 | 
658 |   def attributes
659 |     self.class.find(@user_id)
660 |   end
661 | 
662 |   def first_name
663 |     attributes[:first_name]
664 |   end
665 | 
666 | end
667 | 
668 | ................................................................................
669 | 
670 | What we have here is basically a +PStore+ backed user database.  It's not terribly
671 | important to understand every last detail, but the code should be fairly easy to
672 | understand if you play around with it a bit.
673 | 
674 | Firing up irb we can see that the +rescue+ trick works fine for the case where
675 | +User#first_name+ returns +nil+.  
676 | 
677 | ................................................................................
678 | 
679 | >> require "user"
680 | => true
681 | 
682 | >> User.add('sandal', email: 'gregory@majesticseacreature.com')
683 | 
684 | => {:email=>"gregory@majesticseacreature.com"}
685 | >> @user = User.new('sandal')
686 | => #<User:0x48c448 @user_id="sandal">
687 | >> name = @user.first_name.capitalize rescue "Anonymous"
688 | => "Anonymous"
690 | >> @user.first_name
691 | => nil
692 | >> @user.attributes
693 | => {:email=>"gregory@majesticseacreature.com"}
694 | 
695 | ................................................................................
696 | 
697 | Ordinary execution also works fine:
698 | 
699 | ................................................................................
700 | 
701 | >> User.add('jia', first_name: "Jia", email: "jia@majesticseacreature.com")
702 | 
703 | => {:first_name=>"Jia", :email=>"jia@majesticseacreature.com"}
704 | >> @user = User.new('jia')
705 | => #<User:0x492154 @user_id="jia">
706 | >> name = @user.first_name.capitalize rescue "Anonymous"
707 | => "Jia"
708 | >> @user.attributes
709 | => {:first_name=>"Jia", :email=>"jia@majesticseacreature.com"}
710 | >> @user.first_name
711 | => "Jia"
713 | 
714 | ................................................................................
715 | 
716 | It seems like everything is in order, but you don't need to look far to find
717 | trouble.  Notice that this line will succeed even if +@user+ is undefined or +nil+:
718 | 
719 | ................................................................................
720 | 
721 | >> @user = nil                                                                  
722 | => nil
723 | >> name = @user.first_name.capitalize rescue "Anonymous"
724 | => "Anonymous"
725 | 
726 | ................................................................................
727 | 
728 | This means you can't count on catching an error when a typo or a renamed
729 | variable creeps into your code.   This weakness of course propagates down the
730 | chain as well:
731 | 
732 | ................................................................................
733 | 
734 | >> name = @user.a_fake_method.capitalize rescue "Anonymous"
735 | => "Anonymous"
736 | >> name = @user.a_fake_method.cannot_fail rescue "Anonymous"
737 | => "Anonymous"
738 | 
739 | ................................................................................
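
The reason is that the modifier form of +rescue+ catches any +StandardError+,
and both +NoMethodError+ and +NameError+ descend from it.  A small sketch, using
a made-up +risky_lookup+ method, shows a plain typo being swallowed in the same
way:

```ruby
# Hypothetical method containing a typo: undefined_helper does not exist.
def risky_lookup
  undefined_helper.capitalize
end

# The modifier form rescues StandardError, which includes NameError.
result = risky_lookup rescue "Anonymous"

# Both error classes sit underneath StandardError in the hierarchy.
name_error_is_standard      = NameError.ancestors.include?(StandardError)
no_method_error_is_standard = NoMethodError.ancestors.include?(StandardError)
```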
740 | 
741 | Of course, issues with a one liner like this should be easy enough to catch even
742 | without an exception.  This is most likely the reason why this pattern has
743 | become so common.   However, relying on it is usually an oversight, because the
744 | problem exists deeper down the rabbit hole as well.  Let's introduce a typo into
745 | our user implementation:
746 | 
747 | ................................................................................
748 | 
749 | class User
750 | 
751 |   def first_name
752 |     attribute[:first_name]
753 |   end
754 | 
755 | end
756 | 
757 | ................................................................................
758 | 
759 | Now, we go back and look at one of our previously working examples:
760 | 
761 | ................................................................................
762 | 
763 | >> @user = User.new('jia')
764 | => #<User:0x4b8548 @user_id="jia">
765 | >> name = @user.first_name.capitalize rescue "Anonymous"
766 | => "Anonymous"
767 | >> @user.first_name
768 | NameError: undefined local variable or method `attribute' for #<User:0x4b8548 @user_id="jia">
769 | 	from (irb):23:in `first_name'
770 | 	from (irb):32
771 | 	from /Users/sandal/lib/ruby19_1/bin/irb:12:in `<main>'
772 | 
773 | ................................................................................
774 | 
775 | Hopefully you're beginning to see the picture.   Although good testing and
776 | extensive quality assurance can catch these bugs, using this conditional
777 | modifier +rescue+ hack is like putting blinders on your code.  Unfortunately,
778 | this can also go for code of the form:
779 | 
780 | ................................................................................
781 | 
782 |    def do_something_dangerous
783 |       might_raise_an_error
784 |    rescue
785 |       "default value"
786 |    end
787 | 
788 | ................................................................................
789 | 
790 | Pretty much any rescue that does not capture a specific error may be a source of
791 | silent failure in your applications.   The only real case where an unqualified
792 | +rescue+ might make sense is when it is combined with an unqualified +raise+,
793 | which causes the same error to resurface after executing some code:
794 | 
795 | ................................................................................
796 | 
797 |    begin
798 |      # do some stuff
799 |    rescue => e
800 |      MyLogger.error "Error doing stuff: #{e.message}"
801 |      raise
802 |    end 
803 | 
804 | ................................................................................
805 | 
806 | In other situations, be sure to either know the risks involved, or avoid
807 | this technique entirely.  You'll thank yourself later.
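
As a rough sketch of the less risky alternatives, we can either handle the
expected +nil+ directly or name the one exception we actually anticipate.  The
+SketchUser+ class and both helper methods below are hypothetical stand-ins
rather than code from the earlier examples:

```ruby
# Hypothetical stand-in for the User class used earlier.
class SketchUser
  def initialize(attributes)
    @attributes = attributes
  end

  def first_name
    @attributes[:first_name]
  end
end

# Handle the expected nil explicitly instead of rescuing everything.
def display_name(user)
  (user.first_name || "Anonymous").capitalize
end

# Rescue only the specific error we anticipate: a missing key.
def fetch_setting(settings, key)
  settings.fetch(key)
rescue KeyError
  "default value"
end
```

With either approach, passing +nil+ by mistake raises a +NoMethodError+ instead
of quietly producing a default.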
808 | 
809 | ==== Doing +method_missing+ wrong ====
810 | 
811 | One thing you really don't want to do is mess up a +method_missing+ hook.
812 | Because the purpose of +method_missing+ is to handle unknown messages, it is a
813 | key feature for helping you find bugs in your code.
814 | 
815 | In the "Mastering the Dynamic Toolkit" chapter of this book, we covered some
816 | examples of how to use +method_missing+ properly.  Here's an example of how to
817 | do it wrong:
818 | 
819 | ....................................................................................
820 | 
821 | class Prawn::Document
822 | 
823 |   # Provides the following shortcuts:
824 |   #
825 |   #    stroke_some_method(*args) #=> some_method(*args); stroke
826 |   #    fill_some_method(*args) #=> some_method(*args); fill
827 |   #    fill_and_stroke_some_method(*args) #=> some_method(*args); fill_and_stroke
828 |   #
829 |   def method_missing(id,*args,&block)
830 |     case(id.to_s) 
831 |     when /^fill_and_stroke_(.*)/
832 |       send($1,*args,&block); fill_and_stroke
833 |     when /^stroke_(.*)/
834 |       send($1,*args,&block); stroke 
835 |     when /^fill_(.*)/
836 |       send($1,*args,&block); fill
837 |     end
838 |   end 
839 | 
840 | end
841 | 
842 | ....................................................................................
843 | 
844 | Although this may look very similar to an earlier example in this book, it has a
845 | critical flaw.  Can you see it?   If not, this irb session should help:
846 | 
847 | ....................................................................................
848 | 
849 | >> pdf.fill_and_stroke_cirlce([100,100], :radius => 25)
850 | => "0.000 0.000 0.000 rg\n0.000 0.000 0.000 RG\nq\nb\n"
851 | >> pdf.stroke_the_pretty_kitty([100,100], :radius => 25)
852 | => "0.000 0.000 0.000 rg\n0.000 0.000 0.000 RG\nq\nb\nS\n"
853 | >> pdf.donuts
854 | => nil
855 | 
856 | ....................................................................................
857 | 
858 | By coding a +method_missing+ hook without delegating to the original +Object+
859 | definition, we have effectively muted our object's ability to complain about
860 | messages we really didn't want it to handle.   To add insult to injury, failure
861 | cases such as +fill_and_stroke_cirlce+ and +stroke_the_pretty_kitty+ are doubly
862 | confusing, since they return a non-nil value even though they do not produce
863 | meaningful results.
864 | 
865 | Luckily, the remedy to this is simple: we just add a call to +super+ in the
866 | catch-all case:
867 | 
868 | ....................................................................................
869 | 
870 | def method_missing(id,*args,&block)
871 |   case(id.to_s) 
872 |   when /^fill_and_stroke_(.*)/
873 |     send($1,*args,&block); fill_and_stroke
874 |   when /^stroke_(.*)/
875 |     send($1,*args,&block); stroke 
876 |   when /^fill_(.*)/
877 |     send($1,*args,&block); fill
878 |   else
879 |     super
880 |   end
881 | end 
882 | 
883 | ....................................................................................
884 | 
885 | Now, if we re-run our examples from before, we see much more predictable
886 | behavior, in line with what we'd expect if we had no hook set up in the first
887 | place:
888 | 
889 | ....................................................................................
890 | 
891 | >> pdf.fill_and_stroke_cirlce([100,100], :radius => 25)
892 | NoMethodError: undefined method `cirlce' for #<Prawn::Document:0x4e59f8>
893 | 	from /Users/sandal/devel/prawn/lib/prawn/graphics/color.rb:68:in `method_missing'
894 | 	from /Users/sandal/devel/prawn/lib/prawn/graphics/color.rb:62:in `method_missing'
895 | 	from (irb):4
896 | 	from /Users/sandal/lib/ruby19_1/bin/irb:12:in `<main>'
897 | 
898 | >> pdf.stroke_the_pretty_kitty([100,100], :radius => 25)
899 | NoMethodError: undefined method `the_pretty_kitty' for #<Prawn::Document:0x4e59f8>
900 | 	from /Users/sandal/devel/prawn/lib/prawn/graphics/color.rb:68:in `method_missing'
901 | 	from /Users/sandal/devel/prawn/lib/prawn/graphics/color.rb:64:in `method_missing'
902 | 	from (irb):5
903 | 	from /Users/sandal/lib/ruby19_1/bin/irb:12:in `<main>'
904 | 
905 | >> pdf.donuts
906 | NoMethodError: undefined method `donuts' for #<Prawn::Document:0x4e59f8>
907 | 	from /Users/sandal/devel/prawn/lib/prawn/graphics/color.rb:68:in `method_missing'
908 | 	from (irb):6
909 | 	from /Users/sandal/lib/ruby19_1/bin/irb:12:in `<main>'
910 | 
911 | ....................................................................................
912 | 
913 | An important thing to remember is that in addition to ensuring that you call
914 | +super+ from within your +method_missing()+ calls, you are also responsible for
915 | maintaining the method's signature.  Ruby will happily allow you to write a hook
916 | which only captures the name and not the arguments and block a method is called
917 | with:
918 | 
919 | ....................................................................................
920 | 
921 | def method_missing(id)
922 |   # ...
923 | end
924 | 
925 | ....................................................................................
926 | 
927 | However, if you set things up this way, even when you call +super+ you'll be
928 | breaking things farther up the chain, since +Object#method_missing+ expects the
929 | whole signature of the function call to remain intact.  So it's not only
930 | delegating to the original that is important, but delegating without information loss. 
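
To make the information loss concrete, here is a small sketch with two made-up
classes.  With the narrow signature, a misspelled call that carries arguments
never reaches +super+ at all.  Ruby raises an +ArgumentError+ about the hook
itself, which points you at the wrong problem:

```ruby
# Hypothetical class whose hook drops the arguments and block.
class NarrowHook
  def method_missing(id)
    super
  end
end

# Hypothetical class whose hook keeps the whole signature.
class FullHook
  def method_missing(id, *args, &block)
    super
  end
end

# Returns the class of the error raised by the block, or nil if none was.
def error_class_for
  yield
  nil
rescue NoMethodError, ArgumentError => e
  e.class
end

narrow_error = error_class_for { NarrowHook.new.donuts(1, 2) }
full_error   = error_class_for { FullHook.new.donuts(1, 2) }
```

Here +narrow_error+ ends up as +ArgumentError+, while +full_error+ is the
+NoMethodError+ we actually want to see.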
931 | 
932 | If you're sure to act responsibly with your +method_missing+ hooks, they won't be
933 | that dangerous in most cases.  However, if you get sloppy here, it is virtually
934 | guaranteed to come back to haunt you.  If you get into the habit of delegating
935 | properly right away, it'll be sure to save you some headaches down the line.
936 | 
937 | === Conclusions ===
938 | 
939 | This appendix doesn't come close to covering all the trouble you can get
940 | yourself in with Ruby.  It does however cover some of the most common sources of
941 | trouble and confusion, while showing some much less painful alternatives.
942 | 
943 | When it comes to design, much can be gained by simply reducing complexity.  If
944 | the path you're on seems like it is too difficult, odds are it can be made a lot
945 | easier if you just think about it in a different way.  As for "clever"
946 | implementation tricks and shortcuts, they can be more trouble than they're worth
947 | if they come at the expense of clarity or maintainability of your code.
948 | 
949 | Put simply, the worst practices in Ruby are ones that make you work much harder
950 | than you have to.  If you start to introduce code that seems really cool at
951 | first, but later is shown to introduce complicated faults at the corner cases,
952 | it is generally wise to just rip it out and start fresh with something a little
953 | less exciting that's more reliable.
954 | 
955 | If you maintain the balancing act between creative approaches to your problems
956 | and ones that work without introducing excess complexity, you'll have a very
957 | happy time writing Ruby code.  Because Ruby gives you the power to do both good
958 | and evil, it's ultimately up to you how you want to maintain your projects.
959 | However, code that is maintainable and predictable is much more of a joy to work
960 | with than fragile and sloppy hacks that have been simply duct-taped together.
961 | 
962 | Now that we have reached the very end of this book, I trust that you have the
963 | skills necessary to go out and find the Best and Worst of Ruby Practices on your
964 | own.  The real challenge is knowing the difference between the two, and that
965 | ability comes only with practical experience gained by working on and
966 | investigating real problems.  With luck, this book has included enough real
967 | world examples to give you a head start in that area, but the heavy lifting needs to
968 | be done by you.
969 | 
970 | I hope you have enjoyed this wild ride through Ruby with me, and I really hope
971 | that something or other in this book has challenged or inspired you.  Please
972 | go out now and write some good open source Ruby code, and maybe you'll make a
973 | guest appearance in the second edition!
974 | 


--------------------------------------------------------------------------------
/manuscript/unmerged/things_go_wrong.txt:
--------------------------------------------------------------------------------
   1 | == When Things Go Wrong ==
   2 | 
   3 | === Resolving Defects Doesn't Have To Be Painful ===
   4 | 
   5 | Unfortunately, neither this book nor a lifetime of practice can cause you to
   6 | attain Ruby programming perfection.  However, a good substitute for never making
   7 | a mistake is knowing how to fix your problems as they arise.  The purpose
   8 | of this chapter is to provide you with the necessary tools and techniques to
   9 | prepare you for Ruby search and rescue missions.
  10 | 
  11 | We will start by walking through a simple but real bug hunting session to get a
  12 | basic outline of how to investigate issues in your Ruby projects.  We'll then
  13 | dive into some more specific tools and techniques for helping refine this
  14 | process.  What may surprise you is that we'll do all of this without ever
  15 | talking about using a debugger.  This is mainly because most Rubyists can and do
  16 | get away without the use of a formal debugging tool, via various lightweight
  17 | techniques we'll discuss here.
  18 | 
  19 | One skillset you will need to have in order to make the most out of what 
  20 | we'll discuss here is a decent understanding of how Ruby's built in unit testing
  21 | framework works.  That means if you haven't read the '"Driving Code Through
  22 | Tests"' chapter yet, you may want to go ahead and do that now.  
  23 | 
  24 | What you will notice about this chapter is that it is much more about the
  25 | process of problem solving in the context of Ruby than it is about solving any
  26 | particular problem.  If you keep this goal in mind while reading through the
  27 | examples, you'll get the most out of what we discuss.
  28 | 
  29 | Now that you know what to expect, let's start fixing some stuff.
  30 | 
  31 | === A Process For Debugging Ruby Code === 
  32 | 
  33 | Part of becoming masterful at anything is learning from your mistakes.  Since
  34 | Ruby programming is no exception, I want to share one of my embarrassing
  35 | moments so that others can benefit from it.  If the problems with the code I am
  36 | about to show are immediately obvious to you, don't worry about that.  Instead,
  37 | focus on the problem solving strategies used, as that's what is most important
  38 | here.
  39 | 
  40 | We're going to look at a simplified version of a real problem I ran into in my
  41 | day to day work.  One of my Rails gigs involved building a system for processing
  42 | scholarship applications online.  After users have filled out an application
  43 | once, whether it was accepted or rejected, they are presented with a somewhat
  44 | different application form upon renewal.  Although it deviates a bit from our
  45 | real world application, here's some simple code that illustrates that process:
  46 | 
  47 | .................................................................................
  48 | 
  49 |   if gregory.can_renew?
  50 |     puts "Start the application renewal process"
  51 |   else
  52 |     puts "Edit a pending application or submit a new one"
  53 |   end
  54 | 
  55 | .................................................................................
  56 | 
  57 | At first, I thought the logic for this was simple.  So long as all of the user's
  58 | applications had a status of either accepted or rejected, it was safe to say
  59 | they could renew their application.  The following code provides a rough model
  60 | that implements this requirement:
  61 | 
  62 | .................................................................................
  63 | 
  64 | Application = Struct.new(:state)
  65 | 
  66 | class User
  67 |   def initialize
  68 |     @applications = []
  69 |   end
  70 | 
  71 |   attr_reader :applications
  72 | 
  73 |   def can_renew?
  74 |     applications.all? { |e| [:accepted, :rejected].include?(e.state) }
  75 |   end
  76 | end
  77 | 
  78 | .................................................................................
  79 | 
  80 | Using this model, we can see that the output of the following code is '"Start the
  81 | application renewal process"':
  82 | 
  83 | .................................................................................
  84 | 
  85 | gregory = User.new
  86 | gregory.applications << Application.new(:accepted)
  87 | gregory.applications << Application.new(:rejected)
  88 | 
  89 | if gregory.can_renew?
  90 |   puts "Start the application renewal process"
  91 | else
  92 |   puts "Edit a pending application or submit a new one"
  93 | end
  94 | 
  95 | .................................................................................
  96 | 
  97 | If we add a pending application into the mix, we see that the other case is
  98 | triggered, outputting '"Edit a pending application or submit a new one"':
  99 | 
 100 | .................................................................................
 101 | 
 102 | gregory = User.new
 103 | gregory.applications << Application.new(:accepted)
 104 | gregory.applications << Application.new(:rejected)
 105 | gregory.applications << Application.new(:pending)
 106 | 
 107 | if gregory.can_renew?
 108 |   puts "Start the application renewal process"
 109 | else
 110 |   puts "Edit a pending application or submit a new one"
 111 | end
 112 | 
 113 | .................................................................................
 114 | 
 115 | So far everything has been going fine, but the next bit of code exposed a nasty 
 116 | edge case:
 117 | 
 118 | .................................................................................
 119 | 
 120 | gregory = User.new
 121 | 
 122 | if gregory.can_renew?
 123 |   puts "Start the application renewal process"
 124 | else
 125 |   puts "Edit a pending application or submit a new one"
 126 | end
 127 | 
 128 | .................................................................................
 129 | 
 130 | While I fully expected this to print out '"Edit a pending application or submit a
 131 | new one"', it managed to print the other message instead!
 132 | 
 133 | Popping open irb, I tracked down the root of the problem:
 134 | 
 135 | .................................................................................
 136 | 
 137 | >> gregory = User.new
 138 | => #<User:0x2618bc @applications=[]>
 139 | >> gregory.can_renew?
 140 | => true
 141 | 
 142 | >> gregory.applications
 143 | => []
 144 | >> gregory.applications.all? { false }
 145 | => true
 146 | 
 147 | .................................................................................
 148 | 
 149 | Of course, the trouble here was due to an incorrect use of the +Enumerable#all?+ method.  
 150 | I had been relying on Ruby to do what I meant rather than what I actually asked it to
 151 | do, which is usually a bad idea.  For some reason I thought that calling +all?+
 152 | on an empty array would return +nil+ or +false+, but instead, it returned
 153 | +true+.  To fix it, I'd need to re-think +can_renew?+ a little bit.
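
The behavior is easy to confirm outside the application.  This short sketch
(the variable names are ours) shows how the +Enumerable+ predicates treat an
empty collection:

```ruby
empty = []

# all? is vacuously true: no element fails the test, because there are none.
all_result = empty.all? { |e| e == :accepted }

# any? is false for the same reason: no element passes the test.
any_result = empty.any? { |e| e == :accepted }

# none? is true, since nothing matched.
none_result = empty.none? { |e| e == :accepted }
```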
 154 | 
 155 | I could have fixed the issue immediately by adding a special case
 156 | involving +applications.empty?+, but I wanted to be sure this bug wouldn't have
 157 | a chance to crop up again.  The easiest way to do this was to write some tests, 
 158 | which I probably should have done in the first place.
 159 | 
 160 | The following simple test case clearly specified the behavior I expected,
 161 | splitting it up into three cases as we did before:
 162 | 
 163 | .................................................................................
 164 | 
 165 | require "test/unit"
 166 | 
 167 | class UserTest < Test::Unit::TestCase
 168 | 
 169 |   def setup
 170 |     @gregory = User.new
 171 |   end
 172 | 
 173 |   def test_a_new_applicant_cannot_renew
 174 |     assert_block("Expected User#can_renew? to be false for a new applicant") do
 175 |       not @gregory.can_renew?
 176 |     end
 177 |   end
 178 | 
 179 |   def test_a_user_with_pending_applications_cannot_renew
 180 |     @gregory.applications << app(:accepted) << app(:pending)
 181 | 
 182 |     msg = "Expected User#can_renew? to be false when user has pending applications"
 183 |     assert_block(msg) do
 184 |       not @gregory.can_renew?
 185 |     end
 186 |   end
 187 | 
 188 |   def test_a_user_with_only_accepted_and_rejected_applications_can_renew
 189 |     @gregory.applications << app(:accepted) << app(:rejected) << app(:accepted)
 190 |     msg = "Expected User#can_renew? to be true when all applications are accepted or rejected"
 191 |     assert_block(msg) { @gregory.can_renew? }
 192 |   end
 193 | 
 194 |   private
 195 | 
 196 |   def app(name)
 197 |     Application.new(name)
 198 |   end
 199 | 
 200 | end
 201 | 
 202 | .................................................................................
 203 | 
 204 | When we run the tests, we can clearly see the failure that we investigated
 205 | manually a little earlier:
 206 | 
 207 | .................................................................................
 208 | 
 209 |   1) Failure:
 210 | test_a_new_applicant_cannot_renew(UserTest) [foo.rb:24]:
 211 | Expected User#can_renew? to be false for a new applicant
 212 | 
 213 | 3 tests, 3 assertions, 1 failures, 0 errors
 214 | 
 215 | .................................................................................
 216 | 
 217 | Now that we've successfully captured the essence of the bug, we can go about
 218 | fixing it.  As you may suspect, the solution is simple:
 219 | 
 220 | .................................................................................
 221 | 
 222 | def can_renew?
 223 |   return false if applications.empty?
 224 |   applications.all? { |e| [:accepted, :rejected].include?(e.state) }
 225 | end
 226 | 
 227 | .................................................................................
 228 | 
 229 | Running the tests again, we see that everything passes:
 230 | 
 231 | .................................................................................
 232 | 
 233 | 3 tests, 3 assertions, 0 failures, 0 errors
 234 | 
 235 | .................................................................................
 236 | 
 237 | If we went back and ran our original examples that print some messages to the
 238 | screen, we'd see that those now work as expected as well.  We could have used
 239 | those on their own to test our attempted fix, but by writing automated tests, we
 240 | have a safety net against regressions, which may be one of the main benefits of
 241 | unit tests.
 242 | 
 243 | Though the particular bug we squashed may be a bit boring, what we have shown is
 244 | a repeatable procedure for bug hunting, without ever firing up a debugger or
 245 | combing through log files.   To recap, here's the general plan for how things
 246 | should play out:
 247 | 
 248 |   * First, identify the different scenarios that apply to a given feature.
 249 | 
 250 |   * Enumerate over these scenarios to identify which ones are affected by
 251 |     defects and which ones work as expected.  This can be done in many ways,
 252 |     ranging from printing debugging messages on the command line, 
 253 |     to logfile analysis, to live application testing.  The important thing is to 
 254 |     identify and isolate the cases affected by the bug.
 255 | 
 256 |   * Hop into `irb` if possible and take a look at what your objects actually look
 257 |     like under the hood.  Experiment with the failing scenarios in a step by
 258 |     step fashion to try to dig down and uncover the root cause of problems.
 259 | 
 260 |   * Write tests to reproduce the problems you are having, along with what you
 261 |     expect to happen when the issue is resolved.
 262 | 
 263 |   * Implement a fix that passes the tests, and then repeat the process until all
 264 |     issues are resolved.
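
The first two steps can be sketched in a few lines of plain Ruby.  The
+SCENARIOS+ table and the +failing_scenarios+ helper below are hypothetical, and
the model is the original, unfixed +User+ from earlier in this section:

```ruby
Application = Struct.new(:state)

class User
  def initialize
    @applications = []
  end

  attr_reader :applications

  # The original version, before the fix for new applicants.
  def can_renew?
    applications.all? { |e| [:accepted, :rejected].include?(e.state) }
  end
end

# Hypothetical scenario table: application states => expected can_renew?
SCENARIOS = {
  []                     => false,
  [:accepted, :pending]  => false,
  [:accepted, :rejected] => true
}

# Returns the scenarios whose actual result differs from the expected one.
def failing_scenarios
  failures = SCENARIOS.reject do |states, expected|
    user = User.new
    states.each { |state| user.applications << Application.new(state) }
    user.can_renew? == expected
  end

  failures.keys
end
```

Against the unfixed model, +failing_scenarios+ returns only the empty list of
applications, which tells us where to aim irb and our first test.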
 265 | 
 266 | Sometimes, it's possible to condense this process into two steps by simply
 267 | writing a test which reproduces the bug and then introducing a fix that passes
 268 | the tests.  However, most of the time the extra legwork will pay off, as
 269 | understanding the root cause of the problem will allow you to treat your
 270 | application's disease all at once rather than addressing its symptoms one by
 271 | one.  
 272 | 
 273 | Given this basic outline of how to isolate and resolve issues within our code, we
 274 | can now focus on some specific tools and techniques that will help improve the
 275 | process for us.
 276 | 
 277 | === Capturing the Essence of a Defect ===
 278 | 
 279 | Before you can begin to hunt down a bug, you need to be able to reproduce it in
 280 | isolation.  The main idea is that if you remove all the extraneous code that is
 281 | unrelated to the issue, it will be easier to see what is really going on. As you
 282 | continue to investigate an issue, you may discover that you can reduce the
 283 | example more and more based on what you learn.  Since I have a real example
 284 | handy from one of my projects, we can look at this process in action to see how
 285 | it plays out.
 286 | 
 287 | What follows is some Prawn code that was submitted as a bug report.  The problem
 288 | it is supposed to show is that every text +span()+ resulted in a page break
 289 | happening, when it wasn't supposed to.
 290 | 
 291 | .................................................................................
 292 | 
 293 | 
 294 | Prawn::Document.generate("span.pdf") do
 295 | 
 296 |   span(350, :position => :center) do
 297 |     text "Here's some centered text in a 350 point column. " * 100
 298 |   end
 299 | 
 300 |   text "Here's my sentence."
 301 | 
 302 |   bounding_box([50,300], :width => 400) do
 303 |     text "Here's some default bounding box text. " * 10
 304 |     span(bounds.width,
 305 |       :position => bounds.absolute_left - margin_box.absolute_left) do
 306 |       text "The rain in spain falls mainly on the plains. " * 300
 307 |     end
 308 |   end
 309 | 
 310 |   text "Here's my second sentence."
 311 | 
 312 | end
 313 | 
 314 | .................................................................................
 315 | 
 316 | Without a strong knowledge of Prawn, this example may already seem fairly
 317 | reduced.  After all, the text represents a sort of abstract problem definition
 318 | rather than some code that was ripped out of an application, and that is a good
 319 | start.  But upon running this code, I noticed that the defect was present
 320 | whenever a +span()+ call was made.  This allowed me to reduce the example
 321 | substantially: 
 322 | 
 323 | .................................................................................
 324 | 
 325 | Prawn::Document.generate("span.pdf") do
 326 | 
 327 |   span(350) do
 328 |     text "Here's some text in a 350pt wide column. " * 20
 329 |   end
 330 | 
 331 |   text "This text should appear on the same page as the spanning text"
 332 | 
 333 | end
 334 | 
 335 | .................................................................................
 336 | 
 337 | 
 338 | Whether or not you have any practical experience in Prawn, the issue stands
 339 | out better in this revised example, simply because there is less code to
 340 | consider. The code is also a bit more self-documenting, which makes buggy output
 341 | harder to miss.  Many bug reports can be reduced in a similar fashion.  Of course,
 342 | not everything compacts so well, but every little bit of simplification helps.
 343 | 
 344 | Most bugs aren't going to show up in the first place you look.  Instead, they'll
 345 | often be hidden farther down the chain, stashed away in some low level helper
 346 | method or in some other code that your feature depends on.  Since this is so
 347 | common, I've developed the habit of mentally tracing the execution path that my
 348 | example code follows, in hopes of finding some obvious mistake along the way. If
 349 | I notice anything suspicious, I start the next iteration of bug
 350 | reproduction.  
 351 | 
 352 | Using this approach, I found out that the problem with +span()+ wasn't actually
 353 | in +span()+ at all.  Although the details aren't important, it turns out that
 354 | the core problem was in a lower level function called +canvas()+ which
 355 | +span()+ relies on.   This method was incorrectly setting the text cursor to
 356 | the very bottom of the page after executing its block argument.
 357 | I used the following example to confirm this was the case:
 358 | 
 359 | .................................................................................
 360 | 
 361 |   Prawn::Document.generate("canvas_sets_y_to_0.pdf") do
 362 |     canvas { text "Some text at the absolute top left of the page" }
 363 | 
 364 |     text "This text should not be after a pagebreak"
 365 |   end
 366 | 
 367 | .................................................................................
 368 | 
 369 | When I saw that I was able to reproduce the problem, I went on to formally specify what
 370 | was wrong in the form of tests, feeling reasonably confident that this
 371 | was the root defect.  
 372 | 
 373 | Whenever you are hunting for bugs, the practice of reducing your area of
 374 | interest first will help you avoid dead ends and limit the number of possible
 375 | places you'll need to look for problems. Before doing any formal investigation,
 376 | it's a good idea to check for obvious problems so that you can get a sense of
 377 | where the real source of your defect is.  Some bugs are harder to catch on sight
 378 | than others, but there is no need to overthink the easy ones.  
 379 | 
 380 | If a defect can be reproduced in isolation, you can usually narrow it down to
 381 | a specific deviation from what you expected to happen. We'll now take a look at
 382 | how to go from an example that reproduces a bug to a failing test that fully
 383 | characterizes it.
 384 | 
 385 | The main benefit of an automated test is that it will explode when your code 
 386 | fails to act as expected.  It is important to keep in mind that even if you 
 387 | have an existing test suite, when you encounter a bug that does not cause any
 388 | failures, you need to update your tests. This helps prevent regressions,
 389 | allowing you to fix a bug once and forget about it. 
 390 | 
 391 | Continuing with our example, here is a simple but sufficient test to 
 392 | corner the bug.
 393 | 
 394 | .................................................................................
 395 | 
 396 | class CanvasTest < Test::Unit::TestCase
 397 | 
 398 |   def setup
 399 |     @pdf = Prawn::Document.new 
 400 |   end
 401 |   
 402 |   def test_canvas_should_not_reset_y_to_zero
 403 |     after_text_position = nil
 404 | 
 405 |     @pdf.canvas do 
 406 |       @pdf.text "Hello World" 
 407 |       after_text_position = @pdf.y 
 408 |     end
 409 | 
 410 |     assert_equal after_text_position, @pdf.y
 411 |   end
 412 | end  
 413 | 
 414 | .................................................................................
 415 | 
 416 | Here, we expect the y coordinate after the +canvas+ block is executed to be the same 
 417 | as it was just after the text was rendered to the page.  Running this test
 418 | reproduces the problem we created an example for earlier:
 419 | 
 420 | .................................................................................
 421 | 
 422 |   1) Failure:test_canvas_should_not_reset_y_to_zero(CanvasTest) [---]
 423 | <778.128> expected but was
 424 | <0.0>.  
 425 | 
 426 | .................................................................................
 427 | 
 428 | Here, we have converted our simplified example into something that can become a
 429 | part of our automated test suite.  The simpler an example is, the easier
 430 | this is to do.  More complicated examples may need to be broken into several
 431 | chunks, but this process is straightforward more often than not.
 432 | 
 433 | Once we write a test that reproduces our problem, the way we fix it is to get
 434 | our tests passing again.  If other tests end up breaking in order to get our new 
 435 | test to pass, we know that something is still wrong.  If, for some reason, our
 436 | problem isn't solved when we get all the tests passing again, it means our
 437 | reduced example probably didn't cover the entirety of the problem, so we need to
 438 | go back to the drawing board.  Even so, not all is lost.  Each
 439 | test serves as a significant reduction of your problem space.  Every passing
 440 | assertion rules out that particular issue as the root of your
 441 | problem.  Sooner or later, there won't be any place left for your
 442 | bugs to hide.
 443 | 
 444 | For those who need a recap, here are the keys to producing a good reduced
 445 | example:
 446 | 
 447 |   * Remove as much extraneous code as possible from your example, and the bug
 448 |     will be clearer to see.
 449 | 
 450 |   * Try to make your example self-describing, so that even someone not
 451 |     familiar with the core issue can see at a glance whether something is wrong.
 452 |     This helps others report regressions even if they don't fully understand the
 453 |     internals of your project.
 454 | 
 455 |   * Continue to revise your examples until they reach the root cause of the
 456 |     problem.  Don't throw away any of the higher level examples until you verify
 457 |     that fixing a general problem solves the specific issue you ran into as
 458 |     well.
 459 | 
 460 |   * When you understand the root cause of your problem, code up a failing test
 461 |     that demonstrates how the code should work.  When it passes, the bug should
 462 |     be gone.  If it fails again, you'll know there has been a regression.
 463 | 
 464 | === Scrutinizing Your Code ===
 465 | 
 466 | When things aren't working the way you expect them to, you obviously need to
 467 | find out why.  There are certain tricks that can make this task a lot easier on
 468 | you, and you can utilize them without ever needing to fire up the debugger.
 469 | 
 470 | ==== Utilizing Reflection ====
 471 | 
 472 | Many bugs come from using an object in a different way than you're supposed to,
 473 | or from some internal state deviating from your expectations.  To be able to
 474 | detect and fix these bugs, you need to be able to get a clear picture of what is
 475 | going on under the hood in the objects you're working with.
 476 | 
 477 | I'll assume that you already know that +Kernel#p+ and +Object#inspect+ exist,
 478 | and how to use them for basic needs.  However, when left to their default
 479 | behaviors, using these tools to debug complex objects can be too painful to be
 480 | practical.  We can take an unadorned +Prawn::Document+'s inspect output for
 481 | an example:
 482 | 
 483 | .................................................................................
 484 | 
 485 | #<Prawn::Document:0x12cf17c @page_content=#<Prawn::Reference:0x12cecf4 
 486 | @data={:Length=>0}, @gen=0, @identifier=4, @stream="0.000 0.000 0.000 r
 487 | g\n0.000 0.000 0.000 RG\nq\n", @compressed=false>, @info=
 488 | #<Prawn::Reference:0x12cf0c8  @data={:Creator=>"Prawn", :Producer=>"Prawn"},
 489 | @gen=0, @identifier=1, @compressed=false>
 490 | , @root=#<Prawn::Reference:0x12cf064 @data={:Type=>:Catalog, :Pages=>
 491 | #<Prawn::Reference:0x12cf08c @data={:Count=>1, :Kids=>[#<Prawn::Reference:0x12ceca4
 492 | @data={:Contents=>#<Prawn::Reference:0x12cecf4 
 493 | @data={:Length=>0}, @gen=0, @identifier=4, @stream="0.000 0.000 0.000 rg\n0.000 
 494 | 0.000 0.000 RG\nq\n",
 495 | 
 496 | << ABOUT 50 MORE LINES LIKE THIS >>
 497 | 
 498 | #<Prawn::Reference:0x12cf08c @data={:Count=>1, :Kids=>[#<Prawn::Reference:0x12ceca4 
 499 | ...>], :Type=>:Pages}, @gen=0, @identifier=2, @compressed=false>, 
 500 | :MediaBox=>[0, 0, 612.0, 792.0]}, @gen=0, @identifier=5, @compressed=false>], 
 501 | @margin_box=#<Prawn::Document::BoundingBox:0x12ced30 @width=540.0,
 502 | @y=756.0, @x=36, @parent=#<Prawn::Document:0x12cf17c ...>, @height=720.0>, 
 503 | @fill_color="000000", @current_page=#<Prawn::Reference:0x12ceca4 @data={:Contents=>
 504 | #<Prawn::Reference:0x12cecf4  @data={:Length=>0}, @gen=0, @identifier=4, 
 505 | @stream="0.000 0.000 0.000 rg\n0.000 0.000 0.000 RG\nq\n",  @compressed=false>, 
 506 | :Type=>:Page, :Parent=>#<Prawn::Reference:0x12cf08c  @data={:Count=>1, 
 507 | :Kids=>[#<Prawn::Reference:0x12ceca4 ...>], :Type=>:Pages},
 508 | @gen=0, @identifier=2, @compressed=false>,  :MediaBox=>[0, 0, 612.0, 792.0]},
 509 | @gen=0, @identifier=5, @compressed=false>, @skip_encoding=nil,
 510 | @bounding_box=#<Prawn::Document::BoundingBox:0x12ced30 @width=540.0, @y=756.0, @x=36,
 511 | @parent=#<Prawn::Document:0x12cf17c ...>, @height=720.0>, @page_size="LETTER", 
 512 | @stroke_color="000000" , @text_options={}, @compress=false, @margins={:top=>36,
 513 | :left=>36, :bottom=>36, :right=>36}>
 514 | .................................................................................
 515 | 
 516 | Although this information sure is thorough, it probably won't quickly help us
 517 | identify what page layout is being used or what the dimensions of the margins
 518 | are.  If we aren't familiar with the internals of this object, such verbose
 519 | output is borderline useless.  Of course, this doesn't mean we're simply out of
 520 | luck. In situations like this, we can infer a lot about an object by using 
 521 | Ruby's reflective capabilities:
 522 | 
 523 | .................................................................................
 524 | 
 525 | >> pdf.class
 526 | => Prawn::Document
 527 | 
 528 | >> pdf.instance_variables
 529 | => [:@objects, :@info, :@pages, :@root, :@page_size, :@page_layout, :@compress, 
 530 | :@skip_encoding, :@background, :@font_size, :@text_options, :@margins, :@margin_box,
 531 | :@bounding_box, :@page_content, :@current_page, :@fill_color, :@stroke_color, :@y]
 532 | 
 533 | >> Prawn::Document.instance_methods(inherited_methods=false).sort
 534 | => [:bounding_box, :bounds, :bounds=, :canvas, :compression_enabled?, :cursor, 
 535 | :find_font, :font, :font_families, :font_registry, :font_size, :font_size=, :margin_box,
 536 | :margin_box=, :margins, :mask, :move_down, :move_up, :pad, :pad_bottom, :pad_top, :page_count,
 537 | :page_layout, :page_size, :render, :render_file, :save_font, :set_font, :span,
 538 | :start_new_page, :text_box, :width_of, :y, :y=]
 539 | 
 540 | >> pdf.private_methods(inherited_methods=false)
 541 | => [:init_bounding_box, :initialize, :build_new_page_content, :generate_margin_box]
 542 | 
 543 | .................................................................................
 544 | 
 545 | Now, even if we haven't worked with this particular object before, we have a
 546 | sense of what is available and it makes queries like the ones mentioned in the
 547 | last paragraph much easier:
 548 | 
 549 | .................................................................................
 550 | 
 551 | >> pdf.margins
 552 | => {:left=>36, :right=>36, :top=>36, :bottom=>36}
 553 | 
 554 | >> pdf.page_layout
 555 | => :portrait
 556 | 
 557 | .................................................................................
 558 | 
 559 | If we want to look at some lower level details, such as the contents of some
 560 | instance variables, we can do so via +instance_variable_get+:
 561 | 
 562 | .................................................................................
 563 | 
 564 | >> pdf.instance_variable_get(:@current_page)
 565 | => #<Prawn::Reference:0x4e5750 @identifier=5, @gen=0, @data={:Type=>:Page, 
 566 | :Parent=>#<Prawn::Reference:0x4e5b60 @identifier=2, @gen=0, @data={:Type=>:Pages,
 567 | :Count=>1, :Kids=>[#<Prawn::Reference:0x4e5750 ...>]}, @compressed=false, @on_encode=nil>, 
 568 | :MediaBox=>[0, 0, 612.0, 792.0], :Contents=>#<Prawn::Reference:0x4e57a0 @identifier=4,
 569 | @gen=0, @data={:Length=>0}, @compressed=false, @on_encode=nil,
 570 | @stream="0.000 0.000 0.000 rg\n0.000 0.000 0.000 RG\nq\n">}, @compressed=false,
 571 | @on_encode=nil>
 572 | 
 573 | .................................................................................
 574 | 
 575 | Using these tricks, we can easily determine whether we've accidentally got the
 576 | name of a variable or method wrong.  We can also see what the underlying
 577 | structure of our objects is, and repeat this process to drill down and
 578 | investigate potential problems.
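
The same reflection calls work on any object, not just +Prawn::Document+.  As a
rough sketch, here they are applied to a tiny made-up class (+Invoice+ and its
members are hypothetical and exist only for this illustration):

```ruby
# Hypothetical class used only for illustration; it is not part of Prawn.
class Invoice
  attr_reader :total

  def initialize(total)
    @total    = total
    @currency = "USD"
  end

  private

  def audit_trail
    []
  end
end

invoice = Invoice.new(125)

p invoice.class                              # the object's class
p invoice.instance_variables                 # names of its instance variables
p Invoice.instance_methods(false)            # public methods defined directly on Invoice
p Invoice.private_instance_methods(false)    # private methods defined directly on Invoice
p invoice.instance_variable_get(:@currency)  # peek at state that has no reader method
```

Passing +false+ keeps inherited methods out of the listing, which is usually what
you want when you are trying to learn what one particular class adds.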
 579 | 
 580 | ==== Improving inspect output ====
 581 | 
 582 | Of course, the whole situation here would be better if we had easier to read
 583 | inspect output.  There is actually a standard library called +pp+ that 
 584 | improves the formatting of inspect while operating in a very similar 
 585 | fashion. I wrote a whole section in `Appendix B` about this library, including
 586 | some of its advanced capabilities.  You should definitely read up on what +pp+
 587 | offers you when you get the chance, but here I'd like to cover some alternative
 588 | approaches that can also come in handy.
 589 | 
 590 | As it turns out, the output of +Kernel#p+ can be improved on an object by object
 591 | basis.  This may have already been obvious if you have used +Object#inspect+
 592 | before, but it is also a severely underused feature of Ruby.  This feature can be 
 593 | used to turn the mess we saw in the previous section into beautiful debugging
 594 | output:
 595 | 
 596 | .................................................................................
 597 | 
 598 | >> pdf = Prawn::Document.new
 599 | => < Prawn::Document:0x27df8a: 
 600 |       @background: nil
 601 |       @compress: false
 602 |       @fill_color: "000000"
 603 |       @font_size: 12
 604 |       @margins: {:left=>36, :right=>36, :top=>36, :bottom=>36}
 605 |       @page_layout: :portrait
 606 |       @page_size: "LETTER"
 607 |       @skip_encoding: nil
 608 |       @stroke_color: "000000"
 609 |       @text_options: {}
 610 |       @y: 756.0
 611 | 
 612 |       @bounding_box -> Prawn::Document::BoundingBox:0x27dd64
 613 |       @current_page -> Prawn::Reference:0x27dd1e
 614 |       @info -> Prawn::Reference:0x27df44
 615 |       @margin_box -> Prawn::Document::BoundingBox:0x27dd64
 616 |       @objects -> Array:0x27df6c
 617 |       @page_content -> Prawn::Reference:0x27dd46
 618 |       @pages -> Prawn::Reference:0x27df26
 619 |       @root -> Prawn::Reference:0x27df12 >
 620 | 
 621 | .................................................................................
 622 | 
 623 | I think you'll agree that this looks substantially easier to follow than the
 624 | default inspect output. To accomplish this, I put together a pretty straightforward 
 625 | template that allows you to pass in a couple of arrays of symbols which point at
 626 | instance variables:
 627 | 
 628 | .................................................................................
 629 | 
 630 | module InspectTemplate 
 631 | 
 632 |   def __inspect_template(objs, refs)
 633 |     obj_output = objs.sort.each_with_object("") do |v,out| 
 634 |        out << "\n      #{v}: #{instance_variable_get(v).inspect}" 
 635 |     end
 636 | 
 637 |     ref_output = refs.sort.each_with_object("") do |v,out|
 638 |       ref = instance_variable_get(v)
 639 |       out << "\n      #{v} -> #{__inspect_object_tag(ref)}"
 640 |     end
 641 |     
 642 |     "< #{__inspect_object_tag(self)}: #{obj_output}\n#{ref_output} >"
 643 |   end
 644 | 
 645 |   def __inspect_object_tag(obj)
 646 |     "#{obj.class}:0x#{obj.object_id.to_s(16)}"
 647 |   end
 648 | 
 649 | end
 650 | 
 651 | .................................................................................
 652 | 
 653 | After mixing this into `Prawn::Document`, I only need to specify which variables I
 654 | want to display the entire contents of, and which I want to just show as
 655 | references.  Then, it is as easy as calling +__inspect_template+ with these
 656 | values.
 657 | 
 658 | .................................................................................
 659 | 
 660 | class Prawn::Document
 661 | 
 662 |   include InspectTemplate
 663 | 
 664 |   def inspect
 665 |     objs = [ :@page_size, :@page_layout, :@margins, :@font_size, :@background, 
 666 |              :@stroke_color, :@fill_color, :@text_options, :@y, :@compress, 
 667 |              :@skip_encoding ]
 668 | 
 669 |     refs = [ :@objects, :@info, :@pages, :@bounding_box, :@margin_box, :@page_content,
 670 |              :@current_page, :@root]
 671 |    
 672 |     __inspect_template(objs,refs)
 673 |   end
 674 | end
 675 | 
 676 | .................................................................................
 677 | 
 678 | Once we provide a customized +inspect+ method that returns a string, both
 679 | +Kernel#p+ and +irb+ will pick up on it, yielding the nice results we showed
 680 | earlier.
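
If a full template is more than you need, the same mechanism works on a much
smaller scale.  The following is only a minimal sketch: +Connection+ and its
fields are invented for this example, and the output format is an arbitrary
choice rather than any kind of convention.

```ruby
# Hypothetical class for illustration only.
class Connection
  def initialize(host, port)
    @host   = host
    @port   = port
    @buffer = "x" * 10_000   # noisy state we would rather not dump in full
  end

  # Show only the fields that matter while debugging.
  def inspect
    "#<#{self.class} #{@host}:#{@port} buffer=#{@buffer.size} bytes>"
  end
end

conn = Connection.new("localhost", 3333)

p conn               # Kernel#p prints whatever inspect returns
puts [conn].inspect  # containers call inspect on their elements as well
```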
 681 | 
 682 | Although my +InspectTemplate+ can easily be reused, it carries the major caveat
 683 | that you become 100% responsible for exposing your variables for debugging
 684 | output.  Anything not explicitly passed to +__inspect_template+ will not be
 685 | rendered.  However, there is a middle of the road solution that is far more
 686 | automatic.
 687 | 
 688 | The +YAML+ data serialization standard library has a nice side effect of
 689 | producing highly readable representations of Ruby objects.  Because of this, it
 690 | actually provides a +Kernel#y+ method which can be used as a stand in
 691 | replacement for +p+.  Although this may be a bit strange, if you look at it in
 692 | action, you'll see it has some benefits:
 693 | 
 694 | .................................................................................
 695 | 
 696 | >> require "yaml"
 697 | => true
 698 | 
 699 | >> y Prawn::Document.new
 700 | --- &id007 !ruby/object:Prawn::Document 
 701 | background: 
 702 | bounding_box: &id002 !ruby/object:Prawn::Document::BoundingBox 
 703 |   height: 720.0
 704 |   parent: *id007
 705 |   width: 540.0
 706 |   x: 36
 707 |   y: 756.0
 708 | compress: false
 709 | info: &id003 !ruby/object:Prawn::Reference 
 710 |   compressed: false
 711 |   data: 
 712 |     :Creator: Prawn
 713 |     :Producer: Prawn
 714 |   gen: 0
 715 |   identifier: 1
 716 |   on_encode: 
 717 | margin_box: *id002
 718 | margins: 
 719 |   :left: 36
 720 |   :right: 36
 721 |   :top: 36
 722 |   :bottom: 36
 723 | page_content: *id005
 724 | page_layout: :portrait
 725 | page_size: LETTER
 726 | pages: *id004
 727 | root: *id006
 728 | skip_encoding: 
 729 | stroke_color: "000000"
 730 | text_options: {}
 731 | 
 732 | y: 756.0
 733 | => nil
 734 | 
 735 | .................................................................................
 736 | 
 737 | I truncated this output somewhat, but the basic structure shines through.  You can
 738 | see that YAML nicely shows nested object relations, and generally looks neat and
 739 | tidy.  Interestingly enough, YAML automatically truncates repeated object
 740 | references by referring to them by ID only. This turns out to be especially good 
 741 | for tracking down a certain kind of Ruby bug:
 742 | 
 743 | .................................................................................
 744 | 
 745 | >> a = Array.new(6)
 746 | => [nil, nil, nil, nil, nil, nil]
 747 | >> a = Array.new(6,[])
 748 | => [[], [], [], [], [], []]
 749 | >> a[0] << "foo"
 750 | => ["foo"]
 751 | >> a
 752 | => [["foo"], ["foo"], ["foo"], ["foo"], ["foo"], ["foo"]]
 753 | >> y a
 754 | --- 
 755 | - &id001 
 756 |   - foo
 757 | - *id001
 758 | - *id001
 759 | - *id001
 760 | - *id001
 761 | - *id001
 762 | 
 763 | .................................................................................
 764 | 
 765 | Here, it's easy to see that the six sub-arrays that make up our main array are
 766 | actually just six references to the same object.  If that wasn't what we were
 767 | going for, we can see the difference when we have six distinct objects very
 768 | clearly in YAML:
 769 | 
 770 | .................................................................................
 771 | 
 772 | >> a = Array.new(6) { [] }
 773 | => [[], [], [], [], [], []]
 774 | >> a[0] << "foo"
 775 | => ["foo"]
 776 | >> a
 777 | => [["foo"], [], [], [], [], []]
 778 | >> y a
 779 | --- 
 780 | - - foo
 781 | - []
 782 | 
 783 | - []
 784 | 
 785 | - []
 786 | 
 787 | - []
 788 | 
 789 | - []
 790 | 
 791 | .................................................................................
 792 | 
 793 | Although this may not be a problem you run into day to day, it's relatively easy
 794 | to forget to deep copy a structure from time to time, or to accidentally create
 795 | many copies of a reference to the same object when you're trying to set default
 796 | values.   When that happens, a quick call to +y+ will make a long series of
 797 | references to the same object appear very clearly.
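
If +y+ is not available in your environment (on some Ruby versions it may only
be defined inside irb), printing the result of +to_yaml+ should give you the
same kind of view.  This is a small sketch under that assumption; the exact
anchor names in the output can differ between YAML implementations.

```ruby
require "yaml"

shared   = Array.new(3, [])      # three references to ONE array
distinct = Array.new(3) { [] }   # three separate arrays

shared[0]   << "foo"
distinct[0] << "foo"

# to_yaml returns a string rather than printing it
shared_yaml   = shared.to_yaml
distinct_yaml = distinct.to_yaml

puts shared_yaml     # repeated references appear as one anchor plus aliases
puts distinct_yaml   # no anchors or aliases, just three independent entries
```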
 798 | 
 799 | Of course, the YAML output will come most in handy when you encounter this
 800 | problem by accident or if it is part of some sort of deeply nested structure.
 801 | If you already know exactly where to look and can easily get at it, using pure
 802 | Ruby works fine as well:
 803 | 
 804 | .................................................................................
 805 | 
 806 | >> a = Array.new(6) { [] }
 807 | => [[], [], [], [], [], []]
 808 | >> a.map { |e| e.object_id }
 809 | => [3423870, 3423860, 3423850, 3423840, 3423830, 3423820]
 810 | >> b = Array.new(6,[])
 811 | => [[], [], [], [], [], []]
 812 | >> b.map { |e| e.object_id }
 813 | => [3431570, 3431570, 3431570, 3431570, 3431570, 3431570]
 814 | 
 815 | .................................................................................
 816 | 
 817 | So far, we've been focusing very heavily on how to inspect your objects.  This
 818 | is mostly because a great many Ruby bugs can be solved by
 819 | simply getting a sense of what objects are being passed around and what data
 820 | they really contain.  But this is of course not the full extent of the problem;
 821 | we also need to be able to work with code that has been set in motion.
 822 | 
 823 | ==== Finding Needles In A Haystack ====
 824 | 
 825 | Sometimes it's not possible to easily pull up a defective object to directly
 826 | inspect.  Consider, for example, a large dataset that has some occasional
 827 | anomalies in it. If you're dealing with tens or hundreds of thousands of
 828 | records, an error like this won't be very helpful after your script churns for a
 829 | while and then goes sailing off the tracks:
 830 | 
 831 | .................................................................................
 832 | 
 833 | >> @data.map { |e|Integer(e[:amount]) }
 834 | ArgumentError: invalid value for Integer: "157,000"
 835 | 	from (irb):10:in `Integer'
 836 | 	from (irb):10
 837 | 	from (irb):10:in `inject'
 838 | 	from (irb):10:in `each'
 839 | 	from (irb):10:in `inject'
 840 | 	from (irb):10
 841 | 	from :0
 842 | 
 843 | .................................................................................
 844 | 
 845 | This error tells you virtually nothing about what has happened, except that
 846 | somewhere in your giant data set, there is an invalidly formatted integer.
 847 | Let's explore how to deal with situations like this, by creating some data and
 848 | introducing a few problems into it.
 849 | 
 850 | When it comes to generating fake data for testing, you can't get easier than the
 851 | 'faker' gem.  Here's a sample of creating an array of hash records containing
 852 | 5000 names, phone numbers, and payments:
 853 | 
 854 | .................................................................................
 855 | 
 856 | >> data = 5000.times.map do
 857 | ?>   { name: Faker::Name.name, phone_number: Faker::PhoneNumber.phone_number,
 858 | ?>     payment: rand(10000).to_s }
 859 | >> end
 860 | 
 861 | >> data.length
 862 | => 5000
 863 | >> data[0..2]
 864 | => [{:name=>"Joshuah Wyman", :phone_number=>"393-258-6420", :payment=>"6347"}, 
 865 |     {:name=>"Kraig Jacobi", :phone_number=>"779-295-0532", :payment=>"9186"}, 
 866 |     {:name=>"Jevon Harris", :phone_number=>"985.169.0519", :payment=>"213"}]
 867 |   
 868 | .................................................................................
 869 | 
 870 | Now, we can randomly corrupt a handful of records, to give us a basis for our
 871 | example.  Keep in mind, the purpose of this demonstration is to show how to
 872 | respond to unanticipated problems, rather than a known issue with your data.
 873 | 
 874 | .................................................................................
 875 | 
 876 | 5.times { data[rand(data.length)][:payment] << ".25" }
 877 | 
 878 | .................................................................................
 879 | 
 880 | Now if we ask a simple question such as which records have an amount over 1000,
 881 | we get our familiar and useless error:
 882 | 
 883 | .................................................................................
 884 | 
 885 | >> data.select { |e| Integer(e[:payment]) > 1000 }
 886 | ArgumentError: invalid value for Integer: "1991.25"
 887 | 
 888 | .................................................................................
 889 | 
 890 | At this point, we'd like to get some more information about where this problem
 891 | is actually located in our data, and what the individual record looks like.
 892 | Because we presumably have no idea how many of these records there are, we might
 893 | start by rescuing a single failure and then re-raising the error after printing 
 894 | some of this data to the screen. We'll use a +begin .. rescue+ construct here as
 895 | well as +Enumerator#with_index+.
 896 | 
 897 | .................................................................................
 898 | 
 899 | >> data.select.with_index do |e,i|
 900 | ?>   begin
 901 | ?>     Integer(e[:payment]) > 1000
 902 | >>   rescue ArgumentError
 903 | >>      p [e,i]
 904 | >>      raise
 905 | >>   end
 906 | >> end
 907 | [{:name=>"Mr. Clotilde Baumbach", :phone_number=>"(608)779-7942", :payment=>"1991.25"}, 91]
 908 | ArgumentError: invalid value for Integer: "1991.25"
 909 | 	from (irb):67:in `Integer'
 910 | 	from (irb):67:in `block in irb_binding'
 911 | 	from (irb):65:in `select'
 912 | 	from (irb):65:in `with_index'
 913 | 	from (irb):65
 914 | 	from /Users/sandal/lib/ruby19_1/bin/irb:12:in `<main>'
 915 | 
 916 | .................................................................................
 917 | 
 918 | So now we've pinpointed where the problem is coming from, and we know what the
 919 | actual record looks like.  Aside from the payment being a string representation
 920 | of a `Float` instead of an `Integer`, it's not immediately clear that there is
 921 | anything else wrong with this record.  If we drop the line that re-raises the
 922 | error, we can get a full report of records with this issue:
 923 | 
 924 | .................................................................................
 925 | 
 926 | >> data.select.with_index do |e,i|
 927 | ?>   begin
 928 | ?>      Integer(e[:payment]) > 1000
 929 | >>   rescue ArgumentError
 930 | >>      p [e,i]
 931 | >>   end
 932 | >> end; nil
 933 | [{:name=>"Mr. Clotilde Baumbach", :phone_number=>"(608)779-7942", :payment=>"1991.25"}, 91]
 934 | [{:name=>"Oceane Cormier", :phone_number=>"658.016.1612", :payment=>"7361.25"}, 766]
 935 | [{:name=>"Imogene Bergnaum", :phone_number=>"(573)402-6508", :payment=>"1073.25"}, 1368]
 936 | [{:name=>"Jeramy Prohaska", :phone_number=>"928.266.5508 x97173", :payment=>"6109.25"}, 2398]
 937 | [{:name=>"Betty Gerhold", :phone_number=>"250-149-3161", :payment=>"8668.25"}, 2399]
 938 | => nil
 939 | 
 940 | .................................................................................
 941 | 
 942 | As you can see, this recovered all the rows with this issue.  Based on this
 943 | information, we could probably make a
 944 | decision about what to do to fix the issue.  But because we're just interested
 945 | in the process here, the actual solution doesn't matter that much.   Instead,
 946 | the real point to remember is that when faced with an opaque error after
 947 | iterating across a large dataset, you can go back and temporarily rework things
 948 | to allow you to analyze the problematic data records.
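
When printing to the screen isn't convenient, a variation on the same idea is
to collect the offending records so they can be examined afterwards.  The
sketch below uses a few made-up records in place of the Faker-generated data,
and the +bad_records+ array is just one possible way to hold on to the
failures.

```ruby
# Made-up records standing in for a much larger dataset.
data = [
  { name: "A", payment: "6347" },
  { name: "B", payment: "1991.25" },
  { name: "C", payment: "213" },
  { name: "D", payment: "157,000" }
]

bad_records = []

flagged = data.each_with_index.select do |record, index|
  begin
    Integer(record[:payment]) > 1000
  rescue ArgumentError => e
    # Remember where the failure happened instead of aborting the whole run.
    bad_records << [index, record, e.message]
    false
  end
end

over_1000 = flagged.map { |record, _index| record }   # drop the index again

bad_records.each do |index, record, message|
  puts "row #{index}: #{message} in #{record.inspect}"
end
```

Returning +false+ from the +rescue+ clause simply leaves the bad record out of
the result, so the clean rows are still available while you decide what to do
about the rest.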
 949 | 
 950 | We'll see variants on this theme later on in the chapter, but for now, let's
 951 | recap what to remember when you are looking at your code under the microscope:
 952 | 
 953 |  * Don't rely on giant, ugly inspect statements if you can avoid it.  Instead,
 954 |    use introspection to narrow your search down to the specific relevant
 955 |    objects.
 956 | 
 957 |  * Writing your own `#inspect` method allows customized output from +Kernel#p+ and
 958 |    within irb.  However, this means you are responsible for adding new state to
 959 |    the debugging output as your objects evolve.
 960 | 
 961 |  * YAML provides a nice +Kernel#y+ method that provides a structured, easy to
 962 |    read representation of Ruby objects.  This is also useful for spotting
 963 |    accidental reference duplication bugs.
 964 | 
 965 |  * Sometimes stack traces aren't enough.  You can +rescue+ and then re-raise an
 966 |    error after printing some debugging output to help you find the root cause of
 967 |    your problems.
 968 | 
 969 | So far, we've talked about solutions that work well as part of the active
 970 | debugging process.  However, in many cases it is also important to passively
 971 | collect error feedback for dealing with later.  It is possible to do this with
 972 | Ruby's logging system, so let's shift gears a bit and talk about it.
 973 | 
 974 | === Working With Logger ===
 975 | 
 976 | I'm not generally a big fan of log files.  I much prefer the immediacy of seeing
 977 | problems directly reported on the screen as soon as they happen. If possible, I 
 978 | actually want to be thrown directly into my problematic code so I can take a look
 979 | around.  However, this isn't always an option, and in certain cases,
 980 | having an audit trail in the form of log files is as good as it's going to get.
 981 | 
 982 | Ruby's standard library 'logger' is fairly full featured, allowing you to log
 983 | many different kinds of messages, and filter them based on their severity.  The
 984 | API is reasonably well documented, so I won't be spending a ton of time going
 985 | over a feature by feature summary of what this library offers here.  Instead,
 986 | I'll show you how to replicate a bit of functionality that is especially common
 987 | in Ruby's web frameworks: comprehensive error logging.
 988 | 
 989 | If I pull up a log from one of my Rails applications, I can easily show what I'm
 990 | talking about.  The following is just a small section of a log file, in which a
 991 | full request and the error it ran into have been recorded:
 992 | 
 993 | .................................................................................
 994 | 
 995 | Processing ManagerController#call_in_sheet (for 127.0.0.1 at 2009-02-13 16:38:42) [POST]
 996 |   Session ID: BAh7CCIJdXNlcmkiOg5yZXR1cm5fdG8wIgpmbGFzaElDOidBY3Rpb25Db250%0Acm9sbGVyOj
 997 |   pGbGFzaDo6Rmxhc2hIYXNoewAGOgpAdXNlZHsA--2f1d03dee418f4c9751925da421ae4730f9b55dd
 998 |   Parameters: {"period"=>"01/19/2009", "commit"=>"Select", "action"=>"call_in_sheet", 
 999 |   "controller"=>"manager"}
1000 | 
1001 | 
1002 | NameError (undefined local variable or method `lunch' for #<CallInAggregator:0x2589240>):
1003 |     /lib/reports.rb:368:in `employee_record'
1004 |     /lib/reports.rb:306:in `to_grouping'
1005 |     /lib/reports.rb:305:in `each'
1006 |     /lib/reports.rb:305:in `to_grouping'
1007 |     /usr/local/lib/ruby/gems/1.8/gems/ruport-1.4.0/lib/ruport/data/table.rb:169:in `initialize'
1008 |     /usr/local/lib/ruby/gems/1.8/gems/ruport-1.4.0/lib/ruport/data/table.rb:809:in `new'
1009 |     /usr/local/lib/ruby/gems/1.8/gems/ruport-1.4.0/lib/ruport/data/table.rb:809:in `Table'
1010 |     /lib/reports.rb:304:in `to_grouping'
1011 |     /lib/reports.rb:170:in `CallInAggregator'
1012 |     /lib/reports.rb:129:in `setup'
1013 |     /usr/local/lib/ruby/gems/1.8/gems/ruport-1.4.0/lib/ruport/renderer.rb:337:in `render'
1014 |     /usr/local/lib/ruby/gems/1.8/gems/ruport-1.4.0/lib/ruport/renderer.rb:379:in `build'
1015 |     /usr/local/lib/ruby/gems/1.8/gems/ruport-1.4.0/lib/ruport/renderer.rb:335:in `render'
1016 |     /usr/local/lib/ruby/gems/1.8/gems/ruport-1.4.0/lib/ruport/renderer.rb:451:in `method_missing'
1017 |     /app/controllers/manager_controller.rb:111:in `call_in_sheet'
1018 |     /app/controllers/application.rb:62:in `on'
1019 |     /app/controllers/manager_controller.rb:110:in `call_in_sheet'
1020 |     /vendor/rails/actionpack/lib/action_controller/base.rb:1104:in `send'
1021 |     /vendor/rails/actionpack/lib/action_controller/base.rb:1104:in `perform_action_wit
1022 | 
1023 | .................................................................................
1024 | 
1025 | While the production application would display a rather boring "We're sorry,
1026 | something went wrong" message upon triggering an error, our backend logs tell us
1027 | exactly what request triggered the error and when it occurred.  They also give us
1028 | information about the actual request, to aid in debugging.  Though this
1029 | particular bug is fairly boring, since it looks like it was simply a typo that
1030 | snuck through the cracks, logging each error that occurs along with its full
1031 | stack trace provides essentially the same information that you'd get if you were
1032 | running a script locally and ran into an error.
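
To give a feel for the basic idea before building anything larger, here is a
minimal sketch of logging an error together with its stack trace using the
standard +logger+ library.  The +with_error_logging+ helper is hypothetical, and
a +StringIO+ stands in for a real log file so that the example is
self-contained.

```ruby
require "logger"
require "stringio"

log_output   = StringIO.new           # stand-in for a real log file
logger       = Logger.new(log_output)
logger.level = Logger::INFO           # DEBUG messages are filtered out

# Hypothetical helper: run a block and log any error with its backtrace.
# It swallows the error and returns nil so the caller can keep running.
def with_error_logging(logger, description)
  logger.info("Started: #{description}")
  yield
rescue StandardError => e
  logger.error("#{e.class} (#{e.message})")
  logger.error(e.backtrace.join("\n"))
  nil
end

logger.debug("this line is dropped because of the INFO threshold")
with_error_logging(logger, "parse payment") { Integer("157,000") }

puts log_output.string
```

Swapping the +StringIO+ for a filename should be all it takes to write the same
messages to disk.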
1033 | 
1034 | While it's nice that some libraries and frameworks have logging built in,
1035 | sometimes we'll need to roll our own.   To demonstrate this, we'll be walking
1036 | through a small +TCPServer+ that does simple arithmetic operations in prefix
1037 | notation.  We'll start by taking a look at it without any logging or error
1038 | handling support:
1039 | 
1040 | .................................................................................
1041 | 
1042 | require "socket"
1043 | 
1044 | class Server     
1045 | 
1046 |   def initialize
1047 |     @server   = TCPServer.new('localhost', 3333)
1048 |   end
1049 | 
1050 |   def *(x, y)
1051 |     "#{Float(x) * Float(y)}"
1052 |   end
1053 | 
1054 |   def /(x, y)
1055 |     "#{Float(x) / Float(y)}"
1056 |   end
1057 | 
1058 |   def handle_request(session)
1059 |     action, *args = session.gets.split(/\s/)
1060 |     if ["*", "/"].include?(action)
1061 |       session.puts(send(action, *args))
1062 |     else
1063 |       session.puts("Invalid command")
1064 |     end
1065 |   end
1066 | 
1067 |   def run 
1068 |     while session = @server.accept 
1069 |       handle_request(session)
1070 |     end
1071 |   end 
1072 | end
1073 | 
1074 | .................................................................................
1075 | 
1076 | We can use the following fairly generic client to interact with the server,
1077 | which is similar to the one we used in the "Designing Beautiful APIs" chapter.
1078 | 
1079 | .................................................................................
1080 | 
1081 | require "socket" 
1082 | 
1083 | class Client 
1084 | 
1085 |   def initialize(ip="localhost",port=3333) 
1086 |     @ip, @port = ip, port 
1087 |   end    
1088 | 
1089 |   def send_message(msg) 
1090 |     socket = TCPSocket.new(@ip,@port) 
1091 |     socket.puts(msg) 
1092 |     response = socket.gets 
1093 |     socket.close 
1094 |     return response 
1095 |   end    
1096 | 
1097 |   def receive_message 
1098 |     socket = TCPSocket.new(@ip,@port)   
1099 |     response = socket.read
1100 |     socket.close 
1101 |     return response 
1102 |   end 
1103 |  
1104 | end 
1105 | 
1106 | .................................................................................
1107 | 
1108 | Without any error handling, we end up with something like this on the client
1109 | side:
1110 | 
1111 | .................................................................................
1112 | 
1113 | client = Client.new 
1114 | 
1115 | response = client.send_message("* 5 10")  
1116 | puts response 
1117 | 
1118 | response = client.send_message("/ 4 3")
1119 | puts response
1120 | 
1121 | response = client.send_message("/ 3 foo")
1122 | puts response
1123 | 
1124 | response = client.send_message("* 5 7.2")
1125 | puts response
1126 | 
1127 | ## OUTPUTS ##
1128 | 
1129 | 50.0
1130 | 1.33333333333333
1131 | nil
1132 | client.rb:8:in `initialize': Connection refused - connect(2) (Errno::ECONNREFUSED)
1133 |         from client.rb:8:in `new'
1134 |         from client.rb:8:in `send_message'
1135 |         from client.rb:35 
1136 | 
1137 | .................................................................................
1138 | 
1139 | When we send the erroneous third message, the server never responds, resulting
1140 | in a nil response.  But when we try to send a fourth message, which would
1141 | ordinarily be valid, we see our connection was refused.  If we take a look at the
1142 | server side, we see that a single uncaught exception caused it to crash immediately:
1143 | 
1144 | .................................................................................
1145 | 
1146 | server_logging_initial.rb:15:in `Float': invalid value for Float(): "foo" (ArgumentError)
1147 | 	from server_logging_initial.rb:15:in `/'
1148 | 	from server_logging_initial.rb:20:in `send'
1149 | 	from server_logging_initial.rb:20:in `handle_request'
1150 | 	from server_logging_initial.rb:25:in `run'
1151 | 	from server_logging_initial.rb:31
1152 | 
1153 | .................................................................................
1154 | 
1155 | While this does give us a sense of what happened, it doesn't give us much
1156 | insight into when and why.  It also seems rather fragile to have a whole
1157 | server come crashing down on account of a single bad request.  With a little
1158 | more effort, we can add logging and error handling and make things behave much
1159 | better.
1160 | 
1161 | .................................................................................
1162 | 
1163 | require "socket"
1164 | require "logger"
1165 | 
1166 | class StandardError
1167 |   def report
1168 |     %{#{self.class}: #{message}\n#{backtrace.join("\n")}}
1169 |   end
1170 | end
1171 | 
1172 | class Server     
1173 | 
1174 |   def initialize(logger)
1175 |     @logger   = logger
1176 |     @server   = TCPServer.new('localhost', 3333)
1177 |   end
1178 | 
1179 |   def *(x, y)
1180 |     "#{Float(x) * Float(y)}"
1181 |   end
1182 | 
1183 |   def /(x, y)
1184 |     "#{Float(x) / Float(y)}"
1185 |   end
1186 | 
1187 |   def handle_request(session)
1188 |     action, *args = session.gets.split(/\s/)
1189 |     if ["*", "/"].include?(action)
1190 |       @logger.info "executing: '#{action}' with #{args.inspect}"
1191 |       session.puts(send(action, *args))
1192 |     else
1193 |       session.puts("Invalid command")
1194 |     end
1195 |   rescue StandardError => e
1196 |     @logger.error(e.report)
1197 |     session.puts "Sorry, something went wrong."
1198 |   end
1199 | 
1200 |   def run 
1201 |     while session = @server.accept 
1202 |       handle_request(session)
1203 |     end
1204 |   end 
1205 | end
1206 | 
1207 | begin 
1208 |   logger = Logger.new("development.log")
1209 |   host   = Server.new(logger)
1210 |   
1211 |   host.run
1212 | rescue StandardError => e
1213 |   logger.fatal(e.report)
1214 |   puts "Something seriously bad just happened, exiting"
1215 | end
1216 | 
1217 | .................................................................................
1218 | 
1219 | We'll go over the details in just a minute, but first, let's take a look at the
1220 | output on the client side running the identical code from earlier:
1221 | 
1222 | .................................................................................
1223 | 
1224 | client = Client.new 
1225 | 
1226 | response = client.send_message("* 5 10")  
1227 | puts response 
1228 | 
1229 | response = client.send_message("/ 4 3")
1230 | puts response
1231 | 
1232 | response = client.send_message("/ 3 foo")
1233 | puts response
1234 | 
1235 | response = client.send_message("* 5 7.2")
1236 | puts response
1237 | 
1238 | ## OUTPUTS ##
1239 | 
1240 | 50.0
1241 | 1.33333333333333
1242 | Sorry, something went wrong.
1243 | 36.0   
1244 | 
1245 | .................................................................................
1246 | 
1247 | We see that the third message is caught as an error and an apology is promptly
1248 | sent to the client.  But the interesting bit is that the fourth example
1249 | continues to run normally, indicating that the server did not crash this time
1250 | around.
1251 | 
1252 | Of course, if we swallowed all errors and just returned "We're sorry" every time
1253 | something happened without creating a proper paper trail for debugging, that'd
1254 | be a terrible idea.  Upon inspecting the server logs, we can see that we haven't
1255 | forgotten to keep ourselves covered:
1256 | 
1257 | .................................................................................
1258 | 
1259 | # Logfile created on Sat Feb 21 07:07:49 -0500 2009 by /
1260 | I, [2009-02-21T07:08:54.335294 #39662]  INFO -- : executing: '*' with ["5", "10"]
1261 | I, [2009-02-21T07:08:54.335797 #39662]  INFO -- : executing: '/' with ["4", "3"]
1262 | I, [2009-02-21T07:08:54.336163 #39662]  INFO -- : executing: '/' with ["3", "foo"]
1263 | E, [2009-02-21T07:08:54.336243 #39662] ERROR -- : ArgumentError: invalid value for Float(): "foo"
1264 | server_logging.rb:22:in `Float'
1265 | server_logging.rb:22:in `/'
1266 | server_logging.rb:28:in `send'
1267 | server_logging.rb:28:in `handle_request'
1268 | server_logging.rb:36:in `run'
1269 | server_logging.rb:45
1270 | I, [2009-02-21T07:08:54.336573 #39662]  INFO -- : executing: '*' with ["5", "7.2"]
1271 | 
1272 | .................................................................................
1273 | 
1274 | Here we see two different levels of logging going on, INFO and ERROR.  The
1275 | purpose of our INFO logs is simply to document requests as parsed by our
1276 | server, so we can confirm that the messages and their parameters are being
1277 | processed as we expect.  Our ERROR logs document the actual errors we
1278 | run into while processing things, and you can see in this example that the stack
1279 | trace written to the log file is nearly identical to the one that was produced
1280 | when our more fragile version of the server crashed.
1281 | 
1282 | Although the format differs a little from the Rails logs, it likewise provides us
1283 | with everything we need for debugging: the time and date of the issue, a record
1284 | of the actual request, and a trace that shows where the error originated.  Now
1285 | that we've seen it in action, let's take a look at how it all comes together.
1286 | 
1287 | We'll start with the small extension to +StandardError+:
1288 | 
1289 | .................................................................................
1290 | 
1291 | class StandardError
1292 |   def report
1293 |     %{#{self.class}: #{message}\n#{backtrace.join("\n")}}
1294 |   end
1295 | end
1296 | 
1297 | .................................................................................
1298 | 
1299 | This convenience method allows us to produce error reports that look similar to
1300 | the ones you'll find on the command line when an exception is raised.  While
1301 | +StandardError+ objects provide all the same information, they do not have a
1302 | single public method that provides the same report data that Ruby does, so we
1303 | need to assemble it on our own.
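
As a quick check of what +report+ returns, here is a minimal sketch that repeats the extension and calls it on a rescued exception.  The +risky_conversion+ and +report_for+ methods are hypothetical helpers introduced only for this illustration; they are not part of the server.

```ruby
# Same extension as in the server code, repeated so this snippet runs on its own.
class StandardError
  def report
    %{#{self.class}: #{message}\n#{backtrace.join("\n")}}
  end
end

# Hypothetical helper: fails the same way the "/" command does when it is
# handed a non-numeric argument.
def risky_conversion(value)
  Float(value)
end

# Hypothetical helper: returns the report string if the conversion fails,
# or nil if it succeeds.
def report_for(value)
  risky_conversion(value)
  nil
rescue StandardError => e
  e.report
end

puts report_for("foo")
```

The first line of the printed report names the exception class and its message; the remaining lines are the backtrace, one frame per line, and will vary with the file name and Ruby version.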
1304 | 
1305 | We can see how this error report is used in the main +handle_request+ method.
1306 | Notice that the server is passed a +Logger+ instance which is used as +@logger+
1307 | in the following code:
1308 | 
1309 | .................................................................................
1310 | 
1311 | def handle_request(session)
1312 |   action, *args = session.gets.split(/\s/)
1313 |   if ["*", "/"].include?(action)
1314 |     @logger.info "executing: '#{action}' with #{args.inspect}"
1315 |     session.puts(send(action, *args))
1316 |   else
1317 |     session.puts("Invalid command")
1318 |   end
1319 | rescue StandardError => e
1320 |   @logger.error(e.report)
1321 |   session.puts "Sorry, something went wrong."
1322 | end
1323 | 
1324 | .................................................................................
1325 | 
1326 | Here, we see where the messages in our log file actually came from.  Before the
1327 | server attempts to actually execute a command, it records what it has parsed out
1328 | using +@logger.info+.   Then, it attempts to send the message along with its
1329 | parameters to the object itself, printing its return value to the client end of
1330 | the socket.  If this fails for any reason, the relevant error is captured into
1331 | +e+ through +rescue+.  This will catch all descendants of +StandardError+, which
1332 | include most of the exceptions that ordinary Ruby code raises.   Once it is captured, we
1333 | utilize the custom +StandardError#report+ extension to generate an error report
1334 | string which is then logged as an error in the logfile.   The apology is sent
1335 | along to the client, thus completing the cycle.
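
To see which exceptions a +rescue StandardError+ clause will and will not catch, we can ask the exception classes themselves.  The following is a small sketch; +rescued_by_standard_error?+ is a name made up for this illustration.

```ruby
# Hypothetical helper: true when a `rescue StandardError` clause would
# catch an instance of the given exception class.
def rescued_by_standard_error?(exception_class)
  exception_class.ancestors.include?(StandardError)
end

# Errors raised by everyday mistakes in application code are covered.
[ArgumentError, NameError, NoMethodError, ZeroDivisionError, IOError].each do |klass|
  puts "#{klass}: #{rescued_by_standard_error?(klass)}"
end

# These sit outside the StandardError branch of the exception hierarchy,
# so they pass straight through a `rescue StandardError` clause.
[Interrupt, SystemExit, NoMemoryError, LoadError, SyntaxError].each do |klass|
  puts "#{klass}: #{rescued_by_standard_error?(klass)}"
end
```

The first group prints +true+ and the second prints +false+.  One practical consequence for a server like ours is that +Interrupt+ (raised on Ctrl-C) is not swallowed by either +rescue+ clause, so the process can still be stopped from the terminal.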
1336 | 
1337 | 
1338 | While that covers what we've seen in the logfile so far, there is an additional
1339 | measure for error handling in this application.  We see this in the code that
1340 | actually gets everything up and running:   
1341 | 
1342 | .................................................................................
1343 | 
1344 | 
1345 | begin 
1346 |   logger = Logger.new("development.log")
1347 |   host   = Server.new(logger)
1348 |   
1349 |   host.run
1350 | rescue StandardError => e
1351 |   logger.fatal(e.report)
1352 |   puts "Something seriously bad just happened, exiting"
1353 | end
1354 | 
1355 | .................................................................................
1356 | 
1357 | Although our request handling code is pretty well insulated from errors, we
1358 | still want to track in our logfile any server crashes that may happen.  Rather
1359 | than using ERROR as our designation, we instead use FATAL, indicating that our
1360 | server has no intention of recovering from errors that bubble up to this level.
1361 | I'll leave it up to the reader to figure out how to crash the server once it is
1362 | running, but this handler also records problems such as misspelled variable
1363 | and method names, or other issues within the +Server+ class, in the logfile.
1364 | To illustrate this, replace the +run+ method with the following code:
1365 | 
1366 | .................................................................................
1367 | 
1368 |   def run 
1369 |     while session = @server.accept 
1370 |       handle_request(sessions)
1371 |     end
1372 |   end 
1373 | 
1374 | .................................................................................
1375 | 
1376 | You'll end up crashing the server and producing the following log message:
1377 | 
1378 | .................................................................................
1379 | 
1380 | F, [2009-02-21T07:39:40.592569 #39789] FATAL -- : NameError: undefined local 
1381 | variable or method `sessions' for #<Server:0x20c970>
1382 | server_logging.rb:36:in `run'
1383 | server_logging.rb:45
1384 | 
1385 | .................................................................................
1386 | 
1387 | This can be helpful if you're deploying code remotely and have some code that
1388 | runs locally but not on the remote host, among other things.
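
The two tiers of error handling can also be tried out without opening any sockets.  The sketch below is a stripped-down stand-in, not the server from this chapter: +TinyServer+ and +run_with_fatal_logging+ are hypothetical names, and the log is written to a +StringIO+ so that it can be printed straight away.

```ruby
require "logger"
require "stringio"

# Stand-in for the real server: no sockets, just the two rescue tiers.
class TinyServer
  def initialize(logger)
    @logger = logger
  end

  # Recoverable tier: a bad request is logged as ERROR and still answered.
  def handle_request(text)
    Float(text).to_s
  rescue StandardError => e
    @logger.error("#{e.class}: #{e.message}")
    "Sorry, something went wrong."
  end

  # The typo ("requests" instead of "request") is deliberate.  It raises
  # NameError outside of handle_request, so nothing in this class rescues it.
  def run(request)
    handle_request(requests)
  end
end

# Unrecoverable tier: anything that escapes the server is logged as FATAL.
def run_with_fatal_logging(log_io, request)
  logger = Logger.new(log_io)
  TinyServer.new(logger).run(request)
rescue StandardError => e
  logger.fatal("#{e.class}: #{e.message}")
  "Something seriously bad just happened, exiting"
end

log = StringIO.new
puts run_with_fatal_logging(log, "42")
puts log.string
```

Even though the request itself is valid, the run ends with the exit message and a single FATAL entry naming the +NameError+, because the failure happens before +handle_request+ is ever entered.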
1389 | 
1390 | Although we have not covered +Logger+ in depth by any means, we've walked
1391 | through an example which can be used as a template for more general needs.  Most
1392 | of the time, logging makes the most sense when you don't have easy, immediate
1393 | access to the running code, and can be overkill in other places.  If you're considering 
1394 | adding logging code to your applications, there are a few things to keep in
1395 | mind:
1396 | 
1397 |  * Error logging is essential for long-running server processes, where you may
1398 |    not physically be watching the application moment by moment.
1399 | 
1400 |  * If you are working in a multi-processing environment, be sure to use a
1401 |    separate log file for each process, as otherwise there will be clashes.
1402 | 
1403 |  * +Logger+ is powerful and includes a ton of features not covered here, including
1404 |    built-in logfile rotation.
1405 | 
1406 |  * See the template for +StandardError#report+ if you want to include error
1407 |    reports in your logs that look similar to the ones Ruby generates on the
1408 |    command line.
1409 | 
1410 |  * When it comes to logging error messages, FATAL should represent a bug your 
1411 |    code has no intention of recovering from, whereas ERROR is more open-ended.
1412 | 
1413 | Depending on the kind of work you do, you may end up using +Logger+ every day or
1414 | not at all.   If it's the former case, be sure to check out the API
1415 | documentation for many of the features not covered here.
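
As a starting point for a couple of the items above, here is a minimal sketch of a logger that uses one file per process, rotates by size, and filters out low-severity messages.  The +build_process_logger+ helper, the file naming scheme, and the specific limits are assumptions chosen for illustration rather than recommendations.

```ruby
require "logger"
require "tmpdir"

# Hypothetical helper: builds a logger for the current process.
def build_process_logger(dir)
  # One file per process avoids clashes when several workers are running.
  path = File.join(dir, "server.#{Process.pid}.log")

  # Keep up to 5 old logfiles, starting a new one at roughly 1 MB.
  logger = Logger.new(path, 5, 1_048_576)

  # Drop DEBUG and INFO messages; WARN, ERROR, and FATAL are still written.
  logger.level = Logger::WARN
  logger
end

# A temporary directory keeps the example from leaving files behind.
Dir.mktmpdir do |dir|
  logger = build_process_logger(dir)
  logger.info("executing: '*' with [\"5\", \"10\"]")   # filtered out by the level
  logger.error("ArgumentError: invalid value for Float(): \"foo\"")
  logger.close
  puts File.read(File.join(dir, "server.#{Process.pid}.log"))
end
```

Only the ERROR entry should appear in the printed file, since the INFO message falls below the configured level.  Passing +"daily"+, +"weekly"+, or +"monthly"+ as the second argument to +Logger.new+ rotates by age instead of by size.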
1416 | 
1417 | And with that, we've reached the end of another chapter.  I'll just wrap up with
1418 | some closing remarks, and then we can move on to more upbeat topics.
1419 | 
1420 | === Conclusions ===
1421 | 
1422 | Dealing with defective code is something we all need to do from time to time. 
1423 | If we approach these issues in a relatively disciplined way, we can methodically 
1424 | corner and squash pretty much any bug that can be imagined.  Debugging Ruby 
1425 | code tends to be a fluid process, starting with a good specification of how things 
1426 | should actually work, and then exercising various investigative tactics until a fix 
1427 | can be found.  We don't necessarily need a debugger to track down issues in our code,
1428 | but we do need to use Ruby's introspective features as much as possible, since they 
1429 | have the power to reveal to us exactly what is going on under the hood.
1430 | 
1431 | Once you get into a comfortable workflow for resolving issues in your Ruby
1432 | code, it becomes more and more straightforward.  If you find yourself lost while
1433 | hunting down some bug, take the time to slow down and utilize the strategies
1434 | we've gone over in this chapter.  Once you get the hang of them, the tighter
1435 | feedback loop will kick in and make your job much easier.
1436 | 


--------------------------------------------------------------------------------
/oreilly_final/figs/rubp_0701.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practicingruby/rbp-book/cfb6a56c3787a68e9c4245055c47aa605b31d7ad/oreilly_final/figs/rubp_0701.png


--------------------------------------------------------------------------------
/oreilly_final/figs/rubp_0702.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practicingruby/rbp-book/cfb6a56c3787a68e9c4245055c47aa605b31d7ad/oreilly_final/figs/rubp_0702.png


--------------------------------------------------------------------------------
/oreilly_final/figs/rubp_0703.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practicingruby/rbp-book/cfb6a56c3787a68e9c4245055c47aa605b31d7ad/oreilly_final/figs/rubp_0703.png


--------------------------------------------------------------------------------
/oreilly_final/figs/rubp_0704.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practicingruby/rbp-book/cfb6a56c3787a68e9c4245055c47aa605b31d7ad/oreilly_final/figs/rubp_0704.png


--------------------------------------------------------------------------------
/oreilly_final/figs/rubp_0705.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practicingruby/rbp-book/cfb6a56c3787a68e9c4245055c47aa605b31d7ad/oreilly_final/figs/rubp_0705.png


--------------------------------------------------------------------------------
/oreilly_final/figs/rubp_0706.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practicingruby/rbp-book/cfb6a56c3787a68e9c4245055c47aa605b31d7ad/oreilly_final/figs/rubp_0706.png


--------------------------------------------------------------------------------
/oreilly_final/figs/rubp_0801.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practicingruby/rbp-book/cfb6a56c3787a68e9c4245055c47aa605b31d7ad/oreilly_final/figs/rubp_0801.png


--------------------------------------------------------------------------------
/oreilly_final/figs/rubp_0802.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practicingruby/rbp-book/cfb6a56c3787a68e9c4245055c47aa605b31d7ad/oreilly_final/figs/rubp_0802.png


--------------------------------------------------------------------------------
/oreilly_final/figs/rubp_0803.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practicingruby/rbp-book/cfb6a56c3787a68e9c4245055c47aa605b31d7ad/oreilly_final/figs/rubp_0803.png


--------------------------------------------------------------------------------
/oreilly_final/figs/rubp_0804.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practicingruby/rbp-book/cfb6a56c3787a68e9c4245055c47aa605b31d7ad/oreilly_final/figs/rubp_0804.png


--------------------------------------------------------------------------------
/oreilly_final/figs/rubp_ab01.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/practicingruby/rbp-book/cfb6a56c3787a68e9c4245055c47aa605b31d7ad/oreilly_final/figs/rubp_ab01.png


--------------------------------------------------------------------------------