├── META
    ├── LICENSE.txt
    ├── book.yml
    └── contents.yml
├── NOTES.md
├── README.md
├── cheap-counterfeits-jekyll.md
├── domain-specific-api-construction.md
├── evented-io.md
├── formula-processing.md
├── http-server.md
├── parsing-json.md
├── rapid-prototyping.md
├── roll-your-own-enumerable-and-enumerator.md
├── unix-style-command-line-applications.md
└── working-with-binary-file-formats.md


/META/LICENSE.txt:
--------------------------------------------------------------------------------
  1 | Creative Commons Attribution-ShareAlike 3.0 Unported
  2 | 
  3 | <https://creativecommons.org/licenses/by-sa/3.0/legalcode.txt>
  4 | 
  5 | Creative Commons Legal Code
  6 | 
  7 | Attribution-ShareAlike 3.0 Unported
  8 | 
  9 |     CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
 10 |     LEGAL SERVICES. DISTRIBUTION OF THIS LICENSE DOES NOT CREATE AN
 11 |     ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
 12 |     INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
 13 |     REGARDING THE INFORMATION PROVIDED, AND DISCLAIMS LIABILITY FOR
 14 |     DAMAGES RESULTING FROM ITS USE.
 15 | 
 16 | License
 17 | 
 18 | THE WORK (AS DEFINED BELOW) IS PROVIDED UNDER THE TERMS OF THIS CREATIVE
 19 | COMMONS PUBLIC LICENSE ("CCPL" OR "LICENSE"). THE WORK IS PROTECTED BY
 20 | COPYRIGHT AND/OR OTHER APPLICABLE LAW. ANY USE OF THE WORK OTHER THAN AS
 21 | AUTHORIZED UNDER THIS LICENSE OR COPYRIGHT LAW IS PROHIBITED.
 22 | 
 23 | BY EXERCISING ANY RIGHTS TO THE WORK PROVIDED HERE, YOU ACCEPT AND AGREE
 24 | TO BE BOUND BY THE TERMS OF THIS LICENSE. TO THE EXTENT THIS LICENSE MAY
 25 | BE CONSIDERED TO BE A CONTRACT, THE LICENSOR GRANTS YOU THE RIGHTS
 26 | CONTAINED HERE IN CONSIDERATION OF YOUR ACCEPTANCE OF SUCH TERMS AND
 27 | CONDITIONS.
 28 | 
 29 | 1. Definitions
 30 | 
 31 |  a. "Adaptation" means a work based upon the Work, or upon the Work and
 32 |     other pre-existing works, such as a translation, adaptation,
 33 |     derivative work, arrangement of music or other alterations of a
 34 |     literary or artistic work, or phonogram or performance and includes
 35 |     cinematographic adaptations or any other form in which the Work may be
 36 |     recast, transformed, or adapted including in any form recognizably
 37 |     derived from the original, except that a work that constitutes a
 38 |     Collection will not be considered an Adaptation for the purpose of
 39 |     this License. For the avoidance of doubt, where the Work is a musical
 40 |     work, performance or phonogram, the synchronization of the Work in
 41 |     timed-relation with a moving image ("synching") will be considered an
 42 |     Adaptation for the purpose of this License.
 43 |  b. "Collection" means a collection of literary or artistic works, such as
 44 |     encyclopedias and anthologies, or performances, phonograms or
 45 |     broadcasts, or other works or subject matter other than works listed
 46 |     in Section 1(f) below, which, by reason of the selection and
 47 |     arrangement of their contents, constitute intellectual creations, in
 48 |     which the Work is included in its entirety in unmodified form along
 49 |     with one or more other contributions, each constituting separate and
 50 |     independent works in themselves, which together are assembled into a
 51 |     collective whole. A work that constitutes a Collection will not be
 52 |     considered an Adaptation (as defined below) for the purposes of this
 53 |     License.
 54 |  c. "Creative Commons Compatible License" means a license that is listed
 55 |     at https://creativecommons.org/compatiblelicenses that has been
 56 |     approved by Creative Commons as being essentially equivalent to this
 57 |     License, including, at a minimum, because that license: (i) contains
 58 |     terms that have the same purpose, meaning and effect as the License
 59 |     Elements of this License; and, (ii) explicitly permits the relicensing
 60 |     of adaptations of works made available under that license under this
 61 |     License or a Creative Commons jurisdiction license with the same
 62 |     License Elements as this License.
 63 |  d. "Distribute" means to make available to the public the original and
 64 |     copies of the Work or Adaptation, as appropriate, through sale or
 65 |     other transfer of ownership.
 66 |  e. "License Elements" means the following high-level license attributes
 67 |     as selected by Licensor and indicated in the title of this License:
 68 |     Attribution, ShareAlike.
 69 |  f. "Licensor" means the individual, individuals, entity or entities that
 70 |     offer(s) the Work under the terms of this License.
 71 |  g. "Original Author" means, in the case of a literary or artistic work,
 72 |     the individual, individuals, entity or entities who created the Work
 73 |     or if no individual or entity can be identified, the publisher; and in
 74 |     addition (i) in the case of a performance the actors, singers,
 75 |     musicians, dancers, and other persons who act, sing, deliver, declaim,
 76 |     play in, interpret or otherwise perform literary or artistic works or
 77 |     expressions of folklore; (ii) in the case of a phonogram the producer
 78 |     being the person or legal entity who first fixes the sounds of a
 79 |     performance or other sounds; and, (iii) in the case of broadcasts, the
 80 |     organization that transmits the broadcast.
 81 |  h. "Work" means the literary and/or artistic work offered under the terms
 82 |     of this License including without limitation any production in the
 83 |     literary, scientific and artistic domain, whatever may be the mode or
 84 |     form of its expression including digital form, such as a book,
 85 |     pamphlet and other writing; a lecture, address, sermon or other work
 86 |     of the same nature; a dramatic or dramatico-musical work; a
 87 |     choreographic work or entertainment in dumb show; a musical
 88 |     composition with or without words; a cinematographic work to which are
 89 |     assimilated works expressed by a process analogous to cinematography;
 90 |     a work of drawing, painting, architecture, sculpture, engraving or
 91 |     lithography; a photographic work to which are assimilated works
 92 |     expressed by a process analogous to photography; a work of applied
 93 |     art; an illustration, map, plan, sketch or three-dimensional work
 94 |     relative to geography, topography, architecture or science; a
 95 |     performance; a broadcast; a phonogram; a compilation of data to the
 96 |     extent it is protected as a copyrightable work; or a work performed by
 97 |     a variety or circus performer to the extent it is not otherwise
 98 |     considered a literary or artistic work.
 99 |  i. "You" means an individual or entity exercising rights under this
100 |     License who has not previously violated the terms of this License with
101 |     respect to the Work, or who has received express permission from the
102 |     Licensor to exercise rights under this License despite a previous
103 |     violation.
104 |  j. "Publicly Perform" means to perform public recitations of the Work and
105 |     to communicate to the public those public recitations, by any means or
106 |     process, including by wire or wireless means or public digital
107 |     performances; to make available to the public Works in such a way that
108 |     members of the public may access these Works from a place and at a
109 |     place individually chosen by them; to perform the Work to the public
110 |     by any means or process and the communication to the public of the
111 |     performances of the Work, including by public digital performance; to
112 |     broadcast and rebroadcast the Work by any means including signs,
113 |     sounds or images.
114 |  k. "Reproduce" means to make copies of the Work by any means including
115 |     without limitation by sound or visual recordings and the right of
116 |     fixation and reproducing fixations of the Work, including storage of a
117 |     protected performance or phonogram in digital form or other electronic
118 |     medium.
119 | 
120 | 2. Fair Dealing Rights. Nothing in this License is intended to reduce,
121 | limit, or restrict any uses free from copyright or rights arising from
122 | limitations or exceptions that are provided for in connection with the
123 | copyright protection under copyright law or other applicable laws.
124 | 
125 | 3. License Grant. Subject to the terms and conditions of this License,
126 | Licensor hereby grants You a worldwide, royalty-free, non-exclusive,
127 | perpetual (for the duration of the applicable copyright) license to
128 | exercise the rights in the Work as stated below:
129 | 
130 |  a. to Reproduce the Work, to incorporate the Work into one or more
131 |     Collections, and to Reproduce the Work as incorporated in the
132 |     Collections;
133 |  b. to create and Reproduce Adaptations provided that any such Adaptation,
134 |     including any translation in any medium, takes reasonable steps to
135 |     clearly label, demarcate or otherwise identify that changes were made
136 |     to the original Work. For example, a translation could be marked "The
137 |     original work was translated from English to Spanish," or a
138 |     modification could indicate "The original work has been modified.";
139 |  c. to Distribute and Publicly Perform the Work including as incorporated
140 |     in Collections; and,
141 |  d. to Distribute and Publicly Perform Adaptations.
142 |  e. For the avoidance of doubt:
143 | 
144 |      i. Non-waivable Compulsory License Schemes. In those jurisdictions in
145 |         which the right to collect royalties through any statutory or
146 |         compulsory licensing scheme cannot be waived, the Licensor
147 |         reserves the exclusive right to collect such royalties for any
148 |         exercise by You of the rights granted under this License;
149 |     ii. Waivable Compulsory License Schemes. In those jurisdictions in
150 |         which the right to collect royalties through any statutory or
151 |         compulsory licensing scheme can be waived, the Licensor waives the
152 |         exclusive right to collect such royalties for any exercise by You
153 |         of the rights granted under this License; and,
154 |    iii. Voluntary License Schemes. The Licensor waives the right to
155 |         collect royalties, whether individually or, in the event that the
156 |         Licensor is a member of a collecting society that administers
157 |         voluntary licensing schemes, via that society, from any exercise
158 |         by You of the rights granted under this License.
159 | 
160 | The above rights may be exercised in all media and formats whether now
161 | known or hereafter devised. The above rights include the right to make
162 | such modifications as are technically necessary to exercise the rights in
163 | other media and formats. Subject to Section 8(f), all rights not expressly
164 | granted by Licensor are hereby reserved.
165 | 
166 | 4. Restrictions. The license granted in Section 3 above is expressly made
167 | subject to and limited by the following restrictions:
168 | 
169 |  a. You may Distribute or Publicly Perform the Work only under the terms
170 |     of this License. You must include a copy of, or the Uniform Resource
171 |     Identifier (URI) for, this License with every copy of the Work You
172 |     Distribute or Publicly Perform. You may not offer or impose any terms
173 |     on the Work that restrict the terms of this License or the ability of
174 |     the recipient of the Work to exercise the rights granted to that
175 |     recipient under the terms of the License. You may not sublicense the
176 |     Work. You must keep intact all notices that refer to this License and
177 |     to the disclaimer of warranties with every copy of the Work You
178 |     Distribute or Publicly Perform. When You Distribute or Publicly
179 |     Perform the Work, You may not impose any effective technological
180 |     measures on the Work that restrict the ability of a recipient of the
181 |     Work from You to exercise the rights granted to that recipient under
182 |     the terms of the License. This Section 4(a) applies to the Work as
183 |     incorporated in a Collection, but this does not require the Collection
184 |     apart from the Work itself to be made subject to the terms of this
185 |     License. If You create a Collection, upon notice from any Licensor You
186 |     must, to the extent practicable, remove from the Collection any credit
187 |     as required by Section 4(c), as requested. If You create an
188 |     Adaptation, upon notice from any Licensor You must, to the extent
189 |     practicable, remove from the Adaptation any credit as required by
190 |     Section 4(c), as requested.
191 |  b. You may Distribute or Publicly Perform an Adaptation only under the
192 |     terms of: (i) this License; (ii) a later version of this License with
193 |     the same License Elements as this License; (iii) a Creative Commons
194 |     jurisdiction license (either this or a later license version) that
195 |     contains the same License Elements as this License (e.g.,
196 |     Attribution-ShareAlike 3.0 US)); (iv) a Creative Commons Compatible
197 |     License. If you license the Adaptation under one of the licenses
198 |     mentioned in (iv), you must comply with the terms of that license. If
199 |     you license the Adaptation under the terms of any of the licenses
200 |     mentioned in (i), (ii) or (iii) (the "Applicable License"), you must
201 |     comply with the terms of the Applicable License generally and the
202 |     following provisions: (I) You must include a copy of, or the URI for,
203 |     the Applicable License with every copy of each Adaptation You
204 |     Distribute or Publicly Perform; (II) You may not offer or impose any
205 |     terms on the Adaptation that restrict the terms of the Applicable
206 |     License or the ability of the recipient of the Adaptation to exercise
207 |     the rights granted to that recipient under the terms of the Applicable
208 |     License; (III) You must keep intact all notices that refer to the
209 |     Applicable License and to the disclaimer of warranties with every copy
210 |     of the Work as included in the Adaptation You Distribute or Publicly
211 |     Perform; (IV) when You Distribute or Publicly Perform the Adaptation,
212 |     You may not impose any effective technological measures on the
213 |     Adaptation that restrict the ability of a recipient of the Adaptation
214 |     from You to exercise the rights granted to that recipient under the
215 |     terms of the Applicable License. This Section 4(b) applies to the
216 |     Adaptation as incorporated in a Collection, but this does not require
217 |     the Collection apart from the Adaptation itself to be made subject to
218 |     the terms of the Applicable License.
219 |  c. If You Distribute, or Publicly Perform the Work or any Adaptations or
220 |     Collections, You must, unless a request has been made pursuant to
221 |     Section 4(a), keep intact all copyright notices for the Work and
222 |     provide, reasonable to the medium or means You are utilizing: (i) the
223 |     name of the Original Author (or pseudonym, if applicable) if supplied,
224 |     and/or if the Original Author and/or Licensor designate another party
225 |     or parties (e.g., a sponsor institute, publishing entity, journal) for
226 |     attribution ("Attribution Parties") in Licensor's copyright notice,
227 |     terms of service or by other reasonable means, the name of such party
228 |     or parties; (ii) the title of the Work if supplied; (iii) to the
229 |     extent reasonably practicable, the URI, if any, that Licensor
230 |     specifies to be associated with the Work, unless such URI does not
231 |     refer to the copyright notice or licensing information for the Work;
232 |     and (iv) , consistent with Ssection 3(b), in the case of an
233 |     Adaptation, a credit identifying the use of the Work in the Adaptation
234 |     (e.g., "French translation of the Work by Original Author," or
235 |     "Screenplay based on original Work by Original Author"). The credit
236 |     required by this Section 4(c) may be implemented in any reasonable
237 |     manner; provided, however, that in the case of a Adaptation or
238 |     Collection, at a minimum such credit will appear, if a credit for all
239 |     contributing authors of the Adaptation or Collection appears, then as
240 |     part of these credits and in a manner at least as prominent as the
241 |     credits for the other contributing authors. For the avoidance of
242 |     doubt, You may only use the credit required by this Section for the
243 |     purpose of attribution in the manner set out above and, by exercising
244 |     Your rights under this License, You may not implicitly or explicitly
245 |     assert or imply any connection with, sponsorship or endorsement by the
246 |     Original Author, Licensor and/or Attribution Parties, as appropriate,
247 |     of You or Your use of the Work, without the separate, express prior
248 |     written permission of the Original Author, Licensor and/or Attribution
249 |     Parties.
250 |  d. Except as otherwise agreed in writing by the Licensor or as may be
251 |     otherwise permitted by applicable law, if You Reproduce, Distribute or
252 |     Publicly Perform the Work either by itself or as part of any
253 |     Adaptations or Collections, You must not distort, mutilate, modify or
254 |     take other derogatory action in relation to the Work which would be
255 |     prejudicial to the Original Author's honor or reputation. Licensor
256 |     agrees that in those jurisdictions (e.g. Japan), in which any exercise
257 |     of the right granted in Section 3(b) of this License (the right to
258 |     make Adaptations) would be deemed to be a distortion, mutilation,
259 |     modification or other derogatory action prejudicial to the Original
260 |     Author's honor and reputation, the Licensor will waive or not assert,
261 |     as appropriate, this Section, to the fullest extent permitted by the
262 |     applicable national law, to enable You to reasonably exercise Your
263 |     right under Section 3(b) of this License (right to make Adaptations)
264 |     but not otherwise.
265 | 
266 | 5. Representations, Warranties and Disclaimer
267 | 
268 | UNLESS OTHERWISE MUTUALLY AGREED TO BY THE PARTIES IN WRITING, LICENSOR
269 | OFFERS THE WORK AS-IS AND MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY
270 | KIND CONCERNING THE WORK, EXPRESS, IMPLIED, STATUTORY OR OTHERWISE,
271 | INCLUDING, WITHOUT LIMITATION, WARRANTIES OF TITLE, MERCHANTIBILITY,
272 | FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF
273 | LATENT OR OTHER DEFECTS, ACCURACY, OR THE PRESENCE OF ABSENCE OF ERRORS,
274 | WHETHER OR NOT DISCOVERABLE. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION
275 | OF IMPLIED WARRANTIES, SO SUCH EXCLUSION MAY NOT APPLY TO YOU.
276 | 
277 | 6. Limitation on Liability. EXCEPT TO THE EXTENT REQUIRED BY APPLICABLE
278 | LAW, IN NO EVENT WILL LICENSOR BE LIABLE TO YOU ON ANY LEGAL THEORY FOR
279 | ANY SPECIAL, INCIDENTAL, CONSEQUENTIAL, PUNITIVE OR EXEMPLARY DAMAGES
280 | ARISING OUT OF THIS LICENSE OR THE USE OF THE WORK, EVEN IF LICENSOR HAS
281 | BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
282 | 
283 | 7. Termination
284 | 
285 |  a. This License and the rights granted hereunder will terminate
286 |     automatically upon any breach by You of the terms of this License.
287 |     Individuals or entities who have received Adaptations or Collections
288 |     from You under this License, however, will not have their licenses
289 |     terminated provided such individuals or entities remain in full
290 |     compliance with those licenses. Sections 1, 2, 5, 6, 7, and 8 will
291 |     survive any termination of this License.
292 |  b. Subject to the above terms and conditions, the license granted here is
293 |     perpetual (for the duration of the applicable copyright in the Work).
294 |     Notwithstanding the above, Licensor reserves the right to release the
295 |     Work under different license terms or to stop distributing the Work at
296 |     any time; provided, however that any such election will not serve to
297 |     withdraw this License (or any other license that has been, or is
298 |     required to be, granted under the terms of this License), and this
299 |     License will continue in full force and effect unless terminated as
300 |     stated above.
301 | 
302 | 8. Miscellaneous
303 | 
304 |  a. Each time You Distribute or Publicly Perform the Work or a Collection,
305 |     the Licensor offers to the recipient a license to the Work on the same
306 |     terms and conditions as the license granted to You under this License.
307 |  b. Each time You Distribute or Publicly Perform an Adaptation, Licensor
308 |     offers to the recipient a license to the original Work on the same
309 |     terms and conditions as the license granted to You under this License.
310 |  c. If any provision of this License is invalid or unenforceable under
311 |     applicable law, it shall not affect the validity or enforceability of
312 |     the remainder of the terms of this License, and without further action
313 |     by the parties to this agreement, such provision shall be reformed to
314 |     the minimum extent necessary to make such provision valid and
315 |     enforceable.
316 |  d. No term or provision of this License shall be deemed waived and no
317 |     breach consented to unless such waiver or consent shall be in writing
318 |     and signed by the party to be charged with such waiver or consent.
319 |  e. This License constitutes the entire agreement between the parties with
320 |     respect to the Work licensed here. There are no understandings,
321 |     agreements or representations with respect to the Work not specified
322 |     here. Licensor shall not be bound by any additional provisions that
323 |     may appear in any communication from You. This License may not be
324 |     modified without the mutual written agreement of the Licensor and You.
325 |  f. The rights granted under, and the subject matter referenced, in this
326 |     License were drafted utilizing the terminology of the Berne Convention
327 |     for the Protection of Literary and Artistic Works (as amended on
328 |     September 28, 1979), the Rome Convention of 1961, the WIPO Copyright
329 |     Treaty of 1996, the WIPO Performances and Phonograms Treaty of 1996
330 |     and the Universal Copyright Convention (as revised on July 24, 1971).
331 |     These rights and subject matter take effect in the relevant
332 |     jurisdiction in which the License terms are sought to be enforced
333 |     according to the corresponding provisions of the implementation of
334 |     those treaty provisions in the applicable national law. If the
335 |     standard suite of rights granted under applicable copyright law
336 |     includes additional rights not granted under this License, such
337 |     additional rights are deemed to be included in the License; this
338 |     License is not intended to restrict the license of any rights under
339 |     applicable law.
340 | 
341 | 
342 | Creative Commons Notice
343 | 
344 |     Creative Commons is not a party to this License, and makes no warranty
345 |     whatsoever in connection with the Work. Creative Commons will not be
346 |     liable to You or any party on any legal theory for any damages
347 |     whatsoever, including without limitation any general, special,
348 |     incidental or consequential damages arising in connection to this
349 |     license. Notwithstanding the foregoing two (2) sentences, if Creative
350 |     Commons has expressly identified itself as the Licensor hereunder, it
351 |     shall have all rights and obligations of Licensor.
352 | 
353 |     Except for the limited purpose of indicating to the public that the
354 |     Work is licensed under the CCPL, Creative Commons does not authorize
355 |     the use by either party of the trademark "Creative Commons" or any
356 |     related trademark or logo of Creative Commons without the prior
357 |     written consent of Creative Commons. Any permitted use will be in
358 |     compliance with Creative Commons' then-current trademark usage
359 |     guidelines, as may be published on its website or otherwise made
360 |     available upon request from time to time. For the avoidance of doubt,
361 |     this trademark restriction does not form part of the License.
362 | 
363 |     Creative Commons may be contacted at https://creativecommons.org/.
364 | 


--------------------------------------------------------------------------------
/META/book.yml:
--------------------------------------------------------------------------------
1 | ############################
2 | #  Book (Meta) Info
3 | 
4 | title:   Best of Practicing Ruby (Book Edition)
5 | author:
6 |   name:  Gregory Brown, Luke Francl, Magnus Holm, Aaron Patterson, Solomon White et al.
7 | 


--------------------------------------------------------------------------------
/META/contents.yml:
--------------------------------------------------------------------------------
  1 | ######################
  2 | #  Table of Contents
  3 | 
  4 | - title:  A minimal HTTP server - Build just enough HTTP functionality from scratch to serve up static files
  5 |   path:   http-server.md
  6 |   sections:
  7 |   - title: A (very) brief introduction to HTTP
  8 |   - title: Writing the "Hello World" HTTP server
  9 |   - title: Serving files over HTTP
 10 |   - title: Safely converting a URI into a file path
 11 |   - title: Serving up index.html implicitly
 12 |   - title: Where to go from here
 13 | 
 14 | - title:  Event loops demystified - Build a Node.js/EventMachine-style event loop in roughly 150 lines
 15 |   path:   evented-io.md
 16 |   sections:
 17 |   - title: Obligatory chat server example
 18 |   - title: Event handling
 19 |   - title: The IO loop
 20 |   - title: IO events
 21 |   - title: Working with the Ruby IO object
 22 |   - title: Getting real with IO.select
 23 |   - title: Handling streaming input and output
 24 |   - title: Conclusions
 25 | 
 26 | - title: Parsing JSON the hard way - Learn about low-level parser and compiler tools by implementing a JSON parser
 27 |   path:  parsing-json.md
 28 |   sections:
 29 |   - title: The Tools We'll Be Using
 30 |   - title: Racc Basics
 31 |   - title: Building our JSON Parser
 32 |   - title: Building the tokenizer
 33 |   - title: Building the parser
 34 |   - title: Building the handler
 35 |   - title: Reflections
 36 |   - title: Post Script
 37 | 
 38 | - title: Tricks for working with text and files - Tear apart a minimal clone of the Jekyll blog engine in search of helpful idioms
 39 |   path:  cheap-counterfeits-jekyll.md
 40 |   sections:
 41 |   - title: A brief overview of Jackal's functionality
 42 |   - title: Idioms for text processing
 43 |   - title: Idioms for working with files and folders
 44 |   - title: Reflections
 45 | 
 46 | - title: Working with binary file formats - Read and write bitmap files using only a few dozen lines of code
 47 |   path:  working-with-binary-file-formats.md
 48 |   sections:
 49 |   - title: The anatomy of a bitmap
 50 |   - title: Encoding a bitmap image
 51 |   - title: Decoding a bitmap image
 52 |   - title: Reflections
 53 | 
 54 | - title: Building Unix-style command line applications - Build a basic clone of the 'cat' utility while learning some idioms for command line applications
 55 |   path:  unix-style-command-line-applications.md
 56 |   sections:
 57 |   - title: Building an executable script
 58 |   - title: Stream processing techniques
 59 |   - title: Options parsing
 60 |   - title: Basic text formatting
 61 |   - title: Error handling and exit codes
 62 |   - title: Reflections
 63 | 
 64 | - title: Rapid Prototyping - Build a tiny prototype of a Tetris game on the command line
 65 |   path:  rapid-prototyping.md
 66 |   sections:
 67 |   - title: The Planning Phase
 68 |   - title: The Requirements Phase
 69 |   - title: The Coding Phase
 70 |     sections:
 71 |     - title: "Case 1: line_shape_demo.rb"
 72 |     - title: "Case 2: bended_shape_demo.rb"
 73 |   - title: Reflections
 74 | 
 75 | - title: Building Enumerable 'n' Enumerator - Learn about powerful iteration tools by implementing some of their functionality yourself
 76 |   path:  roll-your-own-enumerable-and-enumerator.md
 77 |   sections:
 78 |   - title: Setting the stage with some tests
 79 |   - title: Implementing the `FakeEnumerable` module
 80 |   - title: Implementing the `FakeEnumerator` class
 81 |   - title:  Reflections
 82 | 
 83 | - title: Domain specific API construction - Master classic DSL design patterns by ripping off well-known libraries and tools
 84 |   path:  domain-specific-api-construction.md
 85 |   sections:
 86 |   - title: Implementing `attr_accessor`
 87 |   - title: Implementing a Rails-style `before_filter` construct
 88 |   - title: Implementing a cheap counterfeit of Mail's API
 89 |   - title: Implementing a shoddy version of XML Builder
 90 |   - title: Implementing Contest on top of MiniTest
 91 |   - title: Implement your own Gherkin parser, or criticize mine!
 92 |   - title: Reflections
 93 | 
 94 | - title: Safely evaluating user-defined formulas and calculations - Learn how to use Dentaku to evaluate Excel-like formulas in programs
 95 |   path:  formula-processing.md
 96 |   sections:
 97 |   - title: First steps with the Dentaku formula evaluator
 98 |   - title: Building the web interface
 99 |   - title: Defining garden layouts as simple data tables
100 |   - title: Implementing the formula processor
101 |   - title: Considering the tradeoffs involved in using Dentaku
102 |   - title: Reflections and further explorations
103 | 


--------------------------------------------------------------------------------
/NOTES.md:
--------------------------------------------------------------------------------
 1 | # Notes
 2 | 
 3 | 
 4 | ## Todos
 5 | 
 6 | ### fix images / double check
 7 | 
 8 | in parsing-json:
 9 | 
10 | ```
11 | ![method calls](//i.imgur.com/HZ0Sa.png)
12 | ```
13 | 
14 | in working-with-binary-file-formats.md
15 | 
16 | ```
17 | ![Pixels](http://i.imgur.com/XhKW1.png)
18 | ```
19 | 
20 | in formula-processing:
21 | 
22 | ```
23 | ![](//i.imgur.com/JlKz2kC.png)   and 8 more!
24 | ```
25 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | See the live version @ [`yukimotopress.github.io/practices`](http://yukimotopress.github.io/practicing)
 2 | 
 3 | ---
 4 | 
 5 | # Best of Practicing Ruby (Manuscripts Book Edition)
 6 | 
 7 | by [Gregory Brown](https://github.com/practicingruby), Luke Francl, Magnus Holm, Aaron Patterson, Solomon White et al.
 8 | 
 9 | 
10 | This is the original source reformatted in a single-page book edition (using the [Manuscripts format](http://manuscripts.github.io)).
11 | 
12 | 
13 | 
14 | - [A minimal HTTP server](http-server.md) - Build just enough HTTP functionality from scratch to serve up static files. (w/ Luke Francl)   <!-- Issue 7.2 — July 2, 2013 -->
15 | - [Event loops demystified](evented-io.md) - Build a Node.js/EventMachine-style event loop in roughly 150 lines. (w/ Magnus Holm)  <!-- Issue 5.3 — September 4, 2012 -->
16 | - [Parsing JSON the hard way](parsing-json.md) - Learn about low-level parser and compiler tools by implementing a JSON parser. (w/ Aaron Patterson)  <!-- Issue 6.1 — January 1, 2013 -->
17 | - [Tricks for working with text and files](cheap-counterfeits-jekyll.md) - Tear apart a minimal clone of the Jekyll blog engine in search of helpful idioms. <!-- Issue 4.4 — May 10, 2012 -->
18 | - [Working with binary file formats](working-with-binary-file-formats.md) - Read and write bitmap files using only a few dozen lines of code.  <!-- Issue 2.12 — November 9, 2011 -->
19 | - [Building Unix-style command line applications](unix-style-command-line-applications.md) - Build a basic clone of the 'cat' utility while learning some idioms for command line applications.   <!-- Issue 2.9 — October 18, 2011 -->
20 | - [Rapid Prototyping](rapid-prototyping.md) - Build a tiny prototype of a tetris game on the command line.  <!-- Issue 1.12 — December 21, 2010 -->
21 | - [Building Enumerable 'n' Enumerator](roll-your-own-enumerable-and-enumerator.md) - Learn about powerful iteration tools by implementing some of their functionality yourself. <!-- Issue 2.4 — September 13, 2011 -->
22 | - [Domain specific API construction](domain-specific-api-construction.md) - Master classic DSL design patterns by ripping off well-known libraries and tools. <!-- Issue 2.11 — November 2, 2011 -->
23 | - [Safely evaluating user-defined formulas and calculations](formula-processing.md) -
24 | Learn how to use Dentaku to evaluate Excel-like formulas in programs (w/ Solomon White)   <!-- Issue 8.2 — September 10, 2015 -->
25 | 
26 | 
27 | 
28 | 
29 | ## Meta
30 | 
31 | ### Sources
32 | 
33 | See the [original source](https://github.com/elm-city-craftworks/practicing-ruby-manuscripts) repo.
34 | 


--------------------------------------------------------------------------------
/cheap-counterfeits-jekyll.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Tricks for working with text and files
  3 | ---
  4 | 
  5 | _Tear apart a minimal clone of the Jekyll blog engine in search of helpful idioms_
  6 | 
  7 | 
  8 | 
  9 | While it may not seem like it at first, you can learn a great deal about Ruby by building something as simple as a static website generator. Although the task itself may seem a bit dull, it provides an opportunity to practice a wide range of Ruby idioms that can be applied elsewhere whenever you need to manipulate text-based data or muck around with the filesystem. Because text and files are everywhere, this kind of practice can have a profound impact on your ability to write elegant Ruby code.
 10 | 
 11 | Unfortunately, there are two downsides to building a static site generator as a learning exercise: it involves a fairly large time commitment, and in the end you will probably be better off using [Jekyll](http://github.com/mojombo/jekyll) rather than maintaining your own project. But don't despair, I wrote this article specifically with those two points in mind!
 12 | 
 13 | In order to make it easier for us to study text and file processing tricks, I broke off a small chunk of Jekyll's functionality and implemented a simplified demo app called [Jackal](http://github.com/elm-city-craftworks/jackal). Although it would be a horrible idea to attempt to use this barely functional counterfeit to maintain a blog or website, it works great as a tiny but context-rich showcase for some very handy Ruby idioms.
 14 | 
 15 | ## A brief overview of Jackal's functionality
 16 | 
 17 | The best way to get a feel for what Jackal can do is to [grab it from Github](https://github.com/elm-city-craftworks/jackal) and follow the instructions in the README. However, because it only implements a single feature, you should be able to get a full sense of how it works from the following overview.
 18 | 
 19 | Similar to Jekyll, the main purpose of Jackal is to convert Markdown-formatted posts and their metadata into HTML files. For example, suppose we have a file called **_posts/2012-05-09-tiniest-kitten.markdown** with the following contents:
 20 | 
 21 | ```
 22 | ---
 23 | category: essays
 24 | title: The tiniest kitten
 25 | ---
 26 | 
 27 | # The Tiniest Kitten
 28 | 
 29 | Is not nearly as **small** as you might think she is.
 30 | ```
 31 | 
 32 | Jackal's job is to split the metadata from the content in this file and then generate a new file called **_site/essays/2012/05/09/tiniest_kitten.html** that ends up looking like this:
 33 | 
 34 | 
 35 | ```html
 36 | <h1>The Tiniest Kitten</h1>
 37 | 
 38 | <p>Is not nearly as <strong>small</strong> as you might think she is.</p>
 39 | ```
 40 | 
 41 | If Jackal were a real static site generator, it would support all sorts of fancy features like layouts and templates, but I found that I was able to generate enough "teaching moments" without those things, and so this is pretty much all there is to it. You may want to spend a few more minutes [reading its source](http://github.com/elm-city-craftworks/jackal) before moving on, but if you understand this example, you will have no trouble understanding the rest of this article.
 42 | 
 43 | Now that you have some sense of the surrounding context, I will take you on a guided tour through various points of interest in Jackal's implementation, highlighting the parts that illustrate generally useful techniques.
 44 | 
 45 | ## Idioms for text processing
 46 | 
 47 | While working on solving this problem, I noticed a total of four text processing idioms worth mentioning.
 48 | 
 49 | **1) Enabling multi-line mode in patterns**
 50 | 
 51 | The first step that Jackal (and Jekyll) need to take before further processing can be done on source files is to split the YAML-based metadata from the post's content. In Jekyll, the following code is used to split things up:
 52 | 
 53 | ```ruby
 54 | if self.content =~ /^(---\s*\n.*?\n?)^(---\s*$\n?)/m
 55 |   self.content = $POSTMATCH
 56 |   self.data    = YAML.load($1)
 57 | end
 58 | ```
 59 | 
 60 | This is a fairly vanilla use of regular expressions, and is pretty easy to read even if you aren't especially familiar with Jekyll itself. The main interesting thing about it is that it uses the `/m` modifier so that the pattern is evaluated in multiline mode. In Ruby, this modifier makes `.` match newline characters as well, which is what allows the group that captures the YAML metadata to span multiple lines without explicitly specifying the intermediate `\n` characters. The following contrived example should help you understand what that means if you are still scratching your head:
 61 | 
 62 | ```
 63 | >> "foo\nbar\nbaz\nquux"[/foo\n(.*)quux/, 1]
 64 | => nil
 65 | >> "foo\nbar\nbaz\nquux"[/foo\n(.*)quux/m, 1]
 66 | => "bar\nbaz\n"
 67 | ```
 68 | 
 69 | While this isn't much of an exciting idiom for those who have a decent understanding of regular expressions, I know that for many people patterns can be a mystery, and so I wanted to make sure to point this feature out. It is great to use whenever you need to match a semi-arbitrary blob of content that can span many lines.
 70 | 
 71 | **2) Using MatchData objects rather than global variables**
 72 | 
 73 | While it is not necessarily terrible to use variables like `$1` and `$POSTMATCH`, I tend to avoid them whenever it is not strictly necessary to use them. I find that using `String#match` feels a lot more object-oriented and is more aesthetically pleasing:
 74 | 
 75 | ```ruby
 76 | if md = self.content.match(/^(---\s*\n.*?\n?)^(---\s*$\n?)/m)
 77 |   self.content = md.post_match
 78 |   self.data    = YAML.load(md[1])
 79 | end
 80 | ```
 81 | 
 82 | If you combine this with the use of Ruby 1.9's named groups, your code ends up looking even better. The following example is what I ended up using in Jackal:
 83 | 
 84 | ```ruby
 85 | if (md = contents.match(/^(?<metadata>---\s*\n.*?\n?)^(---\s*$\n?)/m))
 86 |   self.contents = md.post_match
 87 |   self.metadata = YAML.load(md[:metadata])
 88 | end
 89 | ```
 90 | 
 91 | While this does lead to somewhat more verbose patterns, it helps quite a bit with readability and even makes it possible to directly use `MatchData` objects in a way similar to how we would work with a parameters hash.
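
To make the comparison with a parameters hash concrete, here is a minimal sketch. It assumes a made-up filename and a pattern written in the style of Jackal's `PATTERN`; the `filedata` name mirrors the hash used in `Jackal::Post#dirname`, but the conversion shown is only one possible way to build it and is not necessarily what Jackal does.

```ruby
# A pattern in the style of Jackal's PATTERN; the filename is hypothetical.
pattern = /\A(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})-(?<basename>.*)\.markdown\z/
md      = pattern.match("2012-05-09-tiniest-kitten.markdown")

# Named groups can be looked up by symbol or by string, much like hash keys.
year     = md[:year]
basename = md["basename"]

# MatchData#names and MatchData#captures line up, so they zip into a real Hash.
filedata = Hash[md.names.zip(md.captures)]
```

Converting to a plain `Hash` is optional, but it can be convenient when the values need to be passed to code that should not have to know about regular expressions.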
 92 | 
 93 | **3) Enabling free-spacing mode in patterns**
 94 | 
 95 | I tend to be very strict about keeping my code formatted so that my lines are under 80 characters, and as a result of that I find that I am often having to think about how to break up long statements. I ended up using the `/x` modifier in one of Jackal's regular expressions for this purpose, as shown below:
 96 | 
 97 | ```ruby
 98 | module Jackal
 99 |   class Post
100 |     PATTERN = /\A(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})-
101 |                 (?<basename>.*).markdown\z/x
102 | 
103 |     # ...
104 |   end
105 | end
106 | ```
107 | 
108 | This mode makes it so that patterns ignore whitespace characters, making the previous pattern functionally equivalent to the following pattern:
109 | 
110 | ```ruby
111 | /\A(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})-(?<basename>.*).markdown\z/x
112 | ```
113 | 
114 | However, this mode does not exist primarily to serve the needs of those with obsessive code formatting habits, but instead exists to make it possible to break up and document long regular expressions, such as in the following example:
115 | 
116 | ```ruby
117 | # adapted from: http://refactormycode.com/codes/573-phone-number-regex
118 | 
119 | PHONE_NUMBER_PATTERN = /^
120 |   (?:
121 |     (?<prefix>\d)             # prefix digit
122 |     [ \-\.]?                  # optional separator
123 |   )?
124 |   (?:
125 |     \(?(?<areacode>\d{3})\)?  # area code
126 |     [ \-\.]                   # separator
127 |   )?
128 |   (?<trunk>\d{3})             # trunk
129 |   [ \-\.]                     # separator
130 |   (?<line>\d{4})              # line
131 |   (?:\ ?x?                    # optional space or 'x'
132 |     (?<extension>\d+)         # extension
133 |   )?
134 | $/x
135 | ```
136 | 
137 | This idiom is not extremely common in Ruby, perhaps because it is easy to use interpolation within regular expressions to accomplish similar results. However, this does seem to be a handy way to document your patterns and arrange them in a way that can be easily visually scanned without having to chain things together through interpolation.
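
One caveat of free-spacing mode is that literal spaces in the pattern are ignored too, which is why the phone number pattern writes `\ ?` rather than a bare space. The following sketch uses two throwaway patterns (not taken from Jackal) to show the difference:

```ruby
# In free-spacing mode, unescaped whitespace in the pattern is ignored,
# so this is equivalent to /\Ahelloworld\z/.
loose = /\A hello world \z/x

# A literal space has to be escaped (or written as \s or [ ]).
strict = /\A hello \  world \z/x

loose_matches_joined  = !!("helloworld"  =~ loose)
loose_matches_spaced  = !!("hello world" =~ loose)
strict_matches_spaced = !!("hello world" =~ strict)
```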
138 | 
139 | **4) Making good use of Array#join**
140 | 
141 | Whenever I am building up a string from a list of elements, I tend to use `Array#join` rather than string interpolation (i.e. the `#{}` operator) if I am working with more than two elements. As an example, take a look at my implementation of the `Jackal::Post#dirname` method:
142 | 
143 | ```ruby
144 | module Jackal
145 |   class Post
146 |     def dirname
147 |       raise ArgumentError unless metadata["category"]
148 | 
149 |       [ metadata["category"],
150 |         filedata["year"], filedata["month"], filedata["day"] ].join("/")
151 |     end
152 |   end
153 | end
154 | ```
155 | 
156 | The reason for this is mostly aesthetic, but it gives me the freedom to format my code any way I would like, and is a bit easier to make changes to.
157 | 
158 | > **NOTE:** Noah Hendrix pointed out in the [comments on this article](http://practicingruby.com/articles/57#comments) that for this particular example, using `File.join` would be better because it would take platform-specific path syntax into account.
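
As a rough illustration of the alternative mentioned in the note, the sketch below builds the same directory name both ways. The literal values are placeholders standing in for `metadata["category"]` and the `filedata` fields:

```ruby
# Hypothetical values standing in for the post's metadata and filedata.
category, year, month, day = "essays", "2012", "05", "09"

joined_by_array = [category, year, month, day].join("/")
joined_by_file  = File.join(category, year, month, day)

# File.join also collapses duplicate separators between components.
tidy = File.join("essays/", "/2012")
```

Both approaches produce the same string for well-formed input; `File.join` is simply more forgiving when a component arrives with a stray leading or trailing slash.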
159 | 
160 | ## Idioms for working with files and folders
161 | 
162 | In addition to the text processing tricks that we've already gone over, I also noticed four idioms for doing various kinds of file and folder manipulation that came in handy.
163 | 
164 | **1) Manipulating filenames**
165 | 
166 | There are three methods that are commonly used for munging filenames: `File.dirname`, `File.basename`, and `File.extname`. In Jackal, I ended up using two out of three of them, but could easily imagine how to make use of all three.
167 | 
168 | I expect that most folks will already be familiar with `File.dirname`, but if that is not the case, the tests below should familiarize you with one of its use cases:
169 | 
170 | ```ruby
171 | describe Jackal::Page do
172 |   let(:page) do
173 |     posts_dir = "#{File.dirname(__FILE__)}/../fixtures/sample_app/_posts"
174 |     Jackal::Page.new("#{posts_dir}/2012-05-07-first-post.markdown")
175 |   end
176 | 
177 |   it "must extract the base filename" do
178 |     page.filename.must_equal("2012-05-07-first-post.markdown")
179 |   end
180 | end
181 | ```
182 | 
183 | When used in conjunction with the special `__FILE__` variable, `File.dirname` is used to generate a relative path. So for example, if the `__FILE__` variable in the previous tests evaluates to `"test/units/page_test.rb"`, you end up with the following return value from `File.dirname`:
184 | 
185 | ```ruby
186 | >> File.dirname("test/units/page_test.rb")
187 | => "test/units"
188 | ```
189 | 
190 | Then the whole path becomes `"test/units/../fixtures/sample_app/_posts"`, which is functionally equivalent to `"test/fixtures/sample_app/_posts"`. The main benefit is that should you run the tests from a different folder, `__FILE__` would be updated accordingly to still generate a correct relative path. This is yet another one of those idioms that is hardly exciting to those who are already familiar with it, but is an important enough tool that I wanted to make sure to mention it.
191 | 
192 | If you feel like you understand `File.dirname`, then `File.basename` should be just as easy to grasp. It is essentially the opposite operation, getting just the filename and stripping away the directories in the path. If you take a closer look at the tests above, you will see that `File.basename` is exactly what we need in order to implement the behavior hinted at by `Jackal::Page#filename`. The irb-based example below should give you a sense of how that could work:
193 | 
194 | ```
195 | >> File.basename("long/path/to/_posts/2012-05-09-tiniest-kitten.markdown")
196 | => "2012-05-09-tiniest-kitten.markdown"
197 | ```
198 | 
199 | For the sake of simplicity, I decided to support Markdown only in Jackal posts, but if we wanted to make it more Jekyll-like, we would need to support looking up which formatter to use based on the post's file extension. This is where `File.extname` comes in handy:
200 | 
201 | ```
202 | >> File.extname("2012-05-09-tiniest-kitten.markdown")
203 | => ".markdown"
204 | >> File.extname("2012-05-09-tiniest-kitten.textile")
205 | => ".textile"
206 | ```
207 | 
208 | Typically when you are interested in the extension of a file, you are also interested in the name of the file without the extension. While I have seen several hacks that can be used for this purpose, the approach I like best is to use the lesser-known two argument form of `File.basename`, as shown below:
209 | 
210 | ```
211 | >> File.basename("2012-05-09-tiniest-kitten.textile", ".*")
212 | => "2012-05-09-tiniest-kitten"
213 | >> File.basename("2012-05-09-tiniest-kitten.markdown", ".*")
214 | => "2012-05-09-tiniest-kitten"
215 | ```
216 | 
217 | While these three methods may not look especially beautiful in your code, they provide a fairly comprehensive way of decomposing paths and filenames into their parts. With that in mind, it is somewhat surprising to me how many different ways I have seen people attempt to solve these problems, typically resorting to some regexp-based hacks.
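
To show how the three methods fit together, here is a small sketch of the extension-based formatter lookup described earlier. Jackal itself supports Markdown only, so the `describe_post` helper and its lookup table are purely hypothetical:

```ruby
# Hypothetical helper: splits a post path into its parts and picks a
# formatter based on the file extension.
def describe_post(path)
  formatters = { ".markdown" => :markdown, ".textile" => :textile }
  extension  = File.extname(path)

  { :dir       => File.dirname(path),
    :name      => File.basename(path, ".*"),
    :formatter => formatters.fetch(extension) {
      raise ArgumentError, "no formatter for #{extension}"
    } }
end

post = describe_post("_posts/2012-05-09-tiniest-kitten.textile")
```

Using `Hash#fetch` with a block means an unsupported extension fails loudly instead of quietly producing `nil`.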
218 | 
219 | **2) Using Pathname objects**
220 | 
221 | Whenever Ruby has a procedural or functional API, it usually also has a more object-oriented way of doing things as well. Manipulating paths and filenames is no exception, and the example below shows that it is entirely possible to use `Pathname` objects to solve the same problems discussed in the previous section:
222 | 
223 | ```
224 | >> require "pathname"
225 | => true
226 | >> Pathname.new("long/path/to/_posts/2012-05-09-tiniest-kitten.markdown").dirname
227 | => #<Pathname:long/path/to/_posts>
228 | >> Pathname.new("long/path/to/_posts/2012-05-09-tiniest-kitten.markdown").basename
229 | => #<Pathname:2012-05-09-tiniest-kitten.markdown>
230 | >> Pathname.new("long/path/to/_posts/2012-05-09-tiniest-kitten.markdown").extname
231 | => ".markdown"
232 | ```
233 | 
234 | However, because doing so doesn't really simplify the code, it is hard to see the advantages of using `Pathname` objects in this particular example. A much better example can be found in `Jackal::Post#save`:
235 | 
236 | 
237 | ```ruby
238 | module Jackal
239 |   class Post
240 |     def save(base_dir)
241 |       target_dir = Pathname.new(base_dir) + dirname
242 | 
243 |       target_dir.mkpath
244 | 
245 |       File.write(target_dir + filename, contents)
246 |     end
247 |   end
248 | end
249 | ```
250 | 
251 | The main reason why I used a `Pathname` object here is because I needed to make use of the `mkpath` method. This method is roughly equivalent to the UNIX `mkdir -p` command, which handles the creation of intermediate directories automatically. This feature really comes in handy for safely generating a deeply nested folder structure similar to the ones that Jekyll produces. I could have alternatively used the `FileUtils` standard library for this purpose, but personally find `Pathname` to look and feel a lot more like a modern Ruby library.
252 | 
253 | Although its use here is almost coincidental, the `Pathname#+` method is another powerful feature worth mentioning. This method builds up a `Pathname` object through concatenation. Because this method accepts both `Pathname` objects and `String` objects as arguments but always returns a `Pathname` object, it makes it easy to incrementally build up a complex path. However, because `Pathname` objects do more than simply merge strings together, you need to be aware of certain edge cases. For example, the following irb session demonstrates that `Pathname` has a few special cases for dealing with absolute and relative paths:
254 | 
255 | ```
256 | >> Pathname.new("foo") + "bar"
257 | => #<Pathname:foo/bar>
258 | >> Pathname.new("foo") + "/bar"
259 | => #<Pathname:/bar>
260 | >> Pathname.new("foo") + "./bar"
261 | => #<Pathname:foo/bar>
262 | >> Pathname.new("foo") + ".////bar"
263 | => #<Pathname:foo/bar>
264 | ```
265 | 
266 | Unless you keep these issues in mind, you may end up introducing subtle errors into your code. However, this behavior makes sense as long as you can remember that `Pathname` is semantically aware of what a path actually is, and is not meant to be a drop-in replacement for ordinary string concatenation.
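
If the right-hand side comes from untrusted or loosely validated input, one way to avoid the absolute path surprise is to check for it explicitly. The `join_under` helper below is a hypothetical sketch and not part of Jackal:

```ruby
require "pathname"

# Hypothetical helper: refuse absolute fragments rather than letting
# Pathname#+ silently discard the base directory.
def join_under(base, fragment)
  fragment = Pathname.new(fragment)
  raise ArgumentError, "#{fragment} is absolute" if fragment.absolute?

  Pathname.new(base) + fragment
end

inside = join_under("_site", "essays/2012")

begin
  join_under("_site", "/etc")
  rejected = false
rescue ArgumentError
  rejected = true
end
```

This only guards against absolute fragments. Fragments containing `..` components can still point outside the base directory and would need a separate check.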
267 | 
268 | **3) Using File.write**
269 | 
270 | When I first started using Ruby, I was really impressed by how simple and expressive the `File.read` method was. Because of that, it was kind of a shock to find out that simply writing some text to a file was not as simple. The following code felt like the opposite of elegance to me, but we all typed it for years:
271 | 
272 | ```ruby
273 | File.open(filename, "w") { |f| f << contents }
274 | ```
275 | 
276 | In Ruby 1.9.3 and later, the above code can be replaced with something far nicer, as shown below:
277 | 
278 | ```ruby
279 | File.write(filename, contents)
280 | ```
281 | 
282 | If you look back at the implementation of `Jackal::Post#save`, you will see that I use this technique there. While it is the simple and obvious thing to do, a ton of built up muscle memory typically causes me to forget that `File.write` exists, even when I am not at all concerned about backwards compatibility.
283 | 
284 | Another pair of methods worth knowing about that help make some other easy tasks more elegant in a similar way are `File.binread` and `File.binwrite`. These aren't really related to our interests with Jackal, but are worth checking out if you ever work with binary files.
285 | 
286 | **4) Using Dir.mktmpdir for testing**
287 | 
288 | It can be challenging to write tests for code which deals with files and complicated folder structures, but it doesn't have to be. Ruby's standard library provides several useful tools for dealing with this problem, and `Dir.mktmpdir` (made available by `require "tmpdir"`) is one of the most useful.
289 | 
290 | I like to use this method in combination with `Dir.chdir` to build up a temporary directory structure, do some work in it, and then automatically discard all the files I generated as soon as my test is completed. The tests below are a nice example of how that works:
291 | 
292 | ```ruby
293 | it "must be able to save contents to file" do
294 |   Dir.mktmpdir do |base_dir|
295 |     post.save(base_dir)
296 | 
297 |     Dir.chdir("#{base_dir}/#{post.dirname}") do
298 |       File.read(post.filename).must_equal(post.contents)
299 |     end
300 |   end
301 | end
302 | ```
303 | This approach provides an alternative to using mock objects. Even though this code creates real files and folders, the transactional nature of `Dir.mktmpdir` ensures that tests won't have any unexpected side effects from run to run. When manipulating files and folders is part of the core job of an object (as opposed to an implementation detail), I prefer testing in this way rather than using mock objects for the sake of realism.
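
The cleanup behavior is easy to verify outside of a test suite. The following standalone sketch uses a throwaway file name to show that whatever is written inside the block is gone once the block returns:

```ruby
require "tmpdir"

leftover = nil

Dir.mktmpdir do |dir|
  leftover = File.join(dir, "example.txt")
  File.write(leftover, "scratch data")

  # Inside the block, the file really exists on disk.
  raise "expected file to exist" unless File.exist?(leftover)
end

# Once the block returns, the directory and everything in it is removed.
cleaned_up = !File.exist?(leftover)
```

Note that this automatic removal only applies to the block form; when `Dir.mktmpdir` is called without a block, the caller is responsible for deleting the directory.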
304 | 
305 | The `Dir.mktmpdir` method can also come in handy whenever some complicated work needs to be done in a sandbox on the file system. For example, I [use it in Bookie](https://github.com/sandal/bookie/blob/45e0c4d0a575026deff79732b3c4c737f1c6f15c/lib/bookie/emitters/epub.rb#L19-46) to store the intermediate results of a complicated text munging process, and it seems to work great for that purpose.
306 | 
307 | ## Reflections
308 | 
309 | Taken individually, these text processing and file management idioms only make a subtle improvement to the quality of your code. However, if you get in the habit of using most or all of them whenever you have an opportunity to do so, you will end up with much more maintainable code that is very easy to read.
310 | 
311 | Because many languages make text processing and file management hard, and because Ruby also has low level APIs that work in much the same way as those languages, it is often the case that folks end up solving these problems the hard way without ever realizing that there are nicer alternatives available. Hopefully this article has exposed you to a few tricks you haven't already seen before, but if it hasn't, maybe you can share some thoughts on how to make this code even better!
312 | 


--------------------------------------------------------------------------------
/evented-io.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Event loops demystified
  3 | ---
  4 | 
  5 | _Build a Node.js/EventMachine-style event loop in roughly 150 lines of Ruby code._
  6 | 
  7 | <!-- Issue 5.3 — September 4, 2012 -->
  8 | 
  9 | 
 10 | *This chapter was written by Magnus Holm ([@judofyr][judofyr]),
 11 | a Ruby programmer from Norway. Magnus works on various open source
 12 | projects (including the [Camping][camping] web framework),
 13 | and writes articles over at [the timeless repository][timeless].*
 14 | 
 15 | Working with network I/O in Ruby is so easy:
 16 | 
 17 | ```ruby
 18 | require 'socket'
 19 | 
 20 | # Start a server on port 9234
 21 | server = TCPServer.new('0.0.0.0', 9234)
 22 | 
 23 | # Wait for incoming connections
 24 | while io = server.accept
 25 |   io << "HTTP/1.1 200 OK\r\n\r\nHello world!"
 26 |   io.close
 27 | end
 28 | 
 29 | # Visit http://localhost:9234/ in your browser.
 30 | ```
 31 | 
 32 | Boom, a server is up and running! This approach has some disadvantages, though: we
 33 | can handle only one connection at a time. We can also have only one *server*
 34 | running at a time. It's no exaggeration to say that these constraints
 35 | can be quite limiting.
 36 | 
 37 | There are several ways to improve this situation, but lately we've seen an
 38 | influx of event-driven solutions. [Node.js][nodejs] is just an event-driven I/O-library
 39 | built on top of JavaScript. [EventMachine][em] has been a solid solution in the Ruby
 40 | world for several years. Python has [Twisted][twisted], and Perl has so many that they even
 41 | have [an abstraction around them][anyevent].
 42 | 
 43 | Although these solutions might seem like silver bullets, there are subtle details that
 44 | you'll have to think about. You can accomplish a lot by following simple rules
 45 | ("don't block the thread"), but I always prefer to know precisely what I'm
 46 | dealing with. Besides, if doing regular I/O is so simple, why does
 47 | event-driven I/O have to be looked at as black magic?
 48 | 
 49 | To show that they are nothing to be afraid of, we are going to implement an
 50 | I/O event loop in this article. Yep, that's right; we'll capture the core
 51 | part of EventMachine/Node.js/Twisted in about 150 lines of Ruby. It won't
 52 | be performant, it won't be test-driven, and it won't be solid, but it will
 53 | use the same concepts as in all of these great projects. We will start
 54 | by looking at a minimal chat server example and then discuss
 55 | how to build the infrastructure that supports it.
 56 | 
 57 | ## Obligatory chat server example
 58 | 
 59 | Because chat servers seem to be the event-driven equivalent of a
 60 | "hello world" program, we will keep with that tradition here. The
 61 | following example shows a trivial `ChatServer` object that uses
 62 | the `IOLoop` that we'll discuss in this article:
 63 | 
 64 | ```ruby
 65 | class ChatServer
 66 |   def initialize
 67 |     @clients = []
 68 |     @client_id = 0
 69 |   end
 70 | 
 71 |   def <<(server)
 72 |     server.on(:accept) do |stream|
 73 |       add_client(stream)
 74 |     end
 75 |   end
 76 | 
 77 |   def add_client(stream)
 78 |     id = (@client_id += 1)
 79 |     send("User ##{id} joined\n")
 80 | 
 81 |     stream.on(:data) do |chunk|
 82 |       send("User ##{id} said: #{chunk}")
 83 |     end
 84 | 
 85 |     stream.on(:close) do
 86 |       @clients.delete(stream)
 87 |       send("User ##{id} left\n")
 88 |     end
 89 | 
 90 |     @clients << stream
 91 |   end
 92 | 
 93 |   def send(msg)
 94 |     @clients.each do |stream|
 95 |       stream << msg
 96 |     end
 97 |   end
 98 | end
 99 | 
100 | # usage
101 | 
102 | io     = IOLoop.new
103 | server = ChatServer.new
104 | 
105 | server << io.listen('0.0.0.0', 1234)
106 | 
107 | io.start
108 | ```
109 | 
110 | To play around with this server, run [this script][chatserver] and then open up
111 | a couple of telnet sessions to it. You should be able to produce something like the
112 | following with a bit of experimentation:
113 | 
114 | ```
115 | # from User #1's console:
116 | $ telnet 127.0.0.1 1234
117 | 
118 | User #2 joined
119 | User #2 said: Hi
120 | Hi
121 | User #1 said: Hi
122 | User #2 said: Bye
123 | User #2 left
124 | 
125 | # from User #2's console (quits after saying Bye)
126 | $ telnet 127.0.0.1 1234
127 | 
128 | User #1 said: Hi
129 | Bye
130 | User #2 said: Bye
131 | ```
132 | 
133 | If you don't have the time to try out this code right now,
134 | don't worry: as long as you understand the basic idea behind it, you'll be fine.
135 | This chat server is here to serve as a practical example to help you
136 | understand [the code we'll be discussing][chatserver] throughout this article.
137 | 
138 | Now that we have a place to start from, let's build our event system.
139 | 
140 | ## Event handling
141 | 
142 | First of all we need, obviously, events! With no further ado:
143 | 
144 | ```ruby
145 | module EventEmitter
146 |   def _callbacks
147 |     @_callbacks ||= Hash.new { |h, k| h[k] = [] }
148 |   end
149 | 
150 |   def on(type, &blk)
151 |     _callbacks[type] << blk
152 |     self
153 |   end
154 | 
155 |   def emit(type, *args)
156 |     _callbacks[type].each do |blk|
157 |       blk.call(*args)
158 |     end
159 |   end
160 | end
161 | 
162 | class HTTPServer
163 |   include EventEmitter
164 | end
165 | 
166 | server = HTTPServer.new
167 | server.on(:request) do |req, res|
168 |   res.respond(200, 'Content-Type' => 'text/html')
169 |   res << "Hello world!"
170 |   res.close
171 | end
172 | 
173 | # When a new request comes in, the server will run:
174 | #   server.emit(:request, req, res)
175 | 
176 | ```
177 | 
178 | `EventEmitter` is a module that we can include in classes that can send and
179 | receive events. In one sense, this is the most important part of our event
180 | loop: it defines how we use and reason about events in the system. Modifying it
181 | later will require changes all over the place. Although this particular
182 | implementation is a bit simpler than what you'd expect from a real
183 | library, it covers the fundamental ideas that are common to all
184 | event-based systems.
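
To see the emitter working in isolation, here is a small self-contained sketch. It repeats the module so that it can run on its own, and the `door` object and its event names are made up for the demonstration:

```ruby
# Same EventEmitter as above, repeated so this snippet runs on its own.
module EventEmitter
  def _callbacks
    @_callbacks ||= Hash.new { |h, k| h[k] = [] }
  end

  def on(type, &blk)
    _callbacks[type] << blk
    self
  end

  def emit(type, *args)
    _callbacks[type].each { |blk| blk.call(*args) }
  end
end

log  = []
door = Object.new.extend(EventEmitter)

# #on returns self, so registrations can be chained.
door.on(:open) { |who| log << "#{who} opened the door" }
    .on(:open) { |who| log << "greeting #{who}" }

door.emit(:open, "Alice")
door.emit(:close)   # no callbacks registered, so nothing happens
```

Callbacks run synchronously and in registration order, which is the property the rest of the event loop relies on.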
185 | 
186 | ## The IO loop
187 | 
188 | Next, we need something to fire up these events. As you will see in
189 | the following code, the general flow of an event loop is simple:
190 | detect new events, run their associated callbacks, and then repeat
191 | the whole process again.
192 | 
193 | ```ruby
194 | class IOLoop
195 |   # List of streams that this IO loop will handle.
196 |   attr_reader :streams
197 | 
198 |   def initialize
199 |     @streams = []
200 |   end
201 | 
202 |   # Low-level API for adding a stream.
203 |   def <<(stream)
204 |     @streams << stream
205 |     stream.on(:close) do
206 |       @streams.delete(stream)
207 |     end
208 |   end
209 | 
210 |   # Some useful helpers:
211 |   def io(io)
212 |     stream = Stream.new(io)
213 |     self << stream
214 |     stream
215 |   end
216 | 
217 |   def open(file, *args)
218 |     io File.open(file, *args)
219 |   end
220 | 
221 |   def connect(host, port)
222 |     io TCPSocket.new(host, port)
223 |   end
224 | 
225 |   def listen(host, port)
226 |     server = Server.new(TCPServer.new(host, port))
227 |     self << server
228 |     server.on(:accept) do |stream|
229 |       self << stream
230 |     end
231 |     server
232 |   end
233 | 
234 |   # Start the loop by calling #tick over and over again.
235 |   def start
236 |     @running = true
237 |     tick while @running
238 |   end
239 | 
240 |   # Stop/pause the event loop after the current tick.
241 |   def stop
242 |     @running = false
243 |   end
244 | 
245 |   def tick
246 |     @streams.each do |stream|
247 |       stream.handle_read  if stream.readable?
248 |       stream.handle_write if stream.writable?
249 |     end
250 |   end
251 | end
252 | ```
253 | 
254 | Notice here that `IOLoop#start` blocks everything until `IOLoop#stop` is called.
255 | Everything after `IOLoop#start` will happen in callbacks, which means that the
256 | control flow can be surprising. For example, consider the following code:
257 | 
258 | ```ruby
259 | l = IOLoop.new
260 | 
261 | ruby = l.connect('ruby-lang.org', 80)  # 1
262 | ruby << "GET / HTTP/1.0\r\n\r\n"       # 2
263 | 
264 | # Print output
265 | ruby.on(:data) do |chunk|
266 |   puts chunk   # 3
267 | end
268 | 
269 | # Stop IO loop when we're done
270 | ruby.on(:close) do
271 |   l.stop       # 4
272 | end
273 | 
274 | l.start        # 5
275 | ```
276 | 
277 | You might think that you're writing data in step 2, but the
278 | `<<` method actually just stores the data in a local buffer.
279 | It's not until the event loop has started (in step 5) that the data
280 | actually gets sent. The `IOLoop#start` method triggers `#tick` to be run in a loop, which
281 | delegates to `Stream#handle_read` and `Stream#handle_write`. These methods
282 | are responsible for doing any necessary I/O operations and then triggering
283 | events such as `:data` and `:close`, which you can see being used in steps 3 and 4. We'll take a look at how `Stream` is implemented later, but for now
284 | the main thing to take away from this example is that event-driven code
285 | cannot be read in top-down fashion as if it were procedural code.
286 | 
287 | Studying the implementation of `IOLoop` should also reveal why it's
288 | so terrible to block inside a callback. For example, take a look at this
289 | call graph:
290 | 
291 | ```
292 | # indentation means that a method/block is called
293 | # deindentation means that the method/block returned
294 | 
295 | tick (10 streams are readable)
296 |   stream1.handle_read
297 |     stream1.emit(:data)
298 |       your callback
299 | 
300 |   stream2.handle_read
301 |     stream2.emit(:data)
302 |       your callback
303 |         you have a "sleep 5" inside here
304 | 
305 |   stream3.handle_read
306 |     stream3.emit(:data)
307 |       your callback
308 |   ...
309 | ```
310 | 
311 | By blocking inside the second callback, the I/O loop has to wait 5 seconds
312 | before it's able to call the rest of the callbacks. No other stream can
313 | make progress while the loop is stuck there, so it is important
314 | to avoid such a situation when possible. Of course, nonblocking
315 | callbacks are not enough—the event loop also needs to make use of nonblocking
316 | I/O. Let's go over that a bit more now.
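
Before moving on, the cost of a blocking callback is easy to measure. The following is only a rough simulation: the three lambdas are hypothetical stand-ins for `:data` callbacks, and the `map` call plays the role of a single `#tick`. It is not the `IOLoop` from this article, just the same "one callback after another" structure.

```ruby
# Hypothetical stand-ins for three stream callbacks; the second one blocks.
callbacks = [
  -> { :fast },
  -> { sleep 0.2; :slow },   # simulates a blocking call inside a callback
  -> { :fast }
]

start = Process.clock_gettime(Process::CLOCK_MONOTONIC)

# One simulated tick: callbacks run strictly one after another.
finished_at = callbacks.map do |callback|
  callback.call
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

finished_at.each_with_index do |elapsed, i|
  puts format("callback %d finished after %.2fs", i + 1, elapsed)
end
```

The third callback does no slow work of its own, yet it cannot finish until the second one has stopped sleeping.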
317 | 
318 | ## IO events
319 | 
320 | At the most basic level, there are only two events for an `IO` object:
321 | 
322 | 1. Readable: The `IO` is readable; data is waiting for us.
323 | 2. Writable: The `IO` is writable; we can write data.
324 | 
325 | These might sound a little confusing: how can a client know that the server
326 | will send us data? It can't. Readable doesn't mean "the server will send us
327 | data"; it means "the server has already sent us data." In that case, the data
328 | is handled by the kernel in your OS. Whenever you read from an `IO` object, you're
329 | actually just copying bytes from the kernel. If the receiver does not read
330 | from `IO`, the kernel's buffer will become full and the sender's `IO` will
331 | no longer be writable. The sender will then have to wait until the
332 | receiver can catch up and free up the kernel's buffer. This situation is
333 | what makes nonblocking `IO` operations tricky to work with.
334 | 
335 | Because these low-level operations can be tedious to handle manually, the
336 | goal of an I/O loop is to trigger some more usable events for application
337 | programmers:
338 | 
339 | 1. Data: A chunk of data was sent to us.
340 | 2. Close: The IO was closed.
341 | 3. Drain: We've sent all buffered outgoing data.
342 | 4. Accept: A new connection was opened (only for servers).
343 | 
344 | All of this functionality can be built on top of Ruby's `IO` objects with
345 | a bit of effort.
346 | 
347 | ## Working with the Ruby IO object
348 | 
349 | There are various ways to read from an `IO` object in Ruby:
350 | 
351 | ```ruby
352 | data = io.read
353 | data = io.read(12)
354 | data = io.readpartial(12)
355 | data = io.read_nonblock(12)
356 | ```
357 | 
358 | * `io.read` reads until the `IO` is closed (e.g., end of file, server closes the
359 | connection, etc.)
360 | 
361 | * `io.read(12)` reads until it has received exactly 12 bytes (or reaches end of file).
362 | 
363 | * `io.readpartial(12)` waits until the `IO` becomes readable, then it reads *at
364 | most* 12 bytes. So if a server sends only 6 bytes, `readpartial` will return
365 | those 6 bytes. If you had used `read(12)`, it would wait until 6 more bytes were
366 | sent.
367 | 
368 | * `io.read_nonblock(12)` will read at most 12 bytes if the IO is readable. It
369 | raises `IO::WaitReadable` if the `IO` is not readable.
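
The differences between these methods are easier to see with a concrete `IO`. Here is a small sketch that uses `IO.pipe` as a stand-in for a network connection (the pipe and the sample strings are assumptions made for illustration; a socket behaves the same way for these calls):

```ruby
reader, writer = IO.pipe

# Nothing has been written yet, so a nonblocking read raises immediately.
empty_read =
  begin
    reader.read_nonblock(12)
  rescue IO::WaitReadable
    :would_block
  end
p empty_read  #=> :would_block

writer.write("123456")
partial = reader.readpartial(12)   # returns the 6 bytes that are available
p partial     #=> "123456"

writer.write("abcdefghijklmnop")   # 16 bytes
exact = reader.read(12)            # exactly 12 bytes
rest  = reader.read_nonblock(12)   # the 4 bytes that are left over
p exact       #=> "abcdefghijkl"
p rest        #=> "mnop"

writer.close
at_eof = reader.read               # writer is closed and nothing is left
p at_eof      #=> ""
```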
370 | 
371 | For writing, there are two methods:
372 | 
373 | ```ruby
374 | length = io.write(str)
375 | length = io.write_nonblock(str)
376 | ```
377 | 
378 | * `io.write` writes the whole string to the `IO`, waiting until the `IO` becomes
379 | writable if necessary. It returns the number of bytes written (which should
380 | always be equal to the number of bytes in the original string).
381 | 
382 | * `io.write_nonblock` writes as many bytes as possible until the `IO` becomes
383 | nonwritable, returning the number of bytes written. It raises `IO::WaitWritable`
384 | if the `IO` is not writable.
385 | 
386 | The challenge when both reading and writing in a nonblocking fashion is knowing
387 | when it is possible to do so and when it is necessary to wait.
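
To see what "necessary to wait" looks like in practice, here is a small sketch that fills up a pipe nobody is reading from. The pipe and the 1024-byte chunk size are arbitrary choices for illustration, and the number of bytes the kernel accepts before refusing more depends on the operating system.

```ruby
reader, writer = IO.pipe
chunk   = "x" * 1024
written = 0

# Nobody is reading, so the kernel's pipe buffer eventually fills up.
begin
  loop { written += writer.write_nonblock(chunk) }
rescue IO::WaitWritable
  puts "buffer full after #{written} bytes"
end

# Drain the pipe so that the writer has room again.
drained = 0
drained += reader.readpartial(written - drained).bytesize while drained < written

# The same call that raised a moment ago now succeeds.
accepted = writer.write_nonblock(chunk)
puts "wrote #{accepted} more bytes"
```

Note that `write_nonblock` returns how many bytes it actually accepted, so any code that uses it has to keep track of whatever is left unsent.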
388 | 
389 | ## Getting real with IO.select
390 | 
391 | We need some mechanism for knowing when we can read or write to our
392 | streams, but I'm not going to implement `Stream#readable?` or `#writable?`. It's
393 | a terrible solution to loop over every stream object in Ruby and check whether it's
394 | readable/writable over and over again. This is really just not a job for Ruby;
395 | it's too far away from the kernel.
396 | 
397 | Luckily, the kernel exposes ways to efficiently detect readable and writable
398 | I/O streams. The simplest cross-platform method is called select(2)
399 | and is available in Ruby as `IO.select`:
400 | 
401 | ```
402 | IO.select(read_array [, write_array [, error_array [, timeout]]])
403 | 
404 | Calls select(2) system call. It monitors supplied arrays of IO objects and waits
405 | until one or more IO objects are ready for reading, ready for writing, or have
406 | errors. It returns an array of those IO objects that need attention. It returns
407 | nil if the optional timeout (in seconds) was supplied and has elapsed.
408 | ```
409 | 
410 | With this knowledge, we can write a much better `#tick` method:
411 | 
412 | ```ruby
413 | class IOLoop
414 |   def tick
415 |     r, w = IO.select(@streams, @streams)
416 |     r.each do |stream|
417 |       stream.handle_read
418 |     end
419 | 
420 |     w.each do |stream|
421 |       stream.handle_write
422 |     end
423 |   end
424 | end
425 | ```
426 | 
427 | `IO.select` will block until some of our streams become readable or writable
428 | and then return those streams. From there, it is up to those streams to do
429 | the actual data processing work.
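
If you have not used `IO.select` before, its return value may be worth a quick look. This sketch uses a pipe in place of real streams, and passes a timeout of `0` so that the call returns immediately instead of blocking (both are choices made for the example, not something the `#tick` method above does):

```ruby
reader, writer = IO.pipe

# Nothing has been written yet, so nothing is readable and we get nil back.
idle = IO.select([reader], nil, nil, 0)
p idle  #=> nil

writer.write("ping")

# Now the read end has data waiting, and the write end still has room.
readable, writable, _errors = IO.select([reader], [writer], nil, 0)
p readable == [reader]  #=> true
p writable == [writer]  #=> true
```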
430 | 
431 | ## Handling streaming input and output
432 | 
433 | Now that we've used the `Stream` object in various examples, you may
434 | already have an idea of what its responsibilities are. But let's first take a look at how it is implemented:
435 | 
436 | ```ruby
437 | class Stream
438 |   # We want to bind/emit events.
439 |   include EventEmitter
440 | 
441 |   def initialize(io)
442 |     @io = io
443 |     # Store outgoing data in this String.
444 |     @writebuffer = ""
445 |   end
446 | 
447 |   # This tells IO.select what IO to use.
448 |   def to_io; @io end
449 | 
450 |   def <<(chunk)
451 |     # Append to buffer; #handle_write is doing the actual writing.
452 |     @writebuffer << chunk
453 |   end
454 | 
455 |   def handle_read
456 |     chunk = @io.read_nonblock(4096)
457 |     emit(:data, chunk)
458 |   rescue IO::WaitReadable
459 |     # Oops, turned out the IO wasn't actually readable.
460 |   rescue EOFError, Errno::ECONNRESET
461 |     # IO was closed
462 |     emit(:close)
463 |   end
464 | 
465 |   def handle_write
466 |     return if @writebuffer.empty?
467 |     length = @io.write_nonblock(@writebuffer)
468 |     # Remove the data that was successfully written.
469 |     @writebuffer.slice!(0, length)
470 |     # Emit "drain" event if there's nothing more to write.
471 |     emit(:drain) if @writebuffer.empty?
472 |   rescue IO::WaitWritable
473 |   rescue EOFError, Errno::ECONNRESET, Errno::EPIPE
474 |     emit(:close)
475 |   end
476 | end
477 | ```
478 | 
479 | `Stream` is nothing more than a wrapper around a Ruby `IO` object that
480 | abstracts away all the low-level details of reading and writing that were
481 | discussed throughout this article. The `Server` object we make use of
482 | in `IOLoop#listen` is implemented in a similar fashion but is focused
483 | on accepting incoming connections instead:
484 | 
485 | ```ruby
486 | class Server
487 |   include EventEmitter
488 | 
489 |   def initialize(io)
490 |     @io = io
491 |   end
492 | 
493 |   def to_io; @io end
494 | 
495 |   def handle_read
496 |     sock = @io.accept_nonblock
497 |     emit(:accept, Stream.new(sock))
498 |   rescue IO::WaitReadable
499 |   end
500 | 
501 |   def handle_write
502 |     # do nothing
503 |   end
504 | end
505 | ```
506 | 
507 | Now that you've studied how these low-level objects work, you should
508 | be able to revisit the full [source code for the Chat Server
509 | example][chatserver] and understand exactly how it works. If you
510 | can do that, you know how to build an evented I/O loop from scratch.
511 | 
512 | ### Conclusions
513 | 
514 | Although the basic ideas behind event-driven I/O systems are easy to understand,
515 | there are many low-level details that complicate things. This article discussed some of these ideas, but there are many others that would need
516 | to be considered if we were trying to build a real event library. Among
517 | other things, we would need to consider the following problems:
518 | 
519 | * Because our event loop does not implement timers, it is difficult to do
520 | a number of important things. Even something as simple as keeping a
521 | connection open for a set period of time can be painful without built-in
522 | support for timers, so any serious event library must support them. It's
523 | worth pointing out that `IO.select` does accept a timeout parameter, and
524 | it would be possible to make use of it fairly easily within this codebase.
525 | 
526 | * The event loop shown in this article is susceptible to [back pressure][bp],
527 | which occurs when data continues to be buffered infinitely even if it
528 | has not been accepted for processing yet. Because our event loop
529 | provides no mechanism for signaling that its buffers are full, incoming
530 | data will accumulate and have a similar effect to a memory leak until
531 | the connection is closed or the data is accepted.
532 | 
533 | * The performance of select(2) is linear, which means that handling
534 | 10,000 streams will take 10,000x as long as handling a single stream.
535 | Alternative solutions do exist at the kernel level, but many are not
536 | cross-platform and are not exposed to Ruby by default. If you have
537 | high performance needs, you may want to look into the [nio4r][nio4r]
538 | project, which attempts to solve this problem in a clean way by
539 | wrapping the libev library.
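
As a rough idea of how the timeout parameter mentioned in the first point could be used, here is a minimal sketch of a timer-aware tick. `TimerLoop`, `add_timer`, and `pending?` are hypothetical names invented for this example; none of them exist in the `IOLoop` built in this article, and a real library would need to handle much more (cancellation, repeating timers, and so on).

```ruby
# Hypothetical sketch: a loop that knows about timers as well as IO objects.
class TimerLoop
  def initialize
    @timers = []   # [deadline, callback] pairs
  end

  def add_timer(delay, &callback)
    @timers << [now + delay, callback]
  end

  def pending?
    !@timers.empty?
  end

  # One tick: wait for IO, but no longer than the next timer's deadline.
  # With no timers at all the timeout is nil, and IO.select blocks until
  # one of the given IO objects is ready.
  def tick(ios = [])
    deadline = @timers.map(&:first).min
    timeout  = deadline && [deadline - now, 0].max
    ready    = IO.select(ios, nil, nil, timeout)

    due, @timers = @timers.partition { |at, _| at <= now }
    due.sort_by(&:first).each { |_, callback| callback.call }
    ready
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

timers = TimerLoop.new
fired  = []
timers.add_timer(0.10) { fired << :second }
timers.add_timer(0.05) { fired << :first }

timers.tick while timers.pending?
p fired  #=> [:first, :second]
```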
540 | 
541 | The challenges involved in getting the details right in event loops
542 | are the real reason why tools like EventMachine and Node.js exist. These systems
543 | allow application programmers to gain the benefits of event-driven I/O without
544 | having to worry about too many subtle details. Still, knowing how they work under the hood
545 | should help you make better use of these tools, and should also take away some
546 | of the feeling that they are a kind of deep voodoo that you'll never
547 | comprehend. Event-driven I/O is perfectly understandable; it is just a bit
548 | messy.
549 | 
550 | [chatserver]: https://gist.githubusercontent.com/practicingruby/3612925/raw/315e7bfc5de7a029606b3885d71953acb84f112e/ChatServer.rb
551 | [timeless]: http://timelessrepo.com
552 | [camping]: https://github.com/camping
553 | [judofyr]: http://twitter.com/judofyr
554 | [nodejs]: http://nodejs.org
555 | [em]: http://rubyeventmachine.com
556 | [twisted]: http://twistedmatrix.com
557 | [anyevent]: http://metacpan.org/module/AnyEvent
558 | [libev]: http://software.schmorp.de/pkg/libev.html
559 | [libuv]: https://github.com/joyent/libuv
560 | [nio4r]: https://github.com/tarcieri/nio4r
561 | [bp]: http://en.wikipedia.org/wiki/Back_pressure#Back_pressure_in_information_technology
562 | 


--------------------------------------------------------------------------------
/formula-processing.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Safely evaluating user-defined formulas and calculations
  3 | ---
  4 | 
  5 | _Learn how to use Dentaku to evaluate Excel-like formulas in programs_
  6 | 
  7 | 
  8 | > This chapter was written by Solomon White ([@rubysolo](http://twitter.com/rubysolo)). Solomon is a software developer from Denver, where he builds web applications with Ruby and ENV.JAVASCRIPT_FRAMEWORK.  He likes code, caffeine, and capsaicin.
  9 | 
 10 | 
 11 | Imagine that you're a programmer for a company that sells miniature zen gardens, and you've been asked to create a  small calculator program that will help determine the material costs of the various different garden designs in the company's product line.
 12 | 
 13 | The tool itself is simple: The dimensions of the garden to be built will be entered via a web form, and then the calculator will output the quantity and weight of all the materials that are needed to construct the garden.
 14 | 
 15 | In practice, the problem is a little more complicated, because the company offers many different kinds of gardens. Even though only a handful of basic materials are used throughout the entire product line, the gardens themselves can consist of anything from a plain rectangular design to very intricate and complicated layouts. For this reason, figuring out how much material is needed for each garden type requires the use of custom formulas.
 16 | 
 17 | > MATH WARNING: You don't need to think through the geometric computations being done throughout this article, unless you enjoy that sort of thing; just notice how all the formulas are ordinary arithmetic expressions that operate on a handful of variables.
 18 | 
 19 | The following diagram shows the formulas used for determining the material quantities for two popular products. *Calm* is a minimal rectangular garden, while *Yinyang* is a more complex shape that requires working with circles and semicircles:
 20 | 
 21 | ![](//i.imgur.com/JlKz2kC.png)
 22 | 
 23 | In the past, material quantities and weights for new product designs were computed using Excel spreadsheets, which worked fine when the company only had a few different garden layouts. But to keep up with the incredibly high demand for bespoke desktop Zen Gardens, the business managers have insisted that their workflow become more Agile by moving all product design activities to a web application in THE CLOUD™.
 24 | 
 25 | The major design challenge for building this calculator is that it would not be practical to have a programmer update the codebase whenever a new product idea was dreamt up by the product design team. Some days, the designers have been known to attempt at least 32 different variants on a "snowman with top-hat" zen garden, and in the end only seven or so make it to the marketplace. Dealing with these rapidly changing requirements would drive any reasonable programmer insane.
 26 | 
 27 | After reviewing the project requirements, you decide to build a program that will allow the product design team to specify project requirements in a simple, Excel-like format and then safely execute the formulas they define within the context of a Ruby-based web application.
 28 | 
 29 | Fortunately, the [Dentaku](https://github.com/rubysolo/dentaku) formula parsing and evaluation library was built with this exact use case in mind. Just like you, Solomon White also really hates figuring out snowman geometry, and would prefer to leave that as an exercise for the user.
 30 | 
 31 | ## First steps with the Dentaku formula evaluator
 32 | 
 33 | The purpose of Dentaku is to provide a safe way to execute user-defined mathematical formulas within a Ruby application.  For example, consider the following code:
 34 | 
 35 | ```ruby
 36 | require "dentaku"
 37 | 
 38 | calc = Dentaku::Calculator.new
 39 | volume = calc.evaluate("length * width * height",
 40 |                        :length => 10, :width => 5, :height => 3)
 41 | 
 42 | p volume #=> 150
 43 | ```
 44 | 
 45 | Not much is going on here -- we have some named variables, some numerical values, and a simple formula: `length * width * height`.  Nothing in this example appears to be sensitive data, so on the surface it may not be clear why safety is a key concern here.
 46 | 
 47 | To understand the risks, you consider an alternative implementation that allows mathematical formulas to be evaluated directly as plain Ruby code. You implement the equivalent formula evaluator without the use of an external library, just to see what it would look like:
 48 | 
 49 | ```ruby
 50 | def evaluate_formula(expression, variables)
 51 |    obj = Object.new
 52 | 
 53 |    def obj.context
 54 |      binding
 55 |    end
 56 | 
 57 |    context = obj.context
 58 | 
 59 |    variables.each { |k,v| eval("#{k} = #{v}", context) }
 60 |    eval(expression, context)
 61 | end
 62 | 
 63 | volume = evaluate_formula("length * width * height",
 64 |                   :length => 10, :width => 5, :height => 3)
 65 | 
 66 | p volume #=> 150
 67 | ```
 68 | 
 69 | Although conceptually similar, it turns out these two code samples are worlds apart when you consider the implementation details:
 70 | 
 71 | * When using Dentaku, you're working with a very basic external domain specific language, which only knows how to represent simple numbers, variables, mathematical operations, etc. No direct access to the running Ruby process or its data is provided, and so formulas can only operate on what is explicitly provided to them whenever a `Calculator` object is instantiated.
 72 | 
 73 | * When using `eval` to run formulas as Ruby code, by default any valid Ruby code will be executed. Every instantiated object in the process can be accessed, system commands can be run, etc. This isn't much different than giving users access to the running application via an `irb` console.
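
To make the second point concrete, here is the same `evaluate_formula` method from above, fed a couple of "formulas" that have nothing to do with arithmetic. The particular expressions (`Dir.pwd` and `ENV.keys`) are just harmless examples chosen for illustration; anything else Ruby can do would work equally well.

```ruby
# Same eval-based evaluator as shown earlier, repeated so this runs on its own.
def evaluate_formula(expression, variables)
  obj = Object.new

  def obj.context
    binding
  end

  context = obj.context

  variables.each { |k, v| eval("#{k} = #{v}", context) }
  eval(expression, context)
end

# An ordinary formula works as expected...
area = evaluate_formula("length * width", :length => 10, :width => 5)
p area  #=> 50

# ...but so does code that inspects the running process.
leaked_dir  = evaluate_formula("Dir.pwd", {})
leaked_keys = evaluate_formula("ENV.keys", {})

puts "formula author can see: #{leaked_dir}"
puts "and #{leaked_keys.size} environment variables"
```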
 74 | 
 75 | This isn't to say that building a safe way to execute user-defined Ruby scripts isn't possible (it can even be practical in certain circumstances), but if you go that route, safe execution is something you need to specifically design for. By contrast, Dentaku is safe to use with minimally trusted users, because you have very fine-grained control over the data and actions those users will be able to work with.
 76 | 
 77 | You sit quietly for a moment and ponder the implications of all of this. After exactly four minutes of very serious soul searching, you decide that for the existing and foreseeable future needs of our overworked but relentlessly optimistic Zen garden designers... Dentaku should work just fine. To be sure that you're on the right path, you begin working on a functional prototype to share with the product team.
 78 | 
 79 | ## Building the web interface
 80 | 
 81 | You spend a little bit of time building out the web interface for the calculator, using Sinatra and Bootstrap. It consists of only two screens, both of which are shown below:
 82 | 
 83 | ![](//i.imgur.com/h0ftlcF.png)
 84 | 
 85 | People who mostly work with Excel spreadsheets all day murmur that you must be some sort of wizard, and compliment you on your beautiful design. You pay no attention to this, because your mind has already started to focus on the more interesting parts of the problem.
 86 | 
 87 | > **SOURCE FILES:** [app.rb](https://github.com/PracticingDeveloper/dentaku-zen-garden/blob/32e518f80b5499990a4f92af6a261594baaba88a/app.rb) // [app.erb](https://github.com/PracticingDeveloper/dentaku-zen-garden/blob/32e518f80b5499990a4f92af6a261594baaba88a/views/app.erb) // [index.erb](https://github.com/PracticingDeveloper/dentaku-zen-garden/blob/32e518f80b5499990a4f92af6a261594baaba88a/views/index.erb) // [materials.erb](https://github.com/PracticingDeveloper/dentaku-zen-garden/blob/32e518f80b5499990a4f92af6a261594baaba88a/views/materials.erb)
 88 | 
 89 | ## Defining garden layouts as simple data tables
 90 | 
 91 | With a basic idea in mind for how you'll implement the calculator, your next task is to figure out how to define the various garden layouts as a series of data tables.
 92 | 
 93 | You start with the weight calculations table, because it involves the most basic computations. The formulas all boil down to variants on the `mass = volume * density` equation:
 94 | 
 95 | ![](//i.imgur.com/1VIrDO1.png)
 96 | 
 97 | This material weight lookup table is suitable for use in all of the product definitions, but the `quantity` value will vary based both on the dimensions of the garden to be built and the physical layout of the garden.
 98 | 
 99 | With that in mind, you turn your attention to the tables that determine how much material is needed for each project, starting with the Calm rectangular garden as an example.
100 | 
101 | Going back to the diagram from earlier, you can see that the quantity of materials needed by the Calm project can be completely determined by the length, width, height, and desired fill level for the sandbox:
102 | 
103 | ![](//i.imgur.com/BfHgoPB.png)
104 | 
105 | You could directly use these formulas in project specifications, but it would feel a little too low-level. Project designers will need to work with various box-like shapes often, and so it would feel more natural to describe the problem with terms like perimeter, area, volume, etc. Knowing that the Dentaku formula processing engine provides support for creating helper functions, you come up with the following definitions for the materials used in the Calm project:
106 | 
107 | ![](//i.imgur.com/xyYtuAM.png)
108 | 
109 | With this work done, you turn your attention to the Yinyang circular garden project. Even though it is much more complex than the basic rectangular design, you notice that it too is defined entirely in terms of a handful of simple variables -- diameter, height, and fill level:
110 | 
111 | ![](//i.imgur.com/1G0vaNx.png)
112 | 
113 | As was the case before, it would be better from a product design perspective to describe things in terms of circular area, cylindrical volume, and circumference rather than the primary dimensional variables, so you design the project definition with that in mind:
114 | 
115 | ![](//i.imgur.com/d71MgSp.png)
116 | 
117 | To make the system easily customizable by the product designers, the common formulas used in the various garden layouts will also be stored in a data table rather than hard-coding them in the web application. The following table lists the names and definitions for all the formulas used in the *Calm* and *Yinyang* projects:
118 | 
119 | ![](//i.imgur.com/ovOhwEX.png)
120 | 
121 | Now that you have a rough sense of what the data model will look like, you're ready to start working on implementing the calculator program. You may need to change the domain model at some point in the future to support more complex use cases, but many different garden layouts can already be represented in this format.
122 | 
123 | > **SOURCE FILES:** [calm.csv](https://github.com/PracticingDeveloper/dentaku-zen-garden/blob/32e518f80b5499990a4f92af6a261594baaba88a/db/projects/calm.csv) // [yinyang.csv](https://github.com/PracticingDeveloper/dentaku-zen-garden/blob/32e518f80b5499990a4f92af6a261594baaba88a/db/projects/yinyang.csv) // [materials.csv](https://github.com/PracticingDeveloper/dentaku-zen-garden/blob/32e518f80b5499990a4f92af6a261594baaba88a/db/materials.csv) // [common_formulas.csv](https://github.com/PracticingDeveloper/dentaku-zen-garden/blob/32e518f80b5499990a4f92af6a261594baaba88a/db/common_formulas.csv)
124 | 
125 | ## Implementing the formula processor
126 | 
127 | You start off by building a utility class for reading all the relevant bits of project data that will be needed by the calculator. For the most part, this is another boring chore -- it involves nothing more than loading CSV and JSON data into some arrays and hashes.
128 | 
129 | After a bit of experimentation, you end up implementing the following interface:
130 | 
131 | ```ruby
132 | p Project.available_projects
133 | #=> ["calm", "yinyang"]
134 | 
135 | p Project.variables("calm")
136 | #=> ["length", "width", "height"]
137 | 
138 | p Project.weight_formulas["black sand"]
139 | #=> "quantity * 2.000"
140 | 
141 | p Project.quantity_formulas("yinyang")
142 |           .select { |e| e["name"] == "black sand" } #=>
143 | # [{"name" => "black sand",
144 | #    "formula" => "cylinder_volume * 0.5 * fill",
145 | #    "unit" => "cu cm"}]
146 | 
147 | p Project.common_formulas["cylinder_volume"]
148 | #=> "circular_area * height"
149 | ```
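
The loading code itself is not shown here, but as a rough sketch, turning a formulas file into a hash could look something like the following. The `load_formulas` helper, the `name` and `formula` column headers, and the `box_volume` row are all assumptions made for illustration; only the `cylinder_volume` definition comes from the interface above.

```ruby
require "csv"

# Hypothetical file contents; the real CSV layout may differ.
COMMON_FORMULAS_CSV = <<~CSV
  name,formula
  cylinder_volume,circular_area * height
  box_volume,length * width * height
CSV

# Build a { "formula name" => "formula text" } hash from CSV text.
def load_formulas(csv_text)
  CSV.parse(csv_text, headers: true).each_with_object({}) do |row, formulas|
    formulas[row["name"]] = row["formula"]
  end
end

common_formulas = load_formulas(COMMON_FORMULAS_CSV)
p common_formulas["cylinder_volume"]  #=> "circular_area * height"
```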
150 | 
151 | Down the line, the `Project` class will probably read from a database rather than text files, but this is largely an implementation detail. Rather than getting bogged down in ruminations about the future, you shift your attention to the heart of the problem -- the Dentaku-powered `Calculator` class.
152 | 
153 | This class will be instantiated with the name of a particular garden layout and a set of dimensional parameters that will be used to determine how much of each material is needed, and how much the entire garden kit will weigh. Sketching this concept out in code, you decide that the `Calculator` class should work as shown below:
154 | 
155 | ```ruby
156 | calc = Calculator.new("yinyang", "diameter" => "20", "height" => "5")
157 | 
158 | p calc.materials.map { |e| [e['name'], e['quantity'].ceil, e['unit']] } #=>
159 | # [["1cm thick flexible strip", 472, "sq cm"],
160 | #  ["granite slab", 315, "sq cm"],
161 | #  ["white sand", 550, "cu cm"],
162 | #  ["black sand", 550, "cu cm"]]
163 | 
164 | p calc.shipping_weight #=> 4006
165 | ```
166 | 
167 | With that goal in mind, the constructor for the `Calculator` class needs to do two chores:
168 | 
169 | 1. Convert the string-based dimension parameters provided via the web form into numeric values that Dentaku understands. An easy way to do this is to treat the strings as Dentaku expressions and evaluate them, so that a string like `"3.1416"` ends up getting converted to a `BigDecimal` object under the hood.
170 | 
171 | 2. Load any relevant formulas needed to compute the material quantities and weights -- relying on the `Project` class to figure out how to extract these values from the various user-provided CSV files.
172 | 
173 | The resulting code ends up looking like this:
174 | 
175 | ```ruby
176 | class Calculator
177 |   def initialize(project_name, params={})
178 |     @params = Hash[params.map { |k,v| [k,Dentaku(v)] }]  #1
179 | 
180 |     @quantity_formulas = Project.quantity_formulas(project_name)  #2
181 |     @common_formulas   = Project.common_formulas
182 |     @weight_formulas   = Project.weight_formulas
183 |   end
184 | 
185 |   # ...
186 | end
187 | ```
188 | 
189 | Because a decent amount of work has already been done to massage all the relevant bits of data into exactly the right format, the actual work of computing required material quantities is surprisingly simple:
190 | 
191 | 1. Instantiate a `Dentaku::Calculator` object
192 | 2. Load all the necessary common formulas into that object (e.g. `circular_area`, `cylinder_volume`, etc.)
193 | 3. Walk over the various material quantity formulas and evaluate them (e.g. `"black sand" => "cylinder_volume * 0.5 * fill"`)
194 | 4. Build up new records that map the names of materials in a project to their quantities.
195 | 
196 | A few lines of code later, and you have a freshly minted `Calculator#materials` method:
197 | 
198 | ```ruby
199 | # class Calculator
200 | 
201 |   def materials
202 |     calculator = Dentaku::Calculator.new #1
203 | 
204 |     @common_formulas.each { |k,v| calculator.store_formula(k,v) }  #2
205 | 
206 |     @quantity_formulas.map do |material|
207 |       amt = calculator.evaluate(material['formula'], @params) #3
208 | 
209 |       material.merge('quantity' => amt) #4
210 |     end
211 |   end
212 | ```
213 | 
214 | And for your last trick, you implement the `Calculator#shipping_weight` method.
215 | 
216 | Because currently all shipping weight computations are simple arithmetic operations on a `quantity` for each material, you don't need to load up the various common formulas used in the geometry equations. You just need to look up the relevant weight formulas by name, then evaluate them for each material in the list to get a weight value for that material. Sum up those values for the entire materials list, and you're done!
217 | 
218 | ```ruby
219 | # class Calculator
220 | 
221 |   def shipping_weight
222 |     calculator = Dentaku::Calculator.new
223 | 
224 |     # Sum up weights for all materials in project based on quantity
225 |     materials.reduce(0.0) { |s, e|
226 |       weight = calculator.evaluate(@weight_formulas[e['name']], e)
227 | 
228 |       s + weight
229 |     }.ceil
230 |   end
231 | ```
232 | 
233 | Wiring the `Calculator` class up to your Sinatra application, you end up with a fully functional program, which looks just the same as it did when you mocked up the UI, but actually knows how to crunch numbers now.
234 | 
235 | As a sanity check, you enter the same values that you have been using to test the `Calculator` object on the command line into the Web UI, and observe the results:
236 | 
237 | ![](//i.imgur.com/26sV6wr.png)
238 | 
239 | They look correct. Mission accomplished!!!
240 | 
241 | > **SOURCE FILES:** [project.rb](https://github.com/PracticingDeveloper/dentaku-zen-garden/blob/32e518f80b5499990a4f92af6a261594baaba88a/project.rb) // [calculator.rb](https://github.com/PracticingDeveloper/dentaku-zen-garden/blob/32e518f80b5499990a4f92af6a261594baaba88a/calculator.rb)
242 | 
243 | ## Considering the tradeoffs involved in using Dentaku
244 | 
245 | It was easy to decide on using Dentaku in this particular project, for several reasons:
246 | 
247 | * The formulas used in the project consist entirely of simple arithmetic operations.
248 | 
249 | * The tool itself is an internal application with no major performance requirements.
250 | 
251 | * The people who will be writing the formulas already understand basic computing concepts.
252 | 
253 | * A programmer will be available to customize the workflow and assist with problems as needed.
254 | 
255 | If even a couple of these conditions were not met, the potential caveats of using Dentaku (or any similar formula processing tool) would require more careful consideration.
256 | 
257 | **Maintainability concerns:**
258 | 
259 | Even though Dentaku's domain specific language is a very simple one, formulas are still a form of code. Like all code, any formulas that run through Dentaku need to be tested in some way -- and when things go wrong, they need to be debugged.
260 | 
261 | If your use of Dentaku is limited to the sort of thing someone might type into a cell of an Excel spreadsheet, there isn't much of a problem to worry about. You can fairly easily build some sane error handling, and can provide features within your application to allow the user to test formulas before they go live in production.
262 | 
263 | The more that user-defined computations start looking like "real programs", the more you will miss the various niceties of a real programming environment. We take for granted things like smart code editors that understand the languages we're working in, revision control systems, elaborate testing tools, debuggers, package managers, etc.
264 | 
265 | The simple nature of Dentaku's DSL should prevent you from ever getting into enough complexity to require the benefits of a proper development environment. That said, if the use cases for your project require you to run complex user-defined code that looks more like a program than a simple formula, Dentaku would definitely be the wrong tool for the job.
266 | 
267 | **Performance concerns:**
268 | 
269 | The default evaluation behavior of Dentaku is completely unoptimized: simply adding two numbers together is a couple orders of magnitude slower than it would be in pure Ruby. It is possible to precompile expressions by enabling `AST` caching, and this reduces evaluation overhead significantly. Doing so may introduce memory management issues at scale though, and even with this optimization the evaluator runs several times slower than native Ruby.
270 | 
271 | None of these performance issues matter when you're solving a single system of equations per request, but if you need to run Dentaku expressions in a tight loop over a large dataset, this is a problem to be aware of.
272 | 
273 | **Usability concerns:**
274 | 
275 | In this particular project, the people who will be using Dentaku are already familiar with writing Excel-based formulas, and they are also comfortable with technology in general. This means that with a bit of documentation and training, they will likely be comfortable using a code-based computational tool, as long as the workflow is kept relatively simple.
276 | 
277 | In cases where the target audience is not assumed to be comfortable writing code-based mathematical expressions and working with raw data formats, a lot more in-application support would be required. For example, one could imagine building a drag-and-drop interface for designing a garden layout, which would in turn generate the relevant Dentaku expressions under the hood.
278 | 
279 | The challenge is that once you get to the point where you need to put a layer of abstraction between the user and Dentaku's DSL, you should carefully consider whether you actually need a formula processing engine at all. It's certainly better to go without the extra complexity when it's possible to do so, but this will depend heavily on the context of your particular application.
280 | 
281 | **Extensibility concerns:**
282 | 
283 | Setting up non-programmers with a means of doing their own computations can help cut down on a lot of tedious maintenance programming work, but the core domain model and data access rules are still defined by the application's source code.
284 | 
285 | As requirements change in a business, new data sources may need to be wired up, and new pieces of support code may need to be written from time to time. This can be challenging, because tweaks to the domain model might require corresponding changes to the user-defined formulas.
286 | 
287 | In practice, this means that an embedded formula processing system works best when either the data sources and core domain model are somewhat stable, or there is a programmer actively maintaining the system who can help guide users through any necessary changes that come up.
288 | 
289 | With code stored either as user-provided data files or even in the application's database, there is also the potential for messy and complicated migrations whenever a big change is needed. This may be especially challenging to navigate for non-programmers, who are used to writing something once and having it work forever.
290 | 
291 | *NOTE: Yes, this was a long list of caveats. Keep in mind that most of them only apply when you go beyond the "let's take this set of Excel sheets and turn it into a nicely managed program" use case and venture into the "I want to embed an ad-hoc SDK into my application" territory. The concerns listed above are meant to help you sort out what category your project falls into, so that you can choose a modeling technique wisely.*
292 | 
293 | ## Reflections and further explorations
294 | 
295 | By now you've seen that a formula parser/evaluator can be a great way to take a messy ad-hoc spreadsheet workflow and turn it into a slightly less messy ad-hoc web application workflow. This technique provides a way to balance the central management and depth of functionality that custom software development can offer with the flexibility and empowerment of putting computational modeling directly into the hands of non-programmers.
296 | 
297 | Although this is not an approach that should be used in every application, it's a very useful modeling strategy to know about, as long as you keep a close eye on the tradeoffs involved.
298 | 
299 | If you'd like to continue studying this topic, here are a few things to try out:
300 | 
301 | * Grab the [source code for the calculator application](https://github.com/PracticingDeveloper/dentaku-zen-garden), and run it on your own machine.
302 | 
303 | * Create a new garden layout, with some new material types and shapes. For example,
304 | you could try to create a group of concentric circles, or a checkerboard style design.
305 | 
306 | * [Explore how to extend Dentaku's DSL](https://github.com/rubysolo/dentaku#external-functions) with your own Ruby functions.
307 | 
308 | * Watch [Spreadsheets for developers](https://www.youtube.com/watch?v=0CKru5d4GPk), a talk by Felienne Hermans on the power and usefulness of basic spreadsheet software for rapid prototyping and ad-hoc explorations.
309 | 
310 | Good luck with your future number crunching, and thanks for reading!
311 | 


--------------------------------------------------------------------------------
/http-server.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: A minimal HTTP server
  3 | ---
  4 | 
  5 | _Build just enough HTTP functionality from scratch to serve up static files._   
  6 | 
  7 | <!-- Issue 7.2 — July 2, 2013 -->
  8 | 
  9 | 
 10 | 
 11 | *This chapter was written by Luke Francl, a Ruby developer living in
 12 | San Francisco. He is a developer at [Swiftype](https://swiftype.com) where he
 13 | works on everything from web crawling to answering support requests.*
 14 | 
 15 | Implementing a simpler version of a technology that you use every day can
 16 | help you understand it better. In this article, we will apply this
 17 | technique by building a simple HTTP server in Ruby.
 18 | 
 19 | By the time you're done reading, you will know how to serve files from your
 20 | computer to a web browser with no dependencies other than a few standard
 21 | libraries that ship with Ruby. Although the server
 22 | we build will not be robust or anywhere near feature complete,
 23 | it will allow you to look under the hood of one of the most fundamental
 24 | pieces of technology that we all use on a regular basis.
 25 | 
 26 | ## A (very) brief introduction to HTTP
 27 | 
 28 | We all use web applications daily and many of us build
 29 | them for a living, but much of our work is done far above the HTTP level.
 30 | We'll need to come down from the clouds a bit in order to explore
 31 | what happens at the protocol level when someone clicks a
 32 | link to *http://example.com/file.txt* in their web browser.
 33 | 
 34 | The following steps roughly cover the typical HTTP request/response lifecycle:
 35 | 
 36 | 1) The browser issues an HTTP request by opening a TCP socket connection to
 37 | `example.com` on port 80. The server accepts the connection, opening a
 38 | socket for bi-directional communication.
 39 | 
 40 | 2) When the connection has been made, the HTTP client sends an HTTP request:
 41 | 
 42 | ```
 43 | GET /file.txt HTTP/1.1
 44 | User-Agent: ExampleBrowser/1.0
 45 | Host: example.com
 46 | Accept: */*
 47 | ```
 48 | 
 49 | 3) The server then parses the request. The first line is the Request-Line which contains
 50 | the HTTP method (`GET`), Request-URI (`/file.txt`), and HTTP version (`1.1`).
 51 | Subsequent lines are headers, which consist of key-value pairs delimited by `:`.
 52 | After the headers is a blank line followed by an optional message body (not shown in
 53 | this example).
 54 | 
 55 | 4) Using the same connection, the server responds with the contents of the file:
 56 | 
 57 | ```
 58 | HTTP/1.1 200 OK
 59 | Content-Type: text/plain
 60 | Content-Length: 12
 61 | Connection: close
 62 | 
 63 | hello world
 64 | ```
 65 | 
 66 | 5) After finishing the response, the server closes the socket to terminate the connection.
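
Steps 2 and 3 can be sketched in a few lines of Ruby. The `parse_request` helper below is hypothetical (it is not part of the server built in this article) and assumes the whole request has already been read into a single string:

```ruby
# Hypothetical helper for illustration; not part of the article's server.
# Assumes the complete request text is already in memory.
def parse_request(raw)
  # A blank line (CRLF CRLF) separates the headers from the optional body.
  head, body = raw.split("\r\n\r\n", 2)
  lines = head.split("\r\n")

  # The Request-Line holds the method, Request-URI, and HTTP version.
  http_method, request_uri, version = lines.shift.split(" ", 3)

  # Each remaining line is a "Name: value" header.
  headers = lines.each_with_object({}) do |line, result|
    name, value = line.split(":", 2)
    result[name.strip] = value.to_s.strip
  end

  { method: http_method, uri: request_uri, version: version,
    headers: headers, body: body.to_s }
end

raw = "GET /file.txt HTTP/1.1\r\n" \
      "User-Agent: ExampleBrowser/1.0\r\n" \
      "Host: example.com\r\n" \
      "Accept: */*\r\n" \
      "\r\n"

request = parse_request(raw)
request[:method]          #=> "GET"
request[:uri]             #=> "/file.txt"
request[:headers]["Host"] #=> "example.com"
```

Splitting each header on the first `:` only (the `2` limit) matters for values that contain colons themselves, such as `Host: localhost:2345`.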
 67 | 
 68 | The basic workflow shown above is one of HTTP's simplest use cases,
 69 | but it is also one of the most common interactions handled by web servers.
 70 | Let's jump right into implementing it!
 71 | 
 72 | 
 73 | ## Writing the "Hello World" HTTP server
 74 | 
 75 | To begin, let's build the simplest thing that could possibly work: a web server
 76 | that always responds "Hello World" with HTTP 200 to any request. The following
 77 | code mostly follows the process outlined in the previous section, but is
 78 | commented line-by-line to help you understand its implementation details:
 79 | 
 80 | ```ruby
 81 | require 'socket' # Provides TCPServer and TCPSocket classes
 82 | 
 83 | # Initialize a TCPServer object that will listen
 84 | # on localhost:2345 for incoming connections.
 85 | server = TCPServer.new('localhost', 2345)
 86 | 
 87 | # loop infinitely, processing one incoming
 88 | # connection at a time.
 89 | loop do
 90 | 
 91 |   # Wait until a client connects, then return a TCPSocket
 92 |   # that can be used in a similar fashion to other Ruby
 93 |   # I/O objects. (In fact, TCPSocket is a subclass of IO.)
 94 |   socket = server.accept
 95 | 
 96 |   # Read the first line of the request (the Request-Line)
 97 |   request = socket.gets
 98 | 
 99 |   # Log the request to the console for debugging
100 |   STDERR.puts request
101 | 
102 |   response = "Hello World!\n"
103 | 
104 |   # We need to include the Content-Type and Content-Length headers
105 |   # to let the client know the size and type of data
106 |   # contained in the response. Note that HTTP is whitespace
107 |   # sensitive, and expects each header line to end with CRLF (i.e. "\r\n")
108 |   socket.print "HTTP/1.1 200 OK\r\n" +
109 |                "Content-Type: text/plain\r\n" +
110 |                "Content-Length: #{response.bytesize}\r\n" +
111 |                "Connection: close\r\n"
112 | 
113 |   # Print a blank line to separate the header from the response body,
114 |   # as required by the protocol.
115 |   socket.print "\r\n"
116 | 
117 |   # Print the actual response body, which is just "Hello World!\n"
118 |   socket.print response
119 | 
120 |   # Close the socket, terminating the connection
121 |   socket.close
122 | end
123 | ```
124 | 
125 | To test your server, run this code and then try opening `http://localhost:2345/anything`
126 | in a browser. You should see the "Hello World!" message. Meanwhile, in the output for
127 | the HTTP server, you should see the request being logged:
128 | 
129 | ```
130 | GET /anything HTTP/1.1
131 | ```
132 | 
133 | Next, open another shell and test it with `curl`:
134 | 
135 | ```
136 | curl --verbose -XGET http://localhost:2345/anything
137 | ```
138 | 
139 | You'll see the detailed request and response headers:
140 | 
141 | ```
142 | * About to connect() to localhost port 2345 (#0)
143 | *   Trying 127.0.0.1... connected
144 | * Connected to localhost (127.0.0.1) port 2345 (#0)
145 | > GET /anything HTTP/1.1
146 | > User-Agent: curl/7.19.7 (universal-apple-darwin10.0) libcurl/7.19.7
147 |               OpenSSL/0.9.8r zlib/1.2.3
148 | > Host: localhost:2345
149 | > Accept: */*
150 | >
151 | < HTTP/1.1 200 OK
152 | < Content-Type: text/plain
153 | < Content-Length: 13
154 | < Connection: close
155 | <
156 | Hello World!
157 | * Closing connection #0
158 | ```
159 | 
160 | Congratulations, you've written a simple HTTP server! Now we'll
161 | build a more useful one.
162 | 
163 | ## Serving files over HTTP
164 | 
165 | We're about to build a more realistic program that is capable of
166 | serving files over HTTP, rather than simply responding to any request
167 | with "Hello World". In order to do that, we'll need to make a few
168 | changes to the way our server works.
169 | 
170 | For each incoming request, we'll parse the `Request-URI` from the Request-Line and translate it into
171 | a path to a file within the server's public folder. If we're able to find a match, we'll
172 | respond with its contents, using the file's size to determine the `Content-Length`,
173 | and its extension to determine the `Content-Type`. If no matching file can be found,
174 | we'll respond with a `404 Not Found` error status.
175 | 
176 | Most of these changes are fairly straightforward to implement, but mapping the
177 | `Request-URI` to a path on the server's filesystem is a bit more complicated due
178 | to security issues. To simplify things a bit, let's assume for the moment that a
179 | `requested_file` function has been implemented for us already that can handle
180 | this task safely. Then we could build a rudimentary HTTP file server in the following way:
181 | 
182 | ```ruby
183 | require 'socket'
184 | require 'uri'
185 | 
186 | # Files will be served from this directory
187 | WEB_ROOT = './public'
188 | 
189 | # Map extensions to their content type
190 | CONTENT_TYPE_MAPPING = {
191 |   'html' => 'text/html',
192 |   'txt' => 'text/plain',
193 |   'png' => 'image/png',
194 |   'jpg' => 'image/jpeg'
195 | }
196 | 
197 | # Treat as binary data if content type cannot be found
198 | DEFAULT_CONTENT_TYPE = 'application/octet-stream'
199 | 
200 | # This helper function parses the extension of the
201 | # requested file and then looks up its content type.
202 | 
203 | def content_type(path)
204 |   ext = File.extname(path).split(".").last
205 |   CONTENT_TYPE_MAPPING.fetch(ext, DEFAULT_CONTENT_TYPE)
206 | end
207 | 
208 | # This helper function parses the Request-Line and
209 | # generates a path to a file on the server.
210 | 
211 | def requested_file(request_line)
212 |   # ... implementation details to be discussed later ...
213 | end
214 | 
215 | # Except where noted below, the general approach of
216 | # handling requests and generating responses is
217 | # similar to that of the "Hello World" example
218 | # shown earlier.
219 | 
220 | server = TCPServer.new('localhost', 2345)
221 | 
222 | loop do
223 |   socket       = server.accept
224 |   request_line = socket.gets
225 | 
226 |   STDERR.puts request_line
227 | 
228 |   path = requested_file(request_line)
229 | 
230 |   # Make sure the file exists and is not a directory
231 |   # before attempting to open it.
232 |   if File.exist?(path) && !File.directory?(path)
233 |     File.open(path, "rb") do |file|
234 |       socket.print "HTTP/1.1 200 OK\r\n" +
235 |                    "Content-Type: #{content_type(file)}\r\n" +
236 |                    "Content-Length: #{file.size}\r\n" +
237 |                    "Connection: close\r\n"
238 | 
239 |       socket.print "\r\n"
240 | 
241 |       # write the contents of the file to the socket
242 |       IO.copy_stream(file, socket)
243 |     end
244 |   else
245 |     message = "File not found\n"
246 | 
247 |     # respond with a 404 error code to indicate the file does not exist
248 |     socket.print "HTTP/1.1 404 Not Found\r\n" +
249 |                  "Content-Type: text/plain\r\n" +
250 |                  "Content-Length: #{message.bytesize}\r\n" +
251 |                  "Connection: close\r\n"
252 | 
253 |     socket.print "\r\n"
254 | 
255 |     socket.print message
256 |   end
257 | 
258 |   socket.close
259 | end
260 | ```
261 | 
262 | Although there is a lot more code here than what we saw in the
263 | "Hello World" example, most of it is routine file manipulation
264 | similar to the kind we'd encounter in everyday code. Now there
265 | is only one more feature left to implement before we can serve
266 | files over HTTP: the `requested_file` method.
267 | 
268 | ## Safely converting a URI into a file path
269 | 
270 | Practically speaking, mapping the Request-Line to a file on the
271 | server's filesystem is easy: you extract the Request-URI, scrub
272 | out any parameters and URI-encoding, and then finally turn that
273 | into a path to a file in the server's public folder:
274 | 
275 | ```ruby
276 | # Takes a request line (e.g. "GET /path?foo=bar HTTP/1.1")
277 | # and extracts the path from it, scrubbing out parameters
278 | # and unescaping URI-encoding.
279 | #
280 | # This cleaned up path (e.g. "/path") is then converted into
281 | # a relative path to a file in the server's public folder
282 | # by joining it with the WEB_ROOT.
283 | def requested_file(request_line)
284 |   request_uri  = request_line.split(" ")[1]
285 |   path         = URI::DEFAULT_PARSER.unescape(URI(request_uri).path)
286 | 
287 |   File.join(WEB_ROOT, path)
288 | end
289 | ```
290 | 
291 | However, this implementation has a very bad security problem that has affected
292 | many, many web servers and CGI scripts over the years: the server will happily
293 | serve up any file, even if it's outside the `WEB_ROOT`.
294 | 
295 | Consider a request like this:
296 | 
297 | ```
298 | GET /../../../../etc/passwd HTTP/1.1
299 | ```
300 | 
301 | On my system, when `File.join` is called on this path, the ".." path components
302 | will cause it to escape the `WEB_ROOT` directory and serve the `/etc/passwd` file.
303 | Yikes! We'll need to sanitize the path before use in order to prevent this
304 | kind of problem.
305 | 
306 | > **Note:** If you want to try to reproduce this issue on your own machine,
307 | you may need to use a low level tool like *curl* to demonstrate it. Some browsers change the path to remove the ".." before sending a request to the server.
308 | 
309 | Because security code is notoriously difficult to get right, we will borrow our
310 | implementation from [Rack::File](https://github.com/rack/rack/blob/master/lib/rack/file.rb).
311 | The approach shown below was actually added to `Rack::File` in response to a [similar
312 | security vulnerability](http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2013-0262) that
313 | was disclosed in early 2013:
314 | 
315 | ```ruby
316 | def requested_file(request_line)
317 |   request_uri  = request_line.split(" ")[1]
318 |   path         = URI::DEFAULT_PARSER.unescape(URI(request_uri).path)
319 | 
320 |   clean = []
321 | 
322 |   # Split the path into components
323 |   parts = path.split("/")
324 | 
325 |   parts.each do |part|
326 |     # skip any empty or current directory (".") path components
327 |     next if part.empty? || part == '.'
328 |     # If the path component goes up one directory level (".."),
329 |     # remove the last clean component.
330 |     # Otherwise, add the component to the Array of clean components
331 |     part == '..' ? clean.pop : clean << part
332 |   end
333 | 
334 |   # return the web root joined to the clean path
335 |   File.join(WEB_ROOT, *clean)
336 | end
337 | ```
338 | 
339 | To test this implementation (and finally see your file server in action),
340 | replace the `requested_file` stub in the example from the previous section
341 | with the implementation shown above, and then create an `index.html` file
342 | in a `public/` folder that is contained within the same directory as your
343 | server script. Upon running the script, you should be able to
344 | visit `http://localhost:2345/index.html` but NOT be able to reach any
345 | files outside of the `public/` folder.
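
To see what the cleaning loop does to a few tricky inputs, here is a standalone sketch of just that step. The `sanitized_path` name and the `EXAMPLE_ROOT` constant are made up for this illustration, and the input is assumed to be already unescaped:

```ruby
# Stand-in for WEB_ROOT so this snippet can run on its own.
EXAMPLE_ROOT = './public'

# Hypothetical helper: the same component-by-component cleanup used in
# requested_file, minus the Request-Line parsing and unescaping.
def sanitized_path(path, root = EXAMPLE_ROOT)
  clean = []

  path.split("/").each do |part|
    # Skip empty and current-directory components
    next if part.empty? || part == '.'
    # ".." drops the last clean component (a no-op if none are left)
    part == '..' ? clean.pop : clean << part
  end

  File.join(root, *clean)
end

sanitized_path("/index.html")             #=> "./public/index.html"
sanitized_path("/a/./b/../c.txt")         #=> "./public/a/c.txt"
sanitized_path("/../../../../etc/passwd") #=> "./public/etc/passwd"
```

Because `clean.pop` on an empty array simply returns `nil`, extra ".." components can never climb above the root: the traversal attempt ends up pointing at a (most likely nonexistent) file inside the public folder instead.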
346 | 
347 | ## Serving up index.html implicitly
348 | 
349 | If you visit `http://localhost:2345` in your web browser, you'll see a 404 Not
350 | Found response, even though you've created an index.html file. Most real web
351 | servers will serve an index file when the client requests a directory. Let's
352 | implement that.
353 | 
354 | This change is simpler than it seems, and can be accomplished by adding
355 | a single line of code to our server script:
356 | 
357 | ```diff
358 | # ...
359 | path = requested_file(request_line)
360 | 
361 | + path = File.join(path, 'index.html') if File.directory?(path)
362 | 
363 | if File.exist?(path) && !File.directory?(path)
364 | # ...
365 | ```
366 | 
367 | Doing so will cause any path that refers to a directory to have "/index.html" appended to
368 | the end of it. This way, `/` becomes `/index.html`, and `/path/to/dir` becomes
369 | `/path/to/dir/index.html`.
370 | 
371 | Perhaps surprisingly, the validations in our response code do not need
372 | to be changed. Let's recall what they look like and then examine why
373 | that's the case:
374 | 
375 | ```ruby
376 | if File.exist?(path) && !File.directory?(path)
377 |   # serve up the file...
378 | else
379 |   # respond with a 404
380 | end
381 | ```
382 | 
383 | Suppose a request is received for `/somedir`. That request will automatically be converted by our server into `/somedir/index.html`. If the index.html exists within `/somedir`, then it will be served up without any problems. However, if `/somedir` does not contain an `index.html` file, the `File.exist?` check will fail, causing the server to respond with a 404 error code. This is exactly what we want!
384 | 
385 | It may be tempting to think that this small change would make it possible to remove the `File.directory?` check, and in normal circumstances you might be able to safely do without it. However, because leaving it in prevents an error condition in the edge case where someone attempts to serve up a directory named `index.html`, we've decided to leave that validation as it is.
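
The three cases discussed above can be checked against a throwaway directory tree. This is only a sketch: the `resolve` helper is a hypothetical name that bundles the index rewrite and the existing validation, returning `nil` where the server would respond with a 404:

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper that mirrors the two checks in the request loop.
# Returns the path of the file to serve, or nil when a 404 is appropriate.
def resolve(path)
  path = File.join(path, 'index.html') if File.directory?(path)
  File.exist?(path) && !File.directory?(path) ? path : nil
end

results = Dir.mktmpdir do |root|
  # with_index/ contains a regular index.html file
  FileUtils.mkdir_p(File.join(root, 'with_index'))
  File.write(File.join(root, 'with_index', 'index.html'), '<h1>hi</h1>')

  # empty/ has no index file at all
  FileUtils.mkdir_p(File.join(root, 'empty'))

  # odd/ contains a *directory* that happens to be named index.html
  FileUtils.mkdir_p(File.join(root, 'odd', 'index.html'))

  {
    with_index: resolve(File.join(root, 'with_index')),
    empty:      resolve(File.join(root, 'empty')),
    odd:        resolve(File.join(root, 'odd'))
  }
end

results[:with_index] # path ending in "with_index/index.html"
results[:empty]      #=> nil
results[:odd]        #=> nil
```

Only the first case yields a servable file. The `odd` case is the one that would break if the `File.directory?` check were removed, because `File.exist?` is true for directories as well.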
386 | 
387 | With this small improvement, our file server is now pretty much working as we'd expect it to. If you want to play with it some more, you can grab the [complete source code](https://github.com/elm-city-craftworks/practicing-ruby-examples/tree/master/v7/002) from GitHub.
388 | 
389 | ## Where to go from here
390 | 
391 | In this article, we reviewed how HTTP works, then built a simple web
392 | server that can serve up files from a directory. We've also examined
393 | one of the most common security problems with web applications and
394 | fixed it. If you've made it this far, congratulations! That's a lot
395 | to learn in one day.
396 | 
397 | However, it's obvious that the server we've built is extremely limited.
398 | If you want to continue in your studies, here are a few recommendations
399 | for how to go about improving the server:
400 | 
401 | * According to the HTTP 1.1 specification, a server must minimally
402 | respond to GET and HEAD to be compliant. Implement the HEAD response.
403 | * Add error handling that returns a 500 response to the client
404 | if something goes wrong with the request.
405 | * Make the web root directory and port configurable.
406 | * Add support for POST requests. You could implement CGI by executing
407 | a script when it matches the path, or implement
408 | the [Rack spec](http://rack.rubyforge.org/doc/SPEC.html) to
409 | let the server serve Rack apps with `call`.
410 | * Reimplement the request loop using [GServer](http://www.ruby-doc.org/stdlib-2.0/libdoc/gserver/rdoc/GServer.html)
411 | (Ruby's generic threaded server) to handle multiple connections.
412 | 
413 | Please do share your experiences and code if you decide to try any of
414 | these ideas, or if you come up with some improvement ideas of your own.
415 | Happy hacking!
416 | 
417 | *We'd like to thank Eric Hodel, Magnus Holm, Piotr Szotkowski, and
418 | Mathias Lafeldt for reviewing this article and providing feedback
419 | before we published it.*
420 | 


--------------------------------------------------------------------------------
/parsing-json.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Parsing JSON the hard way
  3 | ---
  4 | 
  5 | _Learn about low-level parser and compiler tools by implementing a JSON parser_
  6 | 
  7 | <!-- Issue 6.1 — January 1, 2013 -->
  8 | 
  9 | 
 10 | *This chapter was written by Aaron Patterson, a Ruby
 11 | developer living in Seattle, WA.  He's been having fun writing Ruby for the past
 12 | 7 years, and hopes to share his love of Ruby with you.*
 13 | 
 14 | Hey everybody!  I hope you're having a great day today!  The sun has peeked out
 15 | of the clouds for a bit today, so I'm doing great!
 16 | 
 17 | In this article, we're going to be looking at some compiler tools for use with Ruby.  In
 18 | order to explore these tools, we'll write a JSON parser.  I know you're saying,
 19 | "but Aaron, *why* write a JSON parser?  Don't we have like 1,234,567 of them?".
 20 | Yes!  We do have precisely 1,234,567 JSON parsers available in Ruby!  We're
 21 | going to parse JSON because the grammar is simple enough that we can finish the
 22 | parser in one sitting, and because the grammar is complex enough that we can
 23 | exercise some of Ruby's compiler tools.
 24 | 
 25 | As you read on, keep in mind that this isn't an article about parsing JSON,
 26 | it's an article about using parser and compiler tools in Ruby.
 27 | 
 28 | ## The Tools We'll Be Using
 29 | 
 30 | I'm going to be testing this with Ruby 2.1.0, but it should work under any
 31 | flavor of Ruby you wish to try.  Mainly, we will be using a tool called `Racc`,
 32 | and a tool called `StringScanner`.
 33 | 
 34 | **Racc**
 35 | 
 36 | We'll be using Racc to generate our parser.  Racc is an LALR parser generator
 37 | similar to YACC.  YACC stands for "Yet Another Compiler Compiler", but this is
 38 | the Ruby version, hence "Racc".  Racc converts a grammar file (the ".y" file)
 39 | to a Ruby file that contains state transitions.  These state transitions are
 40 | interpreted by the Racc state machine (or runtime).  The Racc runtime ships
 41 | with Ruby, but the tool that converts the ".y" files to state tables does not.
 42 | In order to install the converter, do `gem install racc`.
 43 | 
 44 | We will write ".y" files, but users cannot run the ".y" files.  First we convert
 45 | them to runnable Ruby code, and ship the runnable Ruby code in our gem.  In
 46 | practical terms, this means that *only we install the Racc gem*, other users
 47 | do not need it.
 48 | 
 49 | Don't worry if this doesn't make sense right now.  It will become more clear
 50 | when we get our hands dirty and start playing with code.
 51 | 
 52 | **StringScanner**
 53 | 
 54 | Just like the name implies, [StringScanner](http://ruby-doc.org/stdlib-1.9.3/libdoc/strscan/rdoc/StringScanner.html)
 55 | is a class that helps us scan strings.  It keeps track of where we are
 56 | in the string, and lets us advance forward via regular expressions or by
 57 | character.
 58 | 
 59 | Let's try it out!  First we'll create a `StringScanner` object, then we'll scan
 60 | some letters from it:
 61 | 
 62 | ```ruby
 63 | require 'strscan'
 64 | 
 65 | ss = StringScanner.new 'aabbbbb' #=> #<StringScanner 0/7 @ "aabbb...">
 66 | ss.scan /a/ #=> "a"
 67 | ss.scan /a/ #=> "a"
 68 | ss.scan /a/ #=> nil
 69 | ss #=> #<StringScanner 2/7 "aa" @ "bbbbb">
 70 | ```
 71 | 
 72 | Notice that the third call to
 73 | [StringScanner#scan](http://ruby-doc.org/stdlib-1.9.3/libdoc/strscan/rdoc/StringScanner.html#method-i-scan)
 74 | resulted in a `nil`, since the regular expression did not match from the current
 75 | position.  Also note that when you inspect the `StringScanner` instance, you can
 76 | see the position of the scanner (in this case `2/7`).
 77 | 
 78 | We can also move through the scanner character by character using
 79 | [StringScanner#getch](http://ruby-doc.org/stdlib-1.9.3/libdoc/strscan/rdoc/StringScanner.html#method-i-getch):
 80 | 
 81 | ```ruby
 82 | ss #=> #<StringScanner 2/7 "aa" @ "bbbbb">
 83 | ss.getch #=> "b"
 84 | 
 85 | ss #=> #<StringScanner 3/7 "aab" @ "bbbb">
 86 | ```
 87 | 
 88 | The `getch` method returns the next character, and advances the pointer by one.
 89 | 
 90 | Now that we've covered the basics for scanning strings, let's take a
 91 | look at using Racc.
 92 | 
 93 | ## Racc Basics
 94 | 
 95 | As I said earlier, Racc is an LALR parser generator.  You can think of it as a
 96 | system that lets you write limited regular expressions that can execute
 97 | arbitrary code at different points as they're being evaluated.
 98 | 
 99 | Let's look at an example.  Suppose we have a pattern we want to match:
100 | `(a|c)*abb`.  That is, we want to match any number of 'a' or 'c' followed by
101 | 'abb'.  To translate this to a Racc grammar, we try to break up this regular
102 | expression to smaller parts, and assemble them as the whole.  Each part is
103 | called a "production".  Let's try breaking up this regular expression so that we
104 | can see what the productions look like, and the format of a Racc grammar file.
105 | 
106 | First we create our grammar file.  At the top of the file, we declare the Ruby
107 | class to be produced, followed by the `rule` keyword to indicate that we're
108 | going to declare the productions, followed by the `end` keyword to indicate the
109 | end of the productions:
110 | 
111 | ```
112 | class Parser
113 | rule
114 | end
115 | ```
116 | 
117 | Next, let's add the production for "a|c".  We'll call this production `a_or_c`:
118 | 
119 | 
120 | ```
121 | class Parser
122 | rule
123 |   a_or_c : 'a' | 'c' ;
124 | end
125 | ```
126 | 
127 | Now we have a rule named `a_or_c`, and it matches the characters 'a' or 'c'.  In
128 | order to match one or more `a_or_c` productions, we'll add a recursive
129 | production called `a_or_cs`:
130 | 
131 | ```
132 | class Parser
133 | rule
134 |   a_or_cs
135 |     : a_or_cs a_or_c
136 |     | a_or_c
137 |     ;
138 |   a_or_c : 'a' | 'c' ;
139 | end
140 | ```
141 | 
142 | The `a_or_cs` production recurses on itself, equivalent to the regular
143 | expression `(a|c)+`.  Next, a production for 'abb':
144 | 
145 | ```
146 | class Parser
147 | rule
148 |   a_or_cs
149 |     : a_or_cs a_or_c
150 |     | a_or_c
151 |     ;
152 |   a_or_c : 'a' | 'c' ;
153 |   abb    : 'a' 'b' 'b';
154 | end
155 | ```
156 | 
157 | Finally, the `string` production ties everything together:
158 | 
159 | 
160 | ```
161 | class Parser
162 | rule
163 |   string
164 |     : a_or_cs abb
165 |     | abb
166 |     ;
167 |   a_or_cs
168 |     : a_or_cs a_or_c
169 |     | a_or_c
170 |     ;
171 |   a_or_c : 'a' | 'c' ;
172 |   abb    : 'a' 'b' 'b';
173 | end
174 | ```
175 | 
176 | This final production matches one or more 'a' or 'c' characters followed by
177 | 'abb', or just the string 'abb' on its own.  This is equivalent to our original
178 | regular expression of `(a|c)*abb`.
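
As a quick sanity check on which inputs the grammar is meant to accept, the regular expression itself can be tried in plain Ruby. The `\A` and `\z` anchors are an addition for this sketch, so that the whole string has to match (which is how a parser treats its input):

```ruby
# Anchored version of the pattern: the entire string must match.
PATTERN = /\A(a|c)*abb\z/

# Strings the `string` production should accept
accepted = %w[abb aabb cabb acacabb].all? { |s| s.match?(PATTERN) }

# Strings it should reject
rejected = %w[ab abbc bab].none? { |s| s.match?(PATTERN) }

accepted #=> true
rejected #=> true
```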
179 | 
180 | **But Aaron, this is so long!**
181 | 
182 | I know, it's much longer than the regular expression version.  However, we can
183 | add arbitrary Ruby code to be executed at any point in the matching process.
184 | For example, every time we find just the string "abb", we can execute some
185 | arbitrary code:
186 | 
187 | ```
188 | class Parser
189 | rule
190 |   string
191 |     : a_or_cs abb
192 |     | abb
193 |     ;
194 |   a_or_cs
195 |     : a_or_cs a_or_c
196 |     | a_or_c
197 |     ;
198 |   a_or_c : 'a' | 'c' ;
199 |   abb    : 'a' 'b' 'b' { puts "I found abb!" };
200 | end
201 | ```
202 | 
203 | The Ruby code we want to execute should be wrapped in curly braces and placed
204 | after the rule where we want the trigger to fire.
205 | 
206 | To use this parser, we also need a tokenizer that can break the input
207 | data into tokens, along with some other boilerplate code. If you are curious
208 | about how that works, you can check out [this standalone
209 | example](https://gist.githubusercontent.com/sandal/9532497/raw/8e3bb03fc24c8f6604f96516bf242e7e13d0f4eb/parser_example.y).
210 | 
211 | Now that we've covered the basics, we can use the knowledge we have so far to build
212 | an event-based JSON parser and tokenizer.
213 | 
214 | ## Building our JSON Parser
215 | 
216 | Our JSON parser is going to consist of three different objects: a parser, a
217 | tokenizer, and a document handler.  The parser will be written with a Racc grammar,
218 | and will ask the tokenizer for input from the input stream.  Whenever the parser
219 | can identify a part of the JSON stream, it will send an event to the document
220 | handler.  The document handler is responsible for collecting the JSON
221 | information and translating it to a Ruby data structure. When we read in
222 | a JSON document, the following method calls are made:
223 | 
224 | ![method calls](//i.imgur.com/HZ0Sa.png)
225 | 
226 | It's time to get started building this system. We'll focus on building the
227 | tokenizer first, then work on the grammar for the parser, and finally implement
228 | the document handler.
229 | 
230 | ## Building the tokenizer
231 | 
232 | Our tokenizer is going to be constructed with an IO object.  We'll read the
233 | JSON data from the IO object.  Every time `next_token` is called, the tokenizer
234 | will read a token from the input and return it. Our tokenizer will return the
235 | following tokens, which we derived from the [JSON spec](http://www.json.org/):
236 | 
237 | * Strings
238 | * Numbers
239 | * True
240 | * False
241 | * Null
242 | 
243 | Complex types like arrays and objects will be determined by the parser.
244 | 
245 | **`next_token` return values:**
246 | 
247 | When the parser calls `next_token` on the tokenizer, it expects a two element
248 | array or a `nil` to be returned.  The first element of the array must contain
249 | the name of the token, and the second element can be anything (but most people
250 | just add the matched text).  When a `nil` is returned, that indicates there are
251 | no more tokens left in the tokenizer.
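
To make that contract concrete, here is a minimal sketch of a stand-in tokenizer that simply replays a canned list of tokens. `CannedTokenizer` is a hypothetical name used only for illustration; it is not part of RJSON or Racc.

```ruby
# Hypothetical stand-in tokenizer: replays a fixed list of tokens.
class CannedTokenizer
  def initialize(tokens)
    @tokens = tokens.dup
  end

  # Returns the next [name, value] pair, or nil once the list is empty
  # (Array#shift returns nil on an empty array).
  def next_token
    @tokens.shift
  end
end

tok = CannedTokenizer.new([["{", "{"], [:STRING, "\"foo\""], ["}", "}"]])

# Drain the tokenizer the same way a consumer would: stop at nil.
seen = []
while (token = tok.next_token)
  seen << token
end

p seen
```

Anything that answers `next_token` this way should be usable in place of the real tokenizer, which can be handy when exercising a grammar in isolation.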
252 | 
253 | **`Tokenizer` class definition:**
254 | 
255 | Let's look at the source for the Tokenizer class and walk through it:
256 | 
257 | ```ruby
258 | module RJSON
259 |   class Tokenizer
260 |     STRING = /"(?:[^"\\]|\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4}))*"/
261 |     NUMBER = /-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?/
262 |     TRUE   = /true/
263 |     FALSE  = /false/
264 |     NULL   = /null/
265 | 
266 |     def initialize io
267 |       @ss = StringScanner.new io.read
268 |     end
269 | 
270 |     def next_token
271 |       return if @ss.eos?
272 | 
273 |       case
274 |       when text = @ss.scan(STRING) then [:STRING, text]
275 |       when text = @ss.scan(NUMBER) then [:NUMBER, text]
276 |       when text = @ss.scan(TRUE)   then [:TRUE, text]
277 |       when text = @ss.scan(FALSE)  then [:FALSE, text]
278 |       when text = @ss.scan(NULL)   then [:NULL, text]
279 |       else
280 |         x = @ss.getch
281 |         [x, x]
282 |       end
283 |     end
284 |   end
285 | end
286 | ```
287 | 
288 | First we declare some regular expressions that we'll use along with the string
289 | scanner.  These regular expressions were derived from the definitions on
290 | [json.org](http://www.json.org).  We instantiate a string scanner object in the
291 | constructor.  String scanner requires a string on construction, so we read the
292 | IO object.  However, we could build an alternative tokenizer that reads from the
293 | IO as needed.
294 | 
295 | The real work is done in the `next_token` method.  The `next_token` method
296 | returns `nil` if there is nothing left to read from the string scanner.  Otherwise, it
297 | tries each regular expression until it finds a match.  If it finds a match, it
298 | returns the name of the token (for example `:STRING`) along with the text that
299 | it matched.  If none of the regular expressions match, then we read one
300 | character off the scanner, and return that character as both the name of the
301 | token, and the value.
302 | 
303 | Let's try feeding the tokenizer a JSON string and see what tokens come out:
304 | 
305 | ```ruby
306 | tok = RJSON::Tokenizer.new StringIO.new '{"foo":null}'
307 | #=> #<RJSON::Tokenizer:0x007fa8529fbeb8 @ss=#<StringScanner 0/12 @ "{\"foo...">>
308 | 
309 | tok.next_token #=> ["{", "{"]
310 | tok.next_token #=> [:STRING, "\"foo\""]
311 | tok.next_token #=> [":", ":"]
312 | tok.next_token #=> [:NULL, "null"]
313 | tok.next_token #=> ["}", "}"]
314 | tok.next_token #=> nil
315 | ```
316 | 
317 | In this example, we wrap the JSON string with a `StringIO` object in order to
318 | make the string quack like an IO.  Next, we try reading tokens from the
319 | tokenizer.  Each token the tokenizer understands has its name as the first value of
320 | the array, whereas unknown tokens have the single-character value.  For
321 | example, string tokens look like this: `[:STRING, "\"foo\""]`, and unknown tokens
322 | look like this: `['(', '(']`.  Finally, `nil` is returned when the input has
323 | been exhausted.
324 | 
325 | This is it for our tokenizer.  The tokenizer is initialized with an `IO` object,
326 | and has only one method: `next_token`.  Now we can focus on the parser side.
327 | 
328 | ## Building the parser
329 | 
330 | We have our tokenizer in place, so now it's time to assemble the parser.  First
331 | we need to do a little housekeeping.  We're going to generate a Ruby file from
332 | our `.y` file.  The Ruby file needs to be regenerated every time the `.y` file
333 | changes.  A Rake task sounds like the perfect solution.
334 | 
335 | **Defining a compile task:**
336 | 
337 | The first thing we'll add to the Rakefile is a rule that says *"translate .y files to
338 | .rb files using the following command"*:
339 | 
340 | ```ruby
341 | rule '.rb' => '.y' do |t|
342 |   sh "racc -l -o #{t.name} #{t.source}"
343 | end
344 | ```
345 | 
346 | Then we'll add a "compile" task that depends on the generated `parser.rb` file:
347 | 
348 | ```ruby
349 | task :compile => 'lib/rjson/parser.rb'
350 | ```
351 | 
352 | We keep our grammar file as `lib/rjson/parser.y`, and when we run `rake
353 | compile`, rake will automatically translate the `.y` file to a `.rb` file using
354 | Racc.
355 | 
356 | Finally we make the test task depend on the compile task so that when we run
357 | `rake test`, the compiled file is automatically generated:
358 | 
359 | ```ruby
360 | task :test => :compile
361 | ```
362 | 
363 | Now we can compile and test the `.y` file.
364 | 
365 | **Translating the JSON.org spec:**
366 | 
367 | We're going to translate the diagrams from [json.org](http://www.json.org/) to a
368 | Racc grammar.  A JSON document should be an object or an array at the root, so
369 | we'll make a production called `document` and it should be an `object` or an
370 | `array`:
371 | 
372 | ```
373 | rule
374 |   document
375 |     : object
376 |     | array
377 |     ;
378 | ```
379 | 
380 | Next we need to define `array`.  The `array` production can either be empty, or
381 | contain 1 or more values:
382 | 
383 | ```
384 |   array
385 |     : '[' ']'
386 |     | '[' values ']'
387 |     ;
388 | ```
389 | 
390 | The `values` production can be recursively defined as one value, or many values
391 | separated by a comma:
392 | 
393 | ```
394 |   values
395 |     : values ',' value
396 |     | value
397 |     ;
398 | ```
399 | 
400 | The JSON spec defines a `value` as a string, number, object, array, true, false,
401 | or null.  We'll define it the same way, but for the immediate values such as
402 | NUMBER, TRUE, and FALSE, we'll use the token names we defined in the tokenizer:
403 | 
404 | ```
405 |   value
406 |     : string
407 |     | NUMBER
408 |     | object
409 |     | array
410 |     | TRUE
411 |     | FALSE
412 |     | NULL
413 |     ;
414 | ```
415 | 
416 | Now we need to define the `object` production.  Objects can be empty, or
417 | have many pairs:
418 | 
419 | ```
420 |   object
421 |     : '{' '}'
422 |     | '{' pairs '}'
423 |     ;
424 | ```
425 | 
426 | We can have one or more pairs, and they must be separated with a comma.  We can
427 | define this recursively like we did with the array values:
428 | 
429 | ```
430 |   pairs
431 |     : pairs ',' pair
432 |     | pair
433 |     ;
434 | ```
435 | 
436 | Finally, a pair is a string and value separated by a colon:
437 | 
438 | ```
439 |   pair
440 |     : string ':' value
441 |     ;
442 | ```
443 | 
444 | Now we let Racc know about our special tokens by declaring them at the top, and
445 | we have our full parser:
446 | 
447 | ```
448 | class RJSON::Parser
449 | token STRING NUMBER TRUE FALSE NULL
450 | rule
451 |   document
452 |     : object
453 |     | array
454 |     ;
455 |   object
456 |     : '{' '}'
457 |     | '{' pairs '}'
458 |     ;
459 |   pairs
460 |     : pairs ',' pair
461 |     | pair
462 |     ;
463 |   pair : string ':' value ;
464 |   array
465 |     : '[' ']'
466 |     | '[' values ']'
467 |     ;
468 |   values
469 |     : values ',' value
470 |     | value
471 |     ;
472 |   value
473 |     : string
474 |     | NUMBER
475 |     | object
476 |     | array
477 |     | TRUE
478 |     | FALSE
479 |     | NULL
480 |     ;
481 |   string : STRING ;
482 | end
483 | ```
484 | 
485 | ## Building the handler
486 | 
487 | Our parser will send events to a document handler.  The document handler will
488 | assemble the beautiful JSON bits into a lovely Ruby object!  Granularity of the
489 | events is really up to you, but I'm going to go with 5 events:
490 | 
491 | * `start_object` - called when an object is started
492 | * `end_object`   - called when an object ends
493 | * `start_array`  - called when an array is started
494 | * `end_array`    - called when an array ends
495 | * `scalar`       - called with terminal values like strings, true, false, etc
496 | 
497 | With these 5 events, we can assemble a Ruby object that represents the JSON
498 | object we are parsing.
499 | 
500 | **Keeping track of events**
501 | 
502 | The handler we build will simply keep track of events sent to us by the parser.
503 | This creates a tree-like data structure that we'll use to convert JSON to Ruby.
504 | 
505 | ```ruby
506 | module RJSON
507 |   class Handler
508 |     def initialize
509 |       @stack = [[:root]]
510 |     end
511 | 
512 |     def start_object
513 |       push [:hash]
514 |     end
515 | 
516 |     def start_array
517 |       push [:array]
518 |     end
519 | 
520 |     def end_array
521 |       @stack.pop
522 |     end
523 |     alias :end_object :end_array
524 | 
525 |     def scalar(s)
526 |       @stack.last << [:scalar, s]
527 |     end
528 | 
529 |     private
530 | 
531 |     def push(o)
532 |       @stack.last << o
533 |       @stack << o
534 |     end
535 |   end
536 | end
537 | ```
538 | 
539 | When the parser encounters the start of an object, the handler pushes a list on
540 | the stack with the "hash" symbol to indicate the start of a hash.  Events that
541 | are children will be added to the parent, then when the object end is
542 | encountered the parent is popped off the stack.
543 | 
544 | This may be a little hard to understand, so let's look at some examples.  If we
545 | parse this JSON: `{"foo":{"bar":null}}`, then the `@stack` variable will look
546 | like this:
547 | 
548 | ```ruby
549 | [[:root,
550 |   [:hash,
551 |     [:scalar, "foo"],
552 |     [:hash,
553 |       [:scalar, "bar"],
554 |       [:scalar, nil]]]]]
555 | ```
556 | 
557 | If we parse a JSON array, like this JSON: `["foo",null,true]`, the `@stack`
558 | variable will look like this:
559 | 
560 | ```ruby
561 | [[:root,
562 |   [:array,
563 |     [:scalar, "foo"],
564 |     [:scalar, nil],
565 |     [:scalar, true]]]]
566 | ```
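
To see how the stack ends up in that shape, we can drive a handler by hand. The sketch below repeats the `Handler` class from above so that it runs on its own, and assumes the parser emits events in document order (keys are sent as scalars too). Peeking at `@stack` with `instance_variable_get` is only for demonstration.

```ruby
module RJSON
  class Handler
    def initialize
      @stack = [[:root]]
    end

    def start_object
      push [:hash]
    end

    def start_array
      push [:array]
    end

    def end_array
      @stack.pop
    end
    alias :end_object :end_array

    def scalar(s)
      @stack.last << [:scalar, s]
    end

    private

    # Attach the new node to its parent, then make it the current node.
    def push(o)
      @stack.last << o
      @stack << o
    end
  end
end

# Events for {"foo":{"bar":null}}, sent manually instead of by the parser.
handler = RJSON::Handler.new
handler.start_object
handler.scalar "foo"
handler.start_object
handler.scalar "bar"
handler.scalar nil
handler.end_object
handler.end_object

stack = handler.instance_variable_get(:@stack)
p stack
```

Every `end_*` event pops one level, so once the document is complete only the `:root` node is left on the stack, with the whole tree hanging off it.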
567 | 
568 | **Converting to Ruby:**
569 | 
570 | Now that we have an intermediate representation of the JSON, let's convert it to
571 | a Ruby data structure.  To convert to a Ruby data structure, we can just write a
572 | recursive function to process the tree:
573 | 
574 | ```ruby
575 | def result
576 |   root = @stack.first.last
577 |   process root.first, root.drop(1)
578 | end
579 | 
580 | private
581 | def process type, rest
582 |   case type
583 |   when :array
584 |     rest.map { |x| process(x.first, x.drop(1)) }
585 |   when :hash
586 |     Hash[rest.map { |x|
587 |       process(x.first, x.drop(1))
588 |     }.each_slice(2).to_a]
589 |   when :scalar
590 |     rest.first
591 |   end
592 | end
593 | ```
594 | 
595 | The `result` method removes the `root` node and sends the rest to the `process`
596 | method.  When the `process` method encounters a `hash` symbol it builds a hash
597 | using the children by recursively calling `process`.  Similarly, when an
598 | `array` symbol is found, an array is constructed recursively with the children.
599 | Scalar values are simply returned (which ends the recursion).  Now if we
600 | call `result` on our handler, we can get the Ruby object back.
601 | 
602 | Let's see it in action:
603 | 
604 | ```ruby
605 | require 'rjson'
606 | 
607 | input   = StringIO.new '{"foo":"bar"}'
608 | tok     = RJSON::Tokenizer.new input
609 | parser  = RJSON::Parser.new tok
610 | handler = parser.parse
611 | handler.result # => {"foo"=>"bar"}
612 | ```
613 | 
614 | **Cleaning up the RJSON API:**
615 | 
616 | We have a fully functional JSON parser.  Unfortunately, the API is not very
617 | friendly.  Let's take the previous example, and package it up in a method:
618 | 
619 | ```ruby
620 | module RJSON
621 |   def self.load(json)
622 |     input   = StringIO.new json
623 |     tok     = RJSON::Tokenizer.new input
624 |     parser  = RJSON::Parser.new tok
625 |     handler = parser.parse
626 |     handler.result
627 |   end
628 | end
629 | ```
630 | 
631 | Since we built our JSON parser to deal with IO from the start, we can add
632 | another method for people who would like to pass a socket or file handle:
633 | 
634 | ```ruby
635 | module RJSON
636 |   def self.load_io(input)
637 |     tok     = RJSON::Tokenizer.new input
638 |     parser  = RJSON::Parser.new tok
639 |     handler = parser.parse
640 |     handler.result
641 |   end
642 | 
643 |   def self.load(json)
644 |     load_io StringIO.new json
645 |   end
646 | end
647 | ```
648 | 
649 | Now the interface is a bit more friendly:
650 | 
651 | ```ruby
652 | require 'rjson'
653 | require 'open-uri'
654 | 
655 | RJSON.load '{"foo":"bar"}' # => {"foo"=>"bar"}
656 | RJSON.load_io URI.open('http://example.org/some_endpoint.json')
657 | ```
658 | 
659 | ## Reflections
660 | 
661 | So we've finished our JSON parser.  Along the way we've studied compiler
662 | technology including the basics of parsers, tokenizers, and even interpreters
663 | (yes, we actually interpreted our JSON!).  You should be proud of yourself!
664 | 
665 | The JSON parser we've built is versatile. We can:
666 | 
667 | * Use it in an event driven manner by implementing a Handler object
668 | * Use a simpler API and just feed strings
669 | * Stream in JSON via IO objects
670 | 
671 | I hope this article has given you the confidence to start playing with parser
672 | and compiler technology in Ruby. Please leave a comment if you have any
673 | questions for me.
674 | 
675 | ## Post Script
676 | 
677 | I want to follow up with a few bits of minutiae that I omitted to maintain
678 | clarity in the article:
679 | 
680 | * [Here](https://github.com/tenderlove/rjson/blob/master/lib/rjson/parser.y) is
681 | the final grammar file for our JSON parser.  Notice
682 | the [---- inner section in the .y file](https://github.com/tenderlove/rjson/blob/master/lib/rjson/parser.y#L53).
683 | Anything in that section is included *inside* the generated parser class.  This
684 | is how we get the handler object to be passed to the parser.
685 | 
686 | * Our parser actually [does the
687 | translation](https://github.com/tenderlove/rjson/blob/master/lib/rjson/parser.y#L42-50)
688 | of JSON terminal nodes to Ruby.  So we're actually doing the translation of JSON
689 | to Ruby in two places: the parser *and* the document handler.  The document
690 | handler deals with structure where the parser deals with immediate values (like
691 | true, false, etc).  An argument could be made that none or all of this
692 | translation *should* be done in the parser.
693 | 
694 | * Finally, I mentioned that [the
695 | tokenizer](https://github.com/tenderlove/rjson/blob/master/lib/rjson/tokenizer.rb)
696 | buffers.  I implemented a simple non-buffering tokenizer that you can read
697 | [here](https://github.com/tenderlove/rjson/blob/master/lib/rjson/stream_tokenizer.rb).
698 | It's pretty messy, but I think it could be cleaned up by using a state machine.
699 | 
700 | That's all. Thanks for reading! <3 <3 <3
701 | 


--------------------------------------------------------------------------------
/rapid-prototyping.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Rapid Prototyping
  3 | ---
  4 | 
  5 | _Build a tiny prototype of a Tetris game on the command line_
  6 | 
  7 | 
  8 | Ruby makes it easy to quickly put together a proof-of-concept for almost any kind of project, as long as you have some experience in rapid application development. In this article, I will go over how I build prototypes, sharing the tricks that have worked well for me.
  9 | 
 10 | Today we'll be walking through a bit of code that implements a small chunk of a falling blocks game that is similar to Tetris. If you're not familiar with Tetris, head over to [freetetris.org](http://freetetris.org) and play it a bit before reading this article.
 11 | 
 12 | Assuming you're now familiar with the general idea behind the game, I'll walk you through the thought process that I went through from the initial idea of working on a falling blocks game to the small bit of code I have written for this issue.
 13 | 
 14 | ## The Planning Phase
 15 | 
 16 | After running through a few ideas, I settled on a falling blocks game as a good example of a problem that's too big to be tackled in a single sitting, but easy enough to make some quick progress on.
 17 | 
 18 | The next step for me was to come up with a target set of requirements for my
 19 | prototype. To prevent the possibilities from seeming endless, I had to set a
 20 | time limit up front to make this decision making process easier. Because
 21 | very small chunks of focused effort can get you far in Ruby, I settled on
 22 | coming up with something I felt I could build within an hour or two.
 23 | 
 24 | I knew right away this meant that I wasn't going to make an interactive demo. Synchronizing user input and screen output is something that may be easy for folks who do it regularly, but my concurrency knowledge is very limited, and I'd risk spending several hours on that side of things and coming up empty if I went down that path. Fortunately, even without an event loop, there are still a lot of options for building a convincing demo.
 25 | 
 26 | In my initial optimism, I thought what I'd like to be able to do is place a piece on the screen, and then let gravity take over, eliminating any completed lines as it fell into place. But this would require me to implement collision detection, something I didn't want to tackle right away.
 27 | 
 28 | Eventually, I came up with the idea of just implementing the action that happens when a piece collides with the junk on the grid. This process involved turning the active piece into inactive junk, and then removing any completed rows from the grid. This is something that I felt fit within the range of what I could do within an hour or two, so I decided to sleep on it and see if any unknowns bubbled up to the surface.
 29 | 
 30 | I could have just started hacking right away, but ironically that's a practice I typically avoid when putting together rapid prototypes. If this were a commercial project and I quoted the customer 2-4 hours, I'd want to use their money in the best possible way, and picking the wrong scope for my project would be a surefire way to either blow the budget or fail to produce something interesting. I find a few hours of passive noodling helps me see unexpected issues before they bite me.
 31 | 
 32 | Fortunately, this idea managed to pass the test of time, and I set out to begin coding by turning the idea into a set of requirements.
 33 | 
 34 | ## The Requirements Phase
 35 | 
 36 | A good prototype does not come from a top-down or bottom-up design, but instead comes from starting in the middle and building outwards. By taking a small vertical slice of the problem at hand, you are forced to think about many aspects of the system, but not in a way that requires you consider the whole problem all at once. This allows most of your knowledge and often a good chunk of your code to be re-used when you approach the full project.
 37 | 
 38 | The key is to start with a behavior the user can actually observe. This means that you should be thinking in terms of features rather than functions and objects. Some folks use story frameworks such as Cucumber to help them formalize this sort of inside-out thinking, but personally, I prefer just to come up with a good, clear example and not worry about shoehorning it into a formal setting.
 39 | 
 40 | To do this, I created a simple text file filled with ASCII art that codified two cases: one in which a line was cleared, and one in which no lines were cleared. Both cases are shown below.
 41 | 
 42 | 
 43 | ### CASE 1: REMOVING COMPLETED LINES
 44 | 
 45 | ```
 46 | ==========
 47 | 
 48 | 
 49 | 
 50 | 
 51 | 
 52 | 
 53 |    #       
 54 |    #|    |
 55 |   |#||  ||
 56 | |||#||||||
 57 | ==========
 58 | ```
 59 | 
 60 | BECOMES:
 61 | 
 62 | ```
 63 | ==========
 64 | 
 65 | 
 66 | 
 67 | 
 68 | 
 69 | 
 70 | 
 71 |    |       
 72 |    ||    |
 73 |   ||||  ||
 74 | ==========
 75 | ```
 76 | 
 77 | ### CASE 2: COLLISION WITHOUT ANY COMPLETED LINES
 78 | 
 79 | ```
 80 | ==========
 81 | 
 82 | 
 83 | 
 84 | 
 85 | 
 86 | 
 87 |   #       
 88 |   ##|    |
 89 |   |#||  ||
 90 | ||| ||||||
 91 | ==========
 92 | ```
 93 | 
 94 | BECOMES:
 95 | 
 96 | ```
 97 | ==========
 98 | 
 99 | 
100 | 
101 | 
102 | 
103 | 
104 |   |       
105 |   |||    |
106 |   ||||  ||
107 | ||| ||||||
108 | ==========
109 | ```
110 | 
111 | ---------------------------------------------------------------------
112 | 
113 | With the goals for the prototype clearly outlined, I set out to write a simple program that would perform the necessary transformations.
114 | 
115 | ## The Coding Phase
116 | 
117 | One thing I'll openly admit is that when prototyping something that will take me less than a half day from end to end, I tend to relax my standards on both testing and writing clean code. The reason for this is that when I'm trying to take a nose-dive into a new problem domain, I find my best practices actually get in the way until I have at least a basic understanding of the project.
118 | 
119 | What I'll typically do instead is write a single file that implements both the objects I need and an example that gets me closer to my goal. For this project, I started with a canvas object for rendering output similar to what I outlined in my requirements.
120 | 
121 | Imagining this canvas object already existed, I wrote some code for generating the very first bit of output we see in the requirements.
122 | 
123 | ```ruby
124 | canvas = FallingBlocks::Canvas.new
125 | 
126 | (0..2).map do |x|
127 |   canvas.paint([x,0], "|")
128 | end
129 | 
130 | canvas.paint([2,1], "|")
131 | 
132 | (0..3).map do |y|
133 |   canvas.paint([3,y], "#")
134 | end
135 | 
136 | (4..9).map do |x|
137 |   canvas.paint([x,0], "|")
138 | end
139 | 
140 | [4,5,8,9].map do |x|
141 |   canvas.paint([x,1], "|")
142 | end
143 | 
144 | canvas.paint([4,2], "|")
145 | canvas.paint([9,2], "|")
146 | 
147 | puts canvas
148 | ```
149 | 
150 | While I use a few loops for convenience, it's easy to see that this code does little more than put symbols on a text grid at the specified (x,y) coordinates. Once `FallingBlocks::Canvas` is implemented, we'd expect the following output from this example:
151 | 
152 | ```
153 | ==========
154 | 
155 | 
156 | 
157 | 
158 | 
159 | 
160 |    #       
161 |    #|    |
162 |   |#||  ||
163 | |||#||||||
164 | ==========
165 | ```
166 | 
167 | What we have done is narrowed the problem down to a much simpler task, making it easier to get started. The following implementation is sufficient to get the example working, and is simple enough that we probably don't need to discuss it further.
168 | 
169 | ```ruby
170 | module FallingBlocks
171 |   class Canvas
172 |     SIZE = 10
173 | 
174 |     def initialize
175 |       @data = SIZE.times.map { Array.new(SIZE) }
176 |     end
177 | 
178 |     def paint(point, marker)
179 |       x,y = point
180 |       @data[SIZE-y-1][x] = marker
181 |     end
182 | 
183 |     def to_s
184 |       [separator, body, separator].join("\n")
185 |     end
186 | 
187 |     def separator
188 |       "="*SIZE
189 |     end
190 | 
191 |     def body
192 |       @data.map do |row|
193 |         row.map { |e| e || " " }.join
194 |       end.join("\n")
195 |     end
196 |   end
197 | end
198 | ```
199 | 
200 | However, things get a little more hairy once we've plucked this low hanging fruit. So far, we've built a tool for painting the picture of what's going on, but that doesn't tell us anything about the underlying structure. This is a good time to start thinking about what Tetris pieces are.
201 | 
202 | While a full implementation of the game would require implementing rotations and movement, our prototype looks at pieces frozen in time. This means that a piece is really just represented by a collection of points. If we define each piece based on an origin of [0,0], we end up with something like this for a vertical line:
203 | 
204 | ```ruby
205 | line = FallingBlocks::Piece.new([[0,0],[0,1],[0,2],[0,3]])
206 | ```
207 | 
208 | Similarly, a bent S-shaped piece would be defined like this:
209 | 
210 | ```ruby
211 | bent = FallingBlocks::Piece.new([[0,1],[0,2],[1,0],[1,1]])
212 | ```
213 | 
214 | In order to position these pieces on a grid, what we need is an anchor point that can be used to translate the positions occupied by the pieces into another coordinate space.
215 | 
216 | We could use the origin at [0,0], but for aesthetic reasons, I didn't like the mental model of grasping a piece by a position that could potentially be unoccupied. Instead, I decided to define the anchor as the top-left position occupied by the piece, which could later be translated to a different position on the canvas. This gives us an anchor of [0,3] for the line, and an anchor of [0,2] for the bent shape. I wrote the following example to outline how the API should work.
217 | 
218 | ```ruby
219 | line = FallingBlocks::Piece.new([[0,0],[0,1],[0,2],[0,3]])
220 | p line.anchor #=> [0,3]
221 | 
222 | bent = FallingBlocks::Piece.new([[0,1],[0,2],[1,0],[1,1]])
223 | p bent.anchor #=> [0,2]
224 | ```
225 | 
226 | Once again, a simple example gives me enough constraints to make it easy to write an object that implements the desired behavior.
227 | 
228 | ```ruby
229 | class Piece
230 |   def initialize(points)
231 |     @points = points
232 |     establish_anchor
233 |   end
234 | 
235 |   attr_reader :points, :anchor
236 | 
237 |   # Gets the top-left most point
238 |   def establish_anchor
239 |     @anchor = @points.max_by { |x,y| [y,-x] }
240 |   end
241 | end
242 | ```
243 | 
244 | As I was writing this code, I stopped for a moment and considered that this logic, as well as the earlier logic that maps (x,y) coordinates into a row-major data structure, is the sort of thing I really like to write unit tests for. There is nothing particularly tricky about this code, but the lack of tests makes it harder to see what's going on at a glance. Still, this sort of tension is normal when prototyping, and at this point I wasn't even 30 minutes into working on the problem, so I let the feeling pass.
245 | 
246 | The next step was to paint these pieces onto the canvas, and I decided to start
247 | with their absolute coordinates to verify my shape definitions. The following example
248 | outlines the behavior I had expected.
249 | 
250 | ```ruby
251 | canvas = FallingBlocks::Canvas.new
252 | 
253 | bent_shape = FallingBlocks::Piece.new([[0,1],[0,2],[1,0],[1,1]])
254 | bent_shape.paint(canvas)
255 | 
256 | puts canvas
257 | ```
258 | 
259 | OUTPUTS:
260 | 
261 | ```
262 | ==========
263 | 
264 | 
265 | 
266 | 
267 | 
268 | 
269 | 
270 | #         
271 | ##        
272 |  #        
273 | ==========
274 | ```
275 | 
276 | Getting this far was easy; the following definition of `Piece` does the trick:
277 | 
278 | ```ruby
279 | class Piece
280 |   SYMBOL = "#"
281 | 
282 |   def initialize(points)
283 |     @points = points
284 |     establish_anchor
285 |   end
286 | 
287 |   attr_reader :points, :anchor
288 | 
289 |   # Gets the top-left most point
290 |   def establish_anchor
291 |     @anchor = @points.max_by { |x,y| [y,-x] }
292 |   end
293 | 
294 |   def paint(canvas)
295 |     points.each do |point|
296 |       canvas.paint(point, SYMBOL)
297 |     end
298 |   end
299 | end
300 | ```
301 | 
302 | This demonstrates to me that the concept of considering pieces as a collection of points can work, and that my basic coordinates for a bent piece are right. But since I need a way to translate these coordinates to arbitrary positions of the grid for this code to be useful, this iteration was only a stepping stone. A new example pushes us forward.
303 | 
304 | ```ruby
305 | canvas = FallingBlocks::Canvas.new
306 | 
307 | bent_shape = FallingBlocks::Piece.new([[0,1],[0,2],[1,0],[1,1]])
308 | 
309 | canvas.paint_shape(bent_shape, [2,3])
310 | 
311 | puts canvas
312 | ```
313 | 
314 | OUTPUTS
315 | 
316 | ```
317 | ==========
318 | 
319 | 
320 | 
321 | 
322 | 
323 | 
324 |   #       
325 |   ##      
326 |    #      
327 | 
328 | ==========
329 | ```
330 | 
331 | As you can see in the code above, I decided that my `Piece#paint` method was probably better off as `Canvas#paint_shape`, just to collect the presentation logic in one place. Here's what the updated code ended up looking like.
332 | 
333 | ```ruby
334 | class Canvas
335 |   # ...
336 | 
337 |   def paint_shape(shape, position)
338 |     shape.translated_points(position).each do |point|
339 |       paint(point, Piece::SYMBOL)
340 |     end
341 |   end
342 | end
343 | ```
344 | 
345 | This new code does not rely directly on the `Piece#points` method anymore, but instead, passes a position to the newly created `Piece#translated_points` to get a set of coordinates anchored by the specified position.
346 | 
347 | ```ruby
348 | class Piece
349 |   #...
350 | 
351 |   def translated_points(new_anchor)
352 |     new_x, new_y = new_anchor
353 |     old_x, old_y = anchor
354 | 
355 |     dx = new_x - old_x
356 |     dy = new_y - old_y
357 | 
358 |     points.map { |x,y| [x+dx, y+dy] }
359 |   end
360 | end
361 | ```
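
As a quick sanity check on the arithmetic, here is a small worked example. It uses a trimmed-down copy of `Piece` (anchor plus translation only) so that it runs on its own; the trimming is mine and is not the article's final code.

```ruby
# Trimmed-down copy of Piece: just the anchor and the translation.
class Piece
  attr_reader :points, :anchor

  def initialize(points)
    @points = points
    # Top-left occupied point: highest y first, then lowest x.
    @anchor = points.max_by { |x, y| [y, -x] }
  end

  # Shift every point by the offset between the old and new anchor.
  def translated_points(new_anchor)
    dx = new_anchor[0] - anchor[0]
    dy = new_anchor[1] - anchor[1]
    points.map { |x, y| [x + dx, y + dy] }
  end
end

bent = Piece.new([[0,1],[0,2],[1,0],[1,1]])

p bent.anchor                     #=> [0, 2]
p bent.translated_points([2, 3])  #=> [[2, 2], [2, 3], [3, 1], [3, 2]]
```

The anchor `[0,2]` moves to `[2,3]`, so every point shifts by `dx = 2` and `dy = 1`. These are the four cells marked with `#` in the output shown earlier.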
362 | 
363 | While this mapping isn't very complex, it's yet another point where I was
364 | thinking 'gee, I should be writing tests', and a couple subtle bugs that
365 | cropped up while implementing it confirmed my gut feeling. But with the light
366 | visible at the end of the tunnel, I wrote an example to unify piece objects
367 | with the junk left on the grid from previous moves.
368 | 
369 | ```ruby
370 | game = FallingBlocks::Game.new
371 | bent_shape = FallingBlocks::Piece.new([[0,1],[0,2],[1,0],[1,1]])
372 | game.piece = bent_shape
373 | game.piece_position = [2,3]
374 | game.junk += [[0,0], [1,0], [2,0], [2,1], [4,0],
375 |               [4,1], [4,2], [5,0], [5,1], [6,0],
376 |               [7,0], [8,0], [8,1], [9,0], [9,1],
377 |               [9,2]]
378 | 
379 | puts game
380 | ```
381 | 
382 | OUTPUTS:
383 | 
384 | ```
385 | ==========
386 | 
387 | 
388 | 
389 | 
390 | 
391 | 
392 |   #
393 |   ##|    |
394 |   |#||  ||
395 | ||| ||||||
396 | ==========
397 | ```
398 | 
399 | The key component that tied this all together is the `Game` object, which essentially is just a container that knows how to use a `Canvas` object to render itself.
400 | 
401 | ```ruby
402 | class Game
403 |   def initialize
404 |     @junk = []
405 |     @piece = nil
406 |     @piece_position = []
407 |   end
408 | 
409 |   attr_accessor :junk, :piece, :piece_position
410 | 
411 |   def to_s
412 |     canvas = Canvas.new
413 | 
414 |     junk.each do |pos|
415 |       canvas.paint(pos, "|")
416 |     end
417 | 
418 |     canvas.paint_shape(piece, piece_position, "#")
419 | 
420 |     canvas.to_s
421 |   end
422 | end
423 | ```
424 | 
425 | I made a small change to `Canvas#paint_shape` so that the symbol used to display pieces on the grid was parameterized rather than stored in `Piece::SYMBOL`. This isn't a major change and was just another attempt at moving display code away from the data models.
426 | 
427 | After all this work, we've made it back to the output we were getting out of our first example, but without the smoke and mirrors. Still, the model is not as solid as I'd hoped for, and some last minute changes were needed to bridge the gap before this code was ready to implement the two use cases I was targeting.
428 | 
429 | Since the last iteration would be a bit cumbersome to describe in newsletter form, please just [check out my final commit](http://is.gd/jbvdB) for this project on GitHub. With this new code, it's possible to get output identical to our target story through the following two examples.
430 | 
431 | 
432 | ### CASE 1: line_shape_demo.rb
433 | 
434 | ```ruby
435 | require_relative "falling_blocks"
436 | 
437 | game = FallingBlocks::Game.new
438 | line_shape = FallingBlocks::Piece.new([[0,0],[0,1],[0,2],[0,3]])
439 | game.piece = line_shape
440 | game.piece_position = [3,3]
441 | game.add_junk([[0,0], [1,0], [2,0], [2,1], [4,0],
442 |               [4,1], [4,2], [5,0], [5,1], [6,0],
443 |               [7,0], [8,0], [8,1], [9,0], [9,1],
444 |               [9,2]])
445 | 
446 | puts game
447 | 
448 | puts "\nBECOMES:\n\n"
449 | 
450 | game.update_junk
451 | puts game
452 | ```
453 | 
454 | ### CASE 2: bended_shape_demo.rb
455 | 
456 | ```ruby
457 | require_relative "falling_blocks"
458 | 
459 | game = FallingBlocks::Game.new
460 | bent_shape = FallingBlocks::Piece.new([[0,1],[0,2],[1,0],[1,1]])
461 | game.piece = bent_shape
462 | game.piece_position = [2,3]
463 | game.add_junk([[0,0], [1,0], [2,0], [2,1], [4,0],
464 |               [4,1], [4,2], [5,0], [5,1], [6,0],
465 |               [7,0], [8,0], [8,1], [9,0], [9,1],
466 |               [9,2]])
467 | 
468 | puts game
469 | 
470 | puts "\nBECOMES:\n\n"
471 | 
472 | game.update_junk
473 | puts game
474 | ```
475 | 
476 | ## Reflections
477 | 
478 | Once I outlined the story by drawing some ASCII art, it took me just over 1.5 hours to produce working code that performs the transformations described. Overall, I'd call that a success.
479 | 
480 | That having been said, working on this problem was not without hurdles. While it turns out that removing completed lines and turning pieces into junk upon collision is surprisingly simple, I am still uneasy about my final design. It seems that there is considerable duplication between the grid maintained by `Game` and the `Canvas` object. But a refactoring here would be non-trivial, and I wouldn't want to attempt it without laying down some tests to minimize the amount of time hunting down subtle bugs.
481 | 
482 | For me, this is about as far as I can write code organically in a single sitting without either writing tests, or doing some proper design in front of a whiteboard, or a combination of the two. I think it's important to recognize this limit, and also note that it varies from person to person and project to project. The key to writing a good prototype is getting as close to that line as you can without flying off the edge of a cliff.
483 | 
484 | In the end though, what I like about this prototype is that it isn't just an illusion. With a little work, it'd be easy enough to scale up to my initial ambition of demonstrating a free falling piece. By adding some tests and doing some refactoring, it'd be possible to evolve this code into something that could be used in production rather than just treating it as throwaway demo-ware.
485 | 
486 | Hopefully, seeing how I decomposed the problem, and having a bit of insight into what my thought process was like as I worked on this project, has helped you understand what goes into making proof-of-concept code in Ruby. I've not actually taught extensively about this process before, so describing it is a bit of an experiment for me. Let me know what you think!
487 | 


--------------------------------------------------------------------------------
/roll-your-own-enumerable-and-enumerator.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Building Enumerable 'n' Enumerator
  3 | ---
  4 | 
  5 | _Learn about powerful iteration tools by implementing some of their functionality yourself_
  6 | 
  7 | 
  8 | When I first came to Ruby, one of the things that impressed me the most was the killer features provided by the `Enumerable` module. I eventually also came to love `Enumerator`, even though it took me a long time to figure out what it was and what one might use it for.
  9 | 
 10 | As a beginner, I had always assumed that these features worked through some dark form of magic that was buried deep within the Ruby interpreter. With so much left to learn just in order to be productive, I was content to postpone learning the details about what was going on under the hood. After some time, I came to regret that decision, thanks to David A. Black.
 11 | 
 12 | David teaches Ruby to raw beginners not only by showing them what `Enumerable` can do, but also by making them implement their own version of it! This is a profoundly good exercise, because it exposes how nonmagical the features are: if you understand `yield`, you can build all the methods in `Enumerable`. Similarly, the interesting features of `Enumerator` can be implemented fairly easily if you use Ruby's `Fiber` construct.
 13 | 
 14 | In this article, we're going to work through the exercise of rolling your own subset of the functionality provided by `Enumerable` and `Enumerator`, discussing each detail along the way. Regardless of your skill level, an understanding of the elegant design of these constructs will undoubtedly give you a great source of inspiration that you can draw from when designing new constructs in your own programs.
 15 | 
 16 | ## Setting the stage with some tests
 17 | 
 18 | I've selected a small but representative subset of the features that `Enumerable` and `Enumerator` provide and written some tests to nail down their behavior. These tests will guide my implementations throughout the rest of this article and serve as a roadmap for you if you'd like to try out the exercise on your own.
 19 | 
 20 | If you have some time to do so, try to get at least some of the tests to go green before reading my implementation code and explanations, as you'll learn a lot more that way. But if you're not planning on doing that, at least read through the tests carefully and think about how you might go about implementing the features they describe.
 21 | 
 22 | ```ruby
 23 | class SortedList
 24 |   include FakeEnumerable
 25 | 
 26 |   def initialize
 27 |     @data = []
 28 |   end
 29 | 
 30 |   def <<(new_element)
 31 |     @data << new_element
 32 |     @data.sort!
 33 | 
 34 |     self
 35 |   end
 36 | 
 37 |   def each
 38 |     if block_given?
 39 |       @data.each { |e| yield(e) }
 40 |     else
 41 |       FakeEnumerator.new(self, :each)
 42 |     end
 43 |   end
 44 | end
 45 | 
 46 | require "minitest/autorun"
 47 | 
 48 | describe "FakeEnumerable" do
 49 |   before do
 50 |     @list = SortedList.new
 51 | 
 52 |     # will get stored internally as 3,4,7,13,42
 53 |     @list << 3 << 13 << 42 << 4 << 7
 54 |   end
 55 | 
 56 |   it "supports map" do
 57 |     @list.map { |x| x + 1 }.must_equal([4,5,8,14,43])  
 58 |   end
 59 | 
 60 |   it "supports sort_by" do
 61 |     # ascii sort order
 62 |     @list.sort_by { |x| x.to_s }.must_equal([13, 3, 4, 42, 7])
 63 |   end
 64 | 
 65 |   it "supports select" do
 66 |     @list.select { |x| x.even? }.must_equal([4,42])
 67 |   end
 68 | 
 69 |   it "supports reduce" do
 70 |     @list.reduce(:+).must_equal(69)
 71 |     @list.reduce { |s,e| s + e }.must_equal(69)
 72 |     @list.reduce(-10) { |s,e| s + e }.must_equal(59)
 73 |   end
 74 | end
 75 | 
 76 | describe "FakeEnumerator" do
 77 |   before do
 78 |     @list = SortedList.new
 79 | 
 80 |     @list << 3 << 13 << 42 << 4 << 7
 81 |   end
 82 | 
 83 |   it "supports next" do
 84 |     enum = @list.each
 85 | 
 86 |     enum.next.must_equal(3)
 87 |     enum.next.must_equal(4)
 88 |     enum.next.must_equal(7)
 89 |     enum.next.must_equal(13)
 90 |     enum.next.must_equal(42)
 91 | 
 92 |     assert_raises(StopIteration) { enum.next }
 93 |   end
 94 | 
 95 |   it "supports rewind" do
 96 |     enum = @list.each
 97 | 
 98 |     4.times { enum.next }
 99 |     enum.rewind
100 | 
101 |     2.times { enum.next }
102 |     enum.next.must_equal(7)
103 |   end
104 | 
105 |   it "supports with_index" do
106 |     enum     = @list.map
107 |     expected = ["0. 3", "1. 4", "2. 7", "3. 13", "4. 42"]  
108 | 
109 |     enum.with_index { |e,i| "#{i}. #{e}" }.must_equal(expected)
110 |   end
111 | end
112 | ```
113 | 
114 | If you do decide to try implementing these features yourself, get as close to the behavior that Ruby provides as you can, but don't worry if your implementation is different from what Ruby really uses. Just think of this as if it's a new problem that needs solving, and let the tests guide your implementation. Once you've done that, read on to see how I did it.
115 | 
116 | ## Implementing the `FakeEnumerable` module
117 | 
118 | Before I began work on implementing `FakeEnumerable`, I needed to get its tests to a failure state rather than an error state. The following code does exactly that:
119 | 
120 | ```ruby
121 | module FakeEnumerable
122 |   def map
123 |   end
124 | 
125 |   def select
126 |   end
127 | 
128 |   def sort_by
129 |   end
130 | 
131 |   def reduce(*args)
132 |   end
133 | end
134 | ```
135 | 
136 | I then began working on implementing the methods one by one, starting with `map`. The key thing to realize while working with `Enumerable` is that every feature will build on top of the `each` method in some way, using it in combination with `yield` to produce its results. The `map` feature is possibly the most straightforward nontrivial combination of these operations, as you can see in this implementation:
137 | 
138 | ```ruby
139 | def map
140 |   out = []
141 | 
142 |   each { |e| out << yield(e) }
143 | 
144 |   out
145 | end
146 | ```
147 | 
148 | Here we see that `map` is simply a function that builds up a new array by taking each element and replacing it with the return value of the block you provide to it. We can clean this up to make it a one liner using `Object#tap`, but I'm not sure if I like that approach because it breaks the simplicity of the implementation a bit. That said, I've included it here for your consideration and will use it throughout the rest of this article, just for the sake of brevity.
149 | 
150 | ```ruby
151 | def map
152 |   [].tap { |out| each { |e| out << yield(e) } }
153 | end
154 | ```
155 | 
156 | Implementing `select` is quite easy as well. It builds on the same concepts used to implement `map` but adds a conditional check to see whether the block returns a `true` value. For each new yielded element, if the value returned by the block is logically true, the element gets added to the newly built array; otherwise, it does not.
157 | 
158 | ```ruby
159 | def select
160 |   [].tap { |out| each { |e| out << e if yield(e) } }
161 | end
162 | ```
163 | 
164 | Implementing `sort_by` is a little more tricky. I cheated and looked at the API documentation, which (perhaps surprisingly) describes how the method is implemented and even gives a reference implementation in Ruby. Apparently, `sort_by` uses a [Schwartzian transform](http://en.wikipedia.org/wiki/Schwartzian_transform) to convert the collection we are iterating over into tuples containing the sort key and the original element. It then uses `Array#sort` to put these in order, and it finally uses `map` on the resulting array to convert the array of tuples back into an array of the elements from the original collection. That's definitely more confusing to explain than it is to implement in code, so just look at the following code for clarification:
165 | 
166 | ```ruby
167 | def sort_by
168 |   map { |a| [yield(a), a] }.sort.map { |a| a[1] }
169 | end
170 | ```
171 | 
172 | The interesting thing about this implementation is that `sort_by` is dependent on `map`, both on the current collection being iterated over as well as on the `Array` it generates. But after tracing it down to the core, this method is still expecting the collection to implement only the `each` method. Additionally, because `Array#sort` is thrown into the mix, your sort keys need to respond to `<=>`. But for such a powerful method, the contract is still very narrow.
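To make the three stages of the transform easier to see, here is a minimal sketch that spells them out on a plain `Array`. The method name `decorate_sort_undecorate` and its local variables are made up for illustration; they are not part of `FakeEnumerable` or of Ruby itself.

```ruby
# Hypothetical helper that performs the same steps as sort_by,
# one stage at a time, on anything that responds to map.
def decorate_sort_undecorate(list)
  # 1. decorate: pair each element with its sort key
  decorated = list.map { |e| [yield(e), e] }

  # 2. sort: arrays are compared position by position using <=>,
  #    so the key in the first slot decides the order
  sorted = decorated.sort

  # 3. undecorate: drop the keys and keep the original elements
  sorted.map { |pair| pair[1] }
end

p decorate_sort_undecorate([3, 4, 7, 13, 42]) { |x| x.to_s }
# => [13, 3, 4, 42, 7]
```

One consequence of comparing whole tuples is that when two keys are equal, the comparison falls through to the elements themselves, so in that case the elements also need to respond to `<=>`.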
173 | 
174 | Implementing `reduce` is a bit more involved because it has three different ways of interacting with it. It's also interesting because it's one of the few `Enumerable` methods that isn't necessarily designed to return an `Array` object. I'll let you ponder the following implementation a bit before providing more commentary, because reading through it should be a good exercise.
175 | 
176 | ```ruby
177 | def reduce(operation_or_value=nil)
178 |   case operation_or_value
179 |   when Symbol
180 |     # convert things like reduce(:+) into reduce { |s,e| s + e }
181 |     return reduce { |s,e| s.send(operation_or_value, e) }
182 |   when nil
183 |     acc = nil
184 |   else
185 |     acc = operation_or_value
186 |   end
187 | 
188 |   each do |a|
189 |     if acc.nil?
190 |       acc = a
191 |     else
192 |       acc = yield(acc, a)
193 |     end
194 |   end
195 | 
196 |   return acc
197 | end
198 | ```
199 | 
200 | First, I have to say I'm not particularly happy with my implementation; it seems a bit too brute force and I think I might be missing some obvious refactorings. But it should have been readable enough for you to get a general feel for what's going on. The first paragraph of code is simply handling the three different cases of `reduce()`. The real operation happens starting with our `each` call.
201 | 
202 | Without a predefined initial value, we set the initial value to the first element in the collection, and our first yield occurs starting with the second element. Otherwise, the initial value and first element are yielded. The purpose of `reduce()` is to perform an operation on each successive value in a list by combining it in some way with the last calculated value. In this way, the list gets reduced to a single value in the end. This behavior explains why the old alias for this method in Ruby was called `inject`: a function is being injected between each element in the collection via our `yield` call. I find this operation much easier to understand when I'm able to see it in terms of primitive concepts such as `yield` and `each` because it makes it possible to trace exactly what is going on.
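If it helps to watch the accumulator change, the following sketch records every call made to the block. It uses Ruby's built-in `reduce` on an `Array` rather than our `FakeEnumerable` version, and `trace_reduce` is a hypothetical helper name chosen for this example.

```ruby
# Hypothetical helper: runs reduce with an addition block and
# records each (accumulator, element) pair the block receives.
def trace_reduce(list, *initial)
  steps = []

  total = list.reduce(*initial) do |acc, e|
    steps << "#{acc} + #{e}"
    acc + e
  end

  [total, steps]
end

p trace_reduce([3, 4, 7, 13, 42])
# => [69, ["3 + 4", "7 + 7", "14 + 13", "27 + 42"]]

p trace_reduce([3, 4, 7, 13, 42], -10)
# => [59, ["-10 + 3", "-7 + 4", "-3 + 7", "4 + 13", "17 + 42"]]
```

Note that without an initial value the block runs four times for five elements, because the first element is used to seed the accumulator; with an initial value it runs once per element.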
203 | 
204 | If you are having trouble following the implementation of `reduce()`, don't worry about it. It's definitely one of the more complex `Enumerable` methods, and if you try to implement a few of the others and then return to studying `reduce()`, you may have better luck. But the beautiful thing is that if you ignore the `reduce(:+)` syntactic sugar, it introduces no new concepts beyond what is used to implement `map()`. If you think you understand `map()` but not `reduce()`, it's a sign that you may need to brush up on your fundamentals, such as how `yield` works.
205 | 
206 | If you've been following along at home, you should at this point be passing all your `FakeEnumerable` tests. That means it's time to get started on our `FakeEnumerator`.
207 | 
208 | ## Implementing the `FakeEnumerator` class
209 | 
210 | Similar to before, I needed to write some code to get my tests to a failure state. First, I set up the skeleton of the `FakeEnumerator` class.
211 | 
212 | ```ruby
213 | class FakeEnumerator
214 |   def next
215 |   end
216 | 
217 |   def with_index
218 |   end
219 | 
220 |   def rewind
221 |   end
222 | end
223 | ```
224 | 
225 | Then I realized that I needed to go back and at least modify the `FakeEnumerable#map` method, as my tests rely on it returning a `FakeEnumerator` object when a block is not provided, in a similar manner to the way `Enumerable#map` would return an `Enumerator` in that scenario.
226 | 
227 | ```ruby
228 | module FakeEnumerable
229 |   def map
230 |     if block_given?
231 |       [].tap { |out| each { |e| out << yield(e) } }
232 |     else
233 |       FakeEnumerator.new(self, :map)
234 |     end
235 |   end
236 | end
237 | ```
238 | 
239 | Although, technically speaking, I should have also updated all my other `FakeEnumerable` methods, it's not important to do so because our tests don't cover it and that change introduces no new concepts to discuss. With this change to `map`, my tests all failed rather than erroring out, which meant it was time to start working on the implementation code.
240 | 
241 | But before we get started, it's worth reflecting on the core purpose of an `Enumerator`, which I haven't talked about yet. At its core, an `Enumerator` is simply a proxy object that mixes in `Enumerable` and then delegates its `each` method to some other iterator provided by the object it wraps. This behavior turns an internal iterator into an external one, which allows you to pass it around and manipulate it as an object.
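As a quick illustration of that idea, here is how Ruby's real `Enumerator` behaves when wrapped around an ordinary array. The method name `external_iteration_demo` exists only for this sketch.

```ruby
# Hypothetical demo using Ruby's built-in Enumerator, not FakeEnumerator.
def external_iteration_demo
  list = [3, 4, 7]
  enum = list.each   # calling each without a block returns an Enumerator

  [
    enum.is_a?(Enumerable),   # the proxy mixes in Enumerable
    enum.next,                # external iteration: we ask for values...
    enum.next,                # ...one at a time, whenever we like
    enum.map { |x| x * 2 }    # internal iteration still works via each
  ]
end

p external_iteration_demo
# => [true, 3, 4, [6, 8, 14]]
```

The `map` call walks the whole collection even though `next` has already been called twice, which suggests that the external position is tracked separately from the delegated `each`.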
242 | 
243 | Our tests call for us to implement `next`, `rewind`, and `with_index`, but before we can do that meaningfully, we need to make `FakeEnumerator` into a `FakeEnumerable`-enabled proxy object. There are no tests for this because I didn't want to reveal too many hints to those who wanted to try this exercise at home, but this code will do the trick:
244 | 
245 | ```ruby
246 | class FakeEnumerator
247 |   include FakeEnumerable
248 | 
249 |   def initialize(target, iter)
250 |     @target = target
251 |     @iter   = iter
252 |   end
253 | 
254 |   def each(&block)
255 |     @target.send(@iter, &block)
256 |   end
257 | 
258 |   # other methods go here...
259 | end
260 | ```
261 | 
262 | Here we see that `each` uses `send` to call the original iterator method on the target object. Other than that, this is the ordinary pattern we've seen in implementing other collections. The next step is to implement our `next` method, which is a bit tricky.
263 | 
264 | What we need to be able to do is iterate once, then pause and return a value. Then, when `next` is called again, we need to be able to advance one more iteration and repeat the process. We could do something like run the whole iteration and cache the results into an array, then do some sort of indexing operation, but that's both inefficient and impractical for certain applications. This problem made me realize that Ruby's `Fiber` construct might be a good fit because it specifically allows you to jump in and out of a chunk of code on demand. So I decided to try that out and see how far I could get. After some fumbling around, I got the following code to pass the test:
265 | 
266 | ```ruby
267 | # loading the fiber stdlib gives us some extra features, including Fiber#alive?
268 | require "fiber"
269 | 
270 | class FakeEnumerator
271 |   def next
272 |     @fiber ||= Fiber.new do
273 |       each { |e| Fiber.yield(e) }
274 | 
275 |       raise StopIteration
276 |     end
277 | 
278 |     if @fiber.alive?
279 |       @fiber.resume
280 |     else
281 |       raise StopIteration
282 |     end
283 |   end
284 | end
285 | ```
286 | 
287 | This code is hard to read because it isn't really a linear flow, but I'll do my best to explain it using my very limited knowledge of how the `Fiber` construct works. Basically, when you call `Fiber.new` with a block, the code in that block isn't executed immediately. Instead, execution begins when `Fiber#resume` is called. Each time a `Fiber.yield` call is encountered, control is returned to the caller of `Fiber#resume`, which receives the value that was passed to `Fiber.yield`. Each subsequent `Fiber#resume` picks up execution back at the point where the last `Fiber.yield` call was made, rather than at the beginning of the code block. This process continues until no more `Fiber.yield` calls remain, and then the value of the last executed expression is returned as the final value of `Fiber#resume`. Any additional attempts to call `Fiber#resume` result in a `FiberError` because there is nothing left to execute.
288 | 
289 | If you reread the previous paragraph a couple of times and compare it to the definition of my `next` method, it should start to make sense. But if it's causing your brain to melt, check out the [Fiber documentation](http://ruby-doc.org/core-1.9/classes/Fiber.html), which is reasonably helpful.
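If a smaller example than `next` would help, the following sketch walks through the same sequence of events in isolation. The method name `fiber_walkthrough` and the log messages are invented for this illustration.

```ruby
# Hypothetical walkthrough of Fiber.new, Fiber.yield, and Fiber#resume.
def fiber_walkthrough
  log = []

  fiber = Fiber.new do
    log << "block started"
    Fiber.yield :first
    log << "picked up after first yield"
    Fiber.yield :second
    :last_expression
  end

  # Nothing inside the block has run yet at this point.
  log << "fiber created"

  # Each resume runs until the next Fiber.yield (or the end of the block).
  values = 3.times.map { fiber.resume }

  begin
    fiber.resume   # the block has finished, so this raises
  rescue FiberError
    log << "FiberError on fourth resume"
  end

  [values, log]
end

values, log = fiber_walkthrough
p values
# => [:first, :second, :last_expression]
p log
# => ["fiber created", "block started", "picked up after first yield", "FiberError on fourth resume"]
```

The log shows that "fiber created" is recorded before "block started", which confirms that the block only begins executing on the first `resume`.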
290 | 
291 | The very short story about this whole thing is that using a `Fiber` in our `next` definition lets us keep track of just how far into the `each` iteration we are and jump back into the iterator on demand to get the next value. I prevent the `FiberError` from ever occurring by checking to see whether the `Fiber` object is still alive before calling `resume`. But I also need to make it so that the final executed statement within the `Fiber` raises a `StopIteration` error as well, to prevent it from returning the result of `each`, which would be the collection itself. This is a kludge, and if you have a better idea for how to handle this case, please leave me a comment.
292 | 
293 | The use of `Fiber` objects to implement `next` makes it possible to work with infinite iterators, such as `Enumerable#cycle`. Though we won't get into implementation details, the following code should give some hints as to why this is a useful feature:
294 | 
295 | ```ruby
296 | >> row_colors = [:red, :green].cycle
297 | => #<Enumerator: [:red, :green]:cycle>
298 | >> row_colors.next
299 | => :red
300 | >> row_colors.next
301 | => :green
302 | >> row_colors.next
303 | => :red
304 | >> row_colors.next
305 | => :green
306 | ```
307 | 
308 | As cool as that is, and as much as it makes me want to dig into implementing it, I have to imagine that you're getting tired by now. Heck, I've already slept twice since I started writing this article! So let's hurry up and finish implementing `rewind` and `with_index` so that we can wrap things up.
309 | 
310 | I found a way to implement `rewind` that is trivial, but something about it makes me wonder if I've orphaned a `Fiber` object somewhere and whether that has weird garbage collection implications. But nonetheless, because our implementation of `next` depends on the caching of a `Fiber` object to keep track of where it is in its iteration, the easiest way to rewind back to the beginning state is to simply wipe out that object. The following code gets my `rewind` tests passing:
311 | 
312 | ```ruby
313 | def rewind
314 |   @fiber = nil
315 | end
316 | ```
317 | 
318 | Now only one feature stands between us and the completion of our exercise: `with_index`. The real `with_index` method in Ruby is much smarter than what you're about to see, but for its most simple functionality, the following code will do the trick:
319 | 
320 | ```ruby
321 | def with_index
322 |   i = 0
323 |   each do |e|
324 |     out = yield(e, i)
325 |     i += 1
326 |     out
327 |   end
328 | end
329 | ```
330 | 
331 | Here, I did the brute force thing and maintained my own counter. I then made a small modification to control flow so that rather than yielding just the element on each iteration, both the element and its index are yielded. Keep in mind that the `each` call here is a proxy to some other iterator on another collection, which is what gives us the ability to call `@list.map.with_index` and get `map` behavior rather than `each` behavior. Although you won't use it every day, knowing how to implement an around filter using `yield` can be quite useful.
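To see why the choice of underlying iterator matters, here is a short comparison using Ruby's built-in `Enumerator#with_index` on a plain array. The method name `with_index_demo` is made up for this sketch.

```ruby
# Hypothetical comparison: the same with_index block behaves differently
# depending on which iterator the enumerator is a proxy for.
def with_index_demo(list)
  via_map  = list.map.with_index  { |e, i| "#{i}. #{e}" }
  via_each = list.each.with_index { |e, i| "#{i}. #{e}" }

  [via_map, via_each]
end

p with_index_demo([3, 4, 7])
# => [["0. 3", "1. 4", "2. 7"], [3, 4, 7]]
```

When the enumerator wraps `map`, the block's return values are collected into a new array; when it wraps `each`, they are discarded and the original collection comes back. That is the reason the `out` value is handed back from the inner block in the implementation above.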
332 | 
333 | With this code written, my full test suite finally went green. Even though I'd done these exercises a dozen times before, I still learned a thing or two while writing this article, and I imagine there is still plenty left for me to learn as well. How about you?
334 | 
335 | ## Reflections
336 | 
337 | This is definitely one of my favorite exercises for getting to understand Ruby better. I'm not usually big on contrived practice drills, but there is something about peeling back the magic on features that look really complex on the surface that gives me a great deal of satisfaction. I find that even if my solutions are very much cheap counterfeits of what Ruby must really be doing, it still helps tremendously to have implemented these features in any way I know how, because it gives me a mental model of my own construction from which to view the features.
338 | 
339 | If you enjoyed this exercise, there are a number of things that you could do to squeeze even more out of it. The easiest way to do so is to implement a few more of the `Enumerable` and `Enumerator` methods. As you do that, you'll find areas where the implementations we built out today are clearly insufficient or would be better off written another way. That's fine, because it will teach you even more about how these features hang together. You can also discuss and improve upon the examples I've provided, as there is certainly room for refactoring in several of them. Finally, if you want to take a more serious approach to things, you could take a look at the tests in [RubySpec](https://github.com/rubyspec/rubyspec) and the implementations in [Rubinius](https://github.com/rubinius/rubinius). Implementing Ruby in Ruby isn't just something folks do for fun these days, and if you really enjoyed working on these low-level features, you might consider contributing to Rubinius in some way. The maintainers of that project are amazing, and you can learn a tremendous amount that way.
340 | 
341 | Of course, not everyone has time to contribute to a Ruby implementation, even if it's for the purpose of advancing their own understanding of Ruby. So I'd certainly settle for a comment here sharing your experiences with this exercise.
342 | 


--------------------------------------------------------------------------------
/unix-style-command-line-applications.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Building Unix-style command line applications
  3 | ---
  4 | 
  5 | _Build a basic clone of the 'cat' utility while learning some idioms for command line applications_
  6 | 
  7 | 
  8 | 
  9 | Ruby is best known as a web development language, but in its early days it was
 10 | mainly used on the command line. In this article, we'll get back to those roots by building a partial implementation of the standard Unix command `cat`.
 11 | 
 12 | The core purpose of the `cat` utility is to read in a list of input files, concatenate them, and output the resulting text to the command line. You can also use `cat` for a few other useful things, such as adding line numbers and suppressing extraneous whitespace. If we stick to these commonly used features, the core functionality of `cat` is something even a novice programmer would be able to implement without too much effort.
 13 | 
 14 | The tricky part of building a `cat` clone is that it involves more than just
 15 | some basic text manipulation; you also need to know about some
 16 | stream processing and error handling techniques that are common in Unix
 17 | utilities. The [acceptance tests](https://gist.github.com/1293709)
 18 | that I've used to compare the original `cat` utility to my Ruby-based `rcat`
 19 | tool reveal some of the extra details that need to be considered when
 20 | building this sort of command line application.
 21 | 
 22 | If you are already fairly comfortable with building command line tools, you may
 23 | want to try implementing your own version of `rcat` before reading on. But don't
 24 | worry if you wouldn't even know where to start: I've provided a
 25 | detailed walkthrough of my solution that will teach you everything
 26 | that you need to know.
 27 | 
 28 | > **NOTE:** You'll need to have the source code for [my implementation of rcat](https://github.com/elm-city-craftworks/rcat) easily accessible as you work through the rest of this article. Please either clone the repository now or keep the GitHub file browser open while reading.
 29 | 
 30 | ## Building an executable script
 31 | 
 32 | Our first task is to make it possible to run the `rcat` script without having to type something like `ruby path/to/rcat` each time we run it. This task can be done in three easy steps.
 33 | 
 34 | **1) Add a shebang line to your script.**
 35 | 
 36 | If you look at `bin/rcat` in my code, you'll see that it starts with the following line:
 37 | 
 38 | ```
 39 | #!/usr/bin/env ruby
 40 | ```
 41 | 
 42 | This line (commonly called a shebang line) tells the shell what interpreter to use to process the rest of the file. Rather than providing a path directly to the Ruby interpreter, I instead use the path to the standard `env` utility. This step allows `env` to figure out which `ruby` executable is present in our current environment and to use that interpreter to process the rest of the file. This approach is preferable because it is [more portable](http://en.wikipedia.org/wiki/Shebang_line#Portability) than hard-coding a path to a particular Ruby install. Although Ruby can be installed in any number of places, the somewhat standardized location of `env` makes it reasonably dependable.
 43 | 
 44 | **2) Make your script executable.**
 45 | 
 46 | Once the shebang line is set up, it's necessary to update the permissions on the `bin/rcat` file. Running the following command from the project root will make `bin/rcat` executable:
 47 | 
 48 | ```
 49 | $ chmod +x bin/rcat
 50 | ```
 51 | 
 52 | Although the executable has not yet been added to the shell's lookup path, it is now possible to test it by providing an explicit path to the executable.
 53 | 
 54 | ```
 55 | $ ./bin/rcat data/gettysburg.txt
 56 | Four score and seven years ago, our fathers brought forth on this continent a
 57 | new nation, conceived in Liberty and dedicated to the proposition that all men
 58 | are created equal.
 59 | 
 60 | ... continued ...
 61 | ```
 62 | 
 63 | **3) Add your script to the shell's lookup path.**
 64 | 
 65 | The final step is to add the executable to the shell's lookup path so that it can be called as a simple command. In Bash-like shells, the path is updated by modifying the `PATH` environment variable, as shown in the following example:
 66 | 
 67 | ```
 68 | $ export PATH=/Users/seacreature/devel/rcat/bin:$PATH
 69 | ```
 70 | 
 71 | This command prepends the `bin` folder in my rcat project to the existing contents of the `PATH`, which makes it possible for the current shell to call the `rcat` command without specifying a direct path to the executable, similar to how we call ordinary Unix commands:
 72 | 
 73 | ```
 74 | $ rcat data/gettysburg.txt
 75 | Four score and seven years ago, our fathers brought forth on this continent a
 76 | new nation, conceived in Liberty and dedicated to the proposition that all men
 77 | are created equal.
 78 | 
 79 | ... continued ...
 80 | ```
 81 | 
 82 | To confirm that you've followed these steps correctly and that things are working as expected, you can now run the acceptance tests. If you see anything different than the following output, retrace your steps and see whether you've made a mistake somewhere. If not, please leave a comment and I'll try to help you out.
 83 | 
 84 | ```
 85 | $ ruby tests.rb
 86 | You passed the tests, yay!
 87 | ```
 88 | 
 89 | Assuming that you have a working `rcat` executable, we can now move on to talk about how the actual program is implemented.
 90 | 
 91 | ## Stream processing techniques
 92 | 
 93 | We now can turn our focus to the first few acceptance tests from the _tests.rb_ file. The thing that all these use cases have in common is that they involve very simple processing of input and output streams, and nothing more.
 94 | 
 95 | ```ruby
 96 | cat_output  = `cat #{gettysburg_file}`
 97 | rcat_output = `rcat #{gettysburg_file}`
 98 | 
 99 | fail "Failed 'cat == rcat'" unless cat_output == rcat_output
100 | 
101 | ############################################################################
102 | 
103 | cat_output  = `cat #{gettysburg_file} #{spaced_file}`
104 | rcat_output = `rcat #{gettysburg_file} #{spaced_file}`
105 | 
106 | fail "Failed 'cat [f1 f2] == rcat [f1 f2]'" unless cat_output == rcat_output
107 | 
108 | ############################################################################
109 | 
110 | cat_output  = `cat < #{spaced_file}`
111 | rcat_output = `rcat < #{spaced_file}`
112 | 
113 | fail "Failed 'cat < file == rcat < file'" unless cat_output == rcat_output
114 | ```
115 | 
116 | If we needed only to pass these three tests, we'd be in luck. Ruby provides a special stream object called `ARGF` that combines multiple input files into a single stream or falls back to standard input if no files are provided. Our entire script could look something like this:
117 | 
118 | ```ruby
119 | ARGF.each_line { |line| print line }
120 | ```
121 | 
122 | However, the real `cat` utility does a lot more than what `ARGF` provides,
123 | so it was necessary to write some custom code to handle stream processing:
124 | 
125 | ```ruby
126 | module RCat
127 |   class Application
128 |     def initialize(argv)
129 |       @params, @files = parse_options(argv)
130 | 
131 |       @display        = RCat::Display.new(@params)
132 |     end
133 | 
134 |     def run
135 |       if @files.empty?
136 |         @display.render(STDIN)
137 |       else
138 |         @files.each do |filename|
139 |           File.open(filename) { |f| @display.render(f) }
140 |         end
141 |       end
142 |     end
143 | 
144 |     def parse_options(argv)
145 |       # ignore this for now
146 |     end
147 |   end
148 | end
149 | ```
150 | 
151 | The main difference between this code and the `ARGF`-based approach is that `RCat::Application#run` creates a new stream for each file. This comes in handy later when working on support for empty line suppression and complex line numbering but also complicates the implementation of the `RCat::Display` object. In the following example, I've stripped away the code that is related to these more complicated features to make it a bit easier for you to see the overall flow of things:
152 | 
153 | ```ruby
154 | module RCat
155 |   class Display
156 |     def render(data)
157 |       lines = data.each_line
158 |       loop { render_line(lines) }
159 |     end
160 | 
161 |     private
162 | 
163 |     def render_line(lines)
164 |       current_line = lines.next
165 |       print current_line
166 |     end
167 |   end
168 | end
169 | ```
170 | 
171 | The use of `loop` instead of an ordinary Ruby iterator might feel a bit strange here, but it works fairly well in combination with `Enumerator#next`. The following irb session demonstrates how the two interact with one another:
172 | 
173 | ```
174 | >> lines = "a\nb\nc\n".each_line
175 | => #<Enumerator: "a\nb\nc\n":each_line>
176 | >> loop { p lines.next }
177 | "a\n"
178 | "b\n"
179 | "c\n"
180 | => nil
181 | 
182 | >> lines = "a\nb\nc\n".each_line
183 | => #<Enumerator: "a\nb\nc\n":each_line>
184 | >> lines.next
185 | => "a\n"
186 | >> lines.next
187 | => "b\n"
188 | >> lines.next
189 | => "c\n"
190 | 
191 | >> lines.next
192 | StopIteration: iteration reached an end
193 |   from (irb):8:in `next'
194 |   from (irb):8
195 |   from /Users/seacreature/.rvm/rubies/ruby-1.9.3-rc1/bin/irb:16:in `<main>'
196 | 
197 | >> loop { raise StopIteration }
198 | => nil
199 | ```
200 | 
201 | Using this pattern makes it possible for `render_line` to actually consume more
202 | than one line from the input stream at once. If you work through the logic that
203 | is necessary to get the following test to pass, you might catch a glimpse of the
204 | benefits of this technique:
205 | 
206 | ```ruby
207 | cat_output  = `cat -s #{spaced_file}`
208 | rcat_output = `rcat -s #{spaced_file}`
209 | 
210 | fail "Failed 'cat -s == rcat -s'" unless cat_output == rcat_output
211 | ```
212 | 
213 | Tracing the execution path for `rcat -s` will lead you to this line of code in
214 | `render_line`, which is the whole reason I decided to use this
215 | `Enumerator`-based implementation:
216 | 
217 | ```ruby
218 | lines.next while lines.peek.chomp.empty?
219 | ```
220 | 
221 | This code does an arbitrary amount of line-by-line lookahead until either a nonblank line is found or the end of the file is reached. It does so in a purely stateless and memory-efficient manner and is perhaps the most interesting line of code in this entire project. The downside of this approach is that it requires the entire `RCat::Display` object to be designed from the ground up to work with `Enumerator` objects. However, I struggled to come up with an alternative implementation that didn't involve some sort of complicated state machine/buffering mechanism that would be equally cumbersome to work with.
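
To make the lookahead idea concrete, here is a minimal standalone sketch of blank-line squeezing built on `Enumerator#peek`. The `squeeze_blank_lines` helper and the `StringIO` streams are assumptions made for illustration; this is not the actual `RCat::Display` code, which also has to deal with line numbering:

```ruby
require "stringio"

# Hypothetical helper (not part of rcat): copies lines from `input` to
# `output`, collapsing each run of blank lines into a single blank line.
def squeeze_blank_lines(input, output)
  lines = input.each_line   # an Enumerator, so we can call next and peek

  loop do
    current_line = lines.next
    output.print current_line

    if current_line.chomp.empty?
      # Skip any further blank lines. At end of input, peek raises
      # StopIteration, which the surrounding loop rescues for us.
      lines.next while lines.peek.chomp.empty?
    end
  end
end

out = StringIO.new
squeeze_blank_lines(StringIO.new("a\n\n\n\nb\n\nc\n"), out)
print out.string
```

Because `loop` rescues the `StopIteration` raised by both `next` and `peek`, the sketch needs no explicit end-of-file check and never holds more than one line of lookahead in memory.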
222 | 
223 | As tempting as it is to continue discussing the pros and cons of the different
224 | ways of solving this particular problem, it's probably best for us to get back on
225 | track and look at some more basic problems that arise when working on
226 | command-line applications. I will now turn to the `parse_options` method that I asked you
227 | to treat as a black box in our earlier examples.
228 | 
229 | ## Options parsing
230 | 
231 | Ruby provides two standard libraries for options parsing: `GetoptLong` and `OptionParser`. Though both are fairly complex tools, `OptionParser` looks and feels a lot more like ordinary Ruby code while simultaneously managing to be much more powerful. The implementation of `RCat::Application#parse_options` makes it clear what a good job `OptionParser` does when it comes to making easy things easy:
232 | 
233 | ```ruby
234 | module RCat
235 |   class Application
236 |     # other code omitted
237 | 
238 |     def parse_options(argv)
239 |       params = {}
240 |       parser = OptionParser.new
241 | 
242 |       parser.on("-n") { params[:line_numbering_style] ||= :all_lines         }
243 |       parser.on("-b") { params[:line_numbering_style]   = :significant_lines }
244 |       parser.on("-s") { params[:squeeze_extra_newlines] = true               }
245 | 
246 |       files = parser.parse(argv)
247 | 
248 |       [params, files]
249 |     end
250 |   end
251 | end
252 | ```
253 | 
254 | The job of `OptionParser#parse` is to take an arguments array and match it against the callbacks defined via the `OptionParser#on` method. Whenever a flag is matched, the associated block for that flag is executed. Finally, any unmatched arguments are returned. In the case of `rcat`, the unmatched arguments consist of the list of files we want to concatenate and display. The following example demonstrates what's going on in `RCat::Application`:
255 | 
256 | ```ruby
257 | require "optparse"
258 | 
259 | puts "ARGV is #{ARGV.inspect}"
260 | 
261 | params = {}
262 | parser = OptionParser.new
263 | 
264 | parser.on("-n") { params[:line_numbering_style] ||= :all_lines         }
265 | parser.on("-b") { params[:line_numbering_style]   = :significant_lines }
266 | parser.on("-s") { params[:squeeze_extra_newlines] = true               }
267 | 
268 | files = parser.parse(ARGV)
269 | 
270 | puts "params are #{params.inspect}"
271 | puts "files are #{files.inspect}"
272 | ```
273 | 
274 | Try running this script with various options and see what you end up with. You should get something similar to the output shown here:
275 | 
276 | ```
277 | $ ruby option_parser_example.rb -ns data/*.txt
278 | ARGV is ["-ns", "data/gettysburg.txt", "data/spaced_out.txt"]
279 | params are {:line_numbering_style=>:all_lines, :squeeze_extra_newlines=>true}
280 | files are ["data/gettysburg.txt", "data/spaced_out.txt"]
281 | 
282 | $ ruby option_parser_example.rb data/*.txt
283 | ARGV is ["data/gettysburg.txt", "data/spaced_out.txt"]
284 | params are {}
285 | files are ["data/gettysburg.txt", "data/spaced_out.txt"]
286 | ```
287 | 
288 | Although `rcat` requires us to parse only the most basic form of arguments, `OptionParser` is capable of a whole lot more than what I've shown here. Be sure to check out its [API documentation](http://ruby-doc.org/stdlib-1.9.2/libdoc/optparse/rdoc/OptionParser.html#method-i-parse) to see the full extent of what it can do.
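
As one illustration of that extra capability, the sketch below uses long option names, a required argument coerced to an `Integer`, and a negatable boolean switch. The `--width` and `--verbose` options and the `parse_demo_options` method are made up for this example and are not part of `rcat`:

```ruby
require "optparse"

# Hypothetical options for illustration; rcat itself supports only -b, -n, -s.
def parse_demo_options(argv)
  params = { :width => 6 }

  parser = OptionParser.new
  parser.banner = "usage: demo [options] [file ...]"

  # "--width N" declares a required argument; Integer asks for type coercion
  parser.on("-w", "--width N", Integer, "Width of the number column") do |n|
    params[:width] = n
  end

  # "--[no-]verbose" accepts both --verbose and --no-verbose
  parser.on("-v", "--[no-]verbose", "Toggle verbose output") do |flag|
    params[:verbose] = flag
  end

  # parse leaves argv untouched and returns the unmatched arguments
  files = parser.parse(argv)

  [params, files]
end

params, files = parse_demo_options(%w[--width 8 --no-verbose data/a.txt])

p params
p files
```

If `--width` is given something that is not an integer, `OptionParser` raises `OptionParser::InvalidArgument`, which can be rescued at the outermost layer of the program like any other parsing error.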
289 | 
290 | Now that I've covered how to get data in and out of our `rcat` application, we can talk a bit about how it does `cat`-style formatting for line numbering.
291 | 
292 | ## Basic text formatting
293 | 
294 | Formatting text for the console can be a bit cumbersome, but some things are easier than they seem. For example, the tidy output of `cat -n` shown here is not especially hard to implement:
295 | 
296 | <pre style="font-size: 0.8em">
297 | $ cat -n data/gettysburg.txt
298 |    1  Four score and seven years ago, our fathers brought forth on this continent a
299 |    2  new nation, conceived in Liberty and dedicated to the proposition that all men
300 |    3  are created equal.
301 |    4  
302 |    5  Now we are engaged in a great civil war, testing whether that nation, or any
303 |    6  nation so conceived and so dedicated, can long endure. We are met on a great
304 |    7  battle-field of that war. We have come to dedicate a portion of that field as a
305 |    8  final resting place for those who here gave their lives that that nation might
306 |    9  live. It is altogether fitting and proper that we should do this.
307 |   10  
308 |   11  But, in a larger sense, we can not dedicate -- we can not consecrate -- we can
309 |   12  not hallow -- this ground. The brave men, living and dead, who struggled here
310 |   13  have consecrated it far above our poor power to add or detract. The world will
311 |   14  little note nor long remember what we say here, but it can never forget what
312 |   15  they did here. It is for us the living, rather, to be dedicated here to the
313 |   16  unfinished work which they who fought here have thus far so nobly advanced. It
314 |   17  is rather for us to be here dedicated to the great task remaining before us --
315 |   18  that from these honored dead we take increased devotion to that cause for which
316 |   19  they gave the last full measure of devotion -- that we here highly resolve that
317 |   20  these dead shall not have died in vain -- that this nation, under God, shall
318 |   21  have a new birth of freedom -- and that government of the people, by the people,
319 |   22  for the people, shall not perish from the earth.
320 | </pre>
321 | 
322 | On my system, `cat` seems to assume a fixed-width column with space for up to six digits. This format looks great for any file with fewer than a million lines in it, but eventually breaks down once you cross that boundary.
323 | 
324 | ```
325 | $ ruby -e "1_000_000.times { puts 'blah' }" | cat -n | tail
326 | 999991    blah
327 | 999992    blah
328 | 999993    blah
329 | 999994    blah
330 | 999995    blah
331 | 999996    blah
332 | 999997    blah
333 | 999998    blah
334 | 999999    blah
335 | 1000000    blah
336 | ```
337 | 
338 | This design decision makes implementing the formatting code for this feature a whole lot easier. The `RCat::Display#print_labeled_line` method shows that it's possible to implement this kind of formatting with a one-liner:
339 | 
340 | ```ruby
341 | def print_labeled_line(line)
342 |   print "#{line_number.to_s.rjust(6)}\t#{line}"
343 | end
344 | ```
345 | 
346 | Although the code in this example is sufficient for our needs in `rcat`, it's worth mentioning that `String` also supports the `ljust` and `center` methods. All three of these justification methods can optionally take a second argument, which causes them to use an arbitrary string as padding rather than a space character; this feature is sometimes useful for creating things like ASCII status bars or tables.
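
Here is a small sketch of those justification methods with custom padding. The `status_bar` helper is a hypothetical name used only for this illustration and does not appear in `rcat`:

```ruby
# Hypothetical helper for illustration: renders a fixed-width ASCII status bar.
def status_bar(percent, width = 20)
  filled = (percent * width / 100.0).round
  bar    = ("#" * filled).ljust(width, ".")   # pad the unfilled part with dots

  "[#{bar}] #{percent.to_s.rjust(3)}%"
end

puts status_bar(25)
puts status_bar(100)
puts " Report ".center(28, "=")   # heading padded with "=" on both sides
puts "1000000".rjust(6)           # strings wider than the column are not truncated
```

The last line shows why the line-number column simply grows once a file passes 999,999 lines: `rjust` pads short strings but returns longer ones unchanged.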
347 | 
348 | I've worked on a lot of different command-line report formats before, and I can tell you that streamable, fixed-width output is the easiest kind of reporting you'll come by. Things get a lot more complicated when you have to support variable-width columns or render elements that span multiple rows and columns. I won't get into the details of how to do those things here, but feel free to leave a comment if you're interested in hearing more on that topic.
349 | 
350 | ## Error handling and exit codes
351 | 
352 | The techniques we've covered so far are enough to get most of `rcat`'s tests passing, but the following three scenarios require a working knowledge of how Unix commands tend to handle errors. Read through them and do the best you can to make sense of what's going on.
353 | 
354 | ```ruby
355 | `cat #{gettysburg_file}`
356 | cat_success = $?
357 | 
358 | `rcat #{gettysburg_file}`
359 | rcat_success = $?
360 | 
361 | unless cat_success.exitstatus == 0 && rcat_success.exitstatus == 0
362 |   fail "Failed 'cat and rcat success exit codes match'"
363 | end
364 | 
365 | ############################################################################
366 | 
367 | cat_out, cat_err, cat_process    = Open3.capture3("cat some_invalid_file")
368 | rcat_out, rcat_err, rcat_process = Open3.capture3("rcat some_invalid_file")
369 | 
370 | unless cat_process.exitstatus == 1 && rcat_process.exitstatus == 1
371 |   fail "Failed 'cat and rcat exit codes match on bad file'"
372 | end
373 | 
374 | unless rcat_err == "rcat: No such file or directory - some_invalid_file\n"
375 |   fail "Failed 'cat and rcat error messages match on bad file'"
376 | end
377 | 
378 | ############################################################################
379 | 
380 | 
381 | cat_out, cat_err, cat_process    = Open3.capture3("cat -x #{gettysburg_file}")
382 | rcat_out, rcat_err, rcat_process = Open3.capture3("rcat -x #{gettysburg_file}")
383 | 
384 | unless cat_process.exitstatus == 1 && rcat_process.exitstatus == 1
385 |   fail "Failed 'cat and rcat exit codes match on bad switch'"
386 | end
387 | 
388 | unless rcat_err == "rcat: invalid option: -x\nusage: rcat [-bns] [file ...]\n"
389 |   fail "Failed 'rcat provides usage instructions when given invalid option'"
390 | end
391 | ```
392 | 
393 | The first test verifies exit codes for successful calls to `cat` and `rcat`. In Unix programs, exit codes are a means to pass information back to the shell about whether a command finished successfully. The right way to signal that things worked as expected is to return an exit code of 0, which is exactly what Ruby does whenever a program exits normally without error.
394 | 
395 | Whenever we run a shell command in Ruby using backticks, a `Process::Status` object is created and is then assigned to the `$?` global variable. This object contains (among other things) the exit status of the command that was run. Although it looks a bit cryptic, we're able to use this feature to verify in our first test that both `cat` and `rcat` finished their jobs successfully without error.
396 | 
397 | The second and third tests require a bit more heavy lifting because in these scenarios, we want to capture not only the exit status of these commands, but also whatever text they end up writing to the STDERR stream. To do so, we use the `Open3` standard library. The `Open3.capture3` method runs a shell command and then returns whatever was written to STDOUT and STDERR, as well as a `Process::Status` object similar to the one we pulled out of `$?` earlier.
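
The following sketch shows `Open3.capture3` on its own, outside of the test suite. To stay independent of `cat` and `rcat`, it assumes the current Ruby interpreter (located via `RbConfig.ruby`) as a stand-in for a failing command; the `run_and_capture` helper is a hypothetical name:

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: runs a command without going through a shell and
# returns its STDOUT text, STDERR text, and numeric exit status.
def run_and_capture(*command)
  out, err, status = Open3.capture3(*command)
  [out, err, status.exitstatus]
end

# A one-line child program that writes to both streams and then fails.
child_script = '$stdout.puts "to stdout"; abort "to stderr"'

out, err, exit_code = run_and_capture(RbConfig.ruby, "-e", child_script)

p out        # text the child wrote to STDOUT
p err        # text the child wrote to STDERR
p exit_code  # nonzero, because the child called abort
```

Keeping the two streams separate is what lets a test make assertions about an error message without it being mixed into the regular output.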
398 | 
399 | If you look at _bin/rcat_, you'll find the code that causes these tests to pass:
400 | 
401 | ```ruby
402 | begin
403 |   RCat::Application.new(ARGV).run
404 | rescue Errno::ENOENT => err
405 |   abort "rcat: #{err.message}"
406 | rescue OptionParser::InvalidOption => err
407 |   abort "rcat: #{err.message}\nusage: rcat [-bns] [file ...]"
408 | end
409 | ```
410 | 
411 | The `abort` method provides a means to write some text to STDERR and then exit with a nonzero code. The previous code provides functionality equivalent to the following, more explicit code:
412 | 
413 | ```ruby
414 | begin
415 |   RCat::Application.new(ARGV).run
416 | rescue Errno::ENOENT => err
417 |   $stderr.puts "rcat: #{err.message}"
418 |   exit(1)
419 | rescue OptionParser::InvalidOption => err
420 |   $stderr.puts "rcat: #{err.message}\nusage: rcat [-bns] [file ...]"
421 |   exit(1)
422 | end
423 | ```
424 | 
425 | Looking back on things, the errors I've rescued here are somewhat low level, and
426 | it might have been better to rescue them where they occur and then reraise
427 | custom errors provided by `RCat`. This approach would lead to code similar to
428 | what is shown below:
429 | 
430 | ```ruby
431 | begin
432 |   RCat::Application.new(ARGV).run
433 | rescue RCat::Errors::FileNotFound => err
434 |   # ...
435 | rescue RCat::Errors::InvalidParameter => err
436 |   # ..
437 | end
438 | ```
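
The other half of that idea, translating a low-level exception into a custom one at the point where it occurs, might look roughly like the sketch below. The error classes and the `read_input_file` helper are assumptions for illustration only; the published `rcat` code does not define them:

```ruby
module RCat
  module Errors
    # Hypothetical error classes for this sketch
    FileNotFound     = Class.new(StandardError)
    InvalidParameter = Class.new(StandardError)
  end
end

# Hypothetical low-level helper: rescues Errno::ENOENT where it happens and
# reraises it as an application-specific error.
def read_input_file(filename)
  File.read(filename)
rescue Errno::ENOENT
  raise RCat::Errors::FileNotFound, "No such file or directory - #{filename}"
end

begin
  read_input_file("some_invalid_file")
rescue RCat::Errors::FileNotFound => err
  warn "rcat: #{err.message}"   # a real executable would call abort here
end
```

One side benefit of this approach is that the application controls the wording of the error message, rather than depending on the text Ruby happens to generate for `Errno::ENOENT`.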
439 | 
440 | Regardless of how these exceptions are labeled, it's important to note that I intentionally let them bubble all the way up to the outermost layer and only then rescue them and call `Kernel#exit`. Intermingling `exit` calls within control flow or modeling logic makes debugging nearly impossible and also makes automated testing a whole lot harder.
441 | 
442 | Another thing to note about this code is that I write my error messages to `STDERR` rather than `STDOUT`. Unix-based systems give us these two different streams for a reason: they let us separate debugging output and functional output so that they can be redirected and manipulated independently. Mixing the two together makes it much more difficult for commands to be chained together in a pipeline, going against the [Unix philosophy](http://en.wikipedia.org/wiki/Unix_philosophy).
443 | 
444 | Error handling is a topic that could easily span several articles. But when it comes to building command-line applications, you'll be in pretty good shape if you remember just two things: use `STDERR` instead of `STDOUT` for debugging output, and make sure to exit with a nonzero status code if your application fails to do what it is supposed to do. Following those two simple rules will make your application play a whole lot nicer with others.
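
Those two rules can be sketched in a form that is also easy to test. In the example below, `run_demo` is a hypothetical entry point (not taken from `rcat`) that receives its output streams as arguments and returns a status code instead of calling `exit` itself:

```ruby
require "stringio"

# Hypothetical entry point: returns an exit status rather than exiting, so
# it can be exercised from tests with in-memory streams.
def run_demo(filenames, out, err)
  filenames.each { |name| out.print File.read(name) }
  0   # success
rescue Errno::ENOENT => e
  err.puts "demo: #{e.message}"   # diagnostics go to the error stream only
  1   # failure
end

out, err = StringIO.new, StringIO.new
status   = run_demo(["some_invalid_file"], out, err)

p status
p out.string
p err.string

# A real executable would end with something like:
#   exit run_demo(ARGV, $stdout, $stderr)
```

Only the outermost script ever touches `exit`, `$stdout`, and `$stderr`, which keeps the rest of the code free of process-level side effects.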
445 | 
446 | ## Reflections
447 | 
448 | Holy cow, this was a hard article to write! When I originally decided to write a `cat` clone, I worried that the example would be too trivial and boring to be worth writing about. However, once I actually implemented it and sat down to write this article, I realized that building command-line applications that respect Unix philosophy and play nice with others is harder than it seems on the surface.
449 | 
450 | Rather than treating this article as a definitive reference for how to build good command-line applications, perhaps we can instead use it as a jumping-off point for future topics to cover in a more self-contained fashion. I'd love to hear your thoughts on what topics in particular interested you and what areas you think should have been covered in greater detail.
451 | 


--------------------------------------------------------------------------------
/working-with-binary-file-formats.md:
--------------------------------------------------------------------------------
  1 | ---
  2 | title: Working with binary file formats
  3 | ---
  4 | 
  5 | _Read and write bitmap files using only a few dozen lines of code_
  6 | 
  7 | 
  8 | Even if we rarely give them much thought, binary file formats are everywhere.
  9 | Ranging from images to audio files to nearly every other sort of media you can
 10 | imagine, binary files are used because they are an efficient way of
 11 | storing information in a ready-to-process format.
 12 | 
 13 | Despite their usefulness, binary files are cryptic and appear to be
 14 | difficult to understand on the surface. Unlike a
 15 | text-based data format, simply looking at a binary file won't give you any
 16 | hints about what its contents are. To even begin to understand a binary
 17 | encoded file, you need to read its format specification. These specifications
 18 | tend to include lots of details about obscure edge cases, and that makes for
 19 | challenging reading unless you already have spent a fair amount of time
 20 | working in the realm of bits and bytes. For these reasons, it's probably better
 21 | to learn by example rather than taking a more formal approach.
 22 | 
 23 | In this article, I will show you how to encode and decode the bitmap image
 24 | format. Bitmap images have a simple structure, and the format is well documented.
 25 | Despite the fact that you'll probably never need to work with bitmap images
 26 | at all in your day-to-day work, the concepts involved in both reading and
 27 | writing a BMP file are pretty much the same as any other file format you'll encounter.
 28 | 
 29 | ## The anatomy of a bitmap
 30 | 
 31 | A bitmap file consists of several sections of metadata followed by a pixel array that represents the color and position of every pixel in the image.
 32 | The example below demonstrates that even if you break the sequence up into its different parts, it would still be a real
 33 | challenge to understand without any documentation handy:
 34 | 
 35 | ```ruby
 36 | # coding: binary
 37 | 
 38 | hex_data = %w[
 39 |   42 4D
 40 |   46 00 00 00
 41 |   00 00
 42 |   00 00
 43 |   36 00 00 00
 44 | 
 45 |   28 00 00 00
 46 |   02 00 00 00
 47 |   02 00 00 00
 48 |   01 00
 49 |   18 00
 50 |   00 00 00 00
 51 |   10 00 00 00
 52 |   13 0B 00 00
 53 |   13 0B 00 00
 54 |   00 00 00 00
 55 |   00 00 00 00
 56 | 
 57 |   00 00 FF
 58 |   FF FF FF
 59 |   00 00
 60 |   FF 00 00
 61 |   00 FF 00
 62 |   00 00
 63 | ]
 64 | 
 65 | out = hex_data.each_with_object("") { |e,s| s << Integer("0x#{e}") }
 66 | 
 67 | File.binwrite("example1.bmp", out)
 68 | ```
 69 | 
 70 | Once you learn what each section represents, you can start
 71 | to interpret the data. For example, if you know that this is a
 72 | 24-bit per pixel image that is two pixels wide, and two pixels high, you might
 73 | be able to make sense of the pixel array data shown below:
 74 | 
 75 | ```
 76 | 00 00 FF
 77 | FF FF FF
 78 | 00 00
 79 | FF 00 00
 80 | 00 FF 00
 81 | 00 00
 82 | ```
 83 | 
 84 | If you run this example script and open the image file it produces, you'll see
 85 | something similar to what is shown below once you zoom in close enough to see
 86 | its pixels:
 87 | 
 88 | ![Pixels](http://i.imgur.com/XhKW1.png)
 89 | 
 90 | 
 91 | By experimenting with changing some of the values in the pixel array by hand, you will fairly quickly discover the overall structure of the array and the way pixels are represented. After figuring this out, you might also be able to look back on the rest of the file and determine what a few of the fields in the headers are without looking at the documentation.
 92 | 
 93 | After exploring a bit on your own, you should check out the [field-by-field walkthrough of a 2x2 bitmap file](http://en.wikipedia.org/wiki/BMP_file_format#Example_1) that this example was based on. The information in that table is pretty much all you'll need to know in order to make sense of the bitmap reader and writer implementations I've built for this article.
 94 | 
 95 | ## Encoding a bitmap image
 96 | 
 97 | Now that you've seen what a bitmap looks like in its raw form, I can demonstrate
 98 | how to build a simple encoder object that allows you to generate bitmap images
 99 | in a much more convenient way. In particular, I'm going to show what I did to
100 | get the following code to output the same image that we rendered via a raw
101 | sequence of bytes earlier:
102 | 
103 | ```ruby
104 | bmp = BMP::Writer.new(2,2)
105 | 
106 | # NOTE: Bitmap encodes pixels in BGR format, not RGB!
107 | bmp[0,0] = "ff0000"
108 | bmp[1,0] = "00ff00"
109 | bmp[0,1] = "0000ff"
110 | bmp[1,1] = "ffffff"
111 | 
112 | bmp.save_as("example_generated.bmp")
113 | ```
114 | 
115 | Like most binary formats, the bitmap format has a tremendous number of options
116 | that make building a complete implementation a whole lot more complicated than
117 | just building a tool that is suitable for generating a single type of image. I
118 | realized shortly after skimming the format description that you can skip
119 | a lot of the boilerplate information if you stick to 24-bit-per-pixel images, so
120 | I decided to do exactly that.
121 | 
122 | Looking at the implementation from the outside-in, you can see the general
123 | structure of the `BMP::Writer` class. Pixels are stored in a two-dimensional
124 | array, and all the interesting things happen at the time you write the image out
125 | to file:
126 | 
127 | ```ruby
128 | class BMP
129 |   class Writer
130 |     def initialize(width, height)
131 |       @width, @height = width, height
132 | 
133 |       @pixels = Array.new(@height) { Array.new(@width) { "000000" } }
134 |     end
135 | 
136 |     def []=(x,y,value)
137 |       @pixels[y][x] = value
138 |     end
139 | 
140 |     def save_as(filename)
141 |       File.open(filename, "wb") do |file|
142 |         write_bmp_file_header(file)
143 |         write_dib_header(file)
144 |         write_pixel_array(file)
145 |       end
146 |     end
147 | 
148 |     # ... rest of implementation details omitted for now ...
149 |   end
150 | end
151 | ```
152 | 
153 | All bitmap files start out with the bitmap file header, which consists of the
154 | following things:
155 | 
156 | * A two character signature to indicate the file is a bitmap file (typically "BM").
157 | * A 32bit unsigned little-endian integer representing the size of the file itself.
158 | * A pair of 16bit unsigned little-endian integers reserved for application specific uses.
159 | * A 32bit unsigned little-endian integer representing the offset to where the pixel array starts in the file.
160 | 
161 | The following code shows how `BMP::Writer` builds up this header and writes it
162 | to file:
163 | 
164 | ```ruby
165 | class BMP
166 |   class Writer
167 |     PIXEL_ARRAY_OFFSET = 54
168 |     BITS_PER_PIXEL     = 24
169 | 
170 |     # ... rest of code as before ...
171 | 
172 |     def write_bmp_file_header(file)
173 |       file << ["BM", file_size, 0, 0, PIXEL_ARRAY_OFFSET].pack("A2Vv2V")
174 |     end
175 | 
176 |     def file_size
177 |       PIXEL_ARRAY_OFFSET + pixel_array_size
178 |     end
179 | 
180 |     def pixel_array_size
181 |       ((BITS_PER_PIXEL*@width)/32.0).ceil*4*@height
182 |     end
183 |   end
184 | end
185 | ```
186 | 
187 | Out of the five fields in this header, only the file size ended up being
188 | dynamic. I was able to treat the pixel array offset as a constant because the
189 | headers for 24 bit color images take up a fixed amount of space. The file size
190 | computations[^1] will make sense later once we examine the way that the pixel
191 | array gets encoded.
192 | 
193 | The tool that makes it possible for us to convert these various field values
194 | into binary sequences is `Array#pack`. If you note that the file size of our
195 | reference 2x2 bitmap is 70 bytes, it becomes clear what `pack`
196 | is actually doing for us when we examine the byte-by-byte values
197 | in the following example:
198 | 
199 | ```ruby
200 | header = ["BM", 70, 0, 0, 54].pack("A2Vv2V")
201 | p header.bytes.map { |e| "%.2x" % e }
202 | 
203 | =begin expected output (NOTE: reformatted below for easier reading)
204 |   ["42", "4d",
205 |    "46", "00", "00", "00",
206 |    "00", "00",
207 |    "00", "00",
208 |    "36", "00", "00", "00"]
209 | =end
210 | ```
211 | The byte sequence for the file header exactly matches that of our reference image,
212 | which indicates that the proper bitmap file header is being generated.
213 | Below I've listed how each field in the header is encoded:
214 | 
215 | ```
216 |   "A2" -> arbitrary binary string of width 2 (packs "BM" as: 42 4d)
217 |   "V"  -> a 32bit unsigned little endian int (packs 70 as: 46 00 00 00)
218 |   "v2" -> two 16bit unsigned little endian ints (packs 0, 0 as: 00 00 00 00)
219 |   "V"  -> a 32bit unsigned little endian int (packs 54 as: 36 00 00 00)
220 | ```
221 | 
222 | While I went to the effort of expanding out the byte sequences to make it easier
223 | to see what is going on, you don't typically need to do this at all while
224 | working with `Array#pack` as long as you craft your template strings carefully.
225 | But like anything else in Ruby, it's nice to be able to write little scripts or
226 | hack around a bit in `irb` whenever you're trying to figure out how your
227 | code is actually working.
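
One quick experiment of that kind is to run the packed header back through `String#unpack` with the same template string and confirm that the original values come out again. The `decode_file_header` helper below is a hypothetical name used only for this sketch:

```ruby
# Hypothetical helper: splits a 14-byte bitmap file header back into its
# fields using the same template that was used to pack it.
def decode_file_header(bytes)
  signature, file_size, reserved1, reserved2, offset = bytes.unpack("A2Vv2V")

  { :signature          => signature,
    :file_size          => file_size,
    :reserved           => [reserved1, reserved2],
    :pixel_array_offset => offset }
end

header = ["BM", 70, 0, 0, 54].pack("A2Vv2V")

p header.bytesize              # 2 + 4 + 2 + 2 + 4 bytes
p decode_file_header(header)
```

A round trip like this is a cheap way to catch a template string whose field widths or ordering do not match what you intended.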
228 | 
229 | After figuring out how to encode the file header, the next step was to work on
230 | the DIB header, which includes some metadata about the image and how it should
231 | be displayed on the screen:
232 | 
233 | ```ruby
234 | class BMP
235 |   class Writer
236 |     DIB_HEADER_SIZE    = 40
237 |     PIXELS_PER_METER   = 2835 # 2835 pixels per meter is basically 72dpi
238 | 
239 |     # ... other code as before ...
240 | 
241 |     def write_dib_header(file)
242 |       file << [DIB_HEADER_SIZE, @width, @height, 1, BITS_PER_PIXEL, 0, pixel_array_size,
243 |                PIXELS_PER_METER, PIXELS_PER_METER, 0, 0].pack("Vl<2v2V2l<2V2")
244 |     end
245 |   end
246 | end
247 | ```
248 | 
249 | Because we are only working on a very limited subset of BMP features, it's
250 | possible to construct the DIB header mostly from preset constants combined with
251 | a few values that we already computed for the BMP file header.
252 | 
253 | The `pack` statement in the above code works in much the same way as the
254 | code that writes out the BMP file header, with one exception: it needs to handle
255 | signed 32-bit little-endian integers. This data type does not have a single-character
256 | directive of its own; instead, it is written as a composite of two
257 | characters: `l<`. The first character (`l`) instructs Ruby to pack a 32-bit
258 | signed integer, and the second character (`<`) tells it to use
259 | little-endian byte order.
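
A short experiment makes the effect of each character visible. The `hex_bytes` helper is a hypothetical convenience for printing packed strings and is not part of `BMP::Writer`:

```ruby
# Hypothetical helper: shows the bytes of a packed string as hex pairs.
def hex_bytes(binstring)
  binstring.bytes.map { |b| "%02x" % b }
end

p hex_bytes([2].pack("l<"))    # little-endian: least significant byte first
p hex_bytes([2].pack("l>"))    # big-endian, shown only for comparison
p hex_bytes([-2].pack("l<"))   # negative values are stored in two's complement

# The same four bytes read back as a signed and as an unsigned integer
p [-2].pack("l<").unpack("l<")
p [-2].pack("l<").unpack("V")
```

The last two lines show that signedness is purely a matter of interpretation: the bytes on disk are identical, and only the directive used to read them decides whether they mean a negative number or a very large positive one.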
260 | 
261 | It isn't clear to me at all why a bitmap image could contain negative values for
262 | its width, height, and pixel density -- this is just how the format is
263 | specified. Because our goal is to learn about binary file processing and not
264 | image format esoterica, it's fine to treat that design decision as a black
265 | box for now and move on to looking at how the pixel array is processed.
266 | 
267 | ```ruby
268 | class BMP
269 |   class Writer
270 |     # .. other code as before ...
271 | 
272 |     def write_pixel_array(file)
273 |       @pixels.reverse_each do |row|
274 |         row.each do |color|
275 |           file << pixel_binstring(color)
276 |         end
277 | 
278 |         file << row_padding
279 |       end
280 |     end
281 | 
282 |     def pixel_binstring(rgb_string)
283 |       raise ArgumentError unless rgb_string =~ /\A\h{6}\z/
284 |       [rgb_string].pack("H6")
285 |     end
286 | 
287 |     def row_padding
288 |       "\x0" * (@width % 4)
289 |     end
290 |   end
291 | end
292 | ```
293 | 
294 | The most interesting thing to note about this code is that each row of pixels ends up getting padded with some null characters. This ensures that every row occupies a multiple of 4 bytes, so that each row starts on a 4-byte boundary. This is a semi-arbitrary limitation that has to do with file storage constraints, but things like this are common in binary files.
295 | 
296 | The calculations below show how much padding is needed to bring rows of various widths up to a multiple of 4 bytes, and explain how I derived the computation for the `row_padding` method:
297 | 
298 | ```
299 | Width 2 : 2 * 3 Bytes per pixel = 6 bytes  + 2 padding  = 8
300 | Width 3 : 3 * 3 Bytes per pixel = 9 bytes  + 3 padding  = 12
301 | Width 4 : 4 * 3 Bytes per pixel = 12 bytes + 0 padding  = 12
302 | Width 5 : 5 * 3 Bytes per pixel = 15 bytes + 1 padding  = 16
303 | Width 6 : 6 * 3 Bytes per pixel = 18 bytes + 2 padding  = 20
304 | Width 7 : 7 * 3 Bytes per pixel = 21 bytes + 3 padding  = 24
305 | ...
306 | ```
307 | 
308 | Sometimes calculations like this are provided for you in format specifications;
309 | other times you need to derive them yourself. Choosing to work
310 | with only 24-bit-per-pixel images allowed me to skirt the question of how to
311 | generalize this computation to an arbitrary number of bits per pixel.
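
For the curious, one commonly documented way to generalize the computation is to round the number of bits in a row up to the next multiple of 32. The sketch below is an assumption-laden illustration rather than part of `BMP::Writer`; the `row_size` and `row_padding_size` names are hypothetical:

```ruby
# Hypothetical helpers, not part of BMP::Writer.

# Total bytes per stored row: the row's bits rounded up to a multiple of 32.
def row_size(width, bits_per_pixel)
  ((bits_per_pixel * width + 31) / 32) * 4
end

# Padding bytes: stored row size minus the bytes holding actual pixel data.
def row_padding_size(width, bits_per_pixel)
  data_bytes = (bits_per_pixel * width + 7) / 8
  row_size(width, bits_per_pixel) - data_bytes
end

(2..7).each do |width|
  puts "Width #{width}: #{row_padding_size(width, 24)} padding, " \
       "#{row_size(width, 24)} bytes per row"
end
```

For 24-bit images this agrees with the table above and with the simpler `@width % 4` shortcut, which only works because each pixel happens to take exactly three bytes.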
312 | 
313 | While the padding code is definitely the most interesting aspect of the pixel array, there are a couple of other details about this implementation worth discussing. In particular, we should take a closer look at the `pixel_binstring` method:
314 | 
315 | ```ruby
316 | def pixel_binstring(rgb_string)
317 |   raise ArgumentError unless rgb_string =~ /\A\h{6}\z/
318 |   [rgb_string].pack("H6")
319 | end
320 | ```
321 | 
322 | This is the method that converts the values we set in the pixel array via lines like `bmp[0,0] = "ff0000"` into actual binary sequences. It starts by matching the string with a regex to ensure that the input string is a valid sequence of 6 hexadecimal digits. If the validation succeeds, it then packs those values into a binary sequence, creating a string with three bytes in it. The example below should make it clear what is going on here:
323 | 
324 | ```
325 | >> ["ffa0ff"].pack("H6").bytes.to_a
326 | => [255, 160, 255]
327 | ```
328 | 
329 | This pattern makes it possible for us to specify color values directly in hexadecimal strings and then convert them to their numeric value just before they get written to the file.
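
As a small illustration of the same idea, the sketch below packs a color string into three bytes and then reverses the operation with `String#unpack`. The variable names are only for illustration. It also shows one side effect worth knowing about: uppercase hex digits pass the validation regex, but come back in lowercase after a round trip.

```ruby
hex_color = "ffa0ff"

# Pack six hex digits (high nibble first) into a three-byte binary string.
binary = [hex_color].pack("H6")
p binary.bytesize            #=> 3
p binary.bytes               #=> [255, 160, 255]

# The same template reverses the conversion.
p binary.unpack("H6").first  #=> "ffa0ff"

# Uppercase input packs to the same bytes, so it unpacks as lowercase.
p ["FFA0FF"].pack("H6").unpack("H6").first #=> "ffa0ff"
```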
330 | 
331 | With this last detail explained, you should now understand how to build a
332 | functional bitmap encoder for writing 24-bit color images. If seeing things
333 | broken out step by step caused you to lose a sense of the big picture, you can
334 | check out the [source code for BMP::Writer](https://gist.github.com/1351737). Feel free to play around with it a bit before moving on to the next section: the best way to learn is to actually run these code samples and try to extend them and/or break them in various ways.
335 | 
336 | ## Decoding a bitmap image
337 | 
338 | As you might expect, there is a nice symmetry between encoding and decoding binary files. To show how far that symmetry goes, I will walk you through the code that makes the following example run:
339 | 
340 | ```ruby
341 | bmp = BMP::Reader.new("example1.bmp")
342 | p bmp.width  #=> 2
343 | p bmp.height #=> 2
344 | 
345 | p bmp[0,0] #=> "ff0000"
346 | p bmp[1,0] #=> "00ff00"
347 | p bmp[0,1] #=> "0000ff"
348 | p bmp[1,1] #=> "ffffff"
349 | ```
350 | 
351 | The general structure of `BMP::Reader` ended up being quite similar to what I did for `BMP::Writer`. The code below shows the methods which define the public interface:
352 | 
353 | ```ruby
354 | class BMP
355 |   class Reader
356 |     def initialize(bmp_filename)
357 |       File.open(bmp_filename, "rb") do |file|
358 |         read_bmp_header(file) # does some validations
359 |         read_dib_header(file) # sets @width, @height
360 |         read_pixels(file)     # populates the @pixels array
361 |       end
362 |     end
363 | 
364 |     attr_reader :width, :height
365 | 
366 |     def [](x,y)
367 |       @pixels[y][x]
368 |     end
369 |   end
370 | end
371 | ```
372 | 
373 | Once again, we are working with an ordinary array of arrays to store the
374 | pixel data, and most of the work gets done as soon as the file is read in the
375 | constructor. Because I decided to support only a single image type, most of the
376 | work of reading the headers is just for validation purposes. In fact, the
377 | `read_bmp_header` method does nothing more than some basic sanity checking, as
378 | shown below:
379 | 
380 | ```ruby
381 | class BMP
382 |   class Reader
383 |     PIXEL_ARRAY_OFFSET = 54
384 | 
385 |     # ...other code as before ...
386 | 
387 |     def read_bmp_header(file)
388 |       header = file.read(14)
389 |       magic_number, file_size, reserved1,
390 |       reserved2, array_location = header.unpack("A2Vv2V")
391 | 
392 |       fail "Not a bitmap file!" unless magic_number == "BM"
393 | 
394 |       unless file.size == file_size
395 |         fail "Corrupted bitmap: File size is not as expected"
396 |       end
397 | 
398 |       unless array_location == PIXEL_ARRAY_OFFSET
399 |         fail "Unsupported bitmap: pixel array does not start where expected"
400 |       end
401 |     end
402 |   end
403 | end
404 | ```
405 | 
406 | The key thing to notice about this code is that it reads from the file just the bytes it needs in order to parse the header. This makes it possible to validate a very large file without loading much data into memory. Reading entire files into memory is rarely a good idea, and this is especially true when it comes to binary data because doing so will actually make your job harder rather than easier.
407 | 
408 | Once the header data is loaded into a string, the `String#unpack` method is used to extract some values from it. Notice here how `String#unpack` uses the same template syntax as `Array#pack` and simply provides the inverse operation. While the `pack` operation converts an array of values into a string of binary data, the `unpack` operation converts a binary string into an array of processed values. This allows us to recover the information packed into the bitmap file header as Ruby strings and integers.
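
To make the inverse relationship concrete, here is a minimal sketch that packs a 14-byte file header and then unpacks it with the same template. The values are assumptions chosen to match a 2x2, 24-bit image (54 header bytes plus two padded rows of 8 bytes each).

```ruby
# Assumed values for a 2x2, 24-bit image: 54 + (2 rows * 8 bytes) = 70.
fields = ["BM", 70, 0, 0, 54]

# A2 = two-character string, V = 32-bit little-endian unsigned integer,
# v2 = two 16-bit little-endian unsigned integers.
header = fields.pack("A2Vv2V")
p header.bytesize #=> 14

magic_number, file_size, reserved1,
reserved2, array_location = header.unpack("A2Vv2V")

p magic_number   #=> "BM"
p file_size      #=> 70
p array_location #=> 54
```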
409 | 
410 | Once these values have been converted into Ruby objects, it's easy to do some
411 | ordinary comparisons to check to see if they're what we'd expect them to be.
412 | Because they help detect corrupted files, clearly defined validations are an
413 | important part of writing any decoder for binary file formats. If you do not do
414 | this sort of sanity checking, you will inevitably run into
415 | subtle processing errors later on that will be much harder to debug.
416 | 
417 | As you might expect, the implementation of `read_dib_header` involves more of
418 | the same sort of extractions and validations. It also sets the `@width` and
419 | `@height` variables, which we use later to determine how to traverse the encoded
420 | pixel array.
421 | 
422 | ```ruby
423 | class BMP
424 |   class Reader
425 |     # ... other code as before ...
426 | 
427 |     BITS_PER_PIXEL     = 24
428 |     DIB_HEADER_SIZE    = 40
429 | 
430 |     def read_dib_header(file)
431 |       header = file.read(DIB_HEADER_SIZE)
432 | 
433 |       header_size, width, height, planes, bits_per_pixel,
434 |       compression_method, image_size, hres,
435 |       vres, n_colors, i_colors = header.unpack("Vl<2v2V2l<2V2")
436 | 
437 |       unless header_size == DIB_HEADER_SIZE
438 |         fail "Corrupted bitmap: DIB header does not match expected size"
439 |       end
440 | 
441 |       unless planes == 1
442 |         fail "Corrupted bitmap: Expected 1 plane, got #{planes}"
443 |       end
444 | 
445 |       unless bits_per_pixel == BITS_PER_PIXEL
446 |         fail "#{bits_per_pixel} bits per pixel bitmaps are not supported"
447 |       end
448 | 
449 |       unless compression_method == 0
450 |         fail "Bitmap compression not supported"
451 |       end
452 | 
453 |       unless image_size + PIXEL_ARRAY_OFFSET == file.size
454 |         fail "Corrupted bitmap: pixel array size isn't as expected"
455 |       end
456 | 
457 |       @width, @height = width, height
458 |     end
459 |   end
460 | end
461 | ```
462 | 
463 | Beyond what has already been said about this example and the DIB header itself, there isn't much more to discuss about this particular method. That means we can finally take a look at how `BMP::Reader` converts the encoded pixel array into a nested Ruby array structure.
464 | 
465 | ```ruby
466 | class BMP
467 |   class Reader
468 |     def read_pixels(file)
469 |       @pixels = Array.new(@height) { Array.new(@width) }
470 | 
471 |       (@height-1).downto(0) do |y|
472 |         0.upto(@width - 1) do |x|
473 |           @pixels[y][x] = file.read(3).unpack("H6").first
474 |         end
475 |         advance_to_next_row(file)
476 |       end
477 |     end
478 | 
479 |     def advance_to_next_row(file)
480 |       padding_bytes = @width % 4
481 |       return if padding_bytes == 0
482 | 
483 |       file.pos += padding_bytes
484 |     end
485 |   end
486 | end
487 | ```
488 | 
489 | One interesting aspect of this code is that it uses explicit numerical iterators. These are relatively rare in idiomatic Ruby, but I did not see a better way to approach this particular problem. Rows are listed in the pixel array from the bottom up, while the image itself still gets indexed from the top down (with 0 at the top). This makes it necessary to iterate over the row numbers in reverse order, and the use of `downto` is the best way I could find to do that.
490 | 
491 | The other thing worth noticing about this code is that in the `advance_to_next_row` method, we actually move the pointer ahead in the file rather than reading the padding bytes between each row. This makes little difference when you're dealing with a maximum of three bytes of padding per row (two in this case), but is a good practice for writing more efficient code that consumes less memory.
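
If you want to experiment with this traversal without creating a file, the sketch below runs the same loop against an in-memory `StringIO`. This is only an approximation of what `read_pixels` does: the pixel data is hand-built to match the 2x2 example image, and local variables stand in for the instance variables.

```ruby
require "stringio"

width, height = 2, 2

# Rows are stored bottom-up: the image's last row comes first in the data.
# Each row holds 6 bytes of pixel data followed by 2 null padding bytes (x2).
data = ["0000ff", "ffffff"].pack("H6H6x2") +
       ["ff0000", "00ff00"].pack("H6H6x2")

io      = StringIO.new(data)
pixels  = Array.new(height) { Array.new(width) }
padding = width % 4

(height - 1).downto(0) do |y|
  0.upto(width - 1) do |x|
    pixels[y][x] = io.read(3).unpack("H6").first
  end

  io.pos += padding # skip the padding instead of reading it
end

p pixels  #=> [["ff0000", "00ff00"], ["0000ff", "ffffff"]]
p io.eof? #=> true
```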
492 | 
493 | When you take all these code examples and glue them together into a single class
494 | definition, you'll end up with a `BMP::Reader` object that is capable of giving you
495 | the width and height of a 24-bit BMP image as well as the color of each and every
496 | pixel in the image. For those who'd like to experiment further, the [source code
497 | for BMP::Reader](https://gist.github.com/1352294) is available.
498 | 
499 | ## Reflections
500 | 
501 | The thing that makes me appreciate binary file formats is that if you just learn
502 | a few basic computing concepts, there are few things that could be more
503 | fundamentally simple to work with. But simple does not necessarily mean easy, and in the process of writing this article I realized that some aspects of binary file processing are not quite as trivial or intuitive as I originally thought they were.
504 | 
505 | What I can say is that this kind of work gets a whole lot easier with practice.
506 | Due to my work on [Prawn](http://prawnpdf.org), I have written
507 | implementations for various binary formats including PDF, PNG, JPG,
508 | and TTF. These formats each have their differences, but my experience tells me
509 | that if you fully understand the examples in this article, then you are already
510 | well on your way to tackling pretty much any binary file format.
511 | 
512 | [^1]: To determine the storage space needed for the pixel array in BMP images, I used the computations described in the [Wikipedia article on bitmap images](http://en.wikipedia.org/wiki/BMP_file_format#Pixel_storage).
513 | 


--------------------------------------------------------------------------------