├── 0-why-haskell.org
├── 1-getting-started.org
├── 10-parsing-a-binary-data-format.org
├── 11-testing-and-quality-assurance.org
├── 12-barcode-recognition.org
├── 13-data-structures.org
├── 14-using-parsec.org
├── 15-monads.org
├── 16-programming-with-monads.org
├── 17-interfacing-with-c.org
├── 18-monad-transformers.org
├── 19-error-handling.org
├── 2-types-and-functions.org
├── 20-systems-programming-in-haskell.org
├── 21-using-databases.org
├── 22-web-client-programming.org
├── 23-gui-programming-with-gtk2hs.org
├── 24-concurrent-and-multicore-programming.org
├── 25-profiling-and-optimization.org
├── 26-building-a-bloom-filter.org
├── 27-sockets-and-syslog.org
├── 28-software-transactional-memory.org
├── 3-defining-types-streamlining-functions.org
├── 4-functional-programming.org
├── 5-writing-a-library.org
├── 6-using-typeclasses.org
├── 7-io.org
├── 8-efficient-file-processing-regular-expressions-and-file-name-matching.org
├── 9-a-library-for-searching-the-file-system.org
├── LICENSE.txt
├── Makefile
├── README.org
├── appendix-characters-strings-and-escaping-rules.org
├── bibliography.org
├── bin
│   ├── map_html_to_org.py
│   ├── update_file_links
│   └── update_quotation_marks
└── figs
    ├── ch11-hpc-round1.png
    ├── ch11-hpc-round2.png
    ├── ch12-bad-angled.jpg
    ├── ch12-bad-too-far.jpg
    ├── ch12-bad-too-near.jpg
    ├── ch12-barcode-example.png
    ├── ch12-barcode-generated.png
    ├── ch12-barcode-photo.jpg
    ├── ch25-heap-hc.png
    ├── ch25-heap-hd.png
    ├── ch25-heap-hy.png
    ├── ch25-stack.png
    ├── gui-glade-3.png
    ├── gui-pod-addwin.png
    ├── gui-pod-mainwin.png
    └── gui-update-complete.png


/1-getting-started.org:
--------------------------------------------------------------------------------
  1 | * Chapter 1: Getting Started
  2 | 
  3 | As you read the early chapters of this book, keep in mind that we
  4 | will sometimes introduce ideas in restricted, simplified form.
  5 | Haskell is a deep language, and presenting every aspect of a given
  6 | subject all at once is likely to prove overwhelming. As we build a
  7 | solid foundation in Haskell, we will expand upon these initial
  8 | explanations.
  9 | 
 10 | ** Your Haskell environment
 11 | 
 12 | The Glasgow Haskell Compiler (GHC) is the most widely used Haskell implementation. It
 13 | compiles to native code, supports parallel execution, and provides
 14 | useful performance analysis and debugging tools.
 15 | 
 16 | GHC has three main components.
 17 | 
 18 | - ~ghc~ is an optimizing compiler that generates fast native code.
 19 | - ~ghci~ is an interactive interpreter and debugger.
 20 | - ~runghc~ is a program for running Haskell programs as scripts,
 21 |   without needing to compile them first.
 22 | 
 23 | #+BEGIN_NOTE
 24 | How we refer to the components of GHC
 25 | 
 26 | When we discuss the GHC system as a whole, we will refer to it as
 27 | GHC. If we are talking about a specific command, we will mention
 28 | ~ghc~, ~ghci~, or ~runghc~ by name.
 29 | #+END_NOTE
 30 | 
 31 | In this book, we assume that you're using at least version 8.2.2
 32 | of GHC, which was released in 2017. To obtain a copy of GHC, visit
 33 | [[http://www.haskell.org/downloads][the GHC download page]], and look for the list of binary packages
 34 | and installers.
 35 | 
 36 | Many Linux distributors, and providers of BSD and other Unix
 37 | variants, make custom binary packages of GHC available. Because
 38 | these are built specifically for each environment, they are much
 39 | easier to install and use than the generic binary packages that
 40 | are available from the GHC download page. You can find a list of
 41 | distributions that custom-build GHC at the GHC
 42 | [[http://www.haskell.org/ghc/distribution_packages.html][distribution packages]] page.
 43 | 
 44 | For more detailed information about how to install GHC on a
 45 | variety of popular platforms, we've provided some instructions in
 46 | [[file:installing-ghc-and-haskell-libraries.html][Appendix A, /Installing GHC and Haskell libraries/]].
 47 | 
 48 | ** Getting started with ghci, the interpreter
 49 | 
 50 | The interactive interpreter for GHC is a program named ~ghci~. It
 51 | lets us enter and evaluate Haskell expressions, explore modules,
 52 | and debug our code. If you are familiar with Python or Ruby,
 53 | ~ghci~ is somewhat similar to ~python~ and ~irb~, the interactive
 54 | Python and Ruby interpreters.
 55 | 
 56 | #+BEGIN_NOTE
 57 | The ~ghci~ command has a narrow focus
 58 | 
 59 | We typically cannot copy some code out of a Haskell source file
 60 | and paste it into ~ghci~. This does not have a significant effect
 61 | on debugging pieces of code, but it can initially be surprising if
 62 | you are used to, say, the interactive Python interpreter.
 63 | #+END_NOTE
 64 | 
 65 | On Unix-like systems, we run ~ghci~ as a command in a shell
 66 | window. On Windows, it's available via the Start Menu. For
 67 | example, if you installed using the GHC installer on Windows,
 68 | you should go to "Programs", then "GHC"; you will then see
 69 | ~ghci~ in the list. (See the screenshot in
 70 | [[file:installing-ghc-and-haskell-libraries.org::*Windows][the section called "Windows"]].)
 71 | 
 72 | When we run ~ghci~, it displays a startup banner, followed by a
 73 | ~Prelude>~ prompt. Here, we're showing version 8.2.2.
 74 | 
 75 | #+BEGIN_SRC screen
 76 | $ ghci
 77 | GHCi, version 8.2.2: http://www.haskell.org/ghc/  :? for help
 78 | Prelude>
 79 | #+END_SRC
 80 | 
 81 | The word ~Prelude~ in the prompt indicates that ~Prelude~, a
 82 | standard library of useful functions, is loaded and ready to use.
 83 | When we load other modules or source files, they will show up in
 84 | the prompt, too.
 85 | 
 86 | #+BEGIN_TIP
 87 | Getting help
 88 | 
 89 | If you enter ~:?~ at the ~ghci~ prompt, it will print a long help
 90 | message.
 91 | #+END_TIP
 92 | 
 93 | The ~Prelude~ module is sometimes referred to as "the standard
 94 | prelude", because its contents are defined by the Haskell 2010
 95 | standard. Usually, it's simply shortened to "the prelude".
 96 | 
 97 | #+BEGIN_NOTE
 98 | About the ghci prompt
 99 | 
100 | The prompt displayed by ~ghci~ changes frequently depending on
101 | what modules we have loaded. It can often grow long enough to
102 | leave little visual room on a single line for our input.
103 | 
104 | For brevity and consistency, we have replaced ~ghci~'s default
105 | prompts throughout this book with the prompt string ~ghci>~.
106 | 
107 | If you want to do this yourself, use ~ghci~'s ~:set prompt~
108 | directive, as follows.
109 | 
110 | #+BEGIN_SRC screen
111 | Prelude> :set prompt "ghci> "
112 | ghci>
113 | #+END_SRC
114 | #+END_NOTE
115 | 
116 | The prelude is always implicitly available; we don't need to take
117 | any actions to use the types, values, or functions it defines. To
118 | use definitions from other modules, we must load them into ~ghci~,
119 | using the ~:module~ command.
120 | 
121 | #+BEGIN_SRC screen
122 | ghci> :module + Data.Ratio
123 | #+END_SRC
124 | 
125 | We can now use the functionality of the ~Data.Ratio~ module,
126 | which lets us work with rational numbers (fractions).
127 | 
128 | ** Basic interaction: using ghci as a calculator
129 | 
130 | In addition to providing a convenient interface for testing code
131 | fragments, ~ghci~ can function as a readily accessible desktop
132 | calculator. We can easily express any calculator operation in
133 | ~ghci~ and, as an added bonus, we can add more complex operations
134 | as we become more familiar with Haskell. Even using the
135 | interpreter in this simple way can help us to become more
136 | comfortable with how Haskell works.
137 | 
138 | *** Simple arithmetic
139 | 
140 | We can immediately start entering expressions, to see what ~ghci~
141 | will do with them. Basic arithmetic works similarly to languages
142 | like C and Python: we write expressions in /infix/ form, where an
143 | operator appears between its operands.
144 | 
145 | #+BEGIN_SRC screen
146 | ghci> 2 + 2
147 | 4
148 | ghci> 31337 * 101
149 | 3165037
150 | ghci> 7.0 / 2.0
151 | 3.5
152 | #+END_SRC
153 | 
154 | The infix style of writing an expression is just a convenience: we
155 | can also write an expression in /prefix/ form, where the operator
156 | precedes its arguments. To do this, we must enclose the operator
157 | in parentheses.
158 | 
159 | #+BEGIN_SRC screen
160 | ghci> 2 + 2
161 | 4
162 | ghci> (+) 2 2
163 | 4
164 | #+END_SRC
165 | 
166 | As the expressions above imply, Haskell has a notion of integers
167 | and floating point numbers. Integers can be arbitrarily large.
168 | Here, ~(^)~ provides integer exponentiation.
169 | 
170 | #+BEGIN_SRC screen
171 | ghci> 313 ^ 15
172 | 27112218957718876716220410905036741257
173 | #+END_SRC
174 | 
175 | *** An arithmetic quirk: writing negative numbers
176 | 
177 | Haskell presents us with one peculiarity in how we must write
178 | numbers: it's often necessary to enclose a negative number in
179 | parentheses. This affects us as soon as we move beyond the
180 | simplest expressions.
181 | 
182 | We'll start by writing a negative number.
183 | 
184 | #+BEGIN_SRC screen
185 | ghci> -3
186 | -3
187 | #+END_SRC
188 | 
189 | The ~-~ above is a unary operator. In other words, we didn't write
190 | the single number "-3"; we wrote the number "3", and applied the
191 | operator ~-~ to it. The ~-~ operator is Haskell's only unary
192 | operator, and we cannot mix it with infix operators.
193 | 
194 | #+BEGIN_SRC screen
195 | ghci> 2 + -3
196 | 
197 | <interactive>:1:1: error:
198 |     Precedence parsing error
199 |         cannot mix ‘+’ [infixl 6] and prefix `-' [infixl 6] in the same infix expression
200 | #+END_SRC
201 | 
202 | If we want to use the unary minus near an infix operator, we must
203 | wrap the expression it applies to in parentheses.
204 | 
205 | #+BEGIN_SRC screen
206 | ghci> 2 + (-3)
207 | -1
208 | ghci> 3 + (-(13 * 37))
209 | -478
210 | #+END_SRC
211 | 
212 | This avoids a parsing ambiguity. When we apply a function in
213 | Haskell, we write the name of the function, followed by its
214 | argument, for example ~f 3~. If we did not need to wrap a negative
215 | number in parentheses, we would have two profoundly different ways
216 | to read ~f-3~: it could be either "apply the function ~f~ to the
217 | number ~-3~", or "subtract the number ~3~ from the variable ~f~".
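    | 
    | As a small sketch of this rule in action (our own example, using
    | the prelude's ~abs~ function, which has not been introduced yet),
    | wrapping the negative number in parentheses makes it a single
    | argument.
    | 
    | #+BEGIN_SRC screen
    | ghci> abs (-3)
    | 3
    | #+END_SRC
    | 
    | Written without the parentheses, ~abs -3~ is read as the
    | subtraction ~abs - 3~, which ~ghci~ rejects with an error.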
218 | 
219 | /Most/ of the time, we can omit white space ("blank" characters
220 | such as space and tab) from expressions, and Haskell will parse
221 | them as we intended. But not always. Here is an expression that
222 | works:
223 | 
224 | #+BEGIN_SRC screen
225 | ghci> 2*3
226 | 6
227 | #+END_SRC
228 | 
229 | And here is one that seems similar to the problematic negative
230 | number example above, but results in a different error message.
231 | 
232 | #+BEGIN_SRC screen
233 | ghci> 2*-3
234 | 
235 | <interactive>:1:1: error:
236 |     • Variable not in scope: (*-) :: Integer -> Integer -> t
237 |     • Perhaps you meant one of these:
238 |         ‘*’ (imported from Prelude), ‘-’ (imported from Prelude),
239 |         ‘*>’ (imported from Prelude)
240 | #+END_SRC
241 | 
242 | Here, the Haskell implementation is reading ~*-~ as a single
243 | operator. Haskell lets us define new operators (a subject that we
244 | will return to later), but we haven't defined ~*-~. Once again, a
245 | few parentheses get us and ~ghci~ looking at the expression in the
246 | same way.
247 | 
248 | #+BEGIN_SRC screen
249 | ghci> 2*(-3)
250 | -6
251 | #+END_SRC
252 | 
253 | Compared to other languages, this unusual treatment of negative
254 | numbers might seem annoying, but it represents a reasoned
255 | trade-off. Haskell lets us define new operators at any time. This
256 | is not some kind of esoteric language feature; we will see quite a
257 | few user-defined operators in the chapters ahead. The language
258 | designers chose to accept a slightly cumbersome syntax for
259 | negative numbers in exchange for this expressive power.
260 | 
261 | *** Boolean logic, operators, and value comparisons
262 | 
263 | The values of Boolean logic in Haskell are ~True~ and ~False~. The
264 | capitalization of these names is important. The language uses
265 | C-influenced operators for working with Boolean values: ~(&&)~ is
266 | logical "and", and ~(||)~ is logical "or".
267 | 
268 | #+BEGIN_SRC screen
269 | ghci> True && False
270 | False
271 | ghci> False || True
272 | True
273 | #+END_SRC
274 | 
275 | While some programming languages treat the number zero as
276 | synonymous with ~False~, Haskell does not, nor does it consider a
277 | non-zero value to be ~True~.
278 | 
279 | #+BEGIN_SRC screen
280 | ghci> True && 1
281 | 
282 | <interactive>:1:9: error:
283 |     • No instance for (Num Bool) arising from the literal ‘1’
284 |     • In the second argument of ‘(&&)’, namely ‘1’
285 |       In the expression: True && 1
286 |       In an equation for ‘it’: it = True && 1
287 | #+END_SRC
288 | 
289 | Once again, we are faced with a substantial-looking error message.
290 | In brief, it tells us that the Boolean type, ~Bool~, is not a
291 | member of the family of numeric types, ~Num~. The error message is
292 | rather long because ~ghci~ is pointing out the location of the
293 | problem, and listing each enclosing expression in which it
294 | occurred.
295 | 
296 | Here is a more detailed breakdown of the error message.
297 | 
298 | - "~No instance for (Num Bool)~" tells us that ~ghci~ is trying to
299 |   treat the numeric value ~1~ as having a ~Bool~ type, but it cannot.
300 | - "~arising from the literal ‘1’~" indicates that it was our use
301 |   of the number ~1~ that caused the problem.
302 | - "~In an equation for ‘it’~" refers to a ~ghci~ short cut that
303 |   we will revisit in a few pages.
304 | 
305 | #+BEGIN_TIP
306 | Remain fearless in the face of error messages
307 | 
308 | We have an important point to make here, which we will repeat
309 | throughout the early sections of this book. If you run into
310 | problems or error messages that you do not yet understand, /don't
311 | panic/. Early on, all you have to do is figure out enough to make
312 | progress on a problem. As you acquire experience, you will find it
313 | easier to understand parts of error messages that initially seem
314 | obscure.
315 | 
316 | The numerous error messages have a purpose: they actually help us
317 | in writing correct code, by making us perform some amount of
318 | debugging "up front", before we ever run a program. If you are
319 | coming from a background of working with more permissive
320 | languages, this way of working may come as something of a shock.
321 | Bear with us.
322 | #+END_TIP
323 | 
324 | Most of Haskell's comparison operators are similar to those used
325 | in C and the many languages it has influenced.
326 | 
327 | #+BEGIN_SRC screen
328 | ghci> 1 == 1
329 | True
330 | ghci> 2 < 3
331 | True
332 | ghci> 4 >= 3.99
333 | True
334 | #+END_SRC
335 | 
336 | One operator that differs from its C counterpart is "is not equal
337 | to". In C, this is written as ~!=~. In Haskell, we write ~(/=)~,
338 | which resembles the ≠ notation used in mathematics.
339 | 
340 | #+BEGIN_SRC screen
341 | ghci> 2 /= 3
342 | True
343 | #+END_SRC
344 | 
345 | Also, where C-like languages often use ~!~ for logical negation,
346 | Haskell uses the ~not~ function.
347 | 
348 | #+BEGIN_SRC screen
349 | ghci> not True
350 | False
351 | #+END_SRC
352 | 
353 | *** Operator precedence and associativity
354 | 
355 | Like written algebra and other programming languages that use
356 | infix operators, Haskell has a notion of operator precedence. We
357 | can use parentheses to explicitly group parts of an expression,
358 | and precedence allows us to omit a few parentheses. For example,
359 | the multiplication operator has a higher precedence than the
360 | addition operator, so Haskell treats the following two expressions
361 | as equivalent.
362 | 
363 | #+BEGIN_SRC screen
364 | ghci> 1 + (4 * 4)
365 | 17
366 | ghci> 1 + 4 * 4
367 | 17
368 | #+END_SRC
369 | 
370 | Haskell assigns numeric precedence values to operators, with 0
371 | being the lowest precedence and 9 the highest. A higher-precedence
372 | operator is applied before a lower-precedence operator. We can use
373 | ~ghci~ to inspect the precedence levels of individual operators,
374 | using its ~:info~ command.
375 | 
376 | #+BEGIN_SRC screen
377 | ghci> :info (+)
378 | class Num a where
379 |   (+) :: a -> a -> a
380 |   ...
381 |     -- Defined in GHC.Num
382 | infixl 6 +
383 | ghci> :info (*)
384 | class Num a where
385 |   ...
386 |   (*) :: a -> a -> a
387 |   ...
388 |     -- Defined in GHC.Num
389 | infixl 7 *
390 | #+END_SRC
391 | 
392 | The information we seek is in the line "~infixl 6 +~", which
393 | indicates that the ~(+)~ operator has a precedence of 6. (We will
394 | explain the other output in a later chapter.) The "~infixl 7 *~"
395 | tells us that the ~(*)~ operator has a precedence of 7. Since
396 | ~(*)~ has a higher precedence than ~(+)~, we can now see why
397 | ~1 + 4 * 4~ is evaluated as ~1 + (4 * 4)~, and not ~(1 + 4) * 4~.
398 | 
399 | Haskell also defines /associativity/ of operators. This determines
400 | whether an expression containing multiple uses of an operator is
401 | evaluated from left to right, or right to left. The ~(+)~ and
402 | ~(*)~ operators are left associative, which is represented as
403 | ~infixl~ in the ~ghci~ output above. A right associative operator
404 | is displayed with ~infixr~.
405 | 
406 | #+BEGIN_SRC screen
407 | ghci> :info (^)
408 | (^) :: (Num a, Integral b) => a -> b -> a  -- Defined in GHC.Real
409 | infixr 8 ^
410 | #+END_SRC
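    | 
    | As a small illustration of why associativity matters (our own
    | example, not part of the ~:info~ output above), compare how the
    | right associative ~(^)~ and the left associative ~(-)~ group
    | their operands.
    | 
    | #+BEGIN_SRC screen
    | ghci> 2 ^ 3 ^ 2
    | 512
    | ghci> (2 ^ 3) ^ 2
    | 64
    | ghci> 10 - 4 - 3
    | 3
    | ghci> 10 - (4 - 3)
    | 9
    | #+END_SRC
    | 
    | The first expression is read as ~2 ^ (3 ^ 2)~, while ~10 - 4 - 3~
    | is read as ~(10 - 4) - 3~.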
411 | 
412 | Together, the precedence and associativity rules are usually
413 | referred to as /fixity/ rules.
414 | 
415 | *** Undefined values, and introducing variables
416 | 
417 | Haskell's prelude, the standard library we mentioned earlier,
418 | defines at least one well-known mathematical constant for us.
419 | 
420 | #+BEGIN_SRC screen
421 | ghci> pi
422 | 3.141592653589793
423 | #+END_SRC
424 | 
425 | But its coverage of mathematical constants is not comprehensive,
426 | as we can quickly see. Let us look for Euler's number, ~e~.
427 | 
428 | #+BEGIN_SRC screen
429 | ghci> e
430 | 
431 | <interactive>:1:1: error: Variable not in scope: e
432 | #+END_SRC
433 | 
434 | Oh well. We have to define it ourselves.
435 | 
436 | #+BEGIN_NOTE
437 | Don't worry about the error message
438 | 
439 | If the above "not in scope" error message seems a little
440 | daunting, do not worry. All it means is that there is no variable
441 | defined with the name ~e~.
442 | #+END_NOTE
443 | 
444 | At the ~ghci~ prompt, we can make a temporary definition of ~e~
445 | ourselves. (A leading ~let~ is optional in recent versions of ~ghci~.)
446 | 
447 | #+BEGIN_SRC screen
448 | ghci> e = exp 1
449 | #+END_SRC
450 | 
451 | This is an application of the exponential function, ~exp~, and our
452 | first example of applying a function in Haskell. While languages
453 | like Python require parentheses around the arguments to a
454 | function, Haskell does not.
455 | 
456 | With ~e~ defined, we can now use it in arithmetic expressions. The
457 | ~(^)~ exponentiation operator that we introduced earlier can only
458 | raise a number to an integer power. To use a floating point number
459 | as the exponent, we use the ~(**)~ exponentiation operator.
460 | 
461 | #+BEGIN_SRC screen
462 | ghci> (e ** pi) - pi
463 | 19.99909997918947
464 | #+END_SRC
465 | 
466 | #+BEGIN_WARNING
467 | This syntax is ghci-specific
468 | 
469 | The syntax for ~let~ that ~ghci~ accepts is not the same as we
470 | would use at the "top level" of a normal Haskell program. We will
471 | see the normal syntax in
472 | [[file:3-defining-types-streamlining-functions.org::*Introducing local variables][the section called "Introducing local variables"]].
473 | #+END_WARNING
474 | 
475 | *** Dealing with precedence and associativity rules
476 | 
477 | It is sometimes better to leave at least some parentheses in
478 | place, even when Haskell allows us to omit them. Their presence
479 | can help future readers (including ourselves) to understand what
480 | we intended.
481 | 
482 | Even more importantly, complex expressions that rely completely on
483 | operator precedence are notorious sources of bugs. A compiler and
484 | a human can easily end up with different notions of what even a
485 | short, parenthesis-free expression is supposed to do.
486 | 
487 | There is no need to remember all of the precedence and
488 | associativity rules: it is simpler to add parentheses if
489 | you are unsure.
490 | 
491 | ** Command line editing in ghci
492 | 
493 | On most systems, ~ghci~ has some amount of command line editing
494 | ability. In case you are not familiar with command line editing,
495 | it's a huge time saver. The basics are common to both Unix-like
496 | and Windows systems. Pressing the ↑ key on your keyboard recalls
497 | the last line of input you entered; pressing ↑ repeatedly cycles
498 | through earlier lines of input. You can use the ← and → arrow keys
499 | to move around inside a line of input. On Unix (but not Windows,
500 | unfortunately), the ~tab~ key completes partially entered
501 | identifiers.
502 | 
503 | #+BEGIN_TIP
504 | Where to look for more information
505 | 
506 | We've barely scratched the surface of command line editing here.
507 | Since you can work more effectively if you're more familiar with
508 | the capabilities of your command line editing system, you might
509 | find it useful to do some further reading. ~ghci~ uses the
510 | Haskeline library under the hood, which is [[https://github.com/judah/haskeline/wiki/KeyBindings][powerful]] and
511 | [[https://github.com/judah/haskeline/wiki/UserPreferences][customisable]].
512 | #+END_TIP
513 | 
514 | ** Lists
515 | 
516 | A list is surrounded by square brackets; the elements are
517 | separated by commas.
518 | 
519 | #+BEGIN_SRC screen
520 | ghci> [1, 2, 3]
521 | [1,2,3]
522 | #+END_SRC
523 | 
524 | #+BEGIN_NOTE
525 | Commas are separators, not terminators
526 | 
527 | Some languages permit the last element in a list to be followed by
528 | an optional trailing comma before a closing bracket, but Haskell
529 | doesn't allow this. If you leave in a trailing comma (e.g.
530 | ~[1,2,]~), you'll get a parse error.
531 | #+END_NOTE
532 | 
533 | A list can be of any length. The empty list is written ~[]~.
534 | 
535 | #+BEGIN_SRC screen
536 | ghci> []
537 | []
538 | ghci> ["foo", "bar", "baz", "quux", "fnord", "xyzzy"]
539 | ["foo","bar","baz","quux","fnord","xyzzy"]
540 | #+END_SRC
541 | 
542 | All elements of a list must be of the same type. Here, we violate
543 | this rule: our list starts with two Bool values, but ends with a
544 | string.
545 | 
546 | #+BEGIN_SRC screen
547 | ghci> [True, False, "testing"]
548 | 
549 | <interactive>:1:15: error:
550 |     • Couldn't match expected type ‘Bool’ with actual type ‘[Char]’
551 |     • In the expression: "testing"
552 |       In the expression: [True, False, "testing"]
553 |       In an equation for ‘it’: it = [True, False, "testing"]
554 | #+END_SRC
555 | 
556 | Once again, ~ghci~'s error message is verbose, but it's simply
557 | telling us that there is no way to turn the string into a Boolean
558 | value, so the list expression isn't properly typed.
559 | 
560 | If we write a series of elements using /enumeration notation/,
561 | Haskell will fill in the contents of the list for us.
562 | 
563 | #+BEGIN_SRC screen
564 | ghci> [1..10]
565 | [1,2,3,4,5,6,7,8,9,10]
566 | #+END_SRC
567 | 
568 | Here, the ~..~ characters denote an /enumeration/. We can only use
569 | this notation for types whose elements we can enumerate. It makes
570 | no sense for text strings, for instance: there is not any
571 | sensible, general way to enumerate ~["foo".."quux"]~.
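    | 
    | Numbers are not the only values we can enumerate. As a brief
    | sketch (our own example; character literals, written in single
    | quotes, are covered later in this chapter), the same notation
    | works for characters.
    | 
    | #+BEGIN_SRC screen
    | ghci> ['a'..'e']
    | "abcde"
    | #+END_SRC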
572 | 
573 | By the way, notice that the above use of range notation gives us a
574 | /closed interval/; the list contains both endpoints.
575 | 
576 | When we write an enumeration, we can optionally specify the size
577 | of the step to use by providing the first two elements, followed
578 | by the value at which to stop generating the enumeration.
579 | 
580 | #+BEGIN_SRC screen
581 | ghci> [1.0,1.25..2.0]
582 | [1.0,1.25,1.5,1.75,2.0]
583 | ghci> [1,4..15]
584 | [1,4,7,10,13]
585 | ghci> [10,9..1]
586 | [10,9,8,7,6,5,4,3,2,1]
587 | #+END_SRC
588 | 
589 | In the second case above, the list is quite sensibly missing the
590 | end point of the enumeration, because ~15~ isn't an element of the
591 | series we defined.
592 | 
593 | We can omit the end point of an enumeration. If a type doesn't
594 | have a natural "upper bound", this will produce values
595 | indefinitely. For example, if you type ~[1..]~ at the ~ghci~
596 | prompt, you'll have to interrupt or kill ~ghci~ to stop it from
597 | printing an infinite succession of ever-larger numbers. If you are
598 | tempted to do this, type ~Ctrl-C~ to halt the enumeration. We will
599 | find later on that infinite lists are often useful in Haskell.
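    | 
    | One way to look at an infinite list safely is to ask for only a
    | prefix of it. This sketch assumes the prelude's ~take~ function,
    | which we have not introduced yet; it returns the given number of
    | elements from the front of a list.
    | 
    | #+BEGIN_SRC screen
    | ghci> take 5 [1..]
    | [1,2,3,4,5]
    | #+END_SRC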
600 | 
601 | #+BEGIN_WARNING
602 | Beware enumerating floating point numbers
603 | 
604 | Here's a non-intuitive bit of behaviour.
605 | 
606 | #+BEGIN_SRC screen
607 | ghci> [1.0..1.8]
608 | [1.0,2.0]
609 | #+END_SRC
610 | 
611 | Behind the scenes, to avoid floating point roundoff problems, the
612 | Haskell implementation enumerates from ~1.0~ to ~1.8+0.5~.
613 | 
614 | Using enumeration notation over floating point numbers can pack
615 | more than a few surprises, so if you use it at all, be careful.
616 | Floating point behavior is quirky in all programming languages;
617 | there is nothing unique to Haskell here.
618 | #+END_WARNING
619 | 
620 | *** Operators on lists
621 | 
622 | There are two ubiquitous operators for working with lists. We
623 | concatenate two lists using the ~(++)~ operator.
624 | 
625 | #+BEGIN_SRC screen
626 | ghci> [3,1,3] ++ [3,7]
627 | [3,1,3,3,7]
628 | ghci> [] ++ [False,True] ++ [True]
629 | [False,True,True]
630 | #+END_SRC
631 | 
632 | More basic is the ~(:)~ operator, which adds an element to the
633 | front of a list. This is pronounced "cons" (short for
634 | "construct").
635 | 
636 | #+BEGIN_SRC screen
637 | ghci> 1 : [2,3]
638 | [1,2,3]
639 | ghci> 1 : []
640 | [1]
641 | #+END_SRC
642 | 
643 | You might be tempted to try writing ~[1,2] : 3~ to add an element
644 | to the end of a list, but ~ghci~ will reject this with an error
645 | message, because the first argument of ~(:)~ must be an element,
646 | and the second must be a list.
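    | 
    | One way to get the intended result, sketched here with the
    | operators we have already seen, is to wrap the new element in a
    | list of its own and concatenate.
    | 
    | #+BEGIN_SRC screen
    | ghci> [1,2] ++ [3]
    | [1,2,3]
    | #+END_SRC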
647 | 
648 | ** Strings and characters
649 | 
650 | If you know a language like Perl or C, you'll find Haskell's
651 | notations for strings familiar.
652 | 
653 | A text string is surrounded by double quotes.
654 | 
655 | #+BEGIN_SRC screen
656 | ghci> "This is a string."
657 | "This is a string."
658 | #+END_SRC
659 | 
660 | As in many languages, we can represent hard-to-see characters by
661 | "escaping" them. Haskell's escape characters and escaping rules
662 | follow the widely used conventions established by the C language.
663 | For example, ~'\n'~ denotes a newline character, and ~'\t'~ is a
664 | tab character. For complete details, see
665 | [[file:appendix-characters-strings-and-escaping-rules.org][Appendix B, /Characters, strings, and escaping rules/]].
666 | 
667 | #+BEGIN_SRC screen
668 | ghci> putStrLn "Here's a newline -->\n<-- See?"
669 | Here's a newline -->
670 | <-- See?
671 | #+END_SRC
672 | 
673 | Haskell makes a distinction between single characters and text
674 | strings. A single character is enclosed in single quotes.
675 | 
676 | #+BEGIN_SRC screen
677 | ghci> 'a'
678 | 'a'
679 | #+END_SRC
680 | 
681 | In fact, a text string is simply a list of individual characters.
682 | Here's a painful way to write a short string, which ~ghci~ gives
683 | back to us in a more familiar form.
684 | 
685 | #+BEGIN_SRC screen
686 | ghci> a = ['l', 'o', 't', 's', ' ', 'o', 'f', ' ', 'w', 'o', 'r', 'k']
687 | ghci> a
688 | "lots of work"
689 | ghci> a == "lots of work"
690 | True
691 | #+END_SRC
692 | 
693 | The empty string is written ~""~, and is a synonym for ~[]~.
694 | 
695 | #+BEGIN_SRC screen
696 | ghci> "" == []
697 | True
698 | #+END_SRC
699 | 
700 | Since a string is a list of characters, we can use the regular
701 | list operators to construct new strings.
702 | 
703 | #+BEGIN_SRC screen
704 | ghci> 'a':"bc"
705 | "abc"
706 | ghci> "foo" ++ "bar"
707 | "foobar"
708 | #+END_SRC
709 | 
710 | ** First steps with types
711 | 
712 | While we've talked a little about types already, our interactions
713 | with ~ghci~ have so far been free of much type-related thinking.
714 | We haven't told ~ghci~ what types we've been using, and it's
715 | mostly been willing to accept our input.
716 | 
717 | Haskell requires type names to start with an uppercase letter, and
718 | variable names must start with a lowercase letter. Bear this in
719 | mind as you read on; it makes it much easier to follow the names.
720 | 
721 | The first thing we can do to start exploring the world of types is
722 | to get ~ghci~ to tell us more about what it's doing. ~ghci~ has a
723 | command, ~:set~, that lets us change a few of its default
724 | behaviours. We can tell it to print more type information as
725 | follows.
726 | 
727 | #+BEGIN_SRC screen
728 | ghci> :set +t
729 | ghci> 'c'
730 | 'c'
731 | it :: Char
732 | ghci> "foo"
733 | "foo"
734 | it :: [Char]
735 | #+END_SRC
736 | 
737 | What the ~+t~ does is tell ~ghci~ to print the type of an
738 | expression after the expression. That cryptic ~it~ in the output
739 | can be very useful: it's actually the name of a special variable,
740 | in which ~ghci~ stores the result of the last expression we
741 | evaluated. (This isn't a Haskell language feature; it's specific
742 | to ~ghci~ alone.) Let's break down the meaning of the last line of
743 | ~ghci~ output.
744 | 
745 | - It's telling us about the special variable ~it~.
746 | - We can read text of the form ~x :: y~ as meaning "the
747 |   expression ~x~ has the type ~y~".
748 | - Here, the expression ~it~ has the type ~[Char]~. (The name
749 |   ~String~ is often used instead of ~[Char]~. It is simply a
750 |   synonym for ~[Char]~.)
751 | 
752 | #+BEGIN_TIP
753 | The joy of "it"
754 | 
755 | That ~it~ variable is a handy ~ghci~ shortcut. It lets us use the
756 | result of the expression we just evaluated in a new expression.
757 | 
758 | #+BEGIN_SRC screen
759 | ghci> "foo"
760 | "foo"
761 | it :: [Char]
762 | ghci> it ++ "bar"
763 | "foobar"
764 | it :: [Char]
765 | #+END_SRC
766 | 
767 | When evaluating an expression, ~ghci~ won't change the value of
768 | ~it~ if the evaluation fails. This lets you write potentially
769 | bogus expressions with something of a safety net.
770 | 
771 | #+BEGIN_SRC screen
772 | ghci> it
773 | "foobar"
774 | it :: [Char]
775 | ghci> it ++ 3
776 | 
777 | <interactive>:1:1: error:
778 |     • No instance for (Num [Char]) arising from the literal ‘3’
779 |     • In the second argument of ‘(++)’, namely ‘3’
780 |       In the expression: it ++ 3
781 |       In an equation for ‘it’: it = it ++ 3
782 | ghci> it
783 | "foobar"
784 | it :: [Char]
785 | ghci> it ++ "baz"
786 | "foobarbaz"
787 | it :: [Char]
788 | #+END_SRC
789 | 
790 | When we couple ~it~ with liberal use of the arrow keys to recall
791 | and edit the last expression we typed, we gain a decent way to
792 | experiment interactively: the cost of mistakes is very low. Take
793 | advantage of the opportunity to make cheap, plentiful mistakes
794 | when you're exploring the language!
795 | #+END_TIP
796 | 
797 | Here are a few more of Haskell's names for types, from expressions
798 | of the sort we've already seen.
799 | 
800 | #+BEGIN_SRC screen
801 | ghci> 7 ^ 80
802 | 40536215597144386832065866109016673800875222251012083746192454448001
803 | it :: Integer
804 | #+END_SRC
805 | 
806 | Haskell's integer type is named ~Integer~. The size of an
807 | ~Integer~ value is bounded only by your system's memory capacity.
808 | 
809 | Rational numbers don't look quite the same as integers. To
810 | construct a rational number, we use the ~(%)~ operator. The
811 | numerator is on the left, the denominator on the right.
812 | 
813 | #+BEGIN_SRC screen
814 | ghci> :m +Data.Ratio
815 | ghci> 11 % 29
816 | 11%29
817 | it :: Integral a => Ratio a
818 | #+END_SRC
819 | 
820 | For convenience, ~ghci~ lets us abbreviate many commands, so we
821 | can write ~:m~ instead of ~:module~ to load a module.
822 | 
823 | Notice that the right hand side of the ~::~ above has two parts: a
824 | constraint, ~Integral a~, and the type ~Ratio a~. We can read this
825 | as "a ratio of some integral type ~a~". We might guess that a
826 | ~Ratio~ must have values of an integral type as both numerator and
827 | denominator. Sure enough, if we try to construct a ~Ratio~ where
828 | the numerator or denominator is of a non-integral type, ~ghci~ complains.
829 | 
830 | #+BEGIN_SRC screen
831 | ghci> 3.14 % 8
832 | 
833 | <interactive>:1:1: error:
834 |     • Ambiguous type variable ‘a0’ arising from a use of ‘print’
835 |       prevents the constraint ‘(Show a0)’ from being solved.
836 |       Probable fix: use a type annotation to specify what ‘a0’ should be.
837 |       These potential instances exist:
838 |         instance Show a => Show (Ratio a) -- Defined in ‘GHC.Real’
839 |         instance Show Ordering -- Defined in ‘GHC.Show’
840 |         instance Show Integer -- Defined in ‘GHC.Show’
841 |         ...plus 23 others
842 |         ...plus 11 instances involving out-of-scope types
843 |         (use -fprint-potential-instances to see them all)
844 |     • In a stmt of an interactive GHCi command: print it
845 | ghci> 1.2 % 3.4
846 | 
847 | <interactive>:1:1: error:
848 |     • Ambiguous type variable ‘a0’ arising from a use of ‘print’
849 |       prevents the constraint ‘(Show a0)’ from being solved.
850 |       Probable fix: use a type annotation to specify what ‘a0’ should be.
851 |       These potential instances exist:
852 |         instance Show a => Show (Ratio a) -- Defined in ‘GHC.Real’
853 |         instance Show Ordering -- Defined in ‘GHC.Show’
854 |         instance Show Integer -- Defined in ‘GHC.Show’
855 |         ...plus 23 others
856 |         ...plus 11 instances involving out-of-scope types
857 |         (use -fprint-potential-instances to see them all)
858 |     • In a stmt of an interactive GHCi command: print it
859 | #+END_SRC
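
By contrast, a ratio whose numerator and denominator share an
integral type is accepted. The following is a minimal sketch; the
names ~oneThird~ and ~parts~ are ours, chosen only for illustration.

#+BEGIN_SRC haskell
import Data.Ratio (Ratio, (%), numerator, denominator)

-- Both operands of (%) must share one integral type; here it is Integer.
oneThird :: Ratio Integer
oneThird = 2 % 6          -- kept in lowest terms, i.e. 1 % 3

-- Take the ratio apart again.
parts :: (Integer, Integer)
parts = (numerator oneThird, denominator oneThird)
#+END_SRC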
860 | 
861 | Although it is initially useful to have ~:set +t~ giving us type
862 | information for every expression we enter, this is a facility we
863 | will quickly outgrow. After a while, we will often know what type
864 | we expect an expression to have. We can turn off the extra type
865 | information at any time, using the ~:unset~ command.
866 | 
867 | #+BEGIN_SRC screen
868 | ghci> :unset +t
869 | ghci> 2
870 | 2
871 | #+END_SRC
872 | 
873 | Even with this facility turned off, we can still get that type
874 | information easily when we need it, using another ~ghci~ command.
875 | 
876 | #+BEGIN_SRC screen
877 | ghci> :type 'a'
878 | 'a' :: Char
879 | ghci> "foo"
880 | "foo"
881 | ghci> :type it
882 | it :: [Char]
883 | #+END_SRC
884 | 
885 | The ~:type~ command will print type information for any expression
886 | we give it (including ~it~, as we see above). It won't actually
887 | evaluate the expression; it only checks its type and prints that.
888 | 
889 | Why are the types reported for these two expressions different?
890 | 
891 | #+BEGIN_SRC screen
892 | ghci> 3 + 2
893 | 5
894 | ghci> :type it
895 | it :: Integer
896 | ghci> :type 3 + 2
897 | 3 + 2 :: (Num t) => t
898 | #+END_SRC
899 | 
900 | Haskell has several numeric types. For example, a literal number
901 | such as ~1~ could, depending on the context in which it appears,
902 | be an integer or a floating point value. When we force ~ghci~ to
903 | evaluate the expression ~3 + 2~, it has to choose a type so that
904 | it can print the value, and it defaults to ~Integer~. In the
905 | second case, we ask ~ghci~ to print the type of the expression
906 | without actually evaluating it, so it does not have to be so
907 | specific. It answers, in effect, "its type is numeric". We will
908 | see more of this style of type annotation in
909 | [[file:6-using-typeclasses.org][Chapter 6, Using Type Classes]].
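
To see how context selects the type of a numeric literal, we can
attach explicit type signatures. This is a small sketch of our own
(the names are illustrative, not part of any library): the same
expression ~3 + 2~ is given three different types.

#+BEGIN_SRC haskell
-- The expression 3 + 2 is polymorphic; each signature picks a type for it.
asInteger :: Integer
asInteger = 3 + 2

asDouble :: Double
asDouble = 3 + 2

-- With no concrete type chosen, the expression stays polymorphic.
asAnyNum :: Num a => a
asAnyNum = 3 + 2
#+END_SRC

In ~ghci~, ~:type asAnyNum~ should report a ~Num~ constraint, much
as ~:type 3 + 2~ does above.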
910 | 
911 | ** A simple program
912 | 
913 | Let's take a small leap ahead, and write a small program that
914 | counts the number of lines in its input. Don't expect to
915 | understand this yet; it's just fun to get our hands dirty. In a
916 | text editor, enter the following code into a file, and save it as
917 | ~wc.hs~.
918 | 
919 | #+CAPTION: wc.hs
920 | #+BEGIN_SRC haskell
921 | -- lines beginning with "--" are comments.
922 | 
923 | main = interact wordCount
924 |     where wordCount input = show (length (lines input)) ++ "\n"
925 | #+END_SRC
926 | 
927 | Find or create a text file; let's call it ~quux.txt~[fn:1].
928 | 
929 | #+BEGIN_SRC screen
930 | $ cat quux.txt
931 | Teignmouth, England
932 | Paris, France
933 | Ulm, Germany
934 | Auxerre, France
935 | Brunswick, Germany
936 | Beaumont-en-Auge, France
937 | Ryazan, Russia
938 | #+END_SRC
939 | 
940 | From a shell or command prompt, run the following command.
941 | 
942 | #+BEGIN_SRC screen
943 | $ runghc wc < quux.txt
944 | 7
945 | #+END_SRC
946 | 
947 | We have successfully written a simple program that interacts with
948 | the real world! In the chapters that follow, we will successively
949 | fill the gaps in our understanding until we can write programs of
950 | our own.
951 | 
952 | ** Exercises
953 | 
954 | 1. Enter the following expressions into ~ghci~. What are their
955 |    types?
956 | 
957 |    - ~5 + 8~
958 |    - ~3 * 5 + 8~
959 |    - ~2 + 4~
960 |    - ~(+) 2 4~
961 |    - ~sqrt 16~
962 |    - ~succ 6~
963 |    - ~succ 7~
964 |    - ~pred 9~
965 |    - ~pred 8~
966 |    - ~sin (pi / 2)~
967 |    - ~truncate pi~
968 |    - ~round 3.5~
969 |    - ~round 3.4~
970 |    - ~floor 3.7~
971 |    - ~ceiling 3.3~
972 | 
973 | 2. From ~ghci~, type ~:?~ to print some help. Define a variable,
974 |    such as ~x = 1~, then type ~:show bindings~. What do you see?
975 | 3. The ~words~ function breaks a string into a list of words.
976 |    Modify the ~wc.hs~ example to count the number of words in a
977 |    file.
978 | 4. Modify the ~wc.hs~ example again, to print the number of
979 |    characters in a file.
980 | 
981 | ** Footnotes
982 | 
983 | [fn:1] Incidentally, what do these cities have in common?
984 | 


--------------------------------------------------------------------------------
/11-testing-and-quality-assurance.org:
--------------------------------------------------------------------------------
  1 | * Chapter 11: Testing and Quality Assurance
  2 | 
  3 | Building real systems means caring about quality control,
  4 | robustness and correctness. With the right quality assurance
  5 | mechanisms in place, well-written code can feel like a precision
  6 | machine, with all functions performing their tasks exactly as
  7 | specified. There is no sloppiness around the edges, and the final
  8 | result can be code that is self-explanatory, obviously correct --
  9 | the kind of code that inspires confidence.
 10 | 
 11 | In Haskell, we have several tools at our disposal for building
 12 | such precise systems. The most obvious tool, and one built into
 13 | the language itself, is the expressive type-system, which allows
 14 | for complicated invariants to be enforced statically–making it
 15 | impossible to write code violating chosen constraints. In
 16 | addition, purity and polymorphism encourage a style of code that
 17 | is modular, refactorable and testable. This is the kind of code
 18 | that just doesn't go wrong.
 19 | 
 20 | Testing plays a key role in keeping code on the
 21 | straight-and-narrow path. The main testing mechanisms in Haskell
 22 | are traditional unit testing (via the HUnit library), and its more
 23 | powerful descendant: type-based "property" testing, with
 24 | QuickCheck, an open source testing framework for Haskell.
 25 | Property-based testing encourages a high level approach to testing
 26 | in the form of abstract invariants functions should satisfy
 27 | universally, with the actual test data generated for the
 28 | programmer by the testing library. In this way code can be
 29 | hammered with thousands of tests that would be infeasible to write
 30 | by hand, often uncovering subtle corner cases that wouldn't be
 31 | found otherwise.
 32 | 
 33 | In this chapter we'll look at how to use QuickCheck to establish
 34 | invariants in code and then re-examine the pretty printer
 35 | developed in previous chapters, testing it with QuickCheck. We'll
 36 | also see how to guide the testing process with GHC's code coverage
 37 | tool: HPC.
 38 | 
 39 | ** QuickCheck: type-based testing
 40 | 
 41 | To get an overview of how property-based testing works, we'll
 42 | begin with a simple scenario: you've written a specialised sorting
 43 | function and want to test its behaviour.
 44 | 
 45 | First, we import the QuickCheck library, and any other modules we
 46 | need:
 47 | 
 48 | #+CAPTION: QCBasics.hs
 49 | #+BEGIN_SRC haskell
 50 | import Test.QuickCheck
 51 | import Data.List
 52 | #+END_SRC
 53 | 
 54 | And the function we want to test–a custom sort routine:
 55 | 
 56 | #+CAPTION: QCBasics.hs
 57 | #+BEGIN_SRC haskell
 58 | qsort :: Ord a => [a] -> [a]
 59 | qsort []     = []
 60 | qsort (x:xs) = qsort lhs ++ [x] ++ qsort rhs
 61 |     where lhs = filter  (< x) xs
 62 |           rhs = filter (>= x) xs
 63 | #+END_SRC
 64 | 
 65 | This is the classic Haskell sort implementation: a study in
 66 | functional programming elegance, if not efficiency (this isn't an
 67 | in-place sort). Now, we'd like to check that this function obeys
 68 | the basic rules a good sort should follow. One useful invariant to
 69 | start with, and one that comes up in a lot of purely functional
 70 | code, is /idempotency/–applying a function twice has the same
 71 | result as applying it only once. For our sort routine, a stable
 72 | sort algorithm, this should certainly be true, or things have gone
 73 | horribly wrong! This invariant can be encoded as a property
 74 | simply:
 75 | 
 76 | #+CAPTION: QCBasics.hs
 77 | #+BEGIN_SRC haskell
 78 | prop_idempotent xs = qsort (qsort xs) == qsort xs
 79 | #+END_SRC
 80 | 
 81 | We'll use the QuickCheck convention of prefixing test properties
 82 | with ~prop_~ to distinguish them from normal code. This
 83 | idempotency property is written simply as a Haskell function
 84 | stating an equality that must hold for any input data that is
 85 | sorted. We can check this makes sense for a few simple cases by
 86 | hand:
 87 | 
 88 | #+BEGIN_SRC screen
 89 | ghci> prop_idempotent []
 90 | True
 91 | ghci> prop_idempotent [1,1,1,1]
 92 | True
 93 | ghci> prop_idempotent [1..100]
 94 | True
 95 | ghci> prop_idempotent [1,5,2,1,2,0,9]
 96 | True
 97 | #+END_SRC
 98 | 
 99 | Looking good. However, writing out the input data by hand is
100 | tedious, and violates the moral code of the efficient functional
101 | programmer: let the machine do the work! To automate this, the
102 | QuickCheck library comes with a set of data generators for all the
103 | basic Haskell data types. QuickCheck uses the ~Arbitrary~
104 | type class to present a uniform interface to (pseudo-)random data
105 | generation, with the type system used to resolve which generator to
106 | use. QuickCheck normally hides the data generation plumbing;
107 | however, we can also run the generators by hand to get a sense of
108 | the distribution of data QuickCheck produces. For example, to
109 | generate a random list of boolean values:
110 | 
111 | #+BEGIN_SRC screen
112 | ghci> :m +Test.QuickCheck.Arbitrary
113 | ghci> :m +Test.QuickCheck.Gen
114 | ghci> :m +Test.QuickCheck.Random
115 | ghci> unGen arbitrary (mkQCGen 2) 10 :: [Bool]
116 | [False,True,True,True,False,True,False,False,True,False]
117 | #+END_SRC
118 | 
119 | QuickCheck generates test data like this and passes it to the
120 | property of our choosing, via the ~quickCheck~ function. The type
121 | of the property itself determines which data generator is used.
122 | ~quickCheck~ then checks that for all the test data produced, the
123 | property is satisfied. Now, since our idempotency test is
124 | polymorphic in the list element type, we need to pick a particular
125 | type to generate test data for, which we write as a type
126 | annotation on the property. To run the test, we just call
127 | ~quickCheck~ with our property function, set to the required data
128 | type (otherwise the list element type will default to the
129 | uninteresting ~()~ type):
130 | 
131 | #+BEGIN_SRC screen
132 | ghci> :type quickCheck
133 | quickCheck :: Testable prop => prop -> IO ()
134 | ghci> quickCheck (prop_idempotent :: [Integer] -> Bool)
135 | +++ OK, passed 100 tests.
136 | #+END_SRC
137 | 
138 | For the 100 different lists generated, our property held–great!
139 | When developing tests, it is often useful to see the actual data
140 | generated for each test. To do this, we would replace ~quickCheck~
141 | with its sibling, ~verboseCheck~, to see (verbose) output for each
142 | test. Now, let's look at more sophisticated properties that our
143 | function might satisfy.
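
Instead of annotating the property at every call site, we can give
it a monomorphic type signature in the source file. The sketch
below repeats the ~qsort~ definition from above so that it stands
alone; ~prop_idempotentInteger~ and ~checkIdempotent~ are names we
have made up for illustration.

#+BEGIN_SRC haskell
import Test.QuickCheck

qsort :: Ord a => [a] -> [a]
qsort []     = []
qsort (x:xs) = qsort lhs ++ [x] ++ qsort rhs
    where lhs = filter  (< x) xs
          rhs = filter (>= x) xs

-- Fixing the element type here tells QuickCheck which generator to use.
prop_idempotentInteger :: [Integer] -> Bool
prop_idempotentInteger xs = qsort (qsort xs) == qsort xs

-- Run the property with QuickCheck's default settings.
checkIdempotent :: IO ()
checkIdempotent = quickCheck prop_idempotentInteger
#+END_SRC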
144 | 
145 | *** Testing for properties
146 | 
147 | Good libraries consist of a set of orthogonal primitives having
148 | sensible relationships to each other. We can use QuickCheck to
149 | specify the relationships between functions in our code, helping
150 | us find a good library interface by developing functions that are
151 | interrelated via useful properties. QuickCheck in this way acts as
152 | an API "lint" tool–it provides machine support for ensuring our
153 | library API makes sense.
154 | 
155 | The list sorting function should certainly have a number of
156 | interesting properties that tie it to other list operations. For
157 | example: the first element in a sorted list should always be the
158 | smallest element of the input list. We might be tempted to specify
159 | this intuition in Haskell, using the ~Data.List~ library's ~minimum~
160 | function:
161 | 
162 | #+CAPTION: QCBasics.hs
163 | #+BEGIN_SRC haskell
164 | prop_minimum xs = head (qsort xs) == minimum xs
165 | #+END_SRC
166 | 
167 | Testing this, though, reveals an error:
168 | 
169 | #+BEGIN_SRC screen
170 | ghci> quickCheck (prop_minimum :: [Integer] -> Bool)
171 | *** Failed! Exception: 'Prelude.head: empty list' (after 1 test): []
172 | #+END_SRC
173 | 
174 | The property failed when sorting an empty list–for which ~head~
175 | and ~minimum~ aren't defined, as we can see from their definitions:
176 | 
177 | #+CAPTION: minimum.hs
178 | #+BEGIN_SRC haskell
179 | head       :: [a] -> a
180 | head (x:_) = x
181 | head []    = error "Prelude.head: empty list"
182 | 
183 | minimum    :: (Ord a) => [a] -> a
184 | minimum [] =  error "Prelude.minimum: empty list"
185 | minimum xs =  foldl1 min xs
186 | #+END_SRC
187 | 
188 | So this property will only hold for non-empty lists. QuickCheck,
189 | thankfully, comes with a full embedded language for writing properties,
190 | so we can specify our invariants more precisely, filtering out
191 | values we don't want to consider. For the empty list case, we
192 | really want to say: /if/ the list is non-empty, /then/ the first
193 | element of the sorted result is the minimum. This is done by using
194 | the ~(==>)~ implication function, which filters out invalid data
195 | before running the property:
196 | 
197 | #+CAPTION: QCBasics.hs
198 | #+BEGIN_SRC haskell
199 | prop_minimum' xs = not (null xs) ==> head (qsort xs) == minimum xs
200 | #+END_SRC
201 | 
202 | The result is quite clean. By separating out the empty list case,
203 | we can now confirm the property does in fact hold:
204 | 
205 | #+BEGIN_SRC screen
206 | ghci> quickCheck (prop_minimum' :: [Integer] -> Property)
207 | +++ OK, passed 100 tests.
208 | #+END_SRC
209 | 
210 | Note that we had to change the type of the property from being a
211 | simple ~Bool~ result to the more general ~Property~ type (the
212 | property itself is now a function that filters out empty lists
213 | before testing the rest, rather than a simple boolean constant).
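
One cost of ~(==>)~ is that discarded inputs still have to be
generated. As an alternative sketch, and not the approach this
chapter takes, QuickCheck's ~NonEmptyList~ modifier generates only
non-empty lists, so the property can return a plain ~Bool~ again.
The name ~prop_minimumNonEmpty~ is ours, and ~qsort~ is repeated
only to keep the example self-contained.

#+BEGIN_SRC haskell
import Test.QuickCheck

qsort :: Ord a => [a] -> [a]
qsort []     = []
qsort (x:xs) = qsort lhs ++ [x] ++ qsort rhs
    where lhs = filter  (< x) xs
          rhs = filter (>= x) xs

-- NonEmptyList wraps a list that has at least one element;
-- pattern matching on the NonEmpty constructor unwraps it.
prop_minimumNonEmpty :: NonEmptyList Integer -> Bool
prop_minimumNonEmpty (NonEmpty xs) = head (qsort xs) == minimum xs
#+END_SRC

Because the argument type is fixed by the signature, this can be run
as ~quickCheck prop_minimumNonEmpty~ without a further annotation.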
214 | 
215 | We can now complete the basic property set for the sort function
216 | with some other invariants that it should satisfy: that the output
217 | is ordered (each element should be smaller than, or equal to, its
218 | successor); that the output is a permutation of the input (which
219 | we achieve via the list difference function, ~(\\)~); that the
220 | last sorted element should be the largest element; and if we find
221 | the smallest element of two different lists, that should be the
222 | first element if we append and sort those lists. These properties
223 | can be stated as:
224 | 
225 | #+CAPTION: QCBasics.hs
226 | #+BEGIN_SRC haskell
227 | prop_ordered xs = ordered (qsort xs)
228 |     where ordered []       = True
229 |           ordered [x]      = True
230 |           ordered (x:y:xs) = x <= y && ordered (y:xs)
231 | 
232 | prop_permutation xs = permutation xs (qsort xs)
233 |     where permutation xs ys = null (xs \\ ys) && null (ys \\ xs)
234 | 
235 | prop_maximum xs         =
236 |     not (null xs) ==>
237 |         last (qsort xs) == maximum xs
238 | 
239 | prop_append xs ys       =
240 |     not (null xs) ==>
241 |     not (null ys) ==>
242 |         head (qsort (xs ++ ys)) == min (minimum xs) (minimum ys)
243 | #+END_SRC
244 | 
245 | *** Testing against a model
246 | 
247 | Another technique for gaining confidence in some code is to test
248 | it against a model implementation. We can tie our implementation
249 | of list sort to the reference sort function in the standard list
250 | library, and, if they behave the same, we gain confidence that our
251 | sort does the right thing.
252 | 
253 | #+CAPTION: QCBasics.hs
254 | #+BEGIN_SRC haskell
255 | prop_sort_model xs = sort xs == qsort xs
256 | #+END_SRC
257 | 
258 | This kind of model-based testing is extremely powerful. Often
259 | developers will have a reference implementation or prototype that,
260 | while inefficient, is correct. This can then be kept around and
261 | used to ensure optimised production code conforms to the
262 | reference. By building a large suite of these model-based tests,
263 | and running them regularly (on every commit, for example), we can
264 | cheaply ensure the precision of our code. Large Haskell projects
265 | often come bundled with property suites comparable in size to the
266 | project itself, with thousands of invariants tested on every
267 | change, keeping the code tied to the specification, and ensuring
268 | it behaves as required.
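
A sketch of the same model test using QuickCheck's ~(===)~
operator, which behaves like ~(==)~ but also prints both sides when
they differ. This is a variation of our own rather than the
chapter's definition; ~prop_sortModelShown~ is a hypothetical name,
and ~qsort~ is repeated so that the example stands alone.

#+BEGIN_SRC haskell
import Data.List (sort)
import Test.QuickCheck

qsort :: Ord a => [a] -> [a]
qsort []     = []
qsort (x:xs) = qsort lhs ++ [x] ++ qsort rhs
    where lhs = filter  (< x) xs
          rhs = filter (>= x) xs

-- Compare our sort against the library's reference implementation.
-- On failure, (===) reports the two values that disagreed.
prop_sortModelShown :: [Integer] -> Property
prop_sortModelShown xs = qsort xs === sort xs
#+END_SRC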
269 | 
270 | ** Testing case study: specifying a pretty printer
271 | 
272 | Testing individual functions for their natural properties is one
273 | of the basic building blocks that guides development of large
274 | systems in Haskell. We'll look now at a more complicated scenario:
275 | taking the pretty printing library developed in earlier chapters,
276 | and building a test suite for it.
277 | 
278 | *** Generating test data
279 | 
280 | Recall that the pretty printer is built around the ~Doc~, an
281 | algebraic data type that represents well-formed documents:
282 | 
283 | #+BEGIN_SRC haskell
284 | data Doc = Empty
285 |          | Char Char
286 |          | Text String
287 |          | Line
288 |          | Concat Doc Doc
289 |          | Union Doc Doc
290 |          deriving (Show,Eq)
291 | #+END_SRC
292 | 
293 | The library itself is implemented as a set of functions that build
294 | and transform values of this document type, before finally
295 | rendering the finished document to a string.
296 | 
297 | QuickCheck encourages an approach to testing where the developer
298 | specifies invariants that should hold for any data we can throw at
299 | the code. To test the pretty printing library, then, we'll need a
300 | source of input data. To do this, we take advantage of the small
301 | combinator suite for building random data that QuickCheck provides
302 | via the ~Arbitrary~ class. The class provides a function,
303 | ~arbitrary~, to generate data of each type, and with this we can
304 | define our data generator for our custom data types.[fn:1]
305 | 
306 | #+BEGIN_SRC haskell
307 | class Arbitrary a where
308 |   arbitrary :: Gen a
309 | #+END_SRC
310 | 
311 | One thing to notice is that the generators run in a ~Gen~
312 | environment, indicated by the type. This is a simple state-passing
313 | monad that is used to hide the random number generator state that
314 | is threaded through the code. We'll look thoroughly at monads in
315 | later chapters, but for now it suffices to know that, as ~Gen~ is
316 | defined as a monad, we can use ~do~ syntax to write new generators
317 | that access the implicit random number source. To actually write
318 | generators for our custom type we use any of a set of functions
319 | defined in the library for introducing new random values and
320 | gluing them together to build up data structures of the type we're
321 | interested in. The types of the key functions are:
322 | 
323 | #+BEGIN_SRC haskell
324 | elements :: [a] -> Gen a
325 | choose :: Random a => (a, a) -> Gen a
326 | oneof :: [Gen a] -> Gen a
327 | #+END_SRC
328 | 
329 | The function ~elements~, for example, takes a list of values, and
330 | returns a generator of random values from that list. ~choose~ and
331 | ~oneof~ we'll use later. With this, we can start writing
332 | generators for simple data types. For example, if we define a new
333 | data type for ternary logic:
334 | 
335 | #+CAPTION: Arbitrary.hs
336 | #+BEGIN_SRC haskell
337 | import Test.QuickCheck
338 | 
339 | data Ternary
340 |     = Yes
341 |     | No
342 |     | Unknown
343 |     deriving (Eq,Show)
344 | #+END_SRC
345 | 
346 | we can write an ~Arbitrary~ instance for the ~Ternary~ type by
347 | defining a function that picks elements from a list of the
348 | possible values of ~Ternary~ type:
349 | 
350 | #+CAPTION: Arbitrary.hs
351 | #+BEGIN_SRC haskell
352 | instance Arbitrary Ternary where
353 |   arbitrary = elements [Yes, No, Unknown]
354 | #+END_SRC
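
For an enumeration like this, QuickCheck also offers
~arbitraryBoundedEnum~, which picks among all constructors of a
type that derives ~Enum~ and ~Bounded~. The sketch below is an
alternative of our own, not the definition used in this chapter;
note the two extra derived classes.

#+BEGIN_SRC haskell
import Test.QuickCheck

data Ternary
    = Yes
    | No
    | Unknown
    deriving (Eq, Show, Enum, Bounded)

-- arbitraryBoundedEnum chooses a value between minBound and maxBound,
-- so a constructor added later is covered without editing the instance.
instance Arbitrary Ternary where
  arbitrary = arbitraryBoundedEnum
#+END_SRC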
355 | 
356 | Another approach to data generation is to generate values for one
357 | of the basic Haskell types and then translate those values into
358 | the type you're actually interested in. We could have written the
359 | ~Ternary~ instance by generating integer values from 0 to 2
360 | instead, using ~choose~, and then mapping them onto the ternary
361 | values:
362 | 
363 | #+CAPTION: Arbitrary.hs
364 | #+BEGIN_SRC haskell
365 | instance Arbitrary Ternary where
366 |   arbitrary = do
367 |       n <- choose (0, 2) :: Gen Int
368 |       return $ case n of
369 |                     0 -> Yes
370 |                     1 -> No
371 |                     _ -> Unknown
372 | #+END_SRC
373 | 
374 | For simple /sum/ types, this approach works nicely, as the
375 | integers map nicely onto the constructors of the data type. For
376 | /product/ types (such as structures and tuples), we need to
377 | instead generate each component of the product separately (and
378 | recursively for nested types), and then combine the components.
379 | For example, to generate random pairs of random values:
380 | 
381 | #+CAPTION: Arbitrary.hs
382 | #+BEGIN_SRC haskell
383 | instance (Arbitrary a, Arbitrary b) => Arbitrary (a, b) where
384 |   arbitrary = do
385 |       x <- arbitrary
386 |       y <- arbitrary
387 |       return (x, y)
388 | #+END_SRC
389 | 
390 | So let's now write a generator for all the different variants of
391 | the ~Doc~ type. We'll start by breaking the problem down, first
392 | generating random constructors for each type, then, depending on
393 | the result, the components of each field. We choose a random
394 | integer to represent which document variant to generate, and then
395 | dispatch based on the result. To generate concat or union document
396 | nodes, we just recurse on ~arbitrary~, letting type inference
397 | determine which instance of ~Arbitrary~ we mean:
398 | 
399 | #+CAPTION: QC.hs
400 | #+BEGIN_SRC haskell
401 | module QC where
402 | 
403 | import Prettify
404 | 
405 | import Data.List
406 | import Test.QuickCheck
407 | 
408 | instance Arbitrary Doc where
409 |     arbitrary = do
410 |         n <- choose (1,6) :: Gen Int
411 |         case n of
412 |              1 -> return Empty
413 | 
414 |              2 -> do x <- arbitrary
415 |                      return (Char x)
416 | 
417 |              3 -> do x <- arbitrary
418 |                      return (Text x)
419 | 
420 |              4 -> return Line
421 | 
422 |              5 -> do x <- arbitrary
423 |                      y <- arbitrary
424 |                      return (Concat x y)
425 | 
426 |              6 -> do x <- arbitrary
427 |                      y <- arbitrary
428 |                      return (Union x y)
429 | #+END_SRC
430 | 
431 | That was fairly straightforward, and we can clean it up some more
432 | by using the ~oneof~ function, whose type we saw earlier, to pick
433 | between different generators in a list (we can also use the
434 | monadic combinator ~liftM~ to avoid naming intermediate results
435 | from each generator):
436 | 
437 | #+CAPTION: QC.hs
438 | #+BEGIN_SRC haskell
439 | -- import Control.Monad
440 | instance Arbitrary Doc where
441 |     arbitrary =
442 |         oneof [ return Empty
443 |               , liftM  Char   arbitrary
444 |               , liftM  Text   arbitrary
445 |               , return Line
446 |               , liftM2 Concat arbitrary arbitrary
447 |               , liftM2 Union  arbitrary arbitrary ]
448 | #+END_SRC
449 | 
450 | The latter is more concise, just picking between a list of
451 | generators, but they describe the same data either way. We can
452 | check that the output makes sense, by generating a list of random
453 | documents (seeding the pseudo-random generator with an initial
454 | seed of 2):
455 | 
456 | #+BEGIN_SRC screen
457 | ghci> unGen arbitrary (mkQCGen 2) 10 :: [Doc]
458 | [Empty,Union (Char 't') Line,Line,Union Line Empty,Concat (Char '\9930')
459 | (Text "\DEL"),Line,Text "\263060\ACKJ@e",Empty,Char '\367759',Concat Line
460 | (Text ")\385036N\332758D(")]
461 | #+END_SRC
462 | 
463 | Looking at the output we see a good mix of simple, base cases, and
464 | some more complicated nested documents. We'll be generating
465 | hundreds of these each test run, so that should do a pretty good
466 | job. We can now write some generic properties for our document
467 | functions.
468 | 
469 | *** Testing document construction
470 | 
471 | Two of the basic functions on documents are the null document
472 | constant (a nullary function), ~empty~, and the append function.
473 | 
474 | #+BEGIN_SRC haskell
475 | empty :: Doc
476 | 
477 | (<>) :: Doc -> Doc -> Doc
478 | #+END_SRC
479 | 
480 | Together, these should have a nice property: appending or
481 | prepending the empty document onto a second document should leave
482 | the second document unchanged. We can state this invariant as a property:
483 | 
484 | #+CAPTION: QC.hs
485 | #+BEGIN_SRC haskell
486 | prop_empty_id x = empty <> x == x && x <> empty == x
487 | #+END_SRC
488 | 
489 | Confirming that this is indeed true, we're now underway with our
490 | testing:
491 | 
492 | #+BEGIN_SRC screen
493 | ghci> quickCheck prop_empty_id
494 | +++ OK, passed 100 tests.
495 | #+END_SRC
496 | 
497 | We can look at what actual test documents were generated by replacing
498 | ~quickCheck~ with ~verboseCheck~. A good mixture of both simple
499 | and complicated cases is being generated. We can refine the data
500 | generation further, with constraints on the proportion of
501 | generated data, if desirable.
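
As a sketch of what such refinement could look like, QuickCheck's
~frequency~ weights the alternatives and ~sized~ lets us bound the
recursion depth. The weights (3 to 1 to 1), the halving of the
size, and the names ~genDoc~ and ~depth~ are our own assumptions,
and ~Doc~ is repeated here only so that the example stands alone.

#+BEGIN_SRC haskell
import Control.Monad (liftM, liftM2)
import Test.QuickCheck

data Doc = Empty
         | Char Char
         | Text String
         | Line
         | Concat Doc Doc
         | Union Doc Doc
         deriving (Show,Eq)

-- Generate documents, favouring leaves and bounding the nesting depth.
genDoc :: Gen Doc
genDoc = sized go
  where
    leaves = [ return Empty
             , liftM Char arbitrary
             , liftM Text arbitrary
             , return Line ]
    go n
      | n <= 0    = oneof leaves
      | otherwise = frequency
          [ (3, oneof leaves)              -- a leaf three times in five
          , (1, liftM2 Concat sub sub)
          , (1, liftM2 Union  sub sub) ]
      where sub = go (n `div` 2)           -- children get half the size

-- Nesting depth of a document; a leaf has depth 1.
depth :: Doc -> Int
depth (Concat a b) = 1 + max (depth a) (depth b)
depth (Union a b)  = 1 + max (depth a) (depth b)
depth _            = 1
#+END_SRC

In ~ghci~, ~sample genDoc~ prints a handful of generated documents,
which is a quick way to judge whether the mix looks reasonable.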
502 | 
503 | Other functions in the API are also simple enough to have their
504 | behaviour fully described via properties. By doing so we can
505 | maintain an external, checkable description of the function's
506 | behaviour, so later changes won't break these basic invariants.
507 | 
508 | #+CAPTION: QC.hs
509 | #+BEGIN_SRC haskell
510 | prop_char c = char c == Char c
511 | 
512 | prop_text s = text s == if null s then Empty else Text s
513 | 
514 | prop_line = line == Line
515 | 
516 | prop_double d = double d == text (show d)
517 | #+END_SRC
518 | 
519 | Note that ~prop_line~ takes no arguments, so QuickCheck has no input to vary and runs it only once.
520 | 
521 | These properties are enough to fully test the structure returned
522 | by the basic document operators. To test the rest of the library
523 | will require more work.
524 | 
525 | *** Using lists as a model
526 | 
527 | Higher order functions are the basic glue of reusable programming,
528 | and our pretty printer library is no exception–a custom fold
529 | function is used internally to implement both document
530 | concatenation and interleaving separators between document chunks.
531 | The ~fold~ defined for documents takes a list of document pieces,
532 | and glues them all together with a supplied combining function:
533 | 
534 | #+BEGIN_SRC haskell
535 | fold :: (Doc -> Doc -> Doc) -> [Doc] -> Doc
536 | fold f = foldr f empty
537 | #+END_SRC
538 | 
539 | We can write tests in isolation for specific instances of fold
540 | easily. Horizontal concatenation of documents, for example, is
541 | easy to specify by writing a reference implementation on lists:
542 | 
543 | #+CAPTION: QC.hs
544 | #+BEGIN_SRC haskell
545 | prop_hcat xs = hcat xs == glue xs
546 |     where
547 |         glue []     = empty
548 |         glue (d:ds) = d <> glue ds
549 | #+END_SRC
550 | 
551 | It is a similar story for ~punctuate~, where we can model
552 | inserting punctuation with list interspersion (from ~Data.List~,
553 | ~intersperse~ is a function that takes an element and interleaves
554 | it between other elements of a list):
555 | 
556 | #+CAPTION: QC.hs
557 | #+BEGIN_SRC haskell
558 | prop_punctuate s xs = punctuate s xs == intersperse s xs
559 | #+END_SRC
560 | 
561 | While this looks fine, running it reveals a flaw in our reasoning:
562 | 
563 | #+BEGIN_SRC screen
564 | ghci> quickCheck prop_punctuate
565 | *** Failed! Falsifiable (after 4 tests):
566 | Empty
567 | [Char '\DC3',Empty]
568 | #+END_SRC
569 | 
570 | The pretty printing library optimises away redundant empty
571 | documents, something the model implementation doesn't, so we'll
572 | need to augment our model to match reality. First, we can
573 | intersperse the punctuation text throughout the document list,
574 | then use a little loop to clean up the ~Empty~ documents scattered
575 | through, like so:
576 | 
577 | #+CAPTION: QC.hs
578 | #+BEGIN_SRC haskell
579 | prop_punctuate' s xs = punctuate s xs == combine (intersperse s xs)
580 |     where
581 |         combine []           = []
582 |         combine [x]          = [x]
583 |         combine (x:Empty:ys) = x : combine ys
584 |         combine (Empty:y:ys) = y : combine ys
585 |         combine (x:y:ys)     = x `Concat` y : combine ys
586 | #+END_SRC
587 | 
588 | Running this in GHCi, we can confirm the result. It is reassuring
589 | to have the test framework spot the flaws in our reasoning about
590 | the code–exactly what we're looking for:
591 | 
592 | #+BEGIN_SRC screen
593 | ghci> quickCheck prop_punctuate'
594 | +++ OK, passed 100 tests.
595 | #+END_SRC
596 | 
597 | *** Putting it all together
598 | 
599 | We can put all these tests together in a single file, and run them
600 | simply by using one of QuickCheck's driver functions. Several
601 | exist, including elaborate parallel ones. The basic batch driver
602 | is often good enough, however. All we need do is set up some
603 | default test parameters, and then list the functions we want to
604 | test:
605 | 
606 | #+CAPTION: Run.hs
607 | #+BEGIN_SRC haskell
608 | import QC
609 | import Test.QuickCheck
610 | import Control.Monad (forM_)
611 | 
612 | options = stdArgs { maxSuccess = 200, maxSize = 200}
613 | 
614 | type Run = Args -> IO ()
615 | 
616 | run :: Testable prop => prop -> Run
617 | run = flip quickCheckWith
618 | 
619 | runTests :: String -> Args -> [Run] -> IO ()
620 | runTests name opts tests =
621 |     putStrLn ("Running " ++ name ++ " tests:") >>
622 |         forM_ tests (\ rn -> rn opts)
623 | 
624 | main = do
625 |     runTests "simple" options
626 |         [ run prop_empty_id
627 |         , run prop_char
628 |         , run prop_text
629 |         , run prop_line
630 |         , run prop_double
631 |         ]
632 | 
633 |     runTests "complex" options
634 |         [ run prop_hcat
635 |         , run prop_punctuate'
636 |         ]
637 | #+END_SRC
638 | 
639 | We've structured the code here as a separate, standalone test
640 | script, with instances and properties in their own file, separate
641 | from the library source. This is typical for library projects, where
642 | the tests are kept apart from the library itself, and import the
643 | library via the module system. The test script can then be
644 | loaded and executed:
645 | 
646 | #+BEGIN_SRC screen
647 | ghci> :l Run.hs
648 | [1 of 4] Compiling SimpleJSON       ( SimpleJSON.hs, interpreted )
649 | [2 of 4] Compiling Prettify         ( Prettify.hs, interpreted )
650 | [3 of 4] Compiling QC               ( QC.hs, interpreted )
651 | [4 of 4] Compiling Main             ( Run.hs, interpreted )
652 | Ok, four modules loaded.
653 | *Main> main
654 | Running simple tests:
655 | +++ OK, passed 200 tests.
656 | +++ OK, passed 200 tests.
657 | +++ OK, passed 200 tests.
658 | +++ OK, passed 1 tests.
659 | +++ OK, passed 200 tests.
660 | Running complex tests:
661 | +++ OK, passed 200 tests.
662 | +++ OK, passed 200 tests.
663 | #+END_SRC
664 | 
665 | A total of 1201 individual tests were created, which is
666 | comforting. We can increase the depth easily enough, but to find
667 | out exactly how well the code is being tested we should turn to
668 | the built-in code coverage tool, HPC, which can report precisely
669 | which parts of the code were exercised.
670 | 
671 | ** Measuring test coverage with HPC
672 | 
673 | HPC (Haskell Program Coverage) is an extension to the compiler that
674 | records which parts of the code were actually executed during a
675 | given program run. This is useful in the context of testing, as it
676 | lets us observe precisely which functions, branches and
677 | expressions were evaluated. The result is precise, easily obtained
678 | knowledge about the percentage of code tested. HPC comes with
679 | a simple utility to generate useful graphs of program coverage,
680 | making it easy to zoom in on weak spots in the test suite.
681 | 
682 | To obtain test coverage data, all we need to do is add the ~-fhpc~
683 | flag to the command line, when compiling the tests:
684 | 
685 | #+BEGIN_SRC screen
686 | $ ghc -fhpc Run.hs --make
687 | #+END_SRC
688 | 
689 | Then run the tests as normal:
690 | 
691 | #+BEGIN_SRC screen
692 | $ ./Run
693 | Running simple tests:
694 | +++ OK, passed 200 tests.
695 | +++ OK, passed 200 tests.
696 | +++ OK, passed 200 tests.
697 | +++ OK, passed 1 tests.
698 | +++ OK, passed 200 tests.
699 | Running complex tests:
700 | +++ OK, passed 200 tests.
701 | +++ OK, passed 200 tests.
702 | #+END_SRC
703 | 
704 | During the test run, the trace of the program is written to a .tix
705 | file, which complements the .mix files the compiler generated for
706 | each module. Afterwards, the command line tool, ~hpc~, uses these
707 | files to display various statistics about what happened. The basic
708 | interface is textual. To begin, we can get a summary of the code
709 | tested during the run using the ~report~ command of ~hpc~. We'll
710 | exclude the test programs themselves (using the ~--exclude~ flag),
711 | so as to concentrate only on code in the pretty printer library.
712 | Entering the following into the console:
713 | 
714 | #+BEGIN_SRC screen
715 | $ hpc report Run --exclude=Main --exclude=QC
716 |  17% expressions used (30/176)
717 |   0% boolean coverage (0/3)
718 |        0% guards (0/3), 3 unevaluated
719 |      100% 'if' conditions (0/0)
720 |      100% qualifiers (0/0)
721 |  17% alternatives used (8/46)
722 |   0% local declarations used (0/4)
723 |  30% top-level declarations used (10/33)
724 | #+END_SRC
725 | 
726 | we see that, on the last line, 30% of top-level definitions were
727 | evaluated during the test run. Not too bad for a first attempt. As
728 | we test more and more functions from the library, this figure will
729 | rise. The textual version is useful for a quick summary, but to
730 | really see what's going on it is best to look at the marked up
731 | output. To generate this, use the ~markup~ command instead:
732 | 
733 | #+BEGIN_SRC screen
734 | $ hpc markup Run --exclude=Main --exclude=QC
735 | #+END_SRC
736 | 
737 | This will generate one HTML file for each Haskell source file, and
738 | some index files. Loading the file ~hpc_index.html~ into a
739 | browser, we can see some pretty graphs of the code coverage:
740 | 
741 | [[file:figs/ch11-hpc-round1.png]]
742 | 
743 | Not too bad. Clicking through to the ~Prettify~ module itself, we
744 | see the actual source of the program, with code that wasn't tested
745 | marked up in bold yellow, and code that was executed simply in bold.
746 | 
747 | It is important to remove the old .tix file after you make
748 | modifications, or an error will occur as HPC tries to combine the
749 | statistics from separate runs:
750 | 
751 | #+BEGIN_SRC screen
752 | $ ghc -fhpc Run.hs --make -no-recomp
753 | $ ./Run
754 | in module 'Prettify'
755 | Hpc failure: module mismatch with .tix/.mix file hash number
756 | (perhaps remove Run.tix file?)
757 | $ rm *.tix
758 | $ ./Run
759 | Running simple tests:
760 | +++ OK, passed 200 tests.
761 | +++ OK, passed 200 tests.
762 | +++ OK, passed 200 tests.
763 | +++ OK, passed 1 tests.
764 | +++ OK, passed 200 tests.
765 | Running complex tests:
766 | +++ OK, passed 200 tests.
767 | +++ OK, passed 200 tests.
768 | #+END_SRC
769 | 
770 | Another two hundred tests were added to the suite, and our
771 | coverage statistic improves to 52 percent of the code base:
772 | 
773 | [[file:figs/ch11-hpc-round2.png]]
774 | 
775 | HPC ensures that we're honest in our testing, as anything less
776 | than 100% coverage will be pointed out in glaring color. In
777 | particular, it ensures the programmer has to think about error
778 | cases, and complicated branches with obscure conditions, all forms
779 | of code smell. When combined with a saturating test generation
780 | system, like QuickCheck's, testing becomes a rewarding activity,
781 | and a core part of Haskell development.
782 | 
783 | ** Footnotes
784 | 
785 | [fn:1] The class also defines a method, ~coarbitrary~, which, given
786 | a value of some type, yields a function for new generators. We can
787 | disregard it for now, as it is only needed for generating random
788 | values of function type. One result of disregarding ~coarbitrary~
789 | is that GHC will warn about it not being defined; however, it is
790 | safe to ignore these warnings.
791 | 


--------------------------------------------------------------------------------
/21-using-databases.org:
--------------------------------------------------------------------------------
  1 | * Chapter 21. Using Databases
  2 | 
  3 | Everything from web forums to podcatchers or even backup programs
  4 | frequently uses databases for persistent storage. SQL-based
  5 | databases are often quite convenient: they are fast, can scale
  6 | from tiny to massive sizes, can operate over the network, often
  7 | help handle locking and transactions, and can even provide
  8 | failover and redundancy improvements for applications. Databases
  9 | come in many different shapes: the large commercial databases such
 10 | as Oracle, Open Source engines such as PostgreSQL or MySQL, and
 11 | even embeddable engines such as Sqlite.
 12 | 
 13 | Because databases are so important, Haskell support for them is
 14 | important as well. In this chapter, we will introduce you to one
 15 | of the Haskell frameworks for working with databases. We will also
 16 | use this framework to begin building a podcast downloader, which
 17 | we will further develop in [[file:22-web-client-programming.org][Chapter 22, /Extended Example: Web Client Programming/]].
 18 | 
 19 | ** Overview of HDBC
 20 | 
 21 | At the bottom of the database stack is the database engine. The
 22 | database engine is responsible for actually storing data on disk.
 23 | Well-known database engines include PostgreSQL, MySQL, and Oracle.
 24 | 
 25 | Most modern database engines support SQL, the Structured Query
 26 | Language, as a standard way of getting data into and out of
 27 | relational databases. This book will not provide a tutorial on SQL
 28 | or relational database management.[fn:1]
 29 | 
 30 | Once you have a database engine that supports SQL, you need a way
 31 | to communicate with it. Each database has its own protocol. Since
 32 | SQL is reasonably constant across databases, it is possible to
 33 | make a generic interface that uses drivers for each individual
 34 | protocol.
 35 | 
 36 | Haskell has several different database frameworks available, some
 37 | providing high-level layers atop others. For this chapter, we will
 38 | concentrate on HDBC, the Haskell DataBase Connectivity system.
 39 | HDBC is a database abstraction library. That is, you can write
 40 | code that uses HDBC and can access data stored in almost any SQL
 41 | database with little or no modification.[fn:2] Even if you never
 42 | need to switch underlying database engines, the HDBC system of
 43 | drivers makes a large number of choices available to you with a
 44 | single interface.
 45 | 
 46 | Another database abstraction library for Haskell is HSQL, which
 47 | shares a similar purpose with HDBC. There is also a higher-level
 48 | framework called HaskellDB, which sits atop either HDBC or HSQL,
 49 | and is designed to help insulate the programmer from the details
 50 | of working with SQL. However, it does not have as broad appeal
 51 | because its design limits it to certain, albeit quite common,
 52 | database access patterns. Finally, Takusen is a framework that
 53 | uses a "left fold" approach to reading data from the database.
 54 | 
 55 | ** Installing HDBC and Drivers
 56 | 
 57 | To connect to a given database with HDBC, you need at least two
 58 | packages: the generic interface, and a driver for your specific
 59 | database. You can obtain the generic HDBC package, and all of the
 60 | other drivers, from [[http://hackage.haskell.org/][Hackage]][fn:3]. For this chapter, we will use
 61 | HDBC version 1.1.3 for examples.
 62 | 
 63 | You'll also need a database backend and backend driver. For this
 64 | chapter, we'll use Sqlite version 3. Sqlite is an embedded
 65 | database, so it doesn't require a separate server and is easy to
 66 | set up. Many operating systems already ship with Sqlite version 3.
 67 | If yours doesn't, you can download it from
 68 | [[http://www.sqlite.org/]]. The HDBC homepage has a link to known
 69 | HDBC backend drivers. The specific driver for Sqlite version 3 can
 70 | be obtained from Hackage.
 71 | 
 72 | If you want to use HDBC with other databases, check out the HDBC
 73 | Known Drivers page at
 74 | [[http://software.complete.org/hdbc/wiki/KnownDrivers]]. There you
 75 | will find a link to the ODBC binding, which lets you connect to
 76 | virtually any database on virtually any platform (Windows, POSIX,
 77 | and others). You will also find a PostgreSQL binding. MySQL is
 78 | supported via the ODBC binding, and specific information for MySQL
 79 | users can be found in the [[http://software.complete.org/static/hdbc-odbc/doc/HDBC-odbc/][HDBC-ODBC API documentation]].
 80 | 
 81 | ** Connecting to Databases
 82 | 
 83 | To connect to a database, you will use a connection function from
 84 | a database backend driver. Each database has its own unique method
 85 | of connecting. The initial connection is generally the only time
 86 | you will call anything from a backend driver module directly.
 87 | 
 88 | The database connection function will return a database handle.
 89 | The precise type of this handle may vary from one driver to the
 90 | next, but it will always be an instance of the ~IConnection~
 91 | type class. All of the functions you will use to operate on
 92 | databases will work with any type that is an instance of
 93 | ~IConnection~. When you're done talking to the database, call the
 94 | ~disconnect~ function. It will disconnect you from the database.
 95 | Here's an example of connecting to a Sqlite database:
 96 | 
 97 | #+BEGIN_SRC screen
 98 | ghci> :module Database.HDBC Database.HDBC.Sqlite3
 99 | ghci> conn <- connectSqlite3 "test1.db"
100 | Loading package array-0.1.0.0 ... linking ... done.
101 | Loading package containers-0.1.0.1 ... linking ... done.
102 | Loading package bytestring-0.9.0.1 ... linking ... done.
103 | Loading package old-locale-1.0.0.0 ... linking ... done.
104 | Loading package old-time-1.0.0.0 ... linking ... done.
105 | Loading package mtl-1.1.0.0 ... linking ... done.
106 | Loading package HDBC-1.1.5 ... linking ... done.
107 | Loading package HDBC-sqlite3-1.1.4.0 ... linking ... done.
108 | ghci> :type conn
109 | conn :: Connection
110 | ghci> disconnect conn
111 | #+END_SRC
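    |
    | In a compiled program, it is easy to forget the call to
    | ~disconnect~ when an exception cuts the work short. As a sketch,
    | assuming the same ~test1.db~ file, the connection can be wrapped
    | with ~bracket~ from ~Control.Exception~. The helper name ~withDb~
    | is our own invention and not part of HDBC:
    |
    | #+CAPTION: connect.hs (hypothetical file name)
    | #+BEGIN_SRC haskell
    | import Control.Exception (bracket)
    | import Database.HDBC
    | import Database.HDBC.Sqlite3 (Connection, connectSqlite3)
    |
    | -- | Open the database at the given path, run an action with the
    | -- connection, and always disconnect afterwards, even if the action
    | -- throws an exception.  (withDb is a hypothetical helper name.)
    | withDb :: FilePath -> (Connection -> IO a) -> IO a
    | withDb path = bracket (connectSqlite3 path) disconnect
    |
    | main :: IO ()
    | main = withDb "test1.db" $ \conn ->
    |     putStrLn ("Connected using driver: " ++ hdbcDriverName conn)
    | #+END_SRC
    |
    | Only ~connectSqlite3~ and the ~Connection~ type come from the
    | driver module; everything else is written against the generic
    | ~Database.HDBC~ interface.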
112 | 
113 | ** Transactions
114 | 
115 | Most modern SQL databases have a notion of transactions. A
116 | transaction is designed to ensure that all components of a
117 | modification get applied, or that none of them do. Furthermore,
118 | transactions help prevent other processes accessing the same
119 | database from seeing partial data from modifications that are in
120 | progress.
121 | 
122 | Many databases require you to either explicitly commit all your
123 | changes before they appear on disk, or to run in an "autocommit"
124 | mode. The "autocommit" mode runs an implicit commit after every
125 | statement. This may make the adjustment to transactional databases
126 | easier for programmers not accustomed to them, but is just a
127 | hindrance to people who actually want to use multi-statement
128 | transactions.
129 | 
130 | HDBC intentionally does not support autocommit mode. When you
131 | modify data in your databases, you must explicitly cause it to be
132 | committed to disk. There are two ways to do that in HDBC: you can
133 | call ~commit~ when you're ready to write the data to disk, or you
134 | can use the ~withTransaction~ function to wrap around your
135 | modification code. ~withTransaction~ will cause data to be
136 | committed upon successful completion of your function.
137 | 
138 | Sometimes a problem will occur while you are working on writing
139 | data to the database. Perhaps you get an error from the database
140 | or discover a problem with the data. In these instances, you can
141 | "roll back" your changes. This will cause all changes you have
142 | made since your last ~commit~ or roll back to be forgotten. In
143 | HDBC, you can call the ~rollback~ function to do this. If you are
144 | using ~withTransaction~, any uncaught exception will cause a roll
145 | back to be issued.
146 | 
147 | Note that a roll back operation only rolls back the changes since
148 | the last ~commit~, ~rollback~, or ~withTransaction~. A database
149 | does not maintain an extensive history like a version-control
150 | system. You will see examples of ~commit~ later in this chapter.
151 | 
152 | #+BEGIN_WARNING
153 | Warning
154 | 
155 | One popular database, MySQL, does not support transactions with
156 | its default table type. In its default configuration, MySQL will
157 | silently ignore calls to ~commit~ or ~rollback~ and will commit
158 | all changes to disk immediately. The HDBC ODBC driver has
159 | instructions for configuring MySQL to indicate to HDBC that it
160 | does not support transactions, which will cause ~commit~ and
161 | ~rollback~ to generate errors. Alternatively, you can use InnoDB
162 | tables with MySQL, which do support transactions. InnoDB tables
163 | are recommended for use with HDBC.
164 | #+END_WARNING
165 | 
166 | ** Simple Queries
167 | 
168 | Some of the simplest queries in SQL involve statements that don't
169 | return any data. These queries can be used to create tables,
170 | insert data, delete data, and set database parameters.
171 | 
172 | The most basic function for sending queries to a database is
173 | ~run~. This function takes an ~IConnection~, a ~String~
174 | representing the query itself, and a list of parameters. Let's use
175 | it to set up some things in our database.
176 | 
177 | #+BEGIN_SRC screen
178 | ghci> :module Database.HDBC Database.HDBC.Sqlite3
179 | ghci> conn <- connectSqlite3 "test1.db"
180 | Loading package array-0.1.0.0 ... linking ... done.
181 | Loading package containers-0.1.0.1 ... linking ... done.
182 | Loading package bytestring-0.9.0.1 ... linking ... done.
183 | Loading package old-locale-1.0.0.0 ... linking ... done.
184 | Loading package old-time-1.0.0.0 ... linking ... done.
185 | Loading package mtl-1.1.0.0 ... linking ... done.
186 | Loading package HDBC-1.1.5 ... linking ... done.
187 | Loading package HDBC-sqlite3-1.1.4.0 ... linking ... done.
188 | ghci> run conn "CREATE TABLE test (id INTEGER NOT NULL, desc VARCHAR(80))" []
189 | 0
190 | ghci> run conn "INSERT INTO test (id) VALUES (0)" []
191 | 1
192 | ghci> commit conn
193 | ghci> disconnect conn
194 | #+END_SRC
195 | 
196 | After connecting to the database, we first created a table called
197 | ~test~. Then we inserted one row of data into the table. Finally,
198 | we committed the changes and disconnected from the database. Note
199 | that if we hadn't called ~commit~, no final change would have been
200 | written to the database at all.
201 | 
202 | The ~run~ function returns the number of rows each query modified.
203 | For the first query, which created a table, no rows were modified.
204 | The second query inserted a single row, so ~run~ returned =1=.
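    |
    | As an alternative to calling ~commit~ by hand, the same kind of
    | modification can be wrapped in ~withTransaction~, described in the
    | section on transactions. The following is only a sketch: it assumes
    | the ~test~ table created above already exists, and the id values 10
    | and 11 are arbitrary:
    |
    | #+CAPTION: transaction.hs (hypothetical file name)
    | #+BEGIN_SRC haskell
    | import Database.HDBC
    | import Database.HDBC.Sqlite3 (connectSqlite3)
    |
    | main :: IO ()
    | main = do
    |     conn <- connectSqlite3 "test1.db"
    |     -- Either both rows are committed, or (if an exception is thrown
    |     -- inside the block) the changes are rolled back.
    |     withTransaction conn $ \c -> do
    |         n1 <- run c "INSERT INTO test (id) VALUES (10)" []
    |         n2 <- run c "INSERT INTO test (id) VALUES (11)" []
    |         putStrLn ("Rows inserted: " ++ show (n1 + n2))
    |     disconnect conn
    | #+END_SRC
    |
    | Because ~withTransaction~ commits on successful completion, no
    | explicit ~commit~ call appears in this version.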
205 | 
206 | ** ~SqlValues~
207 | 
208 | Before proceeding, we need to discuss a data type introduced in
209 | HDBC: ~SqlValue~. Since both Haskell and SQL are strongly-typed
210 | systems, HDBC tries to preserve type information as much as
211 | possible. At the same time, Haskell and SQL types don't exactly
212 | mirror each other. Furthermore, different databases have different
213 | ways of representing things such as dates or special characters in
214 | strings.
215 | 
216 | ~SqlValue~ is a data type that has a number of constructors such
217 | as ~SqlString~, ~SqlBool~, ~SqlNull~, ~SqlInteger~, and more. This
218 | lets you represent various types of data in argument lists to the
219 | database, and to see various types of data in the results coming
220 | back, and still store it all in a list. There are convenience
221 | functions ~toSql~ and ~fromSql~ that you will normally use. If you
222 | care about the precise representation of data, you can still
223 | construct ~SqlValue~ data manually.
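    |
    | The conversions can be tried without a database at all. The sketch
    | below round-trips a few values through ~SqlValue~; the type
    | annotations are needed because ~toSql~ and ~fromSql~ are overloaded,
    | and the particular values are arbitrary:
    |
    | #+CAPTION: sqlvalue.hs (hypothetical file name)
    | #+BEGIN_SRC haskell
    | import Database.HDBC
    |
    | main :: IO ()
    | main = do
    |     -- Haskell value to SqlValue and back again
    |     print (fromSql (toSql (42 :: Int)) :: Int)
    |     print (fromSql (toSql "zero") :: String)
    |     -- A SQL NULL is best read as a Maybe
    |     print (fromSql SqlNull :: Maybe String)
    |     -- Manually constructed values work with fromSql too
    |     print (fromSql (SqlString "7") :: Integer)
    | #+END_SRC
    |
    | Asking ~fromSql~ for a ~Maybe~ type is the usual way to cope with
    | columns that may contain ~NULL~.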
224 | 
225 | ** Query Parameters
226 | 
227 | HDBC, like most databases, supports a notion of replaceable
228 | parameters in queries. There are three primary benefits of using
229 | replaceable parameters: they prevent SQL injection attacks or
230 | trouble when the input contains quote characters, they improve
231 | performance when executing similar queries repeatedly, and they
232 | permit easy and portable insertion of data into queries.
233 | 
234 | Let's say you wanted to add thousands of rows into our new table
235 | ~test~. You could issue thousands of queries looking like
236 | ~INSERT INTO test VALUES (0, 'zero')~ and
237 | ~INSERT INTO test VALUES (1, 'one')~. This forces the database
238 | server to parse each SQL statement individually. If you could
239 | replace the two values with a placeholder, the server could parse
240 | the SQL query once, and just execute it multiple times with the
241 | different data.
242 | 
243 | A second problem involves escaping characters. What if you wanted
244 | to insert the string ~"I don't like 1"~? SQL uses the single quote
245 | character to show the end of the field. Most SQL databases would
246 | require you to write this as ~'I don''t like 1'~. But rules for
247 | other special characters such as backslashes differ between
248 | databases. Rather than trying to code this yourself, HDBC can
249 | handle it all for you. Let's look at an example.
250 | 
251 | #+BEGIN_SRC screen
252 | ghci> conn <- connectSqlite3 "test1.db"
253 | ghci> run conn "INSERT INTO test VALUES (?, ?)" [toSql 0, toSql "zero"]
254 | 1
255 | ghci> commit conn
256 | ghci> disconnect conn
257 | #+END_SRC
258 | 
259 | The question marks in the ~INSERT~ query in this example are the
260 | placeholders. We then passed the parameters that are going to go
261 | there. ~run~ takes a list of ~SqlValue~, so we used ~toSql~ to
262 | convert each item into an ~SqlValue~. HDBC automatically handled
263 | conversion of the ~String~ ~"zero"~ into the appropriate
264 | representation for the database in use.
265 | 
266 | This approach won't actually achieve any performance benefits when
267 | inserting large amounts of data. For that, we need more control
268 | over the process of creating the SQL query. We'll discuss that in
269 | the next section.
270 | 
271 | #+BEGIN_NOTE
272 | Using replaceable parameters
273 | 
274 | Replaceable parameters only work for parts of the queries where
275 | the server is expecting a value, such as a ~WHERE~ clause in a
276 | ~SELECT~ statement or a value for an ~INSERT~ statement. You
277 | cannot say ~run conn "SELECT * from ?" [toSql "tablename"]~ and
278 | expect it to work. A table name is not a value, and most databases will
279 | not accept this syntax. That's not a big problem in practice,
280 | because there is rarely a call for replacing things that aren't
281 | values in this way.
282 | #+END_NOTE
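    |
    | To illustrate the escaping benefit, here is a sketch that stores
    | the string containing a single quote mentioned earlier and reads it
    | back. It assumes the ~test~ table from the previous sections; the
    | id value 100 is arbitrary:
    |
    | #+CAPTION: params.hs (hypothetical file name)
    | #+BEGIN_SRC haskell
    | import Database.HDBC
    | import Database.HDBC.Sqlite3 (connectSqlite3)
    |
    | main :: IO ()
    | main = do
    |     conn <- connectSqlite3 "test1.db"
    |     let tricky = "I don't like 1"   -- contains a single quote
    |     -- No manual escaping: HDBC and the driver handle the value.
    |     _ <- run conn "INSERT INTO test VALUES (?, ?)"
    |              [toSql (100 :: Int), toSql tricky]
    |     commit conn
    |     -- Read it back, again passing the id as a parameter.
    |     rows <- quickQuery' conn "SELECT desc FROM test WHERE id = ?"
    |                 [toSql (100 :: Int)]
    |     mapM_ putStrLn [fromSql d | [d] <- rows]
    |     disconnect conn
    | #+END_SRC
    |
    | Note that the placeholder appears only where a value is expected:
    | in the ~VALUES~ list and in the ~WHERE~ clause.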
283 | 
284 | ** Prepared Statements
285 | 
286 | HDBC defines a function ~prepare~ that will prepare a SQL query,
287 | but it does not yet bind the parameters to the query. ~prepare~
288 | returns a ~Statement~ representing the compiled query.
289 | 
290 | Once you have a ~Statement~, you can do a number of things with
291 | it. You can call ~execute~ on it one or more times. After calling
292 | ~execute~ on a query that returns data, you can use one of the
293 | fetch functions to retrieve that data. Functions like ~run~ and
294 | ~quickQuery'~ use statements and ~execute~ internally; they are
295 | simply shortcuts to let you perform common tasks quickly. When you
296 | need more control over what's happening, you can use a ~Statement~
297 | instead of a function like ~run~.
298 | 
299 | Let's look at using statements to insert multiple values with a
300 | single query. Here's an example:
301 | 
302 | #+BEGIN_SRC screen
303 | ghci> conn <- connectSqlite3 "test1.db"
304 | ghci> stmt <- prepare conn "INSERT INTO test VALUES (?, ?)"
305 | ghci> execute stmt [toSql 1, toSql "one"]
306 | 1
307 | ghci> execute stmt [toSql 2, toSql "two"]
308 | 1
309 | ghci> execute stmt [toSql 3, toSql "three"]
310 | 1
311 | ghci> execute stmt [toSql 4, SqlNull]
312 | 1
313 | ghci> commit conn
314 | ghci> disconnect conn
315 | #+END_SRC
316 | 
317 | In this example, we created a prepared statement and called it
318 | ~stmt~. We then executed that statement four times, and passed
319 | different parameters each time. These parameters are used, in
320 | order, to replace the question marks in the original query string.
321 | Finally, we committed the changes and disconnected from the database.
322 | 
323 | HDBC also provides a function ~executeMany~ that can be useful in
324 | situations such as this. ~executeMany~ simply takes a list of rows
325 | of data to call the statement with. Here's an example:
326 | 
327 | #+BEGIN_SRC screen
328 | ghci> conn <- connectSqlite3 "test1.db"
329 | ghci> stmt <- prepare conn "INSERT INTO test VALUES (?, ?)"
330 | ghci> executeMany stmt [[toSql 5, toSql "five's nice"], [toSql 6, SqlNull]]
331 | ghci> commit conn
332 | ghci> disconnect conn
333 | #+END_SRC
334 | 
335 | #+BEGIN_NOTE
336 | More efficient execution
337 | 
338 | On the server, most databases will have an optimization that they
339 | can apply to ~executeMany~ so that they only have to compile this
340 | query string once, rather than twice.[fn:4] This can lead to a
341 | dramatic performance gain when inserting large amounts of data at
342 | once. Some databases can also apply this optimization to
343 | ~execute~, but not all.
344 | #+END_NOTE
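    |
    | In a program, the rows usually come from a Haskell list rather than
    | being typed in one by one. The following sketch shows one way to
    | feed such a list to ~executeMany~; ~insertAll~ is a hypothetical
    | helper, and the sample rows are made up:
    |
    | #+CAPTION: insertmany.hs (hypothetical file name)
    | #+BEGIN_SRC haskell
    | import Database.HDBC
    | import Database.HDBC.Sqlite3 (connectSqlite3)
    |
    | -- | Insert every (id, description) pair with one prepared statement.
    | -- A description of Nothing is stored as SQL NULL.
    | insertAll :: IConnection conn => conn -> [(Int, Maybe String)] -> IO ()
    | insertAll conn rows = do
    |     stmt <- prepare conn "INSERT INTO test VALUES (?, ?)"
    |     executeMany stmt [[toSql i, toSql d] | (i, d) <- rows]
    |
    | main :: IO ()
    | main = do
    |     conn <- connectSqlite3 "test1.db"
    |     insertAll conn [(7, Just "seven"), (8, Nothing), (9, Just "nine")]
    |     commit conn
    |     disconnect conn
    | #+END_SRC
    |
    | Using ~Maybe String~ for the description lets ~toSql~ produce
    | ~SqlNull~ for missing values, so the caller never has to build
    | ~SqlValue~ data by hand.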
345 | 
346 | ** Reading Results
347 | 
348 | So far, we have discussed queries that insert or change data.
349 | Let's discuss getting data back out of the database. The type of
350 | the function ~quickQuery'~ looks very similar to ~run~, but it
351 | returns a list of results instead of a count of changed rows.
352 | ~quickQuery'~ is normally used with ~SELECT~ statements. Let's see
353 | an example:
354 | 
355 | #+BEGIN_SRC screen
356 | ghci> conn <- connectSqlite3 "test1.db"
357 | ghci> quickQuery' conn "SELECT * from test where id < 2" []
358 | [[SqlString "0",SqlNull],[SqlString "0",SqlString "zero"],[SqlString "1",SqlString "one"]]
359 | ghci> disconnect conn
360 | #+END_SRC
361 | 
362 | ~quickQuery'~ works with replaceable parameters, as we discussed
363 | above. In this case, we aren't using any, so the set of values to
364 | replace is the empty list at the end of the ~quickQuery'~ call.
365 | ~quickQuery'~ returns a list of rows, where each row is itself
366 | represented as ~[SqlValue]~. The values in the row are listed in
367 | the order returned by the database. You can use ~fromSql~ to
368 | convert them into regular Haskell types as needed.
369 | 
370 | It's a bit hard to read that output. Let's extend this example to
371 | format the results nicely. Here's some code to do that:
372 | 
373 | #+CAPTION: query.hs
374 | #+BEGIN_SRC haskell
375 | import Database.HDBC.Sqlite3 (connectSqlite3)
376 | import Database.HDBC
377 | 
378 | {- | Define a function that takes an integer representing the maximum
379 | id value to look up.  Will fetch all matching rows from the test database
380 | and print them to the screen in a friendly format. -}
381 | query :: Int -> IO ()
382 | query maxId =
383 |     do -- Connect to the database
384 |        conn <- connectSqlite3 "test1.db"
385 | 
386 |        -- Run the query and store the results in r
387 |        r <- quickQuery' conn
388 |             "SELECT id, desc from test where id <= ? ORDER BY id, desc"
389 |             [toSql maxId]
390 | 
391 |        -- Convert each row into a String
392 |        let stringRows = map convRow r
393 | 
394 |        -- Print the rows out
395 |        mapM_ putStrLn stringRows
396 | 
397 |        -- And disconnect from the database
398 |        disconnect conn
399 | 
400 |     where convRow :: [SqlValue] -> String
401 |           convRow [sqlId, sqlDesc] =
402 |               show intid ++ ": " ++ desc
403 |               where intid = (fromSql sqlId)::Integer
404 |                     desc = case fromSql sqlDesc of
405 |                              Just x -> x
406 |                              Nothing -> "NULL"
407 |           convRow x = error $ "Unexpected result: " ++ show x
408 | #+END_SRC
409 | 
410 | This program does mostly the same thing as our example with
411 | ~ghci~, but with a new addition: the ~convRow~ function. This
412 | function takes a row of data from the database and converts it to
413 | a ~String~. This string can then be easily printed out.
414 | 
415 | Notice how we took ~intid~ from ~fromSql~ directly, but processed
416 | ~fromSql sqlDesc~ as a ~Maybe String~ type. If you recall, we
417 | declared that the first column in this table can never contain a
418 | ~NULL~ value, but that the second column could. Therefore, we can
419 | safely ignore the potential for a ~NULL~ in the first column, but
420 | not in the second. It is possible to use ~fromSql~ to convert the
421 | second column to a ~String~ directly, and it would even work—until
422 | a row with a ~NULL~ in that position was encountered, which would
423 | cause a runtime exception. So, we convert a SQL ~NULL~ value into
424 | the string ~"NULL"~. When printed, this will be indistinguishable
425 | from a SQL string ~'NULL'~, but that's acceptable for this
426 | example. Let's try calling this function in ~ghci~:
427 | 
428 | #+BEGIN_SRC screen
429 | ghci> :load query.hs
430 | [1 of 1] Compiling Main             ( query.hs, interpreted )
431 | Ok, modules loaded: Main.
432 | ghci> query 2
433 | 0: NULL
434 | 0: zero
435 | 1: one
436 | 2: two
437 | #+END_SRC
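    |
    | If you would rather not abort on a row of unexpected shape, the
    | conversion can report the problem as a value instead. Here is a
    | sketch of such a variant; the name ~convRowE~ is hypothetical, and
    | it needs no database connection to try out:
    |
    | #+CAPTION: convrow.hs (hypothetical file name)
    | #+BEGIN_SRC haskell
    | import Data.Maybe (fromMaybe)
    | import Database.HDBC
    |
    | -- | Like convRow, but reports an unexpected row shape as a Left value.
    | convRowE :: [SqlValue] -> Either String String
    | convRowE [sqlId, sqlDesc] =
    |     Right (show intid ++ ": " ++ desc)
    |   where
    |     intid = fromSql sqlId :: Integer
    |     -- A NULL description comes back as Nothing
    |     desc  = fromMaybe "NULL" (fromSql sqlDesc)
    | convRowE other = Left ("Unexpected result: " ++ show other)
    |
    | main :: IO ()
    | main = do
    |     print (convRowE [toSql (1 :: Int), toSql "one"])
    |     print (convRowE [toSql (0 :: Int), SqlNull])
    |     print (convRowE [])   -- wrong number of columns
    | #+END_SRC
    |
    | The caller can then decide whether a malformed row should be
    | skipped, logged, or treated as fatal.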
438 | 
439 | *** Reading with Statements
440 | 
441 | As we discussed in [[file:21-using-databases.org::*Prepared Statements][the section called "Prepared Statements"]],
442 | you can use statements for reading. There are a number of ways of
443 | reading data from statements that can be useful in certain
444 | situations. Like ~run~, ~quickQuery'~ is a convenience function
445 | that in fact uses statements to accomplish its task.
446 | 
447 | To create a statement for reading, you use ~prepare~ just as you
448 | would for a statement that will be used to write data. You also
449 | use ~execute~ to execute it on the database server. Then, you can
450 | use various functions to read data from the ~Statement~. The
451 | ~fetchAllRows'~ function returns ~[[SqlValue]]~, just like
452 | ~quickQuery'~. There is also a function called ~sFetchAllRows'~,
453 | which converts every column's data to a ~Maybe String~ before
454 | returning it. Finally, there is ~fetchAllRowsAL'~, which returns
455 | ~(String, SqlValue)~ pairs for each column. The ~String~ is the
456 | column name as returned by the database; see
457 | [[file:21-using-databases.org::*Database Metadata][the section called "Database Metadata"]].
458 | 
459 | You can also read data one row at a time by calling ~fetchRow~,
460 | which returns ~IO (Maybe [SqlValue])~. It will be ~Nothing~ if all
461 | the results have already been read, or one row otherwise.
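    |
    | Row-by-row reading can be written as a small recursive loop. The
    | sketch below assumes the ~test~ table from earlier in the chapter;
    | ~printRows~ is a hypothetical helper name:
    |
    | #+CAPTION: fetchrow.hs (hypothetical file name)
    | #+BEGIN_SRC haskell
    | import Database.HDBC
    | import Database.HDBC.Sqlite3 (connectSqlite3)
    |
    | -- | Print each remaining row of an executed statement, one at a time.
    | printRows :: Statement -> IO ()
    | printRows stmt = do
    |     mrow <- fetchRow stmt
    |     case mrow of
    |         Nothing  -> return ()          -- no more results
    |         Just row -> do
    |             print row
    |             printRows stmt
    |
    | main :: IO ()
    | main = do
    |     conn <- connectSqlite3 "test1.db"
    |     stmt <- prepare conn "SELECT id, desc FROM test ORDER BY id"
    |     _ <- execute stmt []
    |     printRows stmt
    |     disconnect conn
    | #+END_SRC
    |
    | Only one row is held by the loop at any time, which keeps memory
    | use low without relying on laziness.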
462 | 
463 | *** Lazy Reading
464 | 
465 | Back in [[file:7-io.org::*Lazy I/O][the section called "Lazy I/O"]], we discussed lazy reading
466 | from files. It is also possible to read data lazily from
467 | databases. This can be particularly useful when dealing with
468 | queries that return an exceptionally large amount of data. By
469 | reading data lazily, you can still use convenient functions such
470 | as ~fetchAllRows~ instead of having to manually read each row as
471 | it comes in. If you are careful in your use of the data, you can
472 | avoid having to buffer all of the results in memory.
473 | 
474 | Lazy reading from a database, however, is more complex than
475 | reading from a file. When you're done reading data lazily from a
476 | file, the file is closed, and that's generally fine. When you're
477 | done reading data lazily from a database, the database connection
478 | is still open—you may be submitting other queries with it, for
479 | instance. Some databases can even support multiple simultaneous
480 | queries, so HDBC can't just close the connection when you're done.
481 | 
482 | When using lazy reading, it is critically important that you
483 | finish reading the entire data set before you attempt to close the
484 | connection or execute a new query. We encourage you to use the
485 | strict functions, or row-by-row processing, wherever possible to
486 | minimize complex interactions with lazy reading.
487 | 
488 | #+BEGIN_TIP
489 | Tip
490 | 
491 | If you are new to HDBC or the concept of lazy reading, but have
492 | lots of data to read, repeated calls to ~fetchRow~ may be easier
493 | to understand. Lazy reading is a powerful and useful tool, but
494 | must be used correctly.
495 | #+END_TIP
496 | 
497 | To read lazily from a database, you use the same functions you
498 | used before, without the apostrophe. For instance, you'd use
499 | ~fetchAllRows~ instead of ~fetchAllRows'~. The types of the lazy
500 | functions are the same as their strict cousins. Here's an example
501 | of lazy reading:
502 | 
503 | #+BEGIN_SRC screen
504 | ghci> conn <- connectSqlite3 "test1.db"
505 | ghci> stmt <- prepare conn "SELECT * from test where id < 2"
506 | ghci> execute stmt []
507 | 0
508 | ghci> results <- fetchAllRowsAL stmt
509 | [[("id",SqlString "0"),("desc",SqlNull)],[("id",SqlString "0"),("desc",SqlString "zero")],[("id",SqlString "1"),("desc",SqlString "one")]]
510 | ghci> mapM_ print results
511 | [("id",SqlString "0"),("desc",SqlNull)]
512 | [("id",SqlString "0"),("desc",SqlString "zero")]
513 | [("id",SqlString "1"),("desc",SqlString "one")]
514 | ghci> disconnect conn
515 | #+END_SRC
516 | 
517 | Note that you could have used ~fetchAllRowsAL'~ here as well.
518 | However, if you had a large data set to read, it would have
519 | consumed a lot of memory. By reading the data lazily, we can print
520 | out extremely large result sets using a constant amount of memory.
521 | With the lazy version, results will be evaluated in chunks; with
522 | the strict version, all results are read up front, stored in RAM,
523 | then printed.
524 | 
525 | ** Database Metadata
526 | 
527 | Sometimes it can be useful for a program to learn information
528 | about the database itself. For instance, a program may want to see
529 | what tables exist so that it can automatically create missing
530 | tables or upgrade the database schema. In some cases, a program
531 | may need to alter its behavior depending on the database backend
532 | in use.
533 | 
534 | First, there is a ~getTables~ function that will obtain a list of
535 | defined tables in a database. You can also use the ~describeTable~
536 | function, which will provide information about the defined columns
537 | in a given table.
538 | 
539 | You can learn about the database server in use by calling
540 | ~dbServerVer~ and ~proxiedClientName~, for instance. The
541 | ~dbTransactionSupport~ function can be used to determine whether
542 | or not a given database supports transactions. Let's look at an
543 | example of some of these items:
544 | 
545 | #+BEGIN_SRC screen
546 | ghci> conn <- connectSqlite3 "test1.db"
547 | ghci> getTables conn
548 | ["test"]
549 | ghci> proxiedClientName conn
550 | "sqlite3"
551 | ghci> dbServerVer conn
552 | "3.5.9"
553 | ghci> dbTransactionSupport conn
554 | True
555 | ghci> disconnect conn
556 | #+END_SRC
557 | 
558 | You can also learn about the results of a specific query by
559 | obtaining information from its statement. The ~describeResult~
560 | function returns ~[(String, SqlColDesc)]~, a list of pairs. The
561 | first item gives the column name, and the second provides
562 | information about the column: the type, the size, whether it may
563 | be ~NULL~. The full specification is given in the HDBC API
564 | reference.
565 | 
566 | Please note that some databases may not be able to provide all
567 | this metadata. In these circumstances, an exception will be
568 | raised. Sqlite3, for instance, does not support ~describeResult~
569 | or ~describeTable~ as of this writing.
570 | 
571 | ** Error Handling
572 | 
573 | HDBC will raise exceptions when errors occur. The exceptions have
574 | type ~SqlError~. They convey information from the underlying SQL
575 | engine, such as the database's state, the error message, and the
576 | database's numeric error code, if any.
577 | 
578 | ~ghc~ does not know how to display an ~SqlError~ on the screen
579 | when it occurs. While the exception will cause the program to
580 | terminate, it will not display a useful message. Here's an
581 | example:
582 | 
583 | #+BEGIN_SRC screen
584 | ghci> conn <- connectSqlite3 "test1.db"
585 | ghci> quickQuery' conn "SELECT * from test2" []
586 | *** Exception: (unknown)
587 | ghci> disconnect conn
588 | #+END_SRC
589 | 
590 | Here we tried to ~SELECT~ data from a table that didn't exist. The
591 | error message we got back wasn't helpful. There's a utility
592 | function, ~handleSqlError~, that will catch an ~SqlError~ and
593 | re-raise it as an ~IOError~. In this form, it will be printable
594 | on-screen, but it will be more difficult to extract specific
595 | pieces of information programmatically. Let's look at its usage:
596 | 
597 | #+BEGIN_SRC screen
598 | ghci> conn <- connectSqlite3 "test1.db"
599 | ghci> handleSqlError $ quickQuery' conn "SELECT * from test2" []
600 | *** Exception: user error (SQL error: SqlError {seState = "", seNativeError = 1, seErrorMsg = "prepare 20: SELECT * from test2: no such table: test2"})
601 | ghci> disconnect conn
602 | #+END_SRC
603 | 
604 | Here we got more information, including even a message saying that
605 | there is no such table as test2. This is much more helpful. Many
606 | HDBC programmers make it a standard practice to start their
607 | programs with ~main = handleSqlError $ do~, which will ensure that
608 | every un-caught ~SqlError~ will be printed in a helpful manner.
609 | 
610 | There are also ~catchSql~ and ~handleSql~—similar to the standard
611 | ~catch~ and ~handle~ functions. ~catchSql~ and ~handleSql~ will
612 | intercept only HDBC errors. For more information on error
613 | handling, refer to [[file:19-error-handling.org][Chapter 19, /Error handling/]].
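    |
    | As a rough sketch of ~catchSql~ in use, the program below recovers
    | from the missing-table error shown earlier instead of terminating.
    | It assumes the same ~test1.db~ file and the HDBC Sqlite3 driver;
    | falling back to an empty result is our choice for the example, not
    | something HDBC prescribes.
    |
    | #+BEGIN_SRC haskell
    | import Database.HDBC
    | import Database.HDBC.Sqlite3 (connectSqlite3)
    |
    | main :: IO ()
    | main = do
    |   conn <- connectSqlite3 "test1.db"
    |   -- catchSql intercepts only SqlError; other exceptions propagate.
    |   rows <- quickQuery' conn "SELECT * from test2" []
    |             `catchSql` \e -> do
    |               putStrLn ("Query failed: " ++ seErrorMsg e)
    |               return []   -- assumption: treat failure as "no rows"
    |   putStrLn ("Rows read: " ++ show (length rows))
    |   disconnect conn
    | #+END_SRC
    |
    | Because the handler receives the ~SqlError~ itself, fields such as
    | ~seState~ and ~seNativeError~ remain available for programmatic
    | inspection, which is what is lost when using ~handleSqlError~.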
614 | 
615 | ** Footnotes
616 | 
617 | [fn:1] The O'Reilly books /Learning SQL/ and /SQL in a Nutshell/
618 | may be useful if you don't have experience with SQL.
619 | 
620 | [fn:2] This assumes you restrict yourself to using standard SQL.
621 | 
622 | [fn:3] For more information on installing Haskell software, please
623 | refer to [[file:installing-ghc-and-haskell-libraries.org::*Installing Haskell software][the section called "Installing Haskell software"]].
624 | 
625 | [fn:4] HDBC emulates this behavior for databases that do not
626 | provide it, providing programmers a unified API for running
627 | queries repeatedly.
628 | 


--------------------------------------------------------------------------------
/22-web-client-programming.org:
--------------------------------------------------------------------------------
  1 | * Chapter 22. Extended Example: Web Client Programming
  2 | 
  3 | By this point, you've seen how to interact with a database, parse
  4 | things, and handle errors. Let's now take this a step farther and
  5 | introduce a web client library to the mix.
  6 | 
  7 | We'll develop a real application in this chapter: a podcast
  8 | downloader, or "podcatcher". The idea of a podcatcher is simple.
  9 | It is given a list of URLs to process. Downloading each of these
 10 | URLs results in an XML file in the RSS format. Inside this XML
 11 | file, we'll find references to URLs for audio files to download.
 12 | 
 13 | Podcatchers usually let the user subscribe to podcasts by adding
 14 | RSS URLs to their configuration. Then, the user can periodically
 15 | run an update operation. The podcatcher will download the RSS
 16 | documents, examine them for audio file references, and download
 17 | any audio files that haven't already been downloaded on behalf of
 18 | this user.
 19 | 
 20 | #+BEGIN_TIP
 21 | Tip
 22 | 
 23 | Users often call the RSS document a podcast or the podcast feed,
 24 | and each individual audio file an episode.
 25 | #+END_TIP
 26 | 
 27 | To make this happen, we need to have several things:
 28 | 
 29 | - An HTTP client library to download files
 30 | - An XML parser
 31 | - A way to specify and persistently store which podcasts we're
 32 |   interested in
 33 | - A way to persistently store which podcast episodes we've already
 34 |   downloaded
 35 | 
 36 | The last two items can be accommodated via a database we'll set up
 37 | using HDBC. The first two can be accommodated via other library
 38 | modules we'll introduce in this chapter.
 39 | 
 40 | #+BEGIN_TIP
 41 | Tip
 42 | 
 43 | The code in this chapter was written specifically for this book,
 44 | but is based on code written for hpodder, an existing podcatcher
 45 | written in Haskell. hpodder has many more features than the
 46 | examples presented here, which make it too long and complex for
 47 | coverage in this book. If you are interested in studying hpodder,
 48 | its source code is freely available at
 49 | [[http://software.complete.org/hpodder]].
 50 | #+END_TIP
 51 | 
 52 | We'll write the code for this chapter in pieces. Each piece will
 53 | be its own Haskell module. You'll be able to play with each piece
 54 | by itself in ~ghci~. At the end, we'll write the final code that
 55 | ties everything together into a finished application. We'll start
 56 | with the basic types we'll need to use.
 57 | 
 58 | ** Basic Types
 59 | 
 60 | The first thing to do is have some idea of the basic information
 61 | that will be important to the application. This will generally be
 62 | information about the podcasts the user is interested in, plus
 63 | information about episodes that we have seen and processed. It's
 64 | easy enough to change this later if needed, but since we'll be
 65 | importing it just about everywhere, we'll define it first.
 66 | 
 67 | #+CAPTION: PodTypes.hs
 68 | #+BEGIN_SRC haskell
 69 | module PodTypes where
 70 | 
 71 | data Podcast =
 72 |     Podcast {castId :: Integer, -- ^ Numeric ID for this podcast
 73 |              castURL :: String  -- ^ Its feed URL
 74 |             }
 75 |     deriving (Eq, Show, Read)
 76 | 
 77 | data Episode =
 78 |     Episode {epId :: Integer,     -- ^ Numeric ID for this episode
 79 |              epCast :: Podcast,   -- ^ The podcast it came from
 80 |              epURL :: String,     -- ^ The download URL for this episode
 81 |              epDone :: Bool       -- ^ Whether or not we are done with this ep
 82 |             }
 83 |     deriving (Eq, Show, Read)
 84 | #+END_SRC
 85 | 
 86 | We'll be storing this information in a database. Having a unique
 87 | identifier for both a podcast and an episode makes it easy to find
 88 | which episodes belong to a particular podcast, load information
 89 | for a particular podcast or episode, or handle future cases such
 90 | as changing URLs for podcasts.
 91 | 
 92 | ** The Database
 93 | 
 94 | Next, we'll write the code that makes persistent storage in a
 95 | database possible. We'll primarily be interested in moving data between
 96 | the Haskell structures we defined in ~PodTypes.hs~ and the
 97 | database on disk. Also, the first time the user runs the program,
 98 | we'll need to create the database tables that we'll use to store
 99 | our data.
100 | 
101 | We'll use HDBC (see [[file:21-using-databases.org][Chapter 21, /Using Databases/]])
102 | to interact with a Sqlite database. Sqlite is lightweight and
103 | self-contained, which makes it perfect for this project. For
104 | information on installing HDBC and Sqlite, consult
105 | [[file:21-using-databases.org::*Installing HDBC and Drivers][the section called "Installing HDBC and Drivers"]].
106 | 
107 | #+CAPTION: PodDB.hs
108 | #+BEGIN_SRC haskell
109 | module PodDB where
110 | 
111 | import Database.HDBC
112 | import Database.HDBC.Sqlite3
113 | import PodTypes
114 | import Control.Monad(when)
115 | import Data.List(sort)
116 | 
117 | -- | Initialize DB and return database Connection
118 | connect :: FilePath -> IO Connection
119 | connect fp =
120 |     do dbh <- connectSqlite3 fp
121 |        prepDB dbh
122 |        return dbh
123 | 
124 | {- | Prepare the database for our data.
125 | 
126 | We create two tables and ask the database engine to verify some pieces
127 | of data consistency for us:
128 | 
129 | * castid and epid both are unique primary keys and must never be duplicated
130 | * castURL also is unique
131 | * In the episodes table, for a given podcast (epcastid), there must be only
132 |   one instance of each given URL or episode ID
133 | -}
134 | prepDB :: IConnection conn => conn -> IO ()
135 | prepDB dbh =
136 |     do tables <- getTables dbh
137 |        when (not ("podcasts" `elem` tables)) $
138 |            do run dbh "CREATE TABLE podcasts (\
139 |                        \castid INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,\
140 |                        \castURL TEXT NOT NULL UNIQUE)" []
141 |               return ()
142 |        when (not ("episodes" `elem` tables)) $
143 |            do run dbh "CREATE TABLE episodes (\
144 |                        \epid INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,\
145 |                        \epcastid INTEGER NOT NULL,\
146 |                        \epurl TEXT NOT NULL,\
147 |                        \epdone INTEGER NOT NULL,\
148 |                        \UNIQUE(epcastid, epurl),\
149 |                        \UNIQUE(epcastid, epid))" []
150 |               return ()
151 |        commit dbh
152 | 
153 | {- | Adds a new podcast to the database.  Ignores the castid on the
154 | incoming podcast, and returns a new object with the castid populated.
155 | 
156 | An attempt to add a podcast that already exists is an error. -}
157 | addPodcast :: IConnection conn => conn -> Podcast -> IO Podcast
158 | addPodcast dbh podcast =
159 |     handleSql errorHandler $
160 |       do -- Insert the castURL into the table.  The database
161 |          -- will automatically assign a cast ID.
162 |          run dbh "INSERT INTO podcasts (castURL) VALUES (?)"
163 |              [toSql (castURL podcast)]
164 |          -- Find out the castID for the URL we just added.
165 |          r <- quickQuery' dbh "SELECT castid FROM podcasts WHERE castURL = ?"
166 |               [toSql (castURL podcast)]
167 |          case r of
168 |            [[x]] -> return $ podcast {castId = fromSql x}
169 |            y -> fail $ "addPodcast: unexpected result: " ++ show y
170 |     where errorHandler e =
171 |               do fail $ "Error adding podcast; does this URL already exist?\n"
172 |                      ++ show e
173 | 
174 | {- | Adds a new episode to the database.
175 | 
176 | Since this is done by automation, instead of by user request, we will
177 | simply ignore requests to add duplicate episodes.  This way, when we are
178 | processing a feed, each URL encountered can be fed to this function,
179 | without having to first look it up in the DB.
180 | 
181 | Also, we generally won't care about the new ID here, so don't bother
182 | fetching it. -}
183 | addEpisode :: IConnection conn => conn -> Episode -> IO ()
184 | addEpisode dbh ep =
185 |     run dbh "INSERT OR IGNORE INTO episodes (epCastId, epURL, epDone) \
186 |                 \VALUES (?, ?, ?)"
187 |                 [toSql (castId . epCast $ ep), toSql (epURL ep),
188 |                  toSql (epDone ep)]
189 |     >> return ()
190 | 
191 | {- | Modifies an existing podcast.  Looks up the given podcast by
192 | ID and modifies the database record to match the passed Podcast. -}
193 | updatePodcast :: IConnection conn => conn -> Podcast -> IO ()
194 | updatePodcast dbh podcast =
195 |     run dbh "UPDATE podcasts SET castURL = ? WHERE castId = ?"
196 |             [toSql (castURL podcast), toSql (castId podcast)]
197 |     >> return ()
198 | 
199 | {- | Modifies an existing episode.  Looks it up by ID and modifies the
200 | database record to match the given episode. -}
201 | updateEpisode :: IConnection conn => conn -> Episode -> IO ()
202 | updateEpisode dbh episode =
203 |     run dbh "UPDATE episodes SET epCastId = ?, epURL = ?, epDone = ? \
204 |              \WHERE epId = ?"
205 |              [toSql (castId . epCast $ episode),
206 |               toSql (epURL episode),
207 |               toSql (epDone episode),
208 |               toSql (epId episode)]
209 |     >> return ()
210 | 
211 | {- | Remove a podcast.  First removes any episodes that may exist
212 | for this podcast. -}
213 | removePodcast :: IConnection conn => conn -> Podcast -> IO ()
214 | removePodcast dbh podcast =
215 |     do run dbh "DELETE FROM episodes WHERE epcastid = ?"
216 |          [toSql (castId podcast)]
217 |        run dbh "DELETE FROM podcasts WHERE castid = ?"
218 |          [toSql (castId podcast)]
219 |        return ()
220 | 
221 | {- | Gets a list of all podcasts. -}
222 | getPodcasts :: IConnection conn => conn -> IO [Podcast]
223 | getPodcasts dbh =
224 |     do res <- quickQuery' dbh
225 |               "SELECT castid, casturl FROM podcasts ORDER BY castid" []
226 |        return (map convPodcastRow res)
227 | 
228 | {- | Get a particular podcast.  Nothing if the ID doesn't match, or
229 | Just Podcast if it does. -}
230 | getPodcast :: IConnection conn => conn -> Integer -> IO (Maybe Podcast)
231 | getPodcast dbh wantedId =
232 |     do res <- quickQuery' dbh
233 |               "SELECT castid, casturl FROM podcasts WHERE castid = ?"
234 |               [toSql wantedId]
235 |        case res of
236 |          [x] -> return (Just (convPodcastRow x))
237 |          [] -> return Nothing
238 |          _ -> fail $ "Really bad error; more than one podcast with ID " ++ show wantedId
239 | 
240 | {- | Convert the result of a SELECT into a Podcast record -}
241 | convPodcastRow :: [SqlValue] -> Podcast
242 | convPodcastRow [svId, svURL] =
243 |     Podcast {castId = fromSql svId,
244 |              castURL = fromSql svURL}
245 | convPodcastRow x = error $ "Can't convert podcast row " ++ show x
246 | 
247 | {- | Get all episodes for a particular podcast. -}
248 | getPodcastEpisodes :: IConnection conn => conn -> Podcast -> IO [Episode]
249 | getPodcastEpisodes dbh pc =
250 |     do r <- quickQuery' dbh
251 |             "SELECT epId, epURL, epDone FROM episodes WHERE epCastId = ?"
252 |             [toSql (castId pc)]
253 |        return (map convEpisodeRow r)
254 |     where convEpisodeRow [svId, svURL, svDone] =
255 |               Episode {epId = fromSql svId, epURL = fromSql svURL,
256 |                        epDone = fromSql svDone, epCast = pc}
257 | #+END_SRC
258 | 
259 | In the ~PodDB~ module, we have defined functions to connect to the
260 | database, create the needed database tables, add data to the
261 | database, query the database, and remove data from the database.
262 | Here is an example ~ghci~ session demonstrating interacting with
263 | the database. It will create a database file named ~poddbtest.db~
264 | in the current working directory and add a podcast and an episode
265 | to it.
266 | 
267 | #+BEGIN_SRC screen
268 | ghci> :load PodDB.hs
269 | [1 of 2] Compiling PodTypes         ( PodTypes.hs, interpreted )
270 | [2 of 2] Compiling PodDB            ( PodDB.hs, interpreted )
271 | Ok, modules loaded: PodDB, PodTypes.
272 | ghci> dbh <- connect "poddbtest.db"
273 | ghci> :type dbh
274 | dbh :: Connection
275 | ghci> getTables dbh
276 | ["episodes","podcasts","sqlite_sequence"]
277 | ghci> let url = "http://feeds.thisamericanlife.org/talpodcast"
278 | ghci> pc <- addPodcast dbh (Podcast {castId=0, castURL=url})
279 | Podcast {castId = 1, castURL = "http://feeds.thisamericanlife.org/talpodcast"}
280 | ghci> getPodcasts dbh
281 | [Podcast {castId = 1, castURL = "http://feeds.thisamericanlife.org/talpodcast"}]
282 | ghci> addEpisode dbh (Episode {epId = 0, epCast = pc, epURL = "http://www.example.com/foo.mp3", epDone = False})
283 | ghci> getPodcastEpisodes dbh pc
284 | [Episode {epId = 1, epCast = Podcast {castId = 1, castURL = "http://feeds.thisamericanlife.org/talpodcast"}, epURL = "http://www.example.com/foo.mp3", epDone = False}]
285 | ghci> commit dbh
286 | ghci> disconnect dbh
287 | #+END_SRC
288 | 
289 | ** The Parser
290 | 
291 | Now that we have the database component, we need to have code to
292 | parse the podcast feeds. These are XML files that contain various
293 | information. Here's an example XML file to show you what they look
294 | like:
295 | 
296 | #+BEGIN_SRC xml
297 | <?xml version="1.0" encoding="UTF-8"?>
298 | <rss xmlns:itunes="http://www.itunes.com/DTDs/Podcast-1.0.dtd" version="2.0">
299 |   <channel>
300 |     <title>Haskell Radio</title>
301 |     <link>http://www.example.com/radio/</link>
302 |     <description>Description of this podcast</description>
303 |     <item>
304 |       <title>Episode 2: Lambdas</title>
305 |       <link>http://www.example.com/radio/lambdas</link>
306 |       <enclosure url="http://www.example.com/radio/lambdas.mp3"
307 |        type="audio/mpeg" length="10485760"/>
308 |     </item>
309 |     <item>
310 |       <title>Episode 1: Parsec</title>
311 |       <link>http://www.example.com/radio/parsec</link>
312 |       <enclosure url="http://www.example.com/radio/parsec.mp3"
313 |        type="audio/mpeg" length="10485150"/>
314 |     </item>
315 |   </channel>
316 | </rss>
317 | #+END_SRC
318 | 
319 | Out of these files, we are mainly interested in two things: the
320 | podcast title and the enclosure URLs. We use the
321 | [[http://www.cs.york.ac.uk/fp/HaXml/][HaXml toolkit]] to parse the
322 | XML file. Here's the source code for this component:
323 | 
324 | #+CAPTION: PodParser.hs
325 | #+BEGIN_SRC haskell
326 | module PodParser where
327 | 
328 | import PodTypes
329 | import Text.XML.HaXml
330 | import Text.XML.HaXml.Parse
331 | import Text.XML.HaXml.Html.Generate(showattr)
332 | import Data.Char
333 | import Data.List
334 | 
335 | data PodItem = PodItem {itemtitle :: String,
336 |                   enclosureurl :: String
337 |                   }
338 |           deriving (Eq, Show, Read)
339 | 
340 | data Feed = Feed {channeltitle :: String,
341 |                   items :: [PodItem]}
342 |             deriving (Eq, Show, Read)
343 | 
344 | {- | Given a podcast and a PodItem, produce an Episode -}
345 | item2ep :: Podcast -> PodItem -> Episode
346 | item2ep pc item =
347 |     Episode {epId = 0,
348 |              epCast = pc,
349 |              epURL = enclosureurl item,
350 |              epDone = False}
351 | 
352 | {- | Parse the data from a given string, with the given name to use
353 | in error messages. -}
354 | parse :: String -> String -> Feed
355 | parse content name =
356 |     Feed {channeltitle = getTitle doc,
357 |           items = getEnclosures doc}
358 | 
359 |     where parseResult = xmlParse name (stripUnicodeBOM content)
360 |           doc = getContent parseResult
361 | 
362 |           getContent :: Document -> Content
363 |           getContent (Document _ _ e _) = CElem e
364 |           
365 |           {- | Some Unicode documents begin with a binary sequence;
366 |              strip it off before processing. -}
367 |           stripUnicodeBOM :: String -> String
368 |           stripUnicodeBOM ('\xef':'\xbb':'\xbf':x) = x
369 |           stripUnicodeBOM x = x
370 | 
371 | {- | Pull out the channel part of the document.
372 | 
373 | Note that HaXml defines CFilter as:
374 | 
375 | > type CFilter = Content -> [Content]
376 | -}
377 | channel :: CFilter
378 | channel = tag "rss" /> tag "channel"
379 | 
380 | getTitle :: Content -> String
381 | getTitle doc =
382 |     contentToStringDefault "Untitled Podcast"
383 |         (channel /> tag "title" /> txt $ doc)
384 | 
385 | getEnclosures :: Content -> [PodItem]
386 | getEnclosures doc =
387 |     concatMap procPodItem $ getPodItems doc
388 |     where procPodItem :: Content -> [PodItem]
389 |           procPodItem item = concatMap (procEnclosure title) enclosure
390 |               where title = contentToStringDefault "Untitled Episode"
391 |                                (keep /> tag "title" /> txt $ item)
392 |                     enclosure = (keep /> tag "enclosure") item
393 | 
394 |           getPodItems :: CFilter
395 |           getPodItems = channel /> tag "item"
396 | 
397 |           procEnclosure :: String -> Content -> [PodItem]
398 |           procEnclosure title enclosure =
399 |               map makePodItem (showattr "url" enclosure)
400 |               where makePodItem :: Content -> PodItem
401 |                     makePodItem x = PodItem {itemtitle = title,
402 |                                        enclosureurl = contentToString [x]}
403 | 
404 | {- | Convert [Content] to a printable String, with a default if the
405 | passed-in [Content] is [], signifying a lack of a match. -}
406 | contentToStringDefault :: String -> [Content] -> String
407 | contentToStringDefault msg [] = msg
408 | contentToStringDefault _ x = contentToString x
409 | 
410 | {- | Convert [Content] to a printable string, taking care to unescape it.
411 | 
412 | An implementation without unescaping would simply be:
413 | 
414 | > contentToString = concatMap (show . content)
415 | 
416 | Because HaXml's unescaping only works on Elements, we must make sure that
417 | whatever Content we have is wrapped in an Element, then use txt to
418 | pull the insides back out. -}
419 | contentToString :: [Content] -> String
420 | contentToString =
421 |     concatMap procContent
422 |     where procContent x =
423 |               verbatim $ keep /> txt $ CElem (unesc (fakeElem x))
424 | 
425 |           fakeElem :: Content -> Element
426 |           fakeElem x = Elem "fake" [] [x]
427 | 
428 |           unesc :: Element -> Element
429 |           unesc = xmlUnEscape stdXmlEscaper
430 | #+END_SRC
431 | 
432 | Let's look at this code. First, we declare two types: ~PodItem~
433 | and ~Feed~. We will be transforming the XML document into a
434 | ~Feed~, which then contains items. We also provide a function to
435 | convert a ~PodItem~ into an ~Episode~ as defined in
436 | ~PodTypes.hs~.
437 | 
438 | Next, it is on to parsing. The ~parse~ function takes a ~String~
439 | representing the XML content as well as a ~String~ representing a
440 | name to use in error messages, and returns a ~Feed~.
441 | 
442 | HaXml is designed as a "filter" converting data of one type to
443 | another. It can be a simple straightforward conversion of XML to
444 | XML, or of XML to Haskell data, or of Haskell data to XML. HaXml
445 | has a data type called ~CFilter~, which is defined like this:
446 | 
447 | #+BEGIN_SRC haskell
448 | type CFilter = Content -> [Content]
449 | #+END_SRC
450 | 
451 | That is, a ~CFilter~ takes a fragment of an XML document and
452 | returns 0 or more fragments. A ~CFilter~ might be asked to find
453 | all children of a specified tag, all tags with a certain name, the
454 | literal text contained within a part of an XML document, or any of
455 | a number of other things. There is also an operator ~(/>)~ that
456 | chains ~CFilter~ functions together. All of the data that we're
457 | interested in occurs within the ~<channel>~ tag, so first we want
458 | to get at that. We define a simple ~CFilter~:
459 | 
460 | #+BEGIN_SRC haskell
461 | channel = tag "rss" /> tag "channel"
462 | #+END_SRC
463 | 
464 | When we pass a document to ~channel~, it will search the top level
465 | for the tag named ~rss~. Then, within that, it will look for the
466 | ~channel~ tag.
467 | 
468 | The rest of the program follows this basic approach. ~txt~
469 | extracts the literal text from a tag, and by using ~CFilter~
470 | functions, we can get at any part of the document.
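    |
    | To make the idea of chaining filters concrete, here is a toy model
    | that runs without HaXml. The ~Content~ type and the definitions of
    | ~tag~, ~txt~, ~children~, and ~(/>)~ below are simplified stand-ins
    | written for this illustration; HaXml's real types carry more
    | information, so treat this as a sketch of the concept rather than
    | the library's implementation.
    |
    | #+BEGIN_SRC haskell
    | -- A toy document model: simplified stand-in for HaXml's Content.
    | data Content = Elem String [Content]  -- element name and children
    |              | Text String            -- literal text
    |              deriving (Eq, Show)
    |
    | type CFilter = Content -> [Content]
    |
    | -- Keep the node only if it is an element with the given name.
    | tag :: String -> CFilter
    | tag name c@(Elem n _) | n == name = [c]
    | tag _ _ = []
    |
    | -- Keep the node only if it is literal text.
    | txt :: CFilter
    | txt c@(Text _) = [c]
    | txt _ = []
    |
    | -- All immediate children of an element.
    | children :: CFilter
    | children (Elem _ cs) = cs
    | children _ = []
    |
    | -- Apply f, descend one level, then apply g to each child found.
    | infixl 5 />
    | (/>) :: CFilter -> CFilter -> CFilter
    | f /> g = concatMap g . concatMap children . f
    |
    | channel :: CFilter
    | channel = tag "rss" /> tag "channel"
    |
    | sample :: Content
    | sample = Elem "rss" [Elem "channel" [Elem "title" [Text "Haskell Radio"]]]
    |
    | main :: IO ()
    | main = print (channel /> tag "title" /> txt $ sample)
    | -- prints [Text "Haskell Radio"]
    | #+END_SRC
    |
    | Each filter returns a list, so a tag that does not match simply
    | yields ~[]~ and the rest of the chain has nothing to work on. This is
    | why ~contentToStringDefault~ in ~PodParser.hs~ checks for an empty
    | list to detect a missing title.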
471 | 
472 | ** Downloading
473 | 
474 | The next part of our program is a module to download data. We'll
475 | need to download two different types of data: the content of a
476 | podcast, and the audio for each episode. In the former case, we'll
477 | parse the data and update our database. For the latter, we'll
478 | write the data out to a file on disk.
479 | 
480 | We'll be downloading from HTTP servers, so we'll use a Haskell
481 | [[http://www.haskell.org/http/][HTTP library]]. For downloading
482 | podcast feeds, we'll download the document, parse it, and update
483 | the database. For episode audio, we'll download the file, write it
484 | to disk, and mark it downloaded in the database. Here's the code:
485 | 
486 | #+CAPTION: PodDownload.hs
487 | #+BEGIN_SRC haskell
488 | module PodDownload where
489 | import PodTypes
490 | import PodDB
491 | import PodParser
492 | import Network.HTTP
493 | import System.IO
494 | import Database.HDBC
495 | import Data.Maybe
496 | import Network.URI
497 | 
498 | {- | Download a URL.  (Left errorMessage) if an error,
499 | (Right doc) if success. -}
500 | downloadURL :: String -> IO (Either String String)
501 | downloadURL url =
502 |     do resp <- simpleHTTP request
503 |        case resp of
504 |          Left x -> return $ Left ("Error connecting: " ++ show x)
505 |          Right r ->
506 |              case rspCode r of
507 |                (2,_,_) -> return $ Right (rspBody r)
508 |                (3,_,_) -> -- An HTTP redirect
509 |                  case findHeader HdrLocation r of
510 |                    Nothing -> return $ Left (show r)
511 |                    Just url -> downloadURL url
512 |                _ -> return $ Left (show r)
513 |     where request = Request {rqURI = uri,
514 |                              rqMethod = GET,
515 |                              rqHeaders = [],
516 |                              rqBody = ""}
517 |           uri = fromJust $ parseURI url
518 | 
519 | {- | Update the podcast in the database. -}
520 | updatePodcastFromFeed :: IConnection conn => conn -> Podcast -> IO ()
521 | updatePodcastFromFeed dbh pc =
522 |     do resp <- downloadURL (castURL pc)
523 |        case resp of
524 |          Left x -> putStrLn x
525 |          Right doc -> updateDB doc
526 | 
527 |     where updateDB doc =
528 |               do mapM_ (addEpisode dbh) episodes
529 |                  commit dbh
530 |               where feed = parse doc (castURL pc)
531 |                     episodes = map (item2ep pc) (items feed)
532 | 
533 | {- | Downloads an episode, returning a String representing
534 | the filename it was placed into, or Nothing on error. -}
535 | getEpisode :: IConnection conn => conn -> Episode -> IO (Maybe String)
536 | getEpisode dbh ep =
537 |     do resp <- downloadURL (epURL ep)
538 |        case resp of
539 |          Left x -> do putStrLn x
540 |                       return Nothing
541 |          Right doc ->
542 |              do file <- openBinaryFile filename WriteMode
543 |                 hPutStr file doc
544 |                 hClose file
545 |                 updateEpisode dbh (ep {epDone = True})
546 |                 commit dbh
547 |                 return (Just filename)
548 |           -- This function ought to apply an extension based on the filetype
549 |     where filename = "pod." ++ (show . castId . epCast $ ep) ++ "." ++
550 |                      (show (epId ep)) ++ ".mp3"
551 | #+END_SRC
552 | 
553 | This module defines three functions: ~downloadURL~, which simply
554 | downloads a URL and returns it as a ~String~;
555 | ~updatePodcastFromFeed~, which downloads an XML feed file, parses
556 | it, and updates the database; and ~getEpisode~, which downloads a
557 | given episode and marks it done in the database.
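    |
    | Note that ~downloadURL~ as written follows redirects for as long as
    | the server keeps sending them. One possible refinement, sketched
    | here under our own assumptions, is to cap the number of hops. The
    | ~Reply~ type, ~fetchBounded~, and ~fakeFetch~ are hypothetical names
    | invented for this sketch; the fake server lets it run without any
    | network access.
    |
    | #+BEGIN_SRC haskell
    | import Data.Functor.Identity (Identity, runIdentity)
    |
    | -- | What one request produced, reduced to the three cases that
    | -- downloadURL distinguishes.  (Hypothetical type for this sketch.)
    | data Reply = Body String      -- 2xx: the document
    |            | MovedTo String   -- 3xx with a Location header
    |            | Failure String   -- anything else
    |
    | -- | Follow redirects, but give up after a fixed number of hops.
    | fetchBounded :: Monad m
    |              => Int                    -- ^ redirects still allowed
    |              -> (String -> m Reply)    -- ^ performs a single request
    |              -> String                 -- ^ URL to fetch
    |              -> m (Either String String)
    | fetchBounded hops fetch url = do
    |   reply <- fetch url
    |   case reply of
    |     Body doc    -> return (Right doc)
    |     Failure msg -> return (Left msg)
    |     MovedTo next
    |       | hops <= 0 -> return (Left ("Too many redirects at " ++ url))
    |       | otherwise -> fetchBounded (hops - 1) fetch next
    |
    | -- | A fake server: "/a" redirects to "/b"; "/loop" redirects to itself.
    | fakeFetch :: String -> Identity Reply
    | fakeFetch "/a"    = return (MovedTo "/b")
    | fakeFetch "/b"    = return (Body "feed contents")
    | fakeFetch "/loop" = return (MovedTo "/loop")
    | fakeFetch other   = return (Failure ("Not found: " ++ other))
    |
    | main :: IO ()
    | main = do
    |   print (runIdentity (fetchBounded 5 fakeFetch "/a"))
    |   -- prints Right "feed contents"
    |   print (runIdentity (fetchBounded 5 fakeFetch "/loop"))
    |   -- prints Left "Too many redirects at /loop"
    | #+END_SRC
    |
    | Keeping the single-request step as a parameter means the redirect
    | logic can be exercised with a pure stand-in, while a real version
    | would pass an ~IO~ action built on ~simpleHTTP~.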
558 | 
559 | #+BEGIN_WARNING
560 | Warning
561 | 
562 | The HTTP library used here does not read the HTTP result lazily.
563 | Consequently, it can consume a large amount of RAM when downloading
564 | large files such as podcast episodes. Other libraries
565 | are available that do not have this limitation. We used this one
566 | because it is stable, easy to install, and reasonably easy to use.
567 | We suggest mini-http, available from Hackage, for serious HTTP
568 | needs.
569 | #+END_WARNING
570 | 
571 | ** Main Program
572 | 
573 | Finally, we need a main program to tie it all together. Here's our
574 | main module:
575 | 
576 | #+CAPTION: PodMain.hs
577 | #+BEGIN_SRC haskell
578 | module Main where
579 | 
580 | import PodDownload
581 | import PodDB
582 | import PodTypes
583 | import System.Environment
584 | import Database.HDBC
585 | import Network.Socket(withSocketsDo)
586 | 
587 | main = withSocketsDo $ handleSqlError $
588 |     do args <- getArgs
589 |        dbh <- connect "pod.db"
590 |        case args of
591 |          ["add", url] -> add dbh url
592 |          ["update"] -> update dbh
593 |          ["download"] -> download dbh
594 |          ["fetch"] -> do update dbh
595 |                          download dbh
596 |          _ -> syntaxError
597 |        disconnect dbh
598 | 
599 | add dbh url =
600 |     do addPodcast dbh pc
601 |        commit dbh
602 |     where pc = Podcast {castId = 0, castURL = url}
603 | 
604 | update dbh =
605 |     do pclist <- getPodcasts dbh
606 |        mapM_ procPodcast pclist
607 |     where procPodcast pc =
608 |               do putStrLn $ "Updating from " ++ (castURL pc)
609 |                  updatePodcastFromFeed dbh pc
610 | 
611 | download dbh =
612 |     do pclist <- getPodcasts dbh
613 |        mapM_ procPodcast pclist
614 |     where procPodcast pc =
615 |               do putStrLn $ "Considering " ++ (castURL pc)
616 |                  episodelist <- getPodcastEpisodes dbh pc
617 |                  let dleps = filter (\ep -> epDone ep == False)
618 |                              episodelist
619 |                  mapM_ procEpisode dleps
620 |           procEpisode ep =
621 |               do putStrLn $ "Downloading " ++ (epURL ep)
622 |                  getEpisode dbh ep
623 | 
624 | syntaxError = putStrLn
625 |   "Usage: pod command [args]\n\
626 |   \\n\
627 |   \pod add url      Adds a new podcast with the given URL\n\
628 |   \pod download     Downloads all pending episodes\n\
629 |   \pod fetch        Updates, then downloads\n\
630 |   \pod update       Downloads podcast feeds, looks for new episodes\n"
631 | #+END_SRC
632 | 
633 | We have a very simple command-line parser with a function to
634 | indicate a command-line syntax error, plus small functions to
635 | handle the different command-line arguments.
636 | 
637 | You can compile this program with a command like this:
638 | 
639 | #+BEGIN_SRC screen
640 | ghc --make -O2 -o pod -package HTTP -package HaXml -package network \
641 |     -package HDBC -package HDBC-sqlite3 PodMain.hs
642 | #+END_SRC
643 | 
644 | Alternatively, you could use a Cabal file as documented in
645 | [[file:5-writing-a-library.org::*Creating a package][the section called "Creating a package"]]:
646 | 
647 | #+CAPTION: pod.cabal
648 | #+BEGIN_SRC
649 | Name: pod
650 | Version: 1.0.0
651 | Build-type: Simple
652 | Build-Depends: HTTP, HaXml, network, HDBC, HDBC-sqlite3, base
653 | 
654 | Executable: pod
655 | Main-Is: PodMain.hs
656 | GHC-Options: -O2
657 | #+END_SRC
658 | 
659 | Also, you'll want a simple ~Setup.hs~ file:
660 | 
661 | #+BEGIN_SRC haskell
662 | import Distribution.Simple
663 | main = defaultMain
664 | #+END_SRC
665 | 
666 | Now, to build with Cabal, you just run:
667 | 
668 | #+BEGIN_SRC screen
669 | runghc Setup.hs configure
670 | runghc Setup.hs build
671 | #+END_SRC
672 | 
673 | And you'll find a ~dist~ directory containing your output. To
674 | install the program system-wide, run ~runghc Setup.hs install~.
675 | 


--------------------------------------------------------------------------------
/23-gui-programming-with-gtk2hs.org:
--------------------------------------------------------------------------------
  1 | * Chapter 23. GUI Programming with gtk2hs
  2 | 
  3 | Throughout this book, we have been developing simple text-based
  4 | tools. While these are often ideal interfaces, sometimes a
  5 | graphical user interface (GUI) is required. There are several GUI
  6 | toolkits available for Haskell. In this chapter, we will look at
  7 | one of them, gtk2hs.[fn:1]
  8 | 
  9 | ** Installing gtk2hs
 10 | 
 11 | Before we dive in to working with gtk2hs, you'll need to get it
 12 | installed. On most Linux, BSD, or other POSIX platforms, you will
 13 | find ready-made gtk2hs packages. You will generally need to
 14 | install the GTK+ development environment, Glade, and gtk2hs. The
 15 | specifics of doing so vary by distribution.
 16 | 
 17 | Windows and Mac developers should consult the gtk2hs downloads
 18 | site at [[http://www.haskell.org/gtk2hs/download/]]. Begin by
 19 | downloading gtk2hs from there. Then you will also need Glade
 20 | version 3. Mac developers can find this at
 21 | [[http://www.macports.org/]], while Windows developers should
 22 | consult [[http://sourceforge.net/projects/gladewin32]].
 23 | 
 24 | ** Overview of the GTK+ Stack
 25 | 
 26 | Before diving in to the code, let's pause a brief moment and
 27 | consider the architecture of the system we are going to use. First
 28 | off, we have GTK+. GTK+ is a cross-platform GUI-building toolkit,
 29 | implemented in C. It runs on Windows, Mac, Linux, BSDs, and more.
 30 | It is also the toolkit beneath the Gnome desktop environment.
 31 | 
 32 | Next, we have Glade. Glade is a user interface designer, which
 33 | lets you graphically lay out your application's windows and
 34 | dialogs. Glade saves the interface in XML files, which your
 35 | application will load at runtime.
 36 | 
 37 | The last piece of this puzzle is gtk2hs. This is the Haskell
 38 | binding for GTK+, Glade, and several related libraries. It is one
 39 | of many language bindings available for GTK+.
 40 | 
 41 | ** User Interface Design with Glade
 42 | 
 43 | In this chapter, we are going to develop a GUI for the podcast
 44 | downloader we first developed in
 45 | [[file:22-web-client-programming.org][Chapter 22, /Extended Example: Web Client Programming/]]. Our first
 46 | task is to design the user interface in Glade. Once we have
 47 | accomplished that, we will write the Haskell code to integrate it
 48 | with the application.
 49 | 
 50 | Because this is a Haskell book, rather than a GUI design book, we
 51 | will move fast through some of these early parts. For more
 52 | information on interface design with Glade, you may wish to refer
 53 | to one of these resources:
 54 | 
 55 | - The Glade homepage, which contains documentation for Glade.
 56 |   [[http://glade.gnome.org/]]
 57 | - The GTK+ homepage contains information about the different
 58 |   widgets. Refer to the documentation section, then the stable GTK
 59 |   documentation area. [[http://www.gtk.org/]]
 60 | - The gtk2hs homepage also has a useful documentation section,
 61 |   which contains an API reference to gtk2hs as well as a glade
 62 |   tutorial. [[http://www.haskell.org/gtk2hs/documentation/]]
 63 | 
 64 | *** Glade Concepts
 65 | 
 66 | Glade is a user interface design tool. It lets us use a graphical
 67 | interface to design our graphical interface. We could build up the
 68 | window components using a bunch of calls to GTK+ functions, but it
 69 | is usually easier to do this with Glade.
 70 | 
 71 | The fundamental "thing" we work with in GTK+ is the /widget/. A
 72 | widget represents any part of the GUI, and may contain other
 73 | widgets. Some examples of widgets include a window, dialog box,
 74 | button, and text within the button.
 75 | 
 76 | Glade, then, is a widget layout tool. We set up a whole tree of
 77 | widgets, with top-level windows at the top of the tree. You can
 78 | think of Glade and widgets in somewhat the same terms as HTML: you
 79 | can arrange widgets in a table-like layout, set up padding rules,
 80 | and structure the entire description in a hierarchical way.
 81 | 
 82 | Glade saves the widget descriptions into an XML file. Our program
 83 | loads this XML file at runtime. We load the widgets by asking the
 84 | Glade runtime library to load a widget with a specific name.
 85 | 
 86 | Here's a screenshot of an example working with Glade to design our
 87 | application's main screen:
 88 | 
 89 | [[file:figs/gui-glade-3.png]]
 90 | 
 91 | In the downloadable material available for this book, you can find
 92 | the full Glade XML file as ~podresources.glade~. You can load this
 93 | file in Glade and edit it if you wish.
 94 | 
 95 | ** Event-Driven Programming
 96 | 
 97 | GTK+, like many GUI toolkits, is an /event-driven/ toolkit. That
 98 | means that instead of, say, displaying a dialog box and waiting
 99 | for the user to click on a button, we instead tell gtk2hs what
100 | function to call if a certain button is clicked, but don't sit
101 | there waiting for a click in the dialog box.
102 | 
103 | This is different from the model traditionally used for console
104 | programs. When you think about it, though, it almost has to be. A
105 | GUI program could have multiple windows open, and writing code to
106 | sit there waiting for input in the particular combination of open
107 | windows could be a complicated proposition.
108 | 
109 | Event-driven programming complements Haskell nicely. As we've
110 | discussed over and over in this book, functional languages thrive
111 | on passing around functions. So we'll be passing functions to
112 | gtk2hs that get called when certain events occur. These are known
113 | as /callback functions/.
114 | 
115 | At the core of a GTK+ program is the /main loop/. This is the part
116 | of the program that waits for actions from the user or commands
117 | from the program and carries them out. The GTK+ main loop is
118 | handled entirely by GTK+. To us, it looks like an I/O action that
119 | we execute, that doesn't return until the GUI has been disposed
120 | of.
121 | 
122 | Since the main loop is responsible for doing everything from
123 | handling clicks of a mouse to redrawing a window when it has been
124 | uncovered, it must always be available. We can't just run a
125 | long-running task—such as downloading a podcast episode—from
126 | within the main loop. This would make the GUI unresponsive, and
127 | actions such as clicking a Cancel button wouldn't be processed in
128 | a timely manner.
129 | 
130 | Therefore, we will be using multithreading to handle these
131 | long-running tasks. More information on multithreading can be
132 | found in
133 | [[file:24-concurrent-and-multicore-programming.org][Chapter 24, /Concurrent and multicore programming/]]. For now, just know that
134 | we will use ~forkIO~ to create new threads for long-running tasks
135 | such as downloading podcast feeds and episodes. For very quick
136 | tasks, such as adding a new podcast to the database, we will not
137 | bother with a separate thread since it will be executed so fast
138 | the user will never notice.
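    |
    | As a rough illustration of the idea, independent of GTK+, the
    | following sketch forks a stand-in long-running task while the main
    | thread stays free to do other work. The names ~slowTask~ and ~done~
    | are our own and are not part of the podcatcher; ~threadDelay~ merely
    | simulates a slow download.
    |
    | #+BEGIN_SRC haskell
    | import Control.Concurrent (forkIO, threadDelay)
    | import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
    |
    | -- Hypothetical stand-in for a slow operation such as a download.
    | slowTask :: IO ()
    | slowTask = threadDelay 500000   -- half a second
    |
    | main :: IO ()
    | main =
    |     do done <- newEmptyMVar
    |        -- Run the slow task in its own thread; forkIO returns at once.
    |        _ <- forkIO (slowTask >> putMVar done ())
    |        putStrLn "Main thread is still responsive."
    |        -- Block until the worker signals that it has finished.
    |        takeMVar done
    |        putStrLn "Worker finished."
    | #+END_SRC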
139 | 
140 | ** Initializing the GUI
141 | 
142 | Our first steps are going to involve initializing the GUI for our
143 | program. For reasons that we'll explain in
144 | [[file:23-gui-programming-with-gtk2hs.org::*Using Cabal][the section called "Using Cabal"]], we have a small
145 | file called ~PodLocalMain.hs~ that loads ~PodMainGUI~ and passes to
146 | it the path to ~podresources.glade~, the XML file saved by Glade
147 | that gives the information about our GUI widgets.
148 | 
149 | #+CAPTION: PodLocalMain.hs
150 | #+BEGIN_SRC haskell
151 | module Main where
152 | 
153 | import qualified PodMainGUI
154 | 
155 | main = PodMainGUI.main "podresources.glade"
156 | #+END_SRC
157 | 
158 | Now, let's consider ~PodMainGUI.hs~. This file is the only
159 | Haskell source file that we had to modify from the example in
160 | [[file:22-web-client-programming.org][Chapter 22, /Extended Example: Web Client Programming/]] to make it work as a GUI.
161 | Let's start by looking at the start of our new ~PodMainGUI.hs~
162 | file—we've renamed it from ~PodMain.hs~ for clarity.
163 | 
164 | #+CAPTION: PodMainGUI.hs
165 | #+BEGIN_SRC haskell
166 | module PodMainGUI where
167 | 
168 | import PodDownload
169 | import PodDB
170 | import PodTypes
171 | import System.Environment
172 | import Database.HDBC
173 | import Network.Socket(withSocketsDo)
174 | 
175 | -- GUI libraries
176 | 
177 | import Graphics.UI.Gtk hiding (disconnect)
178 | import Graphics.UI.Gtk.Glade
179 | 
180 | -- Threading
181 | 
182 | import Control.Concurrent
183 | #+END_SRC
184 | 
185 | This first part of ~PodMainGUI.hs~ is similar to our non-GUI
186 | version. We import three additional components, however. First, we
187 | have ~Graphics.UI.Gtk~, which provides most of the GTK+ functions
188 | we will be using. Both this module and ~Database.HDBC~ provide a
189 | function named ~disconnect~. Since we'll be using the HDBC
190 | version, but not the GTK+ version, we don't import that function
191 | from ~Graphics.UI.Gtk~. ~Graphics.UI.Gtk.Glade~ contains functions
192 | needed for loading and working with our Glade file.
193 | 
194 | We also import ~Control.Concurrent~, which has the basics needed
195 | for multi-threaded programming. We'll use a few functions from
196 | here as described above once we get into the guts of the program.
197 | Next, let's define a type to store information about our GUI.
198 | 
199 | #+CAPTION: PodMainGUI.hs
200 | #+BEGIN_SRC haskell
201 | -- | Our main GUI type
202 | data GUI = GUI {
203 |       mainWin :: Window,
204 |       mwAddBt :: Button,
205 |       mwUpdateBt :: Button,
206 |       mwDownloadBt :: Button,
207 |       mwFetchBt :: Button,
208 |       mwExitBt :: Button,
209 |       statusWin :: Dialog,
210 |       swOKBt :: Button,
211 |       swCancelBt :: Button,
212 |       swLabel :: Label,
213 |       addWin :: Dialog,
214 |       awOKBt :: Button,
215 |       awCancelBt :: Button,
216 |       awEntry :: Entry}
217 | #+END_SRC
218 | 
219 | Our new ~GUI~ type stores all the widgets we will care about in
220 | the entire program. Large programs may not wish to have a
221 | monolithic type like this. For this small example, it makes sense
222 | because it can be easily passed around to different functions, and
223 | we'll know that we always have the information we need available.
224 | 
225 | Within this record, we have fields for a ~Window~ (a top-level
226 | window), ~Dialog~ (dialog window), ~Button~ (clickable button),
227 | ~Label~ (piece of text), and ~Entry~ (place for the user to enter
228 | text). Let's now look at our ~main~ function:
229 | 
230 | #+CAPTION: PodMainGUI.hs
231 | #+BEGIN_SRC haskell
232 | main :: FilePath -> IO ()
233 | main gladepath = withSocketsDo $ handleSqlError $
234 |     do initGUI                  -- Initialize GTK+ engine
235 | 
236 |        -- Every so often, we try to run other threads.
237 |        timeoutAddFull (yield >> return True)
238 |                       priorityDefaultIdle 100
239 | 
240 |        -- Load the GUI from the Glade file
241 |        gui <- loadGlade gladepath
242 | 
243 |        -- Connect to the database
244 |        dbh <- connect "pod.db"
245 | 
246 |        -- Set up our events
247 |        connectGui gui dbh
248 | 
249 |        -- Run the GTK+ main loop; exits after GUI is done
250 |        mainGUI
251 | 
252 |        -- Disconnect from the database at the end
253 |        disconnect dbh
254 | #+END_SRC
255 | 
256 | Remember that the type of this ~main~ function is a little
257 | different than usual because it is being called by ~main~ in
258 | ~PodLocalMain.hs~. We start by calling ~initGUI~, which
259 | initializes the GTK+ system. Next, we have a call to
260 | ~timeoutAddFull~. This call is only needed for multithreaded GTK+
261 | programs. It tells the GTK+ main loop to pause to give other
262 | threads a chance to run every so often.
263 | 
264 | After that, we call our ~loadGlade~ function (see below) to load
265 | the widgets from our Glade XML file. We then connect to our
266 | database and call our ~connectGui~ function to set up our callback
267 | functions. Then, we fire up the GTK+ main loop. We expect it could
268 | be minutes, hours, or even days before ~mainGUI~ returns. When it
269 | does, it means the user has closed the main window or clicked the
270 | Exit button. After that, we disconnect from the database and close
271 | the program. Now, let's look at our ~loadGlade~ function.
272 | 
273 | #+CAPTION: PodMainGUI.hs
274 | #+BEGIN_SRC haskell
275 | loadGlade gladepath =
276 |     do -- Load XML from glade path.
277 |        -- Note: crashes with a runtime error on console if fails!
278 |        Just xml <- xmlNew gladepath
279 | 
280 |        -- Load main window
281 |        mw <- xmlGetWidget xml castToWindow "mainWindow"
282 | 
283 |        -- Load all buttons
284 | 
285 |        [mwAdd, mwUpdate, mwDownload, mwFetch, mwExit, swOK, swCancel,
286 |         auOK, auCancel] <-
287 |            mapM (xmlGetWidget xml castToButton)
288 |            ["addButton", "updateButton", "downloadButton",
289 |             "fetchButton", "exitButton", "okButton", "cancelButton",
290 |             "auOK", "auCancel"]
291 | 
292 |        sw <- xmlGetWidget xml castToDialog "statusDialog"
293 |        swl <- xmlGetWidget xml castToLabel "statusLabel"
294 | 
295 |        au <- xmlGetWidget xml castToDialog "addDialog"
296 |        aue <- xmlGetWidget xml castToEntry "auEntry"
297 | 
298 |        return $ GUI mw mwAdd mwUpdate mwDownload mwFetch mwExit
299 |               sw swOK swCancel swl au auOK auCancel aue
300 | #+END_SRC
301 | 
302 | This function starts by calling ~xmlNew~, which loads the Glade
303 | XML file. It returns ~Nothing~ on error. Here we are using pattern
304 | matching to extract the result value on success. If it fails,
305 | there will be a console (not graphical) exception displayed; one
306 | of the exercises at the end of this chapter addresses this.
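    |
    | To see the difference between a failed pattern match and explicit
    | handling without involving Glade at all, consider this small sketch.
    | ~loadResource~ is a hypothetical stand-in for any loader that returns
    | ~Nothing~ on failure; it is not a gtk2hs function. Matching the
    | result with ~case~ lets the program decide what to do, whereas a
    | failed ~Just x <- ...~ binding in ~IO~ raises an exception.
    |
    | #+BEGIN_SRC haskell
    | -- Hypothetical stand-in for a loader that returns Nothing on failure.
    | loadResource :: FilePath -> IO (Maybe String)
    | loadResource path
    |     | path == "podresources.glade" = return (Just "<glade-interface/>")
    |     | otherwise                    = return Nothing
    |
    | main :: IO ()
    | main =
    |     do result <- loadResource "missing.glade"
    |        -- Handle both outcomes explicitly instead of pattern
    |        -- matching on Just in the binding.
    |        case result of
    |          Nothing  -> putStrLn "Could not load missing.glade"
    |          Just xml -> putStrLn ("Loaded: " ++ xml)
    | #+END_SRC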
307 | 
308 | Now that we have Glade's XML file loaded, you will see a bunch of
309 | calls to ~xmlGetWidget~. This Glade function is used to load the
310 | XML definition of a widget, and return a GTK+ widget type for that
311 | widget. We have to pass along to that function a value indicating
312 | what GTK+ type we expect—we'll get a runtime error if these don't
313 | match.
314 | 
315 | We start by creating a widget for the main window. It is loaded
316 | from the XML widget defined with name ~"mainWindow"~ and stored in
317 | the ~mw~ variable. We then use pattern matching and ~mapM~ to load
318 | up all the buttons. Then, we have two dialogs, a label, and an
319 | entry to load. Finally, we use all of these to build up the GUI
320 | type and return it. Next, we need to set our callback functions up
321 | as event handlers.
322 | 
323 | #+CAPTION: PodMainGUI.hs
324 | #+BEGIN_SRC haskell
325 | connectGui gui dbh =
326 |     do -- When the close button is clicked, terminate GUI loop
327 |        -- by calling GTK mainQuit function
328 |        onDestroy (mainWin gui) mainQuit
329 | 
330 |        -- Main window buttons
331 |        onClicked (mwAddBt gui) (guiAdd gui dbh)
332 |        onClicked (mwUpdateBt gui) (guiUpdate gui dbh)
333 |        onClicked (mwDownloadBt gui) (guiDownload gui dbh)
334 |        onClicked (mwFetchBt gui) (guiFetch gui dbh)
335 |        onClicked (mwExitBt gui) mainQuit
336 | 
337 |        -- We leave the status window buttons for later
338 | #+END_SRC
339 | 
340 | We start out the ~connectGui~ function by calling ~onDestroy~.
341 | This means that when somebody clicks on the operating system's
342 | close button (typically an X in the titlebar on Windows or Linux,
343 | or a red circle on Mac OS X) on the main window, we call the
344 | ~mainQuit~ function. ~mainQuit~ closes all GUI windows and
345 | terminates the GTK+ main loop.
346 | 
347 | Next, we call ~onClicked~ to register event handlers for clicking
348 | on our five different buttons. For buttons, these handlers are
349 | also called if the user selects the button via the keyboard.
350 | Clicking on these buttons will call our functions such as
351 | ~guiAdd~, passing along the GUI record as well as a database
352 | handle.
353 | 
354 | At this point, we have completely defined the main window for the
355 | GUI podcatcher. It looks like this:
356 | 
357 | [[file:figs/gui-pod-mainwin.png]]
358 | 
359 | ** The Add Podcast Window
360 | 
361 | Now that we've covered the main window, let's talk about the other
362 | windows that our application presents, starting with the Add
363 | Podcast window. When the user clicks the button to add a new
364 | podcast, we need to pop up a dialog box to prompt for the URL of
365 | the podcast. We have defined this dialog box in Glade, so all we
366 | need to do is set it up.
367 | 
368 | #+CAPTION: PodMainGUI.hs
369 | #+BEGIN_SRC haskell
370 | guiAdd gui dbh =
371 |     do -- Initialize the add URL window
372 |        entrySetText (awEntry gui) ""
373 |        onClicked (awCancelBt gui) (widgetHide (addWin gui))
374 |        onClicked (awOKBt gui) procOK
375 | 
376 |        -- Show the add URL window
377 |        windowPresent (addWin gui)
378 |     where procOK =
379 |               do url <- entryGetText (awEntry gui)
380 |                  widgetHide (addWin gui) -- Remove the dialog
381 |                  add dbh url             -- Add to the DB
382 | #+END_SRC
383 | 
384 | We start by calling ~entrySetText~ to set the contents of the
385 | entry box (the place where the user types in the URL) to the empty
386 | string. That's because the same widget gets reused over the
387 | lifetime of the program, and we don't want the last URL the user
388 | entered to remain there. Next, we set up actions for the two
389 | buttons in the dialog. If the user clicks on the cancel button,
390 | we simply remove the dialog box from the screen by calling
391 | ~widgetHide~ on it. If the user clicks the OK button, we call
392 | ~procOK~.
393 | 
394 | ~procOK~ starts by retrieving the supplied URL from the entry
395 | widget. Next, it uses ~widgetHide~ to get rid of the dialog box.
396 | Finally, it calls ~add~ to add the URL to the database. This ~add~
397 | is exactly the same function as we had in the non-GUI version of
398 | the program.
399 | 
400 | The last thing we do in ~guiAdd~ is actually display the pop-up
401 | window. That's done by calling ~windowPresent~, which is the
402 | opposite of ~widgetHide~.
403 | 
404 | Note that the ~guiAdd~ function returns almost immediately. It
405 | just sets up the widgets and causes the box to be displayed; at no
406 | point does it block waiting for input. Here's what the dialog box
407 | looks like:
408 | 
409 | [[file:figs/gui-pod-addwin.png]]
410 | 
411 | ** Long-Running Tasks
412 | 
413 | As we think about the buttons available in the main window, three
414 | of them correspond to tasks that could take a while to complete:
415 | update, download, and fetch. While these operations take place,
416 | we'd like to do two things with our GUI: provide the user with the
417 | status of the operation, and provide the user with the ability to
418 | cancel the operation as it is in progress.
419 | 
420 | Since all three of these things are very similar operations, it
421 | makes sense to provide a generic way to handle this interaction.
422 | We have defined a single status window widget in the Glade file
423 | that will be used by all three of these. In our Haskell source
424 | code, we'll define a generic ~statusWindow~ function that will be
425 | used by all three of these operations as well.
426 | 
427 | ~statusWindow~ takes four parameters: the GUI information, the
428 | database information, a ~String~ giving the title of the window,
429 | and a function that will perform the operation. This function will
430 | itself be passed a function that it can call to report its
431 | progress. Here's the code:
432 | 
433 | #+CAPTION: PodMainGUI.hs
434 | #+BEGIN_SRC haskell
435 | statusWindow :: IConnection conn =>
436 |                 GUI
437 |              -> conn
438 |              -> String
439 |              -> ((String -> IO ()) -> IO ())
440 |              -> IO ()
441 | statusWindow gui dbh title func =
442 |     do -- Clear the status text
443 |        labelSetText (swLabel gui) ""
444 | 
445 |        -- Disable the OK button, enable Cancel button
446 |        widgetSetSensitivity (swOKBt gui) False
447 |        widgetSetSensitivity (swCancelBt gui) True
448 | 
449 |        -- Set the title
450 |        windowSetTitle (statusWin gui) title
451 | 
452 |        -- Start the operation
453 |        childThread <- forkIO childTasks
454 | 
455 |        -- Define what happens when clicking on Cancel
456 |        onClicked (swCancelBt gui) (cancelChild childThread)
457 | 
458 |        -- Show the window
459 |        windowPresent (statusWin gui)
460 |     where childTasks =
461 |               do updateLabel "Starting thread..."
462 |                  func updateLabel
463 |                  -- After the child task finishes, enable OK
464 |                  -- and disable Cancel
465 |                  enableOK
466 | 
467 |           enableOK =
468 |               do widgetSetSensitivity (swCancelBt gui) False
469 |                  widgetSetSensitivity (swOKBt gui) True
470 |                  onClicked (swOKBt gui) (widgetHide (statusWin gui))
471 |                  return ()
472 | 
473 |           updateLabel text =
474 |               labelSetText (swLabel gui) text
475 |           cancelChild childThread =
476 |               do killThread childThread
477 |                  yield
478 |                  updateLabel "Action has been cancelled."
479 |                  enableOK
480 | #+END_SRC
481 | 
482 | This function starts by clearing the label text from the last run.
483 | Next, we disable (gray out) the OK button and enable the cancel
484 | button. While the operation is in progress, clicking OK doesn't
485 | make much sense. And when it's done, clicking Cancel doesn't make
486 | much sense.
487 | 
488 | Next, we set the title of the window. The title is the part that
489 | is displayed by the system in the title bar of the window.
490 | After that, we start off the new thread (represented by ~childTasks~)
491 | and save off its thread ID. Then, we define what to do if the user
492 | clicks on Cancel: we call ~cancelChild~, passing along the thread
493 | ID. Finally, we call ~windowPresent~ to show the status window.
494 | 
495 | In ~childTasks~, we display a message saying that we're starting
496 | the thread. Then we call the actual worker function, passing
497 | ~updateLabel~ as the function to use for displaying status
498 | messages. Note that a command-line version of the program could
499 | pass ~putStrLn~ here.
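    |
    | The following minimal sketch shows the shape of such a worker. The
    | function ~countTo~ is a made-up example rather than part of the
    | podcatcher; the point is only that the worker never decides where
    | its messages go.
    |
    | #+BEGIN_SRC haskell
    | -- Hypothetical worker: reports progress through whatever
    | -- logging function it is given.
    | countTo :: Int -> (String -> IO ()) -> IO ()
    | countTo n logf =
    |     do mapM_ (\i -> logf ("Step " ++ show i)) [1..n]
    |        logf "Done."
    |
    | -- A console program simply supplies putStrLn as the logger.
    | main :: IO ()
    | main = countTo 3 putStrLn
    | #+END_SRC
    |
    | Because ~countTo 3~ has the type ~(String -> IO ()) -> IO ()~, a
    | worker of this shape could presumably be handed to ~statusWindow~
    | in place of ~putStrLn~-based output.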
500 | 
501 | Finally, after the worker function exits, we call ~enableOK~. This
502 | function disables the cancel button, enables the OK button, and
503 | defines that a click on the OK button causes the status window to
504 | go away.
505 | 
506 | ~updateLabel~ simply calls ~labelSetText~ on the label widget to
507 | update it with the displayed text. Finally, ~cancelChild~ kills
508 | the thread processing the task, updates the label, and enables the
509 | OK button.
510 | 
511 | We now have the infrastructure in place to define our three GUI
512 | functions. They look like this:
513 | 
514 | #+CAPTION: PodMainGUI.hs
515 | #+BEGIN_SRC haskell
516 | guiUpdate :: IConnection conn => GUI -> conn -> IO ()
517 | guiUpdate gui dbh =
518 |     statusWindow gui dbh "Pod: Update" (update dbh)
519 | 
520 | guiDownload gui dbh =
521 |     statusWindow gui dbh "Pod: Download" (download dbh)
522 | 
523 | guiFetch gui dbh =
524 |     statusWindow gui dbh "Pod: Fetch"
525 |                      (\logf -> update dbh logf >> download dbh logf)
526 | #+END_SRC
527 | 
528 | For brevity, we have given the type for only the first one, but
529 | all three have the same type, and Haskell can work them out via
530 | type inference. Notice our implementation of ~guiFetch~. We don't
531 | call ~statusWindow~ twice, but rather combine functions in its
532 | action.
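    |
    | The same combination can be sketched without any GUI code. Here
    | ~stepOne~, ~stepTwo~, and the ~Logger~ synonym are hypothetical
    | names standing in for ~update dbh~, ~download dbh~, and the logging
    | function type; ~both~ sequences the two workers under one logger.
    |
    | #+BEGIN_SRC haskell
    | type Logger = String -> IO ()
    |
    | -- Hypothetical workers standing in for update and download.
    | stepOne, stepTwo :: Logger -> IO ()
    | stepOne logf = logf "Step one complete."
    | stepTwo logf = logf "Step two complete."
    |
    | -- One combined worker: both steps share a single logger.
    | both :: Logger -> IO ()
    | both logf = stepOne logf >> stepTwo logf
    |
    | main :: IO ()
    | main = both putStrLn
    | #+END_SRC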
533 | 
534 | The final piece of the puzzle consists of the three functions that
535 | do our work. ~add~ is unmodified from the command-line chapter.
536 | ~update~ and ~download~ are modified only to take a logging
537 | function instead of calling ~putStrLn~ for status updates.
538 | 
539 | #+CAPTION: PodMainGUI.hs
540 | #+BEGIN_SRC haskell
541 | add dbh url =
542 |     do addPodcast dbh pc
543 |        commit dbh
544 |     where pc = Podcast {castId = 0, castURL = url}
545 | 
546 | update :: IConnection conn => conn -> (String -> IO ()) -> IO ()
547 | update dbh logf =
548 |     do pclist <- getPodcasts dbh
549 |        mapM_ procPodcast pclist
550 |        logf "Update complete."
551 |     where procPodcast pc =
552 |               do logf $ "Updating from " ++ (castURL pc)
553 |                  updatePodcastFromFeed dbh pc
554 | 
555 | download dbh logf =
556 |     do pclist <- getPodcasts dbh
557 |        mapM_ procPodcast pclist
558 |        logf "Download complete."
559 |     where procPodcast pc =
560 |               do logf $ "Considering " ++ (castURL pc)
561 |                  episodelist <- getPodcastEpisodes dbh pc
562 |                  let dleps = filter (\ep -> epDone ep == False)
563 |                              episodelist
564 |                  mapM_ procEpisode dleps
565 |           procEpisode ep =
566 |               do logf $ "Downloading " ++ (epURL ep)
567 |                  getEpisode dbh ep
568 | #+END_SRC
569 | 
570 | Here's what the final result looks like after running an update:
571 | 
572 | [[file:figs/gui-update-complete.png]]
573 | 
574 | ** Using Cabal
575 | 
576 | We presented a Cabal file to build this project for the
577 | command-line version in
578 | [[file:22-web-client-programming.org::*Main Program][the section called "Main Program"]]. We need to make a few tweaks
579 | for it to work with our GUI version. First, there's the obvious
580 | need to add the gtk2hs packages to the list of build dependencies.
581 | There is also the matter of the Glade XML file.
582 | 
583 | Earlier, we wrote a ~PodLocalMain.hs~ that simply assumed this
584 | file was named ~podresources.glade~ and stored in the current
585 | working directory. For a real, system-wide installation, we can't
586 | make that assumption. Moreover, different systems may place the
587 | file at different locations.
588 | 
589 | Cabal provides a way around this problem. It automatically
590 | generates a module that exports functions that can interrogate the
591 | environment. We must add a ~Data-files~ line to our Cabal
592 | description file. This line names all data files that will be part
593 | of a system-wide installation. Then, Cabal will export a
594 | ~Paths_pod~ module (the "pod" part comes from the ~Name~ line in
595 | the Cabal file) that we can interrogate for the location at
596 | runtime. Here's our new Cabal description file:
597 | 
598 | #+CAPTION: pod.cabal
599 | #+BEGIN_SRC
600 | Name: pod
601 | Version: 1.0.0
602 | Build-type: Simple
603 | Build-Depends: HTTP, HaXml, network, HDBC, HDBC-sqlite3, base, 
604 |                gtk, glade
605 | Data-files: podresources.glade
606 | 
607 | Executable: pod
608 | Main-Is: PodCabalMain.hs
609 | GHC-Options: -O2
610 | #+END_SRC
611 | 
612 | And, to go with it, ~PodCabalMain.hs~:
613 | 
614 | #+CAPTION: PodCabalMain.hs
615 | #+BEGIN_SRC haskell
616 | module Main where
617 | 
618 | import qualified PodMainGUI
619 | import Paths_pod(getDataFileName)
620 | 
621 | main =
622 |     do gladefn <- getDataFileName "podresources.glade"
623 |        PodMainGUI.main gladefn
624 | #+END_SRC
625 | 
626 | ** Exercises
627 | 
628 | 1. Present a helpful GUI error message if the call to ~xmlNew~
629 |    returns ~Nothing~.
630 | 2. Modify the podcatcher to be able to run with either the GUI
631 |    or the command-line interface from a single code base. Hint:
632 |    move common code out of ~PodMainGUI.hs~, then have two
633 |    different ~main~ modules, one for the GUI, and one for the
634 |    command line.
635 | 3. Why does ~guiFetch~ combine worker functions instead of calling
636 |    ~statusWindow~ twice?
637 | 
638 | ** Footnotes
639 | 
640 | [fn:1] Several alternatives also exist. Alongside gtk2hs,
641 | wxHaskell is also a prominent cross-platform GUI toolkit.
642 | 


--------------------------------------------------------------------------------
/27-sockets-and-syslog.org:
--------------------------------------------------------------------------------
  1 | * Chapter 27. Sockets and Syslog
  2 | 
  3 | ** Basic Networking
  4 | 
  5 | In several earlier chapters of this book, we have discussed
  6 | services that operate over a network. Two examples are
  7 | client/server databases and web services. When the need arises to
  8 | devise a new protocol, or to communicate with a protocol that
  9 | doesn't have an existing helper library in Haskell, you'll need to
 10 | use the lower-level networking tools in the Haskell library.
 11 | 
 12 | In this chapter, we will discuss these lower-level tools. Network
 13 | communication is a broad topic with entire books devoted to it, so
 14 | rather than teach networking itself, we will show you how to use
 15 | Haskell to apply low-level network knowledge you already have.
 16 | 
 17 | Haskell's networking functions almost always correspond directly
 18 | to familiar C function calls. As most other languages also layer
 19 | atop C, you should find this interface familiar.
 20 | 
 21 | ** Communicating with UDP
 22 | 
 23 | UDP breaks data down into packets. It does not ensure that the
 24 | data reaches its destination, or reaches it only once. It does use
 25 | checksumming to ensure that packets that arrive have not been
 26 | corrupted. UDP tends to be used in applications that are
 27 | performance- or latency-sensitive, in which each individual packet
 28 | of data is less important than the overall performance of the
 29 | system. It may also be used where TCP's behavior isn't the most
 30 | efficient, such as in applications that send short, discrete messages.
 31 | Examples of systems that tend to use UDP include audio and video
 32 | conferencing, time synchronization, network-based filesystems, and
 33 | logging systems.
 34 | 
 35 | *** UDP Client Example: syslog
 36 | 
 37 | The traditional Unix syslog service allows programs to send log
 38 | messages over a network to a central server that records them.
 39 | Some programs are quite performance-sensitive, and may generate a
 40 | large volume of messages. In these programs, it could be more
 41 | important to have the logging impose a minimal performance
 42 | overhead than to guarantee every message is logged. Moreover, it
 43 | may be desirable to continue program operation even if the logging
 44 | server is unreachable. For this reason, UDP is one of the
 45 | protocols supported by syslog for the transmission of log
 46 | messages. The protocol is simple and we present a Haskell
 47 | implementation of a client here.
 48 | 
 49 | #+CAPTION: syslogclient.hs
 50 | #+BEGIN_EXAMPLE
 51 | import Data.Bits
 52 | import Network.Socket
 53 | import Network.BSD
 54 | import Data.List
 55 | import SyslogTypes
 56 | 
 57 | data SyslogHandle =
 58 |     SyslogHandle {slSocket :: Socket,
 59 |                   slProgram :: String,
 60 |                   slAddress :: SockAddr}
 61 | 
 62 | openlog :: HostName             -- ^ Remote hostname, or localhost
 63 |         -> String               -- ^ Port number or name; 514 is default
 64 |         -> String               -- ^ Name to log under
 65 |         -> IO SyslogHandle      -- ^ Handle to use for logging
 66 | openlog hostname port progname =
 67 |     do -- Look up the hostname and port.  Either raises an exception
 68 |        -- or returns a nonempty list.  First element in that list
 69 |        -- is supposed to be the best option.
 70 |        addrinfos <- getAddrInfo Nothing (Just hostname) (Just port)
 71 |        let serveraddr = head addrinfos
 72 | 
 73 |        -- Establish a socket for communication
 74 |        sock <- socket (addrFamily serveraddr) Datagram defaultProtocol
 75 | 
 76 |        -- Save off the socket, program name, and server address in a handle
 77 |        return $ SyslogHandle sock progname (addrAddress serveraddr)
 78 | 
 79 | syslog :: SyslogHandle -> Facility -> Priority -> String -> IO ()
 80 | syslog syslogh fac pri msg =
 81 |     sendstr sendmsg
 82 |     where code = makeCode fac pri
 83 |           sendmsg = "<" ++ show code ++ ">" ++ (slProgram syslogh) ++
 84 |                     ": " ++ msg
 85 | 
 86 |           -- Send until everything is done
 87 |           sendstr :: String -> IO ()
 88 |           sendstr [] = return ()
 89 |           sendstr omsg = do sent <- sendTo (slSocket syslogh) omsg
 90 |                                     (slAddress syslogh)
 91 |                             sendstr (genericDrop sent omsg)
 92 | 
 93 | closelog :: SyslogHandle -> IO ()
 94 | closelog syslogh = sClose (slSocket syslogh)
 95 | 
 96 | {- | Convert a facility and a priority into a syslog code -}
 97 | makeCode :: Facility -> Priority -> Int
 98 | makeCode fac pri =
 99 |     let faccode = codeOfFac fac
100 |         pricode = fromEnum pri
101 |         in
102 |           (faccode `shiftL` 3) .|. pricode
103 | #+END_EXAMPLE
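
The send-until-done loop in ~sendstr~ can be tried out without a
network. The following is a minimal sketch, not part of the original
client: ~sendAllWith~ and ~recordChunks~ are hypothetical names, and
the fake sender merely stands in for ~sendTo~ by accepting at most
four characters per call.

#+BEGIN_SRC haskell
import Data.IORef (newIORef, modifyIORef, readIORef)
import Data.List (genericDrop)

-- Keep calling the supplied send function until the whole message
-- has been handed off.  Like sendstr, this assumes the sender always
-- makes progress; a sender that returns 0 would loop forever.
sendAllWith :: (String -> IO Int) -> String -> IO ()
sendAllWith _ [] = return ()
sendAllWith send msg =
    do sent <- send msg
       sendAllWith send (genericDrop sent msg)

-- Hypothetical stand-in for sendTo: accepts at most 4 characters per
-- call, records each accepted chunk, and returns the chunks in order.
recordChunks :: String -> IO [String]
recordChunks msg =
    do ref <- newIORef []
       let fakeSend s =
               do let chunk = take 4 s
                  modifyIORef ref (chunk :)
                  return (length chunk)
       sendAllWith fakeSend msg
       fmap reverse (readIORef ref)
#+END_SRC

Evaluating ~recordChunks "hello world"~ yields
~["hell","o wo","rld"]~. Keep in mind that with UDP each ~sendTo~
call produces its own datagram, so any continuation chunk would
arrive as a separate packet without the ~<code>~ prefix.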
104 | 
105 | This also requires ~SyslogTypes.hs~, shown here:
106 | 
107 | #+CAPTION: SyslogTypes.hs
108 | #+BEGIN_EXAMPLE
109 | module SyslogTypes where
110 | {- | Priorities define how important a log message is. -}
111 | 
112 | data Priority =
113 |             DEBUG                   -- ^ Debug messages
114 |           | INFO                    -- ^ Information
115 |           | NOTICE                  -- ^ Normal runtime conditions
116 |           | WARNING                 -- ^ General Warnings
117 |           | ERROR                   -- ^ General Errors
118 |           | CRITICAL                -- ^ Severe situations
119 |           | ALERT                   -- ^ Take immediate action
120 |           | EMERGENCY               -- ^ System is unusable
121 |                     deriving (Eq, Ord, Show, Read, Enum)
122 | 
123 | {- | Facilities are used by the system to determine where messages
124 | are sent. -}
125 | 
126 | data Facility =
127 |               KERN                      -- ^ Kernel messages
128 |               | USER                    -- ^ General userland messages
129 |               | MAIL                    -- ^ E-Mail system
130 |               | DAEMON                  -- ^ Daemon (server process) messages
131 |               | AUTH                    -- ^ Authentication or security messages
132 |               | SYSLOG                  -- ^ Internal syslog messages
133 |               | LPR                     -- ^ Printer messages
134 |               | NEWS                    -- ^ Usenet news
135 |               | UUCP                    -- ^ UUCP messages
136 |               | CRON                    -- ^ Cron messages
137 |               | AUTHPRIV                -- ^ Private authentication messages
138 |               | FTP                     -- ^ FTP messages
139 |               | LOCAL0
140 |               | LOCAL1
141 |               | LOCAL2
142 |               | LOCAL3
143 |               | LOCAL4
144 |               | LOCAL5
145 |               | LOCAL6
146 |               | LOCAL7
147 |                 deriving (Eq, Show, Read)
148 | 
149 | facToCode = [
150 |                        (KERN, 0),
151 |                        (USER, 1),
152 |                        (MAIL, 2),
153 |                        (DAEMON, 3),
154 |                        (AUTH, 4),
155 |                        (SYSLOG, 5),
156 |                        (LPR, 6),
157 |                        (NEWS, 7),
158 |                        (UUCP, 8),
159 |                        (CRON, 9),
160 |                        (AUTHPRIV, 10),
161 |                        (FTP, 11),
162 |                        (LOCAL0, 16),
163 |                        (LOCAL1, 17),
164 |                        (LOCAL2, 18),
165 |                        (LOCAL3, 19),
166 |                        (LOCAL4, 20),
167 |                        (LOCAL5, 21),
168 |                        (LOCAL6, 22),
169 |                        (LOCAL7, 23)
170 |            ]
171 | 
172 | codeToFac = map (\(x, y) -> (y, x)) facToCode
173 | 
174 | {- | We can't use enum here because the numbering is discontiguous -}
175 | codeOfFac :: Facility -> Int
176 | codeOfFac f = case lookup f facToCode of
177 |                 Just x -> x
178 |                 _ -> error $ "Internal error in codeOfFac"
179 | 
180 | facOfCode :: Int -> Facility
181 | facOfCode f = case lookup f codeToFac of
182 |                 Just x -> x
183 |                 _ -> error $ "Invalid code in facOfCode"
184 | #+END_EXAMPLE
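
To make the relationship between facilities, priorities, and the
number sent on the wire concrete, here is a small sketch that works on
plain ~Int~ values. ~packCode~ mirrors the arithmetic of ~makeCode~ in
the client; ~unpackCode~ is a hypothetical inverse that does not
appear in the original code.

#+BEGIN_SRC haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))

-- Same arithmetic as makeCode: the facility code occupies the upper
-- bits and the priority code the lowest three.
packCode :: Int -> Int -> Int
packCode faccode pricode = (faccode `shiftL` 3) .|. pricode

-- Hypothetical inverse: recover (facility code, priority code).
unpackCode :: Int -> (Int, Int)
unpackCode code = (code `shiftR` 3, code .&. 7)
#+END_SRC

With the types above, ~codeOfFac USER~ is 1 and ~fromEnum INFO~ is 1,
so ~packCode 1 1~ gives 9, and ~unpackCode 9~ gives ~(1,1)~ again.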
185 | 
186 | With ~ghci~, you can send a message to a local syslog server. You
187 | can use either the example syslog server presented in this
188 | chapter, or an existing syslog server such as you would typically
189 | find on Linux or other POSIX systems. Note that most of these
190 | disable the UDP port by default and you may need to enable UDP
191 | before your vendor-supplied syslog daemon will display received
192 | messages.
193 | 
194 | If you were sending a message to a syslog server on the local
195 | system, you might use a command such as this:
196 | 
197 | #+BEGIN_SRC screen
198 | ghci> :load syslogclient.hs
199 | [1 of 2] Compiling SyslogTypes      ( SyslogTypes.hs, interpreted )
200 | [2 of 2] Compiling Main             ( syslogclient.hs, interpreted )
201 | Ok, modules loaded: SyslogTypes, Main.
202 | ghci> h <- openlog "localhost" "514" "testprog"
203 | Loading package parsec-2.1.0.0 ... linking ... done.
204 | Loading package network-2.1.0.0 ... linking ... done.
205 | ghci> syslog h USER INFO "This is my message"
206 | ghci> closelog h
207 | #+END_SRC
208 | 
209 | *** UDP Syslog Server
210 | 
211 | UDP servers will bind to a specific port on the server machine.
212 | They will accept packets directed to that port and process them.
213 | Since UDP is a stateless, packet-oriented protocol, programmers
214 | normally use a call such as ~recvFrom~ to receive both the data
215 | and information about the machine that sent it, which can be used
216 | for sending back a response.
217 | 
218 | #+CAPTION: syslogserver.hs
219 | #+BEGIN_EXAMPLE
220 | import Data.Bits
221 | import Network.Socket
222 | import Network.BSD
223 | import Data.List
224 | 
225 | type HandlerFunc = SockAddr -> String -> IO ()
226 | 
227 | serveLog :: String              -- ^ Port number or name; 514 is default
228 |          -> HandlerFunc         -- ^ Function to handle incoming messages
229 |          -> IO ()
230 | serveLog port handlerfunc = withSocketsDo $
231 |     do -- Look up the port.  Either raises an exception or returns
232 |        -- a nonempty list.
233 |        addrinfos <- getAddrInfo
234 |                     (Just (defaultHints {addrFlags = [AI_PASSIVE]}))
235 |                     Nothing (Just port)
236 |        let serveraddr = head addrinfos
237 | 
238 |        -- Create a socket
239 |        sock <- socket (addrFamily serveraddr) Datagram defaultProtocol
240 | 
241 |        -- Bind it to the address we're listening to
242 |        bindSocket sock (addrAddress serveraddr)
243 | 
244 |        -- Loop forever processing incoming data.  Ctrl-C to abort.
245 |        procMessages sock
246 |     where procMessages sock =
247 |               do -- Receive one UDP packet, maximum length 1024 bytes,
248 |                  -- and save its content into msg and its source
249 |                  -- IP and port into addr
250 |                  (msg, _, addr) <- recvFrom sock 1024
251 |                  -- Handle it
252 |                  handlerfunc addr msg
253 |                  -- And process more messages
254 |                  procMessages sock
255 | 
256 | -- A simple handler that prints incoming packets
257 | plainHandler :: HandlerFunc
258 | plainHandler addr msg =
259 |     putStrLn $ "From " ++ show addr ++ ": " ++ msg
260 | #+END_EXAMPLE
261 | 
262 | You can run this in ~ghci~. A call to
263 | ~serveLog "1514" plainHandler~ will set up a UDP server on port
264 | 1514 that will use ~plainHandler~ to print out every incoming UDP packet
265 | on that port. Ctrl-C will terminate the program.
266 | 
267 | #+BEGIN_NOTE
268 | In case of problems
269 | 
270 | Getting ~bind: permission denied~ when testing this? Make sure you
271 | use a port number of 1024 or greater. Some operating systems only
272 | allow the ~root~ user to bind to ports less than 1024.
273 | #+END_NOTE
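
The ~plainHandler~ above prints each packet as is. A handler will
often want to separate the numeric code from the rest of the message.
The following sketch shows one way to do that; ~parseCode~ is a
hypothetical helper, not part of ~syslogserver.hs~, and it assumes
messages shaped like the ones our client sends (a ~<code>~ prefix
followed by text).

#+BEGIN_SRC haskell
-- Split "<N>rest" into (N, rest).  Returns Nothing if the prefix is
-- missing or is not a number followed by '>'.
parseCode :: String -> Maybe (Int, String)
parseCode ('<' : rest) =
    case reads rest of
      [(code, '>' : body)] -> Just (code, body)
      _                    -> Nothing
parseCode _ = Nothing
#+END_SRC

For example, ~parseCode "<9>testprog: This is my message"~ evaluates
to ~Just (9,"testprog: This is my message")~, while a message without
the prefix yields ~Nothing~.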
274 | 
275 | ** Communicating with TCP
276 | 
277 | TCP is designed to make data transfer over the Internet as
278 | reliable as possible. TCP traffic is a stream of data. While this
279 | stream gets broken up into individual packets by the operating
280 | system, the packet boundaries are neither known nor relevant to
281 | applications. TCP guarantees that, if traffic is delivered to the
282 | application at all, it has arrived intact, unmodified,
283 | exactly once, and in order. Obviously, things such as a broken
284 | wire can cause traffic to not be delivered, and no protocol can
285 | overcome those limitations.
286 | 
287 | This brings with it some tradeoffs compared with UDP. First of
288 | all, there are a few packets that must be sent at the start of the
289 | TCP conversation to establish the link. For very short
290 | conversations, then, UDP would have a performance advantage. Also,
291 | TCP tries very hard to get data through. If one end of a
292 | conversation tries to send data to the remote, but doesn't receive
293 | an acknowledgment back, it will periodically re-transmit the data
294 | for some time before giving up. This makes TCP robust in the face
295 | of dropped packets. However, it also means that TCP is not the
296 | best choice for real-time protocols that involve things such as
297 | live audio or video.
298 | 
299 | *** Handling Multiple TCP Streams
300 | 
301 | With TCP, connections are stateful. That means that there is a
302 | dedicated logical "channel" between a client and server, rather
303 | than just one-off packets as with UDP. This makes things easy for
304 | client developers. Server applications, however, will almost
305 | always want to handle more than one TCP connection at once. How,
306 | then, do we do this?
307 | 
308 | On the server side, you will first create a socket and bind to a
309 | port, just like UDP. Instead of repeatedly listening for data from
310 | any location, your main loop will be around the ~accept~ call.
311 | Each time a client connects, the server's operating system
312 | allocates a new socket for it. So we have the /master/ socket,
313 | used only to listen for incoming connections, and never to
314 | transmit data. We also have the potential for multiple /child/
315 | sockets to be used at once, each corresponding to a logical TCP
316 | conversation.
317 | 
318 | In Haskell, you will usually use ~forkIO~ to create a separate
319 | lightweight thread to handle the conversation on each child socket.
320 | Haskell has an efficient internal implementation of this that
321 | performs quite well.
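
As a rough illustration of the one-thread-per-conversation idea,
independent of any sockets, the sketch below starts a lightweight
thread for each work item and collects the results. ~handleAll~ is a
hypothetical helper introduced only for this example; a real server
would fork a thread per accepted connection instead.

#+BEGIN_SRC haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Start one lightweight thread per item, then wait for every result.
-- If a worker throws, its MVar is never filled; real code would need
-- to handle that case.
handleAll :: (a -> IO b) -> [a] -> IO [b]
handleAll worker items =
    do boxes <- mapM spawn items
       mapM takeMVar boxes
    where spawn item =
              do box <- newEmptyMVar
                 _ <- forkIO (worker item >>= putMVar box)
                 return box
#+END_SRC

The results come back in the order of the input list, regardless of
the order in which the threads happen to finish.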
322 | 
323 | *** TCP Syslog Server
324 | 
325 | Let's say that we wanted to reimplement syslog using TCP instead
326 | of UDP. We could say that a single message is defined not by being
327 | in a single packet, but is ended by a trailing newline character
328 | ~'\n'~. Any given client could send 0 or more messages to the
329 | server using a given TCP connection. Here's how we might write
330 | that.
331 | 
332 | #+CAPTION: syslogtcpserver.hs
333 | #+BEGIN_EXAMPLE
334 | import Data.Bits
335 | import Network.Socket
336 | import Network.BSD
337 | import Data.List
338 | import Control.Concurrent
339 | import Control.Concurrent.MVar
340 | import System.IO
341 | 
342 | type HandlerFunc = SockAddr -> String -> IO ()
343 | 
344 | serveLog :: String              -- ^ Port number or name; 514 is default
345 |          -> HandlerFunc         -- ^ Function to handle incoming messages
346 |          -> IO ()
347 | serveLog port handlerfunc = withSocketsDo $
348 |     do -- Look up the port.  Either raises an exception or returns
349 |        -- a nonempty list.
350 |        addrinfos <- getAddrInfo
351 |                     (Just (defaultHints {addrFlags = [AI_PASSIVE]}))
352 |                     Nothing (Just port)
353 |        let serveraddr = head addrinfos
354 | 
355 |        -- Create a socket
356 |        sock <- socket (addrFamily serveraddr) Stream defaultProtocol
357 | 
358 |        -- Bind it to the address we're listening to
359 |        bindSocket sock (addrAddress serveraddr)
360 | 
361 |        -- Start listening for connection requests.  Maximum queue size
362 |        -- of 5 connection requests waiting to be accepted.
363 |        listen sock 5
364 | 
365 |        -- Create a lock to use for synchronizing access to the handler
366 |        lock <- newMVar ()
367 | 
368 |        -- Loop forever waiting for connections.  Ctrl-C to abort.
369 |        procRequests lock sock
370 | 
371 |     where
372 |           -- | Process incoming connection requests
373 |           procRequests :: MVar () -> Socket -> IO ()
374 |           procRequests lock mastersock =
375 |               do (connsock, clientaddr) <- accept mastersock
376 |                  handle lock clientaddr
377 |                     "syslogtcpserver.hs: client connected"
378 |                  forkIO $ procMessages lock connsock clientaddr
379 |                  procRequests lock mastersock
380 | 
381 |           -- | Process incoming messages
382 |           procMessages :: MVar () -> Socket -> SockAddr -> IO ()
383 |           procMessages lock connsock clientaddr =
384 |               do connhdl <- socketToHandle connsock ReadMode
385 |                  hSetBuffering connhdl LineBuffering
386 |                  messages <- hGetContents connhdl
387 |                  mapM_ (handle lock clientaddr) (lines messages)
388 |                  hClose connhdl
389 |                  handle lock clientaddr
390 |                     "syslogtcpserver.hs: client disconnected"
391 | 
392 |           -- Lock the handler before passing data to it.
393 |           handle :: MVar () -> HandlerFunc
394 |           -- This type is the same as
395 |           -- handle :: MVar () -> SockAddr -> String -> IO ()
396 |           handle lock clientaddr msg =
397 |               withMVar lock
398 |                  (\a -> handlerfunc clientaddr msg >> return a)
399 | 
400 | -- A simple handler that prints incoming packets
401 | plainHandler :: HandlerFunc
402 | plainHandler addr msg =
403 |     putStrLn $ "From " ++ show addr ++ ": " ++ msg
404 | #+END_EXAMPLE
405 | 
406 | For our ~SyslogTypes~ implementation, see
407 | [[file:27-sockets-and-syslog.org::*UDP Client Example: syslog][the section called "UDP Client Example: syslog"]].
408 | 
409 | Let's look at this code. Our main loop is in ~procRequests~, where
410 | we loop forever waiting for new connections from clients. The
411 | ~accept~ call blocks until a client connects. When a client
412 | connects, we get a new socket and the address of the client. We
413 | pass a message to the handler about that, then use ~forkIO~ to
414 | create a thread to handle the data from that client. This thread
415 | runs ~procMessages~.
416 | 
417 | When dealing with TCP data, it's often convenient to convert a
418 | socket into a Haskell ~Handle~. We do so here, and explicitly set
419 | the buffering—an important point for TCP communication. Next, we
420 | set up lazy reading from the socket's ~Handle~. For each incoming
421 | line, we pass it to ~handle~. After there is no more data—because
422 | the remote end has closed the socket—we output a message about
423 | that.
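
The newline framing can be seen in isolation with plain strings. This
sketch is not taken from the server; ~splitMessages~ is a hypothetical
helper that shows how a buffer divides into complete messages plus an
unterminated remainder.

#+BEGIN_SRC haskell
-- Split a buffer into the complete newline-terminated messages it
-- contains and whatever unterminated text is left over.
splitMessages :: String -> ([String], String)
splitMessages buf =
    case break (== '\n') buf of
      (msg, '\n' : rest) -> let (msgs, leftover) = splitMessages rest
                            in (msg : msgs, leftover)
      (partial, _)       -> ([], partial)
#+END_SRC

Here ~splitMessages "a\nb\nc"~ gives ~(["a","b"],"c")~. By contrast,
~lines "a\nb\nc"~ returns all three pieces: when used on the lazily
read stream in ~procMessages~, the string only ends once the
connection has closed, so a final fragment without a newline is still
passed to the handler.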
424 | 
425 | Since we may be handling multiple incoming messages at once, we
426 | need to ensure that we're not writing out multiple messages at
427 | once in the handler. That could result in garbled output. We use a
428 | simple lock to serialize access to the handler, and wrap it in a
429 | small ~handle~ function that takes the lock before each call.
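
The locking pattern itself does not depend on sockets. As a minimal
sketch, assuming a hypothetical wrapper named ~serialized~ that is not
part of ~syslogtcpserver.hs~, any two-argument handler can be guarded
by an ~MVar ()~ in the same way:

#+BEGIN_SRC haskell
import Control.Concurrent.MVar

-- Wrap a two-argument handler so that only one call runs at a time.
-- withMVar takes the lock, runs the action, and puts the lock back
-- even if the action throws an exception.
serialized :: MVar () -> (a -> b -> IO ()) -> a -> b -> IO ()
serialized lock handler x y = withMVar lock (\_ -> handler x y)
#+END_SRC

Threads that call ~serialized lock plainHandler~ with the same ~lock~
will block in ~withMVar~ until the current call has finished, so their
output lines cannot interleave.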
430 | 
431 | You can test this with the client we'll present next, or you can
432 | even use the ~telnet~ program to connect to this server. Each line
433 | of text you send to it will be printed on the display by the
434 | server. Let's try it out:
435 | 
436 | #+BEGIN_SRC screen
437 | ghci> :load syslogtcpserver.hs
438 | [1 of 1] Compiling Main             ( syslogtcpserver.hs, interpreted )
439 | Ok, modules loaded: Main.
440 | ghci> serveLog "10514" plainHandler
441 | Loading package parsec-2.1.0.0 ... linking ... done.
442 | Loading package network-2.1.0.0 ... linking ... done.
443 | #+END_SRC
444 | 
445 | At this point, the server will begin listening for connections at
446 | port 10514. It will not appear to be doing anything until a client
447 | connects. We could use telnet to connect to the server:
448 | 
449 | #+BEGIN_SRC screen
450 | ~$ telnet localhost 10514
451 | Trying 127.0.0.1...
452 | Connected to localhost.
453 | Escape character is '^]'.
454 | Test message
455 | ^]
456 | telnet> quit
457 | Connection closed.
458 | #+END_SRC
459 | 
460 | Meanwhile, in our other terminal running the TCP server, you'll
461 | see something like this:
462 | 
463 | #+BEGIN_SRC screen
464 | From 127.0.0.1:38790: syslogtcpserver.hs: client connected
465 | From 127.0.0.1:38790: Test message
466 | From 127.0.0.1:38790: syslogtcpserver.hs: client disconnected
467 | #+END_SRC
468 | 
469 | This shows that a client connected from port 38790 on the local
470 | machine (127.0.0.1). After it connected, it sent one message, and
471 | disconnected. When you are acting as a TCP client, the operating
472 | system assigns an unused port for you. This port number will
473 | usually be different each time you run the program.
474 | 
475 | *** TCP Syslog Client
476 | 
477 | Now, let's write a client for our TCP syslog protocol. This client
478 | will be similar to the UDP client, but there are a few
479 | changes. First, since TCP is a streaming protocol, we can send
480 | data using a ~Handle~ rather than using the lower-level socket
481 | operations. Secondly, we no longer need to store the destination
482 | address in the ~SyslogHandle~ since we will be using ~connect~ to
483 | establish the TCP connection. Finally, we need a way to know where
484 | one message ends and the next begins. With UDP, that was easy
485 | because each message was a discrete logical packet. With TCP,
486 | we'll just use the newline character ~'\n'~ as the end-of-message
487 | marker, though that means that no individual message may contain
488 | the newline. Here's our code:
489 | 
490 | #+CAPTION: syslogtcpclient.hs
491 | #+BEGIN_EXAMPLE
492 | import Data.Bits
493 | import Network.Socket
494 | import Network.BSD
495 | import Data.List
496 | import SyslogTypes
497 | import System.IO
498 | 
499 | data SyslogHandle =
500 |     SyslogHandle {slHandle :: Handle,
501 |                   slProgram :: String}
502 | 
503 | openlog :: HostName             -- ^ Remote hostname, or localhost
504 |         -> String               -- ^ Port number or name; 514 is default
505 |         -> String               -- ^ Name to log under
506 |         -> IO SyslogHandle      -- ^ Handle to use for logging
507 | openlog hostname port progname =
508 |     do -- Look up the hostname and port.  Either raises an exception
509 |        -- or returns a nonempty list.  First element in that list
510 |        -- is supposed to be the best option.
511 |        addrinfos <- getAddrInfo Nothing (Just hostname) (Just port)
512 |        let serveraddr = head addrinfos
513 | 
514 |        -- Establish a socket for communication
515 |        sock <- socket (addrFamily serveraddr) Stream defaultProtocol
516 | 
517 |        -- Mark the socket for keep-alive handling since it may be idle
518 |        -- for long periods of time
519 |        setSocketOption sock KeepAlive 1
520 | 
521 |        -- Connect to server
522 |        connect sock (addrAddress serveraddr)
523 | 
524 |        -- Make a Handle out of it for convenience
525 |        h <- socketToHandle sock WriteMode
526 | 
527 |        -- We're going to set buffering to BlockBuffering and then
528 |        -- explicitly call hFlush after each message, below, so that
529 |        -- messages get logged immediately
530 |        hSetBuffering h (BlockBuffering Nothing)
531 | 
532 |        -- Save off the Handle and program name in a SyslogHandle
533 |        return $ SyslogHandle h progname
534 | 
535 | syslog :: SyslogHandle -> Facility -> Priority -> String -> IO ()
536 | syslog syslogh fac pri msg =
537 |     do hPutStrLn (slHandle syslogh) sendmsg
538 |        -- Make sure that we send data immediately
539 |        hFlush (slHandle syslogh)
540 |     where code = makeCode fac pri
541 |           sendmsg = "<" ++ show code ++ ">" ++ (slProgram syslogh) ++
542 |                     ": " ++ msg
543 | 
544 | closelog :: SyslogHandle -> IO ()
545 | closelog syslogh = hClose (slHandle syslogh)
546 | 
547 | {- | Convert a facility and a priority into a syslog code -}
548 | makeCode :: Facility -> Priority -> Int
549 | makeCode fac pri =
550 |     let faccode = codeOfFac fac
551 |         pricode = fromEnum pri
552 |         in
553 |           (faccode `shiftL` 3) .|. pricode
554 | #+END_EXAMPLE
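
Because a newline ends a message in this protocol, a client may want
to guard against newlines embedded in the text it is asked to log. One
possible approach, sketched here with a hypothetical ~sanitize~ helper
that is not part of ~syslogtcpclient.hs~, is to replace them with
spaces before sending:

#+BEGIN_SRC haskell
-- Replace carriage returns and newlines with spaces so that one call
-- to the logger always produces exactly one line on the wire.
sanitize :: String -> String
sanitize = map replace
    where replace c | c == '\n' || c == '\r' = ' '
                    | otherwise              = c
#+END_SRC

In ~syslog~, this could be applied as
~hPutStrLn (slHandle syslogh) (sanitize sendmsg)~.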
555 | 
556 | We can try it out under ~ghci~. If you still have the TCP server
557 | running from earlier, your session might look something like this:
558 | 
559 | #+BEGIN_SRC screen
560 | ghci> :load syslogtcpclient.hs
561 | Loading package base ... linking ... done.
562 | [1 of 2] Compiling SyslogTypes      ( SyslogTypes.hs, interpreted )
563 | [2 of 2] Compiling Main             ( syslogtcpclient.hs, interpreted )
564 | Ok, modules loaded: Main, SyslogTypes.
565 | ghci> openlog "localhost" "10514" "tcptest"
566 | Loading package parsec-2.1.0.0 ... linking ... done.
567 | Loading package network-2.1.0.0 ... linking ... done.
568 | ghci> sl <- openlog "localhost" "10514" "tcptest"
569 | ghci> syslog sl USER INFO "This is my TCP message"
570 | ghci> syslog sl USER INFO "This is my TCP message again"
571 | ghci> closelog sl
572 | #+END_SRC
573 | 
574 | Over on the server, you'll see something like this:
575 | 
576 | #+BEGIN_SRC screen
577 | From 127.0.0.1:46319: syslogtcpserver.hs: client connected
578 | From 127.0.0.1:46319: <9>tcptest: This is my TCP message
579 | From 127.0.0.1:46319: <9>tcptest: This is my TCP message again
580 | From 127.0.0.1:46319: syslogtcpserver.hs: client disconnected
581 | #+END_SRC
582 | 
583 | The ~<9>~ is the combined facility and priority code computed by
584 | ~makeCode~ for ~USER~ and ~INFO~, just as it was with UDP.
585 | 


--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
  1 | Creative Commons Legal Code
  2 | 
  3 | Attribution-NonCommercial 3.0 Unported
  4 | 
  5 |     CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
  6 |     LEGAL SERVICES. DISTRIBUTION OF THIS LICENSE DOES NOT CREATE AN
  7 |     ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
  8 |     INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
  9 |     REGARDING THE INFORMATION PROVIDED, AND DISCLAIMS LIABILITY FOR
 10 |     DAMAGES RESULTING FROM ITS USE.
 11 | 
 12 | License
 13 | 
 14 | THE WORK (AS DEFINED BELOW) IS PROVIDED UNDER THE TERMS OF THIS CREATIVE
 15 | COMMONS PUBLIC LICENSE ("CCPL" OR "LICENSE"). THE WORK IS PROTECTED BY
 16 | COPYRIGHT AND/OR OTHER APPLICABLE LAW. ANY USE OF THE WORK OTHER THAN AS
 17 | AUTHORIZED UNDER THIS LICENSE OR COPYRIGHT LAW IS PROHIBITED.
 18 | 
 19 | BY EXERCISING ANY RIGHTS TO THE WORK PROVIDED HERE, YOU ACCEPT AND AGREE
 20 | TO BE BOUND BY THE TERMS OF THIS LICENSE. TO THE EXTENT THIS LICENSE MAY
 21 | BE CONSIDERED TO BE A CONTRACT, THE LICENSOR GRANTS YOU THE RIGHTS
 22 | CONTAINED HERE IN CONSIDERATION OF YOUR ACCEPTANCE OF SUCH TERMS AND
 23 | CONDITIONS.
 24 | 
 25 | 1. Definitions
 26 | 
 27 |  a. "Adaptation" means a work based upon the Work, or upon the Work and
 28 |     other pre-existing works, such as a translation, adaptation,
 29 |     derivative work, arrangement of music or other alterations of a
 30 |     literary or artistic work, or phonogram or performance and includes
 31 |     cinematographic adaptations or any other form in which the Work may be
 32 |     recast, transformed, or adapted including in any form recognizably
 33 |     derived from the original, except that a work that constitutes a
 34 |     Collection will not be considered an Adaptation for the purpose of
 35 |     this License. For the avoidance of doubt, where the Work is a musical
 36 |     work, performance or phonogram, the synchronization of the Work in
 37 |     timed-relation with a moving image ("synching") will be considered an
 38 |     Adaptation for the purpose of this License.
 39 |  b. "Collection" means a collection of literary or artistic works, such as
 40 |     encyclopedias and anthologies, or performances, phonograms or
 41 |     broadcasts, or other works or subject matter other than works listed
 42 |     in Section 1(f) below, which, by reason of the selection and
 43 |     arrangement of their contents, constitute intellectual creations, in
 44 |     which the Work is included in its entirety in unmodified form along
 45 |     with one or more other contributions, each constituting separate and
 46 |     independent works in themselves, which together are assembled into a
 47 |     collective whole. A work that constitutes a Collection will not be
 48 |     considered an Adaptation (as defined above) for the purposes of this
 49 |     License.
 50 |  c. "Distribute" means to make available to the public the original and
 51 |     copies of the Work or Adaptation, as appropriate, through sale or
 52 |     other transfer of ownership.
 53 |  d. "Licensor" means the individual, individuals, entity or entities that
 54 |     offer(s) the Work under the terms of this License.
 55 |  e. "Original Author" means, in the case of a literary or artistic work,
 56 |     the individual, individuals, entity or entities who created the Work
 57 |     or if no individual or entity can be identified, the publisher; and in
 58 |     addition (i) in the case of a performance the actors, singers,
 59 |     musicians, dancers, and other persons who act, sing, deliver, declaim,
 60 |     play in, interpret or otherwise perform literary or artistic works or
 61 |     expressions of folklore; (ii) in the case of a phonogram the producer
 62 |     being the person or legal entity who first fixes the sounds of a
 63 |     performance or other sounds; and, (iii) in the case of broadcasts, the
 64 |     organization that transmits the broadcast.
 65 |  f. "Work" means the literary and/or artistic work offered under the terms
 66 |     of this License including without limitation any production in the
 67 |     literary, scientific and artistic domain, whatever may be the mode or
 68 |     form of its expression including digital form, such as a book,
 69 |     pamphlet and other writing; a lecture, address, sermon or other work
 70 |     of the same nature; a dramatic or dramatico-musical work; a
 71 |     choreographic work or entertainment in dumb show; a musical
 72 |     composition with or without words; a cinematographic work to which are
 73 |     assimilated works expressed by a process analogous to cinematography;
 74 |     a work of drawing, painting, architecture, sculpture, engraving or
 75 |     lithography; a photographic work to which are assimilated works
 76 |     expressed by a process analogous to photography; a work of applied
 77 |     art; an illustration, map, plan, sketch or three-dimensional work
 78 |     relative to geography, topography, architecture or science; a
 79 |     performance; a broadcast; a phonogram; a compilation of data to the
 80 |     extent it is protected as a copyrightable work; or a work performed by
 81 |     a variety or circus performer to the extent it is not otherwise
 82 |     considered a literary or artistic work.
 83 |  g. "You" means an individual or entity exercising rights under this
 84 |     License who has not previously violated the terms of this License with
 85 |     respect to the Work, or who has received express permission from the
 86 |     Licensor to exercise rights under this License despite a previous
 87 |     violation.
 88 |  h. "Publicly Perform" means to perform public recitations of the Work and
 89 |     to communicate to the public those public recitations, by any means or
 90 |     process, including by wire or wireless means or public digital
 91 |     performances; to make available to the public Works in such a way that
 92 |     members of the public may access these Works from a place and at a
 93 |     place individually chosen by them; to perform the Work to the public
 94 |     by any means or process and the communication to the public of the
 95 |     performances of the Work, including by public digital performance; to
 96 |     broadcast and rebroadcast the Work by any means including signs,
 97 |     sounds or images.
 98 |  i. "Reproduce" means to make copies of the Work by any means including
 99 |     without limitation by sound or visual recordings and the right of
100 |     fixation and reproducing fixations of the Work, including storage of a
101 |     protected performance or phonogram in digital form or other electronic
102 |     medium.
103 | 
104 | 2. Fair Dealing Rights. Nothing in this License is intended to reduce,
105 | limit, or restrict any uses free from copyright or rights arising from
106 | limitations or exceptions that are provided for in connection with the
107 | copyright protection under copyright law or other applicable laws.
108 | 
109 | 3. License Grant. Subject to the terms and conditions of this License,
110 | Licensor hereby grants You a worldwide, royalty-free, non-exclusive,
111 | perpetual (for the duration of the applicable copyright) license to
112 | exercise the rights in the Work as stated below:
113 | 
114 |  a. to Reproduce the Work, to incorporate the Work into one or more
115 |     Collections, and to Reproduce the Work as incorporated in the
116 |     Collections;
117 |  b. to create and Reproduce Adaptations provided that any such Adaptation,
118 |     including any translation in any medium, takes reasonable steps to
119 |     clearly label, demarcate or otherwise identify that changes were made
120 |     to the original Work. For example, a translation could be marked "The
121 |     original work was translated from English to Spanish," or a
122 |     modification could indicate "The original work has been modified.";
123 |  c. to Distribute and Publicly Perform the Work including as incorporated
124 |     in Collections; and,
125 |  d. to Distribute and Publicly Perform Adaptations.
126 | 
127 | The above rights may be exercised in all media and formats whether now
128 | known or hereafter devised. The above rights include the right to make
129 | such modifications as are technically necessary to exercise the rights in
130 | other media and formats. Subject to Section 8(f), all rights not expressly
131 | granted by Licensor are hereby reserved, including but not limited to the
132 | rights set forth in Section 4(d).
133 | 
134 | 4. Restrictions. The license granted in Section 3 above is expressly made
135 | subject to and limited by the following restrictions:
136 | 
137 |  a. You may Distribute or Publicly Perform the Work only under the terms
138 |     of this License. You must include a copy of, or the Uniform Resource
139 |     Identifier (URI) for, this License with every copy of the Work You
140 |     Distribute or Publicly Perform. You may not offer or impose any terms
141 |     on the Work that restrict the terms of this License or the ability of
142 |     the recipient of the Work to exercise the rights granted to that
143 |     recipient under the terms of the License. You may not sublicense the
144 |     Work. You must keep intact all notices that refer to this License and
145 |     to the disclaimer of warranties with every copy of the Work You
146 |     Distribute or Publicly Perform. When You Distribute or Publicly
147 |     Perform the Work, You may not impose any effective technological
148 |     measures on the Work that restrict the ability of a recipient of the
149 |     Work from You to exercise the rights granted to that recipient under
150 |     the terms of the License. This Section 4(a) applies to the Work as
151 |     incorporated in a Collection, but this does not require the Collection
152 |     apart from the Work itself to be made subject to the terms of this
153 |     License. If You create a Collection, upon notice from any Licensor You
154 |     must, to the extent practicable, remove from the Collection any credit
155 |     as required by Section 4(c), as requested. If You create an
156 |     Adaptation, upon notice from any Licensor You must, to the extent
157 |     practicable, remove from the Adaptation any credit as required by
158 |     Section 4(c), as requested.
159 |  b. You may not exercise any of the rights granted to You in Section 3
160 |     above in any manner that is primarily intended for or directed toward
161 |     commercial advantage or private monetary compensation. The exchange of
162 |     the Work for other copyrighted works by means of digital file-sharing
163 |     or otherwise shall not be considered to be intended for or directed
164 |     toward commercial advantage or private monetary compensation, provided
165 |     there is no payment of any monetary compensation in connection with
166 |     the exchange of copyrighted works.
167 |  c. If You Distribute, or Publicly Perform the Work or any Adaptations or
168 |     Collections, You must, unless a request has been made pursuant to
169 |     Section 4(a), keep intact all copyright notices for the Work and
170 |     provide, reasonable to the medium or means You are utilizing: (i) the
171 |     name of the Original Author (or pseudonym, if applicable) if supplied,
172 |     and/or if the Original Author and/or Licensor designate another party
173 |     or parties (e.g., a sponsor institute, publishing entity, journal) for
174 |     attribution ("Attribution Parties") in Licensor's copyright notice,
175 |     terms of service or by other reasonable means, the name of such party
176 |     or parties; (ii) the title of the Work if supplied; (iii) to the
177 |     extent reasonably practicable, the URI, if any, that Licensor
178 |     specifies to be associated with the Work, unless such URI does not
179 |     refer to the copyright notice or licensing information for the Work;
180 |     and, (iv) consistent with Section 3(b), in the case of an Adaptation,
181 |     a credit identifying the use of the Work in the Adaptation (e.g.,
182 |     "French translation of the Work by Original Author," or "Screenplay
183 |     based on original Work by Original Author"). The credit required by
184 |     this Section 4(c) may be implemented in any reasonable manner;
185 |     provided, however, that in the case of a Adaptation or Collection, at
186 |     a minimum such credit will appear, if a credit for all contributing
187 |     authors of the Adaptation or Collection appears, then as part of these
188 |     credits and in a manner at least as prominent as the credits for the
189 |     other contributing authors. For the avoidance of doubt, You may only
190 |     use the credit required by this Section for the purpose of attribution
191 |     in the manner set out above and, by exercising Your rights under this
192 |     License, You may not implicitly or explicitly assert or imply any
193 |     connection with, sponsorship or endorsement by the Original Author,
194 |     Licensor and/or Attribution Parties, as appropriate, of You or Your
195 |     use of the Work, without the separate, express prior written
196 |     permission of the Original Author, Licensor and/or Attribution
197 |     Parties.
198 |  d. For the avoidance of doubt:
199 | 
200 |      i. Non-waivable Compulsory License Schemes. In those jurisdictions in
201 |         which the right to collect royalties through any statutory or
202 |         compulsory licensing scheme cannot be waived, the Licensor
203 |         reserves the exclusive right to collect such royalties for any
204 |         exercise by You of the rights granted under this License;
205 |     ii. Waivable Compulsory License Schemes. In those jurisdictions in
206 |         which the right to collect royalties through any statutory or
207 |         compulsory licensing scheme can be waived, the Licensor reserves
208 |         the exclusive right to collect such royalties for any exercise by
209 |         You of the rights granted under this License if Your exercise of
210 |         such rights is for a purpose or use which is otherwise than
211 |         noncommercial as permitted under Section 4(b) and otherwise waives
212 |         the right to collect royalties through any statutory or compulsory
213 |         licensing scheme; and,
214 |    iii. Voluntary License Schemes. The Licensor reserves the right to
215 |         collect royalties, whether individually or, in the event that the
216 |         Licensor is a member of a collecting society that administers
217 |         voluntary licensing schemes, via that society, from any exercise
218 |         by You of the rights granted under this License that is for a
219 |         purpose or use which is otherwise than noncommercial as permitted
220 |         under Section 4(c).
221 |  e. Except as otherwise agreed in writing by the Licensor or as may be
222 |     otherwise permitted by applicable law, if You Reproduce, Distribute or
223 |     Publicly Perform the Work either by itself or as part of any
224 |     Adaptations or Collections, You must not distort, mutilate, modify or
225 |     take other derogatory action in relation to the Work which would be
226 |     prejudicial to the Original Author's honor or reputation. Licensor
227 |     agrees that in those jurisdictions (e.g. Japan), in which any exercise
228 |     of the right granted in Section 3(b) of this License (the right to
229 |     make Adaptations) would be deemed to be a distortion, mutilation,
230 |     modification or other derogatory action prejudicial to the Original
231 |     Author's honor and reputation, the Licensor will waive or not assert,
232 |     as appropriate, this Section, to the fullest extent permitted by the
233 |     applicable national law, to enable You to reasonably exercise Your
234 |     right under Section 3(b) of this License (right to make Adaptations)
235 |     but not otherwise.
236 | 
237 | 5. Representations, Warranties and Disclaimer
238 | 
239 | UNLESS OTHERWISE MUTUALLY AGREED TO BY THE PARTIES IN WRITING, LICENSOR
240 | OFFERS THE WORK AS-IS AND MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY
241 | KIND CONCERNING THE WORK, EXPRESS, IMPLIED, STATUTORY OR OTHERWISE,
242 | INCLUDING, WITHOUT LIMITATION, WARRANTIES OF TITLE, MERCHANTIBILITY,
243 | FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF
244 | LATENT OR OTHER DEFECTS, ACCURACY, OR THE PRESENCE OF ABSENCE OF ERRORS,
245 | WHETHER OR NOT DISCOVERABLE. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION
246 | OF IMPLIED WARRANTIES, SO SUCH EXCLUSION MAY NOT APPLY TO YOU.
247 | 
248 | 6. Limitation on Liability. EXCEPT TO THE EXTENT REQUIRED BY APPLICABLE
249 | LAW, IN NO EVENT WILL LICENSOR BE LIABLE TO YOU ON ANY LEGAL THEORY FOR
250 | ANY SPECIAL, INCIDENTAL, CONSEQUENTIAL, PUNITIVE OR EXEMPLARY DAMAGES
251 | ARISING OUT OF THIS LICENSE OR THE USE OF THE WORK, EVEN IF LICENSOR HAS
252 | BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
253 | 
254 | 7. Termination
255 | 
256 |  a. This License and the rights granted hereunder will terminate
257 |     automatically upon any breach by You of the terms of this License.
258 |     Individuals or entities who have received Adaptations or Collections
259 |     from You under this License, however, will not have their licenses
260 |     terminated provided such individuals or entities remain in full
261 |     compliance with those licenses. Sections 1, 2, 5, 6, 7, and 8 will
262 |     survive any termination of this License.
263 |  b. Subject to the above terms and conditions, the license granted here is
264 |     perpetual (for the duration of the applicable copyright in the Work).
265 |     Notwithstanding the above, Licensor reserves the right to release the
266 |     Work under different license terms or to stop distributing the Work at
267 |     any time; provided, however that any such election will not serve to
268 |     withdraw this License (or any other license that has been, or is
269 |     required to be, granted under the terms of this License), and this
270 |     License will continue in full force and effect unless terminated as
271 |     stated above.
272 | 
273 | 8. Miscellaneous
274 | 
275 |  a. Each time You Distribute or Publicly Perform the Work or a Collection,
276 |     the Licensor offers to the recipient a license to the Work on the same
277 |     terms and conditions as the license granted to You under this License.
278 |  b. Each time You Distribute or Publicly Perform an Adaptation, Licensor
279 |     offers to the recipient a license to the original Work on the same
280 |     terms and conditions as the license granted to You under this License.
281 |  c. If any provision of this License is invalid or unenforceable under
282 |     applicable law, it shall not affect the validity or enforceability of
283 |     the remainder of the terms of this License, and without further action
284 |     by the parties to this agreement, such provision shall be reformed to
285 |     the minimum extent necessary to make such provision valid and
286 |     enforceable.
287 |  d. No term or provision of this License shall be deemed waived and no
288 |     breach consented to unless such waiver or consent shall be in writing
289 |     and signed by the party to be charged with such waiver or consent.
290 |  e. This License constitutes the entire agreement between the parties with
291 |     respect to the Work licensed here. There are no understandings,
292 |     agreements or representations with respect to the Work not specified
293 |     here. Licensor shall not be bound by any additional provisions that
294 |     may appear in any communication from You. This License may not be
295 |     modified without the mutual written agreement of the Licensor and You.
296 |  f. The rights granted under, and the subject matter referenced, in this
297 |     License were drafted utilizing the terminology of the Berne Convention
298 |     for the Protection of Literary and Artistic Works (as amended on
299 |     September 28, 1979), the Rome Convention of 1961, the WIPO Copyright
300 |     Treaty of 1996, the WIPO Performances and Phonograms Treaty of 1996
301 |     and the Universal Copyright Convention (as revised on July 24, 1971).
302 |     These rights and subject matter take effect in the relevant
303 |     jurisdiction in which the License terms are sought to be enforced
304 |     according to the corresponding provisions of the implementation of
305 |     those treaty provisions in the applicable national law. If the
306 |     standard suite of rights granted under applicable copyright law
307 |     includes additional rights not granted under this License, such
308 |     additional rights are deemed to be included in the License; this
309 |     License is not intended to restrict the license of any rights under
310 |     applicable law.
311 | 
312 | 
313 | Creative Commons Notice
314 | 
315 |     Creative Commons is not a party to this License, and makes no warranty
316 |     whatsoever in connection with the Work. Creative Commons will not be
317 |     liable to You or any party on any legal theory for any damages
318 |     whatsoever, including without limitation any general, special,
319 |     incidental or consequential damages arising in connection to this
320 |     license. Notwithstanding the foregoing two (2) sentences, if Creative
321 |     Commons has expressly identified itself as the Licensor hereunder, it
322 |     shall have all rights and obligations of Licensor.
323 | 
324 |     Except for the limited purpose of indicating to the public that the
325 |     Work is licensed under the CCPL, Creative Commons does not authorize
326 |     the use by either party of the trademark "Creative Commons" or any
327 |     related trademark or logo of Creative Commons without the prior
328 |     written consent of Creative Commons. Any permitted use will be in
329 |     compliance with Creative Commons' then-current trademark usage
330 |     guidelines, as may be published on its website or otherwise made
331 |     available upon request from time to time. For the avoidance of doubt,
332 |     this trademark restriction does not form part of the License.
333 | 
334 |     Creative Commons may be contacted at https://creativecommons.org/.
335 | 


--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
 1 | 
 2 | 
 3 | data/chapter_links.txt : setup
 4 | 	grep 'file:[^ ]*.html]' *.org | cut -d':' -f 3-4 | cut -d']' -f1 | sort | uniq > $@
 5 | 	grep -ho '\[file:.*' *.org | grep html | cut -d':' -f2 | grep '#' | cut -d'#' -f1 | sort | uniq >> $@
 6 | 
 7 | setup:
 8 | 	mkdir -p data
 9 | 
10 | htmlLinks:
11 | 	# grep '\[file:[^ ]*.html#' *.org
12 | 
13 | 	grep -n file *.org | grep '\.html' | grep -v 'bibliography.html'
14 | 
15 | quote_marks:
16 | 	sed -i 's/“/"/g' *.org
17 | 	sed -i 's/”/"/g' *.org
18 | 
19 | typeclass:
20 | 	# First, manually check that all instances of "Typeclass" don't need to be "Type Class"
21 | 	sed -i 's/Typeclass/Type class/g' *.org
22 | 	sed -i 's/typeclass/type class/g' *.org
23 | 	sed -i 's/6-using-type classes.org/6-using-typeclasses.org/g' *.org
24 | 
25 | typecheck:
26 | 	# No instances of "Typecheck"
27 | 	sed -i 's/typecheck/type-check/g' *.org
28 | 
29 | biblio:
30 | 	sed -i 's/file:bibliography.html/file:bibliography.org/g' *.org
31 | 


--------------------------------------------------------------------------------
/README.org:
--------------------------------------------------------------------------------
 1 | [[http://book.realworldhaskell.org/][Real World Haskell]] by Bryan O'Sullivan, Don Stewart and John
 2 | Goerzen is an old book (2008) that teaches Haskell by
 3 | building small programs. Sadly, the language and libraries have
 4 | changed enough to break several of the examples, so I am
 5 | making them work again. After I finish, I intend to make the
 6 | changes listed as /enhancements/ in the issue tracker.
 7 | 
 8 | * Progress
 9 | 
10 | 1. DONE [[file:0-why-haskell.org][Introduction]]
11 | 2. DONE [[file:1-getting-started.org][Getting started]]
12 | 3. DONE [[file:2-types-and-functions.org][Types and functions]]
13 | 4. DONE [[file:3-defining-types-streamlining-functions.org][Defining Types, Streamlining Functions]]
14 | 5. DONE [[file:4-functional-programming.org][Functional programming]]
15 | 6. DONE [[file:5-writing-a-library.org][Writing a library]]
16 | 7. DONE [[file:6-using-typeclasses.org][Using type classes]]
17 | 8. DONE [[file:7-io.org][I/O]]
18 | 9. DONE [[file:8-efficient-file-processing-regular-expressions-and-file-name-matching.org][Efficient file processing]]
19 | 10. DONE [[file:9-a-library-for-searching-the-file-system.org][A library for searching the file system]]
20 | 11. DONE [[file:10-parsing-a-binary-data-format.org][Parsing a binary data format]]
21 | 12. DONE [[file:11-testing-and-quality-assurance.org][Testing and quality assurance]]
22 | 13. DONE [[file:12-barcode-recognition.org][Barcode recognition]]
23 | 14. DONE [[file:13-data-structures.org][Data structures]]
24 | 15. DONE [[file:14-using-parsec.org][Using Parsec]]
25 | 16. DONE [[file:15-monads.org][Monads]]
26 | 17. DONE [[file:16-programming-with-monads.org][Programming with monads]]
27 | 18. TODO [[file:17-interfacing-with-c.org][Interfacing with C]]
28 | 19. STARTED [[file:18-monad-transformers.org][Monad transformers]]
29 | 20. STARTED [[file:19-error-handling.org][Error handling]]
30 | 21. TODO [[file:20-systems-programming-in-haskell.org][Systems programming]]
31 | 22. TODO [[file:21-using-databases.org][Using databases]]
32 | 23. TODO [[file:22-web-client-programming.org][Web client programming]]
33 | 24. TODO [[file:23-gui-programming-with-gtk2hs.org][GUI programming with gtk2hs]]
34 | 25. TODO [[file:24-concurrent-and-multicore-programming.org][Concurrent and multicore programming]]
35 | 26. TODO [[file:25-profiling-and-optimization.org][Profiling and optimization]]
36 | 27. TODO [[file:26-building-a-bloom-filter.org][Building a Bloom filter]]
37 | 28. TODO [[file:27-sockets-and-syslog.org][Sockets and syslog]]
38 | 29. TODO [[file:28-software-transactional-memory.org][Software transactional memory]]
39 | 30. TODO [[file:appendix-characters-strings-and-escaping-rules.org][Appendix: Characters, strings and escaping rules]]
40 | 31. TODO [[file:bibliography.org][Bibliography]]
41 | 


--------------------------------------------------------------------------------
/appendix-characters-strings-and-escaping-rules.org:
--------------------------------------------------------------------------------
  1 | * Appendix. Characters, strings, and escaping rules
  2 | 
  3 | This appendix covers the escaping rules used to represent
  4 | non-ASCII characters in Haskell character and string literals.
  5 | Haskell's escaping rules follow the pattern established by the C
  6 | programming language, but expand considerably upon them.
  7 | 
  8 | ** Writing character and string literals
  9 | 
 10 | A single character is surrounded by ASCII single quotes, ~'~, and
 11 | has type ~Char~.
 12 | 
 13 | #+BEGIN_SRC screen
 14 | ghci> 'c'
 15 | 'c'
 16 | ghci> :type 'c'
 17 | 'c' :: Char
 18 | #+END_SRC
 19 | 
 20 | A string literal is surrounded by double quotes, ~"~, and has type
 21 | ~[Char]~ (more often written as ~String~).
 22 | 
 23 | #+BEGIN_SRC screen
 24 | ghci> "a string literal"
 25 | "a string literal"
 26 | ghci> :type "a string literal"
 27 | "a string literal" :: [Char]
 28 | #+END_SRC
 29 | 
 30 | The double-quoted form of a string literal is just syntactic
 31 | sugar for list notation.
 32 | 
 33 | #+BEGIN_SRC screen
 34 | ghci> ['a', ' ', 's', 't', 'r', 'i', 'n', 'g'] == "a string"
 35 | True
 36 | #+END_SRC
 37 | 
 38 | ** International language support
 39 | 
 40 | Haskell uses Unicode internally for its ~Char~ data type. Since
 41 | ~String~ is just an alias for ~[Char]~, a list of ~Char~s, Unicode
 42 | is also used to represent strings.
 43 | 
 44 | Different Haskell implementations place limitations on the
 45 | character sets they can accept in source files. GHC allows source
 46 | files to be written in the UTF-8 encoding of Unicode, so in a
 47 | source file, you can use UTF-8 literals inside a character or
 48 | string constant. Do be aware that if you use UTF-8, other Haskell
 49 | implementations may not be able to parse your source files.
 50 | 
 51 | When you run the ~ghci~ interpreter interactively, it may not be
 52 | able to deal with international characters in character or string
 53 | literals that you enter at the keyboard.
 54 | 
 55 | #+BEGIN_NOTE
 56 | Note
 57 | 
 58 | Although Haskell represents characters and strings internally
 59 | using Unicode, the Haskell standard does not mandate a particular
 60 | encoding for text I/O on files that contain Unicode data. Old
 61 | versions of GHC treated text as a sequence of 8-bit characters;
 62 | since GHC 6.12, text I/O uses the current locale's encoding by default.
 63 | 
 64 | The ~hSetEncoding~ function in ~System.IO~ selects the encoding of a
 65 | handle, and third-party libraries convert between many more of the
 66 | encodings used in files and Haskell's internal Unicode representation.
 67 | #+END_NOTE
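
With GHC specifically, the encoding of a handle can be chosen
explicitly. The following is a minimal sketch, assuming GHC 6.12 or
later, where ~System.IO~ exports ~hSetEncoding~ and ~utf8~. The name
~roundTripUtf8~ is hypothetical and exists only for this example.

#+BEGIN_SRC haskell
import System.IO

-- Write a string to a file as UTF-8, then read it back as UTF-8.
-- The caller supplies the file path.
roundTripUtf8 :: FilePath -> String -> IO String
roundTripUtf8 path text = do
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8
    hPutStr h text
  withFile path ReadMode $ \h -> do
    hSetEncoding h utf8
    contents <- hGetContents h
    -- Force the lazily read contents before the handle is closed.
    length contents `seq` return contents
#+END_SRC

Setting the encoding on both handles keeps the result independent of
the locale in which the program happens to run.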
 68 | 
 69 | ** Escaping text
 70 | 
 71 | Some characters must be escaped to be represented inside a
 72 | character or string literal. For example, a double quote character
 73 | inside a string literal must be escaped, or else it will be
 74 | treated as the end of the string.
 75 | 
 76 | *** Single-character escape codes
 77 | 
 78 | Haskell uses essentially the same single-character escapes as the
 79 | C language and many other popular languages.
 80 | 
 81 | #+CAPTION: Single-character escape codes
 82 | | Escape | Unicode | Character           |
 83 | |--------+---------+---------------------|
 84 | | =\0=   | U+0000  | null character      |
 85 | | =\a=   | U+0007  | alert               |
 86 | | =\b=   | U+0008  | backspace           |
 87 | | =\f=   | U+000C  | form feed           |
 88 | | =\n=   | U+000A  | newline (line feed) |
 89 | | =\r=   | U+000D  | carriage return     |
 90 | | =\t=   | U+0009  | horizontal tab      |
 91 | | =\v=   | U+000B  | vertical tab        |
 92 | | =\"=   | U+0022  | double quote        |
 93 | | =\&=   | /n/a/   | empty string        |
 94 | | =\'=   | U+0027  | single quote        |
 95 | | =\\=   | U+005C  | backslash           |
 96 | 
 97 | *** Multiline string literals
 98 | 
 99 | To write a string literal that spans multiple lines, terminate
100 | one line with a backslash, and resume the string with another backslash.
101 | An arbitrary amount of whitespace (of any kind) can fill the gap between
102 | the two backslashes.
103 | 
104 | #+BEGIN_SRC haskell
105 | "this is a \
106 | \long string,\
107 | \ spanning multiple lines"
108 | #+END_SRC
109 | 
110 | *** ASCII control codes
111 | 
112 | Haskell recognises the escaped use of the standard two- and
113 | three-letter abbreviations of ASCII control codes.
114 | 
115 | #+CAPTION: ASCII control code abbreviations
116 | | Escape | Unicode| Meaning                   |
117 | |--------+--------+---------------------------|
118 | | =\NUL= | U+0000 | null character            |
119 | | =\SOH= | U+0001 | start of heading          |
120 | | =\STX= | U+0002 | start of text             |
121 | | =\ETX= | U+0003 | end of text               |
122 | | =\EOT= | U+0004 | end of transmission       |
123 | | =\ENQ= | U+0005 | enquiry                   |
124 | | =\ACK= | U+0006 | acknowledge               |
125 | | =\BEL= | U+0007 | bell                      |
126 | | =\BS=  | U+0008 | backspace                 |
127 | | =\HT=  | U+0009 | horizontal tab            |
128 | | =\LF=  | U+000A | line feed (newline)       |
129 | | =\VT=  | U+000B | vertical tab              |
130 | | =\FF=  | U+000C | form feed                 |
131 | | =\CR=  | U+000D | carriage return           |
132 | | =\SO=  | U+000E | shift out                 |
133 | | =\SI=  | U+000F | shift in                  |
134 | | =\DLE= | U+0010 | data link escape          |
135 | | =\DC1= | U+0011 | device control 1          |
136 | | =\DC2= | U+0012 | device control 2          |
137 | | =\DC3= | U+0013 | device control 3          |
138 | | =\DC4= | U+0014 | device control 4          |
139 | | =\NAK= | U+0015 | negative acknowledge      |
140 | | =\SYN= | U+0016 | synchronous idle          |
141 | | =\ETB= | U+0017 | end of transmission block |
142 | | =\CAN= | U+0018 | cancel                    |
143 | | =\EM=  | U+0019 | end of medium             |
144 | | =\SUB= | U+001A | substitute                |
145 | | =\ESC= | U+001B | escape                    |
146 | | =\FS=  | U+001C | file separator            |
147 | | =\GS=  | U+001D | group separator           |
148 | | =\RS=  | U+001E | record separator          |
149 | | =\US=  | U+001F | unit separator            |
150 | | =\SP=  | U+0020 | space                     |
151 | | =\DEL= | U+007F | delete                    |
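
To illustrate the table, the sketch below pairs a few abbreviations
with the equivalent single-character escapes. All names in it are made
up for this example. It also shows one subtlety described in the
Haskell report: ~"\SOH"~ is read as the single character ~\SOH~, so a
shift out followed by a capital H must be written ~"\SO\&H"~.

#+BEGIN_SRC haskell
-- Each pair holds two spellings of the same character.
controlCodePairs :: [(Char, Char)]
controlCodePairs =
  [ ('\NUL', '\0')
  , ('\BEL', '\a')
  , ('\BS',  '\b')
  , ('\HT',  '\t')
  , ('\LF',  '\n')
  , ('\VT',  '\v')
  , ('\FF',  '\f')
  , ('\CR',  '\r')
  , ('\SP',  ' ')
  ]

-- True when both spellings in every pair denote the same character.
allPairsAgree :: Bool
allPairsAgree = all (uncurry (==)) controlCodePairs

-- "\SOH" is one character; "\SO\&H" is shift out followed by 'H'.
sohLengths :: (Int, Int)
sohLengths = (length "\SOH", length "\SO\&H")
#+END_SRC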
152 | 
153 | *** Control-with-character escapes
154 | 
155 | Haskell recognises an alternate notation for control characters,
156 | which represents the archaic effect of pressing the ~control~ key
157 | on a keyboard and chording it with another key. These sequences
158 | begin with the characters ~\^~, followed by a symbol or uppercase
159 | letter.
160 | 
161 | #+CAPTION: Control-with-character escapes
162 | | Escape              | Unicode               | Meaning          |
163 | |---------------------+-----------------------+------------------|
164 | | =\^@=               | U+0000                | null character   |
165 | | =\^A= through =\^Z= | U+0001 through U+001A | control codes    |
166 | | =\^[=               | U+001B                | escape           |
167 | | =\^\=               | U+001C                | file separator   |
168 | | =\^]=               | U+001D                | group separator  |
169 | | =\^^=               | U+001E                | record separator |
170 | | =\^_=               | U+001F                | unit separator   |
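
As a small sketch of how this notation maps to code points: the escape
written with ~\^~ and a character denotes the character whose code is
that character's code minus 64. The helper name ~caret~ is hypothetical.

#+BEGIN_SRC haskell
-- Compute the control character written as \^c, for c in '@' .. '_'.
caret :: Char -> Char
caret c = toEnum (fromEnum c - 64)

-- Compare the computed characters with the literal escapes, and a few
-- caret escapes with their more familiar spellings.
caretChecks :: Bool
caretChecks = and
  [ caret '@' == '\^@'
  , caret 'A' == '\^A'
  , caret 'Z' == '\^Z'
  , caret '[' == '\^['
  , '\^I' == '\t'
  , '\^J' == '\n'
  , '\^M' == '\r'
  , '\^[' == '\ESC'
  ]
#+END_SRC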
171 | 
172 | *** Numeric escapes
173 | 
174 | Haskell allows Unicode characters to be written using numeric
175 | escapes. A decimal escape begins with a digit, e.g. ~\1234~. A
176 | hexadecimal escape begins with an ~x~, e.g. ~\xbeef~. An octal
177 | escape begins with an ~o~, e.g. ~\o1234~.
178 | 
179 | The maximum value of a numeric escape is ~\1114111~, which may
180 | also be written ~\x10ffff~ or ~\o4177777~.
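
A brief sketch of the three notations side by side, using names
invented for this example: each escape in ~capitalA~ denotes the letter
~A~ (code point 65), and the largest escape coincides with the largest
~Char~.

#+BEGIN_SRC haskell
-- Three spellings of the letter 'A': decimal, hexadecimal, and octal.
capitalA :: [Char]
capitalA = ['\65', '\x41', '\o101']

-- The largest numeric escape is the largest value of type Char.
largestAgrees :: Bool
largestAgrees =
  '\1114111' == maxBound
    && '\x10ffff' == maxBound
    && '\o4177777' == (maxBound :: Char)
#+END_SRC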
181 | 
182 | *** The zero-width escape sequence
183 | 
184 | String literals can contain a zero-width escape sequence, written
185 | ~\&~. This is not a real character, as it represents the empty
186 | string.
187 | 
188 | #+BEGIN_SRC screen
189 | ghci> "\&"
190 | ""
191 | ghci> "foo\&bar"
192 | "foobar"
193 | #+END_SRC
194 | 
195 | The purpose of this escape sequence is to make it possible to
196 | write a numeric escape followed immediately by a regular ASCII
197 | digit.
198 | 
199 | #+BEGIN_SRC screen
200 | ghci> "\130\&11"
201 | "\130\&11"
202 | #+END_SRC
203 | 
204 | Because the zero-width escape sequence represents an empty string, it
205 | is not legal in a character literal.
206 | 


--------------------------------------------------------------------------------
/bibliography.org:
--------------------------------------------------------------------------------
 1 | * Bibliography
 2 | 
 3 | #+NAME: Broder02
 4 | [Broder02] Andrei Broder. Michael Mitzenmacher.
 5 | "[[http://www.eecs.harvard.edu/~michaelm/postscripts/im2005b.pdf][Network applications of Bloom filters: a survey]]".
 6 | [[http://www.internetmathematics.org/][/Internet Mathematics/]].
 7 | 1. 4. 2005. 485-509.
 8 | [[http://www.akpeters.com/][A K Peters Ltd.]].
 9 | 
10 | #+NAME: Google08
11 | [Google08] Jeffrey Dean. Sanjay Ghemawat.
12 | "[[http://labs.google.com/papers/mapreduce.html][MapReduce: simplified data processing on large clusters]]".
13 | [[http://cacm.acm.org/][/Communications of the ACM/]].
14 | 51. 1. January 2008. 107-113.
15 | [[http://www.acm.org/][Association for Computing Machinery]]. 0001-0782.
16 | 
17 | #+NAME: Hughes95
18 | [Hughes95] John Hughes.
19 | "[[http://citeseer.ist.psu.edu/hughes95design.html][The design of a pretty-printing library]]".
20 | May, 1995. First International Spring School on Advanced
21 | Functional Programming Techniques.
22 | Bastad, Sweden.
23 | 
24 | #+NAME: Hutton99
25 | [Hutton99] Graham Hutton.
26 | "[[http://www.cs.nott.ac.uk/~gmh/fold.pdf][A tutorial on the universality and expressiveness of fold]]".
27 | [[http://journals.cambridge.org/jid_JFP][/Journal of Functional Programming/]].
28 | 9. 4. July 1999. 355-372.
29 | [[http://www.cambridge.org/][Cambridge University Press]]. 0956-7968.
30 | 
31 | #+NAME: Okasaki99
32 | [Okasaki99] Chris Okasaki.
33 | /Purely Functional Data Structures/.
34 | [[http://www.cambridge.org/][Cambridge University Press]]. 0521663504.
35 | 
36 | #+NAME: Okasaki96
37 | [Okasaki96] Chris Okasaki.
38 | [[http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf][/Purely Functional Data Structures/]]. Carnegie Mellon University.
39 | This is Okasaki's PhD thesis, a less complete precursor to [[[Okasaki99][Okasaki99]]].
40 | 
41 | #+NAME: Wadler89
42 | [Wadler89] Philip Wadler.
43 | "[[http://citeseer.ist.psu.edu/wadler89theorems.html][Theorems for free!]]".
44 | September, 1989.
45 | International Conference on Functional Programming and Computer
46 | Architecture.
47 | 4. London, England.
48 | 
49 | #+NAME: Wadler98
50 | [Wadler98] Philip Wadler. "[[http://citeseer.ist.psu.edu/wadler98prettier.html][A prettier printer]]". March 1998.
51 | 


--------------------------------------------------------------------------------
/bin/map_html_to_org.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | import re
 3 | 
 4 | data_file = 'data/chapter_links.txt'
 5 | 
 6 | with open(data_file, 'r') as f:
 7 |     html_files = list(set([line.strip() for line in f.readlines()]))
 8 | 
 9 | # Sort so that longest file names are first
10 | html_files = sorted(html_files, key=lambda x: -len(x))
11 | 
12 | org_files_orig = [f for f in os.listdir('.') if f.endswith('.org')]
13 | 
14 | org_files = list(org_files_orig)
15 | 
16 | mapping = {}
17 | 
18 | # Hardcode some mappings
19 | 
20 | mapping = {'advanced-library-design-building-a-bloom-filter.html': '26-building-a-bloom-filter.org',
21 |            'code-case-study-parsing-a-binary-data-format.html': '10-parsing-a-binary-data-format.org',
22 |            'extended-example-web-client-programming.html': '22-web-client-programming.org',
23 |            'interfacing-with-c-the-ffi.html': '17-interfacing-with-c.org',
24 |            'writing-a-library-working-with-json-data.html': '5-writing-a-library.org',
25 |            'io-case-study-a-library-for-searching-the-filesystem.html': '9-a-library-for-searching-the-file-system.org',
26 |            'gui-programming-with-gtk-hs.html': '23-gui-programming-with-gtk2hs.org',
27 |            }
28 | 
29 | for f_raw in html_files:
30 |     f = f_raw.replace('.html', '.org')
31 | 
32 |     match = None
33 |     for o in org_files:
34 |         if f in o:
35 |             match = o
36 |             break
37 |     if match is not None:
38 |         org_files = [o for o in org_files if o != match]
39 |         mapping[f_raw] = match
40 | 
41 | print('<Unmapped chapters>')
42 | print([f for f in html_files if f not in mapping])
43 | print('</Unmapped chapters>')
44 | 
45 | print('<Unmapped ORG>')
46 | print([f for f in org_files if f not in mapping.values()])
47 | print('</Unmapped ORG>')
48 | 
49 | 
50 | def format_sed_cmd(i, o):
51 |     return ("sed -i 's/\\[file:{infile}\\]/\\[file:{outfile}\\]/g' *.org"
52 |             ).format(infile=i, outfile=o)
53 | 
54 | 
55 | print('')
56 | print('sed commands:')
57 | for h, o in mapping.items():
58 |     print(format_sed_cmd(h, o))
59 | 
60 | def fn_replace_section_ref(matchobj, html_mapping=mapping):
61 |     """This gets called in re.sub(...)"""
62 |     (html, _, sec) = matchobj.groups()
63 |     org = html_mapping.get(html, html.replace('.html', '.org'))
64 | 
65 |     return '[[file:{a1}::*{a3}][the section called \"{a3}\"]]'.format(a1=org, a3=sec)
66 | 
67 | 
68 | def replace_section_ref(line, html_mapping):
69 |     """
70 |     >>> replace_section_ref('', {})
71 |     ''
72 | 
73 |     >>> replace_section_ref('abc', {})
74 |     'abc'
75 | 
76 |     >>> replace_section_ref('[[file:xyz.html#abc][the section called "blah"]]', {"xyz.html": "0-xyz.org"})
77 |     '[[file:0-xyz.org::*blah][the section called \"blah\"]]'
78 |     """
79 | 
80 |     # return re.sub('\[\[file:([^ ]+.html)#([^ \]]+)\]\[the section called “([^“]+)”\].*', '[[file:\\1::*\\3][the section called \"\\3\"]]', line)
81 |     return re.sub(r'\[\[file:([^ ]+\.html)#([^ \]]+)\]\[the section called [“"]([^“”"]+)[”"]\]\]',
82 |                   lambda s: fn_replace_section_ref(s, html_mapping), line)
83 | 
84 | 
85 | if __name__ == '__main__':
86 |     import sys
87 | 
88 |     # print(sys.argv[1:])
89 | 
90 |     files_to_format = [f for f in sys.argv[1:] if f.endswith('.org')]
91 |     # files_to_format = [f for f in files_to_format if f.startswith('2-')]
92 | 
93 |     print('To format: {}'.format(files_to_format))
94 | 
95 |     if files_to_format:
96 |         import fileinput
97 |         for line in fileinput.input(files=files_to_format, inplace=True, backup='.bak'):
98 |             print(replace_section_ref(line, mapping), end='')
99 | 


--------------------------------------------------------------------------------
/bin/update_file_links:
--------------------------------------------------------------------------------
 1 | sed -i 's/\[file:monads.html\]/\[file:15-monads.org\]/g' *.org
 2 | sed -i 's/\[file:advanced-library-design-building-a-bloom-filter.html\]/\[file:26-building-a-bloom-filter.org\]/g' *.org
 3 | sed -i 's/\[file:code-case-study-parsing-a-binary-data-format.html\]/\[file:10-parsing-a-binary-data-format.org\]/g' *.org
 4 | sed -i 's/\[file:extended-example-web-client-programming.html\]/\[file:22-web-client-programming.org\]/g' *.org
 5 | sed -i 's/\[file:interfacing-with-c-the-ffi.html\]/\[file:17-interfacing-with-c.org\]/g' *.org
 6 | sed -i 's/\[file:efficient-file-processing-regular-expressions-and-file-name-matching.html\]/\[file:8-efficient-file-processing-regular-expressions-and-file-name-matching.org\]/g' *.org
 7 | sed -i 's/\[file:characters-strings-and-escaping-rules.html\]/\[file:appendix-characters-strings-and-escaping-rules.org\]/g' *.org
 8 | sed -i 's/\[file:defining-types-streamlining-functions.html\]/\[file:3-defining-types-streamlining-functions.org\]/g' *.org
 9 | sed -i 's/\[file:concurrent-and-multicore-programming.html\]/\[file:24-concurrent-and-multicore-programming.org\]/g' *.org
10 | sed -i 's/\[file:software-transactional-memory.html\]/\[file:28-software-transactional-memory.org\]/g' *.org
11 | sed -i 's/\[file:profiling-and-optimization.html\]/\[file:25-profiling-and-optimization.org\]/g' *.org
12 | sed -i 's/\[file:functional-programming.html\]/\[file:4-functional-programming.org\]/g' *.org
13 | sed -i 's/\[file:monad-transformers.html\]/\[file:18-monad-transformers.org\]/g' *.org
14 | sed -i 's/\[file:using-typeclasses.html\]/\[file:6-using-typeclasses.org\]/g' *.org
15 | sed -i 's/\[file:data-structures.html\]/\[file:13-data-structures.org\]/g' *.org
16 | sed -i 's/\[file:using-databases.html\]/\[file:21-using-databases.org\]/g' *.org
17 | sed -i 's/\[file:error-handling.html\]/\[file:19-error-handling.org\]/g' *.org
18 | sed -i 's/\[file:using-parsec.html\]/\[file:14-using-parsec.org\]/g' *.org
19 | sed -i 's/\[file:parsec.html\]/\[file:14-using-parsec.org\]/g' *.org
20 | sed -i 's/\[file:io.html\]/\[file:7-io.org\]/g' *.org


--------------------------------------------------------------------------------
/bin/update_quotation_marks:
--------------------------------------------------------------------------------
1 | sed -i -e 's/“/"/g' -e 's/”/"/g' *.org


--------------------------------------------------------------------------------
/figs/ch11-hpc-round1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tssm/up-to-date-real-world-haskell/1c9b06e4564177587d5b54b20ba86736ea2a0d88/figs/ch11-hpc-round1.png


--------------------------------------------------------------------------------
/figs/ch11-hpc-round2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tssm/up-to-date-real-world-haskell/1c9b06e4564177587d5b54b20ba86736ea2a0d88/figs/ch11-hpc-round2.png


--------------------------------------------------------------------------------
/figs/ch12-bad-angled.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tssm/up-to-date-real-world-haskell/1c9b06e4564177587d5b54b20ba86736ea2a0d88/figs/ch12-bad-angled.jpg


--------------------------------------------------------------------------------
/figs/ch12-bad-too-far.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tssm/up-to-date-real-world-haskell/1c9b06e4564177587d5b54b20ba86736ea2a0d88/figs/ch12-bad-too-far.jpg


--------------------------------------------------------------------------------
/figs/ch12-bad-too-near.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tssm/up-to-date-real-world-haskell/1c9b06e4564177587d5b54b20ba86736ea2a0d88/figs/ch12-bad-too-near.jpg


--------------------------------------------------------------------------------
/figs/ch12-barcode-example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tssm/up-to-date-real-world-haskell/1c9b06e4564177587d5b54b20ba86736ea2a0d88/figs/ch12-barcode-example.png


--------------------------------------------------------------------------------
/figs/ch12-barcode-generated.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tssm/up-to-date-real-world-haskell/1c9b06e4564177587d5b54b20ba86736ea2a0d88/figs/ch12-barcode-generated.png


--------------------------------------------------------------------------------
/figs/ch12-barcode-photo.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tssm/up-to-date-real-world-haskell/1c9b06e4564177587d5b54b20ba86736ea2a0d88/figs/ch12-barcode-photo.jpg


--------------------------------------------------------------------------------
/figs/ch25-heap-hc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tssm/up-to-date-real-world-haskell/1c9b06e4564177587d5b54b20ba86736ea2a0d88/figs/ch25-heap-hc.png


--------------------------------------------------------------------------------
/figs/ch25-heap-hd.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tssm/up-to-date-real-world-haskell/1c9b06e4564177587d5b54b20ba86736ea2a0d88/figs/ch25-heap-hd.png


--------------------------------------------------------------------------------
/figs/ch25-heap-hy.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tssm/up-to-date-real-world-haskell/1c9b06e4564177587d5b54b20ba86736ea2a0d88/figs/ch25-heap-hy.png


--------------------------------------------------------------------------------
/figs/ch25-stack.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tssm/up-to-date-real-world-haskell/1c9b06e4564177587d5b54b20ba86736ea2a0d88/figs/ch25-stack.png


--------------------------------------------------------------------------------
/figs/gui-glade-3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tssm/up-to-date-real-world-haskell/1c9b06e4564177587d5b54b20ba86736ea2a0d88/figs/gui-glade-3.png


--------------------------------------------------------------------------------
/figs/gui-pod-addwin.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tssm/up-to-date-real-world-haskell/1c9b06e4564177587d5b54b20ba86736ea2a0d88/figs/gui-pod-addwin.png


--------------------------------------------------------------------------------
/figs/gui-pod-mainwin.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tssm/up-to-date-real-world-haskell/1c9b06e4564177587d5b54b20ba86736ea2a0d88/figs/gui-pod-mainwin.png


--------------------------------------------------------------------------------
/figs/gui-update-complete.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tssm/up-to-date-real-world-haskell/1c9b06e4564177587d5b54b20ba86736ea2a0d88/figs/gui-update-complete.png


--------------------------------------------------------------------------------