├── pics
    ├── diff1.pdf
    ├── diff2.pdf
    ├── diff3.pdf
    ├── diff4.pdf
    ├── diff5.pdf
    ├── diff6.pdf
    ├── diff7.pdf
    ├── diff8.pdf
    ├── diff9.pdf
    ├── case1_1.png
    ├── case1_2.png
    ├── case1_3.png
    ├── case1_4.png
    ├── case1_5.png
    ├── case1_6.png
    ├── case1_7.png
    ├── case1_8.png
    ├── case1_9.png
    ├── case2_1.png
    ├── case2_2.png
    ├── case2_3.png
    ├── case3_1.png
    ├── case3_2.png
    ├── case3_3.png
    ├── diff10.pdf
    ├── diff11.pdf
    ├── diff12.pdf
    ├── diff13.pdf
    ├── diff14.pdf
    ├── diff15.pdf
    ├── diff16.pdf
    ├── diff17.pdf
    ├── diff18.pdf
    ├── diff19.pdf
    ├── diff20.pdf
    ├── diff21.pdf
    ├── diff22.pdf
    ├── diff23.pdf
    ├── zendo1.png
    ├── zendo2.png
    ├── case1_10.png
    ├── hedgehog1.png
    ├── hedgehog2.png
    ├── hedgehog3.png
    ├── hedgehog4.png
    ├── hedgehog5.png
    ├── hedgehog6.png
    ├── hedgehog7.png
    ├── komposition-light.png
    ├── property_inverse.png
    ├── property_list_rev.png
    ├── property_commutative.png
    ├── property_dollar_map.png
    ├── property_idempotence.png
    ├── property_induction.png
    ├── property_invariant.png
    ├── property_list_sort.png
    ├── property_list_sort1.png
    ├── property_list_sort2.png
    ├── property_list_sort3.png
    ├── property_test_oracle.png
    ├── choosing_properties_1.png
    ├── choosing_properties_10.png
    ├── choosing_properties_11.png
    ├── choosing_properties_2.png
    ├── choosing_properties_3.png
    ├── choosing_properties_4.png
    ├── choosing_properties_5.png
    ├── choosing_properties_6.png
    ├── choosing_properties_7.png
    ├── choosing_properties_8.png
    ├── choosing_properties_9.png
    ├── property_dollar_times.png
    ├── property_dollar_times2.png
    ├── property_dollar_times3.png
    ├── property_string_split.png
    ├── property_list_rev_inverse.png
    ├── property_easy_verification.png
    ├── property_list_sort_pairwise.png
    └── property_list_sort_permutation.png
├── HaskellArticles.tex
├── HaskellArticles_1_4.pdf
├── HaskellArticles_1_6.pdf
├── tools
    └── post-commit
├── README.md
├── .gitignore
├── HaskellArticles.bib
├── jasper_van_der_jeugt.tex
├── michael_snoyman.tex
├── david_luposchainsky.tex
├── matt_parsons2.tex
├── hillel_wayne2.tex
├── hillel_wayne.tex
├── matt_parsons.tex
├── jasper_van_der_jeugt2.tex
├── LICENSE
├── alexis_king3.tex
└── jasper_van_der_jeugt3.tex


/pics/diff1.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff1.pdf


--------------------------------------------------------------------------------
/pics/diff2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff2.pdf


--------------------------------------------------------------------------------
/pics/diff3.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff3.pdf


--------------------------------------------------------------------------------
/pics/diff4.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff4.pdf


--------------------------------------------------------------------------------
/pics/diff5.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff5.pdf


--------------------------------------------------------------------------------
/pics/diff6.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff6.pdf


--------------------------------------------------------------------------------
/pics/diff7.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff7.pdf


--------------------------------------------------------------------------------
/pics/diff8.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff8.pdf


--------------------------------------------------------------------------------
/pics/diff9.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff9.pdf


--------------------------------------------------------------------------------
/pics/case1_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/case1_1.png


--------------------------------------------------------------------------------
/pics/case1_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/case1_2.png


--------------------------------------------------------------------------------
/pics/case1_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/case1_3.png


--------------------------------------------------------------------------------
/pics/case1_4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/case1_4.png


--------------------------------------------------------------------------------
/pics/case1_5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/case1_5.png


--------------------------------------------------------------------------------
/pics/case1_6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/case1_6.png


--------------------------------------------------------------------------------
/pics/case1_7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/case1_7.png


--------------------------------------------------------------------------------
/pics/case1_8.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/case1_8.png


--------------------------------------------------------------------------------
/pics/case1_9.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/case1_9.png


--------------------------------------------------------------------------------
/pics/case2_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/case2_1.png


--------------------------------------------------------------------------------
/pics/case2_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/case2_2.png


--------------------------------------------------------------------------------
/pics/case2_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/case2_3.png


--------------------------------------------------------------------------------
/pics/case3_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/case3_1.png


--------------------------------------------------------------------------------
/pics/case3_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/case3_2.png


--------------------------------------------------------------------------------
/pics/case3_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/case3_3.png


--------------------------------------------------------------------------------
/pics/diff10.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff10.pdf


--------------------------------------------------------------------------------
/pics/diff11.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff11.pdf


--------------------------------------------------------------------------------
/pics/diff12.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff12.pdf


--------------------------------------------------------------------------------
/pics/diff13.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff13.pdf


--------------------------------------------------------------------------------
/pics/diff14.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff14.pdf


--------------------------------------------------------------------------------
/pics/diff15.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff15.pdf


--------------------------------------------------------------------------------
/pics/diff16.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff16.pdf


--------------------------------------------------------------------------------
/pics/diff17.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff17.pdf


--------------------------------------------------------------------------------
/pics/diff18.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff18.pdf


--------------------------------------------------------------------------------
/pics/diff19.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff19.pdf


--------------------------------------------------------------------------------
/pics/diff20.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff20.pdf


--------------------------------------------------------------------------------
/pics/diff21.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff21.pdf


--------------------------------------------------------------------------------
/pics/diff22.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff22.pdf


--------------------------------------------------------------------------------
/pics/diff23.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/diff23.pdf


--------------------------------------------------------------------------------
/pics/zendo1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/zendo1.png


--------------------------------------------------------------------------------
/pics/zendo2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/zendo2.png


--------------------------------------------------------------------------------
/HaskellArticles.tex:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/HaskellArticles.tex


--------------------------------------------------------------------------------
/pics/case1_10.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/case1_10.png


--------------------------------------------------------------------------------
/pics/hedgehog1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/hedgehog1.png


--------------------------------------------------------------------------------
/pics/hedgehog2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/hedgehog2.png


--------------------------------------------------------------------------------
/pics/hedgehog3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/hedgehog3.png


--------------------------------------------------------------------------------
/pics/hedgehog4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/hedgehog4.png


--------------------------------------------------------------------------------
/pics/hedgehog5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/hedgehog5.png


--------------------------------------------------------------------------------
/pics/hedgehog6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/hedgehog6.png


--------------------------------------------------------------------------------
/pics/hedgehog7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/hedgehog7.png


--------------------------------------------------------------------------------
/HaskellArticles_1_4.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/HaskellArticles_1_4.pdf


--------------------------------------------------------------------------------
/HaskellArticles_1_6.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/HaskellArticles_1_6.pdf


--------------------------------------------------------------------------------
/pics/komposition-light.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/komposition-light.png


--------------------------------------------------------------------------------
/pics/property_inverse.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/property_inverse.png


--------------------------------------------------------------------------------
/pics/property_list_rev.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/property_list_rev.png


--------------------------------------------------------------------------------
/pics/property_commutative.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/property_commutative.png


--------------------------------------------------------------------------------
/pics/property_dollar_map.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/property_dollar_map.png


--------------------------------------------------------------------------------
/pics/property_idempotence.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/property_idempotence.png


--------------------------------------------------------------------------------
/pics/property_induction.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/property_induction.png


--------------------------------------------------------------------------------
/pics/property_invariant.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/property_invariant.png


--------------------------------------------------------------------------------
/pics/property_list_sort.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/property_list_sort.png


--------------------------------------------------------------------------------
/pics/property_list_sort1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/property_list_sort1.png


--------------------------------------------------------------------------------
/pics/property_list_sort2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/property_list_sort2.png


--------------------------------------------------------------------------------
/pics/property_list_sort3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/property_list_sort3.png


--------------------------------------------------------------------------------
/pics/property_test_oracle.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/property_test_oracle.png


--------------------------------------------------------------------------------
/pics/choosing_properties_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/choosing_properties_1.png


--------------------------------------------------------------------------------
/pics/choosing_properties_10.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/choosing_properties_10.png


--------------------------------------------------------------------------------
/pics/choosing_properties_11.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/choosing_properties_11.png


--------------------------------------------------------------------------------
/pics/choosing_properties_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/choosing_properties_2.png


--------------------------------------------------------------------------------
/pics/choosing_properties_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/choosing_properties_3.png


--------------------------------------------------------------------------------
/pics/choosing_properties_4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/choosing_properties_4.png


--------------------------------------------------------------------------------
/pics/choosing_properties_5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/choosing_properties_5.png


--------------------------------------------------------------------------------
/pics/choosing_properties_6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/choosing_properties_6.png


--------------------------------------------------------------------------------
/pics/choosing_properties_7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/choosing_properties_7.png


--------------------------------------------------------------------------------
/pics/choosing_properties_8.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/choosing_properties_8.png


--------------------------------------------------------------------------------
/pics/choosing_properties_9.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/choosing_properties_9.png


--------------------------------------------------------------------------------
/pics/property_dollar_times.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/property_dollar_times.png


--------------------------------------------------------------------------------
/pics/property_dollar_times2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/property_dollar_times2.png


--------------------------------------------------------------------------------
/pics/property_dollar_times3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/property_dollar_times3.png


--------------------------------------------------------------------------------
/pics/property_string_split.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/property_string_split.png


--------------------------------------------------------------------------------
/pics/property_list_rev_inverse.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/property_list_rev_inverse.png


--------------------------------------------------------------------------------
/pics/property_easy_verification.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/property_easy_verification.png


--------------------------------------------------------------------------------
/pics/property_list_sort_pairwise.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/property_list_sort_pairwise.png


--------------------------------------------------------------------------------
/pics/property_list_sort_permutation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/oswald2/haskell_articles/HEAD/pics/property_list_sort_permutation.png


--------------------------------------------------------------------------------
/tools/post-commit:
--------------------------------------------------------------------------------
#!/bin/sh
#
# Post-commit hook: records the hash of the current commit and the most
# recent tag in git_hash.tex and git_tag.tex (both are git-ignored).
# Git runs this hook after a commit has been created, so its exit
# status cannot stop the commit.
#
# To enable this hook, copy this file to .git/hooks/post-commit and make it executable.

git rev-parse HEAD > git_hash.tex
git describe --abbrev=0 --tags > git_tag.tex
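Git only runs hooks that live in the repository's hooks directory (by default `.git/hooks`), so the script above has to be copied there and made executable. A minimal sketch, assuming it is run against a checkout that contains `tools/post-commit`; the function name `install_post_commit_hook` is hypothetical and not part of this repository:

```sh
#!/bin/sh
# Hypothetical helper: installs tools/post-commit as the active post-commit hook.
# The optional first argument is the repository root (defaults to the current directory).
install_post_commit_hook() {
    repo_root="${1:-.}"
    hook_dir="$repo_root/.git/hooks"

    # Create the hooks directory if it is missing, copy the script,
    # and mark it executable so that Git will run it.
    mkdir -p "$hook_dir" &&
    cp "$repo_root/tools/post-commit" "$hook_dir/post-commit" &&
    chmod +x "$hook_dir/post-commit"
}
```

After `install_post_commit_hook .` has been run once, each subsequent commit should refresh `git_hash.tex` and `git_tag.tex`.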


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Haskell Articles

Derived from the post [A List of Haskell articles on good design, good testing](https://williamyaoh.com/posts/2019-11-24-design-and-testing-articles.html).

All articles are contained in this repository in LaTeX book format.

## LaTeX'ing

The source uses the LaTeX package minted for syntax highlighting, which requires the "-shell-escape" flag. To build the document:

```
pdflatex -shell-escape HaskellArticles.tex
```

The document also contains a bibliography, so you need to run BibTeX as well, and then pdflatex several more times to get the cross-references right.
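Put together, a full build might look like the sketch below. It assumes that `pdflatex` and `bibtex` are on the `PATH` and that the bibliography is processed with BibTeX rather than biber. The function name `build_articles` and the `PDFLATEX`/`BIBTEX` override variables are made up for illustration and are not part of this repository.

```sh
#!/bin/sh
# Sketch of a full build: one LaTeX pass, BibTeX, then two more LaTeX passes
# so that citations and cross-references settle.
# build_articles, PDFLATEX and BIBTEX are illustrative names (assumptions).
build_articles() {
    doc="${1:-HaskellArticles}"
    latex="${PDFLATEX:-pdflatex}"
    bib="${BIBTEX:-bibtex}"

    # First pass writes the .aux file that BibTeX reads.
    "$latex" -shell-escape "$doc.tex" &&
    # BibTeX resolves the citations against the .bib file.
    "$bib" "$doc" &&
    # Second pass pulls in the generated bibliography.
    "$latex" -shell-escape "$doc.tex" &&
    # Third pass fixes the remaining cross-references.
    "$latex" -shell-escape "$doc.tex"
}
```

Calling `build_articles` without arguments builds `HaskellArticles.tex`; the chain stops at the first step that fails.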


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
## Core latex/pdflatex auxiliary files:
*.aux
*.lof
*.log
*.lot
*.fls
*.out
*.toc
*.fmt
*.fot
*.cb
*.cb2
.*.lb

## Intermediate documents:
*.dvi
*.xdv
*-converted-to.*
# these rules might exclude image files for figures etc.
# *.ps
# *.eps
# *.pdf

## Generated if empty string is given at "Please type another file name for output:"
.pdf

## Bibliography auxiliary files (bibtex/biblatex/biber):
*.bbl
*.bcf
*.blg
*-blx.aux
*-blx.bib
*.run.xml

## Build tool auxiliary files:
*.fdb_latexmk
*.synctex
*.synctex(busy)
*.synctex.gz
*.synctex.gz(busy)
*.pdfsync

## Auxiliary and intermediate files from other packages:
# algorithms
*.alg
*.loa

# achemso
acs-*.bib

# amsthm
*.thm

# beamer
*.nav
*.pre
*.snm
*.vrb

# changes
*.soc

# cprotect
*.cpt

# elsarticle (documentclass of Elsevier journals)
*.spl

# endnotes
*.ent

# fixme
*.lox

# feynmf/feynmp
*.mf
*.mp
*.t[1-9]
*.t[1-9][0-9]
*.tfm

#(r)(e)ledmac/(r)(e)ledpar
*.end
*.?end
*.[1-9]
*.[1-9][0-9]
*.[1-9][0-9][0-9]
*.[1-9]R
*.[1-9][0-9]R
*.[1-9][0-9][0-9]R
*.eledsec[1-9]
*.eledsec[1-9]R
*.eledsec[1-9][0-9]
*.eledsec[1-9][0-9]R
*.eledsec[1-9][0-9][0-9]
*.eledsec[1-9][0-9][0-9]R

# glossaries
*.acn
*.acr
*.glg
*.glo
*.gls
*.glsdefs

# gnuplottex
*-gnuplottex-*

# gregoriotex
*.gaux
*.gtex

# htlatex
*.4ct
*.4tc
*.idv
*.lg
*.trc
*.xref

# hyperref
*.brf

# knitr
*-concordance.tex
# TODO Comment the next line if you want to keep your tikz graphics files
*.tikz
*-tikzDictionary

# listings
*.lol

# makeidx
*.idx
*.ilg
*.ind
*.ist

# minitoc
*.maf
*.mlf
*.mlt
*.mtc[0-9]*
*.slf[0-9]*
*.slt[0-9]*
*.stc[0-9]*

# minted
_minted*
*.pyg

# morewrites
*.mw

# nomencl
*.nlg
*.nlo
*.nls

# pax
*.pax

# pdfpcnotes
*.pdfpc

# sagetex
*.sagetex.sage
*.sagetex.py
*.sagetex.scmd

# scrwfile
*.wrt

# sympy
*.sout
*.sympy
sympy-plots-for-*.tex/

# pdfcomment
*.upa
*.upb

# pythontex
*.pytxcode
pythontex-files-*/

# thmtools
*.loe

# TikZ & PGF
*.dpth
*.md5
*.auxlock

# todonotes
*.tdo

# easy-todo
*.lod

# xmpincl
*.xmpi

# xindy
*.xdy

# xypic precompiled matrices
*.xyc

# endfloat
*.ttt
*.fff

# Latexian
TSWLatexianTemp*

## Editors:
# WinEdt
*.bak
*.sav

# Texpad
.texpadtmp

# Kile
*.backup

# KBibTeX
*~[0-9]*

# auto folder when using emacs and auctex
# (no leading "./": gitignore patterns with that prefix do not match)
auto/*
*.el

# expex forward references with \gathertags
*-tags.tex

# standalone packages
*.sta

# generated if using elsarticle.cls
*.spl
tmp/
git_hash.tex
git_tag.tex
*~


--------------------------------------------------------------------------------
/HaskellArticles.bib:
--------------------------------------------------------------------------------
  1 | @misc{original_article,
  2 | 	howpublished = {William Yao: A list of Haskell articles on good design, good testing -- \url{https://williamyaoh.com/posts/2019-11-24-design-and-testing-articles.html}},
  3 | 	note = {Accessed: 27.11.2019}
  4 | }
  5 | 
  6 | 
  7 | 
  8 | @misc{type_safety_back_and_forth,
  9 | 	howpublished = {Matt Parsons: Type Safety Back and Forth -- \url{https://www.parsonsmatt.org/2017/10/11/type_safety_back_and_forth.html}},
 10 | 	note = {Accessed: 27.11.2019}
 11 | }
 12 | 
 13 | 
 14 | 
 15 | 
 16 | @misc{keep_your_types_small,
 17 | 	howpublished = {Matt Parsons: Keep your types small... -- \url{https://www.parsonsmatt.org/2018/10/02/small_types.html}},
 18 | 	note = {Accessed: 27.11.2019}
 19 | }
 20 | 
 21 | @misc{algebraic_blindness,
 22 | 	howpublished = {David Luposchainsky: Algebraic Blindness -- \url{https://github.com/quchen/articles/blob/master/algebraic-blindness.md}},
 23 | 	note = {Accessed: 27.11.2019}
 24 | }
 25 | 
 26 | 
 27 | @misc{parse_dont_validate,
 28 | 	howpublished = {Alexis King: Parse, don't validate -- \url{https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/}},
 29 | 	note = {Accessed: 27.11.2019}
 30 | }
 31 | 
 32 | 
 33 | 
 34 | @misc{on_adhoc_datatypes,
 35 | 	howpublished = {Jasper van der Jeugt: On Ad-hoc Datatypes -- \url{https://jaspervdj.be/posts/2016-05-11-ad-hoc-datatypes.html}},
 36 | 	note = {Accessed: 27.11.2019}
 37 | }
 38 | 
 39 | 
 40 | 
 41 | @misc{good_design_and_type_safety_in_yahtzee,
 42 | 	howpublished = {Tom Ellis: Good design and type safety in Yahtzee -- \url{http://h2.jaguarpaw.co.uk/posts/good-design-and-type-safety-in-yahtzee/}},
 43 | 	note = {Accessed: 27.11.2019}
 44 | }
 45 | 
 46 | @misc{using_our_brain_less,
 47 | 	howpublished = {Tom Ellis: Using our brain less in refactoring Yahtzee -- \url{http://h2.jaguarpaw.co.uk/posts/using-brain-less-refactoring-yahtzee/}},
 48 | 	note = {Accessed: 27.11.2019}
 49 | }
 50 | 
 51 | 
 52 | @misc{weakly_typed_haskell,
 53 | 	howpublished = {Michael Snoyman: Weakly Typed Haskell -- \url{https://www.fpcomplete.com/blog/2018/01/weakly-typed-haskell}},
 54 | 	note = {Accessed: 27.11.2019}
 55 | }
 56 | 
 57 | 
 58 | @misc{the_trouble_with_typed_errors,
 59 | 	howpublished = {Matt Parsons: The Trouble with Typed Errors -- \url{https://www.parsonsmatt.org/2018/11/03/trouble_with_typed_errors.html}},
 60 | 	note = {Accessed: 27.11.2019}
 61 | }
 62 | 
 63 | 
 64 | @misc{type_directed_code_generation,
 65 | 	howpublished = {Sandy Maguire: Type-Directed Code Generation -- \url{https://reasonablypolymorphic.com/blog/type-directed-code-generation/}},
 66 | 	note = {Accessed: 27.11.2019}
 67 | }
 68 | 
 69 | 
 70 | @misc{practical_testing_in_haskell,
 71 | 	howpublished = {Jasper van der Jeugt: Practical testing in Haskell -- \url{https://jaspervdj.be/posts/2015-03-13-practical-testing-in-haskell.html}},
 72 | 	note = {Accessed: 27.11.2019}
 73 | }
 74 | 
 75 | 
 76 | @misc{screencast_introduction,
 77 | 	howpublished = {Oskar Wickström: Property-Based Testing in a Screencast Editor: Introduction -- \url{https://wickstrom.tech/programming/2019/03/02/property-based-testing-in-a-screencast-editor-introduction.html}},
 78 | 	note = {Accessed: 27.11.2019}
 79 | }
 80 | 
 81 | @misc{timeline_flattening,
 82 | 	howpublished = {Oskar Wickström: Property-Based Testing in a Screencast Editor, Case Study 1: Timeline Flattening -- \url{https://wickstrom.tech/programming/2019/03/24/property-based-testing-in-a-screencast-editor-case-study-1.html}},
 83 | 	note = {Accessed: 27.11.2019}
 84 | }
 85 | @misc{video_scene_classification,
 86 | 	howpublished = {Oskar Wickström: Property-Based Testing in a Screencast Editor, Case Study 2: Video Scene Classification -- \url{https://wickstrom.tech/programming/2019/04/17/property-based-testing-in-a-screencast-editor-case-study-2.html}},
 87 | 	note = {Accessed: 27.11.2019}
 88 | }
 89 | 
 90 | @misc{integration_testing,
 91 | 	howpublished = {Oskar Wickström: Property-Based Testing in a Screencast Editor, Case Study 3: Integration Testing -- \url{https://wickstrom.tech/programming/2019/06/02/property-based-testing-in-a-screencast-editor-case-study-3.html}},
 92 | 	note = {Accessed: 27.11.2019}
 93 | }
 94 | 
 95 | 	
 96 | @misc{choosing_properties,
 97 | 	howpublished = {Scott Wlaschin: Choosing properties for property-based testing -- \url{https://fsharpforfunandprofit.com/posts/property-based-testing-2/}},
 98 | 	note = {Accessed: 27.11.2019}
 99 | }
100 | 	
101 | @misc{finding_properties,
102 | 	howpublished = {Hillel Wayne: Finding Property Tests -- \url{https://www.hillelwayne.com/post/contract-examples/}},
103 | 	note = {Accessed: 27.11.2019}
104 | }
105 | 	
106 | @misc{using_types_to_unit_test,
107 | 	howpublished = {Alexis King: Using types to unit-test in Haskell -- \url{https://lexi-lambda.github.io/blog/2016/10/03/using-types-to-unit-test-in-haskell/}},
108 | 	note = {Accessed: 27.11.2019}
109 | }
110 | 	
111 | @misc{time_travelling,
112 | 	howpublished = {Oskar Wickström: Time Travelling and Fixing Bugs with Property-Based Testing -- \url{https://wickstrom.tech/programming/2019/11/17/time-travelling-and-fixing-bugs-with-property-based-testing.html}},
113 | 	note = {Accessed: 27.11.2019}
114 | }
115 | 	
116 | @misc{metamorphic_testing,
117 | 	howpublished = {Hillel Wayne: Metamorphic Testing -- \url{https://www.hillelwayne.com/post/metamorphic-testing/}},
118 | 	note = {Accessed: 27.11.2019}
119 | }
120 | 	
121 | @misc{unit_testing_monad_mock,
122 | 	howpublished = {Alexis King: Unit testing effectful Haskell with monad-mock -- \url{https://lexi-lambda.github.io/blog/2017/06/29/unit-testing-effectful-haskell-with-monad-mock/}},
123 | 	note = {Accessed: 27.11.2019}
124 | }
125 | @misc{the_handle_pattern,
126 | 	howpublished = {Jasper van der Jeugt: The Handle Pattern -- \url{https://jaspervdj.be/posts/2018-03-08-handle-pattern.html}},
127 | 	note = {Accessed: 11.12.2019}
128 | }
129 | 	
130 | @misc{readert_pattern,
131 | 	howpublished = {Michael Snoyman: The ReaderT Design Pattern -- \url{https://www.fpcomplete.com/blog/2017/06/readert-design-pattern}},
132 | 	note = {Accessed: 11.12.2019}
133 | }


--------------------------------------------------------------------------------
/jasper_van_der_jeugt.tex:
--------------------------------------------------------------------------------
  1 | \chapter{On Ad-hoc Datatypes - Jasper Van der Jeugt}
  2 | \label{sec:adhoc_datatypes}
  3 | 
  4 | \begin{quotation}
  5 | \noindent\textit{\textbf{William Yao:}}
  6 | 
  7 | \textit{In a similar vein to preventing algebraic blindness, make your code more readable by naming things with datatypes, even ones that only live in a single module. Defining new datatypes is cheap and easy, so do it!}
  8 | 
  9 | \vspace{\baselineskip}
 10 | \noindent\textit{Original article: \cite{on_adhoc_datatypes}}
 11 | \end{quotation}
 12 | In Haskell, it is extremely easy for the programmer to add a quick datatype. It does not have to take more than a few lines. This is useful to add auxiliary, ad-hoc datatypes.
 13 | 
 14 | I don't think this is used enough. Most libraries and code I see use ``heavier'' datatypes: canonical and very well-thought-out datatypes, followed by dozens of instances and related functions. These are of course great to work with, but it doesn't have to be a restriction: adding quick datatypes -- without all these instances and auxiliary functions -- often makes code easier to read.
 15 | 
 16 | The key idea is that, in order to make code as simple as possible, you want to represent your data as simply as possible. However, the definition of ``simple'' is not the same in every context. Sometimes, looking at your data from another perspective makes specific tasks easier. In those cases, introducing ``quick-and-dirty'' datatypes often makes code cleaner.
 17 | 
 18 | This blogpost is written in literate Haskell so you should be able to just load it up in GHCi and play around with it. You can find the raw \texttt{.lhs} file \href{https://github.com/jaspervdj/jaspervdj/raw/master/posts/2016-05-11-ad-hoc-datatypes.lhs}{here}.
 19 | 
 20 | \begin{minted}{haskell}
 21 | import Control.Monad (forM_)
 22 | \end{minted}
 23 | Let's look at a quick example. Here, we have a definition of a shopping cart in a fruit store.
 24 | 
 25 | \begin{minted}{haskell}
 26 | data Fruit = Banana | Apple | Orange
 27 |     deriving (Show, Eq)
 28 | 
 29 | type Cart = [(Fruit, Int)]
 30 | \end{minted}
 31 | And we have a few functions to inspect it.
 32 | 
 33 | \begin{minted}{haskell}
 34 | cartIsHomogeneous :: Cart -> Bool
 35 | cartIsHomogeneous []                = True
 36 | cartIsHomogeneous ((fruit, _) : xs) = all ((== fruit) . fst) xs
 37 | 
 38 | cartTotalItems :: Cart -> Int
 39 | cartTotalItems = sum . map snd
 40 | \end{minted}
 41 | This is very much like code you typically see in Haskell codebases (of course, with more complex datatypes than this simple example).
 42 | 
 43 | The last function we want to add is printing a cart. The exact way we format it depends on what is in the cart. There are four possible scenarios.
 44 | 
 45 | \begin{enumerate}
 46 | \item The cart is empty.
 47 | \item There is a single item in the customer's cart and we have some sort of simplified checkout.
 48 | \item The customer buys three or more of the same fruit (and nothing else). In that case we give out a bonus.
 49 | \item None of the above.
 50 | \end{enumerate}
 51 | This is clearly a good candidate for Haskell's \texttt{case} expression and guards. Let's try that.
 52 | 
 53 | \begin{minted}{haskell}
 54 | printCart :: Cart -> IO ()
 55 | printCart cart = case cart of
 56 |     []           -> putStrLn $ "Your cart is empty"
 57 |     [(fruit, 1)] -> putStrLn $ "You are buying one " ++ show fruit
 58 |     _ | cartIsHomogeneous cart && cartTotalItems cart >= 3 -> do
 59 |               putStrLn $
 60 |                   show (cartTotalItems cart) ++
 61 |                   " " ++ show (fst $ head cart) ++ "s" ++
 62 |                   " for you!"
 63 |               putStrLn "BONUS GET!"
 64 |       | otherwise -> do
 65 |           putStrLn $ "Your cart: "
 66 |           forM_ cart $ \(fruit, num) ->
 67 |               putStrLn $ "- " ++ show num ++ " " ++ show fruit
 68 | \end{minted}
 69 | This is not very nice. The business logic is interspersed with the printing code. We could clean it up by adding additional predicates such as \texttt{cartIsBonus}, but having too many of these predicates leads to a certain kind of \href{https://existentialtype.wordpress.com/2011/03/15/boolean-blindness/}{Boolean Blindness}.
 70 | 
 71 | Instead, it seems much nicer to introduce a temporary type:
 72 | 
 73 | \begin{minted}{haskell}
 74 | data CartView
 75 |     = EmptyCart
 76 |     | SingleCart  Fruit
 77 |     | BonusCart   Fruit Int
 78 |     | GeneralCart Cart
 79 | \end{minted}
 80 | This allows us to decompose our \texttt{printCart} into two clean parts: classifying the cart, and printing it.
 81 | 
 82 | \begin{minted}{haskell}
 83 | cartView :: Cart -> CartView
 84 | cartView []           = EmptyCart
 85 | cartView [(fruit, 1)] = SingleCart fruit
 86 | cartView cart
 87 |     | cartIsHomogeneous cart && cartTotalItems cart >= 3 =
 88 |         BonusCart (fst $ head cart) (cartTotalItems cart)
 89 |     | otherwise = GeneralCart cart
 90 | \end{minted}
 91 | Note that we now have a \textit{single} location where we classify the cart. This is useful if we need this information in multiple places. If we chose to solve the problem by adding additional predicates such as \texttt{cartIsBonus} instead, we would still have to watch out that we check these predicates in the \textit{same order} everywhere. Furthermore, if we need to add a case, we can simply add a constructor to this datatype, and the compiler will tell us where we need to update our code\footnote{If you are compiling with \texttt{-Wall}, which is what you really, really should be doing.}.
 92 | 
 93 | Our \texttt{printCart} becomes very simple now:
 94 | 
 95 | \begin{minted}{haskell}
 96 | printCart2 :: Cart -> IO ()
 97 | printCart2 cart = case cartView cart of
 98 |     EmptyCart          -> putStrLn $ "Your cart is empty"
 99 |     SingleCart fruit   -> putStrLn $ "You are buying one " ++ show fruit
100 |     BonusCart  fruit n -> do
101 |         putStrLn $ show n ++ " " ++ show fruit ++ "s for you!"
102 |         putStrLn "BONUS GET!"
103 |     GeneralCart items  -> do
104 |         putStrLn $ "Your cart: "
105 |         forM_ items $ \(fruit, num) ->
106 |             putStrLn $ "- " ++ show num ++ " " ++ show fruit
107 | \end{minted}
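Because the classification lives in one place, a second consumer can reuse it without repeating the predicates. The following is a minimal, self-contained sketch of that idea; \texttt{cartDiscountPercent} and its discount rules are hypothetical and not part of the original article, and \texttt{cartView} is restated here with pattern matching instead of \texttt{head}.

\begin{minted}{haskell}
data Fruit = Banana | Apple | Orange
    deriving (Show, Eq)

type Cart = [(Fruit, Int)]

data CartView
    = EmptyCart
    | SingleCart  Fruit
    | BonusCart   Fruit Int
    | GeneralCart Cart

-- The single place where a cart is classified.
cartView :: Cart -> CartView
cartView []           = EmptyCart
cartView [(fruit, 1)] = SingleCart fruit
cartView cart@((fruit, _) : rest)
    | all ((== fruit) . fst) rest && total >= 3 = BonusCart fruit total
    | otherwise                                 = GeneralCart cart
  where
    total = sum (map snd cart)

-- Hypothetical second consumer: it matches on the view instead of
-- re-checking the predicates, so the order of checks cannot drift.
cartDiscountPercent :: Cart -> Int
cartDiscountPercent cart = case cartView cart of
    EmptyCart     -> 0
    SingleCart _  -> 0
    BonusCart _ n -> min 20 (2 * n)  -- assumed rule: 2% per item, capped at 20%
    GeneralCart _ -> 5               -- assumed flat rate
\end{minted}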
108 | Of course, it goes without saying that ad-hoc datatypes that are only used locally should not be exported from the module -- otherwise you end up with a mess again.
109 | 
110 | 


--------------------------------------------------------------------------------
/michael_snoyman.tex:
--------------------------------------------------------------------------------
  1 | \chapter{Weakly Typed Haskell - Michael Snoyman}
  2 | 
  3 | \begin{quotation}
  4 | \noindent\textit{\textbf{William Yao:}}
  5 | 
  6 | \textit{A short example of preventing errors by restricting our function inputs. Uses the streaming library conduit for its example, but should be understandable without knowing too much about it.}
  7 | 
  8 | \vspace{\baselineskip}
  9 | \noindent\textit{Original article: \cite{weakly_typed_haskell}}
 10 | \end{quotation}
 11 | 
 12 | 
 13 | I was recently doing a minor cleanup of a Haskell codebase. I started off with some code that looked like this:
 14 | 
 15 | \begin{minted}{haskell}
 16 | runConduitRes $ sourceFile fp .| someConsumer
 17 | \end{minted}
 18 | This code uses \href{https://haskell-lang.org/library/conduit}{Conduit} to stream the contents of a file into a consumer function, and \href{https://www.fpcomplete.com/blog/2017/06/understanding-resourcet}{ResourceT} to ensure that the code is exception safe (the file is closed regardless of exceptions). For various reasons (not relevant to our discussion now), I was trying to reduce usage of \texttt{ResourceT} in this bit of code, and so I instead wrote:
 19 | 
 20 | \begin{minted}{haskell}
 21 | withBinaryFile fp ReadMode $ \h ->
 22 |   runConduit $ sourceHandle h .| someConsumer
 23 | \end{minted}
 24 | Instead of using \texttt{ResourceT} to ensure exception safety, I used the \texttt{with} (or \texttt{bracket}) pattern embodied by the \texttt{withBinaryFile} function. This transformation worked very nicely, and I was able to apply the change to a number of parts of the code base.
 25 | 
 26 | However, I noticed an error message from this application a few days later:
 27 | 
 28 | \begin{minted}{text}
 29 | /some/long/file/path.txt: hGetBufSome: illegal operation (handle is not 
 30 | open for reading)
 31 | \end{minted}
 32 | I looked through my code base, and sure enough I found that in one of the places I'd done this refactoring, I'd written the following instead:
 33 | 
 34 | \begin{minted}{haskell}
 35 | withBinaryFile fp WriteMode $ \h ->
 36 |   runConduit $ sourceHandle h .| someConsumer
 37 | \end{minted}
 38 | Notice how I used \texttt{WriteMode} instead of \texttt{ReadMode}. It's a simple mistake, and it's obvious when you look at it. The patch to fix this bug was trivial. But I wasn't satisfied with fixing this bug. I wanted to eliminate it from happening again.
 39 | 
 40 | \section{A strongly typed language?}
 41 | 
 42 | Lots of people believe that Haskell is a strongly typed language. Strong typing means that you catch lots of classes of bugs with the type system itself. (Static typing means that the type checking occurs at compile time instead of runtime.) I disagree: Haskell is \textit{not} a strongly typed language. In fact, my claim is broader:
 43 | 
 44 | \begin{quotation}
 45 | \textit{There's no such thing as a strongly typed language}
 46 | \end{quotation}
 47 | Instead, you can write your code in strongly typed or weakly typed style. Some language features make it easy to make your programs more strongly typed. For example, Haskell supports:
 48 | 
 49 | \begin{itemize}
 50 | \item Cheap newtype wrappers
 51 | \item Sum types
 52 | \item Phantom type arguments
 53 | \item GADTs
 54 | \end{itemize}
 55 | All of these features allow you to more easily add type safety to your code. But here's the rub:
 56 | 
 57 | \begin{quotation}
 58 |  \textit{You have to add the type safety yourself }
 59 | \end{quotation}
 60 | If you want to write a program in Haskell that passes string data everywhere and puts everything in \texttt{IO}, you're still writing Haskell, but you're throwing away the potential for getting extra protections from the compiler.
 61 | 
 62 | My \texttt{withBinaryFile} usage is a small-scale example of this. The \texttt{sourceFile} function I'd been using previously looks roughly like:
 63 | 
 64 | \begin{minted}{haskell}
 65 | sourceFile :: FilePath -> Source (ResourceT IO) ByteString
 66 | \end{minted}
 67 | This says that if you give this function a \texttt{FilePath}, it will give you back a stream of bytes, and that it requires \texttt{ResourceT} to be present (to register the cleanup function in the case of an exception). Importantly: there's no way you could accidentally try to send data into this file. The type (\texttt{Source}) prevents it. If you did something like:
 68 | 
 69 | \begin{minted}{haskell}
 70 | runConduitRes $ someProducer .| sourceFile "output.txt"
 71 | \end{minted}
 72 | The compiler would complain about the types mismatching, which is exactly what you want! Now, by contrast, let's look at the types of \texttt{withBinaryFile} and \texttt{sourceHandle}:
 73 | 
 74 | \begin{minted}{haskell}
 75 | withBinaryFile :: FilePath -> IOMode -> (Handle -> IO a) -> IO a
 76 | sourceHandle :: Handle -> Source IO ByteString
 77 | \end{minted}
 78 | The type signature of \texttt{withBinaryFile} uses the bracket pattern, meaning that you provide it with a function to run while the file is open, and it will ensure that it closes the file. But notice something about the type of that inner function: \texttt{Handle -> IO a}. It tells you absolutely nothing about whether the file is opened for reading or writing!
 79 | 
 80 | The question is: how do we protect ourselves from the kinds of bugs this weak typing allows?
 81 | 
 82 | \section{Quarantining weak typing}
 83 | 
 84 | Let's capture the basic concept of what I was trying to do in my program with a helper function:
 85 | 
 86 | \begin{minted}{haskell}
 87 | withSourceFile :: FilePath -> (Source IO ByteString -> IO a) -> IO a
 88 | withSourceFile fp inner =
 89 |   withBinaryFile fp ReadMode $ \handle ->
 90 |     inner $ sourceHandle handle
 91 | \end{minted}
 92 | This function has all of the same weak typing problems in its body as what I wrote before. However, let's look at the use site of this function:
 93 | 
 94 | \begin{minted}{haskell}
 95 | withSourceFile fp $ \src -> runConduit $ src .| someConsumer
 96 | \end{minted}
 97 | I definitely can't accidentally pass \texttt{WriteMode} instead, and if I try to do something like:
 98 | 
 99 | \begin{minted}{haskell}
100 | withSourceFile fp $ \src -> runConduit $ someProducer .| src
101 | \end{minted}
102 | I'll get a compile time error. In other words:
103 | \begin{quotation}
104 |  \textit{While my function internally is weakly typed, externally it's strongly typed }
105 | \end{quotation}
106 | This means that all users of my functions get the typing guarantees I've been hoping to provide. We can't eliminate the possibility of weak typing errors completely, since the systems we're running on are ultimately weakly typed. After all, at the OS level, a file descriptor is just an int, and tells you nothing about whether it's read mode, write mode, or even a pointer to some random address in memory.
107 | 
108 | Instead, our goal in writing strongly typed programs is to contain as much of the weak typing to helper functions as possible, and expose a strongly typed interface for most of our program. By using \texttt{withSourceFile} instead of \texttt{withBinaryFile}, I now have just one place in my code I need to check the logic of using \texttt{ReadMode} correctly, instead of dozens.
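
The same quarantining idea can be sketched without conduit, using only \texttt{System.IO} from \texttt{base}. The names \texttt{ReadHandle}, \texttt{WriteHandle}, \texttt{withReadFile}, \texttt{withWriteFile}, \texttt{readFirstLine} and \texttt{writeLine} are hypothetical and not part of any library; the sketch only assumes that each \texttt{IOMode} is mentioned exactly once, inside the helpers.

\begin{minted}{haskell}
import System.IO (Handle, IOMode (..), hGetLine, hPutStrLn, withFile)

-- Hypothetical wrappers that record how the handle was opened.
newtype ReadHandle  = ReadHandle  Handle
newtype WriteHandle = WriteHandle Handle

-- The only places where an IOMode is chosen.
withReadFile :: FilePath -> (ReadHandle -> IO a) -> IO a
withReadFile fp inner = withFile fp ReadMode (inner . ReadHandle)

withWriteFile :: FilePath -> (WriteHandle -> IO a) -> IO a
withWriteFile fp inner = withFile fp WriteMode (inner . WriteHandle)

-- Operations accept only the matching kind of handle.
readFirstLine :: ReadHandle -> IO String
readFirstLine (ReadHandle h) = hGetLine h

writeLine :: WriteHandle -> String -> IO ()
writeLine (WriteHandle h) = hPutStrLn h

-- withWriteFile "out.txt" readFirstLine   -- rejected by the type checker
\end{minted}
With wrappers like these, reading from a handle opened for writing becomes a compile-time error at every call site, while the weakly typed \texttt{Handle} stays confined to a handful of small functions.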
109 | 
110 | \section{Discipline and best practices}
111 | 
112 | The takeaway here is that you can \textit{always} shoot yourself in the foot. Languages like Haskell are not some piece of magic that will eliminate bugs. You need to follow through with discipline in using the languages well if you want to get the benefits of features like strong typing.
113 | 
114 | You can use the same kind of technique in many languages. But if you use a language like Haskell with a plethora of features geared towards easy safety, you'll be much more likely to follow through on it.
115 | 


--------------------------------------------------------------------------------
/david_luposchainsky.tex:
--------------------------------------------------------------------------------
  1 | \chapter{Algebraic blindness - David Luposchainsky}
  2 | 
  3 | \begin{quotation}
  4 | \noindent\textit{\textbf{William Yao:}}
  5 | 
  6 | \textit{The motivating problem: if I have a \texttt{Maybe Int}, I only implicitly know what the two branches \textbf{mean}. \texttt{Nothing} could mean ``something bad happened, abort'', or it could mean ``I haven't found anything yet, keep going.'' Conversely, a value of \texttt{Just x} could be a useful value, or it could be a subroutine signalling that an error code occurred. The names are \textbf{too generic}; the structure only tells us what we have, not what it's for.}
  7 | 
  8 | \textit{Goes through how we might solve this problem by defining new datatypes that are isomorphic to, say, \texttt{Bool}, but with more useful names. Note that the problem that the article talks about with regards to not having typeclass instances for your custom types can (since GHC 8.6) be solved using \texttt{DerivingVia}.}
  9 | 
 10 | \vspace{\baselineskip}
 11 | 
 12 | \noindent\textit{Original article: \cite{algebraic_blindness}}
 13 | \end{quotation}
 14 | 
 15 | \section{Abstract}
 16 | 
 17 | Algebraic data types make data more flexible, but also prone to a form of generalized Boolean Blindness, making code more difficult to maintain. Luckily, refactoring the issues is completely type-safe.
 18 | 
 19 | \section{Boolean blindness}
 20 | 
 21 | 
 22 | In programming, there is a common problem known as \textit{Boolean Blindness}: what it ``means'' to be \texttt{True} depends heavily on the context, and cannot be inferred from the value alone. Consider
 23 | 
 24 | \begin{minted}{haskell}
 25 | main = withFile "file.txt" True (\handle -> do stuff)
 26 | \end{minted}
 27 | The Boolean parameter could distinguish between read-only\-/\-write-only, or read-only\-/\-read+write, whether the file should be truncated before opening, and countless other possibilities.
 28 | 
 29 | And often there is an even worse issue: we have two booleans, and we do not know whether they describe something in the same problem domain. A boolean used for read-only-vs-write-only looks just the same as one distinguishing red from blue. And then, one day, we open a file in ``blue mode'', and probably nothing good follows.
 30 | 
 31 | The point is: in order to find out what the \texttt{True} means, you probably have to read documentation or code elsewhere.
 32 | 
 33 | 
 34 | 
 35 | \section{Haskell to the rescue}
 36 | 
 37 | Haskell makes it very natural and cheap to define our own data types. The example code above would be expressed as
 38 | 
 39 | \begin{minted}{haskell}
 40 | main = withFile "file.txt" ReadMode (\handle -> do stuff)
 41 | \end{minted}
 42 | in more idiomatic Haskell. Here, the meaning of the field is obvious, and since a data type \texttt{data IOMode = ReadMode | WriteMode} has nothing to do with a data type \texttt{data Colours = Red | Blue}, we cannot accidentally pass a \texttt{Red} value to our function, despite the fact that they would both have corresponded to \texttt{True} in the Boolean-typed example.
 43 | 
 44 | 
 45 | 
 46 | 
 47 | \section{The petting zoo of blindness}
 48 | 
 49 | 
 50 | Boolean blindness is of course just a name for the most characteristic version of the issue that most standard types share. An \texttt{Int} parameter to a server might be a port or a timeout, a \texttt{String} could be a host, a route, a log prefix, and so on.
 51 | 
 52 | The Haskell solution is to simply wrap things in newtypes, tag values with phantom types, or introduce type synonyms. (I'm not a fan of the last option, which you can read more about in a \href{https://github.com/quchen/articles/blob/master/tag-dont-type.md}{previous article}.)
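
As a minimal sketch of the newtype approach for the server example, the wrappers below give a port and a timeout distinct types even though both are an \texttt{Int} underneath. The names \texttt{Port}, \texttt{TimeoutSeconds}, \texttt{describeServer} and \texttt{example} are made up for illustration.

\begin{minted}{haskell}
-- Hypothetical domain wrappers around Int; not from any library.
newtype Port           = Port Int           deriving (Show, Eq)
newtype TimeoutSeconds = TimeoutSeconds Int deriving (Show, Eq)

-- Both arguments are Ints underneath, but they can no longer be
-- swapped silently at a call site.
describeServer :: Port -> TimeoutSeconds -> String
describeServer (Port p) (TimeoutSeconds t) =
    "listening on port " ++ show p ++ ", timeout " ++ show t ++ "s"

example :: String
example = describeServer (Port 8080) (TimeoutSeconds 30)

-- describeServer (TimeoutSeconds 30) (Port 8080)  -- rejected by the type checker
\end{minted}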
 53 | 
 54 | 
 55 | 
 56 | \section{Algebraic blindness}
 57 | 
 58 | 
 59 | Haskell has a lot more ``simple, always available, nicely supported'' types than most other languages, for example \texttt{Maybe}, \texttt{Either}, tuples or lists. These types naturally extend Boolean Blindness.
 60 | 
 61 | \begin{itemize}
 62 | \item \texttt{Maybe} adds another distinct value to a type. \texttt{Nothing} is sometimes used to denote an error case (this is what many assume by default, implicitly given by its Monad instance), sometimes the ``nothing weird happened'' case, and sometimes something completely different.
 63 | 
 64 | \texttt{Maybe a} is as blind as \texttt{a}, plus one value.
 65 | 
 66 | \item \texttt{Either} is similar: sometimes \texttt{Left} is an exceptional case, sometimes it's just ``the other'' case.
 67 | 
 68 | \texttt{Either a b} is as blind as \texttt{a} plus as blind as \texttt{b}, plus one for the fact that \texttt{Left} and \texttt{Right} do not have intrinsic meaning.
 69 | 
 70 | \item Pairs have two fields, but how do they relate to each other? Does one maybe tell us about errors in the other? We cannot know.
 71 | 
 72 | \texttt{(a,b)} is as blind as \texttt{a} times the blindness of \texttt{b}.
 73 | 
 74 | \item Unit is not very blind, since even synonyms of it mostly mean the same thing: we don't really care about the result. \end{itemize}
 75 | In GHCi's source code, there is a value of type \texttt{Maybe Bool} passed around, which has three possible values:
 76 | 
 77 | \begin{enumerate}
 78 | \item \texttt{Nothing} means there is no more input to process when GHCi is invoked via \texttt{ghc -e}.
 79 | \item \texttt{Just True} reports success of the last command.
 80 | \item \texttt{Just False} is an error in the last command, and \texttt{ghc -e} should abort with an exit code of 1, while a GHCi session should continue.
 81 | \end{enumerate}
 82 | It is very hard to justify this over something like
 83 | 
 84 | \begin{minted}{haskell}
 85 | data CommandResult
 86 |     = NoMoreInput -- ^ Haddock might go here!
 87 |     | Success
 88 |     | Failure
 89 | \end{minted}
 90 | which is just as easy to implement, has four lines of overhead (instead of 0 lines of overheadache), is easily searchable in full-text search even, and gives us type errors when we use it in the wrong context.
 91 | 
 92 | 
 93 | 
 94 | 
 95 | \section{Haskell to the rescue, for real this time}
 96 | 
 97 | We're lucky, because Haskell has a built-in solution here as well. We can prototype our programs not worrying about the precise meaning of symbols, use \texttt{Either () ()} instead of \texttt{Bool} because we might need the flexibility, and do all sorts of atrocities.
 98 | 
 99 | The type system allows us to repair our code: just understand what the different values of our blind values mean, and encode this in a new data type. Then pick a random use site, and just put it in there. The compiler will yell at you for a couple of minutes, but it will report every single conflicting site, and since you're introducing something entirely new, there is no way you are producing undetected clashes. I found this type of refactoring to be \textit{one of the most powerful tools Haskell has to offer}, but we rarely speak about it because it seems so normal to us.
100 | 
101 | 
102 | 
103 | \section{Drawbacks}
104 | 
105 | Introducing new domain types has a drawback: we lose the API of the standard types. The result is that we sometimes have to write boilerplate to get specific parts of the API back, which is unfortunate, and sometimes this makes it not worth bothering with the anti-blindness refactoring.
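
To give a sense of the boilerplate involved, here is a small sketch for the \texttt{CommandResult} type from above. The conversion functions \texttt{fromMaybeBool} and \texttt{toMaybeBool} are hypothetical names; they follow the meaning of the three \texttt{Maybe Bool} values described for GHCi earlier and exist only so that code written against the \texttt{Maybe} API keeps working.

\begin{minted}{haskell}
data CommandResult
    = NoMoreInput
    | Success
    | Failure
    deriving (Show, Eq)

-- Hypothetical bridge from the blind representation to the domain type.
fromMaybeBool :: Maybe Bool -> CommandResult
fromMaybeBool Nothing      = NoMoreInput
fromMaybeBool (Just True)  = Success
fromMaybeBool (Just False) = Failure

-- And back again, for call sites that still want Maybe's functions.
toMaybeBool :: CommandResult -> Maybe Bool
toMaybeBool NoMoreInput = Nothing
toMaybeBool Success     = Just True
toMaybeBool Failure     = Just False
\end{minted}
Functions like these are trivial to write, but they have to be written and maintained for every domain type, which is the cost this section refers to.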
106 | 
107 | 
108 | 
109 | \section{Conclusion}
110 | 
111 | 
112 | When you have lots of faceless data types in your code, consider painting them with their domain meanings. Make them distinct, make them memorable, make them maintainable. And sometimes, when you see a type that looks like
113 | 
114 | \begin{minted}{haskell}
115 | data IOMode
116 |     = ReadMode
117 |     | WriteMode
118 |     | AppendMode
119 |     | ReadWriteMode
120 | \end{minted}
121 | take a moment to appreciate that the author hasn't used
122 | 
123 | \begin{minted}{haskell}
124 | Either Bool Bool
125 | --      ^    ^
126 | --      |    |
127 | --      |    Complex modes: append, read+write
128 | --      |
129 | --      Simple modes: read/write only
130 | \end{minted}
131 | instead.
132 | 


--------------------------------------------------------------------------------
/matt_parsons2.tex:
--------------------------------------------------------------------------------
  1 | \chapter{The Trouble with Typed Errors - Matt Parsons}
  2 | 
  3 | \begin{quotation}
  4 | \noindent\textit{\textbf{William Yao:}}
  5 | 
  6 | \textit{Or: ``Avoiding a monolithic error type''}
  7 | 
  8 | \textit{Good, but not necessary when starting out. While the approach described here is pretty cool, it's also somewhat heavyweight. You'll have to decide for yourself whether it's worth the extra cognitive overhead. Truthfully it can often be quite reasonable to have a single, monolithic sum type for all your application/library errors.}
  9 | 
 10 | \vspace{\baselineskip}
 11 | \noindent\textit{Original article: \cite{the_trouble_with_typed_errors}}
 12 | \end{quotation}
 13 | 
 14 | You, like me, program in either Haskell, or Scala, or F\#, or Elm, or PureScript, and you don't like runtime errors. They're awful and nasty! You have to debug them, and they're not represented in the types. Instead, we like to use \texttt{Either} (or something isomorphic) to represent stuff that might fail:
 15 | 
 16 | \begin{minted}{haskell}
 17 | data Either l r = Left l | Right r
 18 | \end{minted}
 19 | \texttt{Either} has a \texttt{Monad} instance, so you can short-circuit an \texttt{Either l r} computation with an \texttt{l} value, or bind it to a function on the \texttt{r} value.
 20 | 
 21 | So, we take our unsafe, runtime failure functions:
 22 | 
 23 | \begin{minted}{haskell}
 24 | head   :: [a] -> a
 25 | lookup :: k -> Map k v -> v
 26 | parse  :: String -> Integer
 27 | \end{minted}
 28 | and we use informative error types to represent their possible failures:
 29 | 
 30 | \begin{minted}{haskell}
 31 | data HeadError = ListWasEmpty
 32 | 
 33 | head :: [a] -> Either HeadError a
 34 | 
 35 | data LookupError = KeyWasNotPresent
 36 | 
 37 | lookup :: k -> Map k v -> Either LookupError v
 38 | 
 39 | data ParseError
 40 |     = UnexpectedChar Char String
 41 |     | RanOutOfInput
 42 | 
 43 | parse :: String -> Either ParseError Integer
 44 | \end{minted}
 45 | Except, we don't really use types like \texttt{HeadError} or \texttt{LookupError}. There's only one way that \texttt{head} or \texttt{lookup} could fail. So we just use \texttt{Maybe} instead. \texttt{Maybe a} is just like using \texttt{Either () a} -- there's only one possible \texttt{Left ()} value, and there's only one possible \texttt{Nothing} value. (If you're unconvinced, write \texttt{newtype Maybe a = Maybe (Either () a)}, derive all the relevant instances, and try and detect a difference between this \texttt{Maybe} and the stock one).
 46 | 
 47 | But, \texttt{Maybe} isn't great -- we've lost information! Suppose we have some computation:
 48 | 
 49 | \begin{minted}{haskell}
 50 | foo :: String -> Maybe Integer
 51 | foo str = do
 52 |     c <- head str
 53 |     r <- lookup str strMap
 54 |     eitherToMaybe (parse (c : r))
 55 | \end{minted}
 56 | Now, we try it on some input, and it gives us \texttt{Nothing} back. Which step failed? We actually can't know that! All we can know is that \textit{something} failed.
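
To see the loss of information concretely, here is a runnable sketch that uses stand-ins from \texttt{base} (\texttt{listToMaybe}, the Prelude's association-list \texttt{lookup}, and \texttt{readEither}) in place of the \texttt{head}, \texttt{lookup} and \texttt{parse} functions above. The contents of \texttt{strMap}, the \texttt{failures} list and this definition of \texttt{eitherToMaybe} are assumptions made for illustration.

\begin{minted}{haskell}
import Data.Maybe (listToMaybe)
import Text.Read (readEither)

-- Assumed definition: discard the error and keep only success.
eitherToMaybe :: Either e a -> Maybe a
eitherToMaybe = either (const Nothing) Just

-- Hypothetical lookup table.
strMap :: [(String, String)]
strMap = [("1", "23"), ("x", "yz")]

foo :: String -> Maybe Integer
foo str = do
    c <- listToMaybe str                -- fails on the empty string
    r <- lookup str strMap              -- fails when the key is missing
    eitherToMaybe (readEither (c : r))  -- fails when the text is not a number

-- Three different failures, one indistinguishable result.
failures :: [Maybe Integer]
failures = map foo ["", "q", "x"]
\end{minted}
Every element of \texttt{failures} is \texttt{Nothing}, although each input fails at a different step.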
 57 | 
 58 | So, let's try using \texttt{Either} to get more information on what failed. Can we just write this?
 59 | 
 60 | \begin{minted}{haskell}
 61 | foo :: String -> Either ??? Integer
 62 | foo str = do
 63 |     c <- head str
 64 |     r <- lookup str strMap
 65 |     parse (c : r)
 66 | \end{minted}
 67 | Unfortunately, this gives us a type error. We can see why by looking at the type of \texttt{>>=}:
 68 | 
 69 | \begin{minted}{haskell}
 70 | (>>=) :: (Monad m) => m a -> (a -> m b) -> m b
 71 | \end{minted}
 72 | The type variable \texttt{m} must be an instance of \texttt{Monad}, and the type \texttt{m} must be exactly the same for the value on the left and the function on the right. \texttt{Either LookupError} and \texttt{Either ParseError} are not the same type, and so this does not type check.
 73 | 
 74 | Instead, we need some way of accumulating these possible errors. We'll introduce a utility function \texttt{mapLeft} that helps us:
 75 | 
 76 | \begin{minted}{haskell}
 77 | mapLeft :: (l -> l') -> Either l r -> Either l' r
 78 | mapLeft f (Left l) = Left (f l)
 79 | mapLeft _ (Right r) = Right r
 80 | \end{minted}
 81 | Now, we can combine these error types:
 82 | 
 83 | \begin{minted}{haskell}
 84 | foo :: String
 85 |     -> Either
 86 |         (Either HeadError (Either LookupError ParseError))
 87 |         Integer
 88 | foo str = do
 89 |     c <- mapLeft Left (head str)
 90 |     r <- mapLeft (Right . Left) (lookup str strMap)
 91 |     mapLeft (Right . Right) (parse (c : r))
 92 | \end{minted}
 93 | There! Now we can know exactly how and why the computation failed. Unfortunately, that type is a bit of a monster. It's verbose and all the \texttt{mapLeft} boilerplate is annoying.
 94 | 
 95 | At this point, most application developers will create an ``application error'' type, and they'll just shove everything that can go wrong into it.
 96 | 
 97 | \begin{minted}{haskell}
 98 | data AllErrorsEver
 99 |      = AllParseError ParseError
100 |      | AllLookupError LookupError
101 |      | AllHeadError HeadError
102 |      | AllWhateverError WhateverError
103 |      | FileNotFound FileNotFoundError
104 |      -- etc...
105 | \end{minted}
106 | Now, this slightly cleans up the code:
107 | 
108 | \begin{minted}{haskell}
109 | foo :: String -> Either AllErrorsEver Integer
110 | foo str = do
111 |     c <- mapLeft AllHeadError (head str)
112 |     r <- mapLeft AllLookupError (lookup str strMap)
113 |     mapLeft AllParseError (parse (c : r))
114 | \end{minted}
115 | However, there's a pretty major problem with this code. \texttt{foo} is claiming that it can ``throw'' all kinds of errors -- it's being honest about parse errors, lookup errors, and head errors, but it's also claiming that it will throw if files aren't found, if ``whatever'' happens, and so on. There's no way that a call to \texttt{foo} will result in \texttt{FileNotFound}, because \texttt{foo} can't even do \texttt{IO}! It's absurd. The type is too large! I have written about keeping your types small (see \ref{sec:keep_your_types_small}) and how wonderful it can be for getting rid of bugs.
116 | 
117 | Suppose we want to handle \texttt{foo}'s error. We call the function, and then write a case expression like good Haskellers:
118 | 
119 | \begin{minted}{haskell}
120 | case foo "hello world" of
121 |     Right val ->
122 |         pure val
123 |     Left err ->
124 |         case err of
125 |             AllParseError parseError ->
126 |                 handleParseError parseError
127 |             AllLookupError lookupErr ->
128 |                 handleLookupError lookupErr
129 |             AllHeadError headErr ->
130 |                 handleHeadError headErr
131 |             _ ->
132 |                 error "impossible?!?!?!"
133 | \end{minted}
134 | Unfortunately, this code is brittle to refactoring! We've claimed to handle all errors, but we're really not handling many of them. We currently ``know'' that these are the only errors that can happen, but there's no compiler guarantee that this is the case. Someone might later modify \texttt{foo} to throw another error, and this case expression will break. Any case expression that evaluates any result from \texttt{foo} will need to be updated.
135 | 
136 | The error type is too big, and so we introduce the possibility of mishandling it. There's another problem. Let's suppose we know how to handle a case or two of the error, but we must pass the rest of the error cases upstream:
137 | 
138 | \begin{minted}{haskell}
139 | bar :: String -> Either AllErrorsEver Integer
140 | bar str =
141 |     case foo str of
142 |         Right val -> Right val
143 |         Left err ->
144 |             case err of
145 |                 AllParseError pe ->
146 |                     Right (handleParseError pe)
147 |                 _ ->
148 |                     Left err
149 | \end{minted}
150 | We know that \texttt{AllParseError} has been handled by \texttt{bar}, because -- just look at it! However, the compiler has no idea. Whenever we inspect the error content of \texttt{bar}, we must either a) ``handle'' an error case that has already been handled, perhaps dubiously, or b) ignore the error, and desperately hope that no underlying code ever ends up throwing the error.
151 | 
152 | Are we done with the problems on this approach? No! There's no guarantee that I throw the right error!
153 | 
154 | \begin{minted}{haskell}
155 | head :: [a] -> Either AllErrorsEver a
156 | head (x:xs) = Right x
157 | head [] = Left (AllLookupError KeyWasNotPresent)
158 | \end{minted}
159 | This code typechecks, but it's \textit{wrong}, because \texttt{LookupError} is only supposed to be thrown by \texttt{lookup}! It's obvious in this case, but in larger functions and codebases, it won't be so clear.
160 | 
161 | \section{Monolithic error types are bad}
162 | 
163 | So, having a monolithic error type has a ton of problems. I'm going to make a claim here:
164 | \begin{quotation}
165 | \textit{All error types should have a single constructor}
166 | \end{quotation}
167 | That is, no sum types for errors. How can we handle this?
168 | 
169 | Let's see if we can make \texttt{Either} any nicer to use. We'll define a few helpers:
170 | 
171 | \begin{minted}{haskell}
172 | type (+) = Either  -- requires the TypeOperators extension
173 | infixr 5 +
174 | 
175 | l :: l -> Either l r
176 | l = Left
177 | 
178 | r :: r -> Either l r
179 | r = Right
180 | \end{minted}
181 | Now, let's refactor that uglier Either code with these new helpers:
182 | 
183 | \begin{minted}{haskell}
184 | foo :: String
185 |     -> Either
186 |         (HeadError + LookupError + ParseError)
187 |         Integer
188 | foo str = do
189 |     c <- mapLeft l (head str)
190 |     s <- mapLeft (r . l) (lookup str strMap)
191 |     mapLeft (r . r) (parse (c : s))
192 | \end{minted}
193 | Well, the syntax is nicer. We can \texttt{case} over the nested \texttt{Either} in the error branch to eliminate single error cases. It's easier to ensure we don't claim to throw errors we don't -- after all, GHC will correctly infer the type of \texttt{foo}, and if GHC infers a type variable for any \texttt{+}, then we can assume that we're not using that error slot, and can delete it.
194 | 
195 | Unfortunately, there's still the \texttt{mapLeft} boilerplate. And expressions which you'd really want to be equal, aren't.
196 | 
197 | \begin{minted}{haskell}
198 | x :: Either (HeadError + LookupError) Int
199 | y :: Either (LookupError + HeadError) Int
200 | \end{minted}
201 | The values \texttt{x} and \texttt{y} are isomorphic, but we can't use them in a \texttt{do} block because they're not exactly equal. If we add errors, then we must revise all \texttt{mapLeft} code, as well as all case expressions that inspect the errors. Fortunately, these are entirely compiler-guided refactors, so the chance of messing them up is small. However, they contribute significant boilerplate, noise, and busywork to our program.
202 | 
203 | \section{Boilerplate be gone!}
204 | 
205 | Well, turns out, we can get rid of the order dependence and boilerplate with type classes! The most powerful approach is to use ``classy prisms'' from the \texttt{lens} package. Let's translate our types from concrete values to prismatic ones:
206 | 
207 | \begin{minted}{haskell}
208 | -- Concrete:
209 | head :: [a] -> Either HeadError a
210 | 
211 | -- Prismatic:
212 | head :: AsHeadError err => [a] -> Either err a
213 | 
214 | 
215 | -- Concrete:
216 | lookup :: k -> Map k v -> Either LookupError v
217 | 
218 | -- Prismatic:
219 | lookup
220 |     :: (AsLookupError err)
221 |     => k -> Map k v -> Either err v
222 | \end{minted}
223 | Now, type class constraints don't care about order -- \texttt{(Foo a, Bar a) => a} and \texttt{(Bar a, Foo a) => a} are exactly the same thing as far as GHC is concerned. The \texttt{AsXXX} type classes will automatically provide the \texttt{mapLeft} stuff for us, so now our \texttt{foo} function looks a good deal cleaner:
224 | 
225 | \begin{minted}{haskell}
226 | foo :: (AsHeadError err, AsLookupError err, AsParseError err)
227 |     => String -> Either err Integer
228 | foo str = do
229 |     c <- head str
230 |     r <- lookup str strMap
231 |     parse (c : r)
232 | \end{minted}
233 | This appears to be a significant improvement over what we've had before! And, most of the boilerplate with the \texttt{AsXXX} classes is taken care of via Template Haskell:
234 | 
235 | \begin{minted}{haskell}
236 | makeClassyPrisms ''ParseError
237 | -- this line generates the following:
238 | 
239 | class AsParseError a where
240 |     _ParseError :: Prism' a ParseError
241 |     _UnexpectedChar :: Prism' a (Char, String)
242 |     _RanOutOfInput :: Prism' a ()
243 | 
244 | instance AsParseError ParseError where
245 |     -- etc...
246 | \end{minted}
247 | However, we do have to write our own boilerplate when we eventually want to concretely handle these types. We may end up writing a huge \texttt{AppError} that all of these errors get injected into.
248 | 
249 | There's one major, fatal flaw with this approach. While it composes very nicely, it doesn't decompose at all! There's no way to catch a single case and ensure that it's handled. The machinery that prisms give us doesn't allow us to separate out a single constraint, so we can't pattern match on a single error.
250 | 
251 | Once again, our types become ever larger, with all of the problems that entails.
252 | 
253 | \section{Generics to the rescue!}
254 | 
255 | 
256 | What we really want is:
257 | 
258 | \begin{itemize}
259 | \item Order independence
260 | \item No boilerplate
261 | \item Easy composition
262 | \item Easy decomposition
263 | \end{itemize}
264 | In PureScript or OCaml, you can use open variant types to do this flawlessly. Haskell doesn't have open variants, and the attempts to mock them end up quite clumsy to use in practice.
265 | 
266 | I'm happy to say that the entire job is handled quite nicely with the amazing \texttt{generic-lens} package. I created \href{https://gist.github.com/parsonsmatt/880fbf79eaad6ed863786c6c02f8ddc9}{a gist} that demonstrates its usage, but the \textit{magic} comes down to this simple fact: there's an instance of the prismatic \texttt{AsType} class for \texttt{Either}, which allows you to ``pluck'' a constraint off. This satisfies all of the things I wanted in my list, and we can consider representing errors mostly solved.
267 | 
268 | \section{Mostly?}
269 | 
270 | Well, \texttt{ExceptT e IO a} still imposes a significant runtime performance hit, and asynchronous exceptions aren't considered here. A bifunctor IO type like \texttt{newtype BIO err a = BIO (IO a)} which carries the type class constraints of the errors it contains is promising, but I haven't been able to write a satisfying interface to this yet.
271 | 
272 | I also haven't used this technique in a large codebase yet, and I don't know how it scales. And the technique does require you to be comfortable with \texttt{lens}, which is a fairly high bar for training new folks on. I'm sure that API improvements could be made to make this style more accessible and remove some of the lens knowledge prerequisites.
273 | 


--------------------------------------------------------------------------------
/hillel_wayne2.tex:
--------------------------------------------------------------------------------
  1 | \chapter{Metamorphic Testing - Hillel Wayne}
  2 | 
  3 | 
  4 | \begin{quotation}
  5 | \noindent\textit{\textbf{William Yao:}}
  6 | \textit{Another more general PBT post. The motivating problem: vanilla PBT assumes it's easy to generate inputs to our code. Sometimes it's not. For instance, what if you're testing an image classifier neural net? You can't randomly generate images, because you don't know what the output classification should be for a random image. So we might only have a small set of manually-classified test inputs. Metamorphic testing is a way of expanding our set of test inputs programmatically by transforming the inputs we do have in some way and finding relationships between the original result and the transformed result. For instance, if we invert the color of one of our test images, our classifier should probably give us the same result. If you make a property out of that, you now have more test cases for free, and that catches more bugs.}
  7 | 
  8 | \vspace{\baselineskip}
  9 | \noindent\textit{Original article: \cite{metamorphic_testing}}
 10 | \end{quotation}
 11 | 
 12 | 
 13 | 
 14 | \noindent Confession: I read the \href{https://queue.acm.org/}{ACM Magazine}. This
 15 | makes me a dweeb even in programming circles. One of the things I found
 16 | in it is ``Metamorphic Testing''. I'd never heard of it, and nobody I
 17 | knew had heard of it either. But the academic literature was shockingly
 18 | impressive: many incredibly successful case studies in wildly different
 19 | fields. So why haven't we heard of it before? There's only
 20 | \href{https://medium.com/trustableai/testing-ai-with-metamorphic-testing-61d690001f5c}{one}
 21 | article anywhere targeted at people outside academia. Let's make it two.
 22 | 
 23 | \section{Background}
 24 | \label{background}
 25 | 
 26 | Most written tests use \textbf{oracles}. That's where you know the
 27 | answer and are explicitly checking that the computation gives you the
 28 | answer.
 29 | 
 30 | \begin{minted}{python}
 31 | def test_dist():
 32 |     p1 = (0, 3)
 33 |     p2 = (4, 0)
 34 |     assert dist(p1, p2) == 5
 35 | \end{minted}
 36 | In addition to being an oracle test, it's also a manual test. Somebody
 37 | sat down and decided specific inputs and specific outputs. As systems
 38 | get more complex, bespoke manual tests become less and less useful. Each
 39 | one only hits a single point in a larger state space, and we want
 40 | something that covers the state space.
 41 | 
 42 | This gives us \textbf{generative testing}: writing tests that hit a
 43 | random subset of the state space. The most popular style of generative
 44 | testing is \textbf{property based testing}, or PBT. We find a
 45 | ``property'' of the function and then generate inputs and see if the
 46 | outputs match that property.
 47 | 
 48 | \begin{minted}{python}
 49 | def test_dist():
 50 |     p1 = random_point()
 51 |     p2 = random_point()
 52 |     assert dist(p1, p2) >= 0
 53 | \end{minted}
 54 | The advantage of PBT is that it gives more coverage. The downside is
 55 | that we've lost specificity. This is \emph{not} an oracle test anymore!
 56 | We don't know what the answer should be, and the function might be
 57 | broken in a way that has the same property. We rely on heuristics here.
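
As a rough illustration of that last point, here is a sketch that uses only the standard library instead of a PBT framework. The names \texttt{euclid\_dist}, \texttt{manhattan\_dist}, \texttt{holds\_nonnegative}, and \texttt{passes\_oracle}, along with the fixed seed, are assumptions made for this example. Both distance functions satisfy the non-negativity property on random points, but only one of them agrees with the oracle test from earlier.

\begin{minted}{python}
import math
import random

def euclid_dist(p1, p2):
    """Straight-line distance between two 2-D points."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])

def manhattan_dist(p1, p2):
    """A 'broken' dist: still non-negative, but not the intended metric."""
    return abs(p1[0] - p2[0]) + abs(p1[1] - p2[1])

def random_point(rng):
    return (rng.uniform(-100, 100), rng.uniform(-100, 100))

def holds_nonnegative(dist, trials=1000, seed=0):
    """Check the property dist(p1, p2) >= 0 on randomly generated points."""
    rng = random.Random(seed)
    return all(dist(random_point(rng), random_point(rng)) >= 0
               for _ in range(trials))

def passes_oracle(dist):
    """The hand-written oracle test: we know this answer is 5."""
    return dist((0, 3), (4, 0)) == 5
\end{minted}
Calling \texttt{holds\_nonnegative} returns \texttt{True} for both functions, while \texttt{passes\_oracle} returns \texttt{True} only for \texttt{euclid\_dist}. The property alone cannot tell the two apart.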
 58 | 
 59 | One big problem with PBT is finding good properties. Most functions have
 60 | simple, general properties and complex, specific properties.
 61 | General properties (see chapter \ref{sec:choosing_properties_for_testing})
 62 | can be applied to a wider variety of functions but don't
 63 | give us much information. More specific properties give more
 64 | information, but are harder to find and only apply to specific problem
 65 | domains. If you had a function that determined whether or not a graph is
 66 | acyclic, what property tests would you write? Would they give you
 67 | confidence your function is right?
 68 | 
 69 | \section{Motivation}
 70 | \label{motivation}
 71 | 
 72 | Now take a more complex problem. Imagine we're trying to write an
 73 | English speech-to-text (STT) processor. It takes a sound file and
 74 | outputs the text. How would you test it?
 75 | 
 76 | The simplest way is with a manual oracle. Read out a sentence and
 77 | confirm it gives you that sentence. But this isn't nearly enough! The
 78 | range of human speech is \emph{enormous}. It'd be better if we could
 79 | instead test 1,000 or 10,000 different sound files. Manually
 80 | transcribing oracles is going to be way too expensive. This means we
 81 | have to use property-based testing instead.
 82 | 
 83 | But how do we generate the inputs? One way would be to create random
 84 | strings, then run them through a text-to-speech processor (TTS), and
 85 | then check our STT gives the same text. But, once again, this gives us a
 86 | very limited range of human speech. Will our TTS give us changes in
 87 | tone, slurred words, strong accents? If we don't handle those, is our
 88 | STT actually that useful? We're better off sweeping for ``wild'' text,
 89 | such as from radio, podcasts, online videos.
 90 | 
 91 | Now we have a new problem. Using a TTS meant we started with the
 92 | transcription. We don't have that with ``wild'' text, and we still don't
 93 | want to transcribe it ourselves. We're restricted to using properties
 94 | instead. So what properties should we test? Some simple ones might be
 95 | ``it doesn't crash on any input'' (good) or ``It doesn't turn acoustic
 96 | music into words'' (maybe?). These properties don't really cover the
 97 | ``intent'' of the program, and don't increase confidence all that much.
 98 | 
 99 | So we have two problems. One, we need a wide variety of speech inputs.
100 | Two, we need a way to make them into useful tests without spending
101 | hours manually transcribing the speech into oracles.
102 | 
103 | \section{Metamorphic Testing}
104 | \label{metamorphic-testing}
105 | 
106 | That all treats the output in isolation. What if we embed it in a
107 | broader context? For example, if a given soundclip transcribes to output
108 | \texttt{out}, then we should \emph{still} get output \texttt{out} if we:
109 | 
110 | \begin{itemize}
111 | \item
112 |   Double the volume, or
113 | \item
114 |   Raise the pitch, or
115 | \item
116 |   Increase the tempo, or
117 | \item
118 |   Add some background static, or
119 | \item
120 |   Add some traffic noises, or
121 | \item
122 |   Do any combination of the above.
123 | \end{itemize}
124 | All of these are ``straightforward'' transformations we can easily test.
125 | For example, for the ``traffic noises'' test, we can take 10 traffic
126 | samples, overlay them on a soundclip, and see that all 11 versions
127 | match. We can double or halve the volume to turn 11 versions into 33
128 | versions, and double the tempo to get 66 versions. We can then
129 | scale this up to every soundclip in our database, which helps augment
130 | the space of our inputs.
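
The counting in the paragraph above can be spelled out as a small sketch. The sample labels and the names \texttt{noise\_options}, \texttt{volume\_options}, and \texttt{tempo\_options} are hypothetical; a real test would apply actual audio transformations rather than enumerate labels.

\begin{minted}{python}
from itertools import product

# Hypothetical labels standing in for 10 recorded traffic samples.
traffic_samples = [f"traffic_{i}" for i in range(1, 11)]

noise_options = [None] + traffic_samples  # no overlay, or one of 10 overlays: 11
volume_options = [1.0, 2.0, 0.5]          # unchanged, doubled, halved: 3
tempo_options = [1.0, 2.0]                # unchanged, doubled: 2

# Every combination of (noise, volume, tempo) is one version of the clip.
variants = list(product(noise_options, volume_options, tempo_options))
\end{minted}
The untransformed clip corresponds to the combination \texttt{(None, 1.0, 1.0)}, so it is counted among the $11 \times 3 \times 2 = 66$ versions.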
131 | 
132 | Having 66 versions to compare is useful enough. However, there's
133 | something else here: we don't need to know what the output is. If all 66
134 | transformations return \texttt{out}, the test passes, and if any return
135 | something different, the test fails. At no point do we need to check
136 | what \texttt{out} is. This is really, really big. It dramatically
137 | increases the range we can test with very little human effort. We could,
138 | for example, download an episode of \emph{This American Life}, run the
139 | transformations, and see if they all
140 | match\footnote{Okay, there are obvious problems here,
141 |   because the podcast might have music, samples in other languages, etc.
142 |   But the theory is sound: given we have a way of acquiring speech
143 |   samples, we can use it as part of tests without having to manually
144 |   label it first.}.
145 | We have useful tests \emph{without listening to the voice clip.} We can
146 | now generate complex, deep tests without the use of an oracle!
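
Here is a minimal sketch of such an oracle-free check. Everything in it is a toy assumption: a ``clip'' is just a list of sample amplitudes, and \texttt{naive\_transcribe} and \texttt{normalized\_transcribe} are stand-ins for a real STT system that emit one symbol per sample. The helper \texttt{failing\_transforms} compares each transformed output against the untransformed one and never looks at what the output actually says.

\begin{minted}{python}
def naive_transcribe(clip):
    """Toy stand-in for an STT system: one symbol per sample, fixed threshold."""
    return "".join("x" if abs(s) > 0.5 else "." for s in clip)

def normalized_transcribe(clip):
    """Same toy, but the threshold is relative to the loudest sample."""
    peak = max(abs(s) for s in clip) or 1.0
    return "".join("x" if abs(s) / peak > 0.5 else "." for s in clip)

def scale_volume(factor):
    """Build a transformation that multiplies every sample by factor."""
    return lambda clip: [s * factor for s in clip]

def failing_transforms(transcribe, clip, transforms):
    """Names of the transforms whose output differs from the baseline output."""
    baseline = transcribe(clip)
    return [name for name, t in transforms.items()
            if transcribe(t(clip)) != baseline]

transforms = {
    "double volume": scale_volume(2.0),
    "halve volume": scale_volume(0.5),
}
clip = [0.1, 0.4, 0.9, 0.3, 0.0, 0.8]
\end{minted}
With these inputs, \texttt{failing\_transforms(naive\_transcribe, clip, transforms)} reports both volume changes, while the normalized version reports none. The check flags the fixed-threshold bug without anyone writing down the expected transcription.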
147 | 
148 | The two inputs, along with their outputs, are all connected to each
149 | other. This kind of property spanning multiple inputs/outputs is called
150 | a \textbf{metamorphic
151 | relation}\footnote{The corresponding idea in
152 |   specifications is \textbf{hyperproperties}, properties on sets of
153 |   behaviors instead of individual behaviors. Most HP research is
154 |   concerned with security hyperproperties. As I understand it HPs are a
155 |   superset of MRs.}.
156 | Testing that leverages this is called \textbf{metamorphic testing}. For
157 | complex systems, it can be easier to find interesting metamorphic
158 | relations than interesting single input/output properties.
159 | 
160 | To be a bit more formal: if we have \texttt{x} and \texttt{f(x)}, we can
161 | make some transformation on \texttt{x} to get \texttt{x2} and
162 | \texttt{f(x2)}. In the STT case, we just checked
163 | \texttt{f(x)\ =\ f(x2)}, but we can use whatever relations we want
164 | between the two. You could also have MRs like
165 | \texttt{f(x2)\ \textgreater{}\ f(x)} or ``\texttt{f(x2)/f(x)} is an
166 | integer''. Similarly, we can also span more than two inputs, using
167 | \texttt{f(x)}, \texttt{f(x2)}, and \texttt{f(x3)}. One example of this might be comparing
168 | search engine results with no filters to engine results with one filter
169 | and two filters. Most of the case studies I read only use two inputs,
170 | because even that is enough to find crazy bugs.
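
The search-filter example can be sketched as a relation over three inputs. The catalog, the \texttt{search} function, and its deliberately broken sibling are all invented for illustration; the relation checked is that each added filter may only narrow the result set.

\begin{minted}{python}
CATALOG = [
    {"name": "red mug", "color": "red", "kind": "mug"},
    {"name": "blue mug", "color": "blue", "kind": "mug"},
    {"name": "red plate", "color": "red", "kind": "plate"},
    {"name": "blue plate", "color": "blue", "kind": "plate"},
]

def search(items, filters):
    """Return names of items matching every filter (a dict of field -> value)."""
    return [it["name"] for it in items
            if all(it[k] == v for k, v in filters.items())]

def broken_search(items, filters):
    """Bug: matches items satisfying *any* filter instead of all of them."""
    return [it["name"] for it in items
            if any(it[k] == v for k, v in filters.items())]

def narrowing_holds(search_fn, items, f1, f2):
    """MR over three inputs: each added filter may only narrow the results."""
    none = set(search_fn(items, {}))
    one = set(search_fn(items, f1))
    two = set(search_fn(items, {**f1, **f2}))
    return two <= one <= none
\end{minted}
For the filters \texttt{\{"color": "red"\}} and \texttt{\{"kind": "mug"\}}, the relation holds for \texttt{search} and fails for \texttt{broken\_search}, even though the check never states which items should come back.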
171 | 
172 | \section{The Case Studies}
173 | \label{the-case-studies}
174 | 
175 | Speaking of case studies: How effective is MT in practice? It's one
176 | thing to talk about a technique in abstract, or provide toy examples.
177 | Reading case studies is useful for three reasons. First, it shows
178 | whether or not this actually works. Next, it shares some potential
179 | gotchas if we try to use MT. Finally, it gives us ideas on \emph{how} we
180 | can use it. Any MR a case study uses is something we might be able to
181 | adapt for our own purposes.
182 | 
183 | \href{http://www.cs.hku.hk/research/techreps/document/TR-2017-04.pdf}{``Metamorphic
184 | Testing: A Review of Challenges and Opportunities''} lists a lot of
185 | studies, but they're all academic papers. Here are a few of the most
186 | interesting ones. Articles marked \texttt{(pdf)} are, unsurprisingly,
187 | PDFs.
188 | 
189 | \begin{mldescription}
190 | \mlitem{\href{https://arxiv.org/abs/1807.10453}{METTLE: A Metamorphic
191 | Testing Approach To Validating Unsupervised Machine
192 | Learning Methods}
193 | (pdf)}
194 | Defines 11 different MRs for testing unsupervised clustering, like ``do
195 | we get the same result if we shuffle the inputs?'' and ``do additional
196 | inputs at cluster boundaries belong to those clusters?'' Different
197 | models changed under different relations. For example, about 5\% of
198 | tested k-means models had a mean clustering error of 20\% under
199 | shuffling the order of input points.
200 | 
201 | \mlitem{\href{https://arxiv.org/abs/1708.08559}{DeepTest: Automated
202 | Testing of Deep-Neural-Network-driven Autonomous Cars} (pdf)}
203 | Subject was car vision systems, MRs were things like ``adding a rain
204 | filter'' or ``slightly tilting the image''. Authors put sample results
205 | \href{https://deeplearningtest.github.io/deepTest/}{here}: Pretty much
206 | all the systems they tested collapsed under the MR changes.
207 | 
208 | \mlitem{\href{http://multicore.doc.ic.ac.uk/publications/oopsla-17.html}{Automated
209 | Testing of Graphics Shader Compilers} (pdf)}
210 | Injecting dead code and runtime-constants into shaders made things in
211 | pictures disappear or turn to noise. The researchers made a startup
212 | called
213 | \href{https://web.archive.org/web/20180710214938/http://www.graphicsfuzz.com/}{GraphicsFuzz}
214 | off their work, which was acquired by Google and the site taken down.
215 | 
216 | \mlitem{\href{http://www.lsi.us.es/~segura/files/papers/segura17-tse.pdf}{Metamorphic
217 | Testing of RESTful Web APIs} (pdf)}
218 | Do you get the same items when you change the
219 | \href{https://github.com/spotify/web-api/issues/225}{pagination}? What
220 | if you order them by date? A whole bunch of errors in Spotify and
221 | Youtube in this paper.
222 | 
223 | 
224 | \mlitem{\href{https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-24}{An
225 | innovative approach for testing bioinformatics programs using
226 | metamorphic testing} (pdf, but now not)}
227 | Finding mistakes in bioinformatics stuff? Look I barely understand
228 | bioinformatics, but it's demonstrating how MR is useful in specialist
229 | domains.
230 | \end{mldescription}
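
To give a feel for what one of these relations looks like in code, here is a sketch of the pagination question from the web API study. It runs against a toy in-memory list, not the Spotify or YouTube APIs, and the function names are assumptions. The relation is that the full listing should not depend on the page size.

\begin{minted}{python}
def get_page(items, page, size):
    """Correct pagination: pages are numbered from 0."""
    start = page * size
    return items[start:start + size]

def buggy_get_page(items, page, size):
    """Off-by-one bug: pages after the first start one item too late."""
    start = page * size + (1 if page > 0 else 0)
    return items[start:start + size]

def fetch_all(page_fn, items, size):
    """Walk the pages until an empty one comes back."""
    result, page = [], 0
    while True:
        chunk = page_fn(items, page, size)
        if not chunk:
            return result
        result.extend(chunk)
        page += 1

def pagination_consistent(page_fn, items, size_a, size_b):
    """MR: the full listing must not depend on the page size."""
    return (fetch_all(page_fn, items, size_a)
            == fetch_all(page_fn, items, size_b))
\end{minted}
With ten items and page sizes 3 and 5, the relation holds for \texttt{get\_page} and fails for \texttt{buggy\_get\_page}, because the item dropped at the first page boundary depends on the page size.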
231 | 
232 | 
233 | \hypertarget{the-problem}{%
234 | \subsection{The Problem}
235 | \label{the-problem}}
236 | 
237 | Huh, they're all PDFs.
238 | 
239 | Finding all of those took several hours. And that ties into the biggest
240 | drag on MT adoption: All of the above are \textbf{preprints}, or first
241 | drafts of eventual academic papers. When I dig into obscure techniques,
242 | I always ask ``why is it obscure?'' Sometimes there's an obvious reason,
243 | sometimes it's a complex set of subtle reasons, sometimes it's just bad
244 | luck.
245 | 
246 | In the case of MT the problem is obvious. \textbf{Almost all of the info
247 | is behind academic paywalls.} If you want to learn about MT, you either
248 | need journal access or spend hours hunting down
249 | preprints\footnote{I had a second, refuted hypothesis:
250 |   since a lot of the major researchers are from China and Hong Kong,
251 |   maybe the technique was more well-known in Mandarin-language
252 |   programming communities than English-language ones.
253 |   \href{https://twitter.com/sindarknave}{Brian Ng} was kind enough to
254 |   check for me and didn't find significant use.}.
255 | 
256 | \hypertarget{learning-more}{%
257 | \subsection{Learning More}
258 | \label{learning-more}}
259 | 
260 | The inventor of MT is
261 | \href{https://www.swinburne.edu.au/science-engineering-technology/staff/profile/index.php?id=tychen}{TY
262 | Chen}. He's also the driver of a lot of the research. Other names are
263 | \href{https://www.uow.edu.au/~zhiquan/}{Zhi Quan Zhou} and
264 | \href{http://personal.us.es/sergiosegura/publications/}{Sergio Segura},
265 | both of whom have put all of their preprints online. Most of the
266 | research is by one of those.
267 | 
268 | The best starting resources are probably
269 | \href{http://www.cs.hku.hk/research/techreps/document/TR-2017-04.pdf}{Metamorphic
270 | Testing: A Review of Challenges and Opportunities} and
271 | \href{http://www.lsi.us.es/~segura/files/papers/segura16-tse.pdf}{A
272 | Survey on Metamorphic Testing}. While this article was about Metamorphic
273 | \emph{Testing}, researchers have also been applying Metamorphic
274 | Relationships in general to a wide variety of other disciplines, such as
275 | formal verification and debugging. I have not researched those other
276 | uses in depth, but they're probs also worth looking into.
277 | 
278 | In terms of application, it should be theoretically possible to adapt
279 | most PBT libraries to check metamorphic properties. In fact the first
280 | example in the
281 | \href{https://www.cs.tufts.edu/~nr/cs257/archive/john-hughes/quick.pdf}{QuickCheck}
282 | paper tests an MR, and
283 | \href{https://hypothesis.works/articles/testing-optimizers-with-hypothesis/}{this}
284 | essay on PBT implicitly uses an MR. \emph{In general} it seems to me
285 | that most PBT research focuses on how we effectively generate and shrink
286 | inputs, while MT research is more focused on determining what we
287 | actually want to test. As such they are probably complementary
288 | techniques.
289 | 
290 | \emph{Thanks to \href{https://twitter.com/sindarknave}{Brian Ng} for
291 | help researching this.}
292 | 
293 | \subsection{PS: Request}
294 | \label{ps-request}
295 | 
296 | It's not actually that surprising that I'd never heard of this before.
297 | There are a lot of really interesting, useful techniques that never leave
298 | their tiny bubble. Learning about MT was more luck than any action on my
299 | part.
300 | 
301 | If you know of anything you think deserves wider use, please
302 | \href{mailto:h@hillelwayne.com}{email} me.
303 | 


--------------------------------------------------------------------------------
/hillel_wayne.tex:
--------------------------------------------------------------------------------
  1 | \chapter{Finding Property Tests - Hillel Wayne}
  2 | \label{sec:finding_property_tests}
  3 | 
  4 | \begin{quotation}
  5 | \noindent\textit{\textbf{William Yao:}}
  6 | \textit{More starting points for figuring out what properties to test.}
  7 | 
  8 | \vspace{\baselineskip}
  9 | \noindent\textit{Original article: \cite{finding_properties}}
 10 | \end{quotation}
 11 | 
 12 | A while back I was ranting about APLs and included this Python code to
 13 | get the mode of a list:
 14 | 
 15 | \begin{minted}{python}
 16 | def mode(l):
 17 |   max = None
 18 |   count = {}
 19 |   for x in l:
 20 |     if x not in count:
 21 |       count[x] = 0
 22 |     count[x] += 1
 23 |     if not max or count[x] > count[max]:
 24 |       max = x
 25 |   return max
 26 | \end{minted}
 27 | There's a bug in it. Do you see it? If not, try running it on the list
 28 | \texttt{{[}0,\ 0,\ 1{]}}:
 29 | 
 30 | \begin{minted}{python}
 31 | >>> mode([0, 0, 1])
 32 | 1
 33 | \end{minted}
 34 | The issue is that 0 is falsy, so if max is 0, \texttt{if\ not\ max} is
 35 | true.
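
For reference, one possible fix is sketched below. It is not taken from the original article: the name \texttt{mode\_fixed} and the renamed variable \texttt{best} are assumptions, and the only behavioral change is testing for ``no candidate yet'' with \texttt{is None} instead of relying on truthiness.

\begin{minted}{python}
def mode_fixed(l):
    """Like mode above, but checks for 'no candidate yet' with `is None`."""
    best = None
    count = {}
    for x in l:
        if x not in count:
            count[x] = 0
        count[x] += 1
        # `best is None` is only true before the first element is seen,
        # whereas `not best` is also true when best is 0, "", or False.
        if best is None or count[x] > count[best]:
            best = x
    return best
\end{minted}
On \texttt{{[}0,\ 0,\ 1{]}} this version returns 0, since the final 1 has a lower count than the current candidate and \texttt{0 is None} is false.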
 36 | 
 37 | I could say this bug was a result of carelessness. I didn't write any
 38 | tests for this function, just tried a few obvious examples and thought
 39 | ``yeah it works''. But I don't think writing bespoke manual unit tests
 40 | would have caught this. To surface the bug, the mode of a list must be a
 41 | falsy value like 0 or \texttt{()} \emph{and} the last value of the
 42 | list must be something else. It's a small intersection of Python's
 43 | typing and the mechanics of \texttt{mode}, making it too unusual a case
 44 | to be found by standard unit testing practice.
 45 | 
 46 | A different testing style is property based testing (PBT) with
 47 | \href{https://www.hillelwayne.com/post/contracts/}{contracts}. By
 48 | generating a random set of inputs, we cover more of the state space than
 49 | we'd do manually. The problem with PBT, though, is that it can be hard
 50 | to find good properties. I'd like to take mode as a case study in what
 51 | properties we could come up with.
 52 | There are a few things I'm looking for:
 53 | 
 54 | \begin{itemize}
 55 | 
 56 | \item
 57 |   The property test should find the bug.
 58 | \item
 59 |   The test should be \emph{simple}. I'm presumably not putting a lot of
 60 |   effort into testing such a simple function, so a complex test doesn't
 61 |   accurately capture what I'd do.
 62 | \item
 63 |   The test should be \emph{obvious}. I'm looking for a natural test that
 64 |   finds the bug, not a post-hoc one. Catching a bug with tests is much
 65 |   less believable if you already know what you're looking for.
 66 | \end{itemize}
 67 | 
 68 | So, let's talk some tests and contracts! I'm using
 69 | \href{http://hypothesis.works/}{hypothesis} for the property tests,
 70 | \href{https://github.com/deadpixi/contracts}{dpcontracts} for my
 71 | contracts library, \href{https://docs.pytest.org/en/latest/}{pytest} for
 72 | the runner\footnote{Oddly enough I'm getting increasingly less sold on using pytest, purely because I want to experiment with weird janky metaprogramming in my tests and pytest doesn't really support that.}.
 73 | For the sake of this problem, assume we're only passing in nonempty
 74 | lists.
 75 | 
 76 | \begin{minted}{python}
 77 | @require("l must not be empty", lambda args: len(args.l) > 0)
 78 | def mode(l):
 79 |   ...
 80 | \end{minted}
 81 | 
 82 | \section{Contract-wise}
 83 | \label{contract-wise}
 84 | 
 85 | One property of a function is ``all of the contracts are satisfied''. We
 86 | can use this to write ``thin'' tests, where we don't put any assertions
 87 | in the test itself. If any of our contracts raise an error then the test
 88 | will fail.
 89 | 
 90 | \begin{minted}{python}
 91 | @given(lists(integers(), min_size=1))
 92 | def test_mode(l):
 93 |     mode(l)
 94 | \end{minted}
 95 | 
 96 | \subsection{Types}
 97 | \label{types}
 98 | 
 99 | Typically in dynamically typed languages, contracts are used as a poor
100 | replacement for a static type system. Instead of checking the type at
101 | compile time, people check the type at runtime. Most contract libraries
102 | are heavily geared towards this kind of use.
103 | 
104 | \begin{minted}{python}
105 | @ensure("result is an int", lambda _, r: isinstance(r, int))
106 | \end{minted}
107 | This is a bad contract for three reasons:
108 | 
109 | \begin{enumerate}
110 | \item
111 |   It requires the function to return integers when it's currently
112 |   generic. We \emph{could} try to make it generic by doing something
113 |   like \texttt{type(a.l{[}0{]})\ ==\ type(r)}, but \emph{ugh}.
114 | \item
115 |   We should be using mypy for type checks anyway.
116 | \item
117 |   It doesn't actually find the error. The problem isn't the type, it's
118 |   that we got the wrong result.
119 | \end{enumerate}
120 | 
121 | \subsubsection{Sanity Checking}
122 | \label{sanity-checking}
123 | 
124 | We can go further than replicating static types. One common type of
125 | contract is a ``sanity check'': some property that does not fully
126 | specify our code, but is easy to express and should hold true anyway.
127 | For example, we know the mode will be an element of the list, so why not
128 | check that we're returning an element of the list?
129 | 
130 | \begin{minted}{python}
131 | @ensure("result is in list", lambda a, r: r in a.l)
132 | \end{minted}
133 | This is a pretty good contract! It tells us useful things about the
134 | function, and it's not easily replaceable with a typecheck. If I were
135 | writing production code I'd probably write a lot of contracts like this.
136 | But it also doesn't find the problem, so we need to go further.
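
A quick way to see why the sanity check can't find the problem is to enumerate small lists. In this sketch \texttt{broken\_mode} is my reconstruction of the buggy implementation (an assumption, based on its \texttt{not max} check), and exhaustive enumeration stands in for Hypothesis.

```python
from itertools import product


def broken_mode(l):
    # Reconstruction of the buggy implementation (an assumption based on the
    # `not max` check): a running best of 0 is falsy, so it gets overwritten.
    best = None
    count = {}
    for x in l:
        count[x] = count.get(x, 0) + 1
        if not best or count[x] > count[best]:
            best = x
    return best


def result_in_list(l, r):
    # The sanity-check postcondition.
    return r in l


# Every list over {0, 1, 2} with 1 to 4 elements.
small_lists = [list(t) for n in range(1, 5) for t in product((0, 1, 2), repeat=n)]
sanity_failures = [l for l in small_lists if not result_in_list(l, broken_mode(l))]
print(sanity_failures)
```

The function only ever returns something it saw in the list, so the check passes even on inputs where the answer is wrong, such as \texttt{{[}0,\ 0,\ 1{]}}.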
137 | 
138 | \subsection{First element}
139 | \label{first-element}
140 | 
141 | Our sanity check was only minimally related to our function. There are
142 | lots of functions that return elements in the list: \texttt{head},
143 | \texttt{random\_element}, \texttt{last}, etc. The issue is a subtle bug
144 | in our implementation. Our contract should express some important
145 | property about our function. In mode's case, it should relate to the
146 | count of the value.
147 | 
148 | One \emph{extremely} useful kind of property is a bound. The mode of a
149 | list is the element that occurs most frequently, so every element of the
150 | list should occur less often than, or as often as, the
151 | mode\footnote{We have to say ``less than or \textit{equal} to'' for two reasons. First, the mode is not always strictly more frequent than the other elements, as in \texttt{[1, 1, 2, 2]}. Second, what if the mode \textit{is} \texttt{l[0]}?}.
152 | Comparing against a single arbitrary element is already a useful check, and one good arbitrary element is the first element:
153 | 
154 | \begin{minted}{python}
155 | @ensure("result > arbitrary", 
156 |   lambda a, r: a.l.count(r) >= a.l.count(a.l[0]))
157 | \end{minted}
158 | This finds the bug!
159 | 
160 | \begin{minted}{python}
161 | args = ([0, 0, 1],), kwargs = {}, rargs = Args(l=[0, 0, 1])
162 | result = 1
163 | E           dpcontracts.PostconditionError: result > arbitrary
164 | \end{minted}
165 | Personally, I'd prefer this as a property test clause instead of a
166 | contract clause. It doesn't feel ``right'' to me. I think it's more an
167 | aesthetic judgment than a technical one here.
168 | 
169 | \begin{minted}{python}
170 | @given(lists(integers(), min_size=1))
171 | def test_mode(l):
172 |     x = mode(l)
173 |     assert l.count(x) >= l.count(l[0])
174 | \end{minted}
175 | Either way, this is only a partial contract: while it will catch
176 | \emph{some} incorrect outputs, it won't catch them all. We can see
177 | this with \texttt{{[}1,\ 0,\ 0,\ 2{]}}:
178 | \texttt{count(0)\ \textgreater{}\ count(2)\ \textgreater{}=\ count(1)},
179 | but our broken function would return \texttt{2}. In some cases, this is
180 | all we can feasibly get. For simpler functions, though, we can rule out
181 | all incorrect outputs. We want a total contract, one which always raises
182 | on an incorrect output and never raises on a correct one.
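
The gap between a partial and a total contract can be checked by brute force. This is a sketch under assumptions: \texttt{broken\_mode} is my reconstruction of the buggy function, \texttt{partial\_ok} is the first-element bound, and \texttt{total\_ok} compares against every element.

```python
from itertools import product


def broken_mode(l):
    # Reconstruction of the buggy implementation (an assumption): `not best`
    # is also true when best == 0, so a running best of 0 gets overwritten.
    best = None
    count = {}
    for x in l:
        count[x] = count.get(x, 0) + 1
        if not best or count[x] > count[best]:
            best = x
    return best


def partial_ok(l, r):
    # Partial contract: the result is at least as frequent as the first element.
    return l.count(r) >= l.count(l[0])


def total_ok(l, r):
    # Total contract: the result is at least as frequent as every element.
    return all(l.count(r) >= l.count(x) for x in l)


# Every list over {0, 1, 2} with 1 to 4 elements.
small_lists = [list(t) for n in range(1, 5) for t in product(range(3), repeat=n)]

caught_by_partial = [l for l in small_lists if not partial_ok(l, broken_mode(l))]
missed_by_partial = [
    l for l in small_lists
    if partial_ok(l, broken_mode(l)) and not total_ok(l, broken_mode(l))
]
```

Here \texttt{{[}0,\ 0,\ 1{]}} lands in \texttt{caught\_by\_partial}, while \texttt{{[}1,\ 0,\ 0,\ 2{]}} lands in \texttt{missed\_by\_partial}: the output is wrong, but the partial contract accepts it.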
183 | 
184 | \subsection{The dang definition}
185 | \label{the-dang-definition}
186 | 
187 | Why not just use the definition itself?
188 | 
189 | \begin{minted}{python}
190 | @ensure("result is the mode", 
191 |   lambda a, r: all((a.l.count(r) >= a.l.count(x) for x in a.l)))
192 | \end{minted}
193 | To make this nicer we can extract this into a dedicated helper contract:
194 | 
195 | \begin{minted}{python}
196 | def is_mode(l, m):
197 |     return all((l.count(m) >= l.count(x) for x in l))
198 | 
199 | @ensure("result is the mode", lambda a, r: is_mode(a.l, r))
200 | \end{minted}
201 | This also catches the error.
202 | 
203 | \begin{minted}{python}
204 | args = ([0, 0, 1],), kwargs = {}, rargs = Args(l=[0, 0, 1])
205 | result = 1
206 | \end{minted}
207 | This is the same faulty input as before. Property-based testing
208 | libraries \textbf{shrink} failing inputs to find the smallest one they can,
209 | which here is \texttt{{[}0,\ 0,\ 1{]}}.
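
Real shrinkers are much more sophisticated than this, but the goal can be sketched by enumerating candidate lists shortest-first and stopping at the first failure. The names here are assumptions for illustration: \texttt{broken\_mode} is my reconstruction of the buggy function and \texttt{smallest\_counterexample} is a hypothetical helper, not part of Hypothesis.

```python
from itertools import product


def broken_mode(l):
    # Reconstruction of the buggy implementation (an assumption).
    best = None
    count = {}
    for x in l:
        count[x] = count.get(x, 0) + 1
        if not best or count[x] > count[best]:
            best = x
    return best


def is_mode(l, m):
    return all(l.count(m) >= l.count(x) for x in l)


def smallest_counterexample(values=(0, 1), max_len=4):
    # Shortest lists first, so the first failure found has minimal length.
    for n in range(1, max_len + 1):
        for candidate in product(values, repeat=n):
            l = list(candidate)
            if not is_mode(l, broken_mode(l)):
                return l
    return None


print(smallest_counterexample())
```

No list of length one or two over these values violates the definition, which is why the minimal failing input has three elements.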
210 | 
211 | \subsubsection{Compare to an Oracle}
212 | \label{compare-to-an-oracle}
213 | 
214 | If we had another way of getting the answer that we knew was correct, we
215 | could just compare the two results and see if they're the same. This is
216 | called using an \textbf{oracle}. Oracles are often a good choice when
217 | you're trying to refactor a complex function or optimize an expensive
218 | one. For our purposes, it goes too far.
219 | 
220 | \begin{minted}{python}
221 | from collections import Counter
222 | 
223 | def math_mode(l):
224 |     c = Counter(l)
225 |     return c.most_common(1)[0][0]
226 | 
227 | @require("l must not be empty", lambda args: len(args.l) > 0)
228 | @ensure("result matches oracle", lambda a, r: r == math_mode(a.l))
229 | def mode(l):
230 |   ...
231 | \end{minted}
232 | This is too heavy. Not only is it cumbersome, but it overconstrains what
233 | the mode can be. We see this in the error it finds: it finds an error
234 | with a smaller input than the other two!
235 | 
236 | \begin{minted}{python}
237 | args = ([0, 1],), kwargs = {}, rargs = Args(l=[0, 1]), result = 1
238 | \end{minted}
239 | We haven't precisely defined the semantics of \texttt{mode}. If there
240 | are two values which tie for the most occurrences, which is the mode? Our
241 | prior contracts didn't say: as long as we picked an element that had at
242 | least as many instances as any other element, we were good. With
243 | \texttt{math\_mode}, we're arbitrarily choosing one of them as the
244 | ``real'' mode and checking that our \texttt{mode} picked that arbitrary
245 | element. We can see this better by writing a manual test:
246 | 
247 | \begin{minted}{python}
248 | def test_mode():
249 |     mode([3, 2, 2, 3])
250 | 
251 | ...
252 | 
253 | args = ([3, 2, 2, 3],), kwargs = {}, rargs = Args(l=[3, 2, 2, 3])
254 | result = 2
255 | \end{minted}
256 | Our previous contract, by contrast, passes on this input.
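
The overconstraint is easy to see by listing every value the definition accepts and comparing it with what the oracle picks. In this sketch \texttt{valid\_modes} is a hypothetical helper, and the oracle's tie-breaking relies on \texttt{Counter.most\_common} ordering equal counts by first occurrence (documented for Python 3.7 and later).

```python
from collections import Counter


def math_mode(l):
    return Counter(l).most_common(1)[0][0]


def is_mode(l, m):
    return all(l.count(m) >= l.count(x) for x in l)


def valid_modes(l):
    # Every element that satisfies the definition; more than one when counts tie.
    return sorted({x for x in l if is_mode(l, x)})


l = [3, 2, 2, 3]
print(valid_modes(l))  # the definition accepts both tied values
print(math_mode(l))    # the oracle commits to just one of them
```

Any implementation that returns the other tied value satisfies the definition but fails the oracle comparison.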
257 | 
258 | \section{Property-wise}
259 | \label{property-wise}
260 | 
261 | Our contract approach converged on ``testing the definition'' as the
262 | best result. There are many cases where code-under-test does not have a
263 | nice mathematical definition. Contracts are still useful here, as they
264 | can rule out bad cases, but you'll need additional tests.
265 | 
266 | \emph{Hypothetically} contracts can express all possible properties of a
267 | function. In practice you're limited to what your framework can express
268 | and check. With most complicated properties we're better off sticking them
269 | in dedicated tests.
270 | 
271 | ``Property-wise'' property tests have several advantages over
272 | ``contract-wise'' property tests:
273 | 
274 | \begin{enumerate}
275 | 
276 | \item
277 |   We can test properties that aren't ``ergonomic'' in our contract
278 |   framework.
279 | \item
280 |   We can test properties that involve effects.
281 | \item
282 |   We can test
283 |   \href{https://www.hillelwayne.com/post/metamorphic-testing/}{metamorphic
284 |   relations} which involve comparing multiple function calls.
285 | \end{enumerate}
286 | For this \texttt{mode} problem we don't need any of them, but let's show
287 | off some possible tactics anyway.
288 | 
289 | \subsection{Preserving
290 | Transformation}
291 | \label{preserving-transformation}
292 | 
293 | We find some transformation of the input that should give the same
294 | output. For lists, a good transformation is
295 | sorting\footnote{Unless you're trying to test a sorting function.}.
296 | The mode of a list doesn't change if you sort it:
297 | 
298 | \begin{minted}{python}
299 | @given(lists(integers(), min_size=1))
300 | def test_sorting_preserves_mode(l):
301 |     assert mode(l) == mode(sorted(l))
302 | 
303 | ...
304 | 
305 | l = [0, 0, -1]
306 | \end{minted}
307 | We could also reverse the list instead of sorting it, but that gives us an
308 | error case of \texttt{{[}1,\ -1{]}}, which again is due to
309 | overconstraining: the two elements tie, so either one is a valid mode.
310 | 
311 | Or we could assert that the mode doesn't change if we add it again to
312 | the list:
313 | 
314 | \begin{minted}{python}
    | @given(lists(integers(), min_size=1))
315 | def test_can_add_to_mode(l):
316 |     m = mode(l)
317 |     assert mode(l + [m]) == m
318 | \end{minted}
319 | This does \emph{not} find the bug, though.
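
Both preserving transformations can be checked by enumeration, without Hypothesis. As before, \texttt{broken\_mode} is my reconstruction of the buggy function and the helper names are assumptions for illustration.

```python
from itertools import product


def broken_mode(l):
    # Reconstruction of the buggy implementation (an assumption).
    best = None
    count = {}
    for x in l:
        count[x] = count.get(x, 0) + 1
        if not best or count[x] > count[best]:
            best = x
    return best


def sorting_preserves_mode(l):
    return broken_mode(l) == broken_mode(sorted(l))


def can_add_to_mode(l):
    m = broken_mode(l)
    return broken_mode(l + [m]) == m


# Every list over {-1, 0, 1} with 1 to 4 elements.
small_lists = [list(t) for n in range(1, 5) for t in product((-1, 0, 1), repeat=n)]
sort_failures = [l for l in small_lists if not sorting_preserves_mode(l)]
add_failures = [l for l in small_lists if not can_add_to_mode(l)]
```

\texttt{{[}0,\ 0,\ -1{]}} shows up in \texttt{sort\_failures}, while \texttt{add\_failures} stays empty: appending the returned value can never dislodge it, even in the buggy version. The same enumeration also flags tied lists such as \texttt{{[}1,\ -1{]}}, so the sorting property shares some of the overconstraint that reversing has.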
320 | 
321 | \subsection{Controlled
322 | Transformation}
323 | \label{controlled-transformation}
324 | 
325 | Instead of finding a transformation that doesn't change the answer, we could
326 | find one that changes it in a known way. One of them might be ``doubling
327 | all of the numbers doubles the mode'':
328 | 
329 | \begin{minted}{python}
330 | @given(lists(integers(), min_size=1))
331 | def test_doubling_doubles_mode(l):
332 |     doubled = [x * 2 for x in l]
333 |     assert 2*mode(l) == mode(doubled)
334 | \end{minted}
335 | This does not find the bug. We could also try ``adding 1 to every
336 | element adds 1 to the mode'':
337 | 
338 | \begin{minted}{python}
339 | @given(lists(integers(), min_size=1))
340 | def test_incrementing_increments_mode(l):
341 |     incremented = [x + 1 for x in l]
342 |     assert mode(l)+1 == mode(incremented)
343 | 
344 | ...
345 | 
346 | l = [0, 1]
347 | \end{minted}
348 | 
349 | It reports the same input as our overconstrained oracle did, but here it only
350 | ``is wrong'' when we have \texttt{0}'s anyway. If we restrict the list to only
351 | positive integers, it will pass (unlike our oracle contract).
352 | 
353 | If we wanted to be extra thorough we could generatively pick both a
354 | slope and an increment:
355 | 
356 | \begin{minted}{python}
357 | @given(lists(integers(), min_size=1), integers(), integers())
358 | def test_affine_relation(l, m, b):
359 |     transformed = [m*x+b for x in l]
360 |     assert m*mode(l)+b == mode(transformed)
361 | 
362 | ...
363 | 
364 | l = [0, 1], m = 1, b = 1
365 | \end{minted}
366 | It depends on how paranoid you want to get.
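
Why does incrementing expose the bug when doubling doesn't? Doubling maps \texttt{0} to \texttt{0}, so the falsy value stays falsy and the buggy code misbehaves identically on both lists. Incrementing moves \texttt{0} to \texttt{1}. Here is a sketch that checks this by enumeration, with \texttt{broken\_mode} again being my reconstruction of the buggy function.

```python
from itertools import product


def broken_mode(l):
    # Reconstruction of the buggy implementation (an assumption).
    best = None
    count = {}
    for x in l:
        count[x] = count.get(x, 0) + 1
        if not best or count[x] > count[best]:
            best = x
    return best


def doubling_holds(l):
    return 2 * broken_mode(l) == broken_mode([2 * x for x in l])


def incrementing_holds(l):
    return broken_mode(l) + 1 == broken_mode([x + 1 for x in l])


def all_lists(values, max_len=4):
    return [list(t) for n in range(1, max_len + 1) for t in product(values, repeat=n)]


doubling_failures = [l for l in all_lists((-1, 0, 1)) if not doubling_holds(l)]
increment_failures = [l for l in all_lists((-1, 0, 1)) if not incrementing_holds(l)]
positive_failures = [l for l in all_lists((1, 2, 3)) if not incrementing_holds(l)]
```

\texttt{doubling\_failures} and \texttt{positive\_failures} both come out empty, while \texttt{increment\_failures} contains \texttt{{[}0,\ 1{]}}.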
367 | 
368 | \subsection{Oracle Generators}
369 | \label{oracle-generators}
370 | 
371 | The big advantage of manual tests over generative ones is that you can
372 | come up with the appropriate outputs for a given input. Since we can't
373 | easily do that in PBT, we're stuck testing properties instead of
374 | oracles.
375 | 
376 | Or we could go in reverse: take a random output and generate a
377 | corresponding input. One way we can do that:
378 | 
379 | \begin{enumerate}
380 | 
381 | \item
382 |   generate pairs of elements and counts. Make sure that the elements are
383 |   unique
384 | \item
385 |   construct a list from that
386 | \item
387 |   pass in both the list and the corresponding mode, selected from the
388 |   pair.
389 | \end{enumerate}
390 | 
391 | \begin{minted}{python}
392 | @composite
393 | def list_and_mode(draw):
394 |     out = []
395 |     pairs_max_10 = tuples(integers(), integers(min_value=1, max_value=10))
396 |     counts = draw(lists(pairs_max_10,
397 |         min_size=1,
398 |         max_size=5,
399 |         unique_by=lambda x: x[0]))
400 |     for number, count in counts:
401 |         out += ([number] * count)
402 |     mode_of_out = max(counts, key=lambda x: x[1])[0]
403 |     return out, mode_of_out
404 | 
405 | @given(list_and_mode())
406 | def test_can_find_mode(lm):
407 |     l, m = lm
408 |     assert mode(l) == m
409 | 
410 | ...
411 | 
412 | lm = ([0, 1], 0)
413 | \end{minted}
414 | This overconstrains (we're not ruling out two pairs having the same
415 | counts), but it \emph{does not} raise a false positive for
416 | \texttt{{[}3,\ 2,\ 2,\ 3{]}}. This is because we construct the list in
417 | the same way \texttt{max} interprets the list. If we do
418 | 
419 | \begin{verbatim}
420 | -   for number, count in counts:
421 | +   for number, count in reversed(counts):
422 | \end{verbatim}
423 | 
424 | Then it raises \texttt{{[}3,\ 2{]}} as a ``counterexample''. Between the
425 | cumbersomeness and the overconstraining, making an oracle generator is
426 | not a good choice for this problem. There are some cases, though, where
427 | it can be more useful.
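
The dependence on construction order can be reproduced without Hypothesis. In this sketch \texttt{build\_list\_and\_mode} is a hypothetical, deterministic version of the generator body, and \texttt{reference\_mode} is an assumed correct implementation that keeps the first element to reach the top count.

```python
def build_list_and_mode(pairs, reverse=False):
    # pairs: (element, count) tuples with unique elements.
    out = []
    for number, count in (reversed(pairs) if reverse else pairs):
        out += [number] * count
    # max returns the first of several tied pairs.
    expected = max(pairs, key=lambda p: p[1])[0]
    return out, expected


def reference_mode(l):
    # Assumed correct implementation: first element to reach the top count wins.
    best = l[0]
    count = {}
    for x in l:
        count[x] = count.get(x, 0) + 1
        if count[x] > count[best]:
            best = x
    return best


tied = [(3, 1), (2, 1)]
forward_list, forward_expected = build_list_and_mode(tied)
reversed_list, reversed_expected = build_list_and_mode(tied, reverse=True)
```

Built in order, the expected value and \texttt{reference\_mode} agree. Built in reverse, the list starts with the other tied element, so a correct implementation gets reported as wrong.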
428 | 
429 | \section{Limitations}
430 | \label{limitations}
431 | 
432 | Here's a fixed version that \emph{looks} like it will work:
433 | 
434 | \begin{minted}{python}
435 | def mode(l):
436 |   max = None
437 |   count = {}
438 |   for x in l:
439 |     if x not in count:
440 |       count[x] = 0
441 |     count[x] += 1
442 | +   if max is None or count[x] > count[max]:
443 | -   if not max or count[x] > count[max]:
444 |       max = x
445 |   return max
446 | \end{minted}
447 | And this passes all of our tests. But there's still a bug in it. Again,
448 | take a minute to see if you can find it. If you can't, try the
449 | following:
450 | 
451 | \begin{minted}{python}
452 | mode([None, None, 2])
453 | \end{minted}
454 | This will select the mode as \texttt{2}, when it really should be
455 | \texttt{None}. The problem isn't in our contracts or assertions. It's in
456 | our test generator: we're only testing with lists of integers.
457 | Hypothesis can generate heterogeneous lists, but you still have to
458 | explicitly list the types you want to be in the list. In order to find
459 | this bug we'd have to explicitly realize that \texttt{None} might be a
460 | problem for us.
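
Here is the ``fixed'' version written out in full so the failure can be reproduced directly. The only change from the diff is that \texttt{max} is renamed to \texttt{best}, to avoid shadowing the builtin; the function name is mine.

```python
def mode_with_none_check(l):
    best = None
    count = {}
    for x in l:
        if x not in count:
            count[x] = 0
        count[x] += 1
        # `best is None` cannot tell "nothing chosen yet" apart from
        # "the current best is the value None".
        if best is None or count[x] > count[best]:
            best = x
    return best


print(mode_with_none_check([0, 0, 1]))        # the original bug is gone
print(mode_with_none_check([None, None, 2]))  # but None is still overwritten
```

\texttt{None} plays the role that \texttt{0} played in the original bug: it doubles as the ``no value yet'' marker.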
461 | 
462 | If we only want to call \texttt{mode} on homogeneous lists, we should
463 | instead use a typechecker to catch the bug:
464 | 
465 | \begin{minted}{python}
466 | + def mode(l: List[T]) -> T:
467 | - def mode(l):
468 |     max = None
469 | +   count = {} # type: Dict[T, int]
470 | -   count = {}
471 | \end{minted}
472 | This will raise a spurious error, saying that the return value is
473 | actually an \texttt{Optional{[}T{]}}. If we change \texttt{max\ =\ None}
474 | to \texttt{max\ =\ l{[}0{]}} both the error and the bug go away. But we
475 | can change the return value to \texttt{Optional{[}T{]}} and the bug
476 | remains: mypy can't actually detect if we're passing in a heterogeneous
477 | list. More type-oriented languages can ban heterogeneous lists outright
478 | but even those will miss the bugs our contracts caught. Static and
479 | dynamic analysis are complementary, not
480 | contradictory\footnote{Both contracts and properties can be checked statically, but \textit{most} people using them will be checking them at runtime. This is because static analysis of contracts quickly turns into formal verification, which is really, really hard. }.
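
For reference, here is a sketch of the annotated function with the \texttt{l{[}0{]}} change applied. The \texttt{Hashable} bound and the variable annotation syntax are my choices, not part of the original diff, and \texttt{best} again replaces \texttt{max}.

```python
from typing import Dict, Hashable, List, TypeVar

T = TypeVar("T", bound=Hashable)


def mode(l: List[T]) -> T:
    # Seeding from the list means no Optional and no sentinel value.
    best = l[0]
    count: Dict[T, int] = {}
    for x in l:
        count[x] = count.get(x, 0) + 1
        if count[x] > count[best]:
            best = x
    return best


print(mode([0, 0, 1]))
print(mode([None, None, 2]))
```

Because \texttt{best} always holds an element of the list, there is no placeholder left for a real element to be confused with.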
481 | 
482 | \subsection{Summary}
483 | 
484 | This was a pretty short dive into what makes a good property or
485 | contract. It also focused on just pure functions: a lot of languages use
486 | contracts to maintain class invariants or monitor the side effects of
487 | procedures.
488 | 
489 | If you're interested in learning more about properties,
490 | chapter \ref{sec:choosing_properties_for_testing}
491 | is a canonical article on abstract properties and chapter \ref{sec:screencast_introduction}
492 | is a series on applying them to business problems. If you're interested in
493 | learning more about contracts, I'd recommend\ldots{} actually, I can't
494 | think of anything that's not language-specific. Kind of surprising given
495 | how useful they are.
496 | 
497 | 
498 | 


--------------------------------------------------------------------------------
/matt_parsons.tex:
--------------------------------------------------------------------------------
  1 | \chapter{Type Safety Back and Forth - Matt Parsons}
  2 | \label{sec:type_safety_back_and_forth}
  3 | 
  4 | \begin{quotation}
  5 | \noindent\textit{\textbf{William Yao:}}
  6 | 
  7 | \textit{Essential reading. A very typical design technique of restricting what inputs your functions take to ensure that they can't fail.}
  8 | 
  9 | \textit{The most typical instances of using this are \texttt{NonEmpty} for lists with at least one element and \texttt{Natural} for non-negative integers. Oddly, I don't often see people do the same thing for other structures where this would be useful; for instance, nonempty \texttt{Vector}s or nonempty \texttt{Text}. Thankfully, it's easy enough to define yourself.}
 10 | \begin{minted}{haskell}
 11 | data NonEmptyVec a = NonEmptyVec a (Vector a)
 12 | 
 13 | -- An additional invariant you might want to enforce is
 14 | -- 'not all whitespace'
 15 | data NonEmptyText = NonEmptyText Char Text
 16 | \end{minted}
 17 | \textit{Original Article: \cite{type_safety_back_and_forth}}
 18 | \end{quotation}
 19 | 
 20 | 
 21 | \noindent Types are a powerful construct for improving program safety. Haskell has a few notable ways of handling potential failure, the most famous being the venerable \texttt{Maybe} type:
 22 | 
 23 | \begin{minted}{haskell}
 24 | data Maybe a
 25 |     = Nothing
 26 |     | Just a
 27 | \end{minted}
 28 | We can use Maybe as the result of a function to indicate:
 29 | \begin{quotation}
 30 | \noindent \textit{Hey, friend! This function might fail. You'll need to handle the \texttt{Nothing} case.}
 31 | \end{quotation}
 32 | This allows us to write functions like a safe division function:
 33 | 
 34 | \begin{minted}{haskell}
 35 | safeDivide :: Int -> Int -> Maybe Int
 36 | safeDivide i 0 = Nothing
 37 | safeDivide i j = Just (i `div` j)
 38 | \end{minted}
 39 | 
 40 | I like to think of this as pushing the responsibility for failure forward. I'm telling the caller of the code that they can provide whatever \texttt{Int}s they want, but that some condition might cause them to fail. And the caller of the code has to handle that failure later on.
 41 | 
 42 | This is the easiest technique to show and tell, because it's one-size-fits-all. If your function can fail, just slap \texttt{Maybe} or \texttt{Either} on the result type and you've got safety. I can write a 35 line blog post to show off the technique, and if I were feeling frisky, I could use it as an introduction to \texttt{Functor}, \texttt{Monad}, and all that jazz.
 43 | 
 44 | Instead, I'd like to share another technique. Rather than push the responsibility for failure forward, let's explore pushing it back. This technique is a little harder to show, because it depends on the individual cases you might use.
 45 | 
 46 | If pushing responsibility forward means accepting whatever parameters and having the caller of the code handle the possibility of failure, then \textit{pushing it back} is going to mean we accept stricter parameters that we can't fail with. Let's consider \texttt{safeDivide}, but with a more lax type signature:
 47 | 
 48 | 
 49 | \begin{minted}{haskell}
 50 | safeDivide :: String -> String -> Maybe Int
 51 | safeDivide iStr jStr = do
 52 |     i <- readMay iStr
 53 |     j <- readMay jStr
 54 |     guard (j /= 0)
 55 |     pure (i `div` j)
 56 | \end{minted}
 57 | This function takes two strings, and then tries to parse \texttt{Int}s out of them. Then, if the \texttt{j} parameter isn't \texttt{0}, we return the result of division. This function is safe, but we have a much larger space of calls to \texttt{safeDivide} that fail and return \texttt{Nothing}. We've accepted more parameters, but we've pushed a lot of responsibility forward for handling possible failure.
 58 | 
 59 | Let's push the failure back.
 60 | \begin{minted}{haskell}
 61 | safeDivide :: Int -> NonZero Int -> Int
 62 | safeDivide i (NonZero j) = i `div` j
 63 | \end{minted}
 64 | We've required that users provide us a \texttt{NonZero Int} rather than any old \texttt{Int}. We've pushed back against the callers of our function:
 65 | \begin{quotation}
 66 | \noindent \textit{No! You must provide a \texttt{NonZero Int}. I refuse to work with just any \texttt{Int}, because then I might fail, and that's annoying.} 
 67 | \end{quotation} 
 68 | So speaks our valiant little function, standing up for itself!
 69 | 
 70 | Let's implement \texttt{NonZero}. We'll take advantage of Haskell's \texttt{PatternSynonyms} language extension to allow people to pattern match on a ``constructor'' without exposing a way to unsafely construct values.
 71 | 
 72 | \begin{minted}{haskell}
 73 | {-# LANGUAGE PatternSynonyms #-}
 74 | 
 75 | module NonZero
 76 |   ( NonZero()
 77 |   , pattern NonZero
 78 |   , unNonZero
 79 |   , nonZero
 80 |   ) where
 81 | 
 82 | newtype NonZero a = UnsafeNonZero a
 83 | 
 84 | pattern NonZero a <- UnsafeNonZero a
 85 | 
 86 | unNonZero :: NonZero a -> a
 87 | unNonZero (UnsafeNonZero a) = a
 88 | 
 89 | nonZero :: (Num a, Eq a) => a -> Maybe (NonZero a)
 90 | nonZero 0 = Nothing
 91 | nonZero i = Just (UnsafeNonZero i)
 92 | \end{minted}
 93 | This module allows us to push the responsibility for type safety backwards onto callers.
 94 | 
 95 | As another example, consider \texttt{head}. Here's the unsafe, convenient variety:
 96 | \begin{minted}{haskell}
 97 | head :: [a] -> a
 98 | head (x:xs) = x
 99 | head []     = error "oh no"
100 | \end{minted}
101 | This code is making a promise that it can't keep. Given the empty list, it will fail at runtime.
102 | 
103 | Let's push the responsibility for safety forward:
104 | 
105 | \begin{minted}{haskell}
106 | headMay :: [a] -> Maybe a
107 | headMay (x:xs) = Just x
108 | headMay []     = Nothing
109 | \end{minted}
110 | Now, we won't fail at runtime. We've required the caller to handle a \texttt{Nothing} case.
111 | 
112 | Let's try pushing it back now:
113 | 
114 | \begin{minted}{haskell}
115 | headOr :: a -> [a] -> a
116 | headOr def (x:xs) = x
117 | headOr def []     = def
118 | \end{minted}
119 | Now, we're requiring that the \textit{caller} of the function handle possible failure before they ever call this. There's no way to get it wrong. Alternatively, we can use a type for nonempty lists!
120 | 
121 | \begin{minted}{haskell}
122 | data NonEmpty a = a :| [a]
123 | 
124 | safeHead :: NonEmpty a -> a
125 | safeHead (x :| xs) = x
126 | \end{minted}
127 | This one works just as well. We're requiring that the calling code handle failure ahead of time.
128 | 
129 | A more complicated example of this technique is the \href{https://hackage.haskell.org/package/justified-containers-0.1.2.0/docs/Data-Map-Justified-Tutorial.html}{justified-containers} library. The library uses the type system to prove that a given key exists in the underlying Map. From that point on, lookups using those keys are total: they are guaranteed to return a value, and they don't return a Maybe.
130 | 
131 | This works even if you \texttt{map} over the \texttt{Map} with a function, transforming values. You can also use it to ensure that two maps share related information. It's a powerful feature, beyond just having type safety.
132 | 
133 | 
134 | 
135 | 
136 | 
137 | 
138 | \section{The Ripple Effect}
139 | 
140 | When some piece of code hands us responsibility, we have two choices:
141 | 
142 | \begin{enumerate}
143 | \item Handle that responsibility.
144 | \item Pass it to someone else!
145 | \end{enumerate}
146 | In my experience, developers will tend to push responsibility in the same direction that the code they call does. So if some function returns a \texttt{Maybe}, the developer is going to be inclined to also return a \texttt{Maybe} value. If some function requires a \texttt{NonEmpty Int}, then the developer is going to be inclined to also require a \texttt{NonEmpty Int} be passed in.
147 | 
148 | This played out in my work codebase. We have a type representing an \texttt{Order} with many \texttt{Item}s in it. Originally, the type looked something like this:
149 | 
150 | \begin{minted}{haskell}
151 | data Order = Order  { items :: [Item] }
152 | \end{minted}
153 | The \texttt{Item}s contained nearly all of the interesting information in the order, so almost everything that we did with an \texttt{Order} would need to return a \texttt{Maybe} value to handle the empty list case. This was a lot of work, and a lot of \texttt{Maybe} values!
154 | 
155 | The type is \textit{too permissive}. As it happens, an \texttt{Order} may not exist without at least one \texttt{Item}. So we can make the type \textit{more restrictive} and have more fun!
156 | 
157 | We redefined the type to be:
158 | 
159 | \begin{minted}{haskell}
160 | data Order = Order { items :: NonEmpty Item }
161 | \end{minted}
162 | All of the \texttt{Maybe}s relating to the empty list were purged, and all of the code was pure and free. The failure case (an empty list of items) was moved to two sites:
163 | 
164 | \begin{enumerate}
165 | \item Decoding JSON
166 | \item Decoding database rows
167 | \end{enumerate}
168 | Decoding JSON happens at the API side of things, when various services \texttt{POST} updates to us. Now, we can respond with a \texttt{400} error and tell API clients that they've provided invalid data! This prevents our data from going bad.
169 | 
170 | Decoding database rows is even easier. We use an \texttt{INNER JOIN} when retrieving \texttt{Order}s and \texttt{Item}s, which guarantees that each \texttt{Order} will have at least one \texttt{Item} in the result set. Foreign keys ensure that each \texttt{Item}'s \texttt{Order} is actually present in the database. This does leave the possibility that an \texttt{Order} might be orphaned in the database, but it's mostly safe.
171 | 
172 | When we push our type safety back, we're encouraged to continue pushing it back. Eventually, we push it all the way back -- to the edges of our system! This simplifies all of the code and logic inside of the system. We're taking advantage of types to make our code simpler, safer, and easier to understand.
173 | 
174 | 
175 | 
176 | \section{Ask Only What You Need}
177 | 
178 | In many senses, designing our code with type safety in mind is about being as strict as possible about your possible inputs. Haskell makes this easier than many other languages, but there's nothing stopping you from writing a function that can take literally any binary value, do whatever effects you want, and return whatever binary value:
179 | 
180 | \begin{minted}{haskell}
181 | foobar :: ByteString -> IO ByteString
182 | \end{minted}
183 | A \texttt{ByteString} is a totally unrestricted data type. It can contain any sequence of bytes. Because it can express any value, we have very few guarantees on what it actually contains, and we are very limited in how we can safely handle this.
184 | 
185 | By restricting our past, we gain freedom in the future.
186 | 
187 | 
188 | 
189 | \chapter{Keep your types small\ldots and your bugs smaller - Matt Parsons}
190 | \label{sec:keep_your_types_small}
191 | 
192 | \textit{Original article: \cite{keep_your_types_small}}
193 | 
194 | \vspace{\baselineskip}
195 | 
196 | \noindent In my previous article ``Type Safety Back and Forth'' (see chapter \ref{sec:type_safety_back_and_forth}), I discussed two different techniques for bringing type safety to programs that may fail. On the one hand, you can push the responsibility forward. This technique uses types like \texttt{Either} and \texttt{Maybe} to report a problem with the inputs to the function. Here are two example type signatures:
197 | 
198 | \begin{minted}{haskell}
199 | safeDivide
200 |     :: Int
201 |     -> Int
202 |     -> Maybe Int
203 | 
204 | lookup
205 |     :: Ord k
206 |     => k
207 |     -> Map k a
208 |     -> Maybe a
209 | \end{minted}
210 | If the second parameter to \texttt{safeDivide} is \texttt{0}, then we return \texttt{Nothing}. Likewise, if the given \texttt{k} is not present in the \texttt{Map}, then we return \texttt{Nothing}.
211 | 
212 | On the other hand, you can push it back. Here are those functions, but with the safety pushed back:
213 | 
214 | \begin{minted}{haskell}
215 | safeDivide
216 |     :: Int
217 |     -> NonZero Int
218 |     -> Int
219 | 
220 | lookupJustified
221 |     :: Key ph k
222 |     -> Map ph k a
223 |     -> a
224 | \end{minted}
225 | With \texttt{safeDivide}, we require the user pass in a \texttt{NonZero Int} -- a type that guarantees that the underlying value is not \texttt{0}. With \texttt{lookupJustified}, the \texttt{ph} type guarantees that the \texttt{Key} is present in the \texttt{Map}, so we can pull the resulting value out without requiring a \texttt{Maybe}. (Check out the \href{https://hackage.haskell.org/package/justified-containers-0.3.0.0/docs/Data-Map-Justified-Tutorial.html}{tutorial} for \texttt{justified-containers}, it is pretty awesome).
226 | 
227 | 
228 | 
229 | 
230 | \section{Expansion and Restriction}
231 | 
232 | 
233 | ``Type Safety Back and Forth'' uses the metaphor of ``pushing'' the responsibility in one of two directions:
234 | 
235 | \begin{itemize}
236 | \item forwards: the caller of the function is responsible for handling the possible error output
237 | \item backwards: the caller of the function is required to provide correct inputs
238 | \end{itemize}
239 | However, this metaphor is a bit squishy. We can make it more precise by talking about the ``cardinality'' of a type -- how many values it can contain. The type \texttt{Bool} can contain two values -- \texttt{True} and \texttt{False}, so we say it has a cardinality of 2. The type \texttt{Word8} can express the numbers from 0 to 255, so we say it has a cardinality of 256.
240 | 
241 | The type \texttt{Maybe a} has a cardinality of $1 + a$. We get a ``free'' value \texttt{Nothing :: Maybe a}. For every value of type \texttt{a}, we can wrap it in \texttt{Just}. 
242 | The type \texttt{Either e a} has a cardinality of $e + a$. We can wrap all the values of type \texttt{e} in \texttt{Left}, and then we can wrap all the values of type \texttt{a} in \texttt{Right}.
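
As a quick worked example, writing $|T|$ for the cardinality of a type $T$ (notation introduced here for convenience, and counting only fully defined values):
\[
|\texttt{Maybe Bool}| = 1 + 2 = 3,
\qquad
|\texttt{Either Bool Word8}| = 2 + 256 = 258.
\]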
243 | 
244 | The first technique -- pushing forward -- is ``expanding the result type.'' When we wrap our results in \texttt{Maybe}, \texttt{Either}, and similar types, we're saying that we can't handle all possible inputs, and so we must have extra outputs to safely deal with this.
245 | 
246 | Let's consider the second technique. Specifically, here's \texttt{NonZero} and \texttt{NonEmpty}, two common ways to implement it:
247 | 
248 | \begin{minted}{haskell}
249 | newtype NonZero a
250 |     = UnsafeNonZero
251 |     { unNonZero :: a
252 |     }
253 | 
254 | nonZero :: (Num a, Eq a) => a -> Maybe (NonZero a)
255 | nonZero 0 = Nothing
256 | nonZero i = Just (UnsafeNonZero i)
257 | 
258 | data NonEmpty a = a :| [a]
259 | 
260 | nonEmpty :: [a] -> Maybe (NonEmpty a)
261 | nonEmpty []     = Nothing
262 | nonEmpty (x:xs) = Just (x :| xs)
263 | \end{minted}
264 | What is the cardinality of these types?
265 | 
266 | \texttt{NonZero a} represents ``the type of values \texttt{a} such that the value is not equal to \texttt{0}.'' \texttt{NonEmpty a} represents ``the type of lists of \texttt{a} that are not empty''. In both of these cases, we start with some larger type and remove some potential values. So the type \texttt{NonZero a} has the cardinality $a - 1$, and the type \texttt{NonEmpty a} has the cardinality $[a] - 1$.
267 | 
268 | Interestingly enough, \texttt{[a]} has an infinite cardinality, so $[a] - 1$ seems somewhat strange -- it is also infinite! Math tells us that these are even the same infinity. So it's not the mere cardinality that helps -- it is the specific value(s) that we have removed that makes this type safer for certain operations.
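
To spell this out under simplifying assumptions (a finite element type with $n \ge 1$ values, counting only finite, fully defined lists, and writing $|T|$ for the cardinality of $T$), group lists by their length $k$:
\[
|\texttt{[a]}| = \sum_{k \ge 0} n^k,
\qquad
|\texttt{NonEmpty a}| = n \cdot |\texttt{[a]}| = \sum_{k \ge 1} n^k .
\]
Both sums are countably infinite. They differ only in the $k = 0$ term, which is exactly the empty list.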
269 | 
270 | These are custom examples of \href{https://ucsd-progsys.github.io/liquidhaskell-tutorial/}{refinement types}. Another closely related idea is \href{https://www.hedonisticlearning.com/posts/quotient-types-for-programmers.html}{quotient types}. The basic idea here is to restrict the size of our inputs. Slightly more formally,
271 | 
272 | \begin{itemize}
273 | \item Forwards: expand the range
274 | \item Backwards: restrict the domain
275 | \end{itemize}
276 | 
277 | 
278 | 
279 | 
280 | \section{Constraints Liberate}
281 | 
282 | 
283 | 
284 | Runar Bjarnason has a wonderful talk titled \href{https://www.youtube.com/watch?v=GqmsQeSzMdw}{Constraints Liberate, Liberties Constrain}. The big idea of the talk, as I see it, is this:
285 | \begin{quotation}
286 | \textit{When we restrict what we can do, it's easier to understand what we can do.}
287 | \end{quotation}
288 | I feel there is a deep connection between this idea and Rich Hickey's talk \href{https://www.youtube.com/watch?v=34_L7t7fD_U}{Simple Made Easy}. In both cases, we are focusing on simplicity -- on cutting away the inessential and striving for more elegant ways to express our problems.
289 | 
290 | Pushing the safety forward -- expanding the range -- does not make things simpler. It provides us with more power, more options, and more possibilities. Pushing the safety backwards -- restricting the domain -- does make things simpler. We can use this technique to take away the power to get it wrong, the options that aren't right, and the possibilities we don't want.
291 | 
292 | Indeed, if we manage to restrict our types sufficiently, there may be only one implementation possible! The classic example is the \texttt{identity} function:
293 | 
294 | \begin{minted}{haskell}
295 | identity :: a -> a
296 | identity a = a
297 | \end{minted}
298 | This is the only implementation of this function that satisfies the type signature (ignoring \texttt{undefined}, of course). In fact, for any function with a sufficiently precise type signature, there is a way to automatically derive the function! Joachim Breitner's \href{https://www.joachim-breitner.de/blog/735-The_magic_%E2%80%9CJust_do_it%E2%80%9D_type_class}{justDoIt} is a fascinating utility that can solve these implementations for you.
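
To make the contrast concrete, here is a small sketch. The concrete type \texttt{Int -> Int} admits many different total functions, while the polymorphic signatures below leave essentially one sensible total implementation each (again ignoring \texttt{undefined} and non-termination). All names in the block are invented for the example.

\begin{minted}{haskell}
-- Many different total functions share the concrete type Int -> Int.
same, next, zero :: Int -> Int
same x = x
next x = x + 1
zero _ = 0

-- With fully polymorphic types, the function cannot inspect or invent
-- values of type a or b, so the type dictates the implementation.
first :: (a, b) -> a
first (a, _) = a

swapPair :: (a, b) -> (b, a)
swapPair (a, b) = (b, a)

main :: IO ()
main = do
    print (map ($ 41) [same, next, zero])  -- [41,42,0]
    print (first (1 :: Int, "one"))        -- 1
    print (swapPair (1 :: Int, "one"))     -- ("one",1)
\end{minted}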
299 | 
300 | With sufficiently fancy types, the computer can write even more code for you. The programming language Idris can \href{https://youtu.be/X36ye-1x_HQ?t=1140}{write well-defined functions like \texttt{zipWith} and \texttt{transpose} for length-indexed lists nearly automatically}!
301 | 
302 | 
303 | \section{Restrict the Range}
304 | 
305 | I see this pattern and I am compelled to fill it in:
306 | \begin{table}[htbp]
307 | \centering
308 | \begin{tabular}{lcc}
309 |  & \textbf{Restrict}  & \textbf{Expand} \\
310 | \hline
311 | Range &  & \texttt{:(} \\
312 | Domain & \texttt{:D} & \\
313 | \end{tabular} 
314 | \end{table}
315 | I've talked about restricting the domain and expanding the range. Expanding the domain seems silly to do -- we accept more possible values than we know what to do with. This is clearly not going to make it easier or simpler to implement our programs. However, there are many functions in Haskell's standard library that have a domain that is too large. Consider:
316 | 
317 | \begin{minted}{haskell}
318 | take :: Int -> [a] -> [a]
319 | \end{minted}
320 | \texttt{Int}, as a domain, is both too large and too small. It allows us to provide negative numbers: what does it even mean to take \texttt{-3} elements from a list? As \texttt{Int} is a finite type, and \texttt{[a]} is infinite, we are restricted to only using this function with sufficiently small \texttt{Int}s. A closer fit would be \texttt{take :: Natural -> [a] -> [a]}. \texttt{Natural} allows any non-negative whole number, and perfectly expresses the reasonable domain. Expanding the domain isn't desirable, as we might expect.
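
A minimal sketch of such a function, assuming the \texttt{Natural} type from \texttt{Numeric.Natural} in \texttt{base}. The name \texttt{takeNat} is hypothetical; it is not a function that \texttt{base} provides. For comparison, the standard \texttt{take} quietly returns the empty list when given a negative count.

\begin{minted}{haskell}
import Numeric.Natural (Natural)

-- A version of take whose count can never be negative.
takeNat :: Natural -> [a] -> [a]
takeNat 0 _        = []
takeNat _ []       = []
takeNat n (x : xs) = x : takeNat (n - 1) xs  -- n > 0 here, so n - 1 is safe

main :: IO ()
main = do
    print (takeNat 3 "haskell")            -- "has"
    print (takeNat 10 [1, 2, 3 :: Int])    -- [1,2,3]
\end{minted}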
321 | 
322 | \texttt{base} has functions with a \textit{range} that is too large, as well. Let's consider:
323 | \begin{minted}{haskell}
324 | length :: [a] -> Int
325 | \end{minted}
326 | This has many of the same problems as \texttt{take} -- a list with too many elements will overflow the \texttt{Int}, and we won't get the right answer. Additionally, we have a guarantee that we \textit{forget} -- a length for any container must be non-negative! We can more correctly express this type by restricting the output type:
327 | 
328 | \begin{minted}{haskell}
329 | length :: [a] -> Natural
330 | \end{minted}
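
One possible way to write a function with this restricted range is sketched below. The name \texttt{lengthNat} is hypothetical, chosen so that it does not clash with the \texttt{length} from the \texttt{Prelude}.

\begin{minted}{haskell}
import Data.List       (foldl')
import Numeric.Natural (Natural)

-- Count elements with a strict left fold. The result type rules out
-- negative lengths, and Natural does not overflow the way Int can.
lengthNat :: [a] -> Natural
lengthNat = foldl' (\n _ -> n + 1) 0

main :: IO ()
main = do
    print (lengthNat "hello")         -- 5
    print (lengthNat ([] :: [Int]))   -- 0
\end{minted}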
331 | 
332 | \section{A perfect fit}
333 | 
334 | 
335 | The more precisely our types describe our program, the fewer ways we have to go wrong. Ideally, we can provide a correct output for every input, and we use a type that tightly describes the properties of possible outputs.
336 | 


--------------------------------------------------------------------------------
/jasper_van_der_jeugt2.tex:
--------------------------------------------------------------------------------
  1 | \chapter{Practical testing in Haskell - Jasper van der Jeugt}
  2 | 
  3 | 
  4 | \begin{quotation}
  5 | \noindent\textit{\textbf{William Yao:}}
  6 | \textit{A short post about writing property tests for an LRU cache. Main takeaway is what Jasper terms the ``Action trick'': generating complicated data more easily by instead generating a sequence of events that could happen to the data and constructing a value accordingly. For instance, you could generate a binary search tree by generating a sequence of insertions and deletions.}
  7 | 
  8 | \vspace{\baselineskip}
  9 | \noindent\textit{Original article: \cite{practical_testing_in_haskell}}
 10 | \end{quotation}
 11 | 
 12 | 
 13 | \section{Introduction}
 14 | 
 15 | There has been a theme of ``Practical Haskell'' in the last few blogposts I published, and when I published the last one, on \href{https://jaspervdj.be/posts/2015-02-24-lru-cache.html}{how to write an LRU Cache}, someone asked me if I could elaborate on how I would test or benchmark such a module. For the sake of brevity, I will constrain myself to testing for now, although I think a lot of the ideas in the blogpost also apply to benchmarking.
 16 | 
 17 | This post is written in Literate Haskell. It depends on the LRU Cache we wrote last time, so you need both modules if you want to play around with the code. Both can be found in \href{https://github.com/jaspervdj/jaspervdj/}{this repo}.
 18 | 
 19 | Since I use a different format for blogpost filenames than GHC expects for module names, loading both modules is a bit tricky. The following works for me:
 20 | 
 21 | \begin{minted}{haskell}
 22 | $ ghci posts/2015-02-24-lru-cache.lhs \
 23 |     posts/2015-03-13-practical-testing-in-haskell.lhs
 24 | *Data.SimpleLruCache> :m +Data.SimpleLruCache.Tests
 25 | *Data.SimpleLruCache Data.SimpleLruCache.Tests>
 26 | \end{minted}
 27 | Alternatively, you can of course rename the files.
 28 | 
 29 | \section{Test frameworks in Haskell}
 30 | 
 31 | There are roughly two kinds of test frameworks which are commonly used in the Haskell world:
 32 | 
 33 | \begin{itemize}
 34 | \item Unit testing, for writing concrete test cases. We will be using \href{http://hackage.haskell.org/package/HUnit}{HUnit}.
 35 | 
 36 | \item Property testing, which allows you to test \textit{properties} rather than specific \textit{cases}. We will be using \href{http://hackage.haskell.org/package/QuickCheck}{QuickCheck}. Property testing is something that might be unfamiliar to people just starting out in Haskell. However, because there already are great \href{http://wiki.haskell.org/Introduction_to_QuickCheck1}{tutorials} out there on QuickCheck, I will not explain it in detail. \href{http://hackage.haskell.org/package/smallcheck}{smallcheck} also falls in this category.
 37 | \end{itemize}
 38 | 
 39 | Finally, it's nice to have something to tie it all together. We will be using \href{http://hackage.haskell.org/package/tasty}{Tasty}, which lets us run HUnit and QuickCheck tests in the same test suite. It also gives us plenty of convenient options, e.g. running only a part of the test suite. We could also choose to use \href{http://hackage.haskell.org/package/test-framework}{test-framework} or \href{http://hspec.github.io/}{Hspec} instead of Tasty.
 40 | 
 41 | \section{A module structure for tests}
 42 | 
 43 | Many Haskell projects start out by just having a \texttt{tests.hs} file somewhere, but this obviously does not scale well to larger codebases.
 44 | 
 45 | The way I like to organize tests is based on how we organize code in general: through the module hierarchy. If I have the following modules in \texttt{src/}:
 46 | 
 47 | \begin{minted}{haskell}
 48 | AcmeCompany.AwesomeProduct.Database
 49 | AcmeCompany.AwesomeProduct.Importer
 50 | AcmeCompany.AwesomeProduct.Importer.Csv
 51 | \end{minted}
 52 | I aim to have the following modules in \texttt{tests/}:
 53 | 
 54 | \begin{minted}{haskell}
 55 | AcmeCompany.AwesomeProduct.Database.Tests
 56 | AcmeCompany.AwesomeProduct.Importer.Tests
 57 | AcmeCompany.AwesomeProduct.Importer.Csv.Tests
 58 | \end{minted}
 59 | If I want to add some higher-level tests which basically test the entire product, I can usually add these higher in the module tree. For example, if I wanted to test our entire awesome product, I would write the tests in \texttt{AcmeCompany.AwesomeProduct.Tests}.
 60 | 
 61 | Every \texttt{.Tests} module exports a \texttt{tests :: TestTree} value. A \texttt{TestTree} is a Tasty concept -- basically a structured group of tests. Let's go to our motivating example: testing the LRU Cache I wrote in the previous blogpost.
 62 | 
 63 | Since I named the module \texttt{Data.SimpleLruCache}, we use \texttt{Data.SimpleLruCache.Tests} here.
 64 | 
 65 | \begin{minted}{haskell}
 66 | {-# OPTIONS_GHC -fno-warn-orphans #-}
 67 | {-# LANGUAGE BangPatterns               #-}
 68 | {-# LANGUAGE GeneralizedNewtypeDeriving #-}
 69 | module Data.SimpleLruCache.Tests
 70 |     ( tests
 71 |     ) where
 72 | import           Control.Applicative     ((<$>), (<*>))
 73 | import           Control.DeepSeq         (NFData)
 74 | import           Control.Monad           (foldM_)
 75 | import           Data.Hashable           (Hashable (..))
 76 | import qualified Data.HashPSQ            as HashPSQ
 77 | import           Data.IORef              (newIORef, readIORef, writeIORef)
 78 | import           Data.List               (foldl')
 79 | import qualified Data.Set                as S
 80 | import           Prelude                 hiding (lookup)
 81 | import           Data.SimpleLruCache
 82 | import qualified Test.QuickCheck         as QC
 83 | import qualified Test.QuickCheck.Monadic as QC
 84 | import           Test.Tasty              (TestTree, testGroup)
 85 | import           Test.Tasty.HUnit        (testCase)
 86 | import           Test.Tasty.QuickCheck   (testProperty)
 87 | import           Test.HUnit              (Assertion, (@?=))
 88 | \end{minted}
 89 | 
 90 | 
 91 | \section{What to test}
 92 | 
 93 | 
 94 | One of the hardest questions is, of course, which functions and modules should I test? If unlimited time and resources are available, the obvious answer is ``everything''. Unfortunately, time and resources are often scarce.
 95 | 
 96 | My rule of thumb is based on my development style. I tend to use GHCi a lot during development, and play around with datastructures and functions until they seem to work. These ``it seems to work'' cases I execute in GHCi often make great candidates for simple HUnit tests, so I usually start with that.
 97 | 
 98 | Then I look at invariants of the code, and try to model these as QuickCheck properties. This sometimes requires writing tricky \texttt{Arbitrary} instances; I will give an example of this later in this blogpost.
 99 | 
100 | I probably don't have to say that the more critical the code is, the more tests should be added.
101 | 
102 | After doing this, it is still likely that we will hit bugs if the code is non-trivial. These bugs form good candidates for testing as well:
103 | 
104 | \begin{enumerate}
105 | \item First, add a test case to reproduce the bug. Sometimes a concrete HUnit test case will be a better fit, sometimes we should go with a property -- it depends on the bug.
106 | \item Fix the bug so the test case passes.
107 | \item Leave in the test case for regression testing.
108 | \end{enumerate}
109 | Using this strategy, you should be able to convince yourself (and others) that the code works.
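
As an illustration of the three steps above, here is a sketch of a regression test written with Tasty and HUnit. The function \texttt{safeMaximum} and the bug it supposedly had (crashing on the empty list) are entirely hypothetical; only \texttt{testCase}, \texttt{testGroup}, \texttt{defaultMain} and \texttt{@?=} come from the libraries.

\begin{minted}{haskell}
import Test.Tasty       (TestTree, defaultMain, testGroup)
import Test.Tasty.HUnit (testCase, (@?=))

-- Hypothetical function under test. Imagine that an earlier version
-- called maximum directly and crashed on the empty list.
safeMaximum :: [Int] -> Maybe Int
safeMaximum [] = Nothing
safeMaximum xs = Just (maximum xs)

tests :: TestTree
tests = testGroup "Regression tests"
    [ -- Step 1: reproduce the bug. Step 3: keep the test around.
      testCase "safeMaximum handles the empty list" $
        safeMaximum [] @?= Nothing
    , testCase "safeMaximum still works on non-empty lists" $
        safeMaximum [3, 1, 2] @?= Just 3
    ]

main :: IO ()
main = defaultMain tests
\end{minted}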
110 | 
111 | \section{Simple HUnit tests}
112 | 
113 | 
114 | Testing simple cases using HUnit is trivial, so we won't spend that much time here. \texttt{@?=} asserts that two values must be equal, so let's use that to check that trimming the empty \texttt{Cache} doesn't do anything evil:
115 | 
116 | \begin{minted}{haskell}
117 | testCache01 :: Assertion
118 | testCache01 =
119 |     trim (empty 3 :: Cache String Int) @?= empty 3
120 | \end{minted}
121 | If we need to do some I/O for our test, we can do so without much trouble in HUnit. After all,
122 | 
123 | \begin{minted}{haskell}
124 | Test.HUnit> :i Assertion
125 | type Assertion = IO ()  -- Defined in 'Test.HUnit.Lang'
126 | \end{minted}
127 | so \texttt{Assertion} is just \texttt{IO ()}!
128 | 
129 | \begin{minted}{haskell}
130 | testCache02 :: Assertion
131 | testCache02 = do
132 |     h  <- newHandle 10 :: IO (Handle String Int)
133 |     v1 <- cached h "foo" (return 123)
134 |     v1 @?= 123
135 |     v2 <- cached h "foo" (fail "should be cached")
136 |     v2 @?= 123
137 | \end{minted}
138 | That was fairly easy.
139 | 
140 | As you can see, I usually give simple test cases numeric names. Sometimes there is a meaningful name for a test (for example, if it is a regression test for a bug), but usually I don't mind using just numbers.
141 | 
142 | \section{Simple QuickCheck tests}
143 | 
144 | 
145 | Let's do some property based testing. There are a few properties we can come up with.
146 | 
147 | Calling \texttt{HashPSQ.size} takes $O(n)$ time, which is why we are keeping our own counter, \texttt{cSize}. We should check that it matches \texttt{HashPSQ.size}, though:
148 | 
149 | \begin{minted}{haskell}
150 | sizeMatches :: (Hashable k, Ord k) => Cache k v -> Bool
151 | sizeMatches c =
152 |     cSize c == HashPSQ.size (cQueue c)
153 | \end{minted}
154 | The \texttt{cTick} field contains the priority of the next element we will insert. The priorities currently in the queue should all be smaller than that.
155 | 
156 | \begin{minted}{haskell}
157 | prioritiesSmallerThanNext :: (Hashable k, Ord k) => Cache k v -> Bool
158 | prioritiesSmallerThanNext c =
159 |     all (< cTick c) priorities
160 |   where
161 |     priorities = [p | (_, p, _) <- HashPSQ.toList (cQueue c)]
162 | \end{minted}
163 | Lastly, the size should always be smaller than or equal to the capacity:
164 | 
165 | \begin{minted}{haskell}
166 | sizeSmallerThanCapacity :: (Hashable k, Ord k) => Cache k v -> Bool
167 | sizeSmallerThanCapacity c =
168 |     cSize c <= cCapacity c
169 | \end{minted}
170 | 
171 | 
172 | \section{Tricks for writing Arbitrary instances}
173 | 
174 | \subsection{The Action trick}
175 | 
176 | 
177 | Of course, if you are somewhat familiar with QuickCheck, you will know that the previous properties require an \texttt{Arbitrary} instance for \texttt{Cache}.
178 | 
179 | One way to write such instances is what I'll call the ``direct'' method. For us this would mean generating a list of \texttt{(key, priority, value)} triples and converting that to a \texttt{HashPSQ}. Then we could compute the size of that and initialize the remaining fields.
180 | 
181 | However, writing an \texttt{Arbitrary} instance this way can get hard if our datastructure becomes more complicated, especially if there are complicated invariants. Additionally, if we take any shortcuts in the implementation of \texttt{arbitrary}, we might not test the edge cases well!
182 | 
183 | Another way to write the \texttt{Arbitrary} instance is by modeling use of the API. In our case, there are only two things we can do with a pure \texttt{Cache}: insert and lookup.
184 | 
185 | \begin{minted}{haskell}
186 | data CacheAction k v
187 |     = InsertAction k v
188 |     | LookupAction k
189 |     deriving (Show)
190 | \end{minted}
191 | This has a trivial Arbitrary instance:
192 | 
193 | \begin{minted}{haskell}
194 | instance (QC.Arbitrary k, QC.Arbitrary v) =>
195 |         QC.Arbitrary (CacheAction k v) where
196 |     arbitrary = QC.oneof
197 |         [ InsertAction <$> QC.arbitrary <*> QC.arbitrary
198 |         , LookupAction <$> QC.arbitrary
199 |         ]
200 | \end{minted}
201 | And we can apply these actions to our pure \texttt{Cache} to get a new \texttt{Cache}:
202 | 
203 | \begin{minted}{haskell}
204 | applyCacheAction
205 |     :: (Hashable k, Ord k)
206 |     => CacheAction k v -> Cache k v -> Cache k v
207 | applyCacheAction (InsertAction k v) c = insert k v c
208 | applyCacheAction (LookupAction k)   c = case lookup k c of
209 |     Nothing      -> c
210 |     Just (_, c') -> c'
211 | \end{minted}
212 | You probably guessed where this was going by now: we can generate an arbitrary \texttt{Cache} by generating a bunch of these actions and applying them one by one on top of the \texttt{empty} cache.
213 | 
214 | \begin{minted}{haskell}
215 | instance (QC.Arbitrary k, QC.Arbitrary v, Hashable k, NFData v, Ord k) =>
216 |         QC.Arbitrary (Cache k v) where
217 |     arbitrary = do
218 |         capacity <- QC.choose (1, 50)
219 |         actions  <- QC.arbitrary
220 |         let !cache = empty capacity
221 |         return $! foldl' (\c a -> applyCacheAction a c) cache actions
222 | \end{minted}
223 | Provided that we can model the complete user facing API using such an ``action'' datatype, I think this is a great way to write \texttt{Arbitrary} instances. After all, our \texttt{Arbitrary} instance should then be able to reach the same states as a user of our code.
224 | 
225 | An extension of this trick is using a separate datatype which holds the list of actions we used to generate the \texttt{Cache} as well as the \texttt{Cache}.
226 | 
227 | \begin{minted}{haskell}
228 | data ArbitraryCache k v = ArbitraryCache [CacheAction k v] (Cache k v)
229 |     deriving (Show)
230 | \end{minted}
231 | When a test fails, we can then log the list of actions which got us into the invalid state -- very useful for debugging. Furthermore, we can implement the \texttt{shrink} method in order to try to reach a similar invalid state using fewer actions.
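
The sketch below shows what this extension could look like. To keep the example self-contained it does not use the \texttt{Cache} from this post; instead it applies the same idea to a plain \texttt{Data.Set}, and every name in it (\texttt{SetAction}, \texttt{ArbitrarySet}, \texttt{fromActions}, and so on) is invented for the illustration. Shrinking drops actions from the log with QuickCheck's \texttt{shrinkList} and then replays the shorter log, so the stored value always stays consistent with its actions.

\begin{minted}{haskell}
import           Data.List       (foldl')
import qualified Data.Set        as S
import qualified Test.QuickCheck as QC

-- Hypothetical API actions for a plain set of Ints.
data SetAction
    = InsertAction Int
    | DeleteAction Int
    deriving (Show)

instance QC.Arbitrary SetAction where
    arbitrary = QC.oneof
        [ InsertAction <$> QC.arbitrary
        , DeleteAction <$> QC.arbitrary
        ]

applySetAction :: SetAction -> S.Set Int -> S.Set Int
applySetAction (InsertAction x) = S.insert x
applySetAction (DeleteAction x) = S.delete x

-- The generated value together with the actions that produced it.
data ArbitrarySet = ArbitrarySet [SetAction] (S.Set Int)
    deriving (Show)

-- Replay a list of actions on top of the empty set.
fromActions :: [SetAction] -> ArbitrarySet
fromActions actions =
    ArbitrarySet actions (foldl' (flip applySetAction) S.empty actions)

instance QC.Arbitrary ArbitrarySet where
    arbitrary = fromActions <$> QC.arbitrary
    -- Shrink by dropping actions (single actions are not shrunk further),
    -- then rebuild the set from the shorter log.
    shrink (ArbitrarySet actions _) =
        map fromActions (QC.shrinkList (const []) actions)

-- Example property: the set never holds more elements than actions applied.
prop_sizeBounded :: ArbitrarySet -> Bool
prop_sizeBounded (ArbitrarySet actions set) = S.size set <= length actions

main :: IO ()
main = QC.quickCheck prop_sizeBounded
\end{minted}

Because the \texttt{Show} instance includes the action log, a failing case would print the exact sequence of actions that led to the bad state.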
232 | 
233 | \subsection{The SmallInt trick}
234 | 
235 | 
236 | Now, note that our \texttt{Arbitrary} instance is for \texttt{Cache k v}, i.e., we haven't chosen yet what we want to have as \texttt{k} and \texttt{v} for our tests. In this case \texttt{v} is not so important, but the choice of \texttt{k} is important.
237 | 
238 | We want to cover all corner cases, and this includes ensuring that we cover collisions. If we use \texttt{String} or \texttt{Int} as key type \texttt{k}, collisions are very unlikely due to the high cardinality of both types. Since we are using a hash-based container underneath, hash collisions must also be covered.
239 | 
240 | We can solve both problems by introducing a \texttt{newtype} which restricts the cardinality of \texttt{Int}, and uses a ``worse'' (in the traditional sense) hashing method.
241 | 
242 | \begin{minted}{haskell}
243 | newtype SmallInt = SmallInt Int
244 |     deriving (Eq, Ord, Show)
245 | instance QC.Arbitrary SmallInt where
246 |     arbitrary = SmallInt <$> QC.choose (1, 100)
247 | instance Hashable SmallInt where
248 |     hashWithSalt salt (SmallInt x) = (salt + x) `mod` 10
249 | \end{minted}
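
To see why this forces hash collisions, the following sketch repeats the \texttt{SmallInt} definition so that it can be run on its own and counts how many distinct hashes the 100 possible keys produce. The fixed salt \texttt{0} is an arbitrary choice for the demonstration.

\begin{minted}{haskell}
import           Data.Hashable (Hashable (..))
import qualified Data.Set      as S

newtype SmallInt = SmallInt Int
    deriving (Eq, Ord, Show)

instance Hashable SmallInt where
    hashWithSalt salt (SmallInt x) = (salt + x) `mod` 10

main :: IO ()
main = do
    -- Hash every possible key with the same salt.
    let hashes = S.fromList [hashWithSalt 0 (SmallInt x) | x <- [1 .. 100]]
    -- 100 keys share only 10 distinct hashes, so collisions are guaranteed.
    print (S.size hashes)  -- 10
\end{minted}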
250 | 
251 | \section{Monadic QuickCheck}
252 | 
253 | 
254 | Now let's mix QuickCheck with monadic code. We will be testing the \texttt{Handle} interface to our cache. This interface consists of a single method:
255 | 
256 | \begin{minted}{haskell}
257 | cached
258 |     :: (Hashable k, Ord k)
259 |     => Handle k v -> k -> IO v -> IO v
260 | \end{minted}
261 | We will write a property to ensure our cache retains and evicts the right key-value pairs. It takes two arguments: the capacity of the LRU Cache (we use a \texttt{SmallInt} in order to get more evictions), and a list of key-value pairs we will insert using \texttt{cached} (we use \texttt{SmallInt} so we will cover collisions).
262 | 
263 | \begin{minted}{haskell}
264 | historic
265 |     :: SmallInt              -- ^ Capacity
266 |     -> [(SmallInt, String)]  -- ^ Key-value pairs
267 |     -> QC.Property           -- ^ Property
268 | historic (SmallInt capacity) pairs = QC.monadicIO $ do
269 | \end{minted}
270 | \texttt{QC.run} is used to lift \texttt{IO} code into the QuickCheck property monad \texttt{PropertyM} -- so it is a bit like a more concrete version of \texttt{liftIO}. I prefer it here over \texttt{liftIO} because it makes it a bit more clear what is going on.
271 | 
272 | \begin{minted}{haskell}
273 |     h <- QC.run $ newHandle capacity
274 | \end{minted}
275 | We will fold (\texttt{foldM\_}) over the pairs we need to insert. The state we pass in this \texttt{foldM\_} is the history of pairs we previously inserted. By building this up again using \texttt{(:)}, we ensure \texttt{history} contains a recent-first list, which is very convenient.
276 | 
277 | Inside every step, we call \texttt{cached}. By using an \texttt{IORef} in the code where we would usually actually ``load'' the value \texttt{v}, we can communicate whether or not the value was already in the cache. The \texttt{IORef} starts out as \texttt{True}, and the load action sets it to \texttt{False}. If the value was already in the cache, the write will not be executed, so the \texttt{IORef} will still be set to \texttt{True}. We store that result in \texttt{wasInCache}.
278 | 
279 | In order to verify this result, we reconstruct a set of the $N$ most recent keys. We can easily do this using the list of recent-first key-value pairs we have in \texttt{history}.
280 | 
281 | \begin{minted}{haskell}
282 |     foldM_ (step h) [] pairs
283 |   where
284 |     step h history (k, v) = do
285 |         wasInCacheRef <- QC.run $ newIORef True
286 |         _             <- QC.run $ cached h k $ do
287 |             writeIORef wasInCacheRef False
288 |             return v
289 |         wasInCache    <- QC.run $ readIORef wasInCacheRef
290 |         let recentKeys = nMostRecentKeys capacity S.empty history
291 |         QC.assert (S.member k recentKeys == wasInCache)
292 |         return ((k, v) : history)
293 | \end{minted}
294 | This is our auxiliary function to calculate the N most recent keys, given a recent-first key-value pair list.
295 | 
296 | \begin{minted}{haskell}
297 | nMostRecentKeys :: Ord k => Int -> S.Set k -> [(k, v)] -> S.Set k
298 | nMostRecentKeys _ keys [] = keys
299 | nMostRecentKeys n keys ((k, _) : history)
300 |     | S.size keys >= n    = keys
301 |     | otherwise           =
302 |         nMostRecentKeys n (S.insert k keys) history
303 | \end{minted}
304 | This test does not check that the \textit{values} in the cache are correct; it only ensures that the cache retains and evicts the correct keys. This is a conscious decision: I think the retaining/evicting part of the LRU Cache code was the most tricky, so we should prioritize testing that.
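
To get a feel for the auxiliary function \texttt{nMostRecentKeys} above, here is a small worked example. The function is copied unchanged so that the block runs on its own; the sample \texttt{history} is made up. Note how a key that appears twice in the history only counts once towards the capacity.

\begin{minted}{haskell}
import qualified Data.Set as S

-- Copied from the text so that the example is self-contained.
nMostRecentKeys :: Ord k => Int -> S.Set k -> [(k, v)] -> S.Set k
nMostRecentKeys _ keys [] = keys
nMostRecentKeys n keys ((k, _) : history)
    | S.size keys >= n    = keys
    | otherwise           =
        nMostRecentKeys n (S.insert k keys) history

main :: IO ()
main = do
    -- Recent-first: key 3 was used most recently, and twice in a row.
    let history = [(3, "c"), (3, "x"), (2, "b"), (1, "a")] :: [(Int, String)]
    print (nMostRecentKeys 2 S.empty history)  -- fromList [2,3]
    print (nMostRecentKeys 5 S.empty history)  -- fromList [1,2,3]
\end{minted}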
305 | 
306 | \section{Tying everything up}
307 | 
308 | 
309 | Lastly, we have our \texttt{tests :: TestTree}. It is not much more than an index of tests in the module. We use \texttt{testCase} to pass HUnit tests to the framework, and \texttt{testProperty} for QuickCheck properties.
310 | 
311 | Note that I usually tend to put these at the top of the module, but here I put it at the bottom of the blogpost for easier reading.
312 | 
313 | \begin{minted}{haskell}
314 | tests :: TestTree
315 | tests = testGroup "Data.SimpleLruCache"
316 |     [ testCase "testCache01" testCache01
317 |     , testCase "testCache02" testCache02
318 |     , testProperty "size == HashPSQ.size"
319 |         (sizeMatches :: Cache SmallInt String -> Bool)
320 |     , testProperty "priorities < next priority"
321 |         (prioritiesSmallerThanNext :: Cache SmallInt String -> Bool)
322 |     , testProperty "size <= capacity"
323 |         (sizeSmallerThanCapacity :: Cache SmallInt String -> Bool)
324 |     , testProperty "historic" historic
325 |     ]
326 | \end{minted}
327 | The last thing we need is a \texttt{main} function for \texttt{cabal test} to invoke. I usually put this in something like \texttt{tests/Main.hs}. If you use the scheme which I described above, this file should look very neat:
328 | 
329 | \begin{minted}{haskell}
330 | module Main where
331 | 
332 | import           Test.Tasty (defaultMain, testGroup)
333 | 
334 | import qualified AcmeCompany.AwesomeProduct.Database.Tests
335 | import qualified AcmeCompany.AwesomeProduct.Importer.Csv.Tests
336 | import qualified AcmeCompany.AwesomeProduct.Importer.Tests
337 | import qualified Data.SimpleLruCache.Tests
338 | 
339 | main :: IO ()
340 | main = defaultMain $ testGroup "Tests"
341 |     [ AcmeCompany.AwesomeProduct.Database.Tests.tests
342 |     , AcmeCompany.AwesomeProduct.Importer.Csv.Tests.tests
343 |     , AcmeCompany.AwesomeProduct.Importer.Tests.tests
344 |     , Data.SimpleLruCache.Tests.tests
345 |     ]
346 | \end{minted}
347 | If you are still hungry for more Haskell testing, I would recommend looking into \href{http://wiki.haskell.org/Haskell_program_coverage}{Haskell program coverage} for mission-critical modules.
348 | 
349 | Special thanks to Alex Sayers, who beat everyone's expectations when he managed to stay sober for just long enough to proofread this blogpost.
350 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
  1 | Attribution-ShareAlike 4.0 International
  2 | 
  3 | =======================================================================
  4 | 
  5 | Creative Commons Corporation ("Creative Commons") is not a law firm and
  6 | does not provide legal services or legal advice. Distribution of
  7 | Creative Commons public licenses does not create a lawyer-client or
  8 | other relationship. Creative Commons makes its licenses and related
  9 | information available on an "as-is" basis. Creative Commons gives no
 10 | warranties regarding its licenses, any material licensed under their
 11 | terms and conditions, or any related information. Creative Commons
 12 | disclaims all liability for damages resulting from their use to the
 13 | fullest extent possible.
 14 | 
 15 | Using Creative Commons Public Licenses
 16 | 
 17 | Creative Commons public licenses provide a standard set of terms and
 18 | conditions that creators and other rights holders may use to share
 19 | original works of authorship and other material subject to copyright
 20 | and certain other rights specified in the public license below. The
 21 | following considerations are for informational purposes only, are not
 22 | exhaustive, and do not form part of our licenses.
 23 | 
 24 |      Considerations for licensors: Our public licenses are
 25 |      intended for use by those authorized to give the public
 26 |      permission to use material in ways otherwise restricted by
 27 |      copyright and certain other rights. Our licenses are
 28 |      irrevocable. Licensors should read and understand the terms
 29 |      and conditions of the license they choose before applying it.
 30 |      Licensors should also secure all rights necessary before
 31 |      applying our licenses so that the public can reuse the
 32 |      material as expected. Licensors should clearly mark any
 33 |      material not subject to the license. This includes other CC-
 34 |      licensed material, or material used under an exception or
 35 |      limitation to copyright. More considerations for licensors:
 36 | 	wiki.creativecommons.org/Considerations_for_licensors
 37 | 
 38 |      Considerations for the public: By using one of our public
 39 |      licenses, a licensor grants the public permission to use the
 40 |      licensed material under specified terms and conditions. If
 41 |      the licensor's permission is not necessary for any reason--for
 42 |      example, because of any applicable exception or limitation to
 43 |      copyright--then that use is not regulated by the license. Our
 44 |      licenses grant only permissions under copyright and certain
 45 |      other rights that a licensor has authority to grant. Use of
 46 |      the licensed material may still be restricted for other
 47 |      reasons, including because others have copyright or other
 48 |      rights in the material. A licensor may make special requests,
 49 |      such as asking that all changes be marked or described.
 50 |      Although not required by our licenses, you are encouraged to
 51 |      respect those requests where reasonable. More_considerations
 52 |      for the public:
 53 | 	wiki.creativecommons.org/Considerations_for_licensees
 54 | 
 55 | =======================================================================
 56 | 
 57 | Creative Commons Attribution-ShareAlike 4.0 International Public
 58 | License
 59 | 
 60 | By exercising the Licensed Rights (defined below), You accept and agree
 61 | to be bound by the terms and conditions of this Creative Commons
 62 | Attribution-ShareAlike 4.0 International Public License ("Public
 63 | License"). To the extent this Public License may be interpreted as a
 64 | contract, You are granted the Licensed Rights in consideration of Your
 65 | acceptance of these terms and conditions, and the Licensor grants You
 66 | such rights in consideration of benefits the Licensor receives from
 67 | making the Licensed Material available under these terms and
 68 | conditions.
 69 | 
 70 | 
 71 | Section 1 -- Definitions.
 72 | 
 73 |   a. Adapted Material means material subject to Copyright and Similar
 74 |      Rights that is derived from or based upon the Licensed Material
 75 |      and in which the Licensed Material is translated, altered,
 76 |      arranged, transformed, or otherwise modified in a manner requiring
 77 |      permission under the Copyright and Similar Rights held by the
 78 |      Licensor. For purposes of this Public License, where the Licensed
 79 |      Material is a musical work, performance, or sound recording,
 80 |      Adapted Material is always produced where the Licensed Material is
 81 |      synched in timed relation with a moving image.
 82 | 
 83 |   b. Adapter's License means the license You apply to Your Copyright
 84 |      and Similar Rights in Your contributions to Adapted Material in
 85 |      accordance with the terms and conditions of this Public License.
 86 | 
 87 |   c. BY-SA Compatible License means a license listed at
 88 |      creativecommons.org/compatiblelicenses, approved by Creative
 89 |      Commons as essentially the equivalent of this Public License.
 90 | 
 91 |   d. Copyright and Similar Rights means copyright and/or similar rights
 92 |      closely related to copyright including, without limitation,
 93 |      performance, broadcast, sound recording, and Sui Generis Database
 94 |      Rights, without regard to how the rights are labeled or
 95 |      categorized. For purposes of this Public License, the rights
 96 |      specified in Section 2(b)(1)-(2) are not Copyright and Similar
 97 |      Rights.
 98 | 
 99 |   e. Effective Technological Measures means those measures that, in the
100 |      absence of proper authority, may not be circumvented under laws
101 |      fulfilling obligations under Article 11 of the WIPO Copyright
102 |      Treaty adopted on December 20, 1996, and/or similar international
103 |      agreements.
104 | 
105 |   f. Exceptions and Limitations means fair use, fair dealing, and/or
106 |      any other exception or limitation to Copyright and Similar Rights
107 |      that applies to Your use of the Licensed Material.
108 | 
109 |   g. License Elements means the license attributes listed in the name
110 |      of a Creative Commons Public License. The License Elements of this
111 |      Public License are Attribution and ShareAlike.
112 | 
113 |   h. Licensed Material means the artistic or literary work, database,
114 |      or other material to which the Licensor applied this Public
115 |      License.
116 | 
117 |   i. Licensed Rights means the rights granted to You subject to the
118 |      terms and conditions of this Public License, which are limited to
119 |      all Copyright and Similar Rights that apply to Your use of the
120 |      Licensed Material and that the Licensor has authority to license.
121 | 
122 |   j. Licensor means the individual(s) or entity(ies) granting rights
123 |      under this Public License.
124 | 
125 |   k. Share means to provide material to the public by any means or
126 |      process that requires permission under the Licensed Rights, such
127 |      as reproduction, public display, public performance, distribution,
128 |      dissemination, communication, or importation, and to make material
129 |      available to the public including in ways that members of the
130 |      public may access the material from a place and at a time
131 |      individually chosen by them.
132 | 
133 |   l. Sui Generis Database Rights means rights other than copyright
134 |      resulting from Directive 96/9/EC of the European Parliament and of
135 |      the Council of 11 March 1996 on the legal protection of databases,
136 |      as amended and/or succeeded, as well as other essentially
137 |      equivalent rights anywhere in the world.
138 | 
139 |   m. You means the individual or entity exercising the Licensed Rights
140 |      under this Public License. Your has a corresponding meaning.
141 | 
142 | 
143 | Section 2 -- Scope.
144 | 
145 |   a. License grant.
146 | 
147 |        1. Subject to the terms and conditions of this Public License,
148 |           the Licensor hereby grants You a worldwide, royalty-free,
149 |           non-sublicensable, non-exclusive, irrevocable license to
150 |           exercise the Licensed Rights in the Licensed Material to:
151 | 
152 |             a. reproduce and Share the Licensed Material, in whole or
153 |                in part; and
154 | 
155 |             b. produce, reproduce, and Share Adapted Material.
156 | 
157 |        2. Exceptions and Limitations. For the avoidance of doubt, where
158 |           Exceptions and Limitations apply to Your use, this Public
159 |           License does not apply, and You do not need to comply with
160 |           its terms and conditions.
161 | 
162 |        3. Term. The term of this Public License is specified in Section
163 |           6(a).
164 | 
165 |        4. Media and formats; technical modifications allowed. The
166 |           Licensor authorizes You to exercise the Licensed Rights in
167 |           all media and formats whether now known or hereafter created,
168 |           and to make technical modifications necessary to do so. The
169 |           Licensor waives and/or agrees not to assert any right or
170 |           authority to forbid You from making technical modifications
171 |           necessary to exercise the Licensed Rights, including
172 |           technical modifications necessary to circumvent Effective
173 |           Technological Measures. For purposes of this Public License,
174 |           simply making modifications authorized by this Section 2(a)
175 |           (4) never produces Adapted Material.
176 | 
177 |        5. Downstream recipients.
178 | 
179 |             a. Offer from the Licensor -- Licensed Material. Every
180 |                recipient of the Licensed Material automatically
181 |                receives an offer from the Licensor to exercise the
182 |                Licensed Rights under the terms and conditions of this
183 |                Public License.
184 | 
185 |             b. Additional offer from the Licensor -- Adapted Material.
186 |                Every recipient of Adapted Material from You
187 |                automatically receives an offer from the Licensor to
188 |                exercise the Licensed Rights in the Adapted Material
189 |                under the conditions of the Adapter's License You apply.
190 | 
191 |             c. No downstream restrictions. You may not offer or impose
192 |                any additional or different terms or conditions on, or
193 |                apply any Effective Technological Measures to, the
194 |                Licensed Material if doing so restricts exercise of the
195 |                Licensed Rights by any recipient of the Licensed
196 |                Material.
197 | 
198 |        6. No endorsement. Nothing in this Public License constitutes or
199 |           may be construed as permission to assert or imply that You
200 |           are, or that Your use of the Licensed Material is, connected
201 |           with, or sponsored, endorsed, or granted official status by,
202 |           the Licensor or others designated to receive attribution as
203 |           provided in Section 3(a)(1)(A)(i).
204 | 
205 |   b. Other rights.
206 | 
207 |        1. Moral rights, such as the right of integrity, are not
208 |           licensed under this Public License, nor are publicity,
209 |           privacy, and/or other similar personality rights; however, to
210 |           the extent possible, the Licensor waives and/or agrees not to
211 |           assert any such rights held by the Licensor to the limited
212 |           extent necessary to allow You to exercise the Licensed
213 |           Rights, but not otherwise.
214 | 
215 |        2. Patent and trademark rights are not licensed under this
216 |           Public License.
217 | 
218 |        3. To the extent possible, the Licensor waives any right to
219 |           collect royalties from You for the exercise of the Licensed
220 |           Rights, whether directly or through a collecting society
221 |           under any voluntary or waivable statutory or compulsory
222 |           licensing scheme. In all other cases the Licensor expressly
223 |           reserves any right to collect such royalties.
224 | 
225 | 
226 | Section 3 -- License Conditions.
227 | 
228 | Your exercise of the Licensed Rights is expressly made subject to the
229 | following conditions.
230 | 
231 |   a. Attribution.
232 | 
233 |        1. If You Share the Licensed Material (including in modified
234 |           form), You must:
235 | 
236 |             a. retain the following if it is supplied by the Licensor
237 |                with the Licensed Material:
238 | 
239 |                  i. identification of the creator(s) of the Licensed
240 |                     Material and any others designated to receive
241 |                     attribution, in any reasonable manner requested by
242 |                     the Licensor (including by pseudonym if
243 |                     designated);
244 | 
245 |                 ii. a copyright notice;
246 | 
247 |                iii. a notice that refers to this Public License;
248 | 
249 |                 iv. a notice that refers to the disclaimer of
250 |                     warranties;
251 | 
252 |                  v. a URI or hyperlink to the Licensed Material to the
253 |                     extent reasonably practicable;
254 | 
255 |             b. indicate if You modified the Licensed Material and
256 |                retain an indication of any previous modifications; and
257 | 
258 |             c. indicate the Licensed Material is licensed under this
259 |                Public License, and include the text of, or the URI or
260 |                hyperlink to, this Public License.
261 | 
262 |        2. You may satisfy the conditions in Section 3(a)(1) in any
263 |           reasonable manner based on the medium, means, and context in
264 |           which You Share the Licensed Material. For example, it may be
265 |           reasonable to satisfy the conditions by providing a URI or
266 |           hyperlink to a resource that includes the required
267 |           information.
268 | 
269 |        3. If requested by the Licensor, You must remove any of the
270 |           information required by Section 3(a)(1)(A) to the extent
271 |           reasonably practicable.
272 | 
273 |   b. ShareAlike.
274 | 
275 |      In addition to the conditions in Section 3(a), if You Share
276 |      Adapted Material You produce, the following conditions also apply.
277 | 
278 |        1. The Adapter's License You apply must be a Creative Commons
279 |           license with the same License Elements, this version or
280 |           later, or a BY-SA Compatible License.
281 | 
282 |        2. You must include the text of, or the URI or hyperlink to, the
283 |           Adapter's License You apply. You may satisfy this condition
284 |           in any reasonable manner based on the medium, means, and
285 |           context in which You Share Adapted Material.
286 | 
287 |        3. You may not offer or impose any additional or different terms
288 |           or conditions on, or apply any Effective Technological
289 |           Measures to, Adapted Material that restrict exercise of the
290 |           rights granted under the Adapter's License You apply.
291 | 
292 | 
293 | Section 4 -- Sui Generis Database Rights.
294 | 
295 | Where the Licensed Rights include Sui Generis Database Rights that
296 | apply to Your use of the Licensed Material:
297 | 
298 |   a. for the avoidance of doubt, Section 2(a)(1) grants You the right
299 |      to extract, reuse, reproduce, and Share all or a substantial
300 |      portion of the contents of the database;
301 | 
302 |   b. if You include all or a substantial portion of the database
303 |      contents in a database in which You have Sui Generis Database
304 |      Rights, then the database in which You have Sui Generis Database
305 |      Rights (but not its individual contents) is Adapted Material,
306 | 
307 |      including for purposes of Section 3(b); and
308 |   c. You must comply with the conditions in Section 3(a) if You Share
309 |      all or a substantial portion of the contents of the database.
310 | 
311 | For the avoidance of doubt, this Section 4 supplements and does not
312 | replace Your obligations under this Public License where the Licensed
313 | Rights include other Copyright and Similar Rights.
314 | 
315 | 
316 | Section 5 -- Disclaimer of Warranties and Limitation of Liability.
317 | 
318 |   a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
319 |      EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
320 |      AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
321 |      ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
322 |      IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
323 |      WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
324 |      PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
325 |      ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
326 |      KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
327 |      ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
328 | 
329 |   b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
330 |      TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
331 |      NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
332 |      INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
333 |      COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
334 |      USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
335 |      ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
336 |      DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
337 |      IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
338 | 
339 |   c. The disclaimer of warranties and limitation of liability provided
340 |      above shall be interpreted in a manner that, to the extent
341 |      possible, most closely approximates an absolute disclaimer and
342 |      waiver of all liability.
343 | 
344 | 
345 | Section 6 -- Term and Termination.
346 | 
347 |   a. This Public License applies for the term of the Copyright and
348 |      Similar Rights licensed here. However, if You fail to comply with
349 |      this Public License, then Your rights under this Public License
350 |      terminate automatically.
351 | 
352 |   b. Where Your right to use the Licensed Material has terminated under
353 |      Section 6(a), it reinstates:
354 | 
355 |        1. automatically as of the date the violation is cured, provided
356 |           it is cured within 30 days of Your discovery of the
357 |           violation; or
358 | 
359 |        2. upon express reinstatement by the Licensor.
360 | 
361 |      For the avoidance of doubt, this Section 6(b) does not affect any
362 |      right the Licensor may have to seek remedies for Your violations
363 |      of this Public License.
364 | 
365 |   c. For the avoidance of doubt, the Licensor may also offer the
366 |      Licensed Material under separate terms or conditions or stop
367 |      distributing the Licensed Material at any time; however, doing so
368 |      will not terminate this Public License.
369 | 
370 |   d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
371 |      License.
372 | 
373 | 
374 | Section 7 -- Other Terms and Conditions.
375 | 
376 |   a. The Licensor shall not be bound by any additional or different
377 |      terms or conditions communicated by You unless expressly agreed.
378 | 
379 |   b. Any arrangements, understandings, or agreements regarding the
380 |      Licensed Material not stated herein are separate from and
381 |      independent of the terms and conditions of this Public License.
382 | 
383 | 
384 | Section 8 -- Interpretation.
385 | 
386 |   a. For the avoidance of doubt, this Public License does not, and
387 |      shall not be interpreted to, reduce, limit, restrict, or impose
388 |      conditions on any use of the Licensed Material that could lawfully
389 |      be made without permission under this Public License.
390 | 
391 |   b. To the extent possible, if any provision of this Public License is
392 |      deemed unenforceable, it shall be automatically reformed to the
393 |      minimum extent necessary to make it enforceable. If the provision
394 |      cannot be reformed, it shall be severed from this Public License
395 |      without affecting the enforceability of the remaining terms and
396 |      conditions.
397 | 
398 |   c. No term or condition of this Public License will be waived and no
399 |      failure to comply consented to unless expressly agreed to by the
400 |      Licensor.
401 | 
402 |   d. Nothing in this Public License constitutes or may be interpreted
403 |      as a limitation upon, or waiver of, any privileges and immunities
404 |      that apply to the Licensor or You, including from the legal
405 |      processes of any jurisdiction or authority.
406 | 
407 | 
408 | =======================================================================
409 | 
410 | Creative Commons is not a party to its public
411 | licenses. Notwithstanding, Creative Commons may elect to apply one of
412 | its public licenses to material it publishes and in those instances
413 | will be considered the “Licensor.” The text of the Creative Commons
414 | public licenses is dedicated to the public domain under the CC0 Public
415 | Domain Dedication. Except for the limited purpose of indicating that
416 | material is shared under a Creative Commons public license or as
417 | otherwise permitted by the Creative Commons policies published at
418 | creativecommons.org/policies, Creative Commons does not authorize the
419 | use of the trademark "Creative Commons" or any other trademark or logo
420 | of Creative Commons without its prior written consent including,
421 | without limitation, in connection with any unauthorized modifications
422 | to any of its public licenses or any other arrangements,
423 | understandings, or agreements concerning use of licensed material. For
424 | the avoidance of doubt, this paragraph does not form part of the
425 | public licenses.
426 | 
427 | Creative Commons may be contacted at creativecommons.org.
428 | 


--------------------------------------------------------------------------------
/alexis_king3.tex:
--------------------------------------------------------------------------------
  1 | \chapter{Unit testing effectful Haskell with monad-mock}
  2 | 
  3 | \begin{quotation}
  4 | \noindent\textit{\textbf{William Yao:}}
  5 | 
  6 | \textit{Introduces a way of doing more “traditional” unit testing, but focused more on doing white-box testing, checking whether the code under test performed certain operations, rather than just asserting on the output. Since this is Haskell, doing that is a little bit more unusual.}
  7 | 
  8 | \textit{Personally, I'm not a fan of writing tests like this because they feel like they couple the tests to the implementation too tightly, and calcify design decisions too quickly. Still, if there's something that's legitimately too difficult to sandbox for your testing environment, it's a useful pattern to be aware of.}
  9 | 
 10 | \vspace{\baselineskip}
 11 | 
 12 | \noindent\textit{Original article: \cite{unit_testing_monad_mock}}
 13 | \end{quotation}
 14 | 
 15 | 
 16 | \noindent Nearly eight months ago (see chapter \ref{sec:using_types_to_unit_test}),
 17 | I wrote a blog post about unit testing effectful Haskell code using a
 18 | library called test-fixture. That library has served us well, but it
 19 | wasn't as easy to use as I would have liked, and it worked better with
 20 | certain patterns than others. Since then, I've learned more about
 21 | Haskell and more about testing, and I'm pleased to announce that I am
 22 | releasing an entirely new testing library,
 23 | \href{https://hackage.haskell.org/package/monad-mock}{monad-mock}.
 24 | 
 25 | \section{A first glance at
 26 | monad-mock}\label{a-first-glance-at-monad-mock}
 27 | 
 28 | The monad-mock library is, first and foremost, designed to be
 29 | \emph{easy}. It doesn't ask much from you, and it requires almost zero
 30 | boilerplate.
 31 | 
 32 | The first step is to write an mtl-style interface that encodes an effect
 33 | you want to mock. For example, you might want to test some code that
 34 | interacts with the filesystem:
 35 | 
 36 | \begin{minted}{haskell}
 37 | class Monad m => MonadFileSystem m where
 38 |   readFile :: FilePath -> m String
 39 |   writeFile :: FilePath -> String -> m ()
 40 | \end{minted}
 41 | Now you just have to write your code as normal. For demonstration
 42 | purposes, here's a function that defines copying a file in terms of
 43 | \texttt{readFile} and \texttt{writeFile}:
 44 | 
 45 | \begin{minted}{haskell}
 46 | copyFile :: MonadFileSystem m => FilePath -> FilePath -> m ()
 47 | copyFile a b = do
 48 |   contents <- readFile a
 49 |   writeFile b contents
 50 | \end{minted}
 51 | Making this function work on the real filesystem is trivial, since we
 52 | just need to define an instance of \texttt{MonadFileSystem} for
 53 | \texttt{IO}:
 54 | 
 55 | \begin{minted}{haskell}
 56 | instance MonadFileSystem IO where
 57 |   readFile = Prelude.readFile
 58 |   writeFile = Prelude.writeFile
 59 | \end{minted}
 60 | But how do we test this? Well, we \emph{could} run some real code in
 61 | \texttt{IO}, which might not be so bad for such a simple function, but
 62 | this seems like a bad idea. For one thing, a bad implementation of
 63 | \texttt{copyFile} could do some pretty horrible things if it misbehaved
 64 | and decided to overwrite important files, and if you're constantly
 65 | running a test suite whenever a file changes, it's easy to imagine
 66 | causing a lot of damage. Running tests against the real filesystem also
 67 | makes tests slower and harder to parallelize, and it only gets much
 68 | worse once you are doing more complex effects than interacting with the
 69 | filesystem.
 70 | 
 71 | Using monad-mock, we can test this function in just a couple of lines of
 72 | code:
 73 | 
 74 | \begin{minted}{haskell}
 75 | import Control.Exception (evaluate)
 76 | import Control.Monad.Mock
 77 | import Control.Monad.Mock.TH
 78 | import Data.Function ((&))
 79 | import Test.Hspec
 80 | 
 81 | makeMock "FileSystemAction" [ts| MonadFileSystem |]
 82 | 
 83 | spec = describe "copyFile" $
 84 |   it "reads a file and writes its contents to another file" $
 85 |     evaluate $ copyFile "foo.txt" "bar.txt"
 86 |       & runMock [ ReadFile "foo.txt" :-> "contents"
 87 |                 , WriteFile "bar.txt" "contents" :-> () ]
 88 | \end{minted}
 89 | That's it!
 90 | The last two lines of the above snippet are the real interesting bits,
 91 | which specify the actions that are expected to be executed and
 92 | couple them with their results. You will find that if you tweak the
 93 | list in any way, such as reordering the actions, eliminating one or both
 94 | of them, or adding an additional action to the end, the test will fail.
 95 | We could even turn this into a property-based test that generated
 96 | arbitrary file paths and file contents.
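
To make the idea of ``expected actions coupled with their results'' concrete,
here is a minimal, self-contained sketch of a scripted interpreter for
\texttt{MonadFileSystem}. It is \emph{not} how monad-mock is implemented: the
names \texttt{Step}, \texttt{Scripted}, and \texttt{runScript} are hypothetical
and exist only for this illustration, and the sketch depends on nothing but
\texttt{base}.

\begin{minted}{haskell}
module Main where

import Prelude hiding (readFile, writeFile)

class Monad m => MonadFileSystem m where
  readFile :: FilePath -> m String
  writeFile :: FilePath -> String -> m ()

-- One scripted step: the call we expect, paired with the result to hand back.
-- (Hypothetical type, for illustration only.)
data Step
  = ExpectRead FilePath String
  | ExpectWrite FilePath String
  deriving (Eq, Show)

-- A tiny state-and-failure monad threaded over the remaining script.
newtype Scripted a = Scripted
  { runScripted :: [Step] -> Either String (a, [Step]) }

instance Functor Scripted where
  fmap f (Scripted g) = Scripted $ \s -> fmap (\(a, s') -> (f a, s')) (g s)

instance Applicative Scripted where
  pure a = Scripted $ \s -> Right (a, s)
  Scripted f <*> Scripted g = Scripted $ \s -> do
    (h, s') <- f s
    (a, s'') <- g s'
    Right (h a, s'')

instance Monad Scripted where
  Scripted g >>= k = Scripted $ \s -> do
    (a, s') <- g s
    runScripted (k a) s'

-- Each operation succeeds only if it matches the next step in the script.
instance MonadFileSystem Scripted where
  readFile path = Scripted $ \s -> case s of
    ExpectRead p result : rest | p == path -> Right (result, rest)
    _ -> Left ("unexpected readFile " ++ show path)
  writeFile path contents = Scripted $ \s -> case s of
    ExpectWrite p c : rest | p == path && c == contents -> Right ((), rest)
    _ -> Left ("unexpected writeFile " ++ show path)

-- Run a computation against a script; leftover steps count as a failure too.
runScript :: [Step] -> Scripted a -> Either String a
runScript script m = case runScripted m script of
  Left err -> Left err
  Right (a, []) -> Right a
  Right (_, rest) -> Left ("unexecuted steps: " ++ show rest)

copyFile :: MonadFileSystem m => FilePath -> FilePath -> m ()
copyFile a b = readFile a >>= writeFile b

main :: IO ()
main = do
  -- The full script matches, so this is a Right value.
  print (runScript [ ExpectRead "foo.txt" "contents"
                   , ExpectWrite "bar.txt" "contents" ]
                   (copyFile "foo.txt" "bar.txt"))
  -- The write is missing from the script, so this is a Left value.
  print (runScript [ ExpectRead "foo.txt" "contents" ]
                   (copyFile "foo.txt" "bar.txt"))
\end{minted}
Reordering the steps, dropping one, or appending an extra one makes
\texttt{runScript} return a \texttt{Left}, which mirrors the failure behavior
described above.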
 97 | 
 98 | Admittedly, in this trivial example, the mock is a little silly, since
 99 | converting this into a property-based test would demonstrate how much
100 | we've basically just reimplemented the function in our test. However,
101 | once our function starts to do somewhat more complicated things, then
102 | our tests become more meaningful. Here's a similar function that only
103 | copies a file if it is nonempty:
104 | 
105 | \begin{minted}{haskell}
106 | copyNonemptyFile :: MonadFileSystem m => FilePath -> FilePath -> m ()
107 | copyNonemptyFile a b = do
108 |   contents <- readFile a
109 |   unless (null contents) $
110 |     writeFile b contents
111 | \end{minted}
112 | This function has some logic which is very clearly \emph{not} expressed
113 | in its type, and it would be difficult to encode that information into
114 | the type in a safe way. Fortunately, we can guarantee that it works by
115 | writing some tests:
116 | 
117 | \begin{minted}{haskell}
118 | describe "copyNonemptyFile" $ do
119 |   it "copies a file with contents" $
120 |     evaluate $ copyNonemptyFile "foo.txt" "bar.txt"
121 |       & runMock [ ReadFile "foo.txt" :-> "contents"
122 |                 , WriteFile "bar.txt" "contents" :-> () ]
123 | 
124 |   it "does nothing with an empty file" $
125 |     evaluate $ copyNonemptyFile "foo.txt" "bar.txt"
126 |       & runMock [ ReadFile "foo.txt" :-> "" ]
127 | \end{minted}
128 | These tests are much more useful, and they have some actual value to
129 | them. Imagine we had accidentally written \texttt{when} instead of
130 | \texttt{unless}, an easy typo to make. Our tests would fail with some
131 | useful error messages:
132 | 
133 | \begin{minted}{text}
134 | 1) copyNonemptyFile copies a file with contents
135 |      uncaught exception: runMockT: expected the following unexecuted actions to be run:
136 |        WriteFile "bar.txt" "contents"
137 | 
138 | 2) copyNonemptyFile does nothing with an empty file
139 |      uncaught exception: runMockT: expected end of program, called writeFile
140 |        given action: WriteFile "bar.txt" ""
141 | \end{minted}
142 | You now know enough to write tests with monad-mock.
143 | 
144 | \hypertarget{why-unit-test}{%
145 | \section{Why unit test?}\label{why-unit-test}}
146 | 
147 | When the issue of testing is brought up in Haskell, it is often treated
148 | with a certain distaste by a portion of the community. There are some
149 | points I've seen a number of times, and though they take different
150 | forms, they boil down to two ideas:
151 | 
152 | \begin{enumerate}
153 | \item
154 |   ``Haskell code does not need tests because the type system can prove
155 |   correctness.''
156 | \item
157 |   ``Testing in Haskell is trivial because it is a pure language, and
158 |   testing pure functions is easy.''
159 | \end{enumerate}
160 | I've been writing Haskell professionally for over a year now, and I can
161 | happily say that there \emph{is} some truth to both of those things!
162 | When my Haskell code typechecks, I feel a confidence in it that I would
163 | not feel were I using a language with a less powerful type system.
164 | Furthermore, Haskell encourages a ``pure core, impure shell'' approach
165 | to system design that makes testing many things pleasant and
166 | straightforward, and it completely eliminates the worry of subtle
167 | nondeterminism leaking into tests.
168 | 
169 | That said, Haskell is not a proof assistant, and its type system cannot
170 | guarantee everything, especially for code that operates on the
171 | boundaries of what Haskell can control. For much the same reason, I find
172 | that my pure code is the code I am \emph{least} likely to need to test,
173 | since it is also the code with the strongest type safety guarantees,
174 | operating on types in my application's domain. In contrast, the
175 | effectful code is often what I find the most value in extensively
176 | testing, since it often contains the most subtle complexity, and it is
177 | frequently difficult or even impossible to encode into types.
178 | 
179 | Haskell has the power to provide remarkably strong correctness
180 | guarantees with a surprisingly small amount of effort by using a
181 | combination of tests and types, using each to compensate for the
182 | other's weaknesses and playing to each technique's strengths. Some code
183 | is test-driven, other code is type-driven. Most code ends up being a mix
184 | of both. Testing is just a tool like any other, and it's nice to feel
185 | confident in one's ability to effectively structure code in a decoupled,
186 | testable manner.
187 | 
188 | \hypertarget{why-mock}{%
189 | \section{Why mock?}\label{why-mock}}
190 | 
191 | Even if you accept that testing is good, the question of whether or not
192 | to \emph{mock} is a subtler issue. To some people, ``unit testing'' is
193 | synonymous with mocks. This is emphatically not true, and in fact,
194 | overly aggressive mocking is one of the best ways to make your test
195 | suite completely worthless. The monad-mock approach to mocking is a bit
196 | more principled than mocking in many dynamic, object-oriented languages,
197 | but it comes with many of the same drawbacks: mocks couple your tests to
198 | your implementation in ways that make them less valuable and less
199 | meaningful.
200 | 
201 | For the \texttt{MonadFileSystem} example above, I would actually
202 | probably \emph{not} use a mock. Instead, I would use a \textbf{fake},
203 | in-memory filesystem implementation:
204 | 
205 | \begin{minted}{haskell}
206 | newtype FakeFileSystemT m a = FakeFileSystemT (StateT [(FilePath, String)] m a)
207 |   deriving (Functor, Applicative, Monad)
208 | 
209 | fakeFileSystemT :: Monad m => [(FilePath, String)]
210 |                 -> FakeFileSystemT m a -> m (a, [(FilePath, String)])
211 | fakeFileSystemT fs (FakeFileSystemT x) = second sort <$> runStateT x fs
212 | 
213 | instance Monad m => MonadFileSystem (FakeFileSystemT m) where
214 |   readFile path = FakeFileSystemT $ get >>= \fs -> lookup path fs &
215 |     maybe (error $ "readFile: no such file ‘" ++ path ++ "’") return
216 |   writeFile path contents = FakeFileSystemT . modify $ \fs ->
217 |     (path, contents) : filter ((/= path) . fst) fs
218 | \end{minted}
219 | The above snippet demonstrates how easy it is to define a
220 | \texttt{MonadFileSystem} implementation in terms of \texttt{StateT}, and
221 | while this may seem like a lot of boilerplate, it really isn't. You have
222 | to write a fake \emph{once} per interface, and the above block is a
223 | minuscule twelve lines of code. With this technique, you are still able
224 | to write tests that depend on the state of the filesystem before and
225 | after running the implementation, but you decouple yourself from the
226 | precise process of getting there:
227 | 
228 | \begin{minted}{haskell}
229 | describe "copyNonemptyFile" $ do
230 |   it "copies a file with contents" $ do
231 |     let ((), fs) = runIdentity $ copyNonemptyFile "foo.txt" "bar.txt"
232 |           & fakeFileSystemT [ ("foo.txt", "contents") ]
233 |     fs `shouldBe` [ ("bar.txt", "contents"), ("foo.txt", "contents") ]
234 | 
235 |   it "does nothing with an empty file" $ do
236 |     let ((), fs) = runIdentity $ copyNonemptyFile "foo.txt" "bar.txt"
237 |           & fakeFileSystemT [ ("foo.txt", "") ]
238 |     fs `shouldBe` [ ("foo.txt", "") ]
239 | \end{minted}
240 | This is better than using a mock, and I would highly recommend doing it
241 | if you can! However, a lot of real applications have to interact with
242 | services of much greater complexity than an idealized filesystem, and
243 | creating that sort of in-memory fake is not always practical. One such
244 | situation might be interacting with AWS CloudFormation, for example:
245 | 
246 | \begin{minted}{haskell}
247 | class Monad m => MonadAWS m where
248 |   createStack :: StackName -> StackTemplate -> m (Either AWSError StackId)
249 |   listStacks :: m (Either AWSError [StackSummaries])
250 |   describeStack :: StackId -> m (Either AWSError StackInfo)
251 |   -- and so on...
252 | \end{minted}
253 | AWS is a very complex system, and it can do dozens of different things
254 | (and fail in dozens of different ways) based on an equally complex set
255 | of inputs. For example, in the above API, \texttt{createStack} needs to
256 | parse its template, which can be YAML or JSON, in order to determine
257 | which of many possible errors and behaviors can be produced, both on the
258 | initial call and on subsequent ones.
259 | 
260 | Creating a fake implementation of \emph{AWS} is hardly feasible, and
261 | this is where a mock can be useful. By simply writing
262 | \texttt{makeMock\ "AWSAction"\ {[}ts\textbar{}\ MonadAWS\ \textbar{}{]}},
263 | we can test functions that interact with AWS in a pure way without
264 | necessarily needing to replicate all of its complexity.
265 | 
266 | \hypertarget{isolating-mocks}{%
267 | \subsection{Isolating mocks}\label{isolating-mocks}}
268 | 
269 | Of course, tests that use mocks provide less value than tests that use
270 | ``smarter'' fakes, since they are far more tightly coupled to the
271 | implementation, and it's dramatically more likely that you will need to
272 | change the tests when you change the logic. To avoid this, it can be
273 | helpful to create multiple interfaces to the same thing: a high-level
274 | interface and a low-level one. If our above \texttt{MonadAWS} is a
275 | low-level interface, we could create a high-level counterpart that does
276 | precisely what our application needs:
277 | 
278 | \begin{minted}{haskell}
279 | class Monad m => MonadDeploy m where
280 |   executeDeployment :: Deployment -> m (Either DeployError ())
281 | \end{minted}
282 | When running our application ``for real'', we would use
283 | \texttt{MonadAWS} to implement \texttt{MonadDeploy}:
284 | 
285 | \begin{minted}{haskell}
286 | executeDeploymentImpl :: MonadAWS m => Deployment -> m (Either DeployError ())
287 | executeDeploymentImpl = ...
288 | \end{minted}
289 | The nice thing about this is we can actually test
290 | \texttt{executeDeploymentImpl} using a \texttt{MonadAWS} mock, so we can
291 | still have unit test coverage of the code on the boundaries of our
292 | system! Additionally, by containing the mock to a single place, we can
293 | test the rest of our code using a smarter fake implementation of
294 | \texttt{MonadDeploy}, helping to decouple our code from AWS's complex
295 | API and improve the reliability and usefulness of our test suite.
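
As a rough illustration of what such a ``smarter'' fake might look like, the sketch below implements \texttt{MonadDeploy} on top of a pure \texttt{State} monad. The \texttt{Deployment} and \texttt{DeployError} definitions, the \texttt{FakeDeploy} type, and the rule that deploying the same thing twice is an error are all assumptions invented for this example; they are not part of monad-mock or of any real AWS binding.

\begin{minted}{haskell}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Monad.Trans.State.Strict (State, gets, modify, runState)

-- Hypothetical stand-ins for the application's real types.
data Deployment = Deployment { deploymentName :: String }
  deriving (Eq, Show)

data DeployError = AlreadyDeployed String
  deriving (Eq, Show)

class Monad m => MonadDeploy m where
  executeDeployment :: Deployment -> m (Either DeployError ())

-- A fake that only remembers which deployments it has seen so far.
newtype FakeDeploy a = FakeDeploy (State [Deployment] a)
  deriving (Functor, Applicative, Monad)

instance MonadDeploy FakeDeploy where
  executeDeployment d = FakeDeploy $ do
    seen <- gets (elem d)
    if seen
      then pure (Left (AlreadyDeployed (deploymentName d)))
      else Right <$> modify (d :)

-- Run the fake from an empty state; also return what was deployed.
runFakeDeploy :: FakeDeploy a -> (a, [Deployment])
runFakeDeploy (FakeDeploy m) = runState m []

-- Application logic written only against the high-level interface.
deployAll :: MonadDeploy m => [Deployment] -> m [Either DeployError ()]
deployAll = mapM executeDeployment
\end{minted}
Code such as \texttt{deployAll} can then be exercised with \texttt{runFakeDeploy} and ordinary equality assertions, with no mock expectations involved.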
296 | 
297 | The key point here is that mocking is just a small piece of the larger
298 | testing puzzle in \emph{any} language, and that is just as true in
299 | Haskell. An overemphasis on mocking is an easy way to end up with a test
300 | suite that feels useless, probably because it is. Use mocks as a
301 | technique to insulate your application from the complexity in others'
302 | APIs, then use more domain-specific testing techniques and type-level
303 | assertions to ensure the correctness of your logic.
304 | 
305 | \hypertarget{how-monad-mock-works}{%
306 | \section{How monad-mock works}\label{how-monad-mock-works}}
307 | 
308 | If you've read this far and are convinced that monad-mock is useful, you
309 | may safely stop reading now. However, if you are interested in the
310 | details of what it actually does and what makes it tick, the rest of
311 | this blog post is going to focus on how the implementation works and how
312 | it compares to other techniques.
313 | 
314 | The centerpiece of monad-mock's API is its monad transformer,
315 | \texttt{MockT}, which is a type constructor that accepts three type arguments:
316 | 
317 | \begin{minted}{haskell}
318 | newtype MockT (f :: * -> *) (m :: * -> *) (a :: *)
319 | \end{minted}
320 | The \texttt{m} and \texttt{a} type variables obviously correspond to the
321 | usual monad transformer arguments, which represent the underlying monad
322 | and the result of the monadic computation, respectively. The \texttt{f}
323 | variable is more interesting, since it's what makes \texttt{MockT} work
324 | at all, and it isn't even a type: it's a type constructor with kind
325 | \texttt{*\ -\textgreater{}\ *}. What does it mean?
326 | 
327 | Looking at the type signature of \texttt{runMockT} gives us a little bit
328 | more information about what that \texttt{f} actually represents:
329 | 
330 | \begin{minted}{haskell}
331 | runMockT :: (Action f, Monad m) => [WithResult f] -> MockT f m a -> m a
332 | \end{minted}
333 | This type signature provides two pieces of key information:
334 | 
335 | \begin{enumerate}
336 | \item
337 |   The \texttt{f} parameter is constrained by the \texttt{Action\ f}
338 |   constraint.
339 | \item
340 |   Running a mocked computation requires supplying a list of
341 |   \texttt{WithResult\ f} values. This list corresponds to the list of
342 |   expectations provided to \texttt{runMock} in earlier examples.
343 | \end{enumerate}
344 | To understand both of these things, it helps to examine the definition
345 | of an actual datatype that can have an \texttt{Action} instance. For the
346 | filesystem example, the action datatype looks like this:
347 | 
348 | \begin{minted}{haskell}
349 | data FileSystemAction r where
350 |   ReadFile :: FilePath -> FileSystemAction String
351 |   WriteFile :: FilePath -> String -> FileSystemAction ()
352 | \end{minted}
353 | Notice how each constructor clearly corresponds to one of the methods of
354 | \texttt{MonadFileSystem}, with a type to match. Now the purpose of the
355 | type provided to the \texttt{FileSystemAction} constructor (in this case
356 | \texttt{r}) should hopefully become clear: it represents the type of the
357 | value \emph{produced} by each method. Also note that the type is
358 | completely phantom---it does not appear in negative position in any of
359 | the constructors.
360 | 
361 | With this in mind, we can take a look at the definition of
362 | \texttt{WithResult}:
363 | 
364 | \begin{minted}{haskell}
365 | data WithResult f where
366 |   (:->) :: f r -> r -> WithResult f
367 | \end{minted}
368 | This is what defines the \texttt{(:-\textgreater{})} constructor from
369 | earlier in the blog post, and you can see that it effectively just
370 | represents a tuple of an action and a value of its associated result.
371 | It's completely type-safe, since it ensures the result matches the type
372 | argument to the action.
373 | 
374 | Finally, this brings us to the \texttt{Action} class, which is not
375 | complex, but is unfortunately necessary:
376 | 
377 | \begin{minted}{haskell}
378 | class Action f where
379 |   eqAction :: f a -> f b -> Maybe (a :~: b)
380 |   showAction :: f a -> String
381 | \end{minted}
382 | Notice that these methods are effectively just \texttt{(==)} and
383 | \texttt{show}, lifted to type constructors of kind
384 | \texttt{*\ -\textgreater{}\ *}. One significant difference is that
385 | \texttt{eqAction} produces \texttt{Maybe\ (a\ :\textasciitilde{}:\ b)}
386 | instead of \texttt{Bool}, where \texttt{(:\textasciitilde{}:)} is from
387 | \texttt{Data.Type.Equality}. This is a type equality witness, which
388 | means a successful equality between two values allows the compiler to be
389 | sure that the two \emph{types} are equal. This is necessary for the
390 | implementation of \texttt{runMockT} due to the phantom type in
391 | actions---in order to convince GHC that we can properly return the
392 | result of a mocked action, we need to assure it that the value we're
393 | going to return is actually of the proper type.
394 | 
395 | Implementing this typeclass is not particularly burdensome, but it's
396 | entirely boilerplate, so even if you want to define your own action type
397 | (that is, you don't want to use \texttt{makeMock}), you can use the
398 | \texttt{deriveAction} function from \texttt{Control.Monad.Mock.TH} to
399 | derive an \texttt{Action} instance on an existing datatype.
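
For reference, a hand-written instance for the filesystem action might look roughly like the sketch below. The \texttt{Action} class is re-declared locally so that the snippet stands alone; with the real library the class comes from monad-mock itself, and the instance that \texttt{deriveAction} generates may differ in its details.

\begin{minted}{haskell}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE TypeOperators #-}

import Data.Type.Equality ((:~:) (Refl))

-- Local copy of the class shown above, for self-containment.
class Action f where
  eqAction :: f a -> f b -> Maybe (a :~: b)
  showAction :: f a -> String

data FileSystemAction r where
  ReadFile :: FilePath -> FileSystemAction String
  WriteFile :: FilePath -> String -> FileSystemAction ()

instance Action FileSystemAction where
  -- Same constructor and equal arguments: both result types are
  -- known to coincide, so Refl type-checks.
  eqAction (ReadFile a) (ReadFile b)
    | a == b = Just Refl
  eqAction (WriteFile a x) (WriteFile b y)
    | a == b && x == y = Just Refl
  -- Different constructors, or a failed guard above.
  eqAction _ _ = Nothing

  showAction (ReadFile a) = "ReadFile " ++ show a
  showAction (WriteFile a x) = "WriteFile " ++ show a ++ " " ++ show x
\end{minted}
Note that \texttt{Refl} is only accepted in the branches where both constructors fix the same result type, which is exactly the guarantee \texttt{runMockT} relies on.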
400 | 
401 | \hypertarget{connecting-the-mock-to-its-class}{%
402 | \subsection{Connecting the mock to its
403 | class}\label{connecting-the-mock-to-its-class}}
404 | 
405 | Now that we have an action with which to mock a class, we need to
406 | actually define an instance of that class for \texttt{MockT}. For this
407 | process, monad-mock provides a \texttt{mockAction} function with the
408 | following type:
409 | 
410 | \begin{minted}{haskell}
411 | mockAction :: (Action f, Monad m) => String -> f r -> MockT f m r
412 | \end{minted}
413 | This function accepts two arguments: the name of the method being mocked
414 | and the action that represents the current call. This is easier to
415 | illustrate with an actual instance of \texttt{MonadFileSystem} using
416 | \texttt{MockT} and our \texttt{FileSystemAction} type:
417 | 
418 | \begin{minted}{haskell}
419 | instance Monad m => MonadFileSystem (MockT FileSystemAction m) where
420 |   readFile a = mockAction "readFile" (ReadFile a)
421 |   writeFile a b = mockAction "writeFile" (WriteFile a b)
422 | \end{minted}
423 | This allows \texttt{readFile} and \texttt{writeFile} to defer to the
424 | mock, and providing the names of the functions as strings helps
425 | monad-mock to produce useful error messages upon failure. Internally,
426 | \texttt{MockT} is a \texttt{StateT} that keeps track of a list of
427 | \texttt{WithResult\ f} values as its state. Each call to the mock checks
428 | the action against the internal list of calls, and if they match, it
429 | returns the associated result. Otherwise, it throws an exception.
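
The mechanism described above can be sketched in a few dozen lines. What follows is a deliberately simplified model rather than monad-mock's actual source: \texttt{SimpleMock} is not a transformer, failures are reported through \texttt{Either String} instead of exceptions, and the names \texttt{SimpleMock}, \texttt{simpleMockAction}, \texttt{runSimpleMock} and \texttt{copyFile} are chosen just for this example. The definitions of \texttt{Action}, \texttt{WithResult} and \texttt{FileSystemAction} are repeated so the snippet is self-contained.

\begin{minted}{haskell}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE TypeOperators #-}

import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT, get, put, runStateT)
import Data.Type.Equality ((:~:) (Refl))

class Action f where
  eqAction :: f a -> f b -> Maybe (a :~: b)
  showAction :: f a -> String

data WithResult f where
  (:->) :: f r -> r -> WithResult f

data FileSystemAction r where
  ReadFile :: FilePath -> FileSystemAction String
  WriteFile :: FilePath -> String -> FileSystemAction ()

instance Action FileSystemAction where
  eqAction (ReadFile a) (ReadFile b) | a == b = Just Refl
  eqAction (WriteFile a x) (WriteFile b y) | a == b && x == y = Just Refl
  eqAction _ _ = Nothing
  showAction (ReadFile a) = "ReadFile " ++ show a
  showAction (WriteFile a x) = "WriteFile " ++ show a ++ " " ++ show x

-- State: the expectations not yet consumed. Left: a failure message.
newtype SimpleMock f a = SimpleMock (StateT [WithResult f] (Either String) a)
  deriving (Functor, Applicative, Monad)

simpleMockAction :: Action f => String -> f r -> SimpleMock f r
simpleMockAction name action = SimpleMock $ do
  expectations <- get
  case expectations of
    [] -> lift (Left ("unexpected call to " ++ name ++ ": " ++ showAction action))
    (expected :-> result) : rest ->
      case eqAction action expected of
        -- Matching on Refl proves the stored result has type r.
        Just Refl -> put rest >> pure result
        Nothing ->
          lift (Left (name ++ ": expected " ++ showAction expected
                        ++ ", got " ++ showAction action))

-- In this sketch, leftover expectations also count as a failure.
runSimpleMock :: [WithResult f] -> SimpleMock f a -> Either String a
runSimpleMock expectations (SimpleMock m) =
  case runStateT m expectations of
    Left err -> Left err
    Right (x, []) -> Right x
    Right (_, _) -> Left "some expected actions were never performed"

copyFile :: FilePath -> FilePath -> SimpleMock FileSystemAction ()
copyFile from to = do
  contents <- simpleMockAction "readFile" (ReadFile from)
  simpleMockAction "writeFile" (WriteFile to contents)
\end{minted}
Running \texttt{copyFile} against a matching \texttt{ReadFile} expectation followed by a matching \texttt{WriteFile} expectation succeeds, while a call that is out of order, unexpected, or missing yields a \texttt{Left} describing the problem.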
430 | 
431 | This scheme is simple, but it seems to work remarkably well. There are
432 | some obvious enhancements that will probably be eventually necessary,
433 | like allowing action results that run in the underlying monad \texttt{m}
434 | in order to support things like \texttt{throwError} from
435 | \texttt{MonadError}, but so far, it hasn't been necessary for what we've
436 | been using it for. Certain tricky signatures defy this simple technique,
437 | such as signatures where a monadic action appears in a negative position
438 | (that is, the signatures you need things like
439 | \href{https://hackage.haskell.org/package/monad-control}{monad-control}
440 | or \href{https://hackage.haskell.org/package/monad-unlift}{monad-unlift}
441 | for), but we've found that most of our effects don't have any reason to
442 | include such signatures.
443 | 
444 | \hypertarget{a-brief-comparison-with-freer-monads}{%
445 | \section{A brief comparison with free(r)
446 | monads}\label{a-brief-comparison-with-freer-monads}}
447 | 
448 | At this point, astute readers will likely be thinking about free monads,
449 | which parts of this technique greatly resemble. The representation of
450 | actions as GADTs is especially similar to
451 | \href{https://hackage.haskell.org/package/freer}{freer}, which does
452 | something extremely similar. Indeed, you can think of this technique as
453 | something that combines a freer-style representation with mtl-style
454 | classes. Given that freer already does this, you might ask yourself what
455 | the point is.
456 | 
457 | If you are already sold on free monads, monad-mock may very well be
458 | uninteresting to you. From the perspective of theoretical novelty,
459 | monad-mock is not anything new or different. However, there are a
460 | variety of practical reasons to prefer mtl over free, and it's nice to
461 | see how easy it is to enjoy the testing benefits of free without too
462 | much extra effort.
463 | 
464 | An in-depth comparison between mtl and free is well outside the scope of
465 | this blog post. However, the key point is that this technique
466 | \emph{only} affects test code, so the real runtime implementation will
467 | not be affected in any way. This means you can take advantage of the
468 | performance benefits and ecosystem support of mtl without sacrificing
469 | simple, expressive testing.
470 | 
471 | \hypertarget{conclusion}{%
472 | \section{Conclusion}\label{conclusion}}
473 | 
474 | To cap things off, I want to emphasize monad-mock's role as a single
475 | part of a larger initiative we've been making for the better part of the
476 | past eighteen months. Haskell is a language with ever-evolving
477 | techniques and style, and it's sometimes dizzying to figure out how to
478 | use all the pieces together to develop robust, maintainable
479 | applications. While monad-mock might not be anything drastically
480 | different from existing testing techniques, my hope is that it can
481 | provide an opinionated mechanism to make testing easy and accessible,
482 | even for complex interactions with other services and systems.
483 | 
484 | I've made an effort to make it abundantly clear in this blog post that
485 | monad-mock is \emph{not} a silver bullet to testing, and in fact, I
486 | would prefer other techniques for ensuring correctness whenever
487 | possible. Even so, mocking is a nice tool to have in your toolbox, and
488 | it's a good fallback to get even the worst APIs under test coverage.
489 | 
490 | If you want to try out monad-mock for yourself,
491 | \href{https://hackage.haskell.org/package/monad-mock}{take a look at the
492 | documentation on Hackage} and start playing around! It's still early
493 | software, so it's not the most proven or featureful, but we've managed
494 | to get mileage out of it already, all the same. If you find any
495 | problems, have a use case it does not support, or just find something
496 | about it unclear, please do not hesitate to
497 | \href{https://github.com/cjdev/monad-mock}{open an issue on the GitHub
498 | repository}---we obviously can't fix issues we don't know about.
499 | 
500 | Thanks as always to the many people who have contributed ideas that have
501 | shaped my philosophy and approach to testing and have helped provide the
502 | tools that make this library work. Happy testing!
503 | 


--------------------------------------------------------------------------------
/jasper_van_der_jeugt3.tex:
--------------------------------------------------------------------------------
  1 | \chapter{The Handle Pattern - Jasper Van der Jeugt}
  2 | 
  3 | \begin{quotation}
  4 | \noindent\textit{\textbf{William Yao:}}
  5 | 
  6 | \textit{While there are fancy ways of injecting things like DB access and side effects into your application, such as monad transformers or effect algebras, the dead-simplest way to avoid turning your program into IO-soup is to just pass around records of functions. While there's nothing wrong with just leaving things in IO when bootstrapping a new project, if you find yourself needing to abstract over your concrete effects, try reaching for this before something more complicated.}
  7 | 
  8 | \vspace{\baselineskip}
  9 | \noindent\textit{Original article: \cite{the_handle_pattern}}
 10 | \end{quotation}
 11 | 
 12 | \section{Introduction}
 13 | 
 14 | I'd like to talk about a design pattern in Haskell that I've been
 15 | calling \emph{the Handle pattern}. This is far from novel -- I've
 16 | mentioned this
 17 | \href{https://skillsmatter.com/skillscasts/10832-how-to-architect-medium-to-large-scale-haskell-applications}{before}
 18 | and the idea is definitely not mine. As far as I know, in fact, it has
 19 | been around since basically
 20 | forever\footnote{Well, \texttt{System.IO.Handle} has definitely been around for a while}.
 21 | Since it is ridiculously close to what we'd call \emph{common
 22 | sense}\footnote{If you're reading this article and you're thinking: \emph{``What does
 23 |   this guy keep going on about? This is all so obvious!''} -- Well,
 24 |   that's the
 25 |   point!},
 26 | it's often used without giving it any explicit thought.
 27 | 
 28 | I first started more consciously using this pattern when I was working
 29 | together with \href{https://github.com/meiersi}{Simon Meier} at Better
 30 | (aka
 31 | \href{https://jaspervdj.be/posts/2013-09-29-erudify.html}{erudify}).
 32 | Simon did a writeup about this pattern
 33 | \href{https://www.schoolofhaskell.com/user/meiersi/the-service-pattern}{as
 34 | well}. But as I was explaining this idea again at last week's
 35 | \href{https://www.meetup.com/HaskellerZ/}{HaskellerZ} meetup, I figured
 36 | it was time to do an update of that article.
 37 | 
 38 | The \emph{Handle pattern} allows you to write stateful applications that
 39 | interact with external services in Haskell. It complements pure code
 40 | (e.g.~your business logic) well, and it is somewhat the result of
 41 | iteratively applying the question:
 42 | 
 43 | \begin{itemize}
 44 | 
 45 | \item
 46 |   Can we make it simpler?
 47 | \item
 48 |   Can we make it simpler still?
 49 | \item
 50 |   And can we still make it simpler?
 51 | \end{itemize}
 52 | 
 53 | The result is a powerful and simple pattern that does not even require
 54 | Monads\footnote{It does require \texttt{IO}, but we don't require thinking about
 55 |   \texttt{IO} as a \texttt{Monad}. If this sounds weird -- think of
 56 |   lists. We work with lists all the time but we just consider them lists
 57 |   of things, we don't constantly call them ``List Monad'' or ``The Free
 58 |   Monoid'' for that
 59 |   matter.}
 60 | or Monad transformers to be useful. This makes it extremely suitable for
 61 | beginners trying to build their first medium-sized Haskell application.
 62 | And that does not mean it is beginners-only: this technique has been
 63 | applied successfully at several Haskell companies as well.
 64 | 
 65 | 
 66 | \section{Context}
 67 | 
 68 | In Haskell, we try to capture ideas in beautiful, pure and
 69 | mathematically sound patterns, for example \emph{Monoids}. But at other
 70 | times, we can't do that. We might be dealing with some inherently
 71 | mutable state, or we are simply dealing with external code which doesn't
 72 | behave nicely.
 73 | 
 74 | In those cases, we need another approach. What we're going to describe
 75 | feels suspiciously similar to Object Oriented Programming:
 76 | 
 77 | \begin{itemize}
 78 | 
 79 | \item
 80 |   Encapsulating and hiding state inside objects
 81 | \item
 82 |   Providing methods to manipulate this state rather than touching it
 83 |   directly
 84 | \item
 85 |   Coupling these objects together with methods that modify their state
 86 | \end{itemize}
 87 | 
 88 | As you can see, it is not exactly the same as Alan Kay's
 89 | \href{http://wiki.c2.com/?AlanKaysDefinitionOfObjectOriented}{original
 90 | definition} of
 91 | OOP\footnote{And indeed, we will touch on a common way of encoding OOP in Haskell
 92 |   -- creating explicit records of functions -- but we'll also explain
 93 |   why this isn't always
 94 |   necessary.},
 95 | but it is far from the horrible incidents that permeate our field such
 96 | as UML, abstract factory factories and broken subtyping.
 97 | 
 98 | Before we dig in to the actual code, let's talk about some disclaimers.
 99 | 
100 | Pretty much any sort of Haskell code can be written in this particular
101 | way, but \emph{that doesn't mean that you should}. This method relies
102 | heavily on \texttt{IO}. Whenever you can write things in a pure way, you
103 | should attempt to do that and avoid \texttt{IO}. This pattern is only
104 | useful when \texttt{IO} is required.
105 | 
106 | Secondly, there are many alternatives to this approach: complex monad
107 | transformer stacks, interpreters over free monads, uniqueness types,
108 | effect systems\ldots{} I don't want to claim that this method is better
109 | than the others. All of these have advantages and disadvantages, so one
110 | must always make a careful trade-off.
111 | 
112 | \hypertarget{the-module-layout}{%
113 | \subsection{The module layout}\label{the-module-layout}}
114 | 
115 | For this pattern, we've got a very well-defined module layout. I believe
116 | this helps with recognition, which I think is also one of the reasons we
117 | use typeclasses like \emph{Monoid}.
118 | 
119 | When I'm looking at the documentation of libraries I haven't used yet,
120 | the types will sometimes look a bit bewildering. But then I see that
121 | there's an \texttt{instance\ Monoid}. That's an ``Aha!'' moment for me.
122 | I \emph{know} what a Monoid is. I \emph{know} how they behave. This
123 | allows me to get up to speed with this library much faster!
124 | 
125 | Using a consistent module layout in a project (and even across projects)
126 | provides, I think, very similar benefits to that. It allows new people
127 | on the team to learn parts of the codebase they are not yet familiar with
128 | much faster.
129 | 
130 | \subsection{A Database Handle}
131 | 
132 | Anyway, let's look at the concrete module layout we are proposing with
133 | this pattern. As an example, let's consider a database. The type in
134 | which we are encapsulating the state is \emph{always} called
135 | \texttt{Handle}. That is because we
136 | \href{https://mail.haskell.org/pipermail/haskell-cafe/2008-June/043986.html}{design
137 | for qualified import}.
138 | 
139 | We might have something like:
140 | 
141 | \begin{minted}{haskell}
142 | module MyApp.Database where
143 | 
144 | data Handle = Handle
145 |     { hPool   :: Pool Postgres.Connection
146 |     , hCache  :: IORef (PSQueue Int Text User)
147 |     , hLogger :: Logger.Handle  -- Another handle!
148 |     , ...
149 |     }
150 | \end{minted}
151 | The internals of the \texttt{Handle} typically consist of static fields
152 | and other handles, \texttt{MVar}s, \texttt{IORef}s, \texttt{TVar}s,
153 | \texttt{Chan}s\ldots{} With our \texttt{Handle} defined, we are able to
154 | define functions using it. These are usually straightforward imperative
155 | pieces of code and I'll omit them for
156 | brevity\footnote{If you want to see a full example, you can refer to
157 |   \href{https://github.com/jaspervdj/fugacious}{this repository} that I
158 |   have been using to teach practical
159 |   Haskell.}:
160 | 
161 | \begin{minted}{haskell}
162 | module MyApp.Database where
163 | 
164 | data Handle = ...
165 | 
166 | createUser :: Handle -> Text -> IO User
167 | createUser = ...
168 | 
169 | getUserMail :: Handle -> User -> IO [Mail]
170 | getUserMail = ...
171 | \end{minted}
172 | Some thoughts on this design:
173 | 
174 | \begin{enumerate}
175 | \item
176 |   We call our functions \texttt{createUser} rather than
177 |   \texttt{databaseCreateUser}. Again, we're working with qualified
178 |   imports so there's no need for ``C-style'' names.
179 | \item
180 |   \textbf{All functions take the \texttt{Handle} as the first argument.}
181 |   This is very important for consistency, but also for
182 |   \href{https://jaspervdj.be/posts/2018-03-08-handle-pattern.html\#handle-polymorphism}{polymorphism}
183 |   and code style.
184 | 
185 |   With code style, I mean that the \texttt{Handle} is often a
186 |   syntactically simpler expression (e.g.~a name) than the argument
187 |   (which is often a composed expression). Consider:
188 | 
189 | \begin{minted}{haskell}
190 | Database.createUser database $ userName <> "@" <> companyDomain
191 | \end{minted}
192 |   Versus:
193 | 
194 | \begin{minted}{haskell}
195 | Database.createUser (userName <> "@" <> companyDomain) database
196 | \end{minted}
197 | \item
198 |   Other \texttt{Handle}s (e.g.~\texttt{Logger.Handle}) are stored in a
199 |   field of our \texttt{Database.Handle}. You could also remove it there
200 |   and instead have it as an argument wherever it is needed, for example:
201 | 
202 | \begin{minted}{haskell}
203 | createUser :: Handle -> Logger.Handle -> Text -> IO User
204 | createUser = ...
205 | \end{minted}
206 |   I usually prefer to put it inside the \texttt{Handle} since that
207 |   reduces the number of arguments required for functions such as
208 |   \texttt{createUser}. However, if the lifetime of a
209 |   \texttt{Logger.Handle} is very
210 |   short\href{https://jaspervdj.be/posts/2018-03-08-handle-pattern.html\#fn6}{\textsuperscript{6}},
211 |   or if you want to reduce the number of dependencies for \texttt{new},
212 |   then you could consider doing the above.
213 | \item
214 |   Datatypes such as \texttt{Mail} may be defined in this module, and may
215 |   even be specific to this function. I've written about
216 |   ad-hoc datatypes before (see chapter \ref{sec:adhoc_datatypes}).
217 | \end{enumerate}
218 | 
219 | \subsection{Creating a Handle}
220 | 
221 | I mentioned before that an important advantage of using these patterns
222 | is that programmers become ``familiar'' with them. That is also the goal
223 | we have in mind when designing our API for the creation of
224 | \texttt{Handle}s.
225 | 
226 | In addition to always having a type called \texttt{Handle}, we'll
227 | require the module to always have a type called \texttt{Config}. This is
228 | where we encode our static configuration parameters -- and by static I
229 | mean that we shouldn't have any \texttt{IORef}s or the like here: this
230 | \texttt{Config} should be easy to create from pure code.
231 | 
232 | \begin{minted}{haskell}
233 | module MyApp.Database where
234 | 
235 | data Config = Config
236 |     { cPath :: FilePath
237 |     , ...
238 |     }
239 | 
240 | data Handle = ...
241 | \end{minted}
242 | We can also offer some way to create a \texttt{Config}. This really
243 | depends on your application. If you use the
244 | \href{https://hackage.haskell.org/package/configurator}{configurator}
245 | library, you might have something like:
246 | 
247 | \begin{minted}{haskell}
248 | parseConfig :: Configurator.Config -> IO Config
249 | parseConfig = ...
250 | \end{minted}
251 | On the other hand, if you use
252 | \href{https://hackage.haskell.org/package/aeson}{aeson} or
253 | \href{https://hackage.haskell.org/package/yaml}{yaml}, you could write:
254 | 
255 | \begin{minted}{haskell}
256 | instance Aeson.FromJSON Config where
257 |     parseJSON = ...
258 | \end{minted}
259 | You could even use a
260 | \href{https://medium.com/@jonathangfischoff/the-partial-options-monoid-pattern-31914a71fc67}{Monoid}
261 | to support loading configurations from multiple places. But I digress --
262 | the important part is that there is a type called \texttt{Config}.
263 | 
264 | Next is a similar pattern: in addition to always having a
265 | \texttt{Config}, we'll also always provide a function called
266 | \texttt{new}. The parameters follow a similarly strict pattern:
267 | 
268 | \begin{minted}{haskell}
269 | new :: Config         -- 1. Config
270 |     -> Logger.Handle  -- 2. Dependencies
271 |     -> ...            --    (usually other handles)
272 |     -> IO Handle      -- 3. Result
273 | \end{minted}
274 | Inside the \texttt{new} function, we can create some more
275 | \texttt{IORef}s, file handles, caches\ldots{} if required and then store
276 | them in the \texttt{Handle}.
277 | 
278 | \subsection{Destroying a Handle}
279 | 
280 | We've talked about creation of a \texttt{Handle}, and we mentioned the
281 | normal functions operating on a \texttt{Handle}
282 | (e.g.~\texttt{createUser}) before. So now let's consider the final stage
283 | in the lifetime of \texttt{Handle}.
284 | 
285 | Haskell is a garbage collected language and we can let the runtime
286 | system take care of destroying things for us -- but that's not always a
287 | great idea. Many resources (file handles in particular come to mind as
288 | an example) are scarce.
289 | 
290 | There is quite a strong correlation between scarce resources and things
291 | you would naturally use a \texttt{Handle} for. That's why I recommend
292 | always providing a \texttt{close} as well, even if it does nothing. This is
293 | a form of forward compatibility in our API: if we later decide to add
294 | some sort of log files (which will need to be closed), we can do so
295 | without individually mailing all our module users that they now need to
296 | add a \texttt{close} to their code.
297 | 
298 | \begin{minted}{haskell}
299 | close :: Handle -> IO ()
300 | close = ...
301 | \end{minted}
302 | 
303 | 
304 | 
305 | 
306 | \subsection{Reasonable safety}
307 | 
308 | When you're given a \texttt{new} and \texttt{close}, it's often tempting
309 | to add an auxiliary function like:
310 | 
311 | \begin{minted}{haskell}
312 | withHandle
313 |     :: Config            -- 1. Config
314 |     -> Logger.Handle     -- 2. Dependencies
315 |     -> ...               --    (usually other handles)
316 |     -> (Handle -> IO a)  -- 3. Function to apply
317 |     -> IO a              -- 4. Result, handle is closed automatically
318 | \end{minted}
319 | I think this is a great idea. In fact, it's sometimes useful to
320 | \emph{only} provide the \texttt{withHandle} function, and hide
321 | \texttt{new} and \texttt{close} in an internal module.
322 | 
323 | The only caveat is that the naive implementation of this function:
324 | 
325 | \begin{minted}{haskell}
326 | withHandle config dep1 dep2 ... depN f = do
327 |     h <- new config dep1 dep2 ... depN
328 |     x <- f h
329 |     close h
330 |     return x
331 | \end{minted}
332 | Is \textbf{wrong}! In any sort of \texttt{withXyz} function, you should
333 | always use \texttt{bracket} to guard against exceptions. This means the
334 | correct implementation is:
335 | 
336 | \begin{minted}{haskell}
337 | withHandle config dep1 dep2 ... depN f =
338 |     bracket (new config dep1 dep2 ... depN) close f
339 | \end{minted}
340 | Well, it's even shorter! In case you want more information on why
341 | \texttt{bracket} is necessary,
342 | \href{http://www.well-typed.com/blog/97/}{this blogpost} gives a good
343 | in-depth overview. My summary of it as it relates to this article would
344 | be:
345 | 
346 | \begin{enumerate}
347 | \item
348 |   Always use \texttt{bracket} to match \texttt{new} and \texttt{close}
349 | \item
350 |   You can now use \texttt{throwIO} and \texttt{killThread} safely
351 | \end{enumerate}
352 | It's important to note that \texttt{withXyz} functions do not provide
353 | complete safety against things like use-after-close or double-close.
354 | There are many interesting approaches to fix these issues but they are
355 | \emph{way} beyond the scope of this tutorial -- things like
356 | \href{http://okmij.org/ftp/Haskell/regions.html}{Monadic Regions} and
357 | \href{https://www.cis.upenn.edu/~jpaykin/papers/pz_linearity_monad_2017.pdf}{The
358 | Linearity Monad} come to mind. For now, we'll rely on \texttt{bracket}
359 | to catch common issues and on code reviews to catch team members who are
360 | not using \texttt{bracket}.
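
The difference between the naive version and the \texttt{bracket} version can be observed directly. The sketch below uses a toy \texttt{Handle} whose \texttt{close} merely flips an \texttt{IORef} flag; the names \texttt{hOpen}, \texttt{withHandleNaive} and \texttt{closedAfterFailure} are made up for this demonstration, and the flag is passed to \texttt{new} as a stand-in for a real external resource.

\begin{minted}{haskell}
import Control.Exception (ErrorCall (..), bracket, throwIO, try)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Toy handle: the flag records whether the resource is still open.
newtype Handle = Handle { hOpen :: IORef Bool }

new :: IORef Bool -> IO Handle
new flag = writeIORef flag True >> pure (Handle flag)

close :: Handle -> IO ()
close h = writeIORef (hOpen h) False

-- Naive version: close is skipped if the callback throws.
withHandleNaive :: IORef Bool -> (Handle -> IO a) -> IO a
withHandleNaive flag f = do
  h <- new flag
  x <- f h
  close h
  return x

-- bracket runs close whether or not the callback throws.
withHandle :: IORef Bool -> (Handle -> IO a) -> IO a
withHandle flag = bracket (new flag) close

-- Run a failing callback; report whether the handle got closed.
closedAfterFailure
  :: (IORef Bool -> (Handle -> IO ()) -> IO ()) -> IO Bool
closedAfterFailure with = do
  flag <- newIORef False
  result <- try (with flag (\_ -> throwIO (ErrorCall "boom")))
  case result of
    Left (ErrorCall _) -> not <$> readIORef flag
    Right () -> pure False
\end{minted}
Here \texttt{closedAfterFailure withHandleNaive} returns \texttt{False} (the handle leaks), whereas \texttt{closedAfterFailure withHandle} returns \texttt{True}.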
361 | 
362 | \subsection{Summary of the module layout}
363 | 
364 | If we quickly summarise the module layout, we now have:
365 | 
366 | \begin{minted}{haskell}
367 | module MyApp.Database
368 |     ( Config (..)   -- Internals exported
369 |     , parseConfig   -- Or some other way to load a config
370 | 
371 |     , Handle        -- Internals usually not exported
372 |     , new
373 |     , close
374 |     , withHandle
375 | 
376 |     , createUser  -- Actual functions on the handle
377 |     , ...
378 |     ) where
379 | \end{minted}
380 | This is a well-structured, straightforward and easy to learn
381 | organisation. Most of the \texttt{Handle}s in any application should
382 | probably look this way. In the next section, we'll see how we can build
383 | on top of this to create dynamic, customizable \texttt{Handle}s.
384 | 
385 | \section{Handle polymorphism}
386 | 
387 | It's often important to split between the interface and implementation
388 | of a service. There are countless ways to do this in programming
389 | languages. For Haskell, there are:
390 | 
391 | \begin{itemize}
392 | 
393 | \item
394 |   Higher order functions
395 | \item
396 |   Type classes and type families
397 | \item
398 |   Dictionary passing
399 | \item
400 |   Backpack module system
401 | \item
402 |   Interpreters over concrete ASTs
403 | \item
404 |   \ldots{}
405 | \end{itemize}
406 | The list is endless. And because Haskell on one hand makes it so easy to
407 | abstract over things, and on the other hand makes it possible to
408 | abstract over pretty much anything, I'll start this section with a
409 | disclaimer.
410 | 
411 | \emph{Premature} abstraction is a real concern in Haskell (and many
412 | other high-level programming languages). It's easy to quickly whiteboard
413 | an abstraction or interface and unintentionally end up with completely
414 | the wrong thing.
415 | 
416 | It usually goes like this:
417 | 
418 | \begin{enumerate}
419 | \def\labelenumi{\arabic{enumi}.}
420 | 
421 | \item
422 |   You need to implement a bunch of things that look similar
423 | \item
424 |   You write down a typeclass or another interface-capturing abstraction
425 | \item
426 |   You start writing the actual implementations
427 | \item
428 |   One of them doesn't \emph{quite} match the interface so you need to
429 |   change it two weeks in
430 | \item
431 |   You add another parameter, or another method, mostly for one specific
432 |   implementation
433 | \item
434 |   This causes some problems or inconsistencies for the other implementations
435 | \item
436 |   Go back to (4)
437 | \end{enumerate}
438 | What you end up with is a leaky abstraction that is the \emph{product}
439 | of all concrete implementations -- where what you really wanted is the
440 | \emph{greatest common divisor}.
441 | 
442 | There's no magic bullet to avoid broken abstractions so my advice is
443 | usually to first painstakingly do all the different implementations (or
444 | at least a few of them). \emph{After} you have something working and you
445 | have emerged victorious from horrible battles with the guts of these
446 | implementations, \emph{then} you could start looking at what the
447 | different implementations have in common. At this point, you'll also be
448 | a bit wiser about where they differ -- and you'll be able to take these
449 | important details into account, at which point you retire from just
450 | being an idiot drawing squares and arrows on a whiteboard.
451 | 
452 | This is why I recommend sticking with simple \texttt{Handle}s until
453 | \href{https://en.wikipedia.org/wiki/You_aren\%27t_gonna_need_it}{you
454 | really need it}. But naturally, sometimes we really need the extra
455 | power.
456 | 
457 | \subsection{A Handle interface}
458 | 
459 | So let's do the simplest thing that can possibly work. Consider the
460 | following definition of the \texttt{Handle} we discussed before:
461 | 
462 | \begin{minted}{haskell}
463 | module MyApp.Database
464 |     ( Handle (..)  -- We now need to export this
465 |     ) where
466 | 
467 | data Handle = Handle
468 |     { createUser :: Text -> IO User
469 |     , ...
470 |     }
471 | \end{minted}
472 | What's the type of \texttt{createUser} now?
473 | 
474 | \begin{minted}{haskell}
475 | createUser :: Handle -> Text -> IO User
476 | \end{minted}
477 | It's exactly the same as before! This is pretty much a requirement: it
478 | means we can move our \texttt{Handle}s to this approach when we need it,
479 | not when we envision that we will need it at some point in the future.
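
The type is unchanged because a record field doubles as an accessor
function from the record to the field. Here is a minimal sketch of client
code written against such a record. The \texttt{User} newtype,
\texttt{registerAll} and \texttt{dummyHandle} are hypothetical names made up
for this illustration, and the record is cut down to a single field:

\begin{minted}{haskell}
import           Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical stand-in for the application's User type.
newtype User = User { userName :: Text }
    deriving (Eq, Show)

-- The interface: a record of functions (only one field shown here).
data Handle = Handle
    { createUser :: Text -> IO User
    }

-- Client code applies the field accessor like any other function, so it
-- reads exactly like code written against a concrete Handle.
registerAll :: Handle -> [Text] -> IO [User]
registerAll h names = mapM (createUser h) names

-- A trivial implementation that performs no real I/O (assumption: it only
-- strips surrounding whitespace from the name).
dummyHandle :: Handle
dummyHandle = Handle
    { createUser = \name -> return (User (T.strip name))
    }
\end{minted}
Since \texttt{registerAll} never mentions a concrete backend, it should keep
working unchanged whichever implementation constructs the \texttt{Handle}.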
480 | 
481 | \subsection{A Handle implementation}
482 | 
483 | We can now create a concrete implementation for this abstract
484 | \texttt{Handle} type. We'll do this in a module like
485 | \texttt{MyApp.Database.Postgres}.
486 | 
487 | \begin{minted}{haskell}
488 | module MyApp.Database.Postgres where
489 | import MyApp.Database
490 | 
491 | data Config = ...
492 | 
493 | new :: Config -> Logger.Handle -> … -> IO Handle
494 | \end{minted}
495 | The \texttt{Config} datatype and the \texttt{new} function have now
496 | moved to the implementation module, rather than the interface module.
497 | 
498 | Since we can have any number of implementation modules, it is worth
499 | mentioning that we will have multiple \texttt{Config} types and
500 | \texttt{new} functions (exactly one of each per implementation).
501 | Configurations are always specific to the concrete implementation. For
502 | example, an \href{https://sqlite.org/index.html}{sqlite} database may
503 | just have a \texttt{FilePath} in the configuration, but our
504 | \texttt{Postgres} implementation will have other details such as port,
505 | database, username and password.
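
To make the difference concrete, here is a rough sketch of what two such
configurations could look like. All field names are assumptions made for
this illustration. In the module layout described above, each type would
simply be called \texttt{Config} and live in its own implementation module;
they only have distinct names here so the sketch fits in one file. The
\texttt{DatabaseConfig} wrapper and \texttt{describe} are likewise
hypothetical:

\begin{minted}{haskell}
-- Hypothetical configuration for an sqlite-backed implementation.
newtype SqliteConfig = SqliteConfig
    { sqlitePath :: FilePath
    } deriving (Show)

-- Hypothetical configuration for the Postgres implementation.
data PostgresConfig = PostgresConfig
    { pgPort     :: Int
    , pgDatabase :: String
    , pgUsername :: String
    , pgPassword :: String
    } deriving (Show)

-- The application can decide at startup which backend to use.
data DatabaseConfig
    = UseSqlite   SqliteConfig
    | UsePostgres PostgresConfig

-- Human-readable summary for startup logs; deliberately omits the password.
describe :: DatabaseConfig -> String
describe (UseSqlite c)   = "sqlite at " ++ sqlitePath c
describe (UsePostgres c) =
    "postgres database " ++ pgDatabase c ++ " on port " ++ show (pgPort c)
\end{minted}
Whichever constructor is chosen, the corresponding \texttt{new} function
would return the same abstract \texttt{Handle} type, so the rest of the
application does not need to know which one was picked.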
506 | 
507 | In the implementation of \texttt{new}, we simply initialize a
508 | \texttt{Handle}:
509 | 
510 | \begin{minted}{haskell}
511 | new config dep1 dep2 ... depN = do
    -- Initialization of things inside the handle
513 |     ...
514 | 
515 |     -- Construct record
516 |     return Handle
517 |         { createUser = \name -> do
518 |             ...
519 |         , ...
520 |         }
521 | \end{minted}
522 | Of course, we can manually float out the body of \texttt{createUser}
523 | since constructing these large records gets kind of ugly.
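
As an illustration of floating out the body, here is a minimal sketch of a
toy in-memory implementation. It is not the Postgres implementation: the
\texttt{User} fields, the empty \texttt{Config} and the helper
\texttt{createUserImpl} are assumptions made for this example, and the only
state is a counter in an \texttt{IORef}:

\begin{minted}{haskell}
import           Data.IORef (IORef, atomicModifyIORef', newIORef)
import           Data.Text  (Text)
import qualified Data.Text  as T

-- Hypothetical User type; the real one is not shown in this post.
data User = User
    { userId   :: Int
    , userName :: Text
    } deriving (Eq, Show)

-- The interface, as in MyApp.Database (only one field shown).
data Handle = Handle
    { createUser :: Text -> IO User
    }

-- This toy implementation needs no configuration.
data Config = Config

-- Initialize the private state, then construct the record.
new :: Config -> IO Handle
new _config = do
    nextId <- newIORef 1
    return Handle
        { createUser = createUserImpl nextId
        }

-- The body of createUser, floated out to the top level. It receives the
-- implementation's private state as an extra first argument.
createUserImpl :: IORef Int -> Text -> IO User
createUserImpl nextId name = do
    -- Hand out the current id and bump the counter atomically.
    uid <- atomicModifyIORef' nextId (\n -> (n + 1, n))
    return (User uid (T.strip name))
\end{minted}
Partially applying \texttt{createUserImpl} to the state keeps the record
construction in \texttt{new} down to one line per field.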
524 | 
525 | \section{Compared to other approaches}
526 | 
We've presented an approach to modularize the effectful layer of medium-
to large-scale Haskell applications. There are many other approaches to
tackling this, so any comparison I come up with would probably be far
from exhaustive.
531 | 
532 | Perhaps the most important advantage of using \texttt{Handle}s is that
533 | they are first class values that we can freely mix and match. This often
534 | does not come for free when using more exotic strategies.
535 | 
Consider the following type signature from a Hackage package. I do not
mean to discredit the author: the package works fine, it simply uses a
different approach than the one I prefer.
539 | 
540 | \begin{minted}{haskell}
541 | -- | Create JSON-RPC session around conduits from transport layer.
542 | -- When context exits session disappears.
543 | runJsonRpcT
544 |     :: (MonadLoggerIO m, MonadBaseControl IO m)
545 |     => Ver                  -- ^ JSON-RPC version
546 |     -> Bool                 -- ^ Ignore incoming requests/notifs
547 |     -> Sink ByteString m () -- ^ Sink to send messages
548 |     -> Source m ByteString  -- ^ Source to receive messages from
549 |     -> JsonRpcT m a         -- ^ JSON-RPC action
550 |     -> m a                  -- ^ Output of action
551 | \end{minted}
552 | I'm a fairly experienced Haskeller and it still takes me a bit of
553 | eye-squinting to see how this will fit into my application, especially
554 | if I want to use this package with other libraries that do not use the
555 | \texttt{Sink}/\texttt{Source} or \texttt{MonadBaseControl} abstractions.
556 | 
557 | It is somewhat obvious that one running call to \texttt{runJsonRpcT}
558 | corresponds to being connected to one JSON-RPC endpoint, since it takes
559 | a single sink and source. But what if we want our application to be
560 | connected to multiple endpoints at the same time?
561 | 
What if we need to have hundreds of thousands of these, and we want to
store them in some priority queue and only consider the most recent ones
in the general case? How would you go about that?
565 | 
566 | You could consider running a lightweight thread for every
567 | \texttt{runJsonRpcT}, but that means you now need to worry about thread
568 | overhead, communicating exceptions between threads and killing the
569 | threads after you remove them. Whereas with first-class handles, we
570 | would just have a \texttt{HashPSQ\ Text\ Int\ JsonRpc.Handle}, which is
571 | much easier to reason about.
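
Here is a rough sketch of that idea. To depend only on libraries that ship
with GHC, it uses a plain \texttt{Data.Map} from \texttt{containers} as a
stand-in for \texttt{HashPSQ}, with the priority stored next to the handle
and \texttt{String} keys instead of \texttt{Text}. The \texttt{Handle}
record, \texttt{track}, \texttt{evictOldest}, \texttt{fakeHandle} and
\texttt{demo} are all hypothetical names for this illustration:

\begin{minted}{haskell}
import           Data.IORef      (IORef, modifyIORef', newIORef, readIORef)
import           Data.List       (minimumBy)
import qualified Data.Map.Strict as Map
import           Data.Ord        (comparing)

-- Hypothetical stand-in for JsonRpc.Handle: just an ordinary value.
data Handle = Handle
    { endpointName :: String
    , close        :: IO ()
    }

-- Endpoint name -> (priority, handle). A lower priority means "older".
type Pool = Map.Map String (Int, Handle)

-- Add a handle, or refresh the priority of an existing one.
track :: Int -> Handle -> Pool -> Pool
track priority h = Map.insert (endpointName h) (priority, h)

-- Close and remove the entry with the lowest priority, if there is one.
-- (A real priority queue would avoid this linear scan.)
evictOldest :: Pool -> IO Pool
evictOldest pool
    | Map.null pool = return pool
    | otherwise     = do
        let (name, (_, h)) =
                minimumBy (comparing (fst . snd)) (Map.toList pool)
        close h
        return (Map.delete name pool)

-- A fake handle whose close action only records the endpoint name.
fakeHandle :: IORef [String] -> String -> Handle
fakeHandle closed name = Handle
    { endpointName = name
    , close        = modifyIORef' closed (name :)
    }

-- Track three fake handles, evict one, and report which handle was closed
-- and which endpoint names remain in the pool.
demo :: IO ([String], [String])
demo = do
    closed <- newIORef []
    let pool = track 3 (fakeHandle closed "a")
             $ track 1 (fakeHandle closed "b")
             $ track 2 (fakeHandle closed "c") Map.empty
    pool' <- evictOldest pool
    closedNames <- readIORef closed
    return (closedNames, Map.keys pool')
\end{minted}
There are no threads to kill and no exceptions to forward here: evicting a
connection is just a lookup, a call to \texttt{close} and a delete.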
572 | 
573 | \begin{center}\rule{0.5\linewidth}{\linethickness}\end{center}
574 | 
575 | \noindent So, I guess one of the oldest and most widely used approaches is
576 | MTL-style monad transformers. This uses a hierarchy of typeclasses to
577 | represent access to various subsystems.
578 | 
579 | I love working with MTL-style transformers in the case of pure code,
580 | since they often allow us to express complex ideas concisely. For
581 | effectful code, on the other hand, they do not seem to offer many
582 | advantages and often make it harder to reason about code.
583 | 
584 | My personal preference for writing complex effectful code is to reify
585 | the effectful operations as a datatype and then write pure code
586 | manipulating these effectful operations. An interpreter can then simply
587 | use the \texttt{Handle}s to perform the effects. For simpler effectful
588 | code, we can just use \texttt{Handle}s directly.
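
A small sketch of what this could look like, under the assumption of a
made-up signup workflow. The \texttt{Command} type, \texttt{planSignups},
\texttt{run} and the two cut-down handle records are hypothetical and only
serve to show the split between pure planning and effectful
interpretation:

\begin{minted}{haskell}
import           Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical stand-in for the application's User type.
newtype User = User Text

-- Effectful operations reified as plain data.
data Command
    = CreateUser Text
    | LogInfo Text
    deriving (Eq, Show)

-- Pure code: decide what should happen without performing any I/O.
-- Names are trimmed, and blank names are skipped with a log message.
planSignups :: [Text] -> [Command]
planSignups = concatMap plan
  where
    plan raw
        | T.null name = [LogInfo (T.pack "skipping blank name")]
        | otherwise   =
            [ LogInfo (T.append (T.pack "creating ") name)
            , CreateUser name
            ]
      where
        name = T.strip raw

-- Cut-down handles for two subsystems.
data Database = Database { createUser :: Text -> IO User }
data Logger   = Logger   { logInfo    :: Text -> IO () }

-- The interpreter: the only place where the handles are actually used.
run :: Database -> Logger -> [Command] -> IO ()
run db logger = mapM_ step
  where
    step (CreateUser name) = do
        _ <- createUser db name
        return ()
    step (LogInfo msg)     = logInfo logger msg
\end{minted}
Because \texttt{planSignups} is pure, it can be tested by comparing lists of
\texttt{Command}s, without constructing any \texttt{Handle} at all.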
589 | 
590 | I have implemented a number of these patterns in the (ever unfinished)
591 | example web application
592 | \href{https://github.com/jaspervdj/fugacious}{fugacious}, in case you
593 | want to see them in action or if you want a more elaborate example than
594 | the short snippets in this blogpost. Finally, I would like to thank
595 | \href{https://github.com/alang9/}{Alex Lang} and
596 | \href{http://www.nmattia.com/}{Nicolas Mattia} for proofreading, and
\href{https://github.com/tivervac}{Titouan Vervack} for many corrections
and typo fixes.
599 | 


--------------------------------------------------------------------------------