├── .gitignore
├── LICENSE
├── README.md
├── bahug-2012
│   ├── Makefile
│   ├── slides.md
│   └── slidy.css
├── bayhac-2011
│   ├── Makefile
│   └── slides.tex
├── cufp-2010
│   ├── Makefile
│   ├── README
│   ├── diagrams
│   │   ├── SpaceLeak-eps-converted-to.pdf
│   │   ├── SpaceLeak.eps
│   │   ├── SpaceLeak.pdf
│   │   ├── intpair-unpacked.pdf
│   │   ├── intpair-unpacked.svg
│   │   ├── intpair.pdf
│   │   ├── intpair.svg
│   │   ├── list12.pdf
│   │   └── list12.svg
│   └── slides.tex
├── galois-2011
│   ├── Makefile
│   └── slides.tex
├── haskell-2011
│   ├── Makefile
│   ├── performance.png
│   ├── reasoning.png
│   └── slides.tex
├── hiw-2011
│   ├── Makefile
│   ├── hamt-mem.hp
│   ├── hamt-mem.pdf
│   ├── hamt.graffle
│   ├── hamt.pdf
│   ├── patricia-mem.hp
│   ├── patricia-mem.pdf
│   └── slides.tex
├── stanford-2011
│   ├── Makefile
│   ├── hashmap-naive.graffle
│   ├── hashmap-naive.png
│   ├── hashmap.graffle
│   ├── hashmap.png
│   ├── intpair-unpacked.graffle
│   ├── intpair-unpacked.png
│   ├── intpair.graffle
│   ├── intpair.png
│   ├── list12.graffle
│   ├── list12.png
│   ├── performance.md
│   └── slidy.css
└── zurihac-2015
    ├── Makefile
    ├── intpair-unpacked.graffle
    ├── intpair-unpacked.png
    ├── intpair.graffle
    ├── intpair.png
    ├── list12.graffle
    ├── list12.png
    ├── show.css
    ├── slides.md
    └── slidy.js
/.gitignore:
--------------------------------------------------------------------------------
1 | *.aux
2 | *.log
3 | *.nav
4 | *.out
5 | *.pdf
6 | *.snm
7 | *.toc
8 | *.vrb
9 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | All material in this repository is licensed under the Creative Commons
2 | Attribution 3.0 Unported (CC BY 3.0) license. For the legal text see
3 | the LICENSE file.
4 |
--------------------------------------------------------------------------------
/bahug-2012/Makefile:
--------------------------------------------------------------------------------
1 | slides.html: slides.md
2 | pandoc --data-dir=. --offline -s -t slidy -o slides.html slides.md
3 |
--------------------------------------------------------------------------------
/bahug-2012/slides.md:
--------------------------------------------------------------------------------
1 | % Haskell Performance Patterns
2 | % Johan Tibell
3 | % February 15, 2012
4 |
5 | # Caveats
6 |
7 | * Much of this is GHC specific.
8 |
9 | * Some of the patterns trade generality/beauty for performance. Only
10 | use these when needed.
11 |
12 | * The following patterns are guidelines, not rules. There are
13 | exceptions.
14 |
15 | # Think about your data representation
16 |
17 | * A linked-list of linked-lists of pointers to integers is not a good
18 | way to represent a bitmap! (Someone actually did this and complained
19 | Haskell was slow.)
20 |
21 | * Make use of modern data types: `ByteString`, `Text`, `Vector`,
22 |   `HashMap`, etc. (see the sketch below).
23 |
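For instance, a sketch (added here, not in the original slides) of a compact
bitmap built on an unboxed vector; `Bitmap` and `blankBitmap` are made-up
names:

~~~~ {.haskell}
import qualified Data.Vector.Unboxed as U
import Data.Word (Word8)

-- One byte covers 8 pixels; the whole bitmap is a single contiguous
-- unboxed array instead of nested linked lists of boxed integers.
type Bitmap = U.Vector Word8

blankBitmap :: Int -> Bitmap
blankBitmap nBytes = U.replicate nBytes 0
~~~~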
24 | # Unpack scalar fields
25 |
26 | Always unpack scalar fields (e.g. `Int`, `Double`):
27 |
28 | ~~~~ {.haskell}
29 | data Vec3 = Vec3 {-# UNPACK #-} !Double
30 | {-# UNPACK #-} !Double
31 | {-# UNPACK #-} !Double
32 | ~~~~
33 |
34 | * This is **the most important optimization available** to us.
35 |
36 | * GHC does Good Things (tm) to strict, unpacked fields.
37 |
38 | * You can use `-funbox-strict-fields` on a per-file basis if `UNPACK`
39 | is too verbose.
40 |
41 | # Use a strict spine for data structures
42 |
43 | * Most container types have a strict spine, e.g. `Data.Map`:
44 |
45 | ~~~~ {.haskell}
46 | data Map k a = Tip
47 | | Bin {-# UNPACK #-} !Size !k a
48 | !(Map k a) !(Map k a)
49 | ~~~~
50 |
51 | (Note the bang on the `Map k a` fields.)
52 |
53 | * Strict spines cause more work to be done up-front (e.g. on insert),
54 | when the data structure is in cache, rather than later (e.g. on the
55 | next lookup.)
56 |
57 | * Does not always apply (e.g. when representing streams and other
58 |   infinite structures.)
59 |
60 | # Specialized data types are sometimes faster
61 |
62 | * Polymorphic fields are always stored as pointer-to-thing, which
63 | increases memory usage and decreases cache locality. Compare:
64 |
65 | ~~~~ {.haskell}
66 | data Tree a = Leaf | Bin a !(Tree a) !(Tree a)
67 | data IntTree = IntLeaf | IntBin {-# UNPACK #-} !Int !IntTree !IntTree
68 | ~~~~
69 |
70 | * Specialized data types can be faster, but at the cost of code
71 | duplication. **Benchmark** your code and only use them if really
72 |   needed (see the benchmark sketch below).
73 |
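A minimal benchmarking sketch (added; assumes the `criterion` package and
hypothetical `buildTree`/`buildIntTree` functions that construct the two
variants):

~~~~ {.haskell}
import Criterion.Main

main :: IO ()
main = defaultMain
  [ bench "polymorphic tree" (whnf buildTree    100000)
  , bench "specialized tree" (whnf buildIntTree 100000)
  ]
~~~~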
74 | # Inline recursive functions using a wrapper
75 |
76 | * GHC does not inline recursive functions:
77 |
78 | ~~~~ {.haskell}
79 | map :: (a -> b) -> [a] -> [b]
80 | map _ [] = []
81 | map f (x:xs) = f x : map f xs
82 | ~~~~
83 |
84 | * **If** you want to inline a recursive function, use a non-recursive
85 | wrapper like so:
86 |
87 | ~~~~ {.haskell}
88 | map :: (a -> b) -> [a] -> [b]
89 | map f = go
90 | where
91 | go [] = []
92 | go (x:xs) = f x : go xs
93 | ~~~~
94 |
95 | * You still need to figure out if you want a particular function
96 | inlined (e.g. see the next slide.)
97 |
98 | # Inline HOFs to avoid indirect calls
99 |
100 | * Calling an unknown function (e.g. a function that's passed as an
101 | argument) is more expensive than calling a known function. Such
102 | *indirect* calls appear in higher-order functions:
103 |
104 | ~~~~ {.haskell}
105 | map :: (a -> b) -> [a] -> [b]
106 | map _ [] = []
107 | map f (x:xs) = f x : map f xs
108 |
109 | g xs = map (+1) xs -- map is recursive => not inlined
110 | ~~~~
111 |
112 | * At the cost of increased code size, we can inline `map` into `g` by
113 | using the non-recursive wrapper trick on the previous slide together
114 |   with an `INLINE` pragma (sketched below).
115 |
116 | * Inline HOFs if the higher-order argument is used a lot (e.g. in
117 | `map`, but not in `Data.Map.insertWith`.)
118 |
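Putting the two together (a sketch, not from the original deck; `g` is a
made-up caller):

~~~~ {.haskell}
map :: (a -> b) -> [a] -> [b]
map f = go
  where
    go []     = []
    go (x:xs) = f x : go xs
{-# INLINE map #-}

g :: [Int] -> [Int]
g xs = map (+1) xs   -- map inlines here, so (+1) becomes a known call
~~~~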
119 | # Use strict data types in accumulators
120 |
121 | If you're using a composite accumulator (e.g. a pair), make sure it has
122 | strict fields.
123 |
124 | Allocates on each iteration:
125 |
126 | ~~~~ {.haskell}
127 | mean :: [Double] -> Double
128 | mean xs = s / n
129 | where (s, n) = foldl' (\ (s, n) x -> (s+x, n+1)) (0, 0) xs
130 | ~~~~
131 |
132 | Doesn't allocate on each iteration:
133 |
134 | ~~~~ {.haskell}
135 | data StrictPair a b = SP !a !b
136 |
137 | mean2 :: [Double] -> Double
138 | mean2 xs = s / n
139 | where SP s n = foldl' (\ (SP s n) x -> SP (s+x) (n+1)) (SP 0 0) xs
140 | ~~~~
141 |
142 | Haskell makes it cheap to create throwaway data types like
143 | `StrictPair`: one line of code.
144 |
145 | # Use strict returns in monadic code
146 |
147 | `return` often wraps the value in some kind of (lazy) box. This is
148 | often not what we want, especially when returning some arithmetic
149 | expression. For example, assuming we're in a state monad:
150 |
151 | ~~~~ {.haskell}
152 | return $ x + y
153 | ~~~~
154 |
155 | creates a thunk. We most likely want:
156 |
157 | ~~~~ {.haskell}
158 | return $! x + y
159 | ~~~~
160 |
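For example, a hypothetical counter in `Control.Monad.State` (a sketch,
assuming the `mtl` package):

~~~~ {.haskell}
import Control.Monad.State

tick :: State Int Int
tick = do
  n <- get
  let n' = n + 1
  put n'
  return $! n'   -- force the sum; a plain return would hand back a thunk
~~~~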
161 | # Beware of the lazy base case
162 |
163 | Functions that would otherwise be strict might be made lazy by the
164 | "base case":
165 |
166 | ~~~~ {.haskell}
167 | data Tree = Leaf
168 | | Bin Key Value !Tree !Tree
169 |
170 | insert :: Key -> Value -> Tree -> Tree
171 | insert k v Leaf = Bin k v Leaf Leaf -- lazy in @k@
172 | insert k v (Bin k' v' l r)
173 | | k < k' = ...
174 | | otherwise = ...
175 | ~~~~
176 |
177 | Since GHC does good things to strict arguments, we should make the
178 | base case strict, unless the extra laziness is useful:
179 |
180 | ~~~~ {.haskell}
181 | insert !k v Leaf = Bin k v Leaf Leaf -- strict in @k@
182 | ~~~~
183 |
184 | In this case GHC might unbox the key, making all those comparisons
185 | cheaper.
186 |
187 | # Beware of returning expressions inside lazy data types
188 |
189 | * Remember that many standard data types are lazy (e.g. `Maybe`,
190 | `Either`).
191 |
192 | * This means that it's easy to be lazier than you intend by wrapping
193 | an expression in such a value:
194 |
195 | ~~~~ {.haskell}
196 | safeDiv :: Int -> Int -> Maybe Int
197 | safeDiv _ 0 = Nothing
198 | safeDiv x y = Just $ x `div` y -- creates thunk
199 | ~~~~
200 |
201 | * Force the value (e.g. using `$!`) before wrapping it in the
202 |   constructor, as in the fixed version below.
203 |
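A fixed version of the example above (sketch):

~~~~ {.haskell}
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just $! x `div` y   -- quotient is evaluated before Just is built
~~~~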
204 | # Summary
205 |
206 | * Strict fields are good for performance.
207 |
208 | * Think about your data representation (and use `UNPACK` where
209 | appropriate.)
210 |
--------------------------------------------------------------------------------
/bahug-2012/slidy.css:
--------------------------------------------------------------------------------
1 | /* slidy.css
2 |
3 | Copyright (c) 2005-2010 W3C (MIT, ERCIM, Keio), All Rights Reserved.
4 | W3C liability, trademark, document use and software licensing
5 | rules apply, see:
6 |
7 | http://www.w3.org/Consortium/Legal/copyright-documents
8 | http://www.w3.org/Consortium/Legal/copyright-software
9 | */
10 | body
11 | {
12 | margin: 0 0 0 0;
13 | padding: 0 0 0 0;
14 | width: 100%;
15 | height: 100%;
16 | color: black;
17 | background-color: white;
18 | font-family: "URW Palladio L", "Palatino Linotype", sans-serif;
19 | font-size: 14pt;
20 | }
21 |
22 | code
23 | {
24 | font-family: "DejaVu Sans Mono", monospace;
25 | }
26 |
27 | div.toolbar {
28 | position: fixed; z-index: 200;
29 | top: auto; bottom: 0; left: 0; right: 0;
30 | height: 1.2em; text-align: right;
31 | padding-left: 1em;
32 | padding-right: 1em;
33 | font-size: 60%;
34 | color: red;
35 | background-color: rgb(240,240,240);
36 | border-top: solid 1px rgb(180,180,180);
37 | }
38 |
39 | div.toolbar span.copyright {
40 | color: black;
41 | margin-left: 0.5em;
42 | }
43 |
44 | div.initial_prompt {
45 | position: absolute;
46 | z-index: 1000;
47 | bottom: 1.2em;
48 | width: 100%;
49 | background-color: rgb(200,200,200);
50 | opacity: 0.35;
51 | background-color: rgb(200,200,200, 0.35);
52 | cursor: pointer;
53 | }
54 |
55 | div.initial_prompt p.help {
56 | text-align: center;
57 | }
58 |
59 | div.initial_prompt p.close {
60 | text-align: right;
61 | font-style: italic;
62 | }
63 |
64 | div.slidy_toc {
65 | position: absolute;
66 | z-index: 300;
67 | width: 60%;
68 | max-width: 30em;
69 | height: 30em;
70 | overflow: auto;
71 | top: auto;
72 | right: auto;
73 | left: 4em;
74 | bottom: 4em;
75 | padding: 1em;
76 | background: rgb(240,240,240);
77 | border-style: solid;
78 | border-width: 2px;
79 | font-size: 60%;
80 | }
81 |
82 | div.slidy_toc .toc_heading {
83 | text-align: center;
84 | width: 100%;
85 | margin: 0;
86 | margin-bottom: 1em;
87 | border-bottom-style: solid;
88 | border-bottom-color: rgb(180,180,180);
89 | border-bottom-width: 1px;
90 | }
91 |
92 | div.slide {
93 | z-index: 20;
94 | margin: 0 0 0 0;
95 | padding-top: 0;
96 | padding-bottom: 0;
97 | padding-left: 20px;
98 | padding-right: 20px;
99 | border-width: 0;
100 | clear: both;
101 | top: 0;
102 | bottom: 0;
103 | left: 0;
104 | right: 0;
105 | line-height: 120%;
106 | background-color: transparent;
107 | }
108 |
109 | div.slide > div.figure {
110 | text-align: center
111 | }
112 |
113 | div.background {
114 | display: none;
115 | }
116 |
117 | div.handout {
118 | margin-left: 20px;
119 | margin-right: 20px;
120 | }
121 |
122 | div.slide.titlepage {
123 | text-align: center;
124 | }
125 |
126 | div.slide.titlepage h1 {
127 | padding-top: 10%;
128 | margin-right: 0;
129 | }
130 |
131 | div.slide h1 {
132 | padding-left: 0;
133 | padding-right: 20pt;
134 | padding-top: 4pt;
135 | padding-bottom: 4pt;
136 | margin-top: 0;
137 | margin-left: 0;
138 | margin-right: 60pt;
139 | margin-bottom: 0.5em;
140 | display: block;
141 | font-size: 160%;
142 | line-height: 1.2em;
143 | background: transparent;
144 | }
145 |
146 | div.toc {
147 | position: absolute;
148 | top: auto;
149 | bottom: 4em;
150 | left: 4em;
151 | right: auto;
152 | width: 60%;
153 | max-width: 30em;
154 | height: 30em;
155 | border: solid thin black;
156 | padding: 1em;
157 | background: rgb(240,240,240);
158 | color: black;
159 | z-index: 300;
160 | overflow: auto;
161 | display: block;
162 | visibility: visible;
163 | }
164 |
165 | div.toc-heading {
166 | width: 100%;
167 | border-bottom: solid 1px rgb(180,180,180);
168 | margin-bottom: 1em;
169 | text-align: center;
170 | }
171 |
172 | pre {
173 | font-size: 80%;
174 | font-weight: bold;
175 | line-height: 120%;
176 | padding-top: 0.2em;
177 | padding-bottom: 0.2em;
178 | padding-left: 1em;
179 | padding-right: 1em;
180 | border-style: solid;
181 | border-left-width: 1em;
182 | border-top-width: thin;
183 | border-right-width: thin;
184 | border-bottom-width: thin;
185 | border-color: #95ABD0;
186 | color: #00428C;
187 | background-color: #E4E5E7;
188 | }
189 |
190 | li pre { margin-left: 0; }
191 |
192 | blockquote { font-style: italic }
193 |
194 | img { background-color: transparent }
195 |
196 | p.copyright { font-size: smaller }
197 |
198 | .center { text-align: center }
199 | .footnote { font-size: smaller; margin-left: 2em; }
200 |
201 | a img { border-width: 0; border-style: none }
202 |
203 | a:visited { color: navy }
204 | a:link { color: navy }
205 | a:hover { color: red; text-decoration: underline }
206 | a:active { color: red; text-decoration: underline }
207 |
208 | a {text-decoration: none}
209 | .navbar a:link {color: white}
210 | .navbar a:visited {color: yellow}
211 | .navbar a:active {color: red}
212 | .navbar a:hover {color: red}
213 |
214 | ul { list-style-type: square; }
215 | ul ul { list-style-type: disc; }
216 | ul ul ul { list-style-type: circle; }
217 | ul ul ul ul { list-style-type: disc; }
218 | li { margin-left: 0.5em; margin-top: 0.5em; font-weight: bold }
219 | li li { font-size: 85%; font-weight: normal }
220 | li li li { font-size: 85%; font-weight: normal }
221 | strong { color: red; }
222 | li li strong { color: black; }
223 | /* pandoc's rules about when to insert paragraphs don't interact well
224 | with the requirement for blank lines around code blocks. Let's just
225 | neutralize the effects of p in bullets. */
226 | li > p { margin: 0em; }
227 |
228 | div dt
229 | {
230 | margin-left: 0;
231 | margin-top: 1em;
232 | margin-bottom: 0.5em;
233 | font-weight: bold;
234 | }
235 | div dd
236 | {
237 | margin-left: 2em;
238 | margin-bottom: 0.5em;
239 | }
240 |
241 |
242 | p,pre,ul,ol,blockquote,h2,h3,h4,h5,h6,dl,table {
243 | margin-left: 1em;
244 | margin-right: 1em;
245 | }
246 |
247 | p.subhead { font-weight: bold; margin-top: 2em; }
248 |
249 | .smaller { font-size: smaller }
250 | .bigger { font-size: 130% }
251 |
252 | td,th { padding: 0.2em }
253 |
254 | ul {
255 | margin: 0.5em 1.5em 0.5em 1.5em;
256 | padding: 0;
257 | }
258 |
259 | ol {
260 | margin: 0.5em 1.5em 0.5em 1.5em;
261 | padding: 0;
262 | }
263 |
264 | ul { list-style-type: square; }
265 | ul ul { list-style-type: disc; }
266 | ul ul ul { list-style-type: circle; }
267 | ul ul ul ul { list-style-type: disc; }
268 |
269 | ul li {
270 | list-style: square;
271 | margin: 0.1em 0em 0.6em 0;
272 | padding: 0 0 0 0;
273 | line-height: 140%;
274 | }
275 |
276 | ol li {
277 | margin: 0.1em 0em 0.6em 1.5em;
278 | padding: 0 0 0 0px;
279 | line-height: 140%;
280 | list-style-type: decimal;
281 | }
282 |
283 | li ul li {
284 | font-size: 85%;
285 | font-style: normal;
286 | list-style-type: disc;
287 | background: transparent;
288 | padding: 0 0 0 0;
289 | }
290 | li li ul li {
291 | font-size: 85%;
292 | font-style: normal;
293 | list-style-type: circle;
294 | background: transparent;
295 | padding: 0 0 0 0;
296 | }
297 | li li li ul li {
298 | list-style-type: disc;
299 | background: transparent;
300 | padding: 0 0 0 0;
301 | }
302 |
303 | li ol li {
304 | list-style-type: decimal;
305 | }
306 |
307 |
308 | li li ol li {
309 | list-style-type: decimal;
310 | }
311 |
312 | /*
313 | setting class="outline on ol or ul makes it behave as an
314 | ouline list where blocklevel content in li elements is
315 | hidden by default and can be expanded or collapsed with
316 | mouse click. Set class="expand" on li to override default
317 | */
318 |
319 | ol.outline li:hover { cursor: pointer }
320 | ol.outline li.nofold:hover { cursor: default }
321 |
322 | ul.outline li:hover { cursor: pointer }
323 | ul.outline li.nofold:hover { cursor: default }
324 |
325 | ol.outline { list-style:decimal; }
326 | ol.outline ol { list-style-type:lower-alpha }
327 |
328 | ol.outline li.nofold {
329 | padding: 0 0 0 20px;
330 | background: transparent url(../graphics/nofold-dim.gif) no-repeat 0px 0.5em;
331 | }
332 | ol.outline li.unfolded {
333 | padding: 0 0 0 20px;
334 | background: transparent url(../graphics/fold-dim.gif) no-repeat 0px 0.5em;
335 | }
336 | ol.outline li.folded {
337 | padding: 0 0 0 20px;
338 | background: transparent url(../graphics/unfold-dim.gif) no-repeat 0px 0.5em;
339 | }
340 | ol.outline li.unfolded:hover {
341 | padding: 0 0 0 20px;
342 | background: transparent url(../graphics/fold.gif) no-repeat 0px 0.5em;
343 | }
344 | ol.outline li.folded:hover {
345 | padding: 0 0 0 20px;
346 | background: transparent url(../graphics/unfold.gif) no-repeat 0px 0.5em;
347 | }
348 |
349 | ul.outline li.nofold {
350 | padding: 0 0 0 20px;
351 | background: transparent url(../graphics/nofold-dim.gif) no-repeat 0px 0.5em;
352 | }
353 | ul.outline li.unfolded {
354 | padding: 0 0 0 20px;
355 | background: transparent url(../graphics/fold-dim.gif) no-repeat 0px 0.5em;
356 | }
357 | ul.outline li.folded {
358 | padding: 0 0 0 20px;
359 | background: transparent url(../graphics/unfold-dim.gif) no-repeat 0px 0.5em;
360 | }
361 | ul.outline li.unfolded:hover {
362 | padding: 0 0 0 20px;
363 | background: transparent url(../graphics/fold.gif) no-repeat 0px 0.5em;
364 | }
365 | ul.outline li.folded:hover {
366 | padding: 0 0 0 20px;
367 | background: transparent url(../graphics/unfold.gif) no-repeat 0px 0.5em;
368 | }
369 |
370 | /* for slides with class "title" in table of contents */
371 | a.titleslide { font-weight: bold; font-style: italic }
372 |
373 | /*
374 | hide images for work around for save as bug
375 | where browsers fail to save images used by CSS
376 | */
377 | img.hidden { display: none; visibility: hidden }
378 | div.initial_prompt { display: none; visibility: hidden }
379 |
380 | div.slide {
381 | visibility: visible;
382 | position: inherit;
383 | }
384 | div.handout {
385 | border-top-style: solid;
386 | border-top-width: thin;
387 | border-top-color: black;
388 | }
389 |
390 | @media screen {
391 | .hidden { display: none; visibility: visible }
392 |
393 | div.slide.hidden { display: block; visibility: visible }
394 | div.handout.hidden { display: block; visibility: visible }
395 | div.background { display: none; visibility: hidden }
396 | body.single_slide div.initial_prompt { display: block; visibility: visible }
397 | body.single_slide div.background { display: block; visibility: visible }
398 | body.single_slide div.background.hidden { display: none; visibility: hidden }
399 | body.single_slide .invisible { visibility: hidden }
400 | body.single_slide .hidden { display: none; visibility: hidden }
401 | body.single_slide div.slide { position: absolute }
402 | body.single_slide div.handout { display: none; visibility: hidden }
403 | }
404 |
405 | @media print {
406 | .hidden { display: block; visibility: visible }
407 |
408 | div.slide pre { font-size: 60%; padding-left: 0.5em; }
409 | div.toolbar { display: none; visibility: hidden; }
410 | div.slidy_toc { display: none; visibility: hidden; }
411 | div.background { display: none; visibility: hidden; }
412 | div.slide { page-break-before: always }
413 | /* :first-child isn't reliable for print media */
414 | div.slide.first-slide { page-break-before: avoid }
415 | }
416 |
417 |
--------------------------------------------------------------------------------
/bayhac-2011/Makefile:
--------------------------------------------------------------------------------
1 | TARGET = slides.pdf
2 |
3 | all: $(TARGET)
4 |
5 | clean:
6 | rm -f *.aux *.dvi *.log *.nav *.out *.snm *.toc *.pdf *.vrb *.hi *.o *.prof
7 |
8 | distclean: clean
9 | rm -f $(TARGET)
10 |
11 | pdf: all
12 | open $(TARGET)
13 |
14 | %.pdf: %.tex
15 | pdflatex -file-line-error $<
16 |
17 | .PHONY: all clean distclean pdf
18 |
--------------------------------------------------------------------------------
/bayhac-2011/slides.tex:
--------------------------------------------------------------------------------
1 | \documentclass{beamer}
2 | \usepackage{listings}
3 | % \usepackage{pgfpages}
4 | % \pgfpagesuselayout{4 on 1}[a4paper,border shrink=5mm,landscape]
5 |
6 | \title{Reasoning about laziness}
7 | \author{Johan Tibell\\johan.tibell@gmail.com}
8 | \date{2011-02-12}
9 |
10 | \begin{document}
11 | \lstset{language=Haskell}
12 |
13 | \frame{\titlepage}
14 |
15 | \begin{frame}[fragile]
16 | \frametitle{Laziness}
17 |
18 | \begin{itemize}
19 | \item Haskell is a lazy language
20 | \item Functions and data constructors don't evaluate their arguments
21 | until they need them
22 | \begin{lstlisting}
23 | cond :: Bool -> a -> a -> a
24 | cond True t e = t
25 | cond False t e = e
26 | \end{lstlisting}
27 | \item Same with local definitions
28 | \begin{lstlisting}
29 | abs :: Int -> Int
30 | abs x | x > 0 = x
31 | | otherwise = neg_x
32 | where neg_x = negate x
33 | \end{lstlisting}
34 | \end{itemize}
35 | \end{frame}
36 |
37 | \begin{frame}[fragile]
38 | \frametitle{Why laziness is important}
39 |
40 | \begin{itemize}
41 | \item Laziness supports \emph{modular programming}
42 | \item Programmer-written functions instead of built-in language
43 | constructs
44 | \begin{lstlisting}
45 | (||) :: Bool -> Bool -> Bool
46 | True || _ = True
47 | False || x = x
48 | \end{lstlisting}
49 | \end{itemize}
50 | \end{frame}
51 |
52 | \begin{frame}[fragile]
53 | \frametitle{Laziness and modularity}
54 |
55 | Laziness lets us separate producers and consumers and still get
56 | efficient execution:
57 | \begin{itemize}
58 | \item Generate all solutions (a huge tree structure)
59 | \item Find the solution(s) you want
60 | \end{itemize}
61 |
62 | \begin{lstlisting}
63 | nextMove :: Board -> Move
64 | nextMove b = selectMove allMoves
65 | where
66 | allMoves = allMovesFrom b
67 | \end{lstlisting}
68 |
69 | The solutions are generated as they are consumed.
70 | \end{frame}
71 |
72 | \begin{frame}[fragile]
73 | \frametitle{Example: summing some numbers}
74 | \begin{lstlisting}
75 | sum :: [Int] -> Int
76 | sum xs = sum' 0 xs
77 | where
78 | sum' acc [] = acc
79 | sum' acc (x:xs) = sum' (acc + x) xs
80 | \end{lstlisting}
81 |
82 | \lstinline!foldl! abstracts the accumulator recursion pattern:
83 |
84 | \begin{lstlisting}
85 | foldl :: (a -> b -> a) -> a -> [b] -> a
86 | foldl f z [] = z
87 | foldl f z (x:xs) = foldl f (f z x) xs
88 |
89 | sum = foldl (+) 0
90 | \end{lstlisting}
91 | \end{frame}
92 |
93 | \begin{frame}[fragile]
94 | \frametitle{A misbehaving function}
95 | How does evaluation of this expression proceed?
96 | \begin{lstlisting}
97 | sum [1,2,3]
98 | \end{lstlisting}
99 |
100 | Like this:
101 | \begin{verbatim}
102 | sum [1,2,3]
103 | ==> foldl (+) 0 [1,2,3]
104 | ==> foldl (+) (0+1) [2,3]
105 | ==> foldl (+) ((0+1)+2) [3]
106 | ==> foldl (+) (((0+1)+2)+3) []
107 | ==> ((0+1)+2)+3
108 | ==> (1+2)+3
109 | ==> 3+3
110 | ==> 6
111 | \end{verbatim}
112 | \end{frame}
113 |
114 | \begin{frame}
115 | \frametitle{Thunks}
116 |
117 | A \emph{thunk} represents an unevaluated expression.
118 |
119 | \begin{itemize}
120 | \item GHC needs to store all the unevaluated \lstinline!+! expressions
121 | on the heap, until their value is needed.
122 | \item Storing and evaluating thunks is costly, and unnecessary if the
123 | expression was going to be evaluated anyway.
124 | \item \lstinline!foldl! allocates \emph{n} thunks, one for each
125 | addition, causing a stack overflow when GHC tries to evaluate the
126 | chain of thunks.
127 | \end{itemize}
128 | \end{frame}
129 |
130 | \begin{frame}[fragile]
131 | \frametitle{Controlling evaluation order}
132 |
133 | The \lstinline!seq! function lets us control evaluation order.
134 |
135 | \begin{lstlisting}
136 | seq :: a -> b -> b
137 | \end{lstlisting}
138 |
139 | Informally, when evaluated, the expression \lstinline!seq a b!
140 | evaluates \lstinline!a! and then returns \lstinline!b!.
141 | \end{frame}
142 |
143 | \begin{frame}[fragile]
144 | \frametitle{Weak head normal form}
145 |
146 | Evaluation stops as soon as a data constructor (or lambda) is
147 | reached:
148 | \begin{verbatim}
149 | ghci> seq (1 `div` 0) 2
150 | *** Exception: divide by zero
151 | ghci> seq ((1 `div` 0), 3) 2
152 | 2
153 | \end{verbatim}
154 | We say that \lstinline!seq! evaluates to \emph{weak head normal
155 | form} (WHNF).
156 | \end{frame}
157 |
158 | \begin{frame}[fragile]
159 | \frametitle{Weak head normal form}
160 |
161 | Forcing the evaluation of an expression using \lstinline!seq! only
162 | makes sense if the result of that expression is used later:
163 | \begin{lstlisting}
164 | let x = 1 + 2 in seq x (f x)
165 | \end{lstlisting}
166 |
167 | The expression
168 | \begin{lstlisting}
169 | print (seq (1 + 2) 3)
170 | \end{lstlisting}
171 | doesn't make sense as the result of \lstinline!1+2! is never used.
172 | \end{frame}
173 |
174 | \begin{frame}[fragile]
175 | \frametitle{Exercise}
176 |
177 | Rewrite the expression
178 | \begin{lstlisting}
179 | (1 + 2, 'a')
180 | \end{lstlisting}
181 | so that the first component of the pair is evaluated before the pair
182 | is created.
183 | \end{frame}
184 |
185 | \begin{frame}[fragile]
186 | \frametitle{Solution}
187 |
188 | Rewrite the expression as
189 | \begin{lstlisting}
190 | let x = 1 + 2 in seq x (x, 'a')
191 | \end{lstlisting}
192 | \end{frame}
193 |
194 | \begin{frame}[fragile]
195 | \frametitle{A strict left fold}
196 |
197 | We want to evaluate the expression \lstinline!f z x! \emph{before}
198 | evaluating the recursive call:
199 |
200 | \begin{lstlisting}
201 | foldl' :: (a -> b -> a) -> a -> [b] -> a
202 | foldl' f z [] = z
203 | foldl' f z (x:xs) = let z' = f z x
204 | in seq z' (foldl' f z' xs)
205 | \end{lstlisting}
206 | \end{frame}
207 |
208 | \begin{frame}[fragile]
209 | \frametitle{Summing numbers, attempt 2}
210 |
211 | How does evaluation of this expression proceed?
212 | \begin{verbatim}
213 | foldl' (+) 0 [1,2,3]
214 | \end{verbatim}
215 |
216 | Like this:
217 | \begin{verbatim}
218 | foldl' (+) 0 [1,2,3]
219 | ==> foldl' (+) 1 [2,3]
220 | ==> foldl' (+) 3 [3]
221 | ==> foldl' (+) 6 []
222 | ==> 6
223 | \end{verbatim}
224 |
225 | Sanity check:
226 | \begin{verbatim}
227 | ghci> print (foldl' (+) 0 [1..1000000])
228 | 500000500000
229 | \end{verbatim}
230 | \end{frame}
231 |
232 | \begin{frame}[fragile]
233 | \frametitle{Computing the mean}
234 |
235 | A function that computes the mean of a list of numbers:
236 | \begin{lstlisting}
237 | mean :: [Double] -> Double
238 | mean xs = s / fromIntegral l
239 | where
240 | (s, l) = foldl' step (0, 0) xs
241 | step (s, l) a = (s+a, l+1)
242 | \end{lstlisting}
243 | We compute the length of the list and the sum of the numbers in one
244 | pass.
245 |
246 | \begin{verbatim}
247 | $ ./Mean
248 | Stack space overflow: current size 8388608 bytes.
249 | Use `+RTS -Ksize -RTS' to increase it.
250 | \end{verbatim}
251 | Didn't we just fix that problem?!?
252 | \end{frame}
253 |
254 | \begin{frame}[fragile]
255 | \frametitle{seq and data constructors}
256 |
257 | Remember:
258 | \begin{itemize}
259 | \item Data constructors don't evaluate their arguments when
260 | created
261 | \item \lstinline!seq! only evaluates to the outermost data
262 | constructor, but doesn't evaluate its arguments
263 | \end{itemize}
264 |
265 | Problem: \lstinline!foldl'! forces the evaluation of the pair
266 | constructor, but not its arguments, causing unevaluated thunks to
267 | build up inside the pair:
268 |
269 | \begin{verbatim}
270 | (0.0 + 1.0 + 2.0 + 3.0, 0 + 1 + 1 + 1)
271 | \end{verbatim}
272 | \end{frame}
273 |
274 | \begin{frame}[fragile]
275 | \frametitle{Forcing evaluation of constructor arguments}
276 |
277 | We can force GHC to evaluate the constructor arguments before the
278 | constructor is created:
279 |
280 | \begin{lstlisting}
281 | mean :: [Double] -> Double
282 | mean xs = s / fromIntegral l
283 | where
284 | (s, l) = foldl' step (0, 0) xs
285 | step (s, l) a = let s' = s + a
286 | l' = l + 1
287 | in seq s' (seq l' (s', l'))
288 | \end{lstlisting}
289 | \end{frame}
290 |
291 | \begin{frame}[fragile]
292 | \frametitle{Bang patterns}
293 |
294 | A \emph{bang pattern} is a concise way to express that an argument
295 | should be evaluated.
296 |
297 | \begin{lstlisting}
298 | {-# LANGUAGE BangPatterns #-}
299 |
300 | mean :: [Double] -> Double
301 | mean xs = s / fromIntegral l
302 | where
303 | (s, l) = foldl' step (0, 0) xs
304 | step (!s, !l) a = (s + a, l + 1)
305 | \end{lstlisting}
306 |
307 | \lstinline!s! and \lstinline!l! are evaluated before the right-hand
308 | side of \lstinline!step! is evaluated.
309 | \end{frame}
310 |
311 | \begin{frame}[fragile]
312 | \frametitle{Strictness}
313 |
314 | We say that a function is \emph{strict} in an argument if
315 | evaluating the function always causes the argument to be evaluated.
316 |
317 | \begin{lstlisting}
318 | null :: [a] -> Bool
319 | null [] = True
320 | null _ = False
321 | \end{lstlisting}
322 |
323 | \lstinline!null! is strict in its first (and only) argument, as it
324 | needs to be evaluated to pick a return value.
325 | \end{frame}
326 |
327 | \begin{frame}[fragile]
328 | \frametitle{Strictness - Example}
329 |
330 | \lstinline!cond! is strict in the first argument, but not in the
331 | second and third argument:
332 | \begin{lstlisting}
333 | cond :: Bool -> a -> a -> a
334 | cond True t e = t
335 | cond False t e = e
336 | \end{lstlisting}
337 | Reason: Each of the two branches evaluates only one of the last two
338 | arguments to \lstinline!cond!.
339 | \end{frame}
340 |
341 | \begin{frame}[fragile]
342 | \frametitle{Strict data types}
343 |
344 | Haskell lets us say that we always want the arguments of a
345 | constructor to be evaluated:
346 |
347 | \begin{lstlisting}
348 | data PairS a b = PS !a !b
349 | \end{lstlisting}
350 |
351 | When a \lstinline!PairS! is evaluated, its arguments are evaluated.
352 | \end{frame}
353 |
354 | \begin{frame}[fragile]
355 | \frametitle{Strict pairs as accumulators}
356 |
357 | We can use a strict pair to simplify our \lstinline!mean! function:
358 |
359 | \begin{lstlisting}
360 | mean :: [Double] -> Double
361 | mean xs = s / fromIntegral l
362 | where
363 | PS s l = foldl' step (PS 0 0) xs
364 | step (PS s l) a = PS (s + a) (l + 1)
365 | \end{lstlisting}
366 |
367 | Tip: Prefer strict data types when laziness is not needed for your
368 | program to work correctly.
369 |
370 | \end{frame}
371 |
372 | \begin{frame}[fragile]
373 | \frametitle{Reasoning about laziness}
374 |
375 | A function application is only evaluated if its result is needed,
376 | therefore:
377 | \begin{itemize}
378 | \item One of the function's right-hand sides will be evaluated.
379 | \item Any expression whose value is required to decide which RHS to
380 | evaluate, must be evaluated.
381 | \end{itemize}
382 | By using this ``backward-to-front'' analysis we can figure out which
383 | arguments a function is strict in.
384 | \end{frame}
385 |
386 | \begin{frame}[fragile]
387 | \frametitle{Reasoning about laziness: example}
388 |
389 | \begin{lstlisting}
390 | max :: Int -> Int -> Int
391 | max x y
392 | | x > y = x
393 | | x < y = y
394 | | otherwise = x -- arbitrary
395 | \end{lstlisting}
396 |
397 | \begin{itemize}
398 | \item To pick one of the three RHS, we must evaluate \lstinline!x > y!.
399 | \item Therefore we must evaluate \emph{both} \lstinline!x! and
400 | \lstinline!y!.
401 | \item Therefore \lstinline!max! is strict in both \lstinline!x! and
402 | \lstinline!y!.
403 | \end{itemize}
404 | \end{frame}
405 |
406 | \begin{frame}[fragile]
407 | \frametitle{Poll}
408 |
409 | \begin{lstlisting}
410 | data BST = Leaf | Node Int BST BST
411 |
412 | insert :: Int -> BST -> BST
413 | insert x Leaf = Node x Leaf Leaf
414 | insert x (Node x' l r)
415 | | x < x' = Node x' (insert x l) r
416 | | x > x' = Node x' l (insert x r)
417 | | otherwise = Node x l r
418 | \end{lstlisting}
419 |
420 | Which arguments is \lstinline!insert! strict in?
421 |
422 | \begin{itemize}
423 | \item None
424 | \item 1st
425 | \item 2nd
426 | \item Both
427 | \end{itemize}
428 | \end{frame}
429 |
430 | \begin{frame}[fragile]
431 | \frametitle{Solution}
432 |
433 | Only the second, as inserting into an empty tree can be done without
434 | comparing the value being inserted. For example, this expression
435 | \begin{lstlisting}
436 | insert (1 `div` 0) Leaf
437 | \end{lstlisting}
438 | does not raise a division-by-zero exception, but
439 | \begin{lstlisting}
440 | insert (1 `div` 0) (Node 2 Leaf Leaf)
441 | \end{lstlisting}
442 | does.
443 | \end{frame}
444 |
445 | \begin{frame}[fragile]
446 | \frametitle{Some other things worth pointing out}
447 |
448 | \begin{itemize}
449 | \item \lstinline!insert x l! is not evaluated before the
450 | \lstinline!Node! is created, so it's stored as a thunk.
451 | \item Most tree-based data structures use strict sub-trees:
452 | \begin{lstlisting}
453 | data Set a = Tip
454 | | Bin !Size a !(Set a) !(Set a)
455 | \end{lstlisting}
456 | \end{itemize}
457 | \end{frame}
458 |
459 | \begin{frame}[fragile]
460 | \frametitle{Strict function arguments are great for performance}
461 |
462 | \begin{itemize}
463 | \item Strict arguments can often be passed as \emph{unboxed} values
464 | (e.g. a machine integer in a register instead of a pointer to an
465 | integer on the heap).
466 | \item The compiler can often infer which arguments are strict, but
467 |   sometimes needs a little help (like in the case of
468 |   \lstinline!insert! a few slides back), as sketched below.
469 | \end{itemize}
470 | \end{frame}
471 |
472 | \begin{frame}[fragile]
473 | \frametitle{Summary}
474 |
475 | Understanding how evaluation works in Haskell is important and
476 | requires practice.
477 | \end{frame}
478 |
479 | \end{document}
480 |
481 | %%% Local Variables:
482 | %%% mode: latex
483 | %%% TeX-master: t
484 | %%% TeX-PDF-mode: t
485 | %%% End:
486 |
--------------------------------------------------------------------------------
/cufp-2010/Makefile:
--------------------------------------------------------------------------------
1 | TARGET = slides.pdf
2 |
3 | all: $(TARGET)
4 |
5 | clean:
6 | rm -f *.aux *.dvi *.log *.nav *.out *.snm *.toc *.pdf *.vrb *.hi *.o *.prof
7 |
8 | distclean: clean
9 | rm -f $(TARGET)
10 |
11 | pdf: all
12 | open $(TARGET)
13 |
14 | %.pdf: %.tex
15 | pdflatex -file-line-error $<
16 |
17 | .PHONY: all clean distclean pdf
18 |
--------------------------------------------------------------------------------
/cufp-2010/README:
--------------------------------------------------------------------------------
1 | Prerequisites
2 | -------------
3 |
4 | You need TeX Live and the beamer class to build:
5 |
6 | sudo apt-get install texlive latex-beamer
7 |
8 |
9 | Building
10 | --------
11 |
12 | To build the presentation PDF run:
13 |
14 | make pdf
15 |
--------------------------------------------------------------------------------
/cufp-2010/diagrams/SpaceLeak-eps-converted-to.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tibbe/talks/a90ef9cfd75cabc8569416578e4cc1407b04b0d5/cufp-2010/diagrams/SpaceLeak-eps-converted-to.pdf
--------------------------------------------------------------------------------
/cufp-2010/diagrams/SpaceLeak.eps:
--------------------------------------------------------------------------------
1 | %!PS-Adobe-2.0
2 | %%Title: SpaceLeak 1e7 +RTS -p -K400M -hy
3 | %%Creator: hp2ps (version 0.25)
4 | %%CreationDate: Sat Sep 25 14:27 2010
5 | %%BoundingBox: 0 0 288 192
6 | %%EndComments
7 | 0.444444 0.444444 scale
8 | /HE10 /Helvetica findfont 10 scalefont def
9 | /HE12 /Helvetica findfont 12 scalefont def
10 | newpath
11 | 0 0 moveto
12 | 0 432.000000 rlineto
13 | 648.000000 0 rlineto
14 | 0 -432.000000 rlineto
15 | closepath
16 | 0.500000 setlinewidth
17 | stroke
18 | newpath
19 | 5.000000 407.000000 moveto
20 | 0 20.000000 rlineto
21 | 638.000000 0 rlineto
22 | 0 -20.000000 rlineto
23 | closepath
24 | 0.500000 setlinewidth
25 | stroke
26 | HE12 setfont
27 | 11.000000 413.000000 moveto
28 | (SpaceLeak 1e7 +RTS -p -K400M -hy) show
29 | HE12 setfont
30 | (8,896,936,338 bytes x seconds)
31 | dup stringwidth pop
32 | 2 div
33 | 319.000000
34 | exch sub
35 | 413.000000 moveto
36 | show
37 | HE12 setfont
38 | (Sat Sep 25 14:27 2010)
39 | dup stringwidth pop
40 | 637.000000
41 | exch sub
42 | 413.000000 moveto
43 | show
44 | 45.000000 20.000000 moveto
45 | 487.464558 0 rlineto
46 | 0.500000 setlinewidth
47 | stroke
48 | HE10 setfont
49 | (seconds)
50 | dup stringwidth pop
51 | 532.464558
52 | exch sub
53 | 5.000000 moveto
54 | show
55 | 45.000000 20.000000 moveto
56 | 0 -4 rlineto
57 | stroke
58 | HE10 setfont
59 | (0.0)
60 | dup stringwidth pop
61 | 2 div
62 | 45.000000 exch sub
63 | 5.000000 moveto
64 | show
65 | 92.207492 20.000000 moveto
66 | 0 -4 rlineto
67 | stroke
68 | HE10 setfont
69 | (5.0)
70 | dup stringwidth pop
71 | 2 div
72 | 92.207492 exch sub
73 | 5.000000 moveto
74 | show
75 | 139.414983 20.000000 moveto
76 | 0 -4 rlineto
77 | stroke
78 | HE10 setfont
79 | (10.0)
80 | dup stringwidth pop
81 | 2 div
82 | 139.414983 exch sub
83 | 5.000000 moveto
84 | show
85 | 186.622475 20.000000 moveto
86 | 0 -4 rlineto
87 | stroke
88 | HE10 setfont
89 | (15.0)
90 | dup stringwidth pop
91 | 2 div
92 | 186.622475 exch sub
93 | 5.000000 moveto
94 | show
95 | 233.829966 20.000000 moveto
96 | 0 -4 rlineto
97 | stroke
98 | HE10 setfont
99 | (20.0)
100 | dup stringwidth pop
101 | 2 div
102 | 233.829966 exch sub
103 | 5.000000 moveto
104 | show
105 | 281.037458 20.000000 moveto
106 | 0 -4 rlineto
107 | stroke
108 | HE10 setfont
109 | (25.0)
110 | dup stringwidth pop
111 | 2 div
112 | 281.037458 exch sub
113 | 5.000000 moveto
114 | show
115 | 328.244949 20.000000 moveto
116 | 0 -4 rlineto
117 | stroke
118 | HE10 setfont
119 | (30.0)
120 | dup stringwidth pop
121 | 2 div
122 | 328.244949 exch sub
123 | 5.000000 moveto
124 | show
125 | 375.452441 20.000000 moveto
126 | 0 -4 rlineto
127 | stroke
128 | HE10 setfont
129 | (35.0)
130 | dup stringwidth pop
131 | 2 div
132 | 375.452441 exch sub
133 | 5.000000 moveto
134 | show
135 | 422.659933 20.000000 moveto
136 | 0 -4 rlineto
137 | stroke
138 | HE10 setfont
139 | (40.0)
140 | dup stringwidth pop
141 | 2 div
142 | 422.659933 exch sub
143 | 5.000000 moveto
144 | show
145 | 469.867424 20.000000 moveto
146 | 0 -4 rlineto
147 | stroke
148 | HE10 setfont
149 | (45.0)
150 | dup stringwidth pop
151 | 2 div
152 | 469.867424 exch sub
153 | 5.000000 moveto
154 | show
155 | 45.000000 20.000000 moveto
156 | 0 382.000000 rlineto
157 | 0.500000 setlinewidth
158 | stroke
159 | gsave
160 | HE10 setfont
161 | (bytes)
162 | dup stringwidth pop
163 | 402.000000
164 | exch sub
165 | 40.000000 exch
166 | translate
167 | 90 rotate
168 | 0 0 moveto
169 | show
170 | grestore
171 | 45.000000 20.000000 moveto
172 | -4 0 rlineto
173 | stroke
174 | HE10 setfont
175 | (0M)
176 | dup stringwidth
177 | 2 div
178 | 20.000000 exch sub
179 | exch
180 | 40.000000 exch sub
181 | exch
182 | moveto
183 | show
184 | 45.000000 66.444575 moveto
185 | -4 0 rlineto
186 | stroke
187 | HE10 setfont
188 | (50M)
189 | dup stringwidth
190 | 2 div
191 | 66.444575 exch sub
192 | exch
193 | 40.000000 exch sub
194 | exch
195 | moveto
196 | show
197 | 45.000000 112.889151 moveto
198 | -4 0 rlineto
199 | stroke
200 | HE10 setfont
201 | (100M)
202 | dup stringwidth
203 | 2 div
204 | 112.889151 exch sub
205 | exch
206 | 40.000000 exch sub
207 | exch
208 | moveto
209 | show
210 | 45.000000 159.333726 moveto
211 | -4 0 rlineto
212 | stroke
213 | HE10 setfont
214 | (150M)
215 | dup stringwidth
216 | 2 div
217 | 159.333726 exch sub
218 | exch
219 | 40.000000 exch sub
220 | exch
221 | moveto
222 | show
223 | 45.000000 205.778301 moveto
224 | -4 0 rlineto
225 | stroke
226 | HE10 setfont
227 | (200M)
228 | dup stringwidth
229 | 2 div
230 | 205.778301 exch sub
231 | exch
232 | 40.000000 exch sub
233 | exch
234 | moveto
235 | show
236 | 45.000000 252.222877 moveto
237 | -4 0 rlineto
238 | stroke
239 | HE10 setfont
240 | (250M)
241 | dup stringwidth
242 | 2 div
243 | 252.222877 exch sub
244 | exch
245 | 40.000000 exch sub
246 | exch
247 | moveto
248 | show
249 | 45.000000 298.667452 moveto
250 | -4 0 rlineto
251 | stroke
252 | HE10 setfont
253 | (300M)
254 | dup stringwidth
255 | 2 div
256 | 298.667452 exch sub
257 | exch
258 | 40.000000 exch sub
259 | exch
260 | moveto
261 | show
262 | 45.000000 345.112027 moveto
263 | -4 0 rlineto
264 | stroke
265 | HE10 setfont
266 | (350M)
267 | dup stringwidth
268 | 2 div
269 | 345.112027 exch sub
270 | exch
271 | 40.000000 exch sub
272 | exch
273 | moveto
274 | show
275 | 537.464558 89.400000 moveto
276 | 0 14 rlineto
277 | 14 0 rlineto
278 | 0 -14 rlineto
279 | closepath
280 | gsave
281 | 0.000000 0.000000 0.000000 setrgbcolor
282 | fill
283 | grestore
284 | stroke
285 | HE10 setfont
286 | 556.464558 91.400000 moveto
287 | (BLACKHOLE) show
288 | 537.464558 165.800000 moveto
289 | 0 14 rlineto
290 | 14 0 rlineto
291 | 0 -14 rlineto
292 | closepath
293 | gsave
294 | 0.000000 0.000000 1.000000 setrgbcolor
295 | fill
296 | grestore
297 | stroke
298 | HE10 setfont
299 | 556.464558 167.800000 moveto
300 | (*) show
301 | 537.464558 242.200000 moveto
302 | 0 14 rlineto
303 | 14 0 rlineto
304 | 0 -14 rlineto
305 | closepath
306 | gsave
307 | 0.000000 1.000000 0.000000 setrgbcolor
308 | fill
309 | grestore
310 | stroke
311 | HE10 setfont
312 | 556.464558 244.200000 moveto
313 | (Double) show
314 | 537.464558 318.600000 moveto
315 | 0 14 rlineto
316 | 14 0 rlineto
317 | 0 -14 rlineto
318 | closepath
319 | gsave
320 | 0.000000 1.000000 1.000000 setrgbcolor
321 | fill
322 | grestore
323 | stroke
324 | HE10 setfont
325 | 556.464558 320.600000 moveto
326 | ([]) show
327 | 45.000000 20.000000 moveto
328 | 45.000000 20.000000 lineto
329 | 94.095791 20.000000 lineto
330 | 107.125059 20.000000 lineto
331 | 532.464558 20.000000 lineto
332 | 532.464558 20.000000 lineto
333 | 532.464558 20.000000 lineto
334 | 107.125059 91.239870 lineto
335 | 94.095791 37.808284 lineto
336 | 45.000000 20.000000 lineto
337 | closepath
338 | gsave
339 | 0.000000 0.000000 0.000000 setrgbcolor
340 | fill
341 | grestore
342 | stroke
343 | 45.000000 20.000000 moveto
344 | 45.000000 20.000000 lineto
345 | 94.095791 37.808284 lineto
346 | 107.125059 91.239870 lineto
347 | 532.464558 20.000000 lineto
348 | 532.464558 20.000000 lineto
349 | 532.464558 20.000000 lineto
350 | 107.125059 98.918526 lineto
351 | 94.095791 179.065905 lineto
352 | 45.000000 20.000000 lineto
353 | closepath
354 | gsave
355 | 0.000000 0.000000 1.000000 setrgbcolor
356 | fill
357 | grestore
358 | stroke
359 | 45.000000 20.000000 moveto
360 | 45.000000 20.000000 lineto
361 | 94.095791 179.065905 lineto
362 | 107.125059 98.918526 lineto
363 | 532.464558 20.000000 lineto
364 | 532.464558 20.000000 lineto
365 | 532.464558 20.000000 lineto
366 | 107.125059 210.385496 lineto
367 | 94.095791 290.532874 lineto
368 | 45.000000 20.000000 lineto
369 | closepath
370 | gsave
371 | 0.000000 1.000000 0.000000 setrgbcolor
372 | fill
373 | grestore
374 | stroke
375 | 45.000000 20.000000 moveto
376 | 45.000000 20.000000 lineto
377 | 94.095791 290.532874 lineto
378 | 107.125059 210.385496 lineto
379 | 532.464558 20.000000 lineto
380 | 532.464558 20.000000 lineto
381 | 532.464558 20.000000 lineto
382 | 107.125059 321.852621 lineto
383 | 94.095791 402.000000 lineto
384 | 45.000000 20.000000 lineto
385 | closepath
386 | gsave
387 | 0.000000 1.000000 1.000000 setrgbcolor
388 | fill
389 | grestore
390 | stroke
391 | showpage
392 |
--------------------------------------------------------------------------------
/cufp-2010/diagrams/SpaceLeak.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tibbe/talks/a90ef9cfd75cabc8569416578e4cc1407b04b0d5/cufp-2010/diagrams/SpaceLeak.pdf
--------------------------------------------------------------------------------
/cufp-2010/diagrams/intpair-unpacked.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tibbe/talks/a90ef9cfd75cabc8569416578e4cc1407b04b0d5/cufp-2010/diagrams/intpair-unpacked.pdf
--------------------------------------------------------------------------------
/cufp-2010/diagrams/intpair-unpacked.svg:
--------------------------------------------------------------------------------
1 |
2 |
148 |
--------------------------------------------------------------------------------
/cufp-2010/diagrams/intpair.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tibbe/talks/a90ef9cfd75cabc8569416578e4cc1407b04b0d5/cufp-2010/diagrams/intpair.pdf
--------------------------------------------------------------------------------
/cufp-2010/diagrams/intpair.svg:
--------------------------------------------------------------------------------
1 |
2 |
206 |
--------------------------------------------------------------------------------
/cufp-2010/diagrams/list12.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tibbe/talks/a90ef9cfd75cabc8569416578e4cc1407b04b0d5/cufp-2010/diagrams/list12.pdf
--------------------------------------------------------------------------------
/cufp-2010/diagrams/list12.svg:
--------------------------------------------------------------------------------
1 |
2 |
267 |
--------------------------------------------------------------------------------
/galois-2011/Makefile:
--------------------------------------------------------------------------------
1 | TARGET = slides.pdf
2 |
3 | all: $(TARGET)
4 |
5 | clean:
6 | rm -f *.aux *.dvi *.log *.nav *.out *.snm *.toc *.pdf *.vrb *.hi *.o *.prof
7 |
8 | distclean: clean
9 | rm -f $(TARGET)
10 |
11 | pdf: all
12 | open $(TARGET)
13 |
14 | %.pdf: %.tex
15 | pdflatex -file-line-error $<
16 |
17 | .PHONY: all clean distclean pdf
18 |
--------------------------------------------------------------------------------
/galois-2011/slides.tex:
--------------------------------------------------------------------------------
1 | \documentclass{beamer}
2 | \usepackage{listings}
3 | % \usepackage{pgfpages}
4 | % \pgfpagesuselayout{4 on 1}[a4paper,border shrink=5mm,landscape]
5 |
6 | \title{Faster persistent data structures through hashing}
7 | \author{Johan Tibell\\johan.tibell@gmail.com}
8 | \date{2011-02-15}
9 |
10 | \begin{document}
11 | \lstset{language=Haskell}
12 |
13 | \frame{\titlepage}
14 |
15 | \begin{frame}
16 | \frametitle{Motivating problem: Twitter data analysis}
17 |
18 | ``I'm computing a communication graph from Twitter data and then
19 | scan it daily to allocate social capital to nodes behaving in a good
20 | karmic manner. The graph is culled from 100 million tweets and has
21 | about 3 million nodes.''
22 |
23 | \bigskip
24 | We need a data structure that is
25 | \begin{itemize}
26 | \item fast when used with string keys, and
27 | \item doesn't use too much memory.
28 | \end{itemize}
29 | \end{frame}
30 |
31 | \begin{frame}
32 | \frametitle{Persistent maps in Haskell}
33 |
34 | \begin{itemize}
35 | \item \lstinline!Data.Map! is the most commonly used map type.
36 | \item It's implemented using size balanced trees.
37 | \item Keys can be of any type, as long as values of the type can be
38 | ordered.
39 | \end{itemize}
40 | \end{frame}
41 |
42 | \begin{frame}
43 | \frametitle{Real world performance of Data.Map}
44 |
45 | \begin{itemize}
46 | \item Good in theory: no more than $O(\log n)$ comparisons.
47 | \item Not great in practice: up to $O(\log n)$ comparisons!
48 | \item Many common types are expensive to compare, e.g.
49 | \lstinline!String!, \lstinline!ByteString!, and \lstinline!Text!.
50 | \item Given a string of length $k$, we need $O(k*\log n)$
51 | comparisons to look up an entry.
52 | \end{itemize}
53 | \end{frame}
54 |
55 | \begin{frame}
56 | \frametitle{Hash tables}
57 | \begin{itemize}
58 | \item Hash tables perform well with string keys: $O(k)$ amortized
59 | lookup time for strings of length $k$.
60 | \item However, we want persistent maps, not mutable hash tables.
61 | \end{itemize}
62 | \end{frame}
63 |
64 | \begin{frame}
65 | \frametitle{Milan Straka's idea: Patricia trees as sparse arrays}
66 | \begin{itemize}
67 | \item We can use hashing without using hash tables!
68 | \item A Patricia tree implements a persistent, sparse array.
69 | \item Patricia trees are about twice as fast as size balanced trees, but
70 | only work with \lstinline!Int! keys.
71 | \item Use hashing to derive an \lstinline!Int! from an arbitrary
72 | key.
73 | \end{itemize}
74 | \end{frame}
75 |
76 | \begin{frame}
77 | \frametitle{Implementation tricks}
78 | \begin{itemize}
79 | \item Patricia trees implement a sparse, persistent array of size
80 | $2^{32}$ (or $2^{64}$).
81 | \item Hashing using this many buckets makes collisions rare: for
82 | $2^{24}$ entries we expect about 32,000 single collisions.
83 | \item Linked lists are a perfectly adequate collision resolution
84 | strategy.
85 | \end{itemize}
86 | \end{frame}
87 |
88 | \begin{frame}[fragile]
89 | \frametitle{First attempt at an implementation}
90 | \begin{lstlisting}
91 | -- Defined in the containers package.
92 | data IntMap a
93 | = Nil
94 | | Tip {-# UNPACK #-} !Key a
95 | | Bin {-# UNPACK #-} !Prefix
96 | {-# UNPACK #-} !Mask
97 | !(IntMap a) !(IntMap a)
98 |
99 | type Prefix = Int
100 | type Mask = Int
101 | type Key = Int
102 |
103 | newtype HashMap k v = HashMap (IntMap [(k, v)])
104 | \end{lstlisting}
105 | \end{frame}
106 |
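% Added sketch (not part of the original deck).
\begin{frame}[fragile]
  \frametitle{Sketch: lookup goes through the hash}

  A hedged illustration (assumes \lstinline!hash! from the
  \texttt{hashable} package and the \lstinline!HashMap! just defined):

\begin{lstlisting}
import qualified Data.IntMap as IM
import qualified Data.List as L
import Data.Hashable (Hashable, hash)

lookup :: (Hashable k, Eq k) => k -> HashMap k v -> Maybe v
lookup k (HashMap m) = IM.lookup (hash k) m >>= L.lookup k
\end{lstlisting}
\end{frame}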
107 |
108 | \begin{frame}[fragile]
109 | \frametitle{A more memory efficient implementation}
110 | \begin{lstlisting}
111 | data HashMap k v
112 | = Nil
113 | | Tip {-# UNPACK #-} !Hash
114 | {-# UNPACK #-} !(FL.FullList k v)
115 | | Bin {-# UNPACK #-} !Prefix
116 | {-# UNPACK #-} !Mask
117 | !(HashMap k v) !(HashMap k v)
118 |
119 | type Prefix = Int
120 | type Mask = Int
121 | type Hash = Int
122 |
123 | data FullList k v = FL !k !v !(List k v)
124 | data List k v = Nil | Cons !k !v !(List k v)
125 | \end{lstlisting}
126 | \end{frame}
127 |
128 | \begin{frame}
129 | \frametitle{Reducing the memory footprint}
130 | \begin{itemize}
131 | \item \lstinline!List k v! uses 2 fewer words per key/value pair
132 | than \lstinline![(k, v)]!.
133 | \item \lstinline!FullList! can be unpacked into the \lstinline!Tip!
134 | constructor as it's a product type, saving 2 more words.
135 | \item Always unpack word sized types, like \lstinline!Int!, unless
136 | you really need them to be lazy.
137 | \end{itemize}
138 | \end{frame}
139 |
140 | \begin{frame}
141 | \frametitle{Benchmarks}
142 |
143 | Keys: $2^{12}$ random 8-byte \lstinline!ByteString!s
144 |
145 | \bigskip
146 | \begin{tabular}{|c|c|c|}
147 | \hline & \textbf{Map} & \textbf{HashMap} \\
148 | \hline \textbf{insert} & 1.00 & 0.43 \\
149 | \hline \textbf{lookup} & 1.00 & 0.28 \\
150 | \hline
151 | \end{tabular}
152 | \bigskip
153 |
154 | \lstinline!delete! performs like \lstinline!insert!.
155 | \end{frame}
156 |
157 | \begin{frame}
158 | \frametitle{Can we do better?}
159 | \begin{itemize}
160 | \item We still need to perform $O(\min(W, \log n))$ \lstinline!Int!
161 | comparisons, where $W$ is the number of bits in a word.
162 | \item The memory overhead per key/value pair is still quite high.
163 | \end{itemize}
164 | \end{frame}
165 |
166 | \begin{frame}
167 | \frametitle{Borrowing from our neighbours}
168 | \begin{itemize}
169 | \item Clojure uses a \emph{hash-array mapped trie} (HAMT) data
170 | structure to implement persistent maps.
171 | \item Described in the paper ``Ideal Hash Trees'' by Bagwell (2001).
172 | \item Originally a mutable data structure implemented in C++.
173 | \item Clojure's persistent version was created by Rich Hickey.
174 | \end{itemize}
175 | \end{frame}
176 |
177 | \begin{frame}
178 | \frametitle{Hash-array mapped tries in Clojure}
179 | \begin{itemize}
180 | \item Shallow tree with high branching factor.
181 | \item Each node, except the leaf nodes, contains an array of up to
182 | 32 elements.
183 | \item 5 bits of the hash are used to index the array at each level.
184 | \item A clever trick, using bit population count, is used to
185 |   represent sparse arrays (sketched on the next slide).
186 | \end{itemize}
187 | \end{frame}
188 |
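% Added sketch (not part of the original deck).
\begin{frame}[fragile]
  \frametitle{Sketch: indexing a sparse node}

  A hedged illustration of the population-count trick (the names are
  mine, not from the talk):

\begin{lstlisting}
import Data.Bits (popCount, shiftR, (.&.), bit)

-- 5 hash bits select one of up to 32 children at each level.
childBit :: Word -> Int -> Int
childBit h depth = fromIntegral ((h `shiftR` (5 * depth)) .&. 31)

-- The child's position in the packed array is the number of
-- 1 bits below its bit in the node's bitmap.
sparseIndex :: Word -> Int -> Int
sparseIndex bitmap b = popCount (bitmap .&. (bit b - 1))
\end{lstlisting}
\end{frame}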
189 | \begin{frame}[fragile]
190 | \frametitle{The Haskell definition of a HAMT}
191 | \begin{lstlisting}
192 | data HashMap k v
193 | = Empty
194 | | BitmapIndexed
195 | {-# UNPACK #-} !Bitmap
196 | {-# UNPACK #-} !(Array (HashMap k v))
197 | | Leaf {-# UNPACK #-} !(Leaf k v)
198 | | Full {-# UNPACK #-} !(Array (HashMap k v))
199 | | Collision {-# UNPACK #-} !Hash
200 | {-# UNPACK #-} !(Array (Leaf k v))
201 |
202 | type Bitmap = Word
203 |
204 | data Array a = Array !(Array# a)
205 | {-# UNPACK #-} !Int
206 | \end{lstlisting}
207 | \end{frame}
208 |
209 | \begin{frame}
210 | \frametitle{Making it fast}
211 | \begin{itemize}
212 | \item The initial implementation by Edward Z. Yang: correct but
213 | didn't perform well.
214 | \item Improved performance by
215 | \begin{itemize}
216 | \item replacing use of \lstinline!Data.Vector! by a
217 | specialized array type,
218 | \item paying careful attention to strictness, and
219 | \item using GHC's new \texttt{INLINABLE} pragma.
220 | \end{itemize}
221 | \end{itemize}
222 | \end{frame}
223 |
224 | \begin{frame}
225 | \frametitle{Benchmarks}
226 |
227 | Keys: $2^{12}$ random 8-byte \lstinline!ByteString!s
228 |
229 | \bigskip
230 | \begin{tabular}{|c|c|c|c|}
231 | \hline & \textbf{Map} & \textbf{HashMap} &
232 | \textbf{HashMap (HAMT)} \\
233 | \hline \textbf{insert} & 1.00 & 0.43 & 1.21 \\
234 | \hline \textbf{lookup} & 1.00 & 0.28 & 0.21 \\
235 | \hline
236 | \end{tabular}
237 | \end{frame}
238 |
239 | \begin{frame}
240 | \frametitle{Where is all the time spent?}
241 |
242 | \begin{itemize}
243 | \item Most time in \lstinline!insert! is spent copying small arrays.
244 | \item Array copying is implemented using \lstinline!indexArray\#!
245 | and \lstinline!writeArray\#!, which results in poor performance.
246 | \item When cloning an array, we are forced to first fill the new
247 | array with dummy elements, followed by copying over the elements
248 | from the old array.
249 | \end{itemize}
250 | \end{frame}
251 |
252 | \begin{frame}
253 | \frametitle{A better array copy}
254 |
255 | \begin{itemize}
256 | \item Daniel Peebles has implemented a set of new primops for
257 |   copying arrays in GHC.
258 | \item The first implementation showed a 20\% performance improvement
259 | for \lstinline!insert!.
260 | \item Copying arrays is still slow, so there may still be room for
261 |   big improvements.
262 | \end{itemize}
263 | \end{frame}
264 |
265 | \begin{frame}
266 | \frametitle{Other possible performance improvements}
267 | \begin{itemize}
268 | \item Even faster array copying using SSE instructions, inline
269 | memory allocation, and C-- inlining.
270 | \item Use dedicated bit population count instruction on
271 | architectures where it's available.
272 | \item Clojure uses a clever trick to unpack keys and values directly
273 | into the arrays; keys are stored at even positions and values at
274 | odd positions.
275 | \item GHC 7.2 will use 1 less word for \lstinline!Array!.
276 | \end{itemize}
277 | \end{frame}
278 |
279 | \begin{frame}
280 | \frametitle{Optimize common cases}
281 | \begin{itemize}
282 | \item In many cases maps are created in one go from a sequence of
283 | key/value pairs.
284 | \item We can optimize for this case by repeatedly mutating the HAMT
285 |   and freezing it when we're done.
286 | \end{itemize}
287 |
288 | \bigskip
289 | Keys: $2^{12}$ random 8-byte \lstinline!ByteString!s
290 |
291 | \bigskip
292 | \begin{tabular}{|c|c|}
293 | \hline \textbf{fromList/pure} & 1.00 \\
294 | \hline \textbf{fromList/mutating} & 0.50 \\
295 | \hline
296 | \end{tabular}
297 | \end{frame}
298 |
299 | \begin{frame}
300 | \frametitle{Abstracting over collection types}
301 | \begin{itemize}
302 | \item We will soon have two map types worth using (one ordered and one
303 | unordered).
304 | \item We want to write functions that work with both types, without an
305 |   $O(n)$ conversion cost.
306 | \item Use type families to abstract over different concrete
307 |   implementations (sketched on the next slide).
308 | \end{itemize}
309 | \end{frame}
310 |
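% Added sketch (not part of the original deck).
\begin{frame}[fragile]
  \frametitle{Sketch: abstracting with type families}

  One possible shape for such an abstraction (illustrative only, not
  an API proposed in this talk):

\begin{lstlisting}
{-# LANGUAGE TypeFamilies #-}

class MapLike m where
  type Key m
  empty  :: m v
  insert :: Key m -> v -> m v -> m v
  lookup :: Key m -> m v -> Maybe v
\end{lstlisting}
\end{frame}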
311 | \begin{frame}
312 | \frametitle{Summary}
313 | \begin{itemize}
314 | \item Hashing allows us to create more efficient data structures.
315 | \item There are new interesting data structures out there that have,
316 | or could have, an efficient persistent implementation.
317 | \end{itemize}
318 | \end{frame}
319 |
320 | \end{document}
321 |
322 | %%% Local Variables:
323 | %%% mode: latex
324 | %%% TeX-master: t
325 | %%% TeX-PDF-mode: t
326 | %%% End:
327 |
--------------------------------------------------------------------------------
/haskell-2011/Makefile:
--------------------------------------------------------------------------------
1 | TARGET = slides.pdf
2 |
3 | all: $(TARGET)
4 |
5 | clean:
6 | rm -f *.aux *.dvi *.log *.nav *.out *.snm *.toc *.pdf *.vrb *.hi *.o *.prof
7 |
8 | distclean: clean
9 | rm -f $(TARGET)
10 |
11 | pdf: all
12 | open $(TARGET)
13 |
14 | %.pdf: %.tex
15 | xelatex -file-line-error $<
16 |
17 | .PHONY: all clean distclean pdf
18 |
--------------------------------------------------------------------------------
/haskell-2011/performance.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tibbe/talks/a90ef9cfd75cabc8569416578e4cc1407b04b0d5/haskell-2011/performance.png
--------------------------------------------------------------------------------
/haskell-2011/reasoning.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tibbe/talks/a90ef9cfd75cabc8569416578e4cc1407b04b0d5/haskell-2011/reasoning.png
--------------------------------------------------------------------------------
/haskell-2011/slides.tex:
--------------------------------------------------------------------------------
1 | \documentclass[xetex,mathserif,serif]{beamer}
2 |
3 | \usepackage{mathspec}
4 | \usepackage{xltxtra,fontspec,xunicode}
5 | \usepackage{listings}
6 |
7 | % Features
8 | \setbeamertemplate{navigation symbols}{}
9 |
10 | % Fonts
11 | \setmainfont{Helvetica}
12 | \setmathsfont(Digits,Latin,Greek){Helvetica}
13 | \setmonofont{Monaco}
14 |
15 | \title{The State of Haskell, 2011 survey results}
16 | \author{Johan Tibell\\johan.tibell@gmail.com}
17 | \date{2011-09-22}
18 |
19 | \begin{document}
20 |
21 | \frame{\titlepage}
22 |
23 | \begin{frame}
24 | \frametitle{The State of Haskell, 2011 Survey}
25 | \begin{itemize}
26 | \item I run a yearly Haskell user survey
27 | \item Good source of information for current problems, as
28 | experienced by the ``average'' Haskell user
29 | \item I will focus on one main theme
30 | \end{itemize}
31 | \end{frame}
32 |
33 | \begin{frame}
34 | \frametitle{Performance of Haskell code}
35 | \begin{center}
36 | \includegraphics[width=\textwidth]{performance}
37 | \end{center}
38 | Scale: 1 - poor, 5 - excellent
39 | \end{frame}
40 |
41 | \begin{frame}
42 | \frametitle{Ease of reasoning about performance}
43 | \begin{center}
44 | \includegraphics[width=\textwidth]{reasoning}
45 | \end{center}
46 | Scale: 1 - easy, 5 - hard
47 | \end{frame}
48 |
49 | \begin{frame}
50 | \frametitle{The working Haskell engineer}
51 | \begin{itemize}
52 | \item Predictable performance is important to engineers because...
53 | \item ...not knowing if you will be able to solve a future
54 | (performance) problem in the program that pays your bills is
55 | scary!
56 | \end{itemize}
57 | \end{frame}
58 |
59 | \begin{frame}
60 | \frametitle{Dirty talk}
61 | \begin{itemize}
62 | \item \textbf{Reasoning about performance in Haskell is possible}: a
63 | small group of people know how to
64 | \item Talking about performance/operational reasoning is considered
65 | ``dirty'': \textbf{we don't teach people how to do it}
66 | \end{itemize}
67 | \end{frame}
68 |
69 | \begin{frame}
70 | \frametitle{Conclusion/Call-to-action}
71 | What I think we should do:
72 | \begin{enumerate}
73 | \item Pay the issues some attention
74 | \item Figure out how to teach performance reasoning:
75 | \begin{itemize}
76 | \item When is the right time to introduce the topic?
77 |       \item What do you teach first, second, etc.?
78 | \end{itemize}
79 | \end{enumerate}
80 | \end{frame}
81 |
82 | \end{document}
83 |
84 | %%% Local Variables:
85 | %%% mode: latex
86 | %%% TeX-master: t
87 | %%% TeX-PDF-mode: t
88 | %%% End:
89 |
--------------------------------------------------------------------------------
/hiw-2011/Makefile:
--------------------------------------------------------------------------------
1 | TARGET = slides.pdf
2 |
3 | all: $(TARGET)
4 |
5 | clean:
6 | rm -f *.aux *.dvi *.log *.nav *.out *.snm *.toc $(TARGET) *.vrb *.hi *.o *.prof
7 |
8 | distclean: clean
9 | rm -f $(TARGET)
10 |
11 | pdf: all
12 | open $(TARGET)
13 |
14 | %.pdf: %.tex
15 | xelatex -file-line-error $<
16 |
17 | .PHONY: all clean distclean pdf
18 |
--------------------------------------------------------------------------------
/hiw-2011/hamt-mem.hp:
--------------------------------------------------------------------------------
1 | JOB "mem"
2 | DATE "Thu Sep 22 13:49 2011"
3 | SAMPLE_UNIT "seconds"
4 | VALUE_UNIT "bytes"
5 | BEGIN_SAMPLE 0.00
6 | END_SAMPLE 0.00
7 | BEGIN_SAMPLE 0.13
8 | THUNK 40
9 | base:GHC.Conc.Sync.ThreadId 16
10 | ARR_WORDS 32768
11 | base:Data.Dynamic.Dynamic 24
12 | FUN_2_0 48
13 | base:GHC.IO.Handle.Types.Handle__ 136
14 | base:GHC.Arr.STArray 40
15 | BLACKHOLE 16
16 | PAP 32
17 | THUNK_2_0 32
18 | MVAR_CLEAN 64
19 | STACK 912
20 | base:GHC.IO.Encoding.Types.TextEncoding 32
21 | base:GHC.IO.Handle.Types.FileHandle 24
22 | base:GHC.MVar.MVar 16
23 | WEAK 96
24 | TSO 112
25 | ghc-prim:GHC.Tuple.(,) 48
26 | THUNK_0_1 24
27 | THUNK_1_1 32
28 | THUNK_1_0 72
29 | base:GHC.ForeignPtr.MallocPtr 72
30 | FUN_1_0 48
31 | base:GHC.IO.Encoding.Types.BufferCodec 48
32 | base:GHC.IO.Buffer.Buffer 112
33 | FUN 32
34 | ghc-prim:GHC.Types.: 312
35 | base:Data.Maybe.Just 32
36 | MUT_VAR_CLEAN 112
37 | MUT_ARR_PTRS_CLEAN 552
38 | base:GHC.STRef.STRef 48
39 | MUT_ARR_PTRS_FROZEN 4840888
40 | main:Data.HashMap.Base.Full 69888
41 | main:Data.HashMap.Base.BitmapIndexed 1572864
42 | ghc-prim:GHC.Types.I# 8178800
43 | main:Data.HashMap.Base.Leaf 8178784
44 | END_SAMPLE 0.13
45 | BEGIN_SAMPLE 0.29
46 | base:GHC.Conc.Sync.ThreadId 16
47 | THUNK 40
48 | ARR_WORDS 32768
49 | FUN_2_0 48
50 | base:GHC.IO.Handle.Types.Handle__ 136
51 | base:GHC.Arr.STArray 40
52 | BLACKHOLE 16
53 | PAP 32
54 | THUNK_2_0 32
55 | MVAR_CLEAN 64
56 | STACK 912
57 | base:GHC.IO.Encoding.Types.TextEncoding 32
58 | base:GHC.IO.Handle.Types.FileHandle 24
59 | base:GHC.MVar.MVar 16
60 | WEAK 96
61 | TSO 112
62 | base:GHC.STRef.STRef 48
63 | base:Data.Dynamic.Dynamic 24
64 | THUNK_0_1 24
65 | THUNK_1_1 32
66 | THUNK_1_0 72
67 | base:GHC.ForeignPtr.MallocPtr 72
68 | FUN_1_0 48
69 | base:GHC.IO.Encoding.Types.BufferCodec 48
70 | ghc-prim:GHC.Tuple.(,) 48
71 | base:GHC.IO.Buffer.Buffer 112
72 | FUN 32
73 | ghc-prim:GHC.Types.: 312
74 | base:Data.Maybe.Just 32
75 | MUT_VAR_CLEAN 112
76 | MUT_ARR_PTRS_CLEAN 552
77 | main:Data.HashMap.Base.Full 69888
78 | ghc-prim:GHC.Types.I# 15816784
79 | main:Data.HashMap.Base.BitmapIndexed 1572864
80 | main:Data.HashMap.Base.Leaf 15816768
81 | MUT_ARR_PTRS_FROZEN 6750384
82 | END_SAMPLE 0.29
83 | BEGIN_SAMPLE 0.30
84 | base:GHC.Conc.Sync.ThreadId 16
85 | THUNK 40
86 | ARR_WORDS 32768
87 | FUN_2_0 48
88 | base:GHC.IO.Handle.Types.Handle__ 136
89 | base:GHC.Arr.STArray 40
90 | BLACKHOLE 16
91 | PAP 32
92 | THUNK_2_0 32
93 | MVAR_CLEAN 64
94 | STACK 912
95 | base:GHC.IO.Encoding.Types.TextEncoding 32
96 | base:GHC.IO.Handle.Types.FileHandle 24
97 | base:GHC.MVar.MVar 16
98 | WEAK 96
99 | TSO 112
100 | base:GHC.STRef.STRef 48
101 | base:Data.Dynamic.Dynamic 24
102 | THUNK_0_1 24
103 | THUNK_1_1 32
104 | THUNK_1_0 72
105 | base:GHC.ForeignPtr.MallocPtr 72
106 | FUN_1_0 48
107 | base:GHC.IO.Encoding.Types.BufferCodec 48
108 | ghc-prim:GHC.Tuple.(,) 48
109 | base:GHC.IO.Buffer.Buffer 112
110 | FUN 32
111 | ghc-prim:GHC.Types.: 312
112 | base:Data.Maybe.Just 32
113 | MUT_VAR_CLEAN 112
114 | MUT_ARR_PTRS_CLEAN 552
115 | MUT_ARR_PTRS_FROZEN 6959648
116 | main:Data.HashMap.Base.Full 69888
117 | main:Data.HashMap.Base.BitmapIndexed 1572864
118 | ghc-prim:GHC.Types.I# 16653840
119 | main:Data.HashMap.Base.Leaf 16653824
120 | END_SAMPLE 0.30
121 | BEGIN_SAMPLE 0.32
122 | base:GHC.Conc.Sync.ThreadId 16
123 | THUNK 40
124 | ARR_WORDS 32768
125 | FUN_2_0 48
126 | base:GHC.IO.Handle.Types.Handle__ 136
127 | base:GHC.Arr.STArray 40
128 | BLACKHOLE 16
129 | PAP 32
130 | THUNK_2_0 32
131 | MVAR_CLEAN 64
132 | STACK 912
133 | base:GHC.IO.Encoding.Types.TextEncoding 32
134 | base:GHC.IO.Handle.Types.FileHandle 24
135 | base:GHC.MVar.MVar 16
136 | WEAK 96
137 | TSO 112
138 | FUN 32
139 | base:GHC.STRef.STRef 48
140 | base:Data.Dynamic.Dynamic 24
141 | THUNK_0_1 24
142 | FUN_1_0 48
143 | THUNK_1_1 32
144 | THUNK_1_0 72
145 | base:GHC.ForeignPtr.MallocPtr 72
146 | base:GHC.IO.Encoding.Types.BufferCodec 48
147 | ghc-prim:GHC.Tuple.(,) 48
148 | base:GHC.IO.Buffer.Buffer 112
149 | ghc-prim:GHC.Types.: 312
150 | base:Data.Maybe.Just 32
151 | MUT_VAR_CLEAN 112
152 | MUT_ARR_PTRS_CLEAN 552
153 | main:Data.HashMap.Array.Array 16
154 | main:Data.HashMap.Base.BitmapIndexed 1572864
155 | main:Data.HashMap.Base.Full 69888
156 | ghc-prim:GHC.Types.I# 17243872
157 | main:Data.HashMap.Base.Leaf 17243872
158 | MUT_ARR_PTRS_FROZEN 7107256
159 | END_SAMPLE 0.32
160 | BEGIN_SAMPLE 0.36
161 | THUNK 40
162 | base:GHC.Conc.Sync.ThreadId 16
163 | ARR_WORDS 32768
164 | FUN_2_0 48
165 | base:GHC.IO.Handle.Types.Handle__ 136
166 | base:GHC.Arr.STArray 40
167 | BLACKHOLE 16
168 | PAP 32
169 | THUNK_2_0 32
170 | MVAR_CLEAN 64
171 | STACK 912
172 | base:GHC.IO.Encoding.Types.TextEncoding 32
173 | base:GHC.IO.Handle.Types.FileHandle 24
174 | base:GHC.MVar.MVar 16
175 | WEAK 96
176 | TSO 112
177 | base:GHC.STRef.STRef 48
178 | base:Data.Dynamic.Dynamic 24
179 | THUNK_0_1 24
180 | THUNK_1_1 32
181 | THUNK_1_0 72
182 | base:GHC.ForeignPtr.MallocPtr 72
183 | FUN_1_0 48
184 | base:GHC.IO.Encoding.Types.BufferCodec 48
185 | ghc-prim:GHC.Tuple.(,) 48
186 | base:GHC.IO.Buffer.Buffer 112
187 | FUN 32
188 | ghc-prim:GHC.Types.: 312
189 | base:Data.Maybe.Just 32
190 | MUT_VAR_CLEAN 112
191 | MUT_ARR_PTRS_CLEAN 552
192 | main:Data.HashMap.Base.Full 69888
193 | ghc-prim:GHC.Types.I# 19271056
194 | main:Data.HashMap.Base.BitmapIndexed 1572864
195 | main:Data.HashMap.Base.Leaf 19271040
196 | MUT_ARR_PTRS_FROZEN 7613952
197 | END_SAMPLE 0.36
198 | BEGIN_SAMPLE 0.58
199 | base:GHC.Conc.Sync.ThreadId 16
200 | THUNK 40
201 | ARR_WORDS 32768
202 | base:GHC.IO.Handle.Types.Handle__ 136
203 | base:GHC.Arr.STArray 40
204 | BLACKHOLE 16
205 | PAP 32
206 | THUNK_2_0 32
207 | MVAR_CLEAN 64
208 | STACK 912
209 | base:GHC.IO.Encoding.Types.TextEncoding 32
210 | base:GHC.IO.Handle.Types.FileHandle 24
211 | base:GHC.MVar.MVar 16
212 | WEAK 96
213 | TSO 112
214 | FUN_2_0 48
215 | FUN 32
216 | base:GHC.STRef.STRef 48
217 | base:Data.Dynamic.Dynamic 24
218 | THUNK_0_1 24
219 | FUN_1_0 48
220 | THUNK_1_1 32
221 | THUNK_1_0 72
222 | base:GHC.ForeignPtr.MallocPtr 72
223 | base:GHC.IO.Encoding.Types.BufferCodec 48
224 | ghc-prim:GHC.Tuple.(,) 48
225 | base:GHC.IO.Buffer.Buffer 112
226 | ghc-prim:GHC.Types.: 312
227 | base:Data.Maybe.Just 32
228 | MUT_VAR_CLEAN 112
229 | MUT_ARR_PTRS_CLEAN 704
230 | main:Data.HashMap.Base.Full 69888
231 | main:Data.HashMap.Base.BitmapIndexed 1572864
232 | ghc-prim:GHC.Types.I# 29540720
233 | main:Data.HashMap.Base.Leaf 29540704
234 | MUT_ARR_PTRS_FROZEN 10181368
235 | END_SAMPLE 0.58
236 | BEGIN_SAMPLE 0.67
237 | END_SAMPLE 0.67
238 |
--------------------------------------------------------------------------------
/hiw-2011/hamt-mem.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tibbe/talks/a90ef9cfd75cabc8569416578e4cc1407b04b0d5/hiw-2011/hamt-mem.pdf
--------------------------------------------------------------------------------
/hiw-2011/hamt.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tibbe/talks/a90ef9cfd75cabc8569416578e4cc1407b04b0d5/hiw-2011/hamt.pdf
--------------------------------------------------------------------------------
/hiw-2011/patricia-mem.hp:
--------------------------------------------------------------------------------
1 | JOB "mem"
2 | DATE "Thu Sep 22 13:48 2011"
3 | SAMPLE_UNIT "seconds"
4 | VALUE_UNIT "bytes"
5 | BEGIN_SAMPLE 0.00
6 | END_SAMPLE 0.00
7 | BEGIN_SAMPLE 0.07
8 | base:GHC.Conc.Sync.ThreadId 16
9 | THUNK 40
10 | ARR_WORDS 32768
11 | base:GHC.IO.Handle.Types.Handle__ 136
12 | base:GHC.Arr.STArray 40
13 | BLACKHOLE 16
14 | PAP 32
15 | THUNK_2_0 32
16 | MVAR_CLEAN 64
17 | STACK 912
18 | base:GHC.IO.Encoding.Types.TextEncoding 32
19 | base:GHC.IO.Handle.Types.FileHandle 24
20 | base:GHC.MVar.MVar 16
21 | WEAK 96
22 | TSO 112
23 | MUT_ARR_PTRS_CLEAN 552
24 | FUN_2_0 48
25 | base:Data.Dynamic.Dynamic 24
26 | FUN 32
27 | base:GHC.STRef.STRef 48
28 | THUNK_0_1 24
29 | FUN_1_0 48
30 | THUNK_1_1 32
31 | THUNK_1_0 72
32 | base:GHC.ForeignPtr.MallocPtr 72
33 | base:GHC.IO.Encoding.Types.BufferCodec 48
34 | ghc-prim:GHC.Tuple.(,) 48
35 | base:GHC.IO.Buffer.Buffer 112
36 | ghc-prim:GHC.Types.: 312
37 | base:Data.Maybe.Just 32
38 | MUT_VAR_CLEAN 112
39 | main:Data.HashMap.Common.Tip 6021760
40 | ghc-prim:GHC.Types.I# 4817424
41 | main:Data.HashMap.Common.Bin 4816832
42 | END_SAMPLE 0.07
43 | BEGIN_SAMPLE 0.18
44 | base:GHC.Conc.Sync.ThreadId 16
45 | THUNK 40
46 | ARR_WORDS 32768
47 | base:GHC.IO.Encoding.Types.BufferCodec 48
48 | ghc-prim:GHC.Tuple.(,) 48
49 | base:GHC.IO.Buffer.Buffer 112
50 | FUN 32
51 | base:Data.Maybe.Just 32
52 | MUT_ARR_PTRS_CLEAN 552
53 | MUT_VAR_CLEAN 112
54 | FUN_1_0 48
55 | base:Data.Dynamic.Dynamic 24
56 | THUNK_0_1 24
57 | THUNK_1_1 32
58 | base:GHC.STRef.STRef 48
59 | FUN_2_0 48
60 | base:GHC.IO.Handle.Types.Handle__ 136
61 | base:GHC.Arr.STArray 40
62 | THUNK_1_0 72
63 | base:GHC.ForeignPtr.MallocPtr 72
64 | BLACKHOLE 16
65 | PAP 32
66 | THUNK_2_0 32
67 | MVAR_CLEAN 64
68 | STACK 912
69 | base:GHC.IO.Encoding.Types.TextEncoding 32
70 | ghc-prim:GHC.Types.: 312
71 | base:GHC.IO.Handle.Types.FileHandle 24
72 | base:GHC.MVar.MVar 16
73 | WEAK 96
74 | TSO 112
75 | main:Data.HashMap.Common.Bin 9894752
76 | ghc-prim:GHC.Types.I# 9894944
77 | main:Data.HashMap.Common.Tip 12368680
78 | END_SAMPLE 0.18
79 | BEGIN_SAMPLE 0.19
80 | base:GHC.Conc.Sync.ThreadId 16
81 | THUNK 40
82 | ARR_WORDS 32768
83 | base:GHC.IO.Encoding.Types.BufferCodec 48
84 | ghc-prim:GHC.Tuple.(,) 48
85 | base:GHC.IO.Buffer.Buffer 112
86 | FUN 32
87 | base:Data.Maybe.Just 32
88 | MUT_ARR_PTRS_CLEAN 552
89 | MUT_VAR_CLEAN 112
90 | FUN_1_0 48
91 | base:Data.Dynamic.Dynamic 24
92 | THUNK_0_1 24
93 | THUNK_1_1 32
94 | base:GHC.STRef.STRef 48
95 | FUN_2_0 48
96 | base:GHC.IO.Handle.Types.Handle__ 136
97 | base:GHC.Arr.STArray 40
98 | THUNK_1_0 72
99 | base:GHC.ForeignPtr.MallocPtr 72
100 | BLACKHOLE 16
101 | PAP 32
102 | THUNK_2_0 32
103 | MVAR_CLEAN 64
104 | STACK 912
105 | base:GHC.IO.Encoding.Types.TextEncoding 32
106 | ghc-prim:GHC.Types.: 312
107 | base:GHC.IO.Handle.Types.FileHandle 24
108 | base:GHC.MVar.MVar 16
109 | WEAK 96
110 | TSO 112
111 | main:Data.HashMap.Common.Bin 10238816
112 | ghc-prim:GHC.Types.I# 10239008
113 | main:Data.HashMap.Common.Tip 12798760
114 | END_SAMPLE 0.19
115 | BEGIN_SAMPLE 0.22
116 | base:GHC.Conc.Sync.ThreadId 16
117 | THUNK 40
118 | ARR_WORDS 32768
119 | base:GHC.IO.Encoding.Types.BufferCodec 48
120 | ghc-prim:GHC.Tuple.(,) 48
121 | base:GHC.IO.Buffer.Buffer 112
122 | FUN 32
123 | base:Data.Maybe.Just 32
124 | MUT_ARR_PTRS_CLEAN 552
125 | MUT_VAR_CLEAN 112
126 | FUN_1_0 48
127 | base:Data.Dynamic.Dynamic 24
128 | THUNK_0_1 24
129 | THUNK_1_1 32
130 | base:GHC.STRef.STRef 48
131 | FUN_2_0 48
132 | base:GHC.IO.Handle.Types.Handle__ 136
133 | base:GHC.Arr.STArray 40
134 | THUNK_1_0 72
135 | base:GHC.ForeignPtr.MallocPtr 72
136 | BLACKHOLE 16
137 | PAP 32
138 | THUNK_2_0 32
139 | MVAR_CLEAN 64
140 | STACK 912
141 | base:GHC.IO.Encoding.Types.TextEncoding 32
142 | ghc-prim:GHC.Types.: 312
143 | base:GHC.IO.Handle.Types.FileHandle 24
144 | base:GHC.MVar.MVar 16
145 | WEAK 96
146 | TSO 112
147 | main:Data.HashMap.Common.Bin 11467616
148 | ghc-prim:GHC.Types.I# 11467808
149 | main:Data.HashMap.Common.Tip 14334760
150 | END_SAMPLE 0.22
151 | BEGIN_SAMPLE 0.36
152 | THUNK 40
153 | base:GHC.Conc.Sync.ThreadId 16
154 | ARR_WORDS 32768
155 | base:GHC.IO.Encoding.Types.BufferCodec 48
156 | ghc-prim:GHC.Tuple.(,) 48
157 | base:GHC.IO.Buffer.Buffer 112
158 | FUN 32
159 | base:Data.Maybe.Just 32
160 | MUT_ARR_PTRS_CLEAN 552
161 | MUT_VAR_CLEAN 112
162 | FUN_1_0 48
163 | base:Data.Dynamic.Dynamic 24
164 | THUNK_0_1 24
165 | THUNK_1_1 32
166 | base:GHC.STRef.STRef 48
167 | FUN_2_0 48
168 | base:GHC.IO.Handle.Types.Handle__ 136
169 | base:GHC.Arr.STArray 40
170 | THUNK_1_0 72
171 | base:GHC.ForeignPtr.MallocPtr 72
172 | BLACKHOLE 16
173 | PAP 32
174 | THUNK_2_0 32
175 | MVAR_CLEAN 64
176 | STACK 912
177 | base:GHC.IO.Encoding.Types.TextEncoding 32
178 | ghc-prim:GHC.Types.: 312
179 | base:GHC.IO.Handle.Types.FileHandle 24
180 | base:GHC.MVar.MVar 16
181 | WEAK 96
182 | TSO 112
183 | main:Data.HashMap.Common.Tip 21175600
184 | main:Data.HashMap.Common.Bin 16940352
185 | ghc-prim:GHC.Types.I# 16940480
186 | END_SAMPLE 0.36
187 | BEGIN_SAMPLE 0.63
188 | base:GHC.Conc.Sync.ThreadId 16
189 | THUNK 40
190 | ARR_WORDS 32768
191 | base:GHC.IO.Encoding.Types.BufferCodec 48
192 | ghc-prim:GHC.Tuple.(,) 48
193 | base:GHC.IO.Buffer.Buffer 112
194 | FUN 32
195 | base:Data.Maybe.Just 32
196 | MUT_ARR_PTRS_CLEAN 552
197 | MUT_VAR_CLEAN 112
198 | FUN_1_0 48
199 | base:Data.Dynamic.Dynamic 24
200 | THUNK_0_1 24
201 | THUNK_1_1 32
202 | base:GHC.STRef.STRef 48
203 | FUN_2_0 48
204 | base:GHC.IO.Handle.Types.Handle__ 136
205 | base:GHC.Arr.STArray 40
206 | THUNK_1_0 72
207 | base:GHC.ForeignPtr.MallocPtr 72
208 | BLACKHOLE 16
209 | PAP 32
210 | THUNK_2_0 32
211 | MVAR_CLEAN 64
212 | STACK 912
213 | base:GHC.IO.Encoding.Types.TextEncoding 32
214 | ghc-prim:GHC.Types.: 312
215 | base:GHC.IO.Handle.Types.FileHandle 24
216 | base:GHC.MVar.MVar 16
217 | WEAK 96
218 | TSO 112
219 | main:Data.HashMap.Common.Bin 26675616
220 | ghc-prim:GHC.Types.I# 26675936
221 | main:Data.HashMap.Common.Tip 33344920
222 | END_SAMPLE 0.63
223 | BEGIN_SAMPLE 0.72
224 | base:GHC.Conc.Sync.ThreadId 16
225 | THUNK 40
226 | ARR_WORDS 32768
227 | FUN_2_0 48
228 | base:GHC.IO.Handle.Types.Handle__ 136
229 | base:GHC.Arr.STArray 40
230 | BLACKHOLE 16
231 | PAP 32
232 | THUNK_2_0 32
233 | MVAR_CLEAN 64
234 | STACK 912
235 | base:GHC.IO.Encoding.Types.TextEncoding 32
236 | base:GHC.IO.Handle.Types.FileHandle 24
237 | base:GHC.MVar.MVar 16
238 | WEAK 96
239 | TSO 112
240 | base:GHC.STRef.STRef 48
241 | base:Data.Dynamic.Dynamic 24
242 | THUNK_0_1 24
243 | THUNK_1_1 32
244 | THUNK_1_0 72
245 | base:GHC.ForeignPtr.MallocPtr 72
246 | FUN_1_0 48
247 | base:GHC.IO.Encoding.Types.BufferCodec 48
248 | ghc-prim:GHC.Tuple.(,) 48
249 | base:GHC.IO.Buffer.Buffer 112
250 | FUN 32
251 | ghc-prim:GHC.Types.: 312
252 | base:Data.Maybe.Just 32
253 | MUT_VAR_CLEAN 112
254 | MUT_ARR_PTRS_CLEAN 552
255 | ghc-prim:GHC.Types.I# 29583504
256 | main:Data.HashMap.Common.Tip 36979360
257 | main:Data.HashMap.Common.Bin 29582880
258 | END_SAMPLE 0.72
259 | BEGIN_SAMPLE 0.83
260 | END_SAMPLE 0.83
261 |
--------------------------------------------------------------------------------
/hiw-2011/patricia-mem.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tibbe/talks/a90ef9cfd75cabc8569416578e4cc1407b04b0d5/hiw-2011/patricia-mem.pdf
--------------------------------------------------------------------------------
/hiw-2011/slides.tex:
--------------------------------------------------------------------------------
1 | \documentclass[xetex,mathserif,serif]{beamer}
2 |
3 | \usepackage{mathspec}
4 | \usepackage{xltxtra,fontspec,xunicode}
5 | \usepackage{listings}
6 |
7 | % \usepackage{pgfpages}
8 | % \pgfpagesuselayout{4 on 1}[a4paper,border shrink=5mm,landscape]
9 |
10 | % Features
11 | \setbeamertemplate{navigation symbols}{}
12 |
13 | % Fonts
14 | \setmainfont{Helvetica}
15 | \setmathsfont(Digits,Latin,Greek){Helvetica}
16 | \setmonofont{Monaco}
17 |
18 | % Macros
19 | \definecolor{CodeColor}{RGB}{0,112,0}
20 | \newcommand{\code}[1]{\mbox{\texttt{\small{\color{CodeColor}{#1}}}}}
21 |
22 | \title{Faster persistent data structures through hashing}
23 | \author{Johan Tibell\\johan.tibell@gmail.com}
24 | \date{2011-09-23}
25 |
26 | \begin{document}
27 | \lstset{
28 | language=Haskell,
29 | basicstyle=\small\ttfamily,
30 | keywordstyle=,
31 | commentstyle=\color{CodeColor}}
32 |
33 | \frame{\titlepage}
34 |
35 | \begin{frame}
36 | \frametitle{Motivating problem: Twitter data analysis}
37 |
38 | \begin{quote}
39 | I'm computing a communication graph from Twitter data and then
40 | scan it daily to allocate social capital to nodes behaving in a
41 | good karmic manner. The graph is culled from 100 million tweets
42 | and has about 3 million nodes.
43 | \end{quote}
44 |
45 | \bigskip
46 | We need a data structure that is
47 | \begin{itemize}
48 | \item fast when used with string keys, and
49 | \item doesn't use too much memory.
50 | \end{itemize}
51 | \end{frame}
52 |
53 | \begin{frame}
54 | \frametitle{Persistent maps in Haskell}
55 |
56 | \begin{itemize}
57 | \item \code{Data.Map} is the most commonly used map type.
58 | \item It's implemented using size balanced trees and is
59 | representative of the performance of other binary tree
60 | implementations.
61 | \item Keys can be of any type, as long as values of the type can be
62 | ordered.
63 | \end{itemize}
64 | \end{frame}
65 |
66 | \begin{frame}
67 | \frametitle{Real world performance of Data.Map}
68 |
69 | \begin{itemize}
70 | \item Good in theory: no more than $O(\log n)$ comparisons.
71 | \item Not great in practice: up to $O(\log n)$ comparisons!
72 |   \item Many common types are expensive to compare, e.g.
73 | \code{String}, \code{ByteString}, and \code{Text}.
74 |   \item Given a string of length $k$, we need $O(k \log n)$
75 | comparisons to look up an entry.
76 | \end{itemize}
77 | \end{frame}
78 |
79 | \begin{frame}
80 | \frametitle{Hash tables}
81 | \begin{itemize}
82 | \item Hash tables perform well with string keys: $O(k)$ amortized
83 | lookup time for strings of length $k$.
84 | \item However, we want persistent maps, not mutable hash tables.
85 | \end{itemize}
86 | \end{frame}
87 |
88 | \begin{frame}[fragile]
89 | \frametitle{Milan Straka's idea: IntMaps as arrays}
90 | \begin{itemize}
91 | \item We can use hashing without using hash tables!
92 | \item \code{Data.IntMap} implements a persistent array and is much
93 | faster than \code{Data.Map}.
94 | \item Use hashing to derive an \code{Int} from an arbitrary
95 | key.
96 | \end{itemize}
97 | \begin{lstlisting}
98 | class Hashable a where
99 | hash :: a -> Int
100 | \end{lstlisting}
101 | \end{frame}
102 |
103 | \begin{frame}
104 | \frametitle{Collisions are easy to deal with}
105 | \begin{itemize}
106 | \item \code{IntMap} implements a sparse, persistent array of size
107 | $2^{32}$ (or $2^{64}$).
108 | \item Hashing using this many buckets makes collisions rare: for
109 | $2^{24}$ entries we expect about 32,000 single collisions.
110 | \item Implication: We can use any old collision handling strategy
111 | (e.g. chaining using linked lists).
112 | \end{itemize}
113 | \end{frame}
114 |
115 | \begin{frame}[fragile]
116 | \frametitle{HashMap implemented using an IntMap}
117 |
118 | Naive implementation:
119 |
120 | \begin{lstlisting}
121 | newtype HashMap k v = HashMap (IntMap [(k, v)])
122 | \end{lstlisting}
123 |
124 | By inlining (``unpacking'') the list and pair constructors we can
125 | save 2 words of memory per key/value pair.
126 | \end{frame}
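
% A sketch of what lookup looks like for the naive IntMap-based
% representation above: hash the key, find the bucket, then scan the
% (usually one-element) collision list. lookupHM is a hypothetical
% name; this is illustrative, not the library code.
\begin{frame}[fragile]
  \frametitle{Sketch: lookup for the naive representation}
\begin{lstlisting}
import qualified Data.IntMap as IM
import qualified Data.List as L
import Data.Hashable (Hashable, hash)

newtype HashMap k v = HashMap (IM.IntMap [(k, v)])

lookupHM :: (Eq k, Hashable k) => k -> HashMap k v -> Maybe v
lookupHM k (HashMap m) = IM.lookup (hash k) m >>= L.lookup k
\end{lstlisting}
\end{frame}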
127 |
128 | \begin{frame}
129 | \frametitle{Benchmark: Map vs HashMap}
130 |
131 | Keys: $2^{12}$ random 8-byte \code{ByteString}s
132 |
133 | \bigskip
134 | \begin{center}
135 | \begin{tabular}{r|rrr}
136 | & \multicolumn{2}{c}{Runtime ($\mu$s)} & Runtime \\
137 | & Map & HashMap & \% increase \\
138 | \hline lookup & 1956 & 916 & -53\% \\
139 | insert & 3543 & 1855 & -48\% \\
140 | delete & 3791 & 1838 & -52\% \\
141 | \end{tabular}
142 | \end{center}
143 | \end{frame}
144 |
145 | \begin{frame}
146 | \frametitle{Can we do better?}
147 | \begin{itemize}
148 | \item Imperative hash tables still perform better, perhaps there's
149 | room for improvement.
150 |   \item We still need to perform $O(\min(W, \log n))$ \code{Int}
151 | comparisons, where $W$ is the number of bits in a word.
152 |   \item The memory overhead is still high: about 9 words per
153 |     key/value pair.
154 | \end{itemize}
155 | \end{frame}
156 |
157 | \begin{frame}
158 | \frametitle{Borrowing from our neighbours}
159 | \begin{itemize}
160 | \item Clojure uses a \emph{hash-array mapped trie} (HAMT) data
161 | structure to implement persistent maps.
162 | \item Described in the paper ``Ideal Hash Trees'' by Bagwell (2001).
163 | \item Originally a mutable data structure implemented in C++.
164 | \item Clojure's persistent version was created by Rich Hickey.
165 | \end{itemize}
166 | \end{frame}
167 |
168 | \begin{frame}
169 | \frametitle{Hash-array mapped tries}
170 | \begin{itemize}
171 | \item Shallow tree with high branching factor.
172 | \item Each node, except the leaf nodes, contains an array of up to
173 | 32 elements.
174 | \item 5 bits of the hash are used to index the array at each level.
175 | \item A clever trick, using bit population count, is used to
176 | represent sparse arrays.
177 | \end{itemize}
178 | \end{frame}
179 |
180 | \begin{frame}
181 | \frametitle{HAMT}
182 | \includegraphics[width=\textwidth]{hamt.pdf}
183 | \end{frame}
184 |
185 |
186 | \begin{frame}[fragile]
187 | \frametitle{The Haskell definition of a HAMT}
188 | \begin{lstlisting}
189 | data HashMap k v
190 | = Empty
191 | | BitmapIndexed !Bitmap !(Array (HashMap k v))
192 | | Leaf !Hash !k v
193 | | Full !(Array (HashMap k v))
194 | | Collision !Hash !(Array (Leaf k v))
195 |
196 | type Bitmap = Word
197 | type Hash = Int
198 | data Array a = Array (Array# a)
199 | \end{lstlisting}
200 | \end{frame}
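
% A sketch of lookup over a simplified version of the HAMT above,
% showing the 5-bits-per-level indexing and the population-count trick
% for sparse arrays. It uses Data.Primitive.Array instead of the
% talk's Array wrapper and omits the Full and Collision cases, so it
% is illustrative rather than the library code.
\begin{frame}[fragile]
  \frametitle{Sketch: HAMT lookup with popCount indexing}
\begin{lstlisting}
{-# LANGUAGE BangPatterns #-}
import Data.Bits (popCount, shiftR, (.&.), bit)
import Data.Primitive.Array (Array, indexArray)
import Data.Word (Word)

data HashMap k v
  = Empty
  | BitmapIndexed !Word !(Array (HashMap k v))
  | Leaf !Int !k v

lookupHM :: Eq k => Int -> k -> HashMap k v -> Maybe v
lookupHM h0 k0 = go h0 k0 0
  where
    go !_ !_ !_ Empty = Nothing
    go h k _ (Leaf h' k' v)
      | h == h' && k == k' = Just v
      | otherwise          = Nothing
    go h k s (BitmapIndexed bm arr)
      | bm .&. m == 0 = Nothing      -- bit not set: key absent
      | otherwise     = go h k (s + 5) (indexArray arr i)
      where
        m = bit ((h `shiftR` s) .&. 0x1f)   -- this level's bit
        i = popCount (bm .&. (m - 1))       -- sparse array index
\end{lstlisting}
\end{frame}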
201 |
202 | \begin{frame}
203 | \frametitle{High performance Haskell programming}
204 | Optimized implementation using standard techniques:
205 | \begin{itemize}
206 | \item constructor unpacking,
207 | \item GHC's new \code{INLINABLE} pragma, and
208 | \item paying careful attention to strictness.
209 | \end{itemize}
210 |   \code{insert} performance is still bad (e.g. compared to hash tables).
211 | \end{frame}
212 |
213 | \begin{frame}
214 | \frametitle{Optimizing insertion}
215 |
216 | \begin{itemize}
217 | \item Most time in \code{insert} is spent copying small arrays.
218 | \item Array copying is implemented in Haskell and GHC doesn't apply
219 | enough loop optimizations to make it run fast.
220 | \item When allocating arrays GHC fills the array with dummy
221 | elements, which are immediately overwritten.
222 | \end{itemize}
223 | \end{frame}
224 |
225 | \begin{frame}
226 | \frametitle{Optimizing insertion: copy less}
227 |
228 | \begin{itemize}
229 | \item Bagwell's original formulation used a fanout of 32.
230 | \item A fanout of 16 seems to provide a better trade-off between
231 | \code{lookup} and \code{insert} performance in our setting.
232 |   \item Improved performance by 14\%.
233 | \end{itemize}
234 | \end{frame}
235 |
236 | \begin{frame}
237 | \frametitle{Optimizing insertion: copy faster}
238 |
239 | \begin{itemize}
240 | \item Daniel Peebles and I have implemented a set of new primops for
241 | copying arrays in GHC.
242 | \item The implementation generates straight-line code for copies of
243 | statically known small size, and uses a fast \code{memcpy}
244 | otherwise.
245 |   \item Improved performance by 20\%.
246 | \end{itemize}
247 | \end{frame}
248 |
249 | \begin{frame}
250 | \frametitle{Optimizing insertion: common patterns}
251 | \begin{itemize}
252 | \item In many cases maps are created in one go from a sequence of
253 | key/value pairs.
254 | \item We can optimize for this case by repeatedly mutating the HAMT
255 | and freezing it when we're done.
256 | \end{itemize}
257 |
258 | \bigskip
259 | Keys: $2^{12}$ random 8-byte \code{ByteString}s
260 |
261 | \bigskip
262 | \begin{center}
263 | \begin{tabular}{c|c}
264 | & Runtime (\%) \\
265 | \hline fromList/pure & 100 \\
266 | fromList/mutating & 50 \\
267 | \end{tabular}
268 | \end{center}
269 | \end{frame}
270 |
271 | \begin{frame}
272 | \frametitle{Optimizing lookup: Faster population count}
273 | \begin{itemize}
274 | \item Tried several bit population count implementations.
275 | \item Best speed/memory-use trade-off is a lookup table based
276 | approach.
277 |   \item Using the \code{POPCNT} SSE 4.2 instruction improves the
278 | performance of \code{lookup} by 12\%.
279 | \end{itemize}
280 | \end{frame}
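
% A sketch of the lookup-table approach mentioned above: precompute
% the bit count of every 16-bit value once, then answer a 32-bit
% population count with two table lookups. The names popCountTable
% and popCount32 are hypothetical; this is illustrative, not the
% library code (the table is built once, here just with
% Data.Bits.popCount).
\begin{frame}[fragile]
  \frametitle{Sketch: table-based population count}
\begin{lstlisting}
import Data.Array.Unboxed (UArray, listArray, (!))
import Data.Bits (popCount, shiftR, (.&.))
import Data.Word (Word8, Word16, Word32)

popCountTable :: UArray Word16 Word8
popCountTable = listArray (0, maxBound)
  [fromIntegral (popCount w) | w <- [0 .. maxBound :: Word16]]

popCount32 :: Word32 -> Int
popCount32 w = fromIntegral (popCountTable ! lo)
             + fromIntegral (popCountTable ! hi)
  where
    lo = fromIntegral (w .&. 0xffff)  :: Word16
    hi = fromIntegral (w `shiftR` 16) :: Word16
\end{lstlisting}
\end{frame}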
281 |
282 | \begin{frame}
283 | \frametitle{Benchmark: IntMap-based vs HAMT}
284 |
285 | Keys: $2^{12}$ random 8-byte \code{ByteString}s
286 |
287 | \bigskip
288 | \begin{center}
289 | \begin{tabular}{r|rrr}
290 | & \multicolumn{2}{c}{Runtime ($\mu$s)} & Runtime \\
291 | & IntMap & HAMT & \% increase \\
292 | \hline lookup & 916 & 477 & -48\% \\
293 | insert & 1855 & 1998 & 8\% \\
294 | delete & 1838 & 2303 & 25\% \\
295 | \end{tabular}
296 | \end{center}
297 |
298 | The benchmarks don't include the \code{POPCNT} optimization, because
299 | it is not available on many architectures.
300 | \end{frame}
301 |
302 | \begin{frame}
303 | \frametitle{Memory usage: IntMap-based}
304 |   Total: 96 MB, tree: 66 MB ($2^{20}$ \code{Int} entries)
305 | \begin{center}
306 | \includegraphics[angle=90,scale=0.3]{patricia-mem.pdf}
307 | \end{center}
308 | \end{frame}
309 |
310 | \begin{frame}
311 | \frametitle{Memory usage: HAMT}
312 |
313 |   Total: 71 MB, tree: 41 MB ($2^{20}$ \code{Int} entries)
314 | \begin{center}
315 | \includegraphics[angle=90,scale=0.3]{hamt-mem.pdf}
316 | \end{center}
317 | \end{frame}
318 |
319 | \begin{frame}
320 | \frametitle{Summary}
321 | Keys: $2^{12}$ random 8-byte \code{ByteString}s
322 |
323 | \bigskip
324 | \begin{center}
325 | \begin{tabular}{r|rrr}
326 | & \multicolumn{2}{c}{Runtime ($\mu$s)} & Runtime \\
327 | & Map & HAMT & \% increase \\
328 | \hline lookup & 1956 & 477 & -76\% \\
329 | insert & 3543 & 1998 & -44\% \\
330 | delete & 3791 & 2303 & -39\% \\
331 | \end{tabular}
332 | \end{center}
333 | \end{frame}
334 |
335 | \end{document}
336 |
337 | %%% Local Variables:
338 | %%% mode: latex
339 | %%% TeX-master: t
340 | %%% TeX-PDF-mode: t
341 | %%% End:
342 |
--------------------------------------------------------------------------------
/stanford-2011/Makefile:
--------------------------------------------------------------------------------
1 | performance.html: performance.md
2 | pandoc --data-dir=. --offline -s -t slidy -o performance.html performance.md
3 |
--------------------------------------------------------------------------------
/stanford-2011/hashmap-naive.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tibbe/talks/a90ef9cfd75cabc8569416578e4cc1407b04b0d5/stanford-2011/hashmap-naive.png
--------------------------------------------------------------------------------
/stanford-2011/hashmap.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tibbe/talks/a90ef9cfd75cabc8569416578e4cc1407b04b0d5/stanford-2011/hashmap.png
--------------------------------------------------------------------------------
/stanford-2011/intpair-unpacked.graffle:
--------------------------------------------------------------------------------
1 |
2 |
3 |

145 | 146 | * Each box represents one machine word 147 | 148 | * Arrows represent pointers 149 | 150 | * Each constructor has one word overhead for e.g. GC information 151 | 152 | 153 | # Refresher: unboxed types 154 | 155 | GHC defines a number of _unboxed_ types. These typically represent 156 | primitive machine types. 157 | 158 | * By convention, the names of these types end with a 159 | `#`. 160 | 161 | * Most unboxed types take one word (except 162 | e.g. `Double#` on 32-bit machines) 163 | 164 | * Values of unboxed types cannot be thunks. 165 | 166 | * The basic types are defined in terms unboxed types e.g. 167 | 168 | ~~~~ {.haskell} 169 | data Int = I# Int# 170 | ~~~~ 171 | 172 | * We call types such as `Int` _boxed_ types 173 | 174 | 175 | # Poll 176 | 177 | How many machine words is needed to store a value of this data type: 178 | 179 | ~~~~ {.haskell} 180 | data IntPair = IP Int Int 181 | ~~~~ 182 | 183 | * 3? 184 | 185 | * 5? 186 | 187 | * 7? 188 | 189 | * 9? 190 | 191 | Tip: Draw a boxes-and-arrows diagram. 192 | 193 | 194 | # IntPair memory layout 195 | 196 |
197 | 198 | So an `IntPair` value takes 7 words. 199 | 200 | 201 | # Refresher: unpacking 202 | 203 | GHC gives us some control over data representation via the 204 | `UNPACK` pragma. 205 | 206 | * The pragma unpacks the contents of a constructor into the 207 | field of another constructor, removing one level of indirection 208 | and one constructor header. 209 | 210 | * Only fields that are strict, monomorphic, and single-constructor 211 | can be unpacked. 212 | 213 | The pragma is added just before the bang pattern: 214 | 215 | ~~~~ {.haskell} 216 | data Foo = Foo {-# UNPACK #-} !SomeType 217 | ~~~~ 218 | 219 | GHC 7 and later will warn if an `UNPACK` pragma cannot be used because 220 | it fails the use constraint. 221 | 222 | 223 | # Unpacking example 224 | 225 | ~~~~ {.haskell} 226 | data IntPair = IP !Int !Int 227 | ~~~~ 228 | 229 |
230 | 231 | ~~~~ {.haskell} 232 | data IntPair = IP {-# UNPACK #-} !Int 233 | {-# UNPACK #-} !Int 234 | ~~~~ 235 | 236 |
237 | 238 | 239 | # Benefits of unpacking 240 | 241 | When the pragma applies, it offers the following benefits: 242 | 243 | * Reduced memory usage (4 words saved in the case of `IntPair`) 244 | 245 | * Removes indirection 246 | 247 | Caveat: There are (rare) cases where unpacking hurts performance 248 | e.g. if the value is passed to a non-strict function, as it needs to 249 | be reboxed. 250 | 251 | **Unpacking is one of the most important optimizations available to 252 | us.** 253 | 254 | 255 | # A structural comparison with C 256 | 257 | By reference: 258 | 259 | ~~~~ {.haskell} 260 | -- Haskell 261 | data A = A !Int 262 | ~~~~ 263 | 264 | ~~~~ {.c} 265 | // C 266 | struct A { 267 | int *a; 268 | }; 269 | ~~~~ 270 | 271 | By value: 272 | 273 | ~~~~ {.haskell} 274 | -- Haskell 275 | data A = A {-# UNPACK #-} !Int 276 | ~~~~ 277 | 278 | ~~~~ {.c} 279 | // C 280 | struct A { 281 | int a; 282 | }; 283 | ~~~~ 284 | 285 | If you can figure out which C representation you want, you can figure 286 | out which Haskell representation you want. 287 | 288 | 289 | # Exercise: HashMap memory layout 290 | 291 | Here are the data types used in our naive `HashMap` implementation: 292 | 293 | ~~~~ {.haskell} 294 | newtype HashMap k v = HashMap (IntMap [(k, v)]) 295 | 296 | data IntMap a 297 | = Bin {-# UNPACK #-} !SuffixMask 298 | !(IntMap a) 299 | !(IntMap a) 300 | | Tip {-# UNPACK #-} !Key a 301 | | Nil 302 | 303 | type SuffixMask = Int 304 | type Key = Int 305 | ~~~~ 306 | 307 | Exercise: 308 | 309 | * Draw a diagram of a map containing two key-value pairs of type `Int` 310 | (i.e. `Bin ... (Tip ...) (Tip ...)`). 311 | 312 | * How many words of memory does the map use? 313 | 314 | 315 | # Solution 316 | 317 |
318 | 319 | 30 words! 22 (73%) of them overhead. 320 | 321 | 322 | # Can we do better? 323 | 324 | Yes! We can make use of the following: 325 | 326 | * The list of collisions is never empty (and almost always contains a 327 | single element). 328 | 329 | * We don't need to store arbitrary elements in the list of collisions, 330 | just pairs: 331 | 332 | ~~~~ {.haskell} 333 | data List k v = Nil | Cons k v (List k v) 334 | ~~~~ 335 | 336 | is more memory efficient than `[(k, v)]`, as the pair constructor has 337 | been unpacked into the `Cons` constructor. 338 | 339 | 340 | # An improved HashMap data type 341 | 342 | ~~~~ {.haskell} 343 | data HashMap k v 344 | = Bin {-# UNPACK #-} !SuffixMask 345 | !(HashMap k v) 346 | !(HashMap k v) 347 | | Tip {-# UNPACK #-} !Hash 348 | {-# UNPACK #-} !(FullList k v) -- now monomorphic 349 | | Nil 350 | 351 | type SuffixMask = Int 352 | type Hash = Int 353 | 354 | data FullList k v = FL k v !(List k v) 355 | data List k v = Nil | Cons k v !(List k v) 356 | ~~~~ 357 | 358 | * The `FullList` type has only one constructor, so it can be unpacked. 359 | 360 | * In the common case, the tail of the `FullList` is empty and thus 361 | points to a shared `Nil` constructor. 362 | 363 | # Improved HashMap data type memory layout 364 | 365 |
366 | 367 | 22 words. 14 (64%) of them overhead. 368 | 369 | In general: $5N + 4(N-1)$ words + size of keys & values 370 | 371 | 372 | # Remaining sources of inefficiency 373 | 374 | * Keys and values are still boxed. 375 | 376 | * There are quite a few interior nodes. A wider fanning tree would be 377 | better. (See the 378 | [video](http://www.youtube.com/watch?v=Dn74rhQrKeQ) and 379 | [slides](http://www.haskell.org/wikiupload/6/65/HIW2011-Talk-Tibell.pdf) 380 | from my talk at HIW2011.) 381 | 382 | 383 | # Reasoning about laziness 384 | 385 | A function application is only evaluated if its result is needed, 386 | therefore: 387 | 388 | * One of the function's right-hand sides will be evaluated. 389 | 390 | * Any expression whose value is required to decide which RHS to 391 | evaluate, must be evaluated. 392 | 393 | These two properties allow us to use "back-to-front" analysis (known 394 | as demand/strictness analysis) to figure which arguments a function is 395 | strict in. 396 | 397 | 398 | # Reasoning about laziness: example 399 | 400 | ~~~~ {.haskell} 401 | max :: Int -> Int -> Int 402 | max x y 403 | | x > y = x 404 | | x < y = y 405 | | otherwise = x -- arbitrary 406 | ~~~~ 407 | 408 | * To pick one of the three RHSs, we must evaluate `x > y`. 409 | 410 | * Therefore we must evaluate _both_ `x` and `y`. 411 | 412 | * Therefore `max` is strict in both `x` and `y`. 413 | 414 | 415 | # Poll 416 | 417 | ~~~~ {.haskell} 418 | data Tree = Leaf | Node Int Tree Tree 419 | 420 | insert :: Int -> Tree -> Tree 421 | insert x Leaf = Node x Leaf Leaf 422 | insert x (Node y l r) 423 | | x < y = Node y (insert x l) r 424 | | x > y = Node y l (insert x r) 425 | | otherwise = Node x l r 426 | ~~~~ 427 | 428 | Which argument(s) is `insert` strict in? 429 | 430 | * None 431 | 432 | * 1st 433 | 434 | * 2nd 435 | 436 | * Both 437 | 438 | 439 | # Solution 440 | 441 | Only the second, as inserting into an empty tree can be done without 442 | comparing the value being inserted. For example, this expression 443 | 444 | ~~~~ {.haskell} 445 | insert (1 `div` 0) Leaf 446 | ~~~~ 447 | 448 | does not raise a division-by-zero expression but 449 | 450 | ~~~~ {.haskell} 451 | insert (1 `div` 0) (Node 2 Leaf Leaf) 452 | ~~~~ 453 | 454 | does. 455 | 456 | 457 | # Strictness annotations in the real world 458 | 459 | ~~~~ {.haskell} 460 | delete :: (Eq k, Hashable k) => k -> HashMap k v -> HashMap k v 461 | delete k0 = go h0 k0 462 | where 463 | h0 = hash k0 464 | go !h !k t@(Bin sm l r) 465 | | nomatch h sm = t 466 | | zero h sm = bin sm (go h k l) r 467 | | otherwise = bin sm l (go h k r) 468 | go h k t@(Tip h' l) 469 | | h == h' = case FL.delete k l of 470 | Nothing -> Nil 471 | Just l' -> Tip h' l' 472 | | otherwise = t 473 | go _ _ Nil = Nil 474 | {-# INLINABLE delete #-} 475 | ~~~~ 476 | 477 | * Without the bang patterns, `go` is not strict in the key `k` or the 478 | hash `h` (why?). 479 | 480 | * Making `go` strict in the key and hash arguments allows GHC to unbox 481 | these values in the loop (after `delete` has been inlined and the 482 | key type is known). 483 | 484 | 485 | # Benchmark 486 | 487 | So, is `HashMap` faster than `Map`? 
Benchmark: $2^12$ random 488 | `ByteString` keys of length 8 489 | 490 | ~~~~ 491 | benchmarking Map/lookup/ByteString 492 | mean: 1.590200 ms 493 | std dev: 30.69466 us 494 | 495 | benchmarking HashMap/lookup/ByteString 496 | mean: 575.9371 us 497 | std dev: 8.790398 us 498 | 499 | benchmarking Map/insert/ByteString 500 | mean: 2.957678 ms 501 | std dev: 451.8105 us 502 | 503 | benchmarking HashMap/insert/ByteString 504 | mean: 1.506817 ms 505 | std dev: 301.2400 us 506 | ~~~~ 507 | 508 | Yes! 509 | 510 | 511 | # Memory usage 512 | 513 | Benchmark: $2^20$ key-value pairs of type `Int`, on a 64-bit machine 514 | 515 | Estimated: $8 * (5N + 4(N-1) + 4N)$ = 104 MB 516 | 517 | Real: 518 | 519 | ~~~~ 520 | 716,158,856 bytes allocated in the heap 521 | 1,218,205,432 bytes copied during GC 522 | 106,570,936 bytes maximum residency (16 sample(s)) 523 | 3,636,304 bytes maximum slop 524 | 269 MB total memory in use (0 MB lost due to fragmentation) 525 | ~~~~ 526 | 527 | Maximum residency is the number we care about (note that the value is 528 | sampled and thus not 100% accurate). 529 | 530 | # Summary 531 | 532 | * Focus on memory layout and good performance almost always follows. 533 | 534 | * Strictness annotations are mainly used on loop variables and in data 535 | type definitions. 536 | 537 | * `Data.HashMap` is implemented in the unordered-containers package. 538 | You can get the source from 539 | [http://github.com/tibbe/unordered-containers](http://github.com/tibbe/unordered-containers) 540 | 541 | # Bonus: memory footprint of some common data types 542 | 543 | Write this down on an index card and keep around for 544 | back-of-the-envelope calculations. 545 | 546 |Data type | 549 |Memory footprint | 550 |
---|---|
Data.ByteString |
553 | 9 words + N bytes | 554 |
Data.Text |
557 | 6 words + 2N bytes | 558 |
String |
561 | 5N words | 562 |
Data.Map |
565 | 6N words + size of keys & values | 566 |
Data.Set |
569 | 5N words + size of elements | 570 |
Data.IntMap |
573 | 3N + 5(N-1) words + size of values | 574 |
Data.IntSet |
577 | 2N + 5(N-1) words | 578 |
Data.HashMap |
581 | 5N + 4(N-1) words + size of keys & values | 582 |
Data.HashSet |
585 | 5N + 4(N-1) words + size of elements | 586 |

60 | 61 | * Each box represents one machine word 62 | 63 | * Arrows represent pointers 64 | 65 | * Each constructor has one word overhead for e.g. GC information 66 | 67 | 68 | ## Refresher: unboxed types 69 | 70 | GHC defines a number of _unboxed_ types. These typically represent 71 | primitive machine types. 72 | 73 | * By convention, the names of these types end with a 74 | `#`. 75 | 76 | * Most unboxed types take one word (except 77 | e.g. `Double#` on 32-bit machines) 78 | 79 | * Values of unboxed types cannot be thunks. 80 | 81 | * The basic types are defined in terms unboxed types e.g. 82 | 83 | ~~~~ {.haskell} 84 | data Int = I# Int# 85 | ~~~~ 86 | 87 | * We call types such as `Int` _boxed_ types 88 | 89 | 90 | ## Poll 91 | 92 | How many machine words is needed to store a value of this data type: 93 | 94 | ~~~~ {.haskell} 95 | data IntPair = IP Int Int 96 | ~~~~ 97 | 98 | * 3? 99 | 100 | * 5? 101 | 102 | * 7? 103 | 104 | * 9? 105 | 106 | Tip: Draw a boxes-and-arrows diagram. 107 | 108 | 109 | ## IntPair memory layout 110 | 111 |
112 | 113 | So an `IntPair` value takes 7 words. 114 | 115 | 116 | ## Refresher: unpacking 117 | 118 | GHC gives us some control over data representation via the 119 | `UNPACK` pragma. 120 | 121 | * The pragma unpacks the contents of a constructor into the 122 | field of another constructor, removing one level of indirection 123 | and one constructor header. 124 | 125 | * Only fields that are strict, monomorphic, and single-constructor 126 | can be unpacked. 127 | 128 | The pragma is added just before the bang pattern: 129 | 130 | ~~~~ {.haskell} 131 | data Foo = Foo {-# UNPACK #-} !SomeType 132 | ~~~~ 133 | 134 | GHC 7 and later will warn if an `UNPACK` pragma cannot be used because 135 | it fails the use constraint. 136 | 137 | 138 | ## Unpacking example 139 | 140 | ~~~~ {.haskell} 141 | data IntPair = IP !Int !Int 142 | ~~~~ 143 | 144 |
145 | 146 | ~~~~ {.haskell} 147 | data IntPair = IP {-# UNPACK #-} !Int 148 | {-# UNPACK #-} !Int 149 | ~~~~ 150 | 151 |
152 | 153 | 154 | ## A structural comparison with C 155 | 156 | By reference: 157 | 158 | ~~~~ {.haskell} 159 | -- Haskell 160 | data A = A !Int 161 | ~~~~ 162 | 163 | ~~~~ {.c} 164 | // C 165 | struct A { 166 | int *a; 167 | }; 168 | ~~~~ 169 | 170 | By value: 171 | 172 | ~~~~ {.haskell} 173 | -- Haskell 174 | data A = A {-# UNPACK #-} !Int 175 | ~~~~ 176 | 177 | ~~~~ {.c} 178 | // C 179 | struct A { 180 | int a; 181 | }; 182 | ~~~~ 183 | 184 | If you can figure out which C representation you want, you can figure 185 | out which Haskell representation you want. 186 | 187 | ## Benefits of unpacking 188 | 189 | When the pragma applies, it offers the following benefits: 190 | 191 | * Reduced memory usage (4 words saved in the case of `IntPair`) 192 | 193 | * Removes indirection 194 | 195 | Caveat: There are (rare) cases where unpacking hurts performance 196 | e.g. if the value is passed to a non-strict function, as it needs to 197 | be reboxed. 198 | 199 | **Unpacking is one of the most important optimizations available to 200 | us.** 201 | 202 | 203 | ## Compiler support 204 | 205 | Starting with GHC 7.10, small (pointer-sized* or less) strict fields 206 | are unpacked automatically. 207 | 208 | \* Applies to `Double` even on 32-bit architectures. 209 | 210 | 211 | ## Reasoning about laziness 212 | 213 | A function application is only evaluated if its result is needed, 214 | therefore: 215 | 216 | * One of the function's right-hand sides will be evaluated. 217 | 218 | * Any expression whose value is required to decide which RHS to 219 | evaluate, must be evaluated. 220 | 221 | These two properties allow us to use "back-to-front" analysis (known 222 | as demand/strictness analysis) to figure which arguments a function is 223 | strict in. 224 | 225 | 226 | ## Example 227 | 228 | ~~~~ {.haskell} 229 | max :: Int -> Int -> Int 230 | max x y 231 | | x >= y = x 232 | | x < y = y 233 | ~~~~ 234 | 235 | * To pick one of the two RHSs, we must evaluate `x > y`. 236 | 237 | * Therefore we must evaluate _both_ `x` and `y`. 238 | 239 | * Therefore `max` is strict in both `x` and `y`. 240 | 241 | 242 | ## Poll 243 | 244 | ~~~~ {.haskell} 245 | data Tree = Leaf | Node Int Tree Tree 246 | 247 | insert :: Int -> Tree -> Tree 248 | insert x Leaf = Node x Leaf Leaf 249 | insert x (Node y l r) 250 | | x < y = Node y (insert x l) r 251 | | x > y = Node y l (insert x r) 252 | | otherwise = Node x l r 253 | ~~~~ 254 | 255 | Which argument(s) is `insert` strict in? 256 | 257 | * None 258 | 259 | * 1st 260 | 261 | * 2nd 262 | 263 | * Both 264 | 265 | 266 | ## Solution 267 | 268 | Only the second, as inserting into an empty tree can be done without 269 | comparing the value being inserted. For example, this expression 270 | 271 | ~~~~ {.haskell} 272 | insert (1 `div` 0) Leaf 273 | ~~~~ 274 | 275 | does not raise a division-by-zero exception but 276 | 277 | ~~~~ {.haskell} 278 | insert (1 `div` 0) (Node 2 Leaf Leaf) 279 | ~~~~ 280 | 281 | does. 282 | 283 | 284 | ## Strict data types cannot contain thunks 285 | 286 | Given a function `f :: ... -> T` where `T` is a type with only strict 287 | fields*, whose value are types containing only strict fields, and so 288 | forth, cannot contain any thunks. 289 | 290 | ~~~~ {.haskell} 291 | data T1 = C1 !Int -- Exactly 2 words 292 | data T2 = C2 Int -- 2-inf words 293 | ~~~~ 294 | 295 | Therefore, `f` cannot leak space after the evaluation of `f` has 296 | finished. 
297 | 298 | Using strict fields is thus a way to prevent space leaks, without 299 | sprinkling bangs all over the definition of `f`. 300 | 301 | \* of non-function type, because closures can retain data. 302 | 303 | 304 | ## Guideline 1: data types should be strict by default 305 | 306 | * Allows a more compact data representation. 307 | 308 | * Avoids many space leaks by construction. 309 | 310 | * Can be more cache-friendly to create. 311 | 312 | 313 | ## In practice: strict can be more cache-friendly 314 | 315 | Example: `Data.Map` 316 | 317 | ~~~~ {.haskell} 318 | data Map k a = Tip 319 | | Bin {-# UNPACK #-} !Size !k a 320 | !(Map k a) !(Map k a) 321 | ~~~~ 322 | 323 | (Note the bang on the `Map k a` fields.) 324 | 325 | * Most container types have a strict spine. 326 | 327 | * Strict spines cause more work to be done up-front (e.g. on 328 | `insert`), when the data structure is in cache, rather than later 329 | (e.g. on the next `lookup`.) 330 | 331 | * Does not always apply (e.g. when representing streams and other 332 | infinite structures.) 333 | 334 | 335 | ## Guideline 2: use strict data types in accumulators 336 | 337 | If you're using a composite accumulator (e.g. a pair), make sure it has 338 | strict fields. 339 | 340 | Allocates on each iteration: 341 | 342 | ~~~~ {.haskell} 343 | mean :: [Double] -> Double 344 | mean xs = s / n 345 | where (s, n) = foldl' (\ (s, n) x -> (s+x, n+1)) (0, 0) xs 346 | ~~~~ 347 | 348 | Doesn't allocate on each iteration: 349 | 350 | ~~~~ {.haskell} 351 | data StrictPair a b = SP !a !b 352 | 353 | mean2 :: [Double] -> Double 354 | mean2 xs = s / n 355 | where SP s n = foldl' (\ (SP s n) x -> SP (s+x) (n+1)) (SP 0 0) xs 356 | ~~~~ 357 | 358 | Haskell makes it cheap to create throwaway data types like 359 | `StrictPair`: one line of code. 360 | 361 | 362 | ## Guideline 3: use strict returns in monadic code 363 | 364 | `return` often wraps the value in some kind of (lazy) box. This is an 365 | example of a hidden lazy data type in our code. For example, assuming 366 | we're in a state monad: 367 | 368 | ~~~~ {.haskell} 369 | return $ x + y 370 | ~~~~ 371 | 372 | creates a thunk. We most likely want: 373 | 374 | ~~~~ {.haskell} 375 | return $! x + y 376 | ~~~~ 377 | 378 | Just use `$!` by default. 379 | 380 | 381 | ## In practice: beware of the lazy base case 382 | 383 | Functions that would otherwise be strict might be made lazy by the 384 | "base case": 385 | 386 | ~~~~ {.haskell} 387 | data Tree = Leaf 388 | | Bin Key !Value !Tree !Tree 389 | 390 | insert :: Key -> Value -> Tree -> Tree 391 | insert k v Leaf = Bin k v Leaf Leaf -- lazy in @k@ 392 | insert k v (Bin k' v' l r) 393 | | k < k' = ... 394 | | otherwise = ... 395 | ~~~~ 396 | 397 | Since GHC does good things to strict arguments, we should make the 398 | base case strict, unless the extra laziness is useful: 399 | 400 | ~~~~ {.haskell} 401 | insert !k v Leaf = Bin k v Leaf Leaf -- strict in @k@ 402 | ~~~~ 403 | 404 | In this case GHC might unbox the key, making all those comparisons 405 | cheaper. 406 | 407 | ## Guideline 4: force expressions before wrapping them in lazy data types 408 | 409 | We don't control all data types in our program. 410 | 411 | * Many standard data types are lazy (e.g. `Maybe`, `Either`, tuples). 
412 | 413 | * This means that it's easy to be lazier than you intend by wrapping 414 | an expression in such a value: 415 | 416 | ~~~~ {.haskell} 417 | safeDiv :: Int -> Int -> Maybe Int 418 | safeDiv _ 0 = Nothing 419 | safeDiv x y = Just $ x / y -- creates thunk 420 | ~~~~ 421 | 422 | * Force the value (e.g. using `$!`) before wrapping it in the 423 | constructor. 424 | 425 | 426 | ## INLINE all the things!?! 427 | 428 | * **Don't.** GHC typically does a good job inlining code on its own. 429 | 430 | * We probably inline too much in core libraries, out of paranoia. 431 | 432 | * We should however *make it possible* for GHC to inline (see next 433 | slide). 434 | 435 | 436 | ## Guideline 5: Add wrappers to recursive functions 437 | 438 | * GHC does not inline recursive functions: 439 | 440 | ~~~~ {.haskell} 441 | map :: (a -> b) -> [a] -> [b] 442 | map _ [] = [] 443 | map f (x:xs) = f x : map f xs 444 | ~~~~ 445 | 446 | * **If** you want to inline a recursive function, use a non-recursive 447 | wrapper like so: 448 | 449 | ~~~~ {.haskell} 450 | map :: (a -> b) -> [a] -> [b] 451 | map f = go 452 | where 453 | go [] = [] 454 | go (x:xs) = f x : go xs 455 | ~~~~ 456 | 457 | 458 | ## Inlining HOFs avoids indirect calls 459 | 460 | * Calling an unknown function (e.g. a function that's passed as an 461 | argument) is more expensive than calling a known function. Such 462 | *indirect* calls appear in higher-order functions: 463 | 464 | ~~~~ {.haskell} 465 | map :: (a -> b) -> [a] -> [b] 466 | map _ [] = [] 467 | map f (x:xs) = f x : map f xs 468 | 469 | g xs = map (+1) xs -- map is recursive => not inlined 470 | ~~~~ 471 | 472 | * If we use the non-recursive wrapper from the last slide, GHC will 473 | likely inline `map` into `g`. 474 | 475 | * It's useful to Inline HOFs if the higher-order argument is used a 476 | lot (e.g. in `map`, but not in `Data.Map.insertWith`). Sometimes GHC 477 | gets this wrong (check the Core) and you can use a manual pragma to 478 | help it. 479 | 480 | 481 | ## Guideline 6: Use INLINABLE 482 | 483 | Use `INLINABLE` to remove overhead from type classes. 484 | 485 | * Despite its name, it works quite differently from `INLINE`. 486 | 487 | * `INLINABLE` gives us a way to do call-site specialization of type 488 | class parameters. 489 | 490 | Given 491 | 492 | ~~~~ {.haskell} 493 | module M1 where 494 | f :: Num a => a -> a -> a 495 | f x y = ... 496 | {-# INLINABLE f #-} 497 | 498 | module M2 where 499 | main = print $ f (1 :: Int) 2 500 | ~~~~ 501 | 502 | GHC will create a copy of `f` at the call site, specialized to `Int`. 503 | 504 | 505 | ## GHC Core 506 | 507 | * GHC uses an intermediate language, called "Core," as its internal 508 | representation during several compilation stages 509 | * Core resembles a subset of Haskell 510 | * The compiler performs many of its optimizations by repeatedly 511 | rewriting the Core code 512 | 513 | 514 | ## Why knowing how to read Core is important 515 | 516 | Reading the generated Core lets you answer many questions, for 517 | example: 518 | 519 | * When are expressions evaluated? 520 | * Is this function argument accessed via an indirection? 521 | * Did my function get inlined? 
522 | 523 | 524 | ## Convincing GHC to show us the Core 525 | 526 | Given this "program" 527 | 528 | ~~~~ {.haskell} 529 | module Sum where 530 | 531 | import Prelude hiding (sum) 532 | 533 | sum :: [Int] -> Int 534 | sum [] = 0 535 | sum (x:xs) = x + sum xs 536 | ~~~~ 537 | 538 | we can get GHC to output the Core by adding the `-ddump-simpl` flag 539 | 540 | ~~~~ 541 | $ ghc -O -ddump-simpl -dsuppress-module-prefixes \ 542 | -dsuppress-idinfo Sum.hs -fforce-recomp 543 | ~~~~ 544 | 545 | 546 | ## Reading Core: a guide 547 | 548 | * Unless you use the `-dsuppress-*` flags, 549 | 550 | - all names are fully qualified (e.g.`GHC.Types.Int` instead of 551 | just `Int`) and 552 | - there's lots of meta information about the function (strictness, 553 | types, etc.) 554 | 555 | * Lots of the names are generated by GHC (e.g. `w_sgJ`). 556 | 557 | Note: The Core syntax changes slightly with new compiler releases. 558 | 559 | 560 | ## Tips for reading Core 561 | 562 | Three tips for reading Core: 563 | 564 | * Open and edit it in your favorite editor to simplify it (e.g. rename 565 | variables to something sensible). 566 | * Use the ghc-core package on Hackage 567 | * Use the GHC Core major mode in Emacs (ships with haskell-mode) 568 | 569 | 570 | ## Core for the sum function 571 | 572 | ~~~~ {.haskell} 573 | Rec { 574 | $wsum :: [Int] -> Int# 575 | $wsum = 576 | \ (w_sxC :: [Int]) -> 577 | case w_sxC of _ { 578 | [] -> 0; 579 | : x_amZ xs_an0 -> 580 | case x_amZ of _ { I# x1_ax0 -> 581 | case $wsum xs_an0 of ww_sxF { __DEFAULT -> +# x1_ax0 ww_sxF } 582 | } 583 | } 584 | end Rec } 585 | 586 | sum :: [Int] -> Int 587 | sum = 588 | \ (w_sxC :: [Int]) -> 589 | case $wsum w_sxC of ww_sxF { __DEFAULT -> I# ww_sxF } 590 | ~~~~ 591 | 592 | 593 | ## Core for sum explained 594 | 595 | * Convention: A variable name that ends with # stands for an unboxed 596 | value. 597 | * GHC has split `sum` into two parts: a wrapper, `sum`, and a worker, 598 | `$wsum`. 599 | * The worker returns an unboxed integer, which the wrapper wraps in an 600 | `I#` constructor. 601 | * `+#` is addition for unboxed integers (i.e. a single assembler 602 | instruction). 603 | * GHC has added a note that `sum` should be inlined. 604 | 605 | 606 | ## What can we learn from this Core? 607 | 608 | ~~~~ {.haskell} 609 | Rec { 610 | $wsum :: [Int] -> Int# 611 | $wsum = 612 | \ (w_sxC :: [Int]) -> 613 | case w_sxC of _ { 614 | [] -> 0; 615 | : x_amZ xs_an0 -> 616 | case x_amZ of _ { I# x1_ax0 -> 617 | case $wsum xs_an0 of ww_sxF { __DEFAULT -> +# x1_ax0 ww_sxF } 618 | } 619 | } 620 | end Rec } 621 | ~~~~ 622 | 623 | The worker is not tail recursive, as it performs an addition after 624 | calling itself recursively. 625 | 626 | This means that it will use more stack. 627 | 628 | We should probably rewrite it to use an accumulator parameter. 629 | 630 | ## Summary: how to write production quality code 631 | 632 | * Think about memory layout. Use back-of-the-envelope calculations to 633 | figure out how much memory your data types will take (e.g. if you 634 | want to store lots of data in a map). 635 | 636 | * Use strict fields by default. 637 | 638 | * Know the limited number of cases (e.g. accumulator recursion) where 639 | you need to use an explicit bang pattern. 640 | 641 | * Don't go overboard with inlining. 642 | 643 | * Learn to read Core, it's useful and fun! 644 | --------------------------------------------------------------------------------