├── .gitignore ├── 01-monty-hall.Rmd ├── 02-logic.Rmd ├── 03-truth-tables.Rmd ├── 04-gamblers-fallacy.Rmd ├── 05-calculating-probabilities.Rmd ├── 06-conditional-probability.Rmd ├── 07-calculating-probabilities-2.Rmd ├── 08-bayes-theorem.Rmd ├── 09-multiple-conditions.Rmd ├── 10-induction-and-probability.Rmd ├── 11-expected-value.Rmd ├── 12-utility.Rmd ├── 13-challenges-to-eu.Rmd ├── 14-infinity-and-beyond.Rmd ├── 15-two-schools.Rmd ├── 16-beliefs-and-betting-rates.Rmd ├── 17-dutch-books.Rmd ├── 18-priors.Rmd ├── 19-significance-testing.Rmd ├── 20-lindley-paradox.Rmd ├── A-cheat-sheet.Rmd ├── B-axiomatic-probability-theory.Rmd ├── C-grue.Rmd ├── D-problem-of-induction.Rmd ├── E-selected-solutions.Rmd ├── README.md ├── _bookdown.yml ├── _output.yml ├── custom.css ├── header.html ├── img ├── allais.png ├── bertrand.png ├── bertrand_screengrab.png ├── bertrand_screenshot.png ├── daniel_bernoulli.png ├── die │ ├── die1.png │ ├── die2.png │ ├── die3.png │ ├── die4.png │ ├── die5.png │ └── die6.png ├── door_closed.png ├── door_open.png ├── ellsberg.png ├── emoji_hearts.png ├── emoji_hearts_small.png ├── emoji_nerd.png ├── emoji_nerd_small.png ├── emoji_shades.png ├── emoji_shades_small.png ├── euler.png ├── fig.png ├── fisher.png ├── flanders.png ├── goodman.png ├── hume.png ├── jeffreys.png ├── laplace.png ├── lets_make_a_deal.png ├── marg_fig.png ├── marilyn_vos_savant.png ├── moon.gif ├── moon.png ├── neon_bayes.png ├── pascal.png ├── pill_green.png ├── pill_red.png ├── playing_cards.png ├── ramsey.png ├── roulette_wheel.png ├── social_image.png ├── taxi_blue.png ├── taxi_green.png ├── vacuum.gif ├── vacuum.png ├── wiphi_grue.png └── xfiles.png ├── index.Rmd ├── preamble.html ├── preamble.tex └── toc.css /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | vip/* 3 | _book/* 4 | _bookdown_files/* 5 | _main* 6 | .vscode/* 7 | .Rproj.user/* 8 | *.Rproj 9 | .Rproj.user 10 | img_source/* 11 | ideas.md -------------------------------------------------------------------------------- /01-monty-hall.Rmd: -------------------------------------------------------------------------------- 1 | # (PART\*) Part I {-} 2 | 3 | # The Monty Hall Problem 4 | 5 | ```{block, type="epigraph"} 6 | When tackling a math problem,\ 7 | I encourage students to draw a picture.\ 8 | ---Francis Su 9 | ``` 10 | 11 | `r newthought("Imagine")` you're on a game show. There are three doors, one with a prize behind it. You're allowed to pick any door, so you choose the first one at random, door A. 12 | 13 | ```{marginfigure, echo=TRUE} 14 |  15 | Monty Hall was creator and host of the game show *Let's Make a Deal*. 16 | ``` 17 | 18 | Now the rules of the game require the host to open one of the other doors and let you switch your choice if you want. Because the host doesn't want to give away the game, they always open an empty door. 19 | 20 | In your case, the host opens door C: no prize, as expected. "Do you want to switch to door B?" the host asks. 21 | 22 | Pause a moment to think about your answer before reading on. 23 | 24 | `r newthought("What")` did you decide? Did you conclude it doesn't matter whether you stick with door A or switch to door B? 25 | 26 | ```{marginfigure, echo=TRUE} 27 |  28 | Marilyn vos Savant made the Monty Hall problem famous when she solved it correctly in her magazine column. Read about it [here](https://www.nytimes.com/1991/07/21/us/behind-monty-hall-s-doors-puzzle-debate-and-answer.html). 29 | ``` 30 | 31 | If so, you're in good company. 
Most people find this answer sensible, including some professors of statistics and mathematics. They figure there are only two possibilities remaining, door A and door B, each with the same one-in-two chance of being the winner. So it doesn't matter which one you pick. 32 | 33 | But the right answer is you should switch. Door B is now twice as likely to be the winner as door A. Why? 34 | 35 | The reason is subtle. One way to think about it is that the host's choice of which door to open is a bit of a tell. Maybe they *had* to open door C, because the prize is behind door B and they didn't want to give that away. Of course, it could be behind door A instead, so maybe they just picked door C at random. But there was only a one-in-three chance the prize would be behind door A. Which means there's a two-in-three chance they didn't really have a choice, they had to open door C to avoid showing you the prize behind door B. 36 | 37 | ```{r montygrid, echo=FALSE, fig.margin=TRUE, fig.width=5, fig.cap="The hundred-door version of the Monty Hall problem, suggested by Marilyn vos Savant"} 38 | open_door <- readPNG("img/door_open.png") %>% rasterGrob() 39 | closed_door <- readPNG("img/door_closed.png") %>% rasterGrob() 40 | 41 | place_image <- function(row, img) { 42 | annotation_custom(img, xmin = row[1], xmax = row[1] + 1, 43 | ymin = row[2] + .01, ymax = row[2] + .99) 44 | } 45 | 46 | open_grid <- expand.grid(x = 0:9, y = 9:0) 47 | open_grid <- open_grid[-c(1, 59), ] 48 | closed_grid <- data.frame(x = c(0, 8), y = c(9, 4)) 49 | 50 | ggplot() + 51 | theme_void() + xlim(0, 10) + ylim(0, 10) + 52 | apply(open_grid, 1, place_image, open_door) + 53 | apply(closed_grid, 1, place_image, closed_door) 54 | ``` 55 | 56 | Here's another way to think about it. Imagine the game had a hundred doors instead of just three. And suppose again you start by picking the first door at random. Then the host opens *all the other doors but one*, door $59$ let's say. You have to ask yourself: why did they pick door $59$ to leave closed?? Almost certainly because that's where the prize is hidden! Maybe you got really lucky and picked right with the first door at the beginning. But it's way more likely you didn't, and the host had to keep door $59$ closed to avoid giving away the game. 
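If you're still not convinced, one last resort is brute force: play the game over and over and see what happens in the long run. Here's a minimal R sketch that simulates the game under the assumptions above (the prize is placed at random, and the host opens an empty door you didn't pick, choosing at random when there's a choice):

```{r, echo=TRUE}
# One play of the game: the prize is hidden at random, you pick door A,
# the host opens an empty door you didn't pick, and you switch.
switching_wins <- function() {
  doors <- c("A", "B", "C")
  prize <- sample(doors, 1)
  # the host can open neither your door nor the prize door
  opened <- sample(setdiff(doors, c("A", prize)), 1)
  final_pick <- setdiff(doors, c("A", opened))
  final_pick == prize
}

mean(replicate(10000, switching_wins()))  # long-run win rate: about 2/3
```

The switching strategy wins about two-thirds of the time, just as the reasoning above predicts.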
57 | 58 | 59 | ## Diagramming the Solution 60 | 61 | ```{r montytree1, echo=FALSE, warning=FALSE, dev="png", fig.margin=TRUE, fig.cap="First stage of a tree diagram for the Monty Hall problem"} 62 | g <- data.frame(from = c(1, 1, 1, 2, 3, 4, 4), 63 | to = c(2, 3, 4, 5, 6, 7, 8)) %>% 64 | graph_from_data_frame() 65 | 66 | E(g)$weight <- c("1/3", "1/3", "1/3", "", "", "1/2", "1/2") 67 | 68 | vertex_attr(g, "name") <- c(NA, "C", "B", "A", 69 | "'Open B'~~~~~~bold(' ')", "'Open C'~~~~~~bold(' ')", 70 | "'Open B'~~~~~~bold(' ')", "'Open C'~~~~~~bold(' ')") 71 | 72 | ggraph(g, layout = "tree") + 73 | geom_edge_link(aes(x = x + c(0, 0, 0, 0, .15, 0, 0), 74 | xend = xend + c(0, .15, 0, 0, .15, 0, 0), 75 | label = weight), 76 | alpha = c(rep(1, 300), rep(0, 400)), 77 | label_alpha = c(1, 1, 1, 0, 0, 0, 0), 78 | label_size = 7, 79 | angle_calc = "along", 80 | label_dodge = unit(.2, "inches"), 81 | start_cap = circle(c(rep(0, 300), rep(5, 400)), 'mm'), 82 | end_cap = circle(5, 'mm')) + 83 | geom_node_text(aes(x = x + c(0, 0, .15, 0, 0, .15, 0, 0), 84 | label = name, filter = !is.na(name)), 85 | alpha = c(1, 1, 1, 0, 0, 0, 0), 86 | parse = TRUE, 87 | size = 7, 88 | hjust = c(rep(.5, 3), rep(0, 4))) + 89 | scale_y_reverse(expand = expansion(add = c(.05, .7))) + 90 | scale_x_reverse() + 91 | theme_void() + 92 | coord_flip() 93 | ``` 94 | ```{r montytree2, echo=FALSE, dev="png", fig.margin=TRUE, fig.cap="Second stage"} 95 | ggraph(g, layout = "tree") + 96 | geom_edge_link(aes(x = x + c(0, 0, 0, 0, .15, 0, 0), 97 | xend = xend + c(0, .15, 0, 0, .15, 0, 0), 98 | label = weight), 99 | label_size = 7, 100 | angle_calc = "along", 101 | label_dodge = unit(.2, "inches"), 102 | start_cap = circle(c(rep(0, 300), rep(5, 400)), 'mm'), 103 | end_cap = circle(5, 'mm')) + 104 | geom_node_text(aes(x = x + c(0, 0, .15, 0, 0, .15, 0, 0), 105 | label = name, filter = !is.na(name)), 106 | parse = TRUE, 107 | size = 7, 108 | hjust = c(rep(.5, 3), rep(0, 4))) + 109 | scale_y_reverse(expand = expansion(add = c(.05, .7))) + 110 | scale_x_reverse() + 111 | theme_void() + 112 | coord_flip() 113 | ``` 114 | ```{r montytree3, echo=FALSE, dev="png", fig.margin=TRUE, fig.cap="Third and final stage"} 115 | vertex_attr(g, "name") <- c(NA, "C", "B", "A", 116 | "'Open B'~~~~~~bold('1/3')", "'Open C'~~~~~~bold('1/3')", 117 | "'Open B'~~~~~~bold('1/6')", "'Open C'~~~~~~bold('1/6')") 118 | 119 | ggraph(g, layout = "tree") + 120 | geom_edge_link(aes(x = x + c(0, 0, 0, 0, .15, 0, 0), 121 | xend = xend + c(0, .15, 0, 0, .15, 0, 0), 122 | label = weight), 123 | label_size = 7, 124 | angle_calc = "along", 125 | label_dodge = unit(.2, "inches"), 126 | start_cap = circle(c(rep(0, 300), rep(5, 400)), 'mm'), 127 | end_cap = circle(5, 'mm')) + 128 | geom_node_text(aes(x = x + c(0, 0, .15, 0, 0, .15, 0, 0), 129 | label = name, filter = !is.na(name)), 130 | parse = TRUE, 131 | size = 7, 132 | hjust = c(rep(.5, 3), rep(0, 4))) + 133 | scale_y_reverse(expand = expansion(add = c(.05, .7))) + 134 | scale_x_reverse() + 135 | theme_void() + 136 | coord_flip() 137 | ``` 138 | 139 | A picture helps clarify things. At first the prize could be behind any of the three doors, with equal probability each way. So we draw a tree with three branches, each labeled with a probability of $1/3$. Figure \@ref(fig:montytree1) shows the result. 140 | 141 | Now, which door the host opens may depend on where the prize is, i.e. which branch we're on. If it's behind door C, they won't show you by opening that door. They would have to open door B in this case. 
142 | 143 | Likewise, if the prize is behind door B, then opening door C is their only option. 144 | 145 | Only if the prize is behind door A do they have a choice: open either door B or door C. In that case it's a tossup which door they'll open, so each of those possibilities has a 1/2 chance. Check out Figure \@ref(fig:montytree2). 146 | 147 | Now imagine playing the game over and over. A third of the time things will follow the top path; a third of the time they'll follow the middle one; and the remaining third they'll follow one of the two bottom paths. 148 | 149 | When things follow the bottom branches, half of those times the host will open door B, and half the time they'll open door C. So one in every six plays will follow the *A-and-Open-B* path. And one in every six plays will follow the *A-and-Open-C* path. See Figure \@ref(fig:montytree3). 150 | 151 | Now we can understand what happens when the host opens door C. Usually it's because the prize is behind door B. Sometimes they open door C because the prize is behind door A instead. But that's only a sixth of the time, compared to a third of the time where they open door C because the prize is behind door B. 152 | 153 | So when you see the host open door C, you should think it's more likely you're on the middle branch, with the prize behind door B. Switch! 154 | 155 | 156 | ## Lessons Learned {#lessons} 157 | 158 | Tree diagrams are a handy tool for solving probability problems. They also illustrate some central concepts of probability. 159 | 160 | Probabilities are numbers assigned to possibilities. In the Monty Hall problem, there are three possibilities for where the prize is: door A, door B, and door C. Each of these possibilities starts with the same probability: 1/3. 161 | 162 | Some possibilities are ***mutually exclusive***, meaning only one of them can obtain. The prize can't be behind door A and door B, for example. Here are more examples of mutually exclusive possibilities: 163 | 164 | - A coin can land heads or tails, but it can't do both on the same toss. 165 | - A card drawn from a standard deck could be either an ace or a queen, but it can't be both. 166 | - The temperature at noon tomorrow could be 20 degrees, or it could be 25 degrees, but it can't be both. 167 | 168 | When possibilities are mutually exclusive, their probabilities add up. For example, the initial probability the prize will be behind either door A or door B is $1/3 + 1/3 = 2/3$. And the probability a card drawn from a standard deck will be either an ace or a queen is $4/52 + 4/52 = 8/52 = 2/13$. 169 | 170 | Another key concept is possibilities that are ***exhaustive***. In the Monty Hall problem, the prize has to be behind one of the three doors, so A, B, and C "exhaust" all the possibilities. Here are more examples of exhaustive possibilities: 171 | 172 | - A card drawn from a standard deck must be either red or black. 173 | - The temperature at noon tomorrow must be either above zero, below zero, or zero. 
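Probability sums like $4/52 + 4/52$ are also easy to spot-check by simulation. Here's a quick R sketch that estimates the chance a randomly drawn card is an ace or a queen; the exact answer, $2/13$, is about $0.154$:

```{r, echo=TRUE}
# estimate the chance a random card is an ace or a queen
ranks <- c("A", 2:10, "J", "Q", "K")
deck <- rep(ranks, times = 4)  # 13 ranks in each of 4 suits: 52 cards
draws <- sample(deck, size = 100000, replace = TRUE)
mean(draws %in% c("A", "Q"))  # should be close to 8/52, about 0.154
```
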
174 | 175 | ```{r echo=FALSE, warning=FALSE, fig.margin=TRUE, fig.height=10, fig.cap="Three partitions for a card drawn from a standard deck"} 176 | 177 | # setting warning=FALSE to hide buggy warnings arising from geom_tile 178 | # see https://github.com/tidyverse/ggplot2/issues/1904 179 | 180 | colour_partition <- rbind( 181 | data.frame(x = .5, y = 5.42, w = 1, h = .85), 182 | data.frame(x = .5, y = 4.58, w = 1, h = .85) 183 | ) 184 | 185 | face_card_partition <- rbind( 186 | data.frame(x = .385, y = 3, w = .77, h = 1.7), 187 | data.frame(x = .885, y = 3, w = .23, h = 1.7) 188 | ) 189 | 190 | suit_partition <- rbind( 191 | data.frame(x = .5, y = 1.205, w = 1, h = .425), 192 | data.frame(x = .5, y = 1.628, w = 1, h = .425), 193 | data.frame(x = .5, y = 0.79, w = 1, h = .425), 194 | data.frame(x = .5, y = 0.372, w = 1, h = .425) 195 | ) 196 | 197 | img <- readPNG("img/playing_cards.png") 198 | 199 | ggplot() + 200 | theme_void() + 201 | xlim(0, 1) + ylim(0, 6) + 202 | annotation_custom(rasterGrob(img, interpolate = TRUE), xmin = 0, xmax = 1,ymin = 0, ymax = 2) + 203 | annotation_custom(rasterGrob(img, interpolate = TRUE), xmin = 0, xmax = 1, ymin = 2, ymax = 4) + 204 | annotation_custom(rasterGrob(img, interpolate = TRUE), xmin = 0, xmax = 1, 205 | ymin = 4, ymax = 6) + 206 | geom_tile(aes(x, y, height = h, width = w), colour = "forestgreen", size = 2, fill = "transparent", data = colour_partition) + 207 | geom_tile(aes(x, y, height = h, width = w), colour = "forestgreen", size = 2, fill = "transparent", data = face_card_partition) + 208 | geom_tile(aes(x, y, height = h, width = w), colour = "forestgreen", size = 2, fill = "transparent", data = suit_partition) 209 | ``` 210 | 211 | `r newthought("In our")` tree diagrams, each branch-point always uses a set of possibilities that is *both* exclusive *and* exhaustive. The first split on the three doors covers all the possibilities for where the prize might be, and only one of those possibilities can be the actual location of the prize. Likewise for the second stage of the diagram. On the bottom branch for example, the host must open either door B or door C given the rules, but he will only open one or the other. 212 | 213 | When a set of possibilities is both exclusive and exhaustive, it's called a ***partition***. A partition "carves up" the space of possibilities into distinct, non-overlapping units. 214 | 215 | There can be more than one way to partition the space of possibilities. For example, a randomly drawn playing card could be black or red; it could be a face card or not; and it could be any of the four suits ($\heartsuit$, $\diamondsuit$, $\clubsuit$, $\spadesuit$). 216 | 217 | When possibilities form a partition, their probabilities must add up to 1. Initially, the probability the prize will be behind one of the three doors is $1/3 + 1/3 + 1/3 = 1$. And the probability that a card drawn from a standard deck at random will be either red or black is $1/2 + 1/2 = 1$. 218 | 219 | In a way, the fundamental principle of probability is that probabilities over a partition must add up to 1. 220 | 221 | 222 | `r newthought("Tree")` diagrams follow a few simple rules based on these concepts. The parts of a tree are called *nodes*, *branches*, and *leaves*: see Figure \@ref(fig:treeparts). 
223 | 224 | ```{r treeparts, echo=FALSE, fig.cap="The parts of a tree diagram: nodes, branches, and leaves"} 225 | vertex_attr(g, "name") <- c(NA, "C", "B", "A", 226 | "'Open B'~~~~~~bold(' ')", "'Open C'~~~~~~bold(' ')", 227 | "'Open B'~~~~~~bold(' ')", "'Open C'~~~~~~bold(' ')") 228 | 229 | a <- arrow(length = unit(0.01, "npc"), type = "closed") 230 | 231 | ggraph(g, layout = "tree") + 232 | geom_edge_link(aes(x = x + c(0, 0, 0, 0, .15, 0, 0), 233 | xend = xend + c(0, .15, 0, 0, .15, 0, 0)), 234 | label_size = 5, 235 | angle_calc = "along", 236 | label_dodge = unit(.2, "inches"), 237 | start_cap = circle(c(rep(0, 300), rep(5, 400)), 'mm'), 238 | end_cap = circle(5, 'mm')) + 239 | geom_node_text(aes(x = x + c(0, 0, .15, 0, 0, .15, 0, 0), 240 | label = name, filter = !is.na(name)), 241 | parse = TRUE, 242 | size = 5, 243 | hjust = c(rep(.5, 3), rep(0, 4))) + 244 | annotate(geom = "segment", x = 1, y = 1.6, xend = .2, yend = 1.95, 245 | color = "firebrick", arrow = a) + 246 | annotate(geom = "segment", x = 1, y = 1.6, xend = 1.3, yend = 1.1, 247 | color = "firebrick", arrow = a) + 248 | annotate(geom = "label", label = "Nodes", x = 1, y = 1.6, 249 | color = "firebrick", label.size = NA) + 250 | annotate(geom = "segment", x = .5, y = .68, xend = 0.1, yend = .68, 251 | color = "steelblue", arrow = a) + 252 | annotate(geom = "segment", x = .5, y = .68, xend = 0.9, yend = .5, 253 | color = "steelblue", arrow = a) + 254 | annotate(geom = "segment", x = .5, y = .68, xend = 0.75, yend = 1.35, 255 | color = "steelblue", arrow = a) + 256 | annotate(geom = "label", label = "Branches", x = .5, y = .68, 257 | color = "steelblue", label.size = NA) + 258 | annotate(geom = "segment", x = -.6, y = 0, xend = -.2, yend = -.1, 259 | color = "forestgreen", arrow = a) + 260 | annotate(geom = "segment", x = -.6, y = 0, xend = -1, yend = -.1, 261 | color = "forestgreen", arrow = a) + 262 | annotate(geom = "label", label = "Leaves", x = -.6, y = 0, 263 | color = "forestgreen", label.size = NA) + 264 | scale_y_reverse(expand = expansion(add = c(.05, .7))) + 265 | scale_x_reverse() + 266 | theme_void() + 267 | coord_flip() 268 | ``` 269 | 270 | The rules for a tree are as follows. 271 | 272 | - *Rule 1.* Each node must branch into a partition. The subpossibilities that emerge from a node must be mutually exclusive, and they must include every way the possibility from which they emerge could obtain. 273 | 274 | - *Rule 2.* The probabilities emerging from a node must add to $1$. If we add up the numbers on the branches immediately coming out of a node, they should add to $1$. 275 | 276 | - *Rule 3.* The probability on a branch is *conditional* on the branches leading up to it. Consider the bottom path in the Monty Hall problem. The probability the host will open door C is $1/2$ there because we're assuming the prize is behind door A. 277 | 278 | - *Rule 4.* The probability of a leaf is calculated by multiplying across the branches on the path leading to it. This number represents the probability that all possibilities on that path occur. 279 | 280 | Notice, Rule 4 is how we got the final probabilities we used to solve the Monty Hall problem (the numbers in bold). 281 | 282 | 283 | ## Exercises {-} 284 | 285 | #. True or false: in the Monty Hall problem, it's essential to the puzzle that the host doesn't want to expose the prize. If they didn't care about giving away the location of the prize, there would be no reason to switch when they open door C. 286 | 287 | #. 
In the version of the Monty Hall problem with a hundred doors, after the host opens every door except door 1 (your door) and door 59, the chance the prize is behind door 59 is: 288 | 289 | a. 1/100 290 | b. 1/99 291 | c. 1/2 292 | d. 99/100 293 | 294 | #. Imagine three prisoners, A, B, and C, are condemned to die in the morning. But the king decides to pardon one of them first. He makes his choice at random and communicates it to the guard, who is sworn to secrecy. She can only tell the prisoners that one of them will be released at dawn, she can't say who. 295 | 296 | Prisoner A welcomes the news, as he now has a $1/3$ chance of survival. Hoping to go even further, he says to the guard, "I know you can't tell me whether I am condemned or pardoned. But at least one other prisoner must still be condemned, so can you just name one who is?". The guard tells him that B is still condemned. "Ok," says A, "then it's either me or C who was pardoned. So my chance of survival has gone up to 1/2." 297 | 298 | Is prisoner A's reasoning correct? Use a probability tree to explain why/why not. 299 | 300 | #. In a probability tree, each branch point should split into possibilities that are: 301 | 302 | a. Mutually exclusive. 303 | b. Exhaustive. 304 | c. Both mutually exclusive and exhaustive. 305 | d. None of the above. 306 | 307 | #. Suppose you have two urns. The first has two black marbles and two white marbles. The second has three black marbles and one white marble. You are going to flip a fair coin to select one of the urns at random, and then draw one marble at random. What is the chance you will select a black marble? 308 | 309 | Hint: draw a probability tree and ask yourself, "if I did this experiment over and over again, how often would I draw a black marble in the long run?" 310 | 311 | a. 5/8 312 | b. 3/8 313 | c. 1/2 314 | d. 1/4 315 | 316 | #. An ice cream counter sells 4 different flavours of ice cream (chocolate, vanilla, strawberry, mint). There are $2$ toppings (fudge, caramel), and $3$ kinds of sprinkles (chocolate, rainbow, purple). You order by picking a flavour, a topping, and a kind of sprinkles. 317 | 318 | a. How many possible orders are there? 319 | b. If you make your three choices randomly, what is the probability your order will have strawberry ice cream but not rainbow sprinkles? -------------------------------------------------------------------------------- /02-logic.Rmd: -------------------------------------------------------------------------------- 1 | # Logic 2 | 3 | ```{block, type="epigraph"} 4 | I can win an argument on any topic, against any opponent. People know this, and steer clear of me at parties. Often, as a sign of their great respect, they don't even invite me.\ 5 | ---Dave Barry 6 | ``` 7 | 8 | `r newthought("Logic")` is the study of what follows from what. From the information that Tweety is a bird and all birds are animals, it follows that Tweety is an animal. But things aren't always so certain. Can Tweety fly? Most birds can fly, so probably. But Tweety might be a penguin. 9 | 10 | *Deductive* logic is the branch of logic that studies what follows with certainty. *Inductive* logic deals with uncertainty, things that only follow with high probability. 11 | 12 | This book is about inductive logic and probability. But we need a few concepts from deductive logic to get started. 13 | 14 | 15 | ## Validity & Soundness 16 | 17 | In deductive logic we study "valid" arguments. An argument is ***valid*** when the conclusion must be true if the premises are true. 
Take this example again:

```{block, type="argument", echo=TRUE}
Tweety is a bird.\
All birds are animals.\
Therefore, Tweety is an animal.
```

The first two lines are called the *premises* of the argument. The last line is called the *conclusion*. In this example, the conclusion must be true if the premises are. So the argument is valid.

Here's another example of a valid argument:

```{block, type="argument", echo=TRUE}
Tweety is taller than Kwazi.\
Kwazi is taller than Peso.\
Therefore, Tweety is taller than Peso.
```

The argument is valid because it's just not possible for the premises to be true and the conclusion false.

Here's an example of an *invalid* argument:

```{block, type="argument", echo=TRUE}
Tweety is a bird.\
Most birds can fly.\
Therefore, Tweety can fly.
```

It's not valid because validity requires the conclusion to follow *necessarily*. If there's any way for the premises to be true yet the conclusion false, the argument doesn't count as valid. And like we said, Tweety might be a penguin.

Valid arguments are interesting because their logic is airtight. If the assumptions of the argument are correct, there's no way to go wrong accepting the conclusion. But what if the assumptions *aren't* correct? Validity isn't everything; we also want our arguments to build on true foundations.

`r newthought("We call")` an argument ***sound*** when it is valid *and* all the premises are true:
$$ \mbox{sound = valid + true premises}.$$
For example, here's a sound argument:

```{block, type="argument", echo=TRUE}
The author of this book is human.\
All humans are animals.\
Therefore, the author of this book is an animal.
```

Sound arguments are important because their conclusions are always true. The premises of a sound argument are true by definition. And since it's valid by definition too, that guarantees the conclusion to be true as well.

Yet deductive logic studies validity, not soundness. Why?

Because logicians aren't in the business of determining when the premises of an argument are true. As a logician, I might have no idea who Tweety is, and thus no idea whether Tweety is a bird. I might not even know whether all birds fly, or just some, or even none. That's a job for an ornithologist.

A logician's job is to assess the *logic* of an argument, the connections between its assumptions and its conclusion. So a logician just takes the premises of an argument for granted, and asks how well they support the conclusion. That's something you don't need to know any ornithology to study. Or biology, or medicine, or physics, or whatever topic a particular argument concerns.

`r newthought("Validity")` is a tricky, counterintuitive concept. It's very much a hypothetical notion: it's about whether the conclusion must be true *if* the premises are true. So when we assess an argument's validity, we ignore what we know about the truth of its premises. We pretend they're true even if they aren't. We even have to ignore what we know about the conclusion.

Instead, we suspend what we know about the topic, and just imagine the premises to be true. Then we ask: in this hypothetical scenario, is there any way the conclusion could be false? If there is, the argument is invalid. Otherwise, it's valid.
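For arguments built out of whole propositions and connectives like "if...then," this hypothetical test can even be carried out mechanically: list every combination of truth values, keep the rows where all the premises are true, and check whether the conclusion is false in any of them. Here's a small R sketch of the idea, applied to the argument "If $A$ then $B$; $A$; therefore $B$" (we'll meet this truth-table method properly in the next chapter):

```{r, echo=TRUE}
# every combination of truth values for A and B
rows <- expand.grid(A = c(TRUE, FALSE), B = c(TRUE, FALSE))

# "If A then B" counts as false only when A is true and B is false
premises_true <- (!rows$A | rows$B) & rows$A
conclusion_true <- rows$B

# valid just in case no row makes the premises true and the conclusion false
any(premises_true & !conclusion_true)  # FALSE here, so the argument is valid
```
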
## Propositions

Arguments are made out of statements, assertions that something is true. In logic we call these statements ***propositions***. And we use capital letters of the English alphabet to stand for them. For example, this argument:

```{block, type="argument", echo=TRUE}
If Aegon is a tyrant, then Brandon is a wizard.\
Aegon is a tyrant.\
Therefore, Brandon is a wizard.
```

can be summarized like this:

```{block, type="argument", echo=TRUE}
If $A$, then $B$.\
$A$.\
Therefore, $B$.
```

`r newthought("Not")` all sentences are propositions. Some are questions, some are commands, some are expressions of worry. For example:

- What time is it?
- Pass the rooster sauce!
- Uh oh.

One way to distinguish propositions from other kinds of sentences is: propositions are capable of being true or false. It wouldn't make sense to respond to someone who asks you what time it is by saying, "what you just said is false!" And you wouldn't respond to someone's request to pass the sauce with "that's true!" Except maybe as a joke.


## Visualizing Propositions

We learned about mutually exclusive propositions in [Section](#lessons) \@ref(lessons). Two propositions are mutually exclusive when one of them being true means the other must be false. For example:

- $A$: Confucius was born in the 6th Century A.D.
- $B$: Confucius was born in the 6th Century B.C.

There is no way for both of these propositions to be true, and we can visualize this relationship in a diagram (Figure \@ref(fig:meprops)).

```{r meprops, echo=FALSE, fig.margin=TRUE, fig.cap="Mutually exclusive propositions"}
euler_diagram <- function(propositions) {
  ggplot(data = propositions) + theme_void() + coord_fixed() +
    xlim(-3,3) + ylim(-2,2) +
    theme(panel.border = element_rect(colour="black", fill=NA, size=1)) +
    geom_circle(aes(x0 = cirx, y0 = ciry, r = r)) +
    geom_text(aes(x = labx, y = laby, label = labl), parse = TRUE, size = 7)
}

propositions <- data.frame(
  cirx = c(-1.25 , 1.25),
  ciry = c(0 , 0),
  r = c(1 , 1),
  labx = c(-2 , 2),
  laby = c(1 , 1),
  labl = c("italic(A)", "italic(B)")
)

euler_diagram(propositions)
```

Each circle represents a proposition. You can think of it as surrounding the possible situations where the proposition would be true. The circles don't overlap because there is no possible situation where both propositions in this example are true.

In contrast, these two propositions are not mutually exclusive:

- Confucius was born in Asia.
- Confucius was born in the 6th Century B.C.

When propositions are not mutually exclusive, we say they are ***compatible***. Compatible propositions overlap (Figure \@ref(fig:compropositions)). The region where the circles overlap represents the possible scenarios where both propositions are true (the "$A \wedge B$ region").
137 | 138 | ```{r compropositions, fig.margin=TRUE, echo=FALSE, fig.cap="Compatible propositions"} 139 | propositions <- data.frame( 140 | cirx = c(-.5 , .5), 141 | ciry = c(0 , 0), 142 | r = c(1 , 1), 143 | labx = c(-1.25 , 1.25), 144 | laby = c(1 , 1), 145 | labl = c("italic(A)", "italic(B)") 146 | ) 147 | 148 | euler_diagram(propositions) 149 | ``` 150 | 151 | [^eulernote]: Leonhard Euler lived from $1707$ to $1783$. You may have encountered some of his work before if you've worked with logarithms or taken calculus. 152 | 153 | These are called ***Euler diagrams***, after the mathematician Leonhard Euler (pronounced *oiler*).[^eulernote] You may have seen Venn diagrams before, which are very similar. But in an Euler diagram, the circles don't have to overlap. 154 | 155 | `r newthought("Sometimes")` one circle will even contain another circle entirely. Take this example: 156 | 157 | - Confucius was born in Asia. 158 | - Confucius was born somewhere. 159 | 160 | These propositions aren't just compatible. If the first is true, then the second *must* be true. Imagine an argument with the first proposition as the premise and the second proposition as the conclusion. The argument would be valid: 161 | 162 | ```{block, type="argument", echo=TRUE} 163 | Confucius was born in Asia.\ 164 | Therefore, Confucius was born somewhere. 165 | ``` 166 | 167 | ```{r entailment, echo=FALSE, fig.margin=TRUE, fig.cap="Logical entailment"} 168 | propositions <- data.frame( 169 | cirx = c(0 , 0), 170 | ciry = c(0 , 0), 171 | r = c(1.25 , .5), 172 | labx = c(-.5 , -.95), 173 | laby = c(.5 , 1.15), 174 | labl = c("italic(A)", "italic(B)") 175 | ) 176 | euler_diagram(propositions) 177 | ``` 178 | 179 | In this case we say that the first proposition ***logically entails*** the second. In terms of an Euler diagram, the first circle is contained entirely in the second (Figure \@ref(fig:entailment)). Because there is no possible situation where the first proposition is true yet the second false. 180 | 181 | What if an argument has multiple premises? For example: 182 | 183 | ```{block, type="argument", echo=TRUE} 184 | Zhuangzi was born in the Chinese province of Anhui.\ 185 | Zhuoru was born in the Chinese city of Beijing.\ 186 | Therefore, both Zhuangzi and Zhuoru were born in China. 187 | ``` 188 | 189 | ```{r validtwopremises, fig.margin=TRUE, echo=FALSE, fig.cap="A valid argument with two premises"} 190 | propositions <- data.frame( 191 | cirx = c(-.75 , .75, 0), 192 | ciry = c(0 , 0 , 0), 193 | r = c(1 , 1 , 1), 194 | labx = c(-1.25 , 1.25, 0), 195 | laby = c(1.15 , 1.15, 1.15), 196 | labl = c("italic(A)", "italic(B)", "italic(C)") 197 | ) 198 | euler_diagram(propositions) 199 | ``` 200 | 201 | This argument is valid, and the diagram might look like Figure \@ref(fig:validtwopremises). Notice how the $A \wedge B$ region lies entirely within the $C$ circle. This reflects the argument's validity: there is no way for the first two propositions to be true and the last one false. 202 | 203 | In contrast, an invalid argument would have a diagram like Figure \@ref(fig:invalidtwopremises). This diagram allows for the possibility that $A$ and $B$ are both true yet $C$ is false; part of the $A \wedge B$ region falls outside the $C$ circle. 
```{r invalidtwopremises, echo=FALSE, fig.margin=TRUE, fig.cap="An invalid argument with two premises"}
propositions <- data.frame(
  cirx = c(-.75 , .75 , 0),
  ciry = c(-.5 , -.5 , 0.45),
  r = c(1 , 1 , 1),
  labx = c(-1.25 , 1.25 , 0),
  laby = c(.65 , .65 , 1.7),
  labl = c("italic(A)", "italic(B)", "italic(C)")
)
euler_diagram(propositions)
```


## Strength

Inductive logic studies arguments that aren't necessarily valid, but still "strong." A ***strong*** argument is one where the conclusion is highly probable, if the premises are true. For example:

```{block, type="argument", echo=TRUE}
The sun has risen every day so far.\
Therefore, the sun will rise again tomorrow.
```

This argument isn't valid, because it's possible the conclusion is false even though the premise is true. Maybe the sun will explode in the night for some surprising reason. Or maybe the earth's rotation will be stopped by alien forces.

These possibilities aren't very likely, of course. So the argument is strong, even though it's not strictly valid. The premise gives us very good reason to believe the conclusion, just not a 100% guarantee.

In terms of an Euler diagram then, the premise circle isn't contained entirely within the conclusion circle (Figure \@ref(fig:strongarg)). We have to leave some room for the possibility that the premise is true and the conclusion false. But we can still convey that this possibility has only a very slight chance of being true, by making it slim.

```{r strongarg, echo=FALSE, fig.margin=TRUE, fig.cap="A strong argument with premise $A$ and conclusion $B$"}
propositions <- data.frame(
  cirx = c(0 , .85),
  ciry = c(0 , 0),
  r = c(1.25 , .5),
  labx = c(-1.05, .3),
  laby = c(1.05 , .45),
  labl = c("italic(B)", "italic(A)")
)
euler_diagram(propositions)
```

(ref:laplacecap) Pierre-Simon Laplace (1749--1827) developed [a formula](https://bit.ly/2mU9WgW) for calculating the probability the sun will rise tomorrow. We'll learn how to do similar calculations in the coming chapters.

```{r laplace, echo=FALSE, fig.margin=TRUE, fig.cap="(ref:laplacecap)"}
knitr::include_graphics("img/laplace.png")
```

We could also label the *$A$-but-not-$B$* region with a small number, if we knew exactly how unlikely this possibility was.

`r newthought("Strength")` comes in degrees. An argument's premises can make the conclusion somewhat likely, very likely, almost certain, or perfectly certain. So arguments range from weak, to somewhat strong, to very strong, etc.

Strength differs from validity here, since validity is all-or-nothing. If there is any possible way for the premises to be true and the conclusion false, the argument is invalid---no matter how remote or bizarre that possibility is.

Notice though, valid arguments are strong by definition. Since it's impossible for a valid argument's conclusion to be false if the premises are true, the premises make the conclusion 100% probable. A valid argument is the strongest possible argument.


## Forms of Inductive Argument {#indargs}

What kinds of strong arguments are there, and how strong are they? That's what the rest of this book is about, in a way. But we can start by identifying some common forms of inductive argument right now.
`r newthought("Generalizing")` from observed instances is one extremely common form of argument:

```{block, type="argument", echo=TRUE}
Every raven I have ever seen has been black.\
Therefore, all ravens are black.
```

Arguments of this kind are usually stronger the more instances you observe. If you've only ever seen two ravens, this argument won't be very compelling. But if you've seen thousands, then it's much stronger.

It also helps to observe different kinds of instances. If you've only observed ravens in your city or town, then even the thousands you've seen won't count for much. Maybe the raven population in your area is unusual, and ravens on the other side of the world are all different colours.

`r newthought("Going")` in the opposite direction, we can use what we know about a general population to draw conclusions about particular instances. We saw an example of this earlier:

```{block, type="argument", echo=TRUE}
Most birds can fly.\
Tweety is a bird.\
Therefore, Tweety can fly.
```

Again, the strength of the inference depends on the details. If "most birds" means $99\%$, the argument is quite strong. If "most" only means $80\%$, then it's not so strong. (Usually "most" just means more than $50\%$.)

It also helps to know that Tweety is similar to the birds that can fly, and different from the ones that can't. If we know that Tweety is small and has feathers, that makes the argument stronger. If instead we know that Tweety is large, and coloured black and white, that makes the argument weaker. It suggests Tweety is a penguin.


`r newthought("Inference")` to the best explanation is another common form of argument, quite different from the previous two. Here's an example:

```{block, type="argument", echo=TRUE}
My car won't start and the gas gauge reads 'empty.'\
Therefore, my car is out of gas.
```

An empty tank would explain the symptoms described in the premise, so the premise makes the conclusion plausible. There could be other possible explanations, of course. Maybe the engine and the gas gauge both just happened to break at the same time. But that would be quite a coincidence, so this explanation isn't as good.

What makes one explanation better than another? That turns out to be a very hard question, and there is no generally accepted answer. We'll come back to this issue later, once we have a better grip on the basics of probability.


## Exercises {-}

#. For each of the following arguments, say whether it is valid or invalid.

    a. All cats have whiskers.\
       Simba has whiskers.\
       Therefore, Simba is a cat.

    #. Ada Lovelace wrote the world's first computer program.\
       Ada Lovelace was Lord Byron's daughter.\
       Therefore, the first computer program was written by Lord Byron's daughter.

    #. All Canadian residents are Russian citizens.\
       Vladimir Putin is a Canadian resident.\
       Therefore, Vladimir Putin is a Russian citizen.

    #. Manitoba is located in either Saskatchewan, Ontario, or Quebec.\
       Manitoba is not located in Saskatchewan.\
       Manitoba is not located in Ontario.\
       Therefore, Manitoba is located in Quebec.

    #. If snow is black then pigs can fly.\
       Snow is not black.\
       Therefore, pigs cannot fly.

    #. 
Either the moon is made of green cheese or pigs can fly.\ 326 | Pigs can't fly.\ 327 | Therefore the moon is made of green cheese.\ 328 | 329 | #. For each pair of propositions, say whether they are mutually exclusive or compatible. 330 | 331 | a. Regarding the roll of an ordinary die: 332 | 333 | - The die will land on an even number. 334 | - The die will land either $4$ or $5$. 335 | 336 | #. Regarding the unemployment rate in your country tomorrow: 337 | 338 | - The unemployment rate will be at least $5\%$. 339 | - The unemployment rate will be exactly $5\%$. 340 | 341 | #. Regarding a party tomorrow: 342 | 343 | - Ani will be there and so will her sister PJ. 344 | - PJ will not be there. 345 | 346 | #. True or false? If $A$ and $B$ are mutually exclusive, and $B$ and $C$ are also mutually exclusive, then $A$ and $C$ are mutually exclusive. 347 | 348 | #. True or false? If $A$ and $B$ are mutually exclusive, then $A$ logically entails that $B$ is false. 349 | 350 | #. True or false? It is possible for $A$ to logically entail $B$ even though the reverse does not hold (i.e. even though $B$ does not logically entail $A$). 351 | 352 | #. Create your own example of each of the three types of inductive argument described in this chapter: 353 | 354 | a. Generalizing from Observed Instances 355 | #. Inferring an Instance from a Generalization 356 | #. Inference to the Best Explanation 357 | 358 | #. Suppose a family has two children. 359 | 360 | a. How many possible ways are there for their birthdays to be distributed among the four seasons fall, winter, spring, and summer? 361 | #. Suppose we know the two siblings were born in different seasons. How many possibilities are there then? 362 | #. Suppose another family has three children. We don't know whether any of them have the same birth-season. How many possibilities are there in this case? 363 | 364 | #. Suppose $A$ and $B$ are compatible, and $B$ and $C$ are compatible. Does that mean $A$ and $C$ are compatible? If yes, justify your answer. If no, describe a counterexample (an example where $A$ and $B$ are compatible, and $B$ and $C$ are compatible, but $A$ and $C$ are *not* compatible). 365 | 366 | #. Suppose $A$ logically entails $B$, and $B$ logically entails $C$. Does that mean $A$ logically entails $C$? If yes, justify your answer. If no, describe a counterexample (an example where $A$ logically entails $B$, and $B$ logically entails $C$, but $A$ does *not* logically entail $C$). -------------------------------------------------------------------------------- /04-gamblers-fallacy.Rmd: -------------------------------------------------------------------------------- 1 | # The Gambler's Fallacy 2 | 3 | ```{block, type="epigraph"} 4 | Applied statistics is hard.\ 5 | ---Andrew Gelman 6 | ``` 7 | 8 | `r newthought("My wife's")` family keeps having girls. My wife has two sisters and she and her sisters each have two daughters, with no other siblings or children. That's nine girls in a row! 9 | 10 | ```{marginfigure, echo=TRUE} 11 | Note that girl and boy aren't the only possibilities. The next child could also be intersex, like fashion model [Hanne Gaby Odiele](https://en.wikipedia.org/wiki/Hanne_Gaby_Odiele). 12 | ``` 13 | 14 | So are they due for a boy next? Here are three possible answers. 15 | 16 | - *Answer 1.* Yes, the next baby is more likely to be a boy. Ten girls in a row would be a *really* unlikely outcome. 17 | - *Answer 2.* No, the next baby is actually more likely to be a girl. Girls run in the family! 
Something about this family clearly predisposes them to have girls.
- *Answer 3.* No, the next baby is equally likely to be a boy or a girl. Each baby's sex is determined by a purely random event, similar to a coin flip. So it's equal odds every time. The nine girls so far is just a coincidence.

Which answer is correct?


## Independence

It all hangs on whether the sex of each baby is "independent" of the others. Two events are ***independent*** when the outcome of one doesn't change the probability of the other.

A clear example of independence is ***sampling with replacement***. Suppose we have an urn with $50$ black marbles and $50$ white ones. You draw a marble at random, then put it back. Then you give the urn a good hard shake and draw at random again. The two draws are independent in this case. Each time you draw, the number of black vs. white marbles is the same, and they're all randomly mixed up.

Even if you were to draw ten white marbles in a row, the eleventh draw would still be a $50$-$50$ shot! It would just be a coincidence that you drew ten white marbles in a row. Because there's always an even mix of black and white marbles, and you're always picking one at random.

But now imagine sampling ***without replacement***. The situation is the same, except now you set aside each marble you draw rather than put it back. Now the draws are ***dependent***. If you draw a black marble on the first draw, the odds of black on the next draw go down. There are only $49$ black marbles in the urn now, vs. $50$ white.

```{r eikosex, echo=FALSE, fig.margin=TRUE, fig.cap="Example of an eikosogram."}
ggplot() +
  geom_rect(aes(xmin = 0, xmax = 1/3, ymin = 3/4, ymax = 1),
            fill = "transparent", colour = "black") +
  geom_rect(aes(xmin = 1/3, xmax = 1, ymin = 3/4, ymax = 1),
            fill = "transparent", colour = "black") +
  geom_rect(aes(xmin = 0, xmax = 2/3, ymin = 0, ymax = 3/4),
            fill = "transparent", colour = "black") +
  geom_rect(aes(xmin = 2/3, xmax = 1, ymin = 0, ymax = 3/4),
            fill = "transparent", colour = "black") +
  geom_text(aes(x = -.1, y = 7/8, label = "A"),
            fontface = "italic", size = 7) +
  geom_text(aes(x = -.1, y = 3/8, label = "~A"),
            fontface = "italic", size = 7) +
  geom_text(aes(x = 1/6, y = 1.075, label = "B"),
            fontface = "italic", size = 7) +
  geom_text(aes(x = 2/3, y = 1.075, label = "~B"),
            fontface = "italic", size = 7) +
  coord_fixed() +
  theme_void()
```

`r newthought("We can")` visualize independence with an ***eikosogram***, which is like an Euler diagram but with rectangles instead of circles. The size of each sector reflects its probability. For example, in Figure \@ref(fig:eikosex) the proposition $A \wedge B$ has low probability ($1/12$), while $\neg A \wedge B$ has much higher probability ($1/2$).
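Before we see how independence looks in an eikosogram, it's worth checking the urn story empirically. This R sketch estimates the chance of black on the second draw given black on the first, under each sampling scheme:

```{r, echo=TRUE}
# chance of black on the second draw, given black on the first
urn <- c(rep("black", 50), rep("white", 50))

second_given_first <- function(replace, n = 100000) {
  draws <- replicate(n, sample(urn, 2, replace = replace))  # a 2-by-n matrix
  first_black <- draws[1, ] == "black"
  second_black <- draws[2, ] == "black"
  mean(second_black[first_black])  # relative frequency among black-first trials
}

second_given_first(replace = TRUE)   # about 0.50
second_given_first(replace = FALSE)  # about 0.49 (exactly 49/99)
```

With replacement, the first draw tells you nothing about the second. Without replacement, it shifts the odds, just as described above.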
56 | 57 | ```{r eikosindex, echo=FALSE, fig.margin=TRUE, fig.cap="Example of an eikosogram where $A$ and $B$ are independent."} 58 | ggplot() + 59 | geom_rect(aes(xmin = 0, xmax = 2/3, ymin = 3/4, ymax = 1), 60 | fill = "transparent", colour = "black") + 61 | geom_rect(aes(xmin = 2/3, xmax = 1, ymin = 3/4, ymax = 1), 62 | fill = "transparent", colour = "black") + 63 | geom_rect(aes(xmin = 0, xmax = 2/3, ymin = 0, ymax = 3/4), 64 | fill = "transparent", colour = "black") + 65 | geom_rect(aes(xmin = 2/3, xmax = 1, ymin = 0, ymax = 3/4), 66 | fill = "transparent", colour = "black") + 67 | geom_text(aes(x = -.1, y = 7/8, label = "A"), 68 | fontface = "italic", size = 7) + 69 | geom_text(aes(x = -.1, y = 3/8, label = "~A"), 70 | fontface = "italic", size = 7) + 71 | geom_text(aes(x = 1/3, y = 1.075, label = "B"), 72 | fontface = "italic", size = 7) + 73 | geom_text(aes(x = 5/6, y = 1.075, label = "~B"), 74 | fontface = "italic", size = 7) + 75 | coord_fixed() + 76 | theme_void() 77 | ``` 78 | 79 | When propositions are independent, as in Figure \@ref(fig:eikosindex), the eikosogram is divided by just two straight lines. That way the $A$ region takes up the same percentage of the $B$ region as it does the $\neg B$ region. So the probability of $A$ is the same whether $B$ is true or false. In other words, $B$'s truth has no effect on the probability of $A$. 80 | 81 | 82 | ## Fairness 83 | 84 | Flips of an ordinary coin are also independent. Even if you get ten heads in a row, the eleventh toss is still $50$-$50$. If it's *really* an ordinary coin, the ten heads in a row was just a coincidence. 85 | 86 | Coin flips aren't just independent, they're also *unbiased*: heads and tails are equally likely. A process is ***biased*** if some outcomes are more likely than others. For example, a loaded coin that comes up heads $3/4$ of the time is biased. 87 | 88 | So coin flips are unbiased *and* independent. We call such processes ***fair***. 89 | $$ \mbox{Fair = Unbiased + Independent}.$$ 90 | Another example of a fair process is drawing from our black/white urn with replacement. There are $50$ black and $50$ white marbles on every draw, so black and white have equal probability every time. 91 | 92 | But drawing without replacement is not a fair process, because the draws are not independent. Removing a black ball makes the chance of black on the next draw go down. 93 | 94 | 95 | ## The Gambler's Fallacy 96 | 97 | Gambling often involves fair processes: fair dice, fair roulette wheels, fair decks of cards, etc. But people sometimes forget that fair processes are independent. If a roulette wheel comes up black nine times in a row, they figure it's "due" for red. Or if they get a bunch of bad hands in a row at poker, they figure they're due for a good one soon. 98 | 99 | ```{marginfigure, echo=TRUE} 100 | "Simply cannot stop thinking about a girl I knew who dated a guy she couldn't STAND---all because his ex died in a plane crash, & her biggest fear was dying in a plane crash, & she figured there was no way the universe would let both of this guy's girlfriends die in a plane crash" ---[\@isabelzawtun](https://twitter.com/isabelzawtun/status/1329537047837675522?s=20). 101 | ``` 102 | 103 | This way of thinking is called *the gambler's fallacy*. A fallacy is a mistake in reasoning. The mistake here is failing to fully account for independence. These gamblers know the process in question is fair, in fact that's a key part of their reasoning. 
They know it's unlikely that the roulette wheel will land on black ten times in a row because a fair wheel should land on black and red equally often. But then they overlook the fact that fair also means independent, and independent means the last nine spins tell us nothing about the tenth spin.

The gambler's fallacy is so seductive that it can be hard to find your way out of it. Here's one way to think about it that may help. Imagine the gambler's point of view at two different times: before the ten spins of the wheel, and after. Before, the gambler is contemplating the likelihood of getting ten black spins in a row:

$$ \_ \, \_ \,\_ \,\_ \,\_ \,\_ \,\_ \,\_ \,\_ \,\_ \, ? $$

From that vantage point, the gambler is exactly right to think it's unlikely these ten spins will all land on black. But now imagine their point of view after observing (to their surprise) the first nine spins all landing black:

$$ B \, B \, B \, B \, B \, B \, B \, B \, B \, \_ \,? $$

Now how likely is it these ten spins will all land black? Well, just one more spin has to land black now to fulfill this unlikely prophecy. So it's not such a long shot anymore. In fact it's a $50$-$50$ shot. Although it was very unlikely the first nine spins would turn out this way, now that they have, it's perfectly possible the tenth will turn out the same.


## Ignorance Is Not a Fallacy

At this point you may have a nagging thought at the back of your mind. If we flipped a coin $100$ times and it landed heads every time, wouldn't we conclude the next toss will probably land heads? How could that be a mistake?!

The answer: it's not a mistake. It *would* be a mistake if you knew the coin was fair. But if you don't know that, then $100$ heads in a row could be enough to convince you it's actually not fair.

The gambler's fallacy only occurs when you know a process is fair, and then you fail to reason accordingly. If you don't know whether a process is fair, then you aren't making a logical error by reasoning according to a different assumption.

So, is the gambler's fallacy at work if my wife's family expects a boy next? As it turns out, [the process that determines the sex of a child is pretty much fair](https://doi.org/10.1080/09332480.2001.10542293).^[The question isn't completely settled though, as far as I could tell from my (not very thorough) research.] So the correct answer to our opening question was Answer 3.

Most people don't know about the relevant research, though. They may (like me) only know a bit from high school biology about how sex is usually determined at conception. But it's still possible for all they know that some people's eggs are more likely to select sperm cells with an X chromosome, for example.

So it's not necessarily a fallacy if my in-laws expect a boy next. It could just be a reasonable conclusion given the information available. A fallacy is an error in reasoning, not a lack of knowledge.


## The Hot Hand Fallacy

Sometimes a basketball player hits a lot of baskets in a row and people figure they're on fire: they've got a "hot hand." But [a famous study published in 1985](http://www.sciencedirect.com/science/article/pii/0010028585900106?via%3Dihub) found that these streaks are just a coincidence. Each shot is still independent of the others. Is the hot hand an example of the gambler's fallacy?
134 | 135 | Most people don't know about the famous 1985 study. Certainly nobody knew what the result of the study would be before it was conducted. So a lot of believers in the hot hand were in the unfortunate position of just not knowing a player's shots are independent. So the hot hand isn't the same as the gambler's fallacy. 136 | 137 | Believers in the hot hand may be guilty of a different fallacy, though. That same study analyzed the reasoning that leads people to think a player's shots are dependent. Their conclusion: people tend to see patterns in sequences of misses and hits even when they're random. So there may be a second, different fallacy at work, the "hot hand fallacy." 138 | 139 | Things might actually be even more complicated than that, though. [Some recent studies](https://www.gsb.stanford.edu/insights/jeffrey-zwiebel-why-hot-hand-may-be-real-after-all) found that the hot hand may actually be real after all! How could that be possible? What did the earlier studies miss? 140 | 141 | It's still being researched, but one possibility is: defense. When a basketball player gets in the zone, the other team ups their defense. The hot player has to take harder shots. So one of the recent studies added a correction to account for increased difficulty. And another looked at baseball instead, where they did find evidence of streaks. 142 | 143 | ```{marginfigure, echo=TRUE} 144 | But here's [Selena Gomez and Nobel prize winner Richard Thaler](https://www.youtube.com/watch?v=Pxr_FzpPM2Q) telling a bit of the story in a clip from the 2015 movie *The Big Short*. 145 | ``` 146 | 147 | The full story of the hot hand fallacy has yet to be told it seems. 148 | 149 | 150 | ## Exercises {-} 151 | 152 | #. Suppose you are going to draw a card at random from a standard deck. For each pair of propositions, say whether they are independent or not. 153 | 154 | a. $A$: the card is red.\ 155 | $B$: the card is an ace. 156 | 157 | #. $A$: the card is red.\ 158 | $B$: the card is a diamond. 159 | 160 | #. $A$: the card is an ace.\ 161 | $B$: the card is a spade. 162 | 163 | #. $A$: the card is a Queen.\ 164 | $B$: the card is a face card. 165 | 166 | #. After one draw with replacement, drawing a second marble from an urn filled with $50$ black and $50$ white marbles is a ______ process. 167 | 168 | a. Independent 169 | #. Fair 170 | #. Unbiased 171 | #. All of the above 172 | #. None of the above 173 | 174 | #. After one draw *without* replacement, drawing a second marble from an urn filled with $50$ black and $50$ white marbles is a ______ process. 175 | 176 | a. Independent 177 | #. Fair 178 | #. Unbiased 179 | #. All of the above 180 | #. None of the above 181 | 182 | #. For each of the following examples, say whether it is an instance of the gambler's fallacy. 183 | 184 | a. You're playing cards with your friends using a standard, randomly shuffled deck of $52$ cards. You're about half-way through the deck and no aces have been drawn yet. You conclude that an ace is due soon, and thus the probability the next card is an ace has gone up. 185 | 186 | #. You're holding a six-sided die, which you know to be fair. You're going to roll it $60$ times. You figure about $10$ of those rolls should be threes. But after $59$ rolls, you've rolled a three only five times. You figure that the probability of a three on the last roll has gone up: it's higher than just $1/6$. 187 | 188 | #. You know the lottery numbers in Ontario are selected using a fair process. 
So it's really unlikely that someone will win two weeks in a row. Your friend won last week, so you conclude their chances of winning this week too are even lower than usual. 189 | 190 | #. You're visiting a new country where corruption is common, so you aren't sure whether the lottery there is fair. You see on the news that the King's cousin won the lottery two weeks in a row. You conclude that their chances of winning next week are higher than normal, because the lottery is rigged in their favour. 191 | 192 | #. Suppose $A$ and $B$ are compatible (not mutually exclusive). Does it follow that they are independent? If your answer is yes, explain why this follows. If your answer is no, give a counterexample (an example where $A$ and $B$ are neither mutually exclusive nor independent). 193 | 194 | #. Suppose $A$ is independent of $B$, and $B$ is independent of $C$. Does that mean $A$ is independent of $C$? If yes, justify your answer. If no, describe a counterexample (an example where it's false). -------------------------------------------------------------------------------- /05-calculating-probabilities.Rmd: -------------------------------------------------------------------------------- 1 | # Calculating Probabilities 2 | 3 | 4 | `r newthought("Imagine")` you're going to flip a fair coin twice. You could get two heads, two tails, or one of each. How probable is each outcome? 5 | 6 | It's tempting to say they're equally probable, $1/3$ each. But actually the first two are only $1/4$ likely, while the last is $1/2$ likely. Why? 7 | 8 | There are actually four possible outcomes here, but we have to consider the order of events to see how. If you get one each of heads and tails, what order will they come in? You could get the head first and then the tail, or the reverse. 9 | 10 | So there are four possible sequences: HH, TT, HT, and TH. And all four sequences are equally likely, a probability of $1/4$. 11 | 12 | How do we know each sequence has $1/4$ probability though? And how does that tell us the probability is $1/2$ that you'll get one each of heads and tails? We need to introduce some mechanics of probability to settle these questions. 13 | 14 | 15 | ## Multiplying Probabilities 16 | 17 | We denote the probability of proposition $A$ with $Pr(A)$. For example, $Pr(A)=2/3$ means there's a $2/3$ chance $A$ is true. 18 | 19 | Now, our coin is fair, and by definition that means it always has a $1/2$ chance of landing heads and a $1/2$ chance of landing tails. For a single toss, we can use $H$ for the proposition that it lands heads, and $T$ for the proposition that it lands tails. We can then write $Pr(H) = 1/2$ and $Pr(T) = 1/2$. 20 | 21 | For a sequence of two tosses, we can use $H_1$ for heads on the first toss, and $H_2$ for heads on the second toss. Similarly, $T_1$ and $T_2$ represent tails on the first and second tosses, respectively. The four possible sequences are then expressed by the complex propositions: 22 | 23 | - $H_1 \,\&\, H_2$, 24 | - $T_1 \,\&\, T_2$, 25 | - $H_1 \,\&\, T_2$, 26 | - $T_1 \,\&\, H_2$. 27 | 28 | We want to calculate the probabilities of these propositions. For example, we want to know what number $Pr(H_1 \,\&\, H_2)$ is equal to. 29 | 30 | Because the coin is fair, we know $Pr(H_1) = 1/2$ and $Pr(H_2) = 1/2$. The probability of heads on any given toss is always $1/2$, no matter what came before. 
To get the probability of $H_1 \,\&\, H_2$, it's then natural to compute: 31 | $$ 32 | \begin{aligned} 33 | Pr(H_1 \,\&\, H_2) &= Pr(H_1) \times Pr(H_2)\\ 34 | &= 1/2 \times 1/2\\ 35 | &= 1/4. 36 | \end{aligned} 37 | $$ 38 | And this is indeed correct, but *only because the coin is fair and thus the tosses are independent*. The following is a general rule of probability. 39 | 40 | The Multiplication Rule 41 | 42 | : If $A$ and $B$ are independent, then $Pr(A \,\&\, B) = Pr(A) \times Pr(B)$. 43 | 44 | ```{r echo=FALSE} 45 | # TODO: add E&~H-style diagram to motivate the rule 46 | ``` 47 | 48 | So, because our two coin tosses are independent, we can multiply to calculate $Pr(H_1 \,\&\, H_2) = 1/4$. And the same reasoning applies to all four possible sequences, so we have: 49 | $$ 50 | \begin{aligned} 51 | Pr(H_1 \,\&\, H_2) &= 1/4,\\ 52 | Pr(T_1 \,\&\, T_2) &= 1/4,\\ 53 | Pr(H_1 \,\&\, T_2) &= 1/4,\\ 54 | Pr(T_1 \,\&\, H_2) &= 1/4. 55 | \end{aligned} 56 | $$ 57 | 58 | `r newthought("The Multiplication")` rule only applies to independent propositions. Otherwise it gives the wrong answer. 59 | 60 | For example, the propositions $H_1$ and $T_1$ are definitely not independent. If the coin lands heads on the first toss ($H_1$), that drastically alters the chances of tails on the first toss ($T_1$). It changes that probability to zero! If you were to apply the Multiplication Rule, you would get $Pr(H_1 \,\&\, T_1) = Pr(H_1) \times Pr(T_1) = 1/2 \times 1/2 = 1/4$, which is definitely wrong. 61 | 62 | ```{block, type='warning'} 63 | Only use the Multiplication Rule on independent propositions. 64 | ``` 65 | 66 | 67 | ## Adding Probabilities 68 | 69 | We observed that you can get one head and one tail two different ways. You can either get heads then tails ($H_1 \,\&\, T_2$), or you can get tails then heads ($T_1 \,\&\, H_2$). So the logical expression for "one of each" is: 70 | $$ (H_1 \,\&\, T_2) \vee (T_1 \,\&\, H_2). $$ 71 | This proposition is a disjunction: its main connective is $\vee$. How do we calculate the probability of a disjunction? 72 | 73 | The Addition Rule 74 | 75 | : If $A$ and $B$ are mutually exclusive, then $Pr(A \vee B) = Pr(A) + Pr(B)$. 76 | 77 | ```{r echo=FALSE} 78 | # TODO: add Euler diagram motivating the rule 79 | ``` 80 | 81 | In this case the two sides of our disjunction are mutually exclusive. They describe opposite orders of events. So we can apply the Addition Rule to calculate: 82 | 83 | $$ 84 | \begin{aligned} 85 | Pr((H_1 \,\&\, T_2) \vee (T_1 \,\&\, H_2)) 86 | &= Pr(H_1 \,\&\, T_2) + Pr(T_1 \,\&\, H_2)\\ 87 | &= 1/4 + 1/4\\ 88 | &= 1/2. 89 | \end{aligned} 90 | $$ 91 | 92 | `r newthought("This")` completes the solution to our opening problem. We've now computed the three probabilities we wanted: 93 | 94 | - $Pr(\mbox{2 heads}) = Pr(H_1 \,\&\, H_2) = 1/2 \times 1/2 = 1/4$, 95 | - $Pr(\mbox{2 tails}) = Pr(T_1 \,\&\, T_2) = 1/2 \times 1/2 = 1/4$, 96 | - $Pr(\mbox{One of each}) = Pr((H_1 \,\&\, T_2) \vee (T_1 \,\&\, H_2)) = 1/4 + 1/4 = 1/2$. 97 | 98 | In the process we introduced two central rules of probability, one for $\,\&\,$ and one for $\vee$. The multiplication rule for $\,\&\,$ only applies when the propositions are independent. The addition rule for $\,\vee\,$ only applies when the propositions are mutually exclusive. 99 | 100 | `r newthought("Why does")` the addition rule for $\vee$ sentences only apply when the propositions are mutually exclusive?
Well, imagine the weather forecast says there's a $90\%$ chance of rain in the morning, and there's also a $90\%$ chance of rain in the afternoon. What's the chance it'll rain at some point during the day, either in the morning or the afternoon? If we calculate $Pr(M \vee A) = Pr(M) + Pr(A)$, we get $90\% + 90\% = 180\%$, which doesn't make any sense. There can't be a $180\%$ chance of rain. 101 | 102 | The problem is that $M$ and $A$ are not mutually exclusive. It could rain all day, both morning and afternoon. We'll see the correct way to handle this kind of situation in [Chapter 7][Calculating Probabilities, Part 2]. In the meantime just be careful: 103 | 104 | ```{block, type='warning'} 105 | Only use the Addition Rule on mutually exclusive propositions. 106 | ``` 107 | 108 | ## Exclusivity vs. Independence 109 | 110 | ```{r echo=FALSE, fig.margin=TRUE, fig.cap="Mutually exclusive propositions don't overlap"} 111 | 112 | euler_diagram <- function(propositions) { 113 | ggplot(data = propositions) + theme_void() + coord_fixed() + 114 | xlim(-3,3) + ylim(-2,2) + 115 | theme(panel.border = element_rect(colour = "black", fill=NA, size=1)) + 116 | geom_circle(aes(x0 = cirx, y0 = ciry, r = r)) + 117 | geom_text(aes(x = labx, y = laby, label = labl), 118 | fontface = "italic", size = 7) 119 | } 120 | 121 | propositions <- data.frame( 122 | cirx = c(-1.25 , 1.25), 123 | ciry = c(0 , 0), 124 | r = c(1 , 1), 125 | labx = c(-2 , 2), 126 | laby = c(1 , 1), 127 | labl = c("A", "B") 128 | ) 129 | 130 | euler_diagram(propositions) 131 | ``` 132 | 133 | Exclusivity and independence can be hard to keep straight at first. One way to keep track of the difference is to remember that mutually exclusive propositions don't overlap, but independent propositions do. Independence means the truth of one proposition doesn't affect the chance of the other. So if you find out that $A$ is true, $B$ still has the same chance of being true. Which means there have to be some $B$ possibilities within the $A$ circle. (Unless the probability of $A$ was zero to start with, but our official definition of independence in the next chapter will rule this case out by hypothesis.) 134 | 135 | ```{r echo=FALSE, fig.margin=TRUE, fig.cap="Independent propositions do overlap (unless one of them has zero probability)."} 136 | propositions <- data.frame( 137 | cirx = c(-.5 , .5), 138 | ciry = c(0 , 0), 139 | r = c(1 , 1), 140 | labx = c(-1.25 , 1.25), 141 | laby = c(1 , 1), 142 | labl = c("A", "B") 143 | ) 144 | euler_diagram(propositions) 145 | ``` 146 | 147 | So independence and exclusivity are very different. Exclusive propositions are not independent, and independent propositions are not exclusive. 148 | 149 | Another marker that may help you keep these two concepts straight: exclusivity is a concept of deductive logic. It's about whether it's *possible* for both propositions to be true (even if that possibility is very unlikely). But independence is a concept of inductive logic. It's about whether one proposition being true changes the *probability* of the other being true. 150 | 151 | 152 | ## Tautologies, Contradictions, and Equivalent Propositions 153 | 154 | ```{r echo=FALSE, fig.margin=TRUE, fig.cap="The Tautology Rule. Every point falls in either the $A$ region or the $\\neg A$ region, so $\\p(A \\vee \\neg A) = 1$."} 155 | ggplot() + theme_void() + coord_fixed() + 156 | xlim(-3,3) + ylim(-2,2) + 157 | theme(panel.border = element_rect(colour = "black", fill=NA, size=1), 158 | panel.background = element_rect(fill = bookblue)) + 159 | geom_circle(aes(x0 = 0, y0 = 0, r = 1.5), fill = bookred) + 160 | geom_text(aes(x = 0, y = 0, label = "A"), 161 | fontface = "italic", size = 7) + 162 | geom_text(aes(x = -2, y = 1.5, label = "~A"), 163 | fontface = "italic", size = 7) 164 | ``` 165 | 166 | A tautology is a proposition that must be true, so its probability is always 1. 167 | 168 | The Tautology Rule 169 | 170 | : $\p(T) = 1$ for every tautology $T$. 171 | 172 | For example, $A \vee \neg A$ is a tautology, so $\p(A \vee \neg A) = 1$. In terms of an Euler diagram, the $A$ and $\neg A$ regions together take up the whole diagram. To put it a bit colourfully, $\p(A \vee \neg A) = \color{bookred}{\blacksquare}\color{black}{} + \color{bookblue}{\blacksquare}\color{black}{} = 1$. 173 | 174 | `r newthought("The flipside")` of a tautology is a contradiction, a proposition that can't possibly be true. So it has probability 0. 175 | 176 | The Contradiction Rule 177 | 178 | : $\p(C) = 0$ for every contradiction $C$. 179 | 180 | For example, $A \wedge \neg A$ is a contradiction, so $\p(A \wedge \neg A) = 0$. In terms of our Euler diagram, there is no region where $A$ and $\neg A$ overlap. So the portion of the diagram devoted to $A \wedge \neg A$ is nil: zero. 181 | 182 | `r newthought("Equivalent")` propositions are true under exactly the same circumstances (and false under exactly the same circumstances). So they have the same chance of being true (ditto false). 183 | 184 | ```{r equivrule, echo=FALSE, fig.margin=TRUE, fig.cap="The Equivalence Rule. The $A \\vee B$ region is identical to the $B \\vee A$ region, so they have the same probability."} 185 | propositions <- data.frame( 186 | cirx = c(-.75 , .75), 187 | ciry = c(0 , 0), 188 | r = c(1.5 , 1.5), 189 | labx = c(-2.25 , 2.25), 190 | laby = c(1 , 1), 191 | labl = c("A", "B") 192 | ) 193 | 194 | ggplot(data = propositions) + 195 | theme_void() + coord_fixed() + 196 | xlim(-3,3) + ylim(-2,2) + 197 | theme(panel.border = element_rect(colour = "black", fill = NA, size = 1)) + 198 | geom_circle(aes(x0 = cirx, y0 = ciry, r = r), fill = bookred) + 199 | geom_circle(aes(x0 = cirx, y0 = ciry, r = r), fill = "transparent") + 200 | geom_text(aes(x = labx, y = laby, label = labl), 201 | fontface = "italic", size = 7) 202 | ``` 203 | 204 | The Equivalence Rule 205 | 206 | : $\p(A) = \p(B)$ if $A$ and $B$ are logically equivalent. 207 | 208 | For example, $A \vee B$ is logically equivalent to $B \vee A$, so $\p(A \vee B) = \p(B \vee A)$. 209 | 210 | In terms of an Euler diagram, the $A \vee B$ region is exactly the same as the $B \vee A$ region: the red region in Figure \@ref(fig:equivrule). So both propositions take up the same amount of space in the diagram. 211 | 212 | 213 | ## The Language of Events 214 | 215 | In math and statistics books you'll often see a lot of the same concepts from this chapter introduced in different language. Instead of propositions, they'll discuss *events*, which are sets of possible outcomes. 216 | 217 | For example, the roll of a six-sided die has six possible outcomes: $1, 2, 3, 4, 5, 6$. And the event of the die landing on an even number is the set $\{2, 4, 6\}$.
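If you know a little R, you can play with this way of thinking directly. Here's a minimal sketch (not part of the book's own code; the second event, `high`, is made up for illustration). Events are just vectors of outcomes, and the usual set operations come built in:

```{r, echo=TRUE, eval=FALSE}
outcomes <- 1:6                    # the six possible outcomes of the roll
evens    <- c(2, 4, 6)             # the event E: "the die lands on an even number"
high     <- c(4, 5, 6)             # a second event F: "the die lands on 4 or more"

setdiff(outcomes, evens)           # the complement of E: 1 3 5
intersect(evens, high)             # the intersection of E and F: 4 6
union(evens, high)                 # the union of E and F: 2 4 6 5

# with equally likely outcomes, an event's probability is its share of outcomes
length(evens) / length(outcomes)   # 0.5
```

The last line mirrors the counting we've been doing all along: favourable outcomes divided by total outcomes.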
218 | 219 | In this way of doing things, rather than consider the probability that a proposition $A$ is true, we consider the probability that event $E$ occurs. Instead of considering a conjunction of propositions like $A \,\&\, B$, we consider the *intersection* of two events, $E \cap F$. And so on. 220 | 221 | If you're used to seeing probability presented this way, there's an easy way to translate into logic-ese. For any event $E$, there's the corresponding proposition that event $E$ occurs. And you can translate the usual set operations into logic as follows: 222 | 223 | ```{r echo=FALSE} 224 | df <- data.frame( 225 | Events = c("$E^c$", "$E \\cap F$", "$E \\cup F$"), 226 | Propositions = c("$\\sim\\! A$", "$A \\,\\&\\, B$", "$A \\vee B$") 227 | ) 228 | 229 | knitr::kable(df, align = "c", caption="Translating between events and propositions") 230 | ``` 231 | 232 | We won't use the language of events in this book. I'm just mentioning it in case you've come across it before and you're wondering how it connects. If you've never seen it before, you can safely ignore this section. 233 | 234 | 235 | ## Summary 236 | 237 | In this chapter we learned how to represent probabilities of propositions using the $Pr(\ldots)$ operator. We also learned some fundamental rules of probability. 238 | 239 | There were three rules corresponding to the concepts of tautology, contradiction, and equivalence. 240 | 241 | - $\p(T) = 1$ for every tautology $T$. 242 | - $\p(C) = 0$ for every contradiction $C$. 243 | - $\p(A) = \p(B)$ if $A$ and $B$ are logically equivalent. 244 | 245 | And there were two rules corresponding to the connectives $\wedge$ and $\vee$. 246 | 247 | - $Pr(A \vee B) = Pr(A) + Pr(B)$, if $A$ and $B$ are mutually exclusive. 248 | - $Pr(A \wedge B) = Pr(A) \times Pr(B)$, if $A$ and $B$ are independent. 249 | 250 | The restrictions on these two rules are essential. If you ignore them, you will get wrong answers. 251 | 252 | 253 | ## Exercises {-} 254 | 255 | #. What is the probability of each of the following propositions? 256 | 257 | a. $A \wedge (B \wedge \neg A)$ 258 | b. $\neg (A \wedge \neg A)$ 259 | 260 | #. Give an example of each of the following. 261 | 262 | a. Two statements that are mutually exclusive. 263 | b. Two statements that are independent. 264 | 265 | #. For each of the following, say whether it is true or false. 266 | 267 | a. If propositions are independent, then they must be mutually exclusive. 268 | b. Independent propositions usually aren't mutually exclusive. 269 | c. If propositions are mutually exclusive, then they must be independent. 270 | d. Mutually exclusive propositions usually aren't independent. 271 | 272 | #. Assume $Pr(A \wedge B)=1/3$ and $Pr(A \wedge \neg B)=1/5$. Answer each of the following: 273 | 274 | a. What is $Pr((A \wedge B) \vee (A \wedge \neg B))$? 275 | b. What is $Pr(A)$? 276 | c. Are $(A \wedge B)$ and $(A \wedge \neg B)$ independent? 277 | 278 | #. Suppose $A$ and $B$ are independent, and $A$ and $C$ are mutually exclusive. Assume $\p(A) = 1/3, \p(B) = 1/6, \p(C) = 1/9$, and answer each of the following: 279 | 280 | a. What is $\p(A \wedge C)$? 281 | b. What is $\p((A \wedge B) \vee C)$? 282 | c. Must $\p(A \wedge B) = 0$? 283 | 284 | #. True or false: if $\p(A)=\p(B)$, then $A$ and $B$ are logically equivalent. 285 | 286 | #. Consider the following argument.
287 | 288 | ```{block, type="argument", echo=TRUE} 289 | If a coin is fair, the probability of getting at least one heads in a sequence of four tosses is quite high: above $90\%$.\ 290 | Therefore, if a fair coin has landed tails three times in a row, the next toss will probably land heads. 291 | ``` 292 | 293 | Answer each of the following questions. 294 | 295 | a. Is the premise of this argument true? 296 | b. Is the argument valid? 297 | c. Is the argument sound? 298 | 299 | #. Suppose a fair, six-sided die is rolled two times. What is the probability of it landing on the same number each time? 300 | 301 | Hint: calculate the probability of it landing on a *different* number each time. To do this, first count the number of possible ways the two rolls could turn out. Then count how many of these are "no-repeats." 302 | 303 | #. Same as the previous exercise but with four rolls instead of two. That is, suppose a fair, six-sided die is rolled four times. What is the probability of it landing on the same number four times in a row? 304 | 305 | #. The Addition Rule can be extended to three propositions. If $A$, $B$, and $C$ are all mutually exclusive with one another, then 306 | $$ \p(A \vee B \vee C) = \p(A) + \p(B) + \p(C).$$ 307 | Explain why this rule is correct. Would the same idea extend to four mutually exclusive propositions? To five? 308 | 309 | (Hint: there's more than one way to do this. You can use an Euler diagram. Or you can derive the new rule from the original one, by thinking of $A \vee B \vee C$ as a disjunction of $A \vee B$ and $C$.) 310 | 311 | #. You have a biased coin, where each toss has a $3/5$ chance of landing heads. But each toss is independent of the others. Suppose you're going to flip the coin $1,000$ times. The first $998$ tosses all land tails. What is the probability at least one of the last two flips will be tails? 312 | 313 | #. There are $3$ empty buckets lined up. Someone takes $4$ apples and places each one in a bucket. The placement of each apple is random and independent of the others. What is the probability that the first two buckets end up with no apples? 314 | 315 | #. Suppose three cards are stacked in order: jack on top, queen in the middle, king on the bottom. If we shuffle them randomly, what is the probability the queen will still be in the middle when we're done? Assume shuffling makes every possible ordering of the cards equally likely. Hint: how many ways are there to assign each card a place in the stack? How many of these have the queen in the middle? 316 | 317 | #. At University X, all courses meet once a week for $3$ hours. At the beginning of the semester, the registrar assigns each course a time slot at random: they pick a random weekday and then flip a fair coin to decide on morning (9 a.m. -- 12 p.m.) vs. afternoon (1 p.m. -- 4 p.m.). 318 | 319 | Conflicts are allowed: students can sign up for two (or more) classes scheduled at the same time. 320 | 321 | So far Axl has signed up for $2$ classes that do not conflict. If he chooses a third class at random, what is the probability it will introduce a conflict? -------------------------------------------------------------------------------- /10-induction-and-probability.Rmd: -------------------------------------------------------------------------------- 1 | # Probability & Induction 2 | 3 | 4 | ```{block, type="epigraph"} 5 | Nothing in life is to be feared, it is only to be understood.
6 | Now is the time to understand more, so that we may fear less.\ 7 | ---Marie Curie 8 | ``` 9 | 10 | 11 | `r newthought("We")` met some common types of inductive argument back in [Chapter 2][Forms of Inductive Argument]. Now that we know how to work with probability, let's use what we've learned to sharpen our understanding of how those arguments work. 12 | 13 | 14 | ## Generalizing from Observed Instances 15 | 16 | Generalizing from observed instances was the first major form of inductive argument we encountered. Suppose you want to know what colour a particular species of bird tends to be. Then you might go out and look at a bunch of examples: 17 | 18 | ```{block, type="argument", echo=TRUE} 19 | I've seen $10$ ravens and they've all been black.\ 20 | Therefore, all ravens are black. 21 | ``` 22 | 23 | How strong is this argument? 24 | 25 | Observing ravens is a lot like sampling from an urn. Each raven is a marble, and the population of all ravens is the urn. We don't know what nature's urn contains at first: it might contain only black ravens, or it might contain ravens of other colours too. To assess the argument's strength, we have to calculate $\p(A \given B_1 \wedge B_2 \wedge \ldots \wedge B_{10})$: the probability that all ravens in nature's urn are black, given that the first raven we observed was black, and the second, and so on, up to the tenth raven. 26 | 27 | We learned how to solve simple problems of this form in the [previous chapter][Multiple Draws]. For example, imagine you face another of our mystery urns, and this time there are two equally likely possibilities. 28 | $$ 29 | \begin{aligned} 30 | A &= \mbox{The urn contains only black marbles.} \\ 31 | \neg A &= \mbox{The urn contains an equal mix of black and white marbles.} \\ 32 | \end{aligned} 33 | $$ 34 | If we do two random draws with replacement, and both are black, we calculate $\p(A \given B_1 \wedge B_2)$ using Bayes' theorem: 35 | $$ 36 | \begin{aligned} 37 | \p(A \given B_1 \wedge B_2) &= \frac{\p(B_1 \wedge B_2 \given A)\p(A)}{\p(B_1 \wedge B_2 \given A) \p(A) + \p(B_1 \wedge B_2 \given \neg A) \p(\neg A)} \\ 38 | &= \frac{(1)^2(1/2)}{(1)^2(1/2) + (1/2)^2(1/2)}\\ 39 | &= 4/5. 40 | \end{aligned} 41 | $$ 42 | If we do a third draw with replacement, and it too comes up black, we replace the squares with cubes. On the fourth draw we'd raise to the fourth power. And so on. When we get to the tenth black draw, the calculation becomes: 43 | $$ 44 | \begin{aligned} 45 | \p(A \given B_1 \wedge \ldots \wedge B_{10}) &= \frac{(1)^{10}(1/2)}{(1)^{10}(1/2) + (1/2)^{10}(1/2)}\\ 46 | &= 1,024/1,025\\ 47 | &\approx .999. 48 | \end{aligned} 49 | $$ 50 | So after ten black draws, we can be about $99.9\%$ certain the urn contains only black marbles. 51 | 52 | But that doesn't mean our argument that all ravens are black is $99.9\%$ strong! 53 | 54 | 55 | ## Real Life Is More Complicated 56 | 57 | There are two major limitations to our urn analogy. 58 | 59 | `r newthought("The first")` limitation is that the ravens we observe in real life aren't randomly sampled from nature's "urn." We only observe ravens in certain locations, for example. But our solution to the urn problem relied on random sampling. For example, we assumed $\p(B_1 \given \neg A) = 1/2$ because the black marbles are just as likely to be drawn as the white ones, if there are any white ones. 
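Here's a toy illustration in R of how much this can matter (a sketch with made-up numbers, not the book's own code). Imagine white ravens exist, but only in a region we never visit. Sampling only where we live, every raven we check comes back black:

```{r, echo=TRUE, eval=FALSE}
set.seed(42)

# a hypothetical population of 10,000 ravens, split across two regions
n_ravens <- 10000
region   <- sample(c("here", "elsewhere"), n_ravens, replace = TRUE)
colour   <- rep("black", n_ravens)

# suppose 5% of all ravens are white, but only in the region we never visit
white    <- sample(which(region == "elsewhere"), size = 0.05 * n_ravens)
colour[white] <- "white"

# ten "observations", drawn only from where we live: all black
sample(colour[region == "here"], size = 10)
```

In this made-up scenario the sample badly misleads us about the population, no matter how many ravens we check.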
60 | 61 | If there are white ravens in the world though, they might be limited to certain locales.^[In fact there are white ravens, [especially in one area of Vancouver Island](https://vancouversun.com/news/local-news/rare-white-raven-spotted-on-vancouver-island).] So the fact we're only observing ravens in our part of the world could make a big difference to what we find. It matters whether your sample really is random. 62 | 63 | `r newthought("The second")` limitation is that we pretended there were only two possibilities: either all the marbles in the urn are black, or half of them are. And, accordingly, we assumed there was already a $1/2$ chance all the marbles are black, before we even looked at any of them. 64 | 65 | In real life though, when we encounter a new species, it could be that $90\%$ of them are black, or $31\%$, or $42.718\%$, or any portion from $0\%$ to $100\%$. So there are many, many more possibilities. The possibility that *all* members of the new species ($100\%$) are black is just one of these many possibilities. So it would start with a much lower probability than $1/2$. 66 | 67 | 68 | ## The Rule of Succession {#succession} 69 | 70 | There is a famous formula that addresses this second issue. 71 | ```{marginfigure} 72 | The formula was first derived by [Laplace](#fig:laplace) to solve [the sunrise problem](https://en.wikipedia.org/wiki/Sunrise_problem), the problem of calculating the probability that the sun will rise tomorrow given that it's risen every day so far. 73 | ``` 74 | Suppose we take all possible compositions of the urn into account: the portion of black balls could be anywhere from $0\%$ to $100\%$. If all these possibilities are equally likely, and we draw randomly with replacement, then the probability the next draw will be black is 75 | $$ \frac{k + 1}{n + 2},$$ 76 | where $k$ is the number of black marbles drawn so far, and $n$ is the total number of draws so far. Deriving this formula is a bit tedious, so we won't go into it here. We'll settle for understanding it instead. 77 | 78 | In our example, we did $10$ draws, all of which came up black. So $n = 10$, and $k = 10$ too. Applying the Rule of Succession gives us a probability of 79 | $$ \frac{10 + 1}{10 + 2} = \frac{11}{12}, $$ 80 | in other words, the next draw has about a $0.92$ probability of being black. If we'd only gotten $k = 5$ black marbles out of $n = 10$ draws, the probability would be $6/12 = 1/2$. 81 | 82 | Notice though that we've somewhat changed the subject. The Rule of Succession gives us the probability for *one* draw. It doesn't tell us the probability that *all* marbles in the urn are black; it just tells us the probability of getting a black marble if we draw one of them. Analyzing the probability that all the marbles are black is trickier, so we won't go into it. Just be aware that the Rule of Succession gives us individual probabilities, not general ones. 83 | 84 | Notice also that the Rule of Succession relies on two assumptions. The first is an assumption we also used earlier, namely that we're sampling randomly. Sometimes this assumption is realistic, but in many real-world applications getting a random sample is tricky, even impossible. 85 | 86 | The second assumption is that all possible compositions of the urn are equally likely. It's just as likely that $50\%$ of the marbles are black as that $75\%$ are, or $35.12\%$, or $0.0001\%$. This assumption certainly looks reasonable, at least in some cases.
But we'll see in [Chapter 18](#priors) that there are fundamental problems lurking here. 87 | 88 | When these assumptions hold though, the Rule of Succession is entirely correct. It follows from the rules of probability we've already learned. We just need to remember that it would be a mistake to use the Rule of Succession in situations where these assumptions do not apply. 89 | 90 | 91 | ## Inference to the Best Explanation {#bayesibe} 92 | 93 | Let's set aside arguments that generalize from observed instances, and focus instead on a different form of inductive argument we met in [Chapter 2][Logic], namely Inference to the Best Explanation. An example: 94 | 95 | ```{block, type="argument", echo=TRUE} 96 | My car won't start and the gas gauge reads empty.\ 97 | Therefore, my car is out of gas. 98 | ``` 99 | 100 | My car being out of gas is a very good explanation of the facts that it won't start and the gauge reads empty. So this seems like a pretty strong argument. 101 | 102 | How do we understand its strength using probability? This is actually a controversial topic, currently being studied by researchers. There are different, competing theories about how Inference to the Best Explanation fits into probability theory. So we'll just look at one popular way of understanding things. 103 | 104 | Let's start by thinking about what makes an explanation a good one. 105 | 106 | `r newthought("A good")` explanation should account for all the things we're trying to explain. For example, if we're trying to explain why my car won't start and the gauge reads empty, I'd be skeptical if my mechanic said it's because the brakes are broken. That doesn't account for any of the symptoms! I'd also be skeptical if they said the gas gauge was broken. That might fit okay with one of the symptoms (the gauge reads empty), but it doesn't account for the fact the car won't start. 107 | 108 | The explanation that my car is out of gas, however, fits both symptoms. It would account for both the empty reading on the gauge and the car's refusal to start. 109 | 110 | A good explanation should also fit with other things I know. For example, suppose my mechanic tries to explain my car troubles by saying that both the gauge and the ignition broke down at the same time. But I know my car is new, it's a highly reliable model, and it was recently serviced. So my mechanic's explanation doesn't fit well with the other things I know. It's not a very good explanation. 111 | 112 | We have two criteria now for a good explanation: 113 | 114 | 1. it should account for all the things we're trying to explain, and 115 | 2. it should fit well with other things we know. 116 | 117 | These criteria match up with terms in Bayes' theorem. Imagine we have some evidence $E$ we're trying to explain, and some hypothesis $H$ that's meant to explain it. Bayes' theorem says: 118 | $$ \p(H \given E) = \frac{\p(H)\p(E \given H)}{\p(E)}. $$ 119 | How probable is our explanation $H$ given our evidence $E$? Well, the larger the terms in the numerator are, the higher that probability is. And the terms in the numerator correspond to our two criteria for a good explanation. 120 | 121 | 1. $\p(E \given H)$ corresponds to how well our hypothesis $H$ accounts for our evidence $E$. If $H$ is the hypothesis that the car is out of gas, then $\p(E \given H) \approx 1$. After all, if there's no gas in the car, it's virtually guaranteed that it won't start and the gauge will read empty.
(It's not perfectly guaranteed because the gauge could be broken after all, though that's not very likely.) 122 | 123 | 2. $\p(H)$ corresponds to how well our hypothesis fits with other things we know. For example, suppose I know it's been a while since I put gas in the car. If $H$ is the hypothesis that the car is out of gas, this fits well with what I already know, so $\p(H)$ will be pretty high. 124 | 125 | Whereas if $H$ is the hypothesis that the gauge and the ignition both broke down at the same time, this hypothesis starts out pretty improbable given what else I know (it's a new car, a reliable model, and recently serviced). So in that case, $\p(H)$ would be low. 126 | 127 | So the better $H$ accounts for the evidence, the larger $\p(E \given H)$ will be. And the better $H$ fits with my background information, the larger $\p(H)$ will be. Thus, the better $H$ is as an explanation, the larger $\p(H \given E)$ will be. And thus the stronger $E$ will be as an argument for $H$. 128 | 129 | What about the last term in Bayes' theorem though, the denominator $\p(E)$? It corresponds to a virtue of good explanations too! 130 | 131 | (ref:mooncap1) The hammer/feather experiment was performed on the moon in 1971. See the [full video here](https://bit.ly/1KLQzOB). 132 | 133 | (ref:mooncap2) The hammer/feather experiment has also been performed in vacuum chambers here on earth. A beautifully filmed example is [available on YouTube](https://bit.ly/10hw8mP), courtesy of the BBC. 134 | 135 | ```{r echo=FALSE, cache=TRUE, fig.margin=TRUE, fig.cap=c("(ref:mooncap1)")} 136 | 137 | if (knitr:::is_latex_output()) { 138 | knitr::include_graphics("img/moon.png") 139 | } else { 140 | knitr::include_graphics("img/moon.gif") 141 | } 142 | ``` 143 | 144 | Scientists love theories that explain the unexplained. For example, Newton's theory of physics is able to explain why a heavy object and a light object, like a hammer and feather, fall to the ground at the same speed as long as there's no air resistance. If you'd never performed this experiment before, you'd probably expect the hammer to fall faster. You'd be surprised to find that the hammer and feather actually hit the ground at the same time. That Newton's theory explains this surprising fact strongly supports his theory. 145 | 146 | So the ability to explain surprising facts is a third virtue of a good explanation. And this virtue corresponds to our third term in Bayes' theorem: 147 | 148 | ```{r echo=FALSE, cache=TRUE, fig.margin=TRUE, fig.cap=c("(ref:mooncap2)")} 149 | if (knitr:::is_latex_output()) { 150 | knitr::include_graphics("img/vacuum.png") 151 | } else { 152 | knitr::include_graphics("img/vacuum.gif") 153 | } 154 | ``` 155 | 156 | 3. $\p(E)$ corresponds to how surprising the evidence $E$ is. If $E$ is surprising, then $\p(E)$ will be low, since $E$ isn't something we expect to be true. 157 | 158 | And since $\p(E)$ is in the denominator of Bayes' theorem, a smaller number there means a *bigger* value for $\p(H \given E)$. So the more surprising the finding $E$ is, the more it supports a hypothesis $H$ that explains it. 159 | 160 | According to this analysis then, each term in Bayes' theorem corresponds to a virtue of a good explanation. And that's why Inference to the Best Explanation works as a form of inductive inference. 161 | 162 | 163 | ## Exercises {-} 164 | 165 | #. The Rule of Succession can look a little strange or mysterious at first. Why is there a $+1$ in the numerator, and a $+2$ in the denominator?
To answer these questions, respond to the following. 166 | 167 | a. According to the Rule of Succession, what is the probability that the first marble drawn will be black? (Hint: in this scenario $k$ and $n$ are the same number. What number?) 168 | #. Suppose we do $n$ random draws and half of them are black. According to the Rule of Succession, what is the probability that the next marble will be black? 169 | #. Explain why the Rule of Succession has $+1$ in the numerator and $+2$ in the denominator. 170 | 171 | #. The Rule of Succession doesn't just tell us the probability of black on the next draw. The same formula $(k+1)/(n+2)$ applies to any future draw. 172 | 173 | a. Suppose we've done $10$ draws so far and $7$ of them were black. What is the probability that the next two draws will both be black? (Careful: the draws are not independent.) 174 | #. In general, if we've done $n$ draws and $k$ were black, what is the probability that the next two draws will both be black? Your answer should be a formula in terms of $n$ and $k$. 175 | 176 | #. Instead of the Rule of Succession, statisticians often use the simpler formula $k/n$. This is known as the "maximum likelihood estimate," or MLE. 177 | 178 | a. According to the MLE, what is the probability that the first marble drawn will be black? 179 | b. According to the MLE, if we draw one random marble and it's white, what is the probability the next marble drawn will be black? 180 | #. The MLE usually gives different answers than the Rule of Succession. But they do agree in one special case: when half the draws are black, both formulas equal $1/2$. Prove that this is always true. 181 | #. Although the two formulas usually give different answers, they give very similar answers as long as... (fill in the blank) 182 | 183 | #. Suppose an urn contains a mix of black and white balls. There are two equally likely possibilities: the ratio of black to white is either $2:1$ or $1:2$. Suppose we do two draws and they are both black. 184 | 185 | a. What is the probability the next draw will be black? 186 | #. Would the Rule of Succession give the same answer? Why or why not? 187 | 188 | #. Suppose we have some evidence $E$ for which we are considering two possible explanations, $H_1$ and $H_2$. 189 | 190 | a. Suppose $H_1$ and $H_2$ are mutually exclusive and exhaustive, and they fit equally well with our background information. But $H_1$ fits the evidence $E$ better than $H_2$ does. Prove that $\p(H_1 \given E) > \p(H_2 \given E)$. 191 | #. Suppose $H_1$ and $H_2$ fit equally well with our background information, but they are not mutually exclusive or exhaustive. As before, $H_1$ fits the evidence $E$ better than $H_2$. Prove that $\p(H_1 \given E) > \p(H_2 \given E)$. 192 | #. Suppose the two explanations fit the evidence $E$ equally well, but $H_1$ fits better with our background information. Prove that $\p(H_1 \given E) > \p(H_2 \given E)$. 193 | 194 | #. Suppose we flip a coin $3$ times. Our evidence $E$ is that it lands heads, tails, heads. Now consider two possible explanations. 195 | 196 | - $H_1$: the coin is fair. 197 | - $H_2$: the coin is biased toward heads, with a $2/3$ chance of heads on each toss (the tosses are independent). 198 | 199 | Suppose these are the only two possibilities, and they fit equally well with our background information. 200 | 201 | a. How well does each hypothesis fit the evidence, $E$? That is, what are $\p(E \given H_1)$ and $\p(E \given H_2)$? 202 | #. How probable is each hypothesis given the evidence?
In other words, what are $\p(H_1 \given E)$ and $\p(H_2 \given E)$? 203 | 204 | #. Suppose $\p(H \given E) < \p(H)$. When is $\p(E \given H)/\p(E) > 1$ then? Always, just sometimes, or never? Assume all conditional probabilities are well-defined. 205 | 206 | #. Suppose $\p(E \given H) > \p(E)$. When is $\p(H \given E) > 1/2$ then? Assume all conditional probabilities are well-defined. 207 | 208 | a. Always 209 | #. Just sometimes: when $\p(E) \leq 1/2$. 210 | #. Just sometimes: when $\p(H) \geq 1/2$. 211 | #. Never 212 | 213 | #. Consider this statement: 214 | 215 | - If $\p(E \given H) = 1$ and $0 < \p(E) < 1$, then $\p(H \given E) < \p(H)$. 216 | 217 | Does this statement always hold? If yes, prove that it does. If no, give a counterexample (draw an Euler diagram where the first two conditions hold but not the third). -------------------------------------------------------------------------------- /15-two-schools.Rmd: -------------------------------------------------------------------------------- 1 | # (PART\*) Part III {-} 2 | 3 | # Two Schools 4 | 5 | 6 | ```{block, type="epigraph"} 7 | As a statistician, there’s a big secret that I think the public needs to know. Statisticians don’t all agree on what probability is!\ 8 | ---Kareem Carr 9 | ``` 10 | 11 | 12 | ```{r echo=FALSE, fig.margin=TRUE, fig.cap="Ned Flanders informs us that, well sir, there are two schools of thought on the matter."} 13 | knitr::include_graphics("img/flanders.png") 14 | ``` 15 | 16 | `r newthought("What")` does the word "probability" mean? There are two competing philosophies of probability, and two very different schools of statistics to go with them. 17 | 18 | 19 | ## Probability as Frequency 20 | 21 | In statistics, the dominant tradition has been to think of probability in terms of "frequency." What's the probability a coin will land heads? That just depends on how often it lands heads---the *frequency* of heads. 22 | 23 | If a coin lands heads half the time, then the probability of heads on any given toss is $1/2$. If it lands heads $9/10$ of the time, then the probability of heads is $9/10$. 24 | 25 | This is probably the most common way of understanding "probability." You may even be thinking to yourself, *isn't it obvious that's what probability is about?* 26 | 27 | 28 | ## Probability as Belief 29 | 30 | But many statements about probability don't fit the frequency mold, not very well at least. 31 | 32 | Consider the statement, "the probability the dinosaurs were wiped out by a meteor is $90\%$." Does this mean $90\%$ of the times dinosaurs existed on earth, they were wiped out by a meteor? They only existed once! This probability is about an event that doesn't repeat. So there's no frequency with which it happens. 33 | 34 | Here's another example: "humans are probably the main cause of our changing climate." Does that mean most of the time, when climate change happens, humans are the cause? Humans haven't even been around for most of the climate changes in Earth's history. So again: this doesn't seem to be a statement about the frequency with which humans cause global warming. 35 | 36 | These statements appear instead to be about what beliefs are supported by the evidence. When someone says it's $90\%$ likely the dinosaurs were wiped out by a meteor, they mean the evidence warrants being $90\%$ confident that's what happened.^[What evidence? People don't always say what evidence they're relying on. But sometimes they do: fossil records and geological traces, for example.] 
Similarly, when someone says humans are probably the main cause of climate change, they mean that the evidence warrants being more than $50\%$ confident it's true. 37 | 38 | So, some probability statements appear to be about *belief*, not frequency. If a proposition has high probability, that means the evidence warrants strong belief in it. If a proposition has low probability, the evidence only warrants low confidence. 39 | 40 | 41 | ## Which Kind of Probability? 42 | 43 | Which kind of probability are scientists using when they use probability theory? Is science about the frequency with which certain events happen? Or is it about what beliefs are warranted by the evidence? 44 | 45 | There is a deep divide among scientists on this issue, especially statisticians. 46 | 47 | The *frequentists* think that science deals in the first kind of probability, frequency. This interpretation has the appeal of being concrete and objective, since we can observe and count how often something happens. And science is all about observation and objectivity, right? 48 | 49 | ```{marginfigure, echo=TRUE} 50 | When YouTuber [Dream](https://en.wikipedia.org/wiki/Dream_(YouTuber)) was accused of cheating in a Minecraft speedrun record, his appeal was rejected for illegitimately mixing a frequentist approach with a Bayesian analysis. 51 | ``` 52 | 53 | The *Bayesians* think instead that science deals in the second kind of probability, belief-type probability. Science is supposed to tell us what to believe given our evidence, after all. So it has to go beyond just the frequencies we've observed, and say what beliefs those observations support. 54 | 55 | Let's consider the strengths and weaknesses of each approach. 56 | 57 | 58 | ## Frequentism 59 | 60 | According to frequentists, probability is all about how often something happens. But what if it only ever has one opportunity to happen? 61 | 62 | For example, suppose we take an ordinary coin fresh from the mint, and we flip it once. It lands heads. Then we melt it down and destroy it. Was the probability of heads on that flip $1$? The coin landed heads $1$ out of $1$ times, so isn't that what the frequency view implies? And yet, common sense says the probability of heads was $1/2$, not $1$. It was an ordinary coin; it could have landed either way. 63 | 64 | Well, we can distinguish *actual* frequency from *hypothetical* frequency. 65 | 66 | *Actual frequency* is the number of times the coin actually lands heads, divided by the total number of flips. If there's only one flip and it's a heads, then the actual frequency is $1/1$, which is just $1$. If there are ten flips and four are heads, then the actual frequency is $4/10$. 67 | 68 | But *hypothetical frequency* is the number of times the coin *would* land heads if you flipped it over and over for a long time, divided by the total number of hypothetical flips. If we flipped the coin a hundred times for example, it would probably land heads about half the time, like in Figure \@ref(fig:hundredflips). 69 | 70 | ```{r hundredflips, echo=FALSE, fig.width=12, fig.fullwidth=TRUE, fig.cap="The frequency of heads over the course of $100$ coin flips.
This particular sequence of heads and tails was generated by a computer simulation."} 71 | set.seed(1) 72 | 73 | df <- data.frame(outcome = rbinom(100, 1, 1/2)) %>% 74 | mutate(ht = if_else(outcome == 1, "H", "T"), 75 | n = row_number(), 76 | k = cumsum(outcome)) 77 | 78 | ggplot(df) + 79 | geom_line(aes(x = n, y = k/n)) + 80 | geom_text(aes(x = n, y = -.05, label = ht)) + 81 | xlab("flip") + 82 | ylim(-.1, 1) + ylab("frequency of heads") + 83 | theme_minimal(base_size = 12) 84 | ``` 85 | 86 | Presumably, it's the *hypothetical* frequency that is the real probability of heads, according to frequentists. So doesn't that solve our problem with one-off events? Even if a coin is only ever flipped once, what matters is how it would have landed if we'd flipped it many times. 87 | 88 | `r newthought("Serious")` problems beset the hypothetical frequency view too, however. 89 | 90 | ```{r threecoins, echo=FALSE, fig.margin=TRUE, fig.cap="Three fair coins flipped $100$ times each, yielding three different frequencies"} 91 | set.seed(1) 92 | 93 | df1 <- data.frame(outcome = rbinom(100, 1, 1/2)) %>% 94 | mutate(ht = if_else(outcome == 1, "H", "T"), 95 | n = row_number(), 96 | k = cumsum(outcome)) 97 | df1$title <- paste(tail(df1$k, n = 1), "H,", 100 - tail(df1$k, n = 1), "T") 98 | 99 | df2 <- data.frame(outcome = rbinom(100, 1, 1/2)) %>% 100 | mutate(ht = if_else(outcome == 1, "H", "T"), 101 | n = row_number(), 102 | k = cumsum(outcome)) 103 | df2$title <- paste(tail(df2$k, n = 1), "H,", 100 - tail(df2$k, n = 1), "T") 104 | 105 | df3 <- data.frame(outcome = rbinom(100, 1, 1/2)) %>% 106 | mutate(ht = if_else(outcome == 1, "H", "T"), 107 | n = row_number(), 108 | k = cumsum(outcome)) 109 | df3$title <- paste(tail(df3$k, n = 1), "H,", 100 - tail(df3$k, n = 1), "T") 110 | 111 | df <- bind_rows(df1, df2, df3) 112 | 113 | ggplot(df) + 114 | geom_line(aes(x = n, y = k/n)) + 115 | facet_grid(rows = vars(title)) + 116 | xlab("flip") + 117 | ylim(-.1, 1) + ylab("frequency of heads") 118 | ``` 119 | 120 | The first problem is that it makes our definition of "probability" circular, because hypothetical frequency has to be defined in terms of probability. If you flipped the coin over and over, say a hundred times, the most *probable* outcome is $50$ heads and $50$ tails. But other outcomes are perfectly possible, like $48$ heads, or $54$ heads. Figure \@ref(fig:threecoins) shows an example of three fair coins flipped $100$ times each, yielding three different frequencies. 121 | 122 | So the hypothetical frequency of $1/2$ isn't what would necessarily happen. It's only what would *probably* happen. So what we're really saying is: "probability" $=$ most probable hypothetical frequency. But you can't define a concept in terms of itself! 123 | 124 | The second problem is about observability. You can observe actual frequencies, but not hypothetical frequencies. We never actually get to flip a coin more than a few hundred times. So hypothetical frequencies aren't observable and objective, which undermines the main appeal of the frequency theory. 125 | 126 | A third problem has to do with evaluating scientific theories. Part of the point of science is to establish which theory is most probable. But theories don't have frequencies. Recall the example from earlier, about the dinosaurs being made extinct by a meteor. Or take the theory that DNA has a double-helix structure. When we say these theories are highly probable, how would we translate that into a statement about hypothetical frequencies? 
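The variability behind the first two problems is easy to see for yourself. Here's a quick simulation in the same spirit as the figures above (a sketch along those lines, not the book's own code): even with a perfectly fair coin, a hypothetical run of $100$ flips only rarely produces a frequency of exactly $1/2$.

```{r, echo=TRUE, eval=FALSE}
set.seed(3)

# simulate 1,000 hypothetical runs of 100 flips of a fair coin,
# recording the frequency of heads in each run
frequencies <- replicate(1000, mean(rbinom(100, 1, 1/2)))

summary(frequencies)      # the runs scatter well below and above 0.5
mean(frequencies == 0.5)  # only a small fraction land exactly on 1/2
```

So "the frequency a fair coin would produce in the long run" is itself a matter of probability, which is just the circularity worry again.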
127 | 128 | ```{marginfigure} 129 | Sports commentators often cite statistics like, "no team has ever come back from a 28-point lead in the last period of a championship away game." Sometimes these statistics are informative, but sometimes they're so specific they feel like parody. How specific is too specific? 130 | ``` 131 | 132 | A fourth and final problem is that how often an event happens depends on what you compare it to. It depends on the *reference class*. Consider Tweety, who has wings and is coloured black-and-white. What is the probability that Tweety can fly? Most things with wings can fly. Most things that are black-and-white cannot. Which reference class determines the probability that Tweety can fly? The class of winged things, or the class of black-and-white things? 133 | 134 | It's problems like these that drive many philosophers and scientists away from frequentism, and toward the alternative offered by "Bayesian" probability. 135 | 136 | 137 | ## Bayesianism 138 | 139 | According to Bayesians, probability is ultimately about belief. It's about how certain you should be that something is true. 140 | 141 | For example, $\p(A)=0.9$ means that $A$ is certain to degree $0.9$. We can be $90\%$ confident that $A$ is true. Whereas $\p(A)=0.3$ means that $A$ is certain to degree $0.3$. We can only be $30\%$ confident $A$ is true. 142 | 143 | Why is this view called "Bayesianism"? Because it uses Bayes' theorem to explain how science works. 144 | 145 | Suppose we have a hypothesis $H$, and some evidence $E$. How believable is $H$ given the evidence $E$? Bayes' theorem tells us $\p(H \given E)$ can be calculated: 146 | $$ 147 | \begin{aligned} 148 | \p(H \given E) &= \frac{\p(H)\p(E \given H)}{\p(E)}. 149 | \end{aligned} 150 | $$ 151 | And we saw in Section \@ref(bayesibe) how each term on the right corresponds to a rule of good scientific reasoning. 152 | 153 | The better a theory fits with the evidence, the more believable it is. And $\p(E \given H)$ corresponds to how well the hypothesis explains the evidence. Since this term appears in the numerator of Bayes' theorem, it makes $\p(H \given E)$ larger. 154 | 155 | The more surprising ("novel") a finding is, the more it supports a theory that explains it. The term $\p(E)$ corresponds to how surprising the evidence is. And since it appears in the denominator of Bayes' theorem, surprising evidence makes $\p(H \given E)$ larger if $H$ can successfully explain $E$. 156 | 157 | Finally, new evidence has to be weighed against previous evidence and existing considerations. The term $\p(H)$ corresponds to the prior plausibility of the hypothesis $H$, and it appears in the numerator of Bayes' theorem. So the more the hypothesis fits with prior considerations, the larger $\p(H \given E)$ will be. 158 | 159 | So, Bayesians say, we should understand probability as the degree of belief it's rational to have. The laws of probability, like Bayes' theorem, show us how to be good, objective scientists, shaping our beliefs according to the evidence. 160 | 161 | `r newthought("The main")` challenge for Bayesians is objectivity. Critics complain that science is objective, but belief is subjective. How so? 162 | 163 | First, belief is something personal and mental, so it can't be quantified objectively. What does it even mean to be $90\%$ confident that something is true, you might ask? How can you pin a number on a belief? 164 | 165 | And second, belief varies from person to person.
People from different communities and with different personalities bring different opinions and assumptions to the scientific table. But science is supposed to eliminate personal, subjective elements like opinion and bias. 166 | 167 | `r newthought("So frequentists")` and Bayesians both have their work cut out for them. In the coming chapters we'll see how they address these issues. 168 | 169 | 170 | ## Exercises {-} 171 | 172 | #. Consider these two statements. 173 | 174 | i. According to Bayesianism, frequencies aren't real. 175 | ii. According to Bayesianism, frequencies don't obey the laws of probability. 176 | 177 | Are both statements true? Both false? Only one (which one)? 178 | 179 | #. Explain the one-shot event problem for frequentism. Give two *original* examples to illustrate the problem. 180 | 181 | #. Explain the difference between actual and hypothetical frequency. Give two *original* examples to illustrate. Then explain why the distinction is important. 182 | 183 | #. Explain the reference class problem for frequentism. Give two *original* examples to illustrate the problem. -------------------------------------------------------------------------------- /16-beliefs-and-betting-rates.Rmd: -------------------------------------------------------------------------------- 1 | # Beliefs & Betting Rates 2 | 3 | `r newthought("For")` Bayesians, probabilities are beliefs. When I say it'll probably rain today, I'm telling you something about my personal level of confidence in rain today. I'm saying I'm more than $50\%$ confident it'll rain. 4 | 5 | But how can we quantify something as personal and elusive as a level of confidence? Bayesians answer this question using the same basic idea we used for utility in [Chapter 12][Utility]. They look at people's willingness to risk things they care about. 6 | 7 | 8 | ## Measuring Personal Probabilities 9 | 10 | The more confident someone is, the more they'll be willing to bet. So let's use betting rates to quantify personal probabilities. 11 | 12 | I said I'm more than $50\%$ confident it'll rain today. But exactly how confident: $60\%$? $70\%$? Well, I'd give two-to-one odds on it raining today, and no higher. In other words, I'd accept a deal that pays $\$1$ if it rains, and costs me $\$2$ otherwise. But I wouldn't risk more than $\$2$ when I only stand to win $\$1$. 13 | 14 | In this example I put $2$ dollars on the table, and you put down $1$ dollar. Whoever wins the bet keeps all $3$ dollars. The sum of all the money on the table is called the ***stake***. In this case the stake is $\$2 + \$1 = \$3$. 15 | 16 | If it doesn't rain, I'll lose $\$2$. To find my ***fair betting rate***, we divide this potential loss by the stake: 17 | $$ 18 | \begin{aligned} 19 | \mbox{betting rate} &= \frac{\mbox{potential loss}}{\mbox{stake}}\\ 20 | &= \frac{\$2}{\$2 + \$1}\\ 21 | &= \frac{2}{3}. 
22 | \end{aligned} 23 | $$ 24 | 25 | ```{r echo=FALSE, cache=TRUE, fig.margin=TRUE, fig.cap="A bet that pays $\\$1$ if you win and costs $\\$2$ if you lose, is fair when the blue and red regions have equal size: when the probability of winning is $2/3$."} 26 | f <- function(x) case_when(x <= 2/3 ~ 1, x <= 1 ~ 0) 27 | g <- function(x) case_when(x <= 2/3 ~ 0, x <= 1 ~ -2) 28 | 29 | ggplot() + 30 | stat_function(fun = f, geom = "area", n = 1000, fill = bookblue) + 31 | stat_function(fun = g, geom = "area", n = 1000, fill = bookred) + 32 | scale_y_continuous("payoff ($)", breaks = seq(-2, 1, 1), limits = c(-2.5, 1.5)) + 33 | scale_x_continuous("probability", labels = c("0" = "0", "1/3" = "1/3", "2/3" = "2/3", "1" = "1"), breaks = seq(0, 1, 1/3)) 34 | ``` 35 | 36 | A person's betting rate reflects their degree of confidence. The more confident they are of winning, the more they'll be willing to risk losing. In this example my betting rate is $2/3$ because I'm $2/3$ confident it will rain. That's my personal probability: $\p(R) = 2/3$. 37 | 38 | Notice that a bet at two-to-one odds has zero expected value given my personal probability of $2/3$: 39 | $$ (2/3)(\$1) + (1/3)(-\$2) = 0. $$ 40 | This makes sense: it's a fair bet from my point of view, after all. 41 | 42 | ```{r echo=FALSE, cache=TRUE, fig.margin=TRUE, fig.cap="A bet that pays $\\$9$ if you win and costs $\\$1$ if you lose is fair when the probability of winning is $1/10$."} 43 | f <- function(x) case_when(x <= 1/10 ~ 9, x <= 1 ~ 0) 44 | g <- function(x) case_when(x <= 1/10 ~ 0, x <= 1 ~ -1) 45 | 46 | ggplot() + 47 | stat_function(fun = f, geom = "area", n = 1000, fill = bookblue) + 48 | stat_function(fun = g, geom = "area", n = 1000, fill = bookred) + 49 | scale_y_continuous("payoff ($)", limits = c(-1.5, 10.5)) + 50 | scale_x_continuous("probability") 51 | ``` 52 | 53 | What if I were less confident in rain, say just $1/10$ confident? Then I'd be willing to stake much less. I'd need you to put down at least $\$9$ before I'd put down even $\$1$. Only then would the bet have $0$ expected value: 54 | $$ (1/10)(\$9) + (9/10)(-\$1) = 0. $$ 55 | So, for the bet to be fair in my eyes, the odds have to match my fair betting rate. 56 | 57 | Here's the general recipe for quantifying someone's personal probability in proposition $A$. 58 | 59 | 1. Find a bet on $A$ they deem fair. Call the potential winnings $w$ and the potential losses $l$. 60 | 2. Because they deem the bet fair, set the expected value of the bet equal to zero: 61 | $$ \p(A) \times w + (1-\p(A)) \times -l = 0. $$ 62 | 3. Now solve for $\p(A)$: 63 | $$ 64 | \begin{aligned} 65 | \p(A) \times w + (1-\p(A)) \times -l &= 0 \\ 66 | \p(A) \times w &= (1-\p(A)) \times l \\ 67 | \p(A) \times w + \p(A) \times l &= l \\ 68 | \p(A) &= \frac{l}{w + l}. 69 | \end{aligned} 70 | $$ 71 | 72 | Notice how we got the same formula we started with: potential loss divided by total stake. 73 | 74 | You can memorize this formula, but personally, I prefer to apply the recipe. It shows why the formula works, and it also exposes the formula's limitations. It helps us understand when the formula *doesn't* work. 75 | 76 | 77 | ## Things to Watch Out For 78 | 79 | Personal probabilities aren't revealed by just any old betting rate a person will accept. They're exposed by the person's *fair* betting rates. 80 | 81 | Consider: I'd take a bet where you pay me a million dollars if it rains today, and I pay you just $\$1$ otherwise. But that's because I think this bet is *advantageous*. 
I don't think this is a fair bet, which is why I'd only take one side of it. I wouldn't take the reverse deal, where I win $\$1$ if it rains and I pay you a million dollars if it doesn't. That's a terrible deal from my point of view! 82 | 83 | So you can't just look at a bet a person is willing to accept. You have to look at a bet they're willing to accept *because they think it's fair*. 84 | 85 | `r newthought("Another")` caveat is that we're cheating by using dollars instead of utils. When we learned about utility, we saw that utility and dollars can be quite different. Gaining a dollar and losing a dollar aren't necessarily comparable. Especially if it's your last dollar! 86 | 87 | So, to really measure personal probabilities accurately, we'd have to substitute utilities for dollars. Nevertheless, we'll pretend dollars and utils are equal for simplicity. Dollars are a decent approximation of utils for many people, as long as we stick to small sums. 88 | 89 | `r newthought("Last")` but definitely not least, our method only works when the person is following the expected value formula. Setting the expected value equal to zero was the key to deriving the formula: 90 | $$ \p(A) = \frac{\mbox{potential loss}}{\mbox{stake}}. $$ 91 | But we know people don't always follow the expected value formula; that's one of the lessons of [the Allais paradox]. So this way of measuring personal probabilities is limited. 92 | 93 | 94 | ## Indirect Measurements 95 | 96 | Sometimes we don't have the betting rate we need in order to apply the loss/stake formula directly. But we can still figure things out indirectly, given the betting rates we do have. 97 | 98 | For example, I'm not very confident there's intelligent life on other planets. But I'd be much more confident if we learned there was life of any kind on another planet. If NASA finds bacteria living on Mars, I'll be much less surprised to learn there are intelligent aliens on Alpha Centauri. 99 | 100 | Exactly how confident will I be? What is $\p(I \given L)$, my personal probability that there is intelligent life on other planets given that there's life of some kind on other planets at all? 101 | 102 | Suppose I tell you my betting rates for $I$ and $L$. I deem the following bets fair: 103 | 104 | - I win $\$9$ if $I$ is true, otherwise I pay $\$1$. 105 | - I win $\$6$ if $L$ is true, otherwise I pay $\$4$. 106 | 107 | You can apply the loss/stake formula to figure $\p(I) = 1/10$ and $\p(L) = 4/10$. But what about $\p(I \given L)$? You can figure that out by starting with the definition of conditional probability: 108 | $$ 109 | \begin{aligned} 110 | \p(I \given L) &= \p(I \wedge L)/\p(L) \\ 111 | &= \p(I)/\p(L) \\ 112 | &= 1/4. 113 | \end{aligned} 114 | $$ 115 | The second line in this calculation uses the fact that $I$ is equivalent to $I \wedge L$. If there's intelligent life, then there must be life, by definition. So $I \wedge L$ is redundant. We can drop the second half and replace the whole statement with just $I$. 116 | 117 | The general strategy here is: 1) identify what betting rates you have, 2) apply the loss/stake formula to get those personal probabilities, and then 3) apply familiar rules of probability to derive other personal probabilities. 118 | 119 | We have to be careful though. This technique only works if the subject's betting rates follow the familiar rules of probability. If my betting rate for rain tomorrow is $3/10$, you might expect my betting rate for no rain to be $7/10$.
But people don't always follow the laws of probability, just as they [don't always follow the expected utility rule][The Allais Paradox]. The taxicab problem from [Chapter 8](#chbayes) illustrates one way people commonly violate the rules of probability. We'll encounter [another way](#bankteller) in the next chapter. 120 | 121 | 122 | ## Exercises {-} 123 | 124 | #. Li thinks humans will eventually colonize Mars. More exactly, he regards the following deal as fair: if he's right about that, you pay him $\$3$, otherwise he'll pay you $\$7$. 125 | 126 | Suppose Li equates money with utility: for him, the utility of gaining $\$3$ is $3$, the utility of losing $\$7$ is $-7$, and so on. 127 | 128 | a. What is Li's personal probability that humans will colonize Mars? 129 | 130 | Li also thinks there's an even better chance of colonization if Elon Musk is elected president of the United States. If Musk is elected, Li will regard the following deal as fair: if colonization happens you pay him $\$3$, otherwise he pays you $\$12$. 131 | 132 | b. What is Li's personal conditional probability that humans will colonize Mars, given that Elon Musk is elected U.S. president? 133 | 134 | Li thinks the chances of colonization are lower if Musk is not elected. His personal conditional probability that colonization will happen given that Musk is not elected is $1/2$. 135 | 136 | c. What is Li's personal probability that Musk will be elected? (Assume his personal probabilities obey the laws of probability.) 137 | 138 | #. Sam thinks the Saskatchewan Roughriders will win the next Grey Cup game. She's confident enough that she regards the following deal as fair: if they win, you pay her $\$2$, otherwise she'll pay you $\$8$. 139 | 140 | Suppose Sam equates money with utility: for her, the utility of gaining $\$2$ is $2$, the utility of losing $\$8$ is $-8$, and so on. 141 | 142 | a. What is Sam's personal probability that the Roughriders will win the Grey Cup? 143 | 144 | Sam thinks the Roughriders will have an even better chance in the snow. If it snows during the game, she will regard the following deal as fair: if the Roughriders win, you pay her $\$1$, otherwise she'll pay you $\$9$. 145 | 146 | b. What is Sam's personal conditional probability that the Roughriders will win the Grey Cup if it snows? 147 | 148 | Sam thinks that the Roughriders will lose their advantage if it doesn't snow. Her personal conditional probability that the Roughriders will win if it doesn't snow is $1/2$. 149 | 150 | c. What is Sam's personal probability that it will snow during the Grey Cup? (Assume her personal probabilities obey all the familiar laws of probability.) 151 | 152 | #. Sam thinks the Leafs have a real shot at the playoffs next year. In fact, she regards the following deal as fair: if the Leafs make the playoffs, you pay her $\$2$, otherwise she pays you $\$10$. 153 | 154 | Suppose Sam equates money with utility: for her, the utility of gaining $\$2$ is $2$, the utility of losing $\$10$ is $-10$, and so on. 155 | 156 | a. What is Sam's personal probability that the Leafs will make the playoffs? 157 | 158 | Sam also thinks the Leafs might even have a shot at winning the Stanley Cup. She's willing to pay you $\$1$ if they don't win the Cup, if you agree to pay her $\$2$ if they do. That's a fair deal for her. 159 | 160 | b. What is Sam's personal probability that the Leafs will win the Stanley Cup? 161 | c. What is Sam's personal conditional probability that the Leafs will win the Stanley Cup if they make the playoffs? 
(Assume that winning the Stanley Cup logically entails making the playoffs.)

#. Freya isn't sure whether it will snow tomorrow. For her, a fair gamble is one where she gets $\$10$ if it snows and she pays $\$10$ if it doesn't. Assume Freya equates money with utility.

    a. What is Freya's personal probability for snow tomorrow?

    Here's another gamble Freya regards as fair: she'll check her phone to see whether tomorrow's forecast calls for snow. If it does predict snow, she'll pay you $\$10$, but you have to pay her $\$5$ if it doesn't.

    b. What is Freya's personal probability that the forecast calls for snow?

    After checking the forecast and seeing that it does predict snow, Freya changes her betting odds for snow tomorrow. Now she's willing to accept as little as $\$5$ if it snows, while still paying $\$10$ if it doesn't.

    c. Now what is Freya's personal probability for snow tomorrow?
    d. Before she checked the forecast, what was Freya's personal probability that the forecast would predict snow and be right?

#. Ben's favourite TV show is *Community*. He thinks it's so good they'll make a movie of it. In fact, he's so confident that he thinks the following is a fair deal: he pays you $\$8$ if they don't make it into a movie and you pay him $\$1$ if they do. Assume Ben equates money with utility.

    a. What is Ben's personal probability that *Community* will be made into a movie?

    Ben thinks the odds of a *Community* movie getting made are even higher if his favourite character, Shirley, returns to the show (she's on leave right now). If Shirley returns, he's willing to pay as much as $\$17$ if the movie does not get made, in return for $\$1$ if it does.

    b. What is Ben's personal conditional probability that there will be a *Community* movie if Shirley returns?

    Ben also thinks the chances of a movie go down drastically if Shirley doesn't return. His personal conditional probability that the movie will happen without Shirley is only $1/3$.

    c. What is Ben's personal probability that Shirley will return?
-------------------------------------------------------------------------------- /17-dutch-books.Rmd: --------------------------------------------------------------------------------

# Dutch Books

```{block, type="epigraph"}
As to the speculators from the south, they had the advantage in the toss up; they said heads, I win, tails, you lose; they could not lose any thing for they had nothing at stake.\
---[Newspaper report](https://chroniclingamerica.loc.gov/lccn/sn83045242/1805-09-02/ed-1/seq-2.pdf), Sep. 2, 1805
```

`r newthought("Critics")` of Bayesianism won't be satisfied just because beliefs can be quantified with betting rates. After all, a person's betting rates might not obey the laws of probability. What's to stop someone being $9/10$ confident it will rain tomorrow and also $9/10$ confident it won't rain? How do we enforce the laws of probability, like the Negation Rule, if probability is just subjective opinion?


## Dutch Books

The Bayesian answer uses a special betting strategy. If someone violates the laws of probability, we can use this strategy to take advantage of them. We can sucker them into a deal that will lose them money no matter what.

```{marginfigure, echo=TRUE}
"A bet is a tax on bullshit."\
---Alex Tabarrok
```

For example, suppose Sam violates a very simple rule of probability. His personal probability in the proposition $2+2=4$ is only $9/10$, when it should be $1$. (Because it's impossible for $2+2$ to be anything other than $4$.) Sam has violated the laws of probability, so he'll be willing to accept a very bad deal, as follows.

We put two marbles into an empty bucket, and then two more. We then offer Sam the following deal: we pay him $90$¢, and in exchange he pays us $\$1$ if the bucket has a total of $4$ marbles in it.

This deal should be fair according to Sam. As he sees it, there's a $90\%$ chance he'll have to pay us $\$1$ for a net loss of $10$¢. But there's also a $10\%$ chance he won't have to pay us anything, a net gain of $90$¢. So the expected value is zero according to Sam's personal probabilities:
$$ (9/10)(-\$.10) + (1/10)(\$.90) = 0. $$
And yet, Sam will lose money no matter what! There's no way for him to win: he's bound to end up paying out $\$1$, when we only paid him $90$¢ to get in on the deal.

`r newthought("This")` kind of deal is called a *Dutch book*.^[Why is it called that? The 'book' part comes from gambling, where 'making a book' means setting up a betting arrangement. But the 'Dutch' part [is a mystery](https://personal.eur.nl/wakker/miscella/dutchbk.htm).] A Dutch book has two defining features.

1. The arrangement is fair according to the subject's personal probabilities.
2. The subject will lose money *no matter what*.

If you are vulnerable to a Dutch book, then it looks like there's something very wrong with your personal probabilities. A deal looks fair to you when it clearly isn't. It's impossible for you to win!

Almost nobody in real life would be foolish enough to accept the deal we just offered Sam. Real people have enough sense not to violate the laws of probability in such obvious ways. But there are subtler ways to violate the laws of probability, which people do fall prey to. And we can create Dutch books for them too.


## The Bank Teller Fallacy {#bankteller}

Consider a famous problem.

```{marginfigure, echo=TRUE}
This problem was devised by the same psychologists who studied [the taxicab problem](#chbayes): Daniel Kahneman and Amos Tversky. This one is from a 1983 study of theirs.
```

```{block2, type="problem"}
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Which is more probable?

1. Linda is a bank teller.
2. Linda is a bank teller and is active in the feminist movement.

```

In psychology experiments, almost everyone chooses (2). But that can't be right, and you don't need to know anything about Linda to prove it.
```{r bankteller, echo=FALSE, fig.margin=TRUE, fig.cap="Bank tellers and feminist bank tellers"}
euler_diagram <- function(propositions) {
  ggplot(data = propositions) + theme_void() + coord_fixed() +
    xlim(-3,3) + ylim(-2,2) +
    theme(panel.border = element_rect(colour = "black", fill=NA, size=1)) +
    geom_circle(aes(x0 = cirx, y0 = ciry, r = r)) +
    geom_text(aes(x = labx, y = laby, label = labl),
              parse = TRUE, size = 7)
}

propositions <- data.frame(
  cirx = c(0    , 0),
  ciry = c(0    , 0),
  r    = c(1.25 , .5),
  labx = c(-.55 , -.95),
  laby = c(.5   , 1.15),
  labl = c("paste(italic(B), '&', italic(F))", "italic(B)")
)

euler_diagram(propositions)
```

In general, the probability of a conjunction $B \wedge F$ cannot be greater than the probability of $B$. There are simply more possibilities where $B$ is true than there are possibilities where *both* $B$ *and* $F$ are true. Figure \@ref(fig:bankteller) illustrates the point.

So it's a law of probability that $\p(B \wedge F) \leq \p(B)$. Which means it can't be more likely that Linda is both a bank teller and a feminist than that she is a bank teller.

Here's another way to think about it. Imagine everyone who fits the description of Linda gathered in a room. The people in the room who happen to be bank tellers are then gathered together inside a circle. Some of the people inside that circle will also be feminists. But there can't be more feminists inside that circle than people! Feminist bank tellers are just less common than bank tellers in general.

`r newthought("How do")` we Dutch book someone who mistakenly thinks $B \wedge F$ is more probable than $B$? We offer them two deals, one involving a bet on $B$ and the other a bet on $B \wedge F$.

Suppose for example their betting rate for $B$ is $1/5$, and their betting rate for $B \wedge F$ is $1/4$. Then we offer them these deals:

1. We pay them $20$¢, and in exchange they agree to pay us $\$1$ if $B$ is true.
2. They pay us $25$¢, and in exchange we agree to pay them $\$1$ if $B \wedge F$ is true.

Both of these deals are fair according to their betting rates. So our victim will be willing to accept both. And yet, they'll lose money no matter what. Why?

Notice how they've already paid us more than we've paid them, before the bets are even settled. We're up $5$¢ from the get-go. So for them to come out ahead, they'll have to win the second bet. But if they do win the second bet, we win the first bet. They only win the second bet if $B \wedge F$ is true, in which case $B$ is automatically true and we win the first bet. Since both bets pay off $\$1$, they cancel each other out.

Thinking through a Dutch book can be tricky, but a table like \@ref(tab:banktellerdb) can help. The columns are the possible outcomes; the rows are exchanges of cash. For clarity, we separate investments and returns into their own rows. The investment is the amount the person pays or receives at first, when the bet is arranged. The return is the amount they win/lose if the bet pays off. The last row sums up all the investments and returns to get the net amount won or lost.
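If you'd rather let R do the bookkeeping, here's a small sketch that totals the victim's cash flows in each outcome (the variable names are just for illustration):

```{r}
# Net cash flow for the victim in each outcome (a sketch).
# Bet 1: they receive 20 cents, pay $1 if B is true.
# Bet 2: they pay 25 cents, receive $1 if B and F are both true.
outcomes <- c("B & F", "B & ~F", "~B")
bet1 <- 0.20 + c(-1, -1, 0)    # investment plus return
bet2 <- -0.25 + c(1, 0, 0)
setNames(bet1 + bet2, outcomes)  # negative in every outcome
```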
```{r banktellerdb, echo=FALSE}
table <- data.frame(
  c("Bet $1$", "", "Bet $2$", "", "Net"),
  c("Investment", "Return", "Investment", "Return", ""),
  c("$\\$0.20$", "$-\\$1.00$", "$-\\$0.25$", "$\\$1.00$", "$-\\$0.05$"),
  c("$\\$0.20$", "$-\\$1.00$", "$-\\$0.25$", "$\\$0.00$", "$-\\$1.05$"),
  c("$\\$0.20$", "$\\$0.00$", "$-\\$0.25$", "$\\$0.00$", "$-\\$0.05$")
)
colnames(table) <- c("", "", "$B \\wedge F$", "$B \\wedge \\neg F$", "$\\neg B$")
#row.names(table) <- c("Bet $1$:", "", "Bet $2$:", "", "Net")
knitr::kable(table, align = "r", caption = "A Dutch book for the bank teller fallacy")
```

Notice how the net amount is negative in every column. In a Dutch book, no matter what, the victim loses money.


## Dutch Books in General

Anyone who violates any law of probability can be suckered via a Dutch book. But if you obey the laws of probability, it's impossible to be Dutch booked. That's why the laws of probability are objectively correct, according to Bayesians.

```{r echo=FALSE}
# TODO: insert sketch of de Finetti
```
```{marginfigure}
Bruno de Finetti (1906--1985) proved in the $1930$s that the laws of probability are the only safeguard against Dutch books. The same point was also noted by [Frank Ramsey](#fig:ramsey) in an essay from the $1920$s, though he didn't include a proof.
```

What's the general recipe for constructing a Dutch book when someone violates the laws of probability?

First, use bets with a $\$1$ stake. That makes things simple, because the victim's personal probability will be the same as their fair price for the bet. If their personal probability is $1/2$, they'll be willing to pay $50$¢ for a bet with a $\$1$ payoff. If their personal probability is $1/4$, they'll be willing to pay $25$¢ for a bet with a $\$1$ payoff. And so on.

Second, remember that if a bet is fair then a person should be willing to take *either side* of the bet. Suppose they think it's fair to pay $25$¢ in exchange for $\$1$ if $A$ turns out to be true. Then they should also be willing to *accept* $25$¢ instead, and *pay you* $\$1$ if $A$ turns out to be true. It's a fair bet, so either side should be acceptable.

Third is the trickiest bit. Which propositions should you get them to bet on, and which should you get them to bet against? Think about which propositions they are overconfident of, and which they are underconfident of.

For example, Sam was underconfident in $2+2=4$. He was only $9/10$ confident instead of $1$. So we underpaid him for a bet on that proposition: $90$¢ in exchange for $\$1$ if it was true. His underconfidence meant he was willing to accept too little in exchange for a $\$1$ payout.

Whereas in the bank teller example, the victim was overconfident of $B \wedge F$, and underconfident in $B$. So we underpaid them for a bet on $B$, and got them to overpay us for a bet on $B \wedge F$.

`r newthought("Here's")` one more example. Suppose Suzy the Scientist is conducting an experiment. The experiment could have either of two outcomes, $E$ or $\neg E$. Suzy thinks there's a $70\%$ chance of $E$, and a $40\%$ chance of $\neg E$. She's violated the Additivity rule: $\p(E) = 7/10$ and $\p(\neg E) = 4/10$, which add up to more than $1$.

So we can make a Dutch book against Suzy.
In this case she's overconfident in both of the propositions $E$ and $\neg E$. So we get her to overpay us for bets on $E$ and on $\neg E$. 138 | 139 | 1. Suzy pays us $70$¢, and in exchange we pay her $\$1$ if $E$ is true. 140 | 2. Suzy pays us $40$¢, and in exchange we pay her $\$1$ if $\neg E$ is true. 141 | 142 | The end result is that Suzy loses $10$¢ no matter what. At the beginning she pays us $\$0.70 + \$0.40 = \$1.10$. Then she wins back $\$1$, either for the bet on $E$ or for the bet on $\neg E$. But in the end, she still has a net loss of $10$¢. 143 | 144 | ```{r echo=FALSE} 145 | table <- data.frame( 146 | c("Bet $1$", "", "Bet $2$", "", "Net"), 147 | c("Investment", "Return", "Investment", "Return", ""), 148 | "E" = c("$-\\$0.70$", "$\\$1.00$", "$-\\$0.40$", "$\\$0.00$", "$-\\$0.10$"), 149 | "nE" = c("$-\\$0.70$", "$\\$0.00$", "$-\\$0.40$", "$\\$1.00$", "$-\\$0.10$") 150 | ) 151 | colnames(table) <- c("", "", "$E$", "$\\neg E$") 152 | knitr::kable(table, align = "r", caption = "A Dutch book for Suzy the Scientist") 153 | ``` 154 | 155 | `r newthought("We now")` have a second way for Bayesians to argue that the concept of personal probability is scientifically respectable. Not only can beliefs be quantified. There are also objective rules that everyone's beliefs should follow: the laws of probability. Anyone who doesn't follow those laws can be suckered into a sure loss. 156 | 157 | 158 | ## Exercises {-} 159 | 160 | #. Suppose Ronnie has personal probabilities $\p(A) = 4/10$ and $\p(\neg A) = 7/10$. Explain how to make a Dutch book against Ronnie. Your answer should include all of the following: 161 | 162 | - A list of the bets to be made with Ronnie. 163 | - An explanation why Ronnie will regard these bets as fair. 164 | - An explanation why these bets will lead to a sure loss for Ronnie no matter what. 165 | 166 | #. Suppose Marco has personal probabilities $\p(X) = 3/10$, $\p(Y) = 2/10$, and $\p(X \vee Y) = 6/10$. Explain how to make a Dutch book against him. Your answer should include all of the following: 167 | 168 | - A list of the bets to be made with Marco. 169 | - An explanation why Marco will regard these bets as fair. 170 | - An explanation why these bets will lead to a sure loss for Marco no matter what. 171 | 172 | #. Maya is the star of her high school hockey team. My personal probability that she'll go to university on an athletic scholarship is $3/5$. But because she doesn't like schoolwork, my personal probability that she'll go to university is $1/3$. Explain how to make a Dutch book against me. 173 | 174 | #. Saia isn't sure what grade she'll get in her statistics class. But she thinks the following deals are all fair: 175 | 176 | - If she gets at least a B+, you pay her $\$2$; otherwise she pays you $\$5$. 177 | - If she gets a B+, you pay her $\$6$; otherwise she pays you $\$1$. 178 | - If she gets a B+, B, or B-, you pay her $\$5$; otherwise she pays you $\$2$. 179 | 180 | Assume that utility and money are equal for Saia. 181 | 182 | a. What is Saia's personal probability that she'll get at least a B+? 183 | b. What is Saia's personal probability that she'll get a B or a B-? 184 | c. True or false: given the information provided about Saia's fair betting rates, there is a way to make a Dutch book against her. 185 | 186 | #. Cheryl isn't sure what the weather will be tomorrow, but she thinks the following deals are both fair: 187 | 188 | - If it rains, you pay her $\$3$; otherwise she pays you $\$7$. 
189 | - If it rains or snows, she pays you $\$8$; otherwise you pay her $\$2$. 190 | 191 | Assume that utility and money are equal for Cheryl. 192 | 193 | a. What is Cheryl's personal probability that it will rain? 194 | b. What is Cheryl's personal probability that it will rain or snow? 195 | c. True or false: given the information provided about Cheryl's fair betting rates, there is a way to make a Dutch book against her. 196 | 197 | #. Silvio can't find his keys. He suspects they're either in his car or in his apartment. His personal probability is $6/10$ that they're in one of those two places. But he searched his car top to bottom and didn't find them, so his personal probability that they're in his car is only $1/10$. On the other hand, his apartment is messy and most of the time when he can't find something, it's buried somewhere in the apartment. So his personal probability that the keys are in his apartment is $3/5$. Explain how to make a Dutch book against Silvio. 198 | 199 | #. Suppose my personal probability for rain tomorrow is $30\%$, and my personal probability for snow tomorrow is $40\%$. Suppose also that my personal probability that it will either snow or rain tomorrow is $80\%$. Explain how to make a Dutch book against me. 200 | 201 | #. Piz's personal probability that Pia is a basketball player is $1/4$. His probability that she's a good basketball player is $1/3$. Explain how to make a Dutch book against Piz. 202 | 203 | #. Consider these two statements: 204 | 205 | i. If someone's degrees of belief violate a law of probability, then they can be Dutch booked. 206 | ii. If someone can be Dutch booked, then their degrees of belief violate a law of probability. 207 | 208 | Which one of the following is correct? 209 | 210 | a. Only (i) is true. 211 | #. Only (ii) is true. 212 | #. Both (i) and (ii) are true. 213 | #. Neither is true. 214 | 215 | #. Suppose $A$ and $B$ are compatible propositions. True or false: if someone has personal probabilities $\p(A \wedge B) = \p(A \vee B)$, then they can be Dutch booked. 216 | 217 | #. Suppose a fair coin will be flipped twice. You and I have flipped this coin many times before, so we know that it is fair. But Frank doesn't know about our experiments; in fact he suspects the coin is not fair. His personal probabilities are as follows: 218 | \begin{align*} 219 | \p(H_1 \wedge H_2) &= 3/10, \\ 220 | \p(H_1 \wedge T_2) &= 2/10, \\ 221 | \p(T_1 \wedge H_2) &= 2/10, \\ 222 | \p(T_1 \wedge T_2) &= 3/10. 223 | \end{align*} 224 | Assume we can only make two bets with Frank, a bet on $H_1$ and a bet on $H_2$. The stake for each bet must be $\$1$. What is the most we can win from him in a Dutch book? In other words, what is the maximum amount we can guarantee we will win from him? (Note that this is different from the maximum we *might* win from him if we get lucky with the coin flips.) 225 | 226 | a. $\$1$ 227 | b. $\$0.50$ 228 | c. $\$0.40$ 229 | d. $\$0$: we cannot make a Dutch book against Frank -------------------------------------------------------------------------------- /A-cheat-sheet.Rmd: -------------------------------------------------------------------------------- 1 | # (APPENDIX) Appendix {-} 2 | 3 | # Cheat Sheet 4 | 5 | ## Deductive Logic {-} 6 | 7 | Validity 8 | 9 | : An argument is valid if it is impossible for the premises to be true and the conclusion false. 10 | 11 | Soundness 12 | 13 | : An argument is sound if it is valid and all the premises are true. 
Connectives

: There are three connectives: $\neg$ (negation), $\wedge$ (conjunction), and $\vee$ (disjunction). Their truth tables are as follows.

```{r echo=FALSE}
df <- data.frame(
  A = c("T", "T", "F", "F"),
  B = c("T", "F", "T", "F"),
  notA = c("F", "F", "T", "T"),
  AandB = c("T", "F", "F", "F"),
  AveeB = c("T", "T", "T", "F")
)
colnames(df) <- c("$A$", "$B$", "$\\neg A$", "$A \\wedge B$", "$A \\vee B$")
knitr::kable(df, align = "c")
```

Logical Truth (Tautology)

: A proposition that is always true.

Contradiction

: A proposition that is never true.

Mutually Exclusive

: Two propositions are mutually exclusive if they cannot both be true.

Logical Entailment

: One proposition logically entails another if it is impossible for the first to be true and the second false.

Logical Equivalence

: Two propositions are logically equivalent if they entail one another.


## Probability {-}

Independence

: Proposition $A$ is independent of proposition $B$ if $B$'s truth or falsity makes no difference to the probability of $A$.

Fairness

: A repeating process is fair if each outcome has the same probability and the repetitions are independent of one another.

Multiplication Rule

: If $A$ and $B$ are independent then $\p(A \wedge B) = \p(A) \times \p(B)$.

Addition Rule

: If $A$ and $B$ are mutually exclusive then $\p(A \vee B) = \p(A) + \p(B)$.

Tautology Rule

: If $A$ is a tautology then $\p(A) = 1$.

Contradiction Rule

: If $A$ is a contradiction then $\p(A) = 0$.

Equivalence Rule

: If $A$ and $B$ are logically equivalent then $\p(A) = \p(B)$.

Conditional Probability

: $$\p(A \given B) = \frac{\p(A \wedge B)}{\p(B)}.$$

Independence (Formal Definition)

: $A$ is independent of $B$ if $\p(A \given B) = \p(A)$ and $\p(A) > 0$.

Negation Rule

: $\p(\neg A) = 1 - \p(A)$.

General Multiplication Rule

: $\p(A \wedge B) = \p(A \given B) \p(B)$ if $\p(B) > 0$.

General Addition Rule

: $\p(A \vee B) = \p(A) + \p(B) - \p(A \wedge B)$.

Law of Total Probability

: If $1 > \p(B) > 0$ then $$\p(A) = \p(A \given B)\p(B) + \p(A \given \neg B)\p(\neg B).$$

Bayes' Theorem

: If $\p(A), \p(B) > 0$ then $$\p(A \given B) = \p(A) \frac{\p(B \given A)}{\p(B)}.$$

Bayes' Theorem (Long Version)

: If $1 > \p(A) > 0$ and $\p(B) > 0$ then $$\p(A \given B) = \frac{\p(B \given A)\p(A)}{\p(B \given A)\p(A) + \p(B \given \neg A)\p(\neg A)}.$$
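As a quick worked example of the last entry, here is a small R sketch (the function name `bayes_long` and the numbers are just for illustration):

```{r}
# Bayes' theorem, long version (a sketch).
bayes_long <- function(p_A, p_B_given_A, p_B_given_notA) {
  p_B_given_A * p_A / (p_B_given_A * p_A + p_B_given_notA * (1 - p_A))
}

# e.g. a prior of 1/10 with likelihoods of 8/10 and 2/10:
bayes_long(p_A = 0.1, p_B_given_A = 0.8, p_B_given_notA = 0.2)
```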
## Decision Theory {-}

Expected Monetary Value

: Suppose act $A$ has possible payoffs $\$x_1, \$x_2, \ldots, \$x_n$. Then the *expected monetary value* of $A$ is defined:
$$
\begin{aligned}
\E(A) &= \p(\$x_1) \times \$x_1 + \p(\$x_2) \times \$x_2 + \ldots + \p(\$x_n) \times \$x_n.
\end{aligned}
$$

Expected Utility

: Suppose act $A$ has possible consequences $C_1, C_2, \ldots, C_n$. Denote the utility of each outcome $U(C_1)$, $U(C_2)$, etc. Then the *expected utility* of $A$ is defined:
$$ \EU(A) = \p(C_1)\u(C_1) + \p(C_2)\u(C_2) + \ldots + \p(C_n)\u(C_n). $$

Measuring Utility

: Suppose an agent's best and worst possible outcomes are $B$ and $W$. Let $\u(B) = 1$ and $\u(W) = 0$. And suppose $\p(B)$ is the lowest probability such that they are indifferent between outcome $O$ and a gamble with probability $\p(B)$ of outcome $B$, and probability $1 - \p(B)$ of outcome $W$. Then, if the agent is following the expected utility rule, $\u(O) = \p(B)$.

Sure-thing Principle

: If you would choose $X$ over $Y$ if you knew that $E$ was true, and you'd also choose $X$ over $Y$ if you knew $E$ wasn't true, then you should choose $X$ over $Y$ when you don't know whether $E$ is true or not.


## Bayesianism {-}

Measuring Personal Probability

: Personal probabilities are measured by fair betting rates, if the agent is following the expected value rule. More concretely, suppose an agent regards as fair a bet where they win $w$ if $A$ is true, and they lose $l$ if $A$ is false. Then, if they are following the expected value rule, their personal probability for $A$ is:

    $$ \p(A) = \frac{l}{w + l}. $$

Dutch book

: A Dutch book is a set of bets where each bet is fair according to the agent's betting rates, and yet the set of bets is guaranteed to lose them money. Agents who violate the laws of probability can be Dutch booked. Agents who obey the laws of probability cannot be Dutch booked.

Principle of Indifference

: If there are $n$ possible outcomes, each outcome should have the same prior probability: $1/n$.

    If there is an interval of possible outcomes from $a$ to $b$, the probability of any subinterval from $c$ to $d$ is: $$\frac{d-c}{b-a}.$$


## Frequentism {-}

Significance Testing

: A significance test at the $.05$ level can be described in three steps.

    1. State the hypothesis you want to test: the true probability of outcome X is $p$. This is called the *null hypothesis*.
    2. Repeat the event over and over and count the number of times $k$ that outcome X occurs.
    3. If the number $k$ falls outside the range of outcomes expected $95\%$ of the time, reject the null hypothesis. (Otherwise, draw no conclusion.)

    For a test at the $.01$ level, follow the same steps but check instead whether $k$ falls outside the range of outcomes expected $99\%$ of the time.

Normal Approximation

: Suppose an event has two possible outcomes, with probabilities $p$ and $1-p$. And suppose the event will be repeated $n$ independent times. We define the mean $\mu = np$ and the standard deviation $\sigma = \sqrt{np(1-p)}$. Let $k$ be the number of times the first outcome occurs. Then, if $n$ is large enough:

    - The probability is about $.68$ that $k$ will be between $\mu - \sigma$ and $\mu + \sigma$.
    - The probability is about $.95$ that $k$ will be between $\mu - 2\sigma$ and $\mu + 2\sigma$.
    - The probability is about $.99$ that $k$ will be between $\mu - 3\sigma$ and $\mu + 3\sigma$.
-------------------------------------------------------------------------------- /B-axiomatic-probability-theory.Rmd: --------------------------------------------------------------------------------

# The Axioms of Probability

## Theories and Axioms {-}

In mathematics, a theory like the theory of probability is developed axiomatically.
That means we begin with fundamental laws or principles called *axioms*, which are the assumptions the theory rests on. Then we derive the consequences of these axioms via *proofs*: deductive arguments which establish additional principles that follow from the axioms. These further principles are called *theorems*. 6 | 7 | In the case of probability theory, we can build the whole theory from just three axioms. And that makes certain tasks much easier. For example, it makes it easy to establish that anyone who violates a law of probability can be Dutch booked. Because, if you violate a law of probability, you must also be violating one of the three axioms that entail the law you've violated. And with only three axioms to check, we can verify pretty quickly that violating an axiom always makes you vulnerable to a Dutch book. 8 | 9 | The axiomatic approach is useful for lots of other reasons too. For example, we can program the axioms into a computer and use it to solve real-world problems. Or, we could use the axioms to verify that the theory is consistent: if we can establish that the axioms don't contradict one another, then we know the theory makes sense. Axioms are also a useful way to summarize a theory, which makes it easier to compare it to alternative theories. 10 | 11 | In addition to axioms, a theory typically includes some *definitions*. Definitions construct new concepts out of existing ones, ones that already appear in the axioms. Definitions don't add new assumptions to the theory. Instead they're useful because they give us new language in which to describe what the axioms already entail. 12 | 13 | So a theory is a set of statements that tells us everything true about the subject at hand. There are three kinds of statements. 14 | 15 | 1. Axioms: the principles we take for granted. 16 | 2. Definitions: statements that introduce new concepts or terms. 17 | 3. Theorems: statements that follow from the axioms and the definitions. 18 | 19 | In this appendix we'll construct probability theory axiomatically. We'll learn how to derive all the laws of probability discussed in Part I from three simple statements. 20 | 21 | 22 | ## The Three Axioms of Probability {-} 23 | 24 | Probability theory has three axioms, and they're all familiar laws of probability. But they're fundamental laws in a way. All the other laws can be derived from them. 25 | 26 | The three axioms are as follows. 27 | 28 | Normality 29 | 30 | : For any proposition $A$, $0 \leq \p(A) \leq 1$. 31 | 32 | Tautology Rule 33 | 34 | : If $A$ is a logical truth then $\p(A) = 1$. 35 | 36 | Additivity Rule 37 | 38 | : If $A$ and $B$ are mutually exclusive then $\p(A \vee B) = \p(A) + \p(B)$. 39 | 40 | Our task now is to derive from these three axioms the other laws of probability. We do this by stating each law, and then giving a proof of it: a valid deductive argument showing that it follows from the axioms and definitions. 41 | 42 | 43 | ## First Steps {-} 44 | 45 | Let's start with one of the easier laws to derive. 46 | 47 | The Negation Rule 48 | 49 | : $\p(\neg A) = 1 - \p(A)$. 50 | 51 | ```{proof} 52 | 53 | To prove this rule, start by noticing that $A \vee \neg A$ is a logical truth. So we can reason as follows: 54 | $$ 55 | \begin{aligned} 56 | \p(A \vee \neg A) &= 1 & \mbox{ by Tautology}\\ 57 | \p(A) + \p(\neg A) &= 1 & \mbox{ by Additivity}\\ 58 | \p(\neg A) &= 1 - \p(A) & \mbox{ by algebra.} 59 | \end{aligned} 60 | $$ 61 | 62 | ``` 63 | 64 | The little square indicates the end of the proof. 
Notice how each line of our proof is justified by either applying an axiom or using basic algebra. This ensures it's a valid deductive argument. 65 | 66 | Now we can use the Negation rule to establish the flipside of the Tautology rule: the Contradiction rule. 67 | 68 | The Contradiction Rule 69 | 70 | : If $A$ is a contradiction then $\p(A) = 0$. 71 | 72 | ```{proof} 73 | 74 | Notice that if $A$ is a contradiction, then $\neg A$ must be a tautology. So $\p(\neg A) = 1$. Therefore: 75 | $$ 76 | \begin{aligned} 77 | \p(A) &= 1 - \p(\neg A) & \mbox{by Negation}\\ 78 | &= 1 - 1 & \mbox{by Tautology}\\ 79 | &= 0 & \mbox{by arithmetic.} 80 | \end{aligned} 81 | $$ 82 | 83 | ``` 84 | 85 | ## Conditional Probability & the Multiplication Rule {-} 86 | 87 | 88 | Our next theorem is about conditional probability. But the concept of conditional probability isn't mentioned in the axioms, so we need to define it first. 89 | 90 | Definition: Conditional Probability 91 | 92 | : The conditional probability of $A$ given $B$ is written $\p(A \given B)$ and is defined: $$\p(A \given B) = \frac{\p(A \wedge B)}{\p(B)},$$ provided that $\p(B) > 0$. 93 | 94 | From this definition we can derive the following theorem. 95 | 96 | Multiplication Rule 97 | 98 | : If $\p(B) > 0$, then $\p(A \wedge B) = \p(A \given B)\p(B)$. 99 | 100 | ```{proof} 101 | 102 | $$ 103 | \begin{aligned} 104 | \p(A \given B) &= \frac{\p(A \wedge B)}{\p(B)} & \mbox{ by definition}\\ 105 | \p(A \given B)\p(B) &= \p(A \wedge B) & \mbox{ by algebra}\\ 106 | \p(A \wedge B) &= \p(A \given B)\p(B) & \mbox{ by algebra.} 107 | \end{aligned} 108 | $$ 109 | 110 | ``` 111 | 112 | Notice that the first step in this proof wouldn't make sense if we didn't assume from the beginning that $\p(B) > 0$. That's why the theorem begins with the qualifier, "If $\p(B) > 0$...". 113 | 114 | 115 | ## Equivalence & General Addition {-} 116 | 117 | Next we'll prove the Equivalence rule and the General Addition rule. These proofs are longer and more difficult than the ones we've done so far. 118 | 119 | Equivalence Rule 120 | 121 | : When $A$ and $B$ are logically equivalent, $\p(A) = \p(B)$. 122 | 123 | ```{proof} 124 | 125 | Suppose that $A$ and $B$ are logically equivalent. Then $\neg A$ and $B$ are mutually exclusive: if $B$ is true then $A$ must be true, hence $\neg A$ false. So $B$ and $\neg A$ can't both be true. 126 | 127 | So we can apply the Additivity axiom to $\neg A \vee B$: 128 | $$ 129 | \begin{aligned} 130 | \p(\neg A \vee B) &= \p(\neg A) + \p(B) & \mbox{ by Additivity}\\ 131 | &= 1 - \p(A) + \p(B) & \mbox{ by Negation.} 132 | \end{aligned} 133 | $$ 134 | 135 | Next notice that, because $A$ and $B$ are logically equivalent, we also know that $\neg A \vee B$ is a logical truth. If $B$ is false, then $A$ must be false, so $\neg A$ must be true. So either $B$ is true, or $\neg A$ is true. So $\neg A \vee B$ is always true, no matter what. 136 | 137 | So we can apply the Tautology axiom: 138 | $$ 139 | \begin{aligned} 140 | \p(\neg A \vee B) &= 1 & \mbox{ by Tautology.} 141 | \end{aligned} 142 | $$ 143 | Combining the previous two equations we get: 144 | $$ 145 | \begin{aligned} 146 | 1 &= 1 - \p(A) + \p(B) & \mbox{ by algebra}\\ 147 | \p(A) &= \p(B) & \mbox{ by algebra}. 148 | \end{aligned} 149 | $$ 150 | 151 | ``` 152 | 153 | Now we can use this theorem to derive the General Addition rule. 154 | 155 | General Addition Rule 156 | 157 | : $\p(A \vee B) = \p(A) + \p(B) - \p(A \wedge B)$. 
158 | 159 | ```{proof} 160 | 161 | Start with the observation that $A \vee B$ is logically equivalent to: 162 | $$ (A \wedge \neg B) \vee (A \wedge B) \vee (\neg A \wedge B). $$ 163 | This is easiest to see with an Euler diagram, but you can also verify it with a truth table. (We won't go through either of these exercises here.) 164 | 165 | So we can apply the Equivalence rule to get: 166 | $$ 167 | \begin{aligned} 168 | \p(A \vee B) &= \p((A \wedge \neg B) \vee (A \wedge B) \vee (\neg A \wedge B)). 169 | \end{aligned} 170 | $$ 171 | And thus, by Additivity: 172 | $$ 173 | \begin{aligned} 174 | \p(A \vee B) &= \p(A \wedge \neg B) + \p(A \wedge B) + \p(\neg A \wedge B). 175 | \end{aligned} 176 | $$ 177 | 178 | We can also verify with an Euler diagram (or truth table) that $A$ is logically equivalent to $(A \wedge B) \vee (A \wedge \neg B)$, and that $B$ is logically equivalent to $(A \wedge B) \vee (\neg A \wedge B)$. So, by Additivity, we also have the equations: 179 | $$ 180 | \begin{aligned} 181 | \p(A) &= \p(A \wedge \neg B) + \p(A \wedge B).\\ 182 | \p(B) &= \p(A \wedge B) + \p(\neg A \wedge B). 183 | \end{aligned} 184 | $$ 185 | Notice, the last equation here can be transformed to: 186 | $$ 187 | \begin{aligned} 188 | \p(\neg A \wedge B) &= \p(B) - \p(A \wedge B). 189 | \end{aligned} 190 | $$ 191 | Putting the previous four equations together, we can then derive: 192 | $$ 193 | \begin{aligned} 194 | \p(A \vee B) &= \p(A \wedge \neg B) + \p(A \wedge B) + \p(\neg A \wedge B) & \mbox{by algebra}\\ 195 | &= \p(A) + \p(\neg A \wedge B) & \mbox{by algebra}\\ 196 | &= \p(A) + \p(B) - \p(A \wedge B) & \mbox{by algebra.} 197 | \end{aligned} 198 | $$ 199 | 200 | ``` 201 | 202 | ## Total Probability & Bayes' Theorem {-} 203 | 204 | Next we derive the Law of Total Probability and Bayes' theorem. 205 | 206 | Total Probability 207 | 208 | : If $0 < \p(B) < 1$, then 209 | 210 | $$ \p(A) = \p(A \given B)\p(B) + \p(A \given \neg B)\p(\neg B). $$ 211 | 212 | ```{proof} 213 | 214 | $$ 215 | \begin{aligned} 216 | \p(A) &= \p((A \wedge B) \vee (A \wedge \neg B)) & \mbox{ by Equivalence}\\ 217 | &= \p(A \wedge B) + \p(A \wedge \neg B) & \mbox{ by Additivity}\\ 218 | &= \p(A \given B)\p(B) + \p(A \given \neg B)\p(\neg B) & \mbox{ by Multiplication.} 219 | \end{aligned} 220 | $$ 221 | 222 | ``` 223 | 224 | Notice, the last line of this proof only makes sense if $\p(B) > 0$ and $\p(\neg B) > 0$. That's the same as $0 < \p(B) < 1$, which is why the theorem begins with the condition: "If $0 < \p(B) < 1$...". 225 | 226 | Now for the first version of Bayes' theorem: 227 | 228 | Bayes' Theorem 229 | 230 | : If $\p(A),\p(B)>0$, then 231 | $$ \p(A \given B) = \p(A)\frac{\p(B \given A)}{\p(B)}. $$ 232 | 233 | ```{proof} 234 | 235 | $$ 236 | \begin{aligned} 237 | \p(A \given B) &= \frac{\p(A \wedge B)}{\p(B)} & \mbox{by definition}\\ 238 | &= \frac{\p(B \given A)\p(A)}{\p(B)} & \mbox{by Multiplication}\\ 239 | &= \p(A)\frac{\p(B \given A)}{\p(B)} & \mbox{by algebra.}\\ 240 | \end{aligned} 241 | $$ 242 | 243 | ``` 244 | 245 | And next the long version: 246 | 247 | Bayes' Theorem (long version) 248 | 249 | : If $1 > \p(A) > 0$ and $\p(B)>0$, then 250 | $$ \p(A \given B) = \frac{\p(A)\p(B \given A)}{\p(A)\p(B \given A) + \p(\neg A)\p(B \given \neg A)}. 
$$

```{proof}

$$
\begin{aligned}
\p(A \given B)
&= \frac{\p(A)\p(B \given A)}{\p(B)} & \mbox{by Bayes' theorem}\\
&= \frac{\p(A)\p(B \given A)}{\p(A)\p(B \given A) + \p(\neg A)\p(B \given \neg A)} & \mbox{by Total Probability.}
\end{aligned}
$$

```

## Independence {-}

Finally, let's introduce the concept of independence, and two key theorems that deal with it.

Definition: Independence

: $A$ is independent of $B$ if $\p(A \given B) = \p(A)$ and $\p(A) > 0$.

Now we can state and prove the Multiplication rule.

Multiplication Rule

: If $A$ is independent of $B$, then $\p(A \wedge B) = \p(A)\p(B)$.

```{proof}

Suppose $A$ is independent of $B$. Then:
$$
\begin{aligned}
\p(A \given B) &= \p(A) & \mbox{ by definition}\\
\frac{\p(A \wedge B)}{\p(B)} &= \p(A) & \mbox{ by definition}\\
\p(A \wedge B) &= \p(A) \p(B) & \mbox{ by algebra.}
\end{aligned}
$$

```

Finally, we prove another useful fact about independence, namely that it goes both ways.

Independence is Symmetric

: If $A$ is independent of $B$, then $B$ is independent of $A$.

```{proof}

To derive this fact, suppose $A$ is independent of $B$. Then:
$$
\begin{aligned}
\p(A \wedge B) &= \p(A) \p(B) & \mbox{ by Multiplication}\\
\p(B \wedge A) &= \p(A) \p(B) & \mbox{ by Equivalence}\\
\frac{\p(B \wedge A)}{\p(A)} &= \p(B) & \mbox{ by algebra}\\
\p(B \given A) &= \p(B) & \mbox{ by definition.}
\end{aligned}
$$

```

We've now established that the laws of probability used in this book can be derived from the three axioms we began with.
-------------------------------------------------------------------------------- /C-grue.Rmd: --------------------------------------------------------------------------------

# The Grue Paradox {#grue}

`r newthought("In")` [Section](#indargs) \@ref(indargs) we noticed that many inductive arguments have a certain format.

```{block, type="argument", echo=TRUE}
All observed instances of $X$ have been $Y$.\
Therefore, all instances of $X$ are $Y$.
```

All observed ravens have been black, so we expect all ravens to be black. All observed emeralds have been green, so we expect all emeralds to be green. And so on.

It seems like a fundamental principle of scientific inquiry that we expect the unobserved to resemble the observed. Philosophers call this *The Principle of Induction*.


## A Gruesome Concept {-}

```{r echo=FALSE, cache=TRUE, fig.margin=TRUE, fig.cap="Nelson Goodman (1906--1998) discovered the grue paradox in the $1940$s and '$50$s."}
knitr::include_graphics("img/goodman.png")
```

But in the $1940$s, Nelson Goodman discovered a problem with the Principle of Induction. To illustrate the problem he invented a very curious concept, *grue*.

There are two ways for an object to be grue. Some green things are grue, but it depends on when we first encounter them. If our first observation of a green object happens before the year $2050$, then it's grue. So the Statue of Liberty is grue: it's a green object that was first observed before the year $2050$ (*long* before).
But if our first encounter with a green object happens in the year $2050$ or later, then it's not grue. The same goes if we never observe it. Objects on the far side of the universe that we'll never see, or buried deep underground, are not grue.

There is a second way for an object to be grue: some blue objects are grue. Not the ones observed before $2050$, though. Instead it's the ones that *aren't* observed before $2050$. If a blue object is observed for the first time *after* $2049$, or it's never observed at all, then it's grue. So blue sapphires that won't be mined before the year $2050$ are grue, for example.

As usual, it helps to have a diagram: see Figure \@ref(fig:gruegraph). Here is the official definition of grue.

```{r gruegraph, echo=FALSE, cache=TRUE, fig.cap="The definition of grue"}
df <- expand.grid(
  t_obs = c("Observed Before 2050", "Not Observed Before 2050"),
  colour = c("Green", "Blue")
)
df$colour <- factor(df$colour, levels = rev(levels(df$colour)))
df$green <- c(TRUE, TRUE, FALSE, FALSE)
df$grolour <- c("Grue", "Bleen", "Bleen", "Grue")

ggplot(df) +
  geom_tile(aes(x = t_obs, y = colour, fill = green), colour = "black") +
  geom_label(aes(x = t_obs, y = colour, label = grolour),
             data = df %>% filter(grolour == "Grue")) +
  theme_minimal() +
  scale_fill_manual(values = c(bookblue, bookgreen)) +
  xlab(NULL) + ylab(NULL) +
  theme_minimal(base_size = 14) +
  theme(panel.grid = element_blank(), legend.position = "none")
```

Grue

: An object is *grue* if either (a) it is green and first observed before the year $2050$, or (b) it is blue and not observed before $2050$.

To test your understanding, see if you can explain why each of the following is grue: the $\$20$ bill in my pocket, Kermit the Frog, the first sapphire to be mined in $2050$, and blue planets on the far side of the universe.

Then see if you can explain why these things aren't grue: fire engines, the [Star of India](https://en.wikipedia.org/wiki/Star_of_India_(gem)), and the first $\$20$ bill to be printed in $2050$.

Once you've got all those down, try this question: do grue objects change colour in the year $2050$? It's a common mistake to think they do.

But no, grue objects don't change colour. The Statue of Liberty is green and it always will be (let's assume). So it's grue, and always will be, because it's a green thing that was first observed before the year $2050$. Part (a) of the definition of grue guarantees that.

The only way time comes into it is in determining which green things are grue, and which blue things. If a green thing is first observed before $2050$, then it's grue, ever and always. Likewise if a blue thing is *not* first observed before $2050$. Then it's grue---and it always has been!


## The Paradox {-}

Now ask yourself, have you ever seen a grue emerald? You probably have. In fact, every emerald everyone's ever seen has been grue.

Why? Because they're all green, and they've all been observed before the year $2050$. So they're all grue the first way---they all satisfy part (a) of the definition. (Notice it's an either/or definition, so you only have to satisfy one of the two parts to be grue.)

So all the emeralds we've ever seen have been grue.
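The definition is mechanical enough to put in code, if that helps. Here's a small R sketch (the function name `is_grue` is just for illustration); note that every emerald observed so far comes out grue:

```{r}
# A sketch of the definition of grue (NA = never observed).
is_grue <- function(colour, first_observed) {
  clause_a <- colour == "green" & !is.na(first_observed) & first_observed < 2050
  clause_b <- colour == "blue" & (is.na(first_observed) | first_observed >= 2050)
  clause_a | clause_b
}

is_grue("green", 2024)  # an emerald observed today: TRUE
is_grue("blue", NA)     # a never-observed blue sapphire: TRUE
is_grue("green", 2050)  # a green thing first seen in 2050: FALSE
```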
Let's apply the Principle of Induction then: 73 | 74 | ```{block, type="argument", echo=TRUE} 75 | All observed emeralds have been grue.\ 76 | Therefore *all* emeralds are grue. 77 | ``` 78 | 79 | But if all emeralds are grue, then the first emeralds to be mined in $2050$ will be grue. And that means they'll be blue! They won't have been observed before $2050$, so the only way for them to be grue is to be blue. 80 | 81 | We've reached the absurd conclusion that there are blue emeralds out there, just waiting to be pulled out of the earth. Something has gone off the rails here, but what? 82 | 83 | Here's another way to put the challenge. We have two "patterns" in our observed data. The emeralds we've seen are uniformly green, but they're also uniformly grue. We can't project both these patterns into the future, though. They'll contradict each other starting in $2050$. 84 | 85 | Now, obviously, common sense says the green pattern is the real one. The grue "pattern" is bogus, and no one but a philosopher would even bother thinking about it. But *why* is it bogus? What's so special about green? 86 | 87 | Apparently the Principle of Induction has a huge hole in it! It says to extrapolate from observed patterns. But *which* patterns? 88 | 89 | 90 | ## Grue & Artificial Intelligence {-} 91 | 92 | ```{r curvefitting, echo=FALSE, cache=TRUE, fig.margin=TRUE, fig.cap="The same set of points interpreted two different ways"} 93 | x <- seq(0, 22, .1) 94 | line <- function(x) x 95 | squiggle <- function(x) 2 * pi * sin(x) + x 96 | 97 | df_line <- data.frame(x = x, y = line(x), wiggly = FALSE) 98 | df_squiggle <- data.frame(x = x, y = squiggle(x), wiggly = TRUE) 99 | df <- bind_rows(df_line, df_squiggle) 100 | 101 | x <- seq(0, 22, pi) 102 | df_points <- data.frame(x = x, y = line(x)) 103 | 104 | ggplot(df) + 105 | geom_point(aes(x = x, y = y), data = df_points) + 106 | geom_path(aes(x = x, y = y), linetype = "dashed") + 107 | facet_grid(rows = vars(wiggly)) + 108 | scale_x_continuous(name = NULL, labels = NULL) + 109 | scale_y_continuous(name = NULL, labels = NULL) + 110 | theme(strip.text = element_blank()) 111 | ``` 112 | 113 | Patterns are cheap, as any data scientist will tell you. Given a bunch of data points in an *xy*-plane, there are lots of ways to connect the dots. Even if they all lie on a straight line, you could draw an oscillating curve that passes through each point (Figure \@ref(fig:curvefitting)). You can probably think of even sillier curves that will fit all the points. 114 | 115 | Designing a computer program that will know which patterns to use and which to ignore is a big part of what machine learning experts do. And it's one reason humans are still essential to designing artificial intelligence. Thanks to our experience, and our genetic inheritance, we have *lots* of information about which patterns are likely to continue, and which are bogus, like grue. 116 | 117 | But how do we pass all that wisdom on to the machines? How do we teach them the difference between green and grue, so that they can take it from here and we can all go on permanent vacation? 118 | 119 | 120 | ## Disjunctivitis {-} 121 | 122 | Here's one very natural answer. The problem with grue is that it's *disjunctive*: it's defined using either/or. It suffers from what we might call "disjunctivitis." 123 | 124 | But the beauty of Goodman's puzzle is the neat way it exposes the flaw in this answer. It allows us to make 'green' the disjunctive concept instead! How? 
We start by building grue a friend, a concept to fill in the missing spaces in our original diagram. We'll call it *bleen*: see Figure \@ref(fig:gruebleengraph). 125 | 126 | ```{r gruebleengraph, echo=FALSE, cache=TRUE, fig.cap="Defining bleen, a counterpart to grue"} 127 | df <- expand.grid( 128 | t_obs = c("Observed Before 2050", "Not Observed Before 2050"), 129 | colour = c("Green", "Blue") 130 | ) 131 | df$colour <- factor(df$colour, levels = rev(levels(df$colour))) 132 | df$green <- c(TRUE, TRUE, FALSE, FALSE) 133 | df$grolour <- c("Grue", "Bleen", "Bleen", "Grue") 134 | 135 | ggplot(df) + 136 | geom_tile(aes(x = t_obs, y = colour, fill = green), colour = "black") + 137 | geom_label(aes(x = t_obs, y = colour, label = grolour), data = df) + 138 | theme_minimal() + 139 | scale_fill_manual(values = c(bookblue, bookgreen)) + 140 | xlab(NULL) + ylab(NULL) + 141 | theme_minimal(base_size = 14) + 142 | theme(panel.grid = element_blank(), legend.position = "none") 143 | ``` 144 | 145 | Now we can define green in terms of grue and bleen. 146 | 147 | Green 148 | 149 | : An object is *green* if either (a) it's grue and first observed before the year $2050$, or (b) it's bleen and not observed before $2050$. 150 | 151 | Notice how when we start with the odd language of grue and bleen, and then define the familiar concept green, it's the familiar concept that comes out all disjunctive and artificial-seeming. 152 | 153 | Maybe you're thinking: you *could* define green that way, but that's not how it's *actually* defined. In reality, we already understand the concept of green, and we have to learn the concept of grue from its disjunctive definition. 154 | 155 | The problem is, that's just a fact about *us humans*, not about the concepts grue and green. That's just the way we *homo sapiens* happen to be built (or maybe socialized, or both). 156 | 157 | Some bizarre species of alien could grow up thinking in grue/bleen terms instead. And when they landed on Earth, we'd have to explain our green/blue language to them using an either/or definition. Then *they* would be looking at *us* thinking: you guys have a very weird, disjunctive way of thinking! 158 | 159 | What could we say to them to establish the superiority of our way of thinking? It's been more than $70$ years since Goodman first posed this question. Yet no answer has emerged as the clear and decisively correct one. 160 | 161 | 162 | ## Time Dependence {-} 163 | 164 | Another natural answer to Goodman's challenge is to say that grue is defective because it's time-dependent. It means different things depending on the time an object is first observed. 165 | 166 | But the same reversal of fortunes that toppled the "disjunctivitis" diagnosis happens here. We can define green in terms of grue and bleen. Then green becomes the time-dependent concept, not grue. 167 | 168 | So we're left in the same spot. We need some way of showing that the "true" order of definition is the one we're used to. By what criterion can we say that green is more fundamental, more basic, than grue? 169 | 170 | 171 | ## The Moral {-} 172 | 173 | ```{marginfigure} 174 | [](http://www.wi-phi.com/video/puzzle-grue) 175 | For another explanation of the grue puzzle, check out [this excellent Wi-Phi video](http://www.wi-phi.com/video/puzzle-grue). 176 | ``` 177 | 178 | Goodman's puzzle may seem just cute at first, a mere curiosity. But it is actually quite profound. 179 | 180 | In a way, the central question of this book is: what is the logic of science? 
What are the correct rules of scientific reasoning? 181 | 182 | The laws of probability seem like a good place to start. But that path led us to a dead end at the problem of priors in [Chapter](#priors) \@ref(priors). Perhaps then we could start with the Principle of Induction instead? But then we end up in another dead end, stopped by the grue paradox. 183 | 184 | Just as Bertrand's paradox stops us from using the Principle of Indifference to answer the problem of priors, Goodman's paradox stops us from using the Principle of Induction. 185 | 186 | There is even more to the similarity between these two paradoxes. Both paradoxes are problems of "language dependence." Depending on what language we work in, rules like the Principle of Indifference and the Principle of Induction give different recommendations. If we apply the Principle of Indifference to length, we get one prior probability; if we apply it to area, we get another. If we apply the Principle of Induction to green, we expect all emeralds to be green; if we apply it to grue, we expect some to be blue. 187 | 188 | To this day, we do not have an answer to the question: which language is the right one to use, and why? 189 | 190 | 191 | ## Exercises {-} 192 | 193 | 1. Suppose you have a large urn of emeralds, and you want to compare the following two hypotheses: 194 | \begin{align*} 195 | H_1 &= \text{all the emeralds in the urn are green,}\\ 196 | H_2 &= \text{all the emeralds in the urn are grue.} 197 | \end{align*} 198 | You draw $10$ emeralds from the urn, and all $10$ are green. Assuming the two hypotheses have the same prior probability, how do their posterior probabilities compare? 199 | 200 | a. $\p(H_1 \given E) > \p(H_2 \given E)$ 201 | b. $\p(H_1 \given E) < \p(H_2 \given E)$ 202 | c. $\p(H_1 \given E) = \p(H_2 \given E)$ 203 | 204 | #. Suppose you are exploring the relationship between temperature and pressure for a newly discovered gas. You have a sample of the gas in a container where you can control the temperature. So you try $8$ different temperatures, and measure the pressure at each temperature. The results look like the points in Figure \@ref(fig:curvefitting), where the $x$-axis is temperature and the $y$-axis is pressure. 205 | 206 | You want to compare two hypotheses about the true relationship between pressure and temperature, portrayed with dashes in Figure \@ref(fig:curvefitting): 207 | \begin{align*} 208 | H_1 &= \text{the straight line,}\\ 209 | H_2 &= \text{the oscillating curve.} 210 | \end{align*} 211 | Assuming the two hypotheses have the same prior probability, how do their posterior probabilities compare? 212 | 213 | a. $\p(H_1 \given E) > \p(H_2 \given E)$ 214 | b. $\p(H_1 \given E) < \p(H_2 \given E)$ 215 | c. $\p(H_1 \given E) = \p(H_2 \given E)$ 216 | 217 | #. Write a short essay ($3$--$4$ paragraphs) explaining the grue paradox. Your essay should include all of the following: 218 | 219 | - a clear, accurate explanation of the Principle of Induction, 220 | - a clear, accurate explanation of the concept grue, 221 | - a clear, accurate explanation of the challenge grue poses for the Principle of Induction, and 222 | - a clear, accurate explanation of the "disjunctivitis" solution to the paradox, and Goodman's reply to that solution. 
223 | 224 | -------------------------------------------------------------------------------- /D-problem-of-induction.Rmd: 1 | # The Problem of Induction 2 | 3 | ```{block, type="epigraph"} 4 | It's tough to make predictions, especially about the future.\ 5 | ---Yogi Berra 6 | ``` 7 | 8 | `r newthought("Many")` inductive arguments work by projecting an observed pattern onto as-yet unobserved instances. All the ravens we've observed have been black, so all ravens are. All the emeralds we've seen have been green, so all emeralds are. 9 | 10 | The assumption that the unobserved will resemble the observed seems to be central to induction. Philosophers call this assumption the *Principle of Induction*.[^poi] But what justifies this assumption? Do we have any reason to think the parts of reality we've observed so far are a good representation of the parts we haven't seen yet? 11 | 12 | [^poi]: See [Section](#indargs) \@ref(indargs) and [Appendix](#grue) \@ref(grue) for previous discussions of the Principle of Induction. 13 | 14 | Actually, there are strong reasons to doubt whether this assumption can be justified. It may be impossible to give any good argument for expecting the unobserved to resemble the observed. 15 | 16 | 17 | ## The Dilemma {-} 18 | 19 | We noted in [Chapter 2][Logic] that there are two kinds of argument, inductive and deductive. Some arguments establish their conclusions necessarily, others only support them with high probability. If there is an argument for the Principle of Induction, it must be one of these two kinds. Let's consider each in turn. 20 | 21 | Could we give an inductive argument for the Principle of Induction? At first it seems we could. Scientists have been using inductive reasoning for millennia, often with great success. Indeed, it seems humans, and other creatures too, have relied on it for much longer, and could not have survived without it. So the Principle of Induction has a very strong track record. Isn't that a good argument for believing it's correct? 22 | 23 | ```{r echo=FALSE, cache=TRUE, fig.margin=TRUE, fig.cap="David Hume (1711--1776) raised the problem of induction in $1739$. Our presentation of it here is somewhat modernized from his original argument."} 24 | knitr::include_graphics("img/hume.png") 25 | ``` 26 | 27 | No, because the argument is circular. It uses the Principle of Induction to justify believing in the Principle of Induction. Consider that the argument we are attempting looks like this: 28 | 29 | ```{block, type="argument"} 30 | The principle has worked well when we've used it in the past.\ 31 | Therefore it will work well in future instances. 32 | ``` 33 | 34 | This is an inductive argument, an argument from observed instances to ones as yet unobserved. So, under the hood, it appeals to the Principle of Induction. But that's exactly the conclusion we're trying to establish. And one can't use a principle to justify itself. 35 | 36 | What about our second option: could a deductive argument establish the Principle of Induction? Well, by definition, a deductive argument establishes its conclusion with necessity. Is it necessary that the unobserved will be like the observed? It doesn't look like it. It seems perfectly possible that tomorrow the world will go haywire, randomly switching from pattern to pattern, or even to no pattern at all. 37 | 38 | Maybe tomorrow the sun will fail to rise.
Maybe gravity will push apart instead of pull together, and all the other laws of physics will reverse too. And just as soon as we get used to those patterns and start expecting them to continue, another pattern will arise. And then another. And then, just as we give up and come to have no expectation at all about what will come next, everything will return to normal. Until we get comfortable and everything changes again. 39 | 40 | Thankfully, our universe hasn't been so mischievous. We get surprised now and again, but for the most part inductive reasoning is pretty reliable, when we do it carefully. The point is that we're lucky in this respect. 41 | 42 | Nature *could* have been mischievous, totally unpredictable. It is not a necessary truth that the unobserved must resemble the observed. And so it seems there cannot be a deductive argument for the Principle of Induction, because such an argument would establish the principle as a necessary truth. 43 | 44 | 45 | ## The Problem of Induction vs. the Grue Paradox {-} 46 | 47 | If you read [Appendix](#grue) \@ref(grue), you know of another famous problem with the Principle of Induction: the grue paradox. (If you haven't read that chapter, you might want to skip this section.) 48 | 49 | The two problems are quite different, but it's easy to get them confused. The problem we're discussing here is about justifying the Principle of Induction. Is there any reason to believe it's true? Whereas the grue paradox points out that we don't even really know what the principle says, in a way. It says that what we've observed is a good indicator of what we haven't yet observed. But in what respects? Will unobserved emeralds be green, or will they be grue? 50 | 51 | So the challenge posed by grue is to spell out, precisely, what the Principle of Induction says. But even if we can meet that challenge, the challenge of justification will remain. Why should we believe the principle, once it's been spelled out? Neither a deductive argument nor an inductive argument seems possible. 52 | 53 | 54 | ## Probability Theory to the Rescue? {-} 55 | 56 | The Problem of Induction is centuries old. Isn't it out of date? Hasn't the modern, mathematical theory of probability solved the problem for us? 57 | 58 | Not at all, unfortunately. One thing we learn in this book is that the laws of probability are very weak in a way. They don't tell us much, without us first telling them what the prior probabilities are. And as we've seen over and again throughout Part III, [the problem of priors](#priors) is very much unsolved. 59 | 60 | For example, suppose we're going to flip a mystery coin three times. We don't know whether the coin is fair or biased, but we hope to have some idea after a few flips. 61 | 62 | Now suppose we've done the first two flips, both heads. The Principle of Induction says we should expect the next flip to be heads too. At least, that outcome should now be more probable. 63 | 64 | Do the laws of probability agree? Well, we need to calculate the quantity: 65 | $$ \p(H_3 \given H_1 \wedge H_2).$$ 66 | The definition of conditional probability tells us: 67 | $$ 68 | \begin{aligned} 69 | \p(H_3 \given H_2 \wedge H_1) 70 | &= \frac{\p(H_3 \wedge H_2 \wedge H_1)}{\p(H_2 \wedge H_1)}. 71 | \end{aligned} 72 | $$ 73 | But the laws of probability don't tell us what numbers go in the numerator and the denominator. 74 | 75 | The numbers have to be between $0$ and $1$. And we have to be sure mutually exclusive propositions have probabilities that add up, according to the Additivity rule. But that still leaves things wide open.
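
To see just how wide open, it can help to let a computer enumerate the possibilities. The following short R sketch is ours, added for illustration (the names `seqs`, `cond_prob_h3`, and the rest are invented for it): it computes $\p(H_3 \given H_1 \wedge H_2)$ under the two prior assignments we're about to examine by hand.

```{r inductionpriors, echo=TRUE}
# All 2^3 = 8 possible sequences of three flips
seqs <- expand.grid(f1 = c("H", "T"), f2 = c("H", "T"), f3 = c("H", "T"),
                    stringsAsFactors = FALSE)

# Pr(H3 | H1 & H2) = Pr(H1 & H2 & H3) / Pr(H1 & H2), given a prior over seqs
cond_prob_h3 <- function(prior) {
  h12  <- seqs$f1 == "H" & seqs$f2 == "H"
  h123 <- h12 & seqs$f3 == "H"
  sum(prior[h123]) / sum(prior[h12])
}

# Assignment 1: all eight sequences equally likely
prior_seqs <- rep(1/8, 8)

# Assignment 2: all four frequencies of heads equally likely, with each
# frequency's 1/4 share split evenly among the sequences that have it
n_heads    <- rowSums(seqs == "H")
group_size <- sapply(n_heads, function(k) sum(n_heads == k))
prior_freq <- (1/4) / group_size

cond_prob_h3(prior_seqs)  # 1/2: the first two heads tell us nothing
cond_prob_h3(prior_freq)  # 3/4: the first two heads raise our expectation
```

Under the first assignment the answer stays at $1/2$; under the second it rises to $3/4$. Let's work out why by hand.
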
76 | 77 | For example, we could assume that all possible sequences of heads and tails are equally likely. In other words: 78 | $$ \p(HHH) = \p(THH) = \p(HTH) = \ldots = \p(TTT) = 1/8. $$ 79 | Given that assumption, we get the result that $\p(H_3 \given H_2 \wedge H_1) = 1/2$. 80 | $$ 81 | \begin{aligned} 82 | \p(H_3 \given H_2 \wedge H_1) 83 | &= \frac{\p(H_3 \wedge H_2 \wedge H_1)}{\p(H_2 \wedge H_1)}\\ 84 | &= \frac{1/8}{1/8 + 1/8}\\ 85 | &= 1/2. 86 | \end{aligned} 87 | $$ 88 | But that means the first two flips didn't tell us anything about the third! Even after we got two heads, the chance of another heads is still stuck at $1/2$, same as it was to start with. 89 | 90 | `r newthought("We didn't")` *have* to assume all possible sequences are equally likely. We can make a different assumption, and get a different result. 91 | 92 | Let's try assuming instead that all possible *frequencies* of heads are equally probable. In other words, the probability of getting $0$ heads is the same as the probability of getting $1$ head, which is also the same as the probability of getting $2$ heads, and likewise for $3$ heads. So we're grouping the possible sequences like so: 93 | 94 | - $0$ heads: $TTT$ 95 | - $1$ head: $HTT$, $THT$, $TTH$ 96 | - $2$ heads: $HHT$, $HTH$, $THH$ 97 | - $3$ heads: $HHH$ 98 | 99 | Each grouping has the same probability, $1/4$. And for the groups in the middle, which have multiple members, we divide that $1/4$ evenly among the members. So $\p(HTH) = 1/12$, for example, but $\p(TTT) = 1/4$. 100 | 101 | This might seem a funny way of assigning prior probabilities. But it actually leads to very sensible results; the same results as the [Rule of Succession](#succession), in fact! For example, we get $\p(H_3 \given H_2 \wedge H_1) = 3/4$: 102 | $$ 103 | \begin{aligned} 104 | \p(H_3 \given H_1 \wedge H_2) 105 | &= \frac{\p(H_3 \wedge H_2 \wedge H_1)}{\p(H_2 \wedge H_1)}\\ 106 | &= \frac{1/4}{1/4 + 1/12}\\ 107 | &= 3/4. 108 | \end{aligned} 109 | $$ 110 | So, on this analysis, the first two tosses do tell us what to expect on the next toss. 111 | 112 | `r newthought("We've")` seen two different ways of assigning prior probabilities, which lead to very different results. The first way, where we assume all possible sequences are equally likely, disagrees with the Principle of Induction. Our observations of the first two flips tell us nothing about the next one. But the second way, where we assume all possible frequencies are equally likely, agrees with the Principle of Induction. Observing heads on the first two does tell us to expect another heads on the next one. 113 | 114 | Both assumptions are consistent with the laws of probability. So those laws don't, by themselves, tell us what to expect. The laws of probability only tell us what to expect once we've specified the prior probabilities. The problem of induction challenges us to justify one choice of prior probabilities over the alternatives. 115 | 116 | In the $280$ years since this challenge was first raised by David Hume, no answer has gained general acceptance. 117 | 118 | 119 | ## Exercises {-} 120 | 121 | 1. Suppose we do $100$ draws, with replacement, from an urn containing an unknown mixture of black and white balls. All $100$ draws come out black. Which of the following is correct? 122 | 123 | a. 
According to the laws of probability, the next draw is more likely to be black than white. 124 | #. According to the laws of probability, the next draw is more likely to be white than black. 125 | #. According to the laws of probability, the next draw is equally likely to be black vs. white. 126 | #. The laws of probability are consistent with any of the above conclusions; it depends on the prior probabilities. 127 | #. None of the above. 128 | 129 | #. Write a short essay ($3$--$4$ paragraphs) explaining the problem of induction. Your essay should include all of the following: 130 | 131 | - a clear, accurate explanation of the Principle of Induction, 132 | - a clear, accurate explanation of the dilemma we face in justifying the Principle of Induction, 133 | - a clear, accurate explanation of the challenge for the deductive horn of this dilemma, and 134 | - a clear, accurate explanation of the challenge for the inductive horn of the dilemma. 135 | 136 | #. A coin will be tossed $3$ times, so the number of heads could be $0$, $1$, $2$, or $3$. Suppose all $4$ of these possibilities are equally likely. Moreover, any two sequences with the same number of heads are also equally likely. For example, $\p(H_1 \wedge H_2 \wedge T_3) = \p(T_1 \wedge H_2 \wedge H_3)$. Answer each of the following. 137 | 138 | a. What is $\p(H_2 \given H_1)$? 139 | #. What is $\p(H_3 \given H_1 \wedge H_2)$? 140 | 141 | Now suppose we do $4$ tosses instead of $3$. The prior probabilities follow the same rules: all possible numbers of heads are equally likely, and any two sequences with the same number of heads are equally likely. 142 | 143 | c. If the first $n$ tosses come up heads, what is the probability the next toss will come up heads? In other words, give a formula for $\p(H_{n+1} \given H_1 \wedge \ldots \wedge H_n)$ in terms of $n$. 144 | #. If only $k$ out of the first $n$ tosses land heads, what formula gives the probability of heads on the next toss? 145 | 146 | #. Suppose a computer program prints out a stream of A's and B's. After observing the sequence A, A, B, we want to know the probability of an A next. 147 | 148 | Our friend Charlie suggests we reason as follows. We are going to observe $4$ characters total. Before we observed any characters, there were $5$ possibilities: the total number of A's could turn out to be $0$, $1$, $2$, $3$, or $4$. And all of these possibilities are equally likely. So each has prior probability $1/5$. 149 | 150 | Some of these possibilities can be subdivided. For example, there are $4$ ways to get $3$ A's: 151 | 152 | A, A, A, B\ 153 | A, A, B, A\ 154 | A, B, A, A\ 155 | B, A, A, A 156 | 157 | So each of these sequences gets $1/4$ of $1/5$, in other words prior probability $1/20$. 158 | 159 | According to Charlie's way of reasoning, what is the probability the $4$th character will be an A, given that the first $3$ were A, A, B? 160 | 161 | #. In this chapter we considered a coin to be flipped $3$ times, with the same prior probability for every possible sequence of heads/tails. Now suppose the coin will be flipped some very large number of times, $n$. And again suppose the prior probability is the same for every possible sequence of heads/tails. Prove that no matter how many times the coin lands heads, the probability of heads on the next toss is still $1/2$. In other words, prove that $\p(H_{k+1} \given H_1 \wedge \ldots \wedge H_{k}) = 1/2$ no matter how large $k$ gets. 162 | 163 | #. Suppose a coin will be flipped $3$ times.
There are $2^3 = 8$ possible sequences of heads/tails that we might get. Find a way of assigning prior probabilities to these $8$ sequences so that, the more heads we observe, the *less* likely it becomes we'll get heads on the next toss. In other words, assign a prior probability to each sequence so that $\p(H_2 \given H_1) < \p(H_2)$ and $\p(H_3 \given H_2 \wedge H_1) < \p(H_3 \given H_1)$. 164 | 165 | #. Suppose a coin will be flipped $4$ times. Recall the two different ways of assigning a probability to each possible sequence of heads and tails discussed in the chapter: 166 | 167 | - Scheme 1: all possible sequences have the same probability. 168 | - Scheme 2: all possible *frequencies* (numbers of heads) have the same probability, and all sequences that share the same frequency have the same probability. 169 | 170 | According to each scheme, how probable is each of the following propositions? 171 | 172 | - $A =$ All tosses will land the same way.\ 173 | - $B =$ The number of heads and tails will be the same. 174 | 175 | Now answer the same question with $10$ tosses instead of $4$. -------------------------------------------------------------------------------- /E-selected-solutions.Rmd: -------------------------------------------------------------------------------- 1 | # Solutions to Selected Exercises 2 | 3 | ```{block, type="epigraph"} 4 | To understand God's thoughts we must study statistics, for these are the measure of His purpose.\ 5 | ---attributed to Florence Nightingale 6 | ``` 7 | 8 | 9 | ## Chapter 1 {-} 10 | 11 | \noindent 12 | *Exercise 3.* Prisoner A is incorrect. His chances of survival are still only $1/3$. To see why, we can use the same tree as we did for the Monty Hall problem, and just change the labels on the leaves from "Monty Opens X" to "Guard Names X." 13 | 14 | \vspace{.5em}\vspace{.5em}\noindent 15 | *Exercise 5.* Option (a): $5/8$. 16 | 17 | 18 | ## Chapter 2 {-} 19 | 20 | \noindent 21 | *Exercise 1.* (a) Invalid, (b) Valid, (c) Valid, (d) Valid, (e) Invalid, (f) Valid. 22 | 23 | \vspace{.5em}\noindent 24 | *Exercise 2.* (a) Compatible, (b) Compatible, (c) Mutually Exclusive. 25 | 26 | \vspace{.5em}\noindent 27 | *Exercise 3.* False. 28 | 29 | \vspace{.5em}\noindent 30 | *Exercise 5.* Yes, it is possible. 31 | 32 | 33 | ## Chapter 3 {-} 34 | 35 | \noindent 36 | *Exercise 1.* (a) $\neg A$, (b) $A \wedge B$, (c) $A \wedge \neg B$, (d) $\neg A \wedge \neg B$. 37 | 38 | \vspace{.5em}\noindent 39 | *Exercise 2a.* These are mutually exclusive: there's only one row where $A \wedge B$ is true, and $A \wedge \neg B$ is false in that row. 40 | 41 | \vspace{.5em}\noindent 42 | *Exercise 3b.* These are logically equivalent: their final columns are identical, T T F F. 43 | 44 | \vspace{.5em}\noindent 45 | *Exercise 4.* The column for $B \wedge C$ is T F F F T F F F. The column for $A \vee (B \wedge C)$ is T T T T T F F F. 46 | 47 | 48 | ## Chapter 4 {-} 49 | 50 | \noindent 51 | *Exercise 1.* (a) Independent, (b) Not Independent, (c) Independent, (d) Not Independent. 52 | 53 | \vspace{.5em}\noindent 54 | *Exercise 2.* Option (d): All of the above. 55 | 56 | \vspace{.5em}\noindent 57 | *Exercise 3.* Option (e): None of the above. 58 | 59 | \vspace{.5em}\noindent 60 | *Exercise 4.* (a) Not a gambler's fallacy, (b) Gambler's fallacy. 61 | 62 | 63 | ## Chapter 5 {-} 64 | 65 | \noindent 66 | *Exercise 1.* This is a contradiction so its probability is $0$. 67 | 68 | \vspace{.5em}\noindent 69 | *Exercise 4.* (a) $8/15$, (b) $8/15$, (c) No. 
70 | 71 | \vspace{.5em}\noindent 72 | *Exercise 5.* (a) $0$, (b) $1/6$, (c) No. 73 | 74 | \vspace{.5em}\noindent 75 | *Exercise 7.* (a) Yes, (b) No, (c) No. 76 | 77 | \vspace{.5em}\noindent 78 | *Exercise 8.* $1/6$. 79 | 80 | 81 | ## Chapter 6 {-} 82 | 83 | \noindent 84 | *Exercise 2.* (a) $1/2$, (b) $2/7$. 85 | 86 | \vspace{.5em}\noindent 87 | *Exercise 5.* (a) $9/70$, (b) $9/140$, (c) $1/30$, (d) $1/60$, (e) $17/210$, (f) $27/34$. 88 | 89 | \vspace{.5em}\noindent 90 | *Exercise 6.* $1/9$. 91 | 92 | 93 | ## Chapter 7 {-} 94 | 95 | \noindent 96 | *Exercise 1.* (a) $3/13$, (b) $10/13$, (c) $4/13$, (d) $4/13$, (e) $10/13$ 97 | 98 | \vspace{.5em}\noindent 99 | *Exercise 4.* $17/32$ 100 | 101 | \vspace{.5em}\noindent 102 | *Exercise 6.* Yes: 103 | $$ 104 | \begin{aligned} 105 | \p(A) 106 | & = \p((A \wedge B) \vee (A \wedge C) \vee (A \wedge D)) & \text{by Equivalence}\\ 107 | & = \p(A \wedge B) + \p(A \wedge C) + \p(A \wedge D) & \text{by Addition}\\ 108 | & = \p(A \given B)\p(B) + \p(A \given C)\p(C) + \p(A \given D)\p(D) & \text{by General Multiplication}. 109 | \end{aligned} 110 | $$ 111 | 112 | \vspace{.5em}\noindent 113 | *Exercise 8.* (a) $1/13$, (b) $4/51$, (c) $4/663$, (d) $4/663$, (e) $8/663$, (f) $1/221$ 114 | 115 | \vspace{.5em}\noindent 116 | *Exercise 25.* First observe that $A \wedge (A \vee B)$ is equivalent to $A$. This can be verified by a truth table or Euler diagram. We then reason as follows: 117 | $$ 118 | \begin{aligned} 119 | \p(A \given A \vee B) 120 | & = \frac{\p(A \wedge (A \vee B))}{\p(A \vee B)} & \text{by definition}\\ 121 | & = \frac{\p(A)}{\p(A \vee B)} & \text{by Equivalence}\\ 122 | & = \frac{\p(A)}{\p(A) + \p(B)} & \text{by Addition.} 123 | \end{aligned} 124 | $$ 125 | 126 | 127 | ## Chapter 8 {-} 128 | 129 | \noindent 130 | *Exercise 1.* $1/4$ 131 | 132 | \vspace{.5em}\noindent 133 | *Exercise 3.* $2/3$ 134 | 135 | \vspace{.5em}\noindent 136 | *Exercise 5.* $9/11$ 137 | 138 | \vspace{.5em}\noindent 139 | *Exercise 7.* The full formula is: 140 | $$ 141 | \p(X \given B) 142 | = \frac{\p(X)\p(B \given X)}{\p(X)\p(B \given X) + \p(Y)\p(B \given Y) + \p(Z)\p(B \given Z)}. 143 | $$ 144 | To derive this formula, start with the short form of Bayes' theorem for $\p(X \given B)$. Then apply the version of LTP from Exercise 7.6 to the denominator, $\p(B)$. 145 | 146 | \vspace{.5em}\noindent 147 | *Exercise 8.* $1/57$ 148 | 149 | \vspace{.5em}\noindent 150 | *Exercise 11.* $1/4$ 151 | 152 | 153 | ## Chapter 9 {-} 154 | 155 | \noindent 156 | *Exercise 1.* (a) $1/6$, (b) $1/2$, (c) $1/2$. 157 | 158 | \vspace{.5em}\noindent 159 | *Exercise 2.* $81/85$. 160 | 161 | \vspace{.5em}\noindent 162 | *Exercise 5.* 163 | $$ 164 | \begin{aligned} 165 | \p(A \given B \wedge C) 166 | &= \frac{\p(A \wedge (B \wedge C))}{\p(B \wedge C)} & \text{ by definition}\\ 167 | &= \frac{\p(A \wedge (C \wedge B))}{\p(C \wedge B)} & \text{ by Equivalence}\\ 168 | &= \p(A \given C \wedge B) & \text{ by definition.} 169 | \end{aligned} 170 | $$ 171 | 172 | \vspace{.5em}\noindent 173 | *Exercise 7.* First note that $\p(C) = \p((A \wedge C) \vee (\neg A \wedge C))$ by Equivalence, and thus by Addition we have $\p(\neg A \wedge C) = \p(C) - \p(A \wedge C)$.
We then reason as follows: 174 | $$ 175 | \begin{aligned} 176 | \p(\neg A \given C) 177 | &= \frac{\p(\neg A \wedge C)}{\p(C)} & \text{ by definition}\\ 178 | &= \frac{\p(C) - \p(A \wedge C)}{\p(C)} & \text{ by above}\\ 179 | &= \frac{\p(C)}{\p(C)} - \frac{\p(A \wedge C)}{\p(C)} & \text{ by algebra}\\ 180 | &= 1 - \frac{\p(A \wedge C)}{\p(C)} & \text{ by algebra}\\ 181 | &= 1 - \p(A \given C) & \text{ by definition.} 182 | \end{aligned} 183 | $$ 184 | 185 | 186 | ## Chapter 11 {-} 187 | 188 | \noindent 189 | *Exercise 2.* $-\$80$. 190 | 191 | \vspace{.5em}\noindent 192 | *Exercise 4.* $-\$0.79$. 193 | 194 | \vspace{.5em}\noindent 195 | *Exercise 9.* (a) $\$460$ million, (b) $\$580$ million, (c) $\$220$ million, (d) no, they won't conduct the study because it would be a waste of $\$5,000$. (The EMV of enacting the tax will be positive regardless of the study's findings. So doing the study won't help them make their decision.) 196 | 197 | \vspace{.5em}\noindent 198 | *Exercise 11.* (a) $-\$60$, (b) $-\$52$. 199 | 200 | \vspace{.5em}\noindent 201 | *Exercise 16.* $x = 888$. 202 | 203 | \vspace{.5em}\noindent 204 | *Exercise 22.* Suppose that $E(A) = \$x$. Let the possible payoffs of $A$ be $x_1, \ldots, x_n$. Then: 205 | $$ 206 | \begin{aligned} 207 | E(\text{Pay $\$x$ for $A$}) 208 | &= \p(\$x_1) \cdot (\$x_1 - \$x) + \ldots + \p(\$x_n) \cdot (\$x_n - \$x)\\ 209 | &= \left[\p(\$x_1) \cdot \$ x_1 - \p(\$x_1) \cdot \$x \right] + \ldots + \left[\p(\$x_n) \cdot \$x_n - \p(\$x_n) \cdot \$x \right]\\ 210 | &= E(A) - \left[\p(\$x_1) \cdot \$ x + \ldots + \p(\$x_n) \cdot \$x \right]\\ 211 | &= E(A) - \$x \left[\p(\$x_1) + \ldots + \p(\$x_n) \right]\\ 212 | &= E(A) - \$x \\ 213 | &= 0. 214 | \end{aligned} 215 | $$ 216 | 217 | 218 | ## Chapter 12 {-} 219 | 220 | \noindent 221 | *Exercise 2.* (d) 222 | 223 | \vspace{.5em}\noindent 224 | *Exercise 3.* $3/5$ 225 | 226 | \vspace{.5em}\noindent 227 | *Exercise 5.* (a) $3/5$, (b) $2/3$. 228 | 229 | \vspace{.5em}\noindent 230 | *Exercise 8.* (a) $103/2$, (b) $490/9$, (c) $45/49$. 231 | 232 | \vspace{.5em}\noindent 233 | *Exercise 10.* Suppose action $A$ has only two possible consequences, $C_1$ and $C_2$, such that $\p(C_1) = \p(C_2)$ and $U(C_1) = -U(C_2)$. Since $\p(C_1) = \p(C_2) = 1/2$, we have: 234 | $$ 235 | \begin{aligned} 236 | EU(A) 237 | &= \p(C_1) U(C_1) + \p(C_2) U(C_2) \\ 238 | &= \frac{1}{2} U(C_1) + \frac{1}{2} U(C_2) \\ 239 | &= -\frac{1}{2} U(C_2) + \frac{1}{2} U(C_2) \\ 240 | &= 0. 241 | \end{aligned} 242 | $$ 243 | 244 | 245 | ## Chapter 13 {-} 246 | 247 | \noindent 248 | *Exercise 1.* (d) 249 | 250 | \vspace{.5em}\noindent 251 | *Exercise 2.* (a) 252 | 253 | \vspace{.5em}\noindent 254 | *Exercise 3.* (b) 255 | 256 | \vspace{.5em}\noindent 257 | *Exercise 5.* (c) 258 | 259 | \vspace{.5em}\noindent 260 | *Exercise 6.* (c) 261 | 262 | \vspace{.5em}\noindent 263 | *Exercise 8.* (d) 264 | 265 | 266 | ## Chapter 14 {-} 267 | 268 | \noindent 269 | *Exercise 1.* (d) 270 | 271 | \vspace{.5em}\noindent 272 | *Exercise 2.* (b) 273 | 274 | \vspace{.5em}\noindent 275 | *Exercise 5.* $f(y) = e^y$. If we plug $e^y$ into $\log(x)$ we get $y$ back: $\log(e^y) = y$. 276 | 277 | \vspace{.5em}\noindent 278 | *Exercise 8.* (b) 279 | 280 | \vspace{.5em}\noindent 281 | *Exercise 9.* (c) 282 | 283 | 284 | ## Chapter 16 {-} 285 | 286 | \noindent 287 | *Exercise 1.* a) $7/10$, b) $4/5$, c) $2/3$. 
288 | 289 | \vspace{.5em}\noindent 290 | *Exercise 3.* (a) $5/6$, (b) $1/3$, (c) $2/5$ 291 | 292 | \vspace{.5em}\noindent 293 | *Exercise 4.* (a) $1/2$, (b) $1/3$, (c) $2/3$, (d) $2/9$ 294 | 295 | 296 | ## Chapter 17 {-} 297 | 298 | \noindent 299 | *Exercise 1.* We make the following deals with Ronnie. 300 | 301 | - He pays us $\$.40$; we pay him $\$1$ if $A$ is true. 302 | - He pays us $\$.70$; we pay him $\$1$ if $A$ is false. 303 | 304 | Each of these deals is fair according to Ronnie because the betting rates match his personal probabilities. For example, the expected value of the first bet is $0$: 305 | $$ (\$1 - \$.40) \p(A) - \$.40 (1 - \p(A)) = (\$.60)(4/10) - \$.40(6/10) = 0. $$ 306 | But Ronnie must pay us $\$1.10$ for these bets, and he will only get $\$1$ in return. Whether $A$ is true or false, only one of the bets will pay off for Ronnie. So his net "gain" will be $\$1 - \$1.10 = -\$.10$. 307 | 308 | \vspace{.5em}\noindent 309 | *Exercise 2.* We make the following deals with Marco. 310 | 311 | - We pay him $\$0.30$; he pays us $\$1$ if $X$ is true. 312 | - We pay him $\$0.20$; he pays us $\$1$ if $Y$ is true. 313 | - He pays us $\$0.60$; we pay him $\$1$ if $X \vee Y$ is true. 314 | 315 | The explanation is similar to 17.1: each bet is fair according to Marco's personal probabilities, as can be checked by doing the expected value calculations, which come out to $0$ for each one. But he pays us $\$.10$ more up front than we pay him, and he can't possibly win it back. If he wins the third bet, we win one or both of the first two bets. 316 | 317 | \vspace{.5em}\noindent 318 | *Exercise 8.* We make the following deals with Piz. 319 | 320 | - We pay him $\$1/4$; he pays us $\$1$ if Pia is a basketball player. 321 | - He pays us $\$1/3$; we pay him $\$1$ if Pia is a good basketball player. 322 | 323 | Again the deals are fair given his personal probabilities. And again he pays us more up front than we pay him, money which he can't win back. If he wins the second bet, we win the first. 324 | 325 | ## Chapter 18 {-} 326 | 327 | \noindent 328 | *Exercise 1.* (a) $7/10$, (b) $3/10$, (c) same answers as before, (d) different answers: $217/300$ and $11/36$. 329 | 330 | \vspace{.5em}\noindent 331 | *Exercise 2.* (a) $1/2$, (b) $5/8$, (c) yes, different answers here: $2/3$ and $21/32$, (d) different again: $49/62$ and $713/992$. 332 | 333 | \vspace{.5em}\noindent 334 | *Exercise 6.* (a) $2/3$, (b) $2/1$, (c) $1/2$, (d) different: $1/3$ now. 335 | 336 | 337 | ## Chapter 19 {-} 338 | 339 | \noindent 340 | *Exercise 1.* 341 | (a) $\mu = 20, \sigma = 4$, 342 | (b) a bell curve centered at $20$ and reaching $y \approx 0$ around $x \approx 8$, 343 | (c) $a = 16, b = 24$, 344 | (d) $a = 12, b = 28$, 345 | (e) $a = 8, b = 32$, 346 | (f) $7$ or fewer, 347 | (g) $33$ or more, 348 | (h) answers will vary. 349 | 350 | \vspace{.5em}\noindent 351 | *Exercise 3.* 352 | (a) $\mu = 144, \sigma = 6$, 353 | (b) $132, 156$, 354 | (c) $126, 162$, 355 | (d) not significant at either level, 356 | (e) false (more accurately, not enough information is given to draw a conclusion either way). 357 | 358 | \vspace{.5em}\noindent 359 | *Exercise 8.* 360 | (a) $\mu = 10, \sigma = 3$, 361 | (b) no, 362 | (c) false, 363 | (d) false. 364 | 365 | \vspace{.5em}\noindent 366 | *Exercise 9.* 367 | (a) $\mu = 80, \sigma = 4$, 368 | (b) yes, 369 | (c) if the null hypothesis is true, and we repeated the experiment over and over, we would get a result this far from the mean less than $1\%$ of the time.
370 | 371 | \vspace{.5em}\noindent 372 | *Exercise 11.* If the null hypothesis is true, the probability of getting a result this "extreme" (i.e. as improbable as this one, or even less probable) is below $.05$. 373 | 374 | 375 | ## Chapter 20 {-} 376 | 377 | \noindent 378 | *Exercise 1.* (a) and (b) 379 | 380 | \vspace{.5em}\noindent 381 | *Exercise 2.* (a) $25$, (b) $400$, (c) $16/17$ or about $94\%$. 382 | 383 | \vspace{.5em}\noindent 384 | *Exercise 3.* (a) $225$, (b) $675/676$ or about $99.9\%$. 385 | 386 | \vspace{.5em}\noindent 387 | *Exercise 5.* (a) $\mu = 5, \sigma = 2$, (b) yes, (c) $\mu = 5/2, \sigma = 3/2$, (d) yes. -------------------------------------------------------------------------------- /README.md: 1 | # Source Code for the Book, *Odds & Ends* 2 | 3 | This repo contains the source code for a textbook, *Odds & Ends: Introducing Probability & Decision with a Visual Emphasis*. The book is for introductory philosophy courses on probability and inductive logic, and is based on a typical such course I teach at the University of Toronto. 4 | 5 | You can read the book in [HTML here](http://jonathanweisberg.org/vip/) or [PDF here](http://jonathanweisberg.org/vip/_main.pdf). -------------------------------------------------------------------------------- /_bookdown.yml: 1 | output_dir: vip/docs 2 | delete_merged_file: true 3 | -------------------------------------------------------------------------------- /_output.yml: 1 | bookdown::tufte_html_book: 2 | toc: yes 3 | css: 4 | - toc.css 5 | - custom.css 6 | split_by: chapter 7 | includes: 8 | in_header: header.html 9 | before_body: preamble.html 10 | bookdown::pdf_book: 11 | base_format: tufte::tufte_book 12 | toc_depth: 1 13 | fig_width: 7 14 | fig_height: 5 15 | number_sections: yes 16 | keep_tex: yes 17 | includes: 18 | in_header: preamble.tex 19 | # latex_engine: xelatex 20 | # compile from the command line with: Rscript -e "bookdown::render_book('index.Rmd', 'bookdown::pdf_book')" 21 | -------------------------------------------------------------------------------- /custom.css: 1 | body { 2 | background-color: #fefefe; 3 | /* font-family: Palatino, "Palatino Linotype", "Palatino LT STD", "Book Antiqua", Georgia, serif; */ 4 | } 5 | 6 | /*.numeral, .sidenote-number { 7 | font-family: Palatino, "Palatino Linotype", "Palatino LT STD", "Book Antiqua", Georgia, serif; 8 | }*/ 9 | 10 | dt { 11 | text-indent: 1rem; 12 | font-size: 1.4rem; 13 | font-weight: bold !important; 14 | } 15 | 16 | h1 > .header-section-number { 17 | padding-right: 1em; 18 | } 19 | 20 | div.epigraph { 21 | font-style: italic; 22 | text-align: right; 23 | width: 75%; 24 | margin-left: auto; 25 | margin-right: auto; 26 | } 27 | 28 | div.example { 29 | border-left: 3px double teal; 30 | padding-left: 1rem; 31 | } 32 | 33 | div.problem { 34 | border-left: 3px double maroon; 35 | padding-left: 1rem; 36 | } 37 | 38 | div.puzzle { 39 | border-left: 3px double teal; 40 | padding-left: 1rem; 41 | } 42 | 43 | div.info { 44 | border-left: 3px double teal; 45 | padding-left: 1rem; 46 | } 47 | 48 | div.warning { 49 | border-left: 3px double red; 50 | padding-left: 1rem; 51 | } 52 | 53 | div.argument { 54 | /*border-left: 3px double silver;*/ 55 | 
padding-left: 2rem; 56 | } 57 | 58 | button { 59 | background: none!important; 60 | color: inherit; 61 | border: none; 62 | padding: 1!important; 63 | font: inherit; 64 | cursor: pointer; 65 | } 66 | 67 | a:nth-last-child(even) > .btn.btn-default:before{ 68 | content: "☜ "; 69 | } 70 | 71 | a:nth-last-child(odd) > .btn.btn-default:after{ 72 | content: " ☞"; 73 | line-height: 60%; 74 | } 75 | 76 | .title { 77 | width: 55%; 78 | } 79 | 80 | ol, ul, hr { 81 | width: 50%; 82 | } 83 | 84 | ol ol, ol ul, ul ol, ul ul, ol table, ul table, ol hr, ul hr { 85 | width: 91%; 86 | } 87 | 88 | @media screen and (max-width: 760px) { 89 | ol, ul { width: 90%; } 90 | ul { width: 85%; } 91 | } 92 | 93 | div.proof:after { 94 | text-align: right; 95 | display: inline-block; 96 | width: 55%; 97 | font-size: 1.4rem; 98 | content: "∎"; 99 | } -------------------------------------------------------------------------------- /header.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 23 | 24 | -------------------------------------------------------------------------------- /img/allais.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/allais.png -------------------------------------------------------------------------------- /img/bertrand.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/bertrand.png -------------------------------------------------------------------------------- /img/bertrand_screengrab.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/bertrand_screengrab.png -------------------------------------------------------------------------------- /img/bertrand_screenshot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/bertrand_screenshot.png -------------------------------------------------------------------------------- /img/daniel_bernoulli.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/daniel_bernoulli.png -------------------------------------------------------------------------------- /img/die/die1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/die/die1.png -------------------------------------------------------------------------------- /img/die/die2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/die/die2.png -------------------------------------------------------------------------------- /img/die/die3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/die/die3.png -------------------------------------------------------------------------------- /img/die/die4.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/die/die4.png -------------------------------------------------------------------------------- /img/die/die5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/die/die5.png -------------------------------------------------------------------------------- /img/die/die6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/die/die6.png -------------------------------------------------------------------------------- /img/door_closed.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/door_closed.png -------------------------------------------------------------------------------- /img/door_open.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/door_open.png -------------------------------------------------------------------------------- /img/ellsberg.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/ellsberg.png -------------------------------------------------------------------------------- /img/emoji_hearts.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/emoji_hearts.png -------------------------------------------------------------------------------- /img/emoji_hearts_small.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/emoji_hearts_small.png -------------------------------------------------------------------------------- /img/emoji_nerd.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/emoji_nerd.png -------------------------------------------------------------------------------- /img/emoji_nerd_small.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/emoji_nerd_small.png -------------------------------------------------------------------------------- /img/emoji_shades.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/emoji_shades.png -------------------------------------------------------------------------------- /img/emoji_shades_small.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/emoji_shades_small.png 
-------------------------------------------------------------------------------- /img/euler.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/euler.png -------------------------------------------------------------------------------- /img/fig.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/fig.png -------------------------------------------------------------------------------- /img/fisher.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/fisher.png -------------------------------------------------------------------------------- /img/flanders.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/flanders.png -------------------------------------------------------------------------------- /img/goodman.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/goodman.png -------------------------------------------------------------------------------- /img/hume.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/hume.png -------------------------------------------------------------------------------- /img/jeffreys.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/jeffreys.png -------------------------------------------------------------------------------- /img/laplace.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/laplace.png -------------------------------------------------------------------------------- /img/lets_make_a_deal.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/lets_make_a_deal.png -------------------------------------------------------------------------------- /img/marg_fig.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/marg_fig.png -------------------------------------------------------------------------------- /img/marilyn_vos_savant.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/marilyn_vos_savant.png -------------------------------------------------------------------------------- /img/moon.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/moon.gif 
-------------------------------------------------------------------------------- /img/moon.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/moon.png -------------------------------------------------------------------------------- /img/neon_bayes.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/neon_bayes.png -------------------------------------------------------------------------------- /img/pascal.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/pascal.png -------------------------------------------------------------------------------- /img/pill_green.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/pill_green.png -------------------------------------------------------------------------------- /img/pill_red.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/pill_red.png -------------------------------------------------------------------------------- /img/playing_cards.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/playing_cards.png -------------------------------------------------------------------------------- /img/ramsey.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/ramsey.png -------------------------------------------------------------------------------- /img/roulette_wheel.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/roulette_wheel.png -------------------------------------------------------------------------------- /img/social_image.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/social_image.png -------------------------------------------------------------------------------- /img/taxi_blue.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/taxi_blue.png -------------------------------------------------------------------------------- /img/taxi_green.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/taxi_green.png -------------------------------------------------------------------------------- /img/vacuum.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/vacuum.gif 
-------------------------------------------------------------------------------- /img/vacuum.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/vacuum.png -------------------------------------------------------------------------------- /img/wiphi_grue.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/wiphi_grue.png -------------------------------------------------------------------------------- /img/xfiles.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jweisber/vip-source/25024e8625b42e58096caf345941076aedccb7ab/img/xfiles.png -------------------------------------------------------------------------------- /index.Rmd: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Odds & Ends" 3 | subtitle: "Introducing Probability & Decision with a Visual Emphasis" 4 | author: "Jonathan Weisberg" 5 | description: "An open access textbook for introductory philosophy courses on probability and inductive logic." 6 | cover-image: "img/social_image.png" 7 | github-repo: jweisber/vip 8 | classoption: justified 9 | site: bookdown::bookdown_site 10 | --- 11 | 12 | # Preface {-} 13 | 14 | ```{r setup, echo=FALSE, include=FALSE, cache=FALSE} 15 | knitr::opts_chunk$set( 16 | cache = TRUE, 17 | dev.args = list(bg = "transparent") 18 | ) 19 | if (knitr:::is_latex_output()) { 20 | options(knitr.table.format = 'pandoc') 21 | knitr::opts_chunk$set(dpi = 300) 22 | } 23 | if (knitr:::is_html_output()) { 24 | knitr::include_graphics('img/social_image.png', dpi = NA) 25 | } 26 | 27 | library(tufte) 28 | library(dplyr) 29 | library(ggplot2) 30 | library(ggforce) 31 | library(png) 32 | library(grid) 33 | library(ptree) 34 | library(ggraph) 35 | library(igraph) 36 | 37 | theme_set(theme_minimal(base_size = 18)) 38 | # update_geom_defaults("text", list(size = 7)) 39 | # update_geom_defaults("label", list(size = 7)) 40 | 41 | bookred <- rgb(228, 6, 19, maxColorValue = 255) 42 | bookblue <- rgb(0, 92, 169, maxColorValue = 255) 43 | bookpurple <- rgb(114, 49, 94, maxColorValue = 255) 44 | bookgreen <- rgb(26, 179, 22, maxColorValue = 255) 45 | ``` 46 | 47 | `r newthought("This")` textbook is for introductory philosophy courses on probability and inductive logic. It is based on a typical such course I teach at the University of Toronto, where we offer "Probability & Inductive Logic" in the second year, alongside the usual deductive logic intro.$\,$ 48 | 49 | The book assumes no deductive logic. The early chapters introduce the little that's used. In fact almost no formal background is presumed, only very simple high school algebra. 50 | 51 | Several well known predecessors inspired and shaped this book. Brian Skyrms' *Choice & Chance* and Ian Hacking's *An Introduction to Probability and Inductive Logic* were especially influential. Both texts are widely used with good reason---they are excellent. I've taught both myself many times, with great success. But this book blends my favourite aspects of each, organizing them in the sequence and style I prefer. 52 | 53 | I hope this book also offers more universal benefits. 54 | 55 | 1. It is open access, hence free. 56 | 2. 
It's also [open source](https://github.com/jweisber/vip-source), so other instructors can modify it to their liking. 57 | ```{marginfigure} 58 | If you teach from this book I'd love to know: email or [tweet me](https://twitter.com/jweisber). 59 | ``` 60 | 3. It's available in both [PDF](http://jonathanweisberg.org/vip/_main.pdf) and [HTML](http://jonathanweisberg.org/vip/). So it can be read comfortably on a range of devices, or printed. 61 | 4. It emphasizes visual explanations and techniques, to make the material more approachable. 62 | 5. It livens up the text with hyperlinks, images, and margin notes that highlight points of history and curiosity. I also hope to add some animations and interactive tools soon. 63 | 64 | `r newthought("The")` book is divided into three main parts. The first explains the basics of logic and probability, the second covers basic decision theory, and the last explores the philosophical foundations of probability and statistics. This last, philosophical part focuses on the Bayesian and frequentist approaches. 65 | 66 | A "cheat sheet" summarizing key definitions and formulas appears in [Appendix][Cheat Sheet] [A][Cheat Sheet]. Further appendices cover the axiomatic construction of probability theory, Hume's problem of induction, and Goodman's new riddle of induction. 67 | 68 | `r newthought("I")` usually get a mix of students in my course, with different ideological inclinations and varying levels of background. For some the technical material is easy, even review. For others, a healthy skepticism about scientific methods and discourses comes naturally. My goal is to get these students all more or less on the same page. 69 | 70 | By the end of the course, students with little formal background have a bevy of tools for thinking about uncertainty. They can understand much more of the statistical and scientific discourse they encounter. And hopefully they have a greater appreciation for the value of formal methods. Students who already have strong formal tools and skills will, I hope, better understand their limitations. I want them to understand why these tools leave big questions open---not just philosophically, but also in very pressing, practical ways. 71 | 72 | `r newthought("The")` book was made with the `bookdown` package created by Yihui Xie. It's a wonderful tool, built on a bunch of other technologies I love, especially the R programming language and the pandoc conversion tool created by philosopher John MacFarlane. The book's visual style emulates the famous designs of Edward Tufte, thanks to more software created by Yihui Xie, J. J. Allaire, and many others who adapted Tufte's designs to HTML and PDF (via LaTeX). 73 | 74 | If it weren't for these tools, I never would have written this book. It wouldn't have been possible to create one that does all the things this book is meant to do. I also owe inspiration to Kieran Healy's book [*Data Visualization: A Practical Introduction*](http://socviz.co/), which uses the same suite of tools. It gave me the idea to use those tools for an updated, open, and visually enhanced rendition of the classic material from Skyrms and Hacking. 75 | 76 | Finally, I'm indebted to several teaching assistants and students who helped with earlier drafts.
Thanks especially to Liang Zhou Koh and Meagan Phillips, who contributed several exercises; to Soroush Marouzi and Daniel Munro, who worked out the kinks during the first semester the book was piloted; and to the students in that course, who bore with us, and contributed several corrections of their own. 77 | 78 | 79 | 81 | 86 | 89 | 94 | -------------------------------------------------------------------------------- /preamble.html: -------------------------------------------------------------------------------- 1 |
-------------------------------------------------------------------------------- /preamble.tex: 1 | %\usepackage{MinionPro} 2 | %\usepackage{fontspec} 3 | %\newfontfamily\DejaSans{DejaVu Sans} 4 | 5 | \newcommand{\given}{\mid} 6 | \renewcommand{\neg}{\mathbin{\sim}} 7 | \renewcommand{\wedge}{\mathbin{\&}} 8 | \renewcommand{\u}{U} 9 | \newcommand{\gt}{>} 10 | \newcommand{\p}{Pr} 11 | \newcommand{\E}{E} 12 | \newcommand{\EU}{EU} 13 | \newcommand{\pr}{Pr} 14 | \newcommand{\po}{Pr^*} 15 | \newcommand{\degr}{^{\circ}} 16 | \definecolor{bookred}{RGB}{228,6,19} 17 | \definecolor{bookblue}{RGB}{0,92,169} 18 | \definecolor{bookpurple}{RGB}{114,49,94} 19 | 20 | \newenvironment{epigraph}% 21 | { 22 | \begin{flushright} 23 | \begin{minipage}{20em} 24 | \begin{flushright} 25 | \itshape 26 | }% 27 | { 28 | \end{flushright} 29 | \end{minipage} 30 | \end{flushright} 31 | } 32 | \newenvironment{problem}{\begin{quote}\normalsize}{\end{quote}} 33 | \newenvironment{puzzle}{\begin{quote}\normalsize}{\end{quote}} 34 | \def\argument{\list{}{\leftmargin3em}\item[]} 35 | \let\endargument=\endlist 36 | \usepackage{fontawesome} 37 | \newenvironment{warning}{\begin{itemize}\item[\faBan]}{\end{itemize}} 38 | \usepackage{marvosym} 39 | \newenvironment{info}{\begin{itemize}\item[\Info]}{\end{itemize}} 40 | 41 | %%%% Kevin Godby's code for title page and contents from https://groups.google.com/forum/#!topic/tufte-latex/ujdzrktC1BQ 42 | \makeatletter 43 | \renewcommand{\maketitlepage}{% 44 | \begingroup% 45 | \setlength{\parindent}{0pt} 46 | {\fontsize{18}{18}\selectfont\textit{\@author}\par} 47 | \vspace{1.75in}{\fontsize{36}{14}\selectfont\@title\par} 48 | \vspace{0.5in}{\fontsize{20}{14}\selectfont Introducing Probability \& Decision with a Visual Emphasis\par} 49 | \vspace{0.5in}{\fontsize{14}{14}\selectfont\textsf{\smallcaps{v0.3 beta}}\par} 50 | \vfill{\fontsize{14}{14}\selectfont\textit{An Open Access Publication}\par} 51 | \thispagestyle{empty} 52 | \endgroup 53 | } 54 | \makeatother 55 | 56 | % Change shape from [display] to [block] to keep chapter numbers and titles on the same line 57 | \titleformat{\chapter}% 58 | [block]% shape 59 | {\relax\ifthenelse{\NOT\boolean{@tufte@symmetric}}{\begin{fullwidth}}{}}% format applied to label+text 60 | {\itshape\huge\thechapter}% label 61 | {3em}% horizontal separation between label and title body 62 | {\huge\rmfamily\itshape}% before the title body 63 | [\ifthenelse{\NOT\boolean{@tufte@symmetric}}{\end{fullwidth}}{}]% after the title body 64 | 65 | 66 | \usepackage{etoolbox} 67 | % Jesse Rosenthal's code from https://groups.google.com/forum/#!topic/pandoc-discuss/wCF78X6SvwY 68 | % Avoid new paragraph/indent after lists, quotes, etc. 
69 | \makeatletter 70 | \newcommand{\gobblepars}{% 71 | \@ifnextchar\par% 72 | {\expandafter\gobblepars\@gobble}% 73 | {}} 74 | \newcommand{\eatpar}{\@ifnextchar\par{\@gobble}{}} 75 | \newcommand{\forcepar}{\par} 76 | \makeatother 77 | \AfterEndEnvironment{quote}{\expandafter\gobblepars} 78 | \AfterEndEnvironment{enumerate}{\expandafter\gobblepars} 79 | \AfterEndEnvironment{itemize}{\expandafter\gobblepars} 80 | \AfterEndEnvironment{description}{\expandafter\gobblepars} 81 | \AfterEndEnvironment{example}{\expandafter\gobblepars} 82 | \AfterEndEnvironment{argument}{\expandafter\gobblepars} 83 | \AfterEndEnvironment{problem}{\expandafter\gobblepars} 84 | \AfterEndEnvironment{info}{\expandafter\gobblepars} 85 | \AfterEndEnvironment{warning}{\expandafter\gobblepars} 86 | \AfterEndEnvironment{marginfigure}{\expandafter\gobblepars} 87 | \AfterEndEnvironment{longtable}{\expandafter\gobblepars} % not working, why? 88 | \makeatletter 89 | \AfterEndEnvironment{longtable}{\par\@afterindentfalse\@afterheading} % this seems to work instead 90 | \makeatother 91 | 92 | \renewcommand*\descriptionlabel[1]{\hspace\labelsep\normalfont\em #1.} 93 | 94 | % prevent extra space when \newthought follows \section 95 | % see: https://tex.stackexchange.com/questions/291746/tufte-latex-newthought-after-section 96 | \makeatletter 97 | \def\tuftebreak{% 98 | \if@nobreak\else 99 | \par 100 | \ifdim\lastskip<\tufteskipamount 101 | \removelastskip \penalty -100 102 | \tufteskip 103 | \fi 104 | \fi 105 | } 106 | \makeatother 107 | 108 | % indent lists a bit 109 | \usepackage{enumitem} 110 | \setlist[1]{leftmargin=24pt} 111 | 112 | \def\labelitemii{$\circ$} -------------------------------------------------------------------------------- /toc.css: -------------------------------------------------------------------------------- 1 | @-webkit-keyframes fadeIn { 2 | from { opacity: 0; } 3 | to { opacity: 1; } 4 | } 5 | @keyframes fadeIn { 6 | from { opacity: 0; } 7 | to { opacity: 1; } 8 | } 9 | 10 | #TOC { 11 | width: 20rem; 12 | } 13 | 14 | #TOC::before { 15 | content: "☰ Contents"; 16 | font-size: 1.7rem; 17 | font-variant: small-caps; 18 | cursor: pointer; 19 | display: inline-block; 20 | padding-left: 10px; 21 | width: calc(20rem - 10px); 22 | border-radius: 5px 5px 0 0; 23 | border-bottom: 2px double #fefefe; 24 | } 25 | 26 | #TOC:hover { 27 | color: white; 28 | } 29 | 30 | #TOC:hover:before { 31 | background: #2c2c2c; 32 | border-bottom: 2px double #4f4f4f; 33 | -webkit-animation: fadeIn .5s; 34 | animation: fadeIn .5s; 35 | } 36 | 37 | #TOC ul { 38 | padding: 0; 39 | margin: 0; 40 | display: none; 41 | position: absolute; 42 | font-size: 1.2rem; 43 | } 44 | 45 | #TOC:hover > ul { 46 | display: block; 47 | z-index: 1; 48 | -webkit-animation: fadeIn .5s; 49 | animation: fadeIn .5s; 50 | } 51 | 52 | #TOC li { 53 | padding: 0 0 0 0; 54 | position: relative; 55 | display: block; 56 | width: 20rem; 57 | } 58 | 59 | #TOC > ul > li { 60 | border-bottom: 1px solid #4f4f4f; 61 | } 62 | 63 | #TOC ul ul { 64 | position: absolute; 65 | display: none; 66 | left: 100%; 67 | top: 0; 68 | } 69 | 70 | #TOC li:hover ul { 71 | width: fit-content; 72 | display: block; 73 | -webkit-animation: fadeIn .5s; 74 | animation: fadeIn .5s; 75 | } 76 | 77 | #TOC li:hover > ul { 78 | display: block; 79 | } 80 | 81 | #TOC a, .part, .appendix { 82 | background: #2c2c2c; /*linear-gradient(to bottom, #2c2c2c 0%, #141414 100%);*/ 83 | color: #ffffff; 84 | display: block; 85 | padding: 10px; 86 | text-decoration: none; 87 | text-shadow: none; 88 | } 89 | 90 
| #TOC .part, .appendix { 91 | background: #0fa1e0; 92 | text-align: center; 93 | font-size: 1.2rem; 94 | } 95 | 96 | #TOC > ul > li > a:hover{ 97 | background: #1e1e1e; 98 | } 99 | 100 | #TOC .has-sub ul li a { 101 | background: #0fa1e0; 102 | border-bottom: 1px dotted #31b7f1; 103 | filter: none; 104 | display: block; 105 | line-height: 120%; 106 | padding: 10px; 107 | color: #ffffff; 108 | } 109 | 110 | #TOC .has-sub ul li:first-child a { 111 | border-radius: 0 5px 0 0; 112 | } 113 | 114 | #TOC .has-sub ul li:last-child a { 115 | border-radius: 0 0 5px 0; 116 | } 117 | 118 | #TOC .has-sub ul li:hover a { 119 | background: #0c7fb0; 120 | } 121 | 122 | #TOC ul ul li:hover > a { 123 | color: #ffffff; 124 | } --------------------------------------------------------------------------------