├── .gitignore ├── Exercise 1.ipynb ├── Exercise 2.ipynb ├── Exercise 3 - solutions.ipynb ├── Exercise 3.ipynb └── README.md /.gitignore: -------------------------------------------------------------------------------- 1 | .ipynb_checkpoints/ 2 | -------------------------------------------------------------------------------- /Exercise 1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "Exercise 1: Bayesian A testing for Swedish Fish Incorporated (B comes later)\n", 8 | "=======================\n", 9 | "### *Rasmus Bååth (adapted for Python by Christophe Carvenius)*\n", 10 | "\n", 11 | "Swedish Fish Incorporated is the largest Swedish company delivering fish by mail order. They are now trying to get into the lucrative Danish market by selling one-year salmon subscriptions. The marketing department has done a pilot study using the following marketing method:\n", 12 | "\n", 13 | "**A:** Sending a mail with a colorful brochure that invites people to sign up for a one-year salmon subscription.\n", 14 | "\n", 15 | "The marketing department sent out 16 mails of type A. Six Danes who received a mail signed up for one year of salmon, and marketing now wants to know: how good is method A?\n", 16 | "\n", 17 | "*At the bottom of this document you’ll find a solution. 
But try yourself first!*\n" 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "Question I, Build a Bayesian model that answers the question: What would the rate of sign-up be if method A was used on a larger number of people?\n", 25 | "-------------------\n", 26 | "**Hint 1:** The answer is not a single number but a distribution over probable rates of sign-up.\n", 27 | "\n", 28 | "**Hint 2:** As part of your generative model you’ll want to use the binomial distribution, which you can sample from in Python using `np.random.binomial(n, p, size)`. This simulates the following process size times: counting the number of “successes” in n trials, where the probability of a “success” is p.\n", 29 | "\n", 30 | "**Hint 3:** A commonly used prior for the unknown probability of success in a binomial distribution is a uniform distribution from 0 to 1. You can draw from this distribution by running `np.random.uniform(0, 1, size = n_draws)`\n", 31 | "\n", 32 | "**Hint 4:** Here is a code scaffold that you can build upon.\n", 33 | " " 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": null, 39 | "metadata": { 40 | "collapsed": true 41 | }, 42 | "outputs": [], 43 | "source": [ 44 | "# Import libraries\n", 45 | "import pandas as pd\n", 46 | "import numpy as np\n", 47 | "\n", 48 | "# Number of random draws from the prior\n", 49 | "n_draws = 10000\n", 50 | "\n", 51 | "# Here you sample n_draws draws from the prior into a pandas Series (to have convenient\n", 52 | "# methods available for histograms and descriptive statistics, e.g. median)\n", 53 | "prior = pd.Series(...) 
\n", 54 | "\n", 55 | "prior.hist() # It's always good to eyeball the prior to make sure it looks ok.\n", 56 | "\n", 57 | "# Here you define the generative model\n", 58 | "def generative_model(parameters):\n", 59 | "    return(...)\n", 60 | "\n", 61 | "# Here you simulate data using the parameters from the prior and the\n", 62 | "# generative model\n", 63 | "sim_data = list()\n", 64 | "for p in prior:\n", 65 | "    sim_data.append(generative_model(p))\n", 66 | "\n", 67 | "# The observed data: out of the 16 mails, 6 resulted in sign-ups\n", 68 | "observed_data = 6\n", 69 | "\n", 70 | "# Here you filter off all draws that do not match the data.\n", 71 | "posterior = prior[list(map(lambda x: x == observed_data, sim_data))]\n", 72 | "\n", 73 | "posterior.hist() # Eyeball the posterior\n", 74 | "\n", 75 | "# Check that we have enough draws left after the filtering.\n", 76 | "# There are no hard rules here, but you probably want to aim for >1000 draws.\n", 77 | "\n", 78 | "# Now you can summarize the posterior; a common summary is to take the mean or the median,\n", 79 | "# and perhaps a 95% quantile interval.\n", 80 | "print('Number of draws left: %d, Posterior median: %.3f, Posterior quantile interval: %.3f-%.3f' %\n", 81 | "      (len(posterior), posterior.median(), posterior.quantile(.025), posterior.quantile(.975)))\n", 82 | "\n" 83 | ] 84 | }, 85 | { 86 | "cell_type": "markdown", 87 | "metadata": {}, 88 | "source": [ 89 | "Question II, What’s the probability that method A is better than telemarketing?\n", 90 | "----------------\n", 91 | "So marketing just told us that the rate of sign-up would be 20% if salmon subscribers were snared by a telemarketing campaign instead (to us it’s very unclear where marketing got this very precise number from). 
So given the model and the data that we developed in the last question, what’s the probability that method A has a higher rate of sign-up than telemarketing?\n", 92 | "\n", 93 | "**Hint 1:** If you have a vector of samples representing a probability distribution, which you should have from the last question, calculating the amount of probability above a certain value is done by simply *counting* the number of samples above that value and dividing by the total number of samples.\n", 94 | "\n", 95 | "**Hint 2:** The answer to this question is a one-liner." 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "Question III, If method A was used on 100 people, what would the number of sign-ups be?\n", 103 | "--------------\n", 104 | "\n", 105 | "**Hint 1:** The answer is again not a single number but a distribution over probable numbers of sign-ups.\n", 106 | "\n", 107 | "**Hint 2:** As before, the binomial distribution is a good candidate for how many of the 100 possible people sign up.\n", 108 | "\n", 109 | "**Hint 3:** Make sure you don’t “throw away” uncertainty, for example by using a summary of the posterior distribution calculated in the first question. 
Use the full original posterior sample!*\n", 110 | "\n", 111 | "**Hint 4:** The general pattern when calculating “derivatives” of posterior samples is to go through the values one by one, perform a transformation (say, plugging the value into a binomial distribution), and collect the new values in a vector.\n", 112 | "\n", 113 | " " 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "Solutions (but this can be done in many ways)\n", 121 | "==============\n", 122 | "Question I\n", 123 | "--------------" 124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": 2, 129 | "metadata": { 130 | "collapsed": false 131 | }, 132 | "outputs": [ 133 | { 134 | "data": { 135 | "text/plain": [ 136 | "(histogram of the uniform prior omitted)" 137 | ] 138 | }, 139 | "execution_count": 2, 140 | "metadata": {}, 141 | "output_type": "execute_result" 142 | } 143 | ], 144 | "source": [ 145 | "# Import libraries\n", 146 | "import pandas as pd\n", 147 | "import numpy as np\n", 148 | "\n", 149 | "# Number of random draws from the prior\n", 150 | "n_draw = 10000\n", 151 | "\n", 152 | "# Defining and drawing from the prior distribution\n", 153 | "prior_rate = pd.Series(np.random.uniform(0, 1, size = n_draw))\n", 154 | "\n", 155 | "# It's always good to eyeball the prior to make sure it looks ok.\n", 156 | "prior_rate.hist()" 157 | ] 158 | }, 159 | { 160 | "cell_type": "code", 161 | "execution_count": 5, 162 | "metadata": { 163 | "collapsed": false 164 | }, 165 | "outputs": [ 166 | { 167 | "name": "stdout", 168 | "output_type": "stream", 169 | "text": [ 170 | "Number of draws left: 581, Posterior mean: 0.395, Posterior median: 0.393, Posterior 95% quantile interval: 0.194-0.615\n" 171 | ] 172 | }, 173 | { 174 | "data": { 175 | "text/plain": [ 176 | "(histogram of the posterior omitted)" 177 | ] 178 | }, 179 | "metadata": {}, 180 | "output_type": "display_data" 181 | } 182 | ], 183 | "source": [ 184 | "# Defining the generative model\n", 185 | "def gen_model(prob):\n", 186 | "    return(np.random.binomial(16, prob))\n", 187 | "\n", 188 | "# Simulating the data\n", 189 | "subscribers = list()\n", 190 | "for p in prior_rate:\n", 191 | "    subscribers.append(gen_model(p))\n", 192 | "\n", 193 | "# Observed data\n", 194 | "observed_data = 6\n", 195 | "\n", 196 | "# Here you filter off all draws that do not match the data.\n", 197 | "post_rate = prior_rate[list(map(lambda x: x == observed_data, subscribers))]\n", 198 | "\n", 199 | "post_rate.hist() # Eyeball the posterior\n", 200 | "\n", 201 | "# Check that we have enough draws left after the filtering.\n", 202 | "# There are no hard rules here, but you probably want to aim for >1000 draws.\n", 203 | "\n", 204 | "# Now you can summarize the posterior; a common summary is to take the mean or the median,\n", 205 | "# and perhaps a 95% quantile interval.\n", 206 | "print('Number of draws left: %d, Posterior mean: %.3f, Posterior median: %.3f, Posterior 95%% quantile interval: %.3f-%.3f' %\n", 207 | "      (len(post_rate), post_rate.mean(), post_rate.median(), post_rate.quantile(.025), post_rate.quantile(.975)))\n", 208 | "\n" 209 | ] 210 | }, 211 | { 212 | "cell_type": "markdown", 213 | "metadata": {}, 214 | "source": [ 215 | "Question II\n", 216 | "----------\n" 217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "execution_count": 6, 222 | "metadata": { 223 | "collapsed": false 224 | }, 225 | "outputs": [ 226 | { 227 | "data": { 228 | "text/plain": [ 229 | "0.96729776247848542" 230 | ] 231 | }, 232 | "execution_count": 6, 233 | "metadata": {}, 234 | "output_type": "execute_result" 235 | } 236 | ], 237 | "source": [ 238 | "sum(post_rate > 0.2) / len(post_rate) # or just np.mean(post_rate > 0.2)" 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": {}, 244 | "source": [ 245 | "Question III\n", 246 | "----------" 247 | ] 248 | }, 249 | { 250 | "cell_type": "code", 251 | "execution_count": 10, 252 | "metadata": { 253 | "collapsed": false 254 | }, 255 | "outputs": [ 256 | { 257 | "name": "stdout", 258 | "output_type": "stream", 259 | "text": [ 260 | "Sign-up 95% quantile interval 17-63\n" 261 | ] 262 | }, 263 | { 264 | "data": { 265 | "text/plain": [ 266 | "(histogram of predicted sign-ups omitted)" 267 | ] 268 | }, 269 | "metadata": {}, 270 | "output_type": "display_data" 271 | } 272 | ], 273 | "source": [ 274 | "# This can be done with a for loop\n", 275 | "signups = list()\n", 276 | "\n", 277 | "for p in post_rate:\n", 278 | "    signups.append(np.random.binomial(100, p))\n", 279 | "\n", 280 | "# But it is more compact to write it as a list comprehension:\n", 281 | "signups = pd.Series([np.random.binomial(n = 100, p = p) for p in post_rate])\n", 282 | "\n", 283 | "signups.hist()\n", 284 | "print('Sign-up 95%% quantile interval %d-%d' % tuple(signups.quantile([.025, .975]).values))" 285 | ] 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "metadata": { 290 | "collapsed": true 291 | }, 292 | "source": [ 293 | "So a decent guess is that it would be between 20 and 60 sign-ups." 
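As a side note, the rejection-sampling posterior above can be cross-checked analytically: a Uniform(0, 1) prior is Beta(1, 1), so after observing 6 sign-ups out of 16 the exact posterior is Beta(1 + 6, 1 + 10) = Beta(7, 11) by conjugacy. A quick Monte Carlo sketch (assuming only NumPy is available; the seed and sample size are arbitrary choices) reproduces the summaries:

```python
import numpy as np

# Exact conjugate posterior for 6 successes in 16 trials under a uniform prior
rng = np.random.default_rng(0)
exact_post = rng.beta(7, 11, size=100_000)

# The mean of Beta(7, 11) is 7/18, roughly 0.389
print('Mean: %.3f' % exact_post.mean())
print('95%% interval: %.3f-%.3f' % tuple(np.quantile(exact_post, [0.025, 0.975])))
```

These values agree with the rejection-sampling summaries (mean 0.395, interval 0.194-0.615) up to Monte Carlo error, which is a useful sanity check on the simulation approach.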
2470 | ] 2471 | } 2472 | ], 2473 | "metadata": { 2474 | "kernelspec": { 2475 | "display_name": "Python 3", 2476 | "language": "python", 2477 | "name": "python3" 2478 | }, 2479 | "language_info": { 2480 | "codemirror_mode": { 2481 | "name": "ipython", 2482 | "version": 3 2483 | }, 2484 | "file_extension": ".py", 2485 | "mimetype": "text/x-python", 2486 | "name": "python", 2487 | "nbconvert_exporter": "python", 2488 | "pygments_lexer": "ipython3", 2489 | "version": "3.4.3" 2490 | } 2491 | }, 2492 | "nbformat": 4, 2493 | "nbformat_minor": 0 2494 | } 2495 | -------------------------------------------------------------------------------- /Exercise 3.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "Exercise 3: Bayesian computation with Stan and Farmer Jöns\n", 8 | "============\n", 9 | "### *Rasmus Bååth (adapted for Python by Christophe Carvenius)*\n", 10 | "Here follows a number of data analytic questions. Use [Stan](http://mc-stan.org/) and Python to build models that probe these questions. The Stan documentation can be found here: http://mc-stan.org/documentation/ . You can find the answers to the exercise questions here: http://rpubs.com/rasmusab/answers_bayes_with_farmer_jons\n", 11 | "\n", 12 | "## 1. Getting started\n", 13 | "Below is a code scaffold you can use. Right now the scaffold contains a simple model for two binomial rates, but this should be replaced with a model that matches the relevant questions.\n", 14 | "\n", 15 | "**→ Read through the code to see if you can figure out what does what, and then run it to make sure it works. 
It should print out some statistics and some pretty graphs.**\n", 16 | "\n", 17 | " " 18 | ] 19 | }, 20 | { 21 | "cell_type": "code", 22 | "execution_count": null, 23 | "metadata": { 24 | "collapsed": false 25 | }, 26 | "outputs": [], 27 | "source": [ 28 | "# Import libraries\n", 29 | "import pystan # install with pip install pystan\n", 30 | "import pandas as pd\n", 31 | "import numpy as np\n", 32 | "\n", 33 | "# The Stan model as a string.\n", 34 | "model_string = \"\"\"\n", 35 | "data {\n", 36 | "  # Number of data points\n", 37 | "  int n1;\n", 38 | "  int n2;\n", 39 | "  # Number of successes\n", 40 | "  int y1[n1];\n", 41 | "  int y2[n2];\n", 42 | "}\n", 43 | "\n", 44 | "parameters {\n", 45 | "  real<lower=0, upper=1> theta1;\n", 46 | "  real<lower=0, upper=1> theta2;\n", 47 | "}\n", 48 | "\n", 49 | "model {\n", 50 | "  theta1 ~ beta(1, 1);\n", 51 | "  theta2 ~ beta(1, 1);\n", 52 | "  y1 ~ bernoulli(theta1);\n", 53 | "  y2 ~ bernoulli(theta2);\n", 54 | "}\n", 55 | "\n", 56 | "generated quantities {\n", 57 | "}\n", 58 | "\"\"\"\n", 59 | "\n", 60 | "y1 = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0]\n", 61 | "y2 = [0, 0, 1, 1, 1, 0, 1, 1, 1, 0]\n", 62 | "data_list = {'y1' : y1, 'y2' : y2, 'n1' : len(y1), 'n2' : len(y2)}\n", 63 | "\n", 64 | "# Compiling and producing posterior samples from the model.\n", 65 | "stan_samples = pystan.stan(model_code = model_string, data = data_list)\n", 66 | "\n", 67 | "# Plotting and summarizing the posterior distribution\n", 68 | "print(stan_samples)\n", 69 | "print(stan_samples.plot())" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "**Hint 1:** `bernoulli` is the Bernoulli distribution, which is the special case of the binomial distribution when there is just one trial `(bernoulli(x) === binomial(1, x))`. That is, if the data is coded as 4 successes out of 6 `(x = 4; n = 6)` it would be most convenient to use a binomial distribution. If the data is coded like `[1, 1, 1, 1, 0, 0]` it would be more convenient to use a Bernoulli distribution. 
The result would in any case be the same.\n", 77 | "\n", 78 | "**Hint 2:** Stan has quite a lot of different built-in data types, and two that sound the same, but aren’t, are vectors and arrays. Vectors are simple: they are lists of real numbers, and `vector[4] v;` would define a vector of length 4. Arrays are more general in that they can contain other data types; for example `int a[4];` would define an array of integers of length 4. Note the different placement of the `[]`-brackets compared to defining a vector.\n", 79 | "\n", 80 | "**Hint 3:** When defining parameters it’s important to properly define the *support*, that is, for what values the parameter has a defined meaning. For example, the support of a mean is the whole real line (-Inf to Inf), so it can simply be declared by `real mu;`. A *standard deviation*, on the other hand, can’t be below 0.0, which could be written like this: `real<lower=0> sigma;`. Finally, a rate has to be between 0 and 1, which would be written like `real<lower=0, upper=1> theta;`." 81 | ] 82 | }, 83 | { 84 | "cell_type": "markdown", 85 | "metadata": {}, 86 | "source": [ 87 | "## 2. Manipulating samples\n", 88 | "To inspect and manipulate samples from individual parameters it is useful to convert the Stan “object” into a simple dataframe which gets one column per parameter: " 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 13, 94 | "metadata": { 95 | "collapsed": false 96 | }, 97 | "outputs": [ 98 | { 99 | "data": { 100 | "text/html": [ 101 | "
\n", 102 | "\n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | "
theta1theta2lp__
00.3389450.673828-15.336131
10.1252900.825161-17.500812
20.1107330.596067-15.812474
30.5363500.324687-18.623693
40.1745770.282452-17.472201
\n", 144 | "
" 145 | ], 146 | "text/plain": [ 147 | " theta1 theta2 lp__\n", 148 | "0 0.338945 0.673828 -15.336131\n", 149 | "1 0.125290 0.825161 -17.500812\n", 150 | "2 0.110733 0.596067 -15.812474\n", 151 | "3 0.536350 0.324687 -18.623693\n", 152 | "4 0.174577 0.282452 -17.472201" 153 | ] 154 | }, 155 | "execution_count": 13, 156 | "metadata": {}, 157 | "output_type": "execute_result" 158 | } 159 | ], 160 | "source": [ 161 | "s = pd.DataFrame(stan_samples.extract())\n", 162 | "s.head()\n", 163 | " " 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "metadata": {}, 169 | "source": [ 170 | "This is useful as you can, for example, plot and compare the individual parameters." 171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": 23, 176 | "metadata": { 177 | "collapsed": false 178 | }, 179 | "outputs": [ 180 | { 181 | "data": { 182 | "text/plain": [ 183 | "0.96499999999999997" 184 | ] 185 | }, 186 | "execution_count": 23, 187 | "metadata": {}, 188 | "output_type": "execute_result" 189 | } 190 | ], 191 | "source": [ 192 | "# The probability that the rate theta1 is smaller than theta2\n", 193 | "np.mean(s.theta1 < s.theta2) # Can also be written as sum(s.theta1 < s.theta2) / len(s)" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 25, 199 | "metadata": { 200 | "collapsed": false 201 | }, 202 | "outputs": [ 203 | { 204 | "data": { 205 | "text/plain": [ 206 | "" 207 | ] 208 | }, 209 | "execution_count": 25, 210 | "metadata": {}, 211 | "output_type": "execute_result" 212 | }, 213 | { 214 | "data": { 215 | "image/svg+xml": [ 216 | "\n", 217 | "\n", 219 | "\n", 220 | "\n", 221 | " \n", 222 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 284 | " \n", 285 | " \n", 286 | " 
\n", 292 | " \n", 293 | " \n", 294 | " \n", 300 | " \n", 301 | " \n", 302 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | " \n", 362 | " \n", 363 | " \n", 364 | " \n", 367 | " \n", 368 | " \n", 369 | " \n", 370 | " \n", 371 | " \n", 372 | " \n", 373 | " \n", 374 | " \n", 375 | " \n", 394 | " \n", 411 | " \n", 417 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | " \n", 428 | " \n", 429 | " \n", 430 | " \n", 431 | " \n", 432 | " \n", 433 | " \n", 434 | " \n", 437 | " \n", 438 | " \n", 439 | " \n", 440 | " \n", 441 | " \n", 442 | " \n", 443 | " \n", 444 | " \n", 445 | " \n", 446 | " \n", 447 | " \n", 448 | " \n", 449 | " \n", 450 | " \n", 451 | " \n", 474 | " \n", 475 | " \n", 476 | " \n", 477 | " \n", 478 | " \n", 479 | " \n", 480 | " \n", 481 | " \n", 482 | " \n", 483 | " \n", 484 | " \n", 485 | " \n", 488 | " \n", 489 | " \n", 490 | " \n", 491 | " \n", 492 | " \n", 493 | " \n", 494 | " \n", 495 | " \n", 496 | " \n", 497 | " \n", 498 | " \n", 499 | " \n", 500 | " \n", 501 | " \n", 502 | " \n", 503 | " \n", 504 | " \n", 505 | " \n", 506 | " \n", 507 | " \n", 508 | " \n", 509 | " \n", 510 | " \n", 513 | " \n", 514 | " \n", 515 | " \n", 516 | " \n", 517 | " \n", 518 | " \n", 519 | " \n", 520 | " \n", 521 | " \n", 522 | " \n", 523 | " \n", 524 | " \n", 525 | " \n", 526 | " \n", 527 | " \n", 528 | " \n", 529 | " \n", 530 | " \n", 531 | " \n", 532 | " \n", 533 | " \n", 534 | " \n", 535 | " \n", 538 | " \n", 539 | " \n", 540 | " \n", 541 | " \n", 542 | " \n", 543 | " \n", 544 | " \n", 545 | " \n", 546 | " \n", 547 | " \n", 548 | " \n", 549 | " \n", 550 | " \n", 551 | " \n", 552 | " \n", 553 | 
" \n", 554 | " \n", 555 | " \n", 556 | " \n", 557 | " \n", 558 | " \n", 559 | " \n", 560 | " \n", 563 | " \n", 564 | " \n", 565 | " \n", 566 | " \n", 567 | " \n", 568 | " \n", 569 | " \n", 570 | " \n", 571 | " \n", 572 | " \n", 573 | " \n", 574 | " \n", 575 | " \n", 576 | " \n", 577 | " \n", 605 | " \n", 606 | " \n", 607 | " \n", 608 | " \n", 609 | " \n", 610 | " \n", 611 | " \n", 612 | " \n", 613 | " \n", 614 | " \n", 615 | " \n", 618 | " \n", 619 | " \n", 620 | " \n", 621 | " \n", 622 | " \n", 623 | " \n", 624 | " \n", 625 | " \n", 626 | " \n", 627 | " \n", 628 | " \n", 629 | " \n", 630 | " \n", 631 | " \n", 632 | " \n", 668 | " \n", 669 | " \n", 670 | " \n", 671 | " \n", 672 | " \n", 673 | " \n", 674 | " \n", 675 | " \n", 676 | " \n", 677 | " \n", 678 | " \n", 681 | " \n", 682 | " \n", 683 | " \n", 684 | " \n", 685 | " \n", 686 | " \n", 687 | " \n", 688 | " \n", 689 | " \n", 690 | " \n", 691 | " \n", 692 | " \n", 693 | " \n", 694 | " \n", 695 | " \n", 708 | " \n", 709 | " \n", 710 | " \n", 711 | " \n", 712 | " \n", 713 | " \n", 714 | " \n", 715 | " \n", 716 | " \n", 717 | " \n", 718 | " \n", 719 | " \n", 720 | " \n", 723 | " \n", 724 | " \n", 725 | " \n", 726 | " \n", 729 | " \n", 730 | " \n", 731 | " \n", 732 | " \n", 733 | " \n", 734 | " \n", 735 | " \n", 736 | " \n", 739 | " \n", 740 | " \n", 741 | " \n", 742 | " \n", 743 | " \n", 744 | " \n", 745 | " \n", 746 | " \n", 747 | " \n", 748 | " \n", 749 | " \n", 750 | " \n", 751 | " \n", 752 | " \n", 753 | " \n", 756 | " \n", 757 | " \n", 758 | " \n", 759 | " \n", 760 | " \n", 761 | " \n", 762 | " \n", 763 | " \n", 764 | " \n", 765 | " \n", 766 | " \n", 767 | " \n", 768 | " \n", 769 | " \n", 770 | " \n", 771 | " \n", 772 | " \n", 773 | " \n", 774 | " \n", 775 | " \n", 776 | " \n", 777 | " \n", 778 | " \n", 781 | " \n", 782 | " \n", 783 | " \n", 784 | " \n", 785 | " \n", 786 | " \n", 787 | " \n", 788 | " \n", 789 | " \n", 790 | " \n", 791 | " \n", 792 | " \n", 793 | " \n", 794 | " \n", 795 | " \n", 796 | " \n", 797 
| " \n", 798 | " \n", 799 | " \n", 800 | " \n", 801 | " \n", 802 | " \n", 803 | " \n", 806 | " \n", 807 | " \n", 808 | " \n", 809 | " \n", 810 | " \n", 811 | " \n", 812 | " \n", 813 | " \n", 814 | " \n", 815 | " \n", 816 | " \n", 817 | " \n", 818 | " \n", 819 | " \n", 820 | " \n", 821 | " \n", 822 | " \n", 823 | " \n", 824 | " \n", 825 | " \n", 826 | " \n", 827 | " \n", 828 | " \n", 831 | " \n", 832 | " \n", 833 | " \n", 834 | " \n", 835 | " \n", 836 | " \n", 837 | " \n", 838 | " \n", 839 | " \n", 840 | " \n", 841 | " \n", 842 | " \n", 843 | " \n", 844 | " \n", 845 | " \n", 846 | " \n", 847 | " \n", 848 | " \n", 849 | " \n", 850 | " \n", 851 | " \n", 852 | " \n", 853 | " \n", 856 | " \n", 857 | " \n", 858 | " \n", 859 | " \n", 860 | " \n", 861 | " \n", 862 | " \n", 863 | " \n", 864 | " \n", 865 | " \n", 866 | " \n", 867 | " \n", 868 | " \n", 869 | " \n", 870 | " \n", 871 | " \n", 872 | " \n", 873 | " \n", 874 | " \n", 875 | " \n", 876 | " \n", 877 | " \n", 878 | " \n", 879 | " \n", 880 | " \n", 881 | " \n", 882 | " \n", 883 | " \n", 884 | " \n", 885 | "\n" 886 | ], 887 | "text/plain": [ 888 | "" 889 | ] 890 | }, 891 | "metadata": {}, 892 | "output_type": "display_data" 893 | } 894 | ], 895 | "source": [ 896 | "# Plotting distribution of the difference between theta1 and theta2\n", 897 | "(s.theta2 - s.theta1).hist()" 898 | ] 899 | }, 900 | { 901 | "cell_type": "markdown", 902 | "metadata": {}, 903 | "source": [ 904 | "**→ Calculate the probability that the difference between the two underlying rates is smaller than 0.2.**\n", 905 | "\n", 906 | "*Hint:* `abs(x - y)` calculates the absolute difference between x and y." 907 | ] 908 | }, 909 | { 910 | "cell_type": "markdown", 911 | "metadata": {}, 912 | "source": [ 913 | "## 3. Cows and disease\n", 914 | "\n", 915 | "Farmer Jöns has a huge number of cows. 
Earlier this year he ran an experiment where he gave 10 cows medicine A and 10 medicine B and then measured whether they got sick (`0`) or not (`1`) during the summer season. Here is the resulting data:" 916 | ] 917 | }, 918 | { 919 | "cell_type": "code", 920 | "execution_count": 28, 921 | "metadata": { 922 | "collapsed": true 923 | }, 924 | "outputs": [], 925 | "source": [ 926 | "cowA = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0]\n", 927 | "cowB = [0, 0, 1, 1, 1, 0, 1, 1, 1, 0]" 928 | ] 929 | }, 930 | { 931 | "cell_type": "markdown", 932 | "metadata": {}, 933 | "source": [ 934 | "**→ Jöns now wants to know: How effective are the drugs? What is the evidence that medicine A is better or worse than medicine B?**" 935 | ] 936 | }, 937 | { 938 | "cell_type": "markdown", 939 | "metadata": {}, 940 | "source": [ 941 | "## 4. Cows and milk\n", 942 | "Farmer Jöns has a huge number of cows. Earlier this year he ran an experiment where he gave 10 cows a special diet that he had heard could make them produce more milk. He recorded the number of liters of milk from these “diet” cows and from 15 “normal” cows during one month. This is the data:" 943 | ] 944 | }, 945 | { 946 | "cell_type": "code", 947 | "execution_count": 31, 948 | "metadata": { 949 | "collapsed": true 950 | }, 951 | "outputs": [], 952 | "source": [ 953 | "diet_milk = [651, 679, 374, 601, 401, 609, 767, 709, 704, 679]\n", 954 | "normal_milk = [798, 1139, 529, 609, 553, 743, 151, 544, 488, 555, 257, 692, 678, 675, 538]" 955 | ] 956 | }, 957 | { 958 | "cell_type": "markdown", 959 | "metadata": {}, 960 | "source": [ 961 | "**→ Jöns now wants to know: Was the diet any good, does it result in better milk production?**\n", 962 | "\n", 963 | "**Hint 1:** To model this you might find it useful to use the Normal distribution, which is called `normal` in Stan. 
A statement using `normal` could look like:\n", 964 | " \n", 965 | "`for(i in 1:n) {`\n", 966 | "\n", 967 | "` y[i] ~ normal(mu, sigma);`\n", 968 | "\n", 969 | "`}`\n", 970 | "\n", 971 | "where `mu` is the mean, `sigma` is the standard deviation, and `y` is a vector of length `n`. Since Stan is partly vectorized the above could also be written without the loop as `y ~ normal(mu, sigma);`.\n", 972 | "\n", 973 | "**Hint 2:** You will have to put priors on `mu` and `sigma`, and here there are many options. A lazy but often OK shortcut is to just use `uniform` distributions that are wide enough to include all thinkable values of the parameters. If you want to be extra sloppy you can actually skip putting any priors at all, in which case Stan will use uniform(-Infinity, Infinity), but it’s good style to use explicit priors.\n", 974 | "\n", 975 | "## Bonus questions \n", 976 | "If you have made it this far, great! Below are a couple of bonus questions. How far can you reach?\n", 977 | "## 5. Cows and Mutant Cows\n", 978 | "Farmer Jöns has a huge number of cows. Due to a recent radioactive leak in a nearby power plant he fears that some of them have become mutant cows. Jöns is interested in measuring the effectiveness of a diet on normal cows, but not on mutant cows (that might produce excessive amounts of milk, or nearly no milk at all!). 
The following data set contains the amount of milk for cows on the diet and cows on a normal diet:" 979 | ] 980 | }, 981 | { 982 | "cell_type": "code", 983 | "execution_count": 33, 984 | "metadata": { 985 | "collapsed": false 986 | }, 987 | "outputs": [], 988 | "source": [ 989 | "diet_milk = [651, 679, 374, 601, 4000, 401, 609, 767, 3890, 704, 679]\n", 990 | "normal_milk = [798, 1139, 529, 609, 553, 743, 3, 151, 544, 488, 15, 257, 692, 678, 675, 538]" 991 | ] 992 | }, 993 | { 994 | "cell_type": "markdown", 995 | "metadata": {}, 996 | "source": [ 997 | "Some of the data points might come from mutant cows (also known as outliers).\n", 998 | "\n", 999 | "**→ Jöns now wants to know: Was the diet any good, does it result in better milk production for non-mutant cows?**\n", 1000 | "\n", 1001 | "**Hint:** Basically we have an outlier problem. A conventional trick in this situation is to swap the normal distribution for a distribution with wider tails that is more sensitive to the central values and disregards the far-away values (this is a little bit like trimming away some amount of the data on the left and on the right). A good choice for such a distribution is the t-distribution, which is like the normal but with a third parameter called the “degrees of freedom”. The lower the “degrees of freedom” the wider the tails, and when this parameter is larger than about 50 the t-distribution is practically the same as the normal. A good choice for the problem with the mutant cows would be to use a t-distribution with around 3 degrees of freedom:\n", 1002 | "\n", 1003 | "`y ~ student_t(3, mu, sigma);`" 1004 | ] 1005 | }, 1006 | { 1007 | "cell_type": "markdown", 1008 | "metadata": {}, 1009 | "source": [ 1010 | "Of course, you could also estimate the “degrees of freedom” as a free parameter, but that might be overkill in this case…\n", 1011 | "\n", 1012 | "## 6. Chickens and diet\n", 1013 | "Farmer Jöns has a huge number of cows. He also has chickens. 
He tries different diets on them too with the hope that they will produce more eggs. Below is the number of eggs produced in one week by chickens on a diet and chickens eating normal chicken stuff:" 1014 | ] 1015 | }, 1016 | { 1017 | "cell_type": "code", 1018 | "execution_count": 34, 1019 | "metadata": { 1020 | "collapsed": true 1021 | }, 1022 | "outputs": [], 1023 | "source": [ 1024 | "diet_eggs = [6, 4, 2, 3, 4, 3, 0, 4, 0, 6, 3]\n", 1025 | "normal_eggs = [4, 2, 1, 1, 2, 1, 2, 1, 3, 2, 1]" 1026 | ] 1027 | }, 1028 | { 1029 | "cell_type": "markdown", 1030 | "metadata": {}, 1031 | "source": [ 1032 | "**→ Jöns now wants to know: Was the diet any good, does it result in the chickens producing more eggs?**\n", 1033 | "\n", 1034 | "**Hint:** The `poisson` distribution is a discrete distribution that is often a reasonable choice when one wants to model count data (like, for example, counts of eggs). The `poisson` has one parameter `λ` which stands for the mean count. In Stan you would use the Poisson like this:\n", 1035 | "\n", 1036 | "`y ~ poisson(lambda);`\n", 1037 | "\n", 1038 | "where `y` would be a single integer or an integer array of length `n` (defined like `int y[n];`) and `lambda` a real number bounded at 0.0 (`real<lower=0> lambda;`)" 1039 | ] 1040 | }, 1041 | { 1042 | "cell_type": "markdown", 1043 | "metadata": {}, 1044 | "source": [ 1045 | "## 7. Cows and milk in a different data format\n", 1046 | "It's common to have all the data in a data frame. 
Inspect the following data frame d:" 1047 | ] 1048 | }, 1049 | { 1050 | "cell_type": "code", 1051 | "execution_count": 42, 1052 | "metadata": { 1053 | "collapsed": false 1054 | }, 1055 | "outputs": [], 1056 | "source": [ 1057 | "d = pd.DataFrame({'milk' : [651, 679, 374, 601, 401, 609, 767, 709, 704, 679, 798, 1139,\n", 1058 | " 529, 609, 553, 743, 151, 544, 488, 555, 257, 692, 678, 675, 538],\n", 1059 | " 'group' : [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, \n", 1060 | " 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]})" 1061 | ] 1062 | }, 1063 | { 1064 | "cell_type": "markdown", 1065 | "metadata": {}, 1066 | "source": [ 1067 | "Looking at `d` you should see that it contains the same data as in exercise (4) but coded with one cow per row (the mutant cows were perhaps just a dream…). The diet group is coded as a 1 and the normal group is coded as a 2. This data could be read into Stan by using the following data list:" 1068 | ] 1069 | }, 1070 | { 1071 | "cell_type": "code", 1072 | "execution_count": 47, 1073 | "metadata": { 1074 | "collapsed": true 1075 | }, 1076 | "outputs": [], 1077 | "source": [ 1078 | "data_list = dict(y = d.milk, x = d.group, n = len(d.milk), \n", 1079 | " n_groups = max(d.group))" 1080 | ] 1081 | }, 1082 | { 1083 | "cell_type": "markdown", 1084 | "metadata": {}, 1085 | "source": [ 1086 | "**→ Modify the model from (4) to work with this data format instead.**\n", 1087 | "\n", 1088 | "**Hint:** In your Stan code you can loop over the group variable and use it to pick out the parameters belonging to that group like this:\n", 1089 | "\n", 1090 | "`for(i in 1:n) {`\n", 1091 | "\n", 1092 | "` y[i] ~ normal( mu[x[i]], sigma[x[i]] );`\n", 1093 | "\n", 1094 | "`}`\n", 1095 | "\n", 1096 | "where `mu` and `sigma` are now vectors of length 2. This is also known as indexception: You use an index (`i`) to pick out an index (`x[i]`) to pick out a value (`mu[x[i]]`). 
As indexing is vectorised in Stan this can actually be shortened to just:\n", 1097 | "\n", 1098 | "`y ~ normal( mu[x], sigma[x] );`" 1099 | ] 1100 | }, 1101 | { 1102 | "cell_type": "markdown", 1103 | "metadata": {}, 1104 | "source": [ 1105 | "## 8. Cows and more diets\n", 1106 | "Farmer Jöns has a huge number of cows. He also has a huge number of different diets he wants to try. In addition to the diet he already tried, he tries another diet (let’s call it diet 2) on 10 more cows. Copy-n-paste the following into Python and inspect the resulting data frame `d`." 1107 | ] 1108 | }, 1109 | { 1110 | "cell_type": "code", 1111 | "execution_count": 49, 1112 | "metadata": { 1113 | "collapsed": true 1114 | }, 1115 | "outputs": [], 1116 | "source": [ 1117 | "d = pd.DataFrame({'milk' : [651, 679, 374, 601, 401, 609, 767, 709, 704, 679, 798, 1139, 529,\n", 1118 | " 609, 553, 743, 151, 544, 488, 555, 257, 692, 678, 675, 538, 1061,\n", 1119 | " 721, 595, 784, 877, 562, 800, 684, 741, 516],\n", 1120 | " 'group' : [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,\n", 1121 | " 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]})" 1122 | ] 1123 | }, 1124 | { 1125 | "cell_type": "markdown", 1126 | "metadata": {}, 1127 | "source": [ 1128 | "It contains the same data as in the last exercise but with 10 added rows for diet 2, which is coded as group = 3.\n", 1129 | "\n", 1130 | "**→ Jöns now wants to know: Which diet seems best, if any? How much more milk should he be expecting to produce using the best diet compared to the others?**\n", 1131 | "\n", 1132 | "**Hint:** If you looped or used vectorization in a smart way you should be able to use the same model as in Question 7.\n", 1133 | "\n", 1134 | "## 9. Cows and sunshine\n", 1135 | "Farmer Jöns has a huge number of cows. He is wondering whether the amount of time a cow spends outside in the sunshine affects how much milk she produces. 
To test this he makes a controlled experiment where he picks out 20 cows and assigns each a number of hours she should spend outside each day. The experiment runs for a month and Jöns records the number of liters of milk each cow produces. Copy-n-paste the following into Python and inspect the resulting data frame `d`." 1136 | ] 1137 | }, 1138 | { 1139 | "cell_type": "code", 1140 | "execution_count": 50, 1141 | "metadata": { 1142 | "collapsed": true 1143 | }, 1144 | "outputs": [], 1145 | "source": [ 1146 | "d = pd.DataFrame({'milk' : [685, 691, 476, 1151, 879, 725, 1190, 1107, 809, 539,\n", 1147 | " 298, 805, 820, 498, 1026, 1217, 1177, 684, 1061, 834],\n", 1148 | " 'hours' : [3, 7, 6, 10, 6, 5, 10, 11, 9, 3, 6, 6, 3, 5, 8, 11, \n", 1149 | " 12, 9, 5, 5]})" 1150 | ] 1151 | }, 1152 | { 1153 | "cell_type": "markdown", 1154 | "metadata": {}, 1155 | "source": [ 1156 | "**→ Using this data on hours of sunshine and resulting liters of milk Jöns wants to know: Does sunshine affect milk production positively or negatively?**\n", 1157 | "\n", 1158 | "**Hint 1:** A model probing the question above requires quite small changes to the model you developed in Question 8.\n", 1159 | "\n", 1160 | "**Hint 2:** You do remember the equation for the (linear regression) line? 
If not, here it is: `mu <- beta0 + beta1 * x;`\n" 1161 | ] 1162 | } 1163 | ], 1164 | "metadata": { 1165 | "celltoolbar": "Raw Cell Format", 1166 | "kernelspec": { 1167 | "display_name": "Python 3", 1168 | "language": "python", 1169 | "name": "python3" 1170 | }, 1171 | "language_info": { 1172 | "codemirror_mode": { 1173 | "name": "ipython", 1174 | "version": 3 1175 | }, 1176 | "file_extension": ".py", 1177 | "mimetype": "text/x-python", 1178 | "name": "python", 1179 | "nbconvert_exporter": "python", 1180 | "pygments_lexer": "ipython3", 1181 | "version": "3.4.3" 1182 | } 1183 | }, 1184 | "nbformat": 4, 1185 | "nbformat_minor": 0 1186 | } 1187 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Bayesian probabilities workshop 2 | ========== 3 | A collection of questions and solutions to problems presented at Rasmus Bååth's Bayesian probabilities workshop. 4 | --------------------------------------------------------------------------------
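The two-rate comparisons in Exercise 3 can also be sanity-checked without compiling a Stan model: with a `beta(1, 1)` prior and Bernoulli data, the posterior for each rate is itself a Beta distribution (the standard conjugate shortcut), so NumPy can draw posterior samples directly. Below is a minimal sketch using the `cowA`/`cowB` data from question 3; the variable names and the seeded generator are illustrative, not part of the original exercises.

```python
import numpy as np

# Data from question 3 of Exercise 3
cowA = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
cowB = [0, 0, 1, 1, 1, 0, 1, 1, 1, 0]

rng = np.random.default_rng(0)
n_draws = 100000

# With a beta(1, 1) prior, the posterior for a Bernoulli rate is
# Beta(1 + number of ones, 1 + number of zeros).
thetaA = rng.beta(1 + sum(cowA), 1 + len(cowA) - sum(cowA), n_draws)
thetaB = rng.beta(1 + sum(cowB), 1 + len(cowB) - sum(cowB), n_draws)

# The same summaries computed from the Stan samples in the text:
p_A_less = np.mean(thetaA < thetaB)                # P(thetaA < thetaB)
p_close = np.mean(np.abs(thetaA - thetaB) < 0.2)   # P(|difference| < 0.2)
print(p_A_less, p_close)
```

The numbers should land close to what the MCMC version gives, which makes this a handy check that a Stan model for these questions is wired up correctly.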