├── README.md
└── Lesson_01_Probability.ipynb
/README.md:
--------------------------------------------------------------------------------
1 | # Mathematical_statistics_using_Julia
2 | A course in mathematical statistics using Julia.
3 | The Julia language has powerful built-in mathematical functions that make learning mathematical statistics much easier. It is far more effective to learn how to write the code that executes a statistical equation than to do the tedious work by hand.
4 |
--------------------------------------------------------------------------------
/Lesson_01_Probability.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {
7 | "collapsed": false
8 | },
9 | "outputs": [
10 | {
11 | "data": {
12 | "text/plain": [
13 | "4"
14 | ]
15 | },
16 | "execution_count": 1,
17 | "metadata": {},
18 | "output_type": "execute_result"
19 | }
20 | ],
21 | "source": [
22 | "# Is the kernel loaded?\n",
23 | "+(2, 2)"
24 | ]
25 | },
26 | {
27 | "cell_type": "markdown",
28 | "metadata": {},
29 | "source": [
30 | "# Probability"
31 | ]
32 | },
33 | {
34 | "cell_type": "markdown",
35 | "metadata": {},
36 | "source": [
37 | "## Introduction"
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "metadata": {},
43 | "source": [
44 | "In these lessons we are concerned with outcomes that occur randomly. To calculate the probability of an outcome we construct experiments. All the possible outcomes of an experiment make up the sample space.\n\n",
45 | "Sticking with tradition, consider the example of flipping a fair coin three times (fair meaning that heads and tails are equally likely on every flip). Here is a list of all the possible outcomes:\n",
46 | "\n",
47 | "- HHH\n",
48 | "- HHT\n",
49 | "- HTH\n",
50 | "- THH\n",
51 | "- HTT\n",
52 | "- THT\n",
53 | "- TTH\n",
54 | "- TTT\n",
55 | "\n",
56 | "There are eight possible outcomes. These are the outcomes when we ask about the probability of any one particular sequence of flips. If we instead ask about the probability of getting a head on both the first and the last flip, only the following outcomes qualify:\n",
57 | "\n",
58 | "- HHH\n",
59 | "- HTH\n",
60 | "\n",
61 | "The set of all possible outcomes of an experiment is called the *sample space*. Note that the second set is a subset of the first; a subset of the sample space is called an *event*."
62 | ]
63 | },
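{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of the two lists above, the following sketch enumerates the three-flip sample space and then keeps only the outcomes with a head on the first and the last flip. The names `flips`, `sample_space` and `event` are chosen here for illustration and are not part of the lesson."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# The two possible results of a single flip\n",
"flips = ['H', 'T']\n",
"\n",
"# Every sequence of three flips, built from all choices of first, second and third flip\n",
"sample_space = vec([string(a, b, c) for a in flips, b in flips, c in flips])\n",
"\n",
"# Keep only the outcomes with a head on both the first and the last flip\n",
"event = filter(s -> s[1] == 'H' && s[end] == 'H', sample_space)\n",
"\n",
"# With equally likely outcomes, the probability is the ratio of the two counts\n",
"length(event) / length(sample_space)"
]
},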
64 | {
65 | "cell_type": "markdown",
66 | "metadata": {},
67 | "source": [
68 | "\n",
69 | "## The multiplication principle"
70 | ]
71 | },
72 | {
73 | "cell_type": "markdown",
74 | "metadata": {},
75 | "source": [
76 | "The multiplication principle states that if one experiment has $m$ outcomes and another has $n$ outcomes, then there are $mn$ possible outcomes for the two experiments combined.\n\n",
77 | "This is usually explained in set theory, where we have two sets, $A$ and $B$. We form ordered pairs $\\left(a,b\\right)$ with $ a \\in A $ and $ b \\in B $. If there are $m$ elements in $A$ and $n$ elements in $B$, then there are $mn$ ordered pairs."
78 | ]
79 | },
80 | {
81 | "cell_type": "markdown",
82 | "metadata": {},
83 | "source": [
84 | "In the extended multiplication principle we consider $p$ experiments, where the first has $ {n}_{1} $ outcomes, the second has $ {n}_{2} $ outcomes, and so on up to experiment $p$ with $ {n}_{p} $ outcomes. The total number of outcomes for the $p$ experiments is $ {n}_{1} \\times {n}_{2} \\times \\dots \\times {n}_{p} $."
85 | ]
86 | },
87 | {
88 | "cell_type": "markdown",
89 | "metadata": {},
90 | "source": [
91 | "Consider an 8-bit binary code. Each of the eight bits is either a zero or a one, so we have eight consecutive experiments, each with a sample space of two outcomes. So, we have $ 2 \\times 2 \\times 2 \\times 2 \\times 2 \\times 2\\times 2 \\times 2 = {2}^{8} = 256 $ outcomes."
92 | ]
93 | },
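{
"cell_type": "markdown",
"metadata": {},
"source": [
"The extended multiplication principle amounts to taking a product over the number of outcomes of each experiment. Here is a minimal sketch for the 8-bit example, where the vector name `outcomes_per_bit` is an illustrative assumption."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Each of the eight bits is an experiment with two outcomes\n",
"outcomes_per_bit = fill(2, 8)\n",
"\n",
"# Extended multiplication principle: multiply the outcome counts together\n",
"prod(outcomes_per_bit)"
]
},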
94 | {
95 | "cell_type": "markdown",
96 | "metadata": {},
97 | "source": [
98 | "\n",
99 | "## Permutation"
100 | ]
101 | },
102 | {
103 | "cell_type": "markdown",
104 | "metadata": {},
105 | "source": [
106 | "Here we are considering an ordered arrangement of elements. Order matters in as much as *red*, *blue* and *blue*, *red* are two different permutations.\n\n",
107 | "During an experiment we have two options, either *replacement* or *no replacement*. If we draw from a deck of cards, we put each card back after selecting it in the case of *replacement*, and in the case of *no replacement* we don't.\n\n",
108 | "Let's do a run of experiments. We have five patients. How many different ordered lists of these five patients can we come up with? Hint: this is a case of *no replacement*. The problem is easy to solve if we remember the multiplication principle. During the first experiment there are $5$ outcomes (each of the five patients can be selected). We don't put that patient back in the mix after selecting them, so for the second experiment we have $4$ outcomes, and so on. We end up with $ 5 \\times 4 \\times 3 \\times 2 \\times 1 = 5! = 120 $. So, 120 different permutations.\n\n",
109 | "If we put each patient back for reselection (which doesn't make sense for our list, but anyway), we have $5$ outcomes at every run of the experiment, so $ {5}^{5} = 3125 $ outcomes.\n\n",
110 | "We can replace the number $5$ with an arbitrary value, usually $n$, so for *no replacement* we have $n!$ outcomes.\n\n",
111 | "What if we want to make lists of only three patients? This will be $ 5 \\times 4 \\times 3 = 60 $ outcomes, or $ n \\left( n-1 \\right) \\left( n-2 \\right) \\dots \\left( n-r+1 \\right) $ outcomes, where $n$ is the number of patients to choose from and $r$ is the number of patients on the list. In case you are wondering about the $ n-r+1 $ term, remember that at round one we have $n$ choices and at round two we have $ n-1 $ choices. So by the time we get to round $r$ we have $ n - \\left( r-1 \\right) = n-r+1 $ choices."
112 | ]
113 | },
114 | {
115 | "cell_type": "markdown",
116 | "metadata": {},
117 | "source": [
118 | "It should be clear that the equation for permutations follows from the scenario above. First we included all five patients and then only three, so in essence we have to divide out the last two factors in the second case.\n",
119 | "$$ \\frac{5 \\times 4 \\times 3 \\times 2 \\times 1}{2 \\times 1} = \\frac{n!}{\\left( n - r \\right)!} $$"
120 | ]
121 | },
122 | {
123 | "cell_type": "markdown",
124 | "metadata": {},
125 | "source": [
126 | "We can use simple in-built Julia code for this."
127 | ]
128 | },
129 | {
130 | "cell_type": "code",
131 | "execution_count": 1,
132 | "metadata": {
133 | "collapsed": false
134 | },
135 | "outputs": [
136 | {
137 | "data": {
138 | "text/plain": [
139 | "120"
140 | ]
141 | },
142 | "execution_count": 1,
143 | "metadata": {},
144 | "output_type": "execute_result"
145 | }
146 | ],
147 | "source": [
148 | "# Including all five patients\n",
149 | "factorial(5)"
150 | ]
151 | },
152 | {
153 | "cell_type": "code",
154 | "execution_count": 2,
155 | "metadata": {
156 | "collapsed": false
157 | },
158 | "outputs": [
159 | {
160 | "data": {
161 | "text/plain": [
162 | "60.0"
163 | ]
164 | },
165 | "execution_count": 2,
166 | "metadata": {},
167 | "output_type": "execute_result"
168 | }
169 | ],
170 | "source": [
171 | "# Making ordered list of 3 patients choosing from 5, without replacement\n",
172 | "factorial(5) / factorial(5 - 3)"
173 | ]
174 | },
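{
"cell_type": "markdown",
"metadata": {},
"source": [
"Factorials grow quickly ($21!$ already exceeds the 64-bit integer range), so for larger $n$ it can help to multiply only the factors that survive the division. The helper below is a sketch of that idea; the name `count_permutations` is an assumption made here, not a built-in function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Number of ordered lists of r items chosen from n without replacement,\n",
"# computed as n * (n - 1) * ... * (n - r + 1)\n",
"count_permutations(n, r) = prod((n - r + 1):n)\n",
"\n",
"# Ordered lists of 3 patients chosen from 5\n",
"count_permutations(5, 3)"
]
},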
175 | {
176 | "cell_type": "markdown",
177 | "metadata": {},
178 | "source": [
179 | "In the next example we will look at how many alpha-numeric patient file numbers we can come up with if the first three places in the file number are letters of the alphabet and the last three are numbers from $0$ to $9$. We once again use the multiplication principle. Also note that we do have replacement. Let's do it in Julia. Just to be sure, we will be using values for the equation $ {\\left( { {n}_{\\alpha} } \\right)}^{r} {\\left( { {n}_{d} } \\right)}^{r} $ where $ {n}_{\\alpha} $ is the number of letters and $ {n}_{d} $ is the number of digits, with $r$ the number of choices."
180 | ]
181 | },
182 | {
183 | "cell_type": "code",
184 | "execution_count": 3,
185 | "metadata": {
186 | "collapsed": false
187 | },
188 | "outputs": [
189 | {
190 | "data": {
191 | "text/plain": [
192 | "17576000"
193 | ]
194 | },
195 | "execution_count": 3,
196 | "metadata": {},
197 | "output_type": "execute_result"
198 | }
199 | ],
200 | "source": [
201 | "(26^3)*(10^3)"
202 | ]
203 | },
204 | {
205 | "cell_type": "markdown",
206 | "metadata": {},
207 | "source": [
208 | "If we want to know what the probability is of getting a file number with no duplicates, we have to consider permutations (without replacement)."
209 | ]
210 | },
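{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to read this is to ask for the probability that a randomly generated file number repeats neither a letter nor a digit. That interpretation, and the variable names below, are assumptions made for this sketch."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# File numbers without repeated letters or repeated digits (no replacement)\n",
"no_duplicates = (26 * 25 * 24) * (10 * 9 * 8)\n",
"\n",
"# All possible file numbers (with replacement)\n",
"all_numbers = (26^3) * (10^3)\n",
"\n",
"# Probability that a randomly chosen file number has no duplicates\n",
"no_duplicates / all_numbers"
]
},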
211 | {
212 | "cell_type": "markdown",
213 | "metadata": {
214 | "collapsed": true
215 | },
216 | "source": [
217 | "\n",
218 | "## Combinations"
219 | ]
220 | },
221 | {
222 | "cell_type": "markdown",
223 | "metadata": {},
224 | "source": [
225 | "Combinations differ from permutations in that order is of no consequence. Whereas three objects can be arranged in $3!$ ways (that is, without replacement), combinations treat all of those arrangements as one. So red, green, blue; blue, red, green; and red, blue, green, etc. are all the same.\n\n",
226 | "How do we derive an equation for this? Well, we know the equation for permutations, $\\frac{n!}{\\left(n-r\\right)!}$. Now, have a look at the table below. We have four letters, a, b, c, and d, and we have to choose three. The question is: how many different combinations can we make? Remember that a, b, c and c, b, a are exactly the same.\n",
227 | "$$\\begin{matrix} abc & acb & bac & bca & cab & cba \\\\ abd & adb & bad & bda & dab & dba \\\\ acd & adc & cad & cda & dac & dca \\\\ bcd & bdc & cbd & cdb & dbc & dcb \\end{matrix}$$\n",
228 | "We still get the expected 24 permutations, but note that we can reduce this to only four combinations, since each set of three letters can be arranged in six ways ($3!$). So we have to divide by six (the number of ways to arrange three objects without replacement). The equation is now easy to derive: $\\frac{n!}{\\left(n-r\\right)!r!}$. This is usually written as $\\begin{pmatrix} n \\\\ r \\end{pmatrix}$, pronounced n choose r."
229 | ]
230 | },
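{
"cell_type": "markdown",
"metadata": {},
"source": [
"Julia provides the built-in `binomial` function for $\\begin{pmatrix} n \\\\ r \\end{pmatrix}$. The sketch below checks it against the factorial formula for the four-letter example; the variable name `from_formula` is only illustrative."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Four letters, choose three: permutations divided by the 3! orderings of each set\n",
"from_formula = div(factorial(4), factorial(4 - 3) * factorial(3))\n",
"\n",
"# The built-in binomial coefficient should give the same count\n",
"from_formula == binomial(4, 3)"
]
},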
231 | {
232 | "cell_type": "markdown",
233 | "metadata": {},
234 | "source": [
235 | "Let's consider an example where we are manufacturing disposable surgical instruments. We need to test them for quality control, but most definitely can't test them all. As in many manufacturing quality-control programs, a few are chosen at random and tested. Consider that we have $n$ instruments and choose $r$ for testing. Since the order in which we select them is of no consequence, we are dealing with combinations. This means we can come up with $\\begin{pmatrix} n \\\\ r \\end{pmatrix}$ different sets of samples to test.\n\n",
236 | "Now, no manufacturing process is faultless, so let's say we have $k$ defective units. The question is: How likely is it that we come up with $m$ defective instruments in our sample? Our aim is to test as few instruments as possible, but still have a good measure of how many defective instruments are being produced in total.\n\n",
237 | "To answer this question, start by considering how many ways we can choose $m$ from a total of $k$ defective objects. Remember, order doesn't matter, so it is $\\begin{pmatrix} k \\\\ m \\end{pmatrix}$. There are $n-k$ properly functioning instruments. Choosing the remaining $r-m$ from these gives $\\begin{pmatrix} n-k \\\\ r-m \\end{pmatrix}$ combinations. Now for the multiplication principle: we have two experiments, each with its own number of outcomes, so the number of samples with exactly $m$ defective instruments is the product of the two. The probability that a sample contains $m$ defective instruments is the ratio of this product to the total number of combinations, $\\frac { \\begin{pmatrix} k \\\\ m \\end{pmatrix}\\begin{pmatrix} n-k \\\\ r-m \\end{pmatrix} }{ \\begin{pmatrix} n \\\\ r \\end{pmatrix} } $."
238 | ]
239 | },
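{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ratio above can be evaluated with `binomial`. The numbers in this sketch (100 instruments, 5 of them defective, a sample of 10) are made up for illustration, as is the function name `prob_defective`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Probability of finding m defective instruments in a sample of r,\n",
"# drawn from n instruments of which k are defective\n",
"prob_defective(n, k, r, m) = binomial(k, m) * binomial(n - k, r - m) / binomial(n, r)\n",
"\n",
"# Hypothetical numbers: 100 instruments, 5 defective, 10 sampled, exactly 1 defective found\n",
"prob_defective(100, 5, 10, 1)"
]
},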
240 | {
241 | "cell_type": "markdown",
242 | "metadata": {},
243 | "source": [
244 | "### Extending the concept of combinations further"
245 | ]
246 | },
247 | {
248 | "cell_type": "markdown",
249 | "metadata": {},
250 | "source": [
251 | "Suppose we have a total of $n$ objects. We want to divide them into $r$ classes, with class $i$ containing ${n}_{i}$ objects so that $\\sum _{ i=1 }^{ r }{ { n }_{ i } } =n$.\n\n",
252 | "We have to make use of combinations and the multiplication principle. For the first class we have $\\begin{pmatrix} n \\\\ {n}_{1} \\end{pmatrix}$ combinations, the next one will have $\\begin{pmatrix} n - {n}_{1} \\\\ {n}_{2} \\end{pmatrix}$ combinations, and so on.\n\n",
253 | "For $r$ classes the product simplifies to $\\frac{n!}{{n}_{1}!{n}_{2}! \\cdots {n}_{r}!}$."
254 | ]
255 | },
256 | {
257 | "cell_type": "markdown",
258 | "metadata": {},
259 | "source": [
260 | "Consider this example. We have eight researchers and have to divide them into three research units with a total of four researchers in the first unit, and two each in the other two. How many combinations can we come up with?"
261 | ]
262 | },
263 | {
264 | "cell_type": "code",
265 | "execution_count": 4,
266 | "metadata": {
267 | "collapsed": false
268 | },
269 | "outputs": [
270 | {
271 | "data": {
272 | "text/plain": [
273 | "420.0"
274 | ]
275 | },
276 | "execution_count": 4,
277 | "metadata": {},
278 | "output_type": "execute_result"
279 | }
280 | ],
281 | "source": [
282 | "(factorial(8)) / (factorial(4) * factorial(2) * factorial(2))"
283 | ]
284 | },
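{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same count can be built up one unit at a time with `binomial`, following the multiplication principle described earlier. This is only a cross-check of the factorial formula, not a different method."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Choose 4 of 8 for the first unit, 2 of the remaining 4 for the second,\n",
"# and the last 2 form the third unit\n",
"binomial(8, 4) * binomial(4, 2) * binomial(2, 2)"
]
},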
285 | {
286 | "cell_type": "markdown",
287 | "metadata": {},
288 | "source": [
289 | "This extended form of using combinations solves a rather difficult problem. If we have eight subjects to choose from, how many ways can we arrange them, given that three of them are identical, another three are identical and the last two are also identical? If they were all different individuals, there'd be $8!$ permutations (without replacement). Choosing all of them would only leave us with a single combination, since no matter what order we put them in, it's the same combination of all eight. With identical subjects, though, the number of distinguishable arrangements is:"
290 | ]
291 | },
292 | {
293 | "cell_type": "code",
294 | "execution_count": 5,
295 | "metadata": {
296 | "collapsed": false
297 | },
298 | "outputs": [
299 | {
300 | "data": {
301 | "text/plain": [
302 | "560.0"
303 | ]
304 | },
305 | "execution_count": 5,
306 | "metadata": {},
307 | "output_type": "execute_result"
308 | }
309 | ],
310 | "source": [
311 | "(factorial(8)) / (factorial(3) * factorial(3) * factorial(2))"
312 | ]
313 | },
314 | {
315 | "cell_type": "markdown",
316 | "metadata": {},
317 | "source": [
318 | "\n",
319 | "## Conditional probability"
320 | ]
321 | },
322 | {
323 | "cell_type": "code",
324 | "execution_count": null,
325 | "metadata": {
326 | "collapsed": true
327 | },
328 | "outputs": [],
329 | "source": []
330 | }
331 | ],
332 | "metadata": {
333 | "kernelspec": {
334 | "display_name": "Julia 0.4.1",
335 | "language": "julia",
336 | "name": "julia-0.4"
337 | },
338 | "language_info": {
339 | "file_extension": ".jl",
340 | "mimetype": "application/julia",
341 | "name": "julia",
342 | "version": "0.4.1"
343 | }
344 | },
345 | "nbformat": 4,
346 | "nbformat_minor": 0
347 | }
348 |
--------------------------------------------------------------------------------