├── .gitignore
├── README.md
├── _config.yml
├── cs229.png
├── index.html
├── mit6006.jpg
├── prob
├── central_limit_theorem.ipynb
├── poisson_paradigm.ipynb
├── prob_concepts.ipynb
├── prob_dist_discrete.ipynb
└── rv.png
├── stats110.jpg
└── test
├── readme.md
├── test_notebook.ipynb
└── test_notebook2.ipynb
/.gitignore:
--------------------------------------------------------------------------------
1 | # Created by .ignore support plugin (hsz.mobi)
2 | ### Python template
3 | # Byte-compiled / optimized / DLL files
4 | __pycache__/
5 | *.py[cod]
6 | *$py.class
7 |
8 | # C extensions
9 | *.so
10 |
11 | # Distribution / packaging
12 | .Python
13 | env/
14 | build/
15 | develop-eggs/
16 | dist/
17 | downloads/
18 | eggs/
19 | .eggs/
20 | lib/
21 | lib64/
22 | parts/
23 | sdist/
24 | var/
25 | wheels/
26 | *.egg-info/
27 | .installed.cfg
28 | *.egg
29 |
30 | # PyInstaller
31 | # Usually these files are written by a python script from a template
32 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
33 | *.manifest
34 | *.spec
35 |
36 | # Installer logs
37 | pip-log.txt
38 | pip-delete-this-directory.txt
39 |
40 | # Unit test / coverage reports
41 | htmlcov/
42 | .tox/
43 | .coverage
44 | .coverage.*
45 | .cache
46 | nosetests.xml
47 | coverage.xml
48 | *,cover
49 | .hypothesis/
50 |
51 | # Translations
52 | *.mo
53 | *.pot
54 |
55 | # Django stuff:
56 | *.log
57 | local_settings.py
58 |
59 | # Flask stuff:
60 | instance/
61 | .webassets-cache
62 |
63 | # Scrapy stuff:
64 | .scrapy
65 |
66 | # Sphinx documentation
67 | docs/_build/
68 |
69 | # PyBuilder
70 | target/
71 |
72 | # Jupyter Notebook
73 | .ipynb_checkpoints
74 |
75 | # pyenv
76 | .python-version
77 |
78 | # celery beat schedule file
79 | celerybeat-schedule
80 |
81 | # SageMath parsed files
82 | *.sage.py
83 |
84 | # dotenv
85 | .env
86 |
87 | # virtualenv
88 | .venv
89 | venv/
90 | ENV/
91 |
92 | # Spyder project settings
93 | .spyderproject
94 |
95 | # Rope project settings
96 | .ropeproject
97 |
98 | /.idea/
99 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Preparations for DS/AI/ML/Quant
2 |
3 | ## What is this
4 |
5 | A short list of resources and topics covering the essential quantitative tools for data scientists, AI/machine learning practitioners, quant developers/researchers and those who are preparing to interview for these roles.
6 |
7 | At a high-level we can divide things into 3 main areas:
8 |
9 | 1. Machine Learning
10 | 2. Coding
11 | 3. Math (calculus, linear algebra, probability, etc)
12 |
13 | Depending on the type of roles, the emphasis can be quite different. For example, AI/ML interviews might go deeper into the latest deep learning models, while quant interviews might cast a wide net on various kinds of math puzzles. Interviews for research-oriented roles might be lighter on coding problems or at least emphasize on algorithms instead of software designs or tooling.
14 |
15 |
16 | ## List of resources
17 |
18 | A minimalist list of the best/most practical ones:
19 |
20 | 
21 | 
22 | 
23 |
24 | Machine Learning:
25 |
26 | - Course on classic ML: Andrew Ng's CS229 (there are several different versions, [the Cousera one](https://www.coursera.org/learn/machine-learning) is easily accessible. I used this [older version](https://www.youtube.com/playlist?list=PLA89DCFA6ADACE599))
27 | - Book on classic ML: Alpaydin's Intro to ML [link](https://www.amazon.com/Introduction-Machine-Learning-Adaptive-Computation/dp/026201243X/ref=la_B001KD8D4G_1_2?s=books&ie=UTF8&qid=1525554938&sr=1-2)
28 | - Course with a deep learing focus: [CS231](http://cs231n.stanford.edu/) from Stanford, lectures available on Youtube
29 | - Book on deep learning: [Deep Leanring] (https://www.deeplearningbook.org/) by Ian Goodfellow et al.
30 | - Book on deep laerning NLP: Yoav Goldberg's [Neural Network Methods for Natural Language Processing](https://www.amazon.com/Language-Processing-Synthesis-Lectures-Technologies-ebook/dp/B071FGKZMH)
31 | - Hands on exercises on deep learning: Pytorch and MXNet/Gluon are easier to pick up compared to Tensorflow. For anyone of them, you can find plenty of hands on examples online. My biased recommendation is [https://d2l.ai/](https://d2l.ai/) using MXNet/Gluon created by people at Amazon (it came from [mxnet-the-straight-dope](https://github.com/zackchase/mxnet-the-straight-dope))
32 |
33 |
34 | Coding:
35 |
36 | - Course: MIT OCW 6006 [link](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/)
37 | - Book: Cracking the Coding Interview [link](https://www.amazon.com/Cracking-Coding-Interview-Programming-Questions/dp/098478280X)
38 | - SQL tutorial: from [Mode Analytics](https://community.modeanalytics.com/sql/)
39 | - Practice sites: [Leetcode](https://leetcode.com/), [HackerRank](https://www.hackerrank.com/)
40 |
41 |
42 | Math:
43 |
44 | - Calculus and Linear Algebra: undergrad class would be the best, refresher notes from CS229 [link](http://cs229.stanford.edu/section/cs229-linalg.pdf)
45 | - Probability: Harvard Stats110 [link](https://projects.iq.harvard.edu/stat110/home); [book](https://www.amazon.com/Introduction-Probability-Chapman-Statistical-Science/dp/1466575573/ref=pd_lpo_sbs_14_t_2?_encoding=UTF8&psc=1&refRID=5W11QQ7WW4DFE0Q89N7V) from the same professor
46 | - Statistics: Shaum's Outline [link](https://www.amazon.com/Schaums-Outline-Statistics-5th-Outlines/dp/0071822526)
47 | - Numerical Methods and Optimization: these are two different topics really, college courses are probably the best bet. I have yet to find good online courses for them. But don't worry, most interviews won't really touch on them.
48 |
49 |
50 |
51 | ## List of topics
52 |
53 | Here is a list of topics from which interview questions are often derived. The depth and trickiness of the questions certainly depend on the role and the company.
54 |
55 | Under topic I try to add a few bullet points of the key things you should know.
56 |
57 | ### Machine learning
58 | - Models (roughly in decreasing order of frequency)
59 | - Linear regression
60 | - e.g. assumptions, multicollinearity, derive from scratch in linear algebra form
61 | - Logistic regression
62 | - be able to write out everything from scratch: from definitng a classficiation problem to the gradient updates
63 | - Decision trees/forest
64 | - e.g. how does a tree/forest grow, on a pseudocode level
65 | - Clustering algorithms
66 | - e.g. K-means, agglomerative clustering
67 | - SVM
68 | - e.g. margin-based loss objectives, how do we use support vectors, prime-dual problem
69 | - Generative vs discriminative models
70 | - e.g. Gaussian mixture, Naive Bayes
71 | - Anomaly/outlier detection algorithms (DBSCAN, LOF etc)
72 | - Matrix factorization based models
73 | - Training methods
74 | - Gradient descent, SGD and other popular variants
75 | - Understand momentum, how they work, and what are the diffrences between the popular ones (RMSProp, Adgrad, Adadelta, Adam etc)
76 | - Bonus point: when to not use momentum?
77 | - EM algorithm
78 | - Andrew's [lecture notes](http://cs229.stanford.edu/notes/cs229-notes8.pdf) are great, also see [this](https://dingran.github.io/EM/)
79 | - Gradient boosting
80 | - Learning theory / best practice (see Andrew's advice [slides](http://cs229.stanford.edu/materials/ML-advice.pdf))
81 | - Bias vs variance, regularization
82 | - Feature selection
83 | - Model validation
84 | - Model metrics
85 | - Ensemble method, boosting, bagging, bootstraping
86 | - Generic topics on deep learning
87 | - Feedforward networks
88 | - Backpropagation and computation graph
89 | - I really liked the [miniflow](https://gist.github.com/dingran/154a524003c86ecab4a949c538afa766) project Udacity developed
90 | - In addition, be absolutely familiar with doing derivatives with matrix and vectors, see [Vector, Matrix, and Tensor Derivatives](http://cs231n.stanford.edu/vecDerivs.pdf) by Erik Learned-Miller and [Backpropagation for a Linear Layer](http://cs231n.stanford.edu/handouts/linear-backprop.pdf) by Justin Johnson
91 | - CNN, RNN/LSTM/GRU
92 | - Regularization in NN, dropout, batch normalization
93 |
94 | ### Coding essentials
95 | The bare minimum of coding concepts you need to know well.
96 |
97 | - Data structures:
98 | - array, dict, link list, tree, heap, graph, ways of representing sparse matrices
99 | - Sorting algorithms:
100 | - see [this](https://brilliant.org/wiki/sorting-algorithms/) from brilliant.org
101 | - Tree/Graph related algorithms
102 | - Traversal (BFS, DFS)
103 | - Shortest path (two sided BFS, dijkstra)
104 | - Recursion and dynamic programming
105 |
106 | ### Calculus
107 |
108 | Just to spell things out
109 |
110 | - Derivatives
111 | - Product rule, chain rule, power rule, L'Hospital's rule,
112 | - Partial and total derivative
113 | - Things worth remembering
114 | - common function's derivatives
115 | - limits and approximations
116 | - Applications of derivatives: e.g. [this](https://math.stackexchange.com/questions/1619911/why-ex-is-always-greater-than-xe)
117 | - Integration
118 | - Power rule, integration by sub, integration by part
119 | - Change of coordinates
120 | - Taylor expansion
121 | - Single and multiple variables
122 | - Taylor/McLauren series for common functions
123 | - Derive Newton-Raphson
124 | - ODEs, PDEs (common ways to solve them analytically)
125 |
126 |
127 | ### Linear algebra
128 | - Vector and matrix multiplication
129 | - Matrix operations (transpose, determinant, inverse etc)
130 | - Types of matrices (symmetric, Hermition, orthogonal etc) and their properties
131 | - Eigenvalue and eigenvectors
132 | - Matrix calculus (gradients, hessian etc)
133 | - Useful theorems
134 | - Matrix decomposition
135 | - Concrete applications in ML and optimization
136 |
137 |
138 | ### Probability
139 |
140 | Solving probability interview questions is really all about pattern recognition. To do well, do plenty of exercise from [this](https://www.amazon.com/Introduction-Probability-Chapman-Statistical-Science/dp/1466575573/ref=pd_lpo_sbs_14_t_2?_encoding=UTF8&psc=1&refRID=5W11QQ7WW4DFE0Q89N7V) and [this](https://www.amazon.com/Practical-Guide-Quantitative-Finance-Interviews/dp/1438236662). This topic is particularly heavy in quant interviews and usually quite light in ML/AI/DS interviews.
141 |
142 | - Basic concepts
143 | - Event, outcome, random variable, probability and probability distributions
144 | - Combinatorics
145 | - Permutation
146 | - Combinations
147 | - Inclusion-exclusion
148 | - Conditional probability
149 | - Bayes rule
150 | - Law of total probability
151 | - Probability Distributions
152 | - Expectation and variance equations
153 | - Discrete probability and stories
154 | - Continuous probability: uniform, gaussian, poisson
155 | - Expectations, variance, and covariance
156 | - Linearity of expectation
157 | - solving problems with this theorem and symmetry
158 | - Law of total expectation
159 | - Covariance and correlation
160 | - Independence implies zero correlation
161 | - Hash collision probability
162 | - Universality of Uniform distribution
163 | - Proof
164 | - Circle problem
165 | - Order statistics
166 | - Expectation of min and max and random variable
167 | - Graph-based solutions involving multiple random variables
168 | - e.g. breaking sticks, meeting at the train station, frog jump (simplex)
169 | - Approximation method: Central Limit Theorem
170 | - Definition, examples (unfair coins, Monte Carlo integration)
171 | - [Example question](https://github.com/dingran/quant-notes/blob/master/prob/central_limit_theorem.ipynb)
172 | - Approximation method: Poisson Paradigm
173 | - Definition, examples (duplicated draw, near birthday problem)
174 | - Poisson count/time duality
175 | - Poisson from poissons
176 | - Markov chain tricks
177 | - Various games, introduction of martingale
178 |
179 | ### Statistics
180 | - Z-score, p-value
181 | - t-test, F-test, Chi2 test (know when to use which)
182 | - Sampling methods
183 | - AIC, BIC
184 |
185 | ### [Optional] Numerical methods and optimization
186 | - Computer errors (e.g. float)
187 | - Root finding (newton method, bisection, secant etc)
188 | - Interpolating
189 | - Numerical integration and difference
190 | - Numerical linear algebra
191 | - Solving linear equations, direct methods (understand complexities here) and iterative methods (e.g. conjugate gradient)
192 | - Matrix decompositions/transformations (e.g. QR, LU, SVD etc)
193 | - Eigenvalue (e.g. power iteration, Arnoldi/Lanczos etc)
194 | - ODE solvers (explicit, implicit)
195 | - Finite-difference method, finite-element method
196 | - Optimization topics: TBA
197 |
198 |
199 |
200 |
--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | theme: jekyll-theme-cayman
--------------------------------------------------------------------------------
/cs229.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dingran/quant-notes/b14bcc686425fef4841c254753386a2b7e652913/cs229.png
--------------------------------------------------------------------------------
/index.html:
--------------------------------------------------------------------------------
1 | hello world
2 |
--------------------------------------------------------------------------------
/mit6006.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dingran/quant-notes/b14bcc686425fef4841c254753386a2b7e652913/mit6006.jpg
--------------------------------------------------------------------------------
/prob/central_limit_theorem.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Central Limit Theorem (CLT)\n",
8 | "\n",
9 | "## Definition:\n",
10 | "Let $X_{1}$, $X_{2}$, $X_{3}$,... be i.i.d random variables from some distribution with finite mean $\\mu$ and finite variance $\\sigma^{2}$. \n",
11 | "\n",
12 | "As $n \\rightarrow \\infty$, let $S=\\sum_{k=1}^n X_{i}$, we have $S \\rightarrow \\mathcal{N}(n\\mu, n\\sigma^{2})$ and $\\frac{S-n\\mu}{\\sqrt{n\\sigma^{2}}} \\rightarrow \\mathcal{N}(0,1)$\n",
13 | "\n",
14 | "Equivalently, let $M=\\frac{1}{n}\\sum_{k=1}^n X_{i}$, we have\n",
15 | "$M \\rightarrow \\mathcal{N}(\\mu,\\frac{\\sigma^2}{n})$ and $\\frac{M-\\mu}{\\sqrt{\\frac{\\sigma^2}{n}}} \\rightarrow \\mathcal{N}(0,1)$\n",
16 | "\n",
17 | "\n",
18 | "Notation:\n",
19 | " - $\\mathcal{N}(\\mu,\\sigma^2)$ denotes [Normal distribution](https://en.wikipedia.org/wiki/Normal_distribution) with mean of $\\mu$ and variance of $\\sigma^2$."
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "metadata": {},
25 | "source": [
26 | "## Discussions:\n",
27 | "\n",
28 | "Naturally CLT appears in questions that invovle sum or average of a large number of random variablse and especially when the questions only ask for an approximate answer. \n",
29 | "\n",
30 | "Here are a few examples."
31 | ]
32 | },
33 | {
34 | "cell_type": "markdown",
35 | "metadata": {
36 | "collapsed": true
37 | },
38 | "source": [
39 | "
\n",
40 | "***Example 1:***\n",
41 | "\n",
42 | "Suppose we have a fair coin and we flip it 400 times. What is the probability you will see 210 heads or more?\n",
43 | "\n",
44 | "---\n",
45 | "\n",
46 | "
\n",
47 | "**Exact answer**\n",
48 | "\n",
49 | "Let the outcome of each coin flip be a random variable $I_{i}$. Thus we are dealing with the random variable $S=\\sum_{i=1}^{400}I_{i}$. $S$ is te sume of a series of i.i.d Bernoulie trials, thus it follows Binomial distribution. So the exact answer is: $P(S\\geq210)= \\sum_{k=210}^{400}C_{400}^{k}\\left(\\frac{1}{2}\\right)^{400}$ which requires a program to calculate (Actually try implementing this, beware of roudoff errors and compare it against the approximate answer below.).\n",
50 | "\n",
51 | "\n",
52 | "Notation:\n",
53 | " - $C_{n}^{k}$ is the notation for \"[n choose k](https://en.wikipedia.org/wiki/Binomial_coefficient)\", which denotes the number of ways to choose k items from n items where order doesn't matter.\n",
54 | "\n",
55 | "
\n",
56 | "**Approximation**\n",
57 | "\n",
58 | "We use CLT to easily get an approxmate answer quickly. First recognize that for each $I_{i}$ we have $\\mu=0.5$ and $\\sigma^2=0.5\\times(1-0.5)=0.25$. Then, $Z=\\frac{S-400*0.5}{\\sqrt{400*0.25}}=\\frac{S-200}{10}$ is approximately $\\mathcal{N}(0,1)$. For $S \\geq 210$, we have $Z\\geq1$. \n",
59 | "\n",
60 | "The 68-95-99.7 rule tells us that for a standard Normal distribution $\\mathcal{N}(0,1)$, the probability of the random variable taking value more than 1 standard deviation away from the center is $1-0.68=0.32$ and thus the one sided probability for $P(Z\\geq1) = 0.32/2 = 0.16$."
61 | ]
62 | },
63 | {
64 | "cell_type": "markdown",
65 | "metadata": {},
66 | "source": [
67 | "
\n",
68 | "***Example 2:***\n",
69 | "\n",
70 | "Suppose you use Monte Carlo simulation to estimate the numerical value of $\\pi$.\n",
71 | "- How would you implement it? \n",
72 | "- If we require an error of 0.001, how many trials do you need?\n",
73 | "\n",
74 | "---\n",
75 | "\n",
76 | "**Solution**\n",
77 | "\n",
78 | "One possible implementation is to start with a rectangle, say $x \\in [-1,1], y\\in[-1,1]$. If we uniformly randomly draw a point from this rectangle, the probability $p$ of the point following into the circle region $x^2+y^2\\lt1$ is the ratio of the area between the circle and rectangle, i.e $p=\\frac{\\pi}{4}$\n",
79 | "\n",
80 | "Formally, let random indicator variable $I$ take value 1 if the point falls in the circle and 0 otherwise, then $P(I=1)=p$ and $E(I)=p$. If we do $n$ such trials, and define $M=\\frac{1}{n}\\sum_{k=1}^n I_{k}$, then $M$ follows approximately $\\mathcal{N}(\\mu_{I},\\frac{\\sigma_{I}^2}{n})$. In this setup, $\\mu_{I}=E(I)=p$ and $\\sigma_{I}^2=p(1-p)$ (see [Probability Distribution](prob-dist-discrete.ipynb) section for details on $\\sigma_{I}^2$).\n",
81 | "\n",
82 | "One thing we need to clarify with the interviewer is what error really means? She might tell you to consider it as the standard deviation of the estimated $\\pi$. Therefore the specified error translates into a required sigma of $\\sigma_{req}=\\frac{error}{4}$ for random variable $M$. Thus $n = \\frac{\\sigma_{I}^2}{\\sigma_{req}^2}=\\frac{p(1-p)}{(0.00025)^2}\\approx2.7\\times 10^6$.\n",
83 | "\n",
84 | "By the way, we can see that the number of trials $n$ scales with $\\frac{1}{error^2}$, which is caused by the $\\frac{1}{\\sqrt{n}}$ scaling of the $\\sigma_{M}$ in the CLT, and is generally the computationaly complexity entailed by [Monte Carlo integration](https://en.wikipedia.org/wiki/Monte_Carlo_integration).\n"
85 | ]
86 | },
87 | {
88 | "cell_type": "markdown",
89 | "metadata": {},
90 | "source": []
91 | }
92 | ],
93 | "metadata": {
94 | "kernelspec": {
95 | "display_name": "Python 2",
96 | "language": "python",
97 | "name": "python2"
98 | },
99 | "language_info": {
100 | "codemirror_mode": {
101 | "name": "ipython",
102 | "version": 2
103 | },
104 | "file_extension": ".py",
105 | "mimetype": "text/x-python",
106 | "name": "python",
107 | "nbconvert_exporter": "python",
108 | "pygments_lexer": "ipython2",
109 | "version": "2.7.13"
110 | }
111 | },
112 | "nbformat": 4,
113 | "nbformat_minor": 1
114 | }
115 |
--------------------------------------------------------------------------------
/prob/poisson_paradigm.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "collapsed": true
7 | },
8 | "source": [
9 | "# Poisson Paradigm\n",
10 | " \n",
11 | " - birthday problem, birthday triplets, near birthday problem\n",
12 | " - repeated draws"
13 | ]
14 | },
15 | {
16 | "cell_type": "code",
17 | "execution_count": null,
18 | "metadata": {},
19 | "outputs": [],
20 | "source": []
21 | }
22 | ],
23 | "metadata": {
24 | "kernelspec": {
25 | "display_name": "Python 2",
26 | "language": "python",
27 | "name": "python2"
28 | },
29 | "language_info": {
30 | "codemirror_mode": {
31 | "name": "ipython",
32 | "version": 2
33 | },
34 | "file_extension": ".py",
35 | "mimetype": "text/x-python",
36 | "name": "python",
37 | "nbconvert_exporter": "python",
38 | "pygments_lexer": "ipython2",
39 | "version": "2.7.6"
40 | }
41 | },
42 | "nbformat": 4,
43 | "nbformat_minor": 0
44 | }
45 |
--------------------------------------------------------------------------------
/prob/prob_concepts.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Probability concepts\n",
8 | "\n",
9 | "In this notebook we will go over some essential concepts in probability, such as **events, random variables, probability and probability distribution**. They seem simple but also sometimes are quite confusing to someone new to probability.\n",
10 | "\n"
11 | ]
12 | },
13 | {
14 | "cell_type": "markdown",
15 | "metadata": {},
16 | "source": [
17 | "## Sample space, outcomes, events\n",
18 | "There are things in lif that are a bit *random* in the sense that we do not know the outcome with certainty before it occurs, so we reason about the uncertainties with tools of probability. \n",
19 | "\n",
20 | "For an experiment with uncertain outcomes, we denote all possible outcomes as a *set* $S$, and call it the *sample space*. The actual outcome(s) will belong to this set.\n",
21 | "\n",
22 | "An *event* $E$ is a subset of $S$ (i.e. $E \\subseteq S$) and we would say event $E$ *occurred* if the actual outcome(s) belongs to $E$ (i.e. $s_{actual} \\in E$).\n",
23 | "\n",
24 | "To make this concrete, let's look at a single roll if a 6-sided dice.\n",
25 | "\n",
26 | "The sample space $S$ in this case is a set of 6 elements: $S=\\{\\text{Face 1 shows up}, \\text{Face 2 shows up}, \\dots, \\text{Face 6 shows up}\\}$\n",
27 | "\n",
28 | "We could define an event however we want. For exampple we can define event $E_1$ to be the event face 5 shows up, define an event $E_2$ to be the event \"a face with even number shows up\", and define an event $E_3$ to be the event \"the face shows up is not 2 or 3\". They are expressed as follows:\n",
29 | "\n",
30 | "$$E_1 =\\{\\text{Face 5 shows up}\\}$$\n",
31 | "$$E_2 =\\{\\text{Face 2 shows up},\\text{Face 4 shows up},\\text{Face 6 shows up}\\}$$\n",
32 | "$$E_3^c =\\{\\text{Face 2 shows up}, \\text{Face 3 shows up}\\}, E_3=S-E_3^c$$\n",
33 | "\n",
34 | "In the above, we introduced the notation $^c$ to mean the complementation of a set. Specifically, it means $E_3^c$ occurs if and only if $E_3$ does not occur. \n",
35 | "\n",
36 | "A few other notations on set algebra might be helpful to review: https://en.wikipedia.org/wiki/Algebra_of_sets"
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "## Probability\n",
44 | "\n",
45 | "\"Probability is the measure of the likelihood that an event will occur. Probability is quantified as a number between 0 and 1, where, loosely speaking, 0 indicates impossibility and 1 indicates certainty.\" - Wikipedia\n",
46 | "\n",
47 | "We can interpret probability as the frequency of certain event occur if we do the experiment many many times. This long-run frequency view is the *frequentist* view. Or we can interpret probability as our degree of belief in the event, which is useful for experiments that are not possible to repeat over and over. This is the *Bayesian* view.\n",
48 | "\n",
49 | "We use $P(A)$ to denote the probabily that event $A$ occurs and define probability with the following axioms:\n",
50 | "- $P(S)=1$, $P(\\emptyset)=0$\n",
51 | "- Disjoint/mutually exclusive events, $A_1, A_2, \\dots$ are defined such that $A_i \\cap A_j = \\emptyset$ for $i\\neq j$. And we have \n",
52 | "$$P \\left( \\bigcup_{i=1}^{\\infty}A_i \\right) = \\sum_{i=1}^{\\infty}P(A_i) $$\n",
53 | "\n",
54 | "\n",
55 | "Following the definiiton we have properties:\n",
56 | "- $P(A) + P(A^c)=1$\n",
57 | "- $A \\subseteq B \\Rightarrow P(A) \\leq P(B)$\n",
58 | "- $P(A \\cup B) = P(A) + P(B) - P(A \\cap B)$\n",
59 | "\n",
60 | "The last property can be generalized into the [inclusion-exclusion theorem](https://en.wikipedia.org/wiki/Inclusion%E2%80%93exclusion_principle) for more than two sets."
61 | ]
62 | },
63 | {
64 | "cell_type": "markdown",
65 | "metadata": {},
66 | "source": [
67 | "Let's try to find the probability of $E_1$, $E_2$ and $E_3$ in our dice example. Assume our dice is fair, meaning each face will show up equally likely. Denote $A_i$ to be the outcome face $i$ shows up, we have $ \\sum_{i=1}^6 A_i = 1 $ and $A_1=A_2= \\dots = A_6$, therefore $A_i=\\frac{1}{6}$. \n",
68 | "\n",
69 | "The fairness of the dices gives the problem **symmetry** (i.e. all $A_i$s are equal). In addition $A_i$s are **mutually exclusive** (a roll of a single dice cannot possibly take more than one value). Therefore we can resort to counting the **number of occurences** for calculating probability:\n",
70 | "\n",
71 | "$$P(E) = \\frac{\\text{number of outcomes in E}}{\\text{number of all possible outcomes}}$$\n",
72 | "\n",
73 | "Thus we have $P(E_1)=\\frac{1}{6}$, $P(E_2)=\\frac{3}{6}=\\frac{1}{2}$ and $P(E_3)=\\frac{4}{6}=\\frac{2}{3}$\n",
74 | "\n",
75 | "The assumption/condition that **all the outcomes are equally likely and mutually exclussive** forms the basis for couting and combinatorics that we will go over in the next section."
76 | ]
77 | },
78 | {
79 | "cell_type": "markdown",
80 | "metadata": {},
81 | "source": [
82 | "## Random variable\n",
83 | "\n",
84 | "In the discussion above, we have $P(\\cdot)$ notation for probability, it is a function that takes an event as input and outputs a real value that is between 0 and 1.\n",
85 | "\n",
86 | "In order to fully utilize the tools we have in calculus, it would be nice to have the input of $P(\\cdot)$ be real valued. And random variable comes to resume.\n",
87 | "\n",
88 | "**A random variable maps sample space $S$ to real numbers $\\mathbb{R}$.** That's it. The exact mapping is up to use to define. \n",
89 | "\n",
90 | "\n",
91 | "(Image credit: *Blitzstein, Joseph K., and Jessica Hwang. Introduction to probability. CRC Press, 2014.*)\n",
92 | "\n",
93 | "For example, we could define a random variable $X$ (we usually use capital letter to denote a random variable) to take on value $i$ if the dice face $i$ shows up.\n",
94 | "\n",
95 | "For example, if we roll the dice twice, we could define a random variable $Y$ to take on the value $i_1+i_2$, where $i_1$ and $i_2$ are the dice value of the first and second roll respectively. Thus very conveniently $Y=7$ represents the outcomes of $(i_1, i_2)$ takes on values of $\\{(1,6), (6,1), (2, 5), (5, 2), (3,4), (4,3) \\})$ from the two rolls. "
96 | ]
97 | },
98 | {
99 | "cell_type": "markdown",
100 | "metadata": {},
101 | "source": [
102 | "## Probability distribution\n",
103 | "\n",
104 | "Now that we have both the input and output of $P(\\cdot)$ as real numbers, it is natual to think of $P(\\cdot)$ as function that describes the probability of random variable taking of various values. More concretely $P(X=k)=P(k)=f_X(k)$.\n",
105 | "\n",
106 | "For our dice example above, because $X$ and $Y$ takes on discrete values, they are called discrete random variables. Their $P(\\cdot)$ functions are called probability mass functions (PMF). Because at each of the value $k$ the variable takes, P(k) is indeed a probability. This is in contrast to the continuous version of the random variables. \n",
107 | "\n",
108 | "For continuous random variables, the probability of taking on an exact real number is zero, and it is more useful to talk about the probabiliy for taking on values of certain interval, so the equivalent of $P(\\cdot)$ there is called probability density function (PDF).\n",
109 | "\n",
110 | "To get some feeling about PMF, let's calculate the PMF for $Y$. "
111 | ]
112 | },
113 | {
114 | "cell_type": "code",
115 | "execution_count": 1,
116 | "metadata": {
117 | "collapsed": true
118 | },
119 | "outputs": [],
120 | "source": [
121 | "import matplotlib.pyplot as plt\n",
122 | "from collections import defaultdict\n",
123 | "%matplotlib inline\n",
124 | "\n",
125 | "p_y = defaultdict(int)\n",
126 | "for i in range(1, 7):\n",
127 | " for j in range(1, 7):\n",
128 | " s = i+j\n",
129 | " p_y[s] +=1"
130 | ]
131 | },
132 | {
133 | "cell_type": "code",
134 | "execution_count": 2,
135 | "metadata": {},
136 | "outputs": [
137 | {
138 | "data": {
139 | "text/plain": [
140 | "defaultdict(int,\n",
141 | " {2: 1,\n",
142 | " 3: 2,\n",
143 | " 4: 3,\n",
144 | " 5: 4,\n",
145 | " 6: 5,\n",
146 | " 7: 6,\n",
147 | " 8: 5,\n",
148 | " 9: 4,\n",
149 | " 10: 3,\n",
150 | " 11: 2,\n",
151 | " 12: 1})"
152 | ]
153 | },
154 | "execution_count": 2,
155 | "metadata": {},
156 | "output_type": "execute_result"
157 | }
158 | ],
159 | "source": [
160 | "p_y"
161 | ]
162 | },
163 | {
164 | "cell_type": "markdown",
165 | "metadata": {},
166 | "source": [
167 | "We can see that $Y$ only takes on certain values (2 to 12), this set of values are called the *support*, outside of the support $P(Y)=0$. To get the probability, we should divide the count outcomes by the total count of all possible outcomes."
168 | ]
169 | },
170 | {
171 | "cell_type": "code",
172 | "execution_count": 3,
173 | "metadata": {},
174 | "outputs": [
175 | {
176 | "data": {
177 | "text/plain": [
178 | ""
179 | ]
180 | },
181 | "execution_count": 3,
182 | "metadata": {},
183 | "output_type": "execute_result"
184 | },
185 | {
186 | "data": {
187 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAEICAYAAABfz4NwAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGhZJREFUeJzt3X90XOV95/H3J/KPKFCiAE4XyyY4i1fEga1dhAPbLd0N\nJTJtgnWyJrEPDWaXs2Zz6t3sZiti7yak9aZNWHVLm1OaxA2/A7apY4ySmlVpKN2TXX5YYNbCUIFw\niC3JKSIgQhMF2/J3/5hH7DCM0R1pRmPNfF7nzNG9z33uc59Hmrkf3R8zo4jAzMzsHdXugJmZnRgc\nCGZmBjgQzMwscSCYmRngQDAzs8SBYGZmgAPBbNpJapT0HUmvSvqLavfHbJwDweqCpBckjUr6B0l/\nL+lWSSenZQ9JCkm/VLDOzlT+L9L870o6ktoYf1w3ie6sAn4ROC0irijY5qWpf6fnlc2V9Iykayex\nLbPMHAhWTz4WEScDvwxcAHw+b9mzwFXjM5JOAy4Ehgva2BYRJ+c9/vsk+vE+4NmIOFq4ICIeAL4L\n/Ele8eeBQ8DmSWzLLDMHgtWdiBgE7gfOzSu+C/ikpIY0vwa4Fzg8mW1I+kA68hiRtE/S5an894Dr\n07b+QdI1RVb/LPBrkn5T0rnAeuDfhj9WwCpsVrU7YDbdJC0EfgPYkVc8BDwNfIRcWFwFfAb42CTa\nnw18B7gltffPgfsktUbEFyUFcHZE/Fax9SPiVUmfBr5O7sjg9yLi+VL7YVYqHyFYPdkpaQT4PvC3\nwB8ULL8DuEpSC9AUEQ8XaeMT6b/+8cf8InUuBE4GvhIRhyPiQXKngdZk7WhEfAd4hNxr9KtZ1zOb\nCh8hWD1pj4i/fpvlO4D/AfwYuPM4de453n/2eeYDByPiWF7ZD4HmzD3N2Qe8XtCOWcU4EMySiPiZ\npPuBTwP/eApNDQELJb0jb2d+JrkL12YnLJ8yMnuz/wL8WkS8MIU2HgV+ClwnaXa6bfVjwNapd8+s\nchwIZnkiYigivj/FNg4DlwOXAS8BfwZcFRF/V4YumlWMfCebmZmBjxDMzCzJFAiSVkjqk9QvaUOR\n5RdLekLSUUmr8sr/paQn8x4/l9Selt0m6Qd5y5aWb1hmZlaqCU8ZpXduPgtcCgwAu4E1EfF0Xp2z\ngFOA3wG6ImJ7kXZOBfqBBelujtuA7xara2Zm0y/LbafLgf6I2A8gaSuwkty7OgEYvyND0tvdL70K\nuD8ifjbp3pqZWcVkCYRm4GDe/ADwoUlsazXwRwVlvy/peuB7wIaIeL1wJUnrgHUAJ5100vnnnHPO\nJDZtZla/Hn/88ZciYt5E9bIEgoqUlXRrkqQzgPOA7rzijcCPgDnkPsXxc8Cmt2woYnNaTmtra/T0\n9JSyaTOzuifph1nqZbmoPAAszJtfQO6dmKX4BHBvRBwZL4iIQ5HzOnAruVNTZmZWJVkCYTewWNIi\nSXPInfrpKnE7a4At+QXpqAFJAtqBp0ps08zMymjCQEhf4rGe3OmeZ8h9uNc+SZvyPuP9AkkDwBXA\nNyTtG18/3YG0kNynS+a7S1Iv0AucDnxp6sMxM7PJmlHvVPY1BDOz0kl6PCJaJ6rndyqbmRngj782\nm5Kdewbp7O5jaGSU+U2NdLS10L6s1K89MDsxOBDMJmnnnkE27uhl9MgYAIMjo2zc0QvgULAZyaeM\nzCaps7vvjTAYN3pkjM7uvir1yGxqHAhmkzQ0MlpSudmJzoFgNknzmxpLKjc70TkQzCapo62FxtkN\nbyprnN1AR1tLlXpkNjW+qGw2SeMXjq/bvpfDY8do9l1GNsM5EMymoH1ZM1seOwDAtmsvqnJvzKbG\np4zMzAxwIJiZWeJAMDMzwIFgZmaJA8HMzAAHgpmZJQ4EMzMDHAhmZpY4EMzMDHAgmJlZ4kAwMzPA\ngWBmZkmmQJC0QlKfpH5JG4osv1jSE5KOSlpVsGxM0pPp0ZVXvkjSo5Kek7RN0pypD8fMzCZrwkCQ\n1ADcBFwGLAHWSFpSUO0AcDVwd5EmRiNiaXpcnld+A3BjRCwGXgGumUT/zcysTLIcISwH+iNif0Qc\nBrYCK/MrRMQLEbEXOJZlo5IEfBjYnopuB9oz99rMzMouSyA0Awfz5gdSWVbvlNQj6RFJ4zv904CR\niDg6UZuS1qX1e4aHh0vYrJmZlSLLF+SoSFmUsI0zI2JI0vuBByX1Aj/J2mZEbAY2A7S2tpayXTMz\nK0GWI4QBYGHe/AJgKOsGImIo/dwPPAQsA14CmiSNB1JJbZqZWfllCYTdwOJ0V9AcYDXQNcE6AEh6\nj6S5afp04FeApyMigL8Bxu9IWgvcV2rnzcysfCYMhHSefz3QDTwD3BMR+yRtknQ5gKQLJA0AVwDf\nkLQvrf4BoEfS/yUXAF+JiKfTss8Bn5XUT+6aws3lHJiZmZUmyzUEImIXsKug7Pq86d3kTvsUrvd/\ngPOO0+Z+cncwmZnZCcDvVDYzM8CBYGZmiQPBzMwAB4KZmSWZLiqbneh27hmks7uPoZFR5jc10tHW\nQvuyUt5QP/PU45itshwINuPt3DPIxh29jB4ZA2BwZJSNO3oBanYHWY9jtsrzKSOb8Tq7+97YMY4b\nPTJGZ3dflXpUefU4Zqs8B4LNeEMjoyWV14J6HLNVngPBZrz5TY0lldeCehyzVZ4DwWa8jrYWGmc3\nvKmscXYDHW0tVepR5dXjmK3yfFHZZrzxi6jXbd/L4bFjNNfBHTf1OGarPAeC1YT2Zc1seewAANuu\nvajKvZke9ThmqyyfMjIzM8CBYGZmiQPBzMwAB4KZmSUOBDMzAxwIZmaWOBDMzAxwIJiZWeJAMDMz\nIGMgSFohqU9Sv6QNRZZfLOkJSUclrcorXyrpYUn7JO2V9Mm8ZbdJ+oGkJ9NjaXmGZGZmkzHhR1dI\nagBuAi4FBoDdkroi4um8ageAq4HfKVj9Z8BVEfGcpPnA45K6I2IkLe+IiO1THYSZmU1dls8yWg70\nR8R+AElbgZXAG4EQES+kZcfyV4yIZ/OmhyS9CMwDRjAzsxNKllNGzcDBvPmBVFYSScuBOcDzecW/\nn04l3Shp7nHWWyepR1LP8PBwqZs1M7OMsgSCipRFKRuRdAZwJ/CvI2L8KGIjcA5wAXAq8Lli60bE\n5ohojYjWefPmlbJZMzMrQZZAGAAW5s0vAIaybkDSKcBfAp+PiEfGyyPiUOS8DtxK7tSUmZlVSZZA\n2A0slrRI0hxgNdCVpfFU/17gjoj4i4JlZ6SfAtqBp0rpuJmZldeEgRARR4H1QDfwDHBPROyTtEnS\n5QCSLpA0AFwBfEPSvrT6J4CLgauL3F56l6ReoBc4HfhSWUdmZmYlyfSNaRGxC9hVUHZ93vRucqeS\nCtf7FvCt47T54ZJ6amZmFeV3KpuZGeBAMDOzxIFgZmaAA8HMzBIHgpmZAQ4EMzNLHAhmZgZkfB+C\nWRY79wzS2d3H0Mgo85sa6WhroX1ZyZ+DaCc4/51rlwPBymLnnkE27uhl9MgYAIMjo2zc0QvgnUUN\n8d+5tvmUkZVFZ3ffGzuJcaNHxujs7qtSj6wS/HeubQ4EK4uhkdGSym1m8t+5tjkQrCzmNzWWVG4z\nk//Otc2BYGXR0dZC4+yGN5U1zm6go62lSj2ySvDfubb5orKVxfgFxeu27+Xw2DGaffdJTfLfubY5\nEKxs2pc1s+WxAwBsu/aiKvfGKsV/59rlU0ZmZgY4EMzMLHEgmJkZ4EAwM7PEgWBmZkDGQJC0QlKf\npH5JG4osv1jSE5KOSlpVsGytpOfSY21e+fmSelObX5WkqQ/HzMwma8JAkNQA3ARcBiwB1khaUlDt\nAHA1cHfBuqcCXwQ+BCwHvijpPWnx14B1wOL0WDHpUZiZ2ZRlOUJYDvRHxP6IOAxsBVbmV4iIFyJi\nL3CsYN024IGIeDkiXgEeAFZIOgM4JSIejogA7gDapzoYMzObvCyB0AwczJsfSGVZHG/d5jQ9YZuS\n1knqkdQzPDyccbNmZlaqLIFQ7Nx+ZGz/eOtmbjMiNkdEa0S0zps3L+NmzcysVFkCYQBYmDe/ABjK\n2P7x1h1I05Np08zMKiBLIOwGFktaJGkOsBroyth+N/ARSe9JF5M/AnRHxCHgNUkXpruLrgLum0T/\nzcysTCYMhIg4Cqwnt3N/BrgnIvZJ2iTpcgBJF0gaAK4AviFpX1r3ZeC/kQuV3cCmVAbwaeCbQD/w\nPHB/WUdmZmYlyfRppxGxC9hVUHZ93vRu3nwKKL/eLcAtRcp7gHNL6ayZmVWO36lsZmaAA8HMzBIH\ngpmZAQ4EMzNLHAhmZgY4EMzMLHEgmJkZ4EAwM7PEgWBmZoADwczMEgeCmZkBGT/LyGaWnXsG6ezu\nY2hklPlNjXS0tdC+LOt3GpmduPzcriwHQo3ZuWeQjTt6GT0yBsDgyCgbd/QC+IVjM5qf25XnU0Y1\nprO7740XzLjRI2N0dvdVqUdm5eHnduU5EGrM0MhoSeVmM4Wf25XnQKgx85saSyo3myn83K48B0KN\n6WhroXF2w5vKGmc30NHWUqUemZWHn9uV54vKNWb84tp12/dyeOwYzb4Tw2qEn9uV50CoQe3Lmtny\n2AEAtl17UZV7Y1Y+fm5Xlk8ZmZkZ4EAwM7MkUyBIWiGpT1K/pA1Fls+VtC0tf1TSWan8SklP5j2O\nSVqalj2U2hxf9t5yDszMzEozYSBIagBuAi4DlgBrJC0pqHYN8EpEnA3cCNwAEBF3RcTSiFgKfAp4\nISKezFvvyvHlEfFiGcZjZmaTlOUIYTnQHxH7I+IwsBVYWVBnJXB7mt4OXCJJBXXWAFum0lkzM6uc\nLIHQDBzMmx9IZUXrRMRR4FXgtII6n+StgXBrOl30hSIBAoCkdZJ6JPUMDw9n6K6ZmU1GlkAotqOO\nUupI+hDws4h4Km/5lRFxHvCr6fGpYhuPiM0R0RoRrfPmzcvQXTMzm4wsgTAALMybXwAMHa+OpFnA\nu4GX85avpuDoICIG08/XgLvJnZoyM7MqyRIIu4HFkhZJmkNu595VUKcLWJumVwEPRkQASHoHcAW5\naw+kslmSTk/Ts4GPAk9hZmZVM+E7lSPiqKT1QDfQANwSEfskbQJ6IqILuBm4U1I/uSOD1XlNXAwM\nRMT+vLK5QHcKgwbgr4E/L8uIzMxsUjJ9dEVE7AJ2FZRdnzf9c3JHAcXWfQi4sKDsp8D5JfbVzMwq\nyO9UNjMzwIFgZmaJA8HMzAAHgpmZJQ4EMzMDHAhmZpY4EMzMDHAgmJlZ4kAwMzPAgWBmZokDwczM\nAAeCmZklDgQzMwMcCGZmlmT6+Gsr3c49g3R29zE0Msr8pkY62lpoX1b4VdRmNhPUy+vZgVABO/cM\nsnFHL6NHxgAYHBll445egJp8EpnVsnp6PfuUUQV0dve98eQZN3pkjM7uvir1yMwmq55ezw6EChga\nGS2p3MxOXPX0enYgVMD8psaSys3sxFVPr2cHQgV0tLXQOLvhTWWNsxvoaGupUo/MbLLq6fWcKRAk\nrZDUJ6lf0oYiy+dK2paWPyrprFR+lqRRSU+mx9fz1jlfUm9a56uSVK5BVVv7sma+/PHzmNOQ+/U2\nNzXy5Y+fV3MXoMzqQT29nie8y0hSA3ATcCkwAOyW1BURT+dVuwZ4JSLOlrQauAH4ZFr2fEQsLdL0\n14B1wCPALmAFcP+kR3KCaV/WzJbHDgCw7dqLqtwbM5uKenk9ZzlCWA70R8T+iDgMbAVWFtRZCdye\nprcDl7zdf/ySzgBOiYiHIyKAO4D2kntvZmZlkyUQmoGDefMDqaxonYg4CrwKnJaWLZK0R9LfSvrV\nvPoDE7RpZmbTKMsb04r9px8Z6xwCzoyIH0s6H9gp6YMZ28w1LK0jd2qJM888M0N3zcxsMrIcIQwA\nC/PmFwBDx6sjaRbwbuDliHg9In4MEBGPA88D/yTVXzBBm6T1NkdEa0S0zps3L0N3zcxsMrIEwm5g\nsaRFkuYAq4GugjpdwNo0vQp4MCJC0rx0URpJ7wcWA/sj4hDwmqQL07WGq4D7yjAeMzObpAlPGUXE\nUUnrgW6gAbglIvZJ2gT0REQXcDNwp6R+4GVyoQFwMbBJ0lFgDPh3EfFyWvZp4DagkdzdRTVzh5GZ\n2UyU6cPtImIXuVtD88uuz5v+OXBFkfW+DXz7OG32AOeW0lkzM6scv1PZzMwAB4KZmSUOBDMzAxwI\nZmaWOBDMzAxwIJiZWeJAMDMzwIFgZmaJA8HMzAAHgpmZJQ4EMzMDHAhmZpY4EMzMDHAgmJlZ4kAw\nMzPAgWBmZokDwczMAAeCmZklmb5CcybbuWeQzu4+hkZGmd/USEdbC+3LmqvdLTOzCU33/qumA2Hn\nnkE27uhl9MgYAIMjo2zc0QvgUDCzE1o19l81fcqos7vvjV/muNEjY3R291WpR2Zm2VRj/5UpECSt\nkNQnqV/ShiLL50ralpY/KumsVH6ppMcl9aafH85b56HU5pPp8d5yDWrc0MhoSeVmZieKauy/JgwE\nSQ3ATcBlwBJgjaQlBdWuAV6JiLOBG4EbUvlLwMci4jxgLXBnwXpXRsTS9HhxCuMoan5TY0nlZmYn\nimrsv7IcISwH+iNif0QcBrYCKwvqrARuT9PbgUskKSL2RMRQKt8HvFPS3HJ0PIuOthYaZze8qaxx\ndgMdbS3T1QUzs0mpxv4rSyA0Awfz5gdSWdE6EXEUeBU4raDOvwL2RMTreWW3ptNFX5CkYhuXtE5S\nj6Se4eHhDN39/9qXNfPlj5/HnIbcMJubGvnyx8/zBWUzO+FVY/+V5S6jYjvqKKWOpA+SO430kbzl\nV0bEoKRfAL4NfAq44y2NRGwGNgO0trYWbndC7cua2fLYAQC2XXtRqaubmVXNdO+/shwhDAAL8+YX\nAEPHqyNpFvBu4OU0vwC4F7gqIp4fXyEiBtPP14C7yZ2aMjOzKskSCLuBxZIWSZoDrAa6Cup0kbto\nDLAKeDAiQlIT8JfAxoj43+OVJc2SdHqang18FHhqakMxM7OpmDAQ0jWB9UA38AxwT0Tsk7RJ0uWp\n2s3AaZL6gc8C47emrgfOBr5QcHvpXKBb0l7gSWAQ+PNyDszMzEqT6Z3KEbEL2FVQdn3e9M+BK4qs\n9yXgS8dp9vzs3TQzs0qr6Xcqm5lZdg4EMzMDHAhmZpY4EMzMDHAgmJlZ4kAwMzPAgWBmZokDwczM\nAAeCmZklDgQzMwMcCGZmljgQzMwMcCCYmVniQDAzM8CBYGZmiQPBzMwAB4KZmSUOBDMzAxwIZmaW\nOBDMzAxwIJiZWZIpECStkNQnqV/ShiLL50ralpY/KumsvGUbU3mfpLasbZqZ2fSaMBAkNQA3AZcB\nS4A1kpYUVLsGeCUizgZuBG5I6y4BVgMfBFYAfyapIWObZmY2jWZlqLMc6I+I/QCStgIrgafz6qwE\nfjdNbwf+VJJS+daIeB34gaT+1B4Z2iybFQ/dzT8aPsgPv39KJZo/rqsP/QRg2rdbzW17zNPLY66P\n7f5o3kK49qKKbytLIDQDB/PmB4APHa9ORByV9CpwWip/pGDd5jQ9UZsASFoHrAM488wzM3T3rU49\naS7verVhUutOxbvmTP82q71tj7k+tu0xT+92Tz1p7rRsK0sgqEhZZKxzvPJip6oK28wVRmwGNgO0\ntrYWrTORld/8w8msNmXvq8pWq7ttj7k+tu0x1+Z2s1xUHgAW5s0vAIaOV0fSLODdwMtvs26WNs3M\nbBplCYTdwGJJiyTNIXeRuKugThewNk2vAh6MiEjlq9NdSIuAxcBjGds0M7NpNOEpo3RNYD3QDTQA\nt0TEPkmbgJ6I6AJuBu5MF41fJreDJ9W7h9zF4qPAb0fEGECxNss/PDMzy0q5f+RnhtbW1ujp6al2\nN8zMZhRJj0dE60T1/E5lMzMDHAhmZpY4EMzMDHAgmJlZMqMuKksaBn44ydVPB14qY3dmAo+5PnjM\ntW+q431fRMybqNKMCoSpkNST5Sp7LfGY64PHXPuma7w+ZWRmZoADwczMknoKhM3V7kAVeMz1wWOu\nfdMy3rq5hmBmZm+vno4QzMzsbTgQzMwMqINAkLRQ0t9IekbSPkmfqXafpkP67uo9kr5b7b5MB0lN\nkrZL+rv0t6789w1WmaT/lJ7TT0naIumd1e5TuUm6RdKLkp7KKztV0gOSnks/31PNPpbbccbcmZ7b\neyXdK6mpEtuu+UAg97Hb/zkiPgBcCPy2pCVV7tN0+AzwTLU7MY3+BPifEXEO8EvU+NglNQP/AWiN\niHPJfYz86ur2qiJuA1YUlG0AvhcRi4HvpflachtvHfMDwLkR8U+BZ4GNldhwzQdCRByKiCfS9Gvk\ndhTNb7/WzCZpAfCbwDer3ZfpIOkU4GJy38tBRByOiJHq9mpazAIa07cUvosa/NbBiPhf5L5jJd9K\n4PY0fTvQPq2dqrBiY46Iv4qIo2n2EXLfMll2NR8I+SSdBSwDHq1uTyruj4HrgGPV7sg0eT8wDNya\nTpN9U9JJ1e5UJUXEIPCHwAHgEPBqRPxVdXs1bX4xIg5B7h8+4L1V7s90+zfA/ZVouG4CQdLJwLeB\n/xgRP6l2fypF0keBFyPi8Wr3ZRrNAn4Z+FpELAN+Su2dRniTdN58JbAImA+cJOm3qtsrqzRJ/5Xc\nafC7KtF+XQSCpNnkwuCuiNhR7f5U2K8Al0t6AdgKfFjSt6rbpYobAAYiYvzIbzu5gKhlvw78ICKG\nI+IIsAP4Z1Xu03T5e0lnAKSfL1a5P9NC0lrgo8CVUaE3kNV8IEgSuXPLz0TEH1W7P5UWERsjYkFE\nnEXuIuODEVHT/zlGxI+Ag5JaUtEl5L7Hu5YdAC6U9K70HL+EGr+QnqcLWJum1wL3VbEv00LSCuBz\nwOUR8bNKbafmA4Hcf8yfIvef8pPp8RvV7pSV3b8H7pK0F1gK/EGV+1NR6WhoO/AE0EvutVxzH+cg\naQvwMNAiaUDSNcBXgEslPQdcmuZrxnHG/KfALwAPpH3Y1yuybX90hZmZQX0cIZiZWQYOBDMzAxwI\nZmaWOBDMzAxwIJiZWeJAMDMzwIFgZmbJ/wM2o9sA5317XwAAAABJRU5ErkJggg==\n",
188 | "text/plain": [
189 | ""
190 | ]
191 | },
192 | "metadata": {},
193 | "output_type": "display_data"
194 | }
195 | ],
196 | "source": [
197 | "x,y =zip(*[(k, v/36) for k,v in p_y.items()])\n",
198 | "plt.stem(x,y)\n",
199 | "plt.title('PMF of Y')"
200 | ]
201 | },
202 | {
203 | "cell_type": "code",
204 | "execution_count": null,
205 | "metadata": {
206 | "collapsed": true
207 | },
208 | "outputs": [],
209 | "source": []
210 | }
211 | ],
212 | "metadata": {
213 | "kernelspec": {
214 | "display_name": "Python 3",
215 | "language": "python",
216 | "name": "python3"
217 | },
218 | "language_info": {
219 | "codemirror_mode": {
220 | "name": "ipython",
221 | "version": 3
222 | },
223 | "file_extension": ".py",
224 | "mimetype": "text/x-python",
225 | "name": "python",
226 | "nbconvert_exporter": "python",
227 | "pygments_lexer": "ipython3",
228 | "version": "3.5.2"
229 | },
230 | "widgets": {
231 | "state": {},
232 | "version": "1.1.2"
233 | }
234 | },
235 | "nbformat": 4,
236 | "nbformat_minor": 2
237 | }
238 |
--------------------------------------------------------------------------------
/prob/prob_dist_discrete.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Probability Distributions - Discrete"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {
13 | "collapsed": true
14 | },
15 | "source": [
16 | "%\n",
17 | "% plain-TeX file\n",
18 | "%\n",
19 | "\\ifx\\pdfoutput\\undefined\n",
20 | "\\input mbboard\n",
21 | "\\input mathabx\n",
22 | "\\fi\n",
23 | "\\input typofrmt\n",
24 | "\\input typotabl\n",
25 | "\\useoptions{magstep1,a4,english,preprint}\n",
26 | "\n",
27 | "\\title{Probability Distributions}\n",
28 | "\\author{Anthony Phan, \\today}\n",
29 | "\\maketitle\n",
30 | "\n",
31 | "\\overfullrule=0pt\n",
32 | "\\def\\description{\\medbreak\\bgroup\n",
33 | "\t\\def\\item##1{\\medbreak\\hangindent\\parindent\\leavevmode\n",
34 | "\t\t\\hskip-\\parindent{{\\tt##1}.}\\enspace\\ignorespaces}%\n",
35 | "\t\\def\\subitem##1{\\smallbreak\\hangindent2\\parindent\\leavevmode\n",
36 | "\t\t{\\it##1.}\\enspace\\ignorespaces}%\n",
37 | "\t\\def\\subsubitem##1{\\par\\hangindent3\\parindent\\leavevmode\n",
38 | "\t\t\\hskip\\parindent{##1.}\\enspace\\ignorespaces}%\n",
39 | "\t\\let\\itemitem=\\subitem}%\n",
40 | "\\def\\enddescription{\\egroup\\medbreak}%\n",
41 | "\\def\\cs#1{\\hbox{\\tt\\string#1}}%\n",
42 | "\\def\\var#1{\\ifmmode#1\\else$#1$\\fi}% usual variable\n",
43 | "\\def\\vari#1{\\ifmmode\\hbox{\\it#1}\\else{\\it#1}\\fi}% variable in italic\n",
44 | "\\def\\vartype#1{\\ifmmode\\hbox{\\tt#1}\\else{\\tt#1}\\fi}% variable type\n",
45 | "\n",
46 | "\\section{Continuous Probability distributions\\footnote{{\\it See}\\/ \\cs{probability-distributions.h}}}\n",
47 | "\n",
48 | "\\description\n",
49 | "\n",
50 | "\\item{lnGamma(\\vartype{double} $x$)} Logarithm of the Eulerian Gamma\n",
51 | "function, $\\ln(\\Gamma(x))$.\n",
52 | "\n",
53 | "\\item{Gamma(\\vartype{double} $x$)} Eulerian Gamma function\n",
54 | "$$\n",
55 | "\t\\mathbb R_+^*\\longrightarrow[\\sqrt\n",
56 | "\t\\pi,+\\infty\\mathclose[,\\qquad x\\longmapsto \\Gamma(x)\n",
57 | "\t=\\int_0^{+\\infty}t^{x-1}\\,{\\rm e}^{-t}\\,{\\rm d}t.\n",
58 | "$$\n",
59 | "Remember that for $n\\in\\mathbb N$, $n!=\\Gamma(n+1)$.\n",
60 | "\n",
61 | "\\item{Beta(\\vartype{double} $x$, \\vartype{double} $y$)} Eulerian Beta function\n",
62 | "$$\n",
63 | "\t(\\mathbb R_+^*)^2\\longrightarrow\\mathbb R_+^*,\\qquad\n",
64 | "\tx\\longmapsto {\\rm B}(x, y)=\\int_{0}^{1}u^{x-1}(1-u)^{y-1}\\,{\\rm d}u\n",
65 | "\t={\\Gamma(x)\\times\\Gamma(y)\\over\\Gamma(x+y)}\\,.\n",
66 | "$$\n",
67 | "\n",
68 | "\n",
69 | "% Gamma distributions\n",
70 | "\n",
71 | "\\item{gammapdf(\\vartype{double} $x$, \\vartype{double} $a$, \\vartype{double} $\\lambda$)} Probability density function of\n",
72 | "the Gamma distribution with parameter $a>0$, $\\lambda>0$.\n",
73 | "$$\n",
74 | "\t\\mathbb R\\longrightarrow\\mathbb R_+,\\qquad x\\longmapsto\n",
75 | "\t\\mathbb 1_{\\mathbb R_+}(x){\\lambda^a\\over\\Gamma(a)}\\, x^{a-1}\\,{\\rm\n",
76 | "\te}^{-\\lambda x}.\n",
77 | "$$\n",
78 | "\n",
79 | "\\item{gammacdf(\\vartype{double} $x$, \\vartype{double} $a$, \\vartype{double} $\\lambda$)} Cumulative distribution function\n",
80 | "of the Gamma distribution with parameter $a>0$, $\\lambda>0$.\n",
81 | "$$\n",
82 | "\t\\mathbb R\\longrightarrow[0,1\\mathclose[,\\qquad\n",
83 | "\tx\\longmapsto\\mathbb 1_{\\mathbb R_+}(x){\\lambda^a\\over\\Gamma(a)}\n",
84 | "\t\\int_0^x t^{a-1}\\,{\\rm e}^{-\\lambda t}\\,{\\rm d}t.\n",
85 | "$$\n",
86 | "\n",
87 | "\\item{gammaicdf(\\vartype{double} $p$, \\vartype{double} $a$, \\vartype{double} $\\lambda$)} Inverse cumulative\n",
88 | "distribution function of the Gamma distribution with parameter $a>0$, $\\lambda>0$.\n",
89 | "$$\n",
90 | "\t[0,1\\mathclose[\\longrightarrow\\mathbb R_+,\\qquad\n",
91 | "\tp\\longmapsto\\mathop{\\rm gammaicdf}(p,a,\\lambda).\n",
92 | "$$\n",
93 | "It is set to $0$ for $p<\\cs{accuracy}$ and to \\cs{infinity} for\n",
94 | "$p>1-\\cs{accuracy}$.\n",
95 | "\n",
96 | "% N(0,1) distribution\n",
97 | "\n",
98 | "\\item{normlimit} Numerical parameter for normal computations: if $X$ is a\n",
99 | "random variable with law $\\mathcal N(0,1)$, the normal distribution\n",
100 | "with mean $0$ and standard deviation $1$, then $\\mathbb\n",
101 | "P\\{X\\geq\\cs{normlimit}\\}=\\mathbb\n",
102 | "P\\{X\\leq-\\cs{normlimit}\\}\\approx0$. Its value is (unreasonably) set to\n",
103 | "$\\cs{normlimit} = 10$ ($\\cs{normlimit} = 4$ should be sufficient).\n",
104 | "\n",
105 | "\\item{normalpdf(\\vartype{double} $x$)} Probability density function of\n",
106 | "$\\mathcal N(0,1)$.\n",
107 | "$$\n",
108 | "\t\\mathbb R\\longrightarrow\\mathbb R_+,\\qquad x\\longmapsto{\\rm\n",
109 | "\te}^{-x^2\\!/2}/\\sqrt{2\\pi}.\n",
110 | "$$\n",
111 | "\n",
112 | "\n",
113 | "\\item{normalcdf (\\vartype{double} $x$)} Cumulative distribution function of\n",
114 | "$\\mathcal N(0,1)$.\n",
115 | "$$\n",
116 | "\t\\mathbb R\\longrightarrow\\mathopen]0,\\mathclose1[,\\qquad\n",
117 | "\tx\\longmapsto\\Phi(x)=\\int_{-\\infty}^x{\\rm e}^{-z^2\\!/2}\\, {{\\rm\n",
118 | "\td} z\\over\\sqrt{2\\pi}}\n",
119 | "\t={1\\over2}+{1\\over\\sqrt{2\\pi}}\\sum_{n=0}^\\infty\n",
120 | "\t{(-1)^n\\,x^{2n+1}\\over (2n+1)\\times n!}\\,.\n",
121 | "$$\n",
122 | "It is computed with its associated power series when\n",
123 | "$|x|<\\cs{normlimit}$, and set to $0$ or $1$ otherwise.\n",
124 | "\n",
125 | "\\item{normalcdf\\_(\\vartype{double} $x$)} Cumulative distribution function\n",
126 | "of $\\mathcal N(0,1)$. It is just another implementation of the\n",
127 | "previous function with a Gamma cumulative distribution function since\n",
128 | "$$\n",
129 | "\t\\Phi(x)= {1\\over2}\\,\\bigl(\\mathop{\\rm sgn}(x)\n",
130 | "\t\\times\\mathop{\\rm gammacdf}(x^2\\!/2,1/2)+1,1\\bigr),\n",
131 | "\t\\qquad\\hbox{for all $x\\in\\mathbb R$.}\n",
132 | "$$\n",
133 | "\n",
134 | "\\item{normalicdf(\\vartype{double} $p$)} Inverse cumulative\n",
135 | "distribution function of $\\mathcal N(0,1)$.\n",
136 | "$$\n",
137 | "\t\\mathopen]0,1\\mathclose[\\longrightarrow\\mathbb R,\\qquad\n",
138 | "\tp\\longmapsto\\mathop{\\rm\n",
139 | "\tnormalicdf}(p)=\\Phi^{-1}(p).\n",
140 | "$$\n",
141 | "It is set to $\\pm\\cs{infinity}$ for $p$ outside of\n",
142 | "$\\mathopen]\\cs{accuracy},1-\\cs{accuracy}\\mathclose[$.\n",
143 | "Of course, there is also \\cs{normalicdf\\_}\\dots\n",
144 | "\n",
145 | "\\remark\n",
146 | "Normal distributions with mean $m$ and standard deviation $\\sigma$ are not inplemented since they an be easily derived from the standard normal distribution. For instance, one can set\n",
147 | "\\verbatim\n",
148 | "double gaussianpdf(double x, double m, double sigma){\n",
149 | " return normalpdf((x-m)/sigma)/sigma;}\n",
150 | "double gaussiancdf(double x, double m, double sigma){\n",
151 | " return normalcdf((x-m)/sigma);}\n",
152 | "double gaussianicdf(double p, double m, double sigma){\n",
153 | " return sigma*normalicdf(p)+m;}\n",
154 | "\\endverbatim\n",
155 | "in order to get the probability, cumulative, inverse cumulative distribution functions of the $\\mathcal N(m,\\sigma^2)$ distribution with $m\\in\\mathbb R$ and $\\sigma>0$.\n",
156 | "\\endremark\n",
157 | "\n",
158 | "% $\\chi^2$ distributions\n",
159 | "\n",
160 | "\\item{chisquarepdf(\\vartype{double} $x$, \\vartype{double} $\\nu$)} Probability density function\n",
161 | "of $\\chi^2(\\nu)$, the chi-square (Pearson) distribution with $\\nu>0$\n",
162 | "degrees of freedom.\n",
163 | "$$\n",
164 | "\t\\mathbb R\\longrightarrow\\mathbb R_+,\\qquad x\\longmapsto\\mathbb\n",
165 | "\t1_{\\mathbb R_+}(x) \\,{x^{\\nu/2-1}{\\rm e}^{-x/2}\\over\n",
166 | "\t2^{\\nu/2}\\,\\Gamma(\\nu/2)}\\,.\n",
167 | "$$\n",
168 | "\n",
169 | "\\item{chisquarecdf(\\vartype{double} $x$, \\vartype{double} $\\nu$)} Cumulative distribution\n",
170 | "function of $\\chi^2(\\nu)$.\n",
171 | "$$\n",
172 | "\t\\mathbb R\\longrightarrow\\mathbb R_+,\\qquad x\\longmapsto\\mathbb\n",
173 | "\t1_{\\mathbb R_+}(x) \\int_0^x\\,{t^{\\nu/2-1}{\\rm e}^{-t/2} \\,{{\\rm\n",
174 | "\td}t\\over 2^{\\nu/2}\\,\\Gamma(\\nu/2)}}\\,.\n",
175 | "$$\n",
176 | "\n",
177 | "\\item{chisquareicdf(\\vartype{double} $p$, \\vartype{double} $\\nu$)} Inverse cumulative\n",
178 | "distribution function of $\\chi^2(\\nu)$.\n",
179 | "$$\n",
180 | "\t[0,1\\mathclose[\\longrightarrow\\mathbb R_+,\\qquad\n",
181 | "\tp\\longmapsto\\mathop{\\rm chisquareicdf}(p,\\nu).\n",
182 | "$$\n",
183 | "It is set to $0$ for $p<\\cs{accuracy}$ and to \\cs{infinity} for\n",
184 | "$p>1-\\cs{accuracy}$.\n",
185 | "\n",
186 | "% Beta distributions\n",
187 | "\n",
188 | "\\item{betapdf(\\vartype{double} $x$, \\vartype{double} $a$, \\vartype{double} $b$)} Probability density function\n",
189 | "of the Beta distribution with parameters $a>à$ and $b>0$.\n",
190 | "$$\n",
191 | "\t\\mathbb R\\longrightarrow\\mathbb R_+,\\qquad x\\longmapsto\\mathbb\n",
192 | "\t1_{[0,1]}(x)\\,x^{a-1}(1-x)^{b-1}\\!/{\\rm B}(a,b).\n",
193 | "$$\n",
194 | "\n",
195 | "\\item{betacdf(\\vartype{double} $x$, \\vartype{double} $a$, \\vartype{double} $b$)} Cumulative distribution\n",
196 | "function of the Beta distribution with parameters $a>0$ and $b>0$.\n",
197 | "$$\n",
198 | "\t\\mathbb R\\longrightarrow\\mathbb R_+,\\qquad x\\longmapsto\n",
199 | "\t\\int_0^x\\mathbb 1_{[0,1]}(u)\\,u^{a-1}(1-u)^{b-1} \\,{{\\rm\n",
200 | "\td}u\\over {\\rm B}(a,b)}\\,.\n",
201 | "$$\n",
202 | "\n",
203 | "\\item{betaicdf(\\vartype{double} $p$, \\vartype{double} $a$, \\vartype{double} $b$)} Inverse cumulative\n",
204 | "distribution function of the Beta distribution with parameters $a>0$\n",
205 | "and $b>0$.\n",
206 | "$$\n",
207 | "\t[0,1]\\longrightarrow\\mathbb R_+,\\qquad\n",
208 | "\tp\\longmapsto\\mathop{\\rm betaicdf}(p,a,b).\n",
209 | "$$\n",
210 | "It is set to $0$ for $p<\\cs{accuracy}$ and to $1$ for\n",
211 | "$p>1-\\cs{accuracy}$.\n",
212 | "\n",
213 | "% Student's (T) distributions\n",
214 | "\n",
215 | "\\item{studentpdf(\\vartype{double} $x$, \\vartype{double} $\\nu$)} Probability density function of\n",
216 | "$\\mathcal T(\\nu)$, the Student distribution with $\\nu>0$ degrees of\n",
217 | "freedom.\n",
218 | "$$\n",
219 | "\t\\mathbb R\\longrightarrow\\mathbb R_+,\\qquad x\\longmapsto\n",
220 | " \t{1\\over\\sqrt{\\nu}\\,{\\rm B}(\\nu/2,1/2)} \\biggl(1+{x^2\\over\n",
221 | " \t\\nu}\\biggr)^{\\!\\!-(\\nu+1)/2} .\n",
222 | "$$\n",
223 | "\n",
224 | "\\item{studentcdf(\\vartype{double} $x$, \\vartype{double} $\\nu$)} Cumulative distribution\n",
225 | "function of $\\mathcal T(\\nu)$.\n",
226 | "$$\n",
227 | "\t\\mathbb R\\longrightarrow\\mathbb R_+,\\qquad x\\longmapsto\n",
228 | " \t\\int_{-\\infty}^x \\biggl(1+{z^2\\over \\nu}\\biggr)^{\\!\\!-(\\nu+1)/2}\n",
229 | " \t{{\\rm d}z\\over\\sqrt{\\nu}\\,{\\rm B}(\\nu/2,1/2)} .\n",
230 | "$$\n",
231 | "\\item{studenticdf(\\vartype{double} $p$, \\vartype{double} $\\nu$)} Inverse cumulative\n",
232 | "distribution function of $\\mathcal T(\\nu)$.\n",
233 | "$$\n",
234 | "\t\\mathopen]0,1\\mathclose[\\longrightarrow\\mathbb R,\\qquad\n",
235 | "\tp\\longmapsto\\mathop{\\rm studenticdf}(p,\\nu).\n",
236 | "$$\n",
237 | "It is set to $\\pm\\cs{infinity}$ for $p$ outside of\n",
238 | "$\\mathopen]\\cs{accuracy},1-\\cs{accuracy}\\mathclose[$.\n",
239 | "\n",
240 | "% Fisher's (F) distributions\n",
241 | "\n",
242 | "\\item{fisherpdf(\\vartype{double} $x$, \\vartype{double} $\\nu_1$, \\vartype{double} $\\nu_2$)} Probability density\n",
243 | "function of $\\mathcal F(\\nu_1,\\nu_2)$, the Fisher distribution with\n",
244 | "$\\nu_1>0$ and $\\nu_2>0$ degrees of freedom ($\\nu_1$ is the numerator degree\n",
245 | "of freedom, $\\nu_2$ the demominator degree of freedom).\n",
246 | "$$\n",
247 | "\t\\mathbb R\\longrightarrow\\mathbb R_+,\\qquad x\\longmapsto\n",
248 | "\t\\mathbb 1_{\\mathbb R_+}(x)\\,{(\\nu_1/\\nu_2)^{\\nu_1/2}\\,x^{\\nu_1/2-1}\n",
249 | "\t\\over {\\rm B}(\\nu_1/2,\\nu_2/2)(1+x\\times \\nu_1/\\nu_2)^{(\\nu_1+\\nu_2)/2}}.\n",
250 | "$$\n",
251 | "\n",
252 | "\\item{fishercdf(\\vartype{double} $x$, \\vartype{double} $\\nu_1$, \\vartype{double} $\\nu_2$)} Cumulative distribution\n",
253 | "function of $\\mathcal F(\\nu_1,\\nu_2)$.\n",
254 | "$$\n",
255 | "\t\\mathbb R\\longrightarrow\\mathbb R_+,\\qquad x\\longmapsto\n",
256 | "\t\\mathbb 1_{\\mathbb R_+}(x)\\int_0^x\n",
257 | "\t{(\\nu_1/\\nu_2)^{\\nu_1/2}\\,z^{\\nu_1/2-1} \\over {\\rm B}(\\nu_1/2,\\nu_2/2)(1+z\\times\n",
258 | "\t\\nu_1/\\nu_2)^{(\\nu_1+\\nu_2)/2}} \\,{\\rm d}z.\n",
259 | "$$\n",
260 | "\n",
261 | "\\item{fishericdf(\\vartype{double} $p$, \\vartype{double} $\\nu_1$, \\vartype{double} $\\nu_2$)} Inverse cumulative\n",
262 | "distribution function of $\\mathcal F(\\nu_1, \\nu_2)$.\n",
263 | "$$\n",
264 | "\t[0,1\\mathclose[\\longrightarrow\\mathbb R_+,\\qquad\n",
265 | "\tp\\longmapsto\\mathop{\\rm fishericdf}(p,\\nu_1,\\nu_2).\n",
266 | "$$\n",
267 | "It is set to $0$ for $p<\\cs{accuracy}$ and to \\cs{infinity} for\n",
268 | "$p>1-\\cs{accuracy}$.\n",
269 | "\n",
270 | "\\section{Discrete Probability distributions\\footnote{{\\it See}\\/ \\cs{probability-distributions.h}}}\n",
271 | "\n",
272 | "About quantiles, please note that $q_p=k+0.5$ when\n",
273 | "$F(k)=p$ for $k\\in\\mathbb N$.\n",
274 | "\n",
275 | "% Poisson's distributions\n",
276 | "\n",
277 | "\\item{poissonpdf(\\vartype{double} $x$, \\vartype{double} $\\lambda$)} Probability distribution\n",
278 | "function of $\\mathcal P(\\lambda)$, the Poisson distribution with\n",
279 | "parameter $\\lambda\\geq0$.\n",
280 | "$$\n",
281 | "\t\\mathbb R\\longrightarrow[0,1],\\qquad x\\longmapsto\\cases{{\\rm\n",
282 | "\te}^{-\\lambda}\\,\\lambda^x\\!/x!& if $x\\in\\mathbb N$,\\cr 0\n",
283 | "\t&otherwise.}\n",
284 | "$$\n",
285 | "\\item{poissoncdf(\\vartype{double} $x$, \\vartype{double} $\\lambda$)} Cumulative distribution\n",
286 | "function of $\\mathcal P(\\lambda)$.\n",
287 | "\n",
288 | "\\item{poissonicdf(\\vartype{double} $p$, \\vartype{double} $\\lambda$)} Inverse cumulative\n",
289 | "distribution function of $\\mathcal P(\\lambda)$.\n",
290 | "\n",
291 | "% Binomial distributions\n",
292 | "\n",
293 | "\\item{binomialpdf(\\vartype{double} $x$, \\vartype{int} $n$, \\vartype{double} $\\pi$)} Probability distribution\n",
294 | "function of $\\mathcal B(n,\\pi)$, the binomial distribution with\n",
295 | "parameters $n\\in\\mathbb N^*$ and $\\pi\\in[0,1]$.\n",
296 | "$$\n",
297 | "\t\\mathbb R\\longrightarrow[0,1],\\qquad\n",
298 | "\tx\\longmapsto\\cases{C_n^x\\,\\pi^x(1-\\pi)^{n-x}& if\n",
299 | "\t$x\\in\\{0,1,\\dots,n\\}$,\\cr 0 &otherwise.}\n",
300 | "$$\n",
301 | "\n",
302 | "\\item{binomialcdf(\\vartype{double} $x$, \\vartype{int} $n$, \\vartype{double} $\\pi$)} Cumulative distribution\n",
303 | "function of $\\mathcal B(n,\\pi)$.\n",
304 | "\n",
305 | "\\item{binomialicdf(\\vartype{double} $p$, \\vartype{int} $n$, \\vartype{double} $\\pi$)} Inverse cumulative\n",
306 | "distribution function of $\\mathcal B(n,\\pi)$.\n",
307 | "\n",
308 | "% Geometric distributions\n",
309 | "\n",
310 | "\\item{geometricpdf(\\vartype{double} $x$, \\vartype{double} $\\pi$)} Probability distribution\n",
311 | "function of $\\mathcal G(\\pi)$, the geometrical distribution with\n",
312 | "parameter $\\pi\\in[0,1]$. Describe the law of the first success rank in\n",
313 | "an infinitely repeated Bernoulli trial with parameter $\\pi\\in[0,1]$.\n",
314 | "Thus, it is given by\n",
315 | "$$\n",
316 | "\t\\mathbb R\\longrightarrow[0,1], \\qquad x\\longmapsto\n",
317 | "\t\\cases{\\pi(1-\\pi)^{x-1}& if $x\\in\\{1,2,3,\\dots\\}$,\\cr 0 &\n",
318 | "\totherwise.}\n",
319 | "$$\n",
320 | "\n",
321 | "\\item{geometriccdf(\\vartype{double} $x$, \\vartype{double} $\\pi$)} Cumulative distribution\n",
322 | "function of $\\mathcal G(\\pi)$. It returns the sum up to $x$ of the\n",
323 | "previous probabilities. Thus it is given by\n",
324 | "$$\n",
325 | "\t\\mathbb R\\longrightarrow[0,1], \\qquad x\\longmapsto\n",
326 | "\t\\cases{1-(1-\\pi)^{\\mathop{\\rm floor}x}& if $x\\geq1$,\\cr 0 &\n",
327 | "\totherwise.}\n",
328 | "$$\n",
329 | "\n",
330 | "\\item{geometricicdf(\\vartype{double} $p$, \\vartype{double} $\\pi$)} Inverse cumulative\n",
331 | "distribution function of $\\mathcal G(\\pi)$.\n",
332 | "$$\n",
333 | "\t[0,1]\\longrightarrow\\{1,1.5,2,2.5,3,3.5,\\dots\\}, \\qquad\n",
334 | "\tp\\longmapsto\\mathop{\\rm geometricicdf}(p,\\pi).\n",
335 | "$$\n",
336 | "\n",
337 | "\\item{negativebinomialpdf(\\vartype{double} $x$, \\vartype{int} $n$, \\vartype{double} $\\pi$)} Probability\n",
338 | "distribution function of the negative binomial ditribution with\n",
339 | "parameters $n\\in\\mathbb N^*$ and $\\pi\\in[0,1]$. Describe the law of the\n",
340 | "$n$-th success rank, $n\\in\\mathbb N^*$, in an infinitely repeated\n",
341 | "Bernoulli trial with parameter $\\pi\\in[0,1]$. Thus, it is given by\n",
342 | "$$\n",
343 | "\t\\mathbb R\\longrightarrow[0,1],\\qquad x\\longmapsto\n",
344 | "\t\\cases{C_{x-1}^{n-1}\\,\\pi^n(1-\\pi)^{x-n}& if\n",
345 | "\t$x\\in\\{n,n+1,n+2,\\dots\\}$,\\cr 0 & otherwise.}\n",
346 | "$$\n",
347 | "One can remark that the negative binomial distributions for $n=1$ are\n",
348 | "just the corresponding geometric ones. It is certainly better when\n",
349 | "$n=1$ to use functions related to geometric distributions instead of\n",
350 | "negative binomial distributions related ones.\n",
351 | "\n",
352 | "\\item{negativebinomialcdf(\\vartype{double} $x$, \\vartype{int} $n$, \\vartype{double} $\\pi$)} Cumulative\n",
353 | "distribution function of the negative binomial ditribution with\n",
354 | "parameters $n\\in\\mathbb N^*$ and $\\pi\\in[0,1]$. One can easily prove\n",
355 | "that it is given by\n",
356 | "$$\n",
357 | "\t\\mathbb R\\longrightarrow[0,1], \\qquad x\\longmapsto\n",
358 | "\t1-\\mathop{\\rm binomialcdf}(n-1,\\mathop{\\rm floor} x,\\pi).\n",
359 | "$$\n",
360 | "\n",
361 | "\\item{negativebinomialicdf(\\vartype{double} $p$, \\vartype{int} $n$, \\vartype{double} $\\pi$)} Inverse\n",
362 | "cumulative distribution function of the negative binomial ditribution\n",
363 | "with parameters $n\\in\\mathbb N^*$ and $\\pi\\in[0,1]$.\n",
364 | "$$\n",
365 | "\t[0,1]\\longrightarrow\\{n,n+0.5,n+1,n+1.5,\\dots\\}, \\qquad\n",
366 | "\tp\\longmapsto\\mathop{\\rm\n",
367 | "\tnegativebinomialicdf}(p,n,\\pi).\n",
368 | "$$\n",
369 | "\n",
370 | "% Hypergeometric distributions\n",
371 | "\n",
372 | "\\item{hypergeometricpdf(\\vartype{double} $x$, \\vartype{int} $N$, \\vartype{int} $n$, \\vartype{double} $\\pi$)} Probability\n",
373 | "distribution function of $\\mathcal H(N, n,\\pi)$, the\n",
374 | "hypergeometrical distribution with parameters $N\\geq n\\in\\mathbb N^*$\n",
375 | "and $\\pi\\in[0,1]$. One should have $N\\pi\\in\\mathbb N$.\n",
376 | "$$\n",
377 | "\\eqalign{\n",
378 | "\t\\mathbb R&\\longrightarrow[0,1],\\cr\n",
379 | "\tx&\\longmapsto\\cases{\\displaystyle\n",
380 | "\t{C_{N\\pi}^x\\,C_{N(1-\\pi)}^{n-x}\\over C_N^n}& if\n",
381 | "\t$x\\in\\{\\max(0,n-N(1-\\pi)),\\dots,\\min(n,N\\pi)\\}$,\\cr 0 &otherwise.}\n",
382 | "\t}\n",
383 | "$$\n",
384 | "\n",
385 | "\\item{hypergeometriccdf(\\vartype{double} $x$, \\vartype{int} $N$, \\vartype{int} $n$, \\vartype{double} $\\pi$)} Cumulative\n",
386 | "distribution function of $\\mathcal H(N, n,\\pi)$.\n",
387 | "\n",
388 | "\\item{hypergeometricicdf(\\vartype{double} $p$, \\vartype{int} $N$, \\vartype{int} $n$, \\vartype{double} $\\pi$)} Inverse\n",
389 | "cumulative distribution function of $\\mathcal H(N,n,\\pi)$.\n",
390 | "\n",
391 | "\\section{Rather specific Probability distributions\\footnote{{\\it See}\\/ \\cs{probability-distributions.h}}}\n",
392 | "\n",
393 | "\\item{Kolmogorovcdf(\\vartype{double} $x$, \\vartype{int} $n$)} Cumulative\n",
394 | "distribution function of Kolmogorov distributions.\n",
395 | "These are the famous probability distributions involved in\n",
396 | "Kolmogorov--Smirnov (two-sided) Goodness of Fit tests\n",
397 | "with statistic\n",
398 | "$$\n",
399 | "\tK_n={\\|F-F_n\\|}_\\infty=\\sup_{x\\in\\mathbb R}\\bigl|F(x)-F_n(x)\\bigr|\n",
400 | "\t=\\max\\nolimits_{i = 1}^n \\bigl(F(X_{(i)})-(i-1)/n\\bigr)\n",
401 | "\t\\vee\\bigl(i/n-F(X_{(i)})\\bigr).\n",
402 | "$$\n",
403 | "Their\n",
404 | "computation is based on ``Evaluating Kolmogorov's Distribution''\n",
405 | "by George Marsa\\-glia and Wai Wan Tsang.\n",
406 | "\n",
407 | "\\item{kolmogorovicdf(\\vartype{double} $p$, \\vartype{int} $n$)} Inverse\n",
408 | "cumulative distribution function of Kolmogorov distributions.\n",
409 | "(Please do not use it since it is based on the bisection\n",
410 | "method or dichotomy.)\n",
411 | "\n",
412 | "\\item{klmcdf(\\vartype{double} $x$, \\vartype{int} $n$)}\n",
413 | "Cumulative distribution function of the limiting distributions\n",
414 | "associated to Kolmogorov distribution by Dudley's asymptotic\n",
415 | "formula (1964):\n",
416 | "$$\n",
417 | "\t\\lim_{n\\to\\infty}\\mathbb P\\bigl\\{K_n\\leq u/\\!\\sqrt n\\bigr\\}\n",
418 | "\t=1+2\\sum_{k=1}^\\infty(-1)^k\\exp\\bigl(-2k^2u^2\\bigr),\n",
419 | "$$\n",
420 | "with some numerical adjustments (Stephens M.A., 1970).\n",
421 | "\n",
422 | "\\item{klmicdf(\\vartype{double} $p$, \\vartype{int} $n$)} Inverse\n",
423 | "cumulative distribution function of the previous distributions.\n",
424 | "\n",
425 | "\n",
426 | "\\item{kpmcdf(\\vartype{double} $x$, \\vartype{int} $n$)}\n",
427 | "Cumulative distribution function of the distribution involved in the\n",
428 | "one sided Goodness of Fit tests with statistic\n",
429 | "$$\n",
430 | "\tK^+_n=\\sup_{x\\in\\mathbb R}\\bigl(F(x)-F_n(x)\\bigr)\n",
431 | "\t=\\max\\nolimits_{1\\leq i\\leq n}\n",
432 | "\t\\Bigl({i\\over n}-F\\bigl(X_{(i)}\\bigr)\\Bigr)\n",
433 | "$$\n",
434 | "or\n",
435 | "$$\n",
436 | "\tK^-_n=\\sup_{x\\in\\mathbb R}\\bigl(F_n(x)-F(x)\\bigr)\n",
437 | "\t=\\max\\nolimits_{1\\leq i\\leq n}\n",
438 | "\t\\Bigl(F\\bigl(X_{(i)}\\bigr)-{i-1\\over n}\\Bigr)\n",
439 | "$$\n",
440 | "which share the same distribution: for $x\\in[0,1]$,\n",
441 | "$$\n",
442 | "\\eqalign{\n",
443 | " \\hbox{\\tt kpmcdf}(x,n)\n",
444 | " =\\Proba\\bigl\\{K^\\pm_n\\leq x\\bigr\\}\n",
445 | "\t&=x\\sum_{0\\leq k\\leq nx}{n\\choose k}(k/n-x)^k(x+1-k/n)^{n-k-1}\\cr\n",
446 | " &=1-x\\sum_{nx\n",
46 | " "
47 | ],
48 | "text/plain": [
49 | "\n",
50 | " \n",
57 | " "
58 | ]
59 | },
60 | "execution_count": 2,
61 | "metadata": {},
62 | "output_type": "execute_result"
63 | }
64 | ],
65 | "source": [
66 | "from IPython.display import IFrame\n",
67 | "IFrame('https://www.baidu.com', width='100%', height=350)"
68 | ]
69 | },
70 | {
71 | "cell_type": "code",
72 | "execution_count": 9,
73 | "metadata": {
74 | "collapsed": true
75 | },
76 | "outputs": [],
77 | "source": [
78 | "from IPython.display import HTML\n",
79 | "js_script_str = '''\n",
91 | "'''"
92 | ]
93 | },
94 | {
95 | "cell_type": "code",
96 | "execution_count": 10,
97 | "metadata": {},
98 | "outputs": [
99 | {
100 | "data": {
101 | "text/html": [
102 | "\n",
114 | ""
115 | ],
116 | "text/plain": [
117 | "\n",
129 | ""
130 | ]
131 | },
132 | "execution_count": 10,
133 | "metadata": {},
134 | "output_type": "execute_result"
135 | }
136 | ],
137 | "source": [
138 | "HTML(js_script_str.replace('{_}', 'content to hide'))"
139 | ]
140 | },
141 | {
142 | "cell_type": "code",
143 | "execution_count": null,
144 | "metadata": {
145 | "collapsed": true
146 | },
147 | "outputs": [],
148 | "source": []
149 | }
150 | ],
151 | "metadata": {
152 | "kernelspec": {
153 | "display_name": "Python 3",
154 | "language": "python",
155 | "name": "python3"
156 | },
157 | "language_info": {
158 | "codemirror_mode": {
159 | "name": "ipython",
160 | "version": 3
161 | },
162 | "file_extension": ".py",
163 | "mimetype": "text/x-python",
164 | "name": "python",
165 | "nbconvert_exporter": "python",
166 | "pygments_lexer": "ipython3",
167 | "version": "3.5.2"
168 | },
169 | "widgets": {
170 | "state": {},
171 | "version": "1.1.2"
172 | }
173 | },
174 | "nbformat": 4,
175 | "nbformat_minor": 2
176 | }
177 |
--------------------------------------------------------------------------------
/test/test_notebook2.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "This note book is for testing out basic functionalities of nbviewer rendering of equations and code blocks."
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "Let me try an equation\n",
15 | "\n",
16 | "$$\\frac{1}{N} \\sum_{i=1}^N Z_i \\rightarrow E[ Z ], \\;\\;\\; N \\rightarrow \\infty.$$\n",
17 | "\n",
18 | "\\begin{eqnarray}\n",
19 | "\\nabla \\times \\vec{\\mathbf{B}} -\\, \\frac1c\\, \\frac{\\partial\\vec{\\mathbf{E}}}{\\partial t} & = \\frac{4\\pi}{c}\\vec{\\mathbf{j}} \\\\\n",
20 | "\\nabla \\cdot \\vec{\\mathbf{E}} & = 4 \\pi \\rho \\\\\n",
21 | "\\nabla \\times \\vec{\\mathbf{E}}\\, +\\, \\frac1c\\, \\frac{\\partial\\vec{\\mathbf{B}}}{\\partial t} & = \\vec{\\mathbf{0}} \\\\\n",
22 | "\\nabla \\cdot \\vec{\\mathbf{B}} & = 0 \n",
23 | "\\end{eqnarray}"
24 | ]
25 | },
26 | {
27 | "cell_type": "code",
28 | "execution_count": 7,
29 | "metadata": {},
30 | "outputs": [
31 | {
32 | "data": {
33 | "text/html": [
34 | "\n",
35 | " \n",
42 | " "
43 | ],
44 | "text/plain": [
45 | ""
46 | ]
47 | },
48 | "execution_count": 7,
49 | "metadata": {},
50 | "output_type": "execute_result"
51 | }
52 | ],
53 | "source": [
54 | "from IPython.display import IFrame\n",
55 | "IFrame('https://www.baidu.com', width='100%', height=350)"
56 | ]
57 | },
58 | {
59 | "cell_type": "code",
60 | "execution_count": null,
61 | "metadata": {
62 | "collapsed": true
63 | },
64 | "outputs": [],
65 | "source": []
66 | }
67 | ],
68 | "metadata": {
69 | "kernelspec": {
70 | "display_name": "Python 2",
71 | "language": "python",
72 | "name": "python2"
73 | },
74 | "language_info": {
75 | "codemirror_mode": {
76 | "name": "ipython",
77 | "version": 2
78 | },
79 | "file_extension": ".py",
80 | "mimetype": "text/x-python",
81 | "name": "python",
82 | "nbconvert_exporter": "python",
83 | "pygments_lexer": "ipython2",
84 | "version": "2.7.13"
85 | }
86 | },
87 | "nbformat": 4,
88 | "nbformat_minor": 2
89 | }
90 |
--------------------------------------------------------------------------------