├── .gitignore
├── README.md
├── _config.yml
├── cs229.png
├── index.html
├── mit6006.jpg
├── prob
│   ├── central_limit_theorem.ipynb
│   ├── poisson_paradigm.ipynb
│   ├── prob_concepts.ipynb
│   ├── prob_dist_discrete.ipynb
│   └── rv.png
├── stats110.jpg
└── test
    ├── readme.md
    ├── test_notebook.ipynb
    └── test_notebook2.ipynb
/.gitignore: -------------------------------------------------------------------------------- 1 | # Created by .ignore support plugin (hsz.mobi) 2 | ### Python template 3 | # Byte-compiled / optimized / DLL files 4 | __pycache__/ 5 | *.py[cod] 6 | *$py.class 7 | 8 | # C extensions 9 | *.so 10 | 11 | # Distribution / packaging 12 | .Python 13 | env/ 14 | build/ 15 | develop-eggs/ 16 | dist/ 17 | downloads/ 18 | eggs/ 19 | .eggs/ 20 | lib/ 21 | lib64/ 22 | parts/ 23 | sdist/ 24 | var/ 25 | wheels/ 26 | *.egg-info/ 27 | .installed.cfg 28 | *.egg 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .coverage 44 | .coverage.* 45 | .cache 46 | nosetests.xml 47 | coverage.xml 48 | *,cover 49 | .hypothesis/ 50 | 51 | # Translations 52 | *.mo 53 | *.pot 54 | 55 | # Django stuff: 56 | *.log 57 | local_settings.py 58 | 59 | # Flask stuff: 60 | instance/ 61 | .webassets-cache 62 | 63 | # Scrapy stuff: 64 | .scrapy 65 | 66 | # Sphinx documentation 67 | docs/_build/ 68 | 69 | # PyBuilder 70 | target/ 71 | 72 | # Jupyter Notebook 73 | .ipynb_checkpoints 74 | 75 | # pyenv 76 | .python-version 77 | 78 | # celery beat schedule file 79 | celerybeat-schedule 80 | 81 | # SageMath parsed files 82 | *.sage.py 83 | 84 | # dotenv 85 | .env 86 | 87 | # virtualenv 88 | .venv 89 | venv/ 90 | ENV/ 91 | 92 | # Spyder project settings 93 | .spyderproject 94 | 95 | # Rope project settings 96 | .ropeproject 97 | 98 | /.idea/ 99 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Preparations for DS/AI/ML/Quant 2 | 3 | ## What is this 4 | 5 | A short list of resources and topics covering the essential quantitative tools for data scientists, AI/machine learning practitioners, quant developers/researchers and those who are preparing to interview for these roles. 6 | 7 | At a high level, we can divide things into 3 main areas: 8 | 9 | 1. Machine Learning 10 | 2. Coding 11 | 3. Math (calculus, linear algebra, probability, etc.) 12 | 13 | Depending on the type of role, the emphasis can be quite different. For example, AI/ML interviews might go deeper into the latest deep learning models, while quant interviews might cast a wide net on various kinds of math puzzles.
Interviews for research-oriented roles might be lighter on coding problems, or at least emphasize algorithms rather than software design or tooling. 14 | 15 | 16 | ## List of resources 17 | 18 | A minimalist list of the best/most practical ones: 19 | 20 | ![](cs229.png) 21 | ![](mit6006.jpg) 22 | ![](stats110.jpg) 23 | 24 | Machine Learning: 25 | 26 | - Course on classic ML: Andrew Ng's CS229 (there are several different versions; [the Coursera one](https://www.coursera.org/learn/machine-learning) is easily accessible. I used this [older version](https://www.youtube.com/playlist?list=PLA89DCFA6ADACE599)) 27 | - Book on classic ML: Alpaydin's Intro to ML [link](https://www.amazon.com/Introduction-Machine-Learning-Adaptive-Computation/dp/026201243X/ref=la_B001KD8D4G_1_2?s=books&ie=UTF8&qid=1525554938&sr=1-2) 28 | - Course with a deep learning focus: [CS231n](http://cs231n.stanford.edu/) from Stanford, lectures available on YouTube 29 | - Book on deep learning: [Deep Learning](https://www.deeplearningbook.org/) by Ian Goodfellow et al. 30 | - Book on deep learning NLP: Yoav Goldberg's [Neural Network Methods for Natural Language Processing](https://www.amazon.com/Language-Processing-Synthesis-Lectures-Technologies-ebook/dp/B071FGKZMH) 31 | - Hands-on exercises on deep learning: PyTorch and MXNet/Gluon are easier to pick up than TensorFlow. For any one of them, you can find plenty of hands-on examples online.
My biased recommendation is [https://d2l.ai/](https://d2l.ai/) using MXNet/Gluon, created by people at Amazon (it grew out of [mxnet-the-straight-dope](https://github.com/zackchase/mxnet-the-straight-dope)) 32 | 33 | 34 | Coding: 35 | 36 | - Course: MIT OCW 6.006 [link](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/) 37 | - Book: Cracking the Coding Interview [link](https://www.amazon.com/Cracking-Coding-Interview-Programming-Questions/dp/098478280X) 38 | - SQL tutorial: from [Mode Analytics](https://community.modeanalytics.com/sql/) 39 | - Practice sites: [LeetCode](https://leetcode.com/), [HackerRank](https://www.hackerrank.com/) 40 | 41 | 42 | Math: 43 | 44 | - Calculus and Linear Algebra: an undergrad class would be best; refresher notes from CS229 [link](http://cs229.stanford.edu/section/cs229-linalg.pdf) 45 | - Probability: Harvard Stat 110 [link](https://projects.iq.harvard.edu/stat110/home); [book](https://www.amazon.com/Introduction-Probability-Chapman-Statistical-Science/dp/1466575573/ref=pd_lpo_sbs_14_t_2?_encoding=UTF8&psc=1&refRID=5W11QQ7WW4DFE0Q89N7V) by the same professor 46 | - Statistics: Schaum's Outline [link](https://www.amazon.com/Schaums-Outline-Statistics-5th-Outlines/dp/0071822526) 47 | - Numerical Methods and Optimization: these are really two different topics; college courses are probably the best bet. I have yet to find good online courses for them. But don't worry, most interviews won't really touch on them. 48 | 49 | 50 | 51 | ## List of topics 52 | 53 | Here is a list of topics from which interview questions are often derived. The depth and trickiness of the questions certainly depend on the role and the company. 54 | 55 | Under each topic I try to add a few bullet points on the key things you should know. 56 | 57 | ### Machine learning 58 | - Models (roughly in decreasing order of frequency) 59 | - Linear regression 60 | - e.g.
assumptions, multicollinearity, derive from scratch in linear algebra form 61 | - Logistic regression 62 | - be able to write out everything from scratch: from defining a classification problem to the gradient updates 63 | - Decision trees/forests 64 | - e.g. how a tree/forest grows, at a pseudocode level 65 | - Clustering algorithms 66 | - e.g. K-means, agglomerative clustering 67 | - SVM 68 | - e.g. margin-based loss objectives, how support vectors are used, the primal-dual problem 69 | - Generative vs discriminative models 70 | - e.g. Gaussian mixture, Naive Bayes 71 | - Anomaly/outlier detection algorithms (DBSCAN, LOF etc) 72 | - Matrix factorization based models 73 | - Training methods 74 | - Gradient descent, SGD and other popular variants 75 | - Understand momentum, how these methods work, and the differences between the popular ones (RMSProp, Adagrad, Adadelta, Adam etc) 76 | - Bonus point: when should momentum not be used? 77 | - EM algorithm 78 | - Andrew's [lecture notes](http://cs229.stanford.edu/notes/cs229-notes8.pdf) are great, also see [this](https://dingran.github.io/EM/) 79 | - Gradient boosting 80 | - Learning theory / best practice (see Andrew's advice [slides](http://cs229.stanford.edu/materials/ML-advice.pdf)) 81 | - Bias vs variance, regularization 82 | - Feature selection 83 | - Model validation 84 | - Model metrics 85 | - Ensemble methods, boosting, bagging, bootstrapping 86 | - Generic topics on deep learning 87 | - Feedforward networks 88 | - Backpropagation and computation graphs 89 | - I really liked the [miniflow](https://gist.github.com/dingran/154a524003c86ecab4a949c538afa766) project Udacity developed 90 | - In addition, be absolutely familiar with taking derivatives with matrices and vectors; see [Vector, Matrix, and Tensor Derivatives](http://cs231n.stanford.edu/vecDerivs.pdf) by Erik Learned-Miller and [Backpropagation for a Linear Layer](http://cs231n.stanford.edu/handouts/linear-backprop.pdf) by Justin Johnson 91 | - CNN, RNN/LSTM/GRU 92 | -
Regularization in NN, dropout, batch normalization 93 | 94 | ### Coding essentials 95 | The bare minimum of coding concepts you need to know well. 96 | 97 | - Data structures: 98 | - array, dict, linked list, tree, heap, graph, ways of representing sparse matrices 99 | - Sorting algorithms: 100 | - see [this](https://brilliant.org/wiki/sorting-algorithms/) from brilliant.org 101 | - Tree/graph related algorithms 102 | - Traversal (BFS, DFS) 103 | - Shortest path (two-sided BFS, Dijkstra) 104 | - Recursion and dynamic programming 105 | 106 | ### Calculus 107 | 108 | Just to spell things out 109 | 110 | - Derivatives 111 | - Product rule, chain rule, power rule, L'Hospital's rule 112 | - Partial and total derivatives 113 | - Things worth remembering 114 | - common functions' derivatives 115 | - limits and approximations 116 | - Applications of derivatives: e.g. [this](https://math.stackexchange.com/questions/1619911/why-ex-is-always-greater-than-xe) 117 | - Integration 118 | - Power rule, integration by substitution, integration by parts 119 | - Change of coordinates 120 | - Taylor expansion 121 | - Single and multiple variables 122 | - Taylor/Maclaurin series for common functions 123 | - Derive Newton-Raphson 124 | - ODEs, PDEs (common ways to solve them analytically) 125 | 126 | 127 | ### Linear algebra 128 | - Vector and matrix multiplication 129 | - Matrix operations (transpose, determinant, inverse etc) 130 | - Types of matrices (symmetric, Hermitian, orthogonal etc) and their properties 131 | - Eigenvalues and eigenvectors 132 | - Matrix calculus (gradients, Hessian etc) 133 | - Useful theorems 134 | - Matrix decomposition 135 | - Concrete applications in ML and optimization 136 | 137 | 138 | ### Probability 139 | 140 | Solving probability interview questions is really all about pattern recognition.
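A quick way to build that pattern recognition is to sanity-check puzzle answers with a short simulation before trusting them. A minimal sketch using only the standard library (the two-dice question here is my own illustrative example, not one from the lists above):

```python
import random

# Monte Carlo sanity check: P(sum of two fair dice == 7) should be 6/36 = 1/6.
random.seed(0)  # fixed seed so the estimate is reproducible
trials = 100_000
hits = sum(random.randint(1, 6) + random.randint(1, 6) == 7 for _ in range(trials))
print(hits / trials)  # close to 1/6 ≈ 0.1667
```

With 100k trials the standard error is about 0.001, so the estimate reliably agrees with the analytical answer to two decimal places.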
To do well, do plenty of exercises from [this](https://www.amazon.com/Introduction-Probability-Chapman-Statistical-Science/dp/1466575573/ref=pd_lpo_sbs_14_t_2?_encoding=UTF8&psc=1&refRID=5W11QQ7WW4DFE0Q89N7V) and [this](https://www.amazon.com/Practical-Guide-Quantitative-Finance-Interviews/dp/1438236662). This topic is particularly heavy in quant interviews and usually quite light in ML/AI/DS interviews. 141 | 142 | - Basic concepts 143 | - Event, outcome, random variable, probability and probability distributions 144 | - Combinatorics 145 | - Permutations 146 | - Combinations 147 | - Inclusion-exclusion 148 | - Conditional probability 149 | - Bayes' rule 150 | - Law of total probability 151 | - Probability distributions 152 | - Expectation and variance equations 153 | - Discrete probability and stories 154 | - Continuous probability: uniform, Gaussian, Poisson 155 | - Expectations, variance, and covariance 156 | - Linearity of expectation 157 | - solving problems with this theorem and symmetry 158 | - Law of total expectation 159 | - Covariance and correlation 160 | - Independence implies zero correlation 161 | - Hash collision probability 162 | - Universality of the Uniform distribution 163 | - Proof 164 | - Circle problem 165 | - Order statistics 166 | - Expectation of the min and max of random variables 167 | - Graph-based solutions involving multiple random variables 168 | - e.g.
breaking sticks, meeting at the train station, frog jump (simplex) 169 | - Approximation method: Central Limit Theorem 170 | - Definition, examples (unfair coins, Monte Carlo integration) 171 | - [Example question](https://github.com/dingran/quant-notes/blob/master/prob/central_limit_theorem.ipynb) 172 | - Approximation method: Poisson Paradigm 173 | - Definition, examples (duplicated draws, near-birthday problem) 174 | - Poisson count/time duality 175 | - Poisson from Poissons 176 | - Markov chain tricks 177 | - Various games, introduction of martingales 178 | 179 | ### Statistics 180 | - Z-score, p-value 181 | - t-test, F-test, chi-squared test (know when to use which) 182 | - Sampling methods 183 | - AIC, BIC 184 | 185 | ### [Optional] Numerical methods and optimization 186 | - Computer errors (e.g. floating point) 187 | - Root finding (Newton's method, bisection, secant etc) 188 | - Interpolation 189 | - Numerical integration and differentiation 190 | - Numerical linear algebra 191 | - Solving linear equations, direct methods (understand the complexities here) and iterative methods (e.g. conjugate gradient) 192 | - Matrix decompositions/transformations (e.g. QR, LU, SVD etc) 193 | - Eigenvalue algorithms (e.g.
power iteration, Arnoldi/Lanczos etc) 194 | - ODE solvers (explicit, implicit) 195 | - Finite-difference method, finite-element method 196 | - Optimization topics: TBA 197 | 198 | 199 | 200 | -------------------------------------------------------------------------------- /_config.yml: -------------------------------------------------------------------------------- 1 | theme: jekyll-theme-cayman -------------------------------------------------------------------------------- /cs229.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dingran/quant-notes/b14bcc686425fef4841c254753386a2b7e652913/cs229.png -------------------------------------------------------------------------------- /index.html: -------------------------------------------------------------------------------- 1 | hello world 2 | -------------------------------------------------------------------------------- /mit6006.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/dingran/quant-notes/b14bcc686425fef4841c254753386a2b7e652913/mit6006.jpg -------------------------------------------------------------------------------- /prob/central_limit_theorem.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Central Limit Theorem (CLT)\n", 8 | "\n", 9 | "## Definition:\n", 10 | "Let $X_{1}$, $X_{2}$, $X_{3}$,... be i.i.d random variables from some distribution with finite mean $\\mu$ and finite variance $\\sigma^{2}$. 
\n", 11 | "\n", 12 | "As $n \\rightarrow \\infty$, let $S=\\sum_{i=1}^n X_{i}$; we have $S \\rightarrow \\mathcal{N}(n\\mu, n\\sigma^{2})$ and $\\frac{S-n\\mu}{\\sqrt{n\\sigma^{2}}} \\rightarrow \\mathcal{N}(0,1)$\n", 13 | "\n", 14 | "Equivalently, let $M=\\frac{1}{n}\\sum_{i=1}^n X_{i}$; we have\n", 15 | "$M \\rightarrow \\mathcal{N}(\\mu,\\frac{\\sigma^2}{n})$ and $\\frac{M-\\mu}{\\sqrt{\\frac{\\sigma^2}{n}}} \\rightarrow \\mathcal{N}(0,1)$\n", 16 | "\n", 17 | "\n", 18 | "Notation:\n", 19 | " - $\\mathcal{N}(\\mu,\\sigma^2)$ denotes the [Normal distribution](https://en.wikipedia.org/wiki/Normal_distribution) with mean $\\mu$ and variance $\\sigma^2$." 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "## Discussions:\n", 27 | "\n", 28 | "Naturally, the CLT appears in questions that involve sums or averages of a large number of random variables, especially when the question only asks for an approximate answer. \n", 29 | "\n", 30 | "Here are a few examples." 31 | ] 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "metadata": { 36 | "collapsed": true 37 | }, 38 | "source": [ 39 | "
\n", 40 | "***Example 1:***\n", 41 | "\n", 42 | "Suppose we have a fair coin and we flip it 400 times. What is the probability you will see 210 heads or more?\n", 43 | "\n", 44 | "---\n", 45 | "\n", 46 | "
\n", 47 | "**Exact answer**\n", 48 | "\n", 49 | "Let the outcome of each coin flip be a random variable $I_{i}$. Thus we are dealing with the random variable $S=\\sum_{i=1}^{400}I_{i}$. $S$ is the sum of a series of i.i.d. Bernoulli trials, thus it follows a Binomial distribution. So the exact answer is: $P(S\\geq210)= \\sum_{k=210}^{400}C_{400}^{k}\\left(\\frac{1}{2}\\right)^{400}$, which requires a program to calculate (actually try implementing this, beware of roundoff errors, and compare it against the approximate answer below).\n", 50 | "\n", 51 | "\n", 52 | "Notation:\n", 53 | " - $C_{n}^{k}$ is the notation for \"[n choose k](https://en.wikipedia.org/wiki/Binomial_coefficient)\", which denotes the number of ways to choose k items from n items where order doesn't matter.\n", 54 | "\n", 55 | "
\n", 56 | "**Approximation**\n", 57 | "\n", 58 | "We can use the CLT to get an approximate answer quickly. First recognize that for each $I_{i}$ we have $\\mu=0.5$ and $\\sigma^2=0.5\\times(1-0.5)=0.25$. Then $Z=\\frac{S-400\\times0.5}{\\sqrt{400\\times0.25}}=\\frac{S-200}{10}$ is approximately $\\mathcal{N}(0,1)$. For $S \\geq 210$, we have $Z\\geq1$. \n", 59 | "\n", 60 | "The 68-95-99.7 rule tells us that for a standard Normal distribution $\\mathcal{N}(0,1)$, the probability of the random variable taking a value more than 1 standard deviation away from the center is $1-0.68=0.32$, and thus the one-sided probability is $P(Z\\geq1) = 0.32/2 = 0.16$." 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "
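The exact computation suggested in Example 1 can be sketched in a few lines. One way to sidestep the roundoff concern (a sketch, not necessarily the approach the author had in mind) is to sum the binomial coefficients as exact integers first and divide only once at the end:

```python
from math import comb

# Exact tail probability P(S >= 210) for S ~ Binomial(400, 1/2).
# Python's arbitrary-precision integers make the sum exact; dividing by
# 2**400 once at the end avoids accumulating per-term roundoff error.
n, threshold = 400, 210
tail = sum(comb(n, k) for k in range(threshold, n + 1))
p_exact = tail / 2**n
print(p_exact)  # about 0.17, close to the cruder CLT estimate of 0.16
```

The small gap versus 0.16 comes from the continuity correction the CLT approximation ignores (using $S \geq 209.5$ instead of $S \geq 210$ brings the normal approximation much closer).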
\n", 68 | "***Example 2:***\n", 69 | "\n", 70 | "Suppose you use Monte Carlo simulation to estimate the numerical value of $\\pi$.\n", 71 | "- How would you implement it? \n", 72 | "- If we require an error of 0.001, how many trials do you need?\n", 73 | "\n", 74 | "---\n", 75 | "\n", 76 | "**Solution**\n", 77 | "\n", 78 | "One possible implementation is to start with a rectangle, say $x \\in [-1,1], y\\in[-1,1]$. If we uniformly randomly draw a point from this rectangle, the probability $p$ of the point falling into the circle region $x^2+y^2\\lt1$ is the ratio of the areas of the circle and the rectangle, i.e. $p=\\frac{\\pi}{4}$.\n", 79 | "\n", 80 | "Formally, let the random indicator variable $I$ take value 1 if the point falls in the circle and 0 otherwise; then $P(I=1)=p$ and $E(I)=p$. If we do $n$ such trials, and define $M=\\frac{1}{n}\\sum_{k=1}^n I_{k}$, then $M$ follows approximately $\\mathcal{N}(\\mu_{I},\\frac{\\sigma_{I}^2}{n})$. In this setup, $\\mu_{I}=E(I)=p$ and $\\sigma_{I}^2=p(1-p)$ (see the [Probability Distribution](prob-dist-discrete.ipynb) section for details on $\\sigma_{I}^2$).\n", 81 | "\n", 82 | "One thing we need to clarify with the interviewer is what the error really means. She might tell you to consider it as the standard deviation of the estimated $\\pi$. Since our estimate of $\\pi$ is $4M$, the specified error translates into a required sigma of $\\sigma_{req}=\\frac{error}{4}$ for the random variable $M$.
Thus $n = \\frac{\\sigma_{I}^2}{\\sigma_{req}^2}=\\frac{p(1-p)}{(0.00025)^2}\\approx2.7\\times 10^6$.\n", 83 | "\n", 84 | "By the way, we can see that the number of trials $n$ scales with $\\frac{1}{error^2}$, which is caused by the $\\frac{1}{\\sqrt{n}}$ scaling of $\\sigma_{M}$ in the CLT, and is generally the computational complexity entailed by [Monte Carlo integration](https://en.wikipedia.org/wiki/Monte_Carlo_integration).\n" 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": {}, 90 | "source": [] 91 | } 92 | ], 93 | "metadata": { 94 | "kernelspec": { 95 | "display_name": "Python 2", 96 | "language": "python", 97 | "name": "python2" 98 | }, 99 | "language_info": { 100 | "codemirror_mode": { 101 | "name": "ipython", 102 | "version": 2 103 | }, 104 | "file_extension": ".py", 105 | "mimetype": "text/x-python", 106 | "name": "python", 107 | "nbconvert_exporter": "python", 108 | "pygments_lexer": "ipython2", 109 | "version": "2.7.13" 110 | } 111 | }, 112 | "nbformat": 4, 113 | "nbformat_minor": 1 114 | } 115 | -------------------------------------------------------------------------------- /prob/poisson_paradigm.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "collapsed": true 7 | }, 8 | "source": [ 9 | "# Poisson Paradigm\n", 10 | " \n", 11 | " - birthday problem, birthday triplets, near birthday problem\n", 12 | " - repeated draws" 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": null, 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [] 21 | } 22 | ], 23 | "metadata": { 24 | "kernelspec": { 25 | "display_name": "Python 2", 26 | "language": "python", 27 | "name": "python2" 28 | }, 29 | "language_info": { 30 | "codemirror_mode": { 31 | "name": "ipython", 32 | "version": 2 33 | }, 34 | "file_extension": ".py", 35 | "mimetype": "text/x-python", 36 | "name": "python", 37 | "nbconvert_exporter":
"python", 38 | "pygments_lexer": "ipython2", 39 | "version": "2.7.6" 40 | } 41 | }, 42 | "nbformat": 4, 43 | "nbformat_minor": 0 44 | } 45 | -------------------------------------------------------------------------------- /prob/prob_concepts.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Probability concepts\n", 8 | "\n", 9 | "In this notebook we will go over some essential concepts in probability, such as **events, random variables, probability and probability distributions**. They seem simple, but they can be quite confusing to someone new to probability.\n", 10 | "\n" 11 | ] 12 | }, 13 | { 14 | "cell_type": "markdown", 15 | "metadata": {}, 16 | "source": [ 17 | "## Sample space, outcomes, events\n", 18 | "There are things in life that are a bit *random*, in the sense that we do not know the outcome with certainty before it occurs, so we reason about the uncertainties with the tools of probability. \n", 19 | "\n", 20 | "For an experiment with uncertain outcomes, we denote all possible outcomes as a *set* $S$, and call it the *sample space*. The actual outcome(s) will belong to this set.\n", 21 | "\n", 22 | "An *event* $E$ is a subset of $S$ (i.e. $E \\subseteq S$) and we would say event $E$ *occurred* if the actual outcome(s) belongs to $E$ (i.e. $s_{actual} \\in E$).\n", 23 | "\n", 24 | "To make this concrete, let's look at a single roll of a 6-sided die.\n", 25 | "\n", 26 | "The sample space $S$ in this case is a set of 6 elements: $S=\\{\\text{Face 1 shows up}, \\text{Face 2 shows up}, \\dots, \\text{Face 6 shows up}\\}$\n", 27 | "\n", 28 | "We could define an event however we want. For example, we can define event $E_1$ to be the event \"face 5 shows up\", event $E_2$ to be the event \"a face with an even number shows up\", and event $E_3$ to be the event \"the face that shows up is not 2 or 3\".
They are expressed as follows:\n", 29 | "\n", 30 | "$$E_1 =\\{\\text{Face 5 shows up}\\}$$\n", 31 | "$$E_2 =\\{\\text{Face 2 shows up},\\text{Face 4 shows up},\\text{Face 6 shows up}\\}$$\n", 32 | "$$E_3^c =\\{\\text{Face 2 shows up}, \\text{Face 3 shows up}\\}, E_3=S-E_3^c$$\n", 33 | "\n", 34 | "In the above, we introduced the notation $^c$ to mean the complement of a set. Specifically, it means $E_3^c$ occurs if and only if $E_3$ does not occur. \n", 35 | "\n", 36 | "A few other notations from set algebra might be helpful to review: https://en.wikipedia.org/wiki/Algebra_of_sets" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "## Probability\n", 44 | "\n", 45 | "\"Probability is the measure of the likelihood that an event will occur. Probability is quantified as a number between 0 and 1, where, loosely speaking, 0 indicates impossibility and 1 indicates certainty.\" - Wikipedia\n", 46 | "\n", 47 | "We can interpret probability as the frequency with which an event occurs if we repeat the experiment many, many times. This long-run frequency view is the *frequentist* view. Or we can interpret probability as our degree of belief in the event, which is useful for experiments that are not possible to repeat over and over. This is the *Bayesian* view.\n", 48 | "\n", 49 | "We use $P(A)$ to denote the probability that event $A$ occurs and define probability with the following axioms:\n", 50 | "- $P(S)=1$, $P(\\emptyset)=0$\n", 51 | "- Disjoint/mutually exclusive events $A_1, A_2, \\dots$ are defined such that $A_i \\cap A_j = \\emptyset$ for $i\\neq j$.
And we have \n", 52 | "$$P \\left( \\bigcup_{i=1}^{\\infty}A_i \\right) = \\sum_{i=1}^{\\infty}P(A_i) $$\n", 53 | "\n", 54 | "\n", 55 | "Following the definition we have these properties:\n", 56 | "- $P(A) + P(A^c)=1$\n", 57 | "- $A \\subseteq B \\Rightarrow P(A) \\leq P(B)$\n", 58 | "- $P(A \\cup B) = P(A) + P(B) - P(A \\cap B)$\n", 59 | "\n", 60 | "The last property can be generalized into the [inclusion-exclusion theorem](https://en.wikipedia.org/wiki/Inclusion%E2%80%93exclusion_principle) for more than two sets." 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "Let's try to find the probability of $E_1$, $E_2$ and $E_3$ in our dice example. Assume our die is fair, meaning each face is equally likely to show up. Denote by $A_i$ the event that face $i$ shows up; we have $ \\sum_{i=1}^6 P(A_i) = 1 $ and $P(A_1)=P(A_2)= \\dots = P(A_6)$, therefore $P(A_i)=\\frac{1}{6}$. \n", 68 | "\n", 69 | "The fairness of the die gives the problem **symmetry** (i.e. all the $P(A_i)$ are equal). In addition, the $A_i$ are **mutually exclusive** (a roll of a single die cannot possibly take more than one value). Therefore we can resort to counting the **number of occurrences** for calculating probability:\n", 70 | "\n", 71 | "$$P(E) = \\frac{\\text{number of outcomes in E}}{\\text{number of all possible outcomes}}$$\n", 72 | "\n", 73 | "Thus we have $P(E_1)=\\frac{1}{6}$, $P(E_2)=\\frac{3}{6}=\\frac{1}{2}$ and $P(E_3)=\\frac{4}{6}=\\frac{2}{3}$\n", 74 | "\n", 75 | "The assumption/condition that **all the outcomes are equally likely and mutually exclusive** forms the basis for the counting and combinatorics that we will go over in the next section."
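The counting rule above is easy to verify by brute-force enumeration; a minimal sketch for the same die example:

```python
# Brute-force check of P(E) = |E| / |S| for one roll of a fair six-sided die.
S = list(range(1, 7))
E1 = [s for s in S if s == 5]            # face 5 shows up
E2 = [s for s in S if s % 2 == 0]        # a face with an even number shows up
E3 = [s for s in S if s not in (2, 3)]   # the face is not 2 or 3
print(len(E1) / len(S), len(E2) / len(S), len(E3) / len(S))  # 1/6, 1/2, 2/3
```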
76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": {}, 81 | "source": [ 82 | "## Random variable\n", 83 | "\n", 84 | "In the discussion above, we have $P(\\cdot)$ notation for probability, it is a function that takes an event as input and outputs a real value that is between 0 and 1.\n", 85 | "\n", 86 | "In order to fully utilize the tools we have in calculus, it would be nice to have the input of $P(\\cdot)$ be real valued. And random variable comes to resume.\n", 87 | "\n", 88 | "**A random variable maps sample space $S$ to real numbers $\\mathbb{R}$.** That's it. The exact mapping is up to use to define. \n", 89 | "\n", 90 | "![rv.png](rv.png)\n", 91 | "(Image credit: *Blitzstein, Joseph K., and Jessica Hwang. Introduction to probability. CRC Press, 2014.*)\n", 92 | "\n", 93 | "For example, we could define a random variable $X$ (we usually use capital letter to denote a random variable) to take on value $i$ if the dice face $i$ shows up.\n", 94 | "\n", 95 | "For example, if we roll the dice twice, we could define a random variable $Y$ to take on the value $i_1+i_2$, where $i_1$ and $i_2$ are the dice value of the first and second roll respectively. Thus very conveniently $Y=7$ represents the outcomes of $(i_1, i_2)$ takes on values of $\\{(1,6), (6,1), (2, 5), (5, 2), (3,4), (4,3) \\})$ from the two rolls. " 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "## Probability distribution\n", 103 | "\n", 104 | "Now that we have both the input and output of $P(\\cdot)$ as real numbers, it is natual to think of $P(\\cdot)$ as function that describes the probability of random variable taking of various values. More concretely $P(X=k)=P(k)=f_X(k)$.\n", 105 | "\n", 106 | "For our dice example above, because $X$ and $Y$ takes on discrete values, they are called discrete random variables. Their $P(\\cdot)$ functions are called probability mass functions (PMF). 
This is because at each value $k$ the variable takes, $P(k)$ is indeed a probability. This is in contrast to the continuous version of random variables. \n", 107 | "\n", 108 | "For continuous random variables, the probability of taking on an exact real number is zero, and it is more useful to talk about the probability of taking on values in a certain interval, so the equivalent of $P(\\cdot)$ there is called the probability density function (PDF).\n", 109 | "\n", 110 | "To get some feeling for the PMF, let's calculate the PMF of $Y$. " 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 1, 116 | "metadata": { 117 | "collapsed": true 118 | }, 119 | "outputs": [], 120 | "source": [ 121 | "import matplotlib.pyplot as plt\n", 122 | "from collections import defaultdict\n", 123 | "%matplotlib inline\n", 124 | "\n", 125 | "p_y = defaultdict(int)\n", 126 | "for i in range(1, 7):\n", 127 | "    for j in range(1, 7):\n", 128 | "        s = i + j\n", 129 | "        p_y[s] += 1" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 2, 135 | "metadata": {}, 136 | "outputs": [ 137 | { 138 | "data": { 139 | "text/plain": [ 140 | "defaultdict(int,\n", 141 | "            {2: 1,\n", 142 | "             3: 2,\n", 143 | "             4: 3,\n", 144 | "             5: 4,\n", 145 | "             6: 5,\n", 146 | "             7: 6,\n", 147 | "             8: 5,\n", 148 | "             9: 4,\n", 149 | "             10: 3,\n", 150 | "             11: 2,\n", 151 | "             12: 1})" 152 | ] 153 | }, 154 | "execution_count": 2, 155 | "metadata": {}, 156 | "output_type": "execute_result" 157 | } 158 | ], 159 | "source": [ 160 | "p_y" 161 | ] 162 | }, 163 | { 164 | "cell_type": "markdown", 165 | "metadata": {}, 166 | "source": [ 167 | "We can see that $Y$ only takes on certain values (2 to 12); this set of values is called the *support*, and outside of the support $P(Y)=0$. To get the probabilities, we divide each count by the total number of possible outcomes.
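Normalizing those counts by the $6 \times 6 = 36$ equally likely outcomes gives the PMF exactly. A small sketch using exact fractions (the closed-form `6 - abs(s - 7)` reproduces the same counts as the nested loop above):

```python
from fractions import Fraction

# PMF of Y = i1 + i2 for two fair dice: normalize the outcome counts by 36.
counts = {s: 6 - abs(s - 7) for s in range(2, 13)}  # matches the tallies shown above
pmf = {s: Fraction(c, 36) for s, c in counts.items()}
print(pmf[7], sum(pmf.values()))  # 1/6 1 — probabilities sum to one
```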
168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": 3, 173 | "metadata": {}, 174 | "outputs": [ 175 | { 176 | "data": { 177 | "text/plain": [ 178 | "" 179 | ] 180 | }, 181 | "execution_count": 3, 182 | "metadata": {}, 183 | "output_type": "execute_result" 184 | }, 185 | { 186 | "data": { 187 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAEICAYAAABfz4NwAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGhZJREFUeJzt3X90XOV95/H3J/KPKFCiAE4XyyY4i1fEga1dhAPbLd0N\nJTJtgnWyJrEPDWaXs2Zz6t3sZiti7yak9aZNWHVLm1OaxA2/A7apY4ySmlVpKN2TXX5YYNbCUIFw\niC3JKSIgQhMF2/J3/5hH7DCM0R1pRmPNfF7nzNG9z33uc59Hmrkf3R8zo4jAzMzsHdXugJmZnRgc\nCGZmBjgQzMwscSCYmRngQDAzs8SBYGZmgAPBbNpJapT0HUmvSvqLavfHbJwDweqCpBckjUr6B0l/\nL+lWSSenZQ9JCkm/VLDOzlT+L9L870o6ktoYf1w3ie6sAn4ROC0irijY5qWpf6fnlc2V9Iykayex\nLbPMHAhWTz4WEScDvwxcAHw+b9mzwFXjM5JOAy4Ehgva2BYRJ+c9/vsk+vE+4NmIOFq4ICIeAL4L\n/Ele8eeBQ8DmSWzLLDMHgtWdiBgE7gfOzSu+C/ikpIY0vwa4Fzg8mW1I+kA68hiRtE/S5an894Dr\n07b+QdI1RVb/LPBrkn5T0rnAeuDfhj9WwCpsVrU7YDbdJC0EfgPYkVc8BDwNfIRcWFwFfAb42CTa\nnw18B7gltffPgfsktUbEFyUFcHZE/Fax9SPiVUmfBr5O7sjg9yLi+VL7YVYqHyFYPdkpaQT4PvC3\nwB8ULL8DuEpSC9AUEQ8XaeMT6b/+8cf8InUuBE4GvhIRhyPiQXKngdZk7WhEfAd4hNxr9KtZ1zOb\nCh8hWD1pj4i/fpvlO4D/AfwYuPM4de453n/2eeYDByPiWF7ZD4HmzD3N2Qe8XtCOWcU4EMySiPiZ\npPuBTwP/eApNDQELJb0jb2d+JrkL12YnLJ8yMnuz/wL8WkS8MIU2HgV+ClwnaXa6bfVjwNapd8+s\nchwIZnkiYigivj/FNg4DlwOXAS8BfwZcFRF/V4YumlWMfCebmZmBjxDMzCzJFAiSVkjqk9QvaUOR\n5RdLekLSUUmr8sr/paQn8x4/l9Selt0m6Qd5y5aWb1hmZlaqCU8ZpXduPgtcCgwAu4E1EfF0Xp2z\ngFOA3wG6ImJ7kXZOBfqBBelujtuA7xara2Zm0y/LbafLgf6I2A8gaSuwkty7OgEYvyND0tvdL70K\nuD8ifjbp3pqZWcVkCYRm4GDe/ADwoUlsazXwRwVlvy/peuB7wIaIeL1wJUnrgHUAJ5100vnnnHPO\nJDZtZla/Hn/88ZciYt5E9bIEgoqUlXRrkqQzgPOA7rzijcCPgDnkPsXxc8Cmt2woYnNaTmtra/T0\n9JSyaTOzuifph1nqZbmoPAAszJtfQO6dmKX4BHBvRBwZL4iIQ5HzOnAruVNTZmZWJVkCYTewWNIi\nSXPInfrpKnE7a4At+QXpqAFJAtqBp0ps08zMymjCQEhf4rGe3OmeZ8h9uNc+SZvyPuP9AkkDwBXA\nNyTtG18/3YG0kNynS+a7S1Iv0AucDnxp6sMxM7PJmlHvVPY1BDOz0kl6PCJaJ6rndyqbmRngj782\nm5Kdewbp7O5jaGSU+U2NdLS10L6s
1K89MDsxOBDMJmnnnkE27uhl9MgYAIMjo2zc0QvgULAZyaeM\nzCaps7vvjTAYN3pkjM7uvir1yGxqHAhmkzQ0MlpSudmJzoFgNknzmxpLKjc70TkQzCapo62FxtkN\nbyprnN1AR1tLlXpkNjW+qGw2SeMXjq/bvpfDY8do9l1GNsM5EMymoH1ZM1seOwDAtmsvqnJvzKbG\np4zMzAxwIJiZWeJAMDMzwIFgZmaJA8HMzAAHgpmZJQ4EMzMDHAhmZpY4EMzMDHAgmJlZ4kAwMzPA\ngWBmZkmmQJC0QlKfpH5JG4osv1jSE5KOSlpVsGxM0pPp0ZVXvkjSo5Kek7RN0pypD8fMzCZrwkCQ\n1ADcBFwGLAHWSFpSUO0AcDVwd5EmRiNiaXpcnld+A3BjRCwGXgGumUT/zcysTLIcISwH+iNif0Qc\nBrYCK/MrRMQLEbEXOJZlo5IEfBjYnopuB9oz99rMzMouSyA0Awfz5gdSWVbvlNQj6RFJ4zv904CR\niDg6UZuS1qX1e4aHh0vYrJmZlSLLF+SoSFmUsI0zI2JI0vuBByX1Aj/J2mZEbAY2A7S2tpayXTMz\nK0GWI4QBYGHe/AJgKOsGImIo/dwPPAQsA14CmiSNB1JJbZqZWfllCYTdwOJ0V9AcYDXQNcE6AEh6\nj6S5afp04FeApyMigL8Bxu9IWgvcV2rnzcysfCYMhHSefz3QDTwD3BMR+yRtknQ5gKQLJA0AVwDf\nkLQvrf4BoEfS/yUXAF+JiKfTss8Bn5XUT+6aws3lHJiZmZUmyzUEImIXsKug7Pq86d3kTvsUrvd/\ngPOO0+Z+cncwmZnZCcDvVDYzM8CBYGZmiQPBzMwAB4KZmSWZLiqbneh27hmks7uPoZFR5jc10tHW\nQvuyUt5QP/PU45itshwINuPt3DPIxh29jB4ZA2BwZJSNO3oBanYHWY9jtsrzKSOb8Tq7+97YMY4b\nPTJGZ3dflXpUefU4Zqs8B4LNeEMjoyWV14J6HLNVngPBZrz5TY0lldeCehyzVZ4DwWa8jrYWGmc3\nvKmscXYDHW0tVepR5dXjmK3yfFHZZrzxi6jXbd/L4bFjNNfBHTf1OGarPAeC1YT2Zc1seewAANuu\nvajKvZke9ThmqyyfMjIzM8CBYGZmiQPBzMwAB4KZmSUOBDMzAxwIZmaWOBDMzAxwIJiZWeJAMDMz\nIGMgSFohqU9Sv6QNRZZfLOkJSUclrcorXyrpYUn7JO2V9Mm8ZbdJ+oGkJ9NjaXmGZGZmkzHhR1dI\nagBuAi4FBoDdkroi4um8ageAq4HfKVj9Z8BVEfGcpPnA45K6I2IkLe+IiO1THYSZmU1dls8yWg70\nR8R+AElbgZXAG4EQES+kZcfyV4yIZ/OmhyS9CMwDRjAzsxNKllNGzcDBvPmBVFYSScuBOcDzecW/\nn04l3Shp7nHWWyepR1LP8PBwqZs1M7OMsgSCipRFKRuRdAZwJ/CvI2L8KGIjcA5wAXAq8Lli60bE\n5ohojYjWefPmlbJZMzMrQZZAGAAW5s0vAIaybkDSKcBfAp+PiEfGyyPiUOS8DtxK7tSUmZlVSZZA\n2A0slrRI0hxgNdCVpfFU/17gjoj4i4JlZ6SfAtqBp0rpuJmZldeEgRARR4H1QDfwDHBPROyTtEnS\n5QCSLpA0AFwBfEPSvrT6J4CLgauL3F56l6ReoBc4HfhSWUdmZmYlyfSNaRGxC9hVUHZ93vRucqeS\nCtf7FvCt47T54ZJ6amZmFeV3KpuZGeBAMDOzxIFgZmaAA8HMzBIHgpmZAQ4EMzNLHAhmZgZkfB+C\nWRY79wzS2d3H0Mgo85sa6WhroX1ZyZ+DaCc4/51rlwPBymLnnkE27uhl9MgYAIMjo2zc0QvgnUUN\n8d+5tvmUkZVFZ3ffGzuJcaNHxujs7qtSj6wS/HeubQ4EK4uhkdGSym1m8t+5tjkQrCzmNzWWVG4z\n
k//Otc2BYGXR0dZC4+yGN5U1zm6go62lSj2ySvDfubb5orKVxfgFxeu27+Xw2DGaffdJTfLfubY5\nEKxs2pc1s+WxAwBsu/aiKvfGKsV/59rlU0ZmZgY4EMzMLHEgmJkZ4EAwM7PEgWBmZkDGQJC0QlKf\npH5JG4osv1jSE5KOSlpVsGytpOfSY21e+fmSelObX5WkqQ/HzMwma8JAkNQA3ARcBiwB1khaUlDt\nAHA1cHfBuqcCXwQ+BCwHvijpPWnx14B1wOL0WDHpUZiZ2ZRlOUJYDvRHxP6IOAxsBVbmV4iIFyJi\nL3CsYN024IGIeDkiXgEeAFZIOgM4JSIejogA7gDapzoYMzObvCyB0AwczJsfSGVZHG/d5jQ9YZuS\n1knqkdQzPDyccbNmZlaqLIFQ7Nx+ZGz/eOtmbjMiNkdEa0S0zps3L+NmzcysVFkCYQBYmDe/ABjK\n2P7x1h1I05Np08zMKiBLIOwGFktaJGkOsBroyth+N/ARSe9JF5M/AnRHxCHgNUkXpruLrgLum0T/\nzcysTCYMhIg4Cqwnt3N/BrgnIvZJ2iTpcgBJF0gaAK4AviFpX1r3ZeC/kQuV3cCmVAbwaeCbQD/w\nPHB/WUdmZmYlyfRppxGxC9hVUHZ93vRu3nwKKL/eLcAtRcp7gHNL6ayZmVWO36lsZmaAA8HMzBIH\ngpmZAQ4EMzNLHAhmZgY4EMzMLHEgmJkZ4EAwM7PEgWBmZoADwczMEgeCmZkBGT/LyGaWnXsG6ezu\nY2hklPlNjXS0tdC+LOt3GpmduPzcriwHQo3ZuWeQjTt6GT0yBsDgyCgbd/QC+IVjM5qf25XnU0Y1\nprO7740XzLjRI2N0dvdVqUdm5eHnduU5EGrM0MhoSeVmM4Wf25XnQKgx85saSyo3myn83K48B0KN\n6WhroXF2w5vKGmc30NHWUqUemZWHn9uV54vKNWb84tp12/dyeOwYzb4Tw2qEn9uV50CoQe3Lmtny\n2AEAtl17UZV7Y1Y+fm5Xlk8ZmZkZ4EAwM7MkUyBIWiGpT1K/pA1Fls+VtC0tf1TSWan8SklP5j2O\nSVqalj2U2hxf9t5yDszMzEozYSBIagBuAi4DlgBrJC0pqHYN8EpEnA3cCNwAEBF3RcTSiFgKfAp4\nISKezFvvyvHlEfFiGcZjZmaTlOUIYTnQHxH7I+IwsBVYWVBnJXB7mt4OXCJJBXXWAFum0lkzM6uc\nLIHQDBzMmx9IZUXrRMRR4FXgtII6n+StgXBrOl30hSIBAoCkdZJ6JPUMDw9n6K6ZmU1GlkAotqOO\nUupI+hDws4h4Km/5lRFxHvCr6fGpYhuPiM0R0RoRrfPmzcvQXTMzm4wsgTAALMybXwAMHa+OpFnA\nu4GX85avpuDoICIG08/XgLvJnZoyM7MqyRIIu4HFkhZJmkNu595VUKcLWJumVwEPRkQASHoHcAW5\naw+kslmSTk/Ts4GPAk9hZmZVM+E7lSPiqKT1QDfQANwSEfskbQJ6IqILuBm4U1I/uSOD1XlNXAwM\nRMT+vLK5QHcKgwbgr4E/L8uIzMxsUjJ9dEVE7AJ2FZRdnzf9c3JHAcXWfQi4sKDsp8D5JfbVzMwq\nyO9UNjMzwIFgZmaJA8HMzAAHgpmZJQ4EMzMDHAhmZpY4EMzMDHAgmJlZ4kAwMzPAgWBmZokDwczM\nAAeCmZklDgQzMwMcCGZmlmT6+Gsr3c49g3R29zE0Msr8pkY62lpoX1b4VdRmNhPUy+vZgVABO/cM\nsnFHL6NHxgAYHBll445egJp8EpnVsnp6PfuUUQV0dve98eQZN3pkjM7uvir1yMwmq55ezw6EChga\nGS2p3MxOXPX0enYgVMD8psaSys3sxFVPr2cHQgV0tLXQOLvhTWWNsxvoaGupUo/MbLLq6fWcKRAk\nrZDUJ6lf0oYiy+dK2paWPyrprFR+lqRRSU+mx9fz1jlfUm9a56
uSVK5BVVv7sma+/PHzmNOQ+/U2\nNzXy5Y+fV3MXoMzqQT29nie8y0hSA3ATcCkwAOyW1BURT+dVuwZ4JSLOlrQauAH4ZFr2fEQsLdL0\n14B1wCPALmAFcP+kR3KCaV/WzJbHDgCw7dqLqtwbM5uKenk9ZzlCWA70R8T+iDgMbAVWFtRZCdye\nprcDl7zdf/ySzgBOiYiHIyKAO4D2kntvZmZlkyUQmoGDefMDqaxonYg4CrwKnJaWLZK0R9LfSvrV\nvPoDE7RpZmbTKMsb04r9px8Z6xwCzoyIH0s6H9gp6YMZ28w1LK0jd2qJM888M0N3zcxsMrIcIQwA\nC/PmFwBDx6sjaRbwbuDliHg9In4MEBGPA88D/yTVXzBBm6T1NkdEa0S0zps3L0N3zcxsMrIEwm5g\nsaRFkuYAq4GugjpdwNo0vQp4MCJC0rx0URpJ7wcWA/sj4hDwmqQL07WGq4D7yjAeMzObpAlPGUXE\nUUnrgW6gAbglIvZJ2gT0REQXcDNwp6R+4GVyoQFwMbBJ0lFgDPh3EfFyWvZp4DagkdzdRTVzh5GZ\n2UyU6cPtImIXuVtD88uuz5v+OXBFkfW+DXz7OG32AOeW0lkzM6scv1PZzMwAB4KZmSUOBDMzAxwI\nZmaWOBDMzAxwIJiZWeJAMDMzwIFgZmaJA8HMzAAHgpmZJQ4EMzMDHAhmZpY4EMzMDHAgmJlZ4kAw\nMzPAgWBmZokDwczMAAeCmZklmb5CcybbuWeQzu4+hkZGmd/USEdbC+3LmqvdLTOzCU33/qumA2Hn\nnkE27uhl9MgYAIMjo2zc0QvgUDCzE1o19l81fcqos7vvjV/muNEjY3R291WpR2Zm2VRj/5UpECSt\nkNQnqV/ShiLL50ralpY/KumsVH6ppMcl9aafH85b56HU5pPp8d5yDWrc0MhoSeVmZieKauy/JgwE\nSQ3ATcBlwBJgjaQlBdWuAV6JiLOBG4EbUvlLwMci4jxgLXBnwXpXRsTS9HhxCuMoan5TY0nlZmYn\nimrsv7IcISwH+iNif0QcBrYCKwvqrARuT9PbgUskKSL2RMRQKt8HvFPS3HJ0PIuOthYaZze8qaxx\ndgMdbS3T1QUzs0mpxv4rSyA0Awfz5gdSWdE6EXEUeBU4raDOvwL2RMTreWW3ptNFX5CkYhuXtE5S\nj6Se4eHhDN39/9qXNfPlj5/HnIbcMJubGvnyx8/zBWUzO+FVY/+V5S6jYjvqKKWOpA+SO430kbzl\nV0bEoKRfAL4NfAq44y2NRGwGNgO0trYWbndC7cua2fLYAQC2XXtRqaubmVXNdO+/shwhDAAL8+YX\nAEPHqyNpFvBu4OU0vwC4F7gqIp4fXyEiBtPP14C7yZ2aMjOzKskSCLuBxZIWSZoDrAa6Cup0kbto\nDLAKeDAiQlIT8JfAxoj43+OVJc2SdHqang18FHhqakMxM7OpmDAQ0jWB9UA38AxwT0Tsk7RJ0uWp\n2s3AaZL6gc8C47emrgfOBr5QcHvpXKBb0l7gSWAQ+PNyDszMzEqT6Z3KEbEL2FVQdn3e9M+BK4qs\n9yXgS8dp9vzs3TQzs0qr6Xcqm5lZdg4EMzMDHAhmZpY4EMzMDHAgmJlZ4kAwMzPAgWBmZokDwczM\nAAeCmZklDgQzMwMcCGZmljgQzMwMcCCYmVniQDAzM8CBYGZmiQPBzMwAB4KZmSUOBDMzAxwIZmaW\nOBDMzAxwIJiZWZIpECStkNQnqV/ShiLL50ralpY/KumsvGUbU3mfpLasbZqZ2fSaMBAkNQA3AZcB\nS4A1kpYUVLsGeCUizgZuBG5I6y4BVgMfBFYAfyapIWObZmY2jWZlqLMc6I+I/QCStgIrgafz6qwE\nfjdNbwf+VJJS+daIeB34gaT+1B4Z2iybFQ/dzT8aPsgPv39KJZo/rqsP/QRg2rdbzW17zNPLY66P\n7f5o3kK49qKKbytLIDQDB/
PmB4APHa9ORByV9CpwWip/pGDd5jQ9UZsASFoHrAM488wzM3T3rU49\naS7verVhUutOxbvmTP82q71tj7k+tu0xT+92Tz1p7rRsK0sgqEhZZKxzvPJip6oK28wVRmwGNgO0\ntrYWrTORld/8w8msNmXvq8pWq7ttj7k+tu0x1+Z2s1xUHgAW5s0vAIaOV0fSLODdwMtvs26WNs3M\nbBplCYTdwGJJiyTNIXeRuKugThewNk2vAh6MiEjlq9NdSIuAxcBjGds0M7NpNOEpo3RNYD3QDTQA\nt0TEPkmbgJ6I6AJuBu5MF41fJreDJ9W7h9zF4qPAb0fEGECxNss/PDMzy0q5f+RnhtbW1ujp6al2\nN8zMZhRJj0dE60T1/E5lMzMDHAhmZpY4EMzMDHAgmJlZMqMuKksaBn44ydVPB14qY3dmAo+5PnjM\ntW+q431fRMybqNKMCoSpkNST5Sp7LfGY64PHXPuma7w+ZWRmZoADwczMknoKhM3V7kAVeMz1wWOu\nfdMy3rq5hmBmZm+vno4QzMzsbTgQzMwMqINAkLRQ0t9IekbSPkmfqXafpkP67uo9kr5b7b5MB0lN\nkrZL+rv0t6789w1WmaT/lJ7TT0naIumd1e5TuUm6RdKLkp7KKztV0gOSnks/31PNPpbbccbcmZ7b\neyXdK6mpEtuu+UAg97Hb/zkiPgBcCPy2pCVV7tN0+AzwTLU7MY3+BPifEXEO8EvU+NglNQP/AWiN\niHPJfYz86ur2qiJuA1YUlG0AvhcRi4HvpflachtvHfMDwLkR8U+BZ4GNldhwzQdCRByKiCfS9Gvk\ndhTNb7/WzCZpAfCbwDer3ZfpIOkU4GJy38tBRByOiJHq9mpazAIa07cUvosa/NbBiPhf5L5jJd9K\n4PY0fTvQPq2dqrBiY46Iv4qIo2n2EXLfMll2NR8I+SSdBSwDHq1uTyruj4HrgGPV7sg0eT8wDNya\nTpN9U9JJ1e5UJUXEIPCHwAHgEPBqRPxVdXs1bX4xIg5B7h8+4L1V7s90+zfA/ZVouG4CQdLJwLeB\n/xgRP6l2fypF0keBFyPi8Wr3ZRrNAn4Z+FpELAN+Su2dRniTdN58JbAImA+cJOm3qtsrqzRJ/5Xc\nafC7KtF+XQSCpNnkwuCuiNhR7f5U2K8Al0t6AdgKfFjSt6rbpYobAAYiYvzIbzu5gKhlvw78ICKG\nI+IIsAP4Z1Xu03T5e0lnAKSfL1a5P9NC0lrgo8CVUaE3kNV8IEgSuXPLz0TEH1W7P5UWERsjYkFE\nnEXuIuODEVHT/zlGxI+Ag5JaUtEl5L7Hu5YdAC6U9K70HL+EGr+QnqcLWJum1wL3VbEv00LSCuBz\nwOUR8bNKbafmA4Hcf8yfIvef8pPp8RvV7pSV3b8H7pK0F1gK/EGV+1NR6WhoO/AE0EvutVxzH+cg\naQvwMNAiaUDSNcBXgEslPQdcmuZrxnHG/KfALwAPpH3Y1yuybX90hZmZQX0cIZiZWQYOBDMzAxwI\nZmaWOBDMzAxwIJiZWeJAMDMzwIFgZmbJ/wM2o9sA5317XwAAAABJRU5ErkJggg==\n", 188 | "text/plain": [ 189 | "" 190 | ] 191 | }, 192 | "metadata": {}, 193 | "output_type": "display_data" 194 | } 195 | ], 196 | "source": [ 197 | "x,y =zip(*[(k, v/36) for k,v in p_y.items()])\n", 198 | "plt.stem(x,y)\n", 199 | "plt.title('PMF of Y')" 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": null, 205 | "metadata": { 206 | "collapsed": true 207 | }, 208 
| "outputs": [], 209 | "source": [] 210 | } 211 | ], 212 | "metadata": { 213 | "kernelspec": { 214 | "display_name": "Python 3", 215 | "language": "python", 216 | "name": "python3" 217 | }, 218 | "language_info": { 219 | "codemirror_mode": { 220 | "name": "ipython", 221 | "version": 3 222 | }, 223 | "file_extension": ".py", 224 | "mimetype": "text/x-python", 225 | "name": "python", 226 | "nbconvert_exporter": "python", 227 | "pygments_lexer": "ipython3", 228 | "version": "3.5.2" 229 | }, 230 | "widgets": { 231 | "state": {}, 232 | "version": "1.1.2" 233 | } 234 | }, 235 | "nbformat": 4, 236 | "nbformat_minor": 2 237 | } 238 | -------------------------------------------------------------------------------- /prob/prob_dist_discrete.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Probability Distributions - Discrete" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": { 13 | "collapsed": true 14 | }, 15 | "source": [ 16 | "%\n", 17 | "% plain-TeX file\n", 18 | "%\n", 19 | "\\ifx\\pdfoutput\\undefined\n", 20 | "\\input mbboard\n", 21 | "\\input mathabx\n", 22 | "\\fi\n", 23 | "\\input typofrmt\n", 24 | "\\input typotabl\n", 25 | "\\useoptions{magstep1,a4,english,preprint}\n", 26 | "\n", 27 | "\\title{Probability Distributions}\n", 28 | "\\author{Anthony Phan, \\today}\n", 29 | "\\maketitle\n", 30 | "\n", 31 | "\\overfullrule=0pt\n", 32 | "\\def\\description{\\medbreak\\bgroup\n", 33 | "\t\\def\\item##1{\\medbreak\\hangindent\\parindent\\leavevmode\n", 34 | "\t\t\\hskip-\\parindent{{\\tt##1}.}\\enspace\\ignorespaces}%\n", 35 | "\t\\def\\subitem##1{\\smallbreak\\hangindent2\\parindent\\leavevmode\n", 36 | "\t\t{\\it##1.}\\enspace\\ignorespaces}%\n", 37 | "\t\\def\\subsubitem##1{\\par\\hangindent3\\parindent\\leavevmode\n", 38 | "\t\t\\hskip\\parindent{##1.}\\enspace\\ignorespaces}%\n", 39 | 
"\t\\let\\itemitem=\\subitem}%\n", 40 | "\\def\\enddescription{\\egroup\\medbreak}%\n", 41 | "\\def\\cs#1{\\hbox{\\tt\\string#1}}%\n", 42 | "\\def\\var#1{\\ifmmode#1\\else$#1$\\fi}% usual variable\n", 43 | "\\def\\vari#1{\\ifmmode\\hbox{\\it#1}\\else{\\it#1}\\fi}% variable in italic\n", 44 | "\\def\\vartype#1{\\ifmmode\\hbox{\\tt#1}\\else{\\tt#1}\\fi}% variable type\n", 45 | "\n", 46 | "\\section{Continuous Probability distributions\\footnote{{\\it See}\\/ \\cs{probability-distributions.h}}}\n", 47 | "\n", 48 | "\\description\n", 49 | "\n", 50 | "\\item{lnGamma(\\vartype{double} $x$)} Logarithm of the Eulerian Gamma\n", 51 | "function, $\\ln(\\Gamma(x))$.\n", 52 | "\n", 53 | "\\item{Gamma(\\vartype{double} $x$)} Eulerian Gamma function\n", 54 | "$$\n", 55 | "\t\\mathbb R_+^*\\longrightarrow\\mathbb\n", 56 | "\tR_+^*,\\qquad x\\longmapsto \\Gamma(x)\n", 57 | "\t=\\int_0^{+\\infty}t^{x-1}\\,{\\rm e}^{-t}\\,{\\rm d}t.\n", 58 | "$$\n", 59 | "Remember that for $n\\in\\mathbb N$, $n!=\\Gamma(n+1)$.\n", 60 | "\n", 61 | "\\item{Beta(\\vartype{double} $x$, \\vartype{double} $y$)} Eulerian Beta function\n", 62 | "$$\n", 63 | "\t(\\mathbb R_+^*)^2\\longrightarrow\\mathbb R_+^*,\\qquad\n", 64 | "\t(x,y)\\longmapsto {\\rm B}(x, y)=\\int_{0}^{1}u^{x-1}(1-u)^{y-1}\\,{\\rm d}u\n", 65 | "\t={\\Gamma(x)\\times\\Gamma(y)\\over\\Gamma(x+y)}\\,.\n", 66 | "$$\n", 67 | "\n", 68 | "\n", 69 | "% Gamma distributions\n", 70 | "\n", 71 | "\\item{gammapdf(\\vartype{double} $x$, \\vartype{double} $a$, \\vartype{double} $\\lambda$)} Probability density function of\n", 72 | "the Gamma distribution with parameter $a>0$, $\\lambda>0$.\n", 73 | "$$\n", 74 | "\t\\mathbb R\\longrightarrow\\mathbb R_+,\\qquad x\\longmapsto\n", 75 | "\t\\mathbb 1_{\\mathbb R_+}(x){\\lambda^a\\over\\Gamma(a)}\\, x^{a-1}\\,{\\rm\n", 76 | "\te}^{-\\lambda x}.\n", 77 | "$$\n", 78 | "\n", 79 | "\\item{gammacdf(\\vartype{double} $x$, \\vartype{double} $a$, \\vartype{double} $\\lambda$)} Cumulative distribution 
function\n", 80 | "of the Gamma distribution with parameter $a>0$, $\\lambda>0$.\n", 81 | "$$\n", 82 | "\t\\mathbb R\\longrightarrow[0,1\\mathclose[,\\qquad\n", 83 | "\tx\\longmapsto\\mathbb 1_{\\mathbb R_+}(x){\\lambda^a\\over\\Gamma(a)}\n", 84 | "\t\\int_0^x t^{a-1}\\,{\\rm e}^{-\\lambda t}\\,{\\rm d}t.\n", 85 | "$$\n", 86 | "\n", 87 | "\\item{gammaicdf(\\vartype{double} $p$, \\vartype{double} $a$, \\vartype{double} $\\lambda$)} Inverse cumulative\n", 88 | "distribution function of the Gamma distribution with parameter $a>0$, $\\lambda>0$.\n", 89 | "$$\n", 90 | "\t[0,1\\mathclose[\\longrightarrow\\mathbb R_+,\\qquad\n", 91 | "\tp\\longmapsto\\mathop{\\rm gammaicdf}(p,a,\\lambda).\n", 92 | "$$\n", 93 | "It is set to $0$ for $p<\\cs{accuracy}$ and to \\cs{infinity} for\n", 94 | "$p>1-\\cs{accuracy}$.\n", 95 | "\n", 96 | "% N(0,1) distribution\n", 97 | "\n", 98 | "\\item{normlimit} Numerical parameter for normal computations: if $X$ is a\n", 99 | "random variable with law $\\mathcal N(0,1)$, the normal distribution\n", 100 | "with mean $0$ and standard deviation $1$, then $\\mathbb\n", 101 | "P\\{X\\geq\\cs{normlimit}\\}=\\mathbb\n", 102 | "P\\{X\\leq-\\cs{normlimit}\\}\\approx0$. 
Its value is (unreasonably) set to\n", 103 | "$\\cs{normlimit} = 10$ ($\\cs{normlimit} = 4$ should be sufficient).\n", 104 | "\n", 105 | "\\item{normalpdf(\\vartype{double} $x$)} Probability density function of\n", 106 | "$\\mathcal N(0,1)$.\n", 107 | "$$\n", 108 | "\t\\mathbb R\\longrightarrow\\mathbb R_+,\\qquad x\\longmapsto{\\rm\n", 109 | "\te}^{-x^2\\!/2}/\\sqrt{2\\pi}.\n", 110 | "$$\n", 111 | "\n", 112 | "\n", 113 | "\\item{normalcdf (\\vartype{double} $x$)} Cumulative distribution function of\n", 114 | "$\\mathcal N(0,1)$.\n", 115 | "$$\n", 116 | "\t\\mathbb R\\longrightarrow\\mathopen]0,1\\mathclose[,\\qquad\n", 117 | "\tx\\longmapsto\\Phi(x)=\\int_{-\\infty}^x{\\rm e}^{-z^2\\!/2}\\, {{\\rm\n", 118 | "\td} z\\over\\sqrt{2\\pi}}\n", 119 | "\t={1\\over2}+{1\\over\\sqrt{2\\pi}}\\sum_{n=0}^\\infty\n", 120 | "\t{(-1)^n\\,x^{2n+1}\\over (2n+1)\\times 2^n\\,n!}\\,.\n", 121 | "$$\n", 122 | "It is computed with its associated power series when\n", 123 | "$|x|<\\cs{normlimit}$, and set to $0$ or $1$ otherwise.\n", 124 | "\n", 125 | "\\item{normalcdf\\_(\\vartype{double} $x$)} Cumulative distribution function\n", 126 | "of $\\mathcal N(0,1)$. 
It is just another implementation of the\n", 127 | "previous function with a Gamma cumulative distribution function since\n", 128 | "$$\n", 129 | "\t\\Phi(x)= {1\\over2}\\,\\bigl(\\mathop{\\rm sgn}(x)\n", 130 | "\t\\times\\mathop{\\rm gammacdf}(x^2\\!/2,1/2,1)+1\\bigr),\n", 131 | "\t\\qquad\\hbox{for all $x\\in\\mathbb R$.}\n", 132 | "$$\n", 133 | "\n", 134 | "\\item{normalicdf(\\vartype{double} $p$)} Inverse cumulative\n", 135 | "distribution function of $\\mathcal N(0,1)$.\n", 136 | "$$\n", 137 | "\t\\mathopen]0,1\\mathclose[\\longrightarrow\\mathbb R,\\qquad\n", 138 | "\tp\\longmapsto\\mathop{\\rm\n", 139 | "\tnormalicdf}(p)=\\Phi^{-1}(p).\n", 140 | "$$\n", 141 | "It is set to $\\pm\\cs{infinity}$ for $p$ outside of\n", 142 | "$\\mathopen]\\cs{accuracy},1-\\cs{accuracy}\\mathclose[$.\n", 143 | "Of course, there is also \\cs{normalicdf\\_}\\dots\n", 144 | "\n", 145 | "\\remark\n", 146 | "Normal distributions with mean $m$ and standard deviation $\\sigma$ are not implemented since they can be easily derived from the standard normal distribution. 
For instance, one can set\n", 147 | "\\verbatim\n", 148 | "double gaussianpdf(double x, double m, double sigma){\n", 149 | " return normalpdf((x-m)/sigma)/sigma;}\n", 150 | "double gaussiancdf(double x, double m, double sigma){\n", 151 | " return normalcdf((x-m)/sigma);}\n", 152 | "double gaussianicdf(double p, double m, double sigma){\n", 153 | " return sigma*normalicdf(p)+m;}\n", 154 | "\\endverbatim\n", 155 | "in order to get the probability, cumulative, inverse cumulative distribution functions of the $\\mathcal N(m,\\sigma^2)$ distribution with $m\\in\\mathbb R$ and $\\sigma>0$.\n", 156 | "\\endremark\n", 157 | "\n", 158 | "% $\\chi^2$ distributions\n", 159 | "\n", 160 | "\\item{chisquarepdf(\\vartype{double} $x$, \\vartype{double} $\\nu$)} Probability density function\n", 161 | "of $\\chi^2(\\nu)$, the chi-square (Pearson) distribution with $\\nu>0$\n", 162 | "degrees of freedom.\n", 163 | "$$\n", 164 | "\t\\mathbb R\\longrightarrow\\mathbb R_+,\\qquad x\\longmapsto\\mathbb\n", 165 | "\t1_{\\mathbb R_+}(x) \\,{x^{\\nu/2-1}{\\rm e}^{-x/2}\\over\n", 166 | "\t2^{\\nu/2}\\,\\Gamma(\\nu/2)}\\,.\n", 167 | "$$\n", 168 | "\n", 169 | "\\item{chisquarecdf(\\vartype{double} $x$, \\vartype{double} $\\nu$)} Cumulative distribution\n", 170 | "function of $\\chi^2(\\nu)$.\n", 171 | "$$\n", 172 | "\t\\mathbb R\\longrightarrow\\mathbb R_+,\\qquad x\\longmapsto\\mathbb\n", 173 | "\t1_{\\mathbb R_+}(x) \\int_0^x\\,{t^{\\nu/2-1}{\\rm e}^{-t/2} \\,{{\\rm\n", 174 | "\td}t\\over 2^{\\nu/2}\\,\\Gamma(\\nu/2)}}\\,.\n", 175 | "$$\n", 176 | "\n", 177 | "\\item{chisquareicdf(\\vartype{double} $p$, \\vartype{double} $\\nu$)} Inverse cumulative\n", 178 | "distribution function of $\\chi^2(\\nu)$.\n", 179 | "$$\n", 180 | "\t[0,1\\mathclose[\\longrightarrow\\mathbb R_+,\\qquad\n", 181 | "\tp\\longmapsto\\mathop{\\rm chisquareicdf}(p,\\nu).\n", 182 | "$$\n", 183 | "It is set to $0$ for $p<\\cs{accuracy}$ and to \\cs{infinity} for\n", 184 | "$p>1-\\cs{accuracy}$.\n", 185 | "\n", 186 | "% Beta 
distributions\n", 187 | "\n", 188 | "\\item{betapdf(\\vartype{double} $x$, \\vartype{double} $a$, \\vartype{double} $b$)} Probability density function\n", 189 | "of the Beta distribution with parameters $a>0$ and $b>0$.\n", 190 | "$$\n", 191 | "\t\\mathbb R\\longrightarrow\\mathbb R_+,\\qquad x\\longmapsto\\mathbb\n", 192 | "\t1_{[0,1]}(x)\\,x^{a-1}(1-x)^{b-1}\\!/{\\rm B}(a,b).\n", 193 | "$$\n", 194 | "\n", 195 | "\\item{betacdf(\\vartype{double} $x$, \\vartype{double} $a$, \\vartype{double} $b$)} Cumulative distribution\n", 196 | "function of the Beta distribution with parameters $a>0$ and $b>0$.\n", 197 | "$$\n", 198 | "\t\\mathbb R\\longrightarrow\\mathbb R_+,\\qquad x\\longmapsto\n", 199 | "\t\\int_0^x\\mathbb 1_{[0,1]}(u)\\,u^{a-1}(1-u)^{b-1} \\,{{\\rm\n", 200 | "\td}u\\over {\\rm B}(a,b)}\\,.\n", 201 | "$$\n", 202 | "\n", 203 | "\\item{betaicdf(\\vartype{double} $p$, \\vartype{double} $a$, \\vartype{double} $b$)} Inverse cumulative\n", 204 | "distribution function of the Beta distribution with parameters $a>0$\n", 205 | "and $b>0$.\n", 206 | "$$\n", 207 | "\t[0,1]\\longrightarrow\\mathbb R_+,\\qquad\n", 208 | "\tp\\longmapsto\\mathop{\\rm betaicdf}(p,a,b).\n", 209 | "$$\n", 210 | "It is set to $0$ for $p<\\cs{accuracy}$ and to $1$ for\n", 211 | "$p>1-\\cs{accuracy}$.\n", 212 | "\n", 213 | "% Student's (T) distributions\n", 214 | "\n", 215 | "\\item{studentpdf(\\vartype{double} $x$, \\vartype{double} $\\nu$)} Probability density function of\n", 216 | "$\\mathcal T(\\nu)$, the Student distribution with $\\nu>0$ degrees of\n", 217 | "freedom.\n", 218 | "$$\n", 219 | "\t\\mathbb R\\longrightarrow\\mathbb R_+,\\qquad x\\longmapsto\n", 220 | " \t{1\\over\\sqrt{\\nu}\\,{\\rm B}(\\nu/2,1/2)} \\biggl(1+{x^2\\over\n", 221 | " \t\\nu}\\biggr)^{\\!\\!-(\\nu+1)/2} .\n", 222 | "$$\n", 223 | "\n", 224 | "\\item{studentcdf(\\vartype{double} $x$, \\vartype{double} $\\nu$)} Cumulative distribution\n", 225 | "function of $\\mathcal T(\\nu)$.\n", 226 | "$$\n", 227 | "\t\\mathbb 
R\\longrightarrow\\mathbb R_+,\\qquad x\\longmapsto\n", 228 | " \t\\int_{-\\infty}^x \\biggl(1+{z^2\\over \\nu}\\biggr)^{\\!\\!-(\\nu+1)/2}\n", 229 | " \t{{\\rm d}z\\over\\sqrt{\\nu}\\,{\\rm B}(\\nu/2,1/2)} .\n", 230 | "$$\n", 231 | "\\item{studenticdf(\\vartype{double} $p$, \\vartype{double} $\\nu$)} Inverse cumulative\n", 232 | "distribution function of $\\mathcal T(\\nu)$.\n", 233 | "$$\n", 234 | "\t\\mathopen]0,1\\mathclose[\\longrightarrow\\mathbb R,\\qquad\n", 235 | "\tp\\longmapsto\\mathop{\\rm studenticdf}(p,\\nu).\n", 236 | "$$\n", 237 | "It is set to $\\pm\\cs{infinity}$ for $p$ outside of\n", 238 | "$\\mathopen]\\cs{accuracy},1-\\cs{accuracy}\\mathclose[$.\n", 239 | "\n", 240 | "% Fisher's (F) distributions\n", 241 | "\n", 242 | "\\item{fisherpdf(\\vartype{double} $x$, \\vartype{double} $\\nu_1$, \\vartype{double} $\\nu_2$)} Probability density\n", 243 | "function of $\\mathcal F(\\nu_1,\\nu_2)$, the Fisher distribution with\n", 244 | "$\\nu_1>0$ and $\\nu_2>0$ degrees of freedom ($\\nu_1$ is the numerator degree\n", 245 | "of freedom, $\\nu_2$ the denominator degree of freedom).\n", 246 | "$$\n", 247 | "\t\\mathbb R\\longrightarrow\\mathbb R_+,\\qquad x\\longmapsto\n", 248 | "\t\\mathbb 1_{\\mathbb R_+}(x)\\,{(\\nu_1/\\nu_2)^{\\nu_1/2}\\,x^{\\nu_1/2-1}\n", 249 | "\t\\over {\\rm B}(\\nu_1/2,\\nu_2/2)(1+x\\times \\nu_1/\\nu_2)^{(\\nu_1+\\nu_2)/2}}.\n", 250 | "$$\n", 251 | "\n", 252 | "\\item{fishercdf(\\vartype{double} $x$, \\vartype{double} $\\nu_1$, \\vartype{double} $\\nu_2$)} Cumulative distribution\n", 253 | "function of $\\mathcal F(\\nu_1,\\nu_2)$.\n", 254 | "$$\n", 255 | "\t\\mathbb R\\longrightarrow\\mathbb R_+,\\qquad x\\longmapsto\n", 256 | "\t\\mathbb 1_{\\mathbb R_+}(x)\\int_0^x\n", 257 | "\t{(\\nu_1/\\nu_2)^{\\nu_1/2}\\,z^{\\nu_1/2-1} \\over {\\rm B}(\\nu_1/2,\\nu_2/2)(1+z\\times\n", 258 | "\t\\nu_1/\\nu_2)^{(\\nu_1+\\nu_2)/2}} \\,{\\rm d}z.\n", 259 | "$$\n", 260 | "\n", 261 | "\\item{fishericdf(\\vartype{double} $p$, \\vartype{double} 
$\\nu_1$, \\vartype{double} $\\nu_2$)} Inverse cumulative\n", 262 | "distribution function of $\\mathcal F(\\nu_1, \\nu_2)$.\n", 263 | "$$\n", 264 | "\t[0,1\\mathclose[\\longrightarrow\\mathbb R_+,\\qquad\n", 265 | "\tp\\longmapsto\\mathop{\\rm fishericdf}(p,\\nu_1,\\nu_2).\n", 266 | "$$\n", 267 | "It is set to $0$ for $p<\\cs{accuracy}$ and to \\cs{infinity} for\n", 268 | "$p>1-\\cs{accuracy}$.\n", 269 | "\n", 270 | "\\section{Discrete Probability distributions\\footnote{{\\it See}\\/ \\cs{probability-distributions.h}}}\n", 271 | "\n", 272 | "About quantiles, please note that $q_p=k+0.5$ when\n", 273 | "$F(k)=p$ for $k\\in\\mathbb N$.\n", 274 | "\n", 275 | "% Poisson's distributions\n", 276 | "\n", 277 | "\\item{poissonpdf(\\vartype{double} $x$, \\vartype{double} $\\lambda$)} Probability distribution\n", 278 | "function of $\\mathcal P(\\lambda)$, the Poisson distribution with\n", 279 | "parameter $\\lambda\\geq0$.\n", 280 | "$$\n", 281 | "\t\\mathbb R\\longrightarrow[0,1],\\qquad x\\longmapsto\\cases{{\\rm\n", 282 | "\te}^{-\\lambda}\\,\\lambda^x\\!/x!& if $x\\in\\mathbb N$,\\cr 0\n", 283 | "\t&otherwise.}\n", 284 | "$$\n", 285 | "\\item{poissoncdf(\\vartype{double} $x$, \\vartype{double} $\\lambda$)} Cumulative distribution\n", 286 | "function of $\\mathcal P(\\lambda)$.\n", 287 | "\n", 288 | "\\item{poissonicdf(\\vartype{double} $p$, \\vartype{double} $\\lambda$)} Inverse cumulative\n", 289 | "distribution function of $\\mathcal P(\\lambda)$.\n", 290 | "\n", 291 | "% Binomial distributions\n", 292 | "\n", 293 | "\\item{binomialpdf(\\vartype{double} $x$, \\vartype{int} $n$, \\vartype{double} $\\pi$)} Probability distribution\n", 294 | "function of $\\mathcal B(n,\\pi)$, the binomial distribution with\n", 295 | "parameters $n\\in\\mathbb N^*$ and $\\pi\\in[0,1]$.\n", 296 | "$$\n", 297 | "\t\\mathbb R\\longrightarrow[0,1],\\qquad\n", 298 | "\tx\\longmapsto\\cases{C_n^x\\,\\pi^x(1-\\pi)^{n-x}& if\n", 299 | "\t$x\\in\\{0,1,\\dots,n\\}$,\\cr 0 &otherwise.}\n", 300 | 
"$$\n", 301 | "\n", 302 | "\\item{binomialcdf(\\vartype{double} $x$, \\vartype{int} $n$, \\vartype{double} $\\pi$)} Cumulative distribution\n", 303 | "function of $\\mathcal B(n,\\pi)$.\n", 304 | "\n", 305 | "\\item{binomialicdf(\\vartype{double} $p$, \\vartype{int} $n$, \\vartype{double} $\\pi$)} Inverse cumulative\n", 306 | "distribution function of $\\mathcal B(n,\\pi)$.\n", 307 | "\n", 308 | "% Geometric distributions\n", 309 | "\n", 310 | "\\item{geometricpdf(\\vartype{double} $x$, \\vartype{double} $\\pi$)} Probability distribution\n", 311 | "function of $\\mathcal G(\\pi)$, the geometric distribution with\n", 312 | "parameter $\\pi\\in[0,1]$. It describes the law of the first success rank in\n", 313 | "an infinitely repeated Bernoulli trial with parameter $\\pi\\in[0,1]$.\n", 314 | "Thus, it is given by\n", 315 | "$$\n", 316 | "\t\\mathbb R\\longrightarrow[0,1], \\qquad x\\longmapsto\n", 317 | "\t\\cases{\\pi(1-\\pi)^{x-1}& if $x\\in\\{1,2,3,\\dots\\}$,\\cr 0 &\n", 318 | "\totherwise.}\n", 319 | "$$\n", 320 | "\n", 321 | "\\item{geometriccdf(\\vartype{double} $x$, \\vartype{double} $\\pi$)} Cumulative distribution\n", 322 | "function of $\\mathcal G(\\pi)$. It returns the sum up to $x$ of the\n", 323 | "previous probabilities. 
Thus it is given by\n", 324 | "$$\n", 325 | "\t\\mathbb R\\longrightarrow[0,1], \\qquad x\\longmapsto\n", 326 | "\t\\cases{1-(1-\\pi)^{\\mathop{\\rm floor}x}& if $x\\geq1$,\\cr 0 &\n", 327 | "\totherwise.}\n", 328 | "$$\n", 329 | "\n", 330 | "\\item{geometricicdf(\\vartype{double} $p$, \\vartype{double} $\\pi$)} Inverse cumulative\n", 331 | "distribution function of $\\mathcal G(\\pi)$.\n", 332 | "$$\n", 333 | "\t[0,1]\\longrightarrow\\{1,1.5,2,2.5,3,3.5,\\dots\\}, \\qquad\n", 334 | "\tp\\longmapsto\\mathop{\\rm geometricicdf}(p,\\pi).\n", 335 | "$$\n", 336 | "\n", 337 | "\\item{negativebinomialpdf(\\vartype{double} $x$, \\vartype{int} $n$, \\vartype{double} $\\pi$)} Probability\n", 338 | "distribution function of the negative binomial distribution with\n", 339 | "parameters $n\\in\\mathbb N^*$ and $\\pi\\in[0,1]$. It describes the law of the\n", 340 | "$n$-th success rank, $n\\in\\mathbb N^*$, in an infinitely repeated\n", 341 | "Bernoulli trial with parameter $\\pi\\in[0,1]$. Thus, it is given by\n", 342 | "$$\n", 343 | "\t\\mathbb R\\longrightarrow[0,1],\\qquad x\\longmapsto\n", 344 | "\t\\cases{C_{x-1}^{n-1}\\,\\pi^n(1-\\pi)^{x-n}& if\n", 345 | "\t$x\\in\\{n,n+1,n+2,\\dots\\}$,\\cr 0 & otherwise.}\n", 346 | "$$\n", 347 | "One can remark that the negative binomial distributions for $n=1$ are\n", 348 | "just the corresponding geometric ones. When $n=1$ it is therefore\n", 349 | "better to use the functions related to geometric distributions rather\n", 350 | "than those related to negative binomial distributions.\n", 351 | "\n", 352 | "\\item{negativebinomialcdf(\\vartype{double} $x$, \\vartype{int} $n$, \\vartype{double} $\\pi$)} Cumulative\n", 353 | "distribution function of the negative binomial distribution with\n", 354 | "parameters $n\\in\\mathbb N^*$ and $\\pi\\in[0,1]$. 
One can easily prove\n", 355 | "that it is given by\n", 356 | "$$\n", 357 | "\t\\mathbb R\\longrightarrow[0,1], \\qquad x\\longmapsto\n", 358 | "\t1-\\mathop{\\rm binomialcdf}(n-1,\\mathop{\\rm floor} x,\\pi).\n", 359 | "$$\n", 360 | "\n", 361 | "\\item{negativebinomialicdf(\\vartype{double} $p$, \\vartype{int} $n$, \\vartype{double} $\\pi$)} Inverse\n", 362 | "cumulative distribution function of the negative binomial distribution\n", 363 | "with parameters $n\\in\\mathbb N^*$ and $\\pi\\in[0,1]$.\n", 364 | "$$\n", 365 | "\t[0,1]\\longrightarrow\\{n,n+0.5,n+1,n+1.5,\\dots\\}, \\qquad\n", 366 | "\tp\\longmapsto\\mathop{\\rm\n", 367 | "\tnegativebinomialicdf}(p,n,\\pi).\n", 368 | "$$\n", 369 | "\n", 370 | "% Hypergeometric distributions\n", 371 | "\n", 372 | "\\item{hypergeometricpdf(\\vartype{double} $x$, \\vartype{int} $N$, \\vartype{int} $n$, \\vartype{double} $\\pi$)} Probability\n", 373 | "distribution function of $\\mathcal H(N, n,\\pi)$, the\n", 374 | "hypergeometric distribution with parameters $N\\geq n\\in\\mathbb N^*$\n", 375 | "and $\\pi\\in[0,1]$. 
One should have $N\\pi\\in\\mathbb N$.\n", 376 | "$$\n", 377 | "\\eqalign{\n", 378 | "\t\\mathbb R&\\longrightarrow[0,1],\\cr\n", 379 | "\tx&\\longmapsto\\cases{\\displaystyle\n", 380 | "\t{C_{N\\pi}^x\\,C_{N(1-\\pi)}^{n-x}\\over C_N^n}& if\n", 381 | "\t$x\\in\\{\\max(0,n-N(1-\\pi)),\\dots,\\min(n,N\\pi)\\}$,\\cr 0 &otherwise.}\n", 382 | "\t}\n", 383 | "$$\n", 384 | "\n", 385 | "\\item{hypergeometriccdf(\\vartype{double} $x$, \\vartype{int} $N$, \\vartype{int} $n$, \\vartype{double} $\\pi$)} Cumulative\n", 386 | "distribution function of $\\mathcal H(N, n,\\pi)$.\n", 387 | "\n", 388 | "\\item{hypergeometricicdf(\\vartype{double} $p$, \\vartype{int} $N$, \\vartype{int} $n$, \\vartype{double} $\\pi$)} Inverse\n", 389 | "cumulative distribution function of $\\mathcal H(N,n,\\pi)$.\n", 390 | "\n", 391 | "\\section{Rather specific Probability distributions\\footnote{{\\it See}\\/ \\cs{probability-distributions.h}}}\n", 392 | "\n", 393 | "\\item{Kolmogorovcdf(\\vartype{double} $x$, \\vartype{int} $n$)} Cumulative\n", 394 | "distribution function of Kolmogorov distributions.\n", 395 | "These are the famous probability distributions involved in\n", 396 | "Kolmogorov--Smirnov (two-sided) Goodness of Fit tests\n", 397 | "with statistic\n", 398 | "$$\n", 399 | "\tK_n={\\|F-F_n\\|}_\\infty=\\sup_{x\\in\\mathbb R}\\bigl|F(x)-F_n(x)\\bigr|\n", 400 | "\t=\\max\\nolimits_{i = 1}^n \\bigl(F(X_{(i)})-(i-1)/n\\bigr)\n", 401 | "\t\\vee\\bigl(i/n-F(X_{(i)})\\bigr).\n", 402 | "$$\n", 403 | "Their\n", 404 | "computation is based on ``Evaluating Kolmogorov's Distribution''\n", 405 | "by George Marsa\\-glia and Wai Wan Tsang.\n", 406 | "\n", 407 | "\\item{kolmogorovicdf(\\vartype{double} $p$, \\vartype{int} $n$)} Inverse\n", 408 | "cumulative distribution function of Kolmogorov distributions.\n", 409 | "(Please do not use it since it is based on the bisection\n", 410 | "method or dichotomy.)\n", 411 | "\n", 412 | "\\item{klmcdf(\\vartype{double} $x$, \\vartype{int} $n$)}\n", 413 | 
"Cumulative distribution function of the limiting distributions\n", 414 | "associated to Kolmogorov distribution by Dudley's asymptotic\n", 415 | "formula (1964):\n", 416 | "$$\n", 417 | "\t\\lim_{n\\to\\infty}\\mathbb P\\bigl\\{K_n\\leq u/\\!\\sqrt n\\bigr\\}\n", 418 | "\t=1+2\\sum_{k=1}^\\infty(-1)^k\\exp\\bigl(-2k^2u^2\\bigr),\n", 419 | "$$\n", 420 | "with some numerical adjustments (Stephens M.A., 1970).\n", 421 | "\n", 422 | "\\item{klmicdf(\\vartype{double} $p$, \\vartype{int} $n$)} Inverse\n", 423 | "cumulative distribution function of the previous distributions.\n", 424 | "\n", 425 | "\n", 426 | "\\item{kpmcdf(\\vartype{double} $x$, \\vartype{int} $n$)}\n", 427 | "Cumulative distribution function of the distribution involved in the\n", 428 | "one sided Goodness of Fit tests with statistic\n", 429 | "$$\n", 430 | "\tK^+_n=\\sup_{x\\in\\mathbb R}\\bigl(F(x)-F_n(x)\\bigr)\n", 431 | "\t=\\max\\nolimits_{1\\leq i\\leq n}\n", 432 | "\t\\Bigl({i\\over n}-F\\bigl(X_{(i)}\\bigr)\\Bigr)\n", 433 | "$$\n", 434 | "or\n", 435 | "$$\n", 436 | "\tK^-_n=\\sup_{x\\in\\mathbb R}\\bigl(F_n(x)-F(x)\\bigr)\n", 437 | "\t=\\max\\nolimits_{1\\leq i\\leq n}\n", 438 | "\t\\Bigl(F\\bigl(X_{(i)}\\bigr)-{i-1\\over n}\\Bigr)\n", 439 | "$$\n", 440 | "which share the same distribution: for $x\\in[0,1]$,\n", 441 | "$$\n", 442 | "\\eqalign{\n", 443 | " \\hbox{\\tt kpmcdf}(x,n)\n", 444 | " =\\Proba\\bigl\\{K^\\pm_n\\leq x\\bigr\\}\n", 445 | "\t&=x\\sum_{0\\leq k\\leq nx}{n\\choose k}(k/n-x)^k(x+1-k/n)^{n-k-1}\\cr\n", 446 | " &=1-x\\sum_{nx\n", 46 | " " 47 | ], 48 | "text/plain": [ 49 | "\n", 50 | " \n", 57 | " " 58 | ] 59 | }, 60 | "execution_count": 2, 61 | "metadata": {}, 62 | "output_type": "execute_result" 63 | } 64 | ], 65 | "source": [ 66 | "from IPython.display import IFrame\n", 67 | "IFrame('https://www.baidu.com', width='100%', height=350)" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 9, 73 | "metadata": { 74 | "collapsed": true 75 | }, 76 | "outputs": [], 
77 | "source": [ 78 | "from IPython.display import HTML\n", 79 | "js_script_str = '''\n", 91 | "
{_}
'''" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": 10, 97 | "metadata": {}, 98 | "outputs": [ 99 | { 100 | "data": { 101 | "text/html": [ 102 | "\n", 114 | "
content to hide
" 115 | ], 116 | "text/plain": [ 117 | "\n", 129 | "
content to hide
" 130 | ] 131 | }, 132 | "execution_count": 10, 133 | "metadata": {}, 134 | "output_type": "execute_result" 135 | } 136 | ], 137 | "source": [ 138 | "HTML(js_script_str.replace('{_}', 'content to hide'))" 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": null, 144 | "metadata": { 145 | "collapsed": true 146 | }, 147 | "outputs": [], 148 | "source": [] 149 | } 150 | ], 151 | "metadata": { 152 | "kernelspec": { 153 | "display_name": "Python 3", 154 | "language": "python", 155 | "name": "python3" 156 | }, 157 | "language_info": { 158 | "codemirror_mode": { 159 | "name": "ipython", 160 | "version": 3 161 | }, 162 | "file_extension": ".py", 163 | "mimetype": "text/x-python", 164 | "name": "python", 165 | "nbconvert_exporter": "python", 166 | "pygments_lexer": "ipython3", 167 | "version": "3.5.2" 168 | }, 169 | "widgets": { 170 | "state": {}, 171 | "version": "1.1.2" 172 | } 173 | }, 174 | "nbformat": 4, 175 | "nbformat_minor": 2 176 | } 177 | -------------------------------------------------------------------------------- /test/test_notebook2.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "This notebook is for testing out basic functionalities of nbviewer rendering of equations and code blocks.
8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Let me try an equation\n", 15 | "\n", 16 | "$$\\frac{1}{N} \\sum_{i=1}^N Z_i \\rightarrow E[ Z ], \\;\\;\\; N \\rightarrow \\infty.$$\n", 17 | "\n", 18 | "\\begin{eqnarray}\n", 19 | "\\nabla \\times \\vec{\\mathbf{B}} -\\, \\frac1c\\, \\frac{\\partial\\vec{\\mathbf{E}}}{\\partial t} & = \\frac{4\\pi}{c}\\vec{\\mathbf{j}} \\\\\n", 20 | "\\nabla \\cdot \\vec{\\mathbf{E}} & = 4 \\pi \\rho \\\\\n", 21 | "\\nabla \\times \\vec{\\mathbf{E}}\\, +\\, \\frac1c\\, \\frac{\\partial\\vec{\\mathbf{B}}}{\\partial t} & = \\vec{\\mathbf{0}} \\\\\n", 22 | "\\nabla \\cdot \\vec{\\mathbf{B}} & = 0 \n", 23 | "\\end{eqnarray}" 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": 7, 29 | "metadata": {}, 30 | "outputs": [ 31 | { 32 | "data": { 33 | "text/html": [ 34 | "\n", 35 | " \n", 42 | " " 43 | ], 44 | "text/plain": [ 45 | "" 46 | ] 47 | }, 48 | "execution_count": 7, 49 | "metadata": {}, 50 | "output_type": "execute_result" 51 | } 52 | ], 53 | "source": [ 54 | "from IPython.display import IFrame\n", 55 | "IFrame('https://www.baidu.com', width='100%', height=350)" 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": null, 61 | "metadata": { 62 | "collapsed": true 63 | }, 64 | "outputs": [], 65 | "source": [] 66 | } 67 | ], 68 | "metadata": { 69 | "kernelspec": { 70 | "display_name": "Python 2", 71 | "language": "python", 72 | "name": "python2" 73 | }, 74 | "language_info": { 75 | "codemirror_mode": { 76 | "name": "ipython", 77 | "version": 2 78 | }, 79 | "file_extension": ".py", 80 | "mimetype": "text/x-python", 81 | "name": "python", 82 | "nbconvert_exporter": "python", 83 | "pygments_lexer": "ipython2", 84 | "version": "2.7.13" 85 | } 86 | }, 87 | "nbformat": 4, 88 | "nbformat_minor": 2 89 | } 90 | --------------------------------------------------------------------------------
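The `probability-distributions.h` manual above expresses the normal cdf through the Gamma cdf: since $X^2/2\sim\Gamma(1/2,1)$ for $X\sim\mathcal N(0,1)$, one has $\Phi(x)={1\over2}\bigl(\mathop{\rm sgn}(x)\times\mathop{\rm gammacdf}(x^2/2,1/2,1)+1\bigr)$. This identity is easy to sanity-check numerically. The sketch below is a hypothetical Python re-creation (the documented library is C; `gammacdf` here is rebuilt from the definition in the text via the power series of the regularized lower incomplete gamma function, not taken from the header):

```python
# Hypothetical Python sketch checking Phi(x) = (sgn(x)*gammacdf(x^2/2, 1/2, 1) + 1)/2.
import math

def gammacdf(x, a, lam):
    """CDF of the Gamma distribution with shape a and rate lam, computed from the
    power series of the regularized lower incomplete gamma function P(a, lam*x)."""
    if x <= 0:
        return 0.0
    z = lam * x
    term = 1.0 / a          # k = 0 term of sum_k z^k / (a (a+1) ... (a+k))
    total = term
    for k in range(1, 500):
        term *= z / (a + k)
        total += term
        if term < 1e-17 * total:
            break
    # P(a, z) = z^a e^{-z} / Gamma(a) * sum
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

def normalcdf_(x):
    """Phi(x) via the Gamma cdf identity, using X^2/2 ~ Gamma(1/2, 1)."""
    s = math.copysign(1.0, x)
    return 0.5 * (s * gammacdf(x * x / 2.0, 0.5, 1.0) + 1.0)

def phi_ref(x):
    """Reference value from the error function: Phi(x) = (1 + erf(x/sqrt(2)))/2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

With these definitions, `normalcdf_` and `phi_ref` agree to roughly machine precision over the range where the series converges quickly, which also exercises the three-argument signature `gammacdf(x, a, λ)` (the rate λ is the third parameter, so the identity needs λ = 1).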