├── .gitignore
├── README.md
├── check_env.py
├── data
│   ├── Demographics_State.csv
│   ├── Population_State.csv
│   └── Weed_Price.csv
├── notebooks
│   ├── 1. Introduction.ipynb
│   ├── 2. Warm-up.ipynb
│   ├── 3. Resampling.ipynb
│   ├── 4. Basic Metrics.ipynb
│   ├── 5. Distributions.ipynb
│   ├── 6. Hypothesis Testing.ipynb
│   ├── 7. Linear Regression.ipynb
│   ├── 8. Closing thoughts and terminology.ipynb
│   ├── 9. References.ipynb
│   └── img
│       ├── 6sigma.png
│       ├── binomial.gif
│       ├── binomial_pmf.png
│       ├── correlation.gif
│       ├── correlation_not_causation.gif
│       ├── covariance.png
│       ├── exponential_pdf.png
│       ├── kurtosis.png
│       ├── leastsquare.gif
│       ├── normal_cdf.png
│       ├── normal_pdf.png
│       ├── normaldist.png
│       ├── skewness.png
│       ├── uniform.png
│       └── variance.png
├── requirements.txt
└── requirements_linux.txt
/.gitignore:
--------------------------------------------------------------------------------
1 | # Created by https://www.gitignore.io/api/vim,ipythonnotebook,virtualenv
2 |
3 | ### Vim ###
4 | [._]*.s[a-w][a-z]
5 | [._]s[a-w][a-z]
6 | *.un~
7 | Session.vim
8 | .netrwhist
9 | *~
10 |
11 |
12 | ### IPythonNotebook ###
13 | # Temporary data
14 | .ipynb_checkpoints/
15 |
16 |
17 | ### VirtualEnv ###
18 | # Virtualenv
19 | # http://iamzed.com/2009/05/07/a-primer-on-virtualenv/
20 | .Python
21 | [Bb]in
22 | [Ii]nclude
23 | [Ll]ib
24 | [Ss]cripts
25 | pyvenv.cfg
26 | pip-selfcheck.json
27 |
28 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Introduction to Statistics
2 |
3 |
4 |
5 |
6 | Inspired by Allen Downey's books [Think Stats](http://greenteapress.com/thinkstats/) and [Think Bayes](http://greenteapress.com/thinkbayes/), this is an attempt to learn Statistics using an application-centric programming approach.
7 |
8 | ## Objective
9 | Showcase real-life examples and the statistics to use in each of them. Most books teach a concept and then show an example, so every topic ends up treated separately and no holistic view emerges. Here, we start from the examples and see how to make sense of them.
10 |
11 | ## Topics covered
12 |
13 | * Mean, Median, Mode
14 | * Standard Deviation
15 | * Variance
16 | * Co-variance
17 | * Probability Distribution
18 | * Hypothesis Testing
19 | * t-test, p-value, chi-squared test
20 | * Confidence Intervals
21 | * Confidence levels and Significance levels
22 | * Correlation
23 | * Resampling (and uses in Big Data)
24 | * A/B Testing
25 | * A simple linear regression model
26 |
27 | ## Workshop Plan
28 | We will use marijuana prices in various states of the USA, along with demographic data of the USA based on the latest census data.
29 | 
30 | There will be separate IPython notebooks, grouped by topic. *notebooks will be uploaded later*
31 | Some examples include:
32 | * Find sum of people buying weed in a year, by various states.
33 | * Find mean of price in a week/month, by various states.
34 | * Find variance of price in selected states. Find variance of selected states by week of month
35 | * Define distribution. Plot histograms
36 | * Determining outliers (Plots, quantiles, box plots, percentiles) in weed price data
37 | * Continuous distributions (exponential distribution, normal distribution)
38 | * Introduction to Probability
39 | * Hypothesis testing. Check whether weed prices across states are similar. Check for different qualities of weed
40 | * Resampling
41 | * Simple regression model: Predict weed price for the next month. Understand the output and diagnostics
42 | * Introduction to A/B testing: Impact of regulation and deregulation on a couple of states
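
A rough sketch of the kind of basic metrics listed above, using only the standard library. The five rows are copied from `data/Demographics_State.csv` (the `region` and `per_capita_income` columns); the helper names `mean`, `median`, and `variance` are our own, chosen for illustration, and the libraries listed under Prerequisites provide equivalents.

```python
from __future__ import division, print_function

# (region, per_capita_income) for the first five rows of
# data/Demographics_State.csv
rows = [
    ("alabama", 23680),
    ("alaska", 32651),
    ("arizona", 25358),
    ("arkansas", 22170),
    ("california", 29527),
]


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def median(values):
    """Middle value; average of the two middle values if the count is even."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def variance(values):
    """Population variance: mean squared deviation from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


incomes = [income for _, income in rows]
mean_income = mean(incomes)
median_income = median(incomes)
std_income = variance(incomes) ** 0.5  # standard deviation

print("mean:", mean_income)
print("median:", median_income)
print("std dev:", round(std_income, 1))
```

The mean sits above the median here because Alaska and California pull the average up, which is the sort of observation the notebooks explore on the full data.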
43 |
44 |
45 | ## Prerequisites
46 | * Basics of Python. Users should know how to write functions; read and parse text files (csv, txt, fwf); use conditional and looping constructs; use standard libraries such as os and sys; and work with lists, list comprehensions, and dictionaries
47 | * It is good to know basics of the following:
48 |     * Numpy
49 |     * Scipy
50 |     * Pandas
51 |     * Matplotlib
52 |     * Seaborn
53 | * IPython and IPython notebook - Everything here would be an IPython notebook
54 | * Software Requirements
55 |     * Python 2.7
56 |     * git - so that this repo can be cloned :)
57 |     * virtualenv
58 |     * Libraries from *requirements.txt*
59 |
60 | ## Optional
61 | Users could choose to install Anaconda, if they want. If using Anaconda or Enthought, please ensure that all libraries listed in the requirements.txt are installed.
62 |
63 | *Note to Windows Users*: Neither of us uses Windows. In past workshops, Windows users have had trouble installing with the steps below. We recommend installing Anaconda and ensuring that all the libraries listed in the *requirements.txt* file are installed.
64 |
65 | ## Setup Guide
66 |
67 | #### Clone the repository
68 |     $ git clone https://github.com/rouseguy/intro2stats.git
69 | 
70 | #### Create a virtual environment & activate
71 |     $ cd intro2stats
72 |     $ virtualenv env
73 |     $ source env/bin/activate
74 | 
75 | #### Install requirements from requirements file
76 |     $ pip install -r requirements.txt
77 | 
78 | #### Note: Make sure you have libraries for png & freetype.
79 | Ubuntu users can install the below
80 | 
81 |     apt-get install libfreetype6-dev
82 |     apt-get install libpng-dev
83 | 
84 | ### Script to check if installation is fine for the workshop
85 | Please execute the following at the command prompt
86 | 
87 |     $ python check_env.py
88 |
89 | If any library has a `FAIL` message, please install/upgrade that library.
90 |
91 | ---
92 |
93 | Introduction to Statistics using Python by Bargava and Raghotham is licensed under a Creative Commons Attribution 4.0 International License.
94 |
95 |
96 |
97 |
--------------------------------------------------------------------------------
/check_env.py:
--------------------------------------------------------------------------------
1 | """
2 | This script will check if the environment setup is correct for the workshop.
3 |
4 | To run, please execute the following command from the command prompt
5 |     $ python check_env.py
6 | 
7 | The output will indicate if any of the libraries are missing or need to be updated.
8 | 
9 | This script is inspired by https://github.com/fonnesbeck/scipy2015_tutorial/blob/master/check_env.py
10 | """
11 |
12 | from __future__ import print_function
13 |
14 | try:
15 |     import curses
16 |     curses.setupterm()
17 |     assert curses.tigetnum("colors") > 2
18 |     OK = "\x1b[1;%dm[ OK ]\x1b[0m" % (30 + curses.COLOR_GREEN)
19 |     FAIL = "\x1b[1;%dm[FAIL]\x1b[0m" % (30 + curses.COLOR_RED)
20 | except Exception:
21 |     OK = '[ OK ]'
22 |     FAIL = '[FAIL]'
23 | 
24 | import sys
25 | try:
26 |     import importlib
27 | except ImportError:
28 |     print(FAIL, "Python version 2.7 is required, but %s is installed." % sys.version)
29 | from distutils.version import LooseVersion as Version
30 | 
31 | def import_version(pkg, min_ver, fail_msg=""):
32 |     mod = None
33 |     try:
34 |         mod = importlib.import_module(pkg)
35 |         # workaround for Image not having __version__
36 |         version = getattr(mod, "__version__", 0) or getattr(mod, "VERSION", 0)
37 |         if Version(version) < min_ver:
38 |             print(FAIL, "%s version %s or higher required, but %s installed."
39 |                   % (pkg, min_ver, version))
40 |         else:
41 |             print(OK, '%s version %s' % (pkg, version))
42 |     except ImportError:
43 |         print(FAIL, '%s not installed. %s' % (pkg, fail_msg))
44 |     return mod
45 |
46 |
47 | # first check the python version
48 | print('Using python in', sys.prefix)
49 | print(sys.version)
50 | pyversion = Version(sys.version)
51 | if pyversion >= "3":
52 |     print(FAIL, "Python version 2.7 is required, but %s is installed." % sys.version)
53 | elif pyversion >= "2":
54 |     if pyversion < "2.7":
55 |         print(FAIL, "Python version 2.7 is required, but %s is installed." % sys.version)
56 | else:
57 |     print(FAIL, "Unknown Python version: %s" % sys.version)
58 | 
59 | print()
60 | requirements = {'numpy': "1.9.2", 'pandas': "0.16.2", 'scipy': "0.9", 'matplotlib': "1.4.3",
61 |                 'IPython': "4.0", 'sklearn': "0.16.1", 'seaborn': "0.6.0", 'statsmodels': "0.6.1"}
62 | 
63 | # now the dependencies
64 | for lib, required_version in list(requirements.items()):
65 |     import_version(lib, required_version)
66 |
--------------------------------------------------------------------------------
/data/Demographics_State.csv:
--------------------------------------------------------------------------------
1 | "region","total_population","percent_white","percent_black","percent_asian","percent_hispanic","per_capita_income","median_rent","median_age"
2 | "alabama",4799277,67,26,1,4,23680,501,38.1
3 | "alaska",720316,63,3,5,6,32651,978,33.6
4 | "arizona",6479703,57,4,3,30,25358,747,36.3
5 | "arkansas",2933369,74,15,1,7,22170,480,37.5
6 | "california",37659181,40,6,13,38,29527,1119,35.4
7 | "colorado",5119329,70,4,3,21,31109,825,36.1
8 | "connecticut",3583561,70,9,4,14,37892,880,40.2
9 | "delaware",908446,65,21,3,8,29819,828,38.9
10 | "district of columbia",619371,35,49,3,10,45290,1154,33.8
11 | "florida",19091156,57,15,2,23,26236,838,41
12 | "georgia",9810417,55,30,3,9,25182,673,35.6
13 | "hawaii",1376298,23,2,37,9,29305,1220,38.3
14 | "idaho",1583364,84,1,1,11,22568,607,34.9
15 | "illinois",12848554,63,14,5,16,29666,759,36.8
16 | "indiana",6514861,81,9,2,6,24635,577,37.1
17 | "iowa",3062553,88,3,2,5,27027,534,38.1
18 | "kansas",2868107,78,6,2,11,26929,551,36
19 | "kentucky",4361333,86,8,1,3,23462,506,38.2
20 | "louisiana",4567968,60,32,2,4,24442,610,36
21 | "maine",1328320,94,1,1,1,26824,664,43.2
22 | "maryland",5834299,54,29,6,8,36354,1034,38
23 | "massachusetts",6605058,76,6,6,10,35763,936,39.2
24 | "michigan",9886095,76,14,3,5,25681,623,39.1
25 | "minnesota",5347740,83,5,4,5,30913,734,37.6
26 | "mississippi",2976872,58,37,1,3,20618,510,36.2
27 | "missouri",6007182,81,11,2,4,25649,549,38
28 | "montana",998554,87,0,1,3,25373,577,39.9
29 | "nebraska",1841625,82,4,2,9,26899,563,36.3
30 | "nevada",2730066,53,8,7,27,26589,840,36.6
31 | "new hampshire",1319171,92,1,2,3,33134,878,41.5
32 | "new jersey",8832406,59,13,9,18,36027,1024,39.1
33 | "new mexico",2069706,40,2,1,47,23763,635,36.7
34 | "new york",19487053,58,14,8,18,32382,963,38.1
35 | "north carolina",9651380,65,21,2,9,25284,602,37.6
36 | "north dakota",689781,88,1,1,2,29732,564,36.4
37 | "ohio",11549590,81,12,2,3,26046,562,39
38 | "oklahoma",3785742,68,7,2,9,24208,525,36.2
39 | "oregon",3868721,78,2,4,12,26809,749,38.7
40 | "pennsylvania",12731381,79,10,3,6,28502,652,40.3
41 | "rhode island",1051695,76,5,3,13,30469,781,39.6
42 | "south carolina",4679602,64,28,1,5,23943,582,38.1
43 | "south dakota",825198,84,1,1,3,25740,517,36.9
44 | "tennessee",6402387,75,17,1,5,24409,568,38.2
45 | "texas",25639373,45,12,4,38,26019,688,33.8
46 | "utah",2813673,80,1,2,13,23873,739,29.6
47 | "vermont",625904,94,1,1,2,29167,754,42
48 | "virginia",8100653,64,19,6,8,33493,910,37.5
49 | "washington",6819579,72,3,7,11,30742,853,37.3
50 | "west virginia",1853619,93,3,1,1,22966,448,41.5
51 | "wisconsin",5706871,83,6,2,6,27523,636,38.7
52 | "wyoming",570134,85,1,1,9,28902,647,36.8
53 |
--------------------------------------------------------------------------------
/data/Population_State.csv:
--------------------------------------------------------------------------------
1 | "region","value"
2 | "alabama",4777326
3 | "alaska",711139
4 | "arizona",6410979
5 | "arkansas",2916372
6 | "california",37325068
7 | "colorado",5042853
8 | "connecticut",3572213
9 | "delaware",900131
10 | "district of columbia",605759
11 | "florida",18885152
12 | "georgia",9714569
13 | "hawaii",1362730
14 | "idaho",1567803
15 | "illinois",12823860
16 | "indiana",6485530
17 | "iowa",3047646
18 | "kansas",2851183
19 | "kentucky",4340167
20 | "louisiana",4529605
21 | "maine",1329084
22 | "maryland",5785496
23 | "massachusetts",6560595
24 | "michigan",9897264
25 | "minnesota",5313081
26 | "mississippi",2967620
27 | "missouri",5982413
28 | "montana",990785
29 | "nebraska",1827306
30 | "nevada",2704204
31 | "new hampshire",1317474
32 | "new jersey",8793888
33 | "new mexico",2055287
34 | "new york",19398125
35 | "north carolina",9544249
36 | "north dakota",676253
37 | "ohio",11533561
38 | "oklahoma",3749005
39 | "oregon",3836628
40 | "pennsylvania",12699589
41 | "rhode island",1052471
42 | "south carolina",4630351
43 | "south dakota",815871
44 | "tennessee",6353226
45 | "texas",25208897
46 | "utah",2766233
47 | "vermont",625498
48 | "virginia",8014955
49 | "washington",6738714
50 | "west virginia",1850481
51 | "wisconsin",5687219
52 | "wyoming",562803
53 |
--------------------------------------------------------------------------------
/notebooks/1. Introduction.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "> **The fact that data science exists as a field is a colossal failure of statistics** - Hadley Wickham\n",
8 | "\n",
9 | "\n",
10 | "# THERE ARE HACKS, DAMN HACKS, AND THERE ARE STATISTICS \n",
11 | "\n",
12 | "\n",
13 | "Statistics, to most people, is a bunch of formulae, invariably complicated and with strong assumptions. Very few practitioners in fields outside math can derive those from first principles. But does it necessarily need to be that way?\n",
14 | "\n",
15 | "## Philosophy for this workshop\n",
16 | "Instead of a formula-based, discretely-structured classical framework for teaching statistics, we believe in the hackers' philosophy of DIY. \n",
17 | "> **I hear and I forget. I see and I remember. I do and I understand - Confucius** \n",
18 | "\n",
19 | "What is the point of learning something if you don't know how and where to apply that? We aim to bridge that gap a bit.\n",
20 | "\n",
21 | "## Data Analysis\n",
22 | "\n",
23 | "The art of data analysis\n",
24 | "\n",
25 | "> https://github.com/amitkaps/weed/blob/master/0-Intro.ipynb\n",
26 | "\n",
27 | "# Statistics: Inferring results about a population given a sample\n",
28 | "\n",
29 | "\n",
30 | "# *Where does statistics fit in Business Analytics?*\n",
31 | "\n",
32 | "> DISCUSSION\n",
33 | "\n"
34 | ]
35 | },
36 | {
37 | "cell_type": "code",
38 | "execution_count": null,
39 | "metadata": {
40 | "collapsed": true
41 | },
42 | "outputs": [],
43 | "source": []
44 | }
45 | ],
46 | "metadata": {
47 | "kernelspec": {
48 | "display_name": "Python 2",
49 | "language": "python",
50 | "name": "python2"
51 | },
52 | "language_info": {
53 | "codemirror_mode": {
54 | "name": "ipython",
55 | "version": 2
56 | },
57 | "file_extension": ".py",
58 | "mimetype": "text/x-python",
59 | "name": "python",
60 | "nbconvert_exporter": "python",
61 | "pygments_lexer": "ipython2",
62 | "version": "2.7.10"
63 | }
64 | },
65 | "nbformat": 4,
66 | "nbformat_minor": 0
67 | }
68 |
--------------------------------------------------------------------------------
/notebooks/2. Warm-up.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "> **SOMETIMES THE QUESTIONS ARE COMPLICATED AND THE ANSWERS ARE SIMPLE**\n",
8 | "\n",
9 | ">*Dr. Seuss*\n",
10 | "\n",
11 | "## Coin Toss\n",
12 | "\n",
13 | "You toss a coin 30 times and see heads 24 times. Is it a fair coin?\n",
14 | "\n",
15 | "**Hypothesis 1**: A fair coin should give about 15 heads in 30 tosses, so this coin is biased.\n",
16 | "\n",
17 | "**Hypothesis 2**: Come on, even a fair coin could show 24 heads in 30 tosses. This is just chance.\n",
18 | "\n",
19 | "#### Statistical Method\n",
20 | "\n",
21 | "P(H) = ? \n",
22 | "\n",
23 | "P(HH) = ?\n",
24 | "\n",
25 | "P(THH) = ?\n",
26 | "\n",
27 | "Now, slightly tougher : P(2H, 1T) = ?\n",
28 | "\n",
29 | "Generalizing, \n",
30 | "\n",
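31 | "$$P(k \\text{ heads in } n \\text{ tosses}) = \\binom{n}{k}\\, p^k (1-p)^{n-k}$$\n",
32 | "\n",
33 | "where $n$ is the number of tosses, $k$ the number of heads, and $p$ the probability of heads on a single toss.\n",
34 | "\n",
35 | "\n",
36 | "\n",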
37 | "\n",
38 | "\n",
39 | "**What is the probability of getting 24 heads in 30 tosses?**\n",
40 | "\n",
41 | "What we actually need is the probability of getting heads 24 times or more, assuming the coin is fair. \n",
42 | "\n",
43 | "#### Hacker's Approach\n",
44 | "\n",
45 | "Simulation. Run the experiment 100,000 times. Find the percentage of times the experiment returned 24 or more heads. If it is less than 5%, a fair coin rarely produces such a result, so we conclude that the coin is biased. "
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 1,
51 | "metadata": {
52 | "collapsed": false
53 | },
54 | "outputs": [
55 | {
56 | "name": "stdout",
57 | "output_type": "stream",
58 | "text": [
59 | "Data of the Experiment: [1 1 0 1 0 0 1 0 0 1 1 0 1 1 0 1 1 0 0 0 0 0 1 0 0 1 1 1 1 0]\n",
60 | "Heads in the Experiment: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]\n",
61 | "Number of heads in the experiment: 15\n"
62 | ]
63 | }
64 | ],
65 | "source": [
66 | "import numpy as np \n",
67 | "\n",
68 | "total_tosses = 30\n",
69 | "num_heads = 24\n",
70 | "prob_head = 0.5\n",
71 | "\n",
72 | "#0 is tail. 1 is heads. Generate one experiment\n",
73 | "experiment = np.random.randint(0,2,total_tosses)\n",
74 | "print \"Data of the Experiment:\", experiment\n",
75 | "#Find the number of heads\n",
76 | "print \"Heads in the Experiment:\", experiment[experiment==1] #This will give all the heads in the array\n",
77 | "head_count = experiment[experiment==1].shape[0] #This will get the count of heads in the array\n",
78 | "print \"Number of heads in the experiment:\", head_count"
79 | ]
80 | },
81 | {
82 | "cell_type": "code",
83 | "execution_count": 2,
84 | "metadata": {
85 | "collapsed": false
86 | },
87 | "outputs": [],
88 | "source": [
89 | "#Now, the above experiment needs to be repeated 100 times. Let's write a function and put the above code in a loop\n",
90 | "\n",
91 | "def coin_toss_experiment(times_to_repeat):\n",
92 | "\n",
93 | "    head_count = np.empty([times_to_repeat,1], dtype=int)\n",
94 | "    \n",
95 | "    for times in np.arange(times_to_repeat):\n",
96 | "        experiment = np.random.randint(0,2,total_tosses)\n",
97 | "        head_count[times] = experiment[experiment==1].shape[0]\n",
98 | "    \n",
99 | "    return head_count"
100 | ]
101 | },
102 | {
103 | "cell_type": "code",
104 | "execution_count": 3,
105 | "metadata": {
106 | "collapsed": false
107 | },
108 | "outputs": [],
109 | "source": [
110 | "head_count = coin_toss_experiment(100)"
111 | ]
112 | },
113 | {
114 | "cell_type": "code",
115 | "execution_count": 4,
116 | "metadata": {
117 | "collapsed": false
118 | },
119 | "outputs": [
120 | {
121 | "data": {
122 | "text/plain": [
123 | "array([[15],\n",
124 | " [13],\n",
125 | " [15],\n",
126 | " [16],\n",
127 | " [11],\n",
128 | " [16],\n",
129 | " [14],\n",
130 | " [16],\n",
131 | " [13],\n",
132 | " [17]])"
133 | ]
134 | },
135 | "execution_count": 4,
136 | "metadata": {},
137 | "output_type": "execute_result"
138 | }
139 | ],
140 | "source": [
141 | "head_count[:10] "
142 | ]
143 | },
144 | {
145 | "cell_type": "code",
146 | "execution_count": 5,
147 | "metadata": {
148 | "collapsed": false
149 | },
150 | "outputs": [
151 | {
152 | "name": "stdout",
153 | "output_type": "stream",
154 | "text": [
155 | "Dimensions: (100, 1) \n",
156 | "Type of object: <type 'numpy.ndarray'>\n"
157 | ]
158 | }
159 | ],
160 | "source": [
161 | "print \"Dimensions:\", head_count.shape, \"\\n\",\"Type of object:\", type(head_count)"
162 | ]
163 | },
164 | {
165 | "cell_type": "code",
166 | "execution_count": 6,
167 | "metadata": {
168 | "collapsed": false
169 | },
170 | "outputs": [],
171 | "source": [
172 | "#Let's plot the above distribution\n",
173 | "import matplotlib.pyplot as plt\n",
174 | "%matplotlib inline\n",
175 | "import seaborn as sns\n",
176 | "sns.set(color_codes = True)"
177 | ]
178 | },
179 | {
180 | "cell_type": "code",
181 | "execution_count": 7,
182 | "metadata": {
183 | "collapsed": false
184 | },
185 | "outputs": [
186 | {
187 | "data": {
188 | "text/plain": [
189 | ""
190 | ]
191 | },
192 | "execution_count": 7,
193 | "metadata": {},
194 | "output_type": "execute_result"
195 | },
196 | {
197 | "data": {
198 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAeIAAAFVCAYAAAAzJuxuAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGspJREFUeJzt3W9slfX9//HXOaf0HNqeAyU9JmONDJlR77ikKiHGqeH7\nTdeoUbe5DGgb/3BDcIoisvFP7aaAmZlsS1mAmpmMoc0StyBGt2w6ZMNMyE8kYbpNZdPhwF9Liz3n\n9Jz2tNf1vUFkiND2Or2u86ZXn487wIHrXK/3uc75vM5V2nNFXNd1BQAATEStAwAAMJlRxAAAGKKI\nAQAwRBEDAGCIIgYAwBBFDACAoVGL+ODBg2ptbZUkvf/++1q4cKEWLVqkNWvWiJ98AgBgfEYs4o6O\nDq1bt07FYlGS1N7erqVLl+rZZ5/V4OCgdu/eXY6MAACE1ohFPGvWLLW3t586800kEjpx4oRc11Uu\nl9OUKVPKEhIAgLAasYgbGxsVi8VO/bmlpUXr16/XDTfcoJ6eHs2dOzfwgAAAhJmnb9ZauXKlnn32\nWb388su6+eab9cQTT4y6Df+PDADAuVV4+ceFQkHV1dWSpAsuuEAHDhwYdZtIJKKurkxp6SaAdDoZ\n2vnCPJvEfBMd801cYZ5NOjmfF2Mq4kgkIkl6/PHHtWzZMsXjcVVWVuqxxx7znhAAAJwyahHX19er\ns7NTknT11Vfr6quvDjwUAACTBR/oAQCAIYoYAABDFDEAAIYoYgAADFHEAAAYoogBADBEEQMAYIgi\nBgDAEEUMAIAhihgAAEMUMQAAhihiAAAMeboMIhAWjuOor69PmUzWOookqbq6WtEo74vPV47jKJfL\nlXWf8bh7zucnz5dwoYgxKeVyOb2y/4iGhu0Xs4FCXv9z1Rwlk96uYYryOfl8eV/xxNSy7bOmOqFs\nrvC523m+hA9FjEkrkajSsMtLAGMTT0zV1Kqasu2vqjrB83OSsD8dAABgEqOIAQAwRBEDAGCIIgYA\nwBBFDACAIYoYAABDFDEAAIYoYgAADFHEAAAYoogBADBEEQMAYGjUIj548KBaW1slScePH9fSpUvV\n0tKi5uZmHTlyJPCAAACE2YifKN7R0aEXXnhB1dXVkqQnn3xSt9xyi5qamvTGG2/o3XffVX19fVmC\nAgAQRiOeEc+aNUvt7e1yXVeSdODAAR07dkx33nmndu3apXnz5pUlJAAAYTXiGXFjY+Nnvvz80Ucf\nadq0aXrmmWe0efNmdXR0aNmyZYGHBFAejuMol8tZx5B0MoskRaNnP1+Ix11lMtmyZMlms3Idtyz7\nwuTj6WKX06dP1/z58yVJ8+fP16ZNm8a0XTod7gtYh3m+sM4Wj7vS4R4laxLWURSLDKmurkaplP+P\ntdfj19fXp1f2H1EiUeV7Fq96e7sUjVRo2vTas/+Dwz1lzVJVlSz78+Vs+wvy+VJOYV1bSuGpiBsa\nGrR7927dcsst2rdvny6++OIxbdfVlSkp3ESQTidDO1+YZ/v0TCqTLRgnkfL9BXV3ZzUwEPH1fks5\nfplMVkPD0fPigvTDw1E50XNnSdYkynb8hoejyuYGVFFZvufLueYL6vlSTmFeWyTvbzLG9ONLkcjJ\nA75q1Srt3LlTCxYs0N69e7VkyRLvCQEAwCmjvu2tr69XZ2enJGnmzJn6+c9/HngoAAAmCz7QAwAA\nQxQxAACGKGIAAAxRxAAAGKKIAQAwRBEDAGCIIgYAwBBFDACAIYoYAABDFDEAAIYoYgAADFHEAAAY\noogBADBEEQMAYIgiBgDAEEUMAIAhihgAAEMUMQAAhihiAAAMUcQAABiiiAEAMEQRAwBgiCIGAMAQ\nRQwAgCGKGAAAQxQxAACGRi3igwcPqrW19TO37dq1SwsWLAgsFAAAk0XFSH/Z0dGhF154QdXV1adu\ne/vtt/X8888HHgwAgMlgxDPiWbNmqb
29Xa7rSpJ6e3u1adMmrVmz5tRtAACgdCOeETc2NurIkSOS\nJMdxtHbtWq1atUrxeLws4YDJwHEcZbNZ3+83HneVyXi732w2K9fhTTZQTiMW8ekOHTqkDz/8UG1t\nbRocHNR7772njRs3avXq1aNum04nxxXyfBfm+cI6WzzuSod7lKxJWEfRYP4T/b9/fKxp0wf9vePD\nPZ436e3tUlVV8jx5XBKKxmIjZilXzrFkCcLZ9heLDKmurkap1MR+bYZ1bSnFmIv48ssv14svvihJ\n+uijj/Tggw+OqYQlqasrU1q6CSCdToZ2vjDP9umZYiZbME4i5foLikQrNOyO+eU4JsmahOf5hoej\nyuYGVFF5/jwu58pSynxBZQnCuebL9xfU3Z3VwECkbFn8Fua1RfL+JmNMP74UiXz2gLuu+7nbAACA\nd6MWcX19vTo7O0e9DQAAeMcHegAAYIgiBgDAEEUMAIAhihgAAEMUMQAAhihiAAAMUcQAABiiiAEA\nMEQRAwBgiCIGAMAQRQwAgCGKGAAAQxQxAACGKGIAAAxRxAAAGKKIAQAwRBEDAGCIIgYAwBBFDACA\nIYoYAABDFDEAAIYoYgAADFHEAAAYoogBADBEEQMAYIgiBgDA0KhFfPDgQbW2tkqS3nnnHTU3N6u1\ntVWLFy/W8ePHAw8IAECYjVjEHR0dWrdunYrFoiRpw4YNevjhh7V9+3Y1Njaqo6OjLCEBAAirEYt4\n1qxZam9vl+u6kqSnnnpKl156qSRpaGhI8Xg8+IQAAITYiEXc2NioWCx26s/pdFqS9Oabb2rHjh26\n4447Ag0HAEDYVXjd4KWXXtKWLVu0bds21dbWjmmbdDrpOdhEEub5wjpbPO5Kh3uUrElYR9FgPqFo\nLBZIFq/3GWQWr8aSpVw5rR6Xs+0vFhlSXV2NUqmJ/doM69pSCk9FvHPnTv3qV7/S9u3bNW3atDFv\n19WV8Rxsokink6GdL8yzZTLZk79mC8ZJpFx/QZFohSoq/c2SrEl4ni+oLKUYLUsp8wWVJQjnmi/f\nX1B3d1YDA5GyZfFbmNcWyfubjDEVcSQSkeM42rBhg2bOnKl7771XkjR37lzdd9993lMCAABJYyji\n+vp6dXZ2SpLeeOONwAMBADCZ8IEeAAAYoogBADBEEQMAYIgiBgDAEEUMAIAhihgAAEMUMQAAhihi\nAAAMUcQAABiiiAEAMEQRAwBgiCIGAMAQRQwAgCGKGAAAQxQxAACGKGIAAAxRxAAAGKKIAQAwRBED\nAGCIIgYAwBBFDACAIYoYAABDFDEAAIYoYgAADFHEAAAYoogBADBEEQMAYGjUIj548KBaW1slSR98\n8IEWLlyo5uZmtbW1yXXdwAMCABBmIxZxR0eH1q1bp2KxKEnauHGjHnzwQe3YsUOu6+qVV14pS0gA\nAMJqxCKeNWuW2tvbT535vv3227rqqqskSddee61ef/314BMCABBiFSP9ZWNjo44cOXLqz6d/Kbqq\nqkqZTGZMO0mnkyXGmxjCPF9YZ4vHXelwj5I1CesoGswnFI3FAsni9T6DzOLVWLKUK6fV43K2/cUi\nQ6qrq1EqNbFfm2FdW0oxYhGfKRr97wl0LpdTKpUa03ZdXWMr7IkonU6Gdr4wz5bJZE/+mi0YJ5Fy\n/QVFohWqqPQ3S7Im4Xm+oLKUYrQspcwXVJYgnGu+fH9B3d1ZDQxEypbFb2FeWyTvbzI8fdf0ZZdd\npn379kmS9uzZoyuvvNLTzgAAwGeN6Yw4Ejn5zmvVqlV6+OGHVSwWNWfOHDU1NQUaDgCAsBu1iOvr\n69XZ2SlJ+tKXvqTt27cHHgoAgMmCD/QAAMAQRQwAgCGKGAAAQxQxAACGKGIAAAxRxAAAGKKIAQAw\nRBEDAGCIIgYAwBBFDACAIYoYAABDFDEAAIYoYgAADFHEAAAYoogBADBEEQMAYIgiBgDAEEUMAIAh\nih
gAAEMUMQAAhihiAAAMUcQAABiiiAEAMEQRAwBgiCIGAMAQRQwAgKEKrxs4jqO1a9fqX//6l6LR\nqB577DFddNFFQWQDACD0PJ8R//nPf1Y+n9dzzz2n73znO/rxj38cRC4AACYFz0WcSCSUyWTkuq4y\nmYymTJkSRC4AACYFz1+abmho0ODgoJqamnTixAlt2bJl1G3S6WRJ4SaKMM8X1tnicVc63KNkTcI6\nigbzCUVjsUCyeL3PILN4NZYs5cpp9bicbX+xyJDq6mqUSk3s12ZY15ZSeC7ip59+Wg0NDVq+fLmO\nHTum22+/Xbt27VJlZeU5t+nqyowr5PksnU6Gdr4wz5bJZE/+mi0YJ5Fy/QVFohWqqPQ3S7Im4Xm+\noLKUYrQspcwXVJYgnGu+fH9B3d1ZDQxEypbFb2FeWyTvbzI8f2k6n8+rurpakpRKpVQsFuU4jte7\nAQAAKuGMePHixVq9erUWLVqkoaEhrVixQomE/ZexAACYiDwXcSqV0ubNm4PIAgDApMMHegAAYIgi\nBgDAEEUMAIAhihgAAEMUMQAAhihiAAAMUcQAABiiiAEAMEQRAwBgiCIGAMAQRQwAgCGKGAAAQxQx\nAACGKGIAAAxRxAAAGKKIAQAwRBEDAGCIIgYAwBBFDACAIYoYAABDFDEAAIYoYgAADFHEAAAYoogB\nADBEEQMAYIgiBgDAUEUpG23dulV//OMfVSwW1dLSoq9//et+5wIAYFLwXMRvvPGGDhw4oM7OTvX3\n9+vpp58OIhcAAJOC5yLeu3evLrnkEt1zzz3KZrP67ne/G0QuAAAmBc9F3NPTo6NHj2rr1q3697//\nraVLl+q3v/1tENkQMo7jKJfLWceQJGWzWbmuax0DALwXcW1trebMmaOKigrNnj1b8XhcPT09mjFj\nxjm3SaeT4wp5vgvzfH7O1tfXp1f2H1EiUeXbfZaqt7dLVVVJ1c6wP3aD+YSisZiSNQnf79vrfQaZ\nxauxZClXTqvH5Wz7i0WGVFdXo1TK/rk7HmFeN73yXMRXXHGFfvGLX+jOO+/Uxx9/rHw+r9ra2hG3\n6erKlBzwfJdOJ0M7n9+zZTJZDQ1HNeyW9D2CvhoePvkDA5lswTiJlOsvKBKtUEWlv1mSNQnP8wWV\npRSjZSllvqCyBOFc8+X7C+ruzmpgIFK2LH4L87opeX+T4XlFvP7667V//37ddtttchxHjz76qCKR\nifuEAADAUkmnJitXrvQ7BwAAkxIf6AEAgCGKGAAAQxQxAACGKGIAAAxRxAAAGKKIAQAwRBEDAGCI\nIgYAwBBFDACAIYoYAABDFDEAAIYoYgAADFHEAAAYoogBADBEEQMAYIgiBgDAEEUMAIAhihgAAEMU\nMQAAhihiAAAMUcQAABiiiAEAMEQRAwBgiCIGAMAQRQwAgCGKGAAAQyUX8fHjx3Xdddfpn//8p595\nAACYVEoq4mKxqEceeURTp071Ow8AAJNKSUX8wx/+UAsXLlQ6nfY7DwAAk0qF1w1+/etfa8aMGbrm\nmmu0detWua4bRC74xHEc5XK5kraNx11lMlnfsmSzWbkOzxcAOF3E9dikLS0tikQikqS//e1vmj17\ntn72s5+prq4ukIAYn76+Pr24529KJKqso6i3t0tVVUnVzrB/rhzv+ljRWIwsZJlwWfpzWf3v3AuV\nSqWso8Anns+If/nLX576fWtrq37wgx+MWsJdXRnvySaIdDp5Xs+XyWQ1NBzVsOv5UCtZk1AmW/At\ny/BwVNncgCoq/bvPUuX6C0omq32dbzxZItEK3x+XUo5fUFlKMVoWv5+f48kShHPNl+8vqLs7q4GB\nSNmy+O18XzfHK51Oevr3/PgSAACGvJ8mnWb79u1+5QAAYFLijBgAAEMUMQAAhihiAAAMUcQAABii\niAEAMEQRAwBgiCIGAMAQRQwAgCGKGAAAQxQxAACGKGIAAAxRxAAA
GBrXRR/ON/+/67iOdfeWdZ/T\nj1bpxIn+z90+NVGpi2dfWNYsAMLPcRxls1nrGJKk6upqRaOcz41XqIq4u/cT9QxMLes+i7m4MgPu\n526f0t+ni2eXNQqASWBwIK89b/UpNW26aY6BQl7/c9UcJZPerr2LzwtVEQPAZBBPTNXUqhrrGPAJ\nX1MAAMAQRQwAgCGKGAAAQxQxAACGKGIAAAxRxAAAGKKIAQAwRBEDAGCIIgYAwBBFDACAIYoYAABD\nnj9rulgsas2aNfrPf/6jwcFBLV26VPPnzw8iGwAAoee5iHft2qUZM2boySef1CeffKJbb72VIgYA\noESei7ipqUlf+9rXJJ28LmYsFvM9FAAAk4XnIq6qqpIkZbNZ3X///Vq+fLnvoQAAGCvHcZTL5axj\nnJJOe7tGc0nXIz569KjuvfdeNTc368Ybb/Q9VKlqu6qVj5b/EsvJmsTnbqt0i2WbeyTxuKua6oSq\nqj+fcSzONlupBvMJRWMxX+9zPFkkf+crVZCPi9f7PN+O0WhZypXT6nE52/7Ol2MUiwyprq5GqVRp\n65yf62NfX59e2X9EiUSVb/dZqkKhXxddNNPTNp5bq7u7W3fddZceffRRzZs3b0zbdHVlvO6mJL29\nOWX642XZ16eSNQllsoXP3T5lOF+2uUeSyWSVzRU07Hp/g3Ku2UqV6y8oEq1QRaV/9zmeLMlkta/z\njSdLEI9LKcfvfDtGI2Xx+/k5nixBONd858sxyvcX1N2d1cBAxPO26XTS1/Uxk8lqaDha0jrnt6Fh\n7z+M5HmLLVu2KJPJaPPmzWptbVVra6sGBgY87xgAAJRwRrxu3TqtW7cuiCwAAEw6fKAHAACGKGIA\nAAxRxAAAGKKIAQAwRBEDAGCIIgYAwBBFDACAIYoYAABDFDEAAIYoYgAADFHEAAAYoogBADBkf82o\nkHIcR5mM/WUQs9msXMe1jgEgZBzHUTabLWnbeNxVJlPatmcz0dc5ijggA4W8Xtn/vuKJqaY5Puk9\nrkRVjewvlw0gTAYH8trzVp9S06Z73ramOqFszr/rKU/0dY4iDlA8MVVTq2pMMxTyOdP9AwivUte4\nquqEhl3/6meir3P8HzEAAIYoYgAADFHEAAAYoogBADBEEQMAYIgiBgDAEEUMAIAhihgAAEMUMQAA\nhihiAAAMUcQAABjy/GGfjuOora1N//jHPzRlyhStX79eF154YRDZAAAIPc9nxH/4wx9ULBbV2dmp\nhx56SE888UQQuQAAmBQ8F/Gbb76pr371q5Kkr3zlKzp06JDvoQAAmCw8f2k6m82qpua/l72KxWJy\nHEfRqP1/N0cj0kD2eFn3WamEBrKfv65mZGhQA8V8WbOczUChoEg0pny/94twxyJDyvf7d83Q8WTx\n20ChoCkVUcWGrJME97iUcvzOt2M0Uha/n5/jyRKEc813vhwj1pZzZfG+7nsu4pqaGuVy/73241hK\nOJ1Oeg5WinT6K/pqWfYEAIA/PJ/GNjQ0aM+ePZKkt956S5dcconvoQAAmCwiruu6XjZwXVdtbW36\n+9//LknauHGjZs+eHUg4AADCznMRAwAA/9h/hxUAAJMYRQwAgCGKGAAAQxQxAACGAivirVu3asGC\nBfrmN7+p3/zmN0HtxoTjOFq9erUWLlyo5uZmHT582DqSbw4ePKjW1lZJ0gcffHBqxra2Nk307+s7\nfbZ33nlHzc3Nam1t1eLFi3X8eHk/CCYIp8/3qV27dmnBggVGifx1+nzHjx/X0qVL1dLSoubmZh05\ncsQ43fidPt/777+vhQsXatGiRVqzZs2Efu0Vi0WtXLlSzc3N+ta3vqVXX301VGvL2ebzvL64AfjL\nX/7i3n333a7rum4ul3N/8pOfBLEbM6+99pp7//33u67runv37nXvu+8+40T+2LZtm3vTTTe53/72\nt13Xdd27777b3bdvn+u6rvvI
I4+4v//97y3jjcuZs7W0tLjvvPOO67qu29nZ6W7cuNEy3ridOZ/r\nuu5f//pX9/bbb//MbRPVmfN973vfc19++WXXdU+uN6+++qplvHE7c74HHnjAfe2111zXdd0VK1ZM\n6Pmef/55d8OGDa7ruu6JEyfc6667zl2yZElo1pazzed1fQnkjHjv3r265JJLdM8992jJkiWaP39+\nELsxk0gklMlk5LquMpmMpkyZYh3JF7NmzVJ7e/upd6dvv/22rrrqKknStddeq9dff90y3ricOdtT\nTz2lSy+9VJI0NDSkeDxuGW/czpyvt7dXmzZtmvBnU586c74DBw7o2LFjuvPOO7Vr1y7NmzfPOOH4\nnDlfIpHQiRMn5LqucrnchF5jmpqatGzZMkknv5pYUVERqrXlbPNt2rTJ0/oSSBH39PTo0KFD+ulP\nf6rvf//7euihh4LYjZmGhgYNDg6qqalJjzzyiFpaWqwj+aKxsVGxWOzUn09fwKuqqpTJZCxi+eLM\n2dLptKSTFzHZsWOH7rjjDqNk/jh9PsdxtHbtWq1atUpVVVXGyfxx5vH76KOPNG3aND3zzDP6whe+\noI6ODsN043fmfC0tLVq/fr1uuOEG9fT0aO7cuYbpxqeqqkrV1dXKZrO6//779cADD8hxnM/8/URe\nW86cb/ny5aqrq5M09vUlkCKura3VNddco4qKCs2ePVvxeFw9PT1B7MrE008/rYaGBv3ud7/Tzp07\ntWrVKg0ODlrH8t3pnyGey+WUSqUM0/jvpZdeUltbm7Zt26ba2lrrOL45dOiQPvzwQ7W1tWnFihV6\n7733tHHjRutYvpo+ffqpr7TNnz8/dFeBW7lypZ599lm9/PLLuvnmmyf85WaPHj2q22+/Xbfeeqtu\nuumm0K0tp8934403SvK2vgRSxFdccYX+9Kc/SZI+/vhj5fP5UC10+Xxe1dXVkqRUKqVisfiZd3hh\ncdlll2nfvn2SpD179ujKK680TuSfnTt3aseOHdq+fbvq6+ut4/jq8ssv14svvqjt27frqaee0pe/\n/GWtXr3aOpavGhoatHv3bknSvn37dPHFF9sG8lmhUDi1xlxwwQXq6+szTlS67u5u3XXXXVq5cqW+\n8Y1vSArX2nK2+byuL56vvjQW119/vfbv36/bbrtNjuPo0UcfVSQSCWJXJhYvXqzVq1dr0aJFGhoa\n0ooVK5RIJKxj+ebTY7Vq1So9/PDDKhaLmjNnjpqamoyTjV8kEpHjONqwYYNmzpype++9V5I0d+5c\n3Xfffcbpxu/M15nruqF67Z3+3Fy3bp2ee+45pVIp/ehHPzJO5o9P53v88ce1bNkyxeNxVVZW6rHH\nHjNOVrotW7Yok8lo8+bN2rx5syRp7dq1Wr9+fSjWljPncxxH7777rr74xS+OeX3hs6YBADDEB3oA\nAGCIIgYAwBBFDACAIYoYAABDFDEAAIYoYgAADFHEAAAY+j954m43MqFUVwAAAABJRU5ErkJggg==\n",
199 | "text/plain": [
200 | ""
201 | ]
202 | },
203 | "metadata": {},
204 | "output_type": "display_data"
205 | }
206 | ],
207 | "source": [
208 | "sns.distplot(head_count, kde=False)"
209 | ]
210 | },
211 | {
212 | "cell_type": "markdown",
213 | "metadata": {},
214 | "source": [
215 | "**Exercise**: Try setting `kde=True` in the above cell and observe what happens"
216 | ]
217 | },
218 | {
219 | "cell_type": "code",
220 | "execution_count": null,
221 | "metadata": {
222 | "collapsed": true
223 | },
224 | "outputs": [],
225 | "source": []
226 | },
227 | {
228 | "cell_type": "code",
229 | "execution_count": 8,
230 | "metadata": {
231 | "collapsed": false
232 | },
233 | "outputs": [
234 | {
235 | "data": {
236 | "text/plain": [
237 | "array([], dtype=int64)"
238 | ]
239 | },
240 | "execution_count": 8,
241 | "metadata": {},
242 | "output_type": "execute_result"
243 | }
244 | ],
245 | "source": [
246 | "#Experiments that returned 24 or more heads.\n",
247 | "head_count[head_count>=24]"
248 | ]
249 | },
250 | {
251 | "cell_type": "code",
252 | "execution_count": 9,
253 | "metadata": {
254 | "collapsed": false
255 | },
256 | "outputs": [
257 | {
258 | "name": "stdout",
259 | "output_type": "stream",
260 | "text": [
261 | "No of times experiment returned 24 heads or more: 0\n",
262 | "% of times with 24 or more heads: 0.0\n"
263 | ]
264 | }
265 | ],
266 | "source": [
267 | "print \"No of times experiment returned 24 heads or more:\", head_count[head_count>=24].shape[0]\n",
268 | "print \"% of times with 24 or more heads: \", head_count[head_count>=24].shape[0]/float(head_count.shape[0])*100"
269 | ]
270 | },
271 | {
272 | "cell_type": "code",
273 | "execution_count": null,
274 | "metadata": {
275 | "collapsed": false
276 | },
277 | "outputs": [],
278 | "source": []
279 | },
280 | {
281 | "cell_type": "markdown",
282 | "metadata": {},
283 | "source": [
284 | "#### Exercise: Repeat the experiment 100,000 times. "
285 | ]
286 | },
287 | {
288 | "cell_type": "code",
289 | "execution_count": null,
290 | "metadata": {
291 | "collapsed": true
292 | },
293 | "outputs": [],
294 | "source": []
295 | },
296 | {
297 | "cell_type": "markdown",
298 | "metadata": {},
299 | "source": [
300 | "# Is the coin fair?"
301 | ]
302 | },
303 | {
304 | "cell_type": "code",
305 | "execution_count": null,
306 | "metadata": {
307 | "collapsed": true
308 | },
309 | "outputs": [],
310 | "source": []
311 | },
312 | {
313 | "cell_type": "markdown",
314 | "metadata": {},
315 | "source": [
316 | "### Extra pointers on numpy"
317 | ]
318 | },
319 | {
320 | "cell_type": "markdown",
321 | "metadata": {},
322 | "source": [
323 | "**Removing the `for` loop in the function**"
324 | ]
325 | },
326 | {
327 | "cell_type": "code",
328 | "execution_count": 10,
329 | "metadata": {
330 | "collapsed": false
331 | },
332 | "outputs": [],
333 | "source": [
334 | "def coin_toss_experiment_2(times_to_repeat):\n",
335 | "    # Draw every toss at once: one row per experiment, one column per toss\n",
336 | "    # (relies on the global total_tosses)\n",
337 | "    experiment = np.random.randint(0,2,[times_to_repeat,total_tosses])\n",
338 | "    return experiment.sum(axis=1)"
339 | ]
340 | },
341 | {
342 | "cell_type": "markdown",
343 | "metadata": {},
344 | "source": [
345 | "#### Exercise: Benchmark `coin_toss_experiment` and `coin_toss_experiment_2` for 100 and 100,000 runs and report improvements, if any"
346 | ]
347 | },
348 | {
349 | "cell_type": "code",
350 | "execution_count": null,
351 | "metadata": {
352 | "collapsed": true
353 | },
354 | "outputs": [],
355 | "source": []
356 | }
357 | ],
358 | "metadata": {
359 | "kernelspec": {
360 | "display_name": "Python 2",
361 | "language": "python",
362 | "name": "python2"
363 | },
364 | "language_info": {
365 | "codemirror_mode": {
366 | "name": "ipython",
367 | "version": 2
368 | },
369 | "file_extension": ".py",
370 | "mimetype": "text/x-python",
371 | "name": "python",
372 | "nbconvert_exporter": "python",
373 | "pygments_lexer": "ipython2",
374 | "version": "2.7.10"
375 | }
376 | },
377 | "nbformat": 4,
378 | "nbformat_minor": 0
379 | }
380 |
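One way to approach the benchmarking exercise at the end of `2. Warm-up.ipynb` is sketched below. This is an illustrative sketch, not the notebook's own solution: `coin_toss_loop` is an assumed stand-in for `coin_toss_experiment` (whose body is not shown here), `TOTAL_TOSSES = 30` is an assumed value for the notebook's `total_tosses`, and `benchmark` is a hypothetical helper built on the standard-library `timeit` module.

```python
import timeit

import numpy as np

# Assumption: number of tosses per experiment; the notebook keeps this in `total_tosses`.
TOTAL_TOSSES = 30


def coin_toss_loop(times_to_repeat, total_tosses=TOTAL_TOSSES):
    """Loop version (assumed shape of coin_toss_experiment): one randint call per experiment."""
    head_count = np.empty(times_to_repeat, dtype=int)
    for i in range(times_to_repeat):
        head_count[i] = np.random.randint(0, 2, total_tosses).sum()
    return head_count


def coin_toss_vectorized(times_to_repeat, total_tosses=TOTAL_TOSSES):
    """Vectorized version: draw every toss in one call, then sum each row."""
    tosses = np.random.randint(0, 2, (times_to_repeat, total_tosses))
    return tosses.sum(axis=1)


def benchmark(runs, repeats=3):
    """Return the best-of-`repeats` wall time in seconds for (loop, vectorized)."""
    loop_t = min(timeit.repeat(lambda: coin_toss_loop(runs), number=1, repeat=repeats))
    vec_t = min(timeit.repeat(lambda: coin_toss_vectorized(runs), number=1, repeat=repeats))
    return loop_t, vec_t


for runs in (100, 100000):
    loop_t, vec_t = benchmark(runs)
    print("runs=%d  loop=%.4fs  vectorized=%.4fs" % (runs, loop_t, vec_t))
```

Taking the minimum over a few repeats reduces the influence of other processes on the timing. The actual speed-up depends on the machine and the NumPy version, so no expected figures are given here.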
--------------------------------------------------------------------------------
/notebooks/3. Resampling.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "### Problem\n",
8 | "The weekly shoe sales of an e-commerce company during the first three months (12 weeks) of the year were:\n",
9 | "\n",
10 | "23 21 19 24 35 17 18 24 33 27 21 23\n",
11 | "\n",
12 | "Meanwhile, the company developed some dynamic price optimization algorithms and the sales for the next 12 weeks were:\n",
13 | "\n",
14 | "31 28 19 24 32 27 16 41 23 32 29 33\n",
15 | "\n",
16 | "Did the dynamic price optimization algorithm deliver superior results? Can it be trusted?\n",
17 | "\n",
18 | "### Solution\n",
19 | "\n",
20 | "Before we get into different approaches, let's quickly get a feel for the data.\n",
21 | "\n"
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": 1,
27 | "metadata": {
28 | "collapsed": true
29 | },
30 | "outputs": [],
31 | "source": [
32 | "import numpy as np\n",
33 | "import seaborn as sns\n",
34 | "sns.set(color_codes=True)\n",
35 | "%matplotlib inline"
36 | ]
37 | },
38 | {
39 | "cell_type": "code",
40 | "execution_count": 2,
41 | "metadata": {
42 | "collapsed": true
43 | },
44 | "outputs": [],
45 | "source": [
46 | "#Load the data\n",
47 | "before_opt = np.array([23, 21, 19, 24, 35, 17, 18, 24, 33, 27, 21, 23])\n",
48 | "after_opt = np.array([31, 28, 19, 24, 32, 27, 16, 41, 23, 32, 29, 33])"
49 | ]
50 | },
51 | {
52 | "cell_type": "code",
53 | "execution_count": 3,
54 | "metadata": {
55 | "collapsed": false
56 | },
57 | "outputs": [
58 | {
59 | "data": {
60 | "text/plain": [
61 | "23.75"
62 | ]
63 | },
64 | "execution_count": 3,
65 | "metadata": {},
66 | "output_type": "execute_result"
67 | }
68 | ],
69 | "source": [
70 | "before_opt.mean()"
71 | ]
72 | },
73 | {
74 | "cell_type": "code",
75 | "execution_count": 4,
76 | "metadata": {
77 | "collapsed": false
78 | },
79 | "outputs": [
80 | {
81 | "data": {
82 | "text/plain": [
83 | "27.916666666666668"
84 | ]
85 | },
86 | "execution_count": 4,
87 | "metadata": {},
88 | "output_type": "execute_result"
89 | }
90 | ],
91 | "source": [
92 | "after_opt.mean()"
93 | ]
94 | },
95 | {
96 | "cell_type": "code",
97 | "execution_count": 5,
98 | "metadata": {
99 | "collapsed": true
100 | },
101 | "outputs": [],
102 | "source": [
103 | "observed_difference = after_opt.mean() - before_opt.mean()"
104 | ]
105 | },
106 | {
107 | "cell_type": "code",
108 | "execution_count": 6,
109 | "metadata": {
110 | "collapsed": false
111 | },
112 | "outputs": [
113 | {
114 | "name": "stdout",
115 | "output_type": "stream",
116 | "text": [
117 | "Difference between the means is: 4.16666666667\n"
118 | ]
119 | }
120 | ],
121 | "source": [
122 | "print \"Difference between the means is:\", observed_difference"
123 | ]
124 | },
125 | {
126 | "cell_type": "markdown",
127 | "metadata": {},
128 | "source": [
129 | "On average, sales after optimization are higher than sales before optimization. But is the difference real? Could it be due to chance?\n",
130 | "\n",
131 | "**Classical Method**: We will cover this method later on. It entails doing a *t-test*.\n",
132 | "\n",
133 | "**Hacker's Method**: Let's see if we can bring a hacker's perspective to this problem, similar to what we did in the previous notebook."
134 | ]
135 | },
136 | {
137 | "cell_type": "code",
138 | "execution_count": null,
139 | "metadata": {
140 | "collapsed": true
141 | },
142 | "outputs": [],
143 | "source": [
144 | "#Step 1: Create the dataset. Let's give Label 0 to before_opt and Label 1 to after_opt"
145 | ]
146 | },
147 | {
148 | "cell_type": "code",
149 | "execution_count": null,
150 | "metadata": {
151 | "collapsed": true
152 | },
153 | "outputs": [],
154 | "source": [
155 | "#Learn about the following three functions"
156 | ]
157 | },
158 | {
159 | "cell_type": "code",
160 | "execution_count": null,
161 | "metadata": {
162 | "collapsed": true
163 | },
164 | "outputs": [],
165 | "source": [
166 | "?np.append"
167 | ]
168 | },
169 | {
170 | "cell_type": "code",
171 | "execution_count": null,
172 | "metadata": {
173 | "collapsed": true
174 | },
175 | "outputs": [],
176 | "source": [
177 | "?np.zeros"
178 | ]
179 | },
180 | {
181 | "cell_type": "code",
182 | "execution_count": null,
183 | "metadata": {
184 | "collapsed": true
185 | },
186 | "outputs": [],
187 | "source": [
188 | "?np.ones"
189 | ]
190 | },
191 | {
192 | "cell_type": "code",
193 | "execution_count": 7,
194 | "metadata": {
195 | "collapsed": false
196 | },
197 | "outputs": [],
198 | "source": [
199 | "shoe_sales = np.array([np.append(np.zeros(before_opt.shape[0]), np.ones(after_opt.shape[0])),\n",
200 | "np.append(before_opt, after_opt)], dtype=int)"
201 | ]
202 | },
203 | {
204 | "cell_type": "code",
205 | "execution_count": 8,
206 | "metadata": {
207 | "collapsed": false
208 | },
209 | "outputs": [
210 | {
211 | "name": "stdout",
212 | "output_type": "stream",
213 | "text": [
214 | "Shape: (2, 24)\n",
215 | "Data: \n",
216 | "[[ 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1]\n",
217 | " [23 21 19 24 35 17 18 24 33 27 21 23 31 28 19 24 32 27 16 41 23 32 29 33]]\n"
218 | ]
219 | }
220 | ],
221 | "source": [
222 | "print \"Shape:\", shoe_sales.shape\n",
223 | "print \"Data:\", \"\\n\", shoe_sales"
224 | ]
225 | },
226 | {
227 | "cell_type": "code",
228 | "execution_count": 9,
229 | "metadata": {
230 | "collapsed": false
231 | },
232 | "outputs": [
233 | {
234 | "name": "stdout",
235 | "output_type": "stream",
236 | "text": [
237 | "Shape: (24, 2)\n",
238 | "Data: \n",
239 | "[[ 0 23]\n",
240 | " [ 0 21]\n",
241 | " [ 0 19]\n",
242 | " [ 0 24]\n",
243 | " [ 0 35]\n",
244 | " [ 0 17]\n",
245 | " [ 0 18]\n",
246 | " [ 0 24]\n",
247 | " [ 0 33]\n",
248 | " [ 0 27]\n",
249 | " [ 0 21]\n",
250 | " [ 0 23]\n",
251 | " [ 1 31]\n",
252 | " [ 1 28]\n",
253 | " [ 1 19]\n",
254 | " [ 1 24]\n",
255 | " [ 1 32]\n",
256 | " [ 1 27]\n",
257 | " [ 1 16]\n",
258 | " [ 1 41]\n",
259 | " [ 1 23]\n",
260 | " [ 1 32]\n",
261 | " [ 1 29]\n",
262 | " [ 1 33]]\n"
263 | ]
264 | }
265 | ],
266 | "source": [
267 | "shoe_sales = shoe_sales.T\n",
268 | "print \"Shape:\",shoe_sales.shape\n",
269 | "print \"Data:\", \"\\n\", shoe_sales"
270 | ]
271 | },
272 | {
273 | "cell_type": "code",
274 | "execution_count": 10,
275 | "metadata": {
276 | "collapsed": true
277 | },
278 | "outputs": [],
279 | "source": [
280 | "#This is the approach we are going to take:\n",
281 | "#We randomly reassign the labels, then compute the difference between the means of the two groups.\n",
282 | "#We find the % of times that difference is at least as large as the one we observed above.\n",
283 | "#If that happens less than 5% of the time, we would make the call that the improvements are real."
284 | ]
285 | },
286 | {
287 | "cell_type": "code",
288 | "execution_count": 11,
289 | "metadata": {
290 | "collapsed": true
291 | },
292 | "outputs": [],
293 | "source": [
294 | "np.random.shuffle(shoe_sales)"
295 | ]
296 | },
297 | {
298 | "cell_type": "code",
299 | "execution_count": 12,
300 | "metadata": {
301 | "collapsed": false
302 | },
303 | "outputs": [
304 | {
305 | "data": {
306 | "text/plain": [
307 | "array([[ 1, 29],\n",
308 | " [ 1, 24],\n",
309 | " [ 1, 19],\n",
310 | " [ 1, 16],\n",
311 | " [ 1, 28],\n",
312 | " [ 1, 41],\n",
313 | " [ 1, 27],\n",
314 | " [ 0, 18],\n",
315 | " [ 1, 33],\n",
316 | " [ 0, 24],\n",
317 | " [ 0, 21],\n",
318 | " [ 1, 32],\n",
319 | " [ 1, 31],\n",
320 | " [ 0, 23],\n",
321 | " [ 0, 19],\n",
322 | " [ 0, 17],\n",
323 | " [ 0, 27],\n",
324 | " [ 0, 21],\n",
325 | " [ 1, 32],\n",
326 | " [ 0, 23],\n",
327 | " [ 0, 24],\n",
328 | " [ 0, 33],\n",
329 | " [ 0, 35],\n",
330 | " [ 1, 23]])"
331 | ]
332 | },
333 | "execution_count": 12,
334 | "metadata": {},
335 | "output_type": "execute_result"
336 | }
337 | ],
338 | "source": [
339 | "shoe_sales"
340 | ]
341 | },
342 | {
343 | "cell_type": "code",
344 | "execution_count": 13,
345 | "metadata": {
346 | "collapsed": true
347 | },
348 | "outputs": [],
349 | "source": [
350 | "experiment_label = np.random.randint(0,2,shoe_sales.shape[0])"
351 | ]
352 | },
353 | {
354 | "cell_type": "code",
355 | "execution_count": 14,
356 | "metadata": {
357 | "collapsed": false
358 | },
359 | "outputs": [
360 | {
361 | "data": {
362 | "text/plain": [
363 | "array([0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0,\n",
364 | " 1])"
365 | ]
366 | },
367 | "execution_count": 14,
368 | "metadata": {},
369 | "output_type": "execute_result"
370 | }
371 | ],
372 | "source": [
373 | "experiment_label"
374 | ]
375 | },
376 | {
377 | "cell_type": "code",
378 | "execution_count": 15,
379 | "metadata": {
380 | "collapsed": false
381 | },
382 | "outputs": [
383 | {
384 | "name": "stdout",
385 | "output_type": "stream",
386 | "text": [
387 | "[[ 0 29]\n",
388 | " [ 1 24]\n",
389 | " [ 1 19]\n",
390 | " [ 1 16]\n",
391 | " [ 1 28]\n",
392 | " [ 1 41]\n",
393 | " [ 0 27]\n",
394 | " [ 0 18]\n",
395 | " [ 0 33]\n",
396 | " [ 0 24]\n",
397 | " [ 0 21]\n",
398 | " [ 1 32]\n",
399 | " [ 0 31]\n",
400 | " [ 1 23]\n",
401 | " [ 0 19]\n",
402 | " [ 1 17]\n",
403 | " [ 0 27]\n",
404 | " [ 1 21]\n",
405 | " [ 1 32]\n",
406 | " [ 0 23]\n",
407 | " [ 1 24]\n",
408 | " [ 1 33]\n",
409 | " [ 0 35]\n",
410 | " [ 1 23]]\n"
411 | ]
412 | }
413 | ],
414 | "source": [
415 | "experiment_data = np.array([experiment_label, shoe_sales[:,1]])\n",
416 | "experiment_data = experiment_data.T\n",
417 | "print experiment_data"
418 | ]
419 | },
420 | {
421 | "cell_type": "code",
422 | "execution_count": 16,
423 | "metadata": {
424 | "collapsed": false
425 | },
426 | "outputs": [],
427 | "source": [
428 | "experiment_diff_mean = experiment_data[experiment_data[:,0]==1][:,1].mean() \\\n",
429 | "                     - experiment_data[experiment_data[:,0]==0][:,1].mean()"
430 | ]
431 | },
432 | {
433 | "cell_type": "code",
434 | "execution_count": 17,
435 | "metadata": {
436 | "collapsed": false
437 | },
438 | "outputs": [
439 | {
440 | "data": {
441 | "text/plain": [
442 | "0.26223776223776341"
443 | ]
444 | },
445 | "execution_count": 17,
446 | "metadata": {},
447 | "output_type": "execute_result"
448 | }
449 | ],
450 | "source": [
451 | "experiment_diff_mean"
452 | ]
453 | },
454 | {
455 | "cell_type": "code",
456 | "execution_count": 18,
457 | "metadata": {
458 | "collapsed": true
459 | },
460 | "outputs": [],
461 | "source": [
462 | "#Like the previous notebook, let's repeat this experiment 100 and then 100000 times"
463 | ]
464 | },
465 | {
466 | "cell_type": "code",
467 | "execution_count": 19,
468 | "metadata": {
469 | "collapsed": true
470 | },
471 | "outputs": [],
472 | "source": [
473 | "def shuffle_experiment(number_of_times):\n",
474 | "    experiment_diff_mean = np.empty([number_of_times,1])\n",
475 | "    for times in np.arange(number_of_times):\n",
476 | "        experiment_label = np.random.randint(0,2,shoe_sales.shape[0])\n",
477 | "        experiment_data = np.array([experiment_label, shoe_sales[:,1]]).T\n",
478 | "        experiment_diff_mean[times] = experiment_data[experiment_data[:,0]==1][:,1].mean() \\\n",
479 | "                                    - experiment_data[experiment_data[:,0]==0][:,1].mean()\n",
480 | "    return experiment_diff_mean"
481 | ]
482 | },
483 | {
484 | "cell_type": "code",
485 | "execution_count": 20,
486 | "metadata": {
487 | "collapsed": false
488 | },
489 | "outputs": [],
490 | "source": [
491 | "experiment_diff_mean = shuffle_experiment(100)"
492 | ]
493 | },
494 | {
495 | "cell_type": "code",
496 | "execution_count": 21,
497 | "metadata": {
498 | "collapsed": false
499 | },
500 | "outputs": [
501 | {
502 | "data": {
503 | "text/plain": [
504 | "array([[ 1.83333333],\n",
505 | " [ 0.7 ],\n",
506 | " [-0.33333333],\n",
507 | " [ 0.54444444],\n",
508 | " [-1.0625 ],\n",
509 | " [ 0.61428571],\n",
510 | " [ 3.36713287],\n",
511 | " [ 1.3 ],\n",
512 | " [ 0.3 ],\n",
513 | " [ 0.66666667]])"
514 | ]
515 | },
516 | "execution_count": 21,
517 | "metadata": {},
518 | "output_type": "execute_result"
519 | }
520 | ],
521 | "source": [
522 | "experiment_diff_mean[:10]"
523 | ]
524 | },
525 | {
526 | "cell_type": "code",
527 | "execution_count": 22,
528 | "metadata": {
529 | "collapsed": false
530 | },
531 | "outputs": [
532 | {
533 | "data": {
534 | "text/plain": [
535 | ""
536 | ]
537 | },
538 | "execution_count": 22,
539 | "metadata": {},
540 | "output_type": "execute_result"
541 | },
542 | {
543 | "data": {
544 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXEAAAECCAYAAAAIMefLAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEARJREFUeJzt3X+M5Hddx/Hn7l5vZndm7kpyK2psIAH5BKMoVIIgoW0K\nBqKmQuAPU4kgBBASiqINVFJjgkjE1kCsRBCoYsTQpkDQADVn0yYkNAcWFDHv/iAkog3uHaWdmdvd\n+zHjHztn7sruzo+dme++756PpMn8+H6/n1e/O/ea73xnvt/vQr/fR5KU02LVASRJk7PEJSkxS1yS\nErPEJSkxS1ySErPEJSmxA7s9WUq5DPg48DSgBrwX+C7wj8CDg8k+HBGfnmVISdL2di1x4HpgLSJe\nW0p5CvAN4I+AWyLi1pmnkyTtaliJ3wHcObi9CJwGrgRKKeU64CHgHRHRmV1ESdJOFkY5YrOU0gI+\nB3wEqAPfiIgHSik3AU+JiN+fbUxJ0naGfrFZSrkC+BfgbyPiH4DPRMQDg6c/Czx3hvkkSbsY9sXm\nU4G7gbdGxD2Dh79YSnl7RBwDrgW+OmyQfr/fX1hY2HNYSbqEjFSau+5OKaV8EHgNEOc9/C7gFrb2\njz8KvGmEfeL9tbX2KHn2pdXVFlnzZ84O5q+a+auzutoaqcR33RKPiBuAG7Z56sWThJIkTZcH+0hS\nYpa4JCVmiUtSYpa4JCVmiUtSYpa4JCVmiUtSYpa4JCVmiUtSYpa4JCVmiUtSYpa4JCVmiUtSYsMu\nz6aLUK/Xo9vtVh0DgEajweKi2xLSpCzxS1C32+XosUeo1ZcrzbG5sc61z38GrVar0hxSZpb4JapW\nX2Z5pVl1DEl75OdYSUrMEpekxCxxSUrMEpekxCxxSUrMEpekxCxxSUrMEpekxCxxSUrMEpekxCxx\nSUrMEpekxCxxSUrMEpekxCxxSUrM84nrkjfLKx3Van3a7c7I03ulI43LEtclb5ZXOmo26nS6GyNN\n65WONAlLXGJ2VzpaadQ52/efmWbHz22SlJglLkmJWeKSlNiuO+tKKZcBHweeBtSA9wL/CdwO9IBv\nAm+LiP5sY0qStjNsS/x6YC0iXgK8HLgNuAW4afDYAnDdbCNKknYyrMTvAG4+b9rTwPMi4r7BY18A\nXjqjbJKkIXbdnRIRXYBSSoutQn8P8GfnTdIBDs8snSRpV0N/wFpKuQK4C7gtIj5VSvnT855uAT8Y\nZaDV1dwHMGTO/+TstVqfZqPOSqNeUaItSwtnOHKkyaFDu6/bWa/7Wa+PVnO05Y66PuYt82sf8ucf\nZtgXm08F7gbeGhH3DB5+oJRyVUTcC7wCODrKQGtr7T0FrdLqaitt/u2yt9sdOt2Nyg9CWT+5wfHj\nHTY3F3acZh7rfpbro9Ws0+6MdsTmKOtj3jK/9iF3/lHffIa9am9ia3fJzaWUc/vGbwA+VEo5CHwL\nuHPSkJKkvRm2T/wGtkr7ya6eSRpJ0lg82EeSErPEJSkxS1ySErPEJSkxS1ySErPEJSkxS1ySErPE\nJSkxS1ySErPEJSkxS1ySErPEJSkxS1ySErPEJSkxS1ySErPEJSkxS1ySErPEJSkxS1ySErPEJSkx\nS1ySErPEJSkxS1ySErPEJSkxS1ySErPEJSkxS1ySErPEJSkxS1ySErPEJSkxS1ySEjtQdQBdunq9\nHp1OZ9dparU+7fbu0+xVp9Oh3+vPdAxpVixxVebU5jr3ff0JDh2+fMdpmo06ne7GTHM8/tgJ6itN\nVmY6ijQblrgqVasvs7zS3PH5lUads/3Zvkw31rszXb40S+4Tl6TELHFJSmykz6mllBcA74+Ia0op\nzwU+Dzw0ePrDEfHpWQWUJO1saImXUm4EfgM49xOBK4FbI+LWWQaTJA03yu6Uh4FXAQuD+1cCv1xK\nubeU8tellJ2/lZIkzdTQEo+Iu4Az5z
10P/B7EXEV8G3gD2eUTZI0xCRfbH4mIh4Y3P4s8Nwp5pEk\njWGSH+B+sZTy9og4BlwLfHWUmVZXWxMMtX9kzv/k7LVan2ajzkqjXlGiLafW6ywuLdFq7p5j2PPz\nyjGpUZe7tHCGI0eaHDq0v15rmV/7kD//MOOU+Lnjkt8C3FZKOQ08CrxplJnX1tpjRts/VldbafNv\nl73d7tDpbsz8IJphuic3WFg8wIGDOx+R2WrWaXdme8TmKDkmNU7+9ZMbHD/eYXNzYfjEc5L5tQ+5\n84/65jPSv+KI+A7wosHtbwAvnjSYJGl6PNhHkhKzxCUpMUtckhKzxCUpMUtckhKzxCUpMUtckhKz\nxCUpMUtckhKzxCUpMUtckhKzxCUpMUtckhKzxCUpsWpPKC3p//V6PTqdzvAJ56DRaLC46DZeBpa4\ntE+c2lznvq8/waHDl1eaY3NjnWuf/wxarYv7ijgXC0tc2kdq9WWWV5pVx1Aifl6SpMQscUlKzBKX\npMQscUlKzBKXpMQscUlKzJ8YzlGv16Pb7c51zFqtT7t94QEknU6Hfq8/1xySZsMSn6Nut8vRY49Q\nqy/Pbcxmo06nu3HBY48/doL6SpOVuaWQNCuW+JzN+2COlUads/0L/8wb6/P9NCBpdtwnLkmJWeKS\nlJglLkmJWeKSlJglLkmJWeKSlJglLkmJWeKSlJglLkmJWeKSlJglLkmJjXTulFLKC4D3R8Q1pZRn\nArcDPeCbwNsiwlPiSVIFhm6Jl1JuBD4K1AYP3QrcFBEvARaA62YXT5K0m1F2pzwMvIqtwgZ4XkTc\nN7j9BeClswgmSRpuaIlHxF3AmfMeWjjvdgc4PO1QkqTRTPLFZu+82y3gB1PKIkka0yQXhXiglHJV\nRNwLvAI4OspMq6utCYbaP6aRv1br02zUWWnUp5BodK3mheOdWq+zuLT0Q4/P26g5Zp1z1utj1OXu\nl7/L0sIZjhxpcujQ1mvef7v72zglfu4XKO8EPlpKOQh8C7hzlJnX1tpjRts/VldbU8nfbnfodDd+\n6Eo7s9Rq1ml3Lrw8W/fkBguLBzhwcGOHueZjlBzb5a8ix6TGyb9f/i7rJzc4frzD5ubC1F77Vcmc\nf9Q3n5HaJCK+A7xocPsh4OoJc0mSpsiDfSQpMUtckhKzxCUpMUtckhKzxCUpMUtckhKzxCUpMUtc\nkhKzxCUpMUtckhKzxCUpMUtckhKzxCUpMUtckhKzxCUpMUtckhKzxCUpMUtckhKzxCUpMUtckhKz\nxCUpMUtckhI7UHUASXqyXq9Ht9vd83JqtT7tdmdPy2g0Giwu7t/tXUtc0r7T7XY5euwRavXlPS2n\n2ajT6W5MPP/mxjrXPv8ZtFqtPeWYJUtc0r5Uqy+zvNLc0zJWGnXO9i/umtu/nxEkSUNZ4pKUmCUu\nSYlZ4pKUmCUuSYlZ4pKUmCUuSYlZ4pKUmCUuSYlZ4pKUmCUuSYlZ4pKU2MRnhiml/Cvw+ODutyPi\nDdOJJEka1UQlXkqpA0TENdONI0kax6Rb4j8LrJRSvjRYxk0Rcf/0YkmSRjFpiXeBD0TEx0opPwl8\noZTyrIjoTTGbpAr0ej06na2r4UzjyjiT6HQ69Hv9uY+b0aQl/iDwMEBEPFRKOQH8GPDfO82wurp/\nr4wximnkr9X6NBt1Vhr1KSQaXat54Xin1ussLi390OPzNmqOWeec9foYdbn75+/yOF978HscvvwU\nfPv7lWR47LE1VlZaU1kXe1nG0sIZjhxpcujQ/u2vSUv89cBzgLeVUn4cOAQ8utsMa2vtCYeq3upq\nayr52+0One7GXK800mrWaXcuvDxV9+QGC4sHOHBw8stWTcMoObbLX0WOSY2Tf7/9Xc72D8xl/W/n\n7NlFOt3NPa+LveZfP7nB8eMdNjcX9pRjEqNuOE7aJh8DPlFKuW9w//XuSpGk+ZuoxCPiDPDaKWeR\nJI
3Jg30kKTFLXJISs8QlKTFLXJISs8QlKTFLXJISs8QlKTFLXJISs8QlKTFLXJISs8QlKTFLXJIS\ns8QlKTFLXJISs8QlKTFLXJISs8QlKTFLXJISs8QlKTFLXJISs8QlKTFLXJISOzCPQb73v2ucONGZ\nx1A7qh2scehQq9IMkjRtcynx+7/1fdrdjXkMtaPWgU1e+LxnV5pBkqZtLiV+sFbj4On+PIba0SKn\nKx1fkmbBfeKSlJglLkmJWeKSlJglLkmJWeKSlJglLkmJzeUnhvtBr9ej3W5PNG+t1qfd3vvBSp1O\nh36v2p9aSrq4XDIlvrmxztFjj1CrL489b7NRpzOFg5Uef+wE9ZUmK3tekiRtuWRKHKBWX2Z5pTn2\nfCuNOmf7e19VG+vdPS9Dks7nPnFJSswSl6TEJtpHUEpZBP4SeA6wCbwxIh6ZZjBJ0nCTbon/GnAw\nIl4EvAu4ZXqRJEmjmrTEfxH4IkBE3A/8/NQSSZJGNmmJHwKeOO/+2cEuFknSHE36u7kngPMvk7MY\nEb2dJt7snGCzU+1FIS7rn2bz9PpE8y4tnGH95N7zb25ssLC4xPrJ+V3laLvsVeTYzig5prXu95pj\nUuPk349/l3ms/2EZ9mKv+Tc3JuuMeZq0xL8M/CpwRynlF4B/223i61525cKE40iSdjFpiX8GeFkp\n5cuD+6+fUh5J0hgW+n3P5SFJWfllpCQlZolLUmKWuCQlZolLUmIzPxVtKaUB/D1wOXAK+M2I+J9Z\njzstpZTDwN+x9bv4g8DvRsRXqk01mVLKK4FXR8T1VWcZ5mI5P08p5QXA+yPimqqzjKqUchnwceBp\nQA14b0R8vtpUoyulLAEfBZ4F9IG3RMR/VJtqfKWUHwG+BlwbEQ/uNN08tsTfCByLiKvYKsMb5zDm\nNP0O8M8RcTXwOuC2StNMqJTyQeB9QJbf7Kc/P08p5Ua2yqRWdZYxXQ+sRcRLgJcDf1FxnnH9CtCL\niBcD7wH+uOI8Yxu8kf4VMPQiBDMv8Yg4Vx6w9c7+2KzHnLI/Bz4yuH0ZsP8P4drel4HfJk+JXwzn\n53kYeBV51vk5dwA3D24vAmcqzDK2iPgc8ObB3aeTr3MAPgB8GHh02IRT3Z1SSnkD8I4nPfy6iPha\nKeUo8NPAL01zzGkakv9HgU8CN8w/2eh2+X/4dCnl6goiTWrb8/PsdnqH/SYi7iqlPL3qHOOKiC5A\nKaXFVqH/QbWJxhcRZ0sptwOvBF5dcZyxlFJex9YnobtLKe9myEbAXA/2KaUU4J8i4plzG3QKSik/\nA3wKeGdEfKnqPJMalPibI+LXq84yTCnlFuArEXHH4P5/RcQVFcca26DEPxURL6w6yzhKKVcAdwG3\nRcTtFceZWCnlqcD9wLMjIsWn6FLKvWzty+8DPwcEcF1EfG+76efxxea7ge9GxCfZ2r+T6qNZKeWn\n2NoaeU1E/HvVeS4hY52fR9MzKL67gbdGxD1V5xlXKeW1wE9ExJ+wtfuzN/gvhcH3hwCUUu5ha8Nr\n2wKH+Vwo+WPA35RSfgtYIt95Vt7H1q9SPrT1QYIfRMQrq400sXPv7hlcTOfnybLOz7kJOAzcXEo5\nt2/8FRFR7alIR3cncPtgi/Yy4IaI2Kw408x47hRJSsyDfSQpMUtckhKzxCUpMUtckhKzxCUpMUtc\nkhKzxCUpMUtckhL7P8BDb+El45uCAAAAAElFTkSuQmCC\n",
545 | "text/plain": [
546 | ""
547 | ]
548 | },
549 | "metadata": {},
550 | "output_type": "display_data"
551 | }
552 | ],
553 | "source": [
554 | "sns.distplot(experiment_diff_mean, kde=False)"
555 | ]
556 | },
557 | {
558 | "cell_type": "code",
559 | "execution_count": 23,
560 | "metadata": {
561 | "collapsed": false
562 | },
563 | "outputs": [
564 | {
565 | "name": "stdout",
566 | "output_type": "stream",
567 | "text": [
568 | "Data: Difference in mean greater than observed: []\n",
569 | "Number of times diff in mean greater than observed: 0\n",
570 | "% of times diff in mean greater than observed: 0.0\n"
571 | ]
572 | }
573 | ],
574 | "source": [
575 | "#Finding % of times difference of means is greater than observed\n",
576 | "print \"Data: Difference in mean greater than observed:\", \\\n",
577 | " experiment_diff_mean[experiment_diff_mean>=observed_difference]\n",
578 | "\n",
579 | "print \"Number of times diff in mean greater than observed:\", \\\n",
580 | " experiment_diff_mean[experiment_diff_mean>=observed_difference].shape[0]\n",
581 | "print \"% of times diff in mean greater than observed:\", \\\n",
582 | " experiment_diff_mean[experiment_diff_mean>=observed_difference].shape[0]/float(experiment_diff_mean.shape[0])*100"
583 | ]
584 | },
585 | {
586 | "cell_type": "markdown",
587 | "metadata": {},
588 | "source": [
589 | "#### Exercise: Repeat the above for 100,000 runs and report the results"
590 | ]
591 | },
592 | {
593 | "cell_type": "code",
594 | "execution_count": null,
595 | "metadata": {
596 | "collapsed": true
597 | },
598 | "outputs": [],
599 | "source": []
600 | },
601 | {
602 | "cell_type": "markdown",
603 | "metadata": {},
604 | "source": [
605 | "# Is the result by chance? "
606 | ]
607 | },
608 | {
609 | "cell_type": "markdown",
610 | "metadata": {},
611 | "source": [
612 | "### What is the justification for shuffling the labels? \n",
613 | "\n",
614 | ">The thought process is this: if price optimization had no real effect, the before/after labels would carry no information, and each week's sales figure could just as well have come from either group. By reassigning the labels at random, we simulate that situation and see how large a difference in means arises by chance alone. If many such trials produce a difference at least as large as the one we observed, then we cannot claim that the price optimization had an effect. In statistical terms, *the observed difference could have occurred by chance*. \n",
615 | "\n",
616 | "Now, to show that the same difference in means might lead to a different conclusion, let's try the same experiment with a different dataset. "
617 | ]
618 | },
619 | {
620 | "cell_type": "code",
621 | "execution_count": 24,
622 | "metadata": {
623 | "collapsed": true
624 | },
625 | "outputs": [],
626 | "source": [
627 | "before_opt = np.array([230, 210, 190, 240, 350, 170, 180, 240, 330, 270, 210, 230])\n",
628 | "after_opt = np.array([310, 180, 190, 240, 220, 240, 160, 410, 130, 320, 290, 210])"
629 | ]
630 | },
631 | {
632 | "cell_type": "code",
633 | "execution_count": 25,
634 | "metadata": {
635 | "collapsed": false
636 | },
637 | "outputs": [
638 | {
639 | "name": "stdout",
640 | "output_type": "stream",
641 | "text": [
642 | "Mean sales before price optimization: 237.5\n",
643 | "Mean sales after price optimization: 241.666666667\n",
644 | "Difference in mean sales: 4.16666666667\n"
645 | ]
646 | }
647 | ],
648 | "source": [
649 | "print \"Mean sales before price optimization:\", np.mean(before_opt)\n",
650 | "print \"Mean sales after price optimization:\", np.mean(after_opt)\n",
651 | "print \"Difference in mean sales:\", np.mean(after_opt) - np.mean(before_opt) #Same as above"
652 | ]
653 | },
654 | {
655 | "cell_type": "code",
656 | "execution_count": 26,
657 | "metadata": {
658 | "collapsed": true
659 | },
660 | "outputs": [],
661 | "source": [
662 | "shoe_sales = np.array([np.append(np.zeros(before_opt.shape[0]), np.ones(after_opt.shape[0])),\n",
663 | "np.append(before_opt, after_opt)], dtype=int)\n",
664 | "shoe_sales = shoe_sales.T"
665 | ]
666 | },
667 | {
668 | "cell_type": "code",
669 | "execution_count": 27,
670 | "metadata": {
671 | "collapsed": false
672 | },
673 | "outputs": [
674 | {
675 | "data": {
676 | "text/plain": [
677 | ""
678 | ]
679 | },
680 | "execution_count": 27,
681 | "metadata": {},
682 | "output_type": "execute_result"
683 | },
684 | {
685 | "data": {
687 | "text/plain": [
688 | ""
689 | ]
690 | },
691 | "metadata": {},
692 | "output_type": "display_data"
693 | }
694 | ],
695 | "source": [
696 | "experiment_diff_mean = shuffle_experiment(100000)\n",
697 | "sns.distplot(experiment_diff_mean, kde=False)"
698 | ]
699 | },
700 | {
701 | "cell_type": "code",
702 | "execution_count": 28,
703 | "metadata": {
704 | "collapsed": false
705 | },
706 | "outputs": [
707 | {
708 | "name": "stdout",
709 | "output_type": "stream",
710 | "text": [
711 | "Number of times diff in mean greater than observed: 40473\n",
712 | "% of times diff in mean greater than observed: 40.473\n"
713 | ]
714 | }
715 | ],
716 | "source": [
717 | "# Percentage of shuffled experiments whose difference in means is at least the observed difference\n",
718 | "print \"Number of times diff in mean greater than observed:\", \\\n",
719 | "    experiment_diff_mean[experiment_diff_mean>=observed_difference].shape[0]\n",
720 | "print \"% of times diff in mean greater than observed:\", \\\n",
721 | "    experiment_diff_mean[experiment_diff_mean>=observed_difference].shape[0]/float(experiment_diff_mean.shape[0])*100"
722 | ]
723 | },
724 | {
725 | "cell_type": "markdown",
726 | "metadata": {},
727 | "source": [
728 | "### Did the conclusion change now? "
729 | ]
730 | },
731 | {
732 | "cell_type": "code",
733 | "execution_count": null,
734 | "metadata": {
735 | "collapsed": true
736 | },
737 | "outputs": [],
738 | "source": []
739 | },
740 | {
741 | "cell_type": "markdown",
742 | "metadata": {},
743 | "source": [
744 | "# Effect Size\n",
745 | "\n",
746 | "> **Because you can't argue with all the fools in the world. It's easier to let them have their way, then trick them when they're not paying attention** - Christopher Paolini\n",
747 | "\n",
748 | "In the first case, how much did the price optimization increase the sales on average?"
749 | ]
750 | },
751 | {
752 | "cell_type": "code",
753 | "execution_count": 29,
754 | "metadata": {
755 | "collapsed": false
756 | },
757 | "outputs": [
758 | {
759 | "name": "stdout",
760 | "output_type": "stream",
761 | "text": [
762 | "The % increase of sales in the first case: 17.5438596491 %\n"
763 | ]
764 | }
765 | ],
766 | "source": [
767 | "before_opt = np.array([23, 21, 19, 24, 35, 17, 18, 24, 33, 27, 21, 23])\n",
768 | "after_opt = np.array([31, 28, 19, 24, 32, 27, 16, 41, 23, 32, 29, 33])\n",
769 | "\n",
770 | "print \"The % increase of sales in the first case:\", \\\n",
771 | "(np.mean(after_opt) - np.mean(before_opt))/np.mean(before_opt)*100,\"%\""
772 | ]
773 | },
774 | {
775 | "cell_type": "code",
776 | "execution_count": 30,
777 | "metadata": {
778 | "collapsed": false
779 | },
780 | "outputs": [
781 | {
782 | "name": "stdout",
783 | "output_type": "stream",
784 | "text": [
785 | "The % increase of sales in the second case: 1.75438596491 %\n"
786 | ]
787 | }
788 | ],
789 | "source": [
790 | "before_opt = np.array([230, 210, 190, 240, 350, 170, 180, 240, 330, 270, 210, 230])\n",
791 | "after_opt = np.array([310, 180, 190, 240, 220, 240, 160, 410, 130, 320, 290, 210])\n",
792 | "\n",
793 | "print \"The % increase of sales in the second case:\", \\\n",
794 | "(np.mean(after_opt) - np.mean(before_opt))/np.mean(before_opt)*100,\"%\""
795 | ]
796 | },
797 | {
798 | "cell_type": "markdown",
799 | "metadata": {},
800 | "source": [
801 | "**Would a business feel comfortable spending millions of dollars if the increase is going to be just 1.75%? Does it make sense? Maybe yes, if margins are thin and any increase is considered good. But if the returns from the price optimization module do not let the company break even, it makes no sense to take that path.**"
802 | ]
803 | },
804 | {
805 | "cell_type": "markdown",
806 | "metadata": {},
807 | "source": [
808 | "> Someone tells you the result is statistically significant. What is the first question you should ask?\n",
809 | "\n",
810 | "# How large is the effect?\n",
811 | "\n",
812 | "To answer such a question, we will make use of the concept of a **confidence interval**.\n",
813 | "\n",
814 | "In plain English, a *confidence interval* is a range of plausible values for the quantity we are estimating (here, the increase in average sales), together with a confidence level that says how often the procedure that builds the range would capture the true value. \n",
815 | "\n",
816 | "An example would be: we are 90% confident that the increase in average sales (before versus after price optimization) lies between `3.4` and `6.7`. (These numbers are illustrative. We will derive the actual numbers below.)\n",
817 | "\n",
818 | "What is the *hacker's way* of doing it? We will do the following steps:\n",
819 | "\n",
820 | "1. From the actual sales data, draw a sample with replacement (separately for before and after), with the same sample size as the original.\n",
821 | "2. Find the difference between the means of the two samples.\n",
822 | "3. Repeat steps 1 and 2, say, 100,000 times.\n",
823 | "4. Sort the differences. For a 90% interval, take the 5th and 95th percentiles. That range gives you the 90% confidence interval on the difference in means.\n",
824 | "5. This process of generating samples by resampling with replacement is called **bootstrapping**."
825 | ]
826 | },
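The steps above can be sketched as one function. This is a hedged illustration rather than the notebook's implementation: `bootstrap_ci`, `mean`, the `level` and `seed` parameters, and the nearest-rank percentile lookup are all assumptions introduced here, and the sketch uses the standard library instead of NumPy and 10,000 repetitions instead of 100,000 to keep it quick.

```python
import random


def mean(values):
    return sum(values) / float(len(values))


def bootstrap_ci(before, after, number_of_times=10000, level=90, seed=None):
    """Percentile bootstrap interval for mean(after) - mean(before)."""
    rng = random.Random(seed)
    differences = []
    for trial in range(number_of_times):
        # Step 1: resample each group with replacement, same size as the original
        sample_before = [rng.choice(before) for value in before]
        sample_after = [rng.choice(after) for value in after]
        # Step 2: difference between the two sample means
        differences.append(mean(sample_after) - mean(sample_before))
    # Step 4: sort, then read off the lower and upper percentiles (nearest rank)
    differences.sort()
    tail = (100 - level) / 2.0
    lower_index = int(round(tail / 100.0 * (number_of_times - 1)))
    upper_index = int(round((100 - tail) / 100.0 * (number_of_times - 1)))
    return differences[lower_index], differences[upper_index]


# First-case sales figures from this notebook
before_opt = [23, 21, 19, 24, 35, 17, 18, 24, 33, 27, 21, 23]
after_opt = [31, 28, 19, 24, 32, 27, 16, 41, 23, 32, 29, 33]
low, high = bootstrap_ci(before_opt, after_opt, number_of_times=10000, seed=42)
```

Passing `level=95` would read off the 2.5th and 97.5th percentiles instead. The exact endpoints vary slightly from run to run unless a seed is fixed.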
827 | {
828 | "cell_type": "code",
829 | "execution_count": 31,
830 | "metadata": {
831 | "collapsed": true
832 | },
833 | "outputs": [],
834 | "source": [
835 | "#Load the data\n",
836 | "before_opt = np.array([23, 21, 19, 24, 35, 17, 18, 24, 33, 27, 21, 23])\n",
837 | "after_opt = np.array([31, 28, 19, 24, 32, 27, 16, 41, 23, 32, 29, 33])"
838 | ]
839 | },
840 | {
841 | "cell_type": "code",
842 | "execution_count": 32,
843 | "metadata": {
844 | "collapsed": false
845 | },
846 | "outputs": [],
847 | "source": [
848 | "# Draw one bootstrap sample: resample with replacement, each observation equally likely\n",
849 | "random_before_opt = np.random.choice(before_opt, size=before_opt.size, replace=True)"
850 | ]
851 | },
852 | {
853 | "cell_type": "code",
854 | "execution_count": 33,
855 | "metadata": {
856 | "collapsed": false
857 | },
858 | "outputs": [
859 | {
860 | "name": "stdout",
861 | "output_type": "stream",
862 | "text": [
863 | "Actual sample before optimization: [23 21 19 24 35 17 18 24 33 27 21 23]\n",
864 | "Bootstrapped sample before optimization: [21 17 19 21 33 27 24 18 18 19 24 24]\n"
865 | ]
866 | }
867 | ],
868 | "source": [
869 | "print \"Actual sample before optimization:\", before_opt\n",
870 | "print \"Bootstrapped sample before optimization: \", random_before_opt"
871 | ]
872 | },
873 | {
874 | "cell_type": "code",
875 | "execution_count": 34,
876 | "metadata": {
877 | "collapsed": false
878 | },
879 | "outputs": [
880 | {
881 | "name": "stdout",
882 | "output_type": "stream",
883 | "text": [
884 | "Mean for actual sample: 23.75\n",
885 | "Mean for bootstrapped sample: 22.0833333333\n"
886 | ]
887 | }
888 | ],
889 | "source": [
890 | "print \"Mean for actual sample:\", np.mean(before_opt)\n",
891 | "print \"Mean for bootstrapped sample:\", np.mean(random_before_opt)"
892 | ]
893 | },
894 | {
895 | "cell_type": "code",
896 | "execution_count": 35,
897 | "metadata": {
898 | "collapsed": false
899 | },
900 | "outputs": [
901 | {
902 | "name": "stdout",
903 | "output_type": "stream",
904 | "text": [
905 | "Actual sample after optimization: [31 28 19 24 32 27 16 41 23 32 29 33]\n",
906 | "Bootstrapped sample after optimization: [33 41 27 32 28 41 33 41 41 31 29 19]\n",
907 | "Mean for actual sample: 27.9166666667\n",
908 | "Mean for bootstrapped sample: 33.0\n"
909 | ]
910 | }
911 | ],
912 | "source": [
913 | "random_after_opt = np.random.choice(after_opt, size=after_opt.size, replace=True)\n",
914 | "print \"Actual sample after optimization:\", after_opt\n",
915 | "print \"Bootstrapped sample after optimization: \", random_after_opt\n",
916 | "print \"Mean for actual sample:\", np.mean(after_opt)\n",
917 | "print \"Mean for bootstrapped sample:\", np.mean(random_after_opt)"
918 | ]
919 | },
920 | {
921 | "cell_type": "code",
922 | "execution_count": 36,
923 | "metadata": {
924 | "collapsed": false
925 | },
926 | "outputs": [
927 | {
928 | "name": "stdout",
929 | "output_type": "stream",
930 | "text": [
931 | "Difference in means of actual samples: 4.16666666667\n",
932 | "Difference in means of bootstrapped samples: 10.9166666667\n"
933 | ]
934 | }
935 | ],
936 | "source": [
937 | "print \"Difference in means of actual samples:\", np.mean(after_opt) - np.mean(before_opt)\n",
938 | "print \"Difference in means of bootstrapped samples:\", np.mean(random_after_opt) - np.mean(random_before_opt)"
939 | ]
940 | },
941 | {
942 | "cell_type": "code",
943 | "execution_count": 37,
944 | "metadata": {
945 | "collapsed": true
946 | },
947 | "outputs": [],
948 | "source": [
949 | "#Like always, we will repeat this experiment 100,000 times. \n",
950 | "\n",
951 | "def bootstrap_experiment(number_of_times):\n",
952 | "    mean_difference = np.empty([number_of_times,1])\n",
953 | "    for times in np.arange(number_of_times):\n",
954 | "        random_before_opt = np.random.choice(before_opt, size=before_opt.size, replace=True)\n",
955 | "        random_after_opt = np.random.choice(after_opt, size=after_opt.size, replace=True)\n",
956 | "        mean_difference[times] = np.mean(random_after_opt) - np.mean(random_before_opt)\n",
957 | "    return mean_difference"
958 | ]
959 | },
960 | {
961 | "cell_type": "code",
962 | "execution_count": 38,
963 | "metadata": {
964 | "collapsed": false
965 | },
966 | "outputs": [
967 | {
968 | "data": {
969 | "text/plain": [
970 | ""
971 | ]
972 | },
973 | "execution_count": 38,
974 | "metadata": {},
975 | "output_type": "execute_result"
976 | },
977 | {
978 | "data": {
979 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX8AAAECCAYAAAAW+Nd4AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGhdJREFUeJzt3X+M2/d93/En785H3vFIuYi4OGmDbnHhd90ASmJXaCpn\nkj171qwl1RoMHuA0cIxEnl3B8IBsGXZTPbiQo7SpjUZoJgxWGtlQUKAxkgyFYMWDVujU6w8rqa3G\nc/K25aQItnbpnWbfkRTJ04ncH/zyRFHUkbwjj0d+Xg/gIPLNz5Gfj/i91/fL748PY5VKBRERCctI\nvzsgIiIbT+EvIhIghb+ISIAU/iIiAVL4i4gESOEvIhKgsdUeNLMR4ChwC1AG9gGXgWPR/deA/e5e\nMbN9wMPAMnDQ3U+Y2QRwHMgAWeBBd5/v0VhERKRNrbb87wWS7v5R4LeBLwBPA9PuvhOIAXvN7Cbg\nMWAHsBs4ZGbjwKPAuajt88CB3gxDREQ60Sr8C8AWM4sBW4Al4HZ3n4kefxG4B9gOzLr7JXdfBM4D\n24A7gJNR25NRWxER6bNVd/sAs0AC+CHwLuDjwM66x7NUVwppYOE69cWGmoiI9FmrLf/PU92iN+BD\nVHfd3FD3eBp4h2rAp+rqqSb1Wk1ERPqs1ZZ/kitb7m9H7V8xs13ufhq4DzgFvAw8ZWZxqp8UbqV6\nMHgW2AOcjdrO0EKlUqnEYrE1DEVEJGgdBWdstYndzOxG4GvAVqpb/L8PfA94FhgHXgf2RWf7fJbq\n2T4jwFPu/q3obJ/ngPcAJeABd/+HFn2qzM1lOxnDQMlkUmh8g2mYxwYa36DLZFLdC/8+UfgPsGEe\n3zCPDTS+Qddp+OsiLxGRACn8RUQCpPAXEQmQwl9EJEAKfxGRACn8RUQC1OoiL5GhVy6Xyefz19ST\nySQjI9o+kuGk8Jfg5fN5Tp19i3hiYqVWKha4e/vNpFKpVX5TZHAp/EWAeGKCicmpfndDZMPoM62I\nSIAU/iIiAVL4i4gESOEvIhIgHfCVoDQ7rTOXy1Epb7rZbUV6SuEvQWl2WufC2xdITE4x2cd+iWw0\nhb8Ep/G0zmLh2gu8RIad9vmLiARI4S8iEiCFv4hIgBT+IiIBannA18weBD4d3Z0APgh8FPgyUAZe\nA/a7e8XM9gEPA8vAQXc/YWYTwHEgA2SBB919vtsDERGR9rXc8nf359z9Lne/C/gu8BjwBDDt7juB\nGLDXzG6KHtsB7AYOmdk48ChwLmr7PHCgN0MREZF2tb3bx8x+Gfgldz8K3O7uM9FDLwL3ANuBWXe/\n5O6LwHlgG3AHcDJqezJqKyIifdTJPv9p4MnodqyungW2AGlg4Tr1xYaaiIj0UVvhb2Y3Are4++mo\nVK57OA28QzXg67/5ItWkXquJiEgftXuF707gVN39V8xsV7QyuC967GXgKTOLAwngVqoHg2eBPcDZ\nqO0MLWQyw/3tSRpf/8TjFaaSCSaTiZXaUiHByOgoqakrtdHYMlu3TpFOXz2WzTy2btD4wtFu+N8C\nvFV3/3PAs9EB3deBF6KzfQ4DZ6h+oph295KZHQGeM7MzQAl4oNWLzc1lOxnDQMlkUhpfH2WzOXL5\nIpcrVxb9/MUisZExxsaLK7XCxSLz8zlKpSt7ODf72NZL4xtsna7Y2gp/d/+9hvtvAnc2aXcUONpQ\nKwD3d9QrERHpKV3kJSISIIW/iEiAFP4iIgFS+IuIBEjhLyISIH2Tl0gT5XKZXC53VS0er1AuVxgZ\n0TaTDD6Fvwyt9XxZ+1KpwMyri6S33LhSGxst85Ff+jlSKV0oJINP4S9Da71f1t74Xb+jseUe9FKk\nPxT+MtT0Ze0izWnnpYhIgBT+IiIBUviLiARI4S8iEiCFv4hIgBT+IiIBUviLiARI5/mLtKnZlA8A\nyWRSUz7IwFH4i7SpVCww8+o/XDXlQ6
lY4O7tN2vKBxk4Cn+RDjReMSwyqPRZVUQkQC23/M3sPwEf\nB24A/gCYBY4BZeA1YL+7V8xsH/AwsAwcdPcTZjYBHAcyQBZ40N3nezEQERFp36pb/mZ2J/Cr7r4D\nuBN4P/A0MO3uO4EYsNfMbgIeA3YAu4FDZjYOPAqci9o+Dxzo0ThERKQDrbb87wW+b2bfBtLAfwA+\n4+4z0eMvRm0uA7Pufgm4ZGbngW3AHcDvRG1PAr/V5f6LrGicv7/duftFQtQq/DPA+4CPUd3q/xOq\nW/s1WWAL1RXDwnXqiw01kZ5onL+/k7n7RULTKvzngR+4+zLwhpkVgZ+tezwNvEM14OvPdUs1qddq\nLWUyw33anMbXG/F4hXe962eYTFbPxhmNXWZkdJTUVGKlzVIhsebahcICU8n4VbXR2DJbt06RTg/H\ne6plMxytwv/PgMeBZ8zsvcAkcMrMdrn7aeA+4BTwMvCUmcWBBHAr1YPBs8Ae4GzUdubal7jW3Fx2\nDUMZDJlMSuPrkWw2Ry5f5HKluljnLxaJjYwxNl5cabOeGkAuX7qqVrhYZH4+R6kUY9Bp2Rxsna7Y\nVg3/6IydnWb2MtWDw78J/C3wbHRA93Xghehsn8PAmajdtLuXzOwI8JyZnQFKwAOdDkhERLqv5ame\n7v4fm5TvbNLuKHC0oVYA7l9r50REpDd0ha9seo1n8dRshjl1ms33sxn6JdKKwl82vcazeGDzzKmz\nVCow8+riynw/m6VfIq0o/GUgbOY5dTZz30SuR59NRUQCpPAXEQmQdvuIdJG+8EUGhcJfpIsaDwCD\nDgLL5qTwl4HUbAt7s0zkpgPAMggU/jKQmm1hayI3kfYp/GVgNW5hFwvXXggmIs3pCJSISIAU/iIi\nAVL4i4gESOEvIhIghb+ISIAU/iIiAVL4i4gESOEvIhIghb+ISIAU/iIiAWpregcz+2tgIbr7I+AQ\ncAwoA68B+929Ymb7gIeBZeCgu58wswngOJABssCD7j7f1VGIiEhHWm75m1kCwN3vin4+AzwDTLv7\nTiAG7DWzm4DHgB3AbuCQmY0DjwLnorbPAwd6MxQREWlXO1v+HwQmzew7Ufv/DNzm7jPR4y8C9wKX\ngVl3vwRcMrPzwDbgDuB3orYngd/qYv9FRGQN2tnnnwe+5O67gUeArzc8ngW2AGmu7BpqrC821ERE\npI/a2fJ/AzgP4O5vmtkF4MN1j6eBd6gGfP1XFaWa1Gu1VWUyw/2NRxpfZ+LxClPJBJPJxEptqZBg\nZHSU1NT1a+206aR2obDAVDLe8WuOxpbZunWKdHrzv+9aNsPRTvg/RHX3zX4zey/VAH/JzHa5+2ng\nPuAU8DLwlJnFgQRwK9WDwbPAHuBs1Hbm2pe42txcdg1DGQyZTErj61A2myOXL3K5cmVxzV8sEhsZ\nY2y8eN1aO206qQHk8qWOX7Nwscj8fI5SKbbe/4qe0rI52DpdsbUT/l8FvmZmtdB+CLgAPBsd0H0d\neCE62+cwcIbq7qRpdy+Z2RHgOTM7A5SABzrqoYiIdF3L8Hf3ZeBTTR66s0nbo8DRhloBuH+N/RMR\nkR7QRV4iIgFS+IuIBEjhLyISIIW/iEiAFP4iIgFS+IuIBEjhLyISIIW/iEiA2prPX0TWrlwuk8vl\nrqknk0lGRrT9Jf2h8BfpsaVSgZlXF0lvuXGlVioWuHv7zaRSmmhM+kPhL7IB4okJJian+t0NkRX6\nzCkiEiCFv4hIgBT+IiIBUviLiARI4S8iEiCFv4hIgBT+IiIBUviLiARIF3nJplIul8nn81fVcrkc\nlXKlTz0SGU5thb+Z/SPge8DdQBk4Fv37GrDf3Stmtg94GFgGDrr7CTObAI4DGSALPOju810fhQyN\nfD7PqbNvEU9MrNQW3r5AYnKKyT72S2TYtNztY2Y3AP8NyAMx4Blg2t13Rvf3mtlNwGPADmA3cMjM\nxo
FHgXNR2+eBAz0ZhQyV2lQItZ94ItHvLokMnXb2+X8JOAL8fXT/NnefiW6/CNwDbAdm3f2Suy8C\n54FtwB3AyajtyaitiIj02arhb2afBubc/aWoFIt+arLAFiANLFynvthQExGRPmu1z/8hoGJm9wAf\nAp6juv++Jg28QzXg6+emTTWp12otZTLDPc2txnd98XiFqWSCyeSVXT1LhQQjo6OkpjqrrfX3rle7\nUFhgKhnvymuOxpbZunWKdHpzLQtaNsOxavi7+67abTP7U+AR4EtmtsvdTwP3AaeAl4GnzCwOJIBb\nqR4MngX2AGejtjO0YW4u2/lIBkQmk9L4VpHN5sjli1yuXFk08xeLxEbGGBsvdlRb6+9drwaQy5e6\n8pqFi0Xm53OUSvUfpPtLy+Zg63TF1ul5/hXgc8CTZvbnVFceL7j7T4HDwBmqK4Npdy9RPVbwATM7\nA3wWeLLD1xMRkR5o+zx/d7+r7u6dTR4/ChxtqBWA+9faORER6Q1d4SsiEiCFv4hIgBT+IiIBUviL\niARI4S8iEiCFv4hIgBT+IiIBUviLiARI4S8iEiCFv4hIgPQ1jiJ9UC6XyeVy19STySQjI9omk95T\n+Iv0wVKpwMyri6S33LhSKxUL3L39ZlIpTTssvafwF+mT2tdVivSDPl+KiARI4S8iEiDt9pG+KZfL\n5PP5q2q5XI5KudKnHomEQ+EvfZPP5zl19i3iiYmV2sLbF0hMTjHZx36JhEDhL33VeNCzWMiv0lpE\nukX7/EVEAqTwFxEJUMvdPmY2CjwL3AJUgEeAEnAMKAOvAfvdvWJm+4CHgWXgoLufMLMJ4DiQAbLA\ng+4+34OxiIhIm9rZ8v8YUHb3jwIHgC8ATwPT7r4TiAF7zewm4DFgB7AbOGRm48CjwLmo7fPRc4iI\nSB+1DH93/+/Av43u/mPgbeB2d5+Jai8C9wDbgVl3v+Tui8B5YBtwB3AyansyaisiIn3U1j5/d79s\nZseALwNfp7q1X5MFtgBpYOE69cWGmoiI9FHbp3q6+6fN7N3Ay0Ci7qE08A7VgK+fkSrVpF6rrSqT\nGe6JrTS+qni8wlQywWTyyuK0VEgwMjpKamr9tW4+F8CFwgJTyXjPXnM0tszWrVOk0/1bPrRshqOd\nA76fAn7O3Q8BBeAy8F0z2+Xup4H7gFNUVwpPmVmc6srhVqoHg2eBPcDZqO3Mta9ytbm57NpGMwAy\nmZTGF8lmc+TyRS5XriyG+YtFYiNjjI0X113r5nPV5PKlnr1m4WKR+fkcpVL9B+uNo2VzsHW6Ymtn\ny/8F4JiZnQZuAB4Hfgg8Gx3QfR14ITrb5zBwhurupGl3L5nZEeA5MztD9SyhBzrqoYiIdF3L8Hf3\nAvBvmjx0Z5O2R4GjTX7//jX2T0REekDTO4hsEvp2L9lICn+RTULf7iUbSeEvsono271ko+izpIhI\ngLTlLxtCX9wisrko/GVD6ItbRDYXhb9sGH1xi8jmoX3+IiIBUviLiARI4S8iEiCFv4hIgBT+IiIB\nUviLiARI4S8iEiCFv4hIgBT+IiIB0hW+0hONc/loHp+10Rz/0isKf+mJxrl8NI/P2miOf+kVhb/0\nTP1cPprHZ+00x7/0gj43iogEaNUtfzO7AfhD4OeBOHAQ+AFwDCgDrwH73b1iZvuAh4Fl4KC7nzCz\nCeA4kAGywIPuPt+jsYiISJtabfl/Ephz953AvwC+AjwNTEe1GLDXzG4CHgN2ALuBQ2Y2DjwKnIva\nPg8c6M0wRESkE63C/xvAE3VtLwG3uftMVHsRuAfYDsy6+yV3XwTOA9uAO4CTUduTUVsREemzVXf7\nuHsewMxSVFcEB4Dfq2uSBbYAaWDhOvXFhpqIrEOz0z916qd0quXZPmb2PuCbwFfc/Y/M7HfrHk4D\n71AN+PrzzlJN6rVaS5nMcJ/CFsL44vEKU8kEk8kEAEuFBCOjo6Sm
Eivtel3r9vNfKCwwlYxv6Gs2\nry3wvTd+ypYblwAoFi/ysZ2/SDq9/uUqhGVTqlod8H038BLwm+7+p1H5FTPb5e6ngfuAU8DLwFNm\nFgcSwK1UDwbPAnuAs1HbGdowN5ddw1AGQyaTCmJ82WyOXL7I5Up1EctfLBIbGWNsvLjStte1bj8/\nQC5f2tDXXK1W+79dvjzC/HyOUinGeoSybA6rTldsrbb8p6nuqnnCzGr7/h8HDkcHdF8HXojO9jkM\nnKF6bGDa3UtmdgR4zszOACXggY56JyIiPdFqn//jVMO+0Z1N2h4FjjbUCsD96+ifiIj0gI4QiYgE\nSOEvIhIghb+ISIA0sZusW/30zfF4pXqmj6ZwFtnUFP6ybvXTN08lE+TyRU3hLLLJKfylK2rTDk8m\nE1yujGkKZ5FNTvv8RUQCpPAXEQmQwl9EJEAKfxGRACn8RUQCpPAXEQmQwl9EJEA6z186Un81b42u\n5hUZPAp/6Uj91bw1uppXZPAo/KVjtat5a3Q1r8jg0T5/EZEAKfxFRAKk8BcRCZDCX0QkQG0d8DWz\nXwG+6O53mdkvAMeAMvAasN/dK2a2D3gYWAYOuvsJM5sAjgMZIAs86O7zPRiHSLDK5TK5XO6aejKZ\nZGRE23fSXMslw8w+DzwLxKPSM8C0u+8EYsBeM7sJeAzYAewGDpnZOPAocC5q+zxwoPtDEAnbUqnA\nzKs/4c++/3crP6fOvnXN9Rgi9drZLDgPfIJq0APc5u4z0e0XgXuA7cCsu19y98Xod7YBdwAno7Yn\no7Yi0mW1029rP/XXYYg00zL83f2bVHfl1MTqbmeBLUAaWLhOfbGhJiIifbaWi7zKdbfTwDtUAz5V\nV081qddqLWUyqdaNBtggjy8erzCVTDCZTKzUlgoJRkZHSU1Va6mpxDW1xvsbUev2818oLDCVjG/o\na651nKOxZbZunSKd7mxZG+Rlsx3DPr5OrCX8XzGzXe5+GrgPOAW8DDxlZnEgAdxK9WDwLLAHOBu1\nnWn+lFebm8uuoVuDIZNJDfT4stkcuXyRy5Uri07+YpHYyBhj40VSUwmyueJVtcY2zX6vF7VuPz9A\nLl/a0Ndc6zgLF4vMz+coleo/qK9u0JfNVkIYXyc6ORWgNnPX54AnzezPqa48XnD3nwKHgTNUVwbT\n7l4CjgAfMLMzwGeBJzvqnYiI9ERbW/7u/rdUz+TB3d8E7mzS5ihwtKFWAO5fbyelPzSDp8jw0sRu\ncl2awXNw6dx/aUXhL6vSDJ6DqXru/yLpLTeu1ErFAndvv5lUSgc9ReEvMrQaV9wi9fT5T0QkQAp/\nEZEAKfxFRAKkff4C6LROkdAo/AXQaZ0ioVH4ywqd1ikSDoW/SCB04ZfUU/iLBEIXfkk9hX+AdHA3\nXLrwS2oU/gHSwV0RUfgHSgd3Ba4+DhCPV8hmq7d1HGD4KfxFAlZ/HGAqmSCXL+o4QCAU/gFo3Mev\n/ftSr/YpcDKZuOob2mS46Z0OQOM+fu3fFxGF/5C53pk84+OJlX382r8vq9H1AGFQ+A8Znckj69Xs\neoDCxTy/+oH3MDV15SQBrQwGW8/D38xGgP8KbANKwGfd/a1ev27IdCaPrFezZWjm1Z+srBB0UHjw\nbcSW/78Cxt19h5n9CvB0VJN10sVaspF0gdhw2YjwvwM4CeDuf2Vmv7wBrzl0rhf0f/m//i+JySs7\ndLSLRzaCjgsMvo0I/zSwWHf/spmNuHt5A15702sW6uVy9b+m/o9otaDXLh7ZaO0eF2i2LDergVYc\nG20jwn8RqN8xOHDBf+5vXmVpaWnlfowYN9/8fsbGxjt+rvqrKKEa6qdf+THxeGKltrDw/xiJjZJK\nb7mqNjGRItGwTV8qFihczNXdLxIbGV211k6btdZGY8sULhY39DU3apyXli5SKl7u2/9tr1/zeu/d\naq9Zb6lU5H/85Q+vWW6bLcuN
tVKpyK4P/5OrVhzdlsno+ES9jQj/WeDjwDfM7CPA37RoH9tsb9I9\nd//Trj5fOp2+6v62bbd09flFpLnNli39tBHh/y3gn5vZbHT/oQ14TRERWUWsUtGZISIiodHRFRGR\nACn8RUQCpPAXEQmQwl9EJECbamI3M/t14F+7+yej+x8Bfh9YBl5y99/uZ/+6wcxiwP8G3ohKf+Hu\n033s0rqFMH+Tmf01sBDd/ZG7f6af/emGaLqVL7r7XWb2C8AxoAy8Bux394E+G6RhfB8G/gR4M3r4\niLv/cf96tz5mdgPwh8DPA3HgIPADOngPN034m9mXgXuBV+rKR4BPuPuPzeyEmX3I3V/tTw+75mbg\ne+7+a/3uSBcN9fxNZpYAcPe7+t2XbjGzzwO/AdSu5HoGmHb3GTM7AuwFvt2v/q1Xk/HdDjzj7s/0\nr1dd9Ulgzt0/ZWY/A5yjmp1tv4ebabfPLPAoEAMwszQQd/cfR49/B7inT33rptuBnzWz/xmt0Ibh\nCq+r5m8Chm3+pg8Ck2b2HTM7Fa3gBt154BNEf2/Abe4+E91+kcH/W2sc3+3AvzSz02Z21MwGfYa6\nbwBPRLdHgEt0+B5uePib2WfM7PsNP7c3+QjWOCdQFtjCAGk2VuDvgC+4+z8DvgAc728vu6Lp/E39\n6kwP5IEvuftu4BHg64M+Pnf/JtXdqTWxuts5BuxvrVGT8f0V8O/dfRfwI+C/9KVjXeLueXfPmVmK\n6orgAFfnecv3cMN3+7j7V4GvttG0cU6gNPBOTzrVI83GamYTRAulu8+a2Xv70bcuG/j5m1p4g+qW\nJO7+ppldAN4D/J++9qq76t+vFAP2t9aGb7l77ZjNt4HD/exMN5jZ+4BvAl9x9z8ys9+te7jle7hp\nt17cfRFYMrP3RwdJ7wVmWvzaIHgC+HcAZvZB4Cf97U5XzAJ7YOUgfav5mwbNQ1SPYxCtrNPA3/e1\nR933ipntim7fx3D8rdU7aWbbo9t3A9/tZ2fWy8zeDbwEfN7dj0Xljt7DTXPAN1KJfmoeAb4OjALf\ncfezfelVd30ROG5me6h+Avh0f7vTFcM+f9NXga+ZWe2P6aEh+mRT+3v7HPCsmY0DrwMv9K9LXVUb\n3yPAV8zsEtUV98P961JXTFPdrfOEmdX2/T8OHG73PdTcPiIiAdq0u31ERKR3FP4iIgFS+IuIBEjh\nLyISIIW/iEiAFP4iIgFS+IuIBEjhLyISoP8PoK6vAauRiewAAAAASUVORK5CYII=\n",
980 | "text/plain": [
981 | ""
982 | ]
983 | },
984 | "metadata": {},
985 | "output_type": "display_data"
986 | }
987 | ],
988 | "source": [
989 | "mean_difference = bootstrap_experiment(100000)\n",
990 | "sns.distplot(mean_difference, kde=False)"
991 | ]
992 | },
993 | {
994 | "cell_type": "code",
995 | "execution_count": 39,
996 | "metadata": {
997 | "collapsed": false
998 | },
999 | "outputs": [],
1000 | "source": [
1001 | "mean_difference = np.sort(mean_difference, axis=0)"
1002 | ]
1003 | },
1004 | {
1005 | "cell_type": "code",
1006 | "execution_count": 40,
1007 | "metadata": {
1008 | "collapsed": false
1009 | },
1010 | "outputs": [
1011 | {
1012 | "data": {
1013 | "text/plain": [
1014 | "array([[ -6.66666667],\n",
1015 | " [ -6.33333333],\n",
1016 | " [ -6.08333333],\n",
1017 | " ..., \n",
1018 | " [ 13.16666667],\n",
1019 | " [ 13.16666667],\n",
1020 | " [ 15. ]])"
1021 | ]
1022 | },
1023 | "execution_count": 40,
1024 | "metadata": {},
1025 | "output_type": "execute_result"
1026 | }
1027 | ],
1028 | "source": [
1029 | "mean_difference #Sorted difference"
1030 | ]
1031 | },
1032 | {
1033 | "cell_type": "code",
1034 | "execution_count": 41,
1035 | "metadata": {
1036 | "collapsed": false
1037 | },
1038 | "outputs": [
1039 | {
1040 | "data": {
1041 | "text/plain": [
1042 | "array([ 0.16666667, 8.08333333])"
1043 | ]
1044 | },
1045 | "execution_count": 41,
1046 | "metadata": {},
1047 | "output_type": "execute_result"
1048 | }
1049 | ],
1050 | "source": [
1051 | "np.percentile(mean_difference, [5,95])"
1052 | ]
1053 | },
1054 | {
1055 | "cell_type": "markdown",
1056 | "metadata": {},
1057 | "source": [
1058 | "Reiterating what this means: in 90% of the bootstrap resamples, the difference in means fell between the two limits shown above."
1059 | ]
1060 | },
1061 | {
1062 | "cell_type": "markdown",
1063 | "metadata": {},
1064 | "source": [
1065 | "**Exercise: Find the 95% confidence interval.**"
1066 | ]
1067 | },
1068 | {
1069 | "cell_type": "code",
1070 | "execution_count": null,
1071 | "metadata": {
1072 | "collapsed": true
1073 | },
1074 | "outputs": [],
1075 | "source": []
1076 | },
1077 | {
1078 | "cell_type": "markdown",
1079 | "metadata": {},
1080 | "source": [
1081 | "### Where do we go from here? \n",
1082 | "\n",
1083 | "First of all, there are two questions to address.\n",
1084 | "\n",
1085 | "1. Why do we need significance testing if confidence intervals can give us more information?\n",
1086 | "2. How does this relate to the traditional statistical procedure for finding confidence intervals?\n",
1087 | "\n",
1088 | "For the first one:\n",
1089 | "\n",
1090 | "What if sales in the first month after the price change were 80 and sales in the month before were 40? The difference is 40. With a single observation in each group, the confidence interval procedure explained above (resampling with replacement) would always generate a difference of 40. But if we do the significance test detailed above, where the labels are shuffled, each value is equally likely to land in either group, and so the test would answer that there was no significant difference. And don't we all know that the data is **too small** to make meaningful inferences?\n",
1091 | "\n",
1092 | "For the second one:\n",
1093 | "\n",
1094 | "The traditional derivation assumes a normal distribution. But what if the underlying distribution isn't normal? Also, people relate to resampling much better :-) "
1095 | ]
1096 | }
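The 80-versus-40 example above can be checked directly. The following is a small sketch under stated assumptions: the variable names and the fixed seed are illustrative, and it uses the standard library rather than the NumPy calls used elsewhere in this notebook.

```python
import random

rng = random.Random(0)
before = [40]  # sales in the month before the price change
after = [80]   # sales in the month after the price change

# Bootstrap: resampling a one-element group with replacement
# returns the same value every time, so the difference never changes.
bootstrap_differences = set()
for trial in range(1000):
    resampled_before = [rng.choice(before) for value in before]
    resampled_after = [rng.choice(after) for value in after]
    bootstrap_differences.add(resampled_after[0] - resampled_before[0])

# Shuffle test: the two values swap groups about half the time,
# so +40 and -40 are both seen.
shuffle_differences = set()
for trial in range(1000):
    pooled = before + after
    rng.shuffle(pooled)
    shuffle_differences.add(pooled[1] - pooled[0])
```

The bootstrap collapses to a single value, which looks like certainty, while the shuffle test sees the observed difference in roughly half of the relabelings and so cannot call it significant.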
1097 | ],
1098 | "metadata": {
1099 | "kernelspec": {
1100 | "display_name": "Python 2",
1101 | "language": "python",
1102 | "name": "python2"
1103 | },
1104 | "language_info": {
1105 | "codemirror_mode": {
1106 | "name": "ipython",
1107 | "version": 2
1108 | },
1109 | "file_extension": ".py",
1110 | "mimetype": "text/x-python",
1111 | "name": "python",
1112 | "nbconvert_exporter": "python",
1113 | "pygments_lexer": "ipython2",
1114 | "version": "2.7.10"
1115 | }
1116 | },
1117 | "nbformat": 4,
1118 | "nbformat_minor": 0
1119 | }
1120 |
--------------------------------------------------------------------------------
/notebooks/4. Basic Metrics.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Basic Metrics\n",
8 | "\n",
9 | "When we think about summarizing data, what are the metrics that we look at?\n",
10 | "\n",
11 | "In this notebook, we will look at the weed price dataset along with demographic information for the United States. \n",
12 | "\n",
13 | "To learn how the data was acquired, please read [this](https://github.com/amitkaps/weed/blob/master/1-Acquire.ipynb).\n",
14 | "\n",
15 | "This notebook will make use of pandas quite a bit."
16 | ]
17 | },
18 | {
19 | "cell_type": "code",
20 | "execution_count": 1,
21 | "metadata": {
22 | "collapsed": true
23 | },
24 | "outputs": [],
25 | "source": [
26 | "import numpy as np\n",
27 | "import pandas as pd\n",
28 | "from datetime import datetime as dt\n",
29 | "from scipy import stats"
30 | ]
31 | },
32 | {
33 | "cell_type": "markdown",
34 | "metadata": {},
35 | "source": [
36 | "### Read the input datasets. There are three datasets:\n",
37 | "\n",
38 | "1. Weed price by date / state\n",
39 | "2. Demographics of State\n",
40 | "3. Population of state"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 2,
46 | "metadata": {
47 | "collapsed": false
48 | },
49 | "outputs": [],
50 | "source": [
51 | "prices_pd = pd.read_csv(\"../data/Weed_Price.csv\", parse_dates=[-1])\n",
52 | "demography_pd = pd.read_csv(\"../data/Demographics_State.csv\")\n",
53 | "population_pd = pd.read_csv(\"../data/Population_State.csv\")"
54 | ]
55 | },
56 | {
57 | "cell_type": "code",
58 | "execution_count": 3,
59 | "metadata": {
60 | "collapsed": false
61 | },
62 | "outputs": [
63 | {
64 | "data": {
141 | "text/plain": [
142 | " State HighQ HighQN MedQ MedQN LowQ LowQN date\n",
143 | "0 Alabama 339.06 1042 198.64 933 149.49 123 2014-01-01\n",
144 | "1 Alaska 288.75 252 260.60 297 388.58 26 2014-01-01\n",
145 | "2 Arizona 303.31 1941 209.35 1625 189.45 222 2014-01-01\n",
146 | "3 Arkansas 361.85 576 185.62 544 125.87 112 2014-01-01\n",
147 | "4 California 248.78 12096 193.56 12812 192.92 778 2014-01-01"
148 | ]
149 | },
150 | "execution_count": 3,
151 | "metadata": {},
152 | "output_type": "execute_result"
153 | }
154 | ],
155 | "source": [
156 | "prices_pd.head()"
157 | ]
158 | },
159 | {
160 | "cell_type": "code",
161 | "execution_count": 4,
162 | "metadata": {
163 | "collapsed": false
164 | },
165 | "outputs": [
166 | {
167 | "data": {
244 | "text/plain": [
245 | " State HighQ HighQN MedQ MedQN LowQ LowQN date\n",
246 | "22894 Virginia 364.98 3513 293.12 3079 NaN 284 2014-12-31\n",
247 | "22895 Washington 233.05 3337 189.92 3562 NaN 160 2014-12-31\n",
248 | "22896 West Virginia 359.35 551 224.03 545 NaN 60 2014-12-31\n",
249 | "22897 Wisconsin 350.52 2244 272.71 2221 NaN 167 2014-12-31\n",
250 | "22898 Wyoming 322.27 131 351.86 197 NaN 12 2014-12-31"
251 | ]
252 | },
253 | "execution_count": 4,
254 | "metadata": {},
255 | "output_type": "execute_result"
256 | }
257 | ],
258 | "source": [
259 | "prices_pd.tail()"
260 | ]
261 | },
262 | {
263 | "cell_type": "code",
264 | "execution_count": 5,
265 | "metadata": {
266 | "collapsed": false
267 | },
268 | "outputs": [
269 | {
270 | "data": {
353 | "text/plain": [
354 | " region total_population percent_white percent_black percent_asian \\\n",
355 | "0 alabama 4799277 67 26 1 \n",
356 | "1 alaska 720316 63 3 5 \n",
357 | "2 arizona 6479703 57 4 3 \n",
358 | "3 arkansas 2933369 74 15 1 \n",
359 | "4 california 37659181 40 6 13 \n",
360 | "\n",
361 | " percent_hispanic per_capita_income median_rent median_age \n",
362 | "0 4 23680 501 38.1 \n",
363 | "1 6 32651 978 33.6 \n",
364 | "2 30 25358 747 36.3 \n",
365 | "3 7 22170 480 37.5 \n",
366 | "4 38 29527 1119 35.4 "
367 | ]
368 | },
369 | "execution_count": 5,
370 | "metadata": {},
371 | "output_type": "execute_result"
372 | }
373 | ],
374 | "source": [
375 | "demography_pd.head()"
376 | ]
377 | },
378 | {
379 | "cell_type": "code",
380 | "execution_count": 6,
381 | "metadata": {
382 | "collapsed": false
383 | },
384 | "outputs": [
385 | {
386 | "data": {
387 | "text/html": [
388 | "\n",
389 | "
\n",
390 | " \n",
391 | " \n",
392 | " | \n",
393 | " region | \n",
394 | " value | \n",
395 | "
\n",
396 | " \n",
397 | " \n",
398 | " \n",
399 | " 0 | \n",
400 | " alabama | \n",
401 | " 4777326 | \n",
402 | "
\n",
403 | " \n",
404 | " 1 | \n",
405 | " alaska | \n",
406 | " 711139 | \n",
407 | "
\n",
408 | " \n",
409 | " 2 | \n",
410 | " arizona | \n",
411 | " 6410979 | \n",
412 | "
\n",
413 | " \n",
414 | " 3 | \n",
415 | " arkansas | \n",
416 | " 2916372 | \n",
417 | "
\n",
418 | " \n",
419 | " 4 | \n",
420 | " california | \n",
421 | " 37325068 | \n",
422 | "
\n",
423 | " \n",
424 | "
\n",
425 | "
"
426 | ],
427 | "text/plain": [
428 | " region value\n",
429 | "0 alabama 4777326\n",
430 | "1 alaska 711139\n",
431 | "2 arizona 6410979\n",
432 | "3 arkansas 2916372\n",
433 | "4 california 37325068"
434 | ]
435 | },
436 | "execution_count": 6,
437 | "metadata": {},
438 | "output_type": "execute_result"
439 | }
440 | ],
441 | "source": [
442 | "population_pd.head()"
443 | ]
444 | },
445 | {
446 | "cell_type": "code",
447 | "execution_count": 7,
448 | "metadata": {
449 | "collapsed": false
450 | },
451 | "outputs": [
452 | {
453 | "data": {
454 | "text/plain": [
455 | "State object\n",
456 | "HighQ float64\n",
457 | "HighQN int64\n",
458 | "MedQ float64\n",
459 | "MedQN int64\n",
460 | "LowQ float64\n",
461 | "LowQN int64\n",
462 | "date datetime64[ns]\n",
463 | "dtype: object"
464 | ]
465 | },
466 | "execution_count": 7,
467 | "metadata": {},
468 | "output_type": "execute_result"
469 | }
470 | ],
471 | "source": [
472 | "prices_pd.dtypes"
473 | ]
474 | },
475 | {
476 | "cell_type": "markdown",
477 | "metadata": {},
478 | "source": [
479 | "#### Sort the data on state and date, then fill NA values"
480 | ]
481 | },
482 | {
483 | "cell_type": "code",
484 | "execution_count": null,
485 | "metadata": {
486 | "collapsed": false
487 | },
488 | "outputs": [],
489 | "source": [
490 | "prices_pd.sort_values(by=['State', 'date'], inplace=True)\n",
491 | "prices_pd.ffill(inplace=True)"
492 | ]
493 | },
494 | {
495 | "cell_type": "markdown",
496 | "metadata": {},
497 | "source": [
498 | "### Finding mean, median, mode, variance, standard deviation for California"
499 | ]
500 | },
501 | {
502 | "cell_type": "markdown",
503 | "metadata": {},
504 | "source": [
505 | "#### Mean\n",
506 | "\n",
507 | "The arithmetic average of a set of values, computed by dividing the sum of all values by the number of values."
508 | ]
509 | },
510 | {
511 | "cell_type": "code",
512 | "execution_count": 9,
513 | "metadata": {
514 | "collapsed": false
515 | },
516 | "outputs": [
517 | {
518 | "data": {
519 | "text/html": [
520 | "\n",
521 | "
\n",
522 | " \n",
523 | " \n",
524 | " | \n",
525 | " State | \n",
526 | " HighQ | \n",
527 | " HighQN | \n",
528 | " MedQ | \n",
529 | " MedQN | \n",
530 | " LowQ | \n",
531 | " LowQN | \n",
532 | " date | \n",
533 | "
\n",
534 | " \n",
535 | " \n",
536 | " \n",
537 | " 20098 | \n",
538 | " California | \n",
539 | " 248.77 | \n",
540 | " 12021 | \n",
541 | " 193.44 | \n",
542 | " 12724 | \n",
543 | " 193.88 | \n",
544 | " 770 | \n",
545 | " 2013-12-27 | \n",
546 | "
\n",
547 | " \n",
548 | " 20863 | \n",
549 | " California | \n",
550 | " 248.74 | \n",
551 | " 12025 | \n",
552 | " 193.44 | \n",
553 | " 12728 | \n",
554 | " 193.88 | \n",
555 | " 770 | \n",
556 | " 2013-12-28 | \n",
557 | "
\n",
558 | " \n",
559 | " 21577 | \n",
560 | " California | \n",
561 | " 248.76 | \n",
562 | " 12047 | \n",
563 | " 193.55 | \n",
564 | " 12760 | \n",
565 | " 193.60 | \n",
566 | " 772 | \n",
567 | " 2013-12-29 | \n",
568 | "
\n",
569 | " \n",
570 | " 22291 | \n",
571 | " California | \n",
572 | " 248.82 | \n",
573 | " 12065 | \n",
574 | " 193.54 | \n",
575 | " 12779 | \n",
576 | " 193.80 | \n",
577 | " 773 | \n",
578 | " 2013-12-30 | \n",
579 | "
\n",
580 | " \n",
581 | " 22801 | \n",
582 | " California | \n",
583 | " 248.76 | \n",
584 | " 12082 | \n",
585 | " 193.54 | \n",
586 | " 12792 | \n",
587 | " 193.80 | \n",
588 | " 773 | \n",
589 | " 2013-12-31 | \n",
590 | "
\n",
591 | " \n",
592 | "
\n",
593 | "
"
594 | ],
595 | "text/plain": [
596 | " State HighQ HighQN MedQ MedQN LowQ LowQN date\n",
597 | "20098 California 248.77 12021 193.44 12724 193.88 770 2013-12-27\n",
598 | "20863 California 248.74 12025 193.44 12728 193.88 770 2013-12-28\n",
599 | "21577 California 248.76 12047 193.55 12760 193.60 772 2013-12-29\n",
600 | "22291 California 248.82 12065 193.54 12779 193.80 773 2013-12-30\n",
601 | "22801 California 248.76 12082 193.54 12792 193.80 773 2013-12-31"
602 | ]
603 | },
604 | "execution_count": 9,
605 | "metadata": {},
606 | "output_type": "execute_result"
607 | }
608 | ],
609 | "source": [
610 | "california_pd = prices_pd[prices_pd.State == \"California\"].copy(True)\n",
611 | "california_pd.head()"
612 | ]
613 | },
614 | {
615 | "cell_type": "code",
616 | "execution_count": 10,
617 | "metadata": {
618 | "collapsed": false
619 | },
620 | "outputs": [],
621 | "source": [
622 | "ca_sum = california_pd['HighQ'].sum()"
623 | ]
624 | },
625 | {
626 | "cell_type": "code",
627 | "execution_count": 11,
628 | "metadata": {
629 | "collapsed": false
630 | },
631 | "outputs": [],
632 | "source": [
633 | "ca_count = california_pd['HighQ'].count()"
634 | ]
635 | },
636 | {
637 | "cell_type": "code",
638 | "execution_count": 12,
639 | "metadata": {
640 | "collapsed": false
641 | },
642 | "outputs": [
643 | {
644 | "name": "stdout",
645 | "output_type": "stream",
646 | "text": [
647 | "Mean weed price in CA is: 245.376124722\n"
648 | ]
649 | }
650 | ],
651 | "source": [
652 | "ca_mean = ca_sum / ca_count\n",
653 | "print \"Mean weed price in CA is:\", ca_mean"
654 | ]
655 | },
656 | {
657 | "cell_type": "markdown",
658 | "metadata": {},
659 | "source": [
660 | "#### Exercise: Find CA mean for 2013, 2014 & 2015 separately\n",
661 | "\n",
662 | "*Hint:* `california_pd.iloc[0]['date'].year`"
663 | ]
664 | },
665 | {
666 | "cell_type": "code",
667 | "execution_count": null,
668 | "metadata": {
669 | "collapsed": true
670 | },
671 | "outputs": [],
672 | "source": []
673 | },
674 | {
675 | "cell_type": "markdown",
676 | "metadata": {},
677 | "source": [
678 | "#### Median\n",
679 | "\n",
680 | "The value lying at the midpoint of the sorted observations, such that an observation is equally likely to fall above or below it. Simply put, it is the *middle* value in the sorted list of numbers."
681 | ]
682 | },
683 | {
684 | "cell_type": "code",
685 | "execution_count": 13,
686 | "metadata": {
687 | "collapsed": false
688 | },
689 | "outputs": [
690 | {
691 | "data": {
692 | "text/plain": [
693 | "449"
694 | ]
695 | },
696 | "execution_count": 13,
697 | "metadata": {},
698 | "output_type": "execute_result"
699 | }
700 | ],
701 | "source": [
702 | "ca_count"
703 | ]
704 | },
705 | {
706 | "cell_type": "markdown",
707 | "metadata": {},
708 | "source": [
709 | "If the count n is odd, the median is the value at position (n+1)/2 of the sorted data,\n",
710 | "\n",
711 | "else it is the average of the values at positions n/2 and n/2 + 1."
712 | ]
713 | },
714 | {
715 | "cell_type": "code",
716 | "execution_count": null,
717 | "metadata": {
718 | "collapsed": false
719 | },
720 | "outputs": [],
721 | "source": [
722 | "ca_highq_pd = california_pd.sort_values(by=['HighQ'])\n",
723 | "ca_highq_pd.head()"
724 | ]
725 | },
726 | {
727 | "cell_type": "code",
728 | "execution_count": 15,
729 | "metadata": {
730 | "collapsed": false
731 | },
732 | "outputs": [
733 | {
734 | "name": "stdout",
735 | "output_type": "stream",
736 | "text": [
737 | "Median price of weed in CA is: 245.31\n"
738 | ]
739 | }
740 | ],
741 | "source": [
742 | "ca_median = ca_highq_pd.HighQ.iloc[ca_count // 2]  # n is odd: 0-based index n // 2 is the middle value\n",
743 | "print \"Median price of weed in CA is:\", ca_median"
744 | ]
745 | },
746 | {
747 | "cell_type": "markdown",
748 | "metadata": {},
749 | "source": [
750 | "#### Mode\n",
751 | "\n",
752 | "The value that appears most often in a set of numbers. A data set can have more than one mode."
753 | ]
754 | },
755 | {
756 | "cell_type": "code",
757 | "execution_count": 16,
758 | "metadata": {
759 | "collapsed": false
760 | },
761 | "outputs": [
762 | {
763 | "name": "stdout",
764 | "output_type": "stream",
765 | "text": [
766 | "The most common price in CA, as indicated by its mode, is: 245.05\n"
767 | ]
768 | }
769 | ],
770 | "source": [
771 | "ca_mode = ca_highq_pd.HighQ.value_counts().index[0]\n",
772 | "print \"The most common price in CA, as indicated by its mode, is:\", ca_mode"
773 | ]
774 | },
775 | {
776 | "cell_type": "markdown",
777 | "metadata": {},
778 | "source": [
779 | "#### Variance\n",
780 | "\n",
781 | "> Once, two statisticians of height 4 feet and 5 feet had to cross a river of AVERAGE depth 3 feet. Meanwhile, a third person came along and said, \"What are you waiting for? You can easily cross the river.\"\n",
782 | "\n",
783 | "It is the average squared distance of the data values from the *mean*. The code below divides by n - 1 rather than n, which gives the sample variance.\n",
784 | "\n",
785 | "
"
786 | ]
787 | },
788 | {
789 | "cell_type": "code",
790 | "execution_count": 17,
791 | "metadata": {
792 | "collapsed": false
793 | },
794 | "outputs": [],
795 | "source": [
796 | "california_pd['HighQ_dev'] = (california_pd['HighQ'] - ca_mean) ** 2"
797 | ]
798 | },
799 | {
800 | "cell_type": "code",
801 | "execution_count": 18,
802 | "metadata": {
803 | "collapsed": false
804 | },
805 | "outputs": [
806 | {
807 | "name": "stdout",
808 | "output_type": "stream",
809 | "text": [
810 | "Variance of High Quality weed prices in CA is: 2.98268628798\n"
811 | ]
812 | }
813 | ],
814 | "source": [
815 | "ca_HighQ_variance = california_pd.HighQ_dev.sum() / (ca_count - 1)\n",
816 | "print \"Variance of High Quality weed prices in CA is:\", ca_HighQ_variance"
817 | ]
818 | },
819 | {
820 | "cell_type": "markdown",
821 | "metadata": {},
822 | "source": [
823 | "#### Standard Deviation\n",
824 | "\n",
825 | "It is the square root of the variance. It has the same units as the data and the mean."
826 | ]
827 | },
828 | {
829 | "cell_type": "code",
830 | "execution_count": 19,
831 | "metadata": {
832 | "collapsed": false
833 | },
834 | "outputs": [
835 | {
836 | "name": "stdout",
837 | "output_type": "stream",
838 | "text": [
839 | "Standard Deviation of High Quality weed prices in CA is: 1.72704553732\n"
840 | ]
841 | }
842 | ],
843 | "source": [
844 | "ca_HighQ_SD = np.sqrt(ca_HighQ_variance)\n",
845 | "print \"Standard Deviation of High Quality weed prices in CA is:\", ca_HighQ_SD"
846 | ]
847 | },
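The variance and standard deviation steps above can be condensed into one small function. This is a sketch only, assuming Python 3 and the standard library; `sample_variance` is a hypothetical helper name, and the five prices are the California `HighQ` values from the earlier `head()` output.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / float(n)
    return sum((v - mean) ** 2 for v in values) / (n - 1)


prices = [248.77, 248.74, 248.76, 248.82, 248.76]

var = sample_variance(prices)
sd = math.sqrt(var)  # same units as the prices themselves

# Cross-check against the standard library, which also divides by n - 1.
print(abs(var - statistics.variance(prices)) < 1e-9)
print(abs(sd - statistics.stdev(prices)) < 1e-9)
```

Dividing by n - 1 matches the default of pandas `std()` and `var()`, which is why the manual result agrees with `describe()`.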
848 | {
849 | "cell_type": "markdown",
850 | "metadata": {},
851 | "source": [
852 | "#### Using Pandas built-in function"
853 | ]
854 | },
855 | {
856 | "cell_type": "code",
857 | "execution_count": 20,
858 | "metadata": {
859 | "collapsed": false
860 | },
861 | "outputs": [
862 | {
863 | "data": {
864 | "text/html": [
865 | "\n",
866 | "
\n",
867 | " \n",
868 | " \n",
869 | " | \n",
870 | " HighQ | \n",
871 | " HighQN | \n",
872 | " MedQ | \n",
873 | " MedQN | \n",
874 | " LowQ | \n",
875 | " LowQN | \n",
876 | " HighQ_dev | \n",
877 | "
\n",
878 | " \n",
879 | " \n",
880 | " \n",
881 | " count | \n",
882 | " 449.000000 | \n",
883 | " 449.000000 | \n",
884 | " 449.000000 | \n",
885 | " 449.000000 | \n",
886 | " 449.000000 | \n",
887 | " 449.000000 | \n",
888 | " 449.000000 | \n",
889 | "
\n",
890 | " \n",
891 | " mean | \n",
892 | " 245.376125 | \n",
893 | " 14947.073497 | \n",
894 | " 191.268909 | \n",
895 | " 16769.821826 | \n",
896 | " 189.783586 | \n",
897 | " 976.298441 | \n",
898 | " 2.976043 | \n",
899 | "
\n",
900 | " \n",
901 | " std | \n",
902 | " 1.727046 | \n",
903 | " 1656.133565 | \n",
904 | " 1.524028 | \n",
905 | " 2433.943191 | \n",
906 | " 1.598252 | \n",
907 | " 120.246714 | \n",
908 | " 3.961134 | \n",
909 | "
\n",
910 | " \n",
911 | " min | \n",
912 | " 241.840000 | \n",
913 | " 12021.000000 | \n",
914 | " 187.850000 | \n",
915 | " 12724.000000 | \n",
916 | " 187.830000 | \n",
917 | " 770.000000 | \n",
918 | " 0.000015 | \n",
919 | "
\n",
920 | " \n",
921 | " 25% | \n",
922 | " 244.480000 | \n",
923 | " 13610.000000 | \n",
924 | " 190.260000 | \n",
925 | " 14826.000000 | \n",
926 | " 188.600000 | \n",
927 | " 878.000000 | \n",
928 | " 0.106357 | \n",
929 | "
\n",
930 | " \n",
931 | " 50% | \n",
932 | " 245.310000 | \n",
933 | " 15037.000000 | \n",
934 | " 191.570000 | \n",
935 | " 16793.000000 | \n",
936 | " 188.600000 | \n",
937 | " 982.000000 | \n",
938 | " 0.729103 | \n",
939 | "
\n",
940 | " \n",
941 | " 75% | \n",
942 | " 246.220000 | \n",
943 | " 16090.000000 | \n",
944 | " 192.550000 | \n",
945 | " 18435.000000 | \n",
946 | " 191.320000 | \n",
947 | " 1060.000000 | \n",
948 | " 4.435761 | \n",
949 | "
\n",
950 | " \n",
951 | " max | \n",
952 | " 248.820000 | \n",
953 | " 18492.000000 | \n",
954 | " 193.630000 | \n",
955 | " 22027.000000 | \n",
956 | " 193.880000 | \n",
957 | " 1232.000000 | \n",
958 | " 12.504178 | \n",
959 | "
\n",
960 | " \n",
961 | "
\n",
962 | "
"
963 | ],
964 | "text/plain": [
965 | " HighQ HighQN MedQ MedQN LowQ \\\n",
966 | "count 449.000000 449.000000 449.000000 449.000000 449.000000 \n",
967 | "mean 245.376125 14947.073497 191.268909 16769.821826 189.783586 \n",
968 | "std 1.727046 1656.133565 1.524028 2433.943191 1.598252 \n",
969 | "min 241.840000 12021.000000 187.850000 12724.000000 187.830000 \n",
970 | "25% 244.480000 13610.000000 190.260000 14826.000000 188.600000 \n",
971 | "50% 245.310000 15037.000000 191.570000 16793.000000 188.600000 \n",
972 | "75% 246.220000 16090.000000 192.550000 18435.000000 191.320000 \n",
973 | "max 248.820000 18492.000000 193.630000 22027.000000 193.880000 \n",
974 | "\n",
975 | " LowQN HighQ_dev \n",
976 | "count 449.000000 449.000000 \n",
977 | "mean 976.298441 2.976043 \n",
978 | "std 120.246714 3.961134 \n",
979 | "min 770.000000 0.000015 \n",
980 | "25% 878.000000 0.106357 \n",
981 | "50% 982.000000 0.729103 \n",
982 | "75% 1060.000000 4.435761 \n",
983 | "max 1232.000000 12.504178 "
984 | ]
985 | },
986 | "execution_count": 20,
987 | "metadata": {},
988 | "output_type": "execute_result"
989 | }
990 | ],
991 | "source": [
992 | "california_pd.describe()"
993 | ]
994 | },
995 | {
996 | "cell_type": "code",
997 | "execution_count": 21,
998 | "metadata": {
999 | "collapsed": false
1000 | },
1001 | "outputs": [
1002 | {
1003 | "data": {
1004 | "text/plain": [
1005 | "0 245.03\n",
1006 | "1 245.05\n",
1007 | "dtype: float64"
1008 | ]
1009 | },
1010 | "execution_count": 21,
1011 | "metadata": {},
1012 | "output_type": "execute_result"
1013 | }
1014 | ],
1015 | "source": [
1016 | "california_pd.HighQ.mode()"
1017 | ]
1018 | },
1019 | {
1020 | "cell_type": "markdown",
1021 | "metadata": {},
1022 | "source": [
1023 | "#### Co-variance \n",
1024 | "\n",
1025 | "Covariance is a measure of how much two variables, say x and y, change together. Its sign shows the direction of their relationship (positive when they tend to move in the same direction, negative when they move in opposite directions), while its magnitude depends on how far each variable is spread out. Compare this to variance, which measures the spread of a single variable around its mean.\n",
1026 | "\n",
1027 | "
\n",
1028 | "\n",
1029 | "
\n",
1030 | "
\n",
1031 | "
\n",
1032 | "
\n",
1033 | "\n",
1034 | "#### Co-variance of weed price in California vs New York"
1035 | ]
1036 | },
1037 | {
1038 | "cell_type": "code",
1039 | "execution_count": 22,
1040 | "metadata": {
1041 | "collapsed": false
1042 | },
1043 | "outputs": [
1044 | {
1045 | "data": {
1046 | "text/html": [
1047 | "\n",
1048 | "
\n",
1049 | " \n",
1050 | " \n",
1051 | " | \n",
1052 | " State | \n",
1053 | " HighQ | \n",
1054 | " HighQN | \n",
1055 | " MedQ | \n",
1056 | " MedQN | \n",
1057 | " LowQ | \n",
1058 | " LowQN | \n",
1059 | " date | \n",
1060 | "
\n",
1061 | " \n",
1062 | " \n",
1063 | " \n",
1064 | " 20120 | \n",
1065 | " New York | \n",
1066 | " 351.98 | \n",
1067 | " 5773 | \n",
1068 | " 268.83 | \n",
1069 | " 5786 | \n",
1070 | " 190.31 | \n",
1071 | " 479 | \n",
1072 | " 2013-12-27 | \n",
1073 | "
\n",
1074 | " \n",
1075 | " 20885 | \n",
1076 | " New York | \n",
1077 | " 351.92 | \n",
1078 | " 5775 | \n",
1079 | " 268.83 | \n",
1080 | " 5786 | \n",
1081 | " 190.31 | \n",
1082 | " 479 | \n",
1083 | " 2013-12-28 | \n",
1084 | "
\n",
1085 | " \n",
1086 | " 21599 | \n",
1087 | " New York | \n",
1088 | " 351.99 | \n",
1089 | " 5785 | \n",
1090 | " 269.02 | \n",
1091 | " 5806 | \n",
1092 | " 190.75 | \n",
1093 | " 480 | \n",
1094 | " 2013-12-29 | \n",
1095 | "
\n",
1096 | " \n",
1097 | " 22313 | \n",
1098 | " New York | \n",
1099 | " 352.02 | \n",
1100 | " 5791 | \n",
1101 | " 268.98 | \n",
1102 | " 5814 | \n",
1103 | " 190.75 | \n",
1104 | " 480 | \n",
1105 | " 2013-12-30 | \n",
1106 | "
\n",
1107 | " \n",
1108 | " 22823 | \n",
1109 | " New York | \n",
1110 | " 351.97 | \n",
1111 | " 5794 | \n",
1112 | " 268.93 | \n",
1113 | " 5818 | \n",
1114 | " 190.75 | \n",
1115 | " 480 | \n",
1116 | " 2013-12-31 | \n",
1117 | "
\n",
1118 | " \n",
1119 | "
\n",
1120 | "
"
1121 | ],
1122 | "text/plain": [
1123 | " State HighQ HighQN MedQ MedQN LowQ LowQN date\n",
1124 | "20120 New York 351.98 5773 268.83 5786 190.31 479 2013-12-27\n",
1125 | "20885 New York 351.92 5775 268.83 5786 190.31 479 2013-12-28\n",
1126 | "21599 New York 351.99 5785 269.02 5806 190.75 480 2013-12-29\n",
1127 | "22313 New York 352.02 5791 268.98 5814 190.75 480 2013-12-30\n",
1128 | "22823 New York 351.97 5794 268.93 5818 190.75 480 2013-12-31"
1129 | ]
1130 | },
1131 | "execution_count": 22,
1132 | "metadata": {},
1133 | "output_type": "execute_result"
1134 | }
1135 | ],
1136 | "source": [
1137 | "ny_pd = prices_pd[prices_pd['State'] == 'New York'].copy(True)\n",
1138 | "ny_pd.head()"
1139 | ]
1140 | },
1141 | {
1142 | "cell_type": "code",
1143 | "execution_count": 23,
1144 | "metadata": {
1145 | "collapsed": false
1146 | },
1147 | "outputs": [],
1148 | "source": [
1149 | "ny_pd = ny_pd.iloc[:, [1, 7]]\n",
1150 | "ny_pd.columns = ['NY_HighQ', 'date']"
1151 | ]
1152 | },
1153 | {
1154 | "cell_type": "code",
1155 | "execution_count": 24,
1156 | "metadata": {
1157 | "collapsed": false
1158 | },
1159 | "outputs": [
1160 | {
1161 | "data": {
1162 | "text/html": [
1163 | "\n",
1164 | "
\n",
1165 | " \n",
1166 | " \n",
1167 | " | \n",
1168 | " NY_HighQ | \n",
1169 | " date | \n",
1170 | "
\n",
1171 | " \n",
1172 | " \n",
1173 | " \n",
1174 | " 20120 | \n",
1175 | " 351.98 | \n",
1176 | " 2013-12-27 | \n",
1177 | "
\n",
1178 | " \n",
1179 | " 20885 | \n",
1180 | " 351.92 | \n",
1181 | " 2013-12-28 | \n",
1182 | "
\n",
1183 | " \n",
1184 | " 21599 | \n",
1185 | " 351.99 | \n",
1186 | " 2013-12-29 | \n",
1187 | "
\n",
1188 | " \n",
1189 | " 22313 | \n",
1190 | " 352.02 | \n",
1191 | " 2013-12-30 | \n",
1192 | "
\n",
1193 | " \n",
1194 | " 22823 | \n",
1195 | " 351.97 | \n",
1196 | " 2013-12-31 | \n",
1197 | "
\n",
1198 | " \n",
1199 | "
\n",
1200 | "
"
1201 | ],
1202 | "text/plain": [
1203 | " NY_HighQ date\n",
1204 | "20120 351.98 2013-12-27\n",
1205 | "20885 351.92 2013-12-28\n",
1206 | "21599 351.99 2013-12-29\n",
1207 | "22313 352.02 2013-12-30\n",
1208 | "22823 351.97 2013-12-31"
1209 | ]
1210 | },
1211 | "execution_count": 24,
1212 | "metadata": {},
1213 | "output_type": "execute_result"
1214 | }
1215 | ],
1216 | "source": [
1217 | "ny_pd.head()"
1218 | ]
1219 | },
1220 | {
1221 | "cell_type": "code",
1222 | "execution_count": 25,
1223 | "metadata": {
1224 | "collapsed": false
1225 | },
1226 | "outputs": [
1227 | {
1228 | "data": {
1229 | "text/html": [
1230 | "\n",
1231 | "
\n",
1232 | " \n",
1233 | " \n",
1234 | " | \n",
1235 | " CA_HighQ | \n",
1236 | " date | \n",
1237 | " NY_HighQ | \n",
1238 | "
\n",
1239 | " \n",
1240 | " \n",
1241 | " \n",
1242 | " 0 | \n",
1243 | " 248.77 | \n",
1244 | " 2013-12-27 | \n",
1245 | " 351.98 | \n",
1246 | "
\n",
1247 | " \n",
1248 | " 1 | \n",
1249 | " 248.74 | \n",
1250 | " 2013-12-28 | \n",
1251 | " 351.92 | \n",
1252 | "
\n",
1253 | " \n",
1254 | " 2 | \n",
1255 | " 248.76 | \n",
1256 | " 2013-12-29 | \n",
1257 | " 351.99 | \n",
1258 | "
\n",
1259 | " \n",
1260 | " 3 | \n",
1261 | " 248.82 | \n",
1262 | " 2013-12-30 | \n",
1263 | " 352.02 | \n",
1264 | "
\n",
1265 | " \n",
1266 | " 4 | \n",
1267 | " 248.76 | \n",
1268 | " 2013-12-31 | \n",
1269 | " 351.97 | \n",
1270 | "
\n",
1271 | " \n",
1272 | "
\n",
1273 | "
"
1274 | ],
1275 | "text/plain": [
1276 | " CA_HighQ date NY_HighQ\n",
1277 | "0 248.77 2013-12-27 351.98\n",
1278 | "1 248.74 2013-12-28 351.92\n",
1279 | "2 248.76 2013-12-29 351.99\n",
1280 | "3 248.82 2013-12-30 352.02\n",
1281 | "4 248.76 2013-12-31 351.97"
1282 | ]
1283 | },
1284 | "execution_count": 25,
1285 | "metadata": {},
1286 | "output_type": "execute_result"
1287 | }
1288 | ],
1289 | "source": [
1290 | "ca_ny_pd = pd.merge(california_pd.iloc[:, [1, 7]].copy(), ny_pd, on=\"date\")\n",
1291 | "ca_ny_pd.rename(columns={\"HighQ\": \"CA_HighQ\"}, inplace=True)\n",
1292 | "ca_ny_pd.head()"
1293 | ]
1294 | },
1295 | {
1296 | "cell_type": "code",
1297 | "execution_count": 26,
1298 | "metadata": {
1299 | "collapsed": false
1300 | },
1301 | "outputs": [
1302 | {
1303 | "data": {
1304 | "text/plain": [
1305 | "346.9127616926502"
1306 | ]
1307 | },
1308 | "execution_count": 26,
1309 | "metadata": {},
1310 | "output_type": "execute_result"
1311 | }
1312 | ],
1313 | "source": [
1314 | "ny_mean = ca_ny_pd.NY_HighQ.mean()\n",
1315 | "ny_mean"
1316 | ]
1317 | },
1318 | {
1319 | "cell_type": "code",
1320 | "execution_count": 27,
1321 | "metadata": {
1322 | "collapsed": false
1323 | },
1324 | "outputs": [
1325 | {
1326 | "data": {
1327 | "text/html": [
1328 | "\n",
1329 | "
\n",
1330 | " \n",
1331 | " \n",
1332 | " | \n",
1333 | " CA_HighQ | \n",
1334 | " date | \n",
1335 | " NY_HighQ | \n",
1336 | " ca_dev | \n",
1337 | "
\n",
1338 | " \n",
1339 | " \n",
1340 | " \n",
1341 | " 0 | \n",
1342 | " 248.77 | \n",
1343 | " 2013-12-27 | \n",
1344 | " 351.98 | \n",
1345 | " 3.393875 | \n",
1346 | "
\n",
1347 | " \n",
1348 | " 1 | \n",
1349 | " 248.74 | \n",
1350 | " 2013-12-28 | \n",
1351 | " 351.92 | \n",
1352 | " 3.363875 | \n",
1353 | "
\n",
1354 | " \n",
1355 | " 2 | \n",
1356 | " 248.76 | \n",
1357 | " 2013-12-29 | \n",
1358 | " 351.99 | \n",
1359 | " 3.383875 | \n",
1360 | "
\n",
1361 | " \n",
1362 | " 3 | \n",
1363 | " 248.82 | \n",
1364 | " 2013-12-30 | \n",
1365 | " 352.02 | \n",
1366 | " 3.443875 | \n",
1367 | "
\n",
1368 | " \n",
1369 | " 4 | \n",
1370 | " 248.76 | \n",
1371 | " 2013-12-31 | \n",
1372 | " 351.97 | \n",
1373 | " 3.383875 | \n",
1374 | "
\n",
1375 | " \n",
1376 | "
\n",
1377 | "
"
1378 | ],
1379 | "text/plain": [
1380 | " CA_HighQ date NY_HighQ ca_dev\n",
1381 | "0 248.77 2013-12-27 351.98 3.393875\n",
1382 | "1 248.74 2013-12-28 351.92 3.363875\n",
1383 | "2 248.76 2013-12-29 351.99 3.383875\n",
1384 | "3 248.82 2013-12-30 352.02 3.443875\n",
1385 | "4 248.76 2013-12-31 351.97 3.383875"
1386 | ]
1387 | },
1388 | "execution_count": 27,
1389 | "metadata": {},
1390 | "output_type": "execute_result"
1391 | }
1392 | ],
1393 | "source": [
1394 | "ca_ny_pd['ca_dev'] = ca_ny_pd['CA_HighQ'] - ca_mean\n",
1395 | "ca_ny_pd.head()"
1396 | ]
1397 | },
1398 | {
1399 | "cell_type": "code",
1400 | "execution_count": 28,
1401 | "metadata": {
1402 | "collapsed": false
1403 | },
1404 | "outputs": [
1405 | {
1406 | "data": {
1407 | "text/html": [
1408 | "\n",
1409 | "
\n",
1410 | " \n",
1411 | " \n",
1412 | " | \n",
1413 | " CA_HighQ | \n",
1414 | " date | \n",
1415 | " NY_HighQ | \n",
1416 | " ca_dev | \n",
1417 | " ny_dev | \n",
1418 | "
\n",
1419 | " \n",
1420 | " \n",
1421 | " \n",
1422 | " 0 | \n",
1423 | " 248.77 | \n",
1424 | " 2013-12-27 | \n",
1425 | " 351.98 | \n",
1426 | " 3.393875 | \n",
1427 | " 5.067238 | \n",
1428 | "
\n",
1429 | " \n",
1430 | " 1 | \n",
1431 | " 248.74 | \n",
1432 | " 2013-12-28 | \n",
1433 | " 351.92 | \n",
1434 | " 3.363875 | \n",
1435 | " 5.007238 | \n",
1436 | "
\n",
1437 | " \n",
1438 | " 2 | \n",
1439 | " 248.76 | \n",
1440 | " 2013-12-29 | \n",
1441 | " 351.99 | \n",
1442 | " 3.383875 | \n",
1443 | " 5.077238 | \n",
1444 | "
\n",
1445 | " \n",
1446 | " 3 | \n",
1447 | " 248.82 | \n",
1448 | " 2013-12-30 | \n",
1449 | " 352.02 | \n",
1450 | " 3.443875 | \n",
1451 | " 5.107238 | \n",
1452 | "
\n",
1453 | " \n",
1454 | " 4 | \n",
1455 | " 248.76 | \n",
1456 | " 2013-12-31 | \n",
1457 | " 351.97 | \n",
1458 | " 3.383875 | \n",
1459 | " 5.057238 | \n",
1460 | "
\n",
1461 | " \n",
1462 | "
\n",
1463 | "
"
1464 | ],
1465 | "text/plain": [
1466 | " CA_HighQ date NY_HighQ ca_dev ny_dev\n",
1467 | "0 248.77 2013-12-27 351.98 3.393875 5.067238\n",
1468 | "1 248.74 2013-12-28 351.92 3.363875 5.007238\n",
1469 | "2 248.76 2013-12-29 351.99 3.383875 5.077238\n",
1470 | "3 248.82 2013-12-30 352.02 3.443875 5.107238\n",
1471 | "4 248.76 2013-12-31 351.97 3.383875 5.057238"
1472 | ]
1473 | },
1474 | "execution_count": 28,
1475 | "metadata": {},
1476 | "output_type": "execute_result"
1477 | }
1478 | ],
1479 | "source": [
1480 | "ca_ny_pd['ny_dev'] = ca_ny_pd['NY_HighQ'] - ny_mean\n",
1481 | "ca_ny_pd.head()"
1482 | ]
1483 | },
1484 | {
1485 | "cell_type": "code",
1486 | "execution_count": 29,
1487 | "metadata": {
1488 | "collapsed": false
1489 | },
1490 | "outputs": [
1491 | {
1492 | "name": "stdout",
1493 | "output_type": "stream",
1494 | "text": [
1495 | "Covariance of the High Quality weed prices in CA and NY is: 5.91681496729\n"
1496 | ]
1497 | }
1498 | ],
1499 | "source": [
1500 | "ca_ny_cov = (ca_ny_pd['ca_dev'] * ca_ny_pd['ny_dev']).sum() / (ca_count - 1)\n",
1501 | "print \"Covariance of the High Quality weed prices in CA and NY is:\", ca_ny_cov"
1502 | ]
1503 | },
1504 | {
1505 | "cell_type": "markdown",
1506 | "metadata": {},
1507 | "source": [
1508 | "#### Using Pandas built-in function"
1509 | ]
1510 | },
1511 | {
1512 | "cell_type": "code",
1513 | "execution_count": 30,
1514 | "metadata": {
1515 | "collapsed": false
1516 | },
1517 | "outputs": [
1518 | {
1519 | "data": {
1520 | "text/html": [
1521 | "\n",
1522 | "
\n",
1523 | " \n",
1524 | " \n",
1525 | " | \n",
1526 | " CA_HighQ | \n",
1527 | " NY_HighQ | \n",
1528 | " ca_dev | \n",
1529 | " ny_dev | \n",
1530 | "
\n",
1531 | " \n",
1532 | " \n",
1533 | " \n",
1534 | " CA_HighQ | \n",
1535 | " 2.982686 | \n",
1536 | " 5.916815 | \n",
1537 | " 2.982686 | \n",
1538 | " 5.916815 | \n",
1539 | "
\n",
1540 | " \n",
1541 | " NY_HighQ | \n",
1542 | " 5.916815 | \n",
1543 | " 12.245147 | \n",
1544 | " 5.916815 | \n",
1545 | " 12.245147 | \n",
1546 | "
\n",
1547 | " \n",
1548 | " ca_dev | \n",
1549 | " 2.982686 | \n",
1550 | " 5.916815 | \n",
1551 | " 2.982686 | \n",
1552 | " 5.916815 | \n",
1553 | "
\n",
1554 | " \n",
1555 | " ny_dev | \n",
1556 | " 5.916815 | \n",
1557 | " 12.245147 | \n",
1558 | " 5.916815 | \n",
1559 | " 12.245147 | \n",
1560 | "
\n",
1561 | " \n",
1562 | "
\n",
1563 | "
"
1564 | ],
1565 | "text/plain": [
1566 | " CA_HighQ NY_HighQ ca_dev ny_dev\n",
1567 | "CA_HighQ 2.982686 5.916815 2.982686 5.916815\n",
1568 | "NY_HighQ 5.916815 12.245147 5.916815 12.245147\n",
1569 | "ca_dev 2.982686 5.916815 2.982686 5.916815\n",
1570 | "ny_dev 5.916815 12.245147 5.916815 12.245147"
1571 | ]
1572 | },
1573 | "execution_count": 30,
1574 | "metadata": {},
1575 | "output_type": "execute_result"
1576 | }
1577 | ],
1578 | "source": [
1579 | "ca_ny_pd.cov()"
1580 | ]
1581 | },
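The manual covariance calculation can be written as a single function. The sketch below assumes Python 3 and plain lists instead of DataFrame columns; `sample_covariance` is a hypothetical helper, and the inputs are the first five CA and NY `HighQ` prices from the merged table.

```python
def sample_covariance(xs, ys):
    """Average product of paired deviations, using n - 1 in the denominator."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    products = ((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


ca = [248.77, 248.74, 248.76, 248.82, 248.76]
ny = [351.98, 351.92, 351.99, 352.02, 351.97]

cov_ca_ny = sample_covariance(ca, ny)
var_ca = sample_covariance(ca, ca)  # covariance of a variable with itself is its variance

print(cov_ca_ny > 0)  # the two series tend to move in the same direction
```

This is also why the diagonal of the `cov()` matrix holds the variances: each diagonal entry is the covariance of a column with itself.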
1582 | {
1583 | "cell_type": "markdown",
1584 | "metadata": {},
1585 | "source": [
1586 | "### Correlation\n",
1587 | "\n",
1588 | "The extent to which two variables fluctuate together, on a scale from -1 to +1. A positive correlation indicates the extent to which the variables increase or decrease in parallel; a negative correlation indicates the extent to which one variable increases as the other decreases.\n",
1589 | "\n",
1590 | "
\n",
1591 | "\n",
1592 | "
\n",
1593 | "
\n",
1594 | "
\n",
1595 | "\n",
1596 | "#### Finding correlation between weed prices in New York and California"
1597 | ]
1598 | },
1599 | {
1600 | "cell_type": "code",
1601 | "execution_count": 31,
1602 | "metadata": {
1603 | "collapsed": false
1604 | },
1605 | "outputs": [
1606 | {
1607 | "name": "stdout",
1608 | "output_type": "stream",
1609 | "text": [
1610 | "Correlation between weed prices in NY and CA: 0.979043961106\n"
1611 | ]
1612 | }
1613 | ],
1614 | "source": [
1615 | "ca_highq_std = ca_ny_pd.CA_HighQ.std()\n",
1616 | "ny_highq_std = ca_ny_pd.NY_HighQ.std()\n",
1617 | "\n",
1618 | "ca_ny_corr = ca_ny_cov / (ca_highq_std * ny_highq_std)\n",
1619 | "print \"Correlation between weed prices in NY and CA:\", ca_ny_corr"
1620 | ]
1621 | },
1622 | {
1623 | "cell_type": "code",
1624 | "execution_count": 32,
1625 | "metadata": {
1626 | "collapsed": false
1627 | },
1628 | "outputs": [
1629 | {
1630 | "data": {
1631 | "text/html": [
1632 | "\n",
1633 | "
\n",
1634 | " \n",
1635 | " \n",
1636 | " | \n",
1637 | " CA_HighQ | \n",
1638 | " NY_HighQ | \n",
1639 | " ca_dev | \n",
1640 | " ny_dev | \n",
1641 | "
\n",
1642 | " \n",
1643 | " \n",
1644 | " \n",
1645 | " CA_HighQ | \n",
1646 | " 1.000000 | \n",
1647 | " 0.979044 | \n",
1648 | " 1.000000 | \n",
1649 | " 0.979044 | \n",
1650 | "
\n",
1651 | " \n",
1652 | " NY_HighQ | \n",
1653 | " 0.979044 | \n",
1654 | " 1.000000 | \n",
1655 | " 0.979044 | \n",
1656 | " 1.000000 | \n",
1657 | "
\n",
1658 | " \n",
1659 | " ca_dev | \n",
1660 | " 1.000000 | \n",
1661 | " 0.979044 | \n",
1662 | " 1.000000 | \n",
1663 | " 0.979044 | \n",
1664 | "
\n",
1665 | " \n",
1666 | " ny_dev | \n",
1667 | " 0.979044 | \n",
1668 | " 1.000000 | \n",
1669 | " 0.979044 | \n",
1670 | " 1.000000 | \n",
1671 | "
\n",
1672 | " \n",
1673 | "
\n",
1674 | "
"
1675 | ],
1676 | "text/plain": [
1677 | " CA_HighQ NY_HighQ ca_dev ny_dev\n",
1678 | "CA_HighQ 1.000000 0.979044 1.000000 0.979044\n",
1679 | "NY_HighQ 0.979044 1.000000 0.979044 1.000000\n",
1680 | "ca_dev 1.000000 0.979044 1.000000 0.979044\n",
1681 | "ny_dev 0.979044 1.000000 0.979044 1.000000"
1682 | ]
1683 | },
1684 | "execution_count": 32,
1685 | "metadata": {},
1686 | "output_type": "execute_result"
1687 | }
1688 | ],
1689 | "source": [
1690 | "ca_ny_pd.corr()"
1691 | ]
1692 | },
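One property worth checking by hand is that correlation, unlike covariance, does not depend on the units of either variable. The following sketch assumes Python 3 and the standard library; `pearson_correlation` is a hypothetical helper, and converting the NY prices to cents is an illustrative rescaling that does not appear in the notebook.

```python
import math


def pearson_correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long inputs with at least two values")
    n = len(xs)
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # The 1 / (n - 1) factors in covariance and variance cancel out here.
    return sxy / math.sqrt(sxx * syy)


ca = [248.77, 248.74, 248.76, 248.82, 248.76]
ny = [351.98, 351.92, 351.99, 352.02, 351.97]
ny_in_cents = [100 * price for price in ny]

r = pearson_correlation(ca, ny)
r_scaled = pearson_correlation(ca, ny_in_cents)

print(abs(r - r_scaled) < 1e-9)  # rescaling one variable leaves r unchanged
```

Rescaling the NY prices would multiply the covariance by 100, which is why correlation is the easier number to compare across pairs of variables.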
1693 | {
1694 | "cell_type": "markdown",
1695 | "metadata": {},
1696 | "source": [
1697 | "# Correlation != Causation\n",
1698 | "\n",
1699 | "Correlation between two variables does not necessarily imply that one causes the other.\n",
1700 | "\n",
1701 | "\n",
1702 | "
"
1703 | ]
1704 | }
1705 | ],
1706 | "metadata": {
1707 | "kernelspec": {
1708 | "display_name": "Python 2",
1709 | "language": "python",
1710 | "name": "python2"
1711 | },
1712 | "language_info": {
1713 | "codemirror_mode": {
1714 | "name": "ipython",
1715 | "version": 2
1716 | },
1717 | "file_extension": ".py",
1718 | "mimetype": "text/x-python",
1719 | "name": "python",
1720 | "nbconvert_exporter": "python",
1721 | "pygments_lexer": "ipython2",
1722 | "version": "2.7.10"
1723 | }
1724 | },
1725 | "nbformat": 4,
1726 | "nbformat_minor": 0
1727 | }
1728 |
--------------------------------------------------------------------------------
/notebooks/6. Hypothesis Testing.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Hypothesis Testing\n",
8 | "\n",
9 | "\n",
10 | "We would like to know if the effects we see in the sample (observed data) are likely to occur in the population. \n",
11 | "\n",
12 | "The way classical hypothesis testing works is by conducting a statistical test to answer the following question:\n",
13 | "> Given the sample and an effect, what is the probability of seeing that effect just by chance?\n",
14 | "\n",
15 | "Here are the steps we follow:\n",
16 | "\n",
17 | "1. Compute test statistic\n",
18 | "2. Define null hypothesis\n",
19 | "3. Compute p-value\n",
20 | "4. Interpret the result\n",
21 | "\n",
22 | "If the p-value is very low (more often than not, below 0.05), the effect is considered statistically significant. That means the effect is unlikely to have occurred by chance. The inference? The effect is likely to be seen in the population too. \n",
23 | "\n",
24 | "This process is very similar to the *proof by contradiction* paradigm. We first assume that the effect is not real. That's the null hypothesis. The next step is to compute the probability of seeing an effect at least as large as the observed one under that assumption (the p-value). If the p-value is very low (<0.05 as a rule of thumb), we reject the null hypothesis. "
25 | ]
26 | },
27 | {
28 | "cell_type": "code",
29 | "execution_count": 1,
30 | "metadata": {
31 | "collapsed": true
32 | },
33 | "outputs": [],
34 | "source": [
35 | "import numpy as np\n",
36 | "import pandas as pd\n",
37 | "from scipy import stats\n",
38 | "import matplotlib as mpl\n",
39 | "%matplotlib inline"
40 | ]
41 | },
42 | {
43 | "cell_type": "code",
44 | "execution_count": 2,
45 | "metadata": {
46 | "collapsed": false
47 | },
48 | "outputs": [],
49 | "source": [
50 | "import seaborn as sns\n",
51 | "sns.set(color_codes=True)"
52 | ]
53 | },
54 | {
55 | "cell_type": "code",
56 | "execution_count": 3,
57 | "metadata": {
58 | "collapsed": true
59 | },
60 | "outputs": [],
61 | "source": [
62 | "weed_pd = pd.read_csv(\"../data/Weed_Price.csv\", parse_dates=[-1])"
63 | ]
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": 4,
68 | "metadata": {
69 | "collapsed": false
70 | },
71 | "outputs": [],
72 | "source": [
73 | "weed_pd[\"month\"] = weed_pd.date.apply(lambda x: x.month)\n",
74 | "weed_pd[\"year\"] = weed_pd.date.apply(lambda x: x.year)"
75 | ]
76 | },
77 | {
78 | "cell_type": "code",
79 | "execution_count": 5,
80 | "metadata": {
81 | "collapsed": false
82 | },
83 | "outputs": [
84 | {
85 | "data": {
86 | "text/html": [
87 | "<div>\n",
88 | "<table>\n",
89 | "  <thead>\n",
90 | "    <tr>\n",
91 | "      <th></th>\n",
92 | "      <th>State</th>\n",
93 | "      <th>HighQ</th>\n",
94 | "      <th>HighQN</th>\n",
95 | "      <th>MedQ</th>\n",
96 | "      <th>MedQN</th>\n",
97 | "      <th>LowQ</th>\n",
98 | "      <th>LowQN</th>\n",
99 | "      <th>date</th>\n",
100 | "      <th>month</th>\n",
101 | "      <th>year</th>\n",
102 | "    </tr>\n",
103 | "  </thead>\n",
104 | "  <tbody>\n",
105 | "    <tr>\n",
106 | "      <th>0</th>\n",
107 | "      <td>Alabama</td>\n",
108 | "      <td>339.06</td>\n",
109 | "      <td>1042</td>\n",
110 | "      <td>198.64</td>\n",
111 | "      <td>933</td>\n",
112 | "      <td>149.49</td>\n",
113 | "      <td>123</td>\n",
114 | "      <td>2014-01-01</td>\n",
115 | "      <td>1</td>\n",
116 | "      <td>2014</td>\n",
117 | "    </tr>\n",
118 | "    <tr>\n",
119 | "      <th>1</th>\n",
120 | "      <td>Alaska</td>\n",
121 | "      <td>288.75</td>\n",
122 | "      <td>252</td>\n",
123 | "      <td>260.60</td>\n",
124 | "      <td>297</td>\n",
125 | "      <td>388.58</td>\n",
126 | "      <td>26</td>\n",
127 | "      <td>2014-01-01</td>\n",
128 | "      <td>1</td>\n",
129 | "      <td>2014</td>\n",
130 | "    </tr>\n",
131 | "    <tr>\n",
132 | "      <th>2</th>\n",
133 | "      <td>Arizona</td>\n",
134 | "      <td>303.31</td>\n",
135 | "      <td>1941</td>\n",
136 | "      <td>209.35</td>\n",
137 | "      <td>1625</td>\n",
138 | "      <td>189.45</td>\n",
139 | "      <td>222</td>\n",
140 | "      <td>2014-01-01</td>\n",
141 | "      <td>1</td>\n",
142 | "      <td>2014</td>\n",
143 | "    </tr>\n",
144 | "    <tr>\n",
145 | "      <th>3</th>\n",
146 | "      <td>Arkansas</td>\n",
147 | "      <td>361.85</td>\n",
148 | "      <td>576</td>\n",
149 | "      <td>185.62</td>\n",
150 | "      <td>544</td>\n",
151 | "      <td>125.87</td>\n",
152 | "      <td>112</td>\n",
153 | "      <td>2014-01-01</td>\n",
154 | "      <td>1</td>\n",
155 | "      <td>2014</td>\n",
156 | "    </tr>\n",
157 | "    <tr>\n",
158 | "      <th>4</th>\n",
159 | "      <td>California</td>\n",
160 | "      <td>248.78</td>\n",
161 | "      <td>12096</td>\n",
162 | "      <td>193.56</td>\n",
163 | "      <td>12812</td>\n",
164 | "      <td>192.92</td>\n",
165 | "      <td>778</td>\n",
166 | "      <td>2014-01-01</td>\n",
167 | "      <td>1</td>\n",
168 | "      <td>2014</td>\n",
169 | "    </tr>\n",
170 | "  </tbody>\n",
171 | "</table>\n",
172 | "</div>"
173 | ],
174 | "text/plain": [
175 | " State HighQ HighQN MedQ MedQN LowQ LowQN date month \\\n",
176 | "0 Alabama 339.06 1042 198.64 933 149.49 123 2014-01-01 1 \n",
177 | "1 Alaska 288.75 252 260.60 297 388.58 26 2014-01-01 1 \n",
178 | "2 Arizona 303.31 1941 209.35 1625 189.45 222 2014-01-01 1 \n",
179 | "3 Arkansas 361.85 576 185.62 544 125.87 112 2014-01-01 1 \n",
180 | "4 California 248.78 12096 193.56 12812 192.92 778 2014-01-01 1 \n",
181 | "\n",
182 | " year \n",
183 | "0 2014 \n",
184 | "1 2014 \n",
185 | "2 2014 \n",
186 | "3 2014 \n",
187 | "4 2014 "
188 | ]
189 | },
190 | "execution_count": 5,
191 | "metadata": {},
192 | "output_type": "execute_result"
193 | }
194 | ],
195 | "source": [
196 | "weed_pd.head()"
197 | ]
198 | },
199 | {
200 | "cell_type": "markdown",
201 | "metadata": {},
202 | "source": [
203 | "### Let's work on weed prices in California in 2014\n"
204 | ]
205 | },
206 | {
207 | "cell_type": "code",
208 | "execution_count": 6,
209 | "metadata": {
210 | "collapsed": true
211 | },
212 | "outputs": [],
213 | "source": [
214 | "weed_ca_2014 = weed_pd[(weed_pd.State==\"California\") & (weed_pd.year==2014)]"
215 | ]
216 | },
217 | {
218 | "cell_type": "code",
219 | "execution_count": 7,
220 | "metadata": {
221 | "collapsed": false
222 | },
223 | "outputs": [
224 | {
225 | "name": "stdout",
226 | "output_type": "stream",
227 | "text": [
228 | "Mean: 245.894230769\n",
229 | "Standard Deviation: 1.28990793937\n"
230 | ]
231 | }
232 | ],
233 | "source": [
234 | "#Mean and standard deviation of high quality weed's price\n",
235 | "print \"Mean:\", weed_ca_2014.HighQ.mean()\n",
236 | "print \"Standard Deviation:\", weed_ca_2014.HighQ.std()"
237 | ]
238 | },
239 | {
240 | "cell_type": "code",
241 | "execution_count": 8,
242 | "metadata": {
243 | "collapsed": false
244 | },
245 | "outputs": [
246 | {
247 | "data": {
248 | "text/plain": [
249 | "(245.761718492726, 246.02674304573577)"
250 | ]
251 | },
252 | "execution_count": 8,
253 | "metadata": {},
254 | "output_type": "execute_result"
255 | }
256 | ],
257 | "source": [
258 | "#Confidence interval on the mean\n",
259 | "stats.norm.interval(0.95, loc=weed_ca_2014.HighQ.mean(), scale = weed_ca_2014.HighQ.std()/np.sqrt(len(weed_ca_2014)))"
260 | ]
261 | },
262 | {
263 | "cell_type": "markdown",
264 | "metadata": {},
265 | "source": [
266 | "### Question: Are high-quality weed prices in Jan 2014 significantly higher than in Jan 2015?"
267 | ]
268 | },
269 | {
270 | "cell_type": "code",
271 | "execution_count": 9,
272 | "metadata": {
273 | "collapsed": false
274 | },
275 | "outputs": [],
276 | "source": [
277 | "#Get the data\n",
278 | "weed_ca_jan2014 = np.array(weed_pd[(weed_pd.State==\"California\") & (weed_pd.year==2014) & (weed_pd.month==1)].HighQ)\n",
279 | "weed_ca_jan2015 = np.array(weed_pd[(weed_pd.State==\"California\") & (weed_pd.year==2015) & (weed_pd.month==1)].HighQ)"
280 | ]
281 | },
282 | {
283 | "cell_type": "code",
284 | "execution_count": 10,
285 | "metadata": {
286 | "collapsed": false
287 | },
288 | "outputs": [
289 | {
290 | "name": "stdout",
291 | "output_type": "stream",
292 | "text": [
293 | "Mean-2014 Jan: 248.445483871\n",
294 | "Mean-2015 Jan: 243.602258065\n"
295 | ]
296 | }
297 | ],
298 | "source": [
299 | "print \"Mean-2014 Jan:\", weed_ca_jan2014.mean()\n",
300 | "print \"Mean-2015 Jan:\", weed_ca_jan2015.mean()"
301 | ]
302 | },
303 | {
304 | "cell_type": "code",
305 | "execution_count": 11,
306 | "metadata": {
307 | "collapsed": false
308 | },
309 | "outputs": [
310 | {
311 | "name": "stdout",
312 | "output_type": "stream",
313 | "text": [
314 | "Effect size: 4.84322580645\n"
315 | ]
316 | }
317 | ],
318 | "source": [
319 | "print \"Effect size:\", weed_ca_jan2014.mean() - weed_ca_jan2015.mean()"
320 | ]
321 | },
322 | {
323 | "cell_type": "markdown",
324 | "metadata": {},
325 | "source": [
326 | "**Null Hypothesis**: The mean prices in Jan 2014 and Jan 2015 are equal.\n",
327 | "\n",
328 | "Perform **t-test** and determine the p-value. "
329 | ]
330 | },
331 | {
332 | "cell_type": "code",
333 | "execution_count": 12,
334 | "metadata": {
335 | "collapsed": false
336 | },
337 | "outputs": [
338 | {
339 | "data": {
340 | "text/plain": [
341 | "Ttest_indResult(statistic=98.011325238158051, pvalue=6.2979718185084028e-68)"
342 | ]
343 | },
344 | "execution_count": 12,
345 | "metadata": {},
346 | "output_type": "execute_result"
347 | }
348 | ],
349 | "source": [
350 | "stats.ttest_ind(weed_ca_jan2014, weed_ca_jan2015, equal_var=True)"
351 | ]
352 | },
353 | {
354 | "cell_type": "markdown",
355 | "metadata": {},
356 | "source": [
357 | "The p-value is the probability of seeing an effect size at least this large by chance alone, assuming the null hypothesis is true. And here, the p-value is almost 0.\n",
358 | "\n",
359 | "*Conclusion*: The price difference is statistically significant. But is a price difference of $4.84 a big deal? The price decreased in 2015 by almost 2%. Always remember to look at the effect size. "
360 | ]
361 | },
362 | {
363 | "cell_type": "markdown",
364 | "metadata": {},
365 | "source": [
366 | "**Problem** Determine if prices of medium quality weed for Jan 2015 and Feb 2015 are significantly different for New York. "
367 | ]
368 | },
369 | {
370 | "cell_type": "code",
371 | "execution_count": null,
372 | "metadata": {
373 | "collapsed": true
374 | },
375 | "outputs": [],
376 | "source": []
377 | },
378 | {
379 | "cell_type": "markdown",
380 | "metadata": {},
381 | "source": [
382 | "### Assumption of t-test\n",
383 | "\n",
384 | "One assumption is that the data came from a normal distribution. \n",
385 | "\n",
386 | "The [Shapiro-Wilk test](https://en.wikipedia.org/wiki/Shapiro-Wilk) checks for normality. Its null hypothesis is that the data are normally distributed, so a p-value below 0.05 is evidence against normality, while a larger p-value means we cannot rule normality out."
387 | ]
388 | },
389 | {
390 | "cell_type": "code",
391 | "execution_count": 13,
392 | "metadata": {
393 | "collapsed": false
394 | },
395 | "outputs": [
396 | {
397 | "data": {
398 | "text/plain": [
399 | "(0.9469053149223328, 0.12818680703639984)"
400 | ]
401 | },
402 | "execution_count": 13,
403 | "metadata": {},
404 | "output_type": "execute_result"
405 | }
406 | ],
407 | "source": [
408 | "stats.shapiro(weed_ca_jan2015)"
409 | ]
410 | },
411 | {
412 | "cell_type": "code",
413 | "execution_count": 14,
414 | "metadata": {
415 | "collapsed": false
416 | },
417 | "outputs": [
418 | {
419 | "data": {
420 | "text/plain": [
421 | "(0.9353488683700562, 0.06141229346394539)"
422 | ]
423 | },
424 | "execution_count": 14,
425 | "metadata": {},
426 | "output_type": "execute_result"
427 | }
428 | ],
429 | "source": [
430 | "stats.shapiro(weed_ca_jan2014)"
431 | ]
432 | },
433 | {
434 | "cell_type": "code",
435 | "execution_count": null,
436 | "metadata": {
437 | "collapsed": true
438 | },
439 | "outputs": [],
440 | "source": [
441 | "#Neither p-value is below 0.05, so we cannot reject normality. We seem to be good."
442 | ]
443 | },
444 | {
445 | "cell_type": "markdown",
446 | "metadata": {},
447 | "source": [
448 | "### A/B testing\n",
449 | "\n",
450 | "A/B testing compares two versions of something to check which one performs better. Eg: Show two variants of the same webpage to different groups of visitors and find which one gives the better conversion rate (or other relevant metric). [wiki](https://en.wikipedia.org/wiki/A/B_testing)"
451 | ]
452 | },
453 | {
454 | "cell_type": "markdown",
455 | "metadata": {},
456 | "source": [
457 | "**Exercise: Impact of regulation and deregulation.**\n",
458 | "\n",
459 | "Information on regulation of Weed in the US by State [wiki](Impact of regulation and deregulation on a couple of states )\n",
460 | "\n",
461 | "1. Alaska legalized it on 4th Nov 2014. Find if prices significantly changed in Dec 2014 compared to Oct 2014. \n",
462 | "2. Maryland decriminalized possessing weed from Oct 1, 2014. Find if prices of weed changed significantly in Oct 2014 compared to Sep 2014"
463 | ]
464 | },
465 | {
466 | "cell_type": "code",
467 | "execution_count": null,
468 | "metadata": {
469 | "collapsed": true
470 | },
471 | "outputs": [],
472 | "source": []
473 | },
474 | {
475 | "cell_type": "markdown",
476 | "metadata": {},
477 | "source": [
478 | "Something to think about: Which of these gives a smaller p-value?\n",
479 | " \n",
480 | " * Smaller effect size\n",
481 | " * Smaller standard error\n",
482 | " * Smaller sample size\n",
483 | " * Higher variance\n",
484 | " \n",
485 | " **Answer:** "
486 | ]
487 | },
488 | {
489 | "cell_type": "markdown",
490 | "metadata": {},
491 | "source": [
492 | "### Chi-square tests"
493 | ]
494 | },
495 | {
496 | "cell_type": "markdown",
497 | "metadata": {},
498 | "source": [
499 | "Chi-square tests are used when the data are frequencies, rather than numerical scores or prices.\n",
500 | "\n",
501 | "The following two tests make use of the chi-square statistic:\n",
502 | "\n",
503 | "1. chi-square test for goodness of fit\n",
504 | "2. chi-square test for independence\n",
505 | "\n",
506 | "Chi-square tests are non-parametric tests. They do not require assumptions about population parameters and they do not test hypotheses about population parameters."
507 | ]
508 | },
509 | {
510 | "cell_type": "markdown",
511 | "metadata": {},
512 | "source": [
513 | "#### Chi-square test for goodness of fit"
514 | ]
515 | },
516 | {
517 | "cell_type": "markdown",
518 | "metadata": {},
519 | "source": [
520 | "$$ \\chi^2 = \\sum (O - E)^2/E $$\n",
521 | "\n",
522 | "* O is observed frequency\n",
523 | "* E is expected frequency\n",
524 | "* $ \\chi^2 $ is the chi-square statistic"
525 | ]
526 | },
527 | {
528 | "cell_type": "markdown",
529 | "metadata": {},
530 | "source": [
531 | "Let's take the proportions of people who bought high, medium and low quality weed in Jan 2014 as the expected proportions. Find out whether the proportions of people who bought each quality in Jan 2015 conformed to that norm."
532 | ]
533 | },
534 | {
535 | "cell_type": "code",
536 | "execution_count": 16,
537 | "metadata": {
538 | "collapsed": true
539 | },
540 | "outputs": [],
541 | "source": [
542 | "weed_jan2014 = weed_pd[(weed_pd.year==2014) & (weed_pd.month==1)][[\"HighQN\", \"MedQN\", \"LowQN\"]]\n",
543 | "weed_jan2015 = weed_pd[(weed_pd.year==2015) & (weed_pd.month==1)][[\"HighQN\", \"MedQN\", \"LowQN\"]]"
544 | ]
545 | },
546 | {
547 | "cell_type": "code",
548 | "execution_count": 17,
549 | "metadata": {
550 | "collapsed": false
551 | },
552 | "outputs": [],
553 | "source": [
554 | "Expected = np.array(weed_jan2014.apply(sum, axis=0))\n",
555 | "Observed = np.array(weed_jan2015.apply(sum, axis=0))"
556 | ]
557 | },
558 | {
559 | "cell_type": "code",
560 | "execution_count": 18,
561 | "metadata": {
562 | "collapsed": false
563 | },
564 | "outputs": [
565 | {
566 | "name": "stdout",
567 | "output_type": "stream",
568 | "text": [
569 | "Expected: [2918004 2644757 263958] \n",
570 | "Observed: [4057716 4035049 358088]\n"
571 | ]
572 | }
573 | ],
574 | "source": [
575 | "print \"Expected:\", Expected, \"\\n\" , \"Observed:\", Observed"
576 | ]
577 | },
578 | {
579 | "cell_type": "code",
580 | "execution_count": 19,
581 | "metadata": {
582 | "collapsed": false
583 | },
584 | "outputs": [
585 | {
586 | "name": "stdout",
587 | "output_type": "stream",
588 | "text": [
589 | "Expected: [ 0.5007971 0.45390159 0.04530131] \n",
590 | "Observed: [ 0.48015461 0.47747239 0.042373 ]\n"
591 | ]
592 | }
593 | ],
594 | "source": [
595 | "print \"Expected:\", Expected/np.sum(Expected.astype(float)), \"\\n\" , \"Observed:\", Observed/np.sum(Observed.astype(float))"
596 | ]
597 | },
598 | {
599 | "cell_type": "code",
600 | "execution_count": 20,
601 | "metadata": {
602 | "collapsed": false
603 | },
604 | "outputs": [
605 | {
606 | "data": {
607 | "text/plain": [
608 | "Power_divergenceResult(statistic=1209562.2775169075, pvalue=0.0)"
609 | ]
610 | },
611 | "execution_count": 20,
612 | "metadata": {},
613 | "output_type": "execute_result"
614 | }
615 | ],
616 | "source": [
617 | "stats.chisquare(Observed, Expected)"
618 | ]
619 | },
620 | {
621 | "cell_type": "markdown",
622 | "metadata": {},
623 | "source": [
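624 | "*Inference*: We reject the null hypothesis. The proportions in Jan 2015 are different from what was expected. Note that the raw counts for the two months have different totals, so the statistic above also reflects the overall growth in counts; scaling `Expected` to the total of `Observed` would isolate the change in proportions."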
625 | ]
626 | }
627 | ],
628 | "metadata": {
629 | "kernelspec": {
630 | "display_name": "Python 2",
631 | "language": "python",
632 | "name": "python2"
633 | },
634 | "language_info": {
635 | "codemirror_mode": {
636 | "name": "ipython",
637 | "version": 2
638 | },
639 | "file_extension": ".py",
640 | "mimetype": "text/x-python",
641 | "name": "python",
642 | "nbconvert_exporter": "python",
643 | "pygments_lexer": "ipython2",
644 | "version": "2.7.10"
645 | }
646 | },
647 | "nbformat": 4,
648 | "nbformat_minor": 0
649 | }
650 |
--------------------------------------------------------------------------------
/notebooks/8. Closing thoughts and terminology.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Terminology\n",
8 | "\n",
9 | "1. Null Hypothesis\n",
10 | "2. Alternate Hypothesis\n",
11 | "3. p-value (Probability of observing the metric from the data at least as extreme as computed just by chance)\n",
12 | "4. Bootstrap\n",
13 | "5. Acceptance Region\n",
14 | "6. Rejection Region\n",
15 | "7. t-test\n",
16 | "8. One-tailed test\n",
17 | "9. Two-tailed test\n",
18 | "10. Significance test\n",
19 | "11. Confidence interval\n",
20 | "12. Power of a test\n",
21 | "13. Type 1 error (Rejecting null hypothesis when it is true). Also called false positive.\n",
22 | "14. Type 2 error (Failing to reject null hypothesis when it is false). Also called false negative.\n",
23 | "\n",
24 | "# Some Practical thoughts \n",
25 | "\n",
26 | "1. Data could be biased. Confidence intervals may then not be representative.\n",
27 | "2. One way to handle biased data is to use bias-corrected-confidence-intervals. \n",
28 | "3. Outliers can impact confidence intervals. \n",
29 | "4. Too often, people remove outliers. But they might be encoding some necessary information.\n",
30 | "5. One way to handle outliers is to use ranking, instead of actual numbers.\n",
31 | "6. If sample size is small, bootstrapping underestimates the width of the confidence interval. \n",
32 | "7. Better to use significance testing if sample size is small.\n",
33 | "8. Bootstrapping should not be used to find extreme values (Eg: maximum sales of shoes, 5th largest sales of shoes, etc)\n",
34 | "9. Use rank transformation when using bootstrapping, if the data has outliers\n",
35 | "10. Lack of representativeness is a problem for any statistical technique\n",
36 | "11. The experiment should be random. (Eg: When doing A/B testing, randomize the subjects). Experimental bias can lead to wrong inferences. \n",
37 | "12. Resampling time series data is tricky. The assumption we used, that each data point is independent, does not hold for time series data. \n",
38 | "13. Rank transformation changes the question. For our shoe sales example, a rank transformed analysis would be: \"Do sales tend to be higher after price optimization?\". (Our analysis was: \"Do post-price-optimization sales have a higher mean?\")\n",
39 | "14. Power of a test increases if sample size increases\n",
40 | "\n",
41 | "# Types of Error\n",
42 | "\n",
43 | "1. Sampling Bias\n",
44 | "2. Measurement Error\n",
45 | "3. Random Error\n"
46 | ]
47 | }
48 | ],
49 | "metadata": {
50 | "kernelspec": {
51 | "display_name": "Python 2",
52 | "language": "python",
53 | "name": "python2"
54 | },
55 | "language_info": {
56 | "codemirror_mode": {
57 | "name": "ipython",
58 | "version": 2
59 | },
60 | "file_extension": ".py",
61 | "mimetype": "text/x-python",
62 | "name": "python",
63 | "nbconvert_exporter": "python",
64 | "pygments_lexer": "ipython2",
65 | "version": "2.7.10"
66 | }
67 | },
68 | "nbformat": 4,
69 | "nbformat_minor": 0
70 | }
71 |
--------------------------------------------------------------------------------
/notebooks/9. References.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Books, Slides, Articles\n",
8 | "1. Book: [Think Stats](http://greenteapress.com/thinkstats2/)\n",
9 | "2. Book: [All of Statistics](http://www.stat.cmu.edu/~larry/all-of-statistics/)\n",
10 | "3. Book: [Statistics is Easy](http://www.amazon.com/Statistics-Edition-Synthesis-Lectures-Mathematics/dp/160845570X)\n",
11 | "4. Workshop: Computational Statistics Workshop, Allen Downey, SciPy 2015 [Repo](https://github.com/AllenDowney/CompStats) [Video](https://www.youtube.com/watch?v=5Vjrqnk7Igs)\n",
12 | "5. Slides: [Statistics for Hackers](https://speakerdeck.com/jakevdp/statistics-for-hackers)"
13 | ]
14 | },
15 | {
16 | "cell_type": "code",
17 | "execution_count": null,
18 | "metadata": {
19 | "collapsed": true
20 | },
21 | "outputs": [],
22 | "source": []
23 | }
24 | ],
25 | "metadata": {
26 | "kernelspec": {
27 | "display_name": "Python 2",
28 | "language": "python",
29 | "name": "python2"
30 | },
31 | "language_info": {
32 | "codemirror_mode": {
33 | "name": "ipython",
34 | "version": 2
35 | },
36 | "file_extension": ".py",
37 | "mimetype": "text/x-python",
38 | "name": "python",
39 | "nbconvert_exporter": "python",
40 | "pygments_lexer": "ipython2",
41 | "version": "2.7.10"
42 | }
43 | },
44 | "nbformat": 4,
45 | "nbformat_minor": 0
46 | }
47 |
--------------------------------------------------------------------------------
/notebooks/img/6sigma.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/6sigma.png
--------------------------------------------------------------------------------
/notebooks/img/binomial.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/binomial.gif
--------------------------------------------------------------------------------
/notebooks/img/binomial_pmf.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/binomial_pmf.png
--------------------------------------------------------------------------------
/notebooks/img/correlation.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/correlation.gif
--------------------------------------------------------------------------------
/notebooks/img/correlation_not_causation.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/correlation_not_causation.gif
--------------------------------------------------------------------------------
/notebooks/img/covariance.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/covariance.png
--------------------------------------------------------------------------------
/notebooks/img/exponential_pdf.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/exponential_pdf.png
--------------------------------------------------------------------------------
/notebooks/img/kurtosis.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/kurtosis.png
--------------------------------------------------------------------------------
/notebooks/img/leastsquare.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/leastsquare.gif
--------------------------------------------------------------------------------
/notebooks/img/normal_cdf.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/normal_cdf.png
--------------------------------------------------------------------------------
/notebooks/img/normal_pdf.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/normal_pdf.png
--------------------------------------------------------------------------------
/notebooks/img/normaldist.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/normaldist.png
--------------------------------------------------------------------------------
/notebooks/img/skewness.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/skewness.png
--------------------------------------------------------------------------------
/notebooks/img/uniform.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/uniform.png
--------------------------------------------------------------------------------
/notebooks/img/variance.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/variance.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | appnope==0.1.0
2 | backports.ssl-match-hostname==3.4.0.2
3 | certifi==2015.4.28
4 | decorator==4.0.2
5 | funcsigs==0.4
6 | functools32==3.2.3.post2
7 | gnureadline==6.3.3
8 | ipykernel==4.0.3
9 | ipython==4.0.0
10 | ipython-genutils==0.1.0
11 | Jinja2==2.8
12 | jsonschema==2.5.1
13 | jupyter-client==4.0.0
14 | jupyter-core==4.0.4
15 | MarkupSafe==0.23
16 | matplotlib==1.4.3
17 | mistune==0.7.1
18 | mock==1.3.0
19 | nbconvert==4.0.0
20 | nbformat==4.0.0
21 | nose==1.3.7
22 | notebook==4.0.4
23 | numpy==1.9.2
24 | pandas==0.16.2
25 | path.py==7.7
26 | patsy==0.4.0
27 | pbr==1.6.0
28 | pexpect==3.3
29 | pickleshare==0.5
30 | ptyprocess==0.5
31 | Pygments==2.0.2
32 | pyparsing==2.0.3
33 | python-dateutil==2.4.2
34 | pytz==2015.4
35 | pyzmq==14.7.0
36 | scikit-learn==0.16.1
37 | scipy==0.16.0
38 | seaborn==0.6.0
39 | simplegeneric==0.8.1
40 | six==1.9.0
41 | terminado==0.5
42 | tornado==4.2.1
43 | traitlets==4.0.0
44 | vincent==0.4.4
45 | statsmodels==0.6.1
46 |
--------------------------------------------------------------------------------
/requirements_linux.txt:
--------------------------------------------------------------------------------
1 | backports.ssl-match-hostname==3.4.0.2
2 | certifi==2015.4.28
3 | decorator==4.0.2
4 | funcsigs==0.4
5 | functools32==3.2.3.post2
6 | ipykernel==4.0.3
7 | ipython==4.0.0
8 | ipython-genutils==0.1.0
9 | Jinja2==2.8
10 | jsonschema==2.5.1
11 | jupyter-client==4.0.0
12 | jupyter-core==4.0.4
13 | MarkupSafe==0.23
14 | matplotlib==1.4.3
15 | mistune==0.7.1
16 | mock==1.3.0
17 | nbconvert==4.0.0
18 | nbformat==4.0.0
19 | nose==1.3.7
20 | notebook==4.0.4
21 | numpy==1.9.2
22 | pandas==0.16.2
23 | path.py==7.7.1
24 | patsy==0.4.0
25 | pbr==1.6.0
26 | pexpect==3.3
27 | pickleshare==0.5
28 | ptyprocess==0.5
29 | Pygments==2.0.2
30 | pyparsing==2.0.3
31 | python-dateutil==2.4.2
32 | pytz==2015.4
33 | pyzmq==14.7.0
34 | scikit-learn==0.16.1
35 | scipy==0.16.0
36 | seaborn==0.6.0
37 | simplegeneric==0.8.1
38 | six==1.9.0
39 | statsmodels==0.6.1
40 | terminado==0.5
41 | tornado==4.2.1
42 | traitlets==4.0.0
43 | vincent==0.4.4
44 | wheel==0.24.0
45 |
--------------------------------------------------------------------------------