├── .gitignore
├── README.md
├── check_env.py
├── data
    ├── Demographics_State.csv
    ├── Population_State.csv
    └── Weed_Price.csv
├── notebooks
    ├── 1. Introduction.ipynb
    ├── 2. Warm-up.ipynb
    ├── 3. Resampling.ipynb
    ├── 4. Basic Metrics.ipynb
    ├── 5. Distributions.ipynb
    ├── 6. Hypothesis Testing.ipynb
    ├── 7. Linear Regression.ipynb
    ├── 8. Closing thoughts and terminology.ipynb
    ├── 9. References.ipynb
    └── img
    │   ├── 6sigma.png
    │   ├── binomial.gif
    │   ├── binomial_pmf.png
    │   ├── correlation.gif
    │   ├── correlation_not_causation.gif
    │   ├── covariance.png
    │   ├── exponential_pdf.png
    │   ├── kurtosis.png
    │   ├── leastsquare.gif
    │   ├── normal_cdf.png
    │   ├── normal_pdf.png
    │   ├── normaldist.png
    │   ├── skewness.png
    │   ├── uniform.png
    │   └── variance.png
├── requirements.txt
└── requirements_linux.txt


/.gitignore:
--------------------------------------------------------------------------------
 1 | # Created by https://www.gitignore.io/api/vim,ipythonnotebook,virtualenv
 2 | 
 3 | ### Vim ###
 4 | [._]*.s[a-w][a-z]
 5 | [._]s[a-w][a-z]
 6 | *.un~
 7 | Session.vim
 8 | .netrwhist
 9 | *~
10 | 
11 | 
12 | ### IPythonNotebook ###
13 | # Temporary data
14 | .ipynb_checkpoints/
15 | 
16 | 
17 | ### VirtualEnv ###
18 | # Virtualenv
19 | # http://iamzed.com/2009/05/07/a-primer-on-virtualenv/
20 | .Python
21 | [Bb]in
22 | [Ii]nclude
23 | [Ll]ib
24 | [Ss]cripts
25 | pyvenv.cfg
26 | pip-selfcheck.json
27 | 
28 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Introduction to Statistics
 2 | 
 3 | [![Bitdeli Badge](https://d2weczhvl823v0.cloudfront.net/rouseguy/intro2stats/trend.png)](https://bitdeli.com/free "Bitdeli Badge")
 4 | 
 5 | 
 6 | Inspired by Allen Downey's books [Think Stats](http://greenteapress.com/thinkstats/) and [Think Bayes](http://greenteapress.com/thinkbayes/), this is an attempt to learn Statistics using an application-centric programming approach. 
 7 | 
 8 | ## Objective
 9 | Showcase real-life examples and the statistics to use in each of them. Almost every book teaches a concept and then shows an example, so every topic ends up treated separately and no holistic view is presented. Here, we take examples and see how to make sense of them.
10 | 
11 | ## Topics covered
12 | 
13 | * Mean, Median, Mode
14 | * Standard Deviation
15 | * Variance
16 | * Co-variance
17 | * Probability Distribution
18 | * Hypothesis Testing
19 | * t-test, p-value, chi-squared test
20 | * Confidence Intervals
21 | * Confidence levels and Significance levels
22 | * Correlation
23 | * Resampling (and uses in Big Data)
24 | * A/B Testing
25 | * A simple linear regression model
26 | 
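As a rough illustration of the first few topics, the sketch below computes the mean, median, sample covariance and Pearson correlation for per capita income and median rent. The six rows are copied from `data/Demographics_State.csv`; the helper names (`mean`, `median`, `covariance`, `correlation`) are made up for this sketch, and it sticks to plain Python rather than the Numpy/Pandas calls the notebooks use.

```python
from __future__ import division, print_function

import math

# (region, per_capita_income, median_rent), copied from data/Demographics_State.csv
rows = [
    ("alabama", 23680, 501),
    ("alaska", 32651, 978),
    ("arizona", 25358, 747),
    ("arkansas", 22170, 480),
    ("california", 29527, 1119),
    ("colorado", 31109, 825),
]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def median(values):
    """Middle value, or the average of the two middle values."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


income = [row[1] for row in rows]
rent = [row[2] for row in rows]

print("Mean income:", mean(income))
print("Median income:", median(income))
print("Std deviation of income:", math.sqrt(covariance(income, income)))
print("Correlation(income, rent):", correlation(income, rent))
```

With only six states this says little about the country as a whole; running the same helpers over the full file would be the natural next step.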
27 | ## Workshop Plan
28 | We will use marijuana prices in various states of the USA, along with demographic data of the USA based on the latest census data.
29 | 
30 | There will be separate ipython notebooks - grouped by topic similarities. *notebooks will be uploaded later*
31 | Some examples include:
32 | * Find sum of people buying weed in a year, by various states.
33 | * Find mean of price in a week/month, by various states.
34 | * Find variance of price in selected states. Find variance of selected states by week of month
35 | * Define distribution. Plot histograms
36 | * Determining outliers (Plots, quantiles, box plots, percentiles) in weed price data
37 | * Continuous distributions (exponential distribution, normal distribution)
38 | * Introduction to Probability
39 | * Hypothesis testing. Check if weed prices across states are similar or not. Check for different qualities of weed
40 | * Resampling
41 | * Simple regression model: Predict weed price for the next month. Understand the output and diagnostics
42 | * Introduction to A/B testing: Impact of regulation and deregulation on a couple of states 
43 | 
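For the probability and hypothesis-testing items, here is a minimal sketch based on the coin-toss question from the warm-up notebook: how likely is a fair coin to show 24 or more heads in 30 tosses? It assumes independent tosses and uses the binomial formula directly; `n_choose_k` and `binomial_tail` are hypothetical helper names, not part of the repository.

```python
from __future__ import division, print_function

from math import factorial


def n_choose_k(n, k):
    """Number of ways to pick k items out of n."""
    return factorial(n) // (factorial(k) * factorial(n - k))


def binomial_tail(n, k_min, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(
        n_choose_k(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


# Probability that a fair coin shows 24 or more heads in 30 tosses
p_value = binomial_tail(30, 24, 0.5)
print("P(24 or more heads in 30 fair tosses) = %.6f" % p_value)
print("Below the 5% threshold:", p_value < 0.05)
```

The result is roughly 0.07%, far below 5%, so under the usual convention we would reject the idea that the coin is fair.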
44 | 
45 | ## Prerequisites
46 | * Basics of Python. Users should know how to write functions; read in a text file (csv, txt, fwf) and parse it; use conditional and looping constructs; use standard libraries like os and sys; and work with lists, list comprehensions and dictionaries
47 | * It is good to know basics of the following:
48 |     * Numpy
49 |     * Scipy
50 |     * Pandas
51 |     * Matplotlib
52 |     * Seaborn
53 |     * IPython and IPython notebook - Everything here would be an IPython notebook
54 | * Software Requirements
55 |     * Python 2.7
56 |     * git - so that this repo can be cloned :)  
57 |     * virtualenv
58 |     * Libraries from *requirements.txt*
59 | 
60 | ## Optional
61 | Users could choose to install Anaconda, if they want. If using Anaconda or Enthought, please ensure that all libraries listed in the requirements.txt are installed. 
62 | 
63 | *Note to Windows Users*: Neither of us uses Windows. In past workshops, Windows users have faced issues installing the way explained below. It is advisable to install Anaconda and ensure that all the libraries listed in the *requirements.txt* file are installed.
64 | 
65 | ## Setup Guide
66 | 
67 | #### Clone the repository
68 |     $ git clone https://github.com/rouseguy/intro2stats.git
69 | 
70 | #### Create a virtual environment & activate
71 |     $ cd intro2stats
72 |     $ virtualenv env
73 |     $ source env/bin/activate
74 | 
75 | #### Install requirements from requirements file
76 |     $ pip install -r requirements.txt
77 | 
78 | #### Note: Make sure you have libraries for png & freetype.
79 | Ubuntu users can install the below
80 | 
81 |     apt-get install libfreetype6-dev
82 |     apt-get install libpng-dev
83 | 
84 | ### Script to check if installation is fine for the workshop
85 | Please execute the following at the command prompt
86 | 
87 |     $ python check_env.py
88 | 
89 | If any library has a `FAIL` message, please install/upgrade that library.
90 | 
91 | ---
92 | 
93 | <a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" property="dct:title">Introduction to Statistics using Python</span> by <a xmlns:cc="http://creativecommons.org/ns#" href="https://twitter.com/bargava/" property="cc:attributionName" rel="cc:attributionURL">Bargava</a> and <a xmlns:cc="http://creativecommons.org/ns#" href="https://twitter.com/raghothams/" property="cc:attributionName" rel="cc:attributionURL">Raghotham</a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.
94 | 
95 | 
96 | 
97 | 


--------------------------------------------------------------------------------
/check_env.py:
--------------------------------------------------------------------------------
 1 | """
 2 |     This script will check if the environment setup is correct for the workshop.
 3 | 
 4 |     To run, please execute the following command from the command prompt
 5 |                $ python check_env.py
 6 |     
 7 |     The output will indicate if any of the libraries are missing or need to be updated. 
 8 | 
 9 |     This script is inspired by https://github.com/fonnesbeck/scipy2015_tutorial/blob/master/check_env.py
10 | """
11 | 
12 | from __future__ import print_function
13 | 
14 | try:
15 |     import curses
16 |     curses.setupterm()
17 |     assert curses.tigetnum("colors") > 2
18 |     OK = "\x1b[1;%dm[ OK ]\x1b[0m" % (30 + curses.COLOR_GREEN)
19 |     FAIL = "\x1b[1;%dm[FAIL]\x1b[0m" % (30 + curses.COLOR_RED)
20 | except Exception:
21 |     OK = '[ OK ]'
22 |     FAIL = '[FAIL]'
23 | 
24 | import sys
25 | try:
26 |     import importlib
27 | except ImportError:
28 |     print(FAIL, "Python version 2.7 is required, but %s is installed." % sys.version)
29 | from distutils.version import LooseVersion as Version
30 | 
31 | def import_version(pkg, min_ver, fail_msg=""):
32 |     mod = None
33 |     try:
34 |         mod = importlib.import_module(pkg)
35 |         # workaround for Image not having __version__
36 |         version = getattr(mod, "__version__", 0) or getattr(mod, "VERSION", 0)
37 |         if Version(str(version)) < min_ver:
38 |             print(FAIL, "%s version %s or higher required, but %s installed."
39 |                   % (pkg, min_ver, version))
40 |         else:
41 |             print(OK, '%s version %s' % (pkg, version))
42 |     except ImportError:
43 |         print(FAIL, '%s not installed. %s' % (pkg, fail_msg))
44 |     return mod
45 | 
46 | 
47 | # first check the python version
48 | print('Using python in', sys.prefix)
49 | print(sys.version)
50 | pyversion = Version(sys.version)
51 | if pyversion >= "3":
52 |     print(FAIL, "Python version 2.7 is required, but %s is installed." % sys.version)
53 | elif pyversion >= "2":
54 |     if pyversion < "2.7":
55 |         print(FAIL, "Python version 2.7 is required, but %s is installed." % sys.version)
56 | else:
57 |     print(FAIL, "Unknown Python version: %s" % sys.version)
58 | 
59 | print()
60 | requirements = {'numpy': "1.9.2", 'pandas': "0.16.2", 'scipy': "0.9", 'matplotlib': "1.4.3",
61 |         'IPython': "4.0", 'sklearn': "0.16.1", 'seaborn': "0.6.0", 'statsmodels': "0.6.1"}
62 | 
63 | # now the dependencies
64 | for lib, required_version in list(requirements.items()):
65 |     import_version(lib, required_version)
66 | 


--------------------------------------------------------------------------------
/data/Demographics_State.csv:
--------------------------------------------------------------------------------
 1 | "region","total_population","percent_white","percent_black","percent_asian","percent_hispanic","per_capita_income","median_rent","median_age"
 2 | "alabama",4799277,67,26,1,4,23680,501,38.1
 3 | "alaska",720316,63,3,5,6,32651,978,33.6
 4 | "arizona",6479703,57,4,3,30,25358,747,36.3
 5 | "arkansas",2933369,74,15,1,7,22170,480,37.5
 6 | "california",37659181,40,6,13,38,29527,1119,35.4
 7 | "colorado",5119329,70,4,3,21,31109,825,36.1
 8 | "connecticut",3583561,70,9,4,14,37892,880,40.2
 9 | "delaware",908446,65,21,3,8,29819,828,38.9
10 | "district of columbia",619371,35,49,3,10,45290,1154,33.8
11 | "florida",19091156,57,15,2,23,26236,838,41
12 | "georgia",9810417,55,30,3,9,25182,673,35.6
13 | "hawaii",1376298,23,2,37,9,29305,1220,38.3
14 | "idaho",1583364,84,1,1,11,22568,607,34.9
15 | "illinois",12848554,63,14,5,16,29666,759,36.8
16 | "indiana",6514861,81,9,2,6,24635,577,37.1
17 | "iowa",3062553,88,3,2,5,27027,534,38.1
18 | "kansas",2868107,78,6,2,11,26929,551,36
19 | "kentucky",4361333,86,8,1,3,23462,506,38.2
20 | "louisiana",4567968,60,32,2,4,24442,610,36
21 | "maine",1328320,94,1,1,1,26824,664,43.2
22 | "maryland",5834299,54,29,6,8,36354,1034,38
23 | "massachusetts",6605058,76,6,6,10,35763,936,39.2
24 | "michigan",9886095,76,14,3,5,25681,623,39.1
25 | "minnesota",5347740,83,5,4,5,30913,734,37.6
26 | "mississippi",2976872,58,37,1,3,20618,510,36.2
27 | "missouri",6007182,81,11,2,4,25649,549,38
28 | "montana",998554,87,0,1,3,25373,577,39.9
29 | "nebraska",1841625,82,4,2,9,26899,563,36.3
30 | "nevada",2730066,53,8,7,27,26589,840,36.6
31 | "new hampshire",1319171,92,1,2,3,33134,878,41.5
32 | "new jersey",8832406,59,13,9,18,36027,1024,39.1
33 | "new mexico",2069706,40,2,1,47,23763,635,36.7
34 | "new york",19487053,58,14,8,18,32382,963,38.1
35 | "north carolina",9651380,65,21,2,9,25284,602,37.6
36 | "north dakota",689781,88,1,1,2,29732,564,36.4
37 | "ohio",11549590,81,12,2,3,26046,562,39
38 | "oklahoma",3785742,68,7,2,9,24208,525,36.2
39 | "oregon",3868721,78,2,4,12,26809,749,38.7
40 | "pennsylvania",12731381,79,10,3,6,28502,652,40.3
41 | "rhode island",1051695,76,5,3,13,30469,781,39.6
42 | "south carolina",4679602,64,28,1,5,23943,582,38.1
43 | "south dakota",825198,84,1,1,3,25740,517,36.9
44 | "tennessee",6402387,75,17,1,5,24409,568,38.2
45 | "texas",25639373,45,12,4,38,26019,688,33.8
46 | "utah",2813673,80,1,2,13,23873,739,29.6
47 | "vermont",625904,94,1,1,2,29167,754,42
48 | "virginia",8100653,64,19,6,8,33493,910,37.5
49 | "washington",6819579,72,3,7,11,30742,853,37.3
50 | "west virginia",1853619,93,3,1,1,22966,448,41.5
51 | "wisconsin",5706871,83,6,2,6,27523,636,38.7
52 | "wyoming",570134,85,1,1,9,28902,647,36.8
53 | 


--------------------------------------------------------------------------------
/data/Population_State.csv:
--------------------------------------------------------------------------------
 1 | "region","value"
 2 | "alabama",4777326
 3 | "alaska",711139
 4 | "arizona",6410979
 5 | "arkansas",2916372
 6 | "california",37325068
 7 | "colorado",5042853
 8 | "connecticut",3572213
 9 | "delaware",900131
10 | "district of columbia",605759
11 | "florida",18885152
12 | "georgia",9714569
13 | "hawaii",1362730
14 | "idaho",1567803
15 | "illinois",12823860
16 | "indiana",6485530
17 | "iowa",3047646
18 | "kansas",2851183
19 | "kentucky",4340167
20 | "louisiana",4529605
21 | "maine",1329084
22 | "maryland",5785496
23 | "massachusetts",6560595
24 | "michigan",9897264
25 | "minnesota",5313081
26 | "mississippi",2967620
27 | "missouri",5982413
28 | "montana",990785
29 | "nebraska",1827306
30 | "nevada",2704204
31 | "new hampshire",1317474
32 | "new jersey",8793888
33 | "new mexico",2055287
34 | "new york",19398125
35 | "north carolina",9544249
36 | "north dakota",676253
37 | "ohio",11533561
38 | "oklahoma",3749005
39 | "oregon",3836628
40 | "pennsylvania",12699589
41 | "rhode island",1052471
42 | "south carolina",4630351
43 | "south dakota",815871
44 | "tennessee",6353226
45 | "texas",25208897
46 | "utah",2766233
47 | "vermont",625498
48 | "virginia",8014955
49 | "washington",6738714
50 | "west virginia",1850481
51 | "wisconsin",5687219
52 | "wyoming",562803
53 | 


--------------------------------------------------------------------------------
/notebooks/1. Introduction.ipynb:
--------------------------------------------------------------------------------
 1 | {
 2 |  "cells": [
 3 |   {
 4 |    "cell_type": "markdown",
 5 |    "metadata": {},
 6 |    "source": [
 7 |     "> **The fact that data science exists as a field is a colossal failure of statistics** - Hadley Wickham\n",
 8 |     "\n",
 9 |     "\n",
10 |     "# THERE ARE HACKS, DAMN HACKS, AND THERE ARE STATISTICS  \n",
11 |     "\n",
12 |     "\n",
13 |     "Statistics, to most people, is a bunch of formulae, invariably complicated and resting on strong assumptions. Very few practitioners in fields outside math can derive those from first principles. But does it necessarily need to be that way?\n",
14 |     "\n",
15 |     "## Philosophy for this workshop\n",
16 |     "Instead of a formula-based, discretely-structured classical framework for teaching statistics, we believe in the hackers' philosophy of DIY. \n",
17 |     "> **I hear and I forget. I see and I remember. I do and I understand  - Confucius** \n",
18 |     "\n",
19 |     "What is the point of learning something if you don't know how and where to apply it? We aim to bridge that gap a bit.\n",
20 |     "\n",
21 |     "## Data Analysis\n",
22 |     "\n",
23 |     "The art of data analysis\n",
24 |     "\n",
25 |     "> https://github.com/amitkaps/weed/blob/master/0-Intro.ipynb\n",
26 |     "\n",
27 |     "#                Statistics: Inferring results about a population given a sample\n",
28 |     "\n",
29 |     "\n",
30 |     "# *Where does statistics fit in Business Analytics?*\n",
31 |     "\n",
32 |     "> DISCUSSION\n",
33 |     "\n"
34 |    ]
35 |   },
36 |   {
37 |    "cell_type": "code",
38 |    "execution_count": null,
39 |    "metadata": {
40 |     "collapsed": true
41 |    },
42 |    "outputs": [],
43 |    "source": []
44 |   }
45 |  ],
46 |  "metadata": {
47 |   "kernelspec": {
48 |    "display_name": "Python 2",
49 |    "language": "python",
50 |    "name": "python2"
51 |   },
52 |   "language_info": {
53 |    "codemirror_mode": {
54 |     "name": "ipython",
55 |     "version": 2
56 |    },
57 |    "file_extension": ".py",
58 |    "mimetype": "text/x-python",
59 |    "name": "python",
60 |    "nbconvert_exporter": "python",
61 |    "pygments_lexer": "ipython2",
62 |    "version": "2.7.10"
63 |   }
64 |  },
65 |  "nbformat": 4,
66 |  "nbformat_minor": 0
67 | }
68 | 


--------------------------------------------------------------------------------
/notebooks/2. Warm-up.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "> **SOMETIMES THE QUESTIONS ARE COMPLICATED AND THE ANSWERS ARE SIMPLE**\n",
  8 |     "\n",
  9 |     ">*Dr. Seuss*\n",
 10 |     "\n",
 11 |     "## Coin Toss\n",
 12 |     "\n",
 13 |     "You toss a coin 30 times and see heads 24 times. Is it a fair coin?\n",
 14 |     "\n",
 15 |     "**Hypothesis 1**: A fair coin should give about 15 heads in 30 tosses, so 24 heads means this coin is biased.\n",
 16 |     "\n",
 17 |     "**Hypothesis 2**: Come on, even a fair coin could show 24 heads in 30 tosses. This is just by chance.\n",
 18 |     "\n",
 19 |     "#### Statistical Method\n",
 20 |     "\n",
 21 |     "P(H) = ? \n",
 22 |     "\n",
 23 |     "P(HH) = ?\n",
 24 |     "\n",
 25 |     "P(THH) = ?\n",
 26 |     "\n",
 27 |     "Now, slightly tougher : P(2H, 1T) = ?\n",
 28 |     "\n",
 29 |     "Generalizing, \n",
 30 |     "\n",
 31 |     "<img style=\"float: left;\" src=\"img/binomial.gif\">\n",
 32 |     "\n",
 33 |     "<br>\n",
 34 |     "<br>\n",
 35 |     "<br>\n",
 36 |     "<br>\n",
 37 |     "\n",
 38 |     "\n",
 39 |     "**What is the probability of getting 24 heads in 30 tosses ?**\n",
 40 |     "\n",
 41 |     "It is the probability of getting heads 24 times or more. \n",
 42 |     "\n",
 43 |     "#### Hacker's Approach\n",
 44 |     "\n",
 45 |     "Simulation. Run the experiment 100,000 times. Find the percentage of times the experiment returned 24 or more heads. If it is less than 5%, we conclude that the coin is biased. "
 46 |    ]
 47 |   },
 48 |   {
 49 |    "cell_type": "code",
 50 |    "execution_count": 1,
 51 |    "metadata": {
 52 |     "collapsed": false
 53 |    },
 54 |    "outputs": [
 55 |     {
 56 |      "name": "stdout",
 57 |      "output_type": "stream",
 58 |      "text": [
 59 |       "Data of the Experiment: [1 1 0 1 0 0 1 0 0 1 1 0 1 1 0 1 1 0 0 0 0 0 1 0 0 1 1 1 1 0]\n",
 60 |       "Heads in the Experiment: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]\n",
 61 |       "Number of heads in the experiment: 15\n"
 62 |      ]
 63 |     }
 64 |    ],
 65 |    "source": [
 66 |     "import numpy as np  \n",
 67 |     "\n",
 68 |     "total_tosses = 30\n",
 69 |     "num_heads = 24\n",
 70 |     "prob_head = 0.5\n",
 71 |     "\n",
 72 |     "#0 is tail. 1 is heads. Generate one experiment\n",
 73 |     "experiment = np.random.randint(0,2,total_tosses)\n",
 74 |     "print \"Data of the Experiment:\", experiment\n",
 75 |     "#Find the number of heads\n",
 76 |     "print \"Heads in the Experiment:\", experiment[experiment==1]  #This will give all the heads in the array\n",
 77 |     "head_count = experiment[experiment==1].shape[0] #This will get the count of heads in the array\n",
 78 |     "print \"Number of heads in the experiment:\", head_count"
 79 |    ]
 80 |   },
 81 |   {
 82 |    "cell_type": "code",
 83 |    "execution_count": 2,
 84 |    "metadata": {
 85 |     "collapsed": false
 86 |    },
 87 |    "outputs": [],
 88 |    "source": [
 89 |     "#Now, the above experiment needs to be repeated 100 times. Let's write a function and put the above code in a loop\n",
 90 |     "\n",
 91 |     "def coin_toss_experiment(times_to_repeat):\n",
 92 |     "\n",
 93 |     "    head_count = np.empty([times_to_repeat,1], dtype=int)\n",
 94 |     "    \n",
 95 |     "    for times in np.arange(times_to_repeat):\n",
 96 |     "        experiment = np.random.randint(0,2,total_tosses)\n",
 97 |     "        head_count[times] = experiment[experiment==1].shape[0]\n",
 98 |     "    \n",
 99 |     "    return head_count"
100 |    ]
101 |   },
102 |   {
103 |    "cell_type": "code",
104 |    "execution_count": 3,
105 |    "metadata": {
106 |     "collapsed": false
107 |    },
108 |    "outputs": [],
109 |    "source": [
110 |     "head_count = coin_toss_experiment(100)"
111 |    ]
112 |   },
113 |   {
114 |    "cell_type": "code",
115 |    "execution_count": 4,
116 |    "metadata": {
117 |     "collapsed": false
118 |    },
119 |    "outputs": [
120 |     {
121 |      "data": {
122 |       "text/plain": [
123 |        "array([[15],\n",
124 |        "       [13],\n",
125 |        "       [15],\n",
126 |        "       [16],\n",
127 |        "       [11],\n",
128 |        "       [16],\n",
129 |        "       [14],\n",
130 |        "       [16],\n",
131 |        "       [13],\n",
132 |        "       [17]])"
133 |       ]
134 |      },
135 |      "execution_count": 4,
136 |      "metadata": {},
137 |      "output_type": "execute_result"
138 |     }
139 |    ],
140 |    "source": [
141 |     "head_count[:10] "
142 |    ]
143 |   },
144 |   {
145 |    "cell_type": "code",
146 |    "execution_count": 5,
147 |    "metadata": {
148 |     "collapsed": false
149 |    },
150 |    "outputs": [
151 |     {
152 |      "name": "stdout",
153 |      "output_type": "stream",
154 |      "text": [
155 |       "Dimensions: (100, 1) \n",
156 |       "Type of object: <type 'numpy.ndarray'>\n"
157 |      ]
158 |     }
159 |    ],
160 |    "source": [
161 |     "print \"Dimensions:\", head_count.shape, \"\\n\",\"Type of object:\", type(head_count)"
162 |    ]
163 |   },
164 |   {
165 |    "cell_type": "code",
166 |    "execution_count": 6,
167 |    "metadata": {
168 |     "collapsed": false
169 |    },
170 |    "outputs": [],
171 |    "source": [
172 |     "#Let's plot the above distribution\n",
173 |     "import matplotlib.pyplot as plt\n",
174 |     "%matplotlib inline\n",
175 |     "import seaborn as sns\n",
176 |     "sns.set(color_codes = True)"
177 |    ]
178 |   },
179 |   {
180 |    "cell_type": "code",
181 |    "execution_count": 7,
182 |    "metadata": {
183 |     "collapsed": false
184 |    },
185 |    "outputs": [
186 |     {
187 |      "data": {
188 |       "text/plain": [
189 |        "<matplotlib.axes._subplots.AxesSubplot at 0x10a2ae350>"
190 |       ]
191 |      },
192 |      "execution_count": 7,
193 |      "metadata": {},
194 |      "output_type": "execute_result"
195 |     },
196 |     {
197 |      "data": {
198 |       "image/png": "(base64-encoded PNG of the head-count histogram omitted)",
199 |       "text/plain": [
200 |        "<matplotlib.figure.Figure at 0x10691fb50>"
201 |       ]
202 |      },
203 |      "metadata": {},
204 |      "output_type": "display_data"
205 |     }
206 |    ],
207 |    "source": [
208 |     "sns.distplot(head_count, kde=False)"
209 |    ]
210 |   },
211 |   {
212 |    "cell_type": "markdown",
213 |    "metadata": {},
214 |    "source": [
215 |     "**Exercise**: Try setting `kde=True` in the above cell and observe what happens"
216 |    ]
217 |   },
218 |   {
219 |    "cell_type": "code",
220 |    "execution_count": null,
221 |    "metadata": {
222 |     "collapsed": true
223 |    },
224 |    "outputs": [],
225 |    "source": []
226 |   },
227 |   {
228 |    "cell_type": "code",
229 |    "execution_count": 8,
230 |    "metadata": {
231 |     "collapsed": false
232 |    },
233 |    "outputs": [
234 |     {
235 |      "data": {
236 |       "text/plain": [
237 |        "array([], dtype=int64)"
238 |       ]
239 |      },
240 |      "execution_count": 8,
241 |      "metadata": {},
242 |      "output_type": "execute_result"
243 |     }
244 |    ],
245 |    "source": [
246 |     "# Head counts from the experiments that returned 24 or more heads.\n",
247 |     "head_count[head_count>=24]"
248 |    ]
249 |   },
250 |   {
251 |    "cell_type": "code",
252 |    "execution_count": 9,
253 |    "metadata": {
254 |     "collapsed": false
255 |    },
256 |    "outputs": [
257 |     {
258 |      "name": "stdout",
259 |      "output_type": "stream",
260 |      "text": [
261 |       "No of times experiment returned 24 heads or more: 0\n",
262 |       "% of times with 24 or more heads:  0.0\n"
263 |      ]
264 |     }
265 |    ],
266 |    "source": [
267 |     "print \"No of times experiment returned 24 heads or more:\", head_count[head_count>=24].shape[0]\n",
268 |     "print \"% of times with 24 or more heads: \", head_count[head_count>=24].shape[0]/float(head_count.shape[0])*100"
269 |    ]
270 |   },
271 |   {
272 |    "cell_type": "code",
273 |    "execution_count": null,
274 |    "metadata": {
275 |     "collapsed": false
276 |    },
277 |    "outputs": [],
278 |    "source": []
279 |   },
280 |   {
281 |    "cell_type": "markdown",
282 |    "metadata": {},
283 |    "source": [
284 |     "####  Exercise: Repeat the experiment 100,000 times. "
285 |    ]
286 |   },
287 |   {
288 |    "cell_type": "code",
289 |    "execution_count": null,
290 |    "metadata": {
291 |     "collapsed": true
292 |    },
293 |    "outputs": [],
294 |    "source": []
295 |   },
296 |   {
297 |    "cell_type": "markdown",
298 |    "metadata": {},
299 |    "source": [
300 |     "# Is the coin fair?"
301 |    ]
302 |   },
303 |   {
304 |    "cell_type": "code",
305 |    "execution_count": null,
306 |    "metadata": {
307 |     "collapsed": true
308 |    },
309 |    "outputs": [],
310 |    "source": []
311 |   },
312 |   {
313 |    "cell_type": "markdown",
314 |    "metadata": {},
315 |    "source": [
316 |     "### Extra pointers on numpy"
317 |    ]
318 |   },
319 |   {
320 |    "cell_type": "markdown",
321 |    "metadata": {},
322 |    "source": [
323 |     "**Removing the `for` loop in the function**"
324 |    ]
325 |   },
326 |   {
327 |    "cell_type": "code",
328 |    "execution_count": 10,
329 |    "metadata": {
330 |     "collapsed": false
331 |    },
332 |    "outputs": [],
333 |    "source": [
334 |     "def coin_toss_experiment_2(times_to_repeat):\n",
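335 |     "    # Toss every coin in one call: one row per experiment, one column per toss (0 = tails, 1 = heads).\n",
336 |     "    # Relies on the global total_tosses; no preallocated head_count array is needed.\n",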
337 |     "    experiment = np.random.randint(0,2,[times_to_repeat,total_tosses])\n",
338 |     "    return experiment.sum(axis=1)"
339 |    ]
340 |   },
341 |   {
342 |    "cell_type": "markdown",
343 |    "metadata": {},
344 |    "source": [
345 |     "#### Exercise: Benchmark `coin_toss_experiment` and `coin_toss_experiment_2` for 100 and 100,000 runs and report improvements, if any"
346 |    ]
347 |   },
348 |   {
349 |    "cell_type": "code",
350 |    "execution_count": null,
351 |    "metadata": {
352 |     "collapsed": true
353 |    },
354 |    "outputs": [],
355 |    "source": []
356 |   }
357 |  ],
358 |  "metadata": {
359 |   "kernelspec": {
360 |    "display_name": "Python 2",
361 |    "language": "python",
362 |    "name": "python2"
363 |   },
364 |   "language_info": {
365 |    "codemirror_mode": {
366 |     "name": "ipython",
367 |     "version": 2
368 |    },
369 |    "file_extension": ".py",
370 |    "mimetype": "text/x-python",
371 |    "name": "python",
372 |    "nbconvert_exporter": "python",
373 |    "pygments_lexer": "ipython2",
374 |    "version": "2.7.10"
375 |   }
376 |  },
377 |  "nbformat": 4,
378 |  "nbformat_minor": 0
379 | }
380 | 
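
The vectorised `coin_toss_experiment_2` in the notebook above replaces the per-experiment loop with a single `np.random.randint` call. The sketch below is one way to sanity-check that the two styles produce the same kind of result. The names `coin_toss_loop` and `coin_toss_vectorised` and the value `total_tosses = 30` are assumptions made for illustration; they are not taken from the notebook.

```python
import numpy as np

# Assumption: 30 tosses per experiment (the notebook sets total_tosses in an earlier cell).
total_tosses = 30


def coin_toss_loop(times_to_repeat):
    """Hypothetical loop version: one randint call per experiment."""
    head_count = np.empty(times_to_repeat, dtype=int)
    for i in range(times_to_repeat):
        # Each toss is 0 (tails) or 1 (heads); the sum is the number of heads.
        head_count[i] = np.random.randint(0, 2, total_tosses).sum()
    return head_count


def coin_toss_vectorised(times_to_repeat):
    """Same idea as coin_toss_experiment_2: draw all tosses at once."""
    experiment = np.random.randint(0, 2, [times_to_repeat, total_tosses])
    # Summing along axis 1 gives the number of heads per experiment (per row).
    return experiment.sum(axis=1)


loop_counts = coin_toss_loop(1000)
vector_counts = coin_toss_vectorised(1000)
print("loop shape: %s, vectorised shape: %s" % (loop_counts.shape, vector_counts.shape))
print("mean heads (loop): %.2f" % loop_counts.mean())
print("mean heads (vectorised): %.2f" % vector_counts.mean())
```

Both functions return a one-dimensional array with one head count per experiment, so either can feed the histogram and the `>= 24` filter used above. Timing the two is left to the benchmarking exercise.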


--------------------------------------------------------------------------------
/notebooks/3. Resampling.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |  "cells": [
   3 |   {
   4 |    "cell_type": "markdown",
   5 |    "metadata": {},
   6 |    "source": [
   7 |     "### Problem\n",
   8 |     "The weekly numbers of shoes sold by an e-commerce company during the first three months (12 weeks) of the year were:\n",
   9 |     "<br>\n",
  10 |     "23 21 19 24 35 17 18 24 33 27 21 23\n",
  11 |     "\n",
  12 |     "Meanwhile, the company developed some dynamic price optimization algorithms and the sales for the next 12 weeks were:\n",
  13 |     "<br>\n",
  14 |     "31 28 19 24 32 27 16 41 23 32 29 33\n",
  15 |     "\n",
  16 |     "Did the dynamic price optimization algorithm deliver superior results? Can it be trusted?\n",
  17 |     "\n",
  18 |     "### Solution\n",
  19 |     "\n",
  20 |     "Before we get into the different approaches, let's quickly get a feel for the data.\n",
  21 |     "\n"
  22 |    ]
  23 |   },
  24 |   {
  25 |    "cell_type": "code",
  26 |    "execution_count": 1,
  27 |    "metadata": {
  28 |     "collapsed": true
  29 |    },
  30 |    "outputs": [],
  31 |    "source": [
  32 |     "import numpy as np\n",
  33 |     "import seaborn as sns\n",
  34 |     "sns.set(color_codes=True)\n",
  35 |     "%matplotlib inline"
  36 |    ]
  37 |   },
  38 |   {
  39 |    "cell_type": "code",
  40 |    "execution_count": 2,
  41 |    "metadata": {
  42 |     "collapsed": true
  43 |    },
  44 |    "outputs": [],
  45 |    "source": [
  46 |     "#Load the data\n",
  47 |     "before_opt = np.array([23, 21, 19, 24, 35, 17, 18, 24, 33, 27, 21, 23])\n",
  48 |     "after_opt = np.array([31, 28, 19, 24, 32, 27, 16, 41, 23, 32, 29, 33])"
  49 |    ]
  50 |   },
  51 |   {
  52 |    "cell_type": "code",
  53 |    "execution_count": 3,
  54 |    "metadata": {
  55 |     "collapsed": false
  56 |    },
  57 |    "outputs": [
  58 |     {
  59 |      "data": {
  60 |       "text/plain": [
  61 |        "23.75"
  62 |       ]
  63 |      },
  64 |      "execution_count": 3,
  65 |      "metadata": {},
  66 |      "output_type": "execute_result"
  67 |     }
  68 |    ],
  69 |    "source": [
  70 |     "before_opt.mean()"
  71 |    ]
  72 |   },
  73 |   {
  74 |    "cell_type": "code",
  75 |    "execution_count": 4,
  76 |    "metadata": {
  77 |     "collapsed": false
  78 |    },
  79 |    "outputs": [
  80 |     {
  81 |      "data": {
  82 |       "text/plain": [
  83 |        "27.916666666666668"
  84 |       ]
  85 |      },
  86 |      "execution_count": 4,
  87 |      "metadata": {},
  88 |      "output_type": "execute_result"
  89 |     }
  90 |    ],
  91 |    "source": [
  92 |     "after_opt.mean()"
  93 |    ]
  94 |   },
  95 |   {
  96 |    "cell_type": "code",
  97 |    "execution_count": 5,
  98 |    "metadata": {
  99 |     "collapsed": true
 100 |    },
 101 |    "outputs": [],
 102 |    "source": [
 103 |     "observed_difference = after_opt.mean() - before_opt.mean()"
 104 |    ]
 105 |   },
 106 |   {
 107 |    "cell_type": "code",
 108 |    "execution_count": 6,
 109 |    "metadata": {
 110 |     "collapsed": false
 111 |    },
 112 |    "outputs": [
 113 |     {
 114 |      "name": "stdout",
 115 |      "output_type": "stream",
 116 |      "text": [
 117 |       "Difference between the means is: 4.16666666667\n"
 118 |      ]
 119 |     }
 120 |    ],
 121 |    "source": [
 122 |     "print \"Difference between the means is:\", observed_difference"
 123 |    ]
 124 |   },
 125 |   {
 126 |    "cell_type": "markdown",
 127 |    "metadata": {},
 128 |    "source": [
 129 |     "On average, sales after optimization are higher than sales before optimization. But is the difference real, or could it be due to chance?\n",
 130 |     "\n",
 131 |     "**Classical Method**: This entails doing a *t-test*. We will cover it later on.\n",
 132 |     "\n",
 133 |     "**Hacker's Method**: Let's see if we can provide a hacker's perspective on this problem, similar to what we did in the previous notebook."
 134 |    ]
 135 |   },
 136 |   {
 137 |    "cell_type": "code",
 138 |    "execution_count": null,
 139 |    "metadata": {
 140 |     "collapsed": true
 141 |    },
 142 |    "outputs": [],
 143 |    "source": [
 144 |     "#Step 1: Create the dataset. Let's give Label 0 to before_opt and Label 1 to after_opt"
 145 |    ]
 146 |   },
 147 |   {
 148 |    "cell_type": "code",
 149 |    "execution_count": null,
 150 |    "metadata": {
 151 |     "collapsed": true
 152 |    },
 153 |    "outputs": [],
 154 |    "source": [
 155 |     "#Learn about the following three functions"
 156 |    ]
 157 |   },
 158 |   {
 159 |    "cell_type": "code",
 160 |    "execution_count": null,
 161 |    "metadata": {
 162 |     "collapsed": true
 163 |    },
 164 |    "outputs": [],
 165 |    "source": [
 166 |     "?np.append"
 167 |    ]
 168 |   },
 169 |   {
 170 |    "cell_type": "code",
 171 |    "execution_count": null,
 172 |    "metadata": {
 173 |     "collapsed": true
 174 |    },
 175 |    "outputs": [],
 176 |    "source": [
 177 |     "?np.zeros"
 178 |    ]
 179 |   },
 180 |   {
 181 |    "cell_type": "code",
 182 |    "execution_count": null,
 183 |    "metadata": {
 184 |     "collapsed": true
 185 |    },
 186 |    "outputs": [],
 187 |    "source": [
 188 |     "?np.ones"
 189 |    ]
 190 |   },
 191 |   {
 192 |    "cell_type": "code",
 193 |    "execution_count": 7,
 194 |    "metadata": {
 195 |     "collapsed": false
 196 |    },
 197 |    "outputs": [],
 198 |    "source": [
 199 |     "shoe_sales = np.array([np.append(np.zeros(before_opt.shape[0]), np.ones(after_opt.shape[0])),\n",
 200 |     "np.append(before_opt, after_opt)], dtype=int)"
 201 |    ]
 202 |   },
 203 |   {
 204 |    "cell_type": "code",
 205 |    "execution_count": 8,
 206 |    "metadata": {
 207 |     "collapsed": false
 208 |    },
 209 |    "outputs": [
 210 |     {
 211 |      "name": "stdout",
 212 |      "output_type": "stream",
 213 |      "text": [
 214 |       "Shape: (2, 24)\n",
 215 |       "Data: \n",
 216 |       "[[ 0  0  0  0  0  0  0  0  0  0  0  0  1  1  1  1  1  1  1  1  1  1  1  1]\n",
 217 |       " [23 21 19 24 35 17 18 24 33 27 21 23 31 28 19 24 32 27 16 41 23 32 29 33]]\n"
 218 |      ]
 219 |     }
 220 |    ],
 221 |    "source": [
 222 |     "print \"Shape:\", shoe_sales.shape\n",
 223 |     "print \"Data:\", \"\\n\", shoe_sales"
 224 |    ]
 225 |   },
 226 |   {
 227 |    "cell_type": "code",
 228 |    "execution_count": 9,
 229 |    "metadata": {
 230 |     "collapsed": false
 231 |    },
 232 |    "outputs": [
 233 |     {
 234 |      "name": "stdout",
 235 |      "output_type": "stream",
 236 |      "text": [
 237 |       "Shape: (24, 2)\n",
 238 |       "Data: \n",
 239 |       "[[ 0 23]\n",
 240 |       " [ 0 21]\n",
 241 |       " [ 0 19]\n",
 242 |       " [ 0 24]\n",
 243 |       " [ 0 35]\n",
 244 |       " [ 0 17]\n",
 245 |       " [ 0 18]\n",
 246 |       " [ 0 24]\n",
 247 |       " [ 0 33]\n",
 248 |       " [ 0 27]\n",
 249 |       " [ 0 21]\n",
 250 |       " [ 0 23]\n",
 251 |       " [ 1 31]\n",
 252 |       " [ 1 28]\n",
 253 |       " [ 1 19]\n",
 254 |       " [ 1 24]\n",
 255 |       " [ 1 32]\n",
 256 |       " [ 1 27]\n",
 257 |       " [ 1 16]\n",
 258 |       " [ 1 41]\n",
 259 |       " [ 1 23]\n",
 260 |       " [ 1 32]\n",
 261 |       " [ 1 29]\n",
 262 |       " [ 1 33]]\n"
 263 |      ]
 264 |     }
 265 |    ],
 266 |    "source": [
 267 |     "shoe_sales = shoe_sales.T\n",
 268 |     "print \"Shape:\",shoe_sales.shape\n",
 269 |     "print \"Data:\", \"\\n\", shoe_sales"
 270 |    ]
 271 |   },
 272 |   {
 273 |    "cell_type": "code",
 274 |    "execution_count": 10,
 275 |    "metadata": {
 276 |     "collapsed": true
 277 |    },
 278 |    "outputs": [],
 279 |    "source": [
 280 |     "# This is the approach we are going to take:\n",
 281 |     "# Randomly shuffle the labels, then compute the difference between the means of the two groups.\n",
 282 |     "# Find the % of times the shuffled difference is at least as large as the one we observed above.\n",
 283 |     "# If that happens less than 5% of the time, we make the call that the improvement is real."
 284 |    ]
 285 |   },
 286 |   {
 287 |    "cell_type": "code",
 288 |    "execution_count": 11,
 289 |    "metadata": {
 290 |     "collapsed": true
 291 |    },
 292 |    "outputs": [],
 293 |    "source": [
 294 |     "np.random.shuffle(shoe_sales)"
 295 |    ]
 296 |   },
 297 |   {
 298 |    "cell_type": "code",
 299 |    "execution_count": 12,
 300 |    "metadata": {
 301 |     "collapsed": false
 302 |    },
 303 |    "outputs": [
 304 |     {
 305 |      "data": {
 306 |       "text/plain": [
 307 |        "array([[ 1, 29],\n",
 308 |        "       [ 1, 24],\n",
 309 |        "       [ 1, 19],\n",
 310 |        "       [ 1, 16],\n",
 311 |        "       [ 1, 28],\n",
 312 |        "       [ 1, 41],\n",
 313 |        "       [ 1, 27],\n",
 314 |        "       [ 0, 18],\n",
 315 |        "       [ 1, 33],\n",
 316 |        "       [ 0, 24],\n",
 317 |        "       [ 0, 21],\n",
 318 |        "       [ 1, 32],\n",
 319 |        "       [ 1, 31],\n",
 320 |        "       [ 0, 23],\n",
 321 |        "       [ 0, 19],\n",
 322 |        "       [ 0, 17],\n",
 323 |        "       [ 0, 27],\n",
 324 |        "       [ 0, 21],\n",
 325 |        "       [ 1, 32],\n",
 326 |        "       [ 0, 23],\n",
 327 |        "       [ 0, 24],\n",
 328 |        "       [ 0, 33],\n",
 329 |        "       [ 0, 35],\n",
 330 |        "       [ 1, 23]])"
 331 |       ]
 332 |      },
 333 |      "execution_count": 12,
 334 |      "metadata": {},
 335 |      "output_type": "execute_result"
 336 |     }
 337 |    ],
 338 |    "source": [
 339 |     "shoe_sales"
 340 |    ]
 341 |   },
 342 |   {
 343 |    "cell_type": "code",
 344 |    "execution_count": 13,
 345 |    "metadata": {
 346 |     "collapsed": true
 347 |    },
 348 |    "outputs": [],
 349 |    "source": [
 350 |     "experiment_label = np.random.randint(0,2,shoe_sales.shape[0])"
 351 |    ]
 352 |   },
 353 |   {
 354 |    "cell_type": "code",
 355 |    "execution_count": 14,
 356 |    "metadata": {
 357 |     "collapsed": false
 358 |    },
 359 |    "outputs": [
 360 |     {
 361 |      "data": {
 362 |       "text/plain": [
 363 |        "array([0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0,\n",
 364 |        "       1])"
 365 |       ]
 366 |      },
 367 |      "execution_count": 14,
 368 |      "metadata": {},
 369 |      "output_type": "execute_result"
 370 |     }
 371 |    ],
 372 |    "source": [
 373 |     "experiment_label"
 374 |    ]
 375 |   },
 376 |   {
 377 |    "cell_type": "code",
 378 |    "execution_count": 15,
 379 |    "metadata": {
 380 |     "collapsed": false
 381 |    },
 382 |    "outputs": [
 383 |     {
 384 |      "name": "stdout",
 385 |      "output_type": "stream",
 386 |      "text": [
 387 |       "[[ 0 29]\n",
 388 |       " [ 1 24]\n",
 389 |       " [ 1 19]\n",
 390 |       " [ 1 16]\n",
 391 |       " [ 1 28]\n",
 392 |       " [ 1 41]\n",
 393 |       " [ 0 27]\n",
 394 |       " [ 0 18]\n",
 395 |       " [ 0 33]\n",
 396 |       " [ 0 24]\n",
 397 |       " [ 0 21]\n",
 398 |       " [ 1 32]\n",
 399 |       " [ 0 31]\n",
 400 |       " [ 1 23]\n",
 401 |       " [ 0 19]\n",
 402 |       " [ 1 17]\n",
 403 |       " [ 0 27]\n",
 404 |       " [ 1 21]\n",
 405 |       " [ 1 32]\n",
 406 |       " [ 0 23]\n",
 407 |       " [ 1 24]\n",
 408 |       " [ 1 33]\n",
 409 |       " [ 0 35]\n",
 410 |       " [ 1 23]]\n"
 411 |      ]
 412 |     }
 413 |    ],
 414 |    "source": [
 415 |     "experiment_data = np.array([experiment_label, shoe_sales[:,1]])\n",
 416 |     "experiment_data = experiment_data.T\n",
 417 |     "print experiment_data"
 418 |    ]
 419 |   },
 420 |   {
 421 |    "cell_type": "code",
 422 |    "execution_count": 16,
 423 |    "metadata": {
 424 |     "collapsed": false
 425 |    },
 426 |    "outputs": [],
 427 |    "source": [
 428 |     "experiment_diff_mean =  experiment_data[experiment_data[:,0]==1, 1].mean() \\\n",
 429 |     "                        - experiment_data[experiment_data[:,0]==0, 1].mean()  # average the sales column only, not the labels"
 430 |    ]
 431 |   },
 432 |   {
 433 |    "cell_type": "code",
 434 |    "execution_count": 17,
 435 |    "metadata": {
 436 |     "collapsed": false
 437 |    },
 438 |    "outputs": [
 439 |     {
 440 |      "data": {
 441 |       "text/plain": [
 442 |        "0.26223776223776341"
 443 |       ]
 444 |      },
 445 |      "execution_count": 17,
 446 |      "metadata": {},
 447 |      "output_type": "execute_result"
 448 |     }
 449 |    ],
 450 |    "source": [
 451 |     "experiment_diff_mean"
 452 |    ]
 453 |   },
 454 |   {
 455 |    "cell_type": "code",
 456 |    "execution_count": 18,
 457 |    "metadata": {
 458 |     "collapsed": true
 459 |    },
 460 |    "outputs": [],
 461 |    "source": [
 462 |     "# As in the previous notebook, let's repeat this experiment 100 times and then 100,000 times"
 463 |    ]
 464 |   },
 465 |   {
 466 |    "cell_type": "code",
 467 |    "execution_count": 19,
 468 |    "metadata": {
 469 |     "collapsed": true
 470 |    },
 471 |    "outputs": [],
 472 |    "source": [
 473 |     "def shuffle_experiment(number_of_times):\n",
 474 |     "    experiment_diff_mean = np.empty([number_of_times,1])  # one difference per repetition\n",
 475 |     "    for times in np.arange(number_of_times):\n",
 476 |     "        experiment_label = np.random.randint(0,2,shoe_sales.shape[0])  # random 0/1 label for each week\n",
 477 |     "        experiment_data = np.array([experiment_label, shoe_sales[:,1]]).T\n",
 478 |     "        experiment_diff_mean[times] =  experiment_data[experiment_data[:,0]==1, 1].mean() \\\n",
 479 |     "                        - experiment_data[experiment_data[:,0]==0, 1].mean()  # sales column only\n",
 480 |     "    return experiment_diff_mean"
 481 |    ]
 482 |   },
 483 |   {
 484 |    "cell_type": "code",
 485 |    "execution_count": 20,
 486 |    "metadata": {
 487 |     "collapsed": false
 488 |    },
 489 |    "outputs": [],
 490 |    "source": [
 491 |     "experiment_diff_mean = shuffle_experiment(100)"
 492 |    ]
 493 |   },
 494 |   {
 495 |    "cell_type": "code",
 496 |    "execution_count": 21,
 497 |    "metadata": {
 498 |     "collapsed": false
 499 |    },
 500 |    "outputs": [
 501 |     {
 502 |      "data": {
 503 |       "text/plain": [
 504 |        "array([[ 1.83333333],\n",
 505 |        "       [ 0.7       ],\n",
 506 |        "       [-0.33333333],\n",
 507 |        "       [ 0.54444444],\n",
 508 |        "       [-1.0625    ],\n",
 509 |        "       [ 0.61428571],\n",
 510 |        "       [ 3.36713287],\n",
 511 |        "       [ 1.3       ],\n",
 512 |        "       [ 0.3       ],\n",
 513 |        "       [ 0.66666667]])"
 514 |       ]
 515 |      },
 516 |      "execution_count": 21,
 517 |      "metadata": {},
 518 |      "output_type": "execute_result"
 519 |     }
 520 |    ],
 521 |    "source": [
 522 |     "experiment_diff_mean[:10]"
 523 |    ]
 524 |   },
 525 |   {
 526 |    "cell_type": "code",
 527 |    "execution_count": 22,
 528 |    "metadata": {
 529 |     "collapsed": false
 530 |    },
 531 |    "outputs": [
 532 |     {
 533 |      "data": {
 534 |       "text/plain": [
 535 |        "<matplotlib.axes._subplots.AxesSubplot at 0x10a6c56d0>"
 536 |       ]
 537 |      },
 538 |      "execution_count": 22,
 539 |      "metadata": {},
 540 |      "output_type": "execute_result"
 541 |     },
 542 |     {
 543 |      "data": {
 544 |       "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXEAAAECCAYAAAAIMefLAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEARJREFUeJzt3X+M5Hddx/Hn7l5vZndm7kpyK2psIAH5BKMoVIIgoW0K\nBqKmQuAPU4kgBBASiqINVFJjgkjE1kCsRBCoYsTQpkDQADVn0yYkNAcWFDHv/iAkog3uHaWdmdvd\n+zHjHztn7sruzo+dme++756PpMn8+H6/n1e/O/ea73xnvt/vQr/fR5KU02LVASRJk7PEJSkxS1yS\nErPEJSkxS1ySErPEJSmxA7s9WUq5DPg48DSgBrwX+C7wj8CDg8k+HBGfnmVISdL2di1x4HpgLSJe\nW0p5CvAN4I+AWyLi1pmnkyTtaliJ3wHcObi9CJwGrgRKKeU64CHgHRHRmV1ESdJOFkY5YrOU0gI+\nB3wEqAPfiIgHSik3AU+JiN+fbUxJ0naGfrFZSrkC+BfgbyPiH4DPRMQDg6c/Czx3hvkkSbsY9sXm\nU4G7gbdGxD2Dh79YSnl7RBwDrgW+OmyQfr/fX1hY2HNYSbqEjFSau+5OKaV8EHgNEOc9/C7gFrb2\njz8KvGmEfeL9tbX2KHn2pdXVFlnzZ84O5q+a+auzutoaqcR33RKPiBuAG7Z56sWThJIkTZcH+0hS\nYpa4JCVmiUtSYpa4JCVmiUtSYpa4JCVmiUtSYpa4JCVmiUtSYpa4JCVmiUtSYpa4JCVmiUtSYsMu\nz6aLUK/Xo9vtVh0DgEajweKi2xLSpCzxS1C32+XosUeo1ZcrzbG5sc61z38GrVar0hxSZpb4JapW\nX2Z5pVl1DEl75OdYSUrMEpekxCxxSUrMEpekxCxxSUrMEpekxCxxSUrMEpekxCxxSUrMEpekxCxx\nSUrMEpekxCxxSUrMEpekxCxxSUrM84nrkjfLKx3Van3a7c7I03ulI43LEtclb5ZXOmo26nS6GyNN\n65WONAlLXGJ2VzpaadQ52/efmWbHz22SlJglLkmJWeKSlNiuO+tKKZcBHweeBtSA9wL/CdwO9IBv\nAm+LiP5sY0qStjNsS/x6YC0iXgK8HLgNuAW4afDYAnDdbCNKknYyrMTvAG4+b9rTwPMi4r7BY18A\nXjqjbJKkIXbdnRIRXYBSSoutQn8P8GfnTdIBDs8snSRpV0N/wFpKuQK4C7gtIj5VSvnT855uAT8Y\nZaDV1dwHMGTO/+TstVqfZqPOSqNeUaItSwtnOHKkyaFDu6/bWa/7Wa+PVnO05Y66PuYt82sf8ucf\nZtgXm08F7gbeGhH3DB5+oJRyVUTcC7wCODrKQGtr7T0FrdLqaitt/u2yt9sdOt2Nyg9CWT+5wfHj\nHTY3F3acZh7rfpbro9Ws0+6MdsTmKOtj3jK/9iF3/lHffIa9am9ia3fJzaWUc/vGbwA+VEo5CHwL\nuHPSkJKkvRm2T/wGtkr7ya6eSRpJ0lg82EeSErPEJSkxS1ySErPEJSkxS1ySErPEJSkxS1ySErPE\nJSkxS1ySErPEJSkxS1ySErPEJSkxS1ySErPEJSkxS1ySErPEJSkxS1ySErPEJSkxS1ySErPEJSkx\nS1ySErPEJSkxS1ySErPEJSkxS1ySErPEJSkxS1ySErPEJSkxS1ySErPEJSkxS1ySEjtQdQBdunq9\nHp1OZ9dparU+7fbu0+xVp9Oh3+vPdAxpVixxVebU5jr3ff0JDh2+fMdpmo06ne7GTHM8/tgJ6itN\nVmY6ijQblrgqVasvs7zS3PH5lUads/3Zvkw31rszXb40S+4Tl6TELHFJSmykz6mllBcA74+Ia0op\nzwU+Dzw0ePrDEfHpWQWUJO1saImXUm4EfgM49xOBK4FbI+LWWQaTJA03yu6Uh4FXAQuD+1cCv1xK\nubeU8tellJ2/lZIkzdTQEo+
Iu4Az5z10P/B7EXEV8G3gD2eUTZI0xCRfbH4mIh4Y3P4s8Nwp5pEk\njWGSH+B+sZTy9og4BlwLfHWUmVZXWxMMtX9kzv/k7LVan2ajzkqjXlGiLafW6ywuLdFq7p5j2PPz\nyjGpUZe7tHCGI0eaHDq0v15rmV/7kD//MOOU+Lnjkt8C3FZKOQ08CrxplJnX1tpjRts/VldbafNv\nl73d7tDpbsz8IJphuic3WFg8wIGDOx+R2WrWaXdme8TmKDkmNU7+9ZMbHD/eYXNzYfjEc5L5tQ+5\n84/65jPSv+KI+A7wosHtbwAvnjSYJGl6PNhHkhKzxCUpMUtckhKzxCUpMUtckhKzxCUpMUtckhKz\nxCUpMUtckhKzxCUpMUtckhKzxCUpMUtckhKzxCUpsWpPKC3p//V6PTqdzvAJ56DRaLC46DZeBpa4\ntE+c2lznvq8/waHDl1eaY3NjnWuf/wxarYv7ijgXC0tc2kdq9WWWV5pVx1Aifl6SpMQscUlKzBKX\npMQscUlKzBKXpMQscUlKzJ8YzlGv16Pb7c51zFqtT7t94QEknU6Hfq8/1xySZsMSn6Nut8vRY49Q\nqy/Pbcxmo06nu3HBY48/doL6SpOVuaWQNCuW+JzN+2COlUads/0L/8wb6/P9NCBpdtwnLkmJWeKS\nlJglLkmJWeKSlJglLkmJWeKSlJglLkmJWeKSlJglLkmJWeKSlJglLkmJjXTulFLKC4D3R8Q1pZRn\nArcDPeCbwNsiwlPiSVIFhm6Jl1JuBD4K1AYP3QrcFBEvARaA62YXT5K0m1F2pzwMvIqtwgZ4XkTc\nN7j9BeClswgmSRpuaIlHxF3AmfMeWjjvdgc4PO1QkqTRTPLFZu+82y3gB1PKIkka0yQXhXiglHJV\nRNwLvAI4OspMq6utCYbaP6aRv1br02zUWWnUp5BodK3mheOdWq+zuLT0Q4/P26g5Zp1z1utj1OXu\nl7/L0sIZjhxpcujQ1mvef7v72zglfu4XKO8EPlpKOQh8C7hzlJnX1tpjRts/VldbU8nfbnfodDd+\n6Eo7s9Rq1ml3Lrw8W/fkBguLBzhwcGOHueZjlBzb5a8ix6TGyb9f/i7rJzc4frzD5ubC1F77Vcmc\nf9Q3n5HaJCK+A7xocPsh4OoJc0mSpsiDfSQpMUtckhKzxCUpMUtckhKzxCUpMUtckhKzxCUpMUtc\nkhKzxCUpMUtckhKzxCUpMUtckhKzxCUpMUtckhKzxCUpMUtckhKzxCUpMUtckhKzxCUpMUtckhKz\nxCUpMUtckhI7UHUASXqyXq9Ht9vd83JqtT7tdmdPy2g0Giwu7t/tXUtc0r7T7XY5euwRavXlPS2n\n2ajT6W5MPP/mxjrXPv8ZtFqtPeWYJUtc0r5Uqy+zvNLc0zJWGnXO9i/umtu/nxEkSUNZ4pKUmCUu\nSYlZ4pKUmCUuSYlZ4pKUmCUuSYlZ4pKUmCUuSYlZ4pKUmCUuSYlZ4pKU2MRnhiml/Cvw+ODutyPi\nDdOJJEka1UQlXkqpA0TENdONI0kax6Rb4j8LrJRSvjRYxk0Rcf/0YkmSRjFpiXeBD0TEx0opPwl8\noZTyrIjoTTGbpAr0ej06na2r4UzjyjiT6HQ69Hv9uY+b0aQl/iDwMEBEPFRKOQH8GPDfO82wurp/\nr4wximnkr9X6NBt1Vhr1KSQaXat54Xin1ussLi390OPzNmqOWeec9foYdbn75+/yOF978HscvvwU\nfPv7lWR47LE1VlZaU1kXe1nG0sIZjhxpcujQ/u2vSUv89cBzgLeVUn4cOAQ8utsMa2vtCYeq3upq\nayr52+0One7GXK800mrWaXcuvDxV9+QGC4sHOHBw8stWTcMoObbLX0WOSY2Tf7/9Xc72D8xl/W/n\n7NlFOt3NPa+LveZfP7nB8eMdNjcX9pRjEqNuOE7aJh8DPlFKuW9w//XuSpGk+ZuoxCPiDPDaK
WeR\nJI3Jg30kKTFLXJISs8QlKTFLXJISs8QlKTFLXJISs8QlKTFLXJISs8QlKTFLXJISs8QlKTFLXJIS\ns8QlKTFLXJISs8QlKTFLXJISs8QlKTFLXJISs8QlKTFLXJISs8QlKTFLXJISOzCPQb73v2ucONGZ\nx1A7qh2scehQq9IMkjRtcynx+7/1fdrdjXkMtaPWgU1e+LxnV5pBkqZtLiV+sFbj4On+PIba0SKn\nKx1fkmbBfeKSlJglLkmJWeKSlJglLkmJWeKSlJglLkmJzeUnhvtBr9ej3W5PNG+t1qfd3vvBSp1O\nh36v2p9aSrq4XDIlvrmxztFjj1CrL489b7NRpzOFg5Uef+wE9ZUmK3tekiRtuWRKHKBWX2Z5pTn2\nfCuNOmf7e19VG+vdPS9Dks7nPnFJSswSl6TEJtpHUEpZBP4SeA6wCbwxIh6ZZjBJ0nCTbon/GnAw\nIl4EvAu4ZXqRJEmjmrTEfxH4IkBE3A/8/NQSSZJGNmmJHwKeOO/+2cEuFknSHE36u7kngPMvk7MY\nEb2dJt7snGCzU+1FIS7rn2bz9PpE8y4tnGH95N7zb25ssLC4xPrJ+V3laLvsVeTYzig5prXu95pj\nUuPk349/l3ms/2EZ9mKv+Tc3JuuMeZq0xL8M/CpwRynlF4B/223i61525cKE40iSdjFpiX8GeFkp\n5cuD+6+fUh5J0hgW+n3P5SFJWfllpCQlZolLUmKWuCQlZolLUmIzPxVtKaUB/D1wOXAK+M2I+J9Z\njzstpZTDwN+x9bv4g8DvRsRXqk01mVLKK4FXR8T1VWcZ5mI5P08p5QXA+yPimqqzjKqUchnwceBp\nQA14b0R8vtpUoyulLAEfBZ4F9IG3RMR/VJtqfKWUHwG+BlwbEQ/uNN08tsTfCByLiKvYKsMb5zDm\nNP0O8M8RcTXwOuC2StNMqJTyQeB9QJbf7Kc/P08p5Ua2yqRWdZYxXQ+sRcRLgJcDf1FxnnH9CtCL\niBcD7wH+uOI8Yxu8kf4VMPQiBDMv8Yg4Vx6w9c7+2KzHnLI/Bz4yuH0ZsP8P4drel4HfJk+JXwzn\n53kYeBV51vk5dwA3D24vAmcqzDK2iPgc8ObB3aeTr3MAPgB8GHh02IRT3Z1SSnkD8I4nPfy6iPha\nKeUo8NPAL01zzGkakv9HgU8CN8w/2eh2+X/4dCnl6goiTWrb8/PsdnqH/SYi7iqlPL3qHOOKiC5A\nKaXFVqH/QbWJxhcRZ0sptwOvBF5dcZyxlFJex9YnobtLKe9myEbAXA/2KaUU4J8i4plzG3QKSik/\nA3wKeGdEfKnqPJMalPibI+LXq84yTCnlFuArEXHH4P5/RcQVFcca26DEPxURL6w6yzhKKVcAdwG3\nRcTtFceZWCnlqcD9wLMjIsWn6FLKvWzty+8DPwcEcF1EfG+76efxxea7ge9GxCfZ2r+T6qNZKeWn\n2NoaeU1E/HvVeS4hY52fR9MzKL67gbdGxD1V5xlXKeW1wE9ExJ+wtfuzN/gvhcH3hwCUUu5ha8Nr\n2wKH+Vwo+WPA35RSfgtYIt95Vt7H1q9SPrT1QYIfRMQrq400sXPv7hlcTOfnybLOz7kJOAzcXEo5\nt2/8FRFR7alIR3cncPtgi/Yy4IaI2Kw408x47hRJSsyDfSQpMUtckhKzxCUpMUtckhKzxCUpMUtc\nkhKzxCUpMUtckhL7P8BDb+El45uCAAAAAElFTkSuQmCC\n",
 545 |       "text/plain": [
 546 |        "<matplotlib.figure.Figure at 0x106906f50>"
 547 |       ]
 548 |      },
 549 |      "metadata": {},
 550 |      "output_type": "display_data"
 551 |     }
 552 |    ],
 553 |    "source": [
 554 |     "sns.distplot(experiment_diff_mean, kde=False)"
 555 |    ]
 556 |   },
 557 |   {
 558 |    "cell_type": "code",
 559 |    "execution_count": 23,
 560 |    "metadata": {
 561 |     "collapsed": false
 562 |    },
 563 |    "outputs": [
 564 |     {
 565 |      "name": "stdout",
 566 |      "output_type": "stream",
 567 |      "text": [
 568 |       "Data: Difference in mean greater than observed: []\n",
 569 |       "Number of times diff in mean greater than observed: 0\n",
 570 |       "% of times diff in mean greater than observed: 0.0\n"
 571 |      ]
 572 |     }
 573 |    ],
 574 |    "source": [
 575 |     "# Find the % of times the difference of means is at least as large as the observed difference\n",
 576 |     "print \"Data: Difference in mean greater than observed:\", \\\n",
 577 |     "        experiment_diff_mean[experiment_diff_mean>=observed_difference]\n",
 578 |     "\n",
 579 |     "print \"Number of times diff in mean greater than observed:\", \\\n",
 580 |     "            experiment_diff_mean[experiment_diff_mean>=observed_difference].shape[0]\n",
 581 |     "print \"% of times diff in mean greater than observed:\", \\\n",
 582 |     "        experiment_diff_mean[experiment_diff_mean>=observed_difference].shape[0]/float(experiment_diff_mean.shape[0])*100"
 583 |    ]
 584 |   },
 585 |   {
 586 |    "cell_type": "markdown",
 587 |    "metadata": {},
 588 |    "source": [
 589 |     "#### Exercise: Repeat the above for 100,000 runs and report the results"
 590 |    ]
 591 |   },
 592 |   {
 593 |    "cell_type": "code",
 594 |    "execution_count": null,
 595 |    "metadata": {
 596 |     "collapsed": true
 597 |    },
 598 |    "outputs": [],
 599 |    "source": []
 600 |   },
 601 |   {
 602 |    "cell_type": "markdown",
 603 |    "metadata": {},
 604 |    "source": [
 605 |     "# Is the result by chance? "
 606 |    ]
 607 |   },
 608 |   {
 609 |    "cell_type": "markdown",
 610 |    "metadata": {},
 611 |    "source": [
 612 |     "### What is the justification for shuffling the labels? \n",
 613 |     "\n",
 614 |     ">The reasoning is this: if price optimization had no real effect, the before/after labels carry no information, and any week's sales could just as well have come from either period. By shuffling the labels, we simulate exactly that situation. If many shuffled trials show a difference at least as large as the one we observed, then we cannot claim that the price optimization had an effect. In statistical terms, *the observed difference could have occurred by chance*. \n",
 615 |     "\n",
 616 |     "Now, to show that the same difference in means might lead to a different conclusion, let's try the same experiment with a different dataset. "
 617 |    ]
 618 |   },
 619 |   {
 620 |    "cell_type": "code",
 621 |    "execution_count": 24,
 622 |    "metadata": {
 623 |     "collapsed": true
 624 |    },
 625 |    "outputs": [],
 626 |    "source": [
 627 |     "before_opt = np.array([230, 210, 190, 240, 350, 170, 180, 240, 330, 270, 210, 230])\n",
 628 |     "after_opt = np.array([310, 180, 190, 240, 220, 240, 160, 410, 130, 320, 290, 210])"
 629 |    ]
 630 |   },
 631 |   {
 632 |    "cell_type": "code",
 633 |    "execution_count": 25,
 634 |    "metadata": {
 635 |     "collapsed": false
 636 |    },
 637 |    "outputs": [
 638 |     {
 639 |      "name": "stdout",
 640 |      "output_type": "stream",
 641 |      "text": [
 642 |       "Mean sales before price optimization: 237.5\n",
 643 |       "Mean sales after price optimization: 241.666666667\n",
 644 |       "Difference in mean sales: 4.16666666667\n"
 645 |      ]
 646 |     }
 647 |    ],
 648 |    "source": [
 649 |     "print \"Mean sales before price optimization:\", np.mean(before_opt)\n",
 650 |     "print \"Mean sales after price optimization:\", np.mean(after_opt)\n",
 651 |     "print \"Difference in mean sales:\", np.mean(after_opt) - np.mean(before_opt) #Same as above"
 652 |    ]
 653 |   },
 654 |   {
 655 |    "cell_type": "code",
 656 |    "execution_count": 26,
 657 |    "metadata": {
 658 |     "collapsed": true
 659 |    },
 660 |    "outputs": [],
 661 |    "source": [
 662 |     "shoe_sales = np.array([np.append(np.zeros(before_opt.shape[0]), np.ones(after_opt.shape[0])),\n",
 663 |     "np.append(before_opt, after_opt)], dtype=int)\n",
 664 |     "shoe_sales = shoe_sales.T"
 665 |    ]
 666 |   },
 667 |   {
 668 |    "cell_type": "code",
 669 |    "execution_count": 27,
 670 |    "metadata": {
 671 |     "collapsed": false
 672 |    },
 673 |    "outputs": [
 674 |     {
 675 |      "data": {
 676 |       "text/plain": [
 677 |        "<matplotlib.axes._subplots.AxesSubplot at 0x10b414050>"
 678 |       ]
 679 |      },
 680 |      "execution_count": 27,
 681 |      "metadata": {},
 682 |      "output_type": "execute_result"
 683 |     },
 684 |     {
 685 |      "data": {
 687 |       "text/plain": [
 688 |        "<matplotlib.figure.Figure at 0x10b47c490>"
 689 |       ]
 690 |      },
 691 |      "metadata": {},
 692 |      "output_type": "display_data"
 693 |     }
 694 |    ],
 695 |    "source": [
 696 |     "experiment_diff_mean = shuffle_experiment(100000)\n",
 697 |     "sns.distplot(experiment_diff_mean, kde=False)"
 698 |    ]
 699 |   },
 700 |   {
 701 |    "cell_type": "code",
 702 |    "execution_count": 28,
 703 |    "metadata": {
 704 |     "collapsed": false
 705 |    },
 706 |    "outputs": [
 707 |     {
 708 |      "name": "stdout",
 709 |      "output_type": "stream",
 710 |      "text": [
 711 |       "Number of times diff in mean greater than observed: 40473\n",
 712 |       "% of times diff in mean greater than observed: 40.473\n"
 713 |      ]
 714 |     }
 715 |    ],
 716 |    "source": [
 717 |     "#Finding % of times difference of means is greater than observed\n",
 718 |     "print \"Number of times diff in mean greater than observed:\", \\\n",
 719 |     "            experiment_diff_mean[experiment_diff_mean>=observed_difference].shape[0]\n",
 720 |     "print \"% of times diff in mean greater than observed:\", \\\n",
 721 |     "        experiment_diff_mean[experiment_diff_mean>=observed_difference].shape[0]/float(experiment_diff_mean.shape[0])*100"
 722 |    ]
 723 |   },
 724 |   {
 725 |    "cell_type": "markdown",
 726 |    "metadata": {},
 727 |    "source": [
 728 |     "### Did the conclusion change now? "
 729 |    ]
 730 |   },
 731 |   {
 732 |    "cell_type": "code",
 733 |    "execution_count": null,
 734 |    "metadata": {
 735 |     "collapsed": true
 736 |    },
 737 |    "outputs": [],
 738 |    "source": []
 739 |   },
 740 |   {
 741 |    "cell_type": "markdown",
 742 |    "metadata": {},
 743 |    "source": [
 744 |     "# Effect Size\n",
 745 |     "\n",
 746 |     "> **Because you can't argue with all the fools in the world. It's easier to let them have their way, then trick them when they're not paying attention**  - Christopher Paolini\n",
 747 |     "\n",
 748 |     "In the first case, how much did the price optimization increase the sales on average?"
 749 |    ]
 750 |   },
 751 |   {
 752 |    "cell_type": "code",
 753 |    "execution_count": 29,
 754 |    "metadata": {
 755 |     "collapsed": false
 756 |    },
 757 |    "outputs": [
 758 |     {
 759 |      "name": "stdout",
 760 |      "output_type": "stream",
 761 |      "text": [
 762 |       "The % increase of sales in the first case: 17.5438596491 %\n"
 763 |      ]
 764 |     }
 765 |    ],
 766 |    "source": [
 767 |     "before_opt = np.array([23, 21, 19, 24, 35, 17, 18, 24, 33, 27, 21, 23])\n",
 768 |     "after_opt = np.array([31, 28, 19, 24, 32, 27, 16, 41, 23, 32, 29, 33])\n",
 769 |     "\n",
 770 |     "print \"The % increase of sales in the first case:\", \\\n",
 771 |     "(np.mean(after_opt) - np.mean(before_opt))/np.mean(before_opt)*100,\"%\""
 772 |    ]
 773 |   },
 774 |   {
 775 |    "cell_type": "code",
 776 |    "execution_count": 30,
 777 |    "metadata": {
 778 |     "collapsed": false
 779 |    },
 780 |    "outputs": [
 781 |     {
 782 |      "name": "stdout",
 783 |      "output_type": "stream",
 784 |      "text": [
 785 |       "The % increase of sales in the second case: 1.75438596491 %\n"
 786 |      ]
 787 |     }
 788 |    ],
 789 |    "source": [
 790 |     "before_opt = np.array([230, 210, 190, 240, 350, 170, 180, 240, 330, 270, 210, 230])\n",
 791 |     "after_opt = np.array([310, 180, 190, 240, 220, 240, 160, 410, 130, 320, 290, 210])\n",
 792 |     "\n",
 793 |     "print \"The % increase of sales in the second case:\", \\\n",
 794 |     "(np.mean(after_opt) - np.mean(before_opt))/np.mean(before_opt)*100,\"%\""
 795 |    ]
 796 |   },
 797 |   {
 798 |    "cell_type": "markdown",
 799 |    "metadata": {},
 800 |    "source": [
 801 |     "**Would the business feel comfortable spending millions of dollars if the increase is going to be just 1.75%? Does it make sense? Maybe yes - if margins are thin and any increase is considered good. But if the returns from the price optimization module do not let the company break even, it makes no sense to take that path.**"
 802 |    ]
 803 |   },
 804 |   {
 805 |    "cell_type": "markdown",
 806 |    "metadata": {},
 807 |    "source": [
 808 |     "> Someone tells you the result is statistically significant. The first question you should ask?\n",
 809 |     "\n",
 810 |     "# How large is the effect?\n",
 811 |     "\n",
 812 |     "To answer that question, we will use the concept of a **confidence interval**.\n",
 813 |     "\n",
 814 |     "In plain English, a *confidence interval* is a range of values within which the metric we are measuring is likely to fall. \n",
 815 |     "\n",
 816 |     "An example would be: 90% of the time, the increase in average sales (before and after price optimization) would be between `3.4` and `6.7`. (These numbers are illustrative. We will derive the actual numbers below.)\n",
 817 |     "\n",
 818 |     "What is the *hacker's way* of doing it? We will do the following steps:\n",
 819 |     "\n",
 820 |     "1. From the actual sales data, we sample with replacement (separately for before and after). The sample size is the same as the original.\n",
 821 |     "2. Find the difference between the means of the two samples.\n",
 822 |     "3. Repeat steps 1 and 2, say, 100,000 times.\n",
 823 |     "4. Sort the differences. To get the 90% interval, take the 5th and 95th percentiles. That range gives you the 90% confidence interval on the difference in means.\n",
 824 |     "5. This process of generating the samples is called **bootstrapping**."
 825 |    ]
 826 |   },
 827 |   {
 828 |    "cell_type": "code",
 829 |    "execution_count": 31,
 830 |    "metadata": {
 831 |     "collapsed": true
 832 |    },
 833 |    "outputs": [],
 834 |    "source": [
 835 |     "#Load the data\n",
 836 |     "before_opt = np.array([23, 21, 19, 24, 35, 17, 18, 24, 33, 27, 21, 23])\n",
 837 |     "after_opt = np.array([31, 28, 19, 24, 32, 27, 16, 41, 23, 32, 29, 33])"
 838 |    ]
 839 |   },
 840 |   {
 841 |    "cell_type": "code",
 842 |    "execution_count": 32,
 843 |    "metadata": {
 844 |     "collapsed": false
 845 |    },
 846 |    "outputs": [],
 847 |    "source": [
 848 |     "#Generate a bootstrap sample: draw uniformly at random, with replacement\n",
 849 |     "random_before_opt = np.random.choice(before_opt, size=before_opt.size, replace=True)"
 850 |    ]
 851 |   },
 852 |   {
 853 |    "cell_type": "code",
 854 |    "execution_count": 33,
 855 |    "metadata": {
 856 |     "collapsed": false
 857 |    },
 858 |    "outputs": [
 859 |     {
 860 |      "name": "stdout",
 861 |      "output_type": "stream",
 862 |      "text": [
 863 |       "Actual sample before optimization: [23 21 19 24 35 17 18 24 33 27 21 23]\n",
 864 |       "Bootstrapped sample before optimization:  [21 17 19 21 33 27 24 18 18 19 24 24]\n"
 865 |      ]
 866 |     }
 867 |    ],
 868 |    "source": [
 869 |     "print \"Actual sample before optimization:\", before_opt\n",
 870 |     "print \"Bootstrapped sample before optimization: \", random_before_opt"
 871 |    ]
 872 |   },
 873 |   {
 874 |    "cell_type": "code",
 875 |    "execution_count": 34,
 876 |    "metadata": {
 877 |     "collapsed": false
 878 |    },
 879 |    "outputs": [
 880 |     {
 881 |      "name": "stdout",
 882 |      "output_type": "stream",
 883 |      "text": [
 884 |       "Mean for actual sample: 23.75\n",
 885 |       "Mean for bootstrapped sample: 22.0833333333\n"
 886 |      ]
 887 |     }
 888 |    ],
 889 |    "source": [
 890 |     "print \"Mean for actual sample:\", np.mean(before_opt)\n",
 891 |     "print \"Mean for bootstrapped sample:\", np.mean(random_before_opt)"
 892 |    ]
 893 |   },
 894 |   {
 895 |    "cell_type": "code",
 896 |    "execution_count": 35,
 897 |    "metadata": {
 898 |     "collapsed": false
 899 |    },
 900 |    "outputs": [
 901 |     {
 902 |      "name": "stdout",
 903 |      "output_type": "stream",
 904 |      "text": [
 905 |       "Actual sample after optimization: [31 28 19 24 32 27 16 41 23 32 29 33]\n",
 906 |       "Bootstrapped sample after optimization:  [33 41 27 32 28 41 33 41 41 31 29 19]\n",
 907 |       "Mean for actual sample: 27.9166666667\n",
 908 |       "Mean for bootstrapped sample: 33.0\n"
 909 |      ]
 910 |     }
 911 |    ],
 912 |    "source": [
 913 |     "random_after_opt = np.random.choice(after_opt, size=after_opt.size, replace=True)\n",
 914 |     "print \"Actual sample after optimization:\", after_opt\n",
 915 |     "print \"Bootstrapped sample after optimization: \", random_after_opt\n",
 916 |     "print \"Mean for actual sample:\", np.mean(after_opt)\n",
 917 |     "print \"Mean for bootstrapped sample:\", np.mean(random_after_opt)"
 918 |    ]
 919 |   },
 920 |   {
 921 |    "cell_type": "code",
 922 |    "execution_count": 36,
 923 |    "metadata": {
 924 |     "collapsed": false
 925 |    },
 926 |    "outputs": [
 927 |     {
 928 |      "name": "stdout",
 929 |      "output_type": "stream",
 930 |      "text": [
 931 |       "Difference in means of actual samples: 4.16666666667\n",
 932 |       "Difference in means of bootstrapped samples: 10.9166666667\n"
 933 |      ]
 934 |     }
 935 |    ],
 936 |    "source": [
 937 |     "print \"Difference in means of actual samples:\", np.mean(after_opt) - np.mean(before_opt)\n",
 938 |     "print \"Difference in means of bootstrapped samples:\", np.mean(random_after_opt) - np.mean(random_before_opt)"
 939 |    ]
 940 |   },
 941 |   {
 942 |    "cell_type": "code",
 943 |    "execution_count": 37,
 944 |    "metadata": {
 945 |     "collapsed": true
 946 |    },
 947 |    "outputs": [],
 948 |    "source": [
 949 |     "#Like always, we will repeat this experiment 100,000 times. \n",
 950 |     "\n",
 951 |     "def bootstrap_experiment(number_of_times):\n",
 952 |     "    mean_difference = np.empty([number_of_times,1])\n",
 953 |     "    for times in np.arange(number_of_times):\n",
 954 |     "        random_before_opt = np.random.choice(before_opt, size=before_opt.size, replace=True)\n",
 955 |     "        random_after_opt = np.random.choice(after_opt, size=after_opt.size, replace=True)\n",
 956 |     "        mean_difference[times] = np.mean(random_after_opt) - np.mean(random_before_opt)\n",
 957 |     "    return mean_difference"
 958 |    ]
 959 |   },
 960 |   {
 961 |    "cell_type": "code",
 962 |    "execution_count": 38,
 963 |    "metadata": {
 964 |     "collapsed": false
 965 |    },
 966 |    "outputs": [
 967 |     {
 968 |      "data": {
 969 |       "text/plain": [
 970 |        "<matplotlib.axes._subplots.AxesSubplot at 0x10b5a60d0>"
 971 |       ]
 972 |      },
 973 |      "execution_count": 38,
 974 |      "metadata": {},
 975 |      "output_type": "execute_result"
 976 |     },
 977 |     {
 978 |      "data": {
 980 |       "text/plain": [
 981 |        "<matplotlib.figure.Figure at 0x10bbb79d0>"
 982 |       ]
 983 |      },
 984 |      "metadata": {},
 985 |      "output_type": "display_data"
 986 |     }
 987 |    ],
 988 |    "source": [
 989 |     "mean_difference = bootstrap_experiment(100000)\n",
 990 |     "sns.distplot(mean_difference, kde=False)"
 991 |    ]
 992 |   },
 993 |   {
 994 |    "cell_type": "code",
 995 |    "execution_count": 39,
 996 |    "metadata": {
 997 |     "collapsed": false
 998 |    },
 999 |    "outputs": [],
1000 |    "source": [
1001 |     "mean_difference = np.sort(mean_difference, axis=0)"
1002 |    ]
1003 |   },
1004 |   {
1005 |    "cell_type": "code",
1006 |    "execution_count": 40,
1007 |    "metadata": {
1008 |     "collapsed": false
1009 |    },
1010 |    "outputs": [
1011 |     {
1012 |      "data": {
1013 |       "text/plain": [
1014 |        "array([[ -6.66666667],\n",
1015 |        "       [ -6.33333333],\n",
1016 |        "       [ -6.08333333],\n",
1017 |        "       ..., \n",
1018 |        "       [ 13.16666667],\n",
1019 |        "       [ 13.16666667],\n",
1020 |        "       [ 15.        ]])"
1021 |       ]
1022 |      },
1023 |      "execution_count": 40,
1024 |      "metadata": {},
1025 |      "output_type": "execute_result"
1026 |     }
1027 |    ],
1028 |    "source": [
1029 |     "mean_difference #Sorted difference"
1030 |    ]
1031 |   },
1032 |   {
1033 |    "cell_type": "code",
1034 |    "execution_count": 41,
1035 |    "metadata": {
1036 |     "collapsed": false
1037 |    },
1038 |    "outputs": [
1039 |     {
1040 |      "data": {
1041 |       "text/plain": [
1042 |        "array([ 0.16666667,  8.08333333])"
1043 |       ]
1044 |      },
1045 |      "execution_count": 41,
1046 |      "metadata": {},
1047 |      "output_type": "execute_result"
1048 |     }
1049 |    ],
1050 |    "source": [
1051 |     "np.percentile(mean_difference, [5,95])"
1052 |    ]
1053 |   },
1054 |   {
1055 |    "cell_type": "markdown",
1056 |    "metadata": {},
1057 |    "source": [
1058 |     "Reiterating what this means: 90% of the time, the bootstrapped difference in means falls between the limits shown above."
1059 |    ]
1060 |   },
1061 |   {
1062 |    "cell_type": "markdown",
1063 |    "metadata": {},
1064 |    "source": [
1065 |     "**Exercise: Find the percentiles that give the 95% confidence interval**"
1066 |    ]
1067 |   },
1068 |   {
1069 |    "cell_type": "code",
1070 |    "execution_count": null,
1071 |    "metadata": {
1072 |     "collapsed": true
1073 |    },
1074 |    "outputs": [],
1075 |    "source": []
1076 |   },
1077 |   {
1078 |    "cell_type": "markdown",
1079 |    "metadata": {},
1080 |    "source": [
1081 |     "### Where do we go from here? \n",
1082 |     "\n",
1083 |     "First of all, there are two questions to address.\n",
1084 |     "\n",
1085 |     "1. Why do we need significance testing if confidence intervals can provide us more information?\n",
1086 |     "2. How does it relate to the traditional statistical procedure of finding confidence intervals?\n",
1087 |     "\n",
1088 |     "For the first one:\n",
1089 |     "\n",
1090 |     "What if sales in the first month after the price change were 80 and sales in the month before were 40? The difference is 40. A confidence interval built as explained above, by sampling with replacement, would always generate 40, because each group holds only one value. But if we do the significance testing as detailed above, where the labels are shuffled, each value is equally likely to land in either group. And so, significance testing would answer that there is no significant difference. But don't we all know that the data is **too small** to make meaningful inferences?\n",
1091 |     "\n",
1092 |     "For the second one:\n",
1093 |     "\n",
1094 |     "Traditional statistical derivations assume a normal distribution. But what if the underlying distribution isn't normal? Also, people relate to resampling much better :-) "
1095 |    ]
1096 |   }
1097 |  ],
1098 |  "metadata": {
1099 |   "kernelspec": {
1100 |    "display_name": "Python 2",
1101 |    "language": "python",
1102 |    "name": "python2"
1103 |   },
1104 |   "language_info": {
1105 |    "codemirror_mode": {
1106 |     "name": "ipython",
1107 |     "version": 2
1108 |    },
1109 |    "file_extension": ".py",
1110 |    "mimetype": "text/x-python",
1111 |    "name": "python",
1112 |    "nbconvert_exporter": "python",
1113 |    "pygments_lexer": "ipython2",
1114 |    "version": "2.7.10"
1115 |   }
1116 |  },
1117 |  "nbformat": 4,
1118 |  "nbformat_minor": 0
1119 | }
1120 | 


--------------------------------------------------------------------------------
/notebooks/4. Basic Metrics.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |  "cells": [
   3 |   {
   4 |    "cell_type": "markdown",
   5 |    "metadata": {},
   6 |    "source": [
   7 |     "# Basic Metrics\n",
   8 |     "\n",
   9 |     "When we think about summarizing data, what are the metrics that we look at?\n",
  10 |     "\n",
  11 |     "In this notebook, we will look at the weed price dataset along with demographic information for the United States.\n",
  12 |     "\n",
  13 |     "To learn how the data was acquired, see [this notebook](https://github.com/amitkaps/weed/blob/master/1-Acquire.ipynb).\n",
  14 |     "\n",
  15 |     "This notebook will make use of pandas quite a bit."
  16 |    ]
  17 |   },
  18 |   {
  19 |    "cell_type": "code",
  20 |    "execution_count": 1,
  21 |    "metadata": {
  22 |     "collapsed": true
  23 |    },
  24 |    "outputs": [],
  25 |    "source": [
  26 |     "import numpy as np\n",
  27 |     "import pandas as pd\n",
  28 |     "from datetime import datetime as dt\n",
  29 |     "from scipy import stats"
  30 |    ]
  31 |   },
  32 |   {
  33 |    "cell_type": "markdown",
  34 |    "metadata": {},
  35 |    "source": [
  36 |     "### Read the input datasets. There are three datasets:\n",
  37 |     "\n",
  38 |     "1. Weed price by date / state\n",
  39 |     "2. Demographics of State\n",
  40 |     "3. Population of state"
  41 |    ]
  42 |   },
  43 |   {
  44 |    "cell_type": "code",
  45 |    "execution_count": 2,
  46 |    "metadata": {
  47 |     "collapsed": false
  48 |    },
  49 |    "outputs": [],
  50 |    "source": [
  51 |     "prices_pd = pd.read_csv(\"../data/Weed_Price.csv\", parse_dates=[-1])\n",
  52 |     "demography_pd = pd.read_csv(\"../data/Demographics_State.csv\")\n",
  53 |     "population_pd = pd.read_csv(\"../data/Population_State.csv\")"
  54 |    ]
  55 |   },
  56 |   {
  57 |    "cell_type": "code",
  58 |    "execution_count": 3,
  59 |    "metadata": {
  60 |     "collapsed": false
  61 |    },
  62 |    "outputs": [
  63 |     {
  64 |      "data": {
  65 |       "text/html": [
  66 |        "<div>\n",
  67 |        "<table border=\"1\" class=\"dataframe\">\n",
  68 |        "  <thead>\n",
  69 |        "    <tr style=\"text-align: right;\">\n",
  70 |        "      <th></th>\n",
  71 |        "      <th>State</th>\n",
  72 |        "      <th>HighQ</th>\n",
  73 |        "      <th>HighQN</th>\n",
  74 |        "      <th>MedQ</th>\n",
  75 |        "      <th>MedQN</th>\n",
  76 |        "      <th>LowQ</th>\n",
  77 |        "      <th>LowQN</th>\n",
  78 |        "      <th>date</th>\n",
  79 |        "    </tr>\n",
  80 |        "  </thead>\n",
  81 |        "  <tbody>\n",
  82 |        "    <tr>\n",
  83 |        "      <th>0</th>\n",
  84 |        "      <td>Alabama</td>\n",
  85 |        "      <td>339.06</td>\n",
  86 |        "      <td>1042</td>\n",
  87 |        "      <td>198.64</td>\n",
  88 |        "      <td>933</td>\n",
  89 |        "      <td>149.49</td>\n",
  90 |        "      <td>123</td>\n",
  91 |        "      <td>2014-01-01</td>\n",
  92 |        "    </tr>\n",
  93 |        "    <tr>\n",
  94 |        "      <th>1</th>\n",
  95 |        "      <td>Alaska</td>\n",
  96 |        "      <td>288.75</td>\n",
  97 |        "      <td>252</td>\n",
  98 |        "      <td>260.60</td>\n",
  99 |        "      <td>297</td>\n",
 100 |        "      <td>388.58</td>\n",
 101 |        "      <td>26</td>\n",
 102 |        "      <td>2014-01-01</td>\n",
 103 |        "    </tr>\n",
 104 |        "    <tr>\n",
 105 |        "      <th>2</th>\n",
 106 |        "      <td>Arizona</td>\n",
 107 |        "      <td>303.31</td>\n",
 108 |        "      <td>1941</td>\n",
 109 |        "      <td>209.35</td>\n",
 110 |        "      <td>1625</td>\n",
 111 |        "      <td>189.45</td>\n",
 112 |        "      <td>222</td>\n",
 113 |        "      <td>2014-01-01</td>\n",
 114 |        "    </tr>\n",
 115 |        "    <tr>\n",
 116 |        "      <th>3</th>\n",
 117 |        "      <td>Arkansas</td>\n",
 118 |        "      <td>361.85</td>\n",
 119 |        "      <td>576</td>\n",
 120 |        "      <td>185.62</td>\n",
 121 |        "      <td>544</td>\n",
 122 |        "      <td>125.87</td>\n",
 123 |        "      <td>112</td>\n",
 124 |        "      <td>2014-01-01</td>\n",
 125 |        "    </tr>\n",
 126 |        "    <tr>\n",
 127 |        "      <th>4</th>\n",
 128 |        "      <td>California</td>\n",
 129 |        "      <td>248.78</td>\n",
 130 |        "      <td>12096</td>\n",
 131 |        "      <td>193.56</td>\n",
 132 |        "      <td>12812</td>\n",
 133 |        "      <td>192.92</td>\n",
 134 |        "      <td>778</td>\n",
 135 |        "      <td>2014-01-01</td>\n",
 136 |        "    </tr>\n",
 137 |        "  </tbody>\n",
 138 |        "</table>\n",
 139 |        "</div>"
 140 |       ],
 141 |       "text/plain": [
 142 |        "        State   HighQ  HighQN    MedQ  MedQN    LowQ  LowQN       date\n",
 143 |        "0     Alabama  339.06    1042  198.64    933  149.49    123 2014-01-01\n",
 144 |        "1      Alaska  288.75     252  260.60    297  388.58     26 2014-01-01\n",
 145 |        "2     Arizona  303.31    1941  209.35   1625  189.45    222 2014-01-01\n",
 146 |        "3    Arkansas  361.85     576  185.62    544  125.87    112 2014-01-01\n",
 147 |        "4  California  248.78   12096  193.56  12812  192.92    778 2014-01-01"
 148 |       ]
 149 |      },
 150 |      "execution_count": 3,
 151 |      "metadata": {},
 152 |      "output_type": "execute_result"
 153 |     }
 154 |    ],
 155 |    "source": [
 156 |     "prices_pd.head()"
 157 |    ]
 158 |   },
 159 |   {
 160 |    "cell_type": "code",
 161 |    "execution_count": 4,
 162 |    "metadata": {
 163 |     "collapsed": false
 164 |    },
 165 |    "outputs": [
 166 |     {
 167 |      "data": {
 168 |       "text/html": [
 169 |        "<div>\n",
 170 |        "<table border=\"1\" class=\"dataframe\">\n",
 171 |        "  <thead>\n",
 172 |        "    <tr style=\"text-align: right;\">\n",
 173 |        "      <th></th>\n",
 174 |        "      <th>State</th>\n",
 175 |        "      <th>HighQ</th>\n",
 176 |        "      <th>HighQN</th>\n",
 177 |        "      <th>MedQ</th>\n",
 178 |        "      <th>MedQN</th>\n",
 179 |        "      <th>LowQ</th>\n",
 180 |        "      <th>LowQN</th>\n",
 181 |        "      <th>date</th>\n",
 182 |        "    </tr>\n",
 183 |        "  </thead>\n",
 184 |        "  <tbody>\n",
 185 |        "    <tr>\n",
 186 |        "      <th>22894</th>\n",
 187 |        "      <td>Virginia</td>\n",
 188 |        "      <td>364.98</td>\n",
 189 |        "      <td>3513</td>\n",
 190 |        "      <td>293.12</td>\n",
 191 |        "      <td>3079</td>\n",
 192 |        "      <td>NaN</td>\n",
 193 |        "      <td>284</td>\n",
 194 |        "      <td>2014-12-31</td>\n",
 195 |        "    </tr>\n",
 196 |        "    <tr>\n",
 197 |        "      <th>22895</th>\n",
 198 |        "      <td>Washington</td>\n",
 199 |        "      <td>233.05</td>\n",
 200 |        "      <td>3337</td>\n",
 201 |        "      <td>189.92</td>\n",
 202 |        "      <td>3562</td>\n",
 203 |        "      <td>NaN</td>\n",
 204 |        "      <td>160</td>\n",
 205 |        "      <td>2014-12-31</td>\n",
 206 |        "    </tr>\n",
 207 |        "    <tr>\n",
 208 |        "      <th>22896</th>\n",
 209 |        "      <td>West Virginia</td>\n",
 210 |        "      <td>359.35</td>\n",
 211 |        "      <td>551</td>\n",
 212 |        "      <td>224.03</td>\n",
 213 |        "      <td>545</td>\n",
 214 |        "      <td>NaN</td>\n",
 215 |        "      <td>60</td>\n",
 216 |        "      <td>2014-12-31</td>\n",
 217 |        "    </tr>\n",
 218 |        "    <tr>\n",
 219 |        "      <th>22897</th>\n",
 220 |        "      <td>Wisconsin</td>\n",
 221 |        "      <td>350.52</td>\n",
 222 |        "      <td>2244</td>\n",
 223 |        "      <td>272.71</td>\n",
 224 |        "      <td>2221</td>\n",
 225 |        "      <td>NaN</td>\n",
 226 |        "      <td>167</td>\n",
 227 |        "      <td>2014-12-31</td>\n",
 228 |        "    </tr>\n",
 229 |        "    <tr>\n",
 230 |        "      <th>22898</th>\n",
 231 |        "      <td>Wyoming</td>\n",
 232 |        "      <td>322.27</td>\n",
 233 |        "      <td>131</td>\n",
 234 |        "      <td>351.86</td>\n",
 235 |        "      <td>197</td>\n",
 236 |        "      <td>NaN</td>\n",
 237 |        "      <td>12</td>\n",
 238 |        "      <td>2014-12-31</td>\n",
 239 |        "    </tr>\n",
 240 |        "  </tbody>\n",
 241 |        "</table>\n",
 242 |        "</div>"
 243 |       ],
 244 |       "text/plain": [
 245 |        "               State   HighQ  HighQN    MedQ  MedQN  LowQ  LowQN       date\n",
 246 |        "22894       Virginia  364.98    3513  293.12   3079   NaN    284 2014-12-31\n",
 247 |        "22895     Washington  233.05    3337  189.92   3562   NaN    160 2014-12-31\n",
 248 |        "22896  West Virginia  359.35     551  224.03    545   NaN     60 2014-12-31\n",
 249 |        "22897      Wisconsin  350.52    2244  272.71   2221   NaN    167 2014-12-31\n",
 250 |        "22898        Wyoming  322.27     131  351.86    197   NaN     12 2014-12-31"
 251 |       ]
 252 |      },
 253 |      "execution_count": 4,
 254 |      "metadata": {},
 255 |      "output_type": "execute_result"
 256 |     }
 257 |    ],
 258 |    "source": [
 259 |     "prices_pd.tail()"
 260 |    ]
 261 |   },
 262 |   {
 263 |    "cell_type": "code",
 264 |    "execution_count": 5,
 265 |    "metadata": {
 266 |     "collapsed": false
 267 |    },
 268 |    "outputs": [
 269 |     {
 270 |      "data": {
 271 |       "text/html": [
 272 |        "<div>\n",
 273 |        "<table border=\"1\" class=\"dataframe\">\n",
 274 |        "  <thead>\n",
 275 |        "    <tr style=\"text-align: right;\">\n",
 276 |        "      <th></th>\n",
 277 |        "      <th>region</th>\n",
 278 |        "      <th>total_population</th>\n",
 279 |        "      <th>percent_white</th>\n",
 280 |        "      <th>percent_black</th>\n",
 281 |        "      <th>percent_asian</th>\n",
 282 |        "      <th>percent_hispanic</th>\n",
 283 |        "      <th>per_capita_income</th>\n",
 284 |        "      <th>median_rent</th>\n",
 285 |        "      <th>median_age</th>\n",
 286 |        "    </tr>\n",
 287 |        "  </thead>\n",
 288 |        "  <tbody>\n",
 289 |        "    <tr>\n",
 290 |        "      <th>0</th>\n",
 291 |        "      <td>alabama</td>\n",
 292 |        "      <td>4799277</td>\n",
 293 |        "      <td>67</td>\n",
 294 |        "      <td>26</td>\n",
 295 |        "      <td>1</td>\n",
 296 |        "      <td>4</td>\n",
 297 |        "      <td>23680</td>\n",
 298 |        "      <td>501</td>\n",
 299 |        "      <td>38.1</td>\n",
 300 |        "    </tr>\n",
 301 |        "    <tr>\n",
 302 |        "      <th>1</th>\n",
 303 |        "      <td>alaska</td>\n",
 304 |        "      <td>720316</td>\n",
 305 |        "      <td>63</td>\n",
 306 |        "      <td>3</td>\n",
 307 |        "      <td>5</td>\n",
 308 |        "      <td>6</td>\n",
 309 |        "      <td>32651</td>\n",
 310 |        "      <td>978</td>\n",
 311 |        "      <td>33.6</td>\n",
 312 |        "    </tr>\n",
 313 |        "    <tr>\n",
 314 |        "      <th>2</th>\n",
 315 |        "      <td>arizona</td>\n",
 316 |        "      <td>6479703</td>\n",
 317 |        "      <td>57</td>\n",
 318 |        "      <td>4</td>\n",
 319 |        "      <td>3</td>\n",
 320 |        "      <td>30</td>\n",
 321 |        "      <td>25358</td>\n",
 322 |        "      <td>747</td>\n",
 323 |        "      <td>36.3</td>\n",
 324 |        "    </tr>\n",
 325 |        "    <tr>\n",
 326 |        "      <th>3</th>\n",
 327 |        "      <td>arkansas</td>\n",
 328 |        "      <td>2933369</td>\n",
 329 |        "      <td>74</td>\n",
 330 |        "      <td>15</td>\n",
 331 |        "      <td>1</td>\n",
 332 |        "      <td>7</td>\n",
 333 |        "      <td>22170</td>\n",
 334 |        "      <td>480</td>\n",
 335 |        "      <td>37.5</td>\n",
 336 |        "    </tr>\n",
 337 |        "    <tr>\n",
 338 |        "      <th>4</th>\n",
 339 |        "      <td>california</td>\n",
 340 |        "      <td>37659181</td>\n",
 341 |        "      <td>40</td>\n",
 342 |        "      <td>6</td>\n",
 343 |        "      <td>13</td>\n",
 344 |        "      <td>38</td>\n",
 345 |        "      <td>29527</td>\n",
 346 |        "      <td>1119</td>\n",
 347 |        "      <td>35.4</td>\n",
 348 |        "    </tr>\n",
 349 |        "  </tbody>\n",
 350 |        "</table>\n",
 351 |        "</div>"
 352 |       ],
 353 |       "text/plain": [
 354 |        "       region  total_population  percent_white  percent_black  percent_asian  \\\n",
 355 |        "0     alabama           4799277             67             26              1   \n",
 356 |        "1      alaska            720316             63              3              5   \n",
 357 |        "2     arizona           6479703             57              4              3   \n",
 358 |        "3    arkansas           2933369             74             15              1   \n",
 359 |        "4  california          37659181             40              6             13   \n",
 360 |        "\n",
 361 |        "   percent_hispanic  per_capita_income  median_rent  median_age  \n",
 362 |        "0                 4              23680          501        38.1  \n",
 363 |        "1                 6              32651          978        33.6  \n",
 364 |        "2                30              25358          747        36.3  \n",
 365 |        "3                 7              22170          480        37.5  \n",
 366 |        "4                38              29527         1119        35.4  "
 367 |       ]
 368 |      },
 369 |      "execution_count": 5,
 370 |      "metadata": {},
 371 |      "output_type": "execute_result"
 372 |     }
 373 |    ],
 374 |    "source": [
 375 |     "demography_pd.head()"
 376 |    ]
 377 |   },
 378 |   {
 379 |    "cell_type": "code",
 380 |    "execution_count": 6,
 381 |    "metadata": {
 382 |     "collapsed": false
 383 |    },
 384 |    "outputs": [
 385 |     {
 386 |      "data": {
 387 |       "text/html": [
 388 |        "<div>\n",
 389 |        "<table border=\"1\" class=\"dataframe\">\n",
 390 |        "  <thead>\n",
 391 |        "    <tr style=\"text-align: right;\">\n",
 392 |        "      <th></th>\n",
 393 |        "      <th>region</th>\n",
 394 |        "      <th>value</th>\n",
 395 |        "    </tr>\n",
 396 |        "  </thead>\n",
 397 |        "  <tbody>\n",
 398 |        "    <tr>\n",
 399 |        "      <th>0</th>\n",
 400 |        "      <td>alabama</td>\n",
 401 |        "      <td>4777326</td>\n",
 402 |        "    </tr>\n",
 403 |        "    <tr>\n",
 404 |        "      <th>1</th>\n",
 405 |        "      <td>alaska</td>\n",
 406 |        "      <td>711139</td>\n",
 407 |        "    </tr>\n",
 408 |        "    <tr>\n",
 409 |        "      <th>2</th>\n",
 410 |        "      <td>arizona</td>\n",
 411 |        "      <td>6410979</td>\n",
 412 |        "    </tr>\n",
 413 |        "    <tr>\n",
 414 |        "      <th>3</th>\n",
 415 |        "      <td>arkansas</td>\n",
 416 |        "      <td>2916372</td>\n",
 417 |        "    </tr>\n",
 418 |        "    <tr>\n",
 419 |        "      <th>4</th>\n",
 420 |        "      <td>california</td>\n",
 421 |        "      <td>37325068</td>\n",
 422 |        "    </tr>\n",
 423 |        "  </tbody>\n",
 424 |        "</table>\n",
 425 |        "</div>"
 426 |       ],
 427 |       "text/plain": [
 428 |        "       region     value\n",
 429 |        "0     alabama   4777326\n",
 430 |        "1      alaska    711139\n",
 431 |        "2     arizona   6410979\n",
 432 |        "3    arkansas   2916372\n",
 433 |        "4  california  37325068"
 434 |       ]
 435 |      },
 436 |      "execution_count": 6,
 437 |      "metadata": {},
 438 |      "output_type": "execute_result"
 439 |     }
 440 |    ],
 441 |    "source": [
 442 |     "population_pd.head()"
 443 |    ]
 444 |   },
 445 |   {
 446 |    "cell_type": "code",
 447 |    "execution_count": 7,
 448 |    "metadata": {
 449 |     "collapsed": false
 450 |    },
 451 |    "outputs": [
 452 |     {
 453 |      "data": {
 454 |       "text/plain": [
 455 |        "State             object\n",
 456 |        "HighQ            float64\n",
 457 |        "HighQN             int64\n",
 458 |        "MedQ             float64\n",
 459 |        "MedQN              int64\n",
 460 |        "LowQ             float64\n",
 461 |        "LowQN              int64\n",
 462 |        "date      datetime64[ns]\n",
 463 |        "dtype: object"
 464 |       ]
 465 |      },
 466 |      "execution_count": 7,
 467 |      "metadata": {},
 468 |      "output_type": "execute_result"
 469 |     }
 470 |    ],
 471 |    "source": [
 472 |     "prices_pd.dtypes"
 473 |    ]
 474 |   },
 475 |   {
 476 |    "cell_type": "markdown",
 477 |    "metadata": {},
 478 |    "source": [
 479 |     "#### Sort the data on state and date, then fill NA values"
 480 |    ]
 481 |   },
 482 |   {
 483 |    "cell_type": "code",
 484 |    "execution_count": null,
 485 |    "metadata": {
 486 |     "collapsed": false
 487 |    },
 488 |    "outputs": [],
 489 |    "source": [
 490 |     "prices_pd.sort_values(by=['State', 'date'], inplace=True)\n",
 491 |     "prices_pd.fillna(method='ffill', inplace=True)"
 492 |    ]
 493 |   },
 494 |   {
 495 |    "cell_type": "markdown",
 496 |    "metadata": {},
 497 |    "source": [
 498 |     "### Finding mean, median, mode, variance, standard deviation for California"
 499 |    ]
 500 |   },
 501 |   {
 502 |    "cell_type": "markdown",
 503 |    "metadata": {},
 504 |    "source": [
 505 |     "#### Mean\n",
 506 |     "\n",
 507 |     "The arithmetic average of a set of values, computed by dividing the sum of all values by the number of values."
 508 |    ]
 509 |   },
 510 |   {
 511 |    "cell_type": "code",
 512 |    "execution_count": 9,
 513 |    "metadata": {
 514 |     "collapsed": false
 515 |    },
 516 |    "outputs": [
 517 |     {
 518 |      "data": {
 519 |       "text/html": [
 520 |        "<div>\n",
 521 |        "<table border=\"1\" class=\"dataframe\">\n",
 522 |        "  <thead>\n",
 523 |        "    <tr style=\"text-align: right;\">\n",
 524 |        "      <th></th>\n",
 525 |        "      <th>State</th>\n",
 526 |        "      <th>HighQ</th>\n",
 527 |        "      <th>HighQN</th>\n",
 528 |        "      <th>MedQ</th>\n",
 529 |        "      <th>MedQN</th>\n",
 530 |        "      <th>LowQ</th>\n",
 531 |        "      <th>LowQN</th>\n",
 532 |        "      <th>date</th>\n",
 533 |        "    </tr>\n",
 534 |        "  </thead>\n",
 535 |        "  <tbody>\n",
 536 |        "    <tr>\n",
 537 |        "      <th>20098</th>\n",
 538 |        "      <td>California</td>\n",
 539 |        "      <td>248.77</td>\n",
 540 |        "      <td>12021</td>\n",
 541 |        "      <td>193.44</td>\n",
 542 |        "      <td>12724</td>\n",
 543 |        "      <td>193.88</td>\n",
 544 |        "      <td>770</td>\n",
 545 |        "      <td>2013-12-27</td>\n",
 546 |        "    </tr>\n",
 547 |        "    <tr>\n",
 548 |        "      <th>20863</th>\n",
 549 |        "      <td>California</td>\n",
 550 |        "      <td>248.74</td>\n",
 551 |        "      <td>12025</td>\n",
 552 |        "      <td>193.44</td>\n",
 553 |        "      <td>12728</td>\n",
 554 |        "      <td>193.88</td>\n",
 555 |        "      <td>770</td>\n",
 556 |        "      <td>2013-12-28</td>\n",
 557 |        "    </tr>\n",
 558 |        "    <tr>\n",
 559 |        "      <th>21577</th>\n",
 560 |        "      <td>California</td>\n",
 561 |        "      <td>248.76</td>\n",
 562 |        "      <td>12047</td>\n",
 563 |        "      <td>193.55</td>\n",
 564 |        "      <td>12760</td>\n",
 565 |        "      <td>193.60</td>\n",
 566 |        "      <td>772</td>\n",
 567 |        "      <td>2013-12-29</td>\n",
 568 |        "    </tr>\n",
 569 |        "    <tr>\n",
 570 |        "      <th>22291</th>\n",
 571 |        "      <td>California</td>\n",
 572 |        "      <td>248.82</td>\n",
 573 |        "      <td>12065</td>\n",
 574 |        "      <td>193.54</td>\n",
 575 |        "      <td>12779</td>\n",
 576 |        "      <td>193.80</td>\n",
 577 |        "      <td>773</td>\n",
 578 |        "      <td>2013-12-30</td>\n",
 579 |        "    </tr>\n",
 580 |        "    <tr>\n",
 581 |        "      <th>22801</th>\n",
 582 |        "      <td>California</td>\n",
 583 |        "      <td>248.76</td>\n",
 584 |        "      <td>12082</td>\n",
 585 |        "      <td>193.54</td>\n",
 586 |        "      <td>12792</td>\n",
 587 |        "      <td>193.80</td>\n",
 588 |        "      <td>773</td>\n",
 589 |        "      <td>2013-12-31</td>\n",
 590 |        "    </tr>\n",
 591 |        "  </tbody>\n",
 592 |        "</table>\n",
 593 |        "</div>"
 594 |       ],
 595 |       "text/plain": [
 596 |        "            State   HighQ  HighQN    MedQ  MedQN    LowQ  LowQN       date\n",
 597 |        "20098  California  248.77   12021  193.44  12724  193.88    770 2013-12-27\n",
 598 |        "20863  California  248.74   12025  193.44  12728  193.88    770 2013-12-28\n",
 599 |        "21577  California  248.76   12047  193.55  12760  193.60    772 2013-12-29\n",
 600 |        "22291  California  248.82   12065  193.54  12779  193.80    773 2013-12-30\n",
 601 |        "22801  California  248.76   12082  193.54  12792  193.80    773 2013-12-31"
 602 |       ]
 603 |      },
 604 |      "execution_count": 9,
 605 |      "metadata": {},
 606 |      "output_type": "execute_result"
 607 |     }
 608 |    ],
 609 |    "source": [
 610 |     "california_pd = prices_pd[prices_pd.State == \"California\"].copy(True)\n",
 611 |     "california_pd.head()"
 612 |    ]
 613 |   },
 614 |   {
 615 |    "cell_type": "code",
 616 |    "execution_count": 10,
 617 |    "metadata": {
 618 |     "collapsed": false
 619 |    },
 620 |    "outputs": [],
 621 |    "source": [
 622 |     "ca_sum = california_pd['HighQ'].sum()"
 623 |    ]
 624 |   },
 625 |   {
 626 |    "cell_type": "code",
 627 |    "execution_count": 11,
 628 |    "metadata": {
 629 |     "collapsed": false
 630 |    },
 631 |    "outputs": [],
 632 |    "source": [
 633 |     "ca_count = california_pd['HighQ'].count()"
 634 |    ]
 635 |   },
 636 |   {
 637 |    "cell_type": "code",
 638 |    "execution_count": 12,
 639 |    "metadata": {
 640 |     "collapsed": false
 641 |    },
 642 |    "outputs": [
 643 |     {
 644 |      "name": "stdout",
 645 |      "output_type": "stream",
 646 |      "text": [
 647 |       "Mean weed price in CA is: 245.376124722\n"
 648 |      ]
 649 |     }
 650 |    ],
 651 |    "source": [
 652 |     "ca_mean = ca_sum / ca_count\n",
 653 |     "print \"Mean weed price in CA is:\", ca_mean"
 654 |    ]
 655 |   },
 656 |   {
 657 |    "cell_type": "markdown",
 658 |    "metadata": {},
 659 |    "source": [
 660 |     "#### Exercise: Find CA mean for 2013, 2014 & 2015 separately\n",
 661 |     "\n",
 662 |     "*Hint:* `california_pd.iloc[0]['date'].year`"
 663 |    ]
 664 |   },
 665 |   {
 666 |    "cell_type": "code",
 667 |    "execution_count": null,
 668 |    "metadata": {
 669 |     "collapsed": true
 670 |    },
 671 |    "outputs": [],
 672 |    "source": []
 673 |   },
 674 |   {
 675 |    "cell_type": "markdown",
 676 |    "metadata": {},
 677 |    "source": [
 678 |     "#### Median\n",
 679 |     "\n",
 680 |     "The value lying at the midpoint of a frequency distribution of observed values, such that an observation is equally likely to fall above or below it. Simply put, it is the *middle* value in the sorted list of numbers."
 681 |    ]
 682 |   },
 683 |   {
 684 |    "cell_type": "code",
 685 |    "execution_count": 13,
 686 |    "metadata": {
 687 |     "collapsed": false
 688 |    },
 689 |    "outputs": [
 690 |     {
 691 |      "data": {
 692 |       "text/plain": [
 693 |        "449"
 694 |       ]
 695 |      },
 696 |      "execution_count": 13,
 697 |      "metadata": {},
 698 |      "output_type": "execute_result"
 699 |     }
 700 |    ],
 701 |    "source": [
 702 |     "ca_count"
 703 |    ]
 704 |   },
 705 |   {
 706 |    "cell_type": "markdown",
 707 |    "metadata": {},
 708 |    "source": [
 709 |     "If the count n is odd, the median is the value at position (n+1)/2 in the sorted list,\n",
 710 |     "\n",
 711 |     "else it is the average of the values at positions n/2 and n/2 + 1."
 712 |    ]
 713 |   },
 714 |   {
 715 |    "cell_type": "code",
 716 |    "execution_count": null,
 717 |    "metadata": {
 718 |     "collapsed": false
 719 |    },
 720 |    "outputs": [],
 721 |    "source": [
 722 |     "ca_highq_pd = california_pd.sort_values(by=['HighQ'])\n",
 723 |     "ca_highq_pd.head()"
 724 |    ]
 725 |   },
 726 |   {
 727 |    "cell_type": "code",
 728 |    "execution_count": 15,
 729 |    "metadata": {
 730 |     "collapsed": false
 731 |    },
 732 |    "outputs": [
 733 |     {
 734 |      "name": "stdout",
 735 |      "output_type": "stream",
 736 |      "text": [
 737 |       "Median price of weed in CA is: 245.31\n"
 738 |      ]
 739 |     }
 740 |    ],
 741 |    "source": [
 742 |     "ca_median = ca_highq_pd.HighQ.iloc[ca_count // 2]\n",
 743 |     "print \"Median price of weed in CA is:\", ca_median"
 744 |    ]
 745 |   },
 746 |   {
 747 |    "cell_type": "markdown",
 748 |    "metadata": {},
 749 |    "source": [
 750 |     "#### Mode\n",
 751 |     "\n",
 752 |     "The value that appears most often in a set of numbers."
 753 |    ]
 754 |   },
 755 |   {
 756 |    "cell_type": "code",
 757 |    "execution_count": 16,
 758 |    "metadata": {
 759 |     "collapsed": false
 760 |    },
 761 |    "outputs": [
 762 |     {
 763 |      "name": "stdout",
 764 |      "output_type": "stream",
 765 |      "text": [
 766 |       "The most common price in CA, as indicated by its mode, is: 245.05\n"
 767 |      ]
 768 |     }
 769 |    ],
 770 |    "source": [
 771 |     "ca_mode = ca_highq_pd.HighQ.value_counts().index[0]\n",
 772 |     "print \"The most common price in CA, as indicated by its mode, is:\", ca_mode"
 773 |    ]
 774 |   },
 775 |   {
 776 |    "cell_type": "markdown",
 777 |    "metadata": {},
 778 |    "source": [
 779 |     "#### Variance\n",
 780 |     "\n",
 781 |     "> Once, two statisticians of height 4 feet and 5 feet had to cross a river of AVERAGE depth 3 feet. Meanwhile, a third person came along and said, \"What are you waiting for? You can easily cross the river.\"\n",
 782 |     "\n",
 783 |     "It's the average squared distance of the data values from the *mean*. For a sample, the sum of squared deviations is divided by n - 1 rather than n.\n",
 784 |     "\n",
 785 |     "<img style=\"float: left;\" src=\"img/variance.png\" height=\"320\" width=\"320\">"
 786 |    ]
 787 |   },
 788 |   {
 789 |    "cell_type": "code",
 790 |    "execution_count": 17,
 791 |    "metadata": {
 792 |     "collapsed": false
 793 |    },
 794 |    "outputs": [],
 795 |    "source": [
 796 |     "california_pd['HighQ_dev'] = (california_pd['HighQ'] - ca_mean) ** 2"
 797 |    ]
 798 |   },
 799 |   {
 800 |    "cell_type": "code",
 801 |    "execution_count": 18,
 802 |    "metadata": {
 803 |     "collapsed": false
 804 |    },
 805 |    "outputs": [
 806 |     {
 807 |      "name": "stdout",
 808 |      "output_type": "stream",
 809 |      "text": [
 810 |       "Variance of High Quality weed prices in CA is: 2.98268628798\n"
 811 |      ]
 812 |     }
 813 |    ],
 814 |    "source": [
 815 |     "ca_HighQ_variance = california_pd.HighQ_dev.sum() / (ca_count - 1)\n",
 816 |     "print \"Variance of High Quality weed prices in CA is:\", ca_HighQ_variance"
 817 |    ]
 818 |   },
 819 |   {
 820 |    "cell_type": "markdown",
 821 |    "metadata": {},
 822 |    "source": [
 823 |     "#### Standard Deviation\n",
 824 |     "\n",
 825 |     "It is the square root of the variance, so it has the same units as the data and the mean."
 826 |    ]
 827 |   },
 828 |   {
 829 |    "cell_type": "code",
 830 |    "execution_count": 19,
 831 |    "metadata": {
 832 |     "collapsed": false
 833 |    },
 834 |    "outputs": [
 835 |     {
 836 |      "name": "stdout",
 837 |      "output_type": "stream",
 838 |      "text": [
 839 |       "Standard Deviation of High Quality weed prices in CA is: 1.72704553732\n"
 840 |      ]
 841 |     }
 842 |    ],
 843 |    "source": [
 844 |     "ca_HighQ_SD = np.sqrt(ca_HighQ_variance)\n",
 845 |     "print \"Standard Deviation of High Quality weed prices in CA is:\", ca_HighQ_SD"
 846 |    ]
 847 |   },
 848 |   {
 849 |    "cell_type": "markdown",
 850 |    "metadata": {},
 851 |    "source": [
 852 |     "#### Using Pandas built-in function"
 853 |    ]
 854 |   },
 855 |   {
 856 |    "cell_type": "code",
 857 |    "execution_count": 20,
 858 |    "metadata": {
 859 |     "collapsed": false
 860 |    },
 861 |    "outputs": [
 862 |     {
 863 |      "data": {
 864 |       "text/html": [
 865 |        "<div>\n",
 866 |        "<table border=\"1\" class=\"dataframe\">\n",
 867 |        "  <thead>\n",
 868 |        "    <tr style=\"text-align: right;\">\n",
 869 |        "      <th></th>\n",
 870 |        "      <th>HighQ</th>\n",
 871 |        "      <th>HighQN</th>\n",
 872 |        "      <th>MedQ</th>\n",
 873 |        "      <th>MedQN</th>\n",
 874 |        "      <th>LowQ</th>\n",
 875 |        "      <th>LowQN</th>\n",
 876 |        "      <th>HighQ_dev</th>\n",
 877 |        "    </tr>\n",
 878 |        "  </thead>\n",
 879 |        "  <tbody>\n",
 880 |        "    <tr>\n",
 881 |        "      <th>count</th>\n",
 882 |        "      <td>449.000000</td>\n",
 883 |        "      <td>449.000000</td>\n",
 884 |        "      <td>449.000000</td>\n",
 885 |        "      <td>449.000000</td>\n",
 886 |        "      <td>449.000000</td>\n",
 887 |        "      <td>449.000000</td>\n",
 888 |        "      <td>449.000000</td>\n",
 889 |        "    </tr>\n",
 890 |        "    <tr>\n",
 891 |        "      <th>mean</th>\n",
 892 |        "      <td>245.376125</td>\n",
 893 |        "      <td>14947.073497</td>\n",
 894 |        "      <td>191.268909</td>\n",
 895 |        "      <td>16769.821826</td>\n",
 896 |        "      <td>189.783586</td>\n",
 897 |        "      <td>976.298441</td>\n",
 898 |        "      <td>2.976043</td>\n",
 899 |        "    </tr>\n",
 900 |        "    <tr>\n",
 901 |        "      <th>std</th>\n",
 902 |        "      <td>1.727046</td>\n",
 903 |        "      <td>1656.133565</td>\n",
 904 |        "      <td>1.524028</td>\n",
 905 |        "      <td>2433.943191</td>\n",
 906 |        "      <td>1.598252</td>\n",
 907 |        "      <td>120.246714</td>\n",
 908 |        "      <td>3.961134</td>\n",
 909 |        "    </tr>\n",
 910 |        "    <tr>\n",
 911 |        "      <th>min</th>\n",
 912 |        "      <td>241.840000</td>\n",
 913 |        "      <td>12021.000000</td>\n",
 914 |        "      <td>187.850000</td>\n",
 915 |        "      <td>12724.000000</td>\n",
 916 |        "      <td>187.830000</td>\n",
 917 |        "      <td>770.000000</td>\n",
 918 |        "      <td>0.000015</td>\n",
 919 |        "    </tr>\n",
 920 |        "    <tr>\n",
 921 |        "      <th>25%</th>\n",
 922 |        "      <td>244.480000</td>\n",
 923 |        "      <td>13610.000000</td>\n",
 924 |        "      <td>190.260000</td>\n",
 925 |        "      <td>14826.000000</td>\n",
 926 |        "      <td>188.600000</td>\n",
 927 |        "      <td>878.000000</td>\n",
 928 |        "      <td>0.106357</td>\n",
 929 |        "    </tr>\n",
 930 |        "    <tr>\n",
 931 |        "      <th>50%</th>\n",
 932 |        "      <td>245.310000</td>\n",
 933 |        "      <td>15037.000000</td>\n",
 934 |        "      <td>191.570000</td>\n",
 935 |        "      <td>16793.000000</td>\n",
 936 |        "      <td>188.600000</td>\n",
 937 |        "      <td>982.000000</td>\n",
 938 |        "      <td>0.729103</td>\n",
 939 |        "    </tr>\n",
 940 |        "    <tr>\n",
 941 |        "      <th>75%</th>\n",
 942 |        "      <td>246.220000</td>\n",
 943 |        "      <td>16090.000000</td>\n",
 944 |        "      <td>192.550000</td>\n",
 945 |        "      <td>18435.000000</td>\n",
 946 |        "      <td>191.320000</td>\n",
 947 |        "      <td>1060.000000</td>\n",
 948 |        "      <td>4.435761</td>\n",
 949 |        "    </tr>\n",
 950 |        "    <tr>\n",
 951 |        "      <th>max</th>\n",
 952 |        "      <td>248.820000</td>\n",
 953 |        "      <td>18492.000000</td>\n",
 954 |        "      <td>193.630000</td>\n",
 955 |        "      <td>22027.000000</td>\n",
 956 |        "      <td>193.880000</td>\n",
 957 |        "      <td>1232.000000</td>\n",
 958 |        "      <td>12.504178</td>\n",
 959 |        "    </tr>\n",
 960 |        "  </tbody>\n",
 961 |        "</table>\n",
 962 |        "</div>"
 963 |       ],
 964 |       "text/plain": [
 965 |        "            HighQ        HighQN        MedQ         MedQN        LowQ  \\\n",
 966 |        "count  449.000000    449.000000  449.000000    449.000000  449.000000   \n",
 967 |        "mean   245.376125  14947.073497  191.268909  16769.821826  189.783586   \n",
 968 |        "std      1.727046   1656.133565    1.524028   2433.943191    1.598252   \n",
 969 |        "min    241.840000  12021.000000  187.850000  12724.000000  187.830000   \n",
 970 |        "25%    244.480000  13610.000000  190.260000  14826.000000  188.600000   \n",
 971 |        "50%    245.310000  15037.000000  191.570000  16793.000000  188.600000   \n",
 972 |        "75%    246.220000  16090.000000  192.550000  18435.000000  191.320000   \n",
 973 |        "max    248.820000  18492.000000  193.630000  22027.000000  193.880000   \n",
 974 |        "\n",
 975 |        "             LowQN   HighQ_dev  \n",
 976 |        "count   449.000000  449.000000  \n",
 977 |        "mean    976.298441    2.976043  \n",
 978 |        "std     120.246714    3.961134  \n",
 979 |        "min     770.000000    0.000015  \n",
 980 |        "25%     878.000000    0.106357  \n",
 981 |        "50%     982.000000    0.729103  \n",
 982 |        "75%    1060.000000    4.435761  \n",
 983 |        "max    1232.000000   12.504178  "
 984 |       ]
 985 |      },
 986 |      "execution_count": 20,
 987 |      "metadata": {},
 988 |      "output_type": "execute_result"
 989 |     }
 990 |    ],
 991 |    "source": [
 992 |     "california_pd.describe()"
 993 |    ]
 994 |   },
 995 |   {
 996 |    "cell_type": "code",
 997 |    "execution_count": 21,
 998 |    "metadata": {
 999 |     "collapsed": false
1000 |    },
1001 |    "outputs": [
1002 |     {
1003 |      "data": {
1004 |       "text/plain": [
1005 |        "0    245.03\n",
1006 |        "1    245.05\n",
1007 |        "dtype: float64"
1008 |       ]
1009 |      },
1010 |      "execution_count": 21,
1011 |      "metadata": {},
1012 |      "output_type": "execute_result"
1013 |     }
1014 |    ],
1015 |    "source": [
1016 |     "california_pd.HighQ.mode()"
1017 |    ]
1018 |   },
1019 |   {
1020 |    "cell_type": "markdown",
1021 |    "metadata": {},
1022 |    "source": [
1023 |     "#### Co-variance \n",
1024 |     "\n",
1025 |     "Covariance measures how two variables, say x and y, vary together: it is the average product of their deviations from their respective means. A positive covariance means the variables tend to move in the same direction, and a negative covariance means they tend to move in opposite directions. Its magnitude depends on the units and spread of both variables. Compare this to variance, which measures how much a single variable spreads around its own mean.\n",
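     |     "\n",
     |     "As a sketch of this definition, assuming n paired observations $(x_i, y_i)$ with sample means $\\bar{x}$ and $\\bar{y}$ (symbols introduced here for illustration), the sample covariance can be written as:\n",
     |     "\n",
     |     "$$\\mathrm{cov}(x, y) = \\frac{1}{n - 1} \\sum_{i=1}^{n} (x_i - \\bar{x})(y_i - \\bar{y})$$\n",
     |     "\n",
     |     "Setting y = x recovers the sample variance, so variance is the covariance of a variable with itself.\n",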
1026 |     "\n",
1027 |     "<img style=\"float: left;\" src=\"img/covariance.png\" height=\"270\" width=\"270\">\n",
1028 |     "\n",
1029 |     "<br>\n",
1030 |     "<br>\n",
1031 |     "<br>\n",
1032 |     "<br>\n",
1033 |     "\n",
1034 |     "#### Co-variance of weed price in California vs New York"
1035 |    ]
1036 |   },
1037 |   {
1038 |    "cell_type": "code",
1039 |    "execution_count": 22,
1040 |    "metadata": {
1041 |     "collapsed": false
1042 |    },
1043 |    "outputs": [
1044 |     {
1045 |      "data": {
1046 |       "text/html": [
1047 |        "<div>\n",
1048 |        "<table border=\"1\" class=\"dataframe\">\n",
1049 |        "  <thead>\n",
1050 |        "    <tr style=\"text-align: right;\">\n",
1051 |        "      <th></th>\n",
1052 |        "      <th>State</th>\n",
1053 |        "      <th>HighQ</th>\n",
1054 |        "      <th>HighQN</th>\n",
1055 |        "      <th>MedQ</th>\n",
1056 |        "      <th>MedQN</th>\n",
1057 |        "      <th>LowQ</th>\n",
1058 |        "      <th>LowQN</th>\n",
1059 |        "      <th>date</th>\n",
1060 |        "    </tr>\n",
1061 |        "  </thead>\n",
1062 |        "  <tbody>\n",
1063 |        "    <tr>\n",
1064 |        "      <th>20120</th>\n",
1065 |        "      <td>New York</td>\n",
1066 |        "      <td>351.98</td>\n",
1067 |        "      <td>5773</td>\n",
1068 |        "      <td>268.83</td>\n",
1069 |        "      <td>5786</td>\n",
1070 |        "      <td>190.31</td>\n",
1071 |        "      <td>479</td>\n",
1072 |        "      <td>2013-12-27</td>\n",
1073 |        "    </tr>\n",
1074 |        "    <tr>\n",
1075 |        "      <th>20885</th>\n",
1076 |        "      <td>New York</td>\n",
1077 |        "      <td>351.92</td>\n",
1078 |        "      <td>5775</td>\n",
1079 |        "      <td>268.83</td>\n",
1080 |        "      <td>5786</td>\n",
1081 |        "      <td>190.31</td>\n",
1082 |        "      <td>479</td>\n",
1083 |        "      <td>2013-12-28</td>\n",
1084 |        "    </tr>\n",
1085 |        "    <tr>\n",
1086 |        "      <th>21599</th>\n",
1087 |        "      <td>New York</td>\n",
1088 |        "      <td>351.99</td>\n",
1089 |        "      <td>5785</td>\n",
1090 |        "      <td>269.02</td>\n",
1091 |        "      <td>5806</td>\n",
1092 |        "      <td>190.75</td>\n",
1093 |        "      <td>480</td>\n",
1094 |        "      <td>2013-12-29</td>\n",
1095 |        "    </tr>\n",
1096 |        "    <tr>\n",
1097 |        "      <th>22313</th>\n",
1098 |        "      <td>New York</td>\n",
1099 |        "      <td>352.02</td>\n",
1100 |        "      <td>5791</td>\n",
1101 |        "      <td>268.98</td>\n",
1102 |        "      <td>5814</td>\n",
1103 |        "      <td>190.75</td>\n",
1104 |        "      <td>480</td>\n",
1105 |        "      <td>2013-12-30</td>\n",
1106 |        "    </tr>\n",
1107 |        "    <tr>\n",
1108 |        "      <th>22823</th>\n",
1109 |        "      <td>New York</td>\n",
1110 |        "      <td>351.97</td>\n",
1111 |        "      <td>5794</td>\n",
1112 |        "      <td>268.93</td>\n",
1113 |        "      <td>5818</td>\n",
1114 |        "      <td>190.75</td>\n",
1115 |        "      <td>480</td>\n",
1116 |        "      <td>2013-12-31</td>\n",
1117 |        "    </tr>\n",
1118 |        "  </tbody>\n",
1119 |        "</table>\n",
1120 |        "</div>"
1121 |       ],
1122 |       "text/plain": [
1123 |        "          State   HighQ  HighQN    MedQ  MedQN    LowQ  LowQN       date\n",
1124 |        "20120  New York  351.98    5773  268.83   5786  190.31    479 2013-12-27\n",
1125 |        "20885  New York  351.92    5775  268.83   5786  190.31    479 2013-12-28\n",
1126 |        "21599  New York  351.99    5785  269.02   5806  190.75    480 2013-12-29\n",
1127 |        "22313  New York  352.02    5791  268.98   5814  190.75    480 2013-12-30\n",
1128 |        "22823  New York  351.97    5794  268.93   5818  190.75    480 2013-12-31"
1129 |       ]
1130 |      },
1131 |      "execution_count": 22,
1132 |      "metadata": {},
1133 |      "output_type": "execute_result"
1134 |     }
1135 |    ],
1136 |    "source": [
1137 |     "ny_pd = prices_pd[prices_pd['State'] == 'New York'].copy(True)\n",
1138 |     "ny_pd.head()"
1139 |    ]
1140 |   },
1141 |   {
1142 |    "cell_type": "code",
1143 |    "execution_count": 23,
1144 |    "metadata": {
1145 |     "collapsed": false
1146 |    },
1147 |    "outputs": [],
1148 |    "source": [
1149 |     "ny_pd = ny_pd.iloc[:, [1, 7]]\n",
1150 |     "ny_pd.columns = ['NY_HighQ', 'date']"
1151 |    ]
1152 |   },
1153 |   {
1154 |    "cell_type": "code",
1155 |    "execution_count": 24,
1156 |    "metadata": {
1157 |     "collapsed": false
1158 |    },
1159 |    "outputs": [
1160 |     {
1161 |      "data": {
1162 |       "text/html": [
1163 |        "<div>\n",
1164 |        "<table border=\"1\" class=\"dataframe\">\n",
1165 |        "  <thead>\n",
1166 |        "    <tr style=\"text-align: right;\">\n",
1167 |        "      <th></th>\n",
1168 |        "      <th>NY_HighQ</th>\n",
1169 |        "      <th>date</th>\n",
1170 |        "    </tr>\n",
1171 |        "  </thead>\n",
1172 |        "  <tbody>\n",
1173 |        "    <tr>\n",
1174 |        "      <th>20120</th>\n",
1175 |        "      <td>351.98</td>\n",
1176 |        "      <td>2013-12-27</td>\n",
1177 |        "    </tr>\n",
1178 |        "    <tr>\n",
1179 |        "      <th>20885</th>\n",
1180 |        "      <td>351.92</td>\n",
1181 |        "      <td>2013-12-28</td>\n",
1182 |        "    </tr>\n",
1183 |        "    <tr>\n",
1184 |        "      <th>21599</th>\n",
1185 |        "      <td>351.99</td>\n",
1186 |        "      <td>2013-12-29</td>\n",
1187 |        "    </tr>\n",
1188 |        "    <tr>\n",
1189 |        "      <th>22313</th>\n",
1190 |        "      <td>352.02</td>\n",
1191 |        "      <td>2013-12-30</td>\n",
1192 |        "    </tr>\n",
1193 |        "    <tr>\n",
1194 |        "      <th>22823</th>\n",
1195 |        "      <td>351.97</td>\n",
1196 |        "      <td>2013-12-31</td>\n",
1197 |        "    </tr>\n",
1198 |        "  </tbody>\n",
1199 |        "</table>\n",
1200 |        "</div>"
1201 |       ],
1202 |       "text/plain": [
1203 |        "       NY_HighQ       date\n",
1204 |        "20120    351.98 2013-12-27\n",
1205 |        "20885    351.92 2013-12-28\n",
1206 |        "21599    351.99 2013-12-29\n",
1207 |        "22313    352.02 2013-12-30\n",
1208 |        "22823    351.97 2013-12-31"
1209 |       ]
1210 |      },
1211 |      "execution_count": 24,
1212 |      "metadata": {},
1213 |      "output_type": "execute_result"
1214 |     }
1215 |    ],
1216 |    "source": [
1217 |     "ny_pd.head()"
1218 |    ]
1219 |   },
1220 |   {
1221 |    "cell_type": "code",
1222 |    "execution_count": 25,
1223 |    "metadata": {
1224 |     "collapsed": false
1225 |    },
1226 |    "outputs": [
1227 |     {
1228 |      "data": {
1229 |       "text/html": [
1230 |        "<div>\n",
1231 |        "<table border=\"1\" class=\"dataframe\">\n",
1232 |        "  <thead>\n",
1233 |        "    <tr style=\"text-align: right;\">\n",
1234 |        "      <th></th>\n",
1235 |        "      <th>CA_HighQ</th>\n",
1236 |        "      <th>date</th>\n",
1237 |        "      <th>NY_HighQ</th>\n",
1238 |        "    </tr>\n",
1239 |        "  </thead>\n",
1240 |        "  <tbody>\n",
1241 |        "    <tr>\n",
1242 |        "      <th>0</th>\n",
1243 |        "      <td>248.77</td>\n",
1244 |        "      <td>2013-12-27</td>\n",
1245 |        "      <td>351.98</td>\n",
1246 |        "    </tr>\n",
1247 |        "    <tr>\n",
1248 |        "      <th>1</th>\n",
1249 |        "      <td>248.74</td>\n",
1250 |        "      <td>2013-12-28</td>\n",
1251 |        "      <td>351.92</td>\n",
1252 |        "    </tr>\n",
1253 |        "    <tr>\n",
1254 |        "      <th>2</th>\n",
1255 |        "      <td>248.76</td>\n",
1256 |        "      <td>2013-12-29</td>\n",
1257 |        "      <td>351.99</td>\n",
1258 |        "    </tr>\n",
1259 |        "    <tr>\n",
1260 |        "      <th>3</th>\n",
1261 |        "      <td>248.82</td>\n",
1262 |        "      <td>2013-12-30</td>\n",
1263 |        "      <td>352.02</td>\n",
1264 |        "    </tr>\n",
1265 |        "    <tr>\n",
1266 |        "      <th>4</th>\n",
1267 |        "      <td>248.76</td>\n",
1268 |        "      <td>2013-12-31</td>\n",
1269 |        "      <td>351.97</td>\n",
1270 |        "    </tr>\n",
1271 |        "  </tbody>\n",
1272 |        "</table>\n",
1273 |        "</div>"
1274 |       ],
1275 |       "text/plain": [
1276 |        "   CA_HighQ       date  NY_HighQ\n",
1277 |        "0    248.77 2013-12-27    351.98\n",
1278 |        "1    248.74 2013-12-28    351.92\n",
1279 |        "2    248.76 2013-12-29    351.99\n",
1280 |        "3    248.82 2013-12-30    352.02\n",
1281 |        "4    248.76 2013-12-31    351.97"
1282 |       ]
1283 |      },
1284 |      "execution_count": 25,
1285 |      "metadata": {},
1286 |      "output_type": "execute_result"
1287 |     }
1288 |    ],
1289 |    "source": [
1290 |     "ca_ny_pd = pd.merge(california_pd.iloc[:, [1, 7]].copy(), ny_pd, on=\"date\")\n",
1291 |     "ca_ny_pd.rename(columns={\"HighQ\": \"CA_HighQ\"}, inplace=True)\n",
1292 |     "ca_ny_pd.head()"
1293 |    ]
1294 |   },
1295 |   {
1296 |    "cell_type": "code",
1297 |    "execution_count": 26,
1298 |    "metadata": {
1299 |     "collapsed": false
1300 |    },
1301 |    "outputs": [
1302 |     {
1303 |      "data": {
1304 |       "text/plain": [
1305 |        "346.9127616926502"
1306 |       ]
1307 |      },
1308 |      "execution_count": 26,
1309 |      "metadata": {},
1310 |      "output_type": "execute_result"
1311 |     }
1312 |    ],
1313 |    "source": [
1314 |     "ny_mean = ca_ny_pd.NY_HighQ.mean()\n",
1315 |     "ny_mean"
1316 |    ]
1317 |   },
1318 |   {
1319 |    "cell_type": "code",
1320 |    "execution_count": 27,
1321 |    "metadata": {
1322 |     "collapsed": false
1323 |    },
1324 |    "outputs": [
1325 |     {
1326 |      "data": {
1327 |       "text/html": [
1328 |        "<div>\n",
1329 |        "<table border=\"1\" class=\"dataframe\">\n",
1330 |        "  <thead>\n",
1331 |        "    <tr style=\"text-align: right;\">\n",
1332 |        "      <th></th>\n",
1333 |        "      <th>CA_HighQ</th>\n",
1334 |        "      <th>date</th>\n",
1335 |        "      <th>NY_HighQ</th>\n",
1336 |        "      <th>ca_dev</th>\n",
1337 |        "    </tr>\n",
1338 |        "  </thead>\n",
1339 |        "  <tbody>\n",
1340 |        "    <tr>\n",
1341 |        "      <th>0</th>\n",
1342 |        "      <td>248.77</td>\n",
1343 |        "      <td>2013-12-27</td>\n",
1344 |        "      <td>351.98</td>\n",
1345 |        "      <td>3.393875</td>\n",
1346 |        "    </tr>\n",
1347 |        "    <tr>\n",
1348 |        "      <th>1</th>\n",
1349 |        "      <td>248.74</td>\n",
1350 |        "      <td>2013-12-28</td>\n",
1351 |        "      <td>351.92</td>\n",
1352 |        "      <td>3.363875</td>\n",
1353 |        "    </tr>\n",
1354 |        "    <tr>\n",
1355 |        "      <th>2</th>\n",
1356 |        "      <td>248.76</td>\n",
1357 |        "      <td>2013-12-29</td>\n",
1358 |        "      <td>351.99</td>\n",
1359 |        "      <td>3.383875</td>\n",
1360 |        "    </tr>\n",
1361 |        "    <tr>\n",
1362 |        "      <th>3</th>\n",
1363 |        "      <td>248.82</td>\n",
1364 |        "      <td>2013-12-30</td>\n",
1365 |        "      <td>352.02</td>\n",
1366 |        "      <td>3.443875</td>\n",
1367 |        "    </tr>\n",
1368 |        "    <tr>\n",
1369 |        "      <th>4</th>\n",
1370 |        "      <td>248.76</td>\n",
1371 |        "      <td>2013-12-31</td>\n",
1372 |        "      <td>351.97</td>\n",
1373 |        "      <td>3.383875</td>\n",
1374 |        "    </tr>\n",
1375 |        "  </tbody>\n",
1376 |        "</table>\n",
1377 |        "</div>"
1378 |       ],
1379 |       "text/plain": [
1380 |        "   CA_HighQ       date  NY_HighQ    ca_dev\n",
1381 |        "0    248.77 2013-12-27    351.98  3.393875\n",
1382 |        "1    248.74 2013-12-28    351.92  3.363875\n",
1383 |        "2    248.76 2013-12-29    351.99  3.383875\n",
1384 |        "3    248.82 2013-12-30    352.02  3.443875\n",
1385 |        "4    248.76 2013-12-31    351.97  3.383875"
1386 |       ]
1387 |      },
1388 |      "execution_count": 27,
1389 |      "metadata": {},
1390 |      "output_type": "execute_result"
1391 |     }
1392 |    ],
1393 |    "source": [
1394 |     "ca_ny_pd['ca_dev'] = ca_ny_pd['CA_HighQ'] - ca_mean\n",
1395 |     "ca_ny_pd.head()"
1396 |    ]
1397 |   },
1398 |   {
1399 |    "cell_type": "code",
1400 |    "execution_count": 28,
1401 |    "metadata": {
1402 |     "collapsed": false
1403 |    },
1404 |    "outputs": [
1405 |     {
1406 |      "data": {
1407 |       "text/html": [
1408 |        "<div>\n",
1409 |        "<table border=\"1\" class=\"dataframe\">\n",
1410 |        "  <thead>\n",
1411 |        "    <tr style=\"text-align: right;\">\n",
1412 |        "      <th></th>\n",
1413 |        "      <th>CA_HighQ</th>\n",
1414 |        "      <th>date</th>\n",
1415 |        "      <th>NY_HighQ</th>\n",
1416 |        "      <th>ca_dev</th>\n",
1417 |        "      <th>ny_dev</th>\n",
1418 |        "    </tr>\n",
1419 |        "  </thead>\n",
1420 |        "  <tbody>\n",
1421 |        "    <tr>\n",
1422 |        "      <th>0</th>\n",
1423 |        "      <td>248.77</td>\n",
1424 |        "      <td>2013-12-27</td>\n",
1425 |        "      <td>351.98</td>\n",
1426 |        "      <td>3.393875</td>\n",
1427 |        "      <td>5.067238</td>\n",
1428 |        "    </tr>\n",
1429 |        "    <tr>\n",
1430 |        "      <th>1</th>\n",
1431 |        "      <td>248.74</td>\n",
1432 |        "      <td>2013-12-28</td>\n",
1433 |        "      <td>351.92</td>\n",
1434 |        "      <td>3.363875</td>\n",
1435 |        "      <td>5.007238</td>\n",
1436 |        "    </tr>\n",
1437 |        "    <tr>\n",
1438 |        "      <th>2</th>\n",
1439 |        "      <td>248.76</td>\n",
1440 |        "      <td>2013-12-29</td>\n",
1441 |        "      <td>351.99</td>\n",
1442 |        "      <td>3.383875</td>\n",
1443 |        "      <td>5.077238</td>\n",
1444 |        "    </tr>\n",
1445 |        "    <tr>\n",
1446 |        "      <th>3</th>\n",
1447 |        "      <td>248.82</td>\n",
1448 |        "      <td>2013-12-30</td>\n",
1449 |        "      <td>352.02</td>\n",
1450 |        "      <td>3.443875</td>\n",
1451 |        "      <td>5.107238</td>\n",
1452 |        "    </tr>\n",
1453 |        "    <tr>\n",
1454 |        "      <th>4</th>\n",
1455 |        "      <td>248.76</td>\n",
1456 |        "      <td>2013-12-31</td>\n",
1457 |        "      <td>351.97</td>\n",
1458 |        "      <td>3.383875</td>\n",
1459 |        "      <td>5.057238</td>\n",
1460 |        "    </tr>\n",
1461 |        "  </tbody>\n",
1462 |        "</table>\n",
1463 |        "</div>"
1464 |       ],
1465 |       "text/plain": [
1466 |        "   CA_HighQ       date  NY_HighQ    ca_dev    ny_dev\n",
1467 |        "0    248.77 2013-12-27    351.98  3.393875  5.067238\n",
1468 |        "1    248.74 2013-12-28    351.92  3.363875  5.007238\n",
1469 |        "2    248.76 2013-12-29    351.99  3.383875  5.077238\n",
1470 |        "3    248.82 2013-12-30    352.02  3.443875  5.107238\n",
1471 |        "4    248.76 2013-12-31    351.97  3.383875  5.057238"
1472 |       ]
1473 |      },
1474 |      "execution_count": 28,
1475 |      "metadata": {},
1476 |      "output_type": "execute_result"
1477 |     }
1478 |    ],
1479 |    "source": [
1480 |     "ca_ny_pd['ny_dev'] = ca_ny_pd['NY_HighQ'] - ny_mean\n",
1481 |     "ca_ny_pd.head()"
1482 |    ]
1483 |   },
1484 |   {
1485 |    "cell_type": "code",
1486 |    "execution_count": 29,
1487 |    "metadata": {
1488 |     "collapsed": false
1489 |    },
1490 |    "outputs": [
1491 |     {
1492 |      "name": "stdout",
1493 |      "output_type": "stream",
1494 |      "text": [
1495 |       "Covariance of the High Quality weed prices in CA and NY is: 5.91681496729\n"
1496 |      ]
1497 |     }
1498 |    ],
1499 |    "source": [
1500 |     "ca_ny_cov = (ca_ny_pd['ca_dev'] * ca_ny_pd['ny_dev']).sum() / (ca_count - 1)\n",
1501 |     "print \"Covariance of the High Quality weed prices in CA and NY is:\", ca_ny_cov"
1502 |    ]
1503 |   },
1504 |   {
1505 |    "cell_type": "markdown",
1506 |    "metadata": {},
1507 |    "source": [
1508 |     "#### Using Pandas built-in function"
1509 |    ]
1510 |   },
1511 |   {
1512 |    "cell_type": "code",
1513 |    "execution_count": 30,
1514 |    "metadata": {
1515 |     "collapsed": false
1516 |    },
1517 |    "outputs": [
1518 |     {
1519 |      "data": {
1520 |       "text/html": [
1521 |        "<div>\n",
1522 |        "<table border=\"1\" class=\"dataframe\">\n",
1523 |        "  <thead>\n",
1524 |        "    <tr style=\"text-align: right;\">\n",
1525 |        "      <th></th>\n",
1526 |        "      <th>CA_HighQ</th>\n",
1527 |        "      <th>NY_HighQ</th>\n",
1528 |        "      <th>ca_dev</th>\n",
1529 |        "      <th>ny_dev</th>\n",
1530 |        "    </tr>\n",
1531 |        "  </thead>\n",
1532 |        "  <tbody>\n",
1533 |        "    <tr>\n",
1534 |        "      <th>CA_HighQ</th>\n",
1535 |        "      <td>2.982686</td>\n",
1536 |        "      <td>5.916815</td>\n",
1537 |        "      <td>2.982686</td>\n",
1538 |        "      <td>5.916815</td>\n",
1539 |        "    </tr>\n",
1540 |        "    <tr>\n",
1541 |        "      <th>NY_HighQ</th>\n",
1542 |        "      <td>5.916815</td>\n",
1543 |        "      <td>12.245147</td>\n",
1544 |        "      <td>5.916815</td>\n",
1545 |        "      <td>12.245147</td>\n",
1546 |        "    </tr>\n",
1547 |        "    <tr>\n",
1548 |        "      <th>ca_dev</th>\n",
1549 |        "      <td>2.982686</td>\n",
1550 |        "      <td>5.916815</td>\n",
1551 |        "      <td>2.982686</td>\n",
1552 |        "      <td>5.916815</td>\n",
1553 |        "    </tr>\n",
1554 |        "    <tr>\n",
1555 |        "      <th>ny_dev</th>\n",
1556 |        "      <td>5.916815</td>\n",
1557 |        "      <td>12.245147</td>\n",
1558 |        "      <td>5.916815</td>\n",
1559 |        "      <td>12.245147</td>\n",
1560 |        "    </tr>\n",
1561 |        "  </tbody>\n",
1562 |        "</table>\n",
1563 |        "</div>"
1564 |       ],
1565 |       "text/plain": [
1566 |        "          CA_HighQ   NY_HighQ    ca_dev     ny_dev\n",
1567 |        "CA_HighQ  2.982686   5.916815  2.982686   5.916815\n",
1568 |        "NY_HighQ  5.916815  12.245147  5.916815  12.245147\n",
1569 |        "ca_dev    2.982686   5.916815  2.982686   5.916815\n",
1570 |        "ny_dev    5.916815  12.245147  5.916815  12.245147"
1571 |       ]
1572 |      },
1573 |      "execution_count": 30,
1574 |      "metadata": {},
1575 |      "output_type": "execute_result"
1576 |     }
1577 |    ],
1578 |    "source": [
1579 |     "ca_ny_pd.cov()"
1580 |    ]
1581 |   },
1582 |   {
1583 |    "cell_type": "markdown",
1584 |    "metadata": {},
1585 |    "source": [
1586 |     "### Correlation\n",
1587 |     "\n",
1588 |     "Correlation measures the extent to which two variables fluctuate together, on a scale from -1 to 1. A positive correlation indicates that the variables tend to increase or decrease in parallel; a negative correlation indicates that one variable tends to increase as the other decreases.\n",
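     |     "\n",
     |     "One common way to quantify this is the Pearson correlation coefficient. As a sketch, with $\\mathrm{cov}(x, y)$ the sample covariance and $s_x$, $s_y$ the sample standard deviations (notation assumed here for illustration):\n",
     |     "\n",
     |     "$$r_{xy} = \\frac{\\mathrm{cov}(x, y)}{s_x \\, s_y}$$\n",
     |     "\n",
     |     "Dividing by the standard deviations removes the units, so $r_{xy}$ always lies between -1 and 1.\n",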
1589 |     "\n",
1590 |     "<img style=\"float: left;\" src=\"img/correlation.gif\" height=\"270\" width=\"270\">\n",
1591 |     "\n",
1592 |     "<br>\n",
1593 |     "<br>\n",
1594 |     "<br>\n",
1595 |     "\n",
1596 |     "#### Finding correlation between weed prices in New York and California"
1597 |    ]
1598 |   },
1599 |   {
1600 |    "cell_type": "code",
1601 |    "execution_count": 31,
1602 |    "metadata": {
1603 |     "collapsed": false
1604 |    },
1605 |    "outputs": [
1606 |     {
1607 |      "name": "stdout",
1608 |      "output_type": "stream",
1609 |      "text": [
1610 |       "Correlation between weed prices in NY and CA: 0.979043961106\n"
1611 |      ]
1612 |     }
1613 |    ],
1614 |    "source": [
1615 |     "ca_highq_std = ca_ny_pd.CA_HighQ.std()\n",
1616 |     "ny_highq_std = ca_ny_pd.NY_HighQ.std()\n",
1617 |     "\n",
1618 |     "ca_ny_corr = ca_ny_cov / (ca_highq_std * ny_highq_std)\n",
1619 |     "print \"Correlation between weed prices in NY and CA:\", ca_ny_corr"
1620 |    ]
1621 |   },
1622 |   {
1623 |    "cell_type": "code",
1624 |    "execution_count": 32,
1625 |    "metadata": {
1626 |     "collapsed": false
1627 |    },
1628 |    "outputs": [
1629 |     {
1630 |      "data": {
1631 |       "text/html": [
1632 |        "<div>\n",
1633 |        "<table border=\"1\" class=\"dataframe\">\n",
1634 |        "  <thead>\n",
1635 |        "    <tr style=\"text-align: right;\">\n",
1636 |        "      <th></th>\n",
1637 |        "      <th>CA_HighQ</th>\n",
1638 |        "      <th>NY_HighQ</th>\n",
1639 |        "      <th>ca_dev</th>\n",
1640 |        "      <th>ny_dev</th>\n",
1641 |        "    </tr>\n",
1642 |        "  </thead>\n",
1643 |        "  <tbody>\n",
1644 |        "    <tr>\n",
1645 |        "      <th>CA_HighQ</th>\n",
1646 |        "      <td>1.000000</td>\n",
1647 |        "      <td>0.979044</td>\n",
1648 |        "      <td>1.000000</td>\n",
1649 |        "      <td>0.979044</td>\n",
1650 |        "    </tr>\n",
1651 |        "    <tr>\n",
1652 |        "      <th>NY_HighQ</th>\n",
1653 |        "      <td>0.979044</td>\n",
1654 |        "      <td>1.000000</td>\n",
1655 |        "      <td>0.979044</td>\n",
1656 |        "      <td>1.000000</td>\n",
1657 |        "    </tr>\n",
1658 |        "    <tr>\n",
1659 |        "      <th>ca_dev</th>\n",
1660 |        "      <td>1.000000</td>\n",
1661 |        "      <td>0.979044</td>\n",
1662 |        "      <td>1.000000</td>\n",
1663 |        "      <td>0.979044</td>\n",
1664 |        "    </tr>\n",
1665 |        "    <tr>\n",
1666 |        "      <th>ny_dev</th>\n",
1667 |        "      <td>0.979044</td>\n",
1668 |        "      <td>1.000000</td>\n",
1669 |        "      <td>0.979044</td>\n",
1670 |        "      <td>1.000000</td>\n",
1671 |        "    </tr>\n",
1672 |        "  </tbody>\n",
1673 |        "</table>\n",
1674 |        "</div>"
1675 |       ],
1676 |       "text/plain": [
1677 |        "          CA_HighQ  NY_HighQ    ca_dev    ny_dev\n",
1678 |        "CA_HighQ  1.000000  0.979044  1.000000  0.979044\n",
1679 |        "NY_HighQ  0.979044  1.000000  0.979044  1.000000\n",
1680 |        "ca_dev    1.000000  0.979044  1.000000  0.979044\n",
1681 |        "ny_dev    0.979044  1.000000  0.979044  1.000000"
1682 |       ]
1683 |      },
1684 |      "execution_count": 32,
1685 |      "metadata": {},
1686 |      "output_type": "execute_result"
1687 |     }
1688 |    ],
1689 |    "source": [
1690 |     "ca_ny_pd.corr()"
1691 |    ]
1692 |   },
1693 |   {
1694 |    "cell_type": "markdown",
1695 |    "metadata": {},
1696 |    "source": [
1697 |     "# Correlation != Causation\n",
1698 |     "\n",
1699 |     "Correlation between two variables does not necessarily imply that one causes the other.\n",
1700 |     "\n",
1701 |     "\n",
1702 |     "<img style=\"float: left;\" src=\"img/correlation_not_causation.gif\" height=\"570\" width=\"570\">"
1703 |    ]
1704 |   }
1705 |  ],
1706 |  "metadata": {
1707 |   "kernelspec": {
1708 |    "display_name": "Python 2",
1709 |    "language": "python",
1710 |    "name": "python2"
1711 |   },
1712 |   "language_info": {
1713 |    "codemirror_mode": {
1714 |     "name": "ipython",
1715 |     "version": 2
1716 |    },
1717 |    "file_extension": ".py",
1718 |    "mimetype": "text/x-python",
1719 |    "name": "python",
1720 |    "nbconvert_exporter": "python",
1721 |    "pygments_lexer": "ipython2",
1722 |    "version": "2.7.10"
1723 |   }
1724 |  },
1725 |  "nbformat": 4,
1726 |  "nbformat_minor": 0
1727 | }
1728 | 


--------------------------------------------------------------------------------
/notebooks/6. Hypothesis Testing.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# Hypothesis Testing\n",
  8 |     "\n",
  9 |     "\n",
 10 |     "We would like to know if the effects we see in the sample (the observed data) are likely to occur in the population.\n",
 11 |     "\n",
 12 |     "The way classical hypothesis testing works is by conducting a statistical test to answer the following question:\n",
 13 |     "> Given the sample and an effect, what is the probability of seeing that effect just by chance?\n",
 14 |     "\n",
 15 |     "Here are the steps we would follow:\n",
 16 |     "\n",
 17 |     "1. Compute test statistic\n",
 18 |     "2. Define null hypothesis\n",
 19 |     "3. Compute p-value\n",
 20 |     "4. Interpret the result\n",
 21 |     "\n",
 22 |     "If the p-value is very low (more often than not, below 0.05), the effect is considered statistically significant. That means the effect is unlikely to have occurred by chance. The inference? The effect is likely to be seen in the population too. \n",
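    |     "\n",
    |     "As a rough sketch, writing $T$ for the test statistic, $t_{obs}$ for its value in the sample and $H_0$ for the null hypothesis (notation assumed here for illustration), a one-sided p-value can be written as:\n",
    |     "\n",
    |     "$$p = P(T \\geq t_{obs} \\mid H_0)$$\n",
    |     "\n",
    |     "That is, the probability, computed as if the null hypothesis were true, of a test statistic at least as extreme as the one observed.\n",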
 23 |     "\n",
 24 |     "This process is very similar to the *proof by contradiction* paradigm. We first assume that the effect is not real. That's the null hypothesis. The next step is to compute the probability of seeing an effect at least as large as the observed one under that assumption (the p-value). If the p-value is very low (<0.05 as a rule of thumb), we reject the null hypothesis. "
 25 |    ]
 26 |   },
 27 |   {
 28 |    "cell_type": "code",
 29 |    "execution_count": 1,
 30 |    "metadata": {
 31 |     "collapsed": true
 32 |    },
 33 |    "outputs": [],
 34 |    "source": [
 35 |     "import numpy as np\n",
 36 |     "import pandas as pd\n",
 37 |     "from scipy import stats\n",
 38 |     "import matplotlib as mpl\n",
 39 |     "%matplotlib inline"
 40 |    ]
 41 |   },
 42 |   {
 43 |    "cell_type": "code",
 44 |    "execution_count": 2,
 45 |    "metadata": {
 46 |     "collapsed": false
 47 |    },
 48 |    "outputs": [],
 49 |    "source": [
 50 |     "import seaborn as sns\n",
 51 |     "sns.set(color_codes=True)"
 52 |    ]
 53 |   },
 54 |   {
 55 |    "cell_type": "code",
 56 |    "execution_count": 3,
 57 |    "metadata": {
 58 |     "collapsed": true
 59 |    },
 60 |    "outputs": [],
 61 |    "source": [
 62 |     "weed_pd = pd.read_csv(\"../data/Weed_Price.csv\", parse_dates=[-1])"
 63 |    ]
 64 |   },
 65 |   {
 66 |    "cell_type": "code",
 67 |    "execution_count": 4,
 68 |    "metadata": {
 69 |     "collapsed": false
 70 |    },
 71 |    "outputs": [],
 72 |    "source": [
 73 |     "weed_pd[\"month\"] = weed_pd.date.apply(lambda x: x.month)\n",
 74 |     "weed_pd[\"year\"] = weed_pd.date.apply(lambda x: x.year)"
 75 |    ]
 76 |   },
 77 |   {
 78 |    "cell_type": "code",
 79 |    "execution_count": 5,
 80 |    "metadata": {
 81 |     "collapsed": false
 82 |    },
 83 |    "outputs": [
 84 |     {
 85 |      "data": {
 86 |       "text/html": [
 87 |        "<div>\n",
 88 |        "<table border=\"1\" class=\"dataframe\">\n",
 89 |        "  <thead>\n",
 90 |        "    <tr style=\"text-align: right;\">\n",
 91 |        "      <th></th>\n",
 92 |        "      <th>State</th>\n",
 93 |        "      <th>HighQ</th>\n",
 94 |        "      <th>HighQN</th>\n",
 95 |        "      <th>MedQ</th>\n",
 96 |        "      <th>MedQN</th>\n",
 97 |        "      <th>LowQ</th>\n",
 98 |        "      <th>LowQN</th>\n",
 99 |        "      <th>date</th>\n",
100 |        "      <th>month</th>\n",
101 |        "      <th>year</th>\n",
102 |        "    </tr>\n",
103 |        "  </thead>\n",
104 |        "  <tbody>\n",
105 |        "    <tr>\n",
106 |        "      <th>0</th>\n",
107 |        "      <td>Alabama</td>\n",
108 |        "      <td>339.06</td>\n",
109 |        "      <td>1042</td>\n",
110 |        "      <td>198.64</td>\n",
111 |        "      <td>933</td>\n",
112 |        "      <td>149.49</td>\n",
113 |        "      <td>123</td>\n",
114 |        "      <td>2014-01-01</td>\n",
115 |        "      <td>1</td>\n",
116 |        "      <td>2014</td>\n",
117 |        "    </tr>\n",
118 |        "    <tr>\n",
119 |        "      <th>1</th>\n",
120 |        "      <td>Alaska</td>\n",
121 |        "      <td>288.75</td>\n",
122 |        "      <td>252</td>\n",
123 |        "      <td>260.60</td>\n",
124 |        "      <td>297</td>\n",
125 |        "      <td>388.58</td>\n",
126 |        "      <td>26</td>\n",
127 |        "      <td>2014-01-01</td>\n",
128 |        "      <td>1</td>\n",
129 |        "      <td>2014</td>\n",
130 |        "    </tr>\n",
131 |        "    <tr>\n",
132 |        "      <th>2</th>\n",
133 |        "      <td>Arizona</td>\n",
134 |        "      <td>303.31</td>\n",
135 |        "      <td>1941</td>\n",
136 |        "      <td>209.35</td>\n",
137 |        "      <td>1625</td>\n",
138 |        "      <td>189.45</td>\n",
139 |        "      <td>222</td>\n",
140 |        "      <td>2014-01-01</td>\n",
141 |        "      <td>1</td>\n",
142 |        "      <td>2014</td>\n",
143 |        "    </tr>\n",
144 |        "    <tr>\n",
145 |        "      <th>3</th>\n",
146 |        "      <td>Arkansas</td>\n",
147 |        "      <td>361.85</td>\n",
148 |        "      <td>576</td>\n",
149 |        "      <td>185.62</td>\n",
150 |        "      <td>544</td>\n",
151 |        "      <td>125.87</td>\n",
152 |        "      <td>112</td>\n",
153 |        "      <td>2014-01-01</td>\n",
154 |        "      <td>1</td>\n",
155 |        "      <td>2014</td>\n",
156 |        "    </tr>\n",
157 |        "    <tr>\n",
158 |        "      <th>4</th>\n",
159 |        "      <td>California</td>\n",
160 |        "      <td>248.78</td>\n",
161 |        "      <td>12096</td>\n",
162 |        "      <td>193.56</td>\n",
163 |        "      <td>12812</td>\n",
164 |        "      <td>192.92</td>\n",
165 |        "      <td>778</td>\n",
166 |        "      <td>2014-01-01</td>\n",
167 |        "      <td>1</td>\n",
168 |        "      <td>2014</td>\n",
169 |        "    </tr>\n",
170 |        "  </tbody>\n",
171 |        "</table>\n",
172 |        "</div>"
173 |       ],
174 |       "text/plain": [
175 |        "        State   HighQ  HighQN    MedQ  MedQN    LowQ  LowQN       date  month  \\\n",
176 |        "0     Alabama  339.06    1042  198.64    933  149.49    123 2014-01-01      1   \n",
177 |        "1      Alaska  288.75     252  260.60    297  388.58     26 2014-01-01      1   \n",
178 |        "2     Arizona  303.31    1941  209.35   1625  189.45    222 2014-01-01      1   \n",
179 |        "3    Arkansas  361.85     576  185.62    544  125.87    112 2014-01-01      1   \n",
180 |        "4  California  248.78   12096  193.56  12812  192.92    778 2014-01-01      1   \n",
181 |        "\n",
182 |        "   year  \n",
183 |        "0  2014  \n",
184 |        "1  2014  \n",
185 |        "2  2014  \n",
186 |        "3  2014  \n",
187 |        "4  2014  "
188 |       ]
189 |      },
190 |      "execution_count": 5,
191 |      "metadata": {},
192 |      "output_type": "execute_result"
193 |     }
194 |    ],
195 |    "source": [
196 |     "weed_pd.head()"
197 |    ]
198 |   },
199 |   {
200 |    "cell_type": "markdown",
201 |    "metadata": {},
202 |    "source": [
203 |     "### Let's work on weed prices in California in 2014\n"
204 |    ]
205 |   },
206 |   {
207 |    "cell_type": "code",
208 |    "execution_count": 6,
209 |    "metadata": {
210 |     "collapsed": true
211 |    },
212 |    "outputs": [],
213 |    "source": [
214 |     "weed_ca_2014 = weed_pd[(weed_pd.State==\"California\") & (weed_pd.year==2014)]"
215 |    ]
216 |   },
217 |   {
218 |    "cell_type": "code",
219 |    "execution_count": 7,
220 |    "metadata": {
221 |     "collapsed": false
222 |    },
223 |    "outputs": [
224 |     {
225 |      "name": "stdout",
226 |      "output_type": "stream",
227 |      "text": [
228 |       "Mean: 245.894230769\n",
229 |       "Standard Deviation: 1.28990793937\n"
230 |      ]
231 |     }
232 |    ],
233 |    "source": [
234 |     "#Mean and standard deviation of high quality weed's price\n",
235 |     "print \"Mean:\", weed_ca_2014.HighQ.mean()\n",
236 |     "print \"Standard Deviation:\", weed_ca_2014.HighQ.std()"
237 |    ]
238 |   },
239 |   {
240 |    "cell_type": "code",
241 |    "execution_count": 8,
242 |    "metadata": {
243 |     "collapsed": false
244 |    },
245 |    "outputs": [
246 |     {
247 |      "data": {
248 |       "text/plain": [
249 |        "(245.761718492726, 246.02674304573577)"
250 |       ]
251 |      },
252 |      "execution_count": 8,
253 |      "metadata": {},
254 |      "output_type": "execute_result"
255 |     }
256 |    ],
257 |    "source": [
258 |     "#Confidence interval on the mean\n",
259 |     "stats.norm.interval(0.95, loc=weed_ca_2014.HighQ.mean(), scale = weed_ca_2014.HighQ.std()/np.sqrt(len(weed_ca_2014)))"
260 |    ]
261 |   },
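    |   {
    |    "cell_type": "markdown",
    |    "metadata": {},
    |    "source": [
    |     "The interval above is the normal-approximation confidence interval $\\bar{x} \\pm z \\cdot s/\\sqrt{n}$. A minimal sketch of the same computation done by hand follows. The `sample` array is made up for illustration and is not the weed price data."
    |    ]
    |   },
    |   {
    |    "cell_type": "code",
    |    "execution_count": null,
    |    "metadata": {
    |     "collapsed": true
    |    },
    |    "outputs": [],
    |    "source": [
    |     "import numpy as np\n",
    |     "from scipy import stats\n",
    |     "\n",
    |     "# Made-up sample, for illustration only\n",
    |     "sample = np.array([245.2, 246.8, 244.9, 247.1, 245.6, 246.3, 244.7, 246.0])\n",
    |     "\n",
    |     "mean = sample.mean()\n",
    |     "# ddof=1 gives the sample standard deviation, matching the pandas default\n",
    |     "std_err = sample.std(ddof=1) / np.sqrt(len(sample))\n",
    |     "z = stats.norm.ppf(0.975)  # two-sided 95% critical value, about 1.96\n",
    |     "\n",
    |     "print('By hand: (%.4f, %.4f)' % (mean - z * std_err, mean + z * std_err))\n",
    |     "print('scipy:   (%.4f, %.4f)' % stats.norm.interval(0.95, loc=mean, scale=std_err))"
    |    ]
    |   },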
262 |   {
263 |    "cell_type": "markdown",
264 |    "metadata": {},
265 |    "source": [
266 |     "### Question: Are high-quality weed prices in Jan 2014 significantly higher than in Jan 2015?"
267 |    ]
268 |   },
269 |   {
270 |    "cell_type": "code",
271 |    "execution_count": 9,
272 |    "metadata": {
273 |     "collapsed": false
274 |    },
275 |    "outputs": [],
276 |    "source": [
277 |     "#Get the data\n",
278 |     "weed_ca_jan2014 = np.array(weed_pd[(weed_pd.State==\"California\") & (weed_pd.year==2014) & (weed_pd.month==1)].HighQ)\n",
279 |     "weed_ca_jan2015 = np.array(weed_pd[(weed_pd.State==\"California\") & (weed_pd.year==2015) & (weed_pd.month==1)].HighQ)"
280 |    ]
281 |   },
282 |   {
283 |    "cell_type": "code",
284 |    "execution_count": 10,
285 |    "metadata": {
286 |     "collapsed": false
287 |    },
288 |    "outputs": [
289 |     {
290 |      "name": "stdout",
291 |      "output_type": "stream",
292 |      "text": [
293 |       "Mean-2014 Jan: 248.445483871\n",
294 |       "Mean-2015 Jan: 243.602258065\n"
295 |      ]
296 |     }
297 |    ],
298 |    "source": [
299 |     "print \"Mean-2014 Jan:\", weed_ca_jan2014.mean()\n",
300 |     "print \"Mean-2015 Jan:\", weed_ca_jan2015.mean()"
301 |    ]
302 |   },
303 |   {
304 |    "cell_type": "code",
305 |    "execution_count": 11,
306 |    "metadata": {
307 |     "collapsed": false
308 |    },
309 |    "outputs": [
310 |     {
311 |      "name": "stdout",
312 |      "output_type": "stream",
313 |      "text": [
314 |       "Effect size: 4.84322580645\n"
315 |      ]
316 |     }
317 |    ],
318 |    "source": [
319 |     "print \"Effect size:\", weed_ca_jan2014.mean() - weed_ca_jan2015.mean()"
320 |    ]
321 |   },
322 |   {
323 |    "cell_type": "markdown",
324 |    "metadata": {},
325 |    "source": [
326 |     "**Null Hypothesis**: The mean prices in Jan 2014 and Jan 2015 are equal.\n",
327 |     "\n",
328 |     "Perform a **t-test** and determine the p-value. "
329 |    ]
330 |   },
331 |   {
332 |    "cell_type": "code",
333 |    "execution_count": 12,
334 |    "metadata": {
335 |     "collapsed": false
336 |    },
337 |    "outputs": [
338 |     {
339 |      "data": {
340 |       "text/plain": [
341 |        "Ttest_indResult(statistic=98.011325238158051, pvalue=6.2979718185084028e-68)"
342 |       ]
343 |      },
344 |      "execution_count": 12,
345 |      "metadata": {},
346 |      "output_type": "execute_result"
347 |     }
348 |    ],
349 |    "source": [
350 |     "stats.ttest_ind(weed_ca_jan2014, weed_ca_jan2015, equal_var=True)"
351 |    ]
352 |   },
353 |   {
354 |    "cell_type": "markdown",
355 |    "metadata": {},
356 |    "source": [
357 |     "The p-value is the probability of observing an effect at least this large purely by chance, i.e. if the null hypothesis were true. Here, the p-value is almost 0.\n",
358 |     "\n",
359 |     "*Conclusion*: The price difference is statistically significant. But is a price difference of $4.84 a big deal? The price decreased in Jan 2015 by almost 2%. Always remember to look at the effect size. "
360 |    ]
361 |   },
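    |   {
    |    "cell_type": "markdown",
    |    "metadata": {},
    |    "source": [
    |     "A minimal sketch of what the t-test above computes, assuming equal variances in both groups (as `equal_var=True` does). The arrays `a` and `b` are made-up samples, not the weed price data, and the helper name `pooled_t_statistic` is hypothetical. The hand-computed statistic should agree with the one returned by `stats.ttest_ind`."
    |    ]
    |   },
    |   {
    |    "cell_type": "code",
    |    "execution_count": null,
    |    "metadata": {
    |     "collapsed": true
    |    },
    |    "outputs": [],
    |    "source": [
    |     "import numpy as np\n",
    |     "from scipy import stats\n",
    |     "\n",
    |     "def pooled_t_statistic(a, b):\n",
    |     "    # Two-sample t statistic with a pooled variance estimate (equal-variance assumption)\n",
    |     "    n_a, n_b = len(a), len(b)\n",
    |     "    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)\n",
    |     "    std_err = np.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))\n",
    |     "    return (a.mean() - b.mean()) / std_err\n",
    |     "\n",
    |     "# Made-up samples, for illustration only\n",
    |     "a = np.array([248.1, 248.9, 247.8, 248.6, 248.3])\n",
    |     "b = np.array([243.9, 243.2, 243.7, 243.5, 243.4])\n",
    |     "\n",
    |     "t_manual = pooled_t_statistic(a, b)\n",
    |     "t_scipy, p_value = stats.ttest_ind(a, b, equal_var=True)\n",
    |     "print('t (by hand): %.4f' % t_manual)\n",
    |     "print('t (scipy):   %.4f, p-value: %.3g' % (t_scipy, p_value))"
    |    ]
    |   },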
362 |   {
363 |    "cell_type": "markdown",
364 |    "metadata": {},
365 |    "source": [
366 |     "**Problem**: Determine whether the prices of medium-quality weed in New York differ significantly between Jan 2015 and Feb 2015. "
367 |    ]
368 |   },
369 |   {
370 |    "cell_type": "code",
371 |    "execution_count": null,
372 |    "metadata": {
373 |     "collapsed": true
374 |    },
375 |    "outputs": [],
376 |    "source": []
377 |   },
378 |   {
379 |    "cell_type": "markdown",
380 |    "metadata": {},
381 |    "source": [
382 |     "### Assumption of the t-test\n",
383 |     "\n",
384 |     "One assumption is that the data came from a normal distribution. \n",
385 |     "<br>\n",
386 |     "The [Shapiro-Wilk test](https://en.wikipedia.org/wiki/Shapiro-Wilk) checks for normality. Its null hypothesis is that the data are normally distributed, so a p-value below 0.05 is evidence that the distribution is not normal."
387 |    ]
388 |   },
389 |   {
390 |    "cell_type": "code",
391 |    "execution_count": 13,
392 |    "metadata": {
393 |     "collapsed": false
394 |    },
395 |    "outputs": [
396 |     {
397 |      "data": {
398 |       "text/plain": [
399 |        "(0.9469053149223328, 0.12818680703639984)"
400 |       ]
401 |      },
402 |      "execution_count": 13,
403 |      "metadata": {},
404 |      "output_type": "execute_result"
405 |     }
406 |    ],
407 |    "source": [
408 |     "stats.shapiro(weed_ca_jan2015)"
409 |    ]
410 |   },
411 |   {
412 |    "cell_type": "code",
413 |    "execution_count": 14,
414 |    "metadata": {
415 |     "collapsed": false
416 |    },
417 |    "outputs": [
418 |     {
419 |      "data": {
420 |       "text/plain": [
421 |        "(0.9353488683700562, 0.06141229346394539)"
422 |       ]
423 |      },
424 |      "execution_count": 14,
425 |      "metadata": {},
426 |      "output_type": "execute_result"
427 |     }
428 |    ],
429 |    "source": [
430 |     "stats.shapiro(weed_ca_jan2014)"
431 |    ]
432 |   },
433 |   {
434 |    "cell_type": "code",
435 |    "execution_count": null,
436 |    "metadata": {
437 |     "collapsed": true
438 |    },
439 |    "outputs": [],
440 |    "source": [
441 |     "#Both p-values are above 0.05, so we cannot reject normality. We seem to be good."
442 |    ]
443 |   },
444 |   {
445 |    "cell_type": "markdown",
446 |    "metadata": {},
447 |    "source": [
448 |     "### A/B testing\n",
449 |     "\n",
450 |     "A/B testing compares two versions of something to check which one performs better. E.g., show two variants of the same webpage to different groups of visitors and find which one gives the better conversion rate (or other relevant metric). [wiki](https://en.wikipedia.org/wiki/A/B_testing)"
451 |    ]
452 |   },
453 |   {
454 |    "cell_type": "markdown",
455 |    "metadata": {},
456 |    "source": [
457 |     "**Exercise: Impact of regulation and deregulation.**\n",
458 |     "\n",
459 |     "Information on regulation of Weed in the US by State [wiki](Impact of regulation and deregulation on a couple of states )\n",
460 |     "\n",
461 |     "1. Alaska legalized it on 4th Nov 2014. Find whether prices changed significantly in Dec 2014 compared to Oct 2014. \n",
462 |     "2. Maryland decriminalized possessing weed from Oct 1, 2014. Find whether prices of weed changed significantly in Oct 2014 compared to Sep 2014."
463 |    ]
464 |   },
465 |   {
466 |    "cell_type": "code",
467 |    "execution_count": null,
468 |    "metadata": {
469 |     "collapsed": true
470 |    },
471 |    "outputs": [],
472 |    "source": []
473 |   },
474 |   {
475 |    "cell_type": "markdown",
476 |    "metadata": {},
477 |    "source": [
478 |     "<h2> Something to think about: Which of these gives smaller p-values? </h2>\n",
479 |     "   \n",
480 |     "   * Smaller effect size\n",
481 |     "   * Smaller standard error\n",
482 |     "   * Smaller sample size\n",
483 |     "   * Higher variance\n",
484 |     "   \n",
485 |     "   **Answer:** "
486 |    ]
487 |   },
488 |   {
489 |    "cell_type": "markdown",
490 |    "metadata": {},
491 |    "source": [
492 |     "### Chi-square tests"
493 |    ]
494 |   },
495 |   {
496 |    "cell_type": "markdown",
497 |    "metadata": {},
498 |    "source": [
499 |     "Chi-square tests are used when the data are frequencies (counts), rather than numerical values such as a score or price.\n",
500 |     "\n",
501 |     "The following two tests make use of the chi-square statistic:\n",
502 |     "\n",
503 |     "1. chi-square test for goodness of fit\n",
504 |     "2. chi-square test for independence\n",
505 |     "\n",
506 |     "Chi-square tests are non-parametric. They do not require assumptions about population parameters, and they do not test hypotheses about population parameters."
507 |    ]
508 |   },
509 |   {
510 |    "cell_type": "markdown",
511 |    "metadata": {},
512 |    "source": [
513 |     "<h2> Chi-Square test for goodness of fit </h2>"
514 |    ]
515 |   },
516 |   {
517 |    "cell_type": "markdown",
518 |    "metadata": {},
519 |    "source": [
520 |     "$$ \\chi^2 = \\sum (O - E)^2/E $$\n",
521 |     "\n",
522 |     "* O is the observed frequency in a category\n",
523 |     "* E is the expected frequency in that category\n",
524 |     "* $ \\chi^2 $ is the chi-square statistic, summed over all categories"
525 |    ]
526 |   },
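    |   {
    |    "cell_type": "markdown",
    |    "metadata": {},
    |    "source": [
    |     "The formula above can be evaluated directly. A small sketch with made-up counts (not taken from the dataset) follows. The expected counts are chosen to have the same total as the observed counts, and the hand-computed value is cross-checked against `stats.chisquare`."
    |    ]
    |   },
    |   {
    |    "cell_type": "code",
    |    "execution_count": null,
    |    "metadata": {
    |     "collapsed": true
    |    },
    |    "outputs": [],
    |    "source": [
    |     "import numpy as np\n",
    |     "from scipy import stats\n",
    |     "\n",
    |     "# Made-up counts for three categories; both arrays sum to 100\n",
    |     "observed = np.array([48.0, 42.0, 10.0])\n",
    |     "expected = np.array([50.0, 40.0, 10.0])\n",
    |     "\n",
    |     "# chi^2 = sum over categories of (O - E)^2 / E = 4/50 + 4/40 + 0 = 0.18\n",
    |     "chi2_manual = np.sum((observed - expected) ** 2 / expected)\n",
    |     "chi2_scipy, p_value = stats.chisquare(observed, expected)\n",
    |     "print('chi-square (by hand): %.4f' % chi2_manual)\n",
    |     "print('chi-square (scipy):   %.4f, p-value: %.4f' % (chi2_scipy, p_value))"
    |    ]
    |   },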
527 |   {
528 |    "cell_type": "markdown",
529 |    "metadata": {},
530 |    "source": [
531 |     "Let's take the proportions of people who bought high, medium and low quality weed in Jan 2014 as the expected proportions. Find whether the proportions of people who bought each quality in Jan 2015 conformed to this norm."
532 |    ]
533 |   },
534 |   {
535 |    "cell_type": "code",
536 |    "execution_count": 16,
537 |    "metadata": {
538 |     "collapsed": true
539 |    },
540 |    "outputs": [],
541 |    "source": [
542 |     "weed_jan2014 = weed_pd[(weed_pd.year==2014) & (weed_pd.month==1)][[\"HighQN\", \"MedQN\", \"LowQN\"]]\n",
543 |     "weed_jan2015 = weed_pd[(weed_pd.year==2015) & (weed_pd.month==1)][[\"HighQN\", \"MedQN\", \"LowQN\"]]"
544 |    ]
545 |   },
546 |   {
547 |    "cell_type": "code",
548 |    "execution_count": 17,
549 |    "metadata": {
550 |     "collapsed": false
551 |    },
552 |    "outputs": [],
553 |    "source": [
554 |     "Expected = np.array(weed_jan2014.apply(sum, axis=0))\n",
555 |     "Observed = np.array(weed_jan2015.apply(sum, axis=0))"
556 |    ]
557 |   },
558 |   {
559 |    "cell_type": "code",
560 |    "execution_count": 18,
561 |    "metadata": {
562 |     "collapsed": false
563 |    },
564 |    "outputs": [
565 |     {
566 |      "name": "stdout",
567 |      "output_type": "stream",
568 |      "text": [
569 |       "Expected: [2918004 2644757  263958] \n",
570 |       "Observed: [4057716 4035049  358088]\n"
571 |      ]
572 |     }
573 |    ],
574 |    "source": [
575 |     "print \"Expected:\", Expected, \"\\n\" , \"Observed:\", Observed"
576 |    ]
577 |   },
578 |   {
579 |    "cell_type": "code",
580 |    "execution_count": 19,
581 |    "metadata": {
582 |     "collapsed": false
583 |    },
584 |    "outputs": [
585 |     {
586 |      "name": "stdout",
587 |      "output_type": "stream",
588 |      "text": [
589 |       "Expected: [ 0.5007971   0.45390159  0.04530131] \n",
590 |       "Observed: [ 0.48015461  0.47747239  0.042373  ]\n"
591 |      ]
592 |     }
593 |    ],
594 |    "source": [
595 |     "print \"Expected:\", Expected/np.sum(Expected.astype(float)), \"\\n\" , \"Observed:\", Observed/np.sum(Observed.astype(float))"
596 |    ]
597 |   },
598 |   {
599 |    "cell_type": "code",
600 |    "execution_count": 20,
601 |    "metadata": {
602 |     "collapsed": false
603 |    },
604 |    "outputs": [
605 |     {
606 |      "data": {
607 |       "text/plain": [
608 |        "Power_divergenceResult(statistic=1209562.2775169075, pvalue=0.0)"
609 |       ]
610 |      },
611 |      "execution_count": 20,
612 |      "metadata": {},
613 |      "output_type": "execute_result"
614 |     }
615 |    ],
616 |    "source": [
617 |     "stats.chisquare(Observed, Expected)"
618 |    ]
619 |   },
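    |   {
    |    "cell_type": "markdown",
    |    "metadata": {},
    |    "source": [
    |     "One caveat about the call above: the Jan 2014 and Jan 2015 totals differ, while a goodness-of-fit test is meant to compare proportions. A sketch of one way to handle this, assuming the intended null hypothesis is *same proportions as Jan 2014*, is to rescale the expected counts to the observed total before calling `stats.chisquare`. The counts below are copied from the printed output above; the variable names are illustrative."
    |    ]
    |   },
    |   {
    |    "cell_type": "code",
    |    "execution_count": null,
    |    "metadata": {
    |     "collapsed": true
    |    },
    |    "outputs": [],
    |    "source": [
    |     "import numpy as np\n",
    |     "from scipy import stats\n",
    |     "\n",
    |     "# Counts copied from the printed output above (HighQN, MedQN, LowQN)\n",
    |     "expected_2014 = np.array([2918004, 2644757, 263958], dtype=float)\n",
    |     "observed_2015 = np.array([4057716, 4035049, 358088], dtype=float)\n",
    |     "\n",
    |     "# Rescale the expected counts so that they sum to the observed total\n",
    |     "expected_scaled = expected_2014 / expected_2014.sum() * observed_2015.sum()\n",
    |     "\n",
    |     "chi2, p_value = stats.chisquare(observed_2015, expected_scaled)\n",
    |     "print('chi-square: %.1f, p-value: %.3g' % (chi2, p_value))"
    |    ]
    |   },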
620 |   {
621 |    "cell_type": "markdown",
622 |    "metadata": {},
623 |    "source": [
624 |     "*Inference*: We reject the null hypothesis. The proportions in Jan 2015 are different from what was expected."
625 |    ]
626 |   }
627 |  ],
628 |  "metadata": {
629 |   "kernelspec": {
630 |    "display_name": "Python 2",
631 |    "language": "python",
632 |    "name": "python2"
633 |   },
634 |   "language_info": {
635 |    "codemirror_mode": {
636 |     "name": "ipython",
637 |     "version": 2
638 |    },
639 |    "file_extension": ".py",
640 |    "mimetype": "text/x-python",
641 |    "name": "python",
642 |    "nbconvert_exporter": "python",
643 |    "pygments_lexer": "ipython2",
644 |    "version": "2.7.10"
645 |   }
646 |  },
647 |  "nbformat": 4,
648 |  "nbformat_minor": 0
649 | }
650 | 


--------------------------------------------------------------------------------
/notebooks/8. Closing thoughts and terminology.ipynb:
--------------------------------------------------------------------------------
 1 | {
 2 |  "cells": [
 3 |   {
 4 |    "cell_type": "markdown",
 5 |    "metadata": {},
 6 |    "source": [
 7 |     "# Terminology\n",
 8 |     "\n",
 9 |     "1. Null Hypothesis\n",
10 |     "2. Alternative Hypothesis\n",
11 |     "3. p-value (Probability of observing a metric at least as extreme as the one computed from the data, just by chance, i.e. assuming the null hypothesis is true)\n",
12 |     "4. Bootstrap\n",
13 |     "5. Acceptance Region\n",
14 |     "6. Rejection Region\n",
15 |     "7. t-test\n",
16 |     "8. One-tailed test\n",
17 |     "9. Two-tailed test\n",
18 |     "10. Significance test\n",
19 |     "11. Confidence interval\n",
20 |     "12. Power of a test\n",
21 |     "13. Type 1 error (Rejecting the null hypothesis when it is true). Also called a false positive.\n",
22 |     "14. Type 2 error (Failing to reject the null hypothesis when it is false). Also called a false negative.\n",
23 |     "\n",
24 |     "# Some Practical thoughts \n",
25 |     "\n",
26 |     "1. Data could be biased. Confidence intervals may then not be representative.\n",
27 |     "2. One way to handle biased data is to use bias-corrected confidence intervals. \n",
28 |     "3. Outliers can impact confidence intervals. \n",
29 |     "4. Too often, people remove outliers. But they might be encoding some necessary information.\n",
30 |     "5. One way to handle outliers is to use ranking, instead of actual numbers.\n",
31 |     "6. If the sample size is small, bootstrapping underestimates the width of the confidence interval. \n",
32 |     "7. It is better to use significance testing if the sample size is small.\n",
33 |     "8. Bootstrapping should not be used to find extreme values (E.g.: maximum sales of shoes, 5th largest sales of shoes, etc.)\n",
34 |     "9. Use a rank transformation when bootstrapping if the data has outliers.\n",
35 |     "10. Lack of representativeness is a problem for any statistical technique\n",
36 |     "11. The experiment should be randomized (E.g.: when doing A/B testing, assign subjects to variants at random). Experimental bias can lead to wrong inferences. \n",
37 |     "12. Resampling time series data is tricky. The assumption we used, that each data point is independent, doesn't hold for time series data. \n",
38 |     "13. Rank transformation changes the question. For our shoe sales example, a rank-transformed analysis would ask: \"Do sales tend to be higher after price optimization?\" (Our analysis asked: \"Are mean sales higher after price optimization?\")\n",
39 |     "14. The power of a test increases as the sample size increases.\n",
40 |     "\n",
41 |     "# Types of Error\n",
42 |     "\n",
43 |     "1. Sampling Bias\n",
44 |     "2. Measurement Error\n",
45 |     "3. Random Error\n"
46 |    ]
47 |   }
48 |  ],
49 |  "metadata": {
50 |   "kernelspec": {
51 |    "display_name": "Python 2",
52 |    "language": "python",
53 |    "name": "python2"
54 |   },
55 |   "language_info": {
56 |    "codemirror_mode": {
57 |     "name": "ipython",
58 |     "version": 2
59 |    },
60 |    "file_extension": ".py",
61 |    "mimetype": "text/x-python",
62 |    "name": "python",
63 |    "nbconvert_exporter": "python",
64 |    "pygments_lexer": "ipython2",
65 |    "version": "2.7.10"
66 |   }
67 |  },
68 |  "nbformat": 4,
69 |  "nbformat_minor": 0
70 | }
71 | 


--------------------------------------------------------------------------------
/notebooks/9. References.ipynb:
--------------------------------------------------------------------------------
 1 | {
 2 |  "cells": [
 3 |   {
 4 |    "cell_type": "markdown",
 5 |    "metadata": {},
 6 |    "source": [
 7 |     "# Books, Slides, Articles\n",
 8 |     "1. Book: [Think Stats](http://greenteapress.com/thinkstats2/)\n",
 9 |     "2. Book: [All of Statistics](http://www.stat.cmu.edu/~larry/all-of-statistics/)\n",
10 |     "3. Book: [Statistics is Easy](http://www.amazon.com/Statistics-Edition-Synthesis-Lectures-Mathematics/dp/160845570X)\n",
11 |     "4. Workshop: Computational Statistics Workshop, Allen Downey, SciPy 2015 [Repo](https://github.com/AllenDowney/CompStats) [Video](https://www.youtube.com/watch?v=5Vjrqnk7Igs)\n",
12 |     "5. Slides: [Statistics for Hackers](https://speakerdeck.com/jakevdp/statistics-for-hackers)"
13 |    ]
14 |   },
15 |   {
16 |    "cell_type": "code",
17 |    "execution_count": null,
18 |    "metadata": {
19 |     "collapsed": true
20 |    },
21 |    "outputs": [],
22 |    "source": []
23 |   }
24 |  ],
25 |  "metadata": {
26 |   "kernelspec": {
27 |    "display_name": "Python 2",
28 |    "language": "python",
29 |    "name": "python2"
30 |   },
31 |   "language_info": {
32 |    "codemirror_mode": {
33 |     "name": "ipython",
34 |     "version": 2
35 |    },
36 |    "file_extension": ".py",
37 |    "mimetype": "text/x-python",
38 |    "name": "python",
39 |    "nbconvert_exporter": "python",
40 |    "pygments_lexer": "ipython2",
41 |    "version": "2.7.10"
42 |   }
43 |  },
44 |  "nbformat": 4,
45 |  "nbformat_minor": 0
46 | }
47 | 


--------------------------------------------------------------------------------
/notebooks/img/6sigma.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/6sigma.png


--------------------------------------------------------------------------------
/notebooks/img/binomial.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/binomial.gif


--------------------------------------------------------------------------------
/notebooks/img/binomial_pmf.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/binomial_pmf.png


--------------------------------------------------------------------------------
/notebooks/img/correlation.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/correlation.gif


--------------------------------------------------------------------------------
/notebooks/img/correlation_not_causation.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/correlation_not_causation.gif


--------------------------------------------------------------------------------
/notebooks/img/covariance.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/covariance.png


--------------------------------------------------------------------------------
/notebooks/img/exponential_pdf.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/exponential_pdf.png


--------------------------------------------------------------------------------
/notebooks/img/kurtosis.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/kurtosis.png


--------------------------------------------------------------------------------
/notebooks/img/leastsquare.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/leastsquare.gif


--------------------------------------------------------------------------------
/notebooks/img/normal_cdf.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/normal_cdf.png


--------------------------------------------------------------------------------
/notebooks/img/normal_pdf.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/normal_pdf.png


--------------------------------------------------------------------------------
/notebooks/img/normaldist.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/normaldist.png


--------------------------------------------------------------------------------
/notebooks/img/skewness.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/skewness.png


--------------------------------------------------------------------------------
/notebooks/img/uniform.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/uniform.png


--------------------------------------------------------------------------------
/notebooks/img/variance.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rouseguy/intro2stats/1c3c89725ede55fdb4af6e0b846888a2d664a211/notebooks/img/variance.png


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
 1 | appnope==0.1.0
 2 | backports.ssl-match-hostname==3.4.0.2
 3 | certifi==2015.4.28
 4 | decorator==4.0.2
 5 | funcsigs==0.4
 6 | functools32==3.2.3.post2
 7 | gnureadline==6.3.3
 8 | ipykernel==4.0.3
 9 | ipython==4.0.0
10 | ipython-genutils==0.1.0
11 | Jinja2==2.8
12 | jsonschema==2.5.1
13 | jupyter-client==4.0.0
14 | jupyter-core==4.0.4
15 | MarkupSafe==0.23
16 | matplotlib==1.4.3
17 | mistune==0.7.1
18 | mock==1.3.0
19 | nbconvert==4.0.0
20 | nbformat==4.0.0
21 | nose==1.3.7
22 | notebook==4.0.4
23 | numpy==1.9.2
24 | pandas==0.16.2
25 | path.py==7.7
26 | patsy==0.4.0
27 | pbr==1.6.0
28 | pexpect==3.3
29 | pickleshare==0.5
30 | ptyprocess==0.5
31 | Pygments==2.0.2
32 | pyparsing==2.0.3
33 | python-dateutil==2.4.2
34 | pytz==2015.4
35 | pyzmq==14.7.0
36 | scikit-learn==0.16.1
37 | scipy==0.16.0
38 | seaborn==0.6.0
39 | simplegeneric==0.8.1
40 | six==1.9.0
41 | terminado==0.5
42 | tornado==4.2.1
43 | traitlets==4.0.0
44 | vincent==0.4.4
45 | statsmodels==0.6.1
46 | 


--------------------------------------------------------------------------------
/requirements_linux.txt:
--------------------------------------------------------------------------------
 1 | backports.ssl-match-hostname==3.4.0.2
 2 | certifi==2015.4.28
 3 | decorator==4.0.2
 4 | funcsigs==0.4
 5 | functools32==3.2.3.post2
 6 | ipykernel==4.0.3
 7 | ipython==4.0.0
 8 | ipython-genutils==0.1.0
 9 | Jinja2==2.8
10 | jsonschema==2.5.1
11 | jupyter-client==4.0.0
12 | jupyter-core==4.0.4
13 | MarkupSafe==0.23
14 | matplotlib==1.4.3
15 | mistune==0.7.1
16 | mock==1.3.0
17 | nbconvert==4.0.0
18 | nbformat==4.0.0
19 | nose==1.3.7
20 | notebook==4.0.4
21 | numpy==1.9.2
22 | pandas==0.16.2
23 | path.py==7.7.1
24 | patsy==0.4.0
25 | pbr==1.6.0
26 | pexpect==3.3
27 | pickleshare==0.5
28 | ptyprocess==0.5
29 | Pygments==2.0.2
30 | pyparsing==2.0.3
31 | python-dateutil==2.4.2
32 | pytz==2015.4
33 | pyzmq==14.7.0
34 | scikit-learn==0.16.1
35 | scipy==0.16.0
36 | seaborn==0.6.0
37 | simplegeneric==0.8.1
38 | six==1.9.0
39 | statsmodels==0.6.1
40 | terminado==0.5
41 | tornado==4.2.1
42 | traitlets==4.0.0
43 | vincent==0.4.4
44 | wheel==0.24.0
45 | 


--------------------------------------------------------------------------------