├── .gitignore
├── Dummy Classifier Notebook.ipynb
├── Gaussian_Mixture_Models.ipynb
├── Hypothesis Testing
│   ├── Hypo_Testing.ipynb
│   ├── Hypothesis Testing.md
│   ├── PlantGrowth.csv
│   └── blood_pressure.csv
├── LICENSE
├── README.md
├── WIP
│   ├── Exponential_Smoothing.ipynb
│   └── Monte Carlo Simulation.ipynb
├── data
│   ├── Breast-Cancer.csv
│   ├── HorseKicks.txt
│   ├── Housefly_wing_lengths.txt
│   └── food_outlet_data.csv
├── ideas.md
├── images
│   ├── Conditional.png
│   ├── JDTable.png
│   ├── Joint.png
│   ├── Marginal.png
│   ├── Marginal2.png
│   ├── OneDirectional.png
│   ├── OnettwoTailed.png
│   ├── Table.png
│   ├── TwoDirectional.png
│   └── TypeIandTypeIIError.png
└── notebooks
    ├── 1-Way ANOVA.ipynb
    ├── ARIMA.ipynb
    ├── Baye's Theorem Notebook.ipynb
    ├── Binary Classification-Logistic Regression.ipynb
    ├── Central Limit Theorem.ipynb
    ├── Correlation with Example
    │   ├── Correlation.ipynb
    │   ├── Movie Recommendation using Correlation.ipynb
    │   └── README.md
    ├── Data_Summary_Notebook#12.ipynb
    ├── Decision Tree.ipynb
    ├── Dummy Classifier Notebook.ipynb
    ├── Frequency Distribution.ipynb
    ├── Frequency_Distribution.ipynb
    ├── Heteroscedasticty.ipynb
    ├── Hypothesis Testing.ipynb
    ├── JointProbabilityDistribution.ipynb
    ├── KMeans_Clustering.ipynb
    ├── K_Nearest_Neighbours.ipynb
    ├── LNN.ipynb
    ├── Linear_Discriminant_Analysis.ipynb
    ├── Markov_chains.ipynb
    ├── MonteCarlo.ipynb
    ├── Multilinear-Regression.ipynb
    ├── PhiK Correlation
    │   ├── PhiK.ipynb
    │   ├── data_description.txt
    │   └── dataset.csv
    ├── Precision&Recall.ipynb
    ├── Principal-Component-Analysis.ipynb
    ├── Probability_Distributions_All.ipynb
    ├── RFR using GridsearchCV.ipynb
    ├── Statistical_&_Probability_Notebook_Part_1.ipynb
    ├── Statistical_&_Probability_Notebook_Part_2.ipynb
    ├── Support_Vector_Machine.ipynb
    ├── Time_Series.ipynb
    ├── Time_Series_Visualization.ipynb
    ├── agriculture_yield_rice
    ├── autocorrelation.ipynb
    ├── bias_variance_notebook.ipynb
    ├── biden_speech.txt
    ├── data_summary_breast_cancer.ipynb
    ├── intro-numpy-pandas-matplotlib.ipynb
    └── maximum-likelihood-estimation.ipynb
/.gitignore:
--------------------------------------------------------------------------------
1 | .ipynb_checkpoints/
2 |
--------------------------------------------------------------------------------
/Dummy Classifier Notebook.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Dummy Classifier "
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "## What is a `DummyClassifier`?\n",
15 | "\n",
16 | "DummyClassifier is a classifier that makes predictions using simple rules, which can be\n",
17 | "useful as a baseline for comparison against actual classifiers, especially with imbalanced classes(where the class distribution is not equal or close to equal, and is instead biased or skewed).\n",
18 | "\n",
19 | "A dummy classifier is basically a classifier which doesn’t even look at the training data while classification, but follows just a rule of thumb or strategy that we instruct it to use while classifying. It is done by including the strategy we want in the strategy parameter of the `DummyClassifier`.The main notion behind using a dummy classifier is that a classifier which is based on an analytic approach to do better than random guessing approach.\n"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "metadata": {},
25 | "source": [
26 | "## Strategies used in Dummy Classifier\n",
27 | "\n",
28 | "The scikit-learn `DummyClassifier` class implements several strategies for random guessing classifiers. \n",
29 | "The strategies are as follows:\n",
30 | "\n",
31 | "- stratified : This strategy generates the prediction using the training set's class distribution\n",
32 | "- most_frequent : This always predicts the most frequent label in training set.\n",
33 | "- prior : This predicts the class that maximises the class prior.\n",
34 | "- uniform : This generates predictions uniformly at random\n",
35 | "- constant : Always predicts a constant label which is user defined. This is specificaly usefull for metrics that evaluate a non-majority class."
36 | ]
37 | },
38 | {
39 | "cell_type": "markdown",
40 | "metadata": {},
41 | "source": [
42 | " ## Explaination through Implementation\n",
43 | " \n",
44 | "The dummy classifier gives measure of \"baseline\" performance--i.e. the success rate one should expect to achieve even if simply guessing.\n",
45 | "\n",
46 | "If one wishes to determine whether a given object possesses or does not possess a certain property. After analyzing a large number of the objects it is found that 90% contain the target property, then guessing that every future instance of the object possesses the target property gives a 90% likelihood of guessing correctly. Structuring these guesses is equivalent to using the `most_frequent` method in dummy clasifier\n",
47 | "\n",
48 | "Because many machine learning tasks attempt to increase the success rate of (e.g.) classification tasks, evaluating the baseline success rate can afford a floor value for the minimal value one's classifier should out-perform. \n",
49 | "\n",
50 | "If one trains a dummy classifier with the `stratified` parameter using the data discussed above, that classifier will predict that there is a 90% probability that each object it encounters possesses the target property. This is different from training a dummy classifier with the `most_frequent` parameter, as the latter would guess that all future objects possess the target property. Here's some code to illustrate:"
51 | ]
52 | },
53 | {
54 | "cell_type": "code",
55 | "execution_count": 1,
56 | "metadata": {},
57 | "outputs": [],
58 | "source": [
59 | "import numpy as np \n",
60 | "import pandas as pd \n",
61 | "import matplotlib.pyplot as plt \n",
62 | "import seaborn as sns "
63 | ]
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": 2,
68 | "metadata": {},
69 | "outputs": [
70 | {
71 | "data": {
72 | "text/html": [
73 | "
"
264 | ]
265 | },
266 | "metadata": {
267 | "needs_background": "light"
268 | },
269 | "output_type": "display_data"
270 | }
271 | ],
272 | "source": [
273 | "ax = sns.stripplot(strategies, test_scores); \n",
274 | "ax.set(xlabel ='Strategy', ylabel ='Test Score') \n",
275 | "plt.show() "
276 | ]
277 | },
278 | {
279 | "cell_type": "markdown",
280 | "metadata": {},
281 | "source": [
282 | "Checking the performance of `RandomForestClassifier` on the data"
283 | ]
284 | },
285 | {
286 | "cell_type": "code",
287 | "execution_count": 10,
288 | "metadata": {},
289 | "outputs": [
290 | {
291 | "data": {
292 | "text/plain": [
293 | "0.776595744680851"
294 | ]
295 | },
296 | "execution_count": 10,
297 | "metadata": {},
298 | "output_type": "execute_result"
299 | }
300 | ],
301 | "source": [
302 | "from sklearn.ensemble import RandomForestClassifier\n",
303 | "from sklearn.metrics import accuracy_score\n",
304 | "ans=RandomForestClassifier()\n",
305 | "ans.fit(X_train,y_train)\n",
306 | "prediction=ans.predict(X_test)\n",
307 | "accuracy_score(y_test,prediction)"
308 | ]
309 | },
310 | {
311 | "cell_type": "markdown",
312 | "metadata": {},
313 | "source": [
314 | "On comparing the scores of the KNN classifier with the dummy classifier, we come to the conclusion that the KNN classifier is, in fact, a good classifier for the given data."
315 | ]
316 | },
317 | {
318 | "cell_type": "markdown",
319 | "metadata": {},
320 | "source": [
321 | "## Imbalanced Class and `Dummy Classifier`\n",
322 | "\n",
323 | "A major motivation for Dummy Classifier is F-score, when the positive class is in minority (i.e. imbalanced classes). This classifier is used for sanity test of actual classifier. Actually, dummy classifier completely ignores the input data. In case of 'most frequent' method, it checks the occurrence of most frequent label."
324 | ]
325 | },
326 | {
327 | "cell_type": "code",
328 | "execution_count": 11,
329 | "metadata": {},
330 | "outputs": [
331 | {
332 | "name": "stdout",
333 | "output_type": "stream",
334 | "text": [
335 | "0 178\n",
336 | "1 182\n",
337 | "2 177\n",
338 | "3 183\n",
339 | "4 181\n",
340 | "5 182\n",
341 | "6 181\n",
342 | "7 179\n",
343 | "8 174\n",
344 | "9 180\n"
345 | ]
346 | }
347 | ],
348 | "source": [
349 | "from sklearn.datasets import load_digits\n",
350 | "\n",
351 | "dataset = load_digits()\n",
352 | "X, y = dataset.data, dataset.target\n",
353 | "\n",
354 | "for class_name, class_count in zip(dataset.target_names, np.bincount(dataset.target)):\n",
355 | " print(class_name,class_count)"
356 | ]
357 | },
358 | {
359 | "cell_type": "code",
360 | "execution_count": 12,
361 | "metadata": {},
362 | "outputs": [
363 | {
364 | "name": "stdout",
365 | "output_type": "stream",
366 | "text": [
367 | "Original labels:\t [1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9]\n",
368 | "New binary labels:\t [1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]\n"
369 | ]
370 | }
371 | ],
372 | "source": [
373 | "y_imbalanced = y.copy()\n",
374 | "y_imbalanced[y_imbalanced != 1] = 0\n",
375 | "\n",
376 | "print('Original labels:\\t', y[1:20])\n",
377 | "print('New binary labels:\\t', y_imbalanced[1:20])"
378 | ]
379 | },
380 | {
381 | "cell_type": "code",
382 | "execution_count": 14,
383 | "metadata": {},
384 | "outputs": [
385 | {
386 | "data": {
387 | "text/plain": [
388 | "array([1615, 182], dtype=int64)"
389 | ]
390 | },
391 | "execution_count": 14,
392 | "metadata": {},
393 | "output_type": "execute_result"
394 | }
395 | ],
396 | "source": [
397 | "np.bincount(y_imbalanced)"
398 | ]
399 | },
400 | {
401 | "cell_type": "markdown",
402 | "metadata": {},
403 | "source": [
404 | "We can observe that in the above data array one class is more frequent than other which shows it is an imbalanced class"
405 | ]
406 | },
407 | {
408 | "cell_type": "code",
409 | "execution_count": 20,
410 | "metadata": {},
411 | "outputs": [
412 | {
413 | "data": {
414 | "text/plain": [
415 | "0.5466666666666666"
416 | ]
417 | },
418 | "execution_count": 20,
419 | "metadata": {},
420 | "output_type": "execute_result"
421 | }
422 | ],
423 | "source": [
424 | "X_train1, X_test1, y_train1, y_test1 = train_test_split(X, y_imbalanced, random_state=0)\n",
425 | "\n",
426 | "# Accuracy of Support Vector Machine classifier\n",
427 | "from sklearn.naive_bayes import GaussianNB\n",
428 | "gnb = GaussianNB()\n",
429 | "y_pred = gnb.fit(X_train1, y_train1)\n",
430 | "gnb.score(X_test1, y_test1)"
431 | ]
432 | },
433 | {
434 | "cell_type": "markdown",
435 | "metadata": {},
436 | "source": [
437 | "Here on using Naive Bayes Classifier we get a score of 0.55 , We know this is not a good score and we can use other classifiers and fit the model and check their score. "
438 | ]
439 | },
440 | {
441 | "cell_type": "code",
442 | "execution_count": 24,
443 | "metadata": {},
444 | "outputs": [
445 | {
446 | "data": {
447 | "text/plain": [
448 | "0.9088888888888889"
449 | ]
450 | },
451 | "execution_count": 24,
452 | "metadata": {},
453 | "output_type": "execute_result"
454 | }
455 | ],
456 | "source": [
457 | "from sklearn.ensemble import RandomForestClassifier\n",
458 | "\n",
459 | "clf = RandomForestClassifier(max_depth=2, random_state=0).fit(X_train1, y_train1)\n",
460 | "clf.score(X_test1,y_test1)"
461 | ]
462 | },
463 | {
464 | "cell_type": "markdown",
465 | "metadata": {},
466 | "source": [
467 | "On Using RandomForestClassifier we get a score of 0.908 which is a great score and also much better than what Naive Bayes Classifier performed . "
468 | ]
469 | },
470 | {
471 | "cell_type": "markdown",
472 | "metadata": {},
473 | "source": [
474 | "## Using dummy classifier as baseline"
475 | ]
476 | },
477 | {
478 | "cell_type": "code",
479 | "execution_count": 25,
480 | "metadata": {},
481 | "outputs": [
482 | {
483 | "data": {
484 | "text/plain": [
485 | "0.9044444444444445"
486 | ]
487 | },
488 | "execution_count": 25,
489 | "metadata": {},
490 | "output_type": "execute_result"
491 | }
492 | ],
493 | "source": [
494 | "from sklearn.dummy import DummyClassifier\n",
495 | "dummy_majority = DummyClassifier(strategy = 'most_frequent').fit(X_train1, y_train1)\n",
496 | "y_dummy_predictions = dummy_majority.predict(X_test)\n",
497 | "dummy_majority.score(X_test1, y_test1)"
498 | ]
499 | },
500 | {
501 | "cell_type": "markdown",
502 | "metadata": {},
503 | "source": [
504 | "We observe that the RandomForest classsifier score is not much compared to dummy classifier which also has a score of more than .90 . Which shows that RandomForest is not a right fit for the model despite the good score.\n",
505 | "This makes us realise that we need a better model which scores better ."
506 | ]
507 | },
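{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A minimal added sketch (not part of the original notebook): the F-score motivation mentioned\n",
"# above can be made concrete. Accuracy hides the imbalance, while recall/F1 on the minority class\n",
"# expose how weak the 'most_frequent' dummy baseline really is.\n",
"# Assumes X_test1, y_test1 and dummy_majority from the cells above.\n",
"from sklearn.metrics import f1_score, recall_score\n",
"\n",
"dummy_preds = dummy_majority.predict(X_test1)\n",
"print('Dummy recall on class 1:', recall_score(y_test1, dummy_preds))\n",
"print('Dummy F1 on class 1    :', f1_score(y_test1, dummy_preds))\n"
]
},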
508 | {
509 | "cell_type": "code",
510 | "execution_count": 27,
511 | "metadata": {},
512 | "outputs": [
513 | {
514 | "data": {
515 | "text/plain": [
516 | "0.9955555555555555"
517 | ]
518 | },
519 | "execution_count": 27,
520 | "metadata": {},
521 | "output_type": "execute_result"
522 | }
523 | ],
524 | "source": [
525 | "from sklearn.svm import SVC\n",
526 | "svm = SVC(kernel='rbf', C=1).fit(X_train1, y_train1)\n",
527 | "svm.score(X_test1, y_test1)"
528 | ]
529 | },
530 | {
531 | "cell_type": "markdown",
532 | "metadata": {},
533 | "source": [
534 | "On using SVM classifier using RBF kernel for the model,gives a whoping score of 0.99 which is a good score as well as it performce better than dummy classifier which is our baseline. "
535 | ]
536 | },
537 | {
538 | "cell_type": "markdown",
539 | "metadata": {},
540 | "source": [
541 | "*Thus, Dummy Classifier works as a baseline and gives an idea of the performance of the model on dataset*\n"
542 | ]
543 | },
544 | {
545 | "cell_type": "code",
546 | "execution_count": null,
547 | "metadata": {},
548 | "outputs": [],
549 | "source": []
550 | }
551 | ],
552 | "metadata": {
553 | "kernelspec": {
554 | "display_name": "Python 3",
555 | "language": "python",
556 | "name": "python3"
557 | },
558 | "language_info": {
559 | "codemirror_mode": {
560 | "name": "ipython",
561 | "version": 3
562 | },
563 | "file_extension": ".py",
564 | "mimetype": "text/x-python",
565 | "name": "python",
566 | "nbconvert_exporter": "python",
567 | "pygments_lexer": "ipython3",
568 | "version": "3.7.6"
569 | }
570 | },
571 | "nbformat": 4,
572 | "nbformat_minor": 4
573 | }
574 |
--------------------------------------------------------------------------------
/Hypothesis Testing/Hypo_Testing.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "name": "Hypo_Testing.ipynb",
7 | "provenance": [],
8 | "toc_visible": true
9 | },
10 | "kernelspec": {
11 | "name": "python3",
12 | "display_name": "Python 3"
13 | },
14 | "language_info": {
15 | "name": "python"
16 | }
17 | },
18 | "cells": [
19 | {
20 | "cell_type": "markdown",
21 | "metadata": {
22 | "id": "4uItjVLu5mXc"
23 | },
24 | "source": [
25 | "# T-test\n",
26 | "\n",
27 | "A t-test is a type of inferential statistic which is used to determine if there\n",
28 | "is a significant difference between the means of two groups which may be related in certain features.\n",
29 | "\n",
30 | "T-test has 2 types : 1. one sampled t-test 2. two-sampled t-test.\n",
31 | "One sample t-test : The One Sample t Test determines whether the\n",
32 | "sample mean is statistically different from a known or hypothesised population mean. The One Sample t Test is a parametric test."
33 | ]
34 | },
35 | {
36 | "cell_type": "code",
37 | "metadata": {
38 | "colab": {
39 | "base_uri": "https://localhost:8080/"
40 | },
41 | "id": "y90dateE5d9E",
42 | "outputId": "69e603b5-689b-47d9-c61a-cc1d3f3b3279"
43 | },
44 | "source": [
45 | "#-----------------------------------------T-test---------------------------------#\n",
46 | "\n",
47 | "\n",
48 | "from scipy.stats import ttest_1samp\n",
49 | "import numpy as np\n",
50 | "\n",
51 | "#10 ages and you are checking whether avg age is 30 or not.\n",
52 | "#H0: The average age is 30\n",
53 | "#H1: The average age is not 30.\n",
54 | "ages = np.array([32,34,29,29,22,39,38,37,38,36,30,26,22,22])\n",
55 | "print(ages)\n",
56 | "#mean of the age \n",
57 | "ages_mean = np.mean(ages)\n",
58 | "print(ages_mean)\n",
59 | "#One Sample t-test\n",
60 | "tset, pval = ttest_1samp(ages, 30)\n",
61 | "print('p-values',pval)\n",
62 | "if pval < 0.05: # alpha value is 0.05 or 5%\n",
63 | " print(\" we are rejecting null hypothesis\")\n",
64 | "else:\n",
65 | " print(\"we are accepting null hypothesis\")\n"
66 | ],
67 | "execution_count": 1,
68 | "outputs": [
69 | {
70 | "output_type": "stream",
71 | "text": [
72 | "[32 34 29 29 22 39 38 37 38 36 30 26 22 22]\n",
73 | "31.0\n",
74 | "p-values 0.5605155888171379\n",
75 | "we are accepting null hypothesis\n"
76 | ],
77 | "name": "stdout"
78 | }
79 | ]
80 | },
81 | {
82 | "cell_type": "markdown",
83 | "metadata": {
84 | "id": "Lie2acfC5lrP"
85 | },
86 | "source": [
87 | "# Z-Test\n",
88 | "\n",
89 | "Z test is used if:\n",
90 | "Your sample size is greater than 30. Otherwise, use a t test.\n",
91 | "Data points should be independent from each other. In other words, one data point isn’t related or doesn’t affect another data point.\n",
92 | "Your data should be normally distributed. However, for large sample sizes (over 30) this doesn’t always matter.\n",
93 | "Your data should be randomly selected from a population, where each item has an equal chance of being selected.\n",
94 | "Sample sizes should be equal if at all possible."
95 | ]
96 | },
97 | {
98 | "cell_type": "code",
99 | "metadata": {
100 | "colab": {
101 | "base_uri": "https://localhost:8080/"
102 | },
103 | "id": "uTfU9vxz6Dhc",
104 | "outputId": "0da843b1-50b7-4a44-9ad0-d37f9e7479cd"
105 | },
106 | "source": [
107 | "#-----------One Sample Z-test-----------#\n",
108 | "import pandas as pd\n",
109 | "from scipy import stats\n",
110 | "from statsmodels.stats import weightstats as stests\n",
111 | "df = pd.read_csv(\"blood_pressure.csv\")\n",
112 | "\n",
113 | "ztest ,pval = stests.ztest(df['bp_before'], x2=None, value=156)\n",
114 | "print('One-sample z-test')\n",
115 | "print(float(pval))\n",
116 | "if pval<0.05:\n",
117 | " print(\"reject null hypothesis\")\n",
118 | "else:\n",
119 | " print(\"accept null hypothesis\")\n",
120 | "\n",
121 | "#-----------Two Sample Z-test-----------#\n",
122 | "#Two-sample Z test- Just check two independent data groups and decide whether sample mean of two group is equal or not.\n",
123 | "#H0 : mean of two group is 0\n",
124 | "#H1 : mean of two group is not 0\n",
125 | "#Example : we are checking in blood data after blood and before blood data.\n",
126 | "\n",
127 | "ztest ,pval1 = stests.ztest(df['bp_before'], x2=df['bp_after'], value=0,alternative='two-sided')\n",
128 | "print('Two-sample z-test')\n",
129 | "print(float(pval1))\n",
130 | "if pval1<0.05:\n",
131 | " print(\"reject null hypothesis\")\n",
132 | "else:\n",
133 | " print(\"accept null hypothesis\")"
134 | ],
135 | "execution_count": 2,
136 | "outputs": [
137 | {
138 | "output_type": "stream",
139 | "text": [
140 | "One-sample z-test\n",
141 | "0.6651614730255063\n",
142 | "accept null hypothesis\n",
143 | "Two-sample z-test\n",
144 | "0.002162306611369422\n",
145 | "reject null hypothesis\n"
146 | ],
147 | "name": "stdout"
148 | },
149 | {
150 | "output_type": "stream",
151 | "text": [
152 | "/usr/local/lib/python3.7/dist-packages/statsmodels/tools/_testing.py:19: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.\n",
153 | " import pandas.util.testing as tm\n"
154 | ],
155 | "name": "stderr"
156 | }
157 | ]
158 | },
159 | {
160 | "cell_type": "markdown",
161 | "metadata": {
162 | "id": "O0AqvlnM6aLN"
163 | },
164 | "source": [
165 | "# ANOVA (F-TEST) :- \n",
166 | "The t-test works well when dealing with two groups, but sometimes we want to compare more than two groups at the same time.\n",
167 | "For example, if we wanted to test whether voter age differs based on some categorical variable like race, we have to compare the means of each level or group the variable. \n",
168 | "The analysis of variance or ANOVA is a statistical inference test that lets you compare multiple groups at the same time.\n"
169 | ]
170 | },
171 | {
172 | "cell_type": "code",
173 | "metadata": {
174 | "colab": {
175 | "base_uri": "https://localhost:8080/"
176 | },
177 | "id": "uZRF_2qM6Qv_",
178 | "outputId": "64439f9f-a116-497d-90e6-d43e0bd552dd"
179 | },
180 | "source": [
181 | "\n",
182 | "#------------------------One Way F-test(Anova)------------------------# \n",
183 | "#To tell whether two or more groups are similar or not based on their mean similarity and f-score.\n",
184 | "#Example : there are 3 different category of plant and their weight and need to check whether all 3 group are similar or not.\n",
185 | "import pandas as pd\n",
186 | "from scipy import stats\n",
187 | "from statsmodels.stats import weightstats as stests\n",
188 | "print('One-way Anova')\n",
189 | "df_anova = pd.read_csv('PlantGrowth.csv')\n",
190 | "df_anova = df_anova[['weight','group']]\n",
191 | "grps = pd.unique(df_anova.group.values)\n",
192 | "d_data = {grp:df_anova['weight'][df_anova.group == grp] for grp in grps}\n",
193 | " \n",
194 | "F, p = stats.f_oneway(d_data['ctrl'], d_data['trt1'], d_data['trt2'])\n",
195 | "print(\"p-value for significance is: \", p)\n",
196 | "if p<0.05:\n",
197 | " print(\"reject null hypothesis\")\n",
198 | "else:\n",
199 | " print(\"accept null hypothesis\")\n",
200 | "\n",
201 | "#------------------------------------Two Way F-test-----------------------------------# \n",
202 | "#Two way F-test is extension of 1-way f-test, it is used when we have 2 independent variable and 2+ groups.\n",
203 | "#2-way F-test does not tell which variable is dominant. If we need to check individual significance then Post-hoc testing need to be performed.\n",
204 | "\n",
205 | "#e.g: Grand mean crop yield (the mean crop yield not by any sub-group), as well the mean crop yield by each factor, \n",
206 | "# as well as by the factors grouped together.\n",
207 | "import statsmodels.api as sm\n",
208 | "from statsmodels.formula.api import ols\n",
209 | "print('Two-way ANova')\n",
210 | "df_anova2 = pd.read_csv(\"https://raw.githubusercontent.com/Opensourcefordatascience/Data-sets/master/crop_yield.csv\")\n",
211 | "model = ols('Yield ~ C(Fert)*C(Water)', df_anova2).fit()\n",
212 | "print(f\"Overall model F({model.df_model: .0f},{model.df_resid: .0f}) = {model.fvalue: .3f}, p = {model.f_pvalue: .4f}\")\n",
213 | "res = sm.stats.anova_lm(model, typ= 2)\n",
214 | "print(res)"
215 | ],
216 | "execution_count": 3,
217 | "outputs": [
218 | {
219 | "output_type": "stream",
220 | "text": [
221 | "One-way Anova\n",
222 | "p-value for significance is: 0.0159099583256229\n",
223 | "reject null hypothesis\n",
224 | "Two-way ANova\n",
225 | "Overall model F( 3, 16) = 4.112, p = 0.0243\n",
226 | " sum_sq df F PR(>F)\n",
227 | "C(Fert) 69.192 1.0 5.766000 0.028847\n",
228 | "C(Water) 63.368 1.0 5.280667 0.035386\n",
229 | "C(Fert):C(Water) 15.488 1.0 1.290667 0.272656\n",
230 | "Residual 192.000 16.0 NaN NaN\n"
231 | ],
232 | "name": "stdout"
233 | }
234 | ]
235 | }
236 | ]
237 | }
--------------------------------------------------------------------------------
/Hypothesis Testing/Hypothesis Testing.md:
--------------------------------------------------------------------------------
1 | # What is Hypothesis Testing?
2 |
3 | Hypothesis testing is a statistical method that is used in making statistical decisions using experimental data.
4 | It evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data.
5 |
6 | ## Important Parameters:
7 |
8 | **Null Hypothesis** :
9 | 1) A statement about a population parameter.
10 | 2) Contains: '=', '<=', '>='
11 | 3) We assume the null hypothesis to be true, and then decide whether to reject it in favour of the alternative hypothesis.
12 |
13 | **Alternative Hypothesis** :
14 | 1) A statement that directly contradicts the null hypothesis.
15 | 2) Contains : 'Not equal to', '>', '<'
16 |
17 | **Level Of Significance**:
18 | The degree of significance at which we accept or reject the null hypothesis. (Usually taken as 5%, which means you should be 95% confident of getting a similar result in each sample.)
19 |
20 | **Type I error:**
21 | When we reject the null hypothesis, although that hypothesis was true. (alpha)
22 |
23 | **Type II error :**
24 | When we accept the null hypothesis but it is false. (Beta)
25 |
26 | **One tailed test :-**
27 | A test of a statistical hypothesis, where the region of rejection is on only one side of the sampling distribution, is called a one-tailed test.
28 |
29 | A box has ≥ 40 chocolates.
30 |
31 | **Two-tailed test :-**
32 | A two-tailed test is a statistical test in which the critical area of a distribution is two-sided and tests whether a sample is greater than or less than a certain range of values. If the sample being tested falls into either of the critical areas, the alternative hypothesis is accepted instead of the null hypothesis.
33 |
34 | A box != 40 chocolates.
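
A quick sketch of the difference in code (illustrative only, with made-up numbers; the `alternative` argument needs SciPy >= 1.6):

```python
from scipy.stats import ttest_1samp

counts = [41, 39, 42, 40, 38, 43, 41, 40]   # hypothetical chocolate counts per box
# One-tailed: is the mean number of chocolates greater than 40?
print(ttest_1samp(counts, 40, alternative='greater'))
# Two-tailed: is the mean number of chocolates different from 40?
print(ttest_1samp(counts, 40, alternative='two-sided'))
```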
35 |
36 | **P-value**
37 | P-value or the calculated probability, is the probability of finding the observed, or more extreme, results when the null hypothesis (H0) of a study question is true.
38 | If your P value is less than the chosen significance level then you reject the null hypothesis i.e. accept that your sample gives reasonable evidence to support the alternative hypothesis.
39 |
40 | Example: You have a coin and you don’t know whether it is fair or tricky, so let’s set up the null and alternative hypotheses:
41 | H0 : The coin is a fair coin.
42 | H1 : The coin is a tricky coin.
43 | Confidence level : 95%
44 | alpha = 5% or 0.05
45 | 
46 | Now let’s toss the coin and calculate the p-value (probability value).
47 | Toss the coin a 1st time and the result is tails: p-value = 50% (as heads and tails have equal probability).
48 | Toss the coin a 2nd time and the result is tails again: now p-value = 50/2 = 25%.
49 | Similarly, after 6 consecutive tails the p-value is about 1.5% (0.5^6 ≈ 1.56%). We set our significance level at 5%, and here the p-value has fallen below that level (p-value < alpha), i.e. our null hypothesis does not hold up. So we reject the null hypothesis and conclude that the coin is, in fact, a tricky coin.
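
The coin arithmetic above can be reproduced with a short snippet (illustrative only; uses `scipy.stats.binomtest`, available in SciPy >= 1.7):

```python
from scipy.stats import binomtest

alpha = 0.05                                   # 5% significance level
# Probability of seeing 6 or more tails in 6 tosses of a fair coin (one-tailed)
result = binomtest(k=6, n=6, p=0.5, alternative='greater')
print(result.pvalue)                           # ~0.0156, i.e. about 1.5%
if result.pvalue < alpha:
    print("reject the null hypothesis: the coin looks tricky")
else:
    print("fail to reject the null hypothesis")
```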
50 |
51 | Some of the widely used hypothesis tests and codes are discussed.
52 |
53 |
54 |
55 |
--------------------------------------------------------------------------------
/Hypothesis Testing/PlantGrowth.csv:
--------------------------------------------------------------------------------
1 | "","weight","group"
2 | "1",4.17,"ctrl"
3 | "2",5.58,"ctrl"
4 | "3",5.18,"ctrl"
5 | "4",6.11,"ctrl"
6 | "5",4.5,"ctrl"
7 | "6",4.61,"ctrl"
8 | "7",5.17,"ctrl"
9 | "8",4.53,"ctrl"
10 | "9",5.33,"ctrl"
11 | "10",5.14,"ctrl"
12 | "11",4.81,"trt1"
13 | "12",4.17,"trt1"
14 | "13",4.41,"trt1"
15 | "14",3.59,"trt1"
16 | "15",5.87,"trt1"
17 | "16",3.83,"trt1"
18 | "17",6.03,"trt1"
19 | "18",4.89,"trt1"
20 | "19",4.32,"trt1"
21 | "20",4.69,"trt1"
22 | "21",6.31,"trt2"
23 | "22",5.12,"trt2"
24 | "23",5.54,"trt2"
25 | "24",5.5,"trt2"
26 | "25",5.37,"trt2"
27 | "26",5.29,"trt2"
28 | "27",4.92,"trt2"
29 | "28",6.15,"trt2"
30 | "29",5.8,"trt2"
31 | "30",5.26,"trt2"
32 |
--------------------------------------------------------------------------------
/Hypothesis Testing/blood_pressure.csv:
--------------------------------------------------------------------------------
1 | patient,sex,agegrp,bp_before,bp_after
2 | 1,Male,30-45,143,153
3 | 2,Male,30-45,163,170
4 | 3,Male,30-45,153,168
5 | 4,Male,30-45,153,142
6 | 5,Male,30-45,146,141
7 | 6,Male,30-45,150,147
8 | 7,Male,30-45,148,133
9 | 8,Male,30-45,153,141
10 | 9,Male,30-45,153,131
11 | 10,Male,30-45,158,125
12 | 11,Male,30-45,149,164
13 | 12,Male,30-45,173,159
14 | 13,Male,30-45,165,135
15 | 14,Male,30-45,145,159
16 | 15,Male,30-45,143,153
17 | 16,Male,30-45,152,126
18 | 17,Male,30-45,141,162
19 | 18,Male,30-45,176,134
20 | 19,Male,30-45,143,136
21 | 20,Male,30-45,162,150
22 | 21,Male,46-59,149,168
23 | 22,Male,46-59,156,155
24 | 23,Male,46-59,151,136
25 | 24,Male,46-59,159,132
26 | 25,Male,46-59,164,160
27 | 26,Male,46-59,154,160
28 | 27,Male,46-59,152,136
29 | 28,Male,46-59,142,183
30 | 29,Male,46-59,162,152
31 | 30,Male,46-59,155,162
32 | 31,Male,46-59,175,151
33 | 32,Male,46-59,184,139
34 | 33,Male,46-59,167,175
35 | 34,Male,46-59,148,184
36 | 35,Male,46-59,170,151
37 | 36,Male,46-59,159,171
38 | 37,Male,46-59,149,157
39 | 38,Male,46-59,140,159
40 | 39,Male,46-59,185,140
41 | 40,Male,46-59,160,174
42 | 41,Male,60+,157,167
43 | 42,Male,60+,158,158
44 | 43,Male,60+,162,168
45 | 44,Male,60+,160,159
46 | 45,Male,60+,180,153
47 | 46,Male,60+,155,164
48 | 47,Male,60+,172,169
49 | 48,Male,60+,157,148
50 | 49,Male,60+,171,185
51 | 50,Male,60+,170,163
52 | 51,Male,60+,175,146
53 | 52,Male,60+,175,160
54 | 53,Male,60+,172,175
55 | 54,Male,60+,173,163
56 | 55,Male,60+,170,185
57 | 56,Male,60+,164,146
58 | 57,Male,60+,147,176
59 | 58,Male,60+,154,147
60 | 59,Male,60+,172,161
61 | 60,Male,60+,162,164
62 | 61,Female,30-45,152,149
63 | 62,Female,30-45,147,142
64 | 63,Female,30-45,144,146
65 | 64,Female,30-45,144,138
66 | 65,Female,30-45,158,131
67 | 66,Female,30-45,147,145
68 | 67,Female,30-45,154,134
69 | 68,Female,30-45,151,135
70 | 69,Female,30-45,149,131
71 | 70,Female,30-45,138,135
72 | 71,Female,30-45,162,133
73 | 72,Female,30-45,157,135
74 | 73,Female,30-45,141,168
75 | 74,Female,30-45,167,144
76 | 75,Female,30-45,147,147
77 | 76,Female,30-45,143,151
78 | 77,Female,30-45,142,149
79 | 78,Female,30-45,166,147
80 | 79,Female,30-45,147,149
81 | 80,Female,30-45,142,135
82 | 81,Female,46-59,157,127
83 | 82,Female,46-59,170,150
84 | 83,Female,46-59,150,138
85 | 84,Female,46-59,150,147
86 | 85,Female,46-59,167,157
87 | 86,Female,46-59,154,146
88 | 87,Female,46-59,143,148
89 | 88,Female,46-59,157,136
90 | 89,Female,46-59,149,146
91 | 90,Female,46-59,161,132
92 | 91,Female,46-59,142,145
93 | 92,Female,46-59,162,132
94 | 93,Female,46-59,144,157
95 | 94,Female,46-59,142,140
96 | 95,Female,46-59,159,137
97 | 96,Female,46-59,140,154
98 | 97,Female,46-59,144,169
99 | 98,Female,46-59,142,145
100 | 99,Female,46-59,145,137
101 | 100,Female,46-59,145,143
102 | 101,Female,60+,168,178
103 | 102,Female,60+,142,141
104 | 103,Female,60+,147,149
105 | 104,Female,60+,148,148
106 | 105,Female,60+,162,138
107 | 106,Female,60+,170,143
108 | 107,Female,60+,173,167
109 | 108,Female,60+,151,158
110 | 109,Female,60+,155,152
111 | 110,Female,60+,163,154
112 | 111,Female,60+,183,161
113 | 112,Female,60+,159,143
114 | 113,Female,60+,148,159
115 | 114,Female,60+,151,177
116 | 115,Female,60+,165,142
117 | 116,Female,60+,152,152
118 | 117,Female,60+,161,152
119 | 118,Female,60+,165,174
120 | 119,Female,60+,149,151
121 | 120,Female,60+,185,163
122 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2020 Pankhuri Saxena
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | Possible project for Kharagpur Winter of Code 2020
2 |
3 | # Statistics and Econometrics for Data Science
4 | 
5 | 
6 | 
7 | 
8 | 
9 |
10 | ## Table of Contents
11 | 1. How are the topics even related to ML?
12 | 2. What will the project entail?
13 | 3. How to start with the project?
14 | 4. What are the prerequisites for the project?
15 | 5. What can you contribute to the project?
16 | 6. Expectations from the project
17 | 7. How much is ML and how much is statistics/econometrics?
18 | 8. Who to contact?
19 |
20 |
21 |
22 | ## How are the topics even related to ML?
23 | Often while building models in ML we become too concerned with accuracy and forget whether
24 | the model does what we initially set out to do. Statistics and Econometrics help in
25 | building better models and understanding the data. They can help in better feature engineering,
26 | and a better understanding of the assumptions which can help in ultimately building better models.
27 | Running linear regression sounds easy, but what if someone asks you what assumptions you made
28 | while running the model? If your answer is "Umm..." then you are on track to understanding
29 | what these topics can contribute to ML (if you didn't already know).
30 |
31 | Due to certain limitations, for the time being, we are concerned with only Linear Regression.
32 | This is just a very small subset of ML but let's start with tiny steps to progress.
33 |
34 |
35 |
36 | ## What will the project entail?
37 | The project aims to have a series of notebooks that will help in understanding the basic topics.
38 | The notebooks could be used to get a broad overview of the topic or to quickly revise the topic.
39 | The notebooks can be helpful in the following ways:
40 | - You are participating in a competition and you want to run some quick checks on the data/model
41 | - You are sitting for internship/placement and need to revise some topics fast
42 | - You want some code snippet for a certain test and how to interpret the test results.
43 |
44 |
45 |
46 | ## How to start with the project?
47 | 1. Install Jupyter Notebook, recommended installing with [Anaconda](https://www.anaconda.com/products/individual)
48 | 2. Learn how to use Jupyter Notebook, and python libraries NumPy, pandas, and matplotlib
49 | 3. Clone this repo and make a new branch
50 | 4. Each ipynb file should be able to stand independently so you should be able to open it using Jupyter Notebook
51 |
52 |
53 |
54 | ## What are the prerequisites for the project?
55 | - Basic knowledge of at least one programming language (preferably Python)
56 | - Basic knowledge of probability (class 12 level)
57 | - Desire to learn statistics
58 |
59 |
60 |
61 | ## What can you contribute to the project?
62 | Easy: Make some changes to the existing graphs or explanation to make them look better,
63 | add new ideas to 'ideas.md', check if existing notebooks make sense
64 |
65 | Intermediate: Start with a new notebook of your own
66 |
67 | Advanced: Make a series of notebooks or explain a complicated/advanced topic
68 |
69 |
70 |
71 | ## Expectations from the project
72 | There will be a variety of issues, some easy to get you started and some harder to make you
73 | significantly contribute. But I'll set down the minimum expected work that you should do to
74 | pass. By mid-evals, you should have at least one new notebook and by end-evals, you should have
75 | at least three new notebooks ready. Each notebook should have some introduction to the topic,
76 | mathematical proofs if required, the code to implement that topic from scratch, and any ready-made
77 | library code, if available.
78 |
79 | The notebooks referred to here are Jupyter Notebooks.
80 |
81 |
82 |
83 | ## How much is ML and how much is statistics/econometrics?
84 | Well, your learning from this will be less towards ML. These topics are to provide support to ML
85 | and do not replace the importance of doing a course/project purely based on machine learning.
86 |
87 |
88 |
89 | ## Who to contact?
90 | The project was started by PetalsOnWind (Pankhuri Saxena, a fourth-year Economics student at IIT KGP).
91 | She can be reached at pankhurisaxena[dot]iitkgp[at]gmail[dot]com.
92 |
93 | ## Contributors:
94 |
95 | ### Credit goes to these people: ✨
96 |
97 |
--------------------------------------------------------------------------------
/notebooks/Linear_Discriminant_Analysis.ipynb:
--------------------------------------------------------------------------------
67 | "- LDA fails to find the lower-dimensional space if the dimensions are much higher than\n",
68 | "  the number of samples in the data matrix.\n",
69 | "- LDA produces at most C-1 feature projections: if the classification error estimates establish that more features are needed, some other method must be employed to provide those additional features.\n",
70 | "- LDA is a parametric method since it assumes unimodal Gaussian likelihoods: if the distributions are significantly non-Gaussian, the LDA projections will not be able to preserve any complex structure of the data that may be needed for classification.\n",
71 | "- LDA will fail when the discriminatory information is not in the mean, but rather in the variance of the data.\n",
"\n",
"### Differences between PCA and LDA:\n",
"- PCA is an unsupervised algorithm while LDA is a supervised algorithm.\n",
82 | "- The goal of PCA is to maximize variation in the given dataset, while LDA focuses on\n",
83 | "  maximizing separability among known categories.\n",
84 | "- LDA performs better on multi-class classification tasks than PCA. However, PCA performs better when the sample size is comparatively small. An example would be comparisons between classification accuracies used in image classification.\n",
85 | "\n",
86 | " "
87 | ]
88 | },
89 | {
90 | "cell_type": "markdown",
91 | "metadata": {},
92 | "source": [
93 | "### Following are the extensions of LDA in case we need to use non-linear discriminant analysis:\n",
94 | "
\n",
95 | "
Quadratic Discriminant Analysis (QDA): Each class uses its own estimate of variance (or covariance when there are multiple input variables).
\n",
96 | "
Flexible Discriminant Analysis (FDA): Where non-linear combinations of inputs is used such as splines.
\n",
97 | "
Regularized Discriminant Analysis (RDA): Introduces regularization into the estimate of the variance (actually covariance), moderating the influence of different variables on LDA.
Face Recognition: In the field of Computer Vision, face recognition is a very popular application in which each face is represented by a very large number of pixel values. Linear discriminant analysis (LDA) is used here to reduce the number of features to a more manageable number before the process of classification. Each of the new dimensions generated is a linear combination of pixel values, which form a template. The linear combinations obtained using Fisher’s linear discriminant are called Fisher faces.
\n",
108 | "
Medical: In this field, Linear discriminant analysis (LDA) is used to classify the patient disease state as mild, moderate or severe based upon the patient various parameters and the medical treatment he is going through. This helps the doctors to intensify or reduce the pace of their treatment.
\n",
109 | "
Customer Identification: Suppose we want to identify the type of customers which are most likely to buy a particular product in a shopping mall. By doing a simple question and answers survey, we can gather all the features of the customers. Here, Linear discriminant analysis will help us to identify and select the features which can describe the characteristics of the group of customers that are most likely to buy that particular product in the shopping mall.
\n",
110 | "
"
111 | ]
112 | },
113 | {
114 | "cell_type": "code",
115 | "execution_count": 1,
116 | "metadata": {},
117 | "outputs": [],
118 | "source": [
119 | "# Import necessary modules\n",
120 | "import numpy as np\n",
121 | "import pandas as np\n",
122 | "from sklearn.datasets import make_classification\n",
123 | "from sklearn.model_selection import cross_val_score\n",
124 | "from sklearn.model_selection import RepeatedStratifiedKFold\n",
125 | "from sklearn.pipeline import Pipeline\n",
126 | "from sklearn.discriminant_analysis import LinearDiscriminantAnalysis\n",
127 | "from sklearn.naive_bayes import GaussianNB"
128 | ]
129 | },
130 | {
131 | "cell_type": "code",
132 | "execution_count": 2,
133 | "metadata": {},
134 | "outputs": [],
135 | "source": [
136 | "# Generating data for our problem\n",
137 | "X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=7, n_classes=10)"
138 | ]
139 | },
140 | {
141 | "cell_type": "code",
142 | "execution_count": 3,
143 | "metadata": {},
144 | "outputs": [
145 | {
146 | "name": "stdout",
147 | "output_type": "stream",
148 | "text": [
149 | "[[ 2.3548775 -1.69674567 1.6193882 ... -3.33390362 2.45147541\n",
150 | " -1.23455205]\n",
151 | " [ 2.0204277 -1.62734821 -2.27697377 ... -0.28274722 -7.28166465\n",
152 | " -0.91070347]\n",
153 | " [-1.02400669 1.01276423 1.05505825 ... 3.83923974 -1.63530582\n",
154 | " 3.96050914]\n",
155 | " ...\n",
156 | " [-0.36448581 -0.2996303 2.21875138 ... -1.11303373 3.67576043\n",
157 | " -1.44164572]\n",
158 | " [ 0.05614772 1.87270289 -2.63165761 ... -3.07434527 2.31606352\n",
159 | " 1.65068838]\n",
160 | " [ 1.09853247 1.61067335 2.7977282 ... -1.62233539 14.09727916\n",
161 | " 2.27215759]]\n"
162 | ]
163 | }
164 | ],
165 | "source": [
166 | "print(X)"
167 | ]
168 | },
169 | {
170 | "cell_type": "code",
171 | "execution_count": 4,
172 | "metadata": {},
173 | "outputs": [
174 | {
175 | "name": "stdout",
176 | "output_type": "stream",
177 | "text": [
178 | "[9 4 4 1 2 7 9 4 6 9 1 3 9 5 6 4 9 2 0 4 4 8 0 9 3 8 6 0 0 7 8 3 8 5 5 9 7\n",
179 | " 1 3 1 8 7 6 7 4 6 5 6 2 8 3 1 7 0 7 0 4 5 1 6 6 8 3 3 3 2 5 8 0 5 6 2 7 1\n",
180 | " 3 8 7 2 8 0 6 8 2 9 9 8 2 2 5 6 9 4 6 1 4 9 0 9 7 8 7 2 0 8 8 1 7 9 8 1 6\n",
181 | " 3 9 2 5 5 3 9 1 1 2 0 8 0 7 2 0 5 0 1 8 0 2 2 1 3 0 2 5 9 3 8 8 7 7 0 4 3\n",
182 | " 4 0 8 5 3 7 4 4 5 5 0 4 5 1 1 4 3 5 2 6 4 2 1 6 6 9 5 3 7 0 1 5 9 5 4 7 3\n",
183 | " 9 0 0 1 9 5 2 2 7 4 0 1 2 6 4 3 7 6 8 8 3 0 8 3 0 5 5 1 7 8 6 8 4 1 1 3 1\n",
184 | " 9 9 3 2 8 1 8 1 7 6 1 1 7 6 5 3 4 1 6 5 2 8 6 5 9 0 6 9 6 2 3 4 8 3 8 4 8\n",
185 | " 1 0 4 0 6 3 8 4 6 9 2 9 2 7 5 1 6 3 0 6 9 3 7 1 5 5 0 9 4 8 9 2 8 2 9 3 2\n",
186 | " 3 5 1 8 0 0 6 5 1 3 2 8 1 8 6 7 3 2 5 9 6 2 3 4 2 1 5 4 2 9 5 1 7 1 6 0 2\n",
187 | " 8 6 1 8 7 8 0 3 0 7 1 0 4 1 4 2 0 8 2 7 9 7 3 5 1 5 1 4 9 0 4 9 5 0 8 9 1\n",
188 | " 2 9 2 8 4 7 9 7 8 4 9 1 7 8 3 7 3 1 9 6 2 9 4 6 8 1 1 5 6 3 0 3 4 8 7 5 6\n",
189 | " 9 9 6 4 8 2 6 2 7 0 6 8 0 7 0 1 5 7 3 2 2 3 5 2 1 3 6 9 5 4 3 6 7 9 2 4 2\n",
190 | " 5 0 2 7 4 5 9 1 3 1 8 6 3 1 1 3 3 7 6 6 5 5 8 7 8 9 5 0 7 4 6 3 9 4 7 4 3\n",
191 | " 5 7 6 7 6 7 9 7 7 7 7 5 7 5 1 6 3 2 6 5 1 0 6 0 1 5 8 9 6 6 3 6 3 6 0 0 8\n",
192 | " 9 7 6 4 6 8 3 3 5 2 6 3 3 9 2 8 9 2 5 8 6 1 4 4 6 0 9 6 4 3 4 4 2 0 7 3 3\n",
193 | " 4 9 0 5 3 6 4 8 3 5 2 5 8 2 1 5 4 2 3 7 8 0 1 4 0 6 8 2 7 4 8 1 4 3 5 0 3\n",
194 | " 8 3 1 9 9 6 0 8 0 7 1 9 2 7 8 6 0 2 3 8 8 8 2 9 0 3 1 4 3 9 9 2 5 0 3 4 1\n",
195 | " 3 6 6 2 6 2 2 6 5 4 2 6 3 2 7 2 3 3 3 2 2 2 9 7 9 0 9 0 5 3 6 0 3 8 2 6 3\n",
196 | " 7 5 0 2 4 8 9 9 4 2 8 3 6 9 6 7 1 0 4 4 4 1 7 9 6 4 9 7 1 8 0 1 8 9 7 5 4\n",
197 | " 8 3 5 6 6 8 1 2 2 3 0 0 0 9 8 0 3 8 7 9 5 4 6 6 0 1 5 5 1 6 4 7 1 2 0 3 4\n",
198 | " 0 4 0 7 5 7 0 3 8 3 0 9 7 5 6 2 8 5 2 5 3 7 9 1 0 2 2 1 1 9 2 9 2 8 0 4 5\n",
199 | " 4 0 1 6 6 5 2 5 0 1 7 6 5 0 0 3 4 2 1 6 6 5 4 3 3 4 9 4 2 3 5 1 4 5 1 7 8\n",
200 | " 7 0 6 9 5 5 9 2 9 8 1 7 0 1 9 9 9 3 2 5 5 6 2 1 7 4 0 3 5 7 7 7 1 2 2 8 9\n",
201 | " 1 7 3 9 0 2 1 6 1 4 3 6 6 0 1 3 2 8 4 0 7 4 7 9 8 7 1 6 0 1 4 2 3 5 9 5 7\n",
202 | " 8 2 0 9 0 0 1 0 6 3 1 9 6 8 2 2 8 9 7 3 4 9 7 4 0 5 4 4 1 7 2 8 4 6 1 8 8\n",
203 | " 3 4 7 5 7 0 5 8 4 5 8 5 9 6 7 1 5 1 6 9 2 1 9 7 2 4 0 7 3 7 5 4 5 7 8 3 5\n",
204 | " 2 9 4 0 5 4 9 6 9 5 4 2 1 7 2 3 4 1 7 4 4 8 3 3 7 4 5 4 8 0 7 9 2 7 8 6 8\n",
205 | " 6]\n"
206 | ]
207 | }
208 | ],
209 | "source": [
210 | "print(y)"
211 | ]
212 | },
213 | {
214 | "cell_type": "code",
215 | "execution_count": 5,
216 | "metadata": {},
217 | "outputs": [
218 | {
219 | "name": "stdout",
220 | "output_type": "stream",
221 | "text": [
222 | "(1000, 20)\n"
223 | ]
224 | }
225 | ],
226 | "source": [
227 | "# Shape of inout data\n",
228 | "print(X.shape)"
229 | ]
230 | },
231 | {
232 | "cell_type": "code",
233 | "execution_count": 6,
234 | "metadata": {},
235 | "outputs": [],
236 | "source": [
237 | "# Defining model\n",
238 | "def model(n_components=None, solver='svd', shrinkage=None, priors=None,\n",
239 | " store_covariance=False, tol=0.0001, covariance_estimator=None):\n",
240 | " '''\n",
241 | " n_components : int, default=None\n",
242 | " Number of components (<= min(n_classes - 1, n_features)) for dimensionality reduction. \n",
243 | " If None, will be set to min(n_classes - 1, n_features). This parameter only affects the transform method.\n",
244 | " \n",
245 | " solver : {‘svd’, ‘lsqr’, ‘eigen’}, default=’svd’\n",
246 | " Solver to use, possible values:\n",
247 | " ‘svd’: Singular value decomposition (default). Does not compute the covariance matrix, therefore this solver is recommended for data with a large number of features.\n",
248 | " ‘lsqr’: Least squares solution. Can be combined with shrinkage or custom covariance estimator.\n",
249 | " ‘eigen’: Eigenvalue decomposition. Can be combined with shrinkage or custom covariance estimator.\n",
250 | " \n",
251 | " shrinkage : ‘auto’ or float, default=None\n",
252 | " Shrinkage parameter, possible values:\n",
253 | " None: no shrinkage (default).\n",
254 | " ‘auto’: automatic shrinkage using the Ledoit-Wolf lemma.\n",
255 | " float between 0 and 1: fixed shrinkage parameter.\n",
256 | " This should be left to None if covariance_estimator is used. Note that shrinkage works only with ‘lsqr’ and ‘eigen’ solvers.\n",
257 | "\n",
258 | " priors : array-like of shape (n_classes,), default=None\n",
259 | " The class prior probabilities. By default, the class proportions are inferred from the training data.\n",
260 | "\n",
261 | " store_covariance : bool, default=False\n",
262 | " If True, explicitely compute the weighted within-class covariance matrix when solver is ‘svd’. The matrix is always computed\n",
263 | " and stored for the other solvers.\n",
264 | "\n",
265 | " tol : float, default=1.0e-4\n",
266 | " Absolute threshold for a singular value of X to be considered significant, used to estimate the rank of X. Dimensions whose singular \n",
267 | " values are non-significant are discarded. Only used if solver is ‘svd’.\n",
268 | "\n",
269 | " covariance_estimator : covariance estimator, default=None\n",
270 | " If not None, covariance_estimator is used to estimate the covariance matrices instead of relying on the empirical \n",
271 | " covariance estimator (with potential shrinkage). The object should have a fit method and a covariance_ attribute \n",
272 | " like the estimators in sklearn.covariance. if None the shrinkage parameter drives the estimate.\n",
273 | " '''\n",
274 | " lda = LinearDiscriminantAnalysis(solver=solver, shrinkage=shrinkage, \n",
275 | " priors=priors, n_components=n_components, store_covariance=store_covariance, \n",
276 | " tol=tol, covariance_estimator=covariance_estimator)\n",
277 | " return lda"
278 | ]
279 | },
280 | {
281 | "cell_type": "code",
282 | "execution_count": 7,
283 | "metadata": {},
284 | "outputs": [
285 | {
286 | "data": {
287 | "text/plain": [
288 | "LinearDiscriminantAnalysis(n_components=5)"
289 | ]
290 | },
291 | "execution_count": 7,
292 | "metadata": {},
293 | "output_type": "execute_result"
294 | }
295 | ],
296 | "source": [
297 | "# Fitting the data to the model\n",
298 | "lda = model(5)\n",
299 | "lda.fit(X,y)\n",
300 | "\n"
301 | ]
302 | },
303 | {
304 | "cell_type": "code",
305 | "execution_count": 8,
306 | "metadata": {},
307 | "outputs": [],
308 | "source": [
309 | "# Transforming data \n",
310 | "data_transformation = lda.transform(X)"
311 | ]
312 | },
313 | {
314 | "cell_type": "code",
315 | "execution_count": 9,
316 | "metadata": {},
317 | "outputs": [
318 | {
319 | "name": "stdout",
320 | "output_type": "stream",
321 | "text": [
322 | "[[-1.34250698 -0.410752 -0.05284109 -2.52177124 -2.32197387]\n",
323 | " [ 0.92569633 -0.92633682 -0.29396574 -0.62144384 1.61682597]\n",
324 | " [-0.36265323 -0.87103112 1.53812275 0.59888243 -1.39423894]\n",
325 | " ...\n",
326 | " [-0.83323633 0.06686996 0.39414469 -0.5877848 0.11590941]\n",
327 | " [ 0.47329133 1.42040541 0.49439799 -0.05149737 -0.53591346]\n",
328 | " [-1.04969306 0.27613461 -0.13712968 -1.21293132 -0.22775809]]\n"
329 | ]
330 | }
331 | ],
332 | "source": [
333 | "print(data_transformation)"
334 | ]
335 | },
336 | {
337 | "cell_type": "code",
338 | "execution_count": 10,
339 | "metadata": {},
340 | "outputs": [
341 | {
342 | "name": "stdout",
343 | "output_type": "stream",
344 | "text": [
345 | "(1000, 5)\n"
346 | ]
347 | }
348 | ],
349 | "source": [
350 | "# Notice the reduction of dimensions\n",
351 | "print(data_transformation.shape)"
352 | ]
353 | },
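{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A minimal added sketch of Quadratic Discriminant Analysis, one of the extensions listed earlier.\n",
"# It is illustrative only and assumes the y labels and the data_transformation array produced above;\n",
"# QDA is fitted here on the LDA-reduced features, and unlike LDA it estimates a separate covariance\n",
"# matrix for each class.\n",
"from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis\n",
"\n",
"qda = QuadraticDiscriminantAnalysis()\n",
"qda.fit(data_transformation, y)\n",
"# Training accuracy, just to show the fitted model is usable\n",
"print(qda.score(data_transformation, y))"
]
},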
354 | {
355 | "cell_type": "code",
356 | "execution_count": null,
357 | "metadata": {},
358 | "outputs": [],
359 | "source": []
360 | },
361 | {
362 | "cell_type": "code",
363 | "execution_count": null,
364 | "metadata": {},
365 | "outputs": [],
366 | "source": []
367 | },
368 | {
369 | "cell_type": "code",
370 | "execution_count": null,
371 | "metadata": {},
372 | "outputs": [],
373 | "source": []
374 | },
375 | {
376 | "cell_type": "code",
377 | "execution_count": null,
378 | "metadata": {},
379 | "outputs": [],
380 | "source": []
381 | },
382 | {
383 | "cell_type": "code",
384 | "execution_count": null,
385 | "metadata": {},
386 | "outputs": [],
387 | "source": []
388 | }
389 | ],
390 | "metadata": {
391 | "kernelspec": {
392 | "display_name": "Python 3",
393 | "language": "python",
394 | "name": "python3"
395 | },
396 | "language_info": {
397 | "codemirror_mode": {
398 | "name": "ipython",
399 | "version": 3
400 | },
401 | "file_extension": ".py",
402 | "mimetype": "text/x-python",
403 | "name": "python",
404 | "nbconvert_exporter": "python",
405 | "pygments_lexer": "ipython3",
406 | "version": "3.7.9"
407 | }
408 | },
409 | "nbformat": 4,
410 | "nbformat_minor": 4
411 | }
412 |
--------------------------------------------------------------------------------
/notebooks/PhiK Correlation/data_description.txt:
--------------------------------------------------------------------------------
1 | MSSubClass: Identifies the type of dwelling involved in the sale.
2 |
3 | 20 1-STORY 1946 & NEWER ALL STYLES
4 | 30 1-STORY 1945 & OLDER
5 | 40 1-STORY W/FINISHED ATTIC ALL AGES
6 | 45 1-1/2 STORY - UNFINISHED ALL AGES
7 | 50 1-1/2 STORY FINISHED ALL AGES
8 | 60 2-STORY 1946 & NEWER
9 | 70 2-STORY 1945 & OLDER
10 | 75 2-1/2 STORY ALL AGES
11 | 80 SPLIT OR MULTI-LEVEL
12 | 85 SPLIT FOYER
13 | 90 DUPLEX - ALL STYLES AND AGES
14 | 120 1-STORY PUD (Planned Unit Development) - 1946 & NEWER
15 | 150 1-1/2 STORY PUD - ALL AGES
16 | 160 2-STORY PUD - 1946 & NEWER
17 | 180 PUD - MULTILEVEL - INCL SPLIT LEV/FOYER
18 | 190 2 FAMILY CONVERSION - ALL STYLES AND AGES
19 |
20 | MSZoning: Identifies the general zoning classification of the sale.
21 |
22 | A Agriculture
23 | C Commercial
24 | FV Floating Village Residential
25 | I Industrial
26 | RH Residential High Density
27 | RL Residential Low Density
28 | RP Residential Low Density Park
29 | RM Residential Medium Density
30 |
31 | LotFrontage: Linear feet of street connected to property
32 |
33 | LotArea: Lot size in square feet
34 |
35 | Street: Type of road access to property
36 |
37 | Grvl Gravel
38 | Pave Paved
39 |
40 | Alley: Type of alley access to property
41 |
42 | Grvl Gravel
43 | Pave Paved
44 | NA No alley access
45 |
46 | LotShape: General shape of property
47 |
48 | Reg Regular
49 | IR1 Slightly irregular
50 | IR2 Moderately Irregular
51 | IR3 Irregular
52 |
53 | LandContour: Flatness of the property
54 |
55 | Lvl Near Flat/Level
56 | Bnk Banked - Quick and significant rise from street grade to building
57 | HLS Hillside - Significant slope from side to side
58 | Low Depression
59 |
60 | Utilities: Type of utilities available
61 |
62 | AllPub All public Utilities (E,G,W,& S)
63 | NoSewr Electricity, Gas, and Water (Septic Tank)
64 | NoSeWa Electricity and Gas Only
65 | ELO Electricity only
66 |
67 | LotConfig: Lot configuration
68 |
69 | Inside Inside lot
70 | Corner Corner lot
71 | CulDSac Cul-de-sac
72 | FR2 Frontage on 2 sides of property
73 | FR3 Frontage on 3 sides of property
74 |
75 | LandSlope: Slope of property
76 |
77 | Gtl Gentle slope
78 | Mod Moderate Slope
79 | Sev Severe Slope
80 |
81 | Neighborhood: Physical locations within Ames city limits
82 |
83 | Blmngtn Bloomington Heights
84 | Blueste Bluestem
85 | BrDale Briardale
86 | BrkSide Brookside
87 | ClearCr Clear Creek
88 | CollgCr College Creek
89 | Crawfor Crawford
90 | Edwards Edwards
91 | Gilbert Gilbert
92 | IDOTRR Iowa DOT and Rail Road
93 | MeadowV Meadow Village
94 | Mitchel Mitchell
95 | Names North Ames
96 | NoRidge Northridge
97 | NPkVill Northpark Villa
98 | NridgHt Northridge Heights
99 | NWAmes Northwest Ames
100 | OldTown Old Town
101 | SWISU South & West of Iowa State University
102 | Sawyer Sawyer
103 | SawyerW Sawyer West
104 | Somerst Somerset
105 | StoneBr Stone Brook
106 | Timber Timberland
107 | Veenker Veenker
108 |
109 | Condition1: Proximity to various conditions
110 |
111 | Artery Adjacent to arterial street
112 | Feedr Adjacent to feeder street
113 | Norm Normal
114 | RRNn Within 200' of North-South Railroad
115 | RRAn Adjacent to North-South Railroad
116 | PosN Near positive off-site feature--park, greenbelt, etc.
117 | PosA Adjacent to postive off-site feature
118 | RRNe Within 200' of East-West Railroad
119 | RRAe Adjacent to East-West Railroad
120 |
121 | Condition2: Proximity to various conditions (if more than one is present)
122 |
123 | Artery Adjacent to arterial street
124 | Feedr Adjacent to feeder street
125 | Norm Normal
126 | RRNn Within 200' of North-South Railroad
127 | RRAn Adjacent to North-South Railroad
128 | PosN Near positive off-site feature--park, greenbelt, etc.
129 | PosA Adjacent to postive off-site feature
130 | RRNe Within 200' of East-West Railroad
131 | RRAe Adjacent to East-West Railroad
132 |
133 | BldgType: Type of dwelling
134 |
135 | 1Fam Single-family Detached
136 | 2FmCon Two-family Conversion; originally built as one-family dwelling
137 | Duplx Duplex
138 | TwnhsE Townhouse End Unit
139 | TwnhsI Townhouse Inside Unit
140 |
141 | HouseStyle: Style of dwelling
142 |
143 | 1Story One story
144 | 1.5Fin One and one-half story: 2nd level finished
145 | 1.5Unf One and one-half story: 2nd level unfinished
146 | 2Story Two story
147 | 2.5Fin Two and one-half story: 2nd level finished
148 | 2.5Unf Two and one-half story: 2nd level unfinished
149 | SFoyer Split Foyer
150 | SLvl Split Level
151 |
152 | OverallQual: Rates the overall material and finish of the house
153 |
154 | 10 Very Excellent
155 | 9 Excellent
156 | 8 Very Good
157 | 7 Good
158 | 6 Above Average
159 | 5 Average
160 | 4 Below Average
161 | 3 Fair
162 | 2 Poor
163 | 1 Very Poor
164 |
165 | OverallCond: Rates the overall condition of the house
166 |
167 | 10 Very Excellent
168 | 9 Excellent
169 | 8 Very Good
170 | 7 Good
171 | 6 Above Average
172 | 5 Average
173 | 4 Below Average
174 | 3 Fair
175 | 2 Poor
176 | 1 Very Poor
177 |
178 | YearBuilt: Original construction date
179 |
180 | YearRemodAdd: Remodel date (same as construction date if no remodeling or additions)
181 |
182 | RoofStyle: Type of roof
183 |
184 | Flat Flat
185 | Gable Gable
186 | Gambrel Gabrel (Barn)
187 | Hip Hip
188 | Mansard Mansard
189 | Shed Shed
190 |
191 | RoofMatl: Roof material
192 |
193 | ClyTile Clay or Tile
194 | CompShg Standard (Composite) Shingle
195 | Membran Membrane
196 | Metal Metal
197 | Roll Roll
198 | Tar&Grv Gravel & Tar
199 | WdShake Wood Shakes
200 | WdShngl Wood Shingles
201 |
202 | Exterior1st: Exterior covering on house
203 |
204 | AsbShng Asbestos Shingles
205 | AsphShn Asphalt Shingles
206 | BrkComm Brick Common
207 | BrkFace Brick Face
208 | CBlock Cinder Block
209 | CemntBd Cement Board
210 | HdBoard Hard Board
211 | ImStucc Imitation Stucco
212 | MetalSd Metal Siding
213 | Other Other
214 | Plywood Plywood
215 | PreCast PreCast
216 | Stone Stone
217 | Stucco Stucco
218 | VinylSd Vinyl Siding
219 | Wd Sdng Wood Siding
220 | WdShing Wood Shingles
221 |
222 | Exterior2nd: Exterior covering on house (if more than one material)
223 |
224 | AsbShng Asbestos Shingles
225 | AsphShn Asphalt Shingles
226 | BrkComm Brick Common
227 | BrkFace Brick Face
228 | CBlock Cinder Block
229 | CemntBd Cement Board
230 | HdBoard Hard Board
231 | ImStucc Imitation Stucco
232 | MetalSd Metal Siding
233 | Other Other
234 | Plywood Plywood
235 | PreCast PreCast
236 | Stone Stone
237 | Stucco Stucco
238 | VinylSd Vinyl Siding
239 | Wd Sdng Wood Siding
240 | WdShing Wood Shingles
241 |
242 | MasVnrType: Masonry veneer type
243 |
244 | BrkCmn Brick Common
245 | BrkFace Brick Face
246 | CBlock Cinder Block
247 | None None
248 | Stone Stone
249 |
250 | MasVnrArea: Masonry veneer area in square feet
251 |
252 | ExterQual: Evaluates the quality of the material on the exterior
253 |
254 | Ex Excellent
255 | Gd Good
256 | TA Average/Typical
257 | Fa Fair
258 | Po Poor
259 |
260 | ExterCond: Evaluates the present condition of the material on the exterior
261 |
262 | Ex Excellent
263 | Gd Good
264 | TA Average/Typical
265 | Fa Fair
266 | Po Poor
267 |
268 | Foundation: Type of foundation
269 |
270 | BrkTil Brick & Tile
271 | CBlock Cinder Block
272 | PConc	Poured Concrete
273 | Slab Slab
274 | Stone Stone
275 | Wood Wood
276 |
277 | BsmtQual: Evaluates the height of the basement
278 |
279 | Ex Excellent (100+ inches)
280 | Gd Good (90-99 inches)
281 | TA Typical (80-89 inches)
282 | Fa Fair (70-79 inches)
283 | Po	Poor (<70 inches)
284 | NA No Basement
285 |
286 | BsmtCond: Evaluates the general condition of the basement
287 |
288 | Ex Excellent
289 | Gd Good
290 | TA Typical - slight dampness allowed
291 | Fa Fair - dampness or some cracking or settling
292 | Po Poor - Severe cracking, settling, or wetness
293 | NA No Basement
294 |
295 | BsmtExposure: Refers to walkout or garden level walls
296 |
297 | Gd Good Exposure
298 | Av Average Exposure (split levels or foyers typically score average or above)
299 | Mn	Minimum Exposure
300 | No No Exposure
301 | NA No Basement
302 |
303 | BsmtFinType1: Rating of basement finished area
304 |
305 | GLQ Good Living Quarters
306 | ALQ Average Living Quarters
307 | BLQ Below Average Living Quarters
308 | Rec Average Rec Room
309 | LwQ Low Quality
310 | Unf	Unfinished
311 | NA No Basement
312 |
313 | BsmtFinSF1: Type 1 finished square feet
314 |
315 | BsmtFinType2: Rating of basement finished area (if multiple types)
316 |
317 | GLQ Good Living Quarters
318 | ALQ Average Living Quarters
319 | BLQ Below Average Living Quarters
320 | Rec Average Rec Room
321 | LwQ Low Quality
322 | Unf	Unfinished
323 | NA No Basement
324 |
325 | BsmtFinSF2: Type 2 finished square feet
326 |
327 | BsmtUnfSF: Unfinished square feet of basement area
328 |
329 | TotalBsmtSF: Total square feet of basement area
330 |
331 | Heating: Type of heating
332 |
333 | Floor Floor Furnace
334 | GasA Gas forced warm air furnace
335 | GasW Gas hot water or steam heat
336 | Grav Gravity furnace
337 | OthW Hot water or steam heat other than gas
338 | Wall Wall furnace
339 |
340 | HeatingQC: Heating quality and condition
341 |
342 | Ex Excellent
343 | Gd Good
344 | TA Average/Typical
345 | Fa Fair
346 | Po Poor
347 |
348 | CentralAir: Central air conditioning
349 |
350 | N No
351 | Y Yes
352 |
353 | Electrical: Electrical system
354 |
355 | SBrkr Standard Circuit Breakers & Romex
356 | FuseA Fuse Box over 60 AMP and all Romex wiring (Average)
357 | FuseF 60 AMP Fuse Box and mostly Romex wiring (Fair)
358 | FuseP 60 AMP Fuse Box and mostly knob & tube wiring (poor)
359 | Mix Mixed
360 |
361 | 1stFlrSF: First Floor square feet
362 |
363 | 2ndFlrSF: Second floor square feet
364 |
365 | LowQualFinSF: Low quality finished square feet (all floors)
366 |
367 | GrLivArea: Above grade (ground) living area square feet
368 |
369 | BsmtFullBath: Basement full bathrooms
370 |
371 | BsmtHalfBath: Basement half bathrooms
372 |
373 | FullBath: Full bathrooms above grade
374 |
375 | HalfBath: Half baths above grade
376 |
377 | BedroomAbvGr: Bedrooms above grade (does NOT include basement bedrooms)
378 |
379 | KitchenAbvGr: Kitchens above grade
380 |
381 | KitchenQual: Kitchen quality
382 |
383 | Ex Excellent
384 | Gd Good
385 | TA Typical/Average
386 | Fa Fair
387 | Po Poor
388 |
389 | TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)
390 |
391 | Functional: Home functionality (Assume typical unless deductions are warranted)
392 |
393 | Typ Typical Functionality
394 | Min1 Minor Deductions 1
395 | Min2 Minor Deductions 2
396 | Mod Moderate Deductions
397 | Maj1 Major Deductions 1
398 | Maj2 Major Deductions 2
399 | Sev Severely Damaged
400 | Sal Salvage only
401 |
402 | Fireplaces: Number of fireplaces
403 |
404 | FireplaceQu: Fireplace quality
405 |
406 | Ex Excellent - Exceptional Masonry Fireplace
407 | Gd Good - Masonry Fireplace in main level
408 | TA Average - Prefabricated Fireplace in main living area or Masonry Fireplace in basement
409 | Fa Fair - Prefabricated Fireplace in basement
410 | Po Poor - Ben Franklin Stove
411 | NA No Fireplace
412 |
413 | GarageType: Garage location
414 |
415 | 2Types More than one type of garage
416 | Attchd Attached to home
417 | Basment Basement Garage
418 | BuiltIn Built-In (Garage part of house - typically has room above garage)
419 | CarPort Car Port
420 | Detchd Detached from home
421 | NA No Garage
422 |
423 | GarageYrBlt: Year garage was built
424 |
425 | GarageFinish: Interior finish of the garage
426 |
427 | Fin Finished
428 | RFn Rough Finished
429 | Unf Unfinished
430 | NA No Garage
431 |
432 | GarageCars: Size of garage in car capacity
433 |
434 | GarageArea: Size of garage in square feet
435 |
436 | GarageQual: Garage quality
437 |
438 | Ex Excellent
439 | Gd Good
440 | TA Typical/Average
441 | Fa Fair
442 | Po Poor
443 | NA No Garage
444 |
445 | GarageCond: Garage condition
446 |
447 | Ex Excellent
448 | Gd Good
449 | TA Typical/Average
450 | Fa Fair
451 | Po Poor
452 | NA No Garage
453 |
454 | PavedDrive: Paved driveway
455 |
456 | Y Paved
457 | P Partial Pavement
458 | N Dirt/Gravel
459 |
460 | WoodDeckSF: Wood deck area in square feet
461 |
462 | OpenPorchSF: Open porch area in square feet
463 |
464 | EnclosedPorch: Enclosed porch area in square feet
465 |
466 | 3SsnPorch: Three season porch area in square feet
467 |
468 | ScreenPorch: Screen porch area in square feet
469 |
470 | PoolArea: Pool area in square feet
471 |
472 | PoolQC: Pool quality
473 |
474 | Ex Excellent
475 | Gd Good
476 | TA Average/Typical
477 | Fa Fair
478 | NA No Pool
479 |
480 | Fence: Fence quality
481 |
482 | GdPrv Good Privacy
483 | MnPrv Minimum Privacy
484 | GdWo Good Wood
485 | MnWw Minimum Wood/Wire
486 | NA No Fence
487 |
488 | MiscFeature: Miscellaneous feature not covered in other categories
489 |
490 | Elev Elevator
491 | Gar2 2nd Garage (if not described in garage section)
492 | Othr Other
493 | Shed Shed (over 100 SF)
494 | TenC Tennis Court
495 | NA None
496 |
497 | MiscVal: $Value of miscellaneous feature
498 |
499 | MoSold: Month Sold (MM)
500 |
501 | YrSold: Year Sold (YYYY)
502 |
503 | SaleType: Type of sale
504 |
505 | WD Warranty Deed - Conventional
506 | CWD Warranty Deed - Cash
507 | VWD Warranty Deed - VA Loan
508 | New Home just constructed and sold
509 | COD Court Officer Deed/Estate
510 | Con Contract 15% Down payment regular terms
511 | ConLw Contract Low Down payment and low interest
512 | ConLI Contract Low Interest
513 | ConLD Contract Low Down
514 | Oth Other
515 |
516 | SaleCondition: Condition of sale
517 |
518 | Normal Normal Sale
519 | Abnorml Abnormal Sale - trade, foreclosure, short sale
520 | AdjLand Adjoining Land Purchase
521 | Alloca Allocation - two linked properties with separate deeds, typically condo with a garage unit
522 | Family Sale between family members
523 | Partial Home was not completed when last assessed (associated with New Homes)
524 |
525 |
526 |
527 |
528 |
529 |
530 |
531 |
532 |
533 |
534 |
535 |
536 |
537 |
538 |
539 |
540 |
541 |
542 |
543 |
544 |
545 | Based on the data description, the following continuous variables were identified (a short pandas selection sketch follows the list):
546 | - LotFrontage: Linear feet of street connected to property
547 | - LotArea: Lot size in square feet
548 | - YearBuilt: Original construction date
549 | - YearRemodAdd: Remodel date (same as construction date if no remodeling or additions)
550 | - MasVnrArea: Masonry veneer area in square feet
551 | - BsmtFinSF1: Type 1 finished square feet
552 | - BsmtFinSF2: Type 2 finished square feet
553 | - BsmtUnfSF: Unfinished square feet of basement area
554 | - TotalBsmtSF: Total square feet of basement area
555 | - 1stFlrSF: First Floor square feet
556 | - 2ndFlrSF: Second floor square feet
557 | - LowQualFinSF: Low quality finished square feet (all floors)
558 | - GrLivArea: Above grade (ground) living area square feet
559 | - BsmtFullBath: Basement full bathrooms
560 | - BsmtHalfBath: Basement half bathrooms
561 | - FullBath: Full bathrooms above grade
562 | - HalfBath: Half baths above grade
563 | - BedroomAbvGr: Bedrooms above grade (does NOT include basement bedrooms)
564 | - KitchenAbvGr: Kitchens above grade
565 | - TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)
566 | - GarageYrBlt: Year garage was built
567 | - GarageCars: Size of garage in car capacity
568 | - GarageArea: Size of garage in square feet
569 | - WoodDeckSF: Wood deck area in square feet
570 | - OpenPorchSF: Open porch area in square feet
571 | - EnclosedPorch: Enclosed porch area in square feet
572 | - 3SsnPorch: Three season porch area in square feet
573 | - ScreenPorch: Screen porch area in square feet
574 | - PoolArea: Pool area in square feet
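575 | 
576 | A minimal pandas sketch for pulling out these continuous variables and summarising them (the CSV file name dataset.csv and the DataFrame variable df are assumptions; the column names are copied from the list above):
577 | 
578 | import pandas as pd
579 | 
580 | df = pd.read_csv("dataset.csv")  # assumed file name
581 | continuous_cols = [
582 |     "LotFrontage", "LotArea", "YearBuilt", "YearRemodAdd", "MasVnrArea",
583 |     "BsmtFinSF1", "BsmtFinSF2", "BsmtUnfSF", "TotalBsmtSF", "1stFlrSF",
584 |     "2ndFlrSF", "LowQualFinSF", "GrLivArea", "BsmtFullBath", "BsmtHalfBath",
585 |     "FullBath", "HalfBath", "BedroomAbvGr", "KitchenAbvGr", "TotRmsAbvGrd",
586 |     "GarageYrBlt", "GarageCars", "GarageArea", "WoodDeckSF", "OpenPorchSF",
587 |     "EnclosedPorch", "3SsnPorch", "ScreenPorch", "PoolArea",
588 | ]
589 | # Keep only the columns actually present in the file, then summarise them
590 | present = [c for c in continuous_cols if c in df.columns]
591 | print(df[present].describe())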
--------------------------------------------------------------------------------
/notebooks/agriculture_yield_rice:
--------------------------------------------------------------------------------
1 | import pandas as pd
2 | 
3 | # Load the rice yield dataset
4 | df = pd.read_csv("Dataset.csv")
5 | 
6 | # One-hot encode the categorical 'Irrigation' column
7 | irrigation_dummies = pd.get_dummies(df['Irrigation'])
8 | 
9 | # Prepend the indicator columns to the original frame and inspect the result
10 | df_encoded = pd.concat([irrigation_dummies, df], axis=1)
11 | print(df_encoded.head())
--------------------------------------------------------------------------------
/notebooks/biden_speech.txt:
--------------------------------------------------------------------------------
1 | Thank you. Thank you, thank you, thank you. It’s good to be back. As Mitch and Chuck will understand, it’s good to be almost home, down the hall. Anyway, thank you all.
2 |
3 | Madam Speaker, Madam Vice President. No president has ever said those words from this podium. No president has ever said those words. And it’s about time. The first lady, I’m her husband. Second gentleman. Chief justice. Members of the United States Congress and the cabinet, distinguished guests. My fellow Americans.
4 |
5 | While the setting tonight is familiar, this gathering is just a little bit different. A reminder of the extraordinary times we’re in. Throughout our history, presidents have come to this chamber to speak to Congress, to the nation and to the world. To declare war, to celebrate peace, to announce new plans and possibilities.
6 |
7 |
8 |
9 |
10 |
11 | Tonight, I come to talk about crisis and opportunity. About rebuilding the nation, revitalizing our democracy, and winning the future for America. I stand here tonight one day shy of the 100th day of my administration. A hundred days since I took the oath of office, lifted my hand off our family Bible and inherited a nation — we all did — that was in crisis. The worst pandemic in a century. The worst economic crisis since the Great Depression. The worst attack on our democracy since the Civil War. Now, after just 100 days, I can report to the nation, America is on the move again. Turning peril into possibility, crisis into opportunity, setbacks to strength.
12 |
13 | We all know life can knock us down. But in America, we never, ever, ever stay down. Americans always get up. Today, that’s what we’re doing. America is rising anew. Choosing hope over fear, truth over lies and light over darkness. After 100 days of rescue and renewal, America is ready for a takeoff, in my view. We’re working again, dreaming again, discovering again and leading the world again. We have shown each other and the world that there’s no quit in America. None.
14 |
15 |
16 |
17 | And more than half of all the adults in America have gotten at least one shot. The mass vaccination center in Glendale, Ariz., I asked the nurse, I said, “What’s it like?” She looked at me, she said, “It’s like every shot is giving a dose of hope” was her phrase, a dose of hope.
18 |
19 | A dose of hope for an educator in Florida, who has a child suffering from an autoimmune disease, wrote to me, said she’s worried — that she was worried about bringing the virus home. She said she then got vaccinated at a large site, in her car. She said she sat in her car when she got vaccinated and just cried, cried out of joy, and cried out of relief.
20 |
21 | Parents seeing the smiles on the kids’ faces, for those who are able to go back to school because the teachers and the school bus drivers and the cafeteria workers have been vaccinated. Grandparents, hugging their children and grandchildren, instead of pressing hands against the window to say goodbye. It means everything. Those things mean everything.
22 |
23 | You know, there’s still — you all know it, you know it better than any group of Americans — there’s still more work to do to beat this virus. We can’t let our guard down. But tonight, I can say, because of you, the American people, our progress these past 100 days against one of the worst pandemics in history has been one of the greatest logistical achievements, logistical achievements this country has ever seen. What else have we done in those first 100 days?
24 |
25 | We kept our commitment, Democrats and Republicans, of sending $1,400 rescue checks to 85 percent of American households. We’ve already sent more than 160 million checks out the door. It’s making a difference. You all know it when you go home. For many people, it’s making all the difference in the world.
26 |
27 | A single mom in Texas who wrote me, she said she couldn’t work. She said the relief check put food on the table and saved her and her son from eviction from their apartment. A grandmother in Virginia who told me she immediately took her granddaughter to the eye doctor, something she said she put off for months because she didn’t have the money. One of the defining images, at least from my perspective, in this crisis has been cars lined up, cars lined up for miles. And not people just barely able to start those cars. Nice cars, lined up for miles, waiting for a box of food to be put in their trunk.
28 |
29 | I don’t know about you, but I didn’t ever think I would see that in America. And all of this is through no fault of their own. No fault of their own, these people are in this position. That’s why the rescue plan is delivering food and nutrition assistance to millions of Americans facing hunger. And hunger is down sharply already.
30 |
31 |
32 |
33 |
34 |
35 |
36 | Folks — as I’ve told every world leader I’ve met with over the years — it’s never, ever, ever been a good bet to bet against America and it still isn’t. We are the United States of America. There is not a single thing — nothing, nothing beyond our capacity. We can do whatever we set our mind to if we do it together. So let’s begin to get together.
37 |
38 | God bless you all, and may God protect our troops. Thank you for your patience.
--------------------------------------------------------------------------------