├── .gitignore ├── Dummy Classifier Notebook.ipynb ├── Gaussian_Mixture_Models.ipynb ├── Hypothesis Testing ├── Hypo_Testing.ipynb ├── Hypothesis Testing.md ├── PlantGrowth.csv └── blood_pressure.csv ├── LICENSE ├── README.md ├── WIP ├── Exponential_Smoothing.ipynb └── Monte Carlo Simulation.ipynb ├── data ├── Breast-Cancer.csv ├── HorseKicks.txt ├── Housefly_wing_lengths.txt └── food_outlet_data.csv ├── ideas.md ├── images ├── Conditional.png ├── JDTable.png ├── Joint.png ├── Marginal.png ├── Marginal2.png ├── OneDirectional.png ├── OnettwoTailed.png ├── Table.png ├── TwoDirectional.png └── TypeIandTypeIIError.png └── notebooks ├── 1-Way ANOVA.ipynb ├── ARIMA.ipynb ├── Baye's Theorem Notebook.ipynb ├── Binary Classification-Logistic Regression.ipynb ├── Central Limit Theorem.ipynb ├── Correlation with Example ├── Correlation.ipynb ├── Movie Recommendation using Correlation.ipynb └── README.md ├── Data_Summary_Notebook#12.ipynb ├── Decision Tree.ipynb ├── Dummy Classifier Notebook.ipynb ├── Frequency Distribution.ipynb ├── Frequency_Distribution.ipynb ├── Heteroscedasticty.ipynb ├── Hypothesis Testing.ipynb ├── JointProbabilityDistribution.ipynb ├── KMeans_Clustering.ipynb ├── K_Nearest_Neighbours.ipynb ├── LNN.ipynb ├── Linear_Discriminant_Analysis.ipynb ├── Markov_chains.ipynb ├── MonteCarlo.ipynb ├── Multilinear-Regression.ipynb ├── PhiK Correlation ├── PhiK.ipynb ├── data_description.txt └── dataset.csv ├── Precision&Recall.ipynb ├── Principal-Component-Analysis.ipynb ├── Probability_Distributions_All.ipynb ├── RFR using GridsearchCV.ipynb ├── Statistical_&_Probability_Notebook_Part_1.ipynb ├── Statistical_&_Probability_Notebook_Part_2.ipynb ├── Support_Vector_Machine.ipynb ├── Time_Series.ipynb ├── Time_Series_Visualization.ipynb ├── agriculture_yield_rice ├── autocorrelation.ipynb ├── bias_variance_notebook.ipynb ├── biden_speech.txt ├── data_summary_breast_cancer.ipynb ├── intro-numpy-pandas-matplotlib.ipynb └── maximum-likelihood-estimation.ipynb /.gitignore: -------------------------------------------------------------------------------- 1 | .ipynb_checkpoints/ 2 | -------------------------------------------------------------------------------- /Dummy Classifier Notebook.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Dummy Classifier " 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## What is a `DummyClassifier`?\n", 15 | "\n", 16 | "DummyClassifier is a classifier that makes predictions using simple rules, which can be\n", 17 | "useful as a baseline for comparison against actual classifiers, especially with imbalanced classes(where the class distribution is not equal or close to equal, and is instead biased or skewed).\n", 18 | "\n", 19 | "A dummy classifier is basically a classifier which doesn’t even look at the training data while classification, but follows just a rule of thumb or strategy that we instruct it to use while classifying. 
It is done by passing the strategy we want to the `strategy` parameter of the `DummyClassifier`. The main notion behind using a dummy classifier is that any classifier based on an analytic approach should do better than random guessing.\n"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "metadata": {},
25 | "source": [
26 | "## Strategies used in Dummy Classifier\n",
27 | "\n",
28 | "The scikit-learn `DummyClassifier` class implements several strategies for random guessing classifiers. \n",
29 | "The strategies are as follows:\n",
30 | "\n",
31 | "- stratified : This strategy generates predictions using the training set's class distribution.\n",
32 | "- most_frequent : This always predicts the most frequent label in the training set.\n",
33 | "- prior : This predicts the class that maximises the class prior.\n",
34 | "- uniform : This generates predictions uniformly at random.\n",
35 | "- constant : Always predicts a constant, user-defined label. This is specifically useful for metrics that evaluate a non-majority class."
36 | ]
37 | },
38 | {
39 | "cell_type": "markdown",
40 | "metadata": {},
41 | "source": [
42 | "## Explanation through Implementation\n",
43 | " \n",
44 | "The dummy classifier gives a measure of \"baseline\" performance, i.e. the success rate one should expect to achieve even by simply guessing.\n",
45 | "\n",
46 | "Suppose one wishes to determine whether a given object possesses a certain property. If, after analyzing a large number of the objects, 90% are found to contain the target property, then guessing that every future instance possesses the target property gives a 90% likelihood of guessing correctly. Structuring guesses this way is equivalent to using the `most_frequent` strategy in a dummy classifier.\n",
47 | "\n",
48 | "Because many machine learning tasks attempt to increase the success rate of (e.g.) classification, evaluating the baseline success rate affords a floor value that one's classifier should out-perform. \n",
49 | "\n",
50 | "If one trains a dummy classifier with the `stratified` parameter on the data discussed above, that classifier will predict with 90% probability that each object it encounters possesses the target property. This is different from training a dummy classifier with the `most_frequent` parameter, as the latter would guess that all future objects possess the target property. Here's some code to illustrate:"
51 | ]
52 | },
53 | {
54 | "cell_type": "code",
55 | "execution_count": 1,
56 | "metadata": {},
57 | "outputs": [],
58 | "source": [
59 | "import numpy as np \n",
60 | "import pandas as pd \n",
61 | "import matplotlib.pyplot as plt \n",
62 | "import seaborn as sns "
63 | ]
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": 2,
68 | "metadata": {},
69 | "outputs": [
70 | {
71 | "data": {
72 | "text/html": [
73 | "
\n", 74 | "\n", 87 | "\n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | "
PregnanciesGlucoseBloodPressureSkinThicknessInsulinBMIDiabetesPedigreeFunctionAgeOutcome
Id
061487235033.60.627501
28183640023.30.672321
318966239428.10.167210
\n", 153 | "
" 154 | ], 155 | "text/plain": [ 156 | " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", 157 | "Id \n", 158 | "0 6 148 72 35 0 33.6 \n", 159 | "2 8 183 64 0 0 23.3 \n", 160 | "3 1 89 66 23 94 28.1 \n", 161 | "\n", 162 | " DiabetesPedigreeFunction Age Outcome \n", 163 | "Id \n", 164 | "0 0.627 50 1 \n", 165 | "2 0.672 32 1 \n", 166 | "3 0.167 21 0 " 167 | ] 168 | }, 169 | "execution_count": 2, 170 | "metadata": {}, 171 | "output_type": "execute_result" 172 | } 173 | ], 174 | "source": [ 175 | "import pandas as pd\n", 176 | "import matplotlib.pyplot as plt\n", 177 | "df_train = pd.read_csv(\"C:/Users/sshre/OneDrive/Documents/DATA SCIENCE/train.csv\")\n", 178 | "df_train.set_index(\"Id\").head(3)" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 3, 184 | "metadata": {}, 185 | "outputs": [], 186 | "source": [ 187 | "X = df_train.drop([\"Outcome\"],axis=1)\n", 188 | "y = df_train[\"Outcome\"]" 189 | ] 190 | }, 191 | { 192 | "cell_type": "markdown", 193 | "metadata": {}, 194 | "source": [ 195 | "Dividing the data set into training and test data for analysis" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 4, 201 | "metadata": {}, 202 | "outputs": [], 203 | "source": [ 204 | "from sklearn.model_selection import train_test_split\n", 205 | "\n", 206 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "Checking the dummyclassifier performance with different strategies." 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": 8, 219 | "metadata": {}, 220 | "outputs": [ 221 | { 222 | "name": "stdout", 223 | "output_type": "stream", 224 | "text": [ 225 | "0.6276595744680851\n", 226 | "0.5638297872340425\n", 227 | "0.574468085106383\n" 228 | ] 229 | } 230 | ], 231 | "source": [ 232 | "from sklearn.metrics import accuracy_score\n", 233 | "from sklearn.dummy import DummyClassifier\n", 234 | "strategies = ['most_frequent', 'stratified', 'uniform'] \n", 235 | " \n", 236 | "test_scores = [] \n", 237 | "for s in strategies: \n", 238 | " \n", 239 | " dclf = DummyClassifier(strategy = s, random_state = 0) \n", 240 | " dclf.fit(X_train, y_train) \n", 241 | " prediction=dclf.predict(X_test)\n", 242 | " score=(accuracy_score(y_test,prediction)) \n", 243 | " test_scores.append(score)\n", 244 | " print(score)" 245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "metadata": {}, 250 | "source": [ 251 | "Plotting the performace score of the dummyclassifier " 252 | ] 253 | }, 254 | { 255 | "cell_type": "code", 256 | "execution_count": 9, 257 | "metadata": {}, 258 | "outputs": [ 259 | { 260 | "data": { 261 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYgAAAEHCAYAAAC0pdErAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAc2ElEQVR4nO3dfZxcVZ3n8c+XhCAIyEMaN5JAMpqIjsNTyigyKGGEiU/JzoIRRlfjiBlXI+OM4AvXh3GCzgw+jLtoZjWwgIzyoOhAQCSgC4gokopAIGEimQRMiy9pkyAgCgS++8e9DZXK7U6R9O1KJ9/361WvrnPuuad+1dVdvzrn1j1XtomIiGi3S7cDiIiI7VMSREREVEqCiIiISkkQERFRKQkiIiIqje52AENl7NixnjhxYrfDiIgYUZYuXfob2z1V23aYBDFx4kSazWa3w4iIGFEk3T/QtkwxRUREpSSIiIiolAQRERGVak0QkmZIWilplaQzB2gzW9IKScslXdy2bW9Jv5T05TrjjIiIzdV2kFrSKGABcDzQCyyRtMj2ipY2k4GPAkfb3iDpgLZuzgJuqivGiIgYWJ0jiGnAKturbT8BXArMamvzXmCB7Q0Ath/s3yBpKvBC4LoaY4yIiAHUmSAOBNa2lHvLulZTgCmSbpF0q6QZAJJ2Ab4AnDHYA0iaK6kpqdnX1zeEoUdERJ0JQhV17WuLjwYmA8cCpwDnSdoHeD9wje21DML2QtsN242ensrzPCIiYivVeaJcLzChpTweeKCiza22nwTWSFpJkTCOAo6R9H5gT2CMpEdtVx7ojoiIoVfnCGIJMFnSJEljgJOBRW1trgCmA0gaSzHltNr2220fZHsicDpwUZJDRMTwqi1B2N4IzAMWA/cA37S9XNJ8STPLZouBdZJWADcAZ9heV1dMERHROe0olxxtNBrOWkwREc+NpKW2G1XbciZ1RERUSoKIiIhKSRAREVEpCSIiIiolQURERKUkiIiIqJQEERERlZIgIiKiUhJERERUSoKIiIhKSRAREVEpCSIiIiolQURERKUkiIiIqJQEERERlZIgIiKiUhJERERUqjVBSJohaaWkVZIqryktabakFZKWS7q4rDtY0lJJd5T176szzoiI2NzoujqWNApYABwP9AJLJC2yvaKlzWTgo8DRtjdIOqDc9CvgNbYfl7QncHe57wN1xRsREZuqcwQxDVhle7XtJ4BLgVltbd4LLLC9AcD2g+XPJ2w/XrbZreY4IyKiQp1vvAcCa1vKvWVdqynAFEm3SLpV0oz+DZImSFpW9nF21ehB0lxJTUnNvr6+Gp5CRMTOq84EoYo6t5VHA5OBY4FTgPMk7QNge63tQ4GXAO+S9MLNOrMX2m7YbvT09Axp8BERO7s6E0QvMKGlPB5oHwX0AlfaftL2GmAlRcJ4RjlyWA4cU2OsERHRps4EsQSYLGmSpDHAycCitjZXANMBJI2lmHJaLWm8pN3L+n2BoymSR0REDJPavsVke6OkecBiYBRwvu3lkuYDTduLym0nSFoBPAWcYXudpOOBL0gyxVTV523fVVesI9Uv1j3G6Zffyc/u38CRB+/L5086jIP236PbYUXEDkJ2+2GBkanRaLjZbHY7jGE1+6s/4bY1658pT5u0H9/866O6GFFEjDSSltpuVG3L10dHsJ/dv2HQckTEtkiCGMGOPHjfQcsREdsiCWIE+/xJhzFt0n6M3kVMm7Qfnz/psG6HFBE7kNoOUkf9Dtp/jxxziIjaZAQRERGVkiAiIqJSEkRERFRKgoiIiEpJEBERUSkJIiIiKiVBREREpSSIiIiolAQRERGVkiAiIqJSEkRERFRKgoiIiEpJEBERUanWBCFphqSVklZJOnOANrMlrZC0XNLFZd3hkn5S1i2T9LY644yIiM3Vtty3pFHAAuB4oBdYImmR7RUtbSYDHwWOtr1B0gHlpseAd9q+V9KLgKWSFtt+qK54IyJiU3WOIKYBq2yvtv0EcCkwq63Ne4EFtjcA2H6w/Plz2/eW9x8AHgR6aow1IiLa1JkgDgTWtpR7y7pWU4Apkm6RdKukGe2dSJoGjAH+s2LbXElNSc2+vr4hDD0iIupMEKqoc1t5NDAZOBY4BThP0j7PdCCNA/4NeLftpzfrzF5ou2G70dOTAUZExFCqM0H0AhNayuOBByraXGn7SdtrgJUUCQNJewPfBT5u+9Ya44yIiAp1JoglwGRJkySNAU4GFrW1uQKYDiBpLMWU0+qy/b8DF9n+Vo0xRkTEAGpLELY3AvOAxcA9wDdtL5c0X9LMstliYJ2kFcANwBm21wGzgdcCcyTdUd4OryvWiIjYnOz2wwIjU6PRcLPZ7HYYEREjiqSlthtV23ImdUREVEqCiIiISkkQERFRKQkiIiIqJUFERESlJIiIiKiUBBEREZWSICIiolISREREVEqCiIiISkkQERFRKQkiIiIqJUFERESlJIiIiKiUBBEREZWSICIiolISREREVKo1QUiaIWmlpFWSzhygzWxJKyQtl3RxS/21kh6SdHWdMUZERLXRdXUsaRSwADge6AWWSFpke0VLm8nAR4GjbW+QdEBLF58D9gD+uq4YIyJiYFscQUjqkfTV/k/ykl4uaU4HfU8DVtlebfsJ4FJgVlub9wILbG8AsP1g/wbbPwAe6expRETEUOtkiulC4CZgQlm+F/hwB/sdCKxtKfeWda2mAFMk3SLpVkkzOug3IiKGQScJ4gDbFwNPA9h+Eniqg/1UUee28mhgMnAscApwnqR9Oui7eABprqSmpGZfX1+nu0VERAc6SRC/k7Qf5Zu7pFfS2dRPL8+OOgDGAw9UtLnS9pO21wArKRJGR2wvtN2w3ejp6el0t4iI6EAnCeJ04CrgjyTdBFwCfLCD/ZYAkyVNkjQGOBlY1NbmCmA6gKSxFFNOqzuMPSIiajTot5gk7QKMongTfxnFtNGK8qDzoGxvlDQPWFz2cb7t5ZLmA03bi8ptJ0haQTFtdYbtdeVj3wwcAuwpqRd4j+3FW/tEIyLiuZHdfligrYF0q+1XD1M8W63RaLjZbHY7jIiIEUXSUtuNqm2dTDFdL6n966kREbGD6+REuXnACyQ9DvyeYprJtverNbKIiOiqThLE2NqjiIiI7c4WE4TtpyS9EXhtWXWj7WvrDSsiIrqtk6U2PgN8hOLrp6uBj0j6dN2BRUREd3UyxfQW4AjbTwFIOh/4GfDxOgOLiIju6nS5771b7u9VRyAREbF96WQE8VngZ5J+QPENpmOBT9YZVEREdF8nB6m/LukG4FUUCeKTtn9Ze2QREdFVnRykngk8avs7tr9NsXjfm+sPLSIiuqmTYxDzbf+2v2D7IeCs+kKKiIjtQScJoqpNbZcqjYiI7UMnCeJnkj4r6WBJB0n6HHB73YFFRER3dZIg5pXtrqS4LgTA+2uLKCIitgudfIvpUYqLBiFpL9udXE0uIiJGuAFHEJI+JumQ8v4YSdcBayX9WtJxwxZhRER0xWBTTH9JcY1ogHcCz6NY2fU44J
9qjisiIrpssATxhJ+93NwM4GLbG20vB3btpHNJMyStlLRK0pkDtJktaYWk5ZIubql/l6R7y9u7On1CERExNAY7BvG4pJcBD1KMGj7Ssm2PLXUsaRSwADge6AWWSFpke0VLm8nAR4GjbW+QdEBZvx/w90ADMLC03HfDc3p2ERGx1QYbQXwYWASsAs6xvRqgvDbEsg76ngassr3a9hPApUD7pUvfCyzof+O3/WBZ/+fA9bbXl9uupxjFRETEMBlwBGH7FmByRf01wDUd9H0gsLal3EuxnlOrKQCSbgFGAZ8qL0ZUte+BHTxmREQMkTrPiFZFndvKoymS0LHAeOBmSa/ocF8kzQXmAhx00EHbEmtERLTp9HoQW6MXmNBSHg88UNHmSttP2l5D8a2pyR3ui+2Fthu2Gz09PUMafETEzq6T1Vw3G2VU1VVYAkyWNEnSGOBkimMara4Appd9jqWYcloNLAZOkLSvpH2BE8q6iIgYJp2MIG7rsG4TtjdSLNOxGLgH+Kbt5ZLml0uIU25bJ2kFcANwhu11ttdTrBi7pLzNL+siImKY6NlTHdo2FF85HUfx7aPZPHtcYG/gPNuHDEuEHWo0Gm42m90OIyJiRJG01HajattgU0VvAv6KYv5/Ac8miEeATwxphBERsd0Z7GuuFwAXSJpt+5vDGFNERGwHOjkGcYCkvQEkfUXSbZL+rOa4IiKiyzpJEHNtPyzpBIrppv8BfLbesCIiots6SRD9R7HfAFxge2mH+0VExAjWyRv9nZKuAd4CfE/SnlSc1RwRETuWTk54ezcwlWLhvcfKE9reU29YERHRbVscQdh+CvgjimMPALt3sl9ERIxsnSy18WWK5TDeUVb9DvhKnUFFRET3dTLF9BrbR0q6HcD2+nJtpYiI2IF1MlX0pKRdKA9MS9ofeLrWqCIiousGTBAtK7YuAL4N9Ej6B+BHwNnDEFtERHTRYFNMtwFH2r5I0lLg9RTrMb3V9t3DEl1ERHTNYAnimau62V4OLK8/nIiI2F4MliB6JP3dQBtt/0sN8URExHZisAQxCtiT6utDR0TEDm6wBPEr2/OHLZKIiNiuDPY114wcIiJ2YoMliG2+5oOkGZJWSlol6cyK7XMk9Um6o7yd2rLtbEl3l7e3bWssERHx3Ax2Rbn129KxpFEU51AcD/QCSyQtsr2irelltue17fsm4EjgcGA34CZJ37P98LbEFBERnatz0b1pFCvArrb9BHApMKvDfV8O3GR7o+3fAXcCM2qKMyIiKtSZIA4E1raUe8u6didKWibpckkTyro7gTdI2qNcXnw6MKF9R0lzJTUlNfv6+oY6/oiInVqdCaLqIHf7hYauAibaPhT4PvA1ANvXAdcAPwYuAX4CbNysM3uh7YbtRk9Pz1DGHhGx06szQfSy6af+8cADrQ1sr7P9eFk8l+LCRP3bPmP7cNvHUySbe2uMNSIi2tSZIJYAkyVNKpcHPxlY1NpA0riW4kzgnrJ+VLlqLJIOBQ4Frqsx1oiIaNPJ9SC2iu2NkuYBiynOyj7f9nJJ84Gm7UXAaZJmUkwfrQfmlLvvCtwsCeBh4B22N5tiioiI+shuPywwMjUaDTebzW6HERExokhaartRtS3Xlo6IiEpJEBERUSkJIiIiKiVBREREpSSIiIiolAQRERGVkiAiIqJSEkRERFRKgoiIiEpJEBERUSkJIiIiKiVBREREpSSIiIiolAQRERGVkiAiIqJSEkRERFRKgoiIiEq1JghJMyStlLRK0pkV2+dI6pN0R3k7tWXbZyUtl3SPpHNUXn80IiKGR23XpJY0ClgAHA/0AkskLbK9oq3pZbbnte37GuBo4NCy6kfA64Ab64o3IiI2VecIYhqwyvZq208AlwKzOtzXwPOAMcBuwK7Ar2uJMiIiKtWZIA4E1raUe8u6didKWibpckkTAGz/BLgB+FV5W2z7nvYdJc2V1JTU7OvrG/pnEBGxE6szQVQdM3Bb+Spgou1Dge8DXwOQ9BLgZcB4iqRynKTXbtaZvdB2w3ajp6dnSIOPiNjZ1ZkgeoEJLeXxwAOtDWyvs/14WTwXmFre/wvgVtuP2n4U+B7w6hpjjYiINnUmiCXAZEmTJI0BTgYWtTaQNK6lOBPon0b6BfA6SaMl7UpxgHqzKaaIiKhPbd9isr1R0jxgMTAKON/2cknzgabtRcBpkmYCG4H1wJxy98uB44C7KKalrrV9VV2xRkTE5mS3HxYYmRqNhpvNZrfDiIgYUSQttd2o2pYzqSMiolISREREVEqCiIiISkkQERFRKQkiIiIqJUFERESlJIiIiG209pG1zLl2DkdcdARzrp3D2kfWbnmnESAJIiJiG33ilk+w9NdL2eiNLP31Uj5xyye6HdKQSIKIiNhGdz5456DlkSoJIiJiGx12wGGDlkeqJIiIiG101tFnMfWFUxmt0Ux94VTOOvqsboc0JGpbrC8iYmcxYa8JXDjjwm6HMeQygoiIiEpJEBERUSkJIiIiKiVBREREpSSIiIioVGuCkDRD0kpJqySdWbF9jqQ+SXeUt1PL+uktdXdI+oOk/1pnrBERsanavuYqaRSwADge6AWWSFpke0Vb08tsz2utsH0DcHjZz37AKuC6umKNiIjN1TmCmAassr3a9hPApcCsrejnJOB7th8b0ugiImJQdSaIA4HWJQ17y7p2J0paJulySRMqtp8MXFL1AJLmSmpKavb19W17xBER8Yw6E4Qq6txWvgqYaPtQ4PvA1zbpQBoH/AmwuOoBbC+03bDd6OnpGYKQIyKiX50JohdoHRGMBx5obWB7ne3Hy+K5wNS2PmYD/277ydqijIiISnUmiCXAZEmTJI2hmCpa1NqgHCH0mwnc09bHKQwwvRQREfWq7VtMtjdKmkcxPTQKON/2cknzgabtRcBpkmYCG4H1wJz+/SVNpBiB3FRXjBERMTDZ7YcFRqZGo+Fms9ntMCIiRhRJS203qrblTOqIiKiUBBEREZWSICIiolISREREVEqCiIiISkkQERFRKQkiIiIqJUFERESlJIiI4bR+DVzwRpi/f/Fz/ZpuRxQxoCSIiOF05Qfg/lvg6Y3Fzys/0O2IIgaUBBExnNb+dPByxHYkCSJiOE141eDliO1IEkTEcJq1AA4+GnYZXfyctaDbEUUMqLblviOiwn6T4N3XdDuKiI5kBBEREZWSICIiolISREREVKo1QUiaIWmlpFWSzqzYPkdSn6Q7ytupLdsOknSdpHskrSgvQRoREcOktoPUkkYBC4DjgV5giaRFtle0Nb3M9ryKLi4CPmP7ekl7Ak/XFWtERGyuzhHENGCV7dW2nwAuBWZ1sqOklwOjbV8PYPtR24/VF2pERLSrM0EcCKxtKfeWde1OlLRM0uWSJpR1U4CHJH1H0u2SPleOSDYhaa6kpqRmX1/f0D+DiIidWJ3nQaiizm3lq4BLbD8u6X3A14DjyriOAY4AfgFcBswB/u8mndkLgYUA5bGM+4fyCWxnxgK/6XYQsdXy+o1cO/prd/BAG+pMEL3AhJbyeOCB1ga217UUzwXOb
tn3dturASRdAbyatgTR1lfPEMS83ZLUtN3odhyxdfL6jVw782tX5xTTEmCypEmSxgAnA4taG0ga11KcCdzTsu++kvrf9I8D2g9uR0REjWobQdjeKGkesBgYBZxve7mk+UDT9iLgNEkzgY3AeoppJGw/Jel04AeSBCylGGFERMQwkd1+WCC2R5LmlsdcYgTK6zdy7cyvXRJERERUylIbERFRKQkiIiIqJUFEbIGkD0naYyv2myPpRS3l88pVApD01nKdsRskNSSd8xz7vlHSTvnVy7q0vg6SdpP0/XKNuLd1O7ZuSYIYRpImSvrLDtpdUp5d/rfDEVenOo1/B/QhoDJBVJ3h32IO8EyCsH1qy1pk7wHeb3u67abt04Yq2Ng6ba/DEcCutg+3fVkn+2/hb2FESoIYXhOBQd9gJf0X4DW2D7X9xbZt3b4C4ES2EP9IJ+n5kr4r6U5Jd0v6e4o3+Rsk3VC2eVTSfEk/BY6S9ElJS8r2C1U4CWgA3yg/he7e/6lf0ieBPwW+Ui4jc6ykq1se//yyv9slzSrrd5d0afnB4TJg9278fkaS8gPN3S3l0yV9qnwdzpZ0m6SfSzqm3H6spKslHQB8HTi8fO1eLOnPytfjrvL12a3c577y9f8R8Nay7y9K+mE5QnxluWTQvZI+3ZVfxLawndsAN4o3xP8AzgPuBr4BvB64BbiXYkHC/YArgGXArcCh5b6vA+4ob7cDe5Xbf1vW/e0Aj7kM+H3Z5hjgRuAfgZuADwM9wLcpTiZcAhxd7rc/cF35WF8F7qdYImAicHdL/6cDnyrvvxi4luI8k5uBQ8r6C4FzgB8Dq4GTyvotxj/Sb8CJwLkt5RcA9wFjW+oMzG4p79dy/9+At5T3bwQaLdueKbfdPxa4urz/j8A7yvv7AD8Hng/8HcW5RACHUpw71NjW57sj3wb62y9/918o694IfL/idWi9/zyKdeWmlOWLgA+V9+8DPtL2Gp9d3v8bitUjxgG7UawQsX+3fy/P5ZYRxJa9BPjfFP+Uh1B8gv5Tij+2/wn8A8WyIIeW5YvK/U4HPmD7cIo3+t8DZwI3uxi2bjI6aDET+M+yzc1l3T62X2f7C2UsX7T9Soo3s/PKNn8P/Mj2ERRnrB/UwXNbCHzQ9tQy3n9t2TaufJ5vBv65rOsk/pHuLuD15SfMY2z/tqLNUxRJut90ST+VdBfFWf9/vA2PfwJwpqQ7KN5snkfxWr6W4lMttpdRfJCIrfed8udSikQymJcCa2z/vCx/jeL16Nc+BdW/YsRdwHLbv7L9OMWHrQmMIN2eshgJ1ti+C0DScuAHtl2+GUykWOjqRADb/0/S/pJeQDHK+BdJ3wC+Y7u3OCl8q7T+Ab4eeHlLX3tL2oviD/a/lXF8V9KGwTpUcY2N1wDfaulrt5YmV9h+Glgh6YVbG/hIY/vnkqZSfLL8J0nXVTT7g+2nACQ9jyKxNmyvlfQpijf1rSXgRNsrN6ksXqOctPTcbGTTafTW1+Xx8udTbPl9cEv/uL9rK/f3/XTL/f7yiHrPzQhiy9pf4NYXfzQDrFpr+5+BUynmim+VdMg2xND6B7gLcFT5Kf5w2wfafqT/cSv2HeifZBfgoZZ+Drf9spZ2rc97qzPbSFN+6+gx218HPg8cCTxCMUVYpf/3+Zsy6Z7Usm2w/QayGPhgucQMko4o638IvL2sewXFiDYG92vggPJD224Uo+Gt8R/AREkvKcv/nWLKd4eXBLHtWv9xjwV+Y/thSS+2fZfts4EmxfTU1rxhtLsOeOYKfJIOr4jjDcC+ZX3lP4nth4E1kt5a7iNJh23hsYci/u3dnwC3lVM8HwM+TTEV973+g9StbD9EsU7YXRTHopa0bL6Q4kD0HZI6Pah8FrArsKw8wHpWWf9/gD0lLQM+Atz2XJ/Yzsb2k8B84KfA1RRv9FvTzx+Ad1OMtu+i+HD4laGKc3uWpTYGoeI62FfbfkVZvrAsX96/jWJq5wJgEvAYMNf2MklfAqZTDGFXUHzl8WmKg8JjgQur5vErHvNG4HTbzbI8luJSri+jGMH80Pb7JO0PXFL2fRPFdNNU27+RdBpwGrAG+CVwn+1PSZpE8cYzjuJN6VLb81ufZ/mYj9reU9KuW4o/InYcSRA7KEn3UcyL78gXOomIGmWKKSIiKmUE0SWS/pxnr6DXb43tv+hGPBER7ZIgIiKiUqaYIiKiUhJERERUSoKIqCDpY5KWl4vj3SHpVRqiZb8jRookiIg2ko6iOKHwyHKNrddTLNY2JMt+R4wUSRARmxtHcUb84wDluSQnMXTLfk+VdJOkpZIWSxpX9vfKcsTyExXLgN9d1t/ccsY8km6RlKU2onZJEBGbuw6YoOJaAf8q6XW2z6FYunm67ellu+dTLCf9Kts/Ar5s+5XlWfC7A28uz0ZvAm8vV/bdCHyJYgn1qcD5wGfK/i4A3mf7KIoz8PudRzEKQdIUYLdyRdeIWiVBRLSx/SgwFZgL9AGXSZpT0XRrlv1+KfAK4PpyvaePA+Ml7QPsZfvHZbuLW/b5FvDmcqmTv6JY4ymidiNq6dmI4VIu530jcGP5hv+uimZbs+y3KK4RcNQmldK+FW37Y3lM0vXALGA2xZRVRO0ygohoI+mlkia3VB1OcYW+oVj2eyXQUx4IR9Kukv7Y9gbgEUmvLtud3Nb/eRRX+Vtie/3WPK+I5yojiIjN7Ql8qZz22QisophuOoVi2e9ftRyHAIplvyX1L/t9H9XLfv8eOIoieZxTXlhqNPC/gOXAe4BzJf2OYvTy25b+l0p6mOI4RcSwyFIbEdsJSXuWxz+QdCYwzvbflOUXUSSNQ8or/UXULlNMEduPN5Vfhb2b4jrmnwaQ9E6Ki958LMkhhlNGEBERUSkjiIiIqJQEERERlZIgIiKiUhJERERUSoKIiIhK/x8uoDdie+EbqgAAAABJRU5ErkJggg==\n", 262 | "text/plain": [ 263 | "
" 264 | ] 265 | }, 266 | "metadata": { 267 | "needs_background": "light" 268 | }, 269 | "output_type": "display_data" 270 | } 271 | ], 272 | "source": [ 273 | "ax = sns.stripplot(strategies, test_scores); \n", 274 | "ax.set(xlabel ='Strategy', ylabel ='Test Score') \n", 275 | "plt.show() " 276 | ] 277 | }, 278 | { 279 | "cell_type": "markdown", 280 | "metadata": {}, 281 | "source": [ 282 | "Checking the performance of `RandomForestClassifier` on the data" 283 | ] 284 | }, 285 | { 286 | "cell_type": "code", 287 | "execution_count": 10, 288 | "metadata": {}, 289 | "outputs": [ 290 | { 291 | "data": { 292 | "text/plain": [ 293 | "0.776595744680851" 294 | ] 295 | }, 296 | "execution_count": 10, 297 | "metadata": {}, 298 | "output_type": "execute_result" 299 | } 300 | ], 301 | "source": [ 302 | "from sklearn.ensemble import RandomForestClassifier\n", 303 | "from sklearn.metrics import accuracy_score\n", 304 | "ans=RandomForestClassifier()\n", 305 | "ans.fit(X_train,y_train)\n", 306 | "prediction=ans.predict(X_test)\n", 307 | "accuracy_score(y_test,prediction)" 308 | ] 309 | }, 310 | { 311 | "cell_type": "markdown", 312 | "metadata": {}, 313 | "source": [ 314 | "On comparing the scores of the KNN classifier with the dummy classifier, we come to the conclusion that the KNN classifier is, in fact, a good classifier for the given data." 315 | ] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "metadata": {}, 320 | "source": [ 321 | "## Imbalanced Class and `Dummy Classifier`\n", 322 | "\n", 323 | "A major motivation for Dummy Classifier is F-score, when the positive class is in minority (i.e. imbalanced classes). This classifier is used for sanity test of actual classifier. Actually, dummy classifier completely ignores the input data. In case of 'most frequent' method, it checks the occurrence of most frequent label." 
324 | ]
325 | },
326 | {
327 | "cell_type": "code",
328 | "execution_count": 11,
329 | "metadata": {},
330 | "outputs": [
331 | {
332 | "name": "stdout",
333 | "output_type": "stream",
334 | "text": [
335 | "0 178\n",
336 | "1 182\n",
337 | "2 177\n",
338 | "3 183\n",
339 | "4 181\n",
340 | "5 182\n",
341 | "6 181\n",
342 | "7 179\n",
343 | "8 174\n",
344 | "9 180\n"
345 | ]
346 | }
347 | ],
348 | "source": [
349 | "from sklearn.datasets import load_digits\n",
350 | "\n",
351 | "dataset = load_digits()\n",
352 | "X, y = dataset.data, dataset.target\n",
353 | "\n",
354 | "for class_name, class_count in zip(dataset.target_names, np.bincount(dataset.target)):\n",
355 | " print(class_name,class_count)"
356 | ]
357 | },
358 | {
359 | "cell_type": "code",
360 | "execution_count": 12,
361 | "metadata": {},
362 | "outputs": [
363 | {
364 | "name": "stdout",
365 | "output_type": "stream",
366 | "text": [
367 | "Original labels:\t [1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9]\n",
368 | "New binary labels:\t [1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]\n"
369 | ]
370 | }
371 | ],
372 | "source": [
373 | "y_imbalanced = y.copy()\n",
374 | "y_imbalanced[y_imbalanced != 1] = 0\n",
375 | "\n",
376 | "print('Original labels:\\t', y[1:20])\n",
377 | "print('New binary labels:\\t', y_imbalanced[1:20])"
378 | ]
379 | },
380 | {
381 | "cell_type": "code",
382 | "execution_count": 14,
383 | "metadata": {},
384 | "outputs": [
385 | {
386 | "data": {
387 | "text/plain": [
388 | "array([1615, 182], dtype=int64)"
389 | ]
390 | },
391 | "execution_count": 14,
392 | "metadata": {},
393 | "output_type": "execute_result"
394 | }
395 | ],
396 | "source": [
397 | "np.bincount(y_imbalanced)"
398 | ]
399 | },
400 | {
401 | "cell_type": "markdown",
402 | "metadata": {},
403 | "source": [
404 | "We can observe that one class is far more frequent than the other in the array above, which shows the classes are imbalanced."
405 | ]
406 | },
407 | {
408 | "cell_type": "code",
409 | "execution_count": 20,
410 | "metadata": {},
411 | "outputs": [
412 | {
413 | "data": {
414 | "text/plain": [
415 | "0.5466666666666666"
416 | ]
417 | },
418 | "execution_count": 20,
419 | "metadata": {},
420 | "output_type": "execute_result"
421 | }
422 | ],
423 | "source": [
424 | "X_train1, X_test1, y_train1, y_test1 = train_test_split(X, y_imbalanced, random_state=0)\n",
425 | "\n",
426 | "# Accuracy of a Gaussian Naive Bayes classifier\n",
427 | "from sklearn.naive_bayes import GaussianNB\n",
428 | "gnb = GaussianNB()\n",
429 | "gnb.fit(X_train1, y_train1)\n",
430 | "gnb.score(X_test1, y_test1)"
431 | ]
432 | },
433 | {
434 | "cell_type": "markdown",
435 | "metadata": {},
436 | "source": [
437 | "Using a Naive Bayes classifier here gives a score of about 0.55. We know this is not a good score, so we can fit other classifiers and check their scores. 
" 438 | ] 439 | }, 440 | { 441 | "cell_type": "code", 442 | "execution_count": 24, 443 | "metadata": {}, 444 | "outputs": [ 445 | { 446 | "data": { 447 | "text/plain": [ 448 | "0.9088888888888889" 449 | ] 450 | }, 451 | "execution_count": 24, 452 | "metadata": {}, 453 | "output_type": "execute_result" 454 | } 455 | ], 456 | "source": [ 457 | "from sklearn.ensemble import RandomForestClassifier\n", 458 | "\n", 459 | "clf = RandomForestClassifier(max_depth=2, random_state=0).fit(X_train1, y_train1)\n", 460 | "clf.score(X_test1,y_test1)" 461 | ] 462 | }, 463 | { 464 | "cell_type": "markdown", 465 | "metadata": {}, 466 | "source": [ 467 | "On Using RandomForestClassifier we get a score of 0.908 which is a great score and also much better than what Naive Bayes Classifier performed . " 468 | ] 469 | }, 470 | { 471 | "cell_type": "markdown", 472 | "metadata": {}, 473 | "source": [ 474 | "## Using dummy classifier as baseline" 475 | ] 476 | }, 477 | { 478 | "cell_type": "code", 479 | "execution_count": 25, 480 | "metadata": {}, 481 | "outputs": [ 482 | { 483 | "data": { 484 | "text/plain": [ 485 | "0.9044444444444445" 486 | ] 487 | }, 488 | "execution_count": 25, 489 | "metadata": {}, 490 | "output_type": "execute_result" 491 | } 492 | ], 493 | "source": [ 494 | "from sklearn.dummy import DummyClassifier\n", 495 | "dummy_majority = DummyClassifier(strategy = 'most_frequent').fit(X_train1, y_train1)\n", 496 | "y_dummy_predictions = dummy_majority.predict(X_test)\n", 497 | "dummy_majority.score(X_test1, y_test1)" 498 | ] 499 | }, 500 | { 501 | "cell_type": "markdown", 502 | "metadata": {}, 503 | "source": [ 504 | "We observe that the RandomForest classsifier score is not much compared to dummy classifier which also has a score of more than .90 . Which shows that RandomForest is not a right fit for the model despite the good score.\n", 505 | "This makes us realise that we need a better model which scores better ." 506 | ] 507 | }, 508 | { 509 | "cell_type": "code", 510 | "execution_count": 27, 511 | "metadata": {}, 512 | "outputs": [ 513 | { 514 | "data": { 515 | "text/plain": [ 516 | "0.9955555555555555" 517 | ] 518 | }, 519 | "execution_count": 27, 520 | "metadata": {}, 521 | "output_type": "execute_result" 522 | } 523 | ], 524 | "source": [ 525 | "from sklearn.svm import SVC\n", 526 | "svm = SVC(kernel='rbf', C=1).fit(X_train1, y_train1)\n", 527 | "svm.score(X_test1, y_test1)" 528 | ] 529 | }, 530 | { 531 | "cell_type": "markdown", 532 | "metadata": {}, 533 | "source": [ 534 | "On using SVM classifier using RBF kernel for the model,gives a whoping score of 0.99 which is a good score as well as it performce better than dummy classifier which is our baseline. 
" 535 | ] 536 | }, 537 | { 538 | "cell_type": "markdown", 539 | "metadata": {}, 540 | "source": [ 541 | "*Thus, Dummy Classifier works as a baseline and gives an idea of the performance of the model on dataset*\n" 542 | ] 543 | }, 544 | { 545 | "cell_type": "code", 546 | "execution_count": null, 547 | "metadata": {}, 548 | "outputs": [], 549 | "source": [] 550 | } 551 | ], 552 | "metadata": { 553 | "kernelspec": { 554 | "display_name": "Python 3", 555 | "language": "python", 556 | "name": "python3" 557 | }, 558 | "language_info": { 559 | "codemirror_mode": { 560 | "name": "ipython", 561 | "version": 3 562 | }, 563 | "file_extension": ".py", 564 | "mimetype": "text/x-python", 565 | "name": "python", 566 | "nbconvert_exporter": "python", 567 | "pygments_lexer": "ipython3", 568 | "version": "3.7.6" 569 | } 570 | }, 571 | "nbformat": 4, 572 | "nbformat_minor": 4 573 | } 574 | -------------------------------------------------------------------------------- /Hypothesis Testing/Hypo_Testing.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Hypo_Testing.ipynb", 7 | "provenance": [], 8 | "toc_visible": true 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | }, 14 | "language_info": { 15 | "name": "python" 16 | } 17 | }, 18 | "cells": [ 19 | { 20 | "cell_type": "markdown", 21 | "metadata": { 22 | "id": "4uItjVLu5mXc" 23 | }, 24 | "source": [ 25 | "# T-test\n", 26 | "\n", 27 | "A t-test is a type of inferential statistic which is used to determine if there\n", 28 | "is a significant difference between the means of two groups which may be related in certain features.\n", 29 | "\n", 30 | "T-test has 2 types : 1. one sampled t-test 2. two-sampled t-test.\n", 31 | "One sample t-test : The One Sample t Test determines whether the\n", 32 | "sample mean is statistically different from a known or hypothesised population mean. The One Sample t Test is a parametric test." 
33 | ]
34 | },
35 | {
36 | "cell_type": "code",
37 | "metadata": {
38 | "colab": {
39 | "base_uri": "https://localhost:8080/"
40 | },
41 | "id": "y90dateE5d9E",
42 | "outputId": "69e603b5-689b-47d9-c61a-cc1d3f3b3279"
43 | },
44 | "source": [
45 | "#-----------------------------------------T-test---------------------------------#\n",
46 | "\n",
47 | "\n",
48 | "from scipy.stats import ttest_1samp\n",
49 | "import numpy as np\n",
50 | "\n",
51 | "# 14 ages; we are checking whether the average age is 30 or not.\n",
52 | "# H0: The average age is 30.\n",
53 | "# H1: The average age is not 30.\n",
54 | "ages = np.array([32,34,29,29,22,39,38,37,38,36,30,26,22,22])\n",
55 | "print(ages)\n",
56 | "# mean of the ages\n",
57 | "ages_mean = np.mean(ages)\n",
58 | "print(ages_mean)\n",
59 | "# One-sample t-test\n",
60 | "tset, pval = ttest_1samp(ages, 30)\n",
61 | "print('p-values',pval)\n",
62 | "if pval < 0.05: # alpha value is 0.05 or 5%\n",
63 | " print(\"we reject the null hypothesis\")\n",
64 | "else:\n",
65 | " print(\"we fail to reject the null hypothesis\")\n"
66 | ],
67 | "execution_count": 1,
68 | "outputs": [
69 | {
70 | "output_type": "stream",
71 | "text": [
72 | "[32 34 29 29 22 39 38 37 38 36 30 26 22 22]\n",
73 | "31.0\n",
74 | "p-values 0.5605155888171379\n",
75 | "we fail to reject the null hypothesis\n"
76 | ],
77 | "name": "stdout"
78 | }
79 | ]
80 | },
81 | {
82 | "cell_type": "markdown",
83 | "metadata": {
84 | "id": "Lie2acfC5lrP"
85 | },
86 | "source": [
87 | "# Z-Test\n",
88 | "\n",
89 | "A z-test is used if:\n",
90 | "- Your sample size is greater than 30. Otherwise, use a t-test.\n",
91 | "- Data points are independent of each other, i.e. one data point isn't related to and doesn't affect another data point.\n",
92 | "- Your data are normally distributed. However, for large sample sizes (over 30) this doesn't always matter.\n",
93 | "- Your data are randomly selected from the population, where each item has an equal chance of being selected.\n",
94 | "- Sample sizes should be equal if at all possible."
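,
"\n",
"For reference, the one-sample z statistic is\n",
"$$ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} $$\n",
"where $\sigma$ is the population standard deviation (in practice, as in the `statsmodels` call below, the sample estimate is plugged in)."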
95 | ]
96 | },
97 | {
98 | "cell_type": "code",
99 | "metadata": {
100 | "colab": {
101 | "base_uri": "https://localhost:8080/"
102 | },
103 | "id": "uTfU9vxz6Dhc",
104 | "outputId": "0da843b1-50b7-4a44-9ad0-d37f9e7479cd"
105 | },
106 | "source": [
107 | "#-----------One Sample Z-test-----------#\n",
108 | "import pandas as pd\n",
109 | "from scipy import stats\n",
110 | "from statsmodels.stats import weightstats as stests\n",
111 | "df = pd.read_csv(\"blood_pressure.csv\")\n",
112 | "\n",
113 | "ztest ,pval = stests.ztest(df['bp_before'], x2=None, value=156)\n",
114 | "print('One-sample z-test')\n",
115 | "print(float(pval))\n",
116 | "if pval<0.05:\n",
117 | " print(\"reject null hypothesis\")\n",
118 | "else:\n",
119 | " print(\"fail to reject null hypothesis\")\n",
120 | "\n",
121 | "#-----------Two Sample Z-test-----------#\n",
122 | "# Two-sample z-test: check two independent data groups and decide whether their sample means are equal.\n",
123 | "# H0 : the difference between the two group means is 0\n",
124 | "# H1 : the difference between the two group means is not 0\n",
125 | "# Example : we compare the blood-pressure readings before and after treatment.\n",
126 | "\n",
127 | "ztest ,pval1 = stests.ztest(df['bp_before'], x2=df['bp_after'], value=0,alternative='two-sided')\n",
128 | "print('Two-sample z-test')\n",
129 | "print(float(pval1))\n",
130 | "if pval1<0.05:\n",
131 | " print(\"reject null hypothesis\")\n",
132 | "else:\n",
133 | " print(\"fail to reject null hypothesis\")"
134 | ],
135 | "execution_count": 2,
136 | "outputs": [
137 | {
138 | "output_type": "stream",
139 | "text": [
140 | "One-sample z-test\n",
141 | "0.6651614730255063\n",
142 | "fail to reject null hypothesis\n",
143 | "Two-sample z-test\n",
144 | "0.002162306611369422\n",
145 | "reject null hypothesis\n"
146 | ],
147 | "name": "stdout"
148 | },
149 | {
150 | "output_type": "stream",
151 | "text": [
152 | "/usr/local/lib/python3.7/dist-packages/statsmodels/tools/_testing.py:19: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.\n",
153 | " import pandas.util.testing as tm\n"
154 | ],
155 | "name": "stderr"
156 | }
157 | ]
158 | },
159 | {
160 | "cell_type": "markdown",
161 | "metadata": {
162 | "id": "O0AqvlnM6aLN"
163 | },
164 | "source": [
165 | "# ANOVA (F-TEST)\n",
166 | "The t-test works well when dealing with two groups, but sometimes we want to compare more than two groups at the same time.\n",
167 | "For example, if we wanted to test whether voter age differs based on some categorical variable like race, we have to compare the means of each level or group of the variable. 
\n", 168 | "The analysis of variance or ANOVA is a statistical inference test that lets you compare multiple groups at the same time.\n" 169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "metadata": { 174 | "colab": { 175 | "base_uri": "https://localhost:8080/" 176 | }, 177 | "id": "uZRF_2qM6Qv_", 178 | "outputId": "64439f9f-a116-497d-90e6-d43e0bd552dd" 179 | }, 180 | "source": [ 181 | "\n", 182 | "#------------------------One Way F-test(Anova)------------------------# \n", 183 | "#To tell whether two or more groups are similar or not based on their mean similarity and f-score.\n", 184 | "#Example : there are 3 different category of plant and their weight and need to check whether all 3 group are similar or not.\n", 185 | "import pandas as pd\n", 186 | "from scipy import stats\n", 187 | "from statsmodels.stats import weightstats as stests\n", 188 | "print('One-way Anova')\n", 189 | "df_anova = pd.read_csv('PlantGrowth.csv')\n", 190 | "df_anova = df_anova[['weight','group']]\n", 191 | "grps = pd.unique(df_anova.group.values)\n", 192 | "d_data = {grp:df_anova['weight'][df_anova.group == grp] for grp in grps}\n", 193 | " \n", 194 | "F, p = stats.f_oneway(d_data['ctrl'], d_data['trt1'], d_data['trt2'])\n", 195 | "print(\"p-value for significance is: \", p)\n", 196 | "if p<0.05:\n", 197 | " print(\"reject null hypothesis\")\n", 198 | "else:\n", 199 | " print(\"accept null hypothesis\")\n", 200 | "\n", 201 | "#------------------------------------Two Way F-test-----------------------------------# \n", 202 | "#Two way F-test is extension of 1-way f-test, it is used when we have 2 independent variable and 2+ groups.\n", 203 | "#2-way F-test does not tell which variable is dominant. If we need to check individual significance then Post-hoc testing need to be performed.\n", 204 | "\n", 205 | "#e.g: Grand mean crop yield (the mean crop yield not by any sub-group), as well the mean crop yield by each factor, \n", 206 | "# as well as by the factors grouped together.\n", 207 | "import statsmodels.api as sm\n", 208 | "from statsmodels.formula.api import ols\n", 209 | "print('Two-way ANova')\n", 210 | "df_anova2 = pd.read_csv(\"https://raw.githubusercontent.com/Opensourcefordatascience/Data-sets/master/crop_yield.csv\")\n", 211 | "model = ols('Yield ~ C(Fert)*C(Water)', df_anova2).fit()\n", 212 | "print(f\"Overall model F({model.df_model: .0f},{model.df_resid: .0f}) = {model.fvalue: .3f}, p = {model.f_pvalue: .4f}\")\n", 213 | "res = sm.stats.anova_lm(model, typ= 2)\n", 214 | "print(res)" 215 | ], 216 | "execution_count": 3, 217 | "outputs": [ 218 | { 219 | "output_type": "stream", 220 | "text": [ 221 | "One-way Anova\n", 222 | "p-value for significance is: 0.0159099583256229\n", 223 | "reject null hypothesis\n", 224 | "Two-way ANova\n", 225 | "Overall model F( 3, 16) = 4.112, p = 0.0243\n", 226 | " sum_sq df F PR(>F)\n", 227 | "C(Fert) 69.192 1.0 5.766000 0.028847\n", 228 | "C(Water) 63.368 1.0 5.280667 0.035386\n", 229 | "C(Fert):C(Water) 15.488 1.0 1.290667 0.272656\n", 230 | "Residual 192.000 16.0 NaN NaN\n" 231 | ], 232 | "name": "stdout" 233 | } 234 | ] 235 | } 236 | ] 237 | } -------------------------------------------------------------------------------- /Hypothesis Testing/Hypothesis Testing.md: -------------------------------------------------------------------------------- 1 | # What is Hypothesis Testing? 2 | 3 | Hypothesis testing is a statistical method that is used in making statistical decisions using experimental data. 
4 | It evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data.
5 |
6 | ## Important Parameters:
7 |
8 | **Null Hypothesis** :
9 | 1) A statement about a population parameter.
10 | 2) Contains: '=', '<=', '>='
11 | 3) We assume the null hypothesis to be true, and then decide whether the data lets us reject it in favour of the alternative hypothesis.
12 |
13 | **Alternative Hypothesis** :
14 | 1) A statement that directly contradicts the null hypothesis.
15 | 2) Contains : 'Not equal to', '>', '<'
16 |
17 | **Level Of Significance**:
18 | The degree of significance at which we reject or fail to reject the null hypothesis. (Usually taken as 5%, which means your output should be 95% confident to give a similar kind of result in each sample.)
19 |
20 | **Type I error:**
21 | When we reject the null hypothesis although that hypothesis was true. (alpha)
22 |
23 | **Type II error :**
24 | When we fail to reject the null hypothesis although it is false. (Beta)
25 |
26 | **One-tailed test :-**
27 | A test of a statistical hypothesis, where the region of rejection is on only one side of the sampling distribution, is called a one-tailed test.
28 |
29 | Example hypothesis: a box has ≥ 40 chocolates.
30 |
31 | **Two-tailed test :-**
32 | A two-tailed test is a statistical test in which the critical area of a distribution is two-sided and tests whether a sample is greater than or less than a certain range of values. If the sample being tested falls into either of the critical areas, the alternative hypothesis is accepted instead of the null hypothesis.
33 |
34 | Example hypothesis: a box has != 40 chocolates.
35 |
36 | **P-value**
37 | The P-value, or calculated probability, is the probability of finding the observed, or more extreme, results when the null hypothesis (H0) of a study question is true.
38 | If your P-value is less than the chosen significance level, you reject the null hypothesis, i.e. you accept that your sample gives reasonable evidence to support the alternative hypothesis.
39 |
40 | Example : You have a coin and you don't know whether it is fair or tricky, so let's state the null and alternative hypotheses:
41 | H0 : The coin is a fair coin.
42 | H1 : The coin is a tricky coin.
43 | Confidence level : 95%
44 | alpha = 5% or 0.05
45 |
46 | Now let's toss the coin and calculate the p-value (probability value).
47 | Toss the coin a 1st time; the result is tails. P-value = 50% (as heads and tails have equal probability).
48 | Toss the coin a 2nd time; the result is tails again. Now p-value = 50%/2 = 25%.
49 | Similarly, after 6 consecutive tails the p-value is about 1.5%. We set our confidence level at 95%, meaning we allow a 5% error rate, and here we are below that level (p-value < alpha), i.e. our null hypothesis does not hold. So we reject the null hypothesis and propose that this coin is a tricky coin, which it actually is.
50 |
51 | Some of the widely used hypothesis tests and their code are discussed in the accompanying notebook; a quick numerical check of the coin example follows below.
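
As a numerical check of the coin example above (a sketch, not part of the original write-up; `binomtest` needs SciPy >= 1.7, and the numbers are the ones used in the example):

```python
from scipy.stats import binomtest

# One-sided p-value: probability of seeing 6 (or more) tails in 6 tosses of a fair coin
result = binomtest(k=6, n=6, p=0.5, alternative='greater')
print(result.pvalue)   # 0.015625, i.e. ~1.5%

# The same number computed directly
print(0.5 ** 6)        # 0.015625 < alpha = 0.05, so we reject H0
```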
52 | 53 | 54 | 55 | -------------------------------------------------------------------------------- /Hypothesis Testing/PlantGrowth.csv: -------------------------------------------------------------------------------- 1 | "","weight","group" 2 | "1",4.17,"ctrl" 3 | "2",5.58,"ctrl" 4 | "3",5.18,"ctrl" 5 | "4",6.11,"ctrl" 6 | "5",4.5,"ctrl" 7 | "6",4.61,"ctrl" 8 | "7",5.17,"ctrl" 9 | "8",4.53,"ctrl" 10 | "9",5.33,"ctrl" 11 | "10",5.14,"ctrl" 12 | "11",4.81,"trt1" 13 | "12",4.17,"trt1" 14 | "13",4.41,"trt1" 15 | "14",3.59,"trt1" 16 | "15",5.87,"trt1" 17 | "16",3.83,"trt1" 18 | "17",6.03,"trt1" 19 | "18",4.89,"trt1" 20 | "19",4.32,"trt1" 21 | "20",4.69,"trt1" 22 | "21",6.31,"trt2" 23 | "22",5.12,"trt2" 24 | "23",5.54,"trt2" 25 | "24",5.5,"trt2" 26 | "25",5.37,"trt2" 27 | "26",5.29,"trt2" 28 | "27",4.92,"trt2" 29 | "28",6.15,"trt2" 30 | "29",5.8,"trt2" 31 | "30",5.26,"trt2" 32 | -------------------------------------------------------------------------------- /Hypothesis Testing/blood_pressure.csv: -------------------------------------------------------------------------------- 1 | patient,sex,agegrp,bp_before,bp_after 2 | 1,Male,30-45,143,153 3 | 2,Male,30-45,163,170 4 | 3,Male,30-45,153,168 5 | 4,Male,30-45,153,142 6 | 5,Male,30-45,146,141 7 | 6,Male,30-45,150,147 8 | 7,Male,30-45,148,133 9 | 8,Male,30-45,153,141 10 | 9,Male,30-45,153,131 11 | 10,Male,30-45,158,125 12 | 11,Male,30-45,149,164 13 | 12,Male,30-45,173,159 14 | 13,Male,30-45,165,135 15 | 14,Male,30-45,145,159 16 | 15,Male,30-45,143,153 17 | 16,Male,30-45,152,126 18 | 17,Male,30-45,141,162 19 | 18,Male,30-45,176,134 20 | 19,Male,30-45,143,136 21 | 20,Male,30-45,162,150 22 | 21,Male,46-59,149,168 23 | 22,Male,46-59,156,155 24 | 23,Male,46-59,151,136 25 | 24,Male,46-59,159,132 26 | 25,Male,46-59,164,160 27 | 26,Male,46-59,154,160 28 | 27,Male,46-59,152,136 29 | 28,Male,46-59,142,183 30 | 29,Male,46-59,162,152 31 | 30,Male,46-59,155,162 32 | 31,Male,46-59,175,151 33 | 32,Male,46-59,184,139 34 | 33,Male,46-59,167,175 35 | 34,Male,46-59,148,184 36 | 35,Male,46-59,170,151 37 | 36,Male,46-59,159,171 38 | 37,Male,46-59,149,157 39 | 38,Male,46-59,140,159 40 | 39,Male,46-59,185,140 41 | 40,Male,46-59,160,174 42 | 41,Male,60+,157,167 43 | 42,Male,60+,158,158 44 | 43,Male,60+,162,168 45 | 44,Male,60+,160,159 46 | 45,Male,60+,180,153 47 | 46,Male,60+,155,164 48 | 47,Male,60+,172,169 49 | 48,Male,60+,157,148 50 | 49,Male,60+,171,185 51 | 50,Male,60+,170,163 52 | 51,Male,60+,175,146 53 | 52,Male,60+,175,160 54 | 53,Male,60+,172,175 55 | 54,Male,60+,173,163 56 | 55,Male,60+,170,185 57 | 56,Male,60+,164,146 58 | 57,Male,60+,147,176 59 | 58,Male,60+,154,147 60 | 59,Male,60+,172,161 61 | 60,Male,60+,162,164 62 | 61,Female,30-45,152,149 63 | 62,Female,30-45,147,142 64 | 63,Female,30-45,144,146 65 | 64,Female,30-45,144,138 66 | 65,Female,30-45,158,131 67 | 66,Female,30-45,147,145 68 | 67,Female,30-45,154,134 69 | 68,Female,30-45,151,135 70 | 69,Female,30-45,149,131 71 | 70,Female,30-45,138,135 72 | 71,Female,30-45,162,133 73 | 72,Female,30-45,157,135 74 | 73,Female,30-45,141,168 75 | 74,Female,30-45,167,144 76 | 75,Female,30-45,147,147 77 | 76,Female,30-45,143,151 78 | 77,Female,30-45,142,149 79 | 78,Female,30-45,166,147 80 | 79,Female,30-45,147,149 81 | 80,Female,30-45,142,135 82 | 81,Female,46-59,157,127 83 | 82,Female,46-59,170,150 84 | 83,Female,46-59,150,138 85 | 84,Female,46-59,150,147 86 | 85,Female,46-59,167,157 87 | 86,Female,46-59,154,146 88 | 87,Female,46-59,143,148 89 | 88,Female,46-59,157,136 90 | 89,Female,46-59,149,146 91 | 
90,Female,46-59,161,132 92 | 91,Female,46-59,142,145 93 | 92,Female,46-59,162,132 94 | 93,Female,46-59,144,157 95 | 94,Female,46-59,142,140 96 | 95,Female,46-59,159,137 97 | 96,Female,46-59,140,154 98 | 97,Female,46-59,144,169 99 | 98,Female,46-59,142,145 100 | 99,Female,46-59,145,137 101 | 100,Female,46-59,145,143 102 | 101,Female,60+,168,178 103 | 102,Female,60+,142,141 104 | 103,Female,60+,147,149 105 | 104,Female,60+,148,148 106 | 105,Female,60+,162,138 107 | 106,Female,60+,170,143 108 | 107,Female,60+,173,167 109 | 108,Female,60+,151,158 110 | 109,Female,60+,155,152 111 | 110,Female,60+,163,154 112 | 111,Female,60+,183,161 113 | 112,Female,60+,159,143 114 | 113,Female,60+,148,159 115 | 114,Female,60+,151,177 116 | 115,Female,60+,165,142 117 | 116,Female,60+,152,152 118 | 117,Female,60+,161,152 119 | 118,Female,60+,165,174 120 | 119,Female,60+,149,151 121 | 120,Female,60+,185,163 122 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Pankhuri Saxena 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Possible project for Kharagpur Winter of Code 2020 2 | 3 | # Statistics and Econometrics for Data Science 4 | ![GitHub Repo stars](https://img.shields.io/github/stars/PetalsOnWind/Statistics-and-Econometrics-for-Data-Science?color=2E61C5&logo=Github&style=for-the-badge) 5 | ![GitHub forks](https://img.shields.io/github/forks/PetalsOnWind/Statistics-and-Econometrics-for-Data-Science?color=2E61C5&logo=Github&style=for-the-badge) 6 | ![GitHub contributors](https://img.shields.io/github/contributors/PetalsOnWind/Statistics-and-Econometrics-for-Data-Science?color=2E61C5&logo=Github&style=for-the-badge) 7 | ![GitHub pull requests](https://img.shields.io/github/issues-pr/PetalsOnWind/Statistics-and-Econometrics-for-Data-Science?logo=Github&style=for-the-badge) 8 | ![GitHub issues](https://img.shields.io/github/issues/PetalsOnWind/Statistics-and-Econometrics-for-Data-Science?logo=Github&style=for-the-badge) 9 | 10 | ## Table of Contents 11 | 1. How are the topics even related to ML? 12 | 2. What will the project entail? 13 | 3. How to start with the project? 14 | 4. What are the prerequisites for the project? 15 | 5. 
What can you contribute to the project?
16 | 6. Expectations from the project
17 | 7. How much is ML and how much is statistics/econometrics?
18 | 8. Who to contact?
19 |
20 |
21 |
22 | ## How are the topics even related to ML?
23 | Often while building models in ML we become too concerned with accuracy and forget whether
24 | the model does what we initially set out to do. Statistics and Econometrics help in
25 | building better models and understanding the data. They can help in better feature engineering,
26 | and a better understanding of the assumptions which can help in ultimately building better models.
27 | Running linear regression sounds easy, but what if someone asks you what assumptions you made
28 | while running the model? If your answer is "Umm..." then you are on the track to understanding
29 | what these topics can contribute to ML (if you didn't already know).
30 |
31 | Due to certain limitations, for the time being, we are concerned with only Linear Regression.
32 | This is just a very small subset of ML but let's start with tiny steps to progress.
33 |
34 |
35 |
36 | ## What will the project entail?
37 | The project aims to have a series of notebooks that will help in understanding the basic topics.
38 | The notebooks could be used to get a broad overview of the topic or to quickly revise the topic.
39 | The notebooks can be helpful in the following ways:
40 | - You are participating in a competition and you want to run some quick checks on the data/model
41 | - You are sitting for internship/placement and need to revise some topics fast
42 | - You want some code snippet for a certain test and how to interpret the test results.
43 |
44 |
45 |
46 | ## How to start with the project?
47 | 1. Install Jupyter Notebook, recommended installing with [Anaconda](https://www.anaconda.com/products/individual)
48 | 2. Learn how to use Jupyter Notebook, and the python libraries NumPy, pandas, and matplotlib
49 | 3. Clone this repo and make a new branch
50 | 4. Each ipynb file should be able to stand independently, so you should be able to open it using Jupyter Notebook
51 |
52 |
53 |
54 | ## What are the prerequisites for the project?
55 | - Basic knowledge of at least one programming language (preferably python)
56 | - Basic knowledge of probability (class 12 level)
57 | - Desire to learn statistics
58 |
59 |
60 |
61 | ## What can you contribute to the project?
62 | Easy: Make some changes to the existing graphs or explanations to make them look better,
63 | add new ideas to 'ideas.md', check if existing notebooks make sense
64 |
65 | Intermediate: Start with a new notebook of your own
66 |
67 | Advanced: Make a series of notebooks or explain a complicated/advanced topic
68 |
69 |
70 |
71 | ## Expectations from the project
72 | There will be a variety of issues, some easy to get you started and others harder to let you
73 | contribute significantly. But I'll set down the minimum expected work that you should do to
74 | pass. By mid-evals, you should have at least one new notebook and by end-evals, you should have
75 | at least three new notebooks ready. Each notebook should have some introduction to the topic,
76 | mathematical proofs if required, the code to implement that topic from scratch, and any ready-made
77 | library code, if available.
78 |
79 | The notebooks referred to here are Jupyter Notebooks.
80 |
81 |
82 |
83 | ## How much is ML and how much is statistics/econometrics?
84 | Well, your learning from this will be less towards ML. 
These topics are to provide support to ML
85 | and do not replace the importance of doing a course/project purely based on machine learning.
86 |
87 |
88 |
89 | ## Who to contact?
90 | The project was started by PetalsOnWind (Pankhuri Saxena, a fourth-year Economics student at IIT KGP).
91 | She can be reached at pankhurisaxena[dot]iitkgp[at]gmail[dot]com.
92 |
93 | ## Contributors:
94 |
95 | ### Credit goes to these people: ✨
96 |
97 |
98 |
99 |
(contributor avatar table omitted: the HTML image grid was stripped from this dump)
106 |
--------------------------------------------------------------------------------
/WIP/Monte Carlo Simulation.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# MONTE CARLO METHODS OF INFERENCE"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "#### In this section we will see how we can simulate a random sequence of numbers using a MONTE CARLO SIMULATION."
15 | ]
16 | },
17 | {
18 | "cell_type": "markdown",
19 | "metadata": {},
20 | "source": [
21 | "The value of pi can be calculated using a Monte Carlo simulation. First we generate random tuples of numbers, using the random module of the numpy library, into two separate arrays array_x and array_y. \n",
22 | "The value of pi equals the area of a circle of radius r divided by the area of a square whose side is r, the same as the radius of the circle:\n",
23 | "\n",
24 | "$$ \pi = \frac{area(circle)}{area(square)} $$"
25 | ]
26 | },
27 | {
28 | "cell_type": "code",
29 | "execution_count": 2,
30 | "metadata": {},
31 | "outputs": [],
32 | "source": [
33 | "# Importing all the necessary libraries.\n",
34 | "import numpy as np\n",
35 | "import pandas as pd\n",
36 | "import matplotlib.pyplot as plt\n",
37 | "import seaborn as sns\n",
38 | "%matplotlib inline"
39 | ]
40 | },
41 | {
42 | "cell_type": "markdown",
43 | "metadata": {},
44 | "source": [
45 | "We generate random tuples in the first quadrant between x=0 to x=5 and y=0 to y=5. Using the random function from the numpy library we get random points evenly distributed over the square of dimensions 5 x 5.\n",
46 | "Inside this region we place a square of side 2 units with one corner at the origin, and a circle of radius 2 units centered at (2,2). If a point falls in the square region the value of square_count (s) increases by 1, and whenever a point falls in the circular region the value of circular_count (c) increases by 1. \n",
47 | "'s' is proportional to the area of the square and 'c' is proportional to the area of the circle.\n",
48 | "Therefore the ratio of the circular and square areas, i.e. 
c/s gives the value of pi.\n", 49 | "$$ pi ( \\pi ) = \\frac{c}{s} =\\frac{\\pi a^2}{a^2}= \\pi \\\\ $$" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 18, 55 | "metadata": {}, 56 | "outputs": [], 57 | "source": [ 58 | "#generating and storing random number in range (0,5) in two separate list so that each tuple (array_x[i], array_y[i])\n", 59 | "#gives a random point in the defined square region.\n", 60 | "array_x = np.random.rand(10000)\n", 61 | "array_y = np.random.rand(10000)\n", 62 | "array_x = array_x * 5\n", 63 | "array_y = array_y * 5\n" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 20, 69 | "metadata": {}, 70 | "outputs": [ 71 | { 72 | "data": { 73 | "text/plain": [ 74 | "array([4.42890686, 4.92913044, 0.81528105, ..., 4.91592682, 2.00402508,\n", 75 | " 4.69855032])" 76 | ] 77 | }, 78 | "execution_count": 20, 79 | "metadata": {}, 80 | "output_type": "execute_result" 81 | } 82 | ], 83 | "source": [ 84 | "#A look into array_x\n", 85 | "array_x" 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": 21, 91 | "metadata": {}, 92 | "outputs": [ 93 | { 94 | "data": { 95 | "text/plain": [ 96 | "array([0.76251225, 0.73854336, 0.98199909, ..., 0.78046047, 4.27236299,\n", 97 | " 3.25731217])" 98 | ] 99 | }, 100 | "execution_count": 21, 101 | "metadata": {}, 102 | "output_type": "execute_result" 103 | } 104 | ], 105 | "source": [ 106 | "#A look into array_y\n", 107 | "array_y" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": 22, 113 | "metadata": {}, 114 | "outputs": [], 115 | "source": [ 116 | "#Here we create a list which stores the value of the ratio of c and s after every iteration. I have initialised\n", 117 | "#the value of s and c to 1 so that we don't get any infinite value.\n", 118 | "#the condition for the point to belong in square region and circular region is checked in every iteration and \n", 119 | "#accordingly the value of s and c is changed.\n", 120 | "pi_val = []\n", 121 | "s=1\n", 122 | "c=1\n", 123 | "for i in range(10000):\n", 124 | " \n", 125 | " if array_x[i]<2 and array_y[i]<2:\n", 126 | " s=s+1\n", 127 | " if ((array_x[i]-2)*(array_x[i]-2)+(array_y[i]-2)*(array_y[i]-2))<4:\n", 128 | " c=c+1\n", 129 | " pi=c/s\n", 130 | " pi_val.append(pi)\n", 131 | " " 132 | ] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "execution_count": 29, 137 | "metadata": {}, 138 | "outputs": [ 139 | { 140 | "data": { 141 | "text/plain": [ 142 | "(array([4.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00,\n", 143 | " 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 2.000e+00, 0.000e+00,\n", 144 | " 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 0.000e+00,\n", 145 | " 0.000e+00, 0.000e+00, 1.000e+00, 0.000e+00, 0.000e+00, 0.000e+00,\n", 146 | " 0.000e+00, 0.000e+00, 0.000e+00, 1.000e+00, 0.000e+00, 3.000e+00,\n", 147 | " 7.000e+00, 2.000e+00, 4.000e+00, 1.000e+00, 1.000e+00, 6.100e+01,\n", 148 | " 3.460e+02, 5.880e+02, 1.368e+03, 2.069e+03, 2.278e+03, 2.972e+03,\n", 149 | " 1.160e+02, 3.400e+01, 2.300e+01, 3.500e+01, 2.600e+01, 1.300e+01,\n", 150 | " 9.000e+00, 1.000e+00, 7.000e+00, 4.000e+00, 0.000e+00, 2.000e+00,\n", 151 | " 0.000e+00, 0.000e+00, 1.000e+00, 0.000e+00, 0.000e+00, 0.000e+00,\n", 152 | " 3.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 2.000e+00, 0.000e+00,\n", 153 | " 1.000e+00, 0.000e+00, 0.000e+00, 0.000e+00, 4.000e+00, 0.000e+00,\n", 154 | " 0.000e+00, 1.000e+00, 0.000e+00, 2.000e+00, 0.000e+00, 0.000e+00,\n", 155 | " 0.000e+00, 0.000e+00, 2.000e+00, 0.000e+00, 
0.000e+00, 0.000e+00,\n", 156 | " 0.000e+00, 0.000e+00, 1.000e+00, 0.000e+00, 0.000e+00, 0.000e+00,\n", 157 | " 0.000e+00, 0.000e+00, 0.000e+00, 4.000e+00, 0.000e+00, 0.000e+00,\n", 158 | " 0.000e+00, 0.000e+00, 0.000e+00, 1.000e+00]),\n", 159 | " array([1. , 1.05, 1.1 , 1.15, 1.2 , 1.25, 1.3 , 1.35, 1.4 , 1.45, 1.5 ,\n", 160 | " 1.55, 1.6 , 1.65, 1.7 , 1.75, 1.8 , 1.85, 1.9 , 1.95, 2. , 2.05,\n", 161 | " 2.1 , 2.15, 2.2 , 2.25, 2.3 , 2.35, 2.4 , 2.45, 2.5 , 2.55, 2.6 ,\n", 162 | " 2.65, 2.7 , 2.75, 2.8 , 2.85, 2.9 , 2.95, 3. , 3.05, 3.1 , 3.15,\n", 163 | " 3.2 , 3.25, 3.3 , 3.35, 3.4 , 3.45, 3.5 , 3.55, 3.6 , 3.65, 3.7 ,\n", 164 | " 3.75, 3.8 , 3.85, 3.9 , 3.95, 4. , 4.05, 4.1 , 4.15, 4.2 , 4.25,\n", 165 | " 4.3 , 4.35, 4.4 , 4.45, 4.5 , 4.55, 4.6 , 4.65, 4.7 , 4.75, 4.8 ,\n", 166 | " 4.85, 4.9 , 4.95, 5. , 5.05, 5.1 , 5.15, 5.2 , 5.25, 5.3 , 5.35,\n", 167 | " 5.4 , 5.45, 5.5 , 5.55, 5.6 , 5.65, 5.7 , 5.75, 5.8 , 5.85, 5.9 ,\n", 168 | " 5.95, 6. ]),\n", 169 | " )" 170 | ] 171 | }, 172 | "execution_count": 29, 173 | "metadata": {}, 174 | "output_type": "execute_result" 175 | }, 176 | { 177 | "data": { 178 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlwAAAFlCAYAAAA+gTZIAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAUxklEQVR4nO3df6zd9X3f8de7OKMsCSoRN8i1yUwrrxoglTSWx4RUZU1XvKYa9A8kR1qCpkyuIlIlWqXJ5J+0f1hi0ppumRYkGrIYLQmymkSgknRlLFUWiYZcMhpiCIoVWHDxsNsoCtkfbDjv/XG/rKfm4nv943PP8fXjIR2dcz7n+z3nc/QV4unv93u+t7o7AACM81PzngAAwGYnuAAABhNcAACDCS4AgMEEFwDAYIILAGCwLfOewFquvPLK3rFjx7ynAQCwpscff/yvunvp1PGFD64dO3ZkeXl53tMAAFhTVf3P1cYdUgQAGExwAQAMJrgAAAYTXAAAgwkuAIDBBBcAwGCCCwBgsDWDq6p+uqoeq6q/qKrDVfV70/hbqurhqvrudH/FzDp3VtWRqnqmqm6eGX9HVT05vfbxqqoxXwsAYHGsZw/Xy0l+pbt/MckNSfZU1Y1J9id5pLt3Jnlkep6qujbJ3iTXJdmT5BNVdcn0Xncn2Zdk53Tbc/6+CgDAYlozuHrFj6enb5huneSWJAen8YNJbp0e35Lk/u5+ubufTXIkye6q2prk8u5+tLs7yX0z6wAAbFrrOoerqi6pqieSHE/ycHd/PclV3X0sSab7t06Lb0vy/MzqR6exbdPjU8dX+7x9VbVcVcsnTpw4g68DALB41hVc3X2yu29Isj0re6uuP83iq52X1acZX+3z7unuXd29a2npNX//EQDggnJGv1Ls7h8m+bOsnHv14nSYMNP98Wmxo0munllte5IXpvHtq4wDAGxqW9ZaoKqWkvzf7v5hVV2W5FeT/JskDya5Pcld0/0D0yoPJvlsVX0syc9m5eT4x7r7ZFW9NJ1w//Uk70vyH873FwI21o79D71m7Lm73j2HmQAsrjWDK8nWJAenXxr+VJJD3f3HVfVokkNV9f4k309yW5J09+GqOpTkqSSvJLmju09O7/WBJJ9OclmSL083AIBNbc3g6u5vJXn7KuN/neRdr7POgSQHVhlfTnK6878AADYdV5oHABhMcAEADCa4AAAGE1wAAIMJLgCAwQQXAMBgggsAYDDBBQAwmOACABhMcAEADCa4AAAGE1wAAIMJLgCAwQQXAMBgggsAYDDBBQAwmOACABhMcAEADCa4AAAGE1wAAIMJLgCAwQQXAMBgggsAYDDBBQAwmOACABhMcAEADCa4AAAGE1wAAIMJLgCAwQQXAMBgggsAYDDBBQAwmOACABhMcAEADCa4AAAGE1wAAIMJLgCAwQQXAMBgggsAYDDBBQAwmOACABhMcAEADCa4AAAGWzO4qurqqvpKVT1dVYer6kPT+O9W1V9W1RPT7ddn1rmzqo5U1TNVdfPM+Duq6snptY9XVY35WgAAi2PLOpZ5JcnvdPc3q+rNSR6vqoen1/6gu//t7MJVdW2SvUmuS/KzSf5rVf397j6Z5O4k+5L8eZIvJdmT5Mvn56sAACymNfdwdfex7v7m9PilJE8n2XaaVW5Jcn93v9zdzyY5kmR3VW1Ncnl3P9rdneS+JLee6xcAAFh0Z3QOV1XtSPL2JF+fhj5YVd+qqk9V1RXT2LYkz8+sdnQa2zY9PnUcAGBTW3dwVdWbknw+yYe7+0dZOTz480luSHIsye+/uugqq/dpxlf7rH1VtVxVyydOnFjvFAEAFtK6gquq3pCV2PpMd38hSbr7xe4+2d0/SfKHSXZPix9NcvXM6tuTvDCNb19l/DW6+57u3tXdu5aWls7k+wAALJz1/Eqxktyb5Onu/tjM+NaZxX4zybenxw8m2VtVl1bVNUl2Jnmsu48leamqbpze831JHjhP3wMAYGGt51eKNyV5b5Inq+qJaewjSd5TVTdk5bDgc0l+K0m6+3BVHUryVFZ+4XjH9AvFJPlAkk8nuSwrv070C0UAYNNbM7i6+2tZ/fyrL51mnQNJDqwyvpzk+jOZIADAhc6V5gEABhNcAACDCS4AgMEEFwDAYIILAGAwwQUAMJjgAgAYTHABAAwmuAAABhNcAACDCS4AgMEEFwDAYIILAGAwwQUAMJjgAgAYTHABAAwmuAAABhNcAACDbZn3BIALx479D817CgAXJHu4AAAGE1wAAIMJLgCAwQQXAMBgggsAYDDBBQAwmOACABhMcAEADCa4AAAGE1wAAIMJLgCAwQQXAMBgggsAYDDBBQAwmOACABhMcAEADCa4AAAGE1wAAIMJLgCAwQQXAMBgggsAYLAt854AsJh27H9o3lMA2DTs4QIA
GExwAQAMtmZwVdXVVfWVqnq6qg5X1Yem8bdU1cNV9d3p/oqZde6sqiNV9UxV3Twz/o6qenJ67eNVVWO+FgDA4ljPHq5XkvxOd/+DJDcmuaOqrk2yP8kj3b0zySPT80yv7U1yXZI9ST5RVZdM73V3kn1Jdk63PefxuwAALKQ1g6u7j3X3N6fHLyV5Osm2JLckOTgtdjDJrdPjW5Lc390vd/ezSY4k2V1VW5Nc3t2PdncnuW9mHQCATeuMzuGqqh1J3p7k60mu6u5jyUqUJXnrtNi2JM/PrHZ0Gts2PT51HABgU1t3cFXVm5J8PsmHu/tHp1t0lbE+zfhqn7WvqparavnEiRPrnSIAwEJaV3BV1RuyEluf6e4vTMMvTocJM90fn8aPJrl6ZvXtSV6YxrevMv4a3X1Pd+/q7l1LS0vr/S4AAAtpPb9SrCT3Jnm6uz8289KDSW6fHt+e5IGZ8b1VdWlVXZOVk+Mfmw47vlRVN07v+b6ZdQAANq31XGn+piTvTfJkVT0xjX0kyV1JDlXV+5N8P8ltSdLdh6vqUJKnsvILxzu6++S03geSfDrJZUm+PN0AADa1NYOru7+W1c+/SpJ3vc46B5IcWGV8Ocn1ZzJBAIALnSvNAwAMJrgAAAYTXAAAgwkuAIDBBBcAwGCCCwBgMMEFADCY4AIAGExwAQAMJrgAAAYTXAAAgwkuAIDBBBcAwGCCCwBgMMEFADCY4AIAGExwAQAMJrgAAAYTXAAAgwkuAIDBBBcAwGCCCwBgMMEFADCY4AIAGExwAQAMJrgAAAYTXAAAgwkuAIDBBBcAwGCCCwBgMMEFADCY4AIAGExwAQAMJrgAAAYTXAAAgwkuAIDBBBcAwGCCCwBgMMEFADCY4AIAGExwAQAMJrgAAAYTXAAAgwkuAIDB1gyuqvpUVR2vqm/PjP1uVf1lVT0x3X595rU7q+pIVT1TVTfPjL+jqp6cXvt4VdX5/zoAAItnPXu4Pp1kzyrjf9DdN0y3LyVJVV2bZG+S66Z1PlFVl0zL351kX5Kd02219wQA2HTWDK7u/mqSH6zz/W5Jcn93v9zdzyY5kmR3VW1Ncnl3P9rdneS+JLee5ZwBAC4o53IO1wer6lvTIccrprFtSZ6fWeboNLZtenzq+Kqqal9VLVfV8okTJ85higAA83e2wXV3kp9PckOSY0l+fxpf7bysPs34qrr7nu7e1d27lpaWznKKAACL4ayCq7tf7O6T3f2TJH+YZPf00tEkV88suj3JC9P49lXGAQA2vbMKrumcrFf9ZpJXf8H4YJK9VXVpVV2TlZPjH+vuY0leqqobp18nvi/JA+cwbwCAC8aWtRaoqs8leWeSK6vqaJKPJnlnVd2QlcOCzyX5rSTp7sNVdSjJU0leSXJHd5+c3uoDWfnF42VJvjzdAAA2vTWDq7vfs8rwvadZ/kCSA6uMLye5/oxmBwCwCbjSPADAYIILAGAwwQUAMJjgAgAYTHABAAwmuAAABhNcAACDCS4AgMHWvPApsPnt2P/QvKcAsKnZwwUAMJjgAgAYTHABAAwmuAAABhNcAACDCS4AgMEEFwDAYIILAGAwwQUAMJjgAgAYTHABAAwmuAAABhNcAACDCS4AgMEEFwDAYIILAGAwwQUAMJjgAgAYTHABAAwmuAAABhNcAACDCS4AgMEEFwDAYIILAGAwwQUAMJjgAgAYTHABAAwmuAAABhNcAACDCS4AgMEEFwDAYIILAGAwwQUAMJjgAgAYbM3gqqpPVdXxqvr2zNhbqurhqvrudH/FzGt3VtWRqnqmqm6eGX9HVT05vfbxqqrz/3UAABbPevZwfTrJnlPG9id5pLt3Jnlkep6qujbJ3iTXTet8oqoumda5O8m+JDun26nvCQCwKa0ZXN391SQ/OGX4liQHp8cHk9w6M35/d7/c3c8mOZJkd1VtTXJ5dz/a3Z3kvpl1AAA2tbM9h+uq7j6WJNP9W6fxbUmen1nu6DS2bXp86viqqmpfVS1X1fKJEyfOcooAAIvhfJ80v9p5WX2a8VV19z3dvau7dy0tLZ23yQEAzMPZBteL02HCTPfHp/GjSa6eWW57khem8e2rjAMAbHpnG1wPJrl9enx7kgdmxvdW1aVVdU1WTo5/bDrs+FJV3Tj9OvF9M+sAAGxqW9ZaoKo+l+SdSa6sqqNJPprkriSHqur9Sb6f5LYk6e7DVXUoyVNJXklyR3efnN7qA1n5xeNlSb483QAANr01g6u73/M6L73rdZY/kOTAKuPLSa4/o9kBAGwCrjQPADCY4AIAGExwAQAMJrgAAAYTXAAAgwkuAIDBBBcAwGCCCwBgMMEFADCY4AIAGExwAQAMJrgAAAYTXAAAgwkuAIDBBBcAwGCCCwBgMMEFADCY4AIAGExwAQAMJrgAAAYTXAAAgwkuAIDBBBcAwGCCCwBgMMEFADCY4AIAGExwAQAMJrgAAAYTXAAAgwkuAIDBBBcAwGCCCwBgMMEFADCY4AIAGExwAQAMJrgAAAYTXAAAg22Z9wSAjbVj/0PzngLARcceLgCAwQQXAMBgggsAYDDBBQAwmOACABjsnIKrqp6rqier6omqWp7G3lJVD1fVd6f7K2aWv7OqjlTVM1V187lOHgDgQnA+9nD94+6+obt3Tc/3J3mku3cmeWR6nqq6NsneJNcl2ZPkE1V1yXn4fACAhTbikOItSQ5Ojw8muXVm/P7ufrm7n01yJMnuAZ8PALBQzjW4OsmfVtXjVbVvGruqu48lyXT/1ml8W5LnZ9Y9Oo0BAGxq53ql+Zu6+4WqemuSh6vqO6dZtlYZ61UXXIm3fUnytre97RynCAAwX+e0h6u7X5jujyf5YlYOEb5YVVuTZLo/Pi1+NMnVM6tvT/LC67zvPd29q7t3LS0tncsUAQDm7qyDq6reWFVvfvVxkl9L8u0kDya5fVrs9iQPTI8fTLK3qi6tqmuS7Ezy2Nl+PgDAheJcDileleSLVfXq+3y2u/+kqr6R5FBVvT/J95PcliTdfbiqDiV5KskrSe7o7pPnNHsAgAvAWQdXd38vyS+uMv7XSd71OuscSHLgbD8TAOBC5ErzAACDCS4AgMEEFwDAYIILAGAwwQUAMJjgAgAYTHABAAwmuAAABhNcAACDCS4AgMEEFwDAYOfyx6uBBbdj/0PzngIAsYcLAGA4wQUAMJjgAgAYTHABAAwmuAAABhNcAACDCS4AgMEEFwDAYIILAGAwwQUAMJjgAgAYTHABAAwmuAAABhNcAACDCS4AgMEEFwDAYIILAGAwwQUAMJjgAgAYTHABAAwmuAAABhNcAACDCS4AgMG2zHsCwOazY/9Drxl77q53z2EmAIvBHi4AgMEEFwDAYIILAGAwwQUAMJiT5mGTWO1EdQAWgz1cAACDCS4AgME2/JBiVe1J8u+TXJLkk91910bPAdh46702l2t4AZvRhgZXVV2S5D8m+SdJjib5RlU92N1PbeQ8gMWw3vPORBhwodvoPVy7kxzp7u8lSVXdn+SWJIILOCOvF2tCDFhEGx1c25I8P/P8aJJ/uMFzeA3/euZ
8We8eG4fSxjmXQ5erWfRtsJnC038Dm49t+jequzfuw6puS3Jzd//L6fl7k+zu7t8+Zbl9SfZNT38hyTODp3Zlkr8a/BmcOdtl8dgmi8c2WUy2y+LZqG3y97p76dTBjd7DdTTJ1TPPtyd54dSFuvueJPds1KSqarm7d23U57E+tsvisU0Wj22ymGyXxTPvbbLRl4X4RpKdVXVNVf2dJHuTPLjBcwAA2FAbuoeru1+pqg8m+S9ZuSzEp7r78EbOAQBgo234dbi6+0tJvrTRn7uGDTt8yRmxXRaPbbJ4bJPFZLssnrlukw09aR4A4GLkT/sAAAx2UQdXVX2qqo5X1bfnPRdWVNXVVfWVqnq6qg5X1YfmPSeSqvrpqnqsqv5i2i6/N+85saKqLqmq/1FVfzzvubCiqp6rqier6omqWp73fEiq6meq6o+q6jvT/1/+0YbP4WI+pFhVv5zkx0nu6+7r5z0fkqrammRrd3+zqt6c5PEkt/rzT/NVVZXkjd3946p6Q5KvJflQd//5nKd20auqf5VkV5LLu/s35j0fVoIrya7udh2uBVFVB5P89+7+5HSVhL/b3T/cyDlc1Hu4uvurSX4w73nwN7r7WHd/c3r8UpKns/IXCpijXvHj6ekbptvF+6+1BVFV25O8O8kn5z0XWFRVdXmSX05yb5J09//Z6NhKLvLgYrFV1Y4kb0/y9TlPhfz/Q1dPJDme5OHutl3m798l+ddJfjLnefC3dZI/rarHp7+cwnz9XJITSf7TdPj9k1X1xo2ehOBiIVXVm5J8PsmHu/tH854PSXef7O4bsvIXInZXlcPwc1RVv5HkeHc/Pu+58Bo3dfcvJfmnSe6YTl9hfrYk+aUkd3f325P87yT7N3oSgouFM50j9Pkkn+nuL8x7Pvxt0674P0uyZ74zuejdlOSfTecL3Z/kV6rqP893SiRJd78w3R9P8sUku+c7o4ve0SRHZ/bK/1FWAmxDCS4WynRy9r1Jnu7uj817PqyoqqWq+pnp8WVJfjXJd+Y6qYtcd9/Z3du7e0dW/kzaf+vufz7naV30quqN0w9+Mh22+rUkfgk/R939v5I8X1W/MA29K8mG/xBrw680v0iq6nNJ3pnkyqo6muSj3X3vfGd10bspyXuTPDmdL5QkH5n+QgHzszXJwaq6JCv/UDvU3S5DAK91VZIvrvzbMVuSfLa7/2S+UyLJbyf5zPQLxe8l+RcbPYGL+rIQAAAbwSFFAIDBBBcAwGCCCwBgMMEFADCY4AIAGExwAQAMJrgAAAYTXAAAg/0/aEjMwJF2DSIAAAAASUVORK5CYII=\n", 179 | "text/plain": [ 180 | "
" 181 | ] 182 | }, 183 | "metadata": { 184 | "needs_background": "light" 185 | }, 186 | "output_type": "display_data" 187 | } 188 | ], 189 | "source": [ 190 | "#Here I have plotted the histogram which shows that maximum value in the list pi_val\n", 191 | "#lie at the expected value of pie i.e 3.14.\n", 192 | "plt.figure(figsize=(10,6))\n", 193 | "plt.hist(pi_val, bins = 100)" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 31, 199 | "metadata": {}, 200 | "outputs": [ 201 | { 202 | "data": { 203 | "text/plain": [ 204 | "[]" 205 | ] 206 | }, 207 | "execution_count": 31, 208 | "metadata": {}, 209 | "output_type": "execute_result" 210 | }, 211 | { 212 | "data": { 213 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAkkAAAFlCAYAAAD/BnzkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAso0lEQVR4nO3deXzcVb3/8feZmexJm7RNurfp3gKFLqFI2bqwFFDxXtGLCIKI3Ov1KoiKVfG6oRdFvehPr4qgKKiAiIigFYSySktTWtrSfd/bpFu2Zp3z+2OWTpJJM2lzMj3h9Xw88sjMN9+ZfJrTZN5ztq+x1goAAACtBdJdAAAAwKmIkAQAAJAEIQkAACAJQhIAAEAShCQAAIAkCEkAAABJhFw86YABA2xpaamLpwYAAOhWS5curbTWFrc97iQklZaWqry83MVTAwAAdCtjzLZkxxluAwAASIKQBAAAkAQhCQAAIAlCEgAAQBKEJAAAgCQISQAAAEkQkgAAAJIgJAEAACRBSAIAAEgipZBkjCk0xjxujFlrjFljjDnXdWEAAADplOplSX4oaYG19mpjTKakXIc1AQAApF2nPUnGmD6SLpT0gCRZaxuttYcd19Wp5pawtlTWprsMAADQS6Uy3DZaUoWkXxljlhlj7jfG5LU9yRhzizGm3BhTXlFR0e2FtnXP39dp9vde1I6Ddc6/FwAAeOdJJSSFJE2T9FNr7VRJtZLmtz3JWnuftbbMWltWXFzczWW2t2jzAUlSZU2D8+8FAADeeVIJSTsl7bTWLo7ef1yR0HRKMMakuwQAANALdRqSrLV7Je0wxkyIHporabXTqgAAANIs1dVtn5L02+jKts2SPuquJAAAgPRLKSRZa5dLKnNbStfYdBcAAAB6Ne933GZGEgAAcMH7kAQAAOCCtyHJMt4GAAAc8jYkxbADAAAAcMH7kAQAAOCCtyHJsr4NAAA45G1IijGsbwMAAA54H5IAAABc8DYksboNAAC45G1IimF1GwAAcMH7kAQAAOCCtyGJ4TYAAOCStyEJAADAJW9DUkVNQ7pLAAAAvZi3ISkzGCk9OyOY5koAAEBv5G1ICgYiy9oygixvAwAA3c/bkMRlSQAAgEv+hiQyEgAAcMjbkBRDWAIAAC54G5IIRwAAwCVvQxIAAIBL3ockOpQAAIAL3oYky3gbAABwyNuQBAAA4JL3IYkeJQAA4IK3IYloBAAAXPI3JJGSAACAQ96GpBiyEgAAcMHbkMS12wAAgEv+hiQyEgAAcMjbkBRDWAIAAC54G5LIRgAAwCV/QxIpCQAAOORtSDqGtAQAALqfxyGJcAQAANzxNiQx3AYAAFzyNiQBAAC45H1IokcJAAC44G1IIhsBAACX/A1JdCEBAACHvA1JMUQlAADggrchiXAEAABc8jckkZIAAIBD3oakGMISAABwwduQxMRtAADgkr8hKd0FAACAXs3bkBRjiUsAAMABf0MS2QgAADjkbUgiIwEAAJe8DUkxzN8GAAAuhFI5yRizVVK1pBZJzdbaMpdFAQAApFtKISlqtrW20lklXcQWAAAAwCWG2wAAAJJINSRZSc8aY5YaY25xWVCqahtb0l0CAADoxVIdbjvPWrvbGFMi6TljzFpr7cuJJ0TD0y2SNGLEiG4uEwAAoGel1JNkrd0d/bxf0p8kzUhyzn3W2jJrbVlxcXH3Vnm82tgMAAAAONBpSDLG5BljCmK3JV0qaZXrwgAAANIpleG2gZL+ZIyJnf87a+0Cp1UBAACkWachyVq7WdJZPVDLCWF1GwAAcMH7LQAAAABcICQBAAAkQUgCAABIgpAEAACQBCEJAAAgCe9DEqvbAACAC96HJAAAABcISQAAAEl4H5K4dhsAAHDB+5AEAADgAiEJAAAgCe9DEqvbAACAC96HJAAAABcISQAAAEl4H5IYbQMAAC54H5IAAABcICQBAAAk4X1IsixvAwAADngfkgAAAFwgJAEAACThfUhisA0AALjgfUgCAABwgZAEAACQhPchicVtAADABS9DUn1TS7pLAAAAvZyXIamxJZzuEgAAQC/nZUhqPcTGeBsAAOh+XoYkchEAAHDNy5BkSUkAAMAxL0NSIla3AQAAF7wMSQQjAADgmp8hKd0FAACAXs/PkJTQlURgAgAALvgZktJdAAAA6PX8DEmkJAAA4JifISmhL4nABAAAXPAyJDHeBgAAXPMzJAEAADjmZUhK7EiyjLcBAAAH/AxJ5CIAAOCYnyGJSUkAAMAxP0NSQkaqaWhOXyEAAKDX8jMkJdz+ypOr0lYHAADovfwMSQldSQdqG9NYCQAA6K08DUnHbhuTvjoAAEDv5WVISmRESgIAAN3P/5BERgIAAA54GZJaDbelrwwAANCL+RmSEta3GbqSAACAA36GJCZuAwAAx/wMSQm3yUgAAMCFlEOSMSZojFlmjHnaZUGpSNwnieE2AADgQld6km6VtMZVIV3BldsAAIBrKYUkY8wwSVdKut9tOV1HRxIAAHAh1Z6keyXdISnc0QnGmFuMMeXGmPKKioruqK1DiRO3bzi31On3AgAA70ydhiRjzLsl7bfWLj3eedba+6y1ZdbasuLi4m4rsIPvFr9VXJDl+HsBAIB3olR6ks6T9F5jzFZJj0iaY4x52GlVnUjsSWJ+EgAAcKHTkGSt/aK1dpi1tlTSNZJesNZe57yy49XU6g4xCQAAdD8/90miJwkAADgW6srJ1toXJb3opJIuSLwsCR1JAADABf97kkhJAADAAS9DEgAAgGtehiTmJAEAANf8DEnMSQIAAI75GZLoSQIAAI55GZISMXEbAAC44GVIIhcBAADX/A
xJDLIBAADH/AxJNvltAACA7uJlSEpErxIAAHDBy5CUGIvoSQIAAC74GZISkhEZCQAAuOBnSEq8TUoCAAAO+BmSWm0mSUoCAADdz8uQxCAbAABwzcuQxBYAAADANT9DUroLAAAAvZ6XISkR124DAAAueBmSEnPR4bqmdl//ypOrdMUPX+nBigAAQG/jaUg6lpLuf3VLu68/tGibVu+p6smSAABAL+NnSEp3AQAAoNfzMySRkgAAgGN+hiT6kgAAgGNehqTKmsZ0lwAAAHo5L0NSdsjLsgEAgEe8TBsMtgEAANf8DEmkJAAA4JiXIQkAAMA1T0NSal1JX3lylW5/dLnbUgAAQK/kaUiKGNI3W5OH9pUkHW1s0cK1+1t9/aFF2/TEsl3pKA0AAHjOy5AUm5O0+0i9Vu46Ikm688lV+uiDS7R6N5cjAQAAJ8/PkNTm/tHGFm3cXy1JOlzXfg+lfVX1PVAVAADoTbwMSW0ZIzWHI9Ep9jnRnU+u6umSAACA57wMSW23ALBWaomGo5qG5nbnb66o6YmyAABAL+JlSGrLyiocTU7JAtGmitqeLgkAAHjOy5DU9gK3YXtsmO17z65PR0kAAKCX8TMktRluC1urzZ30Fq3cecRhRQAAoLfxMiS1lcplSvaywg0AAHSBlyGpbSayKaSkVM4BAACI8TIkxVw8aaCkYyvbjicYMK7LAQAAvYiXISnWK/SPNftafW7r3y8aHb9dmJvpvjAAANBreBmS2qppaGl37IEbynTHZRPj98u3Huzw8fuq6lXf1Po5dhysU1V9U/cVCQAAvNIrQlLb+UZTRxRq7qSBrYbY/udvazt87Dnffl4Tv7Kg1fELvrtQ077xXPcXCwAAvOBlSEq2BUCioEl9/lFDczh+u7klcvtATUPkfgpznQAAQO/kZ0hKsplkokAXJmnXJlzG5M3thyVJz65OPscJAAC8c3gZktpq25MUSghJD9xQlvQxv39ju0rnP6Ov/2V1/Ng1970uSfriEysdVAkAAHwSSncBJ6LtcFtWKNjqfuJcpLmTBqq4IEsV1Q2tzokFoafe2h0/FrZS6fxnurlaAEAqmlrCevbtfXp5fYUO1jWqorpBP79+ugb2yU53aThJsbnDb24/pKxQUKOL87R6d5V2HT6qs0v7aUhhjiSprrFZv3ptq2obmvWes4Zo1IA8ZYUCMl2YRtOdvAxJMfMvn6i7/7ZWZw7r2+p42z2R2gakrmhsDisz1Cs63ADgpG2prNVbOw6rKC9Tk4f2VZ/skIwxCgaMahqa1dwS1qpdVXp1Y6WmDO+rw3VNmh99U1pSkKX9bf4eTx7aVyt3HdGA/CxV1rT/W33Ot5+XJBXmZmjioAIV5mTqM5eM1/iB+Wl74TwVHahp0NYDdTp9SB9lZwQ7f4AD+6vr9fqmA/rBc+s1dXihmsJWDU1hVdY0aN3eah1tar8SPaa0f662Hqhrdez/XtykzGBA6791uevSO+RlSIr1JBXmZEiKbCYZDJj4ppKdTdxuagkf9+uJVu+p0pThhSdUJwD0BgdqGvSe//eqdh85ucs77a9uUHZGQFmhoI4cbVJ2RkCNzWENjfYiSNKNM0uVnxXSZacPUl1jsz73+FuqrG7U4bomLdoc2cplwdt7NW1EoaaPLJIxRkMLczSkMEcXjBugzGBA1fXNOny0UVVHm3X6kD4KBIzqm1rU2BLWK+srNe+MQQoGjI7UNalPTihtYctaq8aWsDbsq9FrGytVVd+kvUcatGzHIZ0zqp8amsIa3i9X4wcWqDA3Q+MHFqihuUXr9lZrc0Wt1u2r1lNv7dao/nlav786/tp48aQSXTi+WEMLc1TfFNbeqnodrmvU8h2HVZyfpcraRuVkBPTiugqNLcnX7sNHNWFQga6ePlw5GUEV5WZo2sgiZYUCqm5o1paKWr2x5aDys0Mq33pIBdmR6LD78FHlZ4WUlRHQ4s0HteNQnZpaIkWErVVGMKCgMaprbNGgvtmaOrxQgwuz1dRi1S8vU43NYeVmBvXUW7tVkB1SIGB0zqj+GluSr62VtRreL0ehQHo7KfwMSdHPsQnaYWtbzUtqO3H7nFH9tHjLsX2SquubdTzGSA/ddI6ue2Cx6hqPfy4AnKzG5rC+/dc1enTJDh1tatHSOy/W82v3qyg3U7MnFMd7afIyQykvTKlpaNaSLQdVVd+kPy/frZ2H6jSwT7Ze2VApSTq7tEhThhfqS1dMkjFGjc1hPVq+Q5v212jVriMq33ZIUvt3+MGA0U3nlcpa6f5Xt0iSRhfnqSVslRkM6JoZI7Tn8FFNG1mkVzZUqig3Q5+5ZLwygif2YvfKHXPit621Wr7jsP66co8eeHVLfLHNyTp/7ABdPKlEg/pmq6q+WQVZIc07Y1DS8GStVdhKASMZEwlaR5ta1GKt+udFNi3OCgVUvu2Q7n9lsyTpvLEDNKY4X8+s3KPyrQcVttLY4nxtrKjRxv01SWvaefCoGlN8Q1/f3KJPXDRGVtKzb+/VP9bs1z/W7E/psW/vrpIkLdp8MB5CY4b0zU4ajLNCATU0hzWoT7aaWsI6VNeo4f1ydc3ZI3TVlCE6a3hhl9r75gtGd35SmnQakowx2ZJelpQVPf9xa+1XXReWiliPUXOLbTVPqbggq9V5Z5f2i//CS9Jj5TvaPde/ThuqJ97cJUna+K0rtGDVXknStb9YrK13X9ndpaOXstYe911p+daDuvpnr+sL8ybqE7PG9GBl6E6rd1dp24FaVdc3q09OhlrCVmNL8jWmOE+hhBeHmoZmfeLhpfFgIkkzRvXT0MIcba6s1fYDtZoyvFAL11W0ev7pd/2jw+9dkB1q90bvpvNGaf7lE/Xn5bv0+cdXdPjY9fuOvSAv2XpIS7Ye0i9eiQSdnIygjja1qCArpOroqt/8rJC2HqjTBeMG6P3ThumqKUNa/f++892nHe/HpCsmDz7u17vKGKOpI4o0dUSRbr14vGobmrWvql7r99Vo75Gj2nnoqCqqG3T4aJMmD+2rvKygXlhboTV7qvS+KUNU19iiVzZUqsVaZYcCqqpvVp/skF7dWKlXN1a2+34Xji/WviP1Omt4X00a3Ecrdx2Jv05IUmYo0hPWmb+/3XrFdE5GUNX1Teqfl6UrJw/W3EklOljbqGFFuZozsUSZoYCstXpxfYUWbTqgrIyglm0/pEN1jSrKzdTFkwZqYJ8sjS0p0NiS/FbP/flLJ+jZ1Xu1bMdhvWt0f2UGA+qTnaER/XLVNzdDa/dWac+Res0aX9yqLTdX1GhLZa3qm8J6dWMk3C7fcVjnjxugycMKVdfQrLmTBqpfXqb65WXG/9bFQmNvvfRXKj1JDZLmWGtrjDEZkl41xvzNWrvIcW0dik0Ai/XCxf6TXnP2cI0tyde154xodX5GMKCWsFVL2CpgpLujG0v+6ENTddfTqzW4b7ZeS/gFCQaMmsPH/uPP/+MK3f3+M13+k+CZxDBkrVVDc1hfe+ptPbIkEsD75WVqwa0X6K2dR/TQom16eX3rF8HvLFirI0eb9JlLxikzmL5JiaeqzsKmC
/VNLcoKBfTQom16ZUOlJg4q0OubDmhjRY2aW6yunDxY6/ZVa/mOw50+V2Yw0GEvwBtbWr9bX7iuQiP752r6iCJdeeZg3ffyZjW2hBW20oXjBmjj/hr9LfqmTUreE/7L17bol69tid/vn5epYMBof3WD7rn6TE0ZXqj87JD65mTIyKi2sVnbDtTps48tj/cSlZUW6eMXjNaF44tT+XGlXX5WSPlZIQ3sk60zhxV2eN7nE6680JGq+ia9sGa/lmw9qPqmsDKCRm9sOai1e6q0v7pB6/ZVS5IygkahgNF7zhqiLZW1GlqUo+XbD2vOxBIt3XZIQwpzVJiboVW7juj904bpnNH91D8/S6t3VykzFFBWKKBpI4pSmudqjNHsCSWaPaEk5Z+JFBlJmXfGYM07I3lAnTiojyYO6tPu+OjifI0ujgSuK8/sPNzGfj+NMQr24j9fnYYkG0kksbcfGdGPU2KXxUC0kSqik/1GF+cl7bbLCEXOa2oJa8XOI/HjWaGAFn5uloIBo/PufkFSZGKhJF1+xmDdquWSpEeW7NAjS3bora9eqr7ReVDvZPuq6vXXlXt048zSHnkh236gTiP656q5JdzqXfrJqmloVn5Wx78Cy3cc1iNvbNfQwhwdqG3UJ2ePVd+cDF38g5e0/WBdh4+TpIO1jZoRnXCaaEB+lm44d6S+/9x6/eylTfrZS5t055WT9LHzR72jg1JDc4ueXLZLX/jjynYBY+7EEgUCRv9+4WiNGpAnY4xeXLdff16+W1dMHqSSgmydPqSPQsGA+mSHVNvYoh/+Y4M+OXuMCrIzkr4gbdxfo0v/9yWFrTSuJF8bkgx5PNdmv7RHE3qgpwwvVN+cDDU0t6ggO0Mb90fehcfE6r/s9IEaP7BAt18yXsYYLd12SP3yMlVd36QjR5tUUpCt4f1ylJMRjLf/3OiFu5MJh60qaxtUkJWhQ3WNGtw3W9sO1Ok/Hl6qUQPydMXkwbr8jEGd/p7kZAY1ID9LL35+dvzn33aV8DtJn+wMvW/qUL1v6tB2X1u164i+s2Ctzi7tpxvOLVXf3K6/BiTOuYJfUpqTZIwJSloqaaykn1hrFyc55xZJt0jSiBEj2n65W8XnJEX/qMSW83c01ygz+gejqSWs9dF3BFIkJOVFXyRvu3icvvLnt/XMpy+IPCbJH9azvv7sO37oLRy28dUmX//Lar3x5bkqKej+5bk7DtZpSGGO/vLWbt326PL48dzMoKaPLNJ915cpJ/PE/6j/z9/W6OcvReYLrPzapSrIztDKnUf0z02VHV7C5sF/bj3uc543tr9+/dEZqmtq0ZlfezZ+/NLTBupLV0zSwbpGTRtRFDl2+iB9+P5Fqqxp1F3PrNFdz6zRvf82RSP758YXCvT20GSt1U0PLmk31NS2B+b5tZG5FW1DiyS91KaHLlFiz0pMsh6eDftrWg1hfez8UZo1oViPvLFDZaVFumrKUOVmBrVg1V7NHNNfJZ0sR99+oE6VtQ3xtk40fWT7Y10RCJj471tOZuSFt3RAnhbcduFJPe87OSB15oyhffXQx85JdxlIE9P2umfHPdmYQkl/kvQpa+2qjs4rKyuz5eXlJ19dBx5bskN3/HGFfvrhafrEb9+MH//k7DFJu1YffG2LvvaX1frA9GH6w9Kd8eMLPzdLowbkxe/XNTYrN/NYbvznxkpde3/rPPhODEmJQx+/em1Lqw04pfY/k12Hj+q8u1/Qz6+frstOH9Tu+fYeqde5dz+vX1xfpotPa/2uubE5rA/+/PVOhzSKcjO07L8vbXe8uSWs/37qbf1u8XZ98fKJGluSr1W7qnTjzFIVZIf0+Js7dUebORsD8jP1/Q9O0Q2/fCPp9xrRL7dVz1H/vEw9d/tF2nXoqAIBacLAghPu4frBs+v0oxc2Jv3al66YqD8t260N+6o1sn+uggGj9ftqNHpAnn563XRNGFQQP7e+qUVVR5tU3xTWJ367VG/vrlJBdkg/v2668rNDeu+PX5Mkfef9k3X19OHxSac9KTbH4i/Ld2tjRU2rXt0hfbM1oCBL33rfZJ0xtE98hcymihqFrdWrGyq1eMtBvbn9kA7XNWne6YM0qjhPGQGjnYeO6olluzSyf662RYeOhvTNVnZGUJsTencS5WUG9dDN56ggK6SivEwNyM9SOBzZy7+3zq0A0DFjzFJrbbvdp7sUkqJP9FVJtdba73V0Tk+FpJ9fP13//tDS+PFPzx2n2y8Z3+783y7epi//qXWmW3DbBUnHZRM1tYR12b0va3PFsT+0XQlJP3tpk/ZV1eur7zk95cecaj71+2V6c9shvTZ/jtburdK8e1+RJL3w2Ys05/svSZLW3TUv/k50zZ4qXf7DV+KPv/LMwfqXKUN15vC+OljbGH98zCt3zNbwfrlavPmA/u2+5NPc7pg3QRmBgH79+lbtPHQ0fvyf8+fENyCTIi/C//rTf2pZiiternvXCD28aHu741dNGaJzRvVvN7etur5JBdndP9xa39Siz/7hLT2zYs8JPX5McZ42VSQPA51Z9pVLVJibIWOM9lXVKz8rpN2Hj6q+KaytB2r1qd8v09yJJZp/+UQN75d7wvuvPFa+o104zckI6kMzRujWueNOaAijK47UNenN7YfU0BzWzLH91cdBOwLwV0chKZXVbcWSmqy1h40xOZIulvQdBzWmLHbttkCbd8KhDt4B1iQZhussIEmRCd8fv2B0q8uU7DxUp2FFuSnVGZsg/uk541QUXRrqkx0H6/SX6I7kiTuRf/Oq0zW6OF+3XDha9728WRPuXCApMk+jbQ/QMyv2HPfF/7OPvaVLThuob/11TavjP/jgWfrVa1t18wWjdNWUyDyBj18YmW/2j9X7dPNvyjUzOo8s5qLxxfGA1LbXMCZgpC/Mm6iPnFuqnMzIi/SVP3pVkvTUf5133AmgLgKSJGVnBPWTa6fp3n8LK2iMWqzVZx5drqdX7NErd8zWyxsq4vuljC3J18K1+/WNp4/15rUNSN95/2RNH9lP8/+4QuXbDqlvToYe/OjZ2lJZq9sfe6vVuVO/+Vyn9T2/dn98yCvZkuDY8u/BfbMVMEZFuZlaseuwKqsbVVZapDe3HVJtY2QTuYCJLPf92PmjenQX5b65GZo9sWsTYAGg054kY8yZkn4tKajItd4es9Z+43iPcd2T9Mgb2zX/iZX65Y1luunBY9/n85dN0Cdnj213fqznKVGqPULJeqFSeWzby5u8/PnZGtE/tXDl2pG6Jm2urNHgvjm66J6FuvyMQbr3mqnaX12vjftrdLC2Uf/1u2VJHzu0MEevzY/sW7JhX7Uu+d+Xk573qxvP1kcfXJL0a7/7+DmaNKhPuxfoKyYP0rwzBuvdkwd3uhfMnO+/2KqHLyY3M6ild16inMzIEtv8rMhGcZsralTSJ/u4E7V9suNgnYoLsvTwom2aOqJQ00YUdWn4rKklrOnffE5VHczju2h8sf65qVLXzhihX7++7aTrnTioQDfOLNU1M9zOVwSAE3HCPUnW2hWSpjqpqpt1NJegobnjrdA7s6+q/Tb5
TS3h426UdaSuqd2xC+9ZqPI7L9aA/GN7OG2qqFFRbmTPiZ5irdVZ33i21bEnl+/Wk8t3d/AIae035+mx8h3qm5PRat+TcQMLNH5gfqu9V2ZNKNaDH50hKRImaxua9bk/vKVPzBrTrpcm1hMlSQ99bIYuGJf60uPnb79ImypqtXjLAV05ebBmfe9FFeZk6LnbL4q3TWLPT2xpa28xvF8kcJ/oJmwZwYBWfO2y+P0dB+vULy9ToaBpN4n361edISmys/2T0bk/gYCJL5GfM7FEa6M7AJ87pr/Ktx6M77B7oKZRsycWt5rrBwC+8PIvV6zvq7G5dS9YR8NtDW02+zpvbP+Uv9d/XDRaP3p+g+67frpuic5/WrT5wHFf0G9/bHnS42V3/SPeC3Wkrklzo3N63n3mYP342mmSIqvHbvr1Ek0YWKCfv7xZ8y+fqNkTSlpN0j0ROw7W6YLvLuzSpNRLTxuoL185SdkZQX3k3NKk5/z10xdo1e7IpVuO1DW1m1uSlxXST6+bnvSxX7piki49baCmjijq8mRZY4zGluTHN1JbnmQSN1IXC13HEwwYvX/6sFbHYsvVJw3uo0mDI0PYlyaZrA8APvIyJMW0fV3N6mBS6UXji3XXM8fmvPz25nel/D1yM0Pthteuf+ANXTl5sP7n/ZOVmxFstbJpa2VtfP7GS5+fpZH989oNvUlq1Zvz9Io9enrFM/rNTTO0eMsBvbiuQi9Gl0Xf/be18blNL3z2Io0akKdvPL1aNfXNeu+UIbr+gTf0w2um6KopQ+ObGi7afECzEjYgu+Pxt/RYeWR+Tuz6drHQ98ANZSrKy9Q19y3SJ2eN1dVlw7RhX3Wrxx9PKBiIL1k/kcm3ZaX9uvwYAAB6gpchKTaNqm3vQ3F+8mGrcQMLNGFggdbtq9a3/uWME/6+S758sc7+VuRyAc+s3KNnVu7RxEEFrfYomfW9F+O3R/aPbC+QOHeqsqZBizYfSPr8H+lgCXrMnO+/pJvOG6VfvbZVkuITk299ZLlufWR5q3P/ftuFKinIUk5mMB6QYn5z0wxdOL64Vfhbf9exqyyz8RkAAL6GpOiAW9t5qm1XuyW6auoQfXfBOl3ewVbtqWh7TThJWrv32OaUh2ob47dfuWN2/PaciQP1L1OH6k/Ldqks4ZpMN84s1XOr9+kzl4zX5/5wbNXR0MIc7Tp8tNW+LzHJNshL5rJ7W0+ovvS0gbrn6rPU0NLiZPNHAAB6Gy9DUkdiwz7JfOKiMfrozFEntUuzFLmQZNug8pOFG/Wfs8bEd6KeOKig3RyPiUnmFH31Pafpa++N7KF09fRhuux/X9a6fdW69PSB8b2VKqob9PrmA9p16Ki+syAy7DZhYIEW3HaB1u2rVm5GSBfeszD+nD++dmrSlWn/9+Fp0WFB9ocBACAVXoak2HDb6AHHViydNrjPcS8XYIw56YAkSf/9ntN0zuh+rTaxvOfv6/S7xdvjlzt4+Ob2W9i3vczJ4i/Nbbdk+4//OVM/WbhRn5pzbBuD4oIsvfesIWpobtHMMf115GiTJg4ukDEmvtdT2zlTq3dX6f9e3BS//6k5Y7v1mmcAALwTeBmSYnKzgsoKBdTQHI6vcuoJl50+SPdcfaa+9tTb8U3ydh0+thN04jL/mBtnlmpQn2xNG1mk5rBNupFeflZIX5iX/IrVWaGgzjpOT1miO+ZN1A0zS1Wcn9XpfkMAACA5L0NS4sL/2DykrCQXpHXpA2XDNbQoR9f+ovW13R7u4EKIxhhdPvnE50N1VU/uZgwAQG/k5xhMdLzNyMS3ATjadOIbRp6omWMGaOvdV2pYUWQ12AfLhun8cQN6vA4AAND9/AxJUcYoPtz19AleHLQ7/PqmGfrA9GG6+1/PTFsNAACge3k/3HYqGFOcr3s+cFa6ywAAAN3Iy56k2Oo2piQDAABXvAxJMV256jkAAEBXeBmSrD3VBtwAAEBv42VIiqEfCQAAuOJlSErWjzRrQnGP1wEAAHovP0NSbOJ2QlfSzeePTk8xAACgV/IyJMWYhAG3UJDBNwAA0H28DEnJhtuCXKMMAAB0Iy9DUhy5CAAAOOJlSEq2BUA4zLYAAACg+3gZkmISJ26TkQAAQHfyOyQl3GaDSQAA0J28DEnJ8hA9SQAAoDv5GZKi69sSr902Y1S/dJUDAAB6IS9DUjKZoV7zTwEAAKcAL5NFfMft9JYBAAB6MS9DUowhJQEAAEe8DEnM0QYAAK75GZLiw210JQEAADdC6S7gZBgj/ehDU5XBddsAAEA38zIk2YQBt/eeNSSNlQAAgN7Ky+E2AAAA17wMSVyBBAAAuOZlSIphCwAAAOCK3yGJ1W0AAMARL0OSZbwNAAA45mVIimG4DQAAuOJlSKIjCQAAuOZnSIp+piMJAAC44mVIijGMtwEAAEe8DEkMtwEAANf8DEnRATf6kQAAgCtehqQYRtsAAIArXoYkhtsAAIBrXoakGCZuAwAAV7wMSXQkAQAA17wMSYy3AQAA1/wMSWLSNgAAcMvLkEQ/EgAAcK3TkGSMGW6MWWiMWWOMedsYc2tPFNYZOpIAAIBLoRTOaZb0WWvtm8aYAklLjTHPWWtXO66tQ0xJAgAArnXak2St3WOtfTN6u1rSGklDXRd23JpkWf4PAACc6tKcJGNMqaSpkhY7qaYLiEgAAMCllEOSMSZf0h8l3WatrUry9VuMMeXGmPKKiorurLEdhtsAAIBrKYUkY0yGIgHpt9baJ5KdY629z1pbZq0tKy4u7s4a238vsQUAAABwK5XVbUbSA5LWWGt/4L6k1BgG3AAAgEOp9CSdJ+l6SXOMMcujH1c4ruu4GG4DAACudboFgLX2VZ2K86RPvYoAAEAv4umO23QlAQAAt7wMSbJ0JAEAALf8DElidRsAAHDLy5DEYBsAAHDNy5AksQUAAABwy8uQZNkDAAAAOOZpSGJOEgAAcMvLkCSxug0AALjlZUhisA0AALjmZ0iykmG8DQAAOORlSJIYbgMAAG55GZK4LAkAAHDNy5Akia4kAADglJchiW2SAACAa16GJImOJAAA4Ja/IYnVbQAAwCEvQxKXJQEAAK55GZIkLksCAADc8jIk0Y8EAABc8zMkWSZuAwAAt7wMSRITtwEAgFtehiR23AYAAK55GZIkhtsAAIBbXoYkdgAAAACu+RmSxBYAAADALS9DUgQpCQAAuONlSGK4DQAAuOZlSJIsw20AAMApT0MSg20AAMAtL0MSw20AAMA1L0OSxOo2AADglpchiZ4kAADgmp8hSVaGWUkAAMAhL0OSxHAbAABwy8uQxHAbAABwzcuQJLEFAAAAcMvLkERHEgAAcM3PkGQlw6QkAADgkJchCQAAwDUvQ5JlwA0AADjmZUiSZQsAAADglp8hSYQkAADglpchicE2AADgmpchSRKXJQEAAE55GZIsW24DAADH/AxJYk4SAABwy8uQJHFZEgAA4JaXIYnRNgAA4JqXIUnisiQAAMAtL0MSHUkAAMA1P0OStcxJAgAATnUakowxvzTG7DfGrOqJglJGSgIAAA6l0pP
0oKR5juvoEobbAACAa52GJGvty5IO9kAtKXtmxZ50lwAAAHq5UHc9kTHmFkm3SNKIESO662mT+vA5IzRqQJ7T7wEAAN7ZTCqX+DDGlEp62lp7RipPWlZWZsvLy0+yNAAAAPeMMUuttWVtj3u5ug0AAMA1QhIAAEASqWwB8HtJr0uaYIzZaYz5mPuyAAAA0qvTidvW2g/1RCEAAACnEobbAAAAkiAkAQAAJEFIAgAASIKQBAAAkAQhCQAAIAlCEgAAQBKEJAAAgCQISQAAAEkQkgAAAJIw1truf1JjKiRt6/Ynbm2ApErH3wNdQ5ucmmiXUw9tcmqiXU49PdUmI621xW0POglJPcEYU26tLUt3HTiGNjk10S6nHtrk1ES7nHrS3SYMtwEAACRBSAIAAEjC55B0X7oLQDu0yamJdjn10CanJtrl1JPWNvF2ThIAAIBLPvckAQAAOONdSDLGzDPGrDPGbDTGzE93Pb2ZMWa4MWahMWaNMeZtY8yt0eP9jDHPGWM2RD8XJTzmi9G2WWeMuSzh+HRjzMro135kjDHp+Df1JsaYoDFmmTHm6eh92iWNjDGFxpjHjTFro78z59Im6WeM+Uz079cqY8zvjTHZtEvPMsb80hiz3xizKuFYt7WBMSbLGPNo9PhiY0xptxVvrfXmQ1JQ0iZJoyVlSnpL0mnprqu3fkgaLGla9HaBpPWSTpP0XUnzo8fnS/pO9PZp0TbJkjQq2lbB6NfekHSuJCPpb5IuT/e/z/cPSbdL+p2kp6P3aZf0tsevJd0cvZ0pqZA2SXubDJW0RVJO9P5jkm6kXXq8HS6UNE3SqoRj3dYGkv5T0s+it6+R9Gh31e5bT9IMSRuttZuttY2SHpF0VZpr6rWstXustW9Gb1dLWqPIH52rFHlBUPTz+6K3r5L0iLW2wVq7RdJGSTOMMYMl9bHWvm4j/4t/k/AYnABjzDBJV0q6P+Ew7ZImxpg+irwQPCBJ1tpGa+1h0SangpCkHGNMSFKupN2iXXqUtfZlSQfbHO7ONkh8rsclze2unj7fQtJQSTsS7u+MHoNj0e7LqZIWSxpord0jRYKUpJLoaR21z9Do7bbHceLulXSHpHDCMdolfUZLqpD0q+gQ6P3GmDzRJmllrd0l6XuStkvaI+mItfZZ0S6ngu5sg/hjrLXNko5I6t8dRfoWkpIlQ5bnOWaMyZf0R0m3WWurjndqkmP2OMdxAowx75a031q7NNWHJDlGu3SvkCLDCT+11k6VVKvIEEJHaJMeEJ3ncpUiwzZDJOUZY6473kOSHKNdetaJtIGz9vEtJO2UNDzh/jBFuk7hiDEmQ5GA9Ftr7RPRw/uiXZ+Kft4fPd5R++yM3m57HCfmPEnvNcZsVWTIeY4x5mHRLum0U9JOa+3i6P3HFQlNtEl6XSxpi7W2wlrbJOkJSTNFu5wKurMN4o+JDqv2VfvhvRPiW0haImmcMWaUMSZTkQlaT6W5pl4rOqb7gKQ11tofJHzpKUk3RG/fIOnPCcevia40GCVpnKQ3ol2p1caYd0Wf8yMJj0EXWWu/aK0dZq0tVeR34AVr7XWiXdLGWrtX0g5jzIToobmSVos2Sbftkt5ljMmN/jznKjK3knZJv+5sg8TnulqRv4nd09OX7lnvXf2QdIUiq6w2SfpyuuvpzR+Szleky3KFpOXRjysUGet9XtKG6Od+CY/5crRt1ilh9YekMkmrol/7saIbmfJx0m00S8dWt9Eu6W2LKZLKo78vT0oqok3S/yHp65LWRn+mDymyaop26dk2+L0ic8KaFOn1+Vh3toGkbEl/UGSS9xuSRndX7ey4DQAAkIRvw20AAAA9gpAEAACQBCEJAAAgCUISAABAEoQkAACAJAhJAAAASRCSAAAAkiAkAQAAJPH/AbNpabqYUQKMAAAAAElFTkSuQmCC\n", 214 | "text/plain": [ 215 | "
" 216 | ] 217 | }, 218 | "metadata": { 219 | "needs_background": "light" 220 | }, 221 | "output_type": "display_data" 222 | } 223 | ], 224 | "source": [ 225 | "#Here is the main observation which shows that as the iteration increases the deviation in the value of pi decreases\n", 226 | "#and converges to 3.14.\n", 227 | "plt.figure(figsize=(10,6))\n", 228 | "y = np.arange(1, 10001)\n", 229 | "plt.plot(y, pi_val)" 230 | ] 231 | }, 232 | { 233 | "cell_type": "code", 234 | "execution_count": 33, 235 | "metadata": {}, 236 | "outputs": [ 237 | { 238 | "name": "stdout", 239 | "output_type": "stream", 240 | "text": [ 241 | "3.095592799503414\n" 242 | ] 243 | } 244 | ], 245 | "source": [ 246 | "#The value of pi after 10000 iteration \n", 247 | "pi = pi_val[9999]\n", 248 | "print(pi)" 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": null, 254 | "metadata": {}, 255 | "outputs": [], 256 | "source": [ 257 | "#which is very close to 3.14." 258 | ] 259 | } 260 | ], 261 | "metadata": { 262 | "kernelspec": { 263 | "display_name": "Python 3", 264 | "language": "python", 265 | "name": "python3" 266 | }, 267 | "language_info": { 268 | "codemirror_mode": { 269 | "name": "ipython", 270 | "version": 3 271 | }, 272 | "file_extension": ".py", 273 | "mimetype": "text/x-python", 274 | "name": "python", 275 | "nbconvert_exporter": "python", 276 | "pygments_lexer": "ipython3", 277 | "version": "3.8.5" 278 | } 279 | }, 280 | "nbformat": 4, 281 | "nbformat_minor": 4 282 | } 283 | -------------------------------------------------------------------------------- /data/Breast-Cancer.csv: -------------------------------------------------------------------------------- 1 | id,x1,x2,x3,x4,x5,x6,x7,x8,x9,label 2 | 1000025,5,1,1,1,2,1,3,1,1,2 3 | 1002945,5,4,4,5,7,10,3,2,1,2 4 | 1015425,3,1,1,1,2,2,3,1,1,2 5 | 1016277,6,8,8,1,3,4,3,7,1,2 6 | 1017023,4,1,1,3,2,1,3,1,1,2 7 | 1017122,8,10,10,8,7,10,9,7,1,4 8 | 1018099,1,1,1,1,2,10,3,1,1,2 9 | 1018561,2,1,2,1,2,1,3,1,1,2 10 | 1033078,2,1,1,1,2,1,1,1,5,2 11 | 1033078,4,2,1,1,2,1,2,1,1,2 12 | 1035283,1,1,1,1,1,1,3,1,1,2 13 | 1036172,2,1,1,1,2,1,2,1,1,2 14 | 1041801,5,3,3,3,2,3,4,4,1,4 15 | 1043999,1,1,1,1,2,3,3,1,1,2 16 | 1044572,8,7,5,10,7,9,5,5,4,4 17 | 1047630,7,4,6,4,6,1,4,3,1,4 18 | 1048672,4,1,1,1,2,1,2,1,1,2 19 | 1049815,4,1,1,1,2,1,3,1,1,2 20 | 1050670,10,7,7,6,4,10,4,1,2,4 21 | 1050718,6,1,1,1,2,1,3,1,1,2 22 | 1054590,7,3,2,10,5,10,5,4,4,4 23 | 1054593,10,5,5,3,6,7,7,10,1,4 24 | 1056784,3,1,1,1,2,1,2,1,1,2 25 | 1057013,8,4,5,1,2,?,7,3,1,4 26 | 1059552,1,1,1,1,2,1,3,1,1,2 27 | 1065726,5,2,3,4,2,7,3,6,1,4 28 | 1066373,3,2,1,1,1,1,2,1,1,2 29 | 1066979,5,1,1,1,2,1,2,1,1,2 30 | 1067444,2,1,1,1,2,1,2,1,1,2 31 | 1070935,1,1,3,1,2,1,1,1,1,2 32 | 1070935,3,1,1,1,1,1,2,1,1,2 33 | 1071760,2,1,1,1,2,1,3,1,1,2 34 | 1072179,10,7,7,3,8,5,7,4,3,4 35 | 1074610,2,1,1,2,2,1,3,1,1,2 36 | 1075123,3,1,2,1,2,1,2,1,1,2 37 | 1079304,2,1,1,1,2,1,2,1,1,2 38 | 1080185,10,10,10,8,6,1,8,9,1,4 39 | 1081791,6,2,1,1,1,1,7,1,1,2 40 | 1084584,5,4,4,9,2,10,5,6,1,4 41 | 1091262,2,5,3,3,6,7,7,5,1,4 42 | 1096800,6,6,6,9,6,?,7,8,1,2 43 | 1099510,10,4,3,1,3,3,6,5,2,4 44 | 1100524,6,10,10,2,8,10,7,3,3,4 45 | 1102573,5,6,5,6,10,1,3,1,1,4 46 | 1103608,10,10,10,4,8,1,8,10,1,4 47 | 1103722,1,1,1,1,2,1,2,1,2,2 48 | 1105257,3,7,7,4,4,9,4,8,1,4 49 | 1105524,1,1,1,1,2,1,2,1,1,2 50 | 1106095,4,1,1,3,2,1,3,1,1,2 51 | 1106829,7,8,7,2,4,8,3,8,2,4 52 | 1108370,9,5,8,1,2,3,2,1,5,4 53 | 1108449,5,3,3,4,2,4,3,4,1,4 54 | 1110102,10,3,6,2,3,5,4,10,2,4 55 | 1110503,5,5,5,8,10,8,7,3,7,4 56 | 
1110524,10,5,5,6,8,8,7,1,1,4 57 | 1111249,10,6,6,3,4,5,3,6,1,4 58 | 1112209,8,10,10,1,3,6,3,9,1,4 59 | 1113038,8,2,4,1,5,1,5,4,4,4 60 | 1113483,5,2,3,1,6,10,5,1,1,4 61 | 1113906,9,5,5,2,2,2,5,1,1,4 62 | 1115282,5,3,5,5,3,3,4,10,1,4 63 | 1115293,1,1,1,1,2,2,2,1,1,2 64 | 1116116,9,10,10,1,10,8,3,3,1,4 65 | 1116132,6,3,4,1,5,2,3,9,1,4 66 | 1116192,1,1,1,1,2,1,2,1,1,2 67 | 1116998,10,4,2,1,3,2,4,3,10,4 68 | 1117152,4,1,1,1,2,1,3,1,1,2 69 | 1118039,5,3,4,1,8,10,4,9,1,4 70 | 1120559,8,3,8,3,4,9,8,9,8,4 71 | 1121732,1,1,1,1,2,1,3,2,1,2 72 | 1121919,5,1,3,1,2,1,2,1,1,2 73 | 1123061,6,10,2,8,10,2,7,8,10,4 74 | 1124651,1,3,3,2,2,1,7,2,1,2 75 | 1125035,9,4,5,10,6,10,4,8,1,4 76 | 1126417,10,6,4,1,3,4,3,2,3,4 77 | 1131294,1,1,2,1,2,2,4,2,1,2 78 | 1132347,1,1,4,1,2,1,2,1,1,2 79 | 1133041,5,3,1,2,2,1,2,1,1,2 80 | 1133136,3,1,1,1,2,3,3,1,1,2 81 | 1136142,2,1,1,1,3,1,2,1,1,2 82 | 1137156,2,2,2,1,1,1,7,1,1,2 83 | 1143978,4,1,1,2,2,1,2,1,1,2 84 | 1143978,5,2,1,1,2,1,3,1,1,2 85 | 1147044,3,1,1,1,2,2,7,1,1,2 86 | 1147699,3,5,7,8,8,9,7,10,7,4 87 | 1147748,5,10,6,1,10,4,4,10,10,4 88 | 1148278,3,3,6,4,5,8,4,4,1,4 89 | 1148873,3,6,6,6,5,10,6,8,3,4 90 | 1152331,4,1,1,1,2,1,3,1,1,2 91 | 1155546,2,1,1,2,3,1,2,1,1,2 92 | 1156272,1,1,1,1,2,1,3,1,1,2 93 | 1156948,3,1,1,2,2,1,1,1,1,2 94 | 1157734,4,1,1,1,2,1,3,1,1,2 95 | 1158247,1,1,1,1,2,1,2,1,1,2 96 | 1160476,2,1,1,1,2,1,3,1,1,2 97 | 1164066,1,1,1,1,2,1,3,1,1,2 98 | 1165297,2,1,1,2,2,1,1,1,1,2 99 | 1165790,5,1,1,1,2,1,3,1,1,2 100 | 1165926,9,6,9,2,10,6,2,9,10,4 101 | 1166630,7,5,6,10,5,10,7,9,4,4 102 | 1166654,10,3,5,1,10,5,3,10,2,4 103 | 1167439,2,3,4,4,2,5,2,5,1,4 104 | 1167471,4,1,2,1,2,1,3,1,1,2 105 | 1168359,8,2,3,1,6,3,7,1,1,4 106 | 1168736,10,10,10,10,10,1,8,8,8,4 107 | 1169049,7,3,4,4,3,3,3,2,7,4 108 | 1170419,10,10,10,8,2,10,4,1,1,4 109 | 1170420,1,6,8,10,8,10,5,7,1,4 110 | 1171710,1,1,1,1,2,1,2,3,1,2 111 | 1171710,6,5,4,4,3,9,7,8,3,4 112 | 1171795,1,3,1,2,2,2,5,3,2,2 113 | 1171845,8,6,4,3,5,9,3,1,1,4 114 | 1172152,10,3,3,10,2,10,7,3,3,4 115 | 1173216,10,10,10,3,10,8,8,1,1,4 116 | 1173235,3,3,2,1,2,3,3,1,1,2 117 | 1173347,1,1,1,1,2,5,1,1,1,2 118 | 1173347,8,3,3,1,2,2,3,2,1,2 119 | 1173509,4,5,5,10,4,10,7,5,8,4 120 | 1173514,1,1,1,1,4,3,1,1,1,2 121 | 1173681,3,2,1,1,2,2,3,1,1,2 122 | 1174057,1,1,2,2,2,1,3,1,1,2 123 | 1174057,4,2,1,1,2,2,3,1,1,2 124 | 1174131,10,10,10,2,10,10,5,3,3,4 125 | 1174428,5,3,5,1,8,10,5,3,1,4 126 | 1175937,5,4,6,7,9,7,8,10,1,4 127 | 1176406,1,1,1,1,2,1,2,1,1,2 128 | 1176881,7,5,3,7,4,10,7,5,5,4 129 | 1177027,3,1,1,1,2,1,3,1,1,2 130 | 1177399,8,3,5,4,5,10,1,6,2,4 131 | 1177512,1,1,1,1,10,1,1,1,1,2 132 | 1178580,5,1,3,1,2,1,2,1,1,2 133 | 1179818,2,1,1,1,2,1,3,1,1,2 134 | 1180194,5,10,8,10,8,10,3,6,3,4 135 | 1180523,3,1,1,1,2,1,2,2,1,2 136 | 1180831,3,1,1,1,3,1,2,1,1,2 137 | 1181356,5,1,1,1,2,2,3,3,1,2 138 | 1182404,4,1,1,1,2,1,2,1,1,2 139 | 1182410,3,1,1,1,2,1,1,1,1,2 140 | 1183240,4,1,2,1,2,1,2,1,1,2 141 | 1183246,1,1,1,1,1,?,2,1,1,2 142 | 1183516,3,1,1,1,2,1,1,1,1,2 143 | 1183911,2,1,1,1,2,1,1,1,1,2 144 | 1183983,9,5,5,4,4,5,4,3,3,4 145 | 1184184,1,1,1,1,2,5,1,1,1,2 146 | 1184241,2,1,1,1,2,1,2,1,1,2 147 | 1184840,1,1,3,1,2,?,2,1,1,2 148 | 1185609,3,4,5,2,6,8,4,1,1,4 149 | 1185610,1,1,1,1,3,2,2,1,1,2 150 | 1187457,3,1,1,3,8,1,5,8,1,2 151 | 1187805,8,8,7,4,10,10,7,8,7,4 152 | 1188472,1,1,1,1,1,1,3,1,1,2 153 | 1189266,7,2,4,1,6,10,5,4,3,4 154 | 1189286,10,10,8,6,4,5,8,10,1,4 155 | 1190394,4,1,1,1,2,3,1,1,1,2 156 | 1190485,1,1,1,1,2,1,1,1,1,2 157 | 1192325,5,5,5,6,3,10,3,1,1,4 158 | 1193091,1,2,2,1,2,1,2,1,1,2 159 | 
1193210,2,1,1,1,2,1,3,1,1,2 160 | 1193683,1,1,2,1,3,?,1,1,1,2 161 | 1196295,9,9,10,3,6,10,7,10,6,4 162 | 1196915,10,7,7,4,5,10,5,7,2,4 163 | 1197080,4,1,1,1,2,1,3,2,1,2 164 | 1197270,3,1,1,1,2,1,3,1,1,2 165 | 1197440,1,1,1,2,1,3,1,1,7,2 166 | 1197510,5,1,1,1,2,?,3,1,1,2 167 | 1197979,4,1,1,1,2,2,3,2,1,2 168 | 1197993,5,6,7,8,8,10,3,10,3,4 169 | 1198128,10,8,10,10,6,1,3,1,10,4 170 | 1198641,3,1,1,1,2,1,3,1,1,2 171 | 1199219,1,1,1,2,1,1,1,1,1,2 172 | 1199731,3,1,1,1,2,1,1,1,1,2 173 | 1199983,1,1,1,1,2,1,3,1,1,2 174 | 1200772,1,1,1,1,2,1,2,1,1,2 175 | 1200847,6,10,10,10,8,10,10,10,7,4 176 | 1200892,8,6,5,4,3,10,6,1,1,4 177 | 1200952,5,8,7,7,10,10,5,7,1,4 178 | 1201834,2,1,1,1,2,1,3,1,1,2 179 | 1201936,5,10,10,3,8,1,5,10,3,4 180 | 1202125,4,1,1,1,2,1,3,1,1,2 181 | 1202812,5,3,3,3,6,10,3,1,1,4 182 | 1203096,1,1,1,1,1,1,3,1,1,2 183 | 1204242,1,1,1,1,2,1,1,1,1,2 184 | 1204898,6,1,1,1,2,1,3,1,1,2 185 | 1205138,5,8,8,8,5,10,7,8,1,4 186 | 1205579,8,7,6,4,4,10,5,1,1,4 187 | 1206089,2,1,1,1,1,1,3,1,1,2 188 | 1206695,1,5,8,6,5,8,7,10,1,4 189 | 1206841,10,5,6,10,6,10,7,7,10,4 190 | 1207986,5,8,4,10,5,8,9,10,1,4 191 | 1208301,1,2,3,1,2,1,3,1,1,2 192 | 1210963,10,10,10,8,6,8,7,10,1,4 193 | 1211202,7,5,10,10,10,10,4,10,3,4 194 | 1212232,5,1,1,1,2,1,2,1,1,2 195 | 1212251,1,1,1,1,2,1,3,1,1,2 196 | 1212422,3,1,1,1,2,1,3,1,1,2 197 | 1212422,4,1,1,1,2,1,3,1,1,2 198 | 1213375,8,4,4,5,4,7,7,8,2,2 199 | 1213383,5,1,1,4,2,1,3,1,1,2 200 | 1214092,1,1,1,1,2,1,1,1,1,2 201 | 1214556,3,1,1,1,2,1,2,1,1,2 202 | 1214966,9,7,7,5,5,10,7,8,3,4 203 | 1216694,10,8,8,4,10,10,8,1,1,4 204 | 1216947,1,1,1,1,2,1,3,1,1,2 205 | 1217051,5,1,1,1,2,1,3,1,1,2 206 | 1217264,1,1,1,1,2,1,3,1,1,2 207 | 1218105,5,10,10,9,6,10,7,10,5,4 208 | 1218741,10,10,9,3,7,5,3,5,1,4 209 | 1218860,1,1,1,1,1,1,3,1,1,2 210 | 1218860,1,1,1,1,1,1,3,1,1,2 211 | 1219406,5,1,1,1,1,1,3,1,1,2 212 | 1219525,8,10,10,10,5,10,8,10,6,4 213 | 1219859,8,10,8,8,4,8,7,7,1,4 214 | 1220330,1,1,1,1,2,1,3,1,1,2 215 | 1221863,10,10,10,10,7,10,7,10,4,4 216 | 1222047,10,10,10,10,3,10,10,6,1,4 217 | 1222936,8,7,8,7,5,5,5,10,2,4 218 | 1223282,1,1,1,1,2,1,2,1,1,2 219 | 1223426,1,1,1,1,2,1,3,1,1,2 220 | 1223793,6,10,7,7,6,4,8,10,2,4 221 | 1223967,6,1,3,1,2,1,3,1,1,2 222 | 1224329,1,1,1,2,2,1,3,1,1,2 223 | 1225799,10,6,4,3,10,10,9,10,1,4 224 | 1226012,4,1,1,3,1,5,2,1,1,4 225 | 1226612,7,5,6,3,3,8,7,4,1,4 226 | 1227210,10,5,5,6,3,10,7,9,2,4 227 | 1227244,1,1,1,1,2,1,2,1,1,2 228 | 1227481,10,5,7,4,4,10,8,9,1,4 229 | 1228152,8,9,9,5,3,5,7,7,1,4 230 | 1228311,1,1,1,1,1,1,3,1,1,2 231 | 1230175,10,10,10,3,10,10,9,10,1,4 232 | 1230688,7,4,7,4,3,7,7,6,1,4 233 | 1231387,6,8,7,5,6,8,8,9,2,4 234 | 1231706,8,4,6,3,3,1,4,3,1,2 235 | 1232225,10,4,5,5,5,10,4,1,1,4 236 | 1236043,3,3,2,1,3,1,3,6,1,2 237 | 1241232,3,1,4,1,2,?,3,1,1,2 238 | 1241559,10,8,8,2,8,10,4,8,10,4 239 | 1241679,9,8,8,5,6,2,4,10,4,4 240 | 1242364,8,10,10,8,6,9,3,10,10,4 241 | 1243256,10,4,3,2,3,10,5,3,2,4 242 | 1270479,5,1,3,3,2,2,2,3,1,2 243 | 1276091,3,1,1,3,1,1,3,1,1,2 244 | 1277018,2,1,1,1,2,1,3,1,1,2 245 | 128059,1,1,1,1,2,5,5,1,1,2 246 | 1285531,1,1,1,1,2,1,3,1,1,2 247 | 1287775,5,1,1,2,2,2,3,1,1,2 248 | 144888,8,10,10,8,5,10,7,8,1,4 249 | 145447,8,4,4,1,2,9,3,3,1,4 250 | 167528,4,1,1,1,2,1,3,6,1,2 251 | 169356,3,1,1,1,2,?,3,1,1,2 252 | 183913,1,2,2,1,2,1,1,1,1,2 253 | 191250,10,4,4,10,2,10,5,3,3,4 254 | 1017023,6,3,3,5,3,10,3,5,3,2 255 | 1100524,6,10,10,2,8,10,7,3,3,4 256 | 1116116,9,10,10,1,10,8,3,3,1,4 257 | 1168736,5,6,6,2,4,10,3,6,1,4 258 | 1182404,3,1,1,1,2,1,1,1,1,2 259 | 1182404,3,1,1,1,2,1,2,1,1,2 260 | 
1198641,3,1,1,1,2,1,3,1,1,2 261 | 242970,5,7,7,1,5,8,3,4,1,2 262 | 255644,10,5,8,10,3,10,5,1,3,4 263 | 263538,5,10,10,6,10,10,10,6,5,4 264 | 274137,8,8,9,4,5,10,7,8,1,4 265 | 303213,10,4,4,10,6,10,5,5,1,4 266 | 314428,7,9,4,10,10,3,5,3,3,4 267 | 1182404,5,1,4,1,2,1,3,2,1,2 268 | 1198641,10,10,6,3,3,10,4,3,2,4 269 | 320675,3,3,5,2,3,10,7,1,1,4 270 | 324427,10,8,8,2,3,4,8,7,8,4 271 | 385103,1,1,1,1,2,1,3,1,1,2 272 | 390840,8,4,7,1,3,10,3,9,2,4 273 | 411453,5,1,1,1,2,1,3,1,1,2 274 | 320675,3,3,5,2,3,10,7,1,1,4 275 | 428903,7,2,4,1,3,4,3,3,1,4 276 | 431495,3,1,1,1,2,1,3,2,1,2 277 | 432809,3,1,3,1,2,?,2,1,1,2 278 | 434518,3,1,1,1,2,1,2,1,1,2 279 | 452264,1,1,1,1,2,1,2,1,1,2 280 | 456282,1,1,1,1,2,1,3,1,1,2 281 | 476903,10,5,7,3,3,7,3,3,8,4 282 | 486283,3,1,1,1,2,1,3,1,1,2 283 | 486662,2,1,1,2,2,1,3,1,1,2 284 | 488173,1,4,3,10,4,10,5,6,1,4 285 | 492268,10,4,6,1,2,10,5,3,1,4 286 | 508234,7,4,5,10,2,10,3,8,2,4 287 | 527363,8,10,10,10,8,10,10,7,3,4 288 | 529329,10,10,10,10,10,10,4,10,10,4 289 | 535331,3,1,1,1,3,1,2,1,1,2 290 | 543558,6,1,3,1,4,5,5,10,1,4 291 | 555977,5,6,6,8,6,10,4,10,4,4 292 | 560680,1,1,1,1,2,1,1,1,1,2 293 | 561477,1,1,1,1,2,1,3,1,1,2 294 | 563649,8,8,8,1,2,?,6,10,1,4 295 | 601265,10,4,4,6,2,10,2,3,1,4 296 | 606140,1,1,1,1,2,?,2,1,1,2 297 | 606722,5,5,7,8,6,10,7,4,1,4 298 | 616240,5,3,4,3,4,5,4,7,1,2 299 | 61634,5,4,3,1,2,?,2,3,1,2 300 | 625201,8,2,1,1,5,1,1,1,1,2 301 | 63375,9,1,2,6,4,10,7,7,2,4 302 | 635844,8,4,10,5,4,4,7,10,1,4 303 | 636130,1,1,1,1,2,1,3,1,1,2 304 | 640744,10,10,10,7,9,10,7,10,10,4 305 | 646904,1,1,1,1,2,1,3,1,1,2 306 | 653777,8,3,4,9,3,10,3,3,1,4 307 | 659642,10,8,4,4,4,10,3,10,4,4 308 | 666090,1,1,1,1,2,1,3,1,1,2 309 | 666942,1,1,1,1,2,1,3,1,1,2 310 | 667204,7,8,7,6,4,3,8,8,4,4 311 | 673637,3,1,1,1,2,5,5,1,1,2 312 | 684955,2,1,1,1,3,1,2,1,1,2 313 | 688033,1,1,1,1,2,1,1,1,1,2 314 | 691628,8,6,4,10,10,1,3,5,1,4 315 | 693702,1,1,1,1,2,1,1,1,1,2 316 | 704097,1,1,1,1,1,1,2,1,1,2 317 | 704168,4,6,5,6,7,?,4,9,1,2 318 | 706426,5,5,5,2,5,10,4,3,1,4 319 | 709287,6,8,7,8,6,8,8,9,1,4 320 | 718641,1,1,1,1,5,1,3,1,1,2 321 | 721482,4,4,4,4,6,5,7,3,1,2 322 | 730881,7,6,3,2,5,10,7,4,6,4 323 | 733639,3,1,1,1,2,?,3,1,1,2 324 | 733639,3,1,1,1,2,1,3,1,1,2 325 | 733823,5,4,6,10,2,10,4,1,1,4 326 | 740492,1,1,1,1,2,1,3,1,1,2 327 | 743348,3,2,2,1,2,1,2,3,1,2 328 | 752904,10,1,1,1,2,10,5,4,1,4 329 | 756136,1,1,1,1,2,1,2,1,1,2 330 | 760001,8,10,3,2,6,4,3,10,1,4 331 | 760239,10,4,6,4,5,10,7,1,1,4 332 | 76389,10,4,7,2,2,8,6,1,1,4 333 | 764974,5,1,1,1,2,1,3,1,2,2 334 | 770066,5,2,2,2,2,1,2,2,1,2 335 | 785208,5,4,6,6,4,10,4,3,1,4 336 | 785615,8,6,7,3,3,10,3,4,2,4 337 | 792744,1,1,1,1,2,1,1,1,1,2 338 | 797327,6,5,5,8,4,10,3,4,1,4 339 | 798429,1,1,1,1,2,1,3,1,1,2 340 | 704097,1,1,1,1,1,1,2,1,1,2 341 | 806423,8,5,5,5,2,10,4,3,1,4 342 | 809912,10,3,3,1,2,10,7,6,1,4 343 | 810104,1,1,1,1,2,1,3,1,1,2 344 | 814265,2,1,1,1,2,1,1,1,1,2 345 | 814911,1,1,1,1,2,1,1,1,1,2 346 | 822829,7,6,4,8,10,10,9,5,3,4 347 | 826923,1,1,1,1,2,1,1,1,1,2 348 | 830690,5,2,2,2,3,1,1,3,1,2 349 | 831268,1,1,1,1,1,1,1,3,1,2 350 | 832226,3,4,4,10,5,1,3,3,1,4 351 | 832567,4,2,3,5,3,8,7,6,1,4 352 | 836433,5,1,1,3,2,1,1,1,1,2 353 | 837082,2,1,1,1,2,1,3,1,1,2 354 | 846832,3,4,5,3,7,3,4,6,1,2 355 | 850831,2,7,10,10,7,10,4,9,4,4 356 | 855524,1,1,1,1,2,1,2,1,1,2 357 | 857774,4,1,1,1,3,1,2,2,1,2 358 | 859164,5,3,3,1,3,3,3,3,3,4 359 | 859350,8,10,10,7,10,10,7,3,8,4 360 | 866325,8,10,5,3,8,4,4,10,3,4 361 | 873549,10,3,5,4,3,7,3,5,3,4 362 | 877291,6,10,10,10,10,10,8,10,10,4 363 | 877943,3,10,3,10,6,10,5,1,4,4 364 | 
888169,3,2,2,1,4,3,2,1,1,2 365 | 888523,4,4,4,2,2,3,2,1,1,2 366 | 896404,2,1,1,1,2,1,3,1,1,2 367 | 897172,2,1,1,1,2,1,2,1,1,2 368 | 95719,6,10,10,10,8,10,7,10,7,4 369 | 160296,5,8,8,10,5,10,8,10,3,4 370 | 342245,1,1,3,1,2,1,1,1,1,2 371 | 428598,1,1,3,1,1,1,2,1,1,2 372 | 492561,4,3,2,1,3,1,2,1,1,2 373 | 493452,1,1,3,1,2,1,1,1,1,2 374 | 493452,4,1,2,1,2,1,2,1,1,2 375 | 521441,5,1,1,2,2,1,2,1,1,2 376 | 560680,3,1,2,1,2,1,2,1,1,2 377 | 636437,1,1,1,1,2,1,1,1,1,2 378 | 640712,1,1,1,1,2,1,2,1,1,2 379 | 654244,1,1,1,1,1,1,2,1,1,2 380 | 657753,3,1,1,4,3,1,2,2,1,2 381 | 685977,5,3,4,1,4,1,3,1,1,2 382 | 805448,1,1,1,1,2,1,1,1,1,2 383 | 846423,10,6,3,6,4,10,7,8,4,4 384 | 1002504,3,2,2,2,2,1,3,2,1,2 385 | 1022257,2,1,1,1,2,1,1,1,1,2 386 | 1026122,2,1,1,1,2,1,1,1,1,2 387 | 1071084,3,3,2,2,3,1,1,2,3,2 388 | 1080233,7,6,6,3,2,10,7,1,1,4 389 | 1114570,5,3,3,2,3,1,3,1,1,2 390 | 1114570,2,1,1,1,2,1,2,2,1,2 391 | 1116715,5,1,1,1,3,2,2,2,1,2 392 | 1131411,1,1,1,2,2,1,2,1,1,2 393 | 1151734,10,8,7,4,3,10,7,9,1,4 394 | 1156017,3,1,1,1,2,1,2,1,1,2 395 | 1158247,1,1,1,1,1,1,1,1,1,2 396 | 1158405,1,2,3,1,2,1,2,1,1,2 397 | 1168278,3,1,1,1,2,1,2,1,1,2 398 | 1176187,3,1,1,1,2,1,3,1,1,2 399 | 1196263,4,1,1,1,2,1,1,1,1,2 400 | 1196475,3,2,1,1,2,1,2,2,1,2 401 | 1206314,1,2,3,1,2,1,1,1,1,2 402 | 1211265,3,10,8,7,6,9,9,3,8,4 403 | 1213784,3,1,1,1,2,1,1,1,1,2 404 | 1223003,5,3,3,1,2,1,2,1,1,2 405 | 1223306,3,1,1,1,2,4,1,1,1,2 406 | 1223543,1,2,1,3,2,1,1,2,1,2 407 | 1229929,1,1,1,1,2,1,2,1,1,2 408 | 1231853,4,2,2,1,2,1,2,1,1,2 409 | 1234554,1,1,1,1,2,1,2,1,1,2 410 | 1236837,2,3,2,2,2,2,3,1,1,2 411 | 1237674,3,1,2,1,2,1,2,1,1,2 412 | 1238021,1,1,1,1,2,1,2,1,1,2 413 | 1238464,1,1,1,1,1,?,2,1,1,2 414 | 1238633,10,10,10,6,8,4,8,5,1,4 415 | 1238915,5,1,2,1,2,1,3,1,1,2 416 | 1238948,8,5,6,2,3,10,6,6,1,4 417 | 1239232,3,3,2,6,3,3,3,5,1,2 418 | 1239347,8,7,8,5,10,10,7,2,1,4 419 | 1239967,1,1,1,1,2,1,2,1,1,2 420 | 1240337,5,2,2,2,2,2,3,2,2,2 421 | 1253505,2,3,1,1,5,1,1,1,1,2 422 | 1255384,3,2,2,3,2,3,3,1,1,2 423 | 1257200,10,10,10,7,10,10,8,2,1,4 424 | 1257648,4,3,3,1,2,1,3,3,1,2 425 | 1257815,5,1,3,1,2,1,2,1,1,2 426 | 1257938,3,1,1,1,2,1,1,1,1,2 427 | 1258549,9,10,10,10,10,10,10,10,1,4 428 | 1258556,5,3,6,1,2,1,1,1,1,2 429 | 1266154,8,7,8,2,4,2,5,10,1,4 430 | 1272039,1,1,1,1,2,1,2,1,1,2 431 | 1276091,2,1,1,1,2,1,2,1,1,2 432 | 1276091,1,3,1,1,2,1,2,2,1,2 433 | 1276091,5,1,1,3,4,1,3,2,1,2 434 | 1277629,5,1,1,1,2,1,2,2,1,2 435 | 1293439,3,2,2,3,2,1,1,1,1,2 436 | 1293439,6,9,7,5,5,8,4,2,1,2 437 | 1294562,10,8,10,1,3,10,5,1,1,4 438 | 1295186,10,10,10,1,6,1,2,8,1,4 439 | 527337,4,1,1,1,2,1,1,1,1,2 440 | 558538,4,1,3,3,2,1,1,1,1,2 441 | 566509,5,1,1,1,2,1,1,1,1,2 442 | 608157,10,4,3,10,4,10,10,1,1,4 443 | 677910,5,2,2,4,2,4,1,1,1,2 444 | 734111,1,1,1,3,2,3,1,1,1,2 445 | 734111,1,1,1,1,2,2,1,1,1,2 446 | 780555,5,1,1,6,3,1,2,1,1,2 447 | 827627,2,1,1,1,2,1,1,1,1,2 448 | 1049837,1,1,1,1,2,1,1,1,1,2 449 | 1058849,5,1,1,1,2,1,1,1,1,2 450 | 1182404,1,1,1,1,1,1,1,1,1,2 451 | 1193544,5,7,9,8,6,10,8,10,1,4 452 | 1201870,4,1,1,3,1,1,2,1,1,2 453 | 1202253,5,1,1,1,2,1,1,1,1,2 454 | 1227081,3,1,1,3,2,1,1,1,1,2 455 | 1230994,4,5,5,8,6,10,10,7,1,4 456 | 1238410,2,3,1,1,3,1,1,1,1,2 457 | 1246562,10,2,2,1,2,6,1,1,2,4 458 | 1257470,10,6,5,8,5,10,8,6,1,4 459 | 1259008,8,8,9,6,6,3,10,10,1,4 460 | 1266124,5,1,2,1,2,1,1,1,1,2 461 | 1267898,5,1,3,1,2,1,1,1,1,2 462 | 1268313,5,1,1,3,2,1,1,1,1,2 463 | 1268804,3,1,1,1,2,5,1,1,1,2 464 | 1276091,6,1,1,3,2,1,1,1,1,2 465 | 1280258,4,1,1,1,2,1,1,2,1,2 466 | 1293966,4,1,1,1,2,1,1,1,1,2 467 | 1296572,10,9,8,7,6,4,7,10,3,4 
468 | 1298416,10,6,6,2,4,10,9,7,1,4 469 | 1299596,6,6,6,5,4,10,7,6,2,4 470 | 1105524,4,1,1,1,2,1,1,1,1,2 471 | 1181685,1,1,2,1,2,1,2,1,1,2 472 | 1211594,3,1,1,1,1,1,2,1,1,2 473 | 1238777,6,1,1,3,2,1,1,1,1,2 474 | 1257608,6,1,1,1,1,1,1,1,1,2 475 | 1269574,4,1,1,1,2,1,1,1,1,2 476 | 1277145,5,1,1,1,2,1,1,1,1,2 477 | 1287282,3,1,1,1,2,1,1,1,1,2 478 | 1296025,4,1,2,1,2,1,1,1,1,2 479 | 1296263,4,1,1,1,2,1,1,1,1,2 480 | 1296593,5,2,1,1,2,1,1,1,1,2 481 | 1299161,4,8,7,10,4,10,7,5,1,4 482 | 1301945,5,1,1,1,1,1,1,1,1,2 483 | 1302428,5,3,2,4,2,1,1,1,1,2 484 | 1318169,9,10,10,10,10,5,10,10,10,4 485 | 474162,8,7,8,5,5,10,9,10,1,4 486 | 787451,5,1,2,1,2,1,1,1,1,2 487 | 1002025,1,1,1,3,1,3,1,1,1,2 488 | 1070522,3,1,1,1,1,1,2,1,1,2 489 | 1073960,10,10,10,10,6,10,8,1,5,4 490 | 1076352,3,6,4,10,3,3,3,4,1,4 491 | 1084139,6,3,2,1,3,4,4,1,1,4 492 | 1115293,1,1,1,1,2,1,1,1,1,2 493 | 1119189,5,8,9,4,3,10,7,1,1,4 494 | 1133991,4,1,1,1,1,1,2,1,1,2 495 | 1142706,5,10,10,10,6,10,6,5,2,4 496 | 1155967,5,1,2,10,4,5,2,1,1,2 497 | 1170945,3,1,1,1,1,1,2,1,1,2 498 | 1181567,1,1,1,1,1,1,1,1,1,2 499 | 1182404,4,2,1,1,2,1,1,1,1,2 500 | 1204558,4,1,1,1,2,1,2,1,1,2 501 | 1217952,4,1,1,1,2,1,2,1,1,2 502 | 1224565,6,1,1,1,2,1,3,1,1,2 503 | 1238186,4,1,1,1,2,1,2,1,1,2 504 | 1253917,4,1,1,2,2,1,2,1,1,2 505 | 1265899,4,1,1,1,2,1,3,1,1,2 506 | 1268766,1,1,1,1,2,1,1,1,1,2 507 | 1277268,3,3,1,1,2,1,1,1,1,2 508 | 1286943,8,10,10,10,7,5,4,8,7,4 509 | 1295508,1,1,1,1,2,4,1,1,1,2 510 | 1297327,5,1,1,1,2,1,1,1,1,2 511 | 1297522,2,1,1,1,2,1,1,1,1,2 512 | 1298360,1,1,1,1,2,1,1,1,1,2 513 | 1299924,5,1,1,1,2,1,2,1,1,2 514 | 1299994,5,1,1,1,2,1,1,1,1,2 515 | 1304595,3,1,1,1,1,1,2,1,1,2 516 | 1306282,6,6,7,10,3,10,8,10,2,4 517 | 1313325,4,10,4,7,3,10,9,10,1,4 518 | 1320077,1,1,1,1,1,1,1,1,1,2 519 | 1320077,1,1,1,1,1,1,2,1,1,2 520 | 1320304,3,1,2,2,2,1,1,1,1,2 521 | 1330439,4,7,8,3,4,10,9,1,1,4 522 | 333093,1,1,1,1,3,1,1,1,1,2 523 | 369565,4,1,1,1,3,1,1,1,1,2 524 | 412300,10,4,5,4,3,5,7,3,1,4 525 | 672113,7,5,6,10,4,10,5,3,1,4 526 | 749653,3,1,1,1,2,1,2,1,1,2 527 | 769612,3,1,1,2,2,1,1,1,1,2 528 | 769612,4,1,1,1,2,1,1,1,1,2 529 | 798429,4,1,1,1,2,1,3,1,1,2 530 | 807657,6,1,3,2,2,1,1,1,1,2 531 | 8233704,4,1,1,1,1,1,2,1,1,2 532 | 837480,7,4,4,3,4,10,6,9,1,4 533 | 867392,4,2,2,1,2,1,2,1,1,2 534 | 869828,1,1,1,1,1,1,3,1,1,2 535 | 1043068,3,1,1,1,2,1,2,1,1,2 536 | 1056171,2,1,1,1,2,1,2,1,1,2 537 | 1061990,1,1,3,2,2,1,3,1,1,2 538 | 1113061,5,1,1,1,2,1,3,1,1,2 539 | 1116192,5,1,2,1,2,1,3,1,1,2 540 | 1135090,4,1,1,1,2,1,2,1,1,2 541 | 1145420,6,1,1,1,2,1,2,1,1,2 542 | 1158157,5,1,1,1,2,2,2,1,1,2 543 | 1171578,3,1,1,1,2,1,1,1,1,2 544 | 1174841,5,3,1,1,2,1,1,1,1,2 545 | 1184586,4,1,1,1,2,1,2,1,1,2 546 | 1186936,2,1,3,2,2,1,2,1,1,2 547 | 1197527,5,1,1,1,2,1,2,1,1,2 548 | 1222464,6,10,10,10,4,10,7,10,1,4 549 | 1240603,2,1,1,1,1,1,1,1,1,2 550 | 1240603,3,1,1,1,1,1,1,1,1,2 551 | 1241035,7,8,3,7,4,5,7,8,2,4 552 | 1287971,3,1,1,1,2,1,2,1,1,2 553 | 1289391,1,1,1,1,2,1,3,1,1,2 554 | 1299924,3,2,2,2,2,1,4,2,1,2 555 | 1306339,4,4,2,1,2,5,2,1,2,2 556 | 1313658,3,1,1,1,2,1,1,1,1,2 557 | 1313982,4,3,1,1,2,1,4,8,1,2 558 | 1321264,5,2,2,2,1,1,2,1,1,2 559 | 1321321,5,1,1,3,2,1,1,1,1,2 560 | 1321348,2,1,1,1,2,1,2,1,1,2 561 | 1321931,5,1,1,1,2,1,2,1,1,2 562 | 1321942,5,1,1,1,2,1,3,1,1,2 563 | 1321942,5,1,1,1,2,1,3,1,1,2 564 | 1328331,1,1,1,1,2,1,3,1,1,2 565 | 1328755,3,1,1,1,2,1,2,1,1,2 566 | 1331405,4,1,1,1,2,1,3,2,1,2 567 | 1331412,5,7,10,10,5,10,10,10,1,4 568 | 1333104,3,1,2,1,2,1,3,1,1,2 569 | 1334071,4,1,1,1,2,3,2,1,1,2 570 | 1343068,8,4,4,1,6,10,2,5,2,4 571 | 
1343374,10,10,8,10,6,5,10,3,1,4 572 | 1344121,8,10,4,4,8,10,8,2,1,4 573 | 142932,7,6,10,5,3,10,9,10,2,4 574 | 183936,3,1,1,1,2,1,2,1,1,2 575 | 324382,1,1,1,1,2,1,2,1,1,2 576 | 378275,10,9,7,3,4,2,7,7,1,4 577 | 385103,5,1,2,1,2,1,3,1,1,2 578 | 690557,5,1,1,1,2,1,2,1,1,2 579 | 695091,1,1,1,1,2,1,2,1,1,2 580 | 695219,1,1,1,1,2,1,2,1,1,2 581 | 824249,1,1,1,1,2,1,3,1,1,2 582 | 871549,5,1,2,1,2,1,2,1,1,2 583 | 878358,5,7,10,6,5,10,7,5,1,4 584 | 1107684,6,10,5,5,4,10,6,10,1,4 585 | 1115762,3,1,1,1,2,1,1,1,1,2 586 | 1217717,5,1,1,6,3,1,1,1,1,2 587 | 1239420,1,1,1,1,2,1,1,1,1,2 588 | 1254538,8,10,10,10,6,10,10,10,1,4 589 | 1261751,5,1,1,1,2,1,2,2,1,2 590 | 1268275,9,8,8,9,6,3,4,1,1,4 591 | 1272166,5,1,1,1,2,1,1,1,1,2 592 | 1294261,4,10,8,5,4,1,10,1,1,4 593 | 1295529,2,5,7,6,4,10,7,6,1,4 594 | 1298484,10,3,4,5,3,10,4,1,1,4 595 | 1311875,5,1,2,1,2,1,1,1,1,2 596 | 1315506,4,8,6,3,4,10,7,1,1,4 597 | 1320141,5,1,1,1,2,1,2,1,1,2 598 | 1325309,4,1,2,1,2,1,2,1,1,2 599 | 1333063,5,1,3,1,2,1,3,1,1,2 600 | 1333495,3,1,1,1,2,1,2,1,1,2 601 | 1334659,5,2,4,1,1,1,1,1,1,2 602 | 1336798,3,1,1,1,2,1,2,1,1,2 603 | 1344449,1,1,1,1,1,1,2,1,1,2 604 | 1350568,4,1,1,1,2,1,2,1,1,2 605 | 1352663,5,4,6,8,4,1,8,10,1,4 606 | 188336,5,3,2,8,5,10,8,1,2,4 607 | 352431,10,5,10,3,5,8,7,8,3,4 608 | 353098,4,1,1,2,2,1,1,1,1,2 609 | 411453,1,1,1,1,2,1,1,1,1,2 610 | 557583,5,10,10,10,10,10,10,1,1,4 611 | 636375,5,1,1,1,2,1,1,1,1,2 612 | 736150,10,4,3,10,3,10,7,1,2,4 613 | 803531,5,10,10,10,5,2,8,5,1,4 614 | 822829,8,10,10,10,6,10,10,10,10,4 615 | 1016634,2,3,1,1,2,1,2,1,1,2 616 | 1031608,2,1,1,1,1,1,2,1,1,2 617 | 1041043,4,1,3,1,2,1,2,1,1,2 618 | 1042252,3,1,1,1,2,1,2,1,1,2 619 | 1057067,1,1,1,1,1,?,1,1,1,2 620 | 1061990,4,1,1,1,2,1,2,1,1,2 621 | 1073836,5,1,1,1,2,1,2,1,1,2 622 | 1083817,3,1,1,1,2,1,2,1,1,2 623 | 1096352,6,3,3,3,3,2,6,1,1,2 624 | 1140597,7,1,2,3,2,1,2,1,1,2 625 | 1149548,1,1,1,1,2,1,1,1,1,2 626 | 1174009,5,1,1,2,1,1,2,1,1,2 627 | 1183596,3,1,3,1,3,4,1,1,1,2 628 | 1190386,4,6,6,5,7,6,7,7,3,4 629 | 1190546,2,1,1,1,2,5,1,1,1,2 630 | 1213273,2,1,1,1,2,1,1,1,1,2 631 | 1218982,4,1,1,1,2,1,1,1,1,2 632 | 1225382,6,2,3,1,2,1,1,1,1,2 633 | 1235807,5,1,1,1,2,1,2,1,1,2 634 | 1238777,1,1,1,1,2,1,1,1,1,2 635 | 1253955,8,7,4,4,5,3,5,10,1,4 636 | 1257366,3,1,1,1,2,1,1,1,1,2 637 | 1260659,3,1,4,1,2,1,1,1,1,2 638 | 1268952,10,10,7,8,7,1,10,10,3,4 639 | 1275807,4,2,4,3,2,2,2,1,1,2 640 | 1277792,4,1,1,1,2,1,1,1,1,2 641 | 1277792,5,1,1,3,2,1,1,1,1,2 642 | 1285722,4,1,1,3,2,1,1,1,1,2 643 | 1288608,3,1,1,1,2,1,2,1,1,2 644 | 1290203,3,1,1,1,2,1,2,1,1,2 645 | 1294413,1,1,1,1,2,1,1,1,1,2 646 | 1299596,2,1,1,1,2,1,1,1,1,2 647 | 1303489,3,1,1,1,2,1,2,1,1,2 648 | 1311033,1,2,2,1,2,1,1,1,1,2 649 | 1311108,1,1,1,3,2,1,1,1,1,2 650 | 1315807,5,10,10,10,10,2,10,10,10,4 651 | 1318671,3,1,1,1,2,1,2,1,1,2 652 | 1319609,3,1,1,2,3,4,1,1,1,2 653 | 1323477,1,2,1,3,2,1,2,1,1,2 654 | 1324572,5,1,1,1,2,1,2,2,1,2 655 | 1324681,4,1,1,1,2,1,2,1,1,2 656 | 1325159,3,1,1,1,2,1,3,1,1,2 657 | 1326892,3,1,1,1,2,1,2,1,1,2 658 | 1330361,5,1,1,1,2,1,2,1,1,2 659 | 1333877,5,4,5,1,8,1,3,6,1,2 660 | 1334015,7,8,8,7,3,10,7,2,3,4 661 | 1334667,1,1,1,1,2,1,1,1,1,2 662 | 1339781,1,1,1,1,2,1,2,1,1,2 663 | 1339781,4,1,1,1,2,1,3,1,1,2 664 | 13454352,1,1,3,1,2,1,2,1,1,2 665 | 1345452,1,1,3,1,2,1,2,1,1,2 666 | 1345593,3,1,1,3,2,1,2,1,1,2 667 | 1347749,1,1,1,1,2,1,1,1,1,2 668 | 1347943,5,2,2,2,2,1,1,1,2,2 669 | 1348851,3,1,1,1,2,1,3,1,1,2 670 | 1350319,5,7,4,1,6,1,7,10,3,4 671 | 1350423,5,10,10,8,5,5,7,10,1,4 672 | 1352848,3,10,7,8,5,8,7,4,1,4 673 | 1353092,3,2,1,2,2,1,3,1,1,2 674 | 
1354840,2,1,1,1,2,1,3,1,1,2 675 | 1354840,5,3,2,1,3,1,1,1,1,2 676 | 1355260,1,1,1,1,2,1,2,1,1,2 677 | 1365075,4,1,4,1,2,1,1,1,1,2 678 | 1365328,1,1,2,1,2,1,2,1,1,2 679 | 1368267,5,1,1,1,2,1,1,1,1,2 680 | 1368273,1,1,1,1,2,1,1,1,1,2 681 | 1368882,2,1,1,1,2,1,1,1,1,2 682 | 1369821,10,10,10,10,5,10,10,10,7,4 683 | 1371026,5,10,10,10,4,10,5,6,3,4 684 | 1371920,5,1,1,1,2,1,3,2,1,2 685 | 466906,1,1,1,1,2,1,1,1,1,2 686 | 466906,1,1,1,1,2,1,1,1,1,2 687 | 534555,1,1,1,1,2,1,1,1,1,2 688 | 536708,1,1,1,1,2,1,1,1,1,2 689 | 566346,3,1,1,1,2,1,2,3,1,2 690 | 603148,4,1,1,1,2,1,1,1,1,2 691 | 654546,1,1,1,1,2,1,1,1,8,2 692 | 654546,1,1,1,3,2,1,1,1,1,2 693 | 695091,5,10,10,5,4,5,4,4,1,4 694 | 714039,3,1,1,1,2,1,1,1,1,2 695 | 763235,3,1,1,1,2,1,2,1,2,2 696 | 776715,3,1,1,1,3,2,1,1,1,2 697 | 841769,2,1,1,1,2,1,1,1,1,2 698 | 888820,5,10,10,3,7,3,8,10,2,4 699 | 897471,4,8,6,4,3,4,10,6,1,4 700 | 897471,4,8,8,5,4,5,10,4,1,4 -------------------------------------------------------------------------------- /data/HorseKicks.txt: -------------------------------------------------------------------------------- 1 | Year GC C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C14 C15 1875 0 0 0 0 0 0 0 1 1 0 0 0 1 0 1876 2 0 0 0 1 0 0 0 0 0 0 0 1 1 1877 2 0 0 0 0 0 1 1 0 0 1 0 2 0 1878 1 2 2 1 1 0 0 0 0 0 1 0 1 0 1879 0 0 0 1 1 2 2 0 1 0 0 2 1 0 1880 0 3 2 1 1 1 0 0 0 2 1 4 3 0 1881 1 0 0 2 1 0 0 1 0 1 0 0 0 0 1882 1 2 0 0 0 0 1 0 1 1 2 1 4 1 1883 0 0 1 2 0 1 2 1 0 1 0 3 0 0 1884 3 0 1 0 0 0 0 1 0 0 2 0 1 1 1885 0 0 0 0 0 0 1 0 0 2 0 1 0 1 1886 2 1 0 0 1 1 1 0 0 1 0 1 3 0 1887 1 1 2 1 0 0 3 2 1 1 0 1 2 0 1888 0 1 1 0 0 1 1 0 0 0 0 1 1 0 1889 0 0 1 1 0 1 1 0 0 1 2 2 0 2 1890 1 2 0 2 0 1 1 2 0 2 1 1 2 2 1891 0 0 0 1 1 1 0 1 1 0 3 3 1 0 1892 1 3 2 0 1 1 3 0 1 1 0 1 1 0 1893 0 1 0 0 0 1 0 2 0 0 1 3 0 0 1894 1 0 0 0 0 0 0 0 1 0 1 1 0 0 -------------------------------------------------------------------------------- /data/Housefly_wing_lengths.txt: -------------------------------------------------------------------------------- 1 | 36 2 | 37 3 | 38 4 | 38 5 | 39 6 | 39 7 | 40 8 | 40 9 | 40 10 | 40 11 | 41 12 | 41 13 | 41 14 | 41 15 | 41 16 | 41 17 | 42 18 | 42 19 | 42 20 | 42 21 | 42 22 | 42 23 | 42 24 | 43 25 | 43 26 | 43 27 | 43 28 | 43 29 | 43 30 | 43 31 | 43 32 | 44 33 | 44 34 | 44 35 | 44 36 | 44 37 | 44 38 | 44 39 | 44 40 | 44 41 | 45 42 | 45 43 | 45 44 | 45 45 | 45 46 | 45 47 | 45 48 | 45 49 | 45 50 | 45 51 | 46 52 | 46 53 | 46 54 | 46 55 | 46 56 | 46 57 | 46 58 | 46 59 | 46 60 | 46 61 | 47 62 | 47 63 | 47 64 | 47 65 | 47 66 | 47 67 | 47 68 | 47 69 | 47 70 | 48 71 | 48 72 | 48 73 | 48 74 | 48 75 | 48 76 | 48 77 | 48 78 | 49 79 | 49 80 | 49 81 | 49 82 | 49 83 | 49 84 | 49 85 | 50 86 | 50 87 | 50 88 | 50 89 | 50 90 | 50 91 | 51 92 | 51 93 | 51 94 | 51 95 | 52 96 | 52 97 | 53 98 | 53 99 | 54 100 | 55 -------------------------------------------------------------------------------- /data/food_outlet_data.csv: -------------------------------------------------------------------------------- 1 | population,profit 2 | 6.1101,17.592 3 | 5.5277,9.1302 4 | 8.5186,13.662 5 | 7.0032,11.854 6 | 5.8598,6.8233 7 | 8.3829,11.886 8 | 7.4764,4.3483 9 | 8.5781,12 10 | 6.4862,6.5987 11 | 5.0546,3.8166 12 | 5.7107,3.2522 13 | 14.164,15.505 14 | 5.734,3.1551 15 | 8.4084,7.2258 16 | 5.6407,0.71618 17 | 5.3794,3.5129 18 | 6.3654,5.3048 19 | 5.1301,0.56077 20 | 6.4296,3.6518 21 | 7.0708,5.3893 22 | 6.1891,3.1386 23 | 20.27,21.767 24 | 5.4901,4.263 25 | 6.3261,5.1875 26 | 5.5649,3.0825 27 | 18.945,22.638 28 | 12.828,13.501 29 | 10.957,7.0467 30 | 13.176,14.692 31 | 
22.203,24.147 32 | 5.2524,-1.22 33 | 6.5894,5.9966 34 | 9.2482,12.134 35 | 5.8918,1.8495 36 | 8.2111,6.5426 37 | 7.9334,4.5623 38 | 8.0959,4.1164 39 | 5.6063,3.3928 40 | 12.836,10.117 41 | 6.3534,5.4974 42 | 5.4069,0.55657 43 | 6.8825,3.9115 44 | 11.708,5.3854 45 | 5.7737,2.4406 46 | 7.8247,6.7318 47 | 7.0931,1.0463 48 | 5.0702,5.1337 49 | 5.8014,1.844 50 | 11.7,8.0043 51 | 5.5416,1.0179 52 | 7.5402,6.7504 53 | 5.3077,1.8396 54 | 7.4239,4.2885 55 | 7.6031,4.9981 56 | 6.3328,1.4233 57 | 6.3589,-1.4211 58 | 6.2742,2.4756 59 | 5.6397,4.6042 60 | 9.3102,3.9624 61 | 9.4536,5.4141 62 | 8.8254,5.1694 63 | 5.1793,-0.74279 64 | 21.279,17.929 65 | 14.908,12.054 66 | 18.959,17.054 67 | 7.2182,4.8852 68 | 8.2951,5.7442 69 | 10.236,7.7754 70 | 5.4994,1.0173 71 | 20.341,20.992 72 | 10.136,6.6799 73 | 7.3345,4.0259 74 | 6.0062,1.2784 75 | 7.2259,3.3411 76 | 5.0269,-2.6807 77 | 6.5479,0.29678 78 | 7.5386,3.8845 79 | 5.0365,5.7014 80 | 10.274,6.7526 81 | 5.1077,2.0576 82 | 5.7292,0.47953 83 | 5.1884,0.20421 84 | 6.3557,0.67861 85 | 9.7687,7.5435 86 | 6.5159,5.3436 87 | 8.5172,4.2415 88 | 9.1802,6.7981 89 | 6.002,0.92695 90 | 5.5204,0.152 91 | 5.0594,2.8214 92 | 5.7077,1.8451 93 | 7.6366,4.2959 94 | 5.8707,7.2029 95 | 5.3054,1.9869 96 | 8.2934,0.14454 97 | 13.394,9.0551 98 | 5.4369,0.61705 99 | -------------------------------------------------------------------------------- /ideas.md: -------------------------------------------------------------------------------- 1 | # Idea List 2 | 3 | Possible topics for notebooks 4 | 5 | - Understanding Distributions 6 | - Hypothesis Testing 7 | - Linear Regression 8 | - Multicollinearity 9 | - Heteroskedasticity 10 | - Autocorrelation 11 | - Causality 12 | - Time series 13 | - Time series forecasting 14 | - Markov Chain 15 | - Poisson Process 16 | - Simulation 17 | - Panel Data 18 | - Cramer-Rao Lower Bound 19 | - Convergence 20 | - Limit Theorems 21 | - Decision Tree Algorithm 22 | - Understanding Transformations 23 | - Moment Generating Functions 24 | - ...and more -------------------------------------------------------------------------------- /images/Conditional.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PetalsOnWind/Statistics-and-Econometrics-for-Data-Science/c7bf6ac053f0f7791b1fdbd5ed0e0d06387544a1/images/Conditional.png -------------------------------------------------------------------------------- /images/JDTable.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PetalsOnWind/Statistics-and-Econometrics-for-Data-Science/c7bf6ac053f0f7791b1fdbd5ed0e0d06387544a1/images/JDTable.png -------------------------------------------------------------------------------- /images/Joint.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PetalsOnWind/Statistics-and-Econometrics-for-Data-Science/c7bf6ac053f0f7791b1fdbd5ed0e0d06387544a1/images/Joint.png -------------------------------------------------------------------------------- /images/Marginal.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PetalsOnWind/Statistics-and-Econometrics-for-Data-Science/c7bf6ac053f0f7791b1fdbd5ed0e0d06387544a1/images/Marginal.png -------------------------------------------------------------------------------- /images/Marginal2.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/PetalsOnWind/Statistics-and-Econometrics-for-Data-Science/c7bf6ac053f0f7791b1fdbd5ed0e0d06387544a1/images/Marginal2.png -------------------------------------------------------------------------------- /images/OneDirectional.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PetalsOnWind/Statistics-and-Econometrics-for-Data-Science/c7bf6ac053f0f7791b1fdbd5ed0e0d06387544a1/images/OneDirectional.png -------------------------------------------------------------------------------- /images/OnettwoTailed.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PetalsOnWind/Statistics-and-Econometrics-for-Data-Science/c7bf6ac053f0f7791b1fdbd5ed0e0d06387544a1/images/OnettwoTailed.png -------------------------------------------------------------------------------- /images/Table.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PetalsOnWind/Statistics-and-Econometrics-for-Data-Science/c7bf6ac053f0f7791b1fdbd5ed0e0d06387544a1/images/Table.png -------------------------------------------------------------------------------- /images/TwoDirectional.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PetalsOnWind/Statistics-and-Econometrics-for-Data-Science/c7bf6ac053f0f7791b1fdbd5ed0e0d06387544a1/images/TwoDirectional.png -------------------------------------------------------------------------------- /images/TypeIandTypeIIError.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/PetalsOnWind/Statistics-and-Econometrics-for-Data-Science/c7bf6ac053f0f7791b1fdbd5ed0e0d06387544a1/images/TypeIandTypeIIError.png -------------------------------------------------------------------------------- /notebooks/Baye's Theorem Notebook.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Bayes' Theorem" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "### Introduction\n", 15 | "\n", 16 | "Before starting with *Bayes Theorem* we can have a look at some definitions.\n", 17 | "\n", 18 | "**Conditional Probability :**\n", 19 | "Conditional probability is the probability of one event occurring given that one or more related events have occurred.\n", 20 | "Let A and B be two interdependent events, where A has already occurred; then the probability of B will be\n", 21 | " $$ P(B|A) = P(A \cap B)/P(A) $$\n", 22 | " \n", 23 | "**Joint Probability :**\n", 24 | "Joint probability is a statistical measure that calculates the likelihood of two events occurring together at the same point in time.\n", 25 | " $$ P(A \cap B) = P(A|B) * P(B) $$" 26 | ] 27 | },
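{ "cell_type": "markdown", "metadata": {}, "source": [ "A minimal numerical sketch of these two definitions (the values below are purely illustrative):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Conditional probability from a joint probability (illustrative values)\n", "p_a = 0.5                        #P(A)\n", "p_a_and_b = 0.2                  #P(A and B)\n", "p_b_given_a = p_a_and_b / p_a    #P(B|A) = P(A and B)/P(A)\n", "print(p_b_given_a)               #0.4\n", "print(p_b_given_a * p_a)         #recovers the joint probability: P(A and B) = P(B|A) * P(A) = 0.2\n" ] },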
,( i=1,2,3,...,n) of Random Experiment then for any Arbitrary event A of the Sample Space of the above Experiment with P(A)>0,we have\n", 41 | "\n", 42 | "$$ P(B_{i}|A) = P(B_{i})P(A|B_{i})/ \\sum\\limits_{i=1}^{n} P(B_{i})P(A|B_{i}) $$\n", 43 | "\n", 44 | "**Proof**\n", 45 | "\n", 46 | "Let S be the Sample Space of the Random Experiment.The Event B$_{1}$, B$_{2}$, B$_{3}$,.....,B$_{n}$ being Exhaustive\n", 47 | "$$ S = (B_{1} \\cup B_{2} \\cup ...\\cup B_{n}) \\hspace{1cm} \\hspace{0.1cm} [\\therefore A \\subset S] $$\n", 48 | "$$ A = A \\cap S = A \\cap ( B_{1} \\cup B_{2} \\cup B_{3},.....,\\cup B_{n}) $$\n", 49 | "$$ = (A \\cap B_{1}) \\cup (A \\cap B_{2}) \\cup ... \\cup (A \\cap B_{n}) $$\n", 50 | "\n", 51 | "$$ P(A) = P(A \\cap B_{1}) + P (A \\cap B_{2}) + ...+ P(A \\cap B_{n}) $$\n", 52 | "$$ \\hspace{3cm} \\hspace{0.1cm} = P(B_{1})P(A|B_{1}) + P(B_{2})P(A|B_{2}) + ... +P(B_{n})P(A|B_{n}) $$\n", 53 | "$$ = \\sum\\limits_{i=1}^{n} P(B_{i})P(A|B_{i}) $$\n", 54 | "\n", 55 | "Now,\n", 56 | "$$ P(A \\cap B_{i}) = P(A)P(B_{i}|A) $$\n", 57 | "$$ P(B_{i}|A) = P(A \\cap B_{i})/P(A) = P(B_{i})P(A|B_{i})/\\sum\\limits_{i=1}^{n} P(B_{i})P(A|B_{i}) $$\n", 58 | "\n", 59 | "**P(B)** is the Probability of occurence **B**.If we know that the event **A** has already occured.On knowing about the event **A**,**P(B)** is changed to **P(B|A)**.With the help of **Bayes Theorem we can Calculate P(B|A)**.\n", 60 | "\n", 61 | "**Naming Conventions :**\n", 62 | "\n", 63 | "
\n", 64 | "P(A/B) : Posterior Probability \n", 65 | "
\n", 66 | "P(A) : Prior Probability\n", 67 | "
\n", 68 | "P(B/A) : Likelihood\n", 69 | "
\n", 70 | "P(B) : Evidence \n", 71 | "
\n", 72 | "So, Bayes Theorem can be Restated as :\n", 73 | "$$ Posterior = Likelihood * Prior / Evidence $$\n", 74 | "\n" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | " Now we will be looking at some problem examples on Bayes Theorem.\n", 82 | " \n", 83 | "**Example 1** :Suppose that the reliability of a Covid-19 test is specified as follows:\n", 84 | "
\n", 85 | "Of Population having Covid-19 , 90% of the test detect the desire but 10% go undetected.Of Population free of Covid-19 , 99% of the test are judged Covid-19 -tive but 1% are diagnosed showing Covid-19 +tive.From a large population of which only 0.1% have Covid-19,one person is selected at Random,given the Covid-19 test,and the pathologist Report him/her as Covid-19 positive.What is the Probability that the person actually have Covid-19?" 86 | ] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": {}, 91 | "source": [ 92 | "**Solution**
\n", 93 | "Let,
\n", 94 | "B$_{1}$ = The Person Selected is Actually having Covid-19.
\n", 95 | "B$_{2}$ = The Person Selected is not having Covid-19.
\n", 96 | "A = The Person Covid-19 Test is Diagnosed as Positive.
\n", 97 | "\n", 98 | "P(B$_{1}$) = 0.1% = 0.1/100 = 0.001
\n", 99 | "P(B$_{2}$) = 1-P(B$_{1}$) = 1-0.001 = 0.999
\n", 100 | "P(A|B$_{1}$) = Probability that the person tested Covid-19 +tive given that he / she is actually having Covid-19.= 90/100 = 0.9
\n", 101 | "P(A|B$_{2}$) = Probability that the person tested Covid-19 +tive given that he / she is actually not having Covid-19.= 1/100 = 0.01
\n", 102 | "\n", 103 | "Required Probability = P(B$_{1}$|A) = P(B$_{1}$) * P(A|B$_{1}$)/ (((P(B$_{1}$) * P(A|B$_{1}$))+((P(B$_{2}$) * P(A|B$_{2}$)))
\n", 104 | " = (0.001 * 0.9)/(0.001 * 0.9+0.999 * 0.01) = 90/1089 =0.08264\n" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "We will Now use Python to calculate the same." 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": 1, 117 | "metadata": {}, 118 | "outputs": [ 119 | { 120 | "name": "stdout", 121 | "output_type": "stream", 122 | "text": [ 123 | "P(B1|A)= 8.264 %\n" 124 | ] 125 | } 126 | ], 127 | "source": [ 128 | "#calculate P(B1|A) given P(B1),P(A|B1),P(A|B2),P(B2)\n", 129 | "def bayes_theorem(p_b1,p_a_given_b1,p_a_given_b2,p_b2):\n", 130 | " p_b1_given_a=(p_b1*p_a_given_b1)/((p_b1*p_a_given_b1)+(p_b2*p_a_given_b2))\n", 131 | " return p_b1_given_a\n", 132 | "\n", 133 | "#P(B1)\n", 134 | "p_b1=0.001\n", 135 | "#P(B2)\n", 136 | "p_b2=0.999\n", 137 | "#P(A|B1)\n", 138 | "p_a_given_b1=0.9\n", 139 | "#P(A|B2)\n", 140 | "p_a_given_b2=0.01\n", 141 | "result=bayes_theorem(p_b1,p_a_given_b1,p_a_given_b2,p_b2)\n", 142 | "print('P(B1|A)=% .3f %%'%(result*100))\n", 143 | " " 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": {}, 149 | "source": [ 150 | "**Example 2 :** In a Quiz,a contestant either guesses or cheat or knows the answer to a multiple choice question with four choices.The Probability that he/she makes a guess is 1/3 and the Probability that he /she cheats the answer is 1/6.The Probability that his answer is correct,given that he cheated it,is 1/8.Find the Probability that he knows the answer to the question,given that he/she correctly answered it." 151 | ] 152 | }, 153 | { 154 | "cell_type": "markdown", 155 | "metadata": {}, 156 | "source": [ 157 | "**Solution**
\n", 158 | "Let,
\n", 159 | "B$_{1}$ = Contestant guesses the answer.
\n", 160 | "B$_{2}$ = Contestant cheated the answer.
\n", 161 | "B$_{3}$ = Contestant knows the answer.
\n", 162 | "A = Contestant answer correctly.
\n", 163 | "clearly,
\n", 164 | "P(B$_{1}$) = 1/3 , P(B$_{2}$) =1/6
\n", 165 | "\n", 166 | "Since B$_{1}$ ,B$_{2}$, B$_{3}$ are mutually exclusive and exhaustive event.\n", 167 | "P(B$_{1}$) + P(B$_{2}$) + P(B$_{3}$) = 1 => P(B$_{3}$) = 1 - (P(B$_{1}$) + P(B$_{2}$))\n", 168 | "=1-1/3-1/6=1/2\n", 169 | "\n", 170 | "\n", 171 | "If B$_{1}$ has already occured,the contestant guesses,the there are four choices out of which only one is correct.
\n", 172 | "$\\therefore$ the Probability that he answers correctly given that he/she has made a guess is 1/4 i.e. **P(A|B$-{1}$) = 1/4**
\n", 173 | "It is given that he knew the answer = 1
\n", 174 | "By Bayes Theorem,
\n", 175 | "Required Probability = P(B$_{3}$|A)
\n", 176 | "\n", 177 | "= P(B$_{3}$)P(A|B$_{3}$)/(P(B$_{1}$)P(A|B$_{1}$)+P(B$_{2}$)P(A|B$_{2}$)+P(B$_{3}$)P(A|B$_{3}$))\n", 178 | "= (1/2 * 1) / ((1/3 * 1/4) + (1/6 * 1/8) + (1/2 * 1))=24/29\n", 179 | "\n" 180 | ] 181 | }, 182 | { 183 | "cell_type": "code", 184 | "execution_count": 2, 185 | "metadata": {}, 186 | "outputs": [ 187 | { 188 | "name": "stdout", 189 | "output_type": "stream", 190 | "text": [ 191 | "P(B3|A)= 82.759 %\n" 192 | ] 193 | } 194 | ], 195 | "source": [ 196 | "#calculate P(B1|A) given P(B1),P(A|B1),P(A|B2),P(B2),P(B3),P(A|B3)\n", 197 | "def bayes_theorem(p_b1,p_a_given_b1,p_a_given_b2,p_b2,p_b3,p_a_given_b3):\n", 198 | " p_b3_given_a=(p_b3*p_a_given_b3)/((p_b1*p_a_given_b1)+(p_b2*p_a_given_b2)+(p_b3*p_a_given_b3))\n", 199 | " return p_b3_given_a\n", 200 | "\n", 201 | "#P(B1)\n", 202 | "p_b1=1/3\n", 203 | "#P(B2)\n", 204 | "p_b2=1/6\n", 205 | "#P(B3)\n", 206 | "p_b3=1/2\n", 207 | "#P(A|B1)\n", 208 | "p_a_given_b1=1/4\n", 209 | "#P(A|B2)\n", 210 | "p_a_given_b2=1/8\n", 211 | "#P(A|B3)\n", 212 | "p_a_given_b3=1\n", 213 | "result=bayes_theorem(p_b1,p_a_given_b1,p_a_given_b2,p_b2,p_b3,p_a_given_b3)\n", 214 | "print('P(B3|A)=% .3f %%'%(result*100))\n", 215 | " " 216 | ] 217 | } 218 | ], 219 | "metadata": { 220 | "kernelspec": { 221 | "display_name": "Python 3", 222 | "language": "python", 223 | "name": "python3" 224 | }, 225 | "language_info": { 226 | "codemirror_mode": { 227 | "name": "ipython", 228 | "version": 3 229 | }, 230 | "file_extension": ".py", 231 | "mimetype": "text/x-python", 232 | "name": "python", 233 | "nbconvert_exporter": "python", 234 | "pygments_lexer": "ipython3", 235 | "version": "3.8.5" 236 | } 237 | }, 238 | "nbformat": 4, 239 | "nbformat_minor": 4 240 | } 241 | -------------------------------------------------------------------------------- /notebooks/Correlation with Example/README.md: -------------------------------------------------------------------------------- 1 | README.md 2 | 3 | Correlation function explanation with an example of Movie Recommendation. 
-------------------------------------------------------------------------------- /notebooks/Decision Tree.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Decision Tree with pure Python (Without External Library)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Python and OOP" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "##### Train Data" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 3, 27 | "metadata": {}, 28 | "outputs": [], 29 | "source": [ 30 | "training_data = [['Green',3,'Apple'],\n", 31 | " ['Yellow',3,'Apple'],\n", 32 | " ['Red',1,'Grape'],\n", 33 | " ['Red',1,'Grape'],\n", 34 | " ['Yellow',3,'Lemon']] #TOY DATASET" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "##### Column Labels\n", 42 | "These are used to print tree" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 4, 48 | "metadata": {}, 49 | "outputs": [], 50 | "source": [ 51 | "header = [\"Color\",\"Diameter\",\"label\"]" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "metadata": {}, 57 | "source": [ 58 | "##### Finds Unique values for a column in dataset" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": 5, 64 | "metadata": {}, 65 | "outputs": [], 66 | "source": [ 67 | "def unique_vals(Data,col): #training_data,0\n", 68 | " return set([row[col] for row in Data])" 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": 6, 74 | "metadata": {}, 75 | "outputs": [ 76 | { 77 | "data": { 78 | "text/plain": [ 79 | "{'Apple', 'Grape', 'Lemon'}" 80 | ] 81 | }, 82 | "execution_count": 6, 83 | "metadata": {}, 84 | "output_type": "execute_result" 85 | } 86 | ], 87 | "source": [ 88 | "unique_vals(training_data,2)" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "demo" 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": 7, 101 | "metadata": {}, 102 | "outputs": [ 103 | { 104 | "data": { 105 | "text/plain": [ 106 | "{'Apple': 2, 'Grape': 2, 'Lemon': 1}" 107 | ] 108 | }, 109 | "execution_count": 7, 110 | "metadata": {}, 111 | "output_type": "execute_result" 112 | } 113 | ], 114 | "source": [ 115 | "{'Apple':2,'Grape':2,'Lemon':1}" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "##### Count the number of each type in dataset" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 8, 128 | "metadata": {}, 129 | "outputs": [], 130 | "source": [ 131 | "def class_counts(Data):\n", 132 | " counts = {}\n", 133 | " for row in Data:\n", 134 | " label = row[-1]\n", 135 | " if label not in counts:\n", 136 | " counts[label] = 0\n", 137 | " counts[label] += 1\n", 138 | " return counts" 139 | ] 140 | }, 141 | { 142 | "cell_type": "markdown", 143 | "metadata": {}, 144 | "source": [ 145 | "demo" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 9, 151 | "metadata": { 152 | "scrolled": true 153 | }, 154 | "outputs": [ 155 | { 156 | "data": { 157 | "text/plain": [ 158 | "{'Apple': 2, 'Grape': 2, 'Lemon': 1}" 159 | ] 160 | }, 161 | "execution_count": 9, 162 | "metadata": {}, 163 | "output_type": "execute_result" 164 | } 165 | ], 166 | "source": [ 167 | "class_counts(training_data)" 168 | ] 169 | }, 170 | { 171 | 
"cell_type": "markdown", 172 | "metadata": {}, 173 | "source": [ 174 | "##### A Question is used to partition a dataset.\n", 175 | "\n", 176 | "This class just records a 'colum number (e.g., 0 for Color) and a column value (e.g., Green). The match method is used to compare the feature value in an example to the feature value stored in the question. See the demo below." 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": 10, 182 | "metadata": {}, 183 | "outputs": [], 184 | "source": [ 185 | "class Question:\n", 186 | " def __init__(self,column,value):\n", 187 | " self.column = column\n", 188 | " self.value = value\n", 189 | " def match(self,example): #example means row --> ['Green',3,'Apple']\n", 190 | " val = example[self.column]\n", 191 | " return val == self.value #'Green' == 'Red', returns a boolean\n", 192 | " def __repr__(self):\n", 193 | " return \"Is %s %s %s?\" % (header[self.column],\"==\",str(self.value))" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 11, 199 | "metadata": {}, 200 | "outputs": [ 201 | { 202 | "data": { 203 | "text/plain": [ 204 | "Is Color == Red?" 205 | ] 206 | }, 207 | "execution_count": 11, 208 | "metadata": {}, 209 | "output_type": "execute_result" 210 | } 211 | ], 212 | "source": [ 213 | "Question(0,\"Red\")" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": 12, 219 | "metadata": {}, 220 | "outputs": [ 221 | { 222 | "name": "stdout", 223 | "output_type": "stream", 224 | "text": [ 225 | "Is Diameter == 4?\n" 226 | ] 227 | } 228 | ], 229 | "source": [ 230 | "q = Question(1,4)\n", 231 | "print(q)" 232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": 13, 237 | "metadata": { 238 | "scrolled": true 239 | }, 240 | "outputs": [ 241 | { 242 | "data": { 243 | "text/plain": [ 244 | "False" 245 | ] 246 | }, 247 | "execution_count": 13, 248 | "metadata": {}, 249 | "output_type": "execute_result" 250 | } 251 | ], 252 | "source": [ 253 | "q.match(training_data[0])" 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "metadata": {}, 259 | "source": [ 260 | "##### Partitions a dataset.\n", 261 | "\n", 262 | "For each row in the dataset, check if it matches the question. If so, add it to 'true rows', otherwise, add it to 'false rows'." 
263 | ] 264 | }, 265 | { 266 | "cell_type": "code", 267 | "execution_count": 14, 268 | "metadata": {}, 269 | "outputs": [], 270 | "source": [ 271 | "def partition(Data,question):\n", 272 | " true_rows,false_rows = [],[]\n", 273 | " for row in Data: #row is also called example --> ['Green',3,'Apple']\n", 274 | " if(question.match(row)):\n", 275 | " true_rows.append(row) # --> [['Green',3,'Apple']]\n", 276 | " else:\n", 277 | " false_rows.append(row)\n", 278 | " return true_rows,false_rows" 279 | ] 280 | }, 281 | { 282 | "cell_type": "code", 283 | "execution_count": 15, 284 | "metadata": {}, 285 | "outputs": [ 286 | { 287 | "name": "stdout", 288 | "output_type": "stream", 289 | "text": [ 290 | "True Rows: [['Green', 3, 'Apple']]\n", 291 | "False Rows: [['Yellow', 3, 'Apple'], ['Red', 1, 'Grape'], ['Red', 1, 'Grape'], ['Yellow', 3, 'Lemon']]\n" 292 | ] 293 | } 294 | ], 295 | "source": [ 296 | "true_rows,false_rows = partition(training_data,\n", 297 | " Question(0,'Green'))\n", 298 | "print('True Rows: ',true_rows)\n", 299 | "print('False Rows: ',false_rows)" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": {}, 305 | "source": [ 306 | "##### Calculate the Gini Impurity for a list of rows.\n", 307 | "\n", 308 | "There are a few different ways to do this, I thought this one was the most concise. See: https://en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity" 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": 16, 314 | "metadata": {}, 315 | "outputs": [], 316 | "source": [ 317 | "def gini(Data):\n", 318 | " counts = class_counts(Data)\n", 319 | " impurity = 1\n", 320 | " for lbl in counts:\n", 321 | " prob_of_lbl = counts[lbl]/float(len(Data))\n", 322 | "# print(prob_of_lbl)\n", 323 | " impurity-=prob_of_lbl**2\n", 324 | "# print(impurity)\n", 325 | " return impurity" 326 | ] 327 | }, 328 | { 329 | "cell_type": "code", 330 | "execution_count": 17, 331 | "metadata": { 332 | "scrolled": false 333 | }, 334 | "outputs": [ 335 | { 336 | "data": { 337 | "text/plain": [ 338 | "0.6399999999999999" 339 | ] 340 | }, 341 | "execution_count": 17, 342 | "metadata": {}, 343 | "output_type": "execute_result" 344 | } 345 | ], 346 | "source": [ 347 | "gini(training_data)" 348 | ] 349 | }, 350 | { 351 | "cell_type": "markdown", 352 | "metadata": {}, 353 | "source": [ 354 | "##### Information Gain.\n", 355 | "\n", 356 | "The uncertainty of the starting node, minus the weighted impurity of two child nodes." 
357 | ] 358 | }, 359 | { 360 | "cell_type": "code", 361 | "execution_count": 18, 362 | "metadata": {}, 363 | "outputs": [], 364 | "source": [ 365 | "def info_gain(left,right,current_impurity): #current impurity means GDS \n", 366 | " #left means true, right means false\n", 367 | " p = float(len(left))/(len(left)+len(right)) #prob of true rows\n", 368 | " return current_impurity - p*gini(left) - (1-p)*gini(right)" 369 | ] 370 | }, 371 | { 372 | "cell_type": "code", 373 | "execution_count": 19, 374 | "metadata": {}, 375 | "outputs": [], 376 | "source": [ 377 | "true_rows,false_rows = partition(training_data,\n", 378 | " Question(1,3))" 379 | ] 380 | }, 381 | { 382 | "cell_type": "code", 383 | "execution_count": 20, 384 | "metadata": { 385 | "scrolled": true 386 | }, 387 | "outputs": [ 388 | { 389 | "data": { 390 | "text/plain": [ 391 | "0.37333333333333324" 392 | ] 393 | }, 394 | "execution_count": 20, 395 | "metadata": {}, 396 | "output_type": "execute_result" 397 | } 398 | ], 399 | "source": [ 400 | "info_gain(true_rows,false_rows,gini(training_data))" 401 | ] 402 | }, 403 | { 404 | "cell_type": "markdown", 405 | "metadata": {}, 406 | "source": [ 407 | "###### Find the best question to ask by iterating over every feature / value and calculating the information gain." 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": 27, 413 | "metadata": {}, 414 | "outputs": [], 415 | "source": [ 416 | "def find_best_split(Data):\n", 417 | " best_gain = 0\n", 418 | " best_question = None\n", 419 | " current_impurity = gini(Data) #Gds\n", 420 | " n_features = len(Data[0]) - 1\n", 421 | " for col in range(n_features): #0\n", 422 | " values = unique_vals(Data,col) #[Green,Red,Yellow]\n", 423 | " for val in values:\n", 424 | " question = Question(col,val)\n", 425 | " true_rows,false_rows = partition(Data,question)\n", 426 | " if(len(true_rows) == 0 or len(false_rows)==0):\n", 427 | " continue\n", 428 | " gain = info_gain(true_rows,\n", 429 | " false_rows,\n", 430 | " current_impurity)\n", 431 | " if gain>=best_gain:\n", 432 | " best_gain, best_question = gain , question\n", 433 | " return best_gain,best_question" 434 | ] 435 | }, 436 | { 437 | "cell_type": "code", 438 | "execution_count": 28, 439 | "metadata": {}, 440 | "outputs": [ 441 | { 442 | "name": "stdout", 443 | "output_type": "stream", 444 | "text": [ 445 | "Is Diameter == 3?\n", 446 | "0.37333333333333324\n" 447 | ] 448 | } 449 | ], 450 | "source": [ 451 | "best_gain,best_question = find_best_split(training_data)\n", 452 | "print(best_question)\n", 453 | "print(best_gain)" 454 | ] 455 | }, 456 | { 457 | "cell_type": "markdown", 458 | "metadata": {}, 459 | "source": [ 460 | "##### A Leaf node classifies data.\n", 461 | "\n", 462 | "This holds a dictionary of class (e.g., \"Mango\") -> number of times it appears in the rows from the training data that reach this leaf." 463 | ] 464 | }, 465 | { 466 | "cell_type": "code", 467 | "execution_count": 70, 468 | "metadata": {}, 469 | "outputs": [], 470 | "source": [ 471 | "class Leaf:\n", 472 | " def __init__(self,Data):\n", 473 | " self.predictions = class_counts(Data)" 474 | ] 475 | }, 476 | { 477 | "cell_type": "markdown", 478 | "metadata": {}, 479 | "source": [ 480 | "##### A Decision Node asks a question.\n", 481 | "\n", 482 | "This holds a reference to the question, and to the two child nodes." 
483 | ] 484 | }, 485 | { 486 | "cell_type": "code", 487 | "execution_count": 71, 488 | "metadata": {}, 489 | "outputs": [], 490 | "source": [ 491 | "class Decision_Node:\n", 492 | "    def __init__(self, question, true_branch,false_branch):\n", 493 | "        self.question = question\n", 494 | "        self.true_branch = true_branch\n", 495 | "        self.false_branch = false_branch\n", 496 | "        #print(self.question)" 497 | ] 498 | }, 499 | { 500 | "cell_type": "markdown", 501 | "metadata": {}, 502 | "source": [ 503 | "##### Builds the tree.\n", 504 | "\n", 505 | "Try partitioning the dataset on each of the unique attribute values, \n", 506 | "calculate the information gain, \n", 507 | "and split on the question that produces the highest gain." 508 | ] 509 | }, 510 | { 511 | "cell_type": "code", 512 | "execution_count": 72, 513 | "metadata": {}, 514 | "outputs": [], 515 | "source": [ 516 | "def build_tree(Data,i=0):\n", 517 | "    gain, question = find_best_split(Data) #FIND BEST QUESTION\n", 518 | "    \n", 519 | "\n", 520 | "    # Base case: no further info gain \n", 521 | "    # since we can ask no further questions,\n", 522 | "    # we'll return a leaf.\n", 523 | "    if gain == 0:\n", 524 | "        return Leaf(Data)\n", 525 | "    \n", 526 | "    # If we reach here, we have found a useful feature / value \n", 527 | "    # to partition on.\n", 528 | "    true_rows , false_rows = partition(Data,question)\n", 529 | "    \n", 530 | "    # Recursively build the two subtrees.\n", 531 | "    true_branch = build_tree(true_rows,i)\n", 532 | "    false_branch = build_tree(false_rows,i)\n", 533 | "\n", 534 | "    # Return a Question node.\n", 535 | "    # This records the best feature / value to ask at this point, \n", 536 | "    # as well as the branches to follow\n", 537 | "    # depending on the answer.\n", 538 | "    return Decision_Node(question,true_branch,false_branch)" 539 | ] 540 | }, 541 | { 542 | "cell_type": "code", 543 | "execution_count": 73, 544 | "metadata": {}, 545 | "outputs": [ 546 | { 547 | "name": "stdout", 548 | "output_type": "stream", 549 | "text": [ 550 | "<__main__.Decision_Node object at 0x0000025EE832F780>\n" 551 | ] 552 | } 553 | ], 554 | "source": [ 555 | "my_tree = build_tree(training_data)\n", 556 | "print(my_tree)" 557 | ] 558 | }, 559 | { 560 | "cell_type": "code", 561 | "execution_count": 74, 562 | "metadata": {}, 563 | "outputs": [], 564 | "source": [ 565 | "def print_tree(node,spacing=\"\"):\n", 566 | "    if isinstance(node, Leaf):\n", 567 | "        print(spacing + \"Predict\",node.predictions)\n", 568 | "        return\n", 569 | "    print(spacing+str(node.question))\n", 570 | "    print(spacing + \"--> True:\")\n", 571 | "    print_tree(node.true_branch , spacing + \"\\t\")\n", 572 | "    \n", 573 | "    print(spacing + \"--> False:\")\n", 574 | "    print_tree(node.false_branch , spacing + \"\\t\")\n", 575 | "    " 576 | ] 577 | }, 578 | { 579 | "cell_type": "code", 580 | "execution_count": 75, 581 | "metadata": { 582 | "scrolled": false 583 | }, 584 | "outputs": [ 585 | { 586 | "name": "stdout", 587 | "output_type": "stream", 588 | "text": [ 589 | "Is Diameter == 3?\n", 590 | "--> True:\n", 591 | "\tIs Color == Yellow?\n", 592 | "\t--> True:\n", 593 | "\t\tPredict {'Lemon': 1, 'Apple': 1}\n", 594 | "\t--> False:\n", 595 | "\t\tPredict {'Apple': 1}\n", 596 | "--> False:\n", 597 | "\tPredict {'Grape': 2}\n" 598 | ] 599 | } 600 | ], 601 | "source": [ 602 | "print_tree(my_tree)" 603 | ] 604 | }, 605 | { 606 | "cell_type": "code", 607 | "execution_count": 76, 608 | "metadata": {}, 609 | "outputs": [], 610 | "source": [ 611 | "def print_leaf(counts):\n", 612 | "    total = 
sum(counts.values())*1.0\n", 613 | " probs = {}\n", 614 | " for lbl in counts.keys():\n", 615 | " probs[lbl] = str(int(counts[lbl]/total * 100)) + \"%\"\n", 616 | " return probs" 617 | ] 618 | }, 619 | { 620 | "cell_type": "code", 621 | "execution_count": 77, 622 | "metadata": {}, 623 | "outputs": [], 624 | "source": [ 625 | "def classify(row,node):\n", 626 | " if isinstance(node,Leaf):\n", 627 | " return node.predictions\n", 628 | " \n", 629 | " # Decide whether to follow the true-branch or the false-branch.\n", 630 | " # Compate the feature / value stored in the node, \n", 631 | " # to the example we're considering.\n", 632 | " if node.question.match(row):\n", 633 | " return classify(row,node.true_branch)\n", 634 | " else:\n", 635 | " return classify(row,node.false_branch)" 636 | ] 637 | }, 638 | { 639 | "cell_type": "markdown", 640 | "metadata": {}, 641 | "source": [ 642 | "##### Test Data" 643 | ] 644 | }, 645 | { 646 | "cell_type": "code", 647 | "execution_count": 28, 648 | "metadata": {}, 649 | "outputs": [], 650 | "source": [ 651 | "testing_data = [\n", 652 | " [\"Red\",1,\"Apple\"],\n", 653 | " [\"Yellow\" , 3 , \"Apple\"]\n", 654 | "]" 655 | ] 656 | }, 657 | { 658 | "cell_type": "markdown", 659 | "metadata": {}, 660 | "source": [ 661 | "##### Comparing actual and predicted value" 662 | ] 663 | }, 664 | { 665 | "cell_type": "code", 666 | "execution_count": 29, 667 | "metadata": {}, 668 | "outputs": [ 669 | { 670 | "name": "stdout", 671 | "output_type": "stream", 672 | "text": [ 673 | "Actual: Apple. Predicted: {'Grape': '100%'}\n", 674 | "Actual: Apple. Predicted: {'Apple': '50%', 'Lemon': '50%'}\n" 675 | ] 676 | } 677 | ], 678 | "source": [ 679 | "for row in testing_data:\n", 680 | " print(\"Actual: %s. Predicted: %s\" % \n", 681 | " (row[-1],print_leaf(classify(row,my_tree))))" 682 | ] 683 | }, 684 | { 685 | "cell_type": "markdown", 686 | "metadata": {}, 687 | "source": [ 688 | "##### Training model with 2nd dataset to increase accuracy" 689 | ] 690 | }, 691 | { 692 | "cell_type": "code", 693 | "execution_count": 90, 694 | "metadata": {}, 695 | "outputs": [], 696 | "source": [ 697 | "header=[\"outlook\",\"temperature\",\"humidity\",\"wind\",\"decision\"]\n", 698 | "\n", 699 | "training_data2= [\n", 700 | "['sunny','hot','high','weak','no'],\n", 701 | "['sunny','hot','high','strong','no'],\n", 702 | "['overcast','hot','high','weak','yes'],\n", 703 | "['rain','mild','high','weak','yes'],\n", 704 | "['rain','cool','normal','weak','yes'],\n", 705 | "['rain','cool','normal','strong','no'],\n", 706 | "['overcast','cool','normal','strong','yes'],\n", 707 | "['sunny','mild','high','weak','no'],\n", 708 | "['sunny','cool','normal','weak','yes'],\n", 709 | "['rain','mild','normal','weak','yes'],\n", 710 | "['sunny','mild','normal','strong','yes'],\n", 711 | "['overcast','mild','high','strong','yes'],\n", 712 | "['overcast','hot','normal','weak','yes'],\n", 713 | "['rain','mild','high','strong','no'],\n", 714 | "]" 715 | ] 716 | }, 717 | { 718 | "cell_type": "markdown", 719 | "metadata": {}, 720 | "source": [ 721 | "##### Build the Decision tree with \"training_data2\" dataset" 722 | ] 723 | }, 724 | { 725 | "cell_type": "code", 726 | "execution_count": 91, 727 | "metadata": {}, 728 | "outputs": [], 729 | "source": [ 730 | "my_tree2 = build_tree(training_data2)" 731 | ] 732 | }, 733 | { 734 | "cell_type": "markdown", 735 | "metadata": {}, 736 | "source": [ 737 | "#### Print The Final Tree" 738 | ] 739 | }, 740 | { 741 | "cell_type": "code", 742 | "execution_count": 92, 743 | "metadata": {}, 
744 | "outputs": [ 745 | { 746 | "name": "stdout", 747 | "output_type": "stream", 748 | "text": [ 749 | "Is outlook == overcast?\n", 750 | "--> True:\n", 751 | "\tPredict {'yes': 4}\n", 752 | "--> False:\n", 753 | "\tIs humidity == high?\n", 754 | "\t--> True:\n", 755 | "\t\tIs outlook == sunny?\n", 756 | "\t\t--> True:\n", 757 | "\t\t\tPredict {'no': 3}\n", 758 | "\t\t--> False:\n", 759 | "\t\t\tIs wind == strong?\n", 760 | "\t\t\t--> True:\n", 761 | "\t\t\t\tPredict {'no': 1}\n", 762 | "\t\t\t--> False:\n", 763 | "\t\t\t\tPredict {'yes': 1}\n", 764 | "\t--> False:\n", 765 | "\t\tIs wind == strong?\n", 766 | "\t\t--> True:\n", 767 | "\t\t\tIs temperature == cool?\n", 768 | "\t\t\t--> True:\n", 769 | "\t\t\t\tPredict {'no': 1}\n", 770 | "\t\t\t--> False:\n", 771 | "\t\t\t\tPredict {'yes': 1}\n", 772 | "\t\t--> False:\n", 773 | "\t\t\tPredict {'yes': 3}\n" 774 | ] 775 | } 776 | ], 777 | "source": [ 778 | "print_tree(my_tree2)" 779 | ] 780 | }, 781 | { 782 | "cell_type": "code", 783 | "execution_count": 93, 784 | "metadata": {}, 785 | "outputs": [], 786 | "source": [ 787 | "testing_data2 = [\"overcast\",\"mild\",\"normal\",\"weak\"]" 788 | ] 789 | }, 790 | { 791 | "cell_type": "code", 792 | "execution_count": 94, 793 | "metadata": {}, 794 | "outputs": [ 795 | { 796 | "data": { 797 | "text/plain": [ 798 | "{'yes': 4}" 799 | ] 800 | }, 801 | "execution_count": 94, 802 | "metadata": {}, 803 | "output_type": "execute_result" 804 | } 805 | ], 806 | "source": [ 807 | "classify(testing_data2,my_tree2)" 808 | ] 809 | }, 810 | { 811 | "cell_type": "code", 812 | "execution_count": 95, 813 | "metadata": {}, 814 | "outputs": [ 815 | { 816 | "name": "stdout", 817 | "output_type": "stream", 818 | "text": [ 819 | "Predicted: {'yes': '100%'}\n" 820 | ] 821 | } 822 | ], 823 | "source": [ 824 | "print(\"Predicted: %s\" % (print_leaf(classify(testing_data2,my_tree2))))" 825 | ] 826 | }, 827 | { 828 | "cell_type": "code", 829 | "execution_count": null, 830 | "metadata": {}, 831 | "outputs": [], 832 | "source": [] 833 | } 834 | ], 835 | "metadata": { 836 | "kernelspec": { 837 | "display_name": "Python 3", 838 | "language": "python", 839 | "name": "python3" 840 | }, 841 | "language_info": { 842 | "codemirror_mode": { 843 | "name": "ipython", 844 | "version": 3 845 | }, 846 | "file_extension": ".py", 847 | "mimetype": "text/x-python", 848 | "name": "python", 849 | "nbconvert_exporter": "python", 850 | "pygments_lexer": "ipython3", 851 | "version": "3.8.5" 852 | } 853 | }, 854 | "nbformat": 4, 855 | "nbformat_minor": 2 856 | } 857 | -------------------------------------------------------------------------------- /notebooks/Dummy Classifier Notebook.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Dummy Classifier " 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "## What is a `DummyClassifier`?\n", 15 | "\n", 16 | "DummyClassifier is a classifier that makes predictions using simple rules, which can be\n", 17 | "useful as a baseline for comparison against actual classifiers, especially with imbalanced classes(where the class distribution is not equal or close to equal, and is instead biased or skewed).\n", 18 | "\n", 19 | "A dummy classifier is basically a classifier which doesn’t even look at the training data while classification, but follows just a rule of thumb or strategy that we instruct it to use while classifying. 
It is done by including the strategy we want in the strategy parameter of the `DummyClassifier`. The main notion behind using a dummy classifier is that a classifier based on an analytic approach should do better than a random-guessing approach.\n" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "## Strategies used in Dummy Classifier\n", 27 | "\n", 28 | "The scikit-learn `DummyClassifier` class implements several strategies for random guessing classifiers. \n", 29 | "The strategies are as follows:\n", 30 | "\n", 31 | "- stratified : This strategy generates predictions using the training set's class distribution.\n", 32 | "- most_frequent : This always predicts the most frequent label in the training set.\n", 33 | "- prior : This predicts the class that maximises the class prior.\n", 34 | "- uniform : This generates predictions uniformly at random.\n", 35 | "- constant : Always predicts a constant, user-defined label. This is specifically useful for metrics that evaluate a non-majority class." 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | " ## Explanation through Implementation\n", 43 | " \n", 44 | "The dummy classifier gives a measure of \"baseline\" performance--i.e. the success rate one should expect to achieve even if simply guessing.\n", 45 | "\n", 46 | "Suppose one wishes to determine whether a given object possesses a certain property. If, after analyzing a large number of the objects, it is found that 90% contain the target property, then guessing that every future instance of the object possesses the target property gives a 90% likelihood of guessing correctly. Structuring these guesses is equivalent to using the `most_frequent` strategy of the dummy classifier.\n", 47 | "\n", 48 | "Because many machine learning tasks attempt to increase the success rate of (e.g.) classification tasks, evaluating the baseline success rate can afford a floor value for the minimal value one's classifier should out-perform. \n", 49 | "\n", 50 | "If one trains a dummy classifier with the `stratified` parameter using the data discussed above, that classifier will predict that there is a 90% probability that each object it encounters possesses the target property. This is different from training a dummy classifier with the `most_frequent` parameter, as the latter would guess that all future objects possess the target property. Here's some code to illustrate:" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 1, 56 | "metadata": {}, 57 | "outputs": [], 58 | "source": [ 59 | "import numpy as np \n", 60 | "import pandas as pd \n", 61 | "import matplotlib.pyplot as plt \n", 62 | "import seaborn as sns " 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 2, 68 | "metadata": {}, 69 | "outputs": [ 70 | { 71 | "data": { 72 | "text/html": [ 73 | "
\n", 74 | "\n", 87 | "\n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | "
PregnanciesGlucoseBloodPressureSkinThicknessInsulinBMIDiabetesPedigreeFunctionAgeOutcome
Id
061487235033.60.627501
28183640023.30.672321
318966239428.10.167210
\n", 153 | "
" 154 | ], 155 | "text/plain": [ 156 | " Pregnancies Glucose BloodPressure SkinThickness Insulin BMI \\\n", 157 | "Id \n", 158 | "0 6 148 72 35 0 33.6 \n", 159 | "2 8 183 64 0 0 23.3 \n", 160 | "3 1 89 66 23 94 28.1 \n", 161 | "\n", 162 | " DiabetesPedigreeFunction Age Outcome \n", 163 | "Id \n", 164 | "0 0.627 50 1 \n", 165 | "2 0.672 32 1 \n", 166 | "3 0.167 21 0 " 167 | ] 168 | }, 169 | "execution_count": 2, 170 | "metadata": {}, 171 | "output_type": "execute_result" 172 | } 173 | ], 174 | "source": [ 175 | "import pandas as pd\n", 176 | "import matplotlib.pyplot as plt\n", 177 | "df_train = pd.read_csv(\"C:/Users/sshre/OneDrive/Documents/DATA SCIENCE/train.csv\")\n", 178 | "df_train.set_index(\"Id\").head(3)" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 3, 184 | "metadata": {}, 185 | "outputs": [], 186 | "source": [ 187 | "X = df_train.drop([\"Outcome\"],axis=1)\n", 188 | "y = df_train[\"Outcome\"]" 189 | ] 190 | }, 191 | { 192 | "cell_type": "markdown", 193 | "metadata": {}, 194 | "source": [ 195 | "Dividing the data set into training and test data for analysis" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 4, 201 | "metadata": {}, 202 | "outputs": [], 203 | "source": [ 204 | "from sklearn.model_selection import train_test_split\n", 205 | "\n", 206 | "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "Checking the dummyclassifier performance with different strategies." 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": 8, 219 | "metadata": {}, 220 | "outputs": [ 221 | { 222 | "name": "stdout", 223 | "output_type": "stream", 224 | "text": [ 225 | "0.6276595744680851\n", 226 | "0.5638297872340425\n", 227 | "0.574468085106383\n" 228 | ] 229 | } 230 | ], 231 | "source": [ 232 | "from sklearn.metrics import accuracy_score\n", 233 | "from sklearn.dummy import DummyClassifier\n", 234 | "strategies = ['most_frequent', 'stratified', 'uniform'] \n", 235 | " \n", 236 | "test_scores = [] \n", 237 | "for s in strategies: \n", 238 | " \n", 239 | " dclf = DummyClassifier(strategy = s, random_state = 0) \n", 240 | " dclf.fit(X_train, y_train) \n", 241 | " prediction=dclf.predict(X_test)\n", 242 | " score=(accuracy_score(y_test,prediction)) \n", 243 | " test_scores.append(score)\n", 244 | " print(score)" 245 | ] 246 | }, 247 | { 248 | "cell_type": "markdown", 249 | "metadata": {}, 250 | "source": [ 251 | "Plotting the performace score of the dummyclassifier " 252 | ] 253 | }, 254 | { 255 | "cell_type": "code", 256 | "execution_count": 9, 257 | "metadata": {}, 258 | "outputs": [ 259 | { 260 | "data": { 261 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYgAAAEHCAYAAAC0pdErAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAc2ElEQVR4nO3dfZxcVZ3n8c+XhCAIyEMaN5JAMpqIjsNTyigyKGGEiU/JzoIRRlfjiBlXI+OM4AvXh3GCzgw+jLtoZjWwgIzyoOhAQCSgC4gokopAIGEimQRMiy9pkyAgCgS++8e9DZXK7U6R9O1KJ9/361WvrnPuuad+1dVdvzrn1j1XtomIiGi3S7cDiIiI7VMSREREVEqCiIiISkkQERFRKQkiIiIqje52AENl7NixnjhxYrfDiIgYUZYuXfob2z1V23aYBDFx4kSazWa3w4iIGFEk3T/QtkwxRUREpSSIiIiolAQRERGVak0QkmZIWilplaQzB2gzW9IKScslXdy2bW9Jv5T05TrjjIiIzdV2kFrSKGABcDzQCyyRtMj2ipY2k4GPAkfb3iDpgLZuzgJuqivGiIgYWJ0jiGnAKturbT8BXArMamvzXmCB7Q0Ath/s3yBpKvBC4LoaY4yIiAHUmSAOBNa2lHvLulZTgCmSbpF0q6QZAJJ2Ab4AnDHYA0iaK6kpqdnX1zeEoUdERJ0JQhV17WuLjwYmA8cCpwDnSdoHeD9wje21DML2QtsN242ensrzPCIiYivVeaJcLzChpTweeKCiza22nwTWSFpJkTCOAo6R9H5gT2CMpEdtVx7ojoiIoVfnCGIJMFnSJEljgJOBRW1trgCmA0gaSzHltNr2220fZHsicDpwUZJDRMTwqi1B2N4IzAMWA/cA37S9XNJ8STPLZouBdZJWADcAZ9heV1dMERHROe0olxxtNBrOWkwREc+NpKW2G1XbciZ1RERUSoKIiIhKSRAREVEpCSIiIiolQURERKUkiIiIqJQEERERlZIgIiKiUhJERERUSoKIiIhKSRAREVEpCSIiIiolQURERKUkiIiIqJQEERERlZIgIiKiUhJERERUqjVBSJohaaWkVZIqryktabakFZKWS7q4rDtY0lJJd5T176szzoiI2NzoujqWNApYABwP9AJLJC2yvaKlzWTgo8DRtjdIOqDc9CvgNbYfl7QncHe57wN1xRsREZuqcwQxDVhle7XtJ4BLgVltbd4LLLC9AcD2g+XPJ2w/XrbZreY4IyKiQp1vvAcCa1vKvWVdqynAFEm3SLpV0oz+DZImSFpW9nF21ehB0lxJTUnNvr6+Gp5CRMTOq84EoYo6t5VHA5OBY4FTgPMk7QNge63tQ4GXAO+S9MLNOrMX2m7YbvT09Axp8BERO7s6E0QvMKGlPB5oHwX0AlfaftL2GmAlRcJ4RjlyWA4cU2OsERHRps4EsQSYLGmSpDHAycCitjZXANMBJI2lmHJaLWm8pN3L+n2BoymSR0REDJPavsVke6OkecBiYBRwvu3lkuYDTduLym0nSFoBPAWcYXudpOOBL0gyxVTV523fVVesI9Uv1j3G6Zffyc/u38CRB+/L5086jIP236PbYUXEDkJ2+2GBkanRaLjZbHY7jGE1+6s/4bY1658pT5u0H9/866O6GFFEjDSSltpuVG3L10dHsJ/dv2HQckTEtkiCGMGOPHjfQcsREdsiCWIE+/xJhzFt0n6M3kVMm7Qfnz/psG6HFBE7kNoOUkf9Dtp/jxxziIjaZAQRERGVkiAiIqJSEkRERFRKgoiIiEpJEBERUSkJIiIiKiVBREREpSSIiIiolAQRERGVkiAiIqJSEkRERFRKgoiIiEpJEBERUanWBCFphqSVklZJOnOANrMlrZC0XNLFZd3hkn5S1i2T9LY644yIiM3Vtty3pFHAAuB4oBdYImmR7RUtbSYDHwWOtr1B0gHlpseAd9q+V9KLgKWSFtt+qK54IyJiU3WOIKYBq2yvtv0EcCkwq63Ne4EFtjcA2H6w/Plz2/eW9x8AHgR6aow1IiLa1JkgDgTWtpR7y7pWU4Apkm6RdKukGe2dSJoGjAH+s2LbXElNSc2+vr4hDD0iIupMEKqoc1t5NDAZOBY4BThP0j7PdCCNA/4NeLftpzfrzF5ou2G70dOTAUZExFCqM0H0AhNayuOBByraXGn7SdtrgJUUCQNJewPfBT5u+9Ya44yIiAp1JoglwGRJkySNAU4GFrW1uQKYDiBpLMWU0+qy/b8DF9n+Vo0xRkTEAGpLELY3AvOAxcA9wDdtL5c0X9LMstliYJ2kFcANwBm21wGzgdcCcyTdUd4OryvWiIjYnOz2wwIjU6PRcLPZ7HYYEREjiqSlthtV23ImdUREVEqCiIiISkkQERFRKQkiIiIqJUFERESlJIiIiKiUBBEREZWSICIiolISREREVEqCiIiISkkQERFRKQkiIiIqJUFERESlJIiIiKiUBBEREZWSICIiolISREREVKo1QUiaIWmlpFWSzhygzWxJKyQtl3RxS/21kh6SdHWdMUZERLXRdXUsaRSwADge6AWWSFpke0VLm8nAR4GjbW+QdEBLF58D9gD+uq4YIyJiYFscQUjqkfTV/k/ykl4uaU4HfU8DVtlebfsJ4FJgVlub9wILbG8AsP1g/wbbPwAe6expRETEUOtkiulC4CZgQlm+F/hwB/sdCKxtKfeWda2mAFMk3SLpVkkzOug3IiKGQScJ4gDbFwNPA9h+Eniqg/1UUee28mhgMnAscApwnqR9Oui7eABprqSmpGZfX1+nu0VERAc6SRC/k7Qf5Zu7pFfS2dRPL8+OOgDGAw9UtLnS9pO21wArKRJGR2wvtN2w3ejp6el0t4iI6EAnCeJ04CrgjyTdBFwCfLCD/ZYAkyVNkjQGOBlY1NbmCmA6gKSxFFNOqzuMPSIiajTot5gk7QKMongTfxnFtNGK8qDzoGxvlDQPWFz2cb7t5ZLmA03bi8ptJ0haQTFtdYbtdeVj3wwcAuwpqRd4j+3FW/tEIyLiuZHdfligrYF0q+1XD1M8W63RaLjZbHY7jIiIEUXSUtuNqm2dTDFdL6n966kREbGD6+REuXnACyQ9DvyeYprJtverNbKIiOiqThLE2NqjiIiI7c4WE4TtpyS9EXhtWXWj7WvrDSsiIrqtk6U2PgN8hOLrp6uBj0j6dN2BRUREd3UyxfQW4AjbTwFIOh/4GfDxOgOLiIju6nS5771b7u9VRyAREbF96WQE8VngZ5J+QPENpmOBT9YZVEREdF8nB6m/LukG4FUUCeKTtn9Ze2QREdFVnRykngk8avs7tr9NsXjfm+sPLSIiuqmTYxDzbf+2v2D7IeCs+kKKiIjtQScJoqpNbZcqjYiI7UMnCeJnkj4r6WBJB0n6HHB73YFFRER3dZIg5pXtrqS4LgTA+2uLKCIitgudfIvpUYqLBiFpL9udXE0uIiJGuAFHEJI+JumQ8v4YSdcBayX9WtJxwxZhRER0xWBTTH9JcY1ogHcCz6NY2fU44J
9qjisiIrpssATxhJ+93NwM4GLbG20vB3btpHNJMyStlLRK0pkDtJktaYWk5ZIubql/l6R7y9u7On1CERExNAY7BvG4pJcBD1KMGj7Ssm2PLXUsaRSwADge6AWWSFpke0VLm8nAR4GjbW+QdEBZvx/w90ADMLC03HfDc3p2ERGx1QYbQXwYWASsAs6xvRqgvDbEsg76ngassr3a9hPApUD7pUvfCyzof+O3/WBZ/+fA9bbXl9uupxjFRETEMBlwBGH7FmByRf01wDUd9H0gsLal3EuxnlOrKQCSbgFGAZ8qL0ZUte+BHTxmREQMkTrPiFZFndvKoymS0LHAeOBmSa/ocF8kzQXmAhx00EHbEmtERLTp9HoQW6MXmNBSHg88UNHmSttP2l5D8a2pyR3ui+2Fthu2Gz09PUMafETEzq6T1Vw3G2VU1VVYAkyWNEnSGOBkimMara4Appd9jqWYcloNLAZOkLSvpH2BE8q6iIgYJp2MIG7rsG4TtjdSLNOxGLgH+Kbt5ZLml0uIU25bJ2kFcANwhu11ttdTrBi7pLzNL+siImKY6NlTHdo2FF85HUfx7aPZPHtcYG/gPNuHDEuEHWo0Gm42m90OIyJiRJG01HajattgU0VvAv6KYv5/Ac8miEeATwxphBERsd0Z7GuuFwAXSJpt+5vDGFNERGwHOjkGcYCkvQEkfUXSbZL+rOa4IiKiyzpJEHNtPyzpBIrppv8BfLbesCIiots6SRD9R7HfAFxge2mH+0VExAjWyRv9nZKuAd4CfE/SnlSc1RwRETuWTk54ezcwlWLhvcfKE9reU29YERHRbVscQdh+CvgjimMPALt3sl9ERIxsnSy18WWK5TDeUVb9DvhKnUFFRET3dTLF9BrbR0q6HcD2+nJtpYiI2IF1MlX0pKRdKA9MS9ofeLrWqCIiousGTBAtK7YuAL4N9Ej6B+BHwNnDEFtERHTRYFNMtwFH2r5I0lLg9RTrMb3V9t3DEl1ERHTNYAnimau62V4OLK8/nIiI2F4MliB6JP3dQBtt/0sN8URExHZisAQxCtiT6utDR0TEDm6wBPEr2/OHLZKIiNiuDPY114wcIiJ2YoMliG2+5oOkGZJWSlol6cyK7XMk9Um6o7yd2rLtbEl3l7e3bWssERHx3Ax2Rbn129KxpFEU51AcD/QCSyQtsr2irelltue17fsm4EjgcGA34CZJ37P98LbEFBERnatz0b1pFCvArrb9BHApMKvDfV8O3GR7o+3fAXcCM2qKMyIiKtSZIA4E1raUe8u6didKWibpckkTyro7gTdI2qNcXnw6MKF9R0lzJTUlNfv6+oY6/oiInVqdCaLqIHf7hYauAibaPhT4PvA1ANvXAdcAPwYuAX4CbNysM3uh7YbtRk9Pz1DGHhGx06szQfSy6af+8cADrQ1sr7P9eFk8l+LCRP3bPmP7cNvHUySbe2uMNSIi2tSZIJYAkyVNKpcHPxlY1NpA0riW4kzgnrJ+VLlqLJIOBQ4Frqsx1oiIaNPJ9SC2iu2NkuYBiynOyj7f9nJJ84Gm7UXAaZJmUkwfrQfmlLvvCtwsCeBh4B22N5tiioiI+shuPywwMjUaDTebzW6HERExokhaartRtS3Xlo6IiEpJEBERUSkJIiIiKiVBREREpSSIiIiolAQRERGVkiAiIqJSEkRERFRKgoiIiEpJEBERUSkJIiIiKiVBREREpSSIiIiolAQRERGVkiAiIqJSEkRERFRKgoiIiEq1JghJMyStlLRK0pkV2+dI6pN0R3k7tWXbZyUtl3SPpHNUXn80IiKGR23XpJY0ClgAHA/0AkskLbK9oq3pZbbnte37GuBo4NCy6kfA64Ab64o3IiI2VecIYhqwyvZq208AlwKzOtzXwPOAMcBuwK7Ar2uJMiIiKtWZIA4E1raUe8u6didKWibpckkTAGz/BLgB+FV5W2z7nvYdJc2V1JTU7OvrG/pnEBGxE6szQVQdM3Bb+Spgou1Dge8DXwOQ9BLgZcB4iqRynKTXbtaZvdB2w3ajp6dnSIOPiNjZ1ZkgeoEJLeXxwAOtDWyvs/14WTwXmFre/wvgVtuP2n4U+B7w6hpjjYiINnUmiCXAZEmTJI0BTgYWtTaQNK6lOBPon0b6BfA6SaMl7UpxgHqzKaaIiKhPbd9isr1R0jxgMTAKON/2cknzgabtRcBpkmYCG4H1wJxy98uB44C7KKalrrV9VV2xRkTE5mS3HxYYmRqNhpvNZrfDiIgYUSQttd2o2pYzqSMiolISREREVEqCiIiISkkQERFRKQkiIiIqJUFERESlJIiIiG209pG1zLl2DkdcdARzrp3D2kfWbnmnESAJIiJiG33ilk+w9NdL2eiNLP31Uj5xyye6HdKQSIKIiNhGdz5456DlkSoJIiJiGx12wGGDlkeqJIiIiG101tFnMfWFUxmt0Ux94VTOOvqsboc0JGpbrC8iYmcxYa8JXDjjwm6HMeQygoiIiEpJEBERUSkJIiIiKiVBREREpSSIiIioVGuCkDRD0kpJqySdWbF9jqQ+SXeUt1PL+uktdXdI+oOk/1pnrBERsanavuYqaRSwADge6AWWSFpke0Vb08tsz2utsH0DcHjZz37AKuC6umKNiIjN1TmCmAassr3a9hPApcCsrejnJOB7th8b0ugiImJQdSaIA4HWJQ17y7p2J0paJulySRMqtp8MXFL1AJLmSmpKavb19W17xBER8Yw6E4Qq6txWvgqYaPtQ4PvA1zbpQBoH/AmwuOoBbC+03bDd6OnpGYKQIyKiX50JohdoHRGMBx5obWB7ne3Hy+K5wNS2PmYD/277ydqijIiISnUmiCXAZEmTJI2hmCpa1NqgHCH0mwnc09bHKQwwvRQREfWq7VtMtjdKmkcxPTQKON/2cknzgabtRcBpkmYCG4H1wJz+/SVNpBiB3FRXjBERMTDZ7YcFRqZGo+Fms9ntMCIiRhRJS203qrblTOqIiKiUBBEREZWSICIiolISREREVEqCiIiISkkQERFRKQkiIiIqJUFERESlJIiI4bR+DVzwRpi/f/Fz/ZpuRxQxoCSIiOF05Qfg/lvg6Y3Fzys/0O2IIgaUBBExnNb+dPByxHYkCSJiOE141eDliO1IEkTEcJq1AA4+GnYZXfyctaDbEUUMqLblviOiwn6T4N3XdDuKiI5kBBEREZWSICIiolISREREVKo1QUiaIWmlpFWSzqzYPkdSn6Q7ytupLdsOknSdpHskrSgvQRoREcOktoPUkkYBC4DjgV5giaRFtle0Nb3M9ryKLi4CPmP7ekl7Ak/XFWtERGyuzhHENGCV7dW2nwAuBWZ1sqOklwOjbV8PYPtR24/VF2pERLSrM0EcCKxtKfeWde1OlLRM0uWSJpR1U4CHJH1H0u2SPleOSDYhaa6kpqRmX1/f0D+DiIidWJ3nQaiizm3lq4BLbD8u6X3A14DjyriOAY4AfgFcBswB/u8mndkLgYUA5bGM+4fyCWxnxgK/6XYQsdXy+o1cO/prd/BAG+pMEL3AhJbyeOCB1ga217UUzwXOb
tn3dturASRdAbyatgTR1lfPEMS83ZLUtN3odhyxdfL6jVw782tX5xTTEmCypEmSxgAnA4taG0ga11KcCdzTsu++kvrf9I8D2g9uR0REjWobQdjeKGkesBgYBZxve7mk+UDT9iLgNEkzgY3AeoppJGw/Jel04AeSBCylGGFERMQwkd1+WCC2R5LmlsdcYgTK6zdy7cyvXRJERERUylIbERFRKQkiIiIqJUFEbIGkD0naYyv2myPpRS3l88pVApD01nKdsRskNSSd8xz7vlHSTvnVy7q0vg6SdpP0/XKNuLd1O7ZuSYIYRpImSvrLDtpdUp5d/rfDEVenOo1/B/QhoDJBVJ3h32IO8EyCsH1qy1pk7wHeb3u67abt04Yq2Ng6ba/DEcCutg+3fVkn+2/hb2FESoIYXhOBQd9gJf0X4DW2D7X9xbZt3b4C4ES2EP9IJ+n5kr4r6U5Jd0v6e4o3+Rsk3VC2eVTSfEk/BY6S9ElJS8r2C1U4CWgA3yg/he7e/6lf0ieBPwW+Ui4jc6ykq1se//yyv9slzSrrd5d0afnB4TJg9278fkaS8gPN3S3l0yV9qnwdzpZ0m6SfSzqm3H6spKslHQB8HTi8fO1eLOnPytfjrvL12a3c577y9f8R8Nay7y9K+mE5QnxluWTQvZI+3ZVfxLawndsAN4o3xP8AzgPuBr4BvB64BbiXYkHC/YArgGXArcCh5b6vA+4ob7cDe5Xbf1vW/e0Aj7kM+H3Z5hjgRuAfgZuADwM9wLcpTiZcAhxd7rc/cF35WF8F7qdYImAicHdL/6cDnyrvvxi4luI8k5uBQ8r6C4FzgB8Dq4GTyvotxj/Sb8CJwLkt5RcA9wFjW+oMzG4p79dy/9+At5T3bwQaLdueKbfdPxa4urz/j8A7yvv7AD8Hng/8HcW5RACHUpw71NjW57sj3wb62y9/918o694IfL/idWi9/zyKdeWmlOWLgA+V9+8DPtL2Gp9d3v8bitUjxgG7UawQsX+3fy/P5ZYRxJa9BPjfFP+Uh1B8gv5Tij+2/wn8A8WyIIeW5YvK/U4HPmD7cIo3+t8DZwI3uxi2bjI6aDET+M+yzc1l3T62X2f7C2UsX7T9Soo3s/PKNn8P/Mj2ERRnrB/UwXNbCHzQ9tQy3n9t2TaufJ5vBv65rOsk/pHuLuD15SfMY2z/tqLNUxRJut90ST+VdBfFWf9/vA2PfwJwpqQ7KN5snkfxWr6W4lMttpdRfJCIrfed8udSikQymJcCa2z/vCx/jeL16Nc+BdW/YsRdwHLbv7L9OMWHrQmMIN2eshgJ1ti+C0DScuAHtl2+GUykWOjqRADb/0/S/pJeQDHK+BdJ3wC+Y7u3OCl8q7T+Ab4eeHlLX3tL2oviD/a/lXF8V9KGwTpUcY2N1wDfaulrt5YmV9h+Glgh6YVbG/hIY/vnkqZSfLL8J0nXVTT7g+2nACQ9jyKxNmyvlfQpijf1rSXgRNsrN6ksXqOctPTcbGTTafTW1+Xx8udTbPl9cEv/uL9rK/f3/XTL/f7yiHrPzQhiy9pf4NYXfzQDrFpr+5+BUynmim+VdMg2xND6B7gLcFT5Kf5w2wfafqT/cSv2HeifZBfgoZZ+Drf9spZ2rc97qzPbSFN+6+gx218HPg8cCTxCMUVYpf/3+Zsy6Z7Usm2w/QayGPhgucQMko4o638IvL2sewXFiDYG92vggPJD224Uo+Gt8R/AREkvKcv/nWLKd4eXBLHtWv9xjwV+Y/thSS+2fZfts4EmxfTU1rxhtLsOeOYKfJIOr4jjDcC+ZX3lP4nth4E1kt5a7iNJh23hsYci/u3dnwC3lVM8HwM+TTEV973+g9StbD9EsU7YXRTHopa0bL6Q4kD0HZI6Pah8FrArsKw8wHpWWf9/gD0lLQM+Atz2XJ/Yzsb2k8B84KfA1RRv9FvTzx+Ad1OMtu+i+HD4laGKc3uWpTYGoeI62FfbfkVZvrAsX96/jWJq5wJgEvAYMNf2MklfAqZTDGFXUHzl8WmKg8JjgQur5vErHvNG4HTbzbI8luJSri+jGMH80Pb7JO0PXFL2fRPFdNNU27+RdBpwGrAG+CVwn+1PSZpE8cYzjuJN6VLb81ufZ/mYj9reU9KuW4o/InYcSRA7KEn3UcyL78gXOomIGmWKKSIiKmUE0SWS/pxnr6DXb43tv+hGPBER7ZIgIiKiUqaYIiKiUhJERERUSoKIqCDpY5KWl4vj3SHpVRqiZb8jRookiIg2ko6iOKHwyHKNrddTLNY2JMt+R4wUSRARmxtHcUb84wDluSQnMXTLfk+VdJOkpZIWSxpX9vfKcsTyExXLgN9d1t/ccsY8km6RlKU2onZJEBGbuw6YoOJaAf8q6XW2z6FYunm67ellu+dTLCf9Kts/Ar5s+5XlWfC7A28uz0ZvAm8vV/bdCHyJYgn1qcD5wGfK/i4A3mf7KIoz8PudRzEKQdIUYLdyRdeIWiVBRLSx/SgwFZgL9AGXSZpT0XRrlv1+KfAK4PpyvaePA+Ml7QPsZfvHZbuLW/b5FvDmcqmTv6JY4ymidiNq6dmI4VIu530jcGP5hv+uimZbs+y3KK4RcNQmldK+FW37Y3lM0vXALGA2xZRVRO0ygohoI+mlkia3VB1OcYW+oVj2eyXQUx4IR9Kukv7Y9gbgEUmvLtud3Nb/eRRX+Vtie/3WPK+I5yojiIjN7Ql8qZz22QisophuOoVi2e9ftRyHAIplvyX1L/t9H9XLfv8eOIoieZxTXlhqNPC/gOXAe4BzJf2OYvTy25b+l0p6mOI4RcSwyFIbEdsJSXuWxz+QdCYwzvbflOUXUSSNQ8or/UXULlNMEduPN5Vfhb2b4jrmnwaQ9E6Ki958LMkhhlNGEBERUSkjiIiIqJQEERERlZIgIiKiUhJERERUSoKIiIhK/x8uoDdie+EbqgAAAABJRU5ErkJggg==\n", 262 | "text/plain": [ 263 | "
" 264 | ] 265 | }, 266 | "metadata": { 267 | "needs_background": "light" 268 | }, 269 | "output_type": "display_data" 270 | } 271 | ], 272 | "source": [ 273 | "ax = sns.stripplot(strategies, test_scores); \n", 274 | "ax.set(xlabel ='Strategy', ylabel ='Test Score') \n", 275 | "plt.show() " 276 | ] 277 | }, 278 | { 279 | "cell_type": "markdown", 280 | "metadata": {}, 281 | "source": [ 282 | "Checking the performance of `RandomForestClassifier` on the data" 283 | ] 284 | }, 285 | { 286 | "cell_type": "code", 287 | "execution_count": 10, 288 | "metadata": {}, 289 | "outputs": [ 290 | { 291 | "data": { 292 | "text/plain": [ 293 | "0.776595744680851" 294 | ] 295 | }, 296 | "execution_count": 10, 297 | "metadata": {}, 298 | "output_type": "execute_result" 299 | } 300 | ], 301 | "source": [ 302 | "from sklearn.ensemble import RandomForestClassifier\n", 303 | "from sklearn.metrics import accuracy_score\n", 304 | "ans=RandomForestClassifier()\n", 305 | "ans.fit(X_train,y_train)\n", 306 | "prediction=ans.predict(X_test)\n", 307 | "accuracy_score(y_test,prediction)" 308 | ] 309 | }, 310 | { 311 | "cell_type": "markdown", 312 | "metadata": {}, 313 | "source": [ 314 | "On comparing the scores of the KNN classifier with the dummy classifier, we come to the conclusion that the KNN classifier is, in fact, a good classifier for the given data." 315 | ] 316 | }, 317 | { 318 | "cell_type": "markdown", 319 | "metadata": {}, 320 | "source": [ 321 | "## Imbalanced Class and `Dummy Classifier`\n", 322 | "\n", 323 | "A major motivation for Dummy Classifier is F-score, when the positive class is in minority (i.e. imbalanced classes). This classifier is used for sanity test of actual classifier. Actually, dummy classifier completely ignores the input data. In case of 'most frequent' method, it checks the occurrence of most frequent label." 
324 | ] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "execution_count": 11, 329 | "metadata": {}, 330 | "outputs": [ 331 | { 332 | "name": "stdout", 333 | "output_type": "stream", 334 | "text": [ 335 | "0 178\n", 336 | "1 182\n", 337 | "2 177\n", 338 | "3 183\n", 339 | "4 181\n", 340 | "5 182\n", 341 | "6 181\n", 342 | "7 179\n", 343 | "8 174\n", 344 | "9 180\n" 345 | ] 346 | } 347 | ], 348 | "source": [ 349 | "from sklearn.datasets import load_digits\n", 350 | "\n", 351 | "dataset = load_digits()\n", 352 | "X, y = dataset.data, dataset.target\n", 353 | "\n", 354 | "for class_name, class_count in zip(dataset.target_names, np.bincount(dataset.target)):\n", 355 | "    print(class_name,class_count)" 356 | ] 357 | }, 358 | { 359 | "cell_type": "code", 360 | "execution_count": 12, 361 | "metadata": {}, 362 | "outputs": [ 363 | { 364 | "name": "stdout", 365 | "output_type": "stream", 366 | "text": [ 367 | "Original labels:\t [1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9]\n", 368 | "New binary labels:\t [1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]\n" 369 | ] 370 | } 371 | ], 372 | "source": [ 373 | "y_imbalanced = y.copy()\n", 374 | "y_imbalanced[y_imbalanced != 1] = 0\n", 375 | "\n", 376 | "print('Original labels:\\t', y[1:20])\n", 377 | "print('New binary labels:\\t', y_imbalanced[1:20])" 378 | ] 379 | }, 380 | { 381 | "cell_type": "code", 382 | "execution_count": 14, 383 | "metadata": {}, 384 | "outputs": [ 385 | { 386 | "data": { 387 | "text/plain": [ 388 | "array([1615, 182], dtype=int64)" 389 | ] 390 | }, 391 | "execution_count": 14, 392 | "metadata": {}, 393 | "output_type": "execute_result" 394 | } 395 | ], 396 | "source": [ 397 | "np.bincount(y_imbalanced)" 398 | ] 399 | }, 400 | { 401 | "cell_type": "markdown", 402 | "metadata": {}, 403 | "source": [ 404 | "We can observe in the above array that one class is far more frequent than the other, which shows that the classes are imbalanced." 405 | ] 406 | }, 407 | { 408 | "cell_type": "code", 409 | "execution_count": 20, 410 | "metadata": {}, 411 | "outputs": [ 412 | { 413 | "data": { 414 | "text/plain": [ 415 | "0.5466666666666666" 416 | ] 417 | }, 418 | "execution_count": 20, 419 | "metadata": {}, 420 | "output_type": "execute_result" 421 | } 422 | ], 423 | "source": [ 424 | "X_train1, X_test1, y_train1, y_test1 = train_test_split(X, y_imbalanced, random_state=0)\n", 425 | "\n", 426 | "# Accuracy of Gaussian Naive Bayes classifier\n", 427 | "from sklearn.naive_bayes import GaussianNB\n", 428 | "gnb = GaussianNB()\n", 429 | "y_pred = gnb.fit(X_train1, y_train1)\n", 430 | "gnb.score(X_test1, y_test1)" 431 | ] 432 | }, 433 | { 434 | "cell_type": "markdown", 435 | "metadata": {}, 436 | "source": [ 437 | "Using the Naive Bayes classifier we get a score of about 0.55. We know this is not a good score, so we can fit other classifiers on the data and check their scores. 
" 438 | ] 439 | }, 440 | { 441 | "cell_type": "code", 442 | "execution_count": 24, 443 | "metadata": {}, 444 | "outputs": [ 445 | { 446 | "data": { 447 | "text/plain": [ 448 | "0.9088888888888889" 449 | ] 450 | }, 451 | "execution_count": 24, 452 | "metadata": {}, 453 | "output_type": "execute_result" 454 | } 455 | ], 456 | "source": [ 457 | "from sklearn.ensemble import RandomForestClassifier\n", 458 | "\n", 459 | "clf = RandomForestClassifier(max_depth=2, random_state=0).fit(X_train1, y_train1)\n", 460 | "clf.score(X_test1,y_test1)" 461 | ] 462 | }, 463 | { 464 | "cell_type": "markdown", 465 | "metadata": {}, 466 | "source": [ 467 | "On Using RandomForestClassifier we get a score of 0.908 which is a great score and also much better than what Naive Bayes Classifier performed . " 468 | ] 469 | }, 470 | { 471 | "cell_type": "markdown", 472 | "metadata": {}, 473 | "source": [ 474 | "## Using dummy classifier as baseline" 475 | ] 476 | }, 477 | { 478 | "cell_type": "code", 479 | "execution_count": 25, 480 | "metadata": {}, 481 | "outputs": [ 482 | { 483 | "data": { 484 | "text/plain": [ 485 | "0.9044444444444445" 486 | ] 487 | }, 488 | "execution_count": 25, 489 | "metadata": {}, 490 | "output_type": "execute_result" 491 | } 492 | ], 493 | "source": [ 494 | "from sklearn.dummy import DummyClassifier\n", 495 | "dummy_majority = DummyClassifier(strategy = 'most_frequent').fit(X_train1, y_train1)\n", 496 | "y_dummy_predictions = dummy_majority.predict(X_test)\n", 497 | "dummy_majority.score(X_test1, y_test1)" 498 | ] 499 | }, 500 | { 501 | "cell_type": "markdown", 502 | "metadata": {}, 503 | "source": [ 504 | "We observe that the RandomForest classsifier score is not much compared to dummy classifier which also has a score of more than .90 . Which shows that RandomForest is not a right fit for the model despite the good score.\n", 505 | "This makes us realise that we need a better model which scores better ." 506 | ] 507 | }, 508 | { 509 | "cell_type": "code", 510 | "execution_count": 27, 511 | "metadata": {}, 512 | "outputs": [ 513 | { 514 | "data": { 515 | "text/plain": [ 516 | "0.9955555555555555" 517 | ] 518 | }, 519 | "execution_count": 27, 520 | "metadata": {}, 521 | "output_type": "execute_result" 522 | } 523 | ], 524 | "source": [ 525 | "from sklearn.svm import SVC\n", 526 | "svm = SVC(kernel='rbf', C=1).fit(X_train1, y_train1)\n", 527 | "svm.score(X_test1, y_test1)" 528 | ] 529 | }, 530 | { 531 | "cell_type": "markdown", 532 | "metadata": {}, 533 | "source": [ 534 | "On using SVM classifier using RBF kernel for the model,gives a whoping score of 0.99 which is a good score as well as it performce better than dummy classifier which is our baseline. 
" 535 | ] 536 | }, 537 | { 538 | "cell_type": "markdown", 539 | "metadata": {}, 540 | "source": [ 541 | "*Thus, Dummy Classifier works as a baseline and gives an idea of the performance of the model on dataset*\n" 542 | ] 543 | }, 544 | { 545 | "cell_type": "code", 546 | "execution_count": null, 547 | "metadata": {}, 548 | "outputs": [], 549 | "source": [] 550 | } 551 | ], 552 | "metadata": { 553 | "kernelspec": { 554 | "display_name": "Python 3", 555 | "language": "python", 556 | "name": "python3" 557 | }, 558 | "language_info": { 559 | "codemirror_mode": { 560 | "name": "ipython", 561 | "version": 3 562 | }, 563 | "file_extension": ".py", 564 | "mimetype": "text/x-python", 565 | "name": "python", 566 | "nbconvert_exporter": "python", 567 | "pygments_lexer": "ipython3", 568 | "version": "3.7.6" 569 | } 570 | }, 571 | "nbformat": 4, 572 | "nbformat_minor": 4 573 | } 574 | -------------------------------------------------------------------------------- /notebooks/Linear_Discriminant_Analysis.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### INTRODUCTION \n", 8 | "The main goal of the dimensionality reduction techniques is to reduce the dimensions by removing the redundant and dependent\n", 9 | "features by transforming the features from a higher dimensional space that may lead to a curse of dimensionality problem, to a space with lower dimensions.\n", 10 | "LDA stands for Linear Discriminant Analysis. It is used as both multiclass classification algorithms and dimentionality\n", 11 | "reduction technique.\n", 12 | "It reduces the number of input features or columns on the given dataset. LDA focuses on maximizing separatability among \n", 13 | "known categories.\n", 14 | "LDA is an unsupervised approach which means there is no need for labeling classes of the data." 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "### WHAT IS DIMENTIONALITY REDUCTION ?\n", 22 | "The techniques of dimensionality reduction are important in applications of Machine Learning, Data Mining, Bioinformatics, and Information Retrieval. The main agenda is to remove the redundant and dependent features by changing the dataset onto a lower-dimensional space.\n", 23 | "\n", 24 | "In simple terms, they reduce the dimensions (i.e. variables) in a particular dataset while retaining most of the data.\n", 25 | "\n", 26 | "Multi-dimensional data comprises multiple features having a correlation with one another. We can plot multi-dimensional data in just 2 or 3 dimensions with dimensionality reduction. It allows the data to be presented in an explicit manner which can be easily understood by a layman." 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "### HOW ARE LDA MODELS REPRESENTED\n", 34 | "\n", 35 | "The representation of LDA is pretty straight-forward. The model consists of the statistical properties of your data that has been calculated for each class. The same properties are calculated over the multivariate Gaussian in the case of multiple variables. The multivariates are means and covariate matrix.\n", 36 | "\n", 37 | "Predictions are made by providing the statistical properties into the LDA equation. The properties are estimated from your data. Finally, the model values are saved to file to create the LDA model." 
44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "### WORKING\n", 51 | "The two basic principles on which LDA works can be summarized in two steps (a small numeric sketch follows the list):\n", 52 | "\n", 53 | "- Maximizing the distance between the means of the given classes.\n", 54 | "- Minimizing the variation (which LDA calls scatter) within each category.\n" 55 | ] 56 | },
    \n", 60 | "
  • Helps in reducing computational costs for a given classification task.
  • \n", 61 | "
  • Helpful in avoiding overfitting by minimizing the error in parameter estimation.
  • \n", 62 | "
\n", 63 | "\n", 64 | "\n", 65 | "### LIMITATIONS\n", 66 | "
    \n", 67 | "
  • LDA fails to find the lower dimensional space if the dimensions are much higher than \n", 68 | " the number of samples in the data matrix.
  • \n", 69 | "
  • LDA produces at most C-1 feature projections: If the classification error estimates establish that more features are needed, some other method must be employed to provide those additional features
  • \n", 70 | "
  • LDA is a parametric method since it assumes unimodal Gaussian likelihoods: If the distributions are significantly non-Gaussian, the LDA projections will not be able to preserve any complex structure of the data that may be needed for classification
  • \n", 71 | "
  • LDA will fail when the discriminatory information is not in the mean, but rather in the variance of the data
  • \n", 72 | "
" 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": {}, 78 | "source": [ 79 | "### DIFFERENCE BETWEEN PCA AND LDA\n", 80 | "
    \n", 81 | "
  • PCA is unsupervised algorithm while LDA is supervised algorithm.
  • \n", 82 | "
  • The goal of PCA is to maximize variation in the given dataset while LDA focuses on \n", 83 | " maximizing separatibility among known categories.
  • \n", 84 | "
  • LDA performs better multi-class classification tasks than PCA. However, PCA performs better when the sample size is comparatively small. An example would be comparisons between classification accuracies that are used in image classification.
  • \n", 85 | "
\n", 86 | " " 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "### Following are the extensions of LDA in case we need to use non-linear discriminant analysis:\n", 94 | "
    \n", 95 | "
  • Quadratic Discriminant Analysis (QDA): Each class uses its own estimate of variance (or covariance when there are multiple input variables).
  • \n", 96 | "
  • Flexible Discriminant Analysis (FDA): Where non-linear combinations of inputs is used such as splines.
  • \n", 97 | "
  • Regularized Discriminant Analysis (RDA): Introduces regularization into the estimate of the variance (actually covariance), moderating the influence of different variables on LDA.
  • \n", 98 | "
" 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": {}, 104 | "source": [ 105 | "### Applications:\n", 106 | "
    \n", 107 | "
  • Face Recognition: In the field of Computer Vision, face recognition is a very popular application in which each face is represented by a very large number of pixel values. Linear discriminant analysis (LDA) is used here to reduce the number of features to a more manageable number before the process of classification. Each of the new dimensions generated is a linear combination of pixel values, which form a template. The linear combinations obtained using Fisher’s linear discriminant are called Fisher faces.
  • \n", 108 | "
  • Medical: In this field, Linear discriminant analysis (LDA) is used to classify the patient disease state as mild, moderate or severe based upon the patient various parameters and the medical treatment he is going through. This helps the doctors to intensify or reduce the pace of their treatment.
  • \n", 109 | "
  • Customer Identification: Suppose we want to identify the type of customers which are most likely to buy a particular product in a shopping mall. By doing a simple question and answers survey, we can gather all the features of the customers. Here, Linear discriminant analysis will help us to identify and select the features which can describe the characteristics of the group of customers that are most likely to buy that particular product in the shopping mall.
  • \n", 110 | "
" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 1, 116 | "metadata": {}, 117 | "outputs": [], 118 | "source": [ 119 | "# Import necessary modules\n", 120 | "import numpy as np\n", 121 | "import pandas as np\n", 122 | "from sklearn.datasets import make_classification\n", 123 | "from sklearn.model_selection import cross_val_score\n", 124 | "from sklearn.model_selection import RepeatedStratifiedKFold\n", 125 | "from sklearn.pipeline import Pipeline\n", 126 | "from sklearn.discriminant_analysis import LinearDiscriminantAnalysis\n", 127 | "from sklearn.naive_bayes import GaussianNB" 128 | ] 129 | }, 130 | { 131 | "cell_type": "code", 132 | "execution_count": 2, 133 | "metadata": {}, 134 | "outputs": [], 135 | "source": [ 136 | "# Generating data for our problem\n", 137 | "X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=7, n_classes=10)" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": 3, 143 | "metadata": {}, 144 | "outputs": [ 145 | { 146 | "name": "stdout", 147 | "output_type": "stream", 148 | "text": [ 149 | "[[ 2.3548775 -1.69674567 1.6193882 ... -3.33390362 2.45147541\n", 150 | " -1.23455205]\n", 151 | " [ 2.0204277 -1.62734821 -2.27697377 ... -0.28274722 -7.28166465\n", 152 | " -0.91070347]\n", 153 | " [-1.02400669 1.01276423 1.05505825 ... 3.83923974 -1.63530582\n", 154 | " 3.96050914]\n", 155 | " ...\n", 156 | " [-0.36448581 -0.2996303 2.21875138 ... -1.11303373 3.67576043\n", 157 | " -1.44164572]\n", 158 | " [ 0.05614772 1.87270289 -2.63165761 ... -3.07434527 2.31606352\n", 159 | " 1.65068838]\n", 160 | " [ 1.09853247 1.61067335 2.7977282 ... -1.62233539 14.09727916\n", 161 | " 2.27215759]]\n" 162 | ] 163 | } 164 | ], 165 | "source": [ 166 | "print(X)" 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": 4, 172 | "metadata": {}, 173 | "outputs": [ 174 | { 175 | "name": "stdout", 176 | "output_type": "stream", 177 | "text": [ 178 | "[9 4 4 1 2 7 9 4 6 9 1 3 9 5 6 4 9 2 0 4 4 8 0 9 3 8 6 0 0 7 8 3 8 5 5 9 7\n", 179 | " 1 3 1 8 7 6 7 4 6 5 6 2 8 3 1 7 0 7 0 4 5 1 6 6 8 3 3 3 2 5 8 0 5 6 2 7 1\n", 180 | " 3 8 7 2 8 0 6 8 2 9 9 8 2 2 5 6 9 4 6 1 4 9 0 9 7 8 7 2 0 8 8 1 7 9 8 1 6\n", 181 | " 3 9 2 5 5 3 9 1 1 2 0 8 0 7 2 0 5 0 1 8 0 2 2 1 3 0 2 5 9 3 8 8 7 7 0 4 3\n", 182 | " 4 0 8 5 3 7 4 4 5 5 0 4 5 1 1 4 3 5 2 6 4 2 1 6 6 9 5 3 7 0 1 5 9 5 4 7 3\n", 183 | " 9 0 0 1 9 5 2 2 7 4 0 1 2 6 4 3 7 6 8 8 3 0 8 3 0 5 5 1 7 8 6 8 4 1 1 3 1\n", 184 | " 9 9 3 2 8 1 8 1 7 6 1 1 7 6 5 3 4 1 6 5 2 8 6 5 9 0 6 9 6 2 3 4 8 3 8 4 8\n", 185 | " 1 0 4 0 6 3 8 4 6 9 2 9 2 7 5 1 6 3 0 6 9 3 7 1 5 5 0 9 4 8 9 2 8 2 9 3 2\n", 186 | " 3 5 1 8 0 0 6 5 1 3 2 8 1 8 6 7 3 2 5 9 6 2 3 4 2 1 5 4 2 9 5 1 7 1 6 0 2\n", 187 | " 8 6 1 8 7 8 0 3 0 7 1 0 4 1 4 2 0 8 2 7 9 7 3 5 1 5 1 4 9 0 4 9 5 0 8 9 1\n", 188 | " 2 9 2 8 4 7 9 7 8 4 9 1 7 8 3 7 3 1 9 6 2 9 4 6 8 1 1 5 6 3 0 3 4 8 7 5 6\n", 189 | " 9 9 6 4 8 2 6 2 7 0 6 8 0 7 0 1 5 7 3 2 2 3 5 2 1 3 6 9 5 4 3 6 7 9 2 4 2\n", 190 | " 5 0 2 7 4 5 9 1 3 1 8 6 3 1 1 3 3 7 6 6 5 5 8 7 8 9 5 0 7 4 6 3 9 4 7 4 3\n", 191 | " 5 7 6 7 6 7 9 7 7 7 7 5 7 5 1 6 3 2 6 5 1 0 6 0 1 5 8 9 6 6 3 6 3 6 0 0 8\n", 192 | " 9 7 6 4 6 8 3 3 5 2 6 3 3 9 2 8 9 2 5 8 6 1 4 4 6 0 9 6 4 3 4 4 2 0 7 3 3\n", 193 | " 4 9 0 5 3 6 4 8 3 5 2 5 8 2 1 5 4 2 3 7 8 0 1 4 0 6 8 2 7 4 8 1 4 3 5 0 3\n", 194 | " 8 3 1 9 9 6 0 8 0 7 1 9 2 7 8 6 0 2 3 8 8 8 2 9 0 3 1 4 3 9 9 2 5 0 3 4 1\n", 195 | " 3 6 6 2 6 2 2 6 5 4 2 6 3 2 7 2 3 3 3 2 2 2 9 7 9 0 9 0 5 3 6 0 3 8 2 6 3\n", 196 | 
" 7 5 0 2 4 8 9 9 4 2 8 3 6 9 6 7 1 0 4 4 4 1 7 9 6 4 9 7 1 8 0 1 8 9 7 5 4\n", 197 | " 8 3 5 6 6 8 1 2 2 3 0 0 0 9 8 0 3 8 7 9 5 4 6 6 0 1 5 5 1 6 4 7 1 2 0 3 4\n", 198 | " 0 4 0 7 5 7 0 3 8 3 0 9 7 5 6 2 8 5 2 5 3 7 9 1 0 2 2 1 1 9 2 9 2 8 0 4 5\n", 199 | " 4 0 1 6 6 5 2 5 0 1 7 6 5 0 0 3 4 2 1 6 6 5 4 3 3 4 9 4 2 3 5 1 4 5 1 7 8\n", 200 | " 7 0 6 9 5 5 9 2 9 8 1 7 0 1 9 9 9 3 2 5 5 6 2 1 7 4 0 3 5 7 7 7 1 2 2 8 9\n", 201 | " 1 7 3 9 0 2 1 6 1 4 3 6 6 0 1 3 2 8 4 0 7 4 7 9 8 7 1 6 0 1 4 2 3 5 9 5 7\n", 202 | " 8 2 0 9 0 0 1 0 6 3 1 9 6 8 2 2 8 9 7 3 4 9 7 4 0 5 4 4 1 7 2 8 4 6 1 8 8\n", 203 | " 3 4 7 5 7 0 5 8 4 5 8 5 9 6 7 1 5 1 6 9 2 1 9 7 2 4 0 7 3 7 5 4 5 7 8 3 5\n", 204 | " 2 9 4 0 5 4 9 6 9 5 4 2 1 7 2 3 4 1 7 4 4 8 3 3 7 4 5 4 8 0 7 9 2 7 8 6 8\n", 205 | " 6]\n" 206 | ] 207 | } 208 | ], 209 | "source": [ 210 | "print(y)" 211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "execution_count": 5, 216 | "metadata": {}, 217 | "outputs": [ 218 | { 219 | "name": "stdout", 220 | "output_type": "stream", 221 | "text": [ 222 | "(1000, 20)\n" 223 | ] 224 | } 225 | ], 226 | "source": [ 227 | "# Shape of inout data\n", 228 | "print(X.shape)" 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": 6, 234 | "metadata": {}, 235 | "outputs": [], 236 | "source": [ 237 | "# Defining model\n", 238 | "def model(n_components=None, solver='svd', shrinkage=None, priors=None,\n", 239 | " store_covariance=False, tol=0.0001, covariance_estimator=None):\n", 240 | " '''\n", 241 | " n_components : int, default=None\n", 242 | " Number of components (<= min(n_classes - 1, n_features)) for dimensionality reduction. \n", 243 | " If None, will be set to min(n_classes - 1, n_features). This parameter only affects the transform method.\n", 244 | " \n", 245 | " solver : {‘svd’, ‘lsqr’, ‘eigen’}, default=’svd’\n", 246 | " Solver to use, possible values:\n", 247 | " ‘svd’: Singular value decomposition (default). Does not compute the covariance matrix, therefore this solver is recommended for data with a large number of features.\n", 248 | " ‘lsqr’: Least squares solution. Can be combined with shrinkage or custom covariance estimator.\n", 249 | " ‘eigen’: Eigenvalue decomposition. Can be combined with shrinkage or custom covariance estimator.\n", 250 | " \n", 251 | " shrinkage : ‘auto’ or float, default=None\n", 252 | " Shrinkage parameter, possible values:\n", 253 | " None: no shrinkage (default).\n", 254 | " ‘auto’: automatic shrinkage using the Ledoit-Wolf lemma.\n", 255 | " float between 0 and 1: fixed shrinkage parameter.\n", 256 | " This should be left to None if covariance_estimator is used. Note that shrinkage works only with ‘lsqr’ and ‘eigen’ solvers.\n", 257 | "\n", 258 | " priors : array-like of shape (n_classes,), default=None\n", 259 | " The class prior probabilities. By default, the class proportions are inferred from the training data.\n", 260 | "\n", 261 | " store_covariance : bool, default=False\n", 262 | " If True, explicitely compute the weighted within-class covariance matrix when solver is ‘svd’. The matrix is always computed\n", 263 | " and stored for the other solvers.\n", 264 | "\n", 265 | " tol : float, default=1.0e-4\n", 266 | " Absolute threshold for a singular value of X to be considered significant, used to estimate the rank of X. Dimensions whose singular \n", 267 | " values are non-significant are discarded. 
Only used if solver is ‘svd’.\n", 268 | "\n", 269 | " covariance_estimator : covariance estimator, default=None\n", 270 | " If not None, covariance_estimator is used to estimate the covariance matrices instead of relying on the empirical \n", 271 | " covariance estimator (with potential shrinkage). The object should have a fit method and a covariance_ attribute \n", 272 | " like the estimators in sklearn.covariance. if None the shrinkage parameter drives the estimate.\n", 273 | " '''\n", 274 | " lda = LinearDiscriminantAnalysis(solver=solver, shrinkage=shrinkage, \n", 275 | " priors=priors, n_components=n_components, store_covariance=store_covariance, \n", 276 | " tol=tol, covariance_estimator=covariance_estimator)\n", 277 | " return lda" 278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": 7, 283 | "metadata": {}, 284 | "outputs": [ 285 | { 286 | "data": { 287 | "text/plain": [ 288 | "LinearDiscriminantAnalysis(n_components=5)" 289 | ] 290 | }, 291 | "execution_count": 7, 292 | "metadata": {}, 293 | "output_type": "execute_result" 294 | } 295 | ], 296 | "source": [ 297 | "# Fitting the data to the model\n", 298 | "lda = model(5)\n", 299 | "lda.fit(X,y)\n", 300 | "\n" 301 | ] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "execution_count": 8, 306 | "metadata": {}, 307 | "outputs": [], 308 | "source": [ 309 | "# Transforming data \n", 310 | "data_transformation = lda.transform(X)" 311 | ] 312 | }, 313 | { 314 | "cell_type": "code", 315 | "execution_count": 9, 316 | "metadata": {}, 317 | "outputs": [ 318 | { 319 | "name": "stdout", 320 | "output_type": "stream", 321 | "text": [ 322 | "[[-1.34250698 -0.410752 -0.05284109 -2.52177124 -2.32197387]\n", 323 | " [ 0.92569633 -0.92633682 -0.29396574 -0.62144384 1.61682597]\n", 324 | " [-0.36265323 -0.87103112 1.53812275 0.59888243 -1.39423894]\n", 325 | " ...\n", 326 | " [-0.83323633 0.06686996 0.39414469 -0.5877848 0.11590941]\n", 327 | " [ 0.47329133 1.42040541 0.49439799 -0.05149737 -0.53591346]\n", 328 | " [-1.04969306 0.27613461 -0.13712968 -1.21293132 -0.22775809]]\n" 329 | ] 330 | } 331 | ], 332 | "source": [ 333 | "print(data_transformation)" 334 | ] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": 10, 339 | "metadata": {}, 340 | "outputs": [ 341 | { 342 | "name": "stdout", 343 | "output_type": "stream", 344 | "text": [ 345 | "(1000, 5)\n" 346 | ] 347 | } 348 | ], 349 | "source": [ 350 | "# Notice the reduction of dimensions\n", 351 | "print(data_transformation.shape)" 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": null, 357 | "metadata": {}, 358 | "outputs": [], 359 | "source": [] 360 | }, 361 | { 362 | "cell_type": "code", 363 | "execution_count": null, 364 | "metadata": {}, 365 | "outputs": [], 366 | "source": [] 367 | }, 368 | { 369 | "cell_type": "code", 370 | "execution_count": null, 371 | "metadata": {}, 372 | "outputs": [], 373 | "source": [] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": null, 378 | "metadata": {}, 379 | "outputs": [], 380 | "source": [] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": null, 385 | "metadata": {}, 386 | "outputs": [], 387 | "source": [] 388 | } 389 | ], 390 | "metadata": { 391 | "kernelspec": { 392 | "display_name": "Python 3", 393 | "language": "python", 394 | "name": "python3" 395 | }, 396 | "language_info": { 397 | "codemirror_mode": { 398 | "name": "ipython", 399 | "version": 3 400 | }, 401 | "file_extension": ".py", 402 | "mimetype": "text/x-python", 403 | "name": "python", 
404 | "nbconvert_exporter": "python", 405 | "pygments_lexer": "ipython3", 406 | "version": "3.7.9" 407 | } 408 | }, 409 | "nbformat": 4, 410 | "nbformat_minor": 4 411 | } 412 | -------------------------------------------------------------------------------- /notebooks/PhiK Correlation/data_description.txt: -------------------------------------------------------------------------------- 1 | MSSubClass: Identifies the type of dwelling involved in the sale. 2 | 3 | 20 1-STORY 1946 & NEWER ALL STYLES 4 | 30 1-STORY 1945 & OLDER 5 | 40 1-STORY W/FINISHED ATTIC ALL AGES 6 | 45 1-1/2 STORY - UNFINISHED ALL AGES 7 | 50 1-1/2 STORY FINISHED ALL AGES 8 | 60 2-STORY 1946 & NEWER 9 | 70 2-STORY 1945 & OLDER 10 | 75 2-1/2 STORY ALL AGES 11 | 80 SPLIT OR MULTI-LEVEL 12 | 85 SPLIT FOYER 13 | 90 DUPLEX - ALL STYLES AND AGES 14 | 120 1-STORY PUD (Planned Unit Development) - 1946 & NEWER 15 | 150 1-1/2 STORY PUD - ALL AGES 16 | 160 2-STORY PUD - 1946 & NEWER 17 | 180 PUD - MULTILEVEL - INCL SPLIT LEV/FOYER 18 | 190 2 FAMILY CONVERSION - ALL STYLES AND AGES 19 | 20 | MSZoning: Identifies the general zoning classification of the sale. 21 | 22 | A Agriculture 23 | C Commercial 24 | FV Floating Village Residential 25 | I Industrial 26 | RH Residential High Density 27 | RL Residential Low Density 28 | RP Residential Low Density Park 29 | RM Residential Medium Density 30 | 31 | LotFrontage: Linear feet of street connected to property 32 | 33 | LotArea: Lot size in square feet 34 | 35 | Street: Type of road access to property 36 | 37 | Grvl Gravel 38 | Pave Paved 39 | 40 | Alley: Type of alley access to property 41 | 42 | Grvl Gravel 43 | Pave Paved 44 | NA No alley access 45 | 46 | LotShape: General shape of property 47 | 48 | Reg Regular 49 | IR1 Slightly irregular 50 | IR2 Moderately Irregular 51 | IR3 Irregular 52 | 53 | LandContour: Flatness of the property 54 | 55 | Lvl Near Flat/Level 56 | Bnk Banked - Quick and significant rise from street grade to building 57 | HLS Hillside - Significant slope from side to side 58 | Low Depression 59 | 60 | Utilities: Type of utilities available 61 | 62 | AllPub All public Utilities (E,G,W,& S) 63 | NoSewr Electricity, Gas, and Water (Septic Tank) 64 | NoSeWa Electricity and Gas Only 65 | ELO Electricity only 66 | 67 | LotConfig: Lot configuration 68 | 69 | Inside Inside lot 70 | Corner Corner lot 71 | CulDSac Cul-de-sac 72 | FR2 Frontage on 2 sides of property 73 | FR3 Frontage on 3 sides of property 74 | 75 | LandSlope: Slope of property 76 | 77 | Gtl Gentle slope 78 | Mod Moderate Slope 79 | Sev Severe Slope 80 | 81 | Neighborhood: Physical locations within Ames city limits 82 | 83 | Blmngtn Bloomington Heights 84 | Blueste Bluestem 85 | BrDale Briardale 86 | BrkSide Brookside 87 | ClearCr Clear Creek 88 | CollgCr College Creek 89 | Crawfor Crawford 90 | Edwards Edwards 91 | Gilbert Gilbert 92 | IDOTRR Iowa DOT and Rail Road 93 | MeadowV Meadow Village 94 | Mitchel Mitchell 95 | Names North Ames 96 | NoRidge Northridge 97 | NPkVill Northpark Villa 98 | NridgHt Northridge Heights 99 | NWAmes Northwest Ames 100 | OldTown Old Town 101 | SWISU South & West of Iowa State University 102 | Sawyer Sawyer 103 | SawyerW Sawyer West 104 | Somerst Somerset 105 | StoneBr Stone Brook 106 | Timber Timberland 107 | Veenker Veenker 108 | 109 | Condition1: Proximity to various conditions 110 | 111 | Artery Adjacent to arterial street 112 | Feedr Adjacent to feeder street 113 | Norm Normal 114 | RRNn Within 200' of North-South Railroad 115 | RRAn Adjacent to North-South Railroad 116 | 
PosN Near positive off-site feature--park, greenbelt, etc. 117 | PosA Adjacent to postive off-site feature 118 | RRNe Within 200' of East-West Railroad 119 | RRAe Adjacent to East-West Railroad 120 | 121 | Condition2: Proximity to various conditions (if more than one is present) 122 | 123 | Artery Adjacent to arterial street 124 | Feedr Adjacent to feeder street 125 | Norm Normal 126 | RRNn Within 200' of North-South Railroad 127 | RRAn Adjacent to North-South Railroad 128 | PosN Near positive off-site feature--park, greenbelt, etc. 129 | PosA Adjacent to postive off-site feature 130 | RRNe Within 200' of East-West Railroad 131 | RRAe Adjacent to East-West Railroad 132 | 133 | BldgType: Type of dwelling 134 | 135 | 1Fam Single-family Detached 136 | 2FmCon Two-family Conversion; originally built as one-family dwelling 137 | Duplx Duplex 138 | TwnhsE Townhouse End Unit 139 | TwnhsI Townhouse Inside Unit 140 | 141 | HouseStyle: Style of dwelling 142 | 143 | 1Story One story 144 | 1.5Fin One and one-half story: 2nd level finished 145 | 1.5Unf One and one-half story: 2nd level unfinished 146 | 2Story Two story 147 | 2.5Fin Two and one-half story: 2nd level finished 148 | 2.5Unf Two and one-half story: 2nd level unfinished 149 | SFoyer Split Foyer 150 | SLvl Split Level 151 | 152 | OverallQual: Rates the overall material and finish of the house 153 | 154 | 10 Very Excellent 155 | 9 Excellent 156 | 8 Very Good 157 | 7 Good 158 | 6 Above Average 159 | 5 Average 160 | 4 Below Average 161 | 3 Fair 162 | 2 Poor 163 | 1 Very Poor 164 | 165 | OverallCond: Rates the overall condition of the house 166 | 167 | 10 Very Excellent 168 | 9 Excellent 169 | 8 Very Good 170 | 7 Good 171 | 6 Above Average 172 | 5 Average 173 | 4 Below Average 174 | 3 Fair 175 | 2 Poor 176 | 1 Very Poor 177 | 178 | YearBuilt: Original construction date 179 | 180 | YearRemodAdd: Remodel date (same as construction date if no remodeling or additions) 181 | 182 | RoofStyle: Type of roof 183 | 184 | Flat Flat 185 | Gable Gable 186 | Gambrel Gabrel (Barn) 187 | Hip Hip 188 | Mansard Mansard 189 | Shed Shed 190 | 191 | RoofMatl: Roof material 192 | 193 | ClyTile Clay or Tile 194 | CompShg Standard (Composite) Shingle 195 | Membran Membrane 196 | Metal Metal 197 | Roll Roll 198 | Tar&Grv Gravel & Tar 199 | WdShake Wood Shakes 200 | WdShngl Wood Shingles 201 | 202 | Exterior1st: Exterior covering on house 203 | 204 | AsbShng Asbestos Shingles 205 | AsphShn Asphalt Shingles 206 | BrkComm Brick Common 207 | BrkFace Brick Face 208 | CBlock Cinder Block 209 | CemntBd Cement Board 210 | HdBoard Hard Board 211 | ImStucc Imitation Stucco 212 | MetalSd Metal Siding 213 | Other Other 214 | Plywood Plywood 215 | PreCast PreCast 216 | Stone Stone 217 | Stucco Stucco 218 | VinylSd Vinyl Siding 219 | Wd Sdng Wood Siding 220 | WdShing Wood Shingles 221 | 222 | Exterior2nd: Exterior covering on house (if more than one material) 223 | 224 | AsbShng Asbestos Shingles 225 | AsphShn Asphalt Shingles 226 | BrkComm Brick Common 227 | BrkFace Brick Face 228 | CBlock Cinder Block 229 | CemntBd Cement Board 230 | HdBoard Hard Board 231 | ImStucc Imitation Stucco 232 | MetalSd Metal Siding 233 | Other Other 234 | Plywood Plywood 235 | PreCast PreCast 236 | Stone Stone 237 | Stucco Stucco 238 | VinylSd Vinyl Siding 239 | Wd Sdng Wood Siding 240 | WdShing Wood Shingles 241 | 242 | MasVnrType: Masonry veneer type 243 | 244 | BrkCmn Brick Common 245 | BrkFace Brick Face 246 | CBlock Cinder Block 247 | None None 248 | Stone Stone 249 | 250 | MasVnrArea: Masonry veneer 
area in square feet 251 | 252 | ExterQual: Evaluates the quality of the material on the exterior 253 | 254 | Ex Excellent 255 | Gd Good 256 | TA Average/Typical 257 | Fa Fair 258 | Po Poor 259 | 260 | ExterCond: Evaluates the present condition of the material on the exterior 261 | 262 | Ex Excellent 263 | Gd Good 264 | TA Average/Typical 265 | Fa Fair 266 | Po Poor 267 | 268 | Foundation: Type of foundation 269 | 270 | BrkTil Brick & Tile 271 | CBlock Cinder Block 272 | PConc Poured Contrete 273 | Slab Slab 274 | Stone Stone 275 | Wood Wood 276 | 277 | BsmtQual: Evaluates the height of the basement 278 | 279 | Ex Excellent (100+ inches) 280 | Gd Good (90-99 inches) 281 | TA Typical (80-89 inches) 282 | Fa Fair (70-79 inches) 283 | Po Poor (<70 inches 284 | NA No Basement 285 | 286 | BsmtCond: Evaluates the general condition of the basement 287 | 288 | Ex Excellent 289 | Gd Good 290 | TA Typical - slight dampness allowed 291 | Fa Fair - dampness or some cracking or settling 292 | Po Poor - Severe cracking, settling, or wetness 293 | NA No Basement 294 | 295 | BsmtExposure: Refers to walkout or garden level walls 296 | 297 | Gd Good Exposure 298 | Av Average Exposure (split levels or foyers typically score average or above) 299 | Mn Mimimum Exposure 300 | No No Exposure 301 | NA No Basement 302 | 303 | BsmtFinType1: Rating of basement finished area 304 | 305 | GLQ Good Living Quarters 306 | ALQ Average Living Quarters 307 | BLQ Below Average Living Quarters 308 | Rec Average Rec Room 309 | LwQ Low Quality 310 | Unf Unfinshed 311 | NA No Basement 312 | 313 | BsmtFinSF1: Type 1 finished square feet 314 | 315 | BsmtFinType2: Rating of basement finished area (if multiple types) 316 | 317 | GLQ Good Living Quarters 318 | ALQ Average Living Quarters 319 | BLQ Below Average Living Quarters 320 | Rec Average Rec Room 321 | LwQ Low Quality 322 | Unf Unfinshed 323 | NA No Basement 324 | 325 | BsmtFinSF2: Type 2 finished square feet 326 | 327 | BsmtUnfSF: Unfinished square feet of basement area 328 | 329 | TotalBsmtSF: Total square feet of basement area 330 | 331 | Heating: Type of heating 332 | 333 | Floor Floor Furnace 334 | GasA Gas forced warm air furnace 335 | GasW Gas hot water or steam heat 336 | Grav Gravity furnace 337 | OthW Hot water or steam heat other than gas 338 | Wall Wall furnace 339 | 340 | HeatingQC: Heating quality and condition 341 | 342 | Ex Excellent 343 | Gd Good 344 | TA Average/Typical 345 | Fa Fair 346 | Po Poor 347 | 348 | CentralAir: Central air conditioning 349 | 350 | N No 351 | Y Yes 352 | 353 | Electrical: Electrical system 354 | 355 | SBrkr Standard Circuit Breakers & Romex 356 | FuseA Fuse Box over 60 AMP and all Romex wiring (Average) 357 | FuseF 60 AMP Fuse Box and mostly Romex wiring (Fair) 358 | FuseP 60 AMP Fuse Box and mostly knob & tube wiring (poor) 359 | Mix Mixed 360 | 361 | 1stFlrSF: First Floor square feet 362 | 363 | 2ndFlrSF: Second floor square feet 364 | 365 | LowQualFinSF: Low quality finished square feet (all floors) 366 | 367 | GrLivArea: Above grade (ground) living area square feet 368 | 369 | BsmtFullBath: Basement full bathrooms 370 | 371 | BsmtHalfBath: Basement half bathrooms 372 | 373 | FullBath: Full bathrooms above grade 374 | 375 | HalfBath: Half baths above grade 376 | 377 | BedroomAbvGr: Bedrooms above grade (does NOT include basement bedrooms) 378 | 379 | KitchenAbvGr: Kitchens above grade 380 | 381 | KitchenQual: Kitchen quality 382 | 383 | Ex Excellent 384 | Gd Good 385 | TA Typical/Average 386 | Fa Fair 387 | Po Poor 388 | 389 | 
TotRmsAbvGrd: Total rooms above grade (does not include bathrooms) 390 | 391 | Functional: Home functionality (Assume typical unless deductions are warranted) 392 | 393 | Typ Typical Functionality 394 | Min1 Minor Deductions 1 395 | Min2 Minor Deductions 2 396 | Mod Moderate Deductions 397 | Maj1 Major Deductions 1 398 | Maj2 Major Deductions 2 399 | Sev Severely Damaged 400 | Sal Salvage only 401 | 402 | Fireplaces: Number of fireplaces 403 | 404 | FireplaceQu: Fireplace quality 405 | 406 | Ex Excellent - Exceptional Masonry Fireplace 407 | Gd Good - Masonry Fireplace in main level 408 | TA Average - Prefabricated Fireplace in main living area or Masonry Fireplace in basement 409 | Fa Fair - Prefabricated Fireplace in basement 410 | Po Poor - Ben Franklin Stove 411 | NA No Fireplace 412 | 413 | GarageType: Garage location 414 | 415 | 2Types More than one type of garage 416 | Attchd Attached to home 417 | Basment Basement Garage 418 | BuiltIn Built-In (Garage part of house - typically has room above garage) 419 | CarPort Car Port 420 | Detchd Detached from home 421 | NA No Garage 422 | 423 | GarageYrBlt: Year garage was built 424 | 425 | GarageFinish: Interior finish of the garage 426 | 427 | Fin Finished 428 | RFn Rough Finished 429 | Unf Unfinished 430 | NA No Garage 431 | 432 | GarageCars: Size of garage in car capacity 433 | 434 | GarageArea: Size of garage in square feet 435 | 436 | GarageQual: Garage quality 437 | 438 | Ex Excellent 439 | Gd Good 440 | TA Typical/Average 441 | Fa Fair 442 | Po Poor 443 | NA No Garage 444 | 445 | GarageCond: Garage condition 446 | 447 | Ex Excellent 448 | Gd Good 449 | TA Typical/Average 450 | Fa Fair 451 | Po Poor 452 | NA No Garage 453 | 454 | PavedDrive: Paved driveway 455 | 456 | Y Paved 457 | P Partial Pavement 458 | N Dirt/Gravel 459 | 460 | WoodDeckSF: Wood deck area in square feet 461 | 462 | OpenPorchSF: Open porch area in square feet 463 | 464 | EnclosedPorch: Enclosed porch area in square feet 465 | 466 | 3SsnPorch: Three season porch area in square feet 467 | 468 | ScreenPorch: Screen porch area in square feet 469 | 470 | PoolArea: Pool area in square feet 471 | 472 | PoolQC: Pool quality 473 | 474 | Ex Excellent 475 | Gd Good 476 | TA Average/Typical 477 | Fa Fair 478 | NA No Pool 479 | 480 | Fence: Fence quality 481 | 482 | GdPrv Good Privacy 483 | MnPrv Minimum Privacy 484 | GdWo Good Wood 485 | MnWw Minimum Wood/Wire 486 | NA No Fence 487 | 488 | MiscFeature: Miscellaneous feature not covered in other categories 489 | 490 | Elev Elevator 491 | Gar2 2nd Garage (if not described in garage section) 492 | Othr Other 493 | Shed Shed (over 100 SF) 494 | TenC Tennis Court 495 | NA None 496 | 497 | MiscVal: $Value of miscellaneous feature 498 | 499 | MoSold: Month Sold (MM) 500 | 501 | YrSold: Year Sold (YYYY) 502 | 503 | SaleType: Type of sale 504 | 505 | WD Warranty Deed - Conventional 506 | CWD Warranty Deed - Cash 507 | VWD Warranty Deed - VA Loan 508 | New Home just constructed and sold 509 | COD Court Officer Deed/Estate 510 | Con Contract 15% Down payment regular terms 511 | ConLw Contract Low Down payment and low interest 512 | ConLI Contract Low Interest 513 | ConLD Contract Low Down 514 | Oth Other 515 | 516 | SaleCondition: Condition of sale 517 | 518 | Normal Normal Sale 519 | Abnorml Abnormal Sale - trade, foreclosure, short sale 520 | AdjLand Adjoining Land Purchase 521 | Alloca Allocation - two linked properties with separate deeds, typically condo with a garage unit 522 | Family Sale between family members 523 | Partial Home 
was not completed when last assessed (associated with New Homes) 524 | 525 | 526 | 527 | 528 | 529 | 530 | 531 | 532 | 533 | 534 | 535 | 536 | 537 | 538 | 539 | 540 | 541 | 542 | 543 | 544 | 545 | Based on the data description, the following continuous variables were found: 546 | - LotFrontage: Linear feet of street connected to property 547 | - LotArea: Lot size in square feet 548 | - YearBuilt: Original construction date 549 | - YearRemodAdd: Remodel date (same as construction date if no remodeling or additions) 550 | - MasVnrArea: Masonry veneer area in square feet 551 | - BsmtFinSF1: Type 1 finished square feet 552 | - BsmtFinSF2: Type 2 finished square feet 553 | - BsmtUnfSF: Unfinished square feet of basement area 554 | - TotalBsmtSF: Total square feet of basement area 555 | - 1stFlrSF: First Floor square feet 556 | - 2ndFlrSF: Second floor square feet 557 | - LowQualFinSF: Low quality finished square feet (all floors) 558 | - GrLivArea: Above grade (ground) living area square feet 559 | - BsmtFullBath: Basement full bathrooms 560 | - BsmtHalfBath: Basement half bathrooms 561 | - FullBath: Full bathrooms above grade 562 | - HalfBath: Half baths above grade 563 | - BedroomAbvGr: Bedrooms above grade (does NOT include basement bedrooms) 564 | - KitchenAbvGr: Kitchens above grade 565 | - TotRmsAbvGrd: Total rooms above grade (does not include bathrooms) 566 | - GarageYrBlt: Year garage was built 567 | - GarageCars: Size of garage in car capacity 568 | - GarageArea: Size of garage in square feet 569 | - WoodDeckSF: Wood deck area in square feet 570 | - OpenPorchSF: Open porch area in square feet 571 | - EnclosedPorch: Enclosed porch area in square feet 572 | - 3SsnPorch: Three season porch area in square feet 573 | - ScreenPorch: Screen porch area in square feet 574 | - PoolArea: Pool area in square feet -------------------------------------------------------------------------------- /notebooks/agriculture_yield_rice: -------------------------------------------------------------------------------- 1 | # Load the rice-yield dataset 2 | import pandas as pd 3 | import numpy as np 4 | 5 | df = pd.read_csv("Dataset.csv") 6 | df 7 | 8 | # One-hot encode the categorical Irrigation column and prepend the indicator columns 9 | df1 = pd.get_dummies(df['Irrigation']) 10 | df2 = pd.concat([df1, df], axis=1) 11 | df2 -------------------------------------------------------------------------------- /notebooks/biden_speech.txt: -------------------------------------------------------------------------------- 1 | Thank you. Thank you, thank you, thank you. It’s good to be back. As Mitch and Chuck will understand, it’s good to be almost home, down the hall. Anyway, thank you all. 2 | 3 | Madam Speaker, Madam Vice President. No president has ever said those words from this podium. No president has ever said those words. And it’s about time. The first lady, I’m her husband. Second gentleman. Chief justice. Members of the United States Congress and the cabinet, distinguished guests. My fellow Americans. 4 | 5 | While the setting tonight is familiar, this gathering is just a little bit different. A reminder of the extraordinary times we’re in. Throughout our history, presidents have come to this chamber to speak to Congress, to the nation and to the world. To declare war, to celebrate peace, to announce new plans and possibilities. 6 | 7 | 8 | 9 | 10 | 11 | Tonight, I come to talk about crisis and opportunity. About rebuilding the nation, revitalizing our democracy, and winning the future for America. I stand here tonight one day shy of the 100th day of my administration. 
A hundred days since I took the oath of office, lifted my hand off our family Bible and inherited a nation — we all did — that was in crisis. The worst pandemic in a century. The worst economic crisis since the Great Depression. The worst attack on our democracy since the Civil War. Now, after just 100 days, I can report to the nation, America is on the move again. Turning peril into possibility, crisis into opportunity, setbacks to strength. 12 | 13 | We all know life can knock us down. But in America, we never, ever, ever stay down. Americans always get up. Today, that’s what we’re doing. America is rising anew. Choosing hope over fear, truth over lies and light over darkness. After 100 days of rescue and renewal, America is ready for a takeoff, in my view. We’re working again, dreaming again, discovering again and leading the world again. We have shown each other and the world that there’s no quit in America. None. 14 | 15 | 16 | 17 | And more than half of all the adults in America have gotten at least one shot. The mass vaccination center in Glendale, Ariz., I asked the nurse, I said, “What’s it like?” She looked at me, she said, “It’s like every shot is giving a dose of hope” was her phrase, a dose of hope. 18 | 19 | A dose of hope for an educator in Florida, who has a child suffering from an autoimmune disease, wrote to me, said she’s worried — that she was worried about bringing the virus home. She said she then got vaccinated at a large site, in her car. She said she sat in her car when she got vaccinated and just cried, cried out of joy, and cried out of relief. 20 | 21 | Parents seeing the smiles on the kids’ faces, for those who are able to go back to school because the teachers and the school bus drivers and the cafeteria workers have been vaccinated. Grandparents, hugging their children and grandchildren, instead of pressing hands against the window to say goodbye. It means everything. Those things mean everything. 22 | 23 | You know, there’s still — you all know it, you know it better than any group of Americans — there’s still more work to do to beat this virus. We can’t let our guard down. But tonight, I can say, because of you, the American people, our progress these past 100 days against one of the worst pandemics in history has been one of the greatest logistical achievements, logistical achievements this country has ever seen. What else have we done in those first 100 days? 24 | 25 | We kept our commitment, Democrats and Republicans, of sending $1,400 rescue checks to 85 percent of American households. We’ve already sent more than 160 million checks out the door. It’s making a difference. You all know it when you go home. For many people, it’s making all the difference in the world. 26 | 27 | A single mom in Texas who wrote me, she said she couldn’t work. She said the relief check put food on the table and saved her and her son from eviction from their apartment. A grandmother in Virginia who told me she immediately took her granddaughter to the eye doctor, something she said she put off for months because she didn’t have the money. One of the defining images, at least from my perspective, in this crisis has been cars lined up, cars lined up for miles. And not people just barely able to start those cars. Nice cars, lined up for miles, waiting for a box of food to be put in their trunk. 28 | 29 | I don’t know about you, but I didn’t ever think I would see that in America. And all of this is through no fault of their own. 
No fault of their own, these people are in this position. That’s why the rescue plan is delivering food and nutrition assistance to millions of Americans facing hunger. And hunger is down sharply already. 30 | 31 | 32 | 33 | 34 | 35 | 36 | Folks — as I’ve told every world leader I’ve met with over the years — it’s never, ever, ever been a good bet to bet against America and it still isn’t. We are the United States of America. There is not a single thing — nothing, nothing beyond our capacity. We can do whatever we set our mind to if we do it together. So let’s begin to get together. 37 | 38 | God bless you all, and may God protect our troops. Thank you for your patience. --------------------------------------------------------------------------------