├── .DS_Store
├── KNN
    ├── .ipynb_checkpoints
    │   └── TheorySample-checkpoint.ipynb
    ├── Excercises
    │   ├── .ipynb_checkpoints
    │   │   ├── KNExercise-checkpoint.ipynb
    │   │   └── dataset-checkpoint.csv
    │   ├── KNExercise.ipynb
    │   └── dataset.csv
    └── TheorySample.ipynb
├── LogisticRegression
    ├── .DS_Store
    ├── .ipynb_checkpoints
    │   ├── 7_logistic_regression-checkpoint.ipynb
    │   ├── Exercise-checkpoint.ipynb
    │   ├── LogisticeRegression-checkpoint.ipynb
    │   └── insurance_data-checkpoint.csv
    ├── Exercise.ipynb
    ├── Exercise
    │   ├── .ipynb_checkpoints
    │   │   ├── 7_logistic_regression_exercise-checkpoint.ipynb
    │   │   └── HR_comma_sep-checkpoint.csv
    │   ├── 7_logistic_regression_exercise.ipynb
    │   └── HR_comma_sep.csv
    ├── LogisticeRegression.ipynb
    └── insurance_data.csv
└── README.md

/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/md-thayyib/DataScience-ML/174ef85031b5c97729404c86c6fdcf6d8f936b56/.DS_Store
--------------------------------------------------------------------------------
/KNN/.ipynb_checkpoints/TheorySample-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [],
3 | "metadata": {},
4 | "nbformat": 4,
5 | "nbformat_minor": 5
6 | }
7 |
--------------------------------------------------------------------------------
/KNN/Excercises/.ipynb_checkpoints/KNExercise-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [],
3 | "metadata": {},
4 | "nbformat": 4,
5 | "nbformat_minor": 5
6 | }
7 |
--------------------------------------------------------------------------------
/KNN/Excercises/.ipynb_checkpoints/dataset-checkpoint.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/md-thayyib/DataScience-ML/174ef85031b5c97729404c86c6fdcf6d8f936b56/KNN/Excercises/.ipynb_checkpoints/dataset-checkpoint.csv
-------------------------------------------------------------------------------- /KNN/TheorySample.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "1035894a-8de0-4ded-8565-f0b4520f0c27", 6 | "metadata": {}, 7 | "source": [ 8 | "# KNN\n", 9 | "\n", 10 | "KNN is one of the simplest algorithms for both classification and regression, but it is mainly used for classification problems. The idea behind KNN is to compute the distance between a new data point and the neighbouring data points. We choose the number of neighbours (k), take the k nearest points, and let each of them vote for the class it belongs to. The majority wins, so an odd k is a good choice for two-class problems because it avoids ties. There is no formula for choosing k; instead, try k from 1 to some n (say 40) and compare the error and accuracy for each value.\n", 11 | "\n", 12 | "There are 4 common ways to calculate the distance:\n", 13 | "1. Euclidean distance - the Pythagorean (straight-line) distance, like distance by air\n", 14 | "2. Manhattan distance - the sum of absolute differences, abs(x1-x2) + abs(y1-y2)\n", 15 | "Manhattan is an island in a U.S. city (New York); this is like the distance travelled by car along a street grid\n", 16 | "3. Minkowski distance - takes a parameter p. If p=2 => Euclidean\n", 17 | "if p=1 => Manhattan; other values of p can also be used\n", 18 | "4. Hamming distance - the sum or average of the bit positions at which two codes differ;\n", 19 | "it comes up with one-hot encoding. 0 0 1 1\n", 20 | " 1 0 0 1 distance equals 2/4 = .5\n", 21 | " \n", 22 | "\"K-Nearest Neighbors is the supervised machine learning algorithm used for classification and regression. It manipulates the training data and classifies the new test data based on distance metrics. 
It finds the k-nearest neighbors to the test data, and then classification is performed by the majority of class labels.\n", 23 | "\n", 24 | "Selecting the optimal K value to achieve the maximum accuracy of the model is always challenging for a data scientist.\"\n", 25 | "\n", 26 | "\n", 27 | "### KNN is a lazy learner: there is no real training step; it keeps the training data and does the work (distance computations) each time it sees a new sample\n", 28 | "\n", 29 | "## Here I am using the iris data set" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 28, 35 | "id": "afebc598-aea6-4238-93f6-57fc6305813b", 36 | "metadata": {}, 37 | "outputs": [], 38 | "source": [ 39 | "import pandas as pd\n", 40 | "import seaborn as sns\n", 41 | "import matplotlib.pyplot as plt\n", 42 | "from sklearn.datasets import load_iris\n", "iris = load_iris()" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 29, 48 | "id": "00d314e4-72a2-4381-bef5-499559d685c2", 49 | "metadata": {}, 50 | "outputs": [ 51 | { 52 | "data": { 53 | "text/plain": [ 54 | "['sepal length (cm)',\n", 55 | " 'sepal width (cm)',\n", 56 | " 'petal length (cm)',\n", 57 | " 'petal width (cm)']" 58 | ] 59 | }, 60 | "execution_count": 29, 61 | "metadata": {}, 62 | "output_type": "execute_result" 63 | } 64 | ], 65 | "source": [ 66 | "iris.feature_names" 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": 30, 72 | "id": "1791ce7e-768b-482e-b983-b88695d148f5", 73 | "metadata": {}, 74 | "outputs": [ 75 | { 76 | "data": { 77 | "text/plain": [ 78 | "array(['setosa', 'versicolor', 'virginica'], dtype='\n", 121 | "\n", 134 | "\n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " 
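The distance measures and the majority vote described in the KNN notes can be sketched in plain Python. This is a minimal illustration only, not how scikit-learn implements `KNeighborsClassifier`; the function names (`minkowski`, `euclidean`, `manhattan`, `hamming`, `knn_predict`) and the toy points are assumptions made up for this example.

```python
from collections import Counter


def minkowski(a, b, p=2):
    """Minkowski distance; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def euclidean(a, b):
    return minkowski(a, b, p=2)


def manhattan(a, b):
    return minkowski(a, b, p=1)


def hamming(a, b):
    """Fraction of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def knn_predict(train_X, train_y, query, k=3, dist=euclidean):
    """Label `query` by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data invented for illustration (not the iris measurements)
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(euclidean((0, 0), (3, 4)))                        # 5.0
print(manhattan((0, 0), (3, 4)))                        # 7.0
print(hamming((0, 0, 1, 1), (1, 0, 0, 1)))              # 0.5
print(knn_predict(train_X, train_y, (1.1, 1.0), k=3))   # a
```

The Hamming call reuses the `0 0 1 1` versus `1 0 0 1` example from the notes. Sorting every training point for each query is the simplest approach and is fine for small data; it is chosen here for clarity, not speed.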
\n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | "
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  Species
0                5.1               3.5                1.4               0.2        0
1                4.9               3.0                1.4               0.2        0
2                4.7               3.2                1.3               0.2        0
3                4.6               3.1                1.5               0.2        0
4                5.0               3.6                1.4               0.2        0
\n", 188 | "" 189 | ], 190 | "text/plain": [ 191 | " sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \\\n", 192 | "0 5.1 3.5 1.4 0.2 \n", 193 | "1 4.9 3.0 1.4 0.2 \n", 194 | "2 4.7 3.2 1.3 0.2 \n", 195 | "3 4.6 3.1 1.5 0.2 \n", 196 | "4 5.0 3.6 1.4 0.2 \n", 197 | "\n", 198 | " Species \n", 199 | "0 0 \n", 200 | "1 0 \n", 201 | "2 0 \n", 202 | "3 0 \n", 203 | "4 0 " 204 | ] 205 | }, 206 | "execution_count": 33, 207 | "metadata": {}, 208 | "output_type": "execute_result" 209 | } 210 | ], 211 | "source": [ 212 | "df.head()" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": 34, 218 | "id": "d6d918fe-0a10-4434-975a-d90e2845641f", 219 | "metadata": {}, 220 | "outputs": [ 221 | { 222 | "data": { 223 | "text/plain": [ 224 | "(150, 5)" 225 | ] 226 | }, 227 | "execution_count": 34, 228 | "metadata": {}, 229 | "output_type": "execute_result" 230 | } 231 | ], 232 | "source": [ 233 | "df.shape" 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": 35, 239 | "id": "70d0a5f7-7892-4e72-969b-9e0f959f49d2", 240 | "metadata": {}, 241 | "outputs": [ 242 | { 243 | "data": { 244 | "text/plain": [ 245 | "" 246 | ] 247 | }, 248 | "execution_count": 35, 249 | "metadata": {}, 250 | "output_type": "execute_result" 251 | }, 252 | { 253 | "data": { 254 | "image/png": "\n", 255 | "text/plain": [ 256 | "
" 257 | ] 258 | }, 259 | "metadata": { 260 | "needs_background": "light" 261 | }, 262 | "output_type": "display_data" 263 | } 264 | ], 265 | "source": [ 266 | "sns.stripplot(x = 'petal length (cm)',y = 'sepal length (cm)',data=df,hue='Species')" 267 | ] 268 | }, 269 | { 270 | "cell_type": "markdown", 271 | "id": "d2b4f452-92f1-4827-aec7-4a5917950739", 272 | "metadata": {}, 273 | "source": [ 274 | "## Train test split" 275 | ] 276 | }, 277 | { 278 | "cell_type": "code", 279 | "execution_count": 36, 280 | "id": "5dee6694-51e4-4e85-98f3-e76868b3ccc0", 281 | "metadata": {}, 282 | "outputs": [], 283 | "source": [ 284 | "X = df.drop('Species',axis=1)" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": 37, 290 | "id": "f7c720f3-614b-4a15-9418-c88068ed9b60", 291 | "metadata": {}, 292 | "outputs": [], 293 | "source": [ 294 | "y = df.Species" 295 | ] 296 | }, 297 | { 298 | "cell_type": "code", 299 | "execution_count": 38, 300 | "id": "46f601a4-38f2-4a6b-af0a-6441b8616c4c", 301 | "metadata": {}, 302 | "outputs": [], 303 | "source": [ 304 | "from sklearn.model_selection import train_test_split\n", 305 | "X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=.3)" 306 | ] 307 | }, 308 | { 309 | "cell_type": "markdown", 310 | "id": "09792034-450d-405f-9c37-a6b6d62fe9b4", 311 | "metadata": {}, 312 | "source": [ 313 | "## Create KNN model" 314 | ] 315 | }, 316 | { 317 | "cell_type": "code", 318 | "execution_count": 39, 319 | "id": "a1ee0268-b025-4099-a4f7-e5dc3fbaa80b", 320 | "metadata": {}, 321 | "outputs": [], 322 | "source": [ 323 | "from sklearn.neighbors import KNeighborsClassifier" 324 | ] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "execution_count": 64, 329 | "id": "ea20c832-80ba-460e-8df4-1b39f3974a30", 330 | "metadata": {}, 331 | "outputs": [], 332 | "source": [ 333 | "model = KNeighborsClassifier(n_neighbors=5,metric='minkowski',p=2,)" 334 | ] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": 65, 339 | "id": 
"46ba1653-536d-436c-be54-07c5c8363190", 340 | "metadata": {}, 341 | "outputs": [ 342 | { 343 | "data": { 344 | "text/plain": [ 345 | "KNeighborsClassifier()" 346 | ] 347 | }, 348 | "execution_count": 65, 349 | "metadata": {}, 350 | "output_type": "execute_result" 351 | } 352 | ], 353 | "source": [ 354 | "model.fit(X_train,y_train)" 355 | ] 356 | }, 357 | { 358 | "cell_type": "markdown", 359 | "id": "93fc9293-30ed-4427-9900-671729820c8b", 360 | "metadata": {}, 361 | "source": [ 362 | "### Score" 363 | ] 364 | }, 365 | { 366 | "cell_type": "code", 367 | "execution_count": 66, 368 | "id": "3f91a26e-f4e0-4c6b-b136-dd7cbf7b74ff", 369 | "metadata": {}, 370 | "outputs": [ 371 | { 372 | "data": { 373 | "text/plain": [ 374 | "0.9333333333333333" 375 | ] 376 | }, 377 | "execution_count": 66, 378 | "metadata": {}, 379 | "output_type": "execute_result" 380 | } 381 | ], 382 | "source": [ 383 | "model.score(X_test,y_test)" 384 | ] 385 | }, 386 | { 387 | "cell_type": "code", 388 | "execution_count": 80, 389 | "id": "f7af8ca0-b959-4364-9ad9-52ee2891aec4", 390 | "metadata": {}, 391 | "outputs": [ 392 | { 393 | "data": { 394 | "text/plain": [ 395 | "{'algorithm': 'auto',\n", 396 | " 'leaf_size': 30,\n", 397 | " 'metric': 'minkowski',\n", 398 | " 'metric_params': None,\n", 399 | " 'n_jobs': None,\n", 400 | " 'n_neighbors': 5,\n", 401 | " 'p': 2,\n", 402 | " 'weights': 'uniform'}" 403 | ] 404 | }, 405 | "execution_count": 80, 406 | "metadata": {}, 407 | "output_type": "execute_result" 408 | } 409 | ], 410 | "source": [ 411 | "model.get_params()" 412 | ] 413 | }, 414 | { 415 | "cell_type": "code", 416 | "execution_count": null, 417 | "id": "4c31a01e-8b97-4a63-9828-afa975401d88", 418 | "metadata": {}, 419 | "outputs": [], 420 | "source": [] 421 | }, 422 | { 423 | "cell_type": "code", 424 | "execution_count": null, 425 | "id": "9c6cb8a3-5e57-4d7c-82a6-6256dadb58d3", 426 | "metadata": {}, 427 | "outputs": [], 428 | "source": [] 429 | } 430 | ], 431 | "metadata": { 432 | "kernelspec": { 
433 | "display_name": "Python 3 (ipykernel)", 434 | "language": "python", 435 | "name": "python3" 436 | }, 437 | "language_info": { 438 | "codemirror_mode": { 439 | "name": "ipython", 440 | "version": 3 441 | }, 442 | "file_extension": ".py", 443 | "mimetype": "text/x-python", 444 | "name": "python", 445 | "nbconvert_exporter": "python", 446 | "pygments_lexer": "ipython3", 447 | "version": "3.9.12" 448 | } 449 | }, 450 | "nbformat": 4, 451 | "nbformat_minor": 5 452 | } 453 | -------------------------------------------------------------------------------- /LogisticRegression/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/md-thayyib/DataScience-ML/174ef85031b5c97729404c86c6fdcf6d8f936b56/LogisticRegression/.DS_Store -------------------------------------------------------------------------------- /LogisticRegression/.ipynb_checkpoints/7_logistic_regression-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "

Predicting whether a person would buy life insurance based on their age using logistic regression

" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Above is a binary logistic regression problem as there are only two possible outcomes (i.e. if person buys insurance or he/she doesn't). " 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 15, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "import pandas as pd\n", 24 | "from matplotlib import pyplot as plt\n", 25 | "%matplotlib inline" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 16, 31 | "metadata": {}, 32 | "outputs": [ 33 | { 34 | "data": { 35 | "text/html": [ 36 | "
\n", 37 | "\n", 50 | "\n", 51 | " \n", 52 | " \n", 53 | " \n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | "
   age  bought_insurance
0   22                 0
1   25                 0
2   47                 1
3   52                 0
4   46                 1
\n", 85 | "
" 87 | ], 88 | "text/plain": [ 89 | " age bought_insurance\n", 90 | "0 22 0\n", 91 | "1 25 0\n", 92 | "2 47 1\n", 93 | "3 52 0\n", 94 | "4 46 1" 95 | ] 96 | }, 97 | "execution_count": 16, 98 | "metadata": {}, 99 | "output_type": "execute_result" 100 | } 101 | ], 102 | "source": [ 103 | "df = pd.read_csv(\"insurance_data.csv\")\n", 104 | "df.head()" 105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": 17, 110 | "metadata": {}, 111 | "outputs": [ 112 | { 113 | "data": { 114 | "text/plain": [ 115 | "" 116 | ] 117 | }, 118 | "execution_count": 17, 119 | "metadata": {}, 120 | "output_type": "execute_result" 121 | }, 122 | { 123 | "data": { 124 | "image/png": "[base64-encoded PNG omitted: scatter plot of age against bought_insurance]\n", 125 | "text/plain": 
[ 126 | "
" 127 | ] 128 | }, 129 | "metadata": { 130 | "needs_background": "light" 131 | }, 132 | "output_type": "display_data" 133 | } 134 | ], 135 | "source": [ 136 | "plt.scatter(df.age,df.bought_insurance,marker='+',color='red')" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": 18, 142 | "metadata": {}, 143 | "outputs": [], 144 | "source": [ 145 | "from sklearn.model_selection import train_test_split" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 29, 151 | "metadata": {}, 152 | "outputs": [], 153 | "source": [ 154 | "X_train, X_test, y_train, y_test = train_test_split(df[['age']],df.bought_insurance,train_size=0.8)" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": 30, 160 | "metadata": {}, 161 | "outputs": [ 162 | { 163 | "data": { 164 | "text/html": [ 165 | "
\n", 166 | "\n", 179 | "\n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | "
    age
4    46
8    62
26   23
17   58
24   50
25   54
\n", 213 | "
" 214 | ], 215 | "text/plain": [ 216 | " age\n", 217 | "4 46\n", 218 | "8 62\n", 219 | "26 23\n", 220 | "17 58\n", 221 | "24 50\n", 222 | "25 54" 223 | ] 224 | }, 225 | "execution_count": 30, 226 | "metadata": {}, 227 | "output_type": "execute_result" 228 | } 229 | ], 230 | "source": [ 231 | "X_test" 232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": 31, 237 | "metadata": {}, 238 | "outputs": [], 239 | "source": [ 240 | "from sklearn.linear_model import LogisticRegression\n", 241 | "model = LogisticRegression()" 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": 66, 247 | "metadata": { 248 | "scrolled": true 249 | }, 250 | "outputs": [ 251 | { 252 | "name": "stderr", 253 | "output_type": "stream", 254 | "text": [ 255 | "C:\\ProgramData\\Anaconda3\\lib\\site-packages\\sklearn\\linear_model\\logistic.py:433: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.\n", 256 | " FutureWarning)\n" 257 | ] 258 | }, 259 | { 260 | "data": { 261 | "text/plain": [ 262 | "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n", 263 | " intercept_scaling=1, max_iter=100, multi_class='warn',\n", 264 | " n_jobs=None, penalty='l2', random_state=None, solver='warn',\n", 265 | " tol=0.0001, verbose=0, warm_start=False)" 266 | ] 267 | }, 268 | "execution_count": 66, 269 | "metadata": {}, 270 | "output_type": "execute_result" 271 | } 272 | ], 273 | "source": [ 274 | "model.fit(X_train, y_train)" 275 | ] 276 | }, 277 | { 278 | "cell_type": "code", 279 | "execution_count": 9, 280 | "metadata": {}, 281 | "outputs": [ 282 | { 283 | "data": { 284 | "text/html": [ 285 | "
\n", 286 | "\n", 299 | "\n", 300 | " \n", 301 | " \n", 302 | " \n", 303 | " \n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | "
    age
16   25
21   26
2    47
\n", 321 | "
" 322 | ], 323 | "text/plain": [ 324 | " age\n", 325 | "16 25\n", 326 | "21 26\n", 327 | "2 47" 328 | ] 329 | }, 330 | "execution_count": 9, 331 | "metadata": {}, 332 | "output_type": "execute_result" 333 | } 334 | ], 335 | "source": [ 336 | "X_test" 337 | ] 338 | }, 339 | { 340 | "cell_type": "code", 341 | "execution_count": 39, 342 | "metadata": {}, 343 | "outputs": [], 344 | "source": [ 345 | "y_predicted = model.predict(X_test)" 346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "execution_count": 33, 351 | "metadata": {}, 352 | "outputs": [ 353 | { 354 | "data": { 355 | "text/plain": [ 356 | "array([[0.40569485, 0.59430515],\n", 357 | " [0.26002994, 0.73997006],\n", 358 | " [0.63939494, 0.36060506],\n", 359 | " [0.29321765, 0.70678235],\n", 360 | " [0.36637568, 0.63362432],\n", 361 | " [0.32875922, 0.67124078]])" 362 | ] 363 | }, 364 | "execution_count": 33, 365 | "metadata": {}, 366 | "output_type": "execute_result" 367 | } 368 | ], 369 | "source": [ 370 | "model.predict_proba(X_test)" 371 | ] 372 | }, 373 | { 374 | "cell_type": "code", 375 | "execution_count": 34, 376 | "metadata": { 377 | "scrolled": true 378 | }, 379 | "outputs": [ 380 | { 381 | "data": { 382 | "text/plain": [ 383 | "1.0" 384 | ] 385 | }, 386 | "execution_count": 34, 387 | "metadata": {}, 388 | "output_type": "execute_result" 389 | } 390 | ], 391 | "source": [ 392 | "model.score(X_test,y_test)" 393 | ] 394 | }, 395 | { 396 | "cell_type": "code", 397 | "execution_count": 40, 398 | "metadata": {}, 399 | "outputs": [ 400 | { 401 | "data": { 402 | "text/plain": [ 403 | "array([1, 1, 0, 1, 1, 1], dtype=int64)" 404 | ] 405 | }, 406 | "execution_count": 40, 407 | "metadata": {}, 408 | "output_type": "execute_result" 409 | } 410 | ], 411 | "source": [ 412 | "y_predicted" 413 | ] 414 | }, 415 | { 416 | "cell_type": "code", 417 | "execution_count": 37, 418 | "metadata": { 419 | "scrolled": true 420 | }, 421 | "outputs": [ 422 | { 423 | "data": { 424 | "text/html": [ 425 | "
\n", 426 | "\n", 439 | "\n", 440 | " \n", 441 | " \n", 442 | " \n", 443 | " \n", 444 | " \n", 445 | " \n", 446 | " \n", 447 | " \n", 448 | " \n", 449 | " \n", 450 | " \n", 451 | " \n", 452 | " \n", 453 | " \n", 454 | " \n", 455 | " \n", 456 | " \n", 457 | " \n", 458 | " \n", 459 | " \n", 460 | " \n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | " \n", 467 | " \n", 468 | " \n", 469 | " \n", 470 | " \n", 471 | " \n", 472 | "
    age
4    46
8    62
26   23
17   58
24   50
25   54
\n", 473 | "
" 474 | ], 475 | "text/plain": [ 476 | " age\n", 477 | "4 46\n", 478 | "8 62\n", 479 | "26 23\n", 480 | "17 58\n", 481 | "24 50\n", 482 | "25 54" 483 | ] 484 | }, 485 | "execution_count": 37, 486 | "metadata": {}, 487 | "output_type": "execute_result" 488 | } 489 | ], 490 | "source": [ 491 | "X_test" 492 | ] 493 | }, 494 | { 495 | "cell_type": "markdown", 496 | "metadata": {}, 497 | "source": [ 498 | "**model.coef_ gives the value of m in the linear equation z = m*x + b (z is then passed through the sigmoid)**" 499 | ] 500 | }, 501 | { 502 | "cell_type": "code", 503 | "execution_count": 67, 504 | "metadata": { 505 | "scrolled": true 506 | }, 507 | "outputs": [ 508 | { 509 | "data": { 510 | "text/plain": [ 511 | "array([[0.04150133]])" 512 | ] 513 | }, 514 | "execution_count": 67, 515 | "metadata": {}, 516 | "output_type": "execute_result" 517 | } 518 | ], 519 | "source": [ 520 | "model.coef_" 521 | ] 522 | }, 523 | { 524 | "cell_type": "markdown", 525 | "metadata": {}, 526 | "source": [ 527 | "**model.intercept_ gives the value of b in the linear equation z = m*x + b**" 528 | ] 529 | }, 530 | { 531 | "cell_type": "code", 532 | "execution_count": 68, 533 | "metadata": { 534 | "scrolled": true 535 | }, 536 | "outputs": [ 537 | { 538 | "data": { 539 | "text/plain": [ 540 | "array([-1.52726963])" 541 | ] 542 | }, 543 | "execution_count": 68, 544 | "metadata": {}, 545 | "output_type": "execute_result" 546 | } 547 | ], 548 | "source": [ 549 | "model.intercept_" 550 | ] 551 | }, 552 | { 553 | "cell_type": "markdown", 554 | "metadata": {}, 555 | "source": [ 556 | "**Let's define the sigmoid function now and do the math by hand**" 557 | ] 558 | }, 559 | { 560 | "cell_type": "code", 561 | "execution_count": 43, 562 | "metadata": {}, 563 | "outputs": [], 564 | "source": [ 565 | "import math\n", 566 | "def sigmoid(x):\n", 567 | "    return 1 / (1 + math.exp(-x))" 568 | ] 569 | }, 570 | { 571 | "cell_type": "code", 572 | "execution_count": 75, 573 | "metadata": {}, 574 | "outputs": [], 575 | "source": [ 576 | "def prediction_function(age):\n", 
577 | "    z = 0.042 * age - 1.53 # 0.04150133 ~ 0.042 and -1.52726963 ~ -1.53\n", 578 | "    y = sigmoid(z)\n", 579 | "    return y" 580 | ] 581 | }, 582 | { 583 | "cell_type": "code", 584 | "execution_count": 76, 585 | "metadata": {}, 586 | "outputs": [ 587 | { 588 | "data": { 589 | "text/plain": [ 590 | "0.4850044983805899" 591 | ] 592 | }, 593 | "execution_count": 76, 594 | "metadata": {}, 595 | "output_type": "execute_result" 596 | } 597 | ], 598 | "source": [ 599 | "age = 35\n", 600 | "prediction_function(age)" 601 | ] 602 | }, 603 | { 604 | "cell_type": "markdown", 605 | "metadata": {}, 606 | "source": [ 607 | "**0.485 is less than 0.5, which means a 35-year-old person is predicted *not* to buy insurance**" 608 | ] 609 | }, 610 | { 611 | "cell_type": "code", 612 | "execution_count": 77, 613 | "metadata": { 614 | "scrolled": true 615 | }, 616 | "outputs": [ 617 | { 618 | "data": { 619 | "text/plain": [ 620 | "0.568565299077705" 621 | ] 622 | }, 623 | "execution_count": 77, 624 | "metadata": {}, 625 | "output_type": "execute_result" 626 | } 627 | ], 628 | "source": [ 629 | "age = 43\n", 630 | "prediction_function(age)" 631 | ] 632 | }, 633 | { 634 | "cell_type": "markdown", 635 | "metadata": {}, 636 | "source": [ 637 | "**0.568 is more than 0.5, which means a 43-year-old person is predicted to buy the insurance**" 638 | ] 639 | }, 640 | { 641 | "cell_type": "markdown", 642 | "metadata": {}, 643 | "source": [ 644 | "
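The hand calculation above can be condensed into a short sketch that also shows where the prediction flips from 0 to 1. It assumes the rounded coefficient 0.042 and intercept -1.53 used in `prediction_function`; the names `predict_proba_by_hand`, `predict_by_hand` and `boundary_age` are hypothetical helpers for this example, not scikit-learn API.

```python
import math

# Rounded values taken from the notebook's model.coef_ and model.intercept_ outputs
M = 0.042
B = -1.53


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def predict_proba_by_hand(age, m=M, b=B):
    """Probability of buying insurance: sigmoid of the linear term m*age + b."""
    return sigmoid(m * age + b)


def predict_by_hand(age, threshold=0.5):
    """Class label: 1 if the probability reaches the threshold, else 0."""
    return int(predict_proba_by_hand(age) >= threshold)


# The probability is exactly 0.5 where m*age + b = 0, i.e. at age = -b/m
boundary_age = -B / M

for age in (35, 43):
    print(age, round(predict_proba_by_hand(age), 3), predict_by_hand(age))
# 35 0.485 0
# 43 0.569 1
print(round(boundary_age, 1))  # 36.4
```

With these rounded values the decision boundary sits near age 36.4, which is consistent with the two results above: 35 falls below it and 43 above it.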

Exercise

\n", 645 | "\n", 646 | "Download the employee retention dataset from here: https://www.kaggle.com/giripujar/hr-analytics. \n", 647 | "1. Do some exploratory data analysis to figure out which variables have a direct and clear impact on employee retention (i.e. whether they leave the company or continue to work)\n", 648 | "2. Plot bar charts showing the impact of employee salaries on retention\n", 649 | "3. Plot bar charts showing the correlation between department and employee retention\n", 650 | "4. Build a logistic regression model using the variables that were narrowed down in step 1\n", 651 | "5. Measure the accuracy of the model" 652 | ] 653 | } 654 | ], 655 | "metadata": { 656 | "kernelspec": { 657 | "display_name": "Python 3", 658 | "language": "python", 659 | "name": "python3" 660 | }, 661 | "language_info": { 662 | "codemirror_mode": { 663 | "name": "ipython", 664 | "version": 3 665 | }, 666 | "file_extension": ".py", 667 | "mimetype": "text/x-python", 668 | "name": "python", 669 | "nbconvert_exporter": "python", 670 | "pygments_lexer": "ipython3", 671 | "version": "3.7.3" 672 | } 673 | }, 674 | "nbformat": 4, 675 | "nbformat_minor": 2 676 | } 677 | -------------------------------------------------------------------------------- /LogisticRegression/.ipynb_checkpoints/Exercise-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [], 3 | "metadata": {}, 4 | "nbformat": 4, 5 | "nbformat_minor": 5 6 | } 7 | -------------------------------------------------------------------------------- /LogisticRegression/.ipynb_checkpoints/LogisticeRegression-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [], 3 | "metadata": {}, 4 | "nbformat": 4, 5 | "nbformat_minor": 5 6 | } 7 | -------------------------------------------------------------------------------- /LogisticRegression/.ipynb_checkpoints/insurance_data-checkpoint.csv: 
-------------------------------------------------------------------------------- 1 | age,bought_insurance 2 | 22,0 3 | 25,0 4 | 47,1 5 | 52,0 6 | 46,1 7 | 56,1 8 | 55,0 9 | 60,1 10 | 62,1 11 | 61,1 12 | 18,0 13 | 28,0 14 | 27,0 15 | 29,0 16 | 49,1 17 | 55,1 18 | 25,1 19 | 58,1 20 | 19,0 21 | 18,0 22 | 21,0 23 | 26,0 24 | 40,1 25 | 45,1 26 | 50,1 27 | 54,1 28 | 23,0 -------------------------------------------------------------------------------- /LogisticRegression/Exercise/.ipynb_checkpoints/7_logistic_regression_exercise-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "Dataset is downloaded from Kaggle. Link: https://www.kaggle.com/giripujar/hr-analytics" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 13, 13 | "metadata": { 14 | "collapsed": true 15 | }, 16 | "outputs": [], 17 | "source": [ 18 | "import pandas as pd\n", 19 | "from matplotlib import pyplot as plt\n", 20 | "%matplotlib inline" 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 3, 26 | "metadata": {}, 27 | "outputs": [ 28 | { 29 | "data": { 30 | "text/html": [ 31 | "
\n", 32 | "\n", 45 | "\n", 46 | " \n", 47 | " \n", 48 | " \n", 49 | " \n", 50 | " \n", 51 | " \n", 52 | " \n", 53 | " \n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | "
   satisfaction_level  last_evaluation  number_project  average_montly_hours  time_spend_company  Work_accident  left  promotion_last_5years  Department  salary
0                0.38             0.53               2                   157                   3              0     1                      0       sales     low
1                0.80             0.86               5                   262                   6              0     1                      0       sales  medium
2                0.11             0.88               7                   272                   4              0     1                      0       sales  medium
3                0.72             0.87               5                   223                   5              0     1                      0       sales     low
4                0.37             0.52               2                   159                   3              0     1                      0       sales     low
\n", 129 | "
" 130 | ], 131 | "text/plain": [ 132 | " satisfaction_level last_evaluation number_project average_montly_hours \\\n", 133 | "0 0.38 0.53 2 157 \n", 134 | "1 0.80 0.86 5 262 \n", 135 | "2 0.11 0.88 7 272 \n", 136 | "3 0.72 0.87 5 223 \n", 137 | "4 0.37 0.52 2 159 \n", 138 | "\n", 139 | " time_spend_company Work_accident left promotion_last_5years Department \\\n", 140 | "0 3 0 1 0 sales \n", 141 | "1 6 0 1 0 sales \n", 142 | "2 4 0 1 0 sales \n", 143 | "3 5 0 1 0 sales \n", 144 | "4 3 0 1 0 sales \n", 145 | "\n", 146 | " salary \n", 147 | "0 low \n", 148 | "1 medium \n", 149 | "2 medium \n", 150 | "3 low \n", 151 | "4 low " 152 | ] 153 | }, 154 | "execution_count": 3, 155 | "metadata": {}, 156 | "output_type": "execute_result" 157 | } 158 | ], 159 | "source": [ 160 | "df = pd.read_csv(\"HR_comma_sep.csv\")\n", 161 | "df.head()" 162 | ] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "metadata": {}, 167 | "source": [ 168 | "

Data exploration and visualization

" 169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "execution_count": 73, 174 | "metadata": {}, 175 | "outputs": [ 176 | { 177 | "data": { 178 | "text/plain": [ 179 | "(3571, 10)" 180 | ] 181 | }, 182 | "execution_count": 73, 183 | "metadata": {}, 184 | "output_type": "execute_result" 185 | } 186 | ], 187 | "source": [ 188 | "left = df[df.left==1]\n", 189 | "left.shape" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": 74, 195 | "metadata": {}, 196 | "outputs": [ 197 | { 198 | "data": { 199 | "text/plain": [ 200 | "(11428, 10)" 201 | ] 202 | }, 203 | "execution_count": 74, 204 | "metadata": {}, 205 | "output_type": "execute_result" 206 | } 207 | ], 208 | "source": [ 209 | "retained = df[df.left==0]\n", 210 | "retained.shape" 211 | ] 212 | }, 213 | { 214 | "cell_type": "markdown", 215 | "metadata": {}, 216 | "source": [ 217 | "**Average numbers for all columns** " 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": 31, 223 | "metadata": { 224 | "scrolled": false 225 | }, 226 | "outputs": [ 227 | { 228 | "data": { 229 | "text/html": [ 230 | "
\n", 231 | "\n", 244 | "\n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | "
      satisfaction_level  last_evaluation  number_project  average_montly_hours  time_spend_company  Work_accident  promotion_last_5years
left
0               0.666810         0.715473        3.786664            199.060203            3.380032       0.175009               0.026251
1               0.440098         0.718113        3.855503            207.419210            3.876505       0.047326               0.005321
\n", 290 | "
" 291 | ], 292 | "text/plain": [ 293 | " satisfaction_level last_evaluation number_project \\\n", 294 | "left \n", 295 | "0 0.666810 0.715473 3.786664 \n", 296 | "1 0.440098 0.718113 3.855503 \n", 297 | "\n", 298 | " average_montly_hours time_spend_company Work_accident \\\n", 299 | "left \n", 300 | "0 199.060203 3.380032 0.175009 \n", 301 | "1 207.419210 3.876505 0.047326 \n", 302 | "\n", 303 | " promotion_last_5years \n", 304 | "left \n", 305 | "0 0.026251 \n", 306 | "1 0.005321 " 307 | ] 308 | }, 309 | "execution_count": 31, 310 | "metadata": {}, 311 | "output_type": "execute_result" 312 | } 313 | ], 314 | "source": [ 315 | "df.groupby('left').mean()" 316 | ] 317 | }, 318 | { 319 | "cell_type": "markdown", 320 | "metadata": {}, 321 | "source": [ 322 | "From the table above we can draw the following conclusions:\n", 323 | "
    \n", 324 | "
  1. **Satisfaction Level**: The average satisfaction level is noticeably lower among employees who left the firm (0.44) than among those who were retained (0.67)
  2. \n", 325 | "
  3. **Average Monthly Hours**: Employees who left the firm worked more hours per month on average (207 vs 199 for retained employees)
  4. \n", 326 | "
  5. **Promotion Last 5 Years**: Employees who were promoted in the last 5 years are more likely to be retained (average of 0.026 among retained employees vs 0.005 among those who left)
  6. \n", 327 | "
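The conclusions above rest on per-group averages keyed on `left`. As a minimal, pandas-free sketch of that computation (the rows below are made up for illustration and are not taken from `HR_comma_sep.csv`, and `group_means` is a hypothetical helper name):

```python
from collections import defaultdict

# Hypothetical rows: (left, satisfaction_level, average_montly_hours).
rows = [
    (1, 0.38, 157),
    (1, 0.11, 272),
    (0, 0.70, 190),
    (0, 0.60, 210),
]


def group_means(records):
    """Return {left: (mean satisfaction, mean monthly hours)}."""
    # Per group: running sum of satisfaction, running sum of hours, row count.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for left, satisfaction, hours in records:
        acc = sums[left]
        acc[0] += satisfaction
        acc[1] += hours
        acc[2] += 1
    return {key: (s / n, h / n) for key, (s, h, n) in sums.items()}


means = group_means(rows)
for left, (satisfaction, hours) in sorted(means.items()):
    print(f"left={left}: satisfaction={satisfaction:.3f}, hours={hours:.1f}")
```

On pandas 2.0 or later, the notebook's `df.groupby('left').mean()` will likely need `numeric_only=True`, because `Department` and `salary` hold text.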
" 328 | ] 329 | }, 330 | { 331 | "cell_type": "markdown", 332 | "metadata": {}, 333 | "source": [ 334 | "**Impact of salary on employee retention**" 335 | ] 336 | }, 337 | { 338 | "cell_type": "code", 339 | "execution_count": 37, 340 | "metadata": { 341 | "scrolled": true 342 | }, 343 | "outputs": [ 344 | { 345 | "data": { 346 | "text/plain": [ 347 | "" 348 | ] 349 | }, 350 | "execution_count": 37, 351 | "metadata": {}, 352 | "output_type": "execute_result" 353 | }, 354 | { 355 | "data": { 356 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYEAAAEpCAYAAAB1Fp6nAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFplJREFUeJzt3X+0XWWd3/H3Z8KPsAw4/Igpk4uTWOOMgIgYKGrHqqyR\nHyLQjmWCVnFA0YGO6erYCm3XAO2wyiy6ZIaxQKmlhOUMmFWlMCqIIFkwpRiuViFEaRh+DMkACXEE\nnC4Q4rd/nB1zDBdyb7g5+8bn/VrrrPvs5+x99vfkrpvP2c+z9z6pKiRJbfqlvguQJPXHEJCkhhkC\nktQwQ0CSGmYISFLDDAFJapghIEkNMwQkqWGGgCQ1bJe+C9iW/fbbrxYsWNB3GZK0U/n2t7/9ZFXN\n3dZ6Mz4EFixYwPj4eN9lSNJOJckjk1nP4SBJapghIEkNMwQkqWEzfk5Akvrw/PPPs3btWp599tm+\nS3lZs2fPZmxsjF133XW7tjcEJGkCa9euZc8992TBggUk6bucCVUVGzduZO3atSxcuHC7XsPhIEma\nwLPPPsu+++47YwMAIAn77rvvKzpaMQQk6SXM5ADY7JXWaAhIUsMMAUmaRnPmzNnmOpdccglvfOMb\n+dCHPsSKFSu48847R1DZxJwY1i+MBWd/daT7e/jC9410f7/oWvr9XXrppdxyyy2MjY1x3nnnMWfO\nHN7+9rf3UotHApK0g1x00UUcfvjhHHLIIZx77rkAfPKTn+TBBx/k2GOP5eKLL+byyy/n4osv5tBD\nD+WOO+4YeY2TOhJI8jDwDLAJeKGqFifZB/gisAB4GDi5qv62W/8c4PRu/U9V1de7/rcCVwF7AF8D\nllZVTd/bkaSZ4eabb2bNmjWsXLmSquKEE07g9ttv5/LLL+emm27itttuY7/99uOpp55izpw5fPrT\nn+6lzqkcCby7qg6tqsXd8tnArVW1CLi1WybJgcAS4CDgGODSJLO6bS4DPg4s6h7HvPK3IEkzz803\n38zNN9/MW97yFg477DB+8IMfsGbNmr7LepFXMidwIvCurr0MWAF8puu/tqqeAx5K8gBwRHc0sVdV\n3QWQ5GrgJODGV1CDJM1IVcU555zDJz7xib5LeVmTPRIo4JYk305yRtc3r6oe69qPA/O69nzg0aFt\n13Z987v21v2S9Avn6KOP5sorr+THP/4xAOvWrWP9+vUvWm/PPffkmWeeGXV5PzPZEPiHVXUocCxw\nVpJ3Dj/ZjetP29h+kjOSjCcZ37Bhw3S9rCSNzHvf+14++MEP8ra3vY03velNfOADH5jwP/v3v//9\nXHfddTN7Yriq1nU/1ye5DjgCeCLJ/lX1WJL9gc0Rtw44YGjzsa5vXdfeun+i/V0BXAGwePFiJ44l\n7TQ2f/IHWLp0KUuXLn3ROg8//PDP2m94wxu45557RlHahLZ5JJDkVUn23NwG3gusAm4ATu1WOxW4\nvmv
fACxJsnuShQwmgFd2Q0dPJzkyg+ucPzK0jSSpB5M5EpgHXNfdn2IX4M+r6qYkdwPLk5wOPAKc\nDFBV9yVZDqwGXgDOqqpN3WudyZZTRG/ESWFJ6tU2Q6CqHgTePEH/RuCol9jmAuCCCfrHgYOnXqYk\naUfwimFJapghIEkNMwQkqWHeRVSSJmG673I62buY3nTTTSxdupRNmzbxsY99jLPPPnta6/BIQJJm\nqE2bNnHWWWdx4403snr1aq655hpWr149rfswBCRphlq5ciWvf/3red3rXsduu+3GkiVLuP766b28\nyhCQpBlq3bp1HHDAlhswjI2NsW7dhDda2G6GgCQ1zBCQpBlq/vz5PProlpsyr127lvnzp/fmy4aA\nJM1Qhx9+OGvWrOGhhx7iJz/5Cddeey0nnHDCtO7DU0QlaRL6+GL6XXbZhc997nMcffTRbNq0idNO\nO42DDjpoevcxra8mSZpWxx13HMcdd9wOe32HgySpYYaAJDXMEJCkhhkCktQwQ0CSGmYISFLDPEVU\nkibjvFdP8+s9tc1VTjvtNL7yla/wmte8hlWrVk3v/jseCUjSDPXRj36Um266aYfuwxCQpBnqne98\nJ/vss88O3YchIEkNMwQkqWGGgCQ1zBCQpIZ5iqgkTcYkTumcbqeccgorVqzgySefZGxsjPPPP5/T\nTz99WvdhCEjSDHXNNdfs8H04HCRJDTMEJKlhhoAkvYSq6ruEbXqlNRoCkjSB2bNns3HjxhkdBFXF\nxo0bmT179na/xqQnhpPMAsaBdVV1fJJ9gC8CC4CHgZOr6m+7dc8BTgc2AZ+qqq93/W8FrgL2AL4G\nLK2Z/C8sqVljY2OsXbuWDRs29F3Ky5o9ezZjY2Pbvf1Uzg5aCnwf2KtbPhu4taouTHJ2t/yZJAcC\nS4CDgF8BbknyhqraBFwGfBz4FoMQOAa4cburl6QdZNddd2XhwoV9l7HDTWo4KMkY8D7g80PdJwLL\nuvYy4KSh/mur6rmqegh4ADgiyf7AXlV1V/fp/+qhbSRJPZjsnMAfA/8a+OlQ37yqeqxrPw7M69rz\ngUeH1lvb9c3v2lv3S5J6ss0QSHI8sL6qvv1S63Sf7KdtbD/JGUnGk4zP9PE4SdqZTeZI4B3ACUke\nBq4F3pPkC8AT3RAP3c/13frrgAOGth/r+tZ17a37X6SqrqiqxVW1eO7cuVN4O5KkqdhmCFTVOVU1\nVlULGEz4frOq/hlwA3Bqt9qpwPVd+wZgSZLdkywEFgEru6Gjp5McmSTAR4a2kST14JXcO+hCYHmS\n04FHgJMBquq+JMuB1cALwFndmUEAZ7LlFNEb8cwgSerVlEKgqlYAK7r2RuCol1jvAuCCCfrHgYOn\nWqQkacfwimFJapghIEkNMwQkqWGGgCQ1zBCQpIYZApLUMENAkhpmCEhSwwwBSWqYISBJDTMEJKlh\nhoAkNcwQkKSGGQKS1DBDQJIaZghIUsMMAUlqmCEgSQ0zBCSpYYaAJDXMEJCkhhkCktQwQ0CSGmYI\nSFLDDAFJapghIEkNMwQkqWGGgCQ1zBCQpIYZApLUMENAkhq2zRBIMjvJyiTfS3JfkvO7/n2SfCPJ\nmu7n3kPbnJPkgST3Jzl6qP+tSe7tnrskSXbM25IkTcZkjgSeA95TVW8GDgWOSXIkcDZwa1UtAm7t\nlklyILAEOAg4Brg0yazutS4DPg4s6h7HTON7kSRN0TZDoAZ+3C3u2j0KOBFY1vUvA07q2icC11bV\nc1X1EPAAcESS/YG9ququqirg6qFtJEk9mNScQJJZSb4LrAe+UVXfAuZV1WPdKo8D87r2fODRoc3X\ndn3zu/bW/ZKknkwqBKpqU1UdCowx+FR/8FbPF4Ojg2mR5Iwk40nGN2zYMF0vK0naypTODqqqHwG3\nMRjLf6Ib4qH7ub5bbR1wwNBmY13fuq69df9E+7miqhZX1eK5c+dOp
URJ0hRM5uyguUl+uWvvAfwm\n8APgBuDUbrVTgeu79g3AkiS7J1nIYAJ4ZTd09HSSI7uzgj4ytI0kqQe7TGKd/YFl3Rk+vwQsr6qv\nJPnfwPIkpwOPACcDVNV9SZYDq4EXgLOqalP3WmcCVwF7ADd2D0lST7YZAlV1D/CWCfo3Ake9xDYX\nABdM0D8OHPziLSRJffCKYUlqmCEgSQ0zBCSpYYaAJDXMEJCkhhkCktQwQ0CSGmYISFLDDAFJapgh\nIEkNMwQkqWGGgCQ1zBCQpIYZApLUMENAkhpmCEhSwwwBSWqYISBJDTMEJKlhhoAkNcwQkKSGGQKS\n1DBDQJIaZghIUsMMAUlqmCEgSQ0zBCSpYYaAJDVsl74LkHZa5716xPt7arT7UxM8EpCkhhkCktQw\nQ0CSGrbNEEhyQJLbkqxOcl+SpV3/Pkm+kWRN93PvoW3OSfJAkvuTHD3U/9Yk93bPXZIkO+ZtSZIm\nYzJHAi8Av19VBwJHAmclORA4G7i1qhYBt3bLdM8tAQ4CjgEuTTKre63LgI8Di7rHMdP4XiRJU7TN\nEKiqx6rqO137GeD7wHzgRGBZt9oy4KSufSJwbVU9V1UPAQ8ARyTZH9irqu6qqgKuHtpGktSDKc0J\nJFkAvAX4FjCvqh7rnnocmNe15wOPDm22tuub37W37p9oP2ckGU8yvmHDhqmUKEmagkmHQJI5wJeA\nf1FVTw8/132yr+kqqqquqKrFVbV47ty50/WykqStTCoEkuzKIAD+rKq+3HU/0Q3x0P1c3/WvAw4Y\n2nys61vXtbfulyT1ZDJnBwX4b8D3q+qzQ0/dAJzatU8Frh/qX5Jk9yQLGUwAr+yGjp5OcmT3mh8Z\n2kaS1IPJ3DbiHcCHgXuTfLfr+zfAhcDyJKcDjwAnA1TVfUmWA6sZnFl0VlVt6rY7E7gK2AO4sXtI\nknqyzRCoqr8EXup8/qNeYpsLgAsm6B8HDp5KgZKkHccrhiWpYYaAJDXMEJCkhhkCktQwQ0CSGmYI\nSFLDDAFJapghIEkNMwQkqWGGgCQ1zBCQpIYZApLUMENAkhpmCEhSwwwBSWqYISBJDTMEJKlhhoAk\nNcwQkKSGGQKS1DBDQJIaZghIUsN26bsASerFea8e8f6eGu3+JskjAUlqmCEgSQ0zBCSpYYaAJDXM\nEJCkhnl20FYWnP3Vke7v4QvfN9L9SdIwjwQkqWGGgCQ1bJshkOTKJOuTrBrq2yfJN5Ks6X7uPfTc\nOUkeSHJ/kqOH+t+a5N7uuUuSZPrfjiRpKiZzJHAVcMxWfWcDt1bVIuDWbpkkBwJLgIO6bS5NMqvb\n5jLg48Ci7rH1a0qSRmybIVBVtwM/3Kr7RGBZ114GnDTUf21VPVdVDwEPAEck2R/Yq6ruqqoCrh7a\nRpLUk+2dE5hXVY917ceBeV17PvDo0Hpru775XXvrfklSj17xxHD3yb6moZafSXJGkvEk4xs2bJjO\nl5YkDdneEHiiG+Kh+7m+618HHDC03ljXt65rb90/oaq6oqoWV9XiuXPnbmeJkqRt2d4QuAE4tWuf\nClw/1L8kye5JFjKYAF7ZDR09neTI7qygjwxtI0nqyTavGE5yDfAuYL8ka4FzgQuB5UlOBx4BTgao\nqvuSLAdWAy8AZ1XVpu6lzmRwptEewI3dQ5LUo22GQFWd8hJPHfUS618AXDBB/zhw8JSqkyTtUF4x\nLEkNMwQkqWGGgCQ1zBCQpIYZApLUMENAkhpmCEhSwwwBSWqYISBJDTMEJKlhhoAkNcwQkKSGGQKS\n1DBDQJIaZghIUsMMAUlqmCEgSQ0zBCSpYYaAJDXMEJCkhhkCktQwQ0CSGmYISFLDDAFJapghIEkN\nMwQkqWGGgCQ1zBCQpIYZApLUMENAkhpmCEhSw3YZ9Q6THAP8CTAL+HxVXTjqGmaU8149wn09Nbp9\nSdopjPRIIMks4D8DxwIHAqckO
XCUNUiSthj1cNARwANV9WBV/QS4FjhxxDVIkjqjDoH5wKNDy2u7\nPklSD0Y+JzAZSc4AzugWf5zk/j7r2ZEC+wFPjmRn52cku2nFSH934O9vmjXw+/vVyaw06hBYBxww\ntDzW9f2cqroCuGJURfUpyXhVLe67Dk2dv7udm7+/gVEPB90NLEqyMMluwBLghhHXIEnqjPRIoKpe\nSPLPga8zOEX0yqq6b5Q1SJK2GPmcQFV9DfjaqPc7gzUx7PULyt/dzs3fH5Cq6rsGSVJPvG2EJDXM\nEJCkhhkCktSwGXmx2C+67h5K8xj696+qv+6vIk1Wkv8A3A7cWVV/13c9mrokezO4Xmn47+87/VXU\nLyeGRyzJ7wHnAk8AP+26q6oO6a8qTVaS3wF+A3gb8AxwB3B7VV3fa2GalC7EPwr8FbD5P7+qqvf0\nVlTPDIERS/IA8A+qamPftWj7Jfl7wMnAp4G9q2rPnkvSJHS3oHlTdwNL4ZxAHx4FvLH/TirJ55Pc\nCVzGYDjhA8De/ValKVgF/HLfRcwkzgmMSJJ/2TUfBFYk+Srw3Obnq+qzvRSmqdqXwdXuPwJ+CDxZ\nVS/0W5Km4D8C/yfJKn7+7++E/krqlyEwOpuHC/66e+zWPbQTqap/DJDkjcDRwG1JZlXVWL+VaZKW\nAX8E3MuWObmmOScgTUGS4xlMDL+TwbDCXcAdVXVlr4VpUpLcXVWH913HTGIIjFiSv2DLWQmbPQWM\nA/+lqp4dfVWarCSfY3BG0B1V9Td916OpSfJZBsNAN/Dzw0GeIqrRSPInwFzgmq7rt4GnGQTDXlX1\n4b5q0+QkmQds/jS5sqrW91mPJi/JbRN0e4qoRmeiw9HNfUnuq6qD+qpN25bknwL/CVgBhMHQ0L+q\nqv/RZ13S9nJiePTmJHnt5iuEk7wWmNM957nLM9+/Aw7f/Ok/yVzgFsAQ2Akk+YOJ+qvq34+6lpnC\nEBi93wf+MslfMfgkuRA4M8mrGJy5oJntl7Ya/tmI19vsTIZv9TEbOB74fk+1zAgOB/Ugye7Ar3eL\n9zsZvPNIchFwCD8/p3NPVX2mv6q0vbq/xa9X1bv6rqUvhsCIJHlPVX0zyT+Z6Pmq+vKoa9L2SfJb\nwDu6xTuq6ro+69H2624md3dVvb7vWvricNDo/CPgm8D7u+XN6ZuubQjsJKrqS8CX+q5DU5fkXrb8\n7c1icKZes/MB4JHAyCWZDfwWsIAtIVwtT0ztDJI8w4uv74AuxKtqrxGXpO2Q5FeHFl8Anmj9th8e\nCYze/2Rw35nvAJvnAkziGc67hO7ckuxVVU8zuP33sL2SUFU/7KOumcAjgRFLsqqqDu67DqklSb5S\nVccneYjBh64MPV1V9bqeSuudITBiSa4A/rSq7u27FkkyBEZkaEJqF2ARg1tKP8eWMWW/WUzaQZIc\n9nLPe+8g7XBbTUi9SFU9MqpapNYM3TNoNrAY+B6DD2CHAONV9ba+auubE8Mj4n/yUn+q6t0ASb4M\nHLZ5ODbJwcB5PZbWOy93l9SSXxuej6uqVcAbe6yndx4JSGrJPUk+D3yhW/4QcE+P9fTOOQFJzegu\n1vxdBt8MB3A7cFnL9+8yBCQ1JckewGur6v6+a5kJnBOQ1IwkJwDfBW7qlg9NckO/VfXLEJDUknOB\nIxjcuoWq+i6D7/RoliEgqSXPV9VTW/U1PSbu2UGSWnJfkg8Cs5IsAj4F3NlzTb3ySEBSS34POIjB\nLVv+HHgKWNprRT0zBCS15MDusQuDW0icCNzda0U98xRRSc1Icj/waWAV8NPN/S3f1sU5AUkt2VBV\nf9F3ETOJRwKSmpHkKOAU4FYG8wIAVFWz3/HtkYCklvwO8OvArmwZDiqg2RDwSEBSM5LcX1W/1ncd\nM4lnB0lqyZ1JDuy7iJnEIwFJzUjyfeDvAw/h17sChoCkhrzU17y2fIqoISBJDXNOQJIaZghIUsM
M\nAWkKklyV5AN91yFNF0NA2oGSeEGmZjRDQM1L8qokX03yvSSrkvx2kj9Icne3fEWSTLDdhOskWZHk\nj5OMA/82yUNJdu2e22t4WeqbISDBMcDfVNWbq+pgBt8/+7mqOrxb3gM4foLtXm6d3apqcVWdD6wA\n3tf1LwG+XFXP76g3I02FISDBvcBvJvmjJL/Rff3gu5N8K8m9wHsYfBHJ1l5unS8OtT/P4J41dD//\n+/S/BWn7OF6p5lXV/01yGHAc8IdJbgXOAhZX1aNJzmPwBSQ/k2Q2cOnLrPN3Q6//v5IsSPIuYFZV\nrdqhb0iaAo8E1LwkvwL8v6r6AnARcFj31JNJ5gATnQ00exLrDLuawdcZehSgGcUjAQneBFyU5KfA\n88DvAicx+Papx5ng6wer6kdJ/uvLrbOVPwP+ELhmGuuWXjFvGyGNQHdtwYlV9eG+a5GGeSQg7WBJ\n/hQ4lsGcgzSjeCQgSQ1zYliSGmYISFLDDAFJapghIEkNMwQkqWGGgCQ17P8DRxGGJIEiwcQAAAAA\nSUVORK5CYII=\n", 357 | "text/plain": [ 358 | "" 359 | ] 360 | }, 361 | "metadata": {}, 362 | "output_type": "display_data" 363 | } 364 | ], 365 | "source": [ 366 | "pd.crosstab(df.salary,df.left).plot(kind='bar')" 367 | ] 368 | }, 369 | { 370 | "cell_type": "markdown", 371 | "metadata": {}, 372 | "source": [ 373 | "Above bar chart shows employees with high salaries are likely to not leave the company" 374 | ] 375 | }, 376 | { 377 | "cell_type": "markdown", 378 | "metadata": {}, 379 | "source": [ 380 | "**Department wise employee retention rate**" 381 | ] 382 | }, 383 | { 384 | "cell_type": "code", 385 | "execution_count": 38, 386 | "metadata": { 387 | "scrolled": true 388 | }, 389 | "outputs": [ 390 | { 391 | "data": { 392 | "text/plain": [ 393 | "" 394 | ] 395 | }, 396 | "execution_count": 38, 397 | "metadata": {}, 398 | "output_type": "execute_result" 399 | }, 400 | { 401 | "data": { 402 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYEAAAFDCAYAAADcebKbAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmYXVWZ7/Hvj4AEDWEMNKbABAloQAiSIKCN2F4Zm0FF\nOjgAggRb0Hiv2oLeblCbFkdaREBsELAZpFUEmYcLIgKGCmBCwNwgBEkaIUaGgBIhvP3HWic5qVSl\nKkmdtbe1f5/nOU/OXmdYbyWV856119rvUkRgZmbNtFbVAZiZWXWcBMzMGsxJwMyswZwEzMwazEnA\nzKzBnATMzBrMScDMrMGcBMzMGsxJwMyswZwEzMwabO2qA+jPpptuGmPGjKk6DDOzvyrTp0//Q0SM\n6u95tU8CY8aMobu7u+owzMz+qkh6bCDP8+kgM7MGcxIwM2swJwEzswar/ZyAmVkVXnrpJebNm8eL\nL75YdSgrNXz4cLq6ulhnnXVW6/VOAmZmvZg3bx7rr78+Y8aMQVLV4fQqIli4cCHz5s1j7Nixq/Ue\nPh1kZtaLF198kU022aS2CQBAEptssskajVacBMzM+lDnBNCypjE6CZiZDaIRI0b0+5wzzjiDN77x\njXzgAx/gtttu48477ywQWe88J2BmHTfmxGv6fc7c0w4oEEk9nHXWWdx88810dXVxyimnMGLECPbY\nY49KYvFIwMysQ772ta8xadIkdtxxR04++WQAPvrRj/LII4+w3377cfrpp3POOedw+umnM2HCBH7x\ni18Uj9EjATOzDrjxxhuZM2cO06ZNIyI46KCDuP322znnnHO4/vrrufXWW9l000159tlnGTFiBJ/+\n9KcridNJwMysA2688UZuvPFGdt55ZwCef/555syZw5577llxZMvrNwlIGg7cDqybn/+jiDhZ0sbA\nD4ExwFzgsIh4Or/mJOAYYAnwiYi4IbfvAlwArAdcC0yNiBjcH8nMrHoRwUknncRxxx1XdSgrNZA5\ngcXA30XETsAEYF9JuwEnArdExDjglnyMpPHAZGB7YF/gLEnD8nudDRwLjMu3fQfxZzEzq4199tmH\n888/n+effx6A+fPn89RTT63wvPXXX59FixaVDm+pfpNAJM/nw3XyLYCDgQtz+4XAIfn+wcBlEbE4\nIh4FHgZ2lbQFMDIi7s7f/i9qe42Z2ZCy99578/73v5/dd9+dN73pTRx66KG9ftgfeOCBXHHFFfWe\nGM7f5KcD2wDfiYhfSdo8Ip7IT/k9sHm+Pxq4u+3l83LbS/l+z3YzsyGj9c0fYOrUqUydOnWF58yd\nO3fp/W233ZYZM2aUCK1XA1oiGhFLImIC0EX6Vr9Dj8eDNDoYFJKmSOqW1L1gwYLBelszM+thla4T\niIhngFtJ5/KfzKd4yH+2TnbNB7Zse1lXbpuf7/ds762fcyNiYkRMHDWq393RzMxsNfWbBCSNkrRh\nvr8e8C7gN8BVwJH5aUcCV+b7VwGTJa0raSxpAnhaPnX0nKTdlIpdHNH2GjMzq8BA5gS2AC7M8wJr\nAZdHxNWS7gIul3QM8BhwGEBEzJJ0OfAg8DJwfEQsye/1MZYtEb0u38zMrCL9JoGImAHs3Ev7QuCd\nfbzmVODUXtq7gR1WfIWZmVXBtYPMzBrMScDMrMauv/56tttuO7bZZhtOO+20QX9/1w4yMxuAgZTD\nXhUDKZ29ZMkSjj/+eG666Sa6urqYNGkSBx10EOPHjx+0ODwSMDOrqWnTprHNNtuw9dZb86pXvYrJ\nkydz5ZWDu6jSScDMrKbmz5/Pllsuu+yqq6uL+fN7vbxqtTkJmJk1mJOAmVlNjR49mscff3zp8bx5\n8xg9enBLrjkJmJnV1KRJk5gzZw6PPvoof/nLX7jssss46KCDBrUPrw4yM6uptddemzPPPJN99tmH\nJUuWcPTRR7P99tsPbh+D+m5mZkPUQJZ0dsL+++/P/vvv37H39
+kgM7MGcxIwM2swJwEzswZzEjAz\nazAnATOzBnMSMDNrMCcBM7OaOvroo9lss83YYYfO7cXl6wTMzAbilA0G+f2e7fcpRx11FCeccAJH\nHHHE4PbdxiMBM7Oa2nPPPdl444072oeTgJlZgzkJmJk1mJOAmVmDOQmYmTWYk4CZWU0dfvjh7L77\n7syePZuuri7OO++8Qe/DS0TNzAZiAEs6B9ull17a8T76HQlI2lLSrZIelDRL0tTcfoqk+ZLuz7f9\n215zkqSHJc2WtE9b+y6SZubHzpCkzvxYZmY2EAMZCbwMfCoi7pW0PjBd0k35sdMj4uvtT5Y0HpgM\nbA+8FrhZ0rYRsQQ4GzgW+BVwLbAvcN3g/ChmZraq+h0JRMQTEXFvvr8IeAhY2U7HBwOXRcTiiHgU\neBjYVdIWwMiIuDsiArgIOGSNfwIzM1ttqzQxLGkMsDPpmzzAxyXNkHS+pI1y22jg8baXzctto/P9\nnu299TNFUrek7gULFqxKiGZmgyZ9X623NY1xwElA0gjgx8AnI+I50qmdrYEJwBPAN9YokjYRcW5E\nTIyIiaNGjRqstzUzG7Dhw4ezcOHCWieCiGDhwoUMHz58td9jQKuDJK1DSgAXR8RPcudPtj3+PeDq\nfDgf2LLt5V25bX6+37PdzKx2urq6mDdvHnU/GzF8+HC6urr6f2If+k0CeQXPecBDEfHNtvYtIuKJ\nfPhu4IF8/yrgEknfJE0MjwOmRcQSSc9J2o10OukI4NurHbmZWQets846jB07tuowOm4gI4G3Ah8C\nZkq6P7d9Djhc0gQggLnAcQARMUvS5cCDpJVFx+eVQQAfAy4A1iOtCvLKIDOzCvWbBCLiDqC39fzX\nruQ1pwKn9tLeDXRudwQzM1slLhthZtZgTgJmZg3mJGBm1mBOAmZmDeYkYGbWYE4CZmYN5iRgZtZg\nTgJmZg3mJGBm1mBOAmZmDeYkYGbWYE4CZmYN5iRgZtZgTgJmZg3mJGBm1mBOAmZmDeYkYGbWYE4C\nZmYN5iRgZtZgTgJmZg3mJGBm1mBOAmZmDeYkYGbWYE4CZmYN1m8SkLSlpFslPShplqSpuX1jSTdJ\nmpP/3KjtNSdJeljSbEn7tLXvImlmfuwMSerMj2VmZgMxkJHAy8CnImI8sBtwvKTxwInALRExDrgl\nH5MfmwxsD+wLnCVpWH6vs4FjgXH5tu8g/ixmZraK1u7vCRHxBPBEvr9I0kPAaOBgYK/8tAuB24DP\n5vbLImIx8Kikh4FdJc0FRkbE3QCSLgIOAa4bxJ/HzKzWxpx4Tb/PmXvaAQUiSVZpTkDSGGBn4FfA\n5jlBAPwe2DzfHw083vayebltdL7fs93MzCoy4CQgaQTwY+CTEfFc+2MREUAMVlCSpkjqltS9YMGC\nwXpbMzPrYUBJQNI6pARwcUT8JDc/KWmL/PgWwFO5fT6wZdvLu3Lb/Hy/Z/sKIuLciJgYERNHjRo1\n0J/FzMxW0UBWBwk4D3goIr7Z9tBVwJH5/pHAlW3tkyWtK2ksaQJ4Wj519Jyk3fJ7HtH2GjMzq0C/\nE8PAW4EPATMl3Z/bPgecBlwu6RjgMeAwgIiYJely4EHSyqLjI2JJft3HgAuA9UgTwp4UNjOr0EBW\nB90B9LWe/519vOZU4NRe2ruBHVYlQDMz6xxfMWxm1mBOAmZmDeYkYGbWYE4CZmYN5iRgZtZgTgJm\nZg3mJGBm1mBOAmZmDeYkYGbWYE4CZmYN5iRgZtZgTgJmZg3mJGBm1mBOAmZmDeYkYGbWYE4CZmYN\nNpCdxczMhoQxJ16z0sfnnnZAoUjqwyMBM7MGcxIwM2swJwEzswZzEjAzazAnATOzBnMSMDNrMCcB\nM7MG6zcJSDpf0lOSHmhrO0XSfEn359v+bY+dJOlhSbMl7dPWvoukmfmxMyRp8H8cMzNbFQMZCVwA\n7NtL++kRMSHfrgWQNB6YD
GyfX3OWpGH5+WcDxwLj8q239zQzs4L6TQIRcTvwxwG+38HAZRGxOCIe\nBR4GdpW0BTAyIu6OiAAuAg5Z3aDNzGxwrMmcwMclzcinizbKbaOBx9ueMy+3jc73e7abmVmFVjcJ\nnA1sDUwAngC+MWgRAZKmSOqW1L1gwYLBfGszM2uzWkkgIp6MiCUR8QrwPWDX/NB8YMu2p3bltvn5\nfs/2vt7/3IiYGBETR40atTohmpnZAKxWEsjn+FveDbRWDl0FTJa0rqSxpAngaRHxBPCcpN3yqqAj\ngCvXIG4zMxsE/ZaSlnQpsBewqaR5wMnAXpImAAHMBY4DiIhZki4HHgReBo6PiCX5rT5GWmm0HnBd\nvpmZWYX6TQIRcXgvzeet5PmnAqf20t4N7LBK0ZmZWUf5imEzswZzEjAzazAnATOzBnMSMDNrMCcB\nM7MGcxIwM2swJwEzswZzEjAzazAnATOzBnMSMDNrMCcBM7MGcxIwM2swJwEzswZzEjAzazAnATOz\nBnMSMDNrMCcBM7MGcxIwM2swJwEzswZzEjAzazAnATOzBnMSMDNrMCcBM7MGcxIwM2uwfpOApPMl\nPSXpgba2jSXdJGlO/nOjtsdOkvSwpNmS9mlr30XSzPzYGZI0+D+OmZmtioGMBC4A9u3RdiJwS0SM\nA27Jx0gaD0wGts+vOUvSsPyas4FjgXH51vM9zcyssH6TQETcDvyxR/PBwIX5/oXAIW3tl0XE4oh4\nFHgY2FXSFsDIiLg7IgK4qO01ZmZWkbVX83WbR8QT+f7vgc3z/dHA3W3Pm5fbXsr3e7abmSWnbNDP\n48+WiaNh1nhiOH+zj0GIZSlJUyR1S+pesGDBYL61mZm1Wd0k8GQ+xUP+86ncPh/Ysu15Xbltfr7f\ns71XEXFuREyMiImjRo1azRDNzKw/q5sErgKOzPePBK5sa58saV1JY0kTwNPyqaPnJO2WVwUd0fYa\nMzOrSL9zApIuBfYCNpU0DzgZOA24XNIxwGPAYQARMUvS5cCDwMvA8RGxJL/Vx0grjdYDrsu3NTbm\nxGtW+vjc0w4YjG7MzIakfpNARBzex0Pv7OP5pwKn9tLeDeywStGZmVlH+YphM7MGcxIwM2swJwEz\nswZzEjAzazAnATOzBnMSMDNrMCcBM7MGcxIwM2swJwEzswZzEjAzazAnATOzBnMSMDNrMCcBM7MG\ncxIwM2uw1d1j2Mz+SnjPDVsZJwEzs7o5ZYN+Hn920Lry6SAzswbzSGCQeMhtZn+NnATMOqS/Lwbg\nLwdWPZ8OMjNrMCcBM7MGcxIwM2swJwEzswZzEjAza7A1Wh0kaS6wCFgCvBwREyVtDPwQGAPMBQ6L\niKfz808CjsnP/0RE3LAm/dvyvEzVbA31d5EWDOqFWnUwGEtE3xERf2g7PhG4JSJOk3RiPv6spPHA\nZGB74LXAzZK2jYglgxCD2XKcEM0GphPXCRwM7JXvXwjcBnw2t18WEYuBRyU9DOwK3NWBGKxC/gA2\n++uxpnMCQfpGP13SlNy2eUQ8ke//Htg83x8NPN722nm5zczMKrKmI4G3RcR8SZsBN0n6TfuDERGS\nYlXfNCeUKQBbbbXVGoZoZmZ9WaORQETMz38+BVxBOr3zpKQtAPKfT+Wnzwe2bHt5V27r7X3PjYiJ\nETFx1KhRaxKimZmtxGonAUmvkbR+6z6wN/AAcBVwZH7akcCV+f5VwGRJ60oaC4wDpq1u/2ZmtubW\n5HTQ5sAVklrvc0lEXC/pHuBySccAjwGHAUTELEmXAw8CLwPHe2WQmVm1VjsJRMQjwE69tC8E3tnH\na04FTl3dPs3MbHD5imEzswZzEjAzazAnATOzBnMSMDNrMCcBM7MGcxIwM2swJwEzswZzEjAzazAn\nATOzBnMSMDNrsE5sKlMvDdwuzsxsoDwSMDNrMCcBM7MGcxIwM2uwoT8nYPXT3zyN52jMinE
SaBJP\nktdPHRKify8azaeDzMwazEnAzKzBnATMzBrMScDMrME8MVxKHSYAzcx68EjAzKzBPBKwZvKySDPA\nIwEzs0YrngQk7StptqSHJZ1Yun8zM1umaBKQNAz4DrAfMB44XNL4kjGYmdkypUcCuwIPR8QjEfEX\n4DLg4MIxmJlZVjoJjAYebzuel9vMzKwCiohynUmHAvtGxEfy8YeAt0TECT2eNwWYkg+3A2avQbeb\nAn9Yg9cPljrEUYcYoB5x1CEGqEccdYgB6hFHHWKAwYnjdRExqr8nlV4iOh/Ysu24K7ctJyLOBc4d\njA4ldUfExMF4r7/2OOoQQ13iqEMMdYmjDjHUJY46xFA6jtKng+4BxkkaK+lVwGTgqsIxmJlZVnQk\nEBEvSzoBuAEYBpwfEbNKxmBmZssUv2I4Iq4Fri3Y5aCcVhoEdYijDjFAPeKoQwxQjzjqEAPUI446\nxAAF4yg6MWxmZvXishFmZg3mJGBm1mBOAmZmDTakkoCkG6uOwVYkaZik/12DOKYOpK1AHO/p5fZO\nSZuVjqUuJK0laWRFfd8ykLahakhNDEu6LyJ2rjoOAEnvAD5OuuIZ4CHgzIi4rXAc/6eX5meB6RFx\nf8E4pkXErqX66yOGeyPizT3aiv/OSLoG2B24NTftBUwHxgJfjIgfFIpjEdDzA+BZoBv4VEQ80uH+\nLwE+CiwhXUM0EvhWRHytk/229T8ceDXp32EvQPmhkcD1EfGGAjH8jBX/DZaKiIM6HcNQ21RmA0nv\n6evBiPhJiSAkHQCcCXwR+ALpl+vNwPmSTsjLZEuZmG8/y8d/D8wAPirpvyLiq4Xi+KWkM4EfAi+0\nGiPi3k53LOlw4P3AWEntFyeuD/yx0/33Ym3gjRHxZI5vc+Ai4C3A7UCRJAD8O6l+1yWk39HJwOuB\ne4HzSR+MnTQ+Ip6T9AHgOuBEUjIskgSA44BPAq/N/baSwHOk/78lfL1QP30aaiOBhcCVLPvHbBcR\ncXShOG4DpkbEr3u07wh8OyLeXiKO3OftwP4R8Xw+HgFcA+xLGg0UKeUtqfWtd7lfuIj4uwJ9v470\nLfvLpA+alkXAjIh4udMx9Ijnwfa/d0kCZkXE+JIjE0m/joiderTdHxETenusA/3PAiaQktCZEfHz\nEv32iGEY8LmI+FKpPutmqI0EHiv1Qd+Pv+mZAAAiYkb+1lfSZsDituOXgM0j4s+SFvfxmk7YD3gv\nMIZlv3dFvoFExGPAY6RTMHVwm6Srgf/Kx+/Nba8BnikYx58kHQb8KB8fCryY75f4t/kuMBf4NXB7\nTtbPFeh3qYhYks8eVJoEJI0jfUkZDwxvtUfE1p3ue6glgd5GAFV4YTUf64SLgV9JujIfHwhckj9w\nHiwYx09JH3D3UvaDZqn8n/0rpMSofIuIKD0heTzpg/+t+fgi4MeRhuXvKBjHB4BvAWeR/i3uBj4o\naT3ghJW9cDBExBnAGW1Nj+W5tNJukfRe4CdR3amR7wMnA6eTfgc+TKGFO0PtdND2dahFJOkZ0rnd\nFR4C3hYRGxWOZxKwRz78ZUR0l+w/x/BAROxQut8eMTwMHBgRD1UZhyV5VPxvwGsjYr+8y+DuEXFe\n4TgWAa8hTVD/mQq+HEiaHhG7SJoZEW9qb+t030NtJHC3pN6yWul/1JXtllbFRNC9pJLdawNI2ioi\nflc4hjslvSkiZhbut92TdUgAdRmRSBoFHMvyp+goeEr1AtI34M/n4/9PWjhQNAlExPol++vDYklr\nAXNykc35wIgSHQ+pkYCtSNLHScPMJ0nfdFofODsW6n8m6VTD2sA44BHSHEXROHIs3wL+hnRqaul8\nSKlVY21x1GJEIulO4BeklTFLWu0R8eNC/d8TEZPaJ8NbE9Ml+u8Ry0HAnvnwtoi4unD/k0jLyDck\nzU+MBL4WEXd3uu+hNhKohbYPvl6V/OADpgLbRcTCgn2
2+/uK+u3NSOBPwN5tbQEUTQLUZEQCvDoi\nPlth/y9I2oT8f0XSbqTrFIqSdBowiTR/BjBV0lsj4qRSMUTEPfnu86T5gGI8EuiAvMoB0gQgLFv3\n/UHSt98TV3xVx2K5FXhX6WWQ1rcajUj+Fbiz8HUr7f2/Gfg2sAPwADAKODQiZhSOYwYwISJeycfD\ngPsKj1JvAt4XEc/k442AyyJin4737STQOb2t+e7tqtUOx3Ae6arla1j+A+ebpWKoC0nbAmeTlsju\nkK/bOCgi/rVwHN/vpbnYdSxtcbQmRBeTlg5XMSG6Nun3U8DsiHipVN9tMcwA9oqIP+bjjUmnhEom\ngd4+K4pcM+LTQZ2lPKz8ZT7Yg/L1mn6Xb6/Ktyb7HvAZ0vr01nUblwBFk0BEFB3u96WqCdGVXNW/\nraTiIyLS+vz78qhZpLmBYqP17JX2BRv5bEKRb+hOAp11DKlUxAakX66ngaLf9iLiCyX7q7lXR8S0\ndIHuUsVOk0n6p4j4qqRv08t/8Ij4RKlY2mLakRVXB3X6Q/jAlTxWfI4mIi7NV/lPyv1/NiJ+XzIG\n0gqpOyT9nPRZ8bfAlBIdOwl0UERMB3bKSYCIKDbpJenfI+KTfRWoKlGYqob+IOn1LJuIPBR4omD/\nrcng4tdp9EbS+cCOwCzgldzc8Q/huoyEetgdeBvLVrJdUbLziLg+z5Hslps+GRF/KNG35wQ6SNK6\nrFgqgYj4YoG+d4mI6ZJ6rVMUET/vdAx1I2lr0t6te5BGZY8CH4yIuYXjeF9E/Fd/bQXiWK6GURVy\nscXtWb5UQsf/f/SI4SxgG+DS3PQPwG8j4vi+XzVofb8hIn6TE8AKokSBRSeBzpF0PblsM8uvw/5G\nwRimRsS3+mtrklwyY62IWFRR/72VtC66YCD3eR7wjYgoWT6kvf9zSKWc3wH8B6l20bSIOKZwHL8h\nVXVtjRDXIhX0e2OBvs+NiCltBRbbRZQosOgk0Dk1KZVQixr6dSBpQ+AIVhyZFTkXL2k/YH/gMNKV\nsS0jSWWVi+63kEeJVwG/p4IL+CTNiIgd2/4cAVwXEX9bov+2OK4Gjo9UaLA1KXtmRKxs7mLI8JxA\nZ1VWKkH1q6FfB9eSiqTNZNk58JL+mzQfcBBpdNiyCKhi57XzgA9R3d9Hq5DgnyS9lvR7uUUFcawP\nPCRpWj6eBHS3/t+Umj/LqwfHsPwXlIs63a+TQGe9DThK0qOU/6Z1J2nSc1Og/fTTItKmMk00PCJ6\n22mtiEjlxX+dl6WuDWwVEbOrigdYEBFX9f+0jvlZHp19jVTfKkjLeEv7lwr6XI6kH5A29LmfZaeO\ng1RhtrN9+3RQ57RdObyc1rDTylLa5/h54GqWv3Cu6MhI0oGkQoKvioixkiaQtpUsumIrT4huSNp1\nrviVy5LeR9rGcZGkfybtvvelEpOhvcTyN8CupA/ee0ovEZX0EOmUYPEP5CG10XzdRMRj+QP/z6Rf\nrtatGKVNzOdIelbSc5IWSSq6cUeN/IX0rfMu0umY6VSzXPMU0gfOMwCR9noeW0Ec65E+/Pcmrd0/\nkLK1nv45J4C3AX9Hmhw+u2D/AEj6CDANeA9pcvpuSaU3p3qAVEqkOJ8O6qBcmfAbpD1MnwJeR1or\nvn3BML5KDSpW1sSngG1Krb9eiZci4tkeF60V/wbY33p9SSdFxJc7GELrtMcBwPci4ppcz6i0zwA7\nt4os5qJ2d5L2WS5lU+DBPC/RPirzRvN/5b5Euvjj5ojYWWnXpA8WjqEuFSvr4GFSFdGqzZL0fmCY\n0raCnyB96NTN+0glFTplvqTvAu8CvpKvq6ni7MRC0lxZy6LcVtIphftbynMCHSSpOyImSvo16ZvG\nKyq/kXYtKlbWgaQrSKOwW1n+76JouQZJryaVCWiVtL6RdC78xb5fVV6nlxLnv4d9gZkRMUfSFsCb\nIuLGTvXZRxwXAW8
CriSNyA4mLZ6YAUO/2KJHAp31TF77fDtwsaSnKL/HcF1q6NfBT/OtaptHxOdZ\ntqNWa1ORe/p+SSU6+g0xIv5E2+9hRDxB2TIeLb/Nt5bWftzFCuypwt3mPBLooHxl6p9JQ9wPABsA\nF0d1G7w0ntIm6pUuzZR0L2meZn4+3hP4TuS9ZeuiqRcVVkEV7jbnkUAHRUTrW/8rwIX5cvTDWbaD\nUccp1a7vrYBc6dUPlWtfmkm6iK6SpZnAccBPczxvJp13379wDLSXOe+jrWgto6rkkg29/R/peMmG\nNpXN3Xkk0AGSRpJ2FRtNuiz/pnz8aeDXEbGyjegHO5b3th0OB94N/HcVZYurJmk6aSnibbFsT9tK\nSntI2p20r8GLwAERsaCCGGpRw6hqknZpOxxOKvr4ckT8U4G+W3srvJ2K5u48EuiMH5CqVN4FfAT4\nHOkc3yF5TXgx0WPTcEmXAneUjKFGeluaWaxcQi9lvV9NKjB4ntJmKqXKE+xOqqQ6SlL7FdQjgWEl\nYqiTXPK93S/bSkh0Wnt9okrm7pwEOmPr1vldSf9BmuzaqiarP8aRJp+aqOqlmV8v2NfKvAoYQfr/\n3z75+RzpYqlGUdpOsmUtYCJp/q7j6rC3gk8HdUDPIXWVQ2ylfWSDvNqAVDHypJ4jhCbosTRTwA0U\nXpqptIn5zRHxjlJ9riSW17mECeTaXq3/Iy8Bc0lzRcVGzJIuBKbG8hvNf6PE3J2TQAdIWsKypaAi\nXZ7/JyrYyNvqR9ItwHui4E5zfcRxE/C+Hh88l0XEPlXGVZqkw0g1jJ6rqoZRbyuxSq3O8umgDoiI\nWp1XzeUr9syHt0XE1VXGU5VezslDOiffDXy34IjgeWBm/hBeet1IBZP1m7YSQO7/aUlNPFX4fyPi\n8rYaRl8n1TB6S8EY1pK0UUQ8DUtPURX5fHYSGOIknUaqj95aljpV0h4R8bkKw6rKI8Aolt9GcBGw\nLamE8YcKxfET6nGx3iuStoqI38HSqrdNPDVQhxpG3wDuktRalvs+4NQSHft00BAnaQYwISJeycfD\ngPtK7R5VJ5LuiYhJvbVJmhURJQv7VU7SvqQ9l39OOlX5t8CUiLih0sAKU9pZbD6phtGbSRd4TitZ\n3iXHMZ40EgH4f1Fo20+PBJphQ5btJlZk1UNNjejxzXcr0ioZSGWmi8grk74MjGf5Dda3LhVD7u96\npQ3Od8tNn6xBhdUqHEaqYfT1iHgm1zD6TAVxbAy8EBHflzRK0tiIeLTTnToJDH1fBu7LV0WKNDdw\nYrUhVeZEqsINAAAHq0lEQVRTwB2Sfkv6uxgLfCyX97iwYBzfB04GTidtsv5hKqiemctVQFoaCjA+\nX69we+lYqlSHGkaSTiYtTd2O9PuxDvCfwFs73rdPBw19+ZtN6zTItNK7JtVJLlf8hnw4u4prNyRN\nj4hdJM1su55kekTs0t9rBzmOn7UdDidtdDO9cLkEAyTdD+wM3Nt2NfuMEqdtPRIY4iS9m3R+8ap8\nvKGkQyKiDtU0qzCO9G1rOLBT/ubb8X1ce1ic60jNkXQC6Xz0iH5eM+giov1qVSRtCfx76TgMgL9E\nREgKWFp8sgiPBIY4SfdHxIQebY2sDpmH3HuRzsVfC+wH3BERRa+SzWWjHyLN1XyJVK7hqxHxq5Jx\n9BKXgFkRMb7KOJpI0qdJX1DeRTqFezRwSUR8u9N9eyQw9PV2rrmp/+6HAjuRVkd9WNLmpPOupQWp\nvtTrSOd+IS1RLbpiS9K3WbYkdC1gAlB8k3cD0tLlH5HmZ7YD/gX4XyU69khgiJN0PmlD8+/kpuOB\njSPiqMqCqoikaRGxa64m+g7SNQIPRcQb+nnpYMcxm7T6ZCZtBexKl3CQdGTb4cvA3J6lpa2MPiq6\nek7ABsXHgX8Gfkj61tcqa91E3ZI2JH3rnk66cveuCuJY0JqjqVJElFwRZb2Q9I/Ax
4Ct8zU9LesD\nRRKyRwLWSJLGACMjYkY/T+1E3+8kbS50CxXs+yxpJiu5MriJFxJWRdIGwEakeYD2pduLIuKPvb9q\nkGNwEhjaXCRseZJGk87FLx0Fl14XL+k/SctUZ7HsdFCU2u0tl4eAZSPCH+Q/P5jjaOp1JI3kJDDE\nVVmdsG4kfYVUL+hBltWLidLbS0qaHRHbleyzjzh6+91o3M5iTec5gaGvZ5GwMTSzSBjAIcB2EbG4\n32d21p2SxpeqDbMSUtuewpL2oIIrl61aTgJD3+dJpRKWKxJWbUiVeYS0JLPqJLAbcH/ezGQxy/aZ\nKH0u/hjg/HxeWqQtUYuckrL68OmgBsg14qcA95E2uHmqafVhACT9mHSdQM8J2aJ1/NvOyS+nql2+\nchKg6k1urBoeCQxxkj4CTAW6gPtJ30LvYlnJ2ia5Kt8qVZctHfOH/8nkDYfyaPGLTgbN4pHAEJeX\nA04C7o6ICZLeAPxbRLyn4tCsYnlk9ADLKqh+CNjJvxvN4pHA0PdiRLwoCUnrRsRvJFW+MqUKdanj\nXyOvj4j3th1/IVeztAbxSoChb16+SvanwE2SrgRqcTqiAt8n7R37MqlsxEVUUzuoLv6c99UFQNJb\nSbtqWYP4dFCDSHo7aWex6yOi2E5adVGXOv51IWknUiJs7Tb3NHBkFVdRW3V8OqhBIuLnVcdQsVrU\n8a+D/PewXUTsJGkkQEQ818/LbAjySMAao651/KsiqTsiJlYdh1XLScAaQ9JE0sVz7XX8q7hIqxYk\nnQb8gVRh9oVWe6nCZVYPTgLWGHWp418X+YrlFT4AGrxaqpE8J2BNUos6/jUynlTL/m2kZPAL4JxK\nI7LiPBKwxqi6jn/dSLqctJ3hxbnp/cAGEXFYdVFZaR4JWJN8mFTHfx3a6vgDjUwCwA49NpW/VVLV\nlU2tMCcBa5JJdajjXyP3StotIu4GkPQWoLvimKwwJwFrkrrU8a+LXUh/J7/Lx1sBs1vbTzZ11VTT\neE7AGkPSQ8Drgarr+NdCXyWtW5q6aqppnASsMepWx9+sDpwEzMwazFVEzcwazEnAzKzBnARsSJC0\nRNL9kmZJ+rWkT+VKmZ3u9yhJrx0q/VjzOAnYUPHniJgQEdsD7wL2I+2f2zGShgFHASU+nEv1Yw3j\nJGBDTkQ8BUwBTlAyTNLXJN0jaYak4wAk7SXpdknXSJot6ZzW6EHS2ZK688jiC633ljRX0lck3Usq\nQTERuDiPQtbLj385H3dLerOkGyT9VtJH297nM23xfCG3jZH0kKTv5X5vzO95aM9+iv1l2pDnJGBD\nUkQ8AgwDNgOOAZ6NiEnAJOBYSWPzU3cFPk4qpvZ6oLXJ+udzrf0dgbdLar+WYGFEvDki/pN0he0H\n8iiktTXj7yJiAqkg2wXAocBuQOvDfm9gXO57ArCLpD3za8cB38kjmmeA90bEj/rox2yN+Ypha4K9\ngR3zN2pI2ymOA/4CTMsJA0mXkipq/gg4TNIU0v+RLUhJorXt4g/76a9VqXQmMCIiFgGLJC3O+z3v\nnW/35eeNyPH8Dng0IlqbvU8HxqzWT2w2QE4CNiRJ2hpYAjxFujL44xFxQ4/n7MWK9fQjjxI+Tao1\n9LSkC4Dhbc95gZVrVSh9pe1+63jtHM+XI+K7PeIZ0+P5SwCf+rGO8ukgG3IkjSLVxT8z0tWQNwD/\nKGmd/Pi2kl6Tn76rpLF5LuAfgDtI206+ADwraXPSJHNfFgHrr2KINwBHSxqR4xktabN+XrM6/Zj1\nyyMBGyrWk3Q/qUz0y8APgG/mx/6DdFrlXkkCFgCH5MfuAc4EtgFuBa6IiFck3Qf8Bngc+OVK+r0A\nOEfSn4HdBxJoRNwo6Y3AXSkcngc+SPrmP6B+PC9gg8VlI6yx8umgT0fE31cdi1lVfDrIzKzBPBIw\nM2swjwTMzBrMScDMrMGcBMzMGsxJwMyswZwEz
MwazEnAzKzB/ge9EgVs3R/57wAAAABJRU5ErkJg\ngg==\n", 403 | "text/plain": [ 404 | "" 405 | ] 406 | }, 407 | "metadata": {}, 408 | "output_type": "display_data" 409 | } 410 | ], 411 | "source": [ 412 | "pd.crosstab(df.Department,df.left).plot(kind='bar')" 413 | ] 414 | }, 415 | { 416 | "cell_type": "markdown", 417 | "metadata": {}, 418 | "source": [ 419 | "From above chart there seem to be some impact of department on employee retention but it is not major hence we will ignore department in our analysis" 420 | ] 421 | }, 422 | { 423 | "cell_type": "markdown", 424 | "metadata": {}, 425 | "source": [ 426 | "

From the data analysis so far, we conclude that we will use the following variables as independent variables in our model

\n", 427 | "
    \n", 428 | "
  1. **Satisfaction Level**
  2. \n", 429 | "
  3. **Average Monthly Hours**
  4. \n", 430 | "
  5. **Promotion Last 5 Years**
  6. \n", 431 | "
  7. **Salary**
  8. \n", 432 | "
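To make the role of these four variables concrete, here is a from-scratch sketch of logistic regression fitted by batch gradient descent. Everything in it is an assumption for illustration: the eight rows are invented, `fit_logistic` and `predict_proba` are hypothetical helpers, monthly hours are divided by 300 only to keep the features on a similar scale, and `salary_high` is left out as the baseline category. In practice a library model would normally be used instead.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, x):
    """Probability of leaving; w[0] is the bias, w[1:] match the features."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights by batch gradient descent on the average log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Feature order: satisfaction_level, average_montly_hours / 300,
# promotion_last_5years, salary_low, salary_medium.
# All rows are hypothetical, not taken from the dataset.
X = [
    [0.38, 157 / 300, 0, 1, 0],
    [0.11, 272 / 300, 0, 0, 1],
    [0.37, 159 / 300, 0, 1, 0],
    [0.20, 250 / 300, 0, 1, 0],
    [0.80, 200 / 300, 0, 0, 1],
    [0.72, 190 / 300, 1, 0, 0],
    [0.90, 180 / 300, 0, 0, 0],
    [0.65, 210 / 300, 0, 0, 1],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = left, 0 = retained

weights = fit_logistic(X, y)
p_leave = predict_proba(weights, [0.10, 250 / 300, 0, 1, 0])
p_stay = predict_proba(weights, [0.90, 180 / 300, 1, 0, 0])
print(f"low satisfaction, low salary:   P(left) = {p_leave:.3f}")
print(f"high satisfaction, promoted:    P(left) = {p_stay:.3f}")
```

Dropping one dummy column (here `salary_high`) is a common way to avoid redundant inputs; the remaining two columns still identify all three salary levels.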
" 433 | ] 434 | }, 435 | { 436 | "cell_type": "code", 437 | "execution_count": 76, 438 | "metadata": {}, 439 | "outputs": [ 440 | { 441 | "data": { 442 | "text/html": [ 443 | "
\n", 444 | "\n", 457 | "\n", 458 | " \n", 459 | " \n", 460 | " \n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | " \n", 467 | " \n", 468 | " \n", 469 | " \n", 470 | " \n", 471 | " \n", 472 | " \n", 473 | " \n", 474 | " \n", 475 | " \n", 476 | " \n", 477 | " \n", 478 | " \n", 479 | " \n", 480 | " \n", 481 | " \n", 482 | " \n", 483 | " \n", 484 | " \n", 485 | " \n", 486 | " \n", 487 | " \n", 488 | " \n", 489 | " \n", 490 | " \n", 491 | " \n", 492 | " \n", 493 | " \n", 494 | " \n", 495 | " \n", 496 | " \n", 497 | " \n", 498 | " \n", 499 | " \n", 500 | " \n", 501 | " \n", 502 | " \n", 503 | " \n", 504 | "
   satisfaction_level  average_montly_hours  promotion_last_5years  salary
0                0.38                   157                      0     low
1                0.80                   262                      0  medium
2                0.11                   272                      0  medium
3                0.72                   223                      0     low
4                0.37                   159                      0     low
\n", 505 | "
" 506 | ], 507 | "text/plain": [ 508 | " satisfaction_level average_montly_hours promotion_last_5years salary\n", 509 | "0 0.38 157 0 low\n", 510 | "1 0.80 262 0 medium\n", 511 | "2 0.11 272 0 medium\n", 512 | "3 0.72 223 0 low\n", 513 | "4 0.37 159 0 low" 514 | ] 515 | }, 516 | "execution_count": 76, 517 | "metadata": {}, 518 | "output_type": "execute_result" 519 | } 520 | ], 521 | "source": [ 522 | "subdf = df[['satisfaction_level','average_montly_hours','promotion_last_5years','salary']]\n", 523 | "subdf.head()" 524 | ] 525 | }, 526 | { 527 | "cell_type": "markdown", 528 | "metadata": {}, 529 | "source": [ 530 | "**Tackle salary dummy variable**" 531 | ] 532 | }, 533 | { 534 | "cell_type": "markdown", 535 | "metadata": {}, 536 | "source": [ 537 | "The salary column holds text values. It needs to be converted to numbers, and we will use dummy variables for that. Check my one-hot encoding tutorial to understand the purpose of dummy variables." 538 | ] 539 | }, 540 | { 541 | "cell_type": "code", 542 | "execution_count": 78, 543 | "metadata": {}, 544 | "outputs": [], 545 | "source": [ 546 | "salary_dummies = pd.get_dummies(subdf.salary, prefix=\"salary\")" 547 | ] 548 | }, 549 | { 550 | "cell_type": "code", 551 | "execution_count": 79, 552 | "metadata": {}, 553 | "outputs": [], 554 | "source": [ 555 | "df_with_dummies = pd.concat([subdf,salary_dummies],axis='columns')" 556 | ] 557 | }, 558 | { 559 | "cell_type": "code", 560 | "execution_count": 80, 561 | "metadata": {}, 562 | "outputs": [ 563 | { 564 | "data": { 565 | "text/html": [ 566 | "
\n", 567 | "\n", 580 | "\n", 581 | " \n", 582 | " \n", 583 | " \n", 584 | " \n", 585 | " \n", 586 | " \n", 587 | " \n", 588 | " \n", 589 | " \n", 590 | " \n", 591 | " \n", 592 | " \n", 593 | " \n", 594 | " \n", 595 | " \n", 596 | " \n", 597 | " \n", 598 | " \n", 599 | " \n", 600 | " \n", 601 | " \n", 602 | " \n", 603 | " \n", 604 | " \n", 605 | " \n", 606 | " \n", 607 | " \n", 608 | " \n", 609 | " \n", 610 | " \n", 611 | " \n", 612 | " \n", 613 | " \n", 614 | " \n", 615 | " \n", 616 | " \n", 617 | " \n", 618 | " \n", 619 | " \n", 620 | " \n", 621 | " \n", 622 | " \n", 623 | " \n", 624 | " \n", 625 | " \n", 626 | " \n", 627 | " \n", 628 | " \n", 629 | " \n", 630 | " \n", 631 | " \n", 632 | " \n", 633 | " \n", 634 | " \n", 635 | " \n", 636 | " \n", 637 | " \n", 638 | " \n", 639 | " \n", 640 | " \n", 641 | " \n", 642 | " \n", 643 | " \n", 644 | " \n", 645 | "
   satisfaction_level  average_montly_hours  promotion_last_5years  salary  salary_high  salary_low  salary_medium
0                0.38                   157                      0     low            0           1              0
1                0.80                   262                      0  medium            0           0              1
2                0.11                   272                      0  medium            0           0              1
3                0.72                   223                      0     low            0           1              0
4                0.37                   159                      0     low            0           1              0
\n", 646 | "
" 647 | ], 648 | "text/plain": [ 649 | " satisfaction_level average_montly_hours promotion_last_5years salary \\\n", 650 | "0 0.38 157 0 low \n", 651 | "1 0.80 262 0 medium \n", 652 | "2 0.11 272 0 medium \n", 653 | "3 0.72 223 0 low \n", 654 | "4 0.37 159 0 low \n", 655 | "\n", 656 | " salary_high salary_low salary_medium \n", 657 | "0 0 1 0 \n", 658 | "1 0 0 1 \n", 659 | "2 0 0 1 \n", 660 | "3 0 1 0 \n", 661 | "4 0 1 0 " 662 | ] 663 | }, 664 | "execution_count": 80, 665 | "metadata": {}, 666 | "output_type": "execute_result" 667 | } 668 | ], 669 | "source": [ 670 | "df_with_dummies.head()" 671 | ] 672 | }, 673 | { 674 | "cell_type": "markdown", 675 | "metadata": {}, 676 | "source": [ 677 | "Now we need to remove salary column which is text data. It is already replaced by dummy variables so we can safely remove it" 678 | ] 679 | }, 680 | { 681 | "cell_type": "code", 682 | "execution_count": 81, 683 | "metadata": {}, 684 | "outputs": [ 685 | { 686 | "data": { 687 | "text/html": [ 688 | "
\n", 689 | "\n", 702 | "\n", 703 | " \n", 704 | " \n", 705 | " \n", 706 | " \n", 707 | " \n", 708 | " \n", 709 | " \n", 710 | " \n", 711 | " \n", 712 | " \n", 713 | " \n", 714 | " \n", 715 | " \n", 716 | " \n", 717 | " \n", 718 | " \n", 719 | " \n", 720 | " \n", 721 | " \n", 722 | " \n", 723 | " \n", 724 | " \n", 725 | " \n", 726 | " \n", 727 | " \n", 728 | " \n", 729 | " \n", 730 | " \n", 731 | " \n", 732 | " \n", 733 | " \n", 734 | " \n", 735 | " \n", 736 | " \n", 737 | " \n", 738 | " \n", 739 | " \n", 740 | " \n", 741 | " \n", 742 | " \n", 743 | " \n", 744 | " \n", 745 | " \n", 746 | " \n", 747 | " \n", 748 | " \n", 749 | " \n", 750 | " \n", 751 | " \n", 752 | " \n", 753 | " \n", 754 | " \n", 755 | " \n", 756 | " \n", 757 | " \n", 758 | " \n", 759 | " \n", 760 | " \n", 761 | "
   satisfaction_level  average_montly_hours  promotion_last_5years  salary_high  salary_low  salary_medium
0                0.38                   157                      0            0           1              0
1                0.80                   262                      0            0           0              1
2                0.11                   272                      0            0           0              1
3                0.72                   223                      0            0           1              0
4                0.37                   159                      0            0           1              0
\n", 762 | "
" 763 | ], 764 | "text/plain": [ 765 | " satisfaction_level average_montly_hours promotion_last_5years \\\n", 766 | "0 0.38 157 0 \n", 767 | "1 0.80 262 0 \n", 768 | "2 0.11 272 0 \n", 769 | "3 0.72 223 0 \n", 770 | "4 0.37 159 0 \n", 771 | "\n", 772 | " salary_high salary_low salary_medium \n", 773 | "0 0 1 0 \n", 774 | "1 0 0 1 \n", 775 | "2 0 0 1 \n", 776 | "3 0 1 0 \n", 777 | "4 0 1 0 " 778 | ] 779 | }, 780 | "execution_count": 81, 781 | "metadata": {}, 782 | "output_type": "execute_result" 783 | } 784 | ], 785 | "source": [ 786 | "df_with_dummies.drop('salary',axis='columns',inplace=True)\n", 787 | "df_with_dummies.head()" 788 | ] 789 | }, 790 | { 791 | "cell_type": "code", 792 | "execution_count": 82, 793 | "metadata": {}, 794 | "outputs": [ 795 | { 796 | "data": { 797 | "text/html": [ 798 | "
\n", 799 | "\n", 812 | "\n", 813 | " \n", 814 | " \n", 815 | " \n", 816 | " \n", 817 | " \n", 818 | " \n", 819 | " \n", 820 | " \n", 821 | " \n", 822 | " \n", 823 | " \n", 824 | " \n", 825 | " \n", 826 | " \n", 827 | " \n", 828 | " \n", 829 | " \n", 830 | " \n", 831 | " \n", 832 | " \n", 833 | " \n", 834 | " \n", 835 | " \n", 836 | " \n", 837 | " \n", 838 | " \n", 839 | " \n", 840 | " \n", 841 | " \n", 842 | " \n", 843 | " \n", 844 | " \n", 845 | " \n", 846 | " \n", 847 | " \n", 848 | " \n", 849 | " \n", 850 | " \n", 851 | " \n", 852 | " \n", 853 | " \n", 854 | " \n", 855 | " \n", 856 | " \n", 857 | " \n", 858 | " \n", 859 | " \n", 860 | " \n", 861 | " \n", 862 | " \n", 863 | " \n", 864 | " \n", 865 | " \n", 866 | " \n", 867 | " \n", 868 | " \n", 869 | " \n", 870 | " \n", 871 | "
   satisfaction_level  average_montly_hours  promotion_last_5years  salary_high  salary_low  salary_medium
0                0.38                   157                      0            0           1              0
1                0.80                   262                      0            0           0              1
2                0.11                   272                      0            0           0              1
3                0.72                   223                      0            0           1              0
4                0.37                   159                      0            0           1              0
\n", 872 | "
" 873 | ], 874 | "text/plain": [ 875 | " satisfaction_level average_montly_hours promotion_last_5years \\\n", 876 | "0 0.38 157 0 \n", 877 | "1 0.80 262 0 \n", 878 | "2 0.11 272 0 \n", 879 | "3 0.72 223 0 \n", 880 | "4 0.37 159 0 \n", 881 | "\n", 882 | " salary_high salary_low salary_medium \n", 883 | "0 0 1 0 \n", 884 | "1 0 0 1 \n", 885 | "2 0 0 1 \n", 886 | "3 0 1 0 \n", 887 | "4 0 1 0 " 888 | ] 889 | }, 890 | "execution_count": 82, 891 | "metadata": {}, 892 | "output_type": "execute_result" 893 | } 894 | ], 895 | "source": [ 896 | "X = df_with_dummies\n", 897 | "X.head()" 898 | ] 899 | }, 900 | { 901 | "cell_type": "code", 902 | "execution_count": 83, 903 | "metadata": {}, 904 | "outputs": [], 905 | "source": [ 906 | "y = df.left" 907 | ] 908 | }, 909 | { 910 | "cell_type": "code", 911 | "execution_count": 91, 912 | "metadata": {}, 913 | "outputs": [], 914 | "source": [ 915 | "from sklearn.model_selection import train_test_split\n", 916 | "X_train, X_test, y_train, y_test = train_test_split(X,y,train_size=0.3)" 917 | ] 918 | }, 919 | { 920 | "cell_type": "code", 921 | "execution_count": 87, 922 | "metadata": {}, 923 | "outputs": [], 924 | "source": [ 925 | "from sklearn.linear_model import LogisticRegression\n", 926 | "model = LogisticRegression()" 927 | ] 928 | }, 929 | { 930 | "cell_type": "code", 931 | "execution_count": 88, 932 | "metadata": {}, 933 | "outputs": [ 934 | { 935 | "data": { 936 | "text/plain": [ 937 | "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n", 938 | " intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,\n", 939 | " penalty='l2', random_state=None, solver='liblinear', tol=0.0001,\n", 940 | " verbose=0, warm_start=False)" 941 | ] 942 | }, 943 | "execution_count": 88, 944 | "metadata": {}, 945 | "output_type": "execute_result" 946 | } 947 | ], 948 | "source": [ 949 | "model.fit(X_train, y_train)" 950 | ] 951 | }, 952 | { 953 | "cell_type": "code", 954 | "execution_count": 89, 955 | "metadata": {}, 
956 | "outputs": [ 957 | { 958 | "data": { 959 | "text/plain": [ 960 | "array([0, 0, 0, ..., 0, 0, 1], dtype=int64)" 961 | ] 962 | }, 963 | "execution_count": 89, 964 | "metadata": {}, 965 | "output_type": "execute_result" 966 | } 967 | ], 968 | "source": [ 969 | "model.predict(X_test)" 970 | ] 971 | }, 972 | { 973 | "cell_type": "markdown", 974 | "metadata": {}, 975 | "source": [ 976 | "**Accuracy of the model**" 977 | ] 978 | }, 979 | { 980 | "cell_type": "code", 981 | "execution_count": 90, 982 | "metadata": {}, 983 | "outputs": [ 984 | { 985 | "data": { 986 | "text/plain": [ 987 | "0.78428571428571425" 988 | ] 989 | }, 990 | "execution_count": 90, 991 | "metadata": {}, 992 | "output_type": "execute_result" 993 | } 994 | ], 995 | "source": [ 996 | "model.score(X_test,y_test)" 997 | ] 998 | } 999 | ], 1000 | "metadata": { 1001 | "kernelspec": { 1002 | "display_name": "Python 3", 1003 | "language": "python", 1004 | "name": "python3" 1005 | }, 1006 | "language_info": { 1007 | "codemirror_mode": { 1008 | "name": "ipython", 1009 | "version": 3 1010 | }, 1011 | "file_extension": ".py", 1012 | "mimetype": "text/x-python", 1013 | "name": "python", 1014 | "nbconvert_exporter": "python", 1015 | "pygments_lexer": "ipython3", 1016 | "version": "3.7.3" 1017 | } 1018 | }, 1019 | "nbformat": 4, 1020 | "nbformat_minor": 2 1021 | } 1022 | -------------------------------------------------------------------------------- /LogisticRegression/Exercise/7_logistic_regression_exercise.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "Dataset is downloaded from Kaggle. 
Link: https://www.kaggle.com/giripujar/hr-analytics" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 13, 13 | "metadata": { 14 | "collapsed": true, 15 | "jupyter": { 16 | "outputs_hidden": true 17 | } 18 | }, 19 | "outputs": [], 20 | "source": [ 21 | "import pandas as pd\n", 22 | "from matplotlib import pyplot as plt\n", 23 | "%matplotlib inline" 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": 3, 29 | "metadata": {}, 30 | "outputs": [ 31 | { 32 | "data": { 33 | "text/html": [ 34 | "
\n", 35 | "\n", 48 | "\n", 49 | " \n", 50 | " \n", 51 | " \n", 52 | " \n", 53 | " \n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | "
   satisfaction_level  last_evaluation  number_project  average_montly_hours  time_spend_company  Work_accident  left  promotion_last_5years  Department  salary
0                0.38             0.53               2                   157                   3              0     1                      0       sales     low
1                0.80             0.86               5                   262                   6              0     1                      0       sales  medium
2                0.11             0.88               7                   272                   4              0     1                      0       sales  medium
3                0.72             0.87               5                   223                   5              0     1                      0       sales     low
4                0.37             0.52               2                   159                   3              0     1                      0       sales     low
\n", 132 | "
" 133 | ], 134 | "text/plain": [ 135 | " satisfaction_level last_evaluation number_project average_montly_hours \\\n", 136 | "0 0.38 0.53 2 157 \n", 137 | "1 0.80 0.86 5 262 \n", 138 | "2 0.11 0.88 7 272 \n", 139 | "3 0.72 0.87 5 223 \n", 140 | "4 0.37 0.52 2 159 \n", 141 | "\n", 142 | " time_spend_company Work_accident left promotion_last_5years Department \\\n", 143 | "0 3 0 1 0 sales \n", 144 | "1 6 0 1 0 sales \n", 145 | "2 4 0 1 0 sales \n", 146 | "3 5 0 1 0 sales \n", 147 | "4 3 0 1 0 sales \n", 148 | "\n", 149 | " salary \n", 150 | "0 low \n", 151 | "1 medium \n", 152 | "2 medium \n", 153 | "3 low \n", 154 | "4 low " 155 | ] 156 | }, 157 | "execution_count": 3, 158 | "metadata": {}, 159 | "output_type": "execute_result" 160 | } 161 | ], 162 | "source": [ 163 | "df = pd.read_csv(\"HR_comma_sep.csv\")\n", 164 | "df.head()" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "

Data exploration and visualization

" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": 73, 177 | "metadata": {}, 178 | "outputs": [ 179 | { 180 | "data": { 181 | "text/plain": [ 182 | "(3571, 10)" 183 | ] 184 | }, 185 | "execution_count": 73, 186 | "metadata": {}, 187 | "output_type": "execute_result" 188 | } 189 | ], 190 | "source": [ 191 | "left = df[df.left==1]\n", 192 | "left.shape" 193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": 74, 198 | "metadata": {}, 199 | "outputs": [ 200 | { 201 | "data": { 202 | "text/plain": [ 203 | "(11428, 10)" 204 | ] 205 | }, 206 | "execution_count": 74, 207 | "metadata": {}, 208 | "output_type": "execute_result" 209 | } 210 | ], 211 | "source": [ 212 | "retained = df[df.left==0]\n", 213 | "retained.shape" 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": {}, 219 | "source": [ 220 | "**Average numbers for all columns** " 221 | ] 222 | }, 223 | { 224 | "cell_type": "code", 225 | "execution_count": 31, 226 | "metadata": {}, 227 | "outputs": [ 228 | { 229 | "data": { 230 | "text/html": [ 231 | "
\n", 232 | "\n", 245 | "\n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | "
      satisfaction_level  last_evaluation  number_project  average_montly_hours  time_spend_company  Work_accident  promotion_last_5years
left
0               0.666810         0.715473        3.786664            199.060203            3.380032       0.175009               0.026251
1               0.440098         0.718113        3.855503            207.419210            3.876505       0.047326               0.005321
\n", 291 | "
" 292 | ], 293 | "text/plain": [ 294 | " satisfaction_level last_evaluation number_project \\\n", 295 | "left \n", 296 | "0 0.666810 0.715473 3.786664 \n", 297 | "1 0.440098 0.718113 3.855503 \n", 298 | "\n", 299 | " average_montly_hours time_spend_company Work_accident \\\n", 300 | "left \n", 301 | "0 199.060203 3.380032 0.175009 \n", 302 | "1 207.419210 3.876505 0.047326 \n", 303 | "\n", 304 | " promotion_last_5years \n", 305 | "left \n", 306 | "0 0.026251 \n", 307 | "1 0.005321 " 308 | ] 309 | }, 310 | "execution_count": 31, 311 | "metadata": {}, 312 | "output_type": "execute_result" 313 | } 314 | ], 315 | "source": [ 316 | "df.groupby('left').mean()" 317 | ] 318 | }, 319 | { 320 | "cell_type": "markdown", 321 | "metadata": {}, 322 | "source": [ 323 | "From above table we can draw following conclusions,\n", 324 | "
    \n", 325 | "
  1. **Satisfaction Level**: Satisfaction level is noticeably lower among employees who left the firm (0.44) than among those who were retained (0.67)\n", 326 | "
  2. **Average Monthly Hours**: Average monthly hours are higher among employees who left the firm (207 vs 199 for retained employees)\n", 327 | "
  3. **Promotion Last 5 Years**: Employees who received a promotion in the last 5 years are more likely to be retained (promotion rate of 2.6% among retained employees vs 0.5% among those who left)\n", 328 | "
" 329 | ] 330 | }, 331 | { 332 | "cell_type": "markdown", 333 | "metadata": {}, 334 | "source": [ 335 | "**Impact of salary on employee retention**" 336 | ] 337 | }, 338 | { 339 | "cell_type": "code", 340 | "execution_count": 37, 341 | "metadata": { 342 | "scrolled": true 343 | }, 344 | "outputs": [ 345 | { 346 | "data": { 347 | "text/plain": [ 348 | "" 349 | ] 350 | }, 351 | "execution_count": 37, 352 | "metadata": {}, 353 | "output_type": "execute_result" 354 | }, 355 | { 356 | "data": { 357 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYEAAAEpCAYAAAB1Fp6nAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFplJREFUeJzt3X+0XWWd3/H3Z8KPsAw4/Igpk4uTWOOMgIgYKGrHqqyR\nHyLQjmWCVnFA0YGO6erYCm3XAO2wyiy6ZIaxQKmlhOUMmFWlMCqIIFkwpRiuViFEaRh+DMkACXEE\nnC4Q4rd/nB1zDBdyb7g5+8bn/VrrrPvs5+x99vfkrpvP2c+z9z6pKiRJbfqlvguQJPXHEJCkhhkC\nktQwQ0CSGmYISFLDDAFJapghIEkNMwQkqWGGgCQ1bJe+C9iW/fbbrxYsWNB3GZK0U/n2t7/9ZFXN\n3dZ6Mz4EFixYwPj4eN9lSNJOJckjk1nP4SBJapghIEkNMwQkqWEzfk5Akvrw/PPPs3btWp599tm+\nS3lZs2fPZmxsjF133XW7tjcEJGkCa9euZc8992TBggUk6bucCVUVGzduZO3atSxcuHC7XsPhIEma\nwLPPPsu+++47YwMAIAn77rvvKzpaMQQk6SXM5ADY7JXWaAhIUsMMAUmaRnPmzNnmOpdccglvfOMb\n+dCHPsSKFSu48847R1DZxJwY1i+MBWd/daT7e/jC9410f7/oWvr9XXrppdxyyy2MjY1x3nnnMWfO\nHN7+9rf3UotHApK0g1x00UUcfvjhHHLIIZx77rkAfPKTn+TBBx/k2GOP5eKLL+byyy/n4osv5tBD\nD+WOO+4YeY2TOhJI8jDwDLAJeKGqFifZB/gisAB4GDi5qv62W/8c4PRu/U9V1de7/rcCVwF7AF8D\nllZVTd/bkaSZ4eabb2bNmjWsXLmSquKEE07g9ttv5/LLL+emm27itttuY7/99uOpp55izpw5fPrT\nn+6lzqkcCby7qg6tqsXd8tnArVW1CLi1WybJgcAS4CDgGODSJLO6bS4DPg4s6h7HvPK3IEkzz803\n38zNN9/MW97yFg477DB+8IMfsGbNmr7LepFXMidwIvCurr0MWAF8puu/tqqeAx5K8gBwRHc0sVdV\n3QWQ5GrgJODGV1CDJM1IVcU555zDJz7xib5LeVmTPRIo4JYk305yRtc3r6oe69qPA/O69nzg0aFt\n13Z987v21v2S9Avn6KOP5sorr+THP/4xAOvWrWP9+vUvWm/PPffkmWeeGXV5PzPZEPiHVXUocCxw\nVpJ3Dj/ZjetP29h+kjOSjCcZ37Bhw3S9rCSNzHvf+14++MEP8ra3vY03velNfOADH5jwP/v3v//9\nXHfddTN7Yriq1nU/1ye5DjgCeCLJ/lX1WJL9gc0Rtw44YGjzsa5vXdfeun+i/V0BXAGwePFiJ44l\n7TQ2f/IHWLp0KUuXLn3ROg8//PDP2m94wxu45557RlHahLZ5JJDkVUn23NwG3gusAm4ATu1WOxW4\nvmv
M\nAWkKklyV5AN91yFNF0NA2oGSeEGmZjRDQM1L8qokX03yvSSrkvx2kj9Icne3fEWSTLDdhOskWZHk\nj5OMA/82yUNJdu2e22t4WeqbISDBMcDfVNWbq+pgBt8/+7mqOrxb3gM4foLtXm6d3apqcVWdD6wA\n3tf1LwG+XFXP76g3I02FISDBvcBvJvmjJL/Rff3gu5N8K8m9wHsYfBHJ1l5unS8OtT/P4J41dD//\n+/S/BWn7OF6p5lXV/01yGHAc8IdJbgXOAhZX1aNJzmPwBSQ/k2Q2cOnLrPN3Q6//v5IsSPIuYFZV\nrdqhb0iaAo8E1LwkvwL8v6r6AnARcFj31JNJ5gATnQ00exLrDLuawdcZehSgGcUjAQneBFyU5KfA\n88DvAicx+Papx5ng6wer6kdJ/uvLrbOVPwP+ELhmGuuWXjFvGyGNQHdtwYlV9eG+a5GGeSQg7WBJ\n/hQ4lsGcgzSjeCQgSQ1zYliSGmYISFLDDAFJapghIEkNMwQkqWGGgCQ17P8DRxGGJIEiwcQAAAAA\nSUVORK5CYII=\n", 358 | "text/plain": [ 359 | "" 360 | ] 361 | }, 362 | "metadata": {}, 363 | "output_type": "display_data" 364 | } 365 | ], 366 | "source": [ 367 | "pd.crosstab(df.salary,df.left).plot(kind='bar')" 368 | ] 369 | }, 370 | { 371 | "cell_type": "markdown", 372 | "metadata": {}, 373 | "source": [ 374 | "Above bar chart shows employees with high salaries are likely to not leave the company" 375 | ] 376 | }, 377 | { 378 | "cell_type": "markdown", 379 | "metadata": {}, 380 | "source": [ 381 | "**Department wise employee retention rate**" 382 | ] 383 | }, 384 | { 385 | "cell_type": "code", 386 | "execution_count": 38, 387 | "metadata": { 388 | "scrolled": true 389 | }, 390 | "outputs": [ 391 | { 392 | "data": { 393 | "text/plain": [ 394 | "" 395 | ] 396 | }, 397 | "execution_count": 38, 398 | "metadata": {}, 399 | "output_type": "execute_result" 400 | }, 401 | { 402 | "data": { 403 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYEAAAFDCAYAAADcebKbAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmYXVWZ7/Hvj4AEDWEMNKbABAloQAiSIKCN2F4Zm0FF\nOjgAggRb0Hiv2oLeblCbFkdaREBsELAZpFUEmYcLIgKGCmBCwNwgBEkaIUaGgBIhvP3HWic5qVSl\nKkmdtbe1f5/nOU/OXmdYbyWV856119rvUkRgZmbNtFbVAZiZWXWcBMzMGsxJwMyswZwEzMwazEnA\nzKzBnATMzBrMScDMrMGcBMzMGsxJwMyswZwEzMwabO2qA+jPpptuGmPGjKk6DDOzvyrTp0//Q0SM\n6u95tU8CY8aMobu7u+owzMz+qkh6bCDP8+kgM7MGcxIwM2swJwEzswar/ZyAmVkVXnrpJebNm8eL\nL75YdSgrNXz4cLq6ulhnnXVW6/VOAmZmvZg3bx7rr78+Y8aMQVLV4fQqIli4cCHz5s1j7Nixq/Ue\nPh1kZtaLF198kU022aS2CQBAEptssskajVacBMzM+lDnBNCypjE6CZiZDaIRI0b0+5wzzjiDN77x\njXzgAx/gtttu48477ywQWe88J2BmHTfmxGv6fc7c0w4oEEk9nHXWWdx88810dXVxyimnMGLECPbY\nY49KYvFIwMysQ772ta8xadIkdtxxR04++WQAPvrRj/LII4+w3377cfrpp3POOedw+umnM2HCBH7x\ni18Uj9EjATOzDrjxxhuZM2cO06ZNIyI46KCDuP322znnnHO4/vrrufXWW9l000159tlnGTFiBJ/+\n9KcridNJwMysA2688UZuvPFGdt55ZwCef/555syZw5577llxZMvrNwlIGg7cDqybn/+jiDhZ0sbA\nD4ExwFzgsIh4Or/mJOAYYAnwiYi4IbfvAlwArAdcC0yNiBjcH8nMrHoRwUknncRxxx1XdSgrNZA5\ngcXA30XETsAEYF9JuwEnArdExDjglnyMpPHAZGB7YF/gLEnD8nudDRwLjMu3fQfxZzEzq4199tmH\n888/n+effx6A+fPn89RTT63wvPXXX59FixaVDm+pfpNAJM/nw3XyLYCDgQtz+4XAIfn+wcBlEbE4\nIh4FHgZ2lbQFMDIi7s7f/i9qe42Z2ZCy99578/73v5/dd9+dN73pTRx66KG9ftgfeOCBXHHFFfWe\nGM7f5KcD2wDfiYhfSdo8Ip7IT/k9sHm+Pxq4u+3l83LbS/l+z3YzsyGj9c0fYOrUqUydOnWF58yd\nO3fp/W233ZYZM2aUCK1XA1oiGhFLImIC0EX6Vr9Dj8eDNDoYFJKmSOqW1L1gwYLBelszM+thla4T\niIhngFtJ5/KfzKd4yH+2TnbNB7Zse1lXbpuf7/ds762fcyNiYkRMHDWq393RzMxsNfWbBCSNkrRh\nvr8e8C7gN8BVwJH5aUcCV+b7VwGTJa0raSxpAnhaPnX0nKTdlIpdHNH2GjMzq8BA5gS2AC7M8wJr\nAZdHxNWS7gIul3QM8BhwGEBEzJJ0OfAg8DJwfEQsye/1MZYtEb0u38zMrCL9JoGImAHs3Ev7QuCd\nfbzmVODUXtq7gR1WfIWZmVXBtYPMzBrMScDMrMauv/56tttuO7bZZhtOO+20QX9/1w4yMxuAgZTD\nXhUDKZ29ZMkSjj/+eG666Sa6urqYNGkSBx10EOPHjx+0ODwSMDOrqWnTprHNNtuw9dZb86pXvYrJ\nkydz5ZWDu6jSScDMrKbmz5/Pllsuu+yqq6uL+fN7vbxqtTkJmJk1mJOAmVlNjR49mscff3zp8bx5\n8xg9enBLrjkJmJnV1KRJk5gzZw6PPvoof/nLX7jssss46KCDBrUPrw4yM6uptddemzPPPJN99tmH\nJUuWcPTRR7P99tsPbh+D+m5mZkPUQJZ0dsL+++/P/vvv37H39
CriSNyA4mLZ6YAUO/2KJHAp31TF77fDtwsaSnKL/HcF1q6NfBT/OtaptHxOdZ\ntqNWa1ORe/p+SSU6+g0xIv5E2+9hRDxB2TIeLb/Nt5bWftzFCuypwt3mPBLooHxl6p9JQ9wPABsA\nF0d1G7w0ntIm6pUuzZR0L2meZn4+3hP4TuS9ZeuiqRcVVkEV7jbnkUAHRUTrW/8rwIX5cvTDWbaD\nUccp1a7vrYBc6dUPlWtfmkm6iK6SpZnAccBPczxvJp13379wDLSXOe+jrWgto6rkkg29/R/peMmG\nNpXN3Xkk0AGSRpJ2FRtNuiz/pnz8aeDXEbGyjegHO5b3th0OB94N/HcVZYurJmk6aSnibbFsT9tK\nSntI2p20r8GLwAERsaCCGGpRw6hqknZpOxxOKvr4ckT8U4G+W3srvJ2K5u48EuiMH5CqVN4FfAT4\nHOkc3yF5TXgx0WPTcEmXAneUjKFGeluaWaxcQi9lvV9NKjB4ntJmKqXKE+xOqqQ6SlL7FdQjgWEl\nYqiTXPK93S/bSkh0Wnt9okrm7pwEOmPr1vldSf9BmuzaqiarP8aRJp+aqOqlmV8v2NfKvAoYQfr/\n3z75+RzpYqlGUdpOsmUtYCJp/q7j6rC3gk8HdUDPIXWVQ2ylfWSDvNqAVDHypJ4jhCbosTRTwA0U\nXpqptIn5zRHxjlJ9riSW17mECeTaXq3/Iy8Bc0lzRcVGzJIuBKbG8hvNf6PE3J2TQAdIWsKypaAi\nXZ7/JyrYyNvqR9ItwHui4E5zfcRxE/C+Hh88l0XEPlXGVZqkw0g1jJ6rqoZRbyuxSq3O8umgDoiI\nWp1XzeUr9syHt0XE1VXGU5VezslDOiffDXy34IjgeWBm/hBeet1IBZP1m7YSQO7/aUlNPFX4fyPi\n8rYaRl8n1TB6S8EY1pK0UUQ8DUtPURX5fHYSGOIknUaqj95aljpV0h4R8bkKw6rKI8Aolt9GcBGw\nLamE8YcKxfET6nGx3iuStoqI38HSqrdNPDVQhxpG3wDuktRalvs+4NQSHft00BAnaQYwISJeycfD\ngPtK7R5VJ5LuiYhJvbVJmhURJQv7VU7SvqQ9l39OOlX5t8CUiLih0sAKU9pZbD6phtGbSRd4TitZ\n3iXHMZ40EgH4f1Fo20+PBJphQ5btJlZk1UNNjejxzXcr0ioZSGWmi8grk74MjGf5Dda3LhVD7u96\npQ3Od8tNn6xBhdUqHEaqYfT1iHgm1zD6TAVxbAy8EBHflzRK0tiIeLTTnToJDH1fBu7LV0WKNDdw\nYrUhVeZEqsINAAAHq0lEQVRTwB2Sfkv6uxgLfCyX97iwYBzfB04GTidtsv5hKqiemctVQFoaCjA+\nX69we+lYqlSHGkaSTiYtTd2O9PuxDvCfwFs73rdPBw19+ZtN6zTItNK7JtVJLlf8hnw4u4prNyRN\nj4hdJM1su55kekTs0t9rBzmOn7UdDidtdDO9cLkEAyTdD+wM3Nt2NfuMEqdtPRIY4iS9m3R+8ap8\nvKGkQyKiDtU0qzCO9G1rOLBT/ubb8X1ce1ic60jNkXQC6Xz0iH5eM+giov1qVSRtCfx76TgMgL9E\nREgKWFp8sgiPBIY4SfdHxIQebY2sDpmH3HuRzsVfC+wH3BERRa+SzWWjHyLN1XyJVK7hqxHxq5Jx\n9BKXgFkRMb7KOJpI0qdJX1DeRTqFezRwSUR8u9N9eyQw9PV2rrmp/+6HAjuRVkd9WNLmpPOupQWp\nvtTrSOd+IS1RLbpiS9K3WbYkdC1gAlB8k3cD0tLlH5HmZ7YD/gX4XyU69khgiJN0PmlD8+/kpuOB\njSPiqMqCqoikaRGxa64m+g7SNQIPRcQb+nnpYMcxm7T6ZCZtBexKl3CQdGTb4cvA3J6lpa2MPiq6\nek7ABsXHgX8Gfkj61tcqa91E3ZI2JH3rnk66cveuCuJY0JqjqVJElFwRZb2Q9I/Ax
4Ct8zU9LesD\nRRKyRwLWSJLGACMjYkY/T+1E3+8kbS50CxXs+yxpJiu5MriJFxJWRdIGwEakeYD2pduLIuKPvb9q\nkGNwEhjaXCRseZJGk87FLx0Fl14XL+k/SctUZ7HsdFCU2u0tl4eAZSPCH+Q/P5jjaOp1JI3kJDDE\nVVmdsG4kfYVUL+hBltWLidLbS0qaHRHbleyzjzh6+91o3M5iTec5gaGvZ5GwMTSzSBjAIcB2EbG4\n32d21p2SxpeqDbMSUtuewpL2oIIrl61aTgJD3+dJpRKWKxJWbUiVeYS0JLPqJLAbcH/ezGQxy/aZ\nKH0u/hjg/HxeWqQtUYuckrL68OmgBsg14qcA95E2uHmqafVhACT9mHSdQM8J2aJ1/NvOyS+nql2+\nchKg6k1urBoeCQxxkj4CTAW6gPtJ30LvYlnJ2ia5Kt8qVZctHfOH/8nkDYfyaPGLTgbN4pHAEJeX\nA04C7o6ICZLeAPxbRLyn4tCsYnlk9ADLKqh+CNjJvxvN4pHA0PdiRLwoCUnrRsRvJFW+MqUKdanj\nXyOvj4j3th1/IVeztAbxSoChb16+SvanwE2SrgRqcTqiAt8n7R37MqlsxEVUUzuoLv6c99UFQNJb\nSbtqWYP4dFCDSHo7aWex6yOi2E5adVGXOv51IWknUiJs7Tb3NHBkFVdRW3V8OqhBIuLnVcdQsVrU\n8a+D/PewXUTsJGkkQEQ818/LbAjySMAao651/KsiqTsiJlYdh1XLScAaQ9JE0sVz7XX8q7hIqxYk\nnQb8gVRh9oVWe6nCZVYPTgLWGHWp418X+YrlFT4AGrxaqpE8J2BNUos6/jUynlTL/m2kZPAL4JxK\nI7LiPBKwxqi6jn/dSLqctJ3hxbnp/cAGEXFYdVFZaR4JWJN8mFTHfx3a6vgDjUwCwA49NpW/VVLV\nlU2tMCcBa5JJdajjXyP3StotIu4GkPQWoLvimKwwJwFrkrrU8a+LXUh/J7/Lx1sBs1vbTzZ11VTT\neE7AGkPSQ8Drgarr+NdCXyWtW5q6aqppnASsMepWx9+sDpwEzMwazFVEzcwazEnAzKzBnARsSJC0\nRNL9kmZJ+rWkT+VKmZ3u9yhJrx0q/VjzOAnYUPHniJgQEdsD7wL2I+2f2zGShgFHASU+nEv1Yw3j\nJGBDTkQ8BUwBTlAyTNLXJN0jaYak4wAk7SXpdknXSJot6ZzW6EHS2ZK688jiC633ljRX0lck3Usq\nQTERuDiPQtbLj385H3dLerOkGyT9VtJH297nM23xfCG3jZH0kKTv5X5vzO95aM9+iv1l2pDnJGBD\nUkQ8AgwDNgOOAZ6NiEnAJOBYSWPzU3cFPk4qpvZ6oLXJ+udzrf0dgbdLar+WYGFEvDki/pN0he0H\n8iiktTXj7yJiAqkg2wXAocBuQOvDfm9gXO57ArCLpD3za8cB38kjmmeA90bEj/rox2yN+Ypha4K9\ngR3zN2pI2ymOA/4CTMsJA0mXkipq/gg4TNIU0v+RLUhJorXt4g/76a9VqXQmMCIiFgGLJC3O+z3v\nnW/35eeNyPH8Dng0IlqbvU8HxqzWT2w2QE4CNiRJ2hpYAjxFujL44xFxQ4/n7MWK9fQjjxI+Tao1\n9LSkC4Dhbc95gZVrVSh9pe1+63jtHM+XI+K7PeIZ0+P5SwCf+rGO8ukgG3IkjSLVxT8z0tWQNwD/\nKGmd/Pi2kl6Tn76rpLF5LuAfgDtI206+ADwraXPSJHNfFgHrr2KINwBHSxqR4xktabN+XrM6/Zj1\nyyMBGyrWk3Q/qUz0y8APgG/mx/6DdFrlXkkCFgCH5MfuAc4EtgFuBa6IiFck3Qf8Bngc+OVK+r0A\nOEfSn4HdBxJoRNwo6Y3AXSkcngc+SPrmP6B+PC9gg8VlI6yx8umgT0fE31cdi1lVfDrIzKzBPBIw\nM2swjwTMzBrMScDMrMGcBMzMGsxJwMyswZwEz
MwazEnAzKzB/ge9EgVs3R/57wAAAABJRU5ErkJg\ngg==\n", 404 | "text/plain": [ 405 | "" 406 | ] 407 | }, 408 | "metadata": {}, 409 | "output_type": "display_data" 410 | } 411 | ], 412 | "source": [ 413 | "pd.crosstab(df.Department,df.left).plot(kind='bar')" 414 | ] 415 | }, 416 | { 417 | "cell_type": "markdown", 418 | "metadata": {}, 419 | "source": [ 420 | "From the chart above, department seems to have some impact on employee retention, but it is not major, so we will ignore department in our analysis." 421 | ] 422 | }, 423 | { 424 | "cell_type": "markdown", 425 | "metadata": {}, 426 | "source": [ 427 | "

From the data analysis so far, we can conclude that we will use the following variables as independent variables in our model

\n", 428 | "
    \n", 429 | "
  1. **Satisfaction Level**\n", 430 | "
  2. **Average Monthly Hours**\n", 431 | "
  3. **Promotion Last 5 Years**\n", 432 | "
  4. **Salary**\n", 433 | "
" 434 | ] 435 | }, 436 | { 437 | "cell_type": "code", 438 | "execution_count": 76, 439 | "metadata": {}, 440 | "outputs": [ 441 | { 442 | "data": { 443 | "text/html": [ 444 | "
\n", 445 | "\n", 458 | "\n", 459 | " \n", 460 | " \n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | " \n", 467 | " \n", 468 | " \n", 469 | " \n", 470 | " \n", 471 | " \n", 472 | " \n", 473 | " \n", 474 | " \n", 475 | " \n", 476 | " \n", 477 | " \n", 478 | " \n", 479 | " \n", 480 | " \n", 481 | " \n", 482 | " \n", 483 | " \n", 484 | " \n", 485 | " \n", 486 | " \n", 487 | " \n", 488 | " \n", 489 | " \n", 490 | " \n", 491 | " \n", 492 | " \n", 493 | " \n", 494 | " \n", 495 | " \n", 496 | " \n", 497 | " \n", 498 | " \n", 499 | " \n", 500 | " \n", 501 | " \n", 502 | " \n", 503 | " \n", 504 | " \n", 505 | "
   satisfaction_level  average_montly_hours  promotion_last_5years  salary
0                0.38                   157                      0     low
1                0.80                   262                      0  medium
2                0.11                   272                      0  medium
3                0.72                   223                      0     low
4                0.37                   159                      0     low
\n", 506 | "
" 507 | ], 508 | "text/plain": [ 509 | " satisfaction_level average_montly_hours promotion_last_5years salary\n", 510 | "0 0.38 157 0 low\n", 511 | "1 0.80 262 0 medium\n", 512 | "2 0.11 272 0 medium\n", 513 | "3 0.72 223 0 low\n", 514 | "4 0.37 159 0 low" 515 | ] 516 | }, 517 | "execution_count": 76, 518 | "metadata": {}, 519 | "output_type": "execute_result" 520 | } 521 | ], 522 | "source": [ 523 | "subdf = df[['satisfaction_level','average_montly_hours','promotion_last_5years','salary']]\n", 524 | "subdf.head()" 525 | ] 526 | }, 527 | { 528 | "cell_type": "markdown", 529 | "metadata": {}, 530 | "source": [ 531 | "**Tackle the salary dummy variables**" 532 | ] 533 | }, 534 | { 535 | "cell_type": "markdown", 536 | "metadata": {}, 537 | "source": [ 538 | "The salary column contains only text data. It needs to be converted to numbers, and we will use dummy variables for that. Check my one-hot encoding tutorial to understand the purpose behind dummy variables." 539 | ] 540 | }, 541 | { 542 | "cell_type": "code", 543 | "execution_count": 78, 544 | "metadata": {}, 545 | "outputs": [], 546 | "source": [ 547 | "salary_dummies = pd.get_dummies(subdf.salary, prefix=\"salary\")" 548 | ] 549 | }, 550 | { 551 | "cell_type": "code", 552 | "execution_count": 79, 553 | "metadata": {}, 554 | "outputs": [], 555 | "source": [ 556 | "df_with_dummies = pd.concat([subdf,salary_dummies],axis='columns')" 557 | ] 558 | }, 559 | { 560 | "cell_type": "code", 561 | "execution_count": 80, 562 | "metadata": {}, 563 | "outputs": [ 564 | { 565 | "data": { 566 | "text/html": [ 567 | "
\n", 568 | "\n", 581 | "\n", 582 | " \n", 583 | " \n", 584 | " \n", 585 | " \n", 586 | " \n", 587 | " \n", 588 | " \n", 589 | " \n", 590 | " \n", 591 | " \n", 592 | " \n", 593 | " \n", 594 | " \n", 595 | " \n", 596 | " \n", 597 | " \n", 598 | " \n", 599 | " \n", 600 | " \n", 601 | " \n", 602 | " \n", 603 | " \n", 604 | " \n", 605 | " \n", 606 | " \n", 607 | " \n", 608 | " \n", 609 | " \n", 610 | " \n", 611 | " \n", 612 | " \n", 613 | " \n", 614 | " \n", 615 | " \n", 616 | " \n", 617 | " \n", 618 | " \n", 619 | " \n", 620 | " \n", 621 | " \n", 622 | " \n", 623 | " \n", 624 | " \n", 625 | " \n", 626 | " \n", 627 | " \n", 628 | " \n", 629 | " \n", 630 | " \n", 631 | " \n", 632 | " \n", 633 | " \n", 634 | " \n", 635 | " \n", 636 | " \n", 637 | " \n", 638 | " \n", 639 | " \n", 640 | " \n", 641 | " \n", 642 | " \n", 643 | " \n", 644 | " \n", 645 | " \n", 646 | "
   satisfaction_level  average_montly_hours  promotion_last_5years  salary  salary_high  salary_low  salary_medium
0                0.38                   157                      0     low            0           1              0
1                0.80                   262                      0  medium            0           0              1
2                0.11                   272                      0  medium            0           0              1
3                0.72                   223                      0     low            0           1              0
4                0.37                   159                      0     low            0           1              0
\n", 647 | "
" 648 | ], 649 | "text/plain": [ 650 | " satisfaction_level average_montly_hours promotion_last_5years salary \\\n", 651 | "0 0.38 157 0 low \n", 652 | "1 0.80 262 0 medium \n", 653 | "2 0.11 272 0 medium \n", 654 | "3 0.72 223 0 low \n", 655 | "4 0.37 159 0 low \n", 656 | "\n", 657 | " salary_high salary_low salary_medium \n", 658 | "0 0 1 0 \n", 659 | "1 0 0 1 \n", 660 | "2 0 0 1 \n", 661 | "3 0 1 0 \n", 662 | "4 0 1 0 " 663 | ] 664 | }, 665 | "execution_count": 80, 666 | "metadata": {}, 667 | "output_type": "execute_result" 668 | } 669 | ], 670 | "source": [ 671 | "df_with_dummies.head()" 672 | ] 673 | }, 674 | { 675 | "cell_type": "markdown", 676 | "metadata": {}, 677 | "source": [ 678 | "Now we need to remove the salary column, which is text data. It has already been replaced by dummy variables, so we can safely remove it." 679 | ] 680 | }, 681 | { 682 | "cell_type": "code", 683 | "execution_count": 81, 684 | "metadata": {}, 685 | "outputs": [ 686 | { 687 | "data": { 688 | "text/html": [ 689 | "
\n", 690 | "\n", 703 | "\n", 704 | " \n", 705 | " \n", 706 | " \n", 707 | " \n", 708 | " \n", 709 | " \n", 710 | " \n", 711 | " \n", 712 | " \n", 713 | " \n", 714 | " \n", 715 | " \n", 716 | " \n", 717 | " \n", 718 | " \n", 719 | " \n", 720 | " \n", 721 | " \n", 722 | " \n", 723 | " \n", 724 | " \n", 725 | " \n", 726 | " \n", 727 | " \n", 728 | " \n", 729 | " \n", 730 | " \n", 731 | " \n", 732 | " \n", 733 | " \n", 734 | " \n", 735 | " \n", 736 | " \n", 737 | " \n", 738 | " \n", 739 | " \n", 740 | " \n", 741 | " \n", 742 | " \n", 743 | " \n", 744 | " \n", 745 | " \n", 746 | " \n", 747 | " \n", 748 | " \n", 749 | " \n", 750 | " \n", 751 | " \n", 752 | " \n", 753 | " \n", 754 | " \n", 755 | " \n", 756 | " \n", 757 | " \n", 758 | " \n", 759 | " \n", 760 | " \n", 761 | " \n", 762 | "
   satisfaction_level  average_montly_hours  promotion_last_5years  salary_high  salary_low  salary_medium
0                0.38                   157                      0            0           1              0
1                0.80                   262                      0            0           0              1
2                0.11                   272                      0            0           0              1
3                0.72                   223                      0            0           1              0
4                0.37                   159                      0            0           1              0
\n", 763 | "
" 764 | ], 765 | "text/plain": [ 766 | " satisfaction_level average_montly_hours promotion_last_5years \\\n", 767 | "0 0.38 157 0 \n", 768 | "1 0.80 262 0 \n", 769 | "2 0.11 272 0 \n", 770 | "3 0.72 223 0 \n", 771 | "4 0.37 159 0 \n", 772 | "\n", 773 | " salary_high salary_low salary_medium \n", 774 | "0 0 1 0 \n", 775 | "1 0 0 1 \n", 776 | "2 0 0 1 \n", 777 | "3 0 1 0 \n", 778 | "4 0 1 0 " 779 | ] 780 | }, 781 | "execution_count": 81, 782 | "metadata": {}, 783 | "output_type": "execute_result" 784 | } 785 | ], 786 | "source": [ 787 | "df_with_dummies.drop('salary',axis='columns',inplace=True)\n", 788 | "df_with_dummies.head()" 789 | ] 790 | }, 791 | { 792 | "cell_type": "code", 793 | "execution_count": 82, 794 | "metadata": {}, 795 | "outputs": [ 796 | { 797 | "data": { 798 | "text/html": [ 799 | "
\n", 800 | "\n", 813 | "\n", 814 | " \n", 815 | " \n", 816 | " \n", 817 | " \n", 818 | " \n", 819 | " \n", 820 | " \n", 821 | " \n", 822 | " \n", 823 | " \n", 824 | " \n", 825 | " \n", 826 | " \n", 827 | " \n", 828 | " \n", 829 | " \n", 830 | " \n", 831 | " \n", 832 | " \n", 833 | " \n", 834 | " \n", 835 | " \n", 836 | " \n", 837 | " \n", 838 | " \n", 839 | " \n", 840 | " \n", 841 | " \n", 842 | " \n", 843 | " \n", 844 | " \n", 845 | " \n", 846 | " \n", 847 | " \n", 848 | " \n", 849 | " \n", 850 | " \n", 851 | " \n", 852 | " \n", 853 | " \n", 854 | " \n", 855 | " \n", 856 | " \n", 857 | " \n", 858 | " \n", 859 | " \n", 860 | " \n", 861 | " \n", 862 | " \n", 863 | " \n", 864 | " \n", 865 | " \n", 866 | " \n", 867 | " \n", 868 | " \n", 869 | " \n", 870 | " \n", 871 | " \n", 872 | "
   satisfaction_level  average_montly_hours  promotion_last_5years  salary_high  salary_low  salary_medium
0                0.38                   157                      0            0           1              0
1                0.80                   262                      0            0           0              1
2                0.11                   272                      0            0           0              1
3                0.72                   223                      0            0           1              0
4                0.37                   159                      0            0           1              0
\n", 873 | "
" 874 | ], 875 | "text/plain": [ 876 | " satisfaction_level average_montly_hours promotion_last_5years \\\n", 877 | "0 0.38 157 0 \n", 878 | "1 0.80 262 0 \n", 879 | "2 0.11 272 0 \n", 880 | "3 0.72 223 0 \n", 881 | "4 0.37 159 0 \n", 882 | "\n", 883 | " salary_high salary_low salary_medium \n", 884 | "0 0 1 0 \n", 885 | "1 0 0 1 \n", 886 | "2 0 0 1 \n", 887 | "3 0 1 0 \n", 888 | "4 0 1 0 " 889 | ] 890 | }, 891 | "execution_count": 82, 892 | "metadata": {}, 893 | "output_type": "execute_result" 894 | } 895 | ], 896 | "source": [ 897 | "X = df_with_dummies\n", 898 | "X.head()" 899 | ] 900 | }, 901 | { 902 | "cell_type": "code", 903 | "execution_count": 83, 904 | "metadata": {}, 905 | "outputs": [], 906 | "source": [ 907 | "y = df.left" 908 | ] 909 | }, 910 | { 911 | "cell_type": "code", 912 | "execution_count": 91, 913 | "metadata": {}, 914 | "outputs": [], 915 | "source": [ 916 | "from sklearn.model_selection import train_test_split\n", 917 | "X_train, X_test, y_train, y_test = train_test_split(X,y,train_size=0.3)" 918 | ] 919 | }, 920 | { 921 | "cell_type": "code", 922 | "execution_count": 87, 923 | "metadata": {}, 924 | "outputs": [], 925 | "source": [ 926 | "from sklearn.linear_model import LogisticRegression\n", 927 | "model = LogisticRegression()" 928 | ] 929 | }, 930 | { 931 | "cell_type": "code", 932 | "execution_count": 88, 933 | "metadata": {}, 934 | "outputs": [ 935 | { 936 | "data": { 937 | "text/plain": [ 938 | "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n", 939 | " intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,\n", 940 | " penalty='l2', random_state=None, solver='liblinear', tol=0.0001,\n", 941 | " verbose=0, warm_start=False)" 942 | ] 943 | }, 944 | "execution_count": 88, 945 | "metadata": {}, 946 | "output_type": "execute_result" 947 | } 948 | ], 949 | "source": [ 950 | "model.fit(X_train, y_train)" 951 | ] 952 | }, 953 | { 954 | "cell_type": "code", 955 | "execution_count": 89, 956 | "metadata": {}, 
957 | "outputs": [ 958 | { 959 | "data": { 960 | "text/plain": [ 961 | "array([0, 0, 0, ..., 0, 0, 1], dtype=int64)" 962 | ] 963 | }, 964 | "execution_count": 89, 965 | "metadata": {}, 966 | "output_type": "execute_result" 967 | } 968 | ], 969 | "source": [ 970 | "model.predict(X_test)" 971 | ] 972 | }, 973 | { 974 | "cell_type": "markdown", 975 | "metadata": {}, 976 | "source": [ 977 | "**Accuracy of the model**" 978 | ] 979 | }, 980 | { 981 | "cell_type": "code", 982 | "execution_count": 90, 983 | "metadata": {}, 984 | "outputs": [ 985 | { 986 | "data": { 987 | "text/plain": [ 988 | "0.78428571428571425" 989 | ] 990 | }, 991 | "execution_count": 90, 992 | "metadata": {}, 993 | "output_type": "execute_result" 994 | } 995 | ], 996 | "source": [ 997 | "model.score(X_test,y_test)" 998 | ] 999 | } 1000 | ], 1001 | "metadata": { 1002 | "kernelspec": { 1003 | "display_name": "Python 3 (ipykernel)", 1004 | "language": "python", 1005 | "name": "python3" 1006 | }, 1007 | "language_info": { 1008 | "codemirror_mode": { 1009 | "name": "ipython", 1010 | "version": 3 1011 | }, 1012 | "file_extension": ".py", 1013 | "mimetype": "text/x-python", 1014 | "name": "python", 1015 | "nbconvert_exporter": "python", 1016 | "pygments_lexer": "ipython3", 1017 | "version": "3.9.12" 1018 | } 1019 | }, 1020 | "nbformat": 4, 1021 | "nbformat_minor": 4 1022 | } 1023 | -------------------------------------------------------------------------------- /LogisticRegression/LogisticeRegression.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "4198d27c-1b64-41a0-a45f-46349884e9fa", 6 | "metadata": {}, 7 | "source": [ 8 | "

Predicting whether a person buys insurance or not

" 9 | ] 10 | }, 11 | { 12 | "cell_type": "code", 13 | "execution_count": 23, 14 | "id": "991790ec-201a-4ac6-811e-e03d6b1563d2", 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "# import all the libs\n", 19 | "import pandas as pd\n", 20 | "import numpy as np\n", 21 | "import seaborn as sns\n", 22 | "import matplotlib.pyplot as plt" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 24, 28 | "id": "efe9539f-e255-454c-9b64-bc8cc604512f", 29 | "metadata": {}, 30 | "outputs": [], 31 | "source": [ 32 | "insurance = pd.read_csv('insurance_data.csv')" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": 25, 38 | "id": "55856df2-f456-423a-bc94-046aecfa63ce", 39 | "metadata": {}, 40 | "outputs": [], 41 | "source": [ 42 | "df = pd.DataFrame(insurance)" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 26, 48 | "id": "b87436f2-4891-45d2-920b-08905c4a157f", 49 | "metadata": {}, 50 | "outputs": [ 51 | { 52 | "data": { 53 | "text/html": [ 54 | "
\n", 55 | "\n", 68 | "\n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | "
   age  bought_insurance
0   22                 0
1   25                 0
2   47                 1
3   52                 0
4   46                 1
\n", 104 | "
" 105 | ], 106 | "text/plain": [ 107 | " age bought_insurance\n", 108 | "0 22 0\n", 109 | "1 25 0\n", 110 | "2 47 1\n", 111 | "3 52 0\n", 112 | "4 46 1" 113 | ] 114 | }, 115 | "execution_count": 26, 116 | "metadata": {}, 117 | "output_type": "execute_result" 118 | } 119 | ], 120 | "source": [ 121 | "df.shape\n", 122 | "df.head()" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 36, 128 | "id": "a9f48eeb-e85c-4471-9962-a63170cb7e3e", 129 | "metadata": {}, 130 | "outputs": [ 131 | { 132 | "data": { 133 | "text/plain": [ 134 | "[]" 135 | ] 136 | }, 137 | "execution_count": 36, 138 | "metadata": {}, 139 | "output_type": "execute_result" 140 | }, 141 | { 142 | "data": { 143 | "image/png": "\n", 144 | "text/plain": [ 145 | "
" 146 | ] 147 | }, 148 | "metadata": { 149 | "needs_background": "light" 150 | }, 151 | "output_type": "display_data" 152 | } 153 | ], 154 | "source": [ 155 | "plt.scatter(df.age,df.bought_insurance,marker=\"+\",color='red')\n", 156 | "from sklearn.linear_model import LinearRegression\n", 157 | "\n", 158 | "lreg=LinearRegression().fit(df[['age']],df.bought_insurance)\n", 159 | "y_pred = lreg.predict(df[['age']])\n", 160 | "plt.plot(df.age,y_pred)\n", 161 | "\n", 162 | "# The line shows the linear regression fit\n", 163 | "# An outlier would distort this line, which is a problem for classification" 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": 37, 169 | "id": "02f54e10-f091-4be3-bc09-6e94df563249", 170 | "metadata": {}, 171 | "outputs": [ 172 | { 173 | "data": { 174 | "text/plain": [ 175 | "0.5364021643885126" 176 | ] 177 | }, 178 | "execution_count": 37, 179 | "metadata": {}, 180 | "output_type": "execute_result" 181 | } 182 | ], 183 | "source": [ 184 | "# R^2 score of the linear fit\n", 185 | "lreg.score(df[['age']],df.bought_insurance)" 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "id": "12633885-e051-4c9f-9cfd-fe853732b6f3", 191 | "metadata": {}, 192 | "source": [ 193 | "## sklearn logistic regression\n", 194 | "\n", 195 | "### Step 1: Split the dataset into train and test" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 59, 201 | "id": "d867c81d-c0f1-44fc-bf5d-41498613af89", 202 | "metadata": {}, 203 | "outputs": [], 204 | "source": [ 205 | "from sklearn.model_selection import train_test_split\n", 206 | "X_train,X_test,y_train,y_test = train_test_split(df[['age']],df.bought_insurance,test_size=.2)" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "id": "7f587acf-56eb-4e3e-8776-bfec932d98e1", 212 | "metadata": {}, 213 | "source": [ 214 | "### Step 2: Model creation" 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": 60, 220 | "id": "8f282469-ebd1-47c2-add8-c5528a296cfb", 221 | "metadata": {}, 222 | 
"outputs": [], 223 | "source": [ 224 | "from sklearn.linear_model import LogisticRegression\n", 225 | "model = LogisticRegression()" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": 61, 231 | "id": "79c48062-710b-45c6-b298-95929cb5b5a6", 232 | "metadata": {}, 233 | "outputs": [ 234 | { 235 | "data": { 236 | "text/plain": [ 237 | "LogisticRegression()" 238 | ] 239 | }, 240 | "execution_count": 61, 241 | "metadata": {}, 242 | "output_type": "execute_result" 243 | } 244 | ], 245 | "source": [ 246 | "# fit the model\n", 247 | "model.fit(X_train,y_train)" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": 62, 253 | "id": "ce799e97-6d9c-4343-bd28-61ee2672c101", 254 | "metadata": {}, 255 | "outputs": [], 256 | "source": [ 257 | "y_pred = model.predict(X_test)" 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": 63, 263 | "id": "2599d297-9383-4994-8118-9e7b253d3dcc", 264 | "metadata": {}, 265 | "outputs": [ 266 | { 267 | "data": { 268 | "text/plain": [ 269 | "array([0, 1, 1, 1, 1, 1])" 270 | ] 271 | }, 272 | "execution_count": 63, 273 | "metadata": {}, 274 | "output_type": "execute_result" 275 | } 276 | ], 277 | "source": [ 278 | "y_pred" 279 | ] 280 | }, 281 | { 282 | "cell_type": "code", 283 | "execution_count": 90, 284 | "id": "8dd3aa2b-459b-430a-9c62-0a55bf343993", 285 | "metadata": {}, 286 | "outputs": [ 287 | { 288 | "name": "stdout", 289 | "output_type": "stream", 290 | "text": [ 291 | "[[0.8925775 0.1074225 ]\n", 292 | " [0.10266326 0.89733674]\n", 293 | " [0.04501951 0.95498049]\n", 294 | " [0.02937193 0.97062807]\n", 295 | " [0.217319 0.782681 ]\n", 296 | " [0.0684156 0.9315844 ]]\n" 297 | ] 298 | } 299 | ], 300 | "source": [ 301 | "pd.options.display.float_format = '{:.2f}'.format \n", 302 | "np.set_printoptions(suppress=True)\n", 303 | "print(model.predict_proba(X_test))\n", 304 | "# the below shows the probability of 0 and then 1\n", 305 | "\n" 306 | ] 307 | }, 308 | { 309 | 
"cell_type": "code", 310 | "execution_count": 78, 311 | "id": "779bc740-e0f5-4768-b137-027c85fd9c60", 312 | "metadata": {}, 313 | "outputs": [ 314 | { 315 | "data": { 316 | "text/plain": [ 317 | "array([0.8925775 , 0.10266326, 0.04501951, 0.02937193, 0.217319 ,\n", 318 | " 0.0684156 ])" 319 | ] 320 | }, 321 | "execution_count": 78, 322 | "metadata": {}, 323 | "output_type": "execute_result" 324 | } 325 | ], 326 | "source": [] 327 | }, 328 | { 329 | "cell_type": "markdown", 330 | "id": "92bf3c64-795f-48a0-8269-68644690ad55", 331 | "metadata": {}, 332 | "source": [ 333 | "#### Coefficient (m)" 334 | ] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": 65, 339 | "id": "85aa88e3-54a1-40ec-8586-346ada059764", 340 | "metadata": {}, 341 | "outputs": [ 342 | { 343 | "data": { 344 | "text/plain": [ 345 | "array([[0.14776968]])" 346 | ] 347 | }, 348 | "execution_count": 65, 349 | "metadata": {}, 350 | "output_type": "execute_result" 351 | } 352 | ], 353 | "source": [ 354 | "model.coef_" 355 | ] 356 | }, 357 | { 358 | "cell_type": "markdown", 359 | "id": "1d7284e5-f019-4179-bee0-92a8c4e5e800", 360 | "metadata": {}, 361 | "source": [ 362 | "#### Intercept (b)" 363 | ] 364 | }, 365 | { 366 | "cell_type": "code", 367 | "execution_count": 66, 368 | "id": "a94df50a-878a-4827-9416-25c64bd3231c", 369 | "metadata": {}, 370 | "outputs": [ 371 | { 372 | "data": { 373 | "text/plain": [ 374 | "array([-5.51604619])" 375 | ] 376 | }, 377 | "execution_count": 66, 378 | "metadata": {}, 379 | "output_type": "execute_result" 380 | } 381 | ], 382 | "source": [ 383 | "model.intercept_" 384 | ] 385 | }, 386 | { 387 | "cell_type": "markdown", 388 | "id": "fc41e137-0011-4c86-b465-2c0a2cc92642", 389 | "metadata": {}, 390 | "source": [ 391 | "#### Score" 392 | ] 393 | }, 394 | { 395 | "cell_type": "code", 396 | "execution_count": 67, 397 | "id": "a0784512-112f-489c-bf49-a4bf8712345d", 398 | "metadata": {}, 399 | "outputs": [ 400 | { 401 | "data": { 402 | "text/plain": [ 403 | 
"0.8333333333333334" 404 | ] 405 | }, 406 | "execution_count": 67, 407 | "metadata": {}, 408 | "output_type": "execute_result" 409 | } 410 | ], 411 | "source": [ 412 | "model.score(X_test,y_test)" 413 | ] 414 | }, 415 | { 416 | "cell_type": "code", 417 | "execution_count": 68, 418 | "id": "a83f5ebc-6a85-4c29-81bb-042a4d2c7186", 419 | "metadata": {}, 420 | "outputs": [ 421 | { 422 | "data": { 423 | "text/plain": [ 424 | "0.9047619047619048" 425 | ] 426 | }, 427 | "execution_count": 68, 428 | "metadata": {}, 429 | "output_type": "execute_result" 430 | } 431 | ], 432 | "source": [ 433 | "model.score(X_train,y_train)" 434 | ] 435 | }, 436 | { 437 | "cell_type": "markdown", 438 | "id": "b1d88203-cfc5-4276-afc6-2d21717dac71", 439 | "metadata": {}, 440 | "source": [ 441 | "#### Logistic regression plot" 442 | ] 443 | }, 444 | { 445 | "cell_type": "code", 446 | "execution_count": 75, 447 | "id": "6418ed1c-ceb8-4124-bd0b-ba18bd7aa69e", 448 | "metadata": {}, 449 | "outputs": [ 450 | { 451 | "name": "stderr", 452 | "output_type": "stream", 453 | "text": [ 454 | "/opt/anaconda3/envs/jenv/lib/python3.9/site-packages/seaborn/_decorators.py:36: FutureWarning: Pass the following variables as keyword args: x, y. From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.\n", 455 | " warnings.warn(\n" 456 | ] 457 | }, 458 | { 459 | "data": { 460 | "text/plain": [ 461 | "" 462 | ] 463 | }, 464 | "execution_count": 75, 465 | "metadata": {}, 466 | "output_type": "execute_result" 467 | }, 468 | { 469 | "data": { 470 | "image/png": "\n", 471 | "text/plain": [ 472 | "
" 473 | ] 474 | }, 475 | "metadata": { 476 | "needs_background": "light" 477 | }, 478 | "output_type": "display_data" 479 | } 480 | ], 481 | "source": [ 482 | "\n", 483 | "plt.scatter(df.age,df.bought_insurance)\n", 484 | "sns.lineplot(df.age,model.predict(df[['age']]))" 485 | ] 486 | }, 487 | { 488 | "cell_type": "markdown", 489 | "id": "4e8c78be-b39a-4a40-8186-c92ebe5b596e", 490 | "metadata": {}, 491 | "source": [ 492 | "#### Probability curve" 493 | ] 494 | }, 495 | { 496 | "cell_type": "code", 497 | "execution_count": 88, 498 | "id": "6e1fc4ed-2dcf-4a56-8c7b-6219990b8d44", 499 | "metadata": {}, 500 | "outputs": [ 501 | { 502 | "name": "stderr", 503 | "output_type": "stream", 504 | "text": [ 505 | "/opt/anaconda3/envs/jenv/lib/python3.9/site-packages/seaborn/_decorators.py:36: FutureWarning: Pass the following variables as keyword args: x, y. From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.\n", 506 | " warnings.warn(\n" 507 | ] 508 | }, 509 | { 510 | "data": { 511 | "text/plain": [ 512 | "" 513 | ] 514 | }, 515 | "execution_count": 88, 516 | "metadata": {}, 517 | "output_type": "execute_result" 518 | }, 519 | { 520 | "data": { 521 | "image/png": "\n", 522 | "text/plain": [ 523 | "
" 524 | ] 525 | }, 526 | "metadata": { 527 | "needs_background": "light" 528 | }, 529 | "output_type": "display_data" 530 | } 531 | ], 532 | "source": [ 533 | "plt.scatter(df.age,df.bought_insurance)\n", 534 | "# lineplot sorts the points by x (age) before drawing the line\n", 535 | "# plot age vs. predicted probability of class 1\n", 536 | "sns.lineplot(df.age,model.predict_proba(df[['age']])[:,1])" 537 | ] 538 | }, 539 | { 540 | "cell_type": "markdown", 541 | "id": "4e968b38-792a-4037-80a2-f4c8223aa092", 542 | "metadata": {}, 543 | "source": [ 544 | "#### Define the function manually" 545 | ] 546 | }, 547 | { 548 | "cell_type": "code", 549 | "execution_count": 98, 550 | "id": "ec2b055a-4140-4a4d-bd03-b961466ee5b9", 551 | "metadata": {}, 552 | "outputs": [], 553 | "source": [ 554 | "import math\n", 555 | "def sigmoid(z):\n", 556 | " return (1/(1+math.exp(-z)))" 557 | ] 558 | }, 559 | { 560 | "cell_type": "code", 561 | "execution_count": 112, 562 | "id": "b9ad4eeb-82aa-4a86-9c0b-6d33f6ddcad0", 563 | "metadata": {}, 564 | "outputs": [], 565 | "source": [ 566 | "def predicts(age):\n", 567 | " z = age*0.14776968-5.51604619\n", 568 | " y = sigmoid(z)\n", 569 | " if y<=.5: return 0 \n", 570 | " else: return 1" 571 | ] 572 | }, 573 | { 574 | "cell_type": "code", 575 | "execution_count": 127, 576 | "id": "1d56fce8-9439-4f45-a31b-f630774d411e", 577 | "metadata": {}, 578 | "outputs": [ 579 | { 580 | "data": { 581 | "text/plain": [ 582 | "1" 583 | ] 584 | }, 585 | "execution_count": 127, 586 | "metadata": {}, 587 | "output_type": "execute_result" 588 | } 589 | ], 590 | "source": [ 591 | "\n", 592 | "predicts(37.5)" 593 | ] 594 | }, 595 | { 596 | "cell_type": "code", 597 | "execution_count": 126, 598 | "id": "00ce06ab-bdd5-4aae-b7d2-76c843d426c4", 599 | "metadata": {}, 600 | "outputs": [ 601 | { 602 | "name": "stderr", 603 | "output_type": "stream", 604 | "text": [ 605 | "/opt/anaconda3/envs/jenv/lib/python3.9/site-packages/sklearn/base.py:450: UserWarning: X does not have valid feature 
names, but LogisticRegression was fitted with feature names\n", 606 | " warnings.warn(\n" 607 | ] 608 | }, 609 | { 610 | "data": { 611 | "text/plain": [ 612 | "array([1])" 613 | ] 614 | }, 615 | "execution_count": 126, 616 | "metadata": {}, 617 | "output_type": "execute_result" 618 | } 619 | ], 620 | "source": [ 621 | "model.predict([[37.5]])" 622 | ] 623 | }, 624 | { 625 | "cell_type": "code", 626 | "execution_count": null, 627 | "id": "dc6f3bb2-eafb-49c9-b8c5-191f1e56413c", 628 | "metadata": {}, 629 | "outputs": [], 630 | "source": [] 631 | }, 632 | { 633 | "cell_type": "code", 634 | "execution_count": null, 635 | "id": "0a23e2a7-8c34-4f7a-8072-9d318a482ea7", 636 | "metadata": {}, 637 | "outputs": [], 638 | "source": [] 639 | } 640 | ], 641 | "metadata": { 642 | "kernelspec": { 643 | "display_name": "Python 3 (ipykernel)", 644 | "language": "python", 645 | "name": "python3" 646 | }, 647 | "language_info": { 648 | "codemirror_mode": { 649 | "name": "ipython", 650 | "version": 3 651 | }, 652 | "file_extension": ".py", 653 | "mimetype": "text/x-python", 654 | "name": "python", 655 | "nbconvert_exporter": "python", 656 | "pygments_lexer": "ipython3", 657 | "version": "3.9.12" 658 | } 659 | }, 660 | "nbformat": 4, 661 | "nbformat_minor": 5 662 | } 663 | -------------------------------------------------------------------------------- /LogisticRegression/insurance_data.csv: -------------------------------------------------------------------------------- 1 | age,bought_insurance 2 | 22,0 3 | 25,0 4 | 47,1 5 | 52,0 6 | 46,1 7 | 56,1 8 | 55,0 9 | 60,1 10 | 62,1 11 | 61,1 12 | 18,0 13 | 28,0 14 | 27,0 15 | 29,0 16 | 49,1 17 | 55,1 18 | 25,1 19 | 58,1 20 | 19,0 21 | 18,0 22 | 21,0 23 | 26,0 24 | 40,1 25 | 45,1 26 | 50,1 27 | 54,1 28 | 23,0 -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # DataScience-ML 2 | 3 | *Here I will upload my data science 
and machine learning journey* 4 | 5 | You can find the algorithms and some exercises in the respective folders 6 | --------------------------------------------------------------------------------