├── Exploratory_Data_Analysis.ipynb
├── Images
    ├── 2in1.png
    ├── Accu.png
    ├── Accu2.png
    ├── Final dataset.PNG
    ├── Imbalance.png
    ├── Imbalance2.PNG
    ├── Jupyter_logo.png
    ├── Numpy_logo.png
    ├── Pandas_logo.png
    ├── Python_logo.png
    ├── RF.PNG
    ├── ROC.png
    ├── RP.png
    ├── Seaborn_logo.png
    ├── ab.PNG
    ├── abgb.PNG
    ├── contingency_table1.PNG
    ├── contingency_table2.PNG
    ├── contingency_table3.PNG
    ├── corr.png
    ├── corr1.png
    ├── corr11.PNG
    ├── corr111.png
    ├── corr2.PNG
    ├── corr22.png
    ├── corr222.png
    ├── corrplot.png
    ├── df.info.PNG
    ├── fet imp 2.PNG
    ├── fet imp.PNG
    ├── gb.PNG
    ├── knn.PNG
    ├── log reg.PNG
    ├── matplotlib_logo.png
    ├── null column.PNG
    ├── nulls.PNG
    ├── others.png
    ├── repeat.PNG
    ├── rfab.PNG
    └── svc.PNG
├── ML_backorders.ipynb
├── Preliminary_Data_Analysis.ipynb
├── README.md
├── Sample work report.pdf
└── Wrangling_of_data.ipynb

/Images/2in1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/2in1.png
--------------------------------------------------------------------------------
/Images/Accu.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/Accu.png
--------------------------------------------------------------------------------
/Images/Accu2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/Accu2.png
--------------------------------------------------------------------------------
/Images/Final dataset.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/Final dataset.PNG -------------------------------------------------------------------------------- /Images/Imbalance.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/Imbalance.png -------------------------------------------------------------------------------- /Images/Imbalance2.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/Imbalance2.PNG -------------------------------------------------------------------------------- /Images/Jupyter_logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/Jupyter_logo.png -------------------------------------------------------------------------------- /Images/Numpy_logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/Numpy_logo.png -------------------------------------------------------------------------------- /Images/Pandas_logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/Pandas_logo.png -------------------------------------------------------------------------------- /Images/Python_logo.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/Python_logo.png -------------------------------------------------------------------------------- /Images/RF.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/RF.PNG -------------------------------------------------------------------------------- /Images/ROC.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/ROC.png -------------------------------------------------------------------------------- /Images/RP.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/RP.png -------------------------------------------------------------------------------- /Images/Seaborn_logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/Seaborn_logo.png -------------------------------------------------------------------------------- /Images/ab.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/ab.PNG -------------------------------------------------------------------------------- /Images/abgb.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/abgb.PNG 
-------------------------------------------------------------------------------- /Images/contingency_table1.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/contingency_table1.PNG -------------------------------------------------------------------------------- /Images/contingency_table2.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/contingency_table2.PNG -------------------------------------------------------------------------------- /Images/contingency_table3.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/contingency_table3.PNG -------------------------------------------------------------------------------- /Images/corr.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/corr.png -------------------------------------------------------------------------------- /Images/corr1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/corr1.png -------------------------------------------------------------------------------- /Images/corr11.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/corr11.PNG 
-------------------------------------------------------------------------------- /Images/corr111.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/corr111.png -------------------------------------------------------------------------------- /Images/corr2.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/corr2.PNG -------------------------------------------------------------------------------- /Images/corr22.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/corr22.png -------------------------------------------------------------------------------- /Images/corr222.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/corr222.png -------------------------------------------------------------------------------- /Images/corrplot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/corrplot.png -------------------------------------------------------------------------------- /Images/df.info.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/df.info.PNG -------------------------------------------------------------------------------- /Images/fet imp 2.PNG: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/fet imp 2.PNG -------------------------------------------------------------------------------- /Images/fet imp.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/fet imp.PNG -------------------------------------------------------------------------------- /Images/gb.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/gb.PNG -------------------------------------------------------------------------------- /Images/knn.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/knn.PNG -------------------------------------------------------------------------------- /Images/log reg.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/log reg.PNG -------------------------------------------------------------------------------- /Images/matplotlib_logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/matplotlib_logo.png -------------------------------------------------------------------------------- /Images/null column.PNG: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/null column.PNG -------------------------------------------------------------------------------- /Images/nulls.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/nulls.PNG -------------------------------------------------------------------------------- /Images/others.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/others.png -------------------------------------------------------------------------------- /Images/repeat.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/repeat.PNG -------------------------------------------------------------------------------- /Images/rfab.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/rfab.PNG -------------------------------------------------------------------------------- /Images/svc.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Images/svc.PNG -------------------------------------------------------------------------------- /Preliminary_Data_Analysis.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 
Preliminary Data Analysis with Sample Data" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 48, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "#Importing the packages\n", 17 | "import pandas as pd\n", 18 | "import numpy as np\n", 19 | "import matplotlib.pyplot as plt\n", 20 | "import seaborn as sns\n", 21 | "from sklearn.model_selection import train_test_split\n", 22 | "from sklearn.linear_model import LogisticRegression\n", 23 | "from sklearn.metrics import classification_report\n", 24 | "from sklearn.metrics import accuracy_score\n", 25 | "from sklearn.metrics import roc_curve\n", 26 | "from sklearn.model_selection import cross_val_score\n", 27 | "from sklearn.model_selection import GridSearchCV\n", 28 | "from sklearn import svm\n", 29 | "from sklearn.model_selection import RandomizedSearchCV\n", 30 | "from sklearn.ensemble import RandomForestClassifier\n", 31 | "from sklearn.ensemble import GradientBoostingClassifier\n", 32 | "from sklearn.ensemble import AdaBoostClassifier\n", 33 | "from scipy import stats as st\n", 34 | "from sklearn.neighbors import KNeighborsClassifier\n", 35 | "from sklearn.preprocessing import StandardScaler\n", 36 | "from sklearn.pipeline import Pipeline\n", 37 | "from sklearn.metrics import precision_score\n", 38 | "from sklearn.metrics import average_precision_score\n", 39 | "from sklearn.metrics import precision_recall_curve\n", 40 | "%matplotlib inline" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "As part of the preliminary data analysis, we work on a sample of the data rather than the full dataset.\n", 48 | "\n", 49 | "1. Within the sample, we split the data into train and test sets.\n", 50 | "\n", 51 | "2. We validate the models on the sample before working on the full dataset.\n", 52 | "\n", 53 | "3. 
The precision of the models is validated through the precision-recall curve (precision_recall_curve).\n" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 51, 59 | "metadata": {}, 60 | "outputs": [], 61 | "source": [ 62 | "#Reading the clean file\n", 63 | "masterData = pd.read_csv('Backorder.csv')" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 52, 69 | "metadata": {}, 70 | "outputs": [ 71 | { 72 | "data": { 73 | "image/png": "\n", 74 | "text/plain": [ 75 | "
" 76 | ] 77 | }, 78 | "metadata": { 79 | "needs_background": "light" 80 | }, 81 | "output_type": "display_data" 82 | } 83 | ], 84 | "source": [ 85 | "plt.figure(figsize=(12,8))\n", 86 | "ax=sns.histplot(data=masterData,x='went_on_backorder',color='lightseagreen')\n", 87 | "ax.figure.savefig('Mean.png', dpi=500,bbox_inches='tight')" 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": 53, 93 | "metadata": {}, 94 | "outputs": [ 95 | { 96 | "data": { 97 | "text/plain": [ 98 | "No 1628546\n", 99 | "Yes 11188\n", 100 | "Name: went_on_backorder, dtype: int64" 101 | ] 102 | }, 103 | "execution_count": 53, 104 | "metadata": {}, 105 | "output_type": "execute_result" 106 | } 107 | ], 108 | "source": [ 109 | "masterData['went_on_backorder'].value_counts()" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": 54, 115 | "metadata": {}, 116 | "outputs": [ 117 | { 118 | "data": { 119 | "text/html": [ 120 | "
[HTML rendering of the DataFrame: 22188 rows × 17 columns; the same table follows in the text/plain output]
" 382 | ], 383 | "text/plain": [ 384 | " product_id current_inventory transit_duration prior_sales_1_month \\\n", 385 | "0 2231849 0.0 2.0 0.0 \n", 386 | "1 1884556 13.0 8.0 7.0 \n", 387 | "2 1827395 0.0 2.0 0.0 \n", 388 | "3 3275515 3.0 8.0 3.0 \n", 389 | "4 1988945 0.0 2.0 0.0 \n", 390 | "... ... ... ... ... \n", 391 | "22183 2297462 0.0 12.0 0.0 \n", 392 | "22184 1830221 0.0 8.0 0.0 \n", 393 | "22185 2088975 0.0 9.0 2.0 \n", 394 | "22186 1129829 16.0 9.0 0.0 \n", 395 | "22187 3104959 -11.0 2.0 343.0 \n", 396 | "\n", 397 | " prior_sales_3_month prior_sales_6_month prior_sales_9_month \\\n", 398 | "0 4.0 12.0 23.0 \n", 399 | "1 16.0 26.0 35.0 \n", 400 | "2 0.0 0.0 0.0 \n", 401 | "3 8.0 15.0 23.0 \n", 402 | "4 0.0 1.0 1.0 \n", 403 | "... ... ... ... \n", 404 | "22183 2.0 4.0 10.0 \n", 405 | "22184 0.0 0.0 0.0 \n", 406 | "22185 3.0 6.0 7.0 \n", 407 | "22186 7.0 7.0 8.0 \n", 408 | "22187 664.0 779.0 814.0 \n", 409 | "\n", 410 | " minimum_recommend_stock source_has_issue source_performance_6_months \\\n", 411 | "0 2.0 No 0.00 \n", 412 | "1 2.0 No 0.67 \n", 413 | "2 0.0 No 0.00 \n", 414 | "3 4.0 No 0.83 \n", 415 | "4 0.0 No 0.83 \n", 416 | "... ... ... ... \n", 417 | "22183 0.0 No 0.73 \n", 418 | "22184 0.0 No 0.98 \n", 419 | "22185 0.0 No 0.70 \n", 420 | "22186 0.0 No 0.69 \n", 421 | "22187 89.0 No 0.73 \n", 422 | "\n", 423 | " source_performance_12_months deck_risk oe_constraint ppap_risk \\\n", 424 | "0 0.00 No No No \n", 425 | "1 0.39 No No No \n", 426 | "2 0.00 Yes No Yes \n", 427 | "3 0.85 No No No \n", 428 | "4 0.86 Yes No No \n", 429 | "... ... ... ... ... \n", 430 | "22183 0.76 No No No \n", 431 | "22184 0.98 Yes No Yes \n", 432 | "22185 0.66 No No No \n", 433 | "22186 0.68 No No Yes \n", 434 | "22187 0.70 No No No \n", 435 | "\n", 436 | " stop_auto_buy rev_stop went_on_backorder \n", 437 | "0 Yes No Yes \n", 438 | "1 Yes No Yes \n", 439 | "2 Yes No Yes \n", 440 | "3 Yes No No \n", 441 | "4 No No Yes \n", 442 | "... ... ... ... 
\n", 443 | "22183 Yes No Yes \n", 444 | "22184 Yes No Yes \n", 445 | "22185 Yes No Yes \n", 446 | "22186 Yes No No \n", 447 | "22187 Yes No Yes \n", 448 | "\n", 449 | "[22188 rows x 17 columns]" 450 | ] 451 | }, 452 | "execution_count": 54, 453 | "metadata": {}, 454 | "output_type": "execute_result" 455 | } 456 | ], 457 | "source": [ 458 | "join1=masterData[masterData['went_on_backorder']=='Yes']\n", 459 | "join2=masterData[masterData['went_on_backorder']=='No']\n", 460 | "join3=join2.sample(n=11000, random_state = 2)\n", 461 | "#Undersample the majority class: all 11188 'Yes' rows plus 11000 sampled 'No' rows\n", 462 | "masterData=pd.concat([join1,join3],ignore_index=True)\n", 463 | "masterData=masterData.sample(frac=1).reset_index(drop=True)\n", 464 | "masterData" 465 | ] 466 | }, 467 | { 468 | "cell_type": "code", 469 | "execution_count": 55, 470 | "metadata": {}, 471 | "outputs": [ 472 | { 473 | "data": { 474 | "text/plain": [ 475 | "((22188, 17), (22188,))" 476 | ] 477 | }, 478 | "execution_count": 55, 479 | "metadata": {}, 480 | "output_type": "execute_result" 481 | } 482 | ], 483 | "source": [ 484 | "#Preparing the data for modeling\n", 485 | "#Copy the frame so the column assignments below do not modify masterData\n", 486 | "X = masterData.copy()\n", 487 | "y = masterData['went_on_backorder']\n", 488 | "X.shape, y.shape" 489 | ] 490 | }, 491 | { 492 | "cell_type": "code", 493 | "execution_count": 56, 494 | "metadata": {}, 495 | "outputs": [ 496 | { 497 | "data": { 498 | "text/plain": [ 499 | "((22188, 10), (22188,))" 500 | ] 501 | }, 502 | "execution_count": 56, 503 | "metadata": {}, 504 | "output_type": "execute_result" 505 | } 506 | ], 507 | "source": [ 508 | "#Convert the categorical data into numerical\n", 509 | "y = y.replace('Yes', 1)\n", 510 | "y = y.replace('No', 0)\n", 511 | "X['source_has_issue'] = X.source_has_issue.map({'Yes':1, 'No':0})\n", 512 | "X['deck_risk'] = X.deck_risk.map({'Yes':1, 'No':0})\n", 513 | "X['oe_constraint'] = X.oe_constraint.map({'Yes':1, 'No':0})\n", 514 | "X['ppap_risk'] = X.ppap_risk.map({'Yes':1, 'No':0})\n", 515 | 
"X['stop_auto_buy'] = X.stop_auto_buy.map({'Yes':1, 'No':0})\n", 516 | "X['rev_stop'] = X.rev_stop.map({'Yes':1, 'No':0})\n", 517 | "X = X.drop(['went_on_backorder', 'source_has_issue', 'rev_stop', 'deck_risk', 'ppap_risk', 'stop_auto_buy','product_id'], axis = 1, errors = 'ignore')\n", 518 | "X.shape, y.shape" 519 | ] 520 | }, 521 | { 522 | "cell_type": "code", 523 | "execution_count": 57, 524 | "metadata": {}, 525 | "outputs": [ 526 | { 527 | "data": { 528 | "text/plain": [ 529 | "((16641, 10), (16641,), (5547, 10), (5547,))" 530 | ] 531 | }, 532 | "execution_count": 57, 533 | "metadata": {}, 534 | "output_type": "execute_result" 535 | } 536 | ], 537 | "source": [ 538 | "#get the 10% of the data\n", 539 | "X_sample, X_data, y_sample, y_data = train_test_split(X, y, test_size = 0.25, random_state = 42)\n", 540 | "X_sample.shape, y_sample.shape, X_data.shape, y_data.shape" 541 | ] 542 | }, 543 | { 544 | "cell_type": "code", 545 | "execution_count": 58, 546 | "metadata": {}, 547 | "outputs": [ 548 | { 549 | "data": { 550 | "text/plain": [ 551 | "((11648, 10), (4993, 10), (11648,), (4993,))" 552 | ] 553 | }, 554 | "execution_count": 58, 555 | "metadata": {}, 556 | "output_type": "execute_result" 557 | } 558 | ], 559 | "source": [ 560 | "X_train, X_test, y_train, y_test = train_test_split(X_sample, y_sample, test_size = 0.30, random_state = 42)\n", 561 | "X_train.shape, X_test.shape, y_train.shape, y_test.shape" 562 | ] 563 | }, 564 | { 565 | "cell_type": "markdown", 566 | "metadata": {}, 567 | "source": [ 568 | "# Training the KNN Model Classifier" 569 | ] 570 | }, 571 | { 572 | "cell_type": "code", 573 | "execution_count": 59, 574 | "metadata": {}, 575 | "outputs": [ 576 | { 577 | "name": "stdout", 578 | "output_type": "stream", 579 | "text": [ 580 | " precision recall f1-score support\n", 581 | "\n", 582 | " 0 0.75 0.70 0.72 2484\n", 583 | " 1 0.72 0.77 0.75 2509\n", 584 | "\n", 585 | " accuracy 0.74 4993\n", 586 | " macro avg 0.74 0.73 0.73 4993\n", 587 | 
"weighted avg 0.74 0.74 0.73 4993\n", 588 | "\n" 589 | ] 590 | } 591 | ], 592 | "source": [ 593 | "steps = [('scaler', StandardScaler()),\n", 594 | " ('knn', KNeighborsClassifier(n_jobs = -1))]\n", 595 | "#Create the pipeline: pipeline\n", 596 | "pipeline_knn = Pipeline(steps)\n", 597 | "\n", 598 | "# Fit to the training set\n", 599 | "pipeline_knn.fit(X_train, y_train)\n", 600 | "knn_prediction = pipeline_knn.predict( X_test )\n", 601 | "# get the classification report\n", 602 | "knn_report = classification_report( y_test, knn_prediction )\n", 603 | "#print the report\n", 604 | "print(knn_report)" 605 | ] 606 | }, 607 | { 608 | "cell_type": "markdown", 609 | "metadata": {}, 610 | "source": [ 611 | "# Training the SVC Model Classifier" 612 | ] 613 | }, 614 | { 615 | "cell_type": "code", 616 | "execution_count": 60, 617 | "metadata": {}, 618 | "outputs": [ 619 | { 620 | "name": "stdout", 621 | "output_type": "stream", 622 | "text": [ 623 | " precision recall f1-score support\n", 624 | "\n", 625 | " 0 0.59 0.70 0.64 2484\n", 626 | " 1 0.63 0.51 0.57 2509\n", 627 | "\n", 628 | " accuracy 0.61 4993\n", 629 | " macro avg 0.61 0.61 0.60 4993\n", 630 | "weighted avg 0.61 0.61 0.60 4993\n", 631 | "\n" 632 | ] 633 | } 634 | ], 635 | "source": [ 636 | "steps = [('scaler', StandardScaler()),\n", 637 | " ('svc', svm.SVC(class_weight = 'balanced'))]\n", 638 | "#Create the pipeline: pipeline\n", 639 | "pipeline_svc = Pipeline(steps)\n", 640 | "\n", 641 | "# Fit to the training set\n", 642 | "pipeline_svc.fit(X_train, y_train)\n", 643 | "svc_prediction = pipeline_svc.predict( X_test )\n", 644 | "# get the classification report\n", 645 | "svc_report = classification_report( y_test, svc_prediction )\n", 646 | "#print the report\n", 647 | "print(svc_report)" 648 | ] 649 | }, 650 | { 651 | "cell_type": "markdown", 652 | "metadata": {}, 653 | "source": [ 654 | "# Training the LogisticRegression Model " 655 | ] 656 | }, 657 | { 658 | "cell_type": "code", 659 | "execution_count": 61, 
660 | "metadata": {}, 661 | "outputs": [ 662 | { 663 | "name": "stdout", 664 | "output_type": "stream", 665 | "text": [ 666 | " precision recall f1-score support\n", 667 | "\n", 668 | " 0 0.60 0.65 0.62 2484\n", 669 | " 1 0.62 0.58 0.60 2509\n", 670 | "\n", 671 | " accuracy 0.61 4993\n", 672 | " macro avg 0.61 0.61 0.61 4993\n", 673 | "weighted avg 0.61 0.61 0.61 4993\n", 674 | "\n" 675 | ] 676 | } 677 | ], 678 | "source": [ 679 | "steps = [('scaler', StandardScaler()),\n", 680 | " ('logreg', LogisticRegression(class_weight = 'balanced'))]\n", 681 | "#Create the pipeline: pipeline\n", 682 | "pipeline_logreg = Pipeline(steps)\n", 683 | "\n", 684 | "# Fit to the training set\n", 685 | "pipeline_logreg.fit(X_train, y_train)\n", 686 | "logreg_prediction = pipeline_logreg.predict( X_test )\n", 687 | "# get the classification report\n", 688 | "logreg_report = classification_report( y_test, logreg_prediction )\n", 689 | "#print the report\n", 690 | "print(logreg_report)" 691 | ] 692 | }, 693 | { 694 | "cell_type": "markdown", 695 | "metadata": {}, 696 | "source": [ 697 | "# Training the RandomForest Classifier" 698 | ] 699 | }, 700 | { 701 | "cell_type": "code", 702 | "execution_count": 62, 703 | "metadata": {}, 704 | "outputs": [ 705 | { 706 | "name": "stdout", 707 | "output_type": "stream", 708 | "text": [ 709 | " precision recall f1-score support\n", 710 | "\n", 711 | " 0 0.91 0.86 0.89 2484\n", 712 | " 1 0.87 0.92 0.89 2509\n", 713 | "\n", 714 | " accuracy 0.89 4993\n", 715 | " macro avg 0.89 0.89 0.89 4993\n", 716 | "weighted avg 0.89 0.89 0.89 4993\n", 717 | "\n" 718 | ] 719 | } 720 | ], 721 | "source": [ 722 | "#Train default RandomForest on 30% of the data\n", 723 | "rfmodel = RandomForestClassifier(random_state = 42)\n", 724 | "#Fit the training set\n", 725 | "rfmodel.fit(X_train, y_train)\n", 726 | "rfmodel_prediction = rfmodel.predict(X_test)\n", 727 | "#get the classification report\n", 728 | "rfmodel_report = classification_report(y_test, 
rfmodel_prediction)\n", 729 | "#print the report\n", 730 | "print(rfmodel_report)" 731 | ] 732 | }, 733 | { 734 | "cell_type": "markdown", 735 | "metadata": {}, 736 | "source": [ 737 | "# Training GradientBoosting Classifier model" 738 | ] 739 | }, 740 | { 741 | "cell_type": "code", 742 | "execution_count": 63, 743 | "metadata": {}, 744 | "outputs": [ 745 | { 746 | "name": "stdout", 747 | "output_type": "stream", 748 | "text": [ 749 | " precision recall f1-score support\n", 750 | "\n", 751 | " 0 0.88 0.85 0.86 2484\n", 752 | " 1 0.85 0.88 0.87 2509\n", 753 | "\n", 754 | " accuracy 0.87 4993\n", 755 | " macro avg 0.87 0.87 0.87 4993\n", 756 | "weighted avg 0.87 0.87 0.87 4993\n", 757 | "\n" 758 | ] 759 | } 760 | ], 761 | "source": [ 762 | "#Train default GradientBoosting on the training split\n", 763 | "gbmodel = GradientBoostingClassifier(random_state = 42)\n", 764 | "#Fit the training set\n", 765 | "gbmodel.fit(X_train, y_train)\n", 766 | "gbmodel_prediction = gbmodel.predict(X_test)\n", 767 | "#get the classification report\n", 768 | "gbmodel_report = classification_report(y_test, gbmodel_prediction)\n", 769 | "#print the report\n", 770 | "print(gbmodel_report)" 771 | ] 772 | }, 773 | { 774 | "cell_type": "markdown", 775 | "metadata": {}, 776 | "source": [ 777 | "# Training AdaBoostClassifier Model" 778 | ] 779 | }, 780 | { 781 | "cell_type": "code", 782 | "execution_count": 64, 783 | "metadata": {}, 784 | "outputs": [ 785 | { 786 | "name": "stdout", 787 | "output_type": "stream", 788 | "text": [ 789 | " precision recall f1-score support\n", 790 | "\n", 791 | " 0 0.87 0.82 0.84 2484\n", 792 | " 1 0.83 0.87 0.85 2509\n", 793 | "\n", 794 | " accuracy 0.85 4993\n", 795 | " macro avg 0.85 0.85 0.85 4993\n", 796 | "weighted avg 0.85 0.85 0.85 4993\n", 797 | "\n" 798 | ] 799 | } 800 | ], 801 | "source": [ 802 | "#Train default AdaBoost on the training split\n", 803 | "abmodel = AdaBoostClassifier(random_state = 42)\n", 804 | "#Fit the training set\n", 805 | 
"abmodel.fit(X_train, y_train)\n", 806 | "abmodel_prediction = abmodel.predict(X_test)\n", 807 | "#get the classification report\n", 808 | "abmodel_report = classification_report(y_test, abmodel_prediction)\n", 809 | "#print the report\n", 810 | "print(abmodel_report)" 811 | ] 812 | }, 813 | { 814 | "cell_type": "markdown", 815 | "metadata": {}, 816 | "source": [ 817 | "# Training the AdaBoostClassifier with RandomForest Estimator" 818 | ] 819 | }, 820 | { 821 | "cell_type": "code", 822 | "execution_count": 65, 823 | "metadata": {}, 824 | "outputs": [ 825 | { 826 | "name": "stdout", 827 | "output_type": "stream", 828 | "text": [ 829 | " precision recall f1-score support\n", 830 | "\n", 831 | " 0 0.92 0.86 0.89 2484\n", 832 | " 1 0.87 0.92 0.90 2509\n", 833 | "\n", 834 | " accuracy 0.89 4993\n", 835 | " macro avg 0.90 0.89 0.89 4993\n", 836 | "weighted avg 0.90 0.89 0.89 4993\n", 837 | "\n" 838 | ] 839 | } 840 | ], 841 | "source": [ 842 | "#Train default AdaBoost model with the RF estimator\n", 843 | "abmodel_rf = AdaBoostClassifier(base_estimator = rfmodel, random_state = 42)\n", 844 | "#Fit the training set\n", 845 | "abmodel_rf.fit(X_train, y_train)\n", 846 | "abmodel_prediction_rf = abmodel_rf.predict(X_test)\n", 847 | "#get the classification report\n", 848 | "abmodel_report_rf = classification_report(y_test, abmodel_prediction_rf)\n", 849 | "#print the report\n", 850 | "print(abmodel_report_rf)" 851 | ] 852 | }, 853 | { 854 | "cell_type": "code", 855 | "execution_count": 43, 856 | "metadata": {}, 857 | "outputs": [ 858 | { 859 | "data": { 860 | "text/plain": [ 861 | "current_inventory 0.409697\n", 862 | "prior_sales_6_month 0.091196\n", 863 | "source_performance_6_months 0.086669\n", 864 | "source_performance_12_months 0.086269\n", 865 | "prior_sales_3_month 0.084116\n", 866 | "prior_sales_1_month 0.073360\n", 867 | "prior_sales_9_month 0.072090\n", 868 | "minimum_recommend_stock 0.049787\n", 869 | "transit_duration 0.046733\n", 870 | "oe_constraint 0.000082\n", 
871 | "dtype: float64" 872 | ] 873 | }, 874 | "execution_count": 43, 875 | "metadata": {}, 876 | "output_type": "execute_result" 877 | } 878 | ], 879 | "source": [ 880 | "feature_import_rf = pd.Series(rfmodel.feature_importances_, index = X_test.columns)\n", 881 | "feature_import_rf.sort_values(ascending = False)\n" 882 | ] 883 | }, 884 | { 885 | "cell_type": "code", 886 | "execution_count": 44, 887 | "metadata": {}, 888 | "outputs": [ 889 | { 890 | "data": { 891 | "text/plain": [ 892 | "current_inventory 0.628698\n", 893 | "prior_sales_1_month 0.125767\n", 894 | "prior_sales_3_month 0.087195\n", 895 | "prior_sales_6_month 0.061263\n", 896 | "prior_sales_9_month 0.038240\n", 897 | "transit_duration 0.026493\n", 898 | "source_performance_6_months 0.015331\n", 899 | "source_performance_12_months 0.009316\n", 900 | "minimum_recommend_stock 0.007697\n", 901 | "oe_constraint 0.000000\n", 902 | "dtype: float64" 903 | ] 904 | }, 905 | "execution_count": 44, 906 | "metadata": {}, 907 | "output_type": "execute_result" 908 | } 909 | ], 910 | "source": [ 911 | "feature_import_gb = pd.Series(gbmodel.feature_importances_, index = X_test.columns)\n", 912 | "feature_import_gb.sort_values(ascending = False)\n" 913 | ] 914 | }, 915 | { 916 | "cell_type": "markdown", 917 | "metadata": {}, 918 | "source": [ 919 | "# Training the AdaBoostClassifier with GradientBoosting Estimator" 920 | ] 921 | }, 922 | { 923 | "cell_type": "code", 924 | "execution_count": 45, 925 | "metadata": {}, 926 | "outputs": [ 927 | { 928 | "name": "stdout", 929 | "output_type": "stream", 930 | "text": [ 931 | " precision recall f1-score support\n", 932 | "\n", 933 | " 0 0.90 0.85 0.87 2472\n", 934 | " 1 0.86 0.90 0.88 2521\n", 935 | "\n", 936 | " accuracy 0.88 4993\n", 937 | " macro avg 0.88 0.88 0.88 4993\n", 938 | "weighted avg 0.88 0.88 0.88 4993\n", 939 | "\n" 940 | ] 941 | } 942 | ], 943 | "source": [ 944 | "#Train default AdaBoost model with the GB estimator\n", 945 | "abmodel_gb = 
AdaBoostClassifier(base_estimator = gbmodel, random_state = 42)\n", 946 | "#Fit the training set\n", 947 | "abmodel_gb.fit(X_train, y_train)\n", 948 | "abmodel_prediction_gb = abmodel_gb.predict(X_test)\n", 949 | "#get the classification report\n", 950 | "abmodel_report_gb = classification_report(y_test, abmodel_prediction_gb)\n", 951 | "#print the report\n", 952 | "print(abmodel_report_gb)" 953 | ] 954 | }, 955 | { 956 | "cell_type": "code", 957 | "execution_count": 46, 958 | "metadata": {}, 959 | "outputs": [ 960 | { 961 | "data": { 962 | "text/plain": [ 963 | "Text(0.5, 1.0, 'ROC Curve')" 964 | ] 965 | }, 966 | "execution_count": 46, 967 | "metadata": {}, 968 | "output_type": "execute_result" 969 | }, 970 | { 971 | "data": { 972 | "image/png": "\n", 973 | "text/plain": [ 974 | "
" 975 | ] 976 | }, 977 | "metadata": { 978 | "needs_background": "light" 979 | }, 980 | "output_type": "display_data" 981 | } 982 | ], 983 | "source": [ 984 | "#plot ROC Curves\n", 985 | "fig , ax1 = plt.subplots(figsize=(9,9) )\n", 986 | "plt.plot([0, 1], [0, 1], 'k--')\n", 987 | "estimators =[pipeline_knn, pipeline_logreg, rfmodel, gbmodel, abmodel ]\n", 988 | "classifiers=['KNN', 'LogisticRegression', 'RandomForestClassifier','GradientBoostingClassifier',\n", 989 | " 'AdaBoostClassifier']\n", 990 | "colors = ['b', 'g', 'r', 'c', 'm']\n", 991 | "for i, estimator in enumerate(estimators):\n", 992 | " y_pred_prob = estimator.predict_proba(X_test)[:,1]\n", 993 | " fpr, tpr, thresholds= roc_curve(y_test, y_pred_prob)\n", 994 | " plt.plot(fpr, tpr, label=classifiers[i],color=colors[i]) \n", 995 | "plt.xlabel('False Positive Rate')\n", 996 | "plt.ylabel('True Positive Rate')\n", 997 | "plt.legend(loc=4)\n", 998 | "plt.title('ROC Curve')" 999 | ] 1000 | }, 1001 | { 1002 | "cell_type": "code", 1003 | "execution_count": 47, 1004 | "metadata": {}, 1005 | "outputs": [ 1006 | { 1007 | "data": { 1008 | "text/plain": [ 1009 | "Text(0.5, 1.0, 'Recall Precision Curve')" 1010 | ] 1011 | }, 1012 | "execution_count": 47, 1013 | "metadata": {}, 1014 | "output_type": "execute_result" 1015 | }, 1016 | { 1017 | "data": { 1018 | "image/png": "\n", 1019 | "text/plain": [ 1020 | "
" 1021 | ] 1022 | }, 1023 | "metadata": { 1024 | "needs_background": "light" 1025 | }, 1026 | "output_type": "display_data" 1027 | } 1028 | ], 1029 | "source": [ 1030 | "#plot Recall-Precision Curves\n", 1031 | "fig , ax1 = plt.subplots(figsize=(9,9) )\n", 1032 | "estimators =[pipeline_knn, pipeline_logreg, rfmodel, gbmodel, abmodel ]\n", 1033 | "classifiers=['KNN', 'LogisticRegression', 'RandomForestClassifier','GradientBoostingClassifier',\n", 1034 | " 'AdaBoostClassifier']\n", 1035 | "colors = ['b', 'g', 'r', 'c', 'm']\n", 1036 | "for i, estimator in enumerate(estimators):\n", 1037 | " y_pred_prob = estimator.predict_proba(X_test)[:,1]\n", 1038 | " precision, recall, _ = precision_recall_curve(y_test,y_pred_prob)\n", 1039 | " average_precision= average_precision_score(y_test, y_pred_prob, average=\"micro\")\n", 1040 | " plt.plot(recall, precision, label='%s (average=%.3f)'%(classifiers[i],average_precision), color=colors[i])\n", 1041 | "plt.xlabel('Recall')\n", 1042 | "plt.ylabel('Precision')\n", 1043 | "plt.legend(loc=1)\n", 1044 | "plt.title('Recall Precision Curve')" 1045 | ] 1046 | }, 1047 | { 1048 | "cell_type": "code", 1049 | "execution_count": null, 1050 | "metadata": {}, 1051 | "outputs": [], 1052 | "source": [] 1053 | } 1054 | ], 1055 | "metadata": { 1056 | "kernelspec": { 1057 | "display_name": "Python 3", 1058 | "language": "python", 1059 | "name": "python3" 1060 | }, 1061 | "language_info": { 1062 | "codemirror_mode": { 1063 | "name": "ipython", 1064 | "version": 3 1065 | }, 1066 | "file_extension": ".py", 1067 | "mimetype": "text/x-python", 1068 | "name": "python", 1069 | "nbconvert_exporter": "python", 1070 | "pygments_lexer": "ipython3", 1071 | "version": "3.8.8" 1072 | } 1073 | }, 1074 | "nbformat": 4, 1075 | "nbformat_minor": 4 1076 | } 1077 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # 
Backorders_Supply_Chain_Analysis (Competition Dataset) 2 | Analyzed supply chain process data and developed ML models in Python that predict whether a given order will go on backorder. The Random Forest model reaches 89% precision and 46% recall. 3 | 4 | ![](Images/corr.png) 5 | -------------------------------------------------------------------------------- /Sample work report.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/prs98/Backorders_Supply_Chain_Analysis/45489dd0c3887d4a0ba4e8b7e7d1cfd140898553/Sample work report.pdf --------------------------------------------------------------------------------