├── .gitignore
├── LICENSE
├── README.md
├── data
│   ├── test.csv
│   └── training.csv
├── flask_api
│   ├── __init__.py
│   ├── flask_api.yml
│   ├── hello-world.py
│   ├── models
│   │   └── model_v1.pk
│   ├── requirements.txt
│   ├── server.py
│   └── utils.py
└── notebooks
    ├── AnalyticsVidhya Article - ML Model approach.ipynb
    ├── ML Models as APIs using Flask.ipynb
    ├── ML+Models+as+APIs+using+Flask.html
    ├── ML+Models+as+APIs+using+Flask.md
    └── images
        ├── flaskapp1.png
        ├── flaskapp2.png
        └── flaskapp3.png
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 | 
6 | # C extensions
7 | *.so
8 | 
9 | # Distribution / packaging
10 | .Python
11 | env/
12 | build/
13 | develop-eggs/
14 | dist/
15 | downloads/
16 | eggs/
17 | .eggs/
18 | lib/
19 | lib64/
20 | parts/
21 | sdist/
22 | var/
23 | wheels/
24 | *.egg-info/
25 | .installed.cfg
26 | *.egg
27 | 
28 | # PyInstaller
29 | # Usually these files are written by a python script from a template
30 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
31 | *.manifest
32 | *.spec
33 | 
34 | # Installer logs
35 | pip-log.txt
36 | pip-delete-this-directory.txt
37 | 
38 | # Unit test / coverage reports
39 | htmlcov/
40 | .tox/
41 | .coverage
42 | .coverage.*
43 | .cache
44 | nosetests.xml
45 | coverage.xml
46 | *.cover
47 | .hypothesis/
48 | 
49 | # Translations
50 | *.mo
51 | *.pot
52 | 
53 | # Django stuff:
54 | *.log
55 | local_settings.py
56 | 
57 | # Flask stuff:
58 | instance/
59 | .webassets-cache
60 | 
61 | # Scrapy stuff:
62 | .scrapy
63 | 
64 | # Sphinx documentation
65 | docs/_build/
66 | 
67 | # PyBuilder
68 | target/
69 | 
70 | # Jupyter Notebook
71 | .ipynb_checkpoints
72 | 
73 | # pyenv
74 | .python-version
75 | 
76 | # celery beat schedule file
77 | celerybeat-schedule
78 | 
79 | # SageMath parsed files
80 | *.sage.py
81 | 
82 | # dotenv
83 | .env
84 | 
85 | # virtualenv
86 | .venv
87 | venv/
88 | ENV/
89 | 
90 | # Spyder project settings
91 | .spyderproject
92 | .spyproject
93 | 
94 | # Rope project settings
95 | .ropeproject
96 | 
97 | # mkdocs documentation
98 | /site
99 | 
100 | # mypy
101 | .mypy_cache/
102 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2017 Prathamesh
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## [Creating a Machine Learning API using Flask](https://www.analyticsvidhya.com/blog/2017/09/machine-learning-models-as-apis-using-flask/)
2 | #### Code accompanying the AnalyticsVidhya article
3 | 
4 | __NOTE: This code is a bit old; please do not use it for production-level tasks. There are better ways to do all of this: consider [FlaskAppBuilder](https://github.com/dpgaspar/Flask-AppBuilder) & [quart](https://github.com/pgjones/quart), or [FastAPI](https://github.com/tiangolo/fastapi), for your production APIs.__
5 | 
6 | #### How to set up the Anaconda environment:
7 | 
8 | - Make sure you have the __Anaconda distribution__; if not, visit [Miniconda Installation](https://conda.io/miniconda.html) to install it.
9 | - For a faster installation, run this command in a terminal: `curl -L mini.conda.ml | bash` (courtesy: [@mikb0b](https://twitter.com/mikb0b))
10 | - For any queries regarding the conda environment, visit [Managing Conda Environments](https://conda.io/docs/user-guide/tasks/manage-environments.html)
11 | - Go to the folder `./flask_api`; you'll find the `flask_api.yml` file there.
12 | - In the terminal, run: `conda env create -f flask_api.yml`
13 | - Once done, run: `source activate flask_api`. Your virtual environment is set up successfully!
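
The README's theme — wrapping a trained model in a Flask endpoint, as `flask_api/server.py` does with `models/model_v1.pk` — can be sketched in a few lines. This is an illustrative sketch of the pattern, not the repository's actual code: the `predict()` stub here stands in for unpickling the trained classifier and calling its `predict` method, and the income rule inside it is a made-up placeholder.

```python
# Minimal sketch of the "ML model as a Flask API" pattern.
# In the real repo, server.py loads models/model_v1.pk and runs the trained
# classifier; predict() below is a hypothetical stand-in.
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(records):
    """Stand-in for clf.predict(): 'approve' any applicant with income > 0."""
    return ["Y" if r.get("ApplicantIncome", 0) > 0 else "N" for r in records]

@app.route("/predict", methods=["POST"])
def predict_endpoint():
    # Expect a JSON list of records shaped like rows of data/test.csv
    records = request.get_json()
    return jsonify({"predictions": predict(records)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```

With the server running, a POST such as `curl -X POST -H "Content-Type: application/json" -d '[{"ApplicantIncome": 5720}]' http://localhost:8000/predict` returns the JSON predictions list.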
14 | -------------------------------------------------------------------------------- /data/test.csv: -------------------------------------------------------------------------------- 1 | Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area 2 | LP001015,Male,Yes,0,Graduate,No,5720,0,110,360,1,Urban 3 | LP001022,Male,Yes,1,Graduate,No,3076,1500,126,360,1,Urban 4 | LP001031,Male,Yes,2,Graduate,No,5000,1800,208,360,1,Urban 5 | LP001035,Male,Yes,2,Graduate,No,2340,2546,100,360,,Urban 6 | LP001051,Male,No,0,Not Graduate,No,3276,0,78,360,1,Urban 7 | LP001054,Male,Yes,0,Not Graduate,Yes,2165,3422,152,360,1,Urban 8 | LP001055,Female,No,1,Not Graduate,No,2226,0,59,360,1,Semiurban 9 | LP001056,Male,Yes,2,Not Graduate,No,3881,0,147,360,0,Rural 10 | LP001059,Male,Yes,2,Graduate,,13633,0,280,240,1,Urban 11 | LP001067,Male,No,0,Not Graduate,No,2400,2400,123,360,1,Semiurban 12 | LP001078,Male,No,0,Not Graduate,No,3091,0,90,360,1,Urban 13 | LP001082,Male,Yes,1,Graduate,,2185,1516,162,360,1,Semiurban 14 | LP001083,Male,No,3+,Graduate,No,4166,0,40,180,,Urban 15 | LP001094,Male,Yes,2,Graduate,,12173,0,166,360,0,Semiurban 16 | LP001096,Female,No,0,Graduate,No,4666,0,124,360,1,Semiurban 17 | LP001099,Male,No,1,Graduate,No,5667,0,131,360,1,Urban 18 | LP001105,Male,Yes,2,Graduate,No,4583,2916,200,360,1,Urban 19 | LP001107,Male,Yes,3+,Graduate,No,3786,333,126,360,1,Semiurban 20 | LP001108,Male,Yes,0,Graduate,No,9226,7916,300,360,1,Urban 21 | LP001115,Male,No,0,Graduate,No,1300,3470,100,180,1,Semiurban 22 | LP001121,Male,Yes,1,Not Graduate,No,1888,1620,48,360,1,Urban 23 | LP001124,Female,No,3+,Not Graduate,No,2083,0,28,180,1,Urban 24 | LP001128,,No,0,Graduate,No,3909,0,101,360,1,Urban 25 | LP001135,Female,No,0,Not Graduate,No,3765,0,125,360,1,Urban 26 | LP001149,Male,Yes,0,Graduate,No,5400,4380,290,360,1,Urban 27 | LP001153,Male,No,0,Graduate,No,0,24000,148,360,0,Rural 28 | 
LP001163,Male,Yes,2,Graduate,No,4363,1250,140,360,,Urban 29 | LP001169,Male,Yes,0,Graduate,No,7500,3750,275,360,1,Urban 30 | LP001174,Male,Yes,0,Graduate,No,3772,833,57,360,,Semiurban 31 | LP001176,Male,No,0,Graduate,No,2942,2382,125,180,1,Urban 32 | LP001177,Female,No,0,Not Graduate,No,2478,0,75,360,1,Semiurban 33 | LP001183,Male,Yes,2,Graduate,No,6250,820,192,360,1,Urban 34 | LP001185,Male,No,0,Graduate,No,3268,1683,152,360,1,Semiurban 35 | LP001187,Male,Yes,0,Graduate,No,2783,2708,158,360,1,Urban 36 | LP001190,Male,Yes,0,Graduate,No,2740,1541,101,360,1,Urban 37 | LP001203,Male,No,0,Graduate,No,3150,0,176,360,0,Semiurban 38 | LP001208,Male,Yes,2,Graduate,,7350,4029,185,180,1,Urban 39 | LP001210,Male,Yes,0,Graduate,Yes,2267,2792,90,360,1,Urban 40 | LP001211,Male,No,0,Graduate,Yes,5833,0,116,360,1,Urban 41 | LP001219,Male,No,0,Graduate,No,3643,1963,138,360,1,Urban 42 | LP001220,Male,Yes,0,Graduate,No,5629,818,100,360,1,Urban 43 | LP001221,Female,No,0,Graduate,No,3644,0,110,360,1,Urban 44 | LP001226,Male,Yes,0,Not Graduate,No,1750,2024,90,360,1,Semiurban 45 | LP001230,Male,No,0,Graduate,No,6500,2600,200,360,1,Semiurban 46 | LP001231,Female,No,0,Graduate,No,3666,0,84,360,1,Urban 47 | LP001232,Male,Yes,0,Graduate,No,4260,3900,185,,,Urban 48 | LP001237,Male,Yes,,Not Graduate,No,4163,1475,162,360,1,Urban 49 | LP001242,Male,No,0,Not Graduate,No,2356,1902,108,360,1,Semiurban 50 | LP001268,Male,No,0,Graduate,No,6792,3338,187,,1,Urban 51 | LP001270,Male,Yes,3+,Not Graduate,Yes,8000,250,187,360,1,Semiurban 52 | LP001284,Male,Yes,1,Graduate,No,2419,1707,124,360,1,Urban 53 | LP001287,,Yes,3+,Not Graduate,No,3500,833,120,360,1,Semiurban 54 | LP001291,Male,Yes,1,Graduate,No,3500,3077,160,360,1,Semiurban 55 | LP001298,Male,Yes,2,Graduate,No,4116,1000,30,180,1,Urban 56 | LP001312,Male,Yes,0,Not Graduate,Yes,5293,0,92,360,1,Urban 57 | LP001313,Male,No,0,Graduate,No,2750,0,130,360,0,Urban 58 | LP001317,Female,No,0,Not Graduate,No,4402,0,130,360,1,Rural 59 | 
LP001321,Male,Yes,2,Graduate,No,3613,3539,134,180,1,Semiurban 60 | LP001323,Female,Yes,2,Graduate,No,2779,3664,176,360,0,Semiurban 61 | LP001324,Male,Yes,3+,Graduate,No,4720,0,90,180,1,Semiurban 62 | LP001332,Male,Yes,0,Not Graduate,No,2415,1721,110,360,1,Semiurban 63 | LP001335,Male,Yes,0,Graduate,Yes,7016,292,125,360,1,Urban 64 | LP001338,Female,No,2,Graduate,No,4968,0,189,360,1,Semiurban 65 | LP001347,Female,No,0,Graduate,No,2101,1500,108,360,0,Rural 66 | LP001348,Male,Yes,3+,Not Graduate,No,4490,0,125,360,1,Urban 67 | LP001351,Male,Yes,0,Graduate,No,2917,3583,138,360,1,Semiurban 68 | LP001352,Male,Yes,0,Not Graduate,No,4700,0,135,360,0,Semiurban 69 | LP001358,Male,Yes,0,Graduate,No,3445,0,130,360,0,Semiurban 70 | LP001359,Male,Yes,0,Graduate,No,7666,0,187,360,1,Semiurban 71 | LP001361,Male,Yes,0,Graduate,No,2458,5105,188,360,0,Rural 72 | LP001366,Female,No,,Graduate,No,3250,0,95,360,1,Semiurban 73 | LP001368,Male,No,0,Graduate,No,4463,0,65,360,1,Semiurban 74 | LP001375,Male,Yes,1,Graduate,,4083,1775,139,60,1,Urban 75 | LP001380,Male,Yes,0,Graduate,Yes,3900,2094,232,360,1,Rural 76 | LP001386,Male,Yes,0,Not Graduate,No,4750,3583,144,360,1,Semiurban 77 | LP001400,Male,No,0,Graduate,No,3583,3435,155,360,1,Urban 78 | LP001407,Male,Yes,0,Graduate,No,3189,2367,186,360,1,Urban 79 | LP001413,Male,No,0,Graduate,Yes,6356,0,50,360,1,Rural 80 | LP001415,Male,Yes,1,Graduate,No,3413,4053,,360,1,Semiurban 81 | LP001419,Female,Yes,0,Graduate,No,7950,0,185,360,1,Urban 82 | LP001420,Male,Yes,3+,Graduate,No,3829,1103,163,360,0,Urban 83 | LP001428,Male,Yes,3+,Graduate,No,72529,0,360,360,1,Urban 84 | LP001445,Male,Yes,2,Not Graduate,No,4136,0,149,480,0,Rural 85 | LP001446,Male,Yes,0,Graduate,No,8449,0,257,360,1,Rural 86 | LP001450,Male,Yes,0,Graduate,No,4456,0,131,180,0,Semiurban 87 | LP001452,Male,Yes,2,Graduate,No,4635,8000,102,180,1,Rural 88 | LP001455,Male,Yes,0,Graduate,No,3571,1917,135,360,1,Urban 89 | LP001466,Male,No,0,Graduate,No,3066,0,95,360,1,Semiurban 90 | 
LP001471,Male,No,2,Not Graduate,No,3235,2015,77,360,1,Semiurban 91 | LP001472,Female,No,0,Graduate,,5058,0,200,360,1,Rural 92 | LP001475,Male,Yes,0,Graduate,Yes,3188,2286,130,360,,Rural 93 | LP001483,Male,Yes,3+,Graduate,No,13518,0,390,360,1,Rural 94 | LP001486,Male,Yes,1,Graduate,No,4364,2500,185,360,1,Semiurban 95 | LP001490,Male,Yes,2,Not Graduate,No,4766,1646,100,360,1,Semiurban 96 | LP001496,Male,Yes,1,Graduate,No,4609,2333,123,360,0,Semiurban 97 | LP001499,Female,Yes,3+,Graduate,No,6260,0,110,360,1,Semiurban 98 | LP001500,Male,Yes,1,Graduate,No,3333,4200,256,360,1,Urban 99 | LP001501,Male,Yes,0,Graduate,No,3500,3250,140,360,1,Semiurban 100 | LP001517,Male,Yes,3+,Graduate,No,9719,0,61,360,1,Urban 101 | LP001527,Male,Yes,3+,Graduate,No,6835,0,188,360,,Semiurban 102 | LP001534,Male,No,0,Graduate,No,4452,0,131,360,1,Rural 103 | LP001542,Female,Yes,0,Graduate,No,2262,0,,480,0,Semiurban 104 | LP001547,Male,Yes,1,Graduate,No,3901,0,116,360,1,Urban 105 | LP001548,Male,Yes,2,Not Graduate,No,2687,0,50,180,1,Rural 106 | LP001558,Male,No,0,Graduate,No,2243,2233,107,360,,Semiurban 107 | LP001561,Female,Yes,0,Graduate,No,3417,1287,200,360,1,Semiurban 108 | LP001563,,No,0,Graduate,No,1596,1760,119,360,0,Urban 109 | LP001567,Male,Yes,3+,Graduate,No,4513,0,120,360,1,Rural 110 | LP001568,Male,Yes,0,Graduate,No,4500,0,140,360,1,Semiurban 111 | LP001573,Male,Yes,0,Not Graduate,No,4523,1350,165,360,1,Urban 112 | LP001584,Female,No,0,Graduate,Yes,4742,0,108,360,1,Semiurban 113 | LP001587,Male,Yes,,Graduate,No,4082,0,93,360,1,Semiurban 114 | LP001589,Female,No,0,Graduate,No,3417,0,102,360,1,Urban 115 | LP001591,Female,Yes,2,Graduate,No,2922,3396,122,360,1,Semiurban 116 | LP001599,Male,Yes,0,Graduate,No,4167,4754,160,360,1,Rural 117 | LP001601,Male,No,3+,Graduate,No,4243,4123,157,360,,Semiurban 118 | LP001607,Female,No,0,Not Graduate,No,0,1760,180,360,1,Semiurban 119 | LP001611,Male,Yes,1,Graduate,No,1516,2900,80,,0,Rural 120 | 
LP001613,Female,No,0,Graduate,No,1762,2666,104,360,0,Urban 121 | LP001622,Male,Yes,2,Graduate,No,724,3510,213,360,0,Rural 122 | LP001627,Male,No,0,Graduate,No,3125,0,65,360,1,Urban 123 | LP001650,Male,Yes,0,Graduate,No,2333,3803,146,360,1,Rural 124 | LP001651,Male,Yes,3+,Graduate,No,3350,1560,135,360,1,Urban 125 | LP001652,Male,No,0,Graduate,No,2500,6414,187,360,0,Rural 126 | LP001655,Female,No,0,Graduate,No,12500,0,300,360,0,Urban 127 | LP001660,Male,No,0,Graduate,No,4667,0,120,360,1,Semiurban 128 | LP001662,Male,No,0,Graduate,No,6500,0,71,360,0,Urban 129 | LP001663,Male,Yes,2,Graduate,No,7500,0,225,360,1,Urban 130 | LP001667,Male,No,0,Graduate,No,3073,0,70,180,1,Urban 131 | LP001695,Male,Yes,1,Not Graduate,No,3321,2088,70,,1,Semiurban 132 | LP001703,Male,Yes,0,Graduate,No,3333,1270,124,360,1,Urban 133 | LP001718,Male,No,0,Graduate,No,3391,0,132,360,1,Rural 134 | LP001728,Male,Yes,1,Graduate,Yes,3343,1517,105,360,1,Rural 135 | LP001735,Female,No,1,Graduate,No,3620,0,90,360,1,Urban 136 | LP001737,Male,No,0,Graduate,No,4000,0,83,84,1,Urban 137 | LP001739,Male,Yes,0,Graduate,No,4258,0,125,360,1,Urban 138 | LP001742,Male,Yes,2,Graduate,No,4500,0,147,360,1,Rural 139 | LP001757,Male,Yes,1,Graduate,No,2014,2925,120,360,1,Rural 140 | LP001769,,No,,Graduate,No,3333,1250,110,360,1,Semiurban 141 | LP001771,Female,No,3+,Graduate,No,4083,0,103,360,,Semiurban 142 | LP001785,Male,No,0,Graduate,No,4727,0,150,360,0,Rural 143 | LP001787,Male,Yes,3+,Graduate,No,3089,2999,100,240,1,Rural 144 | LP001789,Male,Yes,3+,Not Graduate,,6794,528,139,360,0,Urban 145 | LP001791,Male,Yes,0,Graduate,Yes,32000,0,550,360,,Semiurban 146 | LP001794,Male,Yes,2,Graduate,Yes,10890,0,260,12,1,Rural 147 | LP001797,Female,No,0,Graduate,No,12941,0,150,300,1,Urban 148 | LP001815,Male,No,0,Not Graduate,No,3276,0,90,360,1,Semiurban 149 | LP001817,Male,No,0,Not Graduate,Yes,8703,0,199,360,0,Rural 150 | LP001818,Male,Yes,1,Graduate,No,4742,717,139,360,1,Semiurban 151 | 
LP001822,Male,No,0,Graduate,No,5900,0,150,360,1,Urban 152 | LP001827,Male,No,0,Graduate,No,3071,4309,180,360,1,Urban 153 | LP001831,Male,Yes,0,Graduate,No,2783,1456,113,360,1,Urban 154 | LP001842,Male,No,0,Graduate,No,5000,0,148,360,1,Rural 155 | LP001853,Male,Yes,1,Not Graduate,No,2463,2360,117,360,0,Urban 156 | LP001855,Male,Yes,2,Graduate,No,4855,0,72,360,1,Rural 157 | LP001857,Male,No,0,Not Graduate,Yes,1599,2474,125,300,1,Semiurban 158 | LP001862,Male,Yes,2,Graduate,Yes,4246,4246,214,360,1,Urban 159 | LP001867,Male,Yes,0,Graduate,No,4333,2291,133,350,1,Rural 160 | LP001878,Male,No,1,Graduate,No,5823,2529,187,360,1,Semiurban 161 | LP001881,Male,Yes,0,Not Graduate,No,7895,0,143,360,1,Rural 162 | LP001886,Male,No,0,Graduate,No,4150,4256,209,360,1,Rural 163 | LP001906,Male,No,0,Graduate,,2964,0,84,360,0,Semiurban 164 | LP001909,Male,No,0,Graduate,No,5583,0,116,360,1,Urban 165 | LP001911,Female,No,0,Graduate,No,2708,0,65,360,1,Rural 166 | LP001921,Male,No,1,Graduate,No,3180,2370,80,240,,Rural 167 | LP001923,Male,No,0,Not Graduate,No,2268,0,170,360,0,Semiurban 168 | LP001933,Male,No,2,Not Graduate,No,1141,2017,120,360,0,Urban 169 | LP001943,Male,Yes,0,Graduate,No,3042,3167,135,360,1,Urban 170 | LP001950,Female,Yes,3+,Graduate,,1750,2935,94,360,0,Semiurban 171 | LP001959,Female,Yes,1,Graduate,No,3564,0,79,360,1,Rural 172 | LP001961,Female,No,0,Graduate,No,3958,0,110,360,1,Rural 173 | LP001973,Male,Yes,2,Not Graduate,No,4483,0,130,360,1,Rural 174 | LP001975,Male,Yes,0,Graduate,No,5225,0,143,360,1,Rural 175 | LP001979,Male,No,0,Graduate,No,3017,2845,159,180,0,Urban 176 | LP001995,Male,Yes,0,Not Graduate,No,2431,1820,110,360,0,Rural 177 | LP001999,Male,Yes,2,Graduate,,4912,4614,160,360,1,Rural 178 | LP002007,Male,Yes,2,Not Graduate,No,2500,3333,131,360,1,Urban 179 | LP002009,Female,No,0,Graduate,No,2918,0,65,360,,Rural 180 | LP002016,Male,Yes,2,Graduate,No,5128,0,143,360,1,Rural 181 | LP002017,Male,Yes,3+,Graduate,No,15312,0,187,360,,Urban 182 | 
LP002018,Male,Yes,2,Graduate,No,3958,2632,160,360,1,Semiurban 183 | LP002027,Male,Yes,0,Graduate,No,4334,2945,165,360,1,Semiurban 184 | LP002028,Male,Yes,2,Graduate,No,4358,0,110,360,1,Urban 185 | LP002042,Female,Yes,1,Graduate,No,4000,3917,173,360,1,Rural 186 | LP002045,Male,Yes,3+,Graduate,No,10166,750,150,,1,Urban 187 | LP002046,Male,Yes,0,Not Graduate,No,4483,0,135,360,,Semiurban 188 | LP002047,Male,Yes,2,Not Graduate,No,4521,1184,150,360,1,Semiurban 189 | LP002056,Male,Yes,2,Graduate,No,9167,0,235,360,1,Semiurban 190 | LP002057,Male,Yes,0,Not Graduate,No,13083,0,,360,1,Rural 191 | LP002059,Male,Yes,2,Graduate,No,7874,3967,336,360,1,Rural 192 | LP002062,Female,Yes,1,Graduate,No,4333,0,132,84,1,Rural 193 | LP002064,Male,No,0,Graduate,No,4083,0,96,360,1,Urban 194 | LP002069,Male,Yes,2,Not Graduate,,3785,2912,180,360,0,Rural 195 | LP002070,Male,Yes,3+,Not Graduate,No,2654,1998,128,360,0,Rural 196 | LP002077,Male,Yes,1,Graduate,No,10000,2690,412,360,1,Semiurban 197 | LP002083,Male,No,0,Graduate,Yes,5833,0,116,360,1,Urban 198 | LP002090,Male,Yes,1,Graduate,No,4796,0,114,360,0,Semiurban 199 | LP002096,Male,Yes,0,Not Graduate,No,2000,1600,115,360,1,Rural 200 | LP002099,Male,Yes,2,Graduate,No,2540,700,104,360,0,Urban 201 | LP002102,Male,Yes,0,Graduate,Yes,1900,1442,88,360,1,Rural 202 | LP002105,Male,Yes,0,Graduate,Yes,8706,0,108,480,1,Rural 203 | LP002107,Male,Yes,3+,Not Graduate,No,2855,542,90,360,1,Urban 204 | LP002111,Male,Yes,,Graduate,No,3016,1300,100,360,,Urban 205 | LP002117,Female,Yes,0,Graduate,No,3159,2374,108,360,1,Semiurban 206 | LP002118,Female,No,0,Graduate,No,1937,1152,78,360,1,Semiurban 207 | LP002123,Male,Yes,0,Graduate,No,2613,2417,123,360,1,Semiurban 208 | LP002125,Male,Yes,1,Graduate,No,4960,2600,187,360,1,Semiurban 209 | LP002148,Male,Yes,1,Graduate,No,3074,1083,146,360,1,Semiurban 210 | LP002152,Female,No,0,Graduate,No,4213,0,80,360,1,Urban 211 | LP002165,,No,1,Not Graduate,No,2038,4027,100,360,1,Rural 212 | 
LP002167,Female,No,0,Graduate,No,2362,0,55,360,1,Urban 213 | LP002168,Male,No,0,Graduate,No,5333,2400,200,360,0,Rural 214 | LP002172,Male,Yes,3+,Graduate,Yes,5384,0,150,360,1,Semiurban 215 | LP002176,Male,No,0,Graduate,No,5708,0,150,360,1,Rural 216 | LP002183,Male,Yes,0,Not Graduate,No,3754,3719,118,,1,Rural 217 | LP002184,Male,Yes,0,Not Graduate,No,2914,2130,150,300,1,Urban 218 | LP002186,Male,Yes,0,Not Graduate,No,2747,2458,118,36,1,Semiurban 219 | LP002192,Male,Yes,0,Graduate,No,7830,2183,212,360,1,Rural 220 | LP002195,Male,Yes,1,Graduate,Yes,3507,3148,212,360,1,Rural 221 | LP002208,Male,Yes,1,Graduate,No,3747,2139,125,360,1,Urban 222 | LP002212,Male,Yes,0,Graduate,No,2166,2166,108,360,,Urban 223 | LP002240,Male,Yes,0,Not Graduate,No,3500,2168,149,360,1,Rural 224 | LP002245,Male,Yes,2,Not Graduate,No,2896,0,80,480,1,Urban 225 | LP002253,Female,No,1,Graduate,No,5062,0,152,300,1,Rural 226 | LP002256,Female,No,2,Graduate,Yes,5184,0,187,360,0,Semiurban 227 | LP002257,Female,No,0,Graduate,No,2545,0,74,360,1,Urban 228 | LP002264,Male,Yes,0,Graduate,No,2553,1768,102,360,1,Urban 229 | LP002270,Male,Yes,1,Graduate,No,3436,3809,100,360,1,Rural 230 | LP002279,Male,No,0,Graduate,No,2412,2755,130,360,1,Rural 231 | LP002286,Male,Yes,3+,Not Graduate,No,5180,0,125,360,0,Urban 232 | LP002294,Male,No,0,Graduate,No,14911,14507,130,360,1,Semiurban 233 | LP002298,,No,0,Graduate,Yes,2860,2988,138,360,1,Urban 234 | LP002306,Male,Yes,0,Graduate,No,1173,1594,28,180,1,Rural 235 | LP002310,Female,No,1,Graduate,No,7600,0,92,360,1,Semiurban 236 | LP002311,Female,Yes,0,Graduate,No,2157,1788,104,360,1,Urban 237 | LP002316,Male,No,0,Graduate,No,2231,2774,176,360,0,Urban 238 | LP002321,Female,No,0,Graduate,No,2274,5211,117,360,0,Semiurban 239 | LP002325,Male,Yes,2,Not Graduate,No,6166,13983,102,360,1,Rural 240 | LP002326,Male,Yes,2,Not Graduate,No,2513,1110,107,360,1,Semiurban 241 | LP002329,Male,No,0,Graduate,No,4333,0,66,480,1,Urban 242 | LP002333,Male,No,0,Not 
Graduate,No,3844,0,105,360,1,Urban 243 | LP002339,Male,Yes,0,Graduate,No,3887,1517,105,360,0,Semiurban 244 | LP002344,Male,Yes,0,Graduate,No,3510,828,105,360,1,Semiurban 245 | LP002346,Male,Yes,0,Graduate,,2539,1704,125,360,0,Rural 246 | LP002354,Female,No,0,Not Graduate,No,2107,0,64,360,1,Semiurban 247 | LP002355,,Yes,0,Graduate,No,3186,3145,150,180,0,Semiurban 248 | LP002358,Male,Yes,2,Graduate,Yes,5000,2166,150,360,1,Urban 249 | LP002360,Male,Yes,,Graduate,No,10000,0,,360,1,Urban 250 | LP002375,Male,Yes,0,Not Graduate,Yes,3943,0,64,360,1,Semiurban 251 | LP002376,Male,No,0,Graduate,No,2925,0,40,180,1,Rural 252 | LP002383,Male,Yes,3+,Graduate,No,3242,437,142,480,0,Urban 253 | LP002385,Male,Yes,,Graduate,No,3863,0,70,300,1,Semiurban 254 | LP002389,Female,No,1,Graduate,No,4028,0,131,360,1,Semiurban 255 | LP002394,Male,Yes,2,Graduate,No,4010,1025,120,360,1,Urban 256 | LP002397,Female,Yes,1,Graduate,No,3719,1585,114,360,1,Urban 257 | LP002399,Male,No,0,Graduate,,2858,0,123,360,0,Rural 258 | LP002400,Female,Yes,0,Graduate,No,3833,0,92,360,1,Rural 259 | LP002402,Male,Yes,0,Graduate,No,3333,4288,160,360,1,Urban 260 | LP002412,Male,Yes,0,Graduate,No,3007,3725,151,360,1,Rural 261 | LP002415,Female,No,1,Graduate,,1850,4583,81,360,,Rural 262 | LP002417,Male,Yes,3+,Not Graduate,No,2792,2619,171,360,1,Semiurban 263 | LP002420,Male,Yes,0,Graduate,No,2982,1550,110,360,1,Semiurban 264 | LP002425,Male,No,0,Graduate,No,3417,738,100,360,,Rural 265 | LP002433,Male,Yes,1,Graduate,No,18840,0,234,360,1,Rural 266 | LP002440,Male,Yes,2,Graduate,No,2995,1120,184,360,1,Rural 267 | LP002441,Male,No,,Graduate,No,3579,3308,138,360,,Semiurban 268 | LP002442,Female,Yes,1,Not Graduate,No,3835,1400,112,480,0,Urban 269 | LP002445,Female,No,1,Not Graduate,No,3854,3575,117,360,1,Rural 270 | LP002450,Male,Yes,2,Graduate,No,5833,750,49,360,0,Rural 271 | LP002471,Male,No,0,Graduate,No,3508,0,99,360,1,Rural 272 | LP002476,Female,Yes,3+,Not Graduate,No,1635,2444,99,360,1,Urban 273 | 
LP002482,Female,No,0,Graduate,Yes,3333,3916,212,360,1,Rural 274 | LP002485,Male,No,1,Graduate,No,24797,0,240,360,1,Semiurban 275 | LP002495,Male,Yes,2,Graduate,No,5667,440,130,360,0,Semiurban 276 | LP002496,Female,No,0,Graduate,No,3500,0,94,360,0,Semiurban 277 | LP002523,Male,Yes,3+,Graduate,No,2773,1497,108,360,1,Semiurban 278 | LP002542,Male,Yes,0,Graduate,,6500,0,144,360,1,Urban 279 | LP002550,Female,No,0,Graduate,No,5769,0,110,180,1,Semiurban 280 | LP002551,Male,Yes,3+,Not Graduate,,3634,910,176,360,0,Semiurban 281 | LP002553,,No,0,Graduate,No,29167,0,185,360,1,Semiurban 282 | LP002554,Male,No,0,Graduate,No,2166,2057,122,360,1,Semiurban 283 | LP002561,Male,Yes,0,Graduate,No,5000,0,126,360,1,Rural 284 | LP002566,Female,No,0,Graduate,No,5530,0,135,360,,Urban 285 | LP002568,Male,No,0,Not Graduate,No,9000,0,122,360,1,Rural 286 | LP002570,Female,Yes,2,Graduate,No,10000,11666,460,360,1,Urban 287 | LP002572,Male,Yes,1,Graduate,,8750,0,297,360,1,Urban 288 | LP002581,Male,Yes,0,Not Graduate,No,2157,2730,140,360,,Rural 289 | LP002584,Male,No,0,Graduate,,1972,4347,106,360,1,Rural 290 | LP002592,Male,No,0,Graduate,No,4983,0,141,360,1,Urban 291 | LP002593,Male,Yes,1,Graduate,No,8333,4000,,360,1,Urban 292 | LP002599,Male,Yes,0,Graduate,No,3667,2000,170,360,1,Semiurban 293 | LP002604,Male,Yes,2,Graduate,No,3166,2833,145,360,1,Urban 294 | LP002605,Male,No,0,Not Graduate,No,3271,0,90,360,1,Rural 295 | LP002609,Female,Yes,0,Graduate,No,2241,2000,88,360,0,Urban 296 | LP002610,Male,Yes,1,Not Graduate,,1792,2565,128,360,1,Urban 297 | LP002612,Female,Yes,0,Graduate,No,2666,0,84,480,1,Semiurban 298 | LP002614,,No,0,Graduate,No,6478,0,108,360,1,Semiurban 299 | LP002630,Male,No,0,Not Graduate,,3808,0,83,360,1,Rural 300 | LP002635,Female,Yes,2,Not Graduate,No,3729,0,117,360,1,Semiurban 301 | LP002639,Male,Yes,2,Graduate,No,4120,0,128,360,1,Rural 302 | LP002644,Male,Yes,1,Graduate,Yes,7500,0,75,360,1,Urban 303 | LP002651,Male,Yes,1,Graduate,,6300,0,125,360,0,Urban 304 | 
LP002654,Female,No,,Graduate,Yes,14987,0,177,360,1,Rural 305 | LP002657,,Yes,1,Not Graduate,Yes,570,2125,68,360,1,Rural 306 | LP002711,Male,Yes,0,Graduate,No,2600,700,96,360,1,Semiurban 307 | LP002712,Male,No,2,Not Graduate,No,2733,1083,180,360,,Semiurban 308 | LP002721,Male,Yes,2,Graduate,Yes,7500,0,183,360,1,Rural 309 | LP002735,Male,Yes,2,Not Graduate,No,3859,0,121,360,1,Rural 310 | LP002744,Male,Yes,1,Graduate,No,6825,0,162,360,1,Rural 311 | LP002745,Male,Yes,0,Graduate,No,3708,4700,132,360,1,Semiurban 312 | LP002746,Male,No,0,Graduate,No,5314,0,147,360,1,Urban 313 | LP002747,Female,No,3+,Graduate,No,2366,5272,153,360,0,Rural 314 | LP002754,Male,No,,Graduate,No,2066,2108,104,84,1,Urban 315 | LP002759,Male,Yes,2,Graduate,No,5000,0,149,360,1,Rural 316 | LP002760,Female,No,0,Graduate,No,3767,0,134,300,1,Urban 317 | LP002766,Female,Yes,0,Graduate,No,7859,879,165,180,1,Semiurban 318 | LP002769,Female,Yes,0,Graduate,No,4283,0,120,360,1,Rural 319 | LP002774,Male,Yes,0,Not Graduate,No,1700,2900,67,360,0,Urban 320 | LP002775,,No,0,Not Graduate,No,4768,0,125,360,1,Rural 321 | LP002781,Male,No,0,Graduate,No,3083,2738,120,360,1,Urban 322 | LP002782,Male,Yes,1,Graduate,No,2667,1542,148,360,1,Rural 323 | LP002786,Female,Yes,0,Not Graduate,No,1647,1762,181,360,1,Urban 324 | LP002790,Male,Yes,3+,Graduate,No,3400,0,80,120,1,Urban 325 | LP002791,Male,No,1,Graduate,,16000,5000,40,360,1,Semiurban 326 | LP002793,Male,Yes,0,Graduate,No,5333,0,90,360,1,Rural 327 | LP002802,Male,No,0,Graduate,No,2875,2416,95,6,0,Semiurban 328 | LP002803,Male,Yes,1,Not Graduate,,2600,618,122,360,1,Semiurban 329 | LP002805,Male,Yes,2,Graduate,No,5041,700,150,360,1,Urban 330 | LP002806,Male,Yes,3+,Graduate,Yes,6958,1411,150,360,1,Rural 331 | LP002816,Male,Yes,1,Graduate,No,3500,1658,104,360,,Semiurban 332 | LP002823,Male,Yes,0,Graduate,No,5509,0,143,360,1,Rural 333 | LP002825,Male,Yes,3+,Graduate,No,9699,0,300,360,1,Urban 334 | LP002826,Female,Yes,1,Not Graduate,No,3621,2717,171,360,1,Urban 335 | 
LP002843,Female,Yes,0,Graduate,No,4709,0,113,360,1,Semiurban 336 | LP002849,Male,Yes,0,Graduate,No,1516,1951,35,360,1,Semiurban 337 | LP002850,Male,No,2,Graduate,No,2400,0,46,360,1,Urban 338 | LP002853,Female,No,0,Not Graduate,No,3015,2000,145,360,,Urban 339 | LP002856,Male,Yes,0,Graduate,No,2292,1558,119,360,1,Urban 340 | LP002857,Male,Yes,1,Graduate,Yes,2360,3355,87,240,1,Rural 341 | LP002858,Female,No,0,Graduate,No,4333,2333,162,360,0,Rural 342 | LP002860,Male,Yes,0,Graduate,Yes,2623,4831,122,180,1,Semiurban 343 | LP002867,Male,No,0,Graduate,Yes,3972,4275,187,360,1,Rural 344 | LP002869,Male,Yes,3+,Not Graduate,No,3522,0,81,180,1,Rural 345 | LP002870,Male,Yes,1,Graduate,No,4700,0,80,360,1,Urban 346 | LP002876,Male,No,0,Graduate,No,6858,0,176,360,1,Rural 347 | LP002878,Male,Yes,3+,Graduate,No,8334,0,260,360,1,Urban 348 | LP002879,Male,Yes,0,Graduate,No,3391,1966,133,360,0,Rural 349 | LP002885,Male,No,0,Not Graduate,No,2868,0,70,360,1,Urban 350 | LP002890,Male,Yes,2,Not Graduate,No,3418,1380,135,360,1,Urban 351 | LP002891,Male,Yes,0,Graduate,Yes,2500,296,137,300,1,Rural 352 | LP002899,Male,Yes,2,Graduate,No,8667,0,254,360,1,Rural 353 | LP002901,Male,No,0,Graduate,No,2283,15000,106,360,,Rural 354 | LP002907,Male,Yes,0,Graduate,No,5817,910,109,360,1,Urban 355 | LP002920,Male,Yes,0,Graduate,No,5119,3769,120,360,1,Rural 356 | LP002921,Male,Yes,3+,Not Graduate,No,5316,187,158,180,0,Semiurban 357 | LP002932,Male,Yes,3+,Graduate,No,7603,1213,197,360,1,Urban 358 | LP002935,Male,Yes,1,Graduate,No,3791,1936,85,360,1,Urban 359 | LP002952,Male,No,0,Graduate,No,2500,0,60,360,1,Urban 360 | LP002954,Male,Yes,2,Not Graduate,No,3132,0,76,360,,Rural 361 | LP002962,Male,No,0,Graduate,No,4000,2667,152,360,1,Semiurban 362 | LP002965,Female,Yes,0,Graduate,No,8550,4255,96,360,,Urban 363 | LP002969,Male,Yes,1,Graduate,No,2269,2167,99,360,1,Semiurban 364 | LP002971,Male,Yes,3+,Not Graduate,Yes,4009,1777,113,360,1,Urban 365 | LP002975,Male,Yes,0,Graduate,No,4158,709,115,360,1,Urban 366 | 
LP002980,Male,No,0,Graduate,No,3250,1993,126,360,,Semiurban 367 | LP002986,Male,Yes,0,Graduate,No,5000,2393,158,360,1,Rural 368 | LP002989,Male,No,0,Graduate,Yes,9200,0,98,180,1,Rural 369 | -------------------------------------------------------------------------------- /data/training.csv: -------------------------------------------------------------------------------- 1 | Loan_ID,Gender,Married,Dependents,Education,Self_Employed,ApplicantIncome,CoapplicantIncome,LoanAmount,Loan_Amount_Term,Credit_History,Property_Area,Loan_Status 2 | LP001002,Male,No,0,Graduate,No,5849,0,,360,1,Urban,Y 3 | LP001003,Male,Yes,1,Graduate,No,4583,1508,128,360,1,Rural,N 4 | LP001005,Male,Yes,0,Graduate,Yes,3000,0,66,360,1,Urban,Y 5 | LP001006,Male,Yes,0,Not Graduate,No,2583,2358,120,360,1,Urban,Y 6 | LP001008,Male,No,0,Graduate,No,6000,0,141,360,1,Urban,Y 7 | LP001011,Male,Yes,2,Graduate,Yes,5417,4196,267,360,1,Urban,Y 8 | LP001013,Male,Yes,0,Not Graduate,No,2333,1516,95,360,1,Urban,Y 9 | LP001014,Male,Yes,3+,Graduate,No,3036,2504,158,360,0,Semiurban,N 10 | LP001018,Male,Yes,2,Graduate,No,4006,1526,168,360,1,Urban,Y 11 | LP001020,Male,Yes,1,Graduate,No,12841,10968,349,360,1,Semiurban,N 12 | LP001024,Male,Yes,2,Graduate,No,3200,700,70,360,1,Urban,Y 13 | LP001027,Male,Yes,2,Graduate,,2500,1840,109,360,1,Urban,Y 14 | LP001028,Male,Yes,2,Graduate,No,3073,8106,200,360,1,Urban,Y 15 | LP001029,Male,No,0,Graduate,No,1853,2840,114,360,1,Rural,N 16 | LP001030,Male,Yes,2,Graduate,No,1299,1086,17,120,1,Urban,Y 17 | LP001032,Male,No,0,Graduate,No,4950,0,125,360,1,Urban,Y 18 | LP001034,Male,No,1,Not Graduate,No,3596,0,100,240,,Urban,Y 19 | LP001036,Female,No,0,Graduate,No,3510,0,76,360,0,Urban,N 20 | LP001038,Male,Yes,0,Not Graduate,No,4887,0,133,360,1,Rural,N 21 | LP001041,Male,Yes,0,Graduate,,2600,3500,115,,1,Urban,Y 22 | LP001043,Male,Yes,0,Not Graduate,No,7660,0,104,360,0,Urban,N 23 | LP001046,Male,Yes,1,Graduate,No,5955,5625,315,360,1,Urban,Y 24 | LP001047,Male,Yes,0,Not 
Graduate,No,2600,1911,116,360,0,Semiurban,N 25 | LP001050,,Yes,2,Not Graduate,No,3365,1917,112,360,0,Rural,N 26 | LP001052,Male,Yes,1,Graduate,,3717,2925,151,360,,Semiurban,N 27 | LP001066,Male,Yes,0,Graduate,Yes,9560,0,191,360,1,Semiurban,Y 28 | LP001068,Male,Yes,0,Graduate,No,2799,2253,122,360,1,Semiurban,Y 29 | LP001073,Male,Yes,2,Not Graduate,No,4226,1040,110,360,1,Urban,Y 30 | LP001086,Male,No,0,Not Graduate,No,1442,0,35,360,1,Urban,N 31 | LP001087,Female,No,2,Graduate,,3750,2083,120,360,1,Semiurban,Y 32 | LP001091,Male,Yes,1,Graduate,,4166,3369,201,360,,Urban,N 33 | LP001095,Male,No,0,Graduate,No,3167,0,74,360,1,Urban,N 34 | LP001097,Male,No,1,Graduate,Yes,4692,0,106,360,1,Rural,N 35 | LP001098,Male,Yes,0,Graduate,No,3500,1667,114,360,1,Semiurban,Y 36 | LP001100,Male,No,3+,Graduate,No,12500,3000,320,360,1,Rural,N 37 | LP001106,Male,Yes,0,Graduate,No,2275,2067,,360,1,Urban,Y 38 | LP001109,Male,Yes,0,Graduate,No,1828,1330,100,,0,Urban,N 39 | LP001112,Female,Yes,0,Graduate,No,3667,1459,144,360,1,Semiurban,Y 40 | LP001114,Male,No,0,Graduate,No,4166,7210,184,360,1,Urban,Y 41 | LP001116,Male,No,0,Not Graduate,No,3748,1668,110,360,1,Semiurban,Y 42 | LP001119,Male,No,0,Graduate,No,3600,0,80,360,1,Urban,N 43 | LP001120,Male,No,0,Graduate,No,1800,1213,47,360,1,Urban,Y 44 | LP001123,Male,Yes,0,Graduate,No,2400,0,75,360,,Urban,Y 45 | LP001131,Male,Yes,0,Graduate,No,3941,2336,134,360,1,Semiurban,Y 46 | LP001136,Male,Yes,0,Not Graduate,Yes,4695,0,96,,1,Urban,Y 47 | LP001137,Female,No,0,Graduate,No,3410,0,88,,1,Urban,Y 48 | LP001138,Male,Yes,1,Graduate,No,5649,0,44,360,1,Urban,Y 49 | LP001144,Male,Yes,0,Graduate,No,5821,0,144,360,1,Urban,Y 50 | LP001146,Female,Yes,0,Graduate,No,2645,3440,120,360,0,Urban,N 51 | LP001151,Female,No,0,Graduate,No,4000,2275,144,360,1,Semiurban,Y 52 | LP001155,Female,Yes,0,Not Graduate,No,1928,1644,100,360,1,Semiurban,Y 53 | LP001157,Female,No,0,Graduate,No,3086,0,120,360,1,Semiurban,Y 54 | 
LP001164,Female,No,0,Graduate,No,4230,0,112,360,1,Semiurban,N 55 | LP001179,Male,Yes,2,Graduate,No,4616,0,134,360,1,Urban,N 56 | LP001186,Female,Yes,1,Graduate,Yes,11500,0,286,360,0,Urban,N 57 | LP001194,Male,Yes,2,Graduate,No,2708,1167,97,360,1,Semiurban,Y 58 | LP001195,Male,Yes,0,Graduate,No,2132,1591,96,360,1,Semiurban,Y 59 | LP001197,Male,Yes,0,Graduate,No,3366,2200,135,360,1,Rural,N 60 | LP001198,Male,Yes,1,Graduate,No,8080,2250,180,360,1,Urban,Y 61 | LP001199,Male,Yes,2,Not Graduate,No,3357,2859,144,360,1,Urban,Y 62 | LP001205,Male,Yes,0,Graduate,No,2500,3796,120,360,1,Urban,Y 63 | LP001206,Male,Yes,3+,Graduate,No,3029,0,99,360,1,Urban,Y 64 | LP001207,Male,Yes,0,Not Graduate,Yes,2609,3449,165,180,0,Rural,N 65 | LP001213,Male,Yes,1,Graduate,No,4945,0,,360,0,Rural,N 66 | LP001222,Female,No,0,Graduate,No,4166,0,116,360,0,Semiurban,N 67 | LP001225,Male,Yes,0,Graduate,No,5726,4595,258,360,1,Semiurban,N 68 | LP001228,Male,No,0,Not Graduate,No,3200,2254,126,180,0,Urban,N 69 | LP001233,Male,Yes,1,Graduate,No,10750,0,312,360,1,Urban,Y 70 | LP001238,Male,Yes,3+,Not Graduate,Yes,7100,0,125,60,1,Urban,Y 71 | LP001241,Female,No,0,Graduate,No,4300,0,136,360,0,Semiurban,N 72 | LP001243,Male,Yes,0,Graduate,No,3208,3066,172,360,1,Urban,Y 73 | LP001245,Male,Yes,2,Not Graduate,Yes,1875,1875,97,360,1,Semiurban,Y 74 | LP001248,Male,No,0,Graduate,No,3500,0,81,300,1,Semiurban,Y 75 | LP001250,Male,Yes,3+,Not Graduate,No,4755,0,95,,0,Semiurban,N 76 | LP001253,Male,Yes,3+,Graduate,Yes,5266,1774,187,360,1,Semiurban,Y 77 | LP001255,Male,No,0,Graduate,No,3750,0,113,480,1,Urban,N 78 | LP001256,Male,No,0,Graduate,No,3750,4750,176,360,1,Urban,N 79 | LP001259,Male,Yes,1,Graduate,Yes,1000,3022,110,360,1,Urban,N 80 | LP001263,Male,Yes,3+,Graduate,No,3167,4000,180,300,0,Semiurban,N 81 | LP001264,Male,Yes,3+,Not Graduate,Yes,3333,2166,130,360,,Semiurban,Y 82 | LP001265,Female,No,0,Graduate,No,3846,0,111,360,1,Semiurban,Y 83 | LP001266,Male,Yes,1,Graduate,Yes,2395,0,,360,1,Semiurban,Y 84 | 
LP001267,Female,Yes,2,Graduate,No,1378,1881,167,360,1,Urban,N 85 | LP001273,Male,Yes,0,Graduate,No,6000,2250,265,360,,Semiurban,N 86 | LP001275,Male,Yes,1,Graduate,No,3988,0,50,240,1,Urban,Y 87 | LP001279,Male,No,0,Graduate,No,2366,2531,136,360,1,Semiurban,Y 88 | LP001280,Male,Yes,2,Not Graduate,No,3333,2000,99,360,,Semiurban,Y 89 | LP001282,Male,Yes,0,Graduate,No,2500,2118,104,360,1,Semiurban,Y 90 | LP001289,Male,No,0,Graduate,No,8566,0,210,360,1,Urban,Y 91 | LP001310,Male,Yes,0,Graduate,No,5695,4167,175,360,1,Semiurban,Y 92 | LP001316,Male,Yes,0,Graduate,No,2958,2900,131,360,1,Semiurban,Y 93 | LP001318,Male,Yes,2,Graduate,No,6250,5654,188,180,1,Semiurban,Y 94 | LP001319,Male,Yes,2,Not Graduate,No,3273,1820,81,360,1,Urban,Y 95 | LP001322,Male,No,0,Graduate,No,4133,0,122,360,1,Semiurban,Y 96 | LP001325,Male,No,0,Not Graduate,No,3620,0,25,120,1,Semiurban,Y 97 | LP001326,Male,No,0,Graduate,,6782,0,,360,,Urban,N 98 | LP001327,Female,Yes,0,Graduate,No,2484,2302,137,360,1,Semiurban,Y 99 | LP001333,Male,Yes,0,Graduate,No,1977,997,50,360,1,Semiurban,Y 100 | LP001334,Male,Yes,0,Not Graduate,No,4188,0,115,180,1,Semiurban,Y 101 | LP001343,Male,Yes,0,Graduate,No,1759,3541,131,360,1,Semiurban,Y 102 | LP001345,Male,Yes,2,Not Graduate,No,4288,3263,133,180,1,Urban,Y 103 | LP001349,Male,No,0,Graduate,No,4843,3806,151,360,1,Semiurban,Y 104 | LP001350,Male,Yes,,Graduate,No,13650,0,,360,1,Urban,Y 105 | LP001356,Male,Yes,0,Graduate,No,4652,3583,,360,1,Semiurban,Y 106 | LP001357,Male,,,Graduate,No,3816,754,160,360,1,Urban,Y 107 | LP001367,Male,Yes,1,Graduate,No,3052,1030,100,360,1,Urban,Y 108 | LP001369,Male,Yes,2,Graduate,No,11417,1126,225,360,1,Urban,Y 109 | LP001370,Male,No,0,Not Graduate,,7333,0,120,360,1,Rural,N 110 | LP001379,Male,Yes,2,Graduate,No,3800,3600,216,360,0,Urban,N 111 | LP001384,Male,Yes,3+,Not Graduate,No,2071,754,94,480,1,Semiurban,Y 112 | LP001385,Male,No,0,Graduate,No,5316,0,136,360,1,Urban,Y 113 | LP001387,Female,Yes,0,Graduate,,2929,2333,139,360,1,Semiurban,Y 
114 | LP001391,Male,Yes,0,Not Graduate,No,3572,4114,152,,0,Rural,N 115 | LP001392,Female,No,1,Graduate,Yes,7451,0,,360,1,Semiurban,Y 116 | LP001398,Male,No,0,Graduate,,5050,0,118,360,1,Semiurban,Y 117 | LP001401,Male,Yes,1,Graduate,No,14583,0,185,180,1,Rural,Y 118 | LP001404,Female,Yes,0,Graduate,No,3167,2283,154,360,1,Semiurban,Y 119 | LP001405,Male,Yes,1,Graduate,No,2214,1398,85,360,,Urban,Y 120 | LP001421,Male,Yes,0,Graduate,No,5568,2142,175,360,1,Rural,N 121 | LP001422,Female,No,0,Graduate,No,10408,0,259,360,1,Urban,Y 122 | LP001426,Male,Yes,,Graduate,No,5667,2667,180,360,1,Rural,Y 123 | LP001430,Female,No,0,Graduate,No,4166,0,44,360,1,Semiurban,Y 124 | LP001431,Female,No,0,Graduate,No,2137,8980,137,360,0,Semiurban,Y 125 | LP001432,Male,Yes,2,Graduate,No,2957,0,81,360,1,Semiurban,Y 126 | LP001439,Male,Yes,0,Not Graduate,No,4300,2014,194,360,1,Rural,Y 127 | LP001443,Female,No,0,Graduate,No,3692,0,93,360,,Rural,Y 128 | LP001448,,Yes,3+,Graduate,No,23803,0,370,360,1,Rural,Y 129 | LP001449,Male,No,0,Graduate,No,3865,1640,,360,1,Rural,Y 130 | LP001451,Male,Yes,1,Graduate,Yes,10513,3850,160,180,0,Urban,N 131 | LP001465,Male,Yes,0,Graduate,No,6080,2569,182,360,,Rural,N 132 | LP001469,Male,No,0,Graduate,Yes,20166,0,650,480,,Urban,Y 133 | LP001473,Male,No,0,Graduate,No,2014,1929,74,360,1,Urban,Y 134 | LP001478,Male,No,0,Graduate,No,2718,0,70,360,1,Semiurban,Y 135 | LP001482,Male,Yes,0,Graduate,Yes,3459,0,25,120,1,Semiurban,Y 136 | LP001487,Male,No,0,Graduate,No,4895,0,102,360,1,Semiurban,Y 137 | LP001488,Male,Yes,3+,Graduate,No,4000,7750,290,360,1,Semiurban,N 138 | LP001489,Female,Yes,0,Graduate,No,4583,0,84,360,1,Rural,N 139 | LP001491,Male,Yes,2,Graduate,Yes,3316,3500,88,360,1,Urban,Y 140 | LP001492,Male,No,0,Graduate,No,14999,0,242,360,0,Semiurban,N 141 | LP001493,Male,Yes,2,Not Graduate,No,4200,1430,129,360,1,Rural,N 142 | LP001497,Male,Yes,2,Graduate,No,5042,2083,185,360,1,Rural,N 143 | LP001498,Male,No,0,Graduate,No,5417,0,168,360,1,Urban,Y 144 | 
LP001504,Male,No,0,Graduate,Yes,6950,0,175,180,1,Semiurban,Y 145 | LP001507,Male,Yes,0,Graduate,No,2698,2034,122,360,1,Semiurban,Y 146 | LP001508,Male,Yes,2,Graduate,No,11757,0,187,180,1,Urban,Y 147 | LP001514,Female,Yes,0,Graduate,No,2330,4486,100,360,1,Semiurban,Y 148 | LP001516,Female,Yes,2,Graduate,No,14866,0,70,360,1,Urban,Y 149 | LP001518,Male,Yes,1,Graduate,No,1538,1425,30,360,1,Urban,Y 150 | LP001519,Female,No,0,Graduate,No,10000,1666,225,360,1,Rural,N 151 | LP001520,Male,Yes,0,Graduate,No,4860,830,125,360,1,Semiurban,Y 152 | LP001528,Male,No,0,Graduate,No,6277,0,118,360,0,Rural,N 153 | LP001529,Male,Yes,0,Graduate,Yes,2577,3750,152,360,1,Rural,Y 154 | LP001531,Male,No,0,Graduate,No,9166,0,244,360,1,Urban,N 155 | LP001532,Male,Yes,2,Not Graduate,No,2281,0,113,360,1,Rural,N 156 | LP001535,Male,No,0,Graduate,No,3254,0,50,360,1,Urban,Y 157 | LP001536,Male,Yes,3+,Graduate,No,39999,0,600,180,0,Semiurban,Y 158 | LP001541,Male,Yes,1,Graduate,No,6000,0,160,360,,Rural,Y 159 | LP001543,Male,Yes,1,Graduate,No,9538,0,187,360,1,Urban,Y 160 | LP001546,Male,No,0,Graduate,,2980,2083,120,360,1,Rural,Y 161 | LP001552,Male,Yes,0,Graduate,No,4583,5625,255,360,1,Semiurban,Y 162 | LP001560,Male,Yes,0,Not Graduate,No,1863,1041,98,360,1,Semiurban,Y 163 | LP001562,Male,Yes,0,Graduate,No,7933,0,275,360,1,Urban,N 164 | LP001565,Male,Yes,1,Graduate,No,3089,1280,121,360,0,Semiurban,N 165 | LP001570,Male,Yes,2,Graduate,No,4167,1447,158,360,1,Rural,Y 166 | LP001572,Male,Yes,0,Graduate,No,9323,0,75,180,1,Urban,Y 167 | LP001574,Male,Yes,0,Graduate,No,3707,3166,182,,1,Rural,Y 168 | LP001577,Female,Yes,0,Graduate,No,4583,0,112,360,1,Rural,N 169 | LP001578,Male,Yes,0,Graduate,No,2439,3333,129,360,1,Rural,Y 170 | LP001579,Male,No,0,Graduate,No,2237,0,63,480,0,Semiurban,N 171 | LP001580,Male,Yes,2,Graduate,No,8000,0,200,360,1,Semiurban,Y 172 | LP001581,Male,Yes,0,Not Graduate,,1820,1769,95,360,1,Rural,Y 173 | LP001585,,Yes,3+,Graduate,No,51763,0,700,300,1,Urban,Y 174 | LP001586,Male,Yes,3+,Not 
Graduate,No,3522,0,81,180,1,Rural,N 175 | LP001594,Male,Yes,0,Graduate,No,5708,5625,187,360,1,Semiurban,Y 176 | LP001603,Male,Yes,0,Not Graduate,Yes,4344,736,87,360,1,Semiurban,N 177 | LP001606,Male,Yes,0,Graduate,No,3497,1964,116,360,1,Rural,Y 178 | LP001608,Male,Yes,2,Graduate,No,2045,1619,101,360,1,Rural,Y 179 | LP001610,Male,Yes,3+,Graduate,No,5516,11300,495,360,0,Semiurban,N 180 | LP001616,Male,Yes,1,Graduate,No,3750,0,116,360,1,Semiurban,Y 181 | LP001630,Male,No,0,Not Graduate,No,2333,1451,102,480,0,Urban,N 182 | LP001633,Male,Yes,1,Graduate,No,6400,7250,180,360,0,Urban,N 183 | LP001634,Male,No,0,Graduate,No,1916,5063,67,360,,Rural,N 184 | LP001636,Male,Yes,0,Graduate,No,4600,0,73,180,1,Semiurban,Y 185 | LP001637,Male,Yes,1,Graduate,No,33846,0,260,360,1,Semiurban,N 186 | LP001639,Female,Yes,0,Graduate,No,3625,0,108,360,1,Semiurban,Y 187 | LP001640,Male,Yes,0,Graduate,Yes,39147,4750,120,360,1,Semiurban,Y 188 | LP001641,Male,Yes,1,Graduate,Yes,2178,0,66,300,0,Rural,N 189 | LP001643,Male,Yes,0,Graduate,No,2383,2138,58,360,,Rural,Y 190 | LP001644,,Yes,0,Graduate,Yes,674,5296,168,360,1,Rural,Y 191 | LP001647,Male,Yes,0,Graduate,No,9328,0,188,180,1,Rural,Y 192 | LP001653,Male,No,0,Not Graduate,No,4885,0,48,360,1,Rural,Y 193 | LP001656,Male,No,0,Graduate,No,12000,0,164,360,1,Semiurban,N 194 | LP001657,Male,Yes,0,Not Graduate,No,6033,0,160,360,1,Urban,N 195 | LP001658,Male,No,0,Graduate,No,3858,0,76,360,1,Semiurban,Y 196 | LP001664,Male,No,0,Graduate,No,4191,0,120,360,1,Rural,Y 197 | LP001665,Male,Yes,1,Graduate,No,3125,2583,170,360,1,Semiurban,N 198 | LP001666,Male,No,0,Graduate,No,8333,3750,187,360,1,Rural,Y 199 | LP001669,Female,No,0,Not Graduate,No,1907,2365,120,,1,Urban,Y 200 | LP001671,Female,Yes,0,Graduate,No,3416,2816,113,360,,Semiurban,Y 201 | LP001673,Male,No,0,Graduate,Yes,11000,0,83,360,1,Urban,N 202 | LP001674,Male,Yes,1,Not Graduate,No,2600,2500,90,360,1,Semiurban,Y 203 | LP001677,Male,No,2,Graduate,No,4923,0,166,360,0,Semiurban,Y 204 | 
LP001682,Male,Yes,3+,Not Graduate,No,3992,0,,180,1,Urban,N 205 | LP001688,Male,Yes,1,Not Graduate,No,3500,1083,135,360,1,Urban,Y 206 | LP001691,Male,Yes,2,Not Graduate,No,3917,0,124,360,1,Semiurban,Y 207 | LP001692,Female,No,0,Not Graduate,No,4408,0,120,360,1,Semiurban,Y 208 | LP001693,Female,No,0,Graduate,No,3244,0,80,360,1,Urban,Y 209 | LP001698,Male,No,0,Not Graduate,No,3975,2531,55,360,1,Rural,Y 210 | LP001699,Male,No,0,Graduate,No,2479,0,59,360,1,Urban,Y 211 | LP001702,Male,No,0,Graduate,No,3418,0,127,360,1,Semiurban,N 212 | LP001708,Female,No,0,Graduate,No,10000,0,214,360,1,Semiurban,N 213 | LP001711,Male,Yes,3+,Graduate,No,3430,1250,128,360,0,Semiurban,N 214 | LP001713,Male,Yes,1,Graduate,Yes,7787,0,240,360,1,Urban,Y 215 | LP001715,Male,Yes,3+,Not Graduate,Yes,5703,0,130,360,1,Rural,Y 216 | LP001716,Male,Yes,0,Graduate,No,3173,3021,137,360,1,Urban,Y 217 | LP001720,Male,Yes,3+,Not Graduate,No,3850,983,100,360,1,Semiurban,Y 218 | LP001722,Male,Yes,0,Graduate,No,150,1800,135,360,1,Rural,N 219 | LP001726,Male,Yes,0,Graduate,No,3727,1775,131,360,1,Semiurban,Y 220 | LP001732,Male,Yes,2,Graduate,,5000,0,72,360,0,Semiurban,N 221 | LP001734,Female,Yes,2,Graduate,No,4283,2383,127,360,,Semiurban,Y 222 | LP001736,Male,Yes,0,Graduate,No,2221,0,60,360,0,Urban,N 223 | LP001743,Male,Yes,2,Graduate,No,4009,1717,116,360,1,Semiurban,Y 224 | LP001744,Male,No,0,Graduate,No,2971,2791,144,360,1,Semiurban,Y 225 | LP001749,Male,Yes,0,Graduate,No,7578,1010,175,,1,Semiurban,Y 226 | LP001750,Male,Yes,0,Graduate,No,6250,0,128,360,1,Semiurban,Y 227 | LP001751,Male,Yes,0,Graduate,No,3250,0,170,360,1,Rural,N 228 | LP001754,Male,Yes,,Not Graduate,Yes,4735,0,138,360,1,Urban,N 229 | LP001758,Male,Yes,2,Graduate,No,6250,1695,210,360,1,Semiurban,Y 230 | LP001760,Male,,,Graduate,No,4758,0,158,480,1,Semiurban,Y 231 | LP001761,Male,No,0,Graduate,Yes,6400,0,200,360,1,Rural,Y 232 | LP001765,Male,Yes,1,Graduate,No,2491,2054,104,360,1,Semiurban,Y 233 | 
LP001768,Male,Yes,0,Graduate,,3716,0,42,180,1,Rural,Y 234 | LP001770,Male,No,0,Not Graduate,No,3189,2598,120,,1,Rural,Y 235 | LP001776,Female,No,0,Graduate,No,8333,0,280,360,1,Semiurban,Y 236 | LP001778,Male,Yes,1,Graduate,No,3155,1779,140,360,1,Semiurban,Y 237 | LP001784,Male,Yes,1,Graduate,No,5500,1260,170,360,1,Rural,Y 238 | LP001786,Male,Yes,0,Graduate,,5746,0,255,360,,Urban,N 239 | LP001788,Female,No,0,Graduate,Yes,3463,0,122,360,,Urban,Y 240 | LP001790,Female,No,1,Graduate,No,3812,0,112,360,1,Rural,Y 241 | LP001792,Male,Yes,1,Graduate,No,3315,0,96,360,1,Semiurban,Y 242 | LP001798,Male,Yes,2,Graduate,No,5819,5000,120,360,1,Rural,Y 243 | LP001800,Male,Yes,1,Not Graduate,No,2510,1983,140,180,1,Urban,N 244 | LP001806,Male,No,0,Graduate,No,2965,5701,155,60,1,Urban,Y 245 | LP001807,Male,Yes,2,Graduate,Yes,6250,1300,108,360,1,Rural,Y 246 | LP001811,Male,Yes,0,Not Graduate,No,3406,4417,123,360,1,Semiurban,Y 247 | LP001813,Male,No,0,Graduate,Yes,6050,4333,120,180,1,Urban,N 248 | LP001814,Male,Yes,2,Graduate,No,9703,0,112,360,1,Urban,Y 249 | LP001819,Male,Yes,1,Not Graduate,No,6608,0,137,180,1,Urban,Y 250 | LP001824,Male,Yes,1,Graduate,No,2882,1843,123,480,1,Semiurban,Y 251 | LP001825,Male,Yes,0,Graduate,No,1809,1868,90,360,1,Urban,Y 252 | LP001835,Male,Yes,0,Not Graduate,No,1668,3890,201,360,0,Semiurban,N 253 | LP001836,Female,No,2,Graduate,No,3427,0,138,360,1,Urban,N 254 | LP001841,Male,No,0,Not Graduate,Yes,2583,2167,104,360,1,Rural,Y 255 | LP001843,Male,Yes,1,Not Graduate,No,2661,7101,279,180,1,Semiurban,Y 256 | LP001844,Male,No,0,Graduate,Yes,16250,0,192,360,0,Urban,N 257 | LP001846,Female,No,3+,Graduate,No,3083,0,255,360,1,Rural,Y 258 | LP001849,Male,No,0,Not Graduate,No,6045,0,115,360,0,Rural,N 259 | LP001854,Male,Yes,3+,Graduate,No,5250,0,94,360,1,Urban,N 260 | LP001859,Male,Yes,0,Graduate,No,14683,2100,304,360,1,Rural,N 261 | LP001864,Male,Yes,3+,Not Graduate,No,4931,0,128,360,,Semiurban,N 262 | LP001865,Male,Yes,1,Graduate,No,6083,4250,330,360,,Urban,Y 263 | 
LP001868,Male,No,0,Graduate,No,2060,2209,134,360,1,Semiurban,Y 264 | LP001870,Female,No,1,Graduate,No,3481,0,155,36,1,Semiurban,N 265 | LP001871,Female,No,0,Graduate,No,7200,0,120,360,1,Rural,Y 266 | LP001872,Male,No,0,Graduate,Yes,5166,0,128,360,1,Semiurban,Y 267 | LP001875,Male,No,0,Graduate,No,4095,3447,151,360,1,Rural,Y 268 | LP001877,Male,Yes,2,Graduate,No,4708,1387,150,360,1,Semiurban,Y 269 | LP001882,Male,Yes,3+,Graduate,No,4333,1811,160,360,0,Urban,Y 270 | LP001883,Female,No,0,Graduate,,3418,0,135,360,1,Rural,N 271 | LP001884,Female,No,1,Graduate,No,2876,1560,90,360,1,Urban,Y 272 | LP001888,Female,No,0,Graduate,No,3237,0,30,360,1,Urban,Y 273 | LP001891,Male,Yes,0,Graduate,No,11146,0,136,360,1,Urban,Y 274 | LP001892,Male,No,0,Graduate,No,2833,1857,126,360,1,Rural,Y 275 | LP001894,Male,Yes,0,Graduate,No,2620,2223,150,360,1,Semiurban,Y 276 | LP001896,Male,Yes,2,Graduate,No,3900,0,90,360,1,Semiurban,Y 277 | LP001900,Male,Yes,1,Graduate,No,2750,1842,115,360,1,Semiurban,Y 278 | LP001903,Male,Yes,0,Graduate,No,3993,3274,207,360,1,Semiurban,Y 279 | LP001904,Male,Yes,0,Graduate,No,3103,1300,80,360,1,Urban,Y 280 | LP001907,Male,Yes,0,Graduate,No,14583,0,436,360,1,Semiurban,Y 281 | LP001908,Female,Yes,0,Not Graduate,No,4100,0,124,360,,Rural,Y 282 | LP001910,Male,No,1,Not Graduate,Yes,4053,2426,158,360,0,Urban,N 283 | LP001914,Male,Yes,0,Graduate,No,3927,800,112,360,1,Semiurban,Y 284 | LP001915,Male,Yes,2,Graduate,No,2301,985.7999878,78,180,1,Urban,Y 285 | LP001917,Female,No,0,Graduate,No,1811,1666,54,360,1,Urban,Y 286 | LP001922,Male,Yes,0,Graduate,No,20667,0,,360,1,Rural,N 287 | LP001924,Male,No,0,Graduate,No,3158,3053,89,360,1,Rural,Y 288 | LP001925,Female,No,0,Graduate,Yes,2600,1717,99,300,1,Semiurban,N 289 | LP001926,Male,Yes,0,Graduate,No,3704,2000,120,360,1,Rural,Y 290 | LP001931,Female,No,0,Graduate,No,4124,0,115,360,1,Semiurban,Y 291 | LP001935,Male,No,0,Graduate,No,9508,0,187,360,1,Rural,Y 292 | LP001936,Male,Yes,0,Graduate,No,3075,2416,139,360,1,Rural,Y 293 
| LP001938,Male,Yes,2,Graduate,No,4400,0,127,360,0,Semiurban,N 294 | LP001940,Male,Yes,2,Graduate,No,3153,1560,134,360,1,Urban,Y 295 | LP001945,Female,No,,Graduate,No,5417,0,143,480,0,Urban,N 296 | LP001947,Male,Yes,0,Graduate,No,2383,3334,172,360,1,Semiurban,Y 297 | LP001949,Male,Yes,3+,Graduate,,4416,1250,110,360,1,Urban,Y 298 | LP001953,Male,Yes,1,Graduate,No,6875,0,200,360,1,Semiurban,Y 299 | LP001954,Female,Yes,1,Graduate,No,4666,0,135,360,1,Urban,Y 300 | LP001955,Female,No,0,Graduate,No,5000,2541,151,480,1,Rural,N 301 | LP001963,Male,Yes,1,Graduate,No,2014,2925,113,360,1,Urban,N 302 | LP001964,Male,Yes,0,Not Graduate,No,1800,2934,93,360,0,Urban,N 303 | LP001972,Male,Yes,,Not Graduate,No,2875,1750,105,360,1,Semiurban,Y 304 | LP001974,Female,No,0,Graduate,No,5000,0,132,360,1,Rural,Y 305 | LP001977,Male,Yes,1,Graduate,No,1625,1803,96,360,1,Urban,Y 306 | LP001978,Male,No,0,Graduate,No,4000,2500,140,360,1,Rural,Y 307 | LP001990,Male,No,0,Not Graduate,No,2000,0,,360,1,Urban,N 308 | LP001993,Female,No,0,Graduate,No,3762,1666,135,360,1,Rural,Y 309 | LP001994,Female,No,0,Graduate,No,2400,1863,104,360,0,Urban,N 310 | LP001996,Male,No,0,Graduate,No,20233,0,480,360,1,Rural,N 311 | LP001998,Male,Yes,2,Not Graduate,No,7667,0,185,360,,Rural,Y 312 | LP002002,Female,No,0,Graduate,No,2917,0,84,360,1,Semiurban,Y 313 | LP002004,Male,No,0,Not Graduate,No,2927,2405,111,360,1,Semiurban,Y 314 | LP002006,Female,No,0,Graduate,No,2507,0,56,360,1,Rural,Y 315 | LP002008,Male,Yes,2,Graduate,Yes,5746,0,144,84,,Rural,Y 316 | LP002024,,Yes,0,Graduate,No,2473,1843,159,360,1,Rural,N 317 | LP002031,Male,Yes,1,Not Graduate,No,3399,1640,111,180,1,Urban,Y 318 | LP002035,Male,Yes,2,Graduate,No,3717,0,120,360,1,Semiurban,Y 319 | LP002036,Male,Yes,0,Graduate,No,2058,2134,88,360,,Urban,Y 320 | LP002043,Female,No,1,Graduate,No,3541,0,112,360,,Semiurban,Y 321 | LP002050,Male,Yes,1,Graduate,Yes,10000,0,155,360,1,Rural,N 322 | LP002051,Male,Yes,0,Graduate,No,2400,2167,115,360,1,Semiurban,Y 323 | 
LP002053,Male,Yes,3+,Graduate,No,4342,189,124,360,1,Semiurban,Y 324 | LP002054,Male,Yes,2,Not Graduate,No,3601,1590,,360,1,Rural,Y 325 | LP002055,Female,No,0,Graduate,No,3166,2985,132,360,,Rural,Y 326 | LP002065,Male,Yes,3+,Graduate,No,15000,0,300,360,1,Rural,Y 327 | LP002067,Male,Yes,1,Graduate,Yes,8666,4983,376,360,0,Rural,N 328 | LP002068,Male,No,0,Graduate,No,4917,0,130,360,0,Rural,Y 329 | LP002082,Male,Yes,0,Graduate,Yes,5818,2160,184,360,1,Semiurban,Y 330 | LP002086,Female,Yes,0,Graduate,No,4333,2451,110,360,1,Urban,N 331 | LP002087,Female,No,0,Graduate,No,2500,0,67,360,1,Urban,Y 332 | LP002097,Male,No,1,Graduate,No,4384,1793,117,360,1,Urban,Y 333 | LP002098,Male,No,0,Graduate,No,2935,0,98,360,1,Semiurban,Y 334 | LP002100,Male,No,,Graduate,No,2833,0,71,360,1,Urban,Y 335 | LP002101,Male,Yes,0,Graduate,,63337,0,490,180,1,Urban,Y 336 | LP002103,,Yes,1,Graduate,Yes,9833,1833,182,180,1,Urban,Y 337 | LP002106,Male,Yes,,Graduate,Yes,5503,4490,70,,1,Semiurban,Y 338 | LP002110,Male,Yes,1,Graduate,,5250,688,160,360,1,Rural,Y 339 | LP002112,Male,Yes,2,Graduate,Yes,2500,4600,176,360,1,Rural,Y 340 | LP002113,Female,No,3+,Not Graduate,No,1830,0,,360,0,Urban,N 341 | LP002114,Female,No,0,Graduate,No,4160,0,71,360,1,Semiurban,Y 342 | LP002115,Male,Yes,3+,Not Graduate,No,2647,1587,173,360,1,Rural,N 343 | LP002116,Female,No,0,Graduate,No,2378,0,46,360,1,Rural,N 344 | LP002119,Male,Yes,1,Not Graduate,No,4554,1229,158,360,1,Urban,Y 345 | LP002126,Male,Yes,3+,Not Graduate,No,3173,0,74,360,1,Semiurban,Y 346 | LP002128,Male,Yes,2,Graduate,,2583,2330,125,360,1,Rural,Y 347 | LP002129,Male,Yes,0,Graduate,No,2499,2458,160,360,1,Semiurban,Y 348 | LP002130,Male,Yes,,Not Graduate,No,3523,3230,152,360,0,Rural,N 349 | LP002131,Male,Yes,2,Not Graduate,No,3083,2168,126,360,1,Urban,Y 350 | LP002137,Male,Yes,0,Graduate,No,6333,4583,259,360,,Semiurban,Y 351 | LP002138,Male,Yes,0,Graduate,No,2625,6250,187,360,1,Rural,Y 352 | LP002139,Male,Yes,0,Graduate,No,9083,0,228,360,1,Semiurban,Y 353 | 
LP002140,Male,No,0,Graduate,No,8750,4167,308,360,1,Rural,N 354 | LP002141,Male,Yes,3+,Graduate,No,2666,2083,95,360,1,Rural,Y 355 | LP002142,Female,Yes,0,Graduate,Yes,5500,0,105,360,0,Rural,N 356 | LP002143,Female,Yes,0,Graduate,No,2423,505,130,360,1,Semiurban,Y 357 | LP002144,Female,No,,Graduate,No,3813,0,116,180,1,Urban,Y 358 | LP002149,Male,Yes,2,Graduate,No,8333,3167,165,360,1,Rural,Y 359 | LP002151,Male,Yes,1,Graduate,No,3875,0,67,360,1,Urban,N 360 | LP002158,Male,Yes,0,Not Graduate,No,3000,1666,100,480,0,Urban,N 361 | LP002160,Male,Yes,3+,Graduate,No,5167,3167,200,360,1,Semiurban,Y 362 | LP002161,Female,No,1,Graduate,No,4723,0,81,360,1,Semiurban,N 363 | LP002170,Male,Yes,2,Graduate,No,5000,3667,236,360,1,Semiurban,Y 364 | LP002175,Male,Yes,0,Graduate,No,4750,2333,130,360,1,Urban,Y 365 | LP002178,Male,Yes,0,Graduate,No,3013,3033,95,300,,Urban,Y 366 | LP002180,Male,No,0,Graduate,Yes,6822,0,141,360,1,Rural,Y 367 | LP002181,Male,No,0,Not Graduate,No,6216,0,133,360,1,Rural,N 368 | LP002187,Male,No,0,Graduate,No,2500,0,96,480,1,Semiurban,N 369 | LP002188,Male,No,0,Graduate,No,5124,0,124,,0,Rural,N 370 | LP002190,Male,Yes,1,Graduate,No,6325,0,175,360,1,Semiurban,Y 371 | LP002191,Male,Yes,0,Graduate,No,19730,5266,570,360,1,Rural,N 372 | LP002194,Female,No,0,Graduate,Yes,15759,0,55,360,1,Semiurban,Y 373 | LP002197,Male,Yes,2,Graduate,No,5185,0,155,360,1,Semiurban,Y 374 | LP002201,Male,Yes,2,Graduate,Yes,9323,7873,380,300,1,Rural,Y 375 | LP002205,Male,No,1,Graduate,No,3062,1987,111,180,0,Urban,N 376 | LP002209,Female,No,0,Graduate,,2764,1459,110,360,1,Urban,Y 377 | LP002211,Male,Yes,0,Graduate,No,4817,923,120,180,1,Urban,Y 378 | LP002219,Male,Yes,3+,Graduate,No,8750,4996,130,360,1,Rural,Y 379 | LP002223,Male,Yes,0,Graduate,No,4310,0,130,360,,Semiurban,Y 380 | LP002224,Male,No,0,Graduate,No,3069,0,71,480,1,Urban,N 381 | LP002225,Male,Yes,2,Graduate,No,5391,0,130,360,1,Urban,Y 382 | LP002226,Male,Yes,0,Graduate,,3333,2500,128,360,1,Semiurban,Y 383 | 
LP002229,Male,No,0,Graduate,No,5941,4232,296,360,1,Semiurban,Y 384 | LP002231,Female,No,0,Graduate,No,6000,0,156,360,1,Urban,Y 385 | LP002234,Male,No,0,Graduate,Yes,7167,0,128,360,1,Urban,Y 386 | LP002236,Male,Yes,2,Graduate,No,4566,0,100,360,1,Urban,N 387 | LP002237,Male,No,1,Graduate,,3667,0,113,180,1,Urban,Y 388 | LP002239,Male,No,0,Not Graduate,No,2346,1600,132,360,1,Semiurban,Y 389 | LP002243,Male,Yes,0,Not Graduate,No,3010,3136,,360,0,Urban,N 390 | LP002244,Male,Yes,0,Graduate,No,2333,2417,136,360,1,Urban,Y 391 | LP002250,Male,Yes,0,Graduate,No,5488,0,125,360,1,Rural,Y 392 | LP002255,Male,No,3+,Graduate,No,9167,0,185,360,1,Rural,Y 393 | LP002262,Male,Yes,3+,Graduate,No,9504,0,275,360,1,Rural,Y 394 | LP002263,Male,Yes,0,Graduate,No,2583,2115,120,360,,Urban,Y 395 | LP002265,Male,Yes,2,Not Graduate,No,1993,1625,113,180,1,Semiurban,Y 396 | LP002266,Male,Yes,2,Graduate,No,3100,1400,113,360,1,Urban,Y 397 | LP002272,Male,Yes,2,Graduate,No,3276,484,135,360,,Semiurban,Y 398 | LP002277,Female,No,0,Graduate,No,3180,0,71,360,0,Urban,N 399 | LP002281,Male,Yes,0,Graduate,No,3033,1459,95,360,1,Urban,Y 400 | LP002284,Male,No,0,Not Graduate,No,3902,1666,109,360,1,Rural,Y 401 | LP002287,Female,No,0,Graduate,No,1500,1800,103,360,0,Semiurban,N 402 | LP002288,Male,Yes,2,Not Graduate,No,2889,0,45,180,0,Urban,N 403 | LP002296,Male,No,0,Not Graduate,No,2755,0,65,300,1,Rural,N 404 | LP002297,Male,No,0,Graduate,No,2500,20000,103,360,1,Semiurban,Y 405 | LP002300,Female,No,0,Not Graduate,No,1963,0,53,360,1,Semiurban,Y 406 | LP002301,Female,No,0,Graduate,Yes,7441,0,194,360,1,Rural,N 407 | LP002305,Female,No,0,Graduate,No,4547,0,115,360,1,Semiurban,Y 408 | LP002308,Male,Yes,0,Not Graduate,No,2167,2400,115,360,1,Urban,Y 409 | LP002314,Female,No,0,Not Graduate,No,2213,0,66,360,1,Rural,Y 410 | LP002315,Male,Yes,1,Graduate,No,8300,0,152,300,0,Semiurban,N 411 | LP002317,Male,Yes,3+,Graduate,No,81000,0,360,360,0,Rural,N 412 | LP002318,Female,No,1,Not Graduate,Yes,3867,0,62,360,1,Semiurban,N 413 
| LP002319,Male,Yes,0,Graduate,,6256,0,160,360,,Urban,Y 414 | LP002328,Male,Yes,0,Not Graduate,No,6096,0,218,360,0,Rural,N 415 | LP002332,Male,Yes,0,Not Graduate,No,2253,2033,110,360,1,Rural,Y 416 | LP002335,Female,Yes,0,Not Graduate,No,2149,3237,178,360,0,Semiurban,N 417 | LP002337,Female,No,0,Graduate,No,2995,0,60,360,1,Urban,Y 418 | LP002341,Female,No,1,Graduate,No,2600,0,160,360,1,Urban,N 419 | LP002342,Male,Yes,2,Graduate,Yes,1600,20000,239,360,1,Urban,N 420 | LP002345,Male,Yes,0,Graduate,No,1025,2773,112,360,1,Rural,Y 421 | LP002347,Male,Yes,0,Graduate,No,3246,1417,138,360,1,Semiurban,Y 422 | LP002348,Male,Yes,0,Graduate,No,5829,0,138,360,1,Rural,Y 423 | LP002357,Female,No,0,Not Graduate,No,2720,0,80,,0,Urban,N 424 | LP002361,Male,Yes,0,Graduate,No,1820,1719,100,360,1,Urban,Y 425 | LP002362,Male,Yes,1,Graduate,No,7250,1667,110,,0,Urban,N 426 | LP002364,Male,Yes,0,Graduate,No,14880,0,96,360,1,Semiurban,Y 427 | LP002366,Male,Yes,0,Graduate,No,2666,4300,121,360,1,Rural,Y 428 | LP002367,Female,No,1,Not Graduate,No,4606,0,81,360,1,Rural,N 429 | LP002368,Male,Yes,2,Graduate,No,5935,0,133,360,1,Semiurban,Y 430 | LP002369,Male,Yes,0,Graduate,No,2920,16.12000084,87,360,1,Rural,Y 431 | LP002370,Male,No,0,Not Graduate,No,2717,0,60,180,1,Urban,Y 432 | LP002377,Female,No,1,Graduate,Yes,8624,0,150,360,1,Semiurban,Y 433 | LP002379,Male,No,0,Graduate,No,6500,0,105,360,0,Rural,N 434 | LP002386,Male,No,0,Graduate,,12876,0,405,360,1,Semiurban,Y 435 | LP002387,Male,Yes,0,Graduate,No,2425,2340,143,360,1,Semiurban,Y 436 | LP002390,Male,No,0,Graduate,No,3750,0,100,360,1,Urban,Y 437 | LP002393,Female,,,Graduate,No,10047,0,,240,1,Semiurban,Y 438 | LP002398,Male,No,0,Graduate,No,1926,1851,50,360,1,Semiurban,Y 439 | LP002401,Male,Yes,0,Graduate,No,2213,1125,,360,1,Urban,Y 440 | LP002403,Male,No,0,Graduate,Yes,10416,0,187,360,0,Urban,N 441 | LP002407,Female,Yes,0,Not Graduate,Yes,7142,0,138,360,1,Rural,Y 442 | LP002408,Male,No,0,Graduate,No,3660,5064,187,360,1,Semiurban,Y 443 | 
LP002409,Male,Yes,0,Graduate,No,7901,1833,180,360,1,Rural,Y 444 | LP002418,Male,No,3+,Not Graduate,No,4707,1993,148,360,1,Semiurban,Y 445 | LP002422,Male,No,1,Graduate,No,37719,0,152,360,1,Semiurban,Y 446 | LP002424,Male,Yes,0,Graduate,No,7333,8333,175,300,,Rural,Y 447 | LP002429,Male,Yes,1,Graduate,Yes,3466,1210,130,360,1,Rural,Y 448 | LP002434,Male,Yes,2,Not Graduate,No,4652,0,110,360,1,Rural,Y 449 | LP002435,Male,Yes,0,Graduate,,3539,1376,55,360,1,Rural,N 450 | LP002443,Male,Yes,2,Graduate,No,3340,1710,150,360,0,Rural,N 451 | LP002444,Male,No,1,Not Graduate,Yes,2769,1542,190,360,,Semiurban,N 452 | LP002446,Male,Yes,2,Not Graduate,No,2309,1255,125,360,0,Rural,N 453 | LP002447,Male,Yes,2,Not Graduate,No,1958,1456,60,300,,Urban,Y 454 | LP002448,Male,Yes,0,Graduate,No,3948,1733,149,360,0,Rural,N 455 | LP002449,Male,Yes,0,Graduate,No,2483,2466,90,180,0,Rural,Y 456 | LP002453,Male,No,0,Graduate,Yes,7085,0,84,360,1,Semiurban,Y 457 | LP002455,Male,Yes,2,Graduate,No,3859,0,96,360,1,Semiurban,Y 458 | LP002459,Male,Yes,0,Graduate,No,4301,0,118,360,1,Urban,Y 459 | LP002467,Male,Yes,0,Graduate,No,3708,2569,173,360,1,Urban,N 460 | LP002472,Male,No,2,Graduate,No,4354,0,136,360,1,Rural,Y 461 | LP002473,Male,Yes,0,Graduate,No,8334,0,160,360,1,Semiurban,N 462 | LP002478,,Yes,0,Graduate,Yes,2083,4083,160,360,,Semiurban,Y 463 | LP002484,Male,Yes,3+,Graduate,No,7740,0,128,180,1,Urban,Y 464 | LP002487,Male,Yes,0,Graduate,No,3015,2188,153,360,1,Rural,Y 465 | LP002489,Female,No,1,Not Graduate,,5191,0,132,360,1,Semiurban,Y 466 | LP002493,Male,No,0,Graduate,No,4166,0,98,360,0,Semiurban,N 467 | LP002494,Male,No,0,Graduate,No,6000,0,140,360,1,Rural,Y 468 | LP002500,Male,Yes,3+,Not Graduate,No,2947,1664,70,180,0,Urban,N 469 | LP002501,,Yes,0,Graduate,No,16692,0,110,360,1,Semiurban,Y 470 | LP002502,Female,Yes,2,Not Graduate,,210,2917,98,360,1,Semiurban,Y 471 | LP002505,Male,Yes,0,Graduate,No,4333,2451,110,360,1,Urban,N 472 | LP002515,Male,Yes,1,Graduate,Yes,3450,2079,162,360,1,Semiurban,Y 
473 | LP002517,Male,Yes,1,Not Graduate,No,2653,1500,113,180,0,Rural,N 474 | LP002519,Male,Yes,3+,Graduate,No,4691,0,100,360,1,Semiurban,Y 475 | LP002522,Female,No,0,Graduate,Yes,2500,0,93,360,,Urban,Y 476 | LP002524,Male,No,2,Graduate,No,5532,4648,162,360,1,Rural,Y 477 | LP002527,Male,Yes,2,Graduate,Yes,16525,1014,150,360,1,Rural,Y 478 | LP002529,Male,Yes,2,Graduate,No,6700,1750,230,300,1,Semiurban,Y 479 | LP002530,,Yes,2,Graduate,No,2873,1872,132,360,0,Semiurban,N 480 | LP002531,Male,Yes,1,Graduate,Yes,16667,2250,86,360,1,Semiurban,Y 481 | LP002533,Male,Yes,2,Graduate,No,2947,1603,,360,1,Urban,N 482 | LP002534,Female,No,0,Not Graduate,No,4350,0,154,360,1,Rural,Y 483 | LP002536,Male,Yes,3+,Not Graduate,No,3095,0,113,360,1,Rural,Y 484 | LP002537,Male,Yes,0,Graduate,No,2083,3150,128,360,1,Semiurban,Y 485 | LP002541,Male,Yes,0,Graduate,No,10833,0,234,360,1,Semiurban,Y 486 | LP002543,Male,Yes,2,Graduate,No,8333,0,246,360,1,Semiurban,Y 487 | LP002544,Male,Yes,1,Not Graduate,No,1958,2436,131,360,1,Rural,Y 488 | LP002545,Male,No,2,Graduate,No,3547,0,80,360,0,Rural,N 489 | LP002547,Male,Yes,1,Graduate,No,18333,0,500,360,1,Urban,N 490 | LP002555,Male,Yes,2,Graduate,Yes,4583,2083,160,360,1,Semiurban,Y 491 | LP002556,Male,No,0,Graduate,No,2435,0,75,360,1,Urban,N 492 | LP002560,Male,No,0,Not Graduate,No,2699,2785,96,360,,Semiurban,Y 493 | LP002562,Male,Yes,1,Not Graduate,No,5333,1131,186,360,,Urban,Y 494 | LP002571,Male,No,0,Not Graduate,No,3691,0,110,360,1,Rural,Y 495 | LP002582,Female,No,0,Not Graduate,Yes,17263,0,225,360,1,Semiurban,Y 496 | LP002585,Male,Yes,0,Graduate,No,3597,2157,119,360,0,Rural,N 497 | LP002586,Female,Yes,1,Graduate,No,3326,913,105,84,1,Semiurban,Y 498 | LP002587,Male,Yes,0,Not Graduate,No,2600,1700,107,360,1,Rural,Y 499 | LP002588,Male,Yes,0,Graduate,No,4625,2857,111,12,,Urban,Y 500 | LP002600,Male,Yes,1,Graduate,Yes,2895,0,95,360,1,Semiurban,Y 501 | LP002602,Male,No,0,Graduate,No,6283,4416,209,360,0,Rural,N 502 | 
LP002603,Female,No,0,Graduate,No,645,3683,113,480,1,Rural,Y 503 | LP002606,Female,No,0,Graduate,No,3159,0,100,360,1,Semiurban,Y 504 | LP002615,Male,Yes,2,Graduate,No,4865,5624,208,360,1,Semiurban,Y 505 | LP002618,Male,Yes,1,Not Graduate,No,4050,5302,138,360,,Rural,N 506 | LP002619,Male,Yes,0,Not Graduate,No,3814,1483,124,300,1,Semiurban,Y 507 | LP002622,Male,Yes,2,Graduate,No,3510,4416,243,360,1,Rural,Y 508 | LP002624,Male,Yes,0,Graduate,No,20833,6667,480,360,,Urban,Y 509 | LP002625,,No,0,Graduate,No,3583,0,96,360,1,Urban,N 510 | LP002626,Male,Yes,0,Graduate,Yes,2479,3013,188,360,1,Urban,Y 511 | LP002634,Female,No,1,Graduate,No,13262,0,40,360,1,Urban,Y 512 | LP002637,Male,No,0,Not Graduate,No,3598,1287,100,360,1,Rural,N 513 | LP002640,Male,Yes,1,Graduate,No,6065,2004,250,360,1,Semiurban,Y 514 | LP002643,Male,Yes,2,Graduate,No,3283,2035,148,360,1,Urban,Y 515 | LP002648,Male,Yes,0,Graduate,No,2130,6666,70,180,1,Semiurban,N 516 | LP002652,Male,No,0,Graduate,No,5815,3666,311,360,1,Rural,N 517 | LP002659,Male,Yes,3+,Graduate,No,3466,3428,150,360,1,Rural,Y 518 | LP002670,Female,Yes,2,Graduate,No,2031,1632,113,480,1,Semiurban,Y 519 | LP002682,Male,Yes,,Not Graduate,No,3074,1800,123,360,0,Semiurban,N 520 | LP002683,Male,No,0,Graduate,No,4683,1915,185,360,1,Semiurban,N 521 | LP002684,Female,No,0,Not Graduate,No,3400,0,95,360,1,Rural,N 522 | LP002689,Male,Yes,2,Not Graduate,No,2192,1742,45,360,1,Semiurban,Y 523 | LP002690,Male,No,0,Graduate,No,2500,0,55,360,1,Semiurban,Y 524 | LP002692,Male,Yes,3+,Graduate,Yes,5677,1424,100,360,1,Rural,Y 525 | LP002693,Male,Yes,2,Graduate,Yes,7948,7166,480,360,1,Rural,Y 526 | LP002697,Male,No,0,Graduate,No,4680,2087,,360,1,Semiurban,N 527 | LP002699,Male,Yes,2,Graduate,Yes,17500,0,400,360,1,Rural,Y 528 | LP002705,Male,Yes,0,Graduate,No,3775,0,110,360,1,Semiurban,Y 529 | LP002706,Male,Yes,1,Not Graduate,No,5285,1430,161,360,0,Semiurban,Y 530 | LP002714,Male,No,1,Not Graduate,No,2679,1302,94,360,1,Semiurban,Y 531 | LP002716,Male,No,0,Not 
Graduate,No,6783,0,130,360,1,Semiurban,Y 532 | LP002717,Male,Yes,0,Graduate,No,1025,5500,216,360,,Rural,Y 533 | LP002720,Male,Yes,3+,Graduate,No,4281,0,100,360,1,Urban,Y 534 | LP002723,Male,No,2,Graduate,No,3588,0,110,360,0,Rural,N 535 | LP002729,Male,No,1,Graduate,No,11250,0,196,360,,Semiurban,N 536 | LP002731,Female,No,0,Not Graduate,Yes,18165,0,125,360,1,Urban,Y 537 | LP002732,Male,No,0,Not Graduate,,2550,2042,126,360,1,Rural,Y 538 | LP002734,Male,Yes,0,Graduate,No,6133,3906,324,360,1,Urban,Y 539 | LP002738,Male,No,2,Graduate,No,3617,0,107,360,1,Semiurban,Y 540 | LP002739,Male,Yes,0,Not Graduate,No,2917,536,66,360,1,Rural,N 541 | LP002740,Male,Yes,3+,Graduate,No,6417,0,157,180,1,Rural,Y 542 | LP002741,Female,Yes,1,Graduate,No,4608,2845,140,180,1,Semiurban,Y 543 | LP002743,Female,No,0,Graduate,No,2138,0,99,360,0,Semiurban,N 544 | LP002753,Female,No,1,Graduate,,3652,0,95,360,1,Semiurban,Y 545 | LP002755,Male,Yes,1,Not Graduate,No,2239,2524,128,360,1,Urban,Y 546 | LP002757,Female,Yes,0,Not Graduate,No,3017,663,102,360,,Semiurban,Y 547 | LP002767,Male,Yes,0,Graduate,No,2768,1950,155,360,1,Rural,Y 548 | LP002768,Male,No,0,Not Graduate,No,3358,0,80,36,1,Semiurban,N 549 | LP002772,Male,No,0,Graduate,No,2526,1783,145,360,1,Rural,Y 550 | LP002776,Female,No,0,Graduate,No,5000,0,103,360,0,Semiurban,N 551 | LP002777,Male,Yes,0,Graduate,No,2785,2016,110,360,1,Rural,Y 552 | LP002778,Male,Yes,2,Graduate,Yes,6633,0,,360,0,Rural,N 553 | LP002784,Male,Yes,1,Not Graduate,No,2492,2375,,360,1,Rural,Y 554 | LP002785,Male,Yes,1,Graduate,No,3333,3250,158,360,1,Urban,Y 555 | LP002788,Male,Yes,0,Not Graduate,No,2454,2333,181,360,0,Urban,N 556 | LP002789,Male,Yes,0,Graduate,No,3593,4266,132,180,0,Rural,N 557 | LP002792,Male,Yes,1,Graduate,No,5468,1032,26,360,1,Semiurban,Y 558 | LP002794,Female,No,0,Graduate,No,2667,1625,84,360,,Urban,Y 559 | LP002795,Male,Yes,3+,Graduate,Yes,10139,0,260,360,1,Semiurban,Y 560 | LP002798,Male,Yes,0,Graduate,No,3887,2669,162,360,1,Semiurban,Y 561 | 
LP002804,Female,Yes,0,Graduate,No,4180,2306,182,360,1,Semiurban,Y 562 | LP002807,Male,Yes,2,Not Graduate,No,3675,242,108,360,1,Semiurban,Y 563 | LP002813,Female,Yes,1,Graduate,Yes,19484,0,600,360,1,Semiurban,Y 564 | LP002820,Male,Yes,0,Graduate,No,5923,2054,211,360,1,Rural,Y 565 | LP002821,Male,No,0,Not Graduate,Yes,5800,0,132,360,1,Semiurban,Y 566 | LP002832,Male,Yes,2,Graduate,No,8799,0,258,360,0,Urban,N 567 | LP002833,Male,Yes,0,Not Graduate,No,4467,0,120,360,,Rural,Y 568 | LP002836,Male,No,0,Graduate,No,3333,0,70,360,1,Urban,Y 569 | LP002837,Male,Yes,3+,Graduate,No,3400,2500,123,360,0,Rural,N 570 | LP002840,Female,No,0,Graduate,No,2378,0,9,360,1,Urban,N 571 | LP002841,Male,Yes,0,Graduate,No,3166,2064,104,360,0,Urban,N 572 | LP002842,Male,Yes,1,Graduate,No,3417,1750,186,360,1,Urban,Y 573 | LP002847,Male,Yes,,Graduate,No,5116,1451,165,360,0,Urban,N 574 | LP002855,Male,Yes,2,Graduate,No,16666,0,275,360,1,Urban,Y 575 | LP002862,Male,Yes,2,Not Graduate,No,6125,1625,187,480,1,Semiurban,N 576 | LP002863,Male,Yes,3+,Graduate,No,6406,0,150,360,1,Semiurban,N 577 | LP002868,Male,Yes,2,Graduate,No,3159,461,108,84,1,Urban,Y 578 | LP002872,,Yes,0,Graduate,No,3087,2210,136,360,0,Semiurban,N 579 | LP002874,Male,No,0,Graduate,No,3229,2739,110,360,1,Urban,Y 580 | LP002877,Male,Yes,1,Graduate,No,1782,2232,107,360,1,Rural,Y 581 | LP002888,Male,No,0,Graduate,,3182,2917,161,360,1,Urban,Y 582 | LP002892,Male,Yes,2,Graduate,No,6540,0,205,360,1,Semiurban,Y 583 | LP002893,Male,No,0,Graduate,No,1836,33837,90,360,1,Urban,N 584 | LP002894,Female,Yes,0,Graduate,No,3166,0,36,360,1,Semiurban,Y 585 | LP002898,Male,Yes,1,Graduate,No,1880,0,61,360,,Rural,N 586 | LP002911,Male,Yes,1,Graduate,No,2787,1917,146,360,0,Rural,N 587 | LP002912,Male,Yes,1,Graduate,No,4283,3000,172,84,1,Rural,N 588 | LP002916,Male,Yes,0,Graduate,No,2297,1522,104,360,1,Urban,Y 589 | LP002917,Female,No,0,Not Graduate,No,2165,0,70,360,1,Semiurban,Y 590 | LP002925,,No,0,Graduate,No,4750,0,94,360,1,Semiurban,Y 591 | 
LP002926,Male,Yes,2,Graduate,Yes,2726,0,106,360,0,Semiurban,N 592 | LP002928,Male,Yes,0,Graduate,No,3000,3416,56,180,1,Semiurban,Y 593 | LP002931,Male,Yes,2,Graduate,Yes,6000,0,205,240,1,Semiurban,N 594 | LP002933,,No,3+,Graduate,Yes,9357,0,292,360,1,Semiurban,Y 595 | LP002936,Male,Yes,0,Graduate,No,3859,3300,142,180,1,Rural,Y 596 | LP002938,Male,Yes,0,Graduate,Yes,16120,0,260,360,1,Urban,Y 597 | LP002940,Male,No,0,Not Graduate,No,3833,0,110,360,1,Rural,Y 598 | LP002941,Male,Yes,2,Not Graduate,Yes,6383,1000,187,360,1,Rural,N 599 | LP002943,Male,No,,Graduate,No,2987,0,88,360,0,Semiurban,N 600 | LP002945,Male,Yes,0,Graduate,Yes,9963,0,180,360,1,Rural,Y 601 | LP002948,Male,Yes,2,Graduate,No,5780,0,192,360,1,Urban,Y 602 | LP002949,Female,No,3+,Graduate,,416,41667,350,180,,Urban,N 603 | LP002950,Male,Yes,0,Not Graduate,,2894,2792,155,360,1,Rural,Y 604 | LP002953,Male,Yes,3+,Graduate,No,5703,0,128,360,1,Urban,Y 605 | LP002958,Male,No,0,Graduate,No,3676,4301,172,360,1,Rural,Y 606 | LP002959,Female,Yes,1,Graduate,No,12000,0,496,360,1,Semiurban,Y 607 | LP002960,Male,Yes,0,Not Graduate,No,2400,3800,,180,1,Urban,N 608 | LP002961,Male,Yes,1,Graduate,No,3400,2500,173,360,1,Semiurban,Y 609 | LP002964,Male,Yes,2,Not Graduate,No,3987,1411,157,360,1,Rural,Y 610 | LP002974,Male,Yes,0,Graduate,No,3232,1950,108,360,1,Rural,Y 611 | LP002978,Female,No,0,Graduate,No,2900,0,71,360,1,Rural,Y 612 | LP002979,Male,Yes,3+,Graduate,No,4106,0,40,180,1,Rural,Y 613 | LP002983,Male,Yes,1,Graduate,No,8072,240,253,360,1,Urban,Y 614 | LP002984,Male,Yes,2,Graduate,No,7583,0,187,360,1,Urban,Y 615 | LP002990,Female,No,0,Graduate,Yes,4583,0,133,360,0,Semiurban,N 616 | -------------------------------------------------------------------------------- /flask_api/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pratos/flask_api/2be7eded1e7167a64b895e700adab3c60355e186/flask_api/__init__.py 
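A minimal sketch of the JSON round-trip that the `/predict` endpoint in `flask_api/server.py` relies on: the client serializes a DataFrame with `to_json(orient='records')`, and `read_json` on the server side re-infers dtypes, which is why `server.py` casts the `Dependents` column back to `str`. The row values here are illustrative (a subset of the `data/training.csv` schema), and wrapping the payload in `StringIO` is an assumption for compatibility with newer pandas; the pinned pandas 0.20.3 accepts a literal JSON string directly.

```python
from io import StringIO

import pandas as pd

# One illustrative row using a subset of the data/training.csv schema.
df = pd.DataFrame([{
    "Loan_ID": "LP001002",
    "Gender": "Male",
    "Dependents": "0",   # kept as a string in the CSV: '0', '1', '2', '3+'
    "ApplicantIncome": 5849,
    "LoanAmount": 130.0,
}])

# Client side: serialize the frame into the POST body.
payload = df.to_json(orient="records")

# Server side: parse the body back into a frame. StringIO avoids the
# literal-string deprecation in newer pandas; 0.20.x accepts the raw string.
test = pd.read_json(StringIO(payload), orient="records")

# read_json re-infers dtypes, so numeric-looking strings ('0') may come back
# as integers -- the same issue server.py works around for 'Dependents'.
test["Dependents"] = [str(x) for x in test["Dependents"]]

print(sorted(test.columns) == sorted(df.columns))  # True
```

A client would then POST `payload` to the running server (e.g. `requests.post(url, data=payload, headers={'Content-Type': 'application/json'})`), mirroring what the notebook does.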
-------------------------------------------------------------------------------- /flask_api/flask_api.yml: -------------------------------------------------------------------------------- 1 | name: flask_api 2 | channels: 3 | - defaults 4 | dependencies: 5 | - certifi=2016.2.28=py36_0 6 | - openssl=1.0.2l=0 7 | - pip=9.0.1=py36_1 8 | - python=3.6.2=0 9 | - readline=6.2=2 10 | - setuptools=36.4.0=py36_1 11 | - sqlite=3.13.0=0 12 | - tk=8.5.18=0 13 | - wheel=0.29.0=py36_0 14 | - xz=5.2.3=0 15 | - zlib=1.2.11=0 16 | - pip: 17 | - chardet==3.0.4 18 | - click==6.7 19 | - dill==0.2.7.1 20 | - falcon==1.2.0 21 | - flask==0.12.2 22 | - gunicorn==19.7.1 23 | - hug==2.3.1 24 | - idna==2.6 25 | - itsdangerous==0.24 26 | - jinja2==2.9.6 27 | - markupsafe==1.0 28 | - numpy==1.13.1 29 | - pandas==0.20.3 30 | - python-dateutil==2.6.1 31 | - python-mimeparse==1.6.0 32 | - pytz==2017.2 33 | - requests==2.18.4 34 | - scikit-learn==0.19.0 35 | - scipy==0.19.1 36 | - six==1.10.0 37 | - urllib3==1.22 38 | - werkzeug==0.12.2 39 | prefix: /home/pratos/miniconda3/envs/flask_api 40 | 41 | -------------------------------------------------------------------------------- /flask_api/hello-world.py: -------------------------------------------------------------------------------- 1 | from flask import Flask 2 | 3 | app = Flask(__name__) 4 | 5 | @app.route('/users/<string:username>') 6 | def hello_world(username='MyName'): 7 | return("Hello {}!".format(username)) 8 | -------------------------------------------------------------------------------- /flask_api/models/model_v1.pk: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pratos/flask_api/2be7eded1e7167a64b895e700adab3c60355e186/flask_api/models/model_v1.pk -------------------------------------------------------------------------------- /flask_api/requirements.txt: -------------------------------------------------------------------------------- 1 | certifi==2017.7.27.1 2 | chardet==3.0.4 3 |
click==6.7 4 | falcon==1.2.0 5 | flask>=0.12.3 6 | gunicorn==19.7.1 7 | hug==2.3.1 8 | idna==2.6 9 | itsdangerous==0.24 10 | Jinja2==2.9.6 11 | MarkupSafe==1.0 12 | numpy==1.13.1 13 | pandas==0.20.3 14 | python-dateutil==2.6.1 15 | python-mimeparse==1.6.0 16 | pytz==2017.2 17 | requests>=2.20.0 18 | scikit-learn==0.19.0 19 | scipy==0.19.1 20 | six==1.10.0 21 | urllib3==1.22 22 | Werkzeug==0.12.2 23 | -------------------------------------------------------------------------------- /flask_api/server.py: -------------------------------------------------------------------------------- 1 | import json 2 | import pandas as pd 3 | import dill as pickle 4 | from flask import Flask, jsonify, request 5 | from utils import PreProcessing 6 | 7 | app = Flask(__name__) 8 | 9 | @app.route('/predict', methods=['POST']) 10 | def apicall(): 11 | """POST /predict 12 | 13 | Expects a pandas DataFrame serialized as JSON (orient='records') in the request body. 14 | """ 15 | try: 16 | test_json = request.get_json() 17 | test = pd.read_json(json.dumps(test_json), orient='records') 18 | 19 | #To resolve the issue of TypeError: Cannot compare types 'ndarray(dtype=int64)' and 'str' 20 | test['Dependents'] = [str(x) for x in list(test['Dependents'])] 21 | 22 | #Getting the Loan_IDs separated out 23 | loan_ids = test['Loan_ID'] 24 | 25 | except Exception as e: 26 | raise e 27 | 28 | clf = 'model_v1.pk' 29 | 30 | if test.empty: 31 | return(bad_request()) 32 | else: 33 | #Load the saved model 34 | print("Loading the model...") 35 | loaded_model = None 36 | with open('./models/'+clf,'rb') as f: 37 | loaded_model = pickle.load(f) 38 | 39 | print("The model has been loaded...doing predictions now...") 40 | predictions = loaded_model.predict(test) 41 | 42 | """Add the predictions as a Series to a new pandas dataframe, 43 | OR, 44 | depending on the use-case, append them as a new column to the test data 45 | """ 46 | prediction_series = list(pd.Series(predictions)) 47 | 48 | final_predictions = pd.DataFrame(list(zip(loan_ids, prediction_series))) 49 |
50 | """We can be as creative in sending the responses. 51 | But we need to send the response codes as well. 52 | """ 53 | responses = jsonify(predictions=final_predictions.to_json(orient="records")) 54 | responses.status_code = 200 55 | 56 | return (responses) 57 | 58 | 59 | @app.errorhandler(400) 60 | def bad_request(error=None): 61 | message = { 62 | 'status': 400, 63 | 'message': 'Bad Request: ' + request.url + '--> Please check your data payload...', 64 | } 65 | resp = jsonify(message) 66 | resp.status_code = 400 67 | 68 | return resp -------------------------------------------------------------------------------- /flask_api/utils.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | import numpy as np 4 | import pandas as pd 5 | import dill as pickle 6 | from sklearn.externals import joblib 7 | from sklearn.model_selection import train_test_split, GridSearchCV 8 | from sklearn.base import BaseEstimator, TransformerMixin 9 | from sklearn.ensemble import RandomForestClassifier 10 | 11 | from sklearn.pipeline import make_pipeline 12 | 13 | import warnings 14 | warnings.filterwarnings("ignore") 15 | 16 | 17 | def build_and_train(): 18 | 19 | data = pd.read_csv('../data/training.csv') 20 | data = data.dropna(subset=['Gender', 'Married', 'Credit_History', 'LoanAmount']) 21 | 22 | pred_var = ['Gender','Married','Dependents','Education','Self_Employed','ApplicantIncome','CoapplicantIncome',\ 23 | 'LoanAmount','Loan_Amount_Term','Credit_History','Property_Area'] 24 | 25 | X_train, X_test, y_train, y_test = train_test_split(data[pred_var], data['Loan_Status'], \ 26 | test_size=0.25, random_state=42) 27 | y_train = y_train.replace({'Y':1, 'N':0}).as_matrix() 28 | y_test = y_test.replace({'Y':1, 'N':0}).as_matrix() 29 | 30 | pipe = make_pipeline(PreProcessing(), 31 | RandomForestClassifier()) 32 | 33 | param_grid = {"randomforestclassifier__n_estimators" : [10, 20, 30], 34 | 
"randomforestclassifier__max_depth" : [None, 6, 8, 10], 35 | "randomforestclassifier__max_leaf_nodes": [None, 5, 10, 20], 36 | "randomforestclassifier__min_impurity_split": [0.1, 0.2, 0.3]} 37 | 38 | grid = GridSearchCV(pipe, param_grid=param_grid, cv=3) 39 | 40 | grid.fit(X_train, y_train) 41 | 42 | return(grid) 43 | 44 | 45 | class PreProcessing(BaseEstimator, TransformerMixin): 46 | """Custom Pre-Processing estimator for our use-case 47 | """ 48 | 49 | def __init__(self): 50 | pass 51 | 52 | def transform(self, df): 53 | """Regular transform() that is a help for training, validation & testing datasets 54 | (NOTE: The operations performed here are the ones that we did prior to this cell) 55 | """ 56 | pred_var = ['Gender','Married','Dependents','Education','Self_Employed','ApplicantIncome',\ 57 | 'CoapplicantIncome','LoanAmount','Loan_Amount_Term','Credit_History','Property_Area'] 58 | 59 | df = df[pred_var] 60 | 61 | df['Dependents'] = df['Dependents'].fillna(0) 62 | df['Self_Employed'] = df['Self_Employed'].fillna('No') 63 | df['Loan_Amount_Term'] = df['Loan_Amount_Term'].fillna(self.term_mean_) 64 | df['Credit_History'] = df['Credit_History'].fillna(1) 65 | df['Married'] = df['Married'].fillna('No') 66 | df['Gender'] = df['Gender'].fillna('Male') 67 | df['LoanAmount'] = df['LoanAmount'].fillna(self.amt_mean_) 68 | 69 | gender_values = {'Female' : 0, 'Male' : 1} 70 | married_values = {'No' : 0, 'Yes' : 1} 71 | education_values = {'Graduate' : 0, 'Not Graduate' : 1} 72 | employed_values = {'No' : 0, 'Yes' : 1} 73 | property_values = {'Rural' : 0, 'Urban' : 1, 'Semiurban' : 2} 74 | dependent_values = {'3+': 3, '0': 0, '2': 2, '1': 1} 75 | df.replace({'Gender': gender_values, 'Married': married_values, 'Education': education_values, \ 76 | 'Self_Employed': employed_values, 'Property_Area': property_values, \ 77 | 'Dependents': dependent_values}, inplace=True) 78 | 79 | return df.as_matrix() 80 | 81 | def fit(self, df, y=None, **fit_params): 82 | """Fitting the 
Training dataset & calculating the required values from train 83 | e.g: We will need the mean of X_train['Loan_Amount_Term'] that will be used in 84 | transformation of X_test 85 | """ 86 | 87 | self.term_mean_ = df['Loan_Amount_Term'].mean() 88 | self.amt_mean_ = df['LoanAmount'].mean() 89 | return self 90 | 91 | if __name__ == '__main__': 92 | model = build_and_train() 93 | 94 | filename = 'model_v1.pk' 95 | with open('../flask_api/models/'+filename, 'wb') as file: 96 | pickle.dump(model, file) -------------------------------------------------------------------------------- /notebooks/AnalyticsVidhya Article - ML Model approach.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "### Analytics Vidhya: Practice Problem (Approach)" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 93, 13 | "metadata": { 14 | "collapsed": true 15 | }, 16 | "outputs": [], 17 | "source": [ 18 | "import os\n", 19 | "import re\n", 20 | "import numpy as np\n", 21 | "import pandas as pd\n", 22 | "\n", 23 | "import warnings\n", 24 | "warnings.filterwarnings(\"ignore\")" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 94, 30 | "metadata": {}, 31 | "outputs": [ 32 | { 33 | "name": "stdout", 34 | "output_type": "stream", 35 | "text": [ 36 | "ls: cannot access '/home/pratos/Side-Project/av_articles/data/': No such file or directory\r\n" 37 | ] 38 | } 39 | ], 40 | "source": [ 41 | "!ls /home/pratos/Side-Project/av_articles/data/" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "Download the training & test data from the Practice Problem approach. 
We'll do a bit of quick investigation on the dataset:" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 95, 54 | "metadata": {}, 55 | "outputs": [], 56 | "source": [ 57 | "data = pd.read_csv('../data/training.csv')" 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": 96, 63 | "metadata": {}, 64 | "outputs": [ 65 | { 66 | "data": { 67 | "text/html": [ 68 | "
\n", 69 | "\n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | "
Loan_IDGenderMarriedDependentsEducationSelf_EmployedApplicantIncomeCoapplicantIncomeLoanAmountLoan_Amount_TermCredit_HistoryProperty_AreaLoan_Status
0LP001002MaleNo0GraduateNo58490.0NaN360.01.0UrbanY
1LP001003MaleYes1GraduateNo45831508.0128.0360.01.0RuralN
2LP001005MaleYes0GraduateYes30000.066.0360.01.0UrbanY
3LP001006MaleYes0Not GraduateNo25832358.0120.0360.01.0UrbanY
4LP001008MaleNo0GraduateNo60000.0141.0360.01.0UrbanY
\n", 171 | "
" 172 | ], 173 | "text/plain": [ 174 | " Loan_ID Gender Married Dependents Education Self_Employed \\\n", 175 | "0 LP001002 Male No 0 Graduate No \n", 176 | "1 LP001003 Male Yes 1 Graduate No \n", 177 | "2 LP001005 Male Yes 0 Graduate Yes \n", 178 | "3 LP001006 Male Yes 0 Not Graduate No \n", 179 | "4 LP001008 Male No 0 Graduate No \n", 180 | "\n", 181 | " ApplicantIncome CoapplicantIncome LoanAmount Loan_Amount_Term \\\n", 182 | "0 5849 0.0 NaN 360.0 \n", 183 | "1 4583 1508.0 128.0 360.0 \n", 184 | "2 3000 0.0 66.0 360.0 \n", 185 | "3 2583 2358.0 120.0 360.0 \n", 186 | "4 6000 0.0 141.0 360.0 \n", 187 | "\n", 188 | " Credit_History Property_Area Loan_Status \n", 189 | "0 1.0 Urban Y \n", 190 | "1 1.0 Rural N \n", 191 | "2 1.0 Urban Y \n", 192 | "3 1.0 Urban Y \n", 193 | "4 1.0 Urban Y " 194 | ] 195 | }, 196 | "execution_count": 96, 197 | "metadata": {}, 198 | "output_type": "execute_result" 199 | } 200 | ], 201 | "source": [ 202 | "data.head()" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 97, 208 | "metadata": {}, 209 | "outputs": [ 210 | { 211 | "name": "stdout", 212 | "output_type": "stream", 213 | "text": [ 214 | "Shape of the data is:(614, 13)\n" 215 | ] 216 | } 217 | ], 218 | "source": [ 219 | "print(\"Shape of the data is:{}\".format(data.shape))" 220 | ] 221 | }, 222 | { 223 | "cell_type": "code", 224 | "execution_count": 98, 225 | "metadata": {}, 226 | "outputs": [ 227 | { 228 | "name": "stdout", 229 | "output_type": "stream", 230 | "text": [ 231 | "List of columns is: ['Loan_ID', 'Gender', 'Married', 'Dependents', 'Education', 'Self_Employed', 'ApplicantIncome', 'CoapplicantIncome', 'LoanAmount', 'Loan_Amount_Term', 'Credit_History', 'Property_Area', 'Loan_Status']\n" 232 | ] 233 | } 234 | ], 235 | "source": [ 236 | "print(\"List of columns is: {}\".format(list(data.columns)))" 237 | ] 238 | }, 239 | { 240 | "cell_type": "markdown", 241 | "metadata": {}, 242 | "source": [ 243 | "Here, `Loan_status` is our `target variable`, 
the rest are `predictor variables`. `Loan_ID` won't help much in predicting `defaulters`, so we won't include that variable in our final model." 244 | ] 245 | }, 246 | { 247 | "cell_type": "markdown", 248 | "metadata": {}, 249 | "source": [ 250 | "Finding out the `null/NaN` values in the columns:" 251 | ] 252 | }, 253 | { 254 | "cell_type": "code", 255 | "execution_count": 99, 256 | "metadata": {}, 257 | "outputs": [ 258 | { 259 | "name": "stdout", 260 | "output_type": "stream", 261 | "text": [ 262 | "The number of null values in:Loan_ID == 0\n", 263 | "The number of null values in:Gender == 13\n", 264 | "The number of null values in:Married == 3\n", 265 | "The number of null values in:Dependents == 15\n", 266 | "The number of null values in:Education == 0\n", 267 | "The number of null values in:Self_Employed == 32\n", 268 | "The number of null values in:ApplicantIncome == 0\n", 269 | "The number of null values in:CoapplicantIncome == 0\n", 270 | "The number of null values in:LoanAmount == 22\n", 271 | "The number of null values in:Loan_Amount_Term == 14\n", 272 | "The number of null values in:Credit_History == 50\n", 273 | "The number of null values in:Property_Area == 0\n", 274 | "The number of null values in:Loan_Status == 0\n" 275 | ] 276 | } 277 | ], 278 | "source": [ 279 | "for _ in data.columns:\n", 280 | " print(\"The number of null values in:{} == {}\".format(_, data[_].isnull().sum()))" 281 | ] 282 | }, 283 | { 284 | "cell_type": "markdown", 285 | "metadata": {}, 286 | "source": [ 287 | "We'll check the unique values (labels) of the columns that have missing values:" 288 | ] 289 | }, 290 | { 291 | "cell_type": "code", 292 | "execution_count": 100, 293 | "metadata": {}, 294 | "outputs": [ 295 | { 296 | "name": "stdout", 297 | "output_type": "stream", 298 | "text": [ 299 | "List of unique labels for Dependents:::{nan, '3+', '0', '2', '1'}\n", 300 | "List of unique labels for Self_Employed:::{nan, 'Yes', 'No'}\n", 301 | "List of unique 
labels for Loan_Amount_Term:::{nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, 12.0, 36.0, 300.0, 180.0, 60.0, 84.0, 480.0, 360.0, 240.0, 120.0}\n", 302 | "List of unique labels for Gender:::{'Male', nan, 'Female'}\n", 303 | "List of unique labels for Married:::{nan, 'Yes', 'No'}\n" 304 | ] 305 | } 306 | ], 307 | "source": [ 308 | "missing_pred = ['Dependents', 'Self_Employed', 'Loan_Amount_Term', 'Gender', 'Married']\n", 309 | "\n", 310 | "for _ in missing_pred:\n", 311 | " print(\"List of unique labels for {}:::{}\".format(_, set(data[_])))" 312 | ] 313 | }, 314 | { 315 | "cell_type": "markdown", 316 | "metadata": { 317 | "collapsed": true 318 | }, 319 | "source": [ 320 | "For the rest of the missing values:\n", 321 | "\n", 322 | "- `Dependents`: Assumption that there are no dependents\n", 323 | "- `Self_Employed`: Assumption that the applicant is not self-employed\n", 324 | "- `Loan_Amount_Term`: Assumption that the loan amount term is the mean value\n", 325 | "- `Credit_History`: Assumption that the person has a credit history\n", 326 | "- `Married`: If nothing is specified, the applicant is not married\n", 327 | "- `Gender`: Assuming the gender is Male for the missing values\n", 328 | "\n", 329 | "Before that, we'll split the dataset into train and test sets." 330 | ] 331 | }, 332 | { 333 | "cell_type": "code", 334 | "execution_count": 101, 335 | "metadata": { 336 | "collapsed": true 337 | }, 338 | "outputs": [], 339 | "source": [ 340 | "from sklearn.model_selection import train_test_split" 341 | ] 342 | }, 343 | { 344 | "cell_type": "code", 345 | "execution_count": 102, 346 | "metadata": {}, 347 | "outputs": [ 348 | { 349 | "data": { 350 | "text/plain": [ 351 | "['Loan_ID',\n", 352 | " 'Gender',\n", 353 | " 'Married',\n", 354 | " 'Dependents',\n", 355 | " 'Education',\n", 356 | " 'Self_Employed',\n", 357 | " 'ApplicantIncome',\n", 358 | " 'CoapplicantIncome',\n", 359 | " 'LoanAmount',\n", 360 | " 'Loan_Amount_Term',\n", 361 | " 'Credit_History',\n", 362 
| " 'Property_Area',\n", 363 | " 'Loan_Status']" 364 | ] 365 | }, 366 | "execution_count": 102, 367 | "metadata": {}, 368 | "output_type": "execute_result" 369 | } 370 | ], 371 | "source": [ 372 | "list(data.columns)" 373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": 103, 378 | "metadata": { 379 | "collapsed": true 380 | }, 381 | "outputs": [], 382 | "source": [ 383 | "pred_var = ['Gender','Married','Dependents','Education','Self_Employed','ApplicantIncome','CoapplicantIncome',\\\n", 384 | " 'LoanAmount','Loan_Amount_Term','Credit_History','Property_Area']" 385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": 104, 390 | "metadata": { 391 | "collapsed": true 392 | }, 393 | "outputs": [], 394 | "source": [ 395 | "X_train, X_test, y_train, y_test = train_test_split(data[pred_var], data['Loan_Status'], \\\n", 396 | " test_size=0.25, random_state=42)" 397 | ] 398 | }, 399 | { 400 | "cell_type": "markdown", 401 | "metadata": {}, 402 | "source": [ 403 | "We'll compile a list of `pre-processing` steps that we do on to create a custom `estimator`." 
404 | ] 405 | }, 406 | { 407 | "cell_type": "code", 408 | "execution_count": 105, 409 | "metadata": { 410 | "collapsed": true 411 | }, 412 | "outputs": [], 413 | "source": [ 414 | "X_train['Dependents'] = X_train['Dependents'].fillna('0')\n", 415 | "X_train['Self_Employed'] = X_train['Self_Employed'].fillna('No')\n", 416 | "X_train['Loan_Amount_Term'] = X_train['Loan_Amount_Term'].fillna(X_train['Loan_Amount_Term'].mean())" 417 | ] 418 | }, 419 | { 420 | "cell_type": "code", 421 | "execution_count": 106, 422 | "metadata": {}, 423 | "outputs": [], 424 | "source": [ 425 | "X_train['Credit_History'] = X_train['Credit_History'].fillna(1)\n", 426 | "X_train['Married'] = X_train['Married'].fillna('No')\n", 427 | "X_train['Gender'] = X_train['Gender'].fillna('Male')" 428 | ] 429 | }, 430 | { 431 | "cell_type": "code", 432 | "execution_count": 113, 433 | "metadata": { 434 | "collapsed": true 435 | }, 436 | "outputs": [], 437 | "source": [ 438 | "X_train['LoanAmount'] = X_train['LoanAmount'].fillna(X_train['LoanAmount'].mean())" 439 | ] 440 | }, 441 | { 442 | "cell_type": "markdown", 443 | "metadata": {}, 444 | "source": [ 445 | "We encounter a lot of `string` labels in the `Gender`, `Married`, `Education`, `Self_Employed`, `Property_Area` & `Dependents` columns."
446 | ] 447 | }, 448 | { 449 | "cell_type": "code", 450 | "execution_count": 107, 451 | "metadata": {}, 452 | "outputs": [ 453 | { 454 | "name": "stdout", 455 | "output_type": "stream", 456 | "text": [ 457 | "List of unique labels Gender:{'Male', 'Female'}\n", 458 | "List of unique labels Married:{'Yes', 'No'}\n", 459 | "List of unique labels Education:{'Not Graduate', 'Graduate'}\n", 460 | "List of unique labels Self_Employed:{'Yes', 'No'}\n", 461 | "List of unique labels Property_Area:{'Semiurban', 'Rural', 'Urban'}\n", 462 | "List of unique labels Dependents:{'3+', '0', '2', '1'}\n" 463 | ] 464 | } 465 | ], 466 | "source": [ 467 | "label_columns = ['Gender', 'Married', 'Education', 'Self_Employed', 'Property_Area', 'Dependents']\n", 468 | "\n", 469 | "for _ in label_columns:\n", 470 | " print(\"List of unique labels {}:{}\".format(_, set(X_train[_])))" 471 | ] 472 | }, 473 | { 474 | "cell_type": "code", 475 | "execution_count": 108, 476 | "metadata": { 477 | "collapsed": true 478 | }, 479 | "outputs": [], 480 | "source": [ 481 | "gender_values = {'Female' : 0, 'Male' : 1} \n", 482 | "married_values = {'No' : 0, 'Yes' : 1}\n", 483 | "education_values = {'Graduate' : 0, 'Not Graduate' : 1}\n", 484 | "employed_values = {'No' : 0, 'Yes' : 1}\n", 485 | "property_values = {'Rural' : 0, 'Urban' : 1, 'Semiurban' : 2}\n", 486 | "dependent_values = {'3+': 3, '0': 0, '2': 2, '1': 1}\n", 487 | "X_train.replace({'Gender': gender_values, 'Married': married_values, 'Education': education_values, \\\n", 488 | " 'Self_Employed': employed_values, 'Property_Area': property_values, 'Dependents': dependent_values}\\\n", 489 | " , inplace=True)" 490 | ] 491 | }, 492 | { 493 | "cell_type": "code", 494 | "execution_count": 109, 495 | "metadata": {}, 496 | "outputs": [ 497 | { 498 | "data": { 499 | "text/html": [ 500 | "
\n", 501 | "\n", 502 | " \n", 503 | " \n", 504 | " \n", 505 | " \n", 506 | " \n", 507 | " \n", 508 | " \n", 509 | " \n", 510 | " \n", 511 | " \n", 512 | " \n", 513 | " \n", 514 | " \n", 515 | " \n", 516 | " \n", 517 | " \n", 518 | " \n", 519 | " \n", 520 | " \n", 521 | " \n", 522 | " \n", 523 | " \n", 524 | " \n", 525 | " \n", 526 | " \n", 527 | " \n", 528 | " \n", 529 | " \n", 530 | " \n", 531 | " \n", 532 | " \n", 533 | " \n", 534 | " \n", 535 | " \n", 536 | " \n", 537 | " \n", 538 | " \n", 539 | " \n", 540 | " \n", 541 | " \n", 542 | " \n", 543 | " \n", 544 | " \n", 545 | " \n", 546 | " \n", 547 | " \n", 548 | " \n", 549 | " \n", 550 | " \n", 551 | " \n", 552 | " \n", 553 | " \n", 554 | " \n", 555 | " \n", 556 | " \n", 557 | " \n", 558 | " \n", 559 | " \n", 560 | " \n", 561 | " \n", 562 | " \n", 563 | " \n", 564 | " \n", 565 | " \n", 566 | " \n", 567 | " \n", 568 | " \n", 569 | " \n", 570 | " \n", 571 | " \n", 572 | " \n", 573 | " \n", 574 | " \n", 575 | " \n", 576 | " \n", 577 | " \n", 578 | " \n", 579 | " \n", 580 | " \n", 581 | " \n", 582 | " \n", 583 | " \n", 584 | " \n", 585 | " \n", 586 | " \n", 587 | " \n", 588 | " \n", 589 | " \n", 590 | "
GenderMarriedDependentsEducationSelf_EmployedApplicantIncomeCoapplicantIncomeLoanAmountLoan_Amount_TermCredit_HistoryProperty_Area
921121032731820.081.0360.01.01
3041000040002500.0140.0360.01.00
681131171000.0125.060.01.01
151000049500.0125.0360.01.01
2111130034301250.0128.0360.00.02
\n", 591 | "
" 592 | ], 593 | "text/plain": [ 594 | " Gender Married Dependents Education Self_Employed ApplicantIncome \\\n", 595 | "92 1 1 2 1 0 3273 \n", 596 | "304 1 0 0 0 0 4000 \n", 597 | "68 1 1 3 1 1 7100 \n", 598 | "15 1 0 0 0 0 4950 \n", 599 | "211 1 1 3 0 0 3430 \n", 600 | "\n", 601 | " CoapplicantIncome LoanAmount Loan_Amount_Term Credit_History \\\n", 602 | "92 1820.0 81.0 360.0 1.0 \n", 603 | "304 2500.0 140.0 360.0 1.0 \n", 604 | "68 0.0 125.0 60.0 1.0 \n", 605 | "15 0.0 125.0 360.0 1.0 \n", 606 | "211 1250.0 128.0 360.0 0.0 \n", 607 | "\n", 608 | " Property_Area \n", 609 | "92 1 \n", 610 | "304 0 \n", 611 | "68 1 \n", 612 | "15 1 \n", 613 | "211 2 " 614 | ] 615 | }, 616 | "execution_count": 109, 617 | "metadata": {}, 618 | "output_type": "execute_result" 619 | } 620 | ], 621 | "source": [ 622 | "X_train.head()" 623 | ] 624 | }, 625 | { 626 | "cell_type": "code", 627 | "execution_count": 110, 628 | "metadata": {}, 629 | "outputs": [ 630 | { 631 | "data": { 632 | "text/plain": [ 633 | "Gender int64\n", 634 | "Married int64\n", 635 | "Dependents int64\n", 636 | "Education int64\n", 637 | "Self_Employed int64\n", 638 | "ApplicantIncome int64\n", 639 | "CoapplicantIncome float64\n", 640 | "LoanAmount float64\n", 641 | "Loan_Amount_Term float64\n", 642 | "Credit_History float64\n", 643 | "Property_Area int64\n", 644 | "dtype: object" 645 | ] 646 | }, 647 | "execution_count": 110, 648 | "metadata": {}, 649 | "output_type": "execute_result" 650 | } 651 | ], 652 | "source": [ 653 | "X_train.dtypes" 654 | ] 655 | }, 656 | { 657 | "cell_type": "code", 658 | "execution_count": 112, 659 | "metadata": {}, 660 | "outputs": [ 661 | { 662 | "name": "stdout", 663 | "output_type": "stream", 664 | "text": [ 665 | "The number of null values in:Gender == 0\n", 666 | "The number of null values in:Married == 0\n", 667 | "The number of null values in:Dependents == 0\n", 668 | "The number of null values in:Education == 0\n", 669 | "The number of null values in:Self_Employed == 0\n", 670 
| "The number of null values in:ApplicantIncome == 0\n", 671 | "The number of null values in:CoapplicantIncome == 0\n", 672 | "The number of null values in:LoanAmount == 16\n", 673 | "The number of null values in:Loan_Amount_Term == 0\n", 674 | "The number of null values in:Credit_History == 0\n", 675 | "The number of null values in:Property_Area == 0\n" 676 | ] 677 | } 678 | ], 679 | "source": [ 680 | "for _ in X_train.columns:\n", 681 | " print(\"The number of null values in:{} == {}\".format(_, X_train[_].isnull().sum()))" 682 | ] 683 | }, 684 | { 685 | "cell_type": "markdown", 686 | "metadata": {}, 687 | "source": [ 688 | "Converting the pandas dataframes to numpy arrays:" 689 | ] 690 | }, 691 | { 692 | "cell_type": "code", 693 | "execution_count": 68, 694 | "metadata": { 695 | "collapsed": true 696 | }, 697 | "outputs": [], 698 | "source": [ 699 | "X_train = X_train.as_matrix()" 700 | ] 701 | }, 702 | { 703 | "cell_type": "code", 704 | "execution_count": 69, 705 | "metadata": {}, 706 | "outputs": [ 707 | { 708 | "data": { 709 | "text/plain": [ 710 | "(460, 11)" 711 | ] 712 | }, 713 | "execution_count": 69, 714 | "metadata": {}, 715 | "output_type": "execute_result" 716 | } 717 | ], 718 | "source": [ 719 | "X_train.shape" 720 | ] 721 | }, 722 | { 723 | "cell_type": "markdown", 724 | "metadata": {}, 725 | "source": [ 726 | "We'll create a custom `pre-processing estimator` that would help us in writing better pipelines and in future deployments:" 727 | ] 728 | }, 729 | { 730 | "cell_type": "code", 731 | "execution_count": 115, 732 | "metadata": {}, 733 | "outputs": [], 734 | "source": [ 735 | "from sklearn.base import BaseEstimator, TransformerMixin\n", 736 | "\n", 737 | "class PreProcessing(BaseEstimator, TransformerMixin):\n", 738 | " \"\"\"Custom Pre-Processing estimator for our use-case\n", 739 | " \"\"\"\n", 740 | "\n", 741 | " def __init__(self):\n", 742 | " pass\n", 743 | "\n", 744 | " def transform(self, df):\n", 745 | " \"\"\"Regular transform() that is 
a help for training, validation & testing datasets\n", 746 | " (NOTE: The operations performed here are the ones that we did prior to this cell)\n", 747 | " \"\"\"\n", 748 | " pred_var = ['Gender','Married','Dependents','Education','Self_Employed','ApplicantIncome','CoapplicantIncome',\\\n", 749 | " 'LoanAmount','Loan_Amount_Term','Credit_History','Property_Area']\n", 750 | " \n", 751 | " df = df[pred_var]\n", 752 | " \n", 753 | " df['Dependents'] = df['Dependents'].fillna(0)\n", 754 | " df['Self_Employed'] = df['Self_Employed'].fillna('No')\n", 755 | " df['Loan_Amount_Term'] = df['Loan_Amount_Term'].fillna(self.term_mean_)\n", 756 | " df['Credit_History'] = df['Credit_History'].fillna(1)\n", 757 | " df['Married'] = df['Married'].fillna('No')\n", 758 | " df['Gender'] = df['Gender'].fillna('Male')\n", 759 | " df['LoanAmount'] = df['LoanAmount'].fillna(self.amt_mean_)\n", 760 | " \n", 761 | " gender_values = {'Female' : 0, 'Male' : 1} \n", 762 | " married_values = {'No' : 0, 'Yes' : 1}\n", 763 | " education_values = {'Graduate' : 0, 'Not Graduate' : 1}\n", 764 | " employed_values = {'No' : 0, 'Yes' : 1}\n", 765 | " property_values = {'Rural' : 0, 'Urban' : 1, 'Semiurban' : 2}\n", 766 | " dependent_values = {'3+': 3, '0': 0, '2': 2, '1': 1}\n", 767 | " df.replace({'Gender': gender_values, 'Married': married_values, 'Education': education_values, \\\n", 768 | " 'Self_Employed': employed_values, 'Property_Area': property_values, \\\n", 769 | " 'Dependents': dependent_values}, inplace=True)\n", 770 | " \n", 771 | " return df.as_matrix()\n", 772 | "\n", 773 | " def fit(self, df, y=None, **fit_params):\n", 774 | " \"\"\"Fitting the Training dataset & calculating the required values from train\n", 775 | " e.g: We will need the mean of X_train['Loan_Amount_Term'] that will be used in\n", 776 | " transformation of X_test\n", 777 | " \"\"\"\n", 778 | " \n", 779 | " self.term_mean_ = df['Loan_Amount_Term'].mean()\n", 780 | " self.amt_mean_ = df['LoanAmount'].mean()\n", 781 | " 
return self" 782 | ] 783 | }, 784 | { 785 | "cell_type": "markdown", 786 | "metadata": {}, 787 | "source": [ 788 | "To make sure that this works, let's do a test run for it:" 789 | ] 790 | }, 791 | { 792 | "cell_type": "code", 793 | "execution_count": 116, 794 | "metadata": { 795 | "collapsed": true 796 | }, 797 | "outputs": [], 798 | "source": [ 799 | "X_train, X_test, y_train, y_test = train_test_split(data[pred_var], data['Loan_Status'], \\\n", 800 | " test_size=0.25, random_state=42)" 801 | ] 802 | }, 803 | { 804 | "cell_type": "code", 805 | "execution_count": 117, 806 | "metadata": {}, 807 | "outputs": [ 808 | { 809 | "data": { 810 | "text/html": [ 811 | "
\n", 812 | "\n", 813 | " \n", 814 | " \n", 815 | " \n", 816 | " \n", 817 | " \n", 818 | " \n", 819 | " \n", 820 | " \n", 821 | " \n", 822 | " \n", 823 | " \n", 824 | " \n", 825 | " \n", 826 | " \n", 827 | " \n", 828 | " \n", 829 | " \n", 830 | " \n", 831 | " \n", 832 | " \n", 833 | " \n", 834 | " \n", 835 | " \n", 836 | " \n", 837 | " \n", 838 | " \n", 839 | " \n", 840 | " \n", 841 | " \n", 842 | " \n", 843 | " \n", 844 | " \n", 845 | " \n", 846 | " \n", 847 | " \n", 848 | " \n", 849 | " \n", 850 | " \n", 851 | " \n", 852 | " \n", 853 | " \n", 854 | " \n", 855 | " \n", 856 | " \n", 857 | " \n", 858 | " \n", 859 | " \n", 860 | " \n", 861 | " \n", 862 | " \n", 863 | " \n", 864 | " \n", 865 | " \n", 866 | " \n", 867 | " \n", 868 | " \n", 869 | " \n", 870 | " \n", 871 | " \n", 872 | " \n", 873 | " \n", 874 | " \n", 875 | " \n", 876 | " \n", 877 | " \n", 878 | " \n", 879 | " \n", 880 | " \n", 881 | " \n", 882 | " \n", 883 | " \n", 884 | " \n", 885 | " \n", 886 | " \n", 887 | " \n", 888 | " \n", 889 | " \n", 890 | " \n", 891 | " \n", 892 | " \n", 893 | " \n", 894 | " \n", 895 | " \n", 896 | " \n", 897 | " \n", 898 | " \n", 899 | " \n", 900 | " \n", 901 | "
GenderMarriedDependentsEducationSelf_EmployedApplicantIncomeCoapplicantIncomeLoanAmountLoan_Amount_TermCredit_HistoryProperty_Area
92MaleYes2Not GraduateNo32731820.081.0360.01.0Urban
304MaleNo0GraduateNo40002500.0140.0360.01.0Rural
68MaleYes3+Not GraduateYes71000.0125.060.01.0Urban
15MaleNo0GraduateNo49500.0125.0360.01.0Urban
211MaleYes3+GraduateNo34301250.0128.0360.00.0Semiurban
\n", 902 | "
" 903 | ], 904 | "text/plain": [ 905 | " Gender Married Dependents Education Self_Employed ApplicantIncome \\\n", 906 | "92 Male Yes 2 Not Graduate No 3273 \n", 907 | "304 Male No 0 Graduate No 4000 \n", 908 | "68 Male Yes 3+ Not Graduate Yes 7100 \n", 909 | "15 Male No 0 Graduate No 4950 \n", 910 | "211 Male Yes 3+ Graduate No 3430 \n", 911 | "\n", 912 | " CoapplicantIncome LoanAmount Loan_Amount_Term Credit_History \\\n", 913 | "92 1820.0 81.0 360.0 1.0 \n", 914 | "304 2500.0 140.0 360.0 1.0 \n", 915 | "68 0.0 125.0 60.0 1.0 \n", 916 | "15 0.0 125.0 360.0 1.0 \n", 917 | "211 1250.0 128.0 360.0 0.0 \n", 918 | "\n", 919 | " Property_Area \n", 920 | "92 Urban \n", 921 | "304 Rural \n", 922 | "68 Urban \n", 923 | "15 Urban \n", 924 | "211 Semiurban " 925 | ] 926 | }, 927 | "execution_count": 117, 928 | "metadata": {}, 929 | "output_type": "execute_result" 930 | } 931 | ], 932 | "source": [ 933 | "X_train.head()" 934 | ] 935 | }, 936 | { 937 | "cell_type": "code", 938 | "execution_count": 120, 939 | "metadata": {}, 940 | "outputs": [ 941 | { 942 | "name": "stdout", 943 | "output_type": "stream", 944 | "text": [ 945 | "The number of null values in:Gender == 11\n", 946 | "The number of null values in:Married == 1\n", 947 | "The number of null values in:Dependents == 11\n", 948 | "The number of null values in:Education == 0\n", 949 | "The number of null values in:Self_Employed == 20\n", 950 | "The number of null values in:ApplicantIncome == 0\n", 951 | "The number of null values in:CoapplicantIncome == 0\n", 952 | "The number of null values in:LoanAmount == 16\n", 953 | "The number of null values in:Loan_Amount_Term == 11\n", 954 | "The number of null values in:Credit_History == 36\n", 955 | "The number of null values in:Property_Area == 0\n" 956 | ] 957 | } 958 | ], 959 | "source": [ 960 | "for _ in X_train.columns:\n", 961 | " print(\"The number of null values in:{} == {}\".format(_, X_train[_].isnull().sum()))" 962 | ] 963 | }, 964 | { 965 | "cell_type": "code", 966 | 
"execution_count": 121, 967 | "metadata": { 968 | "collapsed": true 969 | }, 970 | "outputs": [], 971 | "source": [ 972 | "preprocess = PreProcessing()" 973 | ] 974 | }, 975 | { 976 | "cell_type": "code", 977 | "execution_count": 122, 978 | "metadata": {}, 979 | "outputs": [ 980 | { 981 | "data": { 982 | "text/plain": [ 983 | "PreProcessing()" 984 | ] 985 | }, 986 | "execution_count": 122, 987 | "metadata": {}, 988 | "output_type": "execute_result" 989 | } 990 | ], 991 | "source": [ 992 | "preprocess" 993 | ] 994 | }, 995 | { 996 | "cell_type": "code", 997 | "execution_count": 123, 998 | "metadata": {}, 999 | "outputs": [ 1000 | { 1001 | "data": { 1002 | "text/plain": [ 1003 | "PreProcessing()" 1004 | ] 1005 | }, 1006 | "execution_count": 123, 1007 | "metadata": {}, 1008 | "output_type": "execute_result" 1009 | } 1010 | ], 1011 | "source": [ 1012 | "preprocess.fit(X_train)" 1013 | ] 1014 | }, 1015 | { 1016 | "cell_type": "code", 1017 | "execution_count": 124, 1018 | "metadata": { 1019 | "collapsed": true 1020 | }, 1021 | "outputs": [], 1022 | "source": [ 1023 | "X_train_transformed = preprocess.transform(X_train)" 1024 | ] 1025 | }, 1026 | { 1027 | "cell_type": "code", 1028 | "execution_count": 125, 1029 | "metadata": {}, 1030 | "outputs": [ 1031 | { 1032 | "data": { 1033 | "text/plain": [ 1034 | "(460, 11)" 1035 | ] 1036 | }, 1037 | "execution_count": 125, 1038 | "metadata": {}, 1039 | "output_type": "execute_result" 1040 | } 1041 | ], 1042 | "source": [ 1043 | "X_train_transformed.shape" 1044 | ] 1045 | }, 1046 | { 1047 | "cell_type": "markdown", 1048 | "metadata": {}, 1049 | "source": [ 1050 | "So our small experiment with writing a custom `estimator` worked. This will come in handy later." 
1051 | ] 1052 | }, 1053 | { 1054 | "cell_type": "code", 1055 | "execution_count": 126, 1056 | "metadata": { 1057 | "collapsed": true 1058 | }, 1059 | "outputs": [], 1060 | "source": [ 1061 | "X_test_transformed = preprocess.transform(X_test)" 1062 | ] 1063 | }, 1064 | { 1065 | "cell_type": "code", 1066 | "execution_count": 127, 1067 | "metadata": {}, 1068 | "outputs": [ 1069 | { 1070 | "data": { 1071 | "text/plain": [ 1072 | "(154, 11)" 1073 | ] 1074 | }, 1075 | "execution_count": 127, 1076 | "metadata": {}, 1077 | "output_type": "execute_result" 1078 | } 1079 | ], 1080 | "source": [ 1081 | "X_test_transformed.shape" 1082 | ] 1083 | }, 1084 | { 1085 | "cell_type": "code", 1086 | "execution_count": 128, 1087 | "metadata": { 1088 | "collapsed": true 1089 | }, 1090 | "outputs": [], 1091 | "source": [ 1092 | "y_test = y_test.replace({'Y':1, 'N':0}).as_matrix()" 1093 | ] 1094 | }, 1095 | { 1096 | "cell_type": "code", 1097 | "execution_count": 129, 1098 | "metadata": { 1099 | "collapsed": true 1100 | }, 1101 | "outputs": [], 1102 | "source": [ 1103 | "y_train = y_train.replace({'Y':1, 'N':0}).as_matrix()" 1104 | ] 1105 | }, 1106 | { 1107 | "cell_type": "code", 1108 | "execution_count": 130, 1109 | "metadata": { 1110 | "collapsed": true 1111 | }, 1112 | "outputs": [], 1113 | "source": [ 1114 | "param_grid = {\"randomforestclassifier__n_estimators\" : [10, 20, 30],\n", 1115 | " \"randomforestclassifier__max_depth\" : [None, 6, 8, 10],\n", 1116 | " \"randomforestclassifier__max_leaf_nodes\": [None, 5, 10, 20], \n", 1117 | " \"randomforestclassifier__min_impurity_split\": [0.1, 0.2, 0.3]}" 1118 | ] 1119 | }, 1120 | { 1121 | "cell_type": "code", 1122 | "execution_count": 131, 1123 | "metadata": {}, 1124 | "outputs": [], 1125 | "source": [ 1126 | "from sklearn.pipeline import make_pipeline\n", 1127 | "from sklearn.ensemble import RandomForestClassifier\n", 1128 | "\n", 1129 | "pipe = make_pipeline(PreProcessing(),\n", 1130 | " RandomForestClassifier())" 1131 | ] 1132 | }, 1133 
| { 1134 | "cell_type": "code", 1135 | "execution_count": 132, 1136 | "metadata": {}, 1137 | "outputs": [ 1138 | { 1139 | "data": { 1140 | "text/plain": [ 1141 | "Pipeline(memory=None,\n", 1142 | " steps=[('preprocessing', PreProcessing()), ('randomforestclassifier', RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n", 1143 | " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", 1144 | " min_impurity_decrease=0.0, min_impurity_split=None,\n", 1145 | " min_samples_leaf=1, min_samples_split=2,\n", 1146 | " min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,\n", 1147 | " oob_score=False, random_state=None, verbose=0,\n", 1148 | " warm_start=False))])" 1149 | ] 1150 | }, 1151 | "execution_count": 132, 1152 | "metadata": {}, 1153 | "output_type": "execute_result" 1154 | } 1155 | ], 1156 | "source": [ 1157 | "pipe" 1158 | ] 1159 | }, 1160 | { 1161 | "cell_type": "code", 1162 | "execution_count": 133, 1163 | "metadata": { 1164 | "collapsed": true 1165 | }, 1166 | "outputs": [], 1167 | "source": [ 1168 | "from sklearn.model_selection import train_test_split, GridSearchCV\n", 1169 | "\n", 1170 | "grid = GridSearchCV(pipe, param_grid=param_grid, cv=3)" 1171 | ] 1172 | }, 1173 | { 1174 | "cell_type": "code", 1175 | "execution_count": 134, 1176 | "metadata": {}, 1177 | "outputs": [ 1178 | { 1179 | "data": { 1180 | "text/plain": [ 1181 | "GridSearchCV(cv=3, error_score='raise',\n", 1182 | " estimator=Pipeline(memory=None,\n", 1183 | " steps=[('preprocessing', PreProcessing()), ('randomforestclassifier', RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n", 1184 | " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", 1185 | " min_impurity_decrease=0.0, min_impu..._jobs=1,\n", 1186 | " oob_score=False, random_state=None, verbose=0,\n", 1187 | " warm_start=False))]),\n", 1188 | " fit_params=None, iid=True, n_jobs=1,\n", 1189 | " param_grid={'randomforestclassifier__n_estimators': [10, 20, 30], 
'randomforestclassifier__min_impurity_split': [0.1, 0.2, 0.3], 'randomforestclassifier__max_depth': [None, 6, 8, 10], 'randomforestclassifier__max_leaf_nodes': [None, 5, 10, 20]},\n", 1190 | " pre_dispatch='2*n_jobs', refit=True, return_train_score=True,\n", 1191 | " scoring=None, verbose=0)" 1192 | ] 1193 | }, 1194 | "execution_count": 134, 1195 | "metadata": {}, 1196 | "output_type": "execute_result" 1197 | } 1198 | ], 1199 | "source": [ 1200 | "grid" 1201 | ] 1202 | }, 1203 | { 1204 | "cell_type": "code", 1205 | "execution_count": 135, 1206 | "metadata": { 1207 | "collapsed": true 1208 | }, 1209 | "outputs": [], 1210 | "source": [ 1211 | "X_train, X_test, y_train, y_test = train_test_split(data[pred_var], data['Loan_Status'], \\\n", 1212 | " test_size=0.25, random_state=42)" 1213 | ] 1214 | }, 1215 | { 1216 | "cell_type": "code", 1217 | "execution_count": 136, 1218 | "metadata": {}, 1219 | "outputs": [ 1220 | { 1221 | "data": { 1222 | "text/plain": [ 1223 | "GridSearchCV(cv=3, error_score='raise',\n", 1224 | " estimator=Pipeline(memory=None,\n", 1225 | " steps=[('preprocessing', PreProcessing()), ('randomforestclassifier', RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n", 1226 | " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", 1227 | " min_impurity_decrease=0.0, min_impu..._jobs=1,\n", 1228 | " oob_score=False, random_state=None, verbose=0,\n", 1229 | " warm_start=False))]),\n", 1230 | " fit_params=None, iid=True, n_jobs=1,\n", 1231 | " param_grid={'randomforestclassifier__n_estimators': [10, 20, 30], 'randomforestclassifier__min_impurity_split': [0.1, 0.2, 0.3], 'randomforestclassifier__max_depth': [None, 6, 8, 10], 'randomforestclassifier__max_leaf_nodes': [None, 5, 10, 20]},\n", 1232 | " pre_dispatch='2*n_jobs', refit=True, return_train_score=True,\n", 1233 | " scoring=None, verbose=0)" 1234 | ] 1235 | }, 1236 | "execution_count": 136, 1237 | "metadata": {}, 1238 | "output_type": "execute_result" 1239 | } 1240 | ], 
1241 | "source": [ 1242 | "grid.fit(X_train, y_train)" 1243 | ] 1244 | }, 1245 | { 1246 | "cell_type": "code", 1247 | "execution_count": 137, 1248 | "metadata": {}, 1249 | "outputs": [ 1250 | { 1251 | "name": "stdout", 1252 | "output_type": "stream", 1253 | "text": [ 1254 | "Best parameters: {'randomforestclassifier__max_leaf_nodes': 20, 'randomforestclassifier__min_impurity_split': 0.2, 'randomforestclassifier__max_depth': 8, 'randomforestclassifier__n_estimators': 30}\n" 1255 | ] 1256 | } 1257 | ], 1258 | "source": [ 1259 | "print(\"Best parameters: {}\".format(grid.best_params_))" 1260 | ] 1261 | }, 1262 | { 1263 | "cell_type": "code", 1264 | "execution_count": 138, 1265 | "metadata": {}, 1266 | "outputs": [ 1267 | { 1268 | "name": "stdout", 1269 | "output_type": "stream", 1270 | "text": [ 1271 | "Test set score: 0.78\n" 1272 | ] 1273 | } 1274 | ], 1275 | "source": [ 1276 | "print(\"Test set score: {:.2f}\".format(grid.score(X_test, y_test)))" 1277 | ] 1278 | } 1279 | ], 1280 | "metadata": { 1281 | "kernelspec": { 1282 | "display_name": "Python 3", 1283 | "language": "python", 1284 | "name": "python3" 1285 | }, 1286 | "language_info": { 1287 | "codemirror_mode": { 1288 | "name": "ipython", 1289 | "version": 3 1290 | }, 1291 | "file_extension": ".py", 1292 | "mimetype": "text/x-python", 1293 | "name": "python", 1294 | "nbconvert_exporter": "python", 1295 | "pygments_lexer": "ipython3", 1296 | "version": "3.5.2" 1297 | } 1298 | }, 1299 | "nbformat": 4, 1300 | "nbformat_minor": 2 1301 | } 1302 | -------------------------------------------------------------------------------- /notebooks/ML Models as APIs using Flask.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Machine Learning models as APIs using Flask\n", 8 | "\n", 9 | "### Introduction\n", 10 | "\n", 11 | "A lot of Machine Learning (ML) projects, amateur and professional, 
start with aplomb. The early excitement of working on the dataset, answering the obvious & not-so-obvious questions & presenting the results is what every one of us works for. There are compliments thrown around and talk about going to the next step -- that's when the question arises, __How?__\n", 12 | "\n", 13 | "The usual suspects are building dashboards and providing insights. But most often, the real use of your Machine Learning model lies in being at the heart of a product -- it may be a small component of an automated mailer system or a chatbot. These are the times when the barriers seem insurmountable. For example, the majority of ML folks use `R/Python` for their experiments, but the consumers of those ML models are software engineers who use a completely different stack. There are two ways this problem can be solved:\n", 14 | "\n", 15 | "- __Rewriting the whole code in the language that the software engineering folks work in__\n", 16 | "\n", 17 | "This seems like a good idea, but the time & energy required to replicate those intricate models would be an utter waste. The majority of languages, like `JavaScript`, do not have great libraries for ML; one would be wise to stay away from them.\n", 18 | "\n", 19 | "- __API-first approach__\n", 20 | "\n", 21 | "Web APIs have made it easy for cross-language applications to work well. If a frontend developer needs to use your ML model to create an ML-powered web application, they would just need the `URL endpoint` from where the API is being served. 
\n", 22 | "\n", 23 | "The articles below will help you appreciate why APIs are a popular choice amongst developers:\n", 24 | "\n", 25 | "- [History of APIs](http://apievangelist.com/2012/12/20/history-of-apis/)\n", 26 | "- [Introduction to APIs - AV Article](https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-apis-application-programming-interfaces-5-apis-a-data-scientist-must-know/)\n", 27 | "\n", 28 | "The majority of the big cloud providers, and smaller Machine Learning-focussed companies too, provide ready-to-use APIs. They cater to developers/businesses that don't have expertise in ML but want to implement ML in their processes or product suites.\n", 29 | "\n", 30 | "One such example of a Web API on offer is the [Google Vision API](https://cloud.google.com/vision/).\n", 31 | "\n", 32 | "![Google API Suite](http://www.publickey1.jp/2016/gcpnext16.jpg)\n", 33 | "\n", 34 | "All you need is a simple REST call to the API via the SDKs (Software Development Kits) provided by Google. [Click here](https://github.com/GoogleCloudPlatform/cloud-vision/tree/master/python) to get an idea of what can be done with the Google Vision API.\n", 35 | "\n", 36 | "Sounds marvellous, right? In this article, we'll understand how to create our own Machine Learning API using `Flask`, a `Python` web framework. \n", 37 | "\n", 38 | "![Flask](https://upload.wikimedia.org/wikipedia/commons/thumb/3/3c/Flask_logo.svg/640px-Flask_logo.svg.png)\n", 39 | "\n", 40 | "__NOTE:__ `Flask` isn't the only web framework available. There are `Django`, `Falcon`, `Hug` and many more. For `R`, we have a package called [`plumber`](https://github.com/trestletech/plumber).\n", 41 | "\n", 42 | "### Table of Contents:\n", 43 | "\n", 44 | "1. __Python Environment Setup & Flask Basics__\n", 45 | "2. __Creating a Machine Learning Model__\n", 46 | "3. __Saving the Machine Learning Model: Serialization & Deserialization__\n", 47 | "4. __Creating an API using Flask__\n", 48 | "\n", 49 | "### 1. 
Python Environment Setup & Flask Basics\n", 50 | "\n", 51 | "![Anaconda](https://upload.wikimedia.org/wikipedia/en/c/cd/Anaconda_Logo.png)\n", 52 | "\n", 53 | "- Creating a virtual environment using `Anaconda`. If you need to create your workflows in Python and keep the dependencies separated out, or share the environment settings, `Anaconda` distributions are a great option. \n", 54 | " * You'll find a miniconda installation for Python [here](https://conda.io/miniconda.html)\n", 55 | " * `wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh`\n", 56 | " * `bash Miniconda3-latest-Linux-x86_64.sh`\n", 57 | " * Follow the sequence of questions.\n", 58 | " * `source .bashrc`\n", 59 | " * If you run: `conda`, you should be able to get the list of commands & help.\n", 60 | " * To create a new environment, run: `conda create --name <environment-name> python=3.6`\n", 61 | " * Follow the steps & once done run: `source activate <environment-name>`\n", 62 | " * Install the python packages you need; the two important ones are: `flask` & `gunicorn`.\n", 63 | " \n", 64 | " \n", 65 | "- We'll try out a simple `Flask` Hello-World application and serve it using `gunicorn`:\n", 66 | "\n", 67 | " * Open up your favourite text editor and create a `hello-world.py` file in a folder\n", 68 | " * Write the code below:\n", 69 | " ```python\n", 70 | "\n", 71 | " \"\"\"Filename: hello-world.py\n", 72 | " \"\"\"\n", 73 | "\n", 74 | " from flask import Flask\n", 75 | "\n", 76 | " app = Flask(__name__)\n", 77 | "\n", 78 | " @app.route('/users/<string:username>')\n", 79 | " def hello_world(username=None):\n", 80 | "\n", 81 | " return(\"Hello {}!\".format(username))\n", 82 | "\n", 83 | " ```\n", 84 | " * Save the file and return to the terminal.\n", 85 | " * To serve the API (to start running it), execute: `gunicorn --bind 0.0.0.0:8000 hello-world:app` on your terminal.\n", 86 | " \n", 87 | " * If you get the responses below, you are on the right track:\n", 88 | "\n", 89 | " ![Hello 
World](https://raw.githubusercontent.com/pratos/flask_api/master/notebooks/images/flaskapp1.png)\n", 90 | "\n", 91 | " * In your browser, try out: `http://localhost:8000/users/any-name`\n", 92 | "\n", 93 | " ![Browser](https://raw.githubusercontent.com/pratos/flask_api/master/notebooks/images/flaskapp2.png)\n", 94 | "\n", 95 | "Voilà! You wrote your first Flask application. As you have now experienced, with a few simple steps we were able to create web endpoints that can be accessed locally. And it remains simple going forward too.\n", 96 | "\n", 97 | "Using `Flask`, we can easily wrap our Machine Learning models and serve them as Web APIs. Also, if we want to create more complex web applications (ones that include JavaScript `*gasps*`), we just need a few modifications." 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "### 2. Creating a Machine Learning Model\n", 105 | "\n", 106 | "- We'll be taking up the Machine Learning competition: [Loan Prediction Competition](https://datahack.analyticsvidhya.com/contest/practice-problem-loan-prediction-iii). The main objective is to set up a pre-processing pipeline and create ML models, with the goal of making ML predictions easy at deployment time. 
\n", 107 | "\n" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": 1, 113 | "metadata": { 114 | "collapsed": true 115 | }, 116 | "outputs": [], 117 | "source": [ 118 | "import os \n", 119 | "import json\n", 120 | "import numpy as np\n", 121 | "import pandas as pd\n", 122 | "from sklearn.externals import joblib\n", 123 | "from sklearn.model_selection import train_test_split, GridSearchCV\n", 124 | "from sklearn.base import BaseEstimator, TransformerMixin\n", 125 | "from sklearn.ensemble import RandomForestClassifier\n", 126 | "\n", 127 | "from sklearn.pipeline import make_pipeline\n", 128 | "\n", 129 | "import warnings\n", 130 | "warnings.filterwarnings(\"ignore\")" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "- Saving the datasets in a folder:" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": 2, 143 | "metadata": {}, 144 | "outputs": [ 145 | { 146 | "name": "stdout", 147 | "output_type": "stream", 148 | "text": [ 149 | "test.csv training.csv\r\n" 150 | ] 151 | } 152 | ], 153 | "source": [ 154 | "!ls /home/pratos/Side-Project/av_articles/flask_api/data/" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": 3, 160 | "metadata": { 161 | "collapsed": true 162 | }, 163 | "outputs": [], 164 | "source": [ 165 | "data = pd.read_csv('../data/training.csv')" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": 4, 171 | "metadata": {}, 172 | "outputs": [ 173 | { 174 | "data": { 175 | "text/plain": [ 176 | "['Loan_ID',\n", 177 | " 'Gender',\n", 178 | " 'Married',\n", 179 | " 'Dependents',\n", 180 | " 'Education',\n", 181 | " 'Self_Employed',\n", 182 | " 'ApplicantIncome',\n", 183 | " 'CoapplicantIncome',\n", 184 | " 'LoanAmount',\n", 185 | " 'Loan_Amount_Term',\n", 186 | " 'Credit_History',\n", 187 | " 'Property_Area',\n", 188 | " 'Loan_Status']" 189 | ] 190 | }, 191 | "execution_count": 4, 192 | "metadata": {}, 193 | 
"output_type": "execute_result" 194 | } 195 | ], 196 | "source": [ 197 | "list(data.columns)" 198 | ] 199 | }, 200 | { 201 | "cell_type": "code", 202 | "execution_count": 5, 203 | "metadata": {}, 204 | "outputs": [ 205 | { 206 | "data": { 207 | "text/plain": [ 208 | "(614, 13)" 209 | ] 210 | }, 211 | "execution_count": 5, 212 | "metadata": {}, 213 | "output_type": "execute_result" 214 | } 215 | ], 216 | "source": [ 217 | "data.shape" 218 | ] 219 | }, 220 | { 221 | "cell_type": "markdown", 222 | "metadata": {}, 223 | "source": [ 224 | "- Finding out the `null/Nan` values in the columns:" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 6, 230 | "metadata": {}, 231 | "outputs": [ 232 | { 233 | "name": "stdout", 234 | "output_type": "stream", 235 | "text": [ 236 | "The number of null values in:Loan_ID == 0\n", 237 | "The number of null values in:Gender == 13\n", 238 | "The number of null values in:Married == 3\n", 239 | "The number of null values in:Dependents == 15\n", 240 | "The number of null values in:Education == 0\n", 241 | "The number of null values in:Self_Employed == 32\n", 242 | "The number of null values in:ApplicantIncome == 0\n", 243 | "The number of null values in:CoapplicantIncome == 0\n", 244 | "The number of null values in:LoanAmount == 22\n", 245 | "The number of null values in:Loan_Amount_Term == 14\n", 246 | "The number of null values in:Credit_History == 50\n", 247 | "The number of null values in:Property_Area == 0\n", 248 | "The number of null values in:Loan_Status == 0\n" 249 | ] 250 | } 251 | ], 252 | "source": [ 253 | "for _ in data.columns:\n", 254 | " print(\"The number of null values in:{} == {}\".format(_, data[_].isnull().sum()))" 255 | ] 256 | }, 257 | { 258 | "cell_type": "markdown", 259 | "metadata": {}, 260 | "source": [ 261 | "- Next step is creating `training` and `testing` datasets:" 262 | ] 263 | }, 264 | { 265 | "cell_type": "code", 266 | "execution_count": 7, 267 | "metadata": { 268 | "collapsed": 
true 269 | }, 270 | "outputs": [], 271 | "source": [ 272 | "pred_var = ['Gender','Married','Dependents','Education','Self_Employed','ApplicantIncome','CoapplicantIncome',\\\n", 273 | " 'LoanAmount','Loan_Amount_Term','Credit_History','Property_Area']\n", 274 | "\n", 275 | "X_train, X_test, y_train, y_test = train_test_split(data[pred_var], data['Loan_Status'], \\\n", 276 | " test_size=0.25, random_state=42)" 277 | ] 278 | }, 279 | { 280 | "cell_type": "markdown", 281 | "metadata": {}, 282 | "source": [ 283 | "- To make sure that the `pre-processing steps` are followed religiously even after we are done experimenting, and that we do not miss them at prediction time, we'll create a __custom pre-processing Scikit-learn `estimator`__.\n", 284 | "\n", 285 | "__To follow the process of how we ended up with this `estimator`, read up on [this notebook](https://github.com/pratos/flask_api/blob/master/notebooks/AnalyticsVidhya%20Article%20-%20ML%20Model%20approach.ipynb)__" 286 | ] 287 | }, 288 | { 289 | "cell_type": "code", 290 | "execution_count": 8, 291 | "metadata": { 292 | "collapsed": true 293 | }, 294 | "outputs": [], 295 | "source": [ 296 | "from sklearn.base import BaseEstimator, TransformerMixin\n", 297 | "\n", 298 | "class PreProcessing(BaseEstimator, TransformerMixin):\n", 299 | " \"\"\"Custom Pre-Processing estimator for our use-case\n", 300 | " \"\"\"\n", 301 | "\n", 302 | " def __init__(self):\n", 303 | " pass\n", 304 | "\n", 305 | " def transform(self, df):\n", 306 | " \"\"\"Regular transform() that helps with the training, validation & testing datasets\n", 307 | " (NOTE: The operations performed here are the ones that we did prior to this cell)\n", 308 | " \"\"\"\n", 309 | " pred_var = ['Gender','Married','Dependents','Education','Self_Employed','ApplicantIncome',\\\n", 310 | " 'CoapplicantIncome','LoanAmount','Loan_Amount_Term','Credit_History','Property_Area']\n", 311 | " \n", 312 | " df = df[pred_var]\n", 313 | " \n", 314 | " df['Dependents'] = 
df['Dependents'].fillna(0)\n", 315 | " df['Self_Employed'] = df['Self_Employed'].fillna('No')\n", 316 | " df['Loan_Amount_Term'] = df['Loan_Amount_Term'].fillna(self.term_mean_)\n", 317 | " df['Credit_History'] = df['Credit_History'].fillna(1)\n", 318 | " df['Married'] = df['Married'].fillna('No')\n", 319 | " df['Gender'] = df['Gender'].fillna('Male')\n", 320 | " df['LoanAmount'] = df['LoanAmount'].fillna(self.amt_mean_)\n", 321 | " \n", 322 | " gender_values = {'Female' : 0, 'Male' : 1} \n", 323 | " married_values = {'No' : 0, 'Yes' : 1}\n", 324 | " education_values = {'Graduate' : 0, 'Not Graduate' : 1}\n", 325 | " employed_values = {'No' : 0, 'Yes' : 1}\n", 326 | " property_values = {'Rural' : 0, 'Urban' : 1, 'Semiurban' : 2}\n", 327 | " dependent_values = {'3+': 3, '0': 0, '2': 2, '1': 1}\n", 328 | " df.replace({'Gender': gender_values, 'Married': married_values, 'Education': education_values, \\\n", 329 | " 'Self_Employed': employed_values, 'Property_Area': property_values, \\\n", 330 | " 'Dependents': dependent_values}, inplace=True)\n", 331 | " \n", 332 | " return df.as_matrix()\n", 333 | "\n", 334 | " def fit(self, df, y=None, **fit_params):\n", 335 | " \"\"\"Fitting the Training dataset & calculating the required values from train\n", 336 | " e.g: We will need the mean of X_train['Loan_Amount_Term'] that will be used in\n", 337 | " transformation of X_test\n", 338 | " \"\"\"\n", 339 | " \n", 340 | " self.term_mean_ = df['Loan_Amount_Term'].mean()\n", 341 | " self.amt_mean_ = df['LoanAmount'].mean()\n", 342 | " return self" 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": {}, 348 | "source": [ 349 | "- Convert `y_train` & `y_test` to `np.array`:" 350 | ] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "execution_count": 9, 355 | "metadata": { 356 | "collapsed": true 357 | }, 358 | "outputs": [], 359 | "source": [ 360 | "y_train = y_train.replace({'Y':1, 'N':0}).as_matrix()\n", 361 | "y_test = y_test.replace({'Y':1, 
'N':0}).as_matrix()" 362 | ] 363 | }, 364 | { 365 | "cell_type": "markdown", 366 | "metadata": {}, 367 | "source": [ 368 | "We'll create a `pipeline` so that all of our preprocessing steps are wrapped up in a single `scikit-learn estimator`." 369 | ] 370 | }, 371 | { 372 | "cell_type": "code", 373 | "execution_count": 10, 374 | "metadata": { 375 | "collapsed": true 376 | }, 377 | "outputs": [], 378 | "source": [ 379 | "pipe = make_pipeline(PreProcessing(),\n", 380 | " RandomForestClassifier())" 381 | ] 382 | }, 383 | { 384 | "cell_type": "code", 385 | "execution_count": 11, 386 | "metadata": {}, 387 | "outputs": [ 388 | { 389 | "data": { 390 | "text/plain": [ 391 | "Pipeline(memory=None,\n", 392 | " steps=[('preprocessing', PreProcessing()), ('randomforestclassifier', RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n", 393 | " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", 394 | " min_impurity_decrease=0.0, min_impurity_split=None,\n", 395 | " min_samples_leaf=1, min_samples_split=2,\n", 396 | " min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,\n", 397 | " oob_score=False, random_state=None, verbose=0,\n", 398 | " warm_start=False))])" 399 | ] 400 | }, 401 | "execution_count": 11, 402 | "metadata": {}, 403 | "output_type": "execute_result" 404 | } 405 | ], 406 | "source": [ 407 | "pipe" 408 | ] 409 | }, 410 | { 411 | "cell_type": "markdown", 412 | "metadata": {}, 413 | "source": [ 414 | "To search for the best `hyper-parameters` of the `RandomForestClassifier` (e.g. `n_estimators` & `max_depth`), we'll do a `Grid Search`:\n", 415 | "\n", 416 | "- Defining `param_grid`:" 417 | ] 418 | }, 419 | { 420 | "cell_type": "code", 421 | "execution_count": 12, 422 | "metadata": { 423 | "collapsed": true 424 | }, 425 | "outputs": [], 426 | "source": [ 427 | "param_grid = {\"randomforestclassifier__n_estimators\" : [10, 20, 30],\n", 428 | " \"randomforestclassifier__max_depth\" : [None, 6, 8, 10],\n", 429 | 
\"randomforestclassifier__max_leaf_nodes\": [None, 5, 10, 20], \n", 430 | " \"randomforestclassifier__min_impurity_split\": [0.1, 0.2, 0.3]}" 431 | ] 432 | }, 433 | { 434 | "cell_type": "markdown", 435 | "metadata": {}, 436 | "source": [ 437 | "- Running the `Grid Search`:" 438 | ] 439 | }, 440 | { 441 | "cell_type": "code", 442 | "execution_count": 13, 443 | "metadata": { 444 | "collapsed": true 445 | }, 446 | "outputs": [], 447 | "source": [ 448 | "grid = GridSearchCV(pipe, param_grid=param_grid, cv=3)" 449 | ] 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "metadata": {}, 454 | "source": [ 455 | "- Fitting the training data on the `pipeline estimator`:" 456 | ] 457 | }, 458 | { 459 | "cell_type": "code", 460 | "execution_count": 14, 461 | "metadata": {}, 462 | "outputs": [ 463 | { 464 | "data": { 465 | "text/plain": [ 466 | "GridSearchCV(cv=3, error_score='raise',\n", 467 | " estimator=Pipeline(memory=None,\n", 468 | " steps=[('preprocessing', PreProcessing()), ('randomforestclassifier', RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n", 469 | " max_depth=None, max_features='auto', max_leaf_nodes=None,\n", 470 | " min_impurity_decrease=0.0, min_impu..._jobs=1,\n", 471 | " oob_score=False, random_state=None, verbose=0,\n", 472 | " warm_start=False))]),\n", 473 | " fit_params=None, iid=True, n_jobs=1,\n", 474 | " param_grid={'randomforestclassifier__n_estimators': [10, 20, 30], 'randomforestclassifier__max_leaf_nodes': [None, 5, 10, 20], 'randomforestclassifier__min_impurity_split': [0.1, 0.2, 0.3], 'randomforestclassifier__max_depth': [None, 6, 8, 10]},\n", 475 | " pre_dispatch='2*n_jobs', refit=True, return_train_score=True,\n", 476 | " scoring=None, verbose=0)" 477 | ] 478 | }, 479 | "execution_count": 14, 480 | "metadata": {}, 481 | "output_type": "execute_result" 482 | } 483 | ], 484 | "source": [ 485 | "grid.fit(X_train, y_train)" 486 | ] 487 | }, 488 | { 489 | "cell_type": "markdown", 490 | "metadata": {}, 491 | "source": 
[ 492 | "- Let's see what parameter did the Grid Search select:" 493 | ] 494 | }, 495 | { 496 | "cell_type": "code", 497 | "execution_count": 15, 498 | "metadata": {}, 499 | "outputs": [ 500 | { 501 | "name": "stdout", 502 | "output_type": "stream", 503 | "text": [ 504 | "Best parameters: {'randomforestclassifier__n_estimators': 30, 'randomforestclassifier__max_leaf_nodes': 20, 'randomforestclassifier__min_impurity_split': 0.2, 'randomforestclassifier__max_depth': 8}\n" 505 | ] 506 | } 507 | ], 508 | "source": [ 509 | "print(\"Best parameters: {}\".format(grid.best_params_))" 510 | ] 511 | }, 512 | { 513 | "cell_type": "markdown", 514 | "metadata": {}, 515 | "source": [ 516 | "- Let's score:" 517 | ] 518 | }, 519 | { 520 | "cell_type": "code", 521 | "execution_count": 16, 522 | "metadata": {}, 523 | "outputs": [ 524 | { 525 | "name": "stdout", 526 | "output_type": "stream", 527 | "text": [ 528 | "Validation set score: 0.79\n" 529 | ] 530 | } 531 | ], 532 | "source": [ 533 | "print(\"Validation set score: {:.2f}\".format(grid.score(X_test, y_test)))" 534 | ] 535 | }, 536 | { 537 | "cell_type": "markdown", 538 | "metadata": {}, 539 | "source": [ 540 | "- Load the test set:" 541 | ] 542 | }, 543 | { 544 | "cell_type": "code", 545 | "execution_count": 17, 546 | "metadata": { 547 | "collapsed": true 548 | }, 549 | "outputs": [], 550 | "source": [ 551 | "test_df = pd.read_csv('../data/test.csv', encoding=\"utf-8-sig\")\n", 552 | "test_df = test_df.head()" 553 | ] 554 | }, 555 | { 556 | "cell_type": "code", 557 | "execution_count": 18, 558 | "metadata": {}, 559 | "outputs": [ 560 | { 561 | "data": { 562 | "text/html": [ 563 | "
\n", 564 | "\n", 565 | " \n", 566 | " \n", 567 | " \n", 568 | " \n", 569 | " \n", 570 | " \n", 571 | " \n", 572 | " \n", 573 | " \n", 574 | " \n", 575 | " \n", 576 | " \n", 577 | " \n", 578 | " \n", 579 | " \n", 580 | " \n", 581 | " \n", 582 | " \n", 583 | " \n", 584 | " \n", 585 | " \n", 586 | " \n", 587 | " \n", 588 | " \n", 589 | " \n", 590 | " \n", 591 | " \n", 592 | " \n", 593 | " \n", 594 | " \n", 595 | " \n", 596 | " \n", 597 | " \n", 598 | " \n", 599 | " \n", 600 | " \n", 601 | " \n", 602 | " \n", 603 | " \n", 604 | " \n", 605 | " \n", 606 | " \n", 607 | " \n", 608 | " \n", 609 | " \n", 610 | " \n", 611 | " \n", 612 | " \n", 613 | " \n", 614 | " \n", 615 | " \n", 616 | " \n", 617 | " \n", 618 | " \n", 619 | " \n", 620 | " \n", 621 | " \n", 622 | " \n", 623 | " \n", 624 | " \n", 625 | " \n", 626 | " \n", 627 | " \n", 628 | " \n", 629 | " \n", 630 | " \n", 631 | " \n", 632 | " \n", 633 | " \n", 634 | " \n", 635 | " \n", 636 | " \n", 637 | " \n", 638 | " \n", 639 | " \n", 640 | " \n", 641 | " \n", 642 | " \n", 643 | " \n", 644 | " \n", 645 | " \n", 646 | " \n", 647 | " \n", 648 | " \n", 649 | " \n", 650 | " \n", 651 | " \n", 652 | " \n", 653 | " \n", 654 | " \n", 655 | " \n", 656 | " \n", 657 | " \n", 658 | " \n", 659 | "
Loan_IDGenderMarriedDependentsEducationSelf_EmployedApplicantIncomeCoapplicantIncomeLoanAmountLoan_Amount_TermCredit_HistoryProperty_Area
0LP001015MaleYes0GraduateNo57200110.0360.01.0Urban
1LP001022MaleYes1GraduateNo30761500126.0360.01.0Urban
2LP001031MaleYes2GraduateNo50001800208.0360.01.0Urban
3LP001035MaleYes2GraduateNo23402546100.0360.0NaNUrban
4LP001051MaleNo0Not GraduateNo3276078.0360.01.0Urban
\n", 660 | "
" 661 | ], 662 | "text/plain": [ 663 | " Loan_ID Gender Married Dependents Education Self_Employed \\\n", 664 | "0 LP001015 Male Yes 0 Graduate No \n", 665 | "1 LP001022 Male Yes 1 Graduate No \n", 666 | "2 LP001031 Male Yes 2 Graduate No \n", 667 | "3 LP001035 Male Yes 2 Graduate No \n", 668 | "4 LP001051 Male No 0 Not Graduate No \n", 669 | "\n", 670 | " ApplicantIncome CoapplicantIncome LoanAmount Loan_Amount_Term \\\n", 671 | "0 5720 0 110.0 360.0 \n", 672 | "1 3076 1500 126.0 360.0 \n", 673 | "2 5000 1800 208.0 360.0 \n", 674 | "3 2340 2546 100.0 360.0 \n", 675 | "4 3276 0 78.0 360.0 \n", 676 | "\n", 677 | " Credit_History Property_Area \n", 678 | "0 1.0 Urban \n", 679 | "1 1.0 Urban \n", 680 | "2 1.0 Urban \n", 681 | "3 NaN Urban \n", 682 | "4 1.0 Urban " 683 | ] 684 | }, 685 | "execution_count": 18, 686 | "metadata": {}, 687 | "output_type": "execute_result" 688 | } 689 | ], 690 | "source": [ 691 | "test_df" 692 | ] 693 | }, 694 | { 695 | "cell_type": "code", 696 | "execution_count": 20, 697 | "metadata": {}, 698 | "outputs": [ 699 | { 700 | "data": { 701 | "text/plain": [ 702 | "array([1, 1, 1, 1, 1])" 703 | ] 704 | }, 705 | "execution_count": 20, 706 | "metadata": {}, 707 | "output_type": "execute_result" 708 | } 709 | ], 710 | "source": [ 711 | "grid.predict(test_df)" 712 | ] 713 | }, 714 | { 715 | "cell_type": "markdown", 716 | "metadata": {}, 717 | "source": [ 718 | "Our `pipeline` is looking pretty swell & fairly decent to go the most important step of the tutorial: __Serialize the Machine Learning Model__" 719 | ] 720 | }, 721 | { 722 | "cell_type": "markdown", 723 | "metadata": {}, 724 | "source": [ 725 | "### 3. 
Saving Machine Learning Model : Serialization & Deserialization" 726 | ] 727 | }, 728 | { 729 | "cell_type": "markdown", 730 | "metadata": { 731 | "collapsed": true 732 | }, 733 | "source": [ 734 | ">In computer science, in the context of data storage, serialization is the process of translating data structures or object state into a format that can be stored (for example, in a file or memory buffer, or transmitted across a network connection link) and reconstructed later in the same or another computer environment.\n", 735 | "\n", 736 | "In Python, `pickling` is a standard way to store objects and retrieve them as their original state. To give a simple example:" 737 | ] 738 | }, 739 | { 740 | "cell_type": "code", 741 | "execution_count": 21, 742 | "metadata": { 743 | "collapsed": true 744 | }, 745 | "outputs": [], 746 | "source": [ 747 | "list_to_pickle = [1, 'here', 123, 'walker']\n", 748 | "\n", 749 | "#Pickling the list\n", 750 | "import pickle\n", 751 | "\n", 752 | "list_pickle = pickle.dumps(list_to_pickle)" 753 | ] 754 | }, 755 | { 756 | "cell_type": "code", 757 | "execution_count": 22, 758 | "metadata": {}, 759 | "outputs": [ 760 | { 761 | "data": { 762 | "text/plain": [ 763 | "b'\\x80\\x03]q\\x00(K\\x01X\\x04\\x00\\x00\\x00hereq\\x01K{X\\x06\\x00\\x00\\x00walkerq\\x02e.'" 764 | ] 765 | }, 766 | "execution_count": 22, 767 | "metadata": {}, 768 | "output_type": "execute_result" 769 | } 770 | ], 771 | "source": [ 772 | "list_pickle" 773 | ] 774 | }, 775 | { 776 | "cell_type": "markdown", 777 | "metadata": {}, 778 | "source": [ 779 | "When we load the pickle back:" 780 | ] 781 | }, 782 | { 783 | "cell_type": "code", 784 | "execution_count": 23, 785 | "metadata": { 786 | "collapsed": true 787 | }, 788 | "outputs": [], 789 | "source": [ 790 | "loaded_pickle = pickle.loads(list_pickle)" 791 | ] 792 | }, 793 | { 794 | "cell_type": "code", 795 | "execution_count": 24, 796 | "metadata": {}, 797 | "outputs": [ 798 | { 799 | "data": { 800 | "text/plain": [ 801 | "[1, 
'here', 123, 'walker']" 802 | ] 803 | }, 804 | "execution_count": 24, 805 | "metadata": {}, 806 | "output_type": "execute_result" 807 | } 808 | ], 809 | "source": [ 810 | "loaded_pickle" 811 | ] 812 | }, 813 | { 814 | "cell_type": "markdown", 815 | "metadata": {}, 816 | "source": [ 817 | "We can save the `pickled object` to a file as well and use it. This method is similar to creating `.rda` files for folks who are familiar with `R Programming`. \n", 818 | "\n", 819 | "__NOTE:__ Some people also argue against using `pickle` for serialization[(1)](#no1). `h5py` could also be an alternative.\n", 820 | "\n", 821 | "We have a custom `Class` that we need to import while running our training, hence we'll be using `dill` module to packup the `estimator Class` with our `grid object`.\n", 822 | "\n", 823 | "It is advisable to create a separate `training.py` file that contains all the code for training the model ([See here for example](https://github.com/pratos/flask_api/blob/master/flask_api/utils.py)).\n", 824 | "\n", 825 | "- To install `dill`" 826 | ] 827 | }, 828 | { 829 | "cell_type": "code", 830 | "execution_count": 25, 831 | "metadata": {}, 832 | "outputs": [ 833 | { 834 | "name": "stdout", 835 | "output_type": "stream", 836 | "text": [ 837 | "Requirement already satisfied: dill in /home/pratos/miniconda3/envs/ordermanagement/lib/python3.5/site-packages\r\n" 838 | ] 839 | } 840 | ], 841 | "source": [ 842 | "!pip install dill" 843 | ] 844 | }, 845 | { 846 | "cell_type": "code", 847 | "execution_count": 26, 848 | "metadata": { 849 | "collapsed": true 850 | }, 851 | "outputs": [], 852 | "source": [ 853 | "import dill as pickle\n", 854 | "filename = 'model_v1.pk'" 855 | ] 856 | }, 857 | { 858 | "cell_type": "code", 859 | "execution_count": 27, 860 | "metadata": { 861 | "collapsed": true 862 | }, 863 | "outputs": [], 864 | "source": [ 865 | "with open('../flask_api/models/'+filename, 'wb') as file:\n", 866 | "\tpickle.dump(grid, file)" 867 | ] 868 | }, 869 | { 870 | 
"cell_type": "markdown", 871 | "metadata": {}, 872 | "source": [ 873 | "So our model will be saved in the location above. Now that the model `pickled`, creating a `Flask` wrapper around it would be the next step.\n", 874 | "\n", 875 | "Before that, to be sure that our `pickled` file works fine -- let's load it back and do a prediction:" 876 | ] 877 | }, 878 | { 879 | "cell_type": "code", 880 | "execution_count": 28, 881 | "metadata": { 882 | "collapsed": true 883 | }, 884 | "outputs": [], 885 | "source": [ 886 | "with open('../flask_api/models/'+filename ,'rb') as f:\n", 887 | " loaded_model = pickle.load(f)" 888 | ] 889 | }, 890 | { 891 | "cell_type": "code", 892 | "execution_count": 29, 893 | "metadata": {}, 894 | "outputs": [ 895 | { 896 | "data": { 897 | "text/plain": [ 898 | "array([1, 1, 1, 1, 1])" 899 | ] 900 | }, 901 | "execution_count": 29, 902 | "metadata": {}, 903 | "output_type": "execute_result" 904 | } 905 | ], 906 | "source": [ 907 | "loaded_model.predict(test_df)" 908 | ] 909 | }, 910 | { 911 | "cell_type": "markdown", 912 | "metadata": {}, 913 | "source": [ 914 | "Since, we already have the `preprocessing` steps required for the new incoming data present as a part of the `pipeline` we just have to run `predict()`. While working with `scikit-learn`, it is always easy to work with `pipelines`. \n", 915 | "\n", 916 | "`Estimators` and `pipelines` save you time and headache, even if the initial implementation seems to be ridiculous. Stich in time, saves nine!" 917 | ] 918 | }, 919 | { 920 | "cell_type": "markdown", 921 | "metadata": {}, 922 | "source": [ 923 | "### 4. 
Creating an API using Flask" 924 | ] 925 | }, 926 | { 927 | "cell_type": "markdown", 928 | "metadata": { 929 | "collapsed": true 930 | }, 931 | "source": [ 932 | "We'll keep the folder structure as simple as possible:\n", 933 | "\n", 934 | "![Folder Struct](https://raw.githubusercontent.com/pratos/flask_api/master/notebooks/images/flaskapp3.png)\n", 935 | "\n", 936 | "There are three important parts in constructing our wrapper function, `apicall()`:\n", 937 | "\n", 938 | "- Getting the `request` data (for which predictions are to be made)\n", 939 | "\n", 940 | "- Loading our `pickled estimator`\n", 941 | "\n", 942 | "- `jsonify` our predictions and send the response back with `status code: 200`\n", 943 | "\n", 944 | "HTTP messages are made of a header and a body. As a standard, majority of the body content sent across are in `json` format. We'll be sending (`POST url-endpoint/`) the incoming data as batch to get predictions.\n", 945 | "\n", 946 | "(__NOTE:__ You can send plain text, XML, csv or image directly but for the sake of interchangeability of the format, it is advisable to use `json`)" 947 | ] 948 | }, 949 | { 950 | "cell_type": "markdown", 951 | "metadata": {}, 952 | "source": [ 953 | "```python\n", 954 | "\"\"\"Filename: server.py\n", 955 | "\"\"\"\n", 956 | "\n", 957 | "import os\n", 958 | "import pandas as pd\n", 959 | "from sklearn.externals import joblib\n", 960 | "from flask import Flask, jsonify, request\n", 961 | "\n", 962 | "app = Flask(__name__)\n", 963 | "\n", 964 | "@app.route('/predict', methods=['POST'])\n", 965 | "def apicall():\n", 966 | "\t\"\"\"API Call\n", 967 | "\t\n", 968 | "\tPandas dataframe (sent as a payload) from API Call\n", 969 | "\t\"\"\"\n", 970 | "\ttry:\n", 971 | "\t\ttest_json = request.get_json()\n", 972 | "\t\ttest = pd.read_json(test_json, orient='records')\n", 973 | "\n", 974 | "\t\t#To resolve the issue of TypeError: Cannot compare types 'ndarray(dtype=int64)' and 'str'\n", 975 | "\t\ttest['Dependents'] = [str(x) for x 
in list(test['Dependents'])]\n", 976 | "\n", 977 | "\t\t#Getting the Loan_IDs separated out\n", 978 | "\t\tloan_ids = test['Loan_ID']\n", 979 | "\n", 980 | "\texcept Exception as e:\n", 981 | "\t\traise e\n", 982 | "\t\n", 983 | "\tclf = 'model_v1.pk'\n", 984 | "\t\n", 985 | "\tif test.empty:\n", 986 | "\t\treturn(bad_request())\n", 987 | "\telse:\n", 988 | "\t\t#Load the saved model\n", 989 | "\t\tprint(\"Loading the model...\")\n", 990 | "\t\tloaded_model = None\n", 991 | "\t\twith open('./models/'+clf,'rb') as f:\n", 992 | "\t\t\tloaded_model = pickle.load(f)\n", 993 | "\n", 994 | "\t\tprint(\"The model has been loaded...doing predictions now...\")\n", 995 | "\t\tpredictions = loaded_model.predict(test)\n", 996 | "\t\t\n", 997 | "\t\t\"\"\"Add the predictions as Series to a new pandas dataframe\n", 998 | "\t\t\t\t\t\t\t\tOR\n", 999 | "\t\t Depending on the use-case, the entire test data appended with the new files\n", 1000 | "\t\t\"\"\"\n", 1001 | "\t\tprediction_series = list(pd.Series(predictions))\n", 1002 | "\n", 1003 | "\t\tfinal_predictions = pd.DataFrame(list(zip(loan_ids, prediction_series)))\n", 1004 | "\t\t\n", 1005 | "\t\t\"\"\"We can be as creative in sending the responses.\n", 1006 | "\t\t But we need to send the response codes as well.\n", 1007 | "\t\t\"\"\"\n", 1008 | "\t\tresponses = jsonify(predictions=final_predictions.to_json(orient=\"records\"))\n", 1009 | "\t\tresponses.status_code = 200\n", 1010 | "\n", 1011 | "\t\treturn (responses)\n", 1012 | "\n", 1013 | "```\n", 1014 | "\n", 1015 | "Once done, run: `gunicorn --bind 0.0.0.0:8000 server:app`" 1016 | ] 1017 | }, 1018 | { 1019 | "cell_type": "markdown", 1020 | "metadata": {}, 1021 | "source": [ 1022 | "Let's generate some prediction data and query the API running locally at `https:0.0.0.0:8000/predict`" 1023 | ] 1024 | }, 1025 | { 1026 | "cell_type": "code", 1027 | "execution_count": 30, 1028 | "metadata": { 1029 | "collapsed": true 1030 | }, 1031 | "outputs": [], 1032 | "source": [ 1033 | 
"import json\n", 1034 | "import requests" 1035 | ] 1036 | }, 1037 | { 1038 | "cell_type": "code", 1039 | "execution_count": 31, 1040 | "metadata": { 1041 | "collapsed": true 1042 | }, 1043 | "outputs": [], 1044 | "source": [ 1045 | "\"\"\"Setting the headers to send and accept json responses\n", 1046 | "\"\"\"\n", 1047 | "header = {'Content-Type': 'application/json', \\\n", 1048 | " 'Accept': 'application/json'}\n", 1049 | "\n", 1050 | "\"\"\"Reading test batch\n", 1051 | "\"\"\"\n", 1052 | "df = pd.read_csv('../data/test.csv', encoding=\"utf-8-sig\")\n", 1053 | "df = df.head()\n", 1054 | "\n", 1055 | "\"\"\"Converting Pandas Dataframe to json\n", 1056 | "\"\"\"\n", 1057 | "data = df.to_json(orient='records')" 1058 | ] 1059 | }, 1060 | { 1061 | "cell_type": "code", 1062 | "execution_count": 33, 1063 | "metadata": {}, 1064 | "outputs": [ 1065 | { 1066 | "data": { 1067 | "text/plain": [ 1068 | "'[{\"Loan_ID\":\"LP001015\",\"Gender\":\"Male\",\"Married\":\"Yes\",\"Dependents\":\"0\",\"Education\":\"Graduate\",\"Self_Employed\":\"No\",\"ApplicantIncome\":5720,\"CoapplicantIncome\":0,\"LoanAmount\":110.0,\"Loan_Amount_Term\":360.0,\"Credit_History\":1.0,\"Property_Area\":\"Urban\"},{\"Loan_ID\":\"LP001022\",\"Gender\":\"Male\",\"Married\":\"Yes\",\"Dependents\":\"1\",\"Education\":\"Graduate\",\"Self_Employed\":\"No\",\"ApplicantIncome\":3076,\"CoapplicantIncome\":1500,\"LoanAmount\":126.0,\"Loan_Amount_Term\":360.0,\"Credit_History\":1.0,\"Property_Area\":\"Urban\"},{\"Loan_ID\":\"LP001031\",\"Gender\":\"Male\",\"Married\":\"Yes\",\"Dependents\":\"2\",\"Education\":\"Graduate\",\"Self_Employed\":\"No\",\"ApplicantIncome\":5000,\"CoapplicantIncome\":1800,\"LoanAmount\":208.0,\"Loan_Amount_Term\":360.0,\"Credit_History\":1.0,\"Property_Area\":\"Urban\"},{\"Loan_ID\":\"LP001035\",\"Gender\":\"Male\",\"Married\":\"Yes\",\"Dependents\":\"2\",\"Education\":\"Graduate\",\"Self_Employed\":\"No\",\"ApplicantIncome\":2340,\"CoapplicantIncome\":2546,\"LoanAmount\":100.0,\"Loan_Amo
unt_Term\":360.0,\"Credit_History\":null,\"Property_Area\":\"Urban\"},{\"Loan_ID\":\"LP001051\",\"Gender\":\"Male\",\"Married\":\"No\",\"Dependents\":\"0\",\"Education\":\"Not Graduate\",\"Self_Employed\":\"No\",\"ApplicantIncome\":3276,\"CoapplicantIncome\":0,\"LoanAmount\":78.0,\"Loan_Amount_Term\":360.0,\"Credit_History\":1.0,\"Property_Area\":\"Urban\"}]'" 1069 | ] 1070 | }, 1071 | "execution_count": 33, 1072 | "metadata": {}, 1073 | "output_type": "execute_result" 1074 | } 1075 | ], 1076 | "source": [ 1077 | "data" 1078 | ] 1079 | }, 1080 | { 1081 | "cell_type": "code", 1082 | "execution_count": 34, 1083 | "metadata": { 1084 | "collapsed": true 1085 | }, 1086 | "outputs": [], 1087 | "source": [ 1088 | "\"\"\"POST /predict\n", 1089 | "\"\"\"\n", 1090 | "resp = requests.post(\"http://0.0.0.0:8000/predict\", \\\n", 1091 | " data = json.dumps(data),\\\n", 1092 | " headers= header)" 1093 | ] 1094 | }, 1095 | { 1096 | "cell_type": "code", 1097 | "execution_count": 35, 1098 | "metadata": {}, 1099 | "outputs": [ 1100 | { 1101 | "data": { 1102 | "text/plain": [ 1103 | "200" 1104 | ] 1105 | }, 1106 | "execution_count": 35, 1107 | "metadata": {}, 1108 | "output_type": "execute_result" 1109 | } 1110 | ], 1111 | "source": [ 1112 | "resp.status_code" 1113 | ] 1114 | }, 1115 | { 1116 | "cell_type": "code", 1117 | "execution_count": 36, 1118 | "metadata": {}, 1119 | "outputs": [ 1120 | { 1121 | "data": { 1122 | "text/plain": [ 1123 | "{'predictions': '[{\"0\":\"LP001015\",\"1\":1},{\"0\":\"LP001022\",\"1\":1},{\"0\":\"LP001031\",\"1\":1},{\"0\":\"LP001035\",\"1\":1},{\"0\":\"LP001051\",\"1\":1}]'}" 1124 | ] 1125 | }, 1126 | "execution_count": 36, 1127 | "metadata": {}, 1128 | "output_type": "execute_result" 1129 | } 1130 | ], 1131 | "source": [ 1132 | "\"\"\"The final response we get is as follows:\n", 1133 | "\"\"\"\n", 1134 | "resp.json()" 1135 | ] 1136 | }, 1137 | { 1138 | "cell_type": "markdown", 1139 | "metadata": {}, 1140 | "source": [ 1141 | "### End Notes" 1142 | ] 
1143 | }, 1144 | { 1145 | "cell_type": "markdown", 1146 | "metadata": { 1147 | "collapsed": true 1148 | }, 1149 | "source": [ 1150 | "We have half the battle won here: a working API that serves predictions, taking us one step towards integrating our ML solutions right into our products. This is a very basic API that will help with prototyping a data product; to make it a fully functional, production-ready API, a few more additions are required that aren't in the scope of Machine Learning. \n", 1151 | "\n", 1152 | "There are a few things to keep in mind when adopting an API-first approach:\n", 1153 | "\n", 1154 | "- Creating APIs out of spaghetti code is next to impossible, so approach your Machine Learning workflow as if you need to create a clean, usable API as a deliverable. It will save you a lot of effort jumping through hoops later.\n", 1155 | "\n", 1156 | "- Try to use version control for models and the API code; `Flask` doesn't provide great support for version control. Saving and keeping track of ML models is difficult; find the least messy way that suits you. [This article](https://medium.com/towards-data-science/how-to-version-control-your-machine-learning-task-cad74dce44c4) talks about ways to do it.\n", 1157 | "\n", 1158 | "- Specific to `sklearn models` (as done in this article), if you are using custom `estimators` for preprocessing or any other related task, make sure you keep the `estimator` and the `training code` together, so that the pickled model has the `estimator` class tagged along. \n", 1159 | "\n", 1160 | "Next logical step would be creating a workflow to deploy such APIs out on a small VM.
There are various ways to do it and we'll be looking into those in the next article.\n", 1161 | "\n", 1162 | "Code & Notebooks for this article: [pratos/flask_api](https://github.com/pratos/flask_api)" 1163 | ] 1164 | }, 1165 | { 1166 | "cell_type": "markdown", 1167 | "metadata": {}, 1168 | "source": [ 1169 | "__Sources & Links:__\n", 1170 | "\n", 1171 | "[1]. Don't Pickle your data.\n", 1172 | "\n", 1173 | "[2]. Building Scikit Learn compatible transformers.\n", 1174 | "\n", 1175 | "[3]. Using jsonify in Flask.\n", 1176 | "\n", 1177 | "[4]. Flask-QuickStart." 1178 | ] 1179 | } 1180 | ], 1181 | "metadata": { 1182 | "kernelspec": { 1183 | "display_name": "Python 3", 1184 | "language": "python", 1185 | "name": "python3" 1186 | }, 1187 | "language_info": { 1188 | "codemirror_mode": { 1189 | "name": "ipython", 1190 | "version": 3 1191 | }, 1192 | "file_extension": ".py", 1193 | "mimetype": "text/x-python", 1194 | "name": "python", 1195 | "nbconvert_exporter": "python", 1196 | "pygments_lexer": "ipython3", 1197 | "version": "3.5.2" 1198 | } 1199 | }, 1200 | "nbformat": 4, 1201 | "nbformat_minor": 2 1202 | } 1203 | -------------------------------------------------------------------------------- /notebooks/ML+Models+as+APIs+using+Flask.md: -------------------------------------------------------------------------------- 1 | 2 | ## Machine Learning models as APIs using Flask 3 | 4 | ### Introduction 5 | 6 | A lot of Machine Learning (ML) projects, amateur and professional, start with aplomb. The early excitement of working on the dataset, answering the obvious & not so obvious questions & presenting the results is what every one of us works for. There are compliments thrown around and talk of going to the next step -- that's when the question arises, __How?__ 7 | 8 | The usual suspects are making dashboards and providing insights.
But mostly, the real use of your Machine Learning model lies in being at the heart of a product -- it may be a small component of an automated mailer system or a chatbot. These are the times when the barriers seem insurmountable. For example, the majority of ML folks use `R/Python` for their experiments, but the consumers of those models are software engineers who use a completely different stack. There are two ways to solve this problem: 9 | 10 | - __Rewriting the whole code in the language that the software engineering folks work with__ 11 | 12 | This seems like a good idea, but the time & energy required to replicate those intricate models would be an utter waste. Most languages, `JavaScript` included, do not have great libraries for ML; one would be wise to stay away from this route. 13 | 14 | - __API-first approach__ 15 | 16 | Web APIs have made it easy for cross-language applications to work well. If a frontend developer needs your ML model to create an ML-powered web application, they just need the `URL endpoint` from which the API is served. 17 | 18 | The articles below will help you appreciate why APIs are a popular choice amongst developers: 19 | 20 | - [History of APIs](http://apievangelist.com/2012/12/20/history-of-apis/) 21 | - [Introduction to APIs - AV Article](https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-apis-application-programming-interfaces-5-apis-a-data-scientist-must-know/) 22 | 23 | The majority of the big cloud providers and smaller Machine Learning focussed companies provide ready-to-use APIs. They cater to developers/businesses without ML expertise who want to implement ML in their processes or product suites.
24 | 25 | One such example is the [Google Vision API](https://cloud.google.com/vision/) 26 | 27 | ![Google API Suite](http://www.publickey1.jp/2016/gcpnext16.jpg) 28 | 29 | All you need is a simple REST call to the API via SDKs (Software Development Kits) provided by Google. [Click here](https://github.com/GoogleCloudPlatform/cloud-vision/tree/master/python) to get an idea of what can be done using the Google Vision API. 30 | 31 | Sounds marvellous, right? In this article, we'll understand how to create our own Machine Learning API using `Flask`, a web framework for `Python`. 32 | 33 | ![Flask](https://upload.wikimedia.org/wikipedia/commons/thumb/3/3c/Flask_logo.svg/640px-Flask_logo.svg.png) 34 | 35 | __NOTE:__ `Flask` isn't the only web framework available. There's `Django`, `Falcon`, `Hug` and many more. For `R`, we have a package called [`plumber`](https://github.com/trestletech/plumber). 36 | 37 | ### Table of Contents: 38 | 39 | 1. __Python Environment Setup & Flask Basics__ 40 | 2. __Creating a Machine Learning Model__ 41 | 3. __Saving the Machine Learning Model: Serialization & Deserialization__ 42 | 4. __Creating an API using Flask__ 43 | 44 | ### 1. Python Environment Setup & Flask Basics 45 | 46 | ![Anaconda](https://upload.wikimedia.org/wikipedia/en/c/cd/Anaconda_Logo.png) 47 | 48 | - Creating a virtual environment using `Anaconda`. If you need to create your workflows in Python and keep the dependencies separated out, or share the environment settings, `Anaconda` distributions are a great option. 49 | * You'll find a miniconda installation for Python [here](https://conda.io/miniconda.html) 50 | * `wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh` 51 | * `bash Miniconda3-latest-Linux-x86_64.sh` 52 | * Follow the sequence of questions. 53 | * `source .bashrc` 54 | * If you run: `conda`, you should be able to get the list of commands & help.
55 | * To create a new environment, run: `conda create --name <environment-name> python=3.6` 56 | * Follow the steps & once done run: `source activate <environment-name>` 57 | * Install the python packages you need, the two important ones are: `flask` & `gunicorn`. 58 | 59 | 60 | - We'll try out a simple `Flask` Hello-World application and serve it using `gunicorn`: 61 | 62 | * Open up your favourite text editor and create a `hello-world.py` file in a folder 63 | * Write the code below: 64 | ```python 65 | 66 | """Filename: hello-world.py 67 | """ 68 | 69 | from flask import Flask 70 | 71 | app = Flask(__name__) 72 | 73 | @app.route('/users/<string:username>') 74 | def hello_world(username=None): 75 | 76 | return("Hello {}!".format(username)) 77 | 78 | ``` 79 | * Save the file and return to the terminal. 80 | * To serve the API (to start running it), execute: `gunicorn --bind 0.0.0.0:8000 hello-world:app` in your terminal. 81 | 82 | * If you get the responses below, you are on the right track: 83 | 84 | ![Hello World](https://raw.githubusercontent.com/pratos/flask_api/master/notebooks/images/flaskapp1.png) 85 | 86 | * In your browser, try out: `http://localhost:8000/users/any-name` 87 | 88 | ![Browser](https://raw.githubusercontent.com/pratos/flask_api/master/notebooks/images/flaskapp2.png) 89 | 90 | Voila! You wrote your first Flask application. As you have now experienced, in a few simple steps we were able to create web endpoints that can be accessed locally. And it remains simple going forward too. 91 | 92 | Using `Flask`, we can wrap our Machine Learning models and serve them as Web APIs easily. Also, if we want to create more complex web applications (ones that include JavaScript `*gasps*`), we just need a few modifications. 93 | 94 | ### 2. Creating a Machine Learning Model 95 | 96 | - We'll be taking up the Machine Learning competition: [Loan Prediction Competition](https://datahack.analyticsvidhya.com/contest/practice-problem-loan-prediction-iii).
The main objective is to set up a pre-processing pipeline and create ML models, with the goal of making predictions at deployment time painless. 97 | 98 | 99 | 100 | 101 | ```python 102 | import os 103 | import json 104 | import numpy as np 105 | import pandas as pd 106 | from sklearn.externals import joblib 107 | from sklearn.model_selection import train_test_split, GridSearchCV 108 | from sklearn.base import BaseEstimator, TransformerMixin 109 | from sklearn.ensemble import RandomForestClassifier 110 | 111 | from sklearn.pipeline import make_pipeline 112 | 113 | import warnings 114 | warnings.filterwarnings("ignore") 115 | ``` 116 | 117 | - Saving the datasets in a folder: 118 | 119 | 120 | ```python 121 | !ls /home/pratos/Side-Project/av_articles/flask_api/data/ 122 | ``` 123 | 124 | test.csv training.csv 125 | 126 | 127 | 128 | ```python 129 | data = pd.read_csv('../data/training.csv') 130 | ``` 131 | 132 | 133 | ```python 134 | list(data.columns) 135 | ``` 136 | 137 | 138 | 139 | 140 | ['Loan_ID', 141 | 'Gender', 142 | 'Married', 143 | 'Dependents', 144 | 'Education', 145 | 'Self_Employed', 146 | 'ApplicantIncome', 147 | 'CoapplicantIncome', 148 | 'LoanAmount', 149 | 'Loan_Amount_Term', 150 | 'Credit_History', 151 | 'Property_Area', 152 | 'Loan_Status'] 153 | 154 | 155 | 156 | 157 | ```python 158 | data.shape 159 | ``` 160 | 161 | 162 | 163 | 164 | (614, 13) 165 | 166 | 167 | 168 | - Finding out the `null/NaN` values in the columns: 169 | 170 | 171 | ```python 172 | for _ in data.columns: 173 | print("The number of null values in:{} == {}".format(_, data[_].isnull().sum())) 174 | ``` 175 | 176 | The number of null values in:Loan_ID == 0 177 | The number of null values in:Gender == 13 178 | The number of null values in:Married == 3 179 | The number of null values in:Dependents == 15 180 | The number of null values in:Education == 0 181 | The number of null values in:Self_Employed == 32 182 | The number of null values in:ApplicantIncome == 0 183 | The number of
null values in:CoapplicantIncome == 0 184 | The number of null values in:LoanAmount == 22 185 | The number of null values in:Loan_Amount_Term == 14 186 | The number of null values in:Credit_History == 50 187 | The number of null values in:Property_Area == 0 188 | The number of null values in:Loan_Status == 0 189 | 190 | 191 | - Next step is creating `training` and `testing` datasets: 192 | 193 | 194 | ```python 195 | pred_var = ['Gender','Married','Dependents','Education','Self_Employed','ApplicantIncome','CoapplicantIncome',\ 196 | 'LoanAmount','Loan_Amount_Term','Credit_History','Property_Area'] 197 | 198 | X_train, X_test, y_train, y_test = train_test_split(data[pred_var], data['Loan_Status'], \ 199 | test_size=0.25, random_state=42) 200 | ``` 201 | 202 | - To make sure that the `pre-processing steps` are followed religiously even after we are done with experimenting and we do not miss them while predictions, we'll create a __custom pre-processing Scikit-learn `estimator`__. 203 | 204 | __To follow the process on how we ended up with this `estimator`, read up on [this notebook](https://github.com/pratos/flask_api/blob/master/notebooks/AnalyticsVidhya%20Article%20-%20ML%20Model%20approach.ipynb)__ 205 | 206 | 207 | ```python 208 | from sklearn.base import BaseEstimator, TransformerMixin 209 | 210 | class PreProcessing(BaseEstimator, TransformerMixin): 211 | """Custom Pre-Processing estimator for our use-case 212 | """ 213 | 214 | def __init__(self): 215 | pass 216 | 217 | def transform(self, df): 218 | """Regular transform() that is a help for training, validation & testing datasets 219 | (NOTE: The operations performed here are the ones that we did prior to this cell) 220 | """ 221 | pred_var = ['Gender','Married','Dependents','Education','Self_Employed','ApplicantIncome',\ 222 | 'CoapplicantIncome','LoanAmount','Loan_Amount_Term','Credit_History','Property_Area'] 223 | 224 | df = df[pred_var] 225 | 226 | df['Dependents'] = df['Dependents'].fillna(0) 227 | 
df['Self_Employed'] = df['Self_Employed'].fillna('No') 228 | df['Loan_Amount_Term'] = df['Loan_Amount_Term'].fillna(self.term_mean_) 229 | df['Credit_History'] = df['Credit_History'].fillna(1) 230 | df['Married'] = df['Married'].fillna('No') 231 | df['Gender'] = df['Gender'].fillna('Male') 232 | df['LoanAmount'] = df['LoanAmount'].fillna(self.amt_mean_) 233 | 234 | gender_values = {'Female' : 0, 'Male' : 1} 235 | married_values = {'No' : 0, 'Yes' : 1} 236 | education_values = {'Graduate' : 0, 'Not Graduate' : 1} 237 | employed_values = {'No' : 0, 'Yes' : 1} 238 | property_values = {'Rural' : 0, 'Urban' : 1, 'Semiurban' : 2} 239 | dependent_values = {'3+': 3, '0': 0, '2': 2, '1': 1} 240 | df.replace({'Gender': gender_values, 'Married': married_values, 'Education': education_values, \ 241 | 'Self_Employed': employed_values, 'Property_Area': property_values, \ 242 | 'Dependents': dependent_values}, inplace=True) 243 | 244 | return df.as_matrix() 245 | 246 | def fit(self, df, y=None, **fit_params): 247 | """Fitting the Training dataset & calculating the required values from train 248 | e.g: We will need the mean of X_train['Loan_Amount_Term'] that will be used in 249 | transformation of X_test 250 | """ 251 | 252 | self.term_mean_ = df['Loan_Amount_Term'].mean() 253 | self.amt_mean_ = df['LoanAmount'].mean() 254 | return self 255 | ``` 256 | 257 | - Convert `y_train` & `y_test` to `np.array`: 258 | 259 | 260 | ```python 261 | y_train = y_train.replace({'Y':1, 'N':0}).as_matrix() 262 | y_test = y_test.replace({'Y':1, 'N':0}).as_matrix() 263 | ``` 264 | 265 | We'll create a `pipeline` to make sure that all the preprocessing steps that we do are just a single `scikit-learn estimator`. 
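The fit/transform contract that `PreProcessing` follows is worth seeing in isolation: `fit()` learns statistics from the training split only, and `transform()` applies those stored statistics to any dataset, so the test set never leaks into the learned values. Below is a minimal, dependency-free sketch of that contract; the `MeanImputer` class and the numbers are made up for illustration and are not part of the article's code:

```python
# Toy stand-in for the PreProcessing estimator above: fit() learns a mean
# from the training data, transform() uses it to fill missing values.
class MeanImputer:
    def fit(self, values):
        # Statistics come from the *training* split only.
        observed = [v for v in values if v is not None]
        self.mean_ = sum(observed) / len(observed)
        return self

    def transform(self, values):
        # Apply the stored training-time statistic to any dataset.
        return [self.mean_ if v is None else v for v in values]

train = [360.0, 360.0, 180.0, None]   # mean of observed values: 300.0
test = [None, 360.0]

imp = MeanImputer().fit(train)
print(imp.transform(test))            # [300.0, 360.0]
```

This is the same reason `PreProcessing` stores `term_mean_` and `amt_mean_` during `fit()` instead of recomputing them inside `transform()`: the test set gets filled with training-set statistics.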
266 | 
267 | 
268 | ```python
269 | pipe = make_pipeline(PreProcessing(),
270 |                      RandomForestClassifier())
271 | ```
272 | 
273 | 
274 | ```python
275 | pipe
276 | ```
277 | 
278 | 
279 | 
280 | 
281 |     Pipeline(memory=None,
282 |          steps=[('preprocessing', PreProcessing()), ('randomforestclassifier', RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
283 |                 max_depth=None, max_features='auto', max_leaf_nodes=None,
284 |                 min_impurity_decrease=0.0, min_impurity_split=None,
285 |                 min_samples_leaf=1, min_samples_split=2,
286 |                 min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
287 |                 oob_score=False, random_state=None, verbose=0,
288 |                 warm_start=False))])
289 | 
290 | 
291 | 
292 | To search for the best `hyper-parameters` of the `RandomForestClassifier` (e.g. `n_estimators`, `max_depth`, `max_leaf_nodes`), we'll do a `Grid Search`:
293 | 
294 | - Defining `param_grid`:
295 | 
296 | 
297 | ```python
298 | param_grid = {"randomforestclassifier__n_estimators" : [10, 20, 30],
299 |              "randomforestclassifier__max_depth" : [None, 6, 8, 10],
300 |              "randomforestclassifier__max_leaf_nodes": [None, 5, 10, 20],
301 |              "randomforestclassifier__min_impurity_split": [0.1, 0.2, 0.3]}
302 | ```
303 | 
304 | - Running the `Grid Search`:
305 | 
306 | 
307 | ```python
308 | grid = GridSearchCV(pipe, param_grid=param_grid, cv=3)
309 | ```
310 | 
311 | - Fitting the training data on the `pipeline estimator`:
312 | 
313 | 
314 | ```python
315 | grid.fit(X_train, y_train)
316 | ```
317 | 
318 | 
319 | 
320 | 
321 |     GridSearchCV(cv=3, error_score='raise',
322 |            estimator=Pipeline(memory=None,
323 |          steps=[('preprocessing', PreProcessing()), ('randomforestclassifier', RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
324 |                 max_depth=None, max_features='auto', max_leaf_nodes=None,
325 |                 min_impurity_decrease=0.0, min_impu..._jobs=1,
326 |                 oob_score=False, random_state=None, verbose=0,
327 |                 warm_start=False))]),
328 |            fit_params=None, iid=True, n_jobs=1,
329 | 
param_grid={'randomforestclassifier__n_estimators': [10, 20, 30], 'randomforestclassifier__max_leaf_nodes': [None, 5, 10, 20], 'randomforestclassifier__min_impurity_split': [0.1, 0.2, 0.3], 'randomforestclassifier__max_depth': [None, 6, 8, 10]}, 330 | pre_dispatch='2*n_jobs', refit=True, return_train_score=True, 331 | scoring=None, verbose=0) 332 | 333 | 334 | 335 | - Let's see what parameter did the Grid Search select: 336 | 337 | 338 | ```python 339 | print("Best parameters: {}".format(grid.best_params_)) 340 | ``` 341 | 342 | Best parameters: {'randomforestclassifier__n_estimators': 30, 'randomforestclassifier__max_leaf_nodes': 20, 'randomforestclassifier__min_impurity_split': 0.2, 'randomforestclassifier__max_depth': 8} 343 | 344 | 345 | - Let's score: 346 | 347 | 348 | ```python 349 | print("Validation set score: {:.2f}".format(grid.score(X_test, y_test))) 350 | ``` 351 | 352 | Validation set score: 0.79 353 | 354 | 355 | - Load the test set: 356 | 357 | 358 | ```python 359 | test_df = pd.read_csv('../data/test.csv', encoding="utf-8-sig") 360 | test_df = test_df.head() 361 | ``` 362 | 363 | 364 | ```python 365 | test_df 366 | ``` 367 | 368 | 369 | 370 | 371 |
372 | 373 | 374 | 375 | 376 | 377 | 378 | 379 | 380 | 381 | 382 | 383 | 384 | 385 | 386 | 387 | 388 | 389 | 390 | 391 | 392 | 393 | 394 | 395 | 396 | 397 | 398 | 399 | 400 | 401 | 402 | 403 | 404 | 405 | 406 | 407 | 408 | 409 | 410 | 411 | 412 | 413 | 414 | 415 | 416 | 417 | 418 | 419 | 420 | 421 | 422 | 423 | 424 | 425 | 426 | 427 | 428 | 429 | 430 | 431 | 432 | 433 | 434 | 435 | 436 | 437 | 438 | 439 | 440 | 441 | 442 | 443 | 444 | 445 | 446 | 447 | 448 | 449 | 450 | 451 | 452 | 453 | 454 | 455 | 456 | 457 | 458 | 459 | 460 | 461 | 462 | 463 | 464 | 465 | 466 | 467 |
Loan_IDGenderMarriedDependentsEducationSelf_EmployedApplicantIncomeCoapplicantIncomeLoanAmountLoan_Amount_TermCredit_HistoryProperty_Area
0LP001015MaleYes0GraduateNo57200110.0360.01.0Urban
1LP001022MaleYes1GraduateNo30761500126.0360.01.0Urban
2LP001031MaleYes2GraduateNo50001800208.0360.01.0Urban
3LP001035MaleYes2GraduateNo23402546100.0360.0NaNUrban
4LP001051MaleNo0Not GraduateNo3276078.0360.01.0Urban
468 |
469 | 
470 | 
471 | 
472 | 
473 | ```python
474 | grid.predict(test_df)
475 | ```
476 | 
477 | 
478 | 
479 | 
480 |     array([1, 1, 1, 1, 1])
481 | 
482 | 
483 | 
484 | Our `pipeline` is looking pretty good, and we are ready for the most important step of the tutorial: __Serializing the Machine Learning Model__
485 | 
486 | ### 3. Saving Machine Learning Model : Serialization & Deserialization
487 | 
488 | >In computer science, in the context of data storage, serialization is the process of translating data structures or object state into a format that can be stored (for example, in a file or memory buffer, or transmitted across a network connection link) and reconstructed later in the same or another computer environment.
489 | 
490 | In Python, `pickling` is a standard way to store objects and retrieve them later in their original state. To give a simple example:
491 | 
492 | 
493 | ```python
494 | list_to_pickle = [1, 'here', 123, 'walker']
495 | 
496 | # Pickling the list
497 | import pickle
498 | 
499 | list_pickle = pickle.dumps(list_to_pickle)
500 | ```
501 | 
502 | 
503 | ```python
504 | list_pickle
505 | ```
506 | 
507 | 
508 | 
509 | 
510 |     b'\x80\x03]q\x00(K\x01X\x04\x00\x00\x00hereq\x01K{X\x06\x00\x00\x00walkerq\x02e.'
511 | 
512 | 
513 | 
514 | When we load the pickle back:
515 | 
516 | 
517 | ```python
518 | loaded_pickle = pickle.loads(list_pickle)
519 | ```
520 | 
521 | 
522 | ```python
523 | loaded_pickle
524 | ```
525 | 
526 | 
527 | 
528 | 
529 |     [1, 'here', 123, 'walker']
530 | 
531 | 
532 | 
533 | We can save the `pickled object` to a file as well and reuse it later. This is similar to creating `.rda` files, for folks who are familiar with `R Programming`.
534 | 
535 | __NOTE:__ Some people also argue against using `pickle` for serialization[(1)](#no1). `h5py` could also be an alternative.
536 | 
537 | We have a custom `Class` that we need to import while running our training, hence we'll be using the `dill` module to pack up the `estimator Class` along with our `grid` object.
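To see why plain `pickle` falls short for classes defined on the fly (e.g. in a notebook cell), here is a stdlib-only illustration; `build_estimator_class` is a hypothetical stand-in, not part of the tutorial's code. `pickle` stores classes by reference (module plus qualified name), so an object whose class cannot be re-imported at load time fails to serialize, while `dill` serializes the class definition itself:

```python
import pickle

def build_estimator_class():
    class LocalEstimator:  # stands in for a notebook-defined class like PreProcessing
        pass
    return LocalEstimator

obj = build_estimator_class()()

try:
    pickle.dumps(obj)
except (pickle.PicklingError, AttributeError) as exc:
    # plain pickle cannot locate 'build_estimator_class.<locals>.LocalEstimator'
    print("plain pickle failed:", exc)
```

`dill.dumps(obj)` succeeds on the same object because it ships the class definition along with the instance, which is what we rely on when pickling `grid` together with `PreProcessing`.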
538 | 
539 | It is advisable to create a separate `training.py` file that contains all the code for training the model ([See here for example](https://github.com/pratos/flask_api/blob/master/flask_api/utils.py)).
540 | 
541 | - To install `dill`:
542 | 
543 | 
544 | ```python
545 | !pip install dill
546 | ```
547 | 
548 |     Requirement already satisfied: dill in /home/pratos/miniconda3/envs/ordermanagement/lib/python3.5/site-packages
549 | 
550 | 
551 | 
552 | ```python
553 | import dill as pickle
554 | filename = 'model_v1.pk'
555 | ```
556 | 
557 | 
558 | ```python
559 | with open('../flask_api/models/'+filename, 'wb') as file:
560 |     pickle.dump(grid, file)
561 | ```
562 | 
563 | So our model is saved at the location above. Now that the model is `pickled`, the next step is creating a `Flask` wrapper around it.
564 | 
565 | Before that, to be sure that our `pickled` file works fine -- let's load it back and do a prediction:
566 | 
567 | 
568 | ```python
569 | with open('../flask_api/models/'+filename ,'rb') as f:
570 |     loaded_model = pickle.load(f)
571 | ```
572 | 
573 | 
574 | ```python
575 | loaded_model.predict(test_df)
576 | ```
577 | 
578 | 
579 | 
580 | 
581 |     array([1, 1, 1, 1, 1])
582 | 
583 | 
584 | 
585 | Since we already have the `preprocessing` steps for new incoming data as part of the `pipeline`, we just have to run `predict()`. This is what makes `pipelines` so convenient to work with in `scikit-learn`.
586 | 
587 | `Estimators` and `pipelines` save you time and headache, even if the initial implementation seems like overkill. A stitch in time saves nine!
588 | 
589 | ### 4.
Creating an API using Flask
590 | 
591 | We'll keep the folder structure as simple as possible:
592 | 
593 | ![Folder Struct](https://raw.githubusercontent.com/pratos/flask_api/master/notebooks/images/flaskapp3.png)
594 | 
595 | There are three important parts in constructing our wrapper function, `apicall()`:
596 | 
597 | - Getting the `request` data (for which predictions are to be made)
598 | 
599 | - Loading our `pickled estimator`
600 | 
601 | - `jsonify`-ing our predictions and sending the response back with `status code: 200`
602 | 
603 | HTTP messages are made of a header and a body. By convention, the majority of the body content sent across is in `json` format. We'll be sending (`POST url-endpoint/`) the incoming data as a batch to get predictions.
604 | 
605 | (__NOTE:__ You can send plain text, XML, csv or image directly but for the sake of interchangeability of the format, it is advisable to use `json`)
606 | 
607 | ```python
608 | """Filename: server.py
609 | """
610 | 
611 | import os
612 | import pandas as pd
613 | import dill as pickle  # dill, not plain pickle: the model bundles the custom PreProcessing class
614 | from flask import Flask, jsonify, request
615 | 
616 | app = Flask(__name__)
617 | 
618 | @app.route('/predict', methods=['POST'])
619 | def apicall():
620 |     """API Call
621 | 
622 |     Pandas dataframe (sent as a payload) from API Call
623 |     """
624 |     try:
625 |         test_json = request.get_json()
626 |         test = pd.read_json(test_json, orient='records')
627 | 
628 |         # To resolve the issue of TypeError: Cannot compare types 'ndarray(dtype=int64)' and 'str'
629 |         test['Dependents'] = [str(x) for x in list(test['Dependents'])]
630 | 
631 |         # Getting the Loan_IDs separated out
632 |         loan_ids = test['Loan_ID']
633 | 
634 |     except Exception as e:
635 |         raise e
636 | 
637 |     clf = 'model_v1.pk'
638 | 
639 |     if test.empty:
640 |         return(bad_request())  # bad_request() is a helper (not shown here) returning a 400 response
641 |     else:
642 |         # Load the saved model
643 |         print("Loading the model...")
644 |         loaded_model = None
645 |         with open('./models/'+clf,'rb') as f:
646 |             loaded_model = pickle.load(f)
647 | 
648 | 
print("The model has been loaded...doing predictions now...")
649 |         predictions = loaded_model.predict(test)
650 | 
651 |         """Add the predictions as Series to a new pandas dataframe
652 |            OR
653 |            Depending on the use-case, the entire test data appended with the new files
654 |         """
655 |         prediction_series = list(pd.Series(predictions))
656 | 
657 |         final_predictions = pd.DataFrame(list(zip(loan_ids, prediction_series)))
658 | 
659 |         """We can be as creative as we want in sending the responses,
660 |            but we need to send the response codes as well.
661 |         """
662 |         responses = jsonify(predictions=final_predictions.to_json(orient="records"))
663 |         responses.status_code = 200
664 | 
665 |         return (responses)
666 | 
667 | ```
668 | 
669 | Once done, run: `gunicorn --bind 0.0.0.0:8000 server:app`
670 | 
671 | Let's generate some prediction data and query the API running locally at `http://0.0.0.0:8000/predict`:
672 | 
673 | 
674 | ```python
675 | import json
676 | import requests
677 | ```
678 | 
679 | 
680 | ```python
681 | """Setting the headers to send and accept json responses
682 | """
683 | header = {'Content-Type': 'application/json', \
684 |           'Accept': 'application/json'}
685 | 
686 | """Reading test batch
687 | """
688 | df = pd.read_csv('../data/test.csv', encoding="utf-8-sig")
689 | df = df.head()
690 | 
691 | """Converting Pandas Dataframe to json
692 | """
693 | data = df.to_json(orient='records')
694 | ```
695 | 
696 | 
697 | ```python
698 | data
699 | ```
700 | 
701 | 
702 | 
703 | 
704 | 
'[{"Loan_ID":"LP001015","Gender":"Male","Married":"Yes","Dependents":"0","Education":"Graduate","Self_Employed":"No","ApplicantIncome":5720,"CoapplicantIncome":0,"LoanAmount":110.0,"Loan_Amount_Term":360.0,"Credit_History":1.0,"Property_Area":"Urban"},{"Loan_ID":"LP001022","Gender":"Male","Married":"Yes","Dependents":"1","Education":"Graduate","Self_Employed":"No","ApplicantIncome":3076,"CoapplicantIncome":1500,"LoanAmount":126.0,"Loan_Amount_Term":360.0,"Credit_History":1.0,"Property_Area":"Urban"},{"Loan_ID":"LP001031","Gender":"Male","Married":"Yes","Dependents":"2","Education":"Graduate","Self_Employed":"No","ApplicantIncome":5000,"CoapplicantIncome":1800,"LoanAmount":208.0,"Loan_Amount_Term":360.0,"Credit_History":1.0,"Property_Area":"Urban"},{"Loan_ID":"LP001035","Gender":"Male","Married":"Yes","Dependents":"2","Education":"Graduate","Self_Employed":"No","ApplicantIncome":2340,"CoapplicantIncome":2546,"LoanAmount":100.0,"Loan_Amount_Term":360.0,"Credit_History":null,"Property_Area":"Urban"},{"Loan_ID":"LP001051","Gender":"Male","Married":"No","Dependents":"0","Education":"Not Graduate","Self_Employed":"No","ApplicantIncome":3276,"CoapplicantIncome":0,"LoanAmount":78.0,"Loan_Amount_Term":360.0,"Credit_History":1.0,"Property_Area":"Urban"}]' 705 | 706 | 707 | 708 | 709 | ```python 710 | """POST /predict 711 | """ 712 | resp = requests.post("http://0.0.0.0:8000/predict", \ 713 | data = json.dumps(data),\ 714 | headers= header) 715 | ``` 716 | 717 | 718 | ```python 719 | resp.status_code 720 | ``` 721 | 722 | 723 | 724 | 725 | 200 726 | 727 | 728 | 729 | 730 | ```python 731 | """The final response we get is as follows: 732 | """ 733 | resp.json() 734 | ``` 735 | 736 | 737 | 738 | 739 | {'predictions': '[{"0":"LP001015","1":1},{"0":"LP001022","1":1},{"0":"LP001031","1":1},{"0":"LP001035","1":1},{"0":"LP001051","1":1}]'} 740 | 741 | 742 | 743 | ### End Notes 744 | 745 | We have half the battle won here, with a working API that serves predictions in a way where we 
take one step towards integrating our ML solutions right into our products. This is a very basic API that will help with prototyping a data product; to make it a fully functional, production-ready API, a few more additions are required that aren't in the scope of Machine Learning.
746 | 
747 | There are a few things to keep in mind when adopting an API-first approach:
748 | 
749 | - Creating APIs out of spaghetti code is next to impossible, so approach your Machine Learning workflow as if you need to create a clean, usable API as a deliverable. It will save you a lot of jumping through hoops later.
750 | 
751 | - Try to use version control for models and the API code. `Flask` doesn't provide great support for version control. Saving and keeping track of ML Models is difficult; find the least messy way that suits you. [This article](https://medium.com/towards-data-science/how-to-version-control-your-machine-learning-task-cad74dce44c4) talks about ways to do it.
752 | 
753 | - Specific to `sklearn models` (as done in this article): if you are using custom `estimators` for preprocessing or any other related task, make sure you keep the `estimator` and the `training code` together, so that the pickled model has the `estimator` class tagged along.
754 | 
755 | The next logical step would be creating a workflow to deploy such APIs on a small VM. There are various ways to do it and we'll be looking into those in the next article.
756 | 
757 | Code & Notebooks for this article: [pratos/flask_api](https://github.com/pratos/flask_api)
758 | 
759 | __Sources & Links:__
760 | 
761 | [1]. Don't Pickle your data.
762 | 
763 | [2]. Building Scikit Learn compatible transformers.
764 | 
765 | [3]. Using jsonify in Flask.
766 | 
767 | [4]. Flask-QuickStart. 
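As a small addendum to the version-control point above: one lightweight option is to fingerprint serialized model files with a content hash, so a name like `model_v1.pk` can always be tied back to exact bytes. This is a stdlib-only sketch under our own naming assumptions, not something the article prescribes:

```python
import hashlib

def model_fingerprint(model_bytes: bytes, short: int = 12) -> str:
    """Return a short, deterministic content hash for a serialized model blob."""
    return hashlib.sha256(model_bytes).hexdigest()[:short]

# In practice model_bytes would be open('model_v1.pk', 'rb').read()
blob = b"serialized-model-bytes"
print("model_" + model_fingerprint(blob) + ".pk")
```

The same bytes always produce the same tag, so a prediction logged with the tag can be traced to the exact model artifact that produced it.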
768 | -------------------------------------------------------------------------------- /notebooks/images/flaskapp1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pratos/flask_api/2be7eded1e7167a64b895e700adab3c60355e186/notebooks/images/flaskapp1.png -------------------------------------------------------------------------------- /notebooks/images/flaskapp2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pratos/flask_api/2be7eded1e7167a64b895e700adab3c60355e186/notebooks/images/flaskapp2.png -------------------------------------------------------------------------------- /notebooks/images/flaskapp3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/pratos/flask_api/2be7eded1e7167a64b895e700adab3c60355e186/notebooks/images/flaskapp3.png --------------------------------------------------------------------------------