├── .github ├── ISSUE_TEMPLATE │ ├── config.yml │ ├── doubt.md │ └── helper-issue.md └── PULL_REQUEST_TEMPLATE.md ├── README.md ├── main.ipynb ├── requirements.txt └── submissions ├── .ipynb_checkpoints └── Arpit-Agarwal-checkpoint.ipynb ├── Abhijit-Singh.ipynb ├── Anusha-Verma-C.ipynb ├── Arpit-Agarwal.ipynb └── Dhanesh-Shetty.ipynb /.github/ISSUE_TEMPLATE/config.yml: -------------------------------------------------------------------------------- 1 | blank_issues_enabled: false -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/doubt.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Doubt 3 | about: Ask us a doubt if something is not clear. 4 | title: '[DOUBT] <"Your doubt goes here"> ' 5 | labels: doubt 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Describe the doubt** 11 | A clear and concise description of what the doubt is. 12 | 13 | **Where do you need help** 14 | Where exactly do you need our help? 15 | 16 | **Additional context** 17 | Add any other context about the doubt here. 18 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/helper-issue.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Helper Issue 3 | about: Make a helper issue for participants 4 | title: '' 5 | labels: good first issue, hacktoberfest, helper issue 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Task** 11 | Describe the task here 12 | 13 | **Function to Implement** 14 | Write function name, if any 15 | -------------------------------------------------------------------------------- /.github/PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | Fixes #[Add issue number here. If you do not solve the issue entirely, please change the message e.g. 
"First steps for issue #IssueNumber"] 2 | 3 | Changes: [Tell whether you have completed the entire task or some modules/functions] -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |

Kicking Off Hacktoberfest with ACM-VIT!

2 |

3 | 4 |

5 | 6 |

Good Client, Bad Client

7 | 8 |

9 | Help us build a Credit Card Approval System - Using Machine Learning! 10 |

11 | 12 |

13 | 14 | 15 | [badges: made-by-acm, license, stars, forks] 19 | 20 | 21 |

22 | 23 | ## Overview 24 | 25 | The main aim of the project is to build a machine learning model that predicts whether an applicant is a 'good' or 'bad' client. Unlike most classification tasks, the definition of 'good' and 'bad' is not given; defining the label is part of the problem.

26 | Credit scorecards are a common risk-control method in the financial industry. They use personal information and data submitted by credit card applicants to predict the probability of future defaults and credit card borrowing, so the bank can decide whether to issue a credit card to the applicant. Credit scores objectively quantify the magnitude of risk.

27 | In the dataset, application_record.csv is the file that holds socio-economic information about each customer, and credit_record.csv is the file that holds the monthly payment/default records for each client.
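The two files can be loaded with pandas. The sketch below is self-contained: the inline sample rows only mirror the column layout of the real files (values taken from the dataset's first rows), and in practice you would point `pd.read_csv` at the downloaded CSVs instead.

```python
import io
import pandas as pd

# In practice:
#   application_record = pd.read_csv("application_record.csv")
#   credit_record = pd.read_csv("credit_record.csv")
# The inline samples below just mirror the column layout of the two files.
application_csv = io.StringIO(
    "ID,CODE_GENDER,FLAG_OWN_CAR,FLAG_OWN_REALTY,AMT_INCOME_TOTAL,OCCUPATION_TYPE\n"
    "5008804,M,Y,Y,427500.0,\n"
    "5008806,M,Y,Y,112500.0,Security staff\n"
)
credit_csv = io.StringIO(
    "ID,MONTHS_BALANCE,STATUS\n"
    "5008804,0,C\n"
    "5008804,-1,0\n"
)

application_record = pd.read_csv(application_csv)  # one row per applicant
credit_record = pd.read_csv(credit_csv)            # one row per applicant-month

print(application_record.head())
print(credit_record.head())
```

The two tables join on the `ID` column, which is how the payment history gets attached to each application.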
28 | 29 | 30 | --- 31 | 32 | ## Usage 33 | Run the following command to install all the required packages for this project 34 |
pip install -r requirements.txt
35 | 36 | Let's get started! 37 |

 38 |  git remote add upstream [UPSTREAM-HTTPS-ADDRESS]
 39 |  git fetch upstream
 40 |  git merge upstream/[BRANCH-NAME]
41 | ## Dataset 42 | 43 | Link to the data set is [here](https://drive.google.com/drive/folders/1ltq08WdYxd-r9wnY60o78VBgN5FlMcKk?usp=sharing). 44 | 45 | --- 46 | ## Submitting a Pull Request 47 | 48 | * Fork the repository by clicking the fork button on top right corner of the page 49 | * Clone the target repository. To clone, click on the clone button and copy the https address. Then run 50 |
git clone [HTTPS-ADDRESS]
51 | * Go to the cloned directory by running 52 |
cd [NAME-OF-REPO]
53 | * Create a new branch. Use 54 |
 git checkout -b [YOUR-BRANCH-NAME]
55 | * Make your changes to the code. Add changes to your branch by using 56 |
git add .
57 | * Commit the changes by executing 58 |
git commit -m "your msg"
59 | * Push to remote. To do this, run 60 |
git push origin [YOUR-BRANCH-NAME]
61 | * Create a pull request. Go to the target repository and click on the "Compare & pull request" button. **Make sure your PR description mentions which issues you're solving.** 62 | 63 | * Wait for your request to be accepted. 64 | 65 | --- 66 | ## Guidelines for Pull Requests 67 | 68 | * Avoid pull requests that: 69 | * are automated or scripted 70 | * are plagiarized from someone else's branch 71 | * Do not spam 72 | * The project maintainers' decision on the validity of a PR is final. 73 | 74 | For additional guidelines, refer to the [participation rules](https://hacktoberfest.digitalocean.com/details#rules) 75 | 76 | --- 77 | 78 | ## What counts as a PR? 79 | 80 | Check out our [issues](https://github.com/ACM-VIT/Good-Client-Bad-Client/issues) and try to solve them! 81 | 82 | 83 | --- 84 | 85 | ## Interacting with Issues 86 | 87 | * There are helper issues that detail all you have to do to complete the project. 88 | * Read the helper issues and work on the corresponding code in your fork of the repo. 89 | * If you have a doubt regarding the 'help' given, comment below the issue. 90 | * If you have a doubt not related to any open 'helper issue/s', open up a new issue, select doubt and fill in the template. 91 | * If you want to provide some extra help to fellow participants, open up a new helper issue. Don't include any solution/code! 92 | * Do not spam 93 | 94 | --- 95 | 96 | ## Authors 97 | 98 | **Authors:** [Aryan Vats](https://github.com/avats101), [Aditya Nalini](https://github.com/adinalini), [Varun Srinivasan](https://github.com/DEV-VarunSrinivasan) 99 |
100 | -------------------------------------------------------------------------------- /main.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "main.ipynb", 7 | "provenance": [], 8 | "toc_visible": true 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | } 14 | }, 15 | "cells": [ 16 | { 17 | "cell_type": "markdown", 18 | "metadata": { 19 | "id": "f17yT-ge6H5l" 20 | }, 21 | "source": [ 22 | "# **Data**" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": { 28 | "id": "y6JmMMyn6T_6" 29 | }, 30 | "source": [ 31 | "\n", 32 | "Two files are available: the application data, and the monthly credit card account status information.\n", 33 | "\n", 34 | "The application data will be used for feature creation, and the status (credit payment status) will be required for defining the labels - which of the applicants have paid back their dues and which of them turn out to be bad accounts." 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": { 40 | "id": "HQkocJqb5BxU" 41 | }, 42 | "source": [ 43 | "### 1. Application\n", 44 | "\n", 45 | "For a credit card, customers fill up a form - online or physical. The application information is used for assessing the creditworthiness of the customer. In addition to the application information, the Credit Bureau Score e.g.
FICO Score in the US, CIBIL Score in India, and other internal information about the applicants are used for the decision.\n", 46 | "\n", 47 | "Also, banks are gradually considering a lot of external data to improve the quality of credit decisions.\n", 48 | "\n", 49 | "Now, we expect to read and explore the application sample data file provided.\n" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "metadata": { 55 | "id": "YTu89Y6e5AlQ" 56 | }, 57 | "source": [ 58 | "def read_app_data():\n", 59 | "    # Reading the application data" 60 | ], 61 | "execution_count": null, 62 | "outputs": [] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": { 67 | "id": "v9RGpm697m6E" 68 | }, 69 | "source": [ 70 | "### 2. Credit Status\n", 71 | "\n", 72 | "Once a credit card is issued, the customer uses it for purchases, a statement is generated for the dues, and the customer makes a payment by the due date. This is a typical credit card cycle.\n", 73 | "\n", 74 | "If a customer is not able to pay the minimum due amount, the customer is considered past due for that month.\n", 75 | "If the non-payment continues for a period, the customer is considered a defaulter and the due amount is written off and becomes bad debt. Of course, the bank puts in a lot of effort and steps to recover the due amount; this falls under the collection process.\n", 76 | "\n", 77 | "With the modeling process, the aim is to learn about the customers who were not able to pay back their dues, and not to approve applications from customers who look similar to them.\n", 78 | "Of course, we do not know how many of the rejected applications were actually good customers. This is not in the scope of this exercise.\n", 79 | "\n", 80 | "For this exercise, the credit status file is given. In this file, a status value is given for each of the approved applications."
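The labelling idea described above can be sketched in pandas. This is only an illustration: it assumes the STATUS encoding commonly shipped with this dataset (`C` = paid off that month, `X` = no loan that month, `0`-`5` = increasingly overdue), and it marks a client as 'bad' once any month reaches status `2` (60+ days past due). The cut-off is a modelling choice, not something fixed by the data.

```python
import pandas as pd

# Illustrative monthly status records: one row per client per month.
credit_record = pd.DataFrame({
    "ID":             [1,   1,   1,   2,   2,   3],
    "MONTHS_BALANCE": [0,  -1,  -2,   0,  -1,   0],
    "STATUS":         ["C", "0", "2", "C", "C", "X"],
})

# Assumed cut-off: any month at status 2-5 (60+ days past due) marks a bad account.
BAD_STATUSES = {"2", "3", "4", "5"}

labels = (
    credit_record.assign(is_bad=credit_record["STATUS"].isin(BAD_STATUSES))
    .groupby("ID")["is_bad"]
    .any()            # bad if *any* month was severely overdue
    .astype(int)      # 1 = bad client, 0 = good client
    .rename("LABEL")
)
print(labels)
```

Merging `labels` back onto the application data by `ID` then gives the training set for a supervised model.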
81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "metadata": { 86 | "id": "CLupt4Kd7vu5" 87 | }, 88 | "source": [ 89 | "" 90 | ], 91 | "execution_count": null, 92 | "outputs": [] 93 | }, 94 | { 95 | "cell_type": "markdown", 96 | "metadata": { 97 | "id": "3DhzXpbe8qbG" 98 | }, 99 | "source": [ 100 | "## Feature Variable Creation" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "metadata": { 106 | "id": "rIDdu8xb8-KA" 107 | }, 108 | "source": [ 109 | "def feature_creation():" 110 | ], 111 | "execution_count": null, 112 | "outputs": [] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": { 117 | "id": "ZbZ7W1kK8_Pp" 118 | }, 119 | "source": [ 120 | "## Data Exploration " 121 | ] 122 | }, 123 | { 124 | "cell_type": "markdown", 125 | "metadata": { 126 | "id": "dxYpOQt_-O6L" 127 | }, 128 | "source": [ 129 | "Let's check whether any of the variables have missing values. (Hint: there are missing values :) )" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "metadata": { 135 | "id": "7sTRgiMW9e6U" 136 | }, 137 | "source": [ 138 | "def missing_values_table(df):" 139 | ], 140 | "execution_count": null, 141 | "outputs": [] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": { 146 | "id": "S3BnPgZt-H7m" 147 | }, 148 | "source": [ 149 | "One way to solve the missing values problem is to treat the missing entries as a separate class. Next, we would want to do bivariate analysis - the analysis between the label variable and each of the feature variables. Since the analysis differs with the analytical type of each feature variable, we can first find the analytical type of every feature. We have written a function definition to find the analytical type of variables. You can also try to solve the missing values by any other method."
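The notebook only stubs out `missing_values_table` and the missing-value fix. One possible sketch of both helpers follows; treating missing categorical entries as a separate 'Missing' class is just the approach mentioned above, and the tiny example frame is illustrative (in the real data, OCCUPATION_TYPE is the column with gaps).

```python
import numpy as np
import pandas as pd

def missing_values_table(df):
    """Return count and percentage of missing values per column, worst first."""
    missing = df.isnull().sum()
    pct = 100 * missing / len(df)
    table = pd.DataFrame({"Missing Values": missing, "% of Total": pct})
    return table[table["Missing Values"] > 0].sort_values("% of Total", ascending=False)

def solution_missing_values(df):
    """One possible fix: fill missing categoricals with a separate 'Missing' class."""
    out = df.copy()
    categorical = out.select_dtypes(include="object").columns  # categorical features
    out[categorical] = out[categorical].fillna("Missing")
    return out

# Tiny illustrative frame in the shape of the application data.
df = pd.DataFrame({
    "AMT_INCOME_TOTAL": [427500.0, 112500.0, 270000.0],     # continuous feature
    "OCCUPATION_TYPE": [np.nan, "Security staff", np.nan],  # categorical with gaps
})
print(missing_values_table(df))
filled = solution_missing_values(df)
```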
150 | ] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "metadata": { 155 | "id": "n5ZaKk-k-HLE" 156 | }, 157 | "source": [ 158 | "def solution_missing_values(df):\n", 159 | " # Find Continuous and Categorical Features" 160 | ], 161 | "execution_count": null, 162 | "outputs": [] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "metadata": { 167 | "id": "O0t6Ni2a-8zi" 168 | }, 169 | "source": [ 170 | "### Observations\n" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": { 176 | "id": "-glozoHTJ-HQ" 177 | }, 178 | "source": [ 179 | "# Model Creation" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": { 185 | "id": "3GnJPaiuKBq3" 186 | }, 187 | "source": [ 188 | "## Importing packages for the model" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "metadata": { 194 | "id": "fQGD5KnFKayL" 195 | }, 196 | "source": [ 197 | "" 198 | ], 199 | "execution_count": null, 200 | "outputs": [] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": { 205 | "id": "5fS-PM02KUVY" 206 | }, 207 | "source": [ 208 | "## Prepare Data" 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "metadata": { 214 | "id": "tYnZ1xNPAZtC" 215 | }, 216 | "source": [ 217 | "" 218 | ], 219 | "execution_count": null, 220 | "outputs": [] 221 | }, 222 | { 223 | "cell_type": "markdown", 224 | "metadata": { 225 | "id": "-8eLrkCCKeMA" 226 | }, 227 | "source": [ 228 | "## Split Sample to Train and Test Samples" 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "metadata": { 234 | "id": "sSMCsfs9Kfr4" 235 | }, 236 | "source": [ 237 | "" 238 | ], 239 | "execution_count": null, 240 | "outputs": [] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": { 245 | "id": "k7uKgsSJKgbU" 246 | }, 247 | "source": [ 248 | "## Model Definition " 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "metadata": { 254 | "id": "p9hmYoJQKv_R" 255 | }, 256 | "source": [ 257 | "" 258 | ], 259 | "execution_count": null, 260 | 
"outputs": [] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "metadata": { 265 | "id": "uBzMnlMVKwRd" 266 | }, 267 | "source": [ 268 | "## Fitting Model" 269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "metadata": { 274 | "id": "fC_JjEB4OmvC" 275 | }, 276 | "source": [ 277 | "" 278 | ], 279 | "execution_count": null, 280 | "outputs": [] 281 | }, 282 | { 283 | "cell_type": "markdown", 284 | "metadata": { 285 | "id": "ZpFwD14iOnWq" 286 | }, 287 | "source": [ 288 | "## Predict using Fitted Model" 289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "metadata": { 294 | "id": "B_2uKNb9Qflr" 295 | }, 296 | "source": [ 297 | "" 298 | ], 299 | "execution_count": null, 300 | "outputs": [] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": { 305 | "id": "5hIjbQ5aSAqW" 306 | }, 307 | "source": [ 308 | "## Model Evaluation\n", 309 | "Now, we may want to compare the predicted and observed label classes to see the actual accuracy. A confusion matrix can be useful." 310 | ] 311 | }, 312 | { 313 | "cell_type": "code", 314 | "metadata": { 315 | "id": "Nsfw4Ty_SL69" 316 | }, 317 | "source": [ 318 | "" 319 | ], 320 | "execution_count": null, 321 | "outputs": [] 322 | }, 323 | { 324 | "cell_type": "markdown", 325 | "metadata": { 326 | "id": "14r3giw8SMNF" 327 | }, 328 | "source": [ 329 | "## Model Parameter Tuning\n", 330 | "Since we have a relatively small amount of data and features, we are setting a large number of parameters for tuning. If the data were large, it could take quite some time to get the results."
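As one way to carry out the tuning step, the sketch below runs scikit-learn's `GridSearchCV` over a random-forest classifier on synthetic data. The model choice, the grid, and the data are all assumptions for demonstration; note that scikit-learn is not pinned in requirements.txt, so it would need to be installed separately.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the prepared feature matrix and 0/1 labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Small illustrative grid; with real data a larger grid takes proportionally longer.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X_train, y_train)  # refits the best combination on the full training set

print("best parameters:", search.best_params_)
print("test accuracy:", search.best_estimator_.score(X_test, y_test))
```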
331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "metadata": { 336 | "id": "W6f3otA6Th6o" 337 | }, 338 | "source": [ 339 | "" 340 | ], 341 | "execution_count": null, 342 | "outputs": [] 343 | }, 344 | { 345 | "cell_type": "markdown", 346 | "metadata": { 347 | "id": "mLKFRDjrTicw" 348 | }, 349 | "source": [ 350 | "\n", 351 | "## Optimized Model Classifier\n", 352 | "\n", 353 | "Parameter tuning has helped us get the best combination of the parameters. Now, we will fit the model with these sets of parameters and see the improvement in the accuracy of the model.\n" 354 | ] 355 | }, 356 | { 357 | "cell_type": "code", 358 | "metadata": { 359 | "id": "6GOeMkqJTm5n" 360 | }, 361 | "source": [ 362 | "" 363 | ], 364 | "execution_count": null, 365 | "outputs": [] 366 | }, 367 | { 368 | "cell_type": "markdown", 369 | "metadata": { 370 | "id": "H_mHegtJTnIt" 371 | }, 372 | "source": [ 373 | "## Optimized Model Evaluation" 374 | ] 375 | }, 376 | { 377 | "cell_type": "code", 378 | "metadata": { 379 | "id": "V7C1-zk2TvXi" 380 | }, 381 | "source": [ 382 | "" 383 | ], 384 | "execution_count": null, 385 | "outputs": [] 386 | }, 387 | { 388 | "cell_type": "markdown", 389 | "metadata": { 390 | "id": "SXkBI0rOTv4r" 391 | }, 392 | "source": [ 393 | "## Model Evaluation for the Optimized Model on Testing Sample\n" 394 | ] 395 | }, 396 | { 397 | "cell_type": "code", 398 | "metadata": { 399 | "id": "-Ls2zBJiVSFG" 400 | }, 401 | "source": [ 402 | "" 403 | ], 404 | "execution_count": null, 405 | "outputs": [] 406 | } 407 | ] 408 | } -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy==1.19.2 2 | matplotlib==3.3.2 3 | pandas==1.1.2 4 | seaborn==0.11.0 5 | -------------------------------------------------------------------------------- /submissions/.ipynb_checkpoints/Arpit-Agarwal-checkpoint.ipynb: 
-------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "# Importing required libraries\n", 10 | "import numpy as np\n", 11 | "import pandas as pd" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## Import and read app data\n", 19 | "\n", 20 | "The link for downloading the daatset is " 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 2, 26 | "metadata": {}, 27 | "outputs": [], 28 | "source": [ 29 | "def read_app_data():\n", 30 | " application_record=pd.read_csv(r'C:\\Users\\akhil\\Documents\\drive-download-20201009T155012Z-001\\application_record.csv')\n", 31 | " return application_record\n", 32 | "application_record=read_app_data()" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": null, 38 | "metadata": {}, 39 | "outputs": [], 40 | "source": [] 41 | } 42 | ], 43 | "metadata": { 44 | "kernelspec": { 45 | "display_name": "Python3.6Test", 46 | "language": "python", 47 | "name": "python3.6test" 48 | }, 49 | "language_info": { 50 | "codemirror_mode": { 51 | "name": "ipython", 52 | "version": 3 53 | }, 54 | "file_extension": ".py", 55 | "mimetype": "text/x-python", 56 | "name": "python", 57 | "nbconvert_exporter": "python", 58 | "pygments_lexer": "ipython3", 59 | "version": "3.6.5" 60 | } 61 | }, 62 | "nbformat": 4, 63 | "nbformat_minor": 4 64 | } 65 | -------------------------------------------------------------------------------- /submissions/Abhijit-Singh.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "# importing pandas library as pd\n", 10 | "\n", 11 | "import pandas as pd" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | 
"metadata": {}, 17 | "source": [ 18 | "# importing application_record dataset " 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# Reading application_record.csv file.\n", 28 | "# This application_record.csv could be downloaded from the below provided link\n", 29 | "# https://drive.google.com/file/d/1EJ454SyXT-RpEAfhqYu72bCyvZrY_ehg/view?usp=sharing\n", 30 | "def read_app_data():\n", 31 | " application_data=pd.read_csv('application_record.csv')\n", 32 | " return application_data" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "## importing credit_card dataset " 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 3, 45 | "metadata": {}, 46 | "outputs": [], 47 | "source": [ 48 | "# This credit_record.csv could be downloaded from the below provided link\n", 49 | "# https://drive.google.com/file/d/1LvjYFEztJJUYNhSa1eznfseCNHz3DgZb/view?usp=sharing\n", 50 | "credit_record=pd.read_csv('credit_record.csv')" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "## Feature creation" 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": 4, 63 | "metadata": {}, 64 | "outputs": [ 65 | { 66 | "data": { 67 | "text/html": [ 68 | "
\n", 69 | "\n", 82 | "\n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | "
IDCODE_GENDERFLAG_OWN_CARFLAG_OWN_REALTYCNT_CHILDRENAMT_INCOME_TOTALNAME_INCOME_TYPENAME_EDUCATION_TYPENAME_FAMILY_STATUSNAME_HOUSING_TYPEDAYS_BIRTHDAYS_EMPLOYEDFLAG_MOBILFLAG_WORK_PHONEFLAG_PHONEFLAG_EMAILOCCUPATION_TYPECNT_FAM_MEMBERS
05008804MYY0427500.0WorkingHigher educationCivil marriageRented apartment-12005-45421100NaN2.0
15008805MYY0427500.0WorkingHigher educationCivil marriageRented apartment-12005-45421100NaN2.0
25008806MYY0112500.0WorkingSecondary / secondary specialMarriedHouse / apartment-21474-11341000Security staff2.0
35008808FNY0270000.0Commercial associateSecondary / secondary specialSingle / not marriedHouse / apartment-19110-30511011Sales staff1.0
45008809FNY0270000.0Commercial associateSecondary / secondary specialSingle / not marriedHouse / apartment-19110-30511011Sales staff1.0
\n", 214 | "
" 215 | ], 216 | "text/plain": [ 217 | " ID CODE_GENDER FLAG_OWN_CAR FLAG_OWN_REALTY CNT_CHILDREN \\\n", 218 | "0 5008804 M Y Y 0 \n", 219 | "1 5008805 M Y Y 0 \n", 220 | "2 5008806 M Y Y 0 \n", 221 | "3 5008808 F N Y 0 \n", 222 | "4 5008809 F N Y 0 \n", 223 | "\n", 224 | " AMT_INCOME_TOTAL NAME_INCOME_TYPE NAME_EDUCATION_TYPE \\\n", 225 | "0 427500.0 Working Higher education \n", 226 | "1 427500.0 Working Higher education \n", 227 | "2 112500.0 Working Secondary / secondary special \n", 228 | "3 270000.0 Commercial associate Secondary / secondary special \n", 229 | "4 270000.0 Commercial associate Secondary / secondary special \n", 230 | "\n", 231 | " NAME_FAMILY_STATUS NAME_HOUSING_TYPE DAYS_BIRTH DAYS_EMPLOYED \\\n", 232 | "0 Civil marriage Rented apartment -12005 -4542 \n", 233 | "1 Civil marriage Rented apartment -12005 -4542 \n", 234 | "2 Married House / apartment -21474 -1134 \n", 235 | "3 Single / not married House / apartment -19110 -3051 \n", 236 | "4 Single / not married House / apartment -19110 -3051 \n", 237 | "\n", 238 | " FLAG_MOBIL FLAG_WORK_PHONE FLAG_PHONE FLAG_EMAIL OCCUPATION_TYPE \\\n", 239 | "0 1 1 0 0 NaN \n", 240 | "1 1 1 0 0 NaN \n", 241 | "2 1 0 0 0 Security staff \n", 242 | "3 1 0 1 1 Sales staff \n", 243 | "4 1 0 1 1 Sales staff \n", 244 | "\n", 245 | " CNT_FAM_MEMBERS \n", 246 | "0 2.0 \n", 247 | "1 2.0 \n", 248 | "2 2.0 \n", 249 | "3 1.0 \n", 250 | "4 1.0 " 251 | ] 252 | }, 253 | "execution_count": 4, 254 | "metadata": {}, 255 | "output_type": "execute_result" 256 | } 257 | ], 258 | "source": [ 259 | "application_data=read_app_data()\n", 260 | "application_data.head()" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": 5, 266 | "metadata": {}, 267 | "outputs": [ 268 | { 269 | "name": "stdout", 270 | "output_type": "stream", 271 | "text": [ 272 | "\n", 273 | "RangeIndex: 438557 entries, 0 to 438556\n", 274 | "Data columns (total 18 columns):\n", 275 | " # Column Non-Null Count Dtype \n", 276 | "--- ------ 
-------------- ----- \n", 277 | " 0 ID 438557 non-null int64 \n", 278 | " 1 CODE_GENDER 438557 non-null object \n", 279 | " 2 FLAG_OWN_CAR 438557 non-null object \n", 280 | " 3 FLAG_OWN_REALTY 438557 non-null object \n", 281 | " 4 CNT_CHILDREN 438557 non-null int64 \n", 282 | " 5 AMT_INCOME_TOTAL 438557 non-null float64\n", 283 | " 6 NAME_INCOME_TYPE 438557 non-null object \n", 284 | " 7 NAME_EDUCATION_TYPE 438557 non-null object \n", 285 | " 8 NAME_FAMILY_STATUS 438557 non-null object \n", 286 | " 9 NAME_HOUSING_TYPE 438557 non-null object \n", 287 | " 10 DAYS_BIRTH 438557 non-null int64 \n", 288 | " 11 DAYS_EMPLOYED 438557 non-null int64 \n", 289 | " 12 FLAG_MOBIL 438557 non-null int64 \n", 290 | " 13 FLAG_WORK_PHONE 438557 non-null int64 \n", 291 | " 14 FLAG_PHONE 438557 non-null int64 \n", 292 | " 15 FLAG_EMAIL 438557 non-null int64 \n", 293 | " 16 OCCUPATION_TYPE 304354 non-null object \n", 294 | " 17 CNT_FAM_MEMBERS 438557 non-null float64\n", 295 | "dtypes: float64(2), int64(8), object(8)\n", 296 | "memory usage: 60.2+ MB\n" 297 | ] 298 | } 299 | ], 300 | "source": [ 301 | "application_data.info()" 302 | ] 303 | }, 304 | { 305 | "cell_type": "markdown", 306 | "metadata": {}, 307 | "source": [ 308 | "### Here, columns like ID and DAYS_BIRTH are not useful as features, so we can drop these columns" 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": 6, 314 | "metadata": {}, 315 | "outputs": [], 316 | "source": [ 317 | "def feature_creation():\n", 318 | " application_data.drop(['ID','DAYS_BIRTH'],axis=1,inplace=True)\n", 319 | " \n", 320 | "# Checking for columns with data type as 'object'\n", 321 | "\n", 322 | " object_type_columns=[col for col in application_data.columns if application_data[col].dtype=='object'] \n", 323 | " \n", 324 | "# Converting the object type data to categorical form.\n", 325 | " \n", 326 | " for i,column in enumerate(object_type_columns):\n", 327 |
application_data[column]=pd.Categorical(application_data[column]).codes\n", 328 | " features_column=application_data.columns\n", 329 | " features=pd.DataFrame(application_data,columns=features_column)\n", 330 | " return features" 331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "execution_count": 7, 336 | "metadata": {}, 337 | "outputs": [], 338 | "source": [ 339 | "features_for_model=feature_creation()" 340 | ] 341 | }, 342 | { 343 | "cell_type": "code", 344 | "execution_count": 8, 345 | "metadata": {}, 346 | "outputs": [ 347 | { 348 | "data": { 349 | "text/html": [ 350 | "
\n", 351 | "\n", 364 | "\n", 365 | " \n", 366 | " \n", 367 | " \n", 368 | " \n", 369 | " \n", 370 | " \n", 371 | " \n", 372 | " \n", 373 | " \n", 374 | " \n", 375 | " \n", 376 | " \n", 377 | " \n", 378 | " \n", 379 | " \n", 380 | " \n", 381 | " \n", 382 | " \n", 383 | " \n", 384 | " \n", 385 | " \n", 386 | " \n", 387 | " \n", 388 | " \n", 389 | " \n", 390 | " \n", 391 | " \n", 392 | " \n", 393 | " \n", 394 | " \n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | " \n", 428 | " \n", 429 | " \n", 430 | " \n", 431 | " \n", 432 | " \n", 433 | " \n", 434 | " \n", 435 | " \n", 436 | " \n", 437 | " \n", 438 | " \n", 439 | " \n", 440 | " \n", 441 | " \n", 442 | " \n", 443 | " \n", 444 | " \n", 445 | " \n", 446 | " \n", 447 | " \n", 448 | " \n", 449 | " \n", 450 | " \n", 451 | " \n", 452 | " \n", 453 | " \n", 454 | " \n", 455 | " \n", 456 | " \n", 457 | " \n", 458 | " \n", 459 | " \n", 460 | " \n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | " \n", 467 | " \n", 468 | " \n", 469 | " \n", 470 | " \n", 471 | " \n", 472 | " \n", 473 | " \n", 474 | " \n", 475 | " \n", 476 | " \n", 477 | " \n", 478 | " \n", 479 | " \n", 480 | " \n", 481 | " \n", 482 | " \n", 483 | "
CODE_GENDERFLAG_OWN_CARFLAG_OWN_REALTYCNT_CHILDRENAMT_INCOME_TOTALNAME_INCOME_TYPENAME_EDUCATION_TYPENAME_FAMILY_STATUSNAME_HOUSING_TYPEDAYS_EMPLOYEDFLAG_MOBILFLAG_WORK_PHONEFLAG_PHONEFLAG_EMAILOCCUPATION_TYPECNT_FAM_MEMBERS
01110427500.04104-45421100-12.0
11110427500.04104-45421100-12.0
21110112500.04411-11341000162.0
30010270000.00431-30511011141.0
40010270000.00431-30511011141.0
\n", 484 | "
" 485 | ], 486 | "text/plain": [ 487 | " CODE_GENDER FLAG_OWN_CAR FLAG_OWN_REALTY CNT_CHILDREN AMT_INCOME_TOTAL \\\n", 488 | "0 1 1 1 0 427500.0 \n", 489 | "1 1 1 1 0 427500.0 \n", 490 | "2 1 1 1 0 112500.0 \n", 491 | "3 0 0 1 0 270000.0 \n", 492 | "4 0 0 1 0 270000.0 \n", 493 | "\n", 494 | " NAME_INCOME_TYPE NAME_EDUCATION_TYPE NAME_FAMILY_STATUS \\\n", 495 | "0 4 1 0 \n", 496 | "1 4 1 0 \n", 497 | "2 4 4 1 \n", 498 | "3 0 4 3 \n", 499 | "4 0 4 3 \n", 500 | "\n", 501 | " NAME_HOUSING_TYPE DAYS_EMPLOYED FLAG_MOBIL FLAG_WORK_PHONE FLAG_PHONE \\\n", 502 | "0 4 -4542 1 1 0 \n", 503 | "1 4 -4542 1 1 0 \n", 504 | "2 1 -1134 1 0 0 \n", 505 | "3 1 -3051 1 0 1 \n", 506 | "4 1 -3051 1 0 1 \n", 507 | "\n", 508 | " FLAG_EMAIL OCCUPATION_TYPE CNT_FAM_MEMBERS \n", 509 | "0 0 -1 2.0 \n", 510 | "1 0 -1 2.0 \n", 511 | "2 0 16 2.0 \n", 512 | "3 1 14 1.0 \n", 513 | "4 1 14 1.0 " 514 | ] 515 | }, 516 | "execution_count": 8, 517 | "metadata": {}, 518 | "output_type": "execute_result" 519 | } 520 | ], 521 | "source": [ 522 | "# Observing first 5 values of feature\n", 523 | "\n", 524 | "features_for_model.head()" 525 | ] 526 | } 527 | ], 528 | "metadata": { 529 | "kernelspec": { 530 | "display_name": "Python 3", 531 | "language": "python", 532 | "name": "python3" 533 | }, 534 | "language_info": { 535 | "codemirror_mode": { 536 | "name": "ipython", 537 | "version": 3 538 | }, 539 | "file_extension": ".py", 540 | "mimetype": "text/x-python", 541 | "name": "python", 542 | "nbconvert_exporter": "python", 543 | "pygments_lexer": "ipython3", 544 | "version": "3.7.6" 545 | } 546 | }, 547 | "nbformat": 4, 548 | "nbformat_minor": 4 549 | } 550 | -------------------------------------------------------------------------------- /submissions/Anusha-Verma-C.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "language_info": { 4 | "codemirror_mode": { 5 | "name": "ipython", 6 | "version": 3 7 | }, 8 | "file_extension": ".py", 9 | 
"mimetype": "text/x-python", 10 | "name": "python", 11 | "nbconvert_exporter": "python", 12 | "pygments_lexer": "ipython3", 13 | "version": "3.7.6-final" 14 | }, 15 | "orig_nbformat": 2, 16 | "kernelspec": { 17 | "name": "Python 3.7.6 64-bit ('base': conda)", 18 | "display_name": "Python 3.7.6 64-bit ('base': conda)", 19 | "metadata": { 20 | "interpreter": { 21 | "hash": "18f47364f2f4870763990e46b7154981c710d71482bd8194938a3829d09494e5" 22 | } 23 | } 24 | } 25 | }, 26 | "nbformat": 4, 27 | "nbformat_minor": 2, 28 | "cells": [ 29 | { 30 | "cell_type": "code", 31 | "execution_count": 15, 32 | "metadata": {}, 33 | "outputs": [], 34 | "source": [ 35 | "import numpy as np\n", 36 | "import matplotlib.pyplot as plt\n", 37 | "import pandas as pd\n", 38 | "\n", 39 | "\n" 40 | ] 41 | }, 42 | { 43 | "source": [ 44 | "## Import and read app data\n", 45 | "I have imported the given dataset into a folder called dataset.\n", 46 | "
\n" 47 | ], 48 | "cell_type": "markdown", 49 | "metadata": {} 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 16, 54 | "metadata": {}, 55 | "outputs": [], 56 | "source": [ 57 | "def read_app_data():\n", 58 | " application_record=pd.read_csv(r'C:\\Users\\anusha\\Desktop\\dataset\\application_record.csv')\n", 59 | " return application_record\n", 60 | "application_record=read_app_data()\n" 61 | ] 62 | }, 63 | { 64 | "source": [ 65 | "## Import and read credit_record data" 66 | ], 67 | "cell_type": "markdown", 68 | "metadata": {} 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 17, 73 | "metadata": {}, 74 | "outputs": [], 75 | "source": [ 76 | "credit_record=pd.read_csv(r'C:\\Users\\anusha\\Desktop\\dataset\\credit_record.csv')\n" 77 | ] 78 | }, 79 | { 80 | "source": [ 81 | "## Feature Creation\n" 82 | ], 83 | "cell_type": "markdown", 84 | "metadata": {} 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": 18, 89 | "metadata": {}, 90 | "outputs": [], 91 | "source": [ 92 | "#dropping ID\n", 93 | "#dropping Flag Mobil as the entire column has the same values\n", 94 | "\n", 95 | "def feature_creation():\n", 96 | " application_record.drop(['ID','FLAG_MOBIL'],axis=1,inplace=True)\n", 97 | " features=application_record.columns\n", 98 | " return features\n", 99 | "\n", 100 | "features=feature_creation()\n" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 19, 106 | "metadata": {}, 107 | "outputs": [], 108 | "source": [ 109 | "features1=['CODE_GENDER','FLAG_OWN_CAR','FLAG_OWN_REALTY','OCCUPATION_TYPE']\n", 110 | "import sklearn\n", 111 | "from sklearn.preprocessing import LabelEncoder\n", 112 | "le = LabelEncoder()\n", 113 | "for a in features1:\n", 114 | " application_record[a] = le.fit_transform(application_record[a].astype(str))\n" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 20, 120 | "metadata": {}, 121 | "outputs": [ 122 | { 123 | "output_type": "stream", 124 | "name": 
"stdout", 125 | "text": [ 126 | " CODE_GENDER FLAG_OWN_CAR FLAG_OWN_REALTY CNT_CHILDREN \\\n0 1 1 1 0 \n1 1 1 1 0 \n2 1 1 1 0 \n3 0 0 1 0 \n4 0 0 1 0 \n... ... ... ... ... \n438552 1 0 1 0 \n438553 0 0 0 0 \n438554 0 0 0 0 \n438555 0 0 1 0 \n438556 0 0 1 0 \n\n AMT_INCOME_TOTAL DAYS_BIRTH DAYS_EMPLOYED FLAG_WORK_PHONE \\\n0 5.630936 -12005 -4542 1 \n1 5.630936 -12005 -4542 1 \n2 5.051153 -21474 -1134 0 \n3 5.431364 -19110 -3051 0 \n4 5.431364 -19110 -3051 0 \n... ... ... ... ... \n438552 5.130334 -22717 365243 0 \n438553 5.014940 -15939 -3007 0 \n438554 4.732394 -8169 -372 1 \n438555 4.857332 -21673 365243 0 \n438556 5.084576 -18858 -1201 0 \n\n FLAG_PHONE FLAG_EMAIL ... NAME_FAMILY_STATUS_Married \\\n0 0 0 ... 0 \n1 0 0 ... 0 \n2 0 0 ... 1 \n3 1 1 ... 0 \n4 1 1 ... 0 \n... ... ... ... ... \n438552 0 0 ... 0 \n438553 0 0 ... 0 \n438554 0 0 ... 0 \n438555 0 0 ... 1 \n438556 1 0 ... 1 \n\n NAME_FAMILY_STATUS_Separated NAME_FAMILY_STATUS_Single / not married \\\n0 0 0 \n1 0 0 \n2 0 0 \n3 0 1 \n4 0 1 \n... ... ... \n438552 1 0 \n438553 0 1 \n438554 0 1 \n438555 0 0 \n438556 0 0 \n\n NAME_FAMILY_STATUS_Widow NAME_HOUSING_TYPE_Co-op apartment \\\n0 0 0 \n1 0 0 \n2 0 0 \n3 0 0 \n4 0 0 \n... ... ... \n438552 0 0 \n438553 0 0 \n438554 0 0 \n438555 0 0 \n438556 0 0 \n\n NAME_HOUSING_TYPE_House / apartment \\\n0 0 \n1 0 \n2 1 \n3 1 \n4 1 \n... ... \n438552 1 \n438553 1 \n438554 0 \n438555 1 \n438556 1 \n\n NAME_HOUSING_TYPE_Municipal apartment \\\n0 0 \n1 0 \n2 0 \n3 0 \n4 0 \n... ... \n438552 0 \n438553 0 \n438554 0 \n438555 0 \n438556 0 \n\n NAME_HOUSING_TYPE_Office apartment \\\n0 0 \n1 0 \n2 0 \n3 0 \n4 0 \n... ... \n438552 0 \n438553 0 \n438554 0 \n438555 0 \n438556 0 \n\n NAME_HOUSING_TYPE_Rented apartment NAME_HOUSING_TYPE_With parents \n0 1 0 \n1 1 0 \n2 0 0 \n3 0 0 \n4 0 0 \n... ... ... 
\n438552                                   0                               0   \n438553                                   0                               0   \n438554                                   0                               1   \n438555                                   0                               0   \n438556                                   0                               0   \n\n[438557 rows x 33 columns]\n" 127 | ] 128 | } 129 | ], 130 | "source": [ 131 | "from sklearn.compose import ColumnTransformer\n", 132 | "from sklearn.preprocessing import OneHotEncoder\n", 133 | "application_record= pd.get_dummies(data=application_record,columns=['NAME_INCOME_TYPE','NAME_EDUCATION_TYPE','NAME_FAMILY_STATUS','NAME_HOUSING_TYPE'])\n", 134 | "application_record['AMT_INCOME_TOTAL']=np.log10(application_record['AMT_INCOME_TOTAL']) \n", 135 | "\n", 136 | "print(application_record)\n", 137 | "\n" 138 | ] 139 | } 140 | ] 141 | } -------------------------------------------------------------------------------- /submissions/Arpit-Agarwal.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "# Importing required libraries\n", 10 | "import numpy as np\n", 11 | "import pandas as pd" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## Import and read app data\n", 19 | "\n", 20 | "The link for downloading the dataset is " 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 2, 26 | "metadata": {}, 27 | "outputs": [], 28 | "source": [ 29 | "def read_app_data():\n", 30 | "    application_record=pd.read_csv(r'C:\\Users\\akhil\\Documents\\drive-download-20201009T155012Z-001\\application_record.csv')\n", 31 | "    return application_record\n", 32 | "application_record=read_app_data()" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": null, 38 | "metadata": {}, 39 | "outputs": [], 40 | "source": [] 41 | } 42 | ], 43 | "metadata": { 44 | "kernelspec": { 45 | "display_name": "Python3.6Test", 46 | "language": "python", 47 | "name": "python3.6test" 48 | }, 49 | "language_info": { 50 | "codemirror_mode": { 51 | "name": "ipython", 52 | "version": 3 53 | }, 54 
| "file_extension": ".py", 55 | "mimetype": "text/x-python", 56 | "name": "python", 57 | "nbconvert_exporter": "python", 58 | "pygments_lexer": "ipython3", 59 | "version": "3.6.5" 60 | } 61 | }, 62 | "nbformat": 4, 63 | "nbformat_minor": 4 64 | } 65 | -------------------------------------------------------------------------------- /submissions/Dhanesh-Shetty.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Import and Read Application Data" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import pandas as pd" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 2, 22 | "metadata": {}, 23 | "outputs": [], 24 | "source": [ 25 | "def read_app_data():\n", 26 | " application_records=pd.read_csv(\"/home/dhanesh/Documents/Credit Card Approval/application_record.csv\")\n", 27 | " return application_records" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "# Import The Credit Card Dataset" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": 3, 40 | "metadata": {}, 41 | "outputs": [], 42 | "source": [ 43 | "#dataset downloaded from https://drive.google.com/drive/folders/1ltq08WdYxd-r9wnY60o78VBgN5FlMcKk?usp=sharing\n", 44 | "credit_card_data=pd.read_csv(\"/home/dhanesh/Documents/Credit Card Approval/credit_record.csv\")" 45 | ] 46 | } 47 | ], 48 | "metadata": { 49 | "kernelspec": { 50 | "display_name": "py3-TF", 51 | "language": "python", 52 | "name": "py3-tf" 53 | }, 54 | "language_info": { 55 | "codemirror_mode": { 56 | "name": "ipython", 57 | "version": 3 58 | }, 59 | "file_extension": ".py", 60 | "mimetype": "text/x-python", 61 | "name": "python", 62 | "nbconvert_exporter": "python", 63 | "pygments_lexer": "ipython3", 64 | "version": "3.8.3" 65 | } 66 | }, 67 | 
"nbformat": 4, 68 | "nbformat_minor": 4 69 | } 70 | --------------------------------------------------------------------------------
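The submissions above share the same preprocessing recipe for the credit-card application data (drop `ID` and the constant `FLAG_MOBIL` column, label-encode the binary/text columns, one-hot encode the multi-category `NAME_*` columns, and log10-transform `AMT_INCOME_TOTAL`), but each one hardcodes an absolute local path. A minimal, path-agnostic sketch of that pipeline is below; the column names come from the dataset itself, while `data_dir` is a placeholder for wherever `application_record.csv` was downloaded:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def read_app_data(data_dir):
    # data_dir is assumed to point at the folder holding the downloaded CSVs
    return pd.read_csv(f"{data_dir}/application_record.csv")

def preprocess(application_record):
    # ID is only an identifier; FLAG_MOBIL holds a single constant value
    df = application_record.drop(columns=["ID", "FLAG_MOBIL"])
    # Binary / low-cardinality text columns -> integer codes
    for col in ["CODE_GENDER", "FLAG_OWN_CAR", "FLAG_OWN_REALTY", "OCCUPATION_TYPE"]:
        df[col] = LabelEncoder().fit_transform(df[col].astype(str))
    # Multi-category columns -> one-hot indicator columns
    df = pd.get_dummies(df, columns=["NAME_INCOME_TYPE", "NAME_EDUCATION_TYPE",
                                     "NAME_FAMILY_STATUS", "NAME_HOUSING_TYPE"])
    # Income is heavily right-skewed, so compress it with log10
    df["AMT_INCOME_TOTAL"] = np.log10(df["AMT_INCOME_TOTAL"])
    return df
```

With the shared Drive download unzipped into `data_dir`, `preprocess(read_app_data(data_dir))` should reproduce the 33-column frame printed in the Anusha-Verma-C submission without any machine-specific paths in the code.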