├── Dataset
    ├── bank-additional-names.txt
    ├── bank-additional.csv
    └── bank-dataset-use-for-marketing.csv
├── Proj1_EDA_Pandas_Banking (1).ipynb
└── README.md


/Dataset/bank-additional-names.txt:
--------------------------------------------------------------------------------
 1 | Citation Request:
 2 |   This dataset is publicly available for research. The details are described in [Moro et al., 2014]. 
 3 |   Please include this citation if you plan to use this database:
 4 | 
 5 |   [Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, In press, http://dx.doi.org/10.1016/j.dss.2014.03.001
 6 | 
 7 |   Available at: [pdf] http://dx.doi.org/10.1016/j.dss.2014.03.001
 8 |                 [bib] http://www3.dsi.uminho.pt/pcortez/bib/2014-dss.txt
 9 | 
10 | 1. Title: Bank Marketing (with social/economic context)
11 | 
12 | 2. Sources
13 |    Created by: Sérgio Moro (ISCTE-IUL), Paulo Cortez (Univ. Minho) and Paulo Rita (ISCTE-IUL) @ 2014
14 |    
15 | 3. Past Usage:
16 | 
17 |   The full dataset (bank-additional-full.csv) was described and analyzed in:
18 | 
19 |   S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems (2014), doi:10.1016/j.dss.2014.03.001.
20 |  
21 | 4. Relevant Information:
22 | 
23 |    This dataset is based on "Bank Marketing" UCI dataset (please check the description at: http://archive.ics.uci.edu/ml/datasets/Bank+Marketing).
24 |    The data is enriched by the addition of five new social and economic features/attributes (nationwide indicators from a country with a population of ~10M), published by the Banco de Portugal and publicly available at: https://www.bportugal.pt/estatisticasweb.
25 |    This dataset is almost identical to the one used in [Moro et al., 2014] (it does not include all attributes due to privacy concerns). 
26 |    Using the rminer package and R tool (http://cran.r-project.org/web/packages/rminer/), we found that the addition of the five new social and economic attributes (made available here) leads to a substantial improvement in the prediction of success, even when the duration of the call is not included. Note: the file can be read in R using: d=read.table("bank-additional-full.csv",header=TRUE,sep=";")
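A rough Python counterpart of the R note above, as a minimal sketch: it assumes the CSV files use the same `;` separator and a header row, as the `read.table` call implies. The helper name `load_bank_data` and the two-row sample are hypothetical and only stand in for a real file path.

```python
import io

import pandas as pd


def load_bank_data(source):
    """Read a semicolon-separated file with a header row into a DataFrame."""
    return pd.read_csv(source, sep=";")


# Made-up rows in the assumed layout (illustration only, not dataset records).
sample = io.StringIO(
    'age;job;duration;y\n'
    '30;"admin.";120;"no"\n'
    '45;"retired";600;"yes"\n'
)

df = load_bank_data(sample)
print(df.dtypes)
```

With the real data, a file path such as `Dataset/bank-additional.csv` could be passed instead of the in-memory sample.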
27 |    
28 |    The zip file includes two datasets: 
29 |       1) bank-additional-full.csv with all examples, ordered by date (from May 2008 to November 2010).
30 |       2) bank-additional.csv with 10% of the examples (4119), randomly selected from bank-additional-full.csv.
31 |    The smaller dataset is provided to test more computationally demanding machine learning algorithms (e.g., SVM).
32 | 
33 |    The binary classification goal is to predict if the client will subscribe to a bank term deposit (variable y).
34 | 
35 | 5. Number of Instances: 41188 for bank-additional-full.csv
36 | 
37 | 6. Number of Attributes: 20 + output attribute.
38 | 
39 | 7. Attribute information:
40 | 
41 |    For more information, read [Moro et al., 2014].
42 | 
43 |    Input variables:
44 |    # bank client data:
45 |    1 - age (numeric)
46 |    2 - job : type of job (categorical: "admin.","blue-collar","entrepreneur","housemaid","management","retired","self-employed","services","student","technician","unemployed","unknown")
47 |    3 - marital : marital status (categorical: "divorced","married","single","unknown"; note: "divorced" means divorced or widowed)
48 |    4 - education (categorical: "basic.4y","basic.6y","basic.9y","high.school","illiterate","professional.course","university.degree","unknown")
49 |    5 - default: has credit in default? (categorical: "no","yes","unknown")
50 |    6 - housing: has housing loan? (categorical: "no","yes","unknown")
51 |    7 - loan: has personal loan? (categorical: "no","yes","unknown")
52 |    # related with the last contact of the current campaign:
53 |    8 - contact: contact communication type (categorical: "cellular","telephone") 
54 |    9 - month: last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec")
55 |   10 - day_of_week: last contact day of the week (categorical: "mon","tue","wed","thu","fri")
56 |   11 - duration: last contact duration, in seconds (numeric). Important note:  this attribute highly affects the output target (e.g., if duration=0 then y="no"). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.
57 |    # other attributes:
58 |   12 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
59 |   13 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)
60 |   14 - previous: number of contacts performed before this campaign and for this client (numeric)
61 |   15 - poutcome: outcome of the previous marketing campaign (categorical: "failure","nonexistent","success")
62 |    # social and economic context attributes
63 |   16 - emp.var.rate: employment variation rate - quarterly indicator (numeric)
64 |   17 - cons.price.idx: consumer price index - monthly indicator (numeric)     
65 |   18 - cons.conf.idx: consumer confidence index - monthly indicator (numeric)     
66 |   19 - euribor3m: euribor 3 month rate - daily indicator (numeric)
67 |   20 - nr.employed: number of employees - quarterly indicator (numeric)
68 | 
69 |   Output variable (desired target):
70 |   21 - y - has the client subscribed to a term deposit? (binary: "yes","no")
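The notes on attribute 11 (duration) and attribute 13 (pdays) suggest some preparation before any realistic modelling. The sketch below is one possible way to do it in Pandas, not the authors' procedure: the rows are made up, and the column name `previously_contacted` is a hypothetical addition.

```python
import pandas as pd

# Made-up rows using column names from the attribute list (illustration only).
df = pd.DataFrame({
    "age": [30, 45, 52],
    "duration": [0, 310, 95],
    "pdays": [999, 6, 999],
    "y": ["no", "yes", "no"],
})

# Drop `duration` because it is not known before the call is made,
# and separate the target column from the inputs.
features = df.drop(columns=["duration", "y"])

# 999 in `pdays` means "not previously contacted", so expose that as a flag
# (hypothetical column name) instead of treating 999 as a real day count.
features["previously_contacted"] = df["pdays"] != 999

# Encode the binary target as 1 ("yes") and 0 ("no").
target = df["y"].map({"yes": 1, "no": 0})
```

Keeping `duration` remains reasonable for benchmark comparisons, as the note on attribute 11 says.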
71 | 
72 | 8. Missing Attribute Values: There are several missing values in some categorical attributes, all coded with the "unknown" label. These missing values can be treated as a possible class label or handled using deletion or imputation techniques.
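The three treatments of "unknown" mentioned above (keep it as a label, delete, impute) could look like this in Pandas. This is a minimal sketch on made-up rows; mode imputation is just one assumed choice of imputation technique.

```python
import numpy as np
import pandas as pd

# Made-up rows using column names from the attribute list (illustration only).
df = pd.DataFrame({
    "job": ["admin.", "unknown", "technician", "admin."],
    "housing": ["yes", "no", "unknown", "yes"],
    "age": [30, 41, 52, 38],
})
categorical = ["job", "housing"]

# Option 1: leave "unknown" untouched and treat it as its own category.

# Option 2: mark "unknown" as missing, then delete incomplete rows.
as_missing = df.replace("unknown", np.nan)
missing_per_column = as_missing.isna().sum()
dropped = as_missing.dropna()

# Option 3: impute each categorical column with its most frequent value.
imputed = as_missing.copy()
modes = as_missing[categorical].mode().iloc[0]
imputed[categorical] = as_missing[categorical].fillna(modes)
```

Which option fits best depends on how often "unknown" occurs in a column, which `missing_per_column` helps to check.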
73 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Exploratory-Data-Analysis-EDA-in-Banking-Using-Python
 2 | 
 3 | The purpose of this project is to master exploratory data analysis (EDA) in banking with the Pandas library.
 4 | 
 5 | ## Goals of the Project:
 6 | 
 7 | 1. Explore a banking dataset with the Pandas library.
 8 | 2. Build pivot tables.
 9 | 3. Visualize the dataset with various plot types.
10 | 
11 | ## Outline
12 | 1. Materials and methods
13 | 2. General part: i) Library imports, ii) Dataset exploration, iii) Pivot tables, iv) Visualization in Pandas
14 | 3. Tasks
15 |   
16 |   
17 | ## Materials and methods: 
18 | The data used in this project is a subset of the open-source Bank Marketing Data Set from the UCI ML repository: https://archive.ics.uci.edu/ml/citation_policy.html.
19 | 
20 | This dataset is publicly available for research. The details are described in [Moro et al., 2014].
21 | 
22 | The task is a preliminary analysis of positive responses (term deposit subscriptions) to direct calls from a bank. In essence, this is a bank scoring problem: the behavior of a client or potential client (a loan default, a wish to make a deposit, etc.) is predicted from their characteristics.
23 | 
24 | In this project, we will try to give answers to a set of questions that may be relevant when analyzing banking data:
25 | 
26 | 1. What is the share of attracted clients in our source data?
27 | 2. What are the mean values of numerical features among the attracted clients?
28 | 3. What is the average call duration for the attracted clients?
29 | 4. What is the average age among the attracted and unmarried clients?
30 | 5. What is the average age and call duration for different types of client employment?
31 | 
32 | In addition, we will perform a visual analysis in order to plan bank marketing campaigns more effectively.
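The questions above could be answered with a handful of Pandas operations. The sketch below is a minimal illustration on made-up rows rather than the notebook's actual code. Column names follow the dataset description; "attracted" is assumed to mean `y == "yes"`, and "unmarried" is assumed to mean `marital == "single"`.

```python
import pandas as pd

# Made-up rows using column names from the dataset description (illustration only).
df = pd.DataFrame({
    "age": [30, 45, 52, 28],
    "job": ["admin.", "retired", "admin.", "student"],
    "marital": ["single", "married", "divorced", "single"],
    "duration": [100, 600, 300, 400],
    "y": ["no", "yes", "yes", "yes"],
})

# Clients who subscribed to the term deposit.
attracted = df[df["y"] == "yes"]

# 1. Share of attracted clients.
share_attracted = (df["y"] == "yes").mean()

# 2. Mean values of the numerical features among attracted clients.
numeric_means = attracted.mean(numeric_only=True)

# 3. Average call duration for attracted clients.
mean_duration = attracted["duration"].mean()

# 4. Average age among attracted clients who are single (assumed reading of "unmarried").
mean_age_single = attracted.loc[attracted["marital"] == "single", "age"].mean()

# 5. Average age and call duration per job type, as a pivot table.
by_job = df.pivot_table(values=["age", "duration"], index="job", aggfunc="mean")
print(by_job)
```

If "unmarried" should also cover divorced or widowed clients, the filter in step 4 could be changed to `attracted["marital"] != "married"`.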
33 | ## Used libraries:
34 | 1. NumPy
35 | 2. Pandas
36 | 3. Matplotlib
37 | 


--------------------------------------------------------------------------------