├── Ch1
│   ├── .ipynb_checkpoints
│   │   ├── 1. Analysing the Mean of a dataset-checkpoint.ipynb
│   │   ├── 2. Checking the Median of a dataset-checkpoint.ipynb
│   │   ├── 3. Identifying the Mode of a dataset-checkpoint.ipynb
│   │   ├── 4. Checking the Variance of a dataset-checkpoint.ipynb
│   │   ├── 5. Identifying the Standard Deviation of a dataset-checkpoint.ipynb
│   │   ├── 6. Generating the Range of a dataset-checkpoint.ipynb
│   │   ├── 7. Identifying the Percentiles of a dataset-checkpoint.ipynb
│   │   ├── 8. Checking the Quartiles of a dataset-checkpoint.ipynb
│   │   ├── 9. Analysing the Interquartile Range (IQR) of a dataset-checkpoint.ipynb
│   │   ├── Analysing the Interquartile Range (IQR) of a dataset-checkpoint.ipynb
│   │   ├── Analysing the Mean of a dataset-checkpoint.ipynb
│   │   ├── Chapter 1 EDA in Python-checkpoint.ipynb
│   │   ├── Checking the Median of a dataset-checkpoint.ipynb
│   │   ├── Checking the Quartiles of a dataset-Copy1-checkpoint.ipynb
│   │   ├── Checking the Quartiles of a dataset-checkpoint.ipynb
│   │   ├── Checking the Variance of a dataset-checkpoint.ipynb
│   │   ├── Generating the Range of a dataset-checkpoint.ipynb
│   │   ├── Identifying the Mode of a dataset-checkpoint.ipynb
│   │   ├── Identifying the Percentiles of a dataset-checkpoint.ipynb
│   │   └── Identifying the Standard Deviation of a dataset-checkpoint.ipynb
│   ├── 1. Analysing the Mean of a dataset.ipynb
│   ├── 2. Checking the Median of a dataset.ipynb
│   ├── 3. Identifying the Mode of a dataset.ipynb
│   ├── 4. Checking the Variance of a dataset.ipynb
│   ├── 5. Identifying the Standard Deviation of a dataset.ipynb
│   ├── 6. Generating the Range of a dataset.ipynb
│   ├── 7. Identifying the Percentiles of a dataset.ipynb
│   ├── 8. Checking the Quartiles of a dataset.ipynb
│   ├── 9. Analysing the Interquartile Range (IQR) of a dataset.ipynb
│   └── Data
│       └── covid-data.csv
├── Ch2
│   ├── .ipynb_checkpoints
│   │   ├── 1. Grouping Data-checkpoint.ipynb
│   │   ├── 10. Replacing Data-checkpoint.ipynb
│   │   ├── 11. Dealing Missing values-checkpoint.ipynb
│   │   ├── 2. Appending Data-checkpoint.ipynb
│   │   ├── 3. Concatenating Data-checkpoint.ipynb
│   │   ├── 4. Merging Data-checkpoint.ipynb
│   │   ├── 5. Sorting Data-checkpoint.ipynb
│   │   ├── 6. Categorising Data-checkpoint.ipynb
│   │   ├── 7. Removing Duplicates-checkpoint.ipynb
│   │   ├── 8. Dropping Rows and Columns-checkpoint.ipynb
│   │   ├── 9. Changing Data Format-checkpoint.ipynb
│   │   ├── Appending Data-checkpoint.ipynb
│   │   ├── Categorising Data-checkpoint.ipynb
│   │   ├── Changing Data Format-checkpoint.ipynb
│   │   ├── Concatenating Data-checkpoint.ipynb
│   │   ├── Dropping Rows and Columns-checkpoint.ipynb
│   │   ├── Grouping Data-checkpoint.ipynb
│   │   ├── Merging Data-checkpoint.ipynb
│   │   ├── Removing Duplicates-checkpoint.ipynb
│   │   ├── Replacing Data-checkpoint.ipynb
│   │   ├── Sorting Data-checkpoint.ipynb
│   │   └── Untitled-checkpoint.ipynb
│   ├── 1. Grouping Data.ipynb
│   ├── 10. Replacing Data.ipynb
│   ├── 11. Dealing with Missing values.ipynb
│   ├── 2. Appending Data.ipynb
│   ├── 3. Concatenating Data.ipynb
│   ├── 4. Merging Data.ipynb
│   ├── 5. Sorting Data.ipynb
│   ├── 6. Categorising Data.ipynb
│   ├── 7. Removing Duplicates.ipynb
│   ├── 8. Dropping Rows and Columns.ipynb
│   ├── 9. Changing Data Format.ipynb
│   └── Data
│       ├── marketing_campaign.csv
│       ├── marketing_campaign_append1.csv
│       ├── marketing_campaign_append2.csv
│       ├── marketing_campaign_concat1.csv
│       ├── marketing_campaign_concat2.csv
│       ├── marketing_campaign_merge1.csv
│       └── marketing_campaign_merge2.csv
├── Ch3
│   ├── .ipynb_checkpoints
│   │   ├── 1. Grouping Data-checkpoint.ipynb
│   │   ├── 1. Preparing for EDA .-checkpoint.ipynb
│   │   ├── 10. Replacing Data-checkpoint.ipynb
│   │   ├── 2. Appending Data-checkpoint.ipynb
│   │   ├── 2. Visualizing data in Matplotlib-checkpoint.ipynb
│   │   ├── 3. Concatenating Data-checkpoint.ipynb
│   │   ├── 3. Visualizing data in Seaborn-checkpoint.ipynb
│   │   ├── 4. Merging Data-checkpoint.ipynb
│   │   ├── 4. Visualizing data in GGPLOT-checkpoint.ipynb
│   │   ├── 5. Sorting Data-checkpoint.ipynb
│   │   ├── 5. Visualizing data in Bokeh-checkpoint.ipynb
│   │   ├── 6. Categorising Data-checkpoint.ipynb
│   │   ├── 7. Removing Duplicates-checkpoint.ipynb
│   │   ├── 8. Dropping Rows and Columns-checkpoint.ipynb
│   │   ├── 9. Changing Data Format-checkpoint.ipynb
│   │   ├── Appending Data-checkpoint.ipynb
│   │   ├── Categorising Data-checkpoint.ipynb
│   │   ├── Changing Data Format-checkpoint.ipynb
│   │   ├── Concatenating Data-checkpoint.ipynb
│   │   ├── Dropping Rows and Columns-checkpoint.ipynb
│   │   ├── Grouping Data-checkpoint.ipynb
│   │   ├── Merging Data-checkpoint.ipynb
│   │   ├── Removing Duplicates-checkpoint.ipynb
│   │   ├── Replacing Data-checkpoint.ipynb
│   │   ├── Sorting Data-checkpoint.ipynb
│   │   └── Untitled-checkpoint.ipynb
│   ├── 1. Preparing for EDA ..ipynb
│   ├── 2. Visualizing data in Matplotlib.ipynb
│   ├── 3. Visualizing data in Seaborn.ipynb
│   ├── 4. Visualizing data in GGPLOT.ipynb
│   ├── 5. Visualizing data in Bokeh.ipynb
│   └── Data
│       └── HousingPricesData.csv
├── Ch4
│   ├── .ipynb_checkpoints
│   │   ├── 1. Performing univariate analysis using a Histogram-checkpoint.ipynb
│   │   ├── 2. Performing univariate analysis using a Boxplot-checkpoint.ipynb
│   │   ├── 3. Performing univariate analysis using a Violinplot-checkpoint.ipynb
│   │   ├── 4. Performing univariate analysis using a Summary Table-checkpoint.ipynb
│   │   ├── Performing univariate analysis using a Bar Chart-checkpoint.ipynb
│   │   ├── Performing univariate analysis using a Boxplot-checkpoint.ipynb
│   │   ├── Performing univariate analysis using a Histogram-checkpoint.ipynb
│   │   ├── Performing univariate analysis using a Pie Chart-checkpoint.ipynb
│   │   ├── Performing univariate analysis using a Table-checkpoint.ipynb
│   │   └── Performing univariate analysis using a Violinplot-checkpoint.ipynb
│   ├── 1. Performing univariate analysis using a Histogram.ipynb
│   ├── 2. Performing univariate analysis using a Boxplot.ipynb
│   ├── 3. Performing univariate analysis using a Violinplot.ipynb
│   ├── 4. Performing univariate analysis using a Summary Table.ipynb
│   ├── 5. Performing univariate analysis using a Bar Chart.ipynb
│   ├── 6. Performing univariate analysis using a Pie Chart.ipynb
│   └── Data
│       ├── HousingPricesData.csv
│       ├── penguins_lter.csv
│       └── penguins_size.csv
├── Ch5
│   ├── .ipynb_checkpoints
│   │   ├── 1. Analysing two variables using a Scatter plot-checkpoint.ipynb
│   │   ├── 2. Creating CrosstabTwo-way table on bivariate data-checkpoint.ipynb
│   │   ├── 3. Analysing two variables using a Pivot table-checkpoint.ipynb
│   │   ├── 4. Generating Pairplots on two variables-checkpoint.ipynb
│   │   ├── 5. Analysing two variables using a Bar chart-checkpoint.ipynb
│   │   ├── 6. Generating Box plots for two variables-checkpoint.ipynb
│   │   ├── 7. Creating Histograms on two variables-checkpoint.ipynb
│   │   └── 8. Analysing two variables using a Correlation analysis-checkpoint.ipynb
│   ├── 1. Analysing two variables using a Scatter plot.ipynb
│   ├── 2. Creating CrosstabTwo-way table on bivariate data.ipynb
│   ├── 3. Analysing two variables using a Pivot table.ipynb
│   ├── 4. Generating Pairplots on two variables.ipynb
│   ├── 5. Analysing two variables using a Bar chart.ipynb
│   ├── 6. Generating Box plots for two variables.ipynb
│   ├── 7. Creating Histograms on two variables.ipynb
│   ├── 8. Analysing two variables using a Correlation analysis.ipynb
│   └── Data
│       ├── HousingPricesData.csv
│       ├── penguins_lter.csv
│       └── penguins_size.csv
├── Ch6
│   ├── .ipynb_checkpoints
│   │   ├── 1. Implementing Cluster Analysis on multiple variables using Kmeans-checkpoint.ipynb
│   │   ├── 2. Choosing the Optimal number of K clusters in Kmeans-checkpoint.ipynb
│   │   ├── 3. Profiling Kmeans Clusters-checkpoint.ipynb
│   │   ├── 4. Implementing Principal Component Analysis (PCA) on multiple variables-checkpoint.ipynb
│   │   ├── 5. Choosing the number of Principal Components-checkpoint.ipynb
│   │   ├── 6. Analysing Principal Components-checkpoint.ipynb
│   │   ├── 7. Implementing Factor Analysis on multiple variables-checkpoint.ipynb
│   │   ├── 8. Determining the number of factors-Copy1-checkpoint.ipynb
│   │   ├── 8. Determining the number of factors-checkpoint.ipynb
│   │   └── 9. Analysing the factors-checkpoint.ipynb
│   ├── 1. Implementing Cluster Analysis on multiple variables using Kmeans.ipynb
│   ├── 2. Choosing the Optimal number of K clusters in Kmeans.ipynb
│   ├── 3. Profiling Kmeans Clusters.ipynb
│   ├── 4. Implementing Principal Component Analysis (PCA) on multiple variables.ipynb
│   ├── 5. Choosing the number of Principal Components.ipynb
│   ├── 6. Analysing Principal Components.ipynb
│   ├── 7. Implementing Factor Analysis on multiple variables.ipynb
│   ├── 8. Determining the number of factors.ipynb
│   ├── 9. Analysing the factors.ipynb
│   └── Data
│       ├── marketing_campaign.csv
│       └── website_survey.csv
├── Ch7
│   ├── .ipynb_checkpoints
│   │   ├── 1. Using line and boxplots to visualise time series data-checkpoint.ipynb
│   │   ├── 2 Spotting patterns in Time series old-checkpoint.ipynb
│   │   ├── 2 Spotting patterns in Time series-checkpoint.ipynb
│   │   ├── 3 Performing Time series data Decomposition-checkpoint.ipynb
│   │   ├── 4 Performing Smoothing - Moving Average-checkpoint.ipynb
│   │   ├── 5 Performing Smoothing - Exponential Smoothing-checkpoint.ipynb
│   │   ├── 6. Performing Stationarity checks on Time series data-checkpoint.ipynb
│   │   ├── 7. Differencing Time series data-checkpoint.ipynb
│   │   └── 8. Using Correlation plots to visualise time series data-checkpoint.ipynb
│   ├── 1. Using line and boxplots to visualise time series data.ipynb
│   ├── 2 Spotting patterns in Time series old.ipynb
│   ├── 2 Spotting patterns in Time series.ipynb
│   ├── 3 Performing Time series data Decomposition.ipynb
│   ├── 4 Performing Smoothing - Moving Average.ipynb
│   ├── 5 Performing Smoothing - Exponential Smoothing.ipynb
│   ├── 6. Performing Stationarity checks on Time series data.ipynb
│   ├── 7. Differencing Time series data.ipynb
│   ├── 8. Using Correlation plots to visualise time series data.ipynb
│   └── Data
│       ├── DailyDelhiClimate.csv
│       ├── DataSF_Data_Dictionary_for_Air_Traffic_Passenger_Statistics.pdf
│       ├── MTNOY.csv
│       ├── SF_Air_Traffic_Passenger_Statistics.csv
│       └── SF_Air_Traffic_Passenger_Statistics_Transformed.csv
├── Ch8
│   ├── .ipynb_checkpoints
│   │   ├── 1. Preparing Text data-checkpoint.ipynb
│   │   ├── 10.Choosing Optimal number of Topics-checkpoint.ipynb
│   │   ├── 2. Removing Stop words-checkpoint.ipynb
│   │   ├── 3. Analysing Part of Speech-checkpoint.ipynb
│   │   ├── 4. Performing Stemming and Lemmatisation-checkpoint.ipynb
│   │   ├── 5. Analysing Ngrams-checkpoint.ipynb
│   │   ├── 6. Creating Word Clouds-checkpoint.ipynb
│   │   ├── 7. Checking Term Frequency-checkpoint.ipynb
│   │   ├── 8. Checking Sentiments-checkpoint.ipynb
│   │   └── 9. Performing Topic Modelling-checkpoint.ipynb
│   ├── 1. Preparing Text data.ipynb
│   ├── 10.Choosing Optimal number of Topics.ipynb
│   ├── 2. Removing Stop words.ipynb
│   ├── 3. Analysing Part of Speech.ipynb
│   ├── 4. Performing Stemming and Lemmatisation.ipynb
│   ├── 5. Analysing Ngrams.ipynb
│   ├── 6. Creating Word Clouds.ipynb
│   ├── 7. Checking Term Frequency.ipynb
│   ├── 8. Checking Sentiments.ipynb
│   ├── 9. Performing Topic Modelling.ipynb
│   └── Data
│       ├── a1_RestaurantReviews_HistoricDump.tsv
│       ├── cleaned_reviews_data.csv
│       ├── cleaned_reviews_lemmatized_data.csv
│       └── cleaned_reviews_no_stopwords_data.csv
├── Data
│   ├── Melbourne_housing_EDA.csv
│   ├── Melbourne_housing_FULL.csv
│   ├── Melbourne_housing_imp.csv
│   └── datasets_notes.numbers
├── LICENSE
└── README.md
/Ch2/.ipynb_checkpoints/1. Grouping Data-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import numpy as np\n",
19 | "import pandas as pd"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "id": "a2050fe1",
25 | "metadata": {},
26 | "source": [
27 | "#### Load dataset"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 2,
33 | "id": "3a02fac1",
34 | "metadata": {},
35 | "outputs": [],
36 | "source": [
37 | "marketing_data = pd.read_csv(\"Data/marketing_campaign.csv\")"
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "id": "e81aa501",
43 | "metadata": {},
44 | "source": [
45 | "#### Subset for relevant columns"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 3,
51 | "id": "9279d462",
52 | "metadata": {},
53 | "outputs": [],
54 | "source": [
55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Education', 'Marital_Status',\n",
56 | " 'Income', 'Kidhome', 'Teenhome', 'Dt_Customer', \n",
57 | " 'Recency','NumStorePurchases', 'NumWebVisitsMonth']]"
58 | ]
59 | },
60 | {
61 | "cell_type": "markdown",
62 | "id": "128c9ecd",
63 | "metadata": {},
64 | "source": [
65 | "#### Inspect first 2 rows and data types of the dataset"
66 | ]
67 | },
68 | {
69 | "cell_type": "code",
70 | "execution_count": 4,
71 | "id": "1f556097",
72 | "metadata": {},
73 | "outputs": [
74 | {
75 | "data": {
159 | "text/plain": [
160 | " 0 1\n",
161 | "ID 5524 2174\n",
162 | "Year_Birth 1957 1954\n",
163 | "Education Graduation Graduation\n",
164 | "Marital_Status Single Single\n",
165 | "Income 58138.0 46344.0\n",
166 | "Kidhome 0 1\n",
167 | "Teenhome 0 1\n",
168 | "Dt_Customer 04-09-2012 08-03-2014\n",
169 | "Recency 58 38\n",
170 | "NumStorePurchases 4 2\n",
171 | "NumWebVisitsMonth 7 5"
172 | ]
173 | },
174 | "execution_count": 4,
175 | "metadata": {},
176 | "output_type": "execute_result"
177 | }
178 | ],
179 | "source": [
180 | "marketing_data.head(2).T"
181 | ]
182 | },
183 | {
184 | "cell_type": "code",
185 | "execution_count": 5,
186 | "id": "b3601b62",
187 | "metadata": {},
188 | "outputs": [
189 | {
190 | "data": {
191 | "text/plain": [
192 | "ID int64\n",
193 | "Year_Birth int64\n",
194 | "Education object\n",
195 | "Marital_Status object\n",
196 | "Income float64\n",
197 | "Kidhome int64\n",
198 | "Teenhome int64\n",
199 | "Dt_Customer object\n",
200 | "Recency int64\n",
201 | "NumStorePurchases int64\n",
202 | "NumWebVisitsMonth int64\n",
203 | "dtype: object"
204 | ]
205 | },
206 | "execution_count": 5,
207 | "metadata": {},
208 | "output_type": "execute_result"
209 | }
210 | ],
211 | "source": [
212 | "marketing_data.dtypes"
213 | ]
214 | },
215 | {
216 | "cell_type": "code",
217 | "execution_count": 6,
218 | "id": "bdf17c4b",
219 | "metadata": {},
220 | "outputs": [
221 | {
222 | "data": {
223 | "text/plain": [
224 | "(2240, 11)"
225 | ]
226 | },
227 | "execution_count": 6,
228 | "metadata": {},
229 | "output_type": "execute_result"
230 | }
231 | ],
232 | "source": [
233 | "marketing_data.shape"
234 | ]
235 | },
236 | {
237 | "cell_type": "markdown",
238 | "id": "46f24ec2",
239 | "metadata": {},
240 | "source": [
241 | "#### Compute the average number of store purchases, grouped by the number of kids in the home"
242 | ]
243 | },
244 | {
245 | "cell_type": "code",
246 | "execution_count": 7,
247 | "id": "fe10f0a2",
248 | "metadata": {},
249 | "outputs": [
250 | {
251 | "data": {
252 | "text/plain": [
253 | "Kidhome\n",
254 | "0 7.217324\n",
255 | "1 3.863181\n",
256 | "2 3.437500\n",
257 | "Name: NumStorePurchases, dtype: float64"
258 | ]
259 | },
260 | "execution_count": 7,
261 | "metadata": {},
262 | "output_type": "execute_result"
263 | }
264 | ],
265 | "source": [
266 | "marketing_data.groupby('Kidhome')['NumStorePurchases'].mean()"
267 | ]
268 | }
269 | ],
270 | "metadata": {
271 | "kernelspec": {
272 | "display_name": "Python 3",
273 | "language": "python",
274 | "name": "python3"
275 | },
276 | "language_info": {
277 | "codemirror_mode": {
278 | "name": "ipython",
279 | "version": 3
280 | },
281 | "file_extension": ".py",
282 | "mimetype": "text/x-python",
283 | "name": "python",
284 | "nbconvert_exporter": "python",
285 | "pygments_lexer": "ipython3",
286 | "version": "3.7.1"
287 | }
288 | },
289 | "nbformat": 4,
290 | "nbformat_minor": 5
291 | }
292 |
--------------------------------------------------------------------------------
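The grouping step in the notebook above can be tried on a small in-memory table. The sketch below is illustrative only: the values in `toy` are hypothetical, and only the column names `Kidhome` and `NumStorePurchases` are taken from the notebook. It repeats the notebook's `groupby(...).mean()` pattern and then uses `agg` to return the group size alongside the mean, which helps judge how much data sits behind each average.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the notebook.
toy = pd.DataFrame({
    "Kidhome": [0, 0, 1, 1, 2],
    "NumStorePurchases": [8, 6, 4, 2, 3],
})

# Same pattern as the notebook: split rows by Kidhome, then average
# NumStorePurchases within each group.
mean_by_kids = toy.groupby("Kidhome")["NumStorePurchases"].mean()

# agg() computes several statistics per group in one call.
summary = toy.groupby("Kidhome")["NumStorePurchases"].agg(["mean", "count"])

print(mean_by_kids)
print(summary)
```

Reporting `count` next to `mean` is a design choice rather than something the notebook does; a group with very few rows can produce an average that looks meaningful but is not.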
/Ch2/.ipynb_checkpoints/10. Replacing Data-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import numpy as np\n",
19 | "import pandas as pd"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "id": "a2050fe1",
25 | "metadata": {},
26 | "source": [
27 | "#### Load dataset"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 2,
33 | "id": "3a02fac1",
34 | "metadata": {},
35 | "outputs": [],
36 | "source": [
37 | "marketing_data = pd.read_csv(\"Data/marketing_campaign.csv\")"
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "id": "e81aa501",
43 | "metadata": {},
44 | "source": [
45 | "#### Subset for relevant columns"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 3,
51 | "id": "9279d462",
52 | "metadata": {},
53 | "outputs": [],
54 | "source": [
55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Kidhome', 'Teenhome']]"
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "id": "128c9ecd",
61 | "metadata": {},
62 | "source": [
63 | "#### Inspect first 5 rows and shape of the dataset"
64 | ]
65 | },
66 | {
67 | "cell_type": "code",
68 | "execution_count": 4,
69 | "id": "1f556097",
70 | "metadata": {
71 | "scrolled": true
72 | },
73 | "outputs": [
74 | {
75 | "data": {
141 | "text/plain": [
142 | " ID Year_Birth Kidhome Teenhome\n",
143 | "0 5524 1957 0 0\n",
144 | "1 2174 1954 1 1\n",
145 | "2 4141 1965 0 0\n",
146 | "3 6182 1984 1 0\n",
147 | "4 5324 1981 1 0"
148 | ]
149 | },
150 | "execution_count": 4,
151 | "metadata": {},
152 | "output_type": "execute_result"
153 | }
154 | ],
155 | "source": [
156 | "marketing_data.head()"
157 | ]
158 | },
159 | {
160 | "cell_type": "code",
161 | "execution_count": 5,
162 | "id": "b3601b62",
163 | "metadata": {},
164 | "outputs": [
165 | {
166 | "data": {
167 | "text/plain": [
168 | "(2240, 4)"
169 | ]
170 | },
171 | "execution_count": 5,
172 | "metadata": {},
173 | "output_type": "execute_result"
174 | }
175 | ],
176 | "source": [
177 | "marketing_data.shape"
178 | ]
179 | },
180 | {
181 | "cell_type": "markdown",
182 | "id": "46f24ec2",
183 | "metadata": {},
184 | "source": [
185 | "#### Replace the values in Teenhome with \"has teen\" and \"has no teen\""
186 | ]
187 | },
188 | {
189 | "cell_type": "code",
190 | "execution_count": 6,
191 | "id": "fe10f0a2",
192 | "metadata": {},
193 | "outputs": [],
194 | "source": [
195 | "marketing_data['Teenhome_replaced'] = marketing_data['Teenhome'].replace([0,1,2],['has no teen','has teen','has teen'])"
196 | ]
197 | },
198 | {
199 | "cell_type": "code",
200 | "execution_count": 7,
201 | "id": "998a9da7",
202 | "metadata": {},
203 | "outputs": [
204 | {
205 | "data": {
259 | "text/plain": [
260 | " Teenhome Teenhome_replaced\n",
261 | "0 0 has no teen\n",
262 | "1 1 has teen\n",
263 | "2 0 has no teen\n",
264 | "3 0 has no teen\n",
265 | "4 0 has no teen"
266 | ]
267 | },
268 | "execution_count": 7,
269 | "metadata": {},
270 | "output_type": "execute_result"
271 | }
272 | ],
273 | "source": [
274 | "marketing_data[['Teenhome','Teenhome_replaced']].head()"
275 | ]
276 | }
277 | ],
278 | "metadata": {
279 | "kernelspec": {
280 | "display_name": "Python 3",
281 | "language": "python",
282 | "name": "python3"
283 | },
284 | "language_info": {
285 | "codemirror_mode": {
286 | "name": "ipython",
287 | "version": 3
288 | },
289 | "file_extension": ".py",
290 | "mimetype": "text/x-python",
291 | "name": "python",
292 | "nbconvert_exporter": "python",
293 | "pygments_lexer": "ipython3",
294 | "version": "3.7.1"
295 | }
296 | },
297 | "nbformat": 4,
298 | "nbformat_minor": 5
299 | }
300 |
--------------------------------------------------------------------------------
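The replacement step in the notebook above passes two parallel lists to `Series.replace`. A minimal sketch of two equivalent spellings follows, using a hypothetical `teen` series rather than the real dataset. A dictionary keeps each old value next to its label, and `np.where` expresses the rule directly ("any count above zero means has teen"), so it also covers counts that are not listed explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical values; the real column is marketing_data['Teenhome'].
teen = pd.Series([0, 1, 2, 0], name="Teenhome")

# Dictionary form of the notebook's replace(): old value -> new label.
labels = {0: "has no teen", 1: "has teen", 2: "has teen"}
replaced = teen.replace(labels)

# Rule-based alternative: any positive count is labelled "has teen".
flag = np.where(teen > 0, "has teen", "has no teen")

print(replaced.tolist())
print(list(flag))
```

With `replace`, a value missing from the mapping is left unchanged, so an unexpected count such as 3 would stay numeric in an otherwise text column; the `np.where` form avoids that at the cost of hard-coding the threshold.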
/Ch2/.ipynb_checkpoints/9. Changing Data Format-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import numpy as np\n",
19 | "import pandas as pd"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "id": "a2050fe1",
25 | "metadata": {},
26 | "source": [
27 | "#### Load dataset"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 2,
33 | "id": "3a02fac1",
34 | "metadata": {},
35 | "outputs": [],
36 | "source": [
37 | "marketing_data = pd.read_csv(\"Data/marketing_campaign.csv\")"
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "id": "e81aa501",
43 | "metadata": {},
44 | "source": [
45 | "#### Subset for relevant columns"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 3,
51 | "id": "9279d462",
52 | "metadata": {},
53 | "outputs": [],
54 | "source": [
55 | "marketing_data = marketing_data[['ID', 'Year_Birth','Marital_Status','Income']]"
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "id": "128c9ecd",
61 | "metadata": {},
62 | "source": [
63 | "#### Inspect first 5 rows and shape of the dataset"
64 | ]
65 | },
66 | {
67 | "cell_type": "code",
68 | "execution_count": 4,
69 | "id": "1f556097",
70 | "metadata": {},
71 | "outputs": [
72 | {
73 | "data": {
139 | "text/plain": [
140 | " ID Year_Birth Marital_Status Income\n",
141 | "0 5524 1957 Single 58138.0\n",
142 | "1 2174 1954 Single 46344.0\n",
143 | "2 4141 1965 Together 71613.0\n",
144 | "3 6182 1984 Together 26646.0\n",
145 | "4 5324 1981 Married 58293.0"
146 | ]
147 | },
148 | "execution_count": 4,
149 | "metadata": {},
150 | "output_type": "execute_result"
151 | }
152 | ],
153 | "source": [
154 | "marketing_data.head()"
155 | ]
156 | },
157 | {
158 | "cell_type": "code",
159 | "execution_count": 5,
160 | "id": "b3601b62",
161 | "metadata": {},
162 | "outputs": [
163 | {
164 | "data": {
165 | "text/plain": [
166 | "(2240, 4)"
167 | ]
168 | },
169 | "execution_count": 5,
170 | "metadata": {},
171 | "output_type": "execute_result"
172 | }
173 | ],
174 | "source": [
175 | "marketing_data.shape"
176 | ]
177 | },
178 | {
179 | "cell_type": "markdown",
180 | "id": "35c8704d",
181 | "metadata": {},
182 | "source": [
183 | "#### Fill NAs in the Income column with 0"
184 | ]
185 | },
186 | {
187 | "cell_type": "code",
188 | "execution_count": 6,
189 | "id": "9e28186b",
190 | "metadata": {},
191 | "outputs": [],
192 | "source": [
193 | "marketing_data['Income'] = marketing_data['Income'].fillna(0)"
194 | ]
195 | },
196 | {
197 | "cell_type": "markdown",
198 | "id": "46f24ec2",
199 | "metadata": {},
200 | "source": [
201 | "#### Change the data type of the Income column from float to int"
202 | ]
203 | },
204 | {
205 | "cell_type": "code",
206 | "execution_count": 7,
207 | "id": "fe10f0a2",
208 | "metadata": {},
209 | "outputs": [],
210 | "source": [
211 | "marketing_data['Income_changed'] = marketing_data['Income'].astype(int)"
212 | ]
213 | },
214 | {
215 | "cell_type": "code",
216 | "execution_count": 8,
217 | "id": "5b38b1b8",
218 | "metadata": {},
219 | "outputs": [
220 | {
221 | "data": {
275 | "text/plain": [
276 | " Income Income_changed\n",
277 | "0 58138.0 58138\n",
278 | "1 46344.0 46344\n",
279 | "2 71613.0 71613\n",
280 | "3 26646.0 26646\n",
281 | "4 58293.0 58293"
282 | ]
283 | },
284 | "execution_count": 8,
285 | "metadata": {},
286 | "output_type": "execute_result"
287 | }
288 | ],
289 | "source": [
290 | "marketing_data[['Income','Income_changed']].head()"
291 | ]
292 | },
293 | {
294 | "cell_type": "code",
295 | "execution_count": 9,
296 | "id": "998a9da7",
297 | "metadata": {},
298 | "outputs": [
299 | {
300 | "data": {
301 | "text/plain": [
302 | "Income float64\n",
303 | "Income_changed int32\n",
304 | "dtype: object"
305 | ]
306 | },
307 | "execution_count": 9,
308 | "metadata": {},
309 | "output_type": "execute_result"
310 | }
311 | ],
312 | "source": [
313 | "marketing_data[['Income','Income_changed']].dtypes"
314 | ]
315 | }
316 | ],
317 | "metadata": {
318 | "kernelspec": {
319 | "display_name": "Python 3",
320 | "language": "python",
321 | "name": "python3"
322 | },
323 | "language_info": {
324 | "codemirror_mode": {
325 | "name": "ipython",
326 | "version": 3
327 | },
328 | "file_extension": ".py",
329 | "mimetype": "text/x-python",
330 | "name": "python",
331 | "nbconvert_exporter": "python",
332 | "pygments_lexer": "ipython3",
333 | "version": "3.7.1"
334 | }
335 | },
336 | "nbformat": 4,
337 | "nbformat_minor": 5
338 | }
339 |
--------------------------------------------------------------------------------
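The notebook above fills missing incomes with 0 before casting, because a plain NumPy integer column cannot hold NaN. The sketch below uses a hypothetical `income` series to show that approach with an explicit `"int64"` target, plus an alternative that is an assumption on my part and not something the notebook does: pandas' nullable `Int64` dtype, which converts to integers while keeping missing entries as `<NA>`. The `int32` shown in the notebook output is consistent with a platform whose default integer is 32-bit; naming the width explicitly avoids that dependence.

```python
import numpy as np
import pandas as pd

# Hypothetical values; the real column is marketing_data['Income'].
income = pd.Series([58138.0, np.nan, 46344.0], name="Income")

# Notebook approach: fill gaps with 0, then cast.
# "int64" fixes the width regardless of platform.
filled = income.fillna(0).astype("int64")

# Alternative: nullable integer dtype keeps the gap as <NA>
# instead of turning it into a real-looking 0.
nullable = income.astype("Int64")

print(filled.dtype, filled.tolist())
print(nullable.dtype, nullable.isna().sum())
```

Which form fits depends on the analysis: a filled 0 pulls means and sums downward, while `<NA>` is skipped by default in pandas reductions.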
/Ch2/.ipynb_checkpoints/Grouping Data-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import numpy as np\n",
19 | "import pandas as pd"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "id": "a2050fe1",
25 | "metadata": {},
26 | "source": [
27 | "#### Load dataset"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 2,
33 | "id": "3a02fac1",
34 | "metadata": {},
35 | "outputs": [],
36 | "source": [
37 | "marketing_data = pd.read_csv(\"Data/marketing_campaign.csv\")"
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "id": "e81aa501",
43 | "metadata": {},
44 | "source": [
45 | "#### Subset for relevant columns"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 3,
51 | "id": "9279d462",
52 | "metadata": {},
53 | "outputs": [],
54 | "source": [
55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Education', 'Marital_Status',\n",
56 | " 'Income', 'Kidhome', 'Teenhome', 'Dt_Customer', \n",
57 | " 'Recency','NumStorePurchases', 'NumWebVisitsMonth']]"
58 | ]
59 | },
60 | {
61 | "cell_type": "markdown",
62 | "id": "128c9ecd",
63 | "metadata": {},
64 | "source": [
65 | "#### Inspect first 2 rows and data types of the dataset"
66 | ]
67 | },
68 | {
69 | "cell_type": "code",
70 | "execution_count": 9,
71 | "id": "1f556097",
72 | "metadata": {},
73 | "outputs": [
74 | {
75 | "data": {
159 | "text/plain": [
160 | " 0 1\n",
161 | "ID 5524 2174\n",
162 | "Year_Birth 1957 1954\n",
163 | "Education Graduation Graduation\n",
164 | "Marital_Status Single Single\n",
165 | "Income 58138.0 46344.0\n",
166 | "Kidhome 0 1\n",
167 | "Teenhome 0 1\n",
168 | "Dt_Customer 04-09-2012 08-03-2014\n",
169 | "Recency 58 38\n",
170 | "NumStorePurchases 4 2\n",
171 | "NumWebVisitsMonth 7 5"
172 | ]
173 | },
174 | "execution_count": 9,
175 | "metadata": {},
176 | "output_type": "execute_result"
177 | }
178 | ],
179 | "source": [
180 | "marketing_data.head(2).T"
181 | ]
182 | },
183 | {
184 | "cell_type": "code",
185 | "execution_count": 10,
186 | "id": "20f83686",
187 | "metadata": {},
188 | "outputs": [],
189 | "source": [
190 | "marketing_data.head(2).T.to_clipboard()"
191 | ]
192 | },
193 | {
194 | "cell_type": "code",
195 | "execution_count": 5,
196 | "id": "b3601b62",
197 | "metadata": {},
198 | "outputs": [
199 | {
200 | "data": {
201 | "text/plain": [
202 | "ID int64\n",
203 | "Year_Birth int64\n",
204 | "Education object\n",
205 | "Marital_Status object\n",
206 | "Income float64\n",
207 | "Kidhome int64\n",
208 | "Teenhome int64\n",
209 | "Dt_Customer object\n",
210 | "Recency int64\n",
211 | "NumStorePurchases int64\n",
212 | "NumWebVisitsMonth int64\n",
213 | "dtype: object"
214 | ]
215 | },
216 | "execution_count": 5,
217 | "metadata": {},
218 | "output_type": "execute_result"
219 | }
220 | ],
221 | "source": [
222 | "marketing_data.dtypes"
223 | ]
224 | },
225 | {
226 | "cell_type": "code",
227 | "execution_count": 11,
228 | "id": "4082ff2c",
229 | "metadata": {},
230 | "outputs": [],
231 | "source": [
232 | "marketing_data.dtypes.to_clipboard()"
233 | ]
234 | },
235 | {
236 | "cell_type": "code",
237 | "execution_count": 12,
238 | "id": "bdf17c4b",
239 | "metadata": {},
240 | "outputs": [
241 | {
242 | "data": {
243 | "text/plain": [
244 | "(2240, 11)"
245 | ]
246 | },
247 | "execution_count": 12,
248 | "metadata": {},
249 | "output_type": "execute_result"
250 | }
251 | ],
252 | "source": [
253 | "marketing_data.shape"
254 | ]
255 | },
256 | {
257 | "cell_type": "markdown",
258 | "id": "46f24ec2",
259 | "metadata": {},
260 | "source": [
261 | "#### Check the average number of store purchases of customers based on number of kids in the home"
262 | ]
263 | },
264 | {
265 | "cell_type": "code",
266 | "execution_count": 6,
267 | "id": "fe10f0a2",
268 | "metadata": {},
269 | "outputs": [
270 | {
271 | "data": {
272 | "text/plain": [
273 | "Kidhome\n",
274 | "0 7.217324\n",
275 | "1 3.863181\n",
276 | "2 3.437500\n",
277 | "Name: NumStorePurchases, dtype: float64"
278 | ]
279 | },
280 | "execution_count": 6,
281 | "metadata": {},
282 | "output_type": "execute_result"
283 | }
284 | ],
285 | "source": [
286 | "marketing_data.groupby('Kidhome')['NumStorePurchases'].mean()"
287 | ]
288 | },
289 | {
290 | "cell_type": "code",
291 | "execution_count": 14,
292 | "id": "2c5c4c2f",
293 | "metadata": {},
294 | "outputs": [],
295 | "source": [
296 | "marketing_data.groupby('Kidhome')['NumStorePurchases'].mean().to_clipboard()"
297 | ]
298 | }
299 | ],
300 | "metadata": {
301 | "kernelspec": {
302 | "display_name": "Python 3",
303 | "language": "python",
304 | "name": "python3"
305 | },
306 | "language_info": {
307 | "codemirror_mode": {
308 | "name": "ipython",
309 | "version": 3
310 | },
311 | "file_extension": ".py",
312 | "mimetype": "text/x-python",
313 | "name": "python",
314 | "nbconvert_exporter": "python",
315 | "pygments_lexer": "ipython3",
316 | "version": "3.7.1"
317 | }
318 | },
319 | "nbformat": 4,
320 | "nbformat_minor": 5
321 | }
322 |
--------------------------------------------------------------------------------
/Ch2/.ipynb_checkpoints/Replacing Data-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import numpy as np\n",
19 | "import pandas as pd"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "id": "a2050fe1",
25 | "metadata": {},
26 | "source": [
27 | "#### Load dataset"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 29,
33 | "id": "3a02fac1",
34 | "metadata": {},
35 | "outputs": [],
36 | "source": [
37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")"
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "id": "e81aa501",
43 | "metadata": {},
44 | "source": [
45 | "#### Subset for relevant columns"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 30,
51 | "id": "9279d462",
52 | "metadata": {},
53 | "outputs": [],
54 | "source": [
55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Kidhome', 'Teenhome']]"
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "id": "128c9ecd",
61 | "metadata": {},
62 | "source": [
63 | "#### Inspect first 5 rows and shape of the dataset"
64 | ]
65 | },
66 | {
67 | "cell_type": "code",
68 | "execution_count": 31,
69 | "id": "1f556097",
70 | "metadata": {
71 | "scrolled": true
72 | },
73 | "outputs": [
74 | {
75 | "data": {
141 | "text/plain": [
142 | " ID Year_Birth Kidhome Teenhome\n",
143 | "0 5524 1957 0 0\n",
144 | "1 2174 1954 1 1\n",
145 | "2 4141 1965 0 0\n",
146 | "3 6182 1984 1 0\n",
147 | "4 5324 1981 1 0"
148 | ]
149 | },
150 | "execution_count": 31,
151 | "metadata": {},
152 | "output_type": "execute_result"
153 | }
154 | ],
155 | "source": [
156 | "marketing_data.head()"
157 | ]
158 | },
159 | {
160 | "cell_type": "code",
161 | "execution_count": 32,
162 | "id": "954b5d8d",
163 | "metadata": {},
164 | "outputs": [],
165 | "source": [
166 | "marketing_data.head().to_clipboard()"
167 | ]
168 | },
169 | {
170 | "cell_type": "code",
171 | "execution_count": 33,
172 | "id": "b3601b62",
173 | "metadata": {},
174 | "outputs": [
175 | {
176 | "data": {
177 | "text/plain": [
178 | "(2240, 4)"
179 | ]
180 | },
181 | "execution_count": 33,
182 | "metadata": {},
183 | "output_type": "execute_result"
184 | }
185 | ],
186 | "source": [
187 | "marketing_data.shape"
188 | ]
189 | },
190 | {
191 | "cell_type": "markdown",
192 | "id": "46f24ec2",
193 | "metadata": {},
194 | "source": [
195 | "#### Replace the values in Teenhome with \"has teen\" and \"has no teen\""
196 | ]
197 | },
198 | {
199 | "cell_type": "code",
200 | "execution_count": 21,
201 | "id": "fe10f0a2",
202 | "metadata": {},
203 | "outputs": [],
204 | "source": [
205 | "marketing_data['Teenhome_replaced'] = marketing_data['Teenhome'].replace([0,1,2],['has no teen','has teen','has teen'])"
206 | ]
207 | },
208 | {
209 | "cell_type": "code",
210 | "execution_count": 22,
211 | "id": "998a9da7",
212 | "metadata": {},
213 | "outputs": [
214 | {
215 | "data": {
269 | "text/plain": [
270 | " Teenhome Teenhome_replaced\n",
271 | "0 0 has no teen\n",
272 | "1 1 has teen\n",
273 | "2 0 has no teen\n",
274 | "3 0 has no teen\n",
275 | "4 0 has no teen"
276 | ]
277 | },
278 | "execution_count": 22,
279 | "metadata": {},
280 | "output_type": "execute_result"
281 | }
282 | ],
283 | "source": [
284 | "marketing_data[['Teenhome','Teenhome_replaced']].head()"
285 | ]
286 | },
287 | {
288 | "cell_type": "code",
289 | "execution_count": 25,
290 | "id": "393c92d6",
291 | "metadata": {},
292 | "outputs": [],
293 | "source": [
294 | "marketing_data[['Teenhome','Teenhome_replaced']].head().to_clipboard()"
295 | ]
296 | }
297 | ],
298 | "metadata": {
299 | "kernelspec": {
300 | "display_name": "Python 3",
301 | "language": "python",
302 | "name": "python3"
303 | },
304 | "language_info": {
305 | "codemirror_mode": {
306 | "name": "ipython",
307 | "version": 3
308 | },
309 | "file_extension": ".py",
310 | "mimetype": "text/x-python",
311 | "name": "python",
312 | "nbconvert_exporter": "python",
313 | "pygments_lexer": "ipython3",
314 | "version": "3.7.1"
315 | }
316 | },
317 | "nbformat": 4,
318 | "nbformat_minor": 5
319 | }
320 |
--------------------------------------------------------------------------------
/Ch2/.ipynb_checkpoints/Sorting Data-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 2,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import numpy as np\n",
19 | "import pandas as pd"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "id": "a2050fe1",
25 | "metadata": {},
26 | "source": [
27 | "#### Load dataset"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 3,
33 | "id": "3a02fac1",
34 | "metadata": {},
35 | "outputs": [],
36 | "source": [
37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")"
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "id": "e81aa501",
43 | "metadata": {},
44 | "source": [
45 | "#### Subset for relevant columns"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 4,
51 | "id": "9279d462",
52 | "metadata": {},
53 | "outputs": [],
54 | "source": [
55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Education', 'Marital_Status',\n",
56 | " 'Income', 'Kidhome', 'Teenhome', 'Dt_Customer', \n",
57 | " 'Recency','NumStorePurchases', 'NumWebVisitsMonth']]"
58 | ]
59 | },
60 | {
61 | "cell_type": "markdown",
62 | "id": "128c9ecd",
63 | "metadata": {},
64 | "source": [
65 | "#### Inspect first 5 rows and data types of the dataset"
66 | ]
67 | },
68 | {
69 | "cell_type": "code",
70 | "execution_count": 6,
71 | "id": "1f556097",
72 | "metadata": {},
73 | "outputs": [
74 | {
75 | "data": {
183 | "text/plain": [
184 | " ID Year_Birth Education Marital_Status Income Kidhome Teenhome \\\n",
185 | "0 5524 1957 Graduation Single 58138.0 0 0 \n",
186 | "1 2174 1954 Graduation Single 46344.0 1 1 \n",
187 | "2 4141 1965 Graduation Together 71613.0 0 0 \n",
188 | "3 6182 1984 Graduation Together 26646.0 1 0 \n",
189 | "4 5324 1981 PhD Married 58293.0 1 0 \n",
190 | "\n",
191 | " Dt_Customer Recency NumStorePurchases NumWebVisitsMonth \n",
192 | "0 04-09-2012 58 4 7 \n",
193 | "1 08-03-2014 38 2 5 \n",
194 | "2 21-08-2013 26 10 4 \n",
195 | "3 10-02-2014 26 4 6 \n",
196 | "4 19-01-2014 94 6 5 "
197 | ]
198 | },
199 | "execution_count": 6,
200 | "metadata": {},
201 | "output_type": "execute_result"
202 | }
203 | ],
204 | "source": [
205 | "marketing_data.head()"
206 | ]
207 | },
208 | {
209 | "cell_type": "code",
210 | "execution_count": 7,
211 | "id": "b3601b62",
212 | "metadata": {},
213 | "outputs": [
214 | {
215 | "data": {
216 | "text/plain": [
217 | "ID int64\n",
218 | "Year_Birth int64\n",
219 | "Education object\n",
220 | "Marital_Status object\n",
221 | "Income float64\n",
222 | "Kidhome int64\n",
223 | "Teenhome int64\n",
224 | "Dt_Customer object\n",
225 | "Recency int64\n",
226 | "NumStorePurchases int64\n",
227 | "NumWebVisitsMonth int64\n",
228 | "dtype: object"
229 | ]
230 | },
231 | "execution_count": 7,
232 | "metadata": {},
233 | "output_type": "execute_result"
234 | }
235 | ],
236 | "source": [
237 | "marketing_data.dtypes"
238 | ]
239 | },
240 | {
241 | "cell_type": "markdown",
242 | "id": "46f24ec2",
243 | "metadata": {},
244 | "source": [
245 | "#### Sort customers based on number of Store Purchases in descending order"
246 | ]
247 | },
248 | {
249 | "cell_type": "code",
250 | "execution_count": 13,
251 | "id": "fe10f0a2",
252 | "metadata": {},
253 | "outputs": [],
254 | "source": [
255 | "sorted_data = marketing_data.sort_values('NumStorePurchases', ascending=False)"
256 | ]
257 | },
258 | {
259 | "cell_type": "code",
260 | "execution_count": 20,
261 | "id": "07e2dab3",
262 | "metadata": {},
263 | "outputs": [],
264 | "source": [
265 | "sorted_data[['ID','NumStorePurchases']].tail().to_clipboard()"
266 | ]
267 | },
268 | {
269 | "cell_type": "code",
270 | "execution_count": 12,
271 | "id": "021d0a5a",
272 | "metadata": {},
273 | "outputs": [],
274 | "source": [
275 | "marketing_data.sort_values('NumStorePurchases', ascending=False).head(2).T.to_clipboard()"
276 | ]
277 | }
278 | ],
279 | "metadata": {
280 | "kernelspec": {
281 | "display_name": "Python 3",
282 | "language": "python",
283 | "name": "python3"
284 | },
285 | "language_info": {
286 | "codemirror_mode": {
287 | "name": "ipython",
288 | "version": 3
289 | },
290 | "file_extension": ".py",
291 | "mimetype": "text/x-python",
292 | "name": "python",
293 | "nbconvert_exporter": "python",
294 | "pygments_lexer": "ipython3",
295 | "version": "3.7.1"
296 | }
297 | },
298 | "nbformat": 4,
299 | "nbformat_minor": 5
300 | }
301 |
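Note on the sorting step above: `sort_values` also accepts a list of columns, which makes the order of tied rows explicit. The sketch below is illustrative only; the rows are made up (they are not taken from `marketing_campaign.csv`), and only the column names `ID` and `NumStorePurchases` come from the notebook.

```python
import pandas as pd

# Made-up toy rows; only the column names match the notebook.
toy = pd.DataFrame({
    "ID": [10, 11, 12, 13],
    "NumStorePurchases": [4, 13, 4, 0],
})

# Sort by purchases (descending), then break ties by ID (ascending).
sorted_toy = toy.sort_values(
    ["NumStorePurchases", "ID"], ascending=[False, True]
)

# Top two customers by store purchases.
top_two = sorted_toy.head(2)
print(sorted_toy)
```

Adding a tie-breaking column is a design choice: it keeps the output reproducible when many customers share the same purchase count.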
--------------------------------------------------------------------------------
/Ch2/.ipynb_checkpoints/Untitled-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [],
3 | "metadata": {},
4 | "nbformat": 4,
5 | "nbformat_minor": 5
6 | }
7 |
--------------------------------------------------------------------------------
/Ch2/1. Grouping Data.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import numpy as np\n",
19 | "import pandas as pd"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "id": "a2050fe1",
25 | "metadata": {},
26 | "source": [
27 | "#### Load dataset"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 2,
33 | "id": "3a02fac1",
34 | "metadata": {},
35 | "outputs": [],
36 | "source": [
37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")"
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "id": "e81aa501",
43 | "metadata": {},
44 | "source": [
45 | "#### Subset for relevant columns"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 3,
51 | "id": "9279d462",
52 | "metadata": {},
53 | "outputs": [],
54 | "source": [
55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Education', 'Marital_Status',\n",
56 | " 'Income', 'Kidhome', 'Teenhome', 'Dt_Customer', \n",
57 | " 'Recency','NumStorePurchases', 'NumWebVisitsMonth']]"
58 | ]
59 | },
60 | {
61 | "cell_type": "markdown",
62 | "id": "128c9ecd",
63 | "metadata": {},
64 | "source": [
65 | "#### Inspect first 2 rows and data types of the dataset"
66 | ]
67 | },
68 | {
69 | "cell_type": "code",
70 | "execution_count": 4,
71 | "id": "1f556097",
72 | "metadata": {},
73 | "outputs": [
74 | {
75 | "data": {
159 | "text/plain": [
160 | " 0 1\n",
161 | "ID 5524 2174\n",
162 | "Year_Birth 1957 1954\n",
163 | "Education Graduation Graduation\n",
164 | "Marital_Status Single Single\n",
165 | "Income 58138.0 46344.0\n",
166 | "Kidhome 0 1\n",
167 | "Teenhome 0 1\n",
168 | "Dt_Customer 04-09-2012 08-03-2014\n",
169 | "Recency 58 38\n",
170 | "NumStorePurchases 4 2\n",
171 | "NumWebVisitsMonth 7 5"
172 | ]
173 | },
174 | "execution_count": 4,
175 | "metadata": {},
176 | "output_type": "execute_result"
177 | }
178 | ],
179 | "source": [
180 | "marketing_data.head(2).T"
181 | ]
182 | },
183 | {
184 | "cell_type": "code",
185 | "execution_count": 5,
186 | "id": "b3601b62",
187 | "metadata": {},
188 | "outputs": [
189 | {
190 | "data": {
191 | "text/plain": [
192 | "ID int64\n",
193 | "Year_Birth int64\n",
194 | "Education object\n",
195 | "Marital_Status object\n",
196 | "Income float64\n",
197 | "Kidhome int64\n",
198 | "Teenhome int64\n",
199 | "Dt_Customer object\n",
200 | "Recency int64\n",
201 | "NumStorePurchases int64\n",
202 | "NumWebVisitsMonth int64\n",
203 | "dtype: object"
204 | ]
205 | },
206 | "execution_count": 5,
207 | "metadata": {},
208 | "output_type": "execute_result"
209 | }
210 | ],
211 | "source": [
212 | "marketing_data.dtypes"
213 | ]
214 | },
215 | {
216 | "cell_type": "code",
217 | "execution_count": 6,
218 | "id": "bdf17c4b",
219 | "metadata": {},
220 | "outputs": [
221 | {
222 | "data": {
223 | "text/plain": [
224 | "(2240, 11)"
225 | ]
226 | },
227 | "execution_count": 6,
228 | "metadata": {},
229 | "output_type": "execute_result"
230 | }
231 | ],
232 | "source": [
233 | "marketing_data.shape"
234 | ]
235 | },
236 | {
237 | "cell_type": "markdown",
238 | "id": "46f24ec2",
239 | "metadata": {},
240 | "source": [
241 | "#### Check the average number of store purchases of customers based on number of kids in the home"
242 | ]
243 | },
244 | {
245 | "cell_type": "code",
246 | "execution_count": 7,
247 | "id": "fe10f0a2",
248 | "metadata": {},
249 | "outputs": [
250 | {
251 | "data": {
252 | "text/plain": [
253 | "Kidhome\n",
254 | "0 7.217324\n",
255 | "1 3.863181\n",
256 | "2 3.437500\n",
257 | "Name: NumStorePurchases, dtype: float64"
258 | ]
259 | },
260 | "execution_count": 7,
261 | "metadata": {},
262 | "output_type": "execute_result"
263 | }
264 | ],
265 | "source": [
266 | "marketing_data.groupby('Kidhome')['NumStorePurchases'].mean()"
267 | ]
268 | }
269 | ],
270 | "metadata": {
271 | "kernelspec": {
272 | "display_name": "Python 3",
273 | "language": "python",
274 | "name": "python3"
275 | },
276 | "language_info": {
277 | "codemirror_mode": {
278 | "name": "ipython",
279 | "version": 3
280 | },
281 | "file_extension": ".py",
282 | "mimetype": "text/x-python",
283 | "name": "python",
284 | "nbconvert_exporter": "python",
285 | "pygments_lexer": "ipython3",
286 | "version": "3.7.1"
287 | }
288 | },
289 | "nbformat": 4,
290 | "nbformat_minor": 5
291 | }
292 |
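Note on the grouping step above: a group mean is easier to judge when the group size is shown next to it. The sketch below is illustrative only and uses made-up rows (not the real `marketing_campaign.csv`); only the column names `Kidhome` and `NumStorePurchases` come from the notebook.

```python
import pandas as pd

# Made-up toy rows; only the column names match the notebook.
toy = pd.DataFrame({
    "Kidhome": [0, 0, 1, 1, 2],
    "NumStorePurchases": [8, 6, 4, 2, 3],
})

# Compute the mean and the row count per group in one pass.
summary = toy.groupby("Kidhome")["NumStorePurchases"].agg(["mean", "count"])
print(summary)
```

A small `count` for a group is a hint that its mean rests on few customers and should be read with care.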
--------------------------------------------------------------------------------
/Ch2/10. Replacing Data.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import numpy as np\n",
19 | "import pandas as pd"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "id": "a2050fe1",
25 | "metadata": {},
26 | "source": [
27 | "#### Load dataset"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 2,
33 | "id": "3a02fac1",
34 | "metadata": {},
35 | "outputs": [],
36 | "source": [
37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")"
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "id": "e81aa501",
43 | "metadata": {},
44 | "source": [
45 | "#### Subset for relevant columns"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 3,
51 | "id": "9279d462",
52 | "metadata": {},
53 | "outputs": [],
54 | "source": [
55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Kidhome', 'Teenhome']]"
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "id": "128c9ecd",
61 | "metadata": {},
62 | "source": [
63 | "#### Inspect first 5 rows and shape of the dataset"
64 | ]
65 | },
66 | {
67 | "cell_type": "code",
68 | "execution_count": 4,
69 | "id": "1f556097",
70 | "metadata": {
71 | "scrolled": true
72 | },
73 | "outputs": [
74 | {
75 | "data": {
76 | "text/html": [
77 | "\n",
78 | "\n",
91 | "
\n",
92 | " \n",
93 | " \n",
94 | " | \n",
95 | " ID | \n",
96 | " Year_Birth | \n",
97 | " Kidhome | \n",
98 | " Teenhome | \n",
99 | "
\n",
100 | " \n",
101 | " \n",
102 | " \n",
103 | " 0 | \n",
104 | " 5524 | \n",
105 | " 1957 | \n",
106 | " 0 | \n",
107 | " 0 | \n",
108 | "
\n",
109 | " \n",
110 | " 1 | \n",
111 | " 2174 | \n",
112 | " 1954 | \n",
113 | " 1 | \n",
114 | " 1 | \n",
115 | "
\n",
116 | " \n",
117 | " 2 | \n",
118 | " 4141 | \n",
119 | " 1965 | \n",
120 | " 0 | \n",
121 | " 0 | \n",
122 | "
\n",
123 | " \n",
124 | " 3 | \n",
125 | " 6182 | \n",
126 | " 1984 | \n",
127 | " 1 | \n",
128 | " 0 | \n",
129 | "
\n",
130 | " \n",
131 | " 4 | \n",
132 | " 5324 | \n",
133 | " 1981 | \n",
134 | " 1 | \n",
135 | " 0 | \n",
136 | "
\n",
137 | " \n",
138 | "
\n",
139 | "
"
140 | ],
141 | "text/plain": [
142 | " ID Year_Birth Kidhome Teenhome\n",
143 | "0 5524 1957 0 0\n",
144 | "1 2174 1954 1 1\n",
145 | "2 4141 1965 0 0\n",
146 | "3 6182 1984 1 0\n",
147 | "4 5324 1981 1 0"
148 | ]
149 | },
150 | "execution_count": 4,
151 | "metadata": {},
152 | "output_type": "execute_result"
153 | }
154 | ],
155 | "source": [
156 | "marketing_data.head()"
157 | ]
158 | },
159 | {
160 | "cell_type": "code",
161 | "execution_count": 5,
162 | "id": "b3601b62",
163 | "metadata": {},
164 | "outputs": [
165 | {
166 | "data": {
167 | "text/plain": [
168 | "(2240, 4)"
169 | ]
170 | },
171 | "execution_count": 5,
172 | "metadata": {},
173 | "output_type": "execute_result"
174 | }
175 | ],
176 | "source": [
177 | "marketing_data.shape"
178 | ]
179 | },
180 | {
181 | "cell_type": "markdown",
182 | "id": "46f24ec2",
183 | "metadata": {},
184 | "source": [
185 | "#### Replace the values in Teenhome with \"has teen\" and \"has no teen\""
186 | ]
187 | },
188 | {
189 | "cell_type": "code",
190 | "execution_count": 6,
191 | "id": "fe10f0a2",
192 | "metadata": {},
193 | "outputs": [],
194 | "source": [
195 | "marketing_data['Teenhome_replaced'] = marketing_data['Teenhome'].replace([0,1,2],['has no teen','has teen','has teen'])"
196 | ]
197 | },
198 | {
199 | "cell_type": "code",
200 | "execution_count": 7,
201 | "id": "998a9da7",
202 | "metadata": {},
203 | "outputs": [
204 | {
205 | "data": {
206 | "text/html": [
207 | "\n",
208 | "\n",
221 | "
\n",
222 | " \n",
223 | " \n",
224 | " | \n",
225 | " Teenhome | \n",
226 | " Teenhome_replaced | \n",
227 | "
\n",
228 | " \n",
229 | " \n",
230 | " \n",
231 | " 0 | \n",
232 | " 0 | \n",
233 | " has no teen | \n",
234 | "
\n",
235 | " \n",
236 | " 1 | \n",
237 | " 1 | \n",
238 | " has teen | \n",
239 | "
\n",
240 | " \n",
241 | " 2 | \n",
242 | " 0 | \n",
243 | " has no teen | \n",
244 | "
\n",
245 | " \n",
246 | " 3 | \n",
247 | " 0 | \n",
248 | " has no teen | \n",
249 | "
\n",
250 | " \n",
251 | " 4 | \n",
252 | " 0 | \n",
253 | " has no teen | \n",
254 | "
\n",
255 | " \n",
256 | "
\n",
257 | "
"
258 | ],
259 | "text/plain": [
260 | " Teenhome Teenhome_replaced\n",
261 | "0 0 has no teen\n",
262 | "1 1 has teen\n",
263 | "2 0 has no teen\n",
264 | "3 0 has no teen\n",
265 | "4 0 has no teen"
266 | ]
267 | },
268 | "execution_count": 7,
269 | "metadata": {},
270 | "output_type": "execute_result"
271 | }
272 | ],
273 | "source": [
274 | "marketing_data[['Teenhome','Teenhome_replaced']].head()"
275 | ]
276 | }
277 | ],
278 | "metadata": {
279 | "kernelspec": {
280 | "display_name": "Python 3",
281 | "language": "python",
282 | "name": "python3"
283 | },
284 | "language_info": {
285 | "codemirror_mode": {
286 | "name": "ipython",
287 | "version": 3
288 | },
289 | "file_extension": ".py",
290 | "mimetype": "text/x-python",
291 | "name": "python",
292 | "nbconvert_exporter": "python",
293 | "pygments_lexer": "ipython3",
294 | "version": "3.7.1"
295 | }
296 | },
297 | "nbformat": 4,
298 | "nbformat_minor": 5
299 | }
300 |
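Note on the replacement step above: the same relabelling can be written with a dictionary and `Series.map`, which keeps each old value next to its new label. This is an alternative sketch, not the notebook's method, and the rows are made up; only the column names `Teenhome` and `Teenhome_replaced` come from the notebook.

```python
import pandas as pd

# Made-up toy rows; only the column names match the notebook.
toy = pd.DataFrame({"Teenhome": [0, 1, 2, 0]})

# Old value -> new label. Values missing from the dict would become NaN
# under Series.map, so every expected value is listed explicitly.
labels = {0: "has no teen", 1: "has teen", 2: "has teen"}
toy["Teenhome_replaced"] = toy["Teenhome"].map(labels)
print(toy)
```

Because unmapped values turn into NaN, checking `toy["Teenhome_replaced"].isna().sum()` afterwards is a quick way to spot categories the dictionary missed.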
--------------------------------------------------------------------------------
/Ch2/9. Changing Data Format.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import numpy as np\n",
19 | "import pandas as pd"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "id": "a2050fe1",
25 | "metadata": {},
26 | "source": [
27 | "#### Load dataset"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 2,
33 | "id": "3a02fac1",
34 | "metadata": {},
35 | "outputs": [],
36 | "source": [
37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")"
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "id": "e81aa501",
43 | "metadata": {},
44 | "source": [
45 | "#### Subset for relevant columns"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 3,
51 | "id": "9279d462",
52 | "metadata": {},
53 | "outputs": [],
54 | "source": [
55 | "marketing_data = marketing_data[['ID', 'Year_Birth','Marital_Status','Income']]"
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "id": "128c9ecd",
61 | "metadata": {},
62 | "source": [
63 | "#### Inspect first 5 rows and shape of the dataset"
64 | ]
65 | },
66 | {
67 | "cell_type": "code",
68 | "execution_count": 4,
69 | "id": "1f556097",
70 | "metadata": {},
71 | "outputs": [
72 | {
73 | "data": {
74 | "text/html": [
75 | "\n",
76 | "\n",
89 | "
\n",
90 | " \n",
91 | " \n",
92 | " | \n",
93 | " ID | \n",
94 | " Year_Birth | \n",
95 | " Marital_Status | \n",
96 | " Income | \n",
97 | "
\n",
98 | " \n",
99 | " \n",
100 | " \n",
101 | " 0 | \n",
102 | " 5524 | \n",
103 | " 1957 | \n",
104 | " Single | \n",
105 | " 58138.0 | \n",
106 | "
\n",
107 | " \n",
108 | " 1 | \n",
109 | " 2174 | \n",
110 | " 1954 | \n",
111 | " Single | \n",
112 | " 46344.0 | \n",
113 | "
\n",
114 | " \n",
115 | " 2 | \n",
116 | " 4141 | \n",
117 | " 1965 | \n",
118 | " Together | \n",
119 | " 71613.0 | \n",
120 | "
\n",
121 | " \n",
122 | " 3 | \n",
123 | " 6182 | \n",
124 | " 1984 | \n",
125 | " Together | \n",
126 | " 26646.0 | \n",
127 | "
\n",
128 | " \n",
129 | " 4 | \n",
130 | " 5324 | \n",
131 | " 1981 | \n",
132 | " Married | \n",
133 | " 58293.0 | \n",
134 | "
\n",
135 | " \n",
136 | "
\n",
137 | "
"
138 | ],
139 | "text/plain": [
140 | " ID Year_Birth Marital_Status Income\n",
141 | "0 5524 1957 Single 58138.0\n",
142 | "1 2174 1954 Single 46344.0\n",
143 | "2 4141 1965 Together 71613.0\n",
144 | "3 6182 1984 Together 26646.0\n",
145 | "4 5324 1981 Married 58293.0"
146 | ]
147 | },
148 | "execution_count": 4,
149 | "metadata": {},
150 | "output_type": "execute_result"
151 | }
152 | ],
153 | "source": [
154 | "marketing_data.head()"
155 | ]
156 | },
157 | {
158 | "cell_type": "code",
159 | "execution_count": 5,
160 | "id": "b3601b62",
161 | "metadata": {},
162 | "outputs": [
163 | {
164 | "data": {
165 | "text/plain": [
166 | "(2240, 4)"
167 | ]
168 | },
169 | "execution_count": 5,
170 | "metadata": {},
171 | "output_type": "execute_result"
172 | }
173 | ],
174 | "source": [
175 | "marketing_data.shape"
176 | ]
177 | },
178 | {
179 | "cell_type": "markdown",
180 | "id": "35c8704d",
181 | "metadata": {},
182 | "source": [
183 | "#### Fill NAs in the Income column with 0"
184 | ]
185 | },
186 | {
187 | "cell_type": "code",
188 | "execution_count": 6,
189 | "id": "9e28186b",
190 | "metadata": {},
191 | "outputs": [],
192 | "source": [
193 | "marketing_data['Income'] = marketing_data['Income'].fillna(0)"
194 | ]
195 | },
196 | {
197 | "cell_type": "markdown",
198 | "id": "46f24ec2",
199 | "metadata": {},
200 | "source": [
201 | "#### Change the data type of the Income column from float to int"
202 | ]
203 | },
204 | {
205 | "cell_type": "code",
206 | "execution_count": 7,
207 | "id": "fe10f0a2",
208 | "metadata": {},
209 | "outputs": [],
210 | "source": [
211 | "marketing_data['Income_changed'] = marketing_data['Income'].astype(int)"
212 | ]
213 | },
214 | {
215 | "cell_type": "code",
216 | "execution_count": 8,
217 | "id": "5b38b1b8",
218 | "metadata": {},
219 | "outputs": [
220 | {
221 | "data": {
275 | "text/plain": [
276 | " Income Income_changed\n",
277 | "0 58138.0 58138\n",
278 | "1 46344.0 46344\n",
279 | "2 71613.0 71613\n",
280 | "3 26646.0 26646\n",
281 | "4 58293.0 58293"
282 | ]
283 | },
284 | "execution_count": 8,
285 | "metadata": {},
286 | "output_type": "execute_result"
287 | }
288 | ],
289 | "source": [
290 | "marketing_data[['Income','Income_changed']].head()"
291 | ]
292 | },
293 | {
294 | "cell_type": "code",
295 | "execution_count": 9,
296 | "id": "998a9da7",
297 | "metadata": {},
298 | "outputs": [
299 | {
300 | "data": {
301 | "text/plain": [
302 | "Income float64\n",
303 | "Income_changed int32\n",
304 | "dtype: object"
305 | ]
306 | },
307 | "execution_count": 9,
308 | "metadata": {},
309 | "output_type": "execute_result"
310 | }
311 | ],
312 | "source": [
313 | "marketing_data[['Income','Income_changed']].dtypes"
314 | ]
315 | }
316 | ],
317 | "metadata": {
318 | "kernelspec": {
319 | "display_name": "Python 3",
320 | "language": "python",
321 | "name": "python3"
322 | },
323 | "language_info": {
324 | "codemirror_mode": {
325 | "name": "ipython",
326 | "version": 3
327 | },
328 | "file_extension": ".py",
329 | "mimetype": "text/x-python",
330 | "name": "python",
331 | "nbconvert_exporter": "python",
332 | "pygments_lexer": "ipython3",
333 | "version": "3.7.1"
334 | }
335 | },
336 | "nbformat": 4,
337 | "nbformat_minor": 5
338 | }
339 |
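A note on the conversion in the notebook above: a NumPy integer column cannot hold missing values, which is why the NAs are filled with 0 before `astype(int)`, and the resulting width (`int32` in the recorded output) depends on the platform and NumPy version. The sketch below uses a small made-up Series, not the marketing dataset, to show the same step with an explicit width, plus one possible alternative: pandas' nullable `Int64` dtype, which keeps missing entries as `<NA>` instead of turning them into zeros. The sample values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up incomes for illustration; not taken from marketing_campaign.csv
income = pd.Series([58138.0, np.nan, 71613.0], name="Income")

# Approach used in the notebook: fill missing values, then cast to a NumPy integer.
# Naming the width explicitly avoids a platform-dependent int32/int64 result.
income_filled = income.fillna(0).astype("int64")

# Alternative: the nullable integer dtype keeps the missing value as <NA>
income_nullable = income.astype("Int64")

print(income_filled.tolist())
print(income_nullable.isna().sum())
```

Whether a zero or a missing marker is the better representation depends on how the column is used later; a filled-in 0 will, for example, pull down any mean computed on it.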
--------------------------------------------------------------------------------
/Ch3/.ipynb_checkpoints/1. Grouping Data-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import numpy as np\n",
19 | "import pandas as pd"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "id": "a2050fe1",
25 | "metadata": {},
26 | "source": [
27 | "#### Load dataset"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 2,
33 | "id": "3a02fac1",
34 | "metadata": {},
35 | "outputs": [],
36 | "source": [
37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")"
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "id": "e81aa501",
43 | "metadata": {},
44 | "source": [
45 | "#### Subset for relevant columns"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 3,
51 | "id": "9279d462",
52 | "metadata": {},
53 | "outputs": [],
54 | "source": [
55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Education', 'Marital_Status',\n",
56 | " 'Income', 'Kidhome', 'Teenhome', 'Dt_Customer', \n",
57 | " 'Recency','NumStorePurchases', 'NumWebVisitsMonth']]"
58 | ]
59 | },
60 | {
61 | "cell_type": "markdown",
62 | "id": "128c9ecd",
63 | "metadata": {},
64 | "source": [
65 | "#### Inspect first 2 rows and data types of the dataset"
66 | ]
67 | },
68 | {
69 | "cell_type": "code",
70 | "execution_count": 4,
71 | "id": "1f556097",
72 | "metadata": {},
73 | "outputs": [
74 | {
75 | "data": {
159 | "text/plain": [
160 | " 0 1\n",
161 | "ID 5524 2174\n",
162 | "Year_Birth 1957 1954\n",
163 | "Education Graduation Graduation\n",
164 | "Marital_Status Single Single\n",
165 | "Income 58138.0 46344.0\n",
166 | "Kidhome 0 1\n",
167 | "Teenhome 0 1\n",
168 | "Dt_Customer 04-09-2012 08-03-2014\n",
169 | "Recency 58 38\n",
170 | "NumStorePurchases 4 2\n",
171 | "NumWebVisitsMonth 7 5"
172 | ]
173 | },
174 | "execution_count": 4,
175 | "metadata": {},
176 | "output_type": "execute_result"
177 | }
178 | ],
179 | "source": [
180 | "marketing_data.head(2).T"
181 | ]
182 | },
183 | {
184 | "cell_type": "code",
185 | "execution_count": 5,
186 | "id": "b3601b62",
187 | "metadata": {},
188 | "outputs": [
189 | {
190 | "data": {
191 | "text/plain": [
192 | "ID int64\n",
193 | "Year_Birth int64\n",
194 | "Education object\n",
195 | "Marital_Status object\n",
196 | "Income float64\n",
197 | "Kidhome int64\n",
198 | "Teenhome int64\n",
199 | "Dt_Customer object\n",
200 | "Recency int64\n",
201 | "NumStorePurchases int64\n",
202 | "NumWebVisitsMonth int64\n",
203 | "dtype: object"
204 | ]
205 | },
206 | "execution_count": 5,
207 | "metadata": {},
208 | "output_type": "execute_result"
209 | }
210 | ],
211 | "source": [
212 | "marketing_data.dtypes"
213 | ]
214 | },
215 | {
216 | "cell_type": "code",
217 | "execution_count": 6,
218 | "id": "bdf17c4b",
219 | "metadata": {},
220 | "outputs": [
221 | {
222 | "data": {
223 | "text/plain": [
224 | "(2240, 11)"
225 | ]
226 | },
227 | "execution_count": 6,
228 | "metadata": {},
229 | "output_type": "execute_result"
230 | }
231 | ],
232 | "source": [
233 | "marketing_data.shape"
234 | ]
235 | },
236 | {
237 | "cell_type": "markdown",
238 | "id": "46f24ec2",
239 | "metadata": {},
240 | "source": [
241 | "#### Check the average number of store purchases of customers based on number of kids in the home"
242 | ]
243 | },
244 | {
245 | "cell_type": "code",
246 | "execution_count": 7,
247 | "id": "fe10f0a2",
248 | "metadata": {},
249 | "outputs": [
250 | {
251 | "data": {
252 | "text/plain": [
253 | "Kidhome\n",
254 | "0 7.217324\n",
255 | "1 3.863181\n",
256 | "2 3.437500\n",
257 | "Name: NumStorePurchases, dtype: float64"
258 | ]
259 | },
260 | "execution_count": 7,
261 | "metadata": {},
262 | "output_type": "execute_result"
263 | }
264 | ],
265 | "source": [
266 | "marketing_data.groupby('Kidhome')['NumStorePurchases'].mean()"
267 | ]
268 | }
269 | ],
270 | "metadata": {
271 | "kernelspec": {
272 | "display_name": "Python 3",
273 | "language": "python",
274 | "name": "python3"
275 | },
276 | "language_info": {
277 | "codemirror_mode": {
278 | "name": "ipython",
279 | "version": 3
280 | },
281 | "file_extension": ".py",
282 | "mimetype": "text/x-python",
283 | "name": "python",
284 | "nbconvert_exporter": "python",
285 | "pygments_lexer": "ipython3",
286 | "version": "3.7.1"
287 | }
288 | },
289 | "nbformat": 4,
290 | "nbformat_minor": 5
291 | }
292 |
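The grouping step in the notebook above reports only the mean per `Kidhome` value. A minimal sketch of the same pattern on made-up rows (the column names mirror the notebook, the values are assumptions) shows how `agg` can return the group size alongside the mean, which helps judge how much weight each average deserves.

```python
import pandas as pd

# Made-up rows for illustration; column names mirror the notebook
df = pd.DataFrame({
    "Kidhome": [0, 0, 1, 1, 2],
    "NumStorePurchases": [8, 6, 4, 2, 3],
})

# One row per Kidhome value, with the mean and the number of rows behind it
summary = df.groupby("Kidhome")["NumStorePurchases"].agg(["mean", "count"])
print(summary)
```

A group backed by very few rows, like `Kidhome == 2` in this toy data, gives a much less reliable average than the larger groups.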
--------------------------------------------------------------------------------
/Ch3/.ipynb_checkpoints/10. Replacing Data-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import numpy as np\n",
19 | "import pandas as pd"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "id": "a2050fe1",
25 | "metadata": {},
26 | "source": [
27 | "#### Load dataset"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 2,
33 | "id": "3a02fac1",
34 | "metadata": {},
35 | "outputs": [],
36 | "source": [
37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")"
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "id": "e81aa501",
43 | "metadata": {},
44 | "source": [
45 | "#### Subset for relevant columns"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 3,
51 | "id": "9279d462",
52 | "metadata": {},
53 | "outputs": [],
54 | "source": [
55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Kidhome', 'Teenhome']]"
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "id": "128c9ecd",
61 | "metadata": {},
62 | "source": [
63 | "#### Inspect first 5 rows and shape of the dataset"
64 | ]
65 | },
66 | {
67 | "cell_type": "code",
68 | "execution_count": 4,
69 | "id": "1f556097",
70 | "metadata": {
71 | "scrolled": true
72 | },
73 | "outputs": [
74 | {
75 | "data": {
141 | "text/plain": [
142 | " ID Year_Birth Kidhome Teenhome\n",
143 | "0 5524 1957 0 0\n",
144 | "1 2174 1954 1 1\n",
145 | "2 4141 1965 0 0\n",
146 | "3 6182 1984 1 0\n",
147 | "4 5324 1981 1 0"
148 | ]
149 | },
150 | "execution_count": 4,
151 | "metadata": {},
152 | "output_type": "execute_result"
153 | }
154 | ],
155 | "source": [
156 | "marketing_data.head()"
157 | ]
158 | },
159 | {
160 | "cell_type": "code",
161 | "execution_count": 5,
162 | "id": "b3601b62",
163 | "metadata": {},
164 | "outputs": [
165 | {
166 | "data": {
167 | "text/plain": [
168 | "(2240, 4)"
169 | ]
170 | },
171 | "execution_count": 5,
172 | "metadata": {},
173 | "output_type": "execute_result"
174 | }
175 | ],
176 | "source": [
177 | "marketing_data.shape"
178 | ]
179 | },
180 | {
181 | "cell_type": "markdown",
182 | "id": "46f24ec2",
183 | "metadata": {},
184 | "source": [
185 | "#### Replace the values in Teenhome with \"has teen\" and \"has no teen\""
186 | ]
187 | },
188 | {
189 | "cell_type": "code",
190 | "execution_count": 6,
191 | "id": "fe10f0a2",
192 | "metadata": {},
193 | "outputs": [],
194 | "source": [
195 | "marketing_data['Teenhome_replaced'] = marketing_data['Teenhome'].replace([0,1,2],['has no teen','has teen','has teen'])"
196 | ]
197 | },
198 | {
199 | "cell_type": "code",
200 | "execution_count": 7,
201 | "id": "998a9da7",
202 | "metadata": {},
203 | "outputs": [
204 | {
205 | "data": {
259 | "text/plain": [
260 | " Teenhome Teenhome_replaced\n",
261 | "0 0 has no teen\n",
262 | "1 1 has teen\n",
263 | "2 0 has no teen\n",
264 | "3 0 has no teen\n",
265 | "4 0 has no teen"
266 | ]
267 | },
268 | "execution_count": 7,
269 | "metadata": {},
270 | "output_type": "execute_result"
271 | }
272 | ],
273 | "source": [
274 | "marketing_data[['Teenhome','Teenhome_replaced']].head()"
275 | ]
276 | }
277 | ],
278 | "metadata": {
279 | "kernelspec": {
280 | "display_name": "Python 3",
281 | "language": "python",
282 | "name": "python3"
283 | },
284 | "language_info": {
285 | "codemirror_mode": {
286 | "name": "ipython",
287 | "version": 3
288 | },
289 | "file_extension": ".py",
290 | "mimetype": "text/x-python",
291 | "name": "python",
292 | "nbconvert_exporter": "python",
293 | "pygments_lexer": "ipython3",
294 | "version": "3.7.1"
295 | }
296 | },
297 | "nbformat": 4,
298 | "nbformat_minor": 5
299 | }
300 |
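The `replace` call in the notebook above lists every observed value (0, 1, 2) explicitly, so any other count would pass through unchanged. As a hedged alternative, the sketch below derives the same labels from a condition with `np.where`, which covers any positive count. The data is made up for illustration and this is not the notebook's own method.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration; the column name mirrors the notebook
df = pd.DataFrame({"Teenhome": [0, 1, 2, 0]})

# Label by condition instead of enumerating each value
df["Teenhome_replaced"] = np.where(df["Teenhome"] > 0, "has teen", "has no teen")
print(df)
```

The explicit list in the notebook has its own advantage: an unexpected value stays visible as a number rather than being silently labelled.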
--------------------------------------------------------------------------------
/Ch3/.ipynb_checkpoints/9. Changing Data Format-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import numpy as np\n",
19 | "import pandas as pd"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "id": "a2050fe1",
25 | "metadata": {},
26 | "source": [
27 | "#### Load dataset"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 2,
33 | "id": "3a02fac1",
34 | "metadata": {},
35 | "outputs": [],
36 | "source": [
37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")"
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "id": "e81aa501",
43 | "metadata": {},
44 | "source": [
45 | "#### Subset for relevant columns"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 3,
51 | "id": "9279d462",
52 | "metadata": {},
53 | "outputs": [],
54 | "source": [
55 | "marketing_data = marketing_data[['ID', 'Year_Birth','Marital_Status','Income']]"
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "id": "128c9ecd",
61 | "metadata": {},
62 | "source": [
63 | "#### Inspect first 5 rows and shape of the dataset"
64 | ]
65 | },
66 | {
67 | "cell_type": "code",
68 | "execution_count": 4,
69 | "id": "1f556097",
70 | "metadata": {},
71 | "outputs": [
72 | {
73 | "data": {
139 | "text/plain": [
140 | " ID Year_Birth Marital_Status Income\n",
141 | "0 5524 1957 Single 58138.0\n",
142 | "1 2174 1954 Single 46344.0\n",
143 | "2 4141 1965 Together 71613.0\n",
144 | "3 6182 1984 Together 26646.0\n",
145 | "4 5324 1981 Married 58293.0"
146 | ]
147 | },
148 | "execution_count": 4,
149 | "metadata": {},
150 | "output_type": "execute_result"
151 | }
152 | ],
153 | "source": [
154 | "marketing_data.head()"
155 | ]
156 | },
157 | {
158 | "cell_type": "code",
159 | "execution_count": 5,
160 | "id": "b3601b62",
161 | "metadata": {},
162 | "outputs": [
163 | {
164 | "data": {
165 | "text/plain": [
166 | "(2240, 4)"
167 | ]
168 | },
169 | "execution_count": 5,
170 | "metadata": {},
171 | "output_type": "execute_result"
172 | }
173 | ],
174 | "source": [
175 | "marketing_data.shape"
176 | ]
177 | },
178 | {
179 | "cell_type": "markdown",
180 | "id": "35c8704d",
181 | "metadata": {},
182 | "source": [
183 | "#### Fill NAs in the Income column"
184 | ]
185 | },
186 | {
187 | "cell_type": "code",
188 | "execution_count": 6,
189 | "id": "9e28186b",
190 | "metadata": {},
191 | "outputs": [],
192 | "source": [
193 | "marketing_data['Income'] = marketing_data['Income'].fillna(0)"
194 | ]
195 | },
196 | {
197 | "cell_type": "markdown",
198 | "id": "46f24ec2",
199 | "metadata": {},
200 | "source": [
201 | "#### Change the data type of the Income column from float to int"
202 | ]
203 | },
204 | {
205 | "cell_type": "code",
206 | "execution_count": 7,
207 | "id": "fe10f0a2",
208 | "metadata": {},
209 | "outputs": [],
210 | "source": [
211 | "marketing_data['Income_changed'] = marketing_data['Income'].astype(int)"
212 | ]
213 | },
214 | {
215 | "cell_type": "code",
216 | "execution_count": 8,
217 | "id": "5b38b1b8",
218 | "metadata": {},
219 | "outputs": [
220 | {
221 | "data": {
275 | "text/plain": [
276 | " Income Income_changed\n",
277 | "0 58138.0 58138\n",
278 | "1 46344.0 46344\n",
279 | "2 71613.0 71613\n",
280 | "3 26646.0 26646\n",
281 | "4 58293.0 58293"
282 | ]
283 | },
284 | "execution_count": 8,
285 | "metadata": {},
286 | "output_type": "execute_result"
287 | }
288 | ],
289 | "source": [
290 | "marketing_data[['Income','Income_changed']].head()"
291 | ]
292 | },
293 | {
294 | "cell_type": "code",
295 | "execution_count": 9,
296 | "id": "998a9da7",
297 | "metadata": {},
298 | "outputs": [
299 | {
300 | "data": {
301 | "text/plain": [
302 | "Income float64\n",
303 | "Income_changed int32\n",
304 | "dtype: object"
305 | ]
306 | },
307 | "execution_count": 9,
308 | "metadata": {},
309 | "output_type": "execute_result"
310 | }
311 | ],
312 | "source": [
313 | "marketing_data[['Income','Income_changed']].dtypes"
314 | ]
315 | }
316 | ],
317 | "metadata": {
318 | "kernelspec": {
319 | "display_name": "Python 3",
320 | "language": "python",
321 | "name": "python3"
322 | },
323 | "language_info": {
324 | "codemirror_mode": {
325 | "name": "ipython",
326 | "version": 3
327 | },
328 | "file_extension": ".py",
329 | "mimetype": "text/x-python",
330 | "name": "python",
331 | "nbconvert_exporter": "python",
332 | "pygments_lexer": "ipython3",
333 | "version": "3.7.1"
334 | }
335 | },
336 | "nbformat": 4,
337 | "nbformat_minor": 5
338 | }
339 |
--------------------------------------------------------------------------------
/Ch3/.ipynb_checkpoints/Grouping Data-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import numpy as np\n",
19 | "import pandas as pd"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "id": "a2050fe1",
25 | "metadata": {},
26 | "source": [
27 | "#### Load dataset"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 2,
33 | "id": "3a02fac1",
34 | "metadata": {},
35 | "outputs": [],
36 | "source": [
37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")"
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "id": "e81aa501",
43 | "metadata": {},
44 | "source": [
45 | "#### Subset for relevant columns"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 3,
51 | "id": "9279d462",
52 | "metadata": {},
53 | "outputs": [],
54 | "source": [
55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Education', 'Marital_Status',\n",
56 | " 'Income', 'Kidhome', 'Teenhome', 'Dt_Customer', \n",
57 | " 'Recency','NumStorePurchases', 'NumWebVisitsMonth']]"
58 | ]
59 | },
60 | {
61 | "cell_type": "markdown",
62 | "id": "128c9ecd",
63 | "metadata": {},
64 | "source": [
65 | "#### Inspect first 2 rows and data types of the dataset"
66 | ]
67 | },
68 | {
69 | "cell_type": "code",
70 | "execution_count": 9,
71 | "id": "1f556097",
72 | "metadata": {},
73 | "outputs": [
74 | {
75 | "data": {
159 | "text/plain": [
160 | " 0 1\n",
161 | "ID 5524 2174\n",
162 | "Year_Birth 1957 1954\n",
163 | "Education Graduation Graduation\n",
164 | "Marital_Status Single Single\n",
165 | "Income 58138.0 46344.0\n",
166 | "Kidhome 0 1\n",
167 | "Teenhome 0 1\n",
168 | "Dt_Customer 04-09-2012 08-03-2014\n",
169 | "Recency 58 38\n",
170 | "NumStorePurchases 4 2\n",
171 | "NumWebVisitsMonth 7 5"
172 | ]
173 | },
174 | "execution_count": 9,
175 | "metadata": {},
176 | "output_type": "execute_result"
177 | }
178 | ],
179 | "source": [
180 | "marketing_data.head(2).T"
181 | ]
182 | },
183 | {
184 | "cell_type": "code",
185 | "execution_count": 10,
186 | "id": "20f83686",
187 | "metadata": {},
188 | "outputs": [],
189 | "source": [
190 | "marketing_data.head(2).T.to_clipboard()"
191 | ]
192 | },
193 | {
194 | "cell_type": "code",
195 | "execution_count": 5,
196 | "id": "b3601b62",
197 | "metadata": {},
198 | "outputs": [
199 | {
200 | "data": {
201 | "text/plain": [
202 | "ID int64\n",
203 | "Year_Birth int64\n",
204 | "Education object\n",
205 | "Marital_Status object\n",
206 | "Income float64\n",
207 | "Kidhome int64\n",
208 | "Teenhome int64\n",
209 | "Dt_Customer object\n",
210 | "Recency int64\n",
211 | "NumStorePurchases int64\n",
212 | "NumWebVisitsMonth int64\n",
213 | "dtype: object"
214 | ]
215 | },
216 | "execution_count": 5,
217 | "metadata": {},
218 | "output_type": "execute_result"
219 | }
220 | ],
221 | "source": [
222 | "marketing_data.dtypes"
223 | ]
224 | },
225 | {
226 | "cell_type": "code",
227 | "execution_count": 11,
228 | "id": "4082ff2c",
229 | "metadata": {},
230 | "outputs": [],
231 | "source": [
232 | "marketing_data.dtypes.to_clipboard()"
233 | ]
234 | },
235 | {
236 | "cell_type": "code",
237 | "execution_count": 12,
238 | "id": "bdf17c4b",
239 | "metadata": {},
240 | "outputs": [
241 | {
242 | "data": {
243 | "text/plain": [
244 | "(2240, 11)"
245 | ]
246 | },
247 | "execution_count": 12,
248 | "metadata": {},
249 | "output_type": "execute_result"
250 | }
251 | ],
252 | "source": [
253 | "marketing_data.shape"
254 | ]
255 | },
256 | {
257 | "cell_type": "markdown",
258 | "id": "46f24ec2",
259 | "metadata": {},
260 | "source": [
261 | "#### Check the average number of store purchases of customers based on number of kids in the home"
262 | ]
263 | },
264 | {
265 | "cell_type": "code",
266 | "execution_count": 6,
267 | "id": "fe10f0a2",
268 | "metadata": {},
269 | "outputs": [
270 | {
271 | "data": {
272 | "text/plain": [
273 | "Kidhome\n",
274 | "0 7.217324\n",
275 | "1 3.863181\n",
276 | "2 3.437500\n",
277 | "Name: NumStorePurchases, dtype: float64"
278 | ]
279 | },
280 | "execution_count": 6,
281 | "metadata": {},
282 | "output_type": "execute_result"
283 | }
284 | ],
285 | "source": [
286 | "marketing_data.groupby('Kidhome')['NumStorePurchases'].mean()"
287 | ]
288 | },
289 | {
290 | "cell_type": "code",
291 | "execution_count": 14,
292 | "id": "2c5c4c2f",
293 | "metadata": {},
294 | "outputs": [],
295 | "source": [
296 | "marketing_data.groupby('Kidhome')['NumStorePurchases'].mean().to_clipboard()"
297 | ]
298 | }
299 | ],
300 | "metadata": {
301 | "kernelspec": {
302 | "display_name": "Python 3",
303 | "language": "python",
304 | "name": "python3"
305 | },
306 | "language_info": {
307 | "codemirror_mode": {
308 | "name": "ipython",
309 | "version": 3
310 | },
311 | "file_extension": ".py",
312 | "mimetype": "text/x-python",
313 | "name": "python",
314 | "nbconvert_exporter": "python",
315 | "pygments_lexer": "ipython3",
316 | "version": "3.7.1"
317 | }
318 | },
319 | "nbformat": 4,
320 | "nbformat_minor": 5
321 | }
322 |
--------------------------------------------------------------------------------
/Ch3/.ipynb_checkpoints/Replacing Data-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import numpy as np\n",
19 | "import pandas as pd"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "id": "a2050fe1",
25 | "metadata": {},
26 | "source": [
27 | "#### Load dataset"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 29,
33 | "id": "3a02fac1",
34 | "metadata": {},
35 | "outputs": [],
36 | "source": [
37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")"
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "id": "e81aa501",
43 | "metadata": {},
44 | "source": [
45 | "#### Subset for relevant columns"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 30,
51 | "id": "9279d462",
52 | "metadata": {},
53 | "outputs": [],
54 | "source": [
55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Kidhome', 'Teenhome']]"
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "id": "128c9ecd",
61 | "metadata": {},
62 | "source": [
63 | "#### Inspect first 5 rows and shape of the dataset"
64 | ]
65 | },
66 | {
67 | "cell_type": "code",
68 | "execution_count": 31,
69 | "id": "1f556097",
70 | "metadata": {
71 | "scrolled": true
72 | },
73 | "outputs": [
74 | {
75 | "data": {
141 | "text/plain": [
142 | " ID Year_Birth Kidhome Teenhome\n",
143 | "0 5524 1957 0 0\n",
144 | "1 2174 1954 1 1\n",
145 | "2 4141 1965 0 0\n",
146 | "3 6182 1984 1 0\n",
147 | "4 5324 1981 1 0"
148 | ]
149 | },
150 | "execution_count": 31,
151 | "metadata": {},
152 | "output_type": "execute_result"
153 | }
154 | ],
155 | "source": [
156 | "marketing_data.head()"
157 | ]
158 | },
159 | {
160 | "cell_type": "code",
161 | "execution_count": 32,
162 | "id": "954b5d8d",
163 | "metadata": {},
164 | "outputs": [],
165 | "source": [
166 | "marketing_data.head().to_clipboard()"
167 | ]
168 | },
169 | {
170 | "cell_type": "code",
171 | "execution_count": 33,
172 | "id": "b3601b62",
173 | "metadata": {},
174 | "outputs": [
175 | {
176 | "data": {
177 | "text/plain": [
178 | "(2240, 4)"
179 | ]
180 | },
181 | "execution_count": 33,
182 | "metadata": {},
183 | "output_type": "execute_result"
184 | }
185 | ],
186 | "source": [
187 | "marketing_data.shape"
188 | ]
189 | },
190 | {
191 | "cell_type": "markdown",
192 | "id": "46f24ec2",
193 | "metadata": {},
194 | "source": [
195 | "#### Replace the values in Teenhome with \"has teen\" and \"has no teen\""
196 | ]
197 | },
198 | {
199 | "cell_type": "code",
200 | "execution_count": 21,
201 | "id": "fe10f0a2",
202 | "metadata": {},
203 | "outputs": [],
204 | "source": [
205 | "marketing_data['Teenhome_replaced'] = marketing_data['Teenhome'].replace([0,1,2],['has no teen','has teen','has teen'])"
206 | ]
207 | },
208 | {
209 | "cell_type": "code",
210 | "execution_count": 22,
211 | "id": "998a9da7",
212 | "metadata": {},
213 | "outputs": [
214 | {
215 | "data": {
269 | "text/plain": [
270 | " Teenhome Teenhome_replaced\n",
271 | "0 0 has no teen\n",
272 | "1 1 has teen\n",
273 | "2 0 has no teen\n",
274 | "3 0 has no teen\n",
275 | "4 0 has no teen"
276 | ]
277 | },
278 | "execution_count": 22,
279 | "metadata": {},
280 | "output_type": "execute_result"
281 | }
282 | ],
283 | "source": [
284 | "marketing_data[['Teenhome','Teenhome_replaced']].head()"
285 | ]
286 | },
287 | {
288 | "cell_type": "code",
289 | "execution_count": 25,
290 | "id": "393c92d6",
291 | "metadata": {},
292 | "outputs": [],
293 | "source": [
294 | "marketing_data[['Teenhome','Teenhome_replaced']].head().to_clipboard()"
295 | ]
296 | }
297 | ],
298 | "metadata": {
299 | "kernelspec": {
300 | "display_name": "Python 3",
301 | "language": "python",
302 | "name": "python3"
303 | },
304 | "language_info": {
305 | "codemirror_mode": {
306 | "name": "ipython",
307 | "version": 3
308 | },
309 | "file_extension": ".py",
310 | "mimetype": "text/x-python",
311 | "name": "python",
312 | "nbconvert_exporter": "python",
313 | "pygments_lexer": "ipython3",
314 | "version": "3.7.1"
315 | }
316 | },
317 | "nbformat": 4,
318 | "nbformat_minor": 5
319 | }
320 |
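Note on the replacement step in the notebook above: the two parallel lists passed to `replace` can be hard to audit as the number of categories grows. The sketch below shows one possible alternative that uses a dictionary with `Series.map`, which is a swapped-in technique and not the call the notebook uses. The `sample` frame and the `teen_labels` name are hypothetical stand-ins for `marketing_data` and its `Teenhome` column.

```python
import pandas as pd

# Hypothetical sample standing in for the Teenhome column of marketing_data.
sample = pd.DataFrame({"Teenhome": [0, 1, 0, 2]})

# A dictionary keeps each original value next to its replacement label.
teen_labels = {0: "has no teen", 1: "has teen", 2: "has teen"}

# Series.map looks up every value in the dictionary; values that are not
# keys of the dictionary would become NaN, which makes unmapped codes visible.
sample["Teenhome_replaced"] = sample["Teenhome"].map(teen_labels)

print(sample)
```

Unlike `replace`, `map` does not leave unmatched values untouched, so it is better suited to cases where every category is expected to have a label.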
--------------------------------------------------------------------------------
/Ch3/.ipynb_checkpoints/Sorting Data-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 2,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import numpy as np\n",
19 | "import pandas as pd"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "id": "a2050fe1",
25 | "metadata": {},
26 | "source": [
27 | "#### Load dataset"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 3,
33 | "id": "3a02fac1",
34 | "metadata": {},
35 | "outputs": [],
36 | "source": [
37 | "marketing_data = pd.read_csv(\"data/marketing_campaign.csv\")"
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "id": "e81aa501",
43 | "metadata": {},
44 | "source": [
45 | "#### Subset for relevant columns"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 4,
51 | "id": "9279d462",
52 | "metadata": {},
53 | "outputs": [],
54 | "source": [
55 | "marketing_data = marketing_data[['ID', 'Year_Birth', 'Education', 'Marital_Status',\n",
56 | " 'Income', 'Kidhome', 'Teenhome', 'Dt_Customer', \n",
57 | " 'Recency','NumStorePurchases', 'NumWebVisitsMonth']]"
58 | ]
59 | },
60 | {
61 | "cell_type": "markdown",
62 | "id": "128c9ecd",
63 | "metadata": {},
64 | "source": [
65 | "#### Inspect first 5 rows and data types of the dataset"
66 | ]
67 | },
68 | {
69 | "cell_type": "code",
70 | "execution_count": 6,
71 | "id": "1f556097",
72 | "metadata": {},
73 | "outputs": [
74 | {
75 | "data": {
183 | "text/plain": [
184 | " ID Year_Birth Education Marital_Status Income Kidhome Teenhome \\\n",
185 | "0 5524 1957 Graduation Single 58138.0 0 0 \n",
186 | "1 2174 1954 Graduation Single 46344.0 1 1 \n",
187 | "2 4141 1965 Graduation Together 71613.0 0 0 \n",
188 | "3 6182 1984 Graduation Together 26646.0 1 0 \n",
189 | "4 5324 1981 PhD Married 58293.0 1 0 \n",
190 | "\n",
191 | " Dt_Customer Recency NumStorePurchases NumWebVisitsMonth \n",
192 | "0 04-09-2012 58 4 7 \n",
193 | "1 08-03-2014 38 2 5 \n",
194 | "2 21-08-2013 26 10 4 \n",
195 | "3 10-02-2014 26 4 6 \n",
196 | "4 19-01-2014 94 6 5 "
197 | ]
198 | },
199 | "execution_count": 6,
200 | "metadata": {},
201 | "output_type": "execute_result"
202 | }
203 | ],
204 | "source": [
205 | "marketing_data.head()"
206 | ]
207 | },
208 | {
209 | "cell_type": "code",
210 | "execution_count": 7,
211 | "id": "b3601b62",
212 | "metadata": {},
213 | "outputs": [
214 | {
215 | "data": {
216 | "text/plain": [
217 | "ID int64\n",
218 | "Year_Birth int64\n",
219 | "Education object\n",
220 | "Marital_Status object\n",
221 | "Income float64\n",
222 | "Kidhome int64\n",
223 | "Teenhome int64\n",
224 | "Dt_Customer object\n",
225 | "Recency int64\n",
226 | "NumStorePurchases int64\n",
227 | "NumWebVisitsMonth int64\n",
228 | "dtype: object"
229 | ]
230 | },
231 | "execution_count": 7,
232 | "metadata": {},
233 | "output_type": "execute_result"
234 | }
235 | ],
236 | "source": [
237 | "marketing_data.dtypes"
238 | ]
239 | },
240 | {
241 | "cell_type": "markdown",
242 | "id": "46f24ec2",
243 | "metadata": {},
244 | "source": [
245 | "#### Sort customers based on number of Store Purchases in descending order"
246 | ]
247 | },
248 | {
249 | "cell_type": "code",
250 | "execution_count": 13,
251 | "id": "fe10f0a2",
252 | "metadata": {},
253 | "outputs": [],
254 | "source": [
255 | "sorted_data = marketing_data.sort_values('NumStorePurchases', ascending=False)"
256 | ]
257 | },
258 | {
259 | "cell_type": "code",
260 | "execution_count": 20,
261 | "id": "07e2dab3",
262 | "metadata": {},
263 | "outputs": [],
264 | "source": [
265 | "sorted_data[['ID','NumStorePurchases']].tail().to_clipboard()"
266 | ]
267 | },
268 | {
269 | "cell_type": "code",
270 | "execution_count": 12,
271 | "id": "021d0a5a",
272 | "metadata": {},
273 | "outputs": [],
274 | "source": [
275 | "marketing_data.sort_values('NumStorePurchases', ascending=False).head(2).T.to_clipboard()"
276 | ]
277 | }
278 | ],
279 | "metadata": {
280 | "kernelspec": {
281 | "display_name": "Python 3",
282 | "language": "python",
283 | "name": "python3"
284 | },
285 | "language_info": {
286 | "codemirror_mode": {
287 | "name": "ipython",
288 | "version": 3
289 | },
290 | "file_extension": ".py",
291 | "mimetype": "text/x-python",
292 | "name": "python",
293 | "nbconvert_exporter": "python",
294 | "pygments_lexer": "ipython3",
295 | "version": "3.7.1"
296 | }
297 | },
298 | "nbformat": 4,
299 | "nbformat_minor": 5
300 | }
301 |
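Note on the sorting step in the notebook above: sorting on a single column leaves the order of tied rows up to the sort algorithm. A minimal sketch, assuming a tie-breaker on `Recency` is wanted (that choice is an assumption, not something the notebook does), passes lists to `sort_values`. The `sample` frame is hypothetical.

```python
import pandas as pd

# Hypothetical sample with two customers tied on NumStorePurchases.
sample = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "NumStorePurchases": [4, 10, 2, 10],
        "Recency": [58, 26, 38, 94],
    }
)

# Sort by purchases in descending order, then break ties by Recency ascending.
sorted_sample = sample.sort_values(
    ["NumStorePurchases", "Recency"], ascending=[False, True]
)

print(sorted_sample)
```

The `ascending` list is matched to the column list position by position, so each sort key can have its own direction.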
--------------------------------------------------------------------------------
/Ch3/.ipynb_checkpoints/Untitled-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [],
3 | "metadata": {},
4 | "nbformat": 4,
5 | "nbformat_minor": 5
6 | }
7 |
--------------------------------------------------------------------------------
/Ch3/1. Preparing for EDA ..ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import pandas as pd"
19 | ]
20 | },
21 | {
22 | "cell_type": "markdown",
23 | "id": "a2050fe1",
24 | "metadata": {},
25 | "source": [
26 | "#### Load dataset"
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": 2,
32 | "id": "3a02fac1",
33 | "metadata": {},
34 | "outputs": [],
35 | "source": [
36 | "houseprices_data = pd.read_csv(\"data/HousingPricesData.csv\")"
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "id": "e81aa501",
42 | "metadata": {},
43 | "source": [
44 | "#### Subset for relevant columns"
45 | ]
46 | },
47 | {
48 | "cell_type": "code",
49 | "execution_count": 3,
50 | "id": "9279d462",
51 | "metadata": {},
52 | "outputs": [],
53 | "source": [
54 | "houseprices_data = houseprices_data[['Zip', 'Price', 'Area', 'Room']]"
55 | ]
56 | },
57 | {
58 | "cell_type": "markdown",
59 | "id": "128c9ecd",
60 | "metadata": {},
61 | "source": [
62 | "#### Inspect first 5 rows and data types of the dataset"
63 | ]
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": 4,
68 | "id": "1f556097",
69 | "metadata": {
70 | "scrolled": true
71 | },
72 | "outputs": [
73 | {
74 | "data": {
140 | "text/plain": [
141 | " Zip Price Area Room\n",
142 | "0 1091 CR 685000.0 64 3\n",
143 | "1 1059 EL 475000.0 60 3\n",
144 | "2 1097 SM 850000.0 109 4\n",
145 | "3 1060 TH 580000.0 128 6\n",
146 | "4 1036 KN 720000.0 138 5"
147 | ]
148 | },
149 | "execution_count": 4,
150 | "metadata": {},
151 | "output_type": "execute_result"
152 | }
153 | ],
154 | "source": [
155 | "houseprices_data.head()"
156 | ]
157 | },
158 | {
159 | "cell_type": "code",
160 | "execution_count": 5,
161 | "id": "bdf17c4b",
162 | "metadata": {},
163 | "outputs": [
164 | {
165 | "data": {
166 | "text/plain": [
167 | "(924, 4)"
168 | ]
169 | },
170 | "execution_count": 5,
171 | "metadata": {},
172 | "output_type": "execute_result"
173 | }
174 | ],
175 | "source": [
176 | "houseprices_data.shape"
177 | ]
178 | },
179 | {
180 | "cell_type": "code",
181 | "execution_count": 6,
182 | "id": "b3601b62",
183 | "metadata": {},
184 | "outputs": [
185 | {
186 | "data": {
187 | "text/plain": [
188 | "Zip object\n",
189 | "Price float64\n",
190 | "Area int64\n",
191 | "Room int64\n",
192 | "dtype: object"
193 | ]
194 | },
195 | "execution_count": 6,
196 | "metadata": {},
197 | "output_type": "execute_result"
198 | }
199 | ],
200 | "source": [
201 | "houseprices_data.dtypes"
202 | ]
203 | },
204 | {
205 | "cell_type": "markdown",
206 | "id": "09bbdf22",
207 | "metadata": {},
208 | "source": [
209 | "#### Create a price per sqm variable based on the price and area variables"
210 | ]
211 | },
212 | {
213 | "cell_type": "code",
214 | "execution_count": 7,
215 | "id": "60b95d7c",
216 | "metadata": {},
217 | "outputs": [],
218 | "source": [
219 | "houseprices_data['PriceperSqm'] = houseprices_data['Price']/houseprices_data['Area']"
220 | ]
221 | },
222 | {
223 | "cell_type": "code",
224 | "execution_count": 8,
225 | "id": "cf89a96b",
226 | "metadata": {},
227 | "outputs": [
228 | {
229 | "data": {
301 | "text/plain": [
302 | " Zip Price Area Room PriceperSqm\n",
303 | "0 1091 CR 685000.0 64 3 10703.125000\n",
304 | "1 1059 EL 475000.0 60 3 7916.666667\n",
305 | "2 1097 SM 850000.0 109 4 7798.165138\n",
306 | "3 1060 TH 580000.0 128 6 4531.250000\n",
307 | "4 1036 KN 720000.0 138 5 5217.391304"
308 | ]
309 | },
310 | "execution_count": 8,
311 | "metadata": {},
312 | "output_type": "execute_result"
313 | }
314 | ],
315 | "source": [
316 | "houseprices_data.head()"
317 | ]
318 | }
319 | ],
320 | "metadata": {
321 | "kernelspec": {
322 | "display_name": "Python 3",
323 | "language": "python",
324 | "name": "python3"
325 | },
326 | "language_info": {
327 | "codemirror_mode": {
328 | "name": "ipython",
329 | "version": 3
330 | },
331 | "file_extension": ".py",
332 | "mimetype": "text/x-python",
333 | "name": "python",
334 | "nbconvert_exporter": "python",
335 | "pygments_lexer": "ipython3",
336 | "version": "3.7.1"
337 | }
338 | },
339 | "nbformat": 4,
340 | "nbformat_minor": 5
341 | }
342 |
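Note on the derived `PriceperSqm` column in the notebook above: element-wise division propagates missing values, so any row with a missing `Price` also gets a missing `PriceperSqm`. The sketch below illustrates this on a hypothetical three-row `sample`; the first two rows reuse values shown in the notebook output, and the third row with a missing price is invented for illustration.

```python
import pandas as pd

# Two rows taken from the notebook output plus one hypothetical row
# whose price is missing.
sample = pd.DataFrame(
    {"Price": [685000.0, 475000.0, None], "Area": [64, 60, 100]}
)

# Column-wise division is applied row by row; NaN in Price yields NaN here.
sample["PriceperSqm"] = sample["Price"] / sample["Area"]

# Count how many derived values could not be computed.
missing_count = sample["PriceperSqm"].isna().sum()

print(sample)
print("rows without a price per sqm:", missing_count)
```

Checking the count of missing derived values right after creating the column is a cheap way to notice incomplete inputs before further analysis.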
--------------------------------------------------------------------------------
/Ch4/.ipynb_checkpoints/2. Performing univariate analysis using a Boxplot-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": null,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import pandas as pd\n",
19 | "import matplotlib.pyplot as plt\n",
20 | "import seaborn as sns"
21 | ]
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "id": "a2050fe1",
26 | "metadata": {},
27 | "source": [
28 | "#### Load dataset"
29 | ]
30 | },
31 | {
32 | "cell_type": "code",
33 | "execution_count": null,
34 | "id": "3a02fac1",
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "houseprices_data = pd.read_csv(\"Data/HousingPricesData.csv\")"
39 | ]
40 | },
41 | {
42 | "cell_type": "markdown",
43 | "id": "e81aa501",
44 | "metadata": {},
45 | "source": [
46 | "#### Subset for relevant columns"
47 | ]
48 | },
49 | {
50 | "cell_type": "code",
51 | "execution_count": null,
52 | "id": "9279d462",
53 | "metadata": {},
54 | "outputs": [],
55 | "source": [
56 | "houseprices_data = houseprices_data[['Zip','Price','Area','Room']]"
57 | ]
58 | },
59 | {
60 | "cell_type": "markdown",
61 | "id": "128c9ecd",
62 | "metadata": {},
63 | "source": [
64 | "#### Inspect first 5 rows and data types of the dataset"
65 | ]
66 | },
67 | {
68 | "cell_type": "code",
69 | "execution_count": null,
70 | "id": "1f556097",
71 | "metadata": {
72 | "scrolled": true
73 | },
74 | "outputs": [],
75 | "source": [
76 | "houseprices_data.head()"
77 | ]
78 | },
79 | {
80 | "cell_type": "code",
81 | "execution_count": null,
82 | "id": "bdf17c4b",
83 | "metadata": {},
84 | "outputs": [],
85 | "source": [
86 | "houseprices_data.shape"
87 | ]
88 | },
89 | {
90 | "cell_type": "code",
91 | "execution_count": null,
92 | "id": "b3601b62",
93 | "metadata": {},
94 | "outputs": [],
95 | "source": [
96 | "houseprices_data.dtypes"
97 | ]
98 | },
99 | {
100 | "cell_type": "markdown",
101 | "id": "09bbdf22",
102 | "metadata": {},
103 | "source": [
104 | "#### Create a Boxplot in Seaborn"
105 | ]
106 | },
107 | {
108 | "cell_type": "code",
109 | "execution_count": null,
110 | "id": "aa501b1d",
111 | "metadata": {
112 | "scrolled": true
113 | },
114 | "outputs": [],
115 | "source": [
116 | "sns.boxplot(data=houseprices_data, x=\"Price\")"
117 | ]
118 | },
119 | {
120 | "cell_type": "markdown",
121 | "id": "e7bceadc",
122 | "metadata": {},
123 | "source": [
124 | "#### Provide additional chart details"
125 | ]
126 | },
127 | {
128 | "cell_type": "code",
129 | "execution_count": null,
130 | "id": "8b23246b",
131 | "metadata": {},
132 | "outputs": [],
133 | "source": [
134 | "plt.figure(figsize= (12,6))\n",
135 | "\n",
136 | "ax = sns.boxplot(data=houseprices_data, x=\"Price\")\n",
137 | "ax.set_xlabel('House Prices in millions',fontsize = 15)\n",
138 | "ax.set_title('Univariate analysis of House Prices', fontsize= 20)\n",
139 | "plt.ticklabel_format(style='plain', axis='x')\n"
140 | ]
141 | }
142 | ],
143 | "metadata": {
144 | "kernelspec": {
145 | "display_name": "Python 3",
146 | "language": "python",
147 | "name": "python3"
148 | },
149 | "language_info": {
150 | "codemirror_mode": {
151 | "name": "ipython",
152 | "version": 3
153 | },
154 | "file_extension": ".py",
155 | "mimetype": "text/x-python",
156 | "name": "python",
157 | "nbconvert_exporter": "python",
158 | "pygments_lexer": "ipython3",
159 | "version": "3.7.1"
160 | }
161 | },
162 | "nbformat": 4,
163 | "nbformat_minor": 5
164 | }
165 |
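Note on reading the boxplot in the notebook above: with seaborn's default `whis=1.5`, points beyond 1.5 times the interquartile range from the box are drawn as individual outliers. The sketch below reproduces that rule numerically with pandas so the flagged points can be listed. The five prices in `prices` are a hypothetical sample, not the housing dataset.

```python
import pandas as pd

# Hypothetical prices; the last value is deliberately extreme.
prices = pd.Series([175000.0, 350000.0, 467000.0, 700000.0, 5950000.0])

# Quartiles and interquartile range (pandas uses linear interpolation by default).
q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles, matching the default whisker rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are the ones a boxplot would draw as outliers.
outliers = prices[(prices < lower_fence) | (prices > upper_fence)]

print("fences:", lower_fence, upper_fence)
print(outliers)
```

Listing the outliers this way complements the chart, since the plot shows where they fall but not which rows they are.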
--------------------------------------------------------------------------------
/Ch4/.ipynb_checkpoints/4. Performing univariate analysis using a Summary Table-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import pandas as pd"
19 | ]
20 | },
21 | {
22 | "cell_type": "markdown",
23 | "id": "a2050fe1",
24 | "metadata": {},
25 | "source": [
26 | "#### Load dataset"
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": 2,
32 | "id": "3a02fac1",
33 | "metadata": {},
34 | "outputs": [],
35 | "source": [
36 | "houseprices_data = pd.read_csv(\"Data/HousingPricesData.csv\")"
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "id": "e81aa501",
42 | "metadata": {},
43 | "source": [
44 | "#### Subset for relevant columns"
45 | ]
46 | },
47 | {
48 | "cell_type": "code",
49 | "execution_count": 3,
50 | "id": "9279d462",
51 | "metadata": {},
52 | "outputs": [],
53 | "source": [
54 | "houseprices_data = houseprices_data[['Zip','Price']]"
55 | ]
56 | },
57 | {
58 | "cell_type": "markdown",
59 | "id": "128c9ecd",
60 | "metadata": {},
61 | "source": [
62 | "#### Inspect first 5 rows and data types of the dataset"
63 | ]
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": 4,
68 | "id": "1f556097",
69 | "metadata": {
70 | "scrolled": true
71 | },
72 | "outputs": [
73 | {
74 | "data": {
128 | "text/plain": [
129 | " Zip Price\n",
130 | "0 1091 CR 685000.0\n",
131 | "1 1059 EL 475000.0\n",
132 | "2 1097 SM 850000.0\n",
133 | "3 1060 TH 580000.0\n",
134 | "4 1036 KN 720000.0"
135 | ]
136 | },
137 | "execution_count": 4,
138 | "metadata": {},
139 | "output_type": "execute_result"
140 | }
141 | ],
142 | "source": [
143 | "houseprices_data.head()"
144 | ]
145 | },
146 | {
147 | "cell_type": "code",
148 | "execution_count": 5,
149 | "id": "bdf17c4b",
150 | "metadata": {},
151 | "outputs": [
152 | {
153 | "data": {
154 | "text/plain": [
155 | "(924, 2)"
156 | ]
157 | },
158 | "execution_count": 5,
159 | "metadata": {},
160 | "output_type": "execute_result"
161 | }
162 | ],
163 | "source": [
164 | "houseprices_data.shape"
165 | ]
166 | },
167 | {
168 | "cell_type": "code",
169 | "execution_count": 6,
170 | "id": "b3601b62",
171 | "metadata": {},
172 | "outputs": [
173 | {
174 | "data": {
175 | "text/plain": [
176 | "Zip object\n",
177 | "Price float64\n",
178 | "dtype: object"
179 | ]
180 | },
181 | "execution_count": 6,
182 | "metadata": {},
183 | "output_type": "execute_result"
184 | }
185 | ],
186 | "source": [
187 | "houseprices_data.dtypes"
188 | ]
189 | },
190 | {
191 | "cell_type": "markdown",
192 | "id": "09bbdf22",
193 | "metadata": {},
194 | "source": [
195 | "#### Create a Summary Table using the describe method in Pandas"
196 | ]
197 | },
198 | {
199 | "cell_type": "code",
200 | "execution_count": 7,
201 | "id": "aa501b1d",
202 | "metadata": {},
203 | "outputs": [
204 | {
205 | "data": {
265 | "text/plain": [
266 | " Price\n",
267 | "count 9.200000e+02\n",
268 | "mean 6.220654e+05\n",
269 | "std 5.389942e+05\n",
270 | "min 1.750000e+05\n",
271 | "25% 3.500000e+05\n",
272 | "50% 4.670000e+05\n",
273 | "75% 7.000000e+05\n",
274 | "max 5.950000e+06"
275 | ]
276 | },
277 | "execution_count": 7,
278 | "metadata": {},
279 | "output_type": "execute_result"
280 | }
281 | ],
282 | "source": [
283 | "houseprices_data.describe()"
284 | ]
285 | }
286 | ],
287 | "metadata": {
288 | "kernelspec": {
289 | "display_name": "Python 3",
290 | "language": "python",
291 | "name": "python3"
292 | },
293 | "language_info": {
294 | "codemirror_mode": {
295 | "name": "ipython",
296 | "version": 3
297 | },
298 | "file_extension": ".py",
299 | "mimetype": "text/x-python",
300 | "name": "python",
301 | "nbconvert_exporter": "python",
302 | "pygments_lexer": "ipython3",
303 | "version": "3.7.1"
304 | }
305 | },
306 | "nbformat": 4,
307 | "nbformat_minor": 5
308 | }
309 |
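Note on the summary table in the notebook above: `describe` prints large values in scientific notation, which can be awkward to read for prices. A minimal sketch, assuming thousands separators with no decimals are an acceptable display format, maps a format string over the summary column. The `sample` frame is hypothetical and reuses the five prices shown by `head()`.

```python
import pandas as pd

# Hypothetical sample built from the five prices shown by head().
sample = pd.DataFrame(
    {"Price": [685000.0, 475000.0, 850000.0, 580000.0, 720000.0]}
)

# describe() returns count, mean, std, min, quartiles and max per numeric column.
summary = sample.describe()

# Format each statistic with thousands separators for display only;
# the numeric summary itself is left unchanged.
readable = summary["Price"].map("{:,.0f}".format)

print(readable)
```

Keeping the formatted strings in a separate object avoids losing the numeric values needed for later calculations.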
--------------------------------------------------------------------------------
/Ch4/.ipynb_checkpoints/Performing univariate analysis using a Table-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import pandas as pd"
19 | ]
20 | },
21 | {
22 | "cell_type": "markdown",
23 | "id": "a2050fe1",
24 | "metadata": {},
25 | "source": [
26 | "#### Load dataset"
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": 2,
32 | "id": "3a02fac1",
33 | "metadata": {},
34 | "outputs": [],
35 | "source": [
36 | "Houseprices_data = pd.read_csv(\"Data/HousingPricesData.csv\")"
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "id": "e81aa501",
42 | "metadata": {},
43 | "source": [
44 | "#### Subset for relevant columns"
45 | ]
46 | },
47 | {
48 | "cell_type": "code",
49 | "execution_count": 3,
50 | "id": "9279d462",
51 | "metadata": {},
52 | "outputs": [],
53 | "source": [
54 | "Houseprices_data = Houseprices_data[['Zip','Price']]"
55 | ]
56 | },
57 | {
58 | "cell_type": "markdown",
59 | "id": "128c9ecd",
60 | "metadata": {},
61 | "source": [
62 | "#### Inspect first 5 rows and data types of the dataset"
63 | ]
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": 4,
68 | "id": "1f556097",
69 | "metadata": {
70 | "scrolled": true
71 | },
72 | "outputs": [
73 | {
74 | "data": {
128 | "text/plain": [
129 | " Zip Price\n",
130 | "0 1091 CR 685000.0\n",
131 | "1 1059 EL 475000.0\n",
132 | "2 1097 SM 850000.0\n",
133 | "3 1060 TH 580000.0\n",
134 | "4 1036 KN 720000.0"
135 | ]
136 | },
137 | "execution_count": 4,
138 | "metadata": {},
139 | "output_type": "execute_result"
140 | }
141 | ],
142 | "source": [
143 | "Houseprices_data.head()"
144 | ]
145 | },
146 | {
147 | "cell_type": "code",
148 | "execution_count": 5,
149 | "id": "bdf17c4b",
150 | "metadata": {},
151 | "outputs": [
152 | {
153 | "data": {
154 | "text/plain": [
155 | "(924, 2)"
156 | ]
157 | },
158 | "execution_count": 5,
159 | "metadata": {},
160 | "output_type": "execute_result"
161 | }
162 | ],
163 | "source": [
164 | "Houseprices_data.shape"
165 | ]
166 | },
167 | {
168 | "cell_type": "code",
169 | "execution_count": 6,
170 | "id": "b3601b62",
171 | "metadata": {},
172 | "outputs": [
173 | {
174 | "data": {
175 | "text/plain": [
176 | "Zip object\n",
177 | "Price float64\n",
178 | "dtype: object"
179 | ]
180 | },
181 | "execution_count": 6,
182 | "metadata": {},
183 | "output_type": "execute_result"
184 | }
185 | ],
186 | "source": [
187 | "Houseprices_data.dtypes"
188 | ]
189 | },
190 | {
191 | "cell_type": "markdown",
192 | "id": "09bbdf22",
193 | "metadata": {},
194 | "source": [
195 | "#### Create a Summary Table using the describe method in Pandas"
196 | ]
197 | },
198 | {
199 | "cell_type": "code",
200 | "execution_count": 7,
201 | "id": "aa501b1d",
202 | "metadata": {},
203 | "outputs": [
204 | {
205 | "data": {
265 | "text/plain": [
266 | " Price\n",
267 | "count 9.200000e+02\n",
268 | "mean 6.220654e+05\n",
269 | "std 5.389942e+05\n",
270 | "min 1.750000e+05\n",
271 | "25% 3.500000e+05\n",
272 | "50% 4.670000e+05\n",
273 | "75% 7.000000e+05\n",
274 | "max 5.950000e+06"
275 | ]
276 | },
277 | "execution_count": 7,
278 | "metadata": {},
279 | "output_type": "execute_result"
280 | }
281 | ],
282 | "source": [
283 | "Houseprices_data.describe()"
284 | ]
285 | }
286 | ],
287 | "metadata": {
288 | "kernelspec": {
289 | "display_name": "Python 3",
290 | "language": "python",
291 | "name": "python3"
292 | },
293 | "language_info": {
294 | "codemirror_mode": {
295 | "name": "ipython",
296 | "version": 3
297 | },
298 | "file_extension": ".py",
299 | "mimetype": "text/x-python",
300 | "name": "python",
301 | "nbconvert_exporter": "python",
302 | "pygments_lexer": "ipython3",
303 | "version": "3.7.1"
304 | }
305 | },
306 | "nbformat": 4,
307 | "nbformat_minor": 5
308 | }
309 |
--------------------------------------------------------------------------------
/Ch4/4. Performing univariate analysis using a Summary Table.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import pandas as pd"
19 | ]
20 | },
21 | {
22 | "cell_type": "markdown",
23 | "id": "a2050fe1",
24 | "metadata": {},
25 | "source": [
26 | "#### Load dataset"
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": 2,
32 | "id": "3a02fac1",
33 | "metadata": {},
34 | "outputs": [],
35 | "source": [
36 | "houseprices_data = pd.read_csv(\"Data/HousingPricesData.csv\")"
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "id": "e81aa501",
42 | "metadata": {},
43 | "source": [
44 | "#### Subset for relevant columns"
45 | ]
46 | },
47 | {
48 | "cell_type": "code",
49 | "execution_count": 3,
50 | "id": "9279d462",
51 | "metadata": {},
52 | "outputs": [],
53 | "source": [
54 | "houseprices_data = houseprices_data[['Zip','Price']]"
55 | ]
56 | },
57 | {
58 | "cell_type": "markdown",
59 | "id": "128c9ecd",
60 | "metadata": {},
61 | "source": [
62 | "#### Inspect first 5 rows and data types of the dataset"
63 | ]
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": 4,
68 | "id": "1f556097",
69 | "metadata": {
70 | "scrolled": true
71 | },
72 | "outputs": [
73 | {
74 | "data": {
128 | "text/plain": [
129 | " Zip Price\n",
130 | "0 1091 CR 685000.0\n",
131 | "1 1059 EL 475000.0\n",
132 | "2 1097 SM 850000.0\n",
133 | "3 1060 TH 580000.0\n",
134 | "4 1036 KN 720000.0"
135 | ]
136 | },
137 | "execution_count": 4,
138 | "metadata": {},
139 | "output_type": "execute_result"
140 | }
141 | ],
142 | "source": [
143 | "houseprices_data.head()"
144 | ]
145 | },
146 | {
147 | "cell_type": "code",
148 | "execution_count": 5,
149 | "id": "bdf17c4b",
150 | "metadata": {},
151 | "outputs": [
152 | {
153 | "data": {
154 | "text/plain": [
155 | "(924, 2)"
156 | ]
157 | },
158 | "execution_count": 5,
159 | "metadata": {},
160 | "output_type": "execute_result"
161 | }
162 | ],
163 | "source": [
164 | "houseprices_data.shape"
165 | ]
166 | },
167 | {
168 | "cell_type": "code",
169 | "execution_count": 6,
170 | "id": "b3601b62",
171 | "metadata": {},
172 | "outputs": [
173 | {
174 | "data": {
175 | "text/plain": [
176 | "Zip object\n",
177 | "Price float64\n",
178 | "dtype: object"
179 | ]
180 | },
181 | "execution_count": 6,
182 | "metadata": {},
183 | "output_type": "execute_result"
184 | }
185 | ],
186 | "source": [
187 | "houseprices_data.dtypes"
188 | ]
189 | },
190 | {
191 | "cell_type": "markdown",
192 | "id": "09bbdf22",
193 | "metadata": {},
194 | "source": [
195 | "#### Create a summary table using the pandas describe method"
196 | ]
197 | },
198 | {
199 | "cell_type": "code",
200 | "execution_count": 7,
201 | "id": "aa501b1d",
202 | "metadata": {},
203 | "outputs": [
204 | {
205 | "data": {
265 | "text/plain": [
266 | " Price\n",
267 | "count 9.200000e+02\n",
268 | "mean 6.220654e+05\n",
269 | "std 5.389942e+05\n",
270 | "min 1.750000e+05\n",
271 | "25% 3.500000e+05\n",
272 | "50% 4.670000e+05\n",
273 | "75% 7.000000e+05\n",
274 | "max 5.950000e+06"
275 | ]
276 | },
277 | "execution_count": 7,
278 | "metadata": {},
279 | "output_type": "execute_result"
280 | }
281 | ],
282 | "source": [
283 | "houseprices_data.describe()"
284 | ]
285 | }
286 | ],
287 | "metadata": {
288 | "kernelspec": {
289 | "display_name": "Python 3",
290 | "language": "python",
291 | "name": "python3"
292 | },
293 | "language_info": {
294 | "codemirror_mode": {
295 | "name": "ipython",
296 | "version": 3
297 | },
298 | "file_extension": ".py",
299 | "mimetype": "text/x-python",
300 | "name": "python",
301 | "nbconvert_exporter": "python",
302 | "pygments_lexer": "ipython3",
303 | "version": "3.7.1"
304 | }
305 | },
306 | "nbformat": 4,
307 | "nbformat_minor": 5
308 | }
309 |
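Note on the summary table above: `describe()` reports only numeric columns by default, which is why `Zip` is absent, and `count` counts non-null values, so the count of 920 against a shape of 924 rows suggests four missing prices. The sketch below illustrates both behaviours on a small hypothetical stand-in frame (the values are illustrative and are not the contents of `HousingPricesData.csv`); `include="all"` is a standard pandas option that adds the object column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the housing data; one price is deliberately missing.
sample = pd.DataFrame({
    "Zip": ["1091 CR", "1059 EL", "1097 SM", "1060 TH", "1036 KN"],
    "Price": [685000.0, 475000.0, 850000.0, np.nan, 720000.0],
})

# Default behaviour: numeric columns only, NaN excluded from every statistic.
summary = sample.describe()

# include="all" also summarises object columns (count, unique, top, freq).
full_summary = sample.describe(include="all")

print(summary)
print(full_summary)
```

Comparing `summary.loc["count"]` with `len(sample)` is a quick way to spot missing values before interpreting the mean or quartiles.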
--------------------------------------------------------------------------------
/Ch5/.ipynb_checkpoints/2. Creating CrosstabTwo-way table on bivariate data-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import pandas as pd\n",
19 | "import matplotlib.pyplot as plt\n",
20 | "import seaborn as sns"
21 | ]
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "id": "a2050fe1",
26 | "metadata": {},
27 | "source": [
28 | "#### Load dataset"
29 | ]
30 | },
31 | {
32 | "cell_type": "code",
33 | "execution_count": 2,
34 | "id": "3a02fac1",
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "penguins_data = pd.read_csv(\"data/penguins_size.csv\")"
39 | ]
40 | },
41 | {
42 | "cell_type": "markdown",
43 | "id": "e81aa501",
44 | "metadata": {},
45 | "source": [
46 | "#### Subset for relevant columns"
47 | ]
48 | },
49 | {
50 | "cell_type": "code",
51 | "execution_count": 3,
52 | "id": "9279d462",
53 | "metadata": {},
54 | "outputs": [],
55 | "source": [
56 | "penguins_data = penguins_data[['species','culmen_length_mm','sex']]"
57 | ]
58 | },
59 | {
60 | "cell_type": "markdown",
61 | "id": "128c9ecd",
62 | "metadata": {},
63 | "source": [
64 | "#### Inspect the first 5 rows, shape, and data types of the dataset"
65 | ]
66 | },
67 | {
68 | "cell_type": "code",
69 | "execution_count": 4,
70 | "id": "1f556097",
71 | "metadata": {
72 | "scrolled": true
73 | },
74 | "outputs": [
75 | {
76 | "data": {
136 | "text/plain": [
137 | " species culmen_length_mm sex\n",
138 | "0 Adelie 39.1 MALE\n",
139 | "1 Adelie 39.5 FEMALE\n",
140 | "2 Adelie 40.3 FEMALE\n",
141 | "3 Adelie NaN NaN\n",
142 | "4 Adelie 36.7 FEMALE"
143 | ]
144 | },
145 | "execution_count": 4,
146 | "metadata": {},
147 | "output_type": "execute_result"
148 | }
149 | ],
150 | "source": [
151 | "penguins_data.head()"
152 | ]
153 | },
154 | {
155 | "cell_type": "code",
156 | "execution_count": 5,
157 | "id": "bdf17c4b",
158 | "metadata": {},
159 | "outputs": [
160 | {
161 | "data": {
162 | "text/plain": [
163 | "(344, 3)"
164 | ]
165 | },
166 | "execution_count": 5,
167 | "metadata": {},
168 | "output_type": "execute_result"
169 | }
170 | ],
171 | "source": [
172 | "penguins_data.shape"
173 | ]
174 | },
175 | {
176 | "cell_type": "code",
177 | "execution_count": 6,
178 | "id": "b3601b62",
179 | "metadata": {},
180 | "outputs": [
181 | {
182 | "data": {
183 | "text/plain": [
184 | "species object\n",
185 | "culmen_length_mm float64\n",
186 | "sex object\n",
187 | "dtype: object"
188 | ]
189 | },
190 | "execution_count": 6,
191 | "metadata": {},
192 | "output_type": "execute_result"
193 | }
194 | ],
195 | "source": [
196 | "penguins_data.dtypes"
197 | ]
198 | },
199 | {
200 | "cell_type": "markdown",
201 | "id": "09bbdf22",
202 | "metadata": {},
203 | "source": [
204 | "#### Create a Crosstab in Pandas"
205 | ]
206 | },
207 | {
208 | "cell_type": "code",
209 | "execution_count": 7,
210 | "id": "60b95d7c",
211 | "metadata": {
212 | "scrolled": true
213 | },
214 | "outputs": [
215 | {
216 | "data": {
265 | "text/plain": [
266 | "sex FEMALE MALE\n",
267 | "species \n",
268 | "Adelie 73 73\n",
269 | "Chinstrap 34 34\n",
270 | "Gentoo 58 62"
271 | ]
272 | },
273 | "execution_count": 7,
274 | "metadata": {},
275 | "output_type": "execute_result"
276 | }
277 | ],
278 | "source": [
279 | "pd.crosstab(index=penguins_data['species'], columns=penguins_data['sex'])"
280 | ]
281 | }
282 | ],
283 | "metadata": {
284 | "kernelspec": {
285 | "display_name": "Python 3",
286 | "language": "python",
287 | "name": "python3"
288 | },
289 | "language_info": {
290 | "codemirror_mode": {
291 | "name": "ipython",
292 | "version": 3
293 | },
294 | "file_extension": ".py",
295 | "mimetype": "text/x-python",
296 | "name": "python",
297 | "nbconvert_exporter": "python",
298 | "pygments_lexer": "ipython3",
299 | "version": "3.7.1"
300 | }
301 | },
302 | "nbformat": 4,
303 | "nbformat_minor": 5
304 | }
305 |
--------------------------------------------------------------------------------
/Ch5/2. Creating CrosstabTwo-way table on bivariate data.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import pandas as pd\n",
19 | "import matplotlib.pyplot as plt\n",
20 | "import seaborn as sns"
21 | ]
22 | },
23 | {
24 | "cell_type": "markdown",
25 | "id": "a2050fe1",
26 | "metadata": {},
27 | "source": [
28 | "#### Load dataset"
29 | ]
30 | },
31 | {
32 | "cell_type": "code",
33 | "execution_count": 2,
34 | "id": "3a02fac1",
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "penguins_data = pd.read_csv(\"data/penguins_size.csv\")"
39 | ]
40 | },
41 | {
42 | "cell_type": "markdown",
43 | "id": "e81aa501",
44 | "metadata": {},
45 | "source": [
46 | "#### Subset for relevant columns"
47 | ]
48 | },
49 | {
50 | "cell_type": "code",
51 | "execution_count": 3,
52 | "id": "9279d462",
53 | "metadata": {},
54 | "outputs": [],
55 | "source": [
56 | "penguins_data = penguins_data[['species','culmen_length_mm','sex']]"
57 | ]
58 | },
59 | {
60 | "cell_type": "markdown",
61 | "id": "128c9ecd",
62 | "metadata": {},
63 | "source": [
64 | "#### Inspect the first 5 rows, shape, and data types of the dataset"
65 | ]
66 | },
67 | {
68 | "cell_type": "code",
69 | "execution_count": 4,
70 | "id": "1f556097",
71 | "metadata": {
72 | "scrolled": true
73 | },
74 | "outputs": [
75 | {
76 | "data": {
136 | "text/plain": [
137 | " species culmen_length_mm sex\n",
138 | "0 Adelie 39.1 MALE\n",
139 | "1 Adelie 39.5 FEMALE\n",
140 | "2 Adelie 40.3 FEMALE\n",
141 | "3 Adelie NaN NaN\n",
142 | "4 Adelie 36.7 FEMALE"
143 | ]
144 | },
145 | "execution_count": 4,
146 | "metadata": {},
147 | "output_type": "execute_result"
148 | }
149 | ],
150 | "source": [
151 | "penguins_data.head()"
152 | ]
153 | },
154 | {
155 | "cell_type": "code",
156 | "execution_count": 5,
157 | "id": "bdf17c4b",
158 | "metadata": {},
159 | "outputs": [
160 | {
161 | "data": {
162 | "text/plain": [
163 | "(344, 3)"
164 | ]
165 | },
166 | "execution_count": 5,
167 | "metadata": {},
168 | "output_type": "execute_result"
169 | }
170 | ],
171 | "source": [
172 | "penguins_data.shape"
173 | ]
174 | },
175 | {
176 | "cell_type": "code",
177 | "execution_count": 6,
178 | "id": "b3601b62",
179 | "metadata": {},
180 | "outputs": [
181 | {
182 | "data": {
183 | "text/plain": [
184 | "species object\n",
185 | "culmen_length_mm float64\n",
186 | "sex object\n",
187 | "dtype: object"
188 | ]
189 | },
190 | "execution_count": 6,
191 | "metadata": {},
192 | "output_type": "execute_result"
193 | }
194 | ],
195 | "source": [
196 | "penguins_data.dtypes"
197 | ]
198 | },
199 | {
200 | "cell_type": "markdown",
201 | "id": "09bbdf22",
202 | "metadata": {},
203 | "source": [
204 | "#### Create a Crosstab in Pandas"
205 | ]
206 | },
207 | {
208 | "cell_type": "code",
209 | "execution_count": 7,
210 | "id": "60b95d7c",
211 | "metadata": {
212 | "scrolled": true
213 | },
214 | "outputs": [
215 | {
216 | "data": {
265 | "text/plain": [
266 | "sex FEMALE MALE\n",
267 | "species \n",
268 | "Adelie 73 73\n",
269 | "Chinstrap 34 34\n",
270 | "Gentoo 58 62"
271 | ]
272 | },
273 | "execution_count": 7,
274 | "metadata": {},
275 | "output_type": "execute_result"
276 | }
277 | ],
278 | "source": [
279 | "pd.crosstab(index=penguins_data['species'], columns=penguins_data['sex'])"
280 | ]
281 | }
282 | ],
283 | "metadata": {
284 | "kernelspec": {
285 | "display_name": "Python 3",
286 | "language": "python",
287 | "name": "python3"
288 | },
289 | "language_info": {
290 | "codemirror_mode": {
291 | "name": "ipython",
292 | "version": 3
293 | },
294 | "file_extension": ".py",
295 | "mimetype": "text/x-python",
296 | "name": "python",
297 | "nbconvert_exporter": "python",
298 | "pygments_lexer": "ipython3",
299 | "version": "3.7.1"
300 | }
301 | },
302 | "nbformat": 4,
303 | "nbformat_minor": 5
304 | }
305 |
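Note on the crosstab above: the six cells sum to 334 while the frame has 344 rows, which is consistent with `pd.crosstab` leaving out rows whose `sex` is missing (row 3 in the `head()` output is one such row). The sketch below shows that behaviour on a small hypothetical frame, together with two standard `pd.crosstab` options, `margins` and `normalize`; the data values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the penguins data; one sex value is missing.
penguins = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Adelie", "Chinstrap", "Gentoo", "Gentoo"],
    "sex": ["MALE", "FEMALE", np.nan, "FEMALE", "MALE", "MALE"],
})

# Plain two-way frequency table; the row with a missing sex is not counted.
counts = pd.crosstab(index=penguins["species"], columns=penguins["sex"])

# Same table with row and column totals appended under the label "All".
with_totals = pd.crosstab(penguins["species"], penguins["sex"], margins=True)

# Row-wise proportions: each species row sums to 1.
row_share = pd.crosstab(penguins["species"], penguins["sex"], normalize="index")

print(counts)
print(with_totals)
print(row_share)
```

Row-normalised shares are often easier to compare across species than raw counts when group sizes differ.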
--------------------------------------------------------------------------------
/Ch6/Data/website_survey.csv:
--------------------------------------------------------------------------------
1 | user_id,language,platform,gender,age,q1,q2,q3,q4,q5,q6,q7,q8,q9,q10,q11,q12,q13,q14,q15,q16,q17,q18,q19,q20,q21,q22,q23,q24,q25,q26
2 | 080c468b-27c0-455c-aa63-b8f807f2e3d7,en,Desktop,male,34,9,7,6,6,7,7,6,6,5,5,5,3,6,4,5,4,8,4,6,5,6,6,5,2,5,3
3 | 0b0379c7-04db-4c85-84bd-a2bd55329e29,en,Mobile,female,19,10,10,10,9,10,10,10,10,9,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,9,8
4 | 0e623280-b28b-4d4a-8eea-0732f09ed497,en,Mobile,female,19,10,10,10,10,10,10,10,10,10,10,10,10,6,9,7,9,9,8,10,10,10,10,9,9,8,8
5 | 045dc0f3-a730-4d03-a615-f51814e5b04f,en,Mobile,male,21,5,8,5,5,5,5,5,6,6,8,7,8,10,9,5,7,7,9,10,8,8,10,10,8,10,6
6 | 092f2ee7-5281-4a09-9bce-e5523b95b53b,en,null,female,53,9,10,9,10,9,7,8,5,7,8,7,8,10,9,8,7,7,8,8,8,9,9,10,10,10,10
7 | 19c0851c-cbb1-45b5-9e78-884b761802ac,en,Mobile,female,53,10,9,9,9,5,6,8,7,4,5,5,5,4,4,8,5,8,7,7,7,8,6,10,10,10,10
8 | 1a386158-d08c-40b1-9365-e758927d3352,en,Desktop,male,18,8,7,7,6,6,6,7,7,7,6,6,6,5,6,7,6,7,7,7,6,8,7,5,6,8,7
9 | 78e234a7-c15a-49e5-b38c-7fb47b294ca8,en,Mobile,female,20,9,5,10,6,8,10,9,10,7,6,10,10,9,10,10,9,10,9,9,9,10,10,2,8,8,7
10 | 1331586c-ee72-4fc1-b622-812f54a49f6c,en,Mobile,female,19,8,8,10,9,9,8,9,9,9,5,7,7,8,8,9,7,9,9,9,9,9,9,9,9,9,9
11 | 35913db6-2e1e-4028-a631-473241be215c,en,Mobile,male,19,7,8,4,7,8,9,8,9,7,9,8,8,7,8,8,8,7,8,6,6,7,6,7,6,6,5
12 | 31d15d10-ae4f-4527-9aba-c0f93d329cb2,en,Mobile,female,13,9,9,10,8,10,9,8,9,10,6,9,9,9,9,10,8,10,9,9,9,9,8,9,10,9,8
13 | 1e41998a-cda9-4b44-aa34-8bcde6752d0a,en,Mobile,male,18,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10
14 | 7c390c81-d248-4099-9abc-7b787be98b47,en,Mobile,female,19,10,10,9,8,8,8,7,9,9,9,9,9,9,9,10,9,10,9,9,8,9,8,10,8,7,10
15 | 5f294260-b679-434f-8818-e09ecd02ae70,en,Mobile,female,34,10,8,10,6,10,8,10,9,10,7,6,5,8,9,10,6,10,8,7,8,9,9,10,10,9,9
16 | 76dad81c-1a4a-4153-9932-f03188b1ae25,en,Desktop,female,29,10,10,5,8,9,9,10,9,7,3,10,5,7,10,10,6,8,8,9,9,10,10,10,10,10,10
17 | 3cd5d4f2-97f1-418c-bc49-ff0c2a58c855,en,Mobile,male,19,10,10,10,10,10,3,10,2,7,3,8,5,8,8,8,9,9,10,8,8,7,8,9,8,8,8
18 | 10e9b324-156e-4eb5-aec5-04d8a4d7124c,en,null,male,46,4,4,4,4,4,3,3,3,3,3,4,3,6,4,2,2,2,3,1,1,3,1,4,4,1,1
19 | 20ba7dbd-f51c-49ff-9bf2-a777cd6d13f8,es,Mobile,female,44,9,7,7,7,5,8,9,8,6,6,8,7,7,6,8,7,7,5,6,7,8,8,7,7,7,7
20 | 6167cd49-a021-4f3f-b29b-a7ed48b9d94f,en,Desktop,female,21,8,9,9,9,9,8,7,8,8,7,8,8,7,9,8,9,9,9,9,9,9,9,7,8,8,8
21 | 824afe21-5c5b-4e61-a556-75bef43196e9,en,Mobile,male,20,7,7,8,8,7,7,10,7,7,7,8,8,8,8,8,8,7,9,10,7,8,8,7,7,10,7
22 | 8330a755-ac13-444b-90ca-7ddc4f4f3992,en,Mobile,male,20,9,9,10,9,8,9,8,10,9,7,8,9,10,9,8,9,9,8,8,10,10,9,8,8,10,10
23 | 6ee3efea-f12c-4740-937a-62f34a53e2ec,en,Mobile,female,20,10,9,8,7,8,10,9,9,7,6,10,7,9,10,8,9,10,7,9,8,9,8,8,10,9,9
24 | 3931df86-6c91-45c8-ae55-dd3a1036647f,en,Mobile,male,18,3,6,7,9,7,3,1,8,8,8,8,8,6,8,8,8,9,8,10,8,8,8,9,8,8,8
25 | 60b7d252-ffba-432f-a243-8b0d5f81331d,en,Mobile,female,2,9,9,9,9,8,9,8,9,9,9,9,8,8,8,9,9,9,8,9,9,9,9,9,9,8,8
26 | 75b63c62-9473-4ddc-979f-66b8de0a3683,en,Mobile,male,21,8,7,9,5,5,4,5,5,4,5,4,5,6,5,5,5,6,4,3,3,2,1,1,2,3,4
27 | 518a2693-4f58-463e-ac2c-c5ce06d3c9fb,en,Mobile,male,27,6,6,7,6,8,7,7,9,8,8,7,8,8,7,8,8,8,7,10,7,8,8,10,8,5,8
28 | 8ebeb6f9-b48c-4067-a307-1da81fde67da,en,Mobile,female,18,10,9,10,8,8,9,10,10,10,8,10,6,7,9,10,10,10,9,7,7,10,10,7,7,8,6
29 | 7e80110b-c26c-43b1-8bc2-2515642420ad,en,Mobile,female,45,8,8,8,8,7,8,7,8,7,8,7,7,7,7,8,7,8,7,9,9,8,8,8,8,8,8
30 | 702da1cd-9691-4f0b-9ab9-020f9454d86b,en,Mobile,female,51,9,8,8,8,9,7,6,7,5,7,6,7,8,9,8,8,8,8,8,8,8,8,8,8,8,7
31 | 25602500-ca31-4404-b155-3e070652c2c0,en,null,other,1,7,4,7,6,6,7,6,5,5,8,5,6,6,7,6,5,3,4,5,6,4,5,6,6,6,5
32 | 11cbc3a6-1c92-4f10-abe6-27def200d91d,en,null,male,25,10,10,10,10,10,10,9,9,3,2,7,7,5,5,10,5,9,5,5,5,8,6,6,4,4,5
33 | 803144dd-8bb5-435e-94af-99bace5baf35,en,Mobile,male,21,8,7,9,4,7,9,7,9,9,7,7,8,9,7,5,9,8,8,5,8,5,9,9,9,8,8
34 | 780a59cc-70e6-4148-94c9-47bd1744cd5c,en,Mobile,male,18,6,8,8,4,7,9,9,8,10,10,6,9,10,7,8,4,10,10,7,9,9,9,7,7,6,6
35 | 3cd027c4-c9cf-475c-a678-769abe7e816a,en,Mobile,male,21,9,10,9,9,8,9,9,10,9,9,8,9,10,9,10,10,10,10,8,8,10,10,10,10,9,9
36 | 902d976c-f400-44ab-a0ac-198d94ba035a,en,Mobile,male,37,6,8,6,7,8,7,8,8,8,7,8,9,7,7,5,5,8,8,7,6,6,7,8,6,7,9
37 | 3d551ae7-9275-41b2-8270-260d730a6222,en,Desktop,male,38,5,5,4,7,6,7,6,6,6,3,5,3,6,5,5,5,5,7,5,5,3,5,3,3,3,1
38 | 42798e5c-9aa2-4efd-a370-5635018d4d81,en,Mobile,male,18,10,7,6,8,8,8,8,7,9,7,8,7,7,7,7,7,6,10,8,7,8,9,6,8,10,9
39 | c65d2c31-227d-4750-af1a-1bfd6902d881,en,Mobile,female,45,7,8,10,6,6,6,6,7,10,7,8,6,6,10,8,8,9,7,8,9,8,9,7,7,6,6
40 | 4b737514-74b7-433a-9dde-ea224a6deeb9,en,Mobile,male,21,8,7,2,3,8,10,10,5,5,9,10,10,10,7,10,10,10,5,10,10,1,10,10,10,10,10
41 | 40d5e94f-2468-4f73-a0a0-e5a2c0443f06,en,null,other,1,6,9,6,6,6,5,6,6,6,5,7,5,6,7,4,6,7,5,6,6,6,7,7,6,6,8
42 | 95897683-0e62-4f31-8575-77136b4ebebe,en,Mobile,male,20,9,9,9,9,7,2,2,3,3,3,1,1,1,7,1,1,1,8,1,1,1,1,1,7,3,3
43 | 6cd79452-c315-4dbe-93af-8c2077a67a56,en,Mobile,female,26,9,9,5,10,9,9,7,9,5,8,8,10,10,10,10,8,10,7,10,9,8,10,10,8,8,8
44 | 334df000-99c4-4ccd-b5b5-4b8cabdad476,en,Desktop,male,18,8,6,5,4,5,8,8,5,3,6,5,5,9,9,9,5,8,4,6,4,4,7,9,8,3,8
45 | 1537bb2d-2d89-4d66-b6c7-f3659b85403f,en,Mobile,male,20,8,9,9,7,8,6,10,8,9,7,8,7,10,9,8,9,9,10,10,10,8,9,10,9,10,10
46 | b2a12ed0-6436-4cbc-b70d-d0a7ba0e0981,en,Desktop,male,19,10,10,10,10,7,8,8,10,10,10,8,10,10,10,10,10,10,10,10,10,10,10,10,10,8,8
47 | e03a0573-ee71-4c91-a8ca-0b1eb82c1300,en,Desktop,male,37,9,8,9,9,9,7,7,8,2,4,5,5,7,6,7,6,7,6,7,5,1,3,7,7,7,3
48 | ba818a91-0bf1-47d4-bc93-e5befe1c26e6,en,Mobile,male,22,8,9,9,7,8,9,9,8,8,5,8,8,8,8,7,9,8,7,9,9,9,8,8,8,8,8
49 | d4a66c13-cd50-4171-8849-d52f0898a156,en,Mobile,female,20,7,7,8,7,7,7,7,10,7,6,6,6,6,6,6,6,6,10,6,7,7,7,7,7,8,8
50 | ca6bae45-d4f7-4d52-bd43-e4d3542a6f93,en,Mobile,male,21,8,8,9,8,8,3,8,5,5,3,7,6,6,6,6,6,9,7,7,7,7,8,5,6,7,5
51 | e5eccf62-17fb-4353-b886-e547a9699dd4,en,null,male,45,8,8,9,10,5,8,8,8,1,5,10,1,10,9,1,8,10,9,9,9,9,10,8,10,10,9
52 | f98ab946-62e2-4cb1-97a4-3301ccccba1b,en,Mobile,female,19,1,3,2,3,2,3,1,2,8,2,3,8,1,1,2,9,1,1,2,2,1,1,2,2,3,3
53 | cb553655-ef6a-4f28-8271-131cdc9fc2ec,en,Mobile,male,40,9,10,7,7,8,8,6,8,7,7,9,9,6,7,9,3,5,5,1,8,8,8,5,1,9,9
54 | ea3518ef-6655-4a33-bf76-337e21de5689,en,Mobile,female,46,10,9,9,9,9,9,6,8,7,4,5,4,7,7,7,6,8,5,8,8,8,8,9,8,8,7
55 | f9cc1fc5-b439-4f4b-8257-0e5b2f47eb02,es,Desktop,male,55,8,9,8,7,8,9,9,7,7,5,5,3,5,5,9,5,10,5,6,6,8,7,6,9,8,8
56 | ca55018f-21c4-41c2-b38f-b966fc3a8075,en,Mobile,male,19,9,8,9,9,9,9,9,7,8,6,8,10,7,9,8,7,7,9,7,7,10,8,9,8,8,10
57 | a663330e-6e40-4813-b6f7-232aba1299eb,en,Mobile,male,19,6,8,9,10,9,10,7,8,9,8,7,10,10,10,10,10,10,7,8,9,8,10,10,7,10,8
58 | be2f133c-eed1-4582-b8b8-d02aa1e594ad,en,Mobile,male,21,8,8,6,6,6,6,8,9,8,5,7,8,8,9,9,7,7,9,9,9,9,9,8,8,8,8
59 | c9301569-631c-4d4d-aa30-2a65a5873029,en,Mobile,male,45,9,8,8,8,7,6,8,7,7,4,6,6,7,7,8,6,8,7,8,9,9,8,6,6,7,7
60 | bac1cde8-4780-4edb-8062-46b1286d3f47,en,Mobile,male,19,10,10,8,8,8,9,8,7,8,10,7,9,7,9,7,9,7,10,7,7,9,8,10,7,9,9
61 | ba4eba66-ab7c-4704-84e8-4218f019f196,en,null,male,23,10,10,10,10,6,8,7,7,6,5,7,5,8,8,8,6,8,5,8,8,9,9,8,8,8,6
62 | d0d86600-3649-4760-bb23-89f353e7d20b,en,Mobile,female,19,4,2,5,3,6,8,8,7,5,5,4,7,9,9,5,6,9,9,3,3,5,8,10,10,10,9
63 | 4f914c97-7283-4ed0-94f7-43bcc99e9e28,en,Mobile,male,21,7,6,3,5,7,8,6,6,6,3,4,4,5,5,6,5,7,6,4,4,9,7,5,5,4,3
64 | 9d832240-0310-4657-90c0-20525d24d766,en,Mobile,male,21,9,8,6,6,8,9,8,7,9,9,7,8,8,7,8,7,9,9,10,9,10,7,9,7,9,8
65 | f6ae26b4-f319-4305-a6ca-9cdd8528695d,es,Desktop,male,18,9,9,9,9,10,10,9,10,8,4,4,4,9,7,8,4,10,4,8,8,7,9,9,9,9,9
66 | f038ad7e-db1f-430f-8974-12c83a20eb8e,en,Mobile,male,20,10,9,9,9,8,9,9,9,10,10,10,9,9,10,9,9,10,10,10,10,10,10,10,10,10,10
67 | f3b05d57-f280-41fc-a32d-d94647bade9a,en,Mobile,female,18,1,5,8,6,7,7,8,8,7,10,6,8,8,10,8,8,8,5,7,4,7,6,5,8,9,7
68 | 13e305ec-6c39-4b60-9cf1-5ddeae54b677,en,Mobile,male,20,7,8,8,9,7,9,8,8,7,6,7,6,8,8,7,7,8,7,8,9,7,8,7,6,5,9
69 | e0ffda50-97e7-4872-a467-cc143cbac8fc,en,Mobile,female,12,10,6,10,6,10,10,9,8,9,6,9,10,8,8,9,9,10,8,8,9,9,9,10,10,10,8
70 | cb71d11a-4e8d-4ab8-80be-ac1f46f5b2d1,en,null,male,1,9,9,9,9,7,7,8,8,3,5,6,5,9,8,6,6,9,6,7,8,8,9,5,5,7,5
71 | cd3dad39-da3f-4ada-9e09-a9832ca7bdd7,en,Mobile,male,21,8,9,10,9,8,8,7,9,7,9,8,7,8,5,9,6,10,8,10,9,9,8,6,7,8,6
72 | f3b94dcb-2dde-48d7-968d-606fcddd692d,en,Desktop,male,46,10,6,10,6,6,6,6,6,2,1,1,1,5,7,7,1,5,8,6,5,8,6,1,1,1,1
73 | ddbc5616-edb8-4052-9186-6e105e4491f6,en,Mobile,male,20,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10
74 | a7788435-7465-4f4d-9f4f-d412db68b295,en,Mobile,female,18,1,1,1,2,2,9,9,7,10,6,6,3,6,6,8,7,9,9,8,7,5,7,10,10,6,10
75 |
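Note on the CSV above: several rows carry the literal string `null` in the `platform` column. By default `pd.read_csv` treats `null` as a missing-value marker, so those cells load as NaN. The sketch below checks this on three rows copied from the file, truncated to the first seven columns to keep the example short; the variable names are illustrative.

```python
import io

import pandas as pd

# Three rows from website_survey.csv, truncated to the first seven columns.
csv_text = """user_id,language,platform,gender,age,q1,q2
080c468b-27c0-455c-aa63-b8f807f2e3d7,en,Desktop,male,34,9,7
092f2ee7-5281-4a09-9bce-e5523b95b53b,en,null,female,53,9,10
20ba7dbd-f51c-49ff-9bf2-a777cd6d13f8,es,Mobile,female,44,9,7
"""

survey = pd.read_csv(io.StringIO(csv_text))

# "null" is parsed as NaN, so it shows up in isna() rather than as a category.
missing_platform = int(survey["platform"].isna().sum())

# dropna=False keeps the missing group visible in the frequency table.
platform_counts = survey["platform"].value_counts(dropna=False)

print(missing_platform)
print(platform_counts)
```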
--------------------------------------------------------------------------------
/Ch7/.ipynb_checkpoints/6. Performing Stationarity checks on Time series data-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import numpy as np\n",
19 | "import pandas as pd\n",
20 | "import matplotlib.pyplot as plt\n",
21 | "import seaborn as sns\n",
22 | "from statsmodels.tsa.stattools import adfuller"
23 | ]
24 | },
25 | {
26 | "cell_type": "markdown",
27 | "id": "a2050fe1",
28 | "metadata": {},
29 | "source": [
30 | "#### Load dataset"
31 | ]
32 | },
33 | {
34 | "cell_type": "code",
35 | "execution_count": 2,
36 | "id": "3a02fac1",
37 | "metadata": {},
38 | "outputs": [],
39 | "source": [
40 | "air_traffic_data = pd.read_csv(\"data/SF_Air_Traffic_Passenger_Statistics_Transformed.csv\")"
41 | ]
42 | },
43 | {
44 | "cell_type": "markdown",
45 | "id": "128c9ecd",
46 | "metadata": {},
47 | "source": [
48 | "#### Inspect the first 5 rows, shape, and data types of the dataset"
49 | ]
50 | },
51 | {
52 | "cell_type": "code",
53 | "execution_count": 3,
54 | "id": "1f556097",
55 | "metadata": {
56 | "scrolled": true
57 | },
58 | "outputs": [
59 | {
60 | "data": {
114 | "text/plain": [
115 | " Date Total Passenger Count\n",
116 | "0 200601 2448889\n",
117 | "1 200602 2223024\n",
118 | "2 200603 2708778\n",
119 | "3 200604 2773293\n",
120 | "4 200605 2829000"
121 | ]
122 | },
123 | "execution_count": 3,
124 | "metadata": {},
125 | "output_type": "execute_result"
126 | }
127 | ],
128 | "source": [
129 | "air_traffic_data.head()"
130 | ]
131 | },
132 | {
133 | "cell_type": "code",
134 | "execution_count": 4,
135 | "id": "bdf17c4b",
136 | "metadata": {},
137 | "outputs": [
138 | {
139 | "data": {
140 | "text/plain": [
141 | "(132, 2)"
142 | ]
143 | },
144 | "execution_count": 4,
145 | "metadata": {},
146 | "output_type": "execute_result"
147 | }
148 | ],
149 | "source": [
150 | "air_traffic_data.shape"
151 | ]
152 | },
153 | {
154 | "cell_type": "code",
155 | "execution_count": 5,
156 | "id": "b3601b62",
157 | "metadata": {},
158 | "outputs": [
159 | {
160 | "data": {
161 | "text/plain": [
162 | "Date int64\n",
163 | "Total Passenger Count int64\n",
164 | "dtype: object"
165 | ]
166 | },
167 | "execution_count": 5,
168 | "metadata": {},
169 | "output_type": "execute_result"
170 | }
171 | ],
172 | "source": [
173 | "air_traffic_data.dtypes"
174 | ]
175 | },
176 | {
177 | "cell_type": "markdown",
178 | "id": "32043140",
179 | "metadata": {},
180 | "source": [
181 | "#### Convert the integer Date column to datetime"
182 | ]
183 | },
184 | {
185 | "cell_type": "code",
186 | "execution_count": 6,
187 | "id": "592fcb98",
188 | "metadata": {},
189 | "outputs": [],
190 | "source": [
191 | "air_traffic_data['Date'] = pd.to_datetime(air_traffic_data['Date'], format=\"%Y%m\")"
192 | ]
193 | },
194 | {
195 | "cell_type": "code",
196 | "execution_count": 7,
197 | "id": "0a562145",
198 | "metadata": {},
199 | "outputs": [
200 | {
201 | "data": {
202 | "text/plain": [
203 | "Date datetime64[ns]\n",
204 | "Total Passenger Count int64\n",
205 | "dtype: object"
206 | ]
207 | },
208 | "execution_count": 7,
209 | "metadata": {},
210 | "output_type": "execute_result"
211 | }
212 | ],
213 | "source": [
214 | "air_traffic_data.dtypes"
215 | ]
216 | },
217 | {
218 | "cell_type": "markdown",
219 | "id": "f525a931",
220 | "metadata": {},
221 | "source": [
222 | "#### Set date as index"
223 | ]
224 | },
225 | {
226 | "cell_type": "code",
227 | "execution_count": 8,
228 | "id": "8cdc6f00",
229 | "metadata": {},
230 | "outputs": [
231 | {
232 | "data": {
233 | "text/plain": [
234 | "(132, 1)"
235 | ]
236 | },
237 | "execution_count": 8,
238 | "metadata": {},
239 | "output_type": "execute_result"
240 | }
241 | ],
242 | "source": [
243 | "air_traffic_data.set_index('Date', inplace=True)\n",
244 | "air_traffic_data.shape"
245 | ]
246 | },
247 | {
248 | "cell_type": "markdown",
249 | "id": "60de4cbf",
250 | "metadata": {},
251 | "source": [
252 | "#### Check Stationarity"
253 | ]
254 | },
255 | {
256 | "cell_type": "code",
257 | "execution_count": 9,
258 | "id": "d0be68f5",
259 | "metadata": {},
260 | "outputs": [
261 | {
262 | "data": {
263 | "text/plain": [
264 | "(0.7015289287377346,\n",
265 | " 0.9898683326442054,\n",
266 | " 13,\n",
267 | " 118,\n",
268 | " {'1%': -3.4870216863700767,\n",
269 | " '5%': -2.8863625166643136,\n",
270 | " '10%': -2.580009026141913},\n",
271 | " 3039.0876643475)"
272 | ]
273 | },
274 | "execution_count": 9,
275 | "metadata": {},
276 | "output_type": "execute_result"
277 | }
278 | ],
279 | "source": [
280 | "adf_result = adfuller(air_traffic_data)\n",
281 | "adf_result"
282 | ]
283 | },
284 | {
285 | "cell_type": "code",
286 | "execution_count": 10,
287 | "id": "46f1e971",
288 | "metadata": {},
289 | "outputs": [
290 | {
291 | "name": "stdout",
292 | "output_type": "stream",
293 | "text": [
294 | "ADF Test Statistic: 0.701529\n",
295 | "p-value: 0.989868\n",
296 | "Critical Values:\n",
297 | "{'1%': -3.4870216863700767, '5%': -2.8863625166643136, '10%': -2.580009026141913}\n",
298 | "Failed to Reject Null Hypothesis - Time Series is Non-Stationary\n"
299 | ]
300 | }
301 | ],
302 | "source": [
303 | "print('ADF Test Statistic: %f' % adf_result[0])\n",
304 | "\n",
305 | "print('p-value: %f' % adf_result[1])\n",
306 | "\n",
307 | "print('Critical Values:')\n",
308 | "\n",
309 | "print(adf_result[4])\n",
310 | "\n",
311 | "if adf_result[0] < adf_result[4][\"5%\"]:\n",
312 | "    print(\"Reject Null Hypothesis - Time Series is Stationary\")\n",
313 | "else:\n",
314 | "    print(\"Failed to Reject Null Hypothesis - Time Series is Non-Stationary\")"
315 | ]
316 | }
317 | ],
318 | "metadata": {
319 | "kernelspec": {
320 | "display_name": "Python 3",
321 | "language": "python",
322 | "name": "python3"
323 | },
324 | "language_info": {
325 | "codemirror_mode": {
326 | "name": "ipython",
327 | "version": 3
328 | },
329 | "file_extension": ".py",
330 | "mimetype": "text/x-python",
331 | "name": "python",
332 | "nbconvert_exporter": "python",
333 | "pygments_lexer": "ipython3",
334 | "version": "3.7.1"
335 | }
336 | },
337 | "nbformat": 4,
338 | "nbformat_minor": 5
339 | }
340 |
--------------------------------------------------------------------------------
/Ch7/6. Performing Stationarity checks on Time series data.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "2dbc33b1",
6 | "metadata": {},
7 | "source": [
8 | "#### Import relevant libraries"
9 | ]
10 | },
11 | {
12 | "cell_type": "code",
13 | "execution_count": 1,
14 | "id": "0ae67507",
15 | "metadata": {},
16 | "outputs": [],
17 | "source": [
18 | "import numpy as np\n",
19 | "import pandas as pd\n",
20 | "import matplotlib.pyplot as plt\n",
21 | "import seaborn as sns\n",
22 | "from statsmodels.tsa.stattools import adfuller"
23 | ]
24 | },
25 | {
26 | "cell_type": "markdown",
27 | "id": "a2050fe1",
28 | "metadata": {},
29 | "source": [
30 | "#### Load dataset"
31 | ]
32 | },
33 | {
34 | "cell_type": "code",
35 | "execution_count": 2,
36 | "id": "3a02fac1",
37 | "metadata": {},
38 | "outputs": [],
39 | "source": [
40 | "air_traffic_data = pd.read_csv(\"data/SF_Air_Traffic_Passenger_Statistics_Transformed.csv\")"
41 | ]
42 | },
43 | {
44 | "cell_type": "markdown",
45 | "id": "128c9ecd",
46 | "metadata": {},
47 | "source": [
48 | "#### Inspect the first 5 rows, shape, and data types of the dataset"
49 | ]
50 | },
51 | {
52 | "cell_type": "code",
53 | "execution_count": 3,
54 | "id": "1f556097",
55 | "metadata": {
56 | "scrolled": true
57 | },
58 | "outputs": [
59 | {
60 | "data": {
114 | "text/plain": [
115 | " Date Total Passenger Count\n",
116 | "0 200601 2448889\n",
117 | "1 200602 2223024\n",
118 | "2 200603 2708778\n",
119 | "3 200604 2773293\n",
120 | "4 200605 2829000"
121 | ]
122 | },
123 | "execution_count": 3,
124 | "metadata": {},
125 | "output_type": "execute_result"
126 | }
127 | ],
128 | "source": [
129 | "air_traffic_data.head()"
130 | ]
131 | },
132 | {
133 | "cell_type": "code",
134 | "execution_count": 4,
135 | "id": "bdf17c4b",
136 | "metadata": {},
137 | "outputs": [
138 | {
139 | "data": {
140 | "text/plain": [
141 | "(132, 2)"
142 | ]
143 | },
144 | "execution_count": 4,
145 | "metadata": {},
146 | "output_type": "execute_result"
147 | }
148 | ],
149 | "source": [
150 | "air_traffic_data.shape"
151 | ]
152 | },
153 | {
154 | "cell_type": "code",
155 | "execution_count": 5,
156 | "id": "b3601b62",
157 | "metadata": {},
158 | "outputs": [
159 | {
160 | "data": {
161 | "text/plain": [
162 | "Date int64\n",
163 | "Total Passenger Count int64\n",
164 | "dtype: object"
165 | ]
166 | },
167 | "execution_count": 5,
168 | "metadata": {},
169 | "output_type": "execute_result"
170 | }
171 | ],
172 | "source": [
173 | "air_traffic_data.dtypes"
174 | ]
175 | },
176 | {
177 | "cell_type": "markdown",
178 | "id": "32043140",
179 | "metadata": {},
180 | "source": [
181 | "#### Transform date int to date"
182 | ]
183 | },
184 | {
185 | "cell_type": "code",
186 | "execution_count": 6,
187 | "id": "592fcb98",
188 | "metadata": {},
189 | "outputs": [],
190 | "source": [
191 | "air_traffic_data['Date']= pd.to_datetime(air_traffic_data['Date'], format = \"%Y%m\")"
192 | ]
193 | },
194 | {
195 | "cell_type": "code",
196 | "execution_count": 7,
197 | "id": "0a562145",
198 | "metadata": {},
199 | "outputs": [
200 | {
201 | "data": {
202 | "text/plain": [
203 | "Date datetime64[ns]\n",
204 | "Total Passenger Count int64\n",
205 | "dtype: object"
206 | ]
207 | },
208 | "execution_count": 7,
209 | "metadata": {},
210 | "output_type": "execute_result"
211 | }
212 | ],
213 | "source": [
214 | "air_traffic_data.dtypes"
215 | ]
216 | },
217 | {
218 | "cell_type": "markdown",
219 | "id": "f525a931",
220 | "metadata": {},
221 | "source": [
222 | "#### Set date as index"
223 | ]
224 | },
225 | {
226 | "cell_type": "code",
227 | "execution_count": 8,
228 | "id": "8cdc6f00",
229 | "metadata": {},
230 | "outputs": [
231 | {
232 | "data": {
233 | "text/plain": [
234 | "(132, 1)"
235 | ]
236 | },
237 | "execution_count": 8,
238 | "metadata": {},
239 | "output_type": "execute_result"
240 | }
241 | ],
242 | "source": [
243 | "air_traffic_data.set_index('Date',inplace = True)\n",
244 | "air_traffic_data.shape"
245 | ]
246 | },
247 | {
248 | "cell_type": "markdown",
249 | "id": "60de4cbf",
250 | "metadata": {},
251 | "source": [
252 | "#### Check Stationarity"
253 | ]
254 | },
255 | {
256 | "cell_type": "code",
257 | "execution_count": 9,
258 | "id": "d0be68f5",
259 | "metadata": {},
260 | "outputs": [
261 | {
262 | "data": {
263 | "text/plain": [
264 | "(0.7015289287377346,\n",
265 | " 0.9898683326442054,\n",
266 | " 13,\n",
267 | " 118,\n",
268 | " {'1%': -3.4870216863700767,\n",
269 | " '5%': -2.8863625166643136,\n",
270 | " '10%': -2.580009026141913},\n",
271 | " 3039.0876643475)"
272 | ]
273 | },
274 | "execution_count": 9,
275 | "metadata": {},
276 | "output_type": "execute_result"
277 | }
278 | ],
279 | "source": [
280 | "adf_result = adfuller(air_traffic_data)\n",
281 | "adf_result"
282 | ]
283 | },
284 | {
285 | "cell_type": "code",
286 | "execution_count": 10,
287 | "id": "46f1e971",
288 | "metadata": {},
289 | "outputs": [
290 | {
291 | "name": "stdout",
292 | "output_type": "stream",
293 | "text": [
294 | "ADF Test Statistic: 0.701529\n",
295 | "p-value: 0.989868\n",
296 | "Critical Values:\n",
297 | "{'1%': -3.4870216863700767, '5%': -2.8863625166643136, '10%': -2.580009026141913}\n",
298 | "Failed to Reject Null Hypothesis - Time Series is Non-Stationary\n"
299 | ]
300 | }
301 | ],
302 | "source": [
303 | "print('ADF Test Statistic: %f' % adf_result[0])\n",
304 | "\n",
305 | "print('p-value: %f' % adf_result[1])\n",
306 | "\n",
307 | "print('Critical Values:')\n",
308 | "\n",
309 | "print(adf_result[4])\n",
310 | "\n",
311 | "if adf_result[0] < adf_result[4][\"5%\"]:\n",
312 | " print (\"Reject Null Hypothesis - Time Series is Stationary\")\n",
313 | "else:\n",
314 | " print (\"Failed to Reject Null Hypothesis - Time Series is Non-Stationary\")"
315 | ]
316 | }
317 | ],
318 | "metadata": {
319 | "kernelspec": {
320 | "display_name": "Python 3",
321 | "language": "python",
322 | "name": "python3"
323 | },
324 | "language_info": {
325 | "codemirror_mode": {
326 | "name": "ipython",
327 | "version": 3
328 | },
329 | "file_extension": ".py",
330 | "mimetype": "text/x-python",
331 | "name": "python",
332 | "nbconvert_exporter": "python",
333 | "pygments_lexer": "ipython3",
334 | "version": "3.7.1"
335 | }
336 | },
337 | "nbformat": 4,
338 | "nbformat_minor": 5
339 | }
340 |
--------------------------------------------------------------------------------
/Ch7/Data/DataSF_Data_Dictionary_for_Air_Traffic_Passenger_Statistics.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PacktPublishing/Exploratory-Data-Analysis-with-Python-Cookbook/2ab376175c0cf8bc368c986484e862fa3a5ba319/Ch7/Data/DataSF_Data_Dictionary_for_Air_Traffic_Passenger_Statistics.pdf
--------------------------------------------------------------------------------
/Ch7/Data/SF_Air_Traffic_Passenger_Statistics_Transformed.csv:
--------------------------------------------------------------------------------
1 | Date,Total Passenger Count
2 | 200601,2448889
3 | 200602,2223024
4 | 200603,2708778
5 | 200604,2773293
6 | 200605,2829000
7 | 200606,3071396
8 | 200607,3227605
9 | 200608,3143839
10 | 200609,2720100
11 | 200610,2834959
12 | 200611,2653887
13 | 200612,2698200
14 | 200701,2507430
15 | 200702,2304990
16 | 200703,2820085
17 | 200704,2869247
18 | 200705,3056934
19 | 200706,3263621
20 | 200707,3382382
21 | 200708,3436417
22 | 200709,2957530
23 | 200710,3129309
24 | 200711,2922500
25 | 200712,2903637
26 | 200801,2670053
27 | 200802,2595676
28 | 200803,3127387
29 | 200804,3029021
30 | 200805,3305954
31 | 200806,3453751
32 | 200807,3603946
33 | 200808,3612297
34 | 200809,3004720
35 | 200810,3124451
36 | 200811,2744485
37 | 200812,2962937
38 | 200901,2644539
39 | 200902,2359800
40 | 200903,2925918
41 | 200904,3024973
42 | 200905,3177100
43 | 200906,3419595
44 | 200907,3649702
45 | 200908,3650668
46 | 200909,3191526
47 | 200910,3249428
48 | 200911,2971484
49 | 200912,3074209
50 | 201001,2785466
51 | 201002,2515361
52 | 201003,3105958
53 | 201004,3139059
54 | 201005,3380355
55 | 201006,3612886
56 | 201007,3765824
57 | 201008,3771842
58 | 201009,3356365
59 | 201010,3490100
60 | 201011,3163659
61 | 201012,3167124
62 | 201101,2883810
63 | 201102,2610667
64 | 201103,3129205
65 | 201104,3200527
66 | 201105,3547804
67 | 201106,3766323
68 | 201107,3935589
69 | 201108,3917884
70 | 201109,3564970
71 | 201110,3602455
72 | 201111,3326859
73 | 201112,3441693
74 | 201201,3211600
75 | 201202,2998119
76 | 201203,3472440
77 | 201204,3563007
78 | 201205,3820570
79 | 201206,4107195
80 | 201207,4284443
81 | 201208,4356216
82 | 201209,3819379
83 | 201210,3844987
84 | 201211,3478890
85 | 201212,3443039
86 | 201301,3204637
87 | 201302,2966477
88 | 201303,3593364
89 | 201304,3604104
90 | 201305,3933016
91 | 201306,4146797
92 | 201307,4176486
93 | 201308,4347059
94 | 201309,3781168
95 | 201310,3910790
96 | 201311,3466878
97 | 201312,3814984
98 | 201401,3432625
99 | 201402,3078405
100 | 201403,3765504
101 | 201404,3881893
102 | 201405,4147096
103 | 201406,4321833
104 | 201407,4499221
105 | 201408,4524918
106 | 201409,3919072
107 | 201410,4059443
108 | 201411,3628786
109 | 201412,3855835
110 | 201501,3550084
111 | 201502,3248144
112 | 201503,4001521
113 | 201504,4021677
114 | 201505,4361140
115 | 201506,4558511
116 | 201507,4801148
117 | 201508,4796653
118 | 201509,4201394
119 | 201510,4374749
120 | 201511,4013814
121 | 201512,4129052
122 | 201601,3748529
123 | 201602,3543639
124 | 201603,4137679
125 | 201604,4172512
126 | 201605,4573996
127 | 201606,4922125
128 | 201607,5168724
129 | 201608,5110638
130 | 201609,4543759
131 | 201610,4571997
132 | 201611,4266481
133 | 201612,4343369
134 |
--------------------------------------------------------------------------------
/Data/datasets_notes.numbers:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PacktPublishing/Exploratory-Data-Analysis-with-Python-Cookbook/2ab376175c0cf8bc368c986484e862fa3a5ba319/Data/datasets_notes.numbers
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2022 Packt
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | 
2 |
3 | ## Machine Learning Summit 2025
4 | **Bridging Theory and Practice: ML Solutions for Today’s Challenges**
5 |
6 | 3 days, 20+ experts, and 25+ tech sessions and talks covering critical aspects of:
7 | - **Agentic and Generative AI**
8 | - **Applied Machine Learning in the Real World**
9 | - **ML Engineering and Optimization**
10 |
11 | 👉 [Book your ticket now >>](https://packt.link/mlsumgh)
12 |
13 | ---
14 |
15 | ## Join Our Newsletters 📬
16 |
17 | ### DataPro
18 | *The future of AI is unfolding. Don’t fall behind.*
19 |
20 | 
21 |
22 | Stay ahead with [**DataPro**](https://landing.packtpub.com/subscribe-datapronewsletter/?link_from_packtlink=yes), the free weekly newsletter for data scientists, AI/ML researchers, and data engineers.
23 | From trending tools like **PyTorch**, **scikit-learn**, **XGBoost**, and **BentoML** to hands-on insights on **database optimization** and real-world **ML workflows**, you’ll get what matters, fast.
24 |
25 | > Stay sharp with [DataPro](https://landing.packtpub.com/subscribe-datapronewsletter/?link_from_packtlink=yes). Join **115K+ data professionals** who never miss a beat.
26 |
27 | ---
28 |
29 | ### BIPro
30 | *Business runs on data. Make sure yours tells the right story.*
31 |
32 | 
33 |
34 | [**BIPro**](https://landing.packtpub.com/subscribe-bipro-newsletter/?link_from_packtlink=yes) is your free weekly newsletter for BI professionals, analysts, and data leaders.
35 | Get practical tips on **dashboarding**, **data visualization**, and **analytics strategy** with tools like **Power BI**, **Tableau**, **Looker**, **SQL**, and **dbt**.
36 |
37 | > Get smarter with [BIPro](https://landing.packtpub.com/subscribe-bipro-newsletter/?link_from_packtlink=yes). Trusted by **35K+ BI professionals**, see what you’re missing.
38 |
39 |
40 | ### [Packt Conference : Put Generative AI to work on Oct 11-13 (Virtual)](https://packt.link/JGIEY)
41 |
42 | [](https://packt.link/JGIEY)
43 | 3 Days, 20+ AI Experts, 25+ Workshops and Power Talks
44 |
45 | Code: USD75OFF
46 |
47 | # Exploratory Data Analysis with Python Cookbook
48 |
49 |
50 |
51 | This is the code repository for [Exploratory Data Analysis with Python Cookbook](https://www.packtpub.com/product/exploratory-data-analysis-with-python-cookbook/9781803231105?utm_source=github&utm_medium=repository&utm_campaign=9781803231105), published by Packt.
52 |
53 | **Over 50 recipes to analyze, visualize, and extract insights from structured and unstructured data**
54 |
55 | ## What is this book about?
56 | Exploratory data analysis (EDA) is a crucial step in data analysis and machine learning projects as it helps in uncovering relationships and patterns and provides insights into structured and unstructured datasets. With various techniques and libraries available for performing EDA, choosing the right approach can sometimes bechallenging. This hands-on guide provides you with practical steps and ready-to-use code for conducting exploratory analysis on tabular, time series, and textual data.
57 |
58 | This book covers the following exciting features:
59 | * Perform EDA with leading Python data visualization libraries
60 | * Execute univariate, bivariate, and multivariate analyses on tabular data
61 | * Uncover patterns and relationships within time series data
62 | * Identify hidden patterns within textual data
63 | * Discover different techniques to prepare data for analysis
64 | * Overcome the challenge of outliers and missing values during data analysis
65 | * Leverage automated EDA for fast and efficient analysis
66 |
67 | If you feel this book is for you, get your [copy](https://www.amazon.com/dp/B09NC5XJ6D) today!
68 |
69 |
71 |
72 |
73 | ## Instructions and Navigations
74 | All of the code is organized into folders.
75 |
76 | The code will look like the following:
77 | ```
78 | import numpy as np
79 | import pandas as pd
80 | import seaborn as sns
81 | ```
82 |
83 |
84 | **Following is what you need for this book:**
85 | If you are a data analyst interested in the practical application of exploratory data analysis in Python, then this book is for you. This book will also benefit data scientists, researchers, and statisticians who are looking for hands-on instructions on how to apply EDA techniques using Python libraries. Basic knowledge of Python programming and a basic understanding of fundamental statistical concepts is a prerequisite.
86 |
87 | With the following software and hardware list you can run all code files present in the book (Chapter 1-10).
88 |
89 |
90 | ### Software and Hardware List
91 |
92 | Basic knowledge of Python and statistical concepts is all that is needed to get the best out of this book.
93 | System requirements are mentioned in the following table:
94 |
95 | | Software/Hardware | Operating System requirements |
96 | | ------------------------------------ | -----------------------------------|
97 | | Python 3.6+ | Windows, Mac OS X, and Linux (Any) |
98 | | 512GB, 8GB RAM, i5 processor(Preferred specs) | Windows, Mac OS X, and Linux (Any) |
99 |
100 |
101 |
102 | We also provide a PDF file that has color images of the screenshots/diagrams used in this book. [Click here to download it](https://packt.link/npXws).
103 |
104 |
105 | ### Related products
106 | * Python Data Cleaning Cookbook[[Packt]](https://www.packtpub.com/product/python-data-cleaning-cookbook/9781800565661) [[Amazon]](https://www.amazon.com/dp/1800565666)
107 |
108 | * Hands-On Data Preprocessing in Python [[Packt]](https://www.packtpub.com/product/hands-on-data-preprocessing-in-python/9781801072137) [[Amazon]](https://www.amazon.com/dp/1801072132)
109 |
110 | ## Get to Know the Author
111 | **Ayodele Oluleye**
112 | is a certified data professional with a rich cross functional background that spans across
113 | strategy, data management, analytics, and data science. He currently leads a team of data professionals
114 | that spearheads data science and analytics initiatives across a leading African non-banking financial
115 | services group. Prior to this role, he spent over 8 years at a big four consulting firm working on strategy,
116 | data science and automation projects for clients across various industries. In that capacity, he was a
117 | key member of the data science and automation team which developed a proprietary big data fraud
118 | detection solution used by many Nigerian financial institutions today. To learn more about him, visit
119 | his LinkedIn profile.
120 |
--------------------------------------------------------------------------------